,'''
,,;,, ,,,, ,,,,, ,,, ,,
; ; ; ; ; ;
; ,''''; '''', ; ; ;
; ',,,,;, ,,,,,' ; ; ;

flat assembler 1.70
Programmer's Manual


Table of contents
-----------------

Chapter 1 Introduction

1.1 Compiler overview
1.1.1 System requirements
1.1.2 Executing compiler from command line
1.1.3 Compiler messages
1.1.4 Output formats

1.2 Assembly syntax
1.2.1 Instruction syntax
1.2.2 Data definitions
1.2.3 Constants and labels
1.2.4 Numerical expressions
1.2.5 Jumps and calls
1.2.6 Size settings

Chapter 2 Instruction set

2.1 The x86 architecture instructions
2.1.1 Data movement instructions
2.1.2 Type conversion instructions
2.1.3 Binary arithmetic instructions
2.1.4 Decimal arithmetic instructions
2.1.5 Logical instructions
2.1.6 Control transfer instructions
2.1.7 I/O instructions
2.1.8 Strings operations
2.1.9 Flag control instructions
2.1.10 Conditional operations
2.1.11 Miscellaneous instructions
2.1.12 System instructions
2.1.13 FPU instructions
2.1.14 MMX instructions
2.1.15 SSE instructions
2.1.16 SSE2 instructions
2.1.17 SSE3 instructions
2.1.18 AMD 3DNow! instructions
2.1.19 The x86-64 long mode instructions
2.1.20 SSE4 instructions
2.1.21 AVX instructions
2.1.22 AVX2 instructions
2.1.23 Auxiliary sets of computational instructions
2.1.24 Other extensions of instruction set

2.2 Control directives
2.2.1 Numerical constants
2.2.2 Conditional assembly
2.2.3 Repeating blocks of instructions
2.2.4 Addressing spaces
2.2.5 Other directives
2.2.6 Multiple passes

2.3 Preprocessor directives
2.3.1 Including source files
2.3.2 Symbolic constants
2.3.3 Macroinstructions
2.3.4 Structures
2.3.5 Repeating macroinstructions
2.3.6 Conditional preprocessing
2.3.7 Order of processing

2.4 Formatter directives
2.4.1 MZ executable
2.4.2 Portable Executable
2.4.3 Common Object File Format
2.4.4 Executable and Linkable Format



Chapter 1 Introduction
-----------------------

This chapter contains all the most important information you need to begin
using the flat assembler. If you are an experienced assembly language
programmer, you should read at least this chapter before using this compiler.


1.1 Compiler overview

Flat assembler is a fast assembly language compiler for the x86 architecture
processors, which does multiple passes to optimize the size of the generated
machine code. It is self-compilable, and versions for different operating
systems are provided. All the versions are designed to be used from the system
command line, and they should not differ in behavior.


1.1.1 System requirements

All versions require an x86 architecture 32-bit processor (at least 80386),
although they can produce programs for the x86 architecture 16-bit processors,
too. The DOS version requires an OS compatible with MS DOS 2.0 and either a
true real mode environment or DPMI. The Windows version requires a Win32
console compatible with version 3.1.


1.1.2 Executing compiler from command line

To execute flat assembler from the command line you need to provide two
parameters: the first should be the name of the source file, the second should
be the name of the destination file. If no second parameter is given, the name
for the output file will be guessed automatically. After displaying short
information about the program name and version, the compiler will read the data
from the source file and compile it. When the compilation is successful, the
compiler will write the generated code to the destination file and display the
summary of the compilation process; otherwise it will display the information
about the error that occurred.
The source file should be a text file, and can be created in any text
editor. Line breaks are accepted in both DOS and Unix standards, and tabs are
treated as spaces.
In the command line you can also include the "-m" option followed by a number,
which specifies how many kilobytes of memory flat assembler should maximally
use. In case of the DOS version this option limits only the usage of extended
memory. The "-p" option followed by a number can be used to specify the limit
for the number of passes the assembler performs. If code cannot be generated
within the specified number of passes, the assembly will be terminated with an
error message. The maximum value of this setting is 65536, while the default
limit, used when no such option is included in the command line, is 100.
There are no command line options that would affect the output of the compiler;
flat assembler requires only the source code to include the information it
really needs. For example, to specify the output format you use the "format"
directive at the beginning of the source.
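
As an illustration, the sketch below is a small DOS program for which nothing
about the output is set on the command line: with no "format" directive it is
assembled into a flat binary file. The file names hello.asm and hello.com are
hypothetical, and the option values in the comment are arbitrary examples of
the "-m" and "-p" settings described above. The "org" directive and the DOS
interrupt calls are assumptions used only to make the example complete.

```asm
; hello.asm (hypothetical name), assembled for example with:
;   fasm hello.asm hello.com -m 16384 -p 200
; -m 16384 limits memory to 16384 kilobytes, -p 200 limits passes to 200.
; No "format" directive: the output is a flat binary file of 16-bit code.

        org     100h            ; DOS .COM programs are loaded at offset 100h

        mov     ah,9            ; DOS function 09h: print "$"-terminated text
        mov     dx,msg          ; DS:DX points to the text
        int     21h
        mov     ax,4C00h        ; DOS function 4Ch: terminate with code 0
        int     21h

msg     db      'Hello, world!$'
```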


1.1.3 Compiler messages

As stated above, after a successful compilation the compiler displays the
compilation summary. It includes the information of how many passes were
done, how much time it took, and how many bytes were written into the
destination file.
The following is an example of the compilation summary:

    flat assembler version 1.70 (16384 kilobytes memory)
    38 passes, 5.3 seconds, 77824 bytes.

In case of an error during the compilation process, the program will display
an error message. For example, when the compiler can't find the input file, it
will display the following message:

    flat assembler version 1.70 (16384 kilobytes memory)
    error: source file not found.

If the error is connected with a specific part of the source code, the source
line that caused the error will also be displayed. The placement of this line
in the source is given as well, to help you find the error, for example:

    flat assembler version 1.70 (16384 kilobytes memory)
    example.asm [3]:
    mob ax,1
    error: illegal instruction.

It means that in the third line of the "example.asm" file the compiler has
encountered an unrecognized instruction. When the line that caused the error
contains a macroinstruction, the line in the macroinstruction definition that
generated the erroneous instruction is also displayed:

    flat assembler version 1.70 (16384 kilobytes memory)
    example.asm [6]:
    stoschar 7
    example.asm [3] stoschar [1]:
    mob al,char
    error: illegal instruction.

It means that the macroinstruction in the sixth line of the "example.asm" file
generated an unrecognized instruction with the first line of its definition,
which is the third line of that file.
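
A source file laid out as in the sketch below should produce a message like the
one above. The layout is an assumption chosen so that the line numbers agree
with the message: the misspelled "mob" is the third line of the file and the
first line of the "stoschar" definition, and the macroinstruction is used in
the sixth line. Every comment stays on its own code line, so the numbering is
not shifted. Macroinstructions are described in 2.3.3.

```asm
macro stoschar char             ; line 1
 {                              ; line 2
        mob     al,char         ; line 3: typo for "mov"
        stosb                   ; line 4
 }                              ; line 5
        stoschar 7              ; line 6
```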


1.1.4 Output formats

By default, when there is no "format" directive in the source file, flat
assembler simply puts the generated instruction codes into the output, creating
this way a flat binary file. By default it generates 16-bit code, but you can
always switch it into the 16-bit or 32-bit mode by using the "use16" or "use32"
directive. Some of the output formats switch into 32-bit mode when selected;
more information about the formats you can choose can be found in 2.4.
All output code is always in the order in which it was entered into the
source file.
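
For example, the flat binary output below starts in the default 16-bit mode and
switches modes with the directives described above. The same "mov" mnemonic is
encoded differently in each mode; the bytes in the comments are the standard
x86 encodings of these instructions, and they appear in the output in exactly
this order.

```asm
; no "format" directive: flat binary output, 16-bit mode by default
        mov     ax,1            ; 16-bit code: B8 01 00
        use32
        mov     eax,1           ; 32-bit code: B8 01 00 00 00
        use16
        mov     eax,1           ; 32-bit operand in 16-bit code: 66 B8 01 00 00 00
```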


1.2 Assembly syntax

The information provided below is intended mainly for assembly programmers
who have been using some other assembly compilers before. If you are a
beginner, you should look for assembly programming tutorials.
Flat assembler by default uses the Intel syntax for the assembly
instructions, although you can customize it using the preprocessor
capabilities (macroinstructions and symbolic constants). It also has its own
set of directives, which are the instructions for the compiler.
All symbols defined inside the sources are case-sensitive.


1.2.1 Instruction syntax

Instructions in assembly language are separated by line breaks, and one
instruction is expected to fill one line of text. If a line contains
a semicolon, except for the semicolons inside quoted strings, the rest of
this line is a comment and the compiler ignores it. If a line ends with the "\"
character (optionally followed by a semicolon and a comment), the next line
is attached at this point.
Each line in the source is a sequence of items, which may be one of three
types. One type are the symbol characters, which are special characters that
are individual items even when they are not separated from the other ones by
spaces. Any of the "+-*/=<>()[]{}:,|&~#`" is a symbol character. A sequence of
other characters, separated from other items with either blank spaces or
symbol characters, is a symbol. If the first character of a symbol is either a
single or double quote, it integrates any sequence of characters following it,
even the special ones, into a quoted string, which should end with the same
character with which it began (the single or double quote). However, if there
are two such characters in a row (without any other character between them),
they are integrated into the quoted string as just one of them, and the quoted
string then continues. The symbols other than symbol characters and quoted
strings can be used as names, so they are also called the name symbols.
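
The short fragment below illustrates these rules: comments, a semicolon inside
a quoted string, doubled quotes, line continuation, and symbol characters that
need no spaces around them. The values are arbitrary.

```asm
        mov     ax,1            ; everything after a semicolon is a comment
        db      'a;b',0         ; this semicolon is part of the string
        db      'It''s',0       ; doubled quote: defines the text It's
        db      "say ""hi""",0  ; the same rule applies to double quotes
        db      1,2,\           ; the backslash attaches the next line,
                3,4             ; so these two lines read as "db 1,2,3,4"
        mov     ax,[bx+si]      ; "[", "+", "]" and "," are separate items
```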
Every instruction consists of the mnemonic and a variable number of operands,
separated with commas. An operand can be a register, an immediate value or
data addressed in memory; it can also be preceded by a size operator to define
or override its size (table 1.1). The names of the available registers can be
found in table 1.2; their sizes cannot be overridden. An immediate value can be
specified by any numerical expression.
When an operand is data in memory, the address of that data (also any
numerical expression, but it may contain registers) should be enclosed in
square brackets or preceded by the "ptr" operator. For example, the instruction
"mov eax,3" will put the immediate value 3 into the EAX register, the
instruction "mov eax,[7]" will put the 32-bit value from the address 7 into EAX,
and the instruction "mov byte [7],3" will put the immediate value 3 into the
byte at address 7; it can also be written as "mov byte ptr 7,3". To specify
which segment register should be used for addressing, the segment register
name followed by a colon should be put just before the address value (inside
the square brackets or after the "ptr" operator).
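
The instructions from the paragraph above are collected into one fragment
below. The last three lines are additional examples not taken from the text: a
segment override written inside the brackets, the same override written after
"ptr", and an address expression that contains registers. The "use32" line is
included only so that the 32-bit register forms are assembled as 32-bit code.

```asm
        use32
        mov     eax,3              ; immediate value 3 into EAX
        mov     eax,[7]            ; 32-bit value from address 7 into EAX
        mov     byte [7],3         ; immediate value 3 into the byte at address 7
        mov     byte ptr 7,3       ; the same instruction written with "ptr"
        mov     eax,[es:7]         ; ES used for addressing, inside brackets
        mov     eax,dword ptr es:7 ; the same, with the override after "ptr"
        mov     eax,[ebx+esi*4+8]  ; address expression containing registers
```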

Table 1.1 Size operators
/-------------------------\
| Operator | Bits | Bytes |
|==========|======|=======|
| byte     |    8 |     1 |
| word     |   16 |     2 |
| dword    |   32 |     4 |
| fword    |   48 |     6 |
| pword    |   48 |     6 |
| qword    |   64 |     8 |
| tbyte    |   80 |    10 |
| tword    |   80 |    10 |
| dqword   |  128 |    16 |
| xword    |  128 |    16 |
| qqword   |  256 |    32 |
| yword    |  256 |    32 |
\-------------------------/
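
Size operators matter most when no register operand determines the size. In
the sketch below each memory operand at the address in EBX gets its size from
the operator in front of it; without one, an instruction like "inc [ebx]" would
give the assembler no way to tell the size of the operand. The instructions
themselves are only illustrative.

```asm
        use32
        inc     byte [ebx]        ; 8-bit memory operand
        inc     word [ebx]        ; 16-bit memory operand
        inc     dword [ebx]       ; 32-bit memory operand
        fld     qword [ebx]       ; 64-bit floating point value
        fld     tword [ebx]       ; 80-bit floating point value
        movups  xmm0,dqword [ebx] ; 128-bit SSE operand
```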

Table 1.2 Registers
/-----------------------------------------------------------------\
| Type    | Bits |                                                |
|=========|======|================================================|
|         |    8 | al   cl   dl   bl   ah   ch   dh   bh          |
| General |   16 | ax   cx   dx   bx   sp   bp   si   di          |
|         |   32 | eax  ecx  edx  ebx  esp  ebp  esi  edi         |
|---------|------|------------------------------------------------|
| Segment |   16 | es   cs   ss   ds   fs   gs                    |
|---------|------|------------------------------------------------|
| Control |   32 | cr0  cr2  cr3  cr4                             |
|---------|------|------------------------------------------------|
| Debug   |   32 | dr0  dr1  dr2  dr3  dr6  dr7                   |
|---------|------|------------------------------------------------|
| FPU     |   80 | st0  st1  st2  st3  st4  st5  st6  st7         |
|---------|------|------------------------------------------------|
| MMX     |   64 | mm0  mm1  mm2  mm3  mm4  mm5  mm6  mm7         |
|---------|------|------------------------------------------------|
| SSE     |  128 | xmm0 xmm1 xmm2 xmm3 xmm4 xmm5 xmm6 xmm7        |
|---------|------|------------------------------------------------|
| AVX     |  256 | ymm0 ymm1 ymm2 ymm3 ymm4 ymm5 ymm6 ymm7        |
\-----------------------------------------------------------------/


1.2.2 Data definitions

To define data or reserve a space for it, use one of the directives listed in
table 1.3. The data definition directive should be followed by one or more
numerical expressions, separated with commas. These expressions define the
values for data cells of size depending on which directive is used. For
example, "db 1,2,3" will define three bytes of values 1, 2 and 3
respectively.
The "db" and "du" directives also accept quoted string values of any
length, which will be converted into a chain of bytes when "db" is used and
into a chain of words with zeroed high byte when "du" is used. For example,
"db 'abc'" will define three bytes of values 61h, 62h and 63h.
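
The byte values below follow from the rules above and are written in
hexadecimal; "du" stores each character as a word, and words are stored with
the low byte first.

```asm
        db      1,2,3           ; bytes 01 02 03
        db      'abc'           ; bytes 61 62 63
        du      'abc'           ; words 0061h 0062h 0063h: 61 00 62 00 63 00
        db      'abc',0         ; strings and numbers can be mixed: 61 62 63 00
```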
The "dp" directive and its synonym "df" accept values consisting of two
numerical expressions separated with a colon; the first value will become the
high word and the second value will become the low double word of the far
pointer value. Also "dd" accepts such pointers consisting of two word values
separated with a colon, and "dt" accepts a word and a quad word value separated
with a colon, the quad word being stored first. The "dt" directive with a
single expression as parameter accepts only floating point values and creates
data in FPU double extended precision format.
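
The following lines show the resulting byte order for each pointer form,
following the little-endian layout of the x86 architecture, where the low part
of every value is stored first. The numbers themselves are arbitrary.

```asm
        dp      1234h:56789ABCh         ; BC 9A 78 56, then 34 12 (6 bytes)
        dd      1234h:5678h             ; 78 56, then 34 12 (4 bytes)
        dt      1234h:1122334455667788h ; 88 77 66 55 44 33 22 11, then 34 12
        dt      1.0                     ; 00 00 00 00 00 00 00 80 FF 3F
```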
Any of the above directives allows the usage of the special "dup" operator to
make multiple copies of given values. The count of duplicates should precede
this operator and the value to duplicate should follow; it can even be a chain
of values separated with commas, but such a set of values needs to be enclosed
in parentheses, like "db 5 dup (1,2)", which defines five copies of the given
two-byte sequence.
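
A few more uses of "dup", with the amount of data each line defines; the values
are arbitrary.

```asm
        db      5 dup (1,2)     ; 10 bytes: 01 02 01 02 01 02 01 02 01 02
        dw      3 dup 0FFFFh    ; 3 words (6 bytes), each of value 0FFFFh
        db      4 dup ('ab',0)  ; 12 bytes: 61 62 00 repeated four times
        dd      2 dup (1,2,3)   ; 6 double words (24 bytes)
```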
The "file" is a special directive and its syntax is different. This directive
includes a chain of bytes from a file and it should be followed by the quoted
file name, then optionally a numerical expression specifying the offset in the
file preceded by a colon, and (also optionally) a comma and a numerical
expression specifying the count of bytes to include (if no count is specified,
all data up to the end of the file is included). For example,
"file 'data.bin'" will include the whole file as binary data and
"file 'data.bin':10h,4" will include only four bytes starting at offset 10h.
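
The three forms of the "file" directive are shown side by side below. The
names data.bin and table.bin are hypothetical, and the files would have to
exist when the source is assembled. The last two lines, which measure the
included data with a label and the "$" symbol, anticipate the constants and
labels described later.

```asm
; data.bin and table.bin are hypothetical file names
        file    'data.bin'              ; the whole file
        file    'data.bin':10h          ; from offset 10h up to the end of file
        file    'data.bin':10h,4        ; four bytes starting at offset 10h
table:  file    'table.bin'             ; a label marks where the data begins
table_size = $ - table                  ; number of bytes included above
```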
  The data reservation directive should be followed by only one numerical
expression, and this value defines how many cells of the specified size should
be reserved. All data definition directives also accept the "?" value, which
means that this cell should not be initialized to any value; the effect is
the same as using the data reservation directive. The uninitialized data
may not be included in the output file, so its values should always be
considered unknown.
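  A sketch contrasting reservation with the "?" value (the names are made
up for illustration). The last two lines have the same effect:

    buffer  rb 256              ; 256 uninitialized bytes
    total   rd 1                ; one uninitialized double word
    scratch dd ?                ; likewise one uninitialized double word

  Reserved data placed at the end of the output may be left out of the file
entirely, which is one reason its initial contents cannot be relied upon.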

  Table 1.3 Data directives
  /----------------------------\
  | Size    | Define | Reserve |
  | (bytes) | data   | data    |
  |=========|========|=========|
  |    1    |   db   |   rb    |
  |         |  file  |         |
  |---------|--------|---------|
  |    2    |   dw   |   rw    |
  |         |   du   |         |
  |---------|--------|---------|
  |    4    |   dd   |   rd    |
  |---------|--------|---------|
  |    6    |   dp   |   rp    |
  |         |   df   |   rf    |
  |---------|--------|---------|
  |    8    |   dq   |   rq    |
  |---------|--------|---------|
  |   10    |   dt   |   rt    |
  \----------------------------/


1.2.3 Constants and labels

  In numerical expressions you can also use constants or labels instead of
numbers. To define a constant or label you should use the specific
directives. Each label can be defined only once and it is accessible from
any place in the source (even before it was defined). A constant can be
redefined many times, but in this case it is accessible only after it was
defined, and it is always equal to the value from the last definition before
the place where it is used. When a constant is defined only once in the
source, it is - like a label - accessible from anywhere.
  The definition of a constant consists of the name of the constant followed
by the "=" character and a numerical expression, which after calculation will
become the value of the constant. This value is always calculated at the time
the constant is defined. For example you can define the "count" constant by
using the directive "count = 17", and then use it in assembly instructions,
like "mov cx,count" - which will become "mov cx,17" during the compilation
process.
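  A short sketch of both rules (the names "limit" and "count" are only
examples): a constant defined once may be used before its definition, while
a redefined one takes the value of the latest definition above it.

            mov ax,limit        ; allowed: "limit" is defined only once
    limit = 100
    count = 17
            mov cx,count        ; assembled as "mov cx,17"
    count = count + 1
            mov dx,count        ; assembled as "mov dx,18"

  Using "count" above its first definition would be an error, because it is
redefined in this source.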
  There are different ways to define labels. The simplest is to follow the
name of the label by a colon; this directive can even be followed by another
instruction in the same line. It defines a label whose value is equal to the
offset of the point where it is defined. This method is usually used to label
places in code. The other way is to follow the name of the label (without a
colon) by some data directive. It defines a label with value equal to the
offset of the beginning of the defined data, and this label is remembered as
a label for data with the cell size specified for that data directive in
table 1.3.
  A label can be treated as a constant of value equal to the offset of the
labeled code or data. For example, when you define data using the labeled
directive "char db 224", to put the offset of this data into the BX register
you should use the "mov bx,char" instruction, and to put the value of the byte
addressed by the "char" label into the DL register, you should use
"mov dl,[char]" (or "mov dl,ptr char"). But when you try to assemble
"mov ax,[char]", it will cause an error, because fasm compares the sizes of
operands, which should be equal. You can force assembling that instruction by
using a size override: "mov ax,word [char]", but remember that this
instruction will read the two bytes beginning at the "char" address, while it
was defined as one byte.
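  The instructions discussed above fit into a small DOS .COM program, shown
here as a sketch; the "org 100h" line and the "int 20h" exit are assumptions
about the target and not part of the labeling rules:

            org 100h
    start:  mov bx,char         ; BX = offset of "char"
            mov dl,[char]       ; DL = 224, the byte at "char"
            mov ax,word [char]  ; AL = 224, AH = whatever byte follows
    ;       mov ax,[char]       ; error: operand sizes do not match
            int 20h             ; return to DOS
    char    db 224

  The "start" label is defined with a colon and is followed by an instruction
in the same line, while "char" is a byte-sized data label.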
  The last and the most flexible way to define labels is to use the "label"
directive. This directive should be followed by the name of the label, then
optionally a size operator (it can be preceded by a colon) and then - also
optionally - the "at" operator and the numerical expression defining the
address at which this label should be defined. For example
"label wchar word at char" will define a new label for 16-bit data at the
address of "char". Now the instruction "mov ax,[wchar]" will be the same
after compilation as "mov ax,word [char]", and "mov [wchar],57568" will copy
two bytes while "mov [char],224" will copy one byte to the same address. If no
address is specified, the "label" directive defines the label at the current
offset.
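  A sketch of both uses of "label" (the "dbuf" name and its 16-byte size
are made up for illustration):

            mov ax,[wchar]      ; same code as "mov ax,word [char]"
            mov [wchar],57568   ; writes two bytes, 0E0h and 0E0h
            mov [char],224      ; writes one byte, 0E0h
            mov eax,[dbuf]      ; dword size comes from the label
            ret
    char    db 224
    label wchar word at char    ; 16-bit view of the address of "char"
    label dbuf dword            ; current offset, double word cells
            rb 16

  57568 is 0E0E0h, so both stores put 0E0h into the byte at "char"; the
first one also writes the byte after it.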
  A label whose name begins with a dot is treated as a local label, and its
name is attached to the name of the last global label (a label with a name
beginning with anything but a dot) to make the full name of this label. So
you can use the short name (beginning with a dot) of this label anywhere
before the next global label is defined, and in the other places you have to
use the full name. Labels beginning with two dots are the exception - they
are like global labels, but they do not become the new prefix for local
labels.
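  A sketch of how the names expand (the labels are arbitrary, and the jumps
only give them something to refer to):

    first:
            mov cx,10
    .loop:                      ; full name: first.loop
            dec cx
            jnz .loop           ; inside "first", the short name works
            ret
    second:
            jmp first.loop      ; elsewhere the full name is required
    .done:                      ; full name: second.done
    ..shared:                   ; global-like, but not a new prefix
            jmp .done           ; still refers to second.done

  Because "..shared" does not change the prefix, the ".done" used below it
still belongs to "second".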
  The "@@" name means an anonymous label; you can define many of them in the
source. The symbol "@b" (or the equivalent "@r") references the nearest
preceding anonymous label, and the symbol "@f" references the nearest
following anonymous label. These special symbols are case-insensitive.
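  A sketch with two anonymous labels (the loop count of 10 is arbitrary):

            mov cx,10
    @@:     dec cx              ; first anonymous label
            jnz @b              ; back to the nearest preceding "@@"
            jmp @f              ; forward to the nearest following "@@"
            nop                 ; skipped by the jump above
    @@:     ret                 ; second anonymous label

  Since these symbols are case-insensitive, "@B" and "@F" would work the
same way.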


1.2.4 Numerical expressions

  In the above examples all the numerical expressions were simple numbers,
constants or labels. But they can be more complex, by using the arithmetical
or logical operators for calculations at compile time. All these operators
with their priority values are listed in table 1.4. The operations with a
higher priority value are calculated first; you can of course change this
behavior by putting some parts of the expression into parentheses. The "+",
"-", "*" and "/" are the standard arithmetical operations, and "mod"
calculates the remainder from division. The "and", "or", "xor", "shl", "shr"
and "not" perform the same logical operations as the assembly instructions of
those names. The "rva" and "plt" are special unary operators that perform
conversions between different kinds of addresses; they can be used only with
a few of the output formats and their meaning may vary (see 2.4).
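  A sketch of how the priorities in table 1.4 decide the results (the
constant names are arbitrary):

    v1 = 2 + 3 * 4              ; 14: "*" (1) before "+" (0)
    v2 = (2 + 3) * 4            ; 20: parentheses first
    v3 = 1 shl 4 + 1            ; 17: "shl" (4) before "+" (0)
    v4 = 17 mod 5               ; 2
    v5 = not 0 and 0Fh          ; 0Fh: "not" (5) before "and" (3)

  These constants produce no output by themselves; they can be used, for
example, as "db v1,v2,v3,v4,v5".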
  The arithmetical and logical calculations are usually processed as if they
operated on infinite precision 2-adic numbers, and the assembler signals an
overflow error if, because of its limitations, it is not able to perform the
required calculation, or if the result is too large a number to fit in either
the signed or the unsigned range for the destination unit size. However, the
"not", "xor" and "shr" operators are exceptions to this rule - if the value
specified by the numerical expression has to fit in a unit of specified size,
and the arguments for the operation fit into that size, the operation will be
performed with precision limited to that size.
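  A sketch of these rules with byte-sized and larger destinations, assuming
the behavior described above:

            mov al,200          ; fits the unsigned byte range
            mov al,-56          ; fits the signed byte range
    ;       mov al,300          ; overflow: fits neither range
            mov al,not 0F0h     ; 0Fh: computed with 8-bit precision
            mov ax,not 0F0h     ; 0FF0Fh
            dd not 0F0h         ; 0FFFFFF0Fh

  The first two instructions assemble to the same bytes, since -56 and 200
are both 0C8h in 8 bits. Without the exception for "not", the value of
"not 0F0h" would be -241, which does not fit in a byte.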
  The numbers in the expression are by default treated as decimal. Binary
numbers should have the "b" letter attached at the end, octal numbers should
end with the "o" letter, and hexadecimal numbers should begin with the "0x"
characters (like in the C language) or with the "$" character (like in the
Pascal language), or they should end with the "h" letter. Also a quoted
string, when encountered in an expression, will be converted into a number -
the first character will become the least significant byte of the number.
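  A sketch of the number formats; the first six lines all define the same
byte, 0Ah:

            db 10               ; decimal
            db 1010b            ; binary
            db 12o              ; octal
            db 0Ah              ; hexadecimal with the "h" suffix
            db 0x0A             ; hexadecimal, C style
            db $0A              ; hexadecimal, Pascal style
            mov eax,'abcd'      ; EAX = 64636261h, "a" in the lowest byte

  A number with the "h" suffix has to begin with a digit ("0Ah", not "Ah"),
otherwise it would be taken for a name.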
  The numerical expression used as an address value can also contain any of
the general registers used for addressing; they can be added and multiplied by
appropriate values, as it is allowed for the x86 architecture instructions.
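  A sketch of address expressions (the "table" label is only an example):

            mov ax,[bx+si+4]        ; base, index and displacement
            mov al,[bp+di-2]
            mov eax,[esi+ecx*4+table] ; 32-bit form with a scaled index
    ;       mov ax,[bx+cx]          ; error: not an x86 addressing form
            ret
    table   dd 1,2,3,4

  In the third instruction the label simply contributes its offset as the
displacement.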
  There are also some special symbols that can be used inside the numerical
expression. The first is "$", which is always equal to the value of the
current offset, while "$$" is equal to the base address of the current
addressing space. The other one is "%", which is the number of the current
repeat in parts of code that are repeated using some special directives (see
2.2). There is also the "%t" symbol, which is always equal to the current
time stamp.
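  A sketch of these symbols, assuming the "org 100h" setting of a DOS .COM
program (the names are arbitrary):

            org 100h
    msg     db 'Hello, world'
    msg_len = $ - msg           ; 12: current offset minus start of "msg"
    offs    = $ - $$            ; also 12: nothing precedes "msg" here
    repeat 4
            db %                ; emits the bytes 1, 2, 3, 4
    end repeat
    stamp   dd %t               ; time stamp of this assembly

  The "repeat" directive used here is one of the directives described in 2.2.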
  Any numerical expression can also consist of a single floating point value
(flat assembler does not allow any floating point operations at compilation
time) in scientific notation. Such a value can end with the "f" letter to be
recognized, otherwise it should contain at least one of the "." or "E"
characters. So "1.0", "1E0" and "1f" define the same floating point value,
while a simple "1" defines an integer value.
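  A sketch of floating point values in data definitions (the encodings in
the comments are the standard IEEE 754 ones):

            dd 1.0              ; single precision: 3F800000h
            dd 1E0              ; the same value
            dd 1f               ; the same value
            dq 1.0              ; double precision: 3FF0000000000000h
            dt 1.0              ; 80-bit extended precision
            dd 1                ; integer: 00000001h
    ;       dd 1.0+1.0          ; error: no compile-time float arithmetic

  The same written value is stored with the precision given by the size of
the directive.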

  Table 1.4 Arithmetical and logical operators by priority
  /-------------------------\
  | Priority | Operators    |
  |==========|==============|
  |    0     | +  -         |
  |----------|--------------|
  |    1     | *  /         |
  |----------|--------------|
  |    2     | mod          |
  |----------|--------------|
  |    3     | and  or  xor |
  |----------|--------------|
  |    4     | shl  shr     |
  |----------|--------------|
  |    5     | not          |
  |----------|--------------|
  |    6     | rva  plt     |
  \-------------------------/


1.2.5 Jumps and calls

  The operand of any jump or call instruction can be preceded not only by the
size operator, but also by one of the operators specifying the type of the
jump: "short", "near" or "far". For example, when the assembler is in 16-bit
mode, the instruction "jmp dword [0]" will become a far jump, and when the
assembler is in 32-bit mode, it will become a near jump. To force this
instruction to be treated differently, use the "jmp near dword [0]" or
"jmp far dword [0]" form.
  When the operand of a near jump is an immediate value, the assembler will
generate the shortest variant of this jump instruction if possible (but will
not create a 32-bit instruction in 16-bit mode nor a 16-bit instruction in
32-bit mode, unless there is a size operator stating it). By specifying the
jump type you can force it to always generate the long variant (for example
"jmp near 0") or to always generate the short variant and terminate with an
error when it is impossible (for example "jmp short 0").
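  A sketch of these rules in 16-bit mode (the "next" and "target" labels
are made up):

            use16
            jmp next            ; shortest form that fits
            jmp short next      ; always 2 bytes, error if out of range
            jmp near next       ; always the 3-byte form in 16-bit mode
            jmp dword [target]  ; far jump through a 32-bit pointer
            jmp near dword [target] ; near jump to a 32-bit offset
    next:   ret
    target  dd 0

  In 32-bit mode ("use32") the plain "jmp dword [target]" would be a near
jump instead, as described above.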


1.2.6 Size settings

  When an instruction uses some memory addressing, by default the smallest
form of the instruction is generated, using the short (8-bit) displacement if
the value fits in its range. This can be overridden using the "word" or
"dword" operator before the address inside the square brackets (or after the
"ptr" operator), which forces the long displacement of the appropriate size
to be made. When the address is not relative to any registers, those
operators also allow choosing the appropriate mode of absolute addressing.
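  A sketch of forced displacement sizes in both modes; the last line assumes
that "word" selects 16-bit absolute addressing, as described above:

            use16
            mov ax,[bx+4]           ; 8-bit displacement by default
            mov ax,[word bx+4]      ; forced 16-bit displacement
            use32
            mov eax,[ebx+4]         ; 8-bit displacement by default
            mov eax,[dword ebx+4]   ; forced 32-bit displacement
            mov eax,[word 1000h]    ; absolute address, 16-bit addressing

  The forced forms perform the same operation, only with a longer
displacement field.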
  Instructions "adc", "add", "and", "cmp", "or", "sbb", "sub" and "xor" with
the first operand being 16-bit or 32-bit are by default generated in the
shortened 8-bit form when the second operand is an immediate value fitting in
the range for signed 8-bit values. This can also be overridden by putting the
"word" or "dword" operator before the immediate value. Similar rules apply to
the "imul" instruction with the last operand being an immediate value.
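  A sketch of the shortened and forced forms in 32-bit mode:

            use32
            add eax,5               ; 8-bit immediate, sign-extended
            add eax,dword 5         ; full 32-bit immediate
            cmp bx,-1               ; fits signed 8 bits: short form
            cmp bx,200              ; does not fit: full 16-bit immediate
            imul ecx,ebx,10         ; short form
            imul ecx,ebx,dword 10   ; full 32-bit immediate

  A forced form behaves the same as the corresponding short one; only its
encoding is longer.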
  An immediate value as an operand for the "push" instruction without a size
operator is by default treated as a word value if the assembler is in 16-bit
mode and as a double word value if the assembler is in 32-bit mode. The
shorter 8-bit form of this instruction is used if possible; the "word" or
"dword" size operator forces the "push" instruction to be generated in the
longer form for the specified size. The "pushw" and "pushd" mnemonics force
the assembler to generate 16-bit or 32-bit code without forcing it to use the
longer form of the instruction.
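  A sketch of the "push" variants in 16-bit mode:

            use16
            push 5              ; 16-bit push, short 8-bit immediate form
            push word 5         ; 16-bit push, full 16-bit immediate
            pushd 5             ; 32-bit push, short form still allowed
            push dword 5        ; 32-bit push, full 32-bit immediate

  The first two store one word each and the last two one double word each.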


Chapter 2  Instruction set
--------------------------

  This chapter provides detailed information about the instructions and
directives supported by flat assembler. Directives for defining labels were
already discussed in 1.2.3; all other directives will be described later in
this chapter.


2.1 The x86 architecture instructions

  In this section you can find information about both the syntax and the
purpose of the assembly language instructions. If you need more technical
information, look for the Intel Architecture Software Developer's Manual.
  Assembly instructions consist of the mnemonic (the instruction's name) and
from zero to three operands. If there are two or more operands, usually the
first is the destination operand and the second is the source operand. Each
operand can be a register, memory or an immediate value (see 1.2 for details
about the syntax of operands). After the description of each instruction
there are examples of different combinations of operands, if the instruction
has any.
  Some instructions act as prefixes and can be followed by another
instruction in the same line, and there can be more than one prefix in a
line. Each name of a segment register is also a mnemonic of an instruction
prefix, although it is recommended to use segment overrides inside the
square brackets instead of these prefixes.
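  A sketch of prefixes written as separate mnemonics, in 16-bit mode:

            mov ax,[es:di]      ; recommended: override inside brackets
            es mov ax,[di]      ; same instruction, written with a prefix
            rep movsb           ; prefix followed by an instruction
            rep es movsb        ; two prefixes: source read from ES:SI

  The first two lines assemble to the same code.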


2.1.1 Data movement instructions

  "mov" transfers a byte, word or double word from the source operand to the
destination operand. It can transfer data between general registers, from a
general register to memory, or from memory to a general register, but it
cannot move from memory to memory. It can also transfer an immediate value to
a general register or memory, a segment register to a general register or
memory, a general register or memory to a segment register, a control or
debug register to a general register, and a general register to a control or
debug register. The "mov" can be assembled only if the size of the source
operand and the size of the destination operand are the same. Below are
examples for each of the allowed combinations:

    mov bx,ax       ; general register to general register
    mov [char],al   ; general register to memory
    mov bl,[char]   ; memory to general register
    mov dl,32       ; immediate value to general register
    mov [char],32   ; immediate value to memory
    mov ax,ds       ; segment register to general register
    mov [bx],ds     ; segment register to memory
    mov ds,ax       ; general register to segment register
    mov ds,[bx]     ; memory to segment register
    mov eax,cr0     ; control register to general register
    mov cr3,ebx     ; general register to control register

  "xchg" swaps the contents of two operands. It can swap two byte operands,
two word operands or two double word operands. The order of operands is not
important. The operands may be two general registers, or a general register
with memory. For example:

    xchg ax,bx      ; swap two general registers
    xchg al,[char]  ; swap register with memory

  "push" decrements the stack pointer (ESP register), then transfers the
operand to the top of stack indicated by ESP. The operand can be memory, a
general register, a segment register or an immediate value of word or double
word size. If the operand is an immediate value and no size is specified, it
is by default treated as a word value if the assembler is in 16-bit mode and
as a double word value if the assembler is in 32-bit mode. The "pushw" and
"pushd" mnemonics are variants of this instruction that store values of word
or double word size respectively. If more operands follow in the same line
(separated only with spaces, not commas), the compiler will assemble a chain
of "push" instructions with these operands. The examples below use single
operands:

    push ax                 ; store general register
    push es                 ; store segment register
    pushw [bx]              ; store memory
    push 1000h              ; store immediate value

  "pusha" saves the contents of the eight general registers on the stack.
This instruction has no operands. There are two versions of this instruction,
one 16-bit and one 32-bit; the assembler automatically generates the
appropriate version for the current mode, but this can be overridden by using
the "pushaw" or "pushad" mnemonic to always get the 16-bit or 32-bit version.
The 16-bit version of this instruction pushes general registers on the stack
in the following order: AX, CX, DX, BX, the initial value of SP before AX was
pushed, BP, SI and DI. The 32-bit version pushes the equivalent 32-bit general
registers in the same order.
  "pop" transfers the word or double word at the current top of stack to the
destination operand, and then increments ESP to point to the new top of stack.
The operand can be memory, general register or segment register. "popw" and
"popd" mnemonics are variants of this instruction for restoring the values of
word or double word size respectively. If more operands separated with spaces
follow in the same line, the compiler assembles a chain of "pop" instructions
with these operands.

    pop bx                  ; restore general register
    pop ds                  ; restore segment register
    popw [si]               ; restore memory

  "popa" restores the registers saved on the stack by "pusha" instruction,
except for the saved value of SP (or ESP), which is ignored. This instruction
has no operands. To force the 16-bit or 32-bit version of this instruction,
use the "popaw" or "popad" mnemonic.
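The register order used by "pushaw" and "popaw" can be checked with a small C model. This is a sketch under stated assumptions, not fasm code. The `cpu16` structure, its word-addressed `mem` array and the helpers `push16`, `pop16`, `pushaw` and `popaw` are hypothetical, and only the 16-bit forms are modelled.

```c
#include <stdint.h>

/* Hypothetical 16-bit machine state used only by this model. */
typedef struct {
    uint16_t ax, cx, dx, bx, sp, bp, si, di;
    uint16_t mem[0x8000];          /* stack memory, index = address / 2 */
} cpu16;

/* push: decrement SP first, then store at the new top of stack */
static void push16(cpu16 *c, uint16_t v) {
    c->sp = (uint16_t)(c->sp - 2);
    c->mem[c->sp / 2] = v;
}

/* pop: read the current top of stack, then increment SP */
static uint16_t pop16(cpu16 *c) {
    uint16_t v = c->mem[c->sp / 2];
    c->sp = (uint16_t)(c->sp + 2);
    return v;
}

/* pushaw: AX, CX, DX, BX, SP as it was before AX was pushed, BP, SI, DI */
static void pushaw(cpu16 *c) {
    uint16_t old_sp = c->sp;
    push16(c, c->ax); push16(c, c->cx); push16(c, c->dx); push16(c, c->bx);
    push16(c, old_sp);
    push16(c, c->bp); push16(c, c->si); push16(c, c->di);
}

/* popaw: reverse order; the saved SP slot is popped and discarded */
static void popaw(cpu16 *c) {
    c->di = pop16(c); c->si = pop16(c); c->bp = pop16(c);
    (void)pop16(c);                /* saved SP is ignored */
    c->bx = pop16(c); c->dx = pop16(c); c->cx = pop16(c); c->ax = pop16(c);
}
```

With SP starting at 100h, "pushaw" leaves SP at 0F0h, with AX stored at 0FEh and DI at 0F0h. A following "popaw" restores every register and returns SP to 100h.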


2.1.2 Type conversion instructions

  The type conversion instructions convert bytes into words, words into double
words, and double words into quad words. These conversions can be done using
the sign extension or zero extension. The sign extension fills the extra bits
of the larger item with the value of the sign bit of the smaller item; the
zero extension simply fills them with zeros.
  "cwd" and "cdq" double the size of the value in the AX or EAX register
respectively, and store the extra bits into the DX or EDX register. The
conversion is done using the sign extension. These instructions have no
operands.
  "cbw" extends the sign of the byte in AL throughout AX, and "cwde" extends
the sign of the word in AX throughout EAX. These instructions also have no
operands.
  "movsx" converts a byte to word or double word and a word to double word
using the sign extension. "movzx" does the same, but it uses the zero
extension. The source operand can be general register or memory, while the
destination operand must be a general register. For example:

    movsx ax,al             ; byte register to word register
    movsx edx,dl            ; byte register to double word register
    movsx eax,ax            ; word register to double word register
    movsx ax,byte [bx]      ; byte memory to word register
    movsx edx,byte [bx]     ; byte memory to double word register
    movsx eax,word [bx]     ; word memory to double word register

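Sign extension and zero extension differ only in what fills the upper bits. The C sketch below models this difference. `sign_extend`, `zero_extend` and `cdq_edx` are hypothetical helper names. `sign_extend(x, 8)` truncated to 16 bits corresponds to "cbw" or "movsx r16,r/m8". `zero_extend` corresponds to "movzx". `cdq_edx` gives the value "cdq" places in EDX.

```c
#include <stdint.h>

/* Sign extension: copy bit (from_bits - 1) into every higher bit. */
static uint32_t sign_extend(uint32_t value, unsigned from_bits) {
    uint32_t mask = (from_bits == 32) ? 0xFFFFFFFFu : ((1u << from_bits) - 1);
    uint32_t sign = 1u << (from_bits - 1);
    value &= mask;
    return (value & sign) ? (value | ~mask) : value;
}

/* Zero extension: keep the low bits, clear everything above them. */
static uint32_t zero_extend(uint32_t value, unsigned from_bits) {
    uint32_t mask = (from_bits == 32) ? 0xFFFFFFFFu : ((1u << from_bits) - 1);
    return value & mask;
}

/* cdq: EDX receives 32 copies of the sign bit of EAX. */
static uint32_t cdq_edx(uint32_t eax) {
    return (eax & 0x80000000u) ? 0xFFFFFFFFu : 0u;
}
```

For the byte 80h, sign extension to a word gives 0FF80h, while zero extension gives 0080h.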

2.1.3 Binary arithmetic instructions

  "add" replaces the destination operand with the sum of the source and
destination operands and sets CF if a carry out of the most significant bit
has occurred. The operands may be bytes, words or double words. The
destination operand can be general register or memory, the source operand can
be general register or immediate value, and it can also be memory if the
destination operand is a register.

    add ax,bx               ; add register to register
    add ax,[si]             ; add memory to register
    add [di],al             ; add register to memory
    add al,48               ; add immediate value to register
    add [char],48           ; add immediate value to memory

  "adc" sums the operands, adds one if CF is set, and replaces the destination
operand with the result. Rules for the operands are the same as for the "add"
instruction. An "add" followed by multiple "adc" instructions can be used to
add numbers longer than 32 bits.
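The "add" then "adc" pattern for numbers longer than 32 bits can be sketched in C. Each 64-bit value is split into two 32-bit halves, and an `int` stands in for CF. The names `add32`, `adc32` and `add64` are hypothetical.

```c
#include <stdint.h>

/* add: 32-bit sum; CF receives the carry out of bit 31 */
static uint32_t add32(uint32_t a, uint32_t b, int *cf) {
    uint32_t r = a + b;
    *cf = r < a;                       /* wrap-around means a carry occurred */
    return r;
}

/* adc: like add, but also adds the incoming CF */
static uint32_t adc32(uint32_t a, uint32_t b, int *cf) {
    uint32_t r = a + b + (uint32_t)*cf;
    /* with an incoming carry the sum wrapped if r <= a, otherwise if r < a */
    *cf = *cf ? (r <= a) : (r < a);
    return r;
}

/* 64-bit addition: "add" on the low halves, then "adc" on the high halves */
static void add64(uint32_t *lo, uint32_t *hi,
                  uint32_t src_lo, uint32_t src_hi, int *cf) {
    *lo = add32(*lo, src_lo, cf);
    *hi = adc32(*hi, src_hi, cf);
}
```

Adding 1 to 00000000FFFFFFFFh carries out of the low half and yields 0000000100000000h with CF clear.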
  "inc" adds one to the operand; it does not affect CF. The operand can be a
general register or memory, and the size of the operand can be byte, word or
double word.

    inc ax                  ; increment register by one
    inc byte [bx]           ; increment memory by one

  "sub" subtracts the source operand from the destination operand and replaces
the destination operand with the result. If a borrow is required, the CF is
set. Rules for the operands are the same as for the "add" instruction.
  "sbb" subtracts the source operand from the destination operand, subtracts
one if CF is set, and stores the result to the destination operand. Rules for
the operands are the same as for the "add" instruction. A "sub" followed by
multiple "sbb" instructions may be used to subtract numbers longer than 32
bits.
  "dec" subtracts one from the operand; it does not affect CF. Rules for the
operand are the same as for the "inc" instruction.
  "cmp" subtracts the source operand from the destination operand. It updates
the flags in the same way as the "sub" instruction, but does not alter the
source and destination operands. Rules for the operands are the same as for
the "sub" instruction.
  "neg" subtracts a signed integer operand from zero. The effect of this
instruction is to reverse the sign of the operand from positive to negative or
from negative to positive. Rules for the operand are the same as for the "inc"
instruction.
  "xadd" exchanges the destination operand with the source operand, then loads
the sum of the two values into the destination operand. Rules for the operands
are the same as for the "add" instruction.
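The net effect of "xadd" is that the source receives the old destination and the destination receives the sum. The C sketch below shows this with 32-bit operands. `xadd32` is a hypothetical name, and flags are not modelled.

```c
#include <stdint.h>

/* xadd dest,src: src <- old dest, dest <- old dest + old src */
static void xadd32(uint32_t *dest, uint32_t *src) {
    uint32_t sum = *dest + *src;   /* computed before either value changes */
    *src = *dest;
    *dest = sum;
}
```

With a destination of 5 and a source of 3, the destination becomes 8 and the source becomes 5.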
  All the above binary arithmetic instructions update the SF, ZF, PF and OF
flags. SF is always set to the same value as the result's sign bit, ZF is set
when all the bits of the result are zero, PF is set when the low order eight
bits of the result contain an even number of set bits, and OF is set if the
result is too large for a positive number or too small for a negative number
(excluding the sign bit) to fit in the destination operand.
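The flag rules above can be checked with a C model of an 8-bit "add". This is a sketch, not fasm code. The `flags8` structure and `add8_flags` are hypothetical names. OF is computed the usual way: both operands have the same sign and the result's sign differs from it.

```c
#include <stdint.h>

typedef struct { int cf, zf, sf, pf, of; } flags8;

/* Returns the truncated 8-bit sum and records the flags an "add" would set. */
static uint8_t add8_flags(uint8_t a, uint8_t b, flags8 *f) {
    unsigned wide = (unsigned)a + b;
    uint8_t r = (uint8_t)wide;
    int ones = 0;
    for (int i = 0; i < 8; i++)
        ones += (r >> i) & 1;                     /* count set bits */
    f->cf = wide > 0xFF;                          /* carry out of bit 7 */
    f->zf = r == 0;                               /* all result bits zero */
    f->sf = (r >> 7) & 1;                         /* copy of the sign bit */
    f->pf = (ones % 2) == 0;                      /* even number of set bits */
    f->of = ((~(a ^ b) & (a ^ r)) >> 7) & 1;      /* signed overflow */
    return r;
}
```

Adding 7Fh and 1 gives 80h with OF and SF set and CF clear. Adding 0FFh and 1 gives 0 with CF, ZF and PF set and OF clear.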
  "mul" performs an unsigned multiplication of the operand and the
accumulator. If the operand is a byte, the processor multiplies it by the
contents of AL and returns the 16-bit result in AH and AL. If the operand is a
word, the processor multiplies it by the contents of AX and returns the 32-bit
result in DX and AX. If the operand is a double word, the processor multiplies
it by the contents of EAX and returns the 64-bit result in EDX and EAX. "mul"
sets CF and OF when the upper half of the result is nonzero, otherwise they
are cleared. Rules for the operand are the same as for the "inc" instruction.
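For a word operand, "mul" splits the 32-bit product across DX and AX. The C sketch below models this, with a single `int` standing for both CF and OF. `mul16` is a hypothetical name.

```c
#include <stdint.h>

/* mul r/m16: DX:AX = AX * src; CF = OF = (DX != 0) */
static void mul16(uint16_t *ax, uint16_t *dx, uint16_t src, int *cf_of) {
    uint32_t product = (uint32_t)*ax * src;
    *ax = (uint16_t)product;           /* lower half of the product */
    *dx = (uint16_t)(product >> 16);   /* upper half of the product */
    *cf_of = *dx != 0;
}
```

1234h times 100h is 123400h, so DX becomes 0012h, AX becomes 3400h, and CF and OF are set.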
  "imul" performs a signed multiplication operation. This instruction has
three variations. The first has one operand and behaves in the same way as the
"mul" instruction. The second has two operands; in this case the destination
operand is multiplied by the source operand and the result replaces the
destination operand. The destination operand must be a general register of
word or double word size, and the source operand can be a general register,
memory or an immediate value. The third form has three operands: the
destination operand must be a general register, word or double word in size,
the source operand can be a general register or memory, and the third operand
must be an immediate value. The source operand is multiplied by the immediate
value and the result is stored in the destination register. All three forms
calculate the product to twice the size of the operands and set CF and OF when
the upper half of the result is significant, that is, when it is not just the
sign extension of the lower half; but the second and third forms truncate the
product to the size of the operands. So the second and third forms can also be
used for unsigned operands because, whether the operands are signed or
unsigned, the lower half of the product is the same. Below are the examples
for all three forms:

    imul bl                 ; accumulator by register
    imul word [si]          ; accumulator by memory
    imul bx,cx              ; register by register
    imul bx,[si]            ; register by memory
    imul bx,10              ; register by immediate value
    imul ax,bx,10           ; register by immediate value to register
    imul ax,[si],10         ; memory by immediate value to register

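The two-operand word form of "imul" keeps only the low 16 bits. It sets CF and OF only when the signed product does not fit in 16 bits. The C sketch below models this. `imul16_2op` is a hypothetical name, and operands are passed as raw 16-bit patterns.

```c
#include <stdint.h>

/* imul r16,r/m16: dest = low 16 bits of the signed product.
   CF = OF = 1 when the full product does not fit in a signed 16-bit value. */
static uint16_t imul16_2op(uint16_t dest, uint16_t src, int *cf_of) {
    int32_t a = (dest & 0x8000) ? (int32_t)dest - 0x10000 : (int32_t)dest;
    int32_t b = (src & 0x8000) ? (int32_t)src - 0x10000 : (int32_t)src;
    int32_t full = a * b;                    /* |a * b| <= 2^30, no overflow */
    *cf_of = full < -32768 || full > 32767;
    return (uint16_t)((uint32_t)full & 0xFFFF);
}
```

7FFFh times 2 gives 0FFFEh with CF and OF set, and the unsigned product has the same low 16 bits. 0FFFFh times 1 is -1. Its full 32-bit product 0FFFFFFFFh has a nonzero upper half, yet CF and OF stay clear because that upper half is only the sign extension of the lower half.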
  "div" performs an unsigned division of the accumulator by the operand.
The dividend (the accumulator) is twice the size of the divisor (the operand);
the quotient and remainder have the same size as the divisor. If the divisor
is a byte, the dividend is taken from the AX register, the quotient is stored
in AL and the remainder is stored in AH. If the divisor is a word, the upper
half of the dividend is taken from DX, the lower half of the dividend is taken
from AX, the quotient is stored in AX and the remainder is stored in DX. If the
divisor is a double word, the upper half of the dividend is taken from EDX,
the lower half of the dividend is taken from EAX, the quotient is stored in
EAX and the remainder is stored in EDX. Rules for the operand are the same as
for the "mul" instruction.
  "idiv" performs a signed division of the accumulator by the operand.
It uses the same registers as the "div" instruction, and the rules for
the operand are the same.
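The word form of "div" can be modelled in C. The dividend is assembled from DX:AX, and the quotient and remainder are written back. The real processor raises a divide error (interrupt 0) instead of storing a result when the divisor is zero or the quotient does not fit in 16 bits. This sketch returns 0 in those cases. `div16` is a hypothetical name.

```c
#include <stdint.h>

/* div r/m16: AX = DX:AX / src, DX = DX:AX % src.
   Returns 0 where the processor would raise a divide error. */
static int div16(uint16_t *ax, uint16_t *dx, uint16_t src) {
    uint32_t dividend = ((uint32_t)*dx << 16) | *ax;
    if (src == 0 || dividend / src > 0xFFFF)
        return 0;                               /* registers left unchanged */
    *ax = (uint16_t)(dividend / src);           /* quotient */
    *dx = (uint16_t)(dividend % src);           /* remainder */
    return 1;
}
```

With DX = 1 and AX = 0 (a dividend of 65536), dividing by 10 leaves 6553 in AX and 6 in DX. Dividing 100000h by 1 would need a 21-bit quotient, so it faults.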


2.1.4 Decimal arithmetic instructions

  Decimal arithmetic is performed by combining the binary arithmetic
instructions (already described in the prior section) with the decimal
arithmetic instructions. The decimal arithmetic instructions are used to
adjust the results of a previous binary arithmetic operation to produce a
valid packed or unpacked decimal result, or to adjust the inputs to a
subsequent binary arithmetic operation so the operation will produce a valid
packed or unpacked decimal result.
  "daa" adjusts the result of adding two valid packed decimal operands in
AL. "daa" must always follow the addition of two pairs of packed decimal
numbers (one digit in each half-byte) to obtain a pair of valid packed
decimal digits as results. The carry flag is set if a carry was needed.
This instruction has no operands.
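The pair "add al,bl" / "daa" can be sketched in C. The model follows the DAA algorithm in Intel's instruction set reference. AF is the carry out of the low half-byte produced by the preceding "add". The decision to adjust the high digit is made on the unadjusted AL and CF. `add_daa` is a hypothetical name.

```c
#include <stdint.h>

/* Models "add al,bl" followed by "daa" on two packed decimal bytes.
   Returns the adjusted AL and stores the decimal carry in *cf. */
static uint8_t add_daa(uint8_t al, uint8_t bl, int *cf) {
    /* the preceding binary "add": result, CF, and AF (carry out of bit 3) */
    unsigned sum = (unsigned)al + bl;
    uint8_t r = (uint8_t)sum;
    int carry_in = sum > 0xFF;
    int af = ((al & 0x0F) + (bl & 0x0F)) > 0x0F;

    /* "daa" */
    int adjust_high = r > 0x99 || carry_in;   /* decided before any fix-up */
    if ((r & 0x0F) > 9 || af)
        r = (uint8_t)(r + 0x06);              /* fix the low digit */
    if (adjust_high)
        r = (uint8_t)(r + 0x60);              /* fix the high digit */
    *cf = adjust_high;
    return r;
}
```

For example, 38h + 45h gives 7Dh in binary, which "daa" turns into 83h. 58h + 49h gives 07h with the carry flag set, that is, 107 in decimal.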
  "das" adjusts the result of subtracting two valid packed decimal operands
in AL. "das" must always follow the subtraction of one pair of packed decimal
numbers (one digit in each half-byte) from another to obtain a pair of valid
packed decimal digits as results. The carry flag is set if a borrow was
needed. This instruction has no operands.
  "aaa" changes the contents of register AL to a valid unpacked decimal
number, and zeroes the top four bits. "aaa" must always follow the addition
of two unpacked decimal operands in AL. The carry flag is set and AH is
incremented if a carry is necessary. This instruction has no operands.
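The pair "add al,bl" / "aaa" on unpacked digits can be modelled in C. AX is passed as one 16-bit value with AH in the upper byte. `add_aaa` is a hypothetical name. The model assumes digit inputs 0 to 9, where adding 6 to AL cannot overflow.

```c
#include <stdint.h>

/* Models "add al,bl" followed by "aaa" on unpacked decimal digits 0..9.
   Returns the adjusted AX and stores the carry in *cf. */
static uint16_t add_aaa(uint16_t ax, uint8_t digit, int *cf) {
    uint8_t al = (uint8_t)ax;
    uint8_t ah = (uint8_t)(ax >> 8);
    int af = ((al & 0x0F) + (digit & 0x0F)) > 0x0F;  /* carry out of bit 3 */
    al = (uint8_t)(al + digit);                      /* the preceding "add" */
    if ((al & 0x0F) > 9 || af) {
        al = (uint8_t)(al + 6);
        ah = (uint8_t)(ah + 1);                      /* AH is incremented */
        *cf = 1;
    } else {
        *cf = 0;
    }
    al &= 0x0F;                                      /* top four bits zeroed */
    return (uint16_t)((ah << 8) | al);
}
```

Adding 7 to 8 leaves AH = 1 and AL = 5, the two unpacked digits of 15.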
  "aas" changes the contents of register AL to a valid unpacked decimal
number, and zeroes the top four bits. "aas" must always follow the
subtraction of one unpacked decimal operand from another in AL. The carry flag
is set and AH is decremented if a borrow is necessary. This instruction has no
operands.
  "aam" corrects the result of a multiplication of two valid unpacked decimal
numbers. "aam" must always follow the multiplication of two decimal numbers
to produce a valid decimal result. The high order digit is left in AH, the
low order digit in AL. The generalized version of this instruction allows
adjustment of the contents of AX to create two unpacked digits of any
number base. The standard version of this instruction has no operands; the
generalized version has one operand, an immediate value specifying the
number base for the created digits.
  "aad" modifies the numerator in AH and AL to prepare for the division of two
valid unpacked decimal operands so that the quotient produced by the division
will be a valid unpacked decimal number. AH should contain the high order
digit and AL the low order digit. This instruction adjusts the value and
places the result in AL, while AH will contain zero. The generalized version
of this instruction allows adjustment of two unpacked digits of any number
base. Rules for the operand are the same as for the "aam" instruction.
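The arithmetic of "aam" and "aad" can be sketched in C, with the number base as a parameter. A base of 10 corresponds to the standard forms without operands. The names `aam` and `aad` here are hypothetical helpers. The caller must pass a nonzero base, because the processor raises a divide error for "aam" with a zero operand.

```c
#include <stdint.h>

/* aam base: AH = AL / base, AL = AL % base (base must be nonzero) */
static uint16_t aam(uint16_t ax, uint8_t base) {
    uint8_t al = (uint8_t)ax;
    return (uint16_t)(((al / base) << 8) | (al % base));
}

/* aad base: AL = AH * base + AL (truncated to 8 bits), AH = 0 */
static uint16_t aad(uint16_t ax, uint8_t base) {
    uint8_t al = (uint8_t)ax;
    uint8_t ah = (uint8_t)(ax >> 8);
    return (uint8_t)(ah * base + al);
}
```

After "mul" leaves 7 * 8 = 56 (38h) in AL, "aam" gives AH = 5 and AL = 6. "aad" turns 0506h back into 0038h, ready for a binary "div". With base 16, "aam" splits 38h into AH = 3 and AL = 8.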


2.1.5 Logical instructions

  "not" inverts the bits in the specified operand to form a one's complement
of the operand. It has no effect on the flags. Rules for the operand are the
same as for the "inc" instruction.
  "and", "or" and "xor" instructions perform the standard logical operations.
They update the SF, ZF and PF flags and clear the CF and OF flags. Rules for
the operands are the same as for the "add" instruction.
  "bt", "bts", "btr" and "btc" instructions operate on a single bit which can
be in memory or in a general register. The location of the bit is specified
as an offset from the low order end of the operand. The value of the offset
is taken from the second operand, which may be either an immediate byte or
a general register. These instructions first assign the value of the selected
bit to CF. "bt" instruction does nothing more, "bts" sets the selected bit to
1, "btr" resets the selected bit to 0, "btc" changes the bit to its
complement. The first operand can be word or double word.

    bt ax,15                ; test bit in register
    bts word [bx],15        ; test and set bit in memory
    btr ax,cx               ; test and reset bit in register
    btc word [bx],cx        ; test and complement bit in memory

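The register forms of the bit test instructions can be modelled in C on a 16-bit value. Each helper returns the old bit, which is what lands in CF. For a register operand the processor takes the offset modulo the operand size, so the model uses `bit % 16`. Memory operands with a register offset can reach bits outside the addressed word and are not modelled. The helper names are hypothetical.

```c
#include <stdint.h>

/* bt: CF = selected bit */
static int bt16(uint16_t v, unsigned bit) {
    return (v >> (bit % 16)) & 1;
}

/* bts: CF = selected bit, then set it to 1 */
static int bts16(uint16_t *v, unsigned bit) {
    int cf = bt16(*v, bit);
    *v = (uint16_t)(*v | (1u << (bit % 16)));
    return cf;
}

/* btr: CF = selected bit, then reset it to 0 */
static int btr16(uint16_t *v, unsigned bit) {
    int cf = bt16(*v, bit);
    *v = (uint16_t)(*v & ~(1u << (bit % 16)));
    return cf;
}

/* btc: CF = selected bit, then complement it */
static int btc16(uint16_t *v, unsigned bit) {
    int cf = bt16(*v, bit);
    *v = (uint16_t)(*v ^ (1u << (bit % 16)));
    return cf;
}
```

Starting from AX = 8000h, "bt ax,15" sets CF. "bts ax,0" then gives 8001h with CF clear, and "btr ax,15" gives 0001h with CF set.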
  "bsf" and "bsr" instructions scan a word or double word for the first set
bit and store the index of this bit into the destination operand, which must
be a general register. The bit string being scanned is specified by the source
operand, which may be either a general register or memory. The ZF flag is set
if the entire string is zero (no set bits are found); otherwise it is cleared.
If no set bit is found, the value of the destination register is undefined.
"bsf" scans from low order to high order (starting from bit index zero). "bsr"
scans from high order to low order (starting from bit index 15 of a word or
index 31 of a double word).

    bsf ax,bx               ; scan register forward
    bsr ax,[si]             ; scan memory reverse

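The two scan directions can be modelled in C on a double word. Each helper returns the resulting ZF. When the source is zero, the helpers leave `*index` untouched, because the real destination register is undefined in that case. `bsf32` and `bsr32` are hypothetical names.

```c
#include <stdint.h>

/* bsf: scan upward from bit 0; returns ZF */
static int bsf32(uint32_t src, unsigned *index) {
    if (src == 0)
        return 1;                      /* no set bit: ZF = 1 */
    unsigned i = 0;
    while (!((src >> i) & 1))
        i++;
    *index = i;
    return 0;
}

/* bsr: scan downward from bit 31; returns ZF */
static int bsr32(uint32_t src, unsigned *index) {
    if (src == 0)
        return 1;
    unsigned i = 31;
    while (!((src >> i) & 1))
        i--;
    *index = i;
    return 0;
}
```

For 0110h, "bsf" finds bit 4 and "bsr" finds bit 8.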
  "shl" shifts the destination operand left by the number of bits specified
in the second operand. The destination operand can be byte, word, or double
word general register or memory. The second operand can be an immediate value
or the CL register. The processor shifts zeros in from the right (low order)
side of the operand as bits exit from the left side. The last bit that exited
is stored in CF. "sal" is a synonym for "shl".
808 | 829 | ||
809 | shl al,1 ; shift register left by one bit |
830 | shl al,1 ; shift register left by one bit |
810 | shl byte [bx],1 ; shift memory left by one bit |
831 | shl byte [bx],1 ; shift memory left by one bit |
811 | shl ax,cl ; shift register left by count from cl |
832 | shl ax,cl ; shift register left by count from cl |
812 | shl word [bx],cl ; shift memory left by count from cl |
833 | shl word [bx],cl ; shift memory left by count from cl |
813 | 834 | ||
"shr" and "sar" shift the destination operand right by the number of bits
specified in the second operand. Rules for operands are the same as for the
"shl" instruction. "shr" shifts zeros in from the left side of the operand as
bits exit from the right side. The last bit that exited is stored in CF.
"sar" preserves the sign of the operand by shifting in zeros on the left side
if the value is positive or by shifting in ones if the value is negative.

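The two right shifts differ only in the bits that fill in from the left. This hypothetical C model works on a byte and again assumes counts from 1 to 7. Each function returns the shifted value and stores the last bit shifted out in *cf.

```c
#include <stdint.h>

/* Hypothetical model of "shr r/m8,count" (counts 1..7). */
uint8_t shr8(uint8_t value, unsigned count, int *cf)
{
    *cf = (value >> (count - 1)) & 1;   /* last bit to exit on the right */
    return (uint8_t)(value >> count);   /* zeros enter on the left */
}

/* Hypothetical model of "sar r/m8,count" (counts 1..7). */
uint8_t sar8(uint8_t value, unsigned count, int *cf)
{
    *cf = (value >> (count - 1)) & 1;
    uint8_t result = (uint8_t)(value >> count);
    if (value & 0x80)                   /* negative: fill with ones */
        result |= (uint8_t)(0xFF << (8 - count));
    return result;
}
```

With 0F0h (-16 as a signed byte) shifted by 2, "shr" gives 3Ch while "sar" gives 0FCh (-4).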
"shld" shifts bits of the destination operand to the left by the number
of bits specified in the third operand, while shifting high order bits from
the source operand into the destination operand on the right. The source
operand remains unmodified. The destination operand can be a word or double
word general register or memory, the source operand must be a general
register, and the third operand can be an immediate value or the CL register.

    shld ax,bx,1       ; shift register left by one bit
    shld [di],bx,1     ; shift memory left by one bit
    shld ax,bx,cl      ; shift register left by count from cl
    shld [di],bx,cl    ; shift memory left by count from cl

"shrd" shifts bits of the destination operand to the right, while shifting
low order bits from the source operand into the destination operand on the
left. The source operand remains unmodified. Rules for operands are the same
as for the "shld" instruction.

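Here is the mirror-image C model for "shrd" on words. It makes the same assumptions: counts from 1 to 15, and the name shrd16 is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of "shrd r/m16,r16,count" for counts 1..15. */
uint16_t shrd16(uint16_t dest, uint16_t src, unsigned count, int *cf)
{
    *cf = (dest >> (count - 1)) & 1;    /* last bit to exit on the right */
    return (uint16_t)((dest >> count) | (src << (16 - count)));
}
```

Here 1234h shifted right by 4 with source 0ABCDh gives 0D123h: the low nibble of the source enters at the top.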
"rol" and "rcl" rotate the byte, word or double word destination operand
left by the number of bits specified in the second operand. For each rotation
specified, the high order bit that exits from the left of the operand returns
at the right to become the new low order bit. "rcl" additionally puts in CF
each high order bit that exits from the left side of the operand before it
returns to the operand as the low order bit on the next rotation cycle. Rules
for operands are the same as for the "shl" instruction.

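"rcl" behaves like a rotation of a 9-bit value made of CF and the byte operand. This C sketch illustrates that by applying single-bit rotations in a loop. The names are hypothetical, and the flags that "rol" itself affects are not modeled. The count is assumed to be already within the range the processor accepts.

```c
#include <stdint.h>

/* Hypothetical model of "rol r/m8,count": bit 7 re-enters at bit 0. */
uint8_t rol8(uint8_t value, unsigned count)
{
    for (unsigned i = 0; i < count; i++)
        value = (uint8_t)((value << 1) | (value >> 7));
    return value;
}

/* Hypothetical model of "rcl r/m8,count": bit 7 goes to CF and the
   previous CF enters at bit 0 on each step. */
uint8_t rcl8(uint8_t value, unsigned count, int *cf)
{
    for (unsigned i = 0; i < count; i++) {
        int out = value >> 7;           /* bit leaving on the left */
        value = (uint8_t)((value << 1) | (*cf & 1));
        *cf = out;
    }
    return value;
}
```

Nine single steps of "rcl" on a byte bring both the operand and CF back to their starting values.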
"ror" and "rcr" rotate the byte, word or double word destination operand
right by the number of bits specified in the second operand. For each rotation
specified, the low order bit that exits from the right of the operand returns
at the left to become the new high order bit. "rcr" additionally puts in CF
each low order bit that exits from the right side of the operand before it
returns to the operand as the high order bit on the next rotation cycle.
Rules for operands are the same as for the "shl" instruction.

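The right rotations follow the same approach. This hypothetical C model again performs one-bit steps and, for "rcr", passes each low order bit through CF.

```c
#include <stdint.h>

/* Hypothetical model of "ror r/m8,count": bit 0 re-enters at bit 7. */
uint8_t ror8(uint8_t value, unsigned count)
{
    for (unsigned i = 0; i < count; i++)
        value = (uint8_t)((value >> 1) | (value << 7));
    return value;
}

/* Hypothetical model of "rcr r/m8,count": bit 0 goes to CF and the
   previous CF enters at bit 7 on each step. */
uint8_t rcr8(uint8_t value, unsigned count, int *cf)
{
    for (unsigned i = 0; i < count; i++) {
        int out = value & 1;            /* bit leaving on the right */
        value = (uint8_t)((value >> 1) | ((*cf & 1) << 7));
        *cf = out;
    }
    return value;
}
```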
"test" performs the same action as the "and" instruction, but it does not
alter the destination operand; it only updates the flags. Rules for the
operands are the same as for the "and" instruction.

"bswap" reverses the byte order of a 32-bit general register: bits 0 through
7 are swapped with bits 24 through 31, and bits 8 through 15 are swapped with
bits 16 through 23. This instruction is provided for converting little-endian
values to big-endian format and vice versa.

    bswap edx          ; swap bytes in register

860 | 881 | ||
861 | 2.1.6 Control transfer instructions |
882 | 2.1.6 Control transfer instructions |
862 | 883 | ||
863 | "jmp" unconditionally transfers control to the target location. The |
884 | "jmp" unconditionally transfers control to the target location. The |
864 | destination address can be specified directly within the instruction or |
885 | destination address can be specified directly within the instruction or |
865 | indirectly through a register or memory, the acceptable size of this address |
886 | indirectly through a register or memory, the acceptable size of this address |
866 | depends on whether the jump is near or far (it can be specified by preceding |
887 | depends on whether the jump is near or far (it can be specified by preceding |
867 | the operand with "near" or "far" operator) and whether the instruction is |
888 | the operand with "near" or "far" operator) and whether the instruction is |
868 | 16-bit or 32-bit. Operand for near jump should be "word" size for 16-bit |
889 | 16-bit or 32-bit. Operand for near jump should be "word" size for 16-bit |
869 | instruction or the "dword" size for 32-bit instruction. Operand for far jump |
890 | instruction or the "dword" size for 32-bit instruction. Operand for far jump |
870 | should be "dword" size for 16-bit instruction or "pword" size for 32-bit |
891 | should be "dword" size for 16-bit instruction or "pword" size for 32-bit |
871 | instruction. A direct "jmp" instruction includes the destination address as |
892 | instruction. A direct "jmp" instruction includes the destination address as |
872 | part of the instruction (and can be preceded by "short", "near" or "far" |
893 | part of the instruction (and can be preceded by "short", "near" or "far" |
873 | operator), the operand specifying address should be the numerical expression |
894 | operator), the operand specifying address should be the numerical expression |
874 | for near or short jump, or two numerical expressions separated with colon for |
895 | for near or short jump, or two numerical expressions separated with colon for |
875 | far jump, the first specifies selector of segment, the second is the offset |
896 | far jump, the first specifies selector of segment, the second is the offset |
876 | within segment. The "pword" operator can be used to force the 32-bit far call, |
897 | within segment. The "pword" operator can be used to force the 32-bit far call, |
877 | and "dword" to force the 16-bit far call. An indirect "jmp" instruction |
898 | and "dword" to force the 16-bit far call. An indirect "jmp" instruction |
878 | obtains the destination address indirectly through a register or a pointer |
899 | obtains the destination address indirectly through a register or a pointer |
879 | variable, the operand should be general register or memory. See also 1.2.5 for |
900 | variable, the operand should be general register or memory. See also 1.2.5 for |
880 | some more details. |
901 | some more details. |
881 | 902 | ||
882 | jmp 100h ; direct near jump |
903 | jmp 100h ; direct near jump |
883 | jmp 0FFFFh:0 ; direct far jump |
904 | jmp 0FFFFh:0 ; direct far jump |
884 | jmp ax ; indirect near jump |
905 | jmp ax ; indirect near jump |
885 | jmp pword [ebx] ; indirect far jump |
906 | jmp pword [ebx] ; indirect far jump |
886 | 907 | ||
"call" transfers control to the procedure, saving on the stack the address
of the instruction following the "call" for later use by a "ret" (return)
instruction. Rules for the operands are the same as for the "jmp" instruction,
but "call" has no short variant of the direct instruction, and thus it is not
optimized.

"ret", "retn" and "retf" instructions terminate the execution of a procedure
and transfer control back to the program that originally invoked the
procedure, using the address that was stored on the stack by the "call"
instruction. "ret" is equivalent to "retn", which returns from a procedure
that was executed using a near call, while "retf" returns from a procedure
that was executed using a far call. These instructions default to the size
of address appropriate for the current code setting, but the size of address
can be forced to 16-bit by using the "retw", "retnw" and "retfw" mnemonics,
and to 32-bit by using the "retd", "retnd" and "retfd" mnemonics. All these
instructions may optionally specify an immediate operand. By adding this
constant to the stack pointer, they effectively remove any arguments that the
calling program pushed on the stack before the execution of the "call"
instruction.

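A small C model of the 32-bit near form, "ret n", shows the effect of the immediate operand. The Stack type, the helpers, and the flat array standing in for stack memory are assumptions made for illustration. The return address is popped first, and only then is n added to ESP.

```c
#include <stdint.h>

/* Hypothetical stack model: mem[] stands in for stack memory and esp is
   a byte offset into it (always a multiple of 4 here). */
typedef struct {
    uint32_t mem[64];
    uint32_t esp;
} Stack;

static void push32(Stack *s, uint32_t value)
{
    s->esp -= 4;                        /* the stack grows downward */
    s->mem[s->esp / 4] = value;
}

static uint32_t pop32(Stack *s)
{
    uint32_t value = s->mem[s->esp / 4];
    s->esp += 4;
    return value;
}

/* Model of a 32-bit near "ret n": pop the return address, then add n
   to ESP so that n bytes of caller-pushed arguments are discarded. */
uint32_t ret_n(Stack *s, uint16_t n)
{
    uint32_t return_address = pop32(s);
    s->esp += n;
    return return_address;
}
```

If a caller pushes two dword arguments and then executes "call", a "ret 8" in the procedure leaves ESP exactly where it was before the arguments were pushed.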
"iret" returns control to an interrupted procedure. It differs from "ret" in
that it also pops the flags from the stack into the flags register. The flags
are stored on the stack by the interrupt mechanism. It defaults to the size of
return address appropriate for the current code setting, but it can be forced
to use a 16-bit or 32-bit address by using the "iretw" or "iretd" mnemonic.

The conditional transfer instructions are jumps that may or may not transfer
control, depending on the state of the CPU flags when the instruction
executes. The mnemonics for conditional jumps may be obtained by attaching
the condition mnemonic (see table 2.1) to the "j" mnemonic; for example, the
"jc" instruction transfers control when the CF flag is set. The conditional
jumps can be short or near, are direct only, and can be optimized (see
1.2.5). The operand should be an immediate value specifying the target
address.

Table 2.1 Conditions
/-----------------------------------------------------------\
| Mnemonic | Condition tested      | Description            |
|==========|=======================|========================|
| o        | OF = 1                | overflow               |
|----------|-----------------------|------------------------|
| no       | OF = 0                | not overflow           |
|----------|-----------------------|------------------------|
| c        |                       | carry                  |
| b        | CF = 1                | below                  |
| nae      |                       | not above nor equal    |
|----------|-----------------------|------------------------|
| nc       |                       | not carry              |
| ae       | CF = 0                | above or equal         |
| nb       |                       | not below              |
|----------|-----------------------|------------------------|
| e        | ZF = 1                | equal                  |
| z        |                       | zero                   |
|----------|-----------------------|------------------------|
| ne       | ZF = 0                | not equal              |
| nz       |                       | not zero               |
|----------|-----------------------|------------------------|
| be       | CF or ZF = 1          | below or equal         |
| na       |                       | not above              |
|----------|-----------------------|------------------------|
| a        | CF or ZF = 0          | above                  |
| nbe      |                       | not below nor equal    |
|----------|-----------------------|------------------------|
| s        | SF = 1                | sign                   |
|----------|-----------------------|------------------------|
| ns       | SF = 0                | not sign               |
|----------|-----------------------|------------------------|
| p        | PF = 1                | parity                 |
| pe       |                       | parity even            |
|----------|-----------------------|------------------------|
| np       | PF = 0                | not parity             |
| po       |                       | parity odd             |
|----------|-----------------------|------------------------|
| l        | SF xor OF = 1         | less                   |
| nge      |                       | not greater nor equal  |
|----------|-----------------------|------------------------|
| ge       | SF xor OF = 0         | greater or equal       |
| nl       |                       | not less               |
|----------|-----------------------|------------------------|
| le       | (SF xor OF) or ZF = 1 | less or equal          |
| ng       |                       | not greater            |
|----------|-----------------------|------------------------|
| g        | (SF xor OF) or ZF = 0 | greater                |
| nle      |                       | not less nor equal     |
\-----------------------------------------------------------/

The "loop" instructions are conditional jumps that use a value placed in
CX (or ECX) to specify the number of repetitions of a software loop. All
"loop" instructions automatically decrement CX (or ECX) and terminate the
loop (don't transfer control) when CX (or ECX) becomes zero. They use CX or
ECX depending on whether the current code setting is 16-bit or 32-bit, but
they can be forced to use CX with the "loopw" mnemonic or to use ECX with the
"loopd" mnemonic. "loope" and "loopz" are synonyms for the same instruction,
which acts as the standard "loop", but also terminates the loop when the ZF
flag is cleared. The "loopew" and "loopzw" mnemonics force it to use the CX
register, while "looped" and "loopzd" force it to use the ECX register.
"loopne" and "loopnz" are synonyms for the same instruction, which acts as
the standard "loop", but also terminates the loop when the ZF flag is set.
The "loopnew" and "loopnzw" mnemonics force it to use the CX register, while
"loopned" and "loopnzd" force it to use the ECX register. Every "loop"
instruction needs an operand that is an immediate value specifying the target
address, and it can only be a short jump (in the range of 128 bytes backward
and 127 bytes forward from the address of the instruction following the
"loop" instruction).

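The decision each "loop" variant makes fits in a few lines of C. This hypothetical model follows the processor's definition: the counter is decremented first, and then "loope" keeps looping only while ZF is set and "loopne" only while ZF is clear. It also includes a helper that checks whether a target is within the short jump range. As on the processor, a counter that starts at zero wraps around instead of stopping.

```c
#include <stdint.h>

typedef enum { LOOP_PLAIN, LOOP_E, LOOP_NE } LoopKind;

/* Hypothetical model: returns 1 if the loop instruction jumps back. */
int loop_taken(LoopKind kind, uint32_t *ecx, int zf)
{
    *ecx -= 1;                  /* decrement first (0 wraps around) */
    if (*ecx == 0)
        return 0;               /* counter exhausted: fall through */
    if (kind == LOOP_E)
        return zf != 0;         /* loope/loopz: continue while ZF = 1 */
    if (kind == LOOP_NE)
        return zf == 0;         /* loopne/loopnz: continue while ZF = 0 */
    return 1;
}

/* A short jump stores target - next_instruction as a signed byte. */
int fits_short(uint32_t next_instruction, uint32_t target)
{
    int64_t distance = (int64_t)target - (int64_t)next_instruction;
    return distance >= -128 && distance <= 127;
}
```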
"jcxz" branches to the label specified in the instruction if it finds a
value of zero in CX; "jecxz" does the same, but checks the value of ECX
instead of CX. Rules for the operands are the same as for the "loop"
instruction.

"int" activates the interrupt service routine that corresponds to the
number specified as an operand to the instruction; the number should be in
the range from 0 to 255. The interrupt service routine terminates with an
"iret" instruction that returns control to the instruction that follows
"int".

The "int3" mnemonic codes the short (one byte) trap that invokes interrupt 3.

The "into" instruction invokes interrupt 4 if the OF flag is set.

"bound" verifies that the signed value contained in the specified register
lies within specified limits. An interrupt 5 occurs if the value contained in
the register is less than the lower bound or greater than the upper bound. It
needs two operands: the first operand specifies the register being tested,
and the second operand should be the memory address of the two signed limit
values. The operands can be "word" or "dword" in size.

    bound ax,[bx]      ; check word for bounds
    bound eax,[esi]    ; check double word for bounds

1007 | 1028 | ||
1008 | 2.1.7 I/O instructions |
1029 | 2.1.7 I/O instructions |
1009 | 1030 | ||
1010 | "in" transfers a byte, word, or double word from an input port to AL, AX, |
1031 | "in" transfers a byte, word, or double word from an input port to AL, AX, |
1011 | or EAX. I/O ports can be addressed either directly, with the immediate byte |
1032 | or EAX. I/O ports can be addressed either directly, with the immediate byte |
1012 | value coded in instruction, or indirectly via the DX register. The destination |
1033 | value coded in instruction, or indirectly via the DX register. The destination |
1013 | operand should be AL, AX, or EAX register. The source operand should be an |
1034 | operand should be AL, AX, or EAX register. The source operand should be an |
1014 | immediate value in range from 0 to 255, or DX register. |
1035 | immediate value in range from 0 to 255, or DX register. |
1015 | 1036 | ||
1016 | in al,20h ; input byte from port 20h |
1037 | in al,20h ; input byte from port 20h |
1017 | in ax,dx ; input word from port addressed by dx |
1038 | in ax,dx ; input word from port addressed by dx |
1018 | 1039 | ||
1019 | "out" transfers a byte, word, or double word to an output port from AL, AX, |
1040 | "out" transfers a byte, word, or double word to an output port from AL, AX, |
1020 | or EAX. The program can specify the number of the port using the same methods |
1041 | or EAX. The program can specify the number of the port using the same methods |
1021 | as the "in" instruction. The destination operand should be an immediate value |
1042 | as the "in" instruction. The destination operand should be an immediate value |
1022 | in range from 0 to 255, or DX register. The source operand should be AL, AX, |
1043 | in range from 0 to 255, or DX register. The source operand should be AL, AX, |
1023 | or EAX register. |
1044 | or EAX register. |
1024 | 1045 | ||
1025 | out 20h,ax ; output word to port 20h |
1046 | out 20h,ax ; output word to port 20h |
1026 | out dx,al ; output byte to port addressed by dx |
1047 | out dx,al ; output byte to port addressed by dx |
1027 | 1048 | ||
1028 | 1049 | ||
1029 | 2.1.8 Strings operations |
1050 | 2.1.8 Strings operations |
1030 | 1051 | ||
1031 | The string operations operate on one element of a string. A string element |
1052 | The string operations operate on one element of a string. A string element |
1032 | may be a byte, a word, or a double word. The string elements are addressed by |
1053 | may be a byte, a word, or a double word. The string elements are addressed by |
1033 | SI and DI (or ESI and EDI) registers. After every string operation SI and/or |
1054 | SI and DI (or ESI and EDI) registers. After every string operation SI and/or |
1034 | DI (or ESI and/or EDI) are automatically updated to point to the next element |
1055 | DI (or ESI and/or EDI) are automatically updated to point to the next element |
1035 | of the string. If DF (direction flag) is zero, the index registers are |
1056 | of the string. If DF (direction flag) is zero, the index registers are |
1036 | incremented, if DF is one, they are decremented. The amount of the increment |
1057 | incremented, if DF is one, they are decremented. The amount of the increment |
1037 | or decrement is 1, 2, or 4 depending on the size of the string element. Every |
1058 | or decrement is 1, 2, or 4 depending on the size of the string element. Every |
1038 | string operation instruction has short forms which have no operands and use |
1059 | string operation instruction has short forms which have no operands and use |
1039 | SI and/or DI when the code type is 16-bit, and ESI and/or EDI when the code |
1060 | SI and/or DI when the code type is 16-bit, and ESI and/or EDI when the code |
1040 | type is 32-bit. SI and ESI by default address data in the segment selected |
1061 | type is 32-bit. SI and ESI by default address data in the segment selected |
1041 | by DS, DI and EDI always address data in the segment selected by ES. Short |
1062 | by DS, DI and EDI always address data in the segment selected by ES. Short |
1042 | form is obtained by attaching to the mnemonic of string operation letter |
1063 | form is obtained by attaching to the mnemonic of string operation letter |
1043 | specifying the size of string element, it should be "b" for byte element, |
1064 | specifying the size of string element, it should be "b" for byte element, |
1044 | "w" for word element, and "d" for double word element. Full form of string |
1065 | "w" for word element, and "d" for double word element. Full form of string |
1045 | operation needs operands providing the size operator and the memory addresses, |
1066 | operation needs operands providing the size operator and the memory addresses, |
1046 | which can be SI or ESI with any segment prefix, DI or EDI always with ES |
1067 | which can be SI or ESI with any segment prefix, DI or EDI always with ES |
1047 | segment prefix. |
1068 | segment prefix. |
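The index update rule can be modeled directly. In this hypothetical C sketch, step_index advances an index by the element size according to DF. movsb_times applies that rule while copying bytes, roughly what a sequence of "movsb" instructions does. Plain arrays stand in for the memory addressed through DS and ES.

```c
#include <stdint.h>

/* Hypothetical model of the index update after one string element:
   size is 1, 2 or 4; DF = 0 increments, DF = 1 decrements. */
uint32_t step_index(uint32_t index, uint32_t size, int df)
{
    return df ? index - size : index + size;
}

/* Hypothetical model of executing "movsb" count times. */
void movsb_times(uint8_t *es_mem, uint32_t *di,
                 const uint8_t *ds_mem, uint32_t *si,
                 unsigned count, int df)
{
    while (count--) {
        es_mem[*di] = ds_mem[*si];      /* copy one byte element */
        *si = step_index(*si, 1, df);   /* then advance both indexes */
        *di = step_index(*di, 1, df);
    }
}
```

With DF set, the same loop starts at the highest element and works downward.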
1048 | "movs" transfers the string element pointed to by SI (or ESI) to the |
1069 | "movs" transfers the string element pointed to by SI (or ESI) to the |
1049 | location pointed to by DI (or EDI). Size of operands can be byte, word, or |
1070 | location pointed to by DI (or EDI). Size of operands can be byte, word, or |
1050 | double word. The destination operand should be memory addressed by DI or EDI, |
1071 | double word. The destination operand should be memory addressed by DI or EDI, |
1051 | the source operand should be memory addressed by SI or ESI with any segment |
1072 | the source operand should be memory addressed by SI or ESI with any segment |
1052 | prefix. |
1073 | prefix. |
1053 | 1074 | ||
1054 | movs byte [di],[si] ; transfer byte |
1075 | movs byte [di],[si] ; transfer byte |
1055 | movs word [es:di],[ss:si] ; transfer word |
1076 | movs word [es:di],[ss:si] ; transfer word |
1056 | movsd ; transfer double word |
1077 | movsd ; transfer double word |

  "cmps" subtracts the destination string element from the source string
element and updates the flags AF, SF, ZF, PF, CF and OF, but it does not
change either of the compared elements. If the string elements are equal, ZF
is set, otherwise it is cleared. The first operand for this instruction should
be the source string element addressed by SI or ESI with any segment prefix,
the second operand should be the destination string element addressed by DI
or EDI.

    cmpsb                       ; compare bytes
    cmps word [ds:si],[es:di]   ; compare words
    cmps dword [fs:esi],[edi]   ; compare double words
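
The C model below makes the flag results of "cmps" concrete. It uses a hypothetical helper, `cmps_byte`, which is not part of fasm. The helper computes ZF, CF, SF and OF for a byte comparison from the subtraction source minus destination. AF and PF are left out for brevity.

```c
#include <stdint.h>

/* Flags produced by the modelled comparison (1 = set, 0 = cleared). */
struct cmp_flags { int zf, cf, sf, of; };

/* Hypothetical model of the flags set by "cmpsb": the byte at DI (dst)
   is subtracted from the byte at SI (src); only the flags are kept,
   neither byte is modified. */
static struct cmp_flags cmps_byte(uint8_t src, uint8_t dst)
{
    uint8_t diff = (uint8_t)(src - dst);
    struct cmp_flags f;
    f.zf = (diff == 0);           /* elements are equal */
    f.cf = (src < dst);           /* unsigned borrow */
    f.sf = (diff & 0x80) != 0;    /* sign bit of the result */
    /* signed overflow: the operands have different signs and the
       result's sign differs from the sign of the source */
    f.of = (((src ^ dst) & (src ^ diff)) & 0x80) != 0;
    return f;
}
```

CF reflects an unsigned borrow, while SF and OF together reflect the signed result. For that reason, "cmps" is usually followed by an unsigned condition such as "b" or "a", or by a signed one such as "l" or "g", depending on how the elements are interpreted.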

  "scas" subtracts the destination string element from AL, AX, or EAX
(depending on the size of the string element) and updates the flags AF, SF,
ZF, PF, CF and OF. If the values are equal, ZF is set, otherwise it is
cleared. The operand should be the destination string element addressed by DI
or EDI.

    scas byte [es:di]           ; scan byte
    scasw                       ; scan word
    scas dword [es:edi]         ; scan double word
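
Combined with the "repne" prefix described later in this section, "scas" searches a string for the value in AL. The following C model of "repne scasb" assumes DF is cleared and memory is a flat byte array. `repne_scasb` and its parameters are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of "repne scasb" with DF = 0. Each step compares
   AL with the byte at DI (setting ZF when they are equal), advances DI
   and decrements the count register; the loop ends when ZF becomes 1
   or the count reaches zero. *zf is left untouched when the count is
   zero on entry, because no comparison is made then. */
static size_t repne_scasb(const uint8_t *mem, size_t di, uint8_t al,
                          uint32_t *count, int *zf)
{
    while (*count != 0) {
        *zf = (mem[di] == al);   /* scasb: compare AL with [DI] */
        di++;                    /* DF = 0, so DI moves forward */
        (*count)--;              /* the prefix decrements CX/ECX */
        if (*zf)                 /* repne stops once ZF is set */
            break;
    }
    return di;                   /* final value of DI */
}
```

Starting with ECX = 0FFFFFFFFh and AL = 0 turns this into the usual string length idiom. After the scan, the bitwise NOT of ECX minus one is the number of bytes before the terminating zero.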

  "stos" places the value of AL, AX, or EAX into the destination string
element. Rules for the operand are the same as for the "scas" instruction.
  "lods" places the source string element into AL, AX, or EAX. The operand
should be the source string element addressed by SI or ESI with any segment
prefix.

    lods byte [ds:si]           ; load byte
    lods word [cs:si]           ; load word
    lodsd                       ; load double word
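
"lods" and "stos" are often paired in a loop that loads an element, modifies it in the accumulator and stores it elsewhere. A C sketch of such a loop with DF cleared follows. `upcase_copy` is a hypothetical name, and the ASCII uppercase conversion is only an example of a loop body.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a "lodsb" / "stosb" loop with DF = 0: each
   lodsb loads [SI] into AL and advances SI, the loop body modifies
   AL, and stosb stores AL at [DI] and advances DI. Here the body
   converts lowercase ASCII letters to uppercase. */
static void upcase_copy(uint8_t *mem, size_t si, size_t di, size_t n)
{
    while (n--) {
        uint8_t al = mem[si++];        /* lodsb */
        if (al >= 'a' && al <= 'z')    /* loop body working on AL */
            al = (uint8_t)(al - ('a' - 'A'));
        mem[di++] = al;                /* stosb */
    }
}
```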

  "ins" transfers a byte, word, or double word from an input port addressed
by the DX register to the destination string element. The destination operand
should be memory addressed by DI or EDI, the source operand should be the DX
register.

    insb                        ; input byte
    ins word [es:di],dx         ; input word
    ins dword [edi],dx          ; input double word

  "outs" transfers the source string element to an output port addressed by
the DX register. The destination operand should be the DX register and the
source operand should be memory addressed by SI or ESI with any segment
prefix.

    outs dx,byte [si]           ; output byte
    outsw                       ; output word
    outs dx,dword [gs:esi]      ; output double word

  The repeat prefixes "rep", "repe"/"repz", and "repne"/"repnz" specify
repeated string operation. When a string operation instruction has a repeat
prefix, the operation is executed repeatedly, each time using a different
element of the string. The repetition terminates when one of the conditions
specified by the prefix is satisfied. All three prefixes automatically
decrease the CX or ECX register (depending on whether the string operation
instruction uses 16-bit or 32-bit addressing) after each operation and repeat
the associated operation until CX or ECX is zero; if the count is zero to
begin with, the operation is not executed at all. "repe"/"repz" and
"repne"/"repnz" are used exclusively with the "scas" and "cmps" instructions
(described above). When these prefixes are used, repetition of the next
instruction also depends on the zero flag (ZF): "repe" and "repz" terminate
the execution when ZF is zero, "repne" and "repnz" terminate the execution
when ZF is set.

    rep movsd                   ; transfer multiple double words
    repe cmpsb                  ; compare bytes until not equal
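
The termination rule of "repe cmpsb" in the example above can be modelled in C as follows. This is a sketch that assumes DF = 0 and a flat byte array. `repe_cmpsb` is a hypothetical helper that returns the final value of the count register.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of "repe cmpsb" with DF = 0: compare the bytes
   at SI and DI, advance both, and decrement the count after each
   comparison; stop when the count reaches zero or a comparison finds
   different bytes (ZF = 0). *zf is left unchanged when the count is
   zero from the start, as no comparison is made. */
static uint32_t repe_cmpsb(const uint8_t *mem, size_t *si, size_t *di,
                           uint32_t count, int *zf)
{
    while (count != 0) {
        *zf = (mem[*si] == mem[*di]);  /* cmpsb sets ZF when equal */
        (*si)++;
        (*di)++;
        count--;
        if (!*zf)                      /* repe stops when ZF is 0 */
            break;
    }
    return count;                      /* final value of CX/ECX */
}
```

When the loop stops on a mismatch, SI and DI already point past the differing bytes. With DF = 0, the first mismatching bytes are therefore at SI-1 and DI-1.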


2.1.9 Flag control instructions

  The flag control instructions provide a method for directly changing the
state of bits in the flag register. All instructions described in this
section have no operands.
  "stc" sets the CF (carry flag) to 1, "clc" zeroes the CF, "cmc" changes the
CF to its complement. "std" sets the DF (direction flag) to 1, "cld" zeroes
the DF, "sti" sets the IF (interrupt flag) to 1 and therefore enables the
interrupts, "cli" zeroes the IF and therefore disables the interrupts.
  "lahf" copies SF, ZF, AF, PF, and CF to bits 7, 6, 4, 2, and 0 of the
AH register. The contents of the remaining bits are undefined. The flags
remain unaffected.
  "sahf" transfers bits 7, 6, 4, 2, and 0 from the AH register into SF, ZF,
AF, PF, and CF.
  "pushf" decrements ESP by two or four and stores the low word or double
word of the flags register at the top of the stack; the size of the stored
data depends on the current code setting. The "pushfw" variant forces storing
the word and "pushfd" forces storing the double word.
  "popf" transfers specific bits from the word or double word at the top of
the stack into the flags register, then increments ESP by two or four,
depending on the current code setting. The "popfw" variant forces restoring
from the word and "popfd" forces restoring from the double word.
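
The bit layout used by "lahf" and "sahf" can be expressed as a mask over the flags register. In the C sketch below, `lahf` and `sahf` are hypothetical model functions. The model clears the AH bits that the manual leaves undefined, and real code should not rely on those bits.

```c
#include <stdint.h>

/* Bit positions of the status flags in the flags register; "lahf"
   and "sahf" use the same positions in AH. */
enum { CF_BIT = 0, PF_BIT = 2, AF_BIT = 4, ZF_BIT = 6, SF_BIT = 7 };

#define LAHF_MASK ((1u << SF_BIT) | (1u << ZF_BIT) | (1u << AF_BIT) | \
                   (1u << PF_BIT) | (1u << CF_BIT))

/* Hypothetical model of "lahf": copy SF, ZF, AF, PF and CF into AH.
   The other AH bits are simply cleared in this model. */
static uint8_t lahf(uint32_t flags)
{
    return (uint8_t)(flags & LAHF_MASK);
}

/* Hypothetical model of "sahf": load those five flags from AH and
   leave every other bit of the flags register unchanged. */
static uint32_t sahf(uint32_t flags, uint8_t ah)
{
    return (flags & ~(uint32_t)LAHF_MASK) | (ah & LAHF_MASK);
}
```

The mask evaluates to 0D5h, which selects exactly bits 7, 6, 4, 2 and 0.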


2.1.10 Conditional operations

  The instructions obtained by attaching the condition mnemonic (see table
2.1) to the "set" mnemonic set a byte to one if the condition is true and set
the byte to zero otherwise. The operand should be an 8-bit general register
or a byte in memory.

    setne al                    ; set al if zero flag cleared
    seto byte [bx]              ; set byte if overflow

  "salc" instruction sets all bits of the AL register when the carry flag is
set and zeroes the AL register otherwise. This instruction has no operands.
  The instructions obtained by attaching the condition mnemonic to the "cmov"
mnemonic transfer the word or double word from the general register or memory
to the general register only when the condition is true. The destination
operand should be a general register, the source operand can be a general
register or memory.

    cmove ax,bx                 ; move when zero flag set
    cmovnc eax,[ebx]            ; move when carry flag cleared
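
The following C sketch models "salc" and a common use of "cmov": computing a signed maximum without a branch, using "cmp eax,ebx" followed by "cmovl eax,ebx". The helpers `salc` and `max_cmovl` are hypothetical. The model assumes the standard meaning of the "l" condition, which is SF not equal to OF after the comparison.

```c
#include <stdint.h>

/* Hypothetical model of "salc": AL = 0FFh if CF is set, else 0. */
static uint8_t salc(int cf)
{
    return cf ? 0xFF : 0x00;
}

/* Hypothetical model of the branchless maximum
       cmp   eax,ebx
       cmovl eax,ebx
   "cmp" sets SF and OF from eax - ebx; the "l" (less) condition is
   SF != OF, and only then does cmovl copy ebx into eax. */
static int32_t max_cmovl(int32_t eax, int32_t ebx)
{
    uint32_t a = (uint32_t)eax, b = (uint32_t)ebx;
    uint32_t r = a - b;                           /* cmp eax,ebx */
    int sf = (int)((r >> 31) & 1u);
    int of = (int)((((a ^ b) & (a ^ r)) >> 31) & 1u);
    if (sf != of)                                 /* cmovl eax,ebx */
        eax = ebx;
    return eax;
}
```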

  "cmpxchg" compares the value in the AL, AX, or EAX register with the
destination operand. If the two values are equal, ZF is set and the source
operand is loaded into the destination operand. Otherwise, ZF is cleared and
the destination operand is loaded into the AL, AX, or EAX register. The
destination operand may be a general register or memory, the source operand
must be a general register.

    cmpxchg dl,bl               ; compare and exchange with register
    cmpxchg [bx],dx             ; compare and exchange with memory
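
The data flow of "cmpxchg" is sketched below as a C model for double-word operands. `cmpxchg32` is a hypothetical name, and its return value stands for ZF.

```c
#include <stdint.h>

/* Hypothetical model of "cmpxchg dest,src" for double words: EAX is
   compared with the destination. If they are equal, ZF is set and
   src is stored in the destination; otherwise ZF is cleared and the
   destination value is loaded into EAX. Returns ZF. */
static int cmpxchg32(uint32_t *dest, uint32_t *eax, uint32_t src)
{
    if (*dest == *eax) {
        *dest = src;
        return 1;           /* ZF = 1 */
    }
    *eax = *dest;
    return 0;               /* ZF = 0 */
}
```

With the "lock" prefix, this becomes an atomic compare-and-swap. C11 exposes the same operation as `atomic_compare_exchange_strong`, which likewise writes the current value back into the expected value on failure.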

  "cmpxchg8b" compares the 64-bit value in EDX and EAX registers with the
destination operand. If the values are equal, ZF is set and the 64-bit value
in ECX and EBX registers is stored in the destination operand. Otherwise, ZF
is cleared and the value in the destination operand is loaded into EDX and EAX
registers. In both register pairs the first register (EDX or ECX) holds the
high double word. The destination operand should be a quad word in memory.

    cmpxchg8b [bx]              ; compare and exchange 8 bytes
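
This C model of "cmpxchg8b" shows how the two register pairs combine into 64-bit values. It is a sketch only; `cmpxchg8b` here is a hypothetical model function that returns ZF.

```c
#include <stdint.h>

/* Hypothetical model of "cmpxchg8b": EDX:EAX holds the expected
   64-bit value and ECX:EBX the replacement, with EDX and ECX being
   the high double words. Returns ZF. */
static int cmpxchg8b(uint64_t *dest, uint32_t *edx, uint32_t *eax,
                     uint32_t ecx, uint32_t ebx)
{
    uint64_t expected = ((uint64_t)*edx << 32) | *eax;
    if (*dest == expected) {
        *dest = ((uint64_t)ecx << 32) | ebx;   /* store ECX:EBX */
        return 1;                              /* ZF = 1 */
    }
    *edx = (uint32_t)(*dest >> 32);            /* load EDX:EAX */
    *eax = (uint32_t)*dest;
    return 0;                                  /* ZF = 0 */
}
```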


2.1.11 Miscellaneous instructions

  "nop" instruction occupies one byte but affects nothing but the instruction
pointer. This instruction has no operands and doesn't perform any operation.
  "ud2" instruction generates an invalid opcode exception. This instruction
is provided for software testing to explicitly generate an invalid opcode.
This instruction has no operands.
  "xlat" replaces the byte in the AL register with a byte from a translation
table addressed by BX or EBX, using the value of AL as an index into that
table. The operand should be a byte in memory addressed by BX or EBX with any
segment prefix. This instruction also has a short form "xlatb", which has no
operands and uses the BX or EBX address (depending on the current code
setting) in the segment selected by DS.
  "lds" transfers a pointer variable from the source operand to DS and the
destination register. The source operand must be a memory operand, and the
destination operand must be a general register. The DS register receives the
segment selector of the pointer while the destination register receives the
offset part of the pointer. "les", "lfs", "lgs" and "lss" operate identically
to "lds" except that they load the ES, FS, GS and SS register respectively
instead of DS.

    lds bx,[si]                 ; load pointer to ds:bx
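
A C sketch of "xlatb" follows, together with the memory layout read by "lds" in 16-bit code. Both functions are hypothetical models. `xlatb` indexes a table with the unsigned value of AL. `lds16` assumes the usual far pointer layout: a 16-bit offset followed by a 16-bit segment selector, both little-endian.

```c
#include <stdint.h>

/* Hypothetical model of "xlatb": AL is replaced by the byte at
   BX + AL, so AL acts as an unsigned index into a table of up to
   256 bytes addressed by BX (or EBX). */
static uint8_t xlatb(const uint8_t *table, uint8_t al)
{
    return table[al];
}

/* Hypothetical model of "lds bx,[si]" in 16-bit code: the pointer in
   memory is a 16-bit offset followed by a 16-bit segment selector. */
static void lds16(const uint8_t *ptr, uint16_t *ds, uint16_t *bx)
{
    *bx = (uint16_t)(ptr[0] | (ptr[1] << 8));   /* offset part */
    *ds = (uint16_t)(ptr[2] | (ptr[3] << 8));   /* selector part */
}
```

A typical use of "xlat" is converting a value from 0 to 15 into its hexadecimal digit with a 16-byte table.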

  "lea" transfers the offset of the source operand (rather than its value)
to the destination operand. The source operand must be a memory operand, and
the destination operand must be a general register.

    lea dx,[bx+si+1]            ; load effective address to dx
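
"lea" only performs the address calculation. It does not read memory or change flags, so it is also usable for plain arithmetic. The C model below computes a 32-bit effective address of the form base+index*scale+displacement. `lea32` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical model of a 32-bit "lea": computes
   base + index*scale + displacement modulo 2^32, where scale is
   1, 2, 4 or 8. No memory is accessed and no flags are changed. */
static uint32_t lea32(uint32_t base, uint32_t index, unsigned scale,
                      int32_t disp)
{
    return base + index * scale + (uint32_t)disp;
}
```

For instance, "lea eax,[eax+eax*4]" multiplies EAX by five. With 16-bit operands, as in "lea dx,[bx+si+1]", the result is truncated to 16 bits.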

  "cpuid" returns processor identification and feature information in the
EAX, EBX, ECX, and EDX registers. The information returned is selected by
entering a value in the EAX register before the instruction is executed.
This instruction has no operands.
  "pause" instruction delays the execution of the next instruction by an
implementation-specific amount of time. It can be used to improve the
performance of spin-wait loops. This instruction has no operands.
  "enter" creates a stack frame that may be used to implement the scope rules
of block-structured high-level languages. A "leave" instruction at the end of
a procedure complements an "enter" at the beginning of the procedure to
simplify stack management and to control access to variables for nested
procedures. The "enter" instruction takes two operands. The first operand
specifies the number of bytes of dynamic storage to be allocated on the stack
for the routine being entered. The second operand corresponds to the lexical
nesting level of the routine; it can be in the range from 0 to 31.
The specified lexical level determines how many sets of stack frame pointers
the CPU copies into the new stack frame from the preceding frame. This list
of stack frame pointers is sometimes called the display. The first word (or
double word when code is 32-bit) of the display is a pointer to the last stack
frame. This pointer enables a "leave" instruction to reverse the action of the
previous "enter" instruction by effectively discarding the last stack frame.
After "enter" creates the new display for a procedure, it allocates the
dynamic storage space for that procedure by decrementing ESP by the number of
bytes specified in the first operand. To enable a procedure to address its
display, "enter" leaves BP (or EBP) pointing to the beginning of the new stack
frame. If the lexical level is zero, "enter" pushes BP (or EBP), copies SP to
BP (or ESP to EBP) and then subtracts the first operand from SP (or ESP). For
nesting levels greater than zero, the processor pushes additional frame
pointers on the stack before adjusting the stack pointer. "leave" releases the
frame by copying BP to SP (or EBP to ESP) and then popping BP (or EBP).

    enter 2048,0                ; enter and allocate 2048 bytes on stack
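
For the common case of nesting level zero, "enter" and "leave" can be modelled in C as below. This is a simplified sketch. The stack is an array of 32-bit slots, ESP and EBP are slot indexes rather than byte addresses, and all names (`struct cpu`, `enter0`, `leave`) are hypothetical.

```c
#include <stdint.h>

/* Simplified 32-bit stack: ESP and EBP are indexes of 4-byte slots,
   and the stack grows toward lower indexes. */
struct cpu { uint32_t stack[64]; uint32_t esp, ebp; };

static void push(struct cpu *c, uint32_t v) { c->stack[--c->esp] = v; }
static uint32_t pop(struct cpu *c) { return c->stack[c->esp++]; }

/* Model of "enter size,0": push EBP, copy ESP to EBP, then reserve
   `size` bytes (assumed here to be a multiple of 4). */
static void enter0(struct cpu *c, uint32_t size)
{
    push(c, c->ebp);
    c->ebp = c->esp;
    c->esp -= size / 4;
}

/* Model of "leave": copy EBP to ESP, then pop EBP. */
static void leave(struct cpu *c)
{
    c->esp = c->ebp;
    c->ebp = pop(c);
}
```

A frame like the one from "enter 2048,0" would need a stack array of more than 512 slots in this model.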


2.1.12 System instructions

  "lmsw" loads the operand into the machine status word (bits 0 through 15 of
CR0 register), while "smsw" stores the machine status word into the
destination operand. For "lmsw", only the low four bits of the operand (the
PE, MP, EM and TS flags) are actually loaded into CR0, and PE can be set but
not cleared this way. The operand for both these instructions can be a 16-bit
general register or memory, for "smsw" it can also be a 32-bit general
register.

    lmsw ax                     ; load machine status from register
    smsw [bx]                   ; store machine status to memory

  "lgdt" and "lidt" instructions load the value in the operand into the
global descriptor table register or the interrupt descriptor table register
respectively. "sgdt" and "sidt" store the contents of the global descriptor
table register or the interrupt descriptor table register in the destination
operand. The operand should be 6 bytes in memory.

    lgdt [ebx]                  ; load global descriptor table
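
In 32-bit code, the 6-byte operand of "lgdt" and "lidt" consists of a 16-bit limit followed by the 32-bit linear base address. The limit is the table size in bytes minus one. In fasm source the operand can be defined with a "dw" followed by a "dd" directive. The C helper below, a hypothetical `make_pseudo_descriptor`, builds the same bytes to show the layout.

```c
#include <stdint.h>

/* Hypothetical helper building the 6-byte "lgdt"/"lidt" operand used
   in 32-bit code: a 16-bit limit (table size in bytes minus one)
   followed by the 32-bit base address, both little-endian. */
static void make_pseudo_descriptor(uint8_t out[6], uint32_t base,
                                   uint16_t limit)
{
    out[0] = (uint8_t)limit;            /* limit, low byte */
    out[1] = (uint8_t)(limit >> 8);     /* limit, high byte */
    for (int i = 0; i < 4; i++)         /* base, lowest byte first */
        out[2 + i] = (uint8_t)(base >> (8 * i));
}
```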

  "lldt" loads the operand into the segment selector field of the local
descriptor table register and "sldt" stores the segment selector from the
local descriptor table register in the operand. "ltr" loads the operand into
the segment selector field of the task register and "str" stores the segment
selector from the task register in the operand. Rules for the operand are the
same as for the "lmsw" and "smsw" instructions.
  "lar" loads the access rights from the segment descriptor specified by the
selector in the source operand into the destination operand and sets the ZF
flag; if the descriptor cannot be accessed, ZF is cleared instead and the
destination operand is not changed. The destination operand can be a 16-bit
or 32-bit general register. The source operand should be a 16-bit general
register or memory.

    lar ax,[bx]                 ; load access rights into word
    lar eax,dx                  ; load access rights into double word

  "lsl" loads the segment limit from the segment descriptor specified by the
selector in the source operand into the destination operand and sets the ZF
flag (or clears it when the limit cannot be loaded). Rules for the operand are
the same as for the "lar" instruction.
  "verr" and "verw" verify whether the code or data segment specified with
the operand is readable or writable from the current privilege level. The
operand should be a word; it can be a general register or memory. If the
segment is accessible and readable (for "verr") or writable (for "verw"), the
ZF flag is set, otherwise it's cleared. Rules for the operand are the same as
for the "lldt" instruction.
  "arpl" compares the RPL (requested privilege level) fields of two segment
selectors. The first operand contains one segment selector and the second
operand contains the other. If the RPL field of the destination operand is
less than the RPL field of the source operand, the ZF flag is set and the RPL
field of the destination operand is increased to match that of the source
operand. Otherwise, the ZF flag is cleared and no change is made to the
destination operand. The destination operand can be a word general register
or memory, the source operand must be a general register.

    arpl bx,ax                  ; adjust RPL of selector in register
    arpl [bx],ax                ; adjust RPL of selector in memory
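
The RPL adjustment done by "arpl" compares the low two bits of the two selectors. The C model below uses the hypothetical name `arpl` and returns ZF. It is a sketch of the behaviour described above, not processor code.

```c
#include <stdint.h>

/* Hypothetical model of "arpl dest,src": the RPL is the low two bits
   of a selector. If the destination's RPL is lower than the source's,
   it is raised to match and ZF is set; otherwise the destination is
   left unchanged and ZF is cleared. Returns ZF. */
static int arpl(uint16_t *dest, uint16_t src)
{
    if ((*dest & 3u) < (src & 3u)) {
        *dest = (uint16_t)((*dest & ~3u) | (src & 3u));
        return 1;   /* ZF = 1 */
    }
    return 0;       /* ZF = 0 */
}
```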
1301 | 1322 | ||
1302 | "clts" clears the TS (task switched) flag in the CR0 register. This |
1323 | "clts" clears the TS (task switched) flag in the CR0 register. This |
1303 | instruction has no operands. |
1324 | instruction has no operands. |
1304 | "lock" prefix causes the processor's bus-lock signal to be asserted during |
1325 | "lock" prefix causes the processor's bus-lock signal to be asserted during |
1305 | execution of the accompanying instruction. In a multiprocessor environment, |
1326 | execution of the accompanying instruction. In a multiprocessor environment, |
1306 | the bus-lock signal insures that the processor has exclusive use of any shared |
1327 | the bus-lock signal insures that the processor has exclusive use of any shared |
1307 | memory while the signal is asserted. The "lock" prefix can be prepended only |
1328 | memory while the signal is asserted. The "lock" prefix can be prepended only |
1308 | to the following instructions and only to those forms of the instructions |
1329 | to the following instructions and only to those forms of the instructions |
1309 | where the destination operand is a memory operand: "add", "adc", "and", "btc", |
1330 | where the destination operand is a memory operand: "add", "adc", "and", "btc", |
1310 | "btr", "bts", "cmpxchg", "cmpxchg8b", "dec", "inc", "neg", "not", "or", "sbb", |
1331 | "btr", "bts", "cmpxchg", "cmpxchg8b", "dec", "inc", "neg", "not", "or", "sbb", |
1311 | "sub", "xor", "xadd" and "xchg". If the "lock" prefix is used with one of |
1332 | "sub", "xor", "xadd" and "xchg". If the "lock" prefix is used with one of |
1312 | these instructions and the source operand is a memory operand, an undefined |
1333 | these instructions and the source operand is a memory operand, an undefined |
1313 | opcode exception may be generated. An undefined opcode exception will also be |
1334 | opcode exception may be generated. An undefined opcode exception will also be |
1314 | generated if the "lock" prefix is used with any instruction not in the above |
1335 | generated if the "lock" prefix is used with any instruction not in the above |
1315 | list. The "xchg" instruction always asserts the bus-lock signal regardless of |
1336 | list. The "xchg" instruction always asserts the bus-lock signal regardless of |
1316 | the presence or absence of the "lock" prefix. |
1337 | the presence or absence of the "lock" prefix. |
1317 | "hlt" stops instruction execution and places the processor in a halted |
1338 | "hlt" stops instruction execution and places the processor in a halted |
1318 | state. An enabled interrupt, a debug exception, the BINIT, INIT or the RESET |
1339 | state. An enabled interrupt, a debug exception, the BINIT, INIT or the RESET |
1319 | signal will resume execution. This instruction has no operands. |
1340 | signal will resume execution. This instruction has no operands. |
  "invlpg" invalidates (flushes) the TLB (translation lookaside buffer) entry
specified with the operand, which should be a memory location. The processor
determines the page that contains that address and flushes the TLB entry for
that page.
  "rdmsr" loads the contents of a 64-bit MSR (model specific register) of the
address specified in the ECX register into registers EDX and EAX. "wrmsr"
writes the contents of registers EDX and EAX into the 64-bit MSR of the
address specified in the ECX register. "rdtsc" loads the current value of the
processor's time stamp counter from the 64-bit MSR into the EDX and EAX
registers. The processor increments the time stamp counter MSR every clock
cycle and resets it to 0 whenever the processor is reset. "rdpmc" loads the
contents of the 40-bit performance monitoring counter specified in the ECX
register into registers EDX and EAX. In all these cases EDX holds the
high-order 32 bits and EAX the low-order 32 bits. These instructions have no
operands.
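  The C model below shows the EDX:EAX convention that these instructions share. It is a sketch of the register pairing only and does not execute the instructions. The helper names edx_eax_to_u64 and u64_to_edx_eax are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model only; helper names are hypothetical.
   Combine the EDX:EAX pair returned by "rdmsr", "rdtsc" or "rdpmc"
   into one 64-bit value: EDX holds the high 32 bits, EAX the low 32 bits. */
static uint64_t edx_eax_to_u64(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}

/* Split a 64-bit value into the EDX and EAX halves that "wrmsr" expects. */
static void u64_to_edx_eax(uint64_t value, uint32_t *edx, uint32_t *eax)
{
    assert(edx && eax);
    *edx = (uint32_t)(value >> 32);   /* high-order 32 bits */
    *eax = (uint32_t)value;           /* low-order 32 bits */
}
```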
  "wbinvd" writes back all modified cache lines in the processor's internal
cache to main memory and invalidates (flushes) the internal caches. The
instruction then issues a special function bus cycle that directs external
caches to also write back modified data and another bus cycle to indicate that
the external caches should be invalidated. This instruction has no operands.
  "rsm" returns program control from the system management mode to the
program that was interrupted when the processor received an SMM interrupt.
This instruction has no operands.
  "sysenter" executes a fast call to a level 0 system procedure; "sysexit"
executes a fast return to level 3 user code. The addresses used by these
instructions are stored in MSRs. These instructions have no operands.


2.1.13 FPU instructions

  The FPU (Floating-Point Unit) instructions operate on floating-point values
in three formats: single precision (32-bit), double precision (64-bit) and
double extended precision (80-bit). The FPU registers form a stack and each of
them holds a double extended precision floating-point value. When values are
pushed onto the stack or removed from its top, the names of the stack
registers shift accordingly, so ST0 always denotes the value on the top of the
FPU stack, ST1 the first value below the top, and so on. ST0 also has the
synonym ST.
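  The toy C model below shows how the stack register names are renumbered. It assumes a simplified layout with eight physical registers and a TOP index, and it uses a plain depth counter in place of the tag word. The type and function names are hypothetical. Value conversion, tags and stack-fault handling of the real FPU are not modelled.

```c
#include <assert.h>

/* Illustrative model only; names and layout are hypothetical.
   Eight physical registers and a 3-bit TOP pointer:
   ST(i) is physical register (TOP + i) mod 8. */
typedef struct {
    double reg[8];
    unsigned top;      /* index of the physical register that is ST0 */
    unsigned depth;    /* number of values on the stack (stands in for tags) */
} fpu_stack;

/* Like "finit": TOP = 0 and the stack is empty. */
static void fpu_init(fpu_stack *s)
{
    s->top = 0;
    s->depth = 0;
}

/* Like the push done by "fld": TOP is decremented and the value becomes ST0. */
static void fpu_push(fpu_stack *s, double value)
{
    assert(s->depth < 8);            /* a ninth push would overflow */
    s->top = (s->top + 7) & 7;       /* TOP - 1 modulo 8 */
    s->reg[s->top] = value;
    s->depth++;
}

/* Read ST(i) relative to the current top of stack. */
static double fpu_st(const fpu_stack *s, unsigned i)
{
    assert(i < s->depth);
    return s->reg[(s->top + i) & 7];
}

/* Like the pop done by "fstp": return ST0, then TOP is incremented. */
static double fpu_pop(fpu_stack *s)
{
    assert(s->depth > 0);
    double value = s->reg[s->top];
    s->top = (s->top + 1) & 7;
    s->depth--;
    return value;
}
```

Pushing 1.0 and then 2.0 leaves 2.0 in ST0 and 1.0 in ST1. This matches the effect of two consecutive "fld" instructions.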
  "fld" pushes a floating-point value onto the FPU register stack. The
operand can be a 32-bit, 64-bit or 80-bit memory location or an FPU register;
its value is loaded onto the top of the FPU register stack (the ST0 register)
and is automatically converted into the double extended precision format.

    fld dword [bx]          ; load single precision value from memory
    fld st2                 ; push value of st2 onto register stack

  "fld1", "fldz", "fldl2t", "fldl2e", "fldpi", "fldlg2" and "fldln2" load
commonly used constants onto the FPU register stack. The loaded constants are
+1.0, +0.0, log2(10), log2(e), pi, log10(2) and ln(2) respectively. These
instructions have no operands.
  "fild" converts the signed integer source operand into double extended
precision floating-point format and pushes the result onto the FPU register
stack. The source operand can be a 16-bit, 32-bit or 64-bit memory location.

    fild qword [bx]         ; load 64-bit integer from memory

  "fst" copies the value of the ST0 register to the destination operand,
which can be a 32-bit or 64-bit memory location or another FPU register.
"fstp" performs the same operation as "fst" and then pops the register stack,
discarding ST0. "fstp" accepts the same operands as the "fst" instruction and
can also store the value in an 80-bit memory location.

    fst st3                 ; copy value of st0 into st3 register
    fstp tword [bx]         ; store value in memory and pop stack

  "fist" converts the value in ST0 to a signed integer and stores the result
in the destination operand. The operand can be a 16-bit or 32-bit memory
location. "fistp" performs the same operation and then pops the register
stack; it accepts the same operands as the "fist" instruction and can also
store the integer value in a 64-bit memory location, so it has the same rules
for operands as the "fild" instruction.
  "fbld" converts the packed BCD integer into double extended precision
floating-point format and pushes this value onto the FPU stack. "fbstp"
converts the value in ST0 to an 18-digit packed BCD integer, stores the result
in the destination operand, and pops the register stack. The operand should be
an 80-bit memory location.
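  The C sketch below shows the 80-bit packed BCD layout used by "fbld" and "fbstp". The helpers to_packed_bcd and from_packed_bcd are hypothetical. The sketch covers only the memory layout for integer values and ignores the rounding that "fbstp" applies when ST0 is not integral.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model only; helper names are hypothetical.
   Layout: bytes 0-8 hold 18 decimal digits, two per byte, with the least
   significant digit in the low nibble of byte 0; bit 7 of byte 9 is the sign. */
static void to_packed_bcd(int64_t value, uint8_t out[10])
{
    /* 18 digits: the magnitude must stay below 10^18 */
    assert(value > -1000000000000000000LL && value < 1000000000000000000LL);
    uint64_t magnitude = value < 0 ? (uint64_t)(-value) : (uint64_t)value;
    for (int i = 0; i < 9; i++) {
        uint8_t low = (uint8_t)(magnitude % 10);    /* less significant digit */
        magnitude /= 10;
        uint8_t high = (uint8_t)(magnitude % 10);   /* more significant digit */
        magnitude /= 10;
        out[i] = (uint8_t)((high << 4) | low);
    }
    out[9] = value < 0 ? 0x80 : 0x00;               /* sign byte */
}

/* Decode the same layout back into an integer. */
static int64_t from_packed_bcd(const uint8_t in[10])
{
    int64_t result = 0;
    for (int i = 8; i >= 0; i--) {                  /* most significant byte first */
        assert((in[i] >> 4) <= 9 && (in[i] & 0x0F) <= 9);
        result = result * 100 + (in[i] >> 4) * 10 + (in[i] & 0x0F);
    }
    return (in[9] & 0x80) ? -result : result;
}
```

For instance, -1234 is stored as the bytes 34h and 12h, then seven zero bytes and 80h.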
  "fadd" adds the destination and source operands and stores the sum in the
destination location. The destination operand is always an FPU register. If
the source is a memory location, the destination is the ST0 register and only
the source operand should be specified. If both operands are FPU registers, at
least one of them should be the ST0 register. An operand in memory can be a
32-bit or 64-bit value.

    fadd qword [bx]         ; add double precision value to st0
    fadd st2,st0            ; add st0 to st2

  "faddp" adds the destination and source operands, stores the sum in the
destination location and then pops the register stack. The destination operand
must be an FPU register and the source operand must be ST0. When no operands
are specified, ST1 is used as the destination operand.

    faddp                   ; add st0 to st1 and pop the stack
    faddp st2,st0           ; add st0 to st2 and pop the stack

  "fiadd" converts an integer source operand into a double extended precision
floating-point value and adds it to ST0, the implicit destination operand. The
operand should be a 16-bit or 32-bit memory location.

    fiadd word [bx]         ; add word integer to st0

  The "fsub", "fsubr", "fmul", "fdiv" and "fdivr" instructions are similar to
"fadd", have the same rules for operands and differ only in the performed
computation. "fsub" subtracts the source operand from the destination operand,
"fsubr" subtracts the destination operand from the source operand, "fmul"
multiplies the destination and source operands, "fdiv" divides the destination
operand by the source operand and "fdivr" divides the source operand by the
destination operand. "fsubp", "fsubrp", "fmulp", "fdivp" and "fdivrp" perform
the same operations and pop the register stack; the rules for operands are the
same as for the "faddp" instruction. "fisub", "fisubr", "fimul", "fidiv" and
"fidivr" perform these operations after converting the integer source operand
into a floating-point value; they have the same rules for operands as the
"fiadd" instruction.
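  The reversed forms are easy to confuse. The C sketch below shows which operand ends up on which side of each operation. The enum and the function fpu_arith are hypothetical names used only for illustration. The instruction writes the value computed here back into its destination operand.

```c
#include <assert.h>

/* Illustrative model only; names are hypothetical.
   dest and src are the destination and source operands of the instruction;
   the returned value is what the instruction stores into the destination. */
typedef enum { OP_ADD, OP_SUB, OP_SUBR, OP_MUL, OP_DIV, OP_DIVR } fpu_op;

static double fpu_arith(fpu_op op, double dest, double src)
{
    switch (op) {
    case OP_ADD:  return dest + src;   /* fadd */
    case OP_SUB:  return dest - src;   /* fsub */
    case OP_SUBR: return src - dest;   /* fsubr: reversed subtraction */
    case OP_MUL:  return dest * src;   /* fmul */
    case OP_DIV:  return dest / src;   /* fdiv */
    case OP_DIVR: return src / dest;   /* fdivr: reversed division */
    }
    assert(0 && "unknown operation");
    return 0.0;
}
```

Take ST0 = 4.0 and ST1 = 10.0. Then "fsubp st1,st0" corresponds to fpu_arith(OP_SUB, 10.0, 4.0), so after the pop ST0 holds 6.0. "fsubrp st1,st0" would instead leave -6.0.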
  "fsqrt" computes the square root of the value in the ST0 register, "fsin"
computes the sine of that value, "fcos" computes the cosine of that value,
"fchs" complements its sign bit, "fabs" clears its sign to create the absolute
value, and "frndint" rounds it to an integral value according to the current
rounding mode. "f2xm1" computes 2 raised to the power of ST0 and subtracts 1.0
from it; the value of ST0 must lie in the range -1.0 to +1.0. All these
instructions store the result in ST0 and have no operands.
  "fsincos" computes both the sine and the cosine of the value in the ST0
register, stores the sine in ST0 and pushes the cosine on the top of the FPU
register stack. "fptan" computes the tangent of the value in ST0, stores the
result in ST0 and pushes 1.0 onto the FPU register stack. "fpatan" computes
the arctangent of the value in ST1 divided by the value in ST0, stores the
result in ST1 and pops the FPU register stack. "fyl2x" computes the binary
logarithm of ST0, multiplies it by ST1, stores the result in ST1 and pops the
FPU register stack; "fyl2xp1" performs the same operation but adds 1.0 to ST0
before computing the logarithm. "fprem" computes the remainder obtained from
dividing the value in ST0 by the value in ST1, and stores the result in ST0.
"fprem1" performs the same operation as "fprem", but computes the remainder in
the way specified by IEEE Standard 754, rounding the quotient to the nearest
integer instead of truncating it toward zero. "fscale" truncates the value in
ST1 toward zero and adds it to the exponent of ST0. "fxtract" separates the
value in ST0 into its exponent and significand, stores the exponent in ST0 and
pushes the significand onto the register stack. "fnop" performs no operation.
These instructions have no operands.
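  "fyl2x" multiplies the binary logarithm by ST1. Preloading ST1 with a constant from "fldln2" or "fldlg2" therefore yields the natural or the decimal logarithm. The identity is written out below as a worked derivation:

```latex
\begin{aligned}
\text{fyl2x:}\quad & ST1 \leftarrow ST1 \cdot \log_2 ST0,\ \text{then pop} \\
ST1 = \ln 2:\quad & \ln 2 \cdot \log_2 x = \ln 2 \cdot \frac{\ln x}{\ln 2} = \ln x \\
ST1 = \log_{10} 2:\quad & \log_{10} 2 \cdot \log_2 x = \frac{\ln 2}{\ln 10} \cdot \frac{\ln x}{\ln 2} = \log_{10} x
\end{aligned}
```

The following is a minimal sketch of the natural logarithm. It assumes that x is a hypothetical label of a positive double precision value:

    fldln2                  ; st0 = ln 2
    fld qword [x]           ; st0 = x, st1 = ln 2 (x is a hypothetical label)
    fyl2x                   ; st0 = ln 2 * log2(x) = ln(x)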
  "fxch" exchanges the contents of ST0 and another FPU register. The operand
should be an FPU register; if no operand is specified, the contents of ST0 and
ST1 are exchanged.
  "fcom" and "fcomp" compare the contents of ST0 and the source operand and
set flags in the FPU status word according to the results. "fcomp"
additionally pops the register stack after performing the comparison. The
operand can be a single or double precision value in memory or an FPU
register. When no operand is specified, ST1 is used as the source operand.

    fcom                    ; compare st0 with st1
    fcomp st2               ; compare st0 with st2 and pop stack

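  The C model below shows the condition codes that "fcom" leaves behind. It assumes the usual status word layout, with C0 in bit 8, C2 in bit 10 and C3 in bit 14. The macro and function names are hypothetical. The invalid-operation exception that "fcom" raises for NaN operands is not modelled.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Illustrative model only; names are hypothetical.
   Bit positions of the condition code flags in the FPU status word. */
#define FPU_C0 (1u << 8)
#define FPU_C2 (1u << 10)
#define FPU_C3 (1u << 14)

/* Condition codes set by "fcom" when comparing ST0 with a source value:
   ST0 > src -> none, ST0 < src -> C0, equal -> C3,
   unordered (a NaN operand) -> C3, C2 and C0. */
static uint16_t fcom_condition_codes(double st0, double src)
{
    if (isnan(st0) || isnan(src))
        return FPU_C3 | FPU_C2 | FPU_C0;
    if (st0 < src)
        return FPU_C0;
    if (st0 == src)
        return FPU_C3;
    return 0;
}
```

These bits can be examined after the status word has been stored with "fstsw".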
  "fcompp" compares the contents of ST0 and ST1, sets flags in the FPU status
word according to the results and pops the register stack twice. This
instruction has no operands.
  "fucom", "fucomp" and "fucompp" perform an unordered comparison of two FPU
registers: unlike "fcom", they do not raise the invalid-operation exception
when an operand is a quiet NaN. Rules for operands are the same as for "fcom",
"fcomp" and "fcompp", but the source operand must be an FPU register.
  "ficom" and "ficomp" compare the value in ST0 with an integer source operand
and set the flags in the FPU status word according to the results. "ficomp"
additionally pops the register stack after performing the comparison. The
integer value is converted to double extended precision floating-point format
before the comparison is made. The operand should be a 16-bit or 32-bit
memory location.

    ficom word [bx]         ; compare st0 with 16-bit integer

  "fcomi", "fcomip", "fucomi" and "fucomip" compare ST0 with another FPU
register and set the ZF, PF and CF flags according to the results. "fcomip"
and "fucomip" additionally pop the register stack after performing the
comparison. The instructions obtained by attaching the FPU condition mnemonic
(see table 2.2) to the "fcmov" mnemonic transfer the specified FPU register
into the ST0 register if the given test condition is true. These instructions
allow two different syntaxes: one with a single operand specifying the source
FPU register, and one with two operands, in which case the destination operand
should be the ST0 register and the second operand specifies the source FPU
register.

    fcomi st2               ; compare st0 with st2 and set flags
    fcmovb st0,st2          ; transfer st2 to st0 if below

Table 2.2 FPU conditions
/------------------------------------------------------\
| Mnemonic | Condition tested | Description            |
|==========|==================|========================|
| b        | CF = 1           | below                  |
| e        | ZF = 1           | equal                  |
| be       | CF or ZF = 1     | below or equal         |
| u        | PF = 1           | unordered              |
| nb       | CF = 0           | not below              |
| ne       | ZF = 0           | not equal              |
| nbe      | CF and ZF = 0    | not below nor equal    |
| nu       | PF = 0           | not unordered          |
\------------------------------------------------------/

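  The C sketch below writes the test performed by each "fcmov" variant directly from Table 2.2. After "fcomi" or "fucomi" the flags are set as follows:
ZF = PF = CF = 0 when ST0 is greater,
CF = 1 when it is less,
ZF = 1 when the values are equal,
all three set when they are unordered.
The function name fcmov_condition is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Illustrative model only; the function name is hypothetical.
   Evaluate an "fcmov" condition suffix from Table 2.2 against the
   CF, ZF and PF flags, for example as left by "fcomi". */
static bool fcmov_condition(const char *suffix, bool cf, bool zf, bool pf)
{
    if (strcmp(suffix, "b") == 0)   return cf;
    if (strcmp(suffix, "e") == 0)   return zf;
    if (strcmp(suffix, "be") == 0)  return cf || zf;
    if (strcmp(suffix, "u") == 0)   return pf;
    if (strcmp(suffix, "nb") == 0)  return !cf;
    if (strcmp(suffix, "ne") == 0)  return !zf;
    if (strcmp(suffix, "nbe") == 0) return !cf && !zf;
    if (strcmp(suffix, "nu") == 0)  return !pf;
    assert(0 && "not an FPU condition mnemonic");
    return false;
}
```

An unordered result sets all three flags, so "fcmovb", "fcmove" and "fcmovbe" also transfer in that case. Testing the "u" condition first separates NaN operands.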
  "ftst" compares the value in ST0 with 0.0 and sets the flags in the FPU
status word according to the results. "fxam" examines the contents of the ST0
register and sets the flags in the FPU status word to indicate the class of
value in the register. These instructions have no operands.
  "fstsw" and "fnstsw" store the current value of the FPU status word in the
destination location. The destination operand can be either a 16-bit memory
location or the AX register. "fstsw" checks for pending unmasked FPU
exceptions before storing the status word, "fnstsw" does not.
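  A common way to act on the result of "fcom" takes two steps. First, "fstsw ax" stores the status word into AX. Then "sahf" copies AH into the CPU flags. The C model below, which uses hypothetical names, shows which condition code ends up in which flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model only; names are hypothetical.
   After "fstsw ax", AH holds the high byte of the status word; "sahf"
   copies AH bit 0 into CF, bit 2 into PF and bit 6 into ZF, which are the
   condition codes C0 (bit 8), C2 (bit 10) and C3 (bit 14). */
typedef struct { bool cf, pf, zf; } cpu_flags;

static cpu_flags fstsw_ax_sahf(uint16_t status_word)
{
    uint8_t ah = (uint8_t)(status_word >> 8);
    cpu_flags f;
    f.cf = (ah & 0x01) != 0;   /* C0 */
    f.pf = (ah & 0x04) != 0;   /* C2 */
    f.zf = (ah & 0x40) != 0;   /* C3 */
    return f;
}
```

The sequence "fcom", "fstsw ax", "sahf" therefore lets the unsigned conditional jumps such as "jb", "je" and "ja" test whether ST0 was below, equal to or above the source operand. An unordered result sets CF, PF and ZF together.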
  "fstcw" and "fnstcw" store the current value of the FPU control word at the
specified destination in memory. "fstcw" checks for pending unmasked FPU
exceptions before storing the control word, "fnstcw" does not. "fldcw" loads
the operand into the FPU control word. The operand should be a 16-bit memory
location.
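  The rounding control field, bits 10 and 11 of the control word, selects the rounding mode used by instructions such as "frndint" and "fist". The C sketch below edits that field in a control word value. The names are hypothetical. 037Fh is the value that "finit" loads into the control word.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model only; names are hypothetical.
   Rounding control (RC) occupies bits 10-11 of the FPU control word. */
enum { RC_NEAREST = 0, RC_DOWN = 1, RC_UP = 2, RC_TRUNCATE = 3 };

#define FPU_CW_DEFAULT 0x037Fu   /* control word after "finit" */

/* Return control_word with its RC field replaced; the result is the value
   a program would store in memory and load with "fldcw". */
static uint16_t with_rounding_control(uint16_t control_word, unsigned rc)
{
    assert(rc <= 3);
    return (uint16_t)((control_word & ~0x0C00u) | (rc << 10));
}

/* Extract the RC field from a control word. */
static unsigned rounding_control(uint16_t control_word)
{
    return (control_word >> 10) & 3u;
}
```

A program can use this to get C-style truncation from "fistp". It saves the control word with "fnstcw", loads the truncating value with "fldcw" before "fistp" (0F7Fh, when starting from 037Fh), and then reloads the saved word.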
  "fstenv" and "fnstenv" store the current FPU operating environment at the
memory location specified with the destination operand, and then mask all FPU
exceptions. "fstenv" checks for pending unmasked FPU exceptions before
proceeding, "fnstenv" does not. "fldenv" loads the complete operating
environment from memory into the FPU. "fsave" and "fnsave" store the current
FPU state (operating environment and register stack) at the specified
destination in memory and reinitialize the FPU. "fsave" checks for pending
unmasked FPU exceptions before proceeding, "fnsave" does not. "frstor"
loads the FPU state from the specified memory location. All these instructions
require a memory operand. For each of these instructions there are two
additional mnemonics that select the variant of the operation explicitly. The
"fstenvw", "fnstenvw", "fldenvw", "fsavew", "fnsavew" and "frstorw" mnemonics
force the instruction to perform the operation as in 16-bit mode, while
"fstenvd", "fnstenvd", "fldenvd", "fsaved", "fnsaved" and "frstord" force the
operation as in 32-bit mode (the 16-bit forms use a 14-byte environment and a
94-byte state image, the 32-bit forms 28 and 108 bytes respectively).
  "finit" and "fninit" set the FPU operating environment into its default
state. "finit" checks for pending unmasked FPU exceptions before proceeding,
"fninit" does not. "fclex" and "fnclex" clear the FPU exception flags in the
FPU status word. "fclex" checks for pending unmasked FPU exceptions before
proceeding, "fnclex" does not. "wait" and "fwait" are synonyms for the same
instruction, which causes the processor to check for pending unmasked FPU
exceptions and handle them before proceeding. These instructions have no
operands.
  "ffree" sets the tag associated with the specified FPU register to empty.
The operand should be an FPU register.
  "fincstp" and "fdecstp" rotate the FPU stack by one position by adding one
to or subtracting one from the top-of-stack pointer, without changing the
contents or tags of the registers. These instructions have no operands.
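  The following C model shows how "fincstp", "fdecstp" and "ffree" affect the register file. The struct layout and the model_* function names are hypothetical. Only the TOP pointer and the tags change; the register contents are carried along untouched.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model only; layout and names are hypothetical.
   ST(i) names physical register (top + i) mod 8. */
typedef struct {
    double reg[8];     /* register contents, never modified below */
    bool empty[8];     /* true when the register's tag says "empty" */
    unsigned top;      /* TOP pointer */
} x87_regs;

/* "fincstp": TOP + 1 modulo 8; contents and tags unchanged. */
static void model_fincstp(x87_regs *s) { s->top = (s->top + 1) & 7; }

/* "fdecstp": TOP - 1 modulo 8; contents and tags unchanged. */
static void model_fdecstp(x87_regs *s) { s->top = (s->top + 7) & 7; }

/* Physical register currently named ST(i). */
static unsigned physical_index(const x87_regs *s, unsigned i)
{
    assert(i < 8);
    return (s->top + i) & 7;
}

/* "ffree st(i)": mark the register named ST(i) as empty. */
static void model_ffree(x87_regs *s, unsigned i)
{
    s->empty[physical_index(s, i)] = true;
}
```

Freeing ST0 with "ffree st0" and then executing "fincstp" discards the top value much like a pop, without storing it anywhere.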


2.1.14 MMX instructions

  The MMX instructions operate on packed integer types and use the MMX
registers, which are the low 64-bit parts of the 80-bit FPU registers. Because
of this, MMX instructions cannot be used at the same time as FPU instructions.
They can operate on packed bytes (eight 8-bit integers), packed words (four
16-bit integers) or packed double words (two 32-bit integers); the packed
formats allow a single instruction to operate on multiple data elements at
once.
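  The C sketch below illustrates packed data by treating a 64-bit value as eight independent byte lanes. The three helpers have hypothetical names. They mirror the byte forms of the addition instructions described further on: plain wraparound as in "paddb", signed saturation as in "paddsb", and unsigned saturation as in "paddusb".

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model only; names are hypothetical.
   A 64-bit MMX value viewed as eight packed bytes; lane 0 is the lowest byte. */
static uint8_t lane(uint64_t v, int i) { return (uint8_t)(v >> (8 * i)); }

/* Wraparound addition of each byte lane, as performed by "paddb". */
static uint64_t add_bytes_wrap(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++)
        r |= (uint64_t)(uint8_t)(lane(a, i) + lane(b, i)) << (8 * i);
    return r;
}

/* Signed saturation, as performed by "paddsb": clamp to -128..127. */
static uint64_t add_bytes_signed_sat(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        int sum = (int8_t)lane(a, i) + (int8_t)lane(b, i);
        if (sum > 127) sum = 127;
        if (sum < -128) sum = -128;
        r |= (uint64_t)(uint8_t)sum << (8 * i);
    }
    return r;
}

/* Unsigned saturation, as performed by "paddusb": clamp to 0..255. */
static uint64_t add_bytes_unsigned_sat(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        unsigned sum = lane(a, i) + lane(b, i);
        if (sum > 255) sum = 255;
        r |= (uint64_t)sum << (8 * i);
    }
    return r;
}
```

The following examples show how the three modes differ:
  * Adding 01h to a lane holding 7Fh gives 80h with wraparound but 7Fh with signed saturation.
  * Adding 20h to F0h gives 10h with wraparound but FFh with unsigned saturation.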
  "movq" copies a quad word from the source operand to the destination
operand. At least one of the operands must be an MMX register; the other can
be an MMX register or a 64-bit memory location.

    movq mm0,mm1            ; move quad word from register to register
    movq mm2,[ebx]          ; move quad word from memory to register

  "movd" copies a double word from the source operand to the destination
operand. One of the operands must be an MMX register, the second one can be a
general register or a 32-bit memory location. When the MMX register is the
source, only its low double word is used; when it is the destination, its high
double word is cleared.
  All general MMX operations have two operands: the destination operand should
be an MMX register, the source operand can be an MMX register or a 64-bit
memory location. The operation is performed on the corresponding data elements
of the source and destination operands and the results are stored in the data
elements of the destination operand. "paddb", "paddw" and "paddd" perform the
addition of packed bytes, packed words or packed double words. "psubb",
"psubw" and "psubd" perform the subtraction of the appropriate types.
"paddsb", "paddsw", "psubsb" and "psubsw" perform the addition or subtraction
of packed bytes or packed words with signed saturation. "paddusb", "paddusw",
"psubusb" and "psubusw" are analogous, but with unsigned saturation. "pmulhw"
and "pmullw" perform a signed multiplication of the packed words and store the
high or low words of the results in the destination operand. "pmaddwd"
multiplies the packed signed words and adds the four intermediate double word
products in pairs to produce two packed double words. "pand", "por" and "pxor"
perform the logical operations on the quad words; "pandn" also performs a
logical negation of the destination operand before performing the "and"
operation. "pcmpeqb", "pcmpeqw" and "pcmpeqd" compare packed bytes, packed
words or packed double words for equality. If a pair of data elements is
equal, the corresponding data element in the destination operand is filled
with bits of value 1, otherwise it is set to 0. "pcmpgtb", "pcmpgtw" and
"pcmpgtd" perform a similar operation, but they check whether the signed data
elements in the destination operand are greater than the corresponding data
elements in the source operand. "packsswb" converts packed signed words into
packed signed bytes, "packssdw" converts packed signed double words into
packed signed words, using saturation to handle overflow conditions.
"packuswb" converts packed signed words into packed unsigned bytes. The
converted data elements from the destination operand are stored in the low
half of the result, and those from the source operand in the high half.
"punpckhbw", "punpckhwd" and "punpckhdq" interleave the data elements from
the high parts of the source and destination operands and store the result
into the destination operand. "punpcklbw", "punpcklwd" and "punpckldq"
perform the same operation, but the low parts of the source and destination
operands are used.

    paddsb mm0,[esi]           ; add packed bytes with signed saturation
    pcmpeqw mm3,mm7            ; compare packed words for equality

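  The lane-by-lane behavior described above can be illustrated with a small
scalar model in C. This is only a sketch and does not show how the processor
implements the instructions. Each 64-bit operand is assumed to be an array of
lanes with lane 0 as the least significant element, and all function names
are hypothetical. The model covers signed and unsigned saturation, "pmaddwd",
and the order in which "packsswb" places its results.

```c
#include <stdint.h>

/* Hypothetical scalar models of a few MMX operations. An operand is treated
   as an array of lanes; lane 0 is the least significant element. */

/* Clamp to the signed byte range (signed saturation). */
static int8_t sat_i8(int v)
{
    if (v > INT8_MAX) return INT8_MAX;
    if (v < INT8_MIN) return INT8_MIN;
    return (int8_t)v;
}

/* Clamp to the unsigned byte range (unsigned saturation). */
static uint8_t sat_u8(int v)
{
    if (v > UINT8_MAX) return UINT8_MAX;
    if (v < 0) return 0;
    return (uint8_t)v;
}

/* paddsb dst,src: eight byte additions with signed saturation. */
static void model_paddsb(int8_t dst[8], const int8_t src[8])
{
    for (int i = 0; i < 8; i++)
        dst[i] = sat_i8(dst[i] + src[i]);
}

/* paddusb dst,src: eight byte additions with unsigned saturation. */
static void model_paddusb(uint8_t dst[8], const uint8_t src[8])
{
    for (int i = 0; i < 8; i++)
        dst[i] = sat_u8(dst[i] + src[i]);
}

/* pmaddwd dst,src: multiply signed words, then add adjacent products.
   The edge case where both pairs are -32768 * -32768 is not modeled. */
static void model_pmaddwd(int32_t out[2], const int16_t dst[4],
                          const int16_t src[4])
{
    for (int i = 0; i < 2; i++) {
        int64_t sum = (int64_t)dst[2 * i] * src[2 * i]
                    + (int64_t)dst[2 * i + 1] * src[2 * i + 1];
        out[i] = (int32_t)sum;
    }
}

/* packsswb dst,src: the destination words become the low four bytes of the
   result and the source words become the high four bytes. */
static void model_packsswb(int8_t out[8], const int16_t dst[4],
                           const int16_t src[4])
{
    for (int i = 0; i < 4; i++) {
        out[i] = sat_i8(dst[i]);
        out[i + 4] = sat_i8(src[i]);
    }
}
```

With these models, adding 100 to 100 in a byte lane gives 127 under "paddsb"
instead of the wrapped-around value -56.
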
  "psllw", "pslld" and "psllq" perform a logical shift left of the packed
words, packed double words or a single quad word in the destination operand by
the amount specified in the source operand. "psrlw", "psrld" and "psrlq"
perform a logical shift right of the packed words, packed double words or a
single quad word. "psraw" and "psrad" perform an arithmetic shift right of the
packed words or double words. The destination operand should be an MMX
register, while the source operand can be an MMX register, a 64-bit memory
location, or an 8-bit immediate value.

    psllw mm2,mm4              ; shift words left logically
    psrad mm4,[ebx]            ; shift double words right arithmetically

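  The difference between logical and arithmetic right shifts can be sketched
in C as below. The sketch also shows what happens when the count exceeds the
element width. It is a scalar model on four 16-bit lanes with hypothetical
function names. The text above does not state the large-count rule, so it is
taken from the processor documentation: counts above 15 clear the words for
logical shifts and fill them with the sign bit for arithmetic shifts.

```c
#include <stdint.h>

/* Hypothetical scalar models of psllw, psrlw and psraw on four word lanes.
   The count is the full 64-bit value of the source operand. */

static void model_psllw(uint16_t w[4], uint64_t count)
{
    for (int i = 0; i < 4; i++)
        w[i] = (count > 15) ? 0 : (uint16_t)(w[i] << count);
}

static void model_psrlw(uint16_t w[4], uint64_t count)
{
    for (int i = 0; i < 4; i++)
        w[i] = (count > 15) ? 0 : (uint16_t)(w[i] >> count);
}

/* Arithmetic shift written without relying on how C shifts negative
   values: for negative x, ~(~x >> n) copies the sign bit into the
   vacated positions. Counts above 15 act like 15. */
static void model_psraw(int16_t w[4], uint64_t count)
{
    unsigned n = (count > 15) ? 15u : (unsigned)count;
    for (int i = 0; i < 4; i++) {
        int x = w[i];
        w[i] = (int16_t)(x < 0 ? ~(~x >> n) : (x >> n));
    }
}
```
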
  "emms" makes the FPU registers usable for the FPU instructions; it must be
used before using the FPU instructions if any MMX instructions were used.


2.1.15 SSE instructions

  The SSE extension adds more MMX instructions and also introduces operations
on packed single precision floating point values. The 128-bit packed single
precision format consists of four single precision floating point values. The
128-bit SSE registers are designed for operations on this data type.
  "movaps" and "movups" transfer a double quad word operand containing packed
single precision values from the source operand to the destination operand. At
least one of the operands has to be an SSE register, the second one can also be
an SSE register or a 128-bit memory location. Memory operands for the "movaps"
instruction must be aligned on a 16-byte boundary; operands for the "movups"
instruction don't have to be aligned.

    movups xmm0,[ebx]          ; move unaligned double quad word

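  Whether an address satisfies the 16-byte requirement of "movaps" can be
checked by testing its low four bits. The C sketch below only illustrates
that rule. The helper name is hypothetical, and the C11 "_Alignas" specifier
is used to obtain an aligned buffer. In assembly source, such data is
typically placed with the "align" directive.

```c
#include <stdint.h>

/* Hypothetical helper: nonzero when p could serve as a "movaps" memory
   operand, i.e. its address is a multiple of 16. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Four single precision values placed on a 16-byte boundary. */
static _Alignas(16) float packed[4] = {1.0f, 2.0f, 3.0f, 4.0f};
```

Here &packed[1] lies 4 bytes past a 16-byte boundary, so it would only be a
valid memory operand for "movups".
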
  "movlps" moves two packed single precision values between the memory and the
low quad word of an SSE register. "movhps" moves two packed single precision
values between the memory and the high quad word of an SSE register. One of
the operands must be an SSE register, and the other operand must be a 64-bit
memory location.

    movlps xmm0,[ebx]          ; move memory to low quad word of xmm0
    movhps [esi],xmm7          ; move high quad word of xmm7 to memory

  "movlhps" moves two packed single precision values from the low quad word
of the source register to the high quad word of the destination register.
"movhlps" moves two packed single precision values from the high quad word of
the source register to the low quad word of the destination register. Both
operands have to be SSE registers.
  "movmskps" transfers the most significant bit of each of the four single
precision values in the SSE register into the low four bits of a general
register. The source operand must be an SSE register, the destination operand
must be a general register.
  "movss" transfers a single precision value between the source and
destination operands (only the low double word is transferred). At least one
of the operands has to be an SSE register, the second one can also be an SSE
register or a 32-bit memory location.

    movss [edi],xmm3           ; move low double word of xmm3 to memory

  Each of the SSE arithmetic operations has two variants. When the mnemonic
ends with "ps", the source operand can be a 128-bit memory location or an SSE
register, the destination operand must be an SSE register, and the operation
is performed on four packed single precision values, for each pair of the
corresponding data elements separately; the result is stored in the
destination register. When the mnemonic ends with "ss", the source operand
can be a 32-bit memory location or an SSE register, the destination operand
must be an SSE register, and the operation is performed on single precision
values; only the low double words of the SSE registers are used in this case,
and the result is stored in the low double word of the destination register.
"addps" and "addss" add the values, "subps" and "subss" subtract the source
value from the destination value, "mulps" and "mulss" multiply the values,
"divps" and "divss" divide the destination value by the source value, "rcpps"
and "rcpss" compute the approximate reciprocal of the source value, "sqrtps"
and "sqrtss" compute the square root of the source value, "rsqrtps" and
"rsqrtss" compute the approximate reciprocal of the square root of the source
value, "maxps" and "maxss" compare the source and destination values and
return the greater one, "minps" and "minss" compare the source and destination
values and return the lesser one.

    mulss xmm0,[ebx]           ; multiply single precision values
    addps xmm3,xmm7            ; add packed single precision values

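  The difference between the "ps" and "ss" variants can be modeled in C as
below, using "addps" and "addss" as the example. This is a sketch with
hypothetical names, with lane 0 standing for the low double word of the
register. It models the arithmetic "ss" instructions, which leave the three
upper values of the destination register unchanged.

```c
/* Hypothetical scalar models of addps and addss on four float lanes;
   lane 0 is the low double word of the register. */

static void model_addps(float dst[4], const float src[4])
{
    for (int i = 0; i < 4; i++)
        dst[i] += src[i];        /* all four lanes */
}

static void model_addss(float dst[4], const float src[4])
{
    dst[0] += src[0];            /* low lane only; lanes 1-3 keep their values */
}
```
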
  "andps", "andnps", "orps" and "xorps" perform the logical operations on
packed single precision values. The source operand can be a 128-bit memory
location or an SSE register, the destination operand must be an SSE register.
  "cmpps" compares packed single precision values and returns a mask result
into the destination operand, which must be an SSE register. The source
operand can be a 128-bit memory location or an SSE register, and the third
operand must be an immediate value selecting the code of one of the eight
compare conditions (table 2.3). "cmpss" performs the same operation on single
precision values; only the low double word of the destination register is
affected, and in this case the source operand can be a 32-bit memory location
or an SSE register. These two instructions also have variants with only two
operands and the condition encoded within the mnemonic. Their mnemonics are
obtained by attaching the mnemonic from table 2.3 to the "cmp" mnemonic and
then attaching "ps" or "ss" at the end.

    cmpps xmm2,xmm4,0          ; compare packed single precision values
    cmpltss xmm0,[ebx]         ; compare single precision values

    Table 2.3 SSE conditions
    /-------------------------------------------\
    | Code | Mnemonic | Description             |
    |======|==========|=========================|
    | 0    | eq       | equal                   |
    | 1    | lt       | less than               |
    | 2    | le       | less than or equal      |
    | 3    | unord    | unordered               |
    | 4    | neq      | not equal               |
    | 5    | nlt      | not less than           |
    | 6    | nle      | not less than nor equal |
    | 7    | ord      | ordered                 |
    \-------------------------------------------/

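  The eight conditions of table 2.3 differ mainly in how they treat unordered
operands, that is, operands where at least one of the values is a NaN. The C
sketch below models a single lane of "cmpps" or "cmpss", and the function
name is hypothetical. It relies on the fact that C comparisons involving a
NaN are false, which matches the "eq", "lt" and "le" conditions. The negated
conditions are therefore true for unordered values. Exception signaling is
not modeled.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical model of one lane of cmpps/cmpss: returns the mask written
   to the lane (all ones when the condition holds, zero otherwise).
   d is the destination value, s the source value, code as in table 2.3. */
static uint32_t model_cmp_lane(float d, float s, unsigned code)
{
    int unordered = isnan(d) || isnan(s);
    int r;
    switch (code & 7u) {
    case 0:  r = (d == s);     break;  /* eq: false when unordered */
    case 1:  r = (d <  s);     break;  /* lt */
    case 2:  r = (d <= s);     break;  /* le */
    case 3:  r = unordered;    break;  /* unord */
    case 4:  r = !(d == s);    break;  /* neq: true when unordered */
    case 5:  r = !(d <  s);    break;  /* nlt */
    case 6:  r = !(d <= s);    break;  /* nle */
    default: r = !unordered;   break;  /* ord */
    }
    return r ? 0xFFFFFFFFu : 0u;
}
```

Under this model, "cmpltss xmm0,[ebx]" corresponds to code 1, the same
condition that "cmpss xmm0,[ebx],1" would select.
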
  "comiss" and "ucomiss" compare the single precision values and set the ZF,
PF and CF flags to show the result. The destination operand must be an SSE
register, the source operand can be a 32-bit memory location or an SSE
register.
  "shufps" moves any two of the four single precision values from the
destination operand into the low quad word of the destination operand, and any
two of the four values from the source operand into the high quad word of the
destination operand. The destination operand must be an SSE register, the
source operand can be a 128-bit memory location or an SSE register, and the
third operand must be an 8-bit immediate value selecting which values will be
moved into the destination operand. Bits 0 and 1 select the value to be moved
from the destination operand to the low double word of the result, bits 2 and
3 select the value to be moved from the destination operand to the second
double word, bits 4 and 5 select the value to be moved from the source operand
to the third double word, and bits 6 and 7 select the value to be moved from
the source operand to the high double word of the result.

    shufps xmm0,xmm0,10010011b ; shuffle double words

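  The selection performed by the immediate operand of "shufps" can be modeled
in C as follows. This is a hypothetical sketch on four float lanes, with lane
0 being the low double word. It copies the selected values to a temporary
first, so it also works when the destination and source are the same
register, as in the example above.

```c
#include <stdint.h>

/* Hypothetical scalar model of shufps dst,src,imm8. */
static void model_shufps(float dst[4], const float src[4], uint8_t imm)
{
    float r[4];
    r[0] = dst[imm & 3];          /* bits 0-1: from destination */
    r[1] = dst[(imm >> 2) & 3];   /* bits 2-3: from destination */
    r[2] = src[(imm >> 4) & 3];   /* bits 4-5: from source */
    r[3] = src[(imm >> 6) & 3];   /* bits 6-7: from source */
    for (int i = 0; i < 4; i++)
        dst[i] = r[i];
}
```

With 93h, which is 10010011b, the result is {x[3], x[0], x[1], x[2]}. Each
value moves up one position and the highest one wraps around to the lowest
position.
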
  "unpckhps" performs an interleaved unpack of the values from the high parts
of the source and destination operands and stores the result in the
destination operand, which must be an SSE register. The source operand can be
a 128-bit memory location or an SSE register. "unpcklps" performs an
interleaved unpack of the values from the low parts of the source and
destination operands and stores the result in the destination operand; the
rules for operands are the same.
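
  The interleaving done by "unpcklps" and "unpckhps" can be sketched in C as
follows. The function names are hypothetical and lane 0 is the low double
word. Values of the destination occupy the even positions of the result and
values of the source the odd positions.

```c
/* Hypothetical scalar models of unpcklps and unpckhps. */

static void model_unpcklps(float dst[4], const float src[4])
{
    float r[4] = { dst[0], src[0], dst[1], src[1] };  /* low halves */
    for (int i = 0; i < 4; i++)
        dst[i] = r[i];
}

static void model_unpckhps(float dst[4], const float src[4])
{
    float r[4] = { dst[2], src[2], dst[3], src[3] };  /* high halves */
    for (int i = 0; i < 4; i++)
        dst[i] = r[i];
}
```
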
  "cvtpi2ps" converts two packed double word integers into two packed single
precision floating point values and stores the result in the low quad word of
the destination operand, which should be an SSE register. The source operand
can be a 64-bit memory location or an MMX register.

    cvtpi2ps xmm0,mm0          ; convert integers to single precision values

  "cvtsi2ss" converts a double word integer into a single precision floating
point value and stores the result in the low double word of the destination
operand, which should be an SSE register. The source operand can be a 32-bit
memory location or a 32-bit general register.

    cvtsi2ss xmm0,eax          ; convert integer to single precision value

  "cvtps2pi" converts two packed single precision floating point values into
two packed double word integers and stores the result in the destination
operand, which should be an MMX register. The source operand can be a 64-bit
memory location or an SSE register; only the low quad word of the SSE register
is used. "cvttps2pi" performs a similar operation, except that truncation is
used to round the source values to integers; the rules for the operands are
the same.

    cvtps2pi mm0,xmm0          ; convert single precision values to integers

  "cvtss2si" converts a single precision floating point value into a double
word integer and stores the result in the destination operand, which should be
a 32-bit general register. The source operand can be a 32-bit memory location
or an SSE register; only the low double word of the SSE register is used.
"cvttss2si" performs a similar operation, except that truncation is used to
round the source value to an integer; the rules for the operands are the same.

    cvtss2si eax,xmm0          ; convert single precision value to integer

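  The difference between "cvtss2si" and "cvttss2si" shows up on values with a
fractional part. The C sketch below assumes the default rounding mode, round
to nearest with ties to even, for "cvtss2si". The helper names are
hypothetical. Values outside the 32-bit integer range and NaNs are not
modeled; for those the processor returns 80000000h.

```c
#include <stdint.h>

/* Floor of d as a 64-bit integer, without calling the math library. */
static int64_t floor_i64(double d)
{
    int64_t t = (int64_t)d;            /* C casts truncate toward zero */
    return (d < (double)t) ? t - 1 : t;
}

/* cvtss2si with the default rounding mode: nearest, ties to even. */
static int32_t model_cvtss2si(float x)
{
    int64_t f = floor_i64(x);
    double frac = (double)x - (double)f;   /* 0 <= frac < 1 */
    if (frac > 0.5) return (int32_t)(f + 1);
    if (frac < 0.5) return (int32_t)f;
    return (int32_t)((f % 2 == 0) ? f : f + 1);   /* tie: pick the even one */
}

/* cvttss2si: truncation toward zero. */
static int32_t model_cvttss2si(float x)
{
    return (int32_t)x;
}
```

For example, 2.5 converts to 2 and 3.5 to 4 with "cvtss2si", while
"cvttss2si" turns -2.7 into -2.
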
  "pextrw" copies the word in the source operand specified by the third
operand to the destination operand. The source operand must be an MMX
register, the destination operand must be a 32-bit general register (the high
word of the destination is cleared), and the third operand must be an 8-bit
immediate value.

    pextrw eax,mm0,1           ; extract word into eax

  "pinsrw" inserts a word from the source operand into the destination operand
at the location specified with the third operand, which must be an 8-bit
immediate value. The destination operand must be an MMX register, the source
operand can be a 16-bit memory location or a 32-bit general register (only the
low word of the register is used).

    pinsrw mm1,ebx,2           ; insert word from ebx

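  The word selection of "pextrw" and "pinsrw" can be modeled in C on a 64-bit
value standing for the MMX register, as below. The function names are
hypothetical. The sketch assumes that only the low two bits of the immediate
select one of the four words, which is how the processor treats MMX operands.

```c
#include <stdint.h>

/* Hypothetical model of pextrw r32,mm,imm8: the selected word,
   zero-extended to 32 bits. */
static uint32_t model_pextrw(uint64_t mm, uint8_t imm)
{
    unsigned shift = (imm & 3u) * 16u;
    return (uint32_t)((mm >> shift) & 0xFFFFu);
}

/* Hypothetical model of pinsrw mm,r32,imm8: replace the selected word
   with the low word of src, leaving the other three words unchanged. */
static uint64_t model_pinsrw(uint64_t mm, uint32_t src, uint8_t imm)
{
    unsigned shift = (imm & 3u) * 16u;
    uint64_t mask = (uint64_t)0xFFFFu << shift;
    return (mm & ~mask) | ((uint64_t)(src & 0xFFFFu) << shift);
}
```

In this model, "pextrw eax,mm0,1" places bits 16 to 31 of mm0 into the low
word of eax and clears the high word.
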
  "pavgb" and "pavgw" compute the average of packed unsigned bytes or words,
rounded up. "pmaxub" returns the maximum values of packed unsigned bytes,
"pminub" returns the minimum values of packed unsigned bytes, "pmaxsw" returns
the maximum values of packed signed words, "pminsw" returns the minimum values
of packed signed words. "pmulhuw" performs an unsigned multiplication of the
packed words and stores the high words of the results in the destination
operand. "psadbw" computes the absolute differences of packed unsigned bytes,
sums the differences, and stores the sum in the low word of the destination
operand. All these instructions follow the same rules for operands as the
general MMX operations described in the previous section.
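
  A scalar C model of "pavgb" and "psadbw" makes the rounding of the average
and the placement of the sum concrete. It is a sketch with hypothetical names,
working on eight unsigned byte lanes.

```c
#include <stdint.h>

/* Hypothetical model of pavgb: per-byte average, (a + b + 1) >> 1. */
static void model_pavgb(uint8_t dst[8], const uint8_t src[8])
{
    for (int i = 0; i < 8; i++)
        dst[i] = (uint8_t)((dst[i] + src[i] + 1) >> 1);
}

/* Hypothetical model of psadbw: the sum of absolute byte differences
   ends up in the low word, all other bits of the result are zero. */
static uint64_t model_psadbw(const uint8_t dst[8], const uint8_t src[8])
{
    unsigned sum = 0;
    for (int i = 0; i < 8; i++)
        sum += (dst[i] > src[i]) ? (unsigned)(dst[i] - src[i])
                                 : (unsigned)(src[i] - dst[i]);
    return (uint64_t)sum;   /* at most 8 * 255 = 2040, fits the low word */
}
```
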
  "pmovmskb" creates a mask made of the most significant bit of each byte in
the source operand and stores the result in the low byte of the destination
operand. The source operand must be an MMX register, the destination operand
must be a 32-bit general register.
"pshufw" inserts words from the source operand into the destination operand
from the locations specified with the third operand. The destination operand
must be an MMX register, the source operand can be a 64-bit memory location or
an MMX register, and the third operand must be an 8-bit immediate value
selecting which values will be moved into the destination operand, in a
similar way to the third operand of the "shufps" instruction: each two-bit
field of the immediate, starting from the lowest one, gives the number of the
source word placed in the corresponding word of the destination.
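Two common selector values, shown as a sketch (the registers are arbitrary):

```asm
        pshufw  mm0,mm1,00011011b ; reverse the words: word 0 of MM0 = word 3
                                  ; of MM1, ..., word 3 of MM0 = word 0 of MM1
        pshufw  mm2,mm1,0         ; copy word 0 of MM1 into all four words of MM2
```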
"movntq" moves the quad word from the source operand to memory using a
non-temporal hint to minimize cache pollution. The source operand should be an
MMX register, the destination operand should be a 64-bit memory location.
"movntps" stores packed single precision values from the SSE register to
memory using a non-temporal hint. The source operand should be an SSE
register, the destination operand should be a 128-bit memory location.
"maskmovq" stores selected bytes from the first operand into a 64-bit memory
location using a non-temporal hint. Both operands should be MMX registers; the
second operand selects which bytes from the source operand are written to
memory (a byte is written when the most significant bit of the corresponding
byte of the mask is set). The memory location is pointed to by the DI (or EDI)
register in the segment selected by DS.
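A sketch of streaming stores, assuming EDI points to a 16-byte aligned buffer (the alignment is required by "movntps") whose contents will not be read back soon. The "sfence" at the end, an instruction covered in this same section, orders these weakly-ordered stores before any later ones.

```asm
        movntq   [edi+8],mm0    ; 8 bytes, bypassing the caches where possible
        movntps  [edi+16],xmm1  ; 16 bytes; the address must be 16-byte aligned
        pcmpeqb  mm2,mm2        ; mask with every byte set to 0FFh
        maskmovq mm3,mm2        ; all mask bytes have the top bit set, so all
                                ; 8 bytes of MM3 are written to DS:EDI
        sfence                  ; complete these stores before later ones
```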
"prefetcht0", "prefetcht1", "prefetcht2" and "prefetchnta" fetch the line of
data from memory that contains the byte specified with the operand into a
specified location in the cache hierarchy. The operand should be an 8-bit
memory location.
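For example, a loop walking through large arrays might request data some distance ahead of the current position. The distance of 256 bytes below is an arbitrary assumption that would need tuning for a given processor.

```asm
        prefetcht0  byte [esi+256] ; data to be reused: into all cache levels
        prefetchnta byte [edi+256] ; data used once: minimize cache pollution
```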
"sfence" performs a serializing operation on all instructions storing to
memory that were issued prior to it. This instruction has no operands.
"ldmxcsr" loads the 32-bit memory operand into the MXCSR register. "stmxcsr"
stores the contents of MXCSR into a 32-bit memory operand.
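A sketch of temporarily switching SSE rounding to truncation. The variable names are hypothetical; the rounding control field occupies bits 13 and 14 of MXCSR, and the value 11b selects rounding toward zero.

```asm
        stmxcsr [saved_mxcsr]   ; remember the current MXCSR
        mov     eax,[saved_mxcsr]
        or      eax,6000h       ; rounding control (bits 13-14) = 11b
        mov     [work_mxcsr],eax
        ldmxcsr [work_mxcsr]    ; SSE operations now round toward zero
        ; ... SSE code relying on truncation ...
        ldmxcsr [saved_mxcsr]   ; restore the original settings

saved_mxcsr dd ?
work_mxcsr  dd ?
```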
"fxsave" saves the current state of the FPU, the MXCSR register, and all the
FPU and SSE registers to a 512-byte memory location specified in the
destination operand. "fxrstor" reloads data previously stored with the
"fxsave" instruction from the specified 512-byte memory location. The memory
operand for both these instructions must be aligned on a 16-byte boundary and
should be declared with no specified size.
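A minimal sketch; the label "fx_area" is hypothetical. A label followed by a colon has no size attached, which satisfies the rule above, and "align" provides the required alignment.

```asm
        fxsave  [fx_area]       ; store FPU, MMX and SSE state plus MXCSR
        ; ... code that may change the FPU or SSE state ...
        fxrstor [fx_area]       ; bring the saved state back

        align   16
fx_area:                        ; no size attached to this label
        rb      512
```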


2.1.16 SSE2 instructions

The SSE2 extension introduces operations on packed double precision floating
point values, extends the syntax of MMX instructions, and also adds some new
instructions.
"movapd" and "movupd" transfer a double quad word operand containing packed
double precision values from the source operand to the destination operand.
These instructions are analogous to "movaps" and "movups" and have the same
rules for operands.
"movlpd" moves a double precision value between memory and the low quad word
of an SSE register. "movhpd" moves a double precision value between memory
and the high quad word of an SSE register. These instructions are analogous
to "movlps" and "movhps" and have the same rules for operands.
"movmskpd" transfers the most significant bit of each of the two double
precision values in the SSE register into the low two bits of a general
register. This instruction is analogous to "movmskps" and has the same rules
for operands.
"movsd" transfers a double precision value between the source and destination
operands (only the low quad word is transferred). At least one of the operands
has to be an SSE register, and the other one can be an SSE register or a
64-bit memory location.
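A short sketch, assuming ESI and EDI point to double precision values. With operands present, this is the SSE2 instruction rather than the string instruction of the same name.

```asm
        movsd   xmm0,[esi]      ; load one double; high quad word of XMM0 cleared
        movsd   xmm1,xmm0       ; copy the low quad word, high quad word of XMM1 kept
        movsd   [edi],xmm1      ; store the low quad word to memory
```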
Arithmetic operations on double precision values are: "addpd", "addsd",
"subpd", "subsd", "mulpd", "mulsd", "divpd", "divsd", "sqrtpd", "sqrtsd",
"maxpd", "maxsd", "minpd", "minsd", and they are analogous to the arithmetic
operations on single precision values described in the previous section. When
the mnemonic ends with "pd" instead of "ps", the operation is performed on two
packed double precision values, but the rules for operands are the same. When
the mnemonic ends with "sd" instead of "ss", the source operand can be a
64-bit memory location or an SSE register, the destination operand must be an
SSE register, and the operation is performed on double precision values; only
the low quad words of the SSE registers are used in this case.
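The packed and scalar forms side by side, as a sketch; ESI is assumed to point to a 16-byte aligned pair of doubles and EDI to a single double.

```asm
        addpd   xmm0,[esi]       ; two additions; [esi] must be 16-byte aligned
        mulsd   xmm1,qword [edi] ; one multiplication, low quad words only
        sqrtpd  xmm2,xmm0        ; square roots of both doubles in XMM0
```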
"andpd", "andnpd", "orpd" and "xorpd" perform the logical operations on
packed double precision values. They are analogous to the SSE logical
operations on single precision values and have the same rules for operands.
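Two common idioms, shown as a sketch: clearing a register, and taking absolute values by clearing the sign bits. The constant "abs_mask" is hypothetical; it is defined with a colon label so that no size is attached to it.

```asm
        xorpd   xmm1,xmm1       ; both doubles in XMM1 = +0.0
        andpd   xmm0,[abs_mask] ; clear sign bits: absolute values of both doubles

        align   16
abs_mask:
        dq      7FFFFFFFFFFFFFFFh,7FFFFFFFFFFFFFFFh
```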
"cmppd" compares packed double precision values and returns a mask result
into the destination operand. This instruction is analogous to "cmpps" and has
the same rules for operands. "cmpsd" performs the same operation on double
precision values, but only the low quad word of the destination register is
affected, and in this case the source operand can be a 64-bit memory location
or an SSE register. Variants with only two operands are obtained by attaching
the condition mnemonic from table 2.3 to the "cmp" mnemonic and then attaching
"pd" or "sd" at the end.
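Each line of the following sketch is paired with its two-operand equivalent, assuming the usual condition numbering of table 2.3 (0 for "eq", 1 for "lt", and so on):

```asm
        cmppd   xmm0,xmm1,1       ; each quad word of XMM0 = all ones where
                                  ; XMM0 < XMM1, zero otherwise
        cmpltpd xmm0,xmm1         ; the same comparison, two-operand form
        cmpsd   xmm2,qword [esi],0 ; low quad word only, "equal" condition
        cmpeqsd xmm2,qword [esi]  ; the same, two-operand form
```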
"comisd" and "ucomisd" compare the double precision values and set the ZF,
PF and CF flags to show the result. The destination operand must be an SSE
register, the source operand can be a 64-bit memory location or an SSE
register.
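Because the flags are set as for an unsigned integer comparison, with all three set when the values are unordered, the unsigned conditional jumps can follow. The labels "unordered" and "less" in this sketch are hypothetical.

```asm
        ucomisd xmm0,xmm1       ; compare the low doubles of XMM0 and XMM1
        jp      unordered       ; PF=1: at least one of them is a NaN
        jb      less            ; CF=1: XMM0 < XMM1
```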
"shufpd" moves either of the two double precision values from the destination
operand into the low quad word of the destination operand, and either of the
two values from the source operand into the high quad word of the destination
operand. This instruction is analogous to "shufps" and has the same rules for
operands. Bit 0 of the third operand selects the value to be moved from the
destination operand, bit 1 selects the value to be moved from the source
operand, and the rest of the bits are reserved and must be zeroed.
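Two selector values as a sketch (the registers are arbitrary):

```asm
        shufpd  xmm0,xmm0,1     ; swap halves: low = old high, high = old low
        shufpd  xmm1,xmm2,10b   ; low = low of XMM1, high = high of XMM2
```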
"unpckhpd" performs an unpack of the high quad words from the source and
destination operands, "unpcklpd" performs an unpack of the low quad words from
the source and destination operands. They are analogous to "unpckhps" and
"unpcklps", and have the same rules for operands.
"cvtps2pd" converts two packed single precision floating point values to two
packed double precision floating point values; the destination operand must be
an SSE register, the source operand can be a 64-bit memory location or an SSE
register. "cvtpd2ps" converts two packed double precision floating point
values to two packed single precision floating point values; the destination
operand must be an SSE register, the source operand can be a 128-bit memory
location or an SSE register. "cvtss2sd" converts a single precision floating
point value to a double precision floating point value; the destination
operand must be an SSE register, the source operand can be a 32-bit memory
location or an SSE register. "cvtsd2ss" converts a double precision floating
point value to a single precision floating point value; the destination
operand must be an SSE register, the source operand can be a 64-bit memory
location or an SSE register.
"cvtpi2pd" converts two packed double word integers into packed double
precision floating point values; the destination operand must be an SSE
register, the source operand can be a 64-bit memory location or an MMX
register. "cvtsi2sd" converts a double word integer into a double precision
floating point value; the destination operand must be an SSE register, the
source operand can be a 32-bit memory location or a 32-bit general register.
"cvtpd2pi" converts packed double precision floating point values into two
packed double word integers; the destination operand should be an MMX
register, the source operand can be a 128-bit memory location or an SSE
register. "cvttpd2pi" performs a similar operation, except that truncation is
used to round the source values to integers; the rules for operands are the
same. "cvtsd2si" converts a double precision floating point value into a
double word integer; the destination operand should be a 32-bit general
register, the source operand can be a 64-bit memory location or an SSE
register. "cvttsd2si" performs a similar operation, except that truncation is
used to round the source value to an integer; the rules for operands are the
same.
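The difference between the rounding and truncating forms shows up with fractional values. The comments in this sketch assume the default MXCSR rounding mode (round to nearest) and that the double at [esi] is 2.75.

```asm
        cvtsd2si  eax,qword [esi] ; rounded to nearest: EAX = 3
        cvttsd2si ecx,qword [esi] ; truncated toward zero: ECX = 2
        cvtsi2sd  xmm0,ecx        ; back to double: low quad word of XMM0 = 2.0
```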
"cvtps2dq" and "cvttps2dq" convert packed single precision floating point
values to four packed double word integers, storing them in the destination
operand. "cvtpd2dq" and "cvttpd2dq" convert packed double precision floating
point values to two packed double word integers, storing the result in the low
quad word of the destination operand. "cvtdq2ps" converts four packed double
word integers to packed single precision floating point values. For all these
instructions the destination operand must be an SSE register, the source
operand can be a 128-bit memory location or an SSE register.
"cvtdq2pd" converts two packed double word integers from the source operand to
packed double precision floating point values; the source can be a 64-bit
memory location or an SSE register, and the destination has to be an SSE
register.
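A sketch of the packed integer conversions, assuming ESI points to two double words and EDI to four single precision values aligned on a 16-byte boundary:

```asm
        cvtdq2pd  xmm0,qword [esi] ; two double words -> two doubles
        cvttps2dq xmm1,[edi]       ; four floats -> four truncated double words
        cvtdq2ps  xmm1,xmm1        ; and back to floats, fractions now dropped
```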
"movdqa" and "movdqu" transfer a double quad word operand containing packed
integers from the source operand to the destination operand. At least one of
the operands has to be an SSE register, and the other one can be an SSE
register or a 128-bit memory location. Memory operands for the "movdqa"
instruction must be aligned on a 16-byte boundary; operands for the "movdqu"
instruction don't have to be aligned.
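For example, copying 16 bytes from a source that may be unaligned into a destination known to be aligned (ESI and EDI are assumptions of this sketch):

```asm
        movdqu  xmm0,[esi]      ; any alignment is allowed here
        movdqa  [edi],xmm0      ; faults unless EDI is a multiple of 16
```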
"movq2dq" moves the contents of the MMX source register to the low quad word
of the destination SSE register. "movdq2q" moves the low quad word from the
source SSE register to the destination MMX register.

    movq2dq xmm0,mm1    ; move from MMX register to SSE register
    movdq2q mm0,xmm1    ; move from SSE register to MMX register

All MMX instructions operating on the 64-bit packed integers (those with
mnemonics starting with "p") are extended to operate on 128-bit packed
integers located in SSE registers. The additional syntax uses an SSE register
where an MMX register was needed, and a 128-bit memory location or an SSE
register where a 64-bit memory location or an MMX register was needed. The
exception is the "pshufw" instruction, which doesn't allow the extended
syntax, but has two new variants: "pshufhw" and "pshuflw", which allow only
the extended syntax, and perform the same operation as "pshufw" on the high
or low quad words of the operands respectively. Also the new instruction
"pshufd" is introduced, which performs the same operation as "pshufw", but on
double words instead of words; it allows only the extended syntax.

    psubb xmm0,[esi]    ; subtract 16 packed bytes
    pextrw eax,xmm0,7   ; extract highest word into eax

"paddq" performs the addition of packed quad words, "psubq" performs the
subtraction of packed quad words, and "pmuludq" performs an unsigned
multiplication of the low double words of each pair of corresponding quad
words and returns the results in packed quad words. These instructions follow
the same rules for operands as the general MMX operations described in 2.1.14.
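As an illustration of "pmuludq", multiplying two 32-bit values gives the full 64-bit unsigned product in the low quad word. The choice of EAX and ECX as inputs and EDI as the destination address is arbitrary.

```asm
        movd    xmm0,eax        ; first factor, upper bits of XMM0 cleared
        movd    xmm1,ecx        ; second factor
        pmuludq xmm0,xmm1       ; low quad word of XMM0 = EAX * ECX (unsigned)
        movq    [edi],xmm0      ; store the 64-bit product
```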
"pslldq" and "psrldq" perform a logical shift left or right of the double
quad word in the destination operand by the number of bytes specified in the
source operand. The destination operand should be an SSE register, the source
operand should be an 8-bit immediate value.
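Because the count is given in bytes rather than bits, these shifts move whole lanes, as this sketch shows:

```asm
        psrldq  xmm0,8          ; high quad word moves to the low one, high is zeroed
        pslldq  xmm1,4          ; shift left by 32 bits, lowest double word zeroed
```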
"punpckhqdq" interleaves the high quad word of the source operand and the
high quad word of the destination operand and writes them to the destination
SSE register. "punpcklqdq" interleaves the low quad word of the source operand
and the low quad word of the destination operand and writes them to the
destination SSE register. The source operand can be a 128-bit memory location
or an SSE register.
"movntdq" stores packed integer data from the SSE register to memory using a
non-temporal hint. The source operand should be an SSE register, the
destination operand should be a 128-bit memory location. "movntpd" stores
packed double precision values from the SSE register to memory using a
non-temporal hint; the rules for operands are the same. "movnti" stores an
integer from a general register to memory using a non-temporal hint. The
source operand should be a 32-bit general register, the destination operand
should be a 32-bit memory location. "maskmovdqu" stores selected bytes from
the first operand into a 128-bit memory location using a non-temporal hint.
Both operands should be SSE registers; the second operand selects which bytes
from the source operand are written to memory. The memory location is pointed
to by the DI (or EDI) register in the segment selected by DS and does not
need to be aligned.
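A sketch of the SSE2 non-temporal stores, assuming EDI points to a 16-byte aligned buffer. As with the SSE forms, an "sfence" at the end orders these stores before later ones.

```asm
        movnti     [edi],eax      ; 32-bit store bypassing the caches where possible
        movntdq    [edi+16],xmm0  ; 16 bytes; the address must be 16-byte aligned
        maskmovdqu xmm1,xmm2      ; bytes of XMM1 go to DS:EDI where the top bit
                                  ; of the matching byte of XMM2 is set
        sfence                    ; complete these stores before later ones
```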
"clflush" writes back and invalidates the cache line associated with the
address of the byte specified with the operand, which should be an 8-bit
memory location. "lfence" performs a serializing operation on all instructions
loading from memory that were issued prior to it. "mfence" performs a
serializing operation on all instructions accessing memory that were issued
prior to it, and so it combines the functions of the "sfence" (described in
the previous section) and "lfence" instructions. The "lfence" and "mfence"
instructions have no operands.
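A sketch of flushing the line that holds a given byte and then fencing, assuming ESI holds the address:

```asm
        clflush byte [esi]      ; write back and invalidate the line holding [esi]
        mfence                  ; earlier memory accesses, including the flush,
                                ; complete before any later ones
```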


2.1.17 SSE3 instructions

Prescott technology introduced some new instructions to improve the
performance of SSE and SSE2; this extension is called SSE3.
"fisttp" behaves like the "fistp" instruction and accepts the same operands;
the only difference is that it always uses truncation, irrespective of the
rounding mode.
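For example, converting a double to an integer with truncation without touching the FPU control word (ESI and EDI are assumed to point to suitable variables):

```asm
        fld     qword [esi]     ; load a double onto the FPU stack
        fisttp  dword [edi]     ; store it as a truncated 32-bit integer and pop
```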
"movshdup" loads into the destination operand the 128-bit value obtained from
the source value of the same size by filling each quad word with two
duplicates of the value in its high double word. "movsldup" performs the same
action, except it duplicates the values of the low double words. The
destination operand should be an SSE register, the source operand can be an
SSE register or a 128-bit memory location.
"movddup" loads the 64-bit source value and duplicates it into the high and
low quad words of the destination operand. The destination operand should be
an SSE register, the source operand can be an SSE register or a 64-bit memory
location.
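With XMM1 holding the single precision values a0, a1, a2, a3 (lowest double word first), the duplicating loads give the following; ESI is assumed to point to a double.

```asm
        movshdup xmm0,xmm1        ; XMM0 = a1, a1, a3, a3
        movsldup xmm2,xmm1        ; XMM2 = a0, a0, a2, a2
        movddup  xmm3,qword [esi] ; both quad words of XMM3 = the double at [esi]
```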
"lddqu" is functionally equivalent to "movdqu" with memory as the source
operand, but it may improve performance when the source operand crosses a
cache line boundary. The destination operand has to be an SSE register, the
source operand must be a 128-bit memory location.
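A one-line sketch of the intended use, an unaligned load whose address (assumed here) may straddle two cache lines:

```asm
        lddqu   xmm0,[esi+3]    ; 16 bytes from any address, like movdqu
```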
"addsubps" performs single precision addition of the second and fourth pairs
and single precision subtraction of the first and third pairs of floating
point values in the operands. "addsubpd" performs double precision addition of
the second pair and double precision subtraction of the first pair of floating
point values in the operands. "haddps" performs the addition of the two single
precision values within each quad word of the source and destination
operands, and stores the results of such horizontal addition of values from
the destination operand into the low quad word of the destination operand,
and the results from the source operand into the high quad word of the
destination operand. "haddpd" performs the addition of the two double
precision values within each operand, and stores the result from the
destination operand into the low quad word of the destination operand, and
the result from the source operand into the high quad word of the destination
operand. All these instructions require the destination operand to be an SSE
register, while the source operand can be an SSE register or a 128-bit memory
location.
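The following small scalar model in C makes the lane arrangement of "addsubps" and "haddps" concrete. It is only an illustration under stated assumptions. A 128-bit SSE register is modeled as an array of four floats, with element 0 being the lowest double word. The function names are hypothetical and are not provided by flat assembler.

```c
/* Scalar models of "addsubps" and "haddps" (illustration only).
   A 128-bit register is modeled as float[4], element 0 = lowest dword. */

/* addsubps dst,src: subtract in elements 0 and 2, add in elements 1 and 3 */
void model_addsubps(float dst[4], const float src[4])
{
    dst[0] -= src[0];
    dst[1] += src[1];
    dst[2] -= src[2];
    dst[3] += src[3];
}

/* haddps dst,src: sums of adjacent pairs; the pairs from the destination
   fill the low quad word, the pairs from the source fill the high one */
void model_haddps(float dst[4], const float src[4])
{
    float r0 = dst[0] + dst[1];
    float r1 = dst[2] + dst[3];
    float r2 = src[0] + src[1];
    float r3 = src[2] + src[3];   /* temporaries allow dst == src */
    dst[0] = r0;
    dst[1] = r1;
    dst[2] = r2;
    dst[3] = r3;
}
```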
"monitor" sets up an address range for monitoring of write-back stores. It
needs its three operands to be the EAX, ECX and EDX registers, in that order.
"mwait" waits for a write-back store to the address range set up by the
"monitor" instruction. It uses two operands with additional parameters, the
first being the EAX and the second the ECX register.
The functionality of SSE3 is further extended by the set of Supplemental
SSE3 instructions (SSSE3). They generally follow the same rules for operands
as all the MMX operations extended by SSE.
"phaddw" and "phaddd" perform the horizontal addition of the pairs of
adjacent values from both the source and destination operand, and store the
sums into the destination (sums from the destination operand go into the
lower part of the destination register, sums from the source operand into
the upper part). They operate on 16-bit or 32-bit chunks, respectively.
"phaddsw" performs the same operation on signed 16-bit packed values, but the
result of each addition is saturated. "phsubw" and "phsubd" analogously
perform the horizontal subtraction of 16-bit or 32-bit packed values, and
"phsubsw" performs the horizontal subtraction of signed 16-bit packed values
with saturation.
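A scalar C model of the 128-bit forms of "phaddw" and "phaddsw" shows where the sums are placed. This is a sketch of the behavior, not of how the processor implements it. Registers are modeled as arrays of eight 16-bit values, with element 0 lowest. The function names are hypothetical.

```c
#include <stdint.h>

/* Clamp an int to the signed 16-bit range (signed saturation). */
static int16_t sat16(int v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* phaddw dst,src: pairwise sums, wrapping around on overflow */
void model_phaddw(int16_t dst[8], const int16_t src[8])
{
    int16_t r[8];
    for (int i = 0; i < 4; i++) {
        r[i]     = (int16_t)(dst[2 * i] + dst[2 * i + 1]); /* dst pairs: low half */
        r[i + 4] = (int16_t)(src[2 * i] + src[2 * i + 1]); /* src pairs: high half */
    }
    for (int i = 0; i < 8; i++)
        dst[i] = r[i];
}

/* phaddsw dst,src: the same pairing, but every sum is saturated */
void model_phaddsw(int16_t dst[8], const int16_t src[8])
{
    int16_t r[8];
    for (int i = 0; i < 4; i++) {
        r[i]     = sat16(dst[2 * i] + dst[2 * i + 1]);
        r[i + 4] = sat16(src[2 * i] + src[2 * i + 1]);
    }
    for (int i = 0; i < 8; i++)
        dst[i] = r[i];
}
```

The plain "phaddw" model relies on the usual wrap-around conversion to int16_t when a sum overflows. C leaves this conversion implementation-defined.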
|
"pabsb", "pabsw" and "pabsd" calculate the absolute value of each packed
signed value in the source operand and store the results into the destination
register. They operate on 8-bit, 16-bit and 32-bit elements, respectively.
"pmaddubsw" multiplies signed 8-bit values from the source operand with the
corresponding unsigned 8-bit values from the destination operand to produce
intermediate 16-bit values. Every adjacent pair of those intermediate values
is then added horizontally, and the 16-bit sums, saturated to the signed
range, are stored into the destination operand.
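The following scalar C sketch of "pmaddubsw" shows the mixed signedness of the inputs and the saturation of the pairwise sums. The array layout (element 0 lowest) and the names are assumptions made for this illustration. "out" stands for the new contents of the destination register.

```c
#include <stdint.h>

/* Clamp an int to the signed 16-bit range (signed saturation). */
static int16_t sat16(int v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* pmaddubsw: destination bytes are unsigned, source bytes are signed.
   out[] receives the eight 16-bit results written to the destination. */
void model_pmaddubsw(int16_t out[8], const uint8_t dst[16], const int8_t src[16])
{
    for (int i = 0; i < 8; i++) {
        int p0 = dst[2 * i] * src[2 * i];          /* unsigned x signed product */
        int p1 = dst[2 * i + 1] * src[2 * i + 1];
        out[i] = sat16(p0 + p1);                   /* saturated pair sum */
    }
}
```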
|
"pmulhrsw" multiplies corresponding 16-bit integers from the source and
destination operands to produce intermediate 32-bit values, and the 16 bits
next to the highest bit of each of those values are then rounded and packed
into the destination operand.
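For a single lane, "the 16 bits next to the highest bit ... are then rounded" corresponds to the computation in the C sketch below. The function name is hypothetical. The code assumes that right shifts of negative values are arithmetic, as they are with mainstream compilers; C leaves this implementation-defined.

```c
#include <stdint.h>

/* One 16-bit lane of "pmulhrsw" (illustration only). */
int16_t model_pmulhrsw_lane(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * b;   /* exact 32-bit product */
    int32_t t = (product >> 14) + 1;    /* keep bits 30..14, add rounding bit */
    return (int16_t)(t >> 1);           /* drop the rounding bit */
}
```

If both operands are read as Q15 fixed-point fractions, the result is their rounded product in the same format. For example, 16384 represents 0.5, and multiplying it by itself gives 8192, that is 0.25.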
|
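"pshufb" shuffles the bytes in the destination operand according to the mask
provided by the source operand. Each byte of the source operand is the index
of the byte of the original destination that is placed at the corresponding
position (only the low four bits of the index are used, or the low three bits
in the 64-bit MMX form). When the highest bit of a mask byte is set, the
corresponding byte of the result is zeroed.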
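Here is a scalar C model of the 128-bit form of "pshufb". It is only an illustration, and the function name is hypothetical. Every result byte is picked from a copy of the original destination, so the mask acts as a table of source indices.

```c
#include <stdint.h>

/* pshufb xmm,xmm/m128 (illustration only): each mask byte selects a byte
   of the original destination by its low four bits, or yields zero when
   its highest bit is set. */
void model_pshufb(uint8_t dst[16], const uint8_t mask[16])
{
    uint8_t orig[16];
    for (int i = 0; i < 16; i++)
        orig[i] = dst[i];                  /* read all bytes before writing */
    for (int i = 0; i < 16; i++)
        dst[i] = (mask[i] & 0x80) ? 0 : orig[mask[i] & 0x0F];
}
```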
|
"psignb", "psignw" and "psignd" operate on the 8-bit, 16-bit or 32-bit
integers in the destination operand, depending on the signs of the
corresponding values in the source. If the value in the source is negative,
the corresponding value in the destination register is negated. If the value
in the source is positive, the corresponding value is left unchanged. If the
value in the source is zero, the value in the destination is zeroed, too.
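The three cases can be expressed for a single 16-bit lane of "psignw" with the C sketch below. The name is hypothetical; "psignb" and "psignd" differ only in element size. The processor leaves -32768 unchanged when negating it. The model matches that only where conversion to int16_t wraps around, which is implementation-defined in C.

```c
#include <stdint.h>

/* One 16-bit lane of "psignw" (illustration only). */
int16_t model_psignw_lane(int16_t d, int16_t s)
{
    if (s < 0)
        return (int16_t)-d;   /* negative source: negate the destination */
    if (s == 0)
        return 0;             /* zero source: zero the destination */
    return d;                 /* positive source: leave it unchanged */
}
```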
|
"palignr" concatenates the destination operand (as the upper half) with the
source operand (as the lower half) to form an intermediate value of twice the
size. It shifts this value right by the number of bytes specified by the
third operand, which should be an 8-bit immediate value, and stores the low
64 or 128 bits of the result into the destination register. This is the only
SSSE3 instruction that takes three arguments.
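The following C sketch models the 128-bit form of "palignr" byte by byte, with bytes stored lowest first. It is an illustration under those assumptions, and the name is hypothetical. Shift counts beyond the 32 bytes of the intermediate value produce zero bytes.

```c
#include <stdint.h>

/* palignr xmm,xmm/m128,imm8 (illustration only). */
void model_palignr(uint8_t dst[16], const uint8_t src[16], unsigned imm)
{
    uint8_t cat[32];
    for (int i = 0; i < 16; i++) {
        cat[i] = src[i];              /* source forms the low half */
        cat[i + 16] = dst[i];         /* destination forms the high half */
    }
    for (unsigned i = 0; i < 16; i++) {
        unsigned k = i + imm;         /* byte-granular right shift */
        dst[i] = (k < 32) ? cat[k] : 0;
    }
}
```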
|

2.1.18 AMD 3DNow! instructions

The 3DNow! extension adds new MMX instructions to those described in 2.1.14,
and introduces operations on 64-bit packed floating point values, each
consisting of two single precision floating point values.
These instructions follow the same rules as the general MMX operations: the
destination operand should be an MMX register, and the source operand can be
an MMX register or a 64-bit memory location. "pavgusb" computes the rounded
averages of packed unsigned bytes. "pmulhrw" performs a signed multiplication
of the packed words, rounds the high word of each double word result and
stores them in the destination operand. "pi2fd" converts packed double word
integers into packed floating point values. "pf2id" converts packed floating
point values into packed double word integers using truncation. "pi2fw"
converts packed word integers into packed floating point values; only the low
word of each double word in the source operand is used. "pf2iw" converts
packed floating point values to packed word integers; the results are
extended to double words using sign extension. "pfadd" adds packed floating
point values. "pfsub" and "pfsubr" subtract packed floating point values: the
first one subtracts the source values from the destination values, the second
one subtracts the destination values from the source values. "pfmul"
multiplies packed floating point values. "pfacc" adds the low and high
floating point values of the destination operand, storing the result in the
low double word of the destination, and adds the low and high floating point
values of the source operand, storing the result in the high double word of
the destination. "pfnacc" subtracts the high floating point value of the
destination operand from the low one, storing the result in the low double
word of the destination, and subtracts the high floating point value of the
source operand from the low one, storing the result in the high double word
of the destination. "pfpnacc" subtracts the high floating point value of the
destination operand from the low one, storing the result in the low double
word of the destination, and adds the low and high floating point values of
the source operand, storing the result in the high double word of the
destination.
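The C sketch below summarizes the data flow of "pfacc", "pfnacc" and "pfpnacc". A 64-bit MMX register is modeled as two floats, with element 0 being the low double word. The names are hypothetical. The temporaries make the model behave correctly when both operands are the same register.

```c
/* 3DNow! accumulate instructions (illustration only). */

void model_pfacc(float d[2], const float s[2])
{
    float lo = d[0] + d[1];   /* destination low + high */
    float hi = s[0] + s[1];   /* source low + high */
    d[0] = lo;
    d[1] = hi;
}

void model_pfnacc(float d[2], const float s[2])
{
    float lo = d[0] - d[1];   /* destination low - high */
    float hi = s[0] - s[1];   /* source low - high */
    d[0] = lo;
    d[1] = hi;
}

void model_pfpnacc(float d[2], const float s[2])
{
    float lo = d[0] - d[1];   /* destination low - high */
    float hi = s[0] + s[1];   /* source low + high */
    d[0] = lo;
    d[1] = hi;
}
```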
"pfmax" and "pfmin" compute the maximum and minimum of floating point values.
"pswapd" swaps the high and low double words of the source operand and stores
the result in the destination operand. "pfrcp" returns estimates of the
reciprocals of floating point values from the source operand, "pfrsqrt"
returns estimates of the reciprocal square roots of floating point values
from the source operand, "pfrcpit1" performs the first step in the
Newton-Raphson iteration to refine the reciprocal approximation produced by
the "pfrcp" instruction, "pfrsqit1" performs the first step in the
Newton-Raphson iteration to refine the reciprocal square root approximation
produced by the "pfrsqrt" instruction, and "pfrcpit2" performs the second,
final step in the Newton-Raphson iteration to refine either the reciprocal
approximation or the reciprocal square root approximation. "pfcmpeq",
"pfcmpge" and "pfcmpgt" compare the packed floating point values and set all
bits or zero all bits of the corresponding data element in the destination
operand according to the result of comparison: the first checks whether the
values are equal, the second checks whether the destination value is greater
than or equal to the source value, and the third checks whether the
destination value is greater than the source value.
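Mathematically, the refinement these instructions carry out is the Newton-Raphson iteration: x' = x(2 - bx) for the reciprocal of b, and x' = x(3 - bx^2)/2 for its reciprocal square root. The C sketch below applies these formulas in ordinary single precision to show how one step sharpens a rough estimate. It does not reproduce how the work is split between "pfrcpit1"/"pfrsqit1" and "pfrcpit2", or the precision of the hardware estimates. The function names are hypothetical.

```c
/* One Newton-Raphson step for 1/b (illustration only).
   If x has relative error e, the result has relative error -e*e
   (in exact arithmetic). */
float newton_recip_step(float b, float x)
{
    return x * (2.0f - b * x);
}

/* One Newton-Raphson step for 1/sqrt(b) (illustration only).
   A relative error e becomes roughly -1.5*e*e. */
float newton_rsqrt_step(float b, float x)
{
    return x * (3.0f - b * x * x) * 0.5f;
}
```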
"prefetch" and "prefetchw" load the line of data from memory that contains
the byte specified with the operand into the data cache. The "prefetchw"
instruction should be used when the data in the cache line is expected to be
modified, otherwise the "prefetch" instruction should be used. The operand
should be an 8-bit memory location.
"femms" performs a fast clear of the MMX state. This instruction has no
operands.


2.1.19 The x86-64 long mode instructions

The AMD64 and EM64T architectures (we will use the common name x86-64 for them
both) extend the x86 instruction set for 64-bit processing. While legacy and
compatibility modes use the same set of registers and instructions, the new
long mode extends the x86 operations to 64 bits and introduces several new
registers. You can turn on code generation for this mode with the "use64"
directive.
Each of the general purpose registers is extended to 64 bits, and eight
entirely new general purpose registers, as well as eight new SSE registers,
are added. See table 2.4 for the summary of new registers (only the ones that
were not listed in table 1.2). The general purpose registers of smaller sizes
are the low order portions of the larger ones. You can still access the "ah",
"bh", "ch" and "dh" registers in long mode, but you cannot use them in the
same instruction with any of the new registers.

Table 2.4 New registers in long mode
/--------------------------------------------------\
| Type |          General          |  SSE  |  AVX  |
|------|---------------------------|-------|-------|
| Bits |  8   |  16  |  32  |  64  |  128  |  256  |
|======|======|======|======|======|=======|=======|
|      |      |      |      | rax  |       |       |
|      |      |      |      | rcx  |       |       |
|      |      |      |      | rdx  |       |       |
|      |      |      |      | rbx  |       |       |
|      | spl  |      |      | rsp  |       |       |
|      | bpl  |      |      | rbp  |       |       |
|      | sil  |      |      | rsi  |       |       |
|      | dil  |      |      | rdi  |       |       |
|      | r8b  | r8w  | r8d  | r8   | xmm8  | ymm8  |
|      | r9b  | r9w  | r9d  | r9   | xmm9  | ymm9  |
|      | r10b | r10w | r10d | r10  | xmm10 | ymm10 |
|      | r11b | r11w | r11d | r11  | xmm11 | ymm11 |
|      | r12b | r12w | r12d | r12  | xmm12 | ymm12 |
|      | r13b | r13w | r13d | r13  | xmm13 | ymm13 |
|      | r14b | r14w | r14d | r14  | xmm14 | ymm14 |
|      | r15b | r15w | r15d | r15  | xmm15 | ymm15 |
\--------------------------------------------------/

In general, any instruction from the x86 architecture that allowed 16-bit or
32-bit operand sizes also allows 64-bit operands in long mode. The 64-bit
registers should be used for addressing in long mode. 32-bit addressing is
also allowed, but it is not possible to use addresses based on 16-bit
registers. Below are examples of new operations possible in long mode, using
the "mov" instruction:

    mov rax,r8      ; transfer 64-bit general register
    mov al,[rbx]    ; transfer memory addressed by 64-bit register

Long mode also uses instruction pointer based addresses. You can specify them
manually with the special RIP register symbol, but such addressing is also
generated automatically by flat assembler, since there is no 64-bit absolute
addressing in long mode. You can still force the assembler to use 32-bit
absolute addressing by putting the "dword" size override for the address
inside the square brackets. There is one exception where 64-bit absolute
addressing is possible: the "mov" instruction with one operand being the
accumulator register and the other being a memory operand. To force the
assembler to use 64-bit absolute addressing there, use the "qword" size
operator for the address inside the square brackets. When no size operator is
applied to the address, the assembler generates the optimal form
automatically.

    mov [qword 0],rax       ; absolute 64-bit addressing
    mov [dword 0],r15d      ; absolute 32-bit addressing
    mov [0],rsi             ; automatic RIP-relative addressing
    mov [rip+3],sil         ; manual RIP-relative addressing

Also, only signed 32-bit values are possible as immediate operands for 64-bit
operations. The only exception is the "mov" instruction with a 64-bit general
purpose register as the destination operand. Trying to force a 64-bit
immediate with any other instruction will cause an error.
If any operation is performed on the 32-bit general registers in long mode,
the upper 32 bits of the 64-bit registers containing them are filled with
zeros. This is unlike the operations on 16-bit or 8-bit portions of those
registers, which preserve the upper bits.
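The effect of writing a sub-register on the full 64-bit register can be modeled with simple masking in C. The function names below are hypothetical and serve only to illustrate the rule stated above.

```c
#include <stdint.h>

/* Write to a 32-bit register such as EAX: the upper 32 bits become zero. */
uint64_t write32(uint64_t full, uint32_t value)
{
    (void)full;                 /* old contents are discarded entirely */
    return (uint64_t)value;
}

/* Write to a 16-bit register such as AX: bits 16-63 are preserved. */
uint64_t write16(uint64_t full, uint16_t value)
{
    return (full & ~(uint64_t)0xFFFF) | value;
}

/* Write to a low 8-bit register such as AL: bits 8-63 are preserved. */
uint64_t write8(uint64_t full, uint8_t value)
{
    return (full & ~(uint64_t)0xFF) | value;
}
```

So "mov eax,ebx" also clears the upper half of RAX, while "mov ax,bx" leaves bits 16-63 of RAX untouched.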
Three new type conversion instructions are available. "cdqe" sign extends
the double word in EAX into a quad word and stores the result in the RAX
register. "cqo" sign extends the quad word in RAX into a double quad word and
stores the extra bits in the RDX register. These instructions have no
operands. "movsxd" sign extends the double word source operand, being either
a 32-bit register or memory, into the 64-bit destination operand, which has
to be a register. No analogous instruction is needed for zero extension,
since it is done automatically by any operation on 32-bit registers, as noted
in the previous paragraph. The "movzx" and "movsx" instructions, conforming
to the general rule, can also be used with a 64-bit destination operand,
allowing extension of byte or word values into quad words.
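The three sign extensions can be modeled in C as shown below. This is an illustrative sketch with hypothetical names. The sign bit is copied explicitly rather than through signed conversions, because C leaves the behavior of those conversions implementation-defined for out-of-range values.

```c
#include <stdint.h>

/* Copy bit 31 of a 32-bit value into bits 32-63 (sign extension). */
static uint64_t sign_extend32(uint32_t v)
{
    uint64_t r = v;
    if (r & 0x80000000u)
        r |= UINT64_C(0xFFFFFFFF00000000);
    return r;
}

/* cdqe: RAX = sign-extended EAX (the old upper half of RAX is ignored) */
uint64_t model_cdqe(uint64_t rax)
{
    return sign_extend32((uint32_t)rax);
}

/* cqo: RDX = all zeros or all ones, according to the sign bit of RAX */
uint64_t model_cqo_rdx(uint64_t rax)
{
    return (rax >> 63) ? UINT64_MAX : 0;
}

/* movsxd r64,r/m32: sign extend the 32-bit source into 64 bits */
uint64_t model_movsxd(uint32_t src)
{
    return sign_extend32(src);
}
```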
All the binary arithmetic and logical instructions have been promoted to
allow 64-bit operands in long mode. The use of decimal arithmetic instructions
in long mode is prohibited.
The stack operations, like "push" and "pop", default to 64-bit operands in
long mode, and it is not possible to use 32-bit operands with them. The
"pusha" and "popa" instructions are disallowed in long mode.
The indirect near jumps and calls in long mode default to 64-bit operands,
and it is not possible to use 32-bit operands with them. On the other hand,
the indirect far jumps and calls allow any operands that were allowed by the
x86 architecture. An 80-bit memory operand is also allowed (though only EM64T
seems to implement such a variant), with the first eight bytes defining the
offset and the last two bytes specifying the selector. The direct far jumps
and calls are not allowed in long mode.
The I/O instructions "in", "out", "ins" and "outs" are the exceptional
instructions that are not extended to accept quad word operands in long mode.
All other string operations are extended, however, and new short forms
"movsq", "cmpsq", "scasq", "lodsq" and "stosq" are introduced for the variants
of string operations with 64-bit string elements. The RSI and RDI registers
are used by default to address the string elements.
The "lfs", "lgs" and "lss" instructions are extended to accept an 80-bit
source memory operand with a 64-bit destination register (though only EM64T
seems to implement such a variant). The "lds" and "les" instructions are
disallowed in long mode.
The system instructions like "lgdt", which required a 48-bit memory operand,
require an 80-bit memory operand in long mode.
"cmpxchg16b" is the 64-bit equivalent of the "cmpxchg8b" instruction. It uses
a double quad word memory operand and 64-bit registers to perform the
analogous operation.
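In detail, the analogous operation compares RDX:RAX with the 16-byte memory operand. If they are equal, RCX:RBX is stored into memory and ZF is set. Otherwise the memory value is loaded into RDX:RAX and ZF is cleared. The C sketch below models only this data flow. It is not atomic; the processor makes the instruction atomic when it is used with the "lock" prefix. The type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* A 128-bit value as two 64-bit halves (hypothetical helper type). */
typedef struct { uint64_t lo, hi; } u128;

/* Non-atomic model of "cmpxchg16b m128" (illustration only).
   rdx:rax holds the expected value, rcx:rbx the replacement.
   Returns the resulting state of ZF. */
bool model_cmpxchg16b(u128 *mem, uint64_t *rax, uint64_t *rdx,
                      uint64_t rbx, uint64_t rcx)
{
    if (mem->lo == *rax && mem->hi == *rdx) {
        mem->lo = rbx;          /* match: store RCX:RBX */
        mem->hi = rcx;
        return true;            /* ZF = 1 */
    }
    *rax = mem->lo;             /* mismatch: load memory into RDX:RAX */
    *rdx = mem->hi;
    return false;               /* ZF = 0 */
}
```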
The "fxsave64" and "fxrstor64" instructions are new variants of "fxsave" and
"fxrstor", available only in long mode, which use a different format of the
storage area in order to store some pointers in full 64-bit size.
"swapgs" is a new instruction which swaps the base address of the GS segment
with the contents of the KernelGSbase model-specific register (MSR address
0C0000102h).
"syscall" and "sysret" are a pair of new instructions that provide
functionality similar to "sysenter" and "sysexit" in long mode, where the
latter pair is disallowed. The "sysexitq" and "sysretq" mnemonics provide the
64-bit versions of the "sysexit" and "sysret" instructions.
The "rdmsrq" and "wrmsrq" mnemonics are the 64-bit variants of the "rdmsr"
and "wrmsr" instructions.
|

2.1.20 SSE4 instructions

There are actually three different sets of instructions under the name SSE4.
Intel designed two of them, SSE4.1 and SSE4.2, with the latter extending the
former into Intel's full SSE4 set. On the other hand, the implementation by
AMD includes only a few instructions from this set, but also contains some
additional instructions, which are called the SSE4a set.
The SSE4.1 instructions mostly follow the same rules for operands as the
basic SSE operations, so they require the destination operand to be an SSE
register and the source operand to be a 128-bit memory location or an SSE
register. Some operations require a third operand, the 8-bit immediate
value.
|
"pmulld" performs a signed multiplication of the packed double words and
stores the low double words of the results in the destination operand.
"pmuldq" performs two signed multiplications of the corresponding double
words taken from the low halves of the two quad words of the operands (that
is, the first and third double words), and stores the results as packed quad
words into the destination register. "pminsb" and "pmaxsb" return the minimum
or maximum values of packed signed bytes, "pminuw" and "pmaxuw" return the
minimum and maximum values of packed unsigned words, and "pminud", "pmaxud",
"pminsd" and "pmaxsd" return minimum or maximum values of packed unsigned or
signed double words. These instructions complement the instructions computing
packed minimum or maximum introduced by SSE.
|
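A small model may make the element selection of "pmulld" and "pmuldq" easier to see. The Python sketch below is not fasm code, and its function names are hypothetical. It assumes that an operand is given as a list of four double words, lowest first. The results are returned as unsigned integers of the destination element size.

```python
def to_signed32(x):
    """Interpret the low 32 bits of x as a signed double word."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def pmulld(dst, src):
    """Four signed products, keeping only the low double word of each."""
    return [(to_signed32(a) * to_signed32(b)) & 0xFFFFFFFF
            for a, b in zip(dst, src)]

def pmuldq(dst, src):
    """Two signed products of double words 0 and 2 (the low half of each
    quad word), each kept as a full 64-bit result."""
    return [(to_signed32(dst[i]) * to_signed32(src[i])) & 0xFFFFFFFFFFFFFFFF
            for i in (0, 2)]
```

Double words 1 and 3 do not take part in "pmuldq" at all. This is why it produces only two results, each twice as wide as its inputs.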
"ptest" sets the ZF flag to one when the result of the bitwise AND of both
operands is zero, and clears ZF otherwise. It also sets the CF flag to one
when the result of the bitwise AND of the source operand with the bitwise NOT
of the destination operand is zero, and clears CF otherwise.

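The following is a minimal Python sketch of the flag logic. It assumes both operands are given as 128-bit unsigned integers. Following Intel's definition, ZF reflects DEST AND SRC, and CF reflects SRC AND NOT DEST. The function name is illustrative only.

```python
MASK128 = (1 << 128) - 1

def ptest(dst, src):
    """Return (ZF, CF) as set by ptest for two 128-bit operands."""
    zf = int(((dst & src) & MASK128) == 0)    # no bit set in both operands
    cf = int(((src & ~dst) & MASK128) == 0)   # every source bit also set in dst
    return zf, cf
```

When both operands are the same register, as in "ptest xmm0,xmm0", CF is always set and ZF tells whether the register is entirely zero.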
"pcmpeqq" compares packed quad words for equality, and fills the
corresponding elements of the destination operand with either all ones or
all zeros, depending on the result of the comparison.

"packusdw" converts packed signed double words from both the source and
destination operand into unsigned words using saturation, and stores the
eight resulting word values in the destination register.

"phminposuw" finds the minimum unsigned word value in the source operand and
places it into the lowest word of the destination operand. It stores the
index of that word in bits 16-18 of the destination, and sets the remaining
upper bits of the destination to zero.

"roundps", "roundss", "roundpd" and "roundsd" round packed or individual
floating point values of single or double precision, using the rounding mode
specified by the third operand.

    roundsd xmm0,xmm1,0011b ; round toward zero

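The rounding control in the immediate can be modelled as shown below. This sketch assumes the encoding described in Intel's documentation:

- bits 0-1 select the rounding mode;
- bit 2, when set, makes the instruction use the MXCSR rounding mode instead;
- bit 3 suppresses the precision exception.

The sketch handles only finite values with bit 2 clear, and it does not model the sign of a zero result. The function name is hypothetical.

```python
import math

def round_with_imm(value, imm):
    """Round a finite float as roundss/roundsd would, for imm with bit 2 clear."""
    if imm & 0b100:
        raise ValueError("bit 2 set: mode comes from MXCSR, not modelled here")
    mode = imm & 0b11
    if mode == 0b00:
        return float(round(value))        # nearest, ties to even
    if mode == 0b01:
        return float(math.floor(value))   # toward negative infinity
    if mode == 0b10:
        return float(math.ceil(value))    # toward positive infinity
    return float(math.trunc(value))       # 0b11: toward zero
```

With 0011b, as in the example above, the mode field is 11b, which truncates toward zero.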
"dpps" calculates the dot product of packed single precision floating point
values: it multiplies the corresponding pairs of values from the source and
destination operand and then sums the products. The high four bits of the
8-bit immediate third operand control which products are calculated and
added to the sum. The low four bits control into which elements of the
destination the resulting dot product is copied (the other elements are
filled with zero). "dppd" calculates the dot product of packed double
precision floating point values. Bits 4 and 5 of the third operand control
which products are calculated and added, and bits 0 and 1 of this value
control which elements in the destination register are filled with the
result. "mpsadbw" calculates multiple sums of absolute differences of
unsigned bytes. The third operand controls two things. The value in bits 0-1
selects which of the four-byte blocks in the source operand is used to
calculate the absolute differences. The value in bit 2 selects at which of
the first two four-byte blocks in the destination operand the calculation of
the sums starts. Each sum is calculated from the four absolute differences
between corresponding unsigned bytes in the source and destination blocks.
Each next sum is calculated in the same way, but takes the four bytes from
the destination at the position one byte after the position of the previous
block. The four bytes from the source stay the same each time. In this way
eight sums of absolute differences are calculated and stored as packed word
values in the destination operand. The instructions described in this
paragraph follow the same rules for operands as the "roundps" instruction.

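The block-sliding scheme of "mpsadbw" is perhaps clearest as a loop. This Python sketch assumes that 16-byte operands are given as bytes objects, lowest byte first, and it returns the eight word sums. The function name is hypothetical.

```python
def mpsadbw(dst, src, imm):
    """Eight sums of absolute byte differences, as produced by mpsadbw.

    dst, src: 16-byte bytes objects, lowest byte first.
    imm: bits 0-1 select the source block, bit 2 the starting destination block.
    """
    if len(dst) != 16 or len(src) != 16:
        raise ValueError("operands must be 16 bytes")
    first = (imm & 3) * 4
    src_block = src[first:first + 4]          # fixed four source bytes
    start = ((imm >> 2) & 1) * 4              # byte 0 or byte 4 of dst
    sums = []
    for i in range(8):                        # slide one byte each time
        window = dst[start + i:start + i + 4]
        sums.append(sum(abs(d - s) for d, s in zip(window, src_block)))
    return sums
```

The last window starts at byte 11 of the destination at most, so all eleven bytes following the chosen starting block are used across the eight sums.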
"blendps", "blendvps", "blendpd" and "blendvpd" conditionally copy the
values from the source operand into the destination operand, depending on
the bits of the mask provided by the third operand. If a mask bit is set,
the corresponding element of the source is copied into the same place in the
destination; otherwise this position in the destination is left unchanged.
The rules for the first two operands are the same as for general SSE
instructions. "blendps" and "blendpd" need the third operand to be an 8-bit
immediate, and they operate on single or double precision values,
respectively. "blendvps" and "blendvpd" require the third operand to be the
XMM0 register, in which the sign bit of each element serves as the mask bit.

    blendvps xmm3,xmm7,xmm0 ; blend according to mask

"pblendw" conditionally copies word elements from the source operand into
the destination, depending on the bits of the mask provided by the third
operand, which needs to be an 8-bit immediate value. "pblendvb"
conditionally copies byte elements from the source operand into the
destination, depending on the mask defined by the third operand, which has
to be the XMM0 register. These instructions follow the same rules for
operands as the "blendps" and "blendvps" instructions, respectively.

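Both kinds of blend can be sketched in Python, assuming operands are lists of elements, lowest first. The immediate forms use bit i of the mask to select element i. According to Intel's documentation, the variable forms look only at the most significant bit of each element of the mask register. The function names are illustrative.

```python
def blend_imm(dst, src, imm):
    """blendps/blendpd/pblendw style: bit i of imm picks element i from src."""
    return [s if (imm >> i) & 1 else d
            for i, (d, s) in enumerate(zip(dst, src))]

def blendv(dst, src, mask, bits):
    """blendvps/blendvpd/pblendvb style: the top bit of each mask element
    (elements are `bits` wide) picks the element from src."""
    top = 1 << (bits - 1)
    return [s if m & top else d for d, s, m in zip(dst, src, mask)]
```
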
"insertps" inserts a single precision floating point value into the
destination register. The value is taken from the position in the source
operand specified by bits 6-7 of the third operand. It is placed at the
location in the destination selected by bits 4-5 of the third operand.
Additionally, the low four bits of the third operand control which elements
in the destination register are set to zero. The first two operands follow
the same rules as for the general SSE operation, and the third operand
should be an 8-bit immediate.

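This Python sketch shows how the three fields of the "insertps" immediate interact. It assumes the register-source form, with operands given as lists of four floats, lowest first. The zeroing mask is applied after the insertion. The function name is hypothetical.

```python
def insertps(dst, src, imm):
    """Register-source insertps on four-element lists."""
    count_s = (imm >> 6) & 3      # which source element to take
    count_d = (imm >> 4) & 3      # where to put it in the destination
    zmask = imm & 0xF             # which destination elements to clear
    out = list(dst)
    out[count_d] = src[count_s]
    return [0.0 if (zmask >> i) & 1 else v for i, v in enumerate(out)]
```
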
"extractps" extracts a single precision floating point value from the
location in the source operand specified by the low two bits of the third
operand, and stores it in the destination operand. The destination can be a
32-bit memory location or a general purpose register, the source operand
must be an SSE register, and the third operand should be an 8-bit immediate
value.

    extractps edx,xmm3,3 ; extract the highest value

"pinsrb", "pinsrd" and "pinsrq" copy a byte, double word or quad word from
the source operand into the location in the destination operand determined
by the third operand. The destination operand has to be an SSE register. The
source operand can be a memory location of appropriate size or a 32-bit
general purpose register (a 64-bit general purpose register for "pinsrq",
which is only available in long mode). The third operand has to be an 8-bit
immediate value. These instructions complement the "pinsrw" instruction
operating on an SSE register destination, which was introduced by SSE2.

    pinsrd xmm4,eax,1 ; insert double word into second position

"pextrb", "pextrw", "pextrd" and "pextrq" copy a byte, word, double word or
quad word from the location in the source operand specified by the third
operand into the destination. The source operand should be an SSE register,
and the third operand should be an 8-bit immediate. The destination operand
can be a memory location of appropriate size or a 32-bit general purpose
register (a 64-bit general purpose register for "pextrq", which is only
available in long mode). The "pextrw" instruction with an SSE register as
source was already introduced by SSE2, but SSE4 extends it to allow a memory
operand as the destination.

    pextrw [ebx],xmm3,7 ; extract highest word into memory

"pmovsxbw" and "pmovzxbw" perform sign extension or zero extension of eight
byte values from the source operand into packed word values in the
destination operand, which has to be an SSE register. The source can be a
64-bit memory location or an SSE register; when it is a register, only its
low portion is used. "pmovsxbd" and "pmovzxbd" perform sign extension or
zero extension of four byte values from the source operand into packed
double word values in the destination operand; the source can be 32-bit
memory or an SSE register. "pmovsxbq" and "pmovzxbq" perform sign extension
or zero extension of two byte values from the source operand into packed
quad word values in the destination operand; the source can be 16-bit
memory or an SSE register. "pmovsxwd" and "pmovzxwd" perform sign extension
or zero extension of four word values from the source operand into packed
double words in the destination operand; the source can be 64-bit memory or
an SSE register. "pmovsxwq" and "pmovzxwq" perform sign extension or zero
extension of two word values from the source operand into packed quad words
in the destination operand; the source can be 32-bit memory or an SSE
register. "pmovsxdq" and "pmovzxdq" perform sign extension or zero extension
of two double word values from the source operand into packed quad words in
the destination operand; the source can be 64-bit memory or an SSE register.

    pmovzxbq xmm0,word [si] ; zero-extend bytes to quad words
    pmovsxwq xmm0,xmm1      ; sign-extend words to quad words

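The source sizes listed above all follow from one rule. The destination holds 128 bits, so the destination element size fixes the number of elements. The source then supplies that many elements of the smaller size. Here is a Python sketch of this rule. The function names are hypothetical, and the source is given as bytes in memory order.

```python
def source_bits(from_size, to_size):
    """Size in bits of the memory operand for the 128-bit form."""
    return (16 // to_size) * from_size * 8

def pmov_extend(src, from_size, to_size, signed):
    """Extend packed elements of from_size bytes to to_size bytes.
    Returns the destination elements as Python integers."""
    count = 16 // to_size                # elements in a 128-bit destination
    needed = count * from_size           # bytes actually read from the source
    if len(src) < needed:
        raise ValueError(f"need {needed} source bytes")
    return [int.from_bytes(src[i * from_size:(i + 1) * from_size],
                           "little", signed=signed)
            for i in range(count)]
```

For example, "pmovzxbq" corresponds to `pmov_extend(src, 1, 8, False)`, which reads two bytes. This matches the 16-bit memory operand of the example above.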
"movntdqa" loads a double quad word from the source operand into the
destination using a non-temporal hint. The destination operand should be an
SSE register, and the source operand should be a 128-bit memory location.

SSE4.2, described below, not only adds some new operations on SSE registers,
but also introduces some completely new instructions operating on general
purpose registers only.

"pcmpistri" compares two zero-terminated (implicit length) strings provided
in its source and destination operands and generates an index stored in ECX.
"pcmpistrm" performs the same comparison and generates a mask stored in
XMM0. "pcmpestri" compares two strings of explicit lengths and generates an
index stored in ECX. The length is provided in EAX for the destination
operand and in EDX for the source operand. "pcmpestrm" performs the same
comparison and generates a mask stored in XMM0. The source and destination
operands follow the same rules as for general SSE instructions. The third
operand should be an 8-bit immediate value determining the details of the
performed operation; refer to the Intel documentation for information on
those details.

"pcmpgtq" compares packed signed quad words, and fills the corresponding
elements of the destination operand with either all ones or all zeros,
depending on whether the value in the destination is greater than the one in
the source. This instruction follows the same rules for operands as
"pcmpeqq".

"crc32" accumulates a CRC32 value for the source operand, using the CRC-32C
(Castagnoli) polynomial 11EDC6F41h. It starts with the initial value
provided by the destination operand, and stores the result in the
destination. Unless in long mode, the destination operand should be a 32-bit
general purpose register, and the source operand can be a byte, word or
double word register or memory location. In long mode the destination
operand can also be a 64-bit general purpose register; in that case the
source operand can be a byte or quad word register or memory location.

    crc32 eax,dl          ; accumulate CRC32 on byte value
    crc32 eax,word [ebx]  ; accumulate CRC32 on word value
    crc32 rax,qword [rbx] ; accumulate CRC32 on quad word value

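Below is a bit-at-a-time Python sketch of what the instruction accumulates. It assumes the following, based on Intel's documentation:

- the bit-reflected CRC-32C polynomial (82F63B78h in reflected form) is used;
- source bytes are consumed in memory order;
- there is no initial or final inversion.

The conventional CRC-32C checksum is therefore obtained by starting from 0FFFFFFFFh and inverting the final value. The function names are hypothetical.

```python
def crc32_instruction(crc, value, size):
    """Fold a size-byte source value into crc, like "crc32 reg,src"."""
    crc &= 0xFFFFFFFF
    for byte in value.to_bytes(size, "little"):   # memory order
        crc ^= byte
        for _ in range(8):                        # reflected, LSB first
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc

def crc32c(data):
    """Conventional CRC-32C: start from all ones, invert at the end."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = crc32_instruction(crc, byte, 1)
    return crc ^ 0xFFFFFFFF
```

In assembly, the same checksum would correspond to these steps:

1. Load 0FFFFFFFFh into EAX.
2. Accumulate each byte with an instruction such as "crc32 eax,byte [esi]".
3. Finish with "not eax".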
"popcnt" calculates the number of bits set in the source operand, which can
be a 16-bit, 32-bit or 64-bit general purpose register or memory location.
It stores this count in the destination operand, which has to be a register
of the same size as the source operand. The 64-bit variant is available only
in long mode.

    popcnt ecx,eax ; count bits set to 1

The SSE4a extension also includes the "popcnt" instruction introduced by
SSE4.2, and in addition adds the "lzcnt" instruction. "lzcnt" follows the
same syntax and calculates the count of leading zero bits in the source
operand. If the source operand is all zero bits, the total number of bits in
the source operand is stored in the destination.

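This Python sketch computes both counts for an operand of a given bit width. The function names are illustrative. Note that "lzcnt" of zero yields the operand size, as stated above.

```python
def popcnt(value, bits):
    """Number of set bits in the low `bits` bits of value."""
    return bin(value & ((1 << bits) - 1)).count("1")

def lzcnt(value, bits):
    """Leading zero bits of a `bits`-wide operand; equals bits for zero."""
    value &= (1 << bits) - 1
    return bits - value.bit_length()
```
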
"extrq" extracts a sequence of bits from the low quad word of the SSE
register provided as the first operand and stores them at the low end of
this register, filling the remaining bits in the low quad word with zeros.
The length of the bit string and its position can be provided in one of two
ways. They can be given as two 8-bit immediate values, the second and third
operand respectively. Alternatively, they can be given by an SSE register as
the second operand, with no third operand. That register should contain the
position value in bits 8-13 and the length of the bit string in bits 0-5.

    extrq xmm0,8,7  ; extract 8 bits from position 7
    extrq xmm0,xmm5 ; extract bits defined by register

"insertq" writes a sequence of bits from the low quad word of the source
operand into the specified position in the low quad word of the destination
operand, leaving the other bits in the low quad word of the destination
intact. The length of the bit string and the position where it should be
written can be provided in one of two ways. They can be given as two 8-bit
immediate values, the third and fourth operand respectively. Alternatively,
they can be given by bit fields in the source operand, with only two
operands. In that case the source should contain the position value in bits
72-77 and the length of the bit string in bits 64-69.

    insertq xmm1,xmm0,4,2 ; insert 4 bits at position 2
    insertq xmm1,xmm0     ; insert bits defined by register

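The bit-field arithmetic of "extrq" and "insertq" can be sketched in Python on 64-bit integers that stand for the low quad words. Following AMD's documentation, the sketch assumes that:

- only the low six bits of each field are used;
- a length of zero means 64 bits.

Fields that extend past bit 63 are undefined on hardware and are not meaningful here. All names are hypothetical.

```python
MASK64 = (1 << 64) - 1

def field_length(length):
    """Six-bit length field; zero stands for 64."""
    length &= 0x3F
    return 64 if length == 0 else length

def extrq(qword, length, index):
    """Bits [index, index+length) of qword, moved to the low end."""
    n, i = field_length(length), index & 0x3F
    return (qword >> i) & ((1 << n) - 1)

def insertq(dst, src, length, index):
    """Low `length` bits of src written into dst at bit `index`."""
    n, i = field_length(length), index & 0x3F
    mask = (((1 << n) - 1) << i) & MASK64
    return (dst & ~mask & MASK64) | ((src << i) & mask)

def extrq_reg_fields(control_low_qword):
    """Register form of extrq: (length, position) from bits 0-5 and 8-13."""
    return control_low_qword & 0x3F, (control_low_qword >> 8) & 0x3F

def insertq_reg_fields(src_high_qword):
    """Register form of insertq: bits 64-69 and 72-77 of the source are
    bits 0-5 and 8-13 of its high quad word."""
    return src_high_qword & 0x3F, (src_high_qword >> 8) & 0x3F
```
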
"movntss" and "movntsd" store a single or double precision floating point
value from the source SSE register into a 32-bit or 64-bit destination
memory location respectively, using a non-temporal hint.


2.1.21 AVX instructions

The Advanced Vector Extensions introduce instructions that are new variants
of SSE instructions. They use a new encoding scheme that allows an extended
syntax, with a destination operand separate from all the source operands.
AVX also introduces 256-bit AVX registers, which extend the old 128-bit SSE
registers. Any AVX instruction that puts a result into an SSE register puts
zero bits into the high portion of the AVX register containing it.

The mnemonic of the AVX version of an SSE instruction is obtained by
prepending "v" to the SSE instruction name. Some SSE arithmetic instructions
used the destination operand also as one of the source values. For each of
these, the AVX variant has a new syntax with three operands: the destination
and two sources. The destination and first source are SSE registers, and the
second source can be an SSE register or memory. If the operation is
performed on a single pair of values, the remaining bits of the first source
SSE register are copied into the destination register.

    vsubss xmm0,xmm2,xmm3        ; subtract two 32-bit floats
    vmulsd xmm0,xmm7,qword [esi] ; multiply two 64-bit floats

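This Python sketch models two rules together: the merging rule for scalar operations, and the zeroing of the upper half of the AVX register. It assumes that an SSE register is a list of four single precision values, lowest first. The result shows the whole 256-bit register as eight values. The function name is hypothetical.

```python
def vex_scalar(op, src1, src2):
    """Full 256-bit result of a scalar AVX operation such as vsubss:
    element 0 is computed, elements 1-3 come from the first source,
    and elements 4-7 (the upper 128 bits) are zeroed."""
    low = [op(src1[0], src2[0])] + list(src1[1:4])
    return low + [0.0] * 4
```

For example, "vsubss xmm0,xmm2,xmm3" corresponds to `vex_scalar(lambda a, b: a - b, xmm2, xmm3)`.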
In the case of packed operations, each instruction can also operate on
256-bit data when AVX registers are specified instead of SSE registers. The
size of the memory operand is then doubled as well.

    vaddps ymm1,ymm5,yword [esi] ; eight sums of 32-bit float pairs

The instructions that operate on packed integer types (in particular the
ones that had earlier been promoted from MMX to SSE) also acquired the new
syntax with three operands. However, they are only allowed to operate on
128-bit packed types and thus cannot use the whole AVX registers.

    vpavgw xmm3,xmm0,xmm2 ; average of 16-bit integers
    vpslld xmm1,xmm0,1    ; shift double words left

If the SSE version of an instruction had a syntax with three operands, the
third one being an immediate value, the AVX version of such an instruction
takes four operands, with the immediate remaining the last one.

    vshufpd ymm0,ymm1,ymm2,10010011b ; shuffle 64-bit floats
    vpalignr xmm0,xmm4,xmm2,3        ; extract byte aligned value

The promotion to the new syntax according to the rules described above has
been applied to all the instructions from SSE extensions up to SSE4, with the
exceptions described below.

"vdppd" has its syntax extended to four operands, but it does not have a
256-bit version.

There are a few instructions, namely "vsqrtpd", "vsqrtps", "vrcpps" and
"vrsqrtps", which can operate on 256-bit data but have retained the syntax
with only two operands, because they use data from only one source:

    vsqrtpd ymm1,ymm0 ; put square roots into other register

In a similar way "vroundpd" and "vroundps" have retained the syntax with
three operands, the last one being an immediate value.

    vroundps ymm0,ymm1,0011b ; round toward zero

Some of the operations on packed integers also kept their two-operand or
three-operand syntax when promoted to the AVX version. In such cases these
instructions follow exactly the same rules for operands as their SSE
counterparts, since operations on packed integers do not have 256-bit
variants in the AVX extension. These include "vpcmpestri", "vpcmpestrm",
"vpcmpistri", "vpcmpistrm", "vphminposuw", "vpshufd", "vpshufhw" and
"vpshuflw". There are more instructions whose AVX versions keep exactly the
same syntax for operands as in SSE, without any additional options:
"vcomiss", "vcomisd", "vcvtss2si", "vcvtsd2si", "vcvttss2si", "vcvttsd2si",
"vextractps", "vpextrb", "vpextrw", "vpextrd", "vpextrq", "vmovd", "vmovq",
"vmovntdqa", "vmaskmovdqu", "vpmovmskb", "vpmovsxbw", "vpmovsxbd",
"vpmovsxbq", "vpmovsxwd", "vpmovsxwq", "vpmovsxdq", "vpmovzxbw",
"vpmovzxbd", "vpmovzxbq", "vpmovzxwd", "vpmovzxwq" and "vpmovzxdq".

Most of the move and conversion instructions have been promoted to allow
256-bit operands, in addition to the 128-bit variant with syntax identical
to that of the SSE version of the same instruction. Each of "vcvtdq2ps",
"vcvtps2dq", "vcvttps2dq", "vmovaps", "vmovapd", "vmovups", "vmovupd",
"vmovdqa", "vmovdqu", "vlddqu", "vmovntps", "vmovntpd", "vmovntdq",
"vmovsldup", "vmovshdup", "vmovmskps" and "vmovmskpd" inherits the 128-bit
syntax from SSE without any changes. Each also allows a new form with 256-bit
operands in place of 128-bit ones.

    vmovups [edi],ymm6 ; store unaligned 256-bit data

"vmovddup" has the same 128-bit syntax as its SSE version, and it also has a
256-bit version. The 256-bit version stores duplicates of the lowest quad
word from the source operand in the lower half of the destination operand.
In the upper half of the destination it stores duplicates of the low quad
word from the upper half of the source. Both source and destination operands
then need to be 256-bit values.

"vmovlhps" and "vmovhlps" have only 128-bit versions, and each takes three
operands, which must all be SSE registers. "vmovlhps" copies two single
precision values from the low quad word of the second source register to the
high quad word of the destination register. It also copies the low quad word
of the first source register into the low quad word of the destination
register. "vmovhlps" copies two single precision values from the high quad
word of the second source register to the low quad word of the destination
register. It also copies the high quad word of the first source register
into the high quad word of the destination register.

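This Python sketch shows the data movement of these two instructions. It assumes that each register is a list of four single precision values, lowest first. The function names are illustrative.

```python
def vmovlhps(src1, src2):
    """Low quad word from src1, high quad word from the low half of src2."""
    return [src1[0], src1[1], src2[0], src2[1]]

def vmovhlps(src1, src2):
    """Low quad word from the high half of src2, high quad word from src1."""
    return [src2[2], src2[3], src1[2], src1[3]]
```
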
"vmovlps", "vmovhps", "vmovlpd" and "vmovhpd" have only 128-bit versions,
and their syntax depends on whether the memory operand is the destination or
the source. When memory is the destination, the syntax is identical to that
of the equivalent SSE instruction. When memory is the source, the
instruction requires three operands: the first two are SSE registers and the
third one is 64-bit memory. The value put into the destination is then the
value copied from the first source, with either the low or the high quad
word replaced by the value from the second source (the memory operand).

    vmovhps [esi],xmm7      ; store upper half to memory
    vmovlps xmm0,xmm7,[ebx] ; low from memory, rest from register

"vmovss" and "vmovsd" have syntax identical to their SSE equivalents as long
as one of the operands is memory. The versions that operate purely on
registers require three operands, each being an SSE register. The value
stored in the destination is then the value copied from the first source,
with the lowest data element replaced by the lowest value from the second
source.

    vmovss xmm3,[edi]     ; low from memory, rest zeroed
    vmovss xmm0,xmm1,xmm2 ; one value from xmm2, three from xmm1

"vcvtss2sd", "vcvtsd2ss", "vcvtsi2ss" and "vcvtsi2sd" use the three-operand
syntax. The destination and first source are always SSE registers. The
second source follows the same rules as the source in the syntax of the
equivalent SSE instruction. The value stored in the destination is then the
value copied from the first source, with the lowest data element replaced by
the result of the conversion.

    vcvtsi2sd xmm4,xmm4,ecx ; 32-bit integer to 64-bit float
    vcvtsi2ss xmm0,xmm0,rax ; 64-bit integer to 32-bit float

"vcvtdq2pd" and "vcvtps2pd" allow the same syntax as their SSE equivalents.
They also allow new variants with an AVX register as destination and an SSE
register or 128-bit memory as source. Analogously, "vcvtpd2dq", "vcvttpd2dq"
and "vcvtpd2ps" allow the variant with syntax identical to the SSE version.
In addition, they allow a variant with an SSE register as destination and an
AVX register or 256-bit memory as source.

"vinsertps", "vpinsrb", "vpinsrw", "vpinsrd", "vpinsrq" and "vpblendw" use a
syntax with four operands. The destination and first source have to be SSE
registers. The third and fourth operand follow the same rules as the second
and third operand in the syntax of the equivalent SSE instruction. The value
stored in the destination is the value copied from the first source, with
some data elements replaced by values extracted from the second source. This
works analogously to the operation of the corresponding SSE instruction.

    vpinsrd xmm0,xmm0,eax,3 ; insert double word

"vblendvps", "vblendvpd" and "vpblendvb" use a new syntax with four
operands: the destination, two sources and a mask. All of them are
registers, except that the second source can also be a memory operand.
"vblendvps" and "vblendvpd" have a 256-bit variant, in which the operands are
AVX registers or 256-bit memory. They also have a 128-bit variant, with
operands that are SSE registers or 128-bit memory. "vpblendvb" has only a
128-bit variant. The value stored in the destination is the value copied
from the first source, with some data elements replaced, according to the
mask, by values from the second source.

    vblendvps ymm3,ymm1,ymm2,ymm7 ; blend according to mask

"vptest" allows the same syntax as its SSE version and also has a 256-bit
version, with both operands doubled in size. There are also two new
instructions, "vtestps" and "vtestpd", which perform analogous tests on only
the sign bits of the corresponding single precision or double precision
values, and set ZF and CF accordingly. They follow the same syntax rules as
"vptest".

    vptest ymm0,yword [ebx] ; test 256-bit values
    vtestpd xmm0,xmm1       ; test sign bits of 64-bit floats

"vbroadcastss", "vbroadcastsd" and "vbroadcastf128" are new instructions
which broadcast the data element defined by the source operand into all
elements of corresponding size in the destination register. "vbroadcastss"
needs the source to be 32-bit memory and the destination to be either an SSE
or an AVX register. "vbroadcastsd" requires 64-bit memory as source and an
AVX register as destination. "vbroadcastf128" requires 128-bit memory as
source and an AVX register as destination.

    vbroadcastss ymm0,dword [eax] ; get eight copies of value

"vinsertf128" is a new instruction which takes four operands. The
destination and first source have to be AVX registers. The second source can
be an SSE register or a 128-bit memory location, and the fourth operand
should be an immediate value. The instruction takes the contents of the
first source and replaces one of its 128-bit units with the value of the
second source. It stores the result in the destination. The lowest bit of
the fourth operand specifies at which position the replacement is done
(either 0 or 1).

"vextractf128" is a new instruction with three operands. The destination
needs to be an SSE register or a 128-bit memory location, the source must be
an AVX register, and the third operand should be an immediate value. It
extracts one of the 128-bit units from the source into the destination. The
lowest bit of the third operand specifies which unit is extracted.

"vmaskmovps" and "vmaskmovpd" are new instructions with three operands. They
selectively store in the destination the elements from the second source,
depending on the sign bits of the corresponding elements from the first
source. When loading, the elements not selected by the mask are set to zero.
These instructions can operate on either 128-bit data (SSE registers) or
256-bit data (AVX registers). Either the destination or the second source
has to be a memory location of appropriate size, and the two other operands
should be registers.

    vmaskmovps [edi],xmm0,xmm5 ; conditionally store
    vmaskmovpd ymm5,ymm0,[esi] ; conditionally load

"vpermilpd" and "vpermilps" are new instructions with three operands. They
permute the values from the first source according to the control fields
from the second source, and put the result into the destination operand.
They accept either three SSE registers or three AVX registers as operands,
and the second source can be memory of size equal to the registers used. In
an alternative form the second source can be an immediate value, and then
the first source can be a memory location of size equal to the destination
register.

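Here is a Python sketch of the selection performed by "vpermilps". It assumes that registers are lists of single precision values, four per 128-bit lane, lowest first. The sketch follows Intel's description:

- an element can only be taken from the same 128-bit lane;
- the immediate form reuses the same four 2-bit fields in every lane;
- the register form uses bits 0-1 of each control double word.

The names are hypothetical. "vpermilpd" uses different control bits and is not modelled here.

```python
def vpermilps_imm(src, imm):
    """Immediate form: 2-bit field i of imm picks element i of each lane."""
    out = []
    for lane in range(0, len(src), 4):
        for i in range(4):
            out.append(src[lane + ((imm >> (2 * i)) & 3)])
    return out

def vpermilps_var(src, control):
    """Register form: bits 0-1 of control[i] pick the element for position i
    from the same 128-bit lane."""
    return [src[(i // 4) * 4 + (control[i] & 3)] for i in range(len(src))]
```
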
"vperm2f128" is a new instruction with four operands. It selects 128-bit
blocks of floating point data from the first and second source according to
the bit fields from the fourth operand, and stores them in the destination.
The destination and first source need to be AVX registers. The second source
can be an AVX register or a 256-bit memory area, and the fourth operand
should be an immediate value.

    vperm2f128 ymm0,ymm6,ymm7,12h ; permute 128-bit blocks

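The bit fields of the "vperm2f128" immediate can be modelled as shown below, following Intel's documentation:

- bits 0-1 and 4-5 choose, from the four source blocks, the block for the low and the high half of the destination;
- bits 3 and 7 zero the corresponding half instead.

The sketch represents each 256-bit operand as a (low, high) pair and a zeroed half as 0. The function name is hypothetical.

```python
def vperm2f128(src1, src2, imm):
    """Return the (low, high) halves selected by the vperm2f128 immediate."""
    blocks = [src1[0], src1[1], src2[0], src2[1]]

    def pick(field, zero):
        return 0 if zero else blocks[field & 3]

    low = pick(imm & 3, imm & 0x08)
    high = pick((imm >> 4) & 3, imm & 0x80)
    return low, high
```

With 12h, the low half comes from the low block of the second source, and the high half from the high block of the first source.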
"vzeroall" sets all the AVX registers to zero. "vzeroupper" sets the upper
128-bit portions of all AVX registers to zero, leaving the SSE registers
intact. These new instructions take no operands.

"vldmxcsr" and "vstmxcsr" are the AVX versions of the "ldmxcsr" and
"stmxcsr" instructions. The rules for their operands remain unchanged.


2.1.22 AVX2 instructions

The AVX2 extension allows all the AVX instructions operating on packed
integers to use 256-bit data types, and introduces some new instructions as
well. The AVX instructions that operate on packed integers and had only
128-bit variants have been supplemented with 256-bit variants. As a result,
their syntax rules became analogous to those of the AVX instructions
operating on packed floating point types.

    vpsubb ymm0,ymm0,[esi] ; subtract 32 packed bytes
    vpavgw ymm3,ymm0,ymm2  ; average of 16-bit integers

However, some instructions have not been equipped with 256-bit variants.
"vpcmpestri", "vpcmpestrm", "vpcmpistri", "vpcmpistrm", "vpextrb", "vpextrw",
"vpextrd", "vpextrq", "vpinsrb", "vpinsrw", "vpinsrd", "vpinsrq" and
"vphminposuw" are not affected by AVX2 and allow only 128-bit operands.

In the packed shift instructions, the third operand specifies the shift
amount and may be an SSE register or a 128-bit memory location. Their
256-bit variants use the same rules for the third operand.

    vpsllw ymm2,ymm2,xmm4        ; shift words left
    vpsrad ymm0,ymm3,xword [ebx] ; shift double words right

- | 2737 | There are also new packed shift instructions with standard three-operand AVX |
|
- | 2738 | syntax, which shift each element from first source by the amount specified in |
|
- | 2739 | corresponding element of second source, and store the results in destination. |
|
- | 2740 | "vpsllvd" shifts 32-bit elements left, "vpsllvq" shifts 64-bit elements left, |
|
- | 2741 | "vpsrlvd" shifts 32-bit elements right logically, "vpsrlvq" shifts 64-bit |
|
- | 2742 | elements right logically and "vpsravd" shifts 32-bit elements right |
|
- | 2743 | arithmetically. |
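The per-element behaviour of the variable shifts can be sketched as a C model. The function names are hypothetical and not part of flat assembler. The handling of counts above 31 follows Intel's description of these instructions, not anything stated in this manual.

```c
#include <stdint.h>

/* Hypothetical per-lane models of vpsllvd, vpsrlvd and vpsravd.
   Counts above 31 are out of range: the logical shifts then give 0,
   and the arithmetic shift fills the lane with copies of the sign bit. */
uint32_t sllv32(uint32_t x, uint32_t count)
{
    return count > 31 ? 0 : x << count;
}

uint32_t srlv32(uint32_t x, uint32_t count)
{
    return count > 31 ? 0 : x >> count;
}

int32_t srav32(int32_t x, uint32_t count)
{
    if (count > 31)
        count = 31;
    /* ">>" of a negative value is implementation-defined in C,
       so the sign fill is done explicitly through the complement */
    return x < 0 ? ~(~x >> count) : x >> count;
}

/* "vpsllvd ymm_dst,ymm_src,ymm_cnt": each of the eight lanes is
   shifted independently by its own count */
void vpsllvd_model(uint32_t dst[8], const uint32_t src[8], const uint32_t cnt[8])
{
    for (int i = 0; i < 8; i++)
        dst[i] = sllv32(src[i], cnt[i]);
}
```

Because every lane uses its own count, a single instruction can apply eight different shift amounts at once.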
|
  The sign-extend and zero-extend instructions, which in their AVX versions
allowed the source operand to be an SSE register or memory of a specific size,
in the new 256-bit variants need an SSE register or memory of twice that size
as the source, and an AVX register as the destination.

        vpmovzxbq ymm0,dword [esi]   ; bytes to quad words
|
  "vmovntdqa" has also been upgraded with a 256-bit variant, so it can
transfer a 256-bit value from memory to an AVX register; the memory address
must be aligned to 32 bytes.
  "vpmaskmovd" and "vpmaskmovq" are new instructions with syntax identical to
"vmaskmovps" or "vmaskmovpd", and they perform the analogous operation on
packed 32-bit or 64-bit values.
  "vinserti128", "vextracti128", "vbroadcasti128" and "vperm2i128" are new
instructions with syntax identical to "vinsertf128", "vextractf128",
"vbroadcastf128" and "vperm2f128" respectively, and they perform analogous
operations on 128-bit blocks of integer data.
  "vbroadcastss" and "vbroadcastsd" instructions have been extended to allow
an SSE register as the source operand (which in AVX could only be memory).
  "vpbroadcastb", "vpbroadcastw", "vpbroadcastd" and "vpbroadcastq" are new
instructions which broadcast the byte, word, double word or quad word from the
source operand into all elements of corresponding size in the destination
register. The destination operand can be either an SSE or AVX register, and
the source operand can be an SSE register or memory of size equal to the size
of the data element.

        vpbroadcastb ymm0,byte [ebx] ; get 32 identical bytes
|
  "vpermd" and "vpermps" are new three-operand instructions, which use each
32-bit element of the first source as the index of an element in the second
source, which is copied into the destination at the position corresponding to
the element containing the index. The destination and first source have to be
AVX registers, and the second source can be an AVX register or 256-bit memory.
  "vpermq" and "vpermpd" are new three-operand instructions, which use 2-bit
indexes from the immediate value specified as the third operand to determine
which element of the source is stored at a given position in the destination.
The destination has to be an AVX register, the source can be an AVX register
or 256-bit memory, and the third operand must be an 8-bit immediate value.
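The two ways of selecting elements can be sketched as a C model, with hypothetical function names. The detail that only the low three bits of each "vpermd" index are used comes from Intel's description and is an addition to this manual's text.

```c
#include <stdint.h>

/* "vpermd dst,idx,src": dst[i] = src[idx[i] & 7]. The first source holds
   the indexes and the second source holds the data. */
void vpermd_model(uint32_t dst[8], const uint32_t idx[8], const uint32_t src[8])
{
    uint32_t tmp[8];                    /* lets dst alias idx or src */
    for (int i = 0; i < 8; i++)
        tmp[i] = src[idx[i] & 7];
    for (int i = 0; i < 8; i++)
        dst[i] = tmp[i];
}

/* "vpermq dst,src,imm8": the 2-bit field number i of imm8 chooses the
   source element that is stored at position i of the destination */
void vpermq_model(uint64_t dst[4], const uint64_t src[4], uint8_t imm8)
{
    uint64_t tmp[4];
    for (int i = 0; i < 4; i++)
        tmp[i] = src[(imm8 >> (2 * i)) & 3];
    for (int i = 0; i < 4; i++)
        dst[i] = tmp[i];
}
```

For example, an immediate value of 1Bh (fields 3, 2, 1, 0 from the lowest) reverses the order of the four quad words.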
|
  The family of new instructions performing the "gather" operation has special
syntax, as their memory operand uses an addressing mode that is unique to
them. The base of the address can be a 32-bit or 64-bit general purpose
register (the latter only in long mode), and the index (possibly multiplied by
a scale value, as in standard addressing) is specified by an SSE or AVX
register. It is possible to use only the index without a base, and any
numerical displacement can be added to the address. Each of these
instructions takes three operands. The first operand is the destination
register, the second operand is memory addressed with a vector index, and the
third operand is a register containing a mask. The most significant bit of
each element of the mask determines whether a value will be loaded from memory
into the corresponding element of the destination. The address of each
element to load is calculated from the corresponding element of the index
register in the memory operand, together with the given base and
displacement. When the index register contains fewer elements than the
destination and mask registers, the higher elements of the destination are
zeroed. After a value is successfully loaded, the corresponding element of the
mask register is set to zero. The destination, index and mask should all be
distinct registers; it is not allowed to use the same register in two
different roles.
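The masked load described above can be sketched as a C model of a four-element "vgatherdps" with an SSE destination. The function and its parameters are hypothetical. Faults, the zeroing of upper elements and other hardware details are not modelled. Leaving unselected elements unchanged follows Intel's description.

```c
#include <stdint.h>
#include <string.h>

/* Model of "vgatherdps xmm_dst,[base+xmm_idx*scale+disp],xmm_mask".
   The mask elements are raw 32-bit patterns; only bit 31 matters. */
void vgatherdps_model(float dst[4], const unsigned char *base,
                      const int32_t idx[4], int scale, int32_t disp,
                      uint32_t mask[4])
{
    for (int i = 0; i < 4; i++) {
        if (mask[i] & 0x80000000u) {
            const unsigned char *addr = base + (intptr_t)idx[i] * scale + disp;
            memcpy(&dst[i], addr, sizeof(float)); /* avoids alignment issues */
            mask[i] = 0;                          /* element done: clear mask */
        }
        /* an element whose mask bit is clear keeps its old value */
    }
}
```

When the loop finishes, the whole mask is zero. This is why the mask register has to be reloaded before the gather is repeated.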
|
  "vgatherdps" loads single precision floating point values addressed by
32-bit indexes. The destination, index and mask should all be registers of the
same type, either SSE or AVX. The data addressed by the memory operand is
32-bit in size.

        vgatherdps xmm0,[eax+xmm1],xmm3    ; gather four floats
        vgatherdps ymm0,[ebx+ymm7*4],ymm3  ; gather eight floats

  "vgatherqps" loads single precision floating point values addressed by
64-bit indexes. The destination and mask should always be SSE registers, while
the index register can be either an SSE or AVX register. The data addressed by
the memory operand is 32-bit in size.

        vgatherqps xmm0,[xmm2],xmm3        ; gather two floats
        vgatherqps xmm0,[ymm2+64],xmm3     ; gather four floats

  "vgatherdpd" loads double precision floating point values addressed by
32-bit indexes. The index register should always be an SSE register, while
the destination and mask should be two registers of the same type, either SSE
or AVX. The data addressed by the memory operand is 64-bit in size.

        vgatherdpd xmm0,[ebp+xmm1],xmm3    ; gather two doubles
        vgatherdpd ymm0,[xmm3*8],ymm5      ; gather four doubles

  "vgatherqpd" loads double precision floating point values addressed by
64-bit indexes. The destination, index and mask should all be registers of the
same type, either SSE or AVX. The data addressed by the memory operand is
64-bit in size.
  "vpgatherdd" and "vpgatherqd" load 32-bit values addressed by either 32-bit
or 64-bit indexes. They follow the same rules as "vgatherdps" and "vgatherqps"
respectively.
  "vpgatherdq" and "vpgatherqq" load 64-bit values addressed by either 32-bit
or 64-bit indexes. They follow the same rules as "vgatherdpd" and "vgatherqpd"
respectively.
|

2.1.23 Auxiliary sets of computational instructions

  There are a number of additional instruction set extensions related to
AVX. They introduce new vector instructions (and sometimes also their SSE
equivalents that use the classic instruction encoding), and even some new
instructions operating on general registers that use the AVX-like encoding,
allowing the extended syntax with separate destination and source operands.
CPU support for each of these instruction sets needs to be determined
separately.
|
  The AES extension provides a specialized set of instructions for the
cryptographic computations defined by the Advanced Encryption Standard. Each
of these instructions has two versions: the AVX one and the one with SSE-like
syntax that uses the classic encoding. Refer to the Intel manuals for the
details of operation of these instructions.
  "aesenc" and "aesenclast" perform a single round of AES encryption on data
from the first source with a round key from the second source, and store the
result in the destination. The destination and first source are SSE registers,
and the second source can be an SSE register or 128-bit memory. The AVX
versions of these instructions, "vaesenc" and "vaesenclast", use the syntax
with three operands, while the SSE-like versions have only two operands, with
the first operand being both the destination and the first source.
  "aesdec" and "aesdeclast" perform a single round of AES decryption on data
from the first source with a round key from the second source. The syntax
rules for them and their AVX versions are the same as for "aesenc".
  "aesimc" performs the InvMixColumns transformation of the source operand and
stores the result in the destination. Both "aesimc" and "vaesimc" use only two
operands, the destination being an SSE register, and the source being an SSE
register or 128-bit memory location.
  "aeskeygenassist" is a helper instruction for generating the round key.
It needs three operands: the destination being an SSE register, the source
being an SSE register or 128-bit memory, and the third operand being an 8-bit
immediate value. The AVX version of this instruction uses the same syntax.
|
  The CLMUL extension introduces just one instruction, "pclmulqdq", and its
AVX version as well. This instruction performs a carryless multiplication of
two 64-bit values selected from the first and second source according to the
bit fields in the immediate value. The destination and first source are SSE
registers, the second source is an SSE register or 128-bit memory, and the
immediate value is provided as the last operand. "vpclmulqdq" takes four
operands, while "pclmulqdq" takes only three operands, with the first one
serving both as the destination and as the first source.
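Carryless multiplication is long multiplication in which the shifted partial products are combined with XOR instead of addition. The following C model uses hypothetical function names. The rule that bit 0 of the immediate selects the quad word of the first source and bit 4 that of the second comes from Intel's description, not from this manual.

```c
#include <stdint.h>

/* Carryless 64x64 -> 128-bit product, split into high and low halves */
void clmul64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    uint64_t h = 0, l = 0;
    for (int i = 0; i < 64; i++) {
        if ((b >> i) & 1) {
            l ^= a << i;
            if (i != 0)
                h ^= a >> (64 - i);  /* bits pushed out of the low half */
        }
    }
    *hi = h;
    *lo = l;
}

/* "pclmulqdq dst,src,imm8": imm8 bit 0 picks the quad word of the first
   source, bit 4 picks the quad word of the second source */
void pclmulqdq_model(uint64_t dst[2], const uint64_t src1[2],
                     const uint64_t src2[2], uint8_t imm8)
{
    clmul64(src1[imm8 & 1], src2[(imm8 >> 4) & 1], &dst[1], &dst[0]);
}
```

For example, 3 times 3 gives 5 rather than 9, because the middle partial products cancel instead of carrying.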
|
  The FMA (Fused Multiply-Add) extension introduces additional AVX
instructions which perform multiplication and summation as a single operation.
Each one takes three operands, the first one serving both as the destination
and the first source, and the following ones being the second and third
source. The mnemonic of an FMA instruction is obtained by appending to the
"vf" prefix: first either "m" or "nm" to select whether the result of
multiplication should be taken as-is or negated, then either "add" or "sub" to
select whether the third value will be added to the product or subtracted from
it, then either "132", "213" or "231" to select which source operands are
multiplied and which one is added or subtracted, and finally the type of data
on which the instruction operates, either "ps", "pd", "ss" or "sd". As with
the SSE instructions promoted to AVX, the instructions operating on packed
floating point values allow 128-bit or 256-bit syntax: in the former all the
operands are SSE registers, but the third one can also be 128-bit memory; in
the latter the operands are AVX registers and the third one can also be
256-bit memory. Instructions that compute just one floating point result need
the operands to be SSE registers, and the third operand can also be memory,
either 32-bit for single precision or 64-bit for double precision.

        vfmsub231ps ymm1,ymm2,ymm3     ; multiply and subtract
        vfnmadd132sd xmm0,xmm5,[ebx]   ; multiply, negate and add

  In addition to the instructions created by the rule described above, there
are families of instructions with mnemonics starting with either "vfmaddsub"
or "vfmsubadd", followed by either "132", "213" or "231" and then either "ps"
or "pd" (the operation must always be on packed values in this case). They add
to the result of multiplication or subtract from it depending on the position
of the value in packed data: instructions from the "vfmaddsub" group add when
the position is odd and subtract when the position is even, while instructions
from the "vfmsubadd" group add when the position is even and subtract when the
position is odd. The rules for operands are the same as for other FMA
instructions.
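The operand-order digits can be read as operand positions: the first two digits name the multiplied operands and the last one names the operand that is added. The C sketch below uses hypothetical function names. Unlike the real fused instructions, plain C arithmetic may round the product before the addition, so it only shows which operands go where.

```c
/* a = first operand (destination and first source), b = second, c = third */
double vfmadd132(double a, double b, double c) { return a * c + b; }
double vfmadd213(double a, double b, double c) { return b * a + c; }
double vfmadd231(double a, double b, double c) { return b * c + a; }

/* vfmaddsub213 on n packed values: subtract at even positions and add at
   odd positions (vfmsubadd would do the opposite) */
void vfmaddsub213(double a[], const double b[], const double c[], int n)
{
    for (int i = 0; i < n; i++)
        a[i] = (i % 2) ? b[i] * a[i] + c[i] : b[i] * a[i] - c[i];
}
```

For instance, "vfmsub231ps ymm1,ymm2,ymm3" corresponds to ymm1 = ymm2*ymm3 - ymm1 under this reading.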
|
  The FMA4 instructions are similar to FMA, but use the syntax with four
operands and thus allow the destination to be different from all the sources.
Their mnemonics are identical to FMA instructions with the "132", "213" or
"231" cut out, as having a separate destination operand makes such selection
of operands superfluous. The multiplication is always performed on values from
the first and second source, and then the value from the third source is added
or subtracted. Either the second or third source can be a memory operand, and
the rules for the sizes of operands are the same as for FMA instructions.

        vfmaddpd ymm0,ymm1,[esi],ymm2  ; multiply and add
        vfmsubss xmm0,xmm1,xmm2,[ebx]  ; multiply and subtract
|
  The F16C extension consists of two instructions, "vcvtps2ph" and
"vcvtph2ps", which convert floating point values between single precision and
half precision (the 16-bit floating point format). "vcvtps2ph" takes three
operands: destination, source, and rounding controls. The third operand is
always an immediate, the source is either an SSE or AVX register containing
single precision values, and the destination is an SSE register or memory; the
size of memory is 64 bits when the source is an SSE register and 128 bits when
the source is an AVX register. "vcvtph2ps" takes two operands: the
destination, which can be an SSE or AVX register, and the source, which is an
SSE register or memory with half the size of the destination operand.
|
  The AMD XOP extension introduces a number of new vector instructions with
encoding and syntax analogous to AVX instructions. "vfrczps", "vfrczss",
"vfrczpd" and "vfrczsd" extract fractional portions of single or double
precision values; they all take two operands. The packed operations allow
either an SSE or AVX register as the destination, while for the other two it
has to be an SSE register. The source can be a register of the same type as
the destination, or memory of appropriate size (256-bit for destination being
an AVX register, 128-bit for packed operation with destination being an SSE
register, 64-bit for operation on a solitary double precision value and 32-bit
for operation on a solitary single precision value).

        vfrczps ymm0,[esi]           ; load fractional parts
|
  "vpcmov" copies bits from either the first or second source into the
destination depending on the values of corresponding bits in the fourth
operand (the selector). If the bit in the selector is set, the corresponding
bit from the first source is copied into the same position in the destination,
otherwise the bit from the second source is copied. Either the second source
or the selector can be a memory location, 128-bit or 256-bit depending on
whether SSE registers or AVX registers are specified as the other operands.

        vpcmov xmm0,xmm1,xmm2,[ebx]  ; selector in memory
        vpcmov ymm0,ymm5,[esi],ymm2  ; source in memory
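The bit selection performed by "vpcmov" is the classic bitwise multiplexer. It can be sketched as a C model working on 64-bit chunks; the function names are hypothetical.

```c
#include <stdint.h>

/* one 64-bit chunk: where a selector bit is 1 the bit comes from the
   first source, where it is 0 it comes from the second source */
uint64_t vpcmov64(uint64_t src1, uint64_t src2, uint64_t sel)
{
    return (src1 & sel) | (src2 & ~sel);
}

/* a 128-bit operation applies the same formula to both halves */
void vpcmov128(uint64_t dst[2], const uint64_t src1[2],
               const uint64_t src2[2], const uint64_t sel[2])
{
    for (int i = 0; i < 2; i++)
        dst[i] = vpcmov64(src1[i], src2[i], sel[i]);
}
```

Because the selection is done bit by bit, an all-ones or all-zeros comparison result can be used directly as the selector for a branchless choice between two vectors.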
|
  The family of packed comparison instructions takes four operands: the
destination and first source being SSE registers, the second source being an
SSE register or 128-bit memory, and the fourth operand being an immediate
value defining the type of comparison. The mnemonic of an instruction is
created by appending to the "vpcom" prefix either "b" or "ub" to compare
signed or unsigned bytes, "w" or "uw" to compare signed or unsigned words, "d"
or "ud" to compare signed or unsigned double words, or "q" or "uq" to compare
signed or unsigned quad words. The respective values from the first and second
source are compared and the corresponding data element in the destination is
set to either all ones or all zeros depending on the result of comparison. The
fourth operand has to specify one of the eight comparison types (table 2.5).
All these instructions also have variants with only three operands and the
type of comparison encoded within the instruction name by inserting the
comparison mnemonic after "vpcom".

        vpcomb xmm0,xmm1,xmm2,4      ; test for equal bytes
        vpcomgew xmm0,xmm1,[ebx]     ; compare signed words

Table 2.5  XOP comparisons
/-------------------------------------------\
| Code | Mnemonic | Description             |
|======|==========|=========================|
| 0    | lt       | less than               |
| 1    | le       | less than or equal      |
| 2    | gt       | greater than            |
| 3    | ge       | greater than or equal   |
| 4    | eq       | equal                   |
| 5    | neq      | not equal               |
| 6    | false    | false                   |
| 7    | true     | true                    |
\-------------------------------------------/
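The following C model shows how the comparison codes of table 2.5 produce one byte of the result for the signed ("vpcomb") and unsigned ("vpcomub") variants. The function names are hypothetical, and masking the code to three bits is an assumption of this sketch.

```c
#include <stdint.h>

/* evaluates the comparison selected by a code from table 2.5 */
static int xop_compare(int a, int b, unsigned imm)
{
    switch (imm & 7) {
    case 0:  return a < b;   /* lt */
    case 1:  return a <= b;  /* le */
    case 2:  return a > b;   /* gt */
    case 3:  return a >= b;  /* ge */
    case 4:  return a == b;  /* eq */
    case 5:  return a != b;  /* neq */
    case 6:  return 0;       /* false */
    default: return 1;       /* true */
    }
}

/* the result element is all ones when the comparison holds, else zero */
uint8_t vpcomb_lane(int8_t a, int8_t b, unsigned imm)
{
    return xop_compare(a, b, imm) ? 0xFF : 0x00;
}

uint8_t vpcomub_lane(uint8_t a, uint8_t b, unsigned imm)
{
    return xop_compare(a, b, imm) ? 0xFF : 0x00;
}
```

The same byte value 0FFh compares as -1 in the signed variant and as 255 in the unsigned one. By the table, the three-operand form "vpcomgew" corresponds to code 3.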
|
  "vpermil2ps" and "vpermil2pd" set the elements in the destination register
to zero or to a value selected from the first or second source, depending on
the corresponding bit fields from the fourth operand (the selector) and the
immediate value provided in the fifth operand. Refer to the AMD manuals for
the detailed explanation of the operation performed by these instructions.
Each of the first four operands can be a register, and either the second
source or the selector can be a memory location, 128-bit or 256-bit depending
on whether SSE registers or AVX registers are used for the other operands.

        vpermil2ps ymm0,ymm3,ymm7,ymm2,0   ; permute from two sources
|
  "vphaddbw" adds pairs of adjacent signed bytes to form 16-bit values and
stores them at the same positions in the destination. "vphaddubw" does the
same but treats the bytes as unsigned. "vphaddbd" and "vphaddubd" sum all
bytes (either signed or unsigned) in each four-byte block to 32-bit results,
"vphaddbq" and "vphaddubq" sum all bytes in each eight-byte block to 64-bit
results, "vphaddwd" and "vphadduwd" add pairs of words to 32-bit results,
"vphaddwq" and "vphadduwq" sum all words in each four-word block to 64-bit
results, and "vphadddq" and "vphaddudq" add pairs of double words to 64-bit
results. "vphsubbw" subtracts in each two-byte block the byte at the higher
position from the one at the lower position, and stores the result as a
signed 16-bit value at the corresponding position in the destination;
"vphsubwd" subtracts in each two-word block the word at the higher position
from the one at the lower position and makes signed 32-bit results;
"vphsubdq" subtracts in each block of two double words the one at the higher
position from the one at the lower position and makes signed 64-bit results.
Each of these instructions takes two operands, the destination being an SSE
register, and the source being an SSE register or 128-bit memory.

        vphadduwq xmm0,xmm1          ; sum quadruplets of words
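The pairwise forms can be sketched as a C model of "vphaddbw" and "vphsubbw" over a 16-byte source. The function names are hypothetical. The wider forms differ only in block size and result width.

```c
#include <stdint.h>

/* "vphaddbw": each 16-bit result is the sum of the two signed bytes
   occupying the same position */
void vphaddbw_model(int16_t dst[8], const int8_t src[16])
{
    for (int i = 0; i < 8; i++)
        dst[i] = (int16_t)(src[2 * i] + src[2 * i + 1]);
}

/* "vphsubbw": the byte at the lower position minus the byte at the
   higher position, as a signed 16-bit result */
void vphsubbw_model(int16_t dst[8], const int8_t src[16])
{
    for (int i = 0; i < 8; i++)
        dst[i] = (int16_t)(src[2 * i] - src[2 * i + 1]);
}
```

Because each result is twice as wide as its inputs, no overflow is possible: the sum of two 127 bytes correctly yields 254.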
|
  "vpmacsww" and "vpmacssww" multiply the corresponding signed 16-bit values
from the first and second source and then add the products to the parallel
values from the third source; then "vpmacsww" takes the lowest 16 bits of the
result and "vpmacssww" saturates the result down to a 16-bit value, and both
store the final 16-bit results in the destination. "vpmacsdd" and "vpmacssdd"
perform the analogous operation on 32-bit values. "vpmacswd" and "vpmacsswd"
do the same calculation only on the low 16-bit values from each 32-bit block
and form 32-bit results. "vpmacsdql" and "vpmacssdql" perform such an
operation on the low 32-bit values from each 64-bit block and form 64-bit
results, while "vpmacsdqh" and "vpmacssdqh" do the same on the high 32-bit
values from each 64-bit block, also forming 64-bit results. "vpmadcswd" and
"vpmadcsswd" multiply the corresponding signed 16-bit values from the first
and second source, then sum all the four products and add this sum to each
16-bit element from the third source, storing the truncated or saturated
result in the destination. All these instructions take four operands; the
second source can be 128-bit memory or an SSE register, and all the other
operands have to be SSE registers.

        vpmacsdd xmm6,xmm1,[ebx],xmm6  ; accumulate product
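The difference between the truncating and saturating forms can be sketched as a C model of one 16-bit lane. The function names are hypothetical. The product and sum are computed exactly in 32 bits before being reduced.

```c
#include <stdint.h>

/* "vpmacsww" lane: keep only the lowest 16 bits of a*b + c
   (the final conversion to int16_t wraps on common compilers) */
int16_t macsww(int16_t a, int16_t b, int16_t c)
{
    int32_t r = (int32_t)a * b + c;
    return (int16_t)(uint16_t)r;
}

/* "vpmacssww" lane: clamp a*b + c into the signed 16-bit range */
int16_t macssww(int16_t a, int16_t b, int16_t c)
{
    int32_t r = (int32_t)a * b + c;
    if (r > INT16_MAX)
        return INT16_MAX;
    if (r < INT16_MIN)
        return INT16_MIN;
    return (int16_t)r;
}
```

With 300 times 300 the exact value 90000 does not fit in 16 bits: the truncating form keeps 24464, while the saturating form gives 32767.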
|
  "vpperm" selects bytes from the first and second source, optionally applies
a separate transformation to each of them, and stores them in the destination.
The bit fields in the fourth operand (the selector) specify for each position
in the destination which byte from which source is taken and what operation is
applied to it before it is stored there. Refer to the AMD manuals for detailed
information about these bit fields. This instruction takes four operands;
either the second source or the selector can be 128-bit memory (or they can
both be SSE registers), and all the other operands have to be SSE registers.
  "vpshlb", "vpshlw", "vpshld" and "vpshlq" logically shift bytes, words,
double words or quad words respectively. The number of bits to shift by is
specified for each element separately by the signed byte placed at the
corresponding position in the third operand. The source containing the
elements to shift is provided as the second operand. Either the second or
third operand can be 128-bit memory (or they can both be SSE registers), and
the other operands have to be SSE registers.

        vpshld xmm3,xmm1,[ebx]       ; shift double words from xmm1
|
  "vpshab", "vpshaw", "vpshad" and "vpshaq" arithmetically shift bytes, words,
double words or quad words. These instructions follow the same rules as the
logical shifts described above. "vprotb", "vprotw", "vprotd" and "vprotq"
rotate bytes, words, double words or quad words. They follow the same rules as
the shifts, but additionally allow the third operand to be an immediate value,
in which case the same amount of rotation is specified for all the elements
in the source.

        vprotb xmm0,[esi],3          ; rotate bytes to the left
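The signed per-element counts can be sketched as a C model of one byte lane of "vpshlb" and "vprotb". The function names are hypothetical. A positive count shifts or rotates left and a negative one right, as the example above suggests. For "vpshlb" the sketch assumes counts between -7 and 7, and what the hardware does with larger counts is not modelled.

```c
#include <stdint.h>

/* "vpshlb" lane, count limited to -7..7 in this sketch */
uint8_t vpshlb_lane(uint8_t x, int8_t count)
{
    if (count >= 0)
        return (uint8_t)(x << count);
    return (uint8_t)(x >> -count);   /* logical: zeros enter from the top */
}

/* "vprotb" lane: rotating right by k is the same as left by 8 - k,
   so the count can simply be reduced modulo 8 */
uint8_t vprotb_lane(uint8_t x, int8_t count)
{
    unsigned n = (unsigned)count & 7;
    return n ? (uint8_t)((x << n) | (x >> (8 - n))) : x;
}
```

With the immediate form, such as "vprotb xmm0,[esi],3", every lane would simply receive the same count.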
|
  The MOVBE extension introduces just one new instruction, "movbe", which
swaps bytes in the value from the source before storing it in the destination,
so it can be used to load and store big endian values. It takes two operands;
either the destination or the source should be a 16-bit, 32-bit or 64-bit
memory (the last one being allowed only in long mode), and the other operand
should be a general register of the same size.
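The effect of the 32-bit "movbe" forms can be sketched as portable C byte-order helpers. The function names are hypothetical. Memory is taken to hold the most significant byte first.

```c
#include <stdint.h>

/* like "movbe eax,[mem]": read four bytes as a big endian value */
uint32_t movbe_load32(const unsigned char *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8  | (uint32_t)p[3];
}

/* like "movbe [mem],eax": write the value most significant byte first */
void movbe_store32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}
```

On a little endian x86 processor, a single "movbe" replaces the usual pair of a "mov" and a "bswap".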
|
  The BMI extension, consisting of two subsets, BMI1 and BMI2, introduces
new instructions operating on general registers, which use the same encoding
as AVX instructions and so allow the extended syntax. All these instructions
use 32-bit operands, and in long mode they also allow the forms with 64-bit
operands.
  "andn" calculates the bitwise AND of the second source with the inverted
bits of the first source and stores the result in the destination. The
destination and the first source have to be general registers, and the second
source can be a general register or memory.

        andn edx,eax,[ebx]   ; bit-multiply inverted eax with memory
|
  "bextr" extracts from the first source a sequence of bits, using an index
and length specified by bit fields in the second source operand, and stores it
in the destination. The lowest 8 bits of the second source specify the
position of the bit sequence to extract and the next 8 bits of the second
source specify the length of the sequence. The first source can be a general
register or memory, and the other two operands have to be general registers.

        bextr eax,[esi],ecx  ; extract bit field from memory
|
  "blsi" extracts the lowest set bit from the source, setting all the other
bits in the destination to zero. The destination must be a general register,
and the source can be a general register or memory.

        blsi rax,r11         ; isolate the lowest set bit

  "blsmsk" sets all the bits in the destination up to the lowest set bit in
the source, including this bit, and clears all the bits above it. "blsr"
copies all the bits from the source to the destination except for the lowest
set bit, which is replaced by zero. These instructions follow the same rules
for operands as "blsi".
  "tzcnt" counts the number of trailing zero bits, that is the zero bits below
the lowest set bit of the source value. This instruction is analogous to
"lzcnt" and follows the same rules for operands, so unlike the other BMI
instructions it also has a 16-bit version.
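These BMI1 operations correspond to well-known C bit tricks. The following sketch models the 32-bit forms of "andn", "blsi", "blsmsk", "blsr" and "tzcnt" with hypothetical function names.

```c
#include <stdint.h>

uint32_t andn32(uint32_t src1, uint32_t src2) { return ~src1 & src2; }
uint32_t blsi32(uint32_t x)   { return x & (0u - x); } /* isolate lowest set bit */
uint32_t blsmsk32(uint32_t x) { return x ^ (x - 1); }  /* ones up to and including it */
uint32_t blsr32(uint32_t x)   { return x & (x - 1); }  /* clear lowest set bit */

/* number of zero bits below the lowest set bit; 32 when x is zero */
unsigned tzcnt32(uint32_t x)
{
    unsigned n = 0;
    while (n < 32 && ((x >> n) & 1) == 0)
        n++;
    return n;
}
```

For the value 28h (101000b) these give 8 (blsi), 0Fh (blsmsk), 20h (blsr) and 3 (tzcnt).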
|
  "bzhi" is a BMI2 instruction, which copies the bits from the first source to
the destination, zeroing all the bits starting from the position specified by
the second source. It follows the same rules for operands as "bextr".
  "pext" uses a mask in the second source operand to select bits from the
first source and puts the selected bits as a contiguous sequence into the
destination. "pdep" performs the reverse operation: it takes a sequence of
bits from the first source and puts them consecutively at the positions where
the bits in the second source are set, setting all the other bits in the
destination to zero. These BMI2 instructions follow the same rules for
operands as "andn".
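The bit-field instructions can be sketched as straightforward bit-by-bit C models of the 32-bit forms. The function names are hypothetical. The handling of start positions or lengths of 32 and more follows Intel's description and is an addition to this manual's text.

```c
#include <stdint.h>

/* "bextr": start = bits 7..0 of the control value, length = bits 15..8 */
uint32_t bextr32(uint32_t src, uint32_t ctl)
{
    unsigned start = ctl & 0xFF, len = (ctl >> 8) & 0xFF;
    if (start >= 32)
        return 0;
    uint32_t v = src >> start;
    return len >= 32 ? v : v & ((1u << len) - 1);
}

/* "bzhi": clear every bit from position n upwards (n = bits 7..0) */
uint32_t bzhi32(uint32_t src, uint32_t ctl)
{
    unsigned n = ctl & 0xFF;
    return n >= 32 ? src : src & ((1u << n) - 1);
}

/* "pext": gather the bits selected by mask into the low end */
uint32_t pext32(uint32_t src, uint32_t mask)
{
    uint32_t out = 0;
    unsigned k = 0;
    for (unsigned i = 0; i < 32; i++)
        if ((mask >> i) & 1)
            out |= ((src >> i) & 1) << k++;
    return out;
}

/* "pdep": scatter the low bits of src to the positions set in mask */
uint32_t pdep32(uint32_t src, uint32_t mask)
{
    uint32_t out = 0;
    unsigned k = 0;
    for (unsigned i = 0; i < 32; i++)
        if ((mask >> i) & 1)
            out |= ((src >> k++) & 1) << i;
    return out;
}
```

With the mask F0F0h, "pext" turns ABCDh into ACh and "pdep" turns ACh back into A0C0h. This shows that "pdep" undoes "pext" except for the bits the mask drops.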
|
  "mulx" is a BMI2 instruction which performs an unsigned multiplication of
the value from the EDX or RDX register (depending on the size of the specified
operands) by the value from the third operand, stores the low half of the
result in the second operand and the high half of the result in the first
operand, and does so without affecting the flags. The third operand can be a
general register or memory, and both destination operands have to be general
registers.

        mulx edx,eax,ecx     ; multiply edx by ecx into edx:eax
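The 32-bit form of "mulx" can be sketched as a C model that widens to 64 bits and splits the product. The function name is hypothetical. The implicit EDX source becomes an explicit parameter here.

```c
#include <stdint.h>

/* like "mulx hi,lo,src" with 32-bit operands: EDX * src, unsigned */
void mulx32(uint32_t edx, uint32_t src, uint32_t *hi, uint32_t *lo)
{
    uint64_t p = (uint64_t)edx * src;
    *lo = (uint32_t)p;           /* goes to the second operand */
    *hi = (uint32_t)(p >> 32);   /* goes to the first operand */
}
```

For example, FFFFFFFFh times 2 gives a high half of 1 and a low half of FFFFFFFEh.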
|
  "shlx", "shrx" and "sarx" are BMI2 instructions, which perform logical or
arithmetic shifts of the value from the first source by the amount specified
by the second source, and store the result in the destination without
affecting the flags. They have the same rules for operands as the "bzhi"
instruction.
  "rorx" is a BMI2 instruction which rotates right the value from the source
operand by the constant amount specified in the third operand and stores the
result in the destination without affecting the flags. The destination operand
has to be a general register, the source operand can be a general register or
memory, and the third operand has to be an immediate value.

        rorx eax,edx,7       ; rotate without affecting flags
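The flag-free shifts and rotation can be sketched as C models of the 32-bit forms. The function names are hypothetical. Taking the count modulo 32 (modulo 64 for 64-bit operands) follows Intel's description, and leaving the flags untouched has no C counterpart.

```c
#include <stdint.h>

uint32_t shlx32(uint32_t x, uint32_t n) { return x << (n & 31); }
uint32_t shrx32(uint32_t x, uint32_t n) { return x >> (n & 31); }

int32_t sarx32(int32_t x, uint32_t n)
{
    n &= 31;
    return x < 0 ? ~(~x >> n) : x >> n;   /* explicit sign fill */
}

uint32_t rorx32(uint32_t x, unsigned imm)
{
    imm &= 31;
    return imm ? (x >> imm) | (x << (32 - imm)) : x;
}
```

For example, rotating 12345678h right by 8 gives 78123456h.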
|
  TBM is an extension designed by AMD to supplement the BMI set. The "bextr"
instruction is extended with a new form, in which the second source is a
32-bit immediate value. "blsic" is a new instruction which performs the same
operation as "blsi", but with the bits of the result inverted. It uses the
same rules for operands as "blsi". "blsfill" is a new instruction which takes
the value from the source, sets all the bits below the lowest set bit and
stores the result in the destination; it also uses the same rules for
operands as "blsi".
  "blci", "blcic", "blcs", "blcmsk" and "blcfill" are instructions analogous
to "blsi", "blsic", "blsr", "blsmsk" and "blsfill" respectively, but they
perform the bit-inverted versions of the same operations. They follow the
same rules for operands as the instructions they reflect.
  "tzmsk" finds the lowest set bit in the value from the source operand, sets
all bits below it to 1 and all the remaining bits to zero, then writes the
result to the destination. "t1mskc" finds the least significant zero bit in
the value from the source operand, sets the bits below it to zero and all the
other bits to 1, and writes the result to the destination. These instructions
have the same rules for operands as "blsi".
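Each TBM operation reduces to a one-line formula in C. The sketch below models the 32-bit forms with hypothetical function names. The formulas reflect our reading of AMD's documentation and are worth checking against the AMD manuals.

```c
#include <stdint.h>

uint32_t blsic32(uint32_t x)   { return ~x | (x - 1); } /* inverted blsi */
uint32_t blsfill32(uint32_t x) { return x | (x - 1); }  /* fill below lowest set bit */
uint32_t blci32(uint32_t x)    { return x | ~(x + 1); }
uint32_t blcic32(uint32_t x)   { return ~x & (x + 1); } /* isolate lowest clear bit */
uint32_t blcs32(uint32_t x)    { return x | (x + 1); }  /* set lowest clear bit */
uint32_t blcmsk32(uint32_t x)  { return x ^ (x + 1); }
uint32_t blcfill32(uint32_t x) { return x & (x + 1); }  /* clear trailing ones */
uint32_t tzmsk32(uint32_t x)   { return ~x & (x - 1); } /* ones below lowest set bit */
uint32_t t1mskc32(uint32_t x)  { return ~x | (x + 1); } /* zeros below lowest clear bit */
```

The "blc" group uses x + 1 where the "bls" group uses x - 1. Adding 1 flips the trailing one bits and the lowest clear bit, just as subtracting 1 flips the trailing zero bits and the lowest set bit.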
|

2.1.24 Other extensions of instruction set

  There are a number of additional instruction set extensions recognized by
flat assembler, and the general syntax of the instructions introduced by those
extensions is provided here. For detailed information on the operations
performed by them, check out the manuals from Intel (for the VMX, SMX, XSAVE,
RDRAND, FSGSBASE, INVPCID, HLE and RTM extensions) or AMD (for the SVM
extension).
  The Virtual-Machine Extensions (VMX) provide a set of instructions for the
management of virtual machines. The "vmxon" instruction, which enters VMX
operation, requires a single 64-bit memory operand, which should contain the
physical address of the memory region (the VMXON region) that the logical
processor may use to support VMX operation. The "vmxoff" instruction, which
leaves VMX operation, has no operands. The "vmlaunch" and "vmresume"
instructions, which launch or resume a virtual machine, and "vmcall", which
allows guest software to call the VM monitor, take no operands either.
  The "vmptrld" instruction loads the physical address of the current Virtual
Machine Control Structure (VMCS) from its memory operand, "vmptrst" stores
the pointer to the current VMCS at the address specified by its memory
operand, and "vmclear" sets the launch state of the VMCS referenced by its
memory operand to clear. All three instructions require a single 64-bit
memory operand.
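  The following fragment sketches the order in which these instructions are
typically used. The "vmxon_ptr" and "vmcs_ptr" variables are hypothetical
quadwords holding the physical addresses of the VMXON region and of a VMCS
region. Preparing those regions and checking for VMX support are left out:

    vmxon   [vmxon_ptr]     ; enter VMX operation
    vmclear [vmcs_ptr]      ; set the launch state of the VMCS to clear
    vmptrld [vmcs_ptr]      ; make it the current VMCS
    ; the VMCS fields would be filled in here with "vmwrite"
    vmlaunch                ; enter the guest
    ; execution continues here only if the launch has failed
    vmxoff                  ; leave VMX operation

    vmxon_ptr dq ?          ; hypothetical variable
    vmcs_ptr  dq ?          ; hypothetical variable

  Because both variables are defined with "dq", the memory operands are
qword-sized. This matches the 64-bit operand these instructions require.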
  The "vmread" instruction reads from the VMCS a field specified by the source
operand and stores it into the destination operand. The source operand should
be a general purpose register, and the destination operand can be a register
or memory. The "vmwrite" instruction writes into a VMCS field specified by
the destination operand the value provided by the source operand. The source
operand can be a general purpose register or memory, and the destination
operand must be a register. The size of operands for these instructions
should be 64-bit when in long mode, and 32-bit otherwise.
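  This sketch shows the allowed operand forms in long mode. The field encoding
681Eh is assumed here to select the guest RIP field and should be checked
against the VMCS field tables in the Intel manuals. "saved_field" is a
hypothetical variable:

    use64
    mov     rdx,681Eh           ; field encoding (assumed: guest RIP)
    vmread  rax,rdx             ; read the field into a register
    vmread  [saved_field],rdx   ; or directly into memory
    vmwrite rdx,rax             ; write the field from a register
    vmwrite rdx,[saved_field]   ; or directly from memory

    saved_field dq ?            ; hypothetical variable

  Outside of long mode the same forms use 32-bit registers and a dword
variable, for example "vmread eax,edx".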
  The "invept" and "invvpid" instructions invalidate the translation lookaside
buffers (TLBs) and paging-structure caches, either derived from extended page
tables (EPT), or based on the virtual processor identifier (VPID). These
instructions require two operands. The first one is a general purpose
register specifying the type of invalidation, and the second one is a 128-bit
memory operand providing the invalidation descriptor. The first operand
should be a 64-bit register when in long mode, and a 32-bit register
otherwise.
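  For example, a single-context invalidation (type 1 for both instructions)
could be written as below. RSI and RDI are assumed to point to already
prepared 128-bit INVEPT and INVVPID descriptors:

    use64
    mov     eax,1           ; type 1 = single-context (zero-extended to RAX)
    invept  rax,[rsi]       ; descriptor holds the EPT pointer
    invvpid rax,[rdi]       ; descriptor holds the VPID and a linear address

  In 32-bit code the type register would be EAX, as in "invept eax,[esi]".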
  The Safer Mode Extensions (SMX) provide the functionality available
through the "getsec" instruction. This instruction takes no operands. The
function that is executed is determined by the contents of the EAX register
at the time the instruction is executed.
  The Secure Virtual Machine (SVM) is the variant of virtual machine
extension used by AMD. The "skinit" instruction securely reinitializes the
processor, allowing the startup of trusted software, such as the virtual
machine monitor (VMM). This instruction takes a single operand, which must
be EAX and provides the physical address of the secure loader block (SLB).
  The "vmrun" instruction is used to start a guest virtual machine. Its only
operand should be an accumulator register (AX, EAX or RAX, the last one
available only in long mode) providing the physical address of the virtual
machine control block (VMCB). The "vmsave" instruction stores a subset of
processor state into the VMCB specified by its operand, and "vmload" loads
the same subset of processor state from the specified VMCB. The same operand
rules as for "vmrun" apply to these two instructions.
  "vmmcall" allows the guest software to call the VMM. This instruction takes
no operands.
  "stgi" sets the global interrupt flag to 1, and "clgi" zeroes it. These
instructions take no operands.
  "invlpga" invalidates the TLB mapping for the virtual page specified by the
first operand (which has to be the accumulator register) and the address space
identifier (ASID) specified by the second operand (which must be the ECX
register).
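  The sketch below shows how these instructions may fit together in the core
of a VMM running in long mode. The "vmcb_pa", "guest_va" and "guest_asid"
variables are hypothetical, and the setup of the VMCB itself is omitted:

    use64
    mov     rax,[vmcb_pa]   ; physical address of the guest's VMCB
    clgi                    ; clear the global interrupt flag
    vmload  rax             ; load the extra guest state from the VMCB
    vmrun   rax             ; run the guest until the next #VMEXIT
    vmsave  rax             ; save that state back into the VMCB
    stgi                    ; set the global interrupt flag again
    mov     rax,[guest_va]  ; guest virtual page to invalidate
    mov     ecx,[guest_asid]
    invlpga rax,ecx         ; drop its TLB mapping for that ASID

    vmcb_pa    dq ?         ; hypothetical variable
    guest_va   dq ?         ; hypothetical variable
    guest_asid dd ?         ; hypothetical variable

  The "vmsave" can reuse RAX because the host value of RAX is restored when
the guest exits back to the VMM.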
  The XSAVE set of instructions allows saving and restoring processor state
components. "xsave" and "xsaveopt" store the components of processor state
selected by the bit mask in the EDX and EAX registers into the area specified
by the memory operand. "xrstor" restores from the area specified by the
memory operand the components of processor state selected by the mask in EDX
and EAX. The "xsave64", "xsaveopt64" and "xrstor64" instructions are 64-bit
versions of these instructions, allowed only in long mode.
  "xgetbv" reads the contents of the 64-bit XCR (extended control register)
specified in the ECX register into the EDX and EAX registers. "xsetbv" writes
the contents of EDX and EAX into the 64-bit XCR specified by the ECX
register. These instructions have no operands.
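  As an illustration, you can read the state components currently enabled by
the operating system from XCR0 and use them as the mask for saving and
restoring. RDI is assumed to point to a 64-byte aligned save area of
sufficient size. The required size can be obtained with CPUID leaf 0Dh:

    use64
    xor     ecx,ecx         ; ECX = 0 selects XCR0
    xgetbv                  ; EDX:EAX = enabled state components
    xsave   [rdi]           ; save the components selected by EDX:EAX
    ; ...
    xrstor  [rdi]           ; EDX:EAX must still hold the mask here

  The "xsave64" and "xrstor64" forms take the same memory operand.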
  The RDRAND extension introduces one new instruction, "rdrand", which loads
a hardware-generated random value into a general register. It takes one
operand, which can be a 16-bit, 32-bit or 64-bit register (with the last one
being allowed only in long mode).
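  The processor reports through the carry flag whether a random value was
actually delivered, so "rdrand" is normally followed by a check of CF. This
minimal sketch uses a hypothetical "get_random" label and simply retries
until a value arrives:

    get_random:                 ; hypothetical label
        rdrand  eax             ; CF = 1 if EAX received a random value
        jnc     get_random      ; CF = 0: no value was available, retry

  Production code usually limits the number of retries instead of looping
indefinitely.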
  The FSGSBASE extension adds long mode instructions that allow reading and
writing the segment base registers for the FS and GS segments. "rdfsbase" and
"rdgsbase" read the corresponding segment base register into the operand,
while "wrfsbase" and "wrgsbase" write the value of the operand into that
register. All these instructions take one operand, which can be a 32-bit or
64-bit general register.
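  A few sample uses, with arbitrarily chosen registers:

    use64
    rdfsbase rax            ; RAX = FS base
    wrgsbase rbx            ; GS base = RBX
    rdgsbase ecx            ; ECX = low 32 bits of the GS base

  The operating system has to enable these instructions by setting the
FSGSBASE bit in CR4. Otherwise executing them raises an invalid-opcode
exception.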
  The INVPCID extension adds the "invpcid" instruction, which invalidates
mappings in the TLBs and paging caches based on the invalidation type
specified in the first operand and the PCID invalidate descriptor specified
in the second operand. The first operand should be a 32-bit general register
when not in long mode, or a 64-bit general register when in long mode. The
second operand should be a 128-bit memory location.
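  For instance, this sketch invalidates the mapping of a single linear
address for one PCID (invalidation type 0). RSI is assumed to point to a
128-bit descriptor that holds the PCID in its low 12 bits and the linear
address in its upper quadword:

    use64
    xor     eax,eax         ; type 0 = individual-address invalidation
    invpcid rax,[rsi]       ; descriptor at RSI

  Types 1, 2 and 3 select single-context invalidation, all-context
invalidation including global translations, and all-context invalidation
except global translations, respectively.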
  The HLE and RTM extensions provide a set of instructions for transactional
execution. The "xacquire" and "xrelease" are new prefixes that can be used
with some of the instructions to start or end lock elision on the memory
address specified by the prefixed instruction. The "xbegin" instruction
starts transactional execution. Its operand is the address of a fallback
routine that gets executed if the transaction aborts, and it is specified
like the operand of a near jump instruction. "xend" marks the end of the
transactional execution region and takes no operands. "xabort" forces a
transaction abort. It takes an 8-bit immediate value as its only operand,
and this value is passed to the fallback routine in the highest 8 bits of
EAX. "xtest" checks whether transactional execution is in progress; this
instruction takes no operands.
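  The fragment below sketches both styles. "counter", "lock_var", "fallback"
and "done" are hypothetical names. The simplified lock does not examine the
old value returned by "xchg", which a real spinlock would have to do:

    use64
        xbegin  fallback        ; start transactional execution
        inc     qword [counter] ; done inside the transaction
        xend                    ; commit
        jmp     done
    fallback:                   ; hypothetical label; EAX = abort status
        lock inc qword [counter]
    done:                       ; hypothetical label
        mov     eax,1
        xacquire xchg [lock_var],eax    ; acquire the lock with elision
        ; critical section
        xrelease mov [lock_var],0       ; release it, ending the elision

    counter  dq 0               ; hypothetical variable
    lock_var dd 0               ; hypothetical variable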


2.2 Control directives

  This section describes the directives that control the assembly process.
They are processed during the assembly and may cause some blocks of
instructions to be assembled differently or not assembled at all.


2.2.1 Numerical constants

  The "=" directive allows you to define a numerical constant. It should be
preceded by the name of the constant and followed by the numerical expression
providing the value. The value of such a constant can be a number or an
address, but - unlike labels - numerical constants are not allowed to hold
register-based addresses. Apart from this difference, numerical constants in
their basic variant behave very much like labels, and you can even
forward-reference them (access their values before they actually get
defined). There is, however, a second variant of numerical constants. The
assembler recognizes it when you try to define a constant under a name for
which a numerical constant has already been defined. In such a case the
assembler treats that constant as an assembly-time variable and allows it to
be assigned a new value. It forbids forward-referencing such a variable,
since its value at any point depends on the assignments that precede that
point. Let's see both variants of numerical constants in one example:

    dd sum
    x = 1
    x = x+2
    sum = x

  Here "x" is an assembly-time variable, and every time it is accessed, the
value that was most recently assigned to it is used. Thus if we tried to
access "x" before it gets defined for the first time, for instance by writing
"dd x" in place of the "dd sum" instruction, it would cause an error. When
"x" is redefined with the "x = x+2" directive, its previous value is used to
calculate the new one. So when the "sum" constant gets defined, "x" has the
value of 3, and this value is assigned to "sum". Since "sum" is defined only
once in the source, it is a standard numerical constant and can be
forward-referenced, so "dd sum" is assembled as "dd 3". To read more about
how the assembler is able to resolve this, see section 2.2.6.
  The value of a numerical constant can be preceded by a size operator. This
ensures that the value fits in the range for the specified size, and it can
also affect how some of the calculations inside the numerical expression are
performed. This example:

    c8 = byte -1
    c32 = dword -1

defines two different constants: the first one fits in 8 bits, the second one
in 32 bits.
  When you need to define a constant with the value of an address that may be
register-based (so you cannot use a numerical constant for this purpose),
you can use the extended syntax of the "label" directive (already described
in section 1.2.3), like:

    label myaddr at ebp+4

which declares a label placed at the "ebp+4" address. However, remember that
labels, unlike numerical constants, cannot become assembly-time variables.


2.2.2 Conditional assembly

  The "if" directive causes a block of instructions to be assembled only
under a certain condition. It should be followed by a logical expression
specifying the condition. The instructions in the next lines are assembled
only when this condition is met; otherwise they are skipped. The optional
"else if" directive, followed by a logical expression specifying an
additional condition, begins the next block of instructions. That block is
assembled if the previous conditions were not met and the additional
condition is met. The optional "else" directive begins the block of
instructions that is assembled if none of the conditions were met. The
"end if" directive ends the last block of instructions.
  You should note that the "if" directive is processed at the assembly stage
and therefore it doesn't affect any preprocessor directives, like the
definitions of symbolic constants and macroinstructions - when the assembler
recognizes the "if" directive, all the preprocessing has already been
finished.
  A logical expression consists of logical values and logical operators. The
logical operators are "~" for logical negation, "&" for logical and, and "|"
for logical or. The negation has the highest priority. A logical value can be
a numerical expression: it is false if it equals zero and true otherwise.
Two numerical expressions can be compared using one of the following
operators to make a logical value: "=" (equal), "<" (less), ">" (greater),
"<=" (less or equal), ">=" (greater or equal), "<>" (not equal).
  The "used" operator, followed by a symbol name, is a logical value that
checks whether the given symbol is used somewhere (it returns the correct
result even if the symbol is used only after this check). The "defined"
operator can be followed by any expression, usually just by a single symbol
name; it checks whether the given expression contains only symbols that are
defined in the source and accessible from the current position.
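  A common use of "used" is to include a routine in the output only when some
code actually calls it, and "defined" can make a block depend on an optional
constant. In this sketch "strlen" and "DEBUG" are hypothetical names, and the
routine expects ESI to point to a zero-terminated string:

    if used strlen
    strlen:                     ; hypothetical routine, length returned in ECX
            xor     ecx,ecx
      .scan:
            cmp     byte [esi+ecx],0
            je      .done
            inc     ecx
            jmp     .scan
      .done:
            ret
    end if

    if defined DEBUG            ; DEBUG is a hypothetical optional constant
            int3                ; breakpoint only in debugging builds
    end if

  A symbol tested with "defined" should not be defined inside the very block
that the test controls. Otherwise the result of the test may change from one
pass to the next.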
  With the "relativeto" operator it is possible to check whether the values
of two expressions differ only by a constant amount. The valid syntax is a
numerical expression followed by "relativeto" and then another expression
(possibly register-based). Labels that have no simple numerical value can be
tested this way to determine what kind of operations may be possible with
them.
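  For example, the register-based label "myaddr" (a hypothetical name)
defined below differs from EBP only by a constant. The condition is therefore
true and the instruction inside the block gets assembled:

    label myaddr at ebp+4       ; hypothetical register-based label
    if myaddr relativeto ebp
            mov     eax,[myaddr]    ; assembled as "mov eax,[ebp+4]"
    end if

  A macroinstruction can use such a check, for instance, to decide whether
its argument may be used as an EBP-relative address.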
  The following simple example uses the "count" constant, which should be
defined somewhere in the source:

    if count>0
        mov cx,count
        rep movsb
    end if

These two assembly instructions will be assembled only if the "count"
constant is greater than 0. The next sample shows a more complex conditional
structure:

    if count & ~ count mod 4
        mov cx,count/4
        rep movsd
    else if count>4
        mov cx,count/4
        rep movsd
        mov cx,count mod 4
        rep movsb
    else
        mov cx,count
        rep movsb
    end if

The first block of instructions gets assembled when "count" is nonzero and
divisible by four. If this condition is not met, the second logical
expression, which follows the "else if", is evaluated. If it is true, the
second block of instructions gets assembled; otherwise the last block of
instructions, which follows the line containing only "else", is assembled.
  There are also operators that allow comparison of values consisting of any
chains of symbols. The "eq" operator checks whether two such values are
exactly the same. The "in" operator checks whether the given value is a
member of the list of values following this operator. The list should be
enclosed between "<" and ">" characters, with its members separated by
commas. The symbols are considered the same when they have the same meaning
for the assembler - for example "pword" and "fword" are the same for the
assembler and thus are not distinguished by the above operators. In the same
way "16 eq 10h" is a true condition, however "16 eq 10+6" is not, even though
"10+6" evaluates to 16.
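  These operators are mostly useful inside macroinstructions, where they make
the assembly depend on the actual arguments. Here is a sketch with a
hypothetical "clear" macro:

    macro clear target {        ; hypothetical macroinstruction
      if target eq
            ; no argument given: nothing to assemble
      else if target in <eax,ebx,ecx,edx,esi,edi>
            xor     target,target
      else
            mov     target,0
      end if
    }

            clear   eax             ; assembled as "xor eax,eax"
            clear   dword [ebx]     ; assembled as "mov dword [ebx],0"

  Here "target eq", with nothing after "eq", is true only when the argument
is empty.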
  The "eqtype" operator checks whether the two compared values have the same
structure, and whether the structural elements are of the same type. The
distinguished types include numerical expressions, individual quoted strings,
floating point numbers, address expressions (the expressions enclosed in
square brackets or preceded by the "ptr" operator), instruction mnemonics,
registers, size operators, jump type and code type operators. In addition,
each of the special characters that act as separators, like comma or colon,
is a separate type itself. For example, two values that each consist of a
register name followed by a comma and a numerical expression are regarded as
being of the same type, no matter which register is used or how complicated
the numerical expression is. The exception is quoted strings and floating
point values: they are special kinds of numerical expressions and are
treated as different types. Thus the condition "eax,16 eqtype fs,3+7" is
true, but "eax,16 eqtype eax,1.6" is false.
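  A typical application is a macroinstruction that treats quoted strings and
numerical expressions differently. The sketch below uses the hypothetical
name "pushstr" and detects a string argument by comparing it with an empty
string through "eqtype":

    macro pushstr arg {         ; hypothetical macroinstruction
      local ..skip
      if arg eqtype ''
            call    ..skip      ; pushes the address of the string data
            db      arg,0
      ..skip:
      else
            push    arg
      end if
    }

            pushstr 'Hello'     ; a string: its address gets pushed
            pushstr 10h         ; a number: the value itself gets pushed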


2.2.3 Repeating blocks of instructions

  The "times" directive repeats one instruction a specified number of times.
It should be followed by a numerical expression specifying the number of
repeats and the instruction to repeat (optionally a colon can be used to
separate the number and the instruction). When the special symbol "%" is
used inside the instruction, it is equal to the number of the current
repeat. For example "times 5 db %" will define five bytes with values 1, 2,
3, 4, 5. Recursive use of the "times" directive is also allowed, so
"times 3 times % db %" will define six bytes with values 1, 1, 2, 1, 2, 3.
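  Since the count may be any numerical expression, "times" is often used for
padding. The following sketch of a boot sector fills the space up to the
510th byte with zeros and then places the two-byte boot signature. It uses
the "$$" symbol, which holds the base address of the current addressing
space:

    org 7C00h
            jmp     $               ; the actual boot code would go here
            times 510-($-$$) db 0   ; pad the sector to 510 bytes
            dw      0AA55h          ; boot signature in the last two bytes

  If the code before the padding grew beyond 510 bytes, the count would
become negative and the assembly would fail with an error.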
  The "repeat" directive repeats the whole block of instructions. It should
be followed by a numerical expression specifying the number of repeats. The
instructions to repeat are expected in the next lines, ended with the
"end repeat" directive, for example:

    repeat 8
        mov byte [bx],%
        inc bx
    end repeat

The generated code will store the byte values from one to eight in eight
consecutive bytes of memory, starting at the address held in the BX
register.
  The number of repeats can be zero; in that case the instructions are not
assembled at all.
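  Because "%" can also be used in expressions, "repeat" is convenient for
generating tables at assembly time. This sketch builds a table of squares
under the hypothetical label "squares":

    squares:                        ; hypothetical label
    repeat 16
            dw      (%-1)*(%-1)     ; 0, 1, 4, 9, ..., 225
    end repeat

  The expression subtracts one from "%" because the count of repeats starts
at 1, while the table is indexed from zero.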
  The "break" directive allows you to stop repeating earlier and continue the
assembly from the first line after the "end repeat". Combined with the "if"
directive it allows you to stop repeating under some special condition, like:

    s = x/2
    repeat 100
        if x/s = s
            break
        end if
        s = (s+x/s)/2
    end repeat

  The "while" directive repeats the block of instructions as long as the
condition specified by the logical expression following it is true. The block
of instructions to be repeated should end with the "end while" directive.
Before each repetition the logical expression is evaluated, and when its value
is false, the assembly continues from the first line after the "end while".
In this case too, the "%" symbol holds the number of the current repeat. The
"break" directive can be used to stop this kind of loop in the same way as
with the "repeat" directive. The previous sample, which approximates the
square root of "x" with Newton's method, can be rewritten to use "while"
instead of "repeat" this way:

    s = x/2
    while x/s <> s
        s = (s+x/s)/2
        if % = 100
            break
        end if
    end while

  The blocks defined with "if", "repeat" and "while" can be nested in any
order; however, they should be closed in the reverse of the order in which
they were started, the innermost block first. The "break" directive always
stops processing the block that was started last with either the "repeat"
or "while" directive.
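  As a further sketch of "while", the following lines compute at assembly
time how many bits are needed to represent a hypothetical constant "value":

    value = 1000                    ; hypothetical constant
    bits = 0                        ; hypothetical assembly-time variable
    while value shr bits            ; true as long as any bits remain
            bits = bits+1
    end while
            db      bits            ; assembled as "db 10"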


2.2.4 Addressing spaces

  The "org" directive sets the address at which the following code is
expected to appear in memory. It should be followed by a numerical expression
specifying the address. This directive begins a new addressing space. The
following code itself is not moved in any way, but all the labels defined
within it and the value of the "$" symbol are affected as if it was put at
the given address. However, it is the responsibility of the programmer to put
the code at the correct address at run-time.
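  A classic case is a DOS program in the .COM format. DOS loads such a
program at offset 100h of its segment, so the code is assembled with
"org 100h". The label "message" (a name chosen for this sketch) then gets an
address that is valid at run-time:

    org 100h
            mov     dx,message      ; address of the string, counted from 100h
            mov     ah,9
            int     21h             ; DOS: display the "$"-terminated string
            int     20h             ; DOS: terminate the program
    message db 'Hello!$'            ; hypothetical label

  Without the "org" directive the addresses would be computed from zero, and
"message" would point 100h bytes too low.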
  The "load" directive allows you to define a constant with a binary value
loaded from the already assembled code. This directive should be followed by
the name of the constant, then optionally a size operator, then the "from"
operator and a numerical expression specifying a valid address in the current
addressing space. The size operator has an unusual meaning in this case - it
states how many bytes (up to 8) have to be loaded to form the binary value of
the constant. If no size operator is specified, one byte is loaded (thus the
value is in range from 0 to 255). The loaded data cannot exceed the current
offset.
  The "store" directive can modify the already generated code by replacing
some of the previously generated data with the value of the numerical
expression that follows it. The expression can be preceded by an optional size
operator to specify how large a value the expression defines, and therefore how
many bytes will be stored; if there is no size operator, a size of one byte is
assumed. The expression should be followed by the "at" operator and a numerical
expression defining a valid address in the current addressing space, at which
the given value has to be stored. This directive is intended for advanced
applications and should be used carefully.
  Both the "load" and "store" directives are limited to operating on places in
the current addressing space. The "$$" symbol is always equal to the base
address of the current addressing space, and the "$" symbol is the address of
the current position in that addressing space; therefore these two values
define the limits of the area where "load" and "store" can operate.
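  To make the byte order and the "$$".."$" limits concrete, here is a small
Python model of one addressing space. It is not FASM code. The class name
"AddressingSpace" and its methods are hypothetical, and values are assumed to
be little-endian (as on x86). The sketch's "store" simply truncates values that
do not fit, whereas the assembler itself may report an error instead.

```python
class AddressingSpace:
    """Hypothetical model of one addressing space and the bytes generated in it."""

    def __init__(self, base):
        self.base = base            # plays the role of "$$"
        self.code = bytearray()     # data generated so far

    @property
    def current(self):              # plays the role of "$"
        return self.base + len(self.code)

    def emit(self, data):
        self.code += data

    def _check(self, address, size):
        if not 1 <= size <= 8:
            raise ValueError("size must be from 1 to 8 bytes")
        if address < self.base or address + size > self.current:
            raise ValueError("area outside the $$..$ range")

    def load(self, address, size=1):
        """Like "load x <size> from address": little-endian, one byte by default."""
        self._check(address, size)
        start = address - self.base
        return int.from_bytes(self.code[start:start + size], "little")

    def store(self, value, address, size=1):
        """Like "store <size> value at address": overwrites generated bytes."""
        self._check(address, size)
        start = address - self.base
        value &= (1 << 8 * size) - 1   # assumption: truncate instead of erroring
        self.code[start:start + size] = value.to_bytes(size, "little")
```

For example, after emitting the bytes 34h, 12h at base 100h, loading a word
from 100h gives 1234h. Reading past "$" is rejected, just as the limits above
require.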
  Combining the "load" and "store" directives makes it possible to do things
like encoding some of the already generated code. For example, to encode the
whole code generated in the current addressing space you can use such a block
of directives:

        repeat $-$$
          load a byte from $$+%-1
          store byte a xor c at $$+%-1
        end repeat

and each byte of code will be xored with the value defined by the "c" constant.
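  The effect of that loop can be modelled in Python. This is only a sketch of
what the directives compute, not of how the assembler runs them. The function
name "xor_encode" is hypothetical, and "key" stands for the "c" constant, which
is assumed to fit in a byte. Since "%" counts from 1, "$$+%-1" visits every byte
from "$$" up to "$-1".

```python
def xor_encode(code: bytes, key: int) -> bytes:
    """Model of: repeat $-$$ / load a byte from $$+%-1 / store byte a xor c at $$+%-1."""
    out = bytearray(code)
    for counter in range(1, len(out) + 1):   # "%" runs from 1 to $-$$
        offset = counter - 1                 # $$+%-1, relative to $$
        a = out[offset]                      # load a byte from $$+%-1
        out[offset] = a ^ key                # store byte a xor c at $$+%-1
    return bytes(out)                        # key > 255 would raise ValueError above
```

XOR with the same key is its own inverse, so code decoding the block at
run-time can use the same loop with the same key.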
  "virtual" defines virtual data at the specified address. This data will not
be included in the output file, but labels defined there can be used in other
parts of the source. This directive can be followed by the "at" operator and a
numerical expression specifying the address for the virtual data; otherwise it
uses the current address, the same as "virtual at $". Instructions defining
data are expected in the next lines, ended with the "end virtual" directive.
The block of virtual instructions itself is an independent addressing space;
after it ends, the context of the previous addressing space is restored.
  The "virtual" directive can be used to create a union of some variables, for
example:

        GDTR dp ?
        virtual at GDTR
          GDT_limit dw ?
          GDT_address dd ?
        end virtual

It defines two labels for parts of the 48-bit variable at the "GDTR" address.
It can also be used to define labels for some structures addressed by a
register, for example:

        virtual at bx
          LDT_limit dw ?
          LDT_address dd ?
        end virtual

With such a definition, the instruction "mov ax,[LDT_limit]" will be assembled
to the same instruction as "mov ax,[bx]".
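  The union only gives names to offsets inside the six bytes at "GDTR":
"GDT_limit" covers bytes 0-1 and "GDT_address" covers bytes 2-5. A Python sketch
with the standard struct module shows the same split. The byte values and the
offset constant names are made up for illustration.

```python
import struct

# Made-up GDTR image: limit 0027h, base address 00012345h, laid out
# little-endian like "GDT_limit dw ?" followed by "GDT_address dd ?".
gdtr = struct.pack("<HI", 0x0027, 0x00012345)

GDT_LIMIT_OFFSET = 0      # GDT_limit is at GDTR+0 (2 bytes)
GDT_ADDRESS_OFFSET = 2    # GDT_address is at GDTR+2 (4 bytes)

limit = struct.unpack_from("<H", gdtr, GDT_LIMIT_OFFSET)[0]
address = struct.unpack_from("<I", gdtr, GDT_ADDRESS_OFFSET)[0]
print(len(gdtr), hex(limit), hex(address))   # prints: 6 0x27 0x12345
```

The "<" prefix selects little-endian order with no padding, which matches the
way the "dw" and "dd" fields follow each other directly.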
  Declaring data values or instructions inside the virtual block can also be
useful, because the "load" directive can be used to load the values from the
virtually generated code into constants. This directive should be used after
the code it loads but before the virtual block ends, because it can only load
values from the same addressing space. For example:

        virtual at 0
          xor eax,eax
          and edx,eax
          load zeroq dword from 0
        end virtual

The above piece of code will define the "zeroq" constant containing four bytes
of the machine code of the instructions defined inside the virtual block.
This method can also be used to load some binary value from an external file.
For example, this code:

        virtual at 0
          file 'a.txt':10h,1
          load char from 0
        end virtual

loads the single byte from offset 10h in the file "a.txt" into the "char"
constant.
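  For comparison, this Python sketch reads the same value that the block above
places in "char": one byte at file offset 10h. The function name "byte_at" is
hypothetical. Like "load" without a size operator, it yields a value from 0 to
255.

```python
def byte_at(path: str, offset: int = 0x10) -> int:
    """Return the single byte at `offset` in the file, as an integer 0..255."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(1)
    if len(data) != 1:
        raise ValueError("file is shorter than offset + 1 bytes")
    return data[0]
```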
  Any of the "section" directives described in 2.4 also begins a new
addressing space.


2.2.5 Other directives

  The "align" directive aligns code or data to the specified boundary. It
should be followed by a numerical expression specifying the number of bytes to
a multiple of which the current address has to be aligned. The boundary value
has to be a power of two.
  The "align" directive fills the bytes that had to be skipped to perform the
alignment with "nop" instructions and at the same time marks this area as
uninitialized data, so if it is placed among other uninitialized data that
would not take space in the output file, the alignment bytes will act the same
way. If you need to fill the alignment area with some other values, you can
combine "align" with "virtual" to get the size of the alignment needed and then
create the alignment yourself, like:

        virtual
          align 16
          a = $ - $$
        end virtual
        db a dup 0

The "a" constant is defined to be the difference between the address after
alignment and the address of the "virtual" block (see the previous section), so
it is equal to the size of the needed alignment space.
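  The size measured by this "virtual" trick is the distance from the current
address up to the next multiple of the boundary. For a power-of-two boundary it
can be computed with a mask, as in this Python sketch. The function name is
hypothetical.

```python
def alignment_padding(address: int, boundary: int) -> int:
    """Bytes needed to move `address` up to a multiple of `boundary`."""
    if boundary <= 0 or boundary & (boundary - 1):
        raise ValueError("boundary has to be a power of two")
    # (-address) mod boundary, written as a mask because boundary is 2**n
    return (-address) & (boundary - 1)
```

For instance, an address of 1003h with a 16-byte boundary needs 13 bytes of
padding, while 1010h needs none.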
  The "display" directive displays a message at assembly time. It should be
followed by quoted strings or byte values, separated with commas. It can be
used to display values of some constants, for example:

        bits = 16
        display 'Current offset is 0x'
        repeat bits/4
          d = '0' + $ shr (bits-%*4) and 0Fh
          if d > '9'
            d = d + 'A'-'9'-1
          end if
          display d
        end repeat
        display 13,10

This block of directives calculates the four hexadecimal digits of a 16-bit
value and converts them into characters for displaying. Note that this will
not work if the addresses in the current addressing space are relocatable (as
it might happen with PE or object output formats), since only absolute values
can be used this way. The absolute value may be obtained by calculating the
relative address, like "$-$$", or "rva $" in the case of PE format.
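  The digit arithmetic can be checked outside the assembler. This Python
function mirrors the loop step by step, with "value" standing in for "$" (or for
"$-$$" when addresses are relocatable). The grouping that the example relies on
is made explicit with parentheses. The function name is hypothetical.

```python
def hex_digits(value: int, bits: int = 16) -> str:
    """Model of the "display" loop: one character per 4-bit digit, most significant first."""
    chars = []
    for counter in range(1, bits // 4 + 1):          # "%" runs from 1 to bits/4
        d = ord("0") + ((value >> (bits - counter * 4)) & 0x0F)
        if d > ord("9"):
            d = d + ord("A") - ord("9") - 1          # jump from '9'+1 to 'A'
        chars.append(chr(d))
    return "".join(chars)

print("Current offset is 0x" + hex_digits(0x1A2F))   # prints: Current offset is 0x1A2F
```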
  The "err" directive immediately terminates the assembly process when it is
encountered by the assembler.
  The "assert" directive tests whether the logical expression that follows it
is true, and if not, it signals an error.


2.2.6 Multiple passes

  Because the assembler allows referencing some of the labels or constants
before they actually get defined, it has to predict the values of such labels,
and if there is even a suspicion that a prediction failed in at least one case,
it does one more pass, assembling the whole source, this time making better
predictions based on the values the labels got in the previous pass.
  The changing values of labels can cause some instructions to have encodings
of different length, and this can cause a change in the values of labels again.
And since the labels and constants can also be used inside the expressions that
affect the behavior of control directives, the whole block of source can be
processed completely differently during the new pass. Thus the assembler does
more and more passes, each time trying to make better predictions to approach
the final solution, when all the values get predicted correctly. It uses
various methods for predicting the values, chosen to allow finding, in a few
passes, the solution of the smallest possible length for most programs.
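  A toy model can make this iteration concrete. The Python sketch below is not
flat assembler's prediction algorithm. It assumes a made-up program built from
labels, fixed-size data blocks and jumps. A jump takes 2 bytes when its
displacement fits in a signed byte and 5 bytes otherwise, like the short and
32-bit near forms of the x86 "jmp". Each pass uses the exact value of a label
already defined in that pass and the previous pass's value otherwise. The passes
stop when nothing changes or a limit is hit. All names are hypothetical.

```python
def assemble(program, max_passes=100):
    """Repeat passes until every label keeps the value predicted for it."""
    previous = {}
    for passes in range(1, max_passes + 1):
        current = {}
        address = 0
        for kind, arg in program:
            if kind == "label":
                current[arg] = address
            elif kind == "data":
                address += arg
            elif kind == "jmp":
                # Defined earlier in this pass: exact value. Otherwise use the
                # previous pass's value, or (never seen) assume it is right here.
                target = current.get(arg, previous.get(arg, address))
                displacement = target - (address + 2)
                address += 2 if -128 <= displacement <= 127 else 5
        if current == previous:      # all predictions confirmed
            return current, passes
        previous = current
    raise RuntimeError("no solution found within the pass limit")


program = [("label", "start"), ("jmp", "end"), ("data", 126),
           ("jmp", "start"), ("label", "end")]
labels, passes = assemble(program)
```

In this program the forward jump is predicted short in the first pass, grows to
5 bytes in the second, and the third pass only confirms the label values.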
  Some of the errors, like values not fitting in the required boundaries, are
not signaled during those intermediate passes, since it may happen that when
some of the values are predicted better, these errors will disappear. However,
if the assembler meets some illegal syntax construction or unknown instruction,
it always stops immediately. Defining some label more than once also causes
such an error, because it makes the predictions groundless.
  Only the messages created with the "display" directive during the last
performed pass actually get displayed. In case the assembly has been stopped
due to an error, these messages may reflect predicted values that are not yet
resolved correctly.
  The solution may sometimes not exist, and in such cases the assembler will
never manage to make correct predictions. For this reason there is a limit on
the number of passes, and when the assembler reaches this limit, it stops and
displays a message that it is not able to generate the correct output.
Consider the following example:

        if ~ defined alpha
          alpha:
        end if

The "defined" operator gives the true value when the expression following it
could be calculated in this place, which in this case means that the "alpha"
label is defined somewhere. But the above block causes this label to be defined
only when the value given by the "defined" operator is false, which leads to an
antinomy and makes it impossible to resolve such code. When processing the "if"
directive, the assembler has to predict whether the "alpha" label will be
defined somewhere (it would not have to predict only if the label was already
defined earlier in this pass), and whatever the prediction is, the opposite
always happens. Thus the assembly will fail, unless the "alpha" label is
defined somewhere in the source preceding the above block of instructions; in
such a case, as was already noted, the prediction is not needed and the block
will just get skipped.
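  The antinomy can be reproduced with a tiny Python model. The model assumes
that each pass predicts "defined alpha" from what the previous pass did, and
that the first pass predicts an unseen label to be undefined. The function is
hypothetical and only shows why no prediction can ever be confirmed.

```python
def if_not_defined_alpha(defined_earlier=False, max_passes=10):
    """Toy model of:  if ~ defined alpha / alpha: / end if
    Returns (pass count or None, whether the block was assembled in each pass)."""
    if defined_earlier:
        # "alpha" already defined before the block in this pass: no prediction
        # is needed, the condition is false and the block is skipped.
        return 1, [False]
    predicted = False                 # assumption: unseen label predicted undefined
    history = []
    for passes in range(1, max_passes + 1):
        assembled = not predicted     # block is assembled only if ~ defined alpha
        history.append(assembled)
        if assembled == predicted:    # never true here: this is the antinomy
            return passes, history
        predicted = assembled         # next pass predicts what this pass did
    return None, history              # pass limit reached, no solution
```

The history alternates between assembling and skipping the block until the pass
limit is reached. This is the failure the assembler reports.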
  The above sample might have been written as an attempt to define the label
only when it was not yet defined. It fails, because the "defined" operator does
check whether the label is defined anywhere, and this includes the definition
inside this conditionally processed block. However, adding some additional
condition may make it possible to get it resolved:

        if ~ defined alpha | defined @f
          alpha:
          @@:
        end if

The "@f" is always the same label as the nearest "@@" symbol in the source
following it, so the above sample would mean the same if any unique name was
used instead of the anonymous label. When "alpha" is not defined in any other
place in the source, the only possible solution is when this block gets
processed, and this time it does not lead to an antinomy, because of the
anonymous label which makes this block self-establishing. To better understand
this, look at a block that has nothing more than this self-establishing
condition:

        if defined @f
          @@:
        end if

This is an example of source that may have more than one solution, as both
cases, when this block gets processed and when it does not, are equally
correct. Which one of those two solutions we get depends on the algorithm of
the assembler; in the case of flat assembler, on its algorithm of predictions.
Back to the previous sample: when "alpha" is not defined anywhere else, the
condition for the "if" block cannot be false, so we are left with only one
possible solution, and we can hope the assembler will arrive at it. On the
other hand, when "alpha" is defined in some other place, we have got two
possible solutions again, but one of them causes "alpha" to be defined twice,
and such an error causes the assembler to abort the assembly immediately, as
this is the kind of error that deeply disturbs the process of resolving. So
such source can either be correctly resolved or cause an error, and what we get
may depend on the internal choices made by the assembler.
  However, there are some facts about such choices that are certain. When the
assembler has to check whether the given symbol is defined and it was already
defined in the current pass, no prediction is needed, as was already noted
above. And when the given symbol has never been defined before, including all
the already finished passes, the assembler predicts it to be not defined.
Knowing this, we can expect that the simple self-establishing block shown
above will not be assembled at all and that the previous sample will resolve
correctly when "alpha" is defined somewhere before our conditional block,
while it will itself define "alpha" when it is not already defined earlier,
thus potentially causing the error because of double definition if "alpha" is
also defined somewhere later.
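  These two rules can be traced in a small Python model of the passes. Only the
first two checks inside "defined" come from the rules above. Falling back to
the previous pass's result in every other case is an assumption of this sketch,
and all names in it are hypothetical. "@@:" is modelled as defining the symbol
that "@f" refers to.

```python
def simulate(block, max_passes=10):
    """Repeat passes of `block` until the set of defined symbols stops changing."""
    ever, last = set(), set()        # symbols of any finished pass / previous pass
    for passes in range(1, max_passes + 1):
        current = set()

        def defined(sym, current=current):
            if sym in current:       # defined earlier in this pass: no prediction
                return True
            if sym not in ever:      # never defined so far: predicted not defined
                return False
            return sym in last       # assumption: otherwise trust the previous pass

        def define(sym, current=current):
            if sym in current:
                raise RuntimeError("symbol already defined: " + sym)
            current.add(sym)

        block(defined, define)
        if current == last:
            return passes, current
        ever |= current
        last = current
    raise RuntimeError("pass limit reached")


def guarded(alpha_before=False, alpha_after=False):
    """Toy source: [alpha:] / if ~ defined alpha | defined @f / alpha: @@: / end if / [alpha:]"""
    def block(defined, define):
        if alpha_before:
            define("alpha")
        if not defined("alpha") or defined("@f"):
            define("alpha")
            define("@f")
        if alpha_after:
            define("alpha")
    return block


def self_establishing(defined, define):
    """Toy source: if defined @f / @@: / end if"""
    if defined("@f"):
        define("@f")
```

With these rules the self-establishing block is never assembled. The guarded
block is skipped when "alpha" comes first. A later definition of "alpha" ends
in a double-definition error.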
  The "used" operator may be expected to behave in a similar manner in
analogous cases; however, any other kinds of predictions may not be so simple,
and you should never rely on them this way.
  The "err" directive, usually used to stop the assembly when some condition is
met, stops the assembly immediately, regardless of whether the current pass is
final or intermediate. So even when the condition that caused this directive
to be interpreted is mispredicted and temporary, and would eventually disappear
in the later passes, the assembly is stopped anyway.
  The "assert" directive signals an error only if its expression is false
after all the symbols have been resolved. You can use "assert 0" in place of
"err" when you do not want to have the assembly stopped during the intermediate
passes.


2.3 Preprocessor directives

  All preprocessor directives are processed before the main assembly process,
and therefore are not affected by the control directives. At this time all
comments are also stripped out.


2.3.1 Including source files

  The "include" directive includes the specified source file at the position
where it is used. It should be followed by the quoted name of the file that
should be included, for example:

        include 'macros.inc'

The whole included file is preprocessed before preprocessing the lines next
to the line containing the "include" directive. There are no limits to the
number of included files as long as they fit in memory.
  The quoted path can contain environment variables enclosed within "%"
characters; they will be replaced with their values inside the path. Both the
"\" and "/" characters are allowed as path separators. The file is first
searched for in the directory containing the file which included it, and when
it is not found there, the search is continued in the directories specified in
the environment variable called INCLUDE (multiple paths separated with
semicolons can be defined there; they will be searched in the same order as
specified). If the file was not found in any of these places, the preprocessor
looks for it in the directory containing the main source file (the one
specified in the command line). These rules also apply to paths given with the
"file" directive.


2.3.2 Symbolic constants

  The symbolic constants are different from the numerical constants: before
the assembly process they are replaced with their values everywhere in the
source lines after their definitions, and anything can become their values.
2690 | The definition of symbolic constant consists of name of the constant |
3753 | The definition of symbolic constant consists of name of the constant |
2691 | followed by the "equ" directive. Everything that follows this directive will |
3754 | followed by the "equ" directive. Everything that follows this directive will |
2692 | become the value of constant. If the value of symbolic constant contains |
3755 | become the value of constant. If the value of symbolic constant contains |
2693 | other symbolic constants, they are replaced with their values before assigning |
3756 | other symbolic constants, they are replaced with their values before assigning |
2694 | this value to the new constant. For example: |
3757 | this value to the new constant. For example: |
2695 | 3758 | ||
2696 | d equ dword |
3759 | d equ dword |
2697 | NULL equ d 0 |
3760 | NULL equ d 0 |
2698 | d equ edx |
3761 | d equ edx |

After these three definitions the value of the "NULL" constant is "dword 0"
and the value of "d" is "edx". So, for example, "push NULL" will be assembled
as "push dword 0" and "push d" will be assembled as "push edx". And if the
following line was then put:

    d equ d,eax

the "d" constant would get the new value of "edx,eax". This way growing lists
of symbols can be defined.
  The "restore" directive allows getting back the previous value of a
redefined symbolic constant. It should be followed by one or more names of
symbolic constants, separated with commas. So "restore d" after the above
definitions will give the "d" constant back the value "edx", the second one
will restore it to the value "dword", and one more will revert "d" to its
original meaning, as if no such constant was defined. If there was no constant
of the given name defined, "restore" will not cause an error; it will just be
ignored.
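The behavior of "equ" and "restore" described above can be mimicked with a stack of values per name. The following is a hypothetical toy model in Python, not fasm's implementation. The class name `SymbolicConstants` is invented, and names are simplified to letters, digits and underscores.

```python
import re

# Hypothetical toy model of "equ" and "restore"; not fasm's implementation.
# Names are simplified to letters, digits and underscores.
NAME = re.compile(r"(?<![A-Za-z0-9_])[A-Za-z_][A-Za-z0-9_]*")


class SymbolicConstants:
    def __init__(self):
        self.stacks = {}  # name -> list of values, most recent definition last

    def expand(self, text):
        """Replace each defined name in text with its current value (single pass)."""
        def current(match):
            stack = self.stacks.get(match.group(0))
            return stack[-1] if stack else match.group(0)
        return NAME.sub(current, text)

    def equ(self, name, value):
        # the value is expanded before it is stored, so later redefinitions
        # of the names used inside it do not change it
        self.stacks.setdefault(name, []).append(self.expand(value))

    def restore(self, *names):
        for name in names:
            stack = self.stacks.get(name)
            if stack:  # restoring a name that has no definition is ignored
                stack.pop()


s = SymbolicConstants()
s.equ("d", "dword")
s.equ("NULL", "d 0")
s.equ("d", "edx")
print(s.expand("push NULL"))  # push dword 0
print(s.expand("push d"))     # push edx
s.equ("d", "d,eax")
print(s.expand("push d"))     # push edx,eax
```

Calling `s.restore("d")` three times then steps back through "edx" and "dword" to the undefined name, and any further call does nothing.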
  Symbolic constants can be used to adjust the syntax of the assembler to
personal preferences. For example, the following set of definitions provides
handy shortcuts for all the size operators:

    b equ byte
    w equ word
    d equ dword
    p equ pword
    f equ fword
    q equ qword
    t equ tword
    x equ dqword
    y equ qqword

  Because a symbolic constant may also have an empty value, it can be used to
allow the syntax with the "offset" word before any address value:

    offset equ

After this definition "mov ax,offset char" will be a valid construction for
copying the offset of the "char" variable into the "ax" register, because
"offset" is replaced with an empty value and therefore ignored.
  The "define" directive, followed by the name of a constant and then the
value, is an alternative way of defining a symbolic constant. The only
difference between "define" and "equ" is that "define" assigns the value as it
is; it does not replace the symbolic constants inside it with their values.
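The difference can be illustrated by comparing what each directive stores, using a hypothetical Python toy model. It does not show what happens when the stored value is later used in a source line. It models only the text kept at definition time, and the helper names `equ`, `define` and `expand` are invented for this sketch.

```python
import re

# Hypothetical toy model of what "equ" and "define" store (the history of
# redefinitions is left out). Names are simplified to letters, digits, "_".
NAME = re.compile(r"(?<![A-Za-z0-9_])[A-Za-z_][A-Za-z0-9_]*")
values = {}


def expand(text):
    return NAME.sub(lambda m: values.get(m.group(0), m.group(0)), text)


def equ(name, value):
    values[name] = expand(value)  # symbolic constants inside are replaced first


def define(name, value):
    values[name] = value          # stored exactly as written


equ("a", "1")
equ("b", "a+1")
define("c", "a+1")
print(values["b"], values["c"])  # 1+1 a+1
```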
  Symbolic constants can also be defined with the "fix" directive, which has
the same syntax as "equ", but defines constants of high priority: they are
replaced with their symbolic values even before the preprocessor directives
and macroinstructions are processed. The only exception is the "fix" directive
itself, which has the highest possible priority, so it allows redefinition of
constants defined this way.
  The "fix" directive can be used for syntax adjustments related to the
preprocessor directives, which cannot be done with the "equ" directive. For
example:

    incl fix include

defines a short name for the "include" directive, while a similar definition
done with the "equ" directive would not give such a result, as standard
symbolic constants are replaced with their values only after the line has been
searched for preprocessor directives.
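The ordering that makes "incl fix include" work, while "incl equ include" does not, can be sketched as a three-step pipeline. This is a hypothetical Python simplification. The function `classify_line` is invented, only "include" is treated as a preprocessor directive, and the macroinstruction stage that fasm runs between these steps is left out.

```python
# Hypothetical toy model of the order in which one source line is handled.
# Only "include" is recognized as a directive; fasm knows many more.
PREPROCESSOR_DIRECTIVES = {"include"}


def classify_line(line, fix_constants, equ_constants):
    words = line.split()
    # 1. "fix" constants are replaced before anything else
    words = [fix_constants.get(w, w) for w in words]
    # 2. the line is searched for a preprocessor directive
    if words and words[0] in PREPROCESSOR_DIRECTIVES:
        return "directive: " + " ".join(words)
    # 3. only now are ordinary ("equ") symbolic constants replaced
    words = [equ_constants.get(w, w) for w in words]
    return "ordinary line: " + " ".join(words)


print(classify_line("incl 'macros.inc'", {"incl": "include"}, {}))
print(classify_line("incl 'macros.inc'", {}, {"incl": "include"}))
```

In the second call the word "include" appears only after the directive search has already happened, so in this model the line is passed on as ordinary text rather than acted on as a directive.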


2.3.3 Macroinstructions

The "macro" directive allows you to define your own complex instructions,
called macroinstructions, which can greatly simplify the process of
programming. In its simplest form it is similar to a symbolic constant
definition. For example, the following definition defines a shortcut for the
"test al,0xFF" instruction:

    macro tst {test al,0xFF}

After the "macro" directive there is the name of the macroinstruction and then
its contents enclosed between the "{" and "}" characters. You can use the
"tst" instruction anywhere after this definition and it will be assembled as
"test al,0xFF". Defining a symbolic constant "tst" with that value would give a
similar result, but the difference is that the name of a macroinstruction is
recognized only as an instruction mnemonic. Also, macroinstructions are
replaced with the corresponding code even before the symbolic constants are
replaced with their values. So if you define a macroinstruction and a symbolic
constant of the same name, and use this name as an instruction mnemonic, it
will be replaced with the contents of the macroinstruction, but it will be
replaced with the value of the symbolic constant if used somewhere inside the
operands.
  The definition of a macroinstruction can consist of many lines, because the
"{" and "}" characters do not have to be in the same line as the "macro"
directive. For example:

    macro stos0
     {
        xor al,al
        stosb
     }

The macroinstruction "stos0" will be replaced with these two assembly
instructions anywhere it is used.
  Like instructions which need some number of operands, a macroinstruction
can be defined to need some number of arguments separated with commas. The
names of the needed arguments should follow the name of the macroinstruction
in the line of the "macro" directive and should be separated with commas if
there is more than one. Anywhere one of these names occurs in the contents of
the macroinstruction, it will be replaced with the corresponding value,
provided when the macroinstruction is used. Here is an example of a
macroinstruction that will do data alignment for the binary output format:

    macro align value { rb (value-1)-($+value-1) mod value }

When the "align 4" instruction is found after this macroinstruction is
defined, it will be replaced with the contents of this macroinstruction, and
"value" will there become 4, so the result will be "rb (4-1)-($+4-1) mod 4".
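The padding expression can be checked numerically. In this small Python sketch `%` stands in for fasm's `mod`, which gives the same result for the non-negative addresses involved. The function name `align_padding` is hypothetical.

```python
def align_padding(address, value):
    """Bytes reserved by "rb (value-1)-($+value-1) mod value" at address $."""
    return (value - 1) - (address + value - 1) % value


for address in range(9):
    pad = align_padding(address, 4)
    print(address, pad, address + pad)  # address, padding, next aligned address
```

For $ = 5 and a value of 4 the expression gives 3 - (8 mod 4) = 3, so the next data starts at 8. At an address that is already aligned, such as 8, it gives 3 - (11 mod 4) = 0, so nothing is reserved.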
  If a macroinstruction is defined that uses an instruction with the same name
inside its definition, the previous meaning of this name is used. Useful
redefinitions of macroinstructions can be done that way, for example:

    macro mov op1,op2
     {
      if op1 in <ds,es,fs,gs,ss> & op2 in <cs,ds,es,fs,gs,ss>
        push op2
        pop op1
      else
        mov op1,op2
      end if
     }

This macroinstruction extends the syntax of the "mov" instruction, allowing
both operands to be segment registers. For example, "mov ds,es" will be
assembled as "push es" and "pop ds". In all other cases the standard "mov"
instruction will be used. The syntax of this "mov" can be extended further by
defining the next macroinstruction of that name, which will use the previous
macroinstruction:

    macro mov op1,op2,op3
     {
      if op3 eq
        mov op1,op2
      else
        mov op1,op2
        mov op2,op3
      end if
     }

It allows the "mov" instruction to have three operands, but it can still have
only two operands, because when a macroinstruction is given fewer arguments
than it needs, the rest of the arguments will have empty values. When three
operands are given, this macroinstruction will become two macroinstructions of
the previous definition, so "mov es,ds,dx" will be assembled as "push ds",
"pop es" and "mov ds,dx".
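The way each definition falls back on the previous meaning of "mov" can be mimicked with Python functions, where each layer calls the one defined before it. This is a hypothetical sketch. The register sets are assumed to be ds, es, fs, gs and ss as the destination, with cs also allowed as the source, because cs cannot be a "mov" destination. Each function returns the resulting instruction lines as text.

```python
# Hypothetical model: each layer of "mov" delegates to the layer defined before it.
SEGMENT_DESTINATIONS = {"ds", "es", "fs", "gs", "ss"}
SEGMENT_SOURCES = {"cs", "ds", "es", "fs", "gs", "ss"}


def mov_builtin(op1, op2):
    return [f"mov {op1},{op2}"]


def mov_macro1(op1, op2):
    # first macroinstruction: segment register to segment register via the stack
    if op1 in SEGMENT_DESTINATIONS and op2 in SEGMENT_SOURCES:
        return [f"push {op2}", f"pop {op1}"]
    return mov_builtin(op1, op2)  # previous meaning: the real instruction


def mov_macro2(op1, op2, op3=""):
    # second macroinstruction: a missing third argument arrives as an empty value
    if op3 == "":
        return mov_macro1(op1, op2)
    return mov_macro1(op1, op2) + mov_macro1(op2, op3)


print(mov_macro2("es", "ds", "dx"))  # ['push ds', 'pop es', 'mov ds,dx']
print(mov_macro2("ax", "bx"))        # ['mov ax,bx']
```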
  By placing "*" after the name of an argument you can mark the argument as
required: the preprocessor will not allow it to have an empty value. For
example, the above macroinstruction could be declared as
"macro mov op1*,op2*,op3" to make sure that the first two arguments will
always have to be given some non-empty values.
  Alternatively, you can provide a default value for an argument by placing
"=" followed by the value after the name of the argument. Then, if the argument
has an empty value provided, the default value will be used instead.
  When a macroinstruction needs to be given an argument that contains some
commas, such an argument should be enclosed between the "<" and ">"
characters. If it contains more than one "<" character, the same number of
">" characters should be used to mark where the value of the argument ends.
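The three rules above ("*" for required arguments, "=" for defaults, and "<" ">" for grouping commas) can be combined in a small hypothetical Python model of argument binding. The function names are invented. The model ignores quoted strings and comparison operators that real source may contain. It also simply rejects surplus arguments, which is an assumption of this sketch.

```python
# Hypothetical toy model of binding macro arguments; not fasm's implementation.
# Simplification: quoted strings and comparison operators are not handled.


def split_arguments(text):
    """Split at top-level commas; the outermost "<...>" pair groups commas."""
    args, current, depth = [], "", 0
    for ch in text:
        if ch == "<":
            if depth > 0:
                current += ch      # a nested "<" stays part of the value
            depth += 1
        elif ch == ">" and depth > 0:
            depth -= 1
            if depth > 0:
                current += ch      # only the outermost pair is removed
        elif ch == "," and depth == 0:
            args.append(current.strip())
            current = ""
        else:
            current += ch
    args.append(current.strip())
    return args


def bind_arguments(parameters, text):
    """parameters look like ["op1*", "op2*", "op3"] or ["dest", "src=0"]."""
    values = split_arguments(text)
    if len(values) > len(parameters):
        raise ValueError("too many arguments")  # this model rejects extras
    bound = {}
    for i, spec in enumerate(parameters):
        name, _, default = spec.partition("=")
        required = name.endswith("*")
        name = name.rstrip("*")
        value = values[i] if i < len(values) else ""
        if value == "" and default:
            value = default        # "=" supplies a value for an empty argument
        if value == "" and required:
            raise ValueError(f"argument {name} must not be empty")
        bound[name] = value
    return bound


print(split_arguments("<1,2>,3"))                  # ['1,2', '3']
print(bind_arguments(["dest", "src=0"], "eax"))    # {'dest': 'eax', 'src': '0'}
```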
  The "purge" directive allows removing the last definition of the specified
macroinstruction. It should be followed by one or more names of
macroinstructions, separated with commas. If such a macroinstruction has not
been defined, you will not get any error. For example, after having the syntax
of "mov" extended with the macroinstructions defined above, you can disable the
syntax with three operands by using the "purge mov" directive. The next
"purge mov" will also disable the syntax for two operands being segment
registers, and all further such directives will do nothing.
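The behavior of "purge" corresponds to keeping a stack of definitions for each macroinstruction name. The following is a hypothetical Python sketch: the class `MacroTable` is invented, and the macro bodies are reduced to descriptive strings.

```python
# Hypothetical toy model: every name keeps a stack of macro definitions.
class MacroTable:
    def __init__(self):
        self.definitions = {}

    def macro(self, name, body):
        self.definitions.setdefault(name, []).append(body)

    def purge(self, *names):
        for name in names:
            stack = self.definitions.get(name)
            if stack:  # purging an undefined macroinstruction is not an error
                stack.pop()

    def lookup(self, name):
        """Current definition, or None when the plain instruction applies."""
        stack = self.definitions.get(name)
        return stack[-1] if stack else None


table = MacroTable()
table.macro("mov", "segment-to-segment version")
table.macro("mov", "three-operand version")
table.purge("mov")
print(table.lookup("mov"))  # segment-to-segment version
```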
  If after the "macro" directive you enclose some group of argument names in
square brackets, it will allow giving more values for this group of arguments
when using that macroinstruction. Any further argument given after the last
argument of such a group will begin a new group and will become its first
argument. That is why no more argument names can follow the closing square
bracket. The contents of the macroinstruction will be processed for each such
group of arguments separately. The simplest example is to enclose one argument
name in square brackets:

    macro stoschar [char]
     {
        mov al,char
        stosb
     }

This macroinstruction accepts an unlimited number of arguments, and each one
will be processed into these two instructions separately. For example,
"stoschar 1,2,3" will be assembled as the following instructions:

    mov al,1
    stosb
    mov al,2
    stosb
    mov al,3
    stosb

  There are some special directives available only inside the definitions of
macroinstructions. The "local" directive defines local names, which will be
replaced with unique values each time the macroinstruction is used. It should
be followed by names separated with commas. If the name given as a parameter
to the "local" directive begins with a dot or two dots, the unique labels
generated by each evaluation of the macroinstruction will have the same
properties as labels whose names begin that way. This directive is usually
needed for the constants or labels that a macroinstruction defines and uses
internally. For example:

    macro movstr
     {
        local move
      move:
        lodsb
        stosb
        test al,al
        jnz move
     }

Each time this macroinstruction is used, "move" will become another unique
name in its instructions, so you will not get the error you normally get when
some label is defined more than once.
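The effect of "local" can be sketched as renaming the listed names with a fresh suffix on every expansion. This is a hypothetical Python toy model. The "?N" suffix is a made-up naming scheme and does not reproduce the names fasm actually generates.

```python
import itertools
import re

# Hypothetical toy model of "local": each expansion renames the listed names
# with a fresh suffix. The "?N" naming scheme is invented for illustration.
_counter = itertools.count(1)


def expand_with_locals(body_lines, local_names):
    n = next(_counter)  # one fresh number per expansion
    alternatives = "|".join(map(re.escape, local_names))
    pattern = re.compile(r"(?<![\w.?])(" + alternatives + r")(?![\w?])")
    return [pattern.sub(lambda m: f"{m.group(1)}?{n}", line) for line in body_lines]


MOVSTR = ["move:", "lodsb", "stosb", "test al,al", "jnz move"]
first = expand_with_locals(MOVSTR, ["move"])
second = expand_with_locals(MOVSTR, ["move"])
print(first[0], second[0])  # move?1: move?2:
```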
  The "forward", "reverse" and "common" directives divide a macroinstruction
into blocks, each one processed after the processing of the previous one is
finished. They differ in behavior only if the macroinstruction allows multiple
groups of arguments. The block of instructions that follows the "forward"
directive is processed for each group of arguments, from first to last,
exactly like the default block (not preceded by any of these directives). The
block that follows the "reverse" directive is processed for each group of
arguments in reverse order, from last to first. The block that follows the
"common" directive is processed only once, commonly for all groups of
arguments. A local name defined in one of the blocks is available in all the
following blocks when processing the same group of arguments as when it was
defined, and when it is defined in the common block it is available in all the
following blocks regardless of which group of arguments is processed.
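The block processing order can be modeled in a few lines of Python. This is a hypothetical sketch for a macroinstruction of the form "name fixed,[arg]". Each block is a (kind, template) pair whose `{fixed}` and `{arg}` placeholders are Python format fields invented for the model. In the common block, the group argument stands for all of its values separated with commas, following the rule for multi-valued names described further in this section. Local names are not modeled.

```python
# Hypothetical toy model of "forward", "reverse" and "common" blocks for a
# macroinstruction like "macro name fixed,[arg]"; not fasm's implementation.


def expand_blocks(blocks, fixed, groups):
    lines = []
    for kind, template in blocks:  # blocks are processed one after another
        if kind == "common":
            # once for all groups; the group argument stands for all its
            # values, separated with commas
            lines.append(template.format(fixed=fixed, arg=",".join(groups)))
        else:
            order = groups if kind == "forward" else list(reversed(groups))
            for value in order:    # once per group of arguments
                lines.append(template.format(fixed=fixed, arg=value))
    return lines


# the "stdcall" macroinstruction of this section, written in the model's terms
stdcall = [("reverse", "push {arg}"), ("common", "call {fixed}")]
print(expand_blocks(stdcall, "foo", ["1", "2", "3"]))
# ['push 3', 'push 2', 'push 1', 'call foo']
```

Two consecutive "forward" blocks produce all lines of the first block before any line of the second, which is why the "strtbl" example places the whole address table before the strings.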
  Here is an example of a macroinstruction that will create a table of
addresses to strings, followed by these strings:

    macro strtbl name,[string]
     {
      common
        label name dword
      forward
        local label
        dd label
      forward
        label db string,0
     }

The first argument given to this macroinstruction will become the label for
the table of addresses, and the next arguments should be the strings. The first
block is processed only once and defines the label; the second block, for each
string, declares its local name and defines the table entry holding the
address of that string. The third block defines the data of each string with
the corresponding label.
  The directive starting a block in a macroinstruction can be followed by the
first instruction of this block in the same line, like in the following
example:

    macro stdcall proc,[arg]
     {
      reverse push arg
      common call proc
     }

This macroinstruction can be used for calling procedures using the STDCALL
convention, which has all the arguments pushed on the stack in reverse order.
For example, "stdcall foo,1,2,3" will be assembled as:

    push 3
    push 2
    push 1
    call foo

  If some name inside a macroinstruction has multiple values (it is either one
of the arguments enclosed in square brackets or a local name defined in the
block following the "forward" or "reverse" directive) and is used in the block
following the "common" directive, it will be replaced with all of its values,
separated with commas. For example, the following macroinstruction will pass
all of the additional arguments to the previously defined "stdcall"
macroinstruction:

    macro invoke proc,[arg]
     { common stdcall [proc],arg }

It can be used to call indirectly (by the pointer stored in memory) a
procedure using the STDCALL convention.
  Inside a macroinstruction the special operator "#" can also be used. This
operator causes two names to be concatenated into one name. It can be useful,
because it is applied after the arguments and local names are replaced with
their values. The following macroinstruction will generate a conditional jump
according to the "cond" argument:

    macro jif op1,cond,op2,label
     {
        cmp op1,op2
        j#cond label
     }

For example, "jif ax,ae,10h,exit" will be assembled as the "cmp ax,10h" and
"jae exit" instructions.
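The order that makes "j#cond" work (arguments first, then concatenation) can be sketched with a hypothetical Python helper. `expand_line` is an invented name, and this simplified model does not treat quoted strings specially.

```python
import re

# Hypothetical toy model: macro arguments are substituted first, and then
# every "#" glues together the text on its two sides. Quoted strings are not
# treated specially here.
NAME = re.compile(r"(?<![A-Za-z0-9_])[A-Za-z_][A-Za-z0-9_]*")


def expand_line(line, arguments):
    substituted = NAME.sub(lambda m: arguments.get(m.group(0), m.group(0)), line)
    return re.sub(r"\s*#\s*", "", substituted)


jif_body = ["cmp op1,op2", "j#cond label"]
args = {"op1": "ax", "cond": "ae", "op2": "10h", "label": "exit"}
print([expand_line(line, args) for line in jif_body])  # ['cmp ax,10h', 'jae exit']
```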
  The "#" operator can also be used to concatenate two quoted strings into one.
Conversion of a name into a quoted string is also possible, with the "`"
operator, which likewise can be used inside a macroinstruction. It converts
the name that follows it into a quoted string; note, however, that when it is
followed by a macro argument which is being replaced with a value containing
more than one symbol, only the first of them will be converted, as the "`"
operator converts only the one symbol that immediately follows it. Here is an
example of utilizing those two features:

    macro label name
     {
        label name
      if ~ used name
        display `name # " is defined but not used.",13,10
      end if
     }

When a label defined with such a macro is not used in the source, the macro
will warn you with a message telling which label it applies to.
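Both operators, and the caveat that "`" converts only one symbol, can be illustrated with a hypothetical Python model applied after argument substitution. The helper `expand_line` and its regular expressions are assumptions of this sketch. In this model quoted strings are matched as whole tokens, so names inside them are left alone.

```python
import re

# Hypothetical toy model of the "`" and "#" operators, applied after the macro
# arguments have been substituted. Quoted strings are matched as whole tokens,
# so names inside them are not replaced.
TOKEN = re.compile(r'"[^"]*"|(?<![A-Za-z0-9_])[A-Za-z_][A-Za-z0-9_]*')


def expand_line(line, arguments):
    line = TOKEN.sub(lambda m: arguments.get(m.group(0), m.group(0)), line)
    # "`" turns only the single name right after it into a quoted string
    line = re.sub(r"`([A-Za-z_][A-Za-z0-9_]*)", r'"\1"', line)
    # "#" between two quoted strings joins them into one string
    return re.sub(r'"\s*#\s*"', "", line)


body = 'display `name # " is defined but not used.",13,10'
print(expand_line(body, {"name": "start"}))
# display "start is defined but not used.",13,10
print(expand_line("db `arg", {"arg": "a+b"}))  # db "a"+b
```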
3004 | To make a macroinstruction behave differently when some of the arguments are |
4071 | To make a macroinstruction behave differently when some of the arguments are |
3005 | of some special type, for example quoted strings, you can use the "eqtype" |
4072 | of some special type, for example quoted strings, you can use the "eqtype" |
3006 | comparison operator. Here's an example of utilizing it to distinguish a |
4073 | comparison operator. Here's an example of utilizing it to distinguish a |
3007 | quoted string from any other argument: |
4074 | quoted string from any other argument: |
3008 | 4075 | ||
3009 | macro message arg |
4076 | macro message arg |
3010 | { |
4077 | { |
3011 | if arg eqtype "" |
4078 | if arg eqtype "" |
3012 | local str |
4079 | local str |
3013 | jmp @f |
4080 | jmp @f |
3014 | str db arg,0Dh,0Ah,24h |
4081 | str db arg,0Dh,0Ah,24h |
3015 | @@: |
4082 | @@: |
3016 | mov dx,str |
4083 | mov dx,str |
3017 | else |
4084 | else |
3018 | mov dx,arg |
4085 | mov dx,arg |
3019 | end if |
4086 | end if |
3020 | mov ah,9 |
4087 | mov ah,9 |
3021 | int 21h |
4088 | int 21h |
3022 | } |
4089 | } |
3023 | 4090 | ||
3024 | The above macro is designed for displaying messages in DOS programs. When the |
4091 | The above macro is designed for displaying messages in DOS programs. When the |
3025 | argument of this macro is some number, label, or variable, the string from |
4092 | argument of this macro is some number, label, or variable, the string from |
3026 | that address is displayed, but when the argument is a quoted string, the |
4093 | that address is displayed, but when the argument is a quoted string, the |
3027 | created code will display that string followed by the carriage return and |
4094 | created code will display that string followed by the carriage return and |
3028 | line feed. |
4095 | line feed. |
3029 | It is also possible to put a declaration of a macroinstruction inside another |
4096 | It is also possible to put a declaration of a macroinstruction inside another |
3030 | macroinstruction, so one macro can define another, but there is a problem |
4097 | macroinstruction, so one macro can define another, but there is a problem |
3031 | with such definitions caused by the fact that the "}" character cannot occur |
4098 | with such definitions caused by the fact that the "}" character cannot occur |
3032 | inside the macroinstruction, as it always means the end of the definition. To |
4099 | inside the macroinstruction, as it always means the end of the definition. To |
3033 | overcome this problem, the escaping of symbols inside a macroinstruction can be |
4100 | overcome this problem, the escaping of symbols inside a macroinstruction can be |
3034 | used. This is done by placing one or more backslashes in front of any other |
4101 | used. This is done by placing one or more backslashes in front of any other |
3035 | symbol (even a special character). The preprocessor sees such a sequence as a |
4102 | symbol (even a special character). The preprocessor sees such a sequence as a |
3036 | single symbol, but each time it meets such a symbol during the macroinstruction |
4103 | single symbol, but each time it meets such a symbol during the macroinstruction |
3037 | processing, it removes one backslash character from the front of it. For example, |
4104 | processing, it removes one backslash character from the front of it. For example, |
3038 | "\{" is treated as a single symbol, but during processing of the macroinstruction |
4105 | "\{" is treated as a single symbol, but during processing of the macroinstruction |
3039 | it becomes the "{" symbol. This allows one definition of a macroinstruction |
4106 | it becomes the "{" symbol. This allows one definition of a macroinstruction |
3040 | to be put inside another: |
4107 | to be put inside another: |
3041 | 4108 | ||
3042 | macro ext instr |
4109 | macro ext instr |
3043 | { |
4110 | { |
3044 | macro instr op1,op2,op3 |
4111 | macro instr op1,op2,op3 |
3045 | \{ |
4112 | \{ |
3046 | if op3 eq |
4113 | if op3 eq |
3047 | instr op1,op2 |
4114 | instr op1,op2 |
3048 | else |
4115 | else |
3049 | instr op1,op2 |
4116 | instr op1,op2 |
3050 | instr op2,op3 |
4117 | instr op2,op3 |
3051 | end if |
4118 | end if |
3052 | \} |
4119 | \} |
3053 | } |
4120 | } |
3054 | 4121 | ||
3055 | ext add |
4122 | ext add |
3056 | ext sub |
4123 | ext sub |
3057 | 4124 | ||
3058 | The macro "ext" is defined correctly, but when it is used, the "\{" and "\}" |
4125 | The macro "ext" is defined correctly, but when it is used, the "\{" and "\}" |
3059 | become the "{" and "}" symbols. So when "ext add" is processed, the |
4126 | become the "{" and "}" symbols. So when "ext add" is processed, the |
3060 | contents of the macro become a valid definition of a macroinstruction, and this way |
4127 | contents of the macro become a valid definition of a macroinstruction, and this way |
3061 | the "add" macro becomes defined. In the same way "ext sub" defines the "sub" |
4128 | the "add" macro becomes defined. In the same way "ext sub" defines the "sub" |
3062 | macro. The use of the "\{" symbol was not really necessary here (only "}" ends a definition), but it is done this |
4129 | macro. The use of the "\{" symbol was not really necessary here (only "}" ends a definition), but it is done this |
3063 | way to make the definition clearer. |
4130 | way to make the definition clearer. |
3064 | If some directives specific to macroinstructions, like "local" or "common", |
4131 | If some directives specific to macroinstructions, like "local" or "common", |
3065 | are needed inside some macro embedded this way, they can be escaped in the same |
4132 | are needed inside some macro embedded this way, they can be escaped in the same |
3066 | way. Escaping a symbol with more than one backslash is also permitted, which |
4133 | way. Escaping a symbol with more than one backslash is also permitted, which |
3067 | makes multiple levels of nested macroinstruction definitions possible. |
4134 | makes multiple levels of nested macroinstruction definitions possible. |
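The one-backslash-per-pass rule can be sketched in Python. This is a toy model that assumes an escaped sequence is kept as a single string. strip_one_level and needed_escape are hypothetical helpers, not fasm internals.

```python
def strip_one_level(symbol):
    """Toy model: one pass of macro processing removes one leading backslash."""
    if len(symbol) > 1 and symbol.startswith("\\"):
        return symbol[1:]
    return symbol


def needed_escape(char, depth):
    """Backslashes needed so that `char` surfaces after `depth` expansions."""
    return "\\" * depth + char


s = needed_escape("}", 2)                   # written as \\} in the outer definition
after_outer = strip_one_level(s)            # \}  (still escaped)
after_inner = strip_one_level(after_outer)  # }   (now it ends a definition)
print(after_outer, after_inner)             # \} }
```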
3068 | Another technique for defining one macroinstruction by another is to |
4135 | Another technique for defining one macroinstruction by another is to |
3069 | use the "fix" directive, which becomes useful when some macroinstruction only |
4136 | use the "fix" directive, which becomes useful when some macroinstruction only |
3070 | begins the definition of another one, without closing it. For example: |
4137 | begins the definition of another one, without closing it. For example: |
3071 | 4138 | ||
3072 | macro tmacro [params] |
4139 | macro tmacro [params] |
3073 | { |
4140 | { |
3074 | common macro params { |
4141 | common macro params { |
3075 | } |
4142 | } |
3076 | 4143 | ||
3077 | MACRO fix tmacro |
4144 | MACRO fix tmacro |
3078 | ENDM fix } |
4145 | ENDM fix } |
3079 | 4146 | ||
3080 | defines an alternative syntax for defining macroinstructions, which looks like: |
4147 | defines an alternative syntax for defining macroinstructions, which looks like: |
3081 | 4148 | ||
3082 | MACRO stoschar char |
4149 | MACRO stoschar char |
3083 | mov al,char |
4150 | mov al,char |
3084 | stosb |
4151 | stosb |
3085 | ENDM |
4152 | ENDM |
3086 | 4153 | ||
3087 | Note that a symbol that has such a customized definition must be defined with the "fix" |
4154 | Note that a symbol that has such a customized definition must be defined with the "fix" |
3088 | directive, because only the prioritized symbolic constants are processed before |
4155 | directive, because only the prioritized symbolic constants are processed before |
3089 | the preprocessor looks for the "}" character while defining the macro. This |
4156 | the preprocessor looks for the "}" character while defining the macro. This |
3090 | might be a problem if one needed to perform some additional tasks at the end |
4157 | might be a problem if one needed to perform some additional tasks at the end |
3091 | of such a definition, but there is one more feature which helps in such cases. |
4158 | of such a definition, but there is one more feature which helps in such cases. |
3092 | Namely, it is possible to put any directive, instruction or macroinstruction |
4159 | Namely, it is possible to put any directive, instruction or macroinstruction |
3093 | just after the "}" character that ends the macroinstruction and it will be |
4160 | just after the "}" character that ends the macroinstruction and it will be |
3094 | processed in the same way as if it were put on the next line. |
4161 | processed in the same way as if it were put on the next line. |
3095 | 4162 | ||
3096 | 4163 | ||
3097 | 2.3.4 Structures |
4164 | 2.3.4 Structures |
3098 | 4165 | ||
3099 | The "struc" directive is a special variant of the "macro" directive that is used to |
4166 | The "struc" directive is a special variant of the "macro" directive that is used to |
3100 | define data structures. A macroinstruction defined using the "struc" directive |
4167 | define data structures. A macroinstruction defined using the "struc" directive |
3101 | must be preceded by a label (like the data definition directive) when it's |
4168 | must be preceded by a label (like the data definition directive) when it's |
3102 | used. This label will also be attached at the beginning of every name starting |
4169 | used. This label will also be attached at the beginning of every name starting |
3103 | with a dot in the contents of the macroinstruction. The macroinstruction defined |
4170 | with a dot in the contents of the macroinstruction. The macroinstruction defined |
3104 | using the "struc" directive can have the same name as some other |
4171 | using the "struc" directive can have the same name as some other |
3105 | macroinstruction defined using the "macro" directive; the structure |
4172 | macroinstruction defined using the "macro" directive; the structure |
3106 | macroinstruction won't prevent the standard macroinstruction being processed |
4173 | macroinstruction will not prevent the standard macroinstruction from being |
3107 | when there is no label before it and vice versa. All the rules and features |
4174 | processed when there is no label before it and vice versa. All the rules and |
3108 | concerning standard macroinstructions apply to structure macroinstructions. |
4175 | features concerning standard macroinstructions apply to structure |
3109 | Here is a sample of a structure macroinstruction: |
4176 | macroinstructions. |
- | 4177 | Here is a sample of a structure macroinstruction: |
|
3110 | 4178 | ||
3111 | struc point x,y |
4179 | struc point x,y |
3112 | { |
4180 | { |
3113 | .x dw x |
4181 | .x dw x |
3114 | .y dw y |
4182 | .y dw y |
3115 | } |
4183 | } |
3116 | 4184 | ||
3117 | For example "my point 7,11" will define a structure labeled "my", consisting of |
4185 | For example "my point 7,11" will define a structure labeled "my", consisting of |
3118 | two variables: "my.x" with value 7 and "my.y" with value 11. |
4186 | two variables: "my.x" with value 7 and "my.y" with value 11. |
3119 | If somewhere inside the definition of a structure a name consisting of a |
4187 | If somewhere inside the definition of a structure a name consisting of a |
3120 | single dot is found, it is replaced by the name of the label for the given |
4188 | single dot is found, it is replaced by the name of the label for the given |
3121 | instance of the structure, and this label will not be defined automatically in |
4189 | instance of the structure, and this label will not be defined automatically in |
3122 | such a case, which allows the definition to be completely customized. The following |
4190 | such a case, which allows the definition to be completely customized. The following |
3123 | example utilizes this feature to extend the data definition directive "db" |
4191 | example utilizes this feature to extend the data definition directive "db" |
3124 | with the ability to calculate the size of the defined data: |
4192 | with the ability to calculate the size of the defined data: |
3125 | 4193 | ||
3126 | struc db [data] |
4194 | struc db [data] |
3127 | { |
4195 | { |
3128 | common |
4196 | common |
3129 | . db data |
4197 | . db data |
3130 | .size = $ - . |
4198 | .size = $ - . |
3131 | } |
4199 | } |
3132 | 4200 | ||
3133 | With such a definition, "msg db 'Hello!',13,10" will also define the "msg.size" |
4201 | With such a definition, "msg db 'Hello!',13,10" will also define the "msg.size" |
3134 | constant, equal to the size of the defined data in bytes. |
4202 | constant, equal to the size of the defined data in bytes. |
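Here is a toy Python model of how the label of a structure instance is attached. It assumes the macro parameters have already been replaced (7 and 11, or the "db" data). The regular expressions are simplifying assumptions: they treat a name as a letter or underscore followed by word characters, and they do not skip dots inside quoted strings. expand_struc is a hypothetical name.

```python
import re

# ".name" not preceded by a name character: gets the label as a prefix.
DOT_NAME = re.compile(r"(?<![\w.])\.([A-Za-z_]\w*)")
# A dot standing alone as a symbol: replaced by the label itself.
LONE_DOT = re.compile(r"(?<![\w.])\.(?![\w.])")


def expand_struc(label, body):
    """Toy model of struc label handling; returns (lines, label_auto_defined)."""
    uses_lone_dot = any(LONE_DOT.search(line) for line in body)
    lines = []
    for line in body:
        line = DOT_NAME.sub(lambda m: label + "." + m.group(1), line)
        line = LONE_DOT.sub(lambda m: label, line)
        lines.append(line)
    # When a lone dot is used, the label is not defined automatically.
    return lines, not uses_lone_dot


print(expand_struc("my", [".x dw 7", ".y dw 11"]))
print(expand_struc("msg", [". db 'Hello!',13,10", ".size = $ - ."]))
```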
3135 | Defining data structures addressed by registers or absolute values should be |
4203 | Defining data structures addressed by registers or absolute values should be |
3136 | done using the "virtual" directive with a structure macroinstruction |
4204 | done using the "virtual" directive with a structure macroinstruction |
3137 | (see 2.2.4). |
4205 | (see 2.2.4). |
3138 | The "restruc" directive removes the last definition of the structure, just like |
4206 | The "restruc" directive removes the last definition of the structure, just like |
3139 | "purge" does with macroinstructions and "restore" with symbolic constants. |
4207 | "purge" does with macroinstructions and "restore" with symbolic constants. |
3140 | It also has the same syntax - it should be followed by one or more names of |
4208 | It also has the same syntax - it should be followed by one or more names of |
3141 | structure macroinstructions, separated with commas. |
4209 | structure macroinstructions, separated with commas. |
3142 | 4210 | ||
3143 | 4211 | ||
3144 | 2.3.5 Repeating macroinstructions |
4212 | 2.3.5 Repeating macroinstructions |
3145 | 4213 | ||
3146 | The "rept" directive is a special kind of macroinstruction, which makes the given |
4214 | The "rept" directive is a special kind of macroinstruction, which makes the given |
3147 | number of duplicates of the block enclosed with braces. The basic syntax is |
4215 | number of duplicates of the block enclosed with braces. The basic syntax is |
3148 | "rept" directive followed by number (it cannot be an expression, since |
4216 | the "rept" directive followed by a number and then a block of source enclosed between |
3149 | preprocessor doesn't do calculations, if you need repetitions based on values |
- | |
3150 | calculated by assembler, use one of the code repeating directives that are |
- | |
3151 | processed by assembler, see 2.2.3), and then block of source enclosed between |
- | |
3152 | the "{" and "}" characters. The simplest example: |
4217 | the "{" and "}" characters. The simplest example: |
3153 | 4218 | ||
3154 | rept 5 { in al,dx } |
4219 | rept 5 { in al,dx } |
3155 | 4220 | ||
3156 | will make five duplicates of the "in al,dx" line. The block of instructions |
4221 | will make five duplicates of the "in al,dx" line. The block of instructions |
3157 | is defined in the same way as for the standard macroinstruction and any |
4222 | is defined in the same way as for the standard macroinstruction and any |
3158 | special operators and directives which can be used only inside |
4223 | special operators and directives which can be used only inside |
3159 | macroinstructions are also allowed here. When the given count is zero, the |
4224 | macroinstructions are also allowed here. When the given count is zero, the |
3160 | block is simply skipped, as if you defined a macroinstruction but never used |
4225 | block is simply skipped, as if you defined a macroinstruction but never used |
3161 | it. The number of repetitions can be followed by the name of a counter symbol, |
4226 | it. The number of repetitions can be followed by the name of a counter symbol, |
3162 | which will get replaced symbolically with the number of the duplicate currently |
4227 | which will get replaced symbolically with the number of the duplicate currently |
3163 | generated. So this: |
4228 | generated. So this: |
3164 | 4229 | ||
3165 | rept 3 counter |
4230 | rept 3 counter |
3166 | { |
4231 | { |
3167 | byte#counter db counter |
4232 | byte#counter db counter |
3168 | } |
4233 | } |
3169 | 4234 | ||
3170 | will generate lines: |
4235 | will generate lines: |
3171 | 4236 | ||
3172 | byte1 db 1 |
4237 | byte1 db 1 |
3173 | byte2 db 2 |
4238 | byte2 db 2 |
3174 | byte3 db 3 |
4239 | byte3 db 3 |
3175 | 4240 | ||
3176 | The repetition mechanism applied to "rept" blocks is the same as the one used |
4241 | The repetition mechanism applied to "rept" blocks is the same as the one used |
3177 | to process multiple groups of arguments for macroinstructions, so directives |
4242 | to process multiple groups of arguments for macroinstructions, so directives |
3178 | like "forward", "common" and "reverse" can be used in their usual meaning. |
4243 | like "forward", "common" and "reverse" can be used in their usual meaning. |
3179 | Thus this macroinstruction: |
4244 | Thus this macroinstruction: |
3180 | 4245 | ||
3181 | rept 7 num { reverse display `num } |
4246 | rept 7 num { reverse display `num } |
3182 | 4247 | ||
3183 | will display digits from 7 to 1 as text. The "local" directive behaves in the |
4248 | will display digits from 7 to 1 as text. The "local" directive behaves in the |
3184 | same way as inside a macroinstruction with multiple groups of arguments, so: |
4249 | same way as inside a macroinstruction with multiple groups of arguments, so: |
3185 | 4250 | ||
3186 | rept 21 |
4251 | rept 21 |
3187 | { |
4252 | { |
3188 | local label |
4253 | local label |
3189 | label: loop label |
4254 | label: loop label |
3190 | } |
4255 | } |
3191 | 4256 | ||
3192 | will generate a unique label for each duplicate. |
4257 | will generate a unique label for each duplicate. |
3193 | The counter symbol by default counts from 1, but you can declare a different |
4258 | The counter symbol by default counts from 1, but you can declare a different |
3194 | base value by placing the number preceded by a colon immediately after the name |
4259 | base value by placing the number preceded by a colon immediately after the name |
3195 | of the counter. For example: |
4260 | of the counter. For example: |
3196 | 4261 | ||
3197 | rept 8 n:0 { pxor xmm#n,xmm#n } |
4262 | rept 8 n:0 { pxor xmm#n,xmm#n } |
3198 | 4263 | ||
3199 | will generate code which will clear the contents of eight SSE registers. |
4264 | will generate code which will clear the contents of eight SSE registers. |
3200 | You can define multiple counters separated with commas, and each one can have |
4265 | You can define multiple counters separated with commas, and each one can have |
3201 | a different base. |
4266 | a different base. |
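The duplication and counter substitution described above can be sketched as a toy Python model, restricted to a one-line block. The names rept and counters, the reverse flag (standing in for a "reverse" directive in the block) and the simplified "#" handling are assumptions of this sketch. It does not model fasm's behavior in full.

```python
import re


def rept(count, body, counters=None, reverse=False):
    """Toy model of "rept": make `count` copies of a one-line `body`.

    counters maps each counter name to its base value; reverse=True stands
    in for a "reverse" directive at the top of the block.
    """
    counters = counters or {}
    names = sorted(counters, key=len, reverse=True)
    pattern = None
    if names:
        pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    lines = []
    for i in range(count):  # a count of zero yields no copies at all
        line = body
        if pattern:
            # Each counter is replaced by its base plus the copy index.
            line = pattern.sub(lambda m: str(counters[m.group(1)] + i), line)
        # "#" glues the symbols on both sides of it together.
        line = re.sub(r"\s*#\s*", "", line)
        lines.append(line)
    return lines[::-1] if reverse else lines


print(rept(3, "byte#counter db counter", {"counter": 1}))
# ['byte1 db 1', 'byte2 db 2', 'byte3 db 3']
```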
3202 | The "irp" directive iterates the single argument through the given list of |
4267 | The number of repetitions and the base values for counters can be specified |
- | 4268 | using numerical expressions with the same operator rules as in the case |
|
- | 4269 | of the assembler. However, each value used in such an expression must either be a |
|
- | 4270 | directly specified number, or a symbolic constant whose value is itself an |
|
- | 4271 | expression that can be calculated by the preprocessor (in such a case the value |
|
- | 4272 | of the expression associated with the symbolic constant is calculated first, and then |
|
- | 4273 | substituted into the outer expression in place of that constant). If you need |
|
- | 4274 | repetitions based on values that can only be calculated at assembly time, use |
|
- | 4275 | one of the code repeating directives that are processed by assembler, see |
|
- | 4276 | section 2.2.3. |
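A toy Python evaluator can show the order in which symbolic constants are evaluated in these expressions. It borrows Python's ast module to parse only "+", "-", "*" and parentheses, which behave the same way in both languages. fasm's other operators are not covered. pp_value and the constants "A" and "B" (think "A equ 2+3") are hypothetical.

```python
import ast
import operator

OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def pp_value(expr, constants):
    """Toy model of a preprocessor-computed count or base value."""
    return _eval(ast.parse(expr, mode="eval").body, constants)


def _eval(node, constants):
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return node.value
    if isinstance(node, ast.Name):
        if node.id not in constants:
            raise ValueError(node.id + " cannot be calculated by the preprocessor")
        # The constant's own expression is computed first, then used as a number.
        return pp_value(constants[node.id], constants)
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return OPS[type(node.op)](_eval(node.left, constants), _eval(node.right, constants))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand, constants)
    raise ValueError("unsupported expression")


print(pp_value("A*2", {"A": "2+3"}))  # 10, not the textual 2+3*2 = 8
```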
|
- | 4277 | The "irp" directive iterates the single argument through the given list of |
|
3203 | parameters. The syntax is "irp" followed by the argument name, then a comma |
4278 | parameters. The syntax is "irp" followed by the argument name, then a comma |
3204 | and then the list of parameters. The parameters are specified in the same |
4279 | and then the list of parameters. The parameters are specified in the same |
3205 | way as in the invocation of a standard macroinstruction, so they have to be |
4280 | way as in the invocation of a standard macroinstruction, so they have to be |
3206 | separated with commas and each one can be enclosed with the "<" and ">" |
4281 | separated with commas and each one can be enclosed with the "<" and ">" |
3207 | characters. Also, the name of the argument may be followed by "*" to mark that it |
4282 | characters. Also, the name of the argument may be followed by "*" to mark that it |
3208 | cannot get an empty value. This block: |
4283 | cannot get an empty value. This block: |
3209 | 4284 | ||
3210 | irp value, 2,3,5 |
4285 | irp value, 2,3,5 |
3211 | { db value } |
4286 | { db value } |
3212 | 4287 | ||
3213 | will generate lines: |
4288 | will generate lines: |
3214 | 4289 | ||
3215 | db 2 |
4290 | db 2 |
3216 | db 3 |
4291 | db 3 |
3217 | db 5 |
4292 | db 5 |
3218 | 4293 | ||
3219 | The "irps" directive iterates through the given list of symbols; it should |
4294 | The "irps" directive iterates through the given list of symbols; it should |
3220 | be followed by the argument name, then a comma and then a sequence of any |
4295 | be followed by the argument name, then a comma and then a sequence of any |
3221 | symbols. Each symbol in this sequence, no matter whether it is a name |
4296 | symbols. Each symbol in this sequence, no matter whether it is a name |
3222 | symbol, symbol character or quoted string, becomes an argument value for one |
4297 | symbol, symbol character or quoted string, becomes an argument value for one |
3223 | iteration. If there are no symbols following the comma, no iteration is done |
4298 | iteration. If there are no symbols following the comma, no iteration is done |
3224 | at all. This example: |
4299 | at all. This example: |
3225 | 4300 | ||
3226 | irps reg, al bx ecx |
4301 | irps reg, al bx ecx |
3227 | { xor reg,reg } |
4302 | { xor reg,reg } |
3228 | 4303 | ||
3229 | will generate lines: |
4304 | will generate lines: |
3230 | 4305 | ||
3231 | xor al,al |
4306 | xor al,al |
3232 | xor bx,bx |
4307 | xor bx,bx |
3233 | xor ecx,ecx |
4308 | xor ecx,ecx |
3234 | 4309 | ||
3235 | The blocks defined by the "irp" and "irps" directives are also processed in |
4310 | The blocks defined by the "irp" and "irps" directives are also processed in |
3236 | the same way as any macroinstructions, so operators and directives specific |
4311 | the same way as any macroinstructions, so operators and directives specific |
3237 | to macroinstructions may also be freely used in this case. |
4312 | to macroinstructions may also be freely used in this case. |
3238 | 4313 | ||
3239 | 4314 | ||
3240 | 2.3.6 Conditional preprocessing |
4315 | 2.3.6 Conditional preprocessing |
3241 | 4316 | ||
3242 | The "match" directive causes a block of source to be preprocessed and passed |
4317 | The "match" directive causes a block of source to be preprocessed and passed |
3243 | to assembler only when the given sequence of symbols matches the specified |
4318 | to assembler only when the given sequence of symbols matches the specified |
3244 | pattern. The pattern comes first, ended with comma, then the symbols that have |
4319 | pattern. The pattern comes first, ended with comma, then the symbols that have |
3245 | to be matched with the pattern, and finally the block of source, enclosed |
4320 | to be matched with the pattern, and finally the block of source, enclosed |
3246 | within braces as for a macroinstruction. |
4321 | within braces as for a macroinstruction. |
3247 | There are a few rules for building the pattern for matching; the first is |
4322 | There are a few rules for building the pattern for matching; the first is |
3248 | that any symbol character and any quoted string should be matched exactly |
4323 | that any symbol character and any quoted string should be matched exactly |
3249 | as is. In this example: |
4324 | as is. In this example: |
3250 | 4325 | ||
3251 | match +,+ { include 'first.inc' } |
4326 | match +,+ { include 'first.inc' } |
3252 | match +,- { include 'second.inc' } |
4327 | match +,- { include 'second.inc' } |
3253 | 4328 | ||
3254 | the first file will get included, since "+" after comma matches the "+" in |
4329 | the first file will get included, since "+" after comma matches the "+" in |
3255 | pattern, and the second file won't be included, since there is no match. |
4330 | pattern, and the second file will not be included, since there is no match. |
3256 | To match any other symbol literally, it has to be preceded by the "=" character |
4331 | To match any other symbol literally, it has to be preceded by the "=" character |
3257 | in the pattern. Also, to match the "=" character itself, or the comma, the |
4332 | in the pattern. Also, to match the "=" character itself, or the comma, the |
3258 | "==" and "=," constructions have to be used. For example the "=a==" pattern |
4333 | "==" and "=," constructions have to be used. For example the "=a==" pattern |
3259 | will match the "a=" sequence. |
4334 | will match the "a=" sequence. |
3260 | If a name symbol is placed in the pattern, it matches any sequence |
4335 | If a name symbol is placed in the pattern, it matches any sequence |
3261 | consisting of at least one symbol and then this name is replaced with the |
4336 | consisting of at least one symbol and then this name is replaced with the |
3262 | matched sequence everywhere inside the following block, analogously to the |
4337 | matched sequence everywhere inside the following block, analogously to the |
3263 | parameters of macroinstruction. For instance: |
4338 | parameters of macroinstruction. For instance: |
3264 | 4339 | ||
3265 | match a-b, 0-7 |
4340 | match a-b, 0-7 |
3266 | { dw a,b-a } |
4341 | { dw a,b-a } |
3267 | 4342 | ||
3268 | will generate the "dw 0,7-0" instruction. Each name is always matched with |
4343 | will generate the "dw 0,7-0" instruction. Each name is always matched with |
3269 | as few symbols as possible, leaving the rest for the following ones, so in |
4344 | as few symbols as possible, leaving the rest for the following ones, so in |
3270 | this case: |
4345 | this case: |
3271 | 4346 | ||
3272 | match a b, 1+2+3 { db a } |
4347 | match a b, 1+2+3 { db a } |
3273 | 4348 | ||
3274 | the "a" name will match the "1" symbol, leaving the "+2+3" sequence to be |
4349 | the "a" name will match the "1" symbol, leaving the "+2+3" sequence to be |
3275 | matched with "b". But in this case: |
4350 | matched with "b". But in this case: |
3276 | 4351 | ||
3277 | match a b, 1 { db a } |
4352 | match a b, 1 { db a } |
3278 | 4353 | ||
3279 | there will be nothing left for "b" to match, so the block won't get processed |
4354 | there will be nothing left for "b" to match, so the block will not get |
3280 | at all. |
4355 | processed at all. |
3281 | The block of source defined by "match" is processed in the same way as any |
4356 | The block of source defined by "match" is processed in the same way as any |
3282 | macroinstruction, so any operators specific to macroinstructions can also be used |
4357 | macroinstruction, so any operators specific to macroinstructions can also be used |
3283 | in this case. |
4358 | in this case. |
3284 | What makes the "match" directive more useful is the fact that it replaces the |
4359 | What makes the "match" directive more useful is the fact that it replaces the |
3285 | symbolic constants with their values in the matched sequence of symbols (that |
4360 | symbolic constants with their values in the matched sequence of symbols (that |
3286 | is everywhere after the comma up to the beginning of the source block) before |
4361 | is everywhere after the comma up to the beginning of the source block) before |
3287 | performing the match. Thanks to this it can be used for example to process |
4362 | performing the match. Thanks to this it can be used for example to process |
3288 | some block of source under the condition that some symbolic constant has the |
4363 | some block of source under the condition that some symbolic constant has the |
3289 | given value, like: |
4364 | given value, like: |
3290 | 4365 | ||
3291 | match =TRUE, DEBUG { include 'debug.inc' } |
4366 | match =TRUE, DEBUG { include 'debug.inc' } |
3292 | 4367 | ||
3293 | which will include the file only when the symbolic constant "DEBUG" was |
4368 | which will include the file only when the symbolic constant "DEBUG" was |
3294 | defined with value "TRUE". |
4369 | defined with value "TRUE". |
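The matching rules of this section can be combined into a small backtracking matcher in Python. It is a toy model that reproduces the examples above, not fasm's actual algorithm. The tokenizer's set of symbol characters is an assumption. match_pattern, compile_pattern and the constants dictionary (standing in for "equ" definitions) are hypothetical names.

```python
# Assumed set of fasm symbol characters; quoted strings are single symbols.
SYMBOL_CHARS = set("+-/*=<>()[]{}:,|&~#`")


def symbols(text):
    """Split text into name symbols, symbol characters and quoted strings."""
    out, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch in "'\"":
            end = text.index(ch, i + 1)  # toy: doubled quotes are not handled
            out.append(text[i:end + 1])
            i = end + 1
        elif ch in SYMBOL_CHARS:
            out.append(ch)
            i += 1
        else:
            j = i
            while j < len(text) and not (text[j].isspace() or text[j] in SYMBOL_CHARS or text[j] in "'\""):
                j += 1
            out.append(text[i:j])
            i = j
    return out


def compile_pattern(pattern):
    """Turn a pattern into ("lit", symbol) and ("name", symbol) items."""
    toks, items, i = symbols(pattern), [], 0
    while i < len(toks):
        tok = toks[i]
        if tok == "=" and i + 1 < len(toks):
            items.append(("lit", toks[i + 1]))  # "=x" matches x literally
            i += 2
            continue
        if tok in SYMBOL_CHARS or tok[0] in "'\"":
            items.append(("lit", tok))  # symbol characters and strings match as is
        else:
            items.append(("name", tok))
        i += 1
    return items


def _match(items, toks, bound):
    if not items:
        return bound if not toks else None
    kind, sym = items[0]
    if kind == "lit":
        if toks and toks[0] == sym:
            return _match(items[1:], toks[1:], bound)
        return None
    # In this model a name takes as few symbols as possible (at least one),
    # taking more only when the rest of the pattern cannot match otherwise.
    for n in range(1, len(toks) + 1):
        result = _match(items[1:], toks[n:], {**bound, sym: toks[:n]})
        if result is not None:
            return result
    return None


def match_pattern(pattern, value, constants=None):
    """Return the name bindings if `value` matches `pattern`, else None."""
    constants = constants or {}
    # Symbolic constants in the matched text are replaced before matching.
    toks = [s for tok in symbols(value) for s in symbols(constants.get(tok, tok))]
    return _match(compile_pattern(pattern), toks, {})


print(match_pattern("a-b", "0-7"))  # {'a': ['0'], 'b': ['7']}
```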
3295 | 4370 | ||
3296 | 4371 | ||
3297 | 2.3.7 Order of processing |
4372 | 2.3.7 Order of processing |
3298 | 4373 | ||
3299 | When combining various features of the preprocessor, it's important to know |
4374 | When combining various features of the preprocessor, it's important to know |
3300 | the order in which they are processed. As already noted, the highest |
4375 | the order in which they are processed. As already noted, the highest |
3301 | priority belongs to the "fix" directive and the replacements defined with it. This |
4376 | priority belongs to the "fix" directive and the replacements defined with it. This |
3302 | is done completely before any other preprocessing; therefore this |
4377 | is done completely before any other preprocessing; therefore this |
3303 | piece of source: |
4378 | piece of source: |
3304 | 4379 | ||
3305 | V fix { |
4380 | V fix { |
3306 | macro empty |
4381 | macro empty |
3307 | V |
4382 | V |
3308 | V fix } |
4383 | V fix } |
3309 | V |
4384 | V |
3310 | 4385 | ||
3311 | becomes a valid definition of an empty macroinstruction. It can be understood |
4386 | becomes a valid definition of an empty macroinstruction. It can be understood |
3312 | as if the "fix" directive and prioritized symbolic constants were processed in |
4387 | as if the "fix" directive and prioritized symbolic constants were processed in |
3313 | a separate stage, with all other preprocessing done afterwards on the resulting |
4388 | a separate stage, with all other preprocessing done afterwards on the resulting |
3314 | source. |
4389 | source. |
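This two-stage view can be sketched in Python. The toy fix_stage below applies prioritized replacements before anything else sees the source. It splits lines only at whitespace, an assumption that is enough for the examples in this section, and fix_stage is a hypothetical name.

```python
def fix_stage(lines):
    """Toy model of the prioritized "fix" stage, run before anything else."""
    fixes, out = {}, []
    for line in lines:
        toks = line.split()
        if len(toks) >= 2 and toks[1] == "fix":
            # Define or redefine a prioritized constant; the line itself vanishes.
            fixes[toks[0]] = " ".join(toks[2:])
            continue
        out.append(" ".join(fixes.get(t, t) for t in toks))
    return out


source = ["V fix {", "macro empty", "V", "V fix }", "V"]
print(fix_stage(source))  # ['macro empty', '{', '}']
```

Only after this stage does the preprocessor look for "{" and "}", which is why the "MACRO"/"ENDM" syntax shown earlier works only with "fix".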
3315 | The standard preprocessing that comes after begins, on each line, with |
4390 | The standard preprocessing that comes after begins, on each line, with |
3316 | recognition of the first symbol. It begins with checking for the preprocessor |
4391 | recognition of the first symbol. It starts with checking for the preprocessor |
3317 | directives, and when none of them is detected, the preprocessor checks whether the |
4392 | directives, and when none of them is detected, the preprocessor checks whether the |
3318 | first symbol is a macroinstruction. If no macroinstruction is found, it moves |
4393 | first symbol is a macroinstruction. If no macroinstruction is found, it moves |
3319 | to the second symbol of the line, and again begins with checking for directives, |
4394 | to the second symbol of the line, and again begins with checking for directives, |
3320 | which in this case is only the "equ" directive, as this is the only one that |
4395 | which in this case is only the "equ" directive, as this is the only one that |
3321 | occurs as the second symbol in a line. If there's no directive, the second |
4396 | occurs as the second symbol in a line. If there is no directive, the second |
3322 | symbol is checked for a structure macroinstruction, and when none |
4397 | symbol is checked for a structure macroinstruction, and when none |
3323 | of those checks gives a positive result, the symbolic constants are replaced |
4398 | of those checks gives a positive result, the symbolic constants are replaced |
3324 | with their values and such a line is passed to the assembler. |
4399 | with their values and such a line is passed to the assembler. |
3325 | To see this in an example, assume that a macroinstruction |
4400 | To see this in an example, assume that a macroinstruction |
3326 | called "foo" and a structure macroinstruction called "bar" are defined. These lines: |
4401 | called "foo" and a structure macroinstruction called "bar" are defined. These lines: |
3327 | 4402 | ||
3328 | foo equ |
4403 | foo equ |
3329 | foo bar |
4404 | foo bar |
3330 | 4405 | ||
3331 | would be then both interpreted as invocations of macroinstruction "foo", since |
4406 | would be then both interpreted as invocations of macroinstruction "foo", since |
3332 | the meaning of the first symbol overrides the meaning of second one. |
4407 | the meaning of the first symbol overrides the meaning of second one. |
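  The structure macroinstruction case can be seen in the following sketch,
where "point" and "origin" are hypothetical names chosen for illustration.
Since "origin" is neither a directive nor a macroinstruction, the preprocessor
moves on to the second symbol and finds the structure macroinstruction there:

    struc point x,y
     {
       .x dw x
       .y dw y
     }

    origin point 10,20      ; defines "origin.x" and "origin.y"

If "origin" were itself defined as a macroinstruction, the first symbol would
take precedence and the whole line would be passed to it instead.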
  When a macroinstruction generates new lines from its definition block, in
every line it first scans for macroinstruction directives and interprets them
accordingly. All the other content of the definition block is used to build
the new lines, replacing the macroinstruction parameters with their values and
then processing the symbol escaping and the "#" and "`" operators. The
conversion operator has higher priority than concatenation, and if either of
them operates on an escaped symbol, the escaping is cancelled before the
operation is completed. After this is completed, the newly generated line goes
through the standard preprocessing, as described above.
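  Here is a minimal sketch of these operators. The "defstr" macroinstruction
and the labels it generates are hypothetical. The definition concatenates the
parameter with "_str" to form a label and converts the parameter into a quoted
string. Because conversion has higher priority, `name becomes a quoted string
first, and only then is it concatenated with the second quoted string:

    macro defstr name
     {
       name#_str db `name # '!',0
     }

    defstr hello            ; should produce: hello_str db 'hello!',0

The first "#" joins two name symbols into one, while the second one joins two
quoted strings.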
  Though the symbolic constants are usually replaced only in lines where no
preprocessor directives or macroinstructions have been found, there are some
special cases where those replacements are performed in the parts of lines
containing directives. The first one is the definition of a symbolic constant,
where the replacements are done everywhere after the "equ" keyword and the
resulting value is then assigned to the new constant (see 2.3.2). The second
such case is the "match" directive, where the replacements are done in the
symbols following the comma before matching them with the pattern. These
features can be used, for example, to maintain lists, like this set of
definitions:

    list equ

    macro append item
     {
       match any, list \{ list equ list,item \}
       match , list \{ list equ item \}
     }

  The "list" constant is initialized here with an empty value, and the "append"
macroinstruction can be used to add new items to this list, separating them
with commas. The first match in this macroinstruction succeeds only when the
value of the list is not empty (see 2.3.6); in such case the new value of the
list is the previous one with a comma and the new item appended at the end.
The second match succeeds only when the list is still empty, and in such case
the list is defined to contain just the new item. So, starting with an empty
list, "append 1" would define "list equ 1" and "append 2" following it would
define "list equ 1,2". One might then need to use this list as the parameters
to some macroinstruction, but this cannot be done directly: if "foo" is the
macroinstruction, then "foo list" would just pass the "list" symbol as a
parameter to the macro, since symbolic constants are not unrolled at this
stage. For this purpose the "match" directive again comes in handy:

    match params, list { foo params }

  The value of "list", if it is not empty, matches the "params" keyword, which
is then replaced with the matched value when generating the new lines defined
by the block enclosed in braces. So if "list" had the value "1,2", the above
line would generate the line containing "foo 1,2", which would then go through
the standard preprocessing.
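  The following sketch puts the pieces together. Here "emit" is a hypothetical
macroinstruction standing in for "foo". Its parameter is declared in square
brackets, so its block is processed once for each item of the list:

    list equ

    macro append item
     {
       match any, list \{ list equ list,item \}
       match , list \{ list equ item \}
     }

    macro emit [item]
     {
       dd item
     }

    append 1
    append 2
    match params, list { emit params }   ; gives "emit 1,2": dd 1, dd 2

The "append" lines must come before the "match" line. The preprocessor works
in a single pass, so "match" sees only the value of "list" defined up to that
point.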
  The other special case is in the parameters of the "rept" directive. The
number of repetitions and the base value for the counter can be specified using
numerical expressions. If a symbolic constant with a non-numerical name is used
in such an expression, the preprocessor tries to evaluate its value as a
numerical expression. If it succeeds, it replaces the symbolic constant with
the result of that calculation and continues to evaluate the primary
expression. If the expression inside that symbolic constant also contains some
symbolic constants, the preprocessor will try to calculate all the needed
values recursively.
  This allows performing some calculations at the time of preprocessing, as
long as all the values used are numbers known at the preprocessing stage. A
single repetition with "rept" can be used for the sole purpose of calculating
some value, like in this example:

    define a b+4
    define b 3
    rept 1 result:a*b+2 { define c result }

  To compute the base value for the "result" counter, the preprocessor replaces
"b" with its value and recursively calculates the value of "a", obtaining 7 as
the result. It then calculates the main expression, with the result being 23.
The "c" constant then gets defined with the first value of the counter (because
the block is processed just once), which is the result of the computation, so
the value of "c" is simply the "23" symbol. Note that if "b" is later redefined
with some other numerical value, the next time an expression containing "a" is
calculated, the value of "a" will reflect the new value of "b", because the
symbolic constant contains just the text of the expression.
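  To illustrate that last remark, the example can be extended with a second
calculation after "b" is redefined (the name "d" is hypothetical):

    define a b+4
    define b 3
    rept 1 result:a*b+2 { define c result }   ; a = 7, so "c" is 23
    define b 5
    rept 1 result:a*b+2 { define d result }   ; a = 9 now, so "d" is 47

Because "a" was defined with "define", its value remains the text "b+4". It is
therefore evaluated anew, with the current value of "b", in each expression.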
  There is one more special case. When the preprocessor goes on to check the
second symbol in the line and it happens to be the colon character (which is
then interpreted by the assembler as the definition of a label), it stops at
this place and finishes the preprocessing of the first symbol (so if it is a
symbolic constant, it gets unrolled). If it still appears to be a label, the
preprocessor performs the standard preprocessing starting from the place after
the label. This allows placing preprocessor directives and macroinstructions
after labels, analogously to the instructions and directives processed by the
assembler, like:

    start: include 'start.inc'

However, if the label becomes broken during preprocessing (for example, when
it is a symbolic constant with an empty value), only the replacing of symbolic
constants is continued for the rest of the line.
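  The same rule allows a macroinstruction to follow a label. In this sketch
the "zero" macroinstruction is a hypothetical helper defined only for the
purpose of illustration:

    macro zero reg
     {
       xor reg,reg
     }

    start: zero eax         ; "start" is defined as a label, then the rest of
                            ; the line is expanded into "xor eax,eax"

The label "start" then points to the first instruction generated by the
macroinstruction.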
  It should be remembered that the jobs performed by the preprocessor are
preliminary operations on the text symbols, done in a simple single pass
before the main process of assembly. The text that is the result of
preprocessing is passed to the assembler, which then does its multiple passes
on it. Thus the control directives, which are recognized and processed only by
the assembler (as they depend on numerical values that may even vary between
passes), are not recognized in any way by the preprocessor and have no effect
on the preprocessing. Consider this example source:

    if 0
    a = 1
    b equ 2
    end if
    dd b

When it is preprocessed, the only directive that is recognized by the
preprocessor is "equ", which defines the symbolic constant "b", so later in
the source the "b" symbol is replaced with the value "2". Except for this
replacement, the other lines are passed unchanged to the assembler. So after
preprocessing the above source becomes:

    if 0
    a = 1
    end if
    dd 2

Now when the assembler processes it, the condition for the "if" is false, and
the "a" constant does not get defined. However, the symbolic constant "b" was
processed normally, even though its definition was put right next to the one
of "a". Because of this possible confusion, you should be very careful every
time you mix the features of the preprocessor and the assembler. In such cases
it is important to realize what the source will become after the
preprocessing, and thus what the assembler will see and do its multiple passes
on.
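  If a symbolic constant should really be defined only under some condition,
the condition has to be checked by the preprocessor as well. One possible
approach uses the "match" directive, which the preprocessor does interpret.
The sketch below uses a hypothetical "DEBUG" constant:

    DEBUG equ TRUE
    match =TRUE, DEBUG { b equ 2 }   ; "DEBUG" is replaced before matching
    dd b

Here "b" is defined only because the value of "DEBUG" matches the literal
symbol "TRUE". With any other value, "b" would stay undefined and the "dd b"
line would fail to assemble.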


2.4 Formatter directives

  These directives are actually also a kind of control directive, with the
purpose of controlling the format of the generated code.
  The "format" directive followed by the format identifier selects the output
format. This directive should be put at the beginning of the source. The
default output format is a flat binary file, which can also be selected by
using the "format binary" directive. This directive can be followed by the
"as" keyword and a quoted string specifying the default file extension for the
output file. Unless the output file name was specified from the command line,
the assembler will use this extension when generating the output file.
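  For instance, a DOS program in the COM format is just a flat binary, so it
could be declared as in the following minimal sketch. The "com" extension and
the "msg" label are choices made for this illustration. The DOS function calls
are standard DOS services outside the scope of this section:

    format binary as 'com'  ; default extension of the output file
    use16                   ; 16-bit code, also the default for this format
    org 100h                ; COM programs are loaded at offset 100h

        mov     ah,9        ; DOS function 9: write a '$'-terminated string
        mov     dx,msg
        int     21h
        mov     ax,4C00h    ; DOS function 4Ch: terminate with exit code 0
        int     21h

    msg db 'Hello!',13,10,'$'

When this source is assembled without an output file name given on the
command line, the output file should get the "com" extension.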
  The "use16" and "use32" directives force the assembler to generate 16-bit or
32-bit code, overriding the default setting for the selected output format.
"use64" enables generating code for the long mode of x86-64 processors.
  The different output formats are described below, together with the
directives specific to these formats.


2.4.1 MZ executable

  To select the MZ output format, use the "format MZ" directive. The default
code setting for this format is 16-bit.
  The "segment" directive defines a new segment. It should be followed by a
label, whose value will be the number of the defined segment. Optionally, the
"use16" or "use32" word can follow to specify whether the code in this segment
should be 16-bit or 32-bit. The origin of a segment is aligned to a paragraph
(16 bytes). All the labels defined afterwards will have values relative to the
beginning of this segment.
  The "entry" directive sets the entry point for the MZ executable. It should
be followed by the far address (name of the segment, colon and the offset
inside the segment) of the desired entry point.
  The "stack" directive sets up the stack for the MZ executable. It can be
followed by a numerical expression specifying the size of the stack to be
created automatically, or by the far address of the initial stack frame when
you want to set up the stack manually. When no stack is defined, a stack of
the default size of 4096 bytes will be created.
  The "heap" directive should be followed by a 16-bit value defining the
maximum size of the additional heap in paragraphs (this is heap in addition to
the stack and undefined data). Use "heap 0" to always allocate only the memory
the program really needs. The default size of the heap is 65535.
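  The directives above can be combined as in the following minimal sketch of
a DOS program. The segment names "main" and "dseg" and the labels are
hypothetical. The DOS function calls are standard DOS services outside the
scope of this section:

    format MZ
    entry main:start        ; far address of the entry point
    stack 100h              ; create a 256-byte stack automatically
    heap 0                  ; allocate no additional heap

    segment main            ; "main" gets the number of this segment

      start:
        mov     ax,dseg     ; number of the segment containing the message
        mov     ds,ax
        mov     dx,hello    ; offset relative to the beginning of "dseg"
        mov     ah,9        ; DOS function 9: write a '$'-terminated string
        int     21h
        mov     ax,4C00h    ; DOS function 4Ch: terminate with exit code 0
        int     21h

    segment dseg

      hello db 'Hello, world!',13,10,'$'

Labels are relative to their own segment, so "hello" has the offset 0 inside
"dseg".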


2.4.2 Portable Executable

  To select the Portable Executable output format, use the "format PE"
directive. It can be followed by additional format settings. The first is the
target subsystem setting, which can be "console" or "GUI" for Windows
applications, "native" for Windows drivers, or "EFI", "EFIboot" or
"EFIruntime" for UEFI. It may be followed by the minimum version of the system
that the executable is targeted to (specified in the form of a floating-point
value). The optional "DLL" and "WDM" keywords mark the output file as a
dynamic link library and a WDM driver respectively, and the "large" keyword
marks the executable as able to handle addresses larger than 2 GB.
  After those settings the "at" operator can follow, with a numerical
expression specifying the base of the PE image. Then, optionally, the "on"
operator followed by a quoted string containing a file name selects a custom
MZ stub for the PE program (when the specified file is not an MZ executable,
it is treated as a flat binary executable file and converted into MZ format).
The default code setting for this format is 32-bit. An example of a fully
featured PE format declaration:

    format PE GUI 4.0 DLL at 7000000h on 'stub.exe'

  To create a PE file for the x86-64 architecture, use the "PE64" keyword
instead of "PE" in the format declaration. In such case the long mode code is
generated by default.
  The "section" directive defines a new section. It should be followed by a
quoted string defining the name of the section, then one or more section flags
can follow. The available flags are: "code", "data", "readable", "writeable",
"executable", "shareable", "discardable", "notpageable". The origin of a
section is aligned to a page (4096 bytes). Example declaration of a PE
section:

    section '.text' code readable executable

  Along with the flags, one of the special PE data identifiers can also be
specified to mark the whole section as special data. The possible identifiers
are "export", "import", "resource" and "fixups". If the section is marked to
contain fixups, they are generated automatically and no more data needs to be
defined in this section. Resource data can also be generated automatically
from a resource file, by writing the "from" operator and a quoted file name
after the "resource" identifier. Below are examples of sections containing
some special PE data:

    section '.reloc' data discardable fixups
    section '.rsrc' data readable resource from 'my.res'

  The "entry" directive sets the entry point for the Portable Executable. The
value of the entry point should follow.
  The "stack" directive sets up the size of the stack for the Portable
Executable. The value of the stack reserve size should follow, optionally
followed by the value of the stack commit size, separated with a comma. When
the stack is not defined, it is set by default to a size of 4096 bytes.
  The "heap" directive chooses the size of the heap for the Portable
Executable. The value of the heap reserve size should follow, optionally
followed by the value of the heap commit size, separated with a comma. When
no heap is defined, it is set by default to a size of 65536 bytes. When the
size of the heap commit is unspecified, it is set by default to zero.
  The "data" directive begins the definition of special PE data. It should be
followed by one of the data identifiers ("export", "import", "resource" or
"fixups") or by the number of a data entry in the PE header. The data should
be defined in the next lines, ended with the "end data" directive. When the
fixups data definition is chosen, the fixups are generated automatically and
no more data needs to be defined there. The same applies to the resource data
when the "resource" identifier is followed by the "from" operator and a quoted
file name. In such case the data is taken from the given resource file.
  The "rva" operator can be used inside numerical expressions to obtain the
RVA of the item addressed by the value it is applied to, that is, the offset
relative to the base of the PE image.
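  The following minimal sketch combines several of these directives. The
label "start_rva" is hypothetical. The import data that a real program would
normally need (for example, to call ExitProcess) is omitted for brevity, so
the entry point simply returns:

    format PE console 4.0
    entry start
    stack 10000h,1000h      ; reserve 64 KB of stack, commit 4 KB
    heap 100000h            ; reserve 1 MB of heap, commit defaults to zero

    section '.text' code readable executable

      start:
        xor     eax,eax     ; return value 0
        ret                 ; return from the entry point

    section '.data' data readable writeable

      start_rva dd rva start  ; offset of "start" from the image base

Whether returning from the entry point cleanly ends the process depends on the
system loader. This sketch is meant only to show the directive syntax.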


2.4.3 Common Object File Format

  To select the Common Object File Format, use the "format COFF" or
"format MS COFF" directive, depending on whether you want to create the
classic (DJGPP) or Microsoft's variant of the COFF file. The default code
setting for this format is 32-bit. To create a file in Microsoft's COFF format
for the x86-64 architecture, use the "format MS64 COFF" setting. In such case
the long mode code is generated by default.
  The "section" directive defines a new section. It should be followed by a
quoted string defining the name of the section, then one or more section flags
can follow. The section flags available for both COFF variants are "code" and
"data". The flags "readable", "writeable", "executable", "shareable",
"discardable", "notpageable", "linkremove" and "linkinfo" are available only
with Microsoft's COFF variant.
  By default a section is aligned to a double word (four bytes). In the case
of Microsoft's COFF variant, another alignment can be specified by providing
the "align" operator followed by the alignment value (any power of two up to
8192) among the section flags.
  The "extrn" directive defines an external symbol. It should be followed by
the name of the symbol and, optionally, the size operator specifying the size
of the data labeled by this symbol. The name of the symbol can also be
preceded by a quoted string containing the name of the external symbol and the
"as" operator. Some example declarations of external symbols:

    extrn exit
    extrn '__imp__MessageBoxA@16' as MessageBox:dword

  The "public" directive declares an existing symbol as public. It should be
followed by the name of the symbol. Optionally, it can be followed by the "as"
operator and a quoted string containing the name under which the symbol should
be available as public. Some examples of public symbol declarations:

    public main
    public start as '_start'

  Additionally, with the COFF format it is possible to specify an exported
symbol as static. This is done by preceding the name of the symbol with the
"static" keyword.
  When using Microsoft's COFF format, the "rva" operator can be used inside
numerical expressions to obtain the RVA of the item addressed by the value it
is applied to.
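  The following minimal sketch shows how these directives fit together. This
Microsoft COFF object exports a hypothetical function "show_message" under the
name "_show_message" and calls MessageBoxA through the imported symbol shown
above. It assumes the object is linked with the import library of user32.dll,
which provides "__imp__MessageBoxA@16":

    format MS COFF

    extrn '__imp__MessageBoxA@16' as MessageBox:dword

    section '.text' code readable executable

      public show_message as '_show_message'

      show_message:
        push    0             ; uType: MB_OK
        push    caption       ; lpCaption
        push    message       ; lpText
        push    0             ; hWnd: no owner window
        call    [MessageBox]  ; indirect call through the imported pointer
        ret

    section '.data' data readable writeable

      caption db 'fasm',0
      message db 'Hello from a COFF object',0

From C, such a function could be declared as "void show_message(void);",
because the leading underscore is the usual name decoration of the cdecl
convention on 32-bit Windows.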
- | 4684 | ||
2.4.4 Executable and Linkable Format

To select the ELF output format, use the "format ELF" directive. The default
code setting for this format is 32-bit. To create an ELF file for the x86-64
architecture, use the "format ELF64" directive; in that case long mode code is
generated by default.
"section" directive defines a new section. It should be followed by a quoted
string defining the name of the section, which can then be followed by one or
both of the "executable" and "writeable" flags, and optionally also by the
"align" operator with a number specifying the alignment of the section (it has
to be a power of two). If no alignment is specified, the default value is used,
which is 4 or 8, depending on which format variant has been chosen.
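
For illustration, the ELF64 object below declares three sections with
different flag combinations. The section and label names are arbitrary, and
the comment about the default alignment of '.data' being 8 assumes that the
larger default belongs to the 64-bit variant, which the paragraph above does
not state explicitly.

```asm
format ELF64                            ; long mode code by default

section '.text' executable align 16     ; code, 16-byte aligned

get_answer:
        mov     eax,42
        ret

section '.data' writeable               ; writable data, default alignment (presumably 8)

counter dq 0

section '.rodata' align 4               ; no flags: neither executable nor writable

answer_text db '42',0
```

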
"extrn" and "public" directives have the same meaning and syntax as when the
COFF output format is selected (described in the previous section).
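
As an example of those directives in an ELF object, the sketch below exports a
function under a different public name and reads an external variable. The
names scale, scale_int and scale_factor are hypothetical. The code assumes the
32-bit cdecl convention (argument on the stack, result in EAX) and a
non-position-independent link, since the external variable is addressed
absolutely.

```asm
format ELF

extrn scale_factor:dword        ; hypothetical variable defined elsewhere, e.g. in C

section '.text' executable

public scale as 'scale_int'     ; visible to the linker as scale_int

; int scale_int(int x): returns x * scale_factor (cdecl)
scale:
        mov     eax,[esp+4]     ; first argument
        imul    eax,[scale_factor]
        ret
```

On the C side this would correspond to the prototype int scale_int(int x),
with int scale_factor defined in one of the C files.
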
The "rva" operator can also be used with this format (though not when the
target architecture is x86-64); it converts the address into an offset
relative to the GOT, so it may be useful for creating position-independent
code. There is also a special "plt" operator, which allows calling external
functions through the Procedure Linkage Table. You can even create an alias
for an external function that will make it always be called through the PLT,
with code like:

extrn 'printf' as _printf
printf = PLT _printf

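Building on that alias, a complete 32-bit object could call printf through the
PLT as sketched below. The entry name main and the message text are
illustrative. The sketch assumes the cdecl convention and that the object is
linked against the C library by a compiler driver, for instance with
"gcc -m32 -no-pie", because the string is pushed as an absolute address.

```asm
format ELF

extrn 'printf' as _printf
printf = PLT _printf            ; every use of printf now goes through the PLT

section '.text' executable

public main

main:
        push    msg             ; cdecl: argument passed on the stack
        call    printf          ; resolved via the Procedure Linkage Table
        add     esp,4           ; caller removes the argument
        xor     eax,eax         ; return 0
        ret

section '.data' writeable

msg db 'Hello through the PLT',0xA,0
```

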
To create an executable file, follow the format choice directive with the
"executable" keyword and, optionally, a number specifying the brand of the
target operating system (for example, the value 3 marks the executable as
intended for Linux). With this format selected it is allowed to use the
"entry" directive followed by the value to set as the entry point of the
program. On the other hand, it makes the "extrn" and "public" directives
unavailable, and instead of "section" the "segment" directive should be used,
followed by one or more segment permission flags and, optionally, a marker of
a special ELF executable segment, which can be "interpreter", "dynamic" or
"note". The origin of each segment is aligned to a page (4096 bytes), and the
available permission flags are: "readable", "writeable" and "executable".

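A minimal Linux executable in this format might look like the sketch below.
It needs no linker, and it assumes the 32-bit Linux system call interface
through int 0x80 (call number 4 is write, 1 is exit), which is outside the
scope of this manual.

```asm
format ELF executable 3         ; 3 = Linux brand
entry start

segment readable executable

start:
        mov     eax,4           ; sys_write
        mov     ebx,1           ; file descriptor: standard output
        mov     ecx,msg         ; buffer address
        mov     edx,msg_size    ; buffer length
        int     0x80

        mov     eax,1           ; sys_exit
        xor     ebx,ebx         ; exit status 0
        int     0x80

segment readable writeable

msg db 'Hello world!',0xA
msg_size = $-msg
```

fasm writes the executable file directly, so there is no separate link step.
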