Subversion Repositories Kolibri OS

Rev 1737 Rev 2666
1
▄▀▀▀
1
,'''
2
                         ▄▄█▄▄ ▄▄▄▄    ▄▄▄▄▄ ▄▄▄ ▄▄
2
                         ,,;,, ,,,,    ,,,,, ,,, ,,
3
                           █       █  █      █  █  █
3
                           ;       ;  ;      ;  ;  ;
4
                           █  ▄▀▀▀▀█   ▀▀▀▀▄ █  █  █
4
                           ;  ,'''';   '''', ;  ;  ;
5
                           █  ▀▄▄▄▄█▄ ▄▄▄▄▄▀ █  █  █
5
                           ;  ',,,,;, ,,,,,' ;  ;  ;
6
 
6
 
7
                              flat assembler 1.66
7
                              flat assembler 1.70
8
                              Programmer's Manual
8
                              Programmer's Manual
9
 
9
 
10
 
10
 
11
Table of contents
11
Table of contents
12
─────────────────
12
-----------------
13
 
13
 
14
Chapter 1  Introduction
14
Chapter 1  Introduction
15
 
15
 
16
        1.1  Compiler overview
16
        1.1  Compiler overview
17
        1.1.1  System requirements
17
        1.1.1  System requirements
18
        1.1.2  Executing compiler from command line
18
        1.1.2  Executing compiler from command line
19
        1.1.3  Compiler messages
19
        1.1.3  Compiler messages
20
        1.1.4  Output formats
20
        1.1.4  Output formats
21
 
21
 
22
        1.2  Assembly syntax
22
        1.2  Assembly syntax
23
        1.2.1  Instruction syntax
23
        1.2.1  Instruction syntax
24
        1.2.2  Data definitions
24
        1.2.2  Data definitions
25
        1.2.3  Constants and labels
25
        1.2.3  Constants and labels
26
        1.2.4  Numerical expressions
26
        1.2.4  Numerical expressions
27
        1.2.5  Jumps and calls
27
        1.2.5  Jumps and calls
28
        1.2.6  Size settings
28
        1.2.6  Size settings
29
 
29
 
30
Chapter 2  Instruction set
30
Chapter 2  Instruction set
31
 
31
 
32
        2.1  The x86 architecture instructions
32
        2.1  The x86 architecture instructions
33
        2.1.1  Data movement instructions
33
        2.1.1  Data movement instructions
34
        2.1.2  Type conversion instructions
34
        2.1.2  Type conversion instructions
35
        2.1.3  Binary arithmetic instructions
35
        2.1.3  Binary arithmetic instructions
36
        2.1.4  Decimal arithmetic instructions
36
        2.1.4  Decimal arithmetic instructions
37
        2.1.5  Logical instructions
37
        2.1.5  Logical instructions
38
        2.1.6  Control transfer instructions
38
        2.1.6  Control transfer instructions
39
        2.1.7  I/O instructions
39
        2.1.7  I/O instructions
40
        2.1.8  Strings operations
40
        2.1.8  Strings operations
41
        2.1.9  Flag control instructions
41
        2.1.9  Flag control instructions
42
        2.1.10  Conditional operations
42
        2.1.10  Conditional operations
43
        2.1.11  Miscellaneous instructions
43
        2.1.11  Miscellaneous instructions
44
        2.1.12  System instructions
44
        2.1.12  System instructions
45
        2.1.13  FPU instructions
45
        2.1.13  FPU instructions
46
        2.1.14  MMX instructions
46
        2.1.14  MMX instructions
47
        2.1.15  SSE instructions
47
        2.1.15  SSE instructions
48
        2.1.16  SSE2 instructions
48
        2.1.16  SSE2 instructions
49
        2.1.17  SSE3 instructions
49
        2.1.17  SSE3 instructions
50
        2.1.18  AMD 3DNow! instructions
50
        2.1.18  AMD 3DNow! instructions
51
        2.1.19  The x86-64 long mode instructions
51
        2.1.19  The x86-64 long mode instructions
52
 
52
        2.1.20  SSE4 instructions
-
 
53
        2.1.21  AVX instructions
-
 
54
        2.1.22  AVX2 instructions
-
 
55
        2.1.23  Auxiliary sets of computational instructions
-
 
56
        2.1.24  Other extensions of instruction set
-
 
57
 
53
        2.2  Control directives
58
        2.2  Control directives
54
        2.2.1  Numerical constants
59
        2.2.1  Numerical constants
55
        2.2.2  Conditional assembly
60
        2.2.2  Conditional assembly
56
        2.2.3  Repeating blocks of instructions
61
        2.2.3  Repeating blocks of instructions
57
        2.2.4  Addressing spaces
62
        2.2.4  Addressing spaces
58
        2.2.5  Other directives
63
        2.2.5  Other directives
59
        2.2.6  Multiple passes
64
        2.2.6  Multiple passes
60
 
65
 
61
        2.3  Preprocessor directives
66
        2.3  Preprocessor directives
62
        2.3.1  Including source files
67
        2.3.1  Including source files
63
        2.3.2  Symbolic constants
68
        2.3.2  Symbolic constants
64
        2.3.3  Macroinstructions
69
        2.3.3  Macroinstructions
65
        2.3.4  Structures
70
        2.3.4  Structures
66
        2.3.5  Repeating macroinstructions
71
        2.3.5  Repeating macroinstructions
67
        2.3.6  Conditional preprocessing
72
        2.3.6  Conditional preprocessing
68
        2.3.7  Order of processing
73
        2.3.7  Order of processing
69
 
74
 
70
        2.4  Formatter directives
75
        2.4  Formatter directives
71
        2.4.1  MZ executable
76
        2.4.1  MZ executable
72
        2.4.2  Portable Executable
77
        2.4.2  Portable Executable
73
        2.4.3  Common Object File Format
78
        2.4.3  Common Object File Format
74
        2.4.4  Executable and Linkable Format
79
        2.4.4  Executable and Linkable Format
75
 
80
 
76
 
81
 
-
 
82
 
77
Chapter 1  Introduction
83
Chapter 1  Introduction
78
───────────────────────
84
-----------------------
79
 
85
 
80
This chapter contains all the most important information you need to begin
86
This chapter contains all the most important information you need to begin
81
using the flat assembler. If you are an experienced assembly language programmer,
87
using the flat assembler. If you are an experienced assembly language programmer,
82
you should read at least this chapter before using this compiler.
88
you should read at least this chapter before using this compiler.
83
 
89
 
84
 
90
 
85
1.1  Compiler overview
91
1.1  Compiler overview
86
 
92
 
87
Flat assembler is a fast assembly language compiler for the x86 architecture
93
Flat assembler is a fast assembly language compiler for the x86 architecture
88
processors, which does multiple passes to optimize the size of generated
94
processors, which does multiple passes to optimize the size of generated
89
machine code. It is self-compilable and versions for different operating
95
machine code. It is self-compilable and versions for different operating
90
systems are provided. All the versions are designed to be used from the system
96
systems are provided. All the versions are designed to be used from the system
91
command line and they should not differ in behavior.
97
command line and they should not differ in behavior.
92
 
98
 
93
 
99
 
94
1.1.1  System requirements
100
1.1.1  System requirements
95
 
101
 
96
All versions require the x86 architecture 32-bit processor (at least 80386),
102
All versions require the x86 architecture 32-bit processor (at least 80386),
97
although they can produce programs for the x86 architecture 16-bit processors,
103
although they can produce programs for the x86 architecture 16-bit processors,
98
too. DOS version requires an OS compatible with MS DOS 2.0 and either true
104
too. DOS version requires an OS compatible with MS DOS 2.0 and either true
99
real mode environment or DPMI. Windows version requires a Win32 console
105
real mode environment or DPMI. Windows version requires a Win32 console
100
compatible with 3.1 version.
106
compatible with 3.1 version.
101
 
107
 
102
 
108
 
103
1.1.2  Executing compiler from command line
109
1.1.2  Executing compiler from command line
104
 
110
 
105
To execute flat assembler from the command line you need to provide two
111
To execute flat assembler from the command line you need to provide two
106
parameters - first should be name of source file, second should be name of
112
parameters - first should be name of source file, second should be name of
107
destination file. If no second parameter is given, the name for output
113
destination file. If no second parameter is given, the name for output
108
file will be guessed automatically. After displaying short information about
114
file will be guessed automatically. After displaying short information about
109
the program name and version, compiler will read the data from source file and
115
the program name and version, compiler will read the data from source file and
110
compile it. When the compilation is successful, compiler will write the
116
compile it. When the compilation is successful, compiler will write the
111
generated code to the destination file and display the summary of compilation
117
generated code to the destination file and display the summary of compilation
112
process; otherwise it will display the information about error that occurred.
118
process; otherwise it will display the information about error that occurred.
113
  The source file should be a text file, and can be created in any text
119
  The source file should be a text file, and can be created in any text
114
editor. Line breaks are accepted in both DOS and Unix standards, tabulators
120
editor. Line breaks are accepted in both DOS and Unix standards, tabulators
115
are treated as spaces.
121
are treated as spaces.
116
  In the command line you can also include "-m" option followed by a number,
122
  In the command line you can also include "-m" option followed by a number,
117
which specifies how many kilobytes of memory flat assembler should maximally
123
which specifies how many kilobytes of memory flat assembler should maximally
118
use. In the DOS version this option limits only the usage of extended
124
use. In the DOS version this option limits only the usage of extended
119
memory. The "-p" option followed by a number can be used to specify the limit
125
memory. The "-p" option followed by a number can be used to specify the limit
120
for number of passes the assembler performs. If code cannot be generated
126
for number of passes the assembler performs. If code cannot be generated
121
within specified amount of passes, the assembly will be terminated with an
127
within specified amount of passes, the assembly will be terminated with an
122
error message. The maximum value of this setting is 65536, while the default
128
error message. The maximum value of this setting is 65536, while the default
123
limit, used when no such option is included in command line, is 100.
129
limit, used when no such option is included in command line, is 100.
124
It is also possible to limit the number of passes the assembler
130
It is also possible to limit the number of passes the assembler
125
performs, with the "-p" option followed by a number specifying the maximum
131
performs, with the "-p" option followed by a number specifying the maximum
126
number of passes.
132
number of passes.
127
  There are no command line options that would affect the output of compiler,
133
  There are no command line options that would affect the output of compiler,
128
flat assembler requires only the source code to include the information it
134
flat assembler requires only the source code to include the information it
129
really needs. For example, to specify output format you specify it by using
135
really needs. For example, to specify output format you specify it by using
130
the "format" directive at the beginning of source.
136
the "format" directive at the beginning of source.
131
 
137
 
132
 
138
 
133
1.1.3  Compiler messages
139
1.1.3  Compiler messages
134
 
140
 
135
As it is stated above, after the successful compilation, the compiler displays
141
As it is stated above, after the successful compilation, the compiler displays
136
the compilation summary. It includes the information on how many passes were
142
the compilation summary. It includes the information on how many passes were
137
done, how much time it took, and how many bytes were written into the
143
done, how much time it took, and how many bytes were written into the
138
destination file.
144
destination file.
139
The following is an example of the compilation summary:
145
The following is an example of the compilation summary:
140
 
146
 
141
flat assembler  version 1.66
147
flat assembler  version 1.70 (16384 kilobytes memory)
142
38 passes, 5.3 seconds, 77824 bytes.
148
38 passes, 5.3 seconds, 77824 bytes.
143
 
149
 
144
In case of error during the compilation process, the program will display an
150
In case of error during the compilation process, the program will display an
145
error message. For example, when compiler can't find the input file, it will
151
error message. For example, when compiler can't find the input file, it will
146
display the following message:
152
display the following message:
147
 
153
 
148
flat assembler  version 1.66
154
flat assembler  version 1.70 (16384 kilobytes memory)
149
error: source file not found.
155
error: source file not found.
150
 
156
 
151
If the error is connected with a specific part of source code, the source line
157
If the error is connected with a specific part of source code, the source line
152
that caused the error will also be displayed. The placement of this line in
158
that caused the error will also be displayed. The placement of this line in
153
the source is also given to help you find the error, for example:
159
the source is also given to help you find the error, for example:
154
 
160
 
155
flat assembler  version 1.66
161
flat assembler  version 1.70 (16384 kilobytes memory)
156
example.asm [3]:
162
example.asm [3]:
157
        mob     ax,1
163
        mob     ax,1
158
error: illegal instruction.
164
error: illegal instruction.
159
 
165
 
160
It means that in the third line of the "example.asm" file compiler has
166
It means that in the third line of the "example.asm" file compiler has
161
encountered an unrecognized instruction. When the line that caused error
167
encountered an unrecognized instruction. When the line that caused error
162
contains a macroinstruction, also the line in macroinstruction definition
168
contains a macroinstruction, also the line in macroinstruction definition
163
that generated the erroneous instruction is displayed:
169
that generated the erroneous instruction is displayed:
164
 
170
 
165
flat assembler  version 1.66
171
flat assembler  version 1.70 (16384 kilobytes memory)
166
example.asm [6]:
172
example.asm [6]:
167
        stoschar 7
173
        stoschar 7
168
example.asm [3] stoschar [1]:
174
example.asm [3] stoschar [1]:
169
        mob     al,char
175
        mob     al,char
170
error: illegal instruction.
176
error: illegal instruction.
171
 
177
 
172
It means that the macroinstruction in the sixth line of the "example.asm" file
178
It means that the macroinstruction in the sixth line of the "example.asm" file
173
generated an unrecognized instruction with the first line of its definition.
179
generated an unrecognized instruction with the first line of its definition.
174
 
180
 
175
 
181
 
176
1.1.4  Output formats
182
1.1.4  Output formats
177
 
183
 
178
By default, when there is no "format" directive in source file, flat
184
By default, when there is no "format" directive in source file, flat
179
assembler simply puts generated instruction codes into output, creating a
185
assembler simply puts generated instruction codes into output, creating a
180
flat binary file this way. By default it generates 16-bit code, but you can
186
flat binary file this way. By default it generates 16-bit code, but you can
181
always switch to 16-bit or 32-bit mode with the "use16" or "use32" directive.
187
always switch to 16-bit or 32-bit mode with the "use16" or "use32" directive.
182
Some of the output formats switch into 32-bit mode, when selected - more
188
Some of the output formats switch into 32-bit mode, when selected - more
183
information about formats which you can choose can be found in 2.4.
189
information about formats which you can choose can be found in 2.4.
184
  All output code is always in the order in which it was entered into the
190
  All output code is always in the order in which it was entered into the
185
source file.
191
source file.
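
As a small illustration of the mode directives, the sketch below (not taken
from the manual) assembles the same kind of instruction in both modes. The
byte values in the comments are the standard x86 encodings, and the choice of
instructions is only an example.

```asm
; No "format" directive, so the output is a flat binary file.
        mov     ax,1            ; default 16-bit mode: B8 01 00
        use32                   ; switch to 32-bit code
        mov     eax,1           ; B8 01 00 00 00
        use16                   ; switch back to 16-bit code
        mov     eax,1           ; operand-size prefix added: 66 B8 01 00 00 00
```

The bytes appear in the output in exactly this order, since the code is always
written in the order in which it was entered into the source.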
186
 
192
 
187
 
193
 
188
1.2  Assembly syntax
194
1.2  Assembly syntax
189
 
195
 
190
The information provided below is intended mainly for the assembler
196
The information provided below is intended mainly for the assembler
191
programmers that have been using some other assembly compilers before.
197
programmers that have been using some other assembly compilers before.
192
If you are beginner, you should look for the assembly programming tutorials.
198
If you are beginner, you should look for the assembly programming tutorials.
193
  Flat assembler by default uses the Intel syntax for the assembly
199
  Flat assembler by default uses the Intel syntax for the assembly
194
instructions, although you can customize it using the preprocessor
200
instructions, although you can customize it using the preprocessor
195
capabilities (macroinstructions and symbolic constants). It also has its own
201
capabilities (macroinstructions and symbolic constants). It also has its own
196
set of the directives - the instructions for compiler.
202
set of the directives - the instructions for compiler.
197
  All symbols defined inside the sources are case-sensitive.
203
  All symbols defined inside the sources are case-sensitive.
198
 
204
 
199
 
205
 
200
1.2.1  Instruction syntax
206
1.2.1  Instruction syntax
201
 
207
 
202
Instructions in assembly language are separated by line breaks, and one
208
Instructions in assembly language are separated by line breaks, and one
203
instruction is expected to fill one line of text. If a line contains
209
instruction is expected to fill one line of text. If a line contains
204
a semicolon, except for the semicolons inside the quoted strings, the rest of
210
a semicolon, except for the semicolons inside the quoted strings, the rest of
205
this line is a comment and the compiler ignores it. If a line ends with the "\"
211
this line is a comment and the compiler ignores it. If a line ends with the "\"
206
character (optionally followed by a semicolon and comment), the next line
212
character (optionally followed by a semicolon and comment), the next line
207
is attached at this point.
213
is attached at this point.
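
The comment and line continuation rules can be sketched as follows. This is an
illustrative fragment, not part of the manual; the "db" directive used here to
emit bytes is the one described in 1.2.2.

```asm
        mov     ax,3            ; everything after the semicolon is ignored
        db      'a;b'           ; this semicolon is inside a quoted string, so it is data
        db      1,2,3,\         ; the backslash joins the next line to this one
                4,5,6           ; same as writing "db 1,2,3,4,5,6" on one line
```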
208
  Each line in source is the sequence of items, which may be one of the three
214
  Each line in source is the sequence of items, which may be one of the three
209
types. One type is the symbol characters, which are the special characters
215
types. One type is the symbol characters, which are the special characters
210
that are individual items even when they are not spaced from the other ones.
216
that are individual items even when they are not spaced from the other ones.
211
Any of the "+-*/=<>()[]{}:,|&~#`" is the symbol character. The sequence of
217
Any of the "+-*/=<>()[]{}:,|&~#`" is the symbol character. The sequence of
212
other characters, separated from other items with either blank spaces or
218
other characters, separated from other items with either blank spaces or
213
symbol characters, is a symbol. If the first character of symbol is either a
219
symbol characters, is a symbol. If the first character of symbol is either a
214
single or double quote, it integrates the any sequence of characters following
220
single or double quote, it integrates any sequence of characters following it,
215
it, even the special ones, into a quoted string, which should end with the same
221
even the special ones, into a quoted string, which should end with the same
216
character, with which it began (the single or double quote) - however if there
222
character, with which it began (the single or double quote) - however if there
217
are two such characters in a row (without any other character between them),
223
are two such characters in a row (without any other character between them),
218
they are integrated into quoted string as just one of them and the quoted
224
they are integrated into quoted string as just one of them and the quoted
219
string continues then. The symbols other than symbol characters and quoted
225
string continues then. The symbols other than symbol characters and quoted
220
strings can be used as names, so are also called the name symbols.
226
strings can be used as names, so are also called the name symbols.
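
A short sketch of the quoting rule, assuming the "db" directive from 1.2.2 is
used to store the characters; the strings themselves are made up for the
example.

```asm
        db      'It''s'         ; doubled single quote: stores the 4 characters It's
        db      "say ""hi"""    ; doubled double quotes: stores say "hi"
        db      "It's"          ; the other kind of quote needs no doubling
```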
221
  Every instruction consists of a mnemonic and a varying number of
227
  Every instruction consists of a mnemonic and a varying number of
222
operands, separated with commas. An operand can be a register, an immediate
228
operands, separated with commas. An operand can be a register, an immediate
223
value or data addressed in memory; it can also be preceded by a size operator
229
value or data addressed in memory; it can also be preceded by a size operator
224
to define or override its size (table 1.1). The names of available registers
230
to define or override its size (table 1.1). The names of available registers
225
are listed in table 1.2; their sizes cannot be overridden. An immediate value
231
are listed in table 1.2; their sizes cannot be overridden. An immediate value
226
can be specified by any numerical expression.
232
can be specified by any numerical expression.
227
  When operand is a data in memory, the address of that data (also any
233
  When operand is a data in memory, the address of that data (also any
228
numerical expression, but it may contain registers) should be enclosed in
234
numerical expression, but it may contain registers) should be enclosed in
229
square brackets or preceded by "ptr" operator. For example instruction
235
square brackets or preceded by "ptr" operator. For example instruction
230
"mov eax,3" will put the immediate value 3 into the EAX register, instruction
236
"mov eax,3" will put the immediate value 3 into the EAX register, instruction
231
"mov eax,[7]" will put the 32-bit value from the address 7 into EAX and the
237
"mov eax,[7]" will put the 32-bit value from the address 7 into EAX and the
232
instruction "mov byte [7],3" will put the immediate value 3 into the byte at
238
instruction "mov byte [7],3" will put the immediate value 3 into the byte at
233
address 7, it can also be written as "mov byte ptr 7,3". To specify which
239
address 7, it can also be written as "mov byte ptr 7,3". To specify which
234
segment register should be used for addressing, segment register name followed
240
segment register should be used for addressing, segment register name followed
235
by a colon should be put just before the address value (inside the square
241
by a colon should be put just before the address value (inside the square
236
brackets or after the "ptr" operator).
242
brackets or after the "ptr" operator).
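
The operand forms described above, collected into one fragment. The first four
instructions are the ones quoted in the text; the last line is an added
illustration of a segment register written inside the square brackets, and
the address 7 is arbitrary.

```asm
        use32                   ; 32-bit mode, so EAX operations need no prefix
        mov     eax,3           ; immediate value 3 into EAX
        mov     eax,[7]         ; 32-bit value from address 7 into EAX
        mov     byte [7],3      ; size operator: 3 into the byte at address 7
        mov     byte ptr 7,3    ; the same instruction written with "ptr"
        mov     al,[es:7]       ; byte at address 7, addressed through ES
```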
237
 
243
 
238
   Table 1.1  Size operators
244
   Table 1.1  Size operators
239
  ┌──────────┬──────┬───────┐
245
  /-------------------------\
240
  │ Operator │ Bits │ Bytes │
246
  | Operator | Bits | Bytes |
241
  ╞══════════╪══════╪═══════╡
247
  |==========|======|=======|
242
  │ byte     │ 8    │ 1     │
248
  | byte     | 8    | 1     |
243
  │ word     │ 16   │ 2     │
249
  | word     | 16   | 2     |
244
  │ dword    │ 32   │ 4     │
250
  | dword    | 32   | 4     |
245
  │ fword    │ 48   │ 6     │
251
  | fword    | 48   | 6     |
246
  │ pword    │ 48   │ 6     │
252
  | pword    | 48   | 6     |
247
  │ qword    │ 64   │ 8     │
253
  | qword    | 64   | 8     |
248
  │ tbyte    │ 80   │ 10    │
254
  | tbyte    | 80   | 10    |
249
  │ tword    │ 80   │ 10    │
255
  | tword    | 80   | 10    |
250
  │ dqword   │ 128  │ 16    │
256
  | dqword   | 128  | 16    |
251
  └──────────┴──────┴───────┘
257
  | xword    | 128  | 16    |
-
 
258
  | qqword   | 256  | 32    |
-
 
259
  | yword    | 256  | 32    |
-
 
260
  \-------------------------/
252
 
261
 
253
   Table 1.2  Registers
262
   Table 1.2  Registers
254
  ┌─────────┬──────┬────────────────────────────────────────────────┐
263
  /-----------------------------------------------------------------\
255
  │ Type    │ Bits │                                                │
264
  | Type    | Bits |                                                |
256
  ╞═════════╪══════╪════════════════════════════════════════════════╡
265
  |=========|======|================================================|
257
  │         │ 8    │ al    cl    dl    bl    ah    ch    dh    bh   │
266
  |         | 8    | al    cl    dl    bl    ah    ch    dh    bh   |
258
  │ General │ 16   │ ax    cx    dx    bx    sp    bp    si    di   │
267
  | General | 16   | ax    cx    dx    bx    sp    bp    si    di   |
259
  │         │ 32   │ eax   ecx   edx   ebx   esp   ebp   esi   edi  │
268
  |         | 32   | eax   ecx   edx   ebx   esp   ebp   esi   edi  |
260
  ├─────────┼──────┼────────────────────────────────────────────────┤
269
  |---------|------|------------------------------------------------|
261
  │ Segment │ 16   │ es    cs    ss    ds    fs    gs               │
270
  | Segment | 16   | es    cs    ss    ds    fs    gs               |
262
  ├─────────┼──────┼────────────────────────────────────────────────┤
271
  |---------|------|------------------------------------------------|
263
  │ Control │ 32   │ cr0         cr2   cr3   cr4                    │
272
  | Control | 32   | cr0         cr2   cr3   cr4                    |
264
  ├─────────┼──────┼────────────────────────────────────────────────┤
273
  |---------|------|------------------------------------------------|
265
  │ Debug   │ 32   │ dr0   dr1   dr2   dr3               dr6   dr7  │
274
  | Debug   | 32   | dr0   dr1   dr2   dr3               dr6   dr7  |
266
  ├─────────┼──────┼────────────────────────────────────────────────┤
275
  |---------|------|------------------------------------------------|
267
  │ FPU     │ 80   │ st0   st1   st2   st3   st4   st5   st6   st7  │
276
  | FPU     | 80   | st0   st1   st2   st3   st4   st5   st6   st7  |
268
  ├─────────┼──────┼────────────────────────────────────────────────┤
277
  |---------|------|------------------------------------------------|
269
  │ MMX     │ 64   │ mm0   mm1   mm2   mm3   mm4   mm5   mm6   mm7  │
278
  | MMX     | 64   | mm0   mm1   mm2   mm3   mm4   mm5   mm6   mm7  |
270
  ├─────────┼──────┼────────────────────────────────────────────────┤
279
  |---------|------|------------------------------------------------|
271
  │ SSE     │ 128  │ xmm0  xmm1  xmm2  xmm3  xmm4  xmm5  xmm6  xmm7 │
280
  | SSE     | 128  | xmm0  xmm1  xmm2  xmm3  xmm4  xmm5  xmm6  xmm7 |
-
 
281
  |---------|------|------------------------------------------------|
-
 
282
  | AVX     | 256  | ymm0  ymm1  ymm2  ymm3  ymm4  ymm5  ymm6  ymm7 |
272
  └─────────┴──────┴────────────────────────────────────────────────┘
283
  \-----------------------------------------------------------------/
273
 
284
 
274
 
285
 
275
1.2.2  Data definitions
286
1.2.2  Data definitions
276
 
287
 
277
To define data or reserve a space for it, use one of the directives listed in
288
To define data or reserve a space for it, use one of the directives listed in
278
table 1.3. The data definition directive should be followed by one or more of
289
table 1.3. The data definition directive should be followed by one or more of
279
numerical expressions, separated with commas. These expressions define the
290
numerical expressions, separated with commas. These expressions define the
280
values for data cells of size depending on which directive is used. For
291
values for data cells of size depending on which directive is used. For
281
example "db 1,2,3" will define the three bytes of values 1, 2 and 3
292
example "db 1,2,3" will define the three bytes of values 1, 2 and 3
282
respectively.
293
respectively.
283
  The "db" and "du" directives also accept the quoted string values of any
294
  The "db" and "du" directives also accept the quoted string values of any
284
length, which will be converted into chain of bytes when "db" is used and into
295
length, which will be converted into chain of bytes when "db" is used and into
285
chain of words with zeroed high byte when "du" is used. For example "db 'abc'"
296
chain of words with zeroed high byte when "du" is used. For example "db 'abc'"
286
will define the three bytes of values 61h, 62h and 63h.
297
will define the three bytes of values 61h, 62h and 63h.
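
The difference between "db" and "du" for the same string can be sketched like
this. The byte values in the comments are hexadecimal ASCII codes, and the
strings are only examples.

```asm
        db      1,2,3           ; three bytes: 01 02 03
        db      'abc'           ; three bytes: 61 62 63
        du      'abc'           ; three words with zeroed high byte:
                                ; 61 00 62 00 63 00
        db      'abc',0         ; strings and numbers can be mixed in one chain
```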
287
  The "dp" directive and its synonym "df" accept the values consisting of two
298
  The "dp" directive and its synonym "df" accept the values consisting of two
288
numerical expressions separated with colon, the first value will become the
299
numerical expressions separated with colon, the first value will become the
289
high word and the second value will become the low double word of the far
300
high word and the second value will become the low double word of the far
290
pointer value. Also "dd" accepts such pointers consisting of two word values
301
pointer value. Also "dd" accepts such pointers consisting of two word values
291
separated with colon, and "dt" accepts the word and quad word value separated
302
separated with colon, and "dt" accepts the word and quad word value separated
292
with colon, the quad word is stored first. The "dt" directive with single
303
with colon, the quad word is stored first. The "dt" directive with single
293
expression as parameter accepts only floating point values and creates data in
304
expression as parameter accepts only floating point values and creates data in
294
FPU double extended precision format.
305
FPU double extended precision format.
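
A hedged sketch of the pointer forms follows. The numeric values are arbitrary,
and the byte layouts in the comments are the ones expected from the rule above
(low part stored first, as usual on x86), not output copied from the compiler.

```asm
        dp      10h:12345678h   ; high word 0010h, low double word 12345678h
                                ; expected bytes: 78 56 34 12 10 00
        dd      1234h:5678h     ; high word 1234h, low word 5678h
                                ; expected bytes: 78 56 34 12
        dt      1.5             ; single floating point value, stored as
                                ; 10 bytes of FPU double extended precision
```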
295
  Any of the above directives allows the use of the special "dup" operator to
306
  Any of the above directives allows the use of the special "dup" operator to
296
make multiple copies of given values. The count of duplicates should precede
307
make multiple copies of given values. The count of duplicates should precede
297
this operator and the value to duplicate should follow - it can even be the
308
this operator and the value to duplicate should follow - it can even be the
298
chain of values separated with commas, but such set of values needs to be
309
chain of values separated with commas, but such set of values needs to be
299
enclosed in parentheses, like "db 5 dup (1,2)", which defines five copies
310
enclosed in parentheses, like "db 5 dup (1,2)", which defines five copies
300
of the given two byte sequence.
311
of the given two byte sequence.
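
For illustration, the fragment below shows the "dup" operator next to the
equivalent written-out form. The first line is the example from the text; the
remaining lines are added for this sketch, and the values are arbitrary.

```asm
        db      5 dup (1,2)     ; ten bytes: 1,2,1,2,1,2,1,2,1,2
        db      1,2,1,2,1,2,1,2,1,2     ; the same data written out by hand
        dw      3 dup (0FFFFh)  ; three words, six bytes in total
```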
  The "file" is a special directive and its syntax is different. This
directive includes a chain of bytes from a file and it should be followed by
the quoted file name, then optionally a numerical expression specifying the
offset in the file preceded by a colon, and - also optionally - a comma and a
numerical expression specifying the count of bytes to include (if no count is
specified, all data up to the end of the file is included). For example
"file 'data.bin'" will include the whole file as binary data and
"file 'data.bin':10h,4" will include only four bytes starting at offset 10h.
  The data reservation directive should be followed by only one numerical
expression, and this value defines how many cells of the specified size should
be reserved. All data definition directives also accept the "?" value, which
means that this cell should not be initialized to any value and the effect is
the same as by using the data reservation directive. The uninitialized data
may not be included in the output file, so its values should always be
considered unknown.
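  A short sketch of reserving data, with hypothetical label names; the
contents of all three areas should be considered unknown until the program
writes to them.

```asm
; Sketch only: "buffer", "entries" and "result" are illustrative names.
buffer  rb 64           ; reserve 64 bytes
entries rd 16           ; reserve 16 double words (64 bytes)
result  dw ?            ; one uninitialized word, same effect as "rw 1"
```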
 
   Table 1.3  Data directives
  /----------------------------\
  | Size    | Define | Reserve |
  | (bytes) | data   | data    |
  |=========|========|=========|
  | 1       | db     | rb      |
  |         | file   |         |
  |---------|--------|---------|
  | 2       | dw     | rw      |
  |         | du     |         |
  |---------|--------|---------|
  | 4       | dd     | rd      |
  |---------|--------|---------|
  | 6       | dp     | rp      |
  |         | df     | rf      |
  |---------|--------|---------|
  | 8       | dq     | rq      |
  |---------|--------|---------|
  | 10      | dt     | rt      |
  \----------------------------/
 
 
1.2.3  Constants and labels
 
In the numerical expressions you can also use constants or labels instead of
numbers. To define a constant or label you should use the specific
directives. Each label can be defined only once and it is accessible from
any place in the source (even before it was defined). A constant can be
redefined many times, but in this case it is accessible only after it was
defined, and is always equal to the value from the last definition before the
place where it's used. When a constant is defined only once in the source, it
is - like the label - accessible from anywhere.
  The definition of a constant consists of the name of the constant followed
by the "=" character and a numerical expression, which after calculation will
become the value of the constant. This value is always calculated at the time
the constant is defined. For example you can define the "count" constant by
using the directive "count = 17", and then use it in the assembly
instructions, like "mov cx,count" - which will become "mov cx,17" during the
compilation process.
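  The following sketch shows a constant that is redefined, so each use sees
the value from the last definition preceding it. The choice of registers is
arbitrary.

```asm
count = 17
        mov     cx,count        ; assembled as "mov cx,17"
count = count+1                 ; redefinition, calculated at this point
        mov     dx,count        ; assembled as "mov dx,18"
```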
  There are different ways to define labels. The simplest is to follow the
name of the label by a colon; this directive can even be followed by another
instruction in the same line. It defines the label whose value is equal to
the offset of the point where it's defined. This method is usually used to
label places in code. The other way is to follow the name of the label
(without a colon) by some data directive. It defines the label with value
equal to the offset of the beginning of the defined data, and it is remembered
as a label for data with cell size as specified for that data directive in
table 1.3.
  The label can be treated as a constant of value equal to the offset of the
labeled code or data. For example when you define data using the labeled
directive "char db 224", to put the offset of this data into the BX register
you should use the "mov bx,char" instruction, and to put the value of the byte
addressed by the "char" label into the DL register, you should use
"mov dl,[char]" (or "mov dl,ptr char"). But when you try to assemble
"mov ax,[char]", it will cause an error, because fasm compares the sizes of
operands, which should be equal. You can force assembling that instruction by
using a size override: "mov ax,word [char]", but remember that this
instruction will read the two bytes beginning at the "char" address, while it
was defined as one byte.
  The last and the most flexible way to define labels is to use the "label"
directive. This directive should be followed by the name of the label, then
optionally a size operator (it can be preceded by a colon) and then - also
optionally - the "at" operator and a numerical expression defining the address
at which this label should be defined. For example "label wchar word at char"
will define a new label for the 16-bit data at the address of "char". Now the
instruction "mov ax,[wchar]" will be after compilation the same as
"mov ax,word [char]". If no address is specified, the "label" directive
defines the label at the current offset. Thus "mov [wchar],57568" will copy
two bytes while "mov [char],224" will copy one byte to the same address.
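  The fragment below gathers the "char" and "wchar" examples from the text
into one sketch. The extra zero byte after "char" is an assumption added here
so that the word access stays inside defined data, and placing the data right
after the code is done only for brevity.

```asm
        mov     bx,char         ; BX = offset of the data
        mov     dl,[char]       ; byte read, sizes of both operands match
        mov     ax,word [char]  ; size override, reads two bytes at "char"
        mov     ax,[wchar]      ; assembled the same as the line above
        mov     [wchar],57568   ; stores two bytes
        mov     [char],224      ; stores one byte at the same address

char    db 224                  ; label for byte data
        db 0                    ; assumed padding for the word access
label wchar word at char        ; 16-bit label at the address of "char"
```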
  A label whose name begins with a dot is treated as a local label, and its
name is attached to the name of the last global label (with a name beginning
with anything but a dot) to make the full name of this label. So you can use
the short name (beginning with a dot) of this label anywhere before the next
global label is defined, and in other places you have to use the full name.
Labels beginning with two dots are the exception - they are like global
labels, but they don't become the new prefix for local labels.
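  A minimal sketch of local labels; "copy_data" and "clear_data" are
hypothetical global labels used only to show how the full names are formed.

```asm
copy_data:
        mov     cx,16
  .next:                        ; full name is "copy_data.next"
        dec     cx
        jnz     .next
clear_data:
        mov     cx,32
  .next:                        ; full name is "clear_data.next", no conflict
        dec     cx
        jnz     .next
        mov     bx,copy_data.next   ; after "clear_data" the full name is needed
```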
  The "@@" name means an anonymous label; you can define many of them in the
source. The symbol "@b" (or the equivalent "@r") references the nearest
preceding anonymous label, and the symbol "@f" references the nearest
following anonymous label. These special symbols are case-insensitive.
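  For example, a simple counting loop can be sketched with two anonymous
labels; the loop itself is only an illustration.

```asm
        mov     cx,8
@@:                             ; first anonymous label
        dec     cx
        jz      @f              ; forward, to the nearest following "@@"
        jmp     @b              ; backward, to the nearest preceding "@@"
@@:                             ; second anonymous label
```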
 
 
1.2.4  Numerical expressions
 
In the above examples all the numerical expressions were simple numbers,
constants or labels. But they can be more complex, by using the arithmetical
or logical operators for calculations at compile time. All these operators
with their priority values are listed in table 1.4. The operations with a
higher priority value will be calculated first; you can of course change this
behavior by putting some parts of the expression into parentheses. The "+",
"-", "*" and "/" are standard arithmetical operations, "mod" calculates the
remainder from division. The "and", "or", "xor", "shl", "shr" and "not"
perform the same logical operations as the assembly instructions of those
names. The "rva" and "plt" are special unary operators that perform
conversions between different kinds of addresses; they can be used only with
a few of the output formats and their meaning may vary (see 2.4).
  The arithmetical and logical calculations are usually processed as if they
operated on infinite precision 2-adic numbers, and the assembler signals an
overflow error if because of its limitations it is not able to perform the
required calculation, or if the result is too large a number to fit in either
the signed or unsigned range for the destination unit size. However the
"not", "xor" and "shr" operators are exceptions from this rule - if the value
specified by the numerical expression has to fit in a unit of specified size,
and the arguments for the operation fit into that size, the operation will be
performed with precision limited to that size.
  The numbers in the expression are by default treated as decimal; binary
numbers should have the "b" letter attached at the end, octal numbers should
end with the "o" letter, hexadecimal numbers should begin with the "0x"
characters (like in C language) or with the "$" character (like in Pascal
language) or they should end with the "h" letter. Also a quoted string, when
encountered in an expression, will be converted into a number - the first
character will become the least significant byte of the number.
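  The lines below are a small sketch of these rules. The values in comments
follow from the priorities in table 1.4 and from the number notations
described above; the registers were chosen arbitrarily.

```asm
        mov     ax,2+3*4        ; "*" has higher priority than "+": 14
        mov     bx,(2+3)*4      ; parentheses change the order: 20
        mov     cx,1 shl 4 + 1  ; "shl" has higher priority than "+": 17
        db      11111111b       ; binary notation
        db      377o            ; octal notation, the same value
        db      0xFF,$FF,0FFh   ; three hexadecimal notations of that value
        dw      'ab'            ; 'a' is the least significant byte: 6261h
```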
  The numerical expression used as an address value can also contain any of
the general registers used for addressing; they can be added and multiplied
by appropriate values, as is allowed for the x86 architecture instructions.
  There are also some special symbols that can be used inside the numerical
expression. The first is "$", which is always equal to the value of the
current offset, while "$$" is equal to the base address of the current
addressing space. Another one is "%", which is the number of the current
repeat in parts of code that are repeated using some special directives (see
2.2). There's also the "%t" symbol, which is always equal to the current time
stamp.
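  A common use of "$" is to calculate the size of just defined data, as in the
sketch below. The names "message" and "message_size" are hypothetical, and the
"times" directive used to show "%" is one of the repeating directives
described in 2.2.

```asm
message db 'Hello'
message_size = $ - message      ; current offset minus start of the string: 5

        times 4 db %            ; defines the bytes 1, 2, 3 and 4
```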
  Any numerical expression can also consist of a single floating point value
(flat assembler does not allow any floating point operations at compilation
time) in scientific notation. Such values can end with the "f" letter to be
recognized, otherwise they should contain at least one of the "." or "E"
characters. So "1.0", "1E0" and "1f" define the same floating point value,
while simple "1" defines an integer value.
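  As a brief sketch, the same notations used with data directives:

```asm
        dd      1.0             ; floating point value in a double word
        dd      1E0             ; the same floating point value
        dd      1f              ; the same floating point value
        dd      1               ; integer value, not a floating point one
        dq      1.0             ; floating point value in a quad word
```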
 
   Table 1.4  Arithmetical and logical operators by priority
  /-------------------------\
  | Priority | Operators    |
  |==========|==============|
  | 0        | +  -         |
  |----------|--------------|
  | 1        | *  /         |
  |----------|--------------|
  | 2        | mod          |
  |----------|--------------|
  | 3        | and  or  xor |
  |----------|--------------|
  | 4        | shl  shr     |
  |----------|--------------|
  | 5        | not          |
  |----------|--------------|
  | 6        | rva  plt     |
  \-------------------------/
 
 
1.2.5  Jumps and calls
 
The operand of any jump or call instruction can be preceded not only by the
size operator, but also by one of the operators specifying the type of the
jump: "short", "near" or "far". For example, when the assembler is in 16-bit
mode, the instruction "jmp dword [0]" will become a far jump, and when the
assembler is in 32-bit mode, it will become a near jump. To force this
instruction to be treated differently, use the "jmp near dword [0]" or
"jmp far dword [0]" form.
  When the operand of a near jump is an immediate value, the assembler will
generate the shortest variant of this jump instruction if possible (but will
not create a 32-bit instruction in 16-bit mode nor a 16-bit instruction in
32-bit mode, unless there is a size operator stating it). By specifying the
jump type you can force it to always generate the long variant (for example
"jmp near 0") or to always generate the short variant and terminate with an
error when it's impossible (for example "jmp short 0").
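  The forms discussed in this section can be put side by side as in the
sketch below. It assumes the "use16" and "use32" directives for switching the
code mode, and "target" is a hypothetical label.

```asm
        use16
        jmp     dword [0]       ; far jump in 16-bit mode
        jmp     near dword [0]  ; forced to be a near jump
        use32
        jmp     dword [0]       ; near jump in 32-bit mode
        jmp     far dword [0]   ; forced to be a far jump
target:
        jmp     target          ; shortest variant chosen automatically
        jmp     near target     ; always the long variant
        jmp     short target    ; always the short variant, error if out of range
```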
 
 
1.2.6  Size settings
 
When an instruction uses some memory addressing, by default the smallest form
of the instruction is generated by using the short displacement, provided the
address value fits in the range. This can be overridden using the "word" or
"dword" operator before the address inside the square brackets (or after the
"ptr" operator), which forces the long displacement of the appropriate size to
be made. In case the address is not relative to any registers, those
operators also allow choosing the appropriate mode of absolute addressing.
  Instructions "adc", "add", "and", "cmp", "or", "sbb", "sub" and "xor" with
the first operand being 16-bit or 32-bit are by default generated in the
shortened 8-bit form when the second operand is an immediate value fitting in
the range for signed 8-bit values. This can also be overridden by putting the
"word" or "dword" operator before the immediate value. Similar rules apply to
the "imul" instruction with the last operand being an immediate value.
  An immediate value as an operand for the "push" instruction without a size
operator is by default treated as a word value if the assembler is in 16-bit
mode and as a double word value if the assembler is in 32-bit mode. The
shorter 8-bit form of this instruction is used if possible; the "word" or
"dword" size operator forces the "push" instruction to be generated in the
longer form for the specified size. The "pushw" and "pushd" mnemonics force
the assembler to generate 16-bit or 32-bit code without forcing it to use the
longer form of the instruction.
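  The size settings from this section can be sketched as follows for 32-bit
mode. The registers and values are arbitrary, and the "use32" directive is
assumed here only to fix the code mode.

```asm
        use32
        mov     eax,[ebx+4]         ; short displacement by default
        mov     eax,[dword ebx+4]   ; forced long 32-bit displacement
        add     ecx,4               ; shortened form with 8-bit immediate
        add     ecx,dword 4         ; forced 32-bit immediate
        push    4                   ; double word value, shorter form used
        push    dword 4             ; longer form forced
        pushw   4                   ; 16-bit code, shorter form still allowed
```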
 
 
Chapter 2  Instruction set
--------------------------
 
This chapter provides detailed information about the instructions and
directives supported by flat assembler. Directives for defining labels were
already discussed in 1.2.3; all other directives will be described later in
this chapter.
 
 
2.1  The x86 architecture instructions
 
In this section you can find information about both the syntax and the
purpose of the assembly language instructions. If you need more technical
information, look for the Intel Architecture Software Developer's Manual.
  Assembly instructions consist of the mnemonic (the instruction's name) and
from zero to three operands. If there are two or more operands, usually the
first is the destination operand and the second is the source operand. Each
operand can be a register, memory or an immediate value (see 1.2 for details
about the syntax of operands). After the description of each instruction
there are examples of different combinations of operands, if the instruction
has any.
  Some instructions act as prefixes and can be followed by another
instruction in the same line, and there can be more than one prefix in a line.
Each name of a segment register is also a mnemonic of an instruction prefix,
although it is recommended to use segment overrides inside the square brackets
instead of these prefixes.
 
 
2.1.1  Data movement instructions
 
"mov" transfers a byte, word or double word from the source operand to the
destination operand. It can transfer data between general registers, from
a general register to memory, or from memory to a general register, but it
cannot move from memory to memory. It can also transfer an immediate value to
a general register or memory, a segment register to a general register or
memory, a general register or memory to a segment register, a control or debug
register to a general register and a general register to a control or debug
register. The "mov" can be assembled only if the size of the source operand
and the size of the destination operand are the same. Below are the examples
for each of the allowed combinations:
 
    mov bx,ax       ; general register to general register
    mov [char],al   ; general register to memory
    mov bl,[char]   ; memory to general register
    mov dl,32       ; immediate value to general register
    mov [char],32   ; immediate value to memory
    mov ax,ds       ; segment register to general register
    mov [bx],ds     ; segment register to memory
    mov ds,ax       ; general register to segment register
    mov ds,[bx]     ; memory to segment register
    mov eax,cr0     ; control register to general register
    mov cr3,ebx     ; general register to control register
 
  "xchg" swaps the contents of two operands. It can swap two byte operands,
two word operands or two double word operands. The order of operands is not
important. The operands may be two general registers, or a general register
and memory. For example:
 
    xchg ax,bx      ; swap two general registers
    xchg al,[char]  ; swap register with memory
 
  "push" decrements the stack pointer (ESP register), then transfers the
operand to the top of stack indicated by ESP. The operand can be memory,
a general register, a segment register or an immediate value of word or double
word size. If the operand is an immediate value and no size is specified, it
is by default treated as a word value if the assembler is in 16-bit mode and
as a double word value if the assembler is in 32-bit mode. The "pushw" and
"pushd" mnemonics are variants of this instruction that store values of word
or double word size respectively. If more operands follow in the same line
(separated only with spaces, not commas), the compiler will assemble a chain
of "push" instructions with these operands. The examples are with single
operands:
 
    push ax         ; store general register
    push es         ; store segment register
    pushw [bx]      ; store memory
    push 1000h      ; store immediate value
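 
  The chained form described above, with operands separated only by spaces,
can be sketched as follows; the choice of registers is arbitrary.

```asm
    push ax bx cx   ; assembled as "push ax", "push bx" and "push cx"
```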
 
  "pusha" saves the contents of the eight general registers on the stack.
This instruction has no operands. There are two versions of this instruction,
one 16-bit and one 32-bit; the assembler automatically generates the
appropriate version for the current mode, but this can be overridden by using
the "pushaw" or "pushad" mnemonic to always get the 16-bit or 32-bit version.
The 16-bit version of this instruction pushes the general registers on the
stack in the following order: AX, CX, DX, BX, the initial value of SP before
AX was pushed, BP, SI and DI. The 32-bit version pushes the equivalent 32-bit
general registers in the same order.
578
  "pop" transfers the word or double word at the current top of stack to the
599
  "pop" transfers the word or double word at the current top of stack to the
579
destination operand, and then increments ESP to point to the new top of stack.
600
destination operand, and then increments ESP to point to the new top of stack.
580
The operand can be memory, general register or segment register. "popw" and
601
The operand can be memory, general register or segment register. "popw" and
581
"popd" mnemonics are variants of this instruction for restoring the values of
602
"popd" mnemonics are variants of this instruction for restoring the values of
582
word or double word size respectively. If more operands separated with spaces
603
word or double word size respectively. If more operands separated with spaces
583
follow in the same line, compiler will assemble chain of the "pop"
604
follow in the same line, compiler will assemble chain of the "pop"
584
instructions with these operands.
605
instructions with these operands.
 
    pop bx          ; restore general register
    pop ds          ; restore segment register
    popw [si]       ; restore memory
 
  "popa" restores the registers saved on the stack by the "pusha" instruction,
except for the saved value of SP (or ESP), which is ignored. This instruction
has no operands. To force assembling the 16-bit or 32-bit version of this
instruction, use the "popaw" or "popad" mnemonic.
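  As a rough sketch of the multi-operand forms and of the "pusha" and "popa"
pair described above (the registers are chosen arbitrarily, and 32-bit mode
is assumed):
 
    push eax ebx ecx ; same as: push eax / push ebx / push ecx
    pop ecx ebx eax  ; same as: pop ecx / pop ebx / pop eax
    pushad           ; save EAX, ECX, EDX, EBX, ESP, EBP, ESI and EDI
    mov ebx,ecx      ; general registers can be changed freely here
    popad            ; restore them all, the saved ESP value is skipped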
 
 
2.1.2  Type conversion instructions
 
The type conversion instructions convert bytes into words, words into double
words, and double words into quad words. These conversions can be done using
the sign extension or zero extension. The sign extension fills the extra bits
of the larger item with the value of the sign bit of the smaller item, while
the zero extension simply fills them with zeros.
  "cwd" and "cdq" double the size of the value in the AX or EAX register
respectively and store the extra bits into the DX or EDX register. The
conversion is done using the sign extension. These instructions have no
operands.
  "cbw" extends the sign of the byte in AL throughout AX, and "cwde" extends
the sign of the word in AX throughout EAX. These instructions also have no
operands.
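  The effect of these four instructions can be illustrated with a few sample
values (chosen here only for demonstration):
 
    mov al,80h      ; AL = 80h, sign bit set
    cbw             ; AX = 0FF80h
    cwde            ; EAX = 0FFFFFF80h
    cwd             ; DX = 0FFFFh, AX is still 0FF80h
    mov eax,7       ; EAX = 7, sign bit clear
    cdq             ; EDX = 0, EAX is still 7
 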
  "movsx" converts a byte to word or double word and a word to double word
using the sign extension. "movzx" does the same, but it uses the zero
extension. The source operand can be general register or memory, while the
destination operand must be a general register. For example:
 
    movsx ax,al         ; byte register to word register
    movsx edx,dl        ; byte register to double word register
    movsx eax,ax        ; word register to double word register
    movsx ax,byte [bx]  ; byte memory to word register
    movsx edx,byte [bx] ; byte memory to double word register
    movsx eax,word [bx] ; word memory to double word register
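 
  "movzx" accepts the same operand combinations. A small comparison of both
instructions, with an arbitrary sample value:
 
    mov bl,90h      ; BL = 90h, sign bit set
    movsx ax,bl     ; AX = 0FF90h
    movzx cx,bl     ; CX = 0090h
    movzx edx,bl    ; EDX = 00000090h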
 
 
2.1.3  Binary arithmetic instructions
 
"add" replaces the destination operand with the sum of the source and
destination operands and sets CF if a carry out of the result (unsigned
overflow) has occurred. The operands may be bytes, words or double words. The
destination operand can be general register or memory; the source operand can
be general register or immediate value, and it can also be memory if the
destination operand is a register.
 
    add ax,bx       ; add register to register
    add ax,[si]     ; add memory to register
    add [di],al     ; add register to memory
    add al,48       ; add immediate value to register
    add [char],48   ; add immediate value to memory
 
  "adc" sums the operands, adds one if CF is set, and replaces the destination
operand with the result. Rules for the operands are the same as for the "add"
instruction. An "add" followed by multiple "adc" instructions can be used to
add numbers longer than 32 bits.
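  For instance, two 64-bit numbers held in register pairs might be added
like this (the register assignment is only an assumption made for this
sketch):
 
    add eax,ebx     ; add low double words, CF receives the carry
    adc edx,ecx     ; add high double words plus the carry
                    ; EDX:EAX = EDX:EAX + ECX:EBX
 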
  "inc" adds one to the operand; it does not affect CF. The operand can be a
general register or memory, and the size of the operand can be byte, word or
double word.
 
    inc ax          ; increment register by one
    inc byte [bx]   ; increment memory by one
 
  "sub" subtracts the source operand from the destination operand and replaces
the destination operand with the result. If a borrow is required, the CF is
set. Rules for the operands are the same as for the "add" instruction.
  "sbb" subtracts the source operand from the destination operand, subtracts
one if CF is set, and stores the result to the destination operand. Rules for
the operands are the same as for the "add" instruction. A "sub" followed by
multiple "sbb" instructions may be used to subtract numbers longer than 32
bits.
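  A corresponding sketch for subtracting 64-bit numbers, again with an
arbitrary choice of registers:
 
    sub eax,ebx     ; subtract low double words, CF receives the borrow
    sbb edx,ecx     ; subtract high double words and the borrow
                    ; EDX:EAX = EDX:EAX - ECX:EBX
 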
  "dec" subtracts one from the operand; it does not affect CF. Rules for the
operand are the same as for the "inc" instruction.
  "cmp" subtracts the source operand from the destination operand. It updates
the flags in the same way as the "sub" instruction, but does not alter the
source and destination operands. Rules for the operands are the same as for
the "sub" instruction.
  "neg" subtracts a signed integer operand from zero. The effect of this
instruction is to reverse the sign of the operand from positive to negative or
from negative to positive. Rules for the operand are the same as for the "inc"
instruction.
  "xadd" exchanges the destination operand with the source operand, then loads
the sum of the two values into the destination operand. The destination
operand can be general register or memory, the source operand must be a
general register.
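  A short illustration of the last three instructions, with sample values
picked only for demonstration:
 
    mov ax,5
    mov bx,3
    cmp ax,bx       ; flags set as for 5-3, AX and BX unchanged
    xadd ax,bx      ; AX = 8 (the sum), BX = 5 (the old value of AX)
    neg ax          ; AX = -8 (0FFF8h)
 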
  All the above binary arithmetic instructions update SF, ZF, PF and OF flags.
SF is always set to the same value as the result's sign bit, ZF is set when
all the bits of result are zero, PF is set when low order eight bits of result
contain an even number of set bits, OF is set if result is too large for a
positive number or too small for a negative number (excluding sign bit) to fit
in destination operand.
  "mul" performs an unsigned multiplication of the operand and the
accumulator. If the operand is a byte, the processor multiplies it by the
contents of AL and returns the 16-bit result to AH and AL. If the operand is a
word, the processor multiplies it by the contents of AX and returns the 32-bit
result to DX and AX. If the operand is a double word, the processor multiplies
it by the contents of EAX and returns the 64-bit result in EDX and EAX. "mul"
sets CF and OF when the upper half of the result is nonzero, otherwise they
are cleared. Rules for the operand are the same as for the "inc" instruction.
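  For example, with operand values picked only for illustration:
 
    mov al,20h
    mov bl,10h
    mul bl          ; AX = 0200h, AH is nonzero so CF and OF are set
    mov ax,1000h
    mov cx,8
    mul cx          ; DX = 0, AX = 8000h, CF and OF are cleared
 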
  "imul" performs a signed multiplication operation. This instruction has
three variations. The first has one operand and behaves in the same way as
the "mul" instruction, except that the multiplication is signed. The second
has two operands; in this case the destination operand is multiplied by the
source operand and the result replaces the destination operand. The
destination operand must be a general register, word or double word in size;
the source operand can be general register, memory or immediate value. The
third form has three operands: the destination operand must be a general
register, word or double word in size, the source operand can be general
register or memory, and the third operand must be an immediate value. The
source operand is multiplied by the immediate value and the result is stored
in the destination register. All three forms calculate the product to twice
the size of operands and set CF and OF when the upper half of the result
contains significant bits (that is, when it is not merely the sign extension
of the lower half), but the second and third forms truncate the product to
the size of operands. So the second and third forms can also be used for
unsigned operands because, whether the operands are signed or unsigned, the
lower half of the product is the same. Below are the examples for all three
forms:
 
    imul bl         ; accumulator by register
    imul word [si]  ; accumulator by memory
    imul bx,cx      ; register by register
    imul bx,[si]    ; register by memory
    imul bx,10      ; register by immediate value
    imul ax,bx,10   ; register by immediate value to register
    imul ax,[si],10 ; memory by immediate value to register
 
  "div" performs an unsigned division of the accumulator by the operand.
The dividend (the accumulator) is twice the size of the divisor (the operand),
the quotient and remainder have the same size as the divisor. If divisor is
byte, the dividend is taken from AX register, the quotient is stored in AL and
the remainder is stored in AH. If divisor is word, the upper half of dividend
is taken from DX, the lower half of dividend is taken from AX, the quotient is
stored in AX and the remainder is stored in DX. If divisor is double word,
the upper half of dividend is taken from EDX, the lower half of dividend is
taken from EAX, the quotient is stored in EAX and the remainder is stored in
EDX. Rules for the operand are the same as for the "mul" instruction.
  "idiv" performs a signed division of the accumulator by the operand.
It uses the same registers as the "div" instruction, and the rules for
the operand are the same.
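  A brief sketch of both divisions, using sample values chosen only for
illustration; note that the dividend has to be extended to double size
first:
 
    mov ax,100      ; 16-bit dividend in AX
    mov bl,7
    div bl          ; AL = 14 (quotient), AH = 2 (remainder)
    mov ax,-7
    cwd             ; sign-extend AX into DX:AX
    mov bx,2
    idiv bx         ; AX = -3 (quotient), DX = -1 (remainder)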
 
 
2.1.4  Decimal arithmetic instructions
 
Decimal arithmetic is performed by combining the binary arithmetic
instructions (already described in the prior section) with the decimal
arithmetic instructions. The decimal arithmetic instructions are used to
adjust the results of a previous binary arithmetic operation to produce a
valid packed or unpacked decimal result, or to adjust the inputs to a
subsequent binary arithmetic operation so the operation will produce a valid
packed or unpacked decimal result.
  "daa" adjusts the result of adding two valid packed decimal operands in
AL. "daa" must always follow the addition of two pairs of packed decimal
numbers (one digit in each half-byte) to obtain a pair of valid packed
decimal digits as results. The carry flag is set if carry was needed.
This instruction has no operands.
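  For instance, adding the packed decimal numbers 38 and 45 (values chosen
only as an illustration):
 
    mov al,38h      ; packed decimal 38
    add al,45h      ; binary sum is 7Dh, not a valid packed decimal
    daa             ; AL = 83h, packed decimal 83, CF is clear
 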
  "das" adjusts the result of subtracting two valid packed decimal operands
in AL. "das" must always follow the subtraction of one pair of packed decimal
numbers (one digit in each half-byte) from another to obtain a pair of valid
packed decimal digits as results. The carry flag is set if a borrow was
needed. This instruction has no operands.
  "aaa" changes the contents of register AL to a valid unpacked decimal
number, and zeroes the top four bits. "aaa" must always follow the addition
of two unpacked decimal operands in AL. The carry flag is set and AH is
incremented if a carry is necessary. This instruction has no operands.
  "aas" changes the contents of register AL to a valid unpacked decimal
number, and zeroes the top four bits. "aas" must always follow the
subtraction of one unpacked decimal operand from another in AL. The carry flag
is set and AH is decremented if a borrow is necessary. This instruction has no
operands.
  "aam" corrects the result of a multiplication of two valid unpacked decimal
numbers. "aam" must always follow the multiplication of two decimal numbers
to produce a valid decimal result. The high order digit is left in AH, the
low order digit in AL. The generalized version of this instruction allows
adjustment of the contents of AX to create two unpacked digits of any
number base. The standard version of this instruction has no operands, the
generalized version has one operand - an immediate value specifying the
number base for the created digits.
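  A sketch of both versions, with sample digits chosen only for
illustration:
 
    mov al,7
    mov bl,8
    mul bl          ; AX = 56 (0038h)
    aam             ; AH = 5, AL = 6
    mov al,3Ch
    aam 16          ; AH = 3, AL = 0Ch
 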
  "aad" modifies the numerator in AH and AL to prepare for the division of two
valid unpacked decimal operands so that the quotient produced by the division
will be a valid unpacked decimal number. AH should contain the high order
digit and AL the low order digit. This instruction adjusts the value and
places the result in AL, while AH will contain zero. The generalized version
of this instruction allows adjustment of two unpacked digits of any number
base. Rules for the operand are the same as for the "aam" instruction.
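  For example, dividing the unpacked decimal number 56 by 7 (sample values
only):
 
    mov ax,0506h    ; AH = 5, AL = 6
    aad             ; AL = 56 (38h), AH = 0
    mov bl,7
    div bl          ; AL = 8 (quotient), AH = 0 (remainder)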
 
 
2.1.5  Logical instructions
 
"not" inverts the bits in the specified operand to form a one's complement
of the operand. It has no effect on the flags. Rules for the operand are the
same as for the "inc" instruction.
  "and", "or" and "xor" instructions perform the standard logical operations.
They update the SF, ZF and PF flags. Rules for the operands are the same as
for the "add" instruction.
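  A few typical uses, shown only as an illustration with arbitrary mask
values:
 
    mov al,35h
    and al,0Fh      ; AL = 05h, upper four bits cleared
    or al,30h       ; AL = 35h, bits 4 and 5 set
    xor al,0FFh     ; AL = 0CAh, all bits inverted, flags updated
    not al          ; AL = 35h, all bits inverted, flags unchanged
 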
  "bt", "bts", "btr" and "btc" instructions operate on a single bit which can
be in memory or in a general register. The location of the bit is specified
as an offset from the low order end of the operand. The value of the offset
is taken from the second operand; it may be either an immediate byte or
a general register. These instructions first assign the value of the selected
bit to CF. "bt" instruction does nothing more, "bts" sets the selected bit to
1, "btr" resets the selected bit to 0, "btc" changes the bit to its
complement. The first operand can be word or double word.
 
    bt  ax,15        ; test bit in register
    bts word [bx],15 ; test and set bit in memory
    btr ax,cx        ; test and reset bit in register
    btc word [bx],cx ; test and complement bit in memory
 
  "bsf" and "bsr" instructions scan a word or double word for the first set
bit and store the index of this bit into the destination operand, which must
be a general register. The bit string being scanned is specified by the source
operand, which may be either general register or memory. The ZF flag is set if
the entire string is zero (no set bits are found); otherwise it is cleared. If
no set bit is found, the value of the destination register is undefined. "bsf"
scans from low order to high order (starting from bit index zero). "bsr" scans
from high order to low order (starting from bit index 15 of a word or index 31
of a double word).
 
    bsf ax,bx        ; scan register forward
    bsr ax,[si]      ; scan memory reverse
 
  "shl" shifts the destination operand left by the number of bits specified
in the second operand. The destination operand can be byte, word, or double
word general register or memory. The second operand can be an immediate value
or the CL register. The processor shifts zeros in from the right (low order)
side of the operand as bits exit from the left side. The last bit that exited
is stored in CF. "sal" is a synonym for "shl".
 
    shl al,1         ; shift register left by one bit
    shl byte [bx],1  ; shift memory left by one bit
    shl ax,cl        ; shift register left by count from cl
    shl word [bx],cl ; shift memory left by count from cl
 
  "shr" and "sar" shift the destination operand right by the number of bits
specified in the second operand. Rules for operands are the same as for the
"shl" instruction. "shr" shifts zeros in from the left side of the operand as
bits exit from the right side. The last bit that exited is stored in CF.
"sar" preserves the sign of the operand by shifting in zeros on the left side
if the value is positive or by shifting in ones if the value is negative.
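  The difference between the two can be seen on a negative sample value
(chosen only for illustration):
 
    mov al,0F8h     ; AL = -8 as a signed byte
    sar al,1        ; AL = 0FCh (-4), sign preserved, CF = 0
    mov bl,0F8h
    shr bl,1        ; BL = 7Ch, zero shifted in, CF = 0
 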
  "shld" shifts bits of the destination operand to the left by the number
of bits specified in third operand, while shifting high order bits from the
source operand into the destination operand on the right. The source operand
remains unmodified. The destination operand can be a word or double word
general register or memory, the source operand must be a general register,
third operand can be an immediate value or the CL register.
 
    shld ax,bx,1     ; shift register left by one bit
    shld [di],bx,1   ; shift memory left by one bit
    shld ax,bx,cl    ; shift register left by count from cl
    shld [di],bx,cl  ; shift memory left by count from cl
 
  "shrd" shifts bits of the destination operand to the right, while shifting
low order bits from the source operand into the destination operand on the
left. The source operand remains unmodified. Rules for operands are the same
as for the "shld" instruction.
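  The following sketch, with values chosen only for illustration and 16-bit
registers assumed, shows how both double shifts fill the vacated bits from
the source operand:

    mov ax,1234h     ; destination operand
    mov bx,0ABCDh    ; source operand
    shld ax,bx,4     ; ax = 234Ah, high bits of bx enter on the right
    mov ax,1234h     ; restore the destination
    shrd ax,bx,4     ; ax = 0D123h, low bits of bx enter on the left

  In both cases BX still holds 0ABCDh afterwards.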
  "rol" and "rcl" rotate the byte, word or double word destination operand
left by the number of bits specified in the second operand. For each rotation
specified, the high order bit that exits from the left of the operand returns
at the right to become the new low order bit. "rcl" additionally puts in CF
each high order bit that exits from the left side of the operand before it
returns to the operand as the low order bit on the next rotation cycle. Rules
for operands are the same as for the "shl" instruction.
  "ror" and "rcr" rotate the byte, word or double word destination operand
right by the number of bits specified in the second operand. For each rotation
specified, the low order bit that exits from the right of the operand returns
at the left to become the new high order bit. "rcr" additionally puts in CF
each low order bit that exits from the right side of the operand before it
returns to the operand as the high order bit on the next rotation cycle.
Rules for operands are the same as for the "shl" instruction.
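  A short sketch of these rotations on an arbitrarily chosen byte value may
make the role of CF clearer ("clc" is used here only to put CF in a known
state):

    mov al,81h       ; al = 10000001b
    rol al,1         ; al = 00000011b, CF = 1
    mov al,81h       ; restore the original value
    ror al,1         ; al = 11000000b, CF = 1
    clc              ; CF = 0
    mov al,81h       ; restore the original value
    rcl al,1         ; al = 00000010b, CF = 1

  With "rcl" the bit leaving the operand goes to CF and the previous value of
CF enters on the right, so the rotation is effectively 9 bits wide for a byte
operand.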
  "test" performs the same action as the "and" instruction, but it does not
alter the destination operand, only updates flags. Rules for the operands are
the same as for the "and" instruction.
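  A typical use, sketched here with the hypothetical label "is_even", is to
check a single bit without destroying the tested value:

    test al,1        ; ZF = 1 when bit 0 of al is clear, al is unchanged
    jz is_even       ; taken for even values
    inc al           ; reached only for odd values
    is_even:

  The conditional jump used here is described in 2.1.6.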
  "bswap" reverses the byte order of a 32-bit general register: bits 0 through
7 are swapped with bits 24 through 31, and bits 8 through 15 are swapped with
bits 16 through 23. This instruction is provided for converting little-endian
values to big-endian format and vice versa.

    bswap edx        ; swap bytes in register


2.1.6  Control transfer instructions
 
"jmp" unconditionally transfers control to the target location. The
destination address can be specified directly within the instruction or
indirectly through a register or memory. The acceptable size of this address
depends on whether the jump is near or far (it can be specified by preceding
the operand with "near" or "far" operator) and whether the instruction is
16-bit or 32-bit. Operand for near jump should be "word" size for 16-bit
instruction or the "dword" size for 32-bit instruction. Operand for far jump
should be "dword" size for 16-bit instruction or "pword" size for 32-bit
instruction. A direct "jmp" instruction includes the destination address as
part of the instruction (and can be preceded by "short", "near" or "far"
operator). The operand specifying the address should be a numerical
expression for near or short jump, or two numerical expressions separated
with colon for far jump, where the first specifies the selector of segment
and the second is the offset within segment. The "pword" operator can be used
to force the 32-bit far jump, and "dword" to force the 16-bit far jump. An
indirect "jmp" instruction obtains the destination address indirectly through
a register or a pointer variable, the operand should be general register or
memory. See also 1.2.5 for some more details.

    jmp 100h         ; direct near jump
    jmp 0FFFFh:0     ; direct far jump
    jmp ax           ; indirect near jump
    jmp pword [ebx]  ; indirect far jump
 
  "call" transfers control to the procedure, saving on the stack the address
of the instruction following the "call" for later use by a "ret" (return)
instruction. Rules for the operands are the same as for the "jmp" instruction,
but the "call" has no short variant of direct instruction and thus it is not
optimized.
  "ret", "retn" and "retf" instructions terminate the execution of a procedure
and transfer control back to the program that originally invoked the
procedure using the address that was stored on the stack by the "call"
instruction. "ret" is the equivalent for "retn", which returns from the
procedure that was executed using the near call, while "retf" returns from
the procedure that was executed using the far call. These instructions default
to the size of address appropriate for the current code setting, but the size
of address can be forced to 16-bit by using the "retw", "retnw" and "retfw"
mnemonics, and to 32-bit by using the "retd", "retnd" and "retfd" mnemonics.
All these instructions may optionally specify an immediate operand; by adding
this constant to the stack pointer, they effectively remove any arguments that
the calling program pushed on the stack before the execution of the "call"
instruction.
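  The sketch below assumes 32-bit code and uses the hypothetical labels
"add_pair" and "done"; it shows a near call whose arguments are removed from
the stack by the immediate operand of "ret":

    use32
    push dword 2     ; second argument
    push dword 3     ; first argument
    call add_pair    ; stores the return address and jumps to add_pair
    jmp done         ; here eax = 5 and both arguments are gone
    add_pair:
    mov eax,[esp+4]  ; first argument, just above the return address
    add eax,[esp+8]  ; second argument
    ret 8            ; return and remove 8 bytes of arguments
    done:

  Without the operand the caller would have to adjust the stack pointer
itself after the call.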
  "iret" returns control to an interrupted procedure. It differs from "ret" in
that it also pops the flags from the stack into the flags register. The flags
are stored on the stack by the interrupt mechanism. It defaults to the size of
return address appropriate for the current code setting, but it can be forced
to use 16-bit or 32-bit address by using the "iretw" or "iretd" mnemonic.
  The conditional transfer instructions are jumps that may or may not transfer
control, depending on the state of the CPU flags when the instruction
executes. The mnemonics for conditional jumps may be obtained by attaching
the condition mnemonic (see table 2.1) to the "j" mnemonic,
for example "jc" instruction will transfer the control when the CF flag is
set. The conditional jumps can be short or near, and direct only, and can be
optimized (see 1.2.5), the operand should be an immediate value specifying
target address.
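  For example, the same comparison can be interpreted as unsigned or signed
depending on the condition chosen. This sketch uses the hypothetical labels
"is_above" and "is_less" and relies on the fact that the jumps themselves do
not alter the flags:

    mov al,0FFh      ; 255 when unsigned, -1 when signed
    cmp al,1         ; CF = 0, ZF = 0, SF = 1, OF = 0
    ja is_above      ; taken, CF or ZF = 0 (255 is above 1)
    nop              ; skipped
    is_above:
    jl is_less       ; taken, SF xor OF = 1 (-1 is less than 1)
    nop              ; skipped
    is_less: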
 
   Table 2.1  Conditions
  /-----------------------------------------------------------\
  | Mnemonic | Condition tested      | Description            |
  |==========|=======================|========================|
  | o        | OF = 1                | overflow               |
  |----------|-----------------------|------------------------|
  | no       | OF = 0                | not overflow           |
  |----------|-----------------------|------------------------|
  | c        |                       | carry                  |
  | b        | CF = 1                | below                  |
  | nae      |                       | not above nor equal    |
  |----------|-----------------------|------------------------|
  | nc       |                       | not carry              |
  | ae       | CF = 0                | above or equal         |
  | nb       |                       | not below              |
  |----------|-----------------------|------------------------|
  | e        | ZF = 1                | equal                  |
  | z        |                       | zero                   |
  |----------|-----------------------|------------------------|
  | ne       | ZF = 0                | not equal              |
  | nz       |                       | not zero               |
  |----------|-----------------------|------------------------|
  | be       | CF or ZF = 1          | below or equal         |
  | na       |                       | not above              |
  |----------|-----------------------|------------------------|
  | a        | CF or ZF = 0          | above                  |
  | nbe      |                       | not below nor equal    |
  |----------|-----------------------|------------------------|
  | s        | SF = 1                | sign                   |
  |----------|-----------------------|------------------------|
  | ns       | SF = 0                | not sign               |
  |----------|-----------------------|------------------------|
  | p        | PF = 1                | parity                 |
  | pe       |                       | parity even            |
  |----------|-----------------------|------------------------|
  | np       | PF = 0                | not parity             |
  | po       |                       | parity odd             |
  |----------|-----------------------|------------------------|
  | l        | SF xor OF = 1         | less                   |
  | nge      |                       | not greater nor equal  |
  |----------|-----------------------|------------------------|
  | ge       | SF xor OF = 0         | greater or equal       |
  | nl       |                       | not less               |
  |----------|-----------------------|------------------------|
  | le       | (SF xor OF) or ZF = 1 | less or equal          |
  | ng       |                       | not greater            |
  |----------|-----------------------|------------------------|
  | g        | (SF xor OF) or ZF = 0 | greater                |
  | nle      |                       | not less nor equal     |
  \-----------------------------------------------------------/
 
  The "loop" instructions are conditional jumps that use a value placed in
CX (or ECX) to specify the number of repetitions of a software loop. All
"loop" instructions automatically decrement CX (or ECX) and terminate the
loop (don't transfer the control) when CX (or ECX) is zero. It uses CX or ECX
depending on whether the current code setting is 16-bit or 32-bit, but it can
be forced to use CX with the "loopw" mnemonic or to use ECX with the "loopd"
mnemonic. "loope" and "loopz" are the synonyms for the same instruction, which
acts as the standard "loop", but also terminates the loop when ZF flag is not
set. "loopew" and "loopzw" mnemonics force them to use CX register while
"looped" and "loopzd" force them to use ECX register. "loopne" and "loopnz"
are the synonyms for the same instruction, which acts as the standard "loop",
but also terminates the loop when ZF flag is set. "loopnew" and "loopnzw"
mnemonics force them to use CX register while "loopned" and "loopnzd" force
them to use ECX register. Every "loop" instruction needs an operand being an
immediate value specifying target address, it can be only short jump (in the
range of 128 bytes back and 127 bytes forward from the address of instruction
following the "loop" instruction).
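  A minimal counting loop, assuming 16-bit code and the hypothetical label
"add_next", might look as follows:

    xor ax,ax        ; ax = 0
    mov cx,5         ; number of repetitions
    add_next:
    add ax,cx        ; adds 5, 4, 3, 2 and 1 in turn
    loop add_next    ; decrement cx, jump back while cx is not zero

  After the loop AX holds 15 and CX is zero.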
  "jcxz" branches to the label specified in the instruction if it finds a
value of zero in CX, "jecxz" does the same, but checks the value of ECX
instead of CX. Rules for the operands are the same as for the "loop"
instruction.
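  A common use of "jcxz", sketched here under the assumption of 16-bit code
and with the hypothetical labels "clear" and "skip", is to bypass a loop whose
counter is already zero. Since "loop" decrements CX before testing it,
entering the loop with CX equal to zero would repeat it 65536 times:

    jcxz skip        ; nothing to do when cx = 0
    clear:
    mov byte [di],0  ; store zero at the address in di
    inc di           ; advance to the next byte
    loop clear       ; repeat until cx reaches zero
    skip:

  In 32-bit code "jecxz" and ECX would be used in the same way.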
  "int" activates the interrupt service routine that corresponds to the
number specified as an operand to the instruction, the number should be in
range from 0 to 255. The interrupt service routine terminates with an "iret"
instruction that returns control to the instruction that follows "int".
"int3" mnemonic codes the short (one byte) trap that invokes the interrupt 3.
"into" instruction invokes the interrupt 4 if the OF flag is set.
  "bound" verifies that the signed value contained in the specified register
lies within specified limits. An interrupt 5 occurs if the value contained in
the register is less than the lower bound or greater than the upper bound. It
needs two operands, the first operand specifies the register being tested,
the second operand should be memory address for the two signed limit values.
The operands can be "word" or "dword" in size.

    bound ax,[bx]    ; check word for bounds
    bound eax,[esi]  ; check double word for bounds


2.1.7  I/O instructions
 
  "in" transfers a byte, word, or double word from an input port to AL, AX,
or EAX. I/O ports can be addressed either directly, with the immediate byte
value coded in instruction, or indirectly via the DX register. The destination
operand should be AL, AX, or EAX register. The source operand should be an
immediate value in range from 0 to 255, or DX register.

    in al,20h        ; input byte from port 20h
    in ax,dx         ; input word from port addressed by dx
 
  "out" transfers a byte, word, or double word to an output port from AL, AX,
or EAX. The program can specify the number of the port using the same methods
as the "in" instruction. The destination operand should be an immediate value
in range from 0 to 255, or DX register. The source operand should be AL, AX,
or EAX register.

    out 20h,ax       ; output word to port 20h
    out dx,al        ; output byte to port addressed by dx


2.1.8  Strings operations
 
The string operations operate on one element of a string. A string element
may be a byte, a word, or a double word. The string elements are addressed by
SI and DI (or ESI and EDI) registers. After every string operation SI and/or
DI (or ESI and/or EDI) are automatically updated to point to the next element
of the string. If DF (direction flag) is zero, the index registers are
incremented, if DF is one, they are decremented. The amount of the increment
or decrement is 1, 2, or 4 depending on the size of the string element. Every
string operation instruction has short forms which have no operands and use
SI and/or DI when the code type is 16-bit, and ESI and/or EDI when the code
type is 32-bit. SI and ESI by default address data in the segment selected
by DS, DI and EDI always address data in the segment selected by ES. Short
form is obtained by attaching to the mnemonic of string operation a letter
specifying the size of string element, it should be "b" for byte element,
"w" for word element, and "d" for double word element. Full form of string
operation needs operands providing the size operator and the memory addresses,
which can be SI or ESI with any segment prefix, DI or EDI always with ES
segment prefix.
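  The effect of DF on the short forms can be sketched as follows, assuming
16-bit code and that SI and DI already point to valid data ("cld" and "std"
clear and set the direction flag):

    cld              ; DF = 0, index registers will be incremented
    movsb            ; copy byte, si and di are increased by 1
    movsw            ; copy word, si and di are increased by 2
    std              ; DF = 1, index registers will be decremented
    movsb            ; copy byte, si and di are decreased by 1
    cld              ; restore the usual direction

  The "movs" instruction used in this sketch is described below.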
  "movs" transfers the string element pointed to by SI (or ESI) to the
location pointed to by DI (or EDI). Size of operands can be byte, word, or
double word. The destination operand should be memory addressed by DI or EDI,
the source operand should be memory addressed by SI or ESI with any segment
prefix.

    movs byte [di],[si]        ; transfer byte
    movs word [es:di],[ss:si]  ; transfer word
    movsd                      ; transfer double word
 
  "cmps" subtracts the destination string element from the source string
element and updates the flags AF, SF, PF, CF and OF, but it does not change
any of the compared elements. If the string elements are equal, ZF is set,
otherwise it is cleared. The first operand for this instruction should be the
source string element addressed by SI or ESI with any segment prefix, the
second operand should be the destination string element addressed by DI or
EDI.

    cmpsb                      ; compare bytes
    cmps word [ds:si],[es:di]  ; compare words
    cmps dword [fs:esi],[edi]  ; compare double words
 
  "scas" subtracts the destination string element from AL, AX, or EAX
(depending on the size of string element) and updates the flags AF, SF, ZF,
PF, CF and OF. If the values are equal, ZF is set, otherwise it is cleared.
The operand should be the destination string element addressed by DI or EDI.

    scas byte [es:di]          ; scan byte
    scasw                      ; scan word
    scas dword [es:edi]        ; scan double word
1078
 
1099
 
  "stos" places the value of AL, AX, or EAX into the destination string
element. Rules for the operand are the same as for the "scas" instruction.
  "lods" places the source string element into AL, AX, or EAX. The operand
should be the source string element addressed by SI or ESI with any segment
prefix.

    lods byte [ds:si]          ; load byte
    lods word [cs:si]          ; load word
    lodsd                      ; load double word

  "ins" transfers a byte, word, or double word from an input port addressed
by DX register to the destination string element. The destination operand
should be memory addressed by DI or EDI, the source operand should be the DX
register.

    insb                       ; input byte
    ins word [es:di],dx        ; input word
    ins dword [edi],dx         ; input double word

  "outs" transfers the source string element to an output port addressed by
DX register. The destination operand should be the DX register and the source
operand should be memory addressed by SI or ESI with any segment prefix.

    outs dx,byte [si]          ; output byte
    outsw                      ; output word
    outs dx,dword [gs:esi]     ; output double word
 
  The repeat prefixes "rep", "repe"/"repz", and "repne"/"repnz" specify
repeated string operation. When a string operation instruction has a repeat
prefix, the operation is executed repeatedly, each time using a different
element of the string. The repetition terminates when one of the conditions
specified by the prefix is satisfied. All three prefixes automatically
decrease CX or ECX register (depending on whether the string operation
instruction uses the 16-bit or 32-bit addressing) after each operation and
repeat the associated operation until CX or ECX is zero. "repe"/"repz" and
"repne"/"repnz" are used exclusively with the "scas" and "cmps" instructions
(described above). When these prefixes are used, repetition of the next
instruction also depends on the zero flag (ZF): "repe" and "repz" terminate
the execution when the ZF is zero, "repne" and "repnz" terminate the execution
when the ZF is set.

    rep  movsd       ; transfer multiple double words
    repe cmpsb       ; compare bytes until not equal
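
  The following sketch, assuming 32-bit code with flat addressing, shows two
typical uses of the repeat prefixes. The labels "source", "destination" and
"text" are hypothetical, and "cld" is used so that ESI and EDI are
incremented after each element.

    cld
    mov esi,source
    mov edi,destination
    mov ecx,16
    rep movsd            ; copy 16 double words

    mov edi,text
    mov ecx,-1
    xor al,al
    repne scasb          ; scan for the terminating zero byte
    not ecx
    dec ecx              ; ecx = length of the string without the zero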
 
 
2.1.9  Flag control instructions

  The flag control instructions provide a method for directly changing the
state of bits in the flag register. All instructions described in this
section have no operands.
  "stc" sets the CF (carry flag) to 1, "clc" zeroes the CF, "cmc" changes the
CF to its complement. "std" sets the DF (direction flag) to 1, "cld" zeroes
the DF, "sti" sets the IF (interrupt flag) to 1 and therefore enables the
interrupts, "cli" zeroes the IF and therefore disables the interrupts.
  "lahf" copies SF, ZF, AF, PF, and CF to bits 7, 6, 4, 2, and 0 of the
AH register. The contents of the remaining bits are undefined. The flags
remain unaffected.
  "sahf" transfers bits 7, 6, 4, 2, and 0 from the AH register into SF, ZF,
AF, PF, and CF.
  "pushf" decrements "esp" by two or four and stores the low word or
double word of flags register at the top of stack; the size of stored data
depends on the current code setting. The "pushfw" variant forces storing the
word and "pushfd" forces storing the double word.
  "popf" transfers specific bits from the word or double word at the top
of stack, then increments "esp" by two or four; this value depends on
the current code setting. The "popfw" variant forces restoring from the word
and "popfd" forces restoring from the double word.
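  As a minimal sketch of how these instructions combine, the fragment below
saves the flags, changes IF and DF for a short piece of code and then
restores the previous state. It assumes 32-bit code running at a privilege
level where "cli" is permitted and "popfd" may change IF.

    pushfd           ; save the current flags on the stack
    cli              ; disable interrupts
    std              ; string instructions now decrement ESI and EDI
    ; code that relies on these settings goes here
    popfd            ; restore IF, DF and the other saved flags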
 
 
2.1.10  Conditional operations

  The instructions obtained by attaching the condition mnemonic (see table
2.1) to the "set" mnemonic set a byte to one if the condition is true and set
the byte to zero otherwise. The operand should be an 8-bit general register
or the byte in memory.

    setne al         ; set al if zero flag cleared
    seto byte [bx]   ; set byte if overflow

  "salc" instruction sets all bits of the AL register when the carry flag is
set and zeroes the AL register otherwise. This instruction has no arguments.
  The instructions obtained by attaching the condition mnemonic to the "cmov"
mnemonic transfer the word or double word from the general register or memory
to the general register only when the condition is true. The destination
operand should be general register, the source operand can be general register
or memory.

    cmove ax,bx      ; move when zero flag set
    cmovnc eax,[ebx] ; move when carry flag cleared
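
  Both families can replace short conditional jumps. The sketch below is
only an illustration, assuming 32-bit code: it selects the larger of two
signed values and turns the result of a comparison into 0 or 1.

    cmp eax,ebx
    cmovl eax,ebx    ; eax = larger of eax and ebx (signed)
    xor edx,edx
    cmp esi,edi
    sete dl          ; edx = 1 if esi equals edi, 0 otherwise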
 
  "cmpxchg" compares the value in the AL, AX, or EAX register with the
destination operand. If the two values are equal, the source operand is
loaded into the destination operand. Otherwise, the destination operand is
loaded into the AL, AX, or EAX register. The destination operand may be a
general register or memory, the source operand must be a general register.

    cmpxchg dl,bl    ; compare and exchange with register
    cmpxchg [bx],dx  ; compare and exchange with memory
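
  A common use of "cmpxchg" is a retry loop that updates a shared variable
only if nothing else changed it in the meantime. The following is a sketch
for 32-bit code, where "counter" is a hypothetical double word variable and
the "lock" prefix (see 2.1.12) makes the exchange atomic.

    mov eax,[counter]            ; expected old value
  retry:
    mov edx,eax
    inc edx                      ; desired new value
    lock cmpxchg [counter],edx   ; store edx if [counter] still equals eax
    jne retry                    ; otherwise eax holds the current value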
 
  "cmpxchg8b" compares the 64-bit value in EDX and EAX registers with the
destination operand. If the values are equal, the 64-bit value in ECX and EBX
registers is stored in the destination operand. Otherwise, the value in the
destination operand is loaded into EDX and EAX registers. The destination
operand should be a quad word in memory.

    cmpxchg8b [bx]   ; compare and exchange 8 bytes


2.1.11  Miscellaneous instructions

  "nop" instruction occupies one byte but affects nothing but the instruction
pointer. This instruction has no operands and doesn't perform any operation.
  "ud2" instruction generates an invalid opcode exception. This instruction
is provided for software testing to explicitly generate an invalid opcode.
This instruction has no operands.
  "xlat" replaces a byte in the AL register with a byte indexed by its value
in a translation table addressed by BX or EBX. The operand should be a byte
memory addressed by BX or EBX with any segment prefix. This instruction also
has a short form "xlatb" which has no operands and uses the BX or EBX address
in the segment selected by DS depending on the current code setting.
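  As an illustration of "xlatb", the sketch below converts a value from 0 to
15 in AL into its hexadecimal digit. It assumes 32-bit code with DS covering
the table; "hex_digits" is a hypothetical label.

    mov ebx,hex_digits
    mov al,0Ah
    xlatb                            ; al = byte at ebx+al, here 'A'

    hex_digits db '0123456789ABCDEF' ; table placed in a data area

  The same lookup works in 16-bit code with BX in place of EBX.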
  "lds" transfers a pointer variable from the source operand to DS and the
destination register. The source operand must be a memory operand, and the
destination operand must be a general register. The DS register receives the
segment selector of the pointer while the destination register receives the
offset part of the pointer. "les", "lfs", "lgs" and "lss" operate identically
to "lds" except that rather than DS register the ES, FS, GS and SS are used
respectively.

    lds bx,[si]      ; load pointer to ds:bx

  "lea" transfers the offset of the source operand (rather than its value)
to the destination operand. The source operand must be a memory operand, and
the destination operand must be a general register.

    lea dx,[bx+si+1] ; load effective address to dx
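
  Because "lea" only computes the address and does not access memory, it is
often used for plain arithmetic as well. A short sketch, assuming 32-bit
code:

    lea eax,[eax+eax*4]  ; eax = eax*5
    lea edx,[ebx+ecx+8]  ; edx = ebx+ecx+8, flags are not modified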
 
  "cpuid" returns processor identification and feature information in the
EAX, EBX, ECX, and EDX registers. The information returned is selected by
entering a value in the EAX register before the instruction is executed.
This instruction has no operands.
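  For example, the value 0 in EAX selects the vendor identification. The
sketch below assumes 32-bit code and a hypothetical 12-byte buffer named
"vendor".

    xor eax,eax              ; function 0: vendor identification
    cpuid
    mov dword [vendor],ebx   ; the text is returned in EBX, EDX, ECX order
    mov dword [vendor+4],edx
    mov dword [vendor+8],ecx

    vendor rb 12             ; buffer placed in a data area

  The buffer then holds the 12-character vendor string, which is not
terminated with a zero byte.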
  "pause" instruction delays the execution of the next instruction by an
implementation specific amount of time. It can be used to improve the
performance of spin wait loops. This instruction has no operands.
  "enter" creates a stack frame that may be used to implement the scope rules
of block-structured high-level languages. A "leave" instruction at the end of
a procedure complements an "enter" at the beginning of the procedure to
simplify stack management and to control access to variables for nested
procedures. The "enter" instruction includes two parameters. The first
parameter specifies the number of bytes of dynamic storage to be allocated on
the stack for the routine being entered. The second parameter corresponds to
the lexical nesting level of the routine, which can range from 0 to 31.
The specified lexical level determines how many sets of stack frame pointers
the CPU copies into the new stack frame from the preceding frame. This list
of stack frame pointers is sometimes called the display. The first word (or
double word when code is 32-bit) of the display is a pointer to the last stack
frame. This pointer enables a "leave" instruction to reverse the action of the
previous "enter" instruction by effectively discarding the last stack frame.
After "enter" creates the new display for a procedure, it allocates the
dynamic storage space for that procedure by decrementing ESP by the number of
bytes specified in the first parameter. To enable a procedure to address its
display, "enter" leaves BP (or EBP) pointing to the beginning of the new stack
frame. If the lexical level is zero, "enter" pushes BP (or EBP), copies SP to
BP (or ESP to EBP) and then subtracts the first operand from ESP. For nesting
levels greater than zero, the processor pushes additional frame pointers on
the stack before adjusting the stack pointer.

    enter 2048,0     ; enter and allocate 2048 bytes on stack
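
  For the common case of lexical level zero, the description above means
that "enter" can be read as shorthand for three simpler instructions. The
sketch below assumes 32-bit code and a hypothetical frame of 16 bytes.

    ; "enter 16,0" behaves like the following three instructions
    push ebp
    mov ebp,esp
    sub esp,16
    ; procedure body, local data occupies [ebp-16] up to [ebp-1]
    leave            ; copies EBP to ESP, then pops EBP
    ret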
 
 
2.1.12  System instructions

  "lmsw" loads the operand into the machine status word (bits 0 through 15 of
CR0 register), while "smsw" stores the machine status word into the
destination operand. The operand for both those instructions can be 16-bit
general register or memory, for "smsw" it can also be 32-bit general
register.

    lmsw ax          ; load machine status from register
    smsw [bx]        ; store machine status to memory

  "lgdt" and "lidt" instructions load the values in operand into the global
descriptor table register or the interrupt descriptor table register
respectively. "sgdt" and "sidt" store the contents of the global descriptor
table register or the interrupt descriptor table register in the destination
operand. The operand should be 6 bytes in memory.

    lgdt [ebx]       ; load global descriptor table
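
  The 6-byte operand consists of a 16-bit limit followed by a 32-bit base
address. The sketch below shows one possible layout in 32-bit code. The
labels "gdtr", "gdt" and "gdt_end" and the table contents are hypothetical,
and it is assumed that label addresses are equal to linear addresses.

    lgdt [gdtr]              ; load limit and base from memory

  gdtr:
    dw gdt_end-gdt-1         ; limit: size of the table minus one
    dd gdt                   ; base address of the table
  gdt:
    dq 0                     ; null descriptor, real descriptors follow
  gdt_end: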
 
  "lldt" loads the operand into the segment selector field of the local
descriptor table register and "sldt" stores the segment selector from the
local descriptor table register in the operand. "ltr" loads the operand into
the segment selector field of the task register and "str" stores the segment
selector from the task register in the operand. Rules for operand are the same
as for the "lmsw" and "smsw" instructions.
  "lar" loads the access rights from the segment descriptor specified by
the selector in source operand into the destination operand and sets the ZF
flag. The destination operand can be a 16-bit or 32-bit general register.
The source operand should be a 16-bit general register or memory.

    lar ax,[bx]      ; load access rights into word
    lar eax,dx       ; load access rights into double word

  "lsl" loads the segment limit from the segment descriptor specified by the
selector in source operand into the destination operand and sets the ZF flag.
Rules for operand are the same as for the "lar" instruction.
  "verr" and "verw" verify whether the code or data segment specified with
the operand is readable or writable from the current privilege level. The
operand should be a word, it can be general register or memory. If the segment
is accessible and readable (for "verr") or writable (for "verw") the ZF flag
is set, otherwise it's cleared. Rules for operand are the same as for the
"lldt" instruction.
  "arpl" compares the RPL (requestor's privilege level) fields of two segment
selectors. The first operand contains one segment selector and the second
operand contains the other. If the RPL field of the destination operand is
less than the RPL field of the source operand, the ZF flag is set and the RPL
field of the destination operand is increased to match that of the source
operand. Otherwise, the ZF flag is cleared and no change is made to the
destination operand. The destination operand can be a word general register
or memory, the source operand must be a general register.

    arpl bx,ax       ; adjust RPL of selector in register
    arpl [bx],ax     ; adjust RPL of selector in memory
 
  "clts" clears the TS (task switched) flag in the CR0 register. This
instruction has no operands.
  "lock" prefix causes the processor's bus-lock signal to be asserted during
execution of the accompanying instruction. In a multiprocessor environment,
the bus-lock signal ensures that the processor has exclusive use of any shared
memory while the signal is asserted. The "lock" prefix can be prepended only
to the following instructions and only to those forms of the instructions
where the destination operand is a memory operand: "add", "adc", "and", "btc",
"btr", "bts", "cmpxchg", "cmpxchg8b", "dec", "inc", "neg", "not", "or", "sbb",
"sub", "xor", "xadd" and "xchg". If the "lock" prefix is used with one of
these instructions and the source operand is a memory operand, an undefined
opcode exception may be generated. An undefined opcode exception will also be
generated if the "lock" prefix is used with any instruction not in the above
list. The "xchg" instruction always asserts the bus-lock signal regardless of
the presence or absence of the "lock" prefix.
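  As an illustration, the sketch below increments a shared counter atomically
and acquires a simple spin lock. It assumes 32-bit code; "counter" and
"spinlock" are hypothetical variables (a double word and a byte).

    lock inc dword [counter] ; atomic read-modify-write of memory

  acquire:
    mov al,1
    xchg al,[spinlock]       ; locked implicitly, no prefix needed
    test al,al
    jz acquired              ; previous value 0 means the lock was free
    pause                    ; hint for the spin wait loop
    jmp acquire
  acquired:

  Releasing such a lock is a plain store, for example
"mov byte [spinlock],0".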
  "hlt" stops instruction execution and places the processor in a halted
state. An enabled interrupt, a debug exception, the BINIT, INIT or the RESET
signal will resume execution. This instruction has no operands.
  "invlpg" invalidates (flushes) the TLB (translation lookaside buffer) entry
specified with the operand, which should be a memory operand. The processor
determines the page that contains that address and flushes the TLB entry for
that page.
  "rdmsr" loads the contents of a 64-bit MSR (model specific register) of the
address specified in the ECX register into registers EDX and EAX. "wrmsr"
writes the contents of registers EDX and EAX into the 64-bit MSR of the
address specified in the ECX register. "rdtsc" loads the current value of the
processor's time stamp counter from the 64-bit MSR into the EDX and EAX
registers. The processor increments the time stamp counter MSR every clock
cycle and resets it to 0 whenever the processor is reset. "rdpmc" loads the
contents of the 40-bit performance monitoring counter specified in the ECX
register into registers EDX and EAX. These instructions have no operands.
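  A rough way to time a piece of code is to read the time stamp counter
before and after it and subtract the two 64-bit values. The sketch below
assumes 32-bit code in which ESI and EDI are free to hold the first reading.

    rdtsc
    mov esi,eax      ; low double word of the first reading
    mov edi,edx      ; high double word of the first reading
    ; code being measured goes here
    rdtsc
    sub eax,esi
    sbb edx,edi      ; edx:eax = difference between the two readings

  Out-of-order execution can blur such measurements, so the result should
be treated as approximate.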
  "wbinvd" writes back all modified cache lines in the processor's internal
cache to main memory and invalidates (flushes) the internal caches. The
instruction then issues a special function bus cycle that directs external
caches to also write back modified data and another bus cycle to indicate that
the external caches should be invalidated. This instruction has no operands.
  "rsm" returns program control from the system management mode to the program
that was interrupted when the processor received an SMM interrupt. This
instruction has no operands.
  "sysenter" executes a fast call to a level 0 system procedure, "sysexit"
executes a fast return to level 3 user code. The addresses used by these
instructions are stored in MSRs. These instructions have no operands.


2.1.13  FPU instructions

The FPU (Floating-Point Unit) instructions operate on the floating-point
values in three formats: single precision (32-bit), double precision (64-bit)
and double extended precision (80-bit). The FPU registers form the stack and
each of them holds the double extended precision floating-point value. When
some values are pushed onto the stack or are removed from the top, the FPU
registers are shifted, so ST0 is always the value on the top of FPU stack, ST1
is the first value below the top, etc. The ST0 name also has the synonym ST.
  "fld" pushes the floating-point value onto the FPU register stack. The
operand can be 32-bit, 64-bit or 80-bit memory location or the FPU register;
its value is then loaded onto the top of FPU register stack (the ST0
register) and is automatically converted into the double extended precision
format.

    fld dword [bx]   ; load single precision value from memory
    fld st2          ; push value of st2 onto register stack

  "fld1", "fldz", "fldl2t", "fldl2e", "fldpi", "fldlg2" and "fldln2" load the
commonly used constants onto the FPU register stack. The loaded constants are
+1.0, +0.0, binary logarithm of 10, binary logarithm of e, pi, decimal
logarithm of 2 and natural logarithm of 2 respectively. These instructions
have no operands.
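  As a small illustration (not taken from the original text), a constant such
as 2*pi might be obtained by loading pi and adding it to itself with the
"fadd" instruction described later in this section:

    fldpi            ; push pi onto register stack
    fadd st0,st0     ; add st0 to itself, st0 now holds 2*pi
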
  "fild" converts the signed integer source operand into double extended
precision floating-point format and pushes the result onto the FPU register
stack. The source operand can be a 16-bit, 32-bit or 64-bit memory location.

    fild qword [bx]  ; load 64-bit integer from memory

  "fst" copies the value of ST0 register to the destination operand, which
can be 32-bit or 64-bit memory location or another FPU register. "fstp"
performs the same operation as "fst" and then pops the register stack,
getting rid of ST0. "fstp" accepts the same operands as the "fst" instruction
and can also store the value in an 80-bit memory location.

    fst st3          ; copy value of st0 into st3 register
    fstp tword [bx]  ; store value in memory and pop stack

  "fist" converts the value in ST0 to a signed integer and stores the result
in the destination operand. The operand can be 16-bit or 32-bit memory
location. "fistp" performs the same operation and then pops the register
stack; it accepts the same operands as the "fist" instruction and can also
store the integer value in a 64-bit memory location, so it has the same rules
for operands as the "fild" instruction.
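  For illustration, the two instructions might be used as follows, assuming
that BX points to a buffer large enough for the stored integer:

    fist word [bx]    ; store st0 as 16-bit integer
    fistp qword [bx]  ; store st0 as 64-bit integer and pop stack

The conversion rounds the value according to the current rounding mode.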
  "fbld" converts the packed BCD integer into double extended precision
floating-point format and pushes this value onto the FPU stack. "fbstp"
converts the value in ST0 to an 18-digit packed BCD integer, stores the result
in the destination operand, and pops the register stack. The operand should be
an 80-bit memory location.
  "fadd" adds the destination and source operand and stores the sum in the
destination location. The destination operand is always an FPU register; if
the source is a memory location, the destination is ST0 register and only
source operand should be specified. If both operands are FPU registers, at
least one of them should be ST0 register. An operand in memory can be a
32-bit or 64-bit value.

    fadd qword [bx]  ; add double precision value to st0
    fadd st2,st0     ; add st0 to st2

  "faddp" adds the destination and source operand, stores the sum in the
destination location and then pops the register stack. The destination operand
must be an FPU register and the source operand must be the ST0. When no
operands are specified, ST1 is used as a destination operand.

    faddp            ; add st0 to st1 and pop the stack
    faddp st2,st0    ; add st0 to st2 and pop the stack

  "fiadd" converts an integer source operand into double extended precision
floating-point value and adds it to ST0, which is always the destination.
The operand should be a 16-bit or 32-bit memory location.

    fiadd word [bx]  ; add word integer to st0

  "fsub", "fsubr", "fmul", "fdiv", "fdivr" instructions are similar to "fadd",
have the same rules for operands and differ only in the performed computation.
"fsub" subtracts the source operand from the destination operand, "fsubr"
subtracts the destination operand from the source operand, "fmul" multiplies
the destination and source operands, "fdiv" divides the destination operand by
the source operand and "fdivr" divides the source operand by the destination
operand. "fsubp", "fsubrp", "fmulp", "fdivp", "fdivrp" perform the same
operations and pop the register stack; the rules for operands are the same as
for the "faddp" instruction. "fisub", "fisubr", "fimul", "fidiv", "fidivr"
perform these operations after converting the integer source operand into
floating-point value; they have the same rules for operands as the "fiadd"
instruction.
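  The following lines sketch a few of these variants; the operands were chosen
only for the purpose of illustration:

    fsub st0,st1      ; subtract st1 from st0
    fdivr dword [bx]  ; divide single precision value by st0, result in st0
    fmulp st1,st0     ; multiply st1 by st0 and pop the stack
    fisub word [bx]   ; subtract word integer from st0
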
  "fsqrt" computes the square root of the value in ST0 register, "fsin"
computes the sine of that value, "fcos" computes the cosine of that value,
"fchs" complements its sign bit, "fabs" clears its sign to create the absolute
value, "frndint" rounds it to the nearest integral value, depending on the
current rounding mode. "f2xm1" computes the exponential value of 2 to the
power of ST0 and subtracts 1.0 from it; the value of ST0 must lie in the
range -1.0 to +1.0. All these instructions store the result in ST0 and have no
operands.
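  As an example, the square root of the absolute value of a number in memory
could be computed this way (BX pointing to a single precision value is an
assumption of this sketch):

    fld dword [bx]   ; push single precision value from memory
    fabs             ; clear the sign of st0
    fsqrt            ; replace st0 with its square root
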
  "fsincos" computes both the sine and the cosine of the value in ST0
register, stores the sine in ST0 and pushes the cosine on the top of FPU
register stack. "fptan" computes the tangent of the value in ST0, stores the
result in ST0 and pushes a 1.0 onto the FPU register stack. "fpatan" computes
the arctangent of the value in ST1 divided by the value in ST0, stores the
result in ST1 and pops the FPU register stack. "fyl2x" computes the binary
logarithm of ST0, multiplies it by ST1, stores the result in ST1 and pops the
FPU register stack; "fyl2xp1" performs the same operation but it adds 1.0 to
ST0 before computing the logarithm. "fprem" computes the remainder obtained
from dividing the value in ST0 by the value in ST1, and stores the result
in ST0. "fprem1" performs the same operation as "fprem", but it computes the
remainder in the way specified by IEEE Standard 754. "fscale" truncates the
value in ST1 and increases the exponent of ST0 by this value. "fxtract"
separates the value in ST0 into its exponent and significand, stores the
exponent in ST0 and pushes the significand onto the register stack. "fnop"
performs no operation. These instructions have no operands.
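  A possible way to compute the binary logarithm of a value in memory with
"fyl2x" is to push 1.0 first, so that it serves as the multiplier in ST1. This
is only a sketch, assuming that BX points to a positive single precision value:

    fld1             ; push 1.0, the multiplier
    fld dword [bx]   ; push the value, 1.0 is now in st1
    fyl2x            ; multiply binary logarithm of st0 by st1 and pop stack

After these instructions the result is on the top of stack. Replacing "fld1"
with "fldlg2" or "fldln2" would yield the decimal or natural logarithm instead.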
  "fxch" exchanges the contents of ST0 and another FPU register. The operand
should be an FPU register; if no operand is specified, the contents of ST0 and
ST1 are exchanged.
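  Both forms could look like this (the choice of ST3 is arbitrary and made
only for this illustration):

    fxch             ; exchange st0 with st1
    fxch st3         ; exchange st0 with st3
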
  "fcom" and "fcomp" compare the contents of ST0 and the source operand and
set flags in the FPU status word according to the results. "fcomp"
additionally pops the register stack after performing the comparison. The
operand can be a single or double precision value in memory or the FPU
register. When no operand is specified, ST1 is used as a source operand.

    fcom             ; compare st0 with st1
    fcomp st2        ; compare st0 with st2 and pop stack

  "fcompp" compares the contents of ST0 and ST1, sets flags in the FPU status
word according to the results and pops the register stack twice. This
instruction has no operands.
  "fucom", "fucomp" and "fucompp" perform an unordered comparison of two FPU
registers. Rules for operands are the same as for the "fcom", "fcomp" and
"fcompp", but the source operand must be an FPU register.
  "ficom" and "ficomp" compare the value in ST0 with an integer source operand
and set the flags in the FPU status word according to the results. "ficomp"
additionally pops the register stack after performing the comparison. The
integer value is converted to double extended precision floating-point format
before the comparison is made. The operand should be a 16-bit or 32-bit
memory location.

    ficom word [bx]  ; compare st0 with 16-bit integer

  "fcomi", "fcomip", "fucomi", "fucomip" perform the comparison of ST0 with
another FPU register and set the ZF, PF and CF flags according to the results.
"fcomip" and "fucomip" additionally pop the register stack after performing the
comparison. The instructions obtained by attaching the FPU condition mnemonic
(see table 2.2) to the "fcmov" mnemonic transfer the specified FPU register
into ST0 register if the given test condition is true. These instructions
allow two different syntaxes, one with single operand specifying the source
FPU register, and one with two operands, in that case destination operand
should be ST0 register and the second operand specifies the source FPU
register.

    fcomi st2        ; compare st0 with st2 and set flags
    fcmovb st0,st2   ; transfer st2 to st0 if below

   Table 2.2  FPU conditions
  /------------------------------------------------------\
  | Mnemonic | Condition tested | Description            |
  |==========|==================|========================|
  | b        | CF = 1           | below                  |
  | e        | ZF = 1           | equal                  |
  | be       | CF or ZF = 1     | below or equal         |
  | u        | PF = 1           | unordered              |
  | nb       | CF = 0           | not below              |
  | ne       | ZF = 0           | not equal              |
  | nbe      | CF and ZF = 0    | not below nor equal    |
  | nu       | PF = 0           | not unordered          |
  \------------------------------------------------------/

  "ftst" compares the value in ST0 with 0.0 and sets the flags in the FPU
status word according to the results. "fxam" examines the contents of the ST0
and sets the flags in FPU status word to indicate the class of value in the
register. These instructions have no operands.
  "fstsw" and "fnstsw" store the current value of the FPU status word in the
destination location. The destination operand can be either a 16-bit memory
location or the AX register. "fstsw" checks for pending unmasked FPU
exceptions before storing the status word, "fnstsw" does not.
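  A common idiom, shown here only as a sketch, copies the status word into AX
after a comparison and then moves AH into the CPU flags with "sahf", so that
an ordinary conditional jump can be used. The label "is_below" is a placeholder
assumed to be defined elsewhere in the program.

    fcom st1         ; compare st0 with st1
    fnstsw ax        ; copy FPU status word into ax
    sahf             ; copy ah into the low byte of flags
    jb is_below      ; jump if st0 was below st1
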
  "fstcw" and "fnstcw" store the current value of the FPU control word at the
specified destination in memory. "fstcw" checks for pending unmasked FPU
exceptions before storing the control word, "fnstcw" does not. "fldcw" loads
the operand into the FPU control word. The operand should be a 16-bit memory
location.
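  One typical use of these instructions is changing the rounding mode. The
sketch below assumes that the rounding control field occupies bits 10 and 11
of the control word (as defined for x87 units), that setting both bits selects
rounding toward zero, and that BX points to a free 16-bit memory location:

    fnstcw word [bx]    ; store current control word in memory
    or word [bx],0C00h  ; set both rounding control bits
    fldcw word [bx]     ; load modified control word
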
  "fstenv" and "fnstenv" store the current FPU operating environment at the
memory location specified with the destination operand, and then mask all FPU
exceptions. "fstenv" checks for pending unmasked FPU exceptions before
proceeding, "fnstenv" does not. "fldenv" loads the complete operating
environment from memory into the FPU. "fsave" and "fnsave" store the current
FPU state (operating environment and register stack) at the specified
destination in memory and reinitialize the FPU. "fsave" checks for pending
unmasked FPU exceptions before proceeding, "fnsave" does not. "frstor"
loads the FPU state from the specified memory location. All these instructions
need a memory location as an operand. For each of these instructions there
exist two additional mnemonics that allow precise selection of the type of
operation. The "fstenvw", "fnstenvw", "fldenvw", "fsavew", "fnsavew" and
"frstorw" mnemonics force the instruction to perform the operation as in
16-bit mode, while "fstenvd", "fnstenvd", "fldenvd", "fsaved", "fnsaved" and
"frstord" force the operation as in 32-bit mode.
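  As a sketch, a routine that needs to preserve the FPU state around its own
computations might use the pair below. It assumes that BX points to a buffer
large enough to hold the complete FPU state (94 bytes for the 16-bit variant,
108 bytes for the 32-bit one):

    fnsave [bx]      ; store FPU state and reinitialize the FPU
    fld1             ; any FPU code can be placed here
    fstp st0         ; remove the value pushed above
    frstor [bx]      ; restore the stored FPU state
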
  "finit" and "fninit" set the FPU operating environment into its default
state. "finit" checks for pending unmasked FPU exceptions before proceeding,
"fninit" does not. "fclex" and "fnclex" clear the FPU exception flags in the
FPU status word. "fclex" checks for pending unmasked FPU exceptions before
proceeding, "fnclex" does not. "wait" and "fwait" are synonyms for the same
instruction, which causes the processor to check for pending unmasked FPU
exceptions and handle them before proceeding. These instructions have no
operands.
  "ffree" sets the tag associated with specified FPU register to empty. The
operand should be an FPU register.
  "fincstp" and "fdecstp" rotate the FPU stack by one by adding one to or
subtracting one from the pointer of the top of stack. These instructions have
no operands.


2.1.14  MMX instructions

The MMX instructions operate on the packed integer types and use the MMX
registers, which are the low 64-bit parts of the 80-bit FPU registers. Because
of this, MMX instructions cannot be used at the same time as FPU instructions.
They can operate on packed bytes (eight 8-bit integers), packed words (four
16-bit integers) or packed double words (two 32-bit integers); use of packed
formats allows performing operations on multiple data at one time.
  "movq" copies a quad word from the source operand to the destination
operand. At least one of the operands must be an MMX register, the second one
can be also an MMX register or 64-bit memory location.

    movq mm0,mm1     ; move quad word from register to register
    movq mm2,[ebx]   ; move quad word from memory to register

  "movd" copies a double word from the source operand to the destination
operand. One of the operands must be an MMX register, the second one can be a
general register or 32-bit memory location. Only low double word of MMX
register is used.
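  Both directions of the transfer could look like this (the registers are
chosen arbitrarily for this illustration):

    movd mm0,eax     ; move double word from general register to mm0
    movd [ebx],mm1   ; move low double word of mm1 to memory
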
  All general MMX operations have two operands, the destination operand should
be an MMX register, the source operand can be an MMX register or 64-bit memory
location. Operation is performed on the corresponding data elements of the
source and destination operand and stored in the data elements of the
destination operand. "paddb", "paddw" and "paddd" perform the addition of
packed bytes, packed words, or packed double words. "psubb", "psubw" and
"psubd" perform the subtraction of appropriate types. "paddsb", "paddsw",
"psubsb" and "psubsw" perform the addition or subtraction of packed bytes
or packed words with the signed saturation. "paddusb", "paddusw", "psubusb",
"psubusw" are analogous, but with unsigned saturation. "pmulhw" and "pmullw"
perform a signed multiplication of the packed words and store the high or low
words of the results in the destination operand. "pmaddwd" performs a multiply
of the packed words and adds the four intermediate double word products in
pairs to produce the result as packed double words. "pand", "por" and "pxor"
perform the logical operations on the quad words, "pandn" also performs a
logical negation of the destination operand before performing the "and"
operation. "pcmpeqb", "pcmpeqw" and "pcmpeqd" compare for equality of packed
bytes, packed words or packed double words. If a pair of data elements is
equal, the corresponding data element in the destination operand is filled with
bits of value 1, otherwise it's set to 0. "pcmpgtb", "pcmpgtw" and "pcmpgtd"
perform the similar operation, but they check whether the data elements in the
destination operand are greater than the corresponding data elements in the
source operand. "packsswb" converts packed signed words into packed signed
bytes, "packssdw" converts packed signed double words into packed signed
words, using saturation to handle overflow conditions. "packuswb" converts
packed signed words into packed unsigned bytes. Converted data elements from
the source operand are stored in the low part of the destination operand,
1592
while converted data elements from the destination operand are stored in the
1618
while converted data elements from the destination operand are stored in the
1593
high part. "punpckhbw", "punpckhwd" and "punpckhdq" interleaves the data
1619
high part. "punpckhbw", "punpckhwd" and "punpckhdq" interleaves the data
1594
elements from the high parts of the source and destination operands and
1620
elements from the high parts of the source and destination operands and
1595
stores the result into the destination operand. "punpcklbw", "punpcklwd" and
1621
stores the result into the destination operand. "punpcklbw", "punpcklwd" and
1596
"punpckldq" perform the same operation, but the low parts of the source and
1622
"punpckldq" perform the same operation, but the low parts of the source and
1597
destination operand are used.
1623
destination operand are used.
1598
 
1624
 
1599
    paddsb mm0,[esi] ; add packed bytes with signed saturation
1625
    paddsb mm0,[esi] ; add packed bytes with signed saturation
1600
    pcmpeqw mm3,mm7  ; compare packed words for equality
1626
    pcmpeqw mm3,mm7  ; compare packed words for equality
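
  As a sketch of how the unpack instructions can be combined with a zeroed
register, the following sequence widens eight unsigned bytes into two groups
of four words. It assumes that ESI points to the eight bytes and that MM7 is
free to be used as a zero register:

    pxor mm7,mm7       ; clear mm7
    movq mm0,[esi]     ; load eight bytes
    movq mm1,mm0       ; keep a copy for the high part
    punpcklbw mm0,mm7  ; low four bytes become words with zero high bytes
    punpckhbw mm1,mm7  ; high four bytes become words with zero high bytes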

  "psllw", "pslld" and "psllq" perform a logical shift left of the packed
words, packed double words or a single quad word in the destination operand
by the amount specified in the source operand. "psrlw", "psrld" and "psrlq"
perform a logical shift right of the packed words, packed double words or a
single quad word. "psraw" and "psrad" perform an arithmetic shift right of
the packed words or double words. The destination operand should be an MMX
register, while the source operand can be an MMX register, 64-bit memory
location, or 8-bit immediate value.

    psllw mm2,mm4    ; shift words left logically
    psrad mm4,[ebx]  ; shift double words right arithmetically
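
  The shift count can also be given directly as an immediate value. A short
sketch, with the registers chosen arbitrarily:

    psrlq mm0,32     ; move the high double word of mm0 into the low one
    psraw mm1,15     ; fill each word with copies of its sign bit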

  "emms" makes the FPU registers usable for the FPU instructions. It must be
used before any FPU instructions are executed if any MMX instructions were
used.
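
  A minimal sketch of this rule, assuming that ESI and EDI point to 64-bit
data and that EBX points to a single precision value:

    movq mm0,[esi]     ; MMX code starts here
    paddb mm0,[edi]
    movq [edi],mm0
    emms               ; release the FPU registers
    fld dword [ebx]    ; FPU instructions can be used again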


2.1.15  SSE instructions

The SSE extension adds more MMX instructions and also introduces
operations on packed single precision floating point values. The 128-bit
packed single precision format consists of four single precision floating
point values. The 128-bit SSE registers are designed for the purpose of
operations on this data type.
  "movaps" and "movups" transfer a double quad word operand containing packed
single precision values from the source operand to the destination operand. At
least one of the operands has to be an SSE register, the second one can also
be an SSE register or a 128-bit memory location. Memory operands for the
"movaps" instruction must be aligned on a 16-byte boundary, operands for the
"movups" instruction don't have to be aligned.

    movups xmm0,[ebx]  ; move unaligned double quad word
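
  Whether "movaps" can be used depends on how the data was placed in memory.
A sketch, assuming a hypothetical label "vector" that is put on a 16-byte
boundary with the "align" directive:

    movaps xmm0,[vector]   ; allowed, the address is a multiple of 16
    movups xmm1,[vector+4] ; unaligned address, "movups" is required

    ; data, to be placed outside of the executed path
    align 16
    vector:
      dd 1.0,2.0,3.0,4.0,5.0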

  "movlps" moves two packed single precision values between memory and the
low quad word of an SSE register. "movhps" moves two packed single precision
values between memory and the high quad word of an SSE register. One of the
operands must be an SSE register, and the other operand must be a 64-bit
memory location.

    movlps xmm0,[ebx]  ; move memory to low quad word of xmm0
    movhps [esi],xmm7  ; move high quad word of xmm7 to memory

  "movlhps" moves two packed single precision values from the low quad word
of the source register to the high quad word of the destination register.
"movhlps" moves two packed single precision values from the high quad word of
the source register to the low quad word of the destination register. Both
operands have to be SSE registers.
  "movmskps" transfers the most significant bit of each of the four single
precision values in the SSE register into the low four bits of a general
register. The source operand must be an SSE register, the destination operand
must be a general register.
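  For example, the resulting mask can be tested to check whether any of the
four values has its sign bit set (the label "no_sign_bits" is hypothetical):

    movmskps eax,xmm0  ; eax = sign bits of the four values in xmm0
    test eax,eax
    jz no_sign_bits    ; taken when none of the sign bits is set
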
  "movss" transfers a single precision value between the source and
destination operands (only the low double word is transferred). At least one
of the operands has to be an SSE register, the second one can also be an SSE
register or a 32-bit memory location.

    movss [edi],xmm3   ; move low double word of xmm3 to memory

  Each of the SSE arithmetic operations has two variants. When the mnemonic
ends with "ps", the source operand can be a 128-bit memory location or an SSE
register, the destination operand must be an SSE register and the operation is
performed on four packed single precision values, for each pair of the
corresponding data elements separately, and the result is stored in the
destination register. When the mnemonic ends with "ss", the source operand
can be a 32-bit memory location or an SSE register, the destination operand
must be an SSE register and the operation is performed on single precision
values. Only the low double words of the SSE registers are used in this case,
and the result is stored in the low double word of the destination register.
"addps" and "addss" add the values, "subps" and "subss" subtract the source
value from the destination value, "mulps" and "mulss" multiply the values,
"divps" and "divss" divide the destination value by the source value, "rcpps"
and "rcpss" compute the approximate reciprocal of the source value, "sqrtps"
and "sqrtss" compute the square root of the source value, "rsqrtps" and
"rsqrtss" compute the approximate reciprocal of the square root of the source
value, "maxps" and "maxss" compare the source and destination values and
return the greater one, "minps" and "minss" compare the source and
destination values and return the lesser one.

    mulss xmm0,[ebx]   ; multiply single precision values
    addps xmm3,xmm7    ; add packed single precision values
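
  Because the result always goes to the destination register, these
instructions can be chained. A sketch that clamps four values to a range,
assuming that XMM1 holds the lower bounds and XMM2 holds the upper bounds:

    maxps xmm0,xmm1    ; raise values that are below the lower bound
    minps xmm0,xmm2    ; lower values that are above the upper bound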

  "andps", "andnps", "orps" and "xorps" perform the logical operations on
packed single precision values. The source operand can be a 128-bit memory
location or an SSE register, the destination operand must be an SSE register.
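  Since these instructions work on plain bits, they can be used to clear a
register or to manipulate the sign bits of floating point values. A sketch,
assuming a hypothetical 16-byte aligned label "abs_mask":

    xorps xmm1,xmm1        ; set all bits of xmm1 to zero
    andps xmm0,[abs_mask]  ; clear the sign bits, giving absolute values

    ; data, to be placed outside of the executed path
    align 16
    abs_mask:
      dd 7FFFFFFFh,7FFFFFFFh,7FFFFFFFh,7FFFFFFFh
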
  "cmpps" compares packed single precision values and returns a mask result
into the destination operand, which must be an SSE register. The source
operand can be a 128-bit memory location or an SSE register, the third operand
must be an immediate operand selecting the code of one of the eight compare
conditions (table 2.3). "cmpss" performs the same operation on single
precision values, only the low double word of the destination register is
affected, and in this case the source operand can be a 32-bit memory location
or an SSE register. These two instructions also have variants with only two
operands and the condition encoded within the mnemonic. Their mnemonics are
obtained by attaching the mnemonic from table 2.3 to the "cmp" mnemonic and
then attaching "ps" or "ss" at the end.

    cmpps xmm2,xmm4,0  ; compare packed single precision values
    cmpltss xmm0,[ebx] ; compare single precision values

   Table 2.3  SSE conditions
  /-------------------------------------------\
  | Code | Mnemonic | Description             |
  |======|==========|=========================|
  | 0    | eq       | equal                   |
  | 1    | lt       | less than               |
  | 2    | le       | less than or equal      |
  | 3    | unord    | unordered               |
  | 4    | neq      | not equal               |
  | 5    | nlt      | not less than           |
  | 6    | nle      | not less than nor equal |
  | 7    | ord      | ordered                 |
  \-------------------------------------------/
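
  For example, both of the following lines select the condition with code 1
from this table and should assemble to the same instruction:

    cmpps xmm2,xmm4,1  ; three-operand form with the condition code
    cmpltps xmm2,xmm4  ; condition encoded in the mnemonic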

  "comiss" and "ucomiss" compare the single precision values and set the ZF,
PF and CF flags to show the result. The destination operand must be an SSE
register, the source operand can be a 32-bit memory location or an SSE
register.
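  The flags are set in such a way that the conditional jumps for unsigned
comparisons can be used afterwards, and PF is set when the values are
unordered. A sketch with hypothetical labels:

    comiss xmm0,[ebx]  ; compare low value of xmm0 with value in memory
    jp unordered       ; at least one of the values is a NaN
    ja greater         ; value in xmm0 is greater
    je equal           ; values are equal
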
  "shufps" moves any two of the four single precision values from the
destination operand into the low quad word of the destination operand, and any
two of the four values from the source operand into the high quad word of the
destination operand. The destination operand must be an SSE register, the
source operand can be a 128-bit memory location or an SSE register, the third
operand must be an 8-bit immediate value selecting which values will be moved
into the destination operand. Bits 0 and 1 select the value to be moved from
the destination operand to the low double word of the result, bits 2 and 3
select the value to be moved from the destination operand to the second double
word, bits 4 and 5 select the value to be moved from the source operand to the
third double word, and bits 6 and 7 select the value to be moved from the
source operand to the high double word of the result.

    shufps xmm0,xmm0,10010011b ; shuffle double words
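
  In the example above the immediate value 10010011b splits into the fields
10, 01, 00 and 11, so counting from the low double word the result contains
the values number 3, 0, 1 and 2 of the original register, which is a rotation
by one position. Some other selections, given as a sketch:

    shufps xmm0,xmm0,0         ; copy the lowest value into all four places
    shufps xmm0,xmm0,00011011b ; reverse the order of the four values
    shufps xmm0,xmm1,11100100b ; low quad word of xmm0, high quad word of xmm1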

  "unpckhps" performs an interleaved unpack of the values from the high parts
of the source and destination operands and stores the result in the
destination operand, which must be an SSE register. The source operand can be
a 128-bit memory location or an SSE register. "unpcklps" performs an
interleaved unpack of the values from the low parts of the source and
destination operands and stores the result in the destination operand, the
rules for operands are the same.
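  For instance, if XMM0 holds the values a0,a1,a2,a3 and XMM1 holds the
values b0,b1,b2,b3 (counting from the low double word), then:

    unpcklps xmm0,xmm1 ; xmm0 = a0,b0,a1,b1 ("unpckhps" gives a2,b2,a3,b3)
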
  "cvtpi2ps" converts two packed double word integers into two packed single
precision floating point values and stores the result in the low quad word of
the destination operand, which should be an SSE register. The source operand
can be a 64-bit memory location or an MMX register.

    cvtpi2ps xmm0,mm0  ; convert integers to single precision values

  "cvtsi2ss" converts a double word integer into a single precision floating
point value and stores the result in the low double word of the destination
operand, which should be an SSE register. The source operand can be a 32-bit
memory location or a 32-bit general register.

    cvtsi2ss xmm0,eax  ; convert integer to single precision value

  "cvtps2pi" converts two packed single precision floating point values into
two packed double word integers and stores the result in the destination
operand, which should be an MMX register. The source operand can be a 64-bit
memory location or an SSE register, only the low quad word of the SSE register
is used. "cvttps2pi" performs a similar operation, except that truncation is
used to round the source values to integers, the rules for the operands are
the same.

    cvtps2pi mm0,xmm0  ; convert single precision values to integers

  "cvtss2si" converts a single precision floating point value into a double
word integer and stores the result in the destination operand, which should be
a 32-bit general register. The source operand can be a 32-bit memory location
or an SSE register, only the low double word of the SSE register is used.
"cvttss2si" performs a similar operation, except that truncation is used to
round the source value to an integer, the rules for the operands are the same.

    cvtss2si eax,xmm0  ; convert single precision value to integer
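
  The difference between the two variants shows when the source value is not
an integer. A sketch, assuming that the MXCSR register still selects the
default rounding to the nearest value and that "value" is a hypothetical
label:

    cvtss2si eax,[value]   ; eax = 3 (rounded according to MXCSR)
    cvttss2si edx,[value]  ; edx = 2 (fraction is truncated)

    ; data, to be placed outside of the executed path
    value:
      dd 2.75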

  "pextrw" copies the word in the source operand specified by the third
operand to the destination operand. The source operand must be an MMX
register, the destination operand must be a 32-bit general register (the high
word of the destination is cleared), the third operand must be an 8-bit
immediate value.

    pextrw eax,mm0,1   ; extract word into eax

  "pinsrw" inserts a word from the source operand into the destination operand
at the location specified with the third operand, which must be an 8-bit
immediate value. The destination operand must be an MMX register, the source
operand can be a 16-bit memory location or a 32-bit general register (only the
low word of the register is used).

    pinsrw mm1,ebx,2   ; insert word from ebx

  "pavgb" and "pavgw" compute the average of packed bytes or words. "pmaxub"
returns the maximum values of packed unsigned bytes, "pminub" returns the
minimum values of packed unsigned bytes, "pmaxsw" returns the maximum values
of packed signed words, "pminsw" returns the minimum values of packed signed
words. "pmulhuw" performs an unsigned multiplication of the packed words and
stores the high words of the results in the destination operand. "psadbw"
computes the absolute differences of packed unsigned bytes, sums the
differences, and stores the sum in the low word of the destination operand.
All these instructions follow the same rules for operands as the general MMX
operations described in the previous section.
  "pmovmskb" creates a mask made of the most significant bit of each byte in
the source operand and stores the result in the low byte of the destination
operand. The source operand must be an MMX register, the destination operand
must be a 32-bit general register.
  "pshufw" inserts words from the source operand into the destination operand
from the locations specified with the third operand. The destination operand
must be an MMX register, the source operand can be a 64-bit memory location or
an MMX register, the third operand must be an 8-bit immediate value selecting
which values will be moved into the destination operand, in a similar way to
the third operand of the "shufps" instruction.
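  Unlike "shufps", all four selected words come from the source operand. A
sketch, with the words of the source numbered from 0 (low) to 3 (high):

    pshufw mm0,mm1,00011011b ; mm0 = words of mm1 in reversed order
    pshufw mm2,mm2,0         ; copy the low word of mm2 into all four words
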
  "movntq" moves the quad word from the source operand to memory using a
non-temporal hint to minimize cache pollution. The source operand should be an
MMX register, the destination operand should be a 64-bit memory location.
"movntps" stores packed single precision values from the SSE register to
memory using a non-temporal hint. The source operand should be an SSE
register, the destination operand should be a 128-bit memory location.
"maskmovq" stores selected bytes from the first operand into a 64-bit memory
location using a non-temporal hint. Both operands should be MMX registers, the
second operand selects which bytes from the first operand are written to
memory. The memory location is pointed to by the DI (or EDI) register in the
segment selected by DS.
  "prefetcht0", "prefetcht1", "prefetcht2" and "prefetchnta" fetch the line
of data from memory that contains the byte specified with the operand to a
specified location in the cache hierarchy. The operand should be an 8-bit
memory location.
  "sfence" performs a serializing operation on all instructions storing to
memory that were issued prior to it. This instruction has no operands.
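  A typical use is to place it after a series of non-temporal stores, so that
they are ordered before any store that follows. A sketch, assuming that EDI
points to a 16-byte aligned buffer:

    movntps [edi],xmm0     ; non-temporal stores
    movntps [edi+16],xmm1
    sfence                 ; stores above become visible before later ones
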
  "ldmxcsr" loads the 32-bit memory operand into the MXCSR register. "stmxcsr"
stores the contents of MXCSR into a 32-bit memory operand.
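  These two instructions are usually used together to change a single field
of MXCSR. A sketch that sets the flush-to-zero flag (bit 15), assuming a
hypothetical double word variable at the label "mxcsr_copy":

    stmxcsr [mxcsr_copy]         ; store current MXCSR
    or dword [mxcsr_copy],8000h  ; set bit 15
    ldmxcsr [mxcsr_copy]         ; load modified value

    ; data, to be placed outside of the executed path
    mxcsr_copy:
      dd ?
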
  "fxsave" saves the current state of the FPU, MXCSR register, and all the FPU
and SSE registers to a 512-byte memory location specified in the destination
operand. "fxrstor" reloads the data previously stored with the "fxsave"
instruction from the specified 512-byte memory location. The memory operand
for both these instructions must be aligned on a 16-byte boundary and should
be declared with no size specified.
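  For illustration only, assuming EBX points to a 512-byte buffer aligned on
a 16-byte boundary:

    fxsave [ebx]            ; save the FPU, MXCSR and SSE state
    fxrstor [ebx]           ; restore the state saved above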
 
 
2.1.16  SSE2 instructions
 
The SSE2 extension introduces operations on packed double precision
floating point values, extends the syntax of MMX instructions, and also adds
some new instructions.
  "movapd" and "movupd" transfer a double quad word operand containing packed
double precision values from source operand to destination operand. These
instructions are analogous to "movaps" and "movups" and have the same rules
for operands.
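  A short sketch, assuming EBX holds an address aligned on a 16-byte boundary
and EDI an arbitrary one:

    movapd xmm0,[ebx]       ; load two double precision values (aligned)
    movupd [edi],xmm0       ; store them, no alignment required
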
  "movlpd" moves double precision value between the memory and the low quad
word of an SSE register. "movhpd" moves double precision value between the
memory and the high quad word of an SSE register. These instructions are
analogous to "movlps" and "movhps" and have the same rules for operands.
  "movmskpd" transfers the most significant bit of each of the two double
precision values in the SSE register into the low two bits of a general
register. This instruction is analogous to "movmskps" and has the same rules
for operands.
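  For example (the choice of registers is arbitrary):

    movmskpd eax,xmm0       ; bit 0 = sign of low value, bit 1 = sign of high
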
  "movsd" transfers a double precision value between source and destination
operand (only the low quad word is transferred). At least one of the operands
has to be an SSE register, the second one can also be an SSE register or 64-bit
memory location.
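  The three allowed combinations can be sketched as follows, with ESI and EDI
assumed to point to valid 64-bit locations:

    movsd xmm0,[esi]        ; load low quad word from memory
    movsd xmm1,xmm0         ; copy low quad word between SSE registers
    movsd [edi],xmm1        ; store low quad word into memory
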
  Arithmetic operations on double precision values are: "addpd", "addsd",
"subpd", "subsd", "mulpd", "mulsd", "divpd", "divsd", "sqrtpd", "sqrtsd",
"maxpd", "maxsd", "minpd", "minsd", and they are analogous to arithmetic
operations on single precision values described in previous section. When the
mnemonic ends with "pd" instead of "ps", the operation is performed on packed
two double precision values, but rules for operands are the same. When the
mnemonic ends with "sd" instead of "ss", the source operand can be a 64-bit
memory location or an SSE register, the destination operand must be an SSE
register and the operation is performed on double precision values, only low
quad words of SSE registers are used in this case.
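  A few representative forms, given as a sketch with EBX assumed to point to
a double precision value in memory:

    mulpd xmm0,xmm1         ; multiply two pairs of double precision values
    addsd xmm0,[ebx]        ; add value from memory to low quad word of xmm0
    sqrtsd xmm2,xmm0        ; square root of the low value of xmm0
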
  "andpd", "andnpd", "orpd" and "xorpd" perform the logical operations on
packed double precision values. They are analogous to SSE logical operations
on single precision values and have the same rules for operands.
  "cmppd" compares packed double precision values and returns a mask result
into the destination operand. This instruction is analogous to "cmpps" and
has the same rules for operands. "cmpsd" performs the same operation on
double precision values, only the low quad word of the destination register
is affected, in this case the source operand can be a 64-bit memory location
or an SSE register. Variants with only two operands are obtained by attaching
the condition mnemonic from table 2.3 to the "cmp" mnemonic and then
attaching the "pd" or "sd" at the end.
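  Both syntaxes are sketched below; the value 0 is assumed to select the "eq"
condition and "lt" the less-than condition from table 2.3, and EBX is assumed
to point to a double precision value:

    cmppd xmm2,xmm4,0       ; compare packed double precision values
    cmpltsd xmm0,[ebx]      ; compare low double precision values
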
  "comisd" and "ucomisd" compare the double precision values and set the ZF,
PF and CF flags to show the result. The destination operand must be an SSE
register, the source operand can be a 64-bit memory location or SSE register.
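  The flags can then be tested with conditional jumps, as in the sketch below;
the labels "unordered" and "below" are hypothetical and would have to be
defined elsewhere in the program:

    comisd xmm0,xmm1        ; compare low values, set ZF, PF and CF
    jp unordered            ; PF is set when the values cannot be ordered
    jb below                ; CF is set when xmm0 is less than xmm1
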
  "shufpd" moves any of the two double precision values from the destination
operand into the low quad word of the destination operand, and any of the two
values from the source operand into the high quad word of the destination
operand. This instruction is analogous to "shufps" and has the same rules for
operands. Bit 0 of the third operand selects the value to be moved from the
destination operand, bit 1 selects the value to be moved from the source
operand, the rest of bits are reserved and must be zeroed.
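  Two selector values are shown here as a sketch of how the bits are used:

    shufpd xmm0,xmm0,1      ; swap the two values in xmm0
    shufpd xmm0,xmm1,2      ; keep low value of xmm0, take high value of xmm1
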
  "unpckhpd" performs an unpack of the high quad words from the source and
destination operands, "unpcklpd" performs an unpack of the low quad words from
the source and destination operands. They are analogous to "unpckhps" and
"unpcklps", and have the same rules for operands.
  "cvtps2pd" converts the packed two single precision floating point values to
two packed double precision floating point values, the destination operand
must be an SSE register, the source operand can be a 64-bit memory location or
SSE register. "cvtpd2ps" converts the packed two double precision floating
point values to packed two single precision floating point values, the
destination operand must be an SSE register, the source operand can be a
128-bit memory location or SSE register. "cvtss2sd" converts the single
precision floating point value to double precision floating point value, the
destination operand must be an SSE register, the source operand can be a 32-bit
memory location or SSE register. "cvtsd2ss" converts the double precision
floating point value to single precision floating point value, the destination
operand must be an SSE register, the source operand can be a 64-bit memory
location or SSE register.
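  The differing sizes of memory operands can be seen in this sketch, where EBX
and ESI are assumed to point to single precision data:

    cvtss2sd xmm0,[ebx]     ; 32-bit source, result in low quad word of xmm0
    cvtsd2ss xmm1,xmm0      ; result in low double word of xmm1
    cvtps2pd xmm2,[esi]     ; 64-bit source, two double precision results
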
  "cvtpi2pd" converts packed two double word integers into the packed
double precision floating point values, the destination operand must be an SSE
register, the source operand can be a 64-bit memory location or MMX register.
"cvtsi2sd" converts a double word integer into a double precision floating
point value, the destination operand must be an SSE register, the source
operand can be a 32-bit memory location or 32-bit general register. "cvtpd2pi"
converts packed double precision floating point values into packed two double
word integers, the destination operand should be an MMX register, the source
operand can be a 128-bit memory location or SSE register. "cvttpd2pi" performs
a similar operation, except that truncation is used to round the source values
to integers, rules for operands are the same. "cvtsd2si" converts a double
precision floating point value into a double word integer, the destination
operand should be a 32-bit general register, the source operand can be a
64-bit memory location or SSE register. "cvttsd2si" performs a similar
operation, except that truncation is used to round the source value to integer,
rules for operands are the same.
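  The difference between the two ways of rounding is sketched below (the
registers are chosen arbitrarily; the first conversion back is assumed to
follow the rounding mode currently set in MXCSR):

    cvtsi2sd xmm0,eax       ; double word integer into double precision
    cvtsd2si ecx,xmm0       ; back to integer, current rounding mode
    cvttsd2si edx,xmm0      ; back to integer, truncated
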
  "cvtps2dq" and "cvttps2dq" convert packed single precision floating point
values to packed four double word integers, storing them in the destination
operand. "cvtpd2dq" and "cvttpd2dq" convert packed double precision floating
point values to packed two double word integers, storing the result in the low
quad word of the destination operand. "cvtdq2ps" converts packed four
double word integers to packed single precision floating point values.
For all these instructions the destination operand must be an SSE register,
the source operand can be a 128-bit memory location or an SSE register.
"cvtdq2pd" converts packed two double word integers from the source operand to
packed double precision floating point values, the source can be a 64-bit
memory location or an SSE register, the destination has to be an SSE register.
  "movdqa" and "movdqu" transfer a double quad word operand containing packed
integers from source operand to destination operand. At least one of the
operands has to be an SSE register, the second one can also be an SSE register
or 128-bit memory location. Memory operands for "movdqa" instruction must be
aligned on boundary of 16 bytes, operands for "movdqu" instruction don't have
to be aligned.
  "movq2dq" moves the contents of the MMX source register to the low quad word
of destination SSE register. "movdq2q" moves the low quad word from the source
SSE register to the destination MMX register.

    movq2dq xmm0,mm1   ; move from MMX register to SSE register
    movdq2q mm0,xmm1   ; move from SSE register to MMX register
 
  All MMX instructions operating on the 64-bit packed integers (those with
mnemonics starting with "p") are extended to operate on 128-bit packed
integers located in SSE registers. Additional syntax for these instructions
needs an SSE register where an MMX register was needed, and the 128-bit memory
location or SSE register where 64-bit memory location or MMX register were
needed. The exception is "pshufw" instruction, which doesn't allow extended
syntax, but has two new variants: "pshufhw" and "pshuflw", which allow only
the extended syntax, and perform the same operation as "pshufw" on the high
or low quad words of operands respectively. Also the new instruction "pshufd"
is introduced, which performs the same operation as "pshufw", but on the
double words instead of words, it allows only the extended syntax.

    psubb xmm0,[esi]   ; subtract 16 packed bytes
    pextrw eax,xmm0,7  ; extract highest word into eax
 
  "paddq" performs the addition of packed quad words, "psubq" performs the
subtraction of packed quad words, "pmuludq" performs an unsigned
multiplication of low double words from each corresponding quad words and
returns the results in packed quad words. These instructions follow the same
rules for operands as the general MMX operations described in 2.1.14.
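  For instance, with SSE registers (the choice of registers is arbitrary):

    pmuludq xmm0,xmm1       ; two unsigned products, 64 bits each
    paddq xmm2,xmm0         ; add them to the two quad words of xmm2
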
  "pslldq" and "psrldq" perform logical shift left or right of the double
quad word in the destination operand by the amount of bytes specified in the
source operand. The destination operand should be an SSE register, source
operand should be an 8-bit immediate value.
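  Since the shift count is given in bytes, a sketch of typical use could be:

    pslldq xmm0,4           ; shift xmm0 left by 4 bytes (32 bits)
    psrldq xmm1,8           ; move high quad word of xmm1 into low quad word
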
  "punpckhqdq" interleaves the high quad word of the source operand and the
high quad word of the destination operand and writes them to the destination
SSE register. "punpcklqdq" interleaves the low quad word of the source operand
and the low quad word of the destination operand and writes them to the
destination SSE register. The source operand can be a 128-bit memory location
or SSE register.
  "movntdq" stores packed integer data from the SSE register to memory using
non-temporal hint. The source operand should be an SSE register, the
destination operand should be a 128-bit memory location. "movntpd" stores
packed double precision values from the SSE register to memory using a
non-temporal hint. Rules for operands are the same. "movnti" stores integer
from a general register to memory using a non-temporal hint. The source
operand should be a 32-bit general register, the destination operand should
be a 32-bit memory location. "maskmovdqu" stores selected bytes from the first
operand into a 128-bit memory location using a non-temporal hint. Both
operands should be SSE registers, the second operand selects which bytes from
the source operand are written to memory. The memory location is pointed to by
DI (or EDI) register in the segment selected by DS and does not need to be
aligned.
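  A possible use is sketched below, assuming EDI points to a buffer aligned on
a 16-byte boundary; the closing "sfence" (see previous section) is a common
practice to order the non-temporal stores, not a requirement of the syntax:

    movntdq [edi],xmm0      ; store 16 bytes with non-temporal hint
    movnti [edi+16],eax     ; store double word with non-temporal hint
    sfence                  ; order the stores above before later ones
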
  "clflush" writes and invalidates the cache line associated with the address
of the byte specified with the operand, which should be an 8-bit memory
location.
  "lfence" performs a serializing operation on all instructions loading from
memory that were issued prior to it. "mfence" performs a serializing operation
on all instructions accessing memory that were issued prior to it, and so it
combines the functions of "sfence" (described in previous section) and
"lfence" instructions. These instructions have no operands.
 
 
2.1.17  SSE3 instructions
 
Prescott technology introduced some new instructions to improve the performance
of SSE and SSE2 - this extension is called SSE3.
  "fisttp" behaves like the "fistp" instruction and accepts the same operands,
the only difference is that it always uses truncation, irrespective of the
rounding mode.
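  For example, assuming EBX points to a double word in memory:

    fisttp dword [ebx]      ; store st0 as truncated integer and pop
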
  "movshdup" loads into destination operand the 128-bit value obtained from
the source value of the same size by filling each quad word with the two
duplicates of the value in its high double word. "movsldup" performs the same
action, except it duplicates the values of low double words. The destination
operand should be SSE register, the source operand can be SSE register or
128-bit memory location.
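  As an illustration, if XMM1 holds the double words x0, x1, x2, x3 (listed
from the lowest one; the names are used only for this example):

    movshdup xmm0,xmm1      ; xmm0 = x1, x1, x3, x3
    movsldup xmm0,xmm1      ; xmm0 = x0, x0, x2, x2
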
  "movddup" loads the 64-bit source value and duplicates it into high and low
quad word of the destination operand. The destination operand should be SSE
register, the source operand can be SSE register or 64-bit memory location.
  "lddqu" is functionally equivalent to "movdqu" with memory as source
operand, but it may improve performance when the source operand crosses a
cacheline boundary. The destination operand has to be an SSE register, the
source operand must be a 128-bit memory location.
  "addsubps" performs single precision addition of second and fourth pairs and
single precision subtraction of the first and third pairs of floating point
values in the operands. "addsubpd" performs double precision addition of the
second pair and double precision subtraction of the first pair of floating
point values in the operands. "haddps" performs the addition of two single
precision values within each quad word of source and destination operands,
and stores the results of such horizontal addition of values from destination
operand into low quad word of destination operand, and the results from the
source operand into high quad word of destination operand. "haddpd" performs
the addition of two double precision values within each operand, and stores
the result from destination operand into low quad word of destination operand,
and the result from source operand into high quad word of destination operand.
All these instructions need the destination operand to be SSE register, source
operand can be SSE register or 128-bit memory location.
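  One use of the horizontal addition, given here only as a sketch, is summing
all four single precision values a0, a1, a2, a3 held in one register (the
names are hypothetical):

    haddps xmm0,xmm0        ; xmm0 = a0+a1, a2+a3, a0+a1, a2+a3
    haddps xmm0,xmm0        ; every double word now holds a0+a1+a2+a3
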
  "monitor" sets up an address range for monitoring of write-back stores. It
needs its three operands to be EAX, ECX and EDX registers in that order.
"mwait" waits for a write-back store to the address range set up by the
"monitor" instruction. It uses two operands with additional parameters, first
being the EAX and second the ECX register.
  The functionality of SSE3 is further extended by the set of Supplemental
SSE3 instructions (SSSE3). They generally follow the same rules for operands
as all the MMX operations extended by SSE.
  "phaddw" and "phaddd" perform the horizontal addition of the pairs of
adjacent values from both the source and destination operand, and store the
sums into the destination (sums from the destination operand go into lower
part of destination register, sums from the source operand into higher part).
They operate on 16-bit or 32-bit chunks, respectively. "phaddsw" performs the
same operation on signed 16-bit packed values, but the result of each addition
is saturated. "phsubw" and "phsubd" analogously perform the horizontal
subtraction of 16-bit or 32-bit packed values, and "phsubsw" performs the
horizontal subtraction of signed 16-bit packed values with saturation.
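  As a sketch, applying the instruction twice with the same register as both
operands sums the four double words a0, a1, a2, a3 of that register (the names
are hypothetical):

    phaddd xmm0,xmm0        ; xmm0 = a0+a1, a2+a3, a0+a1, a2+a3
    phaddd xmm0,xmm0        ; every double word now holds a0+a1+a2+a3
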
  "pabsb", "pabsw" and "pabsd" calculate the absolute value of each packed
signed value in the source operand and store them into the destination
register. They operate on 8-bit, 16-bit and 32-bit elements respectively.
  "pmaddubsw" multiplies signed 8-bit values from the source operand with the
corresponding unsigned 8-bit values from the destination operand to produce
intermediate 16-bit values, and every adjacent pair of those intermediate
values is then added horizontally and those 16-bit sums are stored into the
destination operand.
  "pmulhrsw" multiplies corresponding 16-bit integers from the source and
destination operand to produce intermediate 32-bit values, and the 16 bits
next to the highest bit of each of those values are then rounded and packed
into the destination operand.
  "pshufb" shuffles the bytes in the destination operand according to the
mask provided by source operand - each of the bytes in source operand is
an index of the byte in the destination operand that is moved into the
corresponding position of the result.
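  A sketch of a byte reversal follows; the label "reverse" and its data are
hypothetical, and the mask is assumed to be placed on a 16-byte boundary:

    pshufb xmm0,dqword [reverse]    ; reverse the order of bytes in xmm0

    align 16
    reverse db 15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0
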
  "psignb", "psignw" and "psignd" perform the operation on 8-bit, 16-bit or
32-bit integers in destination operand, depending on the signs of the values
in the source. If the value in source is negative, the corresponding value in
the destination register is negated, if the value in source is positive, no
operation is performed on the corresponding value, and if the value in source
is zero, the value in destination is zeroed, too.
  "palignr" appends the source operand to the destination operand to form the
intermediate value of twice the size, and then extracts into the destination
register the 64 or 128 bits that are right-aligned to the byte offset
specified by the third operand, which should be an 8-bit immediate value. This
is the only SSSE3 instruction that takes three arguments.
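  For example, with the destination assumed to form the higher half of the
intermediate value (bytes are counted from the lowest one):

    palignr xmm0,xmm1,4     ; xmm0 = bytes 4-15 of xmm1, then bytes 0-3 of xmm0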
 
 
2.1.18  AMD 3DNow! instructions
2035
 
2100
 
2036
The 3DNow! extension adds a new MMX instructions to those described in 2.1.14,
2101
The 3DNow! extension adds a new MMX instructions to those described in 2.1.14,
2037
and introduces operation on the 64-bit packed floating point values, each
2102
and introduces operation on the 64-bit packed floating point values, each
2038
consisting of two single precision floating point values.
2103
consisting of two single precision floating point values.
  These instructions follow the same rules as the general MMX operations, the
destination operand should be a MMX register, the source operand can be a MMX
register or 64-bit memory location. "pavgusb" computes the rounded averages
of packed unsigned bytes. "pmulhrw" performs a signed multiplication of the
packed words, rounds the high word of each double word result and stores them
in the destination operand. "pi2fd" converts packed double word integers into
packed floating point values. "pf2id" converts packed floating point values
into packed double word integers using truncation. "pi2fw" converts packed
word integers into packed floating point values, only low words of each
double word in source operand are used. "pf2iw" converts packed floating
point values to packed word integers, results are extended to double words
using the sign extension. "pfadd" adds packed floating point values. "pfsub"
and "pfsubr" subtract packed floating point values, the first one subtracts
source values from destination values, the second one subtracts destination
values from the source values. "pfmul" multiplies packed floating point
values. "pfacc" adds the low and high floating point values of the destination
operand, storing the result in the low double word of destination, and adds
the low and high floating point values of the source operand, storing the
result in the high double word of destination. "pfnacc" subtracts the high
floating point value of the destination operand from the low, storing the
result in the low double word of destination, and subtracts the high floating
point value of the source operand from the low, storing the result in the high
double word of destination. "pfpnacc" subtracts the high floating point value
of the destination operand from the low, storing the result in the low double
word of destination, and adds the low and high floating point values of the
source operand, storing the result in the high double word of destination.
"pfmax" and "pfmin" compute the maximum and minimum of floating point values.
"pswapd" reverses the high and low double words of the source operand. "pfrcp"
returns an estimate of the reciprocals of floating point values from the
source operand, "pfrsqrt" returns an estimate of the reciprocal square
roots of floating point values from the source operand, "pfrcpit1" performs
the first step in the Newton-Raphson iteration to refine the reciprocal
approximation produced by "pfrcp" instruction, "pfrsqit1" performs the first
step in the Newton-Raphson iteration to refine the reciprocal square root
approximation produced by "pfrsqrt" instruction, "pfrcpit2" performs the
second and final step in the Newton-Raphson iteration to refine the reciprocal
approximation or the reciprocal square root approximation. "pfcmpeq",
"pfcmpge" and "pfcmpgt" compare the packed floating point values and set
all bits or zero all bits of the corresponding data element in the
destination operand according to the result of comparison, first checks
whether values are equal, second checks whether destination value is greater
or equal to source value, third checks whether destination value is greater
than source value.
  "prefetch" and "prefetchw" load the line of data from memory that contains
the byte specified with the operand into the data cache. The "prefetchw"
instruction should be used when the data in the cache line is expected to be
modified, otherwise the "prefetch" instruction should be used. The operand
should be an 8-bit memory location.
  "femms" performs a fast clear of MMX state. This instruction has no
operands.
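  The reciprocal refinement described above could be combined into a sequence
like the following sketch. It assumes that the low double word of MM0 already
holds the single precision value, and the choice of registers is arbitrary:

    pfrcp mm1,mm0     ; estimate of the reciprocal
    pfrcpit1 mm0,mm1  ; first Newton-Raphson step
    pfrcpit2 mm0,mm1  ; second and final step, refined result in mm0
    femms             ; fast clear of MMX state after the calculations

When the initial estimate is precise enough, the two refinement steps can be
left out.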


2.1.19  The x86-64 long mode instructions

The AMD64 and EM64T architectures (we will use the common name x86-64 for them
both) extend the x86 instruction set for 64-bit processing. While legacy
and compatibility modes use the same set of registers and instructions, the
new long mode extends the x86 operations to 64 bits and introduces several new
registers. You can enable generating the code for this mode with the "use64"
directive.
  Each of the general purpose registers is extended to 64 bits, and eight
entirely new general purpose registers and eight new SSE registers are added.
See table 2.4 for the summary of new registers (only the ones that were not
listed in table 1.2). The general purpose registers of smaller sizes are the
low order portions of the larger ones. You can still access the "ah", "bh",
"ch" and "dh" registers in long mode, but you cannot use them in the same
instruction with any of the new registers.

   Table 2.4  New registers in long mode
  /--------------------------------------------------\
  | Type |          General          |  SSE  |  AVX  |
  |------|---------------------------|-------|-------|
  | Bits |  8   |  16  |  32  |  64  |  128  |  256  |
  |======|======|======|======|======|=======|=======|
  |      |      |      |      | rax  |       |       |
  |      |      |      |      | rcx  |       |       |
  |      |      |      |      | rdx  |       |       |
  |      |      |      |      | rbx  |       |       |
  |      | spl  |      |      | rsp  |       |       |
  |      | bpl  |      |      | rbp  |       |       |
  |      | sil  |      |      | rsi  |       |       |
  |      | dil  |      |      | rdi  |       |       |
  |      | r8b  | r8w  | r8d  | r8   | xmm8  | ymm8  |
  |      | r9b  | r9w  | r9d  | r9   | xmm9  | ymm9  |
  |      | r10b | r10w | r10d | r10  | xmm10 | ymm10 |
  |      | r11b | r11w | r11d | r11  | xmm11 | ymm11 |
  |      | r12b | r12w | r12d | r12  | xmm12 | ymm12 |
  |      | r13b | r13w | r13d | r13  | xmm13 | ymm13 |
  |      | r14b | r14w | r14d | r14  | xmm14 | ymm14 |
  |      | r15b | r15w | r15d | r15  | xmm15 | ymm15 |
  \--------------------------------------------------/
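
  As a small sketch of the register rules (the particular registers are
chosen arbitrarily for this illustration), the following lines should
assemble after the "use64" directive, while the commented one should be
rejected:

    use64
    mov r8b,al      ; low byte of one of the new registers
    mov sil,r15b    ; "sil" is one of the new 8-bit registers
    mov ah,bl       ; "ah" together with a legacy register
    ; mov ah,r8b    ; not allowed, "ah" mixed with a new register

The same restriction applies to "bh", "ch" and "dh".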

  In general any instruction from x86 architecture which allowed 16-bit or
32-bit operand sizes, in long mode also allows the 64-bit operands. The 64-bit
registers should be used for addressing in long mode, the 32-bit addressing
is also allowed, but it's not possible to use the addresses based on 16-bit
registers. Below are the samples of new operations possible in long mode on the
example of "mov" instruction:

    mov rax,r8   ; transfer 64-bit general register
    mov al,[rbx] ; transfer memory addressed by 64-bit register

The long mode also uses the instruction pointer based addresses, you can
specify them manually with the special RIP register symbol, but such addressing
is also automatically generated by flat assembler, since there is no 64-bit
absolute addressing in long mode. You can still force the assembler to use the
32-bit absolute addressing by putting the "dword" size override for address
inside the square brackets. There is also one exception, where the 64-bit
absolute addressing is possible, it's the "mov" instruction with one of the
operands being accumulator register, and second being the memory operand.
To force the assembler to use the 64-bit absolute addressing there, use the
"qword" size operator for address inside the square brackets. When no size
operator is applied to address, assembler generates the optimal form
automatically.

    mov [qword 0],rax  ; absolute 64-bit addressing
    mov [dword 0],r15d ; absolute 32-bit addressing
    mov [0],rsi        ; automatic RIP-relative addressing
    mov [rip+3],sil    ; manual RIP-relative addressing

  For 64-bit operations only the signed 32-bit values are possible as the
immediate operands, with the only exception being the "mov" instruction with
destination operand being 64-bit general purpose register. Trying to force the
64-bit immediate with any other instruction will cause an error.
  If any operation is performed on the 32-bit general registers in long mode,
the upper 32 bits of the 64-bit registers containing them are filled with
zeros. This is unlike the operations on 16-bit or 8-bit portions of those
registers, which preserve the upper bits.
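  The rules for immediate values and for partial registers can be sketched as
follows (the constants are arbitrary and serve only as an illustration):

    mov rax,1234567890ABCDEFh ; full 64-bit immediate, possible with "mov"
    add rax,-1                ; immediate has to fit in signed 32 bits
    mov ecx,-1                ; upper half zeroed, rcx = 00000000FFFFFFFFh
    mov cx,0                  ; upper bits kept, rcx = 00000000FFFF0000h

An instruction like "add rax,1234567890ABCDEFh" would cause an error, because
the value does not fit in signed 32 bits.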
  Three new type conversion instructions are available. The "cdqe" sign
extends the double word in EAX into quad word and stores the result in RAX
register. "cqo" sign extends the quad word in RAX into double quad word and
stores the extra bits in the RDX register. These instructions have no
operands. "movsxd" sign extends the double word source operand, being either
the 32-bit register or memory, into 64-bit destination operand, which has to
be a register. No analogous instruction is needed for the zero extension, since
it is done automatically by any operations on 32-bit registers, as noted in
the previous paragraph. And the "movzx" and "movsx" instructions, conforming to
the general rule, can be used with 64-bit destination operand, allowing
extension of byte or word values into quad words.
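  A short sketch of these conversions, with registers chosen arbitrarily for
the illustration:

    cdqe                   ; sign extend eax into rax
    cqo                    ; sign extend rax into rdx:rax
    movsxd rax,ecx         ; sign extend 32-bit register into rax
    movsxd rdx,dword [rbx] ; sign extend double word from memory
    movzx r8,bl            ; zero extend byte into quad word
    movsx r9,word [rsi]    ; sign extend word into quad word

For the zero extension of a double word a plain "mov" into 32-bit register is
enough, for example "mov eax,ecx" clears the upper half of RAX.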
  All the binary arithmetic and logical instructions have been promoted to
allow 64-bit operands in long mode. The use of decimal arithmetic instructions
in long mode is prohibited.
  The stack operations, like "push" and "pop", in long mode default to 64-bit
operands and it's not possible to use 32-bit operands with them. The "pusha"
and "popa" are disallowed in long mode.
  The indirect near jumps and calls in long mode default to 64-bit operands
and it's not possible to use the 32-bit operands with them. On the other hand,
the indirect far jumps and calls allow any operands that were allowed by the
x86 architecture and also 80-bit memory operand is allowed (though only EM64T
seems to implement such variant), with the first eight bytes defining the
offset and two last bytes specifying the selector. The direct far jumps and
calls are not allowed in long mode.
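  The stack operations and indirect near transfers might look like this in
long mode (a sketch only, the registers are chosen arbitrarily):

    push rax         ; 64-bit push
    pop r12          ; 64-bit pop
    call rbx         ; indirect near call through 64-bit register
    jmp qword [rsi]  ; indirect near jump through 64-bit memory operand

Forms with 32-bit operands, like "push eax", are expected to be rejected by
the assembler in this mode.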
  The I/O instructions, "in", "out", "ins" and "outs" are the exceptional
instructions that are not extended to accept quad word operands in long mode.
But all other string operations are, and there are new short forms "movsq",
"cmpsq", "scasq", "lodsq" and "stosq" introduced for the variants of string
operations for 64-bit string elements. The RSI and RDI registers are used by
default to address the string elements.
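  For example, a block of quad words could be copied with a sequence like the
one below. The "source" and "destination" labels are hypothetical and are
assumed to be defined elsewhere in the program:

    lea rsi,[source]       ; address of the first source element
    lea rdi,[destination]  ; address of the first destination element
    mov rcx,16             ; number of quad words to copy
    cld                    ; process the elements in ascending order
    rep movsq              ; copy rcx quad words from [rsi] to [rdi]

The count for "rep" is taken from the RCX register in this case.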
  The "lfs", "lgs" and "lss" instructions are extended to accept 80-bit source
memory operand with 64-bit destination register (though only EM64T seems to
implement such variant). The "lds" and "les" are disallowed in long mode.
  The system instructions like "lgdt", which required the 48-bit memory
operand, in long mode require the 80-bit memory operand.
  The "cmpxchg16b" is the 64-bit equivalent of "cmpxchg8b" instruction, it uses
the double quad word memory operand and 64-bit registers to perform the
analogous operation.
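  A sketch of such an operation, assuming that RSI points to a 16-byte aligned
double quad word, that RDX:RAX holds the expected value and RCX:RBX the new
one:

    lock cmpxchg16b [rsi]  ; if memory equals rdx:rax, store rcx:rbx there

When the values are equal, ZF is set and the new value is stored, otherwise
ZF is cleared and the memory contents are loaded into RDX:RAX.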
  The "fxsave64" and "fxrstor64" are new variants of "fxsave" and "fxrstor"
instructions, available only in long mode, which use a different format of
storage area in order to store some pointers in full 64-bit size.
  "swapgs" is the new instruction, which swaps the base address of GS segment
register with the contents of the KernelGSbase model-specific register (MSR
address 0C0000102h).
  "syscall" and "sysret" are the pair of new instructions that provide the
functionality similar to "sysenter" and "sysexit" in long mode, where the
latter pair is disallowed. The "sysexitq" and "sysretq" mnemonics provide the
64-bit versions of "sysexit" and "sysret" instructions.
  The "rdmsrq" and "wrmsrq" mnemonics are the 64-bit variants of the "rdmsr"
and "wrmsr" instructions.


2.1.20  SSE4 instructions

There are actually three different sets of instructions under the name SSE4.
Intel designed two of them, SSE4.1 and SSE4.2, with the latter extending the
former into Intel's full SSE4 set. On the other hand, the implementation
by AMD includes only a few instructions from this set, but also contains
some additional instructions, that are called the SSE4a set.
  The SSE4.1 instructions mostly follow the same rules for operands as
the basic SSE operations, so they require destination operand to be SSE
register and source operand to be 128-bit memory location or SSE register,
and some operations require a third operand, the 8-bit immediate value.
  "pmulld" performs a signed multiplication of the packed double words and
stores the low double words of the results in the destination operand.
"pmuldq" performs two signed multiplications of the low double words of the
corresponding quad words of operands, and stores the results as
packed quad words into the destination register. "pminsb" and "pmaxsb"
return the minimum or maximum values of packed signed bytes, "pminuw" and
"pmaxuw" return the minimum and maximum values of packed unsigned words,
"pminud", "pmaxud", "pminsd" and "pmaxsd" return minimum or maximum values
of packed unsigned or signed double words. These instructions complement the
instructions computing packed minimum or maximum introduced by SSE.
  "ptest" sets the ZF flag to one when the result of bitwise AND of
both operands is zero, and zeroes the ZF otherwise. It also sets CF flag
to one, when the result of bitwise AND of the source operand with
the bitwise NOT of the destination operand is zero, and zeroes the CF
otherwise. "pcmpeqq" compares packed quad words for equality, and fills the
corresponding elements of destination operand with either ones or zeros,
depending on the result of comparison.
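  A common use of "ptest" is checking whether a whole register is zero, as in
this sketch (the "not_zero" label is hypothetical):

    ptest xmm0,xmm0  ; ZF is set only if every bit of xmm0 is zero
    jnz not_zero     ; taken when at least one bit is set
    pcmpeqq xmm1,xmm1 ; executed only when xmm0 was zero, fills xmm1 with ones
  not_zero:

Testing a register against a mask in the same way tells whether any of the
masked bits is set.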
  "packusdw" converts packed signed double words from both the source and
destination operand into the unsigned words using saturation, and stores
the eight resulting word values into the destination register.
  "phminposuw" finds the minimum unsigned word value in source operand and
places it into the lowest word of destination operand, stores the index of
that word in the next word, and sets the remaining upper bits of destination
to zero.
  "roundps", "roundss", "roundpd" and "roundsd" perform the rounding of packed
or individual floating point value of single or double precision, using the
rounding mode specified by the third operand.

    roundsd xmm0,xmm1,0011b ; round toward zero

  "dpps" calculates dot product of packed single precision floating point
values, that is it multiplies the corresponding pairs of values from source and
destination operand and then sums the products up. The high four bits of the
8-bit immediate third operand control which products are calculated and taken
to the sum, and the low four bits control into which elements of destination
the resulting dot product is copied (the other elements are filled with zero).
"dppd" calculates dot product of packed double precision floating point values.
The bits 4 and 5 of third operand control which products are calculated and
added, and bits 0 and 1 of this value control which elements in destination
register should get filled with the result. "mpsadbw" calculates multiple sums
of absolute differences of unsigned bytes. The third operand controls, with
value in bits 0-1, which of the four-byte blocks in source operand is taken to
calculate the absolute differences, and with value in bit 2, at which of the
first two four-byte blocks in destination operand to start calculating multiple
sums. The sum is calculated from four absolute differences between the
corresponding unsigned bytes in the source and destination block, and each next
sum is calculated in the same way, but taking the four bytes from destination
at the position one byte after the position of previous block. The four bytes
from the source stay the same each time. This way eight sums of absolute
differences are calculated and stored as packed word values into the
destination operand. The instructions described in this paragraph follow the
same rules for operands as "roundps" instruction.
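  The meaning of the control bits can be sketched with a few arbitrarily
chosen operands:

    dpps xmm0,xmm1,11110001b ; sum all four products, store in lowest element
    dppd xmm2,xmm3,00110011b ; sum both products, store in both elements
    mpsadbw xmm4,xmm5,0      ; first source block, start at first block of xmm4

With the value 01110001b only the three lowest products would be taken to the
sum by "dpps".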
  "blendps", "blendvps", "blendpd" and "blendvpd" conditionally copy the
values from source operand into the destination operand, depending on the bits
of the mask provided by third operand. If a mask bit is set, the corresponding
element of source is copied into the same place in destination, otherwise this
position in destination is left unchanged. The rules for the first two operands
are the same as for general SSE instructions. "blendps" and "blendpd" need
third operand to be 8-bit immediate, and they operate on single or double
precision values, respectively. "blendvps" and "blendvpd" require third operand
to be the XMM0 register.

    blendvps xmm3,xmm7,xmm0 ; blend according to mask

  "pblendw" conditionally copies word elements from the source operand into the
destination, depending on the bits of mask provided by third operand, which
needs to be 8-bit immediate value. "pblendvb" conditionally copies byte
elements from the source operand into destination, depending on mask defined
by the third operand, which has to be XMM0 register. These instructions follow
the same rules for operands as "blendps" and "blendvps" instructions,
respectively.
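  A few more variants of blending, sketched with arbitrary registers and
masks:

    blendps xmm0,xmm1,0101b     ; copy elements 0 and 2 from xmm1
    pblendw xmm2,xmm3,11110000b ; copy the four high words from xmm3
    pblendvb xmm4,xmm5,xmm0     ; copy the bytes selected by mask in xmm0

For the variants using XMM0 as a mask, it is the highest bit of each mask
element that decides whether the element is copied.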
  "insertps" inserts a single precision floating point value taken from the
position in source operand specified by bits 6-7 of third operand into location
in destination register selected by bits 4-5 of third operand. Additionally,
the low four bits of third operand control which elements in destination
register will be set to zero. The first two operands follow the same rules as
for the general SSE operation, the third operand should be 8-bit immediate.
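  The layout of the third operand can be illustrated by the following sketch
(the registers are chosen arbitrarily):

    insertps xmm0,xmm1,10010000b ; element 2 of xmm1 into element 1 of xmm0
    insertps xmm0,xmm1,00001100b ; element 0 into element 0, zero elements 2-3

In the first line bits 6-7 contain 10b, bits 4-5 contain 01b and no element
is zeroed.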
  "extractps" extracts a single precision floating point value taken from the
location in source operand specified by low two bits of third operand, and
stores it into the destination operand. The destination can be a 32-bit memory
value or general purpose register, the source operand must be SSE register,
and the third operand should be 8-bit immediate value.

    extractps edx,xmm3,3 ; extract the highest value

  "pinsrb", "pinsrd" and "pinsrq" copy a byte, double word or quad word from
the source operand into the location of destination operand determined by the
third operand. The destination operand has to be SSE register, the source
operand can be a memory location of appropriate size, or the 32-bit general
purpose register (but 64-bit general purpose register for "pinsrq", which is
only available in long mode), and the third operand has to be 8-bit immediate
value. These instructions complement the "pinsrw" instruction operating on SSE
register destination, which was introduced by SSE2.

    pinsrd xmm4,eax,1 ; insert double word into second position

  "pextrb", "pextrw", "pextrd" and "pextrq" copy a byte, word, double word or
quad word from the location in source operand specified by third operand, into
the destination. The source operand should be SSE register, the third operand
should be 8-bit immediate, and the destination operand can be memory location
of appropriate size, or the 32-bit general purpose register (but 64-bit general
purpose register for "pextrq", which is only available in long mode). The
"pextrw" instruction with SSE register as source was already introduced by
SSE2, but SSE4 extends it to allow memory operand as destination.

    pextrw [ebx],xmm3,7 ; extract highest word into memory

  "pmovsxbw" and "pmovzxbw" perform sign extension or zero extension of eight
byte values from the source operand into packed word values in destination
operand, which has to be SSE register. The source can be 64-bit memory or SSE
register - when it is register, only its low portion is used. "pmovsxbd" and
"pmovzxbd" perform sign extension or zero extension of the four byte values
from the source operand into packed double word values in destination operand,
the source can be 32-bit memory or SSE register. "pmovsxbq" and "pmovzxbq"
perform sign extension or zero extension of the two byte values from the
source operand into packed quad word values in destination operand, the source
can be 16-bit memory or SSE register. "pmovsxwd" and "pmovzxwd" perform sign
extension or zero extension of the four word values from the source operand
into packed double words in destination operand, the source can be 64-bit
memory or SSE register. "pmovsxwq" and "pmovzxwq" perform sign extension or
-
 
2413
zero extension of the two word values from the source operand into packed quad
-
 
2414
words in destination operand, the source can be 32-bit memory or SSE register. 
-
 
2415
"pmovsxdq" and "pmovzxdq" perform sign extension or zero extension of the two 
-
 
2416
double word values from the source operand into packed quad words in 
-
 
2417
destination operand, the source can be 64-bit memory or SSE register.
-
 
2418
 
-
 
2419
    pmovzxbq xmm0,word [si]  ; zero-extend bytes to quad words
-
 
2420
    pmovsxwq xmm0,xmm1       ; sign-extend words to quad words 
-
 
2421
 
-
 
2422
  "movntdqa" loads double quad word from the source operand to the destination
-
 
2423
using a non-temporal hint. The destination operand should be SSE register,
-
 
2424
and the source operand should be 128-bit memory location.
-
 
2425
  The SSE4.2, described below, adds not only some new operations on SSE
-
 
2426
registers, but also introduces some completely new instructions operating on
-
 
2427
general purpose registers only.
-
 
  "pcmpistri" compares two zero-ended (implicit length) strings provided in
its source and destination operand and generates an index stored to ECX;
"pcmpistrm" performs the same comparison and generates a mask stored to XMM0.
"pcmpestri" compares two strings of explicit lengths, with length provided
in EAX for the destination operand and in EDX for the source operand, and
generates an index stored to ECX; "pcmpestrm" performs the same comparison
and generates a mask stored to XMM0. The source and destination operand follow
the same rules as for general SSE instructions, the third operand should be
8-bit immediate value determining the details of performed operation - refer to
Intel documentation for information on those details.
-
 
2438
  "pcmpgtq" compares packed quad words, and fills the corresponding elements of
-
 
2439
destination operand with either ones or zeros, depending on whether the value
-
 
2440
in destination is greater than the one in source, or not. This instruction
-
 
2441
follows the same rules for operands as "pcmpeqq".
-
 
2442
  "crc32" accumulates a CRC32 value for the source operand starting with
-
 
2443
initial value provided by destination operand, and stores the result in
-
 
2444
destination. Unless in long mode, the destination operand should be a 32-bit
-
 
2445
general purpose register, and the source operand can be a byte, word, or double
-
 
2446
word register or memory location. In long mode the destination operand can
-
 
2447
also be a 64-bit general purpose register, and the source operand in such case
-
 
2448
can be a byte or quad word register or memory location.
-
 
2449
 
-
 
2450
    crc32 eax,dl          ; accumulate CRC32 on byte value
-
 
2451
    crc32 eax,word [ebx]  ; accumulate CRC32 on word value
-
 
2452
    crc32 rax,qword [rbx] ; accumulate CRC32 on quad word value
-
 
2453
 
-
 
2454
  "popcnt" calculates the number of bits set in the source operand, which can
-
 
2455
be 16-bit, 32-bit, or 64-bit general purpose register or memory location,
-
 
2456
and stores this count in the destination operand, which has to be register of
-
 
2457
the same size as source operand. The 64-bit variant is available only in long
-
 
2458
mode.
-
 
2459
 
-
 
2460
    popcnt ecx,eax        ; count bits set to 1
-
 
2461
 
-
 
2462
  The SSE4a extension, which also includes the "popcnt" instruction introduced
-
 
2463
by SSE4.2, at the same time adds the "lzcnt" instruction, which follows the
-
 
2464
same syntax, and calculates the count of leading zero bits in source operand
-
 
2465
(if the source operand is all zero bits, the total number of bits in source
-
 
2466
operand is stored in destination).
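
  Since "lzcnt" follows the syntax of "popcnt", its use might look as in the
short sketch below (the registers and the memory address are arbitrary
assumptions, and 32-bit code is assumed):

```asm
use32                  ; assumption: 32-bit code
lzcnt eax,ebx          ; count leading zero bits of 32-bit value
lzcnt cx,word [esi]    ; 16-bit source in memory, 16-bit destination register
```
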
-
 
  "extrq" extracts the sequence of bits from the low quad word of SSE register
provided as first operand and stores them at the low end of this register,
filling the remaining bits in the low quad word with zeros. The position of bit
string and its length can either be provided with two 8-bit immediate values
as second and third operand, or by SSE register as second operand (and there
is no third operand in such case), which should contain position value in bits
8-13 and length of bit string in bits 0-5.
-
 
2474
 
-
 
2475
    extrq xmm0,8,7        ; extract 8 bits from position 7
-
 
2476
    extrq xmm0,xmm5       ; extract bits defined by register
-
 
2477
 
-
 
2478
  "insertq" writes the sequence of bits from the low quad word of the source
-
 
2479
operand into specified position in low quad word of the destination operand,
-
 
2480
leaving the other bits in low quad word of destination intact. The position
-
 
2481
where bits should be written and the length of bit string can either be
-
 
2482
provided with two 8-bit immediate values as third and fourth operand, or by
-
 
2483
the bit fields in source operand (and there are only two operands in such
-
 
2484
case), which should contain position value in bits 72-77 and length of bit
-
 
2485
string in bits 64-69.
-
 
2486
 
-
 
2487
    insertq xmm1,xmm0,4,2 ; insert 4 bits at position 2
-
 
2488
    insertq xmm1,xmm0     ; insert bits defined by register
-
 
2489
 
-
 
2490
  "movntss" and "movntsd" store single or double precision floating point
-
 
2491
value from the source SSE register into 32-bit or 64-bit destination memory
-
 
2492
location respectively, using non-temporal hint.
-
 
2493
 
-
 
2494
 
-
 
2495
2.1.21  AVX instructions
-
 
2496
 
-
 
2497
The Advanced Vector Extensions introduce instructions that are new variants
of SSE instructions, with new scheme of encoding that allows extended syntax
having a destination operand separate from all the source operands. They also
introduce 256-bit AVX registers, which extend the old 128-bit SSE registers.
Any AVX instruction that puts some result into SSE register puts zero bits
into high portion of the AVX register containing it.
-
 
2503
  The AVX version of SSE instruction has the mnemonic obtained by prepending
SSE instruction name with "v". For any SSE arithmetic instruction which had a
destination operand also being used as one of the source values, the AVX
variant has a new syntax with three operands - the destination and two sources.
The destination and first source can be SSE registers, and second source can be
SSE register or memory. If the operation is performed on single pair of values,
the remaining bits of first source SSE register are copied into the
destination register.
-
 
2511
 
-
 
2512
    vsubss xmm0,xmm2,xmm3         ; subtract two 32-bit floats
    vmulsd xmm0,xmm7,qword [esi]  ; multiply two 64-bit floats
-
 
2514
 
-
 
2515
In case of packed operations, each instruction can also operate on the 256-bit 
-
 
2516
data size when the AVX registers are specified instead of SSE registers, and 
-
 
2517
the size of memory operand is also doubled then.
-
 
2518
 
-
 
2519
    vaddps ymm1,ymm5,yword [esi]  ; eight sums of 32-bit float pairs 
-
 
2520
 
-
 
2521
The instructions that operate on packed integer types (in particular the ones
-
 
2522
that earlier had been promoted from MMX to SSE) also acquired the new syntax
-
 
2523
with three operands, however they are only allowed to operate on 128-bit 
-
 
2524
packed types and thus cannot use the whole AVX registers.
-
 
2525
 
-
 
2526
    vpavgw xmm3,xmm0,xmm2         ; average of 16-bit integers
-
 
2527
    vpslld xmm1,xmm0,1            ; shift double words left
-
 
2528
     
-
 
2529
If the SSE version of instruction had a syntax with three operands, the third
-
 
2530
one being an immediate value, the AVX version of such instruction takes four
-
 
2531
operands, with immediate remaining the last one.
-
 
2532
 
-
 
2533
    vshufpd ymm0,ymm1,ymm2,10010011b ; shuffle 64-bit floats
-
 
2534
    vpalignr xmm0,xmm4,xmm2,3        ; extract byte aligned value
-
 
2535
     
-
 
2536
The promotion to new syntax according to the rules described above has been 
-
 
2537
applied to all the instructions from SSE extensions up to SSE4, with the 
-
 
2538
exceptions described below.   
-
 
  "vdppd" instruction has syntax extended to four operands, but it does not
have a 256-bit version.
  There are a few instructions, namely "vsqrtpd", "vsqrtps", "vrcpps" and
"vrsqrtps", which can operate on 256-bit data size, but retained the syntax
with only two operands, because they use data from only one source:
-
 
2544
    
-
 
2545
    vsqrtpd ymm1,ymm0         ; put square roots into other register
-
 
2546
 
-
 
2547
In a similar way "vroundpd" and "vroundps" retained the syntax with three 
-
 
2548
operands, the last one being immediate value.   
-
 
2549
 
-
 
2550
    vroundps ymm0,ymm1,0011b  ; round toward zero
-
 
2551
                              
-
 
2552
  Also some of the operations on packed integers kept their two-operand or
-
 
2553
three-operand syntax while being promoted to AVX version. In such case these
-
 
2554
instructions follow exactly the same rules for operands as their SSE 
-
 
2555
counterparts (since operations on packed integers do not have 256-bit variants
-
 
2556
in AVX extension). These include "vpcmpestri", "vpcmpestrm", "vpcmpistri",
-
 
2557
"vpcmpistrm", "vphminposuw", "vpshufd", "vpshufhw", "vpshuflw". And there are 
-
 
2558
more instructions that in AVX versions keep exactly the same syntax for 
-
 
2559
operands as the one from SSE, without any additional options: "vcomiss", 
-
 
2560
"vcomisd", "vcvtss2si", "vcvtsd2si", "vcvttss2si", "vcvttsd2si", "vextractps", 
-
 
2561
"vpextrb", "vpextrw", "vpextrd", "vpextrq", "vmovd", "vmovq", "vmovntdqa", 
-
 
2562
"vmaskmovdqu", "vpmovmskb", "vpmovsxbw", "vpmovsxbd", "vpmovsxbq", "vpmovsxwd", 
-
 
2563
"vpmovsxwq", "vpmovsxdq", "vpmovzxbw", "vpmovzxbd", "vpmovzxbq", "vpmovzxwd", 
-
 
2564
"vpmovzxwq" and "vpmovzxdq".
-
 
2565
  The move and conversion instructions have mostly been promoted to allow
-
 
2566
256-bit size operands in addition to the 128-bit variant with syntax identical
-
 
2567
to that from SSE version of the same instruction. Each of the "vcvtdq2ps", 
-
 
2568
"vcvtps2dq" and "vcvttps2dq", "vmovaps", "vmovapd", "vmovups", "vmovupd",
-
 
2569
"vmovdqa", "vmovdqu", "vlddqu", "vmovntps", "vmovntpd", "vmovntdq", 
-
 
2570
"vmovsldup", "vmovshdup", "vmovmskps" and "vmovmskpd" inherits the 128-bit 
-
 
2571
syntax from SSE without any changes, and also allows a new form with 256-bit 
-
 
2572
operands in place of 128-bit ones.  
-
 
2573
 
-
 
2574
    vmovups [edi],ymm6        ; store unaligned 256-bit data
-
 
2575
    
-
 
2576
  "vmovddup" has the identical 128-bit syntax as its SSE version, and it also 
-
 
2577
has a 256-bit version, which stores the duplicates of the lowest quad word 
-
 
2578
from the source operand in the lower half of destination operand, and in the 
-
 
2579
upper half of destination the duplicates of the low quad word from the upper 
-
 
2580
half of source. Both source and destination operands need then to be 256-bit 
-
 
2581
values.
-
 
  "vmovlhps" and "vmovhlps" have only 128-bit versions, and each takes three
operands, which all must be SSE registers. "vmovlhps" copies two single
precision values from the low quad word of second source register to the high
quad word of destination register, and copies the low quad word of first
source register into the low quad word of destination register. "vmovhlps"
copies two single precision values from the high quad word of second source
register to the low quad word of destination register, and copies the high
quad word of first source register into the high quad word of destination
register.
-
 
2591
  "vmovlps", "vmovhps", "vmovlpd" and "vmovhpd" have only 128-bit versions and
-
 
2592
their syntax varies depending on whether memory operand is a destination or
-
 
2593
source. When memory is destination, the syntax is identical to the one of
-
 
2594
equivalent SSE instruction, and when memory is source, the instruction requires
-
 
2595
three operands, first two being SSE registers and the third one 64-bit memory.
-
 
2596
The value put into destination is then the value copied from first source with
-
 
2597
either low or high quad word replaced with value from second source (the
-
 
2598
memory operand).
-
 
2599
 
-
 
2600
    vmovhps [esi],xmm7       ; store upper half to memory
-
 
2601
    vmovlps xmm0,xmm7,[ebx]  ; low from memory, rest from register  
-
 
2602
  
-
 
2603
  "vmovss" and "vmovsd" have syntax identical to their SSE equivalents as long
-
 
2604
as one of the operands is memory, while the versions that operate purely on 
-
 
2605
registers require three operands (each being SSE register). The value stored
-
 
2606
in destination is then the value copied from first source with lowest data
-
 
2607
element replaced with the lowest value from second source.
-
 
2608
 
-
 
2609
    vmovss xmm3,[edi]        ; low from memory, rest zeroed
-
 
2610
    vmovss xmm0,xmm1,xmm2    ; one value from xmm2, three from xmm1 
-
 
2611
  
-
 
  "vcvtss2sd", "vcvtsd2ss", "vcvtsi2ss" and "vcvtsi2sd" use the three-operand
syntax, where destination and first source are always SSE registers, and the
second source follows the same rules as the source in syntax of equivalent
SSE instruction. The value stored in destination is then the value copied from
first source with lowest data element replaced with the result of conversion.
-
 
2617
 
-
 
2618
    vcvtsi2sd xmm4,xmm4,ecx  ; 32-bit integer to 64-bit float
-
 
2619
    vcvtsi2ss xmm0,xmm0,rax  ; 64-bit integer to 32-bit float
-
 
2620
 
-
 
2621
  "vcvtdq2pd" and "vcvtps2pd" allow the same syntax as their SSE equivalents, 
-
 
2622
plus the new variants with AVX register as destination and SSE register or 
-
 
2623
128-bit memory as source. Analogously "vcvtpd2dq", "vcvttpd2dq" and 
-
 
2624
"vcvtpd2ps", in addition to variant with syntax identical to SSE version, 
-
 
2625
allow a variant with SSE register as destination and AVX register or 256-bit 
-
 
2626
memory as source.          
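
  The widening and narrowing variants described above could be written as in
the following sketch (the registers and addresses are arbitrary assumptions,
and 32-bit code is assumed). When the source of a narrowing conversion is
memory, the size operator tells which variant is meant:

```asm
use32                          ; assumption: 32-bit code
vcvtps2pd ymm0,xmm1            ; four floats into four doubles
vcvtdq2pd ymm2,xword [esi]     ; four 32-bit integers from 128-bit memory
vcvtpd2ps xmm0,ymm1            ; four doubles into four floats
vcvttpd2dq xmm3,yword [ebx]    ; four doubles from 256-bit memory
```
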
-
 
  "vinsertps", "vpinsrb", "vpinsrw", "vpinsrd", "vpinsrq" and "vpblendw" use
a syntax with four operands, where destination and first source have to be SSE
registers, and the third and fourth operand follow the same rules as second
and third operand in the syntax of equivalent SSE instruction. Value stored in
destination is the value copied from first source with some data elements
replaced with values extracted from the second source, analogously to the
operation of corresponding SSE instruction.
-
 
2634
  
-
 
2635
    vpinsrd xmm0,xmm0,eax,3  ; insert double word
-
 
2636
 
-
 
2637
  "vblendvps", "vblendvpd" and "vpblendvb" use a new syntax with four register
-
 
2638
operands: destination, two sources and a mask, where second source can also be
-
 
2639
a memory operand. "vblendvps" and "vblendvpd" have 256-bit variant, where 
-
 
2640
operands are AVX registers or 256-bit memory, as well as 128-bit variant, 
-
 
2641
which has operands being SSE registers or 128-bit memory. "vpblendvb" has only
-
 
2642
a 128-bit variant. Value stored in destination is the value copied from the
-
 
2643
first source with some data elements replaced, according to mask, by values 
-
 
2644
from the second source.
-
 
2645
 
-
 
2646
    vblendvps ymm3,ymm1,ymm2,ymm7  ; blend according to mask     
-
 
2647
   
-
 
2648
  "vptest" allows the same syntax as its SSE version and also has a 256-bit
-
 
2649
version, with both operands doubled in size. There are also two new 
-
 
2650
instructions, "vtestps" and "vtestpd", which perform analogous tests, but only
-
 
2651
of the sign bits of corresponding single precision or double precision values,
-
 
2652
and set the ZF and CF accordingly. They follow the same syntax rules as 
-
 
2653
"vptest".
-
 
2654
 
-
 
2655
    vptest ymm0,yword [ebx]  ; test 256-bit values
-
 
2656
    vtestpd xmm0,xmm1        ; test sign bits of 64-bit floats
-
 
2657
 
-
 
  "vbroadcastss", "vbroadcastsd" and "vbroadcastf128" are new instructions,
which broadcast the data element defined by source operand into all elements
of corresponding size in the destination register. "vbroadcastss" needs
source to be 32-bit memory and destination to be either SSE or AVX register.
"vbroadcastsd" requires 64-bit memory as source, and AVX register as
destination. "vbroadcastf128" requires 128-bit memory as source, and AVX
register as destination.
 
2665
 
-
 
2666
    vbroadcastss ymm0,dword [eax]  ; get eight copies of value          
-
 
2667
 
-
 
2668
  "vinsertf128" is the new instruction, which takes four operands. The
-
 
2669
destination and first source have to be AVX registers, second source can be 
-
 
2670
SSE register or 128-bit memory location, and fourth operand should be an 
-
 
2671
immediate value. It stores in destination the value obtained by taking 
-
 
2672
contents of first source and replacing one of its 128-bit units with value of
-
 
2673
the second source. The lowest bit of fourth operand specifies at which 
-
 
2674
position that replacement is done (either 0 or 1). 
-
 
  "vextractf128" is the new instruction with three operands. The destination
needs to be SSE register or 128-bit memory location, the source must be AVX
register, and the third operand should be an immediate value. It extracts
into destination one of the 128-bit units from source. The lowest bit of third
operand specifies which unit is extracted.
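
  A short sketch of both instructions follows (it is not taken from the
original manual; the registers and the address are arbitrary assumptions, and
32-bit code is assumed):

```asm
use32                            ; assumption: 32-bit code
vinsertf128 ymm0,ymm1,xmm2,1     ; ymm0 = ymm1 with upper 128 bits replaced by xmm2
vextractf128 xmm3,ymm0,0         ; copy lower 128 bits of ymm0 into xmm3
vextractf128 [edi],ymm0,1        ; store upper 128 bits of ymm0 in memory
```
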
-
 
2680
  "vmaskmovps" and "vmaskmovpd" are the new instructions with three operands
-
 
2681
that selectively store in destination the elements from second source 
-
 
2682
depending on the sign bits of corresponding elements from first source. These
-
 
2683
instructions can operate on either 128-bit data (SSE registers) or 256-bit 
-
 
2684
data (AVX registers). Either destination or second source has to be a memory
-
 
2685
location of appropriate size, the two other operands should be registers.   
-
 
2686
  
-
 
2687
    vmaskmovps [edi],xmm0,xmm5  ; conditionally store
-
 
2688
    vmaskmovpd ymm5,ymm0,[esi]  ; conditionally load   
-
 
2689
 
-
 
  "vpermilpd" and "vpermilps" are the new instructions with three operands
that permute the values from first source according to the control fields from
second source and put the result into destination operand. They allow either
three SSE registers or three AVX registers as operands, and the second
source can be a memory of size equal to the registers used. In alternative
form the second source can be immediate value and then the first source
can be a memory location of the size equal to destination register.
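
  Both forms might be used as in the sketch below (the registers, the address
and the immediate values are arbitrary assumptions, and 32-bit code is
assumed). Note that in the 256-bit variants the permutation is done separately
within each 128-bit half:

```asm
use32                                 ; assumption: 32-bit code
vpermilps xmm0,xmm1,xmm2              ; control fields taken from xmm2
vpermilps ymm0,yword [esi],00011011b  ; reverse four floats in each 128-bit half
vpermilpd ymm0,ymm1,0101b             ; swap two doubles in each 128-bit half
```
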
-
 
2697
  "vperm2f128" is the new instruction with four operands, which selects 
-
 
2698
128-bit blocks of floating point data from first and second source according
-
 
2699
to the bit fields from fourth operand, and stores them in destination.
-
 
2700
Destination and first source need to be AVX registers, second source can be
-
 
2701
AVX register or 256-bit memory area, and fourth operand should be an immediate
-
 
2702
value.
-
 
2703
 
-
 
2704
    vperm2f128 ymm0,ymm6,ymm7,12h  ; permute 128-bit blocks
-
 
2705
 
-
 
2706
  "vzeroall" instruction sets all the AVX registers to zero. "vzeroupper" sets
-
 
2707
the upper 128-bit portions of all AVX registers to zero, leaving the SSE 
-
 
2708
registers intact. These new instructions take no operands.
-
 
2709
  "vldmxcsr" and "vstmxcsr" are the AVX versions of "ldmxcsr" and "stmxcsr"
-
 
2710
instructions. The rules for their operands remain unchanged.  
-
 
2711
 
-
 
2712
  
-
 
2713
2.1.22  AVX2 instructions
-
 
2714
 
-
 
2715
The AVX2 extension allows all the AVX instructions operating on packed integers
-
 
2716
to use 256-bit data types, and introduces some new instructions as well.
-
 
2717
  The AVX instructions that operate on packed integers and had only 128-bit
variants have been supplemented with 256-bit variants, and thus their syntax
rules became analogous to AVX instructions operating on packed floating point
types.
-
 
2721
 
-
 
2722
    vpsubb ymm0,ymm0,[esi]   ; subtract 32 packed bytes
    vpavgw ymm3,ymm0,ymm2    ; average of 16-bit integers
-
 
2724
 
-
 
2725
However there are some instructions that have not been equipped with the 
-
 
2726
256-bit variants. "vpcmpestri", "vpcmpestrm", "vpcmpistri", "vpcmpistrm", 
-
 
2727
"vpextrb", "vpextrw", "vpextrd", "vpextrq", "vpinsrb", "vpinsrw", "vpinsrd", 
-
 
2728
"vpinsrq" and "vphminposuw" are not affected by AVX2 and allow only the 
-
 
2729
128-bit operands.
-
 
2730
  The packed shift instructions, which allowed the third operand specifying
-
 
2731
amount to be SSE register or 128-bit memory location, use the same rules
-
 
2732
for the third operand in their 256-bit variant.
-
 
2733
 
-
 
2734
    vpsllw ymm2,ymm2,xmm4        ; shift words left
-
 
2735
    vpsrad ymm0,ymm3,xword [ebx] ; shift double words right
-
 
2736
 
-
 
2737
  There are also new packed shift instructions with standard three-operand AVX
-
 
2738
syntax, which shift each element from first source by the amount specified in 
-
 
2739
corresponding element of second source, and store the results in destination. 
-
 
2740
"vpsllvd" shifts 32-bit elements left, "vpsllvq" shifts 64-bit elements left, 
-
 
2741
"vpsrlvd" shifts 32-bit elements right logically, "vpsrlvq" shifts 64-bit 
-
 
2742
elements right logically and "vpsravd" shifts 32-bit elements right 
-
 
2743
arithmetically.
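
  The per-element shifts could be written as in the following sketch (the
registers and the address are arbitrary assumptions, and 32-bit code is
assumed):

```asm
use32                            ; assumption: 32-bit code
vpsllvd ymm0,ymm1,ymm2           ; each double word of ymm1 shifted left by count from ymm2
vpsrlvq xmm0,xmm1,xword [ebx]    ; shift counts taken from 128-bit memory
vpsravd ymm3,ymm3,ymm4           ; arithmetic right shift with per-element counts
```
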
-
 
2744
  The sign-extend and zero-extend instructions, which in AVX versions allowed
-
 
2745
source operand to be SSE register or a memory of specific size, in the new
-
 
2746
256-bit variant need memory of that size doubled or SSE register as source and
-
 
2747
AVX register as destination.
-
 
2748
 
-
 
2749
    vpmovzxbq ymm0,dword [esi]   ; bytes to quad words
-
 
2750
    
-
 
2751
  Also "vmovntdqa" has been upgraded with a 256-bit variant, which transfers
a 256-bit value from memory to AVX register and needs the memory address
to be aligned to 32 bytes.
  "vpmaskmovd" and "vpmaskmovq" are the new instructions with syntax identical
to "vmaskmovps" or "vmaskmovpd", and they perform analogous operation on
packed 32-bit or 64-bit values.
-
 
2757
  "vinserti128", "vextracti128", "vbroadcasti128" and "vperm2i128" are the new 
-
 
2758
instructions with syntax identical to "vinsertf128", "vextractf128",
-
 
2759
"vbroadcastf128" and "vperm2f128" respectively, and they perform analogous 
-
 
2760
operations on 128-bit blocks of integer data.
-
 
2761
  "vbroadcastss" and "vbroadcastsd" instructions have been extended to allow
-
 
2762
SSE register as a source operand (which in AVX could only be a memory).
-
 
  "vpbroadcastb", "vpbroadcastw", "vpbroadcastd" and "vpbroadcastq" are the
new instructions which broadcast the byte, word, double word or quad word from
the source operand into all elements of corresponding size in the destination
register. The destination operand can be either SSE or AVX register, and the
source operand can be SSE register or memory of size equal to the size of data
element.
-
 
2769
 
-
 
2770
    vpbroadcastb ymm0,byte [ebx]  ; get 32 identical bytes
-
 
2771
                 
-
 
2772
  "vpermd" and "vpermps" are new three-operand instructions, which use each 
-
 
2773
32-bit element from first source as an index of element in second source which
-
 
2774
is copied into destination at position corresponding to element containing
-
 
2775
index. The destination and first source have to be AVX registers, and the
-
 
2776
second source can be AVX register or 256-bit memory.
-
 
  "vpermq" and "vpermpd" are new three-operand instructions, which use 2-bit
indexes from the immediate value specified as third operand to determine which
element from source is stored at given position in destination. The destination
has to be AVX register, source can be AVX register or 256-bit memory, and the
third operand must be 8-bit immediate value.
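
  The two kinds of permutation, "vpermd" and "vpermps" with indexes in a
register and "vpermq" and "vpermpd" with indexes in an immediate value, might
be used as in this sketch (the registers, addresses and immediate values are
arbitrary assumptions, and 32-bit code is assumed):

```asm
use32                               ; assumption: 32-bit code
vpermd ymm0,ymm1,ymm2               ; indexes in ymm1 select double words from ymm2
vpermps ymm0,ymm1,yword [esi]       ; indexes in ymm1 select floats from memory
vpermq ymm0,ymm1,00011011b          ; reverse the order of four quad words
vpermpd ymm0,yword [ebx],01000100b  ; two low doubles copied into both halves
```
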
-
 
2782
  The family of new instructions performing "gather" operation have special
syntax, as in their memory operand they use addressing mode that is unique to
them. The base of address can be a 32-bit or 64-bit general purpose register
(the latter only in long mode), and the index (possibly multiplied by scale
value, as in standard addressing) is specified by SSE or AVX register. It is
possible to use only index without base and any numerical displacement can be
added to the address. Each of those instructions takes three operands. First
operand is the destination register, second operand is memory addressed with
a vector index, and third operand is register containing a mask. The most
significant bit of each element of mask determines whether a value will be
loaded from memory into corresponding element in destination. The address of
each element to load is determined by using the corresponding element from
index register in memory operand to calculate final address with given base
and displacement. When the index register contains fewer elements than the
destination and mask registers, the higher elements of destination are zeroed.
After the value is successfully loaded, the corresponding element in mask
register is set to zero. The destination, index and mask should all be
distinct registers; it is not allowed to use the same register in two
different roles.
-
 
2801
  "vgatherdps" loads single precision floating point values addressed by 
-
 
2802
32-bit indexes. The destination, index and mask should all be registers of the
-
 
2803
same type, either SSE or AVX. The data addressed by memory operand is 32-bit
-
 
2804
in size. 
-
 
2805
 
-
 
2806
    vgatherdps xmm0,[eax+xmm1],xmm3    ; gather four floats
-
 
2807
    vgatherdps ymm0,[ebx+ymm7*4],ymm3  ; gather eight floats
-
 
2808
 
-
 
2809
  "vgatherqps" loads single precision floating point values addressed by
-
 
2810
64-bit indexes. The destination and mask should always be SSE registers, while
-
 
2811
index register can be either SSE or AVX register. The data addressed by memory
-
 
2812
operand is 32-bit in size.
-
 
2813
  
-
 
2814
    vgatherqps xmm0,[xmm2],xmm3        ; gather two floats     
-
 
2815
    vgatherqps xmm0,[ymm2+64],xmm3     ; gather four floats  
-
 
2816
  
-
 
2817
  "vgatherdpd" loads double precision floating point values addressed by
-
 
2818
32-bit indexes. The index register should always be SSE register, the 
-
 
2819
destination and mask should be two registers of the same type, either SSE or
-
 
2820
AVX. The data addressed by memory operand is 64-bit in size. 
-
 
2821
  
-
 
2822
    vgatherdpd xmm0,[ebp+xmm1],xmm3    ; gather two doubles
-
 
2823
    vgatherdpd ymm0,[xmm3*8],ymm5      ; gather four doubles
-
 
2824
 
-
 
2825
  "vgatherqpd" loads double precision floating point values addressed by
-
 
2826
64-bit indexes. The destination, index and mask should all be registers of the
-
 
2827
same type, either SSE or AVX. The data addressed by memory operand is 64-bit
-
 
2828
in size.      
-
 
2829
  "vpgatherdd" and "vpgatherqd" load 32-bit values addressed by either 32-bit
-
 
2830
or 64-bit indexes. They follow the same rules as "vgatherdps" and "vgatherqps"
-
 
2831
respectively.  
-
 
2832
  "vpgatherdq" and "vpgatherqq" load 64-bit values addressed by either 32-bit
-
 
2833
or 64-bit indexes. They follow the same rules as "vgatherdpd" and "vgatherqpd"
-
 
2834
respectively.  
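
  The remaining gather instructions follow the pattern of the examples given
earlier for "vgatherdps", "vgatherqps" and "vgatherdpd". A sketch, with
registers, bases and displacements chosen arbitrarily and 32-bit code assumed,
could look like this:

```asm
use32                                ; assumption: 32-bit code
vgatherqpd xmm0,[xmm1],xmm2          ; gather two doubles
vgatherqpd ymm0,[ebx+ymm1*8],ymm2    ; gather four doubles
vpgatherdd ymm0,[esi+ymm1*4],ymm2    ; gather eight double words
vpgatherqd xmm0,[ymm1],xmm2          ; gather four double words
vpgatherdq ymm0,[edi+xmm1*8],ymm2    ; gather four quad words
vpgatherqq xmm0,[xmm1+16],xmm2       ; gather two quad words
```
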
-
 
2835
  
-
 
2836
 
-
 
2837
2.1.23  Auxiliary sets of computational instructions
-
 
2838
 
-
 
2839
  There are a number of additional instruction set extensions related to
AVX. They introduce new vector instructions (and sometimes also their SSE
equivalents that use classic instruction encoding), and even some new
instructions operating on general registers that use the AVX-like encoding
allowing the extended syntax with separate destination and source operands.
The CPU support for each of these instruction sets needs to be determined
separately.
  The AES extension provides a specialized set of instructions for the
purpose of cryptographic computations defined by Advanced Encryption Standard.
Each of these instructions has two versions: the AVX one and the one with
SSE-like syntax that uses classic encoding. Refer to the Intel manuals for the
details of operation of these instructions.
  "aesenc" and "aesenclast" perform a single round of AES encryption on data
from first source with a round key from second source, and store result in
destination. The destination and first source are SSE registers, and the
second source can be SSE register or 128-bit memory. The AVX versions of these
instructions, "vaesenc" and "vaesenclast", use the syntax with three operands,
while the SSE-like version has only two operands, with first operand being
both the destination and first source.
  "aesdec" and "aesdeclast" perform a single round of AES decryption on data
from first source with a round key from second source. The syntax rules for
them and their AVX versions are the same as for "aesenc".
  "aesimc" performs the InvMixColumns transformation of source operand and
stores the result in destination. Both "aesimc" and "vaesimc" use only two
operands, destination being SSE register, and source being SSE register or
128-bit memory location.
  "aeskeygenassist" is a helper instruction for generating the round key.
It needs three operands: destination being SSE register, source being SSE
register or 128-bit memory, and third operand being 8-bit immediate value.
The AVX version of this instruction uses the same syntax.
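  The following lines are only a sketch of the operand forms described above;
the registers, the address and the immediate value are arbitrary choices made
for illustration and do not form a complete encryption routine:

    aesenc xmm0,xmm1              ; one round, xmm0 is destination and data
    aesenclast xmm0,[ebx]         ; last round, round key in 128-bit memory
    vaesdec xmm2,xmm0,xmm3        ; AVX form with separate destination
    aesimc xmm4,xmm1              ; InvMixColumns of the value in xmm1
    aeskeygenassist xmm5,xmm1,1   ; third operand is 8-bit immediate
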
  The CLMUL extension introduces just one instruction, "pclmulqdq", and its
AVX version as well. This instruction performs a carryless multiplication of
two 64-bit values selected from first and second source according to the bit
fields in immediate value. The destination and first source are SSE registers,
second source is SSE register or 128-bit memory, and immediate value is
provided as last operand. "vpclmulqdq" takes four operands, while "pclmulqdq"
takes only three operands, with the first one serving both the role of
destination and first source.
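  As an illustration of both forms (a sketch with arbitrarily chosen operands;
the meaning of the immediate bits given in the comments follows the Intel
documentation, where bit 0 selects the quad word of first source and bit 4
selects the quad word of second source):

    pclmulqdq xmm0,xmm1,0            ; low quad words of xmm0 and xmm1
    vpclmulqdq xmm2,xmm0,[esi],11h   ; high quad words of xmm0 and memory
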
  The FMA (Fused Multiply-Add) extension introduces additional AVX
instructions which perform multiplication and summation as a single operation.
Each one takes three operands, the first one serving both as destination and
as first source, and the following ones being the second and third source.
The mnemonic of an FMA instruction is obtained by appending to the "vf"
prefix: first either "m" or "nm" to select whether the result of
multiplication should be taken as-is or negated, then either "add" or "sub" to
select whether the third value will be added to the product or subtracted from
the product, then either "132", "213" or "231" to select which source operands
are multiplied and which one is added or subtracted, and finally the type of
data on which the instruction operates, either "ps", "pd", "ss" or "sd". As it
was with SSE instructions promoted to AVX, instructions operating on packed
floating point values allow 128-bit or 256-bit syntax: in the former all the
operands are SSE registers, but the third one can also be a 128-bit memory, in
the latter the operands are AVX registers and the third one can also be a
256-bit memory. Instructions that compute just one floating point result need
operands to be SSE registers, and the third operand can also be a memory,
either 32-bit for single precision or 64-bit for double precision.

    vfmsub231ps ymm1,ymm2,ymm3     ; multiply and subtract
    vfnmadd132sd xmm0,xmm5,[ebx]   ; multiply, negate and add

In addition to the instructions created by the rule described above, there are
families of instructions with mnemonics starting with either "vfmaddsub" or
"vfmsubadd", followed by either "132", "213" or "231" and then either "ps" or
"pd" (the operation must always be on packed values in this case). They add
to the result of multiplication or subtract from it depending on the position
of value in packed data - instructions from the "vfmaddsub" group add when the
position is odd and subtract when the position is even, instructions from the
"vfmsubadd" group add when the position is even and subtract when the
position is odd. The rules for operands are the same as for other FMA
instructions.
  The FMA4 instructions are similar to FMA, but use syntax with four operands
and thus allow destination to be different from all the sources. Their
mnemonics are identical to FMA instructions with the "132", "213" or "231" cut
out, as having separate destination operand makes such selection of operands
superfluous. The multiplication is always performed on values from the first
and second source, and then the value from third source is added or
subtracted. Either second or third source can be a memory operand, and the
rules for the sizes of operands are the same as for FMA instructions.

    vfmaddpd ymm0,ymm1,[esi],ymm2  ; multiply and add
    vfmsubss xmm0,xmm1,xmm2,[ebx]  ; multiply and subtract

  The F16C extension consists of two instructions, "vcvtps2ph" and
"vcvtph2ps", which convert floating point values between single precision and
half precision (the 16-bit floating point format). "vcvtps2ph" takes three
operands: destination, source, and rounding controls. The third operand is
always an immediate, the source is either SSE or AVX register containing
single precision values, and the destination is SSE register or memory, the
size of memory is 64 bits when the source is SSE register and 128 bits when
the source is AVX register. "vcvtph2ps" takes two operands, the destination
that can be SSE or AVX register, and the source that is SSE register or memory
with size of the half of destination operand's size.
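  The operand sizes listed above can be sketched as follows (the registers and
addresses are arbitrary, and the rounding control of 0 is just a placeholder
value, see the Intel manuals for its encoding):

    vcvtps2ph [edi],ymm0,0       ; eight values stored into 128-bit memory
    vcvtps2ph xmm1,xmm0,0        ; four values, destination is SSE register
    vcvtph2ps ymm2,[esi]         ; source is 128-bit memory
    vcvtph2ps xmm3,xmm1          ; four values from the low half of xmm1
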
  The AMD XOP extension introduces a number of new vector instructions with
encoding and syntax analogous to AVX instructions. "vfrczps", "vfrczss",
"vfrczpd" and "vfrczsd" extract fractional portions of single or double
precision values, they all take two operands. The packed operations allow
either SSE or AVX register as destination, for the other two it has to be SSE
register. Source can be register of the same type as destination, or memory
of appropriate size (256-bit for destination being AVX register, 128-bit for
packed operation with destination being SSE register, 64-bit for operation
on a solitary double precision value and 32-bit for operation on a solitary
single precision value).

    vfrczps ymm0,[esi]           ; load fractional parts
    
  "vpcmov" copies bits from either first or second source into destination
depending on the values of corresponding bits in the fourth operand (the
selector). If the bit in selector is set, the corresponding bit from first
source is copied into the same position in destination, otherwise the bit from
second source is copied. Either second source or selector can be memory
location, 128-bit or 256-bit depending on whether SSE registers or AVX
registers are specified as the other operands.

    vpcmov xmm0,xmm1,xmm2,[ebx]  ; selector in memory
    vpcmov ymm0,ymm5,[esi],ymm2  ; source in memory
 
The family of packed comparison instructions takes four operands, the
destination and first source being SSE register, second source being SSE
register or 128-bit memory and the fourth operand being immediate value
defining the type of comparison. The mnemonic of the instruction is created
by appending to "vpcom" prefix either "b" or "ub" to compare signed or
unsigned bytes, "w" or "uw" to compare signed or unsigned words, "d" or "ud"
to compare signed or unsigned double words, "q" or "uq" to compare signed or
unsigned quad words. The respective values from the first and second source
are compared and the corresponding data element in destination is set to
either all ones or all zeros depending on the result of comparison. The fourth
operand has to specify one of the eight comparison types (table 2.5). All
these instructions have also variants with only three operands and the type
of comparison encoded within the instruction name by inserting the comparison
mnemonic after "vpcom".

    vpcomb   xmm0,xmm1,xmm2,4    ; test for equal bytes
    vpcomgew xmm0,xmm1,[ebx]     ; compare signed words
 
   Table 2.5  XOP comparisons
  /-------------------------------------------\
  | Code | Mnemonic | Description             |
  |======|==========|=========================|
  | 0    | lt       | less than               |
  | 1    | le       | less than or equal      |
  | 2    | gt       | greater than            |
  | 3    | ge       | greater than or equal   |
  | 4    | eq       | equal                   |
  | 5    | neq      | not equal               |
  | 6    | false    | false                   |
  | 7    | true     | true                    |
  \-------------------------------------------/
 
  "vpermil2ps" and "vpermil2pd" set the elements in destination register to
zero or to a value selected from first or second source depending on the
corresponding bit fields from the fourth operand (the selector) and the
immediate value provided in fifth operand. Refer to the AMD manuals for the
detailed explanation of the operation performed by these instructions. Each
of the first four operands can be a register, and either second source or
selector can be memory location, 128-bit or 256-bit depending on whether SSE
registers or AVX registers are used for the other operands.

    vpermil2ps ymm0,ymm3,ymm7,ymm2,0  ; permute from two sources
  
  "vphaddbw" adds pairs of adjacent signed bytes to form 16-bit values and
stores them at the same positions in destination. "vphaddubw" does the same
but treats the bytes as unsigned. "vphaddbd" and "vphaddubd" sum all bytes
(either signed or unsigned) in each four-byte block to 32-bit results,
"vphaddbq" and "vphaddubq" sum all bytes in each eight-byte block to
64-bit results, "vphaddwd" and "vphadduwd" add pairs of words to 32-bit
results, "vphaddwq" and "vphadduwq" sum all words in each four-word block to
64-bit results, "vphadddq" and "vphaddudq" add pairs of double words to 64-bit
results. "vphsubbw" subtracts in each two-byte block the byte at higher
position from the one at lower position, and stores the result as a signed
16-bit value at the corresponding position in destination, "vphsubwd"
subtracts in each two-word block the word at higher position from the one at
lower position and makes signed 32-bit results, "vphsubdq" subtracts in each
block of two double words the one at higher position from the one at lower
position and makes signed 64-bit results. Each of these instructions takes
two operands, the destination being SSE register, and the source being SSE
register or 128-bit memory.

    vphadduwq xmm0,xmm1          ; sum quadruplets of words
  
  "vpmacsww" and "vpmacssww" multiply the corresponding signed 16-bit values
from the first and second source and then add the products to the parallel
values from the third source, then "vpmacsww" takes the lowest 16 bits of the
result and "vpmacssww" saturates the result down to 16-bit value, and they
store the final 16-bit results in the destination. "vpmacsdd" and "vpmacssdd"
perform the analogous operation on 32-bit values. "vpmacswd" and "vpmacsswd"
do the same calculation only on the low 16-bit values from each 32-bit block
and form the 32-bit results. "vpmacsdql" and "vpmacssdql" perform such
operation on the low 32-bit values from each 64-bit block and form the 64-bit
results, while "vpmacsdqh" and "vpmacssdqh" do the same on the high 32-bit
values from each 64-bit block, also forming the 64-bit results. "vpmadcswd"
and "vpmadcsswd" multiply the corresponding signed 16-bit values from the
first and second source, then add together the two products obtained from each
pair of adjacent words and add this sum to the corresponding 32-bit element
from third source, storing the truncated or saturated 32-bit result in
destination. All these instructions take four operands, the second source
can be 128-bit memory or SSE register, all the other operands have to be
SSE registers.

    vpmacsdd xmm6,xmm1,[ebx],xmm6  ; accumulate product
 
  "vpperm" selects bytes from first and second source, optionally applies a
separate transformation to each of them, and stores them in the destination.
The bit fields in fourth operand (the selector) specify for each position in
destination what byte from which source is taken and what operation is applied
to it before it is stored there. Refer to the AMD manuals for the detailed
information about these bit fields. This instruction takes four operands,
either second source or selector can be a 128-bit memory (or they can be SSE
registers both), all the other operands have to be SSE registers.
  "vpshlb", "vpshlw", "vpshld" and "vpshlq" shift logically bytes, words,
double words or quad words respectively. The amount of bits to shift by is
specified for each element separately by the signed byte placed at the
corresponding position in the third operand. The source containing elements to
shift is provided as second operand. Either second or third operand can be
128-bit memory (or they can be SSE registers both) and the other operands have
to be SSE registers.

    vpshld xmm3,xmm1,[ebx]       ; shift double words from xmm1

"vpshab", "vpshaw", "vpshad" and "vpshaq" arithmetically shift bytes, words,
double words or quad words. These instructions follow the same rules as the
logical shifts described above. "vprotb", "vprotw", "vprotd" and "vprotq"
rotate bytes, words, double words or quad words. They follow the same rules as
shifts, but additionally allow third operand to be immediate value, in which
case the same amount of rotation is specified for all the elements in source.

    vprotb xmm0,[esi],3          ; rotate bytes to the left
 
  The MOVBE extension introduces just one new instruction, "movbe", which
swaps bytes in value from source before storing it in destination, so it can
be used to load and store big endian values. It takes two operands, either
the destination or source should be a 16-bit, 32-bit or 64-bit memory (the
last one being only allowed in long mode), and the other operand should be
a general register of the same size.
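  A short sketch of the allowed forms, with arbitrarily chosen registers and
addresses (the last line is an assumption-free restatement of the long mode
case and would not assemble in 16-bit or 32-bit mode):

    movbe eax,[esi]      ; load 32-bit big endian value
    movbe [edi],ax       ; store 16-bit value as big endian
    movbe rdx,[rbx]      ; 64-bit form, long mode only
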
  The BMI extension, consisting of two subsets - BMI1 and BMI2, introduces
new instructions operating on general registers, which use the same encoding
as AVX instructions and so allow the extended syntax. All these instructions
use 32-bit operands, and in long mode they also allow the forms with 64-bit
operands.
  "andn" calculates the bitwise AND of second source with the inverted bits
of first source and stores the result in destination. The destination and
the first source have to be general registers, the second source can be
general register or memory.

    andn edx,eax,[ebx]   ; bit-multiply inverted eax with memory
 
  "bextr" extracts from the first source the sequence of bits using an index
and length specified by bit fields in the second source operand and stores
it into destination. The lowest 8 bits of second source specify the position
of bit sequence to extract and the next 8 bits of second source specify the
length of sequence. The first source can be a general register or memory,
the other two operands have to be general registers.

    bextr eax,[esi],ecx  ; extract bit field from memory

  "blsi" extracts the lowest set bit from the source, setting all the other
bits in destination to zero. The destination must be a general register,
the source can be general register or memory.

    blsi rax,r11         ; isolate the lowest set bit
  
  "blsmsk" sets all the bits in the destination up to the lowest set bit in
the source, including this bit. "blsr" copies all the bits from the source to
destination except for the lowest set bit, which is replaced by zero. These
instructions follow the same rules for operands as "blsi".
  "tzcnt" counts the number of trailing zero bits, that is the zero bits up to
the lowest set bit of source value. This instruction is analogous to "lzcnt"
and follows the same rules for operands, so it also has a 16-bit version,
unlike the other BMI instructions.
  "bzhi" is a BMI2 instruction, which copies the bits from first source to
destination, zeroing all the bits up from the position specified by second
source. It follows the same rules for operands as "bextr".
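  The four instructions above might be used as in the following sketch, where
the registers, addresses and sample values in comments are chosen only for
illustration:

    blsmsk eax,ecx       ; ecx=01011000b gives eax=00001111b
    blsr edx,[ebx]       ; clear the lowest set bit of value from memory
    tzcnt ax,bx          ; the 16-bit form is allowed for this one
    bzhi eax,[esi],ecx   ; zero the bits from position given by ecx upwards
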
  "pext" uses a mask in second source operand to select bits from first
source and puts the selected bits as a continuous sequence into destination.
"pdep" performs the reverse operation - it takes sequence of bits from the
first source and puts them consecutively at the positions where the bits in
second source are set, setting all the other bits in destination to zero.
These BMI2 instructions follow the same rules for operands as "andn".
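  A sketch of both operations, with registers and sample values picked only
for illustration (the values in comments assume a mask of 11110000b):

    pext eax,edx,[ebx]   ; edx=10110100b, mask in memory: eax=1011b
    pdep ecx,eax,esi     ; eax=1011b, mask in esi: ecx=10110000b
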
  "mulx" is a BMI2 instruction which performs an unsigned multiplication of
value from EDX or RDX register (depending on the size of specified operands)
by the value from third operand, and stores the low half of result in the
second operand, and the high half of result in the first operand, and it does
it without affecting the flags. The third operand can be general register or
memory, and both the destination operands have to be general registers.

    mulx edx,eax,ecx     ; multiply edx by ecx into edx:eax
 
  "shlx", "shrx" and "sarx" are BMI2 instructions, which perform logical or
arithmetical shifts of value from first source by the amount specified by
second source, and store the result in destination without affecting the
flags. They have the same rules for operands as "bzhi" instruction.
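  For instance (a sketch only, the operands are arbitrary and the last line
requires long mode):

    shlx eax,[esi],ecx   ; value from memory shifted left by ecx
    shrx edx,eax,ebx     ; logical shift right of eax by ebx
    sarx rax,rdx,rcx     ; arithmetical shift right, 64-bit form
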
  "rorx" is a BMI2 instruction which rotates right the value from source
operand by the constant amount specified in third operand and stores the
result in destination without affecting the flags. The destination operand
has to be general register, the source operand can be general register or
memory, and the third operand has to be an immediate value.

    rorx eax,edx,7       ; rotate without affecting flags
                     
  The TBM is an extension designed by AMD to supplement the BMI set. The
"bextr" instruction is extended with a new form, in which second source is
a 32-bit immediate value. "blsic" is a new instruction which performs the
same operation as "blsi", but with the bits of result inverted. It uses the
same rules for operands as "blsi". "blsfill" is a new instruction, which takes
the value from source, sets all the bits below the lowest set bit and stores
the result in destination, it also uses the same rules for operands as "blsi".
  "blci", "blcic", "blcs", "blcmsk" and "blcfill" are instructions analogous
to "blsi", "blsic", "blsr", "blsmsk" and "blsfill" respectively, but they
perform the bit-inverted versions of the same operations. They follow the
same rules for operands as the instructions they reflect.
  "tzmsk" finds the lowest set bit in value from source operand, sets all bits
below it to 1 and all the rest of bits to zero, then writes the result to
destination. "t1mskc" finds the least significant zero bit in the value from
source operand, sets the bits below it to zero and all the other bits to 1,
and writes the result to destination. These instructions have the same rules
for operands as "blsi".
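  The syntax of some of the TBM instructions is sketched below; the operands
and sample values are arbitrary, and the immediate for "bextr" is assumed to
use the same layout of bit fields as the register form:

    bextr eax,ecx,0804h  ; 8 bits starting at position 4, immediate form
    blsfill edx,eax      ; set all the bits below the lowest set bit
    blcmsk ecx,[ebx]     ; source in memory
    tzmsk eax,edx        ; edx=01011000b gives eax=00000111b
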
      
 
2.1.24  Other extensions of instruction set

There are a number of additional instruction set extensions recognized by flat
assembler, and the general syntax of the instructions introduced by those
extensions is provided here. For detailed information on the operations
performed by them, check out the manuals from Intel (for the VMX, SMX, XSAVE,
RDRAND, FSGSBASE, INVPCID, HLE and RTM extensions) or AMD (for the SVM
extension).
  The Virtual-Machine Extensions (VMX) provide a set of instructions for the
management of virtual machines. The "vmxon" instruction, which enters the VMX
operation, requires a single 64-bit memory operand, which should be a physical
address of memory region, which the logical processor may use to support VMX
operation. The "vmxoff" instruction, which leaves the VMX operation, has no
operands. The "vmlaunch" and "vmresume", which launch or resume the virtual
machines, and "vmcall", which allows guest software to call the VM monitor,
use no operands either.
  The "vmptrld" loads the physical address of current Virtual Machine Control
Structure (VMCS) from its memory operand, "vmptrst" stores the pointer to
current VMCS into address specified by its memory operand, and "vmclear" sets
the launch state of the VMCS referenced by its memory operand to clear. These
three instructions all require a single 64-bit memory operand.
  The "vmread" reads from VMCS a field specified by the source operand and
stores it into the destination operand. The source operand should be a
general purpose register, and the destination operand can be a register or
memory. The "vmwrite" writes into a VMCS field specified by the destination
operand the value provided by source operand. The source operand can be a
general purpose register or memory, and the destination operand must be a
register. The size of operands for those instructions should be 64-bit when
in long mode, and 32-bit otherwise.
  The "invept" and "invvpid" invalidate the translation lookaside buffers
(TLBs) and paging-structure caches, either derived from extended page tables
(EPT), or based on the virtual processor identifier (VPID). These instructions
require two operands, the first one being the general purpose register
specifying the type of invalidation, and the second one being a 128-bit
memory operand providing the invalidation descriptor. The first operand
should be a 64-bit register when in long mode, and 32-bit register otherwise.
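  The lines below only sketch the operand forms of the VMX instructions for
32-bit code; the registers and addresses are arbitrary and the lines are not
meant to be a working sequence for entering VMX operation:

    vmxon [ebx]          ; 64-bit memory holding physical address
    vmptrld [esi]        ; load pointer to current VMCS
    vmread ebx,eax       ; field specified by eax is read into ebx
    vmwrite eax,[esi]    ; field specified by eax is written from memory
    invept eax,[ebx]     ; type in eax, 128-bit descriptor in memory
    vmlaunch             ; no operands
    vmxoff               ; no operands
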
  The Safer Mode Extensions (SMX) provide the functionalities available
through the "getsec" instruction. This instruction takes no operands, and
the function that is executed is determined by the contents of EAX register
upon executing this instruction.
  The Secure Virtual Machine (SVM) is a variant of virtual machine extension
used by AMD. The "skinit" instruction securely reinitializes the processor
allowing the startup of trusted software, such as the virtual machine monitor
(VMM). This instruction takes a single operand, which must be EAX, and
provides a physical address of the secure loader block (SLB).
  The "vmrun" instruction is used to start a guest virtual machine,
its only operand should be an accumulator register (AX, EAX or RAX, the
last one available only in long mode) providing the physical address of the
virtual machine control block (VMCB). The "vmsave" stores a subset of
processor state into VMCB specified by its operand, and "vmload" loads the
same subset of processor state from a specified VMCB. The same operand rules
as for the "vmrun" apply to those two instructions.
  "vmmcall" allows the guest software to call the VMM. This instruction takes
no operands.
  "stgi" sets the global interrupt flag to 1, and "clgi" zeroes it. These
instructions take no operands.
  "invlpga" invalidates the TLB mapping for a virtual page specified by the
first operand (which has to be accumulator register) and address space
identifier specified by the second operand (which must be ECX register).
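  The operand rules for the SVM instructions can be summarized with the
following sketch for 32-bit code (an illustration of syntax only, not a
usable sequence):

    skinit eax           ; physical address of SLB in eax
    vmrun eax            ; physical address of VMCB in eax
    vmsave eax           ; same operand rules as "vmrun"
    vmload eax
    invlpga eax,ecx      ; virtual page in eax, ASID in ecx
    stgi                 ; no operands
    clgi
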
  The XSAVE set of instructions allows saving and restoring processor state
components. "xsave" and "xsaveopt" store the components of processor state
defined by bit mask in EDX and EAX registers into area defined by memory
operand. "xrstor" restores from the area specified by memory operand the
components of processor state defined by mask in EDX and EAX. The "xsave64",
"xsaveopt64" and "xrstor64" are 64-bit versions of these instructions, allowed
only in long mode.
  "xgetbv" reads the contents of 64-bit XCR (extended control register)
specified in ECX register into EDX and EAX registers. "xsetbv" writes the
contents of EDX and EAX into the 64-bit XCR specified by ECX register. These
instructions have no operands.
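  A minimal sketch of these instructions follows. The mask value is an
assumption made for illustration (according to the Intel documentation bits 0
and 1 of the mask select the x87 and SSE state), and the memory area is
assumed to be already prepared as the Intel manuals require:

    mov eax,3            ; low 32 bits of mask
    xor edx,edx          ; high 32 bits of mask
    xsave [ebx]          ; store the selected components
    xrstor [ebx]         ; restore them from the same area
    xor ecx,ecx          ; select XCR number 0
    xgetbv               ; its contents go into edx and eax
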
  The RDRAND extension introduces one new instruction, "rdrand", which loads
the hardware-generated random value into general register. It takes one
operand, which can be 16-bit, 32-bit or 64-bit register (with the last one
being allowed only in long mode).
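  The sketch below shows the instruction in a simple retry loop. The use of
carry flag is not described in this manual, it is taken from the Intel
documentation, which states that CF is set when a random value was delivered;
the label name is arbitrary:

    retry:
    rdrand eax           ; try to get 32-bit random value
    jnc retry            ; CF clear means no value was available
    rdrand cx            ; the 16-bit form is also allowed
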
  The FSGSBASE extension adds long mode instructions that allow reading and
writing the segment base registers for FS and GS segments. "rdfsbase" and
"rdgsbase" read the corresponding segment base registers into operand, while
"wrfsbase" and "wrgsbase" write the value of operand into those registers.
All these instructions take one operand, which can be 32-bit or 64-bit general
register.
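  For example (a sketch with arbitrary registers, assuming the code is
assembled for long mode, which is selected here with the "use64" directive):

    use64
    rdfsbase rax         ; read the base of FS segment
    wrgsbase rcx         ; set the base of GS segment
    rdgsbase edx         ; the 32-bit register is also allowed
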
  The INVPCID extension adds "invpcid" instruction, which invalidates mapping
in the TLBs and paging caches based on the invalidation type specified in
first operand and PCID invalidate descriptor specified in second operand.
The first operand should be 32-bit general register when not in long mode,
or 64-bit general register when in long mode. The second operand should be
128-bit memory location.
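  A sketch of the 32-bit form, with arbitrarily chosen register and address
(in long mode the first operand would be a 64-bit register instead):

    invpcid eax,[ebx]    ; type in eax, 128-bit descriptor in memory
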
  The HLE and RTM extensions provide a set of instructions for the
transactional management. The "xacquire" and "xrelease" are new prefixes that
can be used with some of the instructions to start or end lock elision on the
memory address specified by prefixed instruction. The "xbegin" instruction
starts the transactional execution, its operand is the address of a fallback
routine that gets executed in case of transaction abort, specified like the
operand for near jump instruction. "xend" marks the end of transactional
execution region, it takes no operands. "xabort" forces the transaction abort,
it takes an 8-bit immediate value as its only operand, this value is passed in
the highest bits of EAX to the fallback routine. "xtest" checks whether there
is transactional execution in progress, this instruction takes no operands.
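  The structure of a transactional region might look like in the sketch below.
The label names and the operation performed on memory are hypothetical, and
the fallback path chosen here (repeating the operation with "lock" prefix) is
just one possible assumption:

    xbegin fallback        ; start the transaction
    add dword [ebx],1      ; work done transactionally
    xend                   ; commit
    jmp done
    fallback:
    lock add dword [ebx],1 ; executed when the transaction aborted
    done:

The prefixes of HLE are put before the instruction they apply to:

    xacquire lock bts dword [ebx],0  ; start lock elision
    xrelease mov dword [ebx],0       ; end lock elision
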
 
 
2.2  Control directives

This section describes the directives that control the assembly process. They
are processed during the assembly and may cause some blocks of instructions
to be assembled differently or not assembled at all.


2.2.1  Numerical constants

The "=" directive defines a numerical constant. It should be preceded by the
name for the constant and followed by the numerical expression providing the
value. The value of such constants can be a number or an address, but - unlike
labels - the numerical constants are not allowed to hold the register-based
addresses. Besides this difference, in their basic variant numerical constants
behave very much like labels and you can even forward-reference them (access
their values before they actually get defined).
  There is, however, a second variant of numerical constants, which is
recognized by assembler when you try to define a constant under a name for
which a numerical constant was already defined. In such case assembler
treats that constant as an assembly-time variable and allows it to be assigned
a new value, but forbids forward-referencing it (for obvious reasons). Let's
see both variants of numerical constants in one example:

    dd sum
    x = 1
    x = x+2
    sum = x

Here the "x" is an assembly-time variable, and every time it is accessed, the
value that was assigned to it most recently is used. Thus if we tried to
access the "x" before it gets defined the first time, like if we wrote "dd x"
in place of the "dd sum" instruction, it would cause an error. And when it is
re-defined with the "x = x+2" directive, the previous value of "x" is used to
calculate the new one. So when the "sum" constant gets defined, the "x" has
value of 3, and this value is assigned to the "sum". Since this one is defined
only once in source, it is the standard numerical constant, and can be
forward-referenced. So the "dd sum" is assembled as "dd 3". To read more about
how the assembler is able to resolve this, see section 2.2.6.
  The value of numerical constant can be preceded by size operator, which can
ensure that the value will fit in the range for the specified size, and can
also affect how some of the calculations inside the numerical expression are
performed. This example:

    c8 = byte -1
    c32 = dword -1

defines two different constants, the first one fits in 8 bits, the second one
fits in 32 bits.
  When you need to define constant with the value of address, which may be
3302
  When you need to define constant with the value of address, which may be
2261
register-based (and thus you cannot employ numerical constant for this
3303
register-based (and thus you cannot employ numerical constant for this
2262
purpose), you can use the extended syntax of "label" directive (already
3304
purpose), you can use the extended syntax of "label" directive (already
2263
described in section 1.2.3), like:
3305
described in section 1.2.3), like:
2264
 
3306
 
2265
    label myaddr at ebp+4
3307
    label myaddr at ebp+4
2266
 
3308
 
2267
which declares label placed at "ebp+4" address. However remember that labels,
3309
which declares label placed at "ebp+4" address. However remember that labels,
2268
unlike numerical constants, cannot become assembly-time variables.
3310
unlike numerical constants, cannot become assembly-time variables.
2269
 
3311
 
2270
 
3312
 
2271
2.2.2  Conditional assembly
3313
2.2.2  Conditional assembly
2272
 
3314
 
2273
"if" directive causes come block of instructions to be assembled only under
3315
"if" directive causes some block of instructions to be assembled only under
2274
certain condition. It should be followed by logical expression specifying the
3316
certain condition. It should be followed by logical expression specifying the
2275
condition, instructions in next lines will be assembled only when this
3317
condition, instructions in next lines will be assembled only when this
2276
condition is met, otherwise they will be skipped. The optional "else if"
3318
condition is met, otherwise they will be skipped. The optional "else if"
2277
directive followed with logical expression specifying additional condition
3319
directive followed with logical expression specifying additional condition
2278
begins the next block of instructions that will be assembled if previous
3320
begins the next block of instructions that will be assembled if previous
2279
conditions were not met, and the additional condition is met. The optional
3321
conditions were not met, and the additional condition is met. The optional
2280
"else" directive begins the block of instructions that will be assembled if
3322
"else" directive begins the block of instructions that will be assembled if
2281
all the conditions were not met. The "end if" directive ends the last block of
3323
all the conditions were not met. The "end if" directive ends the last block of
2282
instructions.
3324
instructions.
2283
  You should note that "if" directive is processed at assembly stage and
3325
  You should note that "if" directive is processed at assembly stage and
2284
therefore it doesn't affect any preprocessor directives, like the definitions
3326
therefore it doesn't affect any preprocessor directives, like the definitions
2285
of symbolic constants and macroinstructions - when the assembler recognizes the
3327
of symbolic constants and macroinstructions - when the assembler recognizes the
2286
"if" directive, all the preprocessing has been already finished.
3328
"if" directive, all the preprocessing has been already finished.
2287
  The logical expression consists of logical values and logical operators. The
3329
  The logical expression consists of logical values and logical operators. The
2288
logical operators are "~" for logical negation, "&" for logical and, "|" for
3330
logical operators are "~" for logical negation, "&" for logical and, "|" for
2289
logical or. The negation has the highest priority. Logical value can be a
3331
logical or. The negation has the highest priority. Logical value can be a
2290
numerical expression, it will be false if it is equal to zero, otherwise it
3332
numerical expression, it will be false if it is equal to zero, otherwise it
2291
will be true. Two numerical expressions can be compared using one of the
3333
will be true. Two numerical expressions can be compared using one of the
2292
following operators to make the logical value: "=" (equal), "<" (less),
3334
following operators to make the logical value: "=" (equal), "<" (less),
2293
">" (greater), "<=" (less or equal), ">=" (greater or equal),
3335
">" (greater), "<=" (less or equal), ">=" (greater or equal),
2294
"<>" (not equal).
3336
"<>" (not equal).
2295
  The "used" operator followed by a symbol name, is the logical value that
3337
  The "used" operator followed by a symbol name, is the logical value that
2296
checks whether the given symbol is used somewhere (it returns correct result
3338
checks whether the given symbol is used somewhere (it returns correct result
2297
even if symbol is used only after this check). The "defined" operator can be
3339
even if symbol is used only after this check). The "defined" operator can be
2298
followed by any expression, usually just by a single symbol name; it checks
3340
followed by any expression, usually just by a single symbol name; it checks
2299
whether the given expression contains only symbols that are defined in the
3341
whether the given expression contains only symbols that are defined in the
2300
source and accessible from the current position.
3342
source and accessible from the current position.
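  As a minimal sketch of the "used" operator (the "helper" label below is
hypothetical and not part of this manual), a routine can be wrapped so that
it is assembled only when something in the source refers to it:

    if used helper
    helper:
        xor eax,eax         ; body of the optional routine
        ret
    end if

If no instruction in the source references "helper", the whole block should
be skipped and take no space in the output.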
2301
  The following simple example uses the "count" constant that should be
3343
  With "relativeto" operator it is possible to check whether values of two
-
 
3344
expressions differ only by constant amount. The valid syntax is a numerical
-
 
3345
expression followed by "relativeto" and then another expression (possibly
-
 
3346
register-based). Labels that have no simple numerical value can be tested
-
 
3347
this way to determine what kind of operations may be possible with them.
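  A short sketch of how this might be used, assuming the intent is to detect
whether addresses in the current addressing space are plain numbers:

    if $ relativeto 0
        display 'current address is an absolute value',13,10
    else
        display 'current address is relocatable',13,10
    end if

Here "$ relativeto 0" is expected to be true only when the current address
differs from zero by a constant amount, that is when it is a simple number.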
-
 
3348
  The following simple example uses the "count" constant that should be
2302
defined somewhere in source:
3349
defined somewhere in source:
2303
 
3350
 
2304
    if count>0
3351
    if count>0
2305
        mov cx,count
3352
        mov cx,count
2306
        rep movsb
3353
        rep movsb
2307
    end if
3354
    end if
2308
 
3355
 
2309
These two assembly instructions will be assembled only if the "count" constant
3356
These two assembly instructions will be assembled only if the "count" constant
2310
is greater than 0. The next sample shows more complex conditional structure:
3357
is greater than 0. The next sample shows more complex conditional structure:
2311
 
3358
 
2312
    if count & ~ count mod 4
3359
    if count & ~ count mod 4
2313
        mov cx,count/4
3360
        mov cx,count/4
2314
        rep movsd
3361
        rep movsd
2315
    else if count>4
3362
    else if count>4
2316
        mov cx,count/4
3363
        mov cx,count/4
2317
        rep movsd
3364
        rep movsd
2318
        mov cx,count mod 4
3365
        mov cx,count mod 4
2319
        rep movsb
3366
        rep movsb
2320
    else
3367
    else
2321
        mov cx,count
3368
        mov cx,count
2322
        rep movsb
3369
        rep movsb
2323
    end if
3370
    end if
2324
 
3371
 
2325
The first block of instructions gets assembled when the "count" is non zero and
3372
The first block of instructions gets assembled when the "count" is non zero and
2326
divisible by four, if this condition is not met, the second logical expression,
3373
divisible by four, if this condition is not met, the second logical expression,
2327
which follows the "else if", is evaluated and if it's true, the second block
3374
which follows the "else if", is evaluated and if it's true, the second block
2328
of instructions gets assembled, otherwise the last block of instructions, which
3375
of instructions gets assembled, otherwise the last block of instructions, which
2329
follows the line containing only "else", is assembled.
3376
follows the line containing only "else", is assembled.
2330
  There are also operators that allow comparison of values being any chains of
3377
  There are also operators that allow comparison of values being any chains of
2331
symbols. The "eq" compares two such values whether they are exactly the same.
3378
symbols. The "eq" compares whether two such values are exactly the same.
2332
The "in" operator checks whether given value is a member of the list of values
3379
The "in" operator checks whether given value is a member of the list of values
2333
following this operator, the list should be enclosed between "<" and ">"
3380
following this operator, the list should be enclosed between "<" and ">"
2334
characters, its members should be separated with commas. The symbols are
3381
characters, its members should be separated with commas. The symbols are
2335
considered the same when they have the same meaning for the assembler - for
3382
considered the same when they have the same meaning for the assembler - for
2336
example "pword" and "fword" for assembler are the same and thus are not
3383
example "pword" and "fword" for assembler are the same and thus are not
2337
distinguished by the above operators. In the same way "16 eq 10h" is the true
3384
distinguished by the above operators. In the same way "16 eq 10h" is the true
2338
condition, however "16 eq 10+4" is not.
3385
condition, however "16 eq 10+4" is not.
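  The two operators can be tried directly in a small sketch; the messages
are only illustrative:

    if 16 eq 10h
        display '16 and 10h are the same value',13,10
    end if
    if eax in <eax,ebx,ecx>
        display 'eax is a member of the list',13,10
    end if

In practice such comparisons are mostly useful inside macroinstructions,
where the compared values come from the arguments.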
2339
  The "eqtype" operator checks whether the two compared values have the same
3386
  The "eqtype" operator checks whether the two compared values have the same
2340
structure, and whether the structural elements are of the same type. The
3387
structure, and whether the structural elements are of the same type. The
2341
distinguished types include numerical expressions, individual quoted strings,
3388
distinguished types include numerical expressions, individual quoted strings,
2342
floating point numbers, address expressions (the expressions enclosed in square
3389
floating point numbers, address expressions (the expressions enclosed in square
2343
brackets or preceded by "ptr" operator), instruction mnemonics, registers, size
3390
brackets or preceded by "ptr" operator), instruction mnemonics, registers, size
2344
operators, jump type and code type operators. And each of the special
3391
operators, jump type and code type operators. And each of the special
2345
characters that act as separators, like comma or colon, is the separate type
3392
characters that act as separators, like comma or colon, is the separate type
2346
itself. For example, two values, each one consisting of register name followed
3393
itself. For example, two values, each one consisting of register name followed
2347
by comma and numerical expression, will be regarded as of the same type, no
3394
by comma and numerical expression, will be regarded as of the same type, no
2348
matter what kind of register and how complicated numerical expression is used;
3395
matter what kind of register and how complicated numerical expression is used;
2349
with exception for the quoted strings and floating point values, which are the
3396
with exception for the quoted strings and floating point values, which are the
2350
special kinds of numerical expressions and are treated as different types. Thus
3397
special kinds of numerical expressions and are treated as different types. Thus
2351
"eax,16 eqtype fs,3+7" condition is true, but "eax,16 eqtype eax,1.6" is false.
3398
"eax,16 eqtype fs,3+7" condition is true, but "eax,16 eqtype eax,1.6" is false.
2352
 
3399
 
2353
 
3400
 
2354
2.2.3 Repeating blocks of instructions
3401
2.2.3 Repeating blocks of instructions
2355
 
3402
 
2356
"times" directive repeats one instruction specified number of times. It
3403
"times" directive repeats one instruction specified number of times. It
2357
should be followed by numerical expression specifying number of repeats and
3404
should be followed by numerical expression specifying number of repeats and
2358
the instruction to repeat (optionally colon can be used to separate number and
3405
the instruction to repeat (optionally colon can be used to separate number and
2359
instruction). When special symbol "%" is used inside the instruction, it is
3406
instruction). When special symbol "%" is used inside the instruction, it is
2360
equal to the number of current repeat. For example "times 5 db %" will define
3407
equal to the number of current repeat. For example "times 5 db %" will define
2361
five bytes with values 1, 2, 3, 4, 5. Recursive use of "times" directive is
3408
five bytes with values 1, 2, 3, 4, 5. Recursive use of "times" directive is
2362
also allowed, so "times 3 times % db %" will define six bytes with values
3409
also allowed, so "times 3 times % db %" will define six bytes with values
2363
1, 1, 2, 1, 2, 3.
3410
1, 1, 2, 1, 2, 3.
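  A common use of "times", shown here as a sketch that assumes a 512-byte
boot sector image whose preceding code is no longer than 510 bytes, is
padding the output to a fixed size:

    times 510-($-$$) db 0   ; fill up to offset 510 with zeros
    dw 0AA55h               ; signature in the last two bytes

The count is the number of bytes still missing up to offset 510, so the
signature word ends up in the last two bytes of the sector.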
2364
  "repeat" directive repeats the whole block of instructions. It should be
3411
  "repeat" directive repeats the whole block of instructions. It should be
2365
followed by numerical expression specifying number of repeats. Instructions
3412
followed by numerical expression specifying number of repeats. Instructions
2366
to repeat are expected in next lines, ended with the "end repeat" directive,
3413
to repeat are expected in next lines, ended with the "end repeat" directive,
2367
for example:
3414
for example:
2368
 
3415
 
2369
    repeat 8
3416
    repeat 8
2370
        mov byte [bx],%
3417
        mov byte [bx],%
2371
        inc bx
3418
        inc bx
2372
    end repeat
3419
    end repeat
2373
 
3420
 
2374
The generated code will store byte values from one to eight in the memory
3421
The generated code will store byte values from one to eight in the memory
2375
addressed by BX register.
3422
addressed by BX register.
2376
  Number of repeats can be zero, in that case the instructions are not
3423
  Number of repeats can be zero, in that case the instructions are not
2377
assembled at all.
3424
assembled at all.
2378
  The "break" directive allows to stop repeating earlier and continue assembly
3425
  The "break" directive allows to stop repeating earlier and continue assembly
2379
from the first line after the "end repeat". Combined with the "if" directive it
3426
from the first line after the "end repeat". Combined with the "if" directive it
2380
allows to stop repeating under some special condition, like:
3427
allows to stop repeating under some special condition, like:
2381
 
3428
 
2382
    s = x/2
3429
    s = x/2
2383
    repeat 100
3430
    repeat 100
2384
        if x/s = s
3431
        if x/s = s
2385
            break
3432
            break
2386
        end if
3433
        end if
2387
        s = (s+x/s)/2
3434
        s = (s+x/s)/2
2388
    end repeat
3435
    end repeat
2389
 
3436
 
2390
  The "while" directive repeats the block of instructions as long as the
3437
  The "while" directive repeats the block of instructions as long as the
2391
condition specified by the logical expression following it is true. The block
3438
condition specified by the logical expression following it is true. The block
2392
of instructions to be repeated should end with the "end while" directive.
3439
of instructions to be repeated should end with the "end while" directive.
2393
Before each repetition the logical expression is evaluated and when its value
3440
Before each repetition the logical expression is evaluated and when its value
2394
is false, the assembly is continued starting from the first line after the
3441
is false, the assembly is continued starting from the first line after the
2395
"end while". Also in this case the "%" symbol holds the number of current
3442
"end while". Also in this case the "%" symbol holds the number of current
2396
repeat. The "break" directive can be used to stop this kind of loop in the same
3443
repeat. The "break" directive can be used to stop this kind of loop in the same
2397
way as with "repeat" directive. The previous sample can be rewritten to use the
3444
way as with "repeat" directive. The previous sample can be rewritten to use the
2398
"while" instead of "repeat" this way:
3445
"while" instead of "repeat" this way:
2399
 
3446
 
2400
    s = x/2
3447
    s = x/2
2401
    while x/s <> s
3448
    while x/s <> s
2402
        s = (s+x/s)/2
3449
        s = (s+x/s)/2
2403
        if % = 100
3450
        if % = 100
2404
            break
3451
            break
2405
        end if
3452
        end if
2406
    end while
3453
    end while
2407
 
3454
 
2408
  The blocks defined with "if", "repeat" and "while" can be nested in any
3455
  The blocks defined with "if", "repeat" and "while" can be nested in any
2409
order, however they should be closed in the same order in which they were
3456
order, however they should be closed in the same order in which they were
2410
started. The "break" directive always stops processing the block that was
3457
started. The "break" directive always stops processing the block that was
2411
started last with either the "repeat" or "while" directive.
3458
started last with either the "repeat" or "while" directive.
2412
 
3459
 
2413
 
3460
 
2414
2.2.4  Addressing spaces
3461
2.2.4  Addressing spaces
2415
 
3462
 
2416
  "org" directive sets address at which the following code is expected to
3463
  "org" directive sets address at which the following code is expected to
2417
appear in memory. It should be followed by numerical expression specifying
3464
appear in memory. It should be followed by numerical expression specifying
2418
the address. This directive begins the new addressing space, the following
3465
the address. This directive begins the new addressing space, the following
2419
code itself is not moved in any way, but all the labels defined within it
3466
code itself is not moved in any way, but all the labels defined within it
2420
and the value of "$" symbol are affected as if it was put at the given
3467
and the value of "$" symbol are affected as if it was put at the given
2421
address. However it's the responsibility of programmer to put the code at
3468
address. However it's the responsibility of programmer to put the code at
2422
correct address at run-time.
3469
correct address at run-time.
2423
  The "load" directive allows to define constant with a binary value loaded
3470
  The "load" directive allows to define constant with a binary value loaded
2424
from the already assembled code. This directive should be followed by the name
3471
from the already assembled code. This directive should be followed by the name
2425
of the constant, then optionally size operator, then "from" operator and a
3472
of the constant, then optionally size operator, then "from" operator and a
2426
numerical expression specifying a valid address in current addressing space.
3473
numerical expression specifying a valid address in current addressing space.
2427
The size operator has unusual meaning in this case - it states how many bytes
3474
The size operator has unusual meaning in this case - it states how many bytes
2428
(up to 8) have to be loaded to form the binary value of constant. If no size
3475
(up to 8) have to be loaded to form the binary value of constant. If no size
2429
operator is specified, one byte is loaded (thus value is in range from 0 to
3476
operator is specified, one byte is loaded (thus value is in range from 0 to
2430
255). The loaded data cannot exceed current offset.
3477
255). The loaded data cannot exceed current offset.
2431
  The "store" directive can modify the already generated code by replacing
3478
  The "store" directive can modify the already generated code by replacing
2432
some of the previously generated data with the value defined by given
3479
some of the previously generated data with the value defined by given
2433
numerical expression, which follow. The expression can be preceded by the
3480
numerical expression, which follows. The expression can be preceded by the
2434
optional size operator to specify how large value the expression defines, and
3481
optional size operator to specify how large value the expression defines, and
2435
therefore how many bytes will be stored, if there is no size operator, the
3482
therefore how many bytes will be stored, if there is no size operator, the
2436
size of one byte is assumed. Then the "at" operator and the numerical
3483
size of one byte is assumed. Then the "at" operator and the numerical
2437
expression defining the valid address in current addressing code space, at
3484
expression defining the valid address in current addressing code space, at
2438
which the given value has to be stored should follow. This is a directive for
3485
which the given value has to be stored should follow. This is a directive for
2439
advanced applications and should be used carefully.
3486
advanced applications and should be used carefully.
2440
  Both "load" and "store" directives are limited to operate on places in
3487
  Both "load" and "store" directives are limited to operate on places in
2441
current addressing space. The "$$" symbol is always equal to the base address
3488
current addressing space. The "$$" symbol is always equal to the base address
2442
of current addressing space, and the "$" symbol is the address of current
3489
of current addressing space, and the "$" symbol is the address of current
2443
position in that addressing space, therefore these two values define limits
3490
position in that addressing space, therefore these two values define limits
2444
of the area, where "load" and "store" can operate.
3491
of the area, where "load" and "store" can operate.
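  As a minimal sketch (all label names here are hypothetical), "store" can
fill in a placeholder once the needed value is known, and "load" can read
it back:

    org 100h
    start:
        size_field dw 0                 ; placeholder, patched below
        db 'payload'
    finish:
        store word finish-start at size_field
        load check word from size_field

After these lines the word at "size_field" holds the value 9 (two bytes of
the placeholder plus seven bytes of text), and the "check" constant has the
same value.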
2445
  Combining the "load" and "store" directives allows to do things like encoding
3492
  Combining the "load" and "store" directives allows to do things like encoding
2446
some of the already generated code. For example to encode the whole code
3493
some of the already generated code. For example to encode the whole code
2447
generated in current addressing space you can use such block of directives:
3494
generated in current addressing space you can use such block of directives:
2448
 
3495
 
2449
    repeat $-$$
3496
    repeat $-$$
2450
        load a byte from $$+%-1
3497
        load a byte from $$+%-1
2451
        store byte a xor c at $$+%-1
3498
        store byte a xor c at $$+%-1
2452
    end repeat
3499
    end repeat
2453
 
3500
 
2454
and each byte of code will be xored with the value defined by "c" constant.
3501
and each byte of code will be xored with the value defined by "c" constant.
2455
  "virtual" defines virtual data at specified address. This data won't be
3502
  "virtual" defines virtual data at specified address. This data will not be
2456
included in the output file, but labels defined there can be used in other
3503
included in the output file, but labels defined there can be used in other
2457
parts of source. This directive can be followed by "at" operator and the
3504
parts of source. This directive can be followed by "at" operator and the
2458
numerical expression specifying the address for virtual data, otherwise it
3505
numerical expression specifying the address for virtual data, otherwise it
2459
uses current address, the same as "virtual at $". Instructions defining data
3506
uses current address, the same as "virtual at $". Instructions defining data
2460
are expected in next lines, ended with "end virtual" directive. The block of
3507
are expected in next lines, ended with "end virtual" directive. The block of
2461
virtual instructions itself is an independent addressing space, after it's
3508
virtual instructions itself is an independent addressing space, after it's
2462
ended, the context of previous addressing space is restored.
3509
ended, the context of previous addressing space is restored.
2463
  The "virtual" directive can be used to create union of some variables, for
3510
  The "virtual" directive can be used to create union of some variables, for
2464
example:
3511
example:
2465
 
3512
 
2466
    GDTR dp ?
3513
    GDTR dp ?
2467
    virtual at GDTR
3514
    virtual at GDTR
2468
        GDT_limit dw ?
3515
        GDT_limit dw ?
2469
        GDT_address dd ?
3516
        GDT_address dd ?
2470
    end virtual
3517
    end virtual
2471
 
3518
 
2472
It defines two labels for parts of the 48-bit variable at "GDTR" address.
3519
It defines two labels for parts of the 48-bit variable at "GDTR" address.
2473
  It can be also used to define labels for some structures addressed by a
3520
  It can be also used to define labels for some structures addressed by a
2474
register, for example:
3521
register, for example:
2475
 
3522
 
2476
    virtual at bx
3523
    virtual at bx
2477
        LDT_limit dw ?
3524
        LDT_limit dw ?
2478
        LDT_address dd ?
3525
        LDT_address dd ?
2479
    end virtual
3526
    end virtual
2480
 
3527
 
2481
With such definition instruction "mov ax,[LDT_limit]" will be assembled
3528
With such definition instruction "mov ax,[LDT_limit]" will be assembled
2482
to "mov ax,[bx]".
3529
to the same instruction as "mov ax,[bx]".
2483
  Declaring defined data values or instructions inside the virtual block would
3530
  Declaring defined data values or instructions inside the virtual block would
2484
also be useful, because the "load" directive can be used to load the values
3531
also be useful, because the "load" directive can be used to load the values
2485
from the virtually generated code into constants. This directive should be
3532
from the virtually generated code into constants. This directive should be
2486
used after the code it loads but before the virtual block ends, because it can
3533
used after the code it loads but before the virtual block ends, because it can
2487
only load the values from the same addressing space. For example:
3534
only load the values from the same addressing space. For example:
2488
 
3535
 
2489
    virtual at 0
3536
    virtual at 0
2490
        xor eax,eax
3537
        xor eax,eax
2491
        and edx,eax
3538
        and edx,eax
2492
        load zeroq dword from 0
3539
        load zeroq dword from 0
2493
    end virtual
3540
    end virtual
2494
 
3541
 
2495
The above piece of code will define the "zeroq" constant containing four bytes
3542
The above piece of code will define the "zeroq" constant containing four bytes
2496
of the machine code of the instructions defined inside the virtual block.
3543
of the machine code of the instructions defined inside the virtual block.
2497
This method can be also used to load some binary value from external file.
3544
This method can be also used to load some binary value from external file.
2498
For example this code:
3545
For example this code:
2499
 
3546
 
2500
    virtual at 0
3547
    virtual at 0
2501
        file 'a.txt':10h,1
3548
        file 'a.txt':10h,1
2502
        load char from 0
3549
        load char from 0
2503
    end virtual
3550
    end virtual
2504
 
3551
 
2505
loads the single byte from offset 10h in file "a.txt" into the "char"
3552
loads the single byte from offset 10h in file "a.txt" into the "char"
2506
constant.
3553
constant.
2507
  Any of the "section" directives described in 2.4 also begins a new
3554
  Any of the "section" directives described in 2.4 also begins a new
2508
addressing space.
3555
addressing space.
2509
 
3556
 
2510
 
3557
 
2511
2.2.5  Other directives
3558
2.2.5  Other directives
2512
 
3559
 
2513
"align" directive aligns code or data to the specified boundary. It should
3560
"align" directive aligns code or data to the specified boundary. It should
2514
be followed by a numerical expression specifying the number of bytes, to the
3561
be followed by a numerical expression specifying the number of bytes, to the
2515
multiple of which the current address has to be aligned. The boundary value
3562
multiple of which the current address has to be aligned. The boundary value
2516
has to be a power of two.
3563
has to be a power of two.
2517
  The "align" directive fills the bytes that had to be skipped to perform the
3564
  The "align" directive fills the bytes that had to be skipped to perform the
2518
alignment with the "nop" instructions and at the same time marks this area as
3565
alignment with the "nop" instructions and at the same time marks this area as
2519
uninitialized data, so if it is placed among other uninitialized data that
3566
uninitialized data, so if it is placed among other uninitialized data that
2520
wouldn't take space in the output file, the alignment bytes will act the same
3567
wouldn't take space in the output file, the alignment bytes will act the same
2521
way. If you need to fill the alignment area with some other values, you can
3568
way. If you need to fill the alignment area with some other values, you can
2522
combine "align" with "virtual" to get the size of alignment needed and then
3569
combine "align" with "virtual" to get the size of alignment needed and then
2523
create the alignment yourself, like:
3570
create the alignment yourself, like:
2524
 
3571
 
2525
    virtual
3572
    virtual
2526
        align 16
3573
        align 16
2527
        a = $ - $$
3574
        a = $ - $$
2528
    end virtual
3575
    end virtual
2529
    db a dup 0
3576
    db a dup 0
2530
 
3577
 
2531
The "a" constant is defined to be the difference between address after
3578
The "a" constant is defined to be the difference between address after
2532
alignment and address of the "virtual" block (see previous section), so it is
3579
alignment and address of the "virtual" block (see previous section), so it is
2533
equal to the size of needed alignment space.
3580
equal to the size of needed alignment space.
  "display" directive displays the message at the assembly time. It should
be followed by the quoted strings or byte values, separated with commas. It
can be used to display values of some constants, for example:

    bits = 16
    display 'Current offset is 0x'
    repeat bits/4
        d = '0' + $ shr (bits-%*4) and 0Fh
        if d > '9'
            d = d + 'A'-'9'-1
        end if
        display d
    end repeat
    display 13,10

This block of directives calculates the four hexadecimal digits of 16-bit
value and converts them into characters for displaying. Note that this will
not work if the addresses in current addressing space are relocatable (as it
might happen with PE or object output formats), since only absolute values can
be used this way. The absolute value may be obtained by calculating the
relative address, like "$-$$", or "rva $" in case of PE format.
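  As a sketch of that last remark, the same block can be adapted to display
the offset relative to the start of current addressing space, which is an
absolute value even when the addresses themselves are relocatable. The "offs"
name and the message text are assumptions made only for this illustration:

    bits = 16
    offs = $ - $$
    display 'Offset in current addressing space is 0x'
    repeat bits/4
        d = '0' + offs shr (bits-%*4) and 0Fh
        if d > '9'
            d = d + 'A'-'9'-1
        end if
        display d
    end repeat
    display 13,10

With "bits" set to 16 only the four lowest hexadecimal digits are shown, so
a larger offset would need a larger value of "bits".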
  The "err" directive immediately terminates the assembly process when it is
encountered by assembler.
  The "assert" directive tests whether the logical expression that follows it
is true, and if not, it signals the error.


2.2.6  Multiple passes

Because the assembler allows referencing some of the labels or constants
before they get actually defined, it has to predict the values of such labels
and if there is even a suspicion that prediction failed in at least one case,
it does one more pass, assembling the whole source, this time doing better
prediction based on the values the labels got in the previous pass.
  The changing values of labels can cause some instructions to have encodings
of different length, and this can cause the change in values of labels again.
And since the labels and constants can also be used inside the expressions that
affect the behavior of control directives, the whole block of source can be
processed completely differently during the new pass. Thus the assembler does
more and more passes, each time trying to do better predictions to approach
the final solution, when all the values get predicted correctly. It uses
various methods for predicting the values, which have been chosen to allow
finding in a few passes the solution of possibly smallest length for most of
the programs.
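  A minimal sketch of such dependency, assuming the default 16-bit binary
output; the label names and the amount of padding are chosen only for
illustration:

    org 100h
    start:
        jmp finish          ; "finish" is not yet defined when this is met
        db 100 dup 90h      ; padding between the jump and its target
    finish:
        ret

The jump can get the short encoding only when the distance to "finish" fits
in a signed byte, and the value of "finish" in turn depends on the length
chosen for the jump, so both have to be settled over the passes.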
  Some of the errors, like the values not fitting in required boundaries, are
not signaled during those intermediate passes, since it may happen that when
some of the values are predicted better, these errors will disappear. However
if assembler meets some illegal syntax construction or unknown instruction, it
always stops immediately. Also defining some label more than once causes such
error, because it makes the predictions groundless.
  Only the messages created with the "display" directive during the last
performed pass get actually displayed. In case when the assembly has been
stopped due to an error, these messages may reflect the predicted values that
are not yet resolved correctly.
  The solution may sometimes not exist and in such cases the assembler will
never manage to make correct predictions - for this reason there is a limit on
the number of passes, and when assembler reaches this limit, it stops and
displays the message that it is not able to generate the correct output.
Consider the following example:

    if ~ defined alpha
        alpha:
    end if

The "defined" operator gives the true value when the expression following it
could be calculated in this place, which in this case means that the "alpha"
label is defined somewhere. But the above block causes this label to be defined
only when the value given by "defined" operator is false, which leads to an
antinomy and makes it impossible to resolve such code. When processing the "if"
directive assembler has to predict whether the "alpha" label will be defined
somewhere (it wouldn't have to predict only if the label was already defined
earlier in this pass), and whatever the prediction is, the opposite always
happens. Thus the assembly will fail, unless the "alpha" label is defined
somewhere in source preceding the above block of instructions - in such case,
as it was already noted, the prediction is not needed and the block will just
get skipped.
  The above sample might have been written as an attempt to define the label
only when it was not yet defined. It fails, because the "defined" operator does
check whether the label is defined anywhere, and this includes the definition
inside this conditionally processed block. However adding some additional
condition may make it possible to get it resolved:

    if ~ defined alpha | defined @f
        alpha:
        @@:
    end if

The "@f" is always the same label as the nearest "@@" symbol in the source
following it, so the above sample would mean the same if any unique name was
used instead of the anonymous label. When "alpha" is not defined in any other
place in source, the only possible solution is when this block gets defined,
and this time this doesn't lead to the antinomy, because of the anonymous
label which makes this block self-establishing. To better understand this,
look at the block that has nothing more than this self-establishing:

    if defined @f
        @@:
    end if

This is an example of source that may have more than one solution, as both
cases when this block gets processed or not are equally correct. Which one of
those two solutions we get depends on the algorithm of the assembler, in case
of flat assembler - on the algorithm of predictions. Back to the previous
sample, when "alpha" is not defined anywhere else, the condition for "if" block
cannot be false, so we are left with only one possible solution, and we can
hope the assembler will arrive at it. On the other hand, when "alpha" is
defined in some other place, we've got two possible solutions again, but one of
them causes "alpha" to be defined twice, and such an error causes assembler to
abort the assembly immediately, as this is the kind of error that deeply
disturbs the process of resolving. So we can get such source either correctly
resolved or causing an error, and what we get may depend on the internal
choices made by the assembler.
  However there are some facts about such choices that are certain. When
assembler has to check whether the given symbol is defined and it was already
defined in the current pass, no prediction is needed - it was already noted
above. And when the given symbol has never been defined before, including all
the already finished passes, the assembler predicts it to be not defined.
Knowing this, we can expect that the simple self-establishing block shown
above will not be assembled at all and that the previous sample will resolve
correctly when "alpha" is defined somewhere before our conditional block,
while it will itself define "alpha" when it's not already defined earlier, thus
potentially causing the error because of double definition if the "alpha" is
also defined somewhere later.
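  A sketch of the case that can be expected to resolve, with "alpha" defined
earlier in the source (the value 0 is arbitrary and chosen only for this
illustration):

    alpha = 0

    if ~ defined alpha | defined @f
        alpha:
        @@:
    end if

As "alpha" is already defined in the current pass and the anonymous label has
never been defined, both parts of the condition should be false and the block
should get skipped without any double definition.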
  The "used" operator may be expected to behave in a similar manner in
analogous cases, however any other kinds of predictions may not be so simple
and you should never rely on them this way.
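  For the "used" operator the analogous construction might look like the
following sketch, where the "clear_ax" name is made up for this example:

    start:
        call clear_ax       ; the only place that uses "clear_ax"
        ret

    if used clear_ax
    clear_ax:
        xor ax,ax
        ret
    end if

Here the routine is expected to be assembled only because it is used
somewhere, and to be left out when the "call" line is removed.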
  The "err" directive, usually used to stop the assembly when some condition is
met, stops the assembly immediately, regardless of whether the current pass
is final or intermediate. So even when the condition that caused this directive
to be interpreted is mispredicted and temporary, and would eventually disappear
in the later passes, the assembly is stopped anyway.
  The "assert" directive signals the error only if its expression is false
after all the symbols have been resolved. You can use "assert 0" in place of
"err" when you do not want to have assembly stopped during the intermediate
passes.
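  The difference can be sketched on a size check. The 510-byte limit and the
label names are assumptions made only for this example:

    start:
        jmp finish
        db 100 dup 90h
    finish:
        ret

    if $ - start > 510
        err                     ; stops at once, even in an intermediate pass
    end if

    assert $ - start <= 510     ; signaled only after everything is resolved

Both checks express the same condition, but only the second one waits for the
final values before it reports the failure.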
 
 
2.3  Preprocessor directives

All preprocessor directives are processed before the main assembly process,
and therefore are not affected by the control directives. At this time also
all comments are stripped out.


2.3.1  Including source files

"include" directive includes the specified source file at the position where
it is used. It should be followed by the quoted name of file that should be
included, for example:

    include 'macros.inc'

The whole included file is preprocessed before preprocessing the lines next
to the line containing the "include" directive. There are no limits to the
number of included files as long as they fit in memory.
  The quoted path can contain environment variables enclosed within "%"
characters; they will be replaced with their values inside the path. Both the
"\" and "/" characters are allowed as path separators. The file is first
searched for in the directory containing file which included it and when it is
not found there, the search is continued in the directories specified in the
environment variable called INCLUDE (the multiple paths separated with
semicolons can be defined there, they will be searched in the same order as
specified). If file was not found in any of these places, preprocessor looks
for it in the directory containing the main source file (the one specified in
command line). These rules concern also paths given with the "file" directive.


2.3.2  Symbolic constants

The symbolic constants are different from the numerical constants, before the
assembly process they are replaced with their values everywhere in source
lines after their definitions, and anything can become their values.
  The definition of symbolic constant consists of name of the constant
followed by the "equ" directive. Everything that follows this directive will
become the value of constant. If the value of symbolic constant contains
other symbolic constants, they are replaced with their values before assigning
this value to the new constant. For example:

    d equ dword
    NULL equ d 0
    d equ edx

After these three definitions the value of "NULL" constant is "dword 0" and
the value of "d" is "edx". So, for example, "push NULL" will be assembled as
"push dword 0" and "push d" will be assembled as "push edx". And if then the
following line was put:

    d equ d,eax

the "d" constant would get the new value of "edx,eax". This way the growing
lists of symbols can be defined.
  "restore" directive allows getting back the previous value of redefined
symbolic constant. It should be followed by one or more names of symbolic
constants, separated with commas. So "restore d" after the above definitions
will give "d" constant back the value "edx", the second one will restore it to
value "dword", and one more will revert "d" to original meaning as if no such
constant was defined. If there was no constant defined of given name,
"restore" will not cause an error, it will be just ignored.
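  The whole sequence described above can be sketched as follows; the
comments state what the text of this section implies for each line:

    d equ dword
    d equ edx
    d equ d,eax         ; the value of "d" is now "edx,eax"
    restore d           ; the value of "d" is "edx" again
    restore d           ; the value of "d" is "dword" again
    push d 0            ; assembled as "push dword 0"
    restore d           ; "d" is no longer a symbolic constant
    restore d           ; nothing left to restore, this line is ignored

The "NULL" definition from the original sequence is left out here, as it does
not affect the value of "d".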
  Symbolic constant can be used to adjust the syntax of assembler to personal
preferences. For example the following set of definitions provides the handy
shortcuts for all the size operators:

    b equ byte
    w equ word
    d equ dword
    p equ pword
    f equ fword
    q equ qword
    t equ tword
    x equ dqword
    y equ qqword

  Because symbolic constant may also have an empty value, it can be used to
allow the syntax with "offset" word before any address value:

    offset equ

After this definition "mov ax,offset char" will be valid construction for
copying the offset of "char" variable into "ax" register, because "offset" is
replaced with an empty value, and therefore ignored.
  The "define" directive followed by the name of constant and then the value,
is the alternative way of defining symbolic constant. The only difference
between "define" and "equ" is that "define" assigns the value as it is, it does
not replace the symbolic constants with their values inside it.
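  The difference can be seen in this short sketch, where the names "a", "x"
and "y" are chosen only for illustration:

    a equ 1
    x equ a             ; "a" is replaced first, the value of "x" is "1"
    define y a          ; assigned as it is, the value of "y" is "a"

So a later redefinition of "a" would not affect the value stored in "x",
while the value of "y" still contains the name "a" itself.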
  Symbolic constants can also be defined with the "fix" directive, which has
the same syntax as "equ", but defines constants of high priority - they are
replaced with their symbolic values even before processing the preprocessor
directives and macroinstructions, the only exception is "fix" directive
itself, which has the highest possible priority, so it allows redefinition of
constants defined this way.
  The "fix" directive can be used for syntax adjustments related to directives
of preprocessor, which cannot be done with "equ" directive. For example:

    incl fix include

defines a short name for "include" directive, while the similar definition done
with "equ" directive wouldn't give such result, as standard symbolic constants
are replaced with their values after searching the line for preprocessor
directives.


2.3.3  Macroinstructions

"macro" directive allows you to define your own complex instructions, called
macroinstructions, the use of which can greatly simplify the process of
programming. In its simplest form it's similar to symbolic constant
definition. For example the following definition defines a shortcut for the
"test al,0xFF" instruction:

    macro tst {test al,0xFF}

After the "macro" directive there is a name of macroinstruction and then its
contents enclosed between the "{" and "}" characters. You can use "tst"
instruction anywhere after this definition and it will be assembled as
"test al,0xFF". Defining symbolic constant "tst" of that value would give the
similar result, but the difference is that the name of macroinstruction is
recognized only as an instruction mnemonic. Also, macroinstructions are
replaced with corresponding code even before the symbolic constants are
replaced with their values. So if you define macroinstruction and symbolic
constant of the same name, and use this name as an instruction mnemonic, it
will be replaced with the contents of macroinstruction, but it will be
replaced with the value of symbolic constant if used somewhere inside the
operands.
  The definition of macroinstruction can consist of many lines, because
"{" and "}" characters don't have to be in the same line as "macro" directive.
For example:

    macro stos0
     {
        xor al,al
        stosb
     }

The macroinstruction "stos0" will be replaced with these two assembly
instructions anywhere it's used.
  Like instructions which need some number of operands, the macroinstruction
can be defined to need some number of arguments separated with commas. The
names of needed arguments should follow the name of macroinstruction in the
line of "macro" directive and should be separated with commas if there is more
than one. Anywhere one of these names occurs in the contents of
macroinstruction, it will be replaced with corresponding value, provided when
the macroinstruction is used. Here is an example of a macroinstruction that
will do data alignment for binary output format:

    macro align value { rb (value-1)-($+value-1) mod value }

When the "align 4" instruction is found after this macroinstruction is
defined, it will be replaced with contents of this macroinstruction, and the
"value" will there become 4, so the result will be "rb (4-1)-($+4-1) mod 4".
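  To follow the arithmetic, here is a sketch for binary output with the
origin at zero (the data values are arbitrary and chosen only for this
illustration):

    macro align value { rb (value-1)-($+value-1) mod value }

    org 0
        db 1,2,3            ; after this line $ is equal to 3
        align 4             ; rb (4-1)-(3+4-1) mod 4, which is rb 3-2
        dd 12345678h        ; follows one reserved byte, at address 4

When $ is already a multiple of 4, the expression gives 3-3 and no bytes
are reserved.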
  If a macroinstruction is defined that uses an instruction with the same name
inside its definition, the previous meaning of this name is used. Useful
redefinition of macroinstructions can be done in that way, for example:

    macro mov op1,op2
     {
      if op1 in <ds,es,fs,gs,ss> & op2 in <cs,ds,es,fs,gs,ss>
        push  op2
        pop   op1
      else
        mov   op1,op2
      end if
     }

This macroinstruction extends the syntax of "mov" instruction, allowing both
operands to be segment registers. For example "mov ds,es" will be assembled as
"push es" and "pop ds". In all other cases the standard "mov" instruction will
be used. The syntax of this "mov" can be extended further by defining next
macroinstruction of that name, which will use the previous macroinstruction:

    macro mov op1,op2,op3
     {
      if op3 eq
        mov   op1,op2
      else
        mov   op1,op2
        mov   op2,op3
      end if
     }

It allows "mov" instruction to have three operands, but it can still have two
operands only, because when macroinstruction is given fewer arguments than it
needs, the rest of arguments will have empty values. When three operands are
given, this macroinstruction will become two macroinstructions of the previous
definition, so "mov es,ds,dx" will be assembled as "push ds", "pop es" and
"mov ds,dx".
  By placing the "*" after the name of argument you can mark the argument as
required - preprocessor will not allow it to have an empty value. For example
the above macroinstruction could be declared as "macro mov op1*,op2*,op3" to
make sure that first two arguments will always have to be given some non-empty
values.
  Alternatively, you can provide the default value for argument, by placing
the "=" followed by value after the name of argument. Then if the argument
has an empty value provided, the default value will be used instead.
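  A minimal sketch of such default value, using a hypothetical "stosfill"
macroinstruction that is not part of the original examples:

    macro stosfill char=20h
     {
        mov al,char
        stosb
     }

    stosfill 'A'    ; "char" becomes 'A'
    stosfill        ; "char" is empty, so 20h is used instead

The second use should be assembled as "mov al,20h" and "stosb".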
  When it's needed to provide macroinstruction with argument that contains
some commas, such argument should be enclosed between "<" and ">" characters.
If it contains more than one "<" character, the same number of ">" should be
used to tell that the value of argument ends.
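  For illustration, a sketch with a hypothetical "defstr" macroinstruction
(the name is an assumption made for this example), where the second argument
is a whole list of values:

    macro defstr name,data
     {
        name db data
     }

    defstr hello,<'Hello!',13,10,0>

The enclosing characters are not part of the value, so this should be
processed as "hello db 'Hello!',13,10,0". Without them the commas would
separate the quoted string from the rest, and only that string would become
the value of "data".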
  "purge" directive allows removing the last definition of specified
macroinstruction. It should be followed by one or more names of
macroinstructions, separated with commas. If such macroinstruction has not
been defined, you will not get any error. For example after having the syntax
of "mov" extended with the macroinstructions defined above, you can disable
syntax with three operands back by using "purge mov" directive. Next
"purge mov" will disable also syntax for two operands being segment registers,
and all the next such directives will do nothing.
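  The sequence described above can be sketched as follows, assuming that both
"mov" macroinstructions from the earlier examples have been defined:

    mov es,ds,dx    ; three operands allowed by the second definition
    purge mov       ; removes the three-operand definition
    mov ds,es       ; still allowed: "push es" and "pop ds"
    purge mov       ; removes the segment register definition
    mov ax,bx       ; only the standard instruction is left
    purge mov       ; nothing left to remove, but no error either

After the second "purge mov" a line like "mov ds,es" is no longer accepted,
because the standard instruction has no such form.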
  If after the "macro" directive you enclose some group of arguments' names in
square brackets, it will allow giving more values for this group of arguments
when using that macroinstruction. Any further argument given after the last
argument of such group will begin the new group and will become the first
argument of it. That's why after closing the square bracket no more argument
names can follow. The contents of macroinstruction will be processed for each
such group of arguments separately. The simplest example is to enclose one
argument name in square brackets:

    macro stoschar [char]
     {
        mov al,char
        stosb
     }

This macroinstruction accepts unlimited number of arguments, and each one
will be processed into these two instructions separately. For example
"stoschar 1,2,3" will be assembled as the following instructions:

    mov al,1
    stosb
    mov al,2
    stosb
    mov al,3
    stosb

  There are some special directives available only inside the definitions of
macroinstructions. "local" directive defines local names, which will be
replaced with unique values each time the macroinstruction is used. It should
be followed by names separated with commas. If the name given as parameter to
"local" directive begins with a dot or two dots, the unique labels generated
by each evaluation of macroinstruction will have the same properties.
This directive is usually needed for the constants or labels that
macroinstruction defines and uses internally. For example:

    macro movstr
     {
        local move
      move:
        lodsb
        stosb
        test al,al
        jnz move
     }

Each time this macroinstruction is used, "move" will become another unique
name in its instructions, so you will not get an error you normally get when
some label is defined more than once.
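  The effect can be pictured with the following sketch. The names with
numbers are only an assumed notation for the unique names, the actual form
of generated names is internal to the preprocessor:

    movstr          ; label becomes something like "move?1"
    movstr          ; label becomes something like "move?2"

Without the "local" directive both uses would define the same "move" label
and the second one would cause an error.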
  "forward", "reverse" and "common" directives divide macroinstruction into
blocks, each one processed after the processing of previous is finished. They
differ in behavior only if macroinstruction allows multiple groups of
arguments. Block of instructions that follows "forward" directive is processed
for each group of arguments, from first to last - exactly like the default
block (not preceded by any of these directives). Block that follows "reverse"
directive is processed for each group of arguments in reverse order - from
last to first. Block that follows "common" directive is processed only once,
commonly for all groups of arguments. Local name defined in one of the blocks
is available in all the following blocks when processing the same group of
arguments as when it was defined, and when it is defined in common block it is
available in all the following blocks not depending on which group of
arguments is processed.
  Here is an example of macroinstruction that will create the table of
addresses to strings followed by these strings:

    macro strtbl name,[string]
     {
      common
        label name dword
      forward
        local label
        dd label
      forward
        label db string,0
     }

First argument given to this macroinstruction will become the label for table
of addresses, next arguments should be the strings. First block is processed
only once and defines the label, second block for each string declares its
local name and defines the table entry holding the address to that string.
Third block defines the data of each string with the corresponding label.
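  As an illustration, assuming the use "strtbl strings,'First','Second'"
(with "label?1" and "label?2" standing for the generated unique names, whose
actual form is internal to the preprocessor), the result would be like:

    label strings dword
    dd label?1
    dd label?2
    label?1 db 'First',0
    label?2 db 'Second',0

so "strings" addresses a table of two pointers, each holding the address of
one zero-ended string.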
  The directive starting the block in macroinstruction can be followed by the
first instruction of this block in the same line, like in the following
example:

    macro stdcall proc,[arg]
     {
      reverse push arg
      common call proc
     }

This macroinstruction can be used for calling the procedures using STDCALL
convention, which has all the arguments pushed on stack in the reverse order.
For example "stdcall foo,1,2,3" will be assembled as:

    push 3
    push 2
    push 1
    call foo

  If some name inside macroinstruction has multiple values (it is either one
of the arguments enclosed in square brackets or local name defined in the
block following "forward" or "reverse" directive) and is used in block
following the "common" directive, it will be replaced with all of its values,
separated with commas. For example the following macroinstruction will pass
all of the additional arguments to the previously defined "stdcall"
macroinstruction:

    macro invoke proc,[arg]
     { common stdcall [proc],arg }

It can be used to call indirectly (by the pointer stored in memory) the
procedure using STDCALL convention.
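  For example, assuming that "foo_ptr" is a hypothetical variable holding the
address of procedure, "invoke foo_ptr,1,2" first becomes
"stdcall [foo_ptr],1,2" and then is assembled as:

    push 2
    push 1
    call [foo_ptr]

Only the "proc" argument gets the square brackets, the other arguments are
passed to "stdcall" unchanged.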
  Inside macroinstruction also special operator "#" can be used. This
operator causes two names to be concatenated into one name. It can be useful,
because it's done after the arguments and local names are replaced with their
values. The following macroinstruction will generate the conditional jump
according to the "cond" argument:

    macro jif op1,cond,op2,label
     {
        cmp op1,op2
        j#cond label
     }

For example "jif ax,ae,10h,exit" will be assembled as "cmp ax,10h" and
"jae exit" instructions.
  The "#" operator can be also used to concatenate two quoted strings into one.
Also conversion of name into a quoted string is possible, with the "`" operator,
which likewise can be used inside the macroinstruction. It converts the name
that follows it into a quoted string - but note that when it is followed by
a macro argument which is being replaced with value containing more than one
symbol, only the first of them will be converted, as the "`" operator converts
only one symbol that immediately follows it. Here's an example of utilizing
those two features:

    macro label name
     {
        label name
        if ~ used name
          display `name # " is defined but not used.",13,10
        end if
     }

When label defined with such macro is not used in the source, macro will warn
you with the message, telling which label it applies to.
  To make macroinstruction behave differently when some of the arguments are
of some special type, for example a quoted string, you can use "eqtype"
comparison operator. Here's an example of utilizing it to distinguish a
quoted string from another argument:

    macro message arg
     {
      if arg eqtype ""
        local str
        jmp   @f
        str   db arg,0Dh,0Ah,24h
        @@:
        mov   dx,str
      else
        mov   dx,arg
      end if
        mov   ah,9
        int   21h
     }

The above macro is designed for displaying messages in DOS programs. When the
argument of this macro is some number, label, or variable, the string from
that address is displayed, but when the argument is a quoted string, the
created code will display that string followed by the carriage return and
line feed.
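  A sketch of both uses, assuming a DOS program in which "text" is a
hypothetical label of a string ended with "$" character (24h), as function 9
of interrupt 21h requires:

    message "Hello!"    ; the string is defined in place and jumped over
    message text        ; only the address of existing string is loaded

    text db 'Bye!',0Dh,0Ah,24h

In the first case the macroinstruction adds the line ending and the "$"
character itself.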
  It is also possible to put a declaration of macroinstruction inside another
macroinstruction, so one macro can define another, but there is a problem
with such definitions caused by the fact that "}" character cannot occur
inside the macroinstruction, as it always means the end of definition. To
overcome this problem, the escaping of symbols inside macroinstruction can be
used. This is done by placing one or more backslashes in front of any other
symbol (even the special character). Preprocessor sees such sequence as a
single symbol, but each time it meets such symbol during the macroinstruction
processing, it cuts the backslash character from the front of it. For example
"\{" is treated as single symbol, but during processing of the macroinstruction
it becomes the "{" symbol. This makes it possible to put one definition of
macroinstruction inside another:

    macro ext instr
     {
      macro instr op1,op2,op3
       \{
        if op3 eq
          instr op1,op2
        else
          instr op1,op2
          instr op2,op3
        end if
       \}
     }

    ext add
    ext sub

The macro "ext" is defined correctly, but when it is used, the "\{" and "\}"
become the "{" and "}" symbols. So when the "ext add" is processed, the
contents of macro become valid definition of a macroinstruction and this way
the "add" macro becomes defined. In the same way "ext sub" defines the "sub"
macro. The use of "\{" symbol wasn't really necessary here, but is done this
way to make the definition clearer.
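  With both macroinstructions defined this way, a line like the following
(the registers are chosen only for illustration):

    add eax,ebx,ecx

should be assembled as "add eax,ebx" and "add ebx,ecx", while
"sub eax,ebx" is still assembled as the single instruction.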
  If some directives specific to macroinstructions, like "local" or "common"
are needed inside some macro embedded this way, they can be escaped in the same
way. Escaping the symbol with more than one backslash is also allowed, which
allows multiple levels of nesting the macroinstruction definitions.
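  A minimal sketch of escaped "local", where "defwait" and "waitkey" are
hypothetical names introduced only for this illustration:

    macro defwait name
     {
      macro name port
       \{
        \local again
          mov dx,port
        again:
          in al,dx
          test al,al
          jz again
       \}
     }

    defwait waitkey
    waitkey 60h
    waitkey 64h

Because of the backslash the "local" directive is left for the inner
macroinstruction, so each use of "waitkey" should get its own unique
"again" label. An unescaped "local" would be processed just once, when
"defwait" is used, and both uses of "waitkey" would then define the same
label.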
  Another technique for defining one macroinstruction by another is to
use the "fix" directive, which becomes useful when some macroinstruction only
begins the definition of another one, without closing it. For example:

    macro tmacro [params]
     {
      common macro params {
     }

    MACRO fix tmacro
    ENDM fix }

defines an alternative syntax for defining macroinstructions, which looks like:

    MACRO stoschar char
        mov al,char
        stosb
    ENDM

Note that symbol that has such customized definition must be defined with "fix"
directive, because only the prioritized symbolic constants are processed before
the preprocessor looks for the "}" character while defining the macro. This
might be a problem if one needed to perform some additional tasks at the end
of such definition, but there is one more feature which helps in such cases.
Namely it is possible to put any directive, instruction or macroinstruction
just after the "}" character that ends the macroinstruction and it will be
processed in the same way as if it was put in the next line.


2.3.4  Structures

"struc" directive is a special variant of "macro" directive that is used to
define data structures. Macroinstruction defined using the "struc" directive
must be preceded by a label (like the data definition directive) when it's
used. This label will be also attached at the beginning of every name starting
with dot in the contents of macroinstruction. The macroinstruction defined
using the "struc" directive can have the same name as some other
macroinstruction defined using the "macro" directive; structure
macroinstruction will not prevent the standard macroinstruction from being
processed when there is no label before it and vice versa. All the rules and
features concerning standard macroinstructions apply to structure
macroinstructions.
  Here is the sample of structure macroinstruction:

    struc point x,y
     {
        .x dw x
        .y dw y
     }

For example "my point 7,11" will define structure labeled "my", consisting of
two variables: "my.x" with value 7 and "my.y" with value 11.
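  In other words, the line "my point 7,11" should give roughly the same
result as the following lines written by hand:

    my:
    my.x dw 7
    my.y dw 11

so the values can be then accessed like "mov ax,[my.x]".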
  If somewhere inside the definition of structure the name consisting of a
single dot is found, it is replaced by the name of the label for the given
instance of structure and this label will not be defined automatically in
such case, which allows the definition to be completely customized. The
following example utilizes this feature to extend the data definition
directive "db" with ability to calculate the size of defined data:

    struc db [data]
     {
       common
        . db data
        .size = $ - .
     }

With such definition "msg db 'Hello!',13,10" will define also "msg.size"
constant, equal to the size of defined data in bytes.
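  A short sketch of using that constant (the choice of ECX register is only
an assumption for this illustration):

    msg db 'Hello!',13,10

    mov ecx,msg.size    ; the same as "mov ecx,8"

The value is 8 here, because the quoted string defines six bytes and two more
bytes follow it.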
  Defining data structures addressed by registers or absolute values should be
done using the "virtual" directive with structure macroinstruction
(see 2.2.4).
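  A minimal sketch, assuming the "point" structure defined earlier in this
section and a pointer to such structure kept in EBX register (the "pt" name
is hypothetical):

    virtual at ebx
      pt point ?,?
    end virtual

    mov ax,[pt.x]   ; the same as "mov ax,[ebx]"
    mov cx,[pt.y]   ; the same as "mov cx,[ebx+2]"

No data is put into output here, the labels only describe the layout.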
  "restruc" directive removes the last definition of the structure, just like
"purge" does with macroinstructions and "restore" with symbolic constants.
It also has the same syntax - should be followed by one or more names of
structure macroinstructions, separated with commas.


2.3.5  Repeating macroinstructions

The "rept" directive is a special kind of macroinstruction, which makes given
amount of duplicates of the block enclosed with braces. The basic syntax is
"rept" directive followed by number and then block of source enclosed between
the "{" and "}" characters. The simplest example:

    rept 5 { in al,dx }

will make five duplicates of the "in al,dx" line. The block of instructions
is defined in the same way as for the standard macroinstruction and any
special operators and directives which can be used only inside
macroinstructions are also allowed here. When the given count is zero, the
block is simply skipped, as if you defined macroinstruction but never used
it. The number of repetitions can be followed by the name of counter symbol,
which will get replaced symbolically with the number of duplicate currently
generated. So this:

    rept 3 counter
     {
        byte#counter db counter
     }

will generate lines:

    byte1 db 1
    byte2 db 2
    byte3 db 3

The repetition mechanism applied to "rept" blocks is the same as the one used
to process multiple groups of arguments for macroinstructions, so directives
3178
like "forward", "common" and "reverse" can be used in their usual meaning.
4243
like "forward", "common" and "reverse" can be used in their usual meaning.
3179
Thus such macroinstruction:
4244
Thus such macroinstruction:
3180
 
4245
 
3181
    rept 7 num { reverse display `num }
4246
    rept 7 num { reverse display `num }
3182
 
4247
 
3183
will display digits from 7 to 1 as text. The "local" directive behaves in the
4248
will display digits from 7 to 1 as text. The "local" directive behaves in the
3184
same way as inside macroinstruction with multiple groups of arguments, so:
4249
same way as inside macroinstruction with multiple groups of arguments, so:
3185
 
4250
 
3186
    rept 21
4251
    rept 21
3187
     {
4252
     {
3188
       local label
4253
       local label
3189
       label: loop label
4254
       label: loop label
3190
     }
4255
     }
3191
 
4256
 
3192
will generate unique label for each duplicate.
4257
will generate unique label for each duplicate.
3193
  The counter symbol by default counts from 1, but you can declare different
4258
  The counter symbol by default counts from 1, but you can declare different
3194
base value by placing the number preceded by colon immediately after the name
4259
base value by placing the number preceded by colon immediately after the name
3195
of counter. For example:
4260
of counter. For example:
3196
 
4261
 
3197
    rept 8 n:0 { pxor xmm#n,xmm#n }
4262
    rept 8 n:0 { pxor xmm#n,xmm#n }
3198
 
4263
 
3199
will generate code which will clear the contents of eight SSE registers.
4264
will generate code which will clear the contents of eight SSE registers.
3200
You can define multiple counters separated with commas, and each one can have
4265
You can define multiple counters separated with commas, and each one can have
3201
different base.
4266
different base.
3202
  The "irp" directive iterates the single argument through the given list of
4267
  The number of repetitions and the base values for counters can be specified
-
 
4268
using the numerical expressions with operator rules identical as in the case
-
 
4269
of assembler. However each value used in such expression must either be a
-
 
4270
directly specified number, or a symbolic constant with value also being an
-
 
4271
expression that can be calculated by preprocessor (in such case the value
-
 
4272
of expression associated with symbolic constant is calculated first, and then
-
 
4273
substituted into the outer expression in place of that constant). If you need
-
 
4274
repetitions based on values that can only be calculated at assembly time, use
-
 
4275
one of the code repeating directives that are processed by assembler, see
-
 
4276
section 2.2.3.
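  As a brief sketch of the rule above (the "COUNT" name is only illustrative
and is not taken from the original text), a symbolic constant holding an
expression can be used in the number of repetitions:

    define COUNT 2+2
    rept COUNT*2 n:0 { db n }   ; COUNT is calculated first, giving 4*2

Because the value of "COUNT" is calculated before it is substituted into the
outer expression, the count should be 8 rather than 6, and the lines from
"db 0" up to "db 7" are generated.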
-
 
4277
  The "irp" directive iterates the single argument through the given list of
3203
parameters. The syntax is "irp" followed by the argument name, then the comma
4278
parameters. The syntax is "irp" followed by the argument name, then the comma
3204
and then the list of parameters. The parameters are specified in the same
4279
and then the list of parameters. The parameters are specified in the same
3205
way like in the invocation of standard macroinstruction, so they have to be
4280
way like in the invocation of standard macroinstruction, so they have to be
3206
separated with commas and each one can be enclosed with the "<" and ">"
4281
separated with commas and each one can be enclosed with the "<" and ">"
3207
characters. Also the name of argument may be followed by "*" to mark that it
4282
characters. Also the name of argument may be followed by "*" to mark that it
3208
cannot get an empty value. Such block:
4283
cannot get an empty value. Such block:
3209
 
4284
 
3210
   irp value, 2,3,5
4285
   irp value, 2,3,5
3211
    { db value }
4286
    { db value }
3212
 
4287
 
3213
will generate lines:
4288
will generate lines:
3214
 
4289
 
3215
   db 2
4290
   db 2
3216
   db 3
4291
   db 3
3217
   db 5
4292
   db 5
3218
 
4293
 
3219
The "irps" directive iterates through the given list of symbols, it should
4294
The "irps" directive iterates through the given list of symbols, it should
3220
be followed by the argument name, then the comma and then the sequence of any
4295
be followed by the argument name, then the comma and then the sequence of any
3221
symbols. Each symbol in this sequence, no matter whether it is the name
4296
symbols. Each symbol in this sequence, no matter whether it is the name
3222
symbol, symbol character or quoted string, becomes an argument value for one
4297
symbol, symbol character or quoted string, becomes an argument value for one
3223
iteration. If there are no symbols following the comma, no iteration is done
4298
iteration. If there are no symbols following the comma, no iteration is done
3224
at all. This example:
4299
at all. This example:
3225
 
4300
 
3226
   irps reg, al bx ecx
4301
   irps reg, al bx ecx
3227
    { xor reg,reg }
4302
    { xor reg,reg }
3228
 
4303
 
3229
will generate lines:
4304
will generate lines:
3230
 
4305
 
3231
   xor al,al
4306
   xor al,al
3232
   xor bx,bx
4307
   xor bx,bx
3233
   xor ecx,ecx
4308
   xor ecx,ecx
3234
 
4309
 
3235
The blocks defined by the "irp" and "irps" directives are also processed in
4310
The blocks defined by the "irp" and "irps" directives are also processed in
3236
the same way as any macroinstructions, so operators and directives specific
4311
the same way as any macroinstructions, so operators and directives specific
3237
to macroinstructions may be freely used also in this case.
4312
to macroinstructions may be freely used also in this case.
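  For illustration, a minimal sketch (not taken from the original text) of
the "forward" and "reverse" directives used inside an "irp" block to save and
restore a list of registers:

    irp reg, eax,ebx,ecx
     {
       forward push reg
       reverse pop reg
     }

Under the rules described above this should generate three "push"
instructions in the listed order, followed by three "pop" instructions in the
opposite order.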
3238
 
4313
 
3239
 
4314
 
3240
2.3.6  Conditional preprocessing
4315
2.3.6  Conditional preprocessing
3241
 
4316
 
3242
"match" directive causes some block of source to be preprocessed and passed
4317
"match" directive causes some block of source to be preprocessed and passed
3243
to assembler only when the given sequence of symbols matches the specified
4318
to assembler only when the given sequence of symbols matches the specified
3244
pattern. The pattern comes first, ended with comma, then the symbols that have
4319
pattern. The pattern comes first, ended with comma, then the symbols that have
3245
to be matched with the pattern, and finally the block of source, enclosed
4320
to be matched with the pattern, and finally the block of source, enclosed
3246
within braces as macroinstruction.
4321
within braces as macroinstruction.
3247
  There are a few rules for building the expression for matching, first is
4322
  There are a few rules for building the expression for matching, first is
3248
that any of symbol characters and any quoted string should be matched exactly
4323
that any of symbol characters and any quoted string should be matched exactly
3249
as is. In this example:
4324
as is. In this example:
3250
 
4325
 
3251
    match +,+ { include 'first.inc' }
4326
    match +,+ { include 'first.inc' }
3252
    match +,- { include 'second.inc' }
4327
    match +,- { include 'second.inc' }
3253
 
4328
 
3254
the first file will get included, since "+" after comma matches the "+" in
4329
the first file will get included, since "+" after comma matches the "+" in
3255
pattern, and the second file won't be included, since there is no match.
4330
pattern, and the second file will not be included, since there is no match.
3256
  To match any other symbol literally, it has to be preceded by "=" character
4331
  To match any other symbol literally, it has to be preceded by "=" character
3257
in the pattern. Also to match the "=" character itself, or the comma, the
4332
in the pattern. Also to match the "=" character itself, or the comma, the
3258
"==" and "=," constructions have to be used. For example the "=a==" pattern
4333
"==" and "=," constructions have to be used. For example the "=a==" pattern
3259
will match the "a=" sequence.
4334
will match the "a=" sequence.
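  A small sketch of these rules, with arbitrary "db" lines standing in for
the conditional block:

    match =a==, a= { db 1 }     ; pattern matches, "db 1" is generated
    match =a==, b= { db 2 }     ; no match, nothing is generated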
3260
  If some name symbol is placed in the pattern, it matches any sequence
4335
  If some name symbol is placed in the pattern, it matches any sequence
3261
consisting of at least one symbol and then this name is replaced with the
4336
consisting of at least one symbol and then this name is replaced with the
3262
matched sequence everywhere inside the following block, analogously to the
4337
matched sequence everywhere inside the following block, analogously to the
3263
parameters of macroinstruction. For instance:
4338
parameters of macroinstruction. For instance:
3264
 
4339
 
3265
    match a-b, 0-7
4340
    match a-b, 0-7
3266
     { dw a,b-a }
4341
     { dw a,b-a }
3267
 
4342
 
3268
will generate the "dw 0,7-0" instruction. Each name is always matched with
4343
will generate the "dw 0,7-0" instruction. Each name is always matched with
3269
as few symbols as possible, leaving the rest for the following ones, so in
4344
as few symbols as possible, leaving the rest for the following ones, so in
3270
this case:
4345
this case:
3271
 
4346
 
3272
    match a b, 1+2+3 { db a }
4347
    match a b, 1+2+3 { db a }
3273
 
4348
 
3274
the "a" name will match the "1" symbol, leaving the "+2+3" sequence to be
4349
the "a" name will match the "1" symbol, leaving the "+2+3" sequence to be
3275
matched with "b". But in this case:
4350
matched with "b". But in this case:
3276
 
4351
 
3277
    match a b, 1 { db a }
4352
    match a b, 1 { db a }
3278
 
4353
 
3279
there will be nothing left for "b" to match, so the block won't get processed
4354
there will be nothing left for "b" to match, so the block will not get 
3280
at all.
4355
processed at all.
3281
  The block of source defined by match is processed in the same way as any
4356
  The block of source defined by match is processed in the same way as any
3282
macroinstruction, so any operators specific to macroinstructions can be used
4357
macroinstruction, so any operators specific to macroinstructions can be used
3283
also in this case.
4358
also in this case.
3284
  What makes "match" directive more useful is the fact, that it replaces the
4359
  What makes "match" directive more useful is the fact, that it replaces the
3285
symbolic constants with their values in the matched sequence of symbols (that
4360
symbolic constants with their values in the matched sequence of symbols (that
3286
is everywhere after comma up to the beginning of the source block) before
4361
is everywhere after comma up to the beginning of the source block) before
3287
performing the match. Thanks to this it can be used for example to process
4362
performing the match. Thanks to this it can be used for example to process
3288
some block of source under the condition that some symbolic constant has the
4363
some block of source under the condition that some symbolic constant has the
3289
given value, like:
4364
given value, like:
3290
 
4365
 
3291
    match =TRUE, DEBUG { include 'debug.inc' }
4366
    match =TRUE, DEBUG { include 'debug.inc' }
3292
 
4367
 
3293
which will include the file only when the symbolic constant "DEBUG" was
4368
which will include the file only when the symbolic constant "DEBUG" was
3294
defined with value "TRUE".
4369
defined with value "TRUE".
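  A sketch of how such a switch might be used, assuming that "DEBUG" is set
by the programmer to either "TRUE" or "FALSE" (the displayed messages are only
illustrative):

    DEBUG equ TRUE

    match =TRUE, DEBUG { display 'debug build',13,10 }
    match =FALSE, DEBUG { display 'release build',13,10 }

Only the first block is processed here. If "DEBUG" were not defined as a
symbolic constant at all, the matched sequence would be the "DEBUG" symbol
itself and neither pattern would match.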
3295
 
4370
 
3296
 
4371
 
3297
2.3.7  Order of processing
4372
2.3.7  Order of processing
3298
 
4373
 
3299
When combining various features of the preprocessor, it's important to know
4374
When combining various features of the preprocessor, it's important to know
3300
the order in which they are processed. As it was already noted, the highest
4375
the order in which they are processed. As it was already noted, the highest
3301
priority has the "fix" directive and the replacements defined with it. This
4376
priority has the "fix" directive and the replacements defined with it. This
3302
is done completely before doing any other preprocessing, therefore this
4377
is done completely before doing any other preprocessing, therefore this
3303
piece of source:
4378
piece of source:
3304
 
4379
 
3305
    V fix {
4380
    V fix {
3306
      macro empty
4381
      macro empty
3307
       V
4382
       V
3308
    V fix }
4383
    V fix }
3309
       V
4384
       V
3310
 
4385
 
3311
becomes a valid definition of an empty macroinstruction. It can be interpreted
4386
becomes a valid definition of an empty macroinstruction. It can be interpreted
3312
that the "fix" directive and prioritized symbolic constants are processed in
4387
that the "fix" directive and prioritized symbolic constants are processed in
3313
a separate stage, and all other preprocessing is done after on the resulting
4388
a separate stage, and all other preprocessing is done after on the resulting
3314
source.
4389
source.
3315
  The standard preprocessing that comes after, on each line begins with
4390
  The standard preprocessing that comes after, on each line begins with
3316
recognition of the first symbol. It begins with checking for the preprocessor
4391
recognition of the first symbol. It starts with checking for the preprocessor
3317
directives, and when none of them is detected, preprocessor checks whether the
4392
directives, and when none of them is detected, preprocessor checks whether the
3318
first symbol is macroinstruction. If no macroinstruction is found, it moves
4393
first symbol is macroinstruction. If no macroinstruction is found, it moves
3319
to the second symbol of line, and again begins with checking for directives,
4394
to the second symbol of line, and again begins with checking for directives,
3320
which in this case is only the "equ" directive, as this is the only one that
4395
which in this case is only the "equ" directive, as this is the only one that
3321
occurs as the second symbol in line. If there's no directive, the second
4396
occurs as the second symbol in line. If there is no directive, the second
3322
symbol is checked for the case of structure macroinstruction and when none
4397
symbol is checked for the case of structure macroinstruction and when none
3323
of those checks gives the positive result, the symbolic constants are replaced
4398
of those checks gives the positive result, the symbolic constants are replaced
3324
with their values and such line is passed to the assembler.
4399
with their values and such line is passed to the assembler.
3325
  To see it on the example, assume that there is defined the macroinstruction
4400
  To see it on the example, assume that there is defined the macroinstruction
3326
called "foo" and the structure macroinstruction called "bar". Those lines:
4401
called "foo" and the structure macroinstruction called "bar". Those lines:
3327
 
4402
 
3328
    foo equ
4403
    foo equ
3329
    foo bar
4404
    foo bar
3330
 
4405
 
3331
would be then both interpreted as invocations of macroinstruction "foo", since
4406
would be then both interpreted as invocations of macroinstruction "foo", since
3332
the meaning of the first symbol overrides the meaning of second one.
4407
the meaning of the first symbol overrides the meaning of second one.
3333
  The macroinstructions generate the new lines from their definition blocks,
4408
  When the macroinstruction generates the new lines from its definition block,
3334
replacing the parameters with their values and then processing the "#" and "`"
4409
in every line it first scans for macroinstruction directives, and interprets
-
 
4410
them accordingly. All the other content in the definition block is used to
-
 
4411
brew the new lines, replacing the macroinstruction parameters with their values
-
 
4412
and then processing the symbol escaping and "#" and "`" operators. The
3335
operators. The conversion operator has the higher priority than concatenation.
4413
conversion operator has the higher priority than concatenation and if any of
3336
After this is completed, the newly generated line goes through the standard
4414
them operates on the escaped symbol, the escaping is cancelled before finishing
-
 
4415
the operation. After this is completed, the newly generated line goes through
3337
preprocessing, as described above.
4416
the standard preprocessing, as described above.
3338
  Though the symbolic constants are usually only replaced in the lines, where
4417
  Though the symbolic constants are usually only replaced in the lines, where
3339
no preprocessor directives nor macroinstructions have been found, there are some
4418
no preprocessor directives nor macroinstructions have been found, there are some
3340
special cases where those replacements are performed in the parts of lines
4419
special cases where those replacements are performed in the parts of lines
3341
containing directives. First one is the definition of symbolic constant, where
4420
containing directives. First one is the definition of symbolic constant, where
3342
the replacements are done everywhere after the "equ" keyword and the resulting
4421
the replacements are done everywhere after the "equ" keyword and the resulting
3343
value is then assigned to the new constant (see 2.3.2). The second such case
4422
value is then assigned to the new constant (see 2.3.2). The second such case
3344
is the "match" directive, where the replacements are done in the symbols
4423
is the "match" directive, where the replacements are done in the symbols
3345
following comma before matching them with pattern. These features can be used
4424
following comma before matching them with pattern. These features can be used
3346
for example to maintain the lists, like this set of definitions:
4425
for example to maintain the lists, like this set of definitions:
3347
 
4426
 
3348
    list equ
4427
    list equ
3349
 
4428
 
3350
    macro append item
4429
    macro append item
3351
     {
4430
     {
3352
       match any, list \{ list equ list,item \}
4431
       match any, list \{ list equ list,item \}
3353
       match , list \{ list equ item \}
4432
       match , list \{ list equ item \}
3354
     }
4433
     }
3355
 
4434
 
3356
The "list" constant is here initialized with empty value, and the "append"
4435
The "list" constant is here initialized with empty value, and the "append"
3357
macroinstruction can be used to add the new items into this list, separating
4436
macroinstruction can be used to add the new items into this list, separating
3358
them with commas. The first match in this macroinstruction occurs only when
4437
them with commas. The first match in this macroinstruction occurs only when
3359
the value of list is not empty (see 2.3.6), in such case the new value for the
4438
the value of list is not empty (see 2.3.6), in such case the new value for the
3360
list is the previous one with the comma and the new item appended at the end.
4439
list is the previous one with the comma and the new item appended at the end.
3361
The second match happens only when the list is still empty, and in such case
4440
The second match happens only when the list is still empty, and in such case
3362
the list is defined to contain just the new item. So starting with the empty
4441
the list is defined to contain just the new item. So starting with the empty
3363
list, the "append 1" would define "list equ 1" and the "append 2" following it
4442
list, the "append 1" would define "list equ 1" and the "append 2" following it
3364
would define "list equ 1,2". One might then need to use this list as the
4443
would define "list equ 1,2". One might then need to use this list as the
3365
parameters to some macroinstruction. But it cannot be done directly - if "foo"
4444
parameters to some macroinstruction. But it cannot be done directly - if "foo"
3366
is the macroinstruction, then "foo list" would just pass the "list" symbol
4445
is the macroinstruction, then "foo list" would just pass the "list" symbol
3367
as a parameter to macro, since symbolic constants are not unrolled at this
4446
as a parameter to macro, since symbolic constants are not unrolled at this
3368
stage. For this purpose again "match" directive comes in handy:
4447
stage. For this purpose again "match" directive comes in handy:
3369
 
4448
 
3370
    match params, list { foo params }
4449
    match params, list { foo params }
3371
 
4450
 
3372
The value of "list", if it's not empty, matches the "params" keyword, which is
4451
The value of "list", if it's not empty, matches the "params" keyword, which is
3373
then replaced with matched value when generating the new lines defined by the
4452
then replaced with matched value when generating the new lines defined by the
3374
block enclosed with braces. So if the "list" had value "1,2", the above line
4453
block enclosed with braces. So if the "list" had value "1,2", the above line
3375
would generate the line containing "foo 1,2", which would then go through the
4454
would generate the line containing "foo 1,2", which would then go through the
3376
standard preprocessing.
4455
standard preprocessing.
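  Putting these pieces together, a possible complete usage could look as
follows. The "list" constant and the "append" macroinstruction are repeated
from above, while "show" is a hypothetical macroinstruction introduced only
for this sketch:

    list equ

    macro append item
     {
       match any, list \{ list equ list,item \}
       match , list \{ list equ item \}
     }

    macro show [value] { db value }     ; hypothetical consumer of the list

    append 1
    append 2
    match params, list { show params }  ; generates "show 1,2"

The generated "show 1,2" line is then preprocessed as a standard
macroinstruction invocation, which should produce "db 1" and "db 2".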
3377
  There is one more special case - when preprocessor goes to checking the
4456
  The other special case is in the parameters of "rept" directive. The amount
-
 
4457
of repetitions and the base value for counter can be specified using
-
 
4458
numerical expressions, and if there is a symbolic constant with non-numerical
-
 
4459
name used in such an expression, preprocessor tries to evaluate its value as 
-
 
4460
a numerical expression and on success it replaces the symbolic constant with
-
 
4461
the result of that calculation and continues to evaluate the primary 
-
 
4462
expression. If the expression inside that symbolic constant also contains
-
 
4463
some symbolic constants, preprocessor will try to calculate all the needed 
-
 
4464
values recursively. 
-
 
4465
  This allows to perform some calculations at the time of preprocessing, as
-
 
4466
long as all the values used are the numbers known at the preprocessing stage. 
-
 
4467
A single repetition with "rept" can be used for the sole purpose of 
-
 
4468
calculating some value, like in this example: 
-
 
4469
 
-
 
4470
    define a b+4
-
 
4471
    define b 3
-
 
4472
    rept 1 result:a*b+2 { define c result }
-
 
4473
    
-
 
4474
To compute the base value for "result" counter, preprocessor replaces the "b"
-
 
4475
with its value and recursively calculates the value of "a", obtaining 7 as
-
 
4476
the result, then it calculates the main expression with the result being 23.
-
 
4477
The "c" then gets defined with the first value of counter (because the block
-
 
4478
is processed just one time), which is the result of the computation, so the 
-
 
4479
value of "c" is simply the "23" symbol. Note that if "b" is later redefined with
-
 
4480
some other numerical value, the next time an expression containing "a" is
-
 
4481
calculated, the value of "a" will reflect the new value of "b", because the
-
 
4482
symbolic constant contains just the text of the expression.
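  The remark about redefinition can be sketched by extending the example
above (the "d" name is introduced here only for illustration):

    define a b+4
    define b 3
    rept 1 result:a*b+2 { define c result }   ; "a" is 7, "c" becomes 23
    define b 5
    rept 1 result:a*b+2 { define d result }   ; "a" is now 9, "d" becomes 47

The definition of "a" is not touched between the two calculations, only the
value of "b" that it refers to.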
-
 
4483
  There is one more special case - when preprocessor goes to checking the
3378
second symbol in the line and it happens to be the colon character (what is
4484
second symbol in the line and it happens to be the colon character (what is
3379
then interpreted by assembler as definition of a label), it stops in this
4485
then interpreted by assembler as definition of a label), it stops in this
3380
place and finishes the preprocessing of the first symbol (so if it's the
4486
place and finishes the preprocessing of the first symbol (so if it's the
3381
symbolic constant it gets unrolled) and if it still appears to be the label,
4487
symbolic constant it gets unrolled) and if it still appears to be the label,
3382
it performs the standard preprocessing starting from the place after the
4488
it performs the standard preprocessing starting from the place after the
3383
label. This allows to place preprocessor directives and macroinstructions
4489
label. This allows to place preprocessor directives and macroinstructions
3384
after the labels, analogously to the instructions and directives processed
4490
after the labels, analogously to the instructions and directives processed
3385
by assembler, like:
4491
by assembler, like:
3386
 
4492
 
3387
    start: include 'start.inc'
4493
    start: include 'start.inc'
3388
 
4494
 
3389
However if the label becomes broken during preprocessing (for example when
4495
However if the label becomes broken during preprocessing (for example when
3390
it is the symbolic constant with empty value), only replacing of the symbolic
4496
it is the symbolic constant with empty value), only replacing of the symbolic
3391
constants is continued for the rest of line.
4497
constants is continued for the rest of line.
3392
  It should be remembered, that the jobs performed by preprocessor are the
4498
  It should be remembered, that the jobs performed by preprocessor are the
3393
preliminary operations on the text symbols, that are done in a simple
4499
preliminary operations on the text symbols, that are done in a simple
3394
single pass before the main process of assembly. The text that is the
4500
single pass before the main process of assembly. The text that is the
3395
result of preprocessing is passed to assembler, and it then does its
4501
result of preprocessing is passed to assembler, and it then does its
3396
multiple passes on it. Thus the control directives, which are recognized and
4502
multiple passes on it. Thus the control directives, which are recognized and
3397
processed only by the assembler - as they are dependent on the numerical
4503
processed only by the assembler - as they are dependent on the numerical
3398
values that may even vary between passes - are not recognized in any way by
4504
values that may even vary between passes - are not recognized in any way by
3399
the preprocessor and have no effect on the preprocessing. Consider this
4505
the preprocessor and have no effect on the preprocessing. Consider this
3400
example source:
4506
example source:
3401
 
4507
 
3402
    if 0
4508
    if 0
3403
    a = 1
4509
    a = 1
3404
    b equ 2
4510
    b equ 2
3405
    end if
4511
    end if
3406
    dd b
4512
    dd b
3407
 
4513
 
3408
When it is preprocessed, the only directive that is recognized by the
4514
When it is preprocessed, the only directive that is recognized by the
3409
preprocessor is the "equ", which defines symbolic constant "b", so later
4515
preprocessor is the "equ", which defines symbolic constant "b", so later
3410
in the source the "b" symbol is replaced with the value "2". Except for this
4516
in the source the "b" symbol is replaced with the value "2". Except for this
3411
replacement, the other lines are passed unchanged to the assembler. So
4517
replacement, the other lines are passed unchanged to the assembler. So
3412
after preprocessing the above source becomes:
4518
after preprocessing the above source becomes:
3413
 
4519
 
3414
    if 0
4520
    if 0
3415
    a = 1
4521
    a = 1
3416
    end if
4522
    end if
3417
    dd 2
4523
    dd 2
3418
 
4524
 
3419
Now when assembler processes it, the condition for the "if" is false, and
4525
Now when assembler processes it, the condition for the "if" is false, and
3420
the "a" constant doesn't get defined. However symbolic constant "b" was
4526
the "a" constant doesn't get defined. However symbolic constant "b" was
3421
processed normally, even though its definition was put just next to the one
4527
processed normally, even though its definition was put just next to the one
3422
of "a". So because of the possible confusion you should be very careful
4528
of "a". So because of the possible confusion you should be very careful
3423
every time when mixing the features of preprocessor and assembler - always
4529
every time when mixing the features of preprocessor and assembler - in such
3424
try to imagine what your source will become after the preprocessing, and
4530
cases it is important to realize what the source will become after the 
3425
thus what the assembler will see and do its multiple passes on.
4531
preprocessing, and thus what the assembler will see and do its multiple passes 
3426
 
4532
on.
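  When the definition of a symbolic constant really has to be conditional,
the condition needs to be checked by the preprocessor itself, for example
with the "match" directive (see 2.3.6). A minimal sketch, with the "USE_B"
name being an assumption made only for this example:

    USE_B equ FALSE

    match =TRUE, USE_B { b equ 2 }
    dd b

Since the pattern does not match here, "b" is not defined as a symbolic
constant and the last line is passed to the assembler unchanged as "dd b",
so "b" would then have to be a label or numerical constant defined elsewhere.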
-
 
4533
 
3427
 
4534
 
3428
2.4  Formatter directives
4535
2.4  Formatter directives
3429
 
4536
 
3430
These directives are actually also a kind of control directives, with the
4537
These directives are actually also a kind of control directives, with the
3431
purpose of controlling the format of generated code.
4538
purpose of controlling the format of generated code.
3432
  "format" directive followed by the format identifier allows to select the
4539
  "format" directive followed by the format identifier allows to select the
3433
output format. This directive should be put at the beginning of the source.
4540
output format. This directive should be put at the beginning of the source.
3434
Default output format is a flat binary file, it can also be selected by using
4541
Default output format is a flat binary file, it can also be selected by using
3435
"format binary" directive.
4542
"format binary" directive. This directive can be followed by the "as" keyword
3436
  "use16" and "use32" directives force the assembler to generate 16-bit or
4543
and the quoted string specifying the default file extension for the output
-
 
4544
file. Unless the output file name was specified from the command line,
-
 
4545
assembler will use this extension when generating the output file.
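  For example, a sketch of a small DOS program that selects the flat binary
format with the "com" extension (the program body is only illustrative):

    format binary as 'com'
    org 100h
            mov     ax,4C00h    ; DOS function 4Ch: terminate, return code 0
            int     21h

Unless another name is given in the command line, the output file should then
get the "com" extension.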
-
 
4546
  "use16" and "use32" directives force the assembler to generate 16-bit or
3437
32-bit code, omitting the default setting for selected output format. "use64"
4547
32-bit code, omitting the default setting for selected output format. "use64"
3438
enables generating the code for the long mode of x86-64 processors.
4548
enables generating the code for the long mode of x86-64 processors.
3439
  Below are described different output formats with the directives specific to
4549
  Below are described different output formats with the directives specific to
3440
these formats.
4550
these formats.
3441
 
4551
 
3442
 
4552
 
3443
2.4.1  MZ executable
4553
2.4.1  MZ executable
3444
 
4554
 
3445
To select the MZ output format, use "format MZ" directive. The default code
4555
To select the MZ output format, use "format MZ" directive. The default code
3446
setting for this format is 16-bit.
4556
setting for this format is 16-bit.
3447
  "segment" directive defines a new segment, it should be followed by label,
4557
  "segment" directive defines a new segment, it should be followed by label,
3448
whose value will be the number of defined segment, optionally "use16" or
4558
whose value will be the number of defined segment, optionally "use16" or
3449
"use32" word can follow to specify whether code in this segment should be
4559
"use32" word can follow to specify whether code in this segment should be
3450
16-bit or 32-bit. The origin of segment is aligned to paragraph (16 bytes).
4560
16-bit or 32-bit. The origin of segment is aligned to paragraph (16 bytes).
3451
All the labels defined then will have values relative to the beginning of this
4561
All the labels defined then will have values relative to the beginning of this
3452
segment.
4562
segment.
3453
  "entry" directive sets the entry point for MZ executable, it should be
4563
  "entry" directive sets the entry point for MZ executable, it should be
3454
followed by the far address (name of segment, colon and the offset inside
4564
followed by the far address (name of segment, colon and the offset inside
3455
segment) of desired entry point.
4565
segment) of desired entry point.
3456
  "stack" directive sets up the stack for MZ executable. It can be followed by
4566
  "stack" directive sets up the stack for MZ executable. It can be followed by
3457
numerical expression specifying the size of stack to be created automatically
4567
numerical expression specifying the size of stack to be created automatically
3458
or by the far address of initial stack frame when you want to set up the stack
4568
or by the far address of initial stack frame when you want to set up the stack
3459
manually. When no stack is defined, the stack of default size 4096 bytes will
4569
manually. When no stack is defined, the stack of default size 4096 bytes will
3460
be created.
4570
be created.
3461
  "heap" directive should be followed by a 16-bit value defining maximum size
4571
  "heap" directive should be followed by a 16-bit value defining maximum size
3462
of additional heap in paragraphs (this is heap in addition to stack and
4572
of additional heap in paragraphs (this is heap in addition to stack and
3463
undefined data). Use "heap 0" to always allocate only the memory the program really
4573
undefined data). Use "heap 0" to always allocate only the memory the program really
3464
needs. Default size of heap is 65535.
4574
needs. Default size of heap is 65535.
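
  The directives above can be put together as in the following sketch of
a small DOS program with two segments. The segment and label names and the
stack size are arbitrary choices made for this illustration:

    format MZ
    entry main:start            ; far address: segment name, colon, offset
    stack 100h                  ; 256-byte stack created automatically
    heap 0                      ; allocate only the memory really needed

    segment main                ; 16-bit code is the default for MZ

    start:
            mov     ax, text    ; the value of the label is the segment number
            mov     ds, ax
            mov     dx, hello   ; offset relative to the start of "text"
            mov     ah, 9       ; DOS function: print '$'-terminated string
            int     21h
            mov     ax, 4C00h   ; DOS function: terminate with exit code 0
            int     21h

    segment text

    hello db 'Hello world!', 24h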
3465
 
4575
 
3466
 
4576
 
3467
2.4.2  Portable Executable
4577
2.4.2  Portable Executable
3468
 
4578
 
3469
To select the Portable Executable output format, use "format PE" directive, it
4579
To select the Portable Executable output format, use "format PE" directive, it
3470
can be followed by additional format settings: use "console", "GUI" or
4580
can be followed by additional format settings: first the target subsystem
3471
"native" operator selects the target subsystem (floating point value
4581
setting, which can be "console" or "GUI" for Windows applications, "native"
3472
specifying subsystem version can follow), "DLL" marks the output file as a
4582
for Windows drivers, "EFI", "EFIboot" or "EFIruntime" for the UEFI, it may be
-
 
4583
followed by the minimum version of system that the executable is targeted to
3473
dynamic link library. Then can follow the "at" operator and the numerical
4584
(specified in form of floating-point value). Optional "DLL" and "WDM" keywords
-
 
4585
mark the output file as a dynamic link library and WDM driver respectively,
-
 
4586
and the "large" keyword marks the executable as able to handle addresses
-
 
4587
larger than 2 GB.
-
 
4588
  After those settings can follow the "at" operator and a numerical expression
3474
expression specifying the base of PE image and then optionally "on" operator
4589
specifying the base of PE image and then optionally "on" operator followed by
3475
followed by the quoted string containing file name selects custom MZ stub for
4590
the quoted string containing file name selects custom MZ stub for PE program
3476
PE program (when specified file is not a MZ executable, it is treated as a
4591
(when specified file is not a MZ executable, it is treated as a flat binary
3477
flat binary executable file and converted into MZ format). The default code
4592
executable file and converted into MZ format). The default code setting for
3478
setting for this format is 32-bit. The example of fully featured PE format
4593
this format is 32-bit. The example of fully featured PE format declaration:
3479
declaration:
4594
 
3480
 
-
 
3481
    format PE GUI 4.0 DLL at 7000000h on 'stub.exe'
4595
    format PE GUI 4.0 DLL at 7000000h on 'stub.exe'
3482
 
4596
 
3483
  To create PE file for the x86-64 architecture, use "PE64" keyword instead of
4597
  To create PE file for the x86-64 architecture, use "PE64" keyword instead of
3484
"PE" in the format declaration, in such case the long mode code is generated
4598
"PE" in the format declaration, in such case the long mode code is generated
3485
by default.
4599
by default.
3486
  "section" directive defines a new section, it should be followed by quoted
4600
  "section" directive defines a new section, it should be followed by quoted
3487
string defining the name of section, then one or more section flags can
4601
string defining the name of section, then one or more section flags can
3488
follow. Available flags are: "code", "data", "readable", "writeable",
4602
follow. Available flags are: "code", "data", "readable", "writeable",
3489
"executable", "shareable", "discardable", "notpageable". The origin of section
4603
"executable", "shareable", "discardable", "notpageable". The origin of section
3490
is aligned to page (4096 bytes). Example declaration of PE section:
4604
is aligned to page (4096 bytes). Example declaration of PE section:
3491
 
4605
 
3492
    section '.text' code readable executable
4606
    section '.text' code readable executable
3493
 
4607
 
3494
Along with flags also one of the special PE data identifiers can be specified
4608
Along with flags also one of the special PE data identifiers can be specified
3495
to mark the whole section as a special data, possible identifiers are
4609
to mark the whole section as a special data, possible identifiers are
3496
"export", "import", "resource" and "fixups". If the section is marked to
4610
"export", "import", "resource" and "fixups". If the section is marked to
3497
contain fixups, they are generated automatically and no more data needs to be
4611
contain fixups, they are generated automatically and no more data needs to be
3498
defined in this section. Also resource data can be generated automatically
4612
defined in this section. Also resource data can be generated automatically
3499
from the resource file, it can be achieved by writing the "from" operator and
4613
from the resource file, it can be achieved by writing the "from" operator and
3500
quoted file name after the "resource" identifier. Below are the examples of
4614
quoted file name after the "resource" identifier. Below are the examples of
3501
sections containing some special PE data:
4615
sections containing some special PE data:
3502
 
4616
 
3503
    section '.reloc' data discardable fixups
4617
    section '.reloc' data discardable fixups
3504
    section '.rsrc' data readable resource from 'my.res'
4618
    section '.rsrc' data readable resource from 'my.res'
3505
 
4619
 
3506
  "entry" directive sets the entry point for Portable Executable, the value of
4620
  "entry" directive sets the entry point for Portable Executable, the value of
3507
entry point should follow.
4621
entry point should follow.
3508
  "stack" directive sets up the size of stack for Portable Executable, value
4622
  "stack" directive sets up the size of stack for Portable Executable, value
3509
of stack reserve size should follow, optionally value of stack commit
4623
of stack reserve size should follow, optionally value of stack commit
3510
separated with comma can follow. When stack is not defined, it's set by
4624
separated with comma can follow. When stack is not defined, it's set by
3511
default to size of 4096 bytes.
4625
default to size of 4096 bytes.
3512
  "heap" directive chooses the size of heap for Portable Executable, value of
4626
  "heap" directive chooses the size of heap for Portable Executable, value of
3513
heap reserve size should follow, optionally value of heap commit separated
4627
heap reserve size should follow, optionally value of heap commit separated
3514
with comma can follow. When no heap is defined, it is set by default to size
4628
with comma can follow. When no heap is defined, it is set by default to size
3515
of 65536 bytes, when size of heap commit is unspecified, it is by default set
4629
of 65536 bytes, when size of heap commit is unspecified, it is by default set
3516
to zero.
4630
to zero.
3517
  "data" directive begins the definition of special PE data, it should be
4631
  "data" directive begins the definition of special PE data, it should be
3518
followed by one of the data identifiers ("export", "import", "resource" or
4632
followed by one of the data identifiers ("export", "import", "resource" or
3519
"fixups") or by the number of data entry in PE header. The data should be
4633
"fixups") or by the number of data entry in PE header. The data should be
3520
defined in next lines, ended with "end data" directive. When fixups data
4634
defined in next lines, ended with "end data" directive. When fixups data
3521
definition is chosen, they are generated automatically and no more data needs
4635
definition is chosen, they are generated automatically and no more data needs
3522
to be defined there. The same applies to the resource data when the "resource"
4636
to be defined there. The same applies to the resource data when the "resource"
3523
identifier is followed by "from" operator and quoted file name - in such case
4637
identifier is followed by "from" operator and quoted file name - in such case
3524
data is taken from the given resource file.
4638
data is taken from the given resource file.
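  As an illustration, the fragment below sketches a data section that also
holds the automatically generated fixups. It is assumed to be a part of
a larger PE source, and the section and label names are hypothetical:

    format PE GUI 4.0

    section '.data' data readable writeable

    table   dd first, second    ; absolute addresses, each one needs a fixup
    first   db 1
    second  db 2

    data fixups                 ; fixups are generated automatically,
    end data                    ; so nothing is defined between these lines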
3525
  The "rva" operator can be used inside the numerical expressions to obtain
4639
  The "rva" operator can be used inside the numerical expressions to obtain
3526
the RVA of the item addressed by the value it is applied to.
4640
the RVA of the item addressed by the value it is applied to, that is the
3527
 
4641
offset relative to the base of PE image.
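
  The following sketch combines the PE directives described in this
section. The image base, the stack and heap sizes and all the names are
assumptions chosen for this illustration, and since the program imports
nothing, some versions of Windows may refuse to load it:

    format PE console 4.0 at 400000h
    entry start
    stack 10000h, 1000h         ; reserve size, then commit size
    heap 10000h                 ; reserve size, commit defaults to zero

    section '.text' code readable executable

    start:
            mov     eax, [counter]      ; absolute address, needs a fixup
            ret

    section '.data' data readable writeable

    counter     dd 0
    counter_rva dd rva counter          ; offset of counter from image base

    section '.reloc' data discardable fixups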
-
 
4642
 
3528
 
4643
 
3529
2.4.3  Common Object File Format
4644
2.4.3  Common Object File Format
3530
 
4645
 
3531
To select Common Object File Format, use "format COFF" or "format MS COFF"
4646
To select Common Object File Format, use "format COFF" or "format MS COFF"
3532
directive whether you want to create classic or Microsoft's COFF file. The
4647
directive, depending whether you want to create classic (DJGPP) or Microsoft's 
3533
default code setting for this format is 32-bit. To create the file in
4648
variant of COFF file. The default code setting for this format is 32-bit. To 
3534
Microsoft's COFF format for the x86-64 architecture, use "format MS64 COFF"
4649
create the file in Microsoft's COFF format for the x86-64 architecture, use 
3535
setting, in such case long mode code is generated by default.
4650
"format MS64 COFF" setting, in such case long mode code is generated by 
3536
  "section" directive defines a new section, it should be followed by quoted
4651
default.
-
 
4652
  "section" directive defines a new section, it should be followed by quoted
3537
string defining the name of section, then one or more section flags can
4653
string defining the name of section, then one or more section flags can
3538
follow. Section flags available for both COFF variants are "code" and "data",
4654
follow. Section flags available for both COFF variants are "code" and "data",
3539
while "readable", "writeable", "executable", "shareable", "discardable",
4655
while flags "readable", "writeable", "executable", "shareable", "discardable",
3540
"notpageable", "linkremove" and "linkinfo" are flags available only with
4656
"notpageable", "linkremove" and "linkinfo" are available only with Microsoft's
3541
Microsoft COFF variant.
4657
COFF variant.
3542
  By default section is aligned to double word (four bytes), in case of
4658
  By default section is aligned to double word (four bytes), in case of
3543
Microsoft COFF variant other alignment can be specified by providing the
4659
Microsoft COFF variant other alignment can be specified by providing the
3544
"align" operator followed by alignment value (any power of two up to 8192)
4660
"align" operator followed by alignment value (any power of two up to 8192)
3545
among the section flags.
4661
among the section flags.
3546
  "extrn" directive defines the external symbol, it should be followed by the
4662
  "extrn" directive defines the external symbol, it should be followed by the
3547
name of symbol and optionally the size operator specifying the size of data
4663
name of symbol and optionally the size operator specifying the size of data
3548
labeled by this symbol. The name of symbol can be also preceded by quoted
4664
labeled by this symbol. The name of symbol can be also preceded by quoted
3549
string containing name of the external symbol and the "as" operator.
4665
string containing name of the external symbol and the "as" operator.
3550
Some example declarations of external symbols:
4666
Some example declarations of external symbols:
3551
 
4667
 
3552
    extrn exit
4668
    extrn exit
3553
    extrn '__imp__MessageBoxA@16' as MessageBox:dword
4669
    extrn '__imp__MessageBoxA@16' as MessageBox:dword
3554
 
4670
 
3555
  "public" directive declares the existing symbol as public, it should be
4671
  "public" directive declares the existing symbol as public, it should be
3556
followed by the name of symbol, optionally it can be followed by the "as"
4672
followed by the name of symbol, optionally it can be followed by the "as"
3557
operator and the quoted string containing name under which symbol should be
4673
operator and the quoted string containing name under which symbol should be
3558
available as public. Some examples of public symbols declarations:
4674
available as public. Some examples of public symbols declarations:
3559
 
4675
 
3560
    public main
4676
    public main
3561
    public start as '_start'
4677
    public start as '_start'
3562
 
4678
 
-
 
4679
Additionally, with COFF format it's possible to specify exported symbol as
-
 
4680
static, it's done by preceding the name of symbol with the "static" keyword.
-
 
4681
  When using the Microsoft's COFF format, the "rva" operator can be used
-
 
4682
inside the numerical expressions to obtain the RVA of the item addressed by the
-
 
4683
value it is applied to.
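
  A sketch of a Microsoft COFF object using these directives might look as
follows. The public name '_start', the alignment value and the data labels
are assumptions made for this illustration, and the object is assumed to
be linked against an import library that provides the external symbol:

    format MS COFF

    extrn '__imp__MessageBoxA@16' as MessageBox:dword

    section '.text' code readable executable align 16

    public start as '_start'

    start:
            push    0           ; type of the message box
            push    caption
            push    message
            push    0           ; no owner window
            call    [MessageBox]
            ret

    section '.data' data readable writeable

    caption db 'COFF demo', 0
    message db 'Hello world!', 0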
-
 
4684
 
3563
2.4.4  Executable and Linkable Format
4685
2.4.4  Executable and Linkable Format
3564
 
4686
 
3565
To select ELF output format, use "format ELF" directive. The default code
4687
To select ELF output format, use "format ELF" directive. The default code
3566
setting for this format is 32-bit. To create ELF file for the x86-64
4688
setting for this format is 32-bit. To create ELF file for the x86-64
3567
architecture, use "format ELF64" directive, in such case the long mode code is
4689
architecture, use "format ELF64" directive, in such case the long mode code is
3568
generated by default.
4690
generated by default.
3569
  "section" directive defines a new section, it should be followed by quoted
4691
  "section" directive defines a new section, it should be followed by quoted
3570
string defining the name of section, then can follow one or both of the
4692
string defining the name of section, then can follow one or both of the
3571
"executable" and "writeable" flags, optionally also "align" operator followed
4693
"executable" and "writeable" flags, optionally also "align" operator followed
3572
by the number specifying the alignment of section (it has to be the power of
4694
by the number specifying the alignment of section (it has to be the power of
3573
two), if no alignment is specified, the default value is used, which is 4 or 8,
4695
two), if no alignment is specified, the default value is used, which is 4 or 8,
3574
depending on which format variant has been chosen.
4696
depending on which format variant has been chosen.
3575
  "extrn" and "public" directives have the same meaning and syntax as when the
4697
  "extrn" and "public" directives have the same meaning and syntax as when the
3576
COFF output format is selected (described in previous section).
4698
COFF output format is selected (described in previous section).
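  For illustration, a 32-bit ELF object built with these directives could
look like the sketch below. It assumes that the object is later linked
with the C library, which provides "printf" and calls "main":

    format ELF

    section '.text' executable align 16

    public main
    extrn printf

    main:
            push    msg
            call    printf
            add     esp, 4      ; remove the argument from the stack
            xor     eax, eax    ; return value 0
            ret

    section '.data' writeable align 4

    msg db 'Hello world!', 0xA, 0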
3577
  The "rva" operator can be used also in the case of this format (however not
4699
  The "rva" operator can be used also in the case of this format (however not
3578
when target architecture is x86-64), it converts the address into the offset
4700
when target architecture is x86-64), it converts the address into the offset
3579
relative to the GOT table, so it may be useful to create position-independent
4701
relative to the GOT table, so it may be useful to create position-independent
3580
code.
4702
code. There's also a special "plt" operator, which allows calling the external
3581
  To create executable file, follow the format choice directive with the
-
 
3582
"executable" keyword. It allows to use "entry" directive followed by the value
-
 
3583
to set as entry point of program. On the other hand it makes "extrn" and
4703
functions through the Procedure Linkage Table. You can even create an alias
3584
"public" directives unavailable, and instead of "section" there should be the
4704
for external function that will make it always be called through PLT, with
3585
"segment" directive used, followed only by one or more segment permission
4705
the code like:
3586
flags. The origin of segment is aligned to page (4096 bytes), and available
-
 
3587
flags for are: "readable", "writeable" and "executable".
-
 
3588
 
4706
 
-
 
4707
    extrn 'printf' as _printf
-
 
4708
    printf = PLT _printf
-
 
4709
 
-
 
4710
  To create executable file, follow the format choice directive with the
-
 
4711
"executable" keyword and optionally the number specifying the brand of the
-
 
4712
target operating system (for example value 3 would mark the executable
-
 
4713
for Linux system). With this format selected it is allowed to use "entry"
-
 
4714
directive followed by the value to set as entry point of program. On the other
-
 
4715
hand it makes "extrn" and "public" directives unavailable, and instead of
-
 
4716
"section" there should be the "segment" directive used, followed by one or
-
 
4717
more segment permission flags and optionally a marker of special ELF
-
 
4718
executable segment, which can be "interpreter", "dynamic" or "note". The
-
 
4719
origin of segment is aligned to page (4096 bytes), and available permission
-
 
4720
flags are: "readable", "writeable" and "executable".
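
  A minimal sketch of such an executable for 32-bit Linux is shown below.
The system call numbers are those of the Linux "int 0x80" interface, and
the label names are arbitrary:

    format ELF executable 3     ; 3 marks the executable for Linux
    entry start

    segment readable executable

    start:
            mov     eax, 4      ; sys_write
            mov     ebx, 1      ; file descriptor of standard output
            mov     ecx, msg
            mov     edx, msg_size
            int     0x80
            mov     eax, 1      ; sys_exit
            xor     ebx, ebx    ; exit code 0
            int     0x80

    segment readable writeable

    msg db 'Hello world!', 0xA
    msg_size = $ - msg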
3589
 
4721
 
3590
EOF
4722
EOF