Subversion Repositories Kolibri OS

Rev 1737 Rev 2666
Line 1... Line 1...
1
▄▀▀▀
1
,'''
2
                         ÜÜÛÜÜ ÜÜÜÜ    ÜÜÜÜÜ ÜÜÜ ÜÜ
2
                         ,,;,, ,,,,    ,,,,, ,,, ,,
3
                           █       █  █      █  █  █
3
                           ;       ;  ;      ;  ;  ;
4
                           █  ▄▀▀▀▀█   ▀▀▀▀▄ █  █  █
4
                           ;  ,'''';   '''', ;  ;  ;
5
                           █  ▀▄▄▄▄█▄ ▄▄▄▄▄▀ █  █  █
5
                           ;  ',,,,;, ,,,,,' ;  ;  ;
Line 6... Line 6...
6
 
6
 
7
                              flat assembler 1.66
7
                              flat assembler 1.70
Line 8... Line 8...
8
                              Programmer's Manual
8
                              Programmer's Manual
9
 
9
 
Line 10... Line 10...
10
 
10
 
Line 11... Line 11...
11
Table of contents
11
Table of contents
12
─────────────────
12
-----------------
Line 48... Line 48...
48
        2.1.16  SSE2 instructions
48
        2.1.16  SSE2 instructions
49
        2.1.17  SSE3 instructions
49
        2.1.17  SSE3 instructions
50
        2.1.18  AMD 3DNow! instructions
50
        2.1.18  AMD 3DNow! instructions
51
        2.1.19  The x86-64 long mode instructions
51
        2.1.19  The x86-64 long mode instructions
52
 
52
        2.1.20  SSE4 instructions
-
 
53
        2.1.21  AVX instructions
-
 
54
        2.1.22  AVX2 instructions
-
 
55
        2.1.23  Auxiliary sets of computational instructions
-
 
56
        2.1.24  Other extensions of instruction set
-
 
57
 
Line 53... Line 58...
53
        2.2  Control directives
58
        2.2  Control directives
54
        2.2.1  Numerical constants
59
        2.2.1  Numerical constants
55
        2.2.2  Conditional assembly
60
        2.2.2  Conditional assembly
56
        2.2.3  Repeating blocks of instructions
61
        2.2.3  Repeating blocks of instructions
Line 73... Line 78...
73
        2.4.3  Common Object File Format
78
        2.4.3  Common Object File Format
74
        2.4.4  Executable and Linkable Format
79
        2.4.4  Executable and Linkable Format
75
 
80
 
Line -... Line 81...
-
 
81
 
76
 
82
 
77
Chapter 1  Introduction
83
Chapter 1  Introduction
Line 78... Line 84...
78
───────────────────────
84
-----------------------
79
 
85
 
80
This chapter contains all the most important information you need to begin
86
This chapter contains all the most important information you need to begin
Line 137... Line 143...
137
done, how much time it took, and how many bytes were written into the
143
done, how much time it took, and how many bytes were written into the
138
destination file.
144
destination file.
139
The following is an example of the compilation summary:
145
The following is an example of the compilation summary:
140
 
146
 
Line 141... Line 147...
141
flat assembler  version 1.66
147
flat assembler  version 1.70 (16384 kilobytes memory)
142
38 passes, 5.3 seconds, 77824 bytes.
148
38 passes, 5.3 seconds, 77824 bytes.
Line 143... Line 149...
143
 
149
 
144
In case of error during the compilation process, the program will display an
150
In case of error during the compilation process, the program will display an
145
error message. For example, when compiler can't find the input file, it will
151
error message. For example, when compiler can't find the input file, it will
Line 146... Line 152...
146
display the following message:
152
display the following message:
147
 
153
 
Line 148... Line 154...
148
flat assembler  version 1.66
154
flat assembler  version 1.70 (16384 kilobytes memory)
149
error: source file not found.
155
error: source file not found.
150
 
156
 
Line 151... Line 157...
151
If the error is connected with a specific part of source code, the source line
157
If the error is connected with a specific part of source code, the source line
152
that caused the error will also be displayed. The position of this line in
158
that caused the error will also be displayed. The position of this line in
153
the source is also given to help you find the error, for example:
159
the source is also given to help you find the error, for example:
154
 
160
 
Line 155... Line 161...
155
flat assembler  version 1.66
161
flat assembler  version 1.70 (16384 kilobytes memory)
156
example.asm [3]:
162
example.asm [3]:
157
        mob     ax,1
163
        mob     ax,1
158
error: illegal instruction.
164
error: illegal instruction.
Line 159... Line 165...
159
 
165
 
160
It means that in the third line of the "example.asm" file compiler has
166
It means that in the third line of the "example.asm" file compiler has
161
encountered an unrecognized instruction. When the line that caused error
167
encountered an unrecognized instruction. When the line that caused error
162
contains a macroinstruction, also the line in macroinstruction definition
168
contains a macroinstruction, also the line in macroinstruction definition
163
that generated the erroneous instruction is displayed:
169
that generated the erroneous instruction is displayed:
164
 
170
 
Line 210... Line 216...
210
that are individual items even when they are not spaced from the other ones.
216
that are individual items even when they are not spaced from the other ones.
211
Any of the "+-*/=<>()[]{}:,|&~#`" is the symbol character. The sequence of
217
Any of the "+-*/=<>()[]{}:,|&~#`" is the symbol character. The sequence of
212
other characters, separated from other items with either blank spaces or
218
other characters, separated from other items with either blank spaces or
213
symbol characters, is a symbol. If the first character of symbol is either a
219
symbol characters, is a symbol. If the first character of symbol is either a
214
single or double quote, it integrates the any sequence of characters following
220
single or double quote, it integrates any sequence of characters following it,
215
it, even the special ones, into a quoted string, which should end with the same
221
even the special ones, into a quoted string, which should end with the same
216
character, with which it began (the single or double quote) - however if there
222
character, with which it began (the single or double quote) - however if there
217
are two such characters in a row (without any other character between them),
223
are two such characters in a row (without any other character between them),
218
they are integrated into quoted string as just one of them and the quoted
224
they are integrated into quoted string as just one of them and the quoted
219
string continues then. The symbols other than symbol characters and quoted
225
string continues then. The symbols other than symbol characters and quoted
220
strings can be used as names, so are also called the name symbols.
226
strings can be used as names, so are also called the name symbols.
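  As a brief illustration of the quoting rule described above, the lines below
define a few strings with the "db" directive. The label names are made up for
this sketch and do not come from the manual.

```asm
; Hypothetical labels, chosen only for this illustration.
apostrophe db 'It''s'          ; the doubled '' is read as one ': It's
quoted     db "say ""hi"""     ; the doubled "" is read as one ": say "hi"
mixed      db "It's"           ; a single quote needs no doubling inside "..."
```

  In each case the quoted string ends at the first quote character of the
opening kind that is not immediately followed by another one.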
221
  Every instruction consists of the mnemonic and the various number of
227
  Every instruction consists of the mnemonic and the various number of
Line 235... Line 241...
235
by a colon should be put just before the address value (inside the square
241
by a colon should be put just before the address value (inside the square
236
brackets or after the "ptr" operator).
242
brackets or after the "ptr" operator).
237
 
243
 
Line 238... Line 244...
238
   Table 1.1  Size operators
244
   Table 1.1  Size operators
239
  ┌──────────┬──────┬───────┐
245
  /-------------------------\
240
  │ Operator │ Bits │ Bytes │
246
  | Operator | Bits | Bytes |
241
  ╞══════════╪══════╪═══════╡
247
  |==========|======|=======|
242
  │ byte     │ 8    │ 1     │
248
  | byte     | 8    | 1     |
243
  │ word     │ 16   │ 2     │
249
  | word     | 16   | 2     |
244
  │ dword    │ 32   │ 4     │
250
  | dword    | 32   | 4     |
245
  │ fword    │ 48   │ 6     │
251
  | fword    | 48   | 6     |
246
  │ pword    │ 48   │ 6     │
252
  | pword    | 48   | 6     |
247
  │ qword    │ 64   │ 8     │
253
  | qword    | 64   | 8     |
248
  │ tbyte    │ 80   │ 10    │
254
  | tbyte    | 80   | 10    |
249
  │ tword    │ 80   │ 10    │
255
  | tword    | 80   | 10    |
250
  │ dqword   │ 128  │ 16    │
256
  | dqword   | 128  | 16    |
-
 
257
  | xword    | 128  | 16    |
-
 
258
  | qqword   | 256  | 32    |
-
 
259
  | yword    | 256  | 32    |
251
  └──────────┴──────┴───────┘
260
  \-------------------------/
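  The size operators from table 1.1 can be sketched in use as follows. This is
only an illustration: it assumes 32-bit mode, arbitrary register choices, and
a processor that supports the SSE2 and AVX instructions used on the last two
lines.

```asm
use32
    mov     byte [ebx],1          ; store 1 as a single byte
    mov     word [ebx],1          ; store 1 as two bytes
    mov     eax,dword [ebx]       ; load four bytes
    fld     tword [ebx]           ; load an 80-bit floating point value
    movdqa  xmm0,xword [ebx]      ; 128-bit operand, same size as "dqword"
    vmovdqa ymm0,yword [ebx]      ; 256-bit operand, same size as "qqword"
```

  The operator is needed mainly where the operand size cannot be deduced from
a register, as in the first two lines.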
Line 252... Line 261...
252
 
261
 
253
   Table 1.2  Registers
262
   Table 1.2  Registers
254
  ┌─────────┬──────┬────────────────────────────────────────────────┐
263
  /-----------------------------------------------------------------\
255
  │ Type    │ Bits │                                                │
264
  | Type    | Bits |                                                |
256
  ╞═════════╪══════╪════════════════════════════════════════════════╡
265
  |=========|======|================================================|
257
  │         │ 8    │ al    cl    dl    bl    ah    ch    dh    bh   │
266
  |         | 8    | al    cl    dl    bl    ah    ch    dh    bh   |
258
  │ General │ 16   │ ax    cx    dx    bx    sp    bp    si    di   │
267
  | General | 16   | ax    cx    dx    bx    sp    bp    si    di   |
259
  │         │ 32   │ eax   ecx   edx   ebx   esp   ebp   esi   edi  │
268
  |         | 32   | eax   ecx   edx   ebx   esp   ebp   esi   edi  |
260
  ├─────────┼──────┼────────────────────────────────────────────────┤
269
  |---------|------|------------------------------------------------|
261
  │ Segment │ 16   │ es    cs    ss    ds    fs    gs               │
270
  | Segment | 16   | es    cs    ss    ds    fs    gs               |
262
  ├─────────┼──────┼────────────────────────────────────────────────┤
271
  |---------|------|------------------------------------------------|
263
  │ Control │ 32   │ cr0         cr2   cr3   cr4                    │
272
  | Control | 32   | cr0         cr2   cr3   cr4                    |
264
  ├─────────┼──────┼────────────────────────────────────────────────┤
273
  |---------|------|------------------------------------------------|
265
  │ Debug   │ 32   │ dr0   dr1   dr2   dr3               dr6   dr7  │
274
  | Debug   | 32   | dr0   dr1   dr2   dr3               dr6   dr7  |
266
  ├─────────┼──────┼────────────────────────────────────────────────┤
275
  |---------|------|------------------------------------------------|
267
  │ FPU     │ 80   │ st0   st1   st2   st3   st4   st5   st6   st7  │
276
  | FPU     | 80   | st0   st1   st2   st3   st4   st5   st6   st7  |
268
  ├─────────┼──────┼────────────────────────────────────────────────┤
277
  |---------|------|------------------------------------------------|
269
  │ MMX     │ 64   │ mm0   mm1   mm2   mm3   mm4   mm5   mm6   mm7  │
278
  | MMX     | 64   | mm0   mm1   mm2   mm3   mm4   mm5   mm6   mm7  |
270
  ├─────────┼──────┼────────────────────────────────────────────────┤
279
  |---------|------|------------------------------------------------|
-
 
280
  | SSE     | 128  | xmm0  xmm1  xmm2  xmm3  xmm4  xmm5  xmm6  xmm7 |
-
 
281
  |---------|------|------------------------------------------------|
271
  │ SSE     │ 128  │ xmm0  xmm1  xmm2  xmm3  xmm4  xmm5  xmm6  xmm7 │
282
  | AVX     | 256  | ymm0  ymm1  ymm2  ymm3  ymm4  ymm5  ymm6  ymm7 |
Line 272... Line 283...
272
  └─────────┴──────┴────────────────────────────────────────────────┘
283
  \-----------------------------------------------------------------/
Line 273... Line 284...
273
 
284
 
Line 314... Line 325...
314
may not be included in the output file, so its values should be always
325
may not be included in the output file, so its values should be always
315
considered unknown.
326
considered unknown.
316
 
327
 
Line 317... Line 328...
317
   Table 1.3  Data directives
328
   Table 1.3  Data directives
318
  ┌─────────┬────────┬─────────┐
329
  /----------------------------\
319
  │ Size    │ Define │ Reserve │
330
  | Size    | Define | Reserve |
320
  │ (bytes) │ data   │ data    │
331
  | (bytes) | data   | data    |
321
  ╞═════════╪════════╪═════════╡
332
  |=========|========|=========|
322
  │ 1       │ db     │ rb      │
333
  | 1       | db     | rb      |
323
  │         │ file   │         │
334
  |         | file   |         |
324
  ├─────────┼────────┼─────────┤
335
  |---------|--------|---------|
325
  │ 2       │ dw     │ rw      │
336
  | 2       | dw     | rw      |
326
  │         │ du     │         │
337
  |         | du     |         |
327
  ├─────────┼────────┼─────────┤
338
  |---------|--------|---------|
328
  │ 4       │ dd     │ rd      │
339
  | 4       | dd     | rd      |
329
  ├─────────┼────────┼─────────┤
340
  |---------|--------|---------|
330
  │ 6       │ dp     │ rp      │
341
  | 6       | dp     | rp      |
331
  │         │ df     │ rf      │
342
  |         | df     | rf      |
332
  ├─────────┼────────┼─────────┤
343
  |---------|--------|---------|
333
  │ 8       │ dq     │ rq      │
344
  | 8       | dq     | rq      |
334
  ├─────────┼────────┼─────────┤
345
  |---------|--------|---------|
335
  │ 10      │ dt     │ rt      │
346
  | 10      | dt     | rt      |
336
  └─────────┴────────┴─────────┘
347
  \----------------------------/
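  A short sketch of the directives from table 1.3 follows. The label names
and values are invented for this illustration only.

```asm
; Hypothetical labels and values, not taken from the manual.
flag_b    db 1              ; define one byte
msg       db 'abc',0        ; define bytes from a string, then a zero byte
msg_wide  du 'abc'          ; define each character as a 2-byte unit
count_w   dw 1000           ; define one word (2 bytes)
total_d   dd 100000         ; define one double word (4 bytes)
big_q     dq 1              ; define one quad word (8 bytes)
buffer    rb 64             ; reserve 64 bytes without defining their values
slots     rd 16             ; reserve 16 double words, also 64 bytes
```

  The "r" directives take a count of units rather than a value, which is why
"rb 64" and "rd 16" reserve the same amount of space.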
Line 337... Line 348...
337
 
348
 
Line 338... Line 349...
338
 
349
 
Line 397... Line 408...
397
 
408
 
Line 398... Line 409...
398
In the above examples all the numerical expressions were the simple numbers,
409
In the above examples all the numerical expressions were the simple numbers,
399
constants or labels. But they can be more complex, by using the arithmetical
410
constants or labels. But they can be more complex, by using the arithmetical
400
or logical operators for calculations at compile time. All these operators
411
or logical operators for calculations at compile time. All these operators
401
with their priority values are listed in table 1.4.
412
with their priority values are listed in table 1.4. The operations with higher
402
The operations with higher priority value will be calculated first, you can
413
priority value will be calculated first, you can of course change this
403
of course change this behavior by putting some parts of expression into
414
behavior by putting some parts of expression into parenthesis. The "+", "-",
404
parenthesis. The "+", "-", "*" and "/" are standard arithmetical operations,
415
"*" and "/" are standard arithmetical operations, "mod" calculates the
405
"mod" calculates the remainder from division. The "and", "or", "xor", "shl",
416
remainder from division. The "and", "or", "xor", "shl", "shr" and "not"
406
"shr" and "not" perform the same logical operations as assembly instructions
417
perform the same logical operations as assembly instructions of those names.
-
 
418
The "rva" and "plt" are special unary operators that perform conversions
407
of those names. The "rva" performs the conversion of an address into the
419
between different kinds of addresses, they can be used only with few of the
408
relocatable offset and is specific to some of the output formats (see 2.4).
420
output formats and their meaning may vary (see 2.4).
-
 
421
  The arithmetical and logical calculations are usually processed as if they
-
 
422
operated on infinite precision 2-adic numbers, and assembler signalizes an
-
 
423
overflow error if because of its limitations it is not able to perform the
-
 
424
required calculation, or if the result is too large a number to fit in either
-
 
425
signed or unsigned range for the destination unit size. However "not", "xor"
-
 
426
and "shr" operators are exceptions from this rule - if the value specified
-
 
427
by numerical expression has to fit in a unit of specified size, and the
-
 
428
arguments for operation fit into that size, the operation will be performed
-
 
429
with precision limited to that size.
409
  The numbers in the expression are by default treated as a decimal, binary
430
  The numbers in the expression are by default treated as a decimal, binary
410
numbers should have the "b" letter attached at the end, octal number should
431
numbers should have the "b" letter attached at the end, octal number should
411
end with "o" letter, hexadecimal numbers should begin with "0x" characters
432
end with "o" letter, hexadecimal numbers should begin with "0x" characters
412
(like in C language) or with the "$" character (like in Pascal language) or
433
(like in C language) or with the "$" character (like in Pascal language) or
413
they should end with "h" letter. Also quoted string, when encountered in
434
they should end with "h" letter. Also quoted string, when encountered in
Line 429... Line 450...
429
characters. So "1.0", "1E0" and "1f" define the same floating point value,
450
characters. So "1.0", "1E0" and "1f" define the same floating point value,
430
while simple "1" defines an integer value.
451
while simple "1" defines an integer value.
431
 
452
 
Line 432... Line 453...
432
   Table 1.4  Arithmetical and logical operators by priority
453
   Table 1.4  Arithmetical and logical operators by priority
433
  ┌──────────┬──────────────┐
454
  /-------------------------\
434
  │ Priority │ Operators    │
455
  | Priority | Operators    |
435
  ╞══════════╪══════════════╡
456
  |==========|==============|
436
  │ 0        │ +  -         │
457
  | 0        | +  -         |
437
  ├──────────┼──────────────┤
458
  |----------|--------------|
438
  │ 1        │ *  /         │
459
  | 1        | *  /         |
439
  ├──────────┼──────────────┤
460
  |----------|--------------|
440
  │ 2        │ mod          │
461
  | 2        | mod          |
441
  ├──────────┼──────────────┤
462
  |----------|--------------|
442
  │ 3        │ and  or  xor │
463
  | 3        | and  or  xor |
443
  ├──────────┼──────────────┤
464
  |----------|--------------|
444
  │ 4        │ shl  shr     │
465
  | 4        | shl  shr     |
445
  ├──────────┼──────────────┤
466
  |----------|--------------|
446
  │ 5        │ not          │
467
  | 5        | not          |
447
  ├──────────┼──────────────┤
468
  |----------|--------------|
448
  │ 6        │ rva          │
469
  | 6        | rva  plt     |
449
  └──────────┴──────────────┘
470
  \-------------------------/
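  The effect of the priorities in table 1.4 can be sketched with a few data
definitions. The values in the comments are worked out by hand from the table
and are meant as an illustration, not as output captured from the assembler.

```asm
; A higher priority value in table 1.4 is evaluated first.
    dd 2+3*4            ; "*" (1) before "+" (0): 2+(3*4) = 14
    dd (2+3)*4          ; parentheses override the priority: 20
    dd 10 mod 4*2       ; "mod" (2) before "*" (1): (10 mod 4)*2 = 4
    dd 1 or 2 shl 4     ; "shl" (4) before "or" (3): 1 or (2 shl 4) = 33
; The value 16 written in each of the supported number notations.
    db 16,10h,0x10,$10,10000b,20o
```

  When the intended grouping is not obvious, parentheses make it explicit and
cost nothing in the generated output.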
Line 450... Line 471...
450
 
471
 
Line 451... Line 472...
451
 
472
 
Line 457... Line 478...
457
instruction "jmp dword [0]" will become the far jump and when assembler is
478
instruction "jmp dword [0]" will become the far jump and when assembler is
458
in 32-bit mode, it will become the near jump. To force this instruction to be
479
in 32-bit mode, it will become the near jump. To force this instruction to be
459
treated differently, use the "jmp near dword [0]" or "jmp far dword [0]" form.
480
treated differently, use the "jmp near dword [0]" or "jmp far dword [0]" form.
460
  When operand of near jump is the immediate value, assembler will generate
481
  When operand of near jump is the immediate value, assembler will generate
461
the shortest variant of this jump instruction if possible (but won't create
482
the shortest variant of this jump instruction if possible (but will not create
462
32-bit instruction in 16-bit mode nor 16-bit instruction in 32-bit mode,
483
32-bit instruction in 16-bit mode nor 16-bit instruction in 32-bit mode,
463
unless there is a size operator stating it). By specifying the jump type
484
unless there is a size operator stating it). By specifying the jump type
464
you can force it to always generate long variant (for example "jmp near 0")
485
you can force it to always generate long variant (for example "jmp near 0")
465
or to always generate short variant and terminate with an error when it's
486
or to always generate short variant and terminate with an error when it's
466
impossible (for example "jmp short 0").
487
impossible (for example "jmp short 0").
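  The jump forms discussed above can be put side by side as in the sketch
below. The label "target" is hypothetical, and the address 0 is reused from
the manual's own examples.

```asm
use16
    jmp     dword [0]        ; treated as a far jump in 16-bit mode
    jmp     near dword [0]   ; forced to be a near jump
use32
    jmp     dword [0]        ; treated as a near jump in 32-bit mode
    jmp     far dword [0]    ; forced to be a far jump
    jmp     target           ; shortest variant that can reach "target"
    jmp     near target      ; always the long variant
    jmp     short target     ; always the short variant, error if out of range
target:                      ; hypothetical label used as the jump destination
```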
467
 
488
 
Line 490... Line 511...
490
without forcing it to use the longer form of instruction.
511
without forcing it to use the longer form of instruction.
491
 
512
 
Line 492... Line 513...
492
 
513
 
493
Chapter 2  Instruction set
514
Chapter 2  Instruction set
Line 494... Line 515...
494
──────────────────────────
515
--------------------------
495
 
516
 
496
This chapter provides the detailed information about the instructions and
517
This chapter provides the detailed information about the instructions and
497
directives supported by flat assembler. Directives for defining labels were
518
directives supported by flat assembler. Directives for defining labels were
Line 765... Line 786...
765
 
786
 
Line 766... Line 787...
766
 
787
 
Line 767... Line 788...
767
2.1.5  Logical instructions
788
2.1.5  Logical instructions
768
 
789
 
769
"not" inverts the bits in the specified operand to form a one's
790
"not" inverts the bits in the specified operand to form a one's complement 
770
complement of the operand. It has no effect on the flags. Rules for the
791
of the operand. It has no effect on the flags. Rules for the operand are the 
771
operand are the same as for the "inc" instruction.
792
same as for the "inc" instruction.
772
  "and", "or" and "xor" instructions perform the standard
793
  "and", "or" and "xor" instructions perform the standard logical operations. 
773
logical operations. They update the SF, ZF and PF flags. Rules for the
794
They update the SF, ZF and PF flags. Rules for the operands are the same as 
774
operands are the same as for the "add" instruction.
795
for the "add" instruction.
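  A minimal sketch of these instructions, assuming 32-bit mode and arbitrary
register choices:

```asm
use32
    not     eax              ; invert every bit of EAX, flags are unchanged
    and     eax,0FFh         ; keep only the low 8 bits of EAX
    or      byte [ebx],80h   ; set the highest bit of a byte in memory
    xor     ecx,ecx          ; a value xored with itself gives zero
```

  After the last line ZF is set, because the result of the operation is zero.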
775
  "bt", "bts", "btr" and "btc" instructions operate on a single bit which can
796
  "bt", "bts", "btr" and "btc" instructions operate on a single bit which can
776
be in memory or in a general register. The location of the bit is specified
797
be in memory or in a general register. The location of the bit is specified
777
as an offset from the low order end of the operand. The value of the offset
798
as an offset from the low order end of the operand. The value of the offset
Line 916... Line 937...
916
optimized (see 1.2.5), the operand should be an immediate value specifying
937
optimized (see 1.2.5), the operand should be an immediate value specifying
917
target address.
938
target address.
918
 
939
 
Line 919... Line 940...
919
   Table 2.1  Conditions
940
   Table 2.1  Conditions
920
  ┌──────────┬───────────────────────┬────────────────────────┐
941
  /-----------------------------------------------------------\
921
  │ Mnemonic │ Condition tested      │ Description            │
942
  | Mnemonic | Condition tested      | Description            |
922
  ╞══════════╪═══════════════════════╪════════════════════════╡
943
  |==========|=======================|========================|
923
  │ o        │ OF = 1                │ overflow               │
944
  | o        | OF = 1                | overflow               |
924
  ├──────────┼───────────────────────┼────────────────────────┤
945
  |----------|-----------------------|------------------------|
925
  │ no       │ OF = 0                │ not overflow           │
946
  | no       | OF = 0                | not overflow           |
926
  ├──────────┼───────────────────────┼────────────────────────┤
947
  |----------|-----------------------|------------------------|
927
  │ c        │                       │ carry                  │
948
  | c        |                       | carry                  |
928
  │ b        │ CF = 1                │ below                  │
949
  | b        | CF = 1                | below                  |
929
  │ nae      │                       │ not above nor equal    │
950
  | nae      |                       | not above nor equal    |
930
  ├──────────┼───────────────────────┼────────────────────────┤
951
  |----------|-----------------------|------------------------|
931
  │ nc       │                       │ not carry              │
952
  | nc       |                       | not carry              |
932
  │ ae       │ CF = 0                │ above or equal         │
953
  | ae       | CF = 0                | above or equal         |
933
  │ nb       │                       │ not below              │
954
  | nb       |                       | not below              |
934
  ├──────────┼───────────────────────┼────────────────────────┤
955
  |----------|-----------------------|------------------------|
935
  │ e        │ ZF = 1                │ equal                  │
956
  | e        | ZF = 1                | equal                  |
936
  │ z        │                       │ zero                   │
957
  | z        |                       | zero                   |
937
  ├──────────┼───────────────────────┼────────────────────────┤
958
  |----------|-----------------------|------------------------|
938
  │ ne       │ ZF = 0                │ not equal              │
959
  | ne       | ZF = 0                | not equal              |
939
  │ nz       │                       │ not zero               │
960
  | nz       |                       | not zero               |
940
  ├──────────┼───────────────────────┼────────────────────────┤
961
  |----------|-----------------------|------------------------|
941
  │ be       │ CF or ZF = 1          │ below or equal         │
962
  | be       | CF or ZF = 1          | below or equal         |
942
  │ na       │                       │ not above              │
963
  | na       |                       | not above              |
943
  ├──────────┼───────────────────────┼────────────────────────┤
964
  |----------|-----------------------|------------------------|
944
  │ a        │ CF or ZF = 0          │ above                  │
965
  | a        | CF or ZF = 0          | above                  |
945
  │ nbe      │                       │ not below nor equal    │
966
  | nbe      |                       | not below nor equal    |
946
  ├──────────┼───────────────────────┼────────────────────────┤
967
  |----------|-----------------------|------------------------|
947
  │ s        │ SF = 1                │ sign                   │
968
  | s        | SF = 1                | sign                   |
948
  ├──────────┼───────────────────────┼────────────────────────┤
969
  |----------|-----------------------|------------------------|
949
  │ ns       │ SF = 0                │ not sign               │
970
  | ns       | SF = 0                | not sign               |
950
  ├──────────┼───────────────────────┼────────────────────────┤
971
  |----------|-----------------------|------------------------|
951
  │ p        │ PF = 1                │ parity                 │
972
  | p        | PF = 1                | parity                 |
952
  │ pe       │                       │ parity even            │
973
  | pe       |                       | parity even            |
953
  ├──────────┼───────────────────────┼────────────────────────┤
974
  |----------|-----------------------|------------------------|
954
  │ np       │ PF = 0                │ not parity             │
975
  | np       | PF = 0                | not parity             |
955
  │ po       │                       │ parity odd             │
976
  | po       |                       | parity odd             |
956
  ├──────────┼───────────────────────┼────────────────────────┤
977
  |----------|-----------------------|------------------------|
957
  │ l        │ SF xor OF = 1         │ less                   │
978
  | l        | SF xor OF = 1         | less                   |
958
  │ nge      │                       │ not greater nor equal  │
979
  | nge      |                       | not greater nor equal  |
959
  ├──────────┼───────────────────────┼────────────────────────┤
980
  |----------|-----------------------|------------------------|
960
  │ ge       │ SF xor OF = 0         │ greater or equal       │
981
  | ge       | SF xor OF = 0         | greater or equal       |
961
  │ nl       │                       │ not less               │
982
  | nl       |                       | not less               |
962
  ├──────────┼───────────────────────┼────────────────────────┤
983
  |----------|-----------------------|------------------------|
963
  │ le       │ (SF xor OF) or ZF = 1 │ less or equal          │
984
  | le       | (SF xor OF) or ZF = 1 | less or equal          |
964
  │ ng       │                       │ not greater            │
985
  | ng       |                       | not greater            |
965
  ├──────────┼───────────────────────┼────────────────────────┤
986
  |----------|-----------------------|------------------------|
966
  │ g        │ (SF xor OF) or ZF = 0 │ greater                │
987
  | g        | (SF xor OF) or ZF = 0 | greater                |
967
  │ nle      │                       │ not less nor equal     │
988
  | nle      |                       | not less nor equal     |
968
  └──────────┴───────────────────────┴────────────────────────┘
989
  \-----------------------------------------------------------/
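  The difference between the unsigned and signed conditions of table 2.1 can
be sketched as follows. The labels are hypothetical, and the flag values in
the comments are worked out by hand for this one comparison.

```asm
use32
    mov     eax,-1           ; 0FFFFFFFFh when read as an unsigned value
    cmp     eax,1
    jb      unsigned_below   ; not taken: CF = 0, as 0FFFFFFFFh is above 1
    jl      signed_less      ; taken: SF xor OF = 1, as -1 is less than 1
unsigned_below:              ; hypothetical label
    ret
signed_less:                 ; hypothetical label
    ret
```

  The same comparison therefore selects different branches depending on
whether the "below/above" or the "less/greater" mnemonics are used.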
Line 969... Line 990...
969
 
990
 
970
  The "loop" instructions are conditional jumps that use a value placed in
991
  The "loop" instructions are conditional jumps that use a value placed in
971
CX (or ECX) to specify the number of repetitions of a software loop. All
992
CX (or ECX) to specify the number of repetitions of a software loop. All
972
"loop" instructions automatically decrement CX (or ECX) and terminate the
993
"loop" instructions automatically decrement CX (or ECX) and terminate the
Line 1156... Line 1177...
1156
    seto byte [bx]   ; set byte if overflow
1177
    seto byte [bx]   ; set byte if overflow
1157
 
1178
 
Line 1158... Line 1179...
1158
  "salc" instruction sets all the bits of the AL register when the carry flag is
1179
  "salc" instruction sets all the bits of the AL register when the carry flag is
1159
set and zeroes the AL register otherwise. This instruction has no arguments.
1180
set and zeroes the AL register otherwise. This instruction has no arguments.
1160
  The instructions obtained by attaching the condition mnemonic to the "cmov"
1181
  The instructions obtained by attaching the condition mnemonic to "cmov"
1161
mnemonic transfer the word or double word from the general register or memory
1182
mnemonic transfer the word or double word from the general register or memory
1162
to the general register only when the condition is true. The destination
1183
to the general register only when the condition is true. The destination
1163
operand should be general register, the source operand can be general register
1184
operand should be general register, the source operand can be general register
1164
or memory.
1185
or memory.
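  As a small sketch of a conditional move, the following lines leave the
larger of two signed values in EAX without any jump. The choice of registers
is arbitrary.

```asm
use32
    cmp     eax,ebx
    cmovl   eax,ebx          ; if EAX is less than EBX (signed), copy EBX to EAX
    ; EAX now holds the larger of the two signed values
```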
Line 1363... Line 1384...
1363
  "fld1", "fldz", "fldl2t", "fldl2e", "fldpi", "fldlg2" and "fldln2" load the
1384
  "fld1", "fldz", "fldl2t", "fldl2e", "fldpi", "fldlg2" and "fldln2" load the
1364
commonly used constants onto the FPU register stack. The loaded constants are
1385
commonly used constants onto the FPU register stack. The loaded constants are
1365
+1.0, +0.0, lb 10, lb e, pi, lg 2 and ln 2 respectively. These instructions
1386
+1.0, +0.0, lb 10, lb e, pi, lg 2 and ln 2 respectively. These instructions
1366
have no operands.
1387
have no operands.
1367
  "fild" convert the singed integer source operand into double extended
1388
  "fild" converts the signed integer source operand into double extended
1368
precision floating-point format and pushes the result onto the FPU register
1389
precision floating-point format and pushes the result onto the FPU register
1369
stack. The source operand can be a 16-bit, 32-bit or 64-bit memory location.
1390
stack. The source operand can be a 16-bit, 32-bit or 64-bit memory location.
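  A minimal sketch combining a constant load with "fild", assuming 32-bit mode
and that ESI points to a 16-bit signed integer in memory:

```asm
use32
    fldpi                    ; push pi onto the FPU register stack
    fild    word [esi]       ; push the 16-bit signed integer, converted
    fmulp   st1,st0          ; st1 = st1*st0, then pop: st0 = pi * integer
    fild    dword [esi]      ; "fild" also accepts a 32-bit memory operand
    fild    qword [esi]      ; and a 64-bit memory operand
```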
Line 1370... Line 1391...
1370
 
1391
 
Line 1491... Line 1512...
1491
    fcomi st2        ; compare st0 with st2 and set flags
1512
    fcomi st2        ; compare st0 with st2 and set flags
1492
    fcmovb st0,st2   ; transfer st2 to st0 if below
1513
    fcmovb st0,st2   ; transfer st2 to st0 if below
Line 1493... Line 1514...
1493
 
1514
 
1494
   Table 2.2  FPU conditions
1515
   Table 2.2  FPU conditions
1495
  ┌──────────┬──────────────────┬────────────────────────┐
1516
  /------------------------------------------------------\
1496
  │ Mnemonic │ Condition tested │ Description            │
1517
  | Mnemonic | Condition tested | Description            |
1497
  ╞══════════╪══════════════════╪════════════════════════╡
1518
  |==========|==================|========================|
1498
  │ b        │ CF = 1           │ below                  │
1519
  | b        | CF = 1           | below                  |
1499
  │ e        │ ZF = 1           │ equal                  │
1520
  | e        | ZF = 1           | equal                  |
1500
  │ be       │ CF or ZF = 1     │ below or equal         │
1521
  | be       | CF or ZF = 1     | below or equal         |
1501
  │ u        │ PF = 1           │ unordered              │
1522
  | u        | PF = 1           | unordered              |
1502
  │ nb       │ CF = 0           │ not below              │
1523
  | nb       | CF = 0           | not below              |
1503
  │ ne       │ ZF = 0           │ not equal              │
1524
  | ne       | ZF = 0           | not equal              |
1504
  │ nbe      │ CF and ZF = 0    │ not below nor equal    │
1525
  | nbe      | CF and ZF = 0    | not below nor equal    |
1505
  │ nu       │ PF = 0           │ not unordered          │
1526
  | nu       | PF = 0           | not unordered          |
Line 1506... Line 1527...
1506
  └──────────┴──────────────────┴────────────────────────┘
1527
  \------------------------------------------------------/
1507
 
1528
 
1508
  "ftst" compares the value in ST0 with 0.0 and sets the flags in the FPU
1529
  "ftst" compares the value in ST0 with 0.0 and sets the flags in the FPU
1509
status word according to the results. "fxam" examines the contents of the ST0
1530
status word according to the results. "fxam" examines the contents of the ST0
Line 1526... Line 1547...
1526
FPU state (operating environment and register stack) at the specified
1547
FPU state (operating environment and register stack) at the specified
1527
destination in memory and reinitializes the FPU. "fsave" checks for pending
1548
destination in memory and reinitializes the FPU. "fsave" checks for pending
1528
unmasked FPU exceptions before proceeding, "fnsave" does not. "frstor"
1549
unmasked FPU exceptions before proceeding, "fnsave" does not. "frstor"
1529
loads the FPU state from the specified memory location. All these instructions
1550
loads the FPU state from the specified memory location. All these instructions
1530
need an operand being a memory location.
1551
need an operand being a memory location. For each of these instructions there
1531
  "finit" and "fninit" set the FPU operating environment into its default
1552
exist two additional mnemonics that allow precise selection of the type of the
-
 
1553
operation. The "fstenvw", "fnstenvw", "fldenvw", "fsavew", "fnsavew" and
-
 
1554
"frstorw" mnemonics force the instruction to perform operation as in the 16-bit
-
 
1555
mode, while "fstenvd", "fnstenvd", "fldenvd", "fsaved", "fnsaved" and "frstord"
-
 
1556
force the operation as in 32-bit mode.
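  As a brief illustration of the explicit-size mnemonics, the fragment below
saves and restores the FPU state in a fixed format regardless of the current
code size setting. This is only a sketch: the "use32" directive and the use
of EBX as a pointer are choices made for this example, and EBX is assumed to
point to a buffer large enough for the saved state.

    use32              ; assumed code size for this fragment
    fnsaved [ebx]      ; save FPU state in the 32-bit format
    frstord [ebx]      ; restore FPU state from the 32-bit format
    fnstenvw [ebx]     ; store FPU environment in the 16-bit format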
-
 
1557
  "finit" and "fninit" set the FPU operating environment into its default
1532
state. "finit" checks for pending unmasked FPU exception before proceeding,
1558
state. "finit" checks for pending unmasked FPU exception before proceeding,
1533
"fninit" does not. "fclex" and "fnclex" clear the FPU exception flags in the
1559
"fninit" does not. "fclex" and "fnclex" clear the FPU exception flags in the
1534
FPU status word. "fclex" checks for pending unmasked FPU exception before
1560
FPU status word. "fclex" checks for pending unmasked FPU exception before
1535
proceeding, "fnclex" does not. "wait" and "fwait" are synonyms for the same
1561
proceeding, "fnclex" does not. "wait" and "fwait" are synonyms for the same
1536
instruction, which causes the processor to check for pending unmasked FPU
1562
instruction, which causes the processor to check for pending unmasked FPU
Line 1571... Line 1597...
1571
"psubd" perform the subtraction of appropriate types. "paddsb", "paddsw",
1597
"psubd" perform the subtraction of appropriate types. "paddsb", "paddsw",
1572
"psubsb" and "psubsw" perform the addition or subtraction of packed bytes
1598
"psubsb" and "psubsw" perform the addition or subtraction of packed bytes
1573
or packed words with the signed saturation. "paddusb", "paddusw", "psubusb",
1599
or packed words with the signed saturation. "paddusb", "paddusw", "psubusb",
1574
"psubusw" are analogous, but with unsigned saturation. "pmulhw" and "pmullw"
1600
"psubusw" are analogous, but with unsigned saturation. "pmulhw" and "pmullw"
1575
perform a signed multiply of the packed words and store the high or low words
1601
perform a signed multiplication of the packed words and store the high or low
1576
of the results in the destination operand. "pmaddwd" performs a multiply of
1602
words of the results in the destination operand. "pmaddwd" performs a multiply
1577
the packed words and adds the four intermediate double word products in pairs
1603
of the packed words and adds the four intermediate double word products in
1578
to produce result as a packed double words. "pand", "por" and "pxor" perform
1604
pairs to produce result as a packed double words. "pand", "por" and "pxor"
1579
the logical operations on the quad words, "pandn" performs also a logical
1605
perform the logical operations on the quad words, "pandn" performs also a
1580
negation of the destination operand before performing the "and" operation.
1606
logical negation of the destination operand before performing the "and"
1581
"pcmpeqb", "pcmpeqw" and "pcmpeqd" compare for equality of packed bytes,
1607
operation. "pcmpeqb", "pcmpeqw" and "pcmpeqd" compare for equality of packed
1582
packed words or packed double words. If a pair of data elements is equal, the
1608
bytes, packed words or packed double words. If a pair of data elements is
1583
corresponding data element in the destination operand is filled with bits of
1609
equal, the corresponding data element in the destination operand is filled with
1584
value 1, otherwise it's set to 0. "pcmpgtb", "pcmpgtw" and "pcmpgtd" perform
1610
bits of value 1, otherwise it's set to 0. "pcmpgtb", "pcmpgtw" and "pcmpgtd"
1585
the similar operation, but they check whether the data elements in the
1611
perform the similar operation, but they check whether the data elements in the
1586
destination operand are greater than the corresponding data elements in the
1612
destination operand are greater than the corresponding data elements in the
1587
source operand. "packsswb" converts packed signed words into packed signed
1613
source operand. "packsswb" converts packed signed words into packed signed
1588
bytes, "packssdw" converts packed signed double words into packed signed
1614
bytes, "packssdw" converts packed signed double words into packed signed
1589
words, using saturation to handle overflow conditions. "packuswb" converts
1615
words, using saturation to handle overflow conditions. "packuswb" converts
1590
packed signed words into packed unsigned bytes. Converted data elements from
1616
packed signed words into packed unsigned bytes. Converted data elements from
1591
the source operand are stored in the low part of the destination operand,
1617
the source operand are stored in the low part of the destination operand,
Line 1697... Line 1723...
1697
    cmpps xmm2,xmm4,0  ; compare packed single precision values
1723
    cmpps xmm2,xmm4,0  ; compare packed single precision values
1698
    cmpltss xmm0,[ebx] ; compare single precision values
1724
    cmpltss xmm0,[ebx] ; compare single precision values
Line 1699... Line 1725...
1699
 
1725
 
1700
   Table 2.3  SSE conditions
1726
   Table 2.3  SSE conditions
1701
  ┌──────┬──────────┬─────────────────────────┐
1727
  /-------------------------------------------\
1702
  │ Code │ Mnemonic │ Description             │
1728
  | Code | Mnemonic | Description             |
1703
  ╞══════╪══════════╪═════════════════════════╡
1729
  |======|==========|=========================|
1704
  │ 0    │ eq       │ equal                   │
1730
  | 0    | eq       | equal                   |
1705
  │ 1    │ lt       │ less than               │
1731
  | 1    | lt       | less than               |
1706
  │ 2    │ le       │ less than or equal      │
1732
  | 2    | le       | less than or equal      |
1707
  │ 3    │ unord    │ unordered               │
1733
  | 3    | unord    | unordered               |
1708
  │ 4    │ neq      │ not equal               │
1734
  | 4    | neq      | not equal               |
1709
  │ 5    │ nlt      │ not less than           │
1735
  | 5    | nlt      | not less than           |
1710
  │ 6    │ nle      │ not less than nor equal │
1736
  | 6    | nle      | not less than nor equal |
1711
  │ 7    │ ord      │ ordered                 │
1737
  | 7    | ord      | ordered                 |
Line 1712... Line 1738...
1712
  └──────┴──────────┴─────────────────────────┘
1738
  \-------------------------------------------/
1713
 
1739
 
1714
  "comiss" and "ucomiss" compare the single precision values and set the ZF,
1740
  "comiss" and "ucomiss" compare the single precision values and set the ZF,
1715
PF and CF flags to show the result. The destination operand must be a SSE
1741
PF and CF flags to show the result. The destination operand must be a SSE
Line 1769... Line 1795...
1769
    cvtss2si eax,xmm0  ; convert single precision value to integer
1795
    cvtss2si eax,xmm0  ; convert single precision value to integer
Line 1770... Line 1796...
1770
 
1796
 
1771
  "pextrw" copies the word in the source operand specified by the third
1797
  "pextrw" copies the word in the source operand specified by the third
1772
operand to the destination operand. The source operand must be a MMX register,
1798
operand to the destination operand. The source operand must be a MMX register,
1773
the destination operand must be a 32-bit general register (but only the low
1799
the destination operand must be a 32-bit general register (the high word of
Line 1774... Line 1800...
1774
word of it is affected), the third operand must be an 8-bit immediate value.
1800
the destination is cleared), the third operand must be an 8-bit immediate value.
Line 1775... Line 1801...
1775
 
1801
 
1776
    pextrw eax,mm0,1   ; extract word into eax
1802
    pextrw eax,mm0,1   ; extract word into eax
Line 1786... Line 1812...
1786
  "pavgb" and "pavgw" compute average of packed bytes or words. "pmaxub"
1812
  "pavgb" and "pavgw" compute average of packed bytes or words. "pmaxub"
1787
returns the maximum values of packed unsigned bytes, "pminub" returns the
1813
returns the maximum values of packed unsigned bytes, "pminub" returns the
1788
minimum values of packed unsigned bytes, "pmaxsw" returns the maximum values
1814
minimum values of packed unsigned bytes, "pmaxsw" returns the maximum values
1789
of packed signed words, "pminsw" returns the minimum values of packed signed
1815
of packed signed words, "pminsw" returns the minimum values of packed signed
1790
words. "pmulhuw" performs an unsigned multiply of the packed words and stores
1816
words. "pmulhuw" performs an unsigned multiplication of the packed words and
1791
the high words of the results in the destination operand. "psadbw" computes
1817
stores the high words of the results in the destination operand. "psadbw"
1792
the absolute differences of packed unsigned bytes, sums the differences, and
1818
computes the absolute differences of packed unsigned bytes, sums the
1793
stores the sum in the low word of destination operand. All these instructions
1819
differences, and stores the sum in the low word of destination operand. All
1794
follow the same rules for operands as the general MMX operations described in
1820
these instructions follow the same rules for operands as the general MMX
1795
previous section.
1821
operations described in previous section.
1796
  "pmovmskb" creates a mask made of the most significant bit of each byte in
1822
  "pmovmskb" creates a mask made of the most significant bit of each byte in
1797
the source operand and stores the result in the low byte of destination
1823
the source operand and stores the result in the low byte of destination
1798
operand. The source operand must be a MMX register, the destination operand
1824
operand. The source operand must be a MMX register, the destination operand
1799
must be a 32-bit general register.
1825
must be a 32-bit general register.
1800
  "pshufw" inserts words from the source operand in the destination operand
1826
  "pshufw" inserts words from the source operand in the destination operand
Line 1920... Line 1946...
1920
operand. "cvtpd2dq" and "cvttpd2dq" convert packed double precision floating
1946
operand. "cvtpd2dq" and "cvttpd2dq" convert packed double precision floating
1921
point values to packed two double word integers, storing the result in the low
1947
point values to packed two double word integers, storing the result in the low
1922
quad word of the destination operand. "cvtdq2ps" converts packed four
1948
quad word of the destination operand. "cvtdq2ps" converts packed four
1923
double word integers to packed single precision floating point values.
1949
double word integers to packed single precision floating point values.
1924
"cvtdq2pd" converts packed two double word integers from the low quad word
1950
For all these instructions the destination operand must be a SSE register, the
1925
of the source operand to packed double precision floating point values.
-
 
1926
For all these instructions the destination operand must be a SSE register, the
-
 
1927
source operand can be a 128-bit memory location or SSE register.
1951
source operand can be a 128-bit memory location or SSE register.
1928
  "movdqa" and "movdqu" transfer a double quad word operand containing packed
1952
"cvtdq2pd" converts packed two double word integers from the source operand to
-
 
1953
packed double precision floating point values, the source can be a 64-bit 
-
 
1954
memory location or SSE register, destination has to be SSE register.
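  The two source forms of "cvtdq2pd" can be sketched as follows; the
registers and the ESI-based address are arbitrary choices made for this
illustration only.

    cvtdq2pd xmm0,xmm1   ; convert two low double words of xmm1
    cvtdq2pd xmm0,[esi]  ; convert two double words from 64-bit memory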
-
 
1955
  "movdqa" and "movdqu" transfer a double quad word operand containing packed
1929
integers from source operand to destination operand. At least one of the
1956
integers from source operand to destination operand. At least one of the
1930
operands has to be a SSE register, the second one can be also a SSE register
1957
operands has to be a SSE register, the second one can be also a SSE register
1931
or 128-bit memory location. Memory operands for "movdqa" instruction must be
1958
or 128-bit memory location. Memory operands for "movdqa" instruction must be
1932
aligned on boundary of 16 bytes, operands for "movdqu" instruction don't have
1959
aligned on boundary of 16 bytes, operands for "movdqu" instruction don't have
1933
to be aligned.
1960
to be aligned.
Line 1941... Line 1968...
1941
  All MMX instructions operating on the 64-bit packed integers (those with
1968
  All MMX instructions operating on the 64-bit packed integers (those with
1942
mnemonics starting with "p") are extended to operate on 128-bit packed
1969
mnemonics starting with "p") are extended to operate on 128-bit packed
1943
integers located in SSE registers. Additional syntax for these instructions
1970
integers located in SSE registers. Additional syntax for these instructions
1944
needs an SSE register where MMX register was needed, and the 128-bit memory
1971
needs an SSE register where MMX register was needed, and the 128-bit memory
1945
location or SSE register where 64-bit memory location of MMX register were
1972
location or SSE register where 64-bit memory location or MMX register were
1946
needed. The exception is "pshufw" instruction, which doesn't allow extended
1973
needed. The exception is "pshufw" instruction, which doesn't allow extended
1947
syntax, but has two new variants: "pshufhw" and "pshuflw", which allow only
1974
syntax, but has two new variants: "pshufhw" and "pshuflw", which allow only
1948
the extended syntax, and perform the same operation as "pshufw" on the high
1975
the extended syntax, and perform the same operation as "pshufw" on the high
1949
or low quad words of operands respectively. Also the new instruction "pshufd"
1976
or low quad words of operands respectively. Also the new instruction "pshufd"
1950
is introduced, which performs the same operation as "pshufw", but on the
1977
is introduced, which performs the same operation as "pshufw", but on the
Line 1953... Line 1980...
1953
    psubb xmm0,[esi]   ; subtract 16 packed bytes
1980
    psubb xmm0,[esi]   ; subtract 16 packed bytes
1954
    pextrw eax,xmm0,7  ; extract highest word into eax
1981
    pextrw eax,xmm0,7  ; extract highest word into eax
Line 1955... Line 1982...
1955
 
1982
 
1956
  "paddq" performs the addition of packed quad words, "psubq" performs the
1983
  "paddq" performs the addition of packed quad words, "psubq" performs the
1957
subtraction of packed quad words, "pmuludq" performs an unsigned multiply
1984
subtraction of packed quad words, "pmuludq" performs an unsigned
1958
of low double words from each corresponding quad words and returns the results
1985
multiplication of low double words from each corresponding quad words and
1959
in packed quad words. These instructions follow the same rules for operands as
1986
returns the results in packed quad words. These instructions follow the same
1960
the general MMX operations described in 2.1.14.
1987
rules for operands as the general MMX operations described in 2.1.14.
1961
  "pslldq" and "psrldq" perform logical shift left or right of the double
1988
  "pslldq" and "psrldq" perform logical shift left or right of the double
1962
quad word in the destination operand by the amount of bits specified in the
1989
quad word in the destination operand by the amount of bytes specified in the
1963
source operand. The destination operand should be a SSE register, source
1990
source operand. The destination operand should be a SSE register, source
1964
operand should be an 8-bit immediate value.
1991
operand should be an 8-bit immediate value.
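  A short sketch of these byte-granular shifts follows; the register choices
are arbitrary and serve only as an illustration.

    pslldq xmm0,4      ; shift xmm0 left by 4 bytes (32 bits)
    psrldq xmm1,8      ; move high quad word of xmm1 into its low quad word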
1965
  "punpckhqdq" interleaves the high quad word of the source operand and the
1992
  "punpckhqdq" interleaves the high quad word of the source operand and the
1966
high quad word of the destination operand and writes them to the destination
1993
high quad word of the destination operand and writes them to the destination
Line 2005... Line 2032...
2005
128-bit memory location.
2032
128-bit memory location.
2006
  "movddup" loads the 64-bit source value and duplicates it into high and low
2033
  "movddup" loads the 64-bit source value and duplicates it into high and low
2007
quad word of the destination operand. The destination operand should be SSE
2034
quad word of the destination operand. The destination operand should be SSE
2008
register, the source operand can be SSE register or 64-bit memory location.
2035
register, the source operand can be SSE register or 64-bit memory location.
2009
  "lddqu" is functionally equivalent to "movdqu" instruction with memory as
2036
  "lddqu" is functionally equivalent to "movdqu" with memory as source 
2010
source operand, but it may improve performance when the source operand crosses
2037
operand, but it may improve performance when the source operand crosses a 
2011
a cacheline boundary. The destination operand has to be SSE register, the
2038
cacheline boundary. The destination operand has to be SSE register, the source
2012
source operand must be 128-bit memory location.
2039
operand must be 128-bit memory location.
2013
  "addsubps" performs single precision addition of second and fourth pairs and
2040
  "addsubps" performs single precision addition of second and fourth pairs and
2014
single precision subtraction of the first and third pairs of floating point
2041
single precision subtraction of the first and third pairs of floating point
2015
values in the operands. "addsubpd" performs double precision addition of the
2042
values in the operands. "addsubpd" performs double precision addition of the
2016
second pair and double precision subtraction of the first pair of floating
2043
second pair and double precision subtraction of the first pair of floating
2017
point values in the operand. "haddps" performs the addition of two single
2044
point values in the operand. "haddps" performs the addition of two single
2018
precision values within the each quad word of source and destination operands,
2045
precision values within the each quad word of source and destination operands,
Line 2028... Line 2055...
2028
need its three operands to be EAX, ECX and EDX register in that order. "mwait"
2055
need its three operands to be EAX, ECX and EDX register in that order. "mwait"
2029
waits for a write-back store to the address range set up by the "monitor"
2056
waits for a write-back store to the address range set up by the "monitor"
2030
instruction. It uses two operands with additional parameters, first being the
2057
instruction. It uses two operands with additional parameters, first being the
2031
EAX and second the ECX register.
2058
EAX and second the ECX register.
2032
 
2059
  The functionality of SSE3 is further extended by the set of Supplemental
-
 
2060
SSE3 instructions (SSSE3). They generally follow the same rules for operands
-
 
2061
as all the MMX operations extended by SSE.
-
 
2062
  "phaddw" and "phaddd" perform the horizontal addition of the pairs of
-
 
2063
adjacent values from both the source and destination operand, and store the
-
 
2064
sums into the destination (sums from the source operand go into lower part of
-
 
2065
destination register). They operate on 16-bit or 32-bit chunks, respectively.
-
 
2066
"phaddsw" performs the same operation on signed 16-bit packed values, but the
-
 
2067
result of each addition is saturated. "phsubw" and "phsubd" analogously
-
 
2068
perform the horizontal subtraction of 16-bit or 32-bit packed values, and
-
 
2069
"phsubsw" performs the horizontal subtraction of signed 16-bit packed values
-
 
2070
with saturation.
-
 
2071
  "pabsb", "pabsw" and "pabsd" calculate the absolute value of each signed
-
 
2072
packed value in source operand and store them into the destination
-
 
2073
register. They operate on 8-bit, 16-bit and 32-bit elements respectively.
-
 
2074
  "pmaddubsw" multiplies signed 8-bit values from the source operand with the
-
 
2075
corresponding unsigned 8-bit values from the destination operand to produce
-
 
2076
intermediate 16-bit values, and every adjacent pair of those intermediate
-
 
2077
values is then added horizontally and those 16-bit sums are stored into the
-
 
2078
destination operand.
-
 
2079
  "pmulhrsw" multiplies corresponding 16-bit integers from the source and
-
 
2080
destination operand to produce intermediate 32-bit values, and the 16 bits
-
 
2081
next to the highest bit of each of those values are then rounded and packed
-
 
2082
into the destination operand.
-
 
2083
  "pshufb" shuffles the bytes in the destination operand according to the
-
 
2084
mask provided by source operand - each of the bytes in source operand is
-
 
2085
an index of the target position for the corresponding byte in the destination.
-
 
2086
  "psignb", "psignw" and "psignd" perform the operation on 8-bit, 16-bit or
-
 
2087
32-bit integers in destination operand, depending on the signs of the values
-
 
2088
in the source. If the value in source is negative, the corresponding value in
-
 
2089
the destination register is negated, if the value in source is positive, no
-
 
2090
operation is performed on the corresponding value, and if the
-
 
2091
value in source is zero, the value in destination is zeroed, too.
-
 
2092
  "palignr" appends the source operand to the destination operand to form the
-
 
2093
intermediate value of twice the size, and then extracts into the destination
-
 
2094
register the 64 or 128 bits that are right-aligned to the byte offset
-
 
2095
specified by the third operand, which should be an 8-bit immediate value. This
-
 
2096
is the only SSSE3 instruction that takes three arguments.
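  The following lines sketch the operand forms of some SSSE3 instructions;
the registers and addressing used here are arbitrary choices made for the
illustration only.

    pabsw xmm0,xmm1      ; absolute values of 8 packed words
    phaddd mm0,[esi]     ; horizontal addition of packed double words
    pshufb xmm0,[esi]    ; shuffle bytes of xmm0 according to mask in memory
    palignr xmm0,xmm1,4  ; extract 16 bytes at offset 4 from xmm0:xmm1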
-
 
2097
 
Line 2033... Line 2098...
2033
 
2098
 
Line 2034... Line 2099...
2034
2.1.18  AMD 3DNow! instructions
2099
2.1.18  AMD 3DNow! instructions
2035
 
2100
 
2036
The 3DNow! extension adds new MMX instructions to those described in 2.1.14,
2101
The 3DNow! extension adds new MMX instructions to those described in 2.1.14,
2037
and introduces operation on the 64-bit packed floating point values, each
2102
and introduces operation on the 64-bit packed floating point values, each
2038
consisting of two single precision floating point values.
2103
consisting of two single precision floating point values.
2039
  These instructions follow the same rules as the general MMX operations, the
2104
  These instructions follow the same rules as the general MMX operations, the
2040
destination operand should be a MMX register, the source operand can be a MMX
2105
destination operand should be a MMX register, the source operand can be a MMX
2041
register or 64-bit memory location. "pavgusb" computes the rounded averages
2106
register or 64-bit memory location. "pavgusb" computes the rounded averages
2042
of packed unsigned bytes. "pmulhrw" performs a signed multiply of the packed
2107
of packed unsigned bytes. "pmulhrw" performs a signed multiplication of the
2043
words, rounds the high word of each double word result and stores them in the
2108
packed words, rounds the high word of each double word result and stores them
2044
destination operand. "pi2fd" converts packed double word integers into
2109
in the destination operand. "pi2fd" converts packed double word integers into
2045
packed floating point values. "pf2id" converts packed floating point values
2110
packed floating point values. "pf2id" converts packed floating point values
2046
into packed double word integers using truncation. "pi2fw" converts packed
2111
into packed double word integers using truncation. "pi2fw" converts packed
2047
word integers into packed floating point values, only low words of each
2112
word integers into packed floating point values, only low words of each
Line 2104... Line 2169...
2104
"ch" and "dh" registers in long mode, but you cannot use them in the same
2169
"ch" and "dh" registers in long mode, but you cannot use them in the same
2105
instruction with any of the new registers.
2170
instruction with any of the new registers.
2106
 
2171
 
Line 2107... Line 2172...
2107
   Table 2.4  New registers in long mode
2172
   Table 2.4  New registers in long mode
2108
  ┌──────┬───────────────────────────┬───────┐
2173
  /--------------------------------------------------\
2109
  │ Type │          General          │  SSE  │
2174
  | Type |          General          |  SSE  |  AVX  |
2110
  ├──────┼──────┬──────┬──────┬──────┼───────┤
2175
  |------|---------------------------|-------|-------|
2111
  │ Bits │  8   │  16  │  32  │  64  │  128  │
2176
  | Bits |  8   |  16  |  32  |  64  |  128  |  256  |
2112
  ╞══════╪══════╪══════╪══════╪══════╪═══════╡
2177
  |======|======|======|======|======|=======|=======|
2113
  │      │      │      │      │ rax  │       │
2178
  |      |      |      |      | rax  |       |       |
2114
  │      │      │      │      │ rcx  │       │
2179
  |      |      |      |      | rcx  |       |       |
2115
  │      │      │      │      │ rdx  │       │
2180
  |      |      |      |      | rdx  |       |       |
2116
  │      │      │      │      │ rbx  │       │
2181
  |      |      |      |      | rbx  |       |       |
2117
  │      │ spl  │      │      │ rsp  │       │
2182
  |      | spl  |      |      | rsp  |       |       |
2118
  │      │ bpl  │      │      │ rbp  │       │
2183
  |      | bpl  |      |      | rbp  |       |       |
2119
  │      │ sil  │      │      │ rsi  │       │
2184
  |      | sil  |      |      | rsi  |       |       |
2120
  │      │ dil  │      │      │ rdi  │       │
2185
  |      | dil  |      |      | rdi  |       |       |
2121
  │      │ r8b  │ r8w  │ r8d  │ r8   │ xmm8  │
2186
  |      | r8b  | r8w  | r8d  | r8   | xmm8  | ymm8  |
2122
  │      │ r9b  │ r9w  │ r9d  │ r9   │ xmm9  │
2187
  |      | r9b  | r9w  | r9d  | r9   | xmm9  | ymm9  |
2123
  │      │ r10b │ r10w │ r10d │ r10  │ xmm10 │
2188
  |      | r10b | r10w | r10d | r10  | xmm10 | ymm10 |
2124
  │      │ r11b │ r11w │ r11d │ r11  │ xmm11 │
2189
  |      | r11b | r11w | r11d | r11  | xmm11 | ymm11 |
2125
  │      │ r12b │ r12w │ r12d │ r12  │ xmm12 │
2190
  |      | r12b | r12w | r12d | r12  | xmm12 | ymm12 |
2126
  │      │ r13b │ r13w │ r13d │ r13  │ xmm13 │
2191
  |      | r13b | r13w | r13d | r13  | xmm13 | ymm13 |
2127
  │      │ r14b │ r14w │ r14d │ r14  │ xmm14 │
2192
  |      | r14b | r14w | r14d | r14  | xmm14 | ymm14 |
2128
  │      │ r15b │ r15w │ r15d │ r15  │ xmm15 │
2193
  |      | r15b | r15w | r15d | r15  | xmm15 | ymm15 |
2129
  └──────┴──────┴──────┴──────┴──────┴───────┘
2194
  \--------------------------------------------------/
Line 2130... Line 2195...
2130
 
2195
 
2131
   In general any instruction from x86 architecture, which allowed 16-bit or
2196
   In general any instruction from x86 architecture, which allowed 16-bit or
2132
32-bit operand sizes, in long mode allows also the 64-bit operands. The 64-bit
2197
32-bit operand sizes, in long mode allows also the 64-bit operands. The 64-bit
2133
registers should be used for addressing in long mode, the 32-bit addressing
2198
registers should be used for addressing in long mode, the 32-bit addressing
Line 2163... Line 2228...
2163
  If any operation is performed on the 32-bit general registers in long mode,
the upper 32 bits of the 64-bit registers containing them are filled with
zeros. This is unlike the operations on 16-bit or 8-bit portions of those
registers, which preserve the upper bits.
  Three new type conversion instructions are available. The "cdqe" sign
extends the double word in EAX into quad word and stores the result in RAX
register. "cqo" sign extends the quad word in RAX into double quad word and
stores the extra bits in the RDX register. These instructions have no
operands. "movsxd" sign extends the double word source operand, being either
the 32-bit register or memory, into 64-bit destination operand, which has to
be register. No analogous instruction is needed for the zero extension, since
it is done automatically by any operations on 32-bit registers, as noted in
previous paragraph. And the "movzx" and "movsx" instructions, conforming to
the general rule, can be used with 64-bit destination operand, allowing
extension of byte or word values into quad words.
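  As a minimal sketch of the rules above (the register choices and the
address in RBX are only illustrative, and "use64" is assumed here to select
the long mode encodings), these extensions might be written as:

    use64
    mov eax,-1              ; upper 32 bits of RAX are filled with zeros
    cdqe                    ; sign extend EAX into RAX
    cqo                     ; sign extend RAX into RDX:RAX
    movsxd rcx,dword [rbx]  ; sign extend double word from memory
    movzx rdx,byte [rbx]    ; zero extend byte into quad word

After the first instruction RAX holds 00000000FFFFFFFFh, and "cdqe" turns it
into 0FFFFFFFFFFFFFFFFh.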
  All the binary arithmetic and logical instructions have been promoted to
allow 64-bit operands in long mode. The use of decimal arithmetic instructions
in long mode is prohibited.
  The stack operations, like "push" and "pop" in long mode default to 64-bit
operands and it's not possible to use 32-bit operands with them. The "pusha"
and "popa" are disallowed in long mode.
  The indirect near jumps and calls in long mode default to 64-bit operands
and it's not possible to use the 32-bit operands with them. On the other hand,
the indirect far jumps and calls allow any operands that were allowed by the
x86 architecture and also 80-bit memory operand is allowed (though only EM64T
seems to implement such variant), with the first eight bytes defining the
offset and two last bytes specifying the selector. The direct far jumps and
calls are not allowed in long mode.
  The I/O instructions, "in", "out", "ins" and "outs" are the exceptional
instructions that are not extended to accept quad word operands in long mode.
But all other string operations are, and there are new short forms "movsq",
"cmpsq", "scasq", "lodsq" and "stosq" introduced for the variants of string
operations for 64-bit string elements. The RSI and RDI registers are used by
default to address the string elements.
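  For illustration only, assuming that RSI and RDI already point to the source
and destination buffers (the element count is an arbitrary choice), a block
copy with the new short form could look like this:

    use64
    cld                     ; process the strings upwards
    mov ecx,16              ; element count, upper half of RCX gets zeroed
    rep movsq               ; copy 16 quad words from [RSI] to [RDI]

Each iteration moves eight bytes and advances both RSI and RDI by eight.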
Line 2201... Line 2266...
in long mode require the 80-bit memory operand.
  The "cmpxchg16b" is the 64-bit equivalent of "cmpxchg8b" instruction, it uses
the double quad word memory operand and 64-bit registers to perform the
analogous operation.
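  A hedged sketch of its use follows; RSI is an arbitrarily chosen pointer,
assumed to hold a 16-byte aligned address, and "use64" is assumed as well:

    use64
    lock cmpxchg16b dqword [rsi] ; compare RDX:RAX with memory, store
                                 ; RCX:RBX there if they are equal

When the values differ, the memory contents are loaded into RDX:RAX instead,
and the ZF flag tells which of the two cases occurred.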
  The "fxsave64" and "fxrstor64" are new variants of "fxsave" and "fxrstor"
instructions, available only in long mode, which use a different format of
storage area in order to store some pointers in full 64-bit size.
  "swapgs" is the new instruction, which swaps the contents of GS register and
the KernelGSbase model-specific register (MSR address 0C0000102h).
  "syscall" and "sysret" are the pair of new instructions that provide the
functionality similar to "sysenter" and "sysexit" in long mode, where the
latter pair is disallowed. The "sysexitq" and "sysretq" mnemonics provide the
64-bit versions of "sysexit" and "sysret" instructions.
  The "rdmsrq" and "wrmsrq" mnemonics are the 64-bit variants of the "rdmsr"
and "wrmsr" instructions.


2.1.20  SSE4 instructions

There are actually three different sets of instructions under the name SSE4.
Intel designed two of them, SSE4.1 and SSE4.2, with the latter extending the
former into the full Intel's SSE4 set. On the other hand, the implementation
by AMD includes only a few instructions from this set, but also contains
some additional instructions, that are called the SSE4a set.
  The SSE4.1 instructions mostly follow the same rules for operands, as
the basic SSE operations, so they require destination operand to be SSE
register and source operand to be 128-bit memory location or SSE register,
and some operations require a third operand, the 8-bit immediate value.
  "pmulld" performs a signed multiplication of the packed double words and
stores the low double words of the results in the destination operand.
"pmuldq" performs two signed multiplications of the low double words of the
corresponding quad words of operands, and stores the results as
packed quad words into the destination register. "pminsb" and "pmaxsb"
return the minimum or maximum values of packed signed bytes, "pminuw" and
"pmaxuw" return the minimum and maximum values of packed unsigned words,
"pminud", "pmaxud", "pminsd" and "pmaxsd" return minimum or maximum values
of packed unsigned or signed double words. These instructions complement the
instructions computing packed minimum or maximum introduced by SSE.
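  For example (the registers and the memory address are arbitrary here, so
this is only a sketch), the integer operations described above might be used
like this:

    pmulld xmm0,xmm1        ; low double words of four signed products
    pmaxud xmm2,[ebx]       ; maxima of unsigned double words
    pminsb xmm3,xmm4        ; minima of signed bytes

The memory operand in such case is a 128-bit location.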
  "ptest" sets the ZF flag to one when the result of bitwise AND of
both operands is zero, and zeroes the ZF otherwise. It also sets CF flag
to one, when the result of bitwise AND of the source operand with
the bitwise NOT of the destination operand is zero, and zeroes the CF
otherwise. "pcmpeqq" compares packed quad words for equality, and fills the
corresponding elements of destination operand with either ones or zeros,
depending on the result of comparison.
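  One possible use of "ptest", shown here only as a sketch with a
hypothetical label, is checking whether a register contains only zero bits:

    ptest xmm0,xmm0         ; AND of the register with itself
    jz all_zero             ; taken when every bit of xmm0 is zero
    pcmpeqq xmm1,xmm2       ; reached only for non-zero xmm0
    all_zero:

The "jc" or "jnc" instruction can be used in the same manner to branch on the
second condition.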
  "packusdw" converts packed signed double words from both the source and
destination operand into the unsigned words using saturation, and stores
the eight resulting word values into the destination register.
  "phminposuw" finds the minimum unsigned word value in source operand and
places it into the lowest word of destination operand, stores the index of
that word within the source into bits 16-18 of destination, and sets the
remaining upper bits of destination to zero.
  "roundps", "roundss", "roundpd" and "roundsd" perform the rounding of packed
or individual floating point value of single or double precision, using the
rounding mode specified by the third operand.

    roundsd xmm0,xmm1,0011b ; round toward zero
 
  "dpps" calculates dot product of packed single precision floating point
values, that is it multiplies the corresponding pairs of values from source and
destination operand and then sums the products up. The high four bits of the
8-bit immediate third operand control which products are calculated and taken
to the sum, and the low four bits control, into which elements of destination
the resulting dot product is copied (the other elements are filled with zero).
"dppd" calculates dot product of packed double precision floating point values.
The bits 4 and 5 of third operand control, which products are calculated and
added, and bits 0 and 1 of this value control, which elements in destination
register should get filled with the result. "mpsadbw" calculates multiple sums
of absolute differences of unsigned bytes. The third operand controls, with
value in bits 0-1, which of the four-byte blocks in source operand is taken to
calculate the absolute differences, and with value in bit 2, at which of the
first two four-byte blocks in destination operand to start calculating multiple
sums. The sum is calculated from four absolute differences between the
corresponding unsigned bytes in the source and destination block, and each next
sum is calculated in the same way, but taking the four bytes from destination
at the position one byte after the position of previous block. The four bytes
from the source stay the same each time. This way eight sums of absolute
differences are calculated and stored as packed word values into the
destination operand. The instructions described in this paragraph follow the
same rules for operands, as "roundps" instruction.
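  As an illustration (with arbitrarily chosen registers), the immediate
values below select all the products and the lowest element as the place for
the result:

    dpps xmm0,xmm1,11110001b ; sum of four products into lowest element
    dppd xmm2,xmm3,00110001b ; sum of two products into lowest element

The elements of destination not selected by the low bits are filled with zero.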
  "blendps", "blendvps", "blendpd" and "blendvpd" conditionally copy the
values from source operand into the destination operand, depending on the bits
of the mask provided by third operand. If a mask bit is set, the corresponding
element of source is copied into the same place in destination, otherwise this
position in destination is left unchanged. The rules for the first two operands
are the same, as for general SSE instructions. "blendps" and "blendpd" need
third operand to be 8-bit immediate, and they operate on single or double
precision values, respectively. "blendvps" and "blendvpd" require third operand
to be the XMM0 register.

    blendvps xmm3,xmm7,xmm0 ; blend according to mask
 
  "pblendw" conditionally copies word elements from the source operand into the
destination, depending on the bits of mask provided by third operand, which
needs to be 8-bit immediate value. "pblendvb" conditionally copies byte
elements from the source operand into destination, depending on mask defined
by the third operand, which has to be XMM0 register. These instructions follow
the same rules for operands as "blendps" and "blendvps" instructions,
respectively.
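  A short sketch of both instructions, with registers and masks chosen just
for the sake of example:

    pblendw xmm0,xmm1,00001111b ; take four low words from xmm1
    pblendvb xmm2,xmm3,xmm0     ; take bytes selected by mask in xmm0

For "pblendvb" it is the highest bit of each byte in XMM0 that decides whether
the corresponding byte is copied.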
  "insertps" inserts a single precision floating point value taken from the
position in source operand specified by bits 6-7 of third operand into location
in destination register selected by bits 4-5 of third operand. Additionally,
the low four bits of third operand control, which elements in destination
register will be set to zero. The first two operands follow the same rules as
for the general SSE operation, the third operand should be 8-bit immediate.
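  For instance, assuming arbitrary registers, the bit fields of third operand
might be combined as follows:

    insertps xmm0,xmm1,00010000b ; lowest value of xmm1 into second position
    insertps xmm0,xmm1,11001000b ; highest value of xmm1 into lowest
                                 ; position, highest position zeroed

The zeroing is applied after the insertion is done.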
  "extractps" extracts a single precision floating point value taken from the
location in source operand specified by low two bits of third operand, and
stores it into the destination operand. The destination can be a 32-bit memory
value or general purpose register, the source operand must be SSE register,
and the third operand should be 8-bit immediate value.

    extractps edx,xmm3,3 ; extract the highest value
 
  "pinsrb", "pinsrd" and "pinsrq" copy a byte, double word or quad word from
the source operand into the location of destination operand determined by the
third operand. The destination operand has to be SSE register, the source
operand can be a memory location of appropriate size, or the 32-bit general
purpose register (but 64-bit general purpose register for "pinsrq", which is
only available in long mode), and the third operand has to be 8-bit immediate
value. These instructions complement the "pinsrw" instruction operating on SSE
register destination, which was introduced by SSE2.

    pinsrd xmm4,eax,1 ; insert double word into second position
 
  "pextrb", "pextrw", "pextrd" and "pextrq" copy a byte, word, double word or
quad word from the location in source operand specified by third operand, into
the destination. The source operand should be SSE register, the third operand
should be 8-bit immediate, and the destination operand can be memory location
of appropriate size, or the 32-bit general purpose register (but 64-bit general
purpose register for "pextrq", which is only available in long mode). The
"pextrw" instruction with SSE register as source was already introduced by
SSE2, but SSE4 extends it to allow memory operand as destination.

    pextrw [ebx],xmm3,7 ; extract highest word into memory
 
  "pmovsxbw" and "pmovzxbw" perform sign extension or zero extension of eight
byte values from the source operand into packed word values in destination
operand, which has to be SSE register. The source can be 64-bit memory or SSE
register - when it is register, only its low portion is used. "pmovsxbd" and
"pmovzxbd" perform sign extension or zero extension of the four byte values
from the source operand into packed double word values in destination operand,
the source can be 32-bit memory or SSE register. "pmovsxbq" and "pmovzxbq"
perform sign extension or zero extension of the two byte values from the
source operand into packed quad word values in destination operand, the source
can be 16-bit memory or SSE register. "pmovsxwd" and "pmovzxwd" perform sign
extension or zero extension of the four word values from the source operand
into packed double words in destination operand, the source can be 64-bit
memory or SSE register. "pmovsxwq" and "pmovzxwq" perform sign extension or
zero extension of the two word values from the source operand into packed quad
words in destination operand, the source can be 32-bit memory or SSE register.
"pmovsxdq" and "pmovzxdq" perform sign extension or zero extension of the two
double word values from the source operand into packed quad words in
destination operand, the source can be 64-bit memory or SSE register.

    pmovzxbq xmm0,word [si]  ; zero-extend bytes to quad words
    pmovsxwq xmm0,xmm1       ; sign-extend words to quad words
 
  "movntdqa" loads double quad word from the source operand to the destination
using a non-temporal hint. The destination operand should be SSE register,
and the source operand should be 128-bit memory location.
  The SSE4.2, described below, adds not only some new operations on SSE
registers, but also introduces some completely new instructions operating on
general purpose registers only.
  "pcmpistri" compares two zero-ended (implicit length) strings provided in
its source and destination operand and generates an index stored to ECX;
"pcmpistrm" performs the same comparison and generates a mask stored to XMM0.
"pcmpestri" compares two strings of explicit lengths, with length provided
in EAX for the destination operand and in EDX for the source operand, and
generates an index stored to ECX; "pcmpestrm" performs the same comparison
and generates a mask stored to XMM0. The source and destination operand follow
the same rules as for general SSE instructions, the third operand should be
8-bit immediate value determining the details of performed operation - refer to
Intel documentation for information on those details.
  "pcmpgtq" compares packed quad words, and fills the corresponding elements of
destination operand with either ones or zeros, depending on whether the value
in destination is greater than the one in source, or not. This instruction
follows the same rules for operands as "pcmpeqq".
  "crc32" accumulates a CRC32 value for the source operand starting with
initial value provided by destination operand, and stores the result in
destination. Unless in long mode, the destination operand should be a 32-bit
general purpose register, and the source operand can be a byte, word, or double
word register or memory location. In long mode the destination operand can
also be a 64-bit general purpose register, and the source operand in such case
can be a byte or quad word register or memory location.

    crc32 eax,dl          ; accumulate CRC32 on byte value
    crc32 eax,word [ebx]  ; accumulate CRC32 on word value
    crc32 rax,qword [rbx] ; accumulate CRC32 on quad word value
 
  "popcnt" calculates the number of bits set in the source operand, which can
be 16-bit, 32-bit, or 64-bit general purpose register or memory location,
and stores this count in the destination operand, which has to be register of
the same size as source operand. The 64-bit variant is available only in long
mode.

    popcnt ecx,eax        ; count bits set to 1
 
  The SSE4a extension, which also includes the "popcnt" instruction introduced
by SSE4.2, at the same time adds the "lzcnt" instruction, which follows the
same syntax, and calculates the count of leading zero bits in source operand
(if the source operand is all zero bits, the total number of bits in source
operand is stored in destination).
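  A minimal sketch, with registers and address chosen arbitrarily:

    lzcnt eax,ecx           ; count leading zero bits in ECX
    lzcnt dx,word [ebx]     ; 16-bit variant with memory as source

If ECX contained zero, EAX would get the value 32.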
  "extrq" extracts the sequence of bits from the low quad word of SSE register
provided as first operand and stores them at the low end of this register,
filling the remaining bits in the low quad word with zeros. The position of bit
string and its length can either be provided with two 8-bit immediate values
as second and third operand, or by SSE register as second operand (and there
is no third operand in such case), which should contain position value in bits
8-13 and length of bit string in bits 0-5.

    extrq xmm0,8,7        ; extract 8 bits from position 7
    extrq xmm0,xmm5       ; extract bits defined by register
 
  "insertq" writes the sequence of bits from the low quad word of the source
operand into specified position in low quad word of the destination operand,
leaving the other bits in low quad word of destination intact. The position
where bits should be written and the length of bit string can either be
provided with two 8-bit immediate values as third and fourth operand, or by
the bit fields in source operand (and there are only two operands in such
case), which should contain position value in bits 72-77 and length of bit
string in bits 64-69.

    insertq xmm1,xmm0,4,2 ; insert 4 bits at position 2
    insertq xmm1,xmm0     ; insert bits defined by register
 
  "movntss" and "movntsd" store single or double precision floating point
value from the source SSE register into 32-bit or 64-bit destination memory
location respectively, using non-temporal hint.


2.1.21  AVX instructions

The Advanced Vector Extensions introduce instructions that are new variants
of SSE instructions, with new scheme of encoding that allows extended syntax
having a destination operand separate from all the source operands. It also
introduces 256-bit AVX registers, which extend up the old 128-bit SSE
registers. Any AVX instruction that puts some result into SSE register, puts
zero bits into high portion of the AVX register containing it.
  The AVX version of SSE instruction has the mnemonic obtained by prepending
SSE instruction name with "v". For any SSE arithmetic instruction which had a
destination operand also being used as one of the source values, the AVX
variant has a new syntax with three operands - the destination and two sources.
The destination and first source can be SSE registers, and second source can be
SSE register or memory. If the operation is performed on single pair of values,
the remaining bits of first source SSE register are copied into the
destination register.

    vsubss xmm0,xmm2,xmm3         ; subtract two 32-bit floats
    vmulsd xmm0,xmm7,qword [esi]  ; multiply two 64-bit floats
 
In case of packed operations, each instruction can also operate on the 256-bit
data size when the AVX registers are specified instead of SSE registers, and
the size of memory operand is also doubled then.

    vaddps ymm1,ymm5,yword [esi]  ; eight sums of 32-bit float pairs
 
The instructions that operate on packed integer types (in particular the ones
that earlier had been promoted from MMX to SSE) also acquired the new syntax
with three operands, however they are only allowed to operate on 128-bit
packed types and thus cannot use the whole AVX registers.

    vpavgw xmm3,xmm0,xmm2         ; average of 16-bit integers
    vpslld xmm1,xmm0,1            ; shift double words left
     
If the SSE version of instruction had a syntax with three operands, the third
one being an immediate value, the AVX version of such instruction takes four
operands, with immediate remaining the last one.

    vshufpd ymm0,ymm1,ymm2,10010011b ; shuffle 64-bit floats
    vpalignr xmm0,xmm4,xmm2,3        ; extract byte aligned value
     
The promotion to new syntax according to the rules described above has been
applied to all the instructions from SSE extensions up to SSE4, with the
exceptions described below.
  "vdppd" instruction has syntax extended to four operands, but it does not
have a 256-bit version.
  There are a few instructions, namely "vsqrtpd", "vsqrtps", "vrcpps" and
"vrsqrtps", which can operate on 256-bit data size, but retained the syntax
with only two operands, because they use data from only one source:

    vsqrtpd ymm1,ymm0         ; put square roots into other register
 
In a similar way "vroundpd" and "vroundps" retained the syntax with three
operands, the last one being immediate value.

    vroundps ymm0,ymm1,0011b  ; round toward zero
                              
  Also some of the operations on packed integers kept their two-operand or
three-operand syntax while being promoted to AVX version. In such case these
instructions follow exactly the same rules for operands as their SSE
counterparts (since operations on packed integers do not have 256-bit variants
in AVX extension). These include "vpcmpestri", "vpcmpestrm", "vpcmpistri",
"vpcmpistrm", "vphminposuw", "vpshufd", "vpshufhw", "vpshuflw". And there are
more instructions that in AVX versions keep exactly the same syntax for
operands as the one from SSE, without any additional options: "vcomiss",
"vcomisd", "vcvtss2si", "vcvtsd2si", "vcvttss2si", "vcvttsd2si", "vextractps",
"vpextrb", "vpextrw", "vpextrd", "vpextrq", "vmovd", "vmovq", "vmovntdqa",
"vmaskmovdqu", "vpmovmskb", "vpmovsxbw", "vpmovsxbd", "vpmovsxbq", "vpmovsxwd",
"vpmovsxwq", "vpmovsxdq", "vpmovzxbw", "vpmovzxbd", "vpmovzxbq", "vpmovzxwd",
"vpmovzxwq" and "vpmovzxdq".
  The move and conversion instructions have mostly been promoted to allow
256-bit size operands in addition to the 128-bit variant with syntax identical
to that from SSE version of the same instruction. Each of the "vcvtdq2ps",
"vcvtps2dq", "vcvttps2dq", "vmovaps", "vmovapd", "vmovups", "vmovupd",
"vmovdqa", "vmovdqu", "vlddqu", "vmovntps", "vmovntpd", "vmovntdq",
"vmovsldup", "vmovshdup", "vmovmskps" and "vmovmskpd" inherits the 128-bit
syntax from SSE without any changes, and also allows a new form with 256-bit
operands in place of 128-bit ones.

    vmovups [edi],ymm6        ; store unaligned 256-bit data
    
  "vmovddup" has the identical 128-bit syntax as its SSE version, and it also
has a 256-bit version, which stores the duplicates of the lowest quad word
from the source operand in the lower half of destination operand, and in the
upper half of destination the duplicates of the low quad word from the upper
half of source. Both source and destination operands need then to be 256-bit
values.
  "vmovlhps" and "vmovhlps" have only 128-bit versions, and each takes three
operands, which all must be SSE registers. "vmovlhps" copies two single
precision values from the low quad word of second source register to the high
quad word of destination register, and copies the low quad word of first
source register into the low quad word of destination register. "vmovhlps"
copies two single precision values from the high quad word of second source
register to the low quad word of destination register, and copies the high
quad word of first source register into the high quad word of destination
register.
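  The following lines are only a sketch with arbitrary registers, showing
where each quad word of the result comes from:

    vmovddup ymm0,ymm1       ; duplicate low quad word of each half
    vmovlhps xmm2,xmm3,xmm4  ; low from xmm3, high is low of xmm4
    vmovhlps xmm5,xmm6,xmm7  ; high from xmm6, low is high of xmm7

As with the other AVX instructions storing result in SSE register, the two
latter ones put zero bits into the high portions of YMM2 and YMM5.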
  "vmovlps", "vmovhps", "vmovlpd" and "vmovhpd" have only 128-bit versions and
their syntax varies depending on whether memory operand is a destination or
source. When memory is destination, the syntax is identical to the one of
equivalent SSE instruction, and when memory is source, the instruction requires
three operands, first two being SSE registers and the third one 64-bit memory.
The value put into destination is then the value copied from first source with
either low or high quad word replaced with value from second source (the
memory operand).

    vmovhps [esi],xmm7       ; store upper half to memory
    vmovlps xmm0,xmm7,[ebx]  ; low from memory, rest from register
  
  "vmovss" and "vmovsd" have syntax identical to their SSE equivalents as long
as one of the operands is memory, while the versions that operate purely on
registers require three operands (each being SSE register). The value stored
in destination is then the value copied from first source with lowest data
element replaced with the lowest value from second source.

    vmovss xmm3,[edi]        ; low from memory, rest zeroed
    vmovss xmm0,xmm1,xmm2    ; one value from xmm2, three from xmm1
  
  "vcvtss2sd", "vcvtsd2ss", "vcvtsi2ss" and "vcvtsi2sd" use the three-operand
syntax, where destination and first source are always SSE registers, and the
second source follows the same rules as the source in syntax of equivalent
SSE instruction. The value stored in destination is then the value copied from
first source with lowest data element replaced with the result of conversion.

    vcvtsi2sd xmm4,xmm4,ecx  ; 32-bit integer to 64-bit float
    vcvtsi2ss xmm0,xmm0,rax  ; 64-bit integer to 32-bit float
 
  "vcvtdq2pd" and "vcvtps2pd" allow the same syntax as their SSE equivalents,
plus the new variants with AVX register as destination and SSE register or
128-bit memory as source. Analogously "vcvtpd2dq", "vcvttpd2dq" and
"vcvtpd2ps", in addition to variant with syntax identical to SSE version,
allow a variant with SSE register as destination and AVX register or 256-bit
memory as source.
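  As a sketch of the variants with operands of different sizes (registers and
address are arbitrary):

    vcvtdq2pd ymm0,xmm1          ; four integers into four 64-bit floats
    vcvtpd2ps xmm2,ymm3          ; four 64-bit floats into 32-bit floats
    vcvttpd2dq xmm4,yword [ebx]  ; source is 256-bit memory

The size operator is used with memory operand here, because the destination
register would be the same for the 128-bit source.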
  "vinsertps", "vpinsrb", "vpinsrw", "vpinsrd", "vpinsrq" and "vpblendw" use
a syntax with four operands, where destination and first source have to be SSE
registers, and the third and fourth operand follow the same rules as second
and third operand in the syntax of equivalent SSE instruction. Value stored in
destination is the value copied from first source with some data elements
replaced with values extracted from the second source, analogously to the
operation of corresponding SSE instruction.

    vpinsrd xmm0,xmm0,eax,3  ; insert double word
 
  "vblendvps", "vblendvpd" and "vpblendvb" use a new syntax with four register
operands: destination, two sources and a mask, where second source can also be
a memory operand. "vblendvps" and "vblendvpd" have 256-bit variant, where
operands are AVX registers or 256-bit memory, as well as 128-bit variant,
which has operands being SSE registers or 128-bit memory. "vpblendvb" has only
a 128-bit variant. Value stored in destination is the value copied from the
first source with some data elements replaced, according to mask, by values
from the second source.

    vblendvps ymm3,ymm1,ymm2,ymm7  ; blend according to mask
   
  "vptest" allows the same syntax as its SSE version and also has a 256-bit
version, with both operands doubled in size. There are also two new
instructions, "vtestps" and "vtestpd", which perform analogous tests, but only
of the sign bits of corresponding single precision or double precision values,
and set the ZF and CF accordingly. They follow the same syntax rules as
"vptest".

    vptest ymm0,yword [ebx]  ; test 256-bit values
    vtestpd xmm0,xmm1        ; test sign bits of 64-bit floats
 
  "vbroadcastss", "vbroadcastsd" and "vbroadcastf128" are new instructions,
which broadcast the data element defined by source operand into all elements
of corresponding size in the destination register. "vbroadcastss" needs
source to be 32-bit memory and destination to be either SSE or AVX register.
"vbroadcastsd" requires 64-bit memory as source, and AVX register as
destination. "vbroadcastf128" requires 128-bit memory as source, and AVX
register as destination.

    vbroadcastss ymm0,dword [eax]  ; get eight copies of value
 
  "vinsertf128" is the new instruction, which takes four operands. The
destination and first source have to be AVX registers, second source can be
SSE register or 128-bit memory location, and fourth operand should be an
immediate value. It stores in destination the value obtained by taking
contents of first source and replacing one of its 128-bit units with value of
the second source. The lowest bit of fourth operand specifies at which
position that replacement is done (either 0 or 1).
  "vextractf128" is the new instruction with three operands. The destination
needs to be SSE register or 128-bit memory location, the source must be AVX
register, and the third operand should be an immediate value. It extracts
into destination one of the 128-bit units from source. The lowest bit of third
operand specifies, which unit is extracted.
-
 
2680
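  As an illustration of the two instructions above, the following sketch
builds a 256-bit value and then stores its low half. The choice of registers
and of the address in EDI is arbitrary.

    vinsertf128 ymm0,ymm1,xmm2,1     ; copy ymm1, replace its high 128 bits
    vextractf128 xword [edi],ymm0,0  ; store the low 128 bits of ymm0

With the immediate value 0 the low unit is affected, and with 1 the high one.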
  "vmaskmovps" and "vmaskmovpd" are the new instructions with three operands
that selectively store in destination the elements from second source
depending on the sign bits of corresponding elements from first source. These
instructions can operate on either 128-bit data (SSE registers) or 256-bit
data (AVX registers). Either destination or second source has to be a memory
location of appropriate size, the two other operands should be registers.

    vmaskmovps [edi],xmm0,xmm5  ; conditionally store
    vmaskmovpd ymm5,ymm0,[esi]  ; conditionally load

  "vpermilpd" and "vpermilps" are the new instructions with three operands
that permute the values from first source according to the control fields from
second source and put the result into destination operand. They allow either
three SSE registers or three AVX registers as operands, and the second source
can also be a memory of size equal to the registers used. In the alternative
form the second source can be an immediate value, and then the first source
can be a memory location of the size equal to destination register.
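  A short sketch of both forms follows, with registers and address picked
only for illustration. The immediate 1Bh holds the 2-bit fields 3, 2, 1, 0
(counting from the lowest bits), which reverses the order of elements.

    vpermilps xmm0,xmm1,xmm2         ; control fields taken from xmm2
    vpermilps ymm0,yword [esi],1Bh   ; reverse floats in each 128-bit block

In the 256-bit form the same immediate is applied to each 128-bit block.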
  "vperm2f128" is the new instruction with four operands, which selects 
128-bit blocks of floating point data from first and second source according
to the bit fields from fourth operand, and stores them in destination.
Destination and first source need to be AVX registers, second source can be
AVX register or 256-bit memory area, and fourth operand should be an immediate
value.

    vperm2f128 ymm0,ymm6,ymm7,12h  ; permute 128-bit blocks

  "vzeroall" instruction sets all the AVX registers to zero. "vzeroupper" sets
the upper 128-bit portions of all AVX registers to zero, leaving the SSE
registers intact. These new instructions take no operands.
  "vldmxcsr" and "vstmxcsr" are the AVX versions of "ldmxcsr" and "stmxcsr"
instructions. The rules for their operands remain unchanged.
 
  
2.1.22  AVX2 instructions

The AVX2 extension allows all the AVX instructions operating on packed integers
to use 256-bit data types, and introduces some new instructions as well.
  The AVX instructions that operate on packed integers and had only 128-bit
variants have been supplemented with 256-bit variants, and thus their syntax
rules became analogous to AVX instructions operating on packed floating point
types.

    vpsubb ymm0,ymm0,[esi]   ; subtract 32 packed bytes
    vpavgw ymm3,ymm0,ymm2    ; average of 16-bit integers

However there are some instructions that have not been equipped with the
256-bit variants. "vpcmpestri", "vpcmpestrm", "vpcmpistri", "vpcmpistrm",
"vpextrb", "vpextrw", "vpextrd", "vpextrq", "vpinsrb", "vpinsrw", "vpinsrd",
"vpinsrq" and "vphminposuw" are not affected by AVX2 and allow only the
128-bit operands.
  The packed shift instructions, which allowed the third operand specifying
amount to be SSE register or 128-bit memory location, use the same rules
for the third operand in their 256-bit variant.

    vpsllw ymm2,ymm2,xmm4        ; shift words left
    vpsrad ymm0,ymm3,xword [ebx] ; shift double words right

  There are also new packed shift instructions with standard three-operand AVX
syntax, which shift each element from first source by the amount specified in
corresponding element of second source, and store the results in destination.
"vpsllvd" shifts 32-bit elements left, "vpsllvq" shifts 64-bit elements left,
"vpsrlvd" shifts 32-bit elements right logically, "vpsrlvq" shifts 64-bit
elements right logically and "vpsravd" shifts 32-bit elements right
arithmetically.
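  The variable shifts can be sketched as follows, assuming that the shift
counts have already been prepared in the second source. The registers and
addresses here are arbitrary.

    vpsllvd ymm0,ymm1,ymm2          ; each double word shifted by its own count
    vpsrlvq xmm0,xmm1,[ebx]         ; logical right shift of two quad words
    vpsravd ymm3,ymm4,yword [esi]   ; arithmetic right shift, counts in memory

Unlike "vpsllw" and similar instructions, which apply one amount to all the
elements, these forms read a separate amount for every element.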
  The sign-extend and zero-extend instructions, which in AVX versions allowed
source operand to be SSE register or a memory of specific size, in the new
256-bit variant need memory of that size doubled or SSE register as source and
AVX register as destination.

    vpmovzxbq ymm0,dword [esi]   ; bytes to quad words

  Also "vmovntdqa" has been upgraded with 256-bit variant, so it can transfer
a 256-bit value from memory to AVX register. It needs the memory address to be
aligned to 32 bytes.
  "vpmaskmovd" and "vpmaskmovq" are the new instructions with syntax identical
to "vmaskmovps" or "vmaskmovpd", and they perform analogous operation on
packed 32-bit or 64-bit values.
  "vinserti128", "vextracti128", "vbroadcasti128" and "vperm2i128" are the new
instructions with syntax identical to "vinsertf128", "vextractf128",
"vbroadcastf128" and "vperm2f128" respectively, and they perform analogous
operations on 128-bit blocks of integer data.
  "vbroadcastss" and "vbroadcastsd" instructions have been extended to allow
SSE register as a source operand (which in AVX could only be a memory).
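  The forms introduced in the last few paragraphs might be used as in the
sketch below. The registers are arbitrary, and EBX is assumed to hold an
address aligned to 32 bytes.

    vmovntdqa ymm0,[ebx]                 ; 256-bit load, aligned address
    vpmaskmovd [edi],ymm1,ymm2           ; store elements selected by ymm1
    vinserti128 ymm0,ymm1,xword [esi],1  ; replace the high 128 bits
    vbroadcastss ymm0,xmm1               ; source in register

The last form requires a processor supporting AVX2.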
  "vpbroadcastb", "vpbroadcastw", "vpbroadcastd" and "vpbroadcastq" are the 
new instructions which broadcast the byte, word, double word or quad word from
the source operand into all elements of corresponding size in the destination
register. The destination operand can be either SSE or AVX register, and the
source operand can be SSE register or memory of size equal to the size of data
element.

    vpbroadcastb ymm0,byte [ebx]  ; get 32 identical bytes

  "vpermd" and "vpermps" are new three-operand instructions, which use each
32-bit element from first source as an index of element in second source which
is copied into destination at position corresponding to element containing
index. The destination and first source have to be AVX registers, and the
second source can be AVX register or 256-bit memory.
  "vpermq" and "vpermpd" are new three-operand instructions, which use 2-bit
indexes from the immediate value specified as third operand to determine which
element from source to store at given position in destination. The destination
has to be AVX register, source can be AVX register or 256-bit memory, and the
third operand must be 8-bit immediate value.
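  A possible use of these permutations is sketched below, with arbitrarily
chosen registers. The immediate 1Bh contains the 2-bit indexes 3, 2, 1 and 0
(counting from the lowest bits), and the immediate 0 selects the lowest
element for every position.

    vpermd ymm0,ymm1,[esi]          ; ymm1 holds indexes of double words
    vpermq ymm0,ymm1,1Bh            ; reverse the order of four quad words
    vpermpd ymm2,yword [ebx],0      ; four copies of the lowest double

In contrast to "vpermilps", these instructions can move elements across the
boundary of 128-bit blocks.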
  The family of new instructions performing "gather" operation have special
syntax, as in their memory operand they use addressing mode that is unique to
them. The base of address can be a 32-bit or 64-bit general purpose register
(the latter only in long mode), and the index (possibly multiplied by scale
value, as in standard addressing) is specified by SSE or AVX register. It is
possible to use only index without base and any numerical displacement can be
added to the address. Each of those instructions takes three operands. First
operand is the destination register, second operand is memory addressed with
a vector index, and third operand is register containing a mask. The most
significant bit of each element of mask determines whether a value will be
loaded from memory into corresponding element in destination. The address of
each element to load is determined by using the corresponding element from
index register in memory operand to calculate final address with given base
and displacement. When the index register contains fewer elements than the
destination and mask registers, the higher elements of destination are zeroed.
After the value is successfully loaded, the corresponding element in mask
register is set to zero. The destination, index and mask should all be
distinct registers, it is not allowed to use the same register in two
different roles.
  "vgatherdps" loads single precision floating point values addressed by 
32-bit indexes. The destination, index and mask should all be registers of the
same type, either SSE or AVX. The data addressed by memory operand is 32-bit
in size.

    vgatherdps xmm0,[eax+xmm1],xmm3    ; gather four floats
    vgatherdps ymm0,[ebx+ymm7*4],ymm3  ; gather eight floats

  "vgatherqps" loads single precision floating point values addressed by
64-bit indexes. The destination and mask should always be SSE registers, while
index register can be either SSE or AVX register. The data addressed by memory
operand is 32-bit in size.

    vgatherqps xmm0,[xmm2],xmm3        ; gather two floats
    vgatherqps xmm0,[ymm2+64],xmm3     ; gather four floats

  "vgatherdpd" loads double precision floating point values addressed by
32-bit indexes. The index register should always be SSE register, the
destination and mask should be two registers of the same type, either SSE or
AVX. The data addressed by memory operand is 64-bit in size.

    vgatherdpd xmm0,[ebp+xmm1],xmm3    ; gather two doubles
    vgatherdpd ymm0,[xmm3*8],ymm5      ; gather four doubles

  "vgatherqpd" loads double precision floating point values addressed by
64-bit indexes. The destination, index and mask should all be registers of the
same type, either SSE or AVX. The data addressed by memory operand is 64-bit
in size.
  "vpgatherdd" and "vpgatherqd" load 32-bit values addressed by either 32-bit
or 64-bit indexes. They follow the same rules as "vgatherdps" and "vgatherqps"
respectively.
  "vpgatherdq" and "vpgatherqq" load 64-bit values addressed by either 32-bit
or 64-bit indexes. They follow the same rules as "vgatherdpd" and "vgatherqpd"
respectively.
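  Because the mask is zeroed as the elements get loaded, it usually has to be
set up again before every gather. A minimal sketch, assuming that all the
elements should be loaded and with registers chosen arbitrarily, could look
like this:

    vpcmpeqd ymm2,ymm2,ymm2            ; all bits set, every element enabled
    vgatherqpd ymm0,[eax+ymm1*8],ymm2  ; gather four doubles
    vpcmpeqd xmm5,xmm5,xmm5            ; a fresh mask for the next gather
    vpgatherdd xmm3,[ebx+xmm4*4],xmm5  ; gather four 32-bit values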
  
 
2.1.23  Auxiliary sets of computational instructions

  There are a number of additional instruction set extensions related to
AVX. They introduce new vector instructions (and sometimes also their SSE
equivalents that use classic instruction encoding), and even some new
instructions operating on general registers that use the AVX-like encoding
allowing the extended syntax with separate destination and source operands.
The CPU support for each of these instruction sets needs to be determined
separately.
  The AES extension provides a specialized set of instructions for the
purpose of cryptographic computations defined by Advanced Encryption Standard.
Each of these instructions has two versions: the AVX one and the one with
SSE-like syntax that uses classic encoding. Refer to the Intel manuals for the
details of operation of these instructions.
  "aesenc" and "aesenclast" perform a single round of AES encryption on data
from first source with a round key from second source, and store result in
destination. The destination and first source are SSE registers, and the
second source can be SSE register or 128-bit memory. The AVX versions of these
instructions, "vaesenc" and "vaesenclast", use the syntax with three operands,
while the SSE-like version has only two operands, with first operand being
both the destination and first source.
  "aesdec" and "aesdeclast" perform a single round of AES decryption on data
from first source with a round key from second source. The syntax rules for
them and their AVX versions are the same as for "aesenc".
  "aesimc" performs the InvMixColumns transformation of source operand and
stores the result in destination. Both "aesimc" and "vaesimc" use only two
operands, destination being SSE register, and source being SSE register or
128-bit memory location.
  "aeskeygenassist" is a helper instruction for generating the round key.
It needs three operands: destination being SSE register, source being SSE
register or 128-bit memory, and third operand being 8-bit immediate value.
The AVX version of this instruction uses the same syntax.
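  The syntax variants described above are summarized by this sketch, in which
the registers, the address and the immediate value are only placeholders:

    aesenc xmm0,xmm1                ; xmm0 is both the data and destination
    vaesenclast xmm2,xmm0,[ebx]     ; AVX form with round key in memory
    vaesimc xmm3,xmm1               ; two operands also in AVX version
    aeskeygenassist xmm4,xmm1,1     ; third operand is 8-bit immediate

This is not a complete encryption routine, it only shows the allowed forms.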
  The CLMUL extension introduces just one instruction, "pclmulqdq", and its
AVX version as well. This instruction performs a carryless multiplication of
two 64-bit values selected from first and second source according to the bit
fields in immediate value. The destination and first source are SSE registers,
second source is SSE register or 128-bit memory, and immediate value is
provided as last operand. "vpclmulqdq" takes four operands, while "pclmulqdq"
takes only three operands, with the first one serving both the role of
destination and first source.
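  In the immediate value bit 0 selects the quad word from first source and
bit 4 selects the quad word from second source, so a sketch with arbitrarily
chosen registers may look as follows:

    pclmulqdq xmm0,xmm1,0           ; low quad word by low quad word
    vpclmulqdq xmm2,xmm0,xmm1,11h   ; high quad word by high quad word

The 128-bit product is stored in the destination register.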
  The FMA (Fused Multiply-Add) extension introduces additional AVX
instructions which perform multiplication and summation as single operation.
Each one takes three operands, first one serving both the role of destination
and first source, and the following ones being the second and third source.
The mnemonic of FMA instruction is obtained by appending to "vf" prefix: first
either "m" or "nm" to select whether result of multiplication should be taken
as-is or negated, then either "add" or "sub" to select whether third value
will be added to the product or subtracted from the product, then either
"132", "213" or "231" to select which source operands are multiplied and which
one is added or subtracted, and finally the type of data on which the
instruction operates, either "ps", "pd", "ss" or "sd". As it was with SSE
instructions promoted to AVX, instructions operating on packed floating point
values allow 128-bit or 256-bit syntax, in former all the operands are SSE
registers, but the third one can also be a 128-bit memory, in latter the
operands are AVX registers and the third one can also be a 256-bit memory.
Instructions that compute just one floating point result need operands to be
SSE registers, and the third operand can also be a memory, either 32-bit for
single precision or 64-bit for double precision.

    vfmsub231ps ymm1,ymm2,ymm3     ; multiply and subtract
    vfnmadd132sd xmm0,xmm5,[ebx]   ; multiply, negate and add

In addition to the instructions created by the rule described above, there are
families of instructions with mnemonics starting with either "vfmaddsub" or
"vfmsubadd", followed by either "132", "213" or "231" and then either "ps" or
"pd" (the operation must always be on packed values in this case). They add
to the result of multiplication or subtract from it depending on the position
of value in packed data - instructions from the "vfmaddsub" group add when the
position is odd and subtract when the position is even, instructions from the
"vfmsubadd" group add when the position is even and subtract when the
position is odd. The rules for operands are the same as for other FMA
instructions.
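  For example, the alternating variants could be written as below. This is
only a sketch of the syntax, with registers and address chosen arbitrarily.

    vfmaddsub213pd ymm0,ymm1,[ebx]  ; add at odd, subtract at even positions
    vfmsubadd231ps xmm2,xmm3,xmm4   ; subtract at odd, add at even positions

The digits select the multiplied operands the same way as in other FMA
instructions.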
  The FMA4 instructions are similar to FMA, but use syntax with four operands
and thus allow destination to be different than all the sources. Their
mnemonics are identical to FMA instructions with the "132", "213" or "231" cut
out, as having separate destination operand makes such selection of operands
superfluous. The multiplication is always performed on values from the first
and second source, and then the value from third source is added or
subtracted. Either second or third source can be a memory operand, and the
rules for the sizes of operands are the same as for FMA instructions.

    vfmaddpd ymm0,ymm1,[esi],ymm2  ; multiply and add
    vfmsubss xmm0,xmm1,xmm2,[ebx]  ; multiply and subtract

  The F16C extension consists of two instructions, "vcvtps2ph" and
"vcvtph2ps", which convert floating point values between single precision and
half precision (the 16-bit floating point format). "vcvtps2ph" takes three
operands: destination, source, and rounding controls. The third operand is
always an immediate, the source is either SSE or AVX register containing
single precision values, and the destination is SSE register or memory, the
size of memory is 64 bits when the source is SSE register and 128 bits when
the source is AVX register. "vcvtph2ps" takes two operands, the destination
that can be SSE or AVX register, and the source that is SSE register or memory
with size of the half of destination operand's size.
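  The sizes of operands may be illustrated by the following sketch, in which
the registers and addresses are arbitrary, and the immediate 0 is assumed to
request rounding to the nearest value:

    vcvtps2ph qword [edi],xmm0,0    ; four floats into four 16-bit halves
    vcvtph2ps ymm1,xword [esi]      ; eight 16-bit halves into eight floats

Refer to the Intel manuals for the other settings of rounding controls.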
  The AMD XOP extension introduces a number of new vector instructions with
encoding and syntax analogous to AVX instructions. "vfrczps", "vfrczss",
"vfrczpd" and "vfrczsd" extract fractional portions of single or double
precision values, they all take two operands. The packed operations allow
either SSE or AVX register as destination, for the other two it has to be SSE
register. Source can be register of the same type as destination, or memory
of appropriate size (256-bit for destination being AVX register, 128-bit for
packed operation with destination being SSE register, 64-bit for operation
on a solitary double precision value and 32-bit for operation on a solitary
single precision value).

    vfrczps ymm0,[esi]           ; load fractional parts

  "vpcmov" copies bits from either first or second source into destination
depending on the values of corresponding bits in the fourth operand (the
selector). If the bit in selector is set, the corresponding bit from first
source is copied into the same position in destination, otherwise the bit from
second source is copied. Either second source or selector can be memory
location, 128-bit or 256-bit depending on whether SSE registers or AVX
registers are specified as the other operands.

    vpcmov xmm0,xmm1,xmm2,[ebx]  ; selector in memory
    vpcmov ymm0,ymm5,[esi],ymm2  ; source in memory
 
The family of packed comparison instructions take four operands, the
destination and first source being SSE register, second source being SSE
register or 128-bit memory and the fourth operand being immediate value
defining the type of comparison. The mnemonic of instruction is created
by appending to "vpcom" prefix either "b" or "ub" to compare signed or
unsigned bytes, "w" or "uw" to compare signed or unsigned words, "d" or "ud"
to compare signed or unsigned double words, "q" or "uq" to compare signed or
unsigned quad words. The respective values from the first and second source
are compared and the corresponding data element in destination is set to
either all ones or all zeros depending on the result of comparison. The fourth
operand has to specify one of the eight comparison types (table 2.5). All
these instructions have also variants with only three operands and the type
of comparison encoded within the instruction name by inserting the comparison
mnemonic after "vpcom".

    vpcomb   xmm0,xmm1,xmm2,4    ; test for equal bytes
    vpcomgew xmm0,xmm1,[ebx]     ; compare signed words

   Table 2.5  XOP comparisons
  /-------------------------------------------\
  | Code | Mnemonic | Description             |
  |======|==========|=========================|
  | 0    | lt       | less than               |
  | 1    | le       | less than or equal      |
  | 2    | gt       | greater than            |
  | 3    | ge       | greater than or equal   |
  | 4    | eq       | equal                   |
  | 5    | neq      | not equal               |
  | 6    | false    | false                   |
  | 7    | true     | true                    |
  \-------------------------------------------/
 
  "vpermil2ps" and "vpermil2pd" set the elements in destination register to
zero or to a value selected from first or second source depending on the
corresponding bit fields from the fourth operand (the selector) and the
immediate value provided in fifth operand. Refer to the AMD manuals for the
detailed explanation of the operation performed by these instructions. Each
of the first four operands can be a register, and either second source or
selector can be memory location, 128-bit or 256-bit depending on whether SSE
registers or AVX registers are used for the other operands.

    vpermil2ps ymm0,ymm3,ymm7,ymm2,0  ; permute from two sources

  "vphaddbw" adds pairs of adjacent signed bytes to form 16-bit values and
stores them at the same positions in destination. "vphaddubw" does the same
but treats the bytes as unsigned. "vphaddbd" and "vphaddubd" sum all bytes
(either signed or unsigned) in each four-byte block to 32-bit results,
"vphaddbq" and "vphaddubq" sum all bytes in each eight-byte block to
64-bit results, "vphaddwd" and "vphadduwd" add pairs of words to 32-bit
results, "vphaddwq" and "vphadduwq" sum all words in each four-word block to
64-bit results, "vphadddq" and "vphaddudq" add pairs of double words to 64-bit
results. "vphsubbw" subtracts in each two-byte block the byte at higher
position from the one at lower position, and stores the result as a signed
16-bit value at the corresponding position in destination, "vphsubwd"
subtracts in each two-word block the word at higher position from the one at
lower position and makes signed 32-bit results, "vphsubdq" subtracts in each
block of two double words the one at higher position from the one at lower
position and makes signed 64-bit results. Each of these instructions takes
two operands, the destination being SSE register, and the source being SSE
register or 128-bit memory.

    vphadduwq xmm0,xmm1          ; sum quadruplets of words
  
  "vpmacsww" and "vpmacssww" multiply the corresponding signed 16-bit values
from the first and second source and then add the products to the parallel
values from the third source, then "vpmacsww" takes the lowest 16 bits of the
result and "vpmacssww" saturates the result down to 16-bit value, and they
store the final 16-bit results in the destination. "vpmacsdd" and "vpmacssdd"
perform the analogous operation on 32-bit values. "vpmacswd" and "vpmacsswd"
do the same calculation only on the low 16-bit values from each 32-bit block
and form the 32-bit results. "vpmacsdql" and "vpmacssdql" perform such
operation on the low 32-bit values from each 64-bit block and form the 64-bit
results, while "vpmacsdqh" and "vpmacssdqh" do the same on the high 32-bit
values from each 64-bit block, also forming the 64-bit results. "vpmadcswd"
and "vpmadcsswd" multiply the corresponding signed 16-bit values from the
first and second source, then sum the two products within each 32-bit block
and add this sum to the corresponding 32-bit element from third source,
storing the truncated or saturated result in destination. All these
instructions take four operands, the second source can be 128-bit memory or
SSE register, all the other operands have to be SSE registers.

    vpmacsdd xmm6,xmm1,[ebx],xmm6  ; accumulate product

  "vpperm" selects bytes from first and second source, optionally applies a
separate transformation to each of them, and stores them in the destination.
The bit fields in fourth operand (the selector) specify for each position in
destination what byte from which source is taken and what operation is applied
to it before it is stored there. Refer to the AMD manuals for the detailed
information about these bit fields. This instruction takes four operands,
either second source or selector can be a 128-bit memory (or they can be SSE
registers both), all the other operands have to be SSE registers.
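  The two placements of memory operand allowed for "vpperm" can be sketched
like this, with registers and addresses chosen only for illustration:

    vpperm xmm0,xmm1,xmm2,[ebx]     ; selector in memory
    vpperm xmm0,xmm1,[esi],xmm3     ; second source in memory

The selector is assumed to be prepared earlier, according to the AMD manuals.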
  "vpshlb", "vpshlw", "vpshld" and "vpshlq" shift logically bytes, words, double
words or quad words respectively. The amount of bits to shift by is specified
for each element separately by the signed byte placed at the corresponding
position in the third operand. The source containing elements to shift is
provided as second operand. Either second or third operand can be 128-bit
memory (or they can be SSE registers both) and the other operands have to be
SSE registers.

    vpshld xmm3,xmm1,[ebx]       ; shift double words from xmm1

"vpshab", "vpshaw", "vpshad" and "vpshaq" arithmetically shift bytes, words,
double words or quad words. These instructions follow the same rules as the
logical shifts described above. "vprotb", "vprotw", "vprotd" and "vprotq"
rotate bytes, words, double words or quad words. They follow the same rules as
shifts, but additionally allow third operand to be immediate value, in which
case the same amount of rotation is specified for all the elements in source.

    vprotb xmm0,[esi],3          ; rotate bytes to the left

  The MOVBE extension introduces just one new instruction, "movbe", which
swaps bytes in value from source before storing it in destination, so it can
be used to load and store big endian values. It takes two operands, either
the destination or source should be a 16-bit, 32-bit or 64-bit memory (the
last one being only allowed in long mode), and the other operand should be
a general register of the same size.
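  For instance, a big endian value could be loaded and stored as in this
sketch (the registers and addresses are arbitrary):

    movbe eax,[esi]                 ; load 32-bit big endian value
    movbe [edi],cx                  ; store 16-bit value as big endian

There is no form with two registers, one operand is always a memory.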
  The BMI extension, consisting of two subsets - BMI1 and BMI2, introduces
new instructions operating on general registers, which use the same encoding
as AVX instructions and so allow the extended syntax. All these instructions
use 32-bit operands, and in long mode they also allow the forms with 64-bit
operands.
  "andn" calculates the bitwise AND of second source with the inverted bits
of first source and stores the result in destination. The destination and
the first source have to be general registers, the second source can be
general register or memory.

    andn edx,eax,[ebx]   ; bit-multiply inverted eax with memory

  "bextr" extracts from the first source the sequence of bits using an index
and length specified by bit fields in the second source operand and stores
it into destination. The lowest 8 bits of second source specify the position
of bit sequence to extract and the next 8 bits of second source specify the
length of sequence. The first source can be a general register or memory,
the other two operands have to be general registers.

    bextr eax,[esi],ecx  ; extract bit field from memory

  "blsi" extracts the lowest set bit from the source, setting all the other
bits in destination to zero. The destination must be a general register,
the source can be general register or memory.

    blsi rax,r11         ; isolate the lowest set bit

  "blsmsk" sets all the bits in the destination up to the lowest set bit in
the source, including this bit. "blsr" copies all the bits from the source to
destination except for the lowest set bit, which is replaced by zero. These
instructions follow the same rules for operands as "blsi".
  "tzcnt" counts the number of trailing zero bits, that is the zero bits up to
the lowest set bit of source value. This instruction is analogous to "lzcnt"
and follows the same rules for operands, so it also has a 16-bit version,
unlike the other BMI instructions.
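  A sketch of these instructions, with operands chosen arbitrarily, follows.
The comments assume that EBX contains the value 01101000b.

    blsr eax,ebx                    ; eax = 01100000b
    blsmsk ecx,ebx                  ; ecx = 00001111b
    tzcnt edx,ebx                   ; edx = 3
    tzcnt ax,word [esi]             ; the 16-bit form

Only "tzcnt" allows the 16-bit operands here.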
  "bzhi" is BMI2 instruction, which copies the bits from first source to
destination, zeroing all the bits up from the position specified by second
source. It follows the same rules for operands as "bextr".
  "pext" uses a mask in second source operand to select bits from first
source and puts the selected bits as a continuous sequence into destination.
"pdep" performs the reverse operation - it takes sequence of bits from the
first source and puts them consecutively at the positions where the bits in
second source are set, setting all the other bits in destination to zero.
These BMI2 instructions follow the same rules for operands as "andn".
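  The operation of these instructions can be traced on small values. In this
sketch the registers are arbitrary, and the comments assume that EBX contains
11010110b and ECX contains 00110100b.

    pext eax,ebx,ecx                ; eax = 00000011b
    pdep edx,eax,ecx                ; edx = 00010100b
    mov ecx,4
    bzhi eax,ebx,ecx                ; eax = 00000110b

Applying "pdep" to the result of "pext" with the same mask gives back the
bits of original value that were selected by the mask.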
  "mulx" is a BMI2 instruction which performs an unsigned multiplication of
value from EDX or RDX register (depending on the size of specified operands)
by the value from third operand, and stores the low half of result in the
second operand, and the high half of result in the first operand, and it does
it without affecting the flags. The third operand can be general register or
memory, and both the destination operands have to be general registers.

    mulx edx,eax,ecx     ; multiply edx by ecx into edx:eax

  "shlx", "shrx" and "sarx" are BMI2 instructions, which perform logical or
arithmetical shifts of value from first source by the amount specified by
second source, and store the result in destination without affecting the
flags. They have the same rules for operands as "bzhi" instruction.
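  These shifts could be used as in the sketch below, where the registers and
address are arbitrary:

    shlx eax,[esi],ecx              ; memory value shifted left by ecx
    sarx edx,ebx,eax                ; arithmetic right shift of ebx by eax

Since the flags are preserved, such shifts can be placed between a comparison
and the conditional jump depending on it.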
  "rorx" is a BMI2 instruction which rotates right the value from source
-
 
3129
operand by the constant amount specified in third operand and stores the
-
 
3130
result in destination without affecting the flags. The destination operand
-
 
3131
has to be general register, the source operand can be general register or
-
 
3132
memory, and the third operand has to be an immediate value.
-
 
3133
 
-
 
3134
    rorx eax,edx,7       ; rotate without affecting flags
-
 
3135
                     
-
 
3136
  The TBM is an extension designed by AMD to supplement the BMI set. The 
-
 
3137
"bextr" instruction is extended with a new form, in which second source is
-
 
3138
a 32-bit immediate value. "blsic" is a new instruction which performs the
-
 
3139
same operation as "blsi", but with the bits of result inverted. It uses the
-
 
3140
same rules for operands as "blsi". "blsfill" is a new instruction, which takes
-
 
3141
the value from source, sets all the bits below the lowest set bit and stores
-
 
3142
the result in destination, it also uses the same rules for operands as "blsi".
-
 
3143
  "blci", "blcic", "blcs", "blcmsk" and "blcfill" are instructions analogous
-
 
3144
to "blsi", "blsic", "blsr", "blsmsk" and "blsfill" respectively, but they
-
 
3145
perform the bit-inverted versions of the same operations. They follow the
-
 
3146
same rules for operands as the instructions they reflect.
-
 
3147
  "tzmsk" finds the lowest set bit in value from source operand, sets all bits
-
 
3148
below it to 1 and all the rest of bits to zero, then writes the result to 
-
 
3149
destination. "t1mskc" finds the least significant zero bit in the value from 
-
 
3150
source operand, sets the bits below it to zero and all the other bits to 1, 
-
 
3151
and writes the result to destination. These instructions have the same rules
-
 
3152
for operands as "blsi".
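  For illustration only, the two masks can be traced on sample values (the
registers and constants here are arbitrary):

    mov ebx,01011000b
    tzmsk eax,ebx        ; eax = 00000111b, the bits below the lowest set bit
    mov ebx,01010111b
    t1mskc eax,ebx       ; eax = 0FFFFFFF8h, zero only below the lowest zero
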
-
 
3153
      
-
 
3154
 
-
 
3155
2.1.24  Other extensions of instruction set
-
 
3156
 
-
 
3157
There is a number of additional instruction set extensions recognized by flat
-
 
3158
assembler, and the general syntax of the instructions introduced by those
-
 
3159
extensions is provided here. For detailed information on the operations
-
 
3160
performed by them, check out the manuals from Intel (for the VMX, SMX, XSAVE, 
-
 
3161
RDRAND, FSGSBASE, INVPCID, HLE and RTM extensions) or AMD (for the SVM 
-
 
3162
extension).
-
 
3163
  The Virtual-Machine Extensions (VMX) provide a set of instructions for the
-
 
3164
management of virtual machines. The "vmxon" instruction, which enters the VMX
-
 
3165
operation, requires a single 64-bit memory operand, which should be a physical
-
 
3166
address of memory region, which the logical processor may use to support VMX
-
 
3167
operation. The "vmxoff" instruction, which leaves the VMX operation, has no
-
 
3168
operands. The "vmlaunch" and "vmresume", which launch or resume the virtual
-
 
3169
machines, and "vmcall", which allows guest software to call the VM monitor, 
-
 
3170
use no operands either.
-
 
3171
  The "vmptrld" loads the physical address of current Virtual Machine Control
-
 
3172
Structure (VMCS) from its memory operand, "vmptrst" stores the pointer to
-
 
3173
current VMCS into address specified by its memory operand, and "vmclear" sets
-
 
3174
the launch state of the VMCS referenced by its memory operand to clear. These
-
 
3175
three instructions all require a single 64-bit memory operand.
-
 
3176
  The "vmread" reads from VMCS a field specified by the source operand and
-
 
3177
stores it into the destination operand. The source operand should be a
-
 
3178
general purpose register, and the destination operand can be a register or
-
 
3179
memory. The "vmwrite" writes into a VMCS field specified by the destination
-
 
3180
operand the value provided by source operand. The source operand can be a
-
 
3181
general purpose register or memory, and the destination operand must be a
-
 
3182
register. The size of operands for those instructions should be 64-bit when
-
 
3183
in long mode, and 32-bit otherwise.
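  A sketch of the operand forms in 32-bit mode, assuming that EAX already
holds the encoding of the VMCS field to access:

    vmread ebx,eax       ; read the field selected by eax into ebx
    vmwrite eax,[esi]    ; write a value from memory into that field
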
-
 
3184
  The "invept" and "invvpid" invalidate the translation lookaside buffers
-
 
3185
(TLBs) and paging-structure caches, either derived from extended page tables
-
 
3186
(EPT), or based on the virtual processor identifier (VPID). These instructions
-
 
3187
require two operands, the first one being the general purpose register
-
 
3188
specifying the type of invalidation, and the second one being a 128-bit
-
 
3189
memory operand providing the invalidation descriptor. The first operand
-
 
3190
should be a 64-bit register when in long mode, and 32-bit register otherwise.
-
 
3191
  The Safer Mode Extensions (SMX) provide the functionalities available
-
 
3192
through the "getsec" instruction. This instruction takes no operands, and
-
 
3193
the function that is executed is determined by the contents of EAX register
-
 
3194
upon executing this instruction.
-
 
3195
  The Secure Virtual Machine (SVM) is a variant of virtual machine extension
-
 
3196
used by AMD. The "skinit" instruction securely reinitializes the processor
-
 
3197
allowing the startup of trusted software, such as the virtual machine monitor
-
 
3198
(VMM). This instruction takes a single operand, which must be EAX, and
-
 
3199
provides a physical address of the secure loader block (SLB).
-
 
3200
  The "vmrun" instruction is used to start a guest virtual machine,
-
 
3201
its only operand should be an accumulator register (AX, EAX or RAX, the
-
 
3202
last one available only in long mode) providing the physical address of the
-
 
3203
virtual machine control block (VMCB). The "vmsave" stores a subset of 
-
 
3204
processor state into VMCB specified by its operand, and "vmload" loads the 
-
 
3205
same subset of processor state from a specified VMCB. The same operand rules 
-
 
3206
as for the "vmrun" apply to those two instructions.
-
 
3207
  "vmmcall" allows the guest software to call the VMM. This instruction takes
-
 
3208
no operands.
-
 
3209
  "stgi" sets the global interrupt flag to 1, and "clgi" zeroes it. These
-
 
3210
instructions take no operands.
-
 
3211
  "invlpga" invalidates the TLB mapping for a virtual page specified by the
-
 
3212
first operand (which has to be accumulator register) and address space
-
 
3213
identifier specified by the second operand (which must be ECX register).
-
 
3214
  The XSAVE set of instructions allows saving and restoring processor state
-
 
3215
components. "xsave" and "xsaveopt" store the components of processor state 
-
 
3216
defined by bit mask in EDX and EAX registers into area defined by memory 
-
 
3217
operand. "xrstor" restores from the area specified by memory operand the 
-
 
3218
components of processor state defined by mask in EDX and EAX. The "xsave64",
-
 
3219
"xsaveopt64" and "xrstor64" are 64-bit versions of these instructions, allowed
-
 
3220
only in long mode.
-
 
3221
  "xgetbv" reads the contents of 64-bit XCR (extended control register)
-
 
3222
specified in ECX register into EDX and EAX registers. "xsetbv" writes the
-
 
3223
contents of EDX and EAX into the 64-bit XCR specified by ECX register. These
-
 
3224
instructions have no operands.
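  For example, the following sketch reads the first extended control register;
that XCR0 is selected by zero in ECX comes from the Intel manuals and is an
assumption as far as this text is concerned:

    xor ecx,ecx          ; select XCR0
    xgetbv               ; edx:eax = contents of XCR0
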
-
 
3225
  The RDRAND extension introduces one new instruction, "rdrand", which loads
-
 
3226
the hardware-generated random value into general register. It takes one
-
 
3227
operand, which can be 16-bit, 32-bit or 64-bit register (with the last one 
-
 
3228
being allowed only in long mode).
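  A minimal sketch of use, relying on the behavior documented by Intel that
the carry flag is set only when a random value was delivered ("retry" is a
hypothetical label):

    retry:
        rdrand eax       ; try to obtain a 32-bit random value
        jnc retry        ; CF=0 means no value was available yet
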
-
 
3229
  The FSGSBASE extension adds long mode instructions that allow to read and 
-
 
3230
write the segment base registers for FS and GS segments. "rdfsbase" and 
-
 
3231
"rdgsbase" read the corresponding segment base registers into operand, while 
-
 
3232
"wrfsbase" and "wrgsbase" write the value of operand into those registers.
-
 
3233
All these instructions take one operand, which can be 32-bit or 64-bit general
-
 
3234
register.  
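  A short sketch in long mode (the choice of RAX is arbitrary):

    use64
    rdfsbase rax         ; rax = base address of FS segment
    wrgsbase rax         ; base address of GS segment = rax
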
-
 
3235
  The INVPCID extension adds "invpcid" instruction, which invalidates mapping
-
 
3236
in the TLBs and paging caches based on the invalidation type specified in 
-
 
3237
first operand and PCID invalidate descriptor specified in second operand.
-
 
3238
The first operand should be 32-bit general register when not in long mode,
-
 
3239
or 64-bit general register when in long mode. The second operand should be
-
 
3240
128-bit memory location.  
-
 
3241
  The HLE and RTM extensions provide a set of instructions for transactional
-
 
3242
management. The "xacquire" and "xrelease" are new prefixes that can be used
-
 
3243
with some of the instructions to start or end lock elision on the memory
-
 
3244
address specified by prefixed instruction. The "xbegin" instruction starts
-
 
3245
the transactional execution, its operand is the address of a fallback routine
-
 
3246
that gets executed in case of transaction abort, specified like the operand
-
 
3247
for near jump instruction. "xend" marks the end of transactional execution
-
 
3248
region, it takes no operands. "xabort" forces the transaction abort, it takes
-
 
3249
an 8-bit immediate value as its only operand, this value is passed in the
-
 
3250
highest bits of EAX to the fallback routine. "xtest" checks whether there is
-
 
3251
transactional execution in progress, this instruction takes no operands.
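  The outline below is only a sketch of how these instructions fit together;
"fallback", "done" and "counter" are hypothetical names:

        xbegin fallback      ; start the transactional execution
        inc dword [counter]  ; work performed inside the transaction
        xend                 ; end of the transactional region
        jmp done
    fallback:
        ; reached after an abort, eax holds the abort status
    done:
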
-
 
3252
 
Line 2211... Line 3253...
2211
 
3253
 
Line 2212... Line 3254...
2212
2.2  Control directives
3254
2.2  Control directives
Line 2269... Line 3311...
2269
 
3311
 
Line 2270... Line 3312...
2270
 
3312
 
Line 2271... Line 3313...
2271
2.2.2  Conditional assembly
3313
2.2.2  Conditional assembly
2272
 
3314
 
2273
"if" directive causes come block of instructions to be assembled only under
3315
"if" directive causes some block of instructions to be assembled only under
2274
certain condition. It should be followed by logical expression specifying the
3316
certain condition. It should be followed by logical expression specifying the
2275
condition, instructions in next lines will be assembled only when this
3317
condition, instructions in next lines will be assembled only when this
2276
condition is met, otherwise they will be skipped. The optional "else if"
3318
condition is met, otherwise they will be skipped. The optional "else if"
Line 2297... Line 3339...
2297
even if symbol is used only after this check). The "defined" operator can be
3339
even if symbol is used only after this check). The "defined" operator can be
2298
followed by any expression, usually just by a single symbol name; it checks
3340
followed by any expression, usually just by a single symbol name; it checks
2299
whether the given expression contains only symbols that are defined in the
3341
whether the given expression contains only symbols that are defined in the
2300
source and accessible from the current position.
3342
source and accessible from the current position.
2301
  The following simple example uses the "count" constant that should be
3343
  With "relativeto" operator it is possible to check whether values of two
-
 
3344
expressions differ only by constant amount. The valid syntax is a numerical
-
 
3345
expression followed by "relativeto" and then another expression (possibly
-
 
3346
register-based). Labels that have no simple numerical value can be tested
-
 
3347
this way to determine what kind of operations may be possible with them.
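  As a hedged illustration, a label can be compared with zero to find out
whether it has a plain absolute value ("start" is a hypothetical label
assumed to be defined elsewhere in source):

    if start relativeto 0
        display 'start is absolute',13,10
    end if
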
-
 
3348
  The following simple example uses the "count" constant that should be
2302
defined somewhere in source:
3349
defined somewhere in source:
2303
 
3350
 
Line 2304... Line 3351...
2304
    if count>0
3351
    if count>0
2305
        mov cx,count
3352
        mov cx,count
Line 2327... Line 3374...
2327
which follows the "else if", is evaluated and if it's true, the second block
3374
which follows the "else if", is evaluated and if it's true, the second block
2328
of instructions get assembled, otherwise the last block of instructions, which
3375
of instructions get assembled, otherwise the last block of instructions, which
2329
follows the line containing only "else", is assembled.
3376
follows the line containing only "else", is assembled.
2330
  There are also operators that allow comparison of values being any chains of
3377
  There are also operators that allow comparison of values being any chains of
2331
symbols. The "eq" compares two such values whether they are exactly the same.
3378
symbols. The "eq" compares whether two such values are exactly the same.
2332
The "in" operator checks whether given value is a member of the list of values
3379
The "in" operator checks whether given value is a member of the list of values
2333
following this operator, the list should be enclosed between "<" and ">"
3380
following this operator, the list should be enclosed between "<" and ">"
2334
characters, its members should be separated with commas. The symbols are
3381
characters, its members should be separated with commas. The symbols are
2335
considered the same when they have the same meaning for the assembler - for
3382
considered the same when they have the same meaning for the assembler - for
2336
example "pword" and "fword" for assembler are the same and thus are not
3383
example "pword" and "fword" for assembler are the same and thus are not
2337
distinguished by the above operators. In the same way "16 eq 10h" is the true
3384
distinguished by the above operators. In the same way "16 eq 10h" is the true
Line 2429... Line 3476...
2429
operator is specified, one byte is loaded (thus value is in range from 0 to
3476
operator is specified, one byte is loaded (thus value is in range from 0 to
2430
255). The loaded data cannot exceed current offset.
3477
255). The loaded data cannot exceed current offset.
2431
  The "store" directive can modify the already generated code by replacing
3478
  The "store" directive can modify the already generated code by replacing
2432
some of the previously generated data with the value defined by given
3479
some of the previously generated data with the value defined by given
2433
numerical expression, which follow. The expression can be preceded by the
3480
numerical expression, which follows. The expression can be preceded by the
2434
optional size operator to specify how large value the expression defines, and
3481
optional size operator to specify how large value the expression defines, and
2435
therefore how many bytes will be stored, if there is no size operator, the
3482
therefore how many bytes will be stored, if there is no size operator, the
2436
size of one byte is assumed. Then the "at" operator and the numerical
3483
size of one byte is assumed. Then the "at" operator and the numerical
2437
expression defining the valid address in current addressing code space, at
3484
expression defining the valid address in current addressing code space, at
2438
which the given value has to be stored should follow. This is a directive for
3485
which the given value has to be stored should follow. This is a directive for
2439
advanced applications and should be used carefully.
3486
advanced applications and should be used carefully.
Line 2451... Line 3498...
2451
        store byte a xor c at $$+%-1
3498
        store byte a xor c at $$+%-1
2452
    end repeat
3499
    end repeat
2453
 
3500
 
Line 2454... Line 3501...
2454
and each byte of code will be xored with the value defined by "c" constant.
3501
and each byte of code will be xored with the value defined by "c" constant.
2455
  "virtual" defines virtual data at specified address. This data won't be
3502
  "virtual" defines virtual data at specified address. This data will not be
2456
included in the output file, but labels defined there can be used in other
3503
included in the output file, but labels defined there can be used in other
2457
parts of source. This directive can be followed by "at" operator and the
3504
parts of source. This directive can be followed by "at" operator and the
2458
numerical expression specifying the address for virtual data, otherwise it
3505
numerical expression specifying the address for virtual data, otherwise it
2459
uses current address, the same as "virtual at $". Instructions defining data
3506
uses current address, the same as "virtual at $". Instructions defining data
2460
are expected in next lines, ended with "end virtual" directive. The block of
3507
are expected in next lines, ended with "end virtual" directive. The block of
Line 2478... Line 3525...
2478
        LDT_address dd ?
3525
        LDT_address dd ?
2479
    end virtual
3526
    end virtual
2480
 
3527
 
Line 2481... Line 3528...
2481
With such definition instruction "mov ax,[LDT_limit]" will be assembled
3528
With such definition instruction "mov ax,[LDT_limit]" will be assembled
2482
to "mov ax,[bx]".
3529
to the same instruction as "mov ax,[bx]".
2483
  Declaring defined data values or instructions inside the virtual block would
3530
  Declaring defined data values or instructions inside the virtual block would
2484
also be useful, because the "load" directive can be used to load the values
3531
also be useful, because the "load" directive can be used to load the values
2485
from the virtually generated code into constants. This directive should be
3532
from the virtually generated code into constants. This directive should be
2486
used after the code it loads but before the virtual block ends, because it can
3533
used after the code it loads but before the virtual block ends, because it can
2487
only load the values from the same addressing space. For example:
3534
only load the values from the same addressing space. For example:
Line 2545... Line 3592...
2545
        display d
3592
        display d
2546
    end repeat
3593
    end repeat
2547
    display 13,10
3594
    display 13,10
2548
 
3595
 
Line 2549... Line 3596...
2549
This block of directives calculates the four hexadecimal digits of 16-bit value
3596
This block of directives calculates the four hexadecimal digits of 16-bit
2550
and converts them into characters for displaying. Note that this won't work if
3597
value and converts them into characters for displaying. Note that this will 
2551
the addresses in current addressing space are relocatable (as it might happen
3598
not work if the addresses in current addressing space are relocatable (as it 
2552
with PE or object output formats), since only absolute values can be used this
3599
might happen with PE or object output formats), since only absolute values can
2553
way. The absolute value may be obtained by calculating the relative address,
3600
be used this way. The absolute value may be obtained by calculating the 
2554
like "$-$$", or "rva $" in case of PE format.
3601
relative address, like "$-$$", or "rva $" in case of PE format.
-
 
3602
  The "err" directive immediately terminates the assembly process when it is
-
 
3603
encountered by assembler.
-
 
3604
  The "assert" directive tests whether the logical expression that follows it
-
 
3605
is true, and if not, it signals an error.
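  For instance, a size limit on the generated code could be checked as
below; the limit of 510 bytes is only an assumed value:

    assert $-$$ <= 510
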
Line 2555... Line 3606...
2555
 
3606
 
Line 2556... Line 3607...
2556
 
3607
 
Line 2652... Line 3703...
2652
also defined somewhere later.
3703
also defined somewhere later.
2653
  The "used" operator may be expected to behave in a similar manner in
3704
  The "used" operator may be expected to behave in a similar manner in
2654
analogous cases, however any other kinds of predictions may not be so simple and
3705
analogous cases, however any other kinds of predictions may not be so simple and
2655
you should never rely on them this way.
3706
you should never rely on them this way.
2656
 
3707
  The "err" directive, usually used to stop the assembly when some condition is
-
 
3708
met, stops the assembly immediately, regardless of whether the current pass
-
 
3709
is final or intermediate. So even when the condition that caused this directive
-
 
3710
to be interpreted is mispredicted and temporary, and would eventually disappear 
-
 
3711
in the later passes, the assembly is stopped anyway.
-
 
3712
  The "assert" directive signals an error only if its expression is false
-
 
3713
after all the symbols have been resolved. You can use "assert 0" in place of
-
 
3714
"err" when you do not want to have assembly stopped during the intermediate
-
 
3715
passes.
-
 
3716
 
Line 2657... Line 3717...
2657
 
3717
 
Line 2658... Line 3718...
2658
2.3  Preprocessor directives
3718
2.3  Preprocessor directives
Line 2674... Line 3734...
2674
to the line containing the "include" directive. There are no limits to the
3734
to the line containing the "include" directive. There are no limits to the
2675
number of included files as long as they fit in memory.
3735
number of included files as long as they fit in memory.
2676
  The quoted path can contain environment variables enclosed within "%"
3736
  The quoted path can contain environment variables enclosed within "%"
2677
characters, they will be replaced with their values inside the path, both the
3737
characters, they will be replaced with their values inside the path, both the
2678
"\" and "/" characters are allowed as path separators. If no absolute path
3738
"\" and "/" characters are allowed as path separators. The file is first 
2679
is given, the file is first searched for in the directory containing file
3739
searched for in the directory containing file which included it and when it is
-
 
3740
not found there, the search is continued in the directories specified in the 
2680
which included it and when it's not found there, in the directory containing
3741
environment variable called INCLUDE (the multiple paths separated with 
-
 
3742
semicolons can be defined there, they will be searched in the same order as 
-
 
3743
specified). If file was not found in any of these places, preprocessor looks
2681
the main source file (the one specified in command line). These rules concern
3744
for it in the directory containing the main source file (the one specified in 
2682
also paths given with the "file" directive.
3745
command line). These rules concern also paths given with the "file" directive.
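  A sketch of both forms, assuming a hypothetical MYLIB environment variable
and hypothetical file names, the second file being reachable through one of
the search locations described above:

    include '%MYLIB%/macros.inc'
    include 'common.inc'
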
2683
 
3746
 
Line 2684... Line 3747...
2684
 
3747
 
Line 2685... Line 3748...
2685
2.3.2  Symbolic constants
3748
2.3.2  Symbolic constants
Line 2711... Line 3774...
2711
separated with commas. So "restore d" after the above definitions will give
3774
separated with commas. So "restore d" after the above definitions will give
2712
"d" constant back the value "edx", the second one will restore it to value
3775
"d" constant back the value "edx", the second one will restore it to value
2713
"dword", and one more will revert "d" to original meaning as if no such
3776
"dword", and one more will revert "d" to original meaning as if no such
2714
constant was defined. If there was no constant defined of given name,
3777
constant was defined. If there was no constant defined of given name,
2715
"restore" won't cause an error, it will be just ignored.
3778
"restore" will not cause an error, it will be just ignored.
2716
  Symbolic constant can be used to adjust the syntax of assembler to personal
3779
  Symbolic constant can be used to adjust the syntax of assembler to personal
2717
preferences. For example the following set of definitions provides the handy
3780
preferences. For example the following set of definitions provides the handy
2718
shortcuts for all the size operators:
3781
shortcuts for all the size operators:
2719
 
3782
 
Line 2720... Line 3783...
2720
    b equ byte
3783
    b equ byte
Line 2724... Line 3787...
2724
    f equ fword
3787
    f equ fword
2725
    q equ qword
3788
    q equ qword
2726
    t equ tword
3789
    t equ tword
2727
    x equ dqword
3790
    x equ dqword
2728
 
3791
    y equ qqword
-
 
3792
 
Line 2729... Line 3793...
2729
  Because symbolic constant may also have an empty value, it can be used to
3793
  Because symbolic constant may also have an empty value, it can be used to
2730
allow the syntax with "offset" word before any address value:
3794
allow the syntax with "offset" word before any address value:
Line 2731... Line 3795...
2731
 
3795
 
Line 2839... Line 3903...
2839
given, this macroinstruction will become two macroinstructions of the previous
3903
given, this macroinstruction will become two macroinstructions of the previous
2840
definition, so "mov es,ds,dx" will be assembled as "push ds", "pop es" and
3904
definition, so "mov es,ds,dx" will be assembled as "push ds", "pop es" and
2841
"mov ds,dx".
3905
"mov ds,dx".
2842
  By placing the "*" after the name of argument you can mark the argument as
3906
  By placing the "*" after the name of argument you can mark the argument as
2843
required - preprocessor won't allow it to have an empty value. For example the
3907
required - preprocessor will not allow it to have an empty value. For example 
2844
above macroinstruction could be declared as "macro mov op1*,op2*,op3" to make
3908
the above macroinstruction could be declared as "macro mov op1*,op2*,op3" to 
2845
sure that first two arguments will always have to be given some non empty
3909
make sure that first two arguments will always have to be given some non empty
2846
values.
3910
values.
2847
  When it's needed to provide macroinstruction with argument that contains
3911
  Alternatively, you can provide the default value for argument, by placing
-
 
3912
the "=" followed by value after the name of argument. Then if the argument
-
 
3913
has an empty value provided, the default value will be used instead.
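  A minimal sketch, with "stoschar" being a hypothetical macroinstruction:

    macro stoschar char=20h
     {
        mov al,char
        stosb
     }

With this definition "stoschar" used without an argument will be assembled
as "mov al,20h" and "stosb".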
-
 
3914
  When it's needed to provide macroinstruction with argument that contains
2848
some commas, such argument should be enclosed between "<" and ">" characters.
3915
some commas, such argument should be enclosed between "<" and ">" characters.
2849
If it contains more than one "<" character, the same number of ">" should be
3916
If it contains more than one "<" character, the same number of ">" should be
2850
used to tell that the value of argument ends.
3917
used to tell that the value of argument ends.
2851
  "purge" directive allows removing the last definition of specified
3918
  "purge" directive allows removing the last definition of specified
2852
macroinstruction. It should be followed by one or more names of
3919
macroinstruction. It should be followed by one or more names of
2853
macroinstructions, separated with commas. If such macroinstruction has not
3920
macroinstructions, separated with commas. If such macroinstruction has not
2854
been defined, you won't get any error. For example after having the syntax of
3921
been defined, you will not get any error. For example after having the syntax
2855
"mov" extended with the macroinstructions defined above, you can disable
3922
of "mov" extended with the macroinstructions defined above, you can disable
2856
syntax with three operands back by using "purge mov" directive. Next
3923
syntax with three operands back by using "purge mov" directive. Next
2857
"purge mov" will disable also syntax for two operands being segment registers,
3924
"purge mov" will disable also syntax for two operands being segment registers,
2858
and all the next such directives will do nothing.
3925
and all the next such directives will do nothing.
2859
  If after the "macro" directive you enclose some group of arguments' names in
3926
  If after the "macro" directive you enclose some group of arguments' names in
2860
square brackets, it will allow giving more values for this group of arguments
3927
square brackets, it will allow giving more values for this group of arguments
2861
when using that macroinstruction. Any more argument given after the last
3928
when using that macroinstruction. Any more argument given after the last
Line 2901... Line 3968...
2901
        jnz move
3968
        jnz move
2902
     }
3969
     }
2903
 
3970
 
Line 2904... Line 3971...
2904
Each time this macroinstruction is used, "move" will become other unique name
3971
Each time this macroinstruction is used, "move" will become other unique name
2905
in its instructions, so you won't get an error you normally get when some
3972
in its instructions, so you will not get an error you normally get when some
2906
label is defined more than once.
3973
label is defined more than once.
2907
  "forward", "reverse" and "common" directives divide macroinstruction into
3974
  "forward", "reverse" and "common" directives divide macroinstruction into
2908
blocks, each one processed after the processing of previous is finished. They
3975
blocks, each one processed after the processing of previous is finished. They
2909
differ in behavior only if macroinstruction allows multiple groups of
3976
differ in behavior only if macroinstruction allows multiple groups of
2910
arguments. Block of instructions that follows "forward" directive is processed
3977
arguments. Block of instructions that follows "forward" directive is processed
Line 2946... Line 4013...
2946
      common call proc
4013
      common call proc
2947
     }
4014
     }
2948
 
4015
 
Line 2949... Line 4016...
2949
This macroinstruction can be used for calling the procedures using STDCALL
4016
This macroinstruction can be used for calling the procedures using STDCALL
2950
convention, arguments are pushed on stack in the reverse order. For example
4017
convention, which has all the arguments pushed on stack in the reverse order. 
2951
"stdcall foo,1,2,3" will be assembled as:
4018
For example "stdcall foo,1,2,3" will be assembled as:
Line 2952... Line 4019...
2952
 
4019
 
2953
    push 3
4020
    push 3
2954
    push 2
4021
    push 2
2955
    push 1
4022
    push 1
Line 2983... Line 4050...
2983
For example "jif ax,ae,10h,exit" will be assembled as "cmp ax,10h" and
4050
For example "jif ax,ae,10h,exit" will be assembled as "cmp ax,10h" and
2984
"jae exit" instructions.
4051
"jae exit" instructions.
2985
  The "#" operator can be also used to concatenate two quoted strings into one.
4052
  The "#" operator can be also used to concatenate two quoted strings into one.
2986
Also conversion of name into a quoted string is possible, with the "`" operator,
4053
Also conversion of name into a quoted string is possible, with the "`" operator,
2987
which likewise can be used inside the macroinstruction. It convert the name
4054
which likewise can be used inside the macroinstruction. It converts the name
2988
that follows it into a quoted string - but note, that when it is followed by
4055
that follows it into a quoted string - but note, that when it is followed by
2989
a macro argument which is being replaced with value containing more than one
4056
a macro argument which is being replaced with value containing more than one
2990
symbol, only the first of them will be converted, as the "`" operator converts
4057
symbol, only the first of them will be converted, as the "`" operator converts
2991
only one symbol that immediately follows it. Here's an example of utilizing
4058
only one symbol that immediately follows it. Here's an example of utilizing
2992
those two features:
4059
those two features:
Line 3102... Line 4169...
3102
used. This label will be also attached at the beginning of every name starting
4169
used. This label will be also attached at the beginning of every name starting
3103
with dot in the contents of macroinstruction. The macroinstruction defined
4170
with dot in the contents of macroinstruction. The macroinstruction defined
3104
using the "struc" directive can have the same name as some other
4171
using the "struc" directive can have the same name as some other
3105
macroinstruction defined using the "macro" directive, structure
4172
macroinstruction defined using the "macro" directive, structure
3106
macroinstruction won't prevent the standard macroinstruction being processed
4173
macroinstruction will not prevent the standard macroinstruction from being 
3107
when there is no label before it and vice versa. All the rules and features
4174
processed when there is no label before it and vice versa. All the rules and 
3108
concerning standard macroinstructions apply to structure macroinstructions.
4175
features concerning standard macroinstructions apply to structure 
3109
  Here is the sample of structure macroinstruction:
4176
macroinstructions.
-
 
4177
  Here is the sample of structure macroinstruction:
3110
 
4178
 
Line 3111... Line 4179...
3111
    struc point x,y
4179
    struc point x,y
3112
     {
4180
     {
3113
        .x dw x
4181
        .x dw x
Line 3144... Line 4212...
3144
2.3.5  Repeating macroinstructions
4212
2.3.5  Repeating macroinstructions
Line 3145... Line 4213...
3145
 
4213
 
3146
The "rept" directive is a special kind of macroinstruction, which makes given
4214
The "rept" directive is a special kind of macroinstruction, which makes given
3147
amount of duplicates of the block enclosed with braces. The basic syntax is
-
 
3148
"rept" directive followed by number (it cannot be an expression, since
-
 
3149
preprocessor doesn't do calculations, if you need repetitions based on values
-
 
3150
calculated by assembler, use one of the code repeating directives that are
4215
amount of duplicates of the block enclosed with braces. The basic syntax is
3151
processed by assembler, see 2.2.3), and then block of source enclosed between
4216
"rept" directive followed by number and then block of source enclosed between
Line 3152... Line 4217...
3152
the "{" and "}" characters. The simplest example:
4217
the "{" and "}" characters. The simplest example:
Line 3153... Line 4218...
3153
 
4218
 
Line 3198... Line 4263...
3198
 
4263
 
Line 3199... Line 4264...
3199
will generate code which will clear the contents of eight SSE registers.
4264
will generate code which will clear the contents of eight SSE registers.
3200
You can define multiple counters separated with commas, and each one can have
4265
You can define multiple counters separated with commas, and each one can have
3201
different base.
4266
different base.
-
 
4267
  The number of repetitions and the base values for counters can be specified
-
 
4268
using the numerical expressions with operator rules identical as in the case
-
 
4269
of assembler. However each value used in such expression must either be a
-
 
4270
directly specified number, or a symbolic constant with value also being an
-
 
4271
expression that can be calculated by preprocessor (in such case the value
-
 
4272
of expression associated with symbolic constant is calculated first, and then
-
 
4273
substituted into the outer expression in place of that constant). If you need
-
 
4274
repetitions based on values that can only be calculated at assembly time, use
-
 
4275
one of the code repeating directives that are processed by assembler, see
-
 
4276
section 2.2.3.
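  As a minimal sketch of the above (the names "count" and "n" are chosen only
for illustration), both the number of repetitions and the base of the counter
can be given as expressions built on a symbolic constant:

    define count 2
    rept count+1 n:count*10 { dw n }

Assuming the rules described above, the preprocessor should evaluate "count+1"
to 3 and "count*10" to 20, so the block would be repeated three times,
generating "dw 20", "dw 21" and "dw 22".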
3202
  The "irp" directive iterates the single argument through the given list of
4277
  The "irp" directive iterates the single argument through the given list of
3203
parameters. The syntax is "irp" followed by the argument name, then the comma
4278
parameters. The syntax is "irp" followed by the argument name, then the comma
3204
and then the list of parameters. The parameters are specified in the same
4279
and then the list of parameters. The parameters are specified in the same
3205
way like in the invocation of standard macroinstruction, so they have to be
4280
way like in the invocation of standard macroinstruction, so they have to be
3206
separated with commas and each one can be enclosed with the "<" and ">"
4281
separated with commas and each one can be enclosed with the "<" and ">"
Line 3251... Line 4326...
3251
    match +,+ { include 'first.inc' }
4326
    match +,+ { include 'first.inc' }
3252
    match +,- { include 'second.inc' }
4327
    match +,- { include 'second.inc' }
Line 3253... Line 4328...
3253
 
4328
 
3254
the first file will get included, since "+" after comma matches the "+" in
4329
the first file will get included, since "+" after comma matches the "+" in
3255
pattern, and the second file won't be included, since there is no match.
4330
pattern, and the second file will not be included, since there is no match.
3256
  To match any other symbol literally, it has to be preceded by "=" character
4331
  To match any other symbol literally, it has to be preceded by "=" character
3257
in the pattern. Also to match the "=" character itself, or the comma, the
4332
in the pattern. Also to match the "=" character itself, or the comma, the
3258
"==" and "=," constructions have to be used. For example the "=a==" pattern
4333
"==" and "=," constructions have to be used. For example the "=a==" pattern
3259
will match the "a=" sequence.
4334
will match the "a=" sequence.
Line 3275... Line 4350...
3275
matched with "b". But in this case:
4350
matched with "b". But in this case:
3276
 
4351
 
Line 3277... Line 4352...
3277
    match a b, 1 { db a }
4352
    match a b, 1 { db a }
Line 3278... Line 4353...
3278
 
4353
 
3279
there will be nothing left for "b" to match, so the block won't get processed
4354
there will be nothing left for "b" to match, so the block will not get 
3280
at all.
4355
processed at all.
3281
  The block of source defined by match is processed in the same way as any
4356
  The block of source defined by match is processed in the same way as any
3282
macroinstruction, so any operators specific to macroinstructions can be used
4357
macroinstruction, so any operators specific to macroinstructions can be used
3283
also in this case.
4358
also in this case.
3284
  What makes "match" directive more useful is the fact, that it replaces the
4359
  What makes "match" directive more useful is the fact, that it replaces the
Line 3312... Line 4387...
3312
that the "fix" directive and prioritized symbolic constants are processed in
4387
that the "fix" directive and prioritized symbolic constants are processed in
3313
a separate stage, and all other preprocessing is done after on the resulting
4388
a separate stage, and all other preprocessing is done after on the resulting
3314
source.
4389
source.
3315
  The standard preprocessing that comes after, on each line begins with
4390
  The standard preprocessing that comes after, on each line begins with
3316
recognition of the first symbol. It begins with checking for the preprocessor
4391
recognition of the first symbol. It starts with checking for the preprocessor
3317
directives, and when none of them is detected, preprocessor checks whether the
4392
directives, and when none of them is detected, preprocessor checks whether the
3318
first symbol is macroinstruction. If no macroinstruction is found, it moves
4393
first symbol is macroinstruction. If no macroinstruction is found, it moves
3319
to the second symbol of line, and again begins with checking for directives,
4394
to the second symbol of line, and again begins with checking for directives,
3320
which in this case is only the "equ" directive, as this is the only one that
4395
which in this case is only the "equ" directive, as this is the only one that
3321
occurs as the second symbol in line. If there's no directive, the second
4396
occurs as the second symbol in line. If there is no directive, the second
3322
symbol is checked for the case of structure macroinstruction and when none
4397
symbol is checked for the case of structure macroinstruction and when none
3323
of those checks gives the positive result, the symbolic constants are replaced
4398
of those checks gives the positive result, the symbolic constants are replaced
3324
with their values and such line is passed to the assembler.
4399
with their values and such line is passed to the assembler.
3325
  To see it on the example, assume that there is defined the macroinstruction
4400
  To see it on the example, assume that there is defined the macroinstruction
3326
called "foo" and the structure macroinstruction called "bar". Those lines:
4401
called "foo" and the structure macroinstruction called "bar". Those lines:
3327
 
4402
 
Line 3329... Line 4404...
3329
    foo bar
4404
    foo bar
3330
 
4405
 
Line 3331... Line 4406...
3331
would be then both interpreted as invocations of macroinstruction "foo", since
4406
would be then both interpreted as invocations of macroinstruction "foo", since
3332
the meaning of the first symbol overrides the meaning of second one.
4407
the meaning of the first symbol overrides the meaning of second one.
3333
  The macroinstructions generate the new lines from their definition blocks,
4408
  When the macroinstruction generates the new lines from its definition block,
-
 
4409
in every line it first scans for macroinstruction directives, and interprets
-
 
4410
them accordingly. All the other content in the definition block is used to
-
 
4411
brew the new lines, replacing the macroinstruction parameters with their values
3334
replacing the parameters with their values and then processing the "#" and "`"
4412
and then processing the symbol escaping and "#" and "`" operators. The
3335
operators. The conversion operator has the higher priority than concatenation.
4413
conversion operator has the higher priority than concatenation and if any of
-
 
4414
them operates on the escaped symbol, the escaping is cancelled before finishing
3336
After this is completed, the newly generated line goes through the standard
4415
the operation. After this is completed, the newly generated line goes through
3337
preprocessing, as described above.
4416
the standard preprocessing, as described above.
3338
  Though the symbolic constants are usually only replaced in the lines, where
4417
  Though the symbolic constants are usually only replaced in the lines, where
3339
no preprocessor directives nor macroinstructions have been found, there are some
4418
no preprocessor directives nor macroinstructions have been found, there are some
3340
special cases where those replacements are performed in the parts of lines
4419
special cases where those replacements are performed in the parts of lines
3341
containing directives. First one is the definition of symbolic constant, where
4420
containing directives. First one is the definition of symbolic constant, where
3342
the replacements are done everywhere after the "equ" keyword and the resulting
4421
the replacements are done everywhere after the "equ" keyword and the resulting
Line 3373... Line 4452...
3373
then replaced with matched value when generating the new lines defined by the
4452
then replaced with matched value when generating the new lines defined by the
3374
block enclosed with braces. So if the "list" had value "1,2", the above line
4453
block enclosed with braces. So if the "list" had value "1,2", the above line
3375
would generate the line containing "foo 1,2", which would then go through the
4454
would generate the line containing "foo 1,2", which would then go through the
3376
standard preprocessing.
4455
standard preprocessing.
3377
  There is one more special case - when preprocessor goes to checking the
4456
  The other special case is in the parameters of "rept" directive. The amount
-
 
4457
of repetitions and the base value for counter can be specified using
-
 
4458
numerical expressions, and if there is a symbolic constant with non-numerical
-
 
4459
name used in such an expression, preprocessor tries to evaluate its value as 
-
 
4460
a numerical expression and if it succeeds, it replaces the symbolic constant with
-
 
4461
the result of that calculation and continues to evaluate the primary 
-
 
4462
expression. If the expression inside that symbolic constant also contains
-
 
4463
some symbolic constants, preprocessor will try to calculate all the needed 
-
 
4464
values recursively. 
-
 
4465
  This allows performing some calculations at the time of preprocessing, as
-
 
4466
long as all the values used are the numbers known at the preprocessing stage. 
-
 
4467
A single repetition with "rept" can be used for the sole purpose of 
-
 
4468
calculating some value, like in this example: 
-
 
4469
 
-
 
4470
    define a b+4
-
 
4471
    define b 3
-
 
4472
    rept 1 result:a*b+2 { define c result }
-
 
4473
    
-
 
4474
To compute the base value for "result" counter, preprocessor replaces the "b"
-
 
4475
with its value and recursively calculates the value of "a", obtaining 7 as
-
 
4476
the result, then it calculates the main expression with the result being 23.
-
 
4477
The "c" then gets defined with the first value of counter (because the block
-
 
4478
is processed just one time), which is the result of the computation, so the 
-
 
4479
value of "c" is simply the "23" symbol. Note that if "b" is later redefined with
-
 
4480
some other numerical value, the next time an expression containing "a" is
-
 
4481
calculated, the value of "a" will reflect the new value of "b", because the
-
 
4482
symbolic constant contains just the text of the expression.
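  A short sketch of that last remark, reusing the definitions of "a" and "b"
from the example (the constant "d" and the value 5 are assumptions made only
for illustration):

    define a b+4
    define b 3
    rept 1 result:a*b+2 { define c result }     ; "c" becomes 23
    define b 5
    rept 1 result:a*b+2 { define d result }     ; "d" becomes 47

In the second "rept" the value of "a" should be calculated as 9, since "a"
still holds the text "b+4" and "b" is now 5, so the main expression gives
9*5+2, which is 47.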
-
 
4483
  There is one more special case - when preprocessor goes to checking the
3378
second symbol in the line and it happens to be the colon character (what is
4484
second symbol in the line and it happens to be the colon character (what is
3379
then interpreted by assembler as definition of a label), it stops in this
4485
then interpreted by assembler as definition of a label), it stops in this
3380
place and finishes the preprocessing of the first symbol (so if it's the
4486
place and finishes the preprocessing of the first symbol (so if it's the
3381
symbolic constant it gets unrolled) and if it still appears to be the label,
4487
symbolic constant it gets unrolled) and if it still appears to be the label,
3382
it performs the standard preprocessing starting from the place after the
4488
it performs the standard preprocessing starting from the place after the
Line 3419... Line 4525...
3419
Now when assembler processes it, the condition for the "if" is false, and
4525
Now when assembler processes it, the condition for the "if" is false, and
3420
the "a" constant doesn't get defined. However symbolic constant "b" was
4526
the "a" constant doesn't get defined. However symbolic constant "b" was
3421
processed normally, even though its definition was put just next to the one
4527
processed normally, even though its definition was put just next to the one
3422
of "a". So because of the possible confusion you should be very careful
4528
of "a". So because of the possible confusion you should be very careful
3423
every time when mixing the features of preprocessor and assembler - always
4529
every time when mixing the features of preprocessor and assembler - in such
3424
try to imagine what your source will become after the preprocessing, and
4530
cases it is important to realize what the source will become after the 
3425
thus what the assembler will see and do its multiple passes on.
4531
preprocessing, and thus what the assembler will see and do its multiple passes 
-
 
4532
on.
Line 3426... Line 4533...
3426
 
4533
 
Line 3427... Line 4534...
3427
 
4534
 
3428
2.4  Formatter directives
4535
2.4  Formatter directives
3429
 
4536
 
3430
These directives are actually also a kind of control directives, with the
4537
These directives are actually also a kind of control directives, with the
3431
purpose of controlling the format of generated code.
4538
purpose of controlling the format of generated code.
3432
  "format" directive followed by the format identifier allows to select the
4539
  "format" directive followed by the format identifier allows to select the
-
 
4540
output format. This directive should be put at the beginning of the source.
-
 
4541
Default output format is a flat binary file, it can also be selected by using
-
 
4542
"format binary" directive. This directive can be followed by the "as" keyword
3433
output format. This directive should be put at the beginning of the source.
4543
and the quoted string specifying the default file extension for the output
3434
Default output format is a flat binary file, it can also be selected by using
4544
file. Unless the output file name was specified from the command line,
3435
"format binary" directive.
4545
assembler will use this extension when generating the output file.
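  For instance, a flat binary intended to serve as a disk image might be
declared as follows (a sketch only, the 'img' extension is an arbitrary
choice):

    format binary as 'img'

With such line at the beginning of a source file named, say, "boot.asm", the
assembler would be expected to write its output to "boot.img", unless some
other name is given in the command line.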
3436
  "use16" and "use32" directives force the assembler to generate 16-bit or
4546
  "use16" and "use32" directives force the assembler to generate 16-bit or
3437
32-bit code, omitting the default setting for selected output format. "use64"
4547
32-bit code, omitting the default setting for selected output format. "use64"
Line 3466... Line 4576...
3466
 
4576
 
Line 3467... Line 4577...
3467
2.4.2  Portable Executable
4577
2.4.2  Portable Executable
3468
 
4578
 
3469
To select the Portable Executable output format, use "format PE" directive, it
4579
To select the Portable Executable output format, use "format PE" directive, it
-
 
4580
can be followed by additional format settings: first the target subsystem
3470
can be followed by additional format settings: use "console", "GUI" or
4581
setting, which can be "console" or "GUI" for Windows applications, "native"
-
 
4582
for Windows drivers, "EFI", "EFIboot" or "EFIruntime" for the UEFI, it may be
-
 
4583
followed by the minimum version of system that the executable is targeted to
-
 
4584
(specified in form of floating-point value). Optional "DLL" and "WDM" keywords
-
 
4585
mark the output file as a dynamic link library and WDM driver respectively,
3471
"native" operator selects the target subsystem (floating point value
4586
and the "large" keyword marks the executable as able to handle addresses
3472
specifying subsystem version can follow), "DLL" marks the output file as a
4587
larger than 2 GB.
3473
dynamic link library. Then can follow the "at" operator and the numerical
4588
  After those settings can follow the "at" operator and a numerical expression
3474
expression specifying the base of PE image and then optionally "on" operator
4589
specifying the base of PE image and then optionally "on" operator followed by
3475
followed by the quoted string containing file name selects custom MZ stub for
4590
the quoted string containing file name selects custom MZ stub for PE program
3476
PE program (when specified file is not a MZ executable, it is treated as a
4591
(when specified file is not a MZ executable, it is treated as a flat binary
3477
flat binary executable file and converted into MZ format). The default code
-
 
Line 3478... Line 4592...
3478
setting for this format is 32-bit. The example of fully featured PE format
4592
executable file and converted into MZ format). The default code setting for
Line 3479... Line 4593...
3479
declaration:
4593
this format is 32-bit. The example of fully featured PE format declaration:
3480
 
4594
 
Line 3522... Line 4636...
3522
to be defined there. The same applies to the resource data when the "resource"
4636
to be defined there. The same applies to the resource data when the "resource"
3523
identifier is followed by "from" operator and quoted file name - in such case
4637
identifier is followed by "from" operator and quoted file name - in such case
3524
data is taken from the given resource file.
4638
data is taken from the given resource file.
3525
  The "rva" operator can be used inside the numerical expressions to obtain
4639
  The "rva" operator can be used inside the numerical expressions to obtain
3526
the RVA of the item addressed by the value it is applied to.
4640
the RVA of the item addressed by the value it is applied to, that is the
3527
 
4641
offset relative to the base of PE image.
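  A minimal sketch of its use, given as a fragment of PE source (the names
"message" and "message_rva" are hypothetical):

    section '.data' data readable

    message     db 'Hello',0
    message_rva dd rva message      ; offset of "message" from the image base

This is the form in which addresses are usually stored in the PE data
structures, which hold offsets from the base of image rather than absolute
addresses.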
-
 
4642
 
Line 3528... Line 4643...
3528
 
4643
 
Line 3529... Line 4644...
3529
2.4.3  Common Object File Format
4644
2.4.3  Common Object File Format
3530
 
4645
 
3531
To select Common Object File Format, use "format COFF" or "format MS COFF"
4646
To select Common Object File Format, use "format COFF" or "format MS COFF"
3532
directive whether you want to create classic or Microsoft's COFF file. The
4647
directive, depending on whether you want to create classic (DJGPP) or Microsoft's
3533
default code setting for this format is 32-bit. To create the file in
4648
variant of COFF file. The default code setting for this format is 32-bit. To 
-
 
4649
create the file in Microsoft's COFF format for the x86-64 architecture, use 
3534
Microsoft's COFF format for the x86-64 architecture, use "format MS64 COFF"
4650
"format MS64 COFF" setting, in such case long mode code is generated by 
3535
setting, in such case long mode code is generated by default.
4651
default.
3536
  "section" directive defines a new section, it should be followed by quoted
4652
  "section" directive defines a new section, it should be followed by quoted
3537
string defining the name of section, then one or more section flags can
4653
string defining the name of section, then one or more section flags can
3538
follow. Section flags available for both COFF variants are "code" and "data",
4654
follow. Section flags available for both COFF variants are "code" and "data",
3539
while "readable", "writeable", "executable", "shareable", "discardable",
4655
while flags "readable", "writeable", "executable", "shareable", "discardable",
3540
"notpageable", "linkremove" and "linkinfo" are flags available only with
4656
"notpageable", "linkremove" and "linkinfo" are available only with Microsoft's
3541
Microsoft COFF variant.
4657
COFF variant.
3542
  By default section is aligned to double word (four bytes), in case of
4658
  By default section is aligned to double word (four bytes), in case of
3543
Microsoft COFF variant other alignment can be specified by providing the
4659
Microsoft COFF variant other alignment can be specified by providing the
3544
"align" operator followed by alignment value (any power of two up to 8192)
4660
"align" operator followed by alignment value (any power of two up to 8192)
Line 3559... Line 4675...
3559
 
4675
 
Line 3560... Line 4676...
3560
    public main
4676
    public main
3561
    public start as '_start'
4677
    public start as '_start'
Line -... Line 4678...
-
 
4678
 
-
 
4679
Additionally, with COFF format it's possible to specify exported symbol as
-
 
4680
static, it's done by preceding the name of symbol with the "static" keyword.
-
 
4681
  When using the Microsoft's COFF format, the "rva" operator can be used
-
 
4682
inside the numerical expressions to obtain the RVA of the item addressed by the
-
 
4683
value it is applied to.
3562
 
4684
 
Line 3563... Line 4685...
3563
2.4.4  Executable and Linkable Format
4685
2.4.4  Executable and Linkable Format
3564
 
4686
 
3565
To select ELF output format, use "format ELF" directive. The default code
4687
To select ELF output format, use "format ELF" directive. The default code
Line 3576... Line 4698...
3576
COFF output format is selected (described in previous section).
4698
COFF output format is selected (described in previous section).
3577
  The "rva" operator can be used also in the case of this format (however not
4699
  The "rva" operator can be used also in the case of this format (however not
3578
when target architecture is x86-64), it converts the address into the offset
4700
when target architecture is x86-64), it converts the address into the offset
3579
relative to the GOT table, so it may be useful to create position-independent
4701
relative to the GOT table, so it may be useful to create position-independent
3580
code.
4702
code. There's also a special "plt" operator, which allows to call the external
3581
  To create executable file, follow the format choice directive with the
-
 
3582
"executable" keyword. It allows to use "entry" directive followed by the value
-
 
3583
to set as entry point of program. On the other hand it makes "extrn" and
4703
functions through the Procedure Linkage Table. You can even create an alias
3584
"public" directives unavailable, and instead of "section" there should be the
4704
for external function that will make it always be called through PLT, with
3585
"segment" directive used, followed only by one or more segment permission
4705
the code like:
3586
flags. The origin of segment is aligned to page (4096 bytes), and available
-
 
3587
flags for are: "readable", "writeable" and "executable".
-
 
3588
 
4706
 
Line -... Line 4707...
-
 
4707
    extrn 'printf' as _printf
-
 
4708
    printf = PLT _printf
-
 
4709
 
-
 
4710
  To create executable file, follow the format choice directive with the
-
 
4711
"executable" keyword and optionally the number specifying the brand of the
-
 
4712
target operating system (for example value 3 would mark the executable
-
 
4713
for Linux system). With this format selected it is allowed to use "entry"
-
 
4714
directive followed by the value to set as entry point of program. On the other
-
 
4715
hand it makes "extrn" and "public" directives unavailable, and instead of
-
 
4716
"section" there should be the "segment" directive used, followed by one or
-
 
4717
more segment permission flags and optionally a marker of special ELF
-
 
4718
executable segment, which can be "interpreter", "dynamic" or "note". The
-
 
4719
origin of segment is aligned to page (4096 bytes), and available permission
-
 
4720
flags are: "readable", "writeable" and "executable".
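  A minimal sketch of such executable for 32-bit Linux is shown below. The
system call number and the "int 0x80" interface belong to Linux itself and are
not described in this manual, and the "start" label is just a name chosen for
this example:

    format ELF executable 3
    entry start

    segment readable executable

    start:
        mov     eax,1           ; number of the exit system call
        xor     ebx,ebx         ; exit code 0
        int     0x80

Since this is an executable and not an object file, there are no "section",
"extrn" or "public" directives here, only the "segment" with its permission
flags.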
Line 3589... Line 4721...
3589
 
4721
 
3590
EOF
4722
EOF