                         ,,;,, ,,,,    ,,,,, ,,, ,,
                           ;       ;  ;      ;  ;  ;
                           ;  ,'''';   '''', ;  ;  ;
                           ;  ',,,,;, ,,,,,' ;  ;  ;
 
                              Programmer's Manual
 
Table of contents
-----------------

Chapter 1  Introduction

        1.1  Compiler overview
        1.1.1  System requirements
        1.1.2  Executing compiler from command line
        1.1.3  Compiler messages
        1.1.4  Output formats

        1.2  Assembly syntax
        1.2.1  Instruction syntax
        1.2.2  Data definitions
        1.2.3  Constants and labels
        1.2.4  Numerical expressions
        1.2.5  Jumps and calls
        1.2.6  Size settings

Chapter 2  Instruction set

        2.1  The x86 architecture instructions
        2.1.1  Data movement instructions
        2.1.2  Type conversion instructions
        2.1.3  Binary arithmetic instructions
        2.1.4  Decimal arithmetic instructions
        2.1.5  Logical instructions
        2.1.6  Control transfer instructions
        2.1.7  I/O instructions
        2.1.8  Strings operations
        2.1.9  Flag control instructions
        2.1.10  Conditional operations
        2.1.11  Miscellaneous instructions
        2.1.12  System instructions
        2.1.13  FPU instructions
        2.1.14  MMX instructions
        2.1.15  SSE instructions
        2.1.16  SSE2 instructions
        2.1.17  SSE3 instructions
        2.1.18  AMD 3DNow! instructions
        2.1.19  The x86-64 long mode instructions
        2.1.20  SSE4 instructions
        2.1.21  AVX instructions
        2.1.22  AVX2 instructions
        2.1.23  Auxiliary sets of computational instructions
        2.1.24  Other extensions of instruction set

        2.2  Control directives
        2.2.1  Numerical constants
        2.2.2  Conditional assembly
        2.2.3  Repeating blocks of instructions
        2.2.4  Addressing spaces
        2.2.5  Other directives
        2.2.6  Multiple passes

        2.3  Preprocessor directives
        2.3.1  Including source files
        2.3.2  Symbolic constants
        2.3.3  Macroinstructions
        2.3.4  Structures
        2.3.5  Repeating macroinstructions
        2.3.6  Conditional preprocessing
        2.3.7  Order of processing

        2.4  Formatter directives
        2.4.1  MZ executable
        2.4.2  Portable Executable
        2.4.3  Common Object File Format
        2.4.4  Executable and Linkable Format
 
Chapter 1  Introduction
-----------------------

This chapter contains all the most important information you need to begin
using the flat assembler. If you are an experienced assembly language
programmer, you should read at least this chapter before using this compiler.


1.1  Compiler overview

Flat assembler is a fast assembly language compiler for the x86 architecture
processors, which performs multiple passes to optimize the size of generated
machine code. It is self-compilable and versions for different operating
systems are provided. All the versions are designed to be used from the system
command line and they should not differ in behavior.
 
1.1.1  System requirements

All versions require the x86 architecture 32-bit processor (at least 80386),
although they can produce programs for the x86 architecture 16-bit processors,
too. The DOS version requires an OS compatible with MS DOS 2.0 and either a
true real mode environment or DPMI. The Windows version requires a Win32
console compatible with version 3.1.
 
1.1.2  Executing compiler from command line

To execute flat assembler from the command line you need to provide two
parameters - the first should be the name of the source file, the second
should be the name of the destination file. If no second parameter is given,
the name for the output file will be guessed automatically. After displaying
short information about the program name and version, compiler will read the
data from the source file and compile it. When the compilation is successful,
compiler will write the generated code to the destination file and display
the summary of compilation process; otherwise it will display the information
about the error that occurred.
  The source file should be a text file, and can be created in any text
editor. Line breaks are accepted in both DOS and Unix standards, tabulators
are treated as spaces.
  In the command line you can also include "-m" option followed by a number,
which specifies how many kilobytes of memory flat assembler should maximally
use. In case of DOS version this option limits only the usage of extended
memory. The "-p" option followed by a number can be used to specify the limit
for number of passes the assembler performs. If code cannot be generated
within specified amount of passes, the assembly will be terminated with an
error message. The maximum value of this setting is 65536, while the default
limit, used when no such option is included in command line, is 100.
  There are no command line options that would affect the output of compiler,
flat assembler requires only the source code to include the information it
really needs. For example, the output format is specified by using the
"format" directive at the beginning of source.
 
1.1.3  Compiler messages

As it is stated above, after the successful compilation, the compiler displays
the compilation summary. It includes the information of how many passes were
done, how much time it took, and how many bytes were written into the
destination file.
The following is an example of the compilation summary:

38 passes, 5.3 seconds, 77824 bytes.

In case of error during the compilation process, the program will display an
error message. For example, when compiler can't find the input file, it will
display the following message:

error: source file not found.

If the error is connected with a specific part of source code, the source line
that caused the error will be also displayed. Also placement of this line in
the source is given to help you find this error, for example:

example.asm [3]:
        mob     ax,1
error: illegal instruction.

It means that in the third line of the "example.asm" file compiler has
encountered an unrecognized instruction. When the line that caused error
contains a macroinstruction, also the line in macroinstruction definition
that generated the erroneous instruction is displayed:

example.asm [6]:
        stoschar 7
example.asm [3] stoschar [1]:
        mob     al,char
error: illegal instruction.

It means that the macroinstruction in the sixth line of the "example.asm" file
generated an unrecognized instruction with the first line of its definition.
 
1.1.4  Output formats

By default, when there is no "format" directive in source file, flat
assembler simply puts generated instruction codes into output, creating in
this way a flat binary file. By default it generates 16-bit code, but you can
always turn it into the 16-bit or 32-bit mode by using "use16" or "use32"
directive. Some of the output formats switch into 32-bit mode when selected -
more information about formats which you can choose can be found in 2.4.
  All output code is always in the order in which it was entered into the
source file.
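  As a minimal sketch, assume a source without any "format" directive, so
that a flat binary file is produced. The following lines switch between
16-bit and 32-bit code:

    use16               ; 16-bit code, the default for flat binary output
    mov ax,1            ; assembled as a 16-bit instruction
    use32               ; the following lines are assembled as 32-bit code
    mov eax,1           ; 32-bit instruction, no operand size prefix needed

Both instructions are written to the output file in the order in which they
appear in the source.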
 
1.2  Assembly syntax

The information provided below is intended mainly for the assembler
programmers that have been using some other assembly compilers before.
If you are a beginner, you should look for assembly programming tutorials.
  Flat assembler by default uses the Intel syntax for the assembly
instructions, although you can customize it using the preprocessor
capabilities (macroinstructions and symbolic constants). It also has its own
set of directives - the instructions for the compiler.
  All symbols defined inside the sources are case-sensitive.
 
1.2.1  Instruction syntax

Instructions in assembly language are separated by line breaks, and one
instruction is expected to fill the one line of text. If a line contains
a semicolon, except for the semicolons inside the quoted strings, the rest of
this line is the comment and compiler ignores it. If a line ends with "\"
character (optionally followed by a semicolon and comment), the next line
is attached at this point.
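  For illustration, here is a short fragment (the values are arbitrary) with a
comment, a semicolon inside a quoted string and a line continuation:

    mov ax,1            ; everything after the semicolon is a comment
    db 'a;b'            ; this semicolon is a part of the quoted string
    db 1,2,\            ; the next line is attached at the "\"
       3,4

The last two lines are assembled in the same way as "db 1,2,3,4" written in
one line.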
  Each line in source is the sequence of items, which may be one of the three
types. One type are the symbol characters, which are the special characters
that are individual items even when they are not spaced from the other ones.
Any of the "+-*/=<>()[]{}:,|&~#`" is a symbol character. A sequence of other
characters, separated from other items with either blank spaces or symbol
characters, is a symbol. If the first character of a symbol is either a
single or double quote, it integrates any sequence of characters following it,
even the special ones, into a quoted string, which should end with the same
character with which it began (the single or double quote) - however, if there
are two such characters in a row (without any other character between them),
they are integrated into the quoted string as just one of them and the quoted
string continues. The symbols other than symbol characters and quoted strings
can be used as names, so they are also called the name symbols.
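  A sketch of the rule for doubled quotes follows. It uses the "db" directive
(described in 1.2.2) to turn each quoted string into bytes:

    db 'It''s'          ; four bytes: It's
    db "say ""hi"""     ; eight bytes: say "hi"
    db 'a"b'            ; three bytes, a different quote needs no doubling

Only the quote character that opened the string has to be doubled inside it.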
  Every instruction consists of the mnemonic and a variable number of
operands, separated with commas. The operand can be register, immediate value
or a data addressed in memory, it can also be preceded by size operator to
define or override its size (table 1.1). The names of the available registers
are listed in table 1.2, their sizes cannot be overridden. Immediate value can
be specified by any numerical expression.
  When operand is a data in memory, the address of that data (also any
numerical expression, but it may contain registers) should be enclosed in
square brackets or preceded by "ptr" operator. For example instruction
"mov eax,3" will put the immediate value 3 into the EAX register, instruction
"mov eax,[7]" will put the 32-bit value from the address 7 into EAX and the
instruction "mov byte [7],3" will put the immediate value 3 into the byte at
address 7, it can also be written as "mov byte ptr 7,3". To specify which
segment register should be used for addressing, segment register name followed
by a colon should be put just before the address value (inside the square
brackets or after the "ptr" operator).
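  A short sketch of the segment override forms, assuming 16-bit code. As in
the examples above, the addresses are arbitrary:

    mov al,[es:bx]      ; segment override inside the square brackets
    mov byte [es:7],3   ; override combined with a size operator
    mov al,ptr es:7     ; the same override after the "ptr" operator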
 
  Table 1.1  Size operators
  /-------------------------\
  | Operator | Bits | Bytes |
  |==========|======|=======|
  | byte     | 8    | 1     |
  | word     | 16   | 2     |
  | dword    | 32   | 4     |
  | fword    | 48   | 6     |
  | pword    | 48   | 6     |
  | qword    | 64   | 8     |
  | tbyte    | 80   | 10    |
  | tword    | 80   | 10    |
  | dqword   | 128  | 16    |
  | xword    | 128  | 16    |
  | qqword   | 256  | 32    |
  | yword    | 256  | 32    |
  \-------------------------/
 
  Table 1.2  Registers
  /-----------------------------------------------------------------\
  | Type    | Bits |                                                |
  |=========|======|================================================|
  |         | 8    | al    cl    dl    bl    ah    ch    dh    bh   |
  | General | 16   | ax    cx    dx    bx    sp    bp    si    di   |
  |         | 32   | eax   ecx   edx   ebx   esp   ebp   esi   edi  |
  |---------|------|------------------------------------------------|
  | Segment | 16   | es    cs    ss    ds    fs    gs               |
  |---------|------|------------------------------------------------|
  | Control | 32   | cr0         cr2   cr3   cr4                    |
  |---------|------|------------------------------------------------|
  | Debug   | 32   | dr0   dr1   dr2   dr3               dr6   dr7  |
  |---------|------|------------------------------------------------|
  | FPU     | 80   | st0   st1   st2   st3   st4   st5   st6   st7  |
  |---------|------|------------------------------------------------|
  | MMX     | 64   | mm0   mm1   mm2   mm3   mm4   mm5   mm6   mm7  |
  |---------|------|------------------------------------------------|
  | SSE     | 128  | xmm0  xmm1  xmm2  xmm3  xmm4  xmm5  xmm6  xmm7 |
  |---------|------|------------------------------------------------|
  | AVX     | 256  | ymm0  ymm1  ymm2  ymm3  ymm4  ymm5  ymm6  ymm7 |
  \-----------------------------------------------------------------/
 
1.2.2  Data definitions

To define data or reserve a space for it, use one of the directives listed in
table 1.3. The data definition directive should be followed by one or more
numerical expressions, separated with commas. These expressions define the
values for data cells of size depending on which directive is used. For
example "db 1,2,3" will define the three bytes of values 1, 2 and 3
respectively.
  The "db" and "du" directives also accept the quoted string values of any
length, which will be converted into chain of bytes when "db" is used and into
chain of words with zeroed high byte when "du" is used. For example "db 'abc'"
will define the three bytes of values 61h, 62h and 63h.
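  The following fragment is a small illustration of both directives. The
values in the comments are the ASCII codes of the characters:

    db 'abc',0          ; bytes 61h, 62h, 63h and a terminating zero
    du 'ab'             ; words 0061h and 0062h, stored as 61h,0,62h,0
    db 'OK',13,10       ; strings and numbers can be mixed in one directive

Like all x86 multi-byte values, each word created by "du" is stored with its
low byte first.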
  The "dp" directive and its synonym "df" accept the values consisting of two
numerical expressions separated with a colon, the first value will become the
high word and the second value will become the low double word of the far
pointer value. Also "dd" accepts such pointers consisting of two word values
separated with a colon, and "dt" accepts the word and quad word value
separated with a colon, the quad word is stored first. The "dt" directive with
single expression as parameter accepts only floating point values and creates
data in FPU double extended precision format.
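  For example, following the rules above, these definitions produce the bytes
listed in the comments. The segment and offset values are arbitrary:

    dp 10h:12345678h    ; 78h,56h,34h,12h,10h,0 - offset first, then selector
    dd 10h:1234h        ; 34h,12h,10h,0 - offset word, then segment word
    dt 1.0              ; 1.0 in double extended precision (ten bytes)
    dt 3FFFh:8000000000000000h ; the same ten bytes in word:quad word form

The last line spells out the exponent word and the 64-bit significand of 1.0,
with the quad word stored first.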
  Any of the above directives allows the usage of the special "dup" operator
to make multiple copies of given values. The count of duplicates should
precede this operator and the value to duplicate should follow - it can even
be the chain of values separated with commas, but such set of values needs to
be enclosed with parentheses, like "db 5 dup (1,2)", which defines five copies
of the given two byte sequence.
  The "file" is a special directive and its syntax is different. This
directive includes a chain of bytes from file and it should be followed by the
quoted file name, then optionally numerical expression specifying offset in
file preceded by the colon, and - also optionally - comma and numerical
expression specifying count of bytes to include (if no count is specified, all
data up to the end of file is included). For example "file 'data.bin'" will
include the whole file as binary data and "file 'data.bin':10h,4" will include
only four bytes starting at offset 10h.
  The data reservation directive should be followed by only one numerical
expression, and this value defines how many cells of the specified size should
be reserved. All data definition directives also accept the "?" value, which
means that this cell should not be initialized to any value and the effect is
the same as by using the data reservation directive. The uninitialized data
may not be included in the output file, so its values should always be
considered unknown.
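  A short sketch combining reservation directives, the "?" value and the
"dup" operator. The sizes in the comments follow from table 1.3:

    rb 64               ; 64 uninitialized bytes
    rd 4                ; four uninitialized double words, 16 bytes
    dw ?                ; one uninitialized word, the same as "rw 1"
    dd 4 dup ?          ; the same as "rd 4"
    db 3 dup (1,2)      ; initialized data: 1,2,1,2,1,2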
 
  Table 1.3  Data directives
  /----------------------------\
  | Size    | Define | Reserve |
  | (bytes) | data   | data    |
  |=========|========|=========|
  | 1       | db     | rb      |
  |         | file   |         |
  |---------|--------|---------|
  | 2       | dw     | rw      |
  |         | du     |         |
  |---------|--------|---------|
  | 4       | dd     | rd      |
  |---------|--------|---------|
  | 6       | dp     | rp      |
  |         | df     | rf      |
  |---------|--------|---------|
  | 8       | dq     | rq      |
  |---------|--------|---------|
  | 10      | dt     | rt      |
  \----------------------------/
 
1.2.3  Constants and labels

In the numerical expressions you can also use constants or labels instead of
numbers. To define the constant or label you should use the specific
directives. Each label can be defined only once and it is accessible from
any place of source (even before it was defined). Constant can be redefined
many times, but in this case it is accessible only after it was defined, and
is always equal to the value from last definition before the place where it's
used. When a constant is defined only once in source, it is - like the label -
accessible from anywhere.
  The definition of constant consists of name of the constant followed by the
"=" character and numerical expression, which after calculation will become
the value of constant. This value is always calculated at the time the
constant is defined. For example you can define "count" constant by using the
directive "count = 17", and then use it in the assembly instructions, like
"mov cx,count" - which will become "mov cx,17" during the compilation process.
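  Extending the "count" example gives a minimal sketch of a redefined
constant, where each use takes the value of the last definition before it:

    count = 17
    mov cx,count        ; assembled as "mov cx,17"
    count = count + 1   ; the new value is calculated from the previous one
    mov dx,count        ; assembled as "mov dx,18"

Since "count" is defined more than once here, it cannot be used anywhere
before its first definition.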
  There are different ways to define labels. The simplest is to follow the
name of label by the colon, this directive can even be followed by the other
instruction in the same line. It defines the label whose value is equal to
offset of the point where it's defined. This method is usually used to label
the places in code. The other way is to follow the name of label (without a
colon) by some data directive. It defines the label with value equal to
offset of the beginning of defined data, and remembered as a label for data
with cell size as specified for that data directive in table 1.3.
  The label can be treated as constant of value equal to offset of labeled
code or data. For example when you define data using the labeled directive
"char db 224", to put the offset of this data into BX register you should use
"mov bx,char" instruction, and to put the value of byte addressed by "char"
label to DL register, you should use "mov dl,[char]" (or "mov dl,ptr char").
But when you try to assemble "mov ax,[char]", it will cause an error, because
fasm compares the sizes of operands, which should be equal. You can force
assembling that instruction by using size override: "mov ax,word [char]", but
remember that this instruction will read the two bytes beginning at "char"
address, while it was defined as a single byte.
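  Put together, the instructions discussed above could appear in a 16-bit
source like the one below. The "start" label is hypothetical and added only
for illustration:

    start:  mov bx,char         ; hypothetical code label; BX = data offset
            mov dl,[char]       ; DL = the byte stored at "char"
            mov ax,word [char]  ; size override, reads two bytes
    char    db 224

Here "start" is a code label defined with a colon, while "char" is a data
label remembered with the byte size of "db".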
  The last and the most flexible way to define labels is to use "label"
directive. This directive should be followed by the name of label, then
optionally size operator (it can be preceded by a colon) and then - also
optionally - "at" operator and the numerical expression defining the address at
which this label should be defined. For example "label wchar word at char"
will define a new label for the 16-bit data at the address of "char". Now the
instruction "mov ax,[wchar]" will be after compilation the same as
"mov ax,word [char]". If no address is specified, "label" directive defines
the label at current offset. Thus "mov [wchar],57568" will store two bytes
while "mov [char],224" will store one byte to the same address.
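  As a sketch, the following fragment shows both forms of "label" described
above. "buffer" is a hypothetical name used to show the form without "at":

    char    db 224
    label wchar word at char    ; 16-bit label at the address of "char"
    label buffer dword          ; hypothetical label at the current offset
            rb 16               ; storage that "buffer" refers to
            mov [wchar],57568   ; stores two bytes, 0E0h and 0E0h
            mov [char],224      ; stores one byte, 0E0h
            mov eax,[buffer]    ; size taken from the label definition

The value 57568 is 0E0E0h, so both stores leave the byte 0E0h (224) at the
address of "char".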
  The label whose name begins with dot is treated as local label, and its name
is attached to the name of last global label (with name beginning with
anything but dot) to make the full name of this label. So you can use the
short name (beginning with dot) of this label anywhere before the next global
label is defined, and in the other places you have to use the full name. Labels
beginning with two dots are the exception - they are like global, but they
don't become the new prefix for local labels.
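  A minimal sketch of how the full names are formed. The names "main",
"..helper" and "other" are arbitrary and chosen only for this example:

    main:                       ; arbitrary global label
    .loop:  dec cx              ; full name is "main.loop"
            jnz .loop           ; the same as "jnz main.loop" here
    ..helper:                   ; like a global label, but not a new prefix
    .done:  ret                 ; still attached to "main": "main.done"
    other:  jmp main.loop       ; after "other", the full name is needed

After the "other" label, the short name ".loop" would refer to "other.loop"
instead.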
  The "@@" name means anonymous label, and many of them can be defined in
the source. Symbol "@b" (or equivalent "@r") references the nearest preceding
anonymous label, symbol "@f" references the nearest following anonymous label.
These special symbols are case-insensitive.
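  For example, in the following loop (a sketch, the count is arbitrary) each
reference goes to the nearest anonymous label in the given direction:

            mov cx,10
    @@:     dec cx              ; first anonymous label
            jnz @b              ; back to the "@@" just above
            jmp @f              ; forward to the next "@@"
            nop                 ; skipped by the jump above
    @@:     ret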
 
1.2.4  Numerical expressions

In the above examples all the numerical expressions were the simple numbers,
constants or labels. But they can be more complex, by using the arithmetical
or logical operators for calculations at compile time. All these operators
with their priority values are listed in table 1.4. The operations with higher
priority value will be calculated first; you can of course change this
behavior by putting some parts of expression into parentheses. The "+", "-",
"*" and "/" are standard arithmetical operations, "mod" calculates the
remainder from division. The "and", "or", "xor", "shl", "shr" and "not"
perform the same logical operations as assembly instructions of those names.
The "rva" and "plt" are special unary operators that perform conversions
between different kinds of addresses, they can be used only with few of the
output formats and their meaning may vary (see 2.4).
  The arithmetical and logical calculations are usually processed as if they
operated on infinite precision 2-adic numbers, and assembler signals an
overflow error if because of its limitations it is not able to perform the
required calculation, or if the result is too large to fit in either the
signed or unsigned range for the destination unit size. However "not", "xor"
and "shr" operators are exceptions from this rule - if the value specified
by numerical expression has to fit in a unit of specified size, and the
arguments for operation fit into that size, the operation will be performed
with precision limited to that size.
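  A few illustrative expressions follow. The comments give the values they
evaluate to according to the priorities from table 1.4 and the precision rule
above:

    mov ax,3+4*2        ; "*" is calculated before "+": 11
    mov ax,(3+4)*2      ; parentheses change the order: 14
    mov al,17 mod 5     ; remainder from division: 2
    mov ax,1 shl 4 or 1 ; "shl" is calculated before "or": 11h
    mov al,not 0FFh     ; calculated with 8-bit precision: 0

In the last line the infinite precision result of "not" would not fit in a
byte, so without the exception for "not" it would cause an overflow error.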
  The numbers in the expression are by default treated as decimal, binary
numbers should have the "b" letter attached at the end, octal numbers should
end with "o" letter, hexadecimal numbers should begin with "0x" characters
(like in C language) or with the "$" character (like in Pascal language) or
they should end with "h" letter. Also quoted string, when encountered in
expression, will be converted into number - the first character will become
the least significant byte of number.
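  For example, each of the following lines loads the value 31 written in a
different notation. The last line shows the conversion of a quoted string:

    mov al,11111b       ; binary
    mov al,37o          ; octal
    mov al,31           ; decimal
    mov al,0x1F         ; hexadecimal, C style
    mov al,$1F          ; hexadecimal, Pascal style
    mov al,1Fh          ; hexadecimal with the "h" suffix
    mov ax,'ab'         ; 6261h, "a" becomes the least significant byte

A number with the "h" suffix still has to begin with a digit, so a value like
FFh is written as 0FFh (otherwise it would be read as a name).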
  The numerical expression used as an address value can also contain any of
general registers used for addressing, they can be added and multiplied by
appropriate values, as it is allowed for the x86 architecture instructions.
The numerical calculations inside address definition by default operate with
target size assumed to be the same as the current bitness of code, even if
generated instruction encoding will use a different address size.
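  A minimal sketch of address expressions containing registers, assuming
32-bit code:

    use32
    mov eax,[ebx+esi*4+8]   ; base + scaled index + displacement
    mov eax,[esi*2+ebx]     ; the registers may be given in any order
    lea edi,[eax+eax*4]     ; EDI = EAX*5, calculated by the addressing unit

In the x86 instruction encoding the scale of the index register is one of 1,
2, 4 or 8.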
  There are also some special symbols that can be used inside the numerical
expression. First is "$", which is always equal to the value of current
offset, while "$$" is equal to base address of current addressing space. The
other one is "%", which is the number of current repeat in parts of code that
are repeated using some special directives (see 2.2) and zero anywhere else.
There's also "%t" symbol, which is always equal to the current time stamp.
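  For instance, "$" is commonly used to calculate the length of data, and "%"
numbers the repeats of a block. The "repeat" directive used here is one of
those described in 2.2, and "msg" and "msg_len" are hypothetical names:

    msg     db 'Hello'      ; hypothetical data label
    msg_len = $ - msg       ; current offset minus start of string: 5
    repeat 4
            db %            ; defines bytes 1, 2, 3 and 4
    end repeat

The value of "msg_len" is calculated where it is defined, right after the
string, so it equals the length of the string.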
  Any numerical expression can also consist of single floating point value
(flat assembler does not allow any floating point operations at compilation
time) in the scientific notation. To be recognized, such value can end with
the "f" letter, otherwise it should contain at least one of the "." or "E"
characters. So "1.0", "1E0" and "1f" define the same floating point value,
while simple "1" defines an integer value.
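  For example, the first three definitions below create the same single
precision value, while the fourth one defines an ordinary integer:

    dd 1.0              ; floating point, contains "."
    dd 1E0              ; floating point, contains "E"
    dd 1f               ; floating point, ends with "f"
    dd 1                ; integer value 1, a different bit pattern
    dq 0.5              ; double precision when used with "dq"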
 
  Table 1.4  Arithmetical and logical operators by priority
  /-------------------------\
  | Priority | Operators    |
  |==========|==============|
  | 0        | +  -         |
  |----------|--------------|
  | 1        | *  /         |
  |----------|--------------|
  | 2        | mod          |
  |----------|--------------|
  | 3        | and  or  xor |
  |----------|--------------|
  | 4        | shl  shr     |
  |----------|--------------|
  | 5        | not          |
  |----------|--------------|
  | 6        | rva  plt     |
  \-------------------------/
 
1.2.5  Jumps and calls

The operand of any jump or call instruction can be preceded not only by the
size operator, but also by one of the operators specifying type of the jump:
"short", "near" or "far". For example, when assembler is in 16-bit mode,
instruction "jmp dword [0]" will become the far jump and when assembler is
in 32-bit mode, it will become the near jump. To force this instruction to be
treated differently, use the "jmp near dword [0]" or "jmp far dword [0]" form.
  When operand of near jump is the immediate value, assembler will generate
the shortest variant of this jump instruction if possible (but will not create
32-bit instruction in 16-bit mode nor 16-bit instruction in 32-bit mode,
unless there is a size operator stating it). By specifying the jump type
you can force it to always generate long variant (for example "jmp near 0")
or to always generate short variant and terminate with an error when it's
impossible (for example "jmp short 0").
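  A small 16-bit sketch of these rules. "target" is an arbitrary label placed
close enough for a short jump:

            use16
            jmp target          ; shortest possible variant
            jmp short target    ; always short, error if out of range
            jmp near target     ; always the long variant
            jmp dword [0]       ; far jump through a pointer in memory
            jmp near dword [0]  ; near jump with a 32-bit operand instead
    target:                     ; arbitrary label used by the jumps above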
 
1.2.6  Size settings

When instruction uses some memory addressing, by default the smallest form of
instruction is generated by using the short displacement if the address
value fits in the range. This can be overridden using the "word" or "dword"
operator before the address inside the square brackets (or after the "ptr"
operator), which forces the long displacement of appropriate size to be made.
In case when address is not relative to any registers, those operators also
allow choosing the appropriate mode of absolute addressing.
  Instructions "adc", "add", "and", "cmp", "or", "sbb", "sub" and "xor" with
first operand being 16-bit or 32-bit are by default generated in shortened
8-bit form when the second operand is immediate value fitting in the range
for signed 8-bit values. It also can be overridden by putting the "word" or
"dword" operator before the immediate value. Similar rules apply to the
"imul" instruction with the last operand being immediate value.
  Immediate value as an operand for "push" instruction without a size operator
is by default treated as a word value if assembler is in 16-bit mode and as a
double word value if assembler is in 32-bit mode, shorter 8-bit form of this
instruction is used if possible, "word" or "dword" size operator forces the
"push" instruction to be generated in longer form for specified size. "pushw"
and "pushd" mnemonics force assembler to generate 16-bit or 32-bit code
without forcing it to use the longer form of instruction.
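  A 16-bit sketch contrasting the default and the forced forms described in
this section. The lines in each pair have the same effect and differ only in
their size; the last line is a 32-bit push:

            use16
            mov ax,[bx+4]       ; 8-bit displacement chosen automatically
            mov ax,[word bx+4]  ; forced 16-bit displacement
            add ax,1            ; shortened form with 8-bit immediate
            add ax,word 1       ; forced full 16-bit immediate
            push 1              ; shortened 8-bit immediate form
            push word 1         ; forced 16-bit immediate form
            pushd 1             ; 32-bit value, shortened form still allowed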
 
Chapter 2  Instruction set
--------------------------

This chapter provides the detailed information about the instructions and
directives supported by flat assembler. Directives for defining labels were
already discussed in 1.2.3, all other directives will be described later in
this chapter.
 
2.1  The x86 architecture instructions

In this section you can find both the information about the syntax and
purpose of the assembly language instructions. If you need more technical
information, look for the Intel Architecture Software Developer's Manual.
  Assembly instructions consist of the mnemonic (instruction's name) and from
zero to three operands. If there are two or more operands, usually first is
the destination operand and second is the source operand. Each operand can be
register, memory or immediate value (see 1.2 for details about syntax of
operands). After the description of each instruction there are examples
of different combinations of operands, if the instruction has any.
  Some instructions act as prefixes and can be followed by other instruction
in the same line, and there can be more than one prefix in a line. Each name
of the segment register is also a mnemonic of instruction prefix, although it
is recommended to use segment overrides inside the square brackets instead of
these prefixes.
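  For example, the following 16-bit lines (a sketch, not one of the examples
from the instruction descriptions) put a prefix in the same line as the
instruction it applies to:

            rep movsb           ; "rep" prefix before a string instruction
            lock inc word [bx]  ; "lock" prefix before a memory operation
            mov al,[es:di]      ; recommended form of a segment override
            es mov al,[di]      ; the same instruction with a prefix mnemonic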
 
2.1.1  Data movement instructions

"mov" transfers a byte, word or double word from the source operand to the
destination operand. It can transfer data between general registers, from
the general register to memory, or from memory to general register, but it
cannot move from memory to memory. It can also transfer an immediate value to
general register or memory, segment register to general register or memory,
general register or memory to segment register, control or debug register to
general register and general register to control or debug register. The "mov"
can be assembled only if the size of source operand and size of destination
operand are the same. Below are the examples for each of the allowed
combinations:

    mov bx,ax       ; general register to general register
    mov [char],al   ; general register to memory
    mov bl,[char]   ; memory to general register
    mov dl,32       ; immediate value to general register
    mov [char],32   ; immediate value to memory
    mov ax,ds       ; segment register to general register
    mov [bx],ds     ; segment register to memory
    mov ds,ax       ; general register to segment register
    mov ds,[bx]     ; memory to segment register
    mov eax,cr0     ; control register to general register
    mov cr3,ebx     ; general register to control register
    mov eax,dr6     ; debug register to general register
    mov dr7,ebx     ; general register to debug register

"xchg" swaps the contents of two operands. It can swap two byte operands,
two word operands or two double word operands. Order of operands is not
important. The operands may be two general registers, or general register
with memory. For example:

    xchg ax,bx      ; swap two general registers
    xchg al,[char]  ; swap register with memory

"push" decrements the stack frame pointer (ESP register), then transfers
the operand to the top of stack indicated by ESP. The operand can be memory,
general register, segment register or immediate value of word or double word
size. If operand is an immediate value and no size is specified, it is by
default treated as a word value if assembler is in 16-bit mode and as a double
word value if assembler is in 32-bit mode. "pushw" and "pushd" mnemonics are
variants of this instruction that store the values of word or double word size
respectively. If more operands follow in the same line (separated only with
spaces, not commas), compiler will assemble chain of the "push" instructions
with these operands. The examples are with single operands:

    push ax         ; store general register
    push es         ; store segment register
    pushw [bx]      ; store memory
    push 1000h      ; store immediate value

"pusha" saves the contents of the eight general registers on the stack.
This instruction has no operands. There are two versions of this instruction,
one 16-bit and one 32-bit, assembler automatically generates the appropriate
version for current mode, but it can be overridden by using "pushaw" or
"pushad" mnemonic to always get the 16-bit or 32-bit version. The 16-bit
version of this instruction pushes general registers on the stack in the
following order: AX, CX, DX, BX, the initial value of SP before AX was pushed,
BP, SI and DI. The 32-bit version pushes equivalent 32-bit general registers
in the same order.
  "pop" transfers the word or double word at the current top of stack to the
destination operand, and then increments ESP to point to the new top of stack.
The operand can be memory, general register or segment register. "popw" and
"popd" mnemonics are variants of this instruction for restoring the values of
word or double word size respectively. If more operands separated with spaces
follow in the same line, compiler will assemble chain of the "pop"
instructions with these operands.

    pop bx          ; restore general register
    pop ds          ; restore segment register
    popw [si]       ; restore memory

"popa" restores the registers saved on the stack by "pusha" instruction,
except for the saved value of SP (or ESP), which is ignored. This instruction
has no operands. To force assembling 16-bit or 32-bit version of this
instruction use "popaw" or "popad" mnemonic.
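  As a sketch of the form with multiple operands described for "push" and
"pop", the following lines save three registers and restore them in reverse
order, then show the forced 32-bit variants of "pusha" and "popa":

    push ax bx cx       ; the same as three separate "push" instructions
    pop cx bx ax        ; the same as three separate "pop" instructions
    pushad              ; always the 32-bit version of "pusha"
    popad               ; always the 32-bit version of "popa"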
 
620
 
621
622
 
623
words, and double words into quad words. These conversions can be done using
624
the sign extension or zero extension. The sign extension fills the extra bits
625
of the larger item with the value of the sign bit of the smaller item, the
626
zero extension simply fills them with zeros.
627
  "cwd" and "cdq" double the size of value AX or EAX register respectively
628
and store the extra bits into the DX or EDX register. The conversion is done
629
using the sign extension. These instructions have no operands.
  "cbw" extends the sign of the byte in AL throughout AX, and "cwde" extends
the sign of the word in AX throughout EAX. These instructions also have no
operands.
  "movsx" converts a byte to a word or double word and a word to a double word
using sign extension. "movzx" does the same, but it uses zero extension. The
source operand can be a general register or memory, while the destination
operand must be a general register. For example:

    movsx edx,dl        ; byte register to double word register
    movsx eax,ax        ; word register to double word register
    movsx ax,byte [bx]  ; byte memory to word register
    movsx edx,byte [bx] ; byte memory to double word register
    movsx eax,word [bx] ; word memory to double word register
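
  The two kinds of extension can be illustrated with a small Python model.
This is only a sketch, not fasm code. Registers are represented as
non-negative integers holding raw bit patterns, and the function names are
hypothetical.

```python
def zero_extend(value, from_bits):
    """movzx: keep the low from_bits bits; all higher bits become zero."""
    return value & ((1 << from_bits) - 1)


def sign_extend(value, from_bits, to_bits):
    """movsx, cbw, cwde: copy the sign bit of the smaller item into the extra bits."""
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):                     # sign bit is set
        value |= ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)
    return value


def cwd(ax):
    """cwd: sign-extend AX into DX:AX and return the new (DX, AX) pair."""
    extended = sign_extend(ax, 16, 32)
    return extended >> 16, extended & 0xFFFF


print(hex(sign_extend(0x80, 8, 16)))   # like movsx ax,al with AL = 80h: 0xff80
print(hex(zero_extend(0x80, 8)))       # like movzx ax,al with AL = 80h: 0x80
print(cwd(0xFFFE))                     # AX = -2 gives DX = 0FFFFh: (65535, 65534)
```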


2.1.3  Binary arithmetic instructions

  "add" replaces the destination operand with the sum of the source and
destination operands and sets CF if an unsigned overflow (a carry out of the
most significant bit) has occurred. The operands may be bytes, words or double
words. The destination operand can be a general register or memory; the source
operand can be a general register or an immediate value, and it can also be
memory if the destination operand is a register.

    add ax,[si]     ; add memory to register
    add [di],al     ; add register to memory
    add al,48       ; add immediate value to register
    add [char],48   ; add immediate value to memory

  "adc" sums the operands, adds one if CF is set, and replaces the destination
operand with the result. Rules for the operands are the same as for the "add"
instruction. An "add" followed by multiple "adc" instructions can be used to
add numbers longer than 32 bits.
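
  The following Python model sketches how an "add" followed by "adc"
instructions handles a wide number; it is not fasm code. It assumes the number
is stored as a list of 32-bit parts (limbs), least significant first. The names
are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def add32(a, b, cf=0):
    """One 32-bit "add" (cf=0) or "adc" (cf=incoming carry): returns (result, CF)."""
    total = a + b + cf
    return total & MASK32, total >> 32


def add_multiword(x_limbs, y_limbs):
    """Add equal-length limb lists, least significant limb first.

    The first step behaves like "add" (CF starts at zero); every later step
    behaves like "adc" and consumes the carry left by the previous one.
    """
    result, cf = [], 0
    for a, b in zip(x_limbs, y_limbs):
        limb, cf = add32(a, b, cf)
        result.append(limb)
    return result, cf


# 1_FFFFFFFFh + 1 = 2_00000000h: the carry out of the low limb reaches the high one.
print(add_multiword([0xFFFFFFFF, 0x1], [0x1, 0x0]))   # ([0, 2], 0)
```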
  "inc" adds one to the operand, but it does not affect CF. The operand can be
a general register or memory, and the size of the operand can be byte, word or
double word.

    inc byte [bx]   ; increment memory by one

  "sub" subtracts the source operand from the destination operand and replaces
the destination operand with the result. If a borrow is required, CF is set.
Rules for the operands are the same as for the "add" instruction.
  "sbb" subtracts the source operand from the destination operand, subtracts
one if CF is set, and stores the result in the destination operand. Rules for
the operands are the same as for the "add" instruction. A "sub" followed by
multiple "sbb" instructions may be used to subtract numbers longer than 32
bits.
  "dec" subtracts one from the operand, but it does not affect CF. Rules for
the operand are the same as for the "inc" instruction.
  "cmp" subtracts the source operand from the destination operand. It updates
the flags in the same way as the "sub" instruction, but does not alter the
source and destination operands. Rules for the operands are the same as for
the "sub" instruction.
  "neg" subtracts a signed integer operand from zero. The effect of this
instruction is to reverse the sign of the operand from positive to negative or
from negative to positive. Rules for the operand are the same as for the "inc"
instruction.
  "xadd" exchanges the destination operand with the source operand, then loads
the sum of the two values into the destination operand. The destination
operand can be a general register or memory, and the source operand must be a
general register.
  All the above binary arithmetic instructions update the SF, ZF, PF and OF
flags. SF is always set to the same value as the result's sign bit, ZF is set
when all the bits of the result are zero, PF is set when the low order eight
bits of the result contain an even number of set bits, and OF is set if the
result is too large a positive number or too small a negative number
(excluding the sign bit) to fit in the destination operand.
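
  The flag rules above can be checked with a Python model of an 8-bit "add".
This is a sketch, not fasm code. The function name and the dictionary of flags
are assumptions. CF is included as described for "add", and AF is left out.

```python
def add8_flags(dst, src):
    """Return (result, flags) for an 8-bit "add" of two raw byte values."""
    total = dst + src
    result = total & 0xFF

    def sign(v):
        return (v >> 7) & 1

    flags = {
        "CF": int(total > 0xFF),                     # carry out of bit 7
        "ZF": int(result == 0),                      # all bits of the result are zero
        "SF": sign(result),                          # copy of the result's sign bit
        "PF": int(bin(result).count("1") % 2 == 0),  # even number of set bits in low byte
        # signed overflow: operands of equal sign produced a result of the other sign
        "OF": int(sign(dst) == sign(src) and sign(result) != sign(dst)),
    }
    return result, flags


print(add8_flags(0x7F, 0x01))
# (128, {'CF': 0, 'ZF': 0, 'SF': 1, 'PF': 0, 'OF': 1})
```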
  "mul" performs an unsigned multiplication of the operand and the
accumulator. If the operand is a byte, the processor multiplies it by the
contents of AL and returns the 16-bit result in AH and AL. If the operand is a
word, the processor multiplies it by the contents of AX and returns the 32-bit
result in DX and AX. If the operand is a double word, the processor multiplies
it by the contents of EAX and returns the 64-bit result in EDX and EAX. "mul"
sets CF and OF when the upper half of the result is nonzero; otherwise they
are cleared. Rules for the operand are the same as for the "inc" instruction.
  "imul" performs a signed multiplication operation. This instruction has
three variations. The first has one operand and behaves in the same way as
the "mul" instruction. The second has two operands: the destination operand
is multiplied by the source operand and the result replaces the destination
operand. The destination operand must be a general register, word or double
word in size, and the source operand can be a general register, memory or an
immediate value. The third form has three operands: the destination operand
must be a general register, word or double word in size, the source operand
can be a general register or memory, and the third operand must be an
immediate value. The source operand is multiplied by the immediate value and
the result is stored in the destination register. All three forms calculate
the product to twice the size of the operands and set CF and OF when the
product does not fit in the lower half (that is, when the upper half is not
just the sign extension of the lower half), but the second and third forms
truncate the product to the size of the operands. So the second and third
forms can also be used for unsigned operands because, whether the operands
are signed or unsigned, the lower half of the product is the same. Below are
examples for all three forms:

    imul word [si]  ; accumulator by memory
    imul bx,cx      ; register by register
    imul bx,[si]    ; register by memory
    imul bx,10      ; register by immediate value
    imul ax,bx,10   ; register by immediate value to register
    imul ax,[si],10 ; memory by immediate value to register
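
  A Python model of the two-operand form on words can check that the lower
half of the product does not depend on signedness. It is a sketch with
hypothetical names, and registers are held as raw 16-bit patterns.

```python
def to_signed(value, bits):
    """Interpret a raw bit pattern as a two's complement number."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def imul16(a, b):
    """Two-operand "imul" on words: returns (truncated product, CF/OF)."""
    full = to_signed(a, 16) * to_signed(b, 16)   # product at twice the operand size
    low = full & 0xFFFF                          # what the destination register keeps
    overflow = int(to_signed(low, 16) != full)   # the product did not fit in 16 bits
    return low, overflow


def mul16_low(a, b):
    """Low word of the unsigned product, i.e. the value "mul" leaves in AX."""
    return (a * b) & 0xFFFF


# 0FFFFh is -1 when signed and 65535 when unsigned; the low halves still agree.
print(hex(imul16(0xFFFF, 3)[0]), hex(mul16_low(0xFFFF, 3)))   # 0xfffd 0xfffd
```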

  "div" performs an unsigned division of the accumulator by the operand.
The dividend (the accumulator) is twice the size of the divisor (the operand),
and the quotient and remainder have the same size as the divisor. If the
divisor is a byte, the dividend is taken from the AX register, the quotient is
stored in AL and the remainder is stored in AH. If the divisor is a word, the
upper half of the dividend is taken from DX, the lower half from AX, the
quotient is stored in AX and the remainder is stored in DX. If the divisor is
a double word, the upper half of the dividend is taken from EDX, the lower
half from EAX, the quotient is stored in EAX and the remainder is stored in
EDX. Rules for the operand are the same as for the "mul" instruction.
  "idiv" performs a signed division of the accumulator by the operand.
It uses the same registers as the "div" instruction, and the rules for
the operand are the same.
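
  A Python sketch of the word form of "div" follows. Besides the register
usage described above, it models the processor's divide error exception. That
exception occurs when the divisor is zero or the quotient does not fit in the
destination. The DivideError class and the function name are hypothetical.

```python
class DivideError(Exception):
    """Stands in for the processor's divide error exception."""


def div16(dx, ax, divisor):
    """Unsigned "div" by a word: returns the new (AX, DX) = (quotient, remainder)."""
    if divisor == 0:
        raise DivideError("division by zero")
    dividend = (dx << 16) | ax             # upper half from DX, lower half from AX
    quotient, remainder = divmod(dividend, divisor)
    if quotient > 0xFFFF:                  # the quotient must fit in AX
        raise DivideError("quotient too large")
    return quotient, remainder


print(div16(0x0001, 0x0000, 0x0002))   # 10000h / 2 -> (32768, 0)
```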


2.1.4  Decimal arithmetic instructions

  Decimal arithmetic is performed by combining the binary arithmetic
instructions (already described in the prior section) with the decimal
arithmetic instructions. The decimal arithmetic instructions are used to
adjust the results of a previous binary arithmetic operation to produce a
valid packed or unpacked decimal result, or to adjust the inputs to a
subsequent binary arithmetic operation so the operation will produce a valid
packed or unpacked decimal result.
  "daa" adjusts the result of adding two valid packed decimal operands in
AL. "daa" must always follow the addition of two pairs of packed decimal
numbers (one digit in each half-byte) to obtain a pair of valid packed
decimal digits as results. The carry flag is set if a carry was needed.
This instruction has no operands.
  "das" adjusts the result of subtracting two valid packed decimal operands
in AL. "das" must always follow the subtraction of one pair of packed decimal
numbers (one digit in each half-byte) from another to obtain a pair of valid
packed decimal digits as results. The carry flag is set if a borrow was
needed. This instruction has no operands.
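
  A Python model of "daa", given as a sketch. It follows the usual two-step
description of the instruction: add 06h when the low digit needs correction,
and add 60h when the high digit does. The helper names are hypothetical. AF is
computed as the carry out of the low half-byte of the addition.

```python
def daa(al, cf, af):
    """Return the new (AL, CF, AF) after "daa"."""
    old_al, old_cf = al, cf
    if (al & 0x0F) > 9 or af:          # low digit out of range, or it carried
        al = (al + 0x06) & 0xFF
        af = 1
    else:
        af = 0
    if old_al > 0x99 or old_cf:        # high digit out of range, or the add carried
        al = (al + 0x60) & 0xFF
        cf = 1
    else:
        cf = 0
    return al, cf, af


def add_packed_bcd(a, b):
    """Model of "add al,bl" followed by "daa" on two packed BCD bytes: (AL, CF)."""
    total = a + b
    af = int((a & 0x0F) + (b & 0x0F) > 0x0F)   # carry out of the low half-byte
    al, cf, _ = daa(total & 0xFF, int(total > 0xFF), af)
    return al, cf


print(hex(add_packed_bcd(0x79, 0x35)[0]))   # 79 + 35 = 114 -> 0x14 with CF set
```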
  "aaa" changes the contents of register AL to a valid unpacked decimal
number, and zeroes the top four bits. "aaa" must always follow the addition
of two unpacked decimal operands in AL. The carry flag is set and AH is
incremented if a carry is necessary. This instruction has no operands.
  "aas" changes the contents of register AL to a valid unpacked decimal
number, and zeroes the top four bits. "aas" must always follow the
subtraction of one unpacked decimal operand from another in AL. The carry flag
is set and AH decremented if a borrow is necessary. This instruction has no
operands.
  "aam" corrects the result of a multiplication of two valid unpacked decimal
numbers. "aam" must always follow the multiplication of two decimal numbers
to produce a valid decimal result. The high order digit is left in AH, the
low order digit in AL. The generalized version of this instruction allows
adjustment of the contents of AL to create two unpacked digits of any number
base. The standard version of this instruction has no operands, while the
generalized version has one operand: an immediate value specifying the number
base for the created digits.
  "aad" modifies the numerator in AH and AL to prepare for the division of two
valid unpacked decimal operands so that the quotient produced by the division
will be a valid unpacked decimal number. AH should contain the high order
digit and AL the low order digit. This instruction adjusts the value and
places the result in AL, while AH will contain zero. The generalized version
of this instruction allows adjustment of two unpacked digits of any number
base. Rules for the operand are the same as for the "aam" instruction.
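
  The arithmetic done by the generalized forms of "aam" and "aad" can be
sketched in Python as follows. The function names are hypothetical, and the
registers are plain integers. On the processor a zero base makes "aam" raise a
divide error; the sketch mirrors this with Python's own exception.

```python
def aam(al, base=10):
    """Return the new (AH, AL) after "aam": AL split into two digits of the base."""
    if base == 0:
        raise ZeroDivisionError("aam with a zero base raises a divide error")
    return al // base, al % base


def aad(ah, al, base=10):
    """Return the new (AH, AL) after "aad": two digits combined into AL."""
    return 0, (ah * base + al) & 0xFF


print(aam(7 * 8))        # 56 -> digits (5, 6)
print(aad(5, 6))         # digits 5 and 6 -> (0, 56), ready for "div"
print(aam(0x3F, 16))     # generalized form: (3, 15)
```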


2.1.5  Logical instructions

  "not" inverts the bits in the specified operand to form a one's complement
of the operand. It has no effect on the flags. Rules for the operand are the
same as for the "inc" instruction.
  "and", "or" and "xor" instructions perform the standard logical operations.
They update the SF, ZF and PF flags and clear OF and CF. Rules for the
operands are the same as for the "add" instruction.
  "bt", "bts", "btr" and "btc" instructions operate on a single bit which can
be in memory or in a general register. The location of the bit is specified
as an offset from the low order end of the operand. The value of the offset
is taken from the second operand, which may be either an immediate byte or a
general register. These instructions first assign the value of the selected
bit to CF. "bt" does nothing more, "bts" sets the selected bit to 1, "btr"
resets the selected bit to 0, and "btc" changes the bit to its complement.
The first operand can be a word or double word.

    bts word [bx],15 ; test and set bit in memory
    btr ax,cx        ; test and reset bit in register
    btc word [bx],cx ; test and complement bit in memory
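
  A Python model of the four bit test instructions for a register operand,
given as a sketch with hypothetical names. It assumes the register case, where
the processor takes the offset modulo the operand size. A memory operand
combined with a register offset can reach bits outside the addressed word;
that case is not modelled.

```python
def bit_test(op, value, offset, width=16):
    """Return (new value, CF) for bt/bts/btr/btc with a register as first operand."""
    offset %= width                   # register operand: offset wraps within the register
    cf = (value >> offset) & 1        # every variant first copies the bit to CF
    if op == "bts":
        value |= 1 << offset
    elif op == "btr":
        value &= ~(1 << offset)
    elif op == "btc":
        value ^= 1 << offset
    elif op != "bt":
        raise ValueError(op)
    return value, cf


print(bit_test("bts", 0x0000, 15))   # (32768, 0): bit 15 was clear, now set
```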

  "bsf" and "bsr" instructions scan a word or double word for the first set
bit and store the index of this bit into the destination operand, which must
be a general register. The bit string being scanned is specified by the source
operand, which may be either a general register or memory. The ZF flag is set
if the entire string is zero (no set bits are found); otherwise it is cleared.
If no set bit is found, the value of the destination register is undefined.
"bsf" scans from low order to high order (starting from bit index zero). "bsr"
scans from high order to low order (starting from bit index 15 of a word or
index 31 of a double word).

    bsr ax,[si]      ; scan memory reverse
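
  The scan order of "bsf" and "bsr" can be sketched in Python. The names are
hypothetical, and None stands in for the undefined destination value.

```python
def bsf(value):
    """Return (index, ZF): the lowest set bit, or (None, 1) for a zero source."""
    if value == 0:
        return None, 1                   # destination would be undefined
    return (value & -value).bit_length() - 1, 0


def bsr(value):
    """Return (index, ZF): the highest set bit, or (None, 1) for a zero source."""
    if value == 0:
        return None, 1
    return value.bit_length() - 1, 0


print(bsf(0x8008), bsr(0x8008))   # (3, 0) (15, 0)
```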

  "shl" shifts the destination operand left by the number of bits specified
in the second operand. The destination operand can be a byte, word, or double
word general register or memory. The second operand can be an immediate value
or the CL register. The processor shifts zeros in from the right (low order)
side of the operand as bits exit from the left side. The last bit that exited
is stored in CF. "sal" is a synonym for "shl".

    shl byte [bx],1  ; shift memory left by one bit
    shl ax,cl        ; shift register left by count from cl
    shl word [bx],cl ; shift memory left by count from cl

  "shr" and "sar" shift the destination operand right by the number of bits
specified in the second operand. Rules for operands are the same as for the
"shl" instruction. "shr" shifts zeros in from the left side of the operand as
bits exit from the right side. The last bit that exited is stored in CF.
"sar" preserves the sign of the operand by shifting in zeros on the left side
if the value is positive or by shifting in ones if the value is negative.
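
  A Python sketch of the three shifts on a byte operand, showing which bit
ends up in CF. It assumes a count from 1 to 7 and uses hypothetical names;
other counts are not modelled.

```python
def shift8(op, value, count):
    """Return (result, CF) for shl/sal, shr or sar on a byte, with count in 1..7."""
    if op in ("shl", "sal"):
        cf = (value >> (8 - count)) & 1                # last bit out on the left
        result = (value << count) & 0xFF               # zeros enter on the right
    elif op == "shr":
        cf = (value >> (count - 1)) & 1                # last bit out on the right
        result = value >> count                        # zeros enter on the left
    elif op == "sar":
        cf = (value >> (count - 1)) & 1
        signed = value - 0x100 if value & 0x80 else value
        result = (signed >> count) & 0xFF              # copies of the sign bit enter
    else:
        raise ValueError(op)
    return result, cf


print(shift8("shr", 0x81, 1), shift8("sar", 0x81, 1))   # (64, 1) (192, 1)
```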
  "shld" shifts bits of the destination operand to the left by the number
of bits specified in the third operand, while shifting high order bits from
the source operand into the destination operand on the right. The source
operand remains unmodified. The destination operand can be a word or double
word general register or memory, the source operand must be a general
register, and the third operand can be an immediate value or the CL register.

    shld [di],bx,1   ; shift memory left by one bit
    shld ax,bx,cl    ; shift register left by count from cl
    shld [di],bx,cl  ; shift memory left by count from cl

  "shrd" shifts bits of the destination operand to the right, while shifting
low order bits from the source operand into the destination operand on the
left. The source operand remains unmodified. Rules for operands are the same
as for the "shld" instruction.
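
  A Python sketch of the double precision shifts on word operands shows how
the source operand feeds bits into the destination. The names are
hypothetical, and the count is assumed to be from 1 to 15.

```python
def shld16(dest, src, count):
    """Model of shld dest,src,count on words (count in 1..15): returns (result, CF)."""
    combined = ((dest << 16) | src) << count     # src supplies the bits entering on the right
    result = (combined >> 16) & 0xFFFF
    cf = (dest >> (16 - count)) & 1              # last bit shifted out of dest
    return result, cf


def shrd16(dest, src, count):
    """Model of shrd dest,src,count on words (count in 1..15): returns (result, CF)."""
    combined = (src << 16) | dest                # src supplies the bits entering on the left
    result = (combined >> count) & 0xFFFF
    cf = (dest >> (count - 1)) & 1               # last bit shifted out of dest
    return result, cf


print(hex(shld16(0x1234, 0xABCD, 4)[0]), hex(shrd16(0x1234, 0xABCD, 4)[0]))
# 0x234a 0xd123
```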
  "rol" and "rcl" rotate the byte, word or double word destination operand
left by the number of bits specified in the second operand. For each rotation
specified, the high order bit that exits from the left of the operand returns
at the right to become the new low order bit. "rcl" additionally puts in CF
each high order bit that exits from the left side of the operand before it
returns to the operand as the low order bit on the next rotation cycle. Rules
for operands are the same as for the "shl" instruction.
  "ror" and "rcr" rotate the byte, word or double word destination operand
right by the number of bits specified in the second operand. For each rotation
specified, the low order bit that exits from the right of the operand returns
at the left to become the new high order bit. "rcr" additionally puts in CF
each low order bit that exits from the right side of the operand before it
returns to the operand as the high order bit on the next rotation cycle.
Rules for operands are the same as for the "shl" instruction.
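
  A Python sketch of "rol" and "rcl" on a byte follows. The names are
hypothetical, and the count is assumed to be at least 1. As on the processor,
"rol" leaves the last bit rotated around in CF. "rcl" treats CF as a ninth bit
of the rotation.

```python
def rol8(value, count):
    """Model of rol on a byte (count >= 1): returns (result, CF)."""
    for _ in range(count):
        value = ((value << 1) & 0xFF) | (value >> 7)   # bit 7 comes back as bit 0
    return value, value & 1      # CF holds the last bit that went around


def rcl8(value, cf, count):
    """Model of rcl on a byte: CF acts as a ninth bit of the rotation."""
    for _ in range(count):
        high = value >> 7
        value = ((value << 1) & 0xFF) | cf   # the old CF enters as the low bit
        cf = high                            # the bit leaving on the left goes to CF
    return value, cf


print(rol8(0x81, 1), rcl8(0x81, 0, 1))   # (3, 1) (2, 1)
```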
  "test" performs the same action as the "and" instruction, but it does not
alter the destination operand; it only updates the flags. Rules for the
operands are the same as for the "and" instruction.
  "bswap" reverses the byte order of a 32-bit general register: bits 0 through
7 are swapped with bits 24 through 31, and bits 8 through 15 are swapped with
bits 16 through 23. This instruction is provided for converting little-endian
values to big-endian format and vice versa.
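
  The byte reversal can be sketched in Python with the built-in int.to_bytes
and int.from_bytes methods. The function name is hypothetical.

```python
def bswap(value):
    """Model of bswap on a 32-bit register value: reverse its four bytes."""
    return int.from_bytes(value.to_bytes(4, "little"), "big")


print(hex(bswap(0x12345678)))   # 0x78563412
```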


2.1.6  Control transfer instructions

  "jmp" unconditionally transfers control to the target location. The
destination address can be specified directly within the instruction or
indirectly through a register or memory. The acceptable size of this address
depends on whether the jump is near or far (which can be specified by
preceding the operand with the "near" or "far" operator) and whether the
instruction is 16-bit or 32-bit. The operand for a near jump should be of
"word" size for a 16-bit instruction or "dword" size for a 32-bit instruction.
The operand for a far jump should be of "dword" size for a 16-bit instruction
or "pword" size for a 32-bit instruction. A direct "jmp" instruction includes
the destination address as part of the instruction (and can be preceded by
the "short", "near" or "far" operator); the operand specifying the address
should be a numerical expression for a near or short jump, or two numerical
expressions separated with a colon for a far jump, the first specifying the
segment selector and the second the offset within the segment. The "pword"
operator can be used to force the 32-bit far jump, and "dword" to force the
16-bit far jump. An indirect "jmp" instruction obtains the destination address
indirectly through a register or a pointer variable; the operand should be a
general register or memory. See also 1.2.5 for some more details.

    jmp 0FFFFh:0     ; direct far jump
    jmp ax           ; indirect near jump
    jmp pword [ebx]  ; indirect far jump

  "call" transfers control to the procedure, saving on the stack the address
of the instruction following the "call" for later use by a "ret" (return)
instruction. Rules for the operands are the same as for the "jmp" instruction,
but "call" has no short variant of the direct instruction and thus it is not
optimized.
  "ret", "retn" and "retf" instructions terminate the execution of a procedure
and transfer control back to the program that originally invoked the
procedure, using the address that was stored on the stack by the "call"
instruction. "ret" is the equivalent of "retn", which returns from a
procedure that was executed using a near call, while "retf" returns from a
procedure that was executed using a far call. These instructions default to
the size of address appropriate for the current code setting, but the size of
address can be forced to 16-bit by using the "retw", "retnw" and "retfw"
mnemonics, and to 32-bit by using the "retd", "retnd" and "retfd" mnemonics.
All these instructions may optionally specify an immediate operand; by adding
this constant to the stack pointer, they effectively remove any arguments that
the calling program pushed on the stack before the execution of the "call"
instruction.
  "iret" returns control to an interrupted procedure. It differs from "ret" in
that it also pops the flags from the stack into the flags register. The flags
are stored on the stack by the interrupt mechanism. It defaults to the size of
return address appropriate for the current code setting, but it can be forced
to use a 16-bit or 32-bit address by using the "iretw" or "iretd" mnemonic.
  The conditional transfer instructions are jumps that may or may not transfer
control, depending on the state of the CPU flags when the instruction
executes. The mnemonics for conditional jumps may be obtained by attaching
the condition mnemonic (see table 2.1) to the "j" mnemonic; for example, the
"jc" instruction will transfer control when the CF flag is set. The
conditional jumps can be short or near, and direct only, and can be optimized
(see 1.2.5); the operand should be an immediate value specifying the target
address.

  Table 2.1  Conditions
  /-----------------------------------------------------------\
  | Mnemonic | Condition tested      | Description            |
  |==========|=======================|========================|
  | o        | OF = 1                | overflow               |
  |----------|-----------------------|------------------------|
  | no       | OF = 0                | not overflow           |
  |----------|-----------------------|------------------------|
  | c        |                       | carry                  |
  | b        | CF = 1                | below                  |
  | nae      |                       | not above nor equal    |
  |----------|-----------------------|------------------------|
  | nc       |                       | not carry              |
  | ae       | CF = 0                | above or equal         |
  | nb       |                       | not below              |
  |----------|-----------------------|------------------------|
  | e        | ZF = 1                | equal                  |
  | z        |                       | zero                   |
  |----------|-----------------------|------------------------|
  | ne       | ZF = 0                | not equal              |
  | nz       |                       | not zero               |
  |----------|-----------------------|------------------------|
  | be       | CF or ZF = 1          | below or equal         |
  | na       |                       | not above              |
  |----------|-----------------------|------------------------|
  | a        | CF or ZF = 0          | above                  |
  | nbe      |                       | not below nor equal    |
  |----------|-----------------------|------------------------|
  | s        | SF = 1                | sign                   |
  |----------|-----------------------|------------------------|
  | ns       | SF = 0                | not sign               |
  |----------|-----------------------|------------------------|
  | p        | PF = 1                | parity                 |
  | pe       |                       | parity even            |
  |----------|-----------------------|------------------------|
  | np       | PF = 0                | not parity             |
  | po       |                       | parity odd             |
  |----------|-----------------------|------------------------|
  | l        | SF xor OF = 1         | less                   |
  | nge      |                       | not greater nor equal  |
  |----------|-----------------------|------------------------|
  | ge       | SF xor OF = 0         | greater or equal       |
  | nl       |                       | not less               |
  |----------|-----------------------|------------------------|
  | le       | (SF xor OF) or ZF = 1 | less or equal          |
  | ng       |                       | not greater            |
  |----------|-----------------------|------------------------|
  | g        | (SF xor OF) or ZF = 0 | greater                |
  | nle      |                       | not less nor equal     |
  \-----------------------------------------------------------/
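
  The flag tests in table 2.1 can be sketched in Python. The model computes
the flags a "cmp" produces and then evaluates a few conditions. Only a subset
of the table is covered, the names are hypothetical, and operands are raw
32-bit patterns.

```python
def cmp_flags(a, b, bits=32):
    """CF, ZF, SF and OF as set by "cmp a,b" on raw bit patterns of the given size."""
    mask, sign = (1 << bits) - 1, 1 << (bits - 1)
    result = (a - b) & mask
    return {
        "CF": int(a < b),                                  # borrow: unsigned a < b
        "ZF": int(result == 0),
        "SF": int(bool(result & sign)),
        "OF": int(bool((a ^ b) & (a ^ result) & sign)),    # signed overflow
    }


# A subset of table 2.1; synonyms (for example "nae" for "b") use the same test.
CONDITIONS = {
    "e":  lambda f: f["ZF"] == 1,
    "ne": lambda f: f["ZF"] == 0,
    "b":  lambda f: f["CF"] == 1,
    "ae": lambda f: f["CF"] == 0,
    "be": lambda f: (f["CF"] | f["ZF"]) == 1,
    "a":  lambda f: (f["CF"] | f["ZF"]) == 0,
    "l":  lambda f: (f["SF"] ^ f["OF"]) == 1,
    "ge": lambda f: (f["SF"] ^ f["OF"]) == 0,
    "le": lambda f: ((f["SF"] ^ f["OF"]) | f["ZF"]) == 1,
    "g":  lambda f: ((f["SF"] ^ f["OF"]) | f["ZF"]) == 0,
}

flags = cmp_flags(0xFFFFFFFF, 1)          # compare -1 (or 4294967295) with 1
print(CONDITIONS["l"](flags), CONDITIONS["b"](flags))   # True False
```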

  "loop" instructions are conditional jumps that use a value placed in
CX (or ECX) to specify the number of repetitions of a software loop. All
"loop" instructions automatically decrement CX (or ECX) and terminate the
loop (don't transfer control) when CX (or ECX) becomes zero. They use CX or
ECX depending on whether the current code setting is 16-bit or 32-bit, but
they can be forced to use CX with the "loopw" mnemonic or to use ECX with the
"loopd" mnemonic. "loope" and "loopz" are synonyms for the same instruction,
which acts as the standard "loop", but also terminates the loop when the ZF
flag is cleared. The "loopew" and "loopzw" mnemonics force it to use the CX
register, while "looped" and "loopzd" force it to use the ECX register.
"loopne" and "loopnz" are synonyms for the same instruction, which acts as
the standard "loop", but also terminates the loop when the ZF flag is set.
The "loopnew" and "loopnzw" mnemonics force it to use the CX register, while
"loopned" and "loopnzd" force it to use the ECX register. Every "loop"
instruction needs an operand that is an immediate value specifying the target
address, and it can only be a short jump (in the range of 128 bytes back and
127 bytes forward from the address of the instruction following the "loop"
instruction).
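
  A Python sketch of the decision each "loop" variant makes, with
hypothetical names. The decrement happens before the test, so a count that
starts at zero wraps to 0FFFFh and the loop continues. CX is modelled as a
16-bit value.

```python
def loop_step(op, cx, zf, bits=16):
    """Return (new CX, jump taken?) for loop, loope/loopz or loopne/loopnz."""
    cx = (cx - 1) & ((1 << bits) - 1)    # the decrement does not change any flags
    if cx == 0:
        return cx, False                 # every variant falls through at zero
    if op in ("loope", "loopz"):
        return cx, zf == 1               # keeps looping only while ZF is set
    if op in ("loopne", "loopnz"):
        return cx, zf == 0               # keeps looping only while ZF is clear
    if op == "loop":
        return cx, True
    raise ValueError(op)


print(loop_step("loope", 5, 0))   # (4, False): ZF clear ends a "loope" loop
```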
  "jcxz" branches to the label specified in the instruction if it finds a
value of zero in CX; "jecxz" does the same, but checks the value of ECX
instead of CX. Rules for the operands are the same as for the "loop"
instruction.
  "int" activates the interrupt service routine that corresponds to the
number specified as an operand to the instruction; the number should be in
the range from 0 to 255. The interrupt service routine terminates with an
"iret" instruction that returns control to the instruction that follows
"int". The "int3" mnemonic codes the short (one byte) trap that invokes
interrupt 3. The "into" instruction invokes interrupt 4 if the OF flag is set.
  "bound" verifies that the signed value contained in the specified register
lies within specified limits. An interrupt 5 occurs if the value contained in
the register is less than the lower bound or greater than the upper bound. It
needs two operands: the first operand specifies the register being tested,
and the second operand should be the memory address of the two signed limit
values. The operands can be "word" or "dword" in size.

    bound eax,[esi]  ; check double word for bounds


2.1.7  I/O instructions

  "in" transfers a byte, word, or double word from an input port to AL, AX,
or EAX. I/O ports can be addressed either directly, with the immediate byte
value coded in the instruction, or indirectly via the DX register. The
destination operand should be the AL, AX, or EAX register. The source operand
should be an immediate value in the range from 0 to 255, or the DX register.

    in ax,dx         ; input word from port addressed by dx

  "out" transfers a byte, word, or double word to an output port from AL, AX,
or EAX. The program can specify the number of the port using the same methods
as the "in" instruction. The destination operand should be an immediate value
in the range from 0 to 255, or the DX register. The source operand should be
the AL, AX, or EAX register.

    out dx,al        ; output byte to port addressed by dx


2.1.8  Strings operations

  The string operations operate on one element of a string. A string element
may be a byte, a word, or a double word. The string elements are addressed by
the SI and DI (or ESI and EDI) registers. After every string operation SI
and/or DI (or ESI and/or EDI) are automatically updated to point to the next
element of the string. If DF (direction flag) is zero, the index registers are
incremented; if DF is one, they are decremented. The amount of the increment
or decrement is 1, 2, or 4 depending on the size of the string element. Every
string operation instruction has short forms which have no operands and use
SI and/or DI when the code type is 16-bit, and ESI and/or EDI when the code
type is 32-bit. SI and ESI by default address data in the segment selected
by DS; DI and EDI always address data in the segment selected by ES. The short
form is obtained by attaching to the mnemonic of the string operation a letter
specifying the size of the string element: "b" for a byte element, "w" for a
word element, and "d" for a double word element. The full form of a string
operation needs operands providing the size operator and the memory addresses,
which can be SI or ESI with any segment prefix, and DI or EDI always with the
ES segment prefix.
  "movs" transfers the string element pointed to by SI (or ESI) to the
location pointed to by DI (or EDI). The size of the operands can be byte,
word, or double word. The destination operand should be memory addressed by
DI or EDI, and the source operand should be memory addressed by SI or ESI with
any segment prefix.

    movs word [es:di],[ss:si]  ; transfer word
    movsd                      ; transfer double word

  "cmps" subtracts the destination string element from the source string
element and updates the flags AF, SF, PF, CF and OF, but it does not change
any of the compared elements. If the string elements are equal, ZF is set,
otherwise it is cleared. The first operand for this instruction should be the
source string element addressed by SI or ESI with any segment prefix, and the
second operand should be the destination string element addressed by DI or
EDI.

    cmps word [ds:si],[es:di]  ; compare words
    cmps dword [fs:esi],[edi]  ; compare double words

  "scas" subtracts the destination string element from AL, AX, or EAX
(depending on the size of the string element) and updates the flags AF, SF,
ZF, PF, CF and OF. If the values are equal, ZF is set, otherwise it is
cleared. The operand should be the destination string element addressed by DI
or EDI.

    scasw                      ; scan word
    scas dword [es:edi]        ; scan double word

  "stos" places the value of AL, AX, or EAX into the destination string
element. Rules for the operand are the same as for the "scas" instruction.
  "lods" places the source string element into AL, AX, or EAX. The operand
should be the source string element addressed by SI or ESI with any segment
prefix.

    lods word [cs:si]          ; load word
    lodsd                      ; load double word

  "ins" transfers a byte, word, or double word from an input port addressed
by the DX register to the destination string element. The destination operand
should be memory addressed by DI or EDI, and the source operand should be the
DX register.

    ins word [es:di],dx        ; input word
    ins dword [edi],dx         ; input double word

  "outs" transfers the source string element to an output port addressed by
the DX register. The destination operand should be the DX register and the
source operand should be memory addressed by SI or ESI with any segment prefix.

    outsw                      ; output word
    outs dx,dword [gs:esi]     ; output double word

  The repeat prefixes "rep", "repe"/"repz", and "repne"/"repnz" specify a
repeated string operation. When a string operation instruction has a repeat
prefix, the operation is executed repeatedly, each time using a different
element of the string. The repetition terminates when one of the conditions
specified by the prefix is satisfied. All three prefixes automatically
decrease the CX or ECX register (depending on whether the string operation
instruction uses 16-bit or 32-bit addressing) after each operation and repeat
the associated operation until CX or ECX is zero. "repe"/"repz" and
"repne"/"repnz" are used exclusively with the "scas" and "cmps" instructions
(described above). When these prefixes are used, repetition of the next
instruction also depends on the zero flag (ZF): "repe" and "repz" terminate
the execution when ZF is zero, and "repne" and "repnz" terminate the execution
when ZF is set.

    repe cmpsb       ; compare bytes until not equal
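
  A Python sketch of "repe cmpsb" follows. It assumes a flat bytes object in
place of the DS and ES segments and models only ZF. The names are
hypothetical.

```python
def repe_cmpsb(mem, si, di, cx, df=0, zf=0):
    """Model of "repe cmpsb" on a flat bytes object: returns (SI, DI, CX, ZF)."""
    step = -1 if df else 1            # DF selects increment or decrement
    while cx != 0:                    # nothing happens (ZF unchanged) if CX starts at 0
        zf = int(mem[si] == mem[di])  # only ZF of the comparison is modelled
        si += step
        di += step
        cx -= 1                       # CX is decreased after each operation
        if not zf:                    # repe stops at the first difference
            break
    return si, di, cx, zf


print(repe_cmpsb(b"abcdabzz", 0, 4, 4))   # (3, 7, 1, 0): stopped after "c" vs "z"
```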


2.1.9  Flag control instructions

  The flag control instructions provide a method for directly changing the
state of bits in the flag register. All instructions described in this
section have no operands.
  "stc" sets the CF (carry flag) to 1, "clc" zeroes the CF, and "cmc" changes
the CF to its complement. "std" sets the DF (direction flag) to 1, "cld"
zeroes the DF, "sti" sets the IF (interrupt flag) to 1 and therefore enables
the interrupts, and "cli" zeroes the IF and therefore disables the interrupts.
  "lahf" copies SF, ZF, AF, PF, and CF to bits 7, 6, 4, 2, and 0 of the
AH register. The remaining bits receive the values of the corresponding
reserved bits of the flags register (bit 1 is set, bits 3 and 5 are cleared).
The flags remain unaffected.
  "sahf" transfers bits 7, 6, 4, 2, and 0 from the AH register into SF, ZF,
AF, PF, and CF.
  "pushf" decrements ESP by two or four and stores the low word or double
word of the flags register at the top of the stack; the size of the stored
data depends on the current code setting. The "pushfw" variant forces storing
the word and "pushfd" forces storing the double word.
  "popf" transfers specific bits from the word or double word at the top of
the stack, then increments ESP by two or four; this value depends on the
current code setting. The "popfw" variant forces restoring from the word and
"popfd" forces restoring from the double word.


2.1.10  Conditional operations

  The instructions obtained by attaching the condition mnemonic (see table
2.1) to the "set" mnemonic set a byte to one if the condition is true and set
the byte to zero otherwise. The operand should be an 8-bit general register
or a byte in memory.

    seto byte [bx]   ; set byte if overflow

  "salc" instruction sets all the bits of the AL register when the carry flag
is set and zeroes the AL register otherwise. This instruction has no
arguments.
  The instructions obtained by attaching the condition mnemonic to the "cmov"
mnemonic transfer the word or double word from the general register or memory
to the general register only when the condition is true. The destination
operand should be a general register, and the source operand can be a general
register or memory.

    cmovnc eax,[ebx] ; move when carry flag cleared

  "cmpxchg" compares the value in the AL, AX, or EAX register with the
destination operand. If the two values are equal, the source operand is
loaded into the destination operand. Otherwise, the destination operand is
loaded into the AL, AX, or EAX register. The destination operand may be a
general register or memory, and the source operand must be a general register.
1198
1199
 
1200
    cmpxchg [bx],dx  ; compare and exchange with memory
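
  The C sketch below models the 32-bit form of "cmpxchg", where EAX is the
accumulator. The function name and its pointer-based interface are
hypothetical. The returned value stands for ZF, which the processor sets when
the compared values are equal and clears otherwise. The model is not atomic;
it only shows the data flow.

```c
#include <stdint.h>

/* Model of "cmpxchg dest,src" with 32-bit operands (accumulator EAX).
   Returns the resulting ZF value. Not atomic: data flow only. */
static int cmpxchg32_model(uint32_t *dest, uint32_t src, uint32_t *eax)
{
    if (*dest == *eax) {
        *dest = src;          /* equal: source goes into the destination */
        return 1;             /* ZF = 1 */
    }
    *eax = *dest;             /* not equal: destination goes into EAX */
    return 0;                 /* ZF = 0 */
}
```

  In multiprocessor code the memory form is normally combined with the "lock"
prefix, described below, to make the whole update atomic.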

  "cmpxchg8b" compares the 64-bit value in EDX and EAX registers with the
destination operand. If the values are equal, the 64-bit value in ECX and EBX
registers is stored in the destination operand. Otherwise, the value in the
destination operand is loaded into EDX and EAX registers. The destination
operand should be a quad word in memory.


2.1.11  Miscellaneous instructions

  "nop" instruction affects nothing but the instruction pointer. This
instruction has no operands and doesn't perform any operation.
  "ud2" instruction generates an invalid opcode exception. This instruction
is provided for software testing to explicitly generate an invalid opcode.
This instruction has no operands.
  "xlat" replaces a byte in the AL register with a byte indexed by its value
in a translation table addressed by BX or EBX. The operand should be a byte in
memory addressed by BX or EBX, with any segment prefix. This instruction also
has a short form "xlatb", which has no operands and uses the BX or EBX address
(depending on the current code setting) in the segment selected by DS.
  "lds" transfers a pointer variable from the source operand to DS and the
destination register. The source operand must be a memory operand, and the
destination operand must be a general register. The DS register receives the
segment selector of the pointer while the destination register receives the
offset part of the pointer. "les", "lfs", "lgs" and "lss" operate identically
to "lds" except that the ES, FS, GS and SS register, respectively, is used
instead of DS.

to the destination operand. The source operand must be a memory operand, and
the destination operand must be a general register.

EAX, EBX, ECX, and EDX registers. The information returned is selected by
entering a value in the EAX register before the instruction is executed.
This instruction has no operands.
  "pause" instruction delays the execution of the next instruction by an
implementation-specific amount of time. It can be used to improve the
performance of spin-wait loops. This instruction has no operands.
  "enter" creates a stack frame that may be used to implement the scope rules
of block-structured high-level languages. A "leave" instruction at the end of
a procedure complements an "enter" at the beginning of the procedure to
simplify stack management and to control access to variables for nested
procedures. The "enter" instruction takes two parameters. The first
parameter specifies the number of bytes of dynamic storage to be allocated on
the stack for the routine being entered. The second parameter corresponds to
the lexical nesting level of the routine; it can be in the range from 0 to 31.
The specified lexical level determines how many sets of stack frame pointers
the CPU copies into the new stack frame from the preceding frame. This list
of stack frame pointers is sometimes called the display. The first word (or
double word when code is 32-bit) of the display is a pointer to the last stack
frame. This pointer enables a "leave" instruction to reverse the action of the
previous "enter" instruction by effectively discarding the last stack frame.
After "enter" creates the new display for a procedure, it allocates the
dynamic storage space for that procedure by decrementing ESP by the number of
bytes specified in the first parameter. To enable a procedure to address its
display, "enter" leaves BP (or EBP) pointing to the beginning of the new stack
frame. If the lexical level is zero, "enter" pushes BP (or EBP), copies SP to
BP (or ESP to EBP) and then subtracts the first operand from ESP. For nesting
levels greater than zero, the processor pushes additional frame pointers on
the stack before adjusting the stack pointer.
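
  The effect of "enter" with lexical level zero, and of the matching "leave",
can be traced on a simulated 32-bit stack as below. The "struct cpu" type, the
word-addressed stack array and its size are assumptions of this sketch.
Nesting levels above zero, which copy additional frame pointers, are not
modelled.

```c
#include <stdint.h>

#define STACK_WORDS 1024          /* size of the simulated stack (arbitrary) */

/* Hypothetical 32-bit machine state: a word-addressed stack plus ESP/EBP.
   Addresses are byte addresses, so mem[addr / 4] holds the double word at
   addr; they must stay below STACK_WORDS * 4. */
struct cpu {
    uint32_t esp, ebp;
    uint32_t mem[STACK_WORDS];
};

static void push32(struct cpu *c, uint32_t value)
{
    c->esp -= 4;
    c->mem[c->esp / 4] = value;
}

static uint32_t pop32(struct cpu *c)
{
    uint32_t value = c->mem[c->esp / 4];
    c->esp += 4;
    return value;
}

/* Model of "enter size,0" (lexical level zero only). */
static void enter_level0(struct cpu *c, uint32_t size)
{
    push32(c, c->ebp);            /* save the caller's frame pointer   */
    c->ebp = c->esp;              /* EBP points to the new stack frame */
    c->esp -= size;               /* allocate the dynamic storage      */
}

/* Model of "leave": discard the frame and restore the caller's EBP. */
static void leave_model(struct cpu *c)
{
    c->esp = c->ebp;
    c->ebp = pop32(c);
}
```

  Starting from ESP=800h and EBP=123h, the model of "enter 16,0" leaves EBP at
7FCh (where the old EBP is saved) and ESP at 7ECh. "leave" then restores
both registers.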


2.1.12  System instructions

  "lmsw" loads the operand into the machine status word (bits 0 through 15 of
CR0 register), while "smsw" stores the machine status word into the
destination operand. The operand for both of these instructions can be a
16-bit general register or memory; for "smsw" it can also be a 32-bit general
register.

    smsw [bx]        ; store machine status to memory

  "lgdt" and "lidt" instructions load the operand into the global
descriptor table register or the interrupt descriptor table register
respectively. "sgdt" and "sidt" store the contents of the global descriptor
table register or the interrupt descriptor table register in the destination
operand. The operand should be 6 bytes in memory.

  "lldt" loads the operand into the segment selector field of the local
descriptor table register and "sldt" stores the segment selector from the
local descriptor table register in the operand. "ltr" loads the operand into
the segment selector field of the task register and "str" stores the segment
selector from the task register in the operand. Rules for the operand are the
same as for the "lmsw" and "smsw" instructions.
  "lar" loads the access rights from the segment descriptor specified by
the selector in the source operand into the destination operand and sets the
ZF flag. The destination operand can be a 16-bit or 32-bit general register.
The source operand should be a 16-bit general register or memory.

    lar eax,dx       ; load access rights into double word

selector in the source operand into the destination operand and sets the ZF
flag. Rules for operands are the same as for the "lar" instruction.
  "verr" and "verw" verify whether the code or data segment specified with
the operand is readable or writable from the current privilege level. The
operand should be a word; it can be a general register or memory. If the
segment is accessible and readable (for "verr") or writable (for "verw"), the
ZF flag is set, otherwise it's cleared. Rules for the operand are the same as
for the "lldt" instruction.
  "arpl" compares the RPL (requestor's privilege level) fields of two segment
selectors. The first operand contains one segment selector and the second
operand contains the other. If the RPL field of the destination operand is
less than the RPL field of the source operand, the ZF flag is set and the RPL
field of the destination operand is increased to match that of the source
operand. Otherwise, the ZF flag is cleared and no change is made to the
destination operand. The destination operand can be a word general register
or memory, the source operand must be a general register.

    arpl [bx],ax     ; adjust RPL of selector in memory
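
  The RPL field occupies the low two bits of a segment selector, so the
adjustment performed by "arpl" can be sketched in C as follows. The function
name is hypothetical, and the returned value stands for the ZF flag.

```c
#include <stdint.h>

/* Model of "arpl dest,src" on 16-bit selectors; returns the new ZF value.
   The RPL field is held in bits 0 and 1 of a selector. */
static int arpl_model(uint16_t *dest, uint16_t src)
{
    unsigned dest_rpl = *dest & 3u;
    unsigned src_rpl = src & 3u;

    if (dest_rpl < src_rpl) {
        *dest = (uint16_t)((*dest & ~3u) | src_rpl);  /* raise RPL */
        return 1;                                      /* ZF = 1 */
    }
    return 0;                                          /* ZF = 0, unchanged */
}
```

  For example, a selector 10h (RPL 0) compared with a selector of RPL 3
becomes 13h, while a second call with a source RPL of 1 leaves it unchanged.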

instruction has no operands.
  "lock" prefix causes the processor's bus-lock signal to be asserted during
execution of the accompanying instruction. In a multiprocessor environment,
the bus-lock signal ensures that the processor has exclusive use of any shared
memory while the signal is asserted. The "lock" prefix can be prepended only
to the following instructions and only to those forms of the instructions
where the destination operand is a memory operand: "add", "adc", "and", "btc",
"btr", "bts", "cmpxchg", "cmpxchg8b", "dec", "inc", "neg", "not", "or", "sbb",
"sub", "xor", "xadd" and "xchg". If the "lock" prefix is used with one of
these instructions and the source operand is a memory operand, an undefined
opcode exception may be generated. An undefined opcode exception will also be
generated if the "lock" prefix is used with any instruction not in the above
list. The "xchg" instruction always asserts the bus-lock signal regardless of
the presence or absence of the "lock" prefix.
  "hlt" stops instruction execution and places the processor in a halted
state. An enabled interrupt, a debug exception, the BINIT, INIT or the RESET
signal will resume execution. This instruction has no operands.
  "invlpg" invalidates (flushes) the TLB (translation lookaside buffer) entry
specified with the operand, which should be a memory operand. The processor
determines the page that contains that address and flushes the TLB entry for
that page.
  "rdmsr" loads the contents of a 64-bit MSR (model specific register) of the
address specified in the ECX register into registers EDX and EAX. "wrmsr"
writes the contents of registers EDX and EAX into the 64-bit MSR of the
address specified in the ECX register. "rdtsc" loads the current value of the
processor's time stamp counter from the 64-bit MSR into the EDX and EAX
registers. The processor increments the time stamp counter MSR every clock
cycle and resets it to 0 whenever the processor is reset. "rdpmc" loads the
contents of the 40-bit performance monitoring counter specified in the ECX
register into registers EDX and EAX. These instructions have no operands.
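
  "rdmsr", "rdtsc" and "rdpmc" return a 64-bit quantity split across two
registers: EDX holds the high double word and EAX the low one. "wrmsr" takes
its value in the same form. A C sketch of that split follows; the helper
names are hypothetical.

```c
#include <stdint.h>

/* Combine EDX (high 32 bits) and EAX (low 32 bits) into one 64-bit value,
   as needed after "rdtsc", "rdmsr" or "rdpmc". */
static uint64_t edx_eax_to_u64(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}

/* Split a 64-bit value into the EDX:EAX pair expected by "wrmsr". */
static void u64_to_edx_eax(uint64_t value, uint32_t *edx, uint32_t *eax)
{
    *edx = (uint32_t)(value >> 32);
    *eax = (uint32_t)value;
}
```

  Two time stamp readings combined this way can be subtracted directly to
obtain an elapsed count.
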
  "wbinvd" writes back all modified cache lines in the processor's internal
cache to main memory and invalidates (flushes) the internal caches. The
instruction then issues a special function bus cycle that directs external
caches to also write back modified data and another bus cycle to indicate that
the external caches should be invalidated. This instruction has no operands.
  "rsm" returns program control from the system management mode to the
program that was interrupted when the processor received an SMM interrupt.
This instruction has no operands.
  "sysenter" executes a fast call to a level 0 system procedure, "sysexit"
executes a fast return to level 3 user code. The addresses used by these
instructions are stored in MSRs. These instructions have no operands.


2.1.13  FPU instructions

  The FPU (Floating-Point Unit) instructions operate on the floating-point
values in three formats: single precision (32-bit), double precision (64-bit)
and double extended precision (80-bit). The FPU registers form a stack and
each of them holds a double extended precision floating-point value. When
values are pushed onto the stack or removed from its top, the FPU registers
are shifted, so ST0 is always the value on the top of the FPU stack, ST1 is
the first value below the top, and so on. The ST0 name also has the synonym
ST.
  "fld" pushes the floating-point value onto the FPU register stack. The
operand can be a 32-bit, 64-bit or 80-bit memory location or an FPU register;
its value is then loaded onto the top of the FPU register stack (the ST0
register) and is automatically converted into the double extended precision
format.

    fld st2          ; push value of st2 onto register stack

commonly used constants onto the FPU register stack. The loaded constants are
+1.0, +0.0, lb 10, lb e, pi, lg 2 and ln 2 respectively. These instructions
have no operands.
  "fild" converts the signed integer source operand into double extended
precision floating-point format and pushes the result onto the FPU register
stack. The source operand can be a 16-bit, 32-bit or 64-bit memory location.

  "fst" copies the value of ST0 register to the destination operand, which
can be a 32-bit or 64-bit memory location or another FPU register. "fstp"
performs the same operation as "fst" and then pops the register stack,
getting rid of ST0. "fstp" accepts the same operands as the "fst" instruction
and can also store the value in 80-bit memory.

    fstp tword [bx]  ; store value in memory and pop stack

  "fist" converts the value in ST0 to a signed integer and stores the result
in the destination operand. The operand can be a 16-bit or 32-bit memory
location. "fistp" performs the same operation and then pops the register
stack; it accepts the same operands as the "fist" instruction and can also
store the integer value in 64-bit memory, so it has the same rules for
operands as the "fild" instruction.
  "fbld" converts the packed BCD integer into double extended precision
floating-point format and pushes this value onto the FPU stack. "fbstp"
converts the value in ST0 to an 18-digit packed BCD integer, stores the result
in the destination operand, and pops the register stack. The operand should be
an 80-bit memory location.
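
  The 80-bit packed BCD format used by "fbld" and "fbstp" has three parts:
  * The first nine bytes store 18 decimal digits, two per byte.
  * The less significant digit of each byte is in the low nibble.
  * The sign is bit 7 of the tenth byte.
The C sketch below converts between that layout and an integer. The function
names are hypothetical, and the magnitude must be below 10^18. The rounding
that "fbstp" applies to a non-integral ST0 value is not modelled.

```c
#include <stdint.h>

/* Encode v (|v| < 10^18) into the 10-byte packed BCD layout. */
static void to_packed_bcd(int64_t v, uint8_t out[10])
{
    uint64_t m = v < 0 ? (uint64_t)0 - (uint64_t)v : (uint64_t)v;
    for (int i = 0; i < 9; i++) {
        uint8_t lo = (uint8_t)(m % 10);   /* less significant digit */
        m /= 10;
        uint8_t hi = (uint8_t)(m % 10);   /* more significant digit */
        m /= 10;
        out[i] = (uint8_t)((hi << 4) | lo);
    }
    out[9] = v < 0 ? 0x80 : 0x00;          /* sign in bit 7 of byte 9 */
}

/* Decode the 10-byte packed BCD layout back into an integer. */
static int64_t from_packed_bcd(const uint8_t in[10])
{
    int64_t m = 0;
    for (int i = 8; i >= 0; i--)           /* most significant byte first */
        m = m * 100 + (in[i] >> 4) * 10 + (in[i] & 0x0F);
    return (in[9] & 0x80) ? -m : m;
}
```

  For example, -1234 is stored as the bytes 34h, 12h, seven zero bytes and
80h.
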
  "fadd" adds the destination and source operands and stores the sum in the
destination location. The destination operand is always an FPU register; if
the source is a memory location, the destination is the ST0 register and only
the source operand should be specified. If both operands are FPU registers, at
least one of them should be the ST0 register. An operand in memory can be a
32-bit or 64-bit value.

    fadd st2,st0     ; add st0 to st2

  "faddp" adds the destination and source operands, stores the sum in the
destination location and then pops the register stack. The destination operand
must be an FPU register and the source operand must be the ST0. When no
operands are specified, ST1 is used as a destination operand.

    faddp st2,st0    ; add st0 to st2 and pop the stack
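
  How "fld" and "faddp" move the top of the register stack can be shown with
a small C model. It is a sketch under several assumptions:
  * "struct fpu", "st", "fld_model" and "faddp_model" are hypothetical names.
  * long double stands in for the 80-bit format; its actual precision depends
    on the C implementation.
  * Register tags, exceptions and stack overflow are ignored.

```c
/* Hypothetical model of the FPU register stack. long double stands in for the
   80-bit format; its real precision depends on the C implementation. */
struct fpu {
    long double reg[8];   /* physical registers R0..R7 */
    int top;              /* index of the register that is currently ST0 */
};

/* ST(i) is the register i positions below the top of the stack. */
static long double *st(struct fpu *f, int i)
{
    return &f->reg[(f->top + i) & 7];
}

/* "fld": the top pointer is decremented, then the value becomes ST0. */
static void fld_model(struct fpu *f, long double value)
{
    f->top = (f->top - 1) & 7;
    f->reg[f->top] = value;
}

/* "faddp st(i),st0": add ST0 to ST(i), then pop so that ST(i) becomes
   ST(i-1). Without operands "faddp" uses i = 1. */
static void faddp_model(struct fpu *f, int i)
{
    *st(f, i) += *st(f, 0);
    f->top = (f->top + 1) & 7;
}
```

  After loading 1.5 and then 2.0, a call of faddp_model with i equal to 1
leaves 3.5 in ST0. This corresponds to "faddp" without operands.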

  "fiadd" converts an integer source operand into double extended
precision floating-point value and adds it to the destination operand. The
operand should be a 16-bit or 32-bit memory location.

  "fsub", "fsubr", "fmul", "fdiv" and "fdivr" instructions are similar to
"fadd", have the same rules for operands and differ only in the performed
computation. "fsub" subtracts the source operand from the destination operand,
"fsubr" subtracts the destination operand from the source operand, "fmul"
multiplies the destination and source operands, "fdiv" divides the destination
operand by the source operand and "fdivr" divides the source operand by the
destination operand. "fsubp", "fsubrp", "fmulp", "fdivp" and "fdivrp" perform
the same operations and pop the register stack; the rules for operands are the
same as for the "faddp" instruction. "fisub", "fisubr", "fimul", "fidiv" and
"fidivr" perform these operations after converting the integer source operand
into a floating-point value; they have the same rules for operands as the
"fiadd" instruction.
  "fsqrt" computes the square root of the value in ST0 register, "fsin"
computes the sine of that value, "fcos" computes the cosine of that value,
"fchs" complements its sign bit, "fabs" clears its sign to create the absolute
value, "frndint" rounds it to an integral value according to the current
rounding mode. "f2xm1" computes the exponential value of 2 to the power of
ST0 and subtracts 1.0 from it; the value of ST0 must lie in the range -1.0 to
+1.0. All these instructions store the result in ST0 and have no operands.
  "fsincos" computes both the sine and the cosine of the value in ST0
register, stores the sine in ST0 and pushes the cosine on the top of FPU
register stack. "fptan" computes the tangent of the value in ST0, stores the
result in ST0 and pushes a 1.0 onto the FPU register stack. "fpatan" computes
the arctangent of the value in ST1 divided by the value in ST0, stores the
result in ST1 and pops the FPU register stack. "fyl2x" computes the binary
logarithm of ST0, multiplies it by ST1, stores the result in ST1 and pops the
FPU register stack; "fyl2xp1" performs the same operation but it adds 1.0 to
ST0 before computing the logarithm. "fprem" computes the remainder obtained
from dividing the value in ST0 by the value in ST1, and stores the result
in ST0. "fprem1" performs the same operation as "fprem", but it computes the
remainder in the way specified by IEEE Standard 754. "fscale" truncates the
value in ST1 and increases the exponent of ST0 by this value. "fxtract"
separates the value in ST0 into its exponent and significand, stores the
exponent in ST0 and pushes the significand onto the register stack. "fnop"
performs no operation. These instructions have no operands.
  "fxch" exchanges the contents of ST0 and another FPU register. The operand
should be an FPU register; if no operand is specified, the contents of ST0 and
ST1 are exchanged.
  "fcom" and "fcomp" compare the contents of ST0 and the source operand and
set flags in the FPU status word according to the results. "fcomp"
additionally pops the register stack after performing the comparison. The
operand can be a single or double precision value in memory or an FPU
register. When no operand is specified, ST1 is used as a source operand.

    fcomp st2        ; compare st0 with st2 and pop stack

  "fcompp" compares the contents of ST0 and ST1, sets the flags in the FPU
status word according to the results and pops the register stack twice. This
instruction has no operands.
  "fucom", "fucomp" and "fucompp" perform an unordered comparison of two FPU
registers. Rules for operands are the same as for the "fcom", "fcomp" and
"fcompp", but the source operand must be an FPU register.
  "ficom" and "ficomp" compare the value in ST0 with an integer source operand
and set the flags in the FPU status word according to the results. "ficomp"
additionally pops the register stack after performing the comparison. The
integer value is converted to double extended precision floating-point format
before the comparison is made. The operand should be a 16-bit or 32-bit
memory location.

  "fcomi", "fcomip", "fucomi" and "fucomip" perform the comparison of ST0 with
another FPU register and set the ZF, PF and CF flags according to the results.
"fcomip" and "fucomip" additionally pop the register stack after performing
the comparison. The instructions obtained by attaching the FPU condition
mnemonic (see table 2.2) to the "fcmov" mnemonic transfer the specified FPU
register into ST0 register if the given test condition is true. These
instructions allow two different syntaxes. One has a single operand that
specifies the source FPU register. The other has two operands; in that case
the destination operand should be the ST0 register and the second operand
specifies the source FPU register.

    fcmovb st0,st2   ; transfer st2 to st0 if below

  /------------------------------------------------------\
  | Mnemonic | Condition tested | Description            |
  |==========|==================|========================|
  | b        | CF = 1           | below                  |
  | e        | ZF = 1           | equal                  |
  | be       | CF or ZF = 1     | below or equal         |
  | u        | PF = 1           | unordered              |
  | nb       | CF = 0           | not below              |
  | ne       | ZF = 0           | not equal              |
  | nbe      | CF and ZF = 0    | not below nor equal    |
  | nu       | PF = 0           | not unordered          |
  \------------------------------------------------------/
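
  The flag settings that "fcomi" and "fucomi" produce, and the way the
conditions of table 2.2 test them, can be sketched in C as below. The function
names are hypothetical, and double stands in for the register contents. The
model ignores the one difference between "fcomi" and "fucomi": which
operands raise an invalid-operation exception.

```c
#include <math.h>

/* Hypothetical record of the three EFLAGS bits written by fcomi/fucomi. */
struct cmp_flags { int zf, pf, cf; };

/* Flags produced by comparing ST0 with another FPU register. */
static struct cmp_flags fcomi_model(double st0, double sti)
{
    struct cmp_flags f = {0, 0, 0};
    if (isnan(st0) || isnan(sti))
        f.zf = f.pf = f.cf = 1;      /* unordered */
    else if (st0 < sti)
        f.cf = 1;                    /* ST0 below the other register */
    else if (st0 == sti)
        f.zf = 1;                    /* equal */
    return f;                        /* greater: all three flags clear */
}

/* A few of the conditions from table 2.2, as tested by the "fcmov" forms. */
static int cond_b(struct cmp_flags f)   { return f.cf; }
static int cond_e(struct cmp_flags f)   { return f.zf; }
static int cond_be(struct cmp_flags f)  { return f.cf || f.zf; }
static int cond_u(struct cmp_flags f)   { return f.pf; }
static int cond_nbe(struct cmp_flags f) { return !f.cf && !f.zf; }
```

  An unordered result sets CF as well. As a consequence, "fcmovb" and
"fcmovbe" also transfer the value when one of the compared values is a NaN.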

status word according to the results. "fxam" examines the contents of the ST0
register and sets the flags in the FPU status word to indicate the class of
value in the register. These instructions have no operands.
  "fstsw" and "fnstsw" store the current value of the FPU status word in the
destination location. The destination operand can be either a 16-bit memory
location or the AX register. "fstsw" checks for pending unmasked FPU
exceptions before storing the status word, "fnstsw" does not.
  "fstcw" and "fnstcw" store the current value of the FPU control word at the
specified destination in memory. "fstcw" checks for pending unmasked FPU
exceptions before storing the control word, "fnstcw" does not. "fldcw" loads
the operand into the FPU control word. The operand should be a 16-bit memory
location.
  "fstenv" and "fnstenv" store the current FPU operating environment at the
memory location specified with the destination operand, and then mask all FPU
exceptions. "fstenv" checks for pending unmasked FPU exceptions before
proceeding, "fnstenv" does not. "fldenv" loads the complete operating
environment from memory into the FPU. "fsave" and "fnsave" store the current
FPU state (operating environment and register stack) at the specified
destination in memory and reinitialize the FPU. "fsave" checks for pending
unmasked FPU exceptions before proceeding, "fnsave" does not. "frstor"
loads the FPU state from the specified memory location. All these instructions
need a memory location as the operand. For each of these instructions there
are two additional mnemonics that allow the type of the operation to be
selected precisely. The "fstenvw", "fnstenvw", "fldenvw", "fsavew", "fnsavew"
and "frstorw" mnemonics force the instruction to perform the operation as in
16-bit mode, while "fstenvd", "fnstenvd", "fldenvd", "fsaved", "fnsaved" and
"frstord" force the operation as in 32-bit mode.
  "finit" and "fninit" set the FPU operating environment into its default
state. "finit" checks for pending unmasked FPU exceptions before proceeding,
"fninit" does not. "fclex" and "fnclex" clear the FPU exception flags in the
FPU status word. "fclex" checks for pending unmasked FPU exceptions before
proceeding, "fnclex" does not. "wait" and "fwait" are synonyms for the same
instruction, which causes the processor to check for pending unmasked FPU
exceptions and handle them before proceeding. These instructions have no
operands.
  "ffree" sets the tag associated with the specified FPU register to empty.
The operand should be an FPU register.
  "fincstp" and "fdecstp" rotate the FPU stack by one by adding one to or
subtracting one from the top-of-stack pointer. These instructions have no
operands.


2.1.14  MMX instructions

  The MMX instructions operate on the packed integer types and use the MMX
registers, which are the low 64-bit parts of the 80-bit FPU registers. Because
of this, MMX instructions cannot be used at the same time as FPU instructions.
They can operate on packed bytes (eight 8-bit integers), packed words (four
16-bit integers) or packed double words (two 32-bit integers); the use of
packed formats allows operations to be performed on multiple data elements at
one time.
  "movq" copies a quad word from the source operand to the destination
operand. At least one of the operands must be an MMX register, the second one
can also be an MMX register or a 64-bit memory location.

    movq mm2,[ebx]   ; move quad word from memory to register

operand. One of the operands must be an MMX register, the second one can be a
general register or a 32-bit memory location. Only the low double word of the
MMX register is used.
  All general MMX operations have two operands. The destination operand should
be an MMX register, and the source operand can be an MMX register or a 64-bit
memory location. The operation is performed on the corresponding data elements
of the source and destination operands, and the results are stored in the data
elements of the destination operand. "paddb", "paddw" and "paddd" perform the
addition of packed bytes, packed words, or packed double words. "psubb",
"psubw" and "psubd" perform the subtraction of the corresponding types.
"paddsb", "paddsw", "psubsb" and "psubsw" perform the addition or subtraction
of packed bytes or packed words with signed saturation. "paddusb", "paddusw",
"psubusb" and "psubusw" are analogous, but with unsigned saturation. "pmulhw"
and "pmullw" perform a signed multiplication of the packed words and store the
high or low words of the results in the destination operand. "pmaddwd"
multiplies the packed words and adds the four intermediate double word
products in pairs to produce the result as packed double words. "pand", "por"
and "pxor" perform the logical operations on the quad words; "pandn" also
performs a logical negation of the destination operand before performing the
"and" operation. "pcmpeqb", "pcmpeqw" and "pcmpeqd" compare packed bytes,
packed words or packed double words for equality. If a pair of data elements
is equal, the corresponding data element in the destination operand is filled
with bits of value 1, otherwise it's set to 0. "pcmpgtb", "pcmpgtw" and
"pcmpgtd" perform a similar operation, but they check whether the data
elements in the destination operand are greater than the corresponding data
elements in the source operand, treating them as signed integers. "packsswb"
converts packed signed words into packed signed bytes and "packssdw" converts
packed signed double words into packed signed words, using saturation to
handle overflow conditions. "packuswb" converts packed signed words into
packed unsigned bytes, also with saturation. Converted data elements from the
destination operand are stored in the low part of the destination operand,
while converted data elements from the source operand are stored in the high
part. "punpckhbw", "punpckhwd" and "punpckhdq" interleave the data elements
from the high parts of the source and destination operands and store the
result into the destination operand. "punpcklbw", "punpcklwd" and "punpckldq"
perform the same operation, but the low parts of the source and destination
operands are used.

    pcmpeqw mm3,mm7  ; compare packed words for equality
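
  The C sketch below lets you check two things. First, how wrap-around,
signed saturation and unsigned saturation differ. Second, the element order
used by "packuswb", where the destination words fill the low half of the
result. Each lane function models one data element, and the packing function
models the whole "packuswb" operation. The names are hypothetical, and MMX
registers are represented as plain C arrays.

```c
#include <stdint.h>

/* Clamp an intermediate result to the signed or unsigned byte range. */
static int8_t sat_s8(int v)  { return (int8_t)(v > 127 ? 127 : v < -128 ? -128 : v); }
static uint8_t sat_u8(int v) { return (uint8_t)(v > 255 ? 255 : v < 0 ? 0 : v); }

/* One byte lane of "paddb" (wrap-around), "paddsb" and "paddusb". */
static uint8_t paddb_lane(uint8_t a, uint8_t b)   { return (uint8_t)(a + b); }
static int8_t  paddsb_lane(int8_t a, int8_t b)    { return sat_s8(a + b); }
static uint8_t paddusb_lane(uint8_t a, uint8_t b) { return sat_u8(a + b); }

/* "packuswb": four signed words from each operand become unsigned bytes;
   the destination's words fill the low half, the source's the high half. */
static void packuswb_model(const int16_t dst[4], const int16_t src[4],
                           uint8_t out[8])
{
    for (int i = 0; i < 4; i++) {
        out[i] = sat_u8(dst[i]);
        out[i + 4] = sat_u8(src[i]);
    }
}
```

  Adding 200 and 100 as bytes gives 44 with "paddb" but 255 with "paddusb".
With signed saturation, 100 plus 100 is clamped to 127.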

  "psllw", "pslld" and "psllq" perform the logical shift left of the packed
words, packed double words or a single quad word in the destination operand by
the amount specified in the source operand. "psrlw", "psrld" and "psrlq"
perform the logical shift right of the packed words, packed double words or a
single quad word. "psraw" and "psrad" perform the arithmetic shift right of
the packed words or double words. The destination operand should be an MMX
register, while the source operand can be an MMX register, a 64-bit memory
location, or an 8-bit immediate value.

    psrad mm4,[ebx]  ; shift double words right arithmetically
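
  The difference between the logical and the arithmetic right shifts is shown
below for a single word element. So is the effect of a count larger than the
element width. For counts above 15, the logical shifts produce zero and the
arithmetic shift fills the element with its sign bit; the model reproduces
this behaviour. The function names are hypothetical.

```c
#include <stdint.h>

/* One word element of "psllw", "psrlw" and "psraw" with shift count n. */
static uint16_t psllw_lane(uint16_t w, uint64_t n)
{
    return n > 15 ? 0 : (uint16_t)(w << n);
}

static uint16_t psrlw_lane(uint16_t w, uint64_t n)
{
    return n > 15 ? 0 : (uint16_t)(w >> n);    /* zeroes shifted in */
}

static int16_t psraw_lane(int16_t w, uint64_t n)
{
    if (n > 15)
        n = 15;                                 /* result is all sign bits */
    /* sign bits shifted in, written without right-shifting a negative value */
    return (int16_t)(w < 0 ? ~(~w >> n) : w >> n);
}
```

  Shifting 8000h right by one gives 4000h with "psrlw" but C000h (-16384)
with "psraw".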

used before using the FPU instructions if any MMX instructions were used.


2.1.15  SSE instructions

  The SSE extension introduces the operations on packed single precision
floating point values. The 128-bit packed single precision format consists of
four single precision floating point values. The 128-bit SSE registers are
designed for the purpose of operations on this data type.
  "movaps" and "movups" transfer a double quad word operand containing packed
single precision values from the source operand to the destination operand. At
least one of the operands has to be an SSE register, the second one can also
be an SSE register or a 128-bit memory location. Memory operands for the
"movaps" instruction must be aligned on a 16-byte boundary, operands for the
"movups" instruction don't have to be aligned.

  "movlps" moves packed two single precision values between the memory and the
low quad word of an SSE register. "movhps" moves packed two single precision
values between the memory and the high quad word of an SSE register. One of
the operands must be an SSE register, and the other operand must be a 64-bit
memory location.

    movhps [esi],xmm7  ; move high quad word of xmm7 to memory

  "movlhps" moves packed two single precision values from the low quad word
of the source register to the high quad word of the destination register.
"movhlps" moves two packed single precision values from the high quad word of
the source register to the low quad word of the destination register. Both
operands have to be SSE registers.
  "movmskps" transfers the most significant bit of each of the four single
precision values in the SSE register into the low four bits of a general
register. The source operand must be an SSE register, the destination operand
must be a general register.
  "movss" transfers a single precision value between the source and
destination operands (only the low double word is transferred). At least one
of the operands has to be an SSE register, the second one can also be an SSE
register or a 32-bit memory location.

  Each of the SSE arithmetic operations has two variants. When the mnemonic
ends with "ps", the source operand can be a 128-bit memory location or an SSE
register, and the destination operand must be an SSE register. The operation
is performed on four packed single precision values, separately for each pair
of corresponding data elements, and the result is stored in the destination
register. When the mnemonic ends with "ss", the source operand can be a 32-bit
memory location or an SSE register, and the destination operand must be an
SSE register. The operation is performed on single precision values; only the
low double words of the SSE registers are used, and the result is stored in
the low double word of the destination register.
  "addps" and "addss" add the values, "subps" and "subss" subtract the source
value from the destination value, "mulps" and "mulss" multiply the values,
"divps" and "divss" divide the destination value by the source value, "rcpps"
and "rcpss" compute the approximate reciprocal of the source value, "sqrtps"
and "sqrtss" compute the square root of the source value, "rsqrtps" and
"rsqrtss" compute the approximate reciprocal of the square root of the source
value, "maxps" and "maxss" compare the source and destination values and
return the greater one, "minps" and "minss" compare the source and destination
values and return the lesser one.

    addps xmm3,xmm7    ; add packed single precision values

packed single precision values. The source operand can be a 128-bit memory
location or an SSE register, the destination operand must be an SSE register.
  "cmpps" compares packed single precision values and returns a mask result
into the destination operand, which must be an SSE register. The source operand
can be a 128-bit memory location or an SSE register. The third operand must be
an immediate operand selecting the code of one of the eight compare conditions
(table 2.3). "cmpss" performs the same operation on single precision values.
In that case only the low double word of the destination register is
affected, and the source operand can be a 32-bit memory location or an SSE
register. These two instructions also have variants with only two operands,
with the condition encoded in the mnemonic. Such a mnemonic is formed by
attaching the mnemonic from table 2.3 to "cmp" and then appending "ps" or
"ss".

    cmpltss xmm0,[ebx] ; compare single precision values

  /-------------------------------------------\
  | Code | Mnemonic | Description             |
  |======|==========|=========================|
  | 0    | eq       | equal                   |
  | 1    | lt       | less than               |
  | 2    | le       | less than or equal      |
  | 3    | unord    | unordered               |
  | 4    | neq      | not equal               |
  | 5    | nlt      | not less than           |
  | 6    | nle      | not less than nor equal |
  | 7    | ord      | ordered                 |
  \-------------------------------------------/
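
  The eight predicates of table 2.3, and the way "movmskps" collects the
resulting masks, can be modelled in C. This is a sketch with hypothetical
function names. Each element of the "cmpps" result is represented by the
all-ones or all-zeros double word that the instruction writes. An unordered
comparison (either value is a NaN) makes "eq", "lt", "le" and "ord" false
and the other four predicates true.

```c
#include <math.h>
#include <stdint.h>

/* Predicate selected by the immediate code of "cmpps"/"cmpss" (table 2.3). */
static int cmp_predicate(float a, float b, int code)
{
    int unordered = isnan(a) || isnan(b);
    switch (code & 7) {
    case 0: return !unordered && a == b;   /* eq    */
    case 1: return !unordered && a < b;    /* lt    */
    case 2: return !unordered && a <= b;   /* le    */
    case 3: return unordered;              /* unord */
    case 4: return unordered || a != b;    /* neq   */
    case 5: return unordered || !(a < b);  /* nlt   */
    case 6: return unordered || !(a <= b); /* nle   */
    default: return !unordered;            /* ord   */
    }
}

/* "cmpps dst,src,code": each element becomes all ones or all zeros. */
static void cmpps_model(const float dst[4], const float src[4], int code,
                        uint32_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = cmp_predicate(dst[i], src[i], code) ? 0xFFFFFFFFu : 0u;
}

/* "movmskps": the most significant bit of each element, element 0 in bit 0. */
static unsigned movmskps_model(const uint32_t v[4])
{
    unsigned mask = 0;
    for (int i = 0; i < 4; i++)
        mask |= (unsigned)(v[i] >> 31) << i;
    return mask;
}
```

  Code 1 corresponds to the "cmpltps" form and code 6 to "cmpnleps".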
1742
1743
 
1744
PF and CF flags to show the result. The destination operand must be a SSE
1745
register, the source operand can be a 32-bit memory location or SSE register.
  "shufps" moves any two of the four single precision values from the
destination operand into the low quad word of the destination operand, and any
two of the four values from the source operand into the high quad word of the
destination operand. The destination operand must be an SSE register, the
source operand can be a 128-bit memory location or SSE register, and the third
operand must be an 8-bit immediate value selecting which values will be moved
into the destination operand. Bits 0 and 1 select the value to be moved from
the destination operand to the low double word of the result, bits 2 and 3
select the value to be moved from the destination operand to the second double
word, bits 4 and 5 select the value to be moved from the source operand to the
third double word, and bits 6 and 7 select the value to be moved from the
source operand to the high double word of the result.
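
  The selection made by the third operand of "shufps" can be sketched in C as
follows. This is an illustrative model only (the function name shufps_model is
hypothetical), with lane 0 standing for the low double word. It reads all four
selected values before writing, so it also behaves correctly when both
operands are the same register.

```c
#include <stdint.h>

/* Model of "shufps dest,src,imm8" on four 32-bit lanes. */
void shufps_model(uint32_t dest[4], const uint32_t src[4], uint8_t imm)
{
    uint32_t r[4];

    r[0] = dest[imm & 3];         /* bits 0-1: from destination */
    r[1] = dest[(imm >> 2) & 3];  /* bits 2-3: from destination */
    r[2] = src[(imm >> 4) & 3];   /* bits 4-5: from source      */
    r[3] = src[(imm >> 6) & 3];   /* bits 6-7: from source      */

    for (int i = 0; i < 4; i++)   /* write back after all reads */
        dest[i] = r[i];
}
```

  With the immediate written in binary, such as 00011011b, the fields read
from right to left: 11b and 10b pick destination lanes 3 and 2, while 01b and
00b pick source lanes 1 and 0.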

  "unpckhps" performs an interleaved unpack of the values from the high parts
of the source and destination operands and stores the result in the
destination operand, which must be an SSE register. The source operand can be
a 128-bit memory location or an SSE register. "unpcklps" performs an
interleaved unpack of the values from the low parts of the source and
destination operands and stores the result in the destination operand; the
rules for operands are the same.
  "cvtpi2ps" converts two packed double word integers into two packed single
precision floating point values and stores the result in the low quad word of
the destination operand, which should be an SSE register. The source operand
can be a 64-bit memory location or MMX register.

  "cvtsi2ss" converts a double word integer into a single precision floating
point value and stores the result in the low double word of the destination
operand, which should be an SSE register. The source operand can be a 32-bit
memory location or 32-bit general register.

  "cvtps2pi" converts two packed single precision floating point values into
two packed double word integers and stores the result in the destination
operand, which should be an MMX register. The source operand can be a 64-bit
memory location or SSE register; only the low quad word of the SSE register is
used. "cvttps2pi" performs a similar operation, except that truncation is used
to round the source values to integers; the rules for operands are the same.

  "cvtss2si" converts a single precision floating point value into a double
word integer and stores the result in the destination operand, which should be
a 32-bit general register. The source operand can be a 32-bit memory location
or SSE register; only the low double word of the SSE register is used.
"cvttss2si" performs a similar operation, except that truncation is used to
round the source value to an integer; the rules for operands are the same.
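
  The difference between "cvtss2si" and "cvttss2si" is easiest to see on values
with a fractional part. The C sketch below is a model, not a definitive
implementation: the function names are hypothetical, it assumes the default
MXCSR rounding mode (round to nearest, ties to even) for "cvtss2si", and it
returns 80000000h, the integer indefinite value the processor produces for a
NaN or out-of-range source when the invalid operation exception is masked.

```c
#include <math.h>    /* isnan, NAN */
#include <stdint.h>

#define INDEFINITE INT32_MIN   /* 80000000h, the "integer indefinite" value */

/* "cvttss2si": truncate toward zero. */
int32_t cvttss2si_model(float x)
{
    double d = x;
    if (isnan(d) || d >= 2147483648.0 || d <= -2147483649.0)
        return INDEFINITE;
    return (int32_t)d;                 /* C conversion truncates */
}

/* "cvtss2si": round to nearest, ties to even (assumed default mode). */
int32_t cvtss2si_model(float x)
{
    double d = x;
    if (isnan(d) || d >= 4294967296.0 || d <= -4294967296.0)
        return INDEFINITE;             /* far out of range */

    int64_t t = (int64_t)d;            /* truncated part */
    double frac = d - (double)t;       /* exact, in (-1, 1) */
    if (frac > 0.5 || (frac == 0.5 && (t & 1)))
        t += 1;
    else if (frac < -0.5 || (frac == -0.5 && (t & 1)))
        t -= 1;

    if (t > INT32_MAX || t < INT32_MIN)
        return INDEFINITE;
    return (int32_t)t;
}
```

  The same distinction applies to "cvtps2pi" and "cvttps2pi", and to the
double precision forms "cvtsd2si" and "cvttsd2si".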

  "pextrw" copies the word selected by the third operand from the source
operand to the destination operand. The source operand must be an MMX
register, the destination operand must be a 32-bit general register (the high
word of the destination is cleared), and the third operand must be an 8-bit
immediate value.

  "pinsrw" inserts a word from the source operand into the destination operand
at the location specified with the third operand, which must be an 8-bit
immediate value. The destination operand must be an MMX register, the source
operand can be a 16-bit memory location or 32-bit general register (only the
low word of the register is used).

  "pmaxub" returns the maximum values of packed unsigned bytes, "pminub"
returns the minimum values of packed unsigned bytes, "pmaxsw" returns the
maximum values of packed signed words, and "pminsw" returns the minimum values
of packed signed words. "pmulhuw" performs an unsigned multiplication of the
packed words and stores the high words of the results in the destination
operand. "psadbw" computes the absolute differences of packed unsigned bytes,
sums the differences, and stores the sum in the low word of the destination
operand. All these instructions follow the same rules for operands as the
general MMX operations described in the previous section.
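
  As an illustration of "psadbw", the following C sketch (hypothetical function
name, modelling the 64-bit MMX form) adds up the eight absolute differences of
the unsigned bytes. The sum never exceeds 8 * 255 = 2040, so it fits in the
low word. The model returns it as a whole quad word because the processor also
clears the remaining bits of the destination.

```c
#include <stdint.h>

/* Model of the MMX form of "psadbw dest,src". */
uint64_t psadbw_model(uint64_t dest, uint64_t src)
{
    uint32_t sum = 0;

    for (int i = 0; i < 8; i++) {
        int a = (int)((dest >> (8 * i)) & 0xFF);  /* unsigned bytes */
        int b = (int)((src >> (8 * i)) & 0xFF);
        sum += (uint32_t)(a > b ? a - b : b - a);
    }
    return sum;   /* low word holds the sum, upper bits are zero */
}
```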
  "pmovmskb" creates a mask made of the most significant bit of each byte in
the source operand and stores the result in the low byte of the destination
operand. The source operand must be an MMX register, the destination operand
must be a 32-bit general register.
  "pshufw" inserts words from the source operand into the destination operand
from the locations specified with the third operand. The destination operand
must be an MMX register, the source operand can be a 64-bit memory location or
MMX register, and the third operand must be an 8-bit immediate value selecting
which values will be moved into the destination operand, in a similar way to
the third operand of the "shufps" instruction.
  "movntq" moves the quad word from the source operand to memory using a
non-temporal hint to minimize cache pollution. The source operand should be an
MMX register, the destination operand should be a 64-bit memory location.
"movntps" stores packed single precision values from the SSE register to
memory using a non-temporal hint. The source operand should be an SSE
register, the destination operand should be a 128-bit memory location.
"maskmovq" stores selected bytes from the first operand into a 64-bit memory
location using a non-temporal hint. Both operands should be MMX registers; the
second operand selects which bytes from the source operand are written to
memory. The memory location is pointed to by the DI (or EDI) register in the
segment selected by DS.
  "prefetcht0", "prefetcht1", "prefetcht2" and "prefetchnta" fetch the line
of data from memory that contains the byte specified with the operand to a
specified location in the cache hierarchy. The operand should be an 8-bit
memory location.
  "sfence" performs a serializing operation on all instructions storing to
memory that were issued prior to it. This instruction has no operands.
  "ldmxcsr" loads the 32-bit memory operand into the MXCSR register. "stmxcsr"
stores the contents of MXCSR into a 32-bit memory operand.
  "fxsave" saves the current state of the FPU, the MXCSR register, and all the
FPU and SSE registers to a 512-byte memory location specified in the
destination operand. "fxrstor" reloads data previously stored with the
"fxsave" instruction from the specified 512-byte memory location. The memory
operand for both these instructions must be aligned on a 16-byte boundary, and
it should be given without any size operator.

2.1.16  SSE2 instructions

  The SSE2 extension introduces the operations on packed double precision
floating point values, extends the syntax of MMX instructions, and also adds
some new instructions.
  "movapd" and "movupd" transfer a double quad word operand containing packed
double precision values from the source operand to the destination operand.
These instructions are analogous to "movaps" and "movups" and have the same
rules for operands.
  "movlpd" moves a double precision value between memory and the low quad word
of an SSE register. "movhpd" moves a double precision value between memory
and the high quad word of an SSE register. These instructions are analogous to
"movlps" and "movhps" and have the same rules for operands.
  "movmskpd" transfers the most significant bit of each of the two double
precision values in the SSE register into the low two bits of a general
register. This instruction is analogous to "movmskps" and has the same rules
for operands.
  "movsd" transfers a double precision value between the source and
destination operands (only the low quad word is transferred). At least one of
the operands has to be an SSE register; the other one can be an SSE register
or a 64-bit memory location.
  Arithmetic operations on double precision values are: "addpd", "addsd",
"subpd", "subsd", "mulpd", "mulsd", "divpd", "divsd", "sqrtpd", "sqrtsd",
"maxpd", "maxsd", "minpd", "minsd", and they are analogous to the arithmetic
operations on single precision values described in the previous section. When
the mnemonic ends with "pd" instead of "ps", the operation is performed on two
packed double precision values, but the rules for operands are the same. When
the mnemonic ends with "sd" instead of "ss", the source operand can be a 64-bit
memory location or an SSE register, the destination operand must be an SSE
register, and the operation is performed on double precision values; only the
low quad words of the SSE registers are used in this case.
  "andpd", "andnpd", "orpd" and "xorpd" perform the logical operations on
packed double precision values. They are analogous to the SSE logical
operations on single precision values and have the same rules for operands.
  "cmppd" compares packed double precision values and returns a mask result
into the destination operand. This instruction is analogous to "cmpps" and has
the same rules for operands. "cmpsd" performs the same operation on double
precision values; only the low quad word of the destination register is
affected, and in this case the source operand can be a 64-bit memory location
or SSE register. Variants with only two operands are obtained by attaching the
condition mnemonic from table 2.3 to the "cmp" mnemonic and then attaching
"pd" or "sd" at the end.
  "comisd" and "ucomisd" compare the double precision values and set the ZF,
PF and CF flags to show the result. The destination operand must be an SSE
register, the source operand can be a 64-bit memory location or SSE register.
  "shufpd" moves either of the two double precision values from the
destination operand into the low quad word of the destination operand, and
either of the two values from the source operand into the high quad word of
the destination operand. This instruction is analogous to "shufps" and has the
same rules for operands. Bit 0 of the third operand selects the value to be
moved from the destination operand, bit 1 selects the value to be moved from
the source operand, and the rest of the bits are reserved and must be zeroed.
  "unpckhpd" performs an unpack of the high quad words from the source and
destination operands, "unpcklpd" performs an unpack of the low quad words from
the source and destination operands. They are analogous to "unpckhps" and
"unpcklps", and have the same rules for operands.
  "cvtps2pd" converts two packed single precision floating point values to
two packed double precision floating point values; the destination operand
must be an SSE register, the source operand can be a 64-bit memory location or
SSE register. "cvtpd2ps" converts two packed double precision floating point
values to two packed single precision floating point values; the destination
operand must be an SSE register, the source operand can be a 128-bit memory
location or SSE register. "cvtss2sd" converts a single precision floating
point value to a double precision floating point value; the destination
operand must be an SSE register, the source operand can be a 32-bit memory
location or SSE register. "cvtsd2ss" converts a double precision floating
point value to a single precision floating point value; the destination
operand must be an SSE register, the source operand can be a 64-bit memory
location or SSE register.
  "cvtpi2pd" converts two packed double word integers into packed double
precision floating point values; the destination operand must be an SSE
register, the source operand can be a 64-bit memory location or MMX register.
"cvtsi2sd" converts a double word integer into a double precision floating
point value; the destination operand must be an SSE register, the source
operand can be a 32-bit memory location or 32-bit general register. "cvtpd2pi"
converts packed double precision floating point values into two packed double
word integers; the destination operand should be an MMX register, the source
operand can be a 128-bit memory location or SSE register. "cvttpd2pi" performs
a similar operation, except that truncation is used to round the source values
to integers; the rules for operands are the same. "cvtsd2si" converts a double
precision floating point value into a double word integer; the destination
operand should be a 32-bit general register, the source operand can be a
64-bit memory location or SSE register. "cvttsd2si" performs a similar
operation, except that truncation is used to round the source value to an
integer; the rules for operands are the same.
  "cvtps2dq" and "cvttps2dq" convert packed single precision floating point
values to four packed double word integers, storing them in the destination
operand. "cvtpd2dq" and "cvttpd2dq" convert packed double precision floating
point values to two packed double word integers, storing the result in the low
quad word of the destination operand. "cvtdq2ps" converts four packed double
word integers to packed single precision floating point values. For all these
instructions the destination operand must be an SSE register, the source
operand can be a 128-bit memory location or SSE register. "cvtdq2pd" converts
two packed double word integers from the source operand to packed double
precision floating point values; the source can be a 64-bit memory location or
SSE register, and the destination has to be an SSE register.
  "movdqa" and "movdqu" transfer a double quad word operand containing packed
integers from the source operand to the destination operand. At least one of
the operands has to be an SSE register; the other one can be an SSE register
or a 128-bit memory location. Memory operands for the "movdqa" instruction
must be aligned on a 16-byte boundary, while operands for the "movdqu"
instruction don't have to be aligned.
  "movq2dq" moves the contents of the MMX source register to the low quad word
of the destination SSE register. "movdq2q" moves the low quad word from the
source SSE register to the destination MMX register.

    movdq2q mm0,xmm1   ; move from SSE register to MMX register

  All MMX instructions operating on the 64-bit packed integers (those with
mnemonics starting with "p") are extended to operate on 128-bit packed
integers located in SSE registers. The additional syntax for these
instructions takes an SSE register where an MMX register was needed, and a
128-bit memory location or SSE register where a 64-bit memory location or MMX
register was needed. The exception is the "pshufw" instruction, which doesn't
allow the extended syntax, but has two new variants: "pshufhw" and "pshuflw",
which allow only the extended syntax, and perform the same operation as
"pshufw" on the high or low quad words of the operands respectively. Also the
new instruction "pshufd" is introduced, which performs the same operation as
"pshufw", but on double words instead of words; it allows only the extended
syntax.

    pextrw eax,xmm0,7  ; extract highest word into eax

  "paddq" performs the addition of packed quad words, "psubq" performs the
subtraction of packed quad words, and "pmuludq" performs an unsigned
multiplication of the low double words from each corresponding quad word and
returns the results in packed quad words. These instructions follow the same
rules for operands as the general MMX operations described in 2.1.14.
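
  The C sketch below models one quad word lane of "pmuludq" (the function name
is hypothetical): only the low double word of each quad word takes part, and
the full 64-bit unsigned product is kept, so no overflow is possible.

```c
#include <stdint.h>

/* One quad word lane of "pmuludq": high double words are ignored. */
uint64_t pmuludq_lane(uint64_t dest, uint64_t src)
{
    return (uint64_t)(uint32_t)dest * (uint64_t)(uint32_t)src;
}
```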
  "pslldq" and "psrldq" perform a logical shift left or right of the double
quad word in the destination operand by the number of bytes specified in the
source operand. The destination operand should be an SSE register, the source
operand should be an 8-bit immediate value.
  "punpckhqdq" interleaves the high quad word of the source operand and the
high quad word of the destination operand and writes them to the destination
SSE register. "punpcklqdq" interleaves the low quad word of the source operand
and the low quad word of the destination operand and writes them to the
destination SSE register. The source operand can be a 128-bit memory location
or SSE register.
  "movntdq" stores packed integer data from the SSE register to memory using a
non-temporal hint. The source operand should be an SSE register, the
destination operand should be a 128-bit memory location. "movntpd" stores
packed double precision values from the SSE register to memory using a
non-temporal hint; the rules for operands are the same. "movnti" stores an
integer from a general register to memory using a non-temporal hint. The
source operand should be a 32-bit general register, the destination operand
should be a 32-bit memory location. "maskmovdqu" stores selected bytes from
the first operand into a 128-bit memory location using a non-temporal hint.
Both operands should be SSE registers; the second operand selects which bytes
from the source operand are written to memory. The memory location is pointed
to by the DI (or EDI) register in the segment selected by DS and does not need
to be aligned.
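
  The byte selection of "maskmovq" and "maskmovdqu" follows a simple rule: a
byte is stored only when the most significant bit of the corresponding mask
byte is set. The C sketch below (hypothetical names; the pointer stands in for
the address in DS:DI or DS:EDI) models only this selection, not the
non-temporal hint.

```c
#include <stddef.h>
#include <stdint.h>

/* n = 8 models "maskmovq", n = 16 models "maskmovdqu". */
void maskmov_model(uint8_t *mem, const uint8_t *data,
                   const uint8_t *mask, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (mask[i] & 0x80)      /* only the top bit of each mask byte counts */
            mem[i] = data[i];
}
```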
  "clflush" writes back and invalidates the cache line associated with the
address of the byte specified with the operand, which should be an 8-bit
memory location.
  "lfence" performs a serializing operation on all instructions loading from
memory that were issued prior to it. "mfence" performs a serializing operation
on all instructions accessing memory that were issued prior to it, and so it
combines the functions of the "sfence" (described in the previous section) and
"lfence" instructions. These instructions have no operands.

2.1.17  SSE3 instructions

  Intel processors based on the Prescott core introduced some new instructions
to improve the performance of SSE and SSE2; this extension is called SSE3.
  "fisttp" behaves like the "fistp" instruction and accepts the same operands;
the only difference is that it always uses truncation, irrespective of the
rounding mode.
  "movshdup" loads into the destination operand the 128-bit value obtained
from the source value of the same size by filling each quad word with two
duplicates of the value in its high double word. "movsldup" performs the same
action, except that it duplicates the values of the low double words. The
destination operand should be an SSE register, the source operand can be an
SSE register or a 128-bit memory location.
  "movddup" loads the 64-bit source value and duplicates it into the high and
low quad words of the destination operand. The destination operand should be
an SSE register, the source operand can be an SSE register or a 64-bit memory
location.
  "lddqu" is functionally equivalent to "movdqu" with memory as the source
operand, but it may improve performance when the source operand crosses a
cache line boundary. The destination operand has to be an SSE register, the
source operand must be a 128-bit memory location.
  "addsubps" performs single precision addition of the second and fourth pairs
and single precision subtraction of the first and third pairs of floating
point values in the operands. "addsubpd" performs double precision addition of
the second pair and double precision subtraction of the first pair of floating
point values in the operands. "haddps" performs the addition of the two single
precision values within each quad word of the source and destination operands,
and stores the results of such horizontal addition of values from the
destination operand into the low quad word of the destination operand, and the
results from the source operand into the high quad word of the destination
operand. "haddpd" performs the addition of the two double precision values
within each operand, and stores the result from the destination operand into
the low quad word of the destination operand, and the result from the source
operand into the high quad word of the destination operand. All these
instructions need the destination operand to be an SSE register; the source
operand can be an SSE register or a 128-bit memory location.
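
  A minimal C model of "addsubps" and "haddps" makes the lane order explicit
(lane 0 is the low double word; the function names are hypothetical). The
horizontal variant computes all four sums before writing, so it also works
when both operands are the same register.

```c
/* "addsubps": subtract in lanes 0 and 2, add in lanes 1 and 3. */
void addsubps_model(float d[4], const float s[4])
{
    d[0] -= s[0];
    d[1] += s[1];
    d[2] -= s[2];
    d[3] += s[3];
}

/* "haddps": destination pairs go to the low quad word,
   source pairs go to the high quad word. */
void haddps_model(float d[4], const float s[4])
{
    float r0 = d[0] + d[1], r1 = d[2] + d[3];
    float r2 = s[0] + s[1], r3 = s[2] + s[3];
    d[0] = r0; d[1] = r1; d[2] = r2; d[3] = r3;
}
```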
  "monitor" sets up an address range for monitoring of write-back stores. It
needs its three operands to be the EAX, ECX and EDX registers, in that order.
"mwait" waits for a write-back store to the address range set up by the
"monitor" instruction. It uses two operands with additional parameters, the
first being the EAX and the second the ECX register.
  The functionality of SSE3 is further extended by the set of Supplemental
SSE3 instructions (SSSE3). They generally follow the same rules for operands
as all the MMX operations extended by SSE.
  "phaddw" and "phaddd" perform the horizontal addition of the pairs of
adjacent values from both the source and destination operands, and store the
sums into the destination (sums from the destination operand go into the lower
part of the destination register, sums from the source operand into the upper
part). They operate on 16-bit or 32-bit chunks, respectively. "phaddsw"
performs the same operation on signed 16-bit packed values, but the result of
each addition is saturated. "phsubw" and "phsubd" analogously perform the
horizontal subtraction of 16-bit or 32-bit packed values, and "phsubsw"
performs the horizontal subtraction of signed 16-bit packed values with
saturation.
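
  The C sketch below models the 128-bit form of "phaddw" (hypothetical
function name). Each result word is the sum of two adjacent words: sums of
destination pairs go to the low half and sums of source pairs to the high
half. Only the low 16 bits of each sum are kept, since "phaddw" does not
saturate.

```c
#include <stdint.h>

/* Model of "phaddw dest,src" with eight word lanes, lane 0 lowest. */
void phaddw_model(int16_t d[8], const int16_t s[8])
{
    int16_t r[8];

    for (int i = 0; i < 4; i++) {
        /* the casts keep the low 16 bits on two's complement compilers */
        r[i]     = (int16_t)(d[2 * i] + d[2 * i + 1]);  /* low half  */
        r[i + 4] = (int16_t)(s[2 * i] + s[2 * i + 1]);  /* high half */
    }
    for (int i = 0; i < 8; i++)
        d[i] = r[i];
}
```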
  "pabsb", "pabsw" and "pabsd" calculate the absolute value of each signed
packed value in the source operand and store them into the destination
register. They operate on 8-bit, 16-bit and 32-bit elements respectively.
  "pmaddubsw" multiplies signed 8-bit values from the source operand with the
corresponding unsigned 8-bit values from the destination operand to produce
intermediate 16-bit values, and every adjacent pair of those intermediate
values is then added horizontally, and those 16-bit sums, saturated to the
signed word range, are stored into the destination operand.
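
  The order of the signed and unsigned operands in "pmaddubsw" matters, and
each 16-bit sum is saturated. The C sketch below models a single result word
(hypothetical function name), taking the two destination bytes as unsigned and
the two source bytes as signed.

```c
#include <stdint.h>

/* Clamp a 32-bit value to the signed word range. */
static int16_t sat16(int32_t v)
{
    return (int16_t)(v > INT16_MAX ? INT16_MAX : v < INT16_MIN ? INT16_MIN : v);
}

/* One result word of "pmaddubsw dest,src". */
int16_t pmaddubsw_word(uint8_t d_lo, uint8_t d_hi, int8_t s_lo, int8_t s_hi)
{
    int32_t sum = (int32_t)d_lo * s_lo + (int32_t)d_hi * s_hi;
    return sat16(sum);
}
```

  For example, unsigned bytes 255 and 255 multiplied by signed bytes 127 and
127 give 64770, which is stored as 32767.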
  "pmulhrsw" multiplies corresponding 16-bit integers from the source and
destination operands to produce intermediate 32-bit values, and the 16 bits
next to the highest bit of each of those values are then rounded and packed
into the destination operand.
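
  Treating the words as Q15 fixed point fractions, "pmulhrsw" is a rounded
multiplication. The C sketch below (hypothetical name) follows the usual
description of the operation: shift the 32-bit product right by 14, add 1, and
keep bits 16 to 1. It assumes that the compiler performs arithmetic right
shifts of negative values, as common compilers do.

```c
#include <stdint.h>

/* One word lane of "pmulhrsw". */
int16_t pmulhrsw_lane(int16_t a, int16_t b)
{
    int32_t t = ((int32_t)a * b) >> 14;  /* arithmetic shift assumed */
    t += 1;                              /* rounding */
    return (int16_t)(uint16_t)(t >> 1);  /* keep bits 16..1 */
}
```

  So 4000h * 4000h (0.5 * 0.5) gives 2000h (0.25), while 8000h * 8000h gives
8000h again, because +1.0 is not representable.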
  "pshufb" shuffles the bytes in the destination operand according to the
mask provided by the source operand: each byte of the source operand holds the
index of the byte of the original destination that is copied to the
corresponding position, and if the highest bit of a mask byte is set, the
corresponding byte of the result is zeroed.
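
  The following C sketch models the 128-bit form of "pshufb" (hypothetical
function name; the 64-bit MMX form uses only the low three bits of each index).
It builds the result in a separate buffer, because every result byte is taken
from the destination as it was before the instruction.

```c
#include <stdint.h>

/* Model of "pshufb dest,src" on 16 bytes. */
void pshufb_model(uint8_t d[16], const uint8_t s[16])
{
    uint8_t r[16];

    for (int i = 0; i < 16; i++)
        r[i] = (s[i] & 0x80) ? 0 : d[s[i] & 0x0F];  /* bit 7 set: zero */
    for (int i = 0; i < 16; i++)
        d[i] = r[i];
}
```

  A mask holding the bytes 0Fh, 0Eh and so on down to 00h therefore reverses
the byte order of the register.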
  "psignb", "psignw" and "psignd" operate on 8-bit, 16-bit or 32-bit integers
in the destination operand, depending on the signs of the values in the
source. If the value in the source is negative, the corresponding value in the
destination register is negated; if the value in the source is positive, no
operation is performed on the corresponding value; and if the value in the
source is zero, the value in the destination is zeroed, too.
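
  One word lane of "psignw" can be modelled in C as below (hypothetical
function name). The model also shows an edge case: negating -32768 yields
-32768 again, because only 16 bits of the result are kept.

```c
#include <stdint.h>

/* One word lane of "psignw dest,src". */
int16_t psignw_lane(int16_t d, int16_t s)
{
    if (s < 0)   /* negate, keeping only 16 bits */
        return (int16_t)(uint16_t)(0u - (uint16_t)d);
    if (s == 0)  /* zero */
        return 0;
    return d;    /* positive: unchanged */
}
```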
  "palignr" appends the source operand to the destination operand to form an
intermediate value of twice the size, and then extracts into the destination
register the 64 or 128 bits that are right-aligned to the byte offset
specified by the third operand, which should be an 8-bit immediate value. This
is the only SSSE3 instruction that takes three arguments.
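
  Here is a C sketch of the 128-bit form of "palignr" (hypothetical name; the
64-bit MMX form works the same way on 8-byte halves). The source is placed in
the low 16 bytes and the destination in the high 16 bytes of a 32-byte
temporary, which is then shifted right by the immediate count of bytes, so
counts of 32 or more produce zero.

```c
#include <stdint.h>
#include <string.h>

/* Model of "palignr dest,src,imm8" on 16-byte operands. */
void palignr_model(uint8_t d[16], const uint8_t s[16], unsigned imm)
{
    uint8_t tmp[32], r[16];

    memcpy(tmp, s, 16);         /* source: low half      */
    memcpy(tmp + 16, d, 16);    /* destination: high half */
    for (unsigned i = 0; i < 16; i++)
        r[i] = (imm + i < 32) ? tmp[imm + i] : 0;  /* shift right by imm bytes */
    memcpy(d, r, 16);
}
```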

2.1.18  AMD 3DNow! instructions

  The 3DNow! extension adds new instructions to the MMX instruction set
and introduces operations on 64-bit packed floating point values, each
consisting of two single precision floating point values.
  These instructions follow the same rules as the general MMX operations: the
destination operand should be an MMX register, the source operand can be an
MMX register or a 64-bit memory location. "pavgusb" computes the rounded
averages of packed unsigned bytes. "pmulhrw" performs a signed multiplication
of the packed words, rounds the high word of each double word result and
stores them in the destination operand. "pi2fd" converts packed double word
integers into packed floating point values. "pf2id" converts packed floating
point values into packed double word integers using truncation. "pi2fw"
converts packed word integers into packed floating point values; only the low
word of each double word in the source operand is used. "pf2iw" converts
packed floating point values to packed word integers; the results are extended
to double words using sign extension. "pfadd" adds packed floating point
values. "pfsub" and "pfsubr" subtract packed floating point values: the first
one subtracts the source values from the destination values, the second one
subtracts the destination values from the source values. "pfmul" multiplies
packed floating point values. "pfacc" adds the low and high floating point
values of the destination operand, storing the result in the low double word
of the destination, and adds the low and high floating point values of the
source operand, storing the result in the high double word of the destination.
"pfnacc" subtracts the high floating point value of the destination operand
from the low one, storing the result in the low double word of the
destination, and subtracts the high floating point value of the source
operand from the low one, storing the result in the high double word of the
destination. "pfpnacc" subtracts the high floating point value of the
destination operand from the low one, storing the result in the low double
word of the destination, and adds the low and high floating point values of
the source operand, storing the result in the high double word of the
destination. "pfmax" and "pfmin" compute the maximum and minimum of floating
point values. "pswapd" reverses the high and low double words of the source
operand. "pfrcp" returns estimates of the reciprocals of floating point values
from the source operand, and "pfrsqrt" returns estimates of the reciprocal
square roots of floating point values from the source operand. "pfrcpit1"
performs the first step in the Newton-Raphson iteration to refine the
reciprocal approximation produced by the "pfrcp" instruction, "pfrsqit1"
performs the first step in the Newton-Raphson iteration to refine the
reciprocal square root approximation produced by the "pfrsqrt" instruction,
and "pfrcpit2" performs the second and final step in the Newton-Raphson
iteration to refine either the reciprocal approximation or the reciprocal
square root approximation. "pfcmpeq", "pfcmpge" and "pfcmpgt" compare the
packed floating point values and set or clear all bits of the corresponding
data element in the destination operand according to the result of the
comparison: the first checks whether the values are equal, the second checks
whether the destination value is greater than or equal to the source value,
and the third checks whether the destination value is greater than the source
value.
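
  The three horizontal operations "pfacc", "pfnacc" and "pfpnacc" differ only
in which half adds and which half subtracts. The C sketch below models them on
two-element arrays, index 0 being the low double word (the function names are
hypothetical).

```c
/* "pfacc": both halves add. */
void pfacc_model(float d[2], const float s[2])
{
    float lo = d[0] + d[1], hi = s[0] + s[1];
    d[0] = lo; d[1] = hi;
}

/* "pfnacc": both halves compute low minus high. */
void pfnacc_model(float d[2], const float s[2])
{
    float lo = d[0] - d[1], hi = s[0] - s[1];
    d[0] = lo; d[1] = hi;
}

/* "pfpnacc": destination half subtracts, source half adds. */
void pfpnacc_model(float d[2], const float s[2])
{
    float lo = d[0] - d[1], hi = s[0] + s[1];
    d[0] = lo; d[1] = hi;
}
```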
  "prefetch" and "prefetchw" load the line of data from memory that contains
the byte specified with the operand into the data cache. The "prefetchw"
instruction should be used when the data in the cache line is expected to be
modified; otherwise the "prefetch" instruction should be used. The operand
should be an 8-bit memory location.
  "femms" performs a fast clear of the MMX state. This instruction has no
operands.

2.1.19  The x86-64 long mode instructions

  The AMD64 and EM64T architectures (we will use the common name x86-64 for
them both) extend the x86 instruction set for 64-bit processing. While the
legacy and compatibility modes use the same set of registers and instructions,
the new long mode extends the x86 operations to 64 bits and introduces several
new registers. You can turn on generating the code for this mode with the
"use64" directive.
  Each of the general purpose registers is extended to 64 bits, and eight
whole new general purpose registers and also eight new SSE registers are added.
See table 2.4 for the summary of new registers (only the ones that were not
listed in table 1.2). The general purpose registers of smaller sizes are the
low order portions of the larger ones. You can still access the "ah", "bh",
"ch" and "dh" registers in long mode, but you cannot use them in the same
instruction with any of the new registers.

  /--------------------------------------------------\
  | Type |          General          |  SSE  |  AVX  |
  |------|---------------------------|-------|-------|
  | Bits |  8   |  16  |  32  |  64  |  128  |  256  |
  |======|======|======|======|======|=======|=======|
  |      |      |      |      | rax  |       |       |
  |      |      |      |      | rcx  |       |       |
  |      |      |      |      | rdx  |       |       |
  |      |      |      |      | rbx  |       |       |
  |      | spl  |      |      | rsp  |       |       |
  |      | bpl  |      |      | rbp  |       |       |
  |      | sil  |      |      | rsi  |       |       |
  |      | dil  |      |      | rdi  |       |       |
  |      | r8b  | r8w  | r8d  | r8   | xmm8  | ymm8  |
  |      | r9b  | r9w  | r9d  | r9   | xmm9  | ymm9  |
  |      | r10b | r10w | r10d | r10  | xmm10 | ymm10 |
  |      | r11b | r11w | r11d | r11  | xmm11 | ymm11 |
  |      | r12b | r12w | r12d | r12  | xmm12 | ymm12 |
  |      | r13b | r13w | r13d | r13  | xmm13 | ymm13 |
  |      | r14b | r14w | r14d | r14  | xmm14 | ymm14 |
  |      | r15b | r15w | r15d | r15  | xmm15 | ymm15 |
  \--------------------------------------------------/

  In general, any instruction from the x86 architecture that allowed 16-bit or
32-bit operand sizes also allows 64-bit operands in long mode. The 64-bit
registers should be used for addressing in long mode; 32-bit addressing is
also allowed, but it's not possible to use addresses based on 16-bit
registers. Below are samples of new operations possible in long mode, using
the "mov" instruction as an example:

    mov al,[rbx] ; transfer memory addressed by 64-bit register

  Long mode also uses instruction pointer based (RIP-relative) addresses. You
can specify them manually with the special RIP register symbol, but such
addressing is also generated automatically by flat assembler, since there is
no 64-bit absolute addressing in long mode. You can still force the assembler
to use 32-bit absolute addressing by putting the "dword" size override for the
address inside the square brackets. There is one exception where 64-bit
absolute addressing is possible: the "mov" instruction with one of the
operands being the accumulator register and the other being a memory operand.
To force the assembler to use 64-bit absolute addressing there, use the
"qword" size operator for the address inside the square brackets. When no size
operator is applied to the address, the assembler generates the optimal form
automatically.

    mov [dword 0],r15d ; absolute 32-bit addressing
    mov [0],rsi        ; automatic RIP-relative addressing
    mov [rip+3],sil    ; manual RIP-relative addressing

  Also, as immediate operands for 64-bit operations only signed 32-bit values
are possible, with the only exception being the "mov" instruction with a
64-bit general purpose register as the destination operand. Trying to force a
64-bit immediate with any other instruction will cause an error.
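
  Whether a constant can be used as an immediate for a 64-bit operation comes
down to whether it survives sign extension from 32 bits. The check below is a
C sketch with a hypothetical function name, not part of flat assembler.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if value can be encoded as a sign-extended 32-bit immediate. */
bool fits_simm32(int64_t value)
{
    return value >= INT32_MIN && value <= INT32_MAX;
}
```

  A constant such as 80000000h fails this check, so it has to be placed in a
64-bit register with "mov" and used from there.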
  If any operation writes its result to a 32-bit general register in long
mode, the upper 32 bits of the 64-bit register containing it are filled with
zeros. This is unlike the operations on the 16-bit or 8-bit portions of those
registers, which preserve the upper bits.
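
  These register update rules can be modelled in C as below (hypothetical
function names; the 64-bit integer stands for a register such as RAX, and the
8-bit case models its lowest byte, such as AL).

```c
#include <stdint.h>

/* Writing a 32-bit register clears the upper half of the 64-bit one. */
uint64_t reg_write32(uint64_t reg, uint32_t value)
{
    (void)reg;                  /* old contents are discarded entirely */
    return (uint64_t)value;
}

/* Writing a 16-bit or low 8-bit portion keeps all the other bits. */
uint64_t reg_write16(uint64_t reg, uint16_t value)
{
    return (reg & ~(uint64_t)0xFFFF) | value;
}

uint64_t reg_write8(uint64_t reg, uint8_t value)
{
    return (reg & ~(uint64_t)0xFF) | value;
}
```

  So "mov eax,-1" leaves RAX equal to 00000000FFFFFFFFh, while "mov ax,-1"
keeps the upper 48 bits of RAX.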
  Three new type conversion instructions are available. "cdqe" sign extends
the double word in EAX into a quad word and stores the result in the RAX
register. "cqo" sign extends the quad word in RAX into a double quad word and
stores the extra bits in the RDX register. These instructions have no
operands. "movsxd" sign extends the double word source operand, being either a
32-bit register or memory, into a 64-bit destination operand, which has to be
a register. No analogous instruction is needed for zero extension, since it is
done automatically by any operation writing to a 32-bit register, as noted in
the previous paragraph. And the "movzx" and "movsx" instructions, conforming
to the general rule, can be used with a 64-bit destination operand, allowing
extension of byte or word values into quad words.
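
  The two implicit conversions can be sketched in C as follows (hypothetical
function names; the 64-bit integers stand for the RAX and RDX registers). The
conversion to int32_t assumes the two's complement behaviour of common
compilers.

```c
#include <stdint.h>

/* "cdqe": sign extend EAX (low 32 bits of RAX) into RAX. */
uint64_t cdqe_model(uint64_t rax)
{
    return (uint64_t)(int64_t)(int32_t)(uint32_t)rax;
}

/* "cqo": RDX becomes all copies of the sign bit of RAX,
   so RDX:RAX holds the sign-extended 128-bit value. */
uint64_t cqo_rdx(uint64_t rax)
{
    return (rax >> 63) ? UINT64_MAX : 0;
}
```

  "cqo" is typically placed before a signed "idiv" with a 64-bit operand,
which divides the 128-bit value in RDX:RAX.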
  All the binary arithmetic and logical instructions have been promoted to
allow 64-bit operands in long mode. The use of decimal arithmetic instructions
in long mode is prohibited.
2249
  The stack operations, like "push" and "pop" in long mode default to 64-bit
2250
operands and it's not possible to use 32-bit operands with them. The "pusha"
2251
and "popa" are disallowed in long mode.
2252
  The indirect near jumps and calls in long mode default to 64-bit operands
2253
and it's not possible to use the 32-bit operands with them. On the other hand,
2254
the indirect far jumps and calls allow any operands that were allowed by the
2255
x86 architecture and also 80-bit memory operand is allowed (though only EM64T
2256
seems to implement such variant), with the first eight bytes defining the
2257
offset and two last bytes specifying the selector. The direct far jumps and
2258
calls are not allowed in long mode.
2259
  The I/O instructions, "in", "out", "ins" and "outs" are the exceptional
2260
instructions that are not extended to accept quad word operands in long mode.
2261
But all other string operations are, and there are new short forms "movsq",
2262
"cmpsq", "scasq", "lodsq" and "stosq" introduced for the variants of string
2263
operations for 64-bit string elements. The RSI and RDI registers are used by
2264
default to address the string elements.
2265
  The "lfs", "lgs" and "lss" instructions are extended to accept 80-bit source
2266
memory operand with 64-bit destination register (though only EM64T seems to
2267
implement such variant). The "lds" and "les" are disallowed in long mode.
2268
  The system instructions like "lgdt" which required the 48-bit memory operand,
2269
in long mode require the 80-bit memory operand.
2270
  The "cmpxchg16b" is the 64-bit equivalent of "cmpxchg8b" instruction, it uses
2271
the double quad word memory operand and 64-bit registers to perform the
2272
analoguous operation.
2273
  The "fxsave64" and "fxrstor64" are new variants of "fxsave" and "fxrstor"
2274
instructions, available only in long mode, which use a different format of
2275
storage area in order to store some pointers in full 64-bit size.
2276
  "swapgs" is the new instruction, which swaps the contents of GS register and
2277
the KernelGSbase model-specific register (MSR address 0C0000102h).
2278
  "syscall" and "sysret" is the pair of new instructions that provide the
2279
functionality similar to "sysenter" and "sysexit" in long mode, where the
2280
latter pair is disallowed. The "sysexitq" and "sysretq" mnemonics provide the
2281
64-bit versions of "sysexit" and "sysret" instructions.
2282
  The "rdmsrq" and "wrmsrq" mnemonics are the 64-bit variants of the "rdmsr"
2283
and "wrmsr" instructions.

Intel designed two of them, SSE4.1 and SSE4.2, with the latter extending the
former into Intel's full SSE4 set. On the other hand, the implementation by
AMD includes only a few instructions from this set, but also contains some
additional instructions, which are called the SSE4a set.
  The SSE4.1 instructions mostly follow the same rules for operands as the
basic SSE operations. They require the destination operand to be an SSE
register and the source operand to be a 128-bit memory location or an SSE
register, and some operations require a third operand, an 8-bit immediate
value.
  "pmulld" performs a signed multiplication of the packed double words and
stores the low double words of the results in the destination operand.
"pmuldq" performs two signed multiplications of the low double words of the
corresponding quad words in the operands, and stores the results as packed
quad words into the destination register. "pminsb" and "pmaxsb" return the
minimum or maximum values of packed signed bytes. "pminuw" and "pmaxuw" return
the minimum or maximum values of packed unsigned words. "pminud", "pmaxud",
"pminsd" and "pmaxsd" return the minimum or maximum values of packed unsigned
or signed double words. These instructions complement the instructions
computing packed minimum or maximum introduced by SSE.
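  "pmulld" and "pmuldq" differ in which double words take part and in how much
of each product is kept. The following C sketch models this difference. The
arrays stand for the packed elements of an SSE register, with element 0 the
lowest, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* pmulld: four signed 32x32-bit products, keeping the low 32 bits of each. */
static void pmulld(int32_t dst[4], const int32_t src[4])
{
    for (int i = 0; i < 4; i++) {
        int64_t full = (int64_t)dst[i] * src[i];
        dst[i] = (int32_t)(uint32_t)full;   /* truncate to the low double word */
    }
}

/* pmuldq: signed products of double words 0 and 2 (the low double word of
   each quad word), stored as two full 64-bit results. */
static void pmuldq(int64_t result[2], const int32_t dst[4], const int32_t src[4])
{
    result[0] = (int64_t)dst[0] * src[0];
    result[1] = (int64_t)dst[2] * src[2];
}
```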
  "ptest" sets the ZF flag to one when the result of bitwise AND of both
operands is zero, and clears ZF otherwise. It also sets the CF flag to one
when the result of bitwise AND of the source operand with the bitwise NOT of
the destination operand is zero, and clears CF otherwise. "pcmpeqq" compares
packed quad words for equality, and fills the corresponding elements of the
destination operand with either ones or zeros, depending on the result of
the comparison.
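  The two flags computed by "ptest" can be modelled in C. This sketch holds a
128-bit operand as two 64-bit halves, and the type and function names are
hypothetical. Under this rule, CF is set exactly when every bit that is set in
the source is also set in the destination.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint64_t lo, hi; } xmm_t;   /* 128-bit value as two halves */
typedef struct { bool zf, cf; } flags_t;

/* Model of "ptest dst,src". */
static flags_t ptest(xmm_t dst, xmm_t src)
{
    flags_t f;
    /* ZF: (dst AND src) is all zero */
    f.zf = ((dst.lo & src.lo) | (dst.hi & src.hi)) == 0;
    /* CF: (src AND NOT dst) is all zero */
    f.cf = ((src.lo & ~dst.lo) | (src.hi & ~dst.hi)) == 0;
    return f;
}
```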
  "packusdw" converts the packed signed double words from both the source and
destination operands into unsigned words using saturation, and stores the
eight resulting word values into the destination register.
  "phminposuw" finds the minimum unsigned word value in the source operand and
places it into the lowest word of the destination operand. It stores the
index of that word (0-7) in bits 16-18 and sets the remaining upper bits of
the destination to zero.
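  The following C sketch models "packusdw" saturation and the result layout of
"phminposuw". Element 0 is the lowest, and the function names are
hypothetical. When several words share the minimum value, the model reports
the lowest index, which matches the usual description of the instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned saturation of a signed double word into a word. */
static uint16_t sat_u16(int32_t v)
{
    return v < 0 ? 0 : v > 0xFFFF ? 0xFFFF : (uint16_t)v;
}

/* packusdw: words 0-3 come from the destination, words 4-7 from the source. */
static void packusdw(uint16_t out[8], const int32_t dst[4], const int32_t src[4])
{
    for (int i = 0; i < 4; i++) {
        out[i] = sat_u16(dst[i]);
        out[i + 4] = sat_u16(src[i]);
    }
}

/* phminposuw: returns the low double word of the result, with the minimum
   in bits 0-15 and its index in bits 16-18 (all higher bits are zero). */
static uint32_t phminposuw(const uint16_t src[8])
{
    unsigned idx = 0;
    for (unsigned i = 1; i < 8; i++)
        if (src[i] < src[idx])          /* strict: first minimum wins */
            idx = i;
    return ((uint32_t)idx << 16) | src[idx];
}
```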
  "roundps", "roundss", "roundpd" and "roundsd" round packed or individual
floating point values of single or double precision, using the rounding mode
specified by the third operand.

values, that is, it multiplies the corresponding pairs of values from the
source and destination operands and then sums the products. The high four
bits of the 8-bit immediate third operand control which products are
calculated and added to the sum. The low four bits control into which elements
of the destination the resulting dot product is copied (the other elements
are filled with zero).
"dppd" calculates the dot product of packed double precision floating point
values. Bits 4 and 5 of the third operand control which products are
calculated and added, and bits 0 and 1 of this value control which elements
in the destination register are filled with the result. "mpsadbw" calculates
multiple sums of absolute differences of unsigned bytes. The third operand
uses its bits 0-1 to select which four-byte block of the source operand is
used to calculate the absolute differences. Its bit 2 selects at which of the
first two four-byte blocks of the destination operand the calculation of the
multiple sums starts. Each sum is calculated from four absolute differences
between the corresponding unsigned bytes in the source and destination block.
Each next sum is calculated in the same way, but taking the four bytes from
the destination at the position one byte after the position of the previous
block. The four bytes from the source stay the same each time. This way eight
sums of absolute differences are calculated and stored as packed word values
into the destination operand. The instructions described in this paragraph
follow the same rules for operands as the "roundps" instruction.
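  The immediate-controlled operation of "dpps" and "mpsadbw" is perhaps easiest
to follow in code. The C sketch below models both, with element 0 the lowest
and hypothetical function names. It adds the selected products in index
order, so for values that are not exactly representable, its rounding may
differ from the hardware, which uses its own summation order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* dpps: imm bits 4-7 choose which products are summed, bits 0-3 choose
   which destination elements receive the sum (the others become zero). */
static void dpps(float dst[4], const float src[4], uint8_t imm)
{
    float sum = 0.0f;
    for (int i = 0; i < 4; i++)
        if (imm & (0x10 << i))
            sum += dst[i] * src[i];
    for (int i = 0; i < 4; i++)
        dst[i] = (imm & (1 << i)) ? sum : 0.0f;
}

/* mpsadbw: imm bits 0-1 pick the source block (byte offset 0, 4, 8 or 12),
   bit 2 picks the starting offset in the destination (0 or 4). Eight sums
   are taken over a four-byte window of the destination moving by one byte. */
static void mpsadbw(uint16_t out[8], const uint8_t dst[16],
                    const uint8_t src[16], uint8_t imm)
{
    const uint8_t *s = src + 4 * (imm & 3);
    const uint8_t *d = dst + 4 * ((imm >> 2) & 1);
    for (int i = 0; i < 8; i++) {
        uint16_t sum = 0;
        for (int k = 0; k < 4; k++)
            sum += (uint16_t)abs(d[i + k] - s[k]);
        out[i] = sum;
    }
}
```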
  "blendps", "blendvps", "blendpd" and "blendvpd" conditionally copy the
values from the source operand into the destination operand, depending on
the bits of the mask provided by the third operand. If a mask bit is set, the
corresponding element of the source is copied into the same place in the
destination. Otherwise, that position in the destination is left unchanged.
The rules for the first two operands are the same as for general SSE
instructions. "blendps" and "blendpd" need the third operand to be an 8-bit
immediate, and they operate on single or double precision values,
respectively. "blendvps" and "blendvpd" require the third operand to be the
XMM0 register, and the mask bit for each element is the most significant bit
of the corresponding element of XMM0.

destination, depending on the bits of the mask provided by the third operand,
which needs to be an 8-bit immediate value. "pblendvb" conditionally copies
byte elements from the source operand into the destination, depending on the
mask defined by the third operand, which has to be the XMM0 register. These
instructions follow the same rules for operands as the "blendps" and
"blendvps" instructions, respectively.
  "insertps" inserts a single precision floating point value, taken from the
position in the source operand specified by bits 6-7 of the third operand,
into the location in the destination register selected by bits 4-5 of the
third operand. Additionally, the low four bits of the third operand control
which elements in the destination register will be set to zero. The first
two operands follow the same rules as for the general SSE operations, and the
third operand should be an 8-bit immediate.
  "extractps" extracts a single precision floating point value from the
location in the source operand specified by the low two bits of the third
operand, and stores it into the destination operand. The destination can be
a 32-bit memory location or a general purpose register, the source operand
must be an SSE register, and the third operand should be an 8-bit immediate
value.
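  The way "insertps" and "extractps" use their immediate operand can be
sketched in C. Element 0 is the lowest, and the function names are
hypothetical. The model covers only the register-source form of "insertps".
With a memory source, a single 32-bit value is loaded and bits 6-7 play no
role.

```c
#include <assert.h>
#include <stdint.h>

/* insertps (register source): bits 6-7 select the source element, bits 4-5
   the destination position, and bits 0-3 form a zeroing mask applied last. */
static void insertps(float dst[4], const float src[4], uint8_t imm)
{
    dst[(imm >> 4) & 3] = src[(imm >> 6) & 3];
    for (int i = 0; i < 4; i++)
        if (imm & (1 << i))
            dst[i] = 0.0f;
}

/* extractps: the low two bits of the immediate select the element. */
static float extractps(const float src[4], uint8_t imm)
{
    return src[imm & 3];
}
```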

the source operand into the location of the destination operand determined by
the third operand. The destination operand has to be an SSE register. The
source operand can be a memory location of appropriate size, or a 32-bit
general purpose register (but a 64-bit general purpose register for "pinsrq",
which is only available in long mode). The third operand has to be an 8-bit
immediate value. These instructions complement the "pinsrw" instruction
operating on an SSE register destination, which was introduced by SSE2.

quad word from the location in the source operand specified by the third
operand into the destination. The source operand should be an SSE register,
and the third operand should be an 8-bit immediate. The destination operand
can be a memory location of appropriate size, or a 32-bit general purpose
register (but a 64-bit general purpose register for "pextrq", which is only
available in long mode). The "pextrw" instruction with an SSE register as
source was already introduced by SSE2, but SSE4 extends it to allow a memory
operand as destination.

byte values from the source operand into packed word values in the destination
operand, which has to be an SSE register. The source can be 64-bit memory or
an SSE register; when it is a register, only its low portion is used.
"pmovsxbd" and "pmovzxbd" perform sign extension or zero extension of the four
byte values from the source operand into packed double word values in the
destination operand; the source can be 32-bit memory or an SSE register.
"pmovsxbq" and "pmovzxbq" perform sign extension or zero extension of the two
byte values from the source operand into packed quad word values in the
destination operand; the source can be 16-bit memory or an SSE register.
"pmovsxwd" and "pmovzxwd" perform sign extension or zero extension of the four
word values from the source operand into packed double words in the
destination operand; the source can be 64-bit memory or an SSE register.
"pmovsxwq" and "pmovzxwq" perform sign extension or zero extension of the two
word values from the source operand into packed quad words in the destination
operand; the source can be 32-bit memory or an SSE register. "pmovsxdq" and
"pmovzxdq" perform sign extension or zero extension of the two double word
values from the source operand into packed quad words in the destination
operand; the source can be 64-bit memory or an SSE register.

    pmovsxwq xmm0,xmm1       ; sign-extend words to quad words

using a non-temporal hint. The destination operand should be an SSE register,
and the source operand should be a 128-bit memory location.
  The SSE4.2 set, described below, adds not only some new operations on SSE
registers, but also some completely new instructions that operate on
general purpose registers only.
  "pcmpistri" compares two zero-terminated (implicit length) strings provided
in its source and destination operands and generates an index stored in ECX.
"pcmpistrm" performs the same comparison and generates a mask stored in XMM0.
"pcmpestri" compares two strings of explicit lengths, with the length provided
in EAX for the destination operand and in EDX for the source operand, and
generates an index stored in ECX. "pcmpestrm" performs the same comparison
and generates a mask stored in XMM0. The source and destination operands
follow the same rules as for general SSE instructions. The third operand
should be an 8-bit immediate value determining the details of the performed
operation; refer to Intel documentation for information on those details.
  "pcmpgtq" compares packed signed quad words, and fills the corresponding
elements of the destination operand with either ones or zeros, depending on
whether the value in the destination is greater than the one in the source.
This instruction follows the same rules for operands as "pcmpeqq".
  "crc32" accumulates a CRC32 value for the source operand, starting with the
initial value provided by the destination operand, and stores the result in
the destination. Outside long mode, the destination operand should be a
32-bit general purpose register, and the source operand can be a byte, word,
or double word register or memory location. In long mode the destination
operand can also be a 64-bit general purpose register, and the source operand
in such a case can be a byte or quad word register or memory location.

    crc32 eax,word [ebx]  ; accumulate CRC32 on word value
    crc32 rax,qword [rbx] ; accumulate CRC32 on quad word value
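
  The "crc32" instruction uses the CRC-32C (Castagnoli) polynomial 11EDC6F41h
and performs no initial or final inversion by itself. The bit-at-a-time C
model below is a sketch with hypothetical function names. The wider forms
give the same result as feeding their bytes in from the lowest one up, as they
are laid out in little-endian memory. With the common convention of starting
from 0FFFFFFFFh and inverting the final value, the ASCII string "123456789"
yields the standard check value E3069283h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One byte step: CRC-32C in bit-reflected form (polynomial 82F63B78h). */
static uint32_t crc32_u8(uint32_t crc, uint8_t byte)
{
    crc ^= byte;
    for (int k = 0; k < 8; k++)
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    return crc;
}

/* Accumulate over a buffer, lowest address first; no inversion is applied. */
static uint32_t crc32_buf(uint32_t crc, const uint8_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        crc = crc32_u8(crc, p[i]);
    return crc;
}
```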

be a 16-bit, 32-bit, or 64-bit general purpose register or memory location,
and stores this count in the destination operand, which has to be a register
of the same size as the source operand. The 64-bit variant is available only
in long mode.

by SSE4.2, also adds the "lzcnt" instruction, which follows the same syntax
and calculates the count of leading zero bits in the source operand (if the
source operand is all zero bits, the total number of bits in the source
operand is stored in the destination).
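  The following C sketch shows what "popcnt" and "lzcnt" compute. The operand
width is passed explicitly, so that the all-zero case of "lzcnt" returns the
size of the operand. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* popcnt: number of bits set (the value is assumed to fit the operand). */
static unsigned popcnt(uint64_t v)
{
    unsigned n = 0;
    while (v) {
        v &= v - 1;        /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* lzcnt for a 16-, 32- or 64-bit operand; an all-zero value yields width. */
static unsigned lzcnt(uint64_t v, unsigned width)
{
    unsigned n = 0;
    for (int bit = (int)width - 1; bit >= 0 && !((v >> bit) & 1); bit--)
        n++;
    return n;
}
```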
  "extrq" extracts a sequence of bits from the low quad word of the SSE
register provided as the first operand and stores them at the low end of this
register, filling the remaining bits in the low quad word with zeros. The
position of the bit string and its length can be provided either with two
8-bit immediate values as the second and third operands, or by an SSE register
as the second operand (with no third operand in that case). That register
should contain the position value in bits 8-13 and the length of the bit
string in bits 0-5.

    extrq xmm0,xmm5       ; extract bits defined by register

operand into the specified position in the low quad word of the destination
operand, leaving the other bits in the low quad word of the destination
intact. The position where the bits should be written and the length of the
bit string can be provided either with two 8-bit immediate values as the
third and fourth operands, or by bit fields in the source operand (with only
two operands in that case). In that form, the source operand should contain
the position value in bits 72-77 and the length of the bit string in bits
64-69.

    insertq xmm1,xmm0     ; insert bits defined by register
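
  The bit-field operations of "extrq" and "insertq" on the low quad word can
be sketched in C. The function names are hypothetical. The sketch only
accepts lengths from 1 to 63, with the field lying entirely within the quad
word, and leaves other cases out rather than guessing at them. The register
form reads the same two fields from a control quad word. For "insertq", that
control quad word is the high quad word of the source.

```c
#include <assert.h>
#include <stdint.h>

/* Mask of the low len bits (1 <= len <= 63 in this sketch). */
static uint64_t low_mask(unsigned len)
{
    return (UINT64_C(1) << len) - 1;
}

/* extrq: len bits starting at bit idx move to bit 0, the rest are zeroed. */
static uint64_t extrq(uint64_t lo, unsigned len, unsigned idx)
{
    assert(len >= 1 && len <= 63 && idx + len <= 64);
    return (lo >> idx) & low_mask(len);
}

/* insertq: the low len bits of src are written at bit idx of dst. */
static uint64_t insertq(uint64_t dst, uint64_t src, unsigned len, unsigned idx)
{
    assert(len >= 1 && len <= 63 && idx + len <= 64);
    uint64_t m = low_mask(len) << idx;
    return (dst & ~m) | ((src << idx) & m);
}

/* Register form of extrq: length in bits 0-5, position in bits 8-13. */
static uint64_t extrq_reg(uint64_t lo, uint64_t ctl)
{
    return extrq(lo, (unsigned)(ctl & 63), (unsigned)((ctl >> 8) & 63));
}
```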

value from the source SSE register into a 32-bit or 64-bit destination memory
location respectively, using a non-temporal hint.

of SSE instructions, with a new encoding scheme that allows an extended syntax
with a destination operand separate from all the source operands. It also
introduces 256-bit AVX registers, which extend the old 128-bit SSE registers.
Any AVX instruction that puts a result into an SSE register puts zero bits
into the high portion of the AVX register containing it.
  The mnemonic of the AVX version of an SSE instruction is obtained by
prefixing the SSE instruction name with "v". For any SSE arithmetic
instruction whose destination operand was also used as one of the source
values, the AVX variant has a new syntax with three operands: the destination
and two sources. The destination and first source can be SSE registers, and
the second source can be an SSE register or memory. If the operation is
performed on a single pair of values, the remaining bits of the first source
SSE register are copied into the destination register.

    vmulsd xmm0,xmm7,qword [esi]  ; multiply two 64-bit floats

data size when the AVX registers are specified instead of SSE registers, and
the size of the memory operand is then also doubled.

that earlier had been promoted from MMX to SSE) also acquired the new syntax
with three operands. However, they are only allowed to operate on 128-bit
packed types and thus cannot use the whole AVX registers.

    vpslld xmm1,xmm0,1            ; shift double words left

one being an immediate value, the AVX version of such an instruction takes
four operands, with the immediate remaining the last one.

    vpalignr xmm0,xmm4,xmm2,3        ; extract byte aligned value

applied to all the instructions from SSE extensions up to SSE4, with the
exceptions described below.
  The "vdppd" instruction has its syntax extended to four operands, but it
does not have a 256-bit version.
  There are a few instructions, namely "vsqrtpd", "vsqrtps", "vrcpps" and
"vrsqrtps", which can operate on 256-bit data, but retain the syntax with
only two operands, because they use data from only one source:

operands, the last one being an immediate value.

three-operand syntax when promoted to AVX versions. In such a case these
instructions follow exactly the same rules for operands as their SSE
counterparts (since operations on packed integers do not have 256-bit variants
in the AVX extension). These include "vpcmpestri", "vpcmpestrm", "vpcmpistri",
"vpcmpistrm", "vphminposuw", "vpshufd", "vpshufhw" and "vpshuflw". Some more
instructions keep, in their AVX versions, exactly the same operand syntax as
in SSE, without any additional options: "vcomiss",
"vcomisd", "vcvtss2si", "vcvtsd2si", "vcvttss2si", "vcvttsd2si", "vextractps",
"vpextrb", "vpextrw", "vpextrd", "vpextrq", "vmovd", "vmovq", "vmovntdqa",
"vmaskmovdqu", "vpmovmskb", "vpmovsxbw", "vpmovsxbd", "vpmovsxbq", "vpmovsxwd",
"vpmovsxwq", "vpmovsxdq", "vpmovzxbw", "vpmovzxbd", "vpmovzxbq", "vpmovzxwd",
"vpmovzxwq" and "vpmovzxdq".
  Most of the move and conversion instructions have been promoted to allow
256-bit operands, in addition to the 128-bit variant, whose syntax is
identical to that of the SSE version of the same instruction. Each of
"vcvtdq2ps", "vcvtps2dq", "vcvttps2dq", "vmovaps", "vmovapd", "vmovups",
"vmovupd", "vmovdqa", "vmovdqu", "vlddqu", "vmovntps", "vmovntpd", "vmovntdq",
"vmovsldup", "vmovshdup", "vmovmskps" and "vmovmskpd" inherits the 128-bit
syntax from SSE without any changes, and also allows a new form with 256-bit
operands in place of 128-bit ones.

has a 256-bit version, which stores the duplicates of the lowest quad word
from the source operand in the lower half of the destination operand. In the
upper half of the destination, it stores the duplicates of the low quad word
from the upper half of the source. Both source and destination operands then
need to be 256-bit values.
  "vmovlhps" and "vmovhlps" have only 128-bit versions, and each takes three
operands, which must all be SSE registers. "vmovlhps" copies two single
precision values from the low quad word of the second source register to the
high quad word of the destination register, and copies the low quad word of
the first source register into the low quad word of the destination register.
"vmovhlps" copies two single precision values from the high quad word of the
second source register to the low quad word of the destination register, and
copies the high quad word of the first source register into the high quad
word of the destination register.
  "vmovlps", "vmovhps", "vmovlpd" and "vmovhpd" have only 128-bit versions, and
their syntax depends on whether the memory operand is the destination or the
source. When memory is the destination, the syntax is identical to that of
the equivalent SSE instruction. When memory is the source, the instruction
requires three operands, the first two being SSE registers and the third
64-bit memory. The value put into the destination is then the value copied
from the first source with either the low or the high quad word replaced by
the value from the second source (the memory operand).

    vmovlps xmm0,xmm7,[ebx]  ; low from memory, rest from register

as one of the operands is memory, while the versions that operate purely on
registers require three operands (each being an SSE register). The value
stored in the destination is then the value copied from the first source with
the lowest data element replaced by the lowest value from the second source.

    vmovss xmm0,xmm1,xmm2    ; one value from xmm2, three from xmm1

syntax, where the destination and first source are always SSE registers, and
the second source follows the same rules as the source in the syntax of the
equivalent SSE instruction. The value stored in the destination is then the
value copied from the first source with the lowest data element replaced by
the result of the conversion.

    vcvtsi2ss xmm0,xmm0,rax  ; 64-bit integer to 32-bit float

plus the new variants with an AVX register as destination and an SSE register
or 128-bit memory as source. Analogously, "vcvtpd2dq", "vcvttpd2dq" and
"vcvtpd2ps", in addition to the variant with syntax identical to the SSE
version, allow a variant with an SSE register as destination and an AVX
register or 256-bit memory as source.
  "vinsertps", "vpinsrb", "vpinsrw", "vpinsrd", "vpinsrq" and "vpblendw" use
a syntax with four operands. The destination and first source have to be SSE
registers, and the third and fourth operands follow the same rules as the
second and third operands in the syntax of the equivalent SSE instruction.
The value stored in the destination is the value copied from the first source
with some data elements replaced by values extracted from the second source,
analogously to the operation of the corresponding SSE instruction.

operands: the destination, two sources and a mask, where the second source can
also be a memory operand. "vblendvps" and "vblendvpd" have a 256-bit variant,
whose operands are AVX registers or 256-bit memory, as well as a 128-bit
variant, whose operands are SSE registers or 128-bit memory. "vpblendvb" has
only a 128-bit variant. The value stored in the destination is the value
copied from the first source with some data elements replaced, according to
the mask, by values from the second source.

version, with both operands doubled in size. There are also two new
instructions, "vtestps" and "vtestpd", which perform analogous tests, but only
of the sign bits of the corresponding single precision or double precision
values, and set ZF and CF accordingly. They follow the same syntax rules as
"vptest".

    vtestpd xmm0,xmm1        ; test sign bits of 64-bit floats

which broadcast the data element defined by the source operand into all
elements of corresponding size in the destination register. "vbroadcastss"
needs the source to be 32-bit memory and the destination to be either an SSE
or an AVX register. "vbroadcastsd" requires 64-bit memory as source and an AVX
register as destination. "vbroadcastf128" requires 128-bit memory as source
and an AVX register as destination.

destination and first source have to be AVX registers, the second source can
be an SSE register or a 128-bit memory location, and the fourth operand should
be an immediate value. It stores in the destination the value obtained by
taking the contents of the first source and replacing one of its 128-bit
units with the value of the second source. The lowest bit of the fourth
operand specifies at which position that replacement is done (either 0 or 1).
  "vextractf128" is a new instruction with three operands. The destination
needs to be an SSE register or a 128-bit memory location, the source must be
an AVX register, and the third operand should be an immediate value. It
extracts one of the 128-bit units from the source into the destination. The
lowest bit of the third operand specifies which unit is extracted.
  "vmaskmovps" and "vmaskmovpd" are new instructions with three operands that
selectively store the elements from the second source in the destination,
depending on the sign bits of the corresponding elements from the first
source. These instructions can operate on either 128-bit data (SSE registers)
or 256-bit data (AVX registers). Either the destination or the second source
has to be a memory location of appropriate size, and the two other operands
should be registers.

    vmaskmovpd ymm5,ymm0,[esi]  ; conditionally load
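
  The following C sketch models the masked load and store performed by
"vmaskmovps". The function names are hypothetical, and n is 4 for SSE or 8 for
AVX registers. When loading, the elements whose mask sign bit is clear are set
to zero. When storing, the corresponding memory locations are left untouched.

```c
#include <assert.h>
#include <stdint.h>

/* Masked load: element i is read when the sign bit of mask[i] is set,
   otherwise it becomes zero. */
static void maskload_ps(float dst[], const int32_t mask[], const float mem[], int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = (mask[i] < 0) ? mem[i] : 0.0f;
}

/* Masked store: memory is written only where the sign bit is set. */
static void maskstore_ps(float mem[], const int32_t mask[], const float src[], int n)
{
    for (int i = 0; i < n; i++)
        if (mask[i] < 0)
            mem[i] = src[i];
}
```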

that permute the values from the first source according to the control fields
from the second source and put the result into the destination operand. They
accept either three SSE registers or three AVX registers as operands, and the
second source can be memory of the same size as the registers used. In an
alternative form, the second source can be an immediate value, and then the
first source can be a memory location of the same size as the destination
register.
  "vperm2f128" is a new instruction with four operands, which selects 128-bit
blocks of floating point data from the first and second sources according to
the bit fields of the fourth operand, and stores them in the destination.
The destination and first source need to be AVX registers, the second source
can be an AVX register or a 256-bit memory area, and the fourth operand should
be an immediate value.
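  The selection performed by "vperm2f128" can be sketched in C, with a 128-bit
block held as a pair of quad words. The type and function names are
hypothetical. Each four-bit half of the immediate controls one half of the
destination. Its low two bits pick a block (0-1 from the first source, 2-3
from the second), and its bit 3 forces the block to zero.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint64_t q[2]; } lane_t;   /* one 128-bit block */

/* Decode one four-bit control field. */
static lane_t pick(const lane_t a[2], const lane_t b[2], unsigned ctl)
{
    lane_t zero = {{0, 0}};
    if (ctl & 8)
        return zero;
    return (ctl & 2) ? b[ctl & 1] : a[ctl & 1];
}

/* Low nibble controls the low half, high nibble the high half. Both halves
   are selected before writing, so dst may share storage with a source. */
static void vperm2f128(lane_t dst[2], const lane_t a[2], const lane_t b[2], uint8_t imm)
{
    lane_t lo = pick(a, b, imm & 0x0F);
    lane_t hi = pick(a, b, imm >> 4);
    dst[0] = lo;
    dst[1] = hi;
}
```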

the upper 128-bit portions of all AVX registers to zero, leaving the SSE
registers intact. These new instructions take no operands.
  "vldmxcsr" and "vstmxcsr" are the AVX versions of the "ldmxcsr" and
"stmxcsr" instructions. The rules for their operands remain unchanged.

to use 256-bit data types, and introduces some new instructions as well.
  The AVX instructions that operate on packed integers and had only 128-bit
variants have been supplemented with 256-bit variants, and thus their syntax
rules became analogous to those of the AVX instructions operating on packed
floating point types.

    vpavgw ymm3,ymm0,ymm2    ; average of 16-bit integers

256-bit variants. "vpcmpestri", "vpcmpestrm", "vpcmpistri", "vpcmpistrm",
"vpextrb", "vpextrw", "vpextrd", "vpextrq", "vpinsrb", "vpinsrw", "vpinsrd",
"vpinsrq" and "vphminposuw" are not affected by AVX2 and allow only 128-bit
operands.
  The packed shift instructions, which allowed the third operand specifying
the shift amount to be an SSE register or a 128-bit memory location, use the
same rules for the third operand in their 256-bit variants.

    vpsrad ymm0,ymm3,xword [ebx] ; shift double words right

syntax, which shift each element of the first source by the amount specified
in the corresponding element of the second source, and store the results in
the destination. "vpsllvd" shifts 32-bit elements left, "vpsllvq" shifts
64-bit elements left, "vpsrlvd" shifts 32-bit elements right logically,
"vpsrlvq" shifts 64-bit elements right logically, and "vpsravd" shifts 32-bit
elements right arithmetically.
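  The following C sketch models the per-element shift performed by these
instructions. The function names are hypothetical. The count in each element
is treated as unsigned. A count at least as large as the element width gives
zero for the logical shifts, and for "vpsravd" it gives a copy of the sign bit
in every position.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t sllv32(uint32_t x, uint32_t n) { return n > 31 ? 0 : x << n; }
static uint32_t srlv32(uint32_t x, uint32_t n) { return n > 31 ? 0 : x >> n; }
static uint64_t sllv64(uint64_t x, uint64_t n) { return n > 63 ? 0 : x << n; }
static uint64_t srlv64(uint64_t x, uint64_t n) { return n > 63 ? 0 : x >> n; }

/* Arithmetic right shift of one element; large counts act like 31. Negative
   values are handled via the complement, to avoid relying on the
   implementation-defined behavior of >> on negative signed integers. */
static int32_t srav32(int32_t x, uint32_t n)
{
    if (n > 31)
        n = 31;
    if (x < 0)
        return ~(int32_t)(~(uint32_t)x >> n);
    return (int32_t)((uint32_t)x >> n);
}
```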
  The sign-extend and zero-extend instructions, which in their AVX versions
allowed the source operand to be an SSE register or memory of a specific size,
need, in the new 256-bit variant, memory of twice that size or an SSE register
as the source and an AVX register as the destination.

transfer a 256-bit value from memory to an AVX register, and it needs the
memory address to be aligned to 32 bytes.
  "vpmaskmovd" and "vpmaskmovq" are new instructions with syntax identical
to "vmaskmovps" or "vmaskmovpd", and they perform an analogous operation on
packed 32-bit or 64-bit values.
  "vinserti128", "vextracti128", "vbroadcasti128" and "vperm2i128" are new
instructions with syntax identical to "vinsertf128", "vextractf128",
"vbroadcastf128" and "vperm2f128" respectively, and they perform analogous
operations on 128-bit blocks of integer data.
  The "vbroadcastss" and "vbroadcastsd" instructions have been extended to
allow an SSE register as the source operand (which in AVX could only be
memory).
  "vpbroadcastb", "vpbroadcastw", "vpbroadcastd" and "vpbroadcastq" are new
instructions which broadcast the byte, word, double word or quad word from
the source operand into all elements of corresponding size in the destination
register. The destination operand can be either an SSE or an AVX register,
and the source operand can be an SSE register or memory of the same size as
the data element.

32-bit element from the first source as the index of the element in the second
source that is copied into the destination, at the position corresponding to
the element containing the index. The destination and first source have to be
AVX registers, and the second source can be an AVX register or 256-bit memory.
  "vpermq" and "vpermpd" are new three-operand instructions, which use 2-bit
indexes from the immediate value specified as the third operand to determine
which element of the source to store at a given position in the destination.
The destination has to be an AVX register, the source can be an AVX register
or 256-bit memory, and the third operand must be an 8-bit immediate value.
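  The two permutation schemes can be sketched in C, with element 0 the lowest
and hypothetical function names. "vpermd" uses only the low three bits of
each index element. The temporary array lets the destination share storage
with either source.

```c
#include <assert.h>
#include <stdint.h>

/* vpermd/vpermps: element i of the result is src[idx[i] mod 8]. */
static void vpermd(uint32_t dst[8], const uint32_t idx[8], const uint32_t src[8])
{
    uint32_t tmp[8];
    for (int i = 0; i < 8; i++)
        tmp[i] = src[idx[i] & 7];
    for (int i = 0; i < 8; i++)
        dst[i] = tmp[i];
}

/* vpermq/vpermpd: bits 2i..2i+1 of the immediate choose the source quad
   word for destination element i. */
static void vpermq(uint64_t dst[4], const uint64_t src[4], uint8_t imm)
{
    uint64_t tmp[4];
    for (int i = 0; i < 4; i++)
        tmp[i] = src[(imm >> (2 * i)) & 3];
    for (int i = 0; i < 4; i++)
        dst[i] = tmp[i];
}
```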
  The family of new instructions performing a "gather" operation has a
special syntax, as their memory operand uses an addressing mode that is
unique to them. The base of the address can be a 32-bit or 64-bit general
purpose register (the latter only in long mode), and the index (possibly
multiplied by a scale value, as in standard addressing) is specified by an SSE
or AVX register. It is possible to use only an index without a base, and any
numerical displacement can be added to the address. Each of these
instructions takes three operands. The first operand is the destination
register, the second is memory addressed with a vector index, and the third
is a register containing a mask. The most significant bit of each element of
the mask determines whether a value will be loaded from memory into the
corresponding element of the destination. The address of each element to
load is calculated from the corresponding element of the index register in
the memory operand, together with the given base and displacement. When the
index register contains fewer elements than the destination and mask
registers, the higher elements of the destination are zeroed. After a value
is successfully loaded, the corresponding element in the mask register is set
to zero. Elements whose mask bit was clear are zeroed as well, so the whole
mask is zero once the instruction completes. The destination, index and mask
should all be distinct registers; the same register may not be used in two
different roles.
  "vgatherdps" loads single precision floating point values addressed by
32-bit indexes. The destination, index and mask should all be registers of the
same type, either SSE or AVX. The data addressed by memory operand is 32-bit
in size.

    vgatherdps ymm0,[ebx+ymm7*4],ymm3  ; gather eight floats

  "vgatherqps" loads single precision floating point values addressed by
64-bit indexes. The destination and mask should always be SSE registers, while
index register can be either SSE or AVX register. The data addressed by memory
operand is 32-bit in size.

    vgatherqps xmm0,[ymm2+64],xmm3     ; gather four floats

  "vgatherdpd" loads double precision floating point values addressed by
32-bit indexes. The index register should always be SSE register, the
destination and mask should be two registers of the same type, either SSE or
AVX. The data addressed by memory operand is 64-bit in size.

    vgatherdpd ymm0,[xmm3*8],ymm5      ; gather four doubles

  "vgatherqpd" loads double precision floating point values addressed by
64-bit indexes. The destination, index and mask should all be registers of the
same type, either SSE or AVX. The data addressed by memory operand is 64-bit
in size.
  "vpgatherdd" and "vpgatherqd" load 32-bit values addressed by either 32-bit
or 64-bit indexes. They follow the same rules as "vgatherdps" and "vgatherqps"
respectively.
  "vpgatherdq" and "vpgatherqq" load 64-bit values addressed by either 32-bit
or 64-bit indexes. They follow the same rules as "vgatherdpd" and "vgatherqpd"
respectively.
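
  The per-element behavior of a masked gather can be sketched in C as shown
below. This is only a model under stated assumptions. The function name
"gather_dps_model" and its parameters are hypothetical. The base is passed as
a byte pointer so that the scale and displacement are counted in bytes. The
zeroing of higher destination elements when the index register is shorter is
not modeled. Elements whose mask bit is clear keep their previous value.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a masked gather such as "vgatherdps" with n
   elements. base, scale and disp play the roles of the base register,
   scale and displacement of the memory operand (all counted in bytes). */
void gather_dps_model(float *dst, const int32_t *index, uint32_t *mask,
                      int n, const uint8_t *base, int scale, int32_t disp)
{
    for (int i = 0; i < n; i++) {
        if (mask[i] & 0x80000000u) {          /* most significant bit set */
            const uint8_t *addr = base + (intptr_t)index[i] * scale + disp;
            memcpy(&dst[i], addr, sizeof(float));
            mask[i] = 0;                      /* element loaded: clear mask */
        }
        /* otherwise dst[i] keeps its previous contents */
    }
}
```

When the loop finishes, every mask element has been cleared. This matches the
rule that the mask element is set to zero after a successful load.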


2.1.23  Auxiliary sets of computational instructions

  There are a number of additional instruction set extensions related to
AVX. They introduce new vector instructions (and sometimes also their SSE
equivalents that use classic instruction encoding), and even some new
instructions operating on general registers that use the AVX-like encoding
allowing the extended syntax with separate destination and source operands.
The CPU support for each of these instruction sets needs to be determined
separately.
  The AES extension provides a specialized set of instructions for the
purpose of cryptographic computations defined by the Advanced Encryption
Standard. Each of these instructions has two versions: the AVX one and the one
with SSE-like syntax that uses classic encoding. Refer to the Intel manuals for
the details of operation of these instructions.
  "aesenc" and "aesenclast" perform a single round of AES encryption on data
from first source with a round key from second source, and store result in
destination. The destination and first source are SSE registers, and the
second source can be SSE register or 128-bit memory. The AVX versions of these
instructions, "vaesenc" and "vaesenclast", use the syntax with three operands,
while the SSE-like version has only two operands, with first operand being
both the destination and first source.
  "aesdec" and "aesdeclast" perform a single round of AES decryption on data
from first source with a round key from second source. The syntax rules for
them and their AVX versions are the same as for "aesenc".
  "aesimc" performs the InvMixColumns transformation of source operand and
stores the result in destination. Both "aesimc" and "vaesimc" use only two
operands, destination being SSE register, and source being SSE register or
128-bit memory location.
  "aeskeygenassist" is a helper instruction for generating the round key.
It needs three operands: destination being SSE register, source being SSE
register or 128-bit memory, and third operand being 8-bit immediate value.
The AVX version of this instruction uses the same syntax.
  The CLMUL extension introduces just one instruction, "pclmulqdq", and its
AVX version as well. This instruction performs a carry-less multiplication of
two 64-bit values selected from first and second source according to the bit
fields in immediate value. The destination and first source are SSE registers,
second source is SSE register or 128-bit memory, and immediate value is
provided as last operand. "vpclmulqdq" takes four operands, while "pclmulqdq"
takes only three operands, with the first one serving both the role of
destination and first source.
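
  A carry-less multiplication combines the shifted partial products with XOR
instead of addition, so no carries propagate between bit positions. The C
sketch below models it under two assumptions: each 128-bit operand is held as
two uint64_t values (low quad word first), and the helper names "clmul64" and
"pclmulqdq_model" are hypothetical. Bits 0 and 4 of the immediate select the
quad words, as in the Intel documentation.

```c
#include <stdint.h>

/* Carry-less (GF(2)) multiplication of two 64-bit values: partial
   products are combined with XOR, giving a 128-bit result. */
void clmul64(uint64_t a, uint64_t b, uint64_t *lo, uint64_t *hi)
{
    uint64_t l = 0, h = 0;
    for (int i = 0; i < 64; i++) {
        if ((b >> i) & 1) {
            l ^= a << i;
            if (i != 0)
                h ^= a >> (64 - i);   /* bits shifted out of the low half */
        }
    }
    *lo = l;
    *hi = h;
}

/* Hypothetical model of "pclmulqdq": bit 0 of the immediate selects the
   quad word of the first source, bit 4 the quad word of the second. */
void pclmulqdq_model(uint64_t dst[2], const uint64_t src1[2],
                     const uint64_t src2[2], uint8_t imm)
{
    clmul64(src1[imm & 1], src2[(imm >> 4) & 1], &dst[0], &dst[1]);
}
```

For instance, 3 times 3 gives 5 rather than 9. The middle terms cancel under
XOR: (x+1)(x+1) = x^2+1.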
  The FMA (Fused Multiply-Add) extension introduces additional AVX
instructions which perform multiplication and summation as single operation.
Each one takes three operands, first one serving both the role of destination
and first source, and the following ones being the second and third source.
The mnemonic of FMA instruction is obtained by appending to "vf" prefix: first
either "m" or "nm" to select whether result of multiplication should be taken
as-is or negated, then either "add" or "sub" to select whether third value
will be added to the product or subtracted from the product, then either
"132", "213" or "231" to select which source operands are multiplied and which
one is added or subtracted, and finally the type of data on which the
instruction operates, either "ps", "pd", "ss" or "sd". As it was with SSE
instructions promoted to AVX, instructions operating on packed floating point
values allow 128-bit or 256-bit syntax: in the former all the operands are SSE
registers, but the third one can also be a 128-bit memory, in the latter the
operands are AVX registers and the third one can also be a 256-bit memory.
Instructions that compute just one floating point result need operands to be
SSE registers, and the third operand can also be a memory, either 32-bit for
single precision or 64-bit for double precision.

    vfnmadd132sd xmm0,xmm5,[ebx]   ; multiply, negate and add
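
  The digits in the mnemonic list the operands in the order they enter the
calculation. The first two are multiplied and the third is added or
subtracted, where operand 1 is the destination. The C functions below are a
hypothetical model of the "ss"/"sd" forms on a single value. Plain C
arithmetic rounds the product before the addition, while the real
instructions round only once, so for general inputs the results may differ in
the last bit.

```c
/* Hypothetical model of the operand orders, with op1 being the
   destination and first source. Plain C arithmetic, not fused. */
double fmadd132(double op1, double op2, double op3) { return op1 * op3 + op2; }
double fmadd213(double op1, double op2, double op3) { return op2 * op1 + op3; }
double fmadd231(double op1, double op2, double op3) { return op2 * op3 + op1; }

/* "nm" negates the product, "sub" subtracts instead of adding. */
double fnmadd132(double op1, double op2, double op3) { return -(op1 * op3) + op2; }
double fmsub231(double op1, double op2, double op3)  { return op2 * op3 - op1; }
```

Under this model, "vfnmadd132sd xmm0,xmm5,[ebx]" computes
-(xmm0*[ebx])+xmm5 and stores the result in xmm0.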

  The FMA extension also provides two more families of instructions, with
mnemonics starting with either "vfmaddsub" or "vfmsubadd", followed by either
"132", "213" or "231" and then either "ps" or "pd" (the operation must always
be on packed values in this case). They add to the result of multiplication or
subtract from it depending on the position of value in packed data (positions
are counted from zero, starting at the lowest element) - instructions from the
"vfmaddsub" group add when the position is odd and subtract when the position
is even, instructions from the "vfmsubadd" group add when the position is even
and subtract when the position is odd. The rules for operands are the same as
for other FMA instructions.
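
  With the "213" ordering on packed double precision values, the element at
position 0 (even) has the third operand subtracted from its product, and the
element at position 1 (odd) has it added. The C sketch below models
"vfmaddsub213pd" element by element. The function name and the use of plain
(non-fused) arithmetic are assumptions made for illustration.

```c
/* Hypothetical model of "vfmaddsub213pd": op1 = op2*op1 -/+ op3, where
   even positions subtract and odd positions add. "vfmsubadd" would swap
   the two cases. The product is rounded before the addition here,
   unlike in the fused instruction. */
void fmaddsub213pd_model(double *op1, const double *op2, const double *op3,
                         int n)
{
    for (int i = 0; i < n; i++) {
        double product = op2[i] * op1[i];
        op1[i] = (i % 2 != 0) ? product + op3[i]   /* odd position: add */
                              : product - op3[i];  /* even: subtract */
    }
}
```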
  The FMA4 instructions are similar to FMA, but use syntax with four operands
and thus allow destination to be different from all the sources. Their
mnemonics are identical to FMA instructions with the "132", "213" or "231" cut
out, as having separate destination operand makes such selection of operands
superfluous. The multiplication is always performed on values from the first
and second source, and then the value from third source is added or
subtracted. Either second or third source can be a memory operand, and the
rules for the sizes of operands are the same as for FMA instructions.

    vfmsubss xmm0,xmm1,xmm2,[ebx]  ; multiply and subtract

  The F16C extension consists of two instructions, "vcvtps2ph" and
"vcvtph2ps", which convert floating point values between single precision and
half precision (the 16-bit floating point format). "vcvtps2ph" takes three
operands: destination, source, and rounding controls. The third operand is
always an immediate, the source is either SSE or AVX register containing
single precision values, and the destination is SSE register or memory, the
size of memory is 64 bits when the source is SSE register and 128 bits when
the source is AVX register. "vcvtph2ps" takes two operands, the destination
that can be SSE or AVX register, and the source that is SSE register or memory
with size of the half of destination operand's size.
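
  The half precision format has 1 sign bit, 5 exponent bits biased by 15, and
10 fraction bits. To illustrate the conversion that "vcvtph2ps" performs on
each element, here is a C sketch that decodes one such value into a single
precision float. The function name "half_to_float" is hypothetical, and the
sketch assumes that float uses the IEEE 754 binary32 format.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical decoder of one half precision value. */
float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h >> 15) << 31;
    uint32_t exp  = (h >> 10) & 0x1F;
    uint32_t frac = h & 0x3FF;
    uint32_t bits;

    if (exp == 0x1F) {                 /* infinity or NaN */
        bits = sign | 0x7F800000u | (frac << 13);
    } else if (exp != 0) {             /* normal: rebias exponent 15 -> 127 */
        bits = sign | ((exp + 112) << 23) | (frac << 13);
    } else if (frac == 0) {            /* signed zero */
        bits = sign;
    } else {                           /* subnormal: normalize the fraction */
        int e = -1;
        do {
            frac <<= 1;
            e++;
        } while (!(frac & 0x400));
        bits = sign | ((uint32_t)(112 - e) << 23) | ((frac & 0x3FF) << 13);
    }
    float f;
    memcpy(&f, &bits, sizeof f);       /* reinterpret the bit pattern */
    return f;
}
```

For example, 3C00h decodes to 1.0, 7BFFh to 65504.0 (the largest finite half
precision value), and 0001h to 2^-24.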
  The AMD XOP extension introduces a number of new vector instructions with
encoding and syntax analogous to AVX instructions. "vfrczps", "vfrczss",
"vfrczpd" and "vfrczsd" extract fractional portions of single or double
precision values, they all take two operands. The packed operations allow
either SSE or AVX register as destination, for the other two it has to be SSE
register. Source can be register of the same type as destination, or memory
of appropriate size (256-bit for destination being AVX register, 128-bit for
packed operation with destination being SSE register, 64-bit for operation
on a solitary double precision value and 32-bit for operation on a solitary
single precision value).
  "vpcmov" copies bits from either first or second source into destination
depending on the values of corresponding bits in the fourth operand (the
selector). If the bit in selector is set, the corresponding bit from first
source is copied into the same position in destination, otherwise the bit from
second source is copied. Either second source or selector can be memory
location, 128-bit or 256-bit depending on whether SSE registers or AVX
registers are specified as the other operands.

    vpcmov ymm0,ymm5,[esi],ymm2  ; source in memory
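
  At every bit position "vpcmov" acts as a multiplexer, which in C reduces to a
single expression. The sketch below applies it to one 64-bit lane. The
function name is hypothetical, and a 128-bit or 256-bit operation would repeat
it for every lane.

```c
#include <stdint.h>

/* Hypothetical model of "vpcmov" on one 64-bit lane: where a selector
   bit is 1 the bit comes from src1, where it is 0 it comes from src2. */
uint64_t vpcmov_lane(uint64_t src1, uint64_t src2, uint64_t sel)
{
    return (src1 & sel) | (src2 & ~sel);
}
```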

  The "vpcom" instructions compare packed integers. They take four operands:
the destination and first source being SSE register, second source being SSE
register or 128-bit memory and the fourth operand being immediate value
defining the type of comparison. The mnemonic of the instruction is created
by appending to "vpcom" prefix either "b" or "ub" to compare signed or
unsigned bytes, "w" or "uw" to compare signed or unsigned words, "d" or "ud"
to compare signed or unsigned double words, "q" or "uq" to compare signed or
unsigned quad words. The respective values from the first and second source
are compared and the corresponding data element in destination is set to
either all ones or all zeros depending on the result of comparison. The fourth
operand has to specify one of the eight comparison types (table 2.5). All
these instructions have also variants with only three operands and the type
of comparison encoded within the instruction name by inserting the comparison
mnemonic after "vpcom".

    vpcomgew xmm0,xmm1,[ebx]     ; compare signed words

                        Table 2.5  Comparison types
  /-------------------------------------------\
  | Code | Mnemonic | Description             |
  |======|==========|=========================|
  | 0    | lt       | less than               |
  | 1    | le       | less than or equal      |
  | 2    | gt       | greater than            |
  | 3    | ge       | greater than or equal   |
  | 4    | eq       | equal                   |
  | 5    | neq      | not equal               |
  | 6    | false    | false                   |
  | 7    | true     | true                    |
  \-------------------------------------------/
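
  The codes from table 2.5 map directly onto ordinary comparisons. As a
sketch, the C function below evaluates one element of "vpcomw" (signed words)
for a given code. It returns the all-ones or all-zeros element that would be
stored in the destination, and its name is hypothetical. The unsigned variants
such as "vpcomuw" would differ only in taking uint16_t inputs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one "vpcomw" element. */
uint16_t vpcomw_element(int16_t a, int16_t b, unsigned code)
{
    bool r;
    switch (code & 7) {
    case 0:  r = a <  b; break;   /* lt */
    case 1:  r = a <= b; break;   /* le */
    case 2:  r = a >  b; break;   /* gt */
    case 3:  r = a >= b; break;   /* ge */
    case 4:  r = a == b; break;   /* eq */
    case 5:  r = a != b; break;   /* neq */
    case 6:  r = false;  break;   /* false */
    default: r = true;   break;   /* true */
    }
    return r ? 0xFFFF : 0x0000;
}
```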

  "vpermil2ps" and "vpermil2pd" set the elements in destination register to
zero or to a value selected from first or second source depending on the
corresponding bit fields from the fourth operand (the selector) and the
immediate value provided in fifth operand. Refer to the AMD manuals for the
detailed explanation of the operation performed by these instructions. Each
of the first four operands can be a register, and either second source or
selector can be memory location, 128-bit or 256-bit depending on whether SSE
registers or AVX registers are used for the other operands.
  "vphaddbw" adds pairs of adjacent signed bytes to form 16-bit values and
stores them at the same positions in destination. "vphaddubw" does the same
but treats the bytes as unsigned. "vphaddbd" and "vphaddubd" sum all bytes
(either signed or unsigned) in each four-byte block to 32-bit results,
"vphaddbq" and "vphaddubq" sum all bytes in each eight-byte block to
64-bit results, "vphaddwd" and "vphadduwd" add pairs of words to 32-bit
results, "vphaddwq" and "vphadduwq" sum all words in each four-word block to
64-bit results, "vphadddq" and "vphaddudq" add pairs of double words to 64-bit
results. "vphsubbw" subtracts in each two-byte block the byte at higher
position from the one at lower position, and stores the result as a signed
16-bit value at the corresponding position in destination, "vphsubwd"
subtracts in each two-word block the word at higher position from the one at
lower position and makes signed 32-bit results, "vphsubdq" subtracts in each
block of two double words the one at higher position from the one at lower
position and makes signed 64-bit results. Each of these instructions takes
two operands, the destination being SSE register, and the source being SSE
register or 128-bit memory.
  "vpmacsww" and "vpmacssww" multiply the corresponding signed 16-bit values
from the first and second source and then add the products to the parallel
values from the third source, then "vpmacsww" takes the lowest 16 bits of the
result and "vpmacssww" saturates the result down to 16-bit value, and they
store the final 16-bit results in the destination. "vpmacsdd" and "vpmacssdd"
perform the analogous operation on 32-bit values. "vpmacswd" and "vpmacsswd" do
the same calculation only on the low 16-bit values from each 32-bit block and
form the 32-bit results. "vpmacsdql" and "vpmacssdql" perform such operation
on the low 32-bit values from each 64-bit block and form the 64-bit results,
while "vpmacsdqh" and "vpmacssdqh" do the same on the high 32-bit values from
each 64-bit block, also forming the 64-bit results. "vpmadcswd" and
"vpmadcsswd" multiply the corresponding signed 16-bit values from the first
and second source, add together the two products from each 32-bit block, and
add this sum to the corresponding 32-bit element from the third source,
storing the truncated or saturated 32-bit result in destination. All these
instructions take four operands, the second source can be 128-bit memory or
SSE register, all the other operands have to be SSE registers.
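
  The difference between the truncating and the saturating variants is
easiest to see on a single element. The C sketch below models one word
element of "vpmacsww" and "vpmacssww", and the function names are
hypothetical. The product of two signed words always fits in 32 bits, so the
sum is computed exactly before it is reduced to 16 bits.

```c
#include <stdint.h>

/* "vpmacsww": keep only the low 16 bits of a*b + c. Converting an
   out-of-range value to int16_t is implementation-defined in C;
   common compilers wrap modulo 2^16, which is what is intended here. */
int16_t vpmacsww_elem(int16_t a, int16_t b, int16_t c)
{
    int32_t r = (int32_t)a * b + c;
    return (int16_t)(uint16_t)r;
}

/* "vpmacssww": clamp a*b + c to the signed 16-bit range. */
int16_t vpmacssww_elem(int16_t a, int16_t b, int16_t c)
{
    int32_t r = (int32_t)a * b + c;
    if (r > INT16_MAX) r = INT16_MAX;
    if (r < INT16_MIN) r = INT16_MIN;
    return (int16_t)r;
}
```

For 300*300 = 90000, the truncating form keeps 90000-65536 = 24464, while the
saturating form stops at 32767.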
  "vpperm" selects bytes from first and second source, optionally applies a
separate transformation to each of them, and stores them in the destination.
The bit fields in fourth operand (the selector) specify for each position in
destination what byte from which source is taken and what operation is applied
to it before it is stored there. Refer to the AMD manuals for the detailed
information about these bit fields. This instruction takes four operands,
either second source or selector can be a 128-bit memory (or both can be SSE
registers), all the other operands have to be SSE registers.
  "vpshlb", "vpshlw", "vpshld" and "vpshlq" logically shift bytes, words,
double words or quad words respectively. The amount of bits to shift by is
specified for each element separately by the signed byte placed at the
corresponding position in the third operand; a positive amount shifts left
and a negative one shifts right. The source containing elements to shift is
provided as second operand. Either second or third operand can be 128-bit
memory (or both can be SSE registers) and the other operands have to be
SSE registers.
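
  The C function below sketches one element of "vpshlb". It shifts a byte left
for a positive count and right (filling with zeros) for a negative one. The
function name is hypothetical. Treating counts of magnitude 8 or more as
shifting out all the bits is an assumption of this model, so check the AMD
manuals for the exact behavior with such counts.

```c
#include <stdint.h>

/* Hypothetical model of one "vpshlb" element. */
uint8_t vpshlb_elem(uint8_t value, int8_t count)
{
    if (count >= 8 || count <= -8)
        return 0;                          /* assumption: all bits shifted out */
    if (count >= 0)
        return (uint8_t)(value << count);  /* positive count: shift left */
    return (uint8_t)(value >> -count);     /* negative count: logical right */
}
```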
  "vpshab", "vpshaw", "vpshad" and "vpshaq" arithmetically shift bytes, words,
double words or quad words. These instructions follow the same rules as the
logical shifts described above. "vprotb", "vprotw", "vprotd" and "vprotq"
rotate bytes, words, double words or quad words. They follow the same rules as
shifts, but additionally allow third operand to be immediate value, in which
case the same amount of rotation is specified for all the elements in source.
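
  A rotation loses no bits, so only the count modulo the element size
matters. For a byte, rotating right by k positions is the same as rotating
left by 8-k positions. The C sketch below models one element of "vprotb" on
that basis, with positive counts rotating left and negative ones rotating
right. The function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of one "vprotb" element. A negative count
   converted to unsigned and masked with 7 gives the equivalent
   left rotation (e.g. -1 becomes 7). */
uint8_t vprotb_elem(uint8_t value, int8_t count)
{
    unsigned n = (unsigned)count & 7;
    return (uint8_t)((value << n) | (value >> ((8 - n) & 7)));
}
```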
  The MOVBE extension introduces just one new instruction, "movbe", which
swaps bytes in value from source before storing it in destination, so it can
be used to load and store big endian values. It takes two operands, either
the destination or source should be a 16-bit, 32-bit or 64-bit memory (the
last one being only allowed in long mode), and the other operand should be
a general register of the same size.
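
  For a 32-bit operand the swap reverses the order of the four bytes. The C
sketch below shows that transformation on a plain value. The function name is
hypothetical, and the memory access that "movbe" combines with the swap is not
modeled.

```c
#include <stdint.h>

/* Hypothetical model of the byte swap done by a 32-bit "movbe". */
uint32_t bswap32_model(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}
```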
  The BMI extension, consisting of two subsets - BMI1 and BMI2, introduces
new instructions operating on general registers, which use the same encoding
as AVX instructions and so allow the extended syntax. All these instructions
use 32-bit operands, and in long mode they also allow the forms with 64-bit
operands.
  "andn" calculates the bitwise AND of second source with the inverted bits
of first source and stores the result in destination. The destination and
the first source have to be general registers, the second source can be
general register or memory.
  "bextr" extracts a sequence of bits from first source at the position
and length specified by bit fields in the second source operand and stores
it into destination. The lowest 8 bits of second source specify the position
of bit sequence to extract and the next 8 bits of second source specify the
length of sequence. The first source can be a general register or memory,
the other two operands have to be general registers.
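
  The C sketch below models the 32-bit form of "bextr", following the Intel
description of the instruction. A start position at or above the operand
size gives zero, and a length reaching past the top keeps all the remaining
bits. The function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of 32-bit "bextr": bits 7..0 of control hold the
   start position, bits 15..8 the length of the field. */
uint32_t bextr32(uint32_t src, uint32_t control)
{
    unsigned start = control & 0xFF;
    unsigned len = (control >> 8) & 0xFF;
    if (start >= 32)
        return 0;                                 /* field above the operand */
    uint32_t field = src >> start;
    if (len < 32)
        field &= (uint32_t)((1ULL << len) - 1);   /* keep len low bits */
    return field;
}
```

A control value of 0804h thus extracts 8 bits starting at bit 4.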
  "blsi" extracts the lowest set bit from the source, setting all the other
bits in destination to zero. The destination must be a general register,
the source can be general register or memory.
  "blsmsk" sets all the bits in the destination up to the lowest set bit in
the source, including this bit. "blsr" copies all the bits from the source to
destination except for the lowest set bit, which is replaced by zero. These
instructions follow the same rules for operands as "blsi".
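
  All three operations have well-known two's complement formulations, which
the C sketch below uses. The key fact is that subtracting 1 flips the lowest
set bit together with all the zero bits below it. The function names are
hypothetical. As with the instructions, a zero source gives zero for "blsi"
and "blsr" and all ones for "blsmsk".

```c
#include <stdint.h>

/* Hypothetical models of the 32-bit forms. */
uint32_t blsi32(uint32_t x)   { return x & (0u - x); }  /* isolate lowest set bit */
uint32_t blsmsk32(uint32_t x) { return x ^ (x - 1); }   /* mask up to and incl. it */
uint32_t blsr32(uint32_t x)   { return x & (x - 1); }   /* clear lowest set bit */
```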
  "tzcnt" counts the number of trailing zero bits, that is the zero bits up to
the lowest set bit of source value. This instruction is analogous to "lzcnt"
and follows the same rules for operands, so it also has a 16-bit version,
unlike the other BMI instructions.
  "bzhi" is BMI2 instruction, which copies the bits from first source to
destination, zeroing all the bits starting from the position specified by
second source. It follows the same rules for operands as "bextr".
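
  The C sketch below models the 32-bit forms of "tzcnt" and "bzhi". Following
the Intel description, "tzcnt" returns the operand size for a zero source,
and "bzhi" takes the bit index from the low 8 bits of the second source, so an
index of 32 or more copies the whole value. The function names are
hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of 32-bit "tzcnt": a zero source yields 32. */
unsigned tzcnt32(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 1) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Hypothetical model of 32-bit "bzhi": bits from the index upward
   are zeroed; the index is taken from bits 7..0 of the second source. */
uint32_t bzhi32(uint32_t src, uint32_t index)
{
    unsigned n = index & 0xFF;
    return n >= 32 ? src : src & ((1u << n) - 1);
}
```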
  "pext" uses a mask in second source operand to select bits from first
source and puts the selected bits as a continuous sequence into the lowest
bits of destination. "pdep" performs the reverse operation - it takes sequence
of bits from the lowest bits of the first source and puts them consecutively
at the positions where the bits in second source are set, setting all the
other bits in destination to zero. These BMI2 instructions follow the same
rules for operands as "andn".
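
  The bit gathering and scattering can be modeled by walking over the mask
from its lowest bit. The C sketch below does this for 32-bit operands, with
hypothetical function names. For any value and mask, pdep32(pext32(v, m), m)
equals v & m.

```c
#include <stdint.h>

/* Hypothetical model of 32-bit "pext". */
uint32_t pext32(uint32_t src, uint32_t mask)
{
    uint32_t result = 0;
    unsigned k = 0;                        /* next free result bit */
    for (unsigned i = 0; i < 32; i++) {
        if (mask & (1u << i)) {
            if (src & (1u << i))
                result |= 1u << k;
            k++;
        }
    }
    return result;
}

/* Hypothetical model of 32-bit "pdep". */
uint32_t pdep32(uint32_t src, uint32_t mask)
{
    uint32_t result = 0;
    unsigned k = 0;                        /* next source bit to deposit */
    for (unsigned i = 0; i < 32; i++) {
        if (mask & (1u << i)) {
            if (src & (1u << k))
                result |= 1u << i;
            k++;
        }
    }
    return result;
}
```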
  "mulx" is a BMI2 instruction which performs an unsigned multiplication of
value from EDX or RDX register (depending on the size of specified operands)
by the value from third operand, and stores the low half of result in the
second operand, and the high half of result in the first operand, and it does
it without affecting the flags. The third operand can be general register or
memory, and both the destination operands have to be general registers.
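
  The operand roles of "mulx" are easy to mix up, because the high half goes
to the first operand and the low half to the second. The C sketch below
models the 32-bit form. The function and parameter names are hypothetical,
and flags are not modeled because the instruction leaves them untouched.

```c
#include <stdint.h>

/* Hypothetical model of 32-bit "mulx": edx times src as an unsigned
   64-bit product; *hi stands for the first operand, *lo for the second. */
void mulx32(uint32_t edx, uint32_t src, uint32_t *hi, uint32_t *lo)
{
    uint64_t product = (uint64_t)edx * src;
    *hi = (uint32_t)(product >> 32);
    *lo = (uint32_t)product;
}
```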
  "shlx", "shrx" and "sarx" are BMI2 instructions, which perform logical or
arithmetical shifts of value from first source by the amount specified by
second source, and store the result in destination without affecting the
flags. They have the same rules for operands as "bzhi" instruction.
  "rorx" is a BMI2 instruction which rotates right the value from source
operand by the constant amount specified in third operand and stores the
result in destination without affecting the flags. The destination operand
has to be general register, the source operand can be general register or
memory, and the third operand has to be an immediate value.
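
  The C sketch below models the 32-bit form of "rorx". It assumes, as the
Intel description states, that the count is taken modulo the operand size.
The function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of 32-bit "rorx": rotate right by an immediate
   count taken modulo 32; no flags are involved. */
uint32_t rorx32(uint32_t src, unsigned imm)
{
    unsigned n = imm & 31;
    return n == 0 ? src : (src >> n) | (src << (32 - n));
}
```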
  The TBM (Trailing Bit Manipulation) extension from AMD supplements the BMI
set. The "bextr" instruction is extended with a new form, in which second
source is a 32-bit immediate value. "blsic" is a new instruction which
performs the same operation as "blsi", but with the bits of result inverted.
It uses the same rules for operands as "blsi". "blsfill" is a new instruction,
which takes the value from source, sets all the bits below the lowest set bit
and stores the result in destination, it also uses the same rules for operands
as "blsi".
  "blci", "blcic", "blcs", "blcmsk" and "blcfill" are instructions analogous
to "blsi", "blsic", "blsr", "blsmsk" and "blsfill" respectively, but they
perform the bit-inverted versions of the same operations. They follow the
same rules for operands as the instructions they reflect.
  "tzmsk" finds the lowest set bit in value from source operand, sets all bits
below it to 1 and all the rest of bits to zero, then writes the result to
destination. "t1mskc" finds the least significant zero bit in the value from
source operand, sets the bits below it to zero and all the other bits to 1,
and writes the result to destination. These instructions have the same rules
for operands as "blsi".
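
  Like the BMI operations, these can be expressed with simple two's complement
identities. The C sketch below covers four of them. The function names are
hypothetical, and the formulas are one way to reproduce the results described
above.

```c
#include <stdint.h>

/* Hypothetical models of some 32-bit TBM operations. */
uint32_t blsfill32(uint32_t x) { return x | (x - 1); }   /* fill below lowest set bit */
uint32_t blsic32(uint32_t x)   { return ~x | (x - 1); }  /* inverted result of blsi */
uint32_t tzmsk32(uint32_t x)   { return ~x & (x - 1); }  /* ones below lowest set bit */
uint32_t t1mskc32(uint32_t x)  { return ~x | (x + 1); }  /* zeros below lowest 0 bit */
```

For 58h the lowest set bit is bit 3. So "blsfill" gives 5Fh, "blsic" gives
FFFFFFF7h (the inversion of 08h), and "tzmsk" gives 07h.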


2.1.24  Other extensions of instruction set

  There are a number of additional instruction set extensions recognized by
flat assembler, and the general syntax of the instructions introduced by those
extensions is provided here. For detailed information on the operations
performed by them, check out the manuals from Intel (for the VMX, SMX, XSAVE,
RDRAND, FSGSBASE, INVPCID, HLE and RTM extensions) or AMD (for the SVM
extension).
  The Virtual-Machine Extensions (VMX) provide a set of instructions for the
management of virtual machines. The "vmxon" instruction, which enters the VMX
operation, requires a single 64-bit memory operand, which should contain the
physical address of a memory region that the logical processor may use to
support VMX operation. The "vmxoff" instruction, which leaves the VMX
operation, has no operands. The "vmlaunch" and "vmresume", which launch or
resume the virtual machines, and "vmcall", which allows guest software to call
the VM monitor, use no operands either.
  The "vmptrld" loads the physical address of current Virtual Machine Control
Structure (VMCS) from its memory operand, "vmptrst" stores the pointer to
current VMCS into address specified by its memory operand, and "vmclear" sets
the launch state of the VMCS referenced by its memory operand to clear. These
three instructions all require single 64-bit memory operand.
  The "vmread" reads from VMCS a field specified by the source operand and
stores it into the destination operand. The source operand should be a
general purpose register, and the destination operand can be a register or
memory. The "vmwrite" writes into a VMCS field specified by the destination
operand the value provided by source operand. The source operand can be a
general purpose register or memory, and the destination operand must be a
register. The size of operands for those instructions should be 64-bit when
in long mode, and 32-bit otherwise.
  The "invept" and "invvpid" invalidate the translation lookaside buffers
(TLBs) and paging-structure caches, either derived from extended page tables
(EPT), or based on the virtual processor identifier (VPID). These instructions
require two operands, the first one being the general purpose register
specifying the type of invalidation, and the second one being a 128-bit
memory operand providing the invalidation descriptor. The first operand
should be a 64-bit register when in long mode, and 32-bit register otherwise.
  The Safer Mode Extensions (SMX) provide the functionalities available
through the "getsec" instruction. This instruction takes no operands, and
the function that is executed is determined by the contents of EAX register
upon executing this instruction.
  The Secure Virtual Machine (SVM) is a variant of virtual machine extension
used by AMD. The "skinit" instruction securely reinitializes the processor
allowing the startup of trusted software, such as the virtual machine monitor
(VMM). This instruction takes a single operand, which must be EAX, and
provides a physical address of the secure loader block (SLB).
  The "vmrun" instruction is used to start a guest virtual machine; its only
operand should be an accumulator register (AX, EAX or RAX, the last one
available only in long mode) providing the physical address of the virtual
machine control block (VMCB). The "vmsave" stores a subset of processor state
into VMCB specified by its operand, and "vmload" loads the same subset of
processor state from a specified VMCB. The same operand rules as for the
"vmrun" apply to those two instructions.
  "vmmcall" allows the guest software to call the VMM. This instruction takes
no operands.
  "stgi" sets the global interrupt flag to 1, and "clgi" zeroes it. These
instructions take no operands.
  "invlpga" invalidates the TLB mapping for a virtual page specified by the
first operand (which has to be accumulator register) and address space
identifier specified by the second operand (which must be ECX register).
  The XSAVE set of instructions allows saving and restoring processor state
components. "xsave" and "xsaveopt" store the components of processor state
defined by bit mask in EDX and EAX registers into area defined by memory
operand. "xrstor" restores from the area specified by memory operand the
components of processor state defined by mask in EDX and EAX. The "xsave64",
"xsaveopt64" and "xrstor64" are 64-bit versions of these instructions, allowed
only in long mode.
  "xgetbv" reads the contents of 64-bit XCR (extended control register)
specified in ECX register into EDX and EAX registers. "xsetbv" writes the
contents of EDX and EAX into the 64-bit XCR specified by ECX register. These
instructions have no operands.
  The RDRAND extension introduces one new instruction, "rdrand", which loads
the hardware-generated random value into general register. It takes one
operand, which can be 16-bit, 32-bit or 64-bit register (with the last one
being allowed only in long mode).
  The FSGSBASE extension adds long mode instructions that allow reading and
writing the segment base registers for FS and GS segments. "rdfsbase" and
"rdgsbase" read the corresponding segment base registers into operand, while
"wrfsbase" and "wrgsbase" write the value of operand into those registers.
All these instructions take one operand, which can be 32-bit or 64-bit general
register.
  The INVPCID extension adds "invpcid" instruction, which invalidates mapping
in the TLBs and paging caches based on the invalidation type specified in
first operand and PCID invalidate descriptor specified in second operand.
The first operand should be 32-bit general register when not in long mode,
or 64-bit general register when in long mode. The second operand should be
128-bit memory location.
  The HLE and RTM extensions provide a set of instructions for transactional
execution. The "xacquire" and "xrelease" are new prefixes that can be used
with some of the instructions to start or end lock elision on the memory
address specified by prefixed instruction. The "xbegin" instruction starts
the transactional execution, its operand is the address of a fallback routine
that gets executed in case of transaction abort, specified like the operand
for near jump instruction. "xend" marks the end of transactional execution
region, it takes no operands. "xabort" forces the transaction abort, it takes
an 8-bit immediate value as its only operand, this value is passed in the
highest 8 bits of EAX to the fallback routine. "xtest" checks whether there
is transactional execution in progress, this instruction takes no operands.


2.2  Control directives

  This section describes the directives that control the assembly process. They
are processed during the assembly and may cause some blocks of instructions
to be assembled differently or not assembled at all.


2.2.1  Numerical constants

  The "=" directive defines a numerical constant. It should be preceded by
the name for the constant and followed by the numerical expression providing
the value. The value of such constants can be a number or an address, but -
unlike labels - the numerical constants are not allowed to hold the
register-based addresses. Besides this difference, in their basic variant
numerical constants behave very much like labels and you can even
forward-reference them (access their values before they actually get defined).
  There is, however, a second variant of numerical constants, which is
recognized by assembler when you define a constant with a name under which a
numerical constant has already been defined. In such case assembler treats
that constant as an assembly-time variable and allows it to be assigned with
new value, but forbids forward-referencing it (for obvious reasons). Let's
see both variants of numerical constants in one example:

    dd sum
    x = 1
    x = x+2
    sum = x

  Here "x" is an assembly-time variable, and every time it is accessed, the
value that was assigned to it the most recently is used. Thus if we tried to
access the "x" before it gets defined the first time, like if we wrote "dd x"
in place of the "dd sum" instruction, it would cause an error. And when it is
re-defined with the "x = x+2" directive, the previous value of "x" is used to
calculate the new one. So when the "sum" constant gets defined, the "x" has
value of 3, and this value is assigned to the "sum". Since this one is defined
only once in source, it is the standard numerical constant, and can be
forward-referenced. So the "dd sum" is assembled as "dd 3". To read more about
how the assembler is able to resolve this, see section 2.2.6.
  The value of numerical constant can be preceded by size operator, which can
ensure that the value will fit in the range for the specified size, and can
also affect how some of the calculations inside the numerical expression are
performed. This example:

    c32 = dword -1

defines a constant whose value is guaranteed to fit in 32 bits.
  When you need to define constant with the value of address, which may be
register-based (and thus you cannot employ numerical constant for this
purpose), you can use the extended syntax of "label" directive (already
described in section 1.2.3), like:

    label myaddr at ebp+4

which declares a label placed at the "ebp+4" address. However, remember that
labels, unlike numerical constants, cannot become assembly-time variables.


2.2.2  Conditional assembly

  The "if" directive causes a block of instructions to be assembled only under
certain condition. It should be followed by logical expression specifying the
condition, instructions in next lines will be assembled only when this
condition is met, otherwise they will be skipped. The optional "else if"
directive followed with logical expression specifying additional condition
begins the next block of instructions that will be assembled if previous
conditions were not met, and the additional condition is met. The optional
"else" directive begins the block of instructions that will be assembled if
none of the conditions were met. The "end if" directive ends the last block
of instructions.
  You should note that "if" directive is processed at assembly stage and
therefore it doesn't affect any preprocessor directives, like the definitions
of symbolic constants and macroinstructions - when the assembler recognizes the
"if" directive, all the preprocessing has been already finished.
  The logical expression consists of logical values and logical operators. The
logical operators are "~" for logical negation, "&" for logical and, "|" for
logical or. The negation has the highest priority. Logical value can be a
numerical expression, it will be false if it is equal to zero, otherwise it
will be true. Two numerical expressions can be compared using one of the
following operators to make the logical value: "=" (equal), "<" (less),
">" (greater), "<=" (less or equal), ">=" (greater or equal),
"<>" (not equal).
3340
  The "used" operator, followed by a symbol name, is a logical value that
checks whether the given symbol is used somewhere (it returns the correct
result even if the symbol is used only after this check). The "defined"
operator can be followed by any expression, usually just a single symbol name;
it checks whether the given expression contains only symbols that are defined
in the source and accessible from the current position.
  With the "relativeto" operator it is possible to check whether the values of
two expressions differ only by a constant amount. The valid syntax is a
numerical expression followed by "relativeto" and then another expression
(possibly register-based). Labels that have no simple numerical value can be
tested this way to determine what kinds of operations may be possible with
them.
  The following simple example uses the "count" constant that should be
defined somewhere in the source:

    if count>0
        mov cx,count
        rep movsb
    end if

is greater than 0. The next sample shows a more complex conditional structure:

        mov cx,count/4
        rep movsd
    else if count>4
        mov cx,count/4
        rep movsd
        mov cx,count mod 4
        rep movsb
    else
        mov cx,count
        rep movsb
    end if

divisible by four. If this condition is not met, the second logical
expression, which follows the "else if", is evaluated, and if it's true, the
second block of instructions gets assembled; otherwise the last block of
instructions, which follows the line containing only "else", is assembled.
  There are also operators that allow comparison of values being any chains of
symbols. The "eq" operator compares whether two such values are exactly the
same. The "in" operator checks whether the given value is a member of the list
of values following this operator; the list should be enclosed between "<" and
">" characters, and its members should be separated with commas. The symbols
are considered the same when they have the same meaning for the assembler -
for example "pword" and "fword" are the same for the assembler and thus are
not distinguished by the above operators. In the same way "16 eq 10h" is a
true condition, however "16 eq 10+4" is not.
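
  The following Python sketch models how "eq" and "in" compare chains of
symbols by their meaning rather than by their spelling. It is only an
illustration under simplifying assumptions. The tokenizer, the recognized
number forms (decimal, "h" suffix, "0x" prefix) and the single synonym pair
are choices made here. Letter case is not modeled. The function names are
hypothetical.

```python
import re

# Assumption: only this synonym pair is modeled; case differences are ignored.
SYNONYMS = {"fword": "pword"}


def tokens(text):
    """Split a chain of symbols into names/numbers and single characters."""
    return re.findall(r"[A-Za-z0-9_.$@?]+|\S", text)


def meaning(token):
    """Map a token to what it means for the assembler (simplified)."""
    t = token.lower()
    if re.fullmatch(r"[0-9]+", t):
        return int(t)
    if re.fullmatch(r"0x[0-9a-f]+", t):
        return int(t, 16)
    if re.fullmatch(r"[0-9][0-9a-f]*h", t):
        return int(t[:-1], 16)
    return SYNONYMS.get(token, token)


def eq(a, b):
    """Model of "a eq b": the same chain of meanings, token by token."""
    return [meaning(t) for t in tokens(a)] == [meaning(t) for t in tokens(b)]


def in_(value, members):
    """Model of "value in <m1,m2,...>", with the list already split."""
    return any(eq(value, m) for m in members)


print(eq("16", "10h"), eq("16", "10+4"), eq("pword", "fword"))  # True False True
```

In this model "10+4" stays three tokens, so it is never equal to the single
number 16. That matches the rule that "eq" compares symbols rather than
computed values.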
  The "eqtype" operator checks whether the two compared values have the same
structure, and whether the structural elements are of the same type. The
distinguished types include numerical expressions, individual quoted strings,
floating point numbers, address expressions (the expressions enclosed in square
brackets or preceded by the "ptr" operator), instruction mnemonics, registers,
size operators, jump type and code type operators. Each of the special
characters that act as separators, like comma or colon, is a separate type
itself. For example, two values, each one consisting of a register name
followed by a comma and a numerical expression, will be regarded as of the
same type, no matter what kind of register and how complicated a numerical
expression is used. The exception is quoted strings and floating point values,
which are special kinds of numerical expressions and are treated as different
types. Thus the "eax,16 eqtype fs,3+7" condition is true, but
"eax,16 eqtype eax,1.6" is false.

should be followed by a numerical expression specifying the number of repeats
and the instruction to repeat (optionally a colon can be used to separate the
number and the instruction). When the special symbol "%" is used inside the
instruction, it is equal to the number of the current repeat. For example
"times 5 db %" will define five bytes with values 1, 2, 3, 4, 5. Recursive use
of the "times" directive is also allowed, so "times 3 times % db %" will
define six bytes with values 1, 1, 2, 1, 2, 3.
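
  A small Python model of "times" makes the numbering by "%" explicit. Here
"emit" is a hypothetical callback that stands in for the repeated
instruction. It receives the current value of "%" and returns the values that
this copy defines. Nesting one call inside another reproduces the recursive
sample above, with the inner "%" counting the inner repeats.

```python
def times(count, emit):
    """Model of "times count <instruction>": emit(n) returns the values one
    copy of the instruction defines when "%" equals n (counting from 1)."""
    out = []
    for n in range(1, count + 1):
        out.extend(emit(n))
    return out


# times 5 db %
print(times(5, lambda n: [n]))                      # [1, 2, 3, 4, 5]
# times 3 times % db %  (the inner "%" counts the inner repeats)
print(times(3, lambda n: times(n, lambda m: [m])))  # [1, 1, 2, 1, 2, 3]
```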
  The "repeat" directive repeats the whole block of instructions. It should be
followed by a numerical expression specifying the number of repeats. The
instructions to repeat are expected in the following lines, ended with the
"end repeat" directive, for example:

        mov byte [bx],%
        inc bx
    end repeat

addressed by BX register.
  The number of repeats can be zero; in that case the instructions are not
assembled at all.
  The "break" directive allows you to stop repeating earlier and continue
assembly from the first line after the "end repeat". Combined with the "if"
directive, it allows you to stop repeating under some special condition, like:

    repeat 100
        if x/s = s
            break
        end if
        s = (s+x/s)/2
    end repeat

condition specified by the logical expression following it is true. The block
of instructions to be repeated should end with the "end while" directive.
Before each repetition the logical expression is evaluated, and when its value
is false, assembly continues from the first line after the "end while". In
this case too, the "%" symbol holds the number of the current repeat. The
"break" directive can be used to stop this kind of loop in the same way as
with the "repeat" directive. The previous sample can be rewritten to use
"while" instead of "repeat" this way:

    while x/s <> s
        s = (s+x/s)/2
        if % = 100
            break
        end if
    end while

order, however they should be closed in the same order in which they were
started. The "break" directive always stops processing the block that was
started last with either the "repeat" or "while" directive.
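
  The two samples above iterate s = (s+x/s)/2, which is Newton's method for an
integer square root. A Python model makes their equivalence easy to check.
It assumes that fasm's "/" gives the same integer quotient as Python's "//"
for the positive values used here. The starting values of "x" and "s", which
the samples leave to the surrounding source, are chosen by the caller. For
some inputs the iteration oscillates instead of settling: for x = 8 it
alternates between 2 and 3. In such cases only the limit of 100 repeats ends
the loop.

```python
def sqrt_repeat(x, s):
    """Model of the "repeat 100" / "break" sample."""
    for _ in range(100):             # repeat 100
        if x // s == s:              # if x/s = s
            break                    #     break
        s = (s + x // s) // 2        # s = (s+x/s)/2
    return s


def sqrt_while(x, s):
    """Model of the "while x/s <> s" sample; n plays the role of "%"."""
    n = 0
    while x // s != s:               # while x/s <> s
        n += 1                       # "%" counts repeats from 1
        s = (s + x // s) // 2        # s = (s+x/s)/2
        if n == 100:                 # if % = 100 / break
            break
    return s


print(sqrt_repeat(100, 100), sqrt_while(100, 100))  # 10 10
```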

appear in memory. It should be followed by a numerical expression specifying
the address. This directive begins a new addressing space. The following code
itself is not moved in any way, but all the labels defined within it and the
value of the "$" symbol are affected as if it was put at the given address.
However, it's the responsibility of the programmer to put the code at the
correct address at run-time.
  The "load" directive allows you to define a constant with a binary value
loaded from the already assembled code. This directive should be followed by
the name of the constant, then optionally a size operator, then the "from"
operator and a numerical expression specifying a valid address in the current
addressing space. The size operator has an unusual meaning in this case - it
states how many bytes (up to 8) have to be loaded to form the binary value of
the constant. If no size operator is specified, one byte is loaded (thus the
value is in the range from 0 to 255). The loaded data cannot exceed the
current offset.
  The "store" directive can modify the already generated code by replacing
some of the previously generated data with the value defined by the numerical
expression that follows it. The expression can be preceded by an optional size
operator to specify how large a value the expression defines, and therefore
how many bytes will be stored; if there is no size operator, a size of one
byte is assumed. Then the "at" operator should follow, together with a
numerical expression defining a valid address in the current addressing
space, at which the given value has to be stored. This is a directive for
advanced applications and should be used carefully.
  Both the "load" and "store" directives are limited to operating on places in
the current addressing space. The "$$" symbol is always equal to the base
address of the current addressing space, and the "$" symbol is the address of
the current position in that addressing space. Therefore these two values
define the limits of the area where "load" and "store" can operate.
  Combining the "load" and "store" directives allows you to do things like
encoding some of the already generated code. For example, to encode the whole
code generated in the current addressing space you can use such a block of
directives:

    repeat $-$$
        load a byte from $$+%-1
        store byte a xor c at $$+%-1
    end repeat
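
  The "load"/"store" pair and the encoding loop above can be modeled in Python
on a byte array that stands for the current addressing space, with "base"
playing the role of "$$". This is a sketch, not the assembler's code. Bytes
are read and written in little-endian order, as on x86. The range check in
"store", which accepts any value that fits the size as signed or unsigned, is
an assumption of the model. The names "load", "store" and "xor_encode" are
hypothetical helpers; "c" is the key constant from the sample.

```python
def load(space, base, address, size=1):
    """Model of "load": read size bytes at address, little-endian."""
    offset = address - base                      # base stands for "$$"
    if not 0 <= offset <= len(space) - size:     # must lie between $$ and $
        raise ValueError("address outside the current addressing space")
    return int.from_bytes(space[offset:offset + size], "little")


def store(space, base, address, value, size=1):
    """Model of "store": overwrite size bytes at address, little-endian."""
    offset = address - base
    if not 0 <= offset <= len(space) - size:
        raise ValueError("address outside the current addressing space")
    bits = 8 * size
    if not -(1 << (bits - 1)) <= value < (1 << bits):   # assumed range rule
        raise ValueError("value out of range")
    space[offset:offset + size] = (value % (1 << bits)).to_bytes(size, "little")


def xor_encode(space, base, c):
    """Model of the encoding loop: one repeat per byte from $$ up to $,
    with "%" counting from 1."""
    for n in range(1, len(space) + 1):
        a = load(space, base, base + n - 1)          # load a byte from $$+%-1
        store(space, base, base + n - 1, a ^ c)      # store byte a xor c at $$+%-1


code = bytearray(b"\x90\x31\xc0\xc3")        # arbitrary sample bytes
xor_encode(code, 0x1000, 0x5A)
xor_encode(code, 0x1000, 0x5A)               # encoding twice restores the data
print(code == bytearray(b"\x90\x31\xc0\xc3"))  # True
```

Because XOR with the same key is its own inverse, the same loop can serve
both as the encoder and as the decoder.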

  "virtual" defines virtual data at the specified address. This data will not
be included in the output file, but labels defined there can be used in other
parts of the source. This directive can be followed by the "at" operator and a
numerical expression specifying the address for the virtual data; otherwise it
uses the current address, the same as "virtual at $". Instructions defining
data are expected in the following lines, ended with the "end virtual"
directive. The block of virtual instructions itself is an independent
addressing space. After it has ended, the context of the previous addressing
space is restored.
  The "virtual" directive can be used to create a union of some variables, for
example:

    virtual at GDTR
        GDT_limit dw ?
        GDT_address dd ?
    end virtual

  It can also be used to define labels for some structures addressed by a
register, for example:

    virtual at bx
        LDT_limit dw ?
        LDT_address dd ?
    end virtual

to the same instruction as "mov ax,[bx]".
  Declaring defined data values or instructions inside the virtual block is
also useful, because the "load" directive can be used to load the values from
the virtually generated code into constants. This directive should be used
after the code it loads but before the virtual block ends, because it can only
load the values from the same addressing space. For example:

    virtual at 0
        xor eax,eax
        and edx,eax
        load zeroq dword from 0
    end virtual

of the machine code of the instructions defined inside the virtual block.
This method can also be used to load some binary value from an external file.
For example this code:

    virtual at 0
        file 'a.txt':10h,1
        load char from 0
    end virtual

constant.
  Any of the "section" directives described in section 2.4 also begins a new
addressing space.

be followed by a numerical expression specifying the number of bytes to a
multiple of which the current address has to be aligned. The boundary value
has to be a power of two.
  The "align" directive fills the bytes that had to be skipped to perform the
alignment with "nop" instructions and at the same time marks this area as
uninitialized data. So if it is placed among other uninitialized data that
wouldn't take space in the output file, the alignment bytes will act the same
way. If you need to fill the alignment area with some other values, you can
combine "align" with "virtual" to get the size of alignment needed and then
create the alignment yourself, like:

    virtual
        align 16
        a = $ - $$
    end virtual
    db a dup 0

alignment and address of the "virtual" block (see previous section), so it is
equal to the size of the needed alignment space.
  The "display" directive displays a message at assembly time. It should be
followed by quoted strings or byte values, separated with commas. It can be
used to display the values of some constants, for example:

    display 'Current offset is 0x'
    repeat bits/4
        d = '0' + $ shr (bits-%*4) and 0Fh
        if d > '9'
            d = d + 'A'-'9'-1
        end if
        display d
    end repeat
    display 13,10

value and converts them into characters for displaying. Note that this will
not work if the addresses in the current addressing space are relocatable (as
it might happen with PE or object output formats), since only absolute values
can be used this way. The absolute value may be obtained by calculating the
relative address, like "$-$$", or "rva $" in the case of the PE format.
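
  The digit conversion in the "display" sample can be followed step by step in
a Python model. The nibble is extracted first and only then added to '0'; the
sample relies on this grouping. The correction 'A'-'9'-1 (that is, 7) skips
the ASCII characters between '9' and 'A'. The function name is hypothetical.

```python
def hex_digits(value, bits):
    """Model of the "display" sample: one character per 4-bit group, most
    significant first."""
    chars = []
    for n in range(1, bits // 4 + 1):                       # repeat bits/4
        d = ord("0") + ((value >> (bits - n * 4)) & 0x0F)    # nibble, then + '0'
        if d > ord("9"):                                    # if d > '9'
            d = d + ord("A") - ord("9") - 1                 # jump to 'A'..'F'
        chars.append(chr(d))                                # display d
    return "".join(chars)


print("Current offset is 0x" + hex_digits(0x1A2F, 16))  # Current offset is 0x1A2F
```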
  The "err" directive immediately terminates the assembly process when it is
encountered by the assembler.
  The "assert" directive tests whether the logical expression that follows it
is true, and if not, it signals an error.

before they actually get defined, it has to predict the values of such labels.
If there is even a suspicion that the prediction failed in at least one case,
it does one more pass, assembling the whole source, this time making better
predictions based on the values the labels got in the previous pass.
  The changing values of labels can cause some instructions to have encodings
of different length, and this can cause a change in the values of labels
again. And since labels and constants can also be used inside the expressions
that affect the behavior of control directives, a whole block of source can be
processed completely differently during the new pass. Thus the assembler does
more and more passes, each time trying to make better predictions to approach
the final solution, when all the values get predicted correctly. It uses
various methods for predicting the values, which have been chosen so that,
for most programs, the solution of the smallest possible length is found in a
few passes.
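
  The toy model below shows how label values feed back into instruction
lengths when a jump's size depends on the distance to its target. As a
result, several passes are needed before everything is stable. It is an
illustration only, not the prediction algorithm of flat assembler. It assumes
2-byte short jumps and 5-byte near jumps (the 32-bit forms) and predicts
address 0 for a label never seen before. Like the assembler, it gives up
after a pass limit.

```python
SHORT, NEAR = 2, 5   # assumed sizes: short jump (rel8), 32-bit near jump (rel32)


def assemble(program, max_passes=100):
    """Toy multi-pass resolver. program is a list of ("label", name),
    ("data", size) and ("jmp", name) items. Returns (passes, labels), or
    None when no stable solution is found within max_passes."""
    predicted = {}                          # label values from the previous pass
    for passes in range(1, max_passes + 1):
        current, address = {}, 0
        for kind, arg in program:
            if kind == "label":
                current[arg] = address
            elif kind == "data":
                address += arg
            else:  # "jmp"
                # A label already defined in this pass needs no prediction;
                # otherwise use the previous pass (0 if never seen: assumption).
                target = current.get(arg, predicted.get(arg, 0))
                distance = target - (address + SHORT)
                address += SHORT if -128 <= distance <= 127 else NEAR
        if current == predicted:            # every prediction was correct
            return passes, current
        predicted = current
    return None


print(assemble([("jmp", "end"), ("data", 200), ("label", "end")]))  # (3, {'end': 205})
```

In the sample run, the first pass wrongly predicts a short jump. The second
pass switches to the near form, which moves "end". The third pass confirms
the new value.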
  Some of the errors, like values not fitting in the required boundaries, are
not signaled during those intermediate passes, since it may happen that when
some of the values are predicted better, these errors will disappear. However,
if the assembler meets some illegal syntax construction or unknown
instruction, it always stops immediately. Defining some label more than once
also causes such an error, because it makes the predictions groundless.
  Only the messages created with the "display" directive during the last
performed pass actually get displayed. When the assembly has been stopped due
to an error, these messages may reflect predicted values that are not yet
resolved correctly.
  The solution may sometimes not exist, and in such cases the assembler will
never manage to make correct predictions. For this reason there is a limit on
the number of passes. When the assembler reaches this limit, it stops and
displays a message that it is not able to generate the correct output.
Consider the following example:

    if ~ defined alpha
        alpha:
    end if

could be calculated in this place, which in this case means that the "alpha"
label is defined somewhere. But the above block causes this label to be
defined only when the value given by the "defined" operator is false, which
leads to an antinomy and makes it impossible to resolve such code. When
processing the "if" directive, the assembler has to predict whether the
"alpha" label will be defined somewhere (it wouldn't have to predict only if
the label was already defined earlier in this pass), and whatever the
prediction is, the opposite always happens. Thus the assembly will fail,
unless the "alpha" label is defined somewhere in the source preceding the
above block of instructions. In such a case, as was already noted, the
prediction is not needed and the block will just get skipped.
  The above sample might have been written as an attempt to define the label
only when it was not yet defined. It fails, because the "defined" operator
does check whether the label is defined anywhere, and this includes the
definition inside this conditionally processed block. However, adding some
additional condition may make it possible to get it resolved:

        alpha:
        @@:
    end if

following it, so the above sample would mean the same if any unique name was
used instead of the anonymous label. When "alpha" is not defined in any other
place in the source, the only possible solution is when this block gets
defined. This time it doesn't lead to an antinomy, because of the anonymous
label, which makes this block self-establishing. To better understand this,
look at a block that contains nothing but this self-establishing construct:

        @@:
    end if

cases when this block gets processed or not are equally correct. Which one of
those two solutions we get depends on the algorithm of the assembler; in the
case of flat assembler, on the algorithm of predictions. Back to the previous
sample: when "alpha" is not defined anywhere else, the condition for the "if"
block cannot be false, so we are left with only one possible solution, and we
can hope the assembler will arrive at it. On the other hand, when "alpha" is
defined in some other place, we've got two possible solutions again. One of
them causes "alpha" to be defined twice, and such an error causes the
assembler to abort the assembly immediately, as this is the kind of error that
deeply disturbs the process of resolving. So we can get such source either
correctly resolved or causing an error, and what we get may depend on the
internal choices made by the assembler.
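
  The reasoning about predictions can be checked mechanically. A solution is
a set of predictions for "defined" that agrees with what actually gets
defined. The Python sketch below enumerates all predictions for symbols that
are defined only inside the conditional block. As an assumption, the condition
of the extended sample is modeled as "alpha not defined, or the block's own
anonymous label defined". "@@" is just a key standing for that anonymous
label.

```python
from itertools import product


def consistent(symbols, condition):
    """Return every set of predictions that agrees with the outcome.

    Each symbol in symbols is assumed to be defined only inside the "if"
    block, so it ends up defined exactly when the block is assembled.
    condition(p) evaluates the "if" condition under the predictions p.
    """
    solutions = []
    for values in product((False, True), repeat=len(symbols)):
        predicted = dict(zip(symbols, values))
        assembled = condition(predicted)
        if all(predicted[s] == assembled for s in symbols):
            solutions.append(predicted)
    return solutions


# the antinomy: no prediction is ever confirmed
print(consistent(["alpha"], lambda p: not p["alpha"]))   # []
# a self-establishing block: both outcomes are consistent
print(consistent(["@@"], lambda p: p["@@"]))   # [{'@@': False}, {'@@': True}]
# the extended sample, alpha defined nowhere else: only one solution
print(consistent(["alpha", "@@"], lambda p: not p["alpha"] or p["@@"]))
# [{'alpha': True, '@@': True}]
```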
  However, there are some facts about such choices that are certain. When the
assembler has to check whether the given symbol is defined, and it was already
defined in the current pass, no prediction is needed, as was already noted
above. And when the given symbol has never been defined before, including in
all the already finished passes, the assembler predicts it to be not defined.
Knowing this, we can expect that the simple self-establishing block shown
above will not be assembled at all. We can also expect that the previous
sample will resolve correctly when "alpha" is defined somewhere before our
conditional block. When "alpha" is not already defined earlier, the block will
itself define it, thus potentially causing an error because of double
definition if "alpha" is also defined somewhere later.
  The "used" operator may be expected to behave in a similar manner in
analogous cases. However, other kinds of predictions may not be so simple, and
you should never rely on them this way.
  The "err" directive, usually used to stop the assembly when some condition
is met, stops the assembly immediately, regardless of whether the current pass
is final or intermediate. So even when the condition that caused this
directive to be interpreted is mispredicted and temporary, and would
eventually disappear in later passes, the assembly is stopped anyway.
  The "assert" directive signals an error only if its expression is false
after all the symbols have been resolved. You can use "assert 0" in place of
"err" when you do not want to have the assembly stopped during the
intermediate passes.

and therefore are not affected by the control directives. At this time also
all comments are stripped out.

it is used. It should be followed by the quoted name of the file that should
be included, for example:

to the line containing the "include" directive. There are no limits to the
number of included files as long as they fit in memory.
  The quoted path can contain environment variables enclosed within "%"
characters; they will be replaced with their values inside the path. Both the
"\" and "/" characters are allowed as path separators. The file is first
searched for in the directory containing the file which included it. When it
is not found there, the search is continued in the directories specified in
the environment variable called INCLUDE. Multiple paths separated with
semicolons can be defined there, and they will be searched in the same order
as specified. If the file was not found in any of these places, the
preprocessor looks for it in the directory containing the main source file
(the one specified in the command line). These rules also apply to paths given
with the "file" directive.
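
  The search order for included files can be summarized as a Python sketch.
The name "resolve_include", the hypothetical variable FASMINC and the file
names in the tests are illustrations only. The "exists" parameter lets the
lookup be tried without real files. Leaving unknown %NAME% references
unchanged is an assumption of this sketch rather than documented behavior.

```python
import os
import re


def expand_env(path, env):
    """Replace %NAME% with env[NAME]; unknown names are left untouched
    (an assumption of this sketch)."""
    return re.sub(r"%([^%]+)%", lambda m: env.get(m.group(1), m.group(0)), path)


def resolve_include(name, including_file, main_file, env, exists=os.path.isfile):
    """Return the first existing path for an "include" or "file" argument,
    following the search order described above, or None."""
    relative = expand_env(name, env).replace("\\", "/")   # both separators allowed
    directories = [os.path.dirname(including_file)]           # 1. includer's directory
    directories += [d for d in env.get("INCLUDE", "").split(";") if d]  # 2. INCLUDE
    directories.append(os.path.dirname(main_file))            # 3. main source directory
    for directory in directories:
        candidate = os.path.join(directory, relative)
        if exists(candidate):
            return candidate
    return None
```

Passing a set's "__contains__" method as "exists" simulates a file system. In
that case the first matching directory in the list above wins.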

assembly process they are replaced with their values everywhere in source
lines after their definitions, and anything can become their values.
  The definition of a symbolic constant consists of the name of the constant
followed by the "equ" directive. Everything that follows this directive will
become the value of the constant. If the value of a symbolic constant contains
other symbolic constants, they are replaced with their values before assigning
this value to the new constant. For example:

    d equ dword
    NULL equ d 0
    d equ edx

the value of "d" is "edx". So, for example, "push NULL" will be assembled as
"push dword 0" and "push d" will be assembled as "push edx". And if then the
following line was put:

lists of symbols can be defined.
  The "restore" directive allows you to get back the previous value of a
redefined symbolic constant. It should be followed by one or more names of
symbolic constants, separated with commas. So "restore d" after the above
definitions will give the "d" constant back the value "edx". The second one
will restore it to the value "dword", and one more will revert "d" to its
original meaning, as if no such constant was defined. If there was no
constant defined with the given name, "restore" will not cause an error; it
will just be ignored.
  Symbolic constants can be used to adjust the syntax of the assembler to
personal preferences. For example, the following set of definitions provides
handy shortcuts for all the size operators:

    b equ byte
    w equ word
    d equ dword
    p equ pword
    f equ fword
    q equ qword
    t equ tword
    x equ dqword
    y equ qqword

allow the syntax with the "offset" word before any address value:

    offset equ

copying the offset of the "char" variable into the "ax" register, because
"offset" is replaced with an empty value, and therefore ignored.
  The "define" directive, followed by the name of a constant and then the
value, is an alternative way of defining a symbolic constant. The only
difference between "define" and "equ" is that "define" assigns the value as
it is; it does not replace the symbolic constants with their values inside
it.
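
  A toy Python model of symbolic constants shows the difference between "equ"
and "define", and how "restore" walks back through earlier values. Values are
lists of tokens, and each name keeps a stack of its definitions. The sample
assumes "d" was first defined as "dword", as the text above implies. The
model does not cover whether a value stored by "define" is scanned again when
it is used. The class name is hypothetical.

```python
class SymbolicConstants:
    """Toy model of "equ", "define" and "restore" on token lists."""

    def __init__(self):
        self.stacks = {}                 # name -> list of values, last = current

    def replace(self, tokens):
        """Replace each token naming a symbolic constant with its current value."""
        out = []
        for token in tokens:
            stack = self.stacks.get(token)
            out.extend(stack[-1] if stack else [token])
        return out

    def equ(self, name, tokens):
        # "equ" replaces constants inside the value before assigning it
        self.stacks.setdefault(name, []).append(self.replace(tokens))

    def define(self, name, tokens):
        # "define" assigns the value as it is
        self.stacks.setdefault(name, []).append(list(tokens))

    def restore(self, name):
        # restoring a name that has no definition is silently ignored
        stack = self.stacks.get(name)
        if stack:
            stack.pop()


sc = SymbolicConstants()
sc.equ("d", ["dword"])
sc.equ("NULL", ["d", "0"])
sc.equ("d", ["edx"])
print(sc.replace(["push", "NULL"]))   # ['push', 'dword', '0']
print(sc.replace(["push", "d"]))      # ['push', 'edx']
sc.restore("d")
print(sc.replace(["d"]))              # ['dword']
sc.restore("d")
print(sc.replace(["d"]))              # ['d']
```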
  Symbolic constants can also be defined with the "fix" directive, which has
the same syntax as "equ", but defines constants of high priority. They are
replaced with their symbolic values even before processing the preprocessor
directives and macroinstructions. The only exception is the "fix" directive
itself, which has the highest possible priority, so it allows redefinition of
constants defined this way.
  The "fix" directive can be used for syntax adjustments related to directives
of the preprocessor, which cannot be done with the "equ" directive. For
example:

with the "equ" directive wouldn't give such a result, as standard symbolic
constants are replaced with their values after searching the line for
preprocessor directives.

macroinstructions, the use of which can greatly simplify the process of
programming. In its simplest form it's similar to a symbolic constant
definition. For example, the following definition defines a shortcut for the
"test al,0xFF" instruction:

    macro tst {test al,0xFF}

contents enclosed between the "{" and "}" characters. You can use the "tst"
instruction anywhere after this definition and it will be assembled as
"test al,0xFF". Defining a symbolic constant "tst" with that value would give
a similar result, but the difference is that the name of a macroinstruction
is recognized only as an instruction mnemonic. Also, macroinstructions are
replaced with the corresponding code even before the symbolic constants are
replaced with their values. So suppose you define a macroinstruction and a
symbolic constant of the same name. When this name is used as an instruction
mnemonic, it will be replaced with the contents of the macroinstruction; when
it is used somewhere inside the operands, it will be replaced with the value
of the symbolic constant.
  The definition of a macroinstruction can consist of many lines, because the
"{" and "}" characters don't have to be in the same line as the "macro"
directive. For example:

     {
        xor al,al
        stosb
     }

instructions anywhere it's used.
  Just as instructions need some number of operands, a macroinstruction can be
defined to need some number of arguments separated with commas. The names of
the needed arguments should follow the name of the macroinstruction in the
line of the "macro" directive, and they should be separated with commas if
there is more than one. Anywhere one of these names occurs in the contents of
the macroinstruction, it will be replaced with the corresponding value
provided when the macroinstruction is used. Here is an example of a
macroinstruction that will do data alignment for the binary output format:

defined, it will be replaced with the contents of this macroinstruction, and
the "value" will there become 4, so the result will be
"rb (4-1)-($+4-1) mod 4".
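
  The arithmetic of "rb (value-1)-($+value-1) mod value" can be checked with a
short Python model. The model compares it with a direct search for the
padding. It assumes non-negative addresses, for which fasm's "mod" agrees
with Python's "%". The helper names are hypothetical.

```python
def rb_count(address, value):
    """Bytes reserved by "rb (value-1)-($+value-1) mod value" when $ = address."""
    return (value - 1) - (address + value - 1) % value


def padding(address, value):
    """Reference: smallest k >= 0 such that address + k is divisible by value."""
    return next(k for k in range(value) if (address + k) % value == 0)


print([rb_count(a, 4) for a in range(8)])   # [0, 3, 2, 1, 0, 3, 2, 1]
```

The check in the tests suggests that the formula is not limited to powers of
two, unlike the boundary of the "align" directive.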
  If a macroinstruction is defined that uses an instruction with the same name
inside its definition, the previous meaning of this name is used. Useful
redefinition of macroinstructions can be done in that way, for example:

    macro mov op1,op2
     {
      if op1 in <ds,es,fs,gs,ss> & op2 in <cs,ds,es,fs,gs,ss>
        push  op2
        pop   op1
      else
        mov   op1,op2
      end if
     }

operands to be segment registers. For example "mov ds,es" will be assembled as
"push es" and "pop ds". In all other cases the standard "mov" instruction will
be used. The syntax of this "mov" can be extended further by defining the next
macroinstruction of that name, which will use the previous macroinstruction:

    macro mov op1,op2,op3
     {
      if op3 eq
        mov   op1,op2
      else
        mov   op1,op2
        mov   op2,op3
      end if
     }

operands only, because when a macroinstruction is given fewer arguments than
it needs, the rest of the arguments will have empty values. When three
operands are given, this macroinstruction will become two macroinstructions of
the previous definition, so "mov es,ds,dx" will be assembled as "push ds",
"pop es" and "mov ds,dx".
  By placing a "*" after the name of an argument, you can mark the argument as
required; the preprocessor will not allow it to have an empty value. For
example, the above macroinstruction could be declared as
"macro mov op1*,op2*,op3" to make sure that the first two arguments will
always have to be given some non-empty values.
  Alternatively, you can provide a default value for an argument by placing
"=" followed by the value after the name of the argument. Then, if the
argument has an empty value provided, the default value will be used instead.
  When a macroinstruction needs to be given an argument that contains some
commas, such an argument should be enclosed between "<" and ">" characters.
If it contains more than one "<" character, the same number of ">" characters
should be used to mark where the value of the argument ends.
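
  The rules for required arguments, default values and arguments enclosed in
"<" and ">" can be modeled in Python. This is a simplified sketch with
hypothetical names. Quoted strings, whose commas should not split arguments,
are not handled. Rejecting extra arguments is a choice made here, not
documented behavior.

```python
def split_args(text):
    """Split a macro argument list at top-level commas. An argument that
    starts with "<" extends to the matching ">" and loses the outer pair."""
    args, i, n = [], 0, len(text)
    while True:
        while i < n and text[i] == " ":
            i += 1
        if i < n and text[i] == "<":
            start, depth = i + 1, 0
            while i < n:
                if text[i] == "<":
                    depth += 1
                elif text[i] == ">":
                    depth -= 1
                    if depth == 0:
                        break
                i += 1
            if depth != 0:
                raise ValueError("unbalanced '<'")
            args.append(text[start:i])
            i += 1                                  # skip the closing ">"
            while i < n and text[i] == " ":
                i += 1
            if i < n and text[i] != ",":
                raise ValueError("unexpected text after '>'")
        else:
            end = text.find(",", i)
            end = n if end == -1 else end
            args.append(text[i:end].strip())
            i = end
        if i >= n:
            return args
        i += 1                                      # skip the comma


def bind_args(params, args):
    """Bind values to parameter specs like "op1*" (required) or "n=4"
    (default); missing arguments get empty values."""
    if len(args) > len(params):
        raise ValueError("too many arguments")      # simplification
    bound = {}
    for k, spec in enumerate(params):
        value = args[k] if k < len(args) else ""
        if spec.endswith("*"):
            name = spec[:-1]
            if not value:
                raise ValueError(f"argument {name!r} must not be empty")
        elif "=" in spec:
            name, default = spec.split("=", 1)
            value = value or default
        else:
            name = spec
        bound[name] = value
    return bound


print(split_args("a,<b,c>,d"))   # ['a', 'b,c', 'd']
```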
  The "purge" directive allows removing the last definition of the specified
macroinstruction. It should be followed by one or more names of
macroinstructions, separated with commas. If such a macroinstruction has not
been defined, you will not get any error. For example, after having the syntax
of "mov" extended with the macroinstructions defined above, you can disable
the syntax with three operands by using the "purge mov" directive. The next
"purge mov" will also disable the syntax for two operands being segment
registers, and all further such directives will do nothing.
  If after the "macro" directive you enclose some group of argument names in
square brackets, it will allow giving more values for this group of arguments
when using that macroinstruction. Any further argument given after the last
argument of such a group will begin a new group and will become its first
argument. That's why no more argument names can follow the closing square
bracket. The contents of the macroinstruction will be processed for each such
group of arguments separately. The simplest example is to enclose one argument
name in square brackets:

    macro stoschar [char]
     {
        mov al,char
        stosb
     }

will be processed into these two instructions separately. For example
"stoschar 1,2,3" will be assembled as the following instructions:

    mov al,1
    stosb
    mov al,2
    stosb
    mov al,3
    stosb

macroinstructions. The "local" directive defines local names, which will be
replaced with unique values each time the macroinstruction is used. It should
be followed by names separated with commas. If the name given as a parameter
to the "local" directive begins with a dot or two dots, the unique labels
generated by each evaluation of the macroinstruction will have the same
properties. This directive is usually needed for the constants or labels that
the macroinstruction defines and uses internally. For example:

     {
        local move
      move:
        lodsb
        stosb
        test al,al
        jnz move
     }

in its instructions, so you will not get the error you normally get when some
label is defined more than once.
  The "forward", "reverse" and "common" directives divide a macroinstruction
into blocks, each one processed after the processing of the previous one is
finished. They differ in behavior only if the macroinstruction allows multiple
groups of arguments. The block of instructions that follows the "forward"
directive is processed for each group of arguments, from first to last,
exactly like the default block (not preceded by any of these directives). The
block that follows the "reverse" directive is processed for each group of
arguments in reverse order, from last to first. The block that follows the
"common" directive is processed only once, commonly for all groups of
arguments. A local name defined in one of the blocks is available in all the
following blocks when processing the same group of arguments as when it was
defined. When it is defined in the common block, it is available in all the
following blocks regardless of which group of arguments is processed.
  Here is an example of a macroinstruction that will create a table of
addresses to strings followed by these strings:

     {
      common
        label name dword
      forward
        local label
        dd label
      forward
        label db string,0
     }

of addresses; the next arguments should be the strings. The first block is
processed only once and defines the label. The second block, for each string,
declares its local name and defines the table entry holding the address of
that string. The third block defines the data of each string with the
corresponding label.
  The directive starting the block in macroinstruction can be followed by the
4010
first instruction of this block in the same line, like in the following
4011
example:
4012
4013
 
4014
     {
4015
      reverse push arg
4016
      common call proc
4017
     }
4018
4019
 
4020
convention, which has all the arguments pushed on stack in the reverse order.
4021
For example "stdcall foo,1,2,3" will be assembled as:

    push 3
    push 2
    push 1
    call foo

  If some name inside a macroinstruction has multiple values (it is either one
of the arguments enclosed in square brackets or a local name defined in the
block following the "forward" or "reverse" directive) and is used in the block
following the "common" directive, it will be replaced with all of its values,
separated with commas. For example the following macroinstruction will pass
all of the additional arguments to the previously defined "stdcall"
macroinstruction:

    macro invoke proc,[arg]
     { common stdcall [proc],arg }

It can be used to call indirectly (by the pointer stored in memory) the
procedure using the STDCALL convention.
  Inside a macroinstruction the special operator "#" can also be used. This
operator causes two names to be concatenated into one name. This is useful
because it is done after the arguments and local names are replaced with their
values. The following macroinstruction will generate the conditional jump
according to the "cond" argument:

    macro jif op1,cond,op2,label
     {
        cmp op1,op2
        j#cond label
     }

For example "jif ax,ae,10h,exit" will be assembled as the "cmp ax,10h" and
"jae exit" instructions.
  The "#" operator can also be used to concatenate two quoted strings into one.
Conversion of a name into a quoted string is possible too, with the "`"
operator, which likewise can be used inside the macroinstruction. It converts
the name that follows it into a quoted string - but note that when it is
followed by a macro argument which is being replaced with a value containing
more than one symbol, only the first of them will be converted, as the "`"
operator converts only the one symbol that immediately follows it. Here's an
example of utilizing those two features:

    macro label name
     {
        label name
        if ~ used name
          display `name # " is defined but not used.",13,10
        end if
     }

When a label defined with such macro is not used in the source, the macro will
warn you with a message informing to which label it applies.
  To make a macroinstruction behave differently when some of the arguments are
of some special type, for example quoted strings, you can use the "eqtype"
comparison operator. Here's an example of utilizing it to distinguish a
quoted string from another argument:

    macro message arg
     {
      if arg eqtype ""
        local str
        jmp   @f
        str   db arg,0Dh,0Ah,24h    ; CR, LF and the "$" terminator
        @@:
        mov   dx,str
      else
        mov   dx,arg
      end if
        mov   ah,9                 ; DOS: print string at DS:DX
        int   21h
     }

The above macro is designed for displaying messages in DOS programs. When the
argument of this macro is some number, label, or variable, the string from
that address is displayed, but when the argument is a quoted string, the
created code will display that string followed by the carriage return and
line feed.
  It is also possible to put a declaration of a macroinstruction inside another
macroinstruction, so one macro can define another, but there is a problem
with such definitions caused by the fact that the "}" character cannot occur
inside the macroinstruction, as it always means the end of definition. To
overcome this problem, the escaping of symbols inside macroinstruction can be
used. This is done by placing one or more backslashes in front of any other
symbol (even the special character). The preprocessor sees such a sequence as a
single symbol, but each time it meets such symbol during the macroinstruction
processing, it cuts the backslash character from the front of it. For example
"\{" is treated as a single symbol, but during processing of the
macroinstruction it becomes the "{" symbol. This makes it possible to put one
definition of a macroinstruction inside another:

    macro ext instr
     {
      macro instr op1,op2,op3
       \{
        if op3 eq
          instr op1,op2
        else
          instr op1,op2
          instr op2,op3
        end if
       \}
     }

    ext add
    ext sub

The macro "ext" is defined correctly, but when it is used, the "\{" and "\}"
become the "{" and "}" symbols. So when "ext add" is processed, the contents
of the macro become a valid definition of a macroinstruction and this way the
"add" macro becomes defined. In the same way "ext sub" defines the "sub"
macro. The use of the "\{" symbol wasn't really necessary here, but is done
this way to make the definition clearer.
  If some directives specific to macroinstructions, like "local" or "common",
are needed inside some macro embedded this way, they can be escaped in the same
way. Escaping a symbol with more than one backslash is also allowed, which
permits multiple levels of nested macroinstruction definitions.
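  As an illustration (a sketch assuming the default flat binary output; the
names "defskip", "inline_text" and "level1" to "level3" are only examples),
an escaped "local" passes through the outer macro untouched, while a doubly
escaped brace survives two levels of expansion:

    macro defskip name
     {
      macro name text
       \{
        \local skip         ; becomes "local" only in the inner macro
        jmp skip            ; jump over the embedded data
        db text
      skip:
       \}
     }

    defskip inline_text     ; defines the "inline_text" macroinstruction
    inline_text 'abc'       ; emits a short jump and the bytes of 'abc'

    macro level1
     {
      macro level2
       \{
        macro level3
         \\{
          nop
         \\}
       \}
     }

    level1                  ; defines "level2"
    level2                  ; defines "level3"
    level3                  ; emits a single "nop"

Without the backslash, "local skip" would be interpreted by "defskip" itself,
so "skip" would become local to the outer macro instead of to each use of
the inner one.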
  Another technique for defining one macroinstruction by another is to use
the "fix" directive, which becomes useful when some macroinstruction only
begins the definition of another one, without closing it. For example:

    macro tmacro [params]
     {
      common macro params {
     }

    MACRO fix tmacro
    ENDM fix }

defines an alternative syntax for defining macroinstructions, which looks like:

    MACRO stoschar char
        mov al,char
        stosb
    ENDM

Note that a symbol that has such a customized definition must be defined with
the "fix" directive, because only the prioritized symbolic constants are
processed before the preprocessor looks for the "}" character while defining
the macro. This might be a problem if one needed to perform some additional
tasks at the end of such a definition, but there is one more feature which
helps in such cases. Namely, it is possible to put any directive, instruction
or macroinstruction just after the "}" character that ends the
macroinstruction, and it will be processed in the same way as if it was put
in the next line.
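  For instance, the text placed after the "}" can be made part of the "fix"
replacement itself, so that some extra work is done each time a definition in
the alternative syntax ends. The following sketch (the "macro_count"
variable and the "clearal" macro are only illustrations) counts the
macroinstructions defined this way:

    macro_count = 0

    macro tmacro [params]
     {
      common macro params {
     }
    MACRO fix tmacro
    ENDM fix } macro_count = macro_count + 1

    MACRO stoschar char
        mov al,char
        stosb
    ENDM                    ; closes the macro, then increments macro_count

    MACRO clearal
        xor al,al
    ENDM

After both definitions "macro_count" should equal 2, since the text following
the "}" is processed as if it were on the next line.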


2.3.4  Structures

"struc" directive is a special variant of "macro" directive that is used to
define data structures. A macroinstruction defined using the "struc" directive
must be preceded by a label (like the data definition directive) when it's
used. This label will also be attached at the beginning of every name starting
with a dot in the contents of the macroinstruction. The macroinstruction
defined using the "struc" directive can have the same name as some other
macroinstruction defined using the "macro" directive; the structure
macroinstruction will not prevent the standard macroinstruction from being
processed when there is no label before it, and vice versa. All the rules and
features concerning standard macroinstructions apply to structure
macroinstructions.
  Here is a sample of a structure macroinstruction:

    struc point x,y
     {
        .x dw x
        .y dw y
     }

For example "my point 7,11" will define a structure labeled "my", consisting of
two variables: "my.x" with value 7 and "my.y" with value 11.
  If somewhere inside the definition of a structure a name consisting of a
single dot is found, it is replaced by the name of the label for the given
instance of the structure, and this label will not be defined automatically in
such a case, allowing the definition to be completely customized. The
following example utilizes this feature to extend the data definition
directive "db" with the ability to calculate the size of defined data:

    struc db [data]
     {
       common
        . db data
        .size = $ - .
     }

With such definition "msg db 'Hello!',13,10" will also define the "msg.size"
constant, equal to the size of defined data in bytes.
  Defining data structures addressed by registers or absolute values should be
done using the "virtual" directive with a structure macroinstruction
(see 2.2.4).
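  A minimal sketch of that technique, assuming the default 16-bit flat binary
output and a two-field structure named "point" (repeated here so the example
is complete): the structure is laid over the memory addressed by BX, so its
fields become register-relative addresses and no data is emitted for it.

    struc point x,y
     {
        .x dw x
        .y dw y
     }

    virtual at bx
      pt point ?,?          ; pt.x = bx+0, pt.y = bx+2
    end virtual

        mov ax,[pt.y]       ; assembled as "mov ax,[bx+2]"

The name "pt" is only an example; any label can be used for the overlay.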
  "restruc" directive removes the last definition of the structure, just like
"purge" does with macroinstructions and "restore" with symbolic constants.
It also has the same syntax - it should be followed by one or more names of
structure macroinstructions, separated with commas.
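  For example, the size-calculating "db" structure described above can be
dropped again once it is no longer needed (a short sketch; the labels "msg"
and "txt" are illustrative):

    struc db [data]
     {
       common
        . db data
        .size = $ - .
     }

    msg db 'Hello!',13,10   ; also defines msg.size = 8
    restruc db
    txt db 'Bye',0          ; plain data definition again, no "txt.size"

After "restruc db" a labeled "db" is again handled as the ordinary data
definition directive, since no earlier structure definition with that name
remains.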


2.3.5  Repeating macroinstructions

The "rept" directive is a special kind of macroinstruction, which makes the
given amount of duplicates of the block enclosed with braces. The basic syntax
is the "rept" directive followed by a number and then a block of source
enclosed between the "{" and "}" characters. The simplest example:

    rept 5 { in al,dx }

will make five duplicates of the "in al,dx" line. The block of instructions
is defined in the same way as for the standard macroinstruction and any
special operators and directives which can be used only inside
macroinstructions are also allowed here. When the given count is zero, the
block is simply skipped, as if you defined a macroinstruction but never used
it. The number of repetitions can be followed by the name of a counter symbol,
which will get replaced symbolically with the number of the duplicate
currently generated. So this:

    rept 3 counter
     {
        byte#counter db counter
     }

will generate lines:

    byte1 db 1
    byte2 db 2
    byte3 db 3

The repetition mechanism applied to "rept" blocks is the same as the one used
to process multiple groups of arguments for macroinstructions, so directives
like "forward", "common" and "reverse" can be used in their usual meaning.
Thus such a macroinstruction:

    rept 7 num { reverse display `num }

will display digits from 7 to 1 as text. The "local" directive behaves in the
same way as inside a macroinstruction with multiple groups of arguments, so:

    rept 21
     {
       local label
       label: loop label
     }

will generate a unique label for each duplicate.
  The counter symbol by default counts from 1, but you can declare a different
base value by placing the number preceded by a colon immediately after the
name of the counter. For example:

    rept 8 n:0 { pxor xmm#n,xmm#n }

will generate code which will clear the contents of eight SSE registers.
You can define multiple counters separated with commas, and each one can have
a different base.
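  As a sketch of several counters used at once (the names "i" and "j" are
arbitrary), each counter below advances by one per duplicate, starting from
its own base:

    rept 3 i:0,j:10 { db i,j }

This should produce the lines "db 0,10", "db 1,11" and "db 2,12", six bytes
in total.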
  The number of repetitions and the base values for counters can be specified
using numerical expressions with operator rules identical to those of the
assembler. However, each value used in such an expression must either be a
directly specified number, or a symbolic constant whose value is also an
expression that can be calculated by the preprocessor (in such case the value
of the expression associated with the symbolic constant is calculated first,
and then substituted into the outer expression in place of that constant). If
you need repetitions based on values that can only be calculated at assembly
time, use one of the code repeating directives that are processed by the
assembler, see section 2.2.3.
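  The difference can be sketched as follows, assuming the default flat binary
output ("ITEMS" and "count" are illustrative names). A symbolic constant
holding an expression is evaluated by the preprocessor, while a numerical
constant defined with "=" exists only at assembly time, so it requires the
"repeat" directive of the assembler:

    ITEMS equ 2*3
    rept ITEMS+1 { nop }    ; preprocessor computes (2*3)+1: seven "nop" lines

    count = 4               ; assembly-time constant, unknown to "rept"
    repeat count            ; handled by the assembler instead (see 2.2.3)
        nop
    end repeat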
  The "irp" directive iterates the single argument through the given list of
parameters. The syntax is "irp" followed by the argument name, then the comma
and then the list of parameters. The parameters are specified in the same
way as in the invocation of a standard macroinstruction, so they have to be
separated with commas and each one can be enclosed with the "<" and ">"
characters. Also the name of the argument may be followed by "*" to mark that
it cannot get an empty value. Such block:

   irp value, 2,3,5
    { db value }

will generate lines:

   db 2
   db 3
   db 5

The "irps" directive iterates through the given list of symbols; it should
be followed by the argument name, then the comma and then the sequence of any
symbols. Each symbol in this sequence, no matter whether it is a name symbol,
a symbol character or a quoted string, becomes an argument value for one
iteration. If there are no symbols following the comma, no iteration is done
at all. This example:

   irps reg, al bx ecx
    { xor reg,reg }

will generate lines:

   xor al,al
   xor bx,bx
   xor ecx,ecx

The blocks defined by the "irp" and "irps" directives are also processed in
the same way as any macroinstructions, so operators and directives specific
to macroinstructions may be freely used also in this case.
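  For instance, the following sketch (assuming 32-bit code and the default
flat binary output) uses "forward" and "reverse" blocks inside "irp" to save
and restore registers in matching order, and encloses a parameter containing
a comma with "<" and ">":

    use32

    irp reg, eax,ebx,ecx
     {
      forward push reg      ; push eax, push ebx, push ecx
      reverse pop reg       ; pop ecx, pop ebx, pop eax
     }

    irp item, <1,2>,3 { db item }   ; generates "db 1,2" and "db 3"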


2.3.6  Conditional preprocessing

"match" directive causes some block of source to be preprocessed and passed
to the assembler only when the given sequence of symbols matches the specified
pattern. The pattern comes first, ended with a comma, then the symbols that
have to be matched with the pattern, and finally the block of source, enclosed
within braces as in a macroinstruction.
  There are a few rules for building the expression for matching. The first is
that any of the symbol characters and any quoted string should be matched
exactly as is. In this example:

    match +,+ { include 'first.inc' }
    match +,- { include 'second.inc' }

the first file will get included, since "+" after the comma matches "+" in the
pattern, and the second file will not be included, since there is no match.
  To match any other symbol literally, it has to be preceded by the "="
character in the pattern. Also to match the "=" character itself, or the
comma, the "==" and "=," constructions have to be used. For example the "=a=="
pattern will match the "a=" sequence.
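  A small sketch of the "==" construction ("key", "value", "width" and
"height" are arbitrary names): the pattern below splits a "name=number"
sequence into its two halves and turns it into an assembly-time constant.

    match key==value, width=80 { key = value }    ; defines "width = 80"
    match key==value, height 25 { key = value }   ; no "=" present: skipped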
  If some name symbol is placed in the pattern, it matches any sequence
consisting of at least one symbol and then this name is replaced with the
matched sequence everywhere inside the following block, analogously to the
parameters of a macroinstruction. For instance:

    match a-b, 0-7
     { dw a,b-a }

will generate the "dw 0,7-0" instruction. Each name is always matched with
as few symbols as possible, leaving the rest for the following ones, so in
this case:

    match a b, 1+2+3 { db a }

the "a" name will match the "1" symbol, leaving the "+2+3" sequence to be
matched with "b". But in this case:

    match a b, 1 { db a }

there will be nothing left for "b" to match, so the block will not get
processed at all.
  The block of source defined by match is processed in the same way as any
macroinstruction, so any operators specific to macroinstructions can be used
also in this case.
  What makes the "match" directive more useful is the fact that it replaces the
symbolic constants with their values in the matched sequence of symbols (that
is everywhere after the comma up to the beginning of the source block) before
performing the match. Thanks to this it can be used, for example, to process
some block of source under the condition that some symbolic constant has the
given value, like:

    match =TRUE, DEBUG { include 'debug.inc' }

which will include the file only when the symbolic constant "DEBUG" was
defined with value "TRUE".


2.3.7  Order of processing

When combining various features of the preprocessor, it's important to know
the order in which they are processed. As was already noted, the "fix"
directive and the replacements defined with it have the highest priority. This
is done completely before doing any other preprocessing, therefore this
piece of source:

    V fix {
      macro empty
       V
    V fix }
       V

becomes a valid definition of an empty macroinstruction. It can be interpreted
as if the "fix" directive and prioritized symbolic constants were processed in
a separate stage, with all other preprocessing done afterwards on the
resulting source.
  The standard preprocessing that comes after begins, on each line, with
recognition of the first symbol. It starts with checking for the preprocessor
directives, and when none of them is detected, the preprocessor checks whether
the first symbol is a macroinstruction. If no macroinstruction is found, it
moves to the second symbol of the line, and again begins with checking for
directives, which in this case is only the "equ" directive, as this is the
only one that occurs as the second symbol in a line. If there is no directive,
the second symbol is checked for the case of a structure macroinstruction, and
when none of those checks gives a positive result, the symbolic constants are
replaced with their values and such line is passed to the assembler.
  To see it in an example, assume that there is a macroinstruction called
"foo" and a structure macroinstruction called "bar" defined. Those lines:

    foo equ
    foo bar

would then both be interpreted as invocations of macroinstruction "foo", since
the meaning of the first symbol overrides the meaning of the second one.
  When the macroinstruction generates the new lines from its definition block,
in every line it first scans for macroinstruction directives, and interprets
them accordingly. All the other content in the definition block is used to
brew the new lines, replacing the macroinstruction parameters with their values
and then processing the symbol escaping and "#" and "`" operators. The
conversion operator has higher priority than concatenation, and if any of
them operates on an escaped symbol, the escaping is cancelled before finishing
the operation. After this is completed, the newly generated line goes through
the standard preprocessing, as described above.
  Though the symbolic constants are usually only replaced in the lines where
no preprocessor directives nor macroinstructions have been found, there are
some special cases where those replacements are performed in the parts of
lines containing directives. The first one is the definition of a symbolic
constant, where the replacements are done everywhere after the "equ" keyword
and the resulting value is then assigned to the new constant (see 2.3.2). The
second such case is the "match" directive, where the replacements are done in
the symbols following the comma before matching them with the pattern. These
features can be used, for example, to maintain lists, like this set of
definitions:

    list equ

    macro append item
     {
       match any, list \{ list equ list,item \}
       match , list \{ list equ item \}
     }

The "list" constant is here initialized with an empty value, and the "append"
macroinstruction can be used to add new items to this list, separating them
with commas. The first match in this macroinstruction occurs only when the
value of the list is not empty (see 2.3.6), in which case the new value of
the list is the previous one with a comma and the new item appended at the
end. The second match happens only when the list is still empty, and in such
case the list is defined to contain just the new item. So starting with the
empty list, "append 1" would define "list equ 1" and "append 2" following it
would define "list equ 1,2". One might then need to use this list as the
parameters to some macroinstruction. But it cannot be done directly - if "foo"
is a macroinstruction, then "foo list" would just pass the "list" symbol
as a parameter to the macro, since symbolic constants are not unrolled at this
stage. For this purpose the "match" directive again comes in handy:

    match params, list { foo params }

The value of "list", if it's not empty, matches the "params" keyword, which is
then replaced with the matched value when generating the new lines defined by
the block enclosed with braces. So if "list" had the value "1,2", the above
line would generate the line containing "foo 1,2", which would then go through
the standard preprocessing.
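  Putting the pieces together, a complete sketch (assuming the default flat
binary output; "emit" is an illustrative name) collects three items and then
passes them as separate arguments to a macroinstruction:

    list equ

    macro append item
     {
       match any, list \{ list equ list,item \}
       match , list \{ list equ item \}
     }

    macro emit [value] { db value }

    append 1
    append 2
    append 3
    match params, list { emit params }   ; expands to "emit 1,2,3"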
  The other special case is in the parameters of the "rept" directive. The
amount of repetitions and the base value for the counter can be specified
using numerical expressions, and if there is a symbolic constant with a
non-numerical name used in such an expression, the preprocessor tries to
evaluate its value as a numerical expression and, if it succeeds, it replaces
the symbolic constant with the result of that calculation and continues to
evaluate the primary expression. If the expression inside that symbolic
constant also contains some symbolic constants, the preprocessor will try to
calculate all the needed values recursively.
  This makes it possible to perform some calculations at the time of
preprocessing, as long as all the values used are numbers known at the
preprocessing stage. A single repetition with "rept" can be used for the sole
purpose of calculating some value, like in this example:

    define a b+4
    define b 3
    rept 1 result:a*b+2 { define c result }

To compute the base value for the "result" counter, the preprocessor replaces
the "b" with its value and recursively calculates the value of "a", obtaining
7 as the result, then it calculates the main expression with the result being
23. The "c" then gets defined with the first value of the counter (because the
block is processed just one time), which is the result of the computation, so
the value of "c" is simply the "23" symbol. Note that if "b" is later redefined
with some other numerical value, the next time an expression containing "a" is
calculated, the value of "a" will reflect the new value of "b", because the
symbolic constant contains just the text of the expression.
  There is one more special case - when the preprocessor goes to checking the
second symbol in the line and it happens to be the colon character (which is
then interpreted by the assembler as the definition of a label), it stops in
this place and finishes the preprocessing of the first symbol (so if it's a
symbolic constant it gets unrolled), and if it still appears to be a label,
it performs the standard preprocessing starting from the place after the
label. This makes it possible to place preprocessor directives and
macroinstructions after labels, analogously to the instructions and
directives processed by the assembler, like:

    start: display 'Hello!'

But if the first symbol no longer appears to be a label after unrolling (for
example when it is a symbolic constant with empty value), only replacing of
the symbolic constants is continued for the rest of the line.
  It should be remembered that the jobs performed by the preprocessor are
preliminary operations on the text symbols, done in a simple single pass
before the main process of assembly. The text that is the result of
preprocessing is passed to the assembler, and it then does its multiple passes
on it. Thus the control directives, which are recognized and processed only
by the assembler - as they are dependent on numerical values that may even
vary between passes - are not recognized in any way by the preprocessor and
have no effect on the preprocessing. Consider this example source:

    if 0
    a = 1
    b equ 2
    end if
    dd b

When the above source is preprocessed, the only directive recognized by the
preprocessor is the "equ", which defines the symbolic constant "b", so later
in the source the "b" symbol is replaced with the value "2". Except for this
replacement, the other lines are passed unchanged to the assembler. So
after preprocessing the above source becomes:

    if 0
    a = 1
    end if
    dd 2

Now when the assembler processes it, the condition for the "if" is false, and
the "a" constant doesn't get defined. However the symbolic constant "b" was
processed normally, even though its definition was put just next to the one
of "a". So because of the possible confusion you should be very careful
every time you mix the features of the preprocessor and the assembler - in
such cases it is important to realize what the source will become after the
preprocessing, and thus what the assembler will see and do its multiple passes
on.
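  When a symbolic constant really should depend on a condition, the condition
has to be checked by the preprocessor itself, for example with "match". A
sketch, assuming "DEBUG" is a symbolic constant set to "TRUE" or "FALSE" (an
illustrative convention, not something the assembler defines):

    DEBUG equ FALSE

    match =TRUE, DEBUG { b equ 2 }
    match =FALSE, DEBUG { b equ 3 }

    dd b                    ; preprocessed into "dd 3"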


2.4  Formatter directives

These directives are actually also a kind of control directives, with the
purpose of controlling the format of generated code.
  "format" directive followed by the format identifier selects the output
format. This directive should be put at the beginning of the source. The
default output format is a flat binary file, which can also be selected by
using the "format binary" directive. This directive can be followed by the
"as" keyword and the quoted string specifying the default file extension for
the output file. Unless the output file name was specified from the command
line, the assembler will use this extension when generating the output file.
  "use16" and "use32" directives force the assembler to generate 16-bit or
32-bit code, overriding the default setting for the selected output format.
"use64" enables generating the code for the long mode of x86-64 processors.
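  A short sketch of these settings in a flat binary file ("img" is just an
example extension); the same kind of instruction is encoded differently on
each side of the "use32" directive:

    format binary as 'img'

    use16
        mov ax,1            ; B8 01 00
    use32
        mov eax,1           ; B8 01 00 00 00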
  Below are described the different output formats, with the directives
specific to these formats.


2.4.1  MZ executable

To select the MZ output format, use "format MZ" directive. The default code
setting for this format is 16-bit.
  "segment" directive defines a new segment, it should be followed by a label,
whose value will be the number of the defined segment; optionally the "use16"
or "use32" word can follow to specify whether code in this segment should be
16-bit or 32-bit. The origin of the segment is aligned to paragraph (16
bytes). All the labels defined then will have values relative to the
beginning of this segment.
  "entry" directive sets the entry point for the MZ executable, it should be
followed by the far address (name of segment, colon and the offset inside
segment) of the desired entry point.
  "stack" directive sets up the stack for the MZ executable. It can be followed
by a numerical expression specifying the size of the stack to be created
automatically, or by the far address of the initial stack frame when you want
to set up the stack manually. When no stack is defined, a stack of the default
size of 4096 bytes will be created.
  "heap" directive should be followed by a 16-bit value defining the maximum
size of additional heap in paragraphs (this is heap in addition to stack and
undefined data). Use "heap 0" to always allocate only the memory the program
really needs. The default size of heap is 65535.
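  Putting these directives together, a minimal DOS program in MZ format might
look like the sketch below (the segment names "main" and "text" and the sizes
are illustrative choices):

    format MZ
    entry main:start            ; far address: segment "main", offset "start"
    stack 100h                  ; let the loader create a 256-byte stack
    heap 0                      ; allocate only the memory really needed

    segment main
    start:
        mov ax,text             ; number of segment "text"
        mov ds,ax
        mov dx,hello            ; offset relative to segment "text"
        mov ah,9                ; DOS: display "$"-terminated string
        int 21h
        mov ax,4C00h            ; DOS: terminate with exit code 0
        int 21h

    segment text
    hello db 'Hello, world!',13,10,'$'
    hello_size = $ - hello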


2.4.2  Portable Executable

To select the Portable Executable output format, use "format PE" directive, it
can be followed by additional format settings: first the target subsystem
setting, which can be "console" or "GUI" for Windows applications, "native"
for Windows drivers, "EFI", "EFIboot" or "EFIruntime" for UEFI; it may be
followed by the minimum version of the system that the executable is targeted
to (specified in the form of a floating-point value). Optional "DLL" and "WDM"
keywords mark the output file as a dynamic link library and WDM driver
respectively, and the "large" keyword marks the executable as able to handle
addresses larger than 2 GB.
  After those settings can follow the "at" operator and a numerical expression
specifying the base of the PE image, and then optionally the "on" operator
followed by the quoted string containing a file name selects a custom MZ stub
for the PE program (when the specified file is not an MZ executable, it is
treated as a flat binary executable file and converted into MZ format). The
default code setting for this format is 32-bit. An example of a fully featured
PE format declaration:

    format PE GUI 4.0 DLL at 7000000h on 'stub.exe'

  To create a PE file for the x86-64 architecture, use the "PE64" keyword
instead of "PE" in the format declaration; in such case the long mode code is
generated by default.
  "section" directive defines a new section, it should be followed by a quoted
string defining the name of the section, then one or more section flags can
follow. Available flags are: "code", "data", "readable", "writeable",
"executable", "shareable", "discardable", "notpageable". The origin of the
section is aligned to page (4096 bytes). Example declaration of a PE section:

    section '.text' code readable executable

  Along with the flags, one of the special PE data identifiers can also be
specified to mark the whole section as special data; possible identifiers are
"export", "import", "resource" and "fixups". If the section is marked to
contain fixups, they are generated automatically and no more data needs to be
defined in this section. Resource data can also be generated automatically
from a resource file, which can be achieved by writing the "from" operator and
a quoted file name after the "resource" identifier. Below are examples of
sections containing some special PE data:

    section '.reloc' data readable discardable fixups
    section '.rsrc' data readable resource from 'my.res'

  "entry" directive sets the entry point for the Portable Executable; the value
of the entry point should follow.
  "stack" directive sets up the size of stack for the Portable Executable; the
value of the stack reserve size should follow, optionally followed by the value
of the stack commit, separated with a comma. When the stack is not defined, it
is set by default to the size of 4096 bytes.
  "heap" directive chooses the size of heap for the Portable Executable; the
value of the heap reserve size should follow, optionally followed by the value
of the heap commit, separated with a comma. When no heap is defined, it is set
by default to the size of 65536 bytes; when the size of the heap commit is
unspecified, it is by default set to zero.
  "data" directive begins the definition of special PE data; it should be
followed by one of the data identifiers ("export", "import", "resource" or
"fixups") or by the number of the data entry in the PE header. The data should
be defined in the next lines, ended with the "end data" directive. When the
fixups data definition is chosen, they are generated automatically and no more
data needs to be defined there. The same applies to the resource data when the
"resource" identifier is followed by the "from" operator and a quoted file
name - in such case the data is taken from the given resource file.
  The "rva" operator can be used inside numerical expressions to obtain
the RVA of the item addressed by the value it is applied to, that is the
offset relative to the base of the PE image.
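  A minimal sketch combining these directives in a 32-bit console program
(the label "pointer" and the sizes are illustrative; the program only returns
from its entry point):

    format PE console 4.0
    entry start
    stack 100000h,4000h         ; reserve 1 MB, commit 16 KB
    heap 20000h                 ; reserve 128 KB, commit defaults to zero

    section '.text' code readable executable
    start:
        mov eax,rva start       ; offset of "start" from the image base
        mov eax,[pointer]       ; absolute address, covered by a fixup
        ret

    section '.data' data readable writeable
    pointer dd start            ; absolute address of "start"

    section '.reloc' data readable discardable fixups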


2.4.3  Common Object File Format

To select Common Object File Format, use "format COFF" or "format MS COFF"
directive, depending on whether you want to create the classic (DJGPP) or
Microsoft's variant of COFF file. The default code setting for this format is
32-bit. To create a file in Microsoft's COFF format for the x86-64
architecture, use the "format MS64 COFF" setting; in such case long mode code
is generated by default.
  "section" directive defines a new section, it should be followed by a quoted
string defining the name of the section, then one or more section flags can
follow. Section flags available for both COFF variants are "code" and "data",
while the flags "readable", "writeable", "executable", "shareable",
"discardable", "notpageable", "linkremove" and "linkinfo" are available only
with Microsoft's COFF variant.
  By default a section is aligned to double word (four bytes); in the case of
the Microsoft COFF variant, another alignment can be specified by providing
the "align" operator followed by the alignment value (any power of two up to
8192) among the section flags.
  "extrn" directive defines the external symbol, it should be followed by the
name of the symbol and optionally the size operator specifying the size of
data labeled by this symbol. The name of the symbol can also be preceded by a
quoted string containing the name of the external symbol and the "as"
operator. Some example declarations of external symbols:

    extrn exit
    extrn '__imp__MessageBoxA@16' as MessageBox:dword

  "public" directive declares the existing symbol as public, it should be
followed by the name of the symbol; optionally it can be followed by the "as"
operator and the quoted string containing the name under which the symbol
should be available as public. Some examples of public symbols declarations:

    public main
    public start as '_start'

Additionally, with COFF format it's possible to specify an exported symbol as
static; this is done by preceding the name of the symbol with the "static"
keyword.
  When using Microsoft's COFF format, the "rva" operator can be used inside
numerical expressions to obtain the RVA of the item addressed by the value it
is applied to.
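  As a sketch, a Microsoft COFF object file that calls "MessageBoxA" through
its import pointer and exports one routine might look like this (the names
"show", "_show", "caption" and "text" are illustrative, and the object still
has to be linked against the import library):

    format MS COFF

    extrn '__imp__MessageBoxA@16' as MessageBox:dword
    public show as '_show'

    section '.text' code readable executable align 16
    show:
        push 0                  ; MB_OK
        push caption
        push text
        push 0                  ; no owner window
        call [MessageBox]       ; indirect call through the import pointer
        ret

    section '.data' data readable writeable
    caption db 'fasm',0
    text db 'Hello',0
    text_len = $ - text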


2.4.4  Executable and Linkable Format

To select ELF output format, use "format ELF" directive. The default code
setting for this format is 32-bit. To create an ELF file for the x86-64
architecture, use the "format ELF64" directive; in such case the long mode
code is generated by default.
  "section" directive defines a new section, it should be followed by a quoted
string defining the name of the section, then one or both of the "executable"
and "writeable" flags can follow, and optionally also the "align" operator
followed by the number specifying the alignment of the section (it has to be a
power of two). If no alignment is specified, the default value is used, which
is 4 or 8, depending on which format variant has been chosen.
  "extrn" and "public" directives have the same meaning and syntax as when the
COFF output format is selected (described in the previous section).
  The "rva" operator can also be used in the case of this format (however not
when the target architecture is x86-64); it converts the address into the
offset relative to the GOT, so it may be useful to create position-independent
code. There's also a special "plt" operator, which makes it possible to call
external functions through the Procedure Linkage Table. You can even create an
alias for an external function that will make it always be called through
PLT, with code like:

    extrn _printf
    printf = PLT _printf

  To create an executable file, follow the format choice directive with the
"executable" keyword and optionally the number specifying the brand of the
target operating system (for example value 3 would mark the executable
for the Linux system). With this format selected it is allowed to use the
"entry" directive followed by the value to set as the entry point of the
program. On the other hand it makes the "extrn" and "public" directives
unavailable, and instead of "section" the "segment" directive should be used,
followed by one or more segment permission flags and optionally a marker of a
special ELF executable segment, which can be "interpreter", "dynamic" or
"note". The origin of the segment is aligned to page (4096 bytes), and the
available permission flags are: "readable", "writeable" and "executable".
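  For example, a minimal 32-bit Linux executable could be sketched as follows
(the labels are illustrative; system call numbers 4 and 1 are "write" and
"exit" of the 32-bit Linux "int 80h" interface):

    format ELF executable 3
    entry start

    segment readable executable
    start:
        mov eax,4               ; sys_write
        mov ebx,1               ; file descriptor 1 (stdout)
        mov ecx,msg             ; buffer address
        mov edx,msg_size        ; buffer length
        int 80h
        mov eax,1               ; sys_exit
        xor ebx,ebx             ; exit code 0
        int 80h

    segment readable writeable
    msg db 'Hello world!',0Ah
    msg_size = $ - msg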