
                           ▄▀▀▀
                         ▄▄█▄▄ ▄▄▄▄    ▄▄▄▄▄ ▄▄▄ ▄▄
                           █       █  █      █  █  █
                           █  ▄▀▀▀▀█   ▀▀▀▀▄ █  █  █
                           █  ▀▄▄▄▄█▄ ▄▄▄▄▄▀ █  █  █

                              flat assembler 1.66
                              Programmer's Manual


Table of contents
─────────────────

Chapter 1  Introduction

        1.1  Compiler overview
        1.1.1  System requirements
        1.1.2  Executing compiler from command line
        1.1.3  Compiler messages
        1.1.4  Output formats

        1.2  Assembly syntax
        1.2.1  Instruction syntax
        1.2.2  Data definitions
        1.2.3  Constants and labels
        1.2.4  Numerical expressions
        1.2.5  Jumps and calls
        1.2.6  Size settings

Chapter 2  Instruction set

        2.1  The x86 architecture instructions
        2.1.1  Data movement instructions
        2.1.2  Type conversion instructions
        2.1.3  Binary arithmetic instructions
        2.1.4  Decimal arithmetic instructions
        2.1.5  Logical instructions
        2.1.6  Control transfer instructions
        2.1.7  I/O instructions
        2.1.8  Strings operations
        2.1.9  Flag control instructions
        2.1.10  Conditional operations
        2.1.11  Miscellaneous instructions
        2.1.12  System instructions
        2.1.13  FPU instructions
        2.1.14  MMX instructions
        2.1.15  SSE instructions
        2.1.16  SSE2 instructions
        2.1.17  SSE3 instructions
        2.1.18  AMD 3DNow! instructions
        2.1.19  The x86-64 long mode instructions

        2.2  Control directives
        2.2.1  Numerical constants
        2.2.2  Conditional assembly
        2.2.3  Repeating blocks of instructions
        2.2.4  Addressing spaces
        2.2.5  Other directives
        2.2.6  Multiple passes

        2.3  Preprocessor directives
        2.3.1  Including source files
        2.3.2  Symbolic constants
        2.3.3  Macroinstructions
        2.3.4  Structures
        2.3.5  Repeating macroinstructions
        2.3.6  Conditional preprocessing
        2.3.7  Order of processing

        2.4  Formatter directives
        2.4.1  MZ executable
        2.4.2  Portable Executable
        2.4.3  Common Object File Format
        2.4.4  Executable and Linkable Format


Chapter 1  Introduction
───────────────────────

This chapter contains all the most important information you need to begin
using the flat assembler. If you are an experienced assembly language
programmer, you should read at least this chapter before using this compiler.


1.1  Compiler overview

Flat assembler is a fast assembly language compiler for the x86 architecture
processors, which does multiple passes to optimize the size of the generated
machine code. It is self-compilable, and versions for different operating
systems are provided. All the versions are designed to be used from the system
command line, and they should not differ in behavior.


1.1.1  System requirements

All versions require an x86 architecture 32-bit processor (at least 80386),
although they can produce programs for x86 architecture 16-bit processors,
too. The DOS version requires an OS compatible with MS DOS 2.0 and either a
true real mode environment or DPMI. The Windows version requires a Win32
console compatible with version 3.1.


1.1.2  Executing compiler from command line

To execute flat assembler from the command line you need to provide two
parameters: the first should be the name of the source file, the second the
name of the destination file. If no second parameter is given, the name of the
output file is guessed automatically. After displaying short information
about the program name and version, the compiler reads the data from the
source file and compiles it. When the compilation is successful, the compiler
writes the generated code to the destination file and displays the summary of
the compilation process; otherwise it displays information about the error
that occurred.
  The source file should be a text file, and can be created in any text
editor. Line breaks are accepted in both DOS and Unix standards, and tabs are
treated as spaces.
  In the command line you can also include the "-m" option followed by a
number, which specifies the maximum number of kilobytes of memory that flat
assembler should use. In the case of the DOS version this option limits only
the usage of extended memory. The "-p" option followed by a number can be used
to specify the limit for the number of passes the assembler performs. If code
cannot be generated within the specified number of passes, the assembly is
terminated with an error message. The maximum value of this setting is 65536,
while the default limit, used when no such option is included in the command
line, is 100.
  There are no command line options that would affect the output of the
compiler; flat assembler requires only the source code to include the
information it really needs. For example, to specify the output format you
use the "format" directive at the beginning of the source.


1.1.3  Compiler messages

As stated above, after a successful compilation the compiler displays the
compilation summary. It includes the information on how many passes were
done, how much time they took, and how many bytes were written into the
destination file. The following is an example of the compilation summary:

flat assembler  version 1.66
38 passes, 5.3 seconds, 77824 bytes.

In case of an error during the compilation process, the program displays an
error message. For example, when the compiler can't find the input file, it
displays the following message:

flat assembler  version 1.66
error: source file not found.

If the error is connected with a specific part of the source code, the source
line that caused the error is also displayed. The placement of this line in
the source is given as well, to help you find the error, for example:

flat assembler  version 1.66
example.asm [3]:
        mob     ax,1
error: illegal instruction.

It means that in the third line of the "example.asm" file the compiler has
encountered an unrecognized instruction. When the line that caused the error
contains a macroinstruction, the line in the macroinstruction definition that
generated the erroneous instruction is displayed too:

flat assembler  version 1.66
example.asm [6]:
        stoschar 7
example.asm [3] stoschar [1]:
        mob     al,char
error: illegal instruction.

It means that the macroinstruction in the sixth line of the "example.asm" file
generated an unrecognized instruction with the first line of its definition,
which is the third line of the file.


1.1.4  Output formats

By default, when there is no "format" directive in the source file, flat
assembler simply puts the generated instruction codes into the output,
creating a flat binary file this way. By default it generates 16-bit code,
but you can always switch it into 16-bit or 32-bit mode with the "use16" or
"use32" directive. Some of the output formats switch into 32-bit mode when
selected; more information about the formats you can choose can be found
in 2.4.
  All output code is always in the order in which it was entered into the
source file.
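  The fragment below is an illustrative sketch, not one of the manual's own
examples: it has no "format" directive, so it assembles into a flat binary,
and the comments give the standard x86 encodings of the same "mov" in each
mode. It is meant for comparing the output, not for execution.

```asm
; No "format" directive: flat binary output, starting in 16-bit mode.
        mov     ax,1            ; B8 01 00
        use32                   ; following code is generated as 32-bit
        mov     eax,1           ; B8 01 00 00 00
        use16                   ; back to 16-bit mode
        mov     eax,1           ; 66 B8 01 00 00 00 (operand-size prefix added)
```

  The three instructions appear in the 14-byte output file in exactly the
order in which they were written.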


1.2  Assembly syntax

The information provided below is intended mainly for assembler programmers
who have been using some other assembly compilers before. If you are a
beginner, you should look for assembly programming tutorials.
  Flat assembler by default uses the Intel syntax for the assembly
instructions, although you can customize it using the preprocessor
capabilities (macroinstructions and symbolic constants). It also has its own
set of directives - the instructions for the compiler.
  All symbols defined inside the sources are case-sensitive.


1.2.1  Instruction syntax

Instructions in assembly language are separated by line breaks, and one
instruction is expected to fill one line of text. If a line contains a
semicolon, except for semicolons inside quoted strings, the rest of the line
is a comment and the compiler ignores it. If a line ends with the "\"
character (optionally followed by a semicolon and a comment), the next line is
attached at this point.
  Each line in the source is a sequence of items, which may be of one of three
types. The first type are the symbol characters, which are special characters
that are individual items even when they are not separated by spaces from the
other ones. Any of the "+-*/=<>()[]{}:,|&~#`" is a symbol character. A
sequence of other characters, separated from other items with either blank
spaces or symbol characters, is a symbol. If the first character of a symbol
is either a single or double quote, it integrates any sequence of characters
following it, even the special ones, into a quoted string, which should end
with the same character with which it began (the single or double quote).
However, if there are two such characters in a row (without any other
character between them), they are integrated into the quoted string as just
one of them and the quoted string continues. The symbols other than symbol
characters and quoted strings can be used as names, so they are also called
name symbols.
  Every instruction consists of the mnemonic and a variable number of
operands, separated with commas. An operand can be a register, an immediate
value or data addressed in memory; it can also be preceded by a size operator
to define or override its size (table 1.1). The names of the available
registers are listed in table 1.2; their sizes cannot be overridden. An
immediate value can be specified by any numerical expression.
  When an operand is data in memory, the address of that data (also any
numerical expression, but it may contain registers) should be enclosed in
square brackets or preceded by the "ptr" operator. For example, the
instruction "mov eax,3" puts the immediate value 3 into the EAX register, the
instruction "mov eax,[7]" puts the 32-bit value from address 7 into EAX, and
the instruction "mov byte [7],3" puts the immediate value 3 into the byte at
address 7; it can also be written as "mov byte ptr 7,3". To specify which
segment register should be used for addressing, the segment register name
followed by a colon should be put just before the address value (inside the
square brackets or after the "ptr" operator).

   Table 1.1  Size operators
  ┌──────────┬──────┬───────┐
  │ Operator │ Bits │ Bytes │
  ╞══════════╪══════╪═══════╡
  │ byte     │ 8    │ 1     │
  │ word     │ 16   │ 2     │
  │ dword    │ 32   │ 4     │
  │ fword    │ 48   │ 6     │
  │ pword    │ 48   │ 6     │
  │ qword    │ 64   │ 8     │
  │ tbyte    │ 80   │ 10    │
  │ tword    │ 80   │ 10    │
  │ dqword   │ 128  │ 16    │
  └──────────┴──────┴───────┘

   Table 1.2  Registers
  ┌─────────┬──────┬────────────────────────────────────────────────┐
  │ Type    │ Bits │ Registers                                      │
  ╞═════════╪══════╪════════════════════════════════════════════════╡
  │         │ 8    │ al    cl    dl    bl    ah    ch    dh    bh   │
  │ General │ 16   │ ax    cx    dx    bx    sp    bp    si    di   │
  │         │ 32   │ eax   ecx   edx   ebx   esp   ebp   esi   edi  │
  ├─────────┼──────┼────────────────────────────────────────────────┤
  │ Segment │ 16   │ es    cs    ss    ds    fs    gs               │
  ├─────────┼──────┼────────────────────────────────────────────────┤
  │ Control │ 32   │ cr0         cr2   cr3   cr4                    │
  ├─────────┼──────┼────────────────────────────────────────────────┤
  │ Debug   │ 32   │ dr0   dr1   dr2   dr3               dr6   dr7  │
  ├─────────┼──────┼────────────────────────────────────────────────┤
  │ FPU     │ 80   │ st0   st1   st2   st3   st4   st5   st6   st7  │
  ├─────────┼──────┼────────────────────────────────────────────────┤
  │ MMX     │ 64   │ mm0   mm1   mm2   mm3   mm4   mm5   mm6   mm7  │
  ├─────────┼──────┼────────────────────────────────────────────────┤
  │ SSE     │ 128  │ xmm0  xmm1  xmm2  xmm3  xmm4  xmm5  xmm6  xmm7 │
  └─────────┴──────┴────────────────────────────────────────────────┘
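
  The lines below are an illustrative syntax sample, not a complete program.
They restate the operand forms described in this section for the default
16-bit mode (so AX takes the place of EAX), and also show a segment override,
a continued line and a doubled quote inside a quoted string.

```asm
; Syntax sample for the default 16-bit mode (not a complete program).
        mov     ax,3            ; immediate value 3 into AX
        mov     ax,[7]          ; word from address 7; the size comes from AX
        mov     byte [7],3      ; size operator needed: no operand implies a size
        mov     byte ptr 7,3    ; the same instruction written with "ptr"
        mov     al,[es:bx+si]   ; ES segment override inside the square brackets
        mov     cx,\
                10              ; continuation line: assembled as mov cx,10
        db      'it''s'         ; doubled quote inside a quoted string: it's
```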


1.2.2  Data definitions

To define data or reserve space for it, use one of the directives listed in
table 1.3. The data definition directive should be followed by one or more
numerical expressions, separated with commas. These expressions define the
values for data cells whose size depends on which directive is used. For
example, "db 1,2,3" defines three bytes with the values 1, 2 and 3
respectively.
  The "db" and "du" directives also accept quoted string values of any
length, which are converted into a chain of bytes when "db" is used and into
a chain of words with zeroed high byte when "du" is used. For example,
"db 'abc'" defines three bytes with the values 61h, 62h and 63h (the ASCII
codes of the characters).
  The "dp" directive and its synonym "df" accept values consisting of two
numerical expressions separated with a colon; the first value becomes the
high word and the second value becomes the low double word of the far pointer
value. "dd" also accepts such pointers consisting of two word values
separated with a colon, and "dt" accepts a word and a quad word value
separated with a colon, with the quad word stored first. The "dt" directive
with a single expression as parameter accepts only floating point values and
creates data in the FPU double extended precision format.
  Any of the above directives allows the use of the special "dup" operator to
make multiple copies of given values. The count of duplicates should precede
this operator and the value to duplicate should follow it. It can even be a
chain of values separated with commas, but such a set of values needs to be
enclosed in parentheses, like "db 5 dup (1,2)", which defines five copies of
the given two-byte sequence.
  The "file" directive is special and its syntax is different. This
directive includes a chain of bytes from a file, and it should be followed by
the quoted file name, then optionally a numerical expression specifying the
offset in the file, preceded by a colon, and, also optionally, a comma and a
numerical expression specifying the count of bytes to include (if no count is
specified, all data up to the end of the file is included). For example,
"file 'data.bin'" includes the whole file as binary data and
"file 'data.bin':10h,4" includes only four bytes starting at offset 10h.
  The data reservation directive should be followed by only one numerical
expression, and this value defines how many cells of the specified size
should be reserved. All data definition directives also accept the "?" value,
which means that this cell should not be initialized to any value; the effect
is the same as using the data reservation directive. The uninitialized data
may not be included in the output file, so its values should always be
considered unknown.

   Table 1.3  Data directives
  ┌─────────┬────────┬─────────┐
  │ Size    │ Define │ Reserve │
  │ (bytes) │ data   │ data    │
  ╞═════════╪════════╪═════════╡
  │ 1       │ db     │ rb      │
  │         │ file   │         │
  ├─────────┼────────┼─────────┤
  │ 2       │ dw     │ rw      │
  │         │ du     │         │
  ├─────────┼────────┼─────────┤
  │ 4       │ dd     │ rd      │
  ├─────────┼────────┼─────────┤
  │ 6       │ dp     │ rp      │
  │         │ df     │ rf      │
  ├─────────┼────────┼─────────┤
  │ 8       │ dq     │ rq      │
  ├─────────┼────────┼─────────┤
  │ 10      │ dt     │ rt      │
  └─────────┴────────┴─────────┘
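
  The following sketch combines the directives from table 1.3 in one flat
binary; the comments give the bytes each line should produce, in the x86
little-endian order. All label names are hypothetical, and the "file" line
assumes that a file named "data.bin", at least 14h bytes long, exists next to
the source.

```asm
; Data definition sketch; comments show the bytes produced by each line.
list    db      1,2,3            ; 01 02 03
text    db      'abc'            ; 61 62 63
wide    du      'ab'             ; 61 00 62 00
pairs   db      5 dup (1,2)      ; 01 02 repeated five times
farp48  dp      1234h:56789ABCh  ; BC 9A 78 56 34 12: offset dword, then word
farp32  dd      1234h:5678h      ; 78 56 34 12: offset word, then segment word
real80  dt      1.0              ; 10 bytes, double extended precision
part    file    'data.bin':10h,4 ; 4 bytes of data.bin from offset 10h
buffer  rb      64               ; 64 reserved bytes, uninitialized
last    dw      ?                ; one uninitialized word
```

  Since "buffer" and "last" are uninitialized and come at the very end, they
may not be written to the output file at all.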


1.2.3  Constants and labels

In numerical expressions you can also use constants or labels instead of
numbers. To define a constant or label you should use the specific
directives. Each label can be defined only once, and it is accessible from
any place in the source (even before it was defined). A constant can be
redefined many times, but in this case it is accessible only after it was
defined, and it is always equal to the value from the last definition before
the place where it is used. When a constant is defined only once in the
source, it is, like a label, accessible from anywhere.
  The definition of a constant consists of the name of the constant followed
by the "=" character and a numerical expression, which after calculation
becomes the value of the constant. This value is always calculated at the
time the constant is defined. For example, you can define the "count"
constant by using the directive "count = 17", and then use it in assembly
instructions, like "mov cx,count", which becomes "mov cx,17" during the
compilation process.
  There are different ways to define labels. The simplest is to follow the
name of the label by a colon; this directive can even be followed by another
instruction in the same line. It defines a label whose value is equal to the
offset of the point where it is defined. This method is usually used to label
places in code. The other way is to follow the name of the label (without a
colon) by some data directive. It defines a label with a value equal to the
offset of the beginning of the defined data, remembered as a label for data
with the cell size specified for that data directive in table 1.3.
  A label can be treated as a constant with a value equal to the offset of the
labeled code or data. For example, when you define data using the labeled
directive "char db 224", to put the offset of this data into the BX register
you should use the "mov bx,char" instruction, and to put the value of the byte
addressed by the "char" label into the DL register, you should use
"mov dl,[char]" (or "mov dl,ptr char"). But when you try to assemble
"mov ax,[char]", it causes an error, because fasm compares the sizes of the
operands, which should be equal. You can force assembling that instruction
by using a size override: "mov ax,word [char]", but remember that this
instruction reads the two bytes beginning at the "char" address, while it was
defined as one byte.
  The last and the most flexible way to define labels is to use the "label"
directive. This directive should be followed by the name of the label, then
optionally a size operator (it can be preceded by a colon) and then, also
optionally, the "at" operator and the numerical expression defining the
address at which this label should be defined. For example,
"label wchar word at char" defines a new label for 16-bit data at the address
of "char". Now the instruction "mov ax,[wchar]" is, after compilation, the
same as "mov ax,word [char]". If no address is specified, the "label"
directive defines the label at the current offset. Thus "mov [wchar],57568"
copies two bytes while "mov [char],224" copies one byte to the same address.
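  The fragment below gathers the examples of this section into one 16-bit
flat binary sketch. The names "start" and "step" are hypothetical additions
for illustration, and the initial jump only keeps the data byte out of the
executed path, so this is a syntax sample rather than a complete program.

```asm
count = 17                      ; defined once, so usable anywhere in the source
        jmp     start           ; skip over the data below
char    db      224             ; data label of byte size
label wchar word at char        ; word-sized label at the address of "char"
start:  mov     cx,count        ; assembled as mov cx,17
        mov     bx,char         ; offset of "char"
        mov     dl,[char]       ; byte value at "char"
        mov     ax,word [char]  ; size override: reads two bytes from "char"
        mov     ax,[wchar]      ; same encoding as the previous instruction
        mov     [wchar],57568   ; stores two bytes (57568 = 0E0E0h)
        mov     [char],224      ; stores one byte at the same address
step = 1
        mov     al,step         ; assembled as mov al,1
step = step + 1
        mov     al,step         ; assembled as mov al,2: latest definition wins
        ret
```

  Because "step" is redefined, each use sees only the most recent definition
above it, while "count", defined once, could also be used before its
definition.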
  A label whose name begins with a dot is treated as a local label, and its
name is attached to the name of the last global label (with a name beginning
with anything but a dot) to make the full name of this label. So you can use
the short name (beginning with a dot) of this label anywhere before the next
global label is defined, and in other places you have to use the full name.
Labels beginning with two dots are the exception: they are like global
labels, but they don't become the new prefix for local labels.
  The "@@" name means an anonymous label; you can define many of them in the
source. The symbol "@b" (or the equivalent "@r") references the nearest
preceding anonymous label, and the symbol "@f" references the nearest
following anonymous label. These special symbols are case-insensitive.
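
  The sketch below shows the local, two-dot and anonymous label rules in
16-bit code. The routine names "strlen", "abs_ax" and "delay" and the label
"..exit" are hypothetical, chosen only for this illustration; "strlen"
assumes the direction flag is clear.

```asm
; In: DS:SI -> zero-terminated string. Out: CX = length. Assumes DF = 0.
strlen:                         ; global label: prefix for the local labels below
        xor     cx,cx
  .next:                        ; full name: strlen.next
        lodsb                   ; AL = byte at DS:SI, then SI is advanced
        test    al,al
        jz      .done           ; short name is valid until the next global label
        inc     cx
        jmp     .next
..exit:                         ; two dots: a global name, but not a new prefix
  .done:                        ; still strlen.done, not ..exit.done
        ret

; In: AX = signed value. Out: AX = absolute value.
abs_ax:
        test    ax,ax
        jns     @f              ; forward to the nearest following @@
        neg     ax
@@:     ret

; In: AX = iteration count (nonzero).
delay:
@@:     dec     ax
        jnz     @b              ; back to the nearest preceding @@
        ret
```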


1.2.4  Numerical expressions

In the above examples all the numerical expressions were simple numbers,
constants or labels. But they can be more complex, by using arithmetical or
logical operators for calculations at compile time. All these operators with
their priority values are listed in table 1.4.
  Operations with a higher priority value are calculated first; you can of
course change this behavior by putting some parts of the expression into
parentheses. The "+", "-", "*" and "/" are standard arithmetical operations,
and "mod" calculates the remainder from division. The "and", "or", "xor",
"shl", "shr" and "not" perform the same logical operations as the assembly
instructions of those names. The "rva" performs the conversion of an address
into a relocatable offset and is specific to some of the output formats
(see 2.4).
  The numbers in an expression are by default treated as decimal; binary
numbers should have the "b" letter attached at the end, octal numbers should
end with the "o" letter, and hexadecimal numbers should begin with the "0x"
characters (like in C language) or with the "$" character (like in Pascal
language), or they should end with the "h" letter (a number in this last form
must still begin with a digit, so "0FFh" rather than "FFh", which would be a
name). Also a quoted string, when encountered in an expression, is converted
into a number: the first character becomes the least significant byte of the
number.
  A numerical expression used as an address value can also contain any of the
general registers used for addressing; they can be added and multiplied by
appropriate values, as is allowed for the x86 architecture instructions.
  There are also some special symbols that can be used inside numerical
expressions. The first is "$", which is always equal to the value of the
current offset, while "$$" is equal to the base address of the current
addressing space. The other one is "%", which is the number of the current
repeat in parts of code that are repeated using some special directives (see
2.2). There's also the "%t" symbol, which is always equal to the current time
stamp.
  Any numerical expression can also consist of a single floating point value
(flat assembler does not allow any floating point operations at compilation
time) in scientific notation. To be recognized, such values can end with the
"f" letter; otherwise they should contain at least one of the "." or "E"
characters. So "1.0", "1E0" and "1f" define the same floating point value,
while a simple "1" defines an integer value.

   Table 1.4  Arithmetical and logical operators by priority
  ┌──────────┬──────────────┐
  │ Priority │ Operators    │
  ╞══════════╪══════════════╡
  │ 0        │ +  -         │
  ├──────────┼──────────────┤
  │ 1        │ *  /         │
  ├──────────┼──────────────┤
  │ 2        │ mod          │
  ├──────────┼──────────────┤
  │ 3        │ and  or  xor │
  ├──────────┼──────────────┤
  │ 4        │ shl  shr     │
  ├──────────┼──────────────┤
  │ 5        │ not          │
  ├──────────┼──────────────┤
  │ 6        │ rva          │
  └──────────┴──────────────┘
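
  The constants below illustrate the number formats, the operator priorities
of table 1.4 and the "$" symbol. All names in this sketch are hypothetical;
the values in the comments are worked out from the rules described above (in
particular, "and" and "shl" are evaluated before "+").

```asm
; Every constant below is calculated at compile time.
n_dec   = 255                   ; decimal
n_bin   = 11111111b             ; binary
n_oct   = 377o                  ; octal
n_hex1  = 0FFh                  ; "h" suffix; the leading 0 keeps it a number
n_hex2  = 0xFF                  ; C style
n_hex3  = $FF                   ; Pascal style: all six constants equal 255
sig     = 'AB'                  ; 4241h: the first character is the low byte
shifted = 1 + 2 shl 3           ; shl has higher priority: 1 + 16 = 17
grouped = (1 + 2) shl 3         ; parentheses first: 24
masked  = 7 and 3 + 1           ; "and" before "+": (7 and 3) + 1 = 4
remain  = 17 mod 5              ; 2

msg     db      'Hello'
msg_len = $ - msg               ; 5: current offset minus the offset of msg
one     dd      1.0             ; single precision, bytes 00 00 80 3F
```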


1.2.5  Jumps and calls

The operand of any jump or call instruction can be preceded not only by the
size operator, but also by one of the operators specifying the type of the
jump: "short", "near" or "far". For example, when the assembler is in 16-bit
mode, the instruction "jmp dword [0]" becomes a far jump, and when the
assembler is in 32-bit mode, it becomes a near jump. To force this
instruction to be treated differently, use the "jmp near dword [0]" or
"jmp far dword [0]" form.
  When the operand of a near jump is an immediate value, the assembler
generates the shortest variant of this jump instruction if possible (but
won't create a 32-bit instruction in 16-bit mode nor a 16-bit instruction in
32-bit mode, unless there is a size operator stating it). By specifying the
jump type you can force it to always generate the long variant (for example
"jmp near 0") or to always generate the short variant and terminate with an
error when that is impossible (for example "jmp short 0").
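  The following sketch collects the jump forms discussed above. The label
"back" is hypothetical, and the fragment is meant to be assembled and
inspected rather than executed.

```asm
; Starts in the default 16-bit mode.
back:   nop
        jmp     back            ; target is close: short form, EB with 8-bit displacement
        jmp     near back       ; forced long form: E9 with 16-bit displacement
        jmp     short back      ; short form, or an assembly error if out of range
        jmp     dword [0]       ; 16-bit mode: far jump through the pointer at address 0
        jmp     near dword [0]  ; near jump instead, with a 32-bit target operand
        use32
        jmp     dword [0]       ; 32-bit mode: near jump
        jmp     far dword [0]   ; far jump instead
```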


1.2.6  Size settings

When an instruction uses some memory addressing, by default the smallest form
of the instruction is generated by using a short displacement if the address
value fits in its range. This can be overridden using the "word" or "dword"
operator before the address inside the square brackets (or after the "ptr"
operator), which forces a long displacement of the appropriate size to be
made. When the address is not relative to any registers, those operators also
allow you to choose the appropriate mode of absolute addressing.
  The instructions "adc", "add", "and", "cmp", "or", "sbb", "sub" and "xor"
with a 16-bit or 32-bit first operand are by default generated in the
shortened 8-bit form when the second operand is an immediate value fitting in
the range for signed 8-bit values. This can also be overridden by putting the
"word" or "dword" operator before the immediate value. Similar rules apply to
the "imul" instruction with the last operand being an immediate value.
  An immediate value as an operand for the "push" instruction without a size
operator is by default treated as a word value if the assembler is in 16-bit
mode and as a double word value if the assembler is in 32-bit mode. The
shorter 8-bit form of this instruction is used if possible; the "word" or
"dword" size operator forces the "push" instruction to be generated in the
longer form for the specified size. The "pushw" and "pushd" mnemonics force
the assembler to generate 16-bit or 32-bit code without forcing it to use the
longer form of the instruction.
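  The sketch below places the default and forced forms side by side in
16-bit mode. Byte sequences are given only for the "push" variants, where
they follow directly from the standard x86 encodings.

```asm
; Default 16-bit mode.
        mov     ax,[bx+4]       ; 8-bit displacement, the shortest form
        mov     ax,[word bx+4]  ; forced 16-bit displacement
        add     ax,4            ; shortened form, sign-extended 8-bit immediate
        add     ax,word 4       ; full 16-bit immediate
        imul    ax,bx,5         ; shortened 8-bit immediate
        imul    ax,bx,word 5    ; full 16-bit immediate
        push    4               ; 6A 04
        push    word 4          ; 68 04 00
        pushd   4               ; 32-bit push in its short form: 66 6A 04
```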


Chapter 2  Instruction set
──────────────────────────

This chapter provides detailed information about the instructions and
directives supported by flat assembler. Directives for defining labels were
already discussed in 1.2.3; all other directives are described later in this
chapter.


2.1  The x86 architecture instructions

In this section you can find both the information about the syntax and the
purpose of the assembly language instructions. If you need more technical
information, look for the Intel Architecture Software Developer's Manual.
  Assembly instructions consist of the mnemonic (the instruction's name) and
from zero to three operands. If there are two or more operands, usually the
first is the destination operand and the second is the source operand. Each
operand can be a register, memory or an immediate value (see 1.2 for details
about the syntax of operands). After the description of each instruction
there are examples of different combinations of operands, if the instruction
has any.
  Some instructions act as prefixes and can be followed by another
instruction in the same line, and there can be more than one prefix in a
line. Each name of a segment register is also a mnemonic of an instruction
prefix, although it is recommended to use segment overrides inside the square
brackets instead of these prefixes.
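  A short sketch of prefix usage follows. The first two instructions copy CX
bytes from DS:SI to ES:DI; the last two show the recommended bracket form of
a segment override next to the same override written as a separate "es"
prefix, which should produce the same bytes.

```asm
; Default 16-bit mode.
        cld                     ; clear the direction flag: addresses increase
        rep     movsb           ; "rep" prefix and "movsb" on one line
        mov     al,[es:di]      ; recommended: override inside the brackets
        es      mov al,[di]     ; same instruction, written with the "es" prefix
```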


2.1.1  Data movement instructions

"mov" transfers a byte, word or double word from the source operand to the
destination operand. It can transfer data between general registers, from a
general register to memory, or from memory to a general register, but it
cannot move from memory to memory. It can also transfer an immediate value to
a general register or memory, a segment register to a general register or
memory, a general register or memory to a segment register, a control or
debug register to a general register, and a general register to a control or
debug register. "mov" can be assembled only if the size of the source operand
and the size of the destination operand are the same. Below are examples for
each of the allowed combinations:

    mov bx,ax       ; general register to general register
    mov [char],al   ; general register to memory
    mov bl,[char]   ; memory to general register
    mov dl,32       ; immediate value to general register
    mov [char],32   ; immediate value to memory
    mov ax,ds       ; segment register to general register
    mov [bx],ds     ; segment register to memory
    mov ds,ax       ; general register to segment register
    mov ds,[bx]     ; memory to segment register
    mov eax,cr0     ; control register to general register
    mov cr3,ebx     ; general register to control register
    mov eax,dr6     ; debug register to general register
    mov dr7,ebx     ; general register to debug register

  "xchg" swaps the contents of two operands. It can swap two byte operands,
two word operands or two double word operands. The order of operands is not
important. The operands may be two general registers, or a general register
with memory. For example:

    xchg ax,bx      ; swap two general registers
    xchg al,[char]  ; swap register with memory

  "push" decrements the stack pointer (ESP register), then transfers the
operand to the top of the stack indicated by ESP. The operand can be memory, a
general register, a segment register or an immediate value of word or double
word size. If the operand is an immediate value and no size is specified, it
is by default treated as a word value if the assembler is in 16-bit mode and
as a double word value if the assembler is in 32-bit mode. The "pushw" and
"pushd" mnemonics are variants of this instruction that store values of word
or double word size respectively. If more operands follow in the same line
(separated only with spaces, not commas), the compiler assembles a chain of
"push" instructions with these operands. The examples are with single
operands:

    push ax         ; store general register
    push es         ; store segment register
    pushw [bx]      ; store memory
    push 1000h      ; store immediate value

  "pusha" saves the contents of the eight general registers on the stack.
This instruction has no operands. There are two versions of this instruction,
one 16-bit and one 32-bit; the assembler automatically generates the
appropriate version for the current mode, but this can be overridden by using
the "pushaw" or "pushad" mnemonic to always get the 16-bit or 32-bit version.
The 16-bit version of this instruction pushes the general registers on the
stack in the following order: AX, CX, DX, BX, the initial value of SP before
AX was pushed, BP, SI and DI. The 32-bit version pushes the equivalent 32-bit
general registers in the same order.
  "pop" transfers the word or double word at the current top of stack to the
destination operand, and then increments ESP to point to the new top of stack.
The operand can be memory, a general register or a segment register. The
"popw" and "popd" mnemonics are variants of this instruction for restoring
values of word or double word size respectively. If more operands separated
with spaces follow in the same line, the compiler will assemble a chain of
"pop" instructions with these operands. For example:

    pop bx          ; restore general register
    pop ds          ; restore segment register
    popw [si]       ; restore memory

  "popa" restores the registers saved on the stack by the "pusha"
instruction, except for the saved value of SP (or ESP), which is ignored. This
instruction has no operands. To force assembling the 16-bit or 32-bit version
of this instruction, use the "popaw" or "popad" mnemonic.
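
  As an illustration of the stack layout described above, here is a minimal
Python model (not fasm code) of the 16-bit "pusha" and "popa". It assumes a
dictionary standing in for stack memory and a dictionary of register values;
the names "ORDER", "pusha" and "popa" are hypothetical helpers.

```python
# Python model (not fasm code) of the 16-bit "pusha"/"popa" stack behaviour.
ORDER = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]   # "pusha" push order

def pusha(regs, stack):
    """Push the eight word registers; the SP slot gets SP's value before AX was pushed."""
    initial_sp = regs["sp"]
    for name in ORDER:
        value = initial_sp if name == "sp" else regs[name]
        regs["sp"] = (regs["sp"] - 2) & 0xFFFF      # decrement SP first, then store
        stack[regs["sp"]] = value

def popa(regs, stack):
    """Pop in the reverse order; the saved SP word is skipped, not loaded."""
    for name in reversed(ORDER):
        value = stack[regs["sp"]]
        regs["sp"] = (regs["sp"] + 2) & 0xFFFF      # read, then increment SP
        if name != "sp":
            regs[name] = value
```

With SP = 100h before "pusha", AX ends up at address 0FEh, DI at 0F0h, and
the slot at 0F6h holds 100h; "popa" brings SP back to 100h without loading
that slot.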


2.1.2  Type conversion instructions

The type conversion instructions convert bytes into words, words into double
words, and double words into quad words. These conversions can be done using
sign extension or zero extension. Sign extension fills the extra bits of the
larger item with the value of the sign bit of the smaller item; zero extension
simply fills them with zeros.
  "cwd" and "cdq" double the size of the value in the AX or EAX register
respectively and store the extra bits into the DX or EDX register. The
conversion is done using sign extension. These instructions have no operands.
  "cbw" extends the sign of the byte in AL throughout AX, and "cwde" extends
the sign of the word in AX throughout EAX. These instructions also have no
operands.
  "movsx" converts a byte to a word or double word, or a word to a double
word, using sign extension. "movzx" does the same, but it uses zero
extension. The source operand can be a general register or memory, while the
destination operand must be a general register. For example:

    movsx ax,al         ; byte register to word register
    movsx edx,dl        ; byte register to double word register
    movsx eax,ax        ; word register to double word register
    movsx ax,byte [bx]  ; byte memory to word register
    movsx edx,byte [bx] ; byte memory to double word register
    movsx eax,word [bx] ; word memory to double word register
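
  The difference between sign and zero extension can be sketched in Python as
bit arithmetic on unsigned values. The helper names are hypothetical, and the
sketch models only the resulting register values, not encodings.

```python
def zero_extend(value, from_bits):
    """movzx: keep the low from_bits bits; the upper bits of the result are zero."""
    return value & ((1 << from_bits) - 1)

def sign_extend(value, from_bits, to_bits):
    """movsx/cbw/cwde: copy the sign bit of the smaller item into all extra bits."""
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):              # sign bit set: fill with ones
        value |= ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)
    return value

def cwd(ax):
    """cwd: sign-extend AX into DX:AX; returns the (DX, AX) pair."""
    dx = 0xFFFF if ax & 0x8000 else 0
    return dx, ax & 0xFFFF
```

For example, sign extending the byte 80h to a word gives 0FF80h, while zero
extending it gives 0080h.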


2.1.3  Binary arithmetic instructions

"add" replaces the destination operand with the sum of the source and
destination operands and sets CF if an unsigned overflow (a carry out of the
most significant bit) has occurred. The operands may be bytes, words or double
words. The destination operand can be a general register or memory; the
source operand can be a general register or an immediate value, and it can
also be memory if the destination operand is a register.

    add ax,bx       ; add register to register
    add ax,[si]     ; add memory to register
    add [di],al     ; add register to memory
    add al,48       ; add immediate value to register
    add [char],48   ; add immediate value to memory

  "adc" sums the operands, adds one if CF is set, and replaces the destination
operand with the result. Rules for the operands are the same as for the "add"
instruction. An "add" followed by multiple "adc" instructions can be used to
add numbers longer than 32 bits.
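
  The following Python sketch models how an "add" followed by an "adc" adds
two 64-bit numbers held as 32-bit halves. The helpers are hypothetical, and
the register names in the comments are only an illustrative choice.

```python
MASK32 = 0xFFFFFFFF

def add32(a, b, carry_in=0):
    """Model of "add" (carry_in=0) or "adc" (carry_in=CF): returns (result, CF)."""
    total = (a & MASK32) + (b & MASK32) + carry_in
    return total & MASK32, total >> 32

def add64(x, y):
    """Add two 64-bit numbers the way "add" on the low halves plus "adc" would."""
    low, cf = add32(x & MASK32, y & MASK32)     # add eax,ebx  (low halves)
    high, cf = add32(x >> 32, y >> 32, cf)      # adc edx,ecx  (high halves plus CF)
    return (high << 32) | low, cf
```

The carry produced by the low halves is exactly what "adc" folds into the
high halves; a longer chain of "adc" extends the same idea to any length.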
  "inc" adds one to the operand; it does not affect CF. The operand can be a
general register or memory, and the size of the operand can be byte, word or
double word.

    inc ax          ; increment register by one
    inc byte [bx]   ; increment memory by one

  "sub" subtracts the source operand from the destination operand and replaces
the destination operand with the result. If a borrow is required, CF is set.
Rules for the operands are the same as for the "add" instruction.
  "sbb" subtracts the source operand from the destination operand, subtracts
one if CF is set, and stores the result in the destination operand. Rules for
the operands are the same as for the "add" instruction. A "sub" followed by
multiple "sbb" instructions may be used to subtract numbers longer than 32
bits.
  "dec" subtracts one from the operand; it does not affect CF. Rules for the
operand are the same as for the "inc" instruction.
  "cmp" subtracts the source operand from the destination operand. It updates
the flags in the same way as the "sub" instruction, but does not alter the
source and destination operands. Rules for the operands are the same as for
the "sub" instruction.
  "neg" subtracts a signed integer operand from zero. The effect of this
instruction is to reverse the sign of the operand from positive to negative or
from negative to positive. Rules for the operand are the same as for the "inc"
instruction.
  "xadd" exchanges the destination operand with the source operand, then loads
the sum of the two values into the destination operand. The destination
operand can be a general register or memory, while the source operand must be
a general register.
  All the above binary arithmetic instructions update the SF, ZF, PF and OF
flags. SF is always set to the same value as the result's sign bit, ZF is set
when all the bits of the result are zero, PF is set when the low order eight
bits of the result contain an even number of set bits, and OF is set if the
result is too large for a positive number or too small for a negative number
(excluding the sign bit) to fit in the destination operand.
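
  To make these rules concrete, the sketch below computes the result and the
flags of an 8-bit "add" in Python. It is a model written for this
explanation (the function name is hypothetical) and does not compute AF.

```python
def add8_flags(a, b):
    """Result and CF/SF/ZF/PF/OF of an 8-bit "add", following the rules above."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    flags = {
        "CF": total > 0xFF,                         # unsigned carry out of bit 7
        "SF": bool(result & 0x80),                  # copy of the result's sign bit
        "ZF": result == 0,                          # all bits of the result are zero
        "PF": bin(result).count("1") % 2 == 0,      # even number of set bits
        # OF: both operands have the same sign and the result's sign differs
        "OF": bool(~(a ^ b) & (a ^ result) & 0x80),
    }
    return result, flags
```

For example, 7Fh + 1 gives 80h with OF and SF set and CF clear (signed
overflow only), while 0FFh + 1 gives 0 with CF, ZF and PF set and OF clear
(unsigned overflow only).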
  "mul" performs an unsigned multiplication of the operand and the
accumulator. If the operand is a byte, the processor multiplies it by the
contents of AL and returns the 16-bit result in AH and AL. If the operand is a
word, the processor multiplies it by the contents of AX and returns the 32-bit
result in DX and AX. If the operand is a double word, the processor multiplies
it by the contents of EAX and returns the 64-bit result in EDX and EAX. "mul"
sets CF and OF when the upper half of the result is nonzero, otherwise they
are cleared. Rules for the operand are the same as for the "inc" instruction.
  "imul" performs a signed multiplication operation. This instruction has
three variations. The first has one operand and uses the same registers as
the "mul" instruction. The second has two operands; in this case the
destination operand is multiplied by the source operand and the result
replaces the destination operand. The destination operand must be a general
register of word or double word size; the source operand can be a general
register, memory or an immediate value. The third form has three operands:
the destination operand must be a general register, word or double word in
size, the source operand can be a general register or memory, and the third
operand must be an immediate value. The source operand is multiplied by the
immediate value and the result is stored in the destination register. All
three forms calculate the product to twice the size of the operands and set
CF and OF when the signed product does not fit in the lower half (that is,
when the upper half is not just the sign extension of the lower half), but
the second and third forms truncate the product to the size of the operands.
So the second and third forms can also be used for unsigned operands because,
whether the operands are signed or unsigned, the lower half of the product is
the same. Below are examples for all three forms:

    imul bl         ; accumulator by register
    imul word [si]  ; accumulator by memory
    imul bx,cx      ; register by register
    imul bx,[si]    ; register by memory
    imul bx,10      ; register by immediate value
    imul ax,bx,10   ; register by immediate value to register
    imul ax,[si],10 ; memory by immediate value to register
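
  The claim that the lower half of the product does not depend on signedness
can be checked with a small Python model of word-sized "mul" and two-operand
"imul". The function names are hypothetical.

```python
def mul16(a, b):
    """One-operand "mul" with word operands: returns (DX, AX, CF/OF)."""
    product = (a & 0xFFFF) * (b & 0xFFFF)
    dx, ax = product >> 16, product & 0xFFFF
    return dx, ax, dx != 0                       # CF = OF = upper half is nonzero

def to_signed16(x):
    """Interpret the low 16 bits of x as a two's complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def imul16_2op(a, b):
    """Two-operand "imul" with word operands: truncated result and CF/OF."""
    full = to_signed16(a) * to_signed16(b)       # exact signed product
    low = full & 0xFFFF                          # what the destination keeps
    return low, full != to_signed16(low)         # CF = OF = product did not fit
```

For 0FFFFh times 2, "mul" sets CF and OF because DX becomes 1, while "imul"
treats 0FFFFh as -1 and the product -2 fits in a word; the lower half is
0FFFEh in both cases.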

  "div" performs an unsigned division of the accumulator by the operand.
The dividend (the accumulator) is twice the size of the divisor (the operand);
the quotient and remainder have the same size as the divisor. If the divisor
is a byte, the dividend is taken from the AX register, the quotient is stored
in AL and the remainder is stored in AH. If the divisor is a word, the upper
half of the dividend is taken from DX, the lower half from AX, the quotient is
stored in AX and the remainder is stored in DX. If the divisor is a double
word, the upper half of the dividend is taken from EDX, the lower half from
EAX, the quotient is stored in EAX and the remainder is stored in EDX. Rules
for the operand are the same as for the "mul" instruction.
  "idiv" performs a signed division of the accumulator by the operand.
It uses the same registers as the "div" instruction, and the rules for
the operand are the same.
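
  A Python sketch of word "div" and byte "idiv" follows. It assumes the
processor's behaviour of raising a divide error (interrupt 0) when the divisor
is zero or the quotient does not fit in its destination, modelled here by a
hypothetical DivideError exception. It also shows that "idiv" truncates the
quotient toward zero, so the remainder takes the sign of the dividend.

```python
class DivideError(Exception):
    """Stands in for the processor's divide error exception (interrupt 0)."""

def div16(dx, ax, divisor):
    """Word "div": unsigned DX:AX / divisor -> (AX = quotient, DX = remainder)."""
    dividend = ((dx & 0xFFFF) << 16) | (ax & 0xFFFF)
    if divisor == 0 or dividend // divisor > 0xFFFF:
        raise DivideError("divisor is zero or quotient does not fit in AX")
    return dividend // divisor, dividend % divisor

def idiv8(ax, divisor):
    """Byte "idiv": signed AX / divisor -> (AL = quotient, AH = remainder) as raw bytes."""
    n = ax - 0x10000 if ax & 0x8000 else ax             # signed 16-bit dividend
    d = divisor - 0x100 if divisor & 0x80 else divisor  # signed 8-bit divisor
    if d == 0:
        raise DivideError("divisor is zero")
    q = abs(n) // abs(d)
    if (n < 0) != (d < 0):
        q = -q                                          # truncate toward zero
    if not -128 <= q <= 127:
        raise DivideError("quotient does not fit in AL")
    r = n - q * d                                       # remainder has the dividend's sign
    return q & 0xFF, r & 0xFF
```

For example, dividing -7 (AX = 0FFF9h) by 2 with "idiv" leaves -3 (0FDh) in
AL and -1 (0FFh) in AH.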


2.1.4  Decimal arithmetic instructions

Decimal arithmetic is performed by combining the binary arithmetic
instructions (already described in the prior section) with the decimal
arithmetic instructions. The decimal arithmetic instructions are used to
adjust the results of a previous binary arithmetic operation to produce a
valid packed or unpacked decimal result, or to adjust the inputs to a
subsequent binary arithmetic operation so the operation will produce a valid
packed or unpacked decimal result.
  "daa" adjusts the result of adding two valid packed decimal operands in
AL. "daa" must always follow the addition of two pairs of packed decimal
numbers (one digit in each half-byte) to obtain a pair of valid packed
decimal digits as results. The carry flag is set if a carry was needed.
This instruction has no operands.
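
  The adjustment that "daa" performs can be modelled in Python following the
algorithm given in Intel's instruction reference. The sketch needs the
auxiliary carry flag AF from the preceding addition, which the hypothetical
"bcd_add" helper computes; flags other than CF are not modelled.

```python
def daa(al, cf, af):
    """Model of "daa" after adding two packed BCD bytes; returns (AL, CF)."""
    old_al, old_cf = al, cf
    if (al & 0x0F) > 9 or af:
        al = (al + 6) & 0xFF            # fix the low order digit
    if old_al > 0x99 or old_cf:
        al = (al + 0x60) & 0xFF         # fix the high order digit
        cf = 1
    else:
        cf = 0
    return al, cf

def bcd_add(a, b):
    """Model of "add al,bl" followed by "daa" for two packed BCD bytes."""
    total = a + b
    af = ((a & 0x0F) + (b & 0x0F)) > 0x0F   # carry out of the low half-byte
    return daa(total & 0xFF, int(total > 0xFF), int(af))
```

For example, 38h + 29h gives 61h after the binary "add", and "daa" turns it
into 67h; 99h + 01h gives 00h with CF set, representing the decimal 100.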
  "das" adjusts the result of subtracting two valid packed decimal operands
in AL. "das" must always follow the subtraction of one pair of packed decimal
numbers (one digit in each half-byte) from another to obtain a pair of valid
packed decimal digits as results. The carry flag is set if a borrow was
needed. This instruction has no operands.
  "aaa" changes the contents of register AL to a valid unpacked decimal
number, and zeroes the top four bits. "aaa" must always follow the addition
of two unpacked decimal operands in AL. The carry flag is set and AH is
incremented if a carry is necessary. This instruction has no operands.
  "aas" changes the contents of register AL to a valid unpacked decimal
number, and zeroes the top four bits. "aas" must always follow the
subtraction of one unpacked decimal operand from another in AL. The carry flag
is set and AH is decremented if a borrow is necessary. This instruction has no
operands.
  "aam" corrects the result of a multiplication of two valid unpacked decimal
numbers. "aam" must always follow the multiplication of two decimal numbers
to produce a valid decimal result. The high order digit is left in AH, the
low order digit in AL. The generalized version of this instruction allows
splitting the value in AL into two unpacked digits of any number base. The
standard version of this instruction has no operands; the generalized version
has one operand, an immediate value specifying the number base for the
created digits.
  "aad" modifies the dividend in AH and AL to prepare for the division of two
valid unpacked decimal operands so that the quotient produced by the division
will be a valid unpacked decimal number. AH should contain the high order
digit and AL the low order digit. This instruction adjusts the value and
places the result in AL, while AH will contain zero. The generalized version
of this instruction allows adjustment of two unpacked digits of any number
base. Rules for the operand are the same as for the "aam" instruction.
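
  Both the standard and the generalized forms of "aam" and "aad" reduce to a
division or a multiplication by the number base, as this Python sketch shows
(base 10 for the standard forms). The helper names are hypothetical, and a
base of zero, which makes "aam" raise a divide error, is not handled.

```python
def aam(al, base=10):
    """Model of "aam": split AL into two unpacked digits -> (AH, AL)."""
    return al // base, al % base

def aad(ah, al, base=10):
    """Model of "aad": combine the digits in AH and AL into AL -> (AH, AL)."""
    return 0, (ah * base + al) & 0xFF
```

For example, after "mov al,7", "mov bl,8", "mul bl" and "aam", AH holds 5 and
AL holds 6, the two unpacked digits of 56.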


2.1.5  Logical instructions

"not" inverts the bits in the specified operand to form a one's
complement of the operand. It has no effect on the flags. Rules for the
operand are the same as for the "inc" instruction.
  "and", "or" and "xor" instructions perform the standard logical
operations. They update the SF, ZF and PF flags and clear CF and OF. Rules
for the operands are the same as for the "add" instruction.
  "bt", "bts", "btr" and "btc" instructions operate on a single bit which can
be in memory or in a general register. The location of the bit is specified
as an offset from the low order end of the operand. The offset is taken from
the second operand, which may be either an immediate byte or a general
register. These instructions first assign the value of the selected bit to
CF. "bt" does nothing more, "bts" sets the selected bit to 1, "btr" resets
the selected bit to 0, and "btc" changes the bit to its complement. The first
operand can be word or double word.

    bt  ax,15        ; test bit in register
    bts word [bx],15 ; test and set bit in memory
    btr ax,cx        ; test and reset bit in register
    btc word [bx],cx ; test and complement bit in memory
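
  A Python model of the four bit test instructions on a register operand is
sketched below. It assumes the processor's rule that, for a register operand,
the bit offset is taken modulo the operand size (16 or 32); with a memory
operand and a register offset the bit can lie beyond the addressed word,
which this hypothetical helper does not model.

```python
def bit_test(value, offset, op, size=16):
    """Model of bt/bts/btr/btc on a register operand; returns (new_value, CF)."""
    offset %= size                       # register operand: offset modulo size
    cf = (value >> offset) & 1           # every variant copies the bit into CF first
    mask = 1 << offset
    if op == "bts":
        value |= mask                    # set the selected bit
    elif op == "btr":
        value &= ~mask                   # reset the selected bit
    elif op == "btc":
        value ^= mask                    # complement the selected bit
    return value & ((1 << size) - 1), cf
```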

  "bsf" and "bsr" instructions scan a word or double word for the first set
bit and store the index of this bit into the destination operand, which must
be a general register. The bit string being scanned is specified by the source
operand; it may be either a general register or memory. The ZF flag is set if
the entire string is zero (no set bits are found); otherwise it is cleared. If
no set bit is found, the value of the destination register is undefined. "bsf"
scans from low order to high order (starting from bit index zero). "bsr" scans
from high order to low order (starting from bit index 15 of a word or index 31
of a double word).

    bsf ax,bx        ; scan register forward
    bsr ax,[si]      ; scan memory reverse

  "shl" shifts the destination operand left by the number of bits specified
in the second operand. The destination operand can be a byte, word, or double
word general register or memory. The second operand can be an immediate value
or the CL register. The processor shifts zeros in from the right (low order)
side of the operand as bits exit from the left side. The last bit that exited
is stored in CF. "sal" is a synonym for "shl".

    shl al,1         ; shift register left by one bit
    shl byte [bx],1  ; shift memory left by one bit
    shl ax,cl        ; shift register left by count from cl
    shl word [bx],cl ; shift memory left by count from cl

  "shr" and "sar" shift the destination operand right by the number of bits
specified in the second operand. Rules for operands are the same as for the
"shl" instruction. "shr" shifts zeros in from the left side of the operand as
bits exit from the right side. The last bit that exited is stored in CF.
"sar" preserves the sign of the operand by shifting in zeros on the left side
if the value is positive or by shifting in ones if the value is negative.
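
  The sketch below models the 8-bit shifts in Python, including the bit that
ends up in CF. It assumes a count from 1 to 7 and ignores the other flags;
the function is a hypothetical helper.

```python
def shift8(value, count, op):
    """Model of 8-bit shl/shr/sar for a count of 1 to 7; returns (result, CF)."""
    value &= 0xFF
    if op == "shl":
        cf = (value >> (8 - count)) & 1             # last bit to leave the left side
        result = (value << count) & 0xFF            # zeros come in from the right
    else:
        cf = (value >> (count - 1)) & 1             # last bit to leave the right side
        result = value >> count                     # shr: zeros come in from the left
        if op == "sar" and value & 0x80:
            result |= (0xFF << (8 - count)) & 0xFF  # sar: copies of the sign bit instead
    return result, cf
```

For 81h shifted right by one, "shr" gives 40h while "sar" gives 0C0h,
keeping the value negative; both put the old bit 0 into CF.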
  "shld" shifts bits of the destination operand to the left by the number
of bits specified in the third operand, while shifting high order bits from
the source operand into the destination operand on the right. The source
operand remains unmodified. The destination operand can be a word or double
word general register or memory, the source operand must be a general
register, and the third operand can be an immediate value or the CL register.

    shld ax,bx,1     ; shift register left by one bit
    shld [di],bx,1   ; shift memory left by one bit
    shld ax,bx,cl    ; shift register left by count from cl
    shld [di],bx,cl  ; shift memory left by count from cl

  "shrd" shifts bits of the destination operand to the right, while shifting
low order bits from the source operand into the destination operand on the
left. The source operand remains unmodified. Rules for operands are the same
as for the "shld" instruction.
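
  Both double precision shifts can be viewed as shifting a 32-bit
concatenation of the two 16-bit operands and keeping the half that belongs to
the destination, as in this Python sketch. It assumes word operands and a
count from 1 to 15; the helper names are hypothetical.

```python
def shld16(dest, src, count):
    """Model of 16-bit "shld" for a count of 1 to 15: returns (dest, CF)."""
    combined = ((dest & 0xFFFF) << 16) | (src & 0xFFFF)   # dest:src as one value
    combined <<= count
    cf = (combined >> 32) & 1                              # last bit shifted out of dest
    return (combined >> 16) & 0xFFFF, cf

def shrd16(dest, src, count):
    """Model of 16-bit "shrd" for a count of 1 to 15: returns (dest, CF)."""
    combined = ((src & 0xFFFF) << 16) | (dest & 0xFFFF)   # src:dest as one value
    cf = (combined >> (count - 1)) & 1                     # last bit shifted out of dest
    return (combined >> count) & 0xFFFF, cf
```

With AX = 1234h and BX = 0ABCDh, "shld ax,bx,4" would leave 234Ah in AX and
"shrd ax,bx,4" would leave 0D123h; BX is unchanged in both cases.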
  "rol" and "rcl" rotate the byte, word or double word destination operand
left by the number of bits specified in the second operand. For each rotation
specified, the high order bit that exits from the left of the operand returns
at the right to become the new low order bit. "rcl" additionally puts in CF
each high order bit that exits from the left side of the operand before it
returns to the operand as the low order bit on the next rotation cycle. Rules
for operands are the same as for the "shl" instruction.
  "ror" and "rcr" rotate the byte, word or double word destination operand
right by the number of bits specified in the second operand. For each rotation
specified, the low order bit that exits from the right of the operand returns
at the left to become the new high order bit. "rcr" additionally puts in CF
each low order bit that exits from the right side of the operand before it
returns to the operand as the high order bit on the next rotation cycle.
Rules for operands are the same as for the "shl" instruction.
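
  The difference between "rol" and "rcl" is easiest to see in a model: "rol"
rotates the eight bits of a byte (also copying the bit that wrapped around
into CF), while "rcl" rotates a nine-bit ring made of CF and the operand. This
Python sketch assumes byte operands and a count from 1 to 7 for "rol"; the
function names are hypothetical.

```python
def rol8(value, count):
    """Model of "rol" on a byte, count 1 to 7: returns (result, CF)."""
    value &= 0xFF
    result = ((value << count) | (value >> (8 - count))) & 0xFF
    return result, result & 1                  # CF receives the bit that wrapped to bit 0

def rcl8(value, cf, count):
    """Model of "rcl" on a byte: rotate the nine-bit ring CF:value left."""
    value &= 0xFF
    for _ in range(count):
        top = (value >> 7) & 1
        value = ((value << 1) & 0xFF) | cf     # old CF enters at bit 0
        cf = top                               # bit leaving bit 7 goes to CF
    return value, cf
```

Rotating 81h left by one gives 03h with "rol", but 02h with "rcl" when CF was
clear, because the bit leaving the left side waits in CF for one more cycle.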
  "test" performs the same action as the "and" instruction, but it does not
alter the destination operand, only updates flags. Rules for the operands are
the same as for the "and" instruction.
  "bswap" reverses the byte order of a 32-bit general register: bits 0 through
7 are swapped with bits 24 through 31, and bits 8 through 15 are swapped with
bits 16 through 23. This instruction is provided for converting little-endian
values to big-endian format and vice versa.

    bswap edx        ; swap bytes in register


2.1.6  Control transfer instructions

"jmp" unconditionally transfers control to the target location. The
destination address can be specified directly within the instruction or
indirectly through a register or memory. The acceptable size of this address
depends on whether the jump is near or far (this can be specified by
preceding the operand with the "near" or "far" operator) and whether the
instruction is 16-bit or 32-bit. The operand for a near jump should be of
"word" size for a 16-bit instruction or of "dword" size for a 32-bit
instruction. The operand for a far jump should be of "dword" size for a 16-bit
instruction or of "pword" size for a 32-bit instruction. A direct "jmp"
instruction includes the destination address as part of the instruction (and
can be preceded by the "short", "near" or "far" operator). The operand
specifying the address should be a numerical expression for a near or short
jump, or two numerical expressions separated with a colon for a far jump: the
first specifies the segment selector, the second the offset within the
segment. The "pword" operator can be used to force a 32-bit far jump, and
"dword" to force a 16-bit far jump. An indirect "jmp" instruction obtains the
destination address indirectly through a register or a pointer variable; the
operand should be a general register or memory. See also 1.2.5 for some more
details.

    jmp 100h         ; direct near jump
    jmp 0FFFFh:0     ; direct far jump
    jmp ax           ; indirect near jump
    jmp pword [ebx]  ; indirect far jump

  "call" transfers control to the procedure, saving on the stack the address
of the instruction following the "call" for later use by a "ret" (return)
instruction. Rules for the operands are the same as for the "jmp" instruction,
but "call" has no short variant of the direct instruction and thus it is not
optimized.
  "ret", "retn" and "retf" instructions terminate the execution of a procedure
and transfer control back to the program that originally invoked the
procedure, using the address that was stored on the stack by the "call"
instruction. "ret" is the equivalent of "retn", which returns from a procedure
that was executed using a near call, while "retf" returns from a procedure
that was executed using a far call. These instructions default to the size of
address appropriate for the current code setting, but the size of address can
be forced to 16-bit by using the "retw", "retnw" and "retfw" mnemonics, and to
32-bit by using the "retd", "retnd" and "retfd" mnemonics. All these
instructions may optionally specify an immediate operand; by adding this
constant to the stack pointer, they effectively remove any arguments that the
calling program pushed on the stack before the execution of the "call"
instruction.
  "iret" returns control to an interrupted procedure. It differs from "ret" in
that it also pops the flags from the stack into the flags register. The flags
are stored on the stack by the interrupt mechanism. It defaults to the size of
return address appropriate for the current code setting, but it can be forced
to use a 16-bit or 32-bit address by using the "iretw" or "iretd" mnemonic.
  The conditional transfer instructions are jumps that may or may not transfer
control, depending on the state of the CPU flags when the instruction
executes. The mnemonics for conditional jumps may be obtained by attaching
the condition mnemonic (see table 2.1) to the "j" mnemonic; for example, the
"jc" instruction will transfer control when the CF flag is set. The
conditional jumps can be short or near, are direct only, and can be
optimized (see 1.2.5); the operand should be an immediate value specifying
the target address.

   Table 2.1  Conditions
  ┌──────────┬───────────────────────┬────────────────────────┐
  │ Mnemonic │ Condition tested      │ Description            │
  ╞══════════╪═══════════════════════╪════════════════════════╡
  │ o        │ OF = 1                │ overflow               │
  ├──────────┼───────────────────────┼────────────────────────┤
  │ no       │ OF = 0                │ not overflow           │
  ├──────────┼───────────────────────┼────────────────────────┤
  │ c        │                       │ carry                  │
  │ b        │ CF = 1                │ below                  │
  │ nae      │                       │ not above nor equal    │
  ├──────────┼───────────────────────┼────────────────────────┤
  │ nc       │                       │ not carry              │
  │ ae       │ CF = 0                │ above or equal         │
  │ nb       │                       │ not below              │
  ├──────────┼───────────────────────┼────────────────────────┤
  │ e        │ ZF = 1                │ equal                  │
  │ z        │                       │ zero                   │
  ├──────────┼───────────────────────┼────────────────────────┤
  │ ne       │ ZF = 0                │ not equal              │
  │ nz       │                       │ not zero               │
  ├──────────┼───────────────────────┼────────────────────────┤
  │ be       │ CF or ZF = 1          │ below or equal         │
  │ na       │                       │ not above              │
  ├──────────┼───────────────────────┼────────────────────────┤
  │ a        │ CF or ZF = 0          │ above                  │
  │ nbe      │                       │ not below nor equal    │
  ├──────────┼───────────────────────┼────────────────────────┤
  │ s        │ SF = 1                │ sign                   │
  ├──────────┼───────────────────────┼────────────────────────┤
  │ ns       │ SF = 0                │ not sign               │
  ├──────────┼───────────────────────┼────────────────────────┤
  │ p        │ PF = 1                │ parity                 │
  │ pe       │                       │ parity even            │
  ├──────────┼───────────────────────┼────────────────────────┤
  │ np       │ PF = 0                │ not parity             │
  │ po       │                       │ parity odd             │
  ├──────────┼───────────────────────┼────────────────────────┤
  │ l        │ SF xor OF = 1         │ less                   │
  │ nge      │                       │ not greater nor equal  │
  ├──────────┼───────────────────────┼────────────────────────┤
  │ ge       │ SF xor OF = 0         │ greater or equal       │
  │ nl       │                       │ not less               │
  ├──────────┼───────────────────────┼────────────────────────┤
  │ le       │ (SF xor OF) or ZF = 1 │ less or equal          │
  │ ng       │                       │ not greater            │
  ├──────────┼───────────────────────┼────────────────────────┤
  │ g        │ (SF xor OF) or ZF = 0 │ greater                │
  │ nle      │                       │ not less nor equal     │
  └──────────┴───────────────────────┴────────────────────────┘
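
  Table 2.1 can be read as a small evaluator: each mnemonic is either one of
the base conditions, another name for one, or a negation marked by a leading
"n". The Python sketch below assumes the flags are given as a dictionary of
0/1 values; "BASE", "ALIASES" and "condition" are hypothetical names.

```python
BASE = {
    "o":  lambda f: f["OF"],
    "c":  lambda f: f["CF"],
    "z":  lambda f: f["ZF"],
    "s":  lambda f: f["SF"],
    "p":  lambda f: f["PF"],
    "be": lambda f: f["CF"] or f["ZF"],
    "l":  lambda f: f["SF"] != f["OF"],
    "le": lambda f: (f["SF"] != f["OF"]) or f["ZF"],
}
# Mnemonics that are another name for a base condition, or for its negation.
ALIASES = {"b": "c", "e": "z", "pe": "p", "po": "np",
           "ae": "nc", "a": "nbe", "ge": "nl", "g": "nle"}

def condition(cc, flags):
    """True if the "j" + cc jump from table 2.1 would be taken for these flags."""
    cc = ALIASES.get(cc, cc)
    if cc in BASE:
        return bool(BASE[cc](flags))
    if cc.startswith("n"):
        return not condition(cc[1:], flags)   # leading "n" negates the rest
    raise ValueError("unknown condition: " + cc)
```

For example, "cmp al,bl" with AL = 1 and BL = 0FFh sets CF and clears ZF, SF
and OF, so both "jb" and "jg" would be taken: 1 is below 255 as an unsigned
value, but greater than -1 as a signed value.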

  The "loop" instructions are conditional jumps that use a value placed in
CX (or ECX) to specify the number of repetitions of a software loop. All
"loop" instructions automatically decrement CX (or ECX) and terminate the
loop (don't transfer control) when CX (or ECX) is zero. They use CX or ECX
depending on whether the current code setting is 16-bit or 32-bit, but this
can be forced to use CX with the "loopw" mnemonic or to use ECX with the
"loopd" mnemonic. "loope" and "loopz" are synonyms for the same instruction,
which acts as the standard "loop", but also terminates the loop when the ZF
flag is not set. The "loopew" and "loopzw" mnemonics force them to use the CX
register, while "looped" and "loopzd" force them to use the ECX register.
"loopne" and "loopnz" are synonyms for the same instruction, which acts as the
standard "loop", but also terminates the loop when the ZF flag is set. The
"loopnew" and "loopnzw" mnemonics force them to use the CX register, while
"loopned" and "loopnzd" force them to use the ECX register. Every "loop"
instruction needs an operand that is an immediate value specifying the target
address, and it can only be a short jump (in the range of 128 bytes back and
127 bytes forward from the address of the instruction following the "loop"
instruction).
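
  The order of operations in the "loop" family (decrement first, then test the
counter and, for the conditional forms, ZF) is modelled below in Python. The
loop body is represented by a hypothetical callable that returns the ZF value
it leaves behind, and a 16-bit CX is assumed.

```python
def run_loop(cx, body, kind="loop"):
    """Count how many times a body runs under loop/loope/loopne; returns (runs, CX)."""
    runs = 0
    while True:
        zf = body()
        runs += 1
        cx = (cx - 1) & 0xFFFF          # every form decrements CX first
        if cx == 0:
            break                       # count exhausted: fall through
        if kind == "loope" and not zf:
            break                       # loope/loopz also stop when ZF = 0
        if kind == "loopne" and zf:
            break                       # loopne/loopnz also stop when ZF = 1
    return runs, cx
```

Starting with CX = 0 makes the counter wrap to 0FFFFh, so the body runs 65536
times; this is a case where "jcxz" can be used to skip the loop.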
  988.   "jcxz" branches to the label specified in the instruction if it finds a
  989. value of zero in CX, "jecxz" does the same, but checks the value of ECX
  990. instead of CX. Rules for the operands are the same as for the "loop"
  991. instruction.
  992.   "int" activates the interrupt service routine that corresponds to the
  993. number specified as an operand to the instruction, the number should be in
  994. range from 0 to 255. The interrupt service routine terminates with an "iret"
  995. instruction that returns control to the instruction that follows "int".
  996. "int3" mnemonic codes the short (one byte) trap that invokes the interrupt 3.
  997. "into" instruction invokes the interrupt 4 if the OF flag is set.
  998.   "bound" verifies that the signed value contained in the specified register
  999. lies within specified limits. An interrupt 5 occurs if the value contained in
  1000. the register is less than the lower bound or greater than the upper bound. It
  1001. needs two operands, the first operand specifies the register being tested,
  1002. the second operand should be memory address for the two signed limit values.
  1003. The operands can be "word" or "dword" in size.
  1004.  
  1005.     bound ax,[bx]    ; check word for bounds
  1006.     bound eax,[esi]  ; check double word for bounds
  1007.  
  1008.  
  1009. 2.1.7  I/O instructions
  1010.  
  1011.   "in" transfers a byte, word, or double word from an input port to AL, AX,
  1012. or EAX. I/O ports can be addressed either directly, with the immediate byte
  1013. value coded in instruction, or indirectly via the DX register. The destination
  1014. operand should be AL, AX, or EAX register. The source operand should be an
  1015. immediate value in range from 0 to 255, or DX register.
  1016.  
  1017.     in al,20h        ; input byte from port 20h
  1018.     in ax,dx         ; input word from port addressed by dx
  1019.  
  1020.   "out" transfers a byte, word, or double word to an output port from AL, AX,
  1021. or EAX. The program can specify the number of the port using the same methods
  1022. as the "in" instruction. The destination operand should be an immediate value
  1023. in range from 0 to 255, or DX register. The source operand should be AL, AX,
  1024. or EAX register.
  1025.  
  1026.     out 20h,ax       ; output word to port 20h
  1027.     out dx,al        ; output byte to port addressed by dx
  1028.  
  1029.  
  1030. 2.1.8  Strings operations
  1031.  
  1032. The string operations operate on one element of a string. A string element
  1033. may be a byte, a word, or a double word. The string elements are addressed by
  1034. SI and DI (or ESI and EDI) registers. After every string operation SI and/or
  1035. DI (or ESI and/or EDI) are automatically updated to point to the next element
  1036. of the string. If DF (direction flag) is zero, the index registers are
  1037. incremented; if DF is one, they are decremented. The amount of the increment
  1038. or decrement is 1, 2, or 4 depending on the size of the string element. Every
  1039. string operation instruction has short forms which have no operands and use
  1040. SI and/or DI when the code type is 16-bit, and ESI and/or EDI when the code
  1041. type is 32-bit. SI and ESI by default address data in the segment selected
  1042. by DS; DI and EDI always address data in the segment selected by ES. The
  1043. short form is obtained by appending to the mnemonic of the string operation a
  1044. letter specifying the size of the string element: "b" for a byte element, "w"
  1045. for a word element, and "d" for a double word element. The full form of a
  1046. string operation needs operands providing the size operator and the memory
  1047. addresses, which can be SI or ESI with any segment prefix, and DI or EDI with
  1048. the ES segment prefix only (it may be omitted).
  1049.   "movs" transfers the string element pointed to by SI (or ESI) to the
  1050. location pointed to by DI (or EDI). The operand size can be byte, word, or
  1051. double word. The destination operand should be memory addressed by DI or EDI,
  1052. the source operand should be memory addressed by SI or ESI with any segment
  1053. prefix.
  1054.  
  1055.     movs byte [di],[si]        ; transfer byte
  1056.     movs word [es:di],[ss:si]  ; transfer word
  1057.     movsd                      ; transfer double word
  1058.  
  1059.   "cmps" subtracts the destination string element from the source string
  1060. element and updates the flags AF, SF, PF, CF and OF, but it does not change
  1061. any of the compared elements. If the string elements are equal, ZF is set;
  1062. otherwise it is cleared. The first operand for this instruction should be the
  1063. source string element addressed by SI or ESI with any segment prefix, the
  1064. second operand should be the destination string element addressed by DI or
  1065. EDI.
  1066.  
  1067.     cmpsb                      ; compare bytes
  1068.     cmps word [ds:si],[es:di]  ; compare words
  1069.     cmps dword [fs:esi],[edi]  ; compare double words
  1070.  
  1071.   "scas" subtracts the destination string element from AL, AX, or EAX
  1072. (depending on the size of the string element) and updates the flags AF, SF,
  1073. ZF, PF, CF and OF. If the values are equal, ZF is set; otherwise it is cleared.
  1074. The operand should be the destination string element addressed by DI or EDI.
  1075.  
  1076.     scas byte [es:di]          ; scan byte
  1077.     scasw                      ; scan word
  1078.     scas dword [es:edi]        ; scan double word
  1079.  
  1080.   "stos" places the value of AL, AX, or EAX into the destination string
  1081. element. The rules for the operand are the same as for the "scas" instruction.
  1082.   "lods" places the source string element into AL, AX, or EAX. The operand
  1083. should be the source string element addressed by SI or ESI with any segment
  1084. prefix.
  1085.  
  1086.     lods byte [ds:si]          ; load byte
  1087.     lods word [cs:si]          ; load word
  1088.     lodsd                      ; load double word
  1089.  
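  The short forms can be combined into loops. The sketch below copies a
zero-terminated string. It assumes 32-bit code in which DS and ES select the
same data, as in a flat memory model. The labels "source" and "destination"
are only examples:

        cld                  ; DF = 0, so esi and edi are incremented
        mov esi,source
        mov edi,destination
    copy_loop:
        lodsb                ; al = [esi], esi = esi + 1
        stosb                ; [es:edi] = al, edi = edi + 1
        test al,al           ; stop after copying the terminating zero
        jnz copy_loop
    source db 'Hello',0
    destination rb 6
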
  1090.   "ins" transfers a byte, word, or double word from an input port addressed
  1091. by the DX register to the destination string element. The destination operand
  1092. should be memory addressed by DI or EDI, the source operand should be the DX
  1093. register.
  1094.  
  1095.     insb                       ; input byte
  1096.     ins word [es:di],dx        ; input word
  1097.     ins dword [edi],dx         ; input double word
  1098.  
  1099.   "outs" transfers the source string element to an output port addressed by
  1100. the DX register. The destination operand should be the DX register and the
  1101. source operand should be memory addressed by SI or ESI with any segment prefix.
  1102.  
  1103.     outs dx,byte [si]          ; output byte
  1104.     outsw                      ; output word
  1105.     outs dx,dword [gs:esi]     ; output double word
  1106.  
  1107.   The repeat prefixes "rep", "repe"/"repz", and "repne"/"repnz" specify
  1108. repeated string operation. When a string operation instruction has a repeat
  1109. prefix, the operation is executed repeatedly, each time using a different
  1110. element of the string. The repetition terminates when one of the conditions
  1111. specified by the prefix is satisfied. All three prefixes automatically
  1112. decrease the CX or ECX register (depending on whether the string operation
  1113. instruction uses 16-bit or 32-bit addressing) after each operation and repeat
  1114. the associated operation until CX or ECX is zero. "repe"/"repz" and
  1115. "repne"/"repnz" are used exclusively with the "scas" and "cmps" instructions
  1116. (described above). When these prefixes are used, repetition of the next
  1117. instruction also depends on the zero flag (ZF): "repe" and "repz" terminate
  1118. the execution when ZF is zero, while "repne" and "repnz" terminate it when
  1119. ZF is set.
  1120.  
  1121.     rep  movsd       ; transfer multiple double words
  1122.     repe cmpsb       ; compare bytes until not equal
  1123.  
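  The "repne" prefix combined with "scas" gives a compact way to find the
length of a zero-terminated string. The sketch below assumes 32-bit code with
ES selecting the data. The label "text" is hypothetical:

        cld                  ; scan forward
        mov edi,text         ; es:edi = start of the string
        mov ecx,-1           ; practically unlimited count
        xor al,al            ; value to search for: the terminating zero
        repne scasb          ; stops after the zero byte has been scanned
        not ecx              ; ecx = number of bytes scanned, zero included
        dec ecx              ; ecx = string length (14 for this string)
    text db 'flat assembler',0
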
  1124.  
  1125. 2.1.9  Flag control instructions
  1126.  
  1127. The flag control instructions provide a method for directly changing the
  1128. state of bits in the flag register. All instructions described in this
  1129. section have no operands.
  1130.   "stc" sets the CF (carry flag) to 1, "clc" zeroes the CF, "cmc" changes the
  1131. CF to its complement. "std" sets the DF (direction flag) to 1, "cld" zeroes
  1132. the DF, "sti" sets the IF (interrupt flag) to 1 and therefore enables the
  1133. interrupts, "cli" zeroes the IF and therefore disables the interrupts.
  1134.   "lahf" copies SF, ZF, AF, PF, and CF to bits 7, 6, 4, 2, and 0 of the
  1135. AH register. The contents of the remaining bits are undefined. The flags
  1136. remain unaffected.
  1137.   "sahf" transfers bits 7, 6, 4, 2, and 0 from the AH register into SF, ZF,
  1138. AF, PF, and CF.
  1139.   "pushf" decrements ESP by two or four and stores the low word or the
  1140. double word of the flags register at the top of the stack; the size of the
  1141. stored data depends on the current code setting. The "pushfw" variant forces
  1142. storing the word and "pushfd" forces storing the double word.
  1143.   "popf" transfers specific bits from the word or double word at the top
  1144. of the stack, then increments ESP by two or four; this value depends on the
  1145. current code setting. The "popfw" variant forces restoring from the word and
  1146. "popfd" forces restoring from the double word.
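  A common use of "pushf" and "popf" is to protect a short code sequence from
interrupts and then restore the previous state of IF. This is safer than
executing "sti" unconditionally. The sketch below assumes 32-bit code running
with enough privilege to execute "cli":

        pushfd               ; save the flags, including IF
        cli                  ; disable interrupts
        ; ... code that must not be interrupted ...
        popfd                ; restore IF to its previous state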
  1147.  
  1148.  
  1149. 2.1.10  Conditional operations
  1150.  
  1151.   The instructions obtained by attaching the condition mnemonic (see table
  1152. 2.1) to the "set" mnemonic set a byte to one if the condition is true and set
  1153. the byte to zero otherwise. The operand should be an 8-bit general register
  1154. or a byte in memory.
  1155.  
  1156.     setne al         ; set al if zero flag cleared
  1157.     seto byte [bx]   ; set byte if overflow
  1158.  
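  These instructions are typically used to turn the result of a comparison
into a numeric value without a jump. For example, the following yields 1 when
EAX is less than EBX as signed numbers, and 0 otherwise:

        cmp eax,ebx
        setl cl              ; cl = 1 if eax < ebx (signed), otherwise 0
        movzx ecx,cl         ; extend the result to the whole register
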
  1159.   The "salc" instruction sets all bits of the AL register when the carry flag
  1160. is set and zeroes the AL register otherwise. It has no operands.
  1161.   The instructions obtained by attaching the condition mnemonic to the "cmov"
  1162. mnemonic transfer a word or double word from a general register or memory to
  1163. a general register only when the condition is true. The destination operand
  1164. should be a general register; the source operand can be a general register
  1165. or memory.
  1166.  
  1167.     cmove ax,bx      ; move when zero flag set
  1168.     cmovnc eax,[ebx] ; move when carry flag cleared
  1169.  
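  A "cmov" instruction can replace a short conditional jump. The sketch below
leaves the larger of two signed double words in EAX. It assumes a processor
that supports "cmov" and uses the hypothetical variables "value1" and "value2":

        mov eax,[value1]
        cmp eax,[value2]
        cmovl eax,[value2]   ; if value1 < value2, take value2 instead
    value1 dd 5
    value2 dd 12
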
  1170.   "cmpxchg" compares the value in the AL, AX, or EAX register with the
  1171. destination operand. If they are equal, ZF is set and the source operand is
  1172. loaded into the destination operand; otherwise ZF is cleared and the
  1173. destination operand is loaded into AL, AX, or EAX. The destination operand
  1174. may be a general register or memory; the source must be a general register.
  1175.  
  1176.     cmpxchg dl,bl    ; compare and exchange with register
  1177.     cmpxchg [bx],dx  ; compare and exchange with memory
  1178.  
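  When "cmpxchg" fails, it loads the current value of the destination into the
accumulator. This lets the instruction be retried in a loop to update shared
memory atomically. The sketch below increments a hypothetical double word
"counter". The "lock" prefix (see 2.1.12) makes each attempt atomic on
multiprocessor systems:

        mov eax,[counter]          ; read the current value
    retry:
        lea ecx,[eax+1]            ; compute the new value
        lock cmpxchg [counter],ecx ; store it only if counter still equals eax
        jne retry                  ; ZF = 0: eax now holds the newer value
    counter dd 0

For a plain increment, "lock inc" would be enough. The loop is useful when the
new value is derived from the old one in a more complex way.
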
  1179.   "cmpxchg8b" compares the 64-bit value in the EDX and EAX registers with the
  1180. destination operand. If the values are equal, ZF is set and the 64-bit value
  1181. in the ECX and EBX registers is stored in the destination operand. Otherwise,
  1182. ZF is cleared and the value in the destination operand is loaded into the EDX
  1183. and EAX registers. The destination operand should be a quad word in memory.
  1184.  
  1185.     cmpxchg8b [bx]   ; compare and exchange 8 bytes
  1186.  
  1187.  
  1188. 2.1.11  Miscellaneous instructions
  1189.  
  1190. The "nop" instruction occupies one byte and affects nothing but the
  1191. instruction pointer. It has no operands and doesn't perform any operation.
  1192.   The "ud2" instruction generates an invalid opcode exception. It is provided
  1193. for software testing, to explicitly generate an invalid opcode. This
  1194. instruction has no operands.
  1195.   "xlat" replaces a byte in the AL register with a byte indexed by its value
  1196. in a translation table addressed by BX or EBX. The operand should be a byte in
  1197. memory addressed by BX or EBX with any segment prefix. This instruction also
  1198. has a short form, "xlatb", which has no operands and uses the BX or EBX
  1199. address (depending on the current code setting) in the segment selected by DS.
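  For example, "xlatb" can convert a number from 0 to 15 into the
corresponding hexadecimal digit. The sketch below is in 32-bit code and uses a
hypothetical table label "hex_digits":

        mov ebx,hex_digits   ; ebx points to the translation table
        and al,0Fh           ; keep the low four bits as the table index
        xlatb                ; al = byte at [ebx+al], e.g. 0Ch becomes 'C'
    hex_digits db '0123456789ABCDEF'
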
  1200.   "lds" transfers a pointer variable from the source operand to DS and the
  1201. destination register. The source operand must be a memory operand, and the
  1202. destination operand must be a general register. The DS register receives the
  1203. segment selector of the pointer while the destination register receives the
  1204. offset part of the pointer. "les", "lfs", "lgs" and "lss" operate identically
  1205. to "lds", except that they load the ES, FS, GS and SS register, respectively,
  1206. instead of DS.
  1207.  
  1208.     lds bx,[si]      ; load pointer to ds:bx
  1209.  
  1210.   "lea" transfers the offset of the source operand (rather than its value)
  1211. to the destination operand. The source operand must be a memory operand, and
  1212. the destination operand must be a general register.
  1213.  
  1214.     lea dx,[bx+si+1] ; load effective address to dx
  1215.  
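  Because "lea" only computes an address, without accessing memory or changing
the flags, it is also often used for simple arithmetic. Two examples in 32-bit
code:

        lea eax,[eax+eax*4]  ; eax = eax * 5
        lea ecx,[ebx+esi+8]  ; ecx = ebx + esi + 8
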
  1216.   "cpuid" returns processor identification and feature information in the
  1217. EAX, EBX, ECX, and EDX registers. The information returned is selected by
  1218. entering a value in the EAX register before the instruction is executed.
  1219. This instruction has no operands.
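  For example, function 0 returns the highest supported standard function
number in EAX and the processor vendor string in EBX, EDX and ECX. The sketch
below stores that string in a 12-byte buffer with the hypothetical name
"vendor":

        xor eax,eax          ; select function 0
        cpuid
        mov [vendor],ebx     ; the string is returned in ebx, edx, ecx order
        mov [vendor+4],edx
        mov [vendor+8],ecx
    vendor rd 3              ; e.g. "GenuineIntel" or "AuthenticAMD"
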
  1220.   The "pause" instruction delays the execution of the next instruction by an
  1221. implementation-specific amount of time. It can be used to improve the
  1222. performance of spin wait loops. This instruction has no operands.
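  The sketch below shows such a spin wait loop. It waits until another
processor sets a hypothetical shared variable "ready" to a non-zero value:

    wait_ready:
        pause                ; hint that this is a spin wait loop
        cmp dword [ready],0
        je wait_ready        ; keep waiting while ready is zero
    ready dd 0

"pause" is encoded as "nop" with a "rep" prefix, so processors that do not
recognize it simply execute it as "nop".
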
  1223.   "enter" creates a stack frame that may be used to implement the scope rules
  1224. of block-structured high-level languages. A "leave" instruction at the end of
  1225. a procedure complements an "enter" at the beginning of the procedure to
  1226. simplify stack management and to control access to variables for nested
  1227. procedures. The "enter" instruction takes two parameters. The first
  1228. parameter specifies the number of bytes of dynamic storage to be allocated on
  1229. the stack for the routine being entered. The second parameter corresponds to
  1230. the lexical nesting level of the routine and can be in the range from 0 to 31.
  1231. The specified lexical level determines how many sets of stack frame pointers
  1232. the CPU copies into the new stack frame from the preceding frame. This list
  1233. of stack frame pointers is sometimes called the display. The first word (or
  1234. double word when code is 32-bit) of the display is a pointer to the last stack
  1235. frame. This pointer enables a "leave" instruction to reverse the action of the
  1236. previous "enter" instruction by effectively discarding the last stack frame.
  1237. After "enter" creates the new display for a procedure, it allocates the
  1238. dynamic storage space for that procedure by decrementing ESP by the number of
  1239. bytes specified in the first parameter. To enable a procedure to address its
  1240. display, "enter" leaves BP (or EBP) pointing to the beginning of the new stack
  1241. frame. If the lexical level is zero, "enter" pushes BP (or EBP), copies SP to
  1242. BP (or ESP to EBP) and then subtracts the first operand from ESP. For nesting
  1243. levels greater than zero, the processor pushes additional frame pointers on
  1244. the stack before adjusting the stack pointer.
  1245.  
  1246.     enter 2048,0     ; enter and allocate 2048 bytes on stack
  1247.  
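  With a nesting level of zero, "enter" has the same effect as the common
prologue "push ebp", "mov ebp,esp", "sub esp,N", and "leave" undoes it. The
sketch below is a 32-bit procedure with two local double words. The label
"procedure" is arbitrary:

    procedure:
        enter 8,0            ; allocate 8 bytes of local storage
        mov dword [ebp-4],0  ; first local variable
        mov dword [ebp-8],0  ; second local variable
        leave                ; restore esp and ebp
        ret
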
  1248.  
  1249. 2.1.12  System instructions
  1250.  
  1251. "lmsw" loads the operand into the machine status word (bits 0 through 15 of
  1252. CR0 register), while "smsw" stores the machine status word into the
  1253. destination operand. The operand of both these instructions can be a 16-bit
  1254. general register or memory; for "smsw" it can also be a 32-bit general
  1255. register.
  1256.  
  1257.     lmsw ax          ; load machine status from register
  1258.     smsw [bx]        ; store machine status to memory
  1259.  
  1260.   The "lgdt" and "lidt" instructions load the value of the operand into the
  1261. global descriptor table register or the interrupt descriptor table register
  1262. respectively. "sgdt" and "sidt" store the contents of the global descriptor
  1263. table register or the interrupt descriptor table register in the destination
  1264. operand. The operand should be a 6-byte location in memory.
  1265.  
  1266.     lgdt [ebx]       ; load global descriptor table
  1267.  
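  The 6-byte operand consists of a word holding the table limit (its size in
bytes minus one), followed by a double word holding the linear base address.
The sketch below uses a hypothetical flat-model table and assumes that the
offsets of the labels equal their linear addresses:

        lgdt [gdt_pointer]         ; load limit and base of the table
    gdt_pointer:
        dw gdt_end-gdt-1           ; limit
        dd gdt                     ; base address
    gdt:
        dq 0                       ; null descriptor
        dq 00CF9A000000FFFFh       ; 4 GB code segment
        dq 00CF92000000FFFFh       ; 4 GB data segment
    gdt_end:
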
  1268.   "lldt" loads the operand into the segment selector field of the local
  1269. descriptor table register and "sldt" stores the segment selector from the
  1270. local descriptor table register in the operand. "ltr" loads the operand into
  1271. the segment selector field of the task register and "str" stores the segment
  1272. selector from the task register in the operand. The rules for the operand are
  1273. the same as for the "lmsw" and "smsw" instructions.
  1274.   "lar" loads the access rights from the segment descriptor specified by
  1275. the selector in the source operand into the destination operand and sets the ZF
  1276. flag. The destination operand can be a 16-bit or 32-bit general register.
  1277. The source operand should be a 16-bit general register or memory.
  1278.  
  1279.     lar ax,[bx]      ; load access rights into word
  1280.     lar eax,dx       ; load access rights into double word
  1281.  
  1282.   "lsl" loads the segment limit from the segment descriptor specified by the
  1283. selector in the source operand into the destination operand and sets the ZF
  1284. flag. The rules for the operands are the same as for the "lar" instruction.
  1285.   "verr" and "verw" verify whether the code or data segment specified with
  1286. the operand is readable or writable from the current privilege level. The
  1287. operand should be a word; it can be a general register or memory. If the
  1288. segment is accessible and readable (for "verr") or writable (for "verw"), the
  1289. ZF flag is set; otherwise it is cleared. The rules for the operand are the
  1290. same as for the "lldt" instruction.
  1291.   "arpl" compares the RPL (requestor's privilege level) fields of two segment
  1292. selectors. The first operand contains one segment selector and the second
  1293. operand contains the other. If the RPL field of the destination operand is
  1294. less than the RPL field of the source operand, the ZF flag is set and the RPL
  1295. field of the destination operand is increased to match that of the source
  1296. operand. Otherwise, the ZF flag is cleared and no change is made to the
  1297. destination operand. The destination operand can be a 16-bit general
  1298. register or memory; the source operand must be a 16-bit general register.
  1299.  
  1300.     arpl bx,ax       ; adjust RPL of selector in register
  1301.     arpl [bx],ax     ; adjust RPL of selector in memory
  1302.  
  1303.   "clts" clears the TS (task switched) flag in the CR0 register. This
  1304. instruction has no operands.
  1305.   "lock" prefix causes the processor's bus-lock signal to be asserted during
  1306. execution of the accompanying instruction. In a multiprocessor environment,
  1307. the bus-lock signal ensures that the processor has exclusive use of any shared
  1308. memory while the signal is asserted. The "lock" prefix can be prepended only
  1309. to the following instructions and only to those forms of the instructions
  1310. where the destination operand is a memory operand: "add", "adc", "and", "btc",
  1311. "btr", "bts", "cmpxchg", "cmpxchg8b", "dec", "inc", "neg", "not", "or", "sbb",
  1312. "sub", "xor", "xadd" and "xchg". If the "lock" prefix is used with one of
  1313. these instructions and the source operand is a memory operand, an undefined
  1314. opcode exception may be generated. An undefined opcode exception will also be
  1315. generated if the "lock" prefix is used with any instruction not in the above
  1316. list. The "xchg" instruction always asserts the bus-lock signal regardless of
  1317. the presence or absence of the "lock" prefix.
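  For example, the following updates a shared double word atomically. The
label "counter" is hypothetical:

        lock inc dword [counter]   ; atomic increment
        mov eax,1
        lock xadd [counter],eax    ; atomic addition, eax receives the old value
    counter dd 0
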
  1318.   "hlt" stops instruction execution and places the processor in a halted
  1319. state. An enabled interrupt, a debug exception, the BINIT, INIT or the RESET
  1320. signal will resume execution. This instruction has no operands.
  1321.   "invlpg" invalidates (flushes) the TLB (translation lookaside buffer) entry
  1322. specified with the operand, which should be a memory operand. The processor
  1323. determines the page that contains that address and flushes its TLB entry.
  1324.   "rdmsr" loads the contents of a 64-bit MSR (model specific register) of the
  1325. address specified in the ECX register into registers EDX and EAX. "wrmsr"
  1326. writes the contents of registers EDX and EAX into the 64-bit MSR of the
  1327. address specified in the ECX register. "rdtsc" loads the current value of the
  1328. processor's time stamp counter from the 64-bit MSR into the EDX and EAX
  1329. registers. The processor increments the time stamp counter MSR every clock
  1330. cycle and resets it to 0 whenever the processor is reset. "rdpmc" loads the
  1331. contents of the 40-bit performance monitoring counter specified in the ECX
  1332. register into registers EDX and EAX. These instructions have no operands.
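  For example, the number of time stamp counter ticks taken by a piece of code
can be estimated by reading the counter before and after it and subtracting
the two 64-bit values. The result is only approximate, because it also
includes the overhead of the measurement itself. The sketch below uses
hypothetical variables:

        rdtsc                      ; edx:eax = current counter value
        mov [start_low],eax
        mov [start_high],edx
        ; ... code being measured ...
        rdtsc
        sub eax,[start_low]        ; 64-bit subtraction of the start value
        sbb edx,[start_high]       ; edx:eax = elapsed ticks
    start_low dd 0
    start_high dd 0
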
  1333.   "wbinvd" writes back all modified cache lines in the processor's internal
  1334. cache to main memory and invalidates (flushes) the internal caches. The
  1335. instruction then issues a special function bus cycle that directs external
  1336. caches to also write back modified data and another bus cycle to indicate that
  1337. the external caches should be invalidated. This instruction has no operands.
  1338.   "rsm" returns program control from the system management mode to the program
  1339. that was interrupted when the processor received an SMM interrupt. This
  1340. instruction has no operands.
  1341.   "sysenter" executes a fast call to a level 0 system procedure, "sysexit"
  1342. executes a fast return to level 3 user code. The addresses used by these
  1343. instructions are stored in MSRs. These instructions have no operands.
  1344.  
  1345.  
  1346. 2.1.13  FPU instructions
  1347.  
  1348. The FPU (Floating-Point Unit) instructions operate on the floating-point
  1349. values in three formats: single precision (32-bit), double precision (64-bit)
  1350. and double extended precision (80-bit). The FPU registers form a stack and
  1351. each of them holds a double extended precision floating-point value. When
  1352. values are pushed onto the stack or removed from its top, the FPU registers
  1353. are shifted, so ST0 is always the value on the top of the FPU stack, ST1 is
  1354. the first value below the top, and so on. The name ST0 has the synonym ST.
  1355.   "fld" pushes a floating-point value onto the FPU register stack. The
  1356. operand can be a 32-bit, 64-bit or 80-bit memory location or an FPU register;
  1357. its value is then loaded onto the top of the FPU register stack (the ST0
  1358. register) and is automatically converted into the double extended precision
  1359. format.
  1360.  
  1361.     fld dword [bx]   ; load single precision value from memory
  1362.     fld st2          ; push value of st2 onto register stack
  1363.  
  1364.   "fld1", "fldz", "fldl2t", "fldl2e", "fldpi", "fldlg2" and "fldln2" load the
  1365. commonly used constants onto the FPU register stack. The loaded constants are
  1366. +1.0, +0.0, lb 10, lb e, pi, lg 2 and ln 2 respectively. These instructions
  1367. have no operands.
  1368.   "fild" converts the signed integer source operand into double extended
  1369. precision floating-point format and pushes the result onto the FPU register
  1370. stack. The source operand can be a 16-bit, 32-bit or 64-bit memory location.
  1371.  
  1372.     fild qword [bx]  ; load 64-bit integer from memory
  1373.  
  1374.   "fst" copies the value of the ST0 register to the destination operand, which
  1375. can be a 32-bit or 64-bit memory location or another FPU register. "fstp"
  1376. performs the same operation as "fst" and then pops the register stack,
  1377. getting rid of ST0. "fstp" accepts the same operands as the "fst" instruction
  1378. and can also store the value in an 80-bit memory location.
  1379.  
  1380.     fst st3          ; copy value of st0 into st3 register
  1381.     fstp tword [bx]  ; store value in memory and pop stack
  1382.  
  1383.   "fist" converts the value in ST0 to a signed integer and stores the result
  1384. in the destination operand. The operand can be a 16-bit or 32-bit memory
  1385. location. "fistp" performs the same operation and then pops the register
  1386. stack; it accepts the same operands as the "fist" instruction and can also
  1387. store the integer in a 64-bit memory location, so it has the same rules for
  1388. operands as the "fild" instruction.
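  The sketch below shows the difference between the two instructions: only
"fistp" can store a 64-bit integer. It assumes the default rounding mode
(round to nearest, with ties going to the even value). Under that mode, the
value 2.5 from the hypothetical variable "value" is stored as 2 in both cases:

        fld qword [value]    ; st0 = 2.5
        fist word [int16]    ; store as 16-bit integer, st0 is kept
        fistp qword [int64]  ; store as 64-bit integer and pop the stack
    value dq 2.5
    int16 dw ?
    int64 dq ?
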
  1389.   "fbld" converts the packed BCD integer into double extended precision
  1390. floating-point format and pushes this value onto the FPU stack. "fbstp"
  1391. converts the value in ST0 to an 18-digit packed BCD integer, stores the result
  1392. in the destination operand, and pops the register stack. The operand should be
  1393. an 80-bit memory location.
  1394.   "fadd" adds the destination and source operands and stores the sum in the
  1395. destination location. The destination operand is always an FPU register; if
  1396. the source is a memory location, the destination is the ST0 register and only
  1397. the source operand should be specified. If both operands are FPU registers, at
  1398. least one of them should be the ST0 register. An operand in memory can be a
  1399. 32-bit or 64-bit value.
  1400.  
  1401.     fadd qword [bx]  ; add double precision value to st0
  1402.     fadd st2,st0     ; add st0 to st2
  1403.  
  1404.   "faddp" adds the destination and source operands, stores the sum in the
  1405. destination location and then pops the register stack. The destination operand
  1406. must be an FPU register and the source operand must be ST0. When no
  1407. operands are specified, ST1 is used as the destination operand.
  1408.  
  1409.     faddp            ; add st0 to st1 and pop the stack
  1410.     faddp st2,st0    ; add st0 to st2 and pop the stack
  1411.  
  1412.   The "fiadd" instruction converts an integer source operand into a double
  1413. extended precision floating-point value and adds it to ST0. The operand
  1414. should be a 16-bit or 32-bit memory location.
  1415.  
  1416.     fiadd word [bx]  ; add word integer to st0
  1417.  
  1418.   "fsub", "fsubr", "fmul", "fdiv" and "fdivr" instructions are similar to
  1419. "fadd", have the same rules for operands and differ only in the performed
  1420. computation. "fsub" subtracts the source operand from the destination
  1421. operand, "fsubr" subtracts the destination operand from the source operand,
  1422. "fmul" multiplies the destination and source operands, "fdiv" divides the
  1423. destination operand by the source operand and "fdivr" divides the source
  1424. operand by the destination operand. "fsubp", "fsubrp", "fmulp", "fdivp" and
  1425. "fdivrp" perform the same operations and pop the register stack; the rules
  1426. for operands are the same as for the "faddp" instruction. "fisub", "fisubr",
  1427. "fimul", "fidiv" and "fidivr" perform these operations after converting the
  1428. integer source operand into a floating-point value; they have the same rules
  1429. for operands as the "fiadd" instruction.
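  As an example of combining these instructions, the sketch below computes
the area of a circle, pi times the square of the radius. It uses the
hypothetical single precision variables "radius" and "area":

        fld dword [radius]   ; st0 = r
        fmul st0,st0         ; st0 = r*r
        fldpi                ; st0 = pi, st1 = r*r
        fmulp                ; st1 = pi*r*r, then pop, so it becomes st0
        fstp dword [area]    ; store the result and pop the stack
    radius dd 2.0
    area dd ?
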
  1430.   "fsqrt" computes the square root of the value in the ST0 register, "fsin"
  1431. computes the sine of that value, "fcos" computes the cosine of that value,
  1432. "fchs" complements its sign bit, "fabs" clears its sign bit to create the
  1433. absolute value, and "frndint" rounds it to an integral value according to
  1434. the current rounding mode. "f2xm1" computes 2 raised to the power of ST0 and
  1435. subtracts 1.0 from the result; the value of ST0 must lie in the range -1.0
  1436. to +1.0. All these instructions store the result in ST0 and have no
  1437. operands.
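  For example, the sketch below computes the length of the hypotenuse of a
right triangle, the square root of a*a+b*b. The variable names are
hypothetical:

        fld qword [side_a]   ; st0 = a
        fmul st0,st0         ; st0 = a*a
        fld qword [side_b]   ; st0 = b, st1 = a*a
        fmul st0,st0         ; st0 = b*b
        faddp                ; st0 = a*a + b*b
        fsqrt                ; st0 = square root of the sum
        fstp qword [hypotenuse]
    side_a dq 3.0
    side_b dq 4.0
    hypotenuse dq ?          ; receives 5.0
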
  1438.   "fsincos" computes both the sine and the cosine of the value in ST0
  1439. register, stores the sine in ST0 and pushes the cosine on the top of FPU
  1440. register stack. "fptan" computes the tangent of the value in ST0, stores the
  1441. result in ST0 and pushes a 1.0 onto the FPU register stack. "fpatan" computes
  1442. the arctangent of the value in ST1 divided by the value in ST0, stores the
  1443. result in ST1 and pops the FPU register stack. "fyl2x" computes the binary
  1444. logarithm of ST0, multiplies it by ST1, stores the result in ST1 and pops the
  1445. FPU register stack; "fyl2xp1" performs the same operation but it adds 1.0 to
  1446. ST0 before computing the logarithm. "fprem" computes the remainder obtained
  1447. from dividing the value in ST0 by the value in ST1, and stores the result
  1448. in ST0. "fprem1" performs the same operation as "fprem", but it computes the
  1449. remainder in the way specified by IEEE Standard 754. "fscale" truncates the
  1450. value in ST1 and increases the exponent of ST0 by this value. "fxtract"
  1451. separates the value in ST0 into its exponent and significand, stores the
  1452. exponent in ST0 and pushes the significand onto the register stack. "fnop"
  1453. performs no operation. These instructions have no operands.
  1454.   "fxch" exchanges the contents of ST0 and another FPU register. The operand
  1455. should be an FPU register, if no operand is specified, the contents of ST0 and
  1456. ST1 are exchanged.
  1457.   "fcom" and "fcomp" compare the contents of ST0 and the source operand and
  1458. set flags in the FPU status word according to the results. "fcomp"
  1459. additionally pops the register stack after performing the comparison. The
  1460. operand can be a single or double precision value in memory or the FPU
  1461. register. When no operand is specified, ST1 is used as a source operand.
  1462.  
  1463.     fcom             ; compare st0 with st1
  1464.     fcomp st2        ; compare st0 with st2 and pop stack
  1465.  
  1466.   "fcompp" compares the contents of ST0 and ST1, sets flags in the FPU status
  1467. word according to the results and pops the register stack twice. This
  1468. instruction has no operands.
  1469.   "fucom", "fucomp" and "fucompp" perform an unordered comparison of two FPU
  1470. registers. Rules for operands are the same as for the "fcom", "fcomp" and
  1471. "fcompp", but the source operand must be an FPU register.
  1472.   "ficom" and "ficomp" compare the value in ST0 with an integer source operand
  1473. and set the flags in the FPU status word according to the results. "ficomp"
  1474. additionally pops the register stack after performing the comparison. The
  1475. integer value is converted to double extended precision floating-point format
  1476. before the comparison is made. The operand should be a 16-bit or 32-bit
  1477. memory location.
  1478.  
  1479.     ficom word [bx]  ; compare st0 with 16-bit integer
  1480.  
  1481.   "fcomi", "fcomip", "fucomi", "fucomip" perform the comparison of ST0 with
  1482. another FPU register and set the ZF, PF and CF flags according to the results.
  1483. "fcomip" and "fucomip" additionally pop the register stack after performing
  1484. the comparison. The instructions obtained by attaching the FPU condition
  1485. mnemonic (see table 2.2) to the "fcmov" mnemonic transfer the specified FPU
  1486. register into the ST0 register if the given test condition is true. These
  1487. instructions allow two different syntaxes: one with a single operand
  1488. specifying the source FPU register, and one with two operands, in which case
  1489. the destination operand should be the ST0 register and the second operand
  1490. specifies the source FPU register.
  1491.  
  1492.     fcomi st2        ; compare st0 with st2 and set flags
  1493.     fcmovb st0,st2   ; transfer st2 to st0 if below
  1494.  
  1495.    Table 2.2  FPU conditions
  1496.   ┌──────────┬──────────────────┬────────────────────────┐
  1497.   │ Mnemonic │ Condition tested │ Description            │
  1498.   ╞══════════╪══════════════════╪════════════════════════╡
  1499.   │ b        │ CF = 1           │ below                  │
  1500.   │ e        │ ZF = 1           │ equal                  │
  1501.   │ be       │ CF or ZF = 1     │ below or equal         │
  1502.   │ u        │ PF = 1           │ unordered              │
  1503.   │ nb       │ CF = 0           │ not below              │
  1504.   │ ne       │ ZF = 0           │ not equal              │
  1505.   │ nbe      │ CF and ZF = 0    │ not below nor equal    │
  1506.   │ nu       │ PF = 0           │ not unordered          │
  1507.   └──────────┴──────────────────┴────────────────────────┘
  1508.  
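  For example, on processors that support these instructions, the larger of
two double precision values can be selected without any jump. The sketch below
uses hypothetical variables. If the values are unordered, CF is set and the
second value is chosen:

        fld qword [second]   ; st0 = second
        fld qword [first]    ; st0 = first, st1 = second
        fcomi st1            ; compare first with second, set ZF, PF and CF
        fcmovb st0,st1       ; if first < second, copy second into st0
        fstp qword [larger]  ; store the larger value and pop
        fstp st0             ; discard the remaining value
    first dq 1.5
    second dq 2.5
    larger dq ?
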
  1509.   "ftst" compares the value in ST0 with 0.0 and sets the flags in the FPU
  1510. status word according to the results. "fxam" examines the contents of ST0
  1511. and sets the flags in FPU status word to indicate the class of value in the
  1512. register. These instructions have no operands.
  1513.   "fstsw" and "fnstsw" store the current value of the FPU status word in the
  1514. destination location. The destination operand can be either a 16-bit memory or
  1515. the AX register. "fstsw" checks for pending unmasked FPU exceptions before
  1516. storing the status word, "fnstsw" does not.
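  On processors without "fcomi", the result of a comparison is usually taken
from the status word. After "fstsw ax", the "sahf" instruction copies the C3,
C2 and C0 condition bits into ZF, PF and CF, so the conditions from table 2.2
can then be tested. The sketch below uses hypothetical variables:

        fld qword [first]    ; st0 = first
        fcomp qword [second] ; compare with second and pop
        fstsw ax             ; ax = FPU status word
        sahf                 ; C3 -> ZF, C2 -> PF, C0 -> CF
        setb cl              ; cl = 1 if first < second (or unordered)
        setp ch              ; ch = 1 if the values were unordered
    first dq 1.5
    second dq 2.5
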
  1517.   "fstcw" and "fnstcw" store the current value of the FPU control word at the
  1518. specified destination in memory. "fstcw" checks for pending unmasked FPU
  1519. exceptions before storing the control word, "fnstcw" does not. "fldcw" loads
  1520. the operand into the FPU control word. The operand should be a 16-bit memory
  1521. location.
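  For example, ST0 can be converted to an integer by truncation instead of the
current rounding mode. To do this, temporarily set the rounding control field
(bits 10 and 11 of the control word) to 11b. The sketch below uses
hypothetical variables:

        fstcw [saved_cw]     ; save the current control word
        mov ax,[saved_cw]
        or ax,0C00h          ; rounding control = 11b: round toward zero
        mov [truncating_cw],ax
        fldcw [truncating_cw]
        fistp dword [result] ; convert st0 by truncation and pop
        fldcw [saved_cw]     ; restore the previous control word
    saved_cw dw ?
    truncating_cw dw ?
    result dd ?
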
  1522.   "fstenv" and "fnstenv" store the current FPU operating environment at the
  1523. memory location specified with the destination operand, and then mask all FPU
  1524. exceptions. "fstenv" checks for pending unmasked FPU exceptions before
  1525. proceeding, "fnstenv" does not. "fldenv" loads the complete operating
  1526. environment from memory into the FPU. "fsave" and "fnsave" store the current
  1527. FPU state (operating environment and register stack) at the specified
  1528. destination in memory and reinitialize the FPU. "fsave" checks for pending
  1529. unmasked FPU exceptions before proceeding, "fnsave" does not. "frstor"
  1530. loads the FPU state from the specified memory location. All these
  1531. instructions need a memory location as their operand.
  "finit" and "fninit" set the FPU operating environment into its default
state. "finit" checks for pending unmasked FPU exceptions before proceeding,
"fninit" does not. "fclex" and "fnclex" clear the FPU exception flags in the
FPU status word. "fclex" checks for pending unmasked FPU exceptions before
proceeding, "fnclex" does not. "wait" and "fwait" are synonyms for the same
instruction, which causes the processor to check for pending unmasked FPU
exceptions and handle them before proceeding. These instructions have no
operands.
  "ffree" sets the tag associated with the specified FPU register to empty.
The operand should be an FPU register.
  "fincstp" and "fdecstp" rotate the FPU register stack by one position, by
adding one to or subtracting one from the top-of-stack pointer. The contents
of the registers are not changed. These instructions have no operands.


2.1.14  MMX instructions

The MMX instructions operate on the packed integer types and use the MMX
registers, which are the low 64-bit parts of the 80-bit FPU registers. Because
of this, MMX instructions cannot be used at the same time as FPU instructions.
They can operate on packed bytes (eight 8-bit integers), packed words (four
16-bit integers) or packed double words (two 32-bit integers); the packed
formats allow a single instruction to operate on multiple data elements at
once.
  "movq" copies a quad word from the source operand to the destination
operand. At least one of the operands must be an MMX register, the other one
can be an MMX register or a 64-bit memory location.

    movq mm0,mm1     ; move quad word from register to register
    movq mm2,[ebx]   ; move quad word from memory to register

  "movd" copies a double word from the source operand to the destination
operand. One of the operands must be an MMX register, the other one can be a
general register or a 32-bit memory location. Only the low double word of the
MMX register is used; when the MMX register is the destination, its high
double word is cleared.
  All general MMX operations have two operands: the destination operand should
be an MMX register, the source operand can be an MMX register or a 64-bit
memory location. The operation is performed on the corresponding data elements
of the source and destination operands, and the results are stored in the data
elements of the destination operand. "paddb", "paddw" and "paddd" perform the
addition of packed bytes, packed words or packed double words. "psubb",
"psubw" and "psubd" perform the subtraction of the same types. "paddsb",
"paddsw", "psubsb" and "psubsw" perform the addition or subtraction of packed
bytes or packed words with signed saturation. "paddusb", "paddusw", "psubusb"
and "psubusw" are analogous, but use unsigned saturation. "pmulhw" and
"pmullw" perform a signed multiply of the packed words and store the high or
low words of the results in the destination operand. "pmaddwd" multiplies the
packed signed words and adds the four intermediate double word products in
adjacent pairs, producing two packed double words. "pand", "por" and "pxor"
perform the logical operations on the quad words; "pandn" also performs a
logical negation of the destination operand before performing the "and"
operation.
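
  The three multiply instructions differ only in which part of the products
they keep. The C sketch below models them on four signed words; the names are
hypothetical, and the conversions back to 16-bit and 32-bit types assume
two's complement wrap-around and an arithmetic right shift of negative values,
as provided by common compilers.

```c
#include <assert.h>
#include <stdint.h>

/* pmullw: keep the low 16 bits of each signed 16 x 16-bit product. */
void model_pmullw(int16_t dst[4], const int16_t src[4])
{
    for (int i = 0; i < 4; i++) {
        int32_t product = (int32_t)dst[i] * src[i]; /* exact, fits 32 bits */
        dst[i] = (int16_t)(uint16_t)product;        /* low word, wrapped */
    }
}

/* pmulhw: keep the high 16 bits of each signed product. */
void model_pmulhw(int16_t dst[4], const int16_t src[4])
{
    for (int i = 0; i < 4; i++) {
        int32_t product = (int32_t)dst[i] * src[i];
        dst[i] = (int16_t)(product >> 16);          /* arithmetic shift */
    }
}

/* pmaddwd: multiply four pairs of signed words and add adjacent products,
   giving two signed double words. */
void model_pmaddwd(int32_t out[2], const int16_t dst[4], const int16_t src[4])
{
    for (int i = 0; i < 2; i++) {
        int64_t sum = (int64_t)dst[2 * i] * src[2 * i]
                    + (int64_t)dst[2 * i + 1] * src[2 * i + 1];
        out[i] = (int32_t)(uint32_t)sum;            /* wraps only for 2^31 */
    }
}
```

  Only "pmaddwd" can overflow: when all four words of a pair are -32768, the
sum of the two products is 2^31, which wraps to 80000000h.
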
  "pcmpeqb", "pcmpeqw" and "pcmpeqd" compare packed bytes, packed words or
packed double words for equality. If a pair of data elements is equal, the
corresponding data element in the destination operand is filled with bits of
value 1, otherwise it is set to 0. "pcmpgtb", "pcmpgtw" and "pcmpgtd" perform
a similar operation, but they check whether the signed data elements in the
destination operand are greater than the corresponding data elements in the
source operand. "packsswb" converts packed signed words into packed signed
bytes, "packssdw" converts packed signed double words into packed signed
words, using saturation to handle overflow conditions. "packuswb" converts
packed signed words into packed unsigned bytes, using unsigned saturation.
Converted data elements from the destination operand are stored in the low
part of the destination operand, while converted data elements from the
source operand are stored in the high part. "punpckhbw", "punpckhwd" and
"punpckhdq" interleave the data elements from the high parts of the source
and destination operands and store the result into the destination operand.
"punpcklbw", "punpcklwd" and "punpckldq" perform the same operation, but the
low parts of the source and destination operands are used.

    paddsb mm0,[esi] ; add packed bytes with signed saturation
    pcmpeqw mm3,mm7  ; compare packed words for equality

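  The difference between wrap-around, signed saturation and unsigned
saturation, as well as the operand layout of the pack instructions, can be
modeled in a few lines of C. This sketch assumes that element 0 is the lowest
element of a register, and its names are hypothetical. The pack models follow
the layout given in Intel's documentation, where the destination operand
supplies the low half of the result.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp an int to the signed or unsigned byte range (saturation). */
static int8_t  sat_s8(int v) { return (int8_t)(v > 127 ? 127 : v < -128 ? -128 : v); }
static uint8_t sat_u8(int v) { return (uint8_t)(v > 255 ? 255 : v < 0 ? 0 : v); }

/* paddb: eight byte additions that wrap around modulo 256. */
void model_paddb(uint8_t dst[8], const uint8_t src[8])
{
    for (int i = 0; i < 8; i++)
        dst[i] = (uint8_t)(dst[i] + src[i]);
}

/* paddsb: eight signed byte additions, each clamped to -128..127. */
void model_paddsb(int8_t dst[8], const int8_t src[8])
{
    for (int i = 0; i < 8; i++)
        dst[i] = sat_s8(dst[i] + src[i]);
}

/* paddusb: eight unsigned byte additions, each clamped to 0..255. */
void model_paddusb(uint8_t dst[8], const uint8_t src[8])
{
    for (int i = 0; i < 8; i++)
        dst[i] = sat_u8(dst[i] + src[i]);
}

/* packsswb: the four destination words become the low four bytes of the
   result and the four source words the high four bytes (signed saturation). */
void model_packsswb(int8_t out[8], const int16_t dst[4], const int16_t src[4])
{
    for (int i = 0; i < 4; i++) {
        out[i]     = sat_s8(dst[i]);
        out[i + 4] = sat_s8(src[i]);
    }
}

/* packuswb: same layout, signed words clamped to the range 0..255. */
void model_packuswb(uint8_t out[8], const int16_t dst[4], const int16_t src[4])
{
    for (int i = 0; i < 4; i++) {
        out[i]     = sat_u8(dst[i]);
        out[i + 4] = sat_u8(src[i]);
    }
}
```

  For example, 200 + 100 gives 44 with "paddb" but 255 with "paddusb".
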
  "psllw", "pslld" and "psllq" perform a logical shift left of the packed
words, packed double words or the single quad word in the destination operand
by the number of bits specified in the source operand. "psrlw", "psrld" and
"psrlq" perform a logical shift right of the packed words, packed double words
or the single quad word. "psraw" and "psrad" perform an arithmetic shift right
of the packed words or double words. The destination operand should be an MMX
register, while the source operand can be an MMX register, a 64-bit memory
location, or an 8-bit immediate value.

    psllw mm2,mm4    ; shift words left logically
    psrad mm4,[ebx]  ; shift double words right arithmetically

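  The shift count is not limited to the element size. The C sketch below, with
hypothetical names, models what happens for words: logical shifts by more
than 15 bits clear every word, while an arithmetic shift by more than 15 bits
fills each word with copies of its sign bit. The count is taken as a 64-bit
value, as it is when it comes from an MMX register or memory, and the
arithmetic shift model assumes that ">>" on a negative value is an arithmetic
shift, as it is with common compilers.

```c
#include <assert.h>
#include <stdint.h>

/* psllw: logical left shift of four words; counts above 15 clear them. */
void model_psllw(uint16_t w[4], uint64_t count)
{
    for (int i = 0; i < 4; i++)
        w[i] = count > 15 ? 0 : (uint16_t)(w[i] << count);
}

/* psrlw: logical right shift of four words; counts above 15 clear them. */
void model_psrlw(uint16_t w[4], uint64_t count)
{
    for (int i = 0; i < 4; i++)
        w[i] = count > 15 ? 0 : (uint16_t)(w[i] >> count);
}

/* psraw: arithmetic right shift; a count above 15 acts like 15, which
   leaves only copies of the sign bit in every word. */
void model_psraw(int16_t w[4], uint64_t count)
{
    unsigned n = count > 15 ? 15 : (unsigned)count;
    for (int i = 0; i < 4; i++)
        w[i] = (int16_t)(w[i] >> n);   /* assumes arithmetic ">>" */
}
```
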
  "emms" makes the FPU registers usable for the FPU instructions again by
marking all of them as empty; it must be used before any FPU instructions if
any MMX instructions were used.


2.1.15  SSE instructions

The SSE extension adds more MMX instructions and also introduces operations
on packed single precision floating point values. The 128-bit packed single
precision format consists of four single precision floating point values. The
128-bit SSE registers are designed to hold data of this type.
  "movaps" and "movups" transfer a double quad word operand containing packed
single precision values from the source operand to the destination operand. At
least one of the operands has to be an SSE register, the other one can be an
SSE register or a 128-bit memory location. Memory operands for "movaps" must
be aligned on a 16-byte boundary; operands for "movups" don't have to be
aligned.

    movups xmm0,[ebx]  ; move unaligned double quad word

  "movlps" moves two packed single precision values between memory and the
low quad word of an SSE register. "movhps" moves two packed single precision
values between memory and the high quad word of an SSE register. One of the
operands must be an SSE register, and the other operand must be a 64-bit
memory location.

    movlps xmm0,[ebx]  ; move memory to low quad word of xmm0
    movhps [esi],xmm7  ; move high quad word of xmm7 to memory

  "movlhps" moves two packed single precision values from the low quad word
of the source register to the high quad word of the destination register.
"movhlps" moves two packed single precision values from the high quad word of
the source register to the low quad word of the destination register. Both
operands have to be SSE registers.
  "movmskps" transfers the most significant bit (the sign bit) of each of the
four single precision values in the SSE register into the low four bits of a
general register. The source operand must be an SSE register, the destination
operand must be a general register.
  "movss" transfers a single precision value between the source and
destination operands (only the low double word is transferred). At least one
of the operands has to be an SSE register, the other one can be an SSE
register or a 32-bit memory location.

    movss [edi],xmm3   ; move low double word of xmm3 to memory

  Each of the SSE arithmetic operations has two variants. When the mnemonic
ends with "ps", the source operand can be a 128-bit memory location or an SSE
register, the destination operand must be an SSE register, and the operation
is performed on four packed single precision values, separately for each pair
of corresponding data elements; the result is stored in the destination
register. When the mnemonic ends with "ss", the source operand can be a
32-bit memory location or an SSE register, the destination operand must be an
SSE register, and the operation is performed on single precision values; only
the low double words of the SSE registers are used in this case, the result
is stored in the low double word of the destination register, and the other
three double words of the destination are left unchanged. "addps" and "addss"
add the values, "subps" and "subss" subtract the source value from the
destination value, "mulps" and "mulss" multiply the values, "divps" and
"divss" divide the destination value by the source value, "rcpps" and "rcpss"
compute the approximate reciprocal of the source value, "sqrtps" and "sqrtss"
compute the square root of the source value, "rsqrtps" and "rsqrtss" compute
the approximate reciprocal of the square root of the source value, "maxps" and
"maxss" compare the source and destination values and return the greater one,
"minps" and "minss" compare the source and destination values and return the
lesser one.

    mulss xmm0,[ebx]   ; multiply single precision values
    addps xmm3,xmm7    ; add packed single precision values

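  The two variants differ in how much of the destination register they change.
A minimal C sketch makes this explicit for "addps" and "addss"; it treats an
SSE register as an array of four floats with element 0 in the low double word,
which is an assumption of the model, and its names are hypothetical.

```c
#include <assert.h>

/* An SSE register seen as four single precision values; f[0] is the
   low double word. */
typedef struct { float f[4]; } xmm_ps;

/* addps: each of the four pairs of elements is added separately. */
void model_addps(xmm_ps *dst, const xmm_ps *src)
{
    for (int i = 0; i < 4; i++)
        dst->f[i] += src->f[i];
}

/* addss: only the low elements are added; f[1]..f[3] of the destination
   keep their previous values. */
void model_addss(xmm_ps *dst, const xmm_ps *src)
{
    dst->f[0] += src->f[0];
}
```
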
  "andps", "andnps", "orps" and "xorps" perform the logical operations on
packed single precision values. The source operand can be a 128-bit memory
location or an SSE register, the destination operand must be an SSE register.
  "cmpps" compares packed single precision values and returns a mask result
into the destination operand, which must be an SSE register. The source
operand can be a 128-bit memory location or an SSE register, and the third
operand must be an immediate value selecting the code of one of the eight
compare conditions (table 2.3). "cmpss" performs the same operation on single
precision values; only the low double word of the destination register is
affected, and in this case the source operand can be a 32-bit memory location
or an SSE register. These two instructions also have variants with only two
operands and the condition encoded within the mnemonic. Their mnemonics are
obtained by attaching the mnemonic from table 2.3 to the "cmp" mnemonic and
then attaching "ps" or "ss" at the end.

    cmpps xmm2,xmm4,0  ; compare packed single precision values
    cmpltss xmm0,[ebx] ; compare single precision values

   Table 2.3  SSE conditions
  ┌──────┬──────────┬─────────────────────────┐
  │ Code │ Mnemonic │ Description             │
  ╞══════╪══════════╪═════════════════════════╡
  │ 0    │ eq       │ equal                   │
  │ 1    │ lt       │ less than               │
  │ 2    │ le       │ less than or equal      │
  │ 3    │ unord    │ unordered               │
  │ 4    │ neq      │ not equal               │
  │ 5    │ nlt      │ not less than           │
  │ 6    │ nle      │ not less than nor equal │
  │ 7    │ ord      │ ordered                 │
  └──────┴──────────┴─────────────────────────┘

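  The eight conditions of table 2.3 can be expressed as a scalar C predicate.
In this sketch (the names are hypothetical) a pair of values is unordered when
either of them is a NaN; "unord" and the negated conditions "neq", "nlt" and
"nle" are true for unordered pairs, the remaining four are false. The model
produces the mask that "cmpps" stores: all bits set in an element when the
condition holds, all bits clear otherwise.

```c
#include <assert.h>
#include <stdint.h>

/* Evaluate compare condition "code" (0-7, table 2.3) for one pair. */
static int sse_condition(unsigned code, float a, float b)
{
    int unordered = (a != a) || (b != b);   /* true if either is a NaN */
    switch (code & 7) {
    case 0:  return !unordered && a == b;   /* eq */
    case 1:  return !unordered && a < b;    /* lt */
    case 2:  return !unordered && a <= b;   /* le */
    case 3:  return unordered;              /* unord */
    case 4:  return unordered || a != b;    /* neq */
    case 5:  return unordered || !(a < b);  /* nlt */
    case 6:  return unordered || !(a <= b); /* nle */
    default: return !unordered;             /* ord */
    }
}

/* cmpps: each result element is all ones if the condition holds for the
   corresponding destination and source elements, otherwise zero. */
void model_cmpps(uint32_t mask[4], const float dst[4], const float src[4],
                 unsigned code)
{
    for (int i = 0; i < 4; i++)
        mask[i] = sse_condition(code, dst[i], src[i]) ? 0xFFFFFFFFu : 0u;
}
```
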
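  "comiss" and "ucomiss" compare the single precision values and set the ZF,
PF and CF flags to show the result. The destination operand must be an SSE
register, the source operand can be a 32-bit memory location or an SSE
register. The two instructions differ only in their handling of NaN operands:
"comiss" signals an invalid operation exception for any NaN, "ucomiss" only
for a signaling NaN.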
  "shufps" moves any two of the four single precision values from the
destination operand into the low quad word of the destination operand, and any
two of the four values from the source operand into the high quad word of the
destination operand. The destination operand must be an SSE register, the
source operand can be a 128-bit memory location or an SSE register, and the
third operand must be an 8-bit immediate value selecting which values will be
moved into the destination operand. Bits 0 and 1 select the value to be moved
from the destination operand to the low double word of the result, bits 2 and
3 select the value to be moved from the destination operand to the second
double word, bits 4 and 5 select the value to be moved from the source operand
to the third double word, and bits 6 and 7 select the value to be moved from
the source operand to the high double word of the result.

    shufps xmm0,xmm0,10010011b ; shuffle double words

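  The selection performed by the third operand of "shufps" can be followed in
the C sketch below, which numbers the elements from 0 (the low double word) to
3; the function name is hypothetical. All four results are computed before
anything is written back, because the source and destination may be the same
register, as in the example above.

```c
#include <assert.h>
#include <stdint.h>

/* shufps: bits 1-0 and 3-2 of imm pick the two low result elements from the
   destination, bits 5-4 and 7-6 pick the two high ones from the source. */
void model_shufps(float dst[4], const float src[4], uint8_t imm)
{
    float r[4];
    r[0] = dst[imm & 3];
    r[1] = dst[(imm >> 2) & 3];
    r[2] = src[(imm >> 4) & 3];
    r[3] = src[(imm >> 6) & 3];
    for (int i = 0; i < 4; i++)
        dst[i] = r[i];   /* write back only after all reads */
}
```

  For 10010011b the fields are 3, 0, 1 and 2 (from bits 0-1 upwards), so with
xmm0 as both operands the four values are rotated by one position: the value
from element 3 moves to element 0, and every other value moves up by one.
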
  "unpckhps" performs an interleaved unpack of the values from the high parts
of the source and destination operands and stores the result in the
destination operand, which must be an SSE register. The source operand can be
a 128-bit memory location or an SSE register. "unpcklps" performs an
interleaved unpack of the values from the low parts of the source and
destination operands and stores the result in the destination operand; the
rules for the operands are the same.
  "cvtpi2ps" converts two packed double word integers into two packed single
precision floating point values and stores the result in the low quad word of
the destination operand, which should be an SSE register. The source operand
can be a 64-bit memory location or an MMX register.

    cvtpi2ps xmm0,mm0  ; convert integers to single precision values

  "cvtsi2ss" converts a double word integer into a single precision floating
point value and stores the result in the low double word of the destination
operand, which should be an SSE register. The source operand can be a 32-bit
memory location or a 32-bit general register.

    cvtsi2ss xmm0,eax  ; convert integer to single precision value

  "cvtps2pi" converts two packed single precision floating point values into
two packed double word integers and stores the result in the destination
operand, which should be an MMX register. The source operand can be a 64-bit
memory location or an SSE register; only the low quad word of the SSE register
is used. "cvttps2pi" performs a similar operation, except that truncation is
used to round the source values to integers; the rules for the operands are
the same.

    cvtps2pi mm0,xmm0  ; convert single precision values to integers

  "cvtss2si" converts a single precision floating point value into a double
word integer and stores the result in the destination operand, which should be
a 32-bit general register. The source operand can be a 32-bit memory location
or an SSE register; only the low double word of the SSE register is used.
"cvttss2si" performs a similar operation, except that truncation is used to
round the source value to an integer; the rules for the operands are the same.

    cvtss2si eax,xmm0  ; convert single precision value to integer

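  The difference between "cvtss2si" and "cvttss2si" lies only in rounding. The
C sketch below models both, assuming that MXCSR holds its default rounding
mode (round to nearest, ties to even); the names are hypothetical. For a NaN
or a value outside the 32-bit range, with the invalid operation exception
masked as it is by default, both instructions return 80000000h, the so-called
integer indefinite value; the model returns it but does not set any flags.

```c
#include <assert.h>
#include <stdint.h>

/* Result for NaN or out-of-range input (the "integer indefinite"). */
#define INT_INDEFINITE INT32_MIN

/* cvttss2si: truncate toward zero. */
int32_t model_cvttss2si(float x)
{
    double d = x;                      /* exact: every float fits a double */
    if (!(d > -2147483649.0 && d < 2147483648.0))
        return INT_INDEFINITE;         /* NaN or outside the int32 range */
    return (int32_t)d;                 /* C conversion truncates toward 0 */
}

/* cvtss2si: round to nearest, ties to even (default MXCSR mode). */
int32_t model_cvtss2si(float x)
{
    double d = x;
    if (!(d > -2147483648.5 && d < 2147483647.5))
        return INT_INDEFINITE;
    double fl = (double)(int64_t)d;    /* truncated toward zero ... */
    if (fl > d)
        fl -= 1.0;                     /* ... adjusted down to floor(d) */
    double frac = d - fl;              /* 0 <= frac < 1, computed exactly */
    int64_t r = (int64_t)fl;
    if (frac > 0.5 || (frac == 0.5 && (r & 1)))
        r += 1;                        /* round up, or break a tie to even */
    return (int32_t)r;
}
```

  With these models 2.5 converts to 2 and 3.5 to 4 under "cvtss2si", while
"cvttss2si" turns -2.7 into -2.
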
  "pextrw" copies the word in the source operand specified by the third
operand to the destination operand. The source operand must be an MMX
register, the destination operand must be a 32-bit general register (the word
is stored in its low word and the upper bits are cleared), and the third
operand must be an 8-bit immediate value.

    pextrw eax,mm0,1   ; extract word into eax

  "pinsrw" inserts a word from the source operand into the destination operand
at the location specified with the third operand, which must be an 8-bit
immediate value. The destination operand must be an MMX register, the source
operand can be a 16-bit memory location or a 32-bit general register (only the
low word of the register is used).

    pinsrw mm1,ebx,2   ; insert word from ebx

  "pavgb" and "pavgw" compute the rounded average of packed unsigned bytes or
words. "pmaxub" returns the maximum values of packed unsigned bytes, "pminub"
returns the minimum values of packed unsigned bytes, "pmaxsw" returns the
maximum values of packed signed words, "pminsw" returns the minimum values of
packed signed words. "pmulhuw" performs an unsigned multiply of the packed
words and stores the high words of the results in the destination operand.
"psadbw" computes the absolute differences of packed unsigned bytes, sums the
differences, and stores the sum in the low word of the destination operand.
All these instructions follow the same rules for operands as the general MMX
operations described in the previous section.
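
  Two of these operations are less obvious than their names suggest: the
average is rounded up on ties, and "psadbw" reduces eight byte differences to
a single word. The C sketch below, with hypothetical names, models the 64-bit
forms of "pavgb" and "psadbw".

```c
#include <assert.h>
#include <stdint.h>

/* pavgb: average of unsigned bytes, rounded up on ties: (a + b + 1) >> 1. */
void model_pavgb(uint8_t dst[8], const uint8_t src[8])
{
    for (int i = 0; i < 8; i++)
        dst[i] = (uint8_t)((dst[i] + src[i] + 1) >> 1);
}

/* psadbw: sum of the absolute differences of eight unsigned byte pairs.
   The 64-bit result has the sum in its low word and zeros elsewhere. */
uint64_t model_psadbw(const uint8_t dst[8], const uint8_t src[8])
{
    unsigned sum = 0;
    for (int i = 0; i < 8; i++)
        sum += dst[i] > src[i] ? (unsigned)(dst[i] - src[i])
                               : (unsigned)(src[i] - dst[i]);
    return sum;   /* at most 8 * 255 = 2040, so it fits in the low word */
}
```
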
  "pmovmskb" creates a mask made of the most significant bit of each byte in
the source operand and stores the result in the low byte of the destination
operand. The source operand must be an MMX register, the destination operand
must be a 32-bit general register.
  "pshufw" copies words from the source operand into the destination operand,
taking them from the locations specified with the third operand (all four
words are taken from the source operand). The destination operand must be an
MMX register, the source operand can be a 64-bit memory location or an MMX
register, and the third operand must be an 8-bit immediate value selecting
which values will be moved into the destination operand, in a similar way as
the third operand of the "shufps" instruction.
  "movntq" moves the quad word from the source operand to memory using a
non-temporal hint to minimize cache pollution. The source operand should be an
MMX register, the destination operand should be a 64-bit memory location.
"movntps" stores packed single precision values from the SSE register to
memory using a non-temporal hint. The source operand should be an SSE
register, the destination operand should be a 128-bit memory location.
"maskmovq" stores selected bytes from the first operand into a 64-bit memory
location using a non-temporal hint. Both operands should be MMX registers; the
second operand selects which bytes from the first operand are written to
memory, by the most significant bit of each of its bytes. The memory location
is pointed to by the DI (or EDI) register in the segment selected by DS.
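
  The byte selection of "maskmovq" can be modeled in C as below. The sketch
(with hypothetical names) covers only which bytes are written: the array
stands for the eight bytes at DS:DI (or DS:EDI), and the non-temporal hint has
no counterpart in the model.

```c
#include <assert.h>
#include <stdint.h>

/* maskmovq: byte i of the data register is written to mem[i] only if the
   most significant bit of byte i of the mask register is set; the other
   bytes of memory are left unchanged. */
void model_maskmovq(uint8_t mem[8], const uint8_t data[8],
                    const uint8_t mask[8])
{
    for (int i = 0; i < 8; i++)
        if (mask[i] & 0x80)
            mem[i] = data[i];
}
```
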
  "prefetcht0", "prefetcht1", "prefetcht2" and "prefetchnta" fetch the line
of data from memory that contains the byte specified with the operand into the
location in the cache hierarchy indicated by the hint in the mnemonic. The
operand should be an 8-bit memory location.
  "sfence" performs a serializing operation on all instructions storing to
memory that were issued prior to it. This instruction has no operands.
  "ldmxcsr" loads the 32-bit memory operand into the MXCSR register. "stmxcsr"
stores the contents of MXCSR into a 32-bit memory operand.
  "fxsave" saves the current state of the FPU, the MXCSR register, and all the
FPU and SSE registers to a 512-byte memory location specified in the
destination operand. "fxrstor" reloads data previously stored with the
"fxsave" instruction from the specified 512-byte memory location. The memory
operand for both these instructions must be aligned on a 16-byte boundary, and
it should be specified without a size operator.


2.1.16  SSE2 instructions

The SSE2 extension introduces operations on packed double precision floating
point values, extends the syntax of MMX instructions, and also adds some new
instructions.
  "movapd" and "movupd" transfer a double quad word operand containing packed
double precision values from the source operand to the destination operand.
These instructions are analogous to "movaps" and "movups" and have the same
rules for operands.
  "movlpd" moves a double precision value between memory and the low quad
word of an SSE register. "movhpd" moves a double precision value between
memory and the high quad word of an SSE register. These instructions are
analogous to "movlps" and "movhps" and have the same rules for operands.
  "movmskpd" transfers the most significant bit of each of the two double
precision values in the SSE register into the low two bits of a general
register. This instruction is analogous to "movmskps" and has the same rules
for operands.
  "movsd" transfers a double precision value between the source and
destination operands (only the low quad word is transferred). At least one of
the operands has to be an SSE register, the other one can be an SSE register
or a 64-bit memory location.
  The arithmetic operations on double precision values are "addpd", "addsd",
"subpd", "subsd", "mulpd", "mulsd", "divpd", "divsd", "sqrtpd", "sqrtsd",
"maxpd", "maxsd", "minpd" and "minsd", and they are analogous to the
arithmetic operations on single precision values described in the previous
section. When the mnemonic ends with "pd" instead of "ps", the operation is
performed on two packed double precision values, but the rules for operands
are the same. When the mnemonic ends with "sd" instead of "ss", the source
operand can be a 64-bit memory location or an SSE register, the destination
operand must be an SSE register, and the operation is performed on double
precision values; only the low quad words of the SSE registers are used in
this case.
  "andpd", "andnpd", "orpd" and "xorpd" perform the logical operations on
packed double precision values. They are analogous to the SSE logical
operations on single precision values and have the same rules for operands.
  "cmppd" compares packed double precision values and returns a mask result
into the destination operand. This instruction is analogous to "cmpps" and has
the same rules for operands. "cmpsd" performs the same operation on double
precision values; only the low quad word of the destination register is
affected, and in this case the source operand can be a 64-bit memory location
or an SSE register. The variants with only two operands are obtained by
attaching the condition mnemonic from table 2.3 to the "cmp" mnemonic and then
attaching "pd" or "sd" at the end.
  "comisd" and "ucomisd" compare the double precision values and set the ZF,
PF and CF flags to show the result. The destination operand must be an SSE
register, the source operand can be a 64-bit memory location or an SSE
register.
  "shufpd" moves either of the two double precision values from the
destination operand into the low quad word of the destination operand, and
either of the two values from the source operand into the high quad word of
the destination operand. This instruction is analogous to "shufps" and has the
same rules for operands. Bit 0 of the third operand selects the value to be
moved from the destination operand, bit 1 selects the value to be moved from
the source operand, and the rest of the bits are reserved and must be zero.
  "unpckhpd" performs an unpack of the high quad words from the source and
destination operands, "unpcklpd" performs an unpack of the low quad words from
the source and destination operands. They are analogous to "unpckhps" and
"unpcklps", and have the same rules for operands.
  "cvtps2pd" converts two packed single precision floating point values to two
packed double precision floating point values; the destination operand must be
an SSE register, the source operand can be a 64-bit memory location or an SSE
register. "cvtpd2ps" converts two packed double precision floating point
values to two packed single precision floating point values; the destination
operand must be an SSE register, the source operand can be a 128-bit memory
location or an SSE register. "cvtss2sd" converts a single precision floating
point value to a double precision floating point value; the destination
operand must be an SSE register, the source operand can be a 32-bit memory
location or an SSE register. "cvtsd2ss" converts a double precision floating
point value to a single precision floating point value; the destination
operand must be an SSE register, the source operand can be a 64-bit memory
location or an SSE register.
  "cvtpi2pd" converts two packed double word integers into two packed double
precision floating point values; the destination operand must be an SSE
register, the source operand can be a 64-bit memory location or an MMX
register. "cvtsi2sd" converts a double word integer into a double precision
floating point value; the destination operand must be an SSE register, the
source operand can be a 32-bit memory location or a 32-bit general register.
"cvtpd2pi" converts two packed double precision floating point values into two
packed double word integers; the destination operand should be an MMX
register, the source operand can be a 128-bit memory location or an SSE
register. "cvttpd2pi" performs a similar operation, except that truncation is
used to round the source values to integers; the rules for operands are the
same. "cvtsd2si" converts a double precision floating point value into a
double word integer; the destination operand should be a 32-bit general
register, the source operand can be a 64-bit memory location or an SSE
register. "cvttsd2si" performs a similar operation, except that truncation is
used to round the source value to an integer; the rules for operands are the
same.
  "cvtps2dq" and "cvttps2dq" convert packed single precision floating point
values to four packed double word integers, storing them in the destination
operand. "cvtpd2dq" and "cvttpd2dq" convert packed double precision floating
point values to two packed double word integers, storing the result in the low
quad word of the destination operand; in both pairs the "cvtt" variant uses
truncation. "cvtdq2ps" converts four packed double word integers to packed
single precision floating point values. "cvtdq2pd" converts two packed double
word integers from the low quad word of the source operand to packed double
precision floating point values. For all these instructions the destination
operand must be an SSE register, the source operand can be a 128-bit memory
location or an SSE register.
  "movdqa" and "movdqu" transfer a double quad word operand containing packed
integers from the source operand to the destination operand. At least one of
the operands has to be an SSE register, the other one can be an SSE register
or a 128-bit memory location. Memory operands for "movdqa" must be aligned on
a 16-byte boundary; operands for "movdqu" don't have to be aligned.
  "movq2dq" moves the contents of the MMX source register to the low quad word
of the destination SSE register. "movdq2q" moves the low quad word from the
source SSE register to the destination MMX register.

    movq2dq xmm0,mm1   ; move from MMX register to SSE register
    movdq2q mm0,xmm1   ; move from SSE register to MMX register

  All MMX instructions operating on 64-bit packed integers (those with
mnemonics starting with "p") are extended to operate on 128-bit packed
integers located in SSE registers. The additional syntax for these
instructions uses an SSE register where an MMX register was needed, and a
128-bit memory location or SSE register where a 64-bit memory location or MMX
register was needed. The exception is the "pshufw" instruction, which doesn't
allow the extended syntax, but has two new variants: "pshufhw" and "pshuflw",
which allow only the extended syntax and perform the same operation as
"pshufw" on the high or low quad words of the operands respectively. Also the
new instruction "pshufd" is introduced, which performs the same operation as
"pshufw", but on double words instead of words; it allows only the extended
syntax.

    psubb xmm0,[esi]   ; subtract 16 packed bytes
    pextrw eax,xmm0,7  ; extract highest word into eax

  "paddq" performs the addition of packed quad words, "psubq" performs the
subtraction of packed quad words, and "pmuludq" performs an unsigned multiply
of the low double words from each pair of corresponding quad words and returns
the results as packed quad words. These instructions follow the same rules for
operands as the general MMX operations described in 2.1.14.
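
  Because "pmuludq" ignores the high double word of every quad word, its
128-bit form produces two full 64-bit products. A C sketch of that form
follows; the function name is hypothetical and quad words are represented as
uint64_t values.

```c
#include <assert.h>
#include <stdint.h>

/* pmuludq, 128-bit form: the low double word of each destination quad word
   is multiplied (unsigned) by the low double word of the corresponding
   source quad word, and the 64-bit product replaces that quad word.
   The high double words of both operands are ignored. */
void model_pmuludq(uint64_t dst[2], const uint64_t src[2])
{
    for (int i = 0; i < 2; i++)
        dst[i] = (uint64_t)(uint32_t)dst[i] * (uint32_t)src[i];
}
```

  For example, multiplying FFFFFFFFh by FFFFFFFFh gives FFFFFFFE00000001h, so
no bits of the result are lost.
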
  "pslldq" and "psrldq" perform a logical shift left or right of the double
quad word in the destination operand by the number of bytes (not bits)
specified in the source operand. The destination operand should be an SSE
register, the source operand should be an 8-bit immediate value.
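
  Since "pslldq" and "psrldq" count in bytes, "pslldq xmm0,4" moves every byte
four positions up, the same as shifting the whole 128-bit value left by 32
bits. The C sketch below, with hypothetical names, models the register as
sixteen bytes with byte 0 the least significant; counts larger than 15 clear
the register.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* pslldq: shift the 16-byte register left by imm bytes (toward byte 15). */
void model_pslldq(uint8_t reg[16], uint8_t imm)
{
    unsigned n = imm > 16 ? 16 : imm;
    memmove(reg + n, reg, 16 - n);   /* regions overlap, so memmove */
    memset(reg, 0, n);               /* zeros enter at the low end */
}

/* psrldq: the same, toward byte 0. */
void model_psrldq(uint8_t reg[16], uint8_t imm)
{
    unsigned n = imm > 16 ? 16 : imm;
    memmove(reg, reg + n, 16 - n);
    memset(reg + 16 - n, 0, n);      /* zeros enter at the high end */
}
```
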
  "punpckhqdq" interleaves the high quad word of the source operand and the
high quad word of the destination operand and writes them to the destination
SSE register. "punpcklqdq" interleaves the low quad word of the source operand
and the low quad word of the destination operand and writes them to the
destination SSE register. The source operand can be a 128-bit memory location
or an SSE register.
  "movntdq" stores packed integer data from the SSE register to memory using a
non-temporal hint. The source operand should be an SSE register, the
destination operand should be a 128-bit memory location. "movntpd" stores
packed double precision values from the SSE register to memory using a
non-temporal hint; the rules for operands are the same. "movnti" stores an
integer from a general register to memory using a non-temporal hint. The
source operand should be a 32-bit general register, the destination operand
should be a 32-bit memory location. "maskmovdqu" stores selected bytes from
the first operand into a 128-bit memory location using a non-temporal hint.
Both operands should be SSE registers; the second operand selects which bytes
from the first operand are written to memory. The memory location is pointed
to by the DI (or EDI) register in the segment selected by DS and does not need
to be aligned.
  1985.   "clflush" writes and invalidates the cache line associated with the address
  1986. of byte specified with the operand, which should be a 8-bit memory location.
  1987.   "lfence" performs a serializing operation on all instruction loading from
  1988. memory that were issued prior to it. "mfence" performs a serializing operation
  1989. on all instruction accesing memory that were issued prior to it, and so it
  1990. combines the functions of "sfence" (described in previous section) and
  1991. "lfence" instructions. These instructions have no operands.
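As an illustration, a copy loop that writes its destination with non-temporal
stores could look like the sketch below. It assumes 32-bit code, that ESI and
EDI point to 16-byte aligned buffers and that ECX holds a non-zero number of
16-byte blocks; the label name is arbitrary. The final "sfence" orders the
non-temporal stores before any stores that follow it:

    copy_blocks:
        movdqa  xmm0,[esi]         ; aligned 128-bit load
        movntdq [edi],xmm0         ; store with a non-temporal hint
        add     esi,16
        add     edi,16
        dec     ecx
        jnz     copy_blocks
        sfence                     ; order the non-temporal stores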


2.1.17  SSE3 instructions

Prescott technology introduced some new instructions to improve the performance
of SSE and SSE2 - this extension is called SSE3.
  "fisttp" behaves like the "fistp" instruction and accepts the same operands,
the only difference being that it always uses truncation, irrespective of the
rounding mode.
  "movshdup" loads into the destination operand the 128-bit value obtained from
the source value of the same size by filling each quad word with two
duplicates of the value in its high double word. "movsldup" performs the same
action, except it duplicates the values of the low double words. The
destination operand should be an SSE register, the source operand can be an
SSE register or a 128-bit memory location.
  "movddup" loads the 64-bit source value and duplicates it into the high and
low quad words of the destination operand. The destination operand should be
an SSE register, the source operand can be an SSE register or a 64-bit memory
location.
  "lddqu" is functionally equivalent to the "movdqu" instruction with memory as
the source operand, but it may improve performance when the source operand
crosses a cache line boundary. The destination operand has to be an SSE
register, the source operand must be a 128-bit memory location.
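The operand forms described above could be written like this. It is only a
sketch, assuming 32-bit code, a value already loaded into ST0, and ESI pointing
to readable memory, which does not need to be aligned for "lddqu":

    fisttp   dword [esp]       ; store ST0 as a truncated integer and pop
    movshdup xmm0,xmm1         ; high double word of each quad word duplicated
    movsldup xmm1,xmm2         ; low double word of each quad word duplicated
    movddup  xmm2,qword [esi]  ; the same 64-bit value in both quad words
    lddqu    xmm3,[esi]        ; unaligned 128-bit load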
  "addsubps" performs single precision addition of the second and fourth pairs
and single precision subtraction of the first and third pairs of floating point
values in the operands. "addsubpd" performs double precision addition of the
second pair and double precision subtraction of the first pair of floating
point values in the operands. "haddps" performs the addition of the two single
precision values within each quad word of the source and destination operands,
and stores the results of such horizontal addition of values from the
destination operand into the low quad word of the destination operand, and the
results from the source operand into the high quad word of the destination
operand. "haddpd" performs the addition of the two double precision values
within each operand, and stores the result from the destination operand into
the low quad word of the destination operand, and the result from the source
operand into the high quad word of the destination operand. All these
instructions need the destination operand to be an SSE register, the source
operand can be an SSE register or a 128-bit memory location.
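Horizontal additions make reductions short. For example, the sum of four
single precision values could be computed as in this sketch, which assumes
32-bit code, ESI pointing to the four values and EDI pointing to a double word
for the result (the unaligned "movups" load is used so that no alignment is
required):

    movups xmm0,[esi]      ; a, b, c, d (from the lowest double word)
    haddps xmm0,xmm0       ; a+b, c+d, a+b, c+d
    haddps xmm0,xmm0       ; a+b+c+d in every double word
    movss  [edi],xmm0      ; store the sum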
  "monitor" sets up an address range for monitoring of write-back stores. It
needs its three operands to be the EAX, ECX and EDX registers, in that order.
"mwait" waits for a write-back store to the address range set up by the
"monitor" instruction. It uses two operands with additional parameters, the
first being the EAX and the second the ECX register.
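A typical use is a loop that waits until another processor writes to some
memory location. The sketch below assumes 32-bit code running at a privilege
level where these instructions are allowed (normally ring 0), and a
hypothetical double word variable "flag" that the other side sets to a non-zero
value. The flag is tested again after "monitor", so that a store made just
before the monitoring started is not missed:

    wait_for_flag:
        mov     eax,flag          ; address to monitor
        xor     ecx,ecx           ; no extensions
        xor     edx,edx           ; no hints
        monitor eax,ecx,edx
        cmp     dword [flag],0
        jne     flag_set
        xor     eax,eax           ; no hints for mwait, ECX is still zero
        mwait   eax,ecx
        jmp     wait_for_flag
    flag_set: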


2.1.18  AMD 3DNow! instructions

The 3DNow! extension adds new MMX instructions to those described in 2.1.14,
and introduces operations on 64-bit packed floating point values, each
consisting of two single precision floating point values.
  These instructions follow the same rules as the general MMX operations, the
destination operand should be an MMX register, the source operand can be an MMX
register or a 64-bit memory location. "pavgusb" computes the rounded averages
of packed unsigned bytes. "pmulhrw" performs a signed multiply of the packed
words, rounds the high word of each double word result and stores them in the
destination operand. "pi2fd" converts packed double word integers into packed
floating point values. "pf2id" converts packed floating point values into
packed double word integers using truncation. "pi2fw" converts packed word
integers into packed floating point values, only the low words of each double
word in the source operand are used. "pf2iw" converts packed floating point
values to packed word integers, the results are extended to double words using
sign extension. "pfadd" adds packed floating point values. "pfsub" and
"pfsubr" subtract packed floating point values, the first one subtracts the
source values from the destination values, the second one subtracts the
destination values from the source values. "pfmul" multiplies packed floating
point values. "pfacc" adds the low and high floating point values of the
destination operand, storing the result in the low double word of the
destination, and adds the low and high floating point values of the source
operand, storing the result in the high double word of the destination.
"pfnacc" subtracts the high floating point value of the destination operand
from the low one, storing the result in the low double word of the
destination, and subtracts the high floating point value of the source operand
from the low one, storing the result in the high double word of the
destination. "pfpnacc" subtracts the high floating point value of the
destination operand from the low one, storing the result in the low double
word of the destination, and adds the low and high floating point values of
the source operand, storing the result in the high double word of the
destination. "pfmax" and "pfmin" compute the maximum and minimum of floating
point values. "pswapd" stores the source operand with its high and low double
words swapped in the destination operand.
  "pfrcp" returns an estimate of the reciprocal of the floating point value in
the low double word of the source operand and stores it in both double words
of the destination, and "pfrsqrt" does the same for the reciprocal square
root. "pfrcpit1" performs the first step of the Newton-Raphson iteration that
refines the reciprocal approximation produced by the "pfrcp" instruction,
"pfrsqit1" performs the first step of the Newton-Raphson iteration that
refines the reciprocal square root approximation produced by the "pfrsqrt"
instruction, and "pfrcpit2" performs the second and final step of the
Newton-Raphson iteration for either the reciprocal or the reciprocal square
root approximation.
  "pfcmpeq", "pfcmpge" and "pfcmpgt" compare the packed floating point values
and set or clear all bits of the corresponding data element in the destination
operand according to the result of the comparison: the first one checks
whether the values are equal, the second one checks whether the destination
value is greater than or equal to the source value, and the third one checks
whether the destination value is greater than the source value.
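The reciprocal instructions are meant to be used together. The following is a
sketch of the usual refinement sequence computing 1/b to full single
precision; it assumes 32-bit code and a hypothetical double word variable "b"
holding a single precision value (comments show the high | low double words):

    movd      mm0,[b]        ; 0 | b
    pfrcp     mm1,mm0        ; 1/b | 1/b, approximation
    punpckldq mm0,mm0        ; b | b
    pfrcpit1  mm0,mm1        ; first Newton-Raphson step
    pfrcpit2  mm0,mm1        ; 1/b | 1/b, refined
    femms                    ; leave the MMX state before using the FPU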
  "prefetch" and "prefetchw" load the line of data from memory that contains
the byte specified with the operand into the data cache. The "prefetchw"
instruction should be used when the data in the cache line is expected to be
modified, otherwise the "prefetch" instruction should be used. The operand
should be an 8-bit memory location.
  "femms" performs a fast clear of the MMX state. This instruction has no
operands.


2.1.19  The x86-64 long mode instructions

The AMD64 and EM64T architectures (we will use the common name x86-64 for them
both) extend the x86 instruction set for 64-bit processing. While legacy
and compatibility modes use the same set of registers and instructions, the
new long mode extends the x86 operations to 64 bits and introduces several new
registers. You can turn on generating the code for this mode with the "use64"
directive.
  Each of the general purpose registers is extended to 64 bits, and eight
entirely new general purpose registers, as well as eight new SSE registers, are
added. See table 2.4 for the summary of new registers (only the ones that were
not listed in table 1.2). The general purpose registers of smaller sizes are
the low order portions of the larger ones. You can still access the "ah", "bh",
"ch" and "dh" registers in long mode, but you cannot use them in the same
instruction with any of the new registers.

   Table 2.4  New registers in long mode
  ┌──────┬───────────────────────────┬───────┐
  │ Type │          General          │  SSE  │
  ├──────┼──────┬──────┬──────┬──────┼───────┤
  │ Bits │  8   │  16  │  32  │  64  │  128  │
  ╞══════╪══════╪══════╪══════╪══════╪═══════╡
  │      │      │      │      │ rax  │       │
  │      │      │      │      │ rcx  │       │
  │      │      │      │      │ rdx  │       │
  │      │      │      │      │ rbx  │       │
  │      │ spl  │      │      │ rsp  │       │
  │      │ bpl  │      │      │ rbp  │       │
  │      │ sil  │      │      │ rsi  │       │
  │      │ dil  │      │      │ rdi  │       │
  │      │ r8b  │ r8w  │ r8d  │ r8   │ xmm8  │
  │      │ r9b  │ r9w  │ r9d  │ r9   │ xmm9  │
  │      │ r10b │ r10w │ r10d │ r10  │ xmm10 │
  │      │ r11b │ r11w │ r11d │ r11  │ xmm11 │
  │      │ r12b │ r12w │ r12d │ r12  │ xmm12 │
  │      │ r13b │ r13w │ r13d │ r13  │ xmm13 │
  │      │ r14b │ r14w │ r14d │ r14  │ xmm14 │
  │      │ r15b │ r15w │ r15d │ r15  │ xmm15 │
  └──────┴──────┴──────┴──────┴──────┴───────┘
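For example, with "use64" in effect the uncommented instructions below are
accepted, while the commented-out one would be rejected, because "ah" cannot be
combined with a new register in one instruction (the register choice is
arbitrary):

    use64
    mov ah,bl        ; legacy 8-bit registers
    mov sil,r8b      ; new 8-bit registers
    mov r9d,eax      ; low double word of r9
    ; mov ah,r8b     ; not allowed: "ah" together with a new register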

  In general, any instruction from the x86 architecture which allowed 16-bit or
32-bit operand sizes also allows 64-bit operands in long mode. The 64-bit
registers should be used for addressing in long mode; 32-bit addressing is
also allowed, but it is not possible to use addresses based on 16-bit
registers. Below are samples of the new operations possible in long mode,
using the "mov" instruction as an example:

    mov rax,r8   ; transfer 64-bit general register
    mov al,[rbx] ; transfer memory addressed by 64-bit register

The long mode also uses instruction pointer based addresses. You can specify
them manually with the special RIP register symbol, but such addressing is
also generated automatically by flat assembler, since there is no 64-bit
absolute addressing in long mode. You can still force the assembler to use
32-bit absolute addressing by putting the "dword" size override for the
address inside the square brackets. There is one exception, where 64-bit
absolute addressing is possible: the "mov" instruction with one of the
operands being the accumulator register and the other being a memory operand.
To force the assembler to use 64-bit absolute addressing there, use the
"qword" size operator for the address inside the square brackets. When no size
operator is applied to the address, the assembler generates the optimal form
automatically.

    mov [qword 0],rax  ; absolute 64-bit addressing
    mov [dword 0],r15d ; absolute 32-bit addressing
    mov [0],rsi        ; automatic RIP-relative addressing
    mov [rip+3],sil    ; manual RIP-relative addressing

  Also, only signed 32-bit values are possible as immediate operands for
64-bit operations, with the only exception being the "mov" instruction with
a 64-bit general purpose register as the destination operand. Trying to force
a 64-bit immediate with any other instruction will cause an error.
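The sketch below shows which immediate forms are accepted; the last line is
commented out because, as described above, flat assembler reports an error
for it (the registers and values are arbitrary):

    use64
    mov rax,123456789ABCDEF0h  ; full 64-bit immediate, allowed for mov
    add rax,-1                 ; sign-extended 32-bit immediate
    and rcx,7FFFFFFFh          ; largest positive value that fits
    ; add rax,100000000h       ; error: does not fit in signed 32 bits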
  If any operation is performed on the 32-bit general registers in long mode,
the upper 32 bits of the 64-bit registers containing them are filled with
zeros. This is unlike the operations on the 16-bit or 8-bit portions of those
registers, which preserve the upper bits.
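The difference can be traced on a single register; the comments show the
contents of RAX after each instruction:

    use64
    mov rax,-1       ; rax = 0FFFFFFFFFFFFFFFFh
    mov ax,0         ; rax = 0FFFFFFFFFFFF0000h, upper 48 bits preserved
    mov eax,eax      ; rax = 00000000FFFF0000h, upper 32 bits cleared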
  Three new type conversion instructions are available. "cdqe" sign extends
the double word in EAX into a quad word and stores the result in the RAX
register. "cqo" sign extends the quad word in RAX into a double quad word and
stores the extra bits in the RDX register. These instructions have no
operands. "movsxd" sign extends the double word source operand, being either a
32-bit register or memory, into the 64-bit destination operand, which has to
be a register. No analogous instruction is needed for zero extension, since it
is done automatically by any operation on 32-bit registers, as noted in the
previous paragraph. The "movzx" and "movsx" instructions, conforming to the
general rule, can be used with a 64-bit destination operand, allowing the
extension of byte or word values into quad words.
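The usual reason for "cqo" is preparing the dividend for a signed 64-bit
division. The sketch below also shows "movsxd" and "movzx" with a 64-bit
destination; RSI is assumed to point to readable memory:

    use64
    mov    rax,-100
    cqo                     ; rdx:rax = -100 as a 128-bit value
    mov    rcx,7
    idiv   rcx              ; rax = -14 (quotient), rdx = -2 (remainder)
    movsxd rbx,dword [rsi]  ; sign extend a double word from memory
    movzx  rdx,byte [rsi]   ; zero extend a byte into a quad word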
  All the binary arithmetic and logical instructions are promoted to allow
64-bit operands in long mode. The use of decimal arithmetic instructions in
long mode is prohibited.
  The stack operations, like "push" and "pop", default to 64-bit operands in
long mode, and it is not possible to use 32-bit operands with them. The
"pusha" and "popa" instructions are disallowed in long mode.
  The indirect near jumps and calls in long mode default to 64-bit operands and
it is not possible to use 32-bit operands with them. On the other hand, the
indirect far jumps and calls allow any operands that were allowed by the x86
architecture, and an 80-bit memory operand is also allowed (though only EM64T
seems to implement such a variant), with the first eight bytes defining the
offset and the last two bytes specifying the selector. The direct far jumps
and calls are not allowed in long mode.
  The I/O instructions "in", "out", "ins" and "outs" are the exceptional
instructions that are not extended to accept quad word operands in long mode.
All the other string operations are, and there are new short forms "movsq",
"cmpsq", "scasq", "lodsq" and "stosq" introduced for the variants of string
operations with 64-bit string elements. The RSI and RDI registers are used by
default to address the string elements.
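For instance, copying a block of quad words only needs the 64-bit registers to
be set up; this sketch assumes RSI, RDI and RCX were loaded earlier:

    use64
    ; rsi -> source, rdi -> destination, rcx = number of quad words
    cld                 ; addresses increase
    rep movsq           ; copies rcx quad words, 8 bytes each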
  The "lfs", "lgs" and "lss" instructions are extended to accept an 80-bit
source memory operand with a 64-bit destination register (though only EM64T
seems to implement such a variant). The "lds" and "les" instructions are
disallowed in long mode.
  The system instructions like "lgdt", which required a 48-bit memory operand,
require an 80-bit memory operand in long mode.
  "cmpxchg16b" is the 64-bit equivalent of the "cmpxchg8b" instruction; it
uses a double quad word memory operand and 64-bit registers to perform the
analogous operation.
  "swapgs" is a new instruction, which swaps the base address of the GS
register with the contents of the KernelGSbase model-specific register (MSR
address 0C0000102h).
  "syscall" and "sysret" are a pair of new instructions that provide
functionality similar to "sysenter" and "sysexit" in long mode, where the
latter pair is disallowed.


2.2  Control directives

This section describes the directives that control the assembly process. They
are processed during the assembly and may cause some blocks of instructions
to be assembled differently or not assembled at all.


2.2.1  Numerical constants

The "=" directive allows you to define a numerical constant. It should be
preceded by the name of the constant and followed by the numerical expression
providing the value. The value of such constants can be a number or an
address, but - unlike labels - numerical constants are not allowed to hold
register-based addresses. Besides this difference, in their basic variant
numerical constants behave very much like labels, and you can even
forward-reference them (access their values before they actually get defined).
  There is, however, a second variant of numerical constants, which is
recognized by the assembler when you try to define a constant with a name
under which a numerical constant has already been defined. In such case the
assembler treats that constant as an assembly-time variable and allows it to
be assigned a new value, but forbids forward-referencing it (for obvious
reasons). Let's see both variants of numerical constants in one example:

    dd sum
    x = 1
    x = x+2
    sum = x

Here the "x" is an assembly-time variable, and every time it is accessed, the
value that was most recently assigned to it is used. Thus if we tried to
access the "x" before it gets defined the first time, for example if we wrote
"dd x" in place of the "dd sum" instruction, it would cause an error. And when
it is re-defined with the "x = x+2" directive, the previous value of "x" is
used to calculate the new one. So when the "sum" constant gets defined, the
"x" has the value of 3, and this value is assigned to the "sum". Since this one
is defined only once in source, it is a standard numerical constant, and can
be forward-referenced. So the "dd sum" is assembled as "dd 3". To read more
about how the assembler is able to resolve this, see section 2.2.6.
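A common use of an assembly-time variable is a running counter that is
re-assigned as definitions accumulate, while constants defined only once take
a snapshot of its value. All names below are hypothetical and only illustrate
the pattern; since RECORD_SIZE is defined only once, an instruction like
"dd RECORD_SIZE" could be placed even before these lines:

    pos = 0
    NAME_FIELD = pos         ; constant equal to 0
    pos = pos+16
    AGE_FIELD = pos          ; constant equal to 16
    pos = pos+4
    RECORD_SIZE = pos        ; constant equal to 20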
  The value of a numerical constant can be preceded by a size operator, which
ensures that the value will fit in the range for the specified size, and can
also affect how some of the calculations inside the numerical expression are
performed. This example:

    c8 = byte -1
    c32 = dword -1

defines two different constants, the first one fits in 8 bits, the second one
fits in 32 bits.
  When you need to define a constant with the value of an address, which may
be register-based (and thus you cannot employ a numerical constant for this
purpose), you can use the extended syntax of the "label" directive (already
described in section 1.2.3), like:

    label myaddr at ebp+4

which declares a label placed at the "ebp+4" address. However, remember that
labels, unlike numerical constants, cannot become assembly-time variables.


2.2.2  Conditional assembly

The "if" directive causes some block of instructions to be assembled only
under a certain condition. It should be followed by a logical expression
specifying the condition; instructions in the next lines will be assembled
only when this condition is met, otherwise they will be skipped. The optional
"else if" directive, followed by a logical expression specifying an additional
condition, begins the next block of instructions that will be assembled if the
previous conditions were not met and the additional condition is met. The
optional "else" directive begins the block of instructions that will be
assembled if none of the conditions were met. The "end if" directive ends the
last block of instructions.
  You should note that the "if" directive is processed at the assembly stage
and therefore it doesn't affect any preprocessor directives, like the
definitions of symbolic constants and macroinstructions - when the assembler
recognizes the "if" directive, all the preprocessing has already been finished.
  A logical expression consists of logical values and logical operators. The
logical operators are "~" for logical negation, "&" for logical and, and "|"
for logical or. The negation has the highest priority. A logical value can be
a numerical expression; it will be false if it is equal to zero, otherwise it
will be true. Two numerical expressions can be compared using one of the
following operators to make a logical value: "=" (equal), "<" (less),
">" (greater), "<=" (less or equal), ">=" (greater or equal),
"<>" (not equal).
  The "used" operator, followed by a symbol name, is the logical value that
checks whether the given symbol is used somewhere (it returns the correct
result even if the symbol is used only after this check). The "defined"
operator can be followed by any expression, usually just by a single symbol
name; it checks whether the given expression contains only symbols that are
defined in the source and accessible from the current position.
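A typical application of "used" is a set of routines where each one is
assembled only when some part of the program refers to it, so that unused
code takes no space in the output. The routine name "print_char" below is
hypothetical:

    if used print_char
    print_char:
        ; ... body of the routine ...
        ret
    end if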
  The following simple example uses the "count" constant that should be
defined somewhere in source:

    if count>0
        mov cx,count
        rep movsb
    end if

These two assembly instructions will be assembled only if the "count" constant
is greater than 0. The next sample shows a more complex conditional structure:

    if count & ~ count mod 4
        mov cx,count/4
        rep movsd
    else if count>4
        mov cx,count/4
        rep movsd
        mov cx,count mod 4
        rep movsb
    else
        mov cx,count
        rep movsb
    end if

The first block of instructions gets assembled when the "count" is non-zero
and divisible by four. If this condition is not met, the second logical
expression, which follows the "else if", is evaluated, and if it's true, the
second block of instructions gets assembled; otherwise the last block of
instructions, which follows the line containing only "else", is assembled.
  There are also operators that allow comparison of values being any chains of
symbols. The "eq" operator checks whether two such values are exactly the
same. The "in" operator checks whether the given value is a member of the list
of values following this operator; the list should be enclosed between "<" and
">" characters, and its members should be separated with commas. The symbols
are considered the same when they have the same meaning for the assembler -
for example "pword" and "fword" are the same for the assembler and thus are
not distinguished by the above operators. In the same way "16 eq 10h" is a
true condition, however "16 eq 10+4" is not.
  The "eqtype" operator checks whether the two compared values have the same
structure, and whether the structural elements are of the same type. The
distinguished types include numerical expressions, individual quoted strings,
floating point numbers, address expressions (the expressions enclosed in
square brackets or preceded by the "ptr" operator), instruction mnemonics,
registers, size operators, jump type and code type operators. And each of the
special characters that act as separators, like a comma or colon, is a
separate type itself. For example, two values, each one consisting of a
register name followed by a comma and a numerical expression, will be regarded
as of the same type, no matter what kind of register and how complicated a
numerical expression is used; with the exception of quoted strings and
floating point values, which are special kinds of numerical expressions and
are treated as different types. Thus the "eax,16 eqtype fs,3+7" condition is
true, but "eax,16 eqtype eax,1.6" is false.
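Since "eq" and "in" compare symbols rather than values, they are most useful
with symbolic constants or macroinstruction parameters, which are replaced by
the preprocessor before the "if" directive is evaluated. In the sketch below
"src" is a hypothetical symbolic constant defined with the "equ" directive of
the preprocessor; the code loads a byte through whichever register it names,
assuming 32-bit code and a clear direction flag:

    src equ esi

    if src eq esi
        lodsb              ; loads the byte at [esi] and increments esi
    else if src in <edi,ebx,ebp>
        mov al,[src]
        inc src
    end if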


2.2.3  Repeating blocks of instructions

The "times" directive repeats one instruction a specified number of times. It
should be followed by a numerical expression specifying the number of repeats
and the instruction to repeat (optionally a colon can be used to separate the
number and the instruction). When the special symbol "%" is used inside the
instruction, it is equal to the number of the current repeat. For example
"times 5 db %" will define five bytes with values 1, 2, 3, 4, 5. Recursive use
of the "times" directive is also allowed, so "times 3 times % db %" will
define six bytes with values 1, 1, 2, 1, 2, 3.
  The "repeat" directive repeats the whole block of instructions. It should be
followed by a numerical expression specifying the number of repeats.
Instructions to repeat are expected in the next lines, ended with the
"end repeat" directive, for example:

    repeat 8
        mov byte [bx],%
        inc bx
    end repeat

The generated code will store byte values from one to eight in the memory
addressed by the BX register.
  The number of repeats can be zero; in that case the instructions are not
assembled at all.
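Because "%" can appear in any numerical expression inside the block, "repeat"
is a convenient way to generate lookup tables at assembly time. This sketch
(the "squares" label is only an illustration) defines sixteen bytes holding
the squares of the numbers from 0 to 15:

    squares:
    repeat 16
        db (%-1)*(%-1)     ; % runs from 1 to 16, so values 0, 1, 4, ..., 225
    end repeat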
  The "break" directive allows you to stop repeating earlier and continue
assembly from the first line after the "end repeat". Combined with the "if"
directive, it allows you to stop repeating under some special condition, like:

    s = x/2
    repeat 100
        if x/s = s
            break
        end if
        s = (s+x/s)/2
    end repeat

  The "while" directive repeats the block of instructions as long as the
condition specified by the logical expression following it is true. The block
of instructions to be repeated should end with the "end while" directive.
Before each repetition the logical expression is evaluated, and when its value
is false, the assembly is continued starting from the first line after the
"end while". Also in this case the "%" symbol holds the number of the current
repeat. The "break" directive can be used to stop this kind of loop in the
same way as with the "repeat" directive. The previous sample can be rewritten
to use "while" instead of "repeat" this way:

    s = x/2
    while x/s <> s
        s = (s+x/s)/2
        if % = 100
            break
        end if
    end while

  The blocks defined with "if", "repeat" and "while" can be nested in any
order, however each of them has to be closed before the block that encloses
it. The "break" directive always stops processing the block that was started
last with either the "repeat" or "while" directive.


2.2.4  Addressing spaces

The "org" directive sets the address at which the following code is expected
to appear in memory. It should be followed by a numerical expression
specifying the address. This directive begins a new addressing space; the
following code itself is not moved in any way, but all the labels defined
within it and the value of the "$" symbol are affected as if it was put at the
given address. However, it's the responsibility of the programmer to put the
code at the correct address at run-time.
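For instance, a DOS program in the .COM format is loaded at offset 100h of its
segment, so its source usually begins with "org 100h", and the label "msg"
below is then assigned the run-time offset of the string. This sketch assumes
16-bit code (the default for the binary output format) and uses DOS
interrupt 21h:

    org 100h
        mov ah,9           ; DOS function 9: display "$"-terminated string
        mov dx,msg         ; offset of msg as it will be at run-time
        int 21h
        mov ax,4C00h       ; DOS function 4Ch: terminate program
        int 21h
    msg db 'Hello!$'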
  The "load" directive allows you to define a constant with a binary value
loaded from the already assembled code. This directive should be followed by
the name of the constant, then optionally a size operator, then the "from"
operator and a numerical expression specifying a valid address in the current
addressing space. The size operator has an unusual meaning in this case - it
states how many bytes (up to 8) have to be loaded to form the binary value of
the constant. If no size operator is specified, one byte is loaded (thus the
value is in the range from 0 to 255). The loaded data cannot exceed the
current offset.
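The loaded bytes are combined in little-endian order, the same way the
processor reads a value from memory. In this short illustration (the "table"
and "w" names are arbitrary) the constant receives the value 0201h:

    table db 1,2,3,4
    load w word from table   ; w = 0201h, the first byte is the low one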
  The "store" directive can modify the already generated code by replacing
some of the previously generated data with the value defined by the given
numerical expression, which follows. The expression can be preceded by an
optional size operator to specify how large a value the expression defines,
and therefore how many bytes will be stored; if there is no size operator, the
size of one byte is assumed. Then the "at" operator and the numerical
expression defining a valid address in the current addressing space, at which
the given value has to be stored, should follow. This is a directive for
advanced applications and should be used carefully.
  Both "load" and "store" directives are limited to operate on places in the
current addressing space. The "$$" symbol is always equal to the base address
of the current addressing space, and the "$" symbol is the address of the
current position in that addressing space, therefore these two values define
the limits of the area where "load" and "store" can operate.
  Combining the "load" and "store" directives allows you to do things like
encoding some of the already generated code. For example, to encode the whole
code generated in the current addressing space you can use such a block of
directives:

    repeat $-$$
        load a byte from $$+%-1
        store byte a xor c at $$+%-1
    end repeat

and each byte of code will be xored with the value defined by the "c"
constant.
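The "load" directive alone can also read the code without modifying it. For
example, this sketch (the "checksum" and "byte_value" names are arbitrary)
appends a byte equal to the sum, modulo 256, of all the bytes generated so far
in the current addressing space:

    checksum = 0
    repeat $-$$
        load byte_value byte from $$+%-1
        checksum = (checksum+byte_value) and 0FFh
    end repeat
    db checksum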
  The "virtual" directive defines virtual data at a specified address. This
data won't be included in the output file, but labels defined there can be
used in other parts of the source. This directive can be followed by the "at"
operator and a numerical expression specifying the address for the virtual
data, otherwise it uses the current address, the same as "virtual at $".
Instructions defining data are expected in the next lines, ended with the
"end virtual" directive. The block of virtual instructions itself is an
independent addressing space; after it's ended, the context of the previous
addressing space is restored.
  The "virtual" directive can be used to create a union of some variables, for
example:

    GDTR dp ?
    virtual at GDTR
        GDT_limit dw ?
        GDT_address dd ?
    end virtual

It defines two labels for parts of the 48-bit variable at the "GDTR" address.
  It can also be used to define labels for some structures addressed by a
register, for example:

    virtual at bx
        LDT_limit dw ?
        LDT_address dd ?
    end virtual

With such a definition the instruction "mov ax,[LDT_limit]" will be assembled
to "mov ax,[bx]".
  2484.   Declaring data with defined values or instructions inside the virtual block
  2485. can also be useful, because the "load" directive can load values from the
  2486. virtually generated code into constants. This directive should be used
  2487. after the code it loads but before the virtual block ends, because it can
  2488. only load values from the same addressing space. For example:
  2489.  
  2490.     virtual at 0
  2491.         xor eax,eax
  2492.         and edx,eax
  2493.         load zeroq dword from 0
  2494.     end virtual
  2495.  
  2496. The above piece of code will define the "zeroq" constant containing four bytes
  2497. of the machine code of the instructions defined inside the virtual block.
  2498. This method can also be used to load a binary value from an external file.
  2499. For example, this code:
  2500.  
  2501.     virtual at 0
  2502.         file 'a.txt':10h,1
  2503.         load char from 0
  2504.     end virtual
  2505.  
  2506. loads a single byte from offset 10h in the file "a.txt" into the "char"
  2507. constant.
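For comparison, "file 'a.txt':10h,1" followed by "load char from 0" amounts to reading one byte at file offset 10h. Below is a Python equivalent, offered only as an illustration. The function name is hypothetical, and raising an error for a file that is too short is a choice made by this sketch.

```python
import os
import tempfile


def load_byte(path: str, offset: int) -> int:
    """Return the byte at `offset`, like: file 'path':offset,1 then load ... from 0."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(1)
    if len(data) != 1:
        raise ValueError("file is shorter than offset + 1 bytes")
    return data[0]


# Demo with a throwaway file standing in for "a.txt".
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.txt")
    with open(path, "wb") as f:
        f.write(bytes(range(32)))     # the byte at offset n has value n
    char = load_byte(path, 0x10)
print(hex(char))                      # prints 0x10
```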
  2508.   Any of the "section" directives described in 2.4 also begins a new
  2509. addressing space.
  2510.  
  2511.  
  2512. 2.2.5  Other directives
  2513.  
  2514. "align" directive aligns code or data to the specified boundary. It should
  2515. be followed by a numerical expression specifying the number of bytes to a
  2516. multiple of which the current address has to be aligned. The boundary value
  2517. has to be a power of two.
  2518.   The "align" directive fills the bytes that have to be skipped to perform the
  2519. alignment with "nop" instructions and at the same time marks this area as
  2520. uninitialized data, so if it is placed among other uninitialized data that
  2521. wouldn't take space in the output file, the alignment bytes will act the same
  2522. way. If you need to fill the alignment area with some other values, you can
  2523. combine "align" with "virtual" to get the size of alignment needed and then
  2524. create the alignment yourself, like:
  2525.  
  2526.     virtual
  2527.         align 16
  2528.         a = $ - $$
  2529.     end virtual
  2530.     db a dup 0
  2531.  
  2532. The "a" constant is defined as the difference between the address after
  2533. alignment and the address of the "virtual" block (see previous section), so it
  2534. is equal to the size of the needed alignment space.
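You can check the arithmetic behind "a" outside the assembler. The Python model below is not part of fasm. It assumes the virtual block starts at a known absolute address. It rounds that address up to the boundary and takes the difference, which is what "$ - $$" measures inside the block. The function name and sample address are hypothetical.

```python
def align_padding(address: int, boundary: int) -> int:
    """Size of the gap "align boundary" would leave at `address` (the "a" above)."""
    if boundary <= 0 or boundary & (boundary - 1):
        raise ValueError("boundary must be a power of two")
    aligned = (address + boundary - 1) & ~(boundary - 1)   # $ after "align"
    return aligned - address                                # a = $ - $$


address = 0x1003                 # hypothetical current address
a = align_padding(address, 16)
filler = bytes(a)                # db a dup 0
print(a, filler.hex())
```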
  2535.   "display" directive displays a message at assembly time. It should be
  2536. followed by quoted strings or byte values, separated with commas. It can be
  2537. used to display the values of some constants, for example:
  2538.  
  2539.     bits = 16
  2540.     display 'Current offset is 0x'
  2541.     repeat bits/4
  2542.         d = '0' + $ shr (bits-%*4) and 0Fh
  2543.         if d > '9'
  2544.             d = d + 'A'-'9'-1
  2545.         end if
  2546.         display d
  2547.     end repeat
  2548.     display 13,10
  2549.  
  2550. This block of directives calculates the four hexadecimal digits of the 16-bit
  2551. value of "$" and converts them into characters for display. Note that this
  2552. won't work if the addresses in the current addressing space are relocatable
  2553. (as may happen with PE or object output formats), since only absolute values
  2554. can be used this way. An absolute value may be obtained by calculating a
  2555. relative address, like "$-$$", or "rva $" in case of the PE format.
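The digit arithmetic in that loop may be easier to follow in a Python model. This is a sketch, and the function name is hypothetical. "value" stands for the absolute value being displayed. The adjustment by 'A'-'9'-1 skips the seven ASCII characters between '9' and 'A'.

```python
def hex_digits(value: int, bits: int = 16) -> str:
    """Model of the "repeat bits/4" loop: most significant nibble first."""
    out = []
    for count in range(1, bits // 4 + 1):              # % = 1 .. bits/4
        d = ord("0") + ((value >> (bits - count * 4)) & 0x0F)
        if d > ord("9"):
            d = d + ord("A") - ord("9") - 1            # jump from ':' up to 'A'
        out.append(chr(d))
    return "".join(out)


print("Current offset is 0x" + hex_digits(0x7C1F))    # prints Current offset is 0x7C1F
```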
  2556.  
  2557.  
  2558. 2.2.6  Multiple passes
  2559.  
  2560. Because the assembler allows some labels or constants to be referenced
  2561. before they are actually defined, it has to predict the values of such labels,
  2562. and if there is even a suspicion that a prediction failed in at least one case,
  2563. it does one more pass, assembling the whole source again, this time making
  2564. better predictions based on the values the labels got in the previous pass.
  2565.   The changing values of labels can cause some instructions to have encodings
  2566. of different lengths, and this in turn can change the values of labels again.
  2567. And since labels and constants can also be used inside the expressions that
  2568. affect the behavior of control directives, a whole block of source can be
  2569. processed completely differently during the new pass. Thus the assembler does
  2570. more and more passes, each time trying to make better predictions to approach
  2571. the final solution, in which all the values are predicted correctly. It uses
  2572. various methods for predicting the values, which have been chosen so that,
  2573. for most programs, a solution of the smallest possible length is found in
  2574. only a few passes.
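The toy model below shows why repeated passes are needed when instruction sizes depend on label values. It is a sketch and does not reproduce fasm's actual prediction methods. It makes three assumptions: a jump takes 2 bytes when its displacement fits in a signed byte and 5 bytes otherwise; an unknown label is predicted as 0 in the first pass; and label values from the previous pass are reused until they stop changing. All names are hypothetical.

```python
def one_pass(program, predicted):
    """Assemble once, using label values predicted from the previous pass."""
    address = 0
    labels = {}
    for kind, arg in program:
        if kind == "label":
            labels[arg] = address
        elif kind == "data":
            address += arg
        elif kind == "jmp":
            target = predicted.get(arg, 0)          # assumption: unknown label -> 0
            displacement = target - (address + 2)   # measured from end of short form
            address += 2 if -128 <= displacement <= 127 else 5
        else:
            raise ValueError(f"unknown item {kind!r}")
    return labels, address


def assemble(program, max_passes=100):
    """Repeat passes until every label keeps the value it had in the previous one."""
    predicted = {}
    for passes in range(1, max_passes + 1):
        labels, size = one_pass(program, predicted)
        if labels == predicted:
            return labels, size, passes
        predicted = labels
    raise RuntimeError("code cannot be generated")


# A forward jump over 200 bytes: first guessed short, then found to need 5 bytes.
print(assemble([("jmp", "end"), ("data", 200), ("label", "end")]))  # prints ({'end': 205}, 205, 3)
```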
  2575.   Some errors, like values not fitting in the required ranges, are not
  2576. signaled during those intermediate passes, since it may happen that when
  2577. some of the values are predicted better, these errors disappear. However,
  2578. if the assembler meets an illegal syntax construction or an unknown
  2579. instruction, it always stops immediately. Defining a label more than once
  2580. also causes such an error, because it makes the predictions groundless.
  2581.   Only the messages created with the "display" directive during the last
  2582. performed pass are actually displayed. When the assembly has been stopped
  2583. due to an error, these messages may reflect predicted values that have not
  2584. yet been resolved correctly.
  2585.   Sometimes a solution does not exist, and in such cases the assembler will
  2586. never manage to make correct predictions. For this reason there is a limit on
  2587. the number of passes, and when the assembler reaches this limit, it stops and
  2588. displays a message that it is not able to generate correct output.
  2589. Consider the following example:
  2590.  
  2591.     if ~ defined alpha
  2592.         alpha:
  2593.     end if
  2594.  
  2595. The "defined" operator gives the true value when the expression following it
  2596. could be calculated in this place, which in this case means that the "alpha"
  2597. label is defined somewhere. But the above block causes this label to be defined
  2598. only when the value given by the "defined" operator is false, which leads to
  2599. an antinomy and makes it impossible to resolve such code. When processing the
  2600. "if" directive, the assembler has to predict whether the "alpha" label will be
  2601. defined somewhere (it would not have to predict only if the label was already
  2602. defined earlier in this pass), and whatever the prediction is, the opposite
  2603. always happens. Thus the assembly will fail, unless the "alpha" label is
  2604. defined somewhere in the source preceding the above block of instructions; in
  2605. such case, as was already noted, no prediction is needed and the block will
  2606. simply be skipped.
  2607.   The above sample might have been written as an attempt to define the label
  2608. only when it was not yet defined. It fails, because the "defined" operator
  2609. checks whether the label is defined anywhere, and this includes the definition
  2610. inside this conditionally processed block. However, adding an extra
  2611. condition may make it possible to get it resolved:
  2612.  
  2613.     if ~ defined alpha | defined @f
  2614.         alpha:
  2615.         @@:
  2616.     end if
  2617.  
  2618. The "@f" is always the same label as the nearest "@@" symbol in the source
  2619. following it, so the above sample would mean the same if any unique name were
  2620. used instead of the anonymous label. When "alpha" is not defined anywhere else
  2621. in the source, the only possible solution is the one in which this block is
  2622. processed, and this time it does not lead to an antinomy, because of the
  2623. anonymous label, which makes this block self-establishing. To better understand
  2624. this, look at a block that contains nothing but the self-establishing part:
  2625.  
  2626.     if defined @f
  2627.         @@:
  2628.     end if
  2629.  
  2630. This is an example of source that may have more than one solution, as both
  2631. the case in which this block is processed and the case in which it is not are
  2632. equally correct. Which of the two solutions we get depends on the algorithm of
  2633. the assembler, in the case of flat assembler on its prediction algorithm. Back
  2634. to the previous sample: when "alpha" is not defined anywhere else, the
  2635. condition for the "if" block cannot be false, so we are left with only one
  2636. possible solution, and we can hope the assembler will arrive at it. On the
  2637. other hand, when "alpha" is defined in some other place, we again have two
  2638. possible solutions, but one of them causes "alpha" to be defined twice, and
  2639. such an error makes the assembler abort immediately, as this is the kind of
  2640. error that deeply disturbs the process of resolving. So such source can either
  2641. be resolved correctly or cause an error, and which one we get may depend on
  2642. the internal choices made by the assembler.
  2643.   However, some facts about such choices are certain. When the assembler has
  2644. to check whether a given symbol is defined and it was already defined in the
  2645. current pass, no prediction is needed, as was already noted above. And when
  2646. the given symbol has never been defined before, in any of the already finished
  2647. passes, the assembler predicts it to be not defined.
  2648. Knowing this, we can expect that the simple self-establishing block shown
  2649. above will not be assembled at all and that the previous sample will resolve
  2650. correctly when "alpha" is defined somewhere before our conditional block,
  2651. while it will itself define "alpha" when it's not already defined earlier, thus
  2652. potentially causing an error because of double definition if "alpha" is
  2653. also defined somewhere later.
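The three samples can be replayed with a toy fixed-point model. It rests on simplifying assumptions and does not reproduce fasm's algorithm. Each pass receives the set of symbols predicted to be defined. That set is empty in the first pass, matching the rule that a never-defined symbol is predicted as not defined. The block returns the set of symbols it actually defines, and resolution succeeds when the two sets agree. The "defined earlier in this pass" case is not modelled. The names "alpha" and "@@" stand for the labels in the samples, and the function names are hypothetical.

```python
def resolve(block, max_passes=10):
    """Return (defined_symbols, passes), or (None, max_passes) if never consistent."""
    predicted = set()                  # never-defined symbols: predicted undefined
    for passes in range(1, max_passes + 1):
        defined = block(predicted)
        if defined == predicted:
            return defined, passes
        predicted = defined
    return None, max_passes


def antinomy(pred):                    # if ~ defined alpha / alpha: / end if
    return set() if "alpha" in pred else {"alpha"}


def guarded(pred):                     # if ~ defined alpha | defined @f / alpha: / @@:
    if "alpha" not in pred or "@@" in pred:
        return {"alpha", "@@"}
    return set()


def self_establishing(pred):           # if defined @f / @@: / end if
    return {"@@"} if "@@" in pred else set()


print(resolve(antinomy))               # oscillates until the pass limit
print(resolve(guarded))                # settles with the block processed
print(resolve(self_establishing))      # settles with the block skipped
```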
  2654.   The "used" operator may be expected to behave in a similar manner in
  2655. analogous cases; however, other kinds of predictions may not be so simple, and
  2656. you should never rely on them this way.
  2657.  
  2658.  
  2659. 2.3  Preprocessor directives
  2660.  
  2661. All preprocessor directives are processed before the main assembly process,
  2662. and therefore are not affected by the control directives. At this stage all
  2663. comments are also stripped out.
  2664.  
  2665.  
  2666. 2.3.1  Including source files
  2667.  
  2668. "include" directive includes the specified source file at the position where
  2669. it is used. It should be followed by the quoted name of file that should be
  2670. included, for example:
  2671.  
  2672.     include 'macros.inc'
  2673.  
  2674. The whole included file is preprocessed before the lines that follow the
  2675. line containing the "include" directive. There is no limit on the
  2676. number of included files as long as they fit in memory.
  2677.   The quoted path can contain environment variables enclosed within "%"
  2678. characters; they will be replaced with their values inside the path. Both the
  2679. "\" and "/" characters are allowed as path separators. If no absolute path
  2680. is given, the file is first searched for in the directory containing the file
  2681. which included it and, when it's not found there, in the directory containing
  2682. the main source file (the one specified in the command line). These rules also
  2683. apply to paths given with the "file" directive.
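The Python sketch below follows the lookup order just described and might be useful for reproducing it in a build tool. It covers only the rules stated here, and the function names are hypothetical. It treats an undefined environment variable as an error; that is an assumption of this sketch, not documented fasm behavior.

```python
import os
import re
import tempfile


def expand_env(path: str) -> str:
    """Replace every %NAME% with the value of environment variable NAME."""
    # Assumption: an undefined variable raises KeyError in this sketch.
    return re.sub(r"%([^%]+)%", lambda m: os.environ[m.group(1)], path)


def resolve_include(path: str, including_file: str, main_source: str) -> str:
    path = expand_env(path).replace("\\", "/")     # both separators accepted
    if os.path.isabs(path):
        return path
    # First the directory of the including file, then that of the main source.
    for base in (os.path.dirname(including_file), os.path.dirname(main_source)):
        candidate = os.path.join(base, path)
        if os.path.isfile(candidate):
            return candidate
    raise FileNotFoundError(path)


with tempfile.TemporaryDirectory() as root:
    lib = os.path.join(root, "lib")
    os.makedirs(lib)
    main = os.path.join(root, "main.asm")
    inc = os.path.join(lib, "io.inc")
    for name in (main, inc, os.path.join(root, "macros.inc")):
        open(name, "w").close()
    # lib/io.inc includes 'macros.inc': it is not in lib/, so the main source dir is used.
    found = resolve_include("macros.inc", inc, main)
    print(os.path.relpath(found, root))            # prints macros.inc
```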
  2684.  
  2685.  
  2686. 2.3.2  Symbolic constants
  2687.  
  2688. Symbolic constants are different from numerical constants: before the
  2689. assembly process they are replaced with their values everywhere in the source
  2690. lines after their definitions, and anything can become their value.
  2691.   The definition of a symbolic constant consists of the name of the constant
  2692. followed by the "equ" directive. Everything that follows this directive will
  2693. become the value of the constant. If the value of a symbolic constant contains
  2694. other symbolic constants, they are replaced with their values before this
  2695. value is assigned to the new constant. For example:
  2696.  
  2697.     d equ dword
  2698.     NULL equ d 0
  2699.     d equ edx
  2700.  
  2701. After these three definitions the value of the "NULL" constant is "dword 0"
  2702. and the value of "d" is "edx". So, for example, "push NULL" will be assembled
  2703. as "push dword 0" and "push d" will be assembled as "push edx". If the
  2704. following line were then added:
  2705.  
  2706.     d equ d,eax
  2707.  
  2708. the "d" constant would get the new value "edx,eax". This way, growing
  2709. lists of symbols can be defined.
  2710.   "restore" directive brings back the previous value of a redefined symbolic
  2711. constant. It should be followed by one or more names of symbolic constants,
  2712. separated with commas. So "restore d" after the above definitions will give
  2713. the "d" constant back the value "edx", a second one will restore it to the
  2714. value "dword", and one more will revert "d" to its original meaning, as if no
  2715. such constant had been defined. If no constant of the given name is defined,
  2716. "restore" won't cause an error; it will simply be ignored.
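A few lines of Python can model how "equ" and "restore" stack values. This is a simplified sketch, not the preprocessor. It assumes names are plain word tokens and ignores everything else the preprocessor does. The class name is hypothetical.

```python
import re


class SymbolicConstants:
    """Toy model of equ/restore: one value stack per constant name."""

    def __init__(self):
        self.stacks = {}

    def value(self, name):
        stack = self.stacks.get(name)
        return stack[-1] if stack else None

    def expand(self, text):
        # Replace each name that currently has a symbolic value (even an empty one).
        def sub(m):
            v = self.value(m.group(0))
            return m.group(0) if v is None else v
        return re.sub(r"\w+", sub, text)

    def equ(self, name, text):        # the value is expanded before it is stored
        self.stacks.setdefault(name, []).append(self.expand(text))

    def restore(self, name):          # unknown names are silently ignored
        stack = self.stacks.get(name)
        if stack:
            stack.pop()


s = SymbolicConstants()
s.equ("d", "dword")
s.equ("NULL", "d 0")
s.equ("d", "edx")
print(s.expand("push NULL"), "|", s.expand("push d"))   # prints push dword 0 | push edx
s.equ("d", "d,eax")
print(s.value("d"))                                      # prints edx,eax
```

Each "restore" pops one level, so three of them take "d" back through "edx" and "dword" to undefined.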
  2717.   Symbolic constants can be used to adjust the syntax of the assembler to
  2718. personal preferences. For example, the following set of definitions provides
  2719. handy shortcuts for all the size operators:
  2720.  
  2721.     b equ byte
  2722.     w equ word
  2723.     d equ dword
  2724.     p equ pword
  2725.     f equ fword
  2726.     q equ qword
  2727.     t equ tword
  2728.     x equ dqword
  2729.  
  2730.   Because a symbolic constant may also have an empty value, it can be used to
  2731. allow the syntax with the "offset" word before any address value:
  2732.  
  2733.     offset equ
  2734.  
  2735. After this definition "mov ax,offset char" will be a valid construction for
  2736. copying the offset of "char" variable into "ax" register, because "offset" is
  2737. replaced with an empty value, and therefore ignored.
  2738.   The "define" directive, followed by the name of a constant and then its
  2739. value, is an alternative way of defining a symbolic constant. The only
  2740. difference between "define" and "equ" is that "define" assigns the value as
  2741. it is; it does not replace the symbolic constants with their values inside it.
  2742.   Symbolic constants can also be defined with the "fix" directive, which has
  2743. the same syntax as "equ", but defines constants of high priority: they are
  2744. replaced with their symbolic values even before the preprocessor directives
  2745. and macroinstructions are processed. The only exception is the "fix" directive
  2746. itself, which has the highest possible priority, so it allows redefinition of
  2747. constants defined this way.
  2748.   The "fix" directive can be used for syntax adjustments related to the
  2749. preprocessor directives, which cannot be done with "equ". For example:
  2750.  
  2751.     incl fix include
  2752.  
  2753. defines a short name for the "include" directive, while a similar definition
  2754. made with the "equ" directive wouldn't give such a result, as standard symbolic
  2755. constants are replaced with their values after the line has been searched for
  2756. preprocessor directives.
  2757.  
  2758.  
  2759. 2.3.3  Macroinstructions
  2760.  
  2761. "macro" directive allows you to define your own complex instructions, called
  2762. macroinstructions, whose use can greatly simplify the process of
  2763. programming. In its simplest form it's similar to a symbolic constant
  2764. definition. For example, the following definition defines a shortcut for the
  2765. "test al,0xFF" instruction:
  2766.  
  2767.     macro tst {test al,0xFF}
  2768.  
  2769. After the "macro" directive there is the name of the macroinstruction and
  2770. then its contents enclosed between the "{" and "}" characters. You can use
  2771. the "tst" instruction anywhere after this definition and it will be assembled
  2772. as "test al,0xFF". Defining a symbolic constant "tst" with that value would
  2773. give a similar result, but the difference is that the name of a
  2774. macroinstruction is recognized only as an instruction mnemonic. Also,
  2775. macroinstructions are replaced with the corresponding code even before
  2776. symbolic constants are replaced with their values. So if you define a
  2777. macroinstruction and a symbolic constant of the same name, and use this name
  2778. as an instruction mnemonic, it will be replaced with the contents of the
  2779. macroinstruction, but it will be replaced with the value of the symbolic
  2780. constant if used somewhere inside the operands.
  2781.   The definition of a macroinstruction can consist of many lines, because
  2782. "{" and "}" characters don't have to be in the same line as "macro" directive.
  2783. For example:
  2784.  
  2785.     macro stos0
  2786.      {
  2787.         xor al,al
  2788.         stosb
  2789.      }
  2790.  
  2791. The macroinstruction "stos0" will be replaced with these two assembly
  2792. instructions anywhere it's used.
  2793.   Like instructions which need some number of operands, a macroinstruction
  2794. can be defined to need some number of arguments separated with commas. The
  2795. names of the needed arguments should follow the name of the macroinstruction
  2796. in the line of the "macro" directive and be separated with commas if there is
  2797. more than one. Anywhere one of these names occurs in the contents of the
  2798. macroinstruction, it will be replaced with the corresponding value, provided
  2799. when the macroinstruction is used. Here is an example of a macroinstruction
  2800. that will do data alignment for the binary output format:
  2801.  
  2802.     macro align value { rb (value-1)-($+value-1) mod value }
  2803.  
  2804. When the "align 4" instruction is found after this macroinstruction is
  2805. defined, it will be replaced with the contents of this macroinstruction, and
  2806. "value" there will become 4, so the result will be "rb (4-1)-($+4-1) mod 4".
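The Python check below shows why the expression in this macro yields exactly the padding needed. It is a model, not fasm code, and it assumes non-negative addresses. The function name is hypothetical. The formula does not actually require "value" to be a power of two.

```python
def macro_align_pad(address: int, value: int) -> int:
    """Bytes reserved by: rb (value-1)-($+value-1) mod value."""
    # "mod" has a higher priority than "-" in fasm expressions, hence the grouping.
    return (value - 1) - (address + value - 1) % value


# The padding always lands on the next multiple of "value" (or 0 if already there).
ok = all(
    (address + macro_align_pad(address, value)) % value == 0
    and 0 <= macro_align_pad(address, value) < value
    for value in (1, 2, 3, 4, 8, 16)
    for address in range(256)
)
print(ok, macro_align_pad(0x1003, 4))   # prints True 1
```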
  2807.   If a macroinstruction is defined that uses an instruction with the same name
  2808. inside its definition, the previous meaning of this name is used. Useful
  2809. redefinitions of macroinstructions can be done this way, for example:
  2810.  
  2811.     macro mov op1,op2
  2812.      {
  2813.       if op1 in <ds,es,fs,gs,ss> & op2 in <cs,ds,es,fs,gs,ss>
  2814.         push  op2
  2815.         pop   op1
  2816.       else
  2817.         mov   op1,op2
  2818.       end if
  2819.      }
  2820.  
  2821. This macroinstruction extends the syntax of the "mov" instruction, allowing
  2822. both operands to be segment registers. For example, "mov ds,es" will be
  2823. assembled as "push es" and "pop ds". In all other cases the standard "mov"
  2824. instruction will be used. The syntax of this "mov" can be extended further by
  2825. defining another macroinstruction of that name, which uses the previous one:
  2826.  
  2827.     macro mov op1,op2,op3
  2828.      {
  2829.       if op3 eq
  2830.         mov   op1,op2
  2831.       else
  2832.         mov   op1,op2
  2833.         mov   op2,op3
  2834.       end if
  2835.      }
  2836.  
  2837. It allows the "mov" instruction to have three operands, but it can still have
  2838. only two operands, because when a macroinstruction is given fewer arguments
  2839. than it needs, the remaining arguments will have empty values. When three
  2840. operands are given, this macroinstruction will become two macroinstructions of
  2841. the previous definition, so "mov es,ds,dx" will be assembled as "push ds",
  2842. "pop es" and "mov ds,dx".
  2843.   By placing "*" after the name of an argument you can mark the argument as
  2844. required; the preprocessor won't allow it to have an empty value. For example,
  2845. the above macroinstruction could be declared as "macro mov op1*,op2*,op3" to
  2846. make sure that the first two arguments are always given some non-empty
  2847. values.
  2848.   When a macroinstruction needs to be given an argument that contains
  2849. commas, such an argument should be enclosed between "<" and ">" characters.
  2850. If it contains more than one "<" character, the same number of ">" should be
  2851. used to mark where the value of the argument ends.
  2852.   "purge" directive removes the last definition of the specified
  2853. macroinstruction. It should be followed by one or more names of
  2854. macroinstructions, separated with commas. If such a macroinstruction has not
  2855. been defined, you won't get any error. For example, after extending the syntax
  2856. of "mov" with the macroinstructions defined above, you can disable the
  2857. three-operand syntax again with the "purge mov" directive. The next
  2858. "purge mov" will also disable the syntax for two segment-register operands,
  2859. and all further such directives will do nothing.
  2860.   If after the "macro" directive you enclose some group of argument names in
  2861. square brackets, it will allow giving more values for this group of arguments
  2862. when using that macroinstruction. Any further argument given after the last
  2863. argument of such a group will begin a new group and will become its first
  2864. argument. That's why no more argument names can follow the closing square
  2865. bracket. The contents of the macroinstruction will be processed for each
  2866. such group of arguments separately. The simplest example is to enclose one
  2867. argument name in square brackets:
  2868.  
  2869.     macro stoschar [char]
  2870.      {
  2871.         mov al,char
  2872.         stosb
  2873.      }
  2874.  
  2875. This macroinstruction accepts an unlimited number of arguments, and each one
  2876. will be processed into these two instructions separately. For example,
  2877. "stoschar 1,2,3" will be assembled as the following instructions:
  2878.  
  2879.     mov al,1
  2880.     stosb
  2881.     mov al,2
  2882.     stosb
  2883.     mov al,3
  2884.     stosb
  2885.  
  2886.   There are some special directives available only inside the definitions of
  2887. macroinstructions. "local" directive defines local names, which will be
  2888. replaced with unique values each time the macroinstruction is used. It should
  2889. be followed by names separated with commas. If a name given as a parameter to
  2890. the "local" directive begins with a dot or two dots, the unique labels
  2891. generated by each evaluation of the macroinstruction will have the same
  2892. properties. This directive is usually needed for the constants or labels that
  2893. a macroinstruction defines and uses internally. For example:
  2894.  
  2895.     macro movstr
  2896.      {
  2897.         local move
  2898.       move:
  2899.         lodsb
  2900.         stosb
  2901.         test al,al
  2902.         jnz move
  2903.      }
  2904.  
  2905. Each time this macroinstruction is used, "move" will become a different unique
  2906. name in its instructions, so you won't get the error you normally get when a
  2907. label is defined more than once.
  2908.   "forward", "reverse" and "common" directives divide a macroinstruction into
  2909. blocks, each one processed after the processing of the previous one is
  2910. finished. They differ in behavior only if the macroinstruction allows multiple
  2911. groups of arguments. The block of instructions that follows the "forward"
  2912. directive is processed for each group of arguments, from first to last,
  2913. exactly like the default block (not preceded by any of these directives). The
  2914. block that follows "reverse" is processed for each group of arguments in
  2915. reverse order, from last to first. The block that follows "common" is
  2916. processed only once, commonly for all groups of arguments. A local name
  2917. defined in one of the blocks is available in all the following blocks when
  2918. processing the same group of arguments as when it was defined, and when it is
  2919. defined in a common block it is available in all the following blocks,
  2920. regardless of which group of arguments is processed.
  2921.   Here is an example of a macroinstruction that will create a table of
  2922. addresses of strings, followed by these strings:
  2923.  
  2924.     macro strtbl name,[string]
  2925.      {
  2926.       common
  2927.         label name dword
  2928.       forward
  2929.         local label
  2930.         dd label
  2931.       forward
  2932.         label db string,0
  2933.      }
  2934.  
  2935. The first argument given to this macroinstruction becomes the label for the
  2936. table of addresses, and the next arguments should be the strings. The first
  2937. block is processed only once and defines the label; the second block declares
  2938. a local name for each string and defines the table entry holding its address.
  2939. The third block defines the data of each string with the corresponding label.
  2940.   The directive starting the block in macroinstruction can be followed by the
  2941. first instruction of this block in the same line, like in the following
  2942. example:
  2943.  
  2944.     macro stdcall proc,[arg]
  2945.      {
  2946.       reverse push arg
  2947.       common call proc
  2948.      }
  2949.  
  2950. This macroinstruction can be used for calling procedures that use the STDCALL
  2951. convention, in which arguments are pushed on the stack in reverse order. For
  2952. example, "stdcall foo,1,2,3" will be assembled as:
  2953.  
  2954.     push 3
  2955.     push 2
  2956.     push 1
  2957.     call foo
  2958.  
  2959.   If some name inside a macroinstruction has multiple values (it is either one
  2960. of the arguments enclosed in square brackets or a local name defined in the
  2961. block following the "forward" or "reverse" directive) and is used in a block
  2962. following the "common" directive, it will be replaced with all of its values,
  2963. separated with commas. For example, the following macroinstruction will pass
  2964. all of the additional arguments to the previously defined "stdcall"
  2965. macroinstruction:
  2966.  
  2967.     macro invoke proc,[arg]
  2968.      { common stdcall [proc],arg }
  2969.  
  2970. It can be used to call a procedure that uses the STDCALL convention indirectly
  2971. (through a pointer stored in memory).
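You can mimic the way "forward", "reverse" and "common" blocks expand in Python to check what a macro will produce. The sketch below is a simplified model, not the preprocessor, and the function name is hypothetical. Each block is a (kind, template) pair with "{name}" placeholders. Grouped arguments are passed as one dict per group. A grouped argument used in a "common" block expands to all its values joined with commas. "local" names are not modelled.

```python
def expand(blocks, fixed, groups):
    """blocks: list of (kind, template); kind is "forward", "reverse" or "common".
    fixed: non-grouped arguments; groups: one dict per group of bracketed arguments."""
    lines = []
    for kind, template in blocks:
        if kind == "common":
            values = dict(fixed)
            for name in (groups[0] if groups else {}):
                values[name] = ",".join(g[name] for g in groups)
            lines.extend(template.format(**values).split("\n"))
        elif kind in ("forward", "reverse"):
            order = groups if kind == "forward" else list(reversed(groups))
            for g in order:
                lines.extend(template.format(**fixed, **g).split("\n"))
        else:
            raise ValueError(f"unknown block kind {kind!r}")
    return lines


args = [{"arg": "1"}, {"arg": "2"}, {"arg": "3"}]
stdcall = [("reverse", "push {arg}"), ("common", "call {proc}")]
print(expand(stdcall, {"proc": "foo"}, args))
invoke = [("common", "stdcall [{proc}],{arg}")]
print(expand(invoke, {"proc": "foo"}, args))
```

A multi-line template stands for a whole block. For example, ("forward", "mov al,{char}\nstosb") reproduces the "stoschar" expansion, with both lines repeated for each group.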
  2972.   The special operator "#" can also be used inside a macroinstruction. This
  2973. operator causes two names to be concatenated into one name. It can be useful,
  2974. because it's done after the arguments and local names are replaced with their
  2975. values. The following macroinstruction will generate a conditional jump
  2976. according to the "cond" argument:
  2977.  
  2978.     macro jif op1,cond,op2,label
  2979.      {
  2980.         cmp op1,op2
  2981.         j#cond label
  2982.      }
  2983.  
  2984. For example, "jif ax,ae,10h,exit" will be assembled as "cmp ax,10h" and
  2985. "jae exit" instructions.
  2986.   The "#" operator can also be used to concatenate two quoted strings into one.
  2987. Conversion of a name into a quoted string is also possible, with the "`"
  2988. operator, which likewise can be used inside a macroinstruction. It converts the
  2989. name that follows it into a quoted string; but note that when it is followed by
  2990. a macro argument which is replaced with a value containing more than one
  2991. symbol, only the first of them will be converted, as the "`" operator converts
  2992. only the one symbol that immediately follows it. Here's an example of using
  2993. those two features:
  2994.  
  2995.     macro label name
  2996.      {
  2997.         label name
  2998.         if ~ used name
  2999.           display `name # " is defined but not used.",13,10
  3000.         end if
  3001.      }
  3002.  
  3003. When a label defined with this macro is not used in the source, the macro will
  3004. warn you with a message saying which label it applies to.
  3005.   To make a macroinstruction behave differently when some of the arguments are
  3006. of some special type, for example quoted strings, you can use the "eqtype"
  3007. comparison operator. Here's an example of using it to distinguish a
  3008. quoted string from another argument:
  3009.  
  3010.     macro message arg
  3011.      {
  3012.       if arg eqtype ""
  3013.         local str
  3014.         jmp   @f
  3015.         str   db arg,0Dh,0Ah,24h
  3016.         @@:
  3017.         mov   dx,str
  3018.       else
  3019.         mov   dx,arg
  3020.       end if
  3021.         mov   ah,9
  3022.         int   21h
  3023.      }
  3024.  
  3025. The above macro is designed for displaying messages in DOS programs. When the
  3026. argument of this macro is some number, label, or variable, the string from
  3027. that address is displayed, but when the argument is a quoted string, the
  3028. created code will display that string followed by the carriage return and
  3029. line feed.
  3030.   It is also possible to put a declaration of a macroinstruction inside another
  3031. macroinstruction, so one macro can define another, but there is a problem
  3032. with such definitions caused by the fact that the "}" character cannot occur
  3033. inside a macroinstruction, as it always means the end of the definition. To
  3034. overcome this problem, symbols inside a macroinstruction can be escaped.
  3035. This is done by placing one or more backslashes in front of any other
  3036. symbol (even a special character). The preprocessor sees such a sequence as a
  3037. single symbol, but each time it meets such a symbol while processing the
  3038. macroinstruction, it cuts one backslash character from its front. For example,
  3039. "\{" is treated as a single symbol, but during processing of the
  3040. macroinstruction it becomes the "{" symbol. This allows one definition of a
  3041. macroinstruction to be put inside another:
  3042.  
  3043.     macro ext instr
  3044.      {
  3045.       macro instr op1,op2,op3
  3046.        \{
  3047.         if op3 eq
  3048.           instr op1,op2
  3049.         else
  3050.           instr op1,op2
  3051.           instr op2,op3
  3052.         end if
  3053.        \}
  3054.      }
  3055.  
  3056.     ext add
  3057.     ext sub
  3058.  
  3059. The macro "ext" is defined correctly, but when it is used, the "\{" and "\}"
  3060. become the "{" and "}" symbols. So when "ext add" is processed, the
  3061. contents of the macro become a valid definition of a macroinstruction, and this
  3062. way the "add" macro becomes defined. In the same way "ext sub" defines the
  3063. "sub" macro. The use of the "\{" symbol wasn't really necessary here, but is
  3064. done this way to make the definition clearer.
  3065.   If some directives specific to macroinstructions, like "local" or "common",
  3066. are needed inside a macro embedded this way, they can be escaped in the same
  3067. way. Escaping a symbol with more than one backslash is also allowed, which
  3068. allows multiple levels of nesting of macroinstruction definitions.
  3069.   Another technique for defining one macroinstruction by another is to
  3070. use the "fix" directive, which becomes useful when some macroinstruction only
  3071. begins the definition of another one, without closing it. For example:
  3072.  
  3073.     macro tmacro [params]
  3074.      {
  3075.       common macro params {
  3076.      }
  3077.  
  3078.     MACRO fix tmacro
  3079.     ENDM fix }
  3080.  
  3081. defines an alternative syntax for defining macroinstructions, which looks like:
  3082.  
  3083.     MACRO stoschar char
  3084.         mov al,char
  3085.         stosb
  3086.     ENDM
  3087.  
  3088. Note that the symbol with such a customized definition must be defined with
  3089. the "fix" directive, because only the prioritized symbolic constants are
  3090. processed before the preprocessor looks for the "}" character while defining
  3091. the macro. This might be a problem if one needed to perform some additional
  3092. tasks at the end of such a definition, but there is one more feature which
  3093. helps in such cases: it is possible to put any directive, instruction or
  3094. macroinstruction just after the "}" character that ends the macroinstruction,
  3095. and it will be processed in the same way as if it were put in the next line.
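For instance, assume the "tmacro" macroinstruction shown above and a hypothetical numeric counter "macro_count". The closing "}" produced by "ENDM" can then carry an extra line, which is processed right after each definition ends:

    macro_count = 0

    MACRO fix tmacro
    ENDM fix } macro_count = macro_count + 1

    MACRO stoschar char
        mov al,char
        stosb
    ENDM                    ; becomes "}" followed by the counter update

After this fragment "macro_count" equals 1. Every further macroinstruction defined with "MACRO" and "ENDM" increases it by one.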
  3096.  
  3097.  
  3098. 2.3.4  Structures
  3099.  
  3100. The "struc" directive is a special variant of the "macro" directive that is
  3101. used to define data structures. A macroinstruction defined using the "struc"
  3102. directive must be preceded by a label (like a data definition directive) when
  3103. it is used. This label is also prepended to every name starting with a dot in
  3104. the contents of the macroinstruction. A macroinstruction defined using the
  3105. "struc" directive can have the same name as another macroinstruction defined
  3106. using the "macro" directive: the structure macroinstruction won't prevent the
  3107. standard macroinstruction from being processed when there is no label before
  3108. it, and vice versa. All the rules and features concerning standard
  3109. macroinstructions apply to structure macroinstructions as well.
  3110.   Here is a sample structure macroinstruction:
  3111.  
  3112.     struc point x,y
  3113.      {
  3114.         .x dw x
  3115.         .y dw y
  3116.      }
  3117.  
  3118. For example "my point 7,11" will define structure labeled "my", consisting of
  3119. two variables: "my.x" with value 7 and "my.y" with value 11.
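Once such a structure is defined, its fields are accessed like any other labels. A short fragment (the register choice is arbitrary):

        mov ax,[my.x]       ; loads the word 7
        mov bx,[my.y]       ; loads the word 11
        ret

    my point 7,11           ; my.x dw 7 / my.y dw 11

The label "my" itself is defined too, pointing at the start of the structure.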
  3120.   If somewhere inside the definition of a structure a name consisting of a
  3121. single dot is found, it is replaced by the name of the label for the given
  3122. instance of the structure, and in such case this label is not defined
  3123. automatically, which allows the definition to be completely customized. The
  3124. following example uses this feature to extend the data definition directive
  3125. "db" with the ability to calculate the size of the defined data:
  3126.  
  3127.     struc db [data]
  3128.      {
  3129.        common
  3130.         . db data
  3131.         .size = $ - .
  3132.      }
  3133.  
  3134. With such a definition, "msg db 'Hello!',13,10" will also define the
  3135. "msg.size" constant, equal to the size of the defined data in bytes.
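Here is a minimal sketch of "msg.size" in use. It assumes a DOS .COM program and the "struc db" definition above. DOS function 40h writes CX bytes from DS:DX to the handle given in BX:

    org 100h

        jmp start

    msg db 'Hello!',13,10   ; also defines msg.size = 8

    start:
        mov ah,40h          ; DOS: write to file or device
        mov bx,1            ; handle 1 is the standard output
        mov cx,msg.size     ; byte count calculated by the structure
        mov dx,msg
        int 21h
        mov ax,4C00h        ; DOS: terminate with return code 0
        int 21h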
  3136.   Defining data structures addressed by registers or absolute values should be
  3137. done using the "virtual" directive with a structure macroinstruction
  3138. (see 2.2.4).
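For example, take the "point" structure defined earlier and a structure whose address is held in EBX. It can be described like this (a sketch; "pt" is an arbitrary name):

    virtual at ebx
      pt point ?,?
    end virtual

        mov ax,[pt.x]       ; same as mov ax,[ebx]
        mov dx,[pt.y]       ; same as mov dx,[ebx+2]

The "virtual" block itself generates no data. It only defines the labels relative to EBX.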
  3139.   The "restruc" directive removes the last definition of the structure, just
  3140. like "purge" does with macroinstructions and "restore" with symbolic constants.
  3141. It also has the same syntax: it should be followed by one or more names of
  3142. structure macroinstructions, separated with commas.
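A short sketch, assuming the "struc db" extension shown earlier in this section is in effect:

    msg1 db 'first'         ; goes through the structure, defines msg1.size = 5
    restruc db
    msg2 db 'second'        ; plain data definition, msg2.size is not defined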
  3143.  
  3144.  
  3145. 2.3.5  Repeating macroinstructions
  3146.  
  3147. The "rept" directive is a special kind of macroinstruction, which makes the
  3148. given number of duplicates of the block enclosed with braces. The basic syntax
  3149. is the "rept" directive followed by a number (it cannot be an expression,
  3150. since the preprocessor doesn't do calculations; if you need repetitions based
  3151. on values calculated by the assembler, use one of the code repeating
  3152. directives processed by the assembler, see 2.2.3), and then a block of source
  3153. enclosed between the "{" and "}" characters. The simplest example:
  3154.  
  3155.     rept 5 { in al,dx }
  3156.  
  3157. will make five duplicates of the "in al,dx" line. The block of instructions
  3158. is defined in the same way as for a standard macroinstruction, and any
  3159. special operators and directives which can be used only inside
  3160. macroinstructions are also allowed here. When the given count is zero, the
  3161. block is simply skipped, as if you defined a macroinstruction but never used
  3162. it. The number of repetitions can be followed by the name of a counter symbol,
  3163. which will get replaced symbolically with the number of the duplicate
  3164. currently being generated. So this:
  3165.  
  3166.     rept 3 counter
  3167.      {
  3168.         byte#counter db counter
  3169.      }
  3170.  
  3171. will generate lines:
  3172.  
  3173.     byte1 db 1
  3174.     byte2 db 2
  3175.     byte3 db 3
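As noted above, "rept" in this version needs a plain number. When the count is known only to the assembler, the "repeat" directive (see 2.2.3) can be used instead. Here is a sketch with an arbitrary constant:

    count = 5
    repeat count
        in al,dx            ; assembled five times
    end repeat

Unlike "rept", this block is repeated by the assembler, so "count" may be any numerical expression the assembler can evaluate.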
  3176.  
  3177. The repetition mechanism applied to "rept" blocks is the same as the one used
  3178. to process multiple groups of arguments for macroinstructions, so directives
  3179. like "forward", "common" and "reverse" can be used in their usual meaning.
  3180. Thus this macroinstruction:
  3181.  
  3182.     rept 7 num { reverse display `num }
  3183.  
  3184. will display the digits from 7 to 1 as text. The "local" directive behaves in
  3185. the same way as inside a macroinstruction with multiple groups of arguments, so:
  3186.  
  3187.     rept 21
  3188.      {
  3189.        local label
  3190.        label: loop label
  3191.      }
  3192.  
  3193. will generate a unique label for each duplicate.
  3194.   The counter symbol by default counts from 1, but you can declare a different
  3195. base value by placing the number preceded by a colon immediately after the
  3196. name of the counter. For example:
  3197.  
  3198.     rept 8 n:0 { pxor xmm#n,xmm#n }
  3199.  
  3200. will generate code which clears the contents of eight SSE registers, from
  3201. "xmm0" to "xmm7". You can define multiple counters separated with commas, and
  3202. each one can have a different base.
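For example, two counters with different bases can pair up registers (a sketch):

    rept 3 src:0,dst:3 { movaps xmm#dst,xmm#src }

which generates:

    movaps xmm3,xmm0
    movaps xmm4,xmm1
    movaps xmm5,xmm2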
  3203.   The "irp" directive iterates a single argument through the given list of
  3204. parameters. The syntax is "irp" followed by the argument name, then a comma
  3205. and then the list of parameters. The parameters are specified in the same
  3206. way as in the invocation of a standard macroinstruction, so they have to be
  3207. separated with commas and each one can be enclosed with the "<" and ">"
  3208. characters. Also the name of the argument may be followed by "*" to mark that
  3209. it cannot get an empty value. This block:
  3210.  
  3211.     irp value, 2,3,5
  3212.      { db value }
  3213.  
  3214. will generate lines:
  3215.  
  3216.     db 2
  3217.     db 3
  3218.     db 5
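Parameters are parsed as in a macroinstruction call, so a value that itself contains commas can be passed by enclosing it in "<" and ">". For example:

    irp pair, <1,2>,<3,4>
     { dw pair }

will generate lines:

    dw 1,2
    dw 3,4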
  3219.  
  3220. The "irps" directive iterates through the given list of symbols; it should
  3221. be followed by the argument name, then a comma and then any sequence of
  3222. symbols. Each symbol in this sequence, no matter whether it is a name
  3223. symbol, a symbol character or a quoted string, becomes an argument value for
  3224. one iteration. If there are no symbols following the comma, no iteration is
  3225. done at all. This example:
  3226.  
  3227.     irps reg, al bx ecx
  3228.      { xor reg,reg }
  3229.  
  3230. will generate lines:
  3231.  
  3232.     xor al,al
  3233.     xor bx,bx
  3234.     xor ecx,ecx
  3235.  
  3236. The blocks defined by the "irp" and "irps" directives are also processed in
  3237. the same way as any macroinstruction, so operators and directives specific
  3238. to macroinstructions may be freely used in this case too.
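For example, the following sketch saves and restores a set of registers. The second block uses the "reverse" directive so that the registers are popped in the opposite order:

    irps reg, eax ebx esi { push reg }
        ; code that uses the registers goes here
    irps reg, eax ebx esi { reverse pop reg }

The last line generates "pop esi", "pop ebx" and "pop eax", in this order.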
  3239.  
  3240.  
  3241. 2.3.6  Conditional preprocessing
  3242.  
  3243. The "match" directive causes a block of source to be preprocessed and passed
  3244. to the assembler only when the given sequence of symbols matches the specified
  3245. pattern. The pattern comes first, terminated by a comma, then the symbols that
  3246. have to be matched with the pattern, and finally the block of source, enclosed
  3247. within braces as in a macroinstruction.
  3248.   There are a few rules for building the pattern. The first is that symbol
  3249. characters and quoted strings are matched exactly as they are. In this
  3250. example:
  3251.  
  3252.     match +,+ { include 'first.inc' }
  3253.     match +,- { include 'second.inc' }
  3254.  
  3255. the first file will be included, since the "+" after the comma matches the "+"
  3256. in the pattern, while the second file won't be included, since there is no match.
  3257.   To match any other symbol literally, it has to be preceded by the "="
  3258. character in the pattern. Also, to match the "=" character itself or the comma,
  3259. the "==" and "=," constructions have to be used. For example, the "=a=="
  3260. pattern will match the "a=" sequence.
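Two more patterns of this kind, as a sketch:

    match =size x, size 16 { dw x }    ; "=size" is literal, x matches "16"
    match x=,y, 1,2 { dw y,x }         ; "=," matches the comma: "dw 2,1"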
  3261.   If a name symbol is placed in the pattern, it matches any sequence
  3262. consisting of at least one symbol, and then this name is replaced with the
  3263. matched sequence everywhere inside the following block, analogously to the
  3264. parameters of a macroinstruction. For instance:
  3265.  
  3266.     match a-b, 0-7
  3267.      { dw a,b-a }
  3268.  
  3269. will generate the "dw 0,7-0" instruction. Each name is always matched with
  3270. as few symbols as possible, leaving the rest for the following ones, so in
  3271. this case:
  3272.  
  3273.     match a b, 1+2+3 { db a }
  3274.  
  3275. the "a" name will match the "1" symbol, leaving the "+2+3" sequence to be
  3276. matched with "b". But in this case:
  3277.  
  3278.     match a b, 1 { db a }
  3279.  
  3280. there will be nothing left for "b" to match, so the block won't get processed
  3281. at all.
  3282.   The block of source defined by "match" is processed in the same way as any
  3283. macroinstruction, so any operators specific to macroinstructions can be used
  3284. in this case too.
  3285.   What makes the "match" directive more useful is the fact that it replaces the
  3286. symbolic constants with their values in the matched sequence of symbols (that
  3287. is everywhere after comma up to the beginning of the source block) before
  3288. performing the match. Thanks to this it can be used for example to process
  3289. some block of source under the condition that some symbolic constant has the
  3290. given value, like:
  3291.  
  3292.     match =TRUE, DEBUG { include 'debug.inc' }
  3293.  
  3294. which will include the file only when the symbolic constant "DEBUG" was
  3295. defined with the value "TRUE".
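A related idiom relies on the fact that a symbol which is not a symbolic constant stays unchanged. Here is a sketch that gives "DEBUG" a default value when it was not defined earlier in the source:

    match =DEBUG, DEBUG { DEBUG equ FALSE }  ; DEBUG was not defined yet
    match =TRUE, DEBUG { include 'debug.inc' }

If "DEBUG equ TRUE" appears before these lines, the first match fails, because the symbol is replaced with "TRUE" before matching, and the file is included. Otherwise "DEBUG" becomes "FALSE" and nothing is included.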
  3296.  
  3297.  
  3298. 2.3.7  Order of processing
  3299.  
  3300. When combining various features of the preprocessor, it's important to know
  3301. the order in which they are processed. As already noted, the "fix" directive
  3302. and the replacements defined with it have the highest priority. They are
  3303. processed completely before any other preprocessing is done, so this
  3304. piece of source:
  3305.  
  3306.     V fix {
  3307.       macro empty
  3308.        V
  3309.     V fix }
  3310.        V
  3311.  
  3312. becomes a valid definition of an empty macroinstruction. This can be seen as
  3313. the "fix" directive and prioritized symbolic constants being processed in a
  3314. separate stage, with all other preprocessing done afterwards on the resulting
  3315. source.
  3316.   The standard preprocessing that follows begins, on each line, with the
  3317. recognition of the first symbol. It starts by checking for the preprocessor
  3318. directives, and when none of them is detected, the preprocessor checks whether
  3319. the first symbol is a macroinstruction. If no macroinstruction is found, it
  3320. moves to the second symbol of the line, and again begins with checking for
  3321. directives, which in this case means only the "equ" directive, as this is the
  3322. only one that occurs as the second symbol in a line. If there's no directive,
  3323. the second symbol is checked for being a structure macroinstruction, and when
  3324. none of those checks gives a positive result, the symbolic constants are
  3325. replaced with their values and such a line is passed to the assembler.
  3326.   For example, assume that a macroinstruction called "foo" and a structure
  3327. macroinstruction called "bar" are defined. These lines:
  3328.  
  3329.     foo equ
  3330.     foo bar
  3331.  
  3332. would then both be interpreted as invocations of the macroinstruction "foo",
  3333. since the meaning of the first symbol overrides the meaning of the second one.
  3334.   The macroinstructions generate the new lines from their definition blocks,
  3335. replacing the parameters with their values and then processing the "#" and "`"
  3336. operators. The conversion operator has higher priority than concatenation.
  3337. After this is completed, the newly generated line goes through the standard
  3338. preprocessing, as described above.
  3339.   Though the symbolic constants are usually only replaced in the lines where
  3340. neither preprocessor directives nor macroinstructions have been found, there
  3341. are some special cases where those replacements are performed in parts of lines
  3342. containing directives. The first is the definition of a symbolic constant, where
  3343. the replacements are done everywhere after the "equ" keyword and the resulting
  3344. value is then assigned to the new constant (see 2.3.2). The second such case
  3345. is the "match" directive, where the replacements are done in the symbols
  3346. following the comma before matching them with the pattern. These features can
  3347. be used, for example, to maintain lists, like this set of definitions:
  3348.  
  3349.     list equ
  3350.  
  3351.     macro append item
  3352.      {
  3353.        match any, list \{ list equ list,item \}
  3354.        match , list \{ list equ item \}
  3355.      }
  3356.  
  3357. The "list" constant is here initialized with an empty value, and the "append"
  3358. macroinstruction can be used to add new items to this list, separating
  3359. them with commas. The first match in this macroinstruction occurs only when
  3360. the value of "list" is not empty (see 2.3.6); in such case the new value for
  3361. the list is the previous one with a comma and the new item appended at the end.
  3362. The second match happens only when the list is still empty, and in such case
  3363. the list is defined to contain just the new item. So starting with the empty
  3364. list, "append 1" would define "list equ 1", and "append 2" following it
  3365. would define "list equ 1,2". One might then need to use this list as the
  3366. parameters to some macroinstruction. But this cannot be done directly: if "foo"
  3367. is a macroinstruction, then "foo list" would just pass the "list" symbol
  3368. as a parameter to the macro, since symbolic constants are not unrolled at this
  3369. stage. For this purpose the "match" directive again comes in handy:
  3370.  
  3371.     match params, list { foo params }
  3372.  
  3373. The value of "list", if it's not empty, matches the "params" name, which is
  3374. then replaced with the matched value when generating the new lines defined by
  3375. the block enclosed with braces. So if "list" had the value "1,2", the above
  3376. line would generate the line containing "foo 1,2", which would then go
  3377. through the standard preprocessing.
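Putting the pieces together, here is a sketch with a hypothetical "foo" that simply emits its arguments as double words:

    macro foo [arg]
     {
       common dd arg
     }

    list equ

    macro append item
     {
       match any, list \{ list equ list,item \}
       match , list \{ list equ item \}
     }

    append 1
    append 2
    append 3
    match params, list { foo params }   ; becomes "foo 1,2,3", then "dd 1,2,3"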
  3378.   There is one more special case: when the preprocessor checks the second
  3379. symbol in the line and it happens to be the colon character (which the
  3380. assembler then interprets as the definition of a label), it stops at this
  3381. place and finishes the preprocessing of the first symbol (so if it's a
  3382. symbolic constant, it gets unrolled), and if it still appears to be a label,
  3383. it performs the standard preprocessing starting from the place after the
  3384. label. This allows preprocessor directives and macroinstructions to be placed
  3385. after labels, analogously to the instructions and directives processed
  3386. by the assembler, like:
  3387.  
  3388.     start: include 'start.inc'
  3389.  
  3390. However, if the label becomes broken during preprocessing (for example, when
  3391. it is a symbolic constant with an empty value), only the replacement of
  3392. symbolic constants is continued for the rest of the line.
  3393.   It should be remembered that the jobs performed by the preprocessor are
  3394. preliminary operations on the text symbols, done in a simple single pass
  3395. before the main process of assembly. The text that is the result of
  3396. preprocessing is passed to the assembler, which then does its multiple
  3397. passes on it. Thus the control directives, which are recognized and
  3398. processed only by the assembler (as they depend on numerical values that
  3399. may even vary between passes), are not recognized in any way by the
  3400. preprocessor and have no effect on the preprocessing. Consider this
  3401. example source:
  3402.  
  3403.     if 0
  3404.     a = 1
  3405.     b equ 2
  3406.     end if
  3407.     dd b
  3408.  
  3409. When it is preprocessed, the only directive that is recognized by the
  3410. preprocessor is "equ", which defines the symbolic constant "b", so later
  3411. in the source the "b" symbol is replaced with the value "2". Except for this
  3412. replacement, the other lines are passed unchanged to the assembler. So
  3413. after preprocessing the above source becomes:
  3414.  
  3415.     if 0
  3416.     a = 1
  3417.     end if
  3418.     dd 2
  3419.  
  3420. Now when the assembler processes it, the condition for the "if" is false, and
  3421. the "a" constant doesn't get defined. However, the symbolic constant "b" was
  3422. processed normally, even though its definition was put right next to the one
  3423. of "a". Because of this possible confusion you should be very careful
  3424. whenever mixing the features of the preprocessor and the assembler: always
  3425. try to imagine what your source will become after preprocessing, and
  3426. thus what the assembler will see and do its multiple passes on.
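When the definition of a symbolic constant really has to be conditional, the condition must be checked by the preprocessor itself, for example with "match". Here is a sketch; "ENABLE_B" is a hypothetical switch:

    ENABLE_B equ no                  ; change to "yes" to get "dd 2"

    match =yes, ENABLE_B { b equ 2 }
    match =no, ENABLE_B { b equ 0 }
    dd b                             ; with the switch as given: "dd 0"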
  3427.  
  3428.  
  3429. 2.4  Formatter directives
  3430.  
  3431. These directives are actually also a kind of control directives, with the
  3432. purpose of controlling the format of the generated code.
  3433.   The "format" directive followed by the format identifier selects the
  3434. output format. This directive should be put at the beginning of the source.
  3435. The default output format is a flat binary file; it can also be selected by
  3436. using the "format binary" directive.
  3437.   The "use16" and "use32" directives force the assembler to generate 16-bit
  3438. or 32-bit code, overriding the default setting for the selected output format.
  3439. "use64" enables generating code for the long mode of x86-64 processors.
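The effect of these settings shows in the generated encoding. For example (the byte values in the comments follow the standard x86 encoding of "mov eax,imm32"):

    use16
        mov eax,1           ; 66 B8 01 00 00 00 (operand-size prefix needed)
    use32
        mov eax,1           ; B8 01 00 00 00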
  3440.   The different output formats are described below, together with the
  3441. directives specific to them.
  3442.  
  3443.  
  3444. 2.4.1  MZ executable
  3445.  
  3446. To select the MZ output format, use the "format MZ" directive. The default code
  3447. setting for this format is 16-bit.
  3448.   The "segment" directive defines a new segment; it should be followed by a
  3449. label whose value will be the number of the defined segment, and optionally
  3450. the "use16" or "use32" word can follow to specify whether the code in this
  3451. segment should be 16-bit or 32-bit. The origin of the segment is aligned to a
  3452. paragraph (16 bytes). All the labels defined afterwards will have values
  3453. relative to the beginning of this segment.
  3454.   The "entry" directive sets the entry point for an MZ executable; it should
  3455. be followed by the far address (name of segment, colon and the offset inside
  3456. the segment) of the desired entry point.
  3457.   The "stack" directive sets up the stack for an MZ executable. It can be
  3458. followed by a numerical expression specifying the size of the stack to be
  3459. created automatically, or by the far address of the initial stack frame when
  3460. you want to set up the stack manually. When no stack is defined, a stack of
  3461. the default size of 4096 bytes will be created.
  3462.   The "heap" directive should be followed by a 16-bit value defining the
  3463. maximum size of the additional heap in paragraphs (this is heap in addition
  3464. to stack and undefined data). Use "heap 0" to always allocate only the memory
  3465. the program really needs. The default size of the heap is 65535.
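Here is a minimal sketch of a complete MZ program. It assumes DOS as the target; function 9 of interrupt 21h prints a string ended with "$". The segment names are arbitrary:

    format MZ
    entry main:start
    stack 100h                  ; 256-byte stack created automatically
    heap 0                      ; request only the memory the program needs

    segment main
    start:
        mov ax,text             ; segment number of the "text" segment
        mov ds,ax
        mov dx,hello
        mov ah,9                ; DOS: print "$"-terminated string
        int 21h
        mov ax,4C00h            ; DOS: terminate with return code 0
        int 21h

    segment text
    hello db 'Hello, world!',13,10,'$'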
  3466.  
  3467.  
  3468. 2.4.2  Portable Executable
  3469.  
  3470. To select the Portable Executable output format, use the "format PE" directive;
  3471. it can be followed by additional format settings: the "console", "GUI" or
  3472. "native" operator selects the target subsystem (a floating point value
  3473. specifying the subsystem version can follow), and "DLL" marks the output file
  3474. as a dynamic link library. Then the "at" operator can follow, with a numerical
  3475. expression specifying the base of the PE image, and then optionally the "on"
  3476. operator followed by a quoted string containing a file name, which selects a
  3477. custom MZ stub for the PE program (when the specified file is not an MZ
  3478. executable, it is treated as a flat binary executable file and converted into
  3479. MZ format). The default code setting for this format is 32-bit. An example of
  3480. a fully featured PE format declaration:
  3481.  
  3482.     format PE GUI 4.0 DLL at 7000000h on 'stub.exe'
  3483.  
  3484.   To create a PE file for the x86-64 architecture, use the "PE64" keyword
  3485. instead of "PE" in the format declaration; in such case long mode code is
  3486. generated by default.
  3487.   The "section" directive defines a new section; it should be followed by a
  3488. quoted string defining the name of the section, then one or more section flags
  3489. can follow. The available flags are: "code", "data", "readable", "writeable",
  3490. "executable", "shareable", "discardable", "notpageable". The origin of a section
  3491. is aligned to a page (4096 bytes). Example declaration of a PE section:
  3492.  
  3493.     section '.text' code readable executable
  3494.  
  3495. Along with the flags, one of the special PE data identifiers can also be
  3496. specified to mark the whole section as special data; the possible identifiers
  3497. are "export", "import", "resource" and "fixups". If the section is marked to
  3498. contain fixups, they are generated automatically and no more data needs to be
  3499. defined in this section. Resource data can also be generated automatically
  3500. from a resource file, which can be achieved by writing the "from" operator and
  3501. a quoted file name after the "resource" identifier. Below are examples of
  3502. sections containing some special PE data:
  3503.  
  3504.     section '.reloc' data discardable fixups
  3505.     section '.rsrc' data readable resource from 'my.res'
  3506.  
  3507.   The "entry" directive sets the entry point for a Portable Executable; the
  3508. value of the entry point should follow.
  3509.   The "stack" directive sets up the size of the stack for a Portable
  3510. Executable; the value of the stack reserve size should follow, optionally
  3511. followed by the value of the stack commit, separated with a comma. When the
  3512. stack is not defined, it is set by default to a size of 4096 bytes.
  3513.   The "heap" directive chooses the size of the heap for a Portable
  3514. Executable; the value of the heap reserve size should follow, optionally
  3515. followed by the value of the heap commit, separated with a comma. When no heap
  3516. is defined, it is set by default to a size of 65536 bytes; when the size of
  3517. the heap commit is unspecified, it is set to zero by default.
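For example (the sizes are arbitrary):

    stack 100000h,4000h     ; reserve 1 MB of stack, commit 16 KB
    heap 10000h             ; reserve 64 KB of heap, commit defaults to zero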
  3518.   The "data" directive begins the definition of special PE data; it should be
  3519. followed by one of the data identifiers ("export", "import", "resource" or
  3520. "fixups") or by the number of the data entry in the PE header. The data should
  3521. be defined in the next lines, ended with the "end data" directive. When the
  3522. fixups data definition is chosen, the fixups are generated automatically and
  3523. no more data needs to be defined there. The same applies to the resource data
  3524. when the "resource" identifier is followed by the "from" operator and a quoted
  3525. file name: in such case the data is taken from the given resource file.
  3526.   The "rva" operator can be used inside numerical expressions to obtain
  3527. the RVA of the item addressed by the value it is applied to.
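Here is a minimal sketch of a complete Windows GUI program in this format. The import data is written by hand in a section marked with the "import" identifier. It follows the usual import directory layout: one entry per DLL, then an empty entry, with "rva" used for every address the loader needs. The texts are arbitrary:

    format PE GUI 4.0
    entry start

    section '.text' code readable executable

      start:
            push    0               ; uType: MB_OK
            push    caption         ; lpCaption
            push    message         ; lpText
            push    0               ; hWnd: no owner window
            call    [MessageBoxA]
            push    0               ; exit code
            call    [ExitProcess]

    section '.data' data readable writeable

      caption db 'PE demo',0
      message db 'Hello from a PE file',0

    section '.idata' import data readable writeable

      dd 0,0,0,rva kernel_name,rva kernel_table
      dd 0,0,0,rva user_name,rva user_table
      dd 0,0,0,0,0                  ; empty entry ends the directory

      kernel_table:
        ExitProcess dd rva _ExitProcess
        dd 0
      user_table:
        MessageBoxA dd rva _MessageBoxA
        dd 0

      kernel_name db 'KERNEL32.DLL',0
      user_name db 'USER32.DLL',0

      _ExitProcess dw 0
        db 'ExitProcess',0
      _MessageBoxA dw 0
        db 'MessageBoxA',0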
  3528.  
  3529.  
  3530. 2.4.3  Common Object File Format
  3531.  
  3532. To select the Common Object File Format, use the "format COFF" or
  3533. "format MS COFF" directive, depending on whether you want to create a classic
  3534. or a Microsoft COFF file. The default code setting for this format is 32-bit.
  3535. To create a Microsoft COFF file for the x86-64 architecture, use the
  3536. "format MS64 COFF" setting; in such case long mode code is generated by default.
  3537.   The "section" directive defines a new section; it should be followed by a
  3538. quoted string defining the name of the section, then one or more section flags
  3539. can follow. Section flags available for both COFF variants are "code" and
  3540. "data", while "readable", "writeable", "executable", "shareable", "discardable",
  3541. "notpageable", "linkremove" and "linkinfo" are flags available only with the
  3542. Microsoft COFF variant.
  3543.   By default a section is aligned to a double word (four bytes); in the case
  3544. of the Microsoft COFF variant, another alignment can be specified by providing
  3545. the "align" operator followed by the alignment value (any power of two up to
  3546. 8192) among the section flags.
  3547.   The "extrn" directive defines an external symbol; it should be followed by
  3548. the name of the symbol and optionally the size operator specifying the size of
  3549. the data labeled by this symbol. The name of the symbol can also be preceded
  3550. by a quoted string containing the name of the external symbol and the "as"
  3551. operator. Some example declarations of external symbols:
  3552.  
  3553.     extrn exit
  3554.     extrn '__imp__MessageBoxA@16' as MessageBox:dword
  3555.  
  3556.   The "public" directive declares an existing symbol as public; it should be
  3557. followed by the name of the symbol, and optionally by the "as" operator and
  3558. a quoted string containing the name under which the symbol should be
  3559. available as public. Some examples of public symbol declarations:
  3560.  
  3561.     public main
  3562.     public start as '_start'
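Putting these directives together, here is a sketch of a Microsoft COFF object module for 32-bit Windows. It assumes the object will be linked with the import library for KERNEL32.DLL, which provides the "__imp__ExitProcess@4" symbol. The public name "_start" is arbitrary, and the linking step itself is outside the scope of this manual:

    format MS COFF

    extrn '__imp__ExitProcess@4' as ExitProcess:dword

    section '.text' code readable executable align 16

    start:
        push 0                  ; exit code
        call [ExitProcess]      ; call through the imported address

    public start as '_start'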
  3563.  
  3564. 2.4.4  Executable and Linkable Format
  3565.  
  3566. To select the ELF output format, use the "format ELF" directive. The default
  3567. code setting for this format is 32-bit. To create an ELF file for the x86-64
  3568. architecture, use the "format ELF64" directive; in such case long mode code is
  3569. generated by default.
  3570.   The "section" directive defines a new section; it should be followed by a
  3571. quoted string defining the name of the section, optionally followed by one or
  3572. both of the "executable" and "writeable" flags, and also optionally by the
  3573. "align" operator with a number specifying the alignment of the section (it
  3574. has to be a power of two); if no alignment is specified, the default value is
  3575. used, which is 4 or 8, depending on which format variant has been chosen.
  3576.   The "extrn" and "public" directives have the same meaning and syntax as when
  3577. the COFF output format is selected (described in the previous section).
  3578.   The "rva" operator can also be used with this format (though not when the
  3579. target architecture is x86-64); it converts the address into an offset
  3580. relative to the GOT table, so it may be useful for creating
  3581. position-independent code.
  3582.   To create an executable file, follow the format choice directive with the
  3583. "executable" keyword. This allows the "entry" directive, followed by a value,
  3584. to be used to set the entry point of the program. On the other hand it makes
  3585. the "extrn" and "public" directives unavailable, and instead of "section" the
  3586. "segment" directive should be used, followed only by one or more segment
  3587. permission flags. The origin of a segment is aligned to a page (4096 bytes),
  3588. and the available flags are: "readable", "writeable" and "executable".
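Here is a minimal sketch of an ELF executable for 32-bit x86 Linux. It uses the "int 80h" system call interface, where 4 is "write" and 1 is "exit":

    format ELF executable
    entry start

    segment readable executable

    start:
        mov eax,4               ; sys_write
        mov ebx,1               ; file descriptor 1: standard output
        mov ecx,msg
        mov edx,msg_size
        int 80h
        mov eax,1               ; sys_exit
        xor ebx,ebx             ; exit status 0
        int 80h

    segment readable writeable

    msg db 'Hello, world!',0Ah
    msg_size = $-msg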
  3589.  
  3590.  
  3591. EOF