WebSVN – Kolibri OS – Blame – /programs/develop/fasm/trunk/fasm.txt

Rev	Author	Line No.	Line
4479	dunkaist	1
		2	,,;,, ,,,, ,,,,, ,,, ,,
		3	; ; ; ; ; ;
		4	; ,''''; '''', ; ; ;
		5	; ',,,,;, ,,,,,' ; ; ;
		6
		7
		8	Programmer's Manual
		9
		10
		11	Table of contents
		12	-----------------
		13
		14
		15
		16
		17	1.1.1 System requirements
		18	1.1.2 Executing compiler from command line
		19	1.1.3 Compiler messages
		20	1.1.4 Output formats
		21
		22
		23	1.2.1 Instruction syntax
		24	1.2.2 Data definitions
		25	1.2.3 Constants and labels
		26	1.2.4 Numerical expressions
		27	1.2.5 Jumps and calls
		28	1.2.6 Size settings
		29
		30
		31
		32
		33	2.1.1 Data movement instructions
		34	2.1.2 Type conversion instructions
		35	2.1.3 Binary arithmetic instructions
		36	2.1.4 Decimal arithmetic instructions
		37	2.1.5 Logical instructions
		38	2.1.6 Control transfer instructions
		39	2.1.7 I/O instructions
		40	2.1.8 Strings operations
		41	2.1.9 Flag control instructions
		42	2.1.10 Conditional operations
		43	2.1.11 Miscellaneous instructions
		44	2.1.12 System instructions
		45	2.1.13 FPU instructions
		46	2.1.14 MMX instructions
		47	2.1.15 SSE instructions
		48	2.1.16 SSE2 instructions
		49	2.1.17 SSE3 instructions
		50	2.1.18 AMD 3DNow! instructions
		51	2.1.19 The x86-64 long mode instructions
		52	2.1.20 SSE4 instructions
		53	2.1.21 AVX instructions
		54	2.1.22 AVX2 instructions
		55	2.1.23 Auxiliary sets of computational instructions
		56	2.1.24 Other extensions of instruction set
		57
		58
		59	2.2.1 Numerical constants
		60	2.2.2 Conditional assembly
		61	2.2.3 Repeating blocks of instructions
		62	2.2.4 Addressing spaces
		63	2.2.5 Other directives
		64	2.2.6 Multiple passes
		65
		66
		67	2.3.1 Including source files
		68	2.3.2 Symbolic constants
		69	2.3.3 Macroinstructions
		70	2.3.4 Structures
		71	2.3.5 Repeating macroinstructions
		72	2.3.6 Conditional preprocessing
		73	2.3.7 Order of processing
		74
		75
		76	2.4.1 MZ executable
		77	2.4.2 Portable Executable
		78	2.4.3 Common Object File Format
		79	2.4.4 Executable and Linkable Format
		80
		81
		82
		83	Chapter 1  Introduction
		84	-----------------------
		85
		86	This chapter contains the most important information you need to begin
		87	using the flat assembler. If you are an experienced assembly language programmer,
		88	you should read at least this chapter before using this compiler.
		89
		90
		91
		92
		93	The flat assembler is an assembly language compiler for the x86 architecture
		94	processors, which makes multiple passes to optimize the size of generated
		95	machine code. It is self-compilable and versions for different operating
		96	systems are provided. All the versions are designed to be used from the system
		97	command line and they should not differ in behavior.
		98
		99
		100	1.1.1  System requirements
		101
		102	All versions require a 32-bit x86 architecture processor (at least 80386),
		103	although they can produce programs for the x86 architecture 16-bit processors,
		104	too. The DOS version requires an OS compatible with MS DOS 2.0 and either a true
		105	real mode environment or DPMI. The Windows version requires a Win32 console
		106	compatible with version 3.1.
		107
		108
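		109	1.1.2  Executing compiler from command line
		110
		111	To execute flat assembler from the command line you need to provide two
		112	parameters - first the name of the source file, second the name of the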
		113	destination file. If no second parameter is given, the name for output
		114	file will be guessed automatically. After displaying short information about
		115	the program name and version, compiler will read the data from source file and
		116	compile it. When the compilation is successful, compiler will write the
		117	generated code to the destination file and display the summary of compilation
		118	process; otherwise it will display the information about error that occurred.
		119	The source file should be a text file, and can be created in any text
		120	editor. Line breaks are accepted in both DOS and Unix standards, tabulators
		121	are treated as spaces.
		122	In the command line you can also include "-m" option followed by a number,
		123	which specifies how many kilobytes of memory flat assembler should maximally
		124	use. In the case of the DOS version this option limits only the usage of extended
		125	memory. The "-p" option followed by a number can be used to specify the limit
		126	for the number of passes the assembler performs. If code cannot be generated
		127	within the specified number of passes, the assembly will be terminated with an
		128	error message. The maximum value of this setting is 65536, while the default
		129	limit, used when no such option is included in command line, is 100.
		133	There are no command line options that would affect the output of the compiler;
		134	flat assembler requires only the source code to include the information it
		135	really needs. For example, the output format is selected by using
		136	the "format" directive at the beginning of the source.
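
As a minimal sketch of the points above, the source below selects its output format itself, so the command line only has to name the files. The file names, the executable name "fasm" and the pass limit of 200 shown in the comments are assumptions made for illustration.

```asm
; example.asm (hypothetical file name)
; Possible invocations, assuming the executable is called "fasm":
;   fasm example.asm example.bin
;   fasm example.asm example.bin -p 200

format binary                   ; the output format is chosen in the source
        mov     ax,1
```

Nothing on the command line changes the generated code here; the "-p" option only limits how many passes the assembler may take.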
		137
		138
		139	1.1.3  Compiler messages
		140
		141	After a successful compilation, the compiler displays
		142	the compilation summary. It includes the information of how many passes were
		143	done, how much time it took, and how many bytes were written into the
		144	destination file.
		145	The following is an example of the compilation summary:
		146
		147
		148	38 passes, 5.3 seconds, 77824 bytes.
		149
		150	In case of an error during the compilation process, the compiler will display an
		151	error message. For example, when the compiler can't find the input file, it will
		152	display the following message:
		153
		154
		155	error: source file not found.
		156
		157	If the error is connected with a specific part of the source code, the source line
		158	that caused the error will also be displayed. The placement of this line in
		159	the source is given as well to help you find the error, for example:
		160
		161
		162	example.asm [3]:
		163	mob ax,1
		164	error: illegal instruction.
		165
		166	It means that in the third line of the example.asm file the compiler has
		167	encountered an unrecognized instruction. When the line that caused the error
		168	contains a macroinstruction, the line in the macroinstruction definition
		169	that generated the erroneous instruction is displayed as well:
		170
		171
		172	example.asm [6]:
		173	stoschar 7
		174	example.asm [3] stoschar [1]:
		175	mob al,char
		176	error: illegal instruction.
		177
		178	It means that the macroinstruction in the sixth line of the example.asm file
		179	generated an unrecognized instruction with the first line of its definition.
		180
		181
		182	1.1.4  Output formats
		183
		184	By default, when there is no "format" directive in the source file, flat
		185	assembler simply puts the generated instruction codes into the output, creating
		186	a flat binary file this way. It generates 16-bit code by default, but you can always
		187	switch it into the 16-bit or 32-bit mode by using the "use16" or "use32" directive.
		188	Some of the output formats switch into 32-bit mode, when selected - more
		189	information about formats which you can choose can be found in 2.4.
		190	All output code is always in the order in which it was entered into the
		191	source file.
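
A short sketch of switching the code mode inside a flat binary source follows. It relies only on the "use16" and "use32" directives named above; the instructions themselves are arbitrary.

```asm
        mov     ax,1            ; assembled as 16-bit code (the default)
use32
        mov     eax,1           ; assembled as 32-bit code from here on
use16
        mov     ax,1            ; back to 16-bit code
```

The three instructions land in the output in exactly this order, since the output always follows the order of the source.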
		192
		193
		194
		195
		196	The information provided below is intended mainly for assembly
		197	programmers that have been using some other assembly compilers before.
		198	If you are a beginner, you should look for assembly programming tutorials.
		199	Flat assembler by default uses the Intel syntax for the assembly
		200	instructions, although you can customize it using the preprocessor
		201	capabilities (macroinstructions and symbolic constants). It also has its own
		202	set of the directives - the instructions for compiler.
		203	All symbols defined inside the sources are case-sensitive.
		204
		205
		206	1.2.1  Instruction syntax
		207
		208	Instructions in assembly language are separated by line breaks, and one
		209	instruction is expected to fill one line of text. If a line contains
		210	a semicolon, except for the semicolons inside quoted strings, the rest of
		211	this line is a comment and the compiler ignores it. If a line ends with the "\"
		212	character (optionally followed by a semicolon and comment), the next line
		213	is attached at this point.
		214	Each line in the source is a sequence of items, each of which is one of three
		215	types. One type is the symbol characters, which are special characters
		216	that are individual items even when they are not spaced from the other ones.
		217	Any of the "+-*/=<>()[]{}:,|&~#`" is a symbol character. A sequence of
		218	other characters, separated from other items with either blank spaces or
		219	symbol characters, is a symbol. If the first character of symbol is either a
		220	single or double quote, it integrates any sequence of characters following it,
		221	even the special ones, into a quoted string, which should end with the same
		222	character, with which it began (the single or double quote) - however if there
		223	are two such characters in a row (without any other character between them),
		224	they are integrated into quoted string as just one of them and the quoted
		225	string continues then. The symbols other than symbol characters and quoted
		226	strings can be used as names, so are also called the name symbols.
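
The rules for comments, line continuation and quoted strings can be illustrated with a few lines. This is only a sketch; the values used are arbitrary.

```asm
        mov     al,';'          ; a semicolon inside a quoted string starts no comment
        db      'It''s',0       ; doubled quote: the string holds the four characters It's
        mov     ax, \           ; the line ends with "\" (a comment may follow it)
                1234h           ; so this line is attached, giving "mov ax,1234h"
```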
		227	Every instruction consists of a mnemonic and a varying number of
		228	operands, separated with commas. An operand can be a register, an immediate value
		229	or data addressed in memory; it can also be preceded by a size operator to
		230	define or override its size (table 1.1). The names of the available registers are
		231	listed in table 1.2; their sizes cannot be overridden. An immediate value can be
		232	specified by any numerical expression.
		233	When an operand is data in memory, the address of that data (also any
		234	numerical expression, but it may contain registers) should be enclosed in
		235	square brackets or preceded by "ptr" operator. For example instruction
		236	"mov eax,3" will put the immediate value 3 into the EAX register, instruction
		237	"mov eax,[7]" will put the 32-bit value from the address 7 into EAX and the
		238	instruction "mov byte [7],3" will put the immediate value 3 into the byte at
		239	address 7, it can also be written as "mov byte ptr 7,3". To specify which
		240	segment register should be used for addressing, segment register name followed
		241	by a colon should be put just before the address value (inside the square
		242	brackets or after the "ptr" operator).
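
The operand forms described above can be put side by side as follows. The first four lines repeat the examples from the text; the last line is an added illustration of a segment override, with ES chosen arbitrarily.

```asm
        mov     eax,3           ; immediate value 3 into EAX
        mov     eax,[7]         ; 32-bit value from address 7 into EAX
        mov     byte [7],3      ; immediate value 3 into the byte at address 7
        mov     byte ptr 7,3    ; the same instruction written with "ptr"
        mov     al,[es:7]       ; byte at address 7 addressed through ES
```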
		243
		244	Table 1.1  The size operators
		245	/-------------------------\
		246	| Operator | Bits | Bytes |
		247	|==========|======|=======|
		248	| byte     | 8    | 1     |
		249	| word     | 16   | 2     |
		250	| dword    | 32   | 4     |
		251	| fword    | 48   | 6     |
		252	| pword    | 48   | 6     |
		253	| qword    | 64   | 8     |
		254	| tbyte    | 80   | 10    |
		255	| tword    | 80   | 10    |
		256	| dqword   | 128  | 16    |
		257	| xword    | 128  | 16    |
		258	| qqword   | 256  | 32    |
		259	| yword    | 256  | 32    |
		260	\-------------------------/
		261
		262	Table 1.2  Registers
		263	/-----------------------------------------------------------------\
		264	| Type    | Bits |                                                |
		265	|=========|======|================================================|
		266	|         | 8    | al    cl    dl    bl    ah    ch    dh    bh   |
		267	| General | 16   | ax    cx    dx    bx    sp    bp    si    di   |
		268	|         | 32   | eax   ecx   edx   ebx   esp   ebp   esi   edi  |
		269	|---------|------|------------------------------------------------|
		270	| Segment | 16   | es    cs    ss    ds    fs    gs               |
		271	|---------|------|------------------------------------------------|
		272	| Control | 32   | cr0   cr2   cr3   cr4                          |
		273	|---------|------|------------------------------------------------|
		274	| Debug   | 32   | dr0   dr1   dr2   dr3   dr6   dr7              |
		275	|---------|------|------------------------------------------------|
		276	| FPU     | 80   | st0   st1   st2   st3   st4   st5   st6   st7  |
		277	|---------|------|------------------------------------------------|
		278	| MMX     | 64   | mm0   mm1   mm2   mm3   mm4   mm5   mm6   mm7  |
		279	|---------|------|------------------------------------------------|
		280	| SSE     | 128  | xmm0  xmm1  xmm2  xmm3  xmm4  xmm5  xmm6  xmm7 |
		281	|---------|------|------------------------------------------------|
		282	| AVX     | 256  | ymm0  ymm1  ymm2  ymm3  ymm4  ymm5  ymm6  ymm7 |
		283	\-----------------------------------------------------------------/
		284
		285
		286	1.2.2  Data definitions
		287
		288	To define data or reserve space for it, use one of the directives listed in
		289	table 1.3. A data definition directive should be followed by one or more
		290	numerical expressions, separated with commas. These expressions define the
		291	values for data cells of size depending on which directive is used. For
		292	example "db 1,2,3" will define the three bytes of values 1, 2 and 3
		293	respectively.
		294	The "db" and "du" directives also accept the quoted string values of any
		295	length, which will be converted into chain of bytes when "db" is used and into
		296	chain of words with zeroed high byte when "du" is used. For example "db 'abc'"
		297	will define the three bytes of values 61h, 62h and 63h.
		298	The "dp" directive and its synonym "df" accept the values consisting of two
		299	numerical expressions separated with colon, the first value will become the
		300	high word and the second value will become the low double word of the far
		301	pointer value. Also "dd" accepts such pointers consisting of two word values
		302	separated with colon, and "dt" accepts the word and quad word value separated
		303	with colon, the quad word is stored first. The "dt" directive with single
		304	expression as parameter accepts only floating point values and creates data in
		305	FPU double extended precision format.
		306	Any of the above directives allows the usage of the special "dup" operator to
		307	make multiple copies of given values. The count of duplicates should precede
		308	this operator and the value to duplicate should follow - it can even be a
		309	chain of values separated with commas, but such a set of values needs to be
		310	enclosed in parentheses, like "db 5 dup (1,2)", which defines five copies
		311	of the given two byte sequence.
		312	The "file" is a special directive and its syntax is different. This
		313	directive includes a chain of bytes from file and it should be followed by the
		314	quoted file name, then optionally numerical expression specifying offset in
		315	file preceded by the colon, and - also optionally - comma and numerical
		316	expression specifying count of bytes to include (if no count is specified, all
		317	data up to the end of file is included). For example "file 'data.bin'" will
		318	include the whole file as binary data and "file 'data.bin':10h,4" will include
		319	only four bytes starting at offset 10h.
		320	The data reservation directive should be followed by only one numerical
		321	expression, and this value defines how many cells of the specified size should
		322	be reserved. All data definition directives also accept the "?" value, which
		323	means that this cell should not be initialized to any value and the effect is
		324	the same as by using the data reservation directive. The uninitialized data
		325	may not be included in the output file, so its values should be always
		326	considered unknown.
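
The directives described in this section can be collected into one small data block. This is a sketch only: all label names and the file name "data.bin" are hypothetical, and the "file" line is left as a comment because it needs that file to exist.

```asm
bytes   db 1,2,3                ; three bytes of values 1, 2 and 3
text    db 'abc'                ; bytes 61h, 62h, 63h
wide    du 'abc'                ; words 0061h, 0062h, 0063h
farptr  dp 8:1000h              ; high word 8, low double word 1000h
pairs   db 5 dup (1,2)          ; 1,2 repeated five times (ten bytes)
blank   dw ?                    ; one uninitialized word
buffer  rb 16                   ; sixteen reserved bytes
;       file 'data.bin':10h,4   ; would include four bytes from offset 10h
```

Because "blank" and "buffer" come last and are uninitialized, they may not be written to the output file at all, so their contents should be considered unknown.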
		327
		328	Table 1.3  Data directives
		329	/----------------------------\
		330	| Size    | Define | Reserve |
		331	| (bytes) | data   | data    |
		332	|=========|========|=========|
		333	| 1       | db     | rb      |
		334	|         | file   |         |
		335	|---------|--------|---------|
		336	| 2       | dw     | rw      |
		337	|         | du     |         |
		338	|---------|--------|---------|
		339	| 4       | dd     | rd      |
		340	|---------|--------|---------|
		341	| 6       | dp     | rp      |
		342	|         | df     | rf      |
		343	|---------|--------|---------|
		344	| 8       | dq     | rq      |
		345	|---------|--------|---------|
		346	| 10      | dt     | rt      |
		347	\----------------------------/
		348
		349
		350	1.2.3  Constants and labels
		351
		352	In numerical expressions you can also use constants or labels instead of
		353	numbers. To define a constant or label you should use the specific
		354	directives. Each label can be defined only once and it is accessible from
		355	any place in the source (even before it was defined). A constant can be redefined
		356	many times, but in this case it is accessible only after it was defined, and
		357	is always equal to the value from last definition before the place where it's
		358	used. When a constant is defined only once in source, it is - like the label -
		359	accessible from anywhere.
		360	The definition of constant consists of name of the constant followed by the
		361	"=" character and numerical expression, which after calculation will become
		362	the value of constant. This value is always calculated at the time the
		363	constant is defined. For example you can define "count" constant by using the
		364	directive "count = 17", and then use it in the assembly instructions, like
		365	"mov cx,count" - which will become "mov cx,17" during the compilation process.
		366	There are different ways to define labels. The simplest is to follow the
		367	name of label by the colon, this directive can even be followed by the other
		368	instruction in the same line. It defines the label whose value is equal to
		369	offset of the point where it's defined. This method is usually used to label
		370	the places in code. The other way is to follow the name of label (without a
		371	colon) by some data directive. It defines the label with value equal to
		372	offset of the beginning of defined data, and remembered as a label for data
		373	with cell size as specified for that data directive in table 1.3.
		374	The label can be treated as constant of value equal to offset of labeled
		375	code or data. For example when you define data using the labeled directive
		376	"char db 224", to put the offset of this data into BX register you should use
		377	"mov bx,char" instruction, and to put the value of byte addressed by "char"
		378	label to DL register, you should use "mov dl,[char]" (or "mov dl,ptr char").
		379	But when you try to assemble "mov ax,[char]", it will cause an error, because
		380	fasm compares the sizes of operands, which should be equal. You can force
		381	assembling that instruction by using size override: "mov ax,word [char]", but
		382	remember that this instruction will read the two bytes beginning at "char"
		383	address, while it was defined as a one byte.
		384	The last and the most flexible way to define labels is to use "label"
		385	directive. This directive should be followed by the name of label, then
		386	optionally size operator (it can be preceded by a colon) and then - also
		387	optionally "at" operator and the numerical expression defining the address at
		388	which this label should be defined. For example "label wchar word at char"
		389	will define a new label for the 16-bit data at the address of "char". Now the
		390	instruction "mov ax,[wchar]" will be after compilation the same as
		391	"mov ax,word [char]". If no address is specified, "label" directive defines
		392	the label at current offset. Thus "mov [wchar],57568" will copy two bytes
		393	while "mov [char],224" will copy one byte to the same address.
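
The examples scattered through the paragraphs above fit together into one fragment. It uses only the names from the text ("count", "char", "wchar"); placing the data directly after the code is just a convenience for this sketch.

```asm
count = 17
        mov     cx,count        ; becomes "mov cx,17"
        mov     bx,char         ; offset of the data into BX
        mov     dl,[char]       ; byte at "char" into DL
        mov     ax,word [char]  ; size override: reads two bytes at "char"
        mov     ax,[wchar]      ; same as the line above
        mov     [wchar],57568   ; copies two bytes
        mov     [char],224      ; copies one byte to the same address

char    db 224                  ; label for byte data
label wchar word at char        ; word-sized label at the same address
```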
		394	The label whose name begins with dot is treated as local label, and its name
		395	is attached to the name of last global label (with name beginning with
		396	anything but dot) to make the full name of this label. So you can use the
		397	short name (beginning with dot) of this label anywhere before the next global
		398	label is defined, and in the other places you have to use the full name. Labels
		399	beginning with two dots are the exception - they are like global, but they
		400	don't become the new prefix for local labels.
		401	The "@@" name means an anonymous label; you can define many of them in
		402	the source. Symbol "@b" (or equivalent "@r") references the nearest preceding
		403	anonymous label, symbol "@f" references the nearest following anonymous label.
		404	These special symbols are case-insensitive.
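
A sketch of local and anonymous labels follows. The global label names "first" and "second" are hypothetical and the loops themselves do nothing useful.

```asm
first:
  .loop:                        ; full name is "first.loop"
        dec     cx
        jnz     .loop           ; refers to "first.loop"
second:
  .loop:                        ; full name is "second.loop"
        dec     dx
        jnz     .loop           ; refers to "second.loop"
        jmp     first.loop      ; the full name is needed outside "first"
@@:
        dec     ax
        jnz     @b              ; nearest preceding "@@"
        jmp     @f              ; nearest following "@@"
        nop
@@:
```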
		405
		406
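		407	1.2.4  Numerical expressions
		408
		409	In the above examples all the numerical expressions were simple numbers,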
		410	constants or labels. But they can be more complex, by using the arithmetical
		411	or logical operators for calculations at compile time. All these operators
		412	with their priority values are listed in table 1.4. The operations with higher
		413	priority value will be calculated first, you can of course change this
		414	behavior by putting some parts of the expression in parentheses. The "+", "-",
		415	"*" and "/" are standard arithmetical operations, "mod" calculates the
		416	remainder from division. The "and", "or", "xor", "shl", "shr" and "not"
		417	perform the same logical operations as assembly instructions of those names.
		418	The "rva" and "plt" are special unary operators that perform conversions
		419	between different kinds of addresses, they can be used only with few of the
		420	output formats and their meaning may vary (see 2.4).
		421	The arithmetical and logical calculations are usually processed as if they
		422	operated on infinite precision 2-adic numbers, and the assembler signals an
		423	overflow error if because of its limitations it is not able to perform the
		424	required calculation, or if the result is too large a number to fit in either
		425	signed or unsigned range for the destination unit size. However "not", "xor"
		426	and "shr" operators are exceptions to this rule - if the value specified
		427	by numerical expression has to fit in a unit of specified size, and the
		428	arguments for operation fit into that size, the operation will be performed
		429	with precision limited to that size.
		430	The numbers in the expression are by default treated as decimal; binary
		431	numbers should have the "b" letter attached at the end, octal numbers should
		432	end with the "o" letter, hexadecimal numbers should begin with "0x" characters
		433	(like in C language) or with the "$" character (like in Pascal language) or
		434	they should end with "h" letter. Also quoted string, when encountered in
		435	expression, will be converted into number - the first character will become
		436	the least significant byte of number.
		437	The numerical expression used as an address value can also contain any of
		438	general registers used for addressing, they can be added and multiplied by
		439	appropriate values, as it is allowed for the x86 architecture instructions.
		440	The numerical calculations inside address definition by default operate with
		441	target size assumed to be the same as the current bitness of code, even if
		442	generated instruction encoding will use a different address size.
		443	There are also some special symbols that can be used inside the numerical
		444	expression. First is "$", which is always equal to the value of current
		445	offset, while "$$" is equal to base address of current addressing space. The
		446	other one is "%", which is the number of current repeat in parts of code that
		447	are repeated using some special directives (see 2.2) and zero anywhere else.
		448	There's also "%t" symbol, which is always equal to the current time stamp.
		449	Any numerical expression can also consist of single floating point value
		450	(flat assembler does not allow any floating point operations at compilation
		451	time) in the scientific notation, they can end with the "f" letter to be
		452	recognized, otherwise they should contain at least one of the "." or "E"
		453	characters. So "1.0", "1E0" and "1f" define the same floating point value,
		454	while simple "1" defines an integer value.
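
The number notations and special symbols described above can be compared in a short fragment. The names "values" and "values_size" are hypothetical; the first five instructions all load the value 10.

```asm
        mov     al,1010b        ; binary
        mov     al,12o          ; octal
        mov     al,0x0A         ; hexadecimal, C style
        mov     al,$0A          ; hexadecimal, Pascal style
        mov     al,0Ah          ; hexadecimal with the "h" suffix
        mov     ax,'ab'         ; 'a' becomes the least significant byte: 6261h
        mov     ax,(1+2)*3      ; parentheses change the order: 9
values:
        dd      1.0,1E0,1f      ; the same floating point value three times
values_size = $ - values        ; current offset minus start of data: 12
```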
		455
		456	Table 1.4  Arithmetical and logical operators by priority
		457	/-------------------------\
		458	| Priority | Operators    |
		459	|==========|==============|
		460	| 0        | +  -         |
		461	|----------|--------------|
		462	| 1        | *  /         |
		463	|----------|--------------|
		464	| 2        | mod          |
		465	|----------|--------------|
		466	| 3        | and  or  xor |
		467	|----------|--------------|
		468	| 4        | shl  shr     |
		469	|----------|--------------|
		470	| 5        | not          |
		471	|----------|--------------|
		472	| 6        | rva  plt     |
		473	\-------------------------/
		474
		475
		476	1.2.5  Jumps and calls
		477
		478	The operand of any jump or call instruction can be preceded not only by the
		479	size operator, but also by one of the operators specifying type of the jump:
		480	"short", "near" or "far". For example, when assembler is in 16-bit mode,
		481	instruction "jmp dword [0]" will become the far jump and when assembler is
		482	in 32-bit mode, it will become the near jump. To force this instruction to be
		483	treated differently, use the "jmp near dword [0]" or "jmp far dword [0]" form.
		484	When operand of near jump is the immediate value, assembler will generate
		485	the shortest variant of this jump instruction if possible (but will not create
		486	32-bit instruction in 16-bit mode nor 16-bit instruction in 32-bit mode,
		487	unless there is a size operator stating it). By specifying the jump type
		488	you can force it to always generate long variant (for example "jmp near 0")
		489	or to always generate short variant and terminate with an error when it's
		490	impossible (for example "jmp short 0").
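
The jump type operators can be sketched as below. The label "nearby" is hypothetical; the memory operands repeat the "[0]" address used in the text.

```asm
use16
        jmp     dword [0]       ; far jump in 16-bit mode
        jmp     near dword [0]  ; forced to be a near jump
use32
        jmp     dword [0]       ; near jump in 32-bit mode
        jmp     far dword [0]   ; forced to be a far jump
nearby:
        jmp     nearby          ; shortest variant is chosen automatically
        jmp     short nearby    ; always short, error if out of range
        jmp     near nearby     ; always the long variant
```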
		491
		492
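		493	1.2.6  Size settings
		494
		495	When an instruction uses memory addressing, by default the smallest form of the
		496	instruction is generated, using the short displacement whenever the address
		497	value fits in the range. This can be overridden using the "word" or "dword"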
		498	operator before the address inside the square brackets (or after the "ptr"
		499	operator), which forces the long displacement of appropriate size to be made.
		500	When the address is not relative to any registers, those operators also
		501	allow you to choose the appropriate mode of absolute addressing.
		502	Instructions "adc", "add", "and", "cmp", "or", "sbb", "sub" and "xor" with
		503	first operand being 16-bit or 32-bit are by default generated in shortened
		504	8-bit form when the second operand is immediate value fitting in the range
		505	for signed 8-bit values. This can also be overridden by putting the "word" or
		506	"dword" operator before the immediate value. Similar rules apply to the
		507	"imul" instruction with the last operand being an immediate value.
		508	Immediate value as an operand for "push" instruction without a size operator
		509	is by default treated as a word value if assembler is in 16-bit mode and as a
		510	double word value if assembler is in 32-bit mode, shorter 8-bit form of this
		511	instruction is used if possible, "word" or "dword" size operator forces the
		512	"push" instruction to be generated in longer form for specified size. "pushw"
		513	and "pushd" mnemonics force assembler to generate 16-bit or 32-bit code
		514	without forcing it to use the longer form of instruction.
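
The size settings from this section are sketched below for 32-bit code. The registers and the value 4 are arbitrary; each pair shows the default short form followed by the forced long form.

```asm
use32
        mov     eax,[ebx+4]         ; short displacement, since 4 fits in the range
        mov     eax,[dword ebx+4]   ; long 32-bit displacement forced
        add     eax,4               ; shortened form with 8-bit immediate
        add     eax,dword 4         ; full 32-bit immediate forced
        push    4                   ; double word push, shorter 8-bit form used
        push    dword 4             ; longer form forced
        pushw   4                   ; 16-bit push, shorter form still allowed
```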
		515
		516
		517	Chapter 2  Instruction set
		518	--------------------------
		519
		520	This chapter provides detailed information about the instructions and
		521	directives supported by flat assembler. Directives for defining labels were
		522	already discussed in 1.2.3, all other directives will be described later in
		523	this chapter.
		524
		525
		526
		527
		528	In this section you can find information about the syntax and
		529	purpose of the assembly language instructions. If you need more technical
		530	information, look for the Intel Architecture Software Developer's Manual.
		531	Assembly instructions consist of the mnemonic (instruction's name) and from
		532	zero to three operands. If there are two or more operands, usually first is
		533	the destination operand and second is the source operand. Each operand can be
		534	register, memory or immediate value (see 1.2 for details about syntax of
		535	operands). After the description of each instruction there are examples
		536	of different combinations of operands, if the instruction has any.
		537	Some instructions act as prefixes and can be followed by other instruction
		538	in the same line, and there can be more than one prefix in a line. Each name
		539	of the segment register is also a mnemonic of instruction prefix, although it
		540	is recommended to use segment overrides inside the square brackets instead of
		541	these prefixes.
		542
		543
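		544	2.1.1  Data movement instructions
		545
		546	"mov" transfers a byte, word or double word from the source operand to the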
		547	destination operand. It can transfer data between general registers, from
		548	the general register to memory, or from memory to general register, but it
		549	cannot move from memory to memory. It can also transfer an immediate value to
		550	general register or memory, segment register to general register or memory,
		551	general register or memory to segment register, control or debug register to
		552	general register and general register to control or debug register. The "mov"
		553	can be assembled only if the size of source operand and size of destination
		554	operand are the same. Below are the examples for each of the allowed
		555	combinations:
		556
		557	mov bx,ax ; general register to general register
		558	mov [char],al ; general register to memory
		559	mov bl,[char] ; memory to general register
		560	mov dl,32 ; immediate value to general register
		561	mov [char],32 ; immediate value to memory
		562	mov ax,ds ; segment register to general register
		563	mov [bx],ds ; segment register to memory
		564	mov ds,ax ; general register to segment register
		565	mov ds,[bx] ; memory to segment register
		566	mov eax,cr0 ; control register to general register
		567	mov cr3,ebx ; general register to control register
		568
		569	"xchg" swaps the contents of two operands. It can swap two byte operands,
		570	two word operands or two double word operands. Order of operands is not
		571	important. The operands may be two general registers, or general register
		572	with memory. For example:
		573
		574
		575	xchg al,[char] ; swap register with memory
		576
		577	"push" decrements the stack frame pointer (ESP register), then transfers
		578	the operand to the top of stack indicated by ESP. The operand can be memory,
		579	general register, segment register or immediate value of word or double word
		580	size. If operand is an immediate value and no size is specified, it is by
		581	default treated as a word value if assembler is in 16-bit mode and as a double
		582	word value if assembler is in 32-bit mode. "pushw" and "pushd" mnemonics are
		583	variants of this instruction that store the values of word or double word size
		584	respectively. If more operands follow in the same line (separated only with
		585	spaces, not commas), compiler will assemble chain of the "push" instructions
		586	with these operands. The examples are with single operands:
		587
		588
		589	push es ; store segment register
		590	pushw [bx] ; store memory
		591	push 1000h ; store immediate value
		592
		593	"pusha" saves the contents of the eight general registers on the stack.
		594	This instruction has no operands. There are two versions of this instruction,
		595	one 16-bit and one 32-bit, assembler automatically generates the appropriate
		596	version for current mode, but it can be overridden by using "pushaw" or
		597	"pushad" mnemonic to always get the 16-bit or 32-bit version. The 16-bit
		598	version of this instruction pushes general registers on the stack in the
		599	following order: AX, CX, DX, BX, the initial value of SP before AX was pushed,
		600	BP, SI and DI. The 32-bit version pushes equivalent 32-bit general registers
		601	in the same order.
		602	"pop" transfers the word or double word at the current top of stack to the
		603	destination operand, and then increments ESP to point to the new top of stack.
		604	The operand can be memory, general register or segment register. "popw" and
		605	"popd" mnemonics are variants of this instruction for restoring the values of
		606	word or double word size respectively. If more operands separated with spaces
		607	follow in the same line, compiler will assemble chain of the "pop"
		608	instructions with these operands.
		609
		610
		611	pop ds ; restore segment register
		612	popw [si] ; restore memory
		613
		614	"popa" restores the registers saved on the stack by the "pusha" instruction,
		615	except for the saved value of SP (or ESP), which is ignored. This instruction
		616	has no operands. To force assembling 16-bit or 32-bit version of this
		617	instruction use "popaw" or "popad" mnemonic.
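
The chained forms of "push" and "pop" and the register-saving pair can be sketched as follows. The registers chosen are arbitrary.

```asm
        push    ax bx           ; same as "push ax" followed by "push bx"
        pop     bx ax           ; restored in reverse order
        pusha                   ; save all eight general registers
        mov     ax,1            ; any work that changes the registers
        mov     cx,2
        popa                    ; restore them (the saved SP value is ignored)
```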
		618
		619
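		620	2.1.2  Type conversion instructions
		621
		622	The type conversion instructions convert bytes into words, words into double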
		623	words, and double words into quad words. These conversions can be done using
		624	the sign extension or zero extension. The sign extension fills the extra bits
		625	of the larger item with the value of the sign bit of the smaller item, the
		626	zero extension simply fills them with zeros.
		627	"cwd" and "cdq" double the size of the value in the AX or EAX register respectively
		628	and store the extra bits into the DX or EDX register. The conversion is done
		629	using the sign extension. These instructions have no operands.
		630	"cbw" extends the sign of the byte in AL throughout AX, and "cwde" extends
		631	the sign of the word in AX throughout EAX. These instructions also have no
		632	operands.
		633	"movsx" converts a byte to word or double word and a word to double word
		634	using the sign extension. "movzx" does the same, but it uses the zero
		635	extension. The source operand can be general register or memory, while the
		636	destination operand must be a general register. For example:
		637
		638	movsx ax,al ; byte register to word register
		639	movsx edx,dl ; byte register to double word register
		640	movsx eax,ax ; word register to double word register
		641	movsx ax,byte [bx] ; byte memory to word register
		642	movsx edx,byte [bx] ; byte memory to double word register
		643	movsx eax,word [bx] ; word memory to double word register
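
To make the difference between the two extensions concrete, here is a minimal sketch that extends the same byte both ways. The registers and the value 0FFh are arbitrary choices made for this illustration.

```asm
use32
        mov     bl,0FFh         ; bl = -1 when read as a signed byte
        movsx   eax,bl          ; sign extension: eax = 0FFFFFFFFh
        movzx   edx,bl          ; zero extension: edx = 000000FFh
```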
		644
		645
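		646	2.1.3 Binary arithmetic instructions
		647
		648	"add" replaces the destination operand with the sum of the source and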
		649	destination operands and sets CF if overflow has occurred. The operands may
		650	be bytes, words or double words. The destination operand can be general
		651	register or memory, the source operand can be general register or immediate
		652	value, it can also be memory if the destination operand is register.
		653
		654
		655	add ax,[si] ; add memory to register
		656	add [di],al ; add register to memory
		657	add al,48 ; add immediate value to register
		658	add [char],48 ; add immediate value to memory
		659
		660	"adc" sums the operands, adds one if CF is set, and replaces the destination
		661	operand with the result. Rules for the operands are the same as for the "add"
		662	instruction. An "add" followed by multiple "adc" instructions can be used to
		663	add numbers longer than 32 bits.
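
A possible sketch of such a multi-word addition follows. It assumes, purely for illustration, that the first 64-bit number is held in EDX:EAX and the second in ECX:EBX.

```asm
use32
        ; edx:eax = first 64-bit number, ecx:ebx = second (assumed layout)
        add     eax,ebx         ; add the low double words, the carry goes to CF
        adc     edx,ecx         ; add the high double words plus that carry
```

Longer numbers can be handled the same way by appending one more "adc" per additional double word.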
		664	"inc" adds one to the operand, it does not affect CF. The operand can be a
		665	general register or memory, and the size of the operand can be byte, word or
		666	double word.
		667
		668
		669	inc byte [bx] ; increment memory by one
		670
		671	"sub" subtracts the source operand from the destination operand and replaces
		672	the destination operand with the result. If a borrow is required, the CF is
		673	set. Rules for the operands are the same as for the "add" instruction.
		674	"sbb" subtracts the source operand from the destination operand, subtracts
		675	one if CF is set, and stores the result to the destination operand. Rules for
		676	the operands are the same as for the "add" instruction. A "sub" followed by
		677	multiple "sbb" instructions may be used to subtract numbers longer than 32
		678	bits.
		679	"dec" subtracts one from the operand, it does not affect CF. Rules for the
		680	operand are the same as for the "inc" instruction.
		681	"cmp" subtracts the source operand from the destination operand. It updates
		682	the flags as the "sub" instruction, but does not alter the source and
		683	destination operands. Rules for the operands are the same as for the "sub"
		684	instruction.
		685	"neg" subtracts a signed integer operand from zero. The effect of this
		686	instruction is to reverse the sign of the operand from positive to negative or
		687	from negative to positive. Rules for the operand are the same as for the "inc"
		688	instruction.
		689	"xadd" exchanges the destination operand with the source operand, then loads
		690	the sum of the two values into the destination operand. Rules for the operands
		691	are the same as for the "add" instruction.
		692	All the above binary arithmetic instructions update SF, ZF, PF and OF flags.
		693	SF is always set to the same value as the result's sign bit, ZF is set when
		694	all the bits of result are zero, PF is set when low order eight bits of result
		695	contain an even number of set bits, OF is set if result is too large for a
		696	positive number or too small for a negative number (excluding sign bit) to fit
		697	in destination operand.
		698	"mul" performs an unsigned multiplication of the operand and the
		699	accumulator. If the operand is a byte, the processor multiplies it by the
		700	contents of AL and returns the 16-bit result to AH and AL. If the operand is a
		701	word, the processor multiplies it by the contents of AX and returns the 32-bit
		702	result to DX and AX. If the operand is a double word, the processor multiplies
		703	it by the contents of EAX and returns the 64-bit result in EDX and EAX. "mul"
		704	sets CF and OF when the upper half of the result is nonzero, otherwise they
		705	are cleared. Rules for the operand are the same as for the "inc" instruction.
		706	"imul" performs a signed multiplication operation. This instruction has
		707	three variations. First has one operand and behaves in the same way as the
		708	"mul" instruction. Second has two operands, in this case destination operand
		709	is multiplied by the source operand and the result replaces the destination
		710	operand. Destination operand must be a general register, it can be word or
		711	double word, source operand can be general register, memory or immediate
		712	value. Third form has three operands, the destination operand must be a
		713	general register, word or double word in size, source operand can be general
		714	register or memory, and third operand must be an immediate value. The source
		715	operand is multiplied by the immediate value and the result is stored in the
		716	destination register. All the three forms calculate the product to twice the
		717	size of operands and set CF and OF when the upper half of the result is not
		718	just the sign extension of the lower half, but second and third form truncate
		719	the product to the size of operands. So these two forms can also be used for unsigned operands
		720	because, whether the operands are signed or unsigned, the lower half of the
		721	product is the same. Below are the examples for all three forms:
		722
		723
		724	imul word [si] ; accumulator by memory
		725	imul bx,cx ; register by register
		726	imul bx,[si] ; register by memory
		727	imul bx,10 ; register by immediate value
		728	imul ax,bx,10 ; register by immediate value to register
		729	imul ax,[si],10 ; memory by immediate value to register
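
The effect of a product that does not fit in the operand size can be sketched as below. The values are assumptions chosen so that the upper half of the result is used.

```asm
use32
        mov     eax,80000000h
        mov     ecx,4
        mul     ecx             ; edx = 2, eax = 0, CF and OF are set
        mov     eax,80000000h
        imul    eax,ecx         ; eax = 0 (truncated), CF and OF are set as well
```

Testing CF (or OF) right after the multiplication is one way to detect that the truncated result of the two-operand form cannot be trusted.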
		730
		731	"div" performs an unsigned division of the accumulator by the operand.
		732	The dividend (the accumulator) is twice the size of the divisor (the operand),
		733	the quotient and remainder have the same size as the divisor. If divisor is
		734	byte, the dividend is taken from AX register, the quotient is stored in AL and
		735	the remainder is stored in AH. If divisor is word, the upper half of dividend
		736	is taken from DX, the lower half of dividend is taken from AX, the quotient is
		737	stored in AX and the remainder is stored in DX. If divisor is double word,
		738	the upper half of dividend is taken from EDX, the lower half of dividend is
		739	taken from EAX, the quotient is stored in EAX and the remainder is stored in
		740	EDX. Rules for the operand are the same as for the "mul" instruction.
		741	"idiv" performs a signed division of the accumulator by the operand.
		742	It uses the same registers as the "div" instruction, and the rules for
		743	the operand are the same.
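
Because the dividend is twice the size of the divisor, its upper half has to be prepared before dividing. The following is a minimal sketch with example values; it clears EDX for the unsigned case and uses "cdq" for the signed case.

```asm
use32
        mov     eax,100
        xor     edx,edx         ; unsigned dividend: clear the upper half
        mov     ecx,7
        div     ecx             ; eax = 14 (quotient), edx = 2 (remainder)

        mov     eax,-100
        cdq                     ; signed dividend: sign-extend eax into edx
        idiv    ecx             ; eax = -14 (quotient), edx = -2 (remainder)
```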
		744
		745
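		746	2.1.4 Decimal arithmetic instructions
		747
		748	The decimal arithmetic is performed by combining the binary arithmetic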
		749	instructions (already described in the prior section) with the decimal
		750	arithmetic instructions. The decimal arithmetic instructions are used to
		751	adjust the results of a previous binary arithmetic operation to produce a
		752	valid packed or unpacked decimal result, or to adjust the inputs to a
		753	subsequent binary arithmetic operation so the operation will produce a valid
		754	packed or unpacked decimal result.
		755	"daa" adjusts the result of adding two valid packed decimal operands in
		756	AL. "daa" must always follow the addition of two pairs of packed decimal
		757	numbers (one digit in each half-byte) to obtain a pair of valid packed
		758	decimal digits as results. The carry flag is set if carry was needed.
		759	This instruction has no operands.
		760	"das" adjusts the result of subtracting two valid packed decimal operands
		761	in AL. "das" must always follow the subtraction of one pair of packed decimal
		762	numbers (one digit in each half-byte) from another to obtain a pair of valid
		763	packed decimal digits as results. The carry flag is set if a borrow was
		764	needed. This instruction has no operands.
		765	"aaa" changes the contents of register AL to a valid unpacked decimal
		766	number, and zeroes the top four bits. "aaa" must always follow the addition
		767	of two unpacked decimal operands in AL. The carry flag is set and AH is
		768	incremented if a carry is necessary. This instruction has no operands.
		769	"aas" changes the contents of register AL to a valid unpacked decimal
		770	number, and zeroes the top four bits. "aas" must always follow the
		771	subtraction of one unpacked decimal operand from another in AL. The carry flag
		772	is set and AH decremented if a borrow is necessary. This instruction has no
		773	operands.
		774	"aam" corrects the result of a multiplication of two valid unpacked decimal
		775	numbers. "aam" must always follow the multiplication of two decimal numbers
		776	to produce a valid decimal result. The high order digit is left in AH, the
		777	low order digit in AL. The generalized version of this instruction allows
		778	adjustment of the contents of the AX to create two unpacked digits of any
		779	number base. The standard version of this instruction has no operands, the
		780	generalized version has one operand - an immediate value specifying the
		781	number base for the created digits.
		782	"aad" modifies the numerator in AH and AL to prepare for the division of two
		783	valid unpacked decimal operands so that the quotient produced by the division
		784	will be a valid unpacked decimal number. AH should contain the high order
		785	digit and AL the low order digit. This instruction adjusts the value and
		786	places the result in AL, while AH will contain zero. The generalized version
		787	of this instruction allows adjustment of two unpacked digits of any number
		788	base. Rules for the operand are the same as for the "aam" instruction.
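
As an illustration of how the adjustment follows a binary operation, the sketch below adds two packed decimal numbers and then multiplies two unpacked digits. All values are examples only.

```asm
        mov     al,38h          ; packed decimal 38
        add     al,45h          ; binary sum is 7Dh, not a valid packed decimal
        daa                     ; al = 83h, the packed decimal for 38 + 45

        mov     al,7
        mov     bl,9
        mul     bl              ; ax = 003Fh (63)
        aam                     ; ah = 6, al = 3: the unpacked digits of 63
```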
		789
		790
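		791	2.1.5 Logical instructions
		792
		793	"not" inverts the bits in the specified operand to form a one's complement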
		794	of the operand. It has no effect on the flags. Rules for the operand are the
		795	same as for the "inc" instruction.
		796	"and", "or" and "xor" instructions perform the standard logical operations.
		797	They update the SF, ZF and PF flags. Rules for the operands are the same as
		798	for the "add" instruction.
		799	"bt", "bts", "btr" and "btc" instructions operate on a single bit which can
		800	be in memory or in a general register. The location of the bit is specified
		801	as an offset from the low order end of the operand. The value of the offset
		802	is taken from the second operand, which may be either an immediate byte or
		803	a general register. These instructions first assign the value of the selected
		804	bit to CF. "bt" instruction does nothing more, "bts" sets the selected bit to
		805	1, "btr" resets the selected bit to 0, "btc" changes the bit to its
		806	complement. The first operand can be word or double word.
		807
		808
		809	bts word [bx],15 ; test and set bit in memory
		810	btr ax,cx ; test and reset bit in register
		811	btc word [bx],cx ; test and complement bit in memory
		812
		813	"bsf" and "bsr" instructions scan a word or double word for the first set bit
		814	and store the index of this bit into destination operand, which must be
		815	general register. The bit string being scanned is specified by source operand,
		816	it may be either general register or memory. The ZF flag is set if the entire
		817	string is zero (no set bits are found); otherwise it is cleared. If no set bit
		818	is found, the value of the destination register is undefined. "bsf" scans from
		819	low order to high order (starting from bit index zero). "bsr" scans from high
		820	order to low order (starting from bit index 15 of a word or index 31 of a
		821	double word).
		822
		823
		824	bsr ax,[si] ; scan memory reverse
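
Since the destination is undefined when no set bit is found, the result should be used only after checking ZF. Here is a small sketch of that; the label "no_bits" and the test value are hypothetical.

```asm
use32
        mov     eax,00010100b   ; bits 2 and 4 are set
        bsf     ecx,eax         ; ecx = 2, index of the lowest set bit
        bsr     edx,eax         ; edx = 4, index of the highest set bit

        xor     eax,eax
        bsf     ecx,eax         ; source is zero: ZF is set, ecx is undefined
        jz      no_bits         ; so test ZF before using the result
        ; ... ecx could be used here ...
no_bits:
```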
		825
		826	"shl" shifts the destination operand left by the number of bits specified
		827	in the second operand. The destination operand can be byte, word, or double
		828	word general register or memory. The second operand can be an immediate value
		829	or the CL register. The processor shifts zeros in from the right (low order)
		830	side of the operand as bits exit from the left side. The last bit that exited
		831	is stored in CF. "sal" is a synonym for "shl".
		832
		833
		834	shl byte [bx],1 ; shift memory left by one bit
		835	shl ax,cl ; shift register left by count from cl
		836	shl word [bx],cl ; shift memory left by count from cl
		837
		838	"shr" and "sar" shift the destination operand right by the number of bits
		839	specified in the second operand. Rules for operands are the same as for the
		840	"shl" instruction. "shr" shifts zeros in from the left side of the operand as
		841	bits exit from the right side. The last bit that exited is stored in CF.
		842	"sar" preserves the sign of the operand by shifting in zeros on the left side
		843	if the value is positive or by shifting in ones if the value is negative.
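
The difference between the two right shifts shows up only for negative values. In this sketch the operand -16 is an arbitrary example:

```asm
        mov     ax,-16          ; ax = 0FFF0h
        sar     ax,2            ; ax = 0FFFCh (-4): ones shifted in, sign preserved
        mov     bx,-16          ; bx = 0FFF0h
        shr     bx,2            ; bx = 3FFCh: zeros shifted in
```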
		844	"shld" shifts bits of the destination operand to the left by the number
		845	of bits specified in third operand, while shifting high order bits from the
		846	source operand into the destination operand on the right. The source operand
		847	remains unmodified. The destination operand can be a word or double word
		848	general register or memory, the source operand must be a general register,
		849	third operand can be an immediate value or the CL register.
		850
		851
		852	shld [di],bx,1 ; shift memory left by one bit
		853	shld ax,bx,cl ; shift register left by count from cl
		854	shld [di],bx,cl ; shift memory left by count from cl
		855
		856	"shrd" shifts bits of the destination operand to the right, while shifting
		857	low order bits from the source operand into the destination operand on the
		858	left. The source operand remains unmodified. Rules for operands are the same
		859	as for the "shld" instruction.
		860	"rol" and "rcl" rotate the byte, word or double word destination operand
		861	left by the number of bits specified in the second operand. For each rotation
		862	specified, the high order bit that exits from the left of the operand returns
		863	at the right to become the new low order bit. "rcl" additionally puts in CF
		864	each high order bit that exits from the left side of the operand before it
		865	returns to the operand as the low order bit on the next rotation cycle. Rules
		866	for operands are the same as for the "shl" instruction.
		867	"ror" and "rcr" rotate the byte, word or double word destination operand
		868	right by the number of bits specified in the second operand. For each rotation
		869	specified, the low order bit that exits from the right of the operand returns
		870	at the left to become the new high order bit. "rcr" additionally puts in CF
		871	each low order bit that exits from the right side of the operand before it
		872	returns to the operand as the high order bit on the next rotation cycle.
		873	Rules for operands are the same as for the "shl" instruction.
		874	"test" performs the same action as the "and" instruction, but it does not
		875	alter the destination operand, only updates flags. Rules for the operands are
		876	the same as for the "and" instruction.
		877	"bswap" reverses the byte order of a 32-bit general register: bits 0 through
		878	7 are swapped with bits 24 through 31, and bits 8 through 15 are swapped with
		879	bits 16 through 23. This instruction is provided for converting little-endian
		880	values to big-endian format and vice versa.
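
A one-line sketch of the byte swap, with an example value:

```asm
use32
        mov     eax,12345678h
        bswap   eax             ; eax = 78563412h
```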
		881
		882
		883
		884
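		885	2.1.6 Control transfer instructions
		886
		887	"jmp" unconditionally transfers control to the target location. The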
		888	destination address can be specified directly within the instruction or
		889	indirectly through a register or memory, the acceptable size of this address
		890	depends on whether the jump is near or far (it can be specified by preceding
		891	the operand with "near" or "far" operator) and whether the instruction is
		892	16-bit or 32-bit. Operand for near jump should be "word" size for 16-bit
		893	instruction or the "dword" size for 32-bit instruction. Operand for far jump
		894	should be "dword" size for 16-bit instruction or "pword" size for 32-bit
		895	instruction. A direct "jmp" instruction includes the destination address as
		896	part of the instruction (and can be preceded by "short", "near" or "far"
		897	operator), the operand specifying address should be the numerical expression
		898	for near or short jump, or two numerical expressions separated with colon for
		899	far jump, the first specifies selector of segment, the second is the offset
		900	within segment. The "pword" operator can be used to force the 32-bit far jump,
		901	and "dword" to force the 16-bit far jump. An indirect "jmp" instruction
		902	obtains the destination address indirectly through a register or a pointer
		903	variable, the operand should be general register or memory. See also 1.2.5 for
		904	some more details.
		905
		906
		907	jmp 0FFFFh:0 ; direct far jump
		908	jmp ax ; indirect near jump
		909	jmp pword [ebx] ; indirect far jump
		910
		911	"call" transfers control to the procedure, saving on the stack the address
		912	of the instruction following the "call" for later use by a "ret" (return)
		913	instruction. Rules for the operands are the same as for the "jmp" instruction,
		914	but the "call" has no short variant of direct instruction and thus it is not
		915	optimized.
		916	"ret", "retn" and "retf" instructions terminate the execution of a procedure
		917	and transfer control back to the program that originally invoked the
		918	procedure using the address that was stored on the stack by the "call"
		919	instruction. "ret" is the equivalent for "retn", which returns from the
		920	procedure that was executed using the near call, while "retf" returns from
		921	the procedure that was executed using the far call. These instructions default
		922	to the size of address appropriate for the current code setting, but the size
		923	of address can be forced to 16-bit by using the "retw", "retnw" and "retfw"
		924	mnemonics, and to 32-bit by using the "retd", "retnd" and "retfd" mnemonics.
		925	All these instructions may optionally specify an immediate operand, by adding
		926	this constant to the stack pointer, they effectively remove any arguments that
		927	the calling program pushed on the stack before the execution of the "call"
		928	instruction.
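
The cooperation of "call" with a "ret" that carries an immediate operand might look like the sketch below. It assumes 32-bit code and a calling convention where the procedure removes its own arguments; the procedure name "add_two" and the label "finish" are hypothetical.

```asm
use32
        push    3               ; second argument
        push    2               ; first argument
        call    add_two         ; near call, pushes the return address
        ; execution continues here with eax = 5 and both arguments removed
        jmp     finish

add_two:
        mov     eax,[esp+4]     ; first argument, just above the return address
        add     eax,[esp+8]     ; second argument
        retn    8               ; return and add 8 to esp to drop the arguments

finish:
```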
		929	"iret" returns control to an interrupted procedure. It differs from "ret" in
		930	that it also pops the flags from the stack into the flags register. The flags
		931	are stored on the stack by the interrupt mechanism. It defaults to the size of
		932	return address appropriate for the current code setting, but it can be forced
		933	to use 16-bit or 32-bit address by using the "iretw" or "iretd" mnemonic.
		934	The conditional transfer instructions are jumps that may or may not transfer
		935	control, depending on the state of the CPU flags when the instruction
		936	executes. The mnemonics for conditional jumps may be obtained by attaching
		937	the condition mnemonic (see table 2.1) to the "j" mnemonic,
		938	for example "jc" instruction will transfer the control when the CF flag is
		939	set. The conditional jumps can be short or near, and direct only, and can be
		940	optimized (see 1.2.5), the operand should be an immediate value specifying
		941	target address.
		942
		943
		944	/-----------------------------------------------------------\
		945	| Mnemonic |   Condition tested    |      Description       |
		946	|==========|=======================|========================|
		947	| o        | OF = 1                | overflow               |
		948	|----------|-----------------------|------------------------|
		949	| no       | OF = 0                | not overflow           |
		950	|----------|-----------------------|------------------------|
		951	| c        |                       | carry                  |
		952	| b        | CF = 1                | below                  |
		953	| nae      |                       | not above nor equal    |
		954	|----------|-----------------------|------------------------|
		955	| nc       |                       | not carry              |
		956	| ae       | CF = 0                | above or equal         |
		957	| nb       |                       | not below              |
		958	|----------|-----------------------|------------------------|
		959	| e        | ZF = 1                | equal                  |
		960	| z        |                       | zero                   |
		961	|----------|-----------------------|------------------------|
		962	| ne       | ZF = 0                | not equal              |
		963	| nz       |                       | not zero               |
		964	|----------|-----------------------|------------------------|
		965	| be       | CF or ZF = 1          | below or equal         |
		966	| na       |                       | not above              |
		967	|----------|-----------------------|------------------------|
		968	| a        | CF or ZF = 0          | above                  |
		969	| nbe      |                       | not below nor equal    |
		970	|----------|-----------------------|------------------------|
		971	| s        | SF = 1                | sign                   |
		972	|----------|-----------------------|------------------------|
		973	| ns       | SF = 0                | not sign               |
		974	|----------|-----------------------|------------------------|
		975	| p        | PF = 1                | parity                 |
		976	| pe       |                       | parity even            |
		977	|----------|-----------------------|------------------------|
		978	| np       | PF = 0                | not parity             |
		979	| po       |                       | parity odd             |
		980	|----------|-----------------------|------------------------|
		981	| l        | SF xor OF = 1         | less                   |
		982	| nge      |                       | not greater nor equal  |
		983	|----------|-----------------------|------------------------|
		984	| ge       | SF xor OF = 0         | greater or equal       |
		985	| nl       |                       | not less               |
		986	|----------|-----------------------|------------------------|
		987	| le       | (SF xor OF) or ZF = 1 | less or equal          |
		988	| ng       |                       | not greater            |
		989	|----------|-----------------------|------------------------|
		990	| g        | (SF xor OF) or ZF = 0 | greater                |
		991	| nle      |                       | not less nor equal     |
		992	\-----------------------------------------------------------/
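
The "above"/"below" conditions interpret the compared values as unsigned, while "greater"/"less" interpret them as signed. The following sketch shows the same comparison giving different outcomes; the values and label names are chosen only for illustration.

```asm
use32
        mov     eax,-1          ; 0FFFFFFFFh
        mov     ebx,1
        cmp     eax,ebx
        ja      unsigned_above  ; taken: 0FFFFFFFFh is above 1 as unsigned
        nop                     ; not reached in this sketch
unsigned_above:
        cmp     eax,ebx
        jg      signed_greater  ; not taken: -1 is not greater than 1 as signed
        nop                     ; reached
signed_greater:
```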
		993
		994	The "loop" instructions are conditional jumps that use a value placed in
		995	CX (or ECX) to specify the number of repetitions of a software loop. All
		996	"loop" instructions automatically decrement CX (or ECX) and terminate the
		997	loop (don't transfer the control) when CX (or ECX) is zero. It uses CX or ECX
		998	depending on whether the current code setting is 16-bit or 32-bit, but it can be
		999	forced to use CX with the "loopw" mnemonic or to use ECX with the "loopd" mnemonic.
		1000	"loope" and "loopz" are the synonyms for the same instruction, which acts as
		1001	the standard "loop", but also terminates the loop when ZF flag is set.
		1002	"loopew" and "loopzw" mnemonics force them to use CX register while "looped"
		1003	and "loopzd" force them to use ECX register. "loopne" and "loopnz" are the
		1004	synonyms for the same instruction, which acts as the standard "loop", but
		1005	also terminates the loop when ZF flag is not set. "loopnew" and "loopnzw"
		1006	mnemonics force them to use CX register while "loopned" and "loopnzd" force
		1007	them to use ECX register. Every "loop" instruction needs an operand being an
		1008	immediate value specifying target address, it can be only short jump (in the
		1009	range of 128 bytes back and 127 bytes forward from the address of instruction
		1010	following the "loop" instruction).
		1011	"jcxz" branches to the label specified in the instruction if it finds a
		1012	value of zero in CX, "jecxz" does the same, but checks the value of ECX
		1013	instead of CX. Rules for the operands are the same as for the "loop"
		1014	instruction.
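
A counted loop built from these instructions could be sketched as follows, assuming 32-bit code. The names "count" and "values" and their contents are hypothetical. The "jecxz" guard matters because "loop" decrements before testing, so entering the loop with ECX equal to zero would wrap the counter around instead of skipping the body.

```asm
use32
        mov     ecx,[count]
        mov     esi,values
        xor     eax,eax         ; running sum
        jecxz   sum_done        ; skip the loop when count is zero
sum_next:
        add     eax,[esi]
        add     esi,4
        loop    sum_next        ; decrement ecx, jump while it is not zero
sum_done:
        retn                    ; eax = 10 for the data below

count   dd 4
values  dd 1,2,3,4
```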
		1015	"int" activates the interrupt service routine that corresponds to the
		1016	number specified as an operand to the instruction, the number should be in
		1017	range from 0 to 255. The interrupt service routine terminates with an "iret"
		1018	instruction that returns control to the instruction that follows "int".
		1019	"int3" mnemonic codes the short (one byte) trap that invokes the interrupt 3.
		1020	"into" instruction invokes the interrupt 4 if the OF flag is set.
		1021	"bound" verifies that the signed value contained in the specified register
		1022	lies within specified limits. An interrupt 5 occurs if the value contained in
		1023	the register is less than the lower bound or greater than the upper bound. It
		1024	needs two operands, the first operand specifies the register being tested,
		1025	the second operand should be memory address for the two signed limit values.
		1026	The operands can be "word" or "dword" in size.
		1027
		1028
		1029	bound eax,[esi] ; check double word for bounds
		1030
		1031
		1032	2.1.7 I/O instructions
		1033
		1034	"in" transfers a byte, word, or double word from an input port to AL, AX,
		1035	or EAX. I/O ports can be addressed either directly, with the immediate byte
		1036	value coded in instruction, or indirectly via the DX register. The destination
		1037	operand should be AL, AX, or EAX register. The source operand should be an
		1038	immediate value in range from 0 to 255, or DX register.
		1039
		1040
		1041	in ax,dx ; input word from port addressed by dx
		1042
		1043	"out" transfers a byte, word, or double word to an output port from AL, AX,
		1044	or EAX. The program can specify the number of the port using the same methods
		1045	as the "in" instruction. The destination operand should be an immediate value
		1046	in range from 0 to 255, or DX register. The source operand should be AL, AX,
		1047	or EAX register.
		1048
		1049
		1050	out dx,al ; output byte to port addressed by dx
		1051
		1052
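		1053	2.1.8 Strings operations
		1054
		1055	The string operations operate on one element of a string. A string element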
		1056	may be a byte, a word, or a double word. The string elements are addressed by
		1057	SI and DI (or ESI and EDI) registers. After every string operation SI and/or
		1058	DI (or ESI and/or EDI) are automatically updated to point to the next element
		1059	of the string. If DF (direction flag) is zero, the index registers are
		1060	incremented, if DF is one, they are decremented. The amount of the increment
		1061	or decrement is 1, 2, or 4 depending on the size of the string element. Every
		1062	string operation instruction has short forms which have no operands and use
		1063	SI and/or DI when the code type is 16-bit, and ESI and/or EDI when the code
		1064	type is 32-bit. SI and ESI by default address data in the segment selected
		1065	by DS, DI and EDI always address data in the segment selected by ES. Short
		1066	form is obtained by attaching to the mnemonic of string operation letter
		1067	specifying the size of string element, it should be "b" for byte element,
		1068	"w" for word element, and "d" for double word element. Full form of string
		1069	operation needs operands providing the size operator and the memory addresses,
		1070	which can be SI or ESI with any segment prefix, DI or EDI always with ES
		1071	segment prefix.
		1072	"movs" transfers the string element pointed to by SI (or ESI) to the
		1073	location pointed to by DI (or EDI). Size of operands can be byte, word, or
		1074	double word. The destination operand should be memory addressed by DI or EDI,
		1075	the source operand should be memory addressed by SI or ESI with any segment
		1076	prefix.
		1077
		1078
		1079	movs word [es:di],[ss:si] ; transfer word
		1080	movsd ; transfer double word
		1081
		1082	"cmps" subtracts the destination string element from the source string
		1083	element and updates the flags AF, SF, PF, CF and OF, but it does not change
		1084	any of the compared elements. If the string elements are equal, ZF is set,
		1085	otherwise it is cleared. The first operand for this instruction should be the
		1086	source string element addressed by SI or ESI with any segment prefix, the
		1087	second operand should be the destination string element addressed by DI or
		1088	EDI.
		1089
		1090
		1091	cmps word [ds:si],[es:di] ; compare words
		1092	cmps dword [fs:esi],[edi] ; compare double words
		1093
		1094	"scas" subtracts the destination string element from AL, AX, or EAX
		1095	(depending on the size of string element) and updates the flags AF, SF, ZF,
		1096	PF, CF and OF. If the values are equal, ZF is set, otherwise it is cleared.
		1097	The operand should be the destination string element addressed by DI or EDI.
		1098
		1099
		1100	scasw ; scan word
		1101	scas dword [es:edi] ; scan double word
		1102
		1103	"stos" places the value of AL, AX, or EAX into the destination string
		1104	element. Rules for the operand are the same as for the "scas" instruction.
		1105	"lods" places the source string element into AL, AX, or EAX. The operand
		1106	should be the source string element addressed by SI or ESI with any segment
		1107	prefix.
		1108
		1109
		1110	lods word [cs:si] ; load word
		1111	lodsd ; load double word
		1112
		1113	"ins" transfers a byte, word, or double word from an input port addressed
		1114	by DX register to the destination string element. The destination operand
		1115	should be memory addressed by DI or EDI, the source operand should be the DX
		1116	register.
		1117
		1118
		1119	ins word [es:di],dx ; input word
		1120	ins dword [edi],dx ; input double word
		1121
		1122	"outs" transfers the source string element to an output port addressed by
		1123	DX register. The destination operand should be the DX register and the source
		1124	operand should be memory addressed by SI or ESI with any segment prefix.
		1125
		1126
		1127	outsw ; output word
		1128	outs dx,dword [gs:esi] ; output double word
		1129
		1130	The repeat prefixes "rep", "repe"/"repz", and "repne"/"repnz" specify
		1131	repeated string operation. When a string operation instruction has a repeat
		1132	prefix, the operation is executed repeatedly, each time using a different
		1133	element of the string. The repetition terminates when one of the conditions
		1134	specified by the prefix is satisfied. All three prefixes automatically
		1135	decrease CX or ECX register (depending whether string operation instruction
		1136	uses the 16-bit or 32-bit addressing) after each operation and repeat the
		1137	associated operation until CX or ECX is zero. "repe"/"repz" and
		1138	"repne"/"repnz" are used exclusively with the "scas" and "cmps" instructions
		1139	(described above). When these prefixes are used, repetition of the next
		1140	instruction depends on the zero flag (ZF) also, "repe" and "repz" terminate
		1141	the execution when the ZF is zero, "repne" and "repnz" terminate the execution
		1142	when the ZF is set.
		1143
		1144
		1145	repe cmpsb ; compare bytes until not equal
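
Two common uses of the repeat prefixes are sketched below: copying a block with "rep movsb" and finding the length of a zero-terminated string with "repne scasb". The sketch assumes 32-bit code in which DS and ES select the same segment; the names "source", "source_size" and "buffer" are hypothetical.

```asm
use32
        cld                     ; DF = 0: esi and edi are incremented
        mov     esi,source
        mov     edi,buffer
        mov     ecx,source_size
        rep     movsb           ; copy ecx bytes from [ds:esi] to [es:edi]

        mov     edi,buffer
        xor     al,al           ; look for the terminating zero byte
        mov     ecx,-1          ; no practical limit on the scan
        repne   scasb           ; stops after the byte equal to al
        not     ecx
        dec     ecx             ; ecx = number of bytes before the zero (5 here)
        retn

source  db 'hello',0
source_size = $ - source
buffer  rb source_size
```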
		1146
		1147
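		1148	2.1.9 Flag control instructions
		1149
		1150	The flag control instructions provide a method for directly changing the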
		1151	state of bits in the flag register. All instructions described in this
		1152	section have no operands.
		1153	"stc" sets the CF (carry flag) to 1, "clc" zeroes the CF, "cmc" changes the
		1154	CF to its complement. "std" sets the DF (direction flag) to 1, "cld" zeroes
		1155	the DF, "sti" sets the IF (interrupt flag) to 1 and therefore enables the
		1156	interrupts, "cli" zeroes the IF and therefore disables the interrupts.
		1157	"lahf" copies SF, ZF, AF, PF, and CF to bits 7, 6, 4, 2, and 0 of the
		1158	AH register. The contents of the remaining bits are undefined. The flags
		1159	remain unaffected.
		1160	"sahf" transfers bits 7, 6, 4, 2, and 0 from the AH register into SF, ZF,
		1161	AF, PF, and CF.
		1162	"pushf" decrements "esp" by two or four and stores the low word or
		1163	double word of flags register at the top of stack, size of stored data
		1164	depends on the current code setting. "pushfw" variant forces storing the
		1165	word and "pushfd" forces storing the double word.
		1166	"popf" transfers specific bits from the word or double word at the top
		1167	of stack, then increments "esp" by two or four, this value depends on
		1168	the current code setting. "popfw" variant forces restoring from the word
		1169	and "popfd" forces restoring from the double word.
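
One typical pairing of these instructions is to save the flags, disable the interrupts for a short section and then restore the earlier state. This is only a sketch, and it assumes the code runs at a privilege level where "cli" is permitted.

```asm
use32
        pushfd                  ; save the flags, including the current IF
        cli                     ; disable interrupts for the critical part
        ; ... code that must not be interrupted ...
        popfd                   ; restore the flags and the previous IF state
```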
		1170
		1171
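		1172	2.1.10 Conditional operations
		1173
		1174	The instructions obtained by attaching the condition mnemonic (see table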
		1175	2.1) to the "set" mnemonic set a byte to one if the condition is true and set
		1176	the byte to zero otherwise. The operand should be an 8-bit general register
		1177	or the byte in memory.
		1178
		1179
		1180	seto byte [bx] ; set byte if overflow
		1181
		1182	"salc" instruction sets all the bits of AL register when the carry flag is
		1183	set and zeroes the AL register otherwise. This instruction has no arguments.
		1184	The instructions obtained by attaching the condition mnemonic to "cmov"
		1185	mnemonic transfer the word or double word from the general register or memory
		1186	to the general register only when the condition is true. The destination
		1187	operand should be general register, the source operand can be general register
		1188	or memory.
		1189
		1190
		1191	cmovnc eax,[ebx] ; move when carry flag cleared
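
Both kinds of conditional instructions can replace a short branch. The sketch below assumes two signed values in EAX and EBX; it records the outcome of the comparison in ECX and then keeps the greater value in EAX.

```asm
use32
        xor     ecx,ecx         ; clear ecx first, since xor would overwrite the flags
        cmp     eax,ebx
        setg    cl              ; ecx = 1 if eax > ebx (signed), otherwise 0
        cmovl   eax,ebx         ; eax = the greater of the two signed values
```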
		1192
		1193	"cmpxchg" compares the value in the AL, AX, or EAX register with the
		1194	destination operand. If the two values are equal, the source operand is
		1195	loaded into the destination operand. Otherwise, the destination operand is
		1196	loaded into the AL, AX, or EAX register. The destination operand may be a
		1197	general register or memory, the source operand must be a general register.
		1198
		1199
		1200	cmpxchg [bx],dx ; compare and exchange with memory
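
A compare-and-exchange retry loop, such as an atomic increment, might be sketched as below. It relies on two facts not stated in the text above: "cmpxchg" sets ZF when the values were equal, and the "lock" prefix makes the operation atomic with respect to other processors. The variable "counter" is hypothetical.

```asm
use32
        mov     eax,[counter]   ; expected value
retry:
        lea     edx,[eax+1]     ; desired new value
        lock cmpxchg [counter],edx
        jnz     retry           ; ZF = 0: eax now holds the current value, try again
        retn

counter dd 0
```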
		1201
		1202	"cmpxchg8b" compares the 64-bit value in EDX and EAX registers with the
		1203	destination operand. If the values are equal, the 64-bit value in ECX and EBX
		1204	registers is stored in the destination operand. Otherwise, the value in the
		1205	destination operand is loaded into EDX and EAX registers. The destination
		1206	operand should be a quad word in memory.
		1207
		1208
		1209
		1210
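		1211	2.1.11 Miscellaneous instructions
		1212
		1213	"nop" instruction occupies one byte but affects nothing but the instruction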
		1214	pointer. This instruction has no operands and doesn't perform any operation.
		1215	"ud2" instruction generates an invalid opcode exception. This instruction
		1216	is provided for software testing to explicitly generate an invalid opcode.
		1217	This instruction has no operands.
		1218	"xlat" replaces a byte in the AL register with a byte indexed by its value
		1219	in a translation table addressed by BX or EBX. The operand should be a byte
		1220	memory addressed by BX or EBX with any segment prefix. This instruction has
		1221	also a short form "xlatb" which has no operands and uses the BX or EBX address
		1222	in the segment selected by DS depending on the current code setting.
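			
			A small sketch of a table lookup with "xlatb", assuming 16-bit code and a
			hypothetical "hexdigits" table placed in the segment selected by DS:
			
			    mov bx,hexdigits  ; BX = offset of the translation table
			    and al,0Fh        ; limit the index to the range 0..15
			    xlatb             ; AL = byte at DS:BX+AL, a hexadecimal digit
			
			  hexdigits db '0123456789ABCDEF'
			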
		1223	"lds" transfers a pointer variable from the source operand to DS and the
		1224	destination register. The source operand must be a memory operand, and the
		1225	destination operand must be a general register. The DS register receives the
		1226	segment selector of the pointer while the destination register receives the
		1227	offset part of the pointer. "les", "lfs", "lgs" and "lss" operate identically
		1228	to "lds" except that the ES, FS, GS and SS registers are used instead of DS,
		1229	respectively.
		1230
		1231
		1232
		1233	"lea" transfers the offset of the source operand (rather than its value)
		1234	to the destination operand. The source operand must be a memory operand, and
		1235	the destination operand must be a general register.
		1236
		1237
		1238
		1239	"cpuid" returns processor identification and feature information in the
		1240	EAX, EBX, ECX, and EDX registers. The information returned is selected by
		1241	entering a value in the EAX register before the instruction is executed.
		1242	This instruction has no operands.
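			
			For illustration only, the sketch below requests the information selected
			by the value 0 in EAX, for which the processor returns a 12-character
			vendor string in EBX, EDX and ECX (in that order). The "vendor" label is
			hypothetical.
			
			    xor eax,eax               ; select the basic information (EAX = 0)
			    cpuid
			    mov dword [vendor],ebx    ; first four characters
			    mov dword [vendor+4],edx  ; next four characters
			    mov dword [vendor+8],ecx  ; last four characters
			
			  vendor rb 12                ; hypothetical buffer for the string
			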
		1243	"pause" instruction delays the execution of the next instruction by an
		1244	implementation-specific amount of time. It can be used to improve the
		1245	performance of spin wait loops. This instruction has no operands.
		1246	"enter" creates a stack frame that may be used to implement the scope rules
		1247	of block-structured high-level languages. A "leave" instruction at the end of
		1248	a procedure complements an "enter" at the beginning of the procedure to
		1249	simplify stack management and to control access to variables for nested
		1250	procedures. The "enter" instruction includes two parameters. The first
		1251	parameter specifies the number of bytes of dynamic storage to be allocated on
		1252	the stack for the routine being entered. The second parameter corresponds to
		1253	the lexical nesting level of the routine, it can be in the range from 0 to 31.
		1254	The specified lexical level determines how many sets of stack frame pointers
		1255	the CPU copies into the new stack frame from the preceding frame. This list
		1256	of stack frame pointers is sometimes called the display. The first word (or
		1257	double word when code is 32-bit) of the display is a pointer to the last stack
		1258	frame. This pointer enables a "leave" instruction to reverse the action of the
		1259	previous "enter" instruction by effectively discarding the last stack frame.
		1260	After "enter" creates the new display for a procedure, it allocates the
		1261	dynamic storage space for that procedure by decrementing ESP by the number of
		1262	bytes specified in the first parameter. To enable a procedure to address its
		1263	display, "enter" leaves BP (or EBP) pointing to the beginning of the new stack
		1264	frame. If the lexical level is zero, "enter" pushes BP (or EBP), copies SP to
		1265	BP (or ESP to EBP) and then subtracts the first operand from ESP. For nesting
		1266	levels greater than zero, the processor pushes additional frame pointers on
		1267	the stack before adjusting the stack pointer.
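			
			A minimal sketch of a procedure frame with lexical level zero, assuming
			32-bit code; the "frame_sketch" label and the use of two local double
			words are made up for this illustration:
			
			    use32
			  frame_sketch:
			    enter 8,0             ; push ebp, copy esp to ebp, subtract 8 from esp
			    mov dword [ebp-4],1   ; first local double word
			    mov dword [ebp-8],2   ; second local double word
			    mov eax,[ebp-4]
			    add eax,[ebp-8]       ; eax = 3
			    leave                 ; discard the frame and restore ebp
			    ret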
		1268
		1269
		1270
		1271
		1272	2.1.12 System instructions
		1273
		1274	"lmsw" loads the operand into the machine status word (bits 0 through 15 of
		1275	CR0 register), while "smsw" stores the machine status word into the
		1276	destination operand. The operand for both those instructions can be 16-bit
		1277	general register or memory, for "smsw" it can also be 32-bit general
		1278	register.
		1279
		1280
		1281	smsw [bx] ; store machine status to memory
		1282
		1283	"lgdt" and "lidt" load the values in the operand into the global
		1284	descriptor table register or the interrupt descriptor table register
		1285	respectively. "sgdt" and "sidt" store the contents of the global descriptor
		1286	table register or the interrupt descriptor table register in the destination
		1287	operand. The operand should be 6 bytes in memory.
		1288
		1289
		1290
		1291	"lldt" loads the operand into the segment selector field of the local
		1292	descriptor table register and "sldt" stores the segment selector from the
		1293	local descriptor table register in the operand. "ltr" loads the operand into
		1294	the segment selector field of the task register and "str" stores the segment
		1295	selector from the task register in the operand. Rules for operand are the same
		1296	as for the "lmsw" and "smsw" instructions.
		1297	"lar" loads the access rights from the segment descriptor specified by
		1298	the selector in source operand into the destination operand and sets the ZF
		1299	flag. The destination operand can be a 16-bit or 32-bit general register.
		1300	The source operand should be a 16-bit general register or memory.
		1301
		1302
		1303	lar eax,dx ; load access rights into double word
		1304
		1305	"lsl" loads the segment limit from the segment descriptor specified by the
		1306	selector in source operand into the destination operand and sets the ZF flag.
		1307	Rules for operand are the same as for the "lar" instruction.
		1308	"verr" and "verw" verify whether the code or data segment specified with
		1309	the operand is readable or writable from the current privilege level. The
		1310	operand should be a word, it can be general register or memory. If the segment
		1311	is accessible and readable (for "verr") or writable (for "verw") the ZF flag
		1312	is set, otherwise it's cleared. Rules for operand are the same as for the
		1313	"lldt" instruction.
		1314	"arpl" compares the RPL (requestor's privilege level) fields of two segment
		1315	selectors. The first operand contains one segment selector and the second
		1316	operand contains the other. If the RPL field of the destination operand is
		1317	less than the RPL field of the source operand, the ZF flag is set and the RPL
		1318	field of the destination operand is increased to match that of the source
		1319	operand. Otherwise, the ZF flag is cleared and no change is made to the
		1320	destination operand. The destination operand can be a word general register
		1321	or memory, the source operand must be a general register.
		1322
		1323
		1324	arpl [bx],ax ; adjust RPL of selector in memory
		1325
		1326	"clts" clears the TS (task switched) flag in the CR0 register. This
		1327	instruction has no operands.
		1328	"lock" prefix causes the processor's bus-lock signal to be asserted during
		1329	execution of the accompanying instruction. In a multiprocessor environment,
		1330	the bus-lock signal ensures that the processor has exclusive use of any shared
		1331	memory while the signal is asserted. The "lock" prefix can be prepended only
		1332	to the following instructions and only to those forms of the instructions
		1333	where the destination operand is a memory operand: "add", "adc", "and", "btc",
		1334	"btr", "bts", "cmpxchg", "cmpxchg8b", "dec", "inc", "neg", "not", "or", "sbb",
		1335	"sub", "xor", "xadd" and "xchg". If the "lock" prefix is used with one of
		1336	these instructions and the source operand is a memory operand, an undefined
		1337	opcode exception may be generated. An undefined opcode exception will also be
		1338	generated if the "lock" prefix is used with any instruction not in the above
		1339	list. The "xchg" instruction always asserts the bus-lock signal regardless of
		1340	the presence or absence of the "lock" prefix.
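			
			A few hedged examples of locked read-modify-write operations; the "hits"
			and "busy" variables are hypothetical and every destination operand is in
			memory, as the rules above require:
			
			    lock inc dword [hits]     ; atomic increment of a shared counter
			    mov eax,1
			    lock xadd [hits],eax      ; atomic add, eax receives the old value
			    mov eax,1
			    xchg eax,[busy]           ; locked even without the prefix
			
			  hits dd 0
			  busy dd 0
			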
		1341	"hlt" stops instruction execution and places the processor in a halted
		1342	state. An enabled interrupt, a debug exception, the BINIT, INIT or the RESET
		1343	signal will resume execution. This instruction has no operands.
		1344	"invlpg" invalidates (flushes) the TLB (translation lookaside buffer) entry
		1345	specified with the operand, which should be in memory. The processor determines
		1346	the page that contains that address and flushes the TLB entry for that page.
		1347	"rdmsr" loads the contents of a 64-bit MSR (model specific register) of the
		1348	address specified in the ECX register into registers EDX and EAX. "wrmsr"
		1349	writes the contents of registers EDX and EAX into the 64-bit MSR of the
		1350	address specified in the ECX register. "rdtsc" loads the current value of the
		1351	processor's time stamp counter from the 64-bit MSR into the EDX and EAX
		1352	registers. The processor increments the time stamp counter MSR every clock
		1353	cycle and resets it to 0 whenever the processor is reset. "rdpmc" loads the
		1354	contents of the 40-bit performance monitoring counter specified in the ECX
		1355	register into registers EDX and EAX. These instructions have no operands.
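			
			As an illustration, the difference between two readings of the time stamp
			counter can be computed as shown below. This is only a sketch: the choice
			of ESI and EDI is arbitrary, and because the processor may reorder
			instructions around "rdtsc", the result should be treated as approximate.
			
			    rdtsc                 ; EDX:EAX = counter before
			    mov esi,eax
			    mov edi,edx
			    ; code being measured goes here, it must preserve ESI and EDI
			    rdtsc                 ; EDX:EAX = counter after
			    sub eax,esi
			    sbb edx,edi           ; EDX:EAX = number of elapsed counts
			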
		1356	"wbinvd" writes back all modified cache lines in the processor's internal
		1357	cache to main memory and invalidates (flushes) the internal caches. The
		1358	instruction then issues a special function bus cycle that directs external
		1359	caches to also write back modified data and another bus cycle to indicate that
		1360	the external caches should be invalidated. This instruction has no operands.
		1361	"rsm" returns program control from the system management mode to the program
		1362	that was interrupted when the processor received an SMM interrupt. This
		1363	instruction has no operands.
		1364	"sysenter" executes a fast call to a level 0 system procedure, "sysexit"
		1365	executes a fast return to level 3 user code. The addresses used by these
		1366	instructions are stored in MSRs. These instructions have no operands.
		1367
		1368
		1369	2.1.13 FPU instructions
		1370
		1371	The FPU (Floating-Point Unit) instructions operate on the floating-point
		1372	values in three formats: single precision (32-bit), double precision (64-bit)
		1373	and double extended precision (80-bit). The FPU registers form the stack and
		1374	each of them holds the double extended precision floating-point value. When
		1375	some values are pushed onto the stack or are removed from the top, the FPU
		1376	registers are shifted, so ST0 is always the value on the top of FPU stack, ST1
		1377	is the first value below the top, etc. The ST0 name has also the synonym ST.
		1378	"fld" pushes the floating-point value onto the FPU register stack. The
		1379	operand can be 32-bit, 64-bit or 80-bit memory location or the FPU register,
		1380	its value is then loaded onto the top of FPU register stack (the ST0
		1381	register) and is automatically converted into the double extended precision
		1382	format.
		1383
		1384
		1385	fld st2 ; push value of st2 onto register stack
		1386
		1387	"fld1", "fldz", "fldl2t", "fldl2e", "fldpi", "fldlg2" and "fldln2" load the
		1388	commonly used constants onto the FPU register stack. The loaded constants are
		1389	+1.0, +0.0, lb 10, lb e, pi, lg 2 and ln 2 respectively. These instructions
		1390	have no operands.
		1391	"fild" converts the signed integer source operand into double extended
		1392	precision floating-point format and pushes the result onto the FPU register
		1393	stack. The source operand can be a 16-bit, 32-bit or 64-bit memory location.
		1394
		1395
		1396
		1397	"fst" copies the value of ST0 register to the destination operand, which
		1398	can be 32-bit or 64-bit memory location or another FPU register. "fstp"
		1399	performs the same operation as "fst" and then pops the register stack,
		1400	getting rid of ST0. "fstp" accepts the same operands as the "fst" instruction
		1401	and can also store value in the 80-bit memory.
		1402
		1403
		1404	fstp tword [bx] ; store value in memory and pop stack
		1405
		1406	"fist" converts the value in ST0 to a signed integer and stores the result
		1407	in the destination operand. The operand can be 16-bit or 32-bit memory
		1408	location. "fistp" performs the same operation and then pops the register
		1409	stack, it accepts the same operands as the "fist" instruction and can also
		1410	store integer value in the 64-bit memory, so it has the same rules for
		1411	operands as "fild" instruction.
		1412	"fbld" converts the packed BCD integer into double extended precision
		1413	floating-point format and pushes this value onto the FPU stack. "fbstp"
		1414	converts the value in ST0 to an 18-digit packed BCD integer, stores the result
		1415	in the destination operand, and pops the register stack. The operand should be
		1416	an 80-bit memory location.
		1417	"fadd" adds the destination and source operand and stores the sum in the
		1418	destination location. The destination operand is always an FPU register, if
		1419	the source is a memory location, the destination is ST0 register and only
		1420	source operand should be specified. If both operands are FPU registers, at
		1421	least one of them should be ST0 register. An operand in memory can be a
		1422	32-bit or 64-bit value.
		1423
		1424
		1425	fadd st2,st0 ; add st0 to st2
		1426
		1427	"faddp" adds the destination and source operand, stores the sum in the
		1428	destination location and then pops the register stack. The destination operand
		1429	must be an FPU register and the source operand must be the ST0. When no
		1430	operands are specified, ST1 is used as a destination operand.
		1431
		1432
		1433	faddp st2,st0 ; add st0 to st2 and pop the stack
		1434
		1435	"fiadd" converts an integer source operand into a double extended
		1436	precision floating-point value and adds it to the destination operand. The
		1437	operand should be a 16-bit or 32-bit memory location.
		1438
		1439
		1440
		1441	"fsub", "fsubr", "fmul", "fdiv" and "fdivr" are similar to "fadd", they
		1442	have the same rules for operands and differ only in the performed computation.
		1443	"fsub" subtracts the source operand from the destination operand, "fsubr"
		1444	subtracts the destination operand from the source operand, "fmul" multiplies
		1445	the destination and source operands, "fdiv" divides the destination operand by
		1446	the source operand and "fdivr" divides the source operand by the destination
		1447	operand. "fsubp", "fsubrp", "fmulp", "fdivp", "fdivrp" perform the same
		1448	operations and pop the register stack, the rules for operand are the same as
		1449	for the "faddp" instruction. "fisub", "fisubr", "fimul", "fidiv", "fidivr"
		1450	perform these operations after converting the integer source operand into
		1451	floating-point value, they have the same rules for operands as "fiadd"
		1452	instruction.
		1453	"fsqrt" computes the square root of the value in ST0 register, "fsin"
		1454	computes the sine of that value, "fcos" computes the cosine of that value,
		1455	"fchs" complements its sign bit, "fabs" clears its sign to create the absolute
		1456	value, "frndint" rounds it to the nearest integral value, depending on the
		1457	current rounding mode. "f2xm1" computes the exponential value of 2 to the
		1458	power of ST0 and subtracts 1.0 from it, the value of ST0 must lie in the
		1459	range -1.0 to +1.0. All these instructions store the result in ST0 and have no
		1460	operands.
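			
			The load, arithmetic and store instructions described so far can be
			combined as in the following sketch, which computes the square root of
			x*x + y*y. The "x", "y" and "r" variables are hypothetical; with the
			values shown the stored result is 5.0.
			
			    fld dword [x]     ; st0 = x
			    fmul st0,st0      ; st0 = x*x
			    fld dword [y]     ; st0 = y, st1 = x*x
			    fmul st0,st0      ; st0 = y*y
			    faddp st1,st0     ; add st0 to st1 and pop, st0 = x*x + y*y
			    fsqrt             ; st0 = square root of the sum
			    fstp dword [r]    ; store the result and pop the stack
			
			  x dd 3.0
			  y dd 4.0
			  r dd ?
			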
		1461	"fsincos" computes both the sine and the cosine of the value in ST0
		1462	register, stores the sine in ST0 and pushes the cosine on the top of FPU
		1463	register stack. "fptan" computes the tangent of the value in ST0, stores the
		1464	result in ST0 and pushes a 1.0 onto the FPU register stack. "fpatan" computes
		1465	the arctangent of the value in ST1 divided by the value in ST0, stores the
		1466	result in ST1 and pops the FPU register stack. "fyl2x" computes the binary
		1467	logarithm of ST0, multiplies it by ST1, stores the result in ST1 and pops the
		1468	FPU register stack; "fyl2xp1" performs the same operation but it adds 1.0 to
		1469	ST0 before computing the logarithm. "fprem" computes the remainder obtained
		1470	from dividing the value in ST0 by the value in ST1, and stores the result
		1471	in ST0. "fprem1" performs the same operation as "fprem", but it computes the
		1472	remainder in the way specified by IEEE Standard 754. "fscale" truncates the
		1473	value in ST1 and increases the exponent of ST0 by this value. "fxtract"
		1474	separates the value in ST0 into its exponent and significand, stores the
		1475	exponent in ST0 and pushes the significand onto the register stack. "fnop"
		1476	performs no operation. These instructions have no operands.
		1477	"fxch" exchanges the contents of ST0 and another FPU register. The operand
		1478	should be an FPU register, if no operand is specified, the contents of ST0 and
		1479	ST1 are exchanged.
		1480	"fcom" and "fcomp" compare the contents of ST0 and the source operand and
		1481	set flags in the FPU status word according to the results. "fcomp"
		1482	additionally pops the register stack after performing the comparison. The
		1483	operand can be a single or double precision value in memory or the FPU
		1484	register. When no operand is specified, ST1 is used as a source operand.
		1485
		1486
		1487	fcomp st2 ; compare st0 with st2 and pop stack
		1488
		1489	"fcompp" compares the contents of ST0 and ST1, sets flags in the FPU status
		1490	word according to the results and pops the register stack twice. This
		1491	instruction has no operands.
		1492	"fucom", "fucomp" and "fucompp" perform an unordered comparison of two FPU
		1493	registers. Rules for operands are the same as for the "fcom", "fcomp" and
		1494	"fcompp", but the source operand must be an FPU register.
		1495	"ficom" and "ficomp" compare the value in ST0 with an integer source operand
		1496	and set the flags in the FPU status word according to the results. "ficomp"
		1497	additionally pops the register stack after performing the comparison. The
		1498	integer value is converted to double extended precision floating-point format
		1499	before the comparison is made. The operand should be a 16-bit or 32-bit
		1500	memory location.
		1501
		1502
		1503
		1504	"fcomi", "fcomip", "fucomi" and "fucomip" compare the contents of ST0 with
		1505	another FPU register and set the ZF, PF and CF flags according to the results.
		1506	"fcomip" and "fucomip" additionally pop the register stack after performing the
		1507	comparison. The instructions obtained by attaching the FPU condition mnemonic
		1508	(see table 2.2) to the "fcmov" mnemonic transfer the specified FPU register
		1509	into ST0 register if the given test condition is true. These instructions
		1510	allow two different syntaxes, one with single operand specifying the source
		1511	FPU register, and one with two operands, in that case destination operand
		1512	should be ST0 register and the second operand specifies the source FPU
		1513	register.
		1514
		1515
		1516	fcmovb st0,st2 ; transfer st2 to st0 if below
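			
			A hedged sketch of selecting the greater of two values with a comparison
			that sets the CPU flags followed by a conditional FPU move. The "a" and
			"b" variables are hypothetical, and the single-operand form of "fcomi" is
			assumed to be accepted.
			
			    fld qword [b]     ; st0 = b
			    fld qword [a]     ; st0 = a, st1 = b
			    fcomi st1         ; compare st0 with st1, set ZF, PF and CF
			    fcmovb st0,st1    ; if a is below b, copy b into st0
			    fstp st1          ; drop the spare value, the greater one stays in st0
			
			  a dq 1.5
			  b dq 2.5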
		1517
		1518	Table 2.2 FPU conditions
		1519	/------------------------------------------------------\
		1520	| Mnemonic | Condition tested | Description            |
		1521	|==========|==================|========================|
		1522	| b        | CF = 1           | below                  |
		1523	| e        | ZF = 1           | equal                  |
		1524	| be       | CF or ZF = 1     | below or equal         |
		1525	| u        | PF = 1           | unordered              |
		1526	| nb       | CF = 0           | not below              |
		1527	| ne       | ZF = 0           | not equal              |
		1528	| nbe      | CF and ZF = 0    | not below nor equal    |
		1529	| nu       | PF = 0           | not unordered          |
		1530	\------------------------------------------------------/
		1531
		1532	"ftst" compares the value in ST0 with 0.0 and sets the flags in the FPU
		1533	status word according to the results. "fxam" examines the contents of the ST0
		1534	and sets the flags in FPU status word to indicate the class of value in the
		1535	register. These instructions have no operands.
		1536	"fstsw" and "fnstsw" store the current value of the FPU status word in the
		1537	destination location. The destination operand can be either a 16-bit memory or
		1538	the AX register. "fstsw" checks for pending unmasked FPU exceptions before
		1539	storing the status word, "fnstsw" does not.
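			
			On processors without the instructions that set the CPU flags directly,
			the result of an FPU comparison is commonly tested through the status
			word, as in this sketch (the "st0_below" label is hypothetical). After
			"sahf" the CF, PF and ZF flags hold the condition bits of the status
			word, so the conditions of table 2.2 can be tested with ordinary jumps.
			
			    fcom st1          ; compare st0 with st1
			    fnstsw ax         ; AX = FPU status word
			    sahf              ; copy AH into the low byte of the flags register
			    jb st0_below      ; jump when st0 is below st1
			    ; reached when st0 is not below st1
			  st0_below:
			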
		1540	"fstcw" and "fnstcw" store the current value of the FPU control word at the
		1541	specified destination in memory. "fstcw" checks for pending unmasked FPU
		1542	exceptions before storing the control word, "fnstcw" does not. "fldcw" loads
		1543	the operand into the FPU control word. The operand should be a 16-bit memory
		1544	location.
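			
			One common use of these instructions is a temporary change of the
			rounding mode. The sketch below assumes the usual layout of the control
			word, in which bits 10 and 11 select the rounding mode and the value 11b
			means rounding toward zero; all variable names are hypothetical.
			
			    fnstcw [saved_cw]     ; remember the current control word
			    mov ax,[saved_cw]
			    or ax,0C00h           ; set bits 10 and 11: round toward zero
			    mov [temp_cw],ax
			    fldcw [temp_cw]       ; activate the new rounding mode
			    fistp dword [result]  ; store ST0 as integer, fraction discarded
			    fldcw [saved_cw]      ; restore the previous rounding mode
			
			  saved_cw dw ?
			  temp_cw dw ?
			  result dd ?
			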
		1545	"fstenv" and "fnstenv" store the current FPU operating environment at the
		1546	memory location specified with the destination operand, and then mask all FPU
		1547	exceptions. "fstenv" checks for pending unmasked FPU exceptions before
		1548	proceeding, "fnstenv" does not. "fldenv" loads the complete operating
		1549	environment from memory into the FPU. "fsave" and "fnsave" store the current
		1550	FPU state (operating environment and register stack) at the specified
		1551	destination in memory and reinitialize the FPU. "fsave" checks for pending
		1552	unmasked FPU exceptions before proceeding, "fnsave" does not. "frstor"
		1553	loads the FPU state from the specified memory location. All these instructions
		1554	need a memory operand. For each of these instructions there are two additional
		1555	mnemonics that allow the type of the operation to be selected precisely.
		1556	The "fstenvw", "fnstenvw", "fldenvw", "fsavew", "fnsavew" and
		1557	"frstorw" mnemonics force the instruction to perform operation as in the 16-bit
		1558	mode, while "fstenvd", "fnstenvd", "fldenvd", "fsaved", "fnsaved" and "frstord"
		1559	force the operation as in 32-bit mode.
		1560	"finit" and "fninit" set the FPU operating environment into its default
		1561	state. "finit" checks for pending unmasked FPU exceptions before proceeding,
		1562	"fninit" does not. "fclex" and "fnclex" clear the FPU exception flags in the
		1563	FPU status word. "fclex" checks for pending unmasked FPU exceptions before
		1564	proceeding, "fnclex" does not. "wait" and "fwait" are synonyms for the same
		1565	instruction, which causes the processor to check for pending unmasked FPU
		1566	exceptions and handle them before proceeding. These instructions have no
		1567	operands.
		1568	"ffree" sets the tag associated with specified FPU register to empty. The
		1569	operand should be an FPU register.
		1570	"fincstp" and "fdecstp" rotate the FPU stack by one by adding one to or
		1571	subtracting one from the top-of-stack pointer. These instructions have no
		1572	operands.
		1573
		1574
		1575	2.1.14 MMX instructions
		1576
		1577	The MMX instructions operate on the packed integer types and use the MMX
		1578	registers, which are the low 64-bit parts of the 80-bit FPU registers. Because
		1579	of this MMX instructions cannot be used at the same time as FPU instructions.
		1580	They can operate on packed bytes (eight 8-bit integers), packed words (four
		1581	16-bit integers) or packed double words (two 32-bit integers), use of packed
		1582	formats allows operations on multiple data elements to be performed at once.
		1583	"movq" copies a quad word from the source operand to the destination
		1584	operand. At least one of the operands must be a MMX register, the second one
		1585	can be also a MMX register or 64-bit memory location.
		1586
		1587
		1588	movq mm2,[ebx] ; move quad word from memory to register
		1589
		1590	"movd" copies a double word from the source operand to the destination
		1591	operand. One of the operands must be a MMX register, the second one can be a
		1592	general register or 32-bit memory location. Only low double word of MMX
		1593	register is used.
		1594	All general MMX operations have two operands, the destination operand should
		1595	be a MMX register, the source operand can be a MMX register or 64-bit memory
		1596	location. Operation is performed on the corresponding data elements of the
		1597	source and destination operand and stored in the data elements of the
		1598	destination operand. "paddb", "paddw" and "paddd" perform the addition of
		1599	packed bytes, packed words, or packed double words. "psubb", "psubw" and
		1600	"psubd" perform the subtraction of appropriate types. "paddsb", "paddsw",
		1601	"psubsb" and "psubsw" perform the addition or subtraction of packed bytes
		1602	or packed words with the signed saturation. "paddusb", "paddusw", "psubusb",
		1603	"psubusw" are analogous, but with unsigned saturation. "pmulhw" and "pmullw"
		1604	perform a signed multiplication of the packed words and store the high or low
		1605	words of the results in the destination operand. "pmaddwd" performs a multiply
		1606	of the packed words and adds the four intermediate double word products in
		1607	pairs to produce the result as packed double words. "pand", "por" and "pxor"
		1608	perform the logical operations on the quad words, "pandn" also performs a
		1609	logical negation of the destination operand before performing the "and"
		1610	operation. "pcmpeqb", "pcmpeqw" and "pcmpeqd" compare for equality of packed
		1611	bytes, packed words or packed double words. If a pair of data elements is
		1612	equal, the corresponding data element in the destination operand is filled with
		1613	bits of value 1, otherwise it's set to 0. "pcmpgtb", "pcmpgtw" and "pcmpgtd"
		1614	perform the similar operation, but they check whether the data elements in the
		1615	destination operand are greater than the corresponding data elements in the
		1616	source operand. "packsswb" converts packed signed words into packed signed
		1617	bytes, "packssdw" converts packed signed double words into packed signed
		1618	words, using saturation to handle overflow conditions. "packuswb" converts
		1619	packed signed words into packed unsigned bytes. Converted data elements from
		1620	the source operand are stored in the low part of the destination operand,
		1621	while converted data elements from the destination operand are stored in the
		1622	high part. "punpckhbw", "punpckhwd" and "punpckhdq" interleave the data
		1623	elements from the high parts of the source and destination operands and
		1624	store the result into the destination operand. "punpcklbw", "punpcklwd" and
		1625	"punpckldq" perform the same operation, but the low parts of the source and
		1626	destination operand are used.
		1627
		1628
		1629	pcmpeqw mm3,mm7 ; compare packed words for equality
		1630
		1631	"psllw", "pslld" and "psllq" perform logical shift left of the packed words,
		1632	packed double words or a single quad word in the destination operand by the
		1633	amount specified in the source operand. "psrlw", "psrld" and "psrlq" perform
		1634	logical shift right of the packed words, packed double words or a single quad
		1635	word. "psraw" and "psrad" perform arithmetic shift of the packed words or
		1636	double words. The destination operand should be a MMX register, while source
		1637	operand can be a MMX register, 64-bit memory location, or 8-bit immediate
		1638	value.
		1639
		1640
		1641	psrad mm4,[ebx] ; shift double words right arithmetically
		1642
		1643	"emms" makes the FPU registers usable for the FPU instructions, it must be
		1644	used before using the FPU instructions if any MMX instructions were used.
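			
			A short sketch of a complete MMX sequence, with hypothetical variable
			names: eight unsigned bytes are added with saturation and "emms" is
			executed at the end so that FPU instructions can be used afterwards. With
			the data shown, the stored bytes are 15, 25, 35, 45, 255, 255, 255, 255.
			
			    movq mm0,qword [pixels_a]     ; load eight unsigned bytes
			    paddusb mm0,qword [pixels_b]  ; add, results above 255 become 255
			    movq qword [pixels_sum],mm0   ; store eight results
			    emms                          ; make the FPU registers usable again
			
			  pixels_a db 10,20,30,40,200,210,220,230
			  pixels_b db 5,5,5,5,100,100,100,100
			  pixels_sum rb 8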
		1645
		1646
		1647	2.1.15 SSE instructions
		1648
		1649	The SSE extension adds more MMX instructions and also introduces the
		1650	operations on packed single precision floating point values. The 128-bit
		1651	packed single precision format consists of four single precision floating
		1652	point values. The 128-bit SSE registers are designed for the purpose of
		1653	operations on this data type.
		1654	"movaps" and "movups" transfer a double quad word operand containing packed
		1655	single precision values from source operand to destination operand. At least
		1656	one of the operands has to be a SSE register, the second one can be also a
		1657	SSE register or 128-bit memory location. Memory operands for "movaps"
		1658	instruction must be aligned on a 16-byte boundary, operands for "movups"
		1659	instruction don't have to be aligned.
		1660
		1661
		1662
		1663	"movlps" moves two packed single precision values between the memory and the
		1664	low quad word of SSE register. "movhps" moves two packed single precision
		1665	values between the memory and the high quad word of SSE register. One of the
		1666	operands must be a SSE register, and the other operand must be a 64-bit memory
		1667	location.
		1668
		1669
		1670	movhps [esi],xmm7 ; move high quad word of xmm7 to memory
		1671
		1672	"movlhps" moves two packed single precision values from the low quad word
		1673	of source register to the high quad word of destination register. "movhlps"
		1674	moves two packed single precision values from the high quad word of source
		1675	register to the low quad word of destination register. Both operands have to
		1676	be SSE registers.
		1677	"movmskps" transfers the most significant bit of each of the four single
		1678	precision values in the SSE register into low four bits of a general register.
		1679	The source operand must be a SSE register, the destination operand must be a
		1680	general register.
		1681	"movss" transfers a single precision value between source and destination
		1682	operand (only the low double word is transferred). At least one of the operands
		1683	has to be a SSE register, the second one can be also a SSE register or 32-bit
		1684	memory location.
		1685
		1686
		1687
		1688	Each of the SSE arithmetic operations has two variants. When the mnemonic
		1689	ends with "ps", the source operand can be a 128-bit memory location or a SSE
		1690	register, the destination operand must be a SSE register and the operation is
		1691	performed on packed four single precision values, for each pair of the
		1692	corresponding data elements separately, the result is stored in the
		1693	destination register. When the mnemonic ends with "ss", the source operand
		1694	can be a 32-bit memory location or a SSE register, the destination operand
		1695	must be a SSE register and the operation is performed on single precision
		1696	values, only low double words of SSE registers are used in this case, the
		1697	result is stored in the low double word of destination register. "addps" and
		1698	"addss" add the values, "subps" and "subss" subtract the source value from
		1699	destination value, "mulps" and "mulss" multiply the values, "divps" and
		1700	"divss" divide the destination value by the source value, "rcpps" and "rcpss"
		1701	compute the approximate reciprocal of the source value, "sqrtps" and "sqrtss"
		1702	compute the square root of the source value, "rsqrtps" and "rsqrtss" compute
		1703	the approximate reciprocal of square root of the source value, "maxps" and
		1704	"maxss" compare the source and destination values and return the greater one,
		1705	"minps" and "minss" compare the source and destination values and return the
		1706	lesser one.
		1707
		1708
		1709	addps xmm3,xmm7 ; add packed single precision values
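			
			A minimal sketch of a packed operation on data in memory, using the
			unaligned moves so that no alignment of the hypothetical "v1", "v2" and
			"prod" variables has to be assumed. With the values shown, the stored
			products are 0.5, 1.0, 6.0 and 8.0.
			
			    movups xmm0,dqword [v1]     ; load four single precision values
			    movups xmm1,dqword [v2]
			    mulps xmm0,xmm1             ; multiply each pair of elements
			    movups dqword [prod],xmm0   ; store four products
			
			  v1 dd 1.0,2.0,3.0,4.0
			  v2 dd 0.5,0.5,2.0,2.0
			  prod rd 4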
		1710
		1711	"andps", "andnps", "orps" and "xorps" perform the logical operations on
		1712	packed single precision values. The source operand can be a 128-bit memory
		1713	location or a SSE register, the destination operand must be a SSE register.
		1714	"cmpps" compares packed single precision values and returns a mask result
		1715	into the destination operand, which must be a SSE register. The source operand
		1716	can be a 128-bit memory location or SSE register, the third operand must be an
		1717	immediate operand selecting code of one of the eight compare conditions
		1718	(table 2.3). "cmpss" performs the same operation on single precision values,
		1719	only low double word of destination register is affected, in this case source
		1720	operand can be a 32-bit memory location or SSE register. These two
		1721	instructions have also variants with only two operands and the condition
		1722	encoded within mnemonic. Their mnemonics are obtained by attaching the
		1723	mnemonic from table 2.3 to the "cmp" mnemonic and then attaching the "ps" or
		1724	"ss" at the end.
		1725
		1726
		1727	cmpltss xmm0,[ebx] ; compare single precision values
		1728
		1729	Table 2.3 SSE conditions
		1730	/-------------------------------------------\
		1731	| Code | Mnemonic | Description             |
		1732	|======|==========|=========================|
		1733	| 0    | eq       | equal                   |
		1734	| 1    | lt       | less than               |
		1735	| 2    | le       | less than or equal      |
		1736	| 3    | unord    | unordered               |
		1737	| 4    | neq      | not equal               |
		1738	| 5    | nlt      | not less than           |
		1739	| 6    | nle      | not less than nor equal |
		1740	| 7    | ord      | ordered                 |
		1741	\-------------------------------------------/
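			
			To illustrate how the codes of table 2.3 relate to the two-operand
			mnemonics, the sketch below performs the same "less than" comparison in
			both ways; the register choice is arbitrary. Each element of the result
			is filled with ones where the condition holds and with zeros otherwise.
			
			    movaps xmm2,xmm0      ; work on a copy of xmm0
			    cmpltps xmm2,xmm1     ; condition encoded within the mnemonic
			    movaps xmm3,xmm0
			    cmpps xmm3,xmm1,1     ; the same comparison, condition given as code 1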
		1742
		1743	"comiss" and "ucomiss" compare the single precision values and set the ZF,
		1744	PF and CF flags to show the result. The destination operand must be a SSE
		1745	register, the source operand can be a 32-bit memory location or SSE register.
		1746	"shufps" moves any two of the four single precision values from the
		1747	destination operand into the low quad word of the destination operand, and any
		1748	two of the four values from the source operand into the high quad word of the
		1749	destination operand. The destination operand must be a SSE register, the
		1750	source operand can be a 128-bit memory location or SSE register, the third
		1751	operand must be an 8-bit immediate value selecting which values will be moved
		1752	into the destination operand. Bits 0 and 1 select the value to be moved from
		1753	destination operand to the low double word of the result, bits 2 and 3 select
		1754	the value to be moved from the destination operand to the second double word,
		1755	bits 4 and 5 select the value to be moved from the source operand to the third
		1756	double word, and bits 6 and 7 select the value to be moved from the source
		1757	operand to the high double word of the result.
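			
			Two hedged examples of selector values, both using the same register as
			source and destination so that all four values come from one register:
			
			    shufps xmm0,xmm0,0    ; copy the lowest value into all four positions
			    shufps xmm1,xmm1,1Bh  ; 00011011b: reverse the order of the four values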
		1758
		1759
		1760
		1761	"unpckhps" performs an interleaved unpack of the values from the high parts
		1762	of the source and destination operands and stores the result in the
		1763	destination operand, which must be a SSE register. The source operand can be
		1764	a 128-bit memory location or a SSE register. "unpcklps" performs an
		1765	interleaved unpack of the values from the low parts of the source and
		1766	destination operand and stores the result in the destination operand,
		1767	the rules for operands are the same.
		1768	"cvtpi2ps" converts packed two double word integers into the packed two
		1769	single precision floating point values and stores the result in the low quad
		1770	word of the destination operand, which should be a SSE register. The source
		1771	operand can be a 64-bit memory location or MMX register.
		1772
		1773
		1774
		1775	"cvtsi2ss" converts a double word integer into a single precision floating
		1776	point value and stores the result in the low double word of the destination
		1777	operand, which should be a SSE register. The source operand can be a 32-bit
		1778	memory location or 32-bit general register.
		1779
		1780
		1781
		1782	"cvtps2pi" converts packed two single precision floating point values into
		1783	packed two double word integers and stores the result in the destination
		1784	operand, which should be a MMX register. The source operand can be a 64-bit
		1785	memory location or SSE register, only low quad word of SSE register is used.
		1786	"cvttps2pi" performs the similar operation, except that truncation is used to
		1787	round the source values to integers, rules for the operands are the same.
		1788
		1789
		1790
		1791	"cvtss2si" converts a single precision floating point value into a double
		1792	word integer and stores the result in the destination operand, which should be
		1793	a 32-bit general register. The source operand can be a 32-bit memory location
		1794	or SSE register, only low double word of SSE register is used. "cvttss2si"
		1795	performs the similar operation, except that truncation is used to round a
		1796	source value to integer, rules for the operands are the same.
		1797
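The difference between the rounding and the truncating conversion can be
sketched as follows. The register and memory operands are arbitrary choices
made for illustration.

```asm
use32
        cvtss2si  eax,xmm0            ; rounds according to the rounding mode set in MXCSR
        cvttss2si edx,xmm0            ; always truncates toward zero
        cvttss2si ecx,dword [esi]     ; the source can also be a 32-bit memory location
```

With the default rounding mode (round to nearest), a source value of 2.75
would give 3 in the first case and 2 in the second.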
		1798
		1799
		1800	"pextrw" copies the word in the source operand specified by the third
		1801	operand to the destination operand. The source operand must be a MMX register,
		1802	the destination operand must be a 32-bit general register (the high word of
		1803	the destination is cleared), the third operand must be an 8-bit immediate value.
		1804
		1805
		1806
		1807	"pinsrw" inserts a word from the source operand into the destination operand
		1808	at the location specified with the third operand, which must be an 8-bit
		1809	immediate value. The destination operand must be a MMX register, the source
		1810	operand can be a 16-bit memory location or 32-bit general register (only low
		1811	word of the register is used).
		1812
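A minimal sketch of moving single words between general registers and an MMX
register, with the word position given by the immediate operand. The register
choices and positions are assumptions made for illustration.

```asm
use32
        pinsrw mm0,eax,3              ; low word of eax goes into the highest word of mm0
        pinsrw mm0,word [ebx],0       ; 16-bit memory operand goes into the lowest word
        pextrw ecx,mm0,3              ; read the highest word back, high word of ecx is cleared
```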
		1813
		1814
		1815	"pmaxub" returns the maximum values of packed unsigned bytes, "pminub"
		1816	returns the minimum values of packed unsigned bytes, "pmaxsw" returns the
		1817	maximum values of packed signed words, "pminsw" returns the minimum values
		1818	of packed signed words. "pmulhuw" performs an unsigned multiplication of
		1819	the packed words and stores the high words of the results in the
		1820	destination operand. "psadbw" computes the absolute differences of packed
		1821	unsigned bytes, sums the differences, and stores the sum in the low word
		1822	of the destination operand. All these instructions follow the same rules
		1823	for operands as the general MMX operations described in the previous
		1824	section.
		1825	"pmovmskb" creates a mask made of the most significant bit of each byte in
		1826	the source operand and stores the result in the low byte of destination
		1827	operand. The source operand must be a MMX register, the destination operand
		1828	must be a 32-bit general register.
		1829	"pshufw" inserts words from the source operand in the destination operand
		1830	from the locations specified with the third operand. The destination operand
		1831	must be a MMX register, the source operand can be a 64-bit memory location or
		1832	MMX register, the third operand must be an 8-bit immediate value selecting which
		1833	values will be moved into destination operand, in the similar way as the third
		1834	operand of the "shufps" instruction.
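A short sketch of "pshufw" selectors follows. Unlike "shufps", all four
two-bit fields select words of the source operand. The registers and selector
values are illustrative assumptions.

```asm
use32
        pshufw mm0,mm1,00011011b      ; words of mm1 in reversed order
        pshufw mm2,[ebx],0            ; lowest word of the memory operand replicated four times
        pshufw mm3,mm3,11100100b      ; identity selector, mm3 is left unchanged
```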
		1835	"movntq" moves the quad word from the source operand to memory using a
		1836	non-temporal hint to minimize cache pollution. The source operand should be a
		1837	MMX register, the destination operand should be a 64-bit memory location.
		1838	"movntps" stores packed single precision values from the SSE register to
		1839	memory using a non-temporal hint. The source operand should be a SSE register,
		1840	the destination operand should be a 128-bit memory location. "maskmovq" stores
		1841	selected bytes from the first operand into a 64-bit memory location using a
		1842	non-temporal hint. Both operands should be MMX registers, the second operand
		1843	selects which bytes from the source operand are written to memory. The
		1844	memory location is pointed to by the DI (or EDI) register in the segment selected
		1845	by DS.
		1846	"prefetcht0", "prefetcht1", "prefetcht2" and "prefetchnta" fetch the line
		1847	of data from memory that contains the byte specified with the operand to a
		1848	specified location in the cache hierarchy. The operand should be an 8-bit memory
		1849	location.
		1850	"sfence" performs a serializing operation on all instructions storing to
		1851	memory that were issued prior to it. This instruction has no operands.
		1852	"ldmxcsr" loads the 32-bit memory operand into the MXCSR register. "stmxcsr"
		1853	stores the contents of MXCSR into a 32-bit memory operand.
		1854	"fxsave" saves the current state of the FPU, MXCSR register, and all the FPU
		1855	and SSE registers to a 512-byte memory location specified in the destination
		1856	operand. "fxrstor" reloads data previously stored with the "fxsave" instruction
		1857	from the specified 512-byte memory location. The memory operand for both of
		1858	these instructions must be aligned on a 16-byte boundary, and it should be
		1859	an operand of no specified size.
		1860
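A minimal sketch of saving and restoring the state with "fxsave" and
"fxrstor". The label name "state" is hypothetical. It is declared with a
colon so that it carries no size, which is one way to get an operand of no
specified size.

```asm
use32
        fxsave  [state]               ; store FPU, MXCSR and SSE state
        fxrstor [state]               ; reload the state saved above
        ret

        align 16                      ; the save area must start on a 16-byte boundary
state:  rb 512                        ; 512 bytes reserved for the saved state
```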
		1861
		1862	2.1.16  SSE2 instructions
		1863
		1864	The SSE2 extension introduces the operations on packed double precision
		1865	floating point values, extends the syntax of MMX instructions, and also adds
		1866	some new instructions.
		1867	"movapd" and "movupd" transfer a double quad word operand containing packed
		1868	double precision values from source operand to destination operand. These
		1869	instructions are analogous to "movaps" and "movups" and have the same rules
		1870	for operands.
		1871	"movlpd" moves double precision value between the memory and the low quad
		1872	word of SSE register. "movhpd" moves double precision value between the memory
		1873	and the high quad word of SSE register. These instructions are analogous to
		1874	"movlps" and "movhps" and have the same rules for operands.
		1875	"movmskpd" transfers the most significant bit of each of the two double
		1876	precision values in the SSE register into low two bits of a general register.
		1877	This instruction is analogous to "movmskps" and has the same rules for
		1878	operands.
		1879	"movsd" transfers a double precision value between source and destination
		1880	operand (only the low quad word is transferred). At least one of the operands
		1881	has to be a SSE register, the second one can be also a SSE register or 64-bit
		1882	memory location.
		1883	Arithmetic operations on double precision values are: "addpd", "addsd",
		1884	"subpd", "subsd", "mulpd", "mulsd", "divpd", "divsd", "sqrtpd", "sqrtsd",
		1885	"maxpd", "maxsd", "minpd", "minsd", and they are analogous to arithmetic
		1886	operations on single precision values described in previous section. When the
		1887	mnemonic ends with "pd" instead of "ps", the operation is performed on packed
		1888	two double precision values, but rules for operands are the same. When the
		1889	mnemonic ends with "sd" instead of "ss", the source operand can be a 64-bit
		1890	memory location or a SSE register, the destination operand must be a SSE
		1891	register and the operation is performed on double precision values, only low
		1892	quad words of SSE registers are used in this case.
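The operand rules for the packed and scalar forms can be sketched like this.
The registers and addresses are arbitrary choices for illustration.

```asm
use32
        addpd  xmm0,xmm1              ; add two pairs of double precision values
        maxpd  xmm4,[esi]             ; packed form with a 128-bit memory source
        mulsd  xmm2,qword [ebx]       ; scalar form with a 64-bit memory source
        sqrtsd xmm3,xmm3              ; only the low quad word of xmm3 is changed
```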
		1893	"andpd", "andnpd", "orpd" and "xorpd" perform the logical operations on
		1894	packed double precision values. They are analogous to SSE logical operations
		1895	on single precision values and have the same rules for operands.
		1896	"cmppd" compares packed double precision values and returns a
		1897	mask result into the destination operand. This instruction is analogous to
		1898	"cmpps" and has the same rules for operands. "cmpsd" performs the same
		1899	operation on double precision values, only low quad word of destination
		1900	register is affected, in this case source operand can be a 64-bit memory or
		1901	SSE register. Variants with only two operands are obtained by attaching the
		1902	condition mnemonic from table 2.3 to the "cmp" mnemonic and then attaching
		1903	the "pd" or "sd" at the end.
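As an illustration of the two ways of writing a comparison, the first two
lines below should assemble to the same instruction, since code 1 in table
2.3 stands for "lt". The register choices are assumptions.

```asm
use32
        cmppd    xmm0,xmm1,1          ; three-operand form, condition code 1
        cmpltpd  xmm0,xmm1            ; two-operand form, condition in the mnemonic
        cmpneqsd xmm2,qword [esi]     ; scalar variant with a 64-bit memory source
```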
		1904	"comisd" and "ucomisd" compare the double precision values and set the ZF,
		1905	PF and CF flags to show the result. The destination operand must be a SSE
		1906	register, the source operand can be a 64-bit memory location or SSE register.
		1907	"shufpd" moves any of the two double precision values from the destination
		1908	operand into the low quad word of the destination operand, and any of the two
		1909	values from the source operand into the high quad word of the destination
		1910	operand. This instruction is analogous to "shufps" and has the same rules for
		1911	operands. Bit 0 of the third operand selects the value to be moved from the
		1912	destination operand, bit 1 selects the value to be moved from the source
		1913	operand, the rest of bits are reserved and must be zeroed.
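A sketch of the selector values accepted by "shufpd". The registers are
chosen arbitrarily for illustration.

```asm
use32
        shufpd xmm0,xmm0,1            ; bit 0 set, bit 1 clear: swaps the two values of xmm0
        shufpd xmm1,xmm2,0            ; low value of xmm1, then low value of xmm2
        shufpd xmm3,xmm4,3            ; high value of xmm3, then high value of xmm4
```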
		1914	"unpckhpd" performs an unpack of the high quad words from the source and
		1915	destination operands, "unpcklpd" performs an unpack of the low quad words from
		1916	the source and destination operands. They are analogous to "unpckhps" and
		1917	"unpcklps", and have the same rules for operands.
		1918	"cvtps2pd" converts the packed two single precision floating point values to
		1919	two packed double precision floating point values, the destination operand
		1920	must be a SSE register, the source operand can be a 64-bit memory location or
		1921	SSE register. "cvtpd2ps" converts the packed two double precision floating
		1922	point values to packed two single precision floating point values, the
		1923	destination operand must be a SSE register, the source operand can be a
		1924	128-bit memory location or SSE register. "cvtss2sd" converts the single
		1925	precision floating point value to double precision floating point value, the
		1926	destination operand must be a SSE register, the source operand can be a 32-bit
		1927	memory location or SSE register. "cvtsd2ss" converts the double precision
		1928	floating point value to single precision floating point value, the destination
		1929	operand must be a SSE register, the source operand can be 64-bit memory
		1930	location or SSE register.
		1931	"cvtpi2pd" converts packed two double word integers into the packed
		1932	double precision floating point values, the destination operand must be a SSE
		1933	register, the source operand can be a 64-bit memory location or MMX register.
		1934	"cvtsi2sd" converts a double word integer into a double precision floating
		1935	point value, the destination operand must be a SSE register, the source
		1936	operand can be a 32-bit memory location or 32-bit general register. "cvtpd2pi"
		1937	converts packed double precision floating point values into packed two double
		1938	word integers, the destination operand should be a MMX register, the source
		1939	operand can be a 128-bit memory location or SSE register. "cvttpd2pi" performs
		1940	the similar operation, except that truncation is used to round the source values
		1941	to integers, rules for operands are the same. "cvtsd2si" converts a double
		1942	precision floating point value into a double word integer, the destination
		1943	operand should be a 32-bit general register, the source operand can be a
		1944	64-bit memory location or SSE register. "cvttsd2si" performs the similar
		1945	operation, except that truncation is used to round a source value to integer,
		1946	rules for operands are the same.
		1947	"cvtps2dq" and "cvttps2dq" convert packed single precision floating point
		1948	values to packed four double word integers, storing them in the destination
		1949	operand. "cvtpd2dq" and "cvttpd2dq" convert packed double precision floating
		1950	point values to packed two double word integers, storing the result in the low
		1951	quad word of the destination operand. "cvtdq2ps" converts packed four
		1952	double word integers to packed single precision floating point values.
		1953	For all these instructions destination operand must be a SSE register, the
		1954	source operand can be a 128-bit memory location or SSE register.
		1955	"cvtdq2pd" converts packed two double word integers from the source operand to
		1956	packed double precision floating point values, the source can be a 64-bit
		1957	memory location or SSE register, destination has to be SSE register.
		1958	"movdqa" and "movdqu" transfer a double quad word operand containing packed
		1959	integers from source operand to destination operand. At least one of the
		1960	operands has to be a SSE register, the second one can be also a SSE register
		1961	or 128-bit memory location. Memory operands for "movdqa" instruction must be
		1962	aligned on boundary of 16 bytes, operands for "movdqu" instruction don't have
		1963	to be aligned.
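The alignment rule can be sketched as follows, assuming that ESI holds an
address that is a multiple of 16 while EDI may point anywhere.

```asm
use32
        movdqa xmm0,[esi]             ; aligned load, esi is assumed to be a multiple of 16
        movdqu xmm1,[esi+4]           ; unaligned address is acceptable for movdqu
        paddd  xmm0,xmm1              ; add packed double words
        movdqu [edi],xmm0             ; unaligned store
```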
		1964	"movq2dq" moves the contents of the MMX source register to the low quad word
		1965	of destination SSE register. "movdq2q" moves the low quad word from the source
		1966	SSE register to the destination MMX register.
		1967
		1968
		1969	movdq2q mm0,xmm1 ; move from SSE register to MMX register
		1970
		1971	All MMX instructions operating on the 64-bit packed integers (those with
		1972	mnemonics starting with "p") are extended to operate on 128-bit packed
		1973	integers located in SSE registers. Additional syntax for these instructions
		1974	needs an SSE register where MMX register was needed, and the 128-bit memory
		1975	location or SSE register where 64-bit memory location or MMX register were
		1976	needed. The exception is "pshufw" instruction, which doesn't allow extended
		1977	syntax, but has two new variants: "pshufhw" and "pshuflw", which allow only
		1978	the extended syntax, and perform the same operation as "pshufw" on the high
		1979	or low quad words of operands respectively. Also the new instruction "pshufd"
		1980	is introduced, which performs the same operation as "pshufw", but on the
		1981	double words instead of words, it allows only the extended syntax.
		1982
		1983
		1984	pextrw eax,xmm0,7 ; extract highest word into eax
		1985
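The extended syntax and the new shuffle variants can be sketched as below.
The registers and selector values are illustrative assumptions.

```asm
use32
        paddw   xmm0,xmm1             ; MMX operation on eight packed words
        paddw   xmm0,[ebx]            ; 128-bit memory source in place of 64-bit one
        pshufd  xmm2,xmm3,00011011b   ; double words of xmm3 in reversed order
        pshuflw xmm4,xmm4,0           ; lowest word replicated in the low quad word
        pshufhw xmm4,xmm4,11111111b   ; highest word replicated in the high quad word
```

"pshuflw" and "pshufhw" copy the other quad word of the source unchanged, so
the last two lines together leave only two distinct word values in xmm4.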
		1986	"paddq" performs the addition of packed quad words, "psubq" performs the
		1987	subtraction of packed quad words, "pmuludq" performs an unsigned
		1988	multiplication of low double words from each corresponding quad words and
		1989	returns the results in packed quad words. These instructions follow the same
		1990	rules for operands as the general MMX operations described in 2.1.14.
		1991	"pslldq" and "psrldq" perform logical shift left or right of the double
		1992	quad word in the destination operand by the amount of bytes specified in the
		1993	source operand. The destination operand should be a SSE register, source
		1994	operand should be an 8-bit immediate value.
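Note that the shift count of these two instructions is given in bytes, not
bits, as this small sketch with arbitrary operands shows.

```asm
use32
        pslldq xmm0,4                 ; shift the whole register left by 4 bytes (32 bits)
        psrldq xmm1,8                 ; move the high quad word into the low one, clear the high
```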
		1995	"punpckhqdq" interleaves the high quad word of the source operand and the
		1996	high quad word of the destination operand and writes them to the destination
		1997	SSE register. "punpcklqdq" interleaves the low quad word of the source operand
		1998	and the low quad word of the destination operand and writes them to the
		1999	destination SSE register. The source operand can be a 128-bit memory location
		2000	or SSE register.
		2001	"movntdq" stores packed integer data from the SSE register to memory using
		2002	non-temporal hint. The source operand should be a SSE register, the
		2003	destination operand should be a 128-bit memory location. "movntpd" stores
		2004	packed double precision values from the SSE register to memory using a
		2005	non-temporal hint. Rules for operands are the same. "movnti" stores integer
		2006	from a general register to memory using a non-temporal hint. The source
		2007	operand should be a 32-bit general register, the destination operand should
		2008	be a 32-bit memory location. "maskmovdqu" stores selected bytes from the first
		2009	operand into a 128-bit memory location using a non-temporal hint. Both
		2010	operands should be SSE registers, the second operand selects which bytes from
		2011	the source operand are written to memory. The memory location is pointed to by
		2012	the DI (or EDI) register in the segment selected by DS and does not need to be
		2013	aligned.
		2014	"clflush" writes and invalidates the cache line associated with the address
		2015	of byte specified with the operand, which should be an 8-bit memory location.
		2016	"lfence" performs a serializing operation on all instructions loading from
		2017	memory that were issued prior to it. "mfence" performs a serializing operation
		2018	on all instructions accessing memory that were issued prior to it, and so it
		2019	combines the functions of "sfence" (described in previous section) and
		2020	"lfence" instructions. These instructions have no operands.
		2021
		2022
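		2023	2.1.17  SSE3 instructions
		2024
		2025	Prescott technology introduced some new instructions to improve the performance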
		2026	of SSE and SSE2 - this extension is called SSE3.
		2027	"fisttp" behaves like the "fistp" instruction and accepts the same operands,
		2028	the only difference is that it always uses truncation, irrespective of the
		2029	rounding mode.
		2030	"movshdup" loads into destination operand the 128-bit value obtained from
		2031	the source value of the same size by filling each quad word with the two
		2032	duplicates of the value in its high double word. "movsldup" performs the same
		2033	action, except it duplicates the values of low double words. The destination
		2034	operand should be SSE register, the source operand can be SSE register or
		2035	128-bit memory location.
		2036	"movddup" loads the 64-bit source value and duplicates it into high and low
		2037	quad word of the destination operand. The destination operand should be SSE
		2038	register, the source operand can be SSE register or 64-bit memory location.
		2039	"lddqu" is functionally equivalent to "movdqu" with memory as source
		2040	operand, but it may improve performance when the source operand crosses a
		2041	cacheline boundary. The destination operand has to be SSE register, the source
		2042	operand must be 128-bit memory location.
		2043	"addsubps" performs single precision addition of second and fourth pairs and
		2044	single precision subtraction of the first and third pairs of floating point
		2045	values in the operands. "addsubpd" performs double precision addition of the
		2046	second pair and double precision subtraction of the first pair of floating
		2047	point values in the operands. "haddps" performs the addition of two single
		2048	precision values within each quad word of source and destination operands,
		2049	and stores the results of such horizontal addition of values from destination
		2050	operand into low quad word of destination operand, and the results from the
		2051	source operand into high quad word of destination operand. "haddpd" performs
		2052	the addition of two double precision values within each operand, and stores
		2053	the result from destination operand into low quad word of destination operand,
		2054	and the result from source operand into high quad word of destination operand.
		2055	All these instructions need the destination operand to be SSE register, source
		2056	operand can be SSE register or 128-bit memory location.
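One common use of the horizontal addition, shown here only as a sketch, is
summing all four values of a register. The comments name the initial values
of xmm0 as a0 to a3, which is a notation assumed for this example.

```asm
use32
        haddps xmm0,xmm0              ; xmm0 = a0+a1, a2+a3, a0+a1, a2+a3
        haddps xmm0,xmm0              ; every element now holds a0+a1+a2+a3
```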
		2057	"monitor" sets up an address range for monitoring of write-back stores. It
		2058	needs its three operands to be EAX, ECX and EDX register in that order. "mwait"
		2059	waits for a write-back store to the address range set up by the "monitor"
		2060	instruction. It uses two operands with additional parameters, first being the
		2061	EAX and second the ECX register.
		2062	The functionality of SSE3 is further extended by the set of Supplemental
		2063	SSE3 instructions (SSSE3). They generally follow the same rules for operands
		2064	as all the MMX operations extended by SSE.
		2065	"phaddw" and "phaddd" perform the horizontal addition of the pairs of
		2066	adjacent values from both the source and destination operand, and store the
		2067	sums into the destination (sums from the destination operand go into lower
		2068	part of destination register). They operate on 16-bit or 32-bit chunks,
		2069	respectively. "phaddsw" performs the same operation on signed 16-bit packed
		2070	values, but the result of each addition is saturated. "phsubw" and "phsubd"
		2071	analogously perform the horizontal subtraction of 16-bit or 32-bit packed
		2072	values, and "phsubsw" performs the horizontal subtraction of signed 16-bit
		2073	packed values with saturation.
		2074	"pabsb", "pabsw" and "pabsd" calculate the absolute value of each
		2075	packed signed value in source operand and store them into the destination
		2076	register. They operate on 8-bit, 16-bit and 32-bit elements respectively.
		2077	"pmaddubsw" multiplies signed 8-bit values from the source operand with the
		2078	corresponding unsigned 8-bit values from the destination operand to produce
		2079	intermediate 16-bit values, and every adjacent pair of those intermediate
		2080	values is then added horizontally and those 16-bit sums are stored into the
		2081	destination operand.
		2082	"pmulhrsw" multiplies corresponding 16-bit integers from the source and
		2083	destination operand to produce intermediate 32-bit values, and the 16 bits
		2084	next to the highest bit of each of those values are then rounded and packed
		2085	into the destination operand.
		2086	"pshufb" shuffles the bytes in the destination operand according to the
		2087	mask provided by source operand - each byte of the mask is the index of the
		2088	original destination byte to be placed at the corresponding position.
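As a sketch of a "pshufb" mask, the following reverses the order of the
sixteen bytes in an SSE register: result byte 0 takes index 15, byte 1 takes
index 14, and so on. The label "reverse_mask" is hypothetical, and it is
declared with a colon so that it does not force a byte size on the operand.

```asm
use32
        pshufb xmm0,[reverse_mask]    ; byte i of result = byte (15-i) of old xmm0
        ret

        align 16                      ; 128-bit memory operand should be aligned
reverse_mask:
        db 15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0
```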
		2089	"psignb", "psignw" and "psignd" perform the operation on 8-bit, 16-bit or
		2090	32-bit integers in destination operand, depending on the signs of the values
		2091	in the source. If the value in source is negative, the corresponding value in
		2092	the destination register is negated, if the value in source is positive, no
		2093	operation is performed on the corresponding value, and if the
		2094	value in source is zero, the value in destination is zeroed, too.
		2095	"palignr" appends the source operand to the destination operand to form the
		2096	intermediate value of twice the size, and then extracts into the destination
		2097	register the 64 or 128 bits that are right-aligned to the byte offset
		2098	specified by the third operand, which should be an 8-bit immediate value. This
		2099	is the only SSSE3 instruction that takes three arguments.
		2100
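A short sketch of "palignr", where the destination forms the high part and
the source the low part of the intermediate value. The registers and offsets
are arbitrary.

```asm
use32
        palignr xmm0,xmm1,4           ; bytes 4-15 of xmm1 followed by bytes 0-3 of xmm0
        palignr mm0,mm1,2             ; 64-bit form: bytes 2-7 of mm1, then bytes 0-1 of mm0
```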
		2101
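		2102	2.1.18  AMD 3DNow! instructions
		2103
		2104	The 3DNow! extension adds new MMX instructions to those described in 2.1.14,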
		2105	and introduces operation on the 64-bit packed floating point values, each
		2106	consisting of two single precision floating point values.
		2107	These instructions follow the same rules as the general MMX operations, the
		2108	destination operand should be a MMX register, the source operand can be a MMX
		2109	register or 64-bit memory location. "pavgusb" computes the rounded averages
		2110	of packed unsigned bytes. "pmulhrw" performs a signed multiplication of the
		2111	packed words, rounds the high word of each double word result and stores them
		2112	in the destination operand. "pi2fd" converts packed double word integers into
		2113	packed floating point values. "pf2id" converts packed floating point values
		2114	into packed double word integers using truncation. "pi2fw" converts packed
		2115	word integers into packed floating point values, only low words of each
		2116	double word in source operand are used. "pf2iw" converts packed floating
		2117	point values to packed word integers, results are extended to double words
		2118	using the sign extension. "pfadd" adds packed floating point values. "pfsub"
		2119	and "pfsubr" subtract packed floating point values, the first one subtracts
		2120	source values from destination values, the second one subtracts destination
		2121	values from the source values. "pfmul" multiplies packed floating point
		2122	values. "pfacc" adds the low and high floating point values of the destination
		2123	operand, storing the result in the low double word of destination, and adds
		2124	the low and high floating point values of the source operand, storing the
		2125	result in the high double word of destination. "pfnacc" subtracts the high
		2126	floating point value of the destination operand from the low, storing the
		2127	result in the low double word of destination, and subtracts the high floating
		2128	point value of the source operand from the low, storing the result in the high
		2129	double word of destination. "pfpnacc" subtracts the high floating point value
		2130	of the destination operand from the low, storing the result in the low double
		2131	word of destination, and adds the low and high floating point values of the
		2132	source operand, storing the result in the high double word of destination.
		2133	"pfmax" and "pfmin" compute the maximum and minimum of floating point values.
		2134	"pswapd" reverses the high and low double word of the source operand. "pfrcp"
		2135	returns an estimate of the reciprocal of the low floating point value from the
		2136	source operand, "pfrsqrt" returns an estimate of the reciprocal square
		2137	root of the low floating point value from the source operand, "pfrcpit1" performs
		2138	the first step in the Newton-Raphson iteration to refine the reciprocal
		2139	approximation produced by "pfrcp" instruction, "pfrsqit1" performs the first
		2140	step in the Newton-Raphson iteration to refine the reciprocal square root
		2141	approximation produced by "pfrsqrt" instruction, "pfrcpit2" performs the
		2142	second and final step in the Newton-Raphson iteration to refine the reciprocal
		2143	approximation or the reciprocal square root approximation. "pfcmpeq",
		2144	"pfcmpge" and "pfcmpgt" compare the packed floating point values and set
		2145	all bits or zero all bits of the corresponding data element in the
		2146	destination operand according to the result of comparison, the first checks
		2147	whether values are equal, second checks whether destination value is greater
		2148	or equal to source value, third checks whether destination value is greater
		2149	than source value.
		2150	"prefetch" and "prefetchw" load the line of data from memory that contains
		2151	byte specified with the operand into the data cache, "prefetchw" instruction
		2152	should be used when the data in the cache line is expected to be modified,
		2153	otherwise the "prefetch" instruction should be used. The operand should be an
		2154	8-bit memory location.
		2155	"femms" performs a fast clear of MMX state. This instruction has no
		2156	operands.
		2157
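The refinement instructions described above are meant to be used in a fixed
sequence. The following is a sketch of computing a full precision reciprocal,
following the usual pattern for these instructions. The labels "x" and "y"
and the value 3.0 are assumptions made for this example.

```asm
use32
        movd      mm0,[x]             ; low double word = x, high double word = 0
        pfrcp     mm1,mm0             ; initial estimate of 1/x in both halves of mm1
        punpckldq mm0,mm0             ; duplicate x into both halves of mm0
        pfrcpit1  mm0,mm1             ; first Newton-Raphson step
        pfrcpit2  mm0,mm1             ; second and final step, mm0 = 1/x
        movd      [y],mm0             ; store the low result
        femms                         ; clear MMX state before any FPU code
        ret

x       dd 3.0
y       dd ?
```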
		2158
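		2159	2.1.19  The x86-64 long mode instructions
		2160
		2161	The AMD64 and EM64T architectures (we will use the common name x86-64 for them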
		2162	both) extend the x86 instruction set for the 64-bit processing. While legacy
		2163	and compatibility modes use the same set of registers and instructions, the
		2164	new long mode extends the x86 operations to 64 bits and introduces several new
		2165	registers. You can turn on generating the code for this mode with the "use64"
		2166	directive.
		2167	Each of the general purpose registers is extended to 64 bits, and eight
		2168	entirely new general purpose registers and eight new SSE registers are added.
		2169	See table 2.4 for the summary of new registers (only the ones that were not
		2170	listed in table 1.2). The general purpose registers of smaller sizes are the
		2171	low order portions of the larger ones. You can still access the "ah", "bh",
		2172	"ch" and "dh" registers in long mode, but you cannot use them in the same
		2173	instruction with any of the new registers.
		2174
		2175	Table 2.4  New registers in long mode
		2176	/--------------------------------------------------\
		2177	| Type | General                   | SSE   | AVX   |
		2178	|------|---------------------------|-------|-------|
		2179	| Bits |  8   |  16  |  32  |  64  |  128  |  256  |
		2180	|======|======|======|======|======|=======|=======|
		2181	|      |      |      |      | rax  |       |       |
		2182	|      |      |      |      | rcx  |       |       |
		2183	|      |      |      |      | rdx  |       |       |
		2184	|      |      |      |      | rbx  |       |       |
		2185	|      | spl  |      |      | rsp  |       |       |
		2186	|      | bpl  |      |      | rbp  |       |       |
		2187	|      | sil  |      |      | rsi  |       |       |
		2188	|      | dil  |      |      | rdi  |       |       |
		2189	|      | r8b  | r8w  | r8d  | r8   | xmm8  | ymm8  |
		2190	|      | r9b  | r9w  | r9d  | r9   | xmm9  | ymm9  |
		2191	|      | r10b | r10w | r10d | r10  | xmm10 | ymm10 |
		2192	|      | r11b | r11w | r11d | r11  | xmm11 | ymm11 |
		2193	|      | r12b | r12w | r12d | r12  | xmm12 | ymm12 |
		2194	|      | r13b | r13w | r13d | r13  | xmm13 | ymm13 |
		2195	|      | r14b | r14w | r14d | r14  | xmm14 | ymm14 |
		2196	|      | r15b | r15w | r15d | r15  | xmm15 | ymm15 |
		2197	\--------------------------------------------------/
		2198
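The register names from table 2.4 can be used as in the sketch below. The
particular register pairs are arbitrary. The commented-out line shows the
restriction on "ah", "bh", "ch" and "dh" mentioned above.

```asm
use64
        mov    r10,rax                ; 64-bit general registers
        mov    r8d,r9d                ; 32-bit portions of new registers
        mov    r11w,cx                ; 16-bit portions
        mov    sil,dil                ; new 8-bit registers
        mov    ah,bl                  ; old high byte registers are still available
        movaps xmm8,xmm15             ; new SSE registers
        ; mov  ah,r8b                 ; not allowed: "ah" together with a new register
```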
		2199	In general any instruction from x86 architecture, which allowed 16-bit or
		2200	32-bit operand sizes, in long mode allows also the 64-bit operands. The 64-bit
		2201	registers should be used for addressing in long mode, the 32-bit addressing
		2202	is also allowed, but it's not possible to use the addresses based on 16-bit
		2203	registers. Below are the samples of new operations possible in long mode on the
		2204	example of "mov" instruction:
		2205
		2206
		2207	mov al,[rbx] ; transfer memory addressed by 64-bit register
		2208
		2209	The long mode also uses the instruction pointer based addressing, you can
		2210	specify it manually with the special RIP register symbol, but such addressing
		2211	is also automatically generated by flat assembler, since there is no 64-bit
		2212	absolute addressing in long mode. You can still force the assembler to use the
		2213	32-bit absolute addressing by putting the "dword" size override for address
		2214	inside the square brackets. There is also one exception, where the 64-bit
		2215	absolute addressing is possible, it's the "mov" instruction with one of the
		2216	operands being the accumulator register, and the second being the memory operand.
		2217	To force the assembler to use the 64-bit absolute addressing there, use the
		2218	"qword" size operator for address inside the square brackets. When no size
		2219	operator is applied to address, assembler generates the optimal form
		2220	automatically.
		2221
		2222
		2223	mov [dword 0],r15d ; absolute 32-bit addressing
		2224	mov [0],rsi ; automatic RIP-relative addressing
		2225	mov [rip+3],sil ; manual RIP-relative addressing
		2226
		2227	Also as the immediate operands for 64-bit operations only the signed 32-bit
		2228	values are possible, with the only exception being the "mov" instruction with
		2229	destination operand being 64-bit general purpose register. Trying to force the
		2230	64-bit immediate with any other instruction will cause an error.
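The rule for immediate operands can be sketched as follows. The constants are
arbitrary, and the commented-out line is expected to be rejected by the
assembler because the value does not fit into signed 32 bits.

```asm
use64
        mov rax,0123456789ABCDEFh     ; full 64-bit immediate, possible only with "mov"
        add rbx,7FFFFFFFh             ; largest positive signed 32-bit immediate
        and rcx,-16                   ; negative immediate, sign extended to 64 bits
        ; add rbx,80000000h           ; error: value out of signed 32-bit range
        mov rdx,80000000h             ; load the larger constant into a register first
        add rbx,rdx                   ; and then use the register as the source
```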
		2231	If any operation is performed on the 32-bit general registers in long mode,
		2232	the upper 32 bits of the 64-bit registers containing them are filled with
		2233	zeros. This is unlike the operations on 16-bit or 8-bit portions of those
		2234	registers, which preserve the upper bits.
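The different treatment of the upper bits can be observed with this small
sketch. The values shown in the comments follow from the rule stated above.

```asm
use64
        mov rax,-1                    ; rax = 0FFFFFFFFFFFFFFFFh
        mov eax,eax                   ; 32-bit operation: rax = 00000000FFFFFFFFh
        mov rbx,-1                    ; rbx = 0FFFFFFFFFFFFFFFFh
        mov bx,0                      ; 16-bit operation: rbx = 0FFFFFFFFFFFF0000h
```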
		2235	Three new type conversion instructions are available. The "cdqe" sign
		2236	extends the double word in EAX into quad word and stores the result in RAX
		2237	register. "cqo" sign extends the quad word in RAX into double quad word and
		2238	stores the extra bits in the RDX register. These instructions have no
		2239	operands. "movsxd" sign extends the double word source operand, being either
		2240	the 32-bit register or memory, into 64-bit destination operand, which has to
		2241	be register. No analogous instruction is needed for the zero extension, since
		2242	it is done automatically by any operations on 32-bit registers, as noted in
		2243	previous paragraph. And the "movzx" and "movsx" instructions, conforming to
		2244	the general rule, can be used with 64-bit destination operand, allowing
		2245	extension of byte or word values into quad words.
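A sketch gathering the extension variants mentioned in this paragraph. The
registers and addresses are chosen arbitrarily.

```asm
use64
        cdqe                          ; sign extend EAX into RAX
        cqo                           ; sign extend RAX into RDX:RAX
        movsxd rcx,dword [rbx]        ; sign extend 32-bit memory into 64-bit register
        movsxd rdx,eax                ; the same with a 32-bit register as the source
        movzx  r8,byte [rsi]          ; zero extend a byte into a quad word
        movsx  r9,ax                  ; sign extend a word into a quad word
        mov    r10d,edx               ; zero extension of a double word needs only "mov"
```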
		2246	All the binary arithmetic and logical instructions have been promoted to
		2247	allow 64-bit operands in long mode. The use of decimal arithmetic instructions
		2248	in long mode is prohibited.
		2249	The stack operations, like "push" and "pop", in long mode default to 64-bit
		2250	operands and it's not possible to use 32-bit operands with them. The "pusha"
		2251	and "popa" are disallowed in long mode.
		2252	The indirect near jumps and calls in long mode default to 64-bit operands
		2253	and it's not possible to use the 32-bit operands with them. On the other hand,
		2254	the indirect far jumps and calls allow any operands that were allowed by the
		2255	x86 architecture and also 80-bit memory operand is allowed (though only EM64T
		2256	seems to implement such variant), with the first eight bytes defining the
		2257	offset and two last bytes specifying the selector. The direct far jumps and
		2258	calls are not allowed in long mode.
		2259	The I/O instructions, "in", "out", "ins" and "outs" are the exceptional
		2260	instructions that are not extended to accept quad word operands in long mode.
		2261	But all other string operations are, and there are new short forms "movsq",
		2262	"cmpsq", "scasq", "lodsq" and "stosq" introduced for the variants of string
		2263	operations for 64-bit string elements. The RSI and RDI registers are used by
		2264	default to address the string elements.
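			
			For example, a block copy with the 64-bit short form could look like this,
			assuming that RSI and RDI already point to the source and destination
			buffers (the count of 16 is arbitrary):
			
			cld ; process the string upwards
			mov ecx,16 ; count of quad words, upper half of RCX is zeroed
			rep movsq ; copy 16 quad words from [RSI] to [RDI]
			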
		2265	The "lfs", "lgs" and "lss" instructions are extended to accept 80-bit source
		2266	memory operand with 64-bit destination register (though only EM64T seems to
		2267	implement such variant). The "lds" and "les" are disallowed in long mode.
		2268	The system instructions like "lgdt" which required the 48-bit memory operand,
		2269	in long mode require the 80-bit memory operand.
		2270	The "cmpxchg16b" is the 64-bit equivalent of "cmpxchg8b" instruction; it uses
		2271	the double quad word memory operand and 64-bit registers to perform the
		2272	analogous operation.
		2273	The "fxsave64" and "fxrstor64" are new variants of "fxsave" and "fxrstor"
		2274	instructions, available only in long mode, which use a different format of
		2275	storage area in order to store some pointers in full 64-bit size.
		2276	"swapgs" is the new instruction, which swaps the contents of GS register and
		2277	the KernelGSbase model-specific register (MSR address 0C0000102h).
		2278	"syscall" and "sysret" are a pair of new instructions that provide the
		2279	functionality similar to "sysenter" and "sysexit" in long mode, where the
		2280	latter pair is disallowed. The "sysexitq" and "sysretq" mnemonics provide the
		2281	64-bit versions of "sysexit" and "sysret" instructions.
		2282	The "rdmsrq" and "wrmsrq" mnemonics are the 64-bit variants of the "rdmsr"
		2283	and "wrmsr" instructions.
		2284
		2285
		2286	2.1.20 SSE4 instructions
		2287
		2288	There are actually three different sets of instructions under the name SSE4.
		2289	Intel designed two of them, SSE4.1 and SSE4.2, with latter extending the
		2290	former into the full Intel's SSE4 set. On the other hand, the implementation
		2291	by AMD includes only a few instructions from this set, but also contains
		2292	some additional instructions, that are called the SSE4a set.
		2293	The SSE4.1 instructions mostly follow the same rules for operands, as
		2294	the basic SSE operations, so they require destination operand to be SSE
		2295	register and source operand to be 128-bit memory location or SSE register,
		2296	and some operations require a third operand, the 8-bit immediate value.
		2297	"pmulld" performs a signed multiplication of the packed double words and
		2298	stores the low double words of the results in the destination operand.
		2299	"pmuldq" performs two signed multiplications of the corresponding double
		2300	words in the lower quad words of operands, and stores the results as
		2301	packed quad words into the destination register. "pminsb" and "pmaxsb"
		2302	return the minimum or maximum values of packed signed bytes, "pminuw" and
		2303	"pmaxuw" return the minimum and maximum values of packed unsigned words,
		2304	"pminud", "pmaxud", "pminsd" and "pmaxsd" return minimum or maximum values
		2305	of packed unsigned or signed double words. These instructions complement the
		2306	instructions computing packed minimum or maximum introduced by SSE.
		2307	"ptest" sets the ZF flag to one when the result of bitwise AND of
		2308	both operands is zero, and zeroes the ZF otherwise. It also sets CF flag
		2309	to one, when the result of bitwise AND of the source operand with
		2310	the bitwise NOT of the destination operand is zero, and zeroes the CF otherwise.
		2311	"pcmpeqq" compares packed quad words for equality, and fills the
		2312	corresponding elements of destination operand with either ones or zeros,
		2313	depending on the result of comparison.
		2314	"packusdw" converts packed signed double words from both the source and
		2315	destination operand into the unsigned words using saturation, and stores
		2316	the eight resulting word values into the destination register.
		2317	"phminposuw" finds the minimum unsigned word value in source operand and
		2318	places it into the lowest word of destination operand, setting the remaining
		2319	upper bits of destination to zero.
		2320	"roundps", "roundss", "roundpd" and "roundsd" perform the rounding of packed
		2321	or individual floating point value of single or double precision, using the
		2322	rounding mode specified by the third operand.
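			
			As an illustration, with the caveat that the encoding of the rounding mode
			is taken from Intel documentation and not from this manual (bits 0-1 of the
			third operand select the mode when bit 2 is zero):
			
			roundps xmm0,xmm1,00b ; round packed singles to nearest
			roundsd xmm2,xmm3,11b ; truncate one double toward zero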
		2323
		2324
		2325
		2326	"dpps" calculates dot product of packed single precision floating point
		2327	values, that is it multiplies the corresponding pairs of values from source and
		2328	destination operand and then sums the products up. The high four bits of the
		2329	8-bit immediate third operand control which products are calculated and taken
		2330	to the sum, and the low four bits control, into which elements of destination
		2331	the resulting dot product is copied (the other elements are filled with zero).
		2332	"dppd" calculates dot product of packed double precision floating point values.
		2333	The bits 4 and 5 of third operand control, which products are calculated and
		2334	added, and bits 0 and 1 of this value control, which elements in destination
		2335	register should get filled with the result. "mpsadbw" calculates multiple sums
		2336	of absolute differences of unsigned bytes. The third operand controls, with
		2337	value in bits 0-1, which of the four-byte blocks in source operand is taken to
		2338	calculate the absolute differences, and with value in bit 2, at which of the
		2339	first two four-byte blocks in destination operand to start calculating multiple
		2340	sums. The sum is calculated from four absolute differences between the
		2341	corresponding unsigned bytes in the source and destination block, and each next
		2342	sum is calculated in the same way, but taking the four bytes from destination
		2343	at the position one byte after the position of previous block. The four bytes
		2344	from the source stay the same each time. This way eight sums of absolute
		2345	differences are calculated and stored as packed word values into the
		2346	destination operand. The instructions described in this paragraph follow the
		2347	same rules for operands, as "roundps" instruction.
		2348	"blendps", "blendvps", "blendpd" and "blendvpd" conditionally copy the
		2349	values from source operand into the destination operand, depending on the bits
		2350	of the mask provided by third operand. If a mask bit is set, the corresponding
		2351	element of source is copied into the same place in destination, otherwise this
		2352	position in destination is left unchanged. The rules for the first two operands
		2353	are the same, as for general SSE instructions. "blendps" and "blendpd" need
		2354	third operand to be 8-bit immediate, and they operate on single or double
		2355	precision values, respectively. "blendvps" and "blendvpd" require third operand
		2356	to be the XMM0 register.
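			
			Both forms of the mask can be sketched as follows (the registers and the mask
			value are arbitrary choices for illustration):
			
			blendps xmm3,xmm7,1001b ; copy the first and the fourth element
			blendvps xmm3,xmm7,xmm0 ; blend according to mask in XMM0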
		2357
		2358
		2359
		2360	"pblendw" conditionally copies word elements from the source operand into the
		2361	destination, depending on the bits of mask provided by third operand, which
		2362	needs to be 8-bit immediate value. "pblendvb" conditionally copies byte
		2363	elements from the source operands into destination, depending on mask defined
		2364	by the third operand, which has to be XMM0 register. These instructions follow
		2365	the same rules for operands as "blendps" and "blendvps" instructions,
		2366	respectively.
		2367	"insertps" inserts a single precision floating point value taken from the
		2368	position in source operand specified by bits 6-7 of third operand into location
		2369	in destination register selected by bits 4-5 of third operand. Additionally,
		2370	the low four bits of third operand control, which elements in destination
		2371	register will be set to zero. The first two operands follow the same rules as
		2372	for the general SSE operation, the third operand should be 8-bit immediate.
		2373	"extractps" extracts a single precision floating point value taken from the
		2374	location in source operand specified by low two bits of third operand, and
		2375	stores it into the destination operand. The destination can be a 32-bit memory
		2376	value or general purpose register, the source operand must be SSE register,
		2377	and the third operand should be 8-bit immediate value.
		2378
		2379
		2380
		2381	"pinsrb", "pinsrd" and "pinsrq" copy a byte, double word or quad word from
		2382	the source operand into the location of destination operand determined by the
		2383	third operand. The destination operand has to be SSE register, the source
		2384	operand can be a memory location of appropriate size, or the 32-bit general
		2385	purpose register (but 64-bit general purpose register for "pinsrq", which is
		2386	only available in long mode), and the third operand has to be 8-bit immediate
		2387	value. These instructions complement the "pinsrw" instruction operating on SSE
		2388	register destination, which was introduced by SSE2.
		2389
		2390
		2391
		2392	"pextrb", "pextrw", "pextrd" and "pextrq" copy a byte, word, double word or
		2393	quad word from the location in source operand specified by third operand, into
		2394	the destination. The source operand should be SSE register, the third operand
		2395	should be 8-bit immediate, and the destination operand can be memory location
		2396	of appropriate size, or the 32-bit general purpose register (but 64-bit general
		2397	purpose register for "pextrq", which is only available in long mode). The
		2398	"pextrw" instruction with SSE register as source was already introduced by
		2399	SSE2, but SSE4 extends it to allow memory operand as destination.
		2400
		2401
		2402
		2403	"pmovsxbw" and "pmovzxbw" perform sign extension or zero extension of eight
		2404	byte values from the source operand into packed word values in destination
		2405	operand, which has to be SSE register. The source can be 64-bit memory or SSE
		2406	register - when it is register, only its low portion is used. "pmovsxbd" and
		2407	"pmovzxbd" perform sign extension or zero extension of the four byte values
		2408	from the source operand into packed double word values in destination operand,
		2409	the source can be 32-bit memory or SSE register. "pmovsxbq" and "pmovzxbq"
		2410	perform sign extension or zero extension of the two byte values from the
		2411	source operand into packed quad word values in destination operand, the source
		2412	can be 16-bit memory or SSE register. "pmovsxwd" and "pmovzxwd" perform sign
		2413	extension or zero extension of the four word values from the source operand
		2414	into packed double words in destination operand, the source can be 64-bit
		2415	memory or SSE register. "pmovsxwq" and "pmovzxwq" perform sign extension or
		2416	zero extension of the two word values from the source operand into packed quad
		2417	words in destination operand, the source can be 32-bit memory or SSE register.
		2418	"pmovsxdq" and "pmovzxdq" perform sign extension or zero extension of the two
		2419	double word values from the source operand into packed quad words in
		2420	destination operand, the source can be 64-bit memory or SSE register.
		2421
		2422
		2423	pmovsxwq xmm0,xmm1 ; sign-extend words to quad words
		2424
		2425	"movntdqa" loads double quad word from the source operand to the destination
		2426	using a non-temporal hint. The destination operand should be SSE register,
		2427	and the source operand should be 128-bit memory location.
		2428	The SSE4.2, described below, adds not only some new operations on SSE
		2429	registers, but also introduces some completely new instructions operating on
		2430	general purpose registers only.
		2431	"pcmpistri" compares two zero-ended (implicit length) strings provided in
		2432	its source and destination operand and generates an index stored to ECX;
		2433	"pcmpistrm" performs the same comparison and generates a mask stored to XMM0.
		2434	"pcmpestri" compares two strings of explicit lengths, with length provided
		2435	in EAX for the destination operand and in EDX for the source operand, and
		2436	generates an index stored to ECX; "pcmpestrm" performs the same comparison
		2437	and generates a mask stored to XMM0. The source and destination operand follow
		2438	the same rules as for general SSE instructions, the third operand should be
		2439	8-bit immediate value determining the details of performed operation - refer to
		2440	Intel documentation for information on those details.
		2441	"pcmpgtq" compares packed quad words, and fills the corresponding elements of
		2442	destination operand with either ones or zeros, depending on whether the value
		2443	in destination is greater than the one in source, or not. This instruction
		2444	follows the same rules for operands as "pcmpeqq".
		2445	"crc32" accumulates a CRC32 value for the source operand starting with
		2446	initial value provided by destination operand, and stores the result in
		2447	destination. Unless in long mode, the destination operand should be a 32-bit
		2448	general purpose register, and the source operand can be a byte, word, or double
		2449	word register or memory location. In long mode the destination operand can
		2450	also be a 64-bit general purpose register, and the source operand in such case
		2451	can be a byte or quad word register or memory location.
		2452
		2453
		2454	crc32 eax,word [ebx] ; accumulate CRC32 on word value
		2455	crc32 rax,qword [rbx] ; accumulate CRC32 on quad word value
		2456
		2457	"popcnt" calculates the number of bits set in the source operand, which can
		2458	be 16-bit, 32-bit, or 64-bit general purpose register or memory location,
		2459	and stores this count in the destination operand, which has to be register of
		2460	the same size as source operand. The 64-bit variant is available only in long
		2461	mode.
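			
			A minimal sketch of the bit counting instruction, with arbitrarily chosen
			operands:
			
			popcnt ecx,eax ; count bits set in EAX
			popcnt rdx,qword [rbx] ; 64-bit variant, long mode only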
		2462
		2463
		2464
		2465	The SSE4a extension, which also includes the "popcnt" instruction introduced
		2466	by SSE4.2, at the same time adds the "lzcnt" instruction, which follows the
		2467	same syntax, and calculates the count of leading zero bits in source operand
		2468	(if the source operand is all zero bits, the total number of bits in source
		2469	operand is stored in destination).
		2470	"extrq" extracts the sequence of bits from the low quad word of SSE register
		2471	provided as first operand and stores them at the low end of this register,
		2472	filling the remaining bits in the low quad word with zeros. The position of bit
		2473	string and its length can either be provided with two 8-bit immediate values
		2474	as second and third operand, or by SSE register as second operand (and there
		2475	is no third operand in such case), which should contain position value in bits
		2476	8-13 and length of bit string in bits 0-5.
		2477
		2478
		2479	extrq xmm0,xmm5 ; extract bits defined by register
		2480
		2481	"insertq" writes the sequence of bits from the low quad word of the source
		2482	operand into specified position in low quad word of the destination operand,
		2483	leaving the other bits in low quad word of destination intact. The position
		2484	where bits should be written and the length of bit string can either be
		2485	provided with two 8-bit immediate values as third and fourth operand, or by
		2486	the bit fields in source operand (and there are only two operands in such
		2487	case), which should contain position value in bits 72-77 and length of bit
		2488	string in bits 64-69.
		2489
		2490
		2491	insertq xmm1,xmm0 ; insert bits defined by register
		2492
		2493	"movntss" and "movntsd" store single or double precision floating point
		2494	value from the source SSE register into 32-bit or 64-bit destination memory
		2495	location respectively, using non-temporal hint.
		2496
		2497
		2498	2.1.21 AVX instructions
		2499
		2500	The Advanced Vector Extensions introduce instructions that are new variants
		2501	of SSE instructions, with new scheme of encoding that allows extended syntax
		2502	having a destination operand separate from all the source operands. It also
		2503	introduces 256-bit AVX registers, which extend up the old 128-bit SSE
		2504	registers. Any AVX instruction that puts some result into SSE register, puts
		2505	zero bits into high portion of the AVX register containing it.
		2506	The AVX version of SSE instruction has the mnemonic obtained by prepending
		2507	SSE instruction name with "v". For any SSE arithmetic instruction which had a
		2508	destination operand also being used as one of the source values, the AVX
		2509	variant has a new syntax with three operands - the destination and two sources.
		2510	The destination and first source can be SSE registers, and second source can be
		2511	SSE register or memory. If the operation is performed on single pair of values,
		2512	the remaining bits of first source SSE register are copied into the
		2513	destination register.
		2514
		2515
		2516	vmulsd xmm0,xmm7,qword [esi] ; multiply two 64-bit floats
		2517
		2518	In case of packed operations, each instruction can also operate on the 256-bit
		2519	data size when the AVX registers are specified instead of SSE registers, and
		2520	the size of memory operand is also doubled then.
		2521
		2522
		2523
		2524	The instructions that operate on packed integer types (in particular the ones
		2525	that earlier had been promoted from MMX to SSE) also acquired the new syntax
		2526	with three operands, however they are only allowed to operate on 128-bit
		2527	packed types and thus cannot use the whole AVX registers.
		2528
		2529
		2530	vpslld xmm1,xmm0,1 ; shift double words left
		2531
		2532	If the SSE version of instruction had a syntax with three operands, the third
		2533	one being an immediate value, the AVX version of such instruction takes four
		2534	operands, with immediate remaining the last one.
		2535
		2536
		2537	vpalignr xmm0,xmm4,xmm2,3 ; extract byte aligned value
		2538
		2539	The promotion to new syntax according to the rules described above has been
		2540	applied to all the instructions from SSE extensions up to SSE4, with the
		2541	exceptions described below.
		2542	"vdppd" instruction has syntax extended to four operands, but it does not
		2543	have a 256-bit version.
		2544	There are a few instructions, namely "vsqrtpd", "vsqrtps", "vrcpps" and
		2545	"vrsqrtps", which can operate on 256-bit data size, but retained the syntax
		2546	with only two operands, because they use data from only one source:
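			
			For instance (the registers and the address chosen here are arbitrary):
			
			vsqrtpd ymm1,ymm0 ; put square roots into other register
			vrsqrtps xmm2,[ebx] ; 128-bit form with memory source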
		2547
		2548
		2549
		2550
		2551	operands, the last one being immediate value.
		2552
		2553
		2554
		2555
		2556	three-operand syntax while being promoted to AVX version. In such case these
		2557	instructions follow exactly the same rules for operands as their SSE
		2558	counterparts (since operations on packed integers do not have 256-bit variants
		2559	in AVX extension). These include "vpcmpestri", "vpcmpestrm", "vpcmpistri",
		2560	"vpcmpistrm", "vphminposuw", "vpshufd", "vpshufhw", "vpshuflw". And there are
		2561	more instructions that in AVX versions keep exactly the same syntax for
		2562	operands as the one from SSE, without any additional options: "vcomiss",
		2563	"vcomisd", "vcvtss2si", "vcvtsd2si", "vcvttss2si", "vcvttsd2si", "vextractps",
		2564	"vpextrb", "vpextrw", "vpextrd", "vpextrq", "vmovd", "vmovq", "vmovntdqa",
		2565	"vmaskmovdqu", "vpmovmskb", "vpmovsxbw", "vpmovsxbd", "vpmovsxbq", "vpmovsxwd",
		2566	"vpmovsxwq", "vpmovsxdq", "vpmovzxbw", "vpmovzxbd", "vpmovzxbq", "vpmovzxwd",
		2567	"vpmovzxwq" and "vpmovzxdq".
		2568	The move and conversion instructions have mostly been promoted to allow
		2569	256-bit size operands in addition to the 128-bit variant with syntax identical
		2570	to that from SSE version of the same instruction. Each of the "vcvtdq2ps",
		2571	"vcvtps2dq" and "vcvttps2dq", "vmovaps", "vmovapd", "vmovups", "vmovupd",
		2572	"vmovdqa", "vmovdqu", "vlddqu", "vmovntps", "vmovntpd", "vmovntdq",
		2573	"vmovsldup", "vmovshdup", "vmovmskps" and "vmovmskpd" inherits the 128-bit
		2574	syntax from SSE without any changes, and also allows a new form with 256-bit
		2575	operands in place of 128-bit ones.
		2576
		2577
		2578
		2579	"vmovddup" has the identical 128-bit syntax as its SSE version, and it also
		2580	has a 256-bit version, which stores the duplicates of the lowest quad word
		2581	from the source operand in the lower half of destination operand, and in the
		2582	upper half of destination the duplicates of the low quad word from the upper
		2583	half of source. Both source and destination operands need then to be 256-bit
		2584	values.
		2585	"vmovlhps" and "vmovhlps" have only 128-bit versions, and each takes three
		2586	operands, which all must be SSE registers. "vmovlhps" copies two single
		2587	precision values from the low quad word of second source register to the high
		2588	quad word of destination register, and copies the low quad word of first
		2589	source register into the low quad word of destination register. "vmovhlps"
		2590	copies two single precision values from the high quad word of second source
		2591	register to the low quad word of destination register, and copies the high
		2592	quad word of first source register into the high quad word of destination
		2593	register.
		2594	"vmovlps", "vmovhps", "vmovlpd" and "vmovhpd" have only 128-bit versions and
		2595	their syntax varies depending on whether memory operand is a destination or
		2596	source. When memory is destination, the syntax is identical to the one of
		2597	equivalent SSE instruction, and when memory is source, the instruction requires
		2598	three operands, first two being SSE registers and the third one 64-bit memory.
		2599	The value put into destination is then the value copied from first source with
		2600	either low or high quad word replaced with value from second source (the
		2601	memory operand).
		2602
		2603
		2604	vmovlps xmm0,xmm7,[ebx] ; low from memory, rest from register
		2605
		2606	"vmovss" and "vmovsd" have syntax identical to their SSE equivalents as long
		2607	as one of the operands is memory, while the versions that operate purely on
		2608	registers require three operands (each being SSE register). The value stored
		2609	in destination is then the value copied from first source with lowest data
		2610	element replaced with the lowest value from second source.
		2611
		2612
		2613	vmovss xmm0,xmm1,xmm2 ; one value from xmm2, three from xmm1
		2614
		2615	"vcvtss2sd", "vcvtsd2ss", "vcvtsi2ss" and "vcvtsi2sd" use the three-operand
		2616	syntax, where destination and first source are always SSE registers, and the
		2617	second source follows the same rules as the source in syntax of equivalent
		2618	SSE instruction. The value stored in destination is then the value copied from
		2619	first source with lowest data element replaced with the result of conversion.
		2620
		2621
		2622	vcvtsi2ss xmm0,xmm0,rax ; 64-bit integer to 32-bit float
		2623
		2624	"vcvtdq2pd" and "vcvtps2pd" allow the same syntax as their SSE equivalents,
		2625	plus the new variants with AVX register as destination and SSE register or
		2626	128-bit memory as source. Analogously "vcvtpd2dq", "vcvttpd2dq" and
		2627	"vcvtpd2ps", in addition to variant with syntax identical to SSE version,
		2628	allow a variant with SSE register as destination and AVX register or 256-bit
		2629	memory as source.
		2630	"vinsertps", "vpinsrb", "vpinsrw", "vpinsrd", "vpinsrq" and "vpblendw" use
		2631	a syntax with four operands, where destination and first source have to be SSE
		2632	registers, and the third and fourth operand follow the same rules as second
		2633	and third operand in the syntax of equivalent SSE instruction. Value stored in
		2634	destination is the value copied from first source with some data elements
		2635	replaced with values extracted from the second source, analogously to the
		2636	operation of corresponding SSE instruction.
		2637
		2638
		2639
		2640	"vblendvps", "vblendvpd" and "vpblendvb" use a new syntax with four register
		2641	operands: destination, two sources and a mask, where second source can also be
		2642	a memory operand. "vblendvps" and "vblendvpd" have 256-bit variant, where
		2643	operands are AVX registers or 256-bit memory, as well as 128-bit variant,
		2644	which has operands being SSE registers or 128-bit memory. "vpblendvb" has only
		2645	a 128-bit variant. Value stored in destination is the value copied from the
		2646	first source with some data elements replaced, according to mask, by values
		2647	from the second source.
		2648
		2649
		2650
		2651	"vptest" allows the same syntax as its SSE version and also has a 256-bit
		2652	version, with both operands doubled in size. There are also two new
		2653	instructions, "vtestps" and "vtestpd", which perform analogous tests, but only
		2654	of the sign bits of corresponding single precision or double precision values,
		2655	and set the ZF and CF accordingly. They follow the same syntax rules as
		2656	"vptest".
		2657
		2658
		2659	vtestpd xmm0,xmm1 ; test sign bits of 64-bit floats
		2660
		2661	"vbroadcastss", "vbroadcastsd" and "vbroadcastf128" are new instructions,
		2662	which broadcast the data element defined by source operand into all elements
		2663	of corresponding size in the destination register. "vbroadcastss" needs
		2664	source to be 32-bit memory and destination to be either SSE or AVX register.
		2665	"vbroadcastsd" requires 64-bit memory as source, and AVX register as
		2666	destination. "vbroadcastf128" requires 128-bit memory as source, and AVX
		2667	register as destination.
		2668
		2669
		2670
		2671	"vinsertf128" is the new instruction, which takes four operands. The
		2672	destination and first source have to be AVX registers, second source can be
		2673	SSE register or 128-bit memory location, and fourth operand should be an
		2674	immediate value. It stores in destination the value obtained by taking
		2675	contents of first source and replacing one of its 128-bit units with value of
		2676	the second source. The lowest bit of fourth operand specifies at which
		2677	position that replacement is done (either 0 or 1).
		2678	"vextractf128" is the new instruction with three operands. The destination
		2679	needs to be SSE register or 128-bit memory location, the source must be AVX
		2680	register, and the third operand should be an immediate value. It extracts
		2681	into destination one of the 128-bit units from source. The lowest bit of third
		2682	operand specifies, which unit is extracted.
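			
			A sketch of moving 128-bit units in and out of AVX registers (the registers
			and the memory address are arbitrary choices for illustration):
			
			vinsertf128 ymm0,ymm1,xmm2,1 ; ymm1 with upper unit replaced by xmm2
			vextractf128 xmm3,ymm0,1 ; extract upper 128 bits of ymm0
			vextractf128 [ebx],ymm0,0 ; store lower 128 bits into memory
			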
		2683	"vmaskmovps" and "vmaskmovpd" are the new instructions with three operands
		2684	that selectively store in destination the elements from second source
		2685	depending on the sign bits of corresponding elements from first source. These
		2686	instructions can operate on either 128-bit data (SSE registers) or 256-bit
		2687	data (AVX registers). Either destination or second source has to be a memory
		2688	location of appropriate size, the two other operands should be registers.
		2689
		2690
		2691	vmaskmovpd ymm5,ymm0,[esi] ; conditionally load
		2692
		2693	"vpermilpd" and "vpermilps" are the new instructions with three operands
		2694	that permute the values from first source according to the control fields from
		2695	second source and put the result into destination operand. It allows to use
		2696	either three SSE registers or three AVX registers as its operands, the second
		2697	source can be a memory of size equal to the registers used. In alternative
		2698	form the second source can be immediate value and then the first source
		2699	can be a memory location of the size equal to destination register.
		2700	"vperm2f128" is the new instruction with four operands, which selects
		2701	128-bit blocks of floating point data from first and second source according
		2702	to the bit fields from fourth operand, and stores them in destination.
		2703	Destination and first source need to be AVX registers, second source can be
		2704	AVX register or 256-bit memory area, and fourth operand should be an immediate
		2705	value.
		2706
		2707
		2708
		2709	"vzeroall" instruction sets all the AVX registers to zero. "vzeroupper" sets
		2710	the upper 128-bit portions of all AVX registers to zero, leaving the SSE
		2711	registers intact. These new instructions take no operands.
		2712	"vldmxcsr" and "vstmxcsr" are the AVX versions of "ldmxcsr" and "stmxcsr"
		2713	instructions. The rules for their operands remain unchanged.
		2714
		2715
		2716	2.1.22 AVX2 instructions
		2717
		2718	The AVX2 extension allows the AVX instructions operating on packed integers
		2719	to use 256-bit data types, and introduces some new instructions as well.
		2720	The AVX instructions that operate on packed integers and had only 128-bit
		2721	variants, have been supplemented with 256-bit variants, and thus their syntax
		2722	rules became analogous to AVX instructions operating on packed floating point
		2723	types.
		2724
		2725
		2726	vpavgw ymm3,ymm0,ymm2 ; average of 16-bit integers
		2727
		2728	However there are some instructions that have not been equipped with the
		2729	256-bit variants. "vpcmpestri", "vpcmpestrm", "vpcmpistri", "vpcmpistrm",
		2730	"vpextrb", "vpextrw", "vpextrd", "vpextrq", "vpinsrb", "vpinsrw", "vpinsrd",
		2731	"vpinsrq" and "vphminposuw" are not affected by AVX2 and allow only the
		2732	128-bit operands.
		2733	The packed shift instructions, which allowed the third operand specifying
		2734	amount to be SSE register or 128-bit memory location, use the same rules
		2735	for the third operand in their 256-bit variant.
		2736
		2737
		2738	vpsrad ymm0,ymm3,xword [ebx] ; shift double words right
		2739
		2740	There are also new packed shift instructions with standard three-operand
		2741	syntax, which shift each element from first source by the amount specified in
		2742	corresponding element of second source, and store the results in destination.
		2743	"vpsllvd" shifts 32-bit elements left, "vpsllvq" shifts 64-bit elements left,
		2744	"vpsrlvd" shifts 32-bit elements right logically, "vpsrlvq" shifts 64-bit
		2745	elements right logically and "vpsravd" shifts 32-bit elements right
		2746	arithmetically.
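			
			For illustration, two of these per-element shifts with arbitrarily chosen
			operands:
			
			vpsllvd ymm0,ymm1,ymm2 ; shift double words left by counts from ymm2
			vpsravd xmm3,xmm4,[ebx] ; counts taken from 128-bit memory
			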
		2747	The sign-extend and zero-extend instructions, which in AVX versions allowed
		2748	source operand to be SSE register or a memory of specific size, in the new
		2749	256-bit variant need memory of that size doubled or SSE register as source and
		2750	AVX register as destination.
		2751
		2752
		2753
		2754	"vmovntdqa" has been extended with a 256-bit variant, so it can be used to
		2755	transfer 256-bit value from memory to AVX register, it needs memory address
		2756	to be aligned to 32 bytes.
		2757	"vpmaskmovd" and "vpmaskmovq" are the new instructions with syntax identical
		2758	to "vmaskmovps" or "vmaskmovpd", and they perform analogous operation on
		2759	packed 32-bit or 64-bit values.
		2760	"vinserti128", "vextracti128", "vbroadcasti128" and "vperm2i128" are the new
		2761	instructions with syntax identical to "vinsertf128", "vextractf128",
		2762	"vbroadcastf128" and "vperm2f128" respectively, and they perform analogous
		2763	operations on 128-bit blocks of integer data.
		2764	"vbroadcastss" and "vbroadcastsd" instructions have been extended to allow
		2765	SSE register as a source operand (which in AVX could only be a memory).
		2766	"vpbroadcastb", "vpbroadcastw", "vpbroadcastd" and "vpbroadcastq" are the
		2767	new instructions which broadcast the byte, word, double word or quad word from
		2768	the source operand into all elements of corresponding size in the destination
		2769	register. The destination operand can be either SSE or AVX register, and the
		2770	source operand can be SSE register or memory of size equal to the size of data
		2771	element.
		2772
		2773
		2774
		2775	"vpermd" and "vpermps" are new three-operand instructions, which use each
		2776	32-bit element from first source as an index of element in second source which
		2777	is copied into destination at position corresponding to element containing
		2778	index. The destination and first source have to be AVX registers, and the
		2779	second source can be AVX register or 256-bit memory.
		2780	"vpermq" and "vpermpd" are new three-operand instructions, which use 2-bit
		2781	indexes from the immediate value specified as third operand to determine which
		2782	element from source to store at given position in destination. The destination
		2783	has to be AVX register, source can be AVX register or 256-bit memory, and the
		2784	third operand must be 8-bit immediate value.
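			
			As a sketch, assuming (after Intel documentation, not this manual) that the
			index for the lowest element of destination occupies bits 0-1 of the third
			operand and the following indexes occupy the successive pairs of bits:
			
			vpermq ymm0,ymm1,00011011b ; reverse the order of quad words
			vpermpd ymm2,[ebx],0 ; copy the lowest double to all positions
			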
		2785	The family of new instructions performing "gather" operation have special
		2786	syntax, as in their memory operand they use addressing mode that is unique to
		2787	them. The base of address can be a 32-bit or 64-bit general purpose register
		2788	(the latter only in long mode), and the index (possibly multiplied by scale
		2789	value, as in standard addressing) is specified by SSE or AVX register. It is
		2790	possible to use only index without base and any numerical displacement can be
		2791	added to the address. Each of those instructions takes three operands. First
		2792	operand is the destination register, second operand is memory addressed with
		2793	a vector index, and third operand is register containing a mask. The most
		2794	significant bit of each element of mask determines whether a value will be
		2795	loaded from memory into corresponding element in destination. The address of
		2796	each element to load is determined by using the corresponding element from
		2797	index register in memory operand to calculate final address with given base
		2798	and displacement. When the index register contains fewer elements than the
		2799	destination and mask registers, the higher elements of destination are zeroed.
		2800	After the value is successfully loaded, the corresponding element in mask
		2801	register is set to zero. The destination, index and mask should all be
		2802	distinct registers; it is not allowed to use the same register in two
		2803	different roles.
		2804	"vgatherdps" loads single precision floating point values addressed by
		2805	32-bit indexes. The destination, index and mask should all be registers of the
		2806	same type, either SSE or AVX. The data addressed by memory operand is 32-bit
		2807	in size.
		2808
		2809
		2810	vgatherdps ymm0,[ebx+ymm7*4],ymm3 ; gather eight floats
		2811
		2812	"vgatherqps" loads single precision floating point values addressed by
		2813	64-bit indexes. The destination and mask should always be SSE registers, while
		2814	index register can be either SSE or AVX register. The data addressed by memory
		2815	operand is 32-bit in size.
		2816
		2817
		2818	vgatherqps xmm0,[ymm2+64],xmm3 ; gather four floats
		2819
		2820	"vgatherdpd" loads double precision floating point values addressed by
		2821	32-bit indexes. The index register should always be SSE register, the
		2822	destination and mask should be two registers of the same type, either SSE or
		2823	AVX. The data addressed by memory operand is 64-bit in size.
		2824
		2825
		2826	vgatherdpd ymm0,[xmm3*8],ymm5 ; gather four doubles
		2827
		2828	"vgatherqpd" loads double precision floating point values addressed by
		2829	64-bit indexes. The destination, index and mask should all be registers of the
		2830	same type, either SSE or AVX. The data addressed by memory operand is 64-bit
		2831	in size.
		2832	"vpgatherdd" and "vpgatherqd" load 32-bit values addressed by either 32-bit
		2833	or 64-bit indexes. They follow the same rules as "vgatherdps" and "vgatherqps"
		2834	respectively.
		2835	"vpgatherdq" and "vpgatherqq" load 64-bit values addressed by either 32-bit
		2836	or 64-bit indexes. They follow the same rules as "vgatherdpd" and "vgatherqpd"
		2837	respectively.
		2838
		2839
		2840
		2841
		2842
		2843	AVX. They introduce new vector instructions (and sometimes also their SSE
		2844	equivalents that use classic instruction encoding), and even some new
		2845	instructions operating on general registers that use the AVX-like encoding
		2846	allowing the extended syntax with separate destination and source operands.
		2847	The CPU support for each of these instructions sets needs to be determined
		2848	separately.
		2849	The AES extension provides a specialized set of instructions for the
		2850	purpose of cryptographic computations defined by Advanced Encryption Standard.
		2851	Each of these instructions has two versions: the AVX one and the one with
		2852	SSE-like syntax that uses classic encoding. Refer to the Intel manuals for the
		2853	details of operation of these instructions.
		2854	"aesenc" and "aesenclast" perform a single round of AES encryption on data
		2855	from first source with a round key from second source, and store result in
		2856	destination. The destination and first source are SSE registers, and the
		2857	second source can be SSE register or 128-bit memory. The AVX versions of these
		2858	instructions, "vaesenc" and "vaesenclast", use the syntax with three operands,
		2859	while the SSE-like version has only two operands, with first operand being
		2860	both the destination and first source.
		2861	"aesdec" and "aesdeclast" perform a single round of AES decryption on data
		2862	from first source with a round key from second source. The syntax rules for
		2863	them and their AVX versions are the same as for "aesenc".
		2864	"aesimc" performs the InvMixColumns transformation of source operand and
		2865	stores the result in destination. Both "aesimc" and "vaesimc" use only two
		2866	operands, destination being SSE register, and source being SSE register or
		2867	128-bit memory location.
		2868	"aeskeygenassist" is a helper instruction for generating the round key.
		2869	It needs three operands: destination being SSE register, source being SSE
		2870	register or 128-bit memory, and third operand being 8-bit immediate value.
		2871	The AVX version of this instruction uses the same syntax.
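
			The operand rules above can be sketched as follows (register numbers and
			the immediate are arbitrary and serve only as an illustration):

			    aesenc xmm0,xmm1             ; xmm0 is both the data and destination
			    vaesenc xmm0,xmm2,[esi]      ; separate destination, round key in memory
			    vaesenclast xmm0,xmm0,xmm3   ; last round in three-operand syntax
			    aeskeygenassist xmm4,xmm1,1  ; third operand is 8-bit immediate
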
		2872	The CLMUL extension introduces just one instruction, "pclmulqdq", and its
		2873	AVX version as well. This instruction performs a carryless multiplication of
		2874	two 64-bit values selected from first and second source according to the bit
		2875	fields in immediate value. The destination and first source are SSE registers,
		2876	second source is SSE register or 128-bit memory, and immediate value is
		2877	provided as last operand. "vpclmulqdq" takes four operands, while "pclmulqdq"
		2878	takes only three operands, with the first one serving both the role of
		2879	destination and first source.
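
			A short sketch of both forms, assuming the bit fields defined in the Intel
			manuals, where bit 0 of immediate selects the quad word from first source
			and bit 4 selects the quad word from second source (registers are chosen
			only for illustration):

			    pclmulqdq xmm0,xmm1,0           ; low quad words of xmm0 and xmm1
			    vpclmulqdq xmm2,xmm0,[ebx],11h  ; high quad words of xmm0 and memory
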
		2880	The FMA (Fused Multiply-Add) extension introduces additional AVX
		2881	instructions which perform multiplication and summation as single operation.
		2882	Each one takes three operands, first one serving both the role of destination
		2883	and first source, and the following ones being the second and third source.
		2884	The mnemonic of FMA instruction is obtained by appending to "vf" prefix: first
		2885	either "m" or "nm" to select whether result of multiplication should be taken
		2886	as-is or negated, then either "add" or "sub" to select whether third value
		2887	will be added to the product or subtracted from the product, then either
		2888	"132", "213" or "231" to select which source operands are multiplied and which
		2889	one is added or subtracted, and finally the type of data on which the
		2890	instruction operates, either "ps", "pd", "ss" or "sd". As it was with SSE
		2891	instructions promoted to AVX, instructions operating on packed floating point
		2892	values allow 128-bit or 256-bit syntax: in the former all the operands are SSE
		2893	registers, but the third one can also be 128-bit memory; in the latter the
		2894	operands are AVX registers and the third one can also be 256-bit memory.
		2895	Instructions that compute just one floating point result need operands to be
		2896	SSE registers, and the third operand can also be a memory, either 32-bit for
		2897	single precision or 64-bit for double precision.
		2898
		2899
		2900	vfnmadd132sd xmm0,xmm5,[ebx] ; multiply, negate and add
		2901
		2902
		2903	families of instructions with mnemonics starting with either "vfmaddsub" or
		2904	"vfmsubadd", followed by either "132", "213" or "231" and then either "ps" or
		2905	"pd" (the operation must always be on packed values in this case). They add
		2906	to the result of multiplication or subtract from it depending on the position
		2907	of value in packed data - instructions from the "vfmaddsub" group add when the
		2908	position is odd and subtract when the position is even, instructions from the
		2909	"vfmsubadd" group add when the position is even and subtract when the
		2910	position is odd. The rules for operands are the same as for other FMA
		2911	instructions.
		2912	The FMA4 instructions are similar to FMA, but use syntax with four operands
		2913	and thus allow destination to be different than all the sources. Their
		2914	mnemonics are identical to FMA instructions with the "132", "213" or "231" cut
		2915	out, as having separate destination operand makes such selection of operands
		2916	superfluous. The multiplication is always performed on values from the first
		2917	and second source, and then the value from third source is added or
		2918	subtracted. Either second or third source can be a memory operand, and the
		2919	rules for the sizes of operands are the same as for FMA instructions.
		2920
		2921
		2922	vfmsubss xmm0,xmm1,xmm2,[ebx] ; multiply and subtract
		2923
		2924
		2925	"vcvtph2ps", which convert floating point values between single precision and
		2926	half precision (the 16-bit floating point format). "vcvtps2ph" takes three
		2927	operands: destination, source, and rounding controls. The third operand is
		2928	always an immediate, the source is either SSE or AVX register containing
		2929	single precision values, and the destination is SSE register or memory, the
		2930	size of memory is 64 bits when the source is SSE register and 128 bits when
		2931	the source is AVX register. "vcvtph2ps" takes two operands, the destination
		2932	that can be SSE or AVX register, and the source that is SSE register or memory
		2933	with size of the half of destination operand's size.
		2934	The AMD XOP extension introduces a number of new vector instructions with
		2935	encoding and syntax analogous to AVX instructions. "vfrczps", "vfrczss",
		2936	"vfrczpd" and "vfrczsd" extract fractional portions of single or double
		2937	precision values, they all take two operands. The packed operations allow
		2938	either SSE or AVX register as destination, for the other two it has to be SSE
		2939	register. Source can be register of the same type as destination, or memory
		2940	of appropriate size (256-bit for destination being AVX register, 128-bit for
		2941	packed operation with destination being SSE register, 64-bit for operation
		2942	on a solitary double precision value and 32-bit for operation on a solitary
		2943	single precision value).
		2944
		2945
		2946
		2947
		2948	depending on the values of corresponding bits in the fourth operand (the
		2949	selector). If the bit in selector is set, the corresponding bit from first
		2950	source is copied into the same position in destination, otherwise the bit from
		2951	second source is copied. Either second source or selector can be memory
		2952	location, 128-bit or 256-bit depending on whether SSE registers or AVX
		2953	registers are specified as the other operands.
		2954
		2955
		2956	vpcmov ymm0,ymm5,[esi],ymm2 ; source in memory
		2957
		2958
		2959	destination and first source being SSE register, second source being SSE
		2960	register or 128-bit memory and the fourth operand being immediate value
		2961	defining the type of comparison. The mnemonic of instruction is created
		2962	by appending to "vpcom" prefix either "b" or "ub" to compare signed or
		2963	unsigned bytes, "w" or "uw" to compare signed or unsigned words, "d" or "ud"
		2964	to compare signed or unsigned double words, "q" or "uq" to compare signed or
		2965	unsigned quad words. The respective values from the first and second source
		2966	are compared and the corresponding data element in destination is set to
		2967	either all ones or all zeros depending on the result of comparison. The fourth
		2968	operand has to specify one of the eight comparison types (table 2.5). All
		2969	these instructions have also variants with only three operands and the type
		2970	of comparison encoded within the instruction name by inserting the comparison
		2971	mnemonic after "vpcom".
		2972
		2973
		2974	vpcomgew xmm0,xmm1,[ebx] ; compare signed words
		2975
		2976
		2977	/-------------------------------------------\
		2978	| Code | Mnemonic | Description             |
		2979	|======|==========|=========================|
		2980	| 0    | lt       | less than               |
		2981	| 1    | le       | less than or equal      |
		2982	| 2    | gt       | greater than            |
		2983	| 3    | ge       | greater than or equal   |
		2984	| 4    | eq       | equal                   |
		2985	| 5    | neq      | not equal               |
		2986	| 6    | false    | false                   |
		2987	| 7    | true     | true                    |
		2988	\-------------------------------------------/
		2989
		2990
		2991	zero or to a value selected from first or second source depending on the
		2992	corresponding bit fields from the fourth operand (the selector) and the
		2993	immediate value provided in fifth operand. Refer to the AMD manuals for the
		2994	detailed explanation of the operation performed by these instructions. Each
		2995	of the first four operands can be a register, and either second source or
		2996	selector can be memory location, 128-bit or 256-bit depending on whether SSE
		2997	registers or AVX registers are used for the other operands.
		2998
		2999
		3000
		3001
		3002	stores them at the same positions in destination. "vphaddubw" does the same
		3003	but treats the bytes as unsigned. "vphaddbd" and "vphaddubd" sum all bytes
		3004	(either signed or unsigned) in each four-byte block to 32-bit results,
		3005	"vphaddbq" and "vphaddubq" sum all bytes in each eight-byte block to
		3006	64-bit results, "vphaddwd" and "vphadduwd" add pairs of words to 32-bit
		3007	results, "vphaddwq" and "vphadduwq" sum all words in each four-word block to
		3008	64-bit results, "vphadddq" and "vphaddudq" add pairs of double words to 64-bit
		3009	results. "vphsubbw" subtracts in each two-byte block the byte at higher
		3010	position from the one at lower position, and stores the result as a signed
		3011	16-bit value at the corresponding position in destination, "vphsubwd"
		3012	subtracts in each two-word block the word at higher position from the one at
		3013	lower position and makes signed 32-bit results, "vphsubdq" subtracts in each
		3014	block of two double words the one at higher position from the one at lower
		3015	position and makes signed 64-bit results. Each of these instructions takes
		3016	two operands, the destination being SSE register, and the source being SSE
		3017	register or 128-bit memory.
		3018
		3019
		3020
		3021
		3022	from the first and second source and then add the products to the parallel
		3023	values from the third source, then "vpmacsww" takes the lowest 16 bits of the
		3024	result and "vpmacssww" saturates the result down to 16-bit value, and they
		3025	store the final 16-bit results in the destination. "vpmacsdd" and "vpmacssdd"
		3026	perform the analogous operation on 32-bit values. "vpmacswd" and "vpmacsswd" do
		3027	the same calculation only on the low 16-bit values from each 32-bit block and
		3028	form the 32-bit results. "vpmacsdql" and "vpmacssdql" perform such operation
		3029	on the low 32-bit values from each 64-bit block and form the 64-bit results,
		3030	while "vpmacsdqh" and "vpmacssdqh" do the same on the high 32-bit values from
		3031	each 64-bit block, also forming the 64-bit results. "vpmadcswd" and
		3032	"vpmadcsswd" multiply the corresponding signed 16-bit value from the first
		3033	and second source, then sum all the four products and add this sum to each
		3034	16-bit element from third source, storing the truncated or saturated result
		3035	in destination. All these instructions take four operands, the second source
		3036	can be 128-bit memory or SSE register, all the other operands have to be
		3037	SSE registers.
		3038
		3039
		3040
		3041
		3042	separate transformation to each of them, and stores them in the destination.
		3043	The bit fields in fourth operand (the selector) specify for each position in
		3044	destination what byte from which source is taken and what operation is applied
		3045	to it before it is stored there. Refer to the AMD manuals for the detailed
		3046	information about these bit fields. This instruction takes four operands,
		3047	either second source or selector can be a 128-bit memory (or they can be SSE
		3048	registers both), all the other operands have to be SSE registers.
		3049	"vpshlb", "vpshlw", "vpshld" and "vpshlq" shift logically bytes, words, double
		3050	words or quad words respectively. The amount of bits to shift by is specified
		3051	for each element separately by the signed byte placed at the corresponding
		3052	position in the third operand. The source containing elements to shift is
		3053	provided as second operand. Either second or third operand can be 128-bit
		3054	memory (or they can be SSE registers both) and the other operands have to be
		3055	SSE registers.
		3056
		3057
		3058
		3059
		3060	double words or quad words. These instructions follow the same rules as the
		3061	logical shifts described above. "vprotb", "vprotw", "vprotd" and "vprotq"
		3062	rotate bytes, words, double words or quad words. They follow the same rules as
		3063	shifts, but additionally allow third operand to be immediate value, in which
		3064	case the same amount of rotation is specified for all the elements in source.
		3065
		3066
		3067
		3068
		3069	swaps bytes in value from source before storing it in destination, so it can
		3070	be used to load and store big endian values. It takes two operands, either
		3071	the destination or source should be a 16-bit, 32-bit or 64-bit memory (the
		3072	last one being only allowed in long mode), and the other operand should be
		3073	a general register of the same size.
		3074	The BMI extension, consisting of two subsets - BMI1 and BMI2, introduces
		3075	new instructions operating on general registers, which use the same encoding
		3076	as AVX instructions and so allow the extended syntax. All these instructions
		3077	use 32-bit operands, and in long mode they also allow the forms with 64-bit
		3078	operands.
		3079	"andn" calculates the bitwise AND of second source with the inverted bits
		3080	of first source and stores the result in destination. The destination and
		3081	the first source have to be general registers, the second source can be
		3082	general register or memory.
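
			For illustration (the registers are arbitrary), both the register and the
			memory variant of second source:

			    andn eax,ebx,ecx    ; eax = (not ebx) and ecx
			    andn eax,ebx,[esi]  ; second source taken from memory
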
		3083
		3084
		3085
		3086
		3087	and length specified by bit fields in the second source operand and stores
		3088	it into destination. The lowest 8 bits of second source specify the position
		3089	of bit sequence to extract and the next 8 bits of second source specify the
		3090	length of sequence. The first source can be a general register or memory,
		3091	the other two operands have to be general registers.
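
			A small sketch of preparing the control value for this instruction (the
			registers and the chosen bit field are only an example). Position 4 goes
			into the lowest 8 bits and length 8 into the next 8 bits, giving 0804h:

			    mov ecx,0804h      ; position = 4, length = 8
			    bextr eax,ebx,ecx  ; eax = (ebx shr 4) and 0FFh
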
		3092
		3093
		3094
		3095
		3096	bits in destination to zero. The destination must be a general register,
		3097	the source can be general register or memory.
		3098
		3099
		3100
		3101
		3102	the source, including this bit. "blsr" copies all the bits from the source to
		3103	destination except for the lowest set bit, which is replaced by zero. These
		3104	instructions follow the same rules for operands as "blsi".
		3105	"tzcnt" counts the number of trailing zero bits, that is the zero bits up to
		3106	the lowest set bit of source value. This instruction is analogous to "lzcnt"
		3107	and follows the same rules for operands, so it also has a 16-bit version,
		3108	unlike the other BMI instructions.
		3109	"bzhi" is BMI2 instruction, which copies the bits from first source to
		3110	destination, zeroing all the bits up from the position specified by second
		3111	source. It follows the same rules for operands as "bextr".
		3112	"pext" uses a mask in second source operand to select bits from first source
		3113	and puts the selected bits as a continuous sequence into destination.
		3114	"pdep" performs the reverse operation - it takes sequence of bits from the
		3115	first source and puts them consecutively at the positions where the bits in
		3116	second source are set, setting all the other bits in destination to zero.
		3117	These BMI2 instructions follow the same rules for operands as "andn".
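
			The following sketch shows "pext" and "pdep" undoing each other for one
			mask (the values are arbitrary and chosen so that the results are easy to
			follow by hand):

			    mov ebx,12345678h
			    mov ecx,0000FF00h  ; mask selecting bits 8 to 15
			    pext eax,ebx,ecx   ; eax = 00000056h
			    pdep edx,eax,ecx   ; edx = 00005600h
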
		3118	"mulx" is a BMI2 instruction which performs an unsigned multiplication of
		3119	value from EDX or RDX register (depending on the size of specified operands)
		3120	by the value from third operand, and stores the low half of result in the
		3121	second operand, and the high half of result in the first operand, and it does
		3122	it without affecting the flags. The third operand can be general register or
		3123	memory, and both the destination operands have to be general registers.
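
			A minimal sketch with 32-bit operands (the values are arbitrary), where
			the product 80000000h*4 = 200000000h is split between two registers:

			    mov edx,80000000h  ; implicit first factor
			    mov ebx,4          ; second factor
			    mulx ecx,eax,ebx   ; ecx = 2 (high half), eax = 0 (low half)
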
		3124
		3125
		3126
		3127
		3128	arithmetical shifts of value from first source by the amount specified by
		3129	second source, and store the result in destination without affecting the
		3130	flags. They have the same rules for operands as "bzhi" instruction.
		3131	"rorx" is a BMI2 instruction which rotates right the value from source
		3132	operand by the constant amount specified in third operand and stores the
		3133	result in destination without affecting the flags. The destination operand
		3134	has to be general register, the source operand can be general register or
		3135	memory, and the third operand has to be an immediate value.
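
			For example (the register choice and rotation count are arbitrary):

			    mov ebx,12345678h
			    rorx eax,ebx,8  ; eax = 78123456h, ebx and flags are not changed
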
		3136
		3137
		3138
		3139
		3140	"bextr" instruction is extended with a new form, in which second source is
		3141	a 32-bit immediate value. "blsic" is a new instruction which performs the
		3142	same operation as "blsi", but with the bits of result inverted. It uses the
		3143	same rules for operands as "blsi". "blsfill" is a new instruction, which takes
		3144	the value from source, sets all the bits below the lowest set bit and stores
		3145	the result in destination; it also uses the same rules for operands as "blsi".
		3146	"blci", "blcic", "blcs", "blcmsk" and "blcfill" are instructions analogous
		3147	to "blsi", "blsic", "blsr", "blsmsk" and "blsfill" respectively, but they
		3148	perform the bit-inverted versions of the same operations. They follow the
		3149	same rules for operands as the instructions they reflect.
		3150	"tzmsk" finds the lowest set bit in value from source operand, sets all bits
		3151	below it to 1 and all the rest of bits to zero, then writes the result to
		3152	destination. "t1mskc" finds the least significant zero bit in the value from
		3153	source operand, sets the bits below it to zero and all the other bits to 1,
		3154	and writes the result to destination. These instructions have the same rules
		3155	for operands as "blsi".
		3156
		3157
		3158
		3159
		3160
		3161	assembler, and the general syntax of the instructions introduced by those
		3162	extensions is provided here. For detailed information on the operations
		3163	performed by them, check out the manuals from Intel (for the VMX, SMX, XSAVE,
		3164	RDRAND, FSGSBASE, INVPCID, HLE and RTM extensions) or AMD (for the SVM
		3165	extension).
		3166	The Virtual-Machine Extensions (VMX) provide a set of instructions for the
		3167	management of virtual machines. The "vmxon" instruction, which enters the VMX
		3168	operation, requires a single 64-bit memory operand, which should be a physical
		3169	address of memory region, which the logical processor may use to support VMX
		3170	operation. The "vmxoff" instruction, which leaves the VMX operation, has no
		3171	operands. The "vmlaunch" and "vmresume", which launch or resume the virtual
		3172	machines, and "vmcall", which allows guest software to call the VM monitor,
		3173	use no operands either.
		3174	The "vmptrld" loads the physical address of current Virtual Machine Control
		3175	Structure (VMCS) from its memory operand, "vmptrst" stores the pointer to
		3176	current VMCS into address specified by its memory operand, and "vmclear" sets
		3177	the launch state of the VMCS referenced by its memory operand to clear. These
		3178	three instructions all require a single 64-bit memory operand.
		3179	The "vmread" reads from VMCS a field specified by the source operand and
		3180	stores it into the destination operand. The source operand should be a
		3181	general purpose register, and the destination operand can be a register or
		3182	memory. The "vmwrite" writes into a VMCS field specified by the destination
		3183	operand the value provided by source operand. The source operand can be a
		3184	general purpose register or memory, and the destination operand must be a
		3185	register. The size of operands for those instructions should be 64-bit when
		3186	in long mode, and 32-bit otherwise.
		3187	The "invept" and "invvpid" invalidate the translation lookaside buffers
		3188	(TLBs) and paging-structure caches, either derived from extended page tables
		3189	(EPT), or based on the virtual processor identifier (VPID). These instructions
		3190	require two operands, the first one being the general purpose register
		3191	specifying the type of invalidation, and the second one being a 128-bit
		3192	memory operand providing the invalidation descriptor. The first operand
		3193	should be a 64-bit register when in long mode, and 32-bit register otherwise.
		3194	The Safer Mode Extensions (SMX) provide the functionalities available
		3195	through the "getsec" instruction. This instruction takes no operands, and
		3196	the function that is executed is determined by the contents of EAX register
		3197	upon executing this instruction.
		3198	The Secure Virtual Machine (SVM) is a variant of virtual machine extension
		3199	used by AMD. The "skinit" instruction securely reinitializes the processor
		3200	allowing the startup of trusted software, such as the virtual machine monitor
		3201	(VMM). This instruction takes a single operand, which must be EAX, and
		3202	provides a physical address of the secure loader block (SLB).
		3203	The "vmrun" instruction is used to start a guest virtual machine,
		3204	its only operand should be an accumulator register (AX, EAX or RAX, the
		3205	last one available only in long mode) providing the physical address of the
		3206	virtual machine control block (VMCB). The "vmsave" stores a subset of
		3207	processor state into VMCB specified by its operand, and "vmload" loads the
		3208	same subset of processor state from a specified VMCB. The same operand rules
		3209	as for the "vmrun" apply to those two instructions.
		3210	"vmmcall" allows the guest software to call the VMM. This instruction takes
		3211	no operands.
		3212	"stgi" sets the global interrupt flag to 1, and "clgi" zeroes it. These
		3213	instructions take no operands.
		3214	"invlpga" invalidates the TLB mapping for a virtual page specified by the
		3215	first operand (which has to be accumulator register) and address space
		3216	identifier specified by the second operand (which must be ECX register).
		3217	The XSAVE set of instructions allows saving and restoring processor state
		3218	components. "xsave" and "xsaveopt" store the components of processor state
		3219	defined by bit mask in EDX and EAX registers into area defined by memory
		3220	operand. "xrstor" restores from the area specified by memory operand the
		3221	components of processor state defined by mask in EDX and EAX. The "xsave64",
		3222	"xsaveopt64" and "xrstor64" are 64-bit versions of these instructions, allowed
		3223	only in long mode.
		3224	"xgetbv" reads the contents of 64-bit XCR (extended control register)
		3225	specified in ECX register into EDX and EAX registers. "xsetbv" writes the
		3226	contents of EDX and EAX into the 64-bit XCR specified by ECX register. These
		3227	instructions have no operands.
		3228	The RDRAND extension introduces one new instruction, "rdrand", which loads
		3229	the hardware-generated random value into general register. It takes one
		3230	operand, which can be 16-bit, 32-bit or 64-bit register (with the last one
		3231	being allowed only in long mode).
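
			A short sketch of using this instruction. It relies on the behavior
			documented in the Intel manuals, that the carry flag is set only when
			a valid random value was delivered, and "no_random" is a hypothetical
			label of a routine handling the failure:

			    rdrand eax     ; CF = 1 when eax holds a valid random value
			    jnc no_random  ; hypothetical handler, not defined here
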
		3232	The FSGSBASE extension adds long mode instructions that allow reading and
		3233	writing the segment base registers for FS and GS segments. "rdfsbase" and
		3234	"rdgsbase" read the corresponding segment base registers into operand, while
		3235	"wrfsbase" and "wrgsbase" write the value of operand into those registers.
		3236	All these instructions take one operand, which can be 32-bit or 64-bit general
		3237	register.
		3238	The INVPCID extension adds "invpcid" instruction, which invalidates mapping
		3239	in the TLBs and paging caches based on the invalidation type specified in
		3240	first operand and PCID invalidate descriptor specified in second operand.
		3241	The first operand should be 32-bit general register when not in long mode,
		3242	or 64-bit general register when in long mode. The second operand should be
		3243	128-bit memory location.
		3244	The HLE and RTM extensions provide a set of instructions for managing
		3245	transactional execution. The "xacquire" and "xrelease" are new prefixes that
		3246	can be used with some of the instructions to start or end lock elision on the
		3247	memory address specified by prefixed instruction. The "xbegin" instruction
		3248	starts the transactional execution, its operand is the address of a fallback
		3249	routine that gets executed in case of transaction abort, specified like the
		3250	operand for near jump instruction. "xend" marks the end of transactional
		3251	execution region, it takes no operands. "xabort" forces the transaction abort,
		3252	it takes an 8-bit immediate value as its only operand, this value is passed in
		3253	the highest bits of EAX to the fallback routine. "xtest" checks whether there
		3254	is transactional execution in progress, this instruction takes no operands.
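
			A possible layout of a transactional region is sketched here. The labels
			"fallback" and "done" and the "counter" variable (assumed to be a double
			word defined elsewhere) are hypothetical and serve only as an illustration:

			    xbegin fallback           ; start transactional execution
			    inc dword [counter]       ; work done inside the transaction
			    xend                      ; end of transactional region
			    jmp done
			  fallback:                   ; executed in case of transaction abort
			    lock inc dword [counter]  ; the same work done without transaction
			  done:
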
		3255
		3256
		3257
		3258
		3259
		3260	are processed during the assembly and may cause some blocks of instructions
		3261	to be assembled differently or not assembled at all.
		3262
		3263
		3264
		3265
		3266
		3267	preceded by the name for the constant and followed by the numerical expression
		3268	providing the value. The value of such constants can be a number or an address,
		3269	but - unlike labels - the numerical constants are not allowed to hold the
		3270	register-based addresses. Besides this difference, in their basic variant
		3271	numerical constants behave very much like labels and you can even
		3272	forward-reference them (access their values before they actually get defined).
		3273	There is, however, a second variant of numerical constants, which is
		3274	recognized by assembler when you try to define a constant under a name that
		3275	already has a numerical constant defined. In such case assembler
		3276	treats that constant as an assembly-time variable and allows it to be assigned
		3277	with new value, but forbids forward-referencing it (for obvious reasons). Let's
		3278	see both variants of numerical constants in one example:
		3279
		3280	dd sum
		3281	x = 1
		3282	x = x+2
		3283	sum = x
		3284
		3285
		3286	value that was assigned to it the most recently is used. Thus if we tried to
		3287	access the "x" before it gets defined the first time, like if we wrote "dd x"
		3288	in place of the "dd sum" instruction, it would cause an error. And when it is
		3289	re-defined with the "x = x+2" directive, the previous value of "x" is used to
		3290	calculate the new one. So when the "sum" constant gets defined, the "x" has
		3291	value of 3, and this value is assigned to the "sum". Since this one is defined
		3292	only once in source, it is the standard numerical constant, and can be
		3293	forward-referenced. So the "dd sum" is assembled as "dd 3". To read more about
		3294	how the assembler is able to resolve this, see section 2.2.6.
		3295	The value of numerical constant can be preceded by size operator, which can
		3296	ensure that the value will fit in the range for the specified size, and can
		3297	affect also how some of the calculations inside the numerical expression are
		3298	performed. This example:
		3299
		3300
		3301	c32 = dword -1
		3302
		3303
		3304	fits in 32 bits.
		3305	When you need to define constant with the value of address, which may be
		3306	register-based (and thus you cannot employ numerical constant for this
		3307	purpose), you can use the extended syntax of "label" directive (already
		3308	described in section 1.2.3), like:
		3309
		3310
		3311
		3312
		3313	unlike numerical constants, cannot become assembly-time variables.
		3314
		3315
		3316
		3317
		3318
		3319	certain condition. It should be followed by logical expression specifying the
		3320	condition, instructions in next lines will be assembled only when this
		3321	condition is met, otherwise they will be skipped. The optional "else if"
		3322	directive followed with logical expression specifying additional condition
		3323	begins the next block of instructions that will be assembled if previous
		3324	conditions were not met, and the additional condition is met. The optional
		3325	"else" directive begins the block of instructions that will be assembled if
		3326	none of the conditions were met. The "end if" directive ends the last block of
		3327	instructions.
		3328	You should note that "if" directive is processed at assembly stage and
		3329	therefore it doesn't affect any preprocessor directives, like the definitions
		3330	of symbolic constants and macroinstructions - when the assembler recognizes the
		3331	"if" directive, all the preprocessing has been already finished.
		3332	The logical expression consists of logical values and logical operators. The
		3333	logical operators are "~" for logical negation, "&" for logical and, "|" for
		3334	logical or. The negation has the highest priority. Logical value can be a
		3335	numerical expression, it will be false if it is equal to zero, otherwise it
		3336	will be true. Two numerical expressions can be compared using one of the
		3337	following operators to make the logical value: "=" (equal), "<" (less),
		3338	">" (greater), "<=" (less or equal), ">=" (greater or equal),
		3339	"<>" (not equal).
		3340	The "used" operator followed by a symbol name, is the logical value that
		3341	checks whether the given symbol is used somewhere (it returns correct result
		3342	even if symbol is used only after this check). The "defined" operator can be
		3343	followed by any expression, usually just by a single symbol name; it checks
		3344	whether the given expression contains only symbols that are defined in the
		3345	source and accessible from the current position.
		3346	With "relativeto" operator it is possible to check whether values of two
		3347	expressions differ only by constant amount. The valid syntax is a numerical
		3348	expression followed by "relativeto" and then another expression (possibly
		3349	register-based). Labels that have no simple numerical value can be tested
		3350	this way to determine what kind of operations may be possible with them.
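The "used" operator is often applied to assemble a routine only when something refers to it. The sketch below assumes 16-bit code running under PC BIOS; the routine name "clear_screen" is hypothetical:

```fasm
        call clear_screen       ; without this reference the block below is skipped

if used clear_screen
clear_screen:
        mov ax,3                ; BIOS video function 0: set text mode 3
        int 10h
        ret
end if
```

The check gives the expected result even when the only "call clear_screen" appears later in the source than the "if" line.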
		3351	The following simple example uses the "count" constant that should be
		3352	defined somewhere in source:
		3353
		3354	if count>0
		3355	mov cx,count
		3356	rep movsb
		3357	end if
		3358
		3359	These two instructions will be assembled only if the "count" constant
		3360	is greater than 0. The next sample shows more complex conditional structure:
		3361
		3362	if count & ~ count mod 4
		3363	mov cx,count/4
		3364	rep movsd
		3365	else if count>4
		3366	mov cx,count/4
		3367	rep movsd
		3368	mov cx,count mod 4
		3369	rep movsb
		3370	else
		3371	mov cx,count
		3372	rep movsb
		3373	end if
		3374
		3375	The first block of instructions gets assembled when "count" is non-zero and
		3376	divisible by four, if this condition is not met, the second logical expression,
		3377	which follows the "else if", is evaluated and if it's true, the second block
		3378	of instructions get assembled, otherwise the last block of instructions, which
		3379	follows the line containing only "else", is assembled.
		3380	There are also operators that allow comparison of values being any chains of
		3381	symbols. The "eq" compares whether two such values are exactly the same.
		3382	The "in" operator checks whether given value is a member of the list of values
		3383	following this operator, the list should be enclosed between "<" and ">"
		3384	characters, its members should be separated with commas. The symbols are
		3385	considered the same when they have the same meaning for the assembler - for
		3386	example "pword" and "fword" for assembler are the same and thus are not
		3387	distinguished by the above operators. In the same way "16 eq 10h" is the true
		3388	condition, however "16 eq 10+4" is not.
		3389	The "eqtype" operator checks whether the two compared values have the same
		3390	structure, and whether the structural elements are of the same type. The
		3391	distinguished types include numerical expressions, individual quoted strings,
		3392	floating point numbers, address expressions (the expressions enclosed in square
		3393	brackets or preceded by "ptr" operator), instruction mnemonics, registers, size
		3394	operators, jump type and code type operators. And each of the special
		3395	characters that act as separators, like comma or colon, is a separate type
		3396	itself. For example, two values, each one consisting of register name followed
		3397	by comma and numerical expression, will be regarded as of the same type, no
		3398	matter what kind of register and how complicated numerical expression is used;
		3399	with the exception of quoted strings and floating point values, which are
		3400	special kinds of numerical expressions and are treated as different types. Thus
		3401	"eax,16 eqtype fs,3+7" condition is true, but "eax,16 eqtype eax,1.6" is false.
		3402
		3403
		3404	2.2.3 Repeating blocks of instructions
		3405
		3406	"times" directive repeats one instruction the specified number of times. It
		3407	should be followed by numerical expression specifying number of repeats and
		3408	the instruction to repeat (optionally colon can be used to separate number and
		3409	instruction). When special symbol "%" is used inside the instruction, it is
		3410	equal to the number of current repeat. For example "times 5 db %" will define
		3411	five bytes with values 1, 2, 3, 4, 5. Recursive use of "times" directive is
		3412	also allowed, so "times 3 times % db %" will define six bytes with values
		3413	1, 1, 2, 1, 2, 3.
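Because the number of repeats is an ordinary numerical expression, "times" can also be used for padding. The following is only a sketch: the size 510 and the signature word are assumptions typical for a PC boot sector, and the base address of the addressing space is assumed to be the default one:

```fasm
        jmp $                       ; placeholder for the real code
        times 510-($-$$) db 0       ; fill with zeros up to offset 510
        dw 0AA55h                   ; two more bytes, 512 in total
```

The expression "$-$$" gives the number of bytes generated so far, so the count shrinks automatically when more code is added before the padding.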
		3414	"repeat" directive repeats the whole block of instructions. It should be
		3415	followed by numerical expression specifying number of repeats. Instructions
		3416	to repeat are expected in next lines, ended with the "end repeat" directive,
		3417	for example:
		3418
		3419
		3420	mov byte [bx],%
		3421	inc bx
		3422	end repeat
		3423
		3424	The generated code stores successive byte values, starting from one, in the memory
		3425	addressed by BX register.
		3426	Number of repeats can be zero, in that case the instructions are not
		3427	assembled at all.
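The "%" counter can also take part in data definitions inside the repeated block. A short sketch, with the label name "powers" being an assumption of this example:

```fasm
powers:
repeat 8
        db 1 shl (%-1)          ; defines bytes 1, 2, 4, 8, 16, 32, 64, 128
end repeat
```

Since "%" starts from 1, the shift count "%-1" runs from 0 to 7.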
		3428	The "break" directive allows stopping the repetition early and continuing assembly
		3429	from the first line after the "end repeat". Combined with the "if" directive it
		3430	allows stopping the repetition under some special condition, like:
		3431
		3432
		3433	repeat 100
		3434	if x/s = s
		3435	break
		3436	end if
		3437	s = (s+x/s)/2
		3438	end repeat
		3439
		3440	The "while" directive repeats the block of instructions as long as the
		3441	condition specified by the logical expression following it is true. The block
		3442	of instructions to be repeated should end with the "end while" directive.
		3443	Before each repetition the logical expression is evaluated and when its value
		3444	is false, the assembly is continued starting from the first line after the
		3445	"end while". Also in this case the "%" symbol holds the number of current
		3446	repeat. The "break" directive can be used to stop this kind of loop in the same
		3447	way as with "repeat" directive. The previous sample can be rewritten to use the
		3448	"while" instead of "repeat" this way:
		3449
		3450
		3451	while x/s <> s
		3452	s = (s+x/s)/2
		3453	if % = 100
		3454	break
		3455	end if
		3456	end while
		3457
		3458	The blocks defined with "if", "repeat" and "while" can be nested in any
		3459	order, however they have to be properly nested: the block that was started
		3460	last must be closed first. The "break" directive always stops processing the block that was
		3461	started last with either the "repeat" or "while" directive.
		3462
		3463
		3464	2.2.4 Addressing spaces
		3465
		3466	"org" directive sets the address at which the following code is expected to
		3467	appear in memory. It should be followed by numerical expression specifying
		3468	the address. This directive begins the new addressing space, the following
		3469	code itself is not moved in any way, but all the labels defined within it
		3470	and the value of "$" symbol are affected as if it was put at the given
		3471	address. However it's the responsibility of programmer to put the code at
		3472	correct address at run-time.
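A typical case is a DOS .COM program, which the system loads at offset 100h. The sketch below assumes flat binary output and 16-bit code; the label "msg" and the text are made up for this example:

```fasm
        org 100h                ; labels are calculated as if code started at 100h
        mov ah,9                ; DOS function 9: print string
        mov dx,msg              ; the value of "msg" already includes the 100h base
        int 21h
        int 20h                 ; terminate the program
msg     db 'Hello!',24h         ; text terminated with the '$' character
```

The output file still begins with the "mov ah,9" instruction; only the values of labels and of "$" are changed by "org".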
		3473	The "load" directive allows defining a constant with a binary value loaded
		3474	from the already assembled code. This directive should be followed by the name
		3475	of the constant, then optionally size operator, then "from" operator and a
		3476	numerical expression specifying a valid address in current addressing space.
		3477	The size operator has unusual meaning in this case - it states how many bytes
		3478	(up to 8) have to be loaded to form the binary value of constant. If no size
		3479	operator is specified, one byte is loaded (thus value is in range from 0 to
		3480	255). The loaded data cannot exceed current offset.
		3481	The "store" directive can modify the already generated code by replacing
		3482	some of the previously generated data with the value defined by the given
		3483	numerical expression, which follows. The expression can be preceded by the
		3484	optional size operator to specify how large a value the expression defines, and
		3485	therefore how many bytes will be stored; if there is no size operator, the
		3486	size of one byte is assumed. Then the "at" operator and the numerical
		3487	expression defining the valid address in current addressing space, at
		3488	which the given value has to be stored, should follow. This is a directive for
		3489	advanced applications and should be used carefully.
		3490	Both "load" and "store" directives are limited to operate on places in
		3491	current addressing space. The "$$" symbol is always equal to the base address
		3492	of current addressing space, and the "$" symbol is the address of current
		3493	position in that addressing space, therefore these two values define limits
		3494	of the area, where "load" and "store" can operate.
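A small sketch of both directives working on already defined data may help here. The names "sample" and "v" are assumptions of this example, which reads a word and writes its two bytes back in swapped order:

```fasm
sample  db 12h,34h
        load v word from sample           ; v = 3412h, the bytes are read in little-endian order
        store byte v shr 8 at sample      ; first byte becomes 34h
        store byte v and 0FFh at sample+1 ; second byte becomes 12h
```

Both addresses lie between "$$" and "$" at the moment the directives are processed, which is what makes them valid.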
		3495	Combining the "load" and "store" directives makes it possible to encode
		3496	some of the already generated code. For example to encode the whole code
		3497	generated in current addressing space you can use such block of directives:
		3498
		3499	repeat $-$$
		3500	load a byte from $$+%-1
		3501	store byte a xor c at $$+%-1
		3502	end repeat
		3503
		3504	Each byte of the code is xored with the value of the "c" constant.
		3505	"virtual" defines virtual data at specified address. This data will not be
		3506	included in the output file, but labels defined there can be used in other
		3507	parts of source. This directive can be followed by "at" operator and the
		3508	numerical expression specifying the address for virtual data, otherwise it
		3509	uses current address, the same as "virtual at $". Instructions defining data
		3510	are expected in next lines, ended with "end virtual" directive. The block of
		3511	virtual instructions itself is an independent addressing space, after it's
		3512	ended, the context of previous addressing space is restored.
		3513	The "virtual" directive can be used to create union of some variables, for
		3514	example:
		3515
		3516	GDTR dp ?
		3517	virtual at GDTR
		3518	GDT_limit dw ?
		3519	GDT_address dd ?
		3520	end virtual
		3521
		3522	It defines two labels for parts of the 48-bit variable at "GDTR" address.
		3523	It can be also used to define labels for some structures addressed by a
		3524	register, for example:
		3525
		3526	virtual at bx
		3527	LDT_limit dw ?
		3528	LDT_address dd ?
		3529	end virtual
		3530
		3531	With such definition, "mov ax,[LDT_limit]" will be assembled
		3532	to the same instruction as "mov ax,[bx]".
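A related technique is to place the virtual block at address 0, so that the labels become plain offsets that can be added to any base register. This is a sketch under that assumption; the names "point.x", "point.y" and "sizeof.point" are hypothetical:

```fasm
virtual at 0
  point.x dd ?
  point.y dd ?
  sizeof.point = $              ; 8, the current address inside the virtual block
end virtual

        mov eax,[ebx+point.y]   ; assembled the same as: mov eax,[ebx+4]
```

Nothing from the virtual block goes into the output file; only the values of its labels are used.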
		3533	Declaring defined data values or instructions inside the virtual block would
		3534	also be useful, because the "load" directive can be used to load the values
		3535	from the virtually generated code into constants. This directive should be
		3536	used after the code it loads but before the virtual block ends, because it can
		3537	only load the values from the same addressing space. For example:
		3538
		3539	virtual at 0
		3540	xor eax,eax
		3541	and edx,eax
		3542	load zeroq dword from 0
		3543	end virtual
		3544
		3545	The above piece of code will define the "zeroq" constant containing four bytes
		3546	of the machine code of the instructions defined inside the virtual block.
		3547	This method can be also used to load some binary value from external file.
		3548	For example this code:
		3549
		3550	virtual at 0
		3551	file 'a.txt':10h,1
		3552	load char from 0
		3553	end virtual
		3554
		3555	loads the single byte from offset 10h in file "a.txt" into the "char"
		3556	constant.
		3557	Any of the "section" directives described in 2.4 also begins a new
		3558	addressing space.
		3559
		3560
		3561	2.2.5 Other directives
		3562
		3563	"align" directive aligns code or data to the specified boundary. It should
		3564	be followed by a numerical expression specifying the number of bytes, to a
		3565	multiple of which the current address has to be aligned. The boundary value
		3566	has to be a power of two.
		3567	The "align" directive fills the bytes that had to be skipped to perform the
		3568	alignment with the "nop" instructions and at the same time marks this area as
		3569	uninitialized data, so if it is placed among other uninitialized data that
		3570	wouldn't take space in the output file, the alignment bytes will act the same
		3571	way. If you need to fill the alignment area with some other values, you can
		3572	combine "align" with "virtual" to get the size of alignment needed and then
		3573	create the alignment yourself, like:
		3574
		3575	virtual
		3576	align 16
		3577	a = $ - $$
		3578	end virtual
		3579	db a dup 0
		3580
		3581	The "a" constant is defined to be the difference between the address after
		3582	alignment and address of the "virtual" block (see previous section), so it is
		3583	equal to the size of needed alignment space.
		3584	"display" directive displays the message at the assembly time. It should
		3585	be followed by the quoted strings or byte values, separated with commas. It
		3586	can be used to display values of some constants, for example:
		3587
		3588
		3589	display 'Current offset is 0x'
		3590	repeat bits/4
		3591	d = '0' + $ shr (bits-%*4) and 0Fh
		3592	if d > '9'
		3593	d = d + 'A'-'9'-1
		3594	end if
		3595	display d
		3596	end repeat
		3597	display 13,10
		3598
		3599	This block of directives calculates the hexadecimal digits of the offset
		3600	value and converts them into characters for displaying. Note that this will
		3601	not work if the addresses in current addressing space are relocatable (as it
		3602	might happen with PE or object output formats), since only absolute values can
		3603	be used this way. The absolute value may be obtained by calculating the
		3604	relative address, like "$-$$", or "rva $" in case of PE format.
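The same approach works for decimal output. The following sketch uses the hypothetical constants "n" and "divisor", and assumes that the value has at most four decimal digits:

```fasm
n = 1234
divisor = 1000
while divisor > 0
        d = '0' + (n / divisor) mod 10  ; parentheses are needed, "mod" binds tighter than "/"
        display d
        divisor = divisor / 10
end while
display 13,10
```

With the values given above this should display the text 1234 followed by a new line.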
		3605	The "err" directive immediately terminates the assembly process when it is
		3606	encountered by assembler.
		3607	The "assert" directive tests whether the logical expression that follows it
		3608	is true, and if not, it signals an error.
		3609
		3610
		3611	2.2.6 Multiple passes
		3612
		3613	Because the assembler allows referencing some of the labels or constants
		3614	before they get actually defined, it has to predict the values of such labels
		3615	and if there is even a suspicion that prediction failed in at least one case,
		3616	it does one more pass, assembling the whole source, this time doing better
		3617	prediction based on the values the labels got in the previous pass.
		3618	The changing values of labels can cause some instructions to have encodings
		3619	of different length, and this can cause the change in values of labels again.
		3620	And since the labels and constants can also be used inside the expressions that
		3621	affect the behavior of control directives, the whole block of source can be
		3622	processed completely differently during the new pass. Thus the assembler does
		3623	more and more passes, each time trying to do better predictions to approach
		3624	the final solution, when all the values get predicted correctly. It uses
		3625	various methods for predicting the values, which have been chosen to allow
		3626	finding in a few passes the solution of possibly smallest length for
		3627	most programs.
		3628	Some of the errors, like the values not fitting in required boundaries, are
		3629	not signaled during those intermediate passes, since it may happen that when
		3630	some of the values are predicted better, these errors will disappear. However
		3631	if assembler meets some illegal syntax construction or unknown instruction, it
		3632	always stops immediately. Also defining some label more than once causes such
		3633	error, because it makes the predictions groundless.
		3634	Only the messages created with the "display" directive during the last
		3635	performed pass get actually displayed. In case when the assembly has been
		3636	stopped due to an error, these messages may reflect the predicted values that
		3637	are not yet resolved correctly.
		3638	The solution may sometimes not exist and in such cases the assembler will
		3639	never manage to make correct predictions - for this reason there is a limit for
		3640	a number of passes, and when assembler reaches this limit, it stops and
		3641	displays the message that it is not able to generate the correct output.
		3642	Consider the following example:
		3643
		3644	if ~ defined alpha
		3645	alpha:
		3646	end if
		3647
		3648	The "defined" operator gives the true value when the expression following it
		3649	could be calculated in this place, which in this case means that the "alpha"
		3650	label is defined somewhere. But the above block causes this label to be defined
		3651	only when the value given by "defined" operator is false, which leads to an
		3652	antinomy and makes it impossible to resolve such code. When processing the "if"
		3653	directive assembler has to predict whether the "alpha" label will be defined
		3654	somewhere (it wouldn't have to predict only if the label was already defined
		3655	earlier in this pass), and whatever the prediction is, the opposite always
		3656	happens. Thus the assembly will fail, unless the "alpha" label is defined
		3657	somewhere in source preceding the above block of instructions - in such case,
		3658	as it was already noted, the prediction is not needed and the block will just
		3659	get skipped.
		3660	The above sample might have been written as an attempt to define the label only
		3661	when it was not yet defined. It fails, because the "defined" operator does
		3662	check whether the label is defined anywhere, and this includes the definition
		3663	inside this conditionally processed block. However adding some additional
		3664	condition may make it possible to get it resolved:
		3665
		3666	if ~ defined alpha | defined @f
		3667	alpha:
		3668	@@:
		3669	end if
		3670
		3671	The "@f" is always the same label as the nearest "@@" symbol in the source
		3672	following it, so the above sample would mean the same if any unique name was
		3673	used instead of the anonymous label. When "alpha" is not defined in any other
		3674	place in source, the only possible solution is when this block gets defined,
		3675	and this time this doesn't lead to the antinomy, because of the anonymous
		3676	label which makes this block self-establishing. To better understand this,
		3677	look at the block that has nothing more than this self-establishing:
		3678
		3679	if defined @f
		3680	@@:
		3681	end if
		3682
		3683	This is an example of source that may have more than one solution, as both
		3684	cases when this block gets processed or not are equally correct. Which one of
		3685	those two solutions we get depends on the algorithm of the assembler, in case
		3686	of flat assembler - on the algorithm of predictions. Back to the previous
		3687	sample, when "alpha" is not defined anywhere else, the condition for "if" block
		3688	cannot be false, so we are left with only one possible solution, and we can
		3689	hope the assembler will arrive at it. On the other hand, when "alpha" is
		3690	defined in some other place, we've got two possible solutions again, but one of
		3691	them causes "alpha" to be defined twice, and such an error causes assembler to
		3692	abort the assembly immediately, as this is the kind of error that deeply
		3693	disturbs the process of resolving. So we can get such source either correctly
		3694	resolved or causing an error, and what we get may depend on the internal
		3695	choices made by the assembler.
		3696	However there are some facts about such choices that are certain. When
		3697	assembler has to check whether the given symbol is defined and it was already
		3698	defined in the current pass, no prediction is needed - it was already noted
		3699	above. And when the given symbol has never been defined before, including all
		3700	the already finished passes, the assembler predicts it to be not defined.
		3701	Knowing this, we can expect that the simple self-establishing block shown
		3702	above will not be assembled at all and that the previous sample will resolve
		3703	correctly when "alpha" is defined somewhere before our conditional block,
		3704	while it will itself define "alpha" when it's not already defined earlier, thus
		3705	potentially causing the error because of double definition if the "alpha" is
		3706	also defined somewhere later.
		3707	The "used" operator may be expected to behave in a similar manner in
		3708	analogous cases, however any other kinds of predictions may not be so simple and
		3709	you should never rely on them this way.
		3710	The "err" directive, usually used to stop the assembly when some condition is
		3711	met, stops the assembly immediately, regardless of whether the current pass
		3712	is final or intermediate. So even when the condition that caused this directive
		3713	to be interpreted is mispredicted and temporary, and would eventually disappear
		3714	in the later passes, the assembly is stopped anyway.
		3715	The "assert" directive signals the error only if its expression is false
		3716	after all the symbols have been resolved. You can use "assert 0" in place of
		3717	"err" when you do not want to have assembly stopped during the intermediate
		3718	passes.
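The difference can be sketched with a size check. The label "start" and the limit of 256 bytes are assumptions of this example, and the two checks are alternatives shown together only for comparison:

```fasm
start:
        ; ... instructions whose total length may change between passes ...

        assert $-start <= 256   ; reported only after all values are resolved

if $-start > 256
        err                     ; stops at once, even during an intermediate pass
end if
```

With the second variant, a temporary misprediction of the code size in an early pass would be enough to stop the assembly.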
		3719
		3720
		3721	2.3 Preprocessor directives
		3722
		3723	All preprocessor directives are processed before the main assembly process,
		3724	and therefore are not affected by the control directives. At this time also
		3725	all comments are stripped out.
		3726
		3727
		3728	2.3.1 Including source files
		3729
		3730	"include" directive includes the specified source file at the position where
		3731	it is used. It should be followed by the quoted name of file that should be
		3732	included, for example:
		3733
		3734
		3735
		3736	The whole included file is preprocessed before preprocessing the lines next
		3737	to the line containing the "include" directive. There are no limits to the
		3738	number of included files as long as they fit in memory.
		3739	The quoted path can contain environment variables enclosed within "%"
		3740	characters, they will be replaced with their values inside the path, both the
		3741	"\" and "/" characters are allowed as path separators. The file is first
		3742	searched for in the directory containing file which included it and when it is
		3743	not found there, the search is continued in the directories specified in the
		3744	environment variable called INCLUDE (the multiple paths separated with
		3745	semicolons can be defined there, they will be searched in the same order as
		3746	specified). If file was not found in any of these places, preprocessor looks
		3747	for it in the directory containing the main source file (the one specified in
		3748	command line). These rules concern also paths given with the "file" directive.
		3749
		3750
		3751	2.3.2 Symbolic constants
		3752
		3753	The symbolic constants are different from the numerical constants: before the
		3754	assembly process they are replaced with their values everywhere in source
		3755	lines after their definitions, and anything can become their values.
		3756	The definition of symbolic constant consists of name of the constant
		3757	followed by the "equ" directive. Everything that follows this directive will
		3758	become the value of constant. If the value of symbolic constant contains
		3759	other symbolic constants, they are replaced with their values before assigning
		3760	this value to the new constant. For example:
		3761
		3762	d equ dword
		3763	NULL equ d 0
		3764	d equ edx
		3765
		3766	After these three definitions the value of the "NULL" constant is "dword 0" and
		3767	the value of "d" is "edx". So, for example, "push NULL" will be assembled as
		3768	"push dword 0" and "push d" will be assembled as "push edx". And if then the
		3769	following line was put:
		3770
		3771
		3772
		3773
		3774	lists of symbols can be defined.
		3775	"restore" directive allows getting back the previous value of a redefined symbolic
		3776	constant. It should be followed by one or more names of symbolic constants,
		3777	separated with commas. So "restore d" after the above definitions will give
		3778	"d" constant back the value "edx", the second one will restore it to value
		3779	"dword", and one more will revert "d" to original meaning as if no such
		3780	constant was defined. If there was no constant defined of given name,
		3781	"restore" will not cause an error, it will be just ignored.
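The stacking of definitions can be seen in a short sketch. The constant name "reg" is an assumption of this example:

```fasm
reg equ eax
reg equ ebx
        inc reg         ; assembled as: inc ebx
restore reg
        inc reg         ; assembled as: inc eax
restore reg             ; "reg" is no longer a symbolic constant
restore reg             ; nothing left to restore, silently ignored
```

Each "equ" puts a new value on top of the previous one, and each "restore" removes only the most recent value.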
		3782	Symbolic constant can be used to adjust the syntax of assembler to personal
		3783	preferences. For example the following set of definitions provides the handy
		3784	shortcuts for all the size operators:
		3785
		3786	b equ byte
		3787	w equ word
		3788	d equ dword
		3789	p equ pword
		3790	f equ fword
		3791	q equ qword
		3792	t equ tword
		3793	x equ dqword
		3794	y equ qqword
		3795
		3796	Because a symbolic constant may also have an empty value, it can be used to
		3797	allow the syntax with "offset" word before any address value:
		3798
		3799	offset equ
		3800
		3801	After this definition "mov ax,offset char" will be a valid construction for
		3802	copying the offset of "char" variable into "ax" register, because "offset" is
		3803	replaced with an empty value, and therefore ignored.
		3804	The "define" directive followed by the name of constant and then the value,
		3805	is the alternative way of defining symbolic constant. The only difference
		3806	between "define" and "equ" is that "define" assigns the value as it is, it does
		3807	not replace the symbolic constants with their values inside it.
		3808	Symbolic constants can also be defined with the "fix" directive, which has
		3809	the same syntax as "equ", but defines constants of high priority - they are
		3810	replaced with their symbolic values even before processing the preprocessor
		3811	directives and macroinstructions, the only exception is "fix" directive
		3812	itself, which has the highest possible priority, so it allows redefinition of
		3813	constants defined this way.
		3814	The "fix" directive can be used for syntax adjustments related to directives
		3815	of preprocessor, which cannot be done with "equ" directive. For example:
		3816
		3817
		3818
		3819
		3820	with "equ" directive wouldn't give such result, as standard symbolic constants
		3821	are replaced with their values after searching the line for preprocessor
		3822	directives.
		3823
		3824
		3825	2.3.3 Macroinstructions
		3826
		3827	"macro" directive allows you to define your own complex instructions, called
		3828	macroinstructions, using which can greatly simplify the process of
		3829	programming. In its simplest form it's similar to symbolic constant
		3830	definition. For example the following definition defines a shortcut for the
		3831	"test al,0xFF" instruction:
		3832
		3833	macro tst {test al,0xFF}
		3834
		3835	After the "macro" directive there is the name of macroinstruction and then its
		3836	contents enclosed between the "{" and "}" characters. You can use "tst"
		3837	instruction anywhere after this definition and it will be assembled as
		3838	"test al,0xFF". Defining symbolic constant "tst" of that value would give the
		3839	similar result, but the difference is that the name of macroinstruction is
		3840	recognized only as an instruction mnemonic. Also, macroinstructions are
		3841	replaced with corresponding code even before the symbolic constants are
		3842	replaced with their values. So if you define macroinstruction and symbolic
		3843	constant of the same name, and use this name as an instruction mnemonic, it
		3844	will be replaced with the contents of macroinstruction, but it will be
		3845	replaced with the value of the symbolic constant if used somewhere inside the
		3846	operands.
		3847	The definition of macroinstruction can consist of many lines, because
		3848	"{" and "}" characters don't have to be in the same line as "macro" directive.
		3849	For example:
		3850
		3851
		3852	{
		3853	xor al,al
		3854	stosb
		3855	}
		3856
		3857	This macroinstruction will be replaced with these two assembly
		3858	instructions anywhere it's used.
		3859	Like instructions which need some number of operands, the macroinstruction
		3860	can be defined to need some number of arguments separated with commas. The
		3861	names of needed arguments should follow the name of macroinstruction in the
		3862	line of "macro" directive and should be separated with commas if there is more
		3863	than one. Anywhere one of these names occurs in the contents of
		3864	macroinstruction, it will be replaced with corresponding value, provided when
		3865	the macroinstruction is used. Here is an example of a macroinstruction that
		3866	will do data alignment for binary output format:
		3867
		3868
		3869
		3870
		3871	defined, it will be replaced with contents of this macroinstruction, and the
		3872	"value" will there become 4, so the result will be "rb (4-1)-($+4-1) mod 4".
		3873	If a macroinstruction is defined that uses an instruction with the same name
		3874	inside its definition, the previous meaning of this name is used. Useful
		3875	redefinition of macroinstructions can be done in that way, for example:
		3876
		3877	macro mov op1,op2
		3878	{
		3879	if op1 in <ds,es,fs,gs,ss> & op2 in <cs,ds,es,fs,gs,ss>
		3880	push op2
		3881	pop op1
		3882	else
		3883	mov op1,op2
		3884	end if
		3885	}
		3886
		3887	This macroinstruction extends the syntax of "mov" instruction, allowing both
		3888	operands to be segment registers. For example "mov ds,es" will be assembled as
		3889	"push es" and "pop ds". In all other cases the standard "mov" instruction will
		3890	be used. The syntax of this "mov" can be extended further by defining next
		3891	macroinstruction of that name, which will use the previous macroinstruction:
		3892
		3893	macro mov op1,op2,op3
		3894	{
		3895	if op3 eq
		3896	mov op1,op2
		3897	else
		3898	mov op1,op2
		3899	mov op2,op3
		3900	end if
		3901	}
		3902
		3903	It allows "mov" instruction to have three operands, but it can still have two
		3904	operands only, because when macroinstruction is given less arguments than it
		3905	needs, the rest of arguments will have empty values. When three operands are
		3906	given, this macroinstruction will become two macroinstructions of the previous
		3907	definition, so "mov es,ds,dx" will be assembled as "push ds", "pop es" and
		3908	"mov ds,dx".
		3909	By placing the "*" after the name of argument you can mark the argument as
		3910	required - preprocessor will not allow it to have an empty value. For example
		3911	the above macroinstruction could be declared as "macro mov op1*,op2*,op3" to
		3912	make sure that first two arguments will always have to be given some non empty
		3913	values.
		3914	Alternatively, you can provide the default value for argument, by placing
		3915	the "=" followed by value after the name of argument. Then if the argument
		3916	has an empty value provided, the default value will be used instead.
		3917	When it's needed to provide macroinstruction with argument that contains
		3918	some commas, such argument should be enclosed between "<" and ">" characters.
		3919	If it contains more than one "<" character, the same number of ">" should be
		3920	used to tell that the value of argument ends.
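The last two features can be sketched as follows. Both macroinstruction names, "bytes" and "pad", and the default value 4 are assumptions made up for this example:

```fasm
; takes one argument, which may itself contain commas
macro bytes list
{
        db list
}

; "count" gets the value 4 when it is left empty
macro pad count=4
{
        db count dup (0)
}

        bytes <1,2,3>           ; becomes: db 1,2,3
        pad                     ; becomes: db 4 dup (0)
        pad 16                  ; becomes: db 16 dup (0)
```

Without the "<" and ">" characters, "bytes 1,2,3" would be given three separate values, while the macroinstruction is defined to need only one argument.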
		3921	"purge" directive allows removing the last definition of specified
		3922	macroinstruction. It should be followed by one or more names of
		3923	macroinstructions, separated with commas. If such macroinstruction has not
		3924	been defined, you will not get any error. For example after having the syntax
		3925	of "mov" extended with the macroinstructions defined above, you can disable
		3926	syntax with three operands back by using "purge mov" directive. Next
		3927	"purge mov" will disable also syntax for two operands being segment registers,
		3928	and all the next such directives will do nothing.
		3929	If after the "macro" directive you enclose some group of arguments' names in
		3930	square brackets, it will allow giving more values for this group of arguments
		3931	when using that macroinstruction. Any further argument given after the last
		3932	argument of such group will begin the new group and will become the first
		3933	argument of it. That's why after closing the square bracket no more argument
		3934	names can follow. The contents of macroinstruction will be processed for each
		3935	such group of arguments separately. The simplest example is to enclose one
		3936	argument name in square brackets:
		3937
		3938	macro stoschar [char]
		3939	{
		3940	mov al,char
		3941	stosb
		3942	}
		3943
		3944	This macroinstruction accepts unlimited number of arguments, and each one
		3945	will be processed into these two instructions separately. For example
		3946	"stoschar 1,2,3" will be assembled as the following instructions:
		3947
		3948	mov al,1
		3949	stosb
		3950	mov al,2
		3951	stosb
		3952	mov al,3
		3953	stosb
		3954
		3955	There are some special directives available only inside the definitions of
		3956	macroinstructions. "local" directive defines local names, which will be
		3957	replaced with unique values each time the macroinstruction is used. It should
		3958	be followed by names separated with commas. If the name given as parameter to
		3959	"local" directive begins with a dot or two dots, the unique labels generated
		3960	by each evaluation of macroinstruction will have the same properties.
		3961	This directive is usually needed for the constants or labels that
		3962	macroinstruction defines and uses internally. For example:
		3963
		3964
		3965	{
		3966	local move
		3967	move:
		3968	lodsb
		3969	stosb
		3970	test al,al
		3971	jnz move
		3972	}
		3973
		3974	Each time this macro is used, "move" will become a different unique name
		3975	in its instructions, so you will not get an error you normally get when some
		3976	label is defined more than once.
		3977	"forward", "reverse" and "common" directives divide macroinstruction into
		3978	blocks, each one processed after the processing of previous is finished. They
		3979	differ in behavior only if macroinstruction allows multiple groups of
		3980	arguments. Block of instructions that follows "forward" directive is processed
		3981	for each group of arguments, from first to last - exactly like the default
		3982	block (not preceded by any of these directives). Block that follows "reverse"
		3983	directive is processed for each group of arguments in reverse order - from last
		3984	to first. Block that follows "common" directive is processed only once,
		3985	commonly for all groups of arguments. Local name defined in one of the blocks
		3986	is available in all the following blocks when processing the same group of
		3987	arguments as when it was defined, and when it is defined in common block it is
		3988	available in all the following blocks not depending on which group of
		3989	arguments is processed.
		3990	Here is an example of macroinstruction that will create the table of
		3991	addresses to strings followed by these strings:
		3992
		3993	macro strtbl name,[string]
		3994	{
		3995	common
		3996	label name dword
		3997	forward
		3998	local label
		3999	dd label
		4000	forward
		4001	label db string,0
		4002	}
		4003
		4004	First argument given to this macro will become the label for the table
		4005	of addresses, next arguments should be the strings. First block is processed
		4006	only once and defines the label, second block for each string declares its
		4007	local name and defines the table entry holding the address to that string.
		4008	Third block defines the data of each string with the corresponding label.
		4009	The directive starting the block in macroinstruction can be followed by the
		4010	first instruction of this block in the same line, like in the following
		4011	example:
		4012
		4013	macro stdcall proc,[arg]
		4014	{
		4015	reverse push arg
		4016	common call proc
		4017	}
		4018
		4019	This macro can be used to call procedures that use the STDCALL
		4020	convention, which has all the arguments pushed on stack in the reverse order.
		4021	For example "stdcall foo,1,2,3" will be assembled as:
		4022
		4023	push 3
		4024	push 2
		4025	push 1
		4026	call foo
		4027
		4028	If some name inside macroinstruction has multiple values (it is either one
		4029	of the arguments enclosed in square brackets or local name defined in the
		4030	block following "forward" or "reverse" directive) and is used in block
		4031	following the "common" directive, it will be replaced with all of its values,
		4032	separated with commas. For example the following macroinstruction will pass
		4033	all of the additional arguments to the previously defined "stdcall"
		4034	macroinstruction:
		4035
		4036	macro invoke proc,[arg]
		4037	{ common stdcall [proc],arg }
		4038
		4039	It can also be used to call indirectly (by the pointer stored in memory) the
		4040	procedure using STDCALL convention.
		4041	Inside macroinstruction also special operator "#" can be used. This
		4042	operator causes two names to be concatenated into one name. It can be useful,
		4043	because it's done after the arguments and local names are replaced with their
		4044	values. The following macroinstruction will generate the conditional jump
		4045	according to the "cond" argument:
		4046
		4047	macro jif op1,cond,op2,label
		4048	{
		4049	cmp op1,op2
		4050	j#cond label
		4051	}
		4052
		4053	For example "jif ax,ae,10h,exit" will be assembled as "cmp ax,10h" and
		4054	"jae exit" instructions.
		4055	The "#" operator can be also used to concatenate two quoted strings into one.
		4056	Also conversion of name into a quoted string is possible, with the "`" operator,
		4057	which likewise can be used inside the macroinstruction. It converts the name
		4058	that follows it into a quoted string - but note, that when it is followed by
		4059	a macro argument which is being replaced with value containing more than one
		4060	symbol, only the first of them will be converted, as the "`" operator converts
		4061	only one symbol that immediately follows it. Here's an example of utilizing
		4062	those two features:
		4063
		4064	macro label name
		4065	{
		4066	label name
		4067	if ~ used name
		4068	display `name # " is defined but not used.",13,10
		4069	end if
		4070	}
		4071
		4072	When a label defined with such macro is not used in the source, macro will warn
		4073	you with the message, informing to which label it applies.
		4074	To make a macroinstruction behave differently when some of the arguments are
		4075	of some special type, for example a quoted string, you can use the "eqtype"
		4076	comparison operator. Here's an example of utilizing it to distinguish a
		4077	quoted string from another argument:
		4078
		4079	macro message arg
		4080	{
		4081	if arg eqtype ""
		4082	local str
		4083	jmp @f
		4084	str db arg,0Dh,0Ah,24h
		4085	@@:
		4086	mov dx,str
		4087	else
		4088	mov dx,arg
		4089	end if
		4090	mov ah,9
		4091	int 21h
		4092	}
		4093
		4094	The above macro is designed for displaying messages in DOS programs. When the
		4095	argument of this macro is some number, label, or variable, the string from
		4096	that address is displayed, but when the argument is a quoted string, the
		4097	created code will display that string followed by the carriage return and
		4098	line feed.
		4099	It is also possible to put a declaration of macroinstruction inside another
		4100	macroinstruction, so one macro can define another, but there is a problem
		4101	with such definitions caused by the fact, that "}" character cannot occur
		4102	inside the macroinstruction, as it always means the end of definition. To
		4103	overcome this problem, the escaping of symbols inside macroinstruction can be
		4104	used. This is done by placing one or more backslashes in front of any other
		4105	symbol (even the special character). Preprocessor sees such sequence as a
		4106	single symbol, but each time it meets such symbol during the macroinstruction
		4107	processing, it cuts the backslash character from the front of it. For example
		4108	"\{" is treated as single symbol, but during processing of the macroinstruction
		4109	it becomes the "{" symbol. This allows to put one definition of
		4110	macroinstruction inside another:
		4111
		4112	macro ext instr
		4113	{
		4114	macro instr op1,op2,op3
		4115	\{
		4116	if op3 eq
		4117	instr op1,op2
		4118	else
		4119	instr op1,op2
		4120	instr op2,op3
		4121	end if
		4122	\}
		4123	}
		4124
		4125	ext add
		4126	ext sub
		4127
		4128	The macro "ext" is defined correctly, but when it is used, the "\{" and "\}"
		4129	become the "{" and "}" symbols. So when the "ext add" is processed, the
		4130	contents of macro becomes valid definition of a macroinstruction and this way
		4131	the "add" macro becomes defined. In the same way "ext sub" defines the "sub"
		4132	macro. The use of "\{" symbol wasn't really necessary here, but is done this
		4133	way to make the definition more clear.
		4134	If some directives specific to macroinstructions, like "local" or "common"
		4135	are needed inside some macro embedded this way, they can be escaped in the same
		4136	way. Escaping the symbol with more than one backslash is also allowed, which
		4137	allows multiple levels of nesting the macroinstruction definitions.
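As a minimal sketch of escaping a macro-specific directive, the following defines a hypothetical "defwait" macroinstruction (the names "defwait", "delay", "count" and "again" are assumptions chosen for this illustration) whose inner definition needs its own "local" label:

```fasm
; "defwait" defines another macro whose name is given as the argument
macro defwait name
{
  macro name count
  \{
    \local again        ; escaped, so it is seen as "local" by the inner macro only
    mov ecx,count
   again:
    loop again
  \}
}

defwait delay           ; defines the macroinstruction "delay"
delay 100               ; expands with its own unique "again" label
delay 200               ; gets another unique label, so no redefinition error
```

Without the backslash, "local" would be interpreted while "defwait" itself is processed, and every use of "delay" would then share one label name.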
		4138	Another technique for defining one macroinstruction by another is to
		4139	use the "fix" directive, which becomes useful when some macroinstruction only
		4140	begins the definition of another one, without closing it. For example:
		4141
		4142	macro tmacro [params]
		4143	{
		4144	common macro params {
		4145	}
		4146
		4147	MACRO fix tmacro
		4148	ENDM fix }
		4149
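		4150	defines an alternative syntax for defining macroinstructions, which looks like:
		4151
		4152	MACRO stoschar char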
		4153	mov al,char
		4154	stosb
		4155	ENDM
		4156
		4157	Note that a symbol with such customized definition must be defined with "fix"
		4158	directive, because only the prioritized symbolic constants are processed before
		4159	the preprocessor looks for the "}" character while defining the macro. This
		4160	might be a problem if one needed to perform some additional tasks at the end
		4161	of such definition, but there is one more feature which helps in such cases.
		4162	Namely it is possible to put any directive, instruction or macroinstruction
		4163	just after the "}" character that ends the macroinstruction and it will be
		4164	processed in the same way as if it was put in the next line.
		4165
		4166
		4167	2.3.4 Structures
		4168
		4169	"struc" directive is a special variant of "macro" directive that is used to
		4170	define data structures. Macroinstruction defined using the "struc" directive
		4171	must be preceded by a label (like the data definition directive) when it's
		4172	used. This label will be also attached at the beginning of every name starting
		4173	with dot in the contents of macroinstruction. The macroinstruction defined
		4174	using the "struc" directive can have the same name as some other
		4175	macroinstruction defined using the "macro" directive, structure
		4176	macroinstruction will not prevent the standard macroinstruction from being
		4177	processed when there is no label before it and vice versa. All the rules and
		4178	features concerning standard macroinstructions apply to structure
		4179	macroinstructions.
		4180	Here is the sample of structure macroinstruction:
		4181
		4182	struc point x,y
		4183	{
		4184	.x dw x
		4185	.y dw y
		4186	}
		4187
		4188	For example "my point 7,11" will define structure labeled "my", consisting of
		4189	two variables: "my.x" with value 7 and "my.y" with value 11.
		4190	If somewhere inside the definition of structure the name consisting of a
		4191	single dot is found, it is replaced by the name of the label for the given
		4192	instance of structure and this label will not be defined automatically in
		4193	such case, allowing to completely customize the definition. The following
		4194	example utilizes this feature to extend the data definition directive "db"
		4195	with ability to calculate the size of defined data:
		4196
		4197	struc db [data]
		4198	{
		4199	common
		4200	. db data
		4201	.size = $ - .
		4202	}
		4203
		4204	With such definition "msg db 'Hello!',13,10" will define also "msg.size"
		4205	constant, equal to the size of defined data in bytes.
		4206	Defining data structures addressed by registers or absolute values should be
		4207	done using the "virtual" directive with structure macroinstruction
		4208	(see 2.2.4).
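A minimal sketch of that combination, assuming a two-field structure macroinstruction named "point" and an instance label "pt" (both names are chosen here only for illustration):

```fasm
; structure macroinstruction with two word-sized fields
struc point x,y
{
  .x dw x
  .y dw y
}

; no data is emitted inside "virtual", only labels relative to EBX are defined
virtual at ebx
  pt point ?,?
end virtual

mov ax,[pt.x]           ; same as mov ax,[ebx]
mov dx,[pt.y]           ; same as mov dx,[ebx+2]
```

Because the fields are declared with "dw", the labels carry the word size, so the operand size of both loads is checked by the assembler.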
		4209	"restruc" directive removes the last definition of the structure, just like
		4210	"purge" does with macroinstructions and "restore" with symbolic constants.
		4211	It also has the same syntax - should be followed by one or more names of
		4212	structure macroinstructions, separated with commas.
		4213
		4214
		4215	2.3.5 Repeating macroinstructions
		4216
		4217	The "rept" directive is a special kind of macroinstruction, which makes given
		4218	amount of duplicates of the block enclosed with braces. The basic syntax is
		4219	"rept" directive followed by number and then block of source enclosed between
		4220	the "{" and "}" characters. The simplest example:
		4221
		4222	rept 5 { in al,dx }
		4223
		4224	will make five duplicates of the "in al,dx" line. The block of instructions
		4225	is defined in the same way as for the standard macroinstruction and any
		4226	special operators and directives which can be used only inside
		4227	macroinstructions are also allowed here. When the given count is zero, the
		4228	block is simply skipped, as if you defined macroinstruction but never used
		4229	it. The number of repetitions can be followed by the name of counter symbol,
		4230	which will get replaced symbolically with the number of duplicate currently
		4231	generated. So this:
		4232
		4233	rept 3 counter
		4234	{
		4235	byte#counter db counter
		4236	}
		4237
		4238	will generate lines:
		4239
		4240	byte1 db 1
		4241	byte2 db 2
		4242	byte3 db 3
		4243
		4244	The repetition mechanism applied to "rept" blocks is the same as the one used
		4245	to process multiple groups of arguments for macroinstructions, so directives
		4246	like "forward", "common" and "reverse" can be used in their usual meaning.
		4247	Thus such macroinstruction:
		4248
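		4249	rept 7 num { reverse display `num }
		4250
		4251	will display digits from 7 to 1 as text. The "local" directive behaves in the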
		4252	same way as inside macroinstruction with multiple groups of arguments, so:
		4253
		4254	rept 21
		4255	{
		4256	local label
		4257	label: loop label
		4258	}
		4259
		4260	will generate a unique label for each duplicate.
		4261	The counter symbol by default counts from 1, but you can declare different
		4262	base value by placing the number preceded by colon immediately after the name
		4263	of counter. For example:
		4264
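		4265	rept 8 n:0 { pxor xmm#n,xmm#n }
		4266
		4267	will generate code which will clear the contents of eight SSE registers.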
		4268	You can define multiple counters separated with commas, and each one can have
		4269	different base.
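A small sketch of two counters with different bases (the counter names "i" and "j" are arbitrary):

```fasm
rept 3 i:0,j:10 { db i,j }
; generates:
;   db 0,10
;   db 1,11
;   db 2,12
```

Both counters advance together, one step per duplicate, each starting from its own base.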
		4270	The number of repetitions and the base values for counters can be specified
		4271	using the numerical expressions with operator rules identical as in the case
		4272	of assembler. However each value used in such expression must either be a
		4273	directly specified number, or a symbolic constant with value also being an
		4274	expression that can be calculated by preprocessor (in such case the value
		4275	of expression associated with symbolic constant is calculated first, and then
		4276	substituted into the outer expression in place of that constant). If you need
		4277	repetitions based on values that can only be calculated at assembly time, use
		4278	one of the code repeating directives that are processed by assembler, see
		4279	section 2.2.3.
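The difference between the two kinds of repetition can be sketched as follows; the names "COUNT" and "entries" are assumptions made for this illustration:

```fasm
; symbolic constant whose value the preprocessor can calculate
COUNT equ 2+2
rept COUNT n { dd n }   ; preprocessor generates four lines: dd 1 ... dd 4

; numerical constant, known only at assembly time
entries = 4
repeat entries          ; repetition performed by the assembler (see 2.2.3)
  dd %                  ; "%" is the number of the current repetition
end repeat
```

Both blocks emit the same four double words, but only the first one could use operators such as "#" on the counter, since that is a preprocessor feature.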
		4280	The "irp" directive iterates the single argument through the given list of
		4281	parameters. The syntax is "irp" followed by the argument name, then the comma
		4282	and then the list of parameters. The parameters are specified in the same
		4283	way as in the invocation of standard macroinstruction, so they have to be
		4284	separated with commas and each one can be enclosed with the "<" and ">"
		4285	characters. Also the name of argument may be followed by "*" to mark that it
		4286	cannot get an empty value. Such block:
		4287
		4288	irp value, 2,3,5
		4289	{ db value }
		4290
		4291	will generate lines:
		4292
		4293	db 2
		4294	db 3
		4295	db 5
		4296
		4297	The "irps" directive iterates through the given list of symbols, it should
		4298	be followed by the argument name, then the comma and then the sequence of any
		4299	symbols. Each symbol in this sequence, no matter whether it is the name
		4300	symbol, symbol character or quoted string, becomes an argument value for one
		4301	iteration. If there are no symbols following the comma, no iteration is done
		4302	at all. This example:
		4303
		4304	irps reg, al bx ecx
		4305	{ xor reg,reg }
		4306
		4307	will generate lines:
		4308
		4309	xor al,al
		4310	xor bx,bx
		4311	xor ecx,ecx
		4312
		4313	The blocks defined by the "irp" and "irps" directives are also processed in
		4314	the same way as any macroinstructions, so operators and directives specific
		4315	to macroinstructions may be freely used also in this case.
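Two further sketches of these directives, showing a parameter enclosed with "<" and ">" for "irp" and symbol characters used as values for "irps" (the argument names "value" and "op" are arbitrary):

```fasm
irp value, <1,2>,3 { db value }
; generates:
;   db 1,2
;   db 3

irps op, + - { dd 10 op 2 }
; generates:
;   dd 10 + 2
;   dd 10 - 2
```

The angle brackets keep "1,2" together as a single parameter, while "irps" treats every separate symbol, including "+" and "-", as one iteration value.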
		4316
		4317
		4318	2.3.6 Conditional preprocessing
		4319
		4320	"match" directive causes some block of source to be preprocessed and passed
		4321	to assembler only when the given sequence of symbols matches the specified
		4322	pattern. The pattern comes first, ended with comma, then the symbols that have
		4323	to be matched with the pattern, and finally the block of source, enclosed
		4324	within braces as macroinstruction.
		4325	There are a few rules for building the expression for matching, first is
		4326	that any of symbol characters and any quoted string should be matched exactly
		4327	as is. In this example:
		4328
		4329	match +,+ { include 'first.inc' }
		4330	match +,- { include 'second.inc' }
		4331
		4332	the first file will get included, since "+" after comma matches the "+" in
		4333	pattern, and the second file will not be included, since there is no match.
		4334	To match any other symbol literally, it has to be preceded by "=" character
		4335	in the pattern. Also to match the "=" character itself, or the comma, the
		4336	"==" and "=," constructions have to be used. For example the "=a==" pattern
		4337	will match the "a=" sequence.
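A brief sketch of literal matching; the displayed texts and the names "x" and "y" are assumptions used only for illustration:

```fasm
; "=a" matches the name "a" literally and "==" matches the "=" character
match =a==, a= { display 'matched a=',13,10 }

; "=mov" is a literal name, "=," is a literal comma, "x" and "y" match the rest
match =mov x=,y, mov eax,ebx { display 'two operands',13,10 }
```

In the second line the first unescaped comma ends the pattern, so everything between it and the opening brace is the sequence being matched.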
		4338	If some name symbol is placed in the pattern, it matches any sequence
		4339	consisting of at least one symbol and then this name is replaced with the
		4340	matched sequence everywhere inside the following block, analogously to the
		4341	parameters of macroinstruction. For instance:
		4342
		4343	match a-b, 0-7
		4344	{ dw a,b-a }
		4345
		4346	will generate the "dw 0,7-0" instruction. Each name is always matched with
		4347	as few symbols as possible, leaving the rest for the following ones, so in
		4348	this case:
		4349
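		4350	match a b, 1+2+3 { db a }
		4351
		4352	the "a" name will match the "1" symbol, leaving the "+2+3" sequence to be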
		4353	matched with "b". But in this case:
		4354
		4355	match a b, 1 { db a }
		4356
		4357	there will be nothing left for "b" to match, so the block will not get
		4358	processed at all.
		4359	The block of source defined by match is processed in the same way as any
		4360	macroinstruction, so any operators specific to macroinstructions can be used
		4361	also in this case.
		4362	What makes "match" directive more useful is the fact, that it replaces the
		4363	symbolic constants with their values in the matched sequence of symbols (that
		4364	is everywhere after comma up to the beginning of the source block) before
		4365	performing the match. Thanks to this it can be used for example to process
		4366	some block of source under the condition that some symbolic constant has the
		4367	given value, like:
		4368
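		4369	match =TRUE, DEBUG { include 'debug.inc' }
		4370
		4371	which will include the file only when the symbolic constant "DEBUG" was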
		4372	defined with value "TRUE".
		4373
		4374
		4375	2.3.7 Order of processing
		4376
		4377	When combining various features of the preprocessor, it's important to know
		4378	the order in which they are processed. As it was already noted, the highest
		4379	priority has the "fix" directive and the replacements defined with it. This
		4380	is done completely before doing any other preprocessing, therefore this
		4381	piece of source:
		4382
		4383	V fix {
		4384	macro empty
		4385	V
		4386	V fix }
		4387	V
		4388
		4389	becomes a valid definition of an empty macroinstruction. It can be interpreted
		4390	that the "fix" directive and prioritized symbolic constants are processed in
		4391	a separate stage, and all other preprocessing is done after on the resulting
		4392	source.
		4393	The standard preprocessing that comes after, on each line begins with
		4394	recognition of the first symbol. It starts with checking for the preprocessor
		4395	directives, and when none of them is detected, preprocessor checks whether the
		4396	first symbol is macroinstruction. If no macroinstruction is found, it moves
		4397	to the second symbol of line, and again begins with checking for directives,
		4398	which in this case is only the "equ" directive, as this is the only one that
		4399	occurs as the second symbol in line. If there is no directive, the second
		4400	symbol is checked for the case of structure macroinstruction and when none
		4401	of those checks gives the positive result, the symbolic constants are replaced
		4402	with their values and such line is passed to the assembler.
		4403	To see it on an example, assume that a macroinstruction called "foo" and a
		4404	structure macroinstruction called "bar" are defined. Those lines:
		4405
		4406	foo equ
		4407	foo bar
		4408
		4409	would both be interpreted as invocations of macroinstruction "foo", since
		4410	the meaning of the first symbol overrides the meaning of second one.
		4411	When the macroinstruction generates the new lines from its definition block,
		4412	in every line it first scans for macroinstruction directives, and interprets
		4413	them accordingly. All the other content in the definition block is used to
		4414	brew the new lines, replacing the macroinstruction parameters with their values
		4415	and then processing the symbol escaping and "#" and "`" operators. The
		4416	conversion operator has the higher priority than concatenation and if any of
		4417	them operates on the escaped symbol, the escaping is cancelled before finishing
		4418	the operation. After this is completed, the newly generated line goes through
		4419	the standard preprocessing, as described above.
		4420	Though the symbolic constants are usually only replaced in the lines, where
		4421	no preprocessor directives nor macroinstructions have been found, there are some
		4422	special cases where those replacements are performed in the parts of lines
		4423	containing directives. First one is the definition of symbolic constant, where
		4424	the replacements are done everywhere after the "equ" keyword and the resulting
		4425	value is then assigned to the new constant (see 2.3.2). The second such case
		4426	is the "match" directive, where the replacements are done in the symbols
		4427	following comma before matching them with pattern. These features can be used
		4428	for example to maintain the lists, like this set of definitions:
		4429
		4430	list equ
		4431
		4432	macro append item
		4433	{
		4434	match any, list \{ list equ list,item \}
		4435	match , list \{ list equ item \}
		4436	}
		4437
		4438	The "list" constant is here initialized with empty value, and the "append"
		4439	macroinstruction can be used to add the new items into this list, separating
		4440	them with commas. The first match in this macroinstruction occurs only when
		4441	the value of list is not empty (see 2.3.6), in such case the new value for the
		4442	list is the previous one with the comma and the new item appended at the end.
		4443	The second match happens only when the list is still empty, and in such case
		4444	the list is defined to contain just the new item. So starting with the empty
		4445	list, the "append 1" would define "list equ 1" and the "append 2" following it
		4446	would define "list equ 1,2". One might then need to use this list as the
		4447	parameters to some macroinstruction. But it cannot be done directly - if "foo"
		4448	is the macroinstruction, then "foo list" would just pass the "list" symbol
		4449	as a parameter to macro, since symbolic constants are not unrolled at this
		4450	stage. For this purpose again "match" directive comes in handy:
		4451
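		4452	match params, list { foo params }
		4453
		4454	The value of "list", if it's not empty, matches the "params" keyword, which is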
		4455	then replaced with matched value when generating the new lines defined by the
		4456	block enclosed with braces. So if the "list" had value "1,2", the above line
		4457	would generate the line containing "foo 1,2", which would then go through the
		4458	standard preprocessing.
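The whole technique can be sketched end to end as below. To keep the sketch self-contained, the "db" directive stands in for the "foo" macroinstruction, and the name "params" is an arbitrary choice:

```fasm
list equ                ; start with an empty list

macro append item
{
  match any, list \{ list equ list,item \}  ; list not empty: append after a comma
  match , list \{ list equ item \}          ; list empty: item becomes the list
}

append 1                ; list is now "1"
append 2                ; list is now "1,2"

match params, list { db params }            ; generates: db 1,2
```

Writing "db list" directly would also work here, because symbolic constants are replaced in ordinary lines; the "match" wrapper matters when the target is a macroinstruction that has to see the separate arguments.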
		4459	The other special case is in the parameters of "rept" directive. The amount
		4460	of repetitions and the base value for counter can be specified using
		4461	numerical expressions, and if there is a symbolic constant with non-numerical
		4462	name used in such an expression, preprocessor tries to evaluate its value as
		4463	a numerical expression and if succeeds, it replaces the symbolic constant with
		4464	the result of that calculation and continues to evaluate the primary
		4465	expression. If the expression inside that symbolic constant also contains
		4466	some symbolic constants, preprocessor will try to calculate all the needed
		4467	values recursively.
		4468	This allows to perform some calculations at the time of preprocessing, as
		4469	long as all the values used are the numbers known at the preprocessing stage.
		4470	A single repetition with "rept" can be used for the sole purpose of
		4471	calculating some value, like in this example:
		4472
		4473	define a b+4
		4474	define b 3
		4475	rept 1 result:a*b+2 { define c result }
		4476
		4477	To compute the base value for the "result" counter, preprocessor replaces the "b"
		4478	with its value and recursively calculates the value of "a", obtaining 7 as
		4479	the result, then it calculates the main expression with the result being 23.
		4480	The "c" then gets defined with the first value of counter (because the block
		4481	is processed just one time), which is the result of the computation, so the
		4482	value of "c" is simply the "23" symbol. Note that if "b" is later redefined with
		4483	some other numerical value, the next time an expression containing "a" is
		4484	calculated, the value of "a" will reflect the new value of "b", because the
		4485	symbolic constant contains just the text of the expression.
		4486	There is one more special case - when preprocessor goes to checking the
		4487	second symbol in the line and it happens to be the colon character (which is
		4488	then interpreted by assembler as definition of a label), it stops in this
		4489	place and finishes the preprocessing of the first symbol (so if it's the
		4490	symbolic constant it gets unrolled) and if it still appears to be the label,
		4491	it performs the standard preprocessing starting from the place after the
		4492	label. This allows to place preprocessor directives and macroinstructions
		4493	after the labels, analogously to the instructions and directives processed
		4494	by assembler, like:
		4495
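		4496	start: include 'start.inc'
		4497
		4498	However if the label becomes broken during preprocessing (for example when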
		4499	it is the symbolic constant with empty value), only replacing of the symbolic
		4500	constants is continued for the rest of line.
		4501	It should be remembered, that the jobs performed by preprocessor are the
		4502	preliminary operations on the text symbols, that are done in a simple
		4503	single pass before the main process of assembly. The text that is the
		4504	result of preprocessing is passed to assembler, and it then does its
		4505	multiple passes on it. Thus the control directives, which are recognized and
		4506	processed only by the assembler - as they are dependent on the numerical
		4507	values that may even vary between passes - are not recognized in any way by
		4508	the preprocessor and have no effect on the preprocessing. Consider this
		4509	example source:
		4510
		4511	if 0
		4512	a = 1
		4513	b equ 2
		4514	end if
		4515	dd b
		4516
		4517	When it is preprocessed, the only directive that is recognized by the
		4518	preprocessor is the "equ", which defines symbolic constant "b", so later
		4519	in the source the "b" symbol is replaced with the value "2". Except for this
		4520	replacement, the other lines are passed unchanged to the assembler. So
		4521	after preprocessing the above source becomes:
		4522
		4523	if 0
		4524	a = 1
		4525	end if
		4526	dd 2
		4527
		4528	So when the assembler processes it, the condition for "if" is false, and
		4529	the "a" constant doesn't get defined. However symbolic constant "b" was
		4530	processed normally, even though its definition was put just next to the one
		4531	of "a". So because of the possible confusion you should be very careful
		4532	every time when mixing the features of preprocessor and assembler - in such
		4533	cases it is important to realize what the source will become after the
		4534	preprocessing, and thus what the assembler will see and do its multiple passes
		4535	on.
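When a symbolic definition really has to be conditional, the condition must be one the preprocessor can evaluate. A minimal sketch, assuming a hypothetical symbolic constant "FEATURE" that holds either TRUE or FALSE:

```fasm
FEATURE equ FALSE

if 0
  a = 1                 ; skipped by the assembler, "a" stays undefined
end if

; evaluated by the preprocessor: the block is dropped unless FEATURE is TRUE,
; so in this case "b" does not get defined either
match =TRUE, FEATURE { b equ 2 }
```

Here both definitions are skipped, each by the stage that owns it, which avoids the mismatch shown in the example above.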
		4536
		4537
		4538	2.4 Formatter directives
		4539
		4540	These directives are actually also a kind of control directives, with the
		4541	purpose of controlling the format of generated code.
		4542	"format" directive followed by the format identifier allows to select the
		4543	output format. This directive should be put at the beginning of the source.
		4544	Default output format is a flat binary file, it can also be selected by using
		4545	"format binary" directive. This directive can be followed by the "as" keyword
		4546	and the quoted string specifying the default file extension for the output
		4547	file. Unless the output file name was specified from the command line,
		4548	assembler will use this extension when generating the output file.
		4549	"use16" and "use32" directives force the assembler to generate 16-bit or
		4550	32-bit code, overriding the default setting for selected output format. "use64"
		4551	enables generating the code for the long mode of x86-64 processors.
		4552	Below are described different output formats with the directives specific to
		4553	these formats.
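A minimal sketch of the general settings described above; the 'img' extension, the origin address and the "start" label are assumptions chosen for illustration:

```fasm
format binary as 'img'  ; flat binary, default output extension becomes "img"
use16                   ; generate 16-bit code
org 7C00h               ; labels are calculated as if loaded at 7C00h

start:
  jmp start             ; endless loop as a placeholder for real code
```

If an output file name is given on the command line, it takes precedence over the extension set with "as".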
		4554
		4555
		4556	2.4.1 MZ executable
		4557
		4558	To select the MZ output format, use "format MZ" directive. The default code
		4559	setting for this format is 16-bit.
		4560	"segment" directive defines a new segment, it should be followed by label,
		4561	whose value will be the number of defined segment, optionally "use16" or
		4562	"use32" word can follow to specify whether code in this segment should be
		4563	16-bit or 32-bit. The origin of segment is aligned to paragraph (16 bytes).
		4564	All the labels defined then will have values relative to the beginning of this
		4565	segment.
		4566	"entry" directive sets the entry point for MZ executable, it should be
		4567	followed by the far address (name of segment, colon and the offset inside
		4568	segment) of desired entry point.
		4569	"stack" directive sets up the stack for MZ executable. It can be followed by
		4570	numerical expression specifying the size of stack to be created automatically
		4571	or by the far address of initial stack frame when you want to set up the stack
		4572	manually. When no stack is defined, the stack of default size 4096 bytes will
		4573	be created.
		4574	"heap" directive should be followed by a 16-bit value defining maximum size
		4575	of additional heap in paragraphs (this is heap in addition to stack and
		4576	undefined data). Use "heap 0" to always allocate only the memory the program
		4577	really needs. Default size of heap is 65535.
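The directives of this format can be combined as in the following sketch of a small DOS program; the segment names "main" and "text", the labels and the stack size are assumptions chosen for illustration:

```fasm
format MZ
entry main:start        ; far address of the entry point
stack 100h              ; automatically created stack of 256 bytes
heap 0                  ; allocate only the memory the program needs

segment main
start:
  mov ax,text           ; segment label gives the segment number
  mov ds,ax
  mov dx,hello          ; offset relative to the beginning of "text"
  mov ah,9              ; DOS function: display "$"-terminated string
  int 21h
  mov ax,4C00h          ; DOS function: terminate program
  int 21h

segment text
hello db 'Hello world!',24h
```

Each "segment" starts on a paragraph boundary, which is why "hello" has offset zero within "text".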
		4578
		4579
		4580	2.4.2 Portable Executable
		4581
		4582	To select the Portable Executable output format, use "format PE" directive, it
		4583	can be followed by additional format settings: first the target subsystem
		4584	setting, which can be "console" or "GUI" for Windows applications, "native"
		4585	for Windows drivers, "EFI", "EFIboot" or "EFIruntime" for the UEFI, it may be
		4586	followed by the minimum version of system that the executable is targeted to
		4587	(specified in form of floating-point value). Optional "DLL" and "WDM" keywords
		4588	mark the output file as a dynamic link library and WDM driver respectively,
		4589	and the "large" keyword marks the executable as able to handle addresses
		4590	larger than 2 GB.
		4591	After those settings can follow the "at" operator and a numerical expression
		4592	specifying the base of PE image and then optionally "on" operator followed by
		4593	the quoted string containing file name selects custom MZ stub for PE program
		4594	(when specified file is not a MZ executable, it is treated as a flat binary
		4595	executable file and converted into MZ format). The default code setting for
		4596	this format is 32-bit. The example of fully featured PE format declaration:
		4597
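		4598	format PE GUI 4.0 DLL at 7000000h on 'stub.exe'
		4599
		4600	To create PE file for the x86-64 architecture, use "PE64" keyword instead of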
		4601	"PE" in the format declaration, in such case the long mode code is generated
		4602	by default.
		4603	"section" directive defines a new section, it should be followed by quoted
		4604	string defining the name of section, then one or more section flags can
		4605	follow. Available flags are: "code", "data", "readable", "writeable",
		4606	"executable", "shareable", "discardable", "notpageable". The origin of section
		4607	is aligned to page (4096 bytes). Example declaration of PE section:
		4608
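		4609	section '.text' code readable executable
		4610
		4611	Along with flags also one of the special PE data identifiers can be specified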
		4612	to mark the whole section as a special data, possible identifiers are
		4613	"export", "import", "resource" and "fixups". If the section is marked to
		4614	contain fixups, they are generated automatically and no more data needs to be
		4615	defined in this section. Also resource data can be generated automatically
		4616	from the resource file, it can be achieved by writing the "from" operator and
		4617	quoted file name after the "resource" identifier. Below are the examples of
		4618	sections containing some special PE data:
		4619
		4620	section '.reloc' data readable discardable fixups
		4621	section '.rsrc' data readable resource from 'my.res'
		4622
		4623	"entry" directive sets the entry point for Portable Executable, the value of
		4624	entry point should follow.
		4625	"stack" directive sets up the size of stack for Portable Executable, value
		4626	of stack reserve size should follow, optionally value of stack commit
		4627	separated with comma can follow. When stack is not defined, it's set by
		4628	default to size of 4096 bytes.
		4629	"heap" directive chooses the size of heap for Portable Executable, value of
		4630	heap reserve size should follow, optionally value of heap commit separated
		4631	with comma can follow. When no heap is defined, it is set by default to size
		4632	of 65536 bytes, when size of heap commit is unspecified, it is by default set
		4633	to zero.
		4634	"data" directive begins the definition of special PE data, it should be
		4635	followed by one of the data identifiers ("export", "import", "resource" or
		4636	"fixups") or by the number of data entry in PE header. The data should be
		4637	defined in next lines, ended with "end data" directive. When fixups data
		4638	definition is chosen, they are generated automatically and no more data needs
		4639	to be defined there. The same applies to the resource data when the "resource"
		4640	identifier is followed by "from" operator and quoted file name - in such case
		4641	data is taken from the given resource file.
		4642	The "rva" operator can be used inside the numerical expressions to obtain
		4643	the RVA of the item addressed by the value it is applied to, that is the
		4644	offset relative to the base of PE image.
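A skeleton combining these directives might look as follows. It is only a sketch of the declarations: the section names, labels and sizes are assumptions, and a real program would normally also need import data to call any system function.

```fasm
format PE console
entry start
stack 10000h,1000h      ; stack reserve and commit sizes
heap 10000h             ; heap reserve size, commit defaults to zero

section '.text' code readable executable
start:
  xor eax,eax
  ret

section '.data' data readable writeable
start_rva dd rva start  ; offset of "start" relative to the base of PE image
```

Each section begins on a page boundary, so "start" and "start_rva" end up in different 4096-byte pages of the image.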
		4645
		4646
		4647	2.4.3 Common Object File Format
		4648
		4649	To select Common Object File Format, use "format COFF" or "format MS COFF"
		4650	directive, depending whether you want to create classic (DJGPP) or Microsoft's
		4651	variant of COFF file. The default code setting for this format is 32-bit. To
		4652	create the file in Microsoft's COFF format for the x86-64 architecture, use
		4653	"format MS64 COFF" setting, in such case long mode code is generated by
		4654	default.
		4655	"section" directive defines a new section, it should be followed by quoted
		4656	string defining the name of section, then one or more section flags can
		4657	follow. Section flags available for both COFF variants are "code" and "data",
		4658	while flags "readable", "writeable", "executable", "shareable", "discardable",
		4659	"notpageable", "linkremove" and "linkinfo" are available only with Microsoft's
		4660	COFF variant.
		4661	By default section is aligned to double word (four bytes), in case of
		4662	Microsoft COFF variant other alignment can be specified by providing the
		4663	"align" operator followed by alignment value (any power of two up to 8192)
		4664	among the section flags.
		4665	The "extrn" directive declares an external symbol; it should be followed by the
		4666	name of the symbol and optionally by a size operator specifying the size of the data
		4667	labeled by this symbol. The name of the symbol can also be preceded by a quoted
		4668	string containing the name of the external symbol and the "as" operator.
		4669	Some example declarations of external symbols:
		4670
		4671
		4672	extrn '__imp__MessageBoxA@16' as MessageBox:dword
		4673
		4674	The "public" directive declares an existing symbol as public; it should be
		4675	followed by the name of the symbol, and optionally it can be followed by the "as"
		4676	operator and a quoted string containing the name under which the symbol should be
		4677	available as public. Some examples of public symbol declarations:
		4678
		4679
		4680	public start as '_start'
		4681
		4682	Additionally, with the COFF format an exported symbol can be declared as
		4683	static; this is done by preceding the name of the symbol with the "static" keyword.
		4684	When using Microsoft's COFF format, the "rva" operator can be used
		4685	inside numerical expressions to obtain the RVA of the item addressed by the
		4686	value it is applied to.
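
The "extrn" and "public" declarations shown in this section could be combined into a small object file along these lines. This is only a sketch: it assumes the object is later linked against an import library providing __imp__MessageBoxA@16, and the strings and the data labels are made up for the example.

```asm
format MS COFF

extrn '__imp__MessageBoxA@16' as MessageBox:dword
public start as '_start'

section '.text' code readable executable

start:
        push    0                       ; uType
        push    caption                 ; lpCaption
        push    message                 ; lpText
        push    0                       ; hWnd
        call    [MessageBox]            ; indirect call through the imported pointer
        ret

section '.data' data readable writeable

caption db 'fasm',0
message db 'Hello from an object file',0
```

Because the external symbol is declared with the "dword" size, it can be used directly as the memory operand of the indirect call.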
		4687
		4688	2.4.4 Executable and Linkable Format
		4689
		4690	To select the ELF output format, use the "format ELF" directive. The default code
		4691	setting for this format is 32-bit. To create an ELF file for the x86-64
		4692	architecture, use the "format ELF64" directive; in that case long mode code is
		4693	generated by default.
		4694	The "section" directive defines a new section; it should be followed by a quoted
		4695	string defining the name of the section, which can be followed by one or both of the
		4696	"executable" and "writeable" flags and optionally by the "align" operator followed
		4697	by a number specifying the alignment of the section (it has to be a power of
		4698	two). If no alignment is specified, the default value is used, which is 4 or 8,
		4699	depending on which format variant has been chosen.
		4700	The "extrn" and "public" directives have the same meaning and syntax as when the
		4701	COFF output format is selected (described in the previous section).
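
For illustration, a minimal ELF object with one executable and one writeable section might look like this. The section names, the labels and the alignment value are assumptions made for the example.

```asm
format ELF

public main

section '.text' executable

main:
        mov     eax,[counter]           ; return the current counter value
        ret

section '.data' writeable align 16

counter dd 0
```

The '.text' section has no "align" operator, so it gets the default alignment of 4, the 32-bit default.
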
		4702	The "rva" operator can also be used with this format (however, not
		4703	when the target architecture is x86-64); it converts the address into the offset
		4704	relative to the GOT, so it may be useful for creating position-independent
		4705	code. There is also a special "plt" operator, which allows calling external
		4706	functions through the Procedure Linkage Table. You can even create an alias
		4707	for an external function that makes it always be called through the PLT, with
		4708	code like:
		4709
		4710
		4711	printf = PLT _printf
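
The operator can also be applied directly at the call site. The following sketch assumes a 32-bit object that is later linked against a C library exporting the symbol as "printf"; the message text and its label are made up for the example.

```asm
format ELF

extrn printf
public main

section '.text' executable

main:
        push    message
        call    PLT printf              ; call through the Procedure Linkage Table
        add     esp,4                   ; cdecl: the caller removes the argument
        xor     eax,eax                 ; return 0
        ret

section '.data' writeable

message db 'Hello from fasm',0xA,0
```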
		4712
		4713	To create an executable file, follow the format choice directive with the
		4714	"executable" keyword and optionally a number specifying the brand of the
		4715	target operating system (for example, the value 3 marks the executable
		4716	for the Linux system). With this format selected, it is allowed to use the "entry"
		4717	directive followed by the value to set as the entry point of the program. On the other
		4718	hand, it makes the "extrn" and "public" directives unavailable, and instead of
		4719	"section" the "segment" directive should be used, followed by one or
		4720	more segment permission flags and optionally a marker of a special ELF
		4721	executable segment, which can be "interpreter", "dynamic" or "note". The
		4722	origin of a segment is aligned to a page (4096 bytes), and the available permission
		4723	flags are "readable", "writeable" and "executable".
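
A minimal sketch of a directly executable ELF file for 32-bit Linux, using the "entry" and "segment" directives described above. The system call numbers are those of the 32-bit Linux "int 0x80" interface; the label names and the message are arbitrary.

```asm
format ELF executable 3                 ; 3 marks the executable for Linux
entry start

segment readable executable

start:
        mov     eax,4                   ; sys_write
        mov     ebx,1                   ; file descriptor: stdout
        mov     ecx,msg
        mov     edx,msg_size
        int     0x80

        mov     eax,1                   ; sys_exit
        xor     ebx,ebx                 ; exit code 0
        int     0x80

segment readable writeable

msg db 'Hello world!',0xA
msg_size = $-msg
```

No linker is involved here, so there are no "extrn" or "public" declarations and the program talks to the kernel directly.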
		4724
		4725
		4726
