flat assembler
Programmer's Manual

Table of contents
-----------------

Chapter 1 Introduction

    1.1 Compiler overview
        1.1.1 System requirements
        1.1.2 Executing compiler from command line
        1.1.3 Compiler messages
        1.1.4 Output formats

    1.2 Assembly syntax
        1.2.1 Instruction syntax
        1.2.2 Data definitions
        1.2.3 Constants and labels
        1.2.4 Numerical expressions
        1.2.5 Jumps and calls
        1.2.6 Size settings

Chapter 2 Instruction set

    2.1 The x86 architecture instructions
        2.1.1 Data movement instructions
        2.1.2 Type conversion instructions
        2.1.3 Binary arithmetic instructions
        2.1.4 Decimal arithmetic instructions
        2.1.5 Logical instructions
        2.1.6 Control transfer instructions
        2.1.7 I/O instructions
        2.1.8 Strings operations
        2.1.9 Flag control instructions
        2.1.10 Conditional operations
        2.1.11 Miscellaneous instructions
        2.1.12 System instructions
        2.1.13 FPU instructions
        2.1.14 MMX instructions
        2.1.15 SSE instructions
        2.1.16 SSE2 instructions
        2.1.17 SSE3 instructions
        2.1.18 AMD 3DNow! instructions
        2.1.19 The x86-64 long mode instructions

    2.2 Control directives
        2.2.1 Numerical constants
        2.2.2 Conditional assembly
        2.2.3 Repeating blocks of instructions
        2.2.4 Addressing spaces
        2.2.5 Other directives
        2.2.6 Multiple passes

    2.3 Preprocessor directives
        2.3.1 Including source files
        2.3.2 Symbolic constants
        2.3.3 Macroinstructions
        2.3.4 Structures
        2.3.5 Repeating macroinstructions
        2.3.6 Conditional preprocessing
        2.3.7 Order of processing

    2.4 Formatter directives
        2.4.1 MZ executable
        2.4.2 Portable Executable
        2.4.3 Common Object File Format
        2.4.4 Executable and Linkable Format
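

Chapter 1 Introduction
----------------------

This chapter contains all the most important information you need to begin
using the flat assembler. If you are an experienced assembly language
programmer, you should read at least this chapter before using this compiler.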

1.1 Compiler overview

Flat assembler is a fast assembly language compiler for the x86 architecture
processors, which does multiple passes to optimize the size of generated
machine code. It is self-compilable and versions for different operating
systems are provided. All the versions are designed to be used from the system
command line and they should not differ in behavior.

1.1.1 System requirements

All versions require the x86 architecture 32-bit processor (at least 80386),
although they can produce programs for the x86 architecture 16-bit processors,
too. The DOS version requires an OS compatible with MS DOS 2.0 and either a
true real mode environment or DPMI. The Windows version requires a Win32
console compatible with version 3.1.

1.1.2 Executing compiler from command line

To execute flat assembler from the command line you need to provide two
parameters - the first should be the name of the source file, the second
should be the name of the destination file. If no second parameter is given,
the name for the output file will be guessed automatically. After displaying
short information about the program name and version, the compiler will read
the data from the source file and compile it. When the compilation is
successful, the compiler will write the generated code to the destination file
and display the summary of the compilation process; otherwise it will display
the information about the error that occurred.

The source file should be a text file, and can be created in any text
editor. Line breaks are accepted in both DOS and Unix standards, tabulators
are treated as spaces.

In the command line you can also include the "-m" option followed by a number,
which specifies the maximum number of kilobytes of memory flat assembler
should use. In the DOS version this option limits only the usage of extended
memory. The "-p" option followed by a number can be used to specify the limit
for the number of passes the assembler performs. If code cannot be generated
within the specified number of passes, the assembly will be terminated with an
error message. The maximum value of this setting is 65536, while the default
limit, used when no such option is included in the command line, is 100.

There are no command line options that would affect the output of the
compiler; flat assembler requires only the source code to include the
information it really needs. For example, to specify the output format you use
the "format" directive at the beginning of the source.

1.1.3 Compiler messages

As it is stated above, after the successful compilation, the compiler displays
the compilation summary. It includes the information of how many passes were
done, how much time it took, and how many bytes were written into the
destination file.

The following is an example of the compilation summary:

    38 passes, 5.3 seconds, 77824 bytes.

In case of error during the compilation process, the program will display an
error message. For example, when the compiler can't find the input file, it
will display the following message:

    error: source file not found.

If the error is connected with a specific part of source code, the source line
that caused the error will be also displayed. The placement of this line in
the source is also given to help you find the error, for example:

    example.asm [3]:
    mob ax,1
    error: illegal instruction.

It means that in the third line of the "example.asm" file the compiler has
encountered an unrecognized instruction. When the line that caused the error
contains a macroinstruction, the line in the macroinstruction definition
that generated the erroneous instruction is displayed as well:

    example.asm [6]:
    stoschar 7
    example.asm [3] stoschar [1]:
    mob al,char
    error: illegal instruction.

It means that the macroinstruction in the sixth line of the "example.asm" file
generated an unrecognized instruction with the first line of its definition.
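
For illustration, a source file that would plausibly produce the last listing
might look like the sketch below. The file contents are an assumption
reconstructed from the line numbers in the message and are not taken from this
manual; "mob" is a deliberate misspelling of "mov", and macroinstructions
themselves are described in 2.3.3.

```asm
; example.asm - hypothetical source that fails to assemble on purpose
macro stoschar char         ; line 1: the macroinstruction definition begins
{                           ; line 2
    mob al,char             ; line 3: first line of the definition, misspelled
    stosb                   ; line 4
}                           ; line 5
    stoschar 7              ; line 6: the use that triggers the error
```

Correcting "mob" to "mov" in the third line should let the file assemble.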

1.1.4 Output formats

By default, when there is no "format" directive in the source file, flat
assembler simply puts generated instruction codes into output, creating this
way a flat binary file. By default it generates 16-bit code, but you can always
turn it into the 16-bit or 32-bit mode by using the "use16" or "use32"
directive. Some of the output formats switch into 32-bit mode when selected -
more information about the formats which you can choose can be found in 2.4.

All output code is always in the order in which it was entered into the
source file.
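
A minimal sketch of a source with no "format" directive, which should therefore
be assembled into a flat binary file. The DOS .COM layout, the DOS calls, the
"org" directive (covered in 2.2.4) and the file names are assumptions chosen
for illustration only.

```asm
; hello.asm (hypothetical name); assemble with: fasm hello.asm hello.com
        use16                   ; explicit, although 16-bit is already the default
        org     100h            ; DOS loads .COM programs at offset 100h

        mov     ah,9            ; DOS function 09h: print a '$'-terminated string
        mov     dx,message
        int     21h
        mov     ax,4C00h        ; DOS function 4Ch: exit with return code 0
        int     21h

message db      'Hello!',13,10,'$'
```

The bytes appear in the output in exactly this order: code first, then the
string, because nothing is reordered.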

1.2 Assembly syntax

The information provided below is intended mainly for the assembler
programmers that have been using some other assembly compilers before.
If you are a beginner, you should look for assembly programming tutorials.

Flat assembler by default uses the Intel syntax for the assembly
instructions, although you can customize it using the preprocessor
capabilities (macroinstructions and symbolic constants). It also has its own
set of directives - the instructions for the compiler.

All symbols defined inside the sources are case-sensitive.

1.2.1 Instruction syntax

Instructions in assembly language are separated by line breaks, and one
instruction is expected to fill one line of text. If a line contains
a semicolon, except for the semicolons inside the quoted strings, the rest of
this line is a comment and the compiler ignores it. If a line ends with the
"\" character (optionally followed by a semicolon and a comment), the next
line is attached at this point.

Each line in the source is a sequence of items, each of which is one of three
types. The first type is the symbol characters, which are the special
characters that are individual items even when they are not spaced from the
other ones. Any of the "+-*/=<>()[]{}:,\|&~#`" is a symbol character. A
sequence of other characters, separated from other items with either blank
spaces or symbol characters, is a symbol. If the first character of a symbol
is either a single or double quote, it integrates any sequence of characters
following it, even the special ones, into a quoted string, which should end
with the same character with which it began (the single or double quote) -
however if there are two such characters in a row (without any other character
between them), they are integrated into the quoted string as just one of them
and the quoted string continues. The symbols other than symbol characters
and quoted strings can be used as names, so they are also called name symbols.
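
The rules for comments, line continuation and quoted strings can be sketched
as follows. The values are arbitrary and only serve to show how the lines are
split into items.

```asm
        mov     eax,1 + \       ; a comment may follow the backslash
                2               ; assembled as "mov eax,1 + 2"
        db      'It''s; fine'   ; doubled quote gives one quote, ";" stays in the string
        db      "say ""hi"""    ; the same rule applies to double quotes
```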

Every instruction consists of the mnemonic and a varying number of
operands, separated with commas. An operand can be a register, an immediate
value or data addressed in memory; it can also be preceded by a size operator
to define or override its size (table 1.1). Names of the available registers
are listed in table 1.2; their sizes cannot be overridden. An immediate value
can be specified by any numerical expression.

When an operand is data in memory, the address of that data (also any
numerical expression, but it may contain registers) should be enclosed in
square brackets or preceded by the "ptr" operator. For example the instruction
"mov eax,3" will put the immediate value 3 into the EAX register, the
instruction "mov eax,[7]" will put the 32-bit value from the address 7 into
EAX, and the instruction "mov byte [7],3" will put the immediate value 3 into
the byte at address 7; it can also be written as "mov byte ptr 7,3". To
specify which segment register should be used for addressing, the segment
register name followed by a colon should be put just before the address value
(inside the square brackets or after the "ptr" operator).
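
The operand forms described above, collected into one short listing. The
addresses and registers are arbitrary; the last line is an added illustration
of a segment register placed inside the square brackets.

```asm
        mov     eax,3           ; immediate value 3 into EAX
        mov     eax,[7]         ; 32-bit value at address 7 into EAX
        mov     byte [7],3      ; size operator needed: no operand implies a size
        mov     byte ptr 7,3    ; the same instruction written with "ptr"
        mov     al,[es:bx]      ; byte at ES:BX, segment named inside the brackets
```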
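
Table 1.1 Size operators
+----------+------+-------+
| Operator | Bits | Bytes |
+----------+------+-------+
| byte     | 8    | 1     |
| word     | 16   | 2     |
| dword    | 32   | 4     |
| fword    | 48   | 6     |
| pword    | 48   | 6     |
| qword    | 64   | 8     |
| tbyte    | 80   | 10    |
| tword    | 80   | 10    |
| dqword   | 128  | 16    |
+----------+------+-------+

Table 1.2 Registers
+---------+------+-----------------------------------------+
| Type    | Bits | Registers                               |
+---------+------+-----------------------------------------+
|         | 8    | al cl dl bl ah ch dh bh                 |
| General | 16   | ax cx dx bx sp bp si di                 |
|         | 32   | eax ecx edx ebx esp ebp esi edi         |
+---------+------+-----------------------------------------+
| Segment | 16   | es cs ss ds fs gs                       |
+---------+------+-----------------------------------------+
| Control | 32   | cr0 cr2 cr3 cr4                         |
+---------+------+-----------------------------------------+
| Debug   | 32   | dr0 dr1 dr2 dr3 dr6 dr7                 |
+---------+------+-----------------------------------------+
| FPU     | 80   | st0 st1 st2 st3 st4 st5 st6 st7         |
+---------+------+-----------------------------------------+
| MMX     | 64   | mm0 mm1 mm2 mm3 mm4 mm5 mm6 mm7         |
+---------+------+-----------------------------------------+
| SSE     | 128  | xmm0 xmm1 xmm2 xmm3 xmm4 xmm5 xmm6 xmm7 |
+---------+------+-----------------------------------------+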

1.2.2 Data definitions

To define data or reserve a space for it, use one of the directives listed in
table 1.3. The data definition directive should be followed by one or more
numerical expressions, separated with commas. These expressions define the
values for data cells of size depending on which directive is used. For
example "db 1,2,3" will define the three bytes of values 1, 2 and 3
respectively.

The "db" and "du" directives also accept quoted string values of any
length, which will be converted into a chain of bytes when "db" is used and
into a chain of words with zeroed high byte when "du" is used. For example
"db 'abc'" will define the three bytes of values 61h, 62h and 63h.

The "dp" directive and its synonym "df" accept values consisting of two
numerical expressions separated with a colon; the first value will become the
high word and the second value will become the low double word of the far
pointer value. Also "dd" accepts such pointers consisting of two word values
separated with a colon, and "dt" accepts a word and a quad word value
separated with a colon, where the quad word is stored first. The "dt"
directive with a single expression as parameter accepts only floating point
values and creates data in the FPU double extended precision format.

Any of the above directives allows the usage of the special "dup" operator to
make multiple copies of given values. The count of duplicates should precede
this operator and the value to duplicate should follow - it can even be a
chain of values separated with commas, but such a set of values needs to be
enclosed in parentheses, like "db 5 dup (1,2)", which defines five copies
of the given two byte sequence.

The "file" is a special directive and its syntax is different. This
directive includes a chain of bytes from a file and it should be followed by
the quoted file name, then optionally a numerical expression specifying the
offset in the file, preceded by a colon, and - also optionally - a comma and a
numerical expression specifying the count of bytes to include (if no count is
specified, all data up to the end of file is included). For example
"file 'data.bin'" will include the whole file as binary data and
"file 'data.bin':10h,4" will include only four bytes starting at offset 10h.

The data reservation directive should be followed by only one numerical
expression, and this value defines how many cells of the specified size should
be reserved. All data definition directives also accept the "?" value, which
means that this cell should not be initialized to any value and the effect is
the same as by using the data reservation directive. The uninitialized data
may not be included in the output file, so its values should be always
considered unknown.
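
Table 1.3 Data directives
+---------+--------+---------+
| Size    | Define | Reserve |
| (bytes) | data   | data    |
+---------+--------+---------+
| 1       | db     | rb      |
|         | file   |         |
+---------+--------+---------+
| 2       | dw     | rw      |
|         | du     |         |
+---------+--------+---------+
| 4       | dd     | rd      |
+---------+--------+---------+
| 6       | dp     | rp      |
|         | df     | rf      |
+---------+--------+---------+
| 8       | dq     | rq      |
+---------+--------+---------+
| 10      | dt     | rt      |
+---------+--------+---------+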
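
A short sketch of the data directives in use. All label names and values are
made up for illustration; the "file" directive is left out because it needs
an existing file to read from.

```asm
bytes   db      1,2,3           ; three bytes
msg     db      'abc',0         ; bytes 61h, 62h, 63h, then a zero byte
wide    du      'abc'           ; three words: 0061h, 0062h, 0063h
farptr  dp      8:0             ; word 8 is the high part, double word 0 the low part
pairs   db      5 dup (1,2)     ; 1,2,1,2,1,2,1,2,1,2
hole    dd      ?               ; one uninitialized double word
buffer  rb      64              ; 64 reserved bytes
```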

1.2.3 Constants and labels

In the numerical expressions you can also use constants or labels instead of
numbers. To define a constant or label you should use the specific
directives. Each label can be defined only once and it is accessible from
any place of the source (even before it was defined). A constant can be
redefined many times, but in this case it is accessible only after it was
defined, and is always equal to the value from the last definition before the
place where it's used. When a constant is defined only once in the source, it
is - like the label - accessible from anywhere.

The definition of a constant consists of the name of the constant followed by
the "=" character and a numerical expression, which after calculation will
become the value of the constant. This value is always calculated at the time
the constant is defined. For example you can define the "count" constant by
using the directive "count = 17", and then use it in the assembly
instructions, like "mov cx,count" - which will become "mov cx,17" during the
compilation process.
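
A small sketch of defining and then redefining a constant. The second
definition and the DX register are additions made for illustration.

```asm
count = 17
        mov     cx,count        ; assembled as "mov cx,17"
count = count + 1               ; redefinition: count is 18 from here on
        mov     dx,count        ; assembled as "mov dx,18"
```

Because "count" is defined more than once here, it can only be used after a
definition, and each use sees the most recent value.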

There are different ways to define labels. The simplest is to follow the
name of the label by a colon; this directive can even be followed by another
instruction in the same line. It defines the label whose value is equal to
the offset of the point where it's defined. This method is usually used to
label the places in code. The other way is to follow the name of the label
(without a colon) by some data directive. It defines the label with value
equal to the offset of the beginning of the defined data, and remembered as a
label for data with cell size as specified for that data directive in
table 1.3.

The label can be treated as a constant of value equal to the offset of the
labeled code or data. For example when you define data using the labeled
directive "char db 224", to put the offset of this data into the BX register
you should use the "mov bx,char" instruction, and to put the value of the byte
addressed by the "char" label into the DL register, you should use
"mov dl,[char]" (or "mov dl,ptr char"). But when you try to assemble
"mov ax,[char]", it will cause an error, because fasm compares the sizes of
operands, which should be equal. You can force assembling that instruction by
using a size override: "mov ax,word [char]", but remember that this
instruction will read the two bytes beginning at the "char" address, while it
was defined as one byte.

The last and the most flexible way to define labels is to use the "label"
directive. This directive should be followed by the name of the label, then
optionally a size operator (it can be preceded by a colon) and then - also
optionally - the "at" operator and a numerical expression defining the address
at which this label should be defined. For example "label wchar word at char"
will define a new label for the 16-bit data at the address of "char". Now the
instruction "mov ax,[wchar]" will be after compilation the same as
"mov ax,word [char]". If no address is specified, the "label" directive
defines the label at the current offset. Thus "mov [wchar],57568" will copy
two bytes while "mov [char],224" will copy one byte to the same address.
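
The "char" and "wchar" examples from the text, gathered into one listing that
can be assembled as it stands. Placing the data before the code is only a
convenience for this sketch.

```asm
char    db      224
label   wchar word at char      ; 16-bit view of the same address

        mov     bx,char         ; offset of the data into BX
        mov     dl,[char]       ; the byte itself into DL
        mov     ax,word [char]  ; size override: reads two bytes starting at char
        mov     ax,[wchar]      ; same encoding as the previous line
        mov     [wchar],57568   ; stores two bytes
        mov     [char],224      ; stores one byte to the same address
```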

A label whose name begins with a dot is treated as a local label, and its name
is attached to the name of the last global label (with name beginning with
anything but a dot) to make the full name of this label. So you can use the
short name (beginning with a dot) of this label anywhere before the next
global label is defined, and in the other places you have to use the full
name. Labels beginning with two dots are the exception - they are like global
labels, but they don't become the new prefix for local labels.

The "@@" name means an anonymous label; you can define many of them in
the source. The symbol "@b" (or equivalent "@r") references the nearest
preceding anonymous label, the symbol "@f" references the nearest following
anonymous label. These special symbols are case-insensitive.
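
A sketch of local and anonymous labels. The label names "first", "second" and
"..helper" and the instructions between them are arbitrary placeholders.

```asm
first:
  .loop:
        dec     cx
        jnz     .loop           ; full name is first.loop
second:
  .loop:                        ; a different label: second.loop
        dec     dx
        jnz     .loop
        jmp     first.loop      ; full name needed outside of "first"
..helper:                       ; global, but does not become the new prefix
  .done:                        ; so this is still second.done
@@:
        dec     ax
        jnz     @b              ; nearest preceding anonymous label
        jmp     @f              ; nearest following anonymous label
        nop
@@:
```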

1.2.4 Numerical expressions

In the above examples all the numerical expressions were simple numbers,
constants or labels. But they can be more complex, using the arithmetical
or logical operators for calculations at compile time. All these operators
with their priority values are listed in table 1.4.

The operations with higher priority value will be calculated first; you can
of course change this behavior by putting some parts of the expression into
parentheses. The "+", "-", "*" and "/" are standard arithmetical operations,
"mod" calculates the remainder from division. The "and", "or", "xor", "shl",
"shr" and "not" perform the same logical operations as the assembly
instructions of those names. The "rva" operator performs the conversion of an
address into the relocatable offset and is specific to some of the output
formats (see 2.4).

Numbers in the expression are by default treated as decimal; binary
numbers should have the "b" letter attached at the end, octal numbers should
end with the "o" letter, hexadecimal numbers should begin with the "0x"
characters (like in the C language) or with the "$" character (like in the
Pascal language) or they should end with the "h" letter. Also a quoted string,
when encountered in an expression, will be converted into a number - the first
character will become the least significant byte of the number.

The numerical expression used as an address value can also contain any of the
general registers used for addressing; they can be added and multiplied by
appropriate values, as it is allowed for the x86 architecture instructions.

There are also some special symbols that can be used inside the numerical
expression. The first is "$", which is always equal to the value of the
current offset, while "$$" is equal to the base address of the current
addressing space. Another one is "%", which is the number of the current
repeat in parts of code that are repeated using some special directives
(see 2.2). There's also the "%t" symbol, which is always equal to the current
time stamp.

Any numerical expression can also consist of a single floating point value
(flat assembler does not allow any floating point operations at compilation
time) in the scientific notation. Such a value can end with the "f" letter to
be recognized, otherwise it should contain at least one of the "." or "E"
characters. So "1.0", "1E0" and "1f" define the same floating point value,
while simple "1" defines an integer value.
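
Table 1.4 Arithmetical and logical operators by priority
+----------+------------+
| Priority | Operators  |
+----------+------------+
| 0        | + -        |
+----------+------------+
| 1        | * /        |
+----------+------------+
| 2        | mod        |
+----------+------------+
| 3        | and or xor |
+----------+------------+
| 4        | shl shr    |
+----------+------------+
| 5        | not        |
+----------+------------+
| 6        | rva        |
+----------+------------+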
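
The number formats and special symbols can be tried out with a few data
definitions. The names "message" and "message_len" are hypothetical, and the
values in the comments follow from the rules stated in this section.

```asm
        db      10, 1010b, 12o, 0x0A, $0A, 0Ah  ; six spellings of the value ten
        dd      'abcd'          ; "a" is the least significant byte: 64636261h
        dw      (1+2)*3         ; 9, the parenthesized part is calculated first
message db      'hello'
message_len = $ - message       ; 5, since "$" is the current offset
        dd      1.0, 1E0, 1f    ; the same floating point value three times
```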

1.2.5 Jumps and calls

The operand of any jump or call instruction can be preceded not only by the
size operator, but also by one of the operators specifying the type of the
jump: "short", "near" or "far". For example, when the assembler is in 16-bit
mode, the instruction "jmp dword [0]" will become a far jump, and when the
assembler is in 32-bit mode, it will become a near jump. To force this
instruction to be treated differently, use the "jmp near dword [0]" or
"jmp far dword [0]" form.

When the operand of a near jump is an immediate value, the assembler will
generate the shortest variant of this jump instruction if possible (but won't
create a 32-bit instruction in 16-bit mode nor a 16-bit instruction in 32-bit
mode, unless there is a size operator stating it). By specifying the jump type
you can force it to always generate the long variant (for example
"jmp near 0") or to always generate the short variant and terminate with an
error when it's impossible (for example "jmp short 0").
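
The jump forms mentioned above, shown in both modes. The label "target" is a
hypothetical name; the memory operand [0] is taken from the text.

```asm
        use16
        jmp     dword [0]       ; far jump in 16-bit mode
        jmp     near dword [0]  ; forced near jump
        use32
        jmp     dword [0]       ; near jump in 32-bit mode
        jmp     far dword [0]   ; forced far jump
target:
        jmp     target          ; shortest variant chosen automatically
        jmp     short target    ; short variant, or an error if out of range
        jmp     near target     ; always the long variant
```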

1.2.6 Size settings

When an instruction uses some memory addressing, by default the smallest form
of the instruction is generated by using the short displacement if only the
address value fits in the range. This can be overridden using the "word" or
"dword" operator before the address inside the square brackets (or after the
"ptr" operator), which forces the long displacement of the appropriate size to
be made. When the address is not relative to any registers, those operators
also allow choosing the appropriate mode of absolute addressing.

Instructions "adc", "add", "and", "cmp", "or", "sbb", "sub" and "xor" with
the first operand being 16-bit or 32-bit are by default generated in the
shortened 8-bit form when the second operand is an immediate value fitting in
the range for signed 8-bit values. It can also be overridden by putting the
"word" or "dword" operator before the immediate value. Similar rules apply to
the "imul" instruction with the last operand being an immediate value.

An immediate value as an operand for the "push" instruction without a size
operator is by default treated as a word value if the assembler is in 16-bit
mode and as a double word value if the assembler is in 32-bit mode; the
shorter 8-bit form of this instruction is used if possible. The "word" or
"dword" size operator forces the "push" instruction to be generated in the
longer form for the specified size. The "pushw" and "pushd" mnemonics force
the assembler to generate 16-bit or 32-bit code without forcing it to use the
longer form of the instruction.
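
A sketch of the size settings in 32-bit mode. The registers and the value 4
are arbitrary; the comments restate which form each line is expected to
produce according to the rules above.

```asm
        use32
        mov     eax,[ebx+4]         ; short 8-bit displacement
        mov     eax,[dword ebx+4]   ; forced 32-bit displacement
        add     eax,4               ; shortened form with an 8-bit immediate
        add     eax,dword 4         ; full 32-bit immediate
        push    4                   ; 32-bit push, shorter 8-bit form
        push    dword 4             ; 32-bit push, longer form
        pushw   4                   ; 16-bit push, shorter form still allowed
```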


Chapter 2 Instruction set
-------------------------

This chapter provides the detailed information about the instructions and
directives supported by flat assembler. Directives for defining labels were
already discussed in 1.2.3, all other directives will be described later in
this chapter.

2.1 The x86 architecture instructions

In this section you can find both the information about the syntax and
purpose of the assembly language instructions. If you need more technical
information, look for the Intel Architecture Software Developer's Manual.

Assembly instructions consist of the mnemonic (instruction's name) and from
zero to three operands. If there are two or more operands, usually the first
is the destination operand and the second is the source operand. Each operand
can be a register, memory or an immediate value (see 1.2 for details about the
syntax of operands). After the description of each instruction there are
examples of different combinations of operands, if the instruction has any.

Some instructions act as prefixes and can be followed by another instruction
in the same line, and there can be more than one prefix in a line. Each name
of the segment register is also a mnemonic of an instruction prefix, although
it is recommended to use segment overrides inside the square brackets instead
of these prefixes.

2.1.1 Data movement instructions

"mov" transfers a byte, word or double word from the source operand to the
destination operand. It can transfer data between general registers, from
the general register to memory, or from memory to general register, but it
cannot move from memory to memory. It can also transfer an immediate value to
general register or memory, segment register to general register or memory,
general register or memory to segment register, control or debug register to
general register and general register to control or debug register. The "mov"
can be assembled only if the size of the source operand and the size of the
destination operand are the same. Below are the examples for each of the
allowed combinations:

    mov bx,ax       ; general register to general register
    mov [char],al   ; general register to memory
    mov bl,[char]   ; memory to general register
    mov dl,32       ; immediate value to general register
    mov [char],32   ; immediate value to memory
    mov ax,ds       ; segment register to general register
    mov [bx],ds     ; segment register to memory
    mov ds,ax       ; general register to segment register
    mov ds,[bx]     ; memory to segment register
    mov eax,cr0     ; control register to general register
    mov cr3,ebx     ; general register to control register

"xchg" swaps the contents of two operands. This instruction can be used to
exchange two byte operands, two word operands or two double word operands.
The order of operands is not important. The operands may be two general
registers, or a general register with memory. For example:

    xchg ax,bx      ; swap two general registers
    xchg al,[char]  ; swap register with memory

"push" decrements the stack frame pointer (ESP register), then transfers
the operand to the top of stack indicated by ESP. The operand can be memory,
general register, segment register or immediate value of word or double word
size. If the operand is an immediate value and no size is specified, it is by
default treated as a word value if the assembler is in 16-bit mode and as a
double word value if the assembler is in 32-bit mode. The "pushw" and "pushd"
mnemonics are variants of this instruction that store the values of word or
double word size respectively. If more operands follow in the same line
(separated only with spaces, not commas), the compiler will assemble a chain
of "push" instructions with these operands. The examples are with single
operands:

    push ax         ; store general register
    push es         ; store segment register
    pushw [bx]      ; store memory
    push 1000h      ; store immediate value

"pusha" saves the contents of the eight general registers on the stack.
This instruction has no operands. There are two versions of this instruction,
one 16-bit and one 32-bit; the assembler automatically generates the
appropriate version for the current mode, but it can be overridden by using
the "pushaw" or "pushad" mnemonic to always get the 16-bit or 32-bit version.
The 16-bit version of this instruction pushes general registers on the stack
in the following order: AX, CX, DX, BX, the initial value of SP before AX was
pushed, BP, SI and DI. The 32-bit version pushes the equivalent 32-bit general
registers in the same order.

"pop" transfers the word or double word at the current top of stack to the
destination operand, and then increments ESP to point to the new top of stack.
The operand can be memory, general register or segment register. The "popw"
and "popd" mnemonics are variants of this instruction for restoring the values
of word or double word size respectively. If more operands separated with
spaces follow in the same line, the compiler will assemble a chain of "pop"
instructions with these operands.

    pop bx          ; restore general register
    pop ds          ; restore segment register
    popw [si]       ; restore memory
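
The multi-operand form of "push" and "pop" can be sketched like this; the
choice of registers is arbitrary.

```asm
        push    ax bx cx        ; same as: push ax, then push bx, then push cx
        pop     cx bx ax        ; pops in reverse order, restoring all three
```

Because the stack is last-in first-out, the operands of "pop" are listed in
the opposite order to get each value back into its original register.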

"popa" restores the registers saved on the stack by the "pusha" instruction,
except for the saved value of SP (or ESP), which is ignored. This instruction
has no operands. To force assembling the 16-bit or 32-bit version of this
instruction use the "popaw" or "popad" mnemonic.

2.1.2 Type conversion instructions

The type conversion instructions convert bytes into words, words into double
words, and double words into quad words. These conversions can be done using
the sign extension or zero extension. The sign extension fills the extra bits
of the larger item with the value of the sign bit of the smaller item, the
zero extension simply fills them with zeros.

"cwd" and "cdq" double the size of the value in the AX or EAX register
respectively and store the extra bits into the DX or EDX register. The
conversion is done using the sign extension. These instructions have no
operands.

"cbw" extends the sign of the byte in AL throughout AX, and "cwde" extends
the sign of the word in AX throughout EAX. These instructions also have no
operands.

"movsx" converts a byte to word or double word and a word to double word
using the sign extension. "movzx" does the same, but it uses the zero
extension. The source operand can be general register or memory, while the
destination operand must be a general register. For example:

    movsx ax,al         ; byte register to word register
    movsx edx,dl        ; byte register to double word register
    movsx eax,ax        ; word register to double word register
    movsx ax,byte [bx]  ; byte memory to word register
    movsx edx,byte [bx] ; byte memory to double word register
    movsx eax,word [bx] ; word memory to double word register
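
The difference between sign extension and zero extension is easiest to see on
a negative byte. The sketch below uses the value -1; the register choices are
arbitrary.

```asm
        mov     al,-1           ; AL = 0FFh
        movsx   bx,al           ; sign extension: BX = 0FFFFh
        movzx   cx,al           ; zero extension: CX = 00FFh
        cbw                     ; AX = 0FFFFh
        cwd                     ; DX = 0FFFFh, so DX:AX still represents -1
```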

2.1.3 Binary arithmetic instructions

"add" replaces the destination operand with the sum of the source and
destination operands and sets CF if an unsigned overflow (carry) has occurred.
The operands may be bytes, words or double words. The destination operand can
be general register or memory, the source operand can be general register or
immediate value, it can also be memory if the destination operand is register.

    add ax,bx       ; add register to register
    add ax,[si]     ; add memory to register
    add [di],al     ; add register to memory
    add al,48       ; add immediate value to register
    add [char],48   ; add immediate value to memory

"adc" sums the operands, adds one if CF is set, and replaces the destination
operand with the result. Rules for the operands are the same as for the "add"
instruction. An "add" followed by multiple "adc" instructions can be used to
add numbers longer than 32 bits.

"inc" adds one to the operand, it does not affect CF. The operand can be a
general register or memory, and the size of the operand can be byte, word or
double word.

    inc ax          ; increment register by one
    inc byte [bx]   ; increment memory by one

"sub" subtracts the source operand from the destination operand and replaces
the destination operand with the result. If a borrow is required, the CF is
set. Rules for the operands are the same as for the "add" instruction.

"sbb" subtracts the source operand from the destination operand, subtracts
one if CF is set, and stores the result to the destination operand. Rules for
the operands are the same as for the "add" instruction. A "sub" followed by
multiple "sbb" instructions may be used to subtract numbers longer than 32
bits.
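
A sketch of arithmetic on numbers longer than 32 bits, here 64-bit values held
in register pairs. The pairing of EDX:EAX with ECX:EBX is an arbitrary choice
for illustration.

```asm
        ; EDX:EAX = EDX:EAX + ECX:EBX
        add     eax,ebx         ; low halves, CF receives the carry
        adc     edx,ecx         ; high halves plus the carry

        ; EDX:EAX = EDX:EAX - ECX:EBX
        sub     eax,ebx         ; low halves, CF receives the borrow
        sbb     edx,ecx         ; high halves minus the borrow
```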
		655	"dec" subtracts one from the operand, it does not affect CF. Rules for the
		656	operand are the same as for the "inc" instruction.
		657	"cmp" subtracts the source operand from the destination operand. It updates
		658	the flags as the "sub" instruction, but does not alter the source and
		659	destination operands. Rules for the operands are the same as for the "sub"
		660	instruction.
		661	"neg" subtracts a signed integer operand from zero. The effect of this
		662	instructon is to reverse the sign of the operand from positive to negative or
		663	from negative to positive. Rules for the operand are the same as for the "inc"
		664	instruction.
		665	"xadd" exchanges the destination operand with the source operand, then loads
		666	the sum of the two values into the destination operand. Rules for the operands
		667	are the same as for the "add" instruction.
		668	All the above binary arithmetic instructions update SF, ZF, PF and OF flags.
		669	SF is always set to the same value as the result's sign bit, ZF is set when
		670	all the bits of result are zero, PF is set when low order eight bits of result
		671	contain an even number of set bits, OF is set if result is too large for a
		672	positive number or too small for a negative number (excluding sign bit) to fit
		673	in destination operand.
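			
			The multi-word technique mentioned above for "adc" and "sbb" can be sketched
			as follows. The register assignment is an assumption made for illustration:
			one 64-bit number is held in EDX:EAX and the other in ECX:EBX.
			
			    add eax,ebx     ; add low double words, CF receives the carry
			    adc edx,ecx     ; add high double words plus the carry
			    sub eax,ebx     ; subtract low double words, CF receives the borrow
			    sbb edx,ecx     ; subtract high double words minus the borrow
			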
		674	"mul" performs an unsigned multiplication of the operand and the
		675	accumulator. If the operand is a byte, the processor multiplies it by the
		676	contents of AL and returns the 16-bit result to AH and AL. If the operand is a
		677	word, the processor multiplies it by the contents of AX and returns the 32-bit
		678	result to DX and AX. If the operand is a double word, the processor multiplies
		679	it by the contents of EAX and returns the 64-bit result in EDX and EAX. "mul"
		680	sets CF and OF when the upper half of the result is nonzero, otherwise they
		681	are cleared. Rules for the operand are the same as for the "inc" instruction.
		682	"imul" performs a signed multiplication operation. This instruction has
		683	three variations. First has one operand and behaves in the same way as the
		684	"mul" instruction. Second has two operands, in this case destination operand
		685	is multiplied by the source operand and the result replaces the destination
		686	operand. Destination operand must be a general register, it can be word or
		687	double word, source operand can be general register, memory or immediate
		688	value. Third form has three operands, the destination operand must be a
		689	general register, word or double word in size, source operand can be general
		690	register or memory, and third operand must be an immediate value. The source
		691	operand is multiplied by the immediate value and the result is stored in the
		692	destination register. All the three forms calculate the product to twice the
		693	size of operands and set CF and OF when the upper half of the result is not
		694	the sign extension of the lower half, but second and third form truncate the
		695	product to the size of operands. So second and third forms can be also used
		696	for unsigned operands because, whether the operands are signed or unsigned,
		697	the lower half of the product is the same. Examples of all three forms:
		698
		699
		700	imul word [si] ; accumulator by memory
		701	imul bx,cx ; register by register
		702	imul bx,[si] ; register by memory
		703	imul bx,10 ; register by immediate value
		704	imul ax,bx,10 ; register by immediate value to register
		705	imul ax,[si],10 ; memory by immediate value to register
		706
		707	"div" performs an unsigned division of the accumulator by the operand.
		708	The dividend (the accumulator) is twice the size of the divisor (the operand),
		709	the quotient and remainder have the same size as the divisor. If divisor is
		710	byte, the dividend is taken from AX register, the quotient is stored in AL and
		711	the remainder is stored in AH. If divisor is word, the upper half of dividend
		712	is taken from DX, the lower half of dividend is taken from AX, the quotient is
		713	stored in AX and the remainder is stored in DX. If divisor is double word,
		714	the upper half of dividend is taken from EDX, the lower half of dividend is
		715	taken from EAX, the quotient is stored in EAX and the remainder is stored in
		716	EDX. Rules for the operand are the same as for the "mul" instruction.
		717	"idiv" performs a signed division of the accumulator by the operand.
		718	It uses the same registers as the "div" instruction, and the rules for
		719	the operand are the same.
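			
			Because the dividend is twice the size of the divisor, its upper half has to
			be prepared before dividing. A minimal sketch for a 16-bit divisor, with
			arbitrarily chosen values ("cwd" is one of the type conversion instructions):
			
			    mov ax,1000     ; lower half of the dividend
			    xor dx,dx       ; upper half is zero for an unsigned dividend
			    mov bx,7
			    div bx          ; ax = 142 (quotient), dx = 6 (remainder)
			    mov ax,-1000
			    cwd             ; sign-extend ax into dx for a signed dividend
			    idiv bx         ; ax = -142 (quotient), dx = -6 (remainder)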
		720
		721
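		722	2.1.4 Decimal arithmetic instructions
		723	
		724	Decimal arithmetic is performed by combining the binary arithmetic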
		725	instructions (already described in the prior section) with the decimal
		726	arithmetic instructions. The decimal arithmetic instructions are used to
		727	adjust the results of a previous binary arithmetic operation to produce a
		728	valid packed or unpacked decimal result, or to adjust the inputs to a
		729	subsequent binary arithmetic operation so the operation will produce a valid
		730	packed or unpacked decimal result.
		731	"daa" adjusts the result of adding two valid packed decimal operands in
		732	AL. "daa" must always follow the addition of two pairs of packed decimal
		733	numbers (one digit in each half-byte) to obtain a pair of valid packed
		734	decimal digits as results. The carry flag is set if carry was needed.
		735	This instruction has no operands.
		736	"das" adjusts the result of subtracting two valid packed decimal operands
		737	in AL. "das" must always follow the subtraction of one pair of packed decimal
		738	numbers (one digit in each half-byte) from another to obtain a pair of valid
		739	packed decimal digits as results. The carry flag is set if a borrow was
		740	needed. This instruction has no operands.
		741	"aaa" changes the contents of register AL to a valid unpacked decimal
		742	number, and zeroes the top four bits. "aaa" must always follow the addition
		743	of two unpacked decimal operands in AL. The carry flag is set and AH is
		744	incremented if a carry is necessary. This instruction has no operands.
		745	"aas" changes the contents of register AL to a valid unpacked decimal
		746	number, and zeroes the top four bits. "aas" must always follow the
		747	subtraction of one unpacked decimal operand from another in AL. The carry flag
		748	is set and AH decremented if a borrow is necessary. This instruction has no
		749	operands.
		750	"aam" corrects the result of a multiplication of two valid unpacked decimal
		751	numbers. "aam" must always follow the multiplication of two decimal numbers
		752	to produce a valid decimal result. The high order digit is left in AH, the
		753	low order digit in AL. The generalized version of this instruction allows
		754	adjustment of the contents of the AX to create two unpacked digits of any
		755	number base. The standard version of this instruction has no operands, the
		756	generalized version has one operand - an immediate value specifying the
		757	number base for the created digits.
		758	"aad" modifies the numerator in AH and AL to prepare for the division of two
		759	valid unpacked decimal operands so that the quotient produced by the division
		760	will be a valid unpacked decimal number. AH should contain the high order
		761	digit and AL the low order digit. This instruction adjusts the value and
		762	places the result in AL, while AH will contain zero. The generalized version
		763	of this instruction allows adjustment of two unpacked digits of any number
		764	base. Rules for the operand are the same as for the "aam" instruction.
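			
			The two-step pattern described in this section (a binary operation followed
			by an adjustment) can be sketched with arbitrarily chosen values, first for
			packed and then for unpacked digits:
			
			    mov al,38h      ; packed decimal 38
			    add al,45h      ; binary addition gives 7Dh
			    daa             ; al = 83h, the packed decimal 83
			    mov al,7        ; unpacked decimal digit
			    mov bl,9        ; unpacked decimal digit
			    mul bl          ; ax = 63 as a binary number
			    aam             ; ah = 6, al = 3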
		765
		766
		767	2.1.5 Logical instructions
		768	
		769	"not" inverts the bits in the specified operand to form a one's
		770	complement of the operand. It has no effect on the flags. Rules for the
		771	operand are the same as for the "inc" instruction.
		772	"and", "or" and "xor" instructions perform the standard
		773	logical operations. They update the SF, ZF and PF flags. Rules for the
		774	operands are the same as for the "add" instruction.
		775	"bt", "bts", "btr" and "btc" instructions operate on a single bit which can
		776	be in memory or in a general register. The location of the bit is specified
		777	as an offset from the low order end of the operand. The value of the offset
		778	is taken from the second operand, which may be either an immediate byte or
		779	a general register. These instructions first assign the value of the selected
		780	bit to CF. "bt" instruction does nothing more, "bts" sets the selected bit to
		781	1, "btr" resets the selected bit to 0, "btc" changes the bit to its
		782	complement. The first operand can be word or double word.
		783
		784
		785	bts word [bx],15 ; test and set bit in memory
		786	btr ax,cx ; test and reset bit in register
		787	btc word [bx],cx ; test and complement bit in memory
		788
		789	"bsf" and "bsr" instructions scan a word or double word for first set bit
		790	and store the index of this bit into destination operand, which must be
		791	general register. The bit string being scanned is specified by source operand,
		792	it may be either general register or memory. The ZF flag is set if the entire
		793	string is zero (no set bits are found); otherwise it is cleared. If no set bit
		794	is found, the value of the destination register is undefined. "bsf" scans from
		795	low order to high order (starting from bit index zero). "bsr" scans from high
		796	order to low order (starting from bit index 15 of a word or index 31 of a
		797	double word).
		798
		799
		800	bsr ax,[si] ; scan memory reverse
		801
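			Since the destination of a bit scan is undefined when no set bit is found,
			ZF should be checked before the index is used. A sketch in which the
			register choice and the "no_bit" label are made up for this example:
			
			    bsf ecx,eax     ; ecx = index of the lowest set bit of eax
			    jz no_bit       ; ZF set: eax was zero and ecx is undefined
			    btr eax,ecx     ; reset the bit that has been found
			  no_bit:
			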
		802	"shl" shifts the destination operand left by the number of bits specified
		803	in the second operand. The destination operand can be byte, word, or double
		804	word general register or memory. The second operand can be an immediate value
		805	or the CL register. The processor shifts zeros in from the right (low order)
		806	side of the operand as bits exit from the left side. The last bit that exited
		807	is stored in CF. "sal" is a synonym for "shl".
		808
		809
		810	shl byte [bx],1 ; shift memory left by one bit
		811	shl ax,cl ; shift register left by count from cl
		812	shl word [bx],cl ; shift memory left by count from cl
		813
		814	"shr" and "sar" shift the destination operand right by the number of bits
		815	specified in the second operand. Rules for operands are the same as for the
		816	"shl" instruction. "shr" shifts zeros in from the left side of the operand as
		817	bits exit from the right side. The last bit that exited is stored in CF.
		818	"sar" preserves the sign of the operand by shifting in zeros on the left side
		819	if the value is positive or by shifting in ones if the value is negative.
		820	"shld" shifts bits of the destination operand to the left by the number
		821	of bits specified in third operand, while shifting high order bits from the
		822	source operand into the destination operand on the right. The source operand
		823	remains unmodified. The destination operand can be a word or double word
		824	general register or memory, the source operand must be a general register,
		825	third operand can be an immediate value or the CL register.
		826
		827
		828	shld [di],bx,1 ; shift memory left by one bit
		829	shld ax,bx,cl ; shift register left by count from cl
		830	shld [di],bx,cl ; shift memory left by count from cl
		831
		832	"shrd" shifts bits of the destination operand to the right, while shifting
		833	low order bits from the source operand into the destination operand on the
		834	left. The source operand remains unmodified. Rules for operands are the same
		835	as for the "shld" instruction.
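			
			A typical use of the double shifts is moving a number that spans two
			registers. A sketch, assuming for illustration a 64-bit value in EDX:EAX:
			
			    shld edx,eax,4  ; upper double word takes 4 high order bits of eax
			    shl eax,4       ; lower double word is shifted separately
			    shrd eax,edx,4  ; lower double word takes 4 low order bits of edx
			    shr edx,4       ; upper double word is shifted separately
			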
		836	"rol" and "rcl" rotate the byte, word or double word destination operand
		837	left by the number of bits specified in the second operand. For each rotation
		838	specified, the high order bit that exits from the left of the operand returns
		839	at the right to become the new low order bit. "rcl" additionally puts in CF
		840	each high order bit that exits from the left side of the operand before it
		841	returns to the operand as the low order bit on the next rotation cycle. Rules
		842	for operands are the same as for the "shl" instruction.
		843	"ror" and "rcr" rotate the byte, word or double word destination operand
		844	right by the number of bits specified in the second operand. For each rotation
		845	specified, the low order bit that exits from the right of the operand returns
		846	at the left to become the new high order bit. "rcr" additionally puts in CF
		847	each low order bit that exits from the right side of the operand before it
		848	returns to the operand as the high order bit on the next rotation cycle.
		849	Rules for operands are the same as for the "shl" instruction.
		850	"test" performs the same action as the "and" instruction, but it does not
		851	alter the destination operand, only updates flags. Rules for the operands are
		852	the same as for the "and" instruction.
		853	"bswap" reverses the byte order of a 32-bit general register: bits 0 through
		854	7 are swapped with bits 24 through 31, and bits 8 through 15 are swapped with
		855	bits 16 through 23. This instruction is provided for converting little-endian
		856	values to big-endian format and vice versa.
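			
			For instance, with an arbitrarily chosen value:
			
			    mov eax,12345678h
			    bswap eax       ; eax = 78563412h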
		857
		858
		859
		860
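		861	2.1.6 Control transfer instructions
		862	
		863	"jmp" unconditionally transfers control to the target location. The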
		864	destination address can be specified directly within the instruction or
		865	indirectly through a register or memory, the acceptable size of this address
		866	depends on whether the jump is near or far (it can be specified by preceding
		867	the operand with "near" or "far" operator) and whether the instruction is
		868	16-bit or 32-bit. Operand for near jump should be "word" size for 16-bit
		869	instruction or the "dword" size for 32-bit instruction. Operand for far jump
		870	should be "dword" size for 16-bit instruction or "pword" size for 32-bit
		871	instruction. A direct "jmp" instruction includes the destination address as
		872	part of the instruction (and can be preceded by "short", "near" or "far"
		873	operator), the operand specifying address should be the numerical expression
		874	for near or short jump, or two numerical expressions separated with colon for
		875	far jump, the first specifies selector of segment, the second is the offset
		876	within segment. The "pword" operator can be used to force the 32-bit far jump,
		877	and "dword" to force the 16-bit far jump. An indirect "jmp" instruction
		878	obtains the destination address indirectly through a register or a pointer
		879	variable, the operand should be general register or memory. See also 1.2.5 for
		880	some more details.
		881
		882
		883	jmp 0FFFFh:0 ; direct far jump
		884	jmp ax ; indirect near jump
		885	jmp pword [ebx] ; indirect far jump
		886
		887	"call" transfers control to the procedure, saving on the stack the address
		888	of the instruction following the "call" for later use by a "ret" (return)
		889	instruction. Rules for the operands are the same as for the "jmp" instruction,
		890	but the "call" has no short variant of direct instruction and thus it is not
		891	optimized.
		892	"ret", "retn" and "retf" instructions terminate the execution of a procedure
		893	and transfer control back to the program that originally invoked the
		894	procedure using the address that was stored on the stack by the "call"
		895	instruction. "ret" is the equivalent for "retn", which returns from the
		896	procedure that was executed using the near call, while "retf" returns from
		897	the procedure that was executed using the far call. These instructions default
		898	to the size of address appropriate for the current code setting, but the size
		899	of address can be forced to 16-bit by using the "retw", "retnw" and "retfw"
		900	mnemonics, and to 32-bit by using the "retd", "retnd" and "retfd" mnemonics.
		901	All these instructions may optionally specify an immediate operand, by adding
		902	this constant to the stack pointer, they effectively remove any arguments that
		903	the calling program pushed on the stack before the execution of the "call"
		904	instruction.
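			
			A sketch of the optional operand of "ret" in 16-bit code. "sum2" and "done"
			are hypothetical labels, and passing two word arguments on the stack is only
			an assumed convention for this example:
			
			    push word 2     ; arguments are pushed by the caller
			    push word 3
			    call sum2       ; on return ax = 5 and sp has its initial value
			    jmp done
			  sum2:
			    push bp
			    mov bp,sp
			    mov ax,[bp+4]   ; last pushed argument, above bp and return address
			    add ax,[bp+6]   ; first pushed argument
			    pop bp
			    ret 4           ; return and remove 4 bytes of arguments
			  done:
			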
		905	"iret" returns control to an interrupted procedure. It differs from "ret" in
		906	that it also pops the flags from the stack into the flags register. The flags
		907	are stored on the stack by the interrupt mechanism. It defaults to the size of
		908	return address appropriate for the current code setting, but it can be forced
		909	to use 16-bit or 32-bit address by using the "iretw" or "iretd" mnemonic.
		910	The conditional transfer instructions are jumps that may or may not transfer
		911	control, depending on the state of the CPU flags when the instruction
		912	executes. The mnemonics for conditional jumps may be obtained by attaching
		913	the condition mnemonic (see table 2.1) to the "j" mnemonic,
		914	for example "jc" instruction will transfer the control when the CF flag is
		915	set. The conditional jumps can be short or near, and direct only, and can be
		916	optimized (see 1.2.5), the operand should be an immediate value specifying
		917	target address.
		918
		919	Table 2.1 Conditions
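		920	+----------+-----------------------+--------------------------+
		921	| Mnemonic | Condition tested      | Description              |
		922	+==========+=======================+==========================+
		923	| o        | OF = 1                | overflow                 |
		924	+----------+-----------------------+--------------------------+
		925	| no       | OF = 0                | not overflow             |
		926	+----------+-----------------------+--------------------------+
		927	| c        |                       | carry                    |
		928	| b        | CF = 1                | below                    |
		929	| nae      |                       | not above nor equal      |
		930	+----------+-----------------------+--------------------------+
		931	| nc       |                       | not carry                |
		932	| ae       | CF = 0                | above or equal           |
		933	| nb       |                       | not below                |
		934	+----------+-----------------------+--------------------------+
		935	| e        | ZF = 1                | equal                    |
		936	| z        |                       | zero                     |
		937	+----------+-----------------------+--------------------------+
		938	| ne       | ZF = 0                | not equal                |
		939	| nz       |                       | not zero                 |
		940	+----------+-----------------------+--------------------------+
		941	| be       | CF or ZF = 1          | below or equal           |
		942	| na       |                       | not above                |
		943	+----------+-----------------------+--------------------------+
		944	| a        | CF or ZF = 0          | above                    |
		945	| nbe      |                       | not below nor equal      |
		946	+----------+-----------------------+--------------------------+
		947	| s        | SF = 1                | sign                     |
		948	+----------+-----------------------+--------------------------+
		949	| ns       | SF = 0                | not sign                 |
		950	+----------+-----------------------+--------------------------+
		951	| p        | PF = 1                | parity                   |
		952	| pe       |                       | parity even              |
		953	+----------+-----------------------+--------------------------+
		954	| np       | PF = 0                | not parity               |
		955	| po       |                       | parity odd               |
		956	+----------+-----------------------+--------------------------+
		957	| l        | SF xor OF = 1         | less                     |
		958	| nge      |                       | not greater nor equal    |
		959	+----------+-----------------------+--------------------------+
		960	| ge       | SF xor OF = 0         | greater or equal         |
		961	| nl       |                       | not less                 |
		962	+----------+-----------------------+--------------------------+
		963	| le       | (SF xor OF) or ZF = 1 | less or equal            |
		964	| ng       |                       | not greater              |
		965	+----------+-----------------------+--------------------------+
		966	| g        | (SF xor OF) or ZF = 0 | greater                  |
		967	| nle      |                       | not less nor equal       |
		968	+----------+-----------------------+--------------------------+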
		969
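			The conditions named "below" and "above" are meant for unsigned values, while
			"less" and "greater" are meant for signed values. A sketch with an
			arbitrarily chosen value and hypothetical labels:
			
			    mov al,0FFh     ; 255 when unsigned, -1 when signed
			    cmp al,1
			    jl is_less      ; taken, since -1 is less than 1 (SF xor OF = 1)
			    ja is_above     ; would be taken too if reached: 255 is above 1
			  is_less:
			  is_above:
			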
		970	The "loop" instructions are conditional jumps that use a value placed in
		971	CX (or ECX) to specify the number of repetitions of a software loop. All
		972	"loop" instructions automatically decrement CX (or ECX) and terminate the
		973	loop (don't transfer the control) when CX (or ECX) is zero. It uses CX or ECX
		974	depending on whether the current code setting is 16-bit or 32-bit, but it can
		975	be forced to use CX with the "loopw" mnemonic or ECX with "loopd".
		976	"loope" and "loopz" are the synonyms for the same instruction, which acts as
		977	the standard "loop", but also terminates the loop when ZF flag is set.
		978	"loopew" and "loopzw" mnemonics force them to use CX register while "looped"
		979	and "loopzd" force them to use ECX register. "loopne" and "loopnz" are the
		980	synonyms for the same instruction, which acts as the standard "loop", but
		981	also terminates the loop when ZF flag is not set. "loopnew" and "loopnzw"
		982	mnemonics force them to use CX register while "loopned" and "loopnzd" force
		983	them to use ECX register. Every "loop" instruction needs an operand being an
		984	immediate value specifying target address, it can be only short jump (in the
		985	range of 128 bytes back and 127 bytes forward from the address of instruction
		986	following the "loop" instruction).
		987	"jcxz" branches to the label specified in the instruction if it finds a
		988	value of zero in CX, "jecxz" does the same, but checks the value of ECX
		989	instead of CX. Rules for the operands are the same as for the "loop"
		990	instruction.
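			
			A sketch of a counted loop guarded by "jcxz" in 16-bit code. "count" is a
			hypothetical word variable. Without the guard a zero count would make "loop"
			repeat 65536 times, because CX is decremented before it is tested:
			
			    mov cx,[count]
			    xor ax,ax
			    jcxz finished   ; skip the loop when the count is zero
			  next:
			    add ax,cx       ; loop body, here it sums cx, cx-1, ..., 1
			    loop next       ; decrement cx, jump back while cx is not zero
			  finished:
			    ret             ; with count equal to 5, ax = 15 here
			  count dw 5
			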
		991	"int" activates the interrupt service routine that corresponds to the
		992	number specified as an operand to the instruction, the number should be in
		993	range from 0 to 255. The interrupt service routine terminates with an "iret"
		994	instruction that returns control to the instruction that follows "int".
		995	"int3" mnemonic codes the short (one byte) trap that invokes the interrupt 3.
		996	"into" instruction invokes the interrupt 4 if the OF flag is set.
		997	"bound" verifies that the signed value contained in the specified register
		998	lies within specified limits. An interrupt 5 occurs if the value contained in
		999	the register is less than the lower bound or greater than the upper bound. It
		1000	needs two operands, the first operand specifies the register being tested,
		1001	the second operand should be memory address for the two signed limit values.
		1002	The operands can be "word" or "dword" in size.
		1003
		1004
		1005	bound eax,[esi] ; check double word for bounds
		1006
		1007
		1008	2.1.7 I/O instructions
		1009	
		1010	"in" transfers a byte, word, or double word from an input port to AL, AX,
		1011	or EAX. I/O ports can be addressed either directly, with the immediate byte
		1012	value coded in instruction, or indirectly via the DX register. The destination
		1013	operand should be AL, AX, or EAX register. The source operand should be an
		1014	immediate value in range from 0 to 255, or DX register.
		1015
		1016
		1017	in ax,dx ; input word from port addressed by dx
		1018
		1019	"out" transfers a byte, word, or double word to an output port from AL, AX,
		1020	or EAX. The program can specify the number of the port using the same methods
		1021	as the "in" instruction. The destination operand should be an immediate value
		1022	in range from 0 to 255, or DX register. The source operand should be AL, AX,
		1023	or EAX register.
		1024
		1025
		1026	out dx,al ; output byte to port addressed by dx
		1027
		1028
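		1029	2.1.8 Strings operations
		1030	
		1031	The string operations operate on one element of a string. A string element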
		1032	may be a byte, a word, or a double word. The string elements are addressed by
		1033	SI and DI (or ESI and EDI) registers. After every string operation SI and/or
		1034	DI (or ESI and/or EDI) are automatically updated to point to the next element
		1035	of the string. If DF (direction flag) is zero, the index registers are
		1036	incremented, if DF is one, they are decremented. The amount of the increment
		1037	or decrement is 1, 2, or 4 depending on the size of the string element. Every
		1038	string operation instruction has short forms which have no operands and use
		1039	SI and/or DI when the code type is 16-bit, and ESI and/or EDI when the code
		1040	type is 32-bit. SI and ESI by default address data in the segment selected
		1041	by DS, DI and EDI always address data in the segment selected by ES. Short
		1042	form is obtained by attaching to the mnemonic of string operation letter
		1043	specifying the size of string element, it should be "b" for byte element,
		1044	"w" for word element, and "d" for double word element. Full form of string
		1045	operation needs operands providing the size operator and the memory addresses,
		1046	which can be SI or ESI with any segment prefix, DI or EDI always with ES
		1047	segment prefix.
		1048	"movs" transfers the string element pointed to by SI (or ESI) to the
		1049	location pointed to by DI (or EDI). Size of operands can be byte, word, or
		1050	double word. The destination operand should be memory addressed by DI or EDI,
		1051	the source operand should be memory addressed by SI or ESI with any segment
		1052	prefix.
		1053
		1054
		1055	movs word [es:di],[ss:si] ; transfer word
		1056	movsd ; transfer double word
		1057
		1058	"cmps" subtracts the destination string element from the source string
		1059	element and updates the flags AF, SF, PF, CF and OF, but it does not change
		1060	any of the compared elements. If the string elements are equal, ZF is set,
		1061	otherwise it is cleared. The first operand for this instruction should be the
		1062	source string element addressed by SI or ESI with any segment prefix, the
		1063	second operand should be the destination string element addressed by DI or
		1064	EDI.
		1065
		1066
		1067	cmps word [ds:si],[es:di] ; compare words
		1068	cmps dword [fs:esi],[edi] ; compare double words
		1069
		1070	"scas" subtracts the destination string element from AL, AX, or EAX
		1071	(depending on the size of string element) and updates the flags AF, SF, ZF,
		1072	PF, CF and OF. If the values are equal, ZF is set, otherwise it is cleared.
		1073	The operand should be the destination string element addressed by DI or EDI.
		1074
		1075
		1076	scasw ; scan word
		1077	scas dword [es:edi] ; scan double word
		1078
		1079	"stos" places the value of AL, AX, or EAX into the destination string
		1080	element. Rules for the operand are the same as for the "scas" instruction.
		1081	"lods" places the source string element into AL, AX, or EAX. The operand
		1082	should be the source string element addressed by SI or ESI with any segment
		1083	prefix.
		1084
		1085
		1086	lods word [cs:si] ; load word
		1087	lodsd ; load double word
		1088
		1089	"ins" transfers a byte, word, or double word from an input port addressed
		1090	by DX register to the destination string element. The destination operand
		1091	should be memory addressed by DI or EDI, the source operand should be the DX
		1092	register.
		1093
		1094
		1095	ins word [es:di],dx ; input word
		1096	ins dword [edi],dx ; input double word
		1097
		1098	"outs" transfers the source string element to an output port addressed by
		1099	DX register. The destination operand should be the DX register and the source
		1100	operand should be memory addressed by SI or ESI with any segment prefix.
		1101
		1102
		1103	outsw ; output word
		1104	outs dx,dword [gs:esi] ; output double word
		1105
		1106	The repeat prefixes "rep", "repe"/"repz", and "repne"/"repnz" specify
		1107	repeated string operation. When a string operation instruction has a repeat
		1108	prefix, the operation is executed repeatedly, each time using a different
		1109	element of the string. The repetition terminates when one of the conditions
		1110	specified by the prefix is satisfied. All three prefixes automatically
		1111	decrease CX or ECX register (depending whether string operation instruction
		1112	uses the 16-bit or 32-bit addressing) after each operation and repeat the
		1113	associated operation until CX or ECX is zero. "repe"/"repz" and
		1114	"repne"/"repnz" are used exclusively with the "scas" and "cmps" instructions
		1115	(described above). When these prefixes are used, repetition of the next
		1116	instruction depends on the zero flag (ZF) also, "repe" and "repz" terminate
		1117	the execution when the ZF is zero, "repne" and "repnz" terminate the execution
		1118	when the ZF is set.
		1119
		1120
		1121	repe cmpsb ; compare bytes until not equal
		1122
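			A sketch of two typical uses of the repeat prefixes in 32-bit code, assuming
			that DS and ES select the same segment. "source", "target" and "text" are
			hypothetical labels:
			
			    cld             ; DF = 0, index registers are incremented
			    mov esi,source
			    mov edi,target
			    mov ecx,16
			    rep movsb       ; copy 16 bytes
			    mov edi,text
			    xor al,al       ; byte to look for
			    mov ecx,-1      ; no practical limit on the count
			    repne scasb     ; scan until the zero byte is found
			    not ecx
			    dec ecx         ; ecx = number of bytes before the zero byte
			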
		1123
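		1124	2.1.9 Flag control instructions
		1125	
		1126	The flag control instructions provide a method for directly changing the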
		1127	state of bits in the flag register. All instructions described in this
		1128	section have no operands.
		1129	"stc" sets the CF (carry flag) to 1, "clc" zeroes the CF, "cmc" changes the
		1130	CF to its complement. "std" sets the DF (direction flag) to 1, "cld" zeroes
		1131	the DF, "sti" sets the IF (interrupt flag) to 1 and therefore enables the
		1132	interrupts, "cli" zeroes the IF and therefore disables the interrupts.
		1133	"lahf" copies SF, ZF, AF, PF, and CF to bits 7, 6, 4, 2, and 0 of the
		1134	AH register. The contents of the remaining bits are undefined. The flags
		1135	remain unaffected.
		1136	"sahf" transfers bits 7, 6, 4, 2, and 0 from the AH register into SF, ZF,
		1137	AF, PF, and CF.
		1138	"pushf" decrements "esp" by two or four and stores the low word or
		1139	double word of flags register at the top of stack, size of stored data
		1140	depends on the current code setting. "pushfw" variant forces storing the
		1141	word and "pushfd" forces storing the double word.
		1142	"popf" transfers specific bits from the word or double word at the top
		1143	of stack, then increments "esp" by two or four, this value depends on
		1144	the current code setting. "popfw" variant forces restoring from the word
		1145	and "popfd" forces restoring from the double word.
		1146
		1147
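		1148	2.1.10 Conditional operations
		1149	
		1150	The instructions obtained by attaching the condition mnemonic (see table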
		1151	2.1) to the "set" mnemonic set a byte to one if the condition is true and set
		1152	the byte to zero otherwise. The operand should be an 8-bit general register
		1153	or the byte in memory.
		1154
		1155
		1156	seto byte [bx] ; set byte if overflow
		1157
		1158	"salc" instruction sets all bits of the AL register when the carry flag is
		1159	set and zeroes the AL register otherwise. This instruction has no arguments.
		1160	The instructions obtained by attaching the condition mnemonic to the "cmov"
		1161	mnemonic transfer the word or double word from the general register or memory
		1162	to the general register only when the condition is true. The destination
		1163	operand should be general register, the source operand can be general register
		1164	or memory.
		1165
		1166
		1167	cmovnc eax,[ebx] ; move when carry flag cleared
		1168
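			Both kinds of instructions can replace short conditional jumps. A sketch
			with a register choice made only for illustration:
			
			    cmp eax,ebx
			    setb dl         ; dl = 1 if eax is below ebx (unsigned), else 0
			    cmovl eax,ebx   ; eax = the greater of eax and ebx (signed)
			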
		1169	"cmpxchg" compares the value in the AL, AX, or EAX register with the
		1170	destination operand. If the two values are equal, the source operand is
		1171	loaded into the destination operand. Otherwise, the destination operand is
		1172	loaded into the AL, AX, or EAX register. The destination operand may be a
		1173	general register or memory, the source operand must be a general register.
		1174
		1175
		1176	cmpxchg [bx],dx ; compare and exchange with memory
		1177
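			A sketch of an atomic increment built on "cmpxchg" in 32-bit code. It relies
			on two facts not stated above: the instruction sets ZF when the compared
			values were equal, and the "lock" prefix makes the whole operation atomic.
			"counter" is a hypothetical double word variable:
			
			    mov eax,[counter]           ; expected old value
			  retry:
			    mov edx,eax
			    inc edx                     ; desired new value
			    lock cmpxchg [counter],edx  ; store edx if [counter] still equals eax
			    jnz retry                   ; eax was reloaded from memory, try again
			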
		1178	"cmpxchg8b" compares the 64-bit value in EDX and EAX registers with the
		1179	destination operand. If the values are equal, the 64-bit value in ECX and EBX
		1180	registers is stored in the destination operand. Otherwise, the value in the
		1181	destination operand is loaded into EDX and EAX registers. The destination
		1182	operand should be a quad word in memory.
		1183
		1184
		1185
		1186
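		1187	2.1.11 Miscellaneous instructions
		1188	
		1189	"nop" instruction occupies one byte but affects nothing but the instruction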
		1190	pointer. This instruction has no operands and doesn't perform any operation.
		1191	"ud2" instruction generates an invalid opcode exception. This instruction
		1192	is provided for software testing to explicitly generate an invalid opcode.
		1193	This instruction has no operands.
		1194	"xlat" replaces a byte in the AL register with a byte indexed by its value
		1195	in a translation table addressed by BX or EBX. The operand should be a byte
		1196	memory addressed by BX or EBX with any segment prefix. This instruction has
		1197	also a short form "xlatb" which has no operands and uses the BX or EBX address
		1198	in the segment selected by DS depending on the current code setting.
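			
			A sketch of "xlatb" converting a value from 0 to 15 into a hexadecimal
			digit in 32-bit code. The "hex_digits" table is made up for this example:
			
			    mov ebx,hex_digits
			    mov al,0Bh
			    xlatb           ; al = 'B', the byte at offset 11 of the table
			    ret
			  hex_digits db '0123456789ABCDEF'
			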
		1199	"lds" transfers a pointer variable from the source operand to DS and the
		1200	destination register. The source operand must be a memory operand, and the
		1201	destination operand must be a general register. The DS register receives the
		1202	segment selector of the pointer while the destination register receives the
		1203	offset part of the pointer. "les", "lfs", "lgs" and "lss" operate identically
		1204	to "lds" except that rather than DS register the ES, FS, GS and SS are used
		1205	respectively.
		1206
		1207
		1208
		1209	"lea" transfers the offset of the source operand (rather than its value)
		1210	to the destination operand. The source operand must be a memory operand, and
		1211	the destination operand must be a general register.
		1212
		1213
		1214
		1215	"cpuid" returns processor identification and feature information in the
		1216	EAX, EBX, ECX, and EDX registers. The information returned is selected by
		1217	entering a value in the EAX register before the instruction is executed.
		1218	This instruction has no operands.
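			
			A sketch of reading the processor vendor string with "cpuid". Function 0
			returns the 12 characters in EBX, EDX and ECX, in that order. "vendor" is a
			hypothetical 12-byte buffer:
			
			    xor eax,eax     ; select function 0
			    cpuid
			    mov [vendor],ebx
			    mov [vendor+4],edx
			    mov [vendor+8],ecx
			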
		1219	"pause" instruction delays the execution of the next instruction for an
		1220	implementation-specific amount of time. It can be used to improve the
		1221	performance of spin wait loops. This instruction has no operands.
		1222	"enter" creates a stack frame that may be used to implement the scope rules
		1223	of block-structured high-level languages. A "leave" instruction at the end of
		1224	a procedure complements an "enter" at the beginning of the procedure to
		1225	simplify stack management and to control access to variables for nested
		1226	procedures. The "enter" instruction includes two parameters. The first
		1227	parameter specifies the number of bytes of dynamic storage to be allocated on
		1228	the stack for the routine being entered. The second parameter corresponds to
		1229	the lexical nesting level of the routine, it can be in range from 0 to 31.
		1230	The specified lexical level determines how many sets of stack frame pointers
		1231	the CPU copies into the new stack frame from the preceding frame. This list
		1232	of stack frame pointers is sometimes called the display. The first word (or
		1233	double word when code is 32-bit) of the display is a pointer to the last stack
		1234	frame. This pointer enables a "leave" instruction to reverse the action of the
		1235	previous "enter" instruction by effectively discarding the last stack frame.
		1236	After "enter" creates the new display for a procedure, it allocates the
		1237	dynamic storage space for that procedure by decrementing ESP by the number of
		1238	bytes specified in the first parameter. To enable a procedure to address its
		1239	display, "enter" leaves BP (or EBP) pointing to the beginning of the new stack
		1240	frame. If the lexical level is zero, "enter" pushes BP (or EBP), copies SP to
		1241	BP (or ESP to EBP) and then subtracts the first operand from ESP. For nesting
		1242	levels greater than zero, the processor pushes additional frame pointers on
		1243	the stack before adjusting the stack pointer.
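			
			For the lexical level zero the effect described above can be sketched in
			32-bit code as follows. The frame size of 16 bytes and the local variable
			are arbitrary choices:
			
			    enter 16,0          ; same as: push ebp / mov ebp,esp / sub esp,16
			    mov dword [ebp-4],0 ; local variables lie below ebp
			    leave               ; same as: mov esp,ebp / pop ebp
			    ret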
		1244
		1245
		1246
		1247
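		1248	2.1.12 System instructions
		1249	
		1250	"lmsw" loads the operand into the machine status word (bits 0 through 15 of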
		1251	CR0 register), while "smsw" stores the machine status word into the
		1252	destination operand. The operand for both those instructions can be 16-bit
		1253	general register or memory, for "smsw" it can also be 32-bit general
		1254	register.
		1255
		1256
		1257	smsw [bx] ; store machine status to memory
		1258
		1259	"lgdt" and "lidt" instructions load the values in operand into the global
		1260	descriptor table register or the interrupt descriptor table register
		1261	respectively. "sgdt" and "sidt" store the contents of the global descriptor
		1262	table register or the interrupt descriptor table register in the destination
		1263	operand. The operand should be 6 bytes in memory.
		1264
		1265
		1266
		1267	"lldt" loads the operand into the segment selector field of the local
		1268	descriptor table register and "sldt" stores the segment selector from the
		1269	local descriptor table register in the operand. "ltr" loads the operand into
		1270	the segment selector field of the task register and "str" stores the segment
		1271	selector from the task register in the operand. Rules for operand are the same
		1272	as for the "lmsw" and "smsw" instructions.
		1273	"lar" loads the access rights from the segment descriptor specified by
		1274	the selector in source operand into the destination operand and sets the ZF
		1275	flag. The destination operand can be a 16-bit or 32-bit general register.
		1276	The source operand should be a 16-bit general register or memory.
		1277
		1278
		1279	lar eax,dx ; load access rights into double word
		1280
		1281	"lsl" loads the segment limit from the segment descriptor specified by the
		1282	selector in source operand into the destination operand and sets the ZF flag.
		1283	Rules for operand are the same as for the "lar" instruction.
		1284	"verr" and "verw" verify whether the code or data segment specified with
		1285	the operand is readable or writable from the current privilege level. The
		1286	operand should be a word, it can be general register or memory. If the segment
		1287	is accessible and readable (for "verr") or writable (for "verw") the ZF flag
		1288	is set, otherwise it's cleared. Rules for operand are the same as for the
		1289	"lldt" instruction.
		1290	"arpl" compares the RPL (requestor's privilege level) fields of two segment
		1291	selectors. The first operand contains one segment selector and the second
		1292	operand contains the other. If the RPL field of the destination operand is
		1293	less than the RPL field of the source operand, the ZF flag is set and the RPL
		1294	field of the destination operand is increased to match that of the source
		1295	operand. Otherwise, the ZF flag is cleared and no change is made to the
		1296	destination operand. The destination operand can be a word general register
		1297	or memory, the source operand must be a general register.
		1298
		1299
		1300	arpl [bx],ax ; adjust RPL of selector in memory
		1301
		1302	"clts" clears the TS (task switched) flag in the CR0 register. This
		1303	instruction has no operands.
		1304	"lock" prefix causes the processor's bus-lock signal to be asserted during
		1305	execution of the accompanying instruction. In a multiprocessor environment,
		1306	the bus-lock signal ensures that the processor has exclusive use of any shared
		1307	memory while the signal is asserted. The "lock" prefix can be prepended only
		1308	to the following instructions and only to those forms of the instructions
		1309	where the destination operand is a memory operand: "add", "adc", "and", "btc",
		1310	"btr", "bts", "cmpxchg", "cmpxchg8b", "dec", "inc", "neg", "not", "or", "sbb",
		1311	"sub", "xor", "xadd" and "xchg". If the "lock" prefix is used with one of
		1312	these instructions and the source operand is a memory operand, an undefined
		1313	opcode exception may be generated. An undefined opcode exception will also be
		1314	generated if the "lock" prefix is used with any instruction not in the above
		1315	list. The "xchg" instruction with a memory operand always asserts the bus-lock
		1316	signal regardless of the presence or absence of the "lock" prefix.
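The fragment below sketches typical uses of the "lock" prefix with a memory destination. The variable names are made up for the example and 32-bit code is assumed.

```asm
use32

    lock inc dword [counter]    ; atomic increment of a memory operand
    mov     eax,1
    lock xadd [counter],eax     ; atomic add, eax receives the previous value
    xchg    [flag],eax          ; memory operand, so locked without the prefix
    ret

counter dd 0
flag    dd 0
```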
		1317	"hlt" stops instruction execution and places the processor in a halted
		1318	state. An enabled interrupt, a debug exception, the BINIT, INIT or the RESET
		1319	signal will resume execution. This instruction has no operands.
		1320	"invlpg" invalidates (flushes) the TLB (translation lookaside buffer) entry
		1321	specified with the operand, which should be a memory location. The processor
		1322	determines the page containing that address and flushes its TLB entry.
		1323	"rdmsr" loads the contents of a 64-bit MSR (model specific register) of the
		1324	address specified in the ECX register into registers EDX and EAX. "wrmsr"
		1325	writes the contents of registers EDX and EAX into the 64-bit MSR of the
		1326	address specified in the ECX register. "rdtsc" loads the current value of the
		1327	processor's time stamp counter from the 64-bit MSR into the EDX and EAX
		1328	registers. The processor increments the time stamp counter MSR every clock
		1329	cycle and resets it to 0 whenever the processor is reset. "rdpmc" loads the
		1330	contents of the 40-bit performance monitoring counter specified in the ECX
		1331	register into registers EDX and EAX. These instructions have no operands.
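One possible use of "rdtsc" is to measure how many clock cycles a piece of code takes. The sketch below assumes 32-bit code and uses hypothetical variable names; it does not account for out-of-order execution or task switches, so the result is only approximate.

```asm
use32

    rdtsc                       ; edx:eax = time stamp counter
    mov     [start_low],eax
    mov     [start_high],edx
    ; ... code being measured goes here ...
    rdtsc
    sub     eax,[start_low]
    sbb     edx,[start_high]    ; edx:eax = elapsed clock cycles
    ret

start_low  dd 0
start_high dd 0
```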
		1332	"wbinvd" writes back all modified cache lines in the processor's internal
		1333	cache to main memory and invalidates (flushes) the internal caches. The
		1334	instruction then issues a special function bus cycle that directs external
		1335	caches to also write back modified data and another bus cycle to indicate that
		1336	the external caches should be invalidated. This instruction has no operands.
		1337	"rsm" returns program control from the system management mode to the program
		1338	that was interrupted when the processor received an SMM interrupt. This
		1339	instruction has no operands.
		1340	"sysenter" executes a fast call to a level 0 system procedure, "sysexit"
		1341	executes a fast return to level 3 user code. The addresses used by these
		1342	instructions are stored in MSRs. These instructions have no operands.
		1343
		1344
		1345	2.1.13 FPU instructions
		1346	
		1347	The FPU (Floating-Point Unit) instructions operate on the floating-point
		1348	values in three formats: single precision (32-bit), double precision (64-bit)
		1349	and double extended precision (80-bit). The FPU registers form the stack and
		1350	each of them holds the double extended precision floating-point value. When
		1351	some values are pushed onto the stack or are removed from the top, the FPU
		1352	registers are shifted, so ST0 is always the value on the top of FPU stack, ST1
		1353	is the first value below the top, etc. The ST0 name has also the synonym ST.
		1354	"fld" pushes the floating-point value onto the FPU register stack. The
		1355	operand can be 32-bit, 64-bit or 80-bit memory location or the FPU register,
		1356	its value is then loaded onto the top of FPU register stack (the ST0
		1357	register) and is automatically converted into the double extended precision
		1358	format.
		1359
		1360
		1361	fld st2 ; push value of st2 onto register stack
		1362
		1363	"fld1", "fldz", "fldl2t", "fldl2e", "fldpi", "fldlg2" and "fldln2" load the
		1364	commonly used constants onto the FPU register stack. The loaded constants are
		1365	+1.0, +0.0, log2(10), log2(e), pi, log10(2) and ln(2) respectively. These
		1366	instructions have no operands.
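The way the register names shift with each push can be followed in the short sketch below. It assumes the FPU register stack is empty at the start.

```asm
    fld1                        ; st0 = 1.0
    fldpi                       ; st0 = pi,  st1 = 1.0
    fld     st1                 ; st0 = 1.0, st1 = pi, st2 = 1.0
```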
		1367	"fild" converts the signed integer source operand into double extended
		1368	precision floating-point format and pushes the result onto the FPU register
		1369	stack. The source operand can be a 16-bit, 32-bit or 64-bit memory location.
		1370
		1371
		1372
		1373	"fst" copies the value of ST0 register to the destination operand, which
		1374	can be 32-bit or 64-bit memory location or another FPU register. "fstp"
		1375	performs the same operation as "fst" and then pops the register stack,
		1376	getting rid of ST0. "fstp" accepts the same operands as the "fst" instruction
		1377	and can also store value in the 80-bit memory.
		1378
		1379
		1380	fstp tword [bx] ; store value in memory and pop stack
		1381
		1382	"fist" converts the value in ST0 to a signed integer and stores the result
		1383	in the destination operand. The operand can be 16-bit or 32-bit memory
		1384	location. "fistp" performs the same operation and then pops the register
		1385	stack, it accepts the same operands as the "fist" instruction and can also
		1386	store integer value in the 64-bit memory, so it has the same rules for
		1387	operands as "fild" instruction.
		1388	"fbld" converts the packed BCD integer into double extended precision
		1389	floating-point format and pushes this value onto the FPU stack. "fbstp"
		1390	converts the value in ST0 to an 18-digit packed BCD integer, stores the result
		1391	in the destination operand, and pops the register stack. The operand should be
		1392	an 80-bit memory location.
		1393	"fadd" adds the destination and source operand and stores the sum in the
		1394	destination location. The destination operand is always an FPU register, if
		1395	the source is a memory location, the destination is ST0 register and only
		1396	source operand should be specified. If both operands are FPU registers, at
		1397	least one of them should be ST0 register. An operand in memory can be a
		1398	32-bit or 64-bit value.
		1399
		1400
		1401	fadd st2,st0 ; add st0 to st2
		1402
		1403	"faddp" adds the destination and source operand, stores the sum in the
		1404	destination location and then pops the register stack. The destination operand
		1405	must be an FPU register and the source operand must be the ST0. When no
		1406	operands are specified, ST1 is used as a destination operand.
		1407
		1408
		1409	faddp st2,st0 ; add st0 to st2 and pop the stack
		1410
		1411	"fiadd" converts an integer source operand into double extended
		1412	precision floating-point value and adds it to the destination operand. The
		1413	operand should be a 16-bit or 32-bit memory location.
		1414
		1415
		1416
		1417	"fsub", "fsubr", "fmul", "fdiv" and "fdivr" are similar to "fadd", they
		1418	have the same rules for operands and differ only in the performed computation.
		1419	"fsub" subtracts the source operand from the destination operand, "fsubr"
		1420	subtracts the destination operand from the source operand, "fmul" multiplies
		1421	the destination and source operands, "fdiv" divides the destination operand by
		1422	the source operand and "fdivr" divides the source operand by the destination
		1423	operand. "fsubp", "fsubrp", "fmulp", "fdivp", "fdivrp" perform the same
		1424	operations and pop the register stack, the rules for operand are the same as
		1425	for the "faddp" instruction. "fisub", "fisubr", "fimul", "fidiv", "fidivr"
		1426	perform these operations after converting the integer source operand into
		1427	floating-point value, they have the same rules for operands as "fiadd"
		1428	instruction.
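The difference between the direct and the reversed forms can be seen in the sketch below, which computes (a-b)*(b-a). The variable names and values are made up for the example, and an empty FPU register stack is assumed at the start.

```asm
    fld     qword [a]           ; st0 = a
    fsub    qword [b]           ; st0 = a - b
    fld     qword [a]           ; st0 = a, st1 = a - b
    fsubr   qword [b]           ; st0 = b - a
    fmulp   st1,st0             ; st1 = (a - b)*(b - a), then pop
    fstp    qword [result]      ; store the product and empty the stack
    ret

a       dq 5.0
b       dq 3.0
result  dq ?
```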
		1429	"fsqrt" computes the square root of the value in ST0 register, "fsin"
		1430	computes the sine of that value, "fcos" computes the cosine of that value,
		1431	"fchs" complements its sign bit, "fabs" clears its sign to create the absolute
		1432	value, "frndint" rounds it to the nearest integral value, depending on the
		1433	current rounding mode. "f2xm1" computes the exponential value of 2 to the
		1434	power of ST0 and subtracts 1.0 from it, the value of ST0 must lie in the
		1435	range -1.0 to +1.0. All these instructions store the result in ST0 and have no
		1436	operands.
		1437	"fsincos" computes both the sine and the cosine of the value in ST0
		1438	register, stores the sine in ST0 and pushes the cosine on the top of FPU
		1439	register stack. "fptan" computes the tangent of the value in ST0, stores the
		1440	result in ST0 and pushes a 1.0 onto the FPU register stack. "fpatan" computes
		1441	the arctangent of the value in ST1 divided by the value in ST0, stores the
		1442	result in ST1 and pops the FPU register stack. "fyl2x" computes the binary
		1443	logarithm of ST0, multiplies it by ST1, stores the result in ST1 and pops the
		1444	FPU register stack; "fyl2xp1" performs the same operation but it adds 1.0 to
		1445	ST0 before computing the logarithm. "fprem" computes the remainder obtained
		1446	from dividing the value in ST0 by the value in ST1, and stores the result
		1447	in ST0. "fprem1" performs the same operation as "fprem", but it computes the
		1448	remainder in the way specified by IEEE Standard 754. "fscale" truncates the
		1449	value in ST1 and increases the exponent of ST0 by this value. "fxtract"
		1450	separates the value in ST0 into its exponent and significand, stores the
		1451	exponent in ST0 and pushes the significand onto the register stack. "fnop"
		1452	performs no operation. These instructions have no operands.
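Since "fyl2x" multiplies the binary logarithm by ST1, loading a suitable constant first gives a logarithm of another base. The sketch below computes the natural logarithm this way; the variable names are hypothetical and x is assumed to be positive.

```asm
    fldln2                      ; st0 = ln(2)
    fld     qword [x]           ; st0 = x, st1 = ln(2)
    fyl2x                       ; st0 = ln(2)*log2(x) = ln(x), stack popped
    fstp    qword [result]
    ret

x       dq 10.0
result  dq ?
```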
		1453	"fxch" exchanges the contents of ST0 and another FPU register. The operand
		1454	should be an FPU register, if no operand is specified, the contents of ST0 and
		1455	ST1 are exchanged.
		1456	"fcom" and "fcomp" compare the contents of ST0 and the source operand and
		1457	set flags in the FPU status word according to the results. "fcomp"
		1458	additionally pops the register stack after performing the comparison. The
		1459	operand can be a single or double precision value in memory or the FPU
		1460	register. When no operand is specified, ST1 is used as a source operand.
		1461
		1462
		1463	fcomp st2 ; compare st0 with st2 and pop stack
		1464
		1465	"fcompp" compares the contents of ST0 and ST1, sets flags in the FPU status
		1466	word according to the results and pops the register stack twice. This
		1467	instruction has no operands.
		1468	"fucom", "fucomp" and "fucompp" perform an unordered comparison of two FPU
		1469	registers. Rules for operands are the same as for the "fcom", "fcomp" and
		1470	"fcompp", but the source operand must be an FPU register.
		1471	"ficom" and "ficomp" compare the value in ST0 with an integer source operand
		1472	and set the flags in the FPU status word according to the results. "ficomp"
		1473	additionally pops the register stack after performing the comparison. The
		1474	integer value is converted to double extended precision floating-point format
		1475	before the comparison is made. The operand should be a 16-bit or 32-bit
		1476	memory location.
		1477
		1478
		1479
		1480	"fcomi", "fcomip", "fucomi" and "fucomip" compare the contents of ST0 with
		1481	another FPU register and set the ZF, PF and CF flags according to the results.
		1482	"fcomip" and "fucomip" additionally pop the register stack after performing the
		1483	comparison. The instructions obtained by attaching the FPU condition mnemonic
		1484	(see table 2.2) to the "fcmov" mnemonic transfer the specified FPU register
		1485	into ST0 register if the given test condition is true. These instructions
		1486	allow two different syntaxes, one with single operand specifying the source
		1487	FPU register, and one with two operands, in that case destination operand
		1488	should be ST0 register and the second operand specifies the source FPU
		1489	register.
		1490
		1491
		1492	fcmovb st0,st2 ; transfer st2 to st0 if below
		1493
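		1494	Table 2.2 FPU conditions
		1495	┌──────────┬──────────────────┬────────────────────────┐
		1496	│ Mnemonic │ Condition tested │ Description            │
		1497	╞══════════╪══════════════════╪════════════════════════╡
		1498	│ b        │ CF = 1           │ below                  │
		1499	│ e        │ ZF = 1           │ equal                  │
		1500	│ be       │ CF or ZF = 1     │ below or equal         │
		1501	│ u        │ PF = 1           │ unordered              │
		1502	│ nb       │ CF = 0           │ not below              │
		1503	│ ne       │ ZF = 0           │ not equal              │
		1504	│ nbe      │ CF and ZF = 0    │ not below nor equal    │
		1505	│ nu       │ PF = 0           │ not unordered          │
		1506	└──────────┴──────────────────┴────────────────────────┘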
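A comparison followed by a conditional move can select a value without any jump. The sketch below leaves the greater of two values in ST0 and stores it; the variable names are made up, the FPU register stack is assumed to be empty at the start, and unordered operands (NaN) are not handled.

```asm
    fld     qword [b]           ; st0 = b
    fld     qword [a]           ; st0 = a, st1 = b
    fcomi   st1                 ; compare st0 with st1, CF = 1 if a is below b
    fcmovb  st0,st1             ; if below, st0 = b, so st0 holds the maximum
    fstp    qword [result]      ; store the maximum and pop
    fstp    st0                 ; discard the remaining value
    ret

a       dq 1.5
b       dq 2.5
result  dq ?
```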
		1507
		1508	"ftst" compares the value in ST0 with 0.0 and sets the flags in the FPU
		1509	status word according to the results. "fxam" examines the contents of the ST0
		1510	and sets the flags in FPU status word to indicate the class of value in the
		1511	register. These instructions have no operands.
		1512	"fstsw" and "fnstsw" store the current value of the FPU status word in the
		1513	destination location. The destination operand can be either a 16-bit memory or
		1514	the AX register. "fstsw" checks for pending unmasked FPU exceptions before
		1515	storing the status word, "fnstsw" does not.
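Storing the status word in AX makes it possible to test the result of a comparison with ordinary conditional jumps, because "sahf" copies the condition bits C0, C2 and C3 into the CF, PF and ZF flags. The fragment below is a sketch of this technique; the variable and label names are hypothetical and an empty FPU register stack is assumed at the start.

```asm
    fld     qword [b]           ; st0 = b
    fld     qword [a]           ; st0 = a, st1 = b
    fcompp                      ; compare a with b and pop both
    fnstsw  ax                  ; FPU status word into ax
    sahf                        ; C0 -> CF, C2 -> PF, C3 -> ZF
    jp      done                ; PF = 1: operands were unordered
    jae     done                ; CF = 0: a is not below b
    mov     byte [a_is_less],1  ; CF = 1: a is below b
done:
    ret

a         dq 1.0
b         dq 2.0
a_is_less db 0
```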
		1516	"fstcw" and "fnstcw" store the current value of the FPU control word at the
		1517	specified destination in memory. "fstcw" checks for pending unmasked FPU
		1518	exceptions before storing the control word, "fnstcw" does not. "fldcw" loads
		1519	the operand into the FPU control word. The operand should be a 16-bit memory
		1520	location.
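The control word is usually changed by storing it, modifying the copy and loading it back. The sketch below switches the rounding control field (bits 10 and 11) to truncation for one conversion and then restores the previous setting. The variable names are made up for the example.

```asm
    fnstcw  [saved_cw]          ; keep the current control word
    mov     ax,[saved_cw]
    or      ax,0C00h            ; rounding control = 11b (truncate)
    mov     [trunc_cw],ax
    fldcw   [trunc_cw]
    fld     qword [value]
    fistp   dword [result]      ; 2.75 is stored as 2
    fldcw   [saved_cw]          ; restore the previous rounding mode
    ret

saved_cw dw ?
trunc_cw dw ?
value    dq 2.75
result   dd ?
```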
		1521	"fstenv" and "fnstenv" store the current FPU operating environment at the
		1522	memory location specified with the destination operand, and then mask all FPU
		1523	exceptions. "fstenv" checks for pending unmasked FPU exceptions before
		1524	proceeding, "fnstenv" does not. "fldenv" loads the complete operating
		1525	environment from memory into the FPU. "fsave" and "fnsave" store the current
		1526	FPU state (operating environment and register stack) at the specified
		1527	destination in memory and reinitialize the FPU. "fsave" checks for pending
		1528	unmasked FPU exceptions before proceeding, "fnsave" does not. "frstor"
		1529	loads the FPU state from the specified memory location. All these instructions
		1530	need an operand being a memory location.
		1531	"finit" and "fninit" set the FPU operating environment into its default
		1532	state. "finit" checks for pending unmasked FPU exceptions before proceeding,
		1533	"fninit" does not. "fclex" and "fnclex" clear the FPU exception flags in the
		1534	FPU status word. "fclex" checks for pending unmasked FPU exceptions before
		1535	proceeding, "fnclex" does not. "wait" and "fwait" are synonyms for the same
		1536	instruction, which causes the processor to check for pending unmasked FPU
		1537	exceptions and handle them before proceeding. These instructions have no
		1538	operands.
		1539	"ffree" sets the tag associated with specified FPU register to empty. The
		1540	operand should be an FPU register.
		1541	"fincstp" and "fdecstp" rotate the FPU stack by one by adding one to or
		1542	subtracting one from the top-of-stack pointer. These instructions have no
		1543	operands.
		1544
		1545
		1546	2.1.14 MMX instructions
		1547	
		1548	The MMX instructions operate on the packed integer types and use the MMX
		1549	registers, which are the low 64-bit parts of the 80-bit FPU registers. Because
		1550	of this, MMX instructions cannot be used at the same time as FPU instructions.
		1551	They can operate on packed bytes (eight 8-bit integers), packed words (four
		1552	16-bit integers) or packed double words (two 32-bit integers), use of packed
		1553	formats allows operations to be performed on multiple data at one time.
		1554	"movq" copies a quad word from the source operand to the destination
		1555	operand. At least one of the operands must be a MMX register, the second one
		1556	can be also a MMX register or 64-bit memory location.
		1557
		1558
		1559	movq mm2,[ebx] ; move quad word from memory to register
		1560
		1561	"movd" copies a double word from the source operand to the destination
		1562	operand. One of the operands must be a MMX register, the second one can be a
		1563	general register or 32-bit memory location. Only low double word of MMX
		1564	register is used.
		1565	All general MMX operations have two operands, the destination operand should
		1566	be a MMX register, the source operand can be a MMX register or 64-bit memory
		1567	location. Operation is performed on the corresponding data elements of the
		1568	source and destination operand and stored in the data elements of the
		1569	destination operand. "paddb", "paddw" and "paddd" perform the addition of
		1570	packed bytes, packed words, or packed double words. "psubb", "psubw" and
		1571	"psubd" perform the subtraction of appropriate types. "paddsb", "paddsw",
		1572	"psubsb" and "psubsw" perform the addition or subtraction of packed bytes
		1573	or packed words with the signed saturation. "paddusb", "paddusw", "psubusb",
		1574	"psubusw" are analogous, but with unsigned saturation. "pmulhw" and "pmullw"
		1575	perform a signed multiply of the packed words and store the high or low words
		1576	of the results in the destination operand. "pmaddwd" performs a multiply of
		1577	the packed words and adds the four intermediate double word products in pairs
		1578	to produce the result as packed double words. "pand", "por" and "pxor" perform
		1579	the logical operations on the quad words, "pandn" performs also a logical
		1580	negation of the destination operand before performing the "and" operation.
		1581	"pcmpeqb", "pcmpeqw" and "pcmpeqd" compare for equality of packed bytes,
		1582	packed words or packed double words. If a pair of data elements is equal, the
		1583	corresponding data element in the destination operand is filled with bits of
		1584	value 1, otherwise it's set to 0. "pcmpgtb", "pcmpgtw" and "pcmpgtd" perform
		1585	the similar operation, but they check whether the data elements in the
		1586	destination operand are greater than the corresponding data elements in the
		1587	source operand. "packsswb" converts packed signed words into packed signed
		1588	bytes, "packssdw" converts packed signed double words into packed signed
		1589	words, using saturation to handle overflow conditions. "packuswb" converts
		1590	packed signed words into packed unsigned bytes. Converted data elements from
		1591	the destination operand are stored in the low part of the destination operand,
		1592	while converted data elements from the source operand are stored in the
		1593	high part. "punpckhbw", "punpckhwd" and "punpckhdq" interleave the data
		1594	elements from the high parts of the source and destination operands and
		1595	stores the result into the destination operand. "punpcklbw", "punpcklwd" and
		1596	"punpckldq" perform the same operation, but the low parts of the source and
		1597	destination operand are used.
		1598
		1599
		1600	pcmpeqw mm3,mm7 ; compare packed words for equality
		1601
		1602	"psllw", "pslld" and "psllq" perform logical shift left of the packed words,
		1603	packed double words or a single quad word in the destination operand by the
		1604	amount specified in the source operand. "psrlw", "psrld" and "psrlq" perform
		1605	logical shift right of the packed words, packed double words or a single quad
		1606	word. "psraw" and "psrad" perform arithmetic shift right of the packed words or
		1607	double words. The destination operand should be a MMX register, while source
		1608	operand can be a MMX register, 64-bit memory location, or 8-bit immediate
		1609	value.
		1610
		1611
		1612	psrad mm4,[ebx] ; shift double words right arithmetically
		1613
		1614	"emms" makes the FPU registers usable for the FPU instructions, it must be
		1615	used before using the FPU instructions if any MMX instructions were used.
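A complete MMX sequence might look like the sketch below, which adds two groups of eight bytes with unsigned saturation and then releases the registers for the FPU. The label names and values are made up, and 32-bit code is assumed. The data is defined after plain labels so that no operand size is attached to them.

```asm
use32

    movq    mm0,[bytes_a]       ; load eight unsigned bytes
    paddusb mm0,[bytes_b]       ; add with unsigned saturation
    movq    [sums],mm0          ; first three bytes are 255, 255 and 30
    emms                        ; make the FPU registers usable again
    ret

bytes_a:
    db 250,200,10,0,0,0,0,0
bytes_b:
    db 10,100,20,0,0,0,0,0
sums:
    rb 8
```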
		1616
		1617
		1618	2.1.15 SSE instructions
		1619	
		1620	The SSE extension adds more MMX instructions and also introduces the
		1621	operations on packed single precision floating point values. The 128-bit
		1622	packed single precision format consists of four single precision floating
		1623	point values. The 128-bit SSE registers are designed for the purpose of
		1624	operations on this data type.
		1625	"movaps" and "movups" transfer a double quad word operand containing packed
		1626	single precision values from source operand to destination operand. At least
		1627	one of the operands has to be a SSE register, the second one can be also a
		1628	SSE register or 128-bit memory location. Memory operands for "movaps"
		1629	instruction must be aligned on boundary of 16 bytes, operands for "movups"
		1630	instruction don't have to be aligned.
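The alignment requirement can be illustrated with the sketch below. The label name is hypothetical, 32-bit code is assumed, and it is also assumed that the assembled code is loaded at an address which is a multiple of 16, so that the "align" directive has the intended effect at run time.

```asm
use32

    movaps  xmm0,[vector]       ; "vector" lies on a 16-byte boundary
    movups  xmm1,[vector+4]     ; this address is not aligned
    movups  [vector+4],xmm1     ; store back to the unaligned address
    ret

align 16
vector:
    dd 1.0,2.0,3.0,4.0,5.0
```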
		1631
		1632
		1633
		1634	"movlps" moves two packed single precision values between the memory and the
		1635	low quad word of SSE register. "movhps" moves two packed single precision
		1636	values between the memory and the high quad word of SSE register. One of the
		1637	operands must be a SSE register, and the other operand must be a 64-bit memory
		1638	location.
		1639
		1640
		1641	movhps [esi],xmm7 ; move high quad word of xmm7 to memory
		1642
		1643	"movlhps" moves two packed single precision values from the low quad word
		1644	of source register to the high quad word of destination register. "movhlps"
		1645	moves two packed single precision values from the high quad word of source
		1646	register to the low quad word of destination register. Both operands have to
		1647	be SSE registers.
		1648	"movmskps" transfers the most significant bit of each of the four single
		1649	precision values in the SSE register into low four bits of a general register.
		1650	The source operand must be a SSE register, the destination operand must be a
		1651	general register.
		1652	"movss" transfers a single precision value between source and destination
		1653	operand (only the low double word is transferred). At least one of the operands
		1654	has to be a SSE register, the second one can be also a SSE register or 32-bit
		1655	memory location.
		1656
		1657
		1658
		1659	Each of the SSE arithmetic operations has two variants. When the mnemonic
		1660	ends with "ps", the source operand can be a 128-bit memory location or a SSE
		1661	register, the destination operand must be a SSE register and the operation is
		1662	performed on packed four single precision values, for each pair of the
		1663	corresponding data elements separately, the result is stored in the
		1664	destination register. When the mnemonic ends with "ss", the source operand
		1665	can be a 32-bit memory location or a SSE register, the destination operand
		1666	must be a SSE register and the operation is performed on single precision
		1667	values, only low double words of SSE registers are used in this case, the
		1668	result is stored in the low double word of destination register. "addps" and
		1669	"addss" add the values, "subps" and "subss" subtract the source value from
		1670	destination value, "mulps" and "mulss" multiply the values, "divps" and
		1671	"divss" divide the destination value by the source value, "rcpps" and "rcpss"
		1672	compute the approximate reciprocal of the source value, "sqrtps" and "sqrtss"
		1673	compute the square root of the source value, "rsqrtps" and "rsqrtss" compute
		1674	the approximate reciprocal of square root of the source value, "maxps" and
		1675	"maxss" compare the source and destination values and return the greater one,
		1676	"minps" and "minss" compare the source and destination values and return the
		1677	lesser one.
		1678
		1679
		1680	addps xmm3,xmm7 ; add packed single precision values
		1681
		1682	"andps", "andnps", "orps" and "xorps" perform the logical operations on
		1683	packed single precision values. The source operand can be a 128-bit memory
		1684	location or a SSE register, the destination operand must be a SSE register.
		1685	"cmpps" compares packed single precision values and returns a mask result
		1686	into the destination operand, which must be a SSE register. The source operand
		1687	can be a 128-bit memory location or SSE register, the third operand must be an
		1688	immediate operand selecting code of one of the eight compare conditions
		1689	(table 2.3). "cmpss" performs the same operation on single precision values,
		1690	only low double word of destination register is affected, in this case source
		1691	operand can be a 32-bit memory location or SSE register. These two
		1692	instructions have also variants with only two operands and the condition
		1693	encoded within mnemonic. Their mnemonics are obtained by attaching the
		1694	mnemonic from table 2.3 to the "cmp" mnemonic and then attaching the "ps" or
		1695	"ss" at the end.
		1696
		1697
		1698	cmpltss xmm0,[ebx] ; compare single precision values
		1699
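		1700	Table 2.3 SSE conditions
		1701	┌──────┬──────────┬────────────────────────────┐
		1702	│ Code │ Mnemonic │ Description                │
		1703	╞══════╪══════════╪════════════════════════════╡
		1704	│ 0    │ eq       │ equal                      │
		1705	│ 1    │ lt       │ less than                  │
		1706	│ 2    │ le       │ less than or equal         │
		1707	│ 3    │ unord    │ unordered                  │
		1708	│ 4    │ neq      │ not equal                  │
		1709	│ 5    │ nlt      │ not less than              │
		1710	│ 6    │ nle      │ not less than nor equal    │
		1711	│ 7    │ ord      │ ordered                    │
		1712	└──────┴──────────┴────────────────────────────┘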
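The mask produced by a packed comparison is typically combined with a logical operation. The sketch below, assuming 32-bit code and values already loaded into XMM0 and XMM1, keeps those elements of XMM0 which are less than the corresponding elements of XMM1 and zeroes the others.

```asm
use32

    movaps  xmm2,xmm0           ; work on a copy, the mask replaces the destination
    cmpltps xmm2,xmm1           ; same as: cmpps xmm2,xmm1,1
    andps   xmm0,xmm2           ; elements not less than xmm1 become zero
```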
		1713
		1714	"comiss" and "ucomiss" compare the single precision values and set the ZF,
		1715	PF and CF flags to show the result. The destination operand must be a SSE
		1716	register, the source operand can be a 32-bit memory location or SSE register.
		1717	"shufps" moves any two of the four single precision values from the
		1718	destination operand into the low quad word of the destination operand, and any
		1719	two of the four values from the source operand into the high quad word of the
		1720	destination operand. The destination operand must be a SSE register, the
		1721	source operand can be a 128-bit memory location or SSE register, the third
		1722	operand must be an 8-bit immediate value selecting which values will be moved
		1723	into the destination operand. Bits 0 and 1 select the value to be moved from
		1724	destination operand to the low double word of the result, bits 2 and 3 select
		1725	the value to be moved from the destination operand to the second double word,
		1726	bits 4 and 5 select the value to be moved from the source operand to the third
		1727	double word, and bits 6 and 7 select the value to be moved from the source
		1728	operand to the high double word of the result.
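A few selector values may make the bit fields easier to follow. The register choices in the sketch below are arbitrary.

```asm
    shufps  xmm0,xmm0,00000000b ; copy the lowest value into all four positions
    shufps  xmm1,xmm1,00011011b ; reverse the order of the four values
    shufps  xmm2,xmm3,01000100b ; low quad word of xmm2, then low quad word of xmm3
```

When both operands are the same register, as in the first two lines, the instruction simply rearranges the four values of that register.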
		1729
		1730
		1731
		1732	"unpckhps" performs an interleaved unpack of the values from the high parts
		1733	of the source and destination operands and stores the result in the
		1734	destination operand, which must be a SSE register. The source operand can be
		1735	a 128-bit memory location or a SSE register. "unpcklps" performs an
		1736	interleaved unpack of the values from the low parts of the source and
		1737	destination operand and stores the result in the destination operand,
		1738	the rules for operands are the same.
		1739	"cvtpi2ps" converts two packed double word integers into two packed
		1740	single precision floating point values and stores the result in the low quad
		1741	word of the destination operand, which should be a SSE register. The source
		1742	operand can be a 64-bit memory location or MMX register.
		1743
		1744
		1745
		1746	"cvtsi2ss" converts a double word integer into a single precision floating
		1747	point value and stores the result in the low double word of the destination
		1748	operand, which should be a SSE register. The source operand can be a 32-bit
		1749	memory location or 32-bit general register.
		1750
		1751
		1752
		1753	"cvtps2pi" converts two packed single precision floating point values into
		1754	two packed double word integers and stores the result in the destination
		1755	operand, which should be a MMX register. The source operand can be a 64-bit
		1756	memory location or SSE register, only low quad word of SSE register is used.
		1757	"cvttps2pi" performs the similar operation, except that truncation is used to
		1758	round the source values to integers, rules for the operands are the same.
		1759
		1760
		1761
		1762	"cvtss2si" converts a single precision floating point value into a double
		1763	word integer and stores the result in the destination operand, which should be
		1764	a 32-bit general register. The source operand can be a 32-bit memory location
		1765	or SSE register, only low double word of SSE register is used. "cvttss2si"
		1766	performs the similar operation, except that truncation is used to round the
		1767	source value to an integer, rules for the operands are the same.
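The difference between the rounding and the truncating conversion shows up in the sketch below. It assumes 32-bit code and that the rounding control in MXCSR is at its default setting (round to nearest); the variable name is made up.

```asm
use32

    mov     eax,7
    cvtsi2ss xmm0,eax           ; xmm0 = 7.0
    mulss   xmm0,[half]         ; xmm0 = 3.5
    cvttss2si ecx,xmm0          ; truncation: ecx = 3
    cvtss2si edx,xmm0           ; round to nearest (even): edx = 4
    ret

half dd 0.5
```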
		1768
		1769
		1770
		1771	"pextrw" copies the word in the source operand specified by the third
		1772	operand to the destination operand. The source operand must be a MMX register,
		1773	the destination operand must be a 32-bit general register (its high word is
		1774	cleared), the third operand must be an 8-bit immediate value.
		1775
		1776
		1777
		1778	"pinsrw" inserts a word from the source operand in the destination operand
		1779	at the location specified with the third operand, which must be an 8-bit
		1780	immediate value. The destination operand must be a MMX register, the source
		1781	operand can be a 16-bit memory location or 32-bit general register (only low
		1782	word of the register is used).
		1783
		1784
		1785
		1786	"pavgb" and "pavgw" compute the average of packed bytes or words. "pmaxub"
		1787	returns the maximum values of packed unsigned bytes, "pminub" returns the
		1788	minimum values of packed unsigned bytes, "pmaxsw" returns the maximum values
		1789	of packed signed words, "pminsw" returns the minimum values of packed signed
		1790	words. "pmulhuw" performs an unsigned multiply of the packed words and stores
		1791	the high words of the results in the destination operand. "psadbw" computes
		1792	the absolute differences of packed unsigned bytes, sums the differences, and
		1793	stores the sum in the low word of destination operand. All these instructions
		1794	follow the same rules for operands as the general MMX operations described in
		1795	previous section.
		1796	"pmovmskb" creates a mask made of the most significant bit of each byte in
		1797	the source operand and stores the result in the low byte of destination
		1798	operand. The source operand must be a MMX register, the destination operand
		1799	must be a 32-bit general register.
		1800	"pshufw" inserts words from the source operand in the destination operand
		1801	from the locations specified with the third operand. The destination operand
		1802	must be a MMX register, the source operand can be a 64-bit memory location or
		1803	MMX register, third operand must be an 8-bit immediate value selecting which
		1804	values will be moved into destination operand, in the similar way as the third
		1805	operand of the "shufps" instruction.
		1806	"movntq" moves the quad word from the source operand to memory using a
		1807	non-temporal hint to minimize cache pollution. The source operand should be a
		1808	MMX register, the destination operand should be a 64-bit memory location.
		1809	"movntps" stores packed single precision values from the SSE register to
		1810	memory using a non-temporal hint. The source operand should be a SSE register,
		1811	the destination operand should be a 128-bit memory location. "maskmovq" stores
		1812	selected bytes from the first operand into a 64-bit memory location using a
		1813	non-temporal hint. Both operands should be MMX registers, the second operand
		1814	selects which bytes from the first operand are written to memory. The
		1815	memory location is pointed to by DI (or EDI) register in the segment selected
		1816	by DS.
		1817	"prefetcht0", "prefetcht1", "prefetcht2" and "prefetchnta" fetch the line
		1818	of data from memory that contains byte specified with the operand to a
		1819	specified location in hierarchy. The operand should be an 8-bit memory
		1820	location.
		1821	"sfence" performs a serializing operation on all instructions storing to
		1822	memory that were issued prior to it. This instruction has no operands.
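The non-temporal stores, prefetch hints and "sfence" are usually combined. The fragment below is only a sketch: it assumes that ESI and EDI point to 16-byte aligned buffers, which the text above does not state.

```asm
        prefetchnta [esi+64]    ; hint that data a bit further on will be needed soon
        movaps  xmm0,[esi]      ; aligned load, ESI assumed to be 16-byte aligned
        movntps [edi],xmm0      ; store 16 bytes with the non-temporal hint
        movq    mm0,[esi+16]
        movntq  [edi+16],mm0    ; store 8 bytes with the same hint
        sfence                  ; order the stores above before any later stores
        emms
```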
		1823	"ldmxcsr" loads the 32-bit memory operand into the MXCSR register. "stmxcsr"
		1824	stores the contents of MXCSR into a 32-bit memory operand.
		1825	"fxsave" saves the current state of the FPU, MXCSR register, and all the FPU
		1826	and SSE registers to a 512-byte memory location specified in the destination
		1827	operand. "fxrstor" reloads data previously stored with "fxsave" instruction
		1828	from the specified 512-byte memory location. The memory operand for both those
		1829	instructions must be aligned on a 16-byte boundary, and it should be declared
		1830	with no operand size specified.
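A possible way to declare and use such a save area is sketched below. The "fpu_state" label is hypothetical, and the "align" directive helps only when the assembled code is itself loaded at an address that is a multiple of 16.

```asm
        fxsave  [fpu_state]     ; save the FPU, MMX and SSE registers together with MXCSR
        ; ... code that is free to change the FPU and SSE state ...
        fxrstor [fpu_state]     ; bring back everything saved above
        ret

        align 16                ; aligns relative to the assembled addresses
fpu_state:                      ; label followed by colon carries no operand size
        rb 512
```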
		1831
		1832
		1833	2.1.16 SSE2 instructions
		1834
		1835	The SSE2 extension introduces the operations on packed double precision
		1836	floating point values, extends the syntax of MMX instructions, and also adds
		1837	some new instructions.
		1838	"movapd" and "movupd" transfer a double quad word operand containing packed
		1839	double precision values from source operand to destination operand. These
		1840	instructions are analogous to "movaps" and "movups" and have the same rules
		1841	for operands.
		1842	"movlpd" moves double precision value between the memory and the low quad
		1843	word of SSE register. "movhpd" moves double precision value between the memory
		1844	and the high quad word of SSE register. These instructions are analogous to
		1845	"movlps" and "movhps" and have the same rules for operands.
		1846	"movmskpd" transfers the most significant bit of each of the two double
		1847	precision values in the SSE register into low two bits of a general register.
		1848	This instruction is analogous to "movmskps" and has the same rules for
		1849	operands.
		1850	"movsd" transfers a double precision value between source and destination
		1851	operand (only the low quad word is transferred). At least one of the operands
		1852	has to be an SSE register, the second one can be also an SSE register or 64-bit
		1853	memory location.
		1854	Arithmetic operations on double precision values are: "addpd", "addsd",
		1855	"subpd", "subsd", "mulpd", "mulsd", "divpd", "divsd", "sqrtpd", "sqrtsd",
		1856	"maxpd", "maxsd", "minpd", "minsd", and they are analogous to arithmetic
		1857	operations on single precision values described in previous section. When the
		1858	mnemonic ends with "pd" instead of "ps", the operation is performed on packed
		1859	two double precision values, but rules for operands are the same. When the
		1860	mnemonic ends with "sd" instead of "ss", the source operand can be a 64-bit
		1861	memory location or a SSE register, the destination operand must be a SSE
		1862	register and the operation is performed on double precision values, only low
		1863	quad words of SSE registers are used in this case.
		1864	"andpd", "andnpd", "orpd" and "xorpd" perform the logical operations on
		1865	packed double precision values. They are analogous to SSE logical operations
		1866	on single precision values and have the same rules for operands.
		1867	"cmppd" compares packed double precision values and returns a
		1868	mask result into the destination operand. This instruction is analogous to
		1869	"cmpps" and has the same rules for operands. "cmpsd" performs the same
		1870	operation on double precision values, only low quad word of destination
		1871	register is affected, in this case source operand can be a 64-bit memory or
		1872	SSE register. Variants with only two operands are obtained by attaching the
		1873	condition mnemonic from table 2.3 to the "cmp" mnemonic and then attaching
		1874	the "pd" or "sd" at the end.
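The relation between the three-operand and two-operand forms can be sketched as follows. The example assumes that the "lt" condition has the code 1, as for "cmpps"; the lines are alternatives, each meant to be read on its own.

```asm
        cmppd    xmm0,xmm1,1    ; three-operand form, condition code 1 assumed to mean "lt"
        cmpltpd  xmm0,xmm1      ; two-operand form of the same comparison
        cmpeqsd  xmm2,[esi]     ; compares only the low double precision values
        movmskpd eax,xmm0       ; bits 0 and 1 of eax = sign bits of the two mask elements
```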
		1875	"comisd" and "ucomisd" compare the double precision values and set the ZF,
		1876	PF and CF flags to show the result. The destination operand must be a SSE
		1877	register, the source operand can be a 64-bit memory location or SSE register.
		1878	"shufpd" moves any of the two double precision values from the destination
		1879	operand into the low quad word of the destination operand, and any of the two
		1880	values from the source operand into the high quad word of the destination
		1881	operand. This instruction is analogous to "shufps" and has the same rules for
		1882	operands. Bit 0 of the third operand selects the value to be moved from the
		1883	destination operand, bit 1 selects the value to be moved from the source
		1884	operand, the rest of bits are reserved and must be zeroed.
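The effect of the two selector bits may be easier to see on concrete values. In this sketch the register choices are arbitrary and every line should be read independently of the others.

```asm
        shufpd  xmm0,xmm1,0     ; low = low of xmm0,  high = low of xmm1
        shufpd  xmm0,xmm1,1     ; low = high of xmm0, high = low of xmm1
        shufpd  xmm0,xmm1,2     ; low = low of xmm0,  high = high of xmm1
        shufpd  xmm2,xmm2,1     ; same register twice: swaps its two quad words
```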
		1885	"unpckhpd" performs an unpack of the high quad words from the source and
		1886	destination operands, "unpcklpd" performs an unpack of the low quad words from
		1887	the source and destination operands. They are analogous to "unpckhps" and
		1888	"unpcklps", and have the same rules for operands.
		1889	"cvtps2pd" converts the packed two single precision floating point values to
		1890	two packed double precision floating point values, the destination operand
		1891	must be a SSE register, the source operand can be a 64-bit memory location or
		1892	SSE register. "cvtpd2ps" converts the packed two double precision floating
		1893	point values to packed two single precision floating point values, the
		1894	destination operand must be a SSE register, the source operand can be a
		1895	128-bit memory location or SSE register. "cvtss2sd" converts the single
		1896	precision floating point value to double precision floating point value, the
		1897	destination operand must be a SSE register, the source operand can be a 32-bit
		1898	memory location or SSE register. "cvtsd2ss" converts the double precision
		1899	floating point value to single precision floating point value, the destination
		1900	operand must be a SSE register, the source operand can be 64-bit memory
		1901	location or SSE register.
		1902	"cvtpi2pd" converts packed two double word integers into the packed
		1903	double precision floating point values, the destination operand must be a SSE
		1904	register, the source operand can be a 64-bit memory location or MMX register.
		1905	"cvtsi2sd" converts a double word integer into a double precision floating
		1906	point value, the destination operand must be a SSE register, the source
		1907	operand can be a 32-bit memory location or 32-bit general register. "cvtpd2pi"
		1908	converts packed double precision floating point values into packed two double
		1909	word integers, the destination operand should be a MMX register, the source
		1910	operand can be a 128-bit memory location or SSE register. "cvttpd2pi" performs
		1911	the similar operation, except that truncation is used to round the source values
		1912	to integers, rules for operands are the same. "cvtsd2si" converts a double
		1913	precision floating point value into a double word integer, the destination
		1914	operand should be a 32-bit general register, the source operand can be a
		1915	64-bit memory location or SSE register. "cvttsd2si" performs the similar
		1916	operation, except that truncation is used to round a source value to integer,
		1917	rules for operands are the same.
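A short sketch of the scalar conversions described above; the registers and the ESI-based address are assumptions made for the example.

```asm
        cvtsi2sd  xmm0,eax      ; integer in eax -> double in the low quad word of xmm0
        cvtsd2ss  xmm1,xmm0     ; double -> single precision
        cvtss2sd  xmm2,[esi]    ; single precision value at [esi] -> double
        cvtsd2si  edx,xmm0      ; double -> integer, rounded according to MXCSR
        cvttsd2si ecx,xmm0      ; double -> integer, always truncated
```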
		1918	"cvtps2dq" and "cvttps2dq" convert packed single precision floating point
		1919	values to packed four double word integers, storing them in the destination
		1920	operand. "cvtpd2dq" and "cvttpd2dq" convert packed double precision floating
		1921	point values to packed two double word integers, storing the result in the low
		1922	quad word of the destination operand. "cvtdq2ps" converts packed four
		1923	double word integers to packed single precision floating point values.
		1924	"cvtdq2pd" converts packed two double word integers from the low quad word
		1925	of the source operand to packed double precision floating point values.
		1926	For all these instructions the destination operand must be an SSE register, the
		1927	source operand can be a 128-bit memory location or SSE register.
		1928	"movdqa" and "movdqu" transfer a double quad word operand containing packed
		1929	integers from source operand to destination operand. At least one of the
		1930	operands has to be an SSE register, the second one can be also an SSE register
		1931	or 128-bit memory location. Memory operands for "movdqa" instruction must be
		1932	aligned on boundary of 16 bytes, operands for "movdqu" instruction don't have
		1933	to be aligned.
		1934	"movq2dq" moves the contents of the MMX source register to the low quad word
		1935	of destination SSE register. "movdq2q" moves the low quad word from the source
		1936	SSE register to the destination MMX register.
		1937
		1938	movq2dq xmm0,mm1 ; move from MMX register to SSE register
		1939	movdq2q mm0,xmm1 ; move from SSE register to MMX register
		1940
		1941	All MMX instructions operating on the 64-bit packed integers (those with
		1942	mnemonics starting with "p") are extended to operate on 128-bit packed
		1943	integers located in SSE registers. Additional syntax for these instructions
		1944	needs an SSE register where an MMX register was needed, and a 128-bit memory
		1945	location or SSE register where a 64-bit memory location or MMX register was
		1946	needed. The exception is "pshufw" instruction, which doesn't allow extended
		1947	syntax, but has two new variants: "pshufhw" and "pshuflw", which allow only
		1948	the extended syntax, and perform the same operation as "pshufw" on the high
		1949	or low quad words of operands respectively. Also the new instruction "pshufd"
		1950	is introduced, which performs the same operation as "pshufw", but on the
		1951	double words instead of words, it allows only the extended syntax.
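The extended syntax and the new shuffle variants could be used as in the following sketch. The registers and immediates are arbitrary, and the memory operand is assumed to be 16-byte aligned.

```asm
        paddw   xmm0,xmm1           ; MMX mnemonic, now adding eight packed words
        pxor    xmm2,[esi]          ; 128-bit memory source, ESI assumed 16-byte aligned
        pshufd  xmm3,xmm4,0         ; lowest double word of xmm4 copied into all four
        pshuflw xmm5,xmm6,00011011b ; reverse the four low words, high quad word is copied
        pshufhw xmm5,xmm5,00011011b ; reverse the four high words, low quad word is kept
```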
		1952
		1953
		1954	pextrw eax,xmm0,7 ; extract highest word into eax
		1955
		1956	"paddq" performs the addition of packed quad words, "psubq" performs the
		1957	subtraction of packed quad words, "pmuludq" performs an unsigned multiply
		1958	of low double words from each corresponding quad words and returns the results
		1959	in packed quad words. These instructions follow the same rules for operands as
		1960	the general MMX operations described in 2.1.14.
		1961	"pslldq" and "psrldq" perform logical shift left or right of the double
		1962	quad word in the destination operand by the number of bytes specified in the
		1963	source operand. The destination operand should be an SSE register, the source
		1964	operand should be an 8-bit immediate value.
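Note that the processor counts these shifts in bytes, not in bits. A minimal sketch with arbitrary registers:

```asm
        pslldq  xmm0,4          ; shift the whole 128-bit value left by 4 bytes (one double word)
        psrldq  xmm1,8          ; high quad word moves into the low one, high one becomes zero
```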
		1965	"punpckhqdq" interleaves the high quad word of the source operand and the
		1966	high quad word of the destination operand and writes them to the destination
		1967	SSE register. "punpcklqdq" interleaves the low quad word of the source operand
		1968	and the low quad word of the destination operand and writes them to the
		1969	destination SSE register. The source operand can be a 128-bit memory location
		1970	or SSE register.
		1971	"movntdq" stores packed integer data from the SSE register to memory using
		1972	non-temporal hint. The source operand should be a SSE register, the
		1973	destination operand should be a 128-bit memory location. "movntpd" stores
		1974	packed double precision values from the SSE register to memory using a
		1975	non-temporal hint. Rules for operands are the same. "movnti" stores an integer
		1976	from a general register to memory using a non-temporal hint. The source
		1977	operand should be a 32-bit general register, the destination operand should
		1978	be a 32-bit memory location. "maskmovdqu" stores selected bytes from the first
		1979	operand into a 128-bit memory location using a non-temporal hint. Both
		1980	operands should be SSE registers, the second operand selects which bytes from
		1981	the source operand are written to memory. The memory location is pointed to by DI
		1982	(or EDI) register in the segment selected by DS and does not need to be
		1983	aligned.
		1984	"clflush" writes and invalidates the cache line associated with the address
		1985	of byte specified with the operand, which should be an 8-bit memory location.
		1986	"lfence" performs a serializing operation on all instructions loading from
		1987	memory that were issued prior to it. "mfence" performs a serializing operation
		1988	on all instructions accessing memory that were issued prior to it, and so it
		1989	combines the functions of "sfence" (described in previous section) and
		1990	"lfence" instructions. These instructions have no operands.
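The SSE2 non-temporal stores and the fences might be combined as in this sketch, which assumes that EDI points to a 16-byte aligned buffer and ESI to data no longer needed in the cache.

```asm
        movntdq [edi],xmm0      ; store 16 bytes of packed integers, bypassing the cache
        movnti  [edi+16],eax    ; store a double word with the same hint
        mfence                  ; order all earlier loads and stores
        clflush [esi]           ; write back and invalidate the line containing [esi]
```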
		1991
		1992
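		1993	2.1.17 SSE3 instructions
		1994
		1995	Prescott technology introduced some new instructions to improve the performance
		1996	of SSE and SSE2 - this extension is called SSE3.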
		1997	"fisttp" behaves like the "fistp" instruction and accepts the same operands,
		1998	the only difference is that it always uses truncation, irrespective of the
		1999	rounding mode.
		2000	"movshdup" loads into destination operand the 128-bit value obtained from
		2001	the source value of the same size by filling each quad word with the two
		2002	duplicates of the value in its high double word. "movsldup" performs the same
		2003	action, except it duplicates the values of low double words. The destination
		2004	operand should be SSE register, the source operand can be SSE register or
		2005	128-bit memory location.
		2006	"movddup" loads the 64-bit source value and duplicates it into high and low
		2007	quad word of the destination operand. The destination operand should be SSE
		2008	register, the source operand can be SSE register or 64-bit memory location.
		2009	"lddqu" is functionally equivalent to "movdqu" instruction with memory as
		2010	source operand, but it may improve performance when the source operand crosses
		2011	a cacheline boundary. The destination operand has to be SSE register, the
		2012	source operand must be 128-bit memory location.
		2013	"addsubps" performs single precision addition of second and fourth pairs and
		2014	single precision subtraction of the first and third pairs of floating point
		2015	values in the operands. "addsubpd" performs double precision addition of the
		2016	second pair and double precision subtraction of the first pair of floating
		2017	point values in the operands. "haddps" performs the addition of two single
		2018	precision values within each quad word of source and destination operands,
		2019	and stores the results of such horizontal addition of values from destination
		2020	operand into low quad word of destination operand, and the results from the
		2021	source operand into high quad word of destination operand. "haddpd" performs
		2022	the addition of two double precision values within each operand, and stores
		2023	the result from destination operand into low quad word of destination operand,
		2024	and the result from source operand into high quad word of destination operand.
		2025	All these instructions need the destination operand to be an SSE register, the source
		2026	operand can be SSE register or 128-bit memory location.
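A few of the SSE3 instructions in use, as a sketch with arbitrary registers. The comments name the four single precision values of a register x0 to x3, starting from the lowest one; this naming is only a convention of the example.

```asm
        movddup  xmm0,[esi]     ; both quad words of xmm0 = double at [esi]
        movshdup xmm1,xmm2      ; xmm1 = x1,x1,x3,x3 taken from xmm2
        haddps   xmm3,xmm3      ; xmm3 = x0+x1, x2+x3, x0+x1, x2+x3
        haddps   xmm3,xmm3      ; every element = x0+x1+x2+x3 of the original xmm3
        addsubpd xmm4,xmm5      ; low = xmm4-xmm5, high = xmm4+xmm5
```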
		2027	"monitor" sets up an address range for monitoring of write-back stores. It
		2028	needs its three operands to be the EAX, ECX and EDX registers in that order. "mwait"
		2029	waits for a write-back store to the address range set up by the "monitor"
		2030	instruction. It uses two operands with additional parameters, first being the
		2031	EAX and second the ECX register.
		2032
		2033
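		2034	2.1.18 AMD 3DNow! instructions
		2035
		2036	The 3DNow! extension adds new MMX instructions to those described in 2.1.14,
		2037	and introduces operations on the 64-bit packed floating point values, each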
		2038	consisting of two single precision floating point values.
		2039	These instructions follow the same rules as the general MMX operations, the
		2040	destination operand should be a MMX register, the source operand can be a MMX
		2041	register or 64-bit memory location. "pavgusb" computes the rounded averages
		2042	of packed unsigned bytes. "pmulhrw" performs a signed multiply of the packed
		2043	words, rounds the high word of each double word result and stores them in the
		2044	destination operand. "pi2fd" converts packed double word integers into
		2045	packed floating point values. "pf2id" converts packed floating point values
		2046	into packed double word integers using truncation. "pi2fw" converts packed
		2047	word integers into packed floating point values, only low words of each
		2048	double word in source operand are used. "pf2iw" converts packed floating
		2049	point values to packed word integers, results are extended to double words
		2050	using the sign extension. "pfadd" adds packed floating point values. "pfsub"
		2051	and "pfsubr" subtract packed floating point values, the first one subtracts
		2052	source values from destination values, the second one subtracts destination
		2053	values from the source values. "pfmul" multiplies packed floating point
		2054	values. "pfacc" adds the low and high floating point values of the destination
		2055	operand, storing the result in the low double word of destination, and adds
		2056	the low and high floating point values of the source operand, storing the
		2057	result in the high double word of destination. "pfnacc" subtracts the high
		2058	floating point value of the destination operand from the low, storing the
		2059	result in the low double word of destination, and subtracts the high floating
		2060	point value of the source operand from the low, storing the result in the high
		2061	double word of destination. "pfpnacc" subtracts the high floating point value
		2062	of the destination operand from the low, storing the result in the low double
		2063	word of destination, and adds the low and high floating point values of the
		2064	source operand, storing the result in the high double word of destination.
		2065	"pfmax" and "pfmin" compute the maximum and minimum of floating point values.
		2066	"pswapd" reverses the high and low double word of the source operand. "pfrcp"
		2067	returns estimates of the reciprocals of floating point values from the
		2068	source operand, "pfrsqrt" returns estimates of the reciprocal square
		2069	roots of floating point values from the source operand, "pfrcpit1" performs
		2070	the first step in the Newton-Raphson iteration to refine the reciprocal
		2071	approximation produced by "pfrcp" instruction, "pfrsqit1" performs the first
		2072	step in the Newton-Raphson iteration to refine the reciprocal square root
		2073	approximation produced by "pfrsqrt" instruction, "pfrcpit2" performs the
		2074	second and final step in the Newton-Raphson iteration to refine the reciprocal
		2075	approximation or the reciprocal square root approximation. "pfcmpeq",
		2076	"pfcmpge" and "pfcmpgt" compare the packed floating point values and set
		2077	all bits or zero all bits of the corresponding data element in the
		2078	destination operand according to the result of comparison, first checks
		2079	whether values are equal, second checks whether destination value is greater
		2080	or equal to source value, third checks whether destination value is greater
		2081	than source value.
		2082	"prefetch" and "prefetchw" load the line of data from memory that contains
		2083	byte specified with the operand into the data cache, "prefetchw" instruction
		2084	should be used when the data in the cache line is expected to be modified,
		2085	otherwise the "prefetch" instruction should be used. The operand should be an
		2086	8-bit memory location.
		2087	"femms" performs a fast clear of MMX state. This instruction has no
		2088	operands.
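As a sketch of how the packed floating point operations combine, the fragment below computes the dot product of two vectors of two elements each. The labels "vec_a", "vec_b" and "result" are hypothetical, the values are arbitrary, and the fragment is assumed to be called as a 32-bit subroutine.

```asm
        movq    mm0,[vec_a]     ; mm0 = a1 | a0
        pfmul   mm0,[vec_b]     ; mm0 = a1*b1 | a0*b0
        pfacc   mm0,mm0         ; both double words = a0*b0 + a1*b1
        movd    [result],mm0    ; store the low double word
        femms                   ; fast clear of MMX state
        ret

vec_a:  dd 1.0,2.0              ; labels with colon, so no operand size is attached
vec_b:  dd 3.0,4.0
result: dd ?
```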
		2089
		2090
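		2091	2.1.19 The x86-64 long mode instructions
		2092
		2093	The AMD64 and EM64T architectures (we will use the common name x86-64 for them
		2094	both) extend the x86 instruction set for the 64-bit processing. While legacy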
		2095	and compatibility modes use the same set of registers and instructions, the
		2096	new long mode extends the x86 operations to 64 bits and introduces several new
		2097	registers. You can turn on generating the code for this mode with the "use64"
		2098	directive.
		2099	Each of the general purpose registers is extended to 64 bits, and eight
		2100	entirely new general purpose registers and eight new SSE registers are added.
		2101	See table 2.4 for the summary of new registers (only the ones that were not
		2102	listed in table 1.2). The general purpose registers of smaller sizes are the
		2103	low order portions of the larger ones. You can still access the "ah", "bh",
		2104	"ch" and "dh" registers in long mode, but you cannot use them in the same
		2105	instruction with any of the new registers.
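The register rules above can be sketched as follows; the choice of registers is arbitrary, and the commented-out line shows a combination that cannot be encoded.

```asm
        use64
        mov     r8,rax          ; 64-bit registers
        mov     r9d,eax         ; low 32 bits of r9
        mov     r10w,ax         ; low 16 bits of r10
        mov     r11b,al         ; low 8 bits of r11
        mov     sil,al          ; low byte of rsi, available only in long mode
        mov     ah,bl           ; "ah" is still usable together with the old registers
        ; mov   ah,r8b          ; not allowed: "ah" combined with a new register
```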
		2106
		2107	Table 2.4 New registers in long mode
		2108	+------+-------------------------------+-------+
		2109	| Type |            General            |  SSE  |
		2110	+------+-------+-------+-------+-------+-------+
		2111	| Bits |   8   |  16   |  32   |  64   |  128  |
		2112	+======+=======+=======+=======+=======+=======+
		2113	|      |       |       |       | rax   |       |
		2114	|      |       |       |       | rcx   |       |
		2115	|      |       |       |       | rdx   |       |
		2116	|      |       |       |       | rbx   |       |
		2117	|      | spl   |       |       | rsp   |       |
		2118	|      | bpl   |       |       | rbp   |       |
		2119	|      | sil   |       |       | rsi   |       |
		2120	|      | dil   |       |       | rdi   |       |
		2121	|      | r8b   | r8w   | r8d   | r8    | xmm8  |
		2122	|      | r9b   | r9w   | r9d   | r9    | xmm9  |
		2123	|      | r10b  | r10w  | r10d  | r10   | xmm10 |
		2124	|      | r11b  | r11w  | r11d  | r11   | xmm11 |
		2125	|      | r12b  | r12w  | r12d  | r12   | xmm12 |
		2126	|      | r13b  | r13w  | r13d  | r13   | xmm13 |
		2127	|      | r14b  | r14w  | r14d  | r14   | xmm14 |
		2128	|      | r15b  | r15w  | r15d  | r15   | xmm15 |
		2129	+------+-------+-------+-------+-------+-------+
		2130
		2131	In general, any instruction from the x86 architecture that allowed 16-bit or
		2132	32-bit operand sizes also allows 64-bit operands in long mode. The 64-bit
		2133	registers should be used for addressing in long mode, the 32-bit addressing
		2134	is also allowed, but it's not possible to use the addresses based on 16-bit
		2135	registers. Below are the samples of new operations possible in long mode on the
		2136	example of "mov" instruction:
		2137
		2138
		2139	mov al,[rbx] ; transfer memory addressed by 64-bit register
		2140
		2141	Long mode also uses addresses based on the instruction pointer; you can
		2142	specify it manually with the special RIP register symbol, but such addressing
		2143	is also automatically generated by flat assembler, since there is no 64-bit
		2144	absolute addressing in long mode. You can still force the assembler to use the
		2145	32-bit absolute addressing by putting the "dword" size override for address
		2146	inside the square brackets. There is also one exception, where the 64-bit
		2147	absolute addressing is possible, it's the "mov" instruction with one of the
		2148	operands being the accumulator register, and the second being the memory operand.
		2149	To force the assembler to use the 64-bit absolute addressing there, use the
		2150	"qword" size operator for address inside the square brackets. When no size
		2151	operator is applied to address, assembler generates the optimal form
		2152	automatically.
		2153
		2154	mov [qword 0],rax ; absolute 64-bit addressing
		2155	mov [dword 0],r15d ; absolute 32-bit addressing
		2156	mov [0],rsi ; automatic RIP-relative addressing
		2157	mov [rip+3],sil ; manual RIP-relative addressing
		2158
		2159	As immediate operands for 64-bit operations only the signed 32-bit
		2160	values are possible, with the only exception being the "mov" instruction with
		2161	destination operand being 64-bit general purpose register. Trying to force the
		2162	64-bit immediate with any other instruction will cause an error.
		2163	If any operation is performed on the 32-bit general registers in long mode,
		2164	the upper 32 bits of the 64-bit registers containing them are filled with
		2165	zeros. This is unlike the operations on 16-bit or 8-bit portions of those
		2166	registers, which preserve the upper bits.
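Both rules, the one about immediate values and the one about writing to 32-bit registers, can be followed on a short sketch. The register choices and constants are arbitrary.

```asm
        use64
        mov     rax,-1                  ; rax = 0FFFFFFFFFFFFFFFFh (32-bit immediate, sign extended)
        mov     eax,1                   ; rax = 1, the upper 32 bits were cleared
        mov     rbx,-1
        mov     bx,1                    ; rbx = 0FFFFFFFFFFFF0001h, the upper bits were preserved
        mov     rcx,123456789ABCDEF0h   ; full 64-bit immediate, possible only with "mov"
        add     rcx,-2                  ; fine, the value fits in signed 32 bits
        ; add   rcx,123456789ABCDEF0h   ; would cause an error
```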
		2167	Three new type conversion instructions are available. The "cdqe" sign extends
		2168	the double word in EAX into quad word and stores the result in RAX register.
		2169	"cqo" sign extends the quad word in RAX into double quad word and stores the
		2170	extra bits in the RDX register. These instructions have no operands. "movsxd"
		2171	sign extends the double word source operand, being either the 32-bit register
		2172	or memory, into 64-bit destination operand, which has to be register.
		2173	No analogous instruction is needed for the zero extension, since it is done
		2174	automatically by any operations on 32-bit registers, as noted in previous
		2175	paragraph. And the "movzx" and "movsx" instructions, conforming to the general
		2176	rule, can be used with 64-bit destination operand, allowing extension of byte
		2177	or word values into quad words.
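The conversion instructions are typically used as in the sketch below, which assumes that RCX holds a non-zero divisor and RSI points to readable memory.

```asm
        use64
        mov     eax,-5          ; eax = 0FFFFFFFBh, upper half of rax is zero
        cdqe                    ; rax = -5
        cqo                     ; rdx:rax = -5, ready for signed division
        idiv    rcx             ; rcx assumed to hold a non-zero divisor
        movsxd  rbx,dword [rsi] ; sign extend a double word from memory
        movsxd  r10,ecx         ; sign extend a 32-bit register
        movzx   r8,byte [rsi]   ; zero extend a byte into quad word
        movsx   r9,cx           ; sign extend a word into quad word
```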
		2178	All the binary arithmetic and logical instructions are promoted to allow
		2179	64-bit operands in long mode. The use of decimal arithmetic instructions in
		2180	long mode is prohibited.
		2181	The stack operations, like "push" and "pop", in long mode default to 64-bit
		2182	operands and it's not possible to use 32-bit operands with them. The "pusha"
		2183	and "popa" are disallowed in long mode.
		2184	The indirect near jumps and calls in long mode default to 64-bit operands and
		2185	it's not possible to use the 32-bit operands with them. On the other hand, the
		2186	indirect far jumps and calls allow any operands that were allowed by the x86
		2187	architecture and also 80-bit memory operand is allowed (though only EM64T seems
		2188	to implement such variant), with the first eight bytes defining the offset and
		2189	two last bytes specifying the selector. The direct far jumps and calls are not
		2190	allowed in long mode.
		2191	The I/O instructions, "in", "out", "ins" and "outs" are the exceptional
		2192	instructions that are not extended to accept quad word operands in long mode.
		2193	But all other string operations are, and there are new short forms "movsq",
		2194	"cmpsq", "scasq", "lodsq" and "stosq" introduced for the variants of string
		2195	operations for 64-bit string elements. The RSI and RDI registers are used by
		2196	default to address the string elements.
		2197	The "lfs", "lgs" and "lss" instructions are extended to accept 80-bit source
		2198	memory operand with 64-bit destination register (though only EM64T seems to
		2199	implement such variant). The "lds" and "les" are disallowed in long mode.
		2200	The system instructions like "lgdt" which required the 48-bit memory operand,
		2201	in long mode require the 80-bit memory operand.
		2202	The "cmpxchg16b" is the 64-bit equivalent of "cmpxchg8b" instruction, it uses
		2203	the double quad word memory operand and 64-bit registers to perform the
		2204	analogous operation.
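A common use of "cmpxchg16b" is an atomic update of a 128-bit value. The loop below is a sketch that increments such a value; it assumes that RDI points to a 16-byte aligned memory location, and the "retry" label is hypothetical.

```asm
        use64
retry:
        mov     rax,[rdi]       ; rdx:rax = value expected to be in memory
        mov     rdx,[rdi+8]
        mov     rbx,rax         ; rcx:rbx = new value = expected value + 1
        mov     rcx,rdx
        add     rbx,1
        adc     rcx,0
        lock cmpxchg16b [rdi]   ; store rcx:rbx only if memory still equals rdx:rax
        jnz     retry           ; ZF is cleared when the memory was changed meanwhile
```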
		2205	"swapgs" is the new instruction, which swaps the contents of GS register and
		2206	the KernelGSbase model-specific register (MSR address 0C0000102h).
		2207	"syscall" and "sysret" are a pair of new instructions that provide the
		2208	functionality similar to "sysenter" and "sysexit" in long mode, where the
		2209	latter pair is disallowed.
		2210
		2211
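		2212	2.2 Control directives
		2213
		2214	This section describes the directives that control the assembly process; they
		2215	are processed during the assembly and may cause some blocks of instructions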
		2216	to be assembled differently or not assembled at all.
		2217
		2218
		2219	2.2.1 Numerical constants
		2220
		2221	The "=" directive defines a numerical constant. It should be
		2222	preceded by the name for the constant and followed by the numerical expression
		2223	providing the value. The value of such constants can be a number or an address,
		2224	but - unlike labels - the numerical constants are not allowed to hold the
		2225	register-based addresses. Besides this difference, in their basic variant
		2226	numerical constants behave very much like labels and you can even
		2227	forward-reference them (access their values before they actually get defined).
		2228	There is, however, a second variant of numerical constants, which is
		2229	recognized by the assembler when you define a constant under a name that
		2230	already belongs to a numerical constant. In such case the assembler
		2231	treats that constant as an assembly-time variable and allows it to be assigned
		2232	a new value, but forbids forward-referencing it (for obvious reasons). Let's
		2233	see both variants of numerical constants in one example:
		2234
		2235	dd sum
		2236	x = 1
		2237	x = x+2
		2238	sum = x
		2239
		2240	Here the "x" is an assembly-time variable, and every time it is accessed, the
		2241	value that was assigned to it the most recently is used. Thus if we tried to
		2242	access the "x" before it gets defined the first time, like if we wrote "dd x"
		2243	in place of the "dd sum" instruction, it would cause an error. And when it is
		2244	re-defined with the "x = x+2" directive, the previous value of "x" is used to
		2245	calculate the new one. So when the "sum" constant gets defined, the "x" has
		2246	value of 3, and this value is assigned to the "sum". Since this one is defined
		2247	only once in source, it is the standard numerical constant, and can be
		2248	forward-referenced. So the "dd sum" is assembled as "dd 3". To read more about
		2249	how the assembler is able to resolve this, see section 2.2.6.
		2250	The value of numerical constant can be preceded by size operator, which can
		2251	ensure that the value will fit in the range for the specified size, and can
		2252	affect also how some of the calculations inside the numerical expression are
		2253	performed. This example:
		2254
		2255
		2256	c32 = dword -1
		2257
		2258	defines the "c32" constant and ensures that its value
		2259	fits in 32 bits.
		2260	When you need to define constant with the value of address, which may be
		2261	register-based (and thus you cannot employ numerical constant for this
		2262	purpose), you can use the extended syntax of "label" directive (already
		2263	described in section 1.2.3), like:
		2264
		2265
		2266
		2267	Remember, however, that labels,
		2268	unlike numerical constants, cannot become assembly-time variables.
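The extended syntax of the "label" directive mentioned above could be used as in this sketch; the name "saved_arg" and the EBP-based address are assumptions made for the example.

```asm
        label   saved_arg dword at ebp+8 ; name for a register-based address
        mov     eax,[saved_arg]          ; assembled like mov eax,[ebp+8]
```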
		2269
		2270
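		2271	2.2.2 Conditional assembly
		2272
		2273	The "if" directive causes a block of instructions to be assembled only under
		2274	certain condition. It should be followed by logical expression specifying the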
		2275	condition, instructions in next lines will be assembled only when this
		2276	condition is met, otherwise they will be skipped. The optional "else if"
		2277	directive followed with logical expression specifying additional condition
		2278	begins the next block of instructions that will be assembled if previous
		2279	conditions were not met, and the additional condition is met. The optional
		2280	"else" directive begins the block of instructions that will be assembled if
		2281	all the conditions were not met. The "end if" directive ends the last block of
		2282	instructions.
		2283	You should note that "if" directive is processed at assembly stage and
		2284	therefore it doesn't affect any preprocessor directives, like the definitions
		2285	of symbolic constants and macroinstructions - when the assembler recognizes the
		2286	"if" directive, all the preprocessing has been already finished.
		2287	The logical expression consists of logical values and logical operators. The
		2288	logical operators are "~" for logical negation, "&" for logical and, "|" for
		2289	logical or. The negation has the highest priority. Logical value can be a
		2290	numerical expression, it will be false if it is equal to zero, otherwise it
		2291	will be true. Two numerical expressions can be compared using one of the
		2292	following operators to make the logical value: "=" (equal), "<" (less),
		2293	">" (greater), "<=" (less or equal), ">=" (greater or equal),
		2294	"<>" (not equal).
		2295	The "used" operator followed by a symbol name, is the logical value that
		2296	checks whether the given symbol is used somewhere (it returns correct result
		2297	even if symbol is used only after this check). The "defined" operator can be
		2298	followed by any expression, usually just by a single symbol name; it checks
		2299	whether the given expression contains only symbols that are defined in the
		2300	source and accessible from the current position.
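A typical use of the "used" operator is to leave out routines that nothing refers to. In the sketch below "clear_eax" and "debug_mode" are hypothetical names.

```asm
if used clear_eax               ; true when "clear_eax" is referenced anywhere in source
clear_eax:
        xor     eax,eax
        ret
end if

if defined debug_mode           ; true when "debug_mode" is defined and accessible here
        int3
end if
```

If no instruction such as "call clear_eax" appears in the source, the first block generates no code at all.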
		2301	The following simple example uses the "count" constant that should be
		2302	defined somewhere in source:
		2303
		2304	if count>0
		2305	mov cx,count
		2306	rep movsb
		2307	end if
		2308
		2309	These two assembly instructions will be assembled only if the "count" constant
		2310	is greater than 0. The next sample shows more complex conditional structure:
		2311
		2312	if count & ~ count mod 4
		2313	mov cx,count/4
		2314	rep movsd
		2315	else if count>4
		2316	mov cx,count/4
		2317	rep movsd
		2318	mov cx,count mod 4
		2319	rep movsb
		2320	else
		2321	mov cx,count
		2322	rep movsb
		2323	end if
		2324
		2325	The first block of instructions gets assembled when the "count" is non zero and
		2326	divisible by four, if this condition is not met, the second logical expression,
		2327	which follows the "else if", is evaluated and if it's true, the second block
		2328	of instructions gets assembled, otherwise the last block of instructions, which
		2329	follows the line containing only "else", is assembled.
		2330	There are also operators that allow comparison of values being any chains of
		2331	symbols. The "eq" operator checks whether two such values are exactly the same.
		2332	The "in" operator checks whether given value is a member of the list of values
		2333	following this operator, the list should be enclosed between "<" and ">"
		2334	characters, its members should be separated with commas. The symbols are
		2335	considered the same when they have the same meaning for the assembler - for
		2336	example "pword" and "fword" for assembler are the same and thus are not
		2337	distinguished by the above operators. In the same way "16 eq 10h" is the true
		2338	condition, however "16 eq 10+4" is not.
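As a small sketch of these comparisons, the block below tests a symbolic
constant named "target" (a hypothetical name chosen for illustration; symbolic
constants are described in 2.3.2) against a list of registers, and then repeats
the "16 eq 10h" comparison mentioned above:

```asm
; "target" is a hypothetical symbolic constant, defined only for this sketch.
target equ bx

if target in <ax,bx,cx,dx>
        display 'target is one of ax, bx, cx, dx',13,10
end if

if 16 eq 10h
        display '16 and 10h mean the same to the assembler',13,10
end if
```

Both conditions are true here, so both messages are displayed during assembly.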
		2339	The "eqtype" operator checks whether the two compared values have the same
		2340	structure, and whether the structural elements are of the same type. The
		2341	distinguished types include numerical expressions, individual quoted strings,
		2342	floating point numbers, address expressions (the expressions enclosed in square
		2343	brackets or preceded by "ptr" operator), instruction mnemonics, registers, size
		2344	operators, jump type and code type operators. Each of the special
		2345	characters that act as separators, like comma or colon, is a separate type
		2346	of its own. For example, two values, each consisting of a register name
		2347	followed by a comma and a numerical expression, are regarded as being of the
		2348	same type, no matter what kind of register and how complicated a numerical
		2349	expression is used; the exceptions are quoted strings and floating point
		2350	values, which are special kinds of numerical expressions and are treated as
		2351	separate types. Thus "eax,16 eqtype fs,3+7" is true, but "eax,16 eqtype eax,1.6" is false.
		2352
		2353
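		2354	2.2.3 Repeating blocks of instructions
		2355
		2356	The "times" directive repeats one instruction a specified number of times. It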
		2357	should be followed by numerical expression specifying number of repeats and
		2358	the instruction to repeat (optionally colon can be used to separate number and
		2359	instruction). When special symbol "%" is used inside the instruction, it is
		2360	equal to the number of current repeat. For example "times 5 db %" will define
		2361	five bytes with values 1, 2, 3, 4, 5. Recursive use of "times" directive is
		2362	also allowed, so "times 3 times % db %" will define six bytes with values
		2363	1, 1, 2, 1, 2, 3.
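A typical use of "times" is padding the output up to a fixed size. The sketch
below assumes flat binary output holding a 512-byte boot sector; the "start"
label and the placeholder instruction are illustrative only, and "$$" is the
base address of the current addressing space (see 2.2.4):

```asm
; Assumption: flat binary output that forms a single 512-byte sector.
        org 7C00h
start:
        jmp start                ; placeholder for the real code
        times 510-($-$$) db 0    ; fill with zeros up to offset 510
        dw 0AA55h                ; two-byte signature ending the sector
```

The number of repeats is an ordinary numerical expression, so the padding
shrinks automatically when more code is placed before it.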
		2364	"repeat" directive repeats the whole block of instructions. It should be
		2365	followed by numerical expression specifying number of repeats. Instructions
		2366	to repeat are expected in next lines, ended with the "end repeat" directive,
		2367	for example:
		2368
		2369	    repeat 8
		2370	        mov byte [bx],%
		2371	        inc bx
		2372	    end repeat
		2373
		2374	The generated code will store byte values from one to eight in the memory
		2375	addressed by BX register.
		2376	Number of repeats can be zero, in that case the instructions are not
		2377	assembled at all.
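The "%" symbol can also take part in expressions inside the repeated block. As
a brief illustration (not taken from any particular program), this block
defines a table of the first eight powers of two:

```asm
repeat 8
        db 1 shl (%-1)    ; defines bytes 1, 2, 4, 8, 16, 32, 64, 128
end repeat
```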
		2378	The "break" directive stops the repetition early and continues assembly
		2379	from the first line after the "end repeat". Combined with the "if" directive it
		2380	allows stopping the repetition under some special condition, like:
		2381
		2382	    s = x/2
		2383	    repeat 100
		2384	        if x/s = s
		2385	            break
		2386	        end if
		2387	        s = (s+x/s)/2
		2388	    end repeat
		2389
		2390	The "while" directive repeats the block of instructions as long as the
		2391	condition specified by the logical expression following it is true. The block
		2392	of instructions to be repeated should end with the "end while" directive.
		2393	Before each repetition the logical expression is evaluated and when its value
		2394	is false, the assembly is continued starting from the first line after the
		2395	"end while". Also in this case the "%" symbol holds the number of current
		2396	repeat. The "break" directive can be used to stop this kind of loop in the same
		2397	way as with "repeat" directive. The previous sample can be rewritten to use the
		2398	"while" instead of "repeat" this way:
		2399
		2400	    s = x/2
		2401	    while x/s <> s
		2402	        s = (s+x/s)/2
		2403	        if % = 100
		2404	            break
		2405	        end if
		2406	    end while
		2407
		2408	The blocks defined with "if", "repeat" and "while" can be nested in any
		2409	order, but the block started last has to be closed first. The "break"
		2410	directive always stops processing the block that was started last with
		2411	either the "repeat" or "while" directive.
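Because numerical constants defined with "=" can be redefined, a "while" block
can carry out a small computation at assembly time. The following sketch finds
the greatest common divisor of two numbers; the names "a", "b" and "t" and the
input values are arbitrary choices made for this illustration:

```asm
; Euclid's algorithm evaluated entirely by the assembler.
a = 48
b = 18
while b <> 0
        t = a mod b
        a = b
        b = t
end while
        db a              ; defines one byte with value 6
```

No machine instructions are generated by the loop itself; only the final "db"
puts anything into the output.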
		2412
		2413
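		2414	2.2.4 Addressing spaces
		2415
		2416	The "org" directive sets the address at which the following code is expected to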
		2417	appear in memory. It should be followed by numerical expression specifying
		2418	the address. This directive begins the new addressing space, the following
		2419	code itself is not moved in any way, but all the labels defined within it
		2420	and the value of "$" symbol are affected as if it was put at the given
		2421	address. However it's the responsibility of programmer to put the code at
		2422	correct address at run-time.
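As a minimal sketch, assume flat binary output that DOS loads as a .COM
program at offset 100h; the "start" and "message" labels are hypothetical
names used only here:

```asm
        org 100h                ; the code is expected to appear at 100h
start:
        mov ah,9                ; DOS function 9: print string ending with '$'
        mov dx,message          ; address of "message" is calculated from 100h
        int 21h
        int 20h                 ; return to DOS
message db 'Hello!',24h
```

The bytes of the output file are the same wherever it is stored; "org" only
changes the values that labels such as "message" get.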
		2423	The "load" directive defines a constant with a binary value loaded
		2424	from the already assembled code. This directive should be followed by the name
		2425	of the constant, then optionally size operator, then "from" operator and a
		2426	numerical expression specifying a valid address in current addressing space.
		2427	The size operator has unusual meaning in this case - it states how many bytes
		2428	(up to 8) have to be loaded to form the binary value of constant. If no size
		2429	operator is specified, one byte is loaded (thus value is in range from 0 to
		2430	255). The loaded data cannot exceed current offset.
		2431	The "store" directive can modify the already generated code by replacing
		2432	some of the previously generated data with the value defined by the given
		2433	numerical expression, which follows. The expression can be preceded by the
		2434	optional size operator to specify how large a value the expression defines, and
		2435	therefore how many bytes will be stored; if there is no size operator, the
		2436	size of one byte is assumed. Then the "at" operator should follow, together
		2437	with a numerical expression defining a valid address in the current addressing
		2438	space at which the given value has to be stored. This is a directive for
		2439	advanced applications and should be used carefully.
		2440	Both "load" and "store" directives are limited to operate on places in
		2441	current addressing space. The "$$" symbol is always equal to the base address
		2442	of current addressing space, and the "$" symbol is the address of current
		2443	position in that addressing space, therefore these two values define limits
		2444	of the area, where "load" and "store" can operate.
		2445	Combining the "load" and "store" directives makes it possible to do things like
		2446	encoding some of the already generated code. For example, to encode the whole
		2447	code generated in current addressing space you can use such a block of directives:
		2448
		2449	    repeat $-$$
		2450	        load a byte from $$+%-1
		2451	        store byte a xor c at $$+%-1
		2452	    end repeat
		2453
		2454	and each byte of code will be xored with the value defined by "c" constant.
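The "load" directive alone is enough when the generated bytes only have to be
read. The sketch below, given purely as an illustration, sums all bytes
generated so far in the current addressing space and appends one byte that
makes the total equal to zero modulo 256; the names "sum" and "cur" are
hypothetical:

```asm
; Assumption: placed after all the code and data it should cover.
sum = 0
repeat $-$$
        load cur byte from $$+%-1
        sum = (sum+cur) and 0FFh
end repeat
        db (-sum) and 0FFh      ; checksum byte
```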
		2455	"virtual" defines virtual data at specified address. This data won't be
		2456	included in the output file, but labels defined there can be used in other
		2457	parts of source. This directive can be followed by "at" operator and the
		2458	numerical expression specifying the address for virtual data, otherwise it
		2459	uses the current address, the same as "virtual at $". Instructions defining data
		2460	are expected in next lines, ended with "end virtual" directive. The block of
		2461	virtual instructions itself is an independent addressing space, after it's
		2462	ended, the context of previous addressing space is restored.
		2463	The "virtual" directive can be used to create union of some variables, for
		2464	example:
		2465
		2466	    GDTR dp ?
		2467	    virtual at GDTR
		2468	        GDT_limit dw ?
		2469	        GDT_address dd ?
		2470	    end virtual
		2471
		2472
		2473	It can be also used to define labels for some structures addressed by a
		2474	register, for example:
		2475
		2476	    virtual at bx
		2477	        LDT_limit dw ?
		2478	        LDT_address dd ?
		2479	    end virtual
		2480
		2481	With such a definition the instruction "mov ax,[LDT_limit]" will be assembled
		2482	to "mov ax,[bx]".
		2483	Declaring defined data values or instructions inside the virtual block can
		2484	also be useful, because the "load" directive can be used to load the values
		2485	from the virtually generated code into constants. This directive should be
		2486	used after the code it loads but before the virtual block ends, because it can
		2487	only load the values from the same addressing space. For example:
		2488
		2489	    virtual at 0
		2490	        xor eax,eax
		2491	        and edx,eax
		2492	        load zeroq dword from 0
		2493	    end virtual
		2494
		2495	The above piece of code will define the "zeroq" constant containing four bytes
		2496	of the machine code of the instructions defined inside the virtual block.
		2497	This method can be also used to load some binary value from external file.
		2498	For example this code:
		2499
		2500	    virtual at 0
		2501	        file 'a.txt':10h,1
		2502	        load char from 0
		2503	    end virtual
		2504
		2505	loads the single byte from offset 10h in file "a.txt" into the "char"
		2506	constant.
		2507	Any of the "section" directives described in 2.4 also begins a new
		2508	addressing space.
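Another way of using "virtual", shown here only as a sketch, is to place the
block at address zero so that the labels defined inside it become offsets of
the fields of a structure. All the names below are hypothetical:

```asm
virtual at 0
        point.x  dd ?
        point.y  dd ?
        sizeof.point = $        ; 8, the total size of both fields
end virtual

        mov eax,[ebx+point.y]   ; assembled the same as "mov eax,[ebx+4]"
```

Nothing from the virtual block goes into the output file; only the values of
the labels are used.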
		2509
		2510
		2511	2.2.5 Other directives
		2512
		2513	The "align" directive aligns code or data to the specified boundary. It should
		2514	be followed by a numerical expression specifying the number of bytes, to a
		2515	multiple of which the current address has to be aligned. The boundary value
		2516	has to be a power of two.
		2517	The "align" directive fills the bytes that had to be skipped to perform the
		2518	alignment with the "nop" instructions and at the same time marks this area as
		2519	uninitialized data, so if it is placed among other uninitialized data that
		2520	wouldn't take space in the output file, the alignment bytes will act the same
		2521	way. If you need to fill the alignment area with some other values, you can
		2522	combine "align" with "virtual" to get the size of alignment needed and then
		2523	create the alignment yourself, like:
		2524
		2525	    virtual
		2526	        align 16
		2527	        a = $ - $$
		2528	    end virtual
		2529	    db a dup 0
		2530
		2531	The "a" constant is defined to be the difference between the address after
		2532	alignment and the address of the "virtual" block (see previous section), so it
		2533	is equal to the size of the needed alignment space.
		2534	"display" directive displays the message at the assembly time. It should
		2535	be followed by the quoted strings or byte values, separated with commas. It
		2536	can be used to display values of some constants, for example:
		2537
		2538	    bits = 16
		2539	    display 'Current offset is 0x'
		2540	    repeat bits/4
		2541	        d = '0' + $ shr (bits-%*4) and 0Fh
		2542	        if d > '9'
		2543	            d = d + 'A'-'9'-1
		2544	        end if
		2545	        display d
		2546	    end repeat
		2547	    display 13,10
		2548
		2549	This block of directives calculates the four hexadecimal digits of a 16-bit value
		2550	and converts them into characters for displaying. Note that this won't work if
		2551	the addresses in current addressing space are relocatable (as it might happen
		2552	with PE or object output formats), since only absolute values can be used this
		2553	way. The absolute value may be obtained by calculating the relative address,
		2554	like "$-$$", or "rva $" in case of PE format.
		2555
		2556
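		2557	2.2.6 Multiple passes
		2558
		2559	Because the assembler allows some of the labels or constants to be referenced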
		2560	before they get actually defined, it has to predict the values of such labels
		2561	and if there is even a suspicion that prediction failed in at least one case,
		2562	it does one more pass, assembling the whole source, this time doing better
		2563	prediction based on the values the labels got in the previous pass.
		2564	The changing values of labels can cause some instructions to have encodings
		2565	of different length, and this can cause the change in values of labels again.
		2566	And since the labels and constants can also be used inside the expressions that
		2567	affect the behavior of control directives, the whole block of source can be
		2568	processed completely differently during the new pass. Thus the assembler does
		2569	more and more passes, each time trying to do better predictions to approach
		2570	the final solution, when all the values get predicted correctly. It uses
		2571	various methods for predicting the values, which have been chosen to allow
		2572	finding in a few passes the solution of possibly smallest length for most
		2573	programs.
		2574	Some of the errors, like the values not fitting in required boundaries, are
		2575	not signaled during those intermediate passes, since it may happen that when
		2576	some of the values are predicted better, these errors will disappear. However
		2577	if assembler meets some illegal syntax construction or unknown instruction, it
		2578	always stops immediately. Also defining some label more than once causes such
		2579	error, because it makes the predictions groundless.
		2580	Only the messages created with the "display" directive during the last
		2581	performed pass get actually displayed. In case when the assembly has been
		2582	stopped due to an error, these messages may reflect the predicted values that
		2583	are not yet resolved correctly.
		2584	The solution may sometimes not exist and in such cases the assembler will
		2585	never manage to make correct predictions - for this reason there is a limit for
		2586	a number of passes, and when assembler reaches this limit, it stops and
		2587	displays the message that it is not able to generate the correct output.
		2588	Consider the following example:
		2589
		2590	    if ~ defined alpha
		2591	        alpha:
		2592	    end if
		2593
		2594	The "defined" operator gives the true value when the expression following it
		2595	could be calculated in this place, which in this case means that the "alpha"
		2596	label is defined somewhere. But the above block causes this label to be defined
		2597	only when the value given by "defined" operator is false, which leads to an
		2598	antinomy and makes it impossible to resolve such code. When processing the "if"
		2599	directive assembler has to predict whether the "alpha" label will be defined
		2600	somewhere (it wouldn't have to predict only if the label was already defined
		2601	earlier in this pass), and whatever the prediction is, the opposite always
		2602	happens. Thus the assembly will fail, unless the "alpha" label is defined
		2603	somewhere in source preceding the above block of instructions - in such case,
		2604	as it was already noted, the prediction is not needed and the block will just
		2605	get skipped.
		2606	The above sample might have been written as an attempt to define the label only
		2607	when it was not yet defined. It fails, because the "defined" operator does
		2608	check whether the label is defined anywhere, and this includes the definition
		2609	inside this conditionally processed block. However adding some additional
		2610	condition may make it possible to get it resolved:
		2611
		2612	    if ~ defined alpha | defined @f
		2613	        alpha:
		2614	        @@:
		2615	    end if
		2616
		2617	The "@f" is always the same label as the nearest "@@" symbol in the source
		2618	following it, so the above sample would mean the same if any unique name was
		2619	used instead of the anonymous label. When "alpha" is not defined in any other
		2620	place in source, the only possible solution is when this block gets defined,
		2621	and this time this doesn't lead to the antinomy, because of the anonymous
		2622	label which makes this block self-establishing. To better understand this,
		2623	look at the block that has nothing more than this self-establishing:
		2624
		2625	    if defined @f
		2626	        @@:
		2627	    end if
		2628
		2629	This is an example of source that may have more than one solution, as both
		2630	cases when this block gets processed or not are equally correct. Which one of
		2631	those two solutions we get depends on the algorithm of the assembler, in case
		2632	of flat assembler - on the algorithm of predictions. Back to the previous
		2633	sample, when "alpha" is not defined anywhere else, the condition for "if" block
		2634	cannot be false, so we are left with only one possible solution, and we can
		2635	hope the assembler will arrive at it. On the other hand, when "alpha" is
		2636	defined in some other place, we've got two possible solutions again, but one of
		2637	them causes "alpha" to be defined twice, and such an error causes assembler to
		2638	abort the assembly immediately, as this is the kind of error that deeply
		2639	disturbs the process of resolving. So we can get such source either correctly
		2640	resolved or causing an error, and what we get may depend on the internal
		2641	choices made by the assembler.
		2642	However there are some facts about such choices that are certain. When
		2643	assembler has to check whether the given symbol is defined and it was already
		2644	defined in the current pass, no prediction is needed - it was already noted
		2645	above. And when the given symbol has been defined never before, including all
		2646	the already finished passes, the assembler predicts it to be not defined.
		2647	Knowing this, we can expect that the simple self-establishing block shown
		2648	above will not be assembled at all and that the previous sample will resolve
		2649	correctly when "alpha" is defined somewhere before our conditional block,
		2650	while it will itself define "alpha" when it's not already defined earlier, thus
		2651	potentially causing the error because of double definition if the "alpha" is
		2652	also defined somewhere later.
		2653	The "used" operator may be expected to behave in a similar manner in
		2654	analogous cases, however any other kinds of predictions may not be so simple and
		2655	you should never rely on them this way.
		2656
		2657
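		2658	2.3 Preprocessor directives
		2659
		2660	All preprocessor directives are processed before the main assembly process,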
		2661	and therefore are not affected by the control directives. At this time also
		2662	all comments are stripped out.
		2663
		2664
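		2665	2.3.1 Including source files
		2666
		2667	The "include" directive includes the specified source file at the position where
		2668	it is used. It should be followed by the quoted name of the file that should be
		2669	included, for example:
		2670
		2671	    include 'macros.inc'
		2672
		2673	The whole included file is preprocessed before preprocessing the lines next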
		2674	to the line containing the "include" directive. There are no limits to the
		2675	number of included files as long as they fit in memory.
		2676	The quoted path can contain environment variables enclosed within "%"
		2677	characters, they will be replaced with their values inside the path, both the
		2678	"\" and "/" characters are allowed as path separators. If no absolute path
		2679	is given, the file is first searched for in the directory containing file
		2680	which included it and when it's not found there, in the directory containing
		2681	the main source file (the one specified in command line). These rules concern
		2682	also paths given with the "file" directive.
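The path rules above can be illustrated with a few lines. This is only a
sketch: the file names are hypothetical, and "INCDIR" stands for whatever
environment variable happens to hold the directory with include files:

```asm
include 'macros.inc'            ; searched next to the including file first
include '%INCDIR%/defs.inc'     ; "%INCDIR%" is replaced with the variable's value
file 'logo.bin'                 ; "file" follows the same search rules
```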
		2683
		2684
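		2685	2.3.2 Symbolic constants
		2686
		2687	The symbolic constants are different from the numerical constants; before the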
		2688	assembly process they are replaced with their values everywhere in source
		2689	lines after their definitions, and anything can become their values.
		2690	The definition of symbolic constant consists of name of the constant
		2691	followed by the "equ" directive. Everything that follows this directive will
		2692	become the value of constant. If the value of symbolic constant contains
		2693	other symbolic constants, they are replaced with their values before assigning
		2694	this value to the new constant. For example:
		2695
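		2696	    d equ dword
		2697	    NULL equ d 0
		2698	    d equ edx
		2699
		2700	After these three definitions the value of "NULL" constant is "dword 0" and
		2701	the value of "d" is "edx". So, for example, "push NULL" will be assembled as
		2702	"push dword 0" and "push d" will be assembled as "push edx". And if then the
		2703	following line was put:
		2704
		2705	    d equ d,eax
		2706
		2707	the "d" constant would get the new value of "edx,eax". This way the growing
		2708	lists of symbols can be defined.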
		2709	The "restore" directive brings back the previous value of a redefined symbolic
		2710	constant. It should be followed by one or more names of symbolic constants,
		2711	separated with commas. So "restore d" after the above definitions will give
		2712	"d" constant back the value "edx", the second one will restore it to value
		2713	"dword", and one more will revert "d" to original meaning as if no such
		2714	constant was defined. If there was no constant defined of given name,
		2715	"restore" won't cause an error, it will be just ignored.
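The effect of "restore" can be followed step by step in a short, self-contained
sketch (it defines "d" only twice, to keep the illustration small):

```asm
d equ dword
d equ edx
        push d          ; assembled as "push edx"
restore d
        push d 0        ; assembled as "push dword 0"
restore d               ; from here on "d" is not a symbolic constant
```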
		2716	Symbolic constants can be used to adjust the syntax of assembler to personal
		2717	preferences. For example the following set of definitions provides the handy
		2718	shortcuts for all the size operators:
		2719
		2720	    b equ byte
		2721	    w equ word
		2722	    d equ dword
		2723	    p equ pword
		2724	    f equ fword
		2725	    q equ qword
		2726	    t equ tword
		2727	    x equ dqword
		2728
		2729	Because a symbolic constant may also have an empty value, it can be used to
		2730	allow the syntax with "offset" word before any address value:
		2731
		2732	    offset equ
		2733
		2734	After this definition "mov ax,offset char" will be a valid construction for
		2735	copying the offset of "char" variable into "ax" register, because "offset" is
		2736	replaced with an empty value, and therefore ignored.
		2737	The "define" directive followed by the name of constant and then the value,
		2738	is the alternative way of defining symbolic constant. The only difference
		2739	between "define" and "equ" is that "define" assigns the value as it is, it does
		2740	not replace the symbolic constants with their values inside it.
		2741	Symbolic constants can also be defined with the "fix" directive, which has
		2742	the same syntax as "equ", but defines constants of high priority - they are
		2743	replaced with their symbolic values even before processing the preprocessor
		2744	directives and macroinstructions, the only exception is "fix" directive
		2745	itself, which has the highest possible priority, so it allows redefinition of
		2746	constants defined this way.
		2747	The "fix" directive can be used for syntax adjustments related to directives
		2748	of preprocessor, which cannot be done with "equ" directive. For example:
		2749
		2750	    incl fix include
		2751
		2752	defines a short name for "include" directive, while the similar definition done
		2753	with "equ" directive wouldn't give such result, as standard symbolic constants
		2754	are replaced with their values after searching the line for preprocessor
		2755	directives.
		2756
		2757
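		2758	2.3.3 Macroinstructions
		2759
		2760	The "macro" directive allows you to define your own complex instructions, called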
		2761	macroinstructions, using which can greatly simplify the process of
		2762	programming. In its simplest form it's similar to symbolic constant
		2763	definition. For example the following definition defines a shortcut for the
		2764	"test al,0xFF" instruction:
		2765
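		2766	    macro tst {test al,0xFF}
		2767
		2768	After the "macro" directive there is a name of macroinstruction and then its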
		2769	contents enclosed between the "{" and "}" characters. You can use "tst"
		2770	instruction anywhere after this definition and it will be assembled as
		2771	"test al,0xFF". Defining symbolic constant "tst" of that value would give the
		2772	similar result, but the difference is that the name of macroinstruction is
		2773	recognized only as an instruction mnemonic. Also, macroinstructions are
		2774	replaced with corresponding code even before the symbolic constants are
		2775	replaced with their values. So if you define macroinstruction and symbolic
		2776	constant of the same name, and use this name as an instruction mnemonic, it
		2777	will be replaced with the contents of macroinstruction, but it will be
		2778	replaced with the value of symbolic constant if used somewhere inside the
		2779	operands.
		2780	The definition of macroinstruction can consist of many lines, because
		2781	"{" and "}" characters don't have to be in the same line as "macro" directive.
		2782	For example:
		2783
		2784	    macro stos0
		2785	     {
		2786	        xor al,al
		2787	        stosb
		2788	     }
		2789
		2790	The macroinstruction "stos0" will be replaced with these two assembly
		2791	instructions anywhere it's used.
		2792	Like instructions which need some number of operands, the macroinstruction
		2793	can be defined to need some number of arguments separated with commas. The
		2794	names of the needed arguments should follow the name of macroinstruction in the
		2795	line of "macro" directive and should be separated with commas if there is more
		2796	than one. Anywhere one of these names occurs in the contents of
		2797	macroinstruction, it will be replaced with corresponding value, provided when
		2798	the macroinstruction is used. Here is an example of a macroinstruction that
		2799	will do data alignment for binary output format:
		2800
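		2801	    macro align value { rb (value-1)-($+value-1) mod value }
		2802
		2803	When the "align 4" instruction is found after this macroinstruction is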
		2804	defined, it will be replaced with contents of this macroinstruction, and the
		2805	"value" will there become 4, so the result will be "rb (4-1)-($+4-1) mod 4".
		2806	If a macroinstruction is defined that uses an instruction with the same name
		2807	inside its definition, the previous meaning of this name is used. Useful
		2808	redefinition of macroinstructions can be done in that way, for example:
		2809
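		2810	    macro mov op1,op2
		2811	     {
		2812	      if op1 in <ds,es,fs,gs,ss> & op2 in <cs,ds,es,fs,gs,ss>
		2813	        push op2
		2814	        pop op1
		2815	      else
		2816	        mov op1,op2
		2817	      end if
		2818	     }
		2819
		2820	This macroinstruction extends the syntax of "mov" instruction, allowing both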
		2821	operands to be segment registers. For example "mov ds,es" will be assembled as
		2822	"push es" and "pop ds". In all other cases the standard "mov" instruction will
		2823	be used. The syntax of this "mov" can be extended further by defining next
		2824	macroinstruction of that name, which will use the previous macroinstruction:
		2825
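		2826	    macro mov op1,op2,op3
		2827	     {
		2828	      if op3 eq
		2829	        mov op1,op2
		2830	      else
		2831	        mov op1,op2
		2832	        mov op2,op3
		2833	      end if
		2834	     }
		2835
		2836	It allows "mov" instruction to have three operands, but it can still have two
		2837	operands only, because when macroinstruction is given fewer arguments than it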
		2838	needs, the rest of arguments will have empty values. When three operands are
		2839	given, this macroinstruction will become two macroinstructions of the previous
		2840	definition, so "mov es,ds,dx" will be assembled as "push ds", "pop es" and
		2841	"mov ds,dx".
		2842	By placing the "*" after the name of argument you can mark the argument as
		2843	required - preprocessor won't allow it to have an empty value. For example the
		2844	above macroinstruction could be declared as "macro mov op1*,op2*,op3" to make
		2845	sure that first two arguments will always have to be given some non-empty
		2846	values.
		2847	When it's needed to provide macroinstruction with argument that contains
		2848	some commas, such argument should be enclosed between "<" and ">" characters.
		2849	If it contains more than one "<" character, the same number of ">" should be
		2850	used to tell that the value of argument ends.
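A minimal sketch of passing a value with commas as one argument follows; the
"defbytes" macroinstruction and its "list" argument are hypothetical names
introduced only for this illustration:

```asm
; "defbytes" simply hands its single argument over to "db".
macro defbytes list { db list }

        defbytes <1,2,3>        ; assembled as "db 1,2,3"
```

The "<" and ">" characters are not part of the value; they only mark where the
argument begins and ends.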
		2851	"purge" directive allows removing the last definition of specified
		2852	macroinstruction. It should be followed by one or more names of
		2853	macroinstructions, separated with commas. If such macroinstruction has not
		2854	been defined, you won't get any error. For example after having the syntax of
		2855	"mov" extended with the macroinstructions defined above, you can disable
		2856	syntax with three operands back by using "purge mov" directive. Next
		2857	"purge mov" will disable also syntax for two operands being segment registers,
		2858	and all the next such directives will do nothing.
		2859	If after the "macro" directive you enclose some group of arguments' names in
		2860	square brackets, it will allow giving more values for this group of arguments
		2861	when using that macroinstruction. Any more argument given after the last
		2862	argument of such group will begin the new group and will become the first
		2863	argument of it. That's why after closing the square bracket no more argument
		2864	names can follow. The contents of macroinstruction will be processed for each
		2865	such group of arguments separately. The simplest example is to enclose one
		2866	argument name in square brackets:
		2867
		2868	    macro stoschar [char]
		2869	     {
		2870	        mov al,char
		2871	        stosb
		2872	     }
		2873
		2874	This macroinstruction accepts unlimited number of arguments, and each one
		2875	will be processed into these two instructions separately. For example
		2876	"stoschar 1,2,3" will be assembled as the following instructions:
		2877
		2878	    mov al,1
		2879	    stosb
		2880	    mov al,2
		2881	    stosb
		2882	    mov al,3
		2883	    stosb
		2884
		2885	There are some special directives available only inside the definitions of
		2886	macroinstructions. "local" directive defines local names, which will be
		2887	replaced with unique values each time the macroinstruction is used. It should
		2888	be followed by names separated with commas. If the name given as parameter to
		2889	"local" directive begins with a dot or two dots, the unique labels generated
		2890	by each evaluation of macroinstruction will have the same properties.
		2891	This directive is usually needed for the constants or labels that
		2892	macroinstruction defines and uses internally. For example:
		2893
		2894	    macro movstr
		2895	     {
		2896	        local move
		2897	      move:
		2898	        lodsb
		2899	        stosb
		2900	        test al,al
		2901	        jnz move
		2902	     }
		2903
		2904	Each time this macroinstruction is used, "move" will become a different unique name
		2905	in its instructions, so you won't get an error you normally get when some
		2906	label is defined more than once.
		2907	"forward", "reverse" and "common" directives divide macroinstruction into
		2908	blocks, each one processed after the processing of previous is finished. They
		2909	differ in behavior only if macroinstruction allows multiple groups of
		2910	arguments. Block of instructions that follows "forward" directive is processed
		2911	for each group of arguments, from first to last - exactly like the default
		2912	block (not preceded by any of these directives). Block that follows "reverse"
		2913	directive is processed for each group of arguments in reverse order - from last
		2914	to first. Block that follows "common" directive is processed only once,
		2915	commonly for all groups of arguments. Local name defined in one of the blocks
		2916	is available in all the following blocks when processing the same group of
		2917	arguments as when it was defined, and when it is defined in common block it is
		2918	available in all the following blocks not depending on which group of
		2919	arguments is processed.
		2920	Here is an example of macroinstruction that will create the table of
		2921	addresses to strings followed by these strings:
		2922
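		2923	    macro strtbl name,[string]
		2924	     {
		2925	      common
		2926	        label name dword
		2927	      forward
		2928	        local label
		2929	        dd label
		2930	      forward
		2931	        label db string,0
		2932	     }
		2933
		2934	First argument given to this macroinstruction will become the label for table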
		2935	of addresses, next arguments should be the strings. First block is processed
		2936	only once and defines the label, second block for each string declares its
		2937	local name and defines the table entry holding the address to that string.
		2938	Third block defines the data of each string with the corresponding label.
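The difference between "forward" and "reverse" is easy to see on a pair of
macroinstructions that save and restore registers. This is only a sketch, and
the names "saveregs" and "restoreregs" are hypothetical:

```asm
macro saveregs [reg]
 {
  forward
        push reg
 }

macro restoreregs [reg]
 {
  reverse
        pop reg
 }

        saveregs eax,ebx,ecx    ; push eax / push ebx / push ecx
        restoreregs eax,ebx,ecx ; pop ecx / pop ebx / pop eax
```

Giving both macroinstructions the same list of registers keeps the stack
balanced, because the second one walks through the list from the end.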
		2939	The directive starting the block in macroinstruction can be followed by the
		2940	first instruction of this block in the same line, like in the following
		2941	example:
		2942
		2943	    macro stdcall proc,[arg]
		2944	     {
		2945	      reverse push arg
		2946	      common call proc
		2947	     }
		2948
		2949	This macro can be used for calling the procedures using STDCALL convention,
		2950	arguments are pushed on stack in the reverse order. For example
		2951	"stdcall foo,1,2,3" will be assembled as:
		2952
		2953	    push 3
		2954	    push 2
		2955	    push 1
		2956	    call foo
		2957
		2958	If some name inside macroinstruction has multiple values (it is either one
		2959	of the arguments enclosed in square brackets or local name defined in the
		2960	block following "forward" or "reverse" directive) and is used in block
		2961	following the "common" directive, it will be replaced with all of its values,
		2962	separated with commas. For example the following macroinstruction will pass
		2963	all of the additional arguments to the previously defined "stdcall"
		2964	macroinstruction:
		2965
		2966	    macro invoke proc,[arg]
		2967	     { common stdcall [proc],arg }
		2968
		2969	It can be used to call indirectly (by the pointer stored in memory) the
		2970	procedure using STDCALL convention.
		2971	Inside a macroinstruction the special operator "#" can also be used. This
		2972	operator causes two names to be concatenated into one name. It can be useful,
		2973	because it's done after the arguments and local names are replaced with their
		2974	values. The following macroinstruction will generate the conditional jump
		2975	according to the "cond" argument:
		2976
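		2977	    macro jif op1,cond,op2,label
		2978	     {
		2979	        cmp op1,op2
		2980	        j#cond label
		2981	     }
		2982
		2983	For example "jif ax,ae,10h,exit" will be assembled as "cmp ax,10h" and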
		2984	"jae exit" instructions.
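The same operator can build names of labels. In the sketch below the
"defvar" macroinstruction and the "var_" prefix are hypothetical and chosen
only to show the concatenation:

```asm
macro defvar name,value
 {
   var_#name dd value
 }

        defvar count,10         ; assembled as "var_count dd 10"
```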
		2985	The "#" operator can be also used to concatenate two quoted strings into one.
		2986	Also conversion of name into a quoted string is possible, with the "`" operator,
		2987	which likewise can be used inside the macroinstruction. It converts the name
		2988	that follows it into a quoted string - but note, that when it is followed by
		2989	a macro argument which is being replaced with value containing more than one
		2990	symbol, only the first of them will be converted, as the "`" operator converts
		2991	only one symbol that immediately follows it. Here's an example of utilizing
		2992	those two features:
		2993
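		2994	    macro label name
		2995	     {
		2996	        label name
		2997	        if ~ used name
		2998	          display `name # " is defined but not used.",13,10
		2999	        end if
		3000	     }
		3001
		3002	When a label defined with such macro is not used in the source, the macro will warn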
		3003	you with the message, informing to which label it applies.
		3004	To make a macroinstruction behave differently when some of the arguments are
		3005	of some special type, for example quoted strings, you can use the "eqtype"
		3006	comparison operator. Here's an example of utilizing it to distinguish a
		3007	quoted string from any other argument:
		3008
		3009	    macro message arg
		3010	     {
		3011	      if arg eqtype ""
		3012	        local str
		3013	        jmp @f
		3014	        str db arg,0Dh,0Ah,24h
		3015	        @@:
		3016	        mov dx,str
		3017	      else
		3018	        mov dx,arg
		3019	      end if
		3020	      mov ah,9
		3021	      int 21h
		3022	     }
		3023
		3024	The above macro is designed for displaying messages in DOS programs. When the
		3025	argument of this macro is some number, label, or variable, the string from
		3026	that address is displayed, but when the argument is a quoted string, the
		3027	created code will display that string followed by the carriage return and
		3028	line feed.
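   As a usage sketch, assuming the macroinstruction above is defined under
the name "message" earlier in the source, a small DOS program in the flat
binary (.COM) form could pass both kinds of argument. The "greeting" label and
the texts are hypothetical, chosen only for illustration:

    org 100h

    message "Loading..."            ; quoted string, data is emitted in place
    message greeting                ; label, only "mov dx,greeting" is emitted
    int 20h                         ; return to DOS

    greeting db 'Hello!',0Dh,0Ah,24h
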
   It is also possible to put a declaration of a macroinstruction inside
another macroinstruction, so one macro can define another, but there is a
problem with such definitions, caused by the fact that the "}" character
cannot occur inside the macroinstruction, as it always means the end of the
definition. To overcome this problem, the escaping of symbols inside the
macroinstruction can be used. This is done by placing one or more backslashes
in front of any other symbol (even a special character). The preprocessor sees
such a sequence as a single symbol, but each time it meets such a symbol
during the macroinstruction processing, it cuts one backslash character from
the front of it. For example "\{" is treated as a single symbol, but during
processing of the macroinstruction it becomes the "{" symbol. This allows
putting one definition of a macroinstruction inside another:

    macro ext instr
     {
      macro instr op1,op2,op3
       \{
        if op3 eq
          instr op1,op2
        else
          instr op1,op2
          instr op2,op3
        end if
       \}
     }

    ext add
    ext sub

The macro "ext" is defined correctly, but when it is used, the "\{" and "\}"
become the "{" and "}" symbols. So when "ext add" is processed, the contents
of the macro become a valid definition of a macroinstruction and this way the
"add" macro becomes defined. In the same way "ext sub" defines the "sub"
macro. The use of the "\{" symbol wasn't really necessary here, but it is done
this way to make the definition clearer.
   If some directives specific to macroinstructions, like "local" or "common",
are needed inside a macro embedded this way, they can be escaped in the same
way. Escaping a symbol with more than one backslash is also allowed, which
allows multiple levels of nesting of macroinstruction definitions.
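   A minimal sketch of an escaped directive follows; the names "defskip",
"skip_if" and "done" are hypothetical. The inner "local" is escaped, so it is
processed only when the generated macroinstruction is used:

    macro defskip name
     {
       macro name value
        \{
          \local done
          cmp al,value
          je done
          inc al
          done:
        \}
     }

    defskip skip_if                 ; defines the "skip_if" macroinstruction
    skip_if 1
    skip_if 2

Each use of "skip_if" should then get its own unique "done" label, so the
two invocations do not conflict with each other.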
   Another technique for defining one macroinstruction by another is to use
the "fix" directive, which becomes useful when some macroinstruction only
begins the definition of another one, without closing it. For example:

    macro tmacro [params]
     {
       common macro params {
     }

    MACRO fix tmacro
    ENDM fix }

defines an alternative syntax for defining macroinstructions, which looks like:

    MACRO stoschar char
        mov al,char
        stosb
    ENDM

Note that a symbol that has such a customized definition must be defined with
the "fix" directive, because only the prioritized symbolic constants are
processed before the preprocessor looks for the "}" character while defining
the macro. This might be a problem if one needed to perform some additional
tasks at the end of such a definition, but there is one more feature which
helps in such cases. Namely, it is possible to put any directive, instruction
or macroinstruction just after the "}" character that ends the
macroinstruction and it will be processed in the same way as if it was put in
the next line.
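   As a small illustration of that last feature (the macro name "zero" is
hypothetical), the definition and the first use of a macroinstruction can
share one line:

    macro zero reg { xor reg,reg } zero eax

which should be processed as if "zero eax" was put in the next line.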


2.3.4 Structures

"struc" directive is a special variant of the "macro" directive that is used
to define data structures. A macroinstruction defined using the "struc"
directive must be preceded by a label (like the data definition directive)
when it's used. This label will also be attached at the beginning of every
name starting with a dot in the contents of the macroinstruction. A
macroinstruction defined using the "struc" directive can have the same name as
some other macroinstruction defined using the "macro" directive: the structure
macroinstruction won't prevent the standard macroinstruction from being
processed when there is no label before it, and vice versa. All the rules and
features concerning standard macroinstructions apply to structure
macroinstructions. Here is a sample of a structure macroinstruction:

    struc point x,y
     {
        .x dw x
        .y dw y
     }

For example "my point 7,11" will define a structure labeled "my", consisting
of two variables: "my.x" with value 7 and "my.y" with value 11.
   If somewhere inside the definition of a structure a name consisting of a
single dot is found, it is replaced by the name of the label for the given
instance of the structure, and this label will not be defined automatically in
such case, allowing the definition to be completely customized. The following
example utilizes this feature to extend the data definition directive "db"
with the ability to calculate the size of the defined data:

    struc db [data]
     {
       common
        . db data
        .size = $ - .
     }

With such definition "msg db 'Hello!',13,10" will also define the "msg.size"
constant, equal to the size of the defined data in bytes.
   Defining data structures addressed by registers or absolute values should
be done using the "virtual" directive with a structure macroinstruction
(see 2.2.4).
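   A minimal sketch of that approach, assuming a structure of two words like
the "point" one discussed earlier (the "pt" label is hypothetical):

    struc point x,y
     {
        .x dw x
        .y dw y
     }

    virtual at bx
      pt point ?,?                  ; no data is emitted inside "virtual"
    end virtual

    mov ax,[pt.x]                   ; same as "mov ax,[bx]"
    mov dx,[pt.y]                   ; same as "mov dx,[bx+2]"
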
   "restruc" directive removes the last definition of the structure, just like
"purge" does with macroinstructions and "restore" with symbolic constants.
It also has the same syntax - it should be followed by one or more names of
structure macroinstructions, separated with commas.


2.3.5 Repeating macroinstructions

The "rept" directive is a special kind of macroinstruction, which makes a
given number of duplicates of the block enclosed with braces. The basic syntax
is the "rept" directive followed by a number (it cannot be an expression,
since the preprocessor doesn't do calculations; if you need repetitions based
on values calculated by the assembler, use one of the code repeating
directives that are processed by the assembler, see 2.2.3), and then a block
of source enclosed between the "{" and "}" characters. The simplest example:

    rept 5 { in al,dx }

will make five duplicates of the "in al,dx" line. The block of instructions
is defined in the same way as for the standard macroinstruction and any
special operators and directives which can be used only inside
macroinstructions are also allowed here. When the given count is zero, the
block is simply skipped, as if you defined a macroinstruction but never used
it. The number of repetitions can be followed by the name of a counter symbol,
which will get replaced symbolically with the number of the duplicate
currently generated. So this:

    rept 3 counter
     {
        byte#counter db counter
     }

will generate lines:

    byte1 db 1
    byte2 db 2
    byte3 db 3

The repetition mechanism applied to "rept" blocks is the same as the one used
to process multiple groups of arguments for macroinstructions, so directives
like "forward", "common" and "reverse" can be used in their usual meaning.
Thus such macroinstruction:

    rept 7 num { reverse display `num }

will display digits from 7 to 1 as text. The "local" directive behaves in the
same way as inside a macroinstruction with multiple groups of arguments, so:

    rept 21
     {
       local label
       label: loop label
     }

will generate a unique label for each duplicate.
   The counter symbol by default counts from 1, but you can declare a
different base value by placing the number preceded by a colon immediately
after the name of the counter. For example:

    rept 8 n:0 { pxor xmm#n,xmm#n }

will generate code which will clear the contents of eight SSE registers.
You can define multiple counters separated with commas, and each one can have
a different base.
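   For instance, a sketch with two counters, one counting from 0 and the
other from 10 (the names "i" and "j" are arbitrary):

    rept 3 i:0,j:10
     {
        db i,j
     }

should generate lines:

    db 0,10
    db 1,11
    db 2,12
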
   The "irp" directive iterates a single argument through the given list of
parameters. The syntax is "irp" followed by the argument name, then a comma
and then the list of parameters. The parameters are specified in the same
way as in the invocation of a standard macroinstruction, so they have to be
separated with commas and each one can be enclosed with the "<" and ">"
characters. Also the name of the argument may be followed by "*" to mark that
it cannot get an empty value. Such block:

    irp value, 2,3,5
     { db value }

will generate lines:

    db 2
    db 3
    db 5
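
   When a single parameter has to contain commas itself, it can be enclosed
with the "<" and ">" characters, as in this sketch (the argument name "pair"
is arbitrary):

    irp pair, <1,2>,<3,4>
     { dw pair }

which should generate lines:

    dw 1,2
    dw 3,4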

The "irps" directive iterates through the given list of symbols; it should
be followed by the argument name, then a comma and then a sequence of any
symbols. Each symbol in this sequence, no matter whether it is a name
symbol, symbol character or quoted string, becomes an argument value for one
iteration. If there are no symbols following the comma, no iteration is done
at all. This example:

    irps reg, al bx ecx
     { xor reg,reg }

will generate lines:

    xor al,al
    xor bx,bx
    xor ecx,ecx

The blocks defined by the "irp" and "irps" directives are also processed in
the same way as any macroinstructions, so operators and directives specific
to macroinstructions may be freely used also in this case.
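   For example, the "forward" and "reverse" directives can be combined in one
block to save and restore a set of registers. This is only a sketch and the
choice of registers is arbitrary:

    irps reg, eax ebx ecx
     {
       forward push reg
       reverse pop reg
     }

which should generate three "push" lines in the given order, followed by
three "pop" lines in the reversed order.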


2.3.6 Conditional preprocessing

"match" directive causes some block of source to be preprocessed and passed
to the assembler only when the given sequence of symbols matches the specified
pattern. The pattern comes first, ended with a comma, then the symbols that
have to be matched with the pattern, and finally the block of source, enclosed
within braces as a macroinstruction.
   There are a few rules for building the expression for matching. The first
is that any symbol character and any quoted string should be matched exactly
as is. In this example:

    match +,+ { include 'first.inc' }
    match +,- { include 'second.inc' }

the first file will get included, since "+" after the comma matches the "+" in
the pattern, and the second file won't be included, since there is no match.
   To match any other symbol literally, it has to be preceded by the "="
character in the pattern. Also to match the "=" character itself, or the
comma, the "==" and "=," constructions have to be used. For example the "=a=="
pattern will match the "a=" sequence.
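   A short sketch of these constructions (the displayed text and the names
"x" and "y" are arbitrary):

    match =a==, a= { display 'pattern matched',13,10 }
    match x=,y, 1,2 { db x,y }

In the first line the pattern consists of a literal "a" followed by a literal
"=", so the block is processed. In the second one the "=," stands for a comma
that has to occur in the matched sequence, so "x" should get the value "1",
"y" the value "2", and the "db 1,2" line should be generated.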
   If some name symbol is placed in the pattern, it matches any sequence
consisting of at least one symbol and then this name is replaced with the
matched sequence everywhere inside the following block, analogously to the
parameters of a macroinstruction. For instance:

    match a-b, 0-7
     { dw a,b-a }

will generate the "dw 0,7-0" instruction. Each name is always matched with
as few symbols as possible, leaving the rest for the following ones, so in
this case:

    match a b, 1+2+3 { db a }

the "a" name will match the "1" symbol, leaving the "+2+3" sequence to be
matched with "b". But in this case:

    match a b, 1 { db a }

there will be nothing left for "b" to match, so the block won't get processed
at all.
   The block of source defined by match is processed in the same way as any
macroinstruction, so any operators specific to macroinstructions can be used
also in this case.
   What makes the "match" directive more useful is the fact that it replaces
the symbolic constants with their values in the matched sequence of symbols
(that is everywhere after the comma up to the beginning of the source block)
before performing the match. Thanks to this it can be used for example to
process some block of source under the condition that some symbolic constant
has the given value, like:

    match =TRUE, DEBUG { include 'debug.inc' }

which will include the file only when the symbolic constant "DEBUG" was
defined with value "TRUE".


2.3.7 Order of processing

When combining various features of the preprocessor, it's important to know
the order in which they are processed. As it was already noted, the "fix"
directive and the replacements defined with it have the highest priority. This
is done completely before doing any other preprocessing, therefore this
piece of source:

    V fix {
      macro empty
       V
    V fix }
       V

becomes a valid definition of an empty macroinstruction. It can be interpreted
that the "fix" directive and prioritized symbolic constants are processed in
a separate stage, and all other preprocessing is done afterwards on the
resulting source.
   The standard preprocessing that comes after begins, on each line, with
recognition of the first symbol. It starts with checking for the preprocessor
directives, and when none of them is detected, the preprocessor checks whether
the first symbol is a macroinstruction. If no macroinstruction is found, it
moves to the second symbol of the line, and again begins with checking for
directives, which in this case is only the "equ" directive, as this is the
only one that occurs as the second symbol in a line. If there's no directive,
the second symbol is checked for the case of a structure macroinstruction, and
when none of those checks gives a positive result, the symbolic constants are
replaced with their values and such line is passed to the assembler.
   To see it on an example, assume that there is defined a macroinstruction
called "foo" and a structure macroinstruction called "bar". Those lines:

    foo equ
    foo bar

would then both be interpreted as invocations of the macroinstruction "foo",
since the meaning of the first symbol overrides the meaning of the second one.
   The macroinstructions generate the new lines from their definition blocks,
replacing the parameters with their values and then processing the "#" and "`"
operators. The conversion operator has higher priority than concatenation.
After this is completed, the newly generated line goes through the standard
preprocessing, as described above.
   Though the symbolic constants are usually only replaced in the lines where
no preprocessor directives nor macroinstructions have been found, there are
some special cases where those replacements are performed in the parts of
lines containing directives. The first one is the definition of a symbolic
constant, where the replacements are done everywhere after the "equ" keyword
and the resulting value is then assigned to the new constant (see 2.3.2). The
second such case is the "match" directive, where the replacements are done in
the symbols following the comma before matching them with the pattern. These
features can be used for example to maintain lists, like this set of
definitions:

    list equ

    macro append item
     {
       match any, list \{ list equ list,item \}
       match , list \{ list equ item \}
     }

The "list" constant is here initialized with an empty value, and the "append"
macroinstruction can be used to add new items into this list, separating
them with commas. The first match in this macroinstruction occurs only when
the value of list is not empty (see 2.3.6); in such case the new value for the
list is the previous one with the comma and the new item appended at the end.
The second match happens only when the list is still empty, and in such case
the list is defined to contain just the new item. So starting with the empty
list, the "append 1" would define "list equ 1" and the "append 2" following it
would define "list equ 1,2". One might then need to use this list as the
parameters to some macroinstruction. But it cannot be done directly - if "foo"
is the macroinstruction, then "foo list" would just pass the "list" symbol
as a parameter to the macro, since symbolic constants are not unrolled at this
stage. For this purpose again the "match" directive comes in handy:

    match params, list { foo params }

The value of "list", if it's not empty, matches the "params" keyword, which is
then replaced with the matched value when generating the new lines defined by
the block enclosed with braces. So if the "list" had value "1,2", the above
line would generate the line containing "foo 1,2", which would then go through
the standard preprocessing.
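   Putting these pieces together, a complete sketch could look as follows. The
list is used here as operands of the "db" directive rather than as parameters
of a macroinstruction, only to keep the example self-contained:

    list equ

    macro append item
     {
       match any, list \{ list equ list,item \}
       match , list \{ list equ item \}
     }

    append 1
    append 2
    append 3

    match params, list { db params }    ; should generate "db 1,2,3"
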
   There is one more special case - when the preprocessor goes to checking the
second symbol in the line and it happens to be the colon character (which is
then interpreted by the assembler as a definition of a label), it stops in
this place and finishes the preprocessing of the first symbol (so if it's a
symbolic constant it gets unrolled), and if it still appears to be a label,
it performs the standard preprocessing starting from the place after the
label. This allows placing preprocessor directives and macroinstructions
after the labels, analogously to the instructions and directives processed
by the assembler, like:

    start: include 'start.inc'

However if the label becomes broken during preprocessing (for example when
it is a symbolic constant with an empty value), only replacing of the symbolic
constants is continued for the rest of the line.
   It should be remembered that the jobs performed by the preprocessor are
preliminary operations on the text symbols, which are done in a simple
single pass before the main process of assembly. The text that is the
result of preprocessing is passed to the assembler, and it then does its
multiple passes on it. Thus the control directives, which are recognized and
processed only by the assembler - as they are dependent on the numerical
values that may even vary between passes - are not recognized in any way by
the preprocessor and have no effect on the preprocessing. Consider this
example source:

    if 0
    a = 1
    b equ 2
    end if
    dd b

When it is preprocessed, the only directive that is recognized by the
preprocessor is the "equ", which defines the symbolic constant "b", so later
in the source the "b" symbol is replaced with the value "2". Except for this
replacement, the other lines are passed unchanged to the assembler. So
after preprocessing the above source becomes:

    if 0
    a = 1
    end if
    dd 2

Now when the assembler processes it, the condition for the "if" is false, and
the "a" constant doesn't get defined. However the symbolic constant "b" was
processed normally, even though its definition was put just next to the one
of "a". So because of the possible confusion you should be very careful
every time when mixing the features of the preprocessor and the assembler -
always try to imagine what your source will become after the preprocessing,
and thus what the assembler will see and do its multiple passes on.
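   When the definition of a symbolic constant really has to be conditional,
the condition must be evaluated by the preprocessor itself, for example with
the "match" directive (see 2.3.6). A sketch, in which the "FEATURE" constant
is hypothetical:

    FEATURE equ FALSE

    b equ 1
    match =TRUE, FEATURE { b equ 2 }

    dd b                            ; becomes "dd 1"

With "FEATURE equ TRUE" in the first line, the last line would become "dd 2"
instead.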


2.4 Formatter directives

These directives are actually also a kind of control directives, with the
purpose of controlling the format of generated code.
   "format" directive followed by the format identifier allows selecting the
output format. This directive should be put at the beginning of the source.
The default output format is a flat binary file; it can also be selected by
using the "format binary" directive.
   "use16" and "use32" directives force the assembler to generate 16-bit or
32-bit code, overriding the default setting for the selected output format.
"use64" enables generating the code for the long mode of x86-64 processors.
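   For example, a flat binary image mixing both code sizes could be sketched
like this (the instructions are arbitrary):

    format binary

    use16
        mov ax,1                    ; encoded without operand size prefix
    use32
        mov eax,1                   ; also encoded without the prefix
        mov ax,1                    ; here the 66h prefix is generated
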
   Below, the different output formats are described along with the
directives specific to these formats.


2.4.1 MZ executable

To select the MZ output format, use the "format MZ" directive. The default
code setting for this format is 16-bit.
   "segment" directive defines a new segment; it should be followed by a
label, whose value will be the number of the defined segment. Optionally the
"use16" or "use32" word can follow to specify whether code in this segment
should be 16-bit or 32-bit. The origin of the segment is aligned to a
paragraph (16 bytes). All the labels defined then will have values relative to
the beginning of this segment.
   "entry" directive sets the entry point for the MZ executable; it should be
followed by the far address (name of segment, colon and the offset inside
the segment) of the desired entry point.
   "stack" directive sets up the stack for the MZ executable. It can be
followed by a numerical expression specifying the size of the stack to be
created automatically, or by the far address of the initial stack frame when
you want to set up the stack manually. When no stack is defined, a stack of
the default size of 4096 bytes will be created.
   "heap" directive should be followed by a 16-bit value defining the maximum
size of the additional heap in paragraphs (this is the heap in addition to the
stack and undefined data). Use "heap 0" to always allocate only the memory the
program really needs. The default size of the heap is 65535.
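   The directives above can be combined as in the following sketch of a small
multi-segment DOS program. The segment and label names are arbitrary, and the
DOS functions 09h and 4Ch of interrupt 21h are used for output and exit:

    format MZ
    entry main:start                ; far address of the entry point
    stack 100h                      ; 256 bytes of automatically created stack
    heap 0                          ; allocate no additional heap

    segment main
    start:
        mov ax,text                 ; segment label gives the segment number
        mov ds,ax
        mov dx,hello
        mov ah,9
        int 21h
        mov ax,4C00h
        int 21h

    segment text
    hello db 'Hello world!',24h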


2.4.2 Portable Executable

To select the Portable Executable output format, use the "format PE"
directive. It can be followed by additional format settings: the "console",
"GUI" or "native" operator selects the target subsystem (a floating point
value specifying the subsystem version can follow), and "DLL" marks the output
file as a dynamic link library. Then can follow the "at" operator and a
numerical expression specifying the base of the PE image, and then optionally
the "on" operator followed by a quoted string containing a file name, which
selects a custom MZ stub for the PE program (when the specified file is not an
MZ executable, it is treated as a flat binary executable file and converted
into MZ format). The default code setting for this format is 32-bit. An
example of a fully featured PE format declaration:

    format PE GUI 4.0 DLL at 7000000h on 'stub.exe'

To create a PE file for the x86-64 architecture, use the "PE64" keyword
instead of "PE" in the format declaration; in such case the long mode code is
generated by default.
   "section" directive defines a new section; it should be followed by a
quoted string defining the name of the section, then one or more section flags
can follow. Available flags are: "code", "data", "readable", "writeable",
"executable", "shareable", "discardable", "notpageable". The origin of the
section is aligned to a page (4096 bytes). Example declaration of a PE
section:

    section '.text' code readable executable

Along with the flags, one of the special PE data identifiers can also be
specified to mark the whole section as special data; the possible identifiers
are "export", "import", "resource" and "fixups". If the section is marked to
contain fixups, they are generated automatically and no more data needs to be
defined in this section. Resource data can also be generated automatically
from a resource file, which is achieved by writing the "from" operator and
a quoted file name after the "resource" identifier. Below are examples of
sections containing some special PE data:

    section '.reloc' data discardable fixups
    section '.rsrc' data readable resource from 'my.res'

"entry" directive sets the entry point for the Portable Executable; the value
of the entry point should follow.
   "stack" directive sets up the size of the stack for the Portable
Executable; the value of the stack reserve size should follow, optionally
followed by the value of the stack commit, separated with a comma. When the
stack is not defined, it's set by default to the size of 4096 bytes.
   "heap" directive chooses the size of the heap for the Portable Executable;
the value of the heap reserve size should follow, optionally followed by the
value of the heap commit, separated with a comma. When no heap is defined, it
is set by default to the size of 65536 bytes; when the size of the heap commit
is unspecified, it is by default set to zero.
   "data" directive begins the definition of special PE data; it should be
followed by one of the data identifiers ("export", "import", "resource" or
"fixups") or by the number of the data entry in the PE header. The data should
be defined in the next lines, ended with the "end data" directive. When the
fixups data definition is chosen, they are generated automatically and no more
data needs to be defined there. The same applies to the resource data when the
"resource" identifier is followed by the "from" operator and a quoted file
name - in such case the data is taken from the given resource file.
   The "rva" operator can be used inside numerical expressions to obtain
the RVA of the item addressed by the value it is applied to.
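   As a sketch of how these directives fit together, the following source
should produce a minimal console program that only calls the "ExitProcess"
function of KERNEL32.DLL. The label names are arbitrary and the import table
is built by hand, with the "rva" operator supplying the addresses it needs:

    format PE console
    entry start

    section '.text' code readable executable
    start:
        push 0                      ; exit code
        call [ExitProcess]

    section '.idata' import data readable writeable
        dd 0,0,0,rva kernel_name,rva kernel_table
        dd 0,0,0,0,0                ; terminating entry of the directory
    kernel_table:
        ExitProcess dd rva _ExitProcess
        dd 0
    kernel_name db 'KERNEL32.DLL',0
    _ExitProcess dw 0               ; hint, followed by the function name
        db 'ExitProcess',0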


2.4.3 Common Object File Format

To select the Common Object File Format, use the "format COFF" or
"format MS COFF" directive, depending on whether you want to create a classic
or Microsoft's COFF file. The default code setting for this format is 32-bit.
To create a file in Microsoft's COFF format for the x86-64 architecture, use
the "format MS64 COFF" setting; in such case long mode code is generated by
default.
   "section" directive defines a new section; it should be followed by a
quoted string defining the name of the section, then one or more section flags
can follow. Section flags available for both COFF variants are "code" and
"data", while "readable", "writeable", "executable", "shareable",
"discardable", "notpageable", "linkremove" and "linkinfo" are flags available
only with the Microsoft COFF variant.
   By default a section is aligned to a double word (four bytes); in the case
of the Microsoft COFF variant another alignment can be specified by providing
the "align" operator followed by the alignment value (any power of two up to
8192) among the section flags.
   "extrn" directive defines an external symbol; it should be followed by the
name of the symbol and optionally the size operator specifying the size of the
data labeled by this symbol. The name of the symbol can also be preceded by a
quoted string containing the name of the external symbol and the "as"
operator. Some example declarations of external symbols:

    extrn exit
    extrn '__imp__MessageBoxA@16' as MessageBox:dword

"public" directive declares an existing symbol as public; it should be
followed by the name of the symbol, and optionally it can be followed by the
"as" operator and a quoted string containing the name under which the symbol
should be available as public. Some examples of public symbol declarations:

    public main
    public start as '_start'
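
A sketch of a small object file in the Microsoft COFF variant, combining the
declarations above with section definitions. The label names and texts are
arbitrary, and the resulting object still has to be linked with an import
library by an external linker:

    format MS COFF

    extrn '__imp__MessageBoxA@16' as MessageBox:dword
    public start as '_start'

    section '.text' code readable executable align 16
    start:
        push 0                      ; uType
        push caption                ; lpCaption
        push message                ; lpText
        push 0                      ; hWnd
        call [MessageBox]
        ret

    section '.data' data readable writeable
    caption db 'Example',0
    message db 'Hello from COFF object',0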


2.4.4 Executable and Linkable Format

To select the ELF output format, use the "format ELF" directive. The default
code setting for this format is 32-bit. To create an ELF file for the x86-64
architecture, use the "format ELF64" directive; in such case the long mode
code is generated by default.
   "section" directive defines a new section; it should be followed by a
quoted string defining the name of the section, then can follow one or both of
the "executable" and "writeable" flags, and optionally also the "align"
operator followed by a number specifying the alignment of the section (it has
to be a power of two). If no alignment is specified, the default value is
used, which is 4 or 8, depending on which format variant has been chosen.
   "extrn" and "public" directives have the same meaning and syntax as when
the COFF output format is selected (described in the previous section).
   The "rva" operator can also be used in the case of this format (however not
when the target architecture is x86-64); it converts the address into an
offset relative to the GOT table, so it may be useful for creating
position-independent code.
   To create an executable file, follow the format choice directive with the
"executable" keyword. It allows the use of the "entry" directive followed by
the value to set as the entry point of the program. On the other hand it makes
the "extrn" and "public" directives unavailable, and instead of "section" the
"segment" directive should be used, followed only by one or more segment
permission flags. The origin of the segment is aligned to a page (4096 bytes),
and the available flags are: "readable", "writeable" and "executable".
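   A sketch of a complete 32-bit ELF executable for Linux built this way
follows. The label names are arbitrary, and the "write" (4) and "exit" (1)
system calls are invoked through interrupt 80h:

    format ELF executable
    entry start

    segment readable executable
    start:
        mov eax,4                   ; sys_write
        mov ebx,1                   ; standard output
        mov ecx,msg
        mov edx,msg_size
        int 80h
        mov eax,1                   ; sys_exit
        xor ebx,ebx                 ; exit code 0
        int 80h

    segment readable writeable
    msg db 'Hello world!',0Ah
    msg_size = $ - msg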