TGSI |
==== |
|
TGSI, Tungsten Graphics Shader Infrastructure, is an intermediate language |
for describing shaders. Since Gallium is inherently shaderful, shaders are |
an important part of the API. TGSI is the only intermediate representation |
used by all drivers. |
|
Basics |
------ |
|
All TGSI instructions, known as *opcodes*, operate on arbitrary-precision |
floating-point four-component vectors. An opcode may have up to one |
destination register, known as *dst*, and between zero and three source |
registers, called *src0* through *src2*, or simply *src* if there is only |
one. |
|
Some instructions, like :opcode:`I2F`, permit re-interpretation of vector |
components as integers. Other instructions permit using registers as |
two-component vectors with double precision; see :ref:`doubleopcodes`.
|
When an instruction has a scalar result, the result is usually copied into |
each of the components of *dst*. When this happens, the result is said to be |
*replicated* to *dst*. :opcode:`RCP` is one such instruction. |
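
As a rough illustration of replication, the following Python sketch models a
register as a 4-tuple. The helper names ``replicate`` and ``rcp`` are invented
for this example and are not part of Gallium.

```python
def replicate(scalar):
    """Copy one scalar result into all four components of dst."""
    return (scalar, scalar, scalar, scalar)


def rcp(src):
    """Model of RCP: only src.x is read, and 1/src.x is replicated."""
    return replicate(1.0 / src[0])


# Only the x component of the source influences the result.
result = rcp((4.0, 7.0, 7.0, 7.0))
```

Here ``result`` holds the same value in x, y, z and w, which is what
"replicated" means in the opcode descriptions.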
|
Modifiers |
^^^^^^^^^^^^^^^ |
|
TGSI supports modifiers on inputs, as well as a saturate modifier on
instructions.

For inputs which have a floating-point type, both absolute value and negation
modifiers are supported, with absolute value being applied first.
TGSI_OPCODE_MOV is considered to have a float input type when applying modifiers.

For inputs which have a signed or unsigned integer type, only the negate
modifier is supported.
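
The ordering of the float modifiers matters: taking the absolute value first
and negating afterwards always yields a non-positive value. A minimal Python
sketch of that ordering follows; ``apply_float_modifiers`` and its keyword
arguments are hypothetical names chosen for illustration.

```python
def apply_float_modifiers(src, absolute=False, negate=False):
    """Apply input modifiers to a 4-tuple of floats.

    Absolute value is applied before negation, as described above.
    """
    out = []
    for component in src:
        if absolute:
            component = abs(component)
        if negate:
            component = -component
        out.append(component)
    return tuple(out)


# With both modifiers set, every component becomes -|c|.
both = apply_float_modifiers((1.0, -2.0, 3.0, -4.0), absolute=True, negate=True)
```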
|
Instruction Set |
--------------- |
|
Core ISA |
^^^^^^^^^^^^^^^^^^^^^^^^^ |
|
These opcodes are guaranteed to be available regardless of the driver being |
used. |
|
.. opcode:: ARL - Address Register Load |
|
.. math:: |
|
dst.x = \lfloor src.x\rfloor |
|
dst.y = \lfloor src.y\rfloor |
|
dst.z = \lfloor src.z\rfloor |
|
dst.w = \lfloor src.w\rfloor |
|
|
.. opcode:: MOV - Move |
|
.. math:: |
|
dst.x = src.x |
|
dst.y = src.y |
|
dst.z = src.z |
|
dst.w = src.w |
|
|
.. opcode:: LIT - Light Coefficients |
|
.. math:: |
|
dst.x = 1 |
|
dst.y = max(src.x, 0) |
|
dst.z = (src.x > 0) ? max(src.y, 0)^{clamp(src.w, -128, 128)} : 0
|
dst.w = 1 |
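
The LIT equations can be read as the following Python sketch. It is only an
illustration of the formulas above: the function names are invented here, and
the case where ``max(src.y, 0)`` is zero and the clamped exponent is negative
is deliberately not handled (Python raises an error for it).

```python
def clamp(value, low, high):
    """Restrict value to the inclusive range [low, high]."""
    return max(low, min(value, high))


def lit(src):
    """Model of LIT on a 4-tuple (x, y, z, w); src.z is not read."""
    x, y, _z, w = src
    if x > 0:
        specular = max(y, 0.0) ** clamp(w, -128.0, 128.0)
    else:
        specular = 0.0
    return (1.0, max(x, 0.0), specular, 1.0)


lit_front = lit((0.5, 2.0, 0.0, 3.0))   # x > 0, so dst.z = 2^3
lit_back = lit((-1.0, 2.0, 0.0, 3.0))   # x <= 0, so dst.y and dst.z are 0
```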
|
|
.. opcode:: RCP - Reciprocal |
|
This instruction replicates its result. |
|
.. math:: |
|
dst = \frac{1}{src.x} |
|
|
.. opcode:: RSQ - Reciprocal Square Root |
|
This instruction replicates its result. The results are undefined for src <= 0. |
|
.. math:: |
|
dst = \frac{1}{\sqrt{src.x}} |
|
|
.. opcode:: SQRT - Square Root |
|
This instruction replicates its result. The results are undefined for src < 0. |
|
.. math:: |
|
dst = {\sqrt{src.x}} |
|
|
.. opcode:: EXP - Approximate Exponential Base 2 |
|
.. math:: |
|
dst.x = 2^{\lfloor src.x\rfloor} |
|
dst.y = src.x - \lfloor src.x\rfloor |
|
dst.z = 2^{src.x} |
|
dst.w = 1 |
|
|
.. opcode:: LOG - Approximate Logarithm Base 2 |
|
.. math:: |
|
dst.x = \lfloor\log_2{|src.x|}\rfloor |
|
dst.y = \frac{|src.x|}{2^{\lfloor\log_2{|src.x|}\rfloor}} |
|
dst.z = \log_2{|src.x|} |
|
dst.w = 1 |
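
EXP and LOG each split their input into an integral and a fractional part. The
Python sketch below follows the equations above literally; the function names
are hypothetical, exact ``math`` routines stand in for whatever approximation
a driver uses, and a zero input to ``log_approx`` is not handled.

```python
import math


def exp_approx(src):
    """Model of EXP: 2^floor(x), fractional part of x, 2^x, and 1."""
    x = src[0]
    floor_x = math.floor(x)
    return (2.0 ** floor_x, x - floor_x, 2.0 ** x, 1.0)


def log_approx(src):
    """Model of LOG: exponent, mantissa and log2 of |x|, and 1."""
    magnitude = abs(src[0])
    exponent = math.floor(math.log2(magnitude))
    return (float(exponent), magnitude / 2.0 ** exponent,
            math.log2(magnitude), 1.0)


e = exp_approx((2.5, 0.0, 0.0, 0.0))
l = log_approx((-10.0, 0.0, 0.0, 0.0))
```

For LOG, ``dst.y * 2 ** dst.x`` reconstructs ``|src.x|``, which is a handy
sanity check on the first two components.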
|
|
.. opcode:: MUL - Multiply |
|
.. math:: |
|
dst.x = src0.x \times src1.x |
|
dst.y = src0.y \times src1.y |
|
dst.z = src0.z \times src1.z |
|
dst.w = src0.w \times src1.w |
|
|
.. opcode:: ADD - Add |
|
.. math:: |
|
dst.x = src0.x + src1.x |
|
dst.y = src0.y + src1.y |
|
dst.z = src0.z + src1.z |
|
dst.w = src0.w + src1.w |
|
|
.. opcode:: DP3 - 3-component Dot Product |
|
This instruction replicates its result. |
|
.. math:: |
|
dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z |
|
|
.. opcode:: DP4 - 4-component Dot Product |
|
This instruction replicates its result. |
|
.. math:: |
|
dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src0.w \times src1.w |
|
|
.. opcode:: DST - Distance Vector |
|
.. math:: |
|
dst.x = 1 |
|
dst.y = src0.y \times src1.y |
|
dst.z = src0.z |
|
dst.w = src1.w |
|
|
.. opcode:: MIN - Minimum |
|
.. math:: |
|
dst.x = min(src0.x, src1.x) |
|
dst.y = min(src0.y, src1.y) |
|
dst.z = min(src0.z, src1.z) |
|
dst.w = min(src0.w, src1.w) |
|
|
.. opcode:: MAX - Maximum |
|
.. math:: |
|
dst.x = max(src0.x, src1.x) |
|
dst.y = max(src0.y, src1.y) |
|
dst.z = max(src0.z, src1.z) |
|
dst.w = max(src0.w, src1.w) |
|
|
.. opcode:: SLT - Set On Less Than |
|
.. math:: |
|
dst.x = (src0.x < src1.x) ? 1 : 0 |
|
dst.y = (src0.y < src1.y) ? 1 : 0 |
|
dst.z = (src0.z < src1.z) ? 1 : 0 |
|
dst.w = (src0.w < src1.w) ? 1 : 0 |
|
|
.. opcode:: SGE - Set On Greater Equal Than |
|
.. math:: |
|
dst.x = (src0.x >= src1.x) ? 1 : 0 |
|
dst.y = (src0.y >= src1.y) ? 1 : 0 |
|
dst.z = (src0.z >= src1.z) ? 1 : 0 |
|
dst.w = (src0.w >= src1.w) ? 1 : 0 |
|
|
.. opcode:: MAD - Multiply And Add |
|
.. math:: |
|
dst.x = src0.x \times src1.x + src2.x |
|
dst.y = src0.y \times src1.y + src2.y |
|
dst.z = src0.z \times src1.z + src2.z |
|
dst.w = src0.w \times src1.w + src2.w |
|
|
.. opcode:: SUB - Subtract |
|
.. math:: |
|
dst.x = src0.x - src1.x |
|
dst.y = src0.y - src1.y |
|
dst.z = src0.z - src1.z |
|
dst.w = src0.w - src1.w |
|
|
.. opcode:: LRP - Linear Interpolate |
|
.. math:: |
|
dst.x = src0.x \times src1.x + (1 - src0.x) \times src2.x |
|
dst.y = src0.y \times src1.y + (1 - src0.y) \times src2.y |
|
dst.z = src0.z \times src1.z + (1 - src0.z) \times src2.z |
|
dst.w = src0.w \times src1.w + (1 - src0.w) \times src2.w |
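
Note that in LRP the weight is *src0*, and a weight of 1 selects *src1*. A
small Python sketch of the formula, using an invented ``lrp`` helper on
4-tuples:

```python
def lrp(src0, src1, src2):
    """Model of LRP: per-component blend of src1 and src2 weighted by src0."""
    return tuple(w * a + (1.0 - w) * b for w, a, b in zip(src0, src1, src2))


weights = (0.0, 1.0, 0.5, 0.25)
blended = lrp(weights, (10.0,) * 4, (20.0,) * 4)
```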
|
|
.. opcode:: CND - Condition |
|
.. math:: |
|
dst.x = (src2.x > 0.5) ? src0.x : src1.x |
|
dst.y = (src2.y > 0.5) ? src0.y : src1.y |
|
dst.z = (src2.z > 0.5) ? src0.z : src1.z |
|
dst.w = (src2.w > 0.5) ? src0.w : src1.w |
|
|
.. opcode:: DP2A - 2-component Dot Product And Add |
|
.. math:: |
|
dst.x = src0.x \times src1.x + src0.y \times src1.y + src2.x |
|
dst.y = src0.x \times src1.x + src0.y \times src1.y + src2.x |
|
dst.z = src0.x \times src1.x + src0.y \times src1.y + src2.x |
|
dst.w = src0.x \times src1.x + src0.y \times src1.y + src2.x |
|
|
.. opcode:: FRC - Fraction |
|
.. math:: |
|
dst.x = src.x - \lfloor src.x\rfloor |
|
dst.y = src.y - \lfloor src.y\rfloor |
|
dst.z = src.z - \lfloor src.z\rfloor |
|
dst.w = src.w - \lfloor src.w\rfloor |
|
|
.. opcode:: CLAMP - Clamp |
|
.. math:: |
|
dst.x = clamp(src0.x, src1.x, src2.x) |
|
dst.y = clamp(src0.y, src1.y, src2.y) |
|
dst.z = clamp(src0.z, src1.z, src2.z) |
|
dst.w = clamp(src0.w, src1.w, src2.w) |
|
|
.. opcode:: FLR - Floor |
|
This is identical to :opcode:`ARL`. |
|
.. math:: |
|
dst.x = \lfloor src.x\rfloor |
|
dst.y = \lfloor src.y\rfloor |
|
dst.z = \lfloor src.z\rfloor |
|
dst.w = \lfloor src.w\rfloor |
|
|
.. opcode:: ROUND - Round |
|
.. math:: |
|
dst.x = round(src.x) |
|
dst.y = round(src.y) |
|
dst.z = round(src.z) |
|
dst.w = round(src.w) |
|
|
.. opcode:: EX2 - Exponential Base 2 |
|
This instruction replicates its result. |
|
.. math:: |
|
dst = 2^{src.x} |
|
|
.. opcode:: LG2 - Logarithm Base 2 |
|
This instruction replicates its result. |
|
.. math:: |
|
dst = \log_2{src.x} |
|
|
.. opcode:: POW - Power |
|
This instruction replicates its result. |
|
.. math:: |
|
dst = src0.x^{src1.x} |
|
.. opcode:: XPD - Cross Product |
|
.. math:: |
|
dst.x = src0.y \times src1.z - src1.y \times src0.z |
|
dst.y = src0.z \times src1.x - src1.z \times src0.x |
|
dst.z = src0.x \times src1.y - src1.x \times src0.y |
|
dst.w = 1 |
|
|
.. opcode:: ABS - Absolute |
|
.. math:: |
|
dst.x = |src.x| |
|
dst.y = |src.y| |
|
dst.z = |src.z| |
|
dst.w = |src.w| |
|
|
.. opcode:: RCC - Reciprocal Clamped |
|
This instruction replicates its result. |
|
XXX cleanup on aisle three |
|
.. math:: |
|
dst = (1 / src.x) > 0 ? clamp(1 / src.x, 5.42101e-020, 1.884467e+019) : clamp(1 / src.x, -1.884467e+019, -5.42101e-020) |
|
|
.. opcode:: DPH - Homogeneous Dot Product |
|
This instruction replicates its result. |
|
.. math:: |
|
dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src1.w |
|
|
.. opcode:: COS - Cosine |
|
This instruction replicates its result. |
|
.. math:: |
|
dst = \cos{src.x} |
|
|
.. opcode:: DDX - Derivative Relative To X |
|
.. math:: |
|
dst.x = partialx(src.x) |
|
dst.y = partialx(src.y) |
|
dst.z = partialx(src.z) |
|
dst.w = partialx(src.w) |
|
|
.. opcode:: DDY - Derivative Relative To Y |
|
.. math:: |
|
dst.x = partialy(src.x) |
|
dst.y = partialy(src.y) |
|
dst.z = partialy(src.z) |
|
dst.w = partialy(src.w) |
|
|
.. opcode:: PK2H - Pack Two 16-bit Floats |
|
TBD |
|
|
.. opcode:: PK2US - Pack Two Unsigned 16-bit Scalars |
|
TBD |
|
|
.. opcode:: PK4B - Pack Four Signed 8-bit Scalars |
|
TBD |
|
|
.. opcode:: PK4UB - Pack Four Unsigned 8-bit Scalars |
|
TBD |
|
|
.. opcode:: RFL - Reflection Vector |
|
.. math:: |
|
dst.x = 2 \times (src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z) / (src0.x \times src0.x + src0.y \times src0.y + src0.z \times src0.z) \times src0.x - src1.x |
|
dst.y = 2 \times (src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z) / (src0.x \times src0.x + src0.y \times src0.y + src0.z \times src0.z) \times src0.y - src1.y |
|
dst.z = 2 \times (src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z) / (src0.x \times src0.x + src0.y \times src0.y + src0.z \times src0.z) \times src0.z - src1.z |
|
dst.w = 1 |
|
.. note:: |
|
Considered for removal. |
|
|
.. opcode:: SEQ - Set On Equal |
|
.. math:: |
|
dst.x = (src0.x == src1.x) ? 1 : 0 |
|
dst.y = (src0.y == src1.y) ? 1 : 0 |
|
dst.z = (src0.z == src1.z) ? 1 : 0 |
|
dst.w = (src0.w == src1.w) ? 1 : 0 |
|
|
.. opcode:: SFL - Set On False |
|
This instruction replicates its result. |
|
.. math:: |
|
dst = 0 |
|
.. note:: |
|
Considered for removal. |
|
|
.. opcode:: SGT - Set On Greater Than |
|
.. math:: |
|
dst.x = (src0.x > src1.x) ? 1 : 0 |
|
dst.y = (src0.y > src1.y) ? 1 : 0 |
|
dst.z = (src0.z > src1.z) ? 1 : 0 |
|
dst.w = (src0.w > src1.w) ? 1 : 0 |
|
|
.. opcode:: SIN - Sine |
|
This instruction replicates its result. |
|
.. math:: |
|
dst = \sin{src.x} |
|
|
.. opcode:: SLE - Set On Less Equal Than |
|
.. math:: |
|
dst.x = (src0.x <= src1.x) ? 1 : 0 |
|
dst.y = (src0.y <= src1.y) ? 1 : 0 |
|
dst.z = (src0.z <= src1.z) ? 1 : 0 |
|
dst.w = (src0.w <= src1.w) ? 1 : 0 |
|
|
.. opcode:: SNE - Set On Not Equal |
|
.. math:: |
|
dst.x = (src0.x != src1.x) ? 1 : 0 |
|
dst.y = (src0.y != src1.y) ? 1 : 0 |
|
dst.z = (src0.z != src1.z) ? 1 : 0 |
|
dst.w = (src0.w != src1.w) ? 1 : 0 |
|
|
.. opcode:: STR - Set On True |
|
This instruction replicates its result. |
|
.. math:: |
|
dst = 1 |
|
|
.. opcode:: TEX - Texture Lookup |
|
.. math:: |
|
coord = src0 |
|
bias = 0.0 |
|
dst = texture_sample(unit, coord, bias) |
|
For array textures, src0.y contains the slice for 1D arrays and src0.z
contains the slice for 2D arrays.
For shadow textures without arrays, src0.z contains the reference value.
For shadow textures with arrays, src0.z contains the reference value for
1D arrays and src0.w contains the reference value for 2D arrays.
There is no way to pass a bias in the .w component for shadow arrays, and
GLSL does not allow this.
GLSL does allow cube shadow maps to take a bias value; how this will be
expressed in TGSI is still to be determined.
|
.. opcode:: TXD - Texture Lookup with Derivatives |
|
.. math:: |
|
coord = src0 |
|
ddx = src1 |
|
ddy = src2 |
|
bias = 0.0 |
|
dst = texture_sample_deriv(unit, coord, bias, ddx, ddy) |
|
|
.. opcode:: TXP - Projective Texture Lookup |
|
.. math:: |
|
coord.x = src0.x / src0.w

coord.y = src0.y / src0.w

coord.z = src0.z / src0.w
|
coord.w = src0.w |
|
bias = 0.0 |
|
dst = texture_sample(unit, coord, bias) |
|
|
.. opcode:: UP2H - Unpack Two 16-Bit Floats |
|
TBD |
|
.. note:: |
|
Considered for removal. |
|
.. opcode:: UP2US - Unpack Two Unsigned 16-Bit Scalars |
|
TBD |
|
.. note:: |
|
Considered for removal. |
|
.. opcode:: UP4B - Unpack Four Signed 8-Bit Values |
|
TBD |
|
.. note:: |
|
Considered for removal. |
|
.. opcode:: UP4UB - Unpack Four Unsigned 8-Bit Scalars |
|
TBD |
|
.. note:: |
|
Considered for removal. |
|
.. opcode:: X2D - 2D Coordinate Transformation |
|
.. math:: |
|
dst.x = src0.x + src1.x \times src2.x + src1.y \times src2.y |
|
dst.y = src0.y + src1.x \times src2.z + src1.y \times src2.w |
|
dst.z = src0.x + src1.x \times src2.x + src1.y \times src2.y |
|
dst.w = src0.y + src1.x \times src2.z + src1.y \times src2.w |
|
.. note:: |
|
Considered for removal. |
|
|
.. opcode:: ARA - Address Register Add |
|
TBD |
|
.. note:: |
|
Considered for removal. |
|
.. opcode:: ARR - Address Register Load With Round |
|
.. math:: |
|
dst.x = round(src.x) |
|
dst.y = round(src.y) |
|
dst.z = round(src.z) |
|
dst.w = round(src.w) |
|
|
.. opcode:: SSG - Set Sign |
|
.. math:: |
|
dst.x = (src.x > 0) ? 1 : (src.x < 0) ? -1 : 0 |
|
dst.y = (src.y > 0) ? 1 : (src.y < 0) ? -1 : 0 |
|
dst.z = (src.z > 0) ? 1 : (src.z < 0) ? -1 : 0 |
|
dst.w = (src.w > 0) ? 1 : (src.w < 0) ? -1 : 0 |
|
|
.. opcode:: CMP - Compare |
|
.. math:: |
|
dst.x = (src0.x < 0) ? src1.x : src2.x |
|
dst.y = (src0.y < 0) ? src1.y : src2.y |
|
dst.z = (src0.z < 0) ? src1.z : src2.z |
|
dst.w = (src0.w < 0) ? src1.w : src2.w |
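
CMP selects per component on the sign of *src0*; zero counts as "not less
than zero" and therefore picks *src2*. A brief Python sketch, with ``cmp_op``
as a made-up name:

```python
def cmp_op(src0, src1, src2):
    """Model of CMP: take src1 where src0 < 0, otherwise src2."""
    return tuple(a if c < 0 else b for c, a, b in zip(src0, src1, src2))


selected = cmp_op((-1.0, 0.0, 2.0, -0.5), (1, 1, 1, 1), (2, 2, 2, 2))
```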
|
|
.. opcode:: KILL_IF - Conditional Discard |
|
Conditional discard. Allowed in fragment shaders only. |
|
.. math:: |
|
if (src.x < 0 || src.y < 0 || src.z < 0 || src.w < 0) |
discard |
endif |
|
|
.. opcode:: KILL - Discard |
|
Unconditional discard. Allowed in fragment shaders only. |
|
|
.. opcode:: SCS - Sine Cosine |
|
.. math:: |
|
dst.x = \cos{src.x} |
|
dst.y = \sin{src.x} |
|
dst.z = 0 |
|
dst.w = 1 |
|
|
.. opcode:: TXB - Texture Lookup With Bias |
|
.. math:: |
|
coord.x = src.x |
|
coord.y = src.y |
|
coord.z = src.z |
|
coord.w = 1.0 |
|
bias = src.w
|
dst = texture_sample(unit, coord, bias) |
|
|
.. opcode:: NRM - 3-component Vector Normalise |
|
.. math:: |
|
dst.x = \frac{src.x}{\sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z}}

dst.y = \frac{src.y}{\sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z}}

dst.z = \frac{src.z}{\sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z}}
|
dst.w = 1 |
|
|
.. opcode:: DIV - Divide |
|
.. math:: |
|
dst.x = \frac{src0.x}{src1.x} |
|
dst.y = \frac{src0.y}{src1.y} |
|
dst.z = \frac{src0.z}{src1.z} |
|
dst.w = \frac{src0.w}{src1.w} |
|
|
.. opcode:: DP2 - 2-component Dot Product |
|
This instruction replicates its result. |
|
.. math:: |
|
dst = src0.x \times src1.x + src0.y \times src1.y |
|
|
.. opcode:: TXL - Texture Lookup With Explicit LOD
|
.. math:: |
|
coord.x = src0.x |
|
coord.y = src0.y |
|
coord.z = src0.z |
|
coord.w = 1.0 |
|
lod = src0.w |
|
dst = texture_sample(unit, coord, lod) |
|
|
.. opcode:: PUSHA - Push Address Register On Stack |
|
push(src.x) |
push(src.y) |
push(src.z) |
push(src.w) |
|
.. note:: |
|
Considered for cleanup. |
|
.. note:: |
|
Considered for removal. |
|
.. opcode:: POPA - Pop Address Register From Stack |
|
dst.w = pop() |
dst.z = pop() |
dst.y = pop() |
dst.x = pop() |
|
.. note:: |
|
Considered for cleanup. |
|
.. note:: |
|
Considered for removal. |
|
|
.. opcode:: BRA - Branch |
|
pc = target |
|
.. note:: |
|
Considered for removal. |
|
|
.. opcode:: CALLNZ - Subroutine Call If Not Zero |
|
TBD |
|
.. note:: |
|
Considered for cleanup. |
|
.. note:: |
|
Considered for removal. |
|
|
Compute ISA |
^^^^^^^^^^^^^^^^^^^^^^^^ |
|
These opcodes are primarily provided for special-use computational shaders. |
Support for these opcodes is indicated by a special pipe capability bit (TBD).
|
XXX doesn't look like most of the opcodes really belong here. |
|
.. opcode:: CEIL - Ceiling |
|
.. math:: |
|
dst.x = \lceil src.x\rceil |
|
dst.y = \lceil src.y\rceil |
|
dst.z = \lceil src.z\rceil |
|
dst.w = \lceil src.w\rceil |
|
|
.. opcode:: TRUNC - Truncate |
|
.. math:: |
|
dst.x = trunc(src.x) |
|
dst.y = trunc(src.y) |
|
dst.z = trunc(src.z) |
|
dst.w = trunc(src.w) |
|
|
.. opcode:: MOD - Modulus |
|
.. math:: |
|
dst.x = src0.x \bmod src1.x |
|
dst.y = src0.y \bmod src1.y |
|
dst.z = src0.z \bmod src1.z |
|
dst.w = src0.w \bmod src1.w |
|
|
.. opcode:: UARL - Integer Address Register Load |
|
Moves the contents of the source register, assumed to be an integer, into the |
destination register, which is assumed to be an address (ADDR) register. |
|
|
.. opcode:: SAD - Sum Of Absolute Differences |
|
.. math:: |
|
dst.x = |src0.x - src1.x| + src2.x |
|
dst.y = |src0.y - src1.y| + src2.y |
|
dst.z = |src0.z - src1.z| + src2.z |
|
dst.w = |src0.w - src1.w| + src2.w |
|
|
.. opcode:: TXF - Texel Fetch

As per NV_gpu_shader4, extracts a single texel from a specified texture
image. The source sampler may not be a CUBE or SHADOW.

src 0 is a four-component signed integer vector used to identify the single
texel accessed: three components plus the level.

src 1 is a three-component constant signed integer vector, with each
component only having a range of -8..+8 (hardware only seems to deal with
this range, although the interface allows for up to unsigned int).

TXF(uint_vec coord, int_vec offset).
|
|
.. opcode:: TXQ - Texture Size Query

As per NV_gpu_program4, retrieves the dimensions of the texture depending on
the target: 1D (width), 2D/RECT/CUBE (width, height), 3D (width, height,
depth), 1D array (width, layers), 2D array (width, height, layers).
|
.. math:: |
|
lod = src0.x |
|
dst.x = texture_width(unit, lod) |
|
dst.y = texture_height(unit, lod) |
|
dst.z = texture_depth(unit, lod) |
|
|
Integer ISA |
^^^^^^^^^^^^^^^^^^^^^^^^ |
These opcodes are used for integer operations.
Support for these opcodes is indicated by PIPE_SHADER_CAP_INTEGERS (all of them?)
|
|
.. opcode:: I2F - Signed Integer To Float |
|
Rounding is unspecified (round to nearest even suggested). |
|
.. math:: |
|
dst.x = (float) src.x |
|
dst.y = (float) src.y |
|
dst.z = (float) src.z |
|
dst.w = (float) src.w |
|
|
.. opcode:: U2F - Unsigned Integer To Float |
|
Rounding is unspecified (round to nearest even suggested). |
|
.. math:: |
|
dst.x = (float) src.x |
|
dst.y = (float) src.y |
|
dst.z = (float) src.z |
|
dst.w = (float) src.w |
|
|
.. opcode:: F2I - Float to Signed Integer |
|
Rounding is towards zero (truncate). |
Values outside signed range (including NaNs) produce undefined results. |
|
.. math:: |
|
dst.x = (int) src.x |
|
dst.y = (int) src.y |
|
dst.z = (int) src.z |
|
dst.w = (int) src.w |
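
Rounding towards zero differs from flooring for negative inputs. The Python
sketch below uses ``math.trunc`` to model the in-range behaviour of F2I;
``f2i`` is a hypothetical helper, and out-of-range values and NaNs (undefined
in TGSI) are not modelled.

```python
import math


def f2i(src):
    """Model of F2I for in-range inputs: truncate each component."""
    return tuple(math.trunc(component) for component in src)


truncated = f2i((1.9, -1.9, 0.5, -0.5))
floored = tuple(math.floor(c) for c in (1.9, -1.9, 0.5, -0.5))
```

Comparing ``truncated`` with ``floored`` shows the difference: the negative
components move towards zero in the first and away from it in the second.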
|
|
.. opcode:: F2U - Float to Unsigned Integer |
|
Rounding is towards zero (truncate). |
Values outside unsigned range (including NaNs) produce undefined results. |
|
.. math:: |
|
dst.x = (unsigned) src.x |
|
dst.y = (unsigned) src.y |
|
dst.z = (unsigned) src.z |
|
dst.w = (unsigned) src.w |
|
|
.. opcode:: UADD - Integer Add |
|
This instruction works the same for signed and unsigned integers. |
The low 32 bits of the result are returned.
|
.. math:: |
|
dst.x = src0.x + src1.x |
|
dst.y = src0.y + src1.y |
|
dst.z = src0.z + src1.z |
|
dst.w = src0.w + src1.w |
|
|
.. opcode:: UMAD - Integer Multiply And Add |
|
This instruction works the same for signed and unsigned integers. |
The multiplication returns the low 32 bits (as does the result itself).
|
.. math:: |
|
dst.x = src0.x \times src1.x + src2.x |
|
dst.y = src0.y \times src1.y + src2.y |
|
dst.z = src0.z \times src1.z + src2.z |
|
dst.w = src0.w \times src1.w + src2.w |
|
|
.. opcode:: UMUL - Integer Multiply |
|
This instruction works the same for signed and unsigned integers. |
The low 32 bits of the result are returned.
|
.. math:: |
|
dst.x = src0.x \times src1.x |
|
dst.y = src0.y \times src1.y |
|
dst.z = src0.z \times src1.z |
|
dst.w = src0.w \times src1.w |
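
Keeping only the low 32 bits is what makes UMUL and UMAD agree for signed and
unsigned operands. The Python sketch below models components as unsigned
32-bit patterns; the names ``MASK32``, ``umul`` and ``umad`` are chosen for
this example only.

```python
MASK32 = 0xFFFFFFFF  # keeps the low 32 bits of a Python integer


def umul(src0, src1):
    """Model of UMUL: per-component product, truncated to 32 bits."""
    return tuple((a * b) & MASK32 for a, b in zip(src0, src1))


def umad(src0, src1, src2):
    """Model of UMAD: multiply, add, then truncate to 32 bits."""
    return tuple((a * b + c) & MASK32 for a, b, c in zip(src0, src1, src2))


# 0xFFFFFFFF is -1 as a signed value; -1 * 2 = -2, which is 0xFFFFFFFE.
products = umul((0xFFFFFFFF, 2, 0x10000, 3), (2, 3, 0x10000, 0))
sums = umad((0xFFFFFFFF, 2, 0x10000, 3), (2, 3, 0x10000, 0), (1, 1, 1, 1))
```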
|
|
.. opcode:: IDIV - Signed Integer Division |
|
TBD: behavior for division by zero. |
|
.. math:: |
|
dst.x = \frac{src0.x}{src1.x}

dst.y = \frac{src0.y}{src1.y}

dst.z = \frac{src0.z}{src1.z}

dst.w = \frac{src0.w}{src1.w}
|
|
.. opcode:: UDIV - Unsigned Integer Division |
|
For division by zero, 0xffffffff is returned. |
|
.. math:: |
|
dst.x = \frac{src0.x}{src1.x}

dst.y = \frac{src0.y}{src1.y}

dst.z = \frac{src0.z}{src1.z}

dst.w = \frac{src0.w}{src1.w}
|
|
.. opcode:: UMOD - Unsigned Integer Remainder |
|
If the second argument is zero, 0xffffffff is returned.

.. math::

dst.x = src0.x \bmod src1.x

dst.y = src0.y \bmod src1.y

dst.z = src0.z \bmod src1.z

dst.w = src0.w \bmod src1.w
|
|
.. opcode:: NOT - Bitwise Not |
|
.. math:: |
|
dst.x = ~src.x |
|
dst.y = ~src.y |
|
dst.z = ~src.z |
|
dst.w = ~src.w |
|
|
.. opcode:: AND - Bitwise And |
|
.. math:: |
|
dst.x = src0.x & src1.x |
|
dst.y = src0.y & src1.y |
|
dst.z = src0.z & src1.z |
|
dst.w = src0.w & src1.w |
|
|
.. opcode:: OR - Bitwise Or |
|
.. math:: |
|
dst.x = src0.x | src1.x |
|
dst.y = src0.y | src1.y |
|
dst.z = src0.z | src1.z |
|
dst.w = src0.w | src1.w |
|
|
.. opcode:: XOR - Bitwise Xor |
|
.. math:: |
|
dst.x = src0.x \oplus src1.x |
|
dst.y = src0.y \oplus src1.y |
|
dst.z = src0.z \oplus src1.z |
|
dst.w = src0.w \oplus src1.w |
|
|
.. opcode:: IMAX - Maximum of Signed Integers |
|
.. math:: |
|
dst.x = max(src0.x, src1.x) |
|
dst.y = max(src0.y, src1.y) |
|
dst.z = max(src0.z, src1.z) |
|
dst.w = max(src0.w, src1.w) |
|
|
.. opcode:: UMAX - Maximum of Unsigned Integers |
|
.. math:: |
|
dst.x = max(src0.x, src1.x) |
|
dst.y = max(src0.y, src1.y) |
|
dst.z = max(src0.z, src1.z) |
|
dst.w = max(src0.w, src1.w) |
|
|
.. opcode:: IMIN - Minimum of Signed Integers |
|
.. math:: |
|
dst.x = min(src0.x, src1.x) |
|
dst.y = min(src0.y, src1.y) |
|
dst.z = min(src0.z, src1.z) |
|
dst.w = min(src0.w, src1.w) |
|
|
.. opcode:: UMIN - Minimum of Unsigned Integers |
|
.. math:: |
|
dst.x = min(src0.x, src1.x) |
|
dst.y = min(src0.y, src1.y) |
|
dst.z = min(src0.z, src1.z) |
|
dst.w = min(src0.w, src1.w) |
|
|
.. opcode:: SHL - Shift Left |
|
.. math:: |
|
dst.x = src0.x << src1.x |
|
dst.y = src0.y << src1.x |
|
dst.z = src0.z << src1.x |
|
dst.w = src0.w << src1.x |
|
|
.. opcode:: ISHR - Arithmetic Shift Right (of Signed Integer) |
|
.. math:: |
|
dst.x = src0.x >> src1.x |
|
dst.y = src0.y >> src1.x |
|
dst.z = src0.z >> src1.x |
|
dst.w = src0.w >> src1.x |
|
|
.. opcode:: USHR - Logical Shift Right |
|
.. math:: |
|
dst.x = src0.x >> (unsigned) src1.x |
|
dst.y = src0.y >> (unsigned) src1.x |
|
dst.z = src0.z >> (unsigned) src1.x |
|
dst.w = src0.w >> (unsigned) src1.x |
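
ISHR and USHR differ only in what is shifted into the top bits. The Python
sketch below models components as 32-bit patterns and, following the equations
above, takes the shift count from ``src1.x`` for every component. All names
are hypothetical, and shift counts of 32 or more are not considered.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value):
    """Reinterpret a 32-bit pattern as a two's complement signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def ishr(src0, src1):
    """Model of ISHR: arithmetic shift, the sign bit is copied in."""
    count = src1[0]
    return tuple((to_signed(c) >> count) & MASK32 for c in src0)


def ushr(src0, src1):
    """Model of USHR: logical shift, zeros are shifted in."""
    count = src1[0]
    return tuple((c & MASK32) >> count for c in src0)


values = (0x80000000, 16, 0xFFFFFFFF, 0)
arithmetic = ishr(values, (4, 99, 99, 99))  # only src1.x is used
logical = ushr(values, (4, 99, 99, 99))
```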
|
|
.. opcode:: UCMP - Integer Conditional Move |
|
.. math:: |
|
dst.x = src0.x ? src1.x : src2.x |
|
dst.y = src0.y ? src1.y : src2.y |
|
dst.z = src0.z ? src1.z : src2.z |
|
dst.w = src0.w ? src1.w : src2.w |
|
|
|
.. opcode:: ISSG - Integer Set Sign |
|
.. math:: |
|
dst.x = (src0.x < 0) ? -1 : (src0.x > 0) ? 1 : 0 |
|
dst.y = (src0.y < 0) ? -1 : (src0.y > 0) ? 1 : 0 |
|
dst.z = (src0.z < 0) ? -1 : (src0.z > 0) ? 1 : 0 |
|
dst.w = (src0.w < 0) ? -1 : (src0.w > 0) ? 1 : 0 |
|
|
|
.. opcode:: ISLT - Signed Integer Set On Less Than |
|
.. math:: |
|
dst.x = (src0.x < src1.x) ? ~0 : 0 |
|
dst.y = (src0.y < src1.y) ? ~0 : 0 |
|
dst.z = (src0.z < src1.z) ? ~0 : 0 |
|
dst.w = (src0.w < src1.w) ? ~0 : 0 |
|
|
.. opcode:: USLT - Unsigned Integer Set On Less Than |
|
.. math:: |
|
dst.x = (src0.x < src1.x) ? ~0 : 0 |
|
dst.y = (src0.y < src1.y) ? ~0 : 0 |
|
dst.z = (src0.z < src1.z) ? ~0 : 0 |
|
dst.w = (src0.w < src1.w) ? ~0 : 0 |
|
|
.. opcode:: ISGE - Signed Integer Set On Greater Equal Than |
|
.. math:: |
|
dst.x = (src0.x >= src1.x) ? ~0 : 0 |
|
dst.y = (src0.y >= src1.y) ? ~0 : 0 |
|
dst.z = (src0.z >= src1.z) ? ~0 : 0 |
|
dst.w = (src0.w >= src1.w) ? ~0 : 0 |
|
|
.. opcode:: USGE - Unsigned Integer Set On Greater Equal Than |
|
.. math:: |
|
dst.x = (src0.x >= src1.x) ? ~0 : 0 |
|
dst.y = (src0.y >= src1.y) ? ~0 : 0 |
|
dst.z = (src0.z >= src1.z) ? ~0 : 0 |
|
dst.w = (src0.w >= src1.w) ? ~0 : 0 |
|
|
.. opcode:: USEQ - Integer Set On Equal |
|
.. math:: |
|
dst.x = (src0.x == src1.x) ? ~0 : 0 |
|
dst.y = (src0.y == src1.y) ? ~0 : 0 |
|
dst.z = (src0.z == src1.z) ? ~0 : 0 |
|
dst.w = (src0.w == src1.w) ? ~0 : 0 |
|
|
.. opcode:: USNE - Integer Set On Not Equal |
|
.. math:: |
|
dst.x = (src0.x != src1.x) ? ~0 : 0 |
|
dst.y = (src0.y != src1.y) ? ~0 : 0 |
|
dst.z = (src0.z != src1.z) ? ~0 : 0 |
|
dst.w = (src0.w != src1.w) ? ~0 : 0 |
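
Unlike the float comparisons such as SLT, which produce 1 or 0, the integer
comparisons produce ``~0`` (all bits set) or 0. The same bit pattern can also
compare differently under signed and unsigned interpretation, as this Python
sketch of ISLT and USLT shows; the helper names are illustrative only.

```python
MASK32 = 0xFFFFFFFF  # the 32-bit value of ~0


def to_signed(value):
    """Reinterpret a 32-bit pattern as a two's complement signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def islt(src0, src1):
    """Model of ISLT: signed less-than, ~0 where true."""
    return tuple(MASK32 if to_signed(a) < to_signed(b) else 0
                 for a, b in zip(src0, src1))


def uslt(src0, src1):
    """Model of USLT: unsigned less-than, ~0 where true."""
    return tuple(MASK32 if (a & MASK32) < (b & MASK32) else 0
                 for a, b in zip(src0, src1))


a = (0xFFFFFFFF, 1, 5, 0)
b = (0, 2, 5, 0xFFFFFFFF)
signed_result = islt(a, b)    # 0xFFFFFFFF is read as -1
unsigned_result = uslt(a, b)  # 0xFFFFFFFF is read as 4294967295
```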
|
|
.. opcode:: INEG - Integer Negate |
|
Two's complement. |
|
.. math:: |
|
dst.x = -src.x |
|
dst.y = -src.y |
|
dst.z = -src.z |
|
dst.w = -src.w |
|
|
.. opcode:: IABS - Integer Absolute Value |
|
.. math:: |
|
dst.x = |src.x| |
|
dst.y = |src.y| |
|
dst.z = |src.z| |
|
dst.w = |src.w| |
|
|
Geometry ISA |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
|
These opcodes are only supported in geometry shaders; they have no meaning |
in any other type of shader. |
|
.. opcode:: EMIT - Emit |
|
Generate a new vertex for the current primitive using the values in the |
output registers. |
|
|
.. opcode:: ENDPRIM - End Primitive |
|
Complete the current primitive (consisting of the emitted vertices), |
and start a new one. |
|
|
GLSL ISA |
^^^^^^^^^^ |
|
These opcodes are part of :term:`GLSL`'s opcode set. Support for these |
opcodes is determined by a special capability bit, ``GLSL``. |
Some require GLSL version 1.30 (UIF/BREAKC/SWITCH/CASE/DEFAULT/ENDSWITCH).
|
.. opcode:: CAL - Subroutine Call |
|
push(pc) |
pc = target |
|
|
.. opcode:: RET - Subroutine Call Return |
|
pc = pop() |
|
|
.. opcode:: CONT - Continue |
|
Unconditionally moves the point of execution to the instruction after the |
last bgnloop. The instruction must appear within a bgnloop/endloop. |
|
.. note:: |
|
Support for CONT is determined by a special capability bit, |
``TGSI_CONT_SUPPORTED``. See :ref:`Screen` for more information. |
|
|
.. opcode:: BGNLOOP - Begin a Loop |
|
Start a loop. Must have a matching endloop. |
|
|
.. opcode:: BGNSUB - Begin Subroutine |
|
Starts definition of a subroutine. Must have a matching endsub. |
|
|
.. opcode:: ENDLOOP - End a Loop |
|
End a loop started with bgnloop. |
|
|
.. opcode:: ENDSUB - End Subroutine |
|
Ends definition of a subroutine. |
|
|
.. opcode:: NOP - No Operation |
|
Do nothing. |
|
|
.. opcode:: BRK - Break |
|
Unconditionally moves the point of execution to the instruction after the |
next endloop or endswitch. The instruction must appear within a loop/endloop |
or switch/endswitch. |
|
|
.. opcode:: BREAKC - Break Conditional |
|
Conditionally moves the point of execution to the instruction after the |
next endloop or endswitch. The instruction must appear within a loop/endloop |
or switch/endswitch. |
Condition evaluates to true if src0.x != 0 where src0.x is interpreted |
as an integer register. |
|
.. note:: |
|
Considered for removal as it's quite inconsistent wrt other opcodes |
(could emulate with UIF/BRK/ENDIF). |
|
|
.. opcode:: IF - Float If |
|
Start an IF ... ELSE ... ENDIF block. Condition evaluates to true if
|
src0.x != 0.0 |
|
where src0.x is interpreted as a floating point register. |
|
|
.. opcode:: UIF - Bitwise If |
|
Start a UIF ... ELSE ... ENDIF block. Condition evaluates to true if
|
src0.x != 0 |
|
where src0.x is interpreted as an integer register. |
|
|
.. opcode:: ELSE - Else |
|
Starts an else block, after an IF or UIF statement. |
|
|
.. opcode:: ENDIF - End If |
|
Ends an IF or UIF block. |
|
|
.. opcode:: SWITCH - Switch |
|
Starts a C-style switch expression. The switch consists of one or more
CASE statements, and at most one DEFAULT statement. Execution of a statement
ends when a BRK is hit, but just like in C, falling through to other cases
without a break is allowed. Similarly, the DEFAULT label is allowed anywhere,
not just as the last statement, and fallthrough is allowed into and from it.
CASE src arguments are evaluated at bit level against the SWITCH src argument.
|
Example: |
SWITCH src[0].x |
CASE src[0].x |
(some instructions here) |
(optional BRK here) |
DEFAULT |
(some instructions here) |
(optional BRK here) |
CASE src[0].x |
(some instructions here) |
(optional BRK here) |
ENDSWITCH |
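
The fallthrough rules can be modelled with a small Python interpreter. This is
only a sketch under simplifying assumptions: the tuple encoding of the switch
body is invented for the example, nested control flow is ignored, and BRK is
treated as always ending the switch.

```python
def run_switch(selector, body):
    """Return the names of the OP entries executed for a given selector.

    body is a list of ("CASE", value), ("DEFAULT",), ("BRK",) or
    ("OP", name) tuples, in program order.
    """
    start = None
    default = None
    for index, item in enumerate(body):
        if item[0] == "CASE" and item[1] == selector and start is None:
            start = index
        elif item[0] == "DEFAULT":
            default = index
    if start is None:
        start = default  # no CASE matched: fall back to DEFAULT, if any
    if start is None:
        return []
    executed = []
    for item in body[start:]:
        if item[0] == "BRK":
            break
        if item[0] == "OP":  # labels are skipped, so execution falls through
            executed.append(item[1])
    return executed


body = [("CASE", 1), ("OP", "a"), ("DEFAULT",), ("OP", "b"), ("BRK",),
        ("CASE", 2), ("OP", "c")]
```

With this body, a selector of 1 runs ``a`` and then falls through into the
DEFAULT block, while an unmatched selector starts at DEFAULT directly.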
|
|
.. opcode:: CASE - Switch case |
|
This represents a switch case label. The src arg must be an integer immediate. |
|
|
.. opcode:: DEFAULT - Switch default |
|
This represents the default case in the switch, which is taken if no other |
case matches. |
|
|
.. opcode:: ENDSWITCH - End of switch |
|
Ends a switch expression. |
|
|
.. opcode:: NRM4 - 4-component Vector Normalise |
|
This instruction replicates its result. |
|
.. math:: |
|
dst = \frac{src.x}{\sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z + src.w \times src.w}}
|
|
.. _doubleopcodes: |
|
Double ISA |
^^^^^^^^^^^^^^^ |
|
The double-precision opcodes reinterpret four-component vectors into |
two-component vectors with doubled precision in each component. |
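
One way to picture this reinterpretation is with Python's ``struct`` module.
The sketch assumes 32-bit components and IEEE 754 doubles, and it further
assumes that the lower-numbered component of each pair holds the low 32 bits;
that ordering is an assumption made for illustration and is not stated in
this document.

```python
import struct


def pack_double(value):
    """Split a double into two 32-bit words: (low, high)."""
    low, high = struct.unpack("<II", struct.pack("<d", value))
    return low, high


def unpack_double(low, high):
    """Rebuild a double from two 32-bit words."""
    return struct.unpack("<d", struct.pack("<II", low, high))[0]


# Under the stated assumption, dst.xy for the value 1.0 would hold these words.
xy_words = pack_double(1.0)
```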
|
XXX: support for these opcodes is undecided.
|
.. opcode:: DADD - Add |
|
.. math:: |
|
dst.xy = src0.xy + src1.xy |
|
dst.zw = src0.zw + src1.zw |
|
|
.. opcode:: DDIV - Divide |
|
.. math:: |
|
dst.xy = src0.xy / src1.xy |
|
dst.zw = src0.zw / src1.zw |
|
.. opcode:: DSEQ - Set on Equal |
|
.. math:: |
|
dst.xy = src0.xy == src1.xy ? 1.0F : 0.0F |
|
dst.zw = src0.zw == src1.zw ? 1.0F : 0.0F |
|
.. opcode:: DSLT - Set on Less than |
|
.. math:: |
|
dst.xy = src0.xy < src1.xy ? 1.0F : 0.0F |
|
dst.zw = src0.zw < src1.zw ? 1.0F : 0.0F |
|
.. opcode:: DFRAC - Fraction |
|
.. math:: |
|
dst.xy = src.xy - \lfloor src.xy\rfloor |
|
dst.zw = src.zw - \lfloor src.zw\rfloor |
|
|
.. opcode:: DFRACEXP - Convert Number to Fractional and Integral Components |
|
Like the ``frexp()`` routine in many math libraries, this opcode stores the |
exponent of its source to ``dst0``, and the significand to ``dst1``, such that |
:math:`dst1 \times 2^{dst0} = src` . |
|
.. math:: |
|
dst0.xy = exp(src.xy) |
|
dst1.xy = frac(src.xy) |
|
dst0.zw = exp(src.zw) |
|
dst1.zw = frac(src.zw) |
|
.. opcode:: DLDEXP - Multiply Number by Integral Power of 2 |
|
This opcode is the inverse of :opcode:`DFRACEXP`. |
|
.. math:: |
|
dst.xy = src0.xy \times 2^{src1.xy} |
|
dst.zw = src0.zw \times 2^{src1.zw} |
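
Python's ``math.frexp`` and ``math.ldexp`` behave like the pair of opcodes
above for a single double. In this sketch the function names are
hypothetical, and the significand range of [0.5, 1) is Python's convention
rather than something this document specifies.

```python
import math


def dfracexp(value):
    """Return (dst0, dst1) = (exponent, significand) for one double."""
    significand, exponent = math.frexp(value)
    return exponent, significand


def dldexp(value, exponent):
    """Return value * 2**exponent, the inverse of dfracexp."""
    return math.ldexp(value, exponent)


exponent, significand = dfracexp(8.0)
round_trip = dldexp(significand, exponent)
```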
|
.. opcode:: DMIN - Minimum |
|
.. math:: |
|
dst.xy = min(src0.xy, src1.xy) |
|
dst.zw = min(src0.zw, src1.zw) |
|
.. opcode:: DMAX - Maximum |
|
.. math:: |
|
dst.xy = max(src0.xy, src1.xy) |
|
dst.zw = max(src0.zw, src1.zw) |
|
.. opcode:: DMUL - Multiply |
|
.. math:: |
|
dst.xy = src0.xy \times src1.xy |
|
dst.zw = src0.zw \times src1.zw |
|
|
.. opcode:: DMAD - Multiply And Add |
|
.. math:: |
|
dst.xy = src0.xy \times src1.xy + src2.xy |
|
dst.zw = src0.zw \times src1.zw + src2.zw |
|
|
.. opcode:: DRCP - Reciprocal |
|
.. math:: |
|
dst.xy = \frac{1}{src.xy} |
|
dst.zw = \frac{1}{src.zw} |
|
.. opcode:: DSQRT - Square Root |
|
.. math:: |
|
dst.xy = \sqrt{src.xy} |
|
dst.zw = \sqrt{src.zw} |
|
|
.. _samplingopcodes: |
|
Resource Sampling Opcodes |
^^^^^^^^^^^^^^^^^^^^^^^^^ |
|
These opcodes closely follow the semantics of the respective Direct3D
instructions. If in doubt, double-check the Direct3D documentation.
|
.. opcode:: SAMPLE - Using the provided address, sample data from the
specified texture using the filtering mode identified
by the given sampler. The source data may come from
any resource type other than buffers. |
SAMPLE dst, address, sampler_view, sampler |
e.g. |
SAMPLE TEMP[0], TEMP[1], SVIEW[0], SAMP[0] |
|
.. opcode:: SAMPLE_I - Simplified alternative to the SAMPLE instruction. |
Using the provided integer address, SAMPLE_I fetches data |
from the specified sampler view without any filtering. |
The source data may come from any resource type other |
than CUBE. |
SAMPLE_I dst, address, sampler_view |
e.g. |
SAMPLE_I TEMP[0], TEMP[1], SVIEW[0] |
The 'address' is specified as unsigned integers. If the
'address' is out of range [0...(# texels - 1)], the
result of the fetch is always 0 in all components.
As such, the instruction doesn't honor address wrap
modes; in cases where that behavior is desirable, the
SAMPLE instruction should be used.
address.w always provides an unsigned integer mipmap
level. If the value is out of range, the
instruction always returns 0 in all components.
address.yz are ignored for buffers and 1D textures.
address.z is ignored for 1D texture arrays and 2D
textures.
For 1D texture arrays address.y provides the array |
index (also as unsigned integer). If the value is |
out of the range of available array indices |
[0... (array size - 1)] then the opcode always returns |
0 in all components. |
For 2D texture arrays address.z provides the array |
index, otherwise it exhibits the same behavior as in |
the case for 1D texture arrays. |
The exact semantics of the source address are presented |
in the table below: |
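+-----------------------+---+-----+-----+---------+ |
| resource type         | X | Y   | Z   | W       | |
+=======================+===+=====+=====+=========+ |
| PIPE_BUFFER           | x |     |     | ignored | |
+-----------------------+---+-----+-----+---------+ |
| PIPE_TEXTURE_1D       | x |     |     | mpl     | |
+-----------------------+---+-----+-----+---------+ |
| PIPE_TEXTURE_2D       | x | y   |     | mpl     | |
+-----------------------+---+-----+-----+---------+ |
| PIPE_TEXTURE_3D       | x | y   | z   | mpl     | |
+-----------------------+---+-----+-----+---------+ |
| PIPE_TEXTURE_RECT     | x | y   |     | mpl     | |
+-----------------------+---+-----+-----+---------+ |
| PIPE_TEXTURE_CUBE     | not allowed as source   | |
+-----------------------+---+-----+-----+---------+ |
| PIPE_TEXTURE_1D_ARRAY | x | idx |     | mpl     | |
+-----------------------+---+-----+-----+---------+ |
| PIPE_TEXTURE_2D_ARRAY | x | y   | idx | mpl     | |
+-----------------------+---+-----+-----+---------+ |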
|
Where 'mpl' is a mipmap level and 'idx' is the |
array index. |
|
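The SAMPLE_I addressing rules can be sketched as a small Python reference model. This is only an illustration: `sample_i`, `LAYOUT`, `level_sizes`, `array_size` and the `read_texel` callback are hypothetical names, not part of any Gallium API.

```python
ZERO = (0, 0, 0, 0)

# Which address components carry texel coordinates, the array index and the
# mipmap level for each resource type, following the SAMPLE_I description.
LAYOUT = {
    "PIPE_BUFFER":           {"coords": "x",   "idx": None, "mip": False},
    "PIPE_TEXTURE_1D":       {"coords": "x",   "idx": None, "mip": True},
    "PIPE_TEXTURE_2D":       {"coords": "xy",  "idx": None, "mip": True},
    "PIPE_TEXTURE_3D":       {"coords": "xyz", "idx": None, "mip": True},
    "PIPE_TEXTURE_RECT":     {"coords": "xy",  "idx": None, "mip": True},
    "PIPE_TEXTURE_1D_ARRAY": {"coords": "x",   "idx": "y",  "mip": True},
    "PIPE_TEXTURE_2D_ARRAY": {"coords": "xy",  "idx": "z",  "mip": True},
}


def sample_i(resource_type, address, level_sizes, array_size, read_texel):
    """Fetch one texel; any out-of-range input yields 0 in all components.

    level_sizes[mip] is a tuple with the texel count along each axis.
    read_texel(mip, idx, coords) is a hypothetical raw fetch callback.
    """
    if resource_type not in LAYOUT:
        # PIPE_TEXTURE_CUBE is not allowed as a source.
        raise ValueError("unsupported resource type: " + resource_type)
    layout = LAYOUT[resource_type]
    addr = dict(zip("xyzw", address))

    # address.w selects the mipmap level (ignored for buffers).
    mip = addr["w"] if layout["mip"] else 0
    if not 0 <= mip < len(level_sizes):
        return ZERO

    coords = tuple(addr[c] for c in layout["coords"])
    if any(not 0 <= c < n for c, n in zip(coords, level_sizes[mip])):
        return ZERO

    idx = 0
    if layout["idx"] is not None:
        idx = addr[layout["idx"]]
        if not 0 <= idx < array_size:
            return ZERO
    return read_texel(mip, idx, coords)
```

Note that components the resource type does not use (for example address.z of a 2D texture) never take part in the range checks.
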
.. opcode:: SAMPLE_I_MS - Just like SAMPLE_I but allows fetching data from |
multi-sampled surfaces. |
SAMPLE_I_MS dst, address, sampler_view, sample |
|
.. opcode:: SAMPLE_B - Just like the SAMPLE instruction with the |
exception that an additional bias is applied to the |
level of detail computed as part of the instruction |
execution. |
SAMPLE_B dst, address, sampler_view, sampler, lod_bias |
e.g. |
SAMPLE_B TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2].x |
|
.. opcode:: SAMPLE_C - Similar to the SAMPLE instruction but it |
performs a comparison filter. The operands to SAMPLE_C |
are identical to SAMPLE, except that there is an additional |
float32 operand, reference value, which must be a register |
with single-component, or a scalar literal. |
SAMPLE_C makes the hardware use the current sampler's |
compare_func (in pipe_sampler_state) to compare |
reference value against the red component value for the |
source resource at each texel that the currently configured |
texture filter covers based on the provided coordinates. |
SAMPLE_C dst, address, sampler_view.r, sampler, ref_value |
e.g. |
SAMPLE_C TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x |
|
.. opcode:: SAMPLE_C_LZ - Same as SAMPLE_C, but LOD is 0 and derivatives |
are ignored. The LZ stands for level-zero. |
SAMPLE_C_LZ dst, address, sampler_view.r, sampler, ref_value |
e.g. |
SAMPLE_C_LZ TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x |
|
|
.. opcode:: SAMPLE_D - SAMPLE_D is identical to the SAMPLE opcode except |
that the derivatives for the source address in the x |
direction and the y direction are provided by extra |
parameters. |
SAMPLE_D dst, address, sampler_view, sampler, der_x, der_y |
e.g. |
SAMPLE_D TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2], TEMP[3] |
|
.. opcode:: SAMPLE_L - SAMPLE_L is identical to the SAMPLE opcode except |
that the LOD is provided directly as a scalar value, |
representing no anisotropy. |
SAMPLE_L dst, address, sampler_view, sampler, explicit_lod |
e.g. |
SAMPLE_L TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2].x |
|
.. opcode:: GATHER4 - Gathers the four texels to be used in a bi-linear |
filtering operation and packs them into a single register. |
Only works with 2D, 2D array, cubemap, and cubemap array textures. |
For 2D textures, only the addressing modes of the sampler and |
the top level of any mip pyramid are used. Set W to zero. |
It behaves like the SAMPLE instruction, but a filtered |
sample is not generated. The four samples that contribute |
to filtering are placed into xyzw in counter-clockwise order, |
starting with the (u,v) texture coordinate delta at the |
following locations (-, +), (+, +), (+, -), (-, -), where |
the magnitude of the deltas is half a texel. |
|
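To make the GATHER4 component order concrete, this Python sketch lists the four sample locations around a coordinate. It assumes (u, v) is expressed in texel units, and the names `GATHER4_DELTAS` and `gather4_locations` are illustrative only.

```python
# (u, v) deltas for each destination component, in the order given in the
# GATHER4 description: (-, +), (+, +), (+, -), (-, -), each half a texel.
GATHER4_DELTAS = {
    "x": (-0.5, +0.5),
    "y": (+0.5, +0.5),
    "z": (+0.5, -0.5),
    "w": (-0.5, -0.5),
}


def gather4_locations(u, v):
    """Return the location sampled into each of dst.xyzw.

    Assumption: u and v are given in texel units, not normalized.
    """
    return {comp: (u + du, v + dv) for comp, (du, dv) in GATHER4_DELTAS.items()}
```
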
|
.. opcode:: SVIEWINFO - query the dimensions of a given sampler view. |
dst receives width, height, depth or array size and |
number of mipmap levels as int4. The dst can have a writemask |
which specifies what info the caller is interested |
in. |
SVIEWINFO dst, src_mip_level, sampler_view |
e.g. |
SVIEWINFO TEMP[0], TEMP[1].x, SVIEW[0] |
src_mip_level is an unsigned integer scalar. If it's |
out of range then 0 is returned for width, height and |
depth/array size, but the total number of mipmap levels is |
still returned correctly for the given sampler view. |
The returned width, height and depth values are for |
the mipmap level selected by the src_mip_level and |
are in the number of texels. |
For 1d texture array width is in dst.x, array size |
is in dst.y and dst.zw are always 0. |
|
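The SVIEWINFO behaviour for a plain 3D texture might be modelled as follows. The function name is hypothetical, and the halving rule used for the per-level size (each level is half the previous one, never below one texel) is the usual mipmap convention, assumed here rather than taken from this section.

```python
def sviewinfo_3d(base_size, num_levels, src_mip_level):
    """Return (width, height, depth, mipmap level count) for a 3D texture.

    base_size is the (width, height, depth) of level 0 in texels.
    """
    if not 0 <= src_mip_level < num_levels:
        # Out-of-range level: sizes are 0, the level count is still reported.
        return (0, 0, 0, num_levels)
    # Assumption: standard mip chain, halving each axis per level.
    width, height, depth = (max(1, n >> src_mip_level) for n in base_size)
    return (width, height, depth, num_levels)
```
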
.. opcode:: SAMPLE_POS - query the position of a given sample. |
dst receives float4 (x, y, 0, 0) indicating where the |
sample is located. If the resource is not a multi-sample |
resource and not a render target, the result is 0. |
|
.. opcode:: SAMPLE_INFO - dst receives number of samples in x. |
If the resource is not a multi-sample resource and |
not a render target, the result is 0. |
|
|
.. _resourceopcodes: |
|
Resource Access Opcodes |
^^^^^^^^^^^^^^^^^^^^^^^ |
|
.. opcode:: LOAD - Fetch data from a shader resource |
|
Syntax: ``LOAD dst, resource, address`` |
|
Example: ``LOAD TEMP[0], RES[0], TEMP[1]`` |
|
Using the provided integer address, LOAD fetches data |
from the specified buffer or texture without any |
filtering. |
|
The 'address' is specified as a vector of unsigned |
integers. If the 'address' is out of range the result |
is unspecified. |
|
Only the first mipmap level of a resource can be read |
from using this instruction. |
|
For 1D or 2D texture arrays, the array index is |
provided as an unsigned integer in address.y or |
address.z, respectively. address.yz are ignored for |
buffers and 1D textures. address.z is ignored for 1D |
texture arrays and 2D textures. address.w is always |
ignored. |
|
.. opcode:: STORE - Write data to a shader resource |
|
Syntax: ``STORE resource, address, src`` |
|
Example: ``STORE RES[0], TEMP[0], TEMP[1]`` |
|
Using the provided integer address, STORE writes data |
to the specified buffer or texture. |
|
The 'address' is specified as a vector of unsigned |
integers. If the 'address' is out of range the result |
is unspecified. |
|
Only the first mipmap level of a resource can be |
written to using this instruction. |
|
For 1D or 2D texture arrays, the array index is |
provided as an unsigned integer in address.y or |
address.z, respectively. address.yz are ignored for |
buffers and 1D textures. address.z is ignored for 1D |
texture arrays and 2D textures. address.w is always |
ignored. |
|
|
.. _threadsyncopcodes: |
|
Inter-thread synchronization opcodes |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
|
These opcodes are intended for communication between threads running |
within the same compute grid. For now they're only valid in compute |
programs. |
|
.. opcode:: MFENCE - Memory fence |
|
Syntax: ``MFENCE resource`` |
|
Example: ``MFENCE RES[0]`` |
|
This opcode forces strong ordering between any memory access |
operations that affect the specified resource. This means that |
previous loads and stores (and only those) will be performed and |
visible to other threads before the program execution continues. |
|
|
.. opcode:: LFENCE - Load memory fence |
|
Syntax: ``LFENCE resource`` |
|
Example: ``LFENCE RES[0]`` |
|
Similar to MFENCE, but it only affects the ordering of memory loads. |
|
|
.. opcode:: SFENCE - Store memory fence |
|
Syntax: ``SFENCE resource`` |
|
Example: ``SFENCE RES[0]`` |
|
Similar to MFENCE, but it only affects the ordering of memory stores. |
|
|
.. opcode:: BARRIER - Thread group barrier |
|
``BARRIER`` |
|
This opcode suspends the execution of the current thread until all |
the remaining threads in the working group reach the same point of |
the program. Results are unspecified if any of the remaining |
threads terminates or never reaches an executed BARRIER instruction. |
|
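The effect of BARRIER can be imitated on the CPU with Python's `threading.Barrier`. This is only an analogy under the assumption that each Python thread stands in for one thread of the working group; `run_group` and its variables are made-up names.

```python
import threading


def run_group(num_threads):
    """Each thread publishes a value, waits at the barrier, then reads
    the value published by its neighbour."""
    shared = [0] * num_threads
    results = [0] * num_threads
    barrier = threading.Barrier(num_threads)

    def worker(tid):
        shared[tid] = tid * 10          # write before the barrier
        barrier.wait()                  # plays the role of BARRIER
        results[tid] = shared[(tid + 1) % num_threads]  # read after it

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Without the barrier, a thread could read its neighbour's slot before the neighbour has written it. If one thread never reached the barrier, the others would wait forever here, which mirrors why the opcode leaves that case unspecified.
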
|
.. _atomopcodes: |
|
Atomic opcodes |
^^^^^^^^^^^^^^ |
|
These opcodes provide atomic variants of some common arithmetic and |
logical operations. In this context atomicity means that another |
concurrent memory access operation that affects the same memory |
location is guaranteed to be performed strictly before or after the |
entire execution of the atomic operation. |
|
For the moment they're only valid in compute programs. |
|
.. opcode:: ATOMUADD - Atomic integer addition |
|
Syntax: ``ATOMUADD dst, resource, offset, src`` |
|
Example: ``ATOMUADD TEMP[0], RES[0], TEMP[1], TEMP[2]`` |
|
The following operation is performed atomically on each component: |
|
.. math:: |
|
dst_i = resource[offset]_i |
|
resource[offset]_i = dst_i + src_i |
|
|
.. opcode:: ATOMXCHG - Atomic exchange |
|
Syntax: ``ATOMXCHG dst, resource, offset, src`` |
|
Example: ``ATOMXCHG TEMP[0], RES[0], TEMP[1], TEMP[2]`` |
|
The following operation is performed atomically on each component: |
|
.. math:: |
|
dst_i = resource[offset]_i |
|
resource[offset]_i = src_i |
|
|
.. opcode:: ATOMCAS - Atomic compare-and-exchange |
|
Syntax: ``ATOMCAS dst, resource, offset, cmp, src`` |
|
Example: ``ATOMCAS TEMP[0], RES[0], TEMP[1], TEMP[2], TEMP[3]`` |
|
The following operation is performed atomically on each component: |
|
.. math:: |
|
dst_i = resource[offset]_i |
|
resource[offset]_i = (dst_i == cmp_i ? src_i : dst_i) |
|
|
.. opcode:: ATOMAND - Atomic bitwise And |
|
Syntax: ``ATOMAND dst, resource, offset, src`` |
|
Example: ``ATOMAND TEMP[0], RES[0], TEMP[1], TEMP[2]`` |
|
The following operation is performed atomically on each component: |
|
.. math:: |
|
dst_i = resource[offset]_i |
|
resource[offset]_i = dst_i \& src_i |
|
|
.. opcode:: ATOMOR - Atomic bitwise Or |
|
Syntax: ``ATOMOR dst, resource, offset, src`` |
|
Example: ``ATOMOR TEMP[0], RES[0], TEMP[1], TEMP[2]`` |
|
The following operation is performed atomically on each component: |
|
.. math:: |
|
dst_i = resource[offset]_i |
|
resource[offset]_i = dst_i | src_i |
|
|
.. opcode:: ATOMXOR - Atomic bitwise Xor |
|
Syntax: ``ATOMXOR dst, resource, offset, src`` |
|
Example: ``ATOMXOR TEMP[0], RES[0], TEMP[1], TEMP[2]`` |
|
The following operation is performed atomically on each component: |
|
.. math:: |
|
dst_i = resource[offset]_i |
|
resource[offset]_i = dst_i \oplus src_i |
|
|
.. opcode:: ATOMUMIN - Atomic unsigned minimum |
|
Syntax: ``ATOMUMIN dst, resource, offset, src`` |
|
Example: ``ATOMUMIN TEMP[0], RES[0], TEMP[1], TEMP[2]`` |
|
The following operation is performed atomically on each component: |
|
.. math:: |
|
dst_i = resource[offset]_i |
|
resource[offset]_i = (dst_i < src_i ? dst_i : src_i) |
|
|
.. opcode:: ATOMUMAX - Atomic unsigned maximum |
|
Syntax: ``ATOMUMAX dst, resource, offset, src`` |
|
Example: ``ATOMUMAX TEMP[0], RES[0], TEMP[1], TEMP[2]`` |
|
The following operation is performed atomically on each component: |
|
.. math:: |
|
dst_i = resource[offset]_i |
|
resource[offset]_i = (dst_i > src_i ? dst_i : src_i) |
|
|
.. opcode:: ATOMIMIN - Atomic signed minimum |
|
Syntax: ``ATOMIMIN dst, resource, offset, src`` |
|
Example: ``ATOMIMIN TEMP[0], RES[0], TEMP[1], TEMP[2]`` |
|
The following operation is performed atomically on each component: |
|
.. math:: |
|
dst_i = resource[offset]_i |
|
resource[offset]_i = (dst_i < src_i ? dst_i : src_i) |
|
|
.. opcode:: ATOMIMAX - Atomic signed maximum |
|
Syntax: ``ATOMIMAX dst, resource, offset, src`` |
|
Example: ``ATOMIMAX TEMP[0], RES[0], TEMP[1], TEMP[2]`` |
|
The following operation is performed atomically on each component: |
|
.. math:: |
|
dst_i = resource[offset]_i |
|
resource[offset]_i = (dst_i > src_i ? dst_i : src_i) |
|
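The atomic opcodes above all follow the same read-modify-write pattern, which the following Python model tries to capture for a single component. It assumes 32-bit words and uses a lock to stand in for hardware atomicity; the `Resource` class and `OPS` table are hypothetical.

```python
import threading

MASK = 0xFFFFFFFF  # assumption: each component is one 32-bit word


def to_signed(value):
    """Reinterpret a 32-bit word as a signed integer."""
    value &= MASK
    return value - (1 << 32) if value & 0x80000000 else value


# New memory value as a function of the old value and src.
OPS = {
    "ATOMUADD": lambda old, src: (old + src) & MASK,
    "ATOMXCHG": lambda old, src: src,
    "ATOMAND":  lambda old, src: old & src,
    "ATOMOR":   lambda old, src: old | src,
    "ATOMXOR":  lambda old, src: old ^ src,
    "ATOMUMIN": lambda old, src: min(old, src),
    "ATOMUMAX": lambda old, src: max(old, src),
    "ATOMIMIN": lambda old, src: old if to_signed(old) < to_signed(src) else src,
    "ATOMIMAX": lambda old, src: old if to_signed(old) > to_signed(src) else src,
}


class Resource:
    def __init__(self, words):
        self.words = [w & MASK for w in words]
        self._lock = threading.Lock()

    def atomic(self, opcode, offset, src):
        """Apply opcode at offset and return the previous value (dst)."""
        with self._lock:
            old = self.words[offset]
            self.words[offset] = OPS[opcode](old, src & MASK)
            return old

    def atomcas(self, offset, cmp, src):
        """ATOMCAS: store src only if the current value equals cmp."""
        with self._lock:
            old = self.words[offset]
            if old == cmp & MASK:
                self.words[offset] = src & MASK
            return old
```

The unsigned and signed minimum/maximum variants differ only in how the stored bit pattern is interpreted: `0xFFFFFFFF` is the largest unsigned value but equals -1 when read as signed.
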
|
|
Explanation of symbols used |
------------------------------ |
|
|
Functions |
^^^^^^^^^^^^^^ |
|
|
:math:`|x|` Absolute value of `x`. |
|
:math:`\lceil x \rceil` Ceiling of `x`. |
|
clamp(x,y,z) Clamp x between y and z. |
(x < y) ? y : (x > z) ? z : x |
|
:math:`\lfloor x\rfloor` Floor of `x`. |
|
:math:`\log_2{x}` Logarithm of `x`, base 2. |
|
max(x,y) Maximum of x and y. |
(x > y) ? x : y |
|
min(x,y) Minimum of x and y. |
(x < y) ? x : y |
|
partialx(x) Derivative of x relative to fragment's X. |
|
partialy(x) Derivative of x relative to fragment's Y. |
|
pop() Pop from stack. |
|
:math:`x^y` `x` to the power `y`. |
|
push(x) Push x on stack. |
|
round(x) Round x. |
|
trunc(x) Truncate x, i.e. drop the fraction bits. |
|
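For readers who prefer code to notation, several of these functions can be written directly in Python. The `tgsi_` prefixes are only there to avoid shadowing Python built-ins. `round(x)` is left out because its rounding mode is not specified here.

```python
import math


def clamp(x, y, z):
    """Clamp x between y and z: (x < y) ? y : (x > z) ? z : x."""
    return y if x < y else z if x > z else x


def tgsi_max(x, y):
    return x if x > y else y


def tgsi_min(x, y):
    return x if x < y else y


def trunc(x):
    """Drop the fraction bits, i.e. round toward zero."""
    return float(math.trunc(x))
```

The difference between floor and trunc only shows for negative inputs: `math.floor(-1.7)` is -2 while `trunc(-1.7)` is -1.0.
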
|
Keywords |
^^^^^^^^^^^^^ |
|
|
discard Discard fragment. |
|
pc Program counter. |
|
target Label of target instruction. |
|
|
Other tokens |
--------------- |
|
|
Declaration |
^^^^^^^^^^^ |
|
|
Declares a register that will be referenced as an operand in Instruction |
tokens. |
|
The File field contains the register file that is being declared and is one |
of TGSI_FILE. |
|
The UsageMask field specifies which of the register components can be accessed |
and is one of TGSI_WRITEMASK. |
|
The Local flag specifies that a given value isn't intended for |
subroutine parameter passing and, as a result, the implementation |
isn't required to give any guarantees of it being preserved across |
subroutine boundaries. As it's merely a compiler hint, the |
implementation is free to ignore it. |
|
If Dimension flag is set to 1, a Declaration Dimension token follows. |
|
If Semantic flag is set to 1, a Declaration Semantic token follows. |
|
If Interpolate flag is set to 1, a Declaration Interpolate token follows. |
|
If file is TGSI_FILE_RESOURCE, a Declaration Resource token follows. |
|
If Array flag is set to 1, a Declaration Array token follows. |
|
Array Declaration |
^^^^^^^^^^^^^^^^^^^^^^^^ |
|
Declarations can optionally have an ArrayID attribute which can be referred to by |
indirect addressing operands. An ArrayID of zero is reserved and treated as |
if no ArrayID is specified. |
|
If an indirect addressing operand refers to a specific declaration by using |
an ArrayID, only the registers in this declaration are guaranteed to be |
accessed; accessing any register outside this declaration results in undefined |
behavior. Note that for compatibility the effective index is zero-based and |
not relative to the specified declaration. |
|
If no ArrayID is specified with an indirect addressing operand the whole |
register file might be accessed by this operand. This is strongly discouraged |
and will prevent packing of scalar/vec2 arrays and effective alias analysis. |
|
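The rule that the effective index is zero-based, not relative to the declaration, can be sketched as below. The function and parameter names are hypothetical, and the index is assumed to be the operand's constant index plus the address register value. Undefined behavior is modelled as an exception purely for illustration.

```python
def resolve_indirect(const_index, addr_value, decl_first, decl_last):
    """Resolve an indirect operand that names an ArrayID.

    decl_first and decl_last are the first and last register of the
    declaration carrying that ArrayID, as absolute indices in the file.
    """
    # The effective index is absolute within the register file.
    index = const_index + addr_value
    if not decl_first <= index <= decl_last:
        # Real hardware gives undefined behavior; raise here to flag it.
        raise IndexError("access outside the declared array")
    return index
```

So for an array declared over registers 4 to 7, an operand with constant index 4 and an address value of 2 touches register 6, not register 2.
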
Declaration Semantic |
^^^^^^^^^^^^^^^^^^^^^^^^ |
|
Vertex and fragment shader input and output registers may be labeled |
with semantic information consisting of a name and index. |
|
Follows Declaration token if Semantic bit is set. |
|
Since its purpose is to link a shader with other stages of the pipeline, |
it is valid to follow only those Declaration tokens that declare a register |
either in INPUT or OUTPUT file. |
|
SemanticName field contains the semantic name of the register being declared. |
There is no default value. |
|
SemanticIndex is an optional subscript that can be used to distinguish |
different register declarations with the same semantic name. The default value |
is 0. |
|
The meanings of the individual semantic names are explained in the following |
sections. |
|
TGSI_SEMANTIC_POSITION |
"""""""""""""""""""""" |
|
For vertex shaders, TGSI_SEMANTIC_POSITION indicates the vertex shader |
output register which contains the homogeneous vertex position in the clip |
space coordinate system. After clipping, the X, Y and Z components of the |
vertex will be divided by the W value to get normalized device coordinates. |
|
For fragment shaders, TGSI_SEMANTIC_POSITION is used to indicate that |
fragment shader input contains the fragment's window position. The X |
component starts at zero and always increases from left to right. |
The Y component starts at zero and always increases but Y=0 may either |
indicate the top of the window or the bottom depending on the fragment |
coordinate origin convention (see TGSI_PROPERTY_FS_COORD_ORIGIN). |
The Z coordinate ranges from 0 to 1 to represent depth from the front |
to the back of the Z buffer. The W component contains the reciprocal |
of the interpolated vertex position W component. |
|
Fragment shaders may also declare an output register with |
TGSI_SEMANTIC_POSITION. Only the Z component is writable. This allows |
the fragment shader to change the fragment's Z position. |
|
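The two uses of W described above amount to simple arithmetic, shown here as a Python sketch with hypothetical helper names. Clipping and the viewport transform are deliberately left out.

```python
def clip_to_ndc(clip_position):
    """Divide X, Y and Z by W to get normalized device coordinates."""
    x, y, z, w = clip_position
    return (x / w, y / w, z / w)


def fragment_w(interpolated_clip_w):
    """W seen by the fragment shader: reciprocal of the interpolated W."""
    return 1.0 / interpolated_clip_w
```
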
|
|
TGSI_SEMANTIC_COLOR |
""""""""""""""""""" |
|
For vertex shader outputs or fragment shader inputs/outputs, this |
label indicates that the register contains an R,G,B,A color. |
|
Several shader inputs/outputs may contain colors so the semantic index |
is used to distinguish them. For example, color[0] may be the diffuse |
color while color[1] may be the specular color. |
|
This label is needed so that the flat/smooth shading can be applied |
to the right interpolants during rasterization. |
|
|
|
TGSI_SEMANTIC_BCOLOR |
"""""""""""""""""""" |
|
Back-facing colors are only used for back-facing polygons, and are only valid |
in vertex shader outputs. After rasterization, all polygons are front-facing |
and COLOR and BCOLOR end up occupying the same slots in the fragment shader, |
so all BCOLORs effectively become regular COLORs in the fragment shader. |
|
|
TGSI_SEMANTIC_FOG |
""""""""""""""""" |
|
Vertex shader inputs and outputs and fragment shader inputs may be |
labeled with TGSI_SEMANTIC_FOG to indicate that the register contains |
a fog coordinate in the form (F, 0, 0, 1). Typically, the fragment |
shader will use the fog coordinate to compute a fog blend factor which |
is used to blend the normal fragment color with a constant fog color. |
|
Only the first component matters when writing from the vertex shader; |
the driver will ensure that the coordinate is in this format when used |
as a fragment shader input. |
|
|
TGSI_SEMANTIC_PSIZE |
""""""""""""""""""" |
|
Vertex shader input and output registers may be labeled with |
TGSI_SEMANTIC_PSIZE to indicate that the register contains a point size |
in the form (S, 0, 0, 1). The point size controls the width or diameter |
of points for rasterization. This label cannot be used in fragment |
shaders. |
|
When using this semantic, be sure to set the appropriate state in the |
:ref:`rasterizer` first. |
|
|
TGSI_SEMANTIC_TEXCOORD |
"""""""""""""""""""""" |
|
Only available if PIPE_CAP_TGSI_TEXCOORD is exposed. |
|
Vertex shader outputs and fragment shader inputs may be labeled with |
this semantic to make them replaceable by sprite coordinates via the |
sprite_coord_enable state in the :ref:`rasterizer`. |
The semantic index permitted with this semantic is limited to <= 7. |
|
If the driver does not support TEXCOORD, sprite coordinate replacement |
applies to inputs with the GENERIC semantic instead. |
|
The intended use case for this semantic is gl_TexCoord. |
|
|
TGSI_SEMANTIC_PCOORD |
"""""""""""""""""""" |
|
Only available if PIPE_CAP_TGSI_TEXCOORD is exposed. |
|
Fragment shader inputs may be labeled with TGSI_SEMANTIC_PCOORD to indicate |
that the register contains sprite coordinates in the form (x, y, 0, 1), if |
the current primitive is a point and point sprites are enabled. Otherwise, |
the contents of the register are undefined. |
|
The intended use case for this semantic is gl_PointCoord. |
|
|
TGSI_SEMANTIC_GENERIC |
""""""""""""""""""""" |
|
All vertex/fragment shader inputs/outputs not labeled with any other |
semantic label can be considered to be generic attributes. Typical |
uses of generic inputs/outputs are texcoords and user-defined values. |
|
|
TGSI_SEMANTIC_NORMAL |
"""""""""""""""""""" |
|
Indicates that a vertex shader input is a normal vector. This is |
typically only used for legacy graphics APIs. |
|
|
TGSI_SEMANTIC_FACE |
"""""""""""""""""" |
|
This label applies to fragment shader inputs only and indicates that |
the register contains front/back-face information of the form (F, 0, |
0, 1). The first component will be positive when the fragment belongs |
to a front-facing polygon, and negative when the fragment belongs to a |
back-facing polygon. |
|
|
TGSI_SEMANTIC_EDGEFLAG |
"""""""""""""""""""""" |
|
For vertex shaders, this semantic label indicates that an input or |
output is a boolean edge flag. The register layout is [F, x, x, x] |
where F is 0.0 or 1.0 and x = don't care. Normally, the vertex shader |
simply copies the edge flag input to the edgeflag output. |
|
Edge flags are used to control which lines or points are actually |
drawn when the polygon mode converts triangles/quads/polygons into |
points or lines. |
|
|
TGSI_SEMANTIC_STENCIL |
""""""""""""""""""""" |
|
For fragment shaders, this semantic label indicates that an output |
is a writable stencil reference value. Only the Y component is writable. |
This allows the fragment shader to change the fragment's stencil reference value. |
|
|
TGSI_SEMANTIC_VIEWPORT_INDEX |
"""""""""""""""""""""""""""" |
|
For geometry shaders, this semantic label indicates that an output |
contains the index of the viewport (and scissor) to use. |
Only the X value is used. |
|
|
TGSI_SEMANTIC_LAYER |
""""""""""""""""""" |
|
For geometry shaders, this semantic label indicates that an output |
contains the layer value to use for the color and depth/stencil surfaces. |
Only the X value is used. (Also known as rendertarget array index.) |
|
|
TGSI_SEMANTIC_CULLDIST |
"""""""""""""""""""""" |
|
Used as distance to plane for performing application-defined culling |
of individual primitives against a plane. When components of vertex |
elements are given this label, these values are assumed to be a |
float32 signed distance to a plane. Primitives will be completely |
discarded if the plane distances for all of the vertices in the |
primitive are < 0. If a vertex has a cull distance of NaN, that |
vertex counts as "out" (as if it were < 0). |
The limits on both clip and cull distances are bound |
by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_COUNT define which defines |
the maximum number of components that can be used to hold the |
distances and by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT |
which specifies the maximum number of registers which can be |
annotated with those semantics. |
|
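The cull rule for a single plane, including the NaN case, fits in a few lines of Python. `is_culled` is a made-up name, and the input is assumed to be the cull distance of each vertex of one primitive against that plane.

```python
import math


def is_culled(distances):
    """True if every vertex of the primitive is "out" for this plane.

    A vertex is "out" when its distance is negative or NaN.
    """
    return all(math.isnan(d) or d < 0 for d in distances)
```

A single vertex with a distance of 0 or more is enough to keep the whole primitive.
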
|
TGSI_SEMANTIC_CLIPDIST |
"""""""""""""""""""""" |
|
When components of vertex elements are identified this way, these |
values are each assumed to be a float32 signed distance to a plane. |
Primitive setup only invokes rasterization on pixels for which |
the interpolated plane distances are >= 0. Multiple clip planes |
can be implemented simultaneously, by annotating multiple |
components of one or more vertex elements with the above specified |
semantic. The limits on both clip and cull distances are bound |
by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_COUNT define which defines |
the maximum number of components that can be used to hold the |
distances and by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT |
which specifies the maximum number of registers which can be |
annotated with those semantics. |
|
|
Declaration Interpolate |
^^^^^^^^^^^^^^^^^^^^^^^ |
|
This token is only valid for fragment shader INPUT declarations. |
|
The Interpolate field specifies how the input is interpolated by |
the rasteriser and is one of TGSI_INTERPOLATE_*. |
|
The CylindricalWrap bitfield specifies which register components |
should be subject to cylindrical wrapping when interpolating by the |
rasteriser. If TGSI_CYLINDRICAL_WRAP_X is set to 1, the X component |
should be interpolated according to cylindrical wrapping rules. |
|
|
Declaration Sampler View |
^^^^^^^^^^^^^^^^^^^^^^^^ |
|
Follows Declaration token if file is TGSI_FILE_SAMPLER_VIEW. |
|
DCL SVIEW[#], resource, type(s) |
|
Declares a shader input sampler view and assigns it to a SVIEW[#] |
register. |
|
resource can be one of BUFFER, 1D, 2D, 3D, 1DArray and 2DArray. |
|
type must be 1 or 4 entries (if specifying on a per-component |
level) out of UNORM, SNORM, SINT, UINT and FLOAT. |
|
|
Declaration Resource |
^^^^^^^^^^^^^^^^^^^^ |
|
Follows Declaration token if file is TGSI_FILE_RESOURCE. |
|
DCL RES[#], resource [, WR] [, RAW] |
|
Declares a shader input resource and assigns it to a RES[#] |
register. |
|
resource can be one of BUFFER, 1D, 2D, 3D, CUBE, 1DArray and |
2DArray. |
|
If the RAW keyword is not specified, the texture data will be |
subject to conversion, swizzling and scaling as required to yield |
the specified data type from the physical data format of the bound |
resource. |
|
If the RAW keyword is specified, no channel conversion will be |
performed: the values read for each of the channels (X,Y,Z,W) will |
correspond to consecutive words in the same order and format |
they're found in memory. No element-to-address conversion will be |
performed either: the value of the provided X coordinate will be |
interpreted in byte units instead of texel units. The result of |
accessing a misaligned address is undefined. |
|
Usage of the STORE opcode is only allowed if the WR (writable) flag |
is set. |
|
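A RAW read can be pictured as pulling consecutive words straight out of memory. The sketch below assumes 32-bit little-endian words and 4-byte alignment, neither of which is stated in this section, and `raw_load` is a hypothetical helper.

```python
import struct


def raw_load(buffer, x):
    """Read the X, Y, Z, W channels as four consecutive words at byte
    offset x, with no format conversion."""
    # Assumption: 4-byte alignment is required. Misaligned access is
    # undefined in TGSI; raise here to make the mistake visible.
    if x % 4 != 0:
        raise ValueError("misaligned RAW address")
    # Assumption: words are 32-bit little-endian unsigned integers.
    return struct.unpack_from("<4I", buffer, x)
```

With a buffer holding the words 0 to 7, a RAW load at byte offset 16 returns words 4 to 7, whereas a texel-unit address of 16 would point somewhere else entirely.
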
|
Properties |
^^^^^^^^^^^^^^^^^^^^^^^^ |
|
|
Properties are general directives that apply to the whole TGSI program. |
|
FS_COORD_ORIGIN |
""""""""""""""" |
|
Specifies the fragment shader TGSI_SEMANTIC_POSITION coordinate origin. |
The default value is UPPER_LEFT. |
|
If UPPER_LEFT, the position will be (0,0) at the upper left corner and |
increase downward and rightward. |
If LOWER_LEFT, the position will be (0,0) at the lower left corner and |
increase upward and rightward. |
|
OpenGL defaults to LOWER_LEFT, and is configurable with the |
GL_ARB_fragment_coord_conventions extension. |
|
DirectX 9/10 use UPPER_LEFT. |
|
FS_COORD_PIXEL_CENTER |
""""""""""""""""""""" |
|
Specifies the fragment shader TGSI_SEMANTIC_POSITION pixel center convention. |
The default value is HALF_INTEGER. |
|
If HALF_INTEGER, the fractional part of the position will be 0.5. |
If INTEGER, the fractional part of the position will be 0.0. |
|
Note that this does not affect the set of fragments generated by |
rasterization, which is instead controlled by half_pixel_center in the |
rasterizer. |
|
OpenGL defaults to HALF_INTEGER, and is configurable with the |
GL_ARB_fragment_coord_conventions extension. |
|
DirectX 9 uses INTEGER. |
DirectX 10 uses HALF_INTEGER. |
|
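The combined effect of FS_COORD_ORIGIN and FS_COORD_PIXEL_CENTER on the X and Y position values can be sketched as follows. The function and its parameters are illustrative. Pixels are identified by column and by row counted from the top of a window that is `height` pixels tall.

```python
def fragment_position(column, row_from_top, height,
                      origin="UPPER_LEFT", pixel_center="HALF_INTEGER"):
    """Return the (x, y) position reported for one pixel."""
    if origin not in ("UPPER_LEFT", "LOWER_LEFT"):
        raise ValueError("unknown origin: " + origin)
    if pixel_center not in ("HALF_INTEGER", "INTEGER"):
        raise ValueError("unknown pixel center: " + pixel_center)

    # HALF_INTEGER puts the pixel center at .5, INTEGER at .0.
    frac = 0.5 if pixel_center == "HALF_INTEGER" else 0.0
    # LOWER_LEFT counts rows from the bottom of the window instead.
    row = row_from_top if origin == "UPPER_LEFT" else height - 1 - row_from_top
    return (column + frac, row + frac)
```

The top-left pixel of a 4-pixel-tall window is reported as (0.5, 0.5) with the defaults and as (0.0, 3.0) with LOWER_LEFT and INTEGER.
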
FS_COLOR0_WRITES_ALL_CBUFS |
"""""""""""""""""""""""""" |
Specifies that writes to the fragment shader color 0 are replicated to all |
bound cbufs. This facilitates OpenGL's fragColor output vs fragData[0] where |
fragData is directed to a single color buffer, but fragColor is broadcast. |
|
VS_PROHIBIT_UCPS |
"""""""""""""""""""""""""" |
If this property is set on the program bound to the shader stage before the |
fragment shader, user clip planes should have no effect (be disabled) even if |
that shader does not write to any clip distance outputs and the rasterizer's |
clip_plane_enable is non-zero. |
This property is only supported by drivers that also support shader clip |
distance outputs. |
This is useful for APIs that don't have UCPs and where clip distances written |
by a shader cannot be disabled. |
|
|
Texture Sampling and Texture Formats |
------------------------------------ |
|
This table shows how texture image components are returned as (x,y,z,w) tuples |
by TGSI texture instructions, such as :opcode:`TEX`, :opcode:`TXD`, and |
:opcode:`TXP`. For reference, OpenGL and Direct3D conventions are shown as |
well. |
|
+--------------------+--------------+--------------------+--------------+ |
| Texture Components | Gallium | OpenGL | Direct3D 9 | |
+====================+==============+====================+==============+ |
| R | (r, 0, 0, 1) | (r, 0, 0, 1) | (r, 1, 1, 1) | |
+--------------------+--------------+--------------------+--------------+ |
| RG | (r, g, 0, 1) | (r, g, 0, 1) | (r, g, 1, 1) | |
+--------------------+--------------+--------------------+--------------+ |
| RGB | (r, g, b, 1) | (r, g, b, 1) | (r, g, b, 1) | |
+--------------------+--------------+--------------------+--------------+ |
| RGBA | (r, g, b, a) | (r, g, b, a) | (r, g, b, a) | |
+--------------------+--------------+--------------------+--------------+ |
| A | (0, 0, 0, a) | (0, 0, 0, a) | (0, 0, 0, a) | |
+--------------------+--------------+--------------------+--------------+ |
| L | (l, l, l, 1) | (l, l, l, 1) | (l, l, l, 1) | |
+--------------------+--------------+--------------------+--------------+ |
| LA | (l, l, l, a) | (l, l, l, a) | (l, l, l, a) | |
+--------------------+--------------+--------------------+--------------+ |
| I | (i, i, i, i) | (i, i, i, i) | N/A | |
+--------------------+--------------+--------------------+--------------+ |
| UV | XXX TBD | (0, 0, 0, 1) | (u, v, 1, 1) | |
| | | [#envmap-bumpmap]_ | | |
+--------------------+--------------+--------------------+--------------+ |
| Z | XXX TBD | (z, z, z, 1) | (0, z, 0, 1) | |
| | | [#depth-tex-mode]_ | | |
+--------------------+--------------+--------------------+--------------+ |
| S | (s, s, s, s) | unknown | unknown | |
+--------------------+--------------+--------------------+--------------+ |
|
.. [#envmap-bumpmap] http://www.opengl.org/registry/specs/ATI/envmap_bumpmap.txt |
.. [#depth-tex-mode] the default is (z, z, z, 1) but may also be (0, 0, 0, z) |
or (z, z, z, z) depending on the value of GL_DEPTH_TEXTURE_MODE. |
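
The Gallium column of the table can be read as a per-format expansion rule, sketched here in Python. `GALLIUM_LAYOUT` and `expand` are hypothetical names. The UV and Z rows are omitted because the table marks them as TBD.

```python
# For each set of texture components: which stored channel (string) or
# constant (number) ends up in x, y, z and w.
GALLIUM_LAYOUT = {
    "R":    ("r", 0.0, 0.0, 1.0),
    "RG":   ("r", "g", 0.0, 1.0),
    "RGB":  ("r", "g", "b", 1.0),
    "RGBA": ("r", "g", "b", "a"),
    "A":    (0.0, 0.0, 0.0, "a"),
    "L":    ("l", "l", "l", 1.0),
    "LA":   ("l", "l", "l", "a"),
    "I":    ("i", "i", "i", "i"),
    "S":    ("s", "s", "s", "s"),
}


def expand(components, texel):
    """Expand stored channels (a dict such as {"r": 0.2}) to (x, y, z, w)."""
    return tuple(texel[slot] if isinstance(slot, str) else slot
                 for slot in GALLIUM_LAYOUT[components])
```
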