  1. TGSI
  2. ====
  3.  
  4. TGSI, Tungsten Graphics Shader Infrastructure, is an intermediate language
  5. for describing shaders. Since Gallium is inherently shaderful, shaders are
  6. an important part of the API. TGSI is the only intermediate representation
  7. used by all drivers.
  8.  
  9. Basics
  10. ------
  11.  
  12. All TGSI instructions, known as *opcodes*, operate on arbitrary-precision
  13. floating-point four-component vectors. An opcode may have up to one
  14. destination register, known as *dst*, and between zero and three source
  15. registers, called *src0* through *src2*, or simply *src* if there is only
  16. one.
  17.  
  18. Some instructions, like :opcode:`I2F`, permit re-interpretation of vector
  19. components as integers. Other instructions permit using registers as
  20. two-component vectors with double precision; see :ref:`doubleopcodes`.
  21.  
  22. When an instruction has a scalar result, the result is usually copied into
  23. each of the components of *dst*. When this happens, the result is said to be
  24. *replicated* to *dst*. :opcode:`RCP` is one such instruction.
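
As a rough illustration of replication, the following Python sketch models a register as a 4-tuple and writes the scalar reciprocal of ``src.x`` to every component. The helper names ``replicate`` and ``rcp`` are hypothetical and are not part of TGSI or Gallium.

```python
def replicate(scalar):
    """Copy one scalar result into all four components of dst."""
    return (scalar, scalar, scalar, scalar)


def rcp(src):
    """Model of RCP: only src.x is read, and the result is replicated.

    Python raises ZeroDivisionError for src.x == 0; this sketch does not
    model what a driver does in that case.
    """
    x, _y, _z, _w = src
    return replicate(1.0 / x)


print(rcp((4.0, 7.0, 8.0, 9.0)))
```

The y, z and w components of the source are ignored; all four destination components receive the same value.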
  25.  
  26. Modifiers
  27. ^^^^^^^^^^^^^^^
  28.  
  29. TGSI supports modifiers on inputs, as well as a saturate modifier on instructions.
  30.  
  31. For inputs which have a floating point type, both absolute value and negation
  32. modifiers are supported (with absolute value being applied first).
  33. TGSI_OPCODE_MOV is considered to have float input type for applying modifiers.
  34.  
  35. For inputs which have a signed or unsigned type, only the negate modifier is
  36. supported.
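
The ordering of the float modifiers matters when both are set. A minimal Python sketch, assuming a hypothetical helper ``apply_float_modifiers`` that is not part of any Gallium API:

```python
def apply_float_modifiers(value, absolute=False, negate=False):
    """Apply source modifiers to a float input.

    The absolute value is taken first, then the negation, so setting both
    always yields a non-positive result.
    """
    if absolute:
        value = abs(value)
    if negate:
        value = -value
    return value


print(apply_float_modifiers(-3.0, absolute=True, negate=True))
print(apply_float_modifiers(3.0, absolute=True, negate=True))
```

With both modifiers set, the result is ``-|value|`` regardless of the sign of the input.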
  37.  
  38. Instruction Set
  39. ---------------
  40.  
  41. Core ISA
  42. ^^^^^^^^^^^^^^^^^^^^^^^^^
  43.  
  44. These opcodes are guaranteed to be available regardless of the driver being
  45. used.
  46.  
  47. .. opcode:: ARL - Address Register Load
  48.  
  49. .. math::
  50.  
  51.   dst.x = (int) \lfloor src.x\rfloor
  52.  
  53.   dst.y = (int) \lfloor src.y\rfloor
  54.  
  55.   dst.z = (int) \lfloor src.z\rfloor
  56.  
  57.   dst.w = (int) \lfloor src.w\rfloor
  58.  
  59.  
  60. .. opcode:: MOV - Move
  61.  
  62. .. math::
  63.  
  64.   dst.x = src.x
  65.  
  66.   dst.y = src.y
  67.  
  68.   dst.z = src.z
  69.  
  70.   dst.w = src.w
  71.  
  72.  
  73. .. opcode:: LIT - Light Coefficients
  74.  
  75. .. math::
  76.  
  77.   dst.x &= 1 \\
  78.   dst.y &= max(src.x, 0) \\
  79.   dst.z &= (src.x > 0) ? max(src.y, 0)^{clamp(src.w, -128, 128)} : 0 \\
  80.   dst.w &= 1
  81.  
  82.  
  83. .. opcode:: RCP - Reciprocal
  84.  
  85. This instruction replicates its result.
  86.  
  87. .. math::
  88.  
  89.   dst = \frac{1}{src.x}
  90.  
  91.  
  92. .. opcode:: RSQ - Reciprocal Square Root
  93.  
  94. This instruction replicates its result. The results are undefined for src <= 0.
  95.  
  96. .. math::
  97.  
  98.   dst = \frac{1}{\sqrt{src.x}}
  99.  
  100.  
  101. .. opcode:: SQRT - Square Root
  102.  
  103. This instruction replicates its result. The results are undefined for src < 0.
  104.  
  105. .. math::
  106.  
  107.   dst = {\sqrt{src.x}}
  108.  
  109.  
  110. .. opcode:: EXP - Approximate Exponential Base 2
  111.  
  112. .. math::
  113.  
  114.   dst.x &= 2^{\lfloor src.x\rfloor} \\
  115.   dst.y &= src.x - \lfloor src.x\rfloor \\
  116.   dst.z &= 2^{src.x} \\
  117.   dst.w &= 1
  118.  
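The four EXP components can be sketched in Python as follows. The function name ``exp_approx`` is hypothetical, and Python doubles stand in for whatever precision a driver actually uses.

```python
import math


def exp_approx(src_x):
    """Return (dst.x, dst.y, dst.z, dst.w) as defined for EXP."""
    floor_x = math.floor(src_x)
    return (
        2.0 ** floor_x,      # dst.x: 2 raised to the integer part
        src_x - floor_x,     # dst.y: fractional part, in [0, 1)
        2.0 ** src_x,        # dst.z: the full result
        1.0,                 # dst.w
    )


dst = exp_approx(2.5)
print(dst)
```

Mathematically, ``dst.x`` multiplied by 2 raised to ``dst.y`` equals ``dst.z``, since the integer and fractional parts sum to ``src.x``.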
  119.  
  120. .. opcode:: LOG - Approximate Logarithm Base 2
  121.  
  122. .. math::
  123.  
  124.   dst.x &= \lfloor\log_2{|src.x|}\rfloor \\
  125.   dst.y &= \frac{|src.x|}{2^{\lfloor\log_2{|src.x|}\rfloor}} \\
  126.   dst.z &= \log_2{|src.x|} \\
  127.   dst.w &= 1
  128.  
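A matching Python sketch for LOG, under the same assumptions (hypothetical function name ``log_approx``, Python doubles rather than shader precision):

```python
import math


def log_approx(src_x):
    """Return (dst.x, dst.y, dst.z, dst.w) as defined for LOG.

    math.log2 raises ValueError for an input of zero; this sketch does not
    model driver behaviour for that case.
    """
    magnitude = abs(src_x)
    exponent = math.floor(math.log2(magnitude))
    return (
        float(exponent),              # dst.x: integer part of the logarithm
        magnitude / 2.0 ** exponent,  # dst.y: mantissa, in [1, 2)
        math.log2(magnitude),         # dst.z: the full result
        1.0,                          # dst.w
    )


print(log_approx(-10.0))
```

The sign of the input is discarded, so ``-10.0`` and ``10.0`` give the same result.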
  129.  
  130. .. opcode:: MUL - Multiply
  131.  
  132. .. math::
  133.  
  134.   dst.x = src0.x \times src1.x
  135.  
  136.   dst.y = src0.y \times src1.y
  137.  
  138.   dst.z = src0.z \times src1.z
  139.  
  140.   dst.w = src0.w \times src1.w
  141.  
  142.  
  143. .. opcode:: ADD - Add
  144.  
  145. .. math::
  146.  
  147.   dst.x = src0.x + src1.x
  148.  
  149.   dst.y = src0.y + src1.y
  150.  
  151.   dst.z = src0.z + src1.z
  152.  
  153.   dst.w = src0.w + src1.w
  154.  
  155.  
  156. .. opcode:: DP3 - 3-component Dot Product
  157.  
  158. This instruction replicates its result.
  159.  
  160. .. math::
  161.  
  162.   dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z
  163.  
  164.  
  165. .. opcode:: DP4 - 4-component Dot Product
  166.  
  167. This instruction replicates its result.
  168.  
  169. .. math::
  170.  
  171.   dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src0.w \times src1.w
  172.  
  173.  
  174. .. opcode:: DST - Distance Vector
  175.  
  176. .. math::
  177.  
  178.   dst.x &= 1\\
  179.   dst.y &= src0.y \times src1.y\\
  180.   dst.z &= src0.z\\
  181.   dst.w &= src1.w
  182.  
  183.  
  184. .. opcode:: MIN - Minimum
  185.  
  186. .. math::
  187.  
  188.   dst.x = min(src0.x, src1.x)
  189.  
  190.   dst.y = min(src0.y, src1.y)
  191.  
  192.   dst.z = min(src0.z, src1.z)
  193.  
  194.   dst.w = min(src0.w, src1.w)
  195.  
  196.  
  197. .. opcode:: MAX - Maximum
  198.  
  199. .. math::
  200.  
  201.   dst.x = max(src0.x, src1.x)
  202.  
  203.   dst.y = max(src0.y, src1.y)
  204.  
  205.   dst.z = max(src0.z, src1.z)
  206.  
  207.   dst.w = max(src0.w, src1.w)
  208.  
  209.  
  210. .. opcode:: SLT - Set On Less Than
  211.  
  212. .. math::
  213.  
  214.   dst.x = (src0.x < src1.x) ? 1.0F : 0.0F
  215.  
  216.   dst.y = (src0.y < src1.y) ? 1.0F : 0.0F
  217.  
  218.   dst.z = (src0.z < src1.z) ? 1.0F : 0.0F
  219.  
  220.   dst.w = (src0.w < src1.w) ? 1.0F : 0.0F
  221.  
  222.  
  223. .. opcode:: SGE - Set On Greater Equal Than
  224.  
  225. .. math::
  226.  
  227.   dst.x = (src0.x >= src1.x) ? 1.0F : 0.0F
  228.  
  229.   dst.y = (src0.y >= src1.y) ? 1.0F : 0.0F
  230.  
  231.   dst.z = (src0.z >= src1.z) ? 1.0F : 0.0F
  232.  
  233.   dst.w = (src0.w >= src1.w) ? 1.0F : 0.0F
  234.  
  235.  
  236. .. opcode:: MAD - Multiply And Add
  237.  
  238. .. math::
  239.  
  240.   dst.x = src0.x \times src1.x + src2.x
  241.  
  242.   dst.y = src0.y \times src1.y + src2.y
  243.  
  244.   dst.z = src0.z \times src1.z + src2.z
  245.  
  246.   dst.w = src0.w \times src1.w + src2.w
  247.  
  248.  
  249. .. opcode:: SUB - Subtract
  250.  
  251. .. math::
  252.  
  253.   dst.x = src0.x - src1.x
  254.  
  255.   dst.y = src0.y - src1.y
  256.  
  257.   dst.z = src0.z - src1.z
  258.  
  259.   dst.w = src0.w - src1.w
  260.  
  261.  
  262. .. opcode:: LRP - Linear Interpolate
  263.  
  264. .. math::
  265.  
  266.   dst.x = src0.x \times src1.x + (1 - src0.x) \times src2.x
  267.  
  268.   dst.y = src0.y \times src1.y + (1 - src0.y) \times src2.y
  269.  
  270.   dst.z = src0.z \times src1.z + (1 - src0.z) \times src2.z
  271.  
  272.   dst.w = src0.w \times src1.w + (1 - src0.w) \times src2.w
  273.  
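In LRP the first source is the per-component weight. A small Python sketch (the name ``lrp`` and the tuple representation are illustrative assumptions):

```python
def lrp(src0, src1, src2):
    """Per-component linear interpolation: src0 weights src1 against src2."""
    return tuple(
        w * a + (1.0 - w) * b
        for w, a, b in zip(src0, src1, src2)
    )


weights = (0.0, 1.0, 0.5, 0.25)
print(lrp(weights, (10.0,) * 4, (20.0,) * 4))
```

A weight of 1 selects ``src1`` and a weight of 0 selects ``src2``.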
  274.  
  275. .. opcode:: FMA - Fused Multiply-Add
  276.  
  277. Perform a * b + c with no intermediate rounding step.
  278.  
  279. .. math::
  280.  
  281.   dst.x = src0.x \times src1.x + src2.x
  282.  
  283.   dst.y = src0.y \times src1.y + src2.y
  284.  
  285.   dst.z = src0.z \times src1.z + src2.z
  286.  
  287.   dst.w = src0.w \times src1.w + src2.w
  288.  
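MAD and FMA share the same formula; the difference is only in rounding. The Python sketch below emulates a fused operation with exact rational arithmetic followed by a single rounding. It uses Python doubles, not shader precision, and the function names are hypothetical.

```python
from fractions import Fraction


def mad(a, b, c):
    """Unfused: the product is rounded before the addition."""
    return a * b + c


def fma(a, b, c):
    """Fused: compute a * b + c exactly, then round once to a double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0
print(mad(a, b, c))  # the rounded product is exactly 1.0, so the sum is 0.0
print(fma(a, b, c))  # the exact result -2**-60 survives the single rounding
```

The exact product here is ``1 - 2**-60``, which is too close to 1 to be represented in a double, so the unfused version loses it entirely.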
  289.  
  290. .. opcode:: DP2A - 2-component Dot Product And Add
  291.  
  292. .. math::
  293.  
  294.   dst.x = src0.x \times src1.x + src0.y \times src1.y + src2.x
  295.  
  296.   dst.y = src0.x \times src1.x + src0.y \times src1.y + src2.x
  297.  
  298.   dst.z = src0.x \times src1.x + src0.y \times src1.y + src2.x
  299.  
  300.   dst.w = src0.x \times src1.x + src0.y \times src1.y + src2.x
  301.  
  302.  
  303. .. opcode:: FRC - Fraction
  304.  
  305. .. math::
  306.  
  307.   dst.x = src.x - \lfloor src.x\rfloor
  308.  
  309.   dst.y = src.y - \lfloor src.y\rfloor
  310.  
  311.   dst.z = src.z - \lfloor src.z\rfloor
  312.  
  313.   dst.w = src.w - \lfloor src.w\rfloor
  314.  
  315.  
  316. .. opcode:: CLAMP - Clamp
  317.  
  318. .. math::
  319.  
  320.   dst.x = clamp(src0.x, src1.x, src2.x)
  321.  
  322.   dst.y = clamp(src0.y, src1.y, src2.y)
  323.  
  324.   dst.z = clamp(src0.z, src1.z, src2.z)
  325.  
  326.   dst.w = clamp(src0.w, src1.w, src2.w)
  327.  
  328.  
  329. .. opcode:: FLR - Floor
  330.  
  331. .. math::
  332.  
  333.   dst.x = \lfloor src.x\rfloor
  334.  
  335.   dst.y = \lfloor src.y\rfloor
  336.  
  337.   dst.z = \lfloor src.z\rfloor
  338.  
  339.   dst.w = \lfloor src.w\rfloor
  340.  
  341.  
  342. .. opcode:: ROUND - Round
  343.  
  344. .. math::
  345.  
  346.   dst.x = round(src.x)
  347.  
  348.   dst.y = round(src.y)
  349.  
  350.   dst.z = round(src.z)
  351.  
  352.   dst.w = round(src.w)
  353.  
  354.  
  355. .. opcode:: EX2 - Exponential Base 2
  356.  
  357. This instruction replicates its result.
  358.  
  359. .. math::
  360.  
  361.   dst = 2^{src.x}
  362.  
  363.  
  364. .. opcode:: LG2 - Logarithm Base 2
  365.  
  366. This instruction replicates its result.
  367.  
  368. .. math::
  369.  
  370.   dst = \log_2{src.x}
  371.  
  372.  
  373. .. opcode:: POW - Power
  374.  
  375. This instruction replicates its result.
  376.  
  377. .. math::
  378.  
  379.   dst = src0.x^{src1.x}
  380.  
  381. .. opcode:: XPD - Cross Product
  382.  
  383. .. math::
  384.  
  385.   dst.x = src0.y \times src1.z - src1.y \times src0.z
  386.  
  387.   dst.y = src0.z \times src1.x - src1.z \times src0.x
  388.  
  389.   dst.z = src0.x \times src1.y - src1.x \times src0.y
  390.  
  391.   dst.w = 1
  392.  
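A direct Python transcription of the XPD formulas, with registers modelled as 4-tuples (the name ``xpd`` is illustrative):

```python
def xpd(src0, src1):
    """Cross product of the xyz parts; dst.w is set to 1."""
    x0, y0, z0, _w0 = src0
    x1, y1, z1, _w1 = src1
    return (
        y0 * z1 - y1 * z0,
        z0 * x1 - z1 * x0,
        x0 * y1 - x1 * y0,
        1.0,
    )


x_axis = (1.0, 0.0, 0.0, 0.0)
y_axis = (0.0, 1.0, 0.0, 0.0)
print(xpd(x_axis, y_axis))
```

The w components of both sources are ignored.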
  393.  
  394. .. opcode:: ABS - Absolute
  395.  
  396. .. math::
  397.  
  398.   dst.x = |src.x|
  399.  
  400.   dst.y = |src.y|
  401.  
  402.   dst.z = |src.z|
  403.  
  404.   dst.w = |src.w|
  405.  
  406.  
  407. .. opcode:: DPH - Homogeneous Dot Product
  408.  
  409. This instruction replicates its result.
  410.  
  411. .. math::
  412.  
  413.   dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src1.w
  414.  
  415.  
  416. .. opcode:: COS - Cosine
  417.  
  418. This instruction replicates its result.
  419.  
  420. .. math::
  421.  
  422.   dst = \cos{src.x}
  423.  
  424.  
  425. .. opcode:: DDX, DDX_FINE - Derivative Relative To X
  426.  
  427. The fine variant is only used when ``PIPE_CAP_TGSI_FS_FINE_DERIVATIVE`` is
  428. advertised. When it is, the fine version guarantees one derivative per row
  429. while DDX is allowed to be the same for the entire 2x2 quad.
  430.  
  431. .. math::
  432.  
  433.   dst.x = partialx(src.x)
  434.  
  435.   dst.y = partialx(src.y)
  436.  
  437.   dst.z = partialx(src.z)
  438.  
  439.   dst.w = partialx(src.w)
  440.  
  441.  
  442. .. opcode:: DDY, DDY_FINE - Derivative Relative To Y
  443.  
  444. The fine variant is only used when ``PIPE_CAP_TGSI_FS_FINE_DERIVATIVE`` is
  445. advertised. When it is, the fine version guarantees one derivative per column
  446. while DDY is allowed to be the same for the entire 2x2 quad.
  447.  
  448. .. math::
  449.  
  450.   dst.x = partialy(src.x)
  451.  
  452.   dst.y = partialy(src.y)
  453.  
  454.   dst.z = partialy(src.z)
  455.  
  456.   dst.w = partialy(src.w)
  457.  
  458.  
  459. .. opcode:: PK2H - Pack Two 16-bit Floats
  460.  
  461.   TBD
  462.  
  463.  
  464. .. opcode:: PK2US - Pack Two Unsigned 16-bit Scalars
  465.  
  466.   TBD
  467.  
  468.  
  469. .. opcode:: PK4B - Pack Four Signed 8-bit Scalars
  470.  
  471.   TBD
  472.  
  473.  
  474. .. opcode:: PK4UB - Pack Four Unsigned 8-bit Scalars
  475.  
  476.   TBD
  477.  
  478.  
  479. .. opcode:: SEQ - Set On Equal
  480.  
  481. .. math::
  482.  
  483.   dst.x = (src0.x == src1.x) ? 1.0F : 0.0F
  484.  
  485.   dst.y = (src0.y == src1.y) ? 1.0F : 0.0F
  486.  
  487.   dst.z = (src0.z == src1.z) ? 1.0F : 0.0F
  488.  
  489.   dst.w = (src0.w == src1.w) ? 1.0F : 0.0F
  490.  
  491.  
  492. .. opcode:: SGT - Set On Greater Than
  493.  
  494. .. math::
  495.  
  496.   dst.x = (src0.x > src1.x) ? 1.0F : 0.0F
  497.  
  498.   dst.y = (src0.y > src1.y) ? 1.0F : 0.0F
  499.  
  500.   dst.z = (src0.z > src1.z) ? 1.0F : 0.0F
  501.  
  502.   dst.w = (src0.w > src1.w) ? 1.0F : 0.0F
  503.  
  504.  
  505. .. opcode:: SIN - Sine
  506.  
  507. This instruction replicates its result.
  508.  
  509. .. math::
  510.  
  511.   dst = \sin{src.x}
  512.  
  513.  
  514. .. opcode:: SLE - Set On Less Equal Than
  515.  
  516. .. math::
  517.  
  518.   dst.x = (src0.x <= src1.x) ? 1.0F : 0.0F
  519.  
  520.   dst.y = (src0.y <= src1.y) ? 1.0F : 0.0F
  521.  
  522.   dst.z = (src0.z <= src1.z) ? 1.0F : 0.0F
  523.  
  524.   dst.w = (src0.w <= src1.w) ? 1.0F : 0.0F
  525.  
  526.  
  527. .. opcode:: SNE - Set On Not Equal
  528.  
  529. .. math::
  530.  
  531.   dst.x = (src0.x != src1.x) ? 1.0F : 0.0F
  532.  
  533.   dst.y = (src0.y != src1.y) ? 1.0F : 0.0F
  534.  
  535.   dst.z = (src0.z != src1.z) ? 1.0F : 0.0F
  536.  
  537.   dst.w = (src0.w != src1.w) ? 1.0F : 0.0F
  538.  
  539.  
  540. .. opcode:: TEX - Texture Lookup
  541.  
  542.   For array textures, src0.y contains the slice for 1D,
  543.   and src0.z contains the slice for 2D.
  544.  
  545.   For shadow textures with no arrays (and not cube map),
  546.   src0.z contains the reference value.
  547.  
  548.   For shadow textures with arrays, src0.z contains
  549.   the reference value for 1D arrays, and src0.w contains
  550.   the reference value for 2D arrays and cube maps.
  551.  
  552.   For cube map array shadow textures, the reference value
  553.   cannot be passed in src0.w, and TEX2 must be used instead.
  554.  
  555. .. math::
  556.  
  557.   coord = src0
  558.  
  559.   shadow\_ref = src0.z or src0.w (optional)
  560.  
  561.   unit = src1
  562.  
  563.   dst = texture\_sample(unit, coord, shadow\_ref)
  564.  
  565.  
  566. .. opcode:: TEX2 - Texture Lookup (for shadow cube map arrays only)
  567.  
  568.   This is the same as TEX, but uses another register to encode the
  569.   reference value.
  570.  
  571. .. math::
  572.  
  573.   coord = src0
  574.  
  575.   shadow\_ref = src1.x
  576.  
  577.   unit = src2
  578.  
  579.   dst = texture\_sample(unit, coord, shadow\_ref)
  580.  
  581.  
  582.  
  583.  
  584. .. opcode:: TXD - Texture Lookup with Derivatives
  585.  
  586. .. math::
  587.  
  588.   coord = src0
  589.  
  590.   ddx = src1
  591.  
  592.   ddy = src2
  593.  
  594.   unit = src3
  595.  
  596.   dst = texture\_sample\_deriv(unit, coord, ddx, ddy)
  597.  
  598.  
  599. .. opcode:: TXP - Projective Texture Lookup
  600.  
  601. .. math::
  602.  
  603.   coord.x = src0.x / src0.w
  604.  
  605.   coord.y = src0.y / src0.w
  606.  
  607.   coord.z = src0.z / src0.w
  608.  
  609.   coord.w = src0.w
  610.  
  611.   unit = src1
  612.  
  613.   dst = texture\_sample(unit, coord)
  614.  
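The coordinate setup for TXP amounts to a perspective divide by ``src0.w``. A Python sketch of just that step (``txp_coord`` is a hypothetical name; the texture sampling itself is not modelled):

```python
def txp_coord(src0):
    """Divide x, y and z by w; w itself is passed through unchanged."""
    x, y, z, w = src0
    return (x / w, y / w, z / w, w)


print(txp_coord((2.0, 4.0, 6.0, 2.0)))
```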
  615.  
  616. .. opcode:: UP2H - Unpack Two 16-Bit Floats
  617.  
  618.   TBD
  619.  
  620. .. note::
  621.  
  622.    Considered for removal.
  623.  
  624. .. opcode:: UP2US - Unpack Two Unsigned 16-Bit Scalars
  625.  
  626.   TBD
  627.  
  628. .. note::
  629.  
  630.    Considered for removal.
  631.  
  632. .. opcode:: UP4B - Unpack Four Signed 8-Bit Values
  633.  
  634.   TBD
  635.  
  636. .. note::
  637.  
  638.    Considered for removal.
  639.  
  640. .. opcode:: UP4UB - Unpack Four Unsigned 8-Bit Scalars
  641.  
  642.   TBD
  643.  
  644. .. note::
  645.  
  646.    Considered for removal.
  647.  
  648.  
  649. .. opcode:: ARR - Address Register Load With Round
  650.  
  651. .. math::
  652.  
  653.   dst.x = (int) round(src.x)
  654.  
  655.   dst.y = (int) round(src.y)
  656.  
  657.   dst.z = (int) round(src.z)
  658.  
  659.   dst.w = (int) round(src.w)
  660.  
  661.  
  662. .. opcode:: SSG - Set Sign
  663.  
  664. .. math::
  665.  
  666.   dst.x = (src.x > 0) ? 1 : (src.x < 0) ? -1 : 0
  667.  
  668.   dst.y = (src.y > 0) ? 1 : (src.y < 0) ? -1 : 0
  669.  
  670.   dst.z = (src.z > 0) ? 1 : (src.z < 0) ? -1 : 0
  671.  
  672.   dst.w = (src.w > 0) ? 1 : (src.w < 0) ? -1 : 0
  673.  
  674.  
  675. .. opcode:: CMP - Compare
  676.  
  677. .. math::
  678.  
  679.   dst.x = (src0.x < 0) ? src1.x : src2.x
  680.  
  681.   dst.y = (src0.y < 0) ? src1.y : src2.y
  682.  
  683.   dst.z = (src0.z < 0) ? src1.z : src2.z
  684.  
  685.   dst.w = (src0.w < 0) ? src1.w : src2.w
  686.  
  687.  
  688. .. opcode:: KILL_IF - Conditional Discard
  689.  
  690.   Conditional discard.  Allowed in fragment shaders only.
  691.  
  692. .. math::
  693.  
  694.   if (src.x < 0 || src.y < 0 || src.z < 0 || src.w < 0)
  695.     discard
  696.   endif
  697.  
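The KILL_IF condition is true when any component is strictly negative. In Python this can be sketched as a predicate (the name ``kill_if`` is illustrative, and returning a boolean stands in for the actual discard):

```python
def kill_if(src):
    """Return True when the fragment would be discarded."""
    return any(component < 0 for component in src)


print(kill_if((1.0, 0.0, -0.5, 2.0)))
print(kill_if((0.0, 0.0, 0.0, 0.0)))
```

Zero does not trigger the discard, because the comparison is strict.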
  698.  
  699. .. opcode:: KILL - Discard
  700.  
  701.   Unconditional discard.  Allowed in fragment shaders only.
  702.  
  703.  
  704. .. opcode:: SCS - Sine Cosine
  705.  
  706. .. math::
  707.  
  708.   dst.x = \cos{src.x}
  709.  
  710.   dst.y = \sin{src.x}
  711.  
  712.   dst.z = 0
  713.  
  714.   dst.w = 1
  715.  
  716.  
  717. .. opcode:: TXB - Texture Lookup With Bias
  718.  
  719.   For cube map array textures and shadow cube maps, the bias value
  720.   cannot be passed in src0.w, and TXB2 must be used instead.
  721.  
  722.   If the target is a shadow texture, the reference value is always
  723.   in src0.z (this prevents shadow 3D and shadow 2D arrays from
  724.   using this instruction, but this is not needed).
  725.  
  726. .. math::
  727.  
  728.   coord.x = src0.x
  729.  
  730.   coord.y = src0.y
  731.  
  732.   coord.z = src0.z
  733.  
  734.   coord.w = none
  735.  
  736.   bias = src0.w
  737.  
  738.   unit = src1
  739.  
  740.   dst = texture\_sample(unit, coord, bias)
  741.  
  742.  
  743. .. opcode:: TXB2 - Texture Lookup With Bias (some cube maps only)
  744.  
  745.   This is the same as TXB, but uses another register to encode the
  746.   LOD bias value for cube map arrays and shadow cube maps.
  747.   Presumably shadow 2D arrays and shadow 3D targets could use
  748.   this encoding too, but this is not legal.
  749.  
  750.   Shadow cube map arrays are neither possible nor required.
  751.  
  752. .. math::
  753.  
  754.   coord = src0
  755.  
  756.   bias = src1.x
  757.  
  758.   unit = src2
  759.  
  760.   dst = texture\_sample(unit, coord, bias)
  761.  
  762.  
  763. .. opcode:: DIV - Divide
  764.  
  765. .. math::
  766.  
  767.   dst.x = \frac{src0.x}{src1.x}
  768.  
  769.   dst.y = \frac{src0.y}{src1.y}
  770.  
  771.   dst.z = \frac{src0.z}{src1.z}
  772.  
  773.   dst.w = \frac{src0.w}{src1.w}
  774.  
  775.  
  776. .. opcode:: DP2 - 2-component Dot Product
  777.  
  778. This instruction replicates its result.
  779.  
  780. .. math::
  781.  
  782.   dst = src0.x \times src1.x + src0.y \times src1.y
  783.  
  784.  
  785. .. opcode:: TXL - Texture Lookup With Explicit LOD
  786.  
  787.   For cube map array textures, the explicit LOD value
  788.   cannot be passed in src0.w, and TXL2 must be used instead.
  789.  
  790.   If the target is a shadow texture, the reference value is always
  791.   in src0.z (this prevents shadow 3D / 2D array / cube targets from
  792.   using this instruction, but this is not needed).
  793.  
  794. .. math::
  795.  
  796.   coord.x = src0.x
  797.  
  798.   coord.y = src0.y
  799.  
  800.   coord.z = src0.z
  801.  
  802.   coord.w = none
  803.  
  804.   lod = src0.w
  805.  
  806.   unit = src1
  807.  
  808.   dst = texture\_sample(unit, coord, lod)
  809.  
  810.  
  811. .. opcode:: TXL2 - Texture Lookup With Explicit LOD (for cube map arrays only)
  812.  
  813.   This is the same as TXL, but uses another register to encode the
  814.   explicit LOD value.
  815.   Presumably shadow 3D / 2D array / cube targets could use
  816.   this encoding too, but this is not legal.
  817.  
  818.   Shadow cube map arrays are neither possible nor required.
  819.  
  820. .. math::
  821.  
  822.   coord = src0
  823.  
  824.   lod = src1.x
  825.  
  826.   unit = src2
  827.  
  828.   dst = texture\_sample(unit, coord, lod)
  829.  
  830.  
  831. .. opcode:: PUSHA - Push Address Register On Stack
  832.  
  833.   push(src.x)
  834.   push(src.y)
  835.   push(src.z)
  836.   push(src.w)
  837.  
  838. .. note::
  839.  
  840.    Considered for cleanup.
  841.  
  842. .. note::
  843.  
  844.    Considered for removal.
  845.  
  846. .. opcode:: POPA - Pop Address Register From Stack
  847.  
  848.   dst.w = pop()
  849.   dst.z = pop()
  850.   dst.y = pop()
  851.   dst.x = pop()
  852.  
  853. .. note::
  854.  
  855.    Considered for cleanup.
  856.  
  857. .. note::
  858.  
  859.    Considered for removal.
  860.  
  861.  
  862. .. opcode:: CALLNZ - Subroutine Call If Not Zero
  863.  
  864.    TBD
  865.  
  866. .. note::
  867.  
  868.    Considered for cleanup.
  869.  
  870. .. note::
  871.  
  872.    Considered for removal.
  873.  
  874.  
  875. Compute ISA
  876. ^^^^^^^^^^^^^^^^^^^^^^^^
  877.  
  878. These opcodes are primarily provided for special-use computational shaders.
  879. Support for these opcodes is indicated by a special pipe capability bit (TBD).
  880.  
  881. XXX doesn't look like most of the opcodes really belong here.
  882.  
  883. .. opcode:: CEIL - Ceiling
  884.  
  885. .. math::
  886.  
  887.   dst.x = \lceil src.x\rceil
  888.  
  889.   dst.y = \lceil src.y\rceil
  890.  
  891.   dst.z = \lceil src.z\rceil
  892.  
  893.   dst.w = \lceil src.w\rceil
  894.  
  895.  
  896. .. opcode:: TRUNC - Truncate
  897.  
  898. .. math::
  899.  
  900.   dst.x = trunc(src.x)
  901.  
  902.   dst.y = trunc(src.y)
  903.  
  904.   dst.z = trunc(src.z)
  905.  
  906.   dst.w = trunc(src.w)
  907.  
  908.  
  909. .. opcode:: MOD - Modulus
  910.  
  911. .. math::
  912.  
  913.   dst.x = src0.x \bmod src1.x
  914.  
  915.   dst.y = src0.y \bmod src1.y
  916.  
  917.   dst.z = src0.z \bmod src1.z
  918.  
  919.   dst.w = src0.w \bmod src1.w
  920.  
  921.  
  922. .. opcode:: UARL - Integer Address Register Load
  923.  
  924.   Moves the contents of the source register, assumed to be an integer, into the
  925.   destination register, which is assumed to be an address (ADDR) register.
  926.  
  927.  
  928. .. opcode:: SAD - Sum Of Absolute Differences
  929.  
  930. .. math::
  931.  
  932.   dst.x = |src0.x - src1.x| + src2.x
  933.  
  934.   dst.y = |src0.y - src1.y| + src2.y
  935.  
  936.   dst.z = |src0.z - src1.z| + src2.z
  937.  
  938.   dst.w = |src0.w - src1.w| + src2.w
  939.  
  940.  
  941. .. opcode:: TXF - Texel Fetch
  942.  
  943.   As per NV_gpu_shader4, extract a single texel from a specified texture
  944.   image. The source sampler may not be a CUBE or SHADOW. src0 is a
  945.   four-component signed integer vector used to identify the single texel
  946.   accessed: three coordinates plus the mipmap level. Just like texture
  947.   instructions, an optional offset vector is provided, which is subject to
  948.   various driver restrictions (regarding range, source of offsets).
  949.   TXF(uint_vec coord, int_vec offset).
  950.  
  951.  
  952. .. opcode:: TXQ - Texture Size Query
  953.  
  954.   As per NV_gpu_program4, retrieve the dimensions of the texture depending on
  955.   the target. For 1D (width), 2D/RECT/CUBE (width, height), 3D (width, height,
  956.   depth), 1D array (width, layers), 2D array (width, height, layers).
  957.   Also return the number of accessible levels (last_level - first_level + 1)
  958.   in W.
  959.  
  960.   For components which don't return a resource dimension, their value
  961.   is undefined.
  962.  
  963.  
  964. .. math::
  965.  
  966.   lod = src0.x
  967.  
  968.   dst.x = texture\_width(unit, lod)
  969.  
  970.   dst.y = texture\_height(unit, lod)
  971.  
  972.   dst.z = texture\_depth(unit, lod)
  973.  
  974.   dst.w = texture\_levels(unit)
  975.  
  976. .. opcode:: TG4 - Texture Gather
  977.  
  978.   As per ARB_texture_gather, gathers the four texels to be used in a bi-linear
  979.   filtering operation and packs them into a single register.  Only works with
  980.   2D, 2D array, cube maps, and cube map arrays.  For 2D textures, only the
  981.   addressing modes of the sampler and the top level of any mip pyramid are
  982.   used. Set W to zero.  It behaves like the TEX instruction, but a filtered
  983.   sample is not generated. The four samples that contribute to filtering are
  984.   placed into xyzw in clockwise order, starting with the (u,v) texture
  985.   coordinate delta at the following locations (-, +), (+, +), (+, -), (-, -),
  986.   where the magnitude of the deltas is half a texel.
  987.  
  988.   PIPE_CAP_TEXTURE_SM5 enhances this instruction to support shadow per-sample
  989.   depth compares, single component selection, and a non-constant offset. It
  990.   doesn't allow support for the GL independent offset to get i0,j0. This would
  991.   require another CAP if hw can do it natively. For now we lower that before
  992.   TGSI.
  993.  
  994. .. math::
  995.  
  996.    coord = src0
  997.  
  998.    component = src1
  999.  
  1000.    dst = texture\_gather4 (unit, coord, component)
  1001.  
  1002. (with SM5 - cube array shadow)
  1003.  
  1004. .. math::
  1005.  
  1006.    coord = src0
  1007.  
  1008.    compare = src1
  1009.  
  1010.    dst = texture\_gather (unit, coord, compare)
  1011.  
  1012. .. opcode:: LODQ - Level Of Detail Query
  1013.  
  1014.    Compute the LOD information that the texture pipe would use to access the
  1015.    texture. The Y component contains the computed LOD lambda_prime. The X
  1016.    component contains the LOD that will be accessed, based on min/max LODs
  1017.    and mipmap filters.
  1018.  
  1019. .. math::
  1020.  
  1021.    coord = src0
  1022.  
  1023.    dst.xy = lodq(unit, coord)
  1024.  
  1025. Integer ISA
  1026. ^^^^^^^^^^^^^^^^^^^^^^^^
  1027. These opcodes are used for integer operations.
  1028. Support for these opcodes is indicated by PIPE_SHADER_CAP_INTEGERS (all of them?)
  1029.  
  1030.  
  1031. .. opcode:: I2F - Signed Integer To Float
  1032.  
  1033.    Rounding is unspecified (round to nearest even suggested).
  1034.  
  1035. .. math::
  1036.  
  1037.   dst.x = (float) src.x
  1038.  
  1039.   dst.y = (float) src.y
  1040.  
  1041.   dst.z = (float) src.z
  1042.  
  1043.   dst.w = (float) src.w
  1044.  
  1045.  
  1046. .. opcode:: U2F - Unsigned Integer To Float
  1047.  
  1048.    Rounding is unspecified (round to nearest even suggested).
  1049.  
  1050. .. math::
  1051.  
  1052.   dst.x = (float) src.x
  1053.  
  1054.   dst.y = (float) src.y
  1055.  
  1056.   dst.z = (float) src.z
  1057.  
  1058.   dst.w = (float) src.w
  1059.  
  1060.  
  1061. .. opcode:: F2I - Float to Signed Integer
  1062.  
  1063.    Rounding is towards zero (truncate).
  1064.    Values outside signed range (including NaNs) produce undefined results.
  1065.  
  1066. .. math::
  1067.  
  1068.   dst.x = (int) src.x
  1069.  
  1070.   dst.y = (int) src.y
  1071.  
  1072.   dst.z = (int) src.z
  1073.  
  1074.   dst.w = (int) src.w
  1075.  
  1076.  
  1077. .. opcode:: F2U - Float to Unsigned Integer
  1078.  
  1079.    Rounding is towards zero (truncate).
  1080.    Values outside unsigned range (including NaNs) produce undefined results.
  1081.  
  1082. .. math::
  1083.  
  1084.   dst.x = (unsigned) src.x
  1085.  
  1086.   dst.y = (unsigned) src.y
  1087.  
  1088.   dst.z = (unsigned) src.z
  1089.  
  1090.   dst.w = (unsigned) src.w
  1091.  
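Both float-to-integer conversions truncate toward zero, which is also what Python's ``int()`` does. The sketch below is an assumption-laden model: the function names are hypothetical, and out-of-range inputs raise an exception here because TGSI leaves their result undefined.

```python
def f2i(value):
    """Model of F2I: truncate toward zero within the signed 32-bit range."""
    result = int(value)  # int() truncates toward zero; NaN and infinities raise
    if not -2**31 <= result <= 2**31 - 1:
        raise ValueError("outside the signed 32-bit range: undefined in TGSI")
    return result


def f2u(value):
    """Model of F2U: truncate toward zero within the unsigned 32-bit range."""
    result = int(value)
    if not 0 <= result <= 2**32 - 1:
        raise ValueError("outside the unsigned 32-bit range: undefined in TGSI")
    return result


print(f2i(-1.7), f2i(1.7), f2u(3.99))
```

Note the contrast with ARL, which is defined with a floor and therefore maps ``-1.7`` to ``-2`` rather than ``-1``.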
  1092.  
  1093. .. opcode:: UADD - Integer Add
  1094.  
  1095.    This instruction works the same for signed and unsigned integers.
  1096.    The low 32 bits of the result are returned.
  1097.  
  1098. .. math::
  1099.  
  1100.   dst.x = src0.x + src1.x
  1101.  
  1102.   dst.y = src0.y + src1.y
  1103.  
  1104.   dst.z = src0.z + src1.z
  1105.  
  1106.   dst.w = src0.w + src1.w
  1107.  
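Keeping only the low 32 bits is what makes one add instruction serve both signed and unsigned operands. A Python sketch, assuming values are passed as raw 32-bit patterns (``uadd`` and ``as_signed`` are hypothetical helpers):

```python
MASK32 = 0xFFFFFFFF


def uadd(a, b):
    """Add two 32-bit patterns and keep the low 32 bits."""
    return (a + b) & MASK32


def as_signed(bits):
    """Reinterpret a 32-bit pattern as a two's-complement signed value."""
    return bits - (1 << 32) if bits & 0x80000000 else bits


total = uadd(0xFFFFFFFF, 0xFFFFFFFF)
print(hex(total), as_signed(total))
```

Read as unsigned, the sum wraps around; read as signed, the same bit pattern is ``-1 + -1 = -2``.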
  1108.  
  1109. .. opcode:: UMAD - Integer Multiply And Add
  1110.  
  1111.    This instruction works the same for signed and unsigned integers.
  1112.    The multiplication returns the low 32 bits (as does the result itself).
  1113.  
  1114. .. math::
  1115.  
  1116.   dst.x = src0.x \times src1.x + src2.x
  1117.  
  1118.   dst.y = src0.y \times src1.y + src2.y
  1119.  
  1120.   dst.z = src0.z \times src1.z + src2.z
  1121.  
  1122.   dst.w = src0.w \times src1.w + src2.w
  1123.  
  1124.  
  1125. .. opcode:: UMUL - Integer Multiply
  1126.  
  1127.    This instruction works the same for signed and unsigned integers.
  1128.    The low 32 bits of the result are returned.
  1129.  
  1130. .. math::
  1131.  
  1132.   dst.x = src0.x \times src1.x
  1133.  
  1134.   dst.y = src0.y \times src1.y
  1135.  
  1136.   dst.z = src0.z \times src1.z
  1137.  
  1138.   dst.w = src0.w \times src1.w
  1139.  
  1140.  
  1141. .. opcode:: IMUL_HI - Signed Integer Multiply High Bits
  1142.  
  1143.    The high 32 bits of the multiplication of two signed integers are returned.
  1144.  
  1145. .. math::
  1146.  
  1147.   dst.x = (src0.x \times src1.x) >> 32
  1148.  
  1149.   dst.y = (src0.y \times src1.y) >> 32
  1150.  
  1151.   dst.z = (src0.z \times src1.z) >> 32
  1152.  
  1153.   dst.w = (src0.w \times src1.w) >> 32
  1154.  
  1155.  
  1156. .. opcode:: UMUL_HI - Unsigned Integer Multiply High Bits
  1157.  
  1158.    The high 32 bits of the multiplication of two unsigned integers are returned.
  1159.  
  1160. .. math::
  1161.  
  1162.   dst.x = (src0.x \times src1.x) >> 32
  1163.  
  1164.   dst.y = (src0.y \times src1.y) >> 32
  1165.  
  1166.   dst.z = (src0.z \times src1.z) >> 32
  1167.  
  1168.   dst.w = (src0.w \times src1.w) >> 32
  1169.  
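The two high-multiply opcodes differ only in how the operands are interpreted. Python integers are unbounded, so the full 64-bit product is available directly; the function names below are hypothetical.

```python
def umul_hi(a, b):
    """High 32 bits of the product of two unsigned 32-bit values."""
    return ((a * b) >> 32) & 0xFFFFFFFF


def imul_hi(a, b):
    """High 32 bits of the product of two signed 32-bit values.

    Python's >> on a negative int is an arithmetic shift, so the sign of
    the 64-bit product is preserved.
    """
    return (a * b) >> 32


print(hex(umul_hi(0xFFFFFFFF, 0xFFFFFFFF)))
print(imul_hi(-1, 1))
```

The same bit pattern ``0xFFFFFFFF`` is a large positive number for UMUL_HI and ``-1`` for IMUL_HI, so the high halves of the products differ.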
  1170.  
  1171. .. opcode:: IDIV - Signed Integer Division
  1172.  
  1173.    TBD: behavior for division by zero.
  1174.  
  1175. .. math::
  1176.  
  1177.   dst.x = \frac{src0.x}{src1.x}
  1178.  
  1179.   dst.y = \frac{src0.y}{src1.y}
  1180.  
  1181.   dst.z = \frac{src0.z}{src1.z}
  1182.  
  1183.   dst.w = \frac{src0.w}{src1.w}
  1184.  
  1185.  
  1186. .. opcode:: UDIV - Unsigned Integer Division
  1187.  
  1188.    For division by zero, 0xffffffff is returned.
  1189.  
  1190. .. math::
  1191.  
  1192.   dst.x = \frac{src0.x}{src1.x}
  1193.  
  1194.   dst.y = \frac{src0.y}{src1.y}
  1195.  
  1196.   dst.z = \frac{src0.z}{src1.z}
  1197.  
  1198.   dst.w = \frac{src0.w}{src1.w}
  1199.  
  1200.  
  1201. .. opcode:: UMOD - Unsigned Integer Remainder
  1202.  
  1203.    If the second argument is zero, 0xffffffff is returned.
  1204.  
  1205. .. math::
  1206.  
  1207.   dst.x = src0.x \bmod src1.x
  1208.  
  1209.   dst.y = src0.y \bmod src1.y
  1210.  
  1211.   dst.z = src0.z \bmod src1.z
  1212.  
  1213.   dst.w = src0.w \bmod src1.w
  1214.  
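The zero-divisor rule for UDIV and UMOD can be sketched in Python for a single component as follows (hypothetical function names; operands are assumed to be unsigned 32-bit values):

```python
ALL_ONES = 0xFFFFFFFF


def udiv(a, b):
    """Unsigned integer division; a zero divisor yields 0xffffffff."""
    return ALL_ONES if b == 0 else a // b


def umod(a, b):
    """Unsigned integer remainder; a zero divisor yields 0xffffffff."""
    return ALL_ONES if b == 0 else a % b


print(udiv(7, 2), umod(7, 2))
print(hex(udiv(7, 0)), hex(umod(7, 0)))
```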
  1215.  
  1216. .. opcode:: NOT - Bitwise Not
  1217.  
  1218. .. math::
  1219.  
  1220.   dst.x = \sim src.x
  1221.  
  1222.   dst.y = \sim src.y
  1223.  
  1224.   dst.z = \sim src.z
  1225.  
  1226.   dst.w = \sim src.w
  1227.  
  1228.  
  1229. .. opcode:: AND - Bitwise And
  1230.  
  1231. .. math::
  1232.  
  1233.   dst.x = src0.x \& src1.x
  1234.  
  1235.   dst.y = src0.y \& src1.y
  1236.  
  1237.   dst.z = src0.z \& src1.z
  1238.  
  1239.   dst.w = src0.w \& src1.w
  1240.  
  1241.  
  1242. .. opcode:: OR - Bitwise Or
  1243.  
  1244. .. math::
  1245.  
  1246.   dst.x = src0.x | src1.x
  1247.  
  1248.   dst.y = src0.y | src1.y
  1249.  
  1250.   dst.z = src0.z | src1.z
  1251.  
  1252.   dst.w = src0.w | src1.w
  1253.  
  1254.  
  1255. .. opcode:: XOR - Bitwise Xor
  1256.  
  1257. .. math::
  1258.  
  1259.   dst.x = src0.x \oplus src1.x
  1260.  
  1261.   dst.y = src0.y \oplus src1.y
  1262.  
  1263.   dst.z = src0.z \oplus src1.z
  1264.  
  1265.   dst.w = src0.w \oplus src1.w
  1266.  
  1267.  
  1268. .. opcode:: IMAX - Maximum of Signed Integers
  1269.  
  1270. .. math::
  1271.  
  1272.   dst.x = max(src0.x, src1.x)
  1273.  
  1274.   dst.y = max(src0.y, src1.y)
  1275.  
  1276.   dst.z = max(src0.z, src1.z)
  1277.  
  1278.   dst.w = max(src0.w, src1.w)
  1279.  
  1280.  
  1281. .. opcode:: UMAX - Maximum of Unsigned Integers
  1282.  
  1283. .. math::
  1284.  
  1285.   dst.x = max(src0.x, src1.x)
  1286.  
  1287.   dst.y = max(src0.y, src1.y)
  1288.  
  1289.   dst.z = max(src0.z, src1.z)
  1290.  
  1291.   dst.w = max(src0.w, src1.w)
  1292.  
  1293.  
  1294. .. opcode:: IMIN - Minimum of Signed Integers
  1295.  
  1296. .. math::
  1297.  
  1298.   dst.x = min(src0.x, src1.x)
  1299.  
  1300.   dst.y = min(src0.y, src1.y)
  1301.  
  1302.   dst.z = min(src0.z, src1.z)
  1303.  
  1304.   dst.w = min(src0.w, src1.w)
  1305.  
  1306.  
  1307. .. opcode:: UMIN - Minimum of Unsigned Integers
  1308.  
  1309. .. math::
  1310.  
  1311.   dst.x = min(src0.x, src1.x)
  1312.  
  1313.   dst.y = min(src0.y, src1.y)
  1314.  
  1315.   dst.z = min(src0.z, src1.z)
  1316.  
  1317.   dst.w = min(src0.w, src1.w)
  1318.  
  1319.  
  1320. .. opcode:: SHL - Shift Left
  1321.  
  1322.    The shift count is masked with 0x1f before the shift is applied.
  1323.  
  1324. .. math::
  1325.  
  1326.   dst.x = src0.x << (0x1f \& src1.x)
  1327.  
  1328.   dst.y = src0.y << (0x1f \& src1.y)
  1329.  
  1330.   dst.z = src0.z << (0x1f \& src1.z)
  1331.  
  1332.   dst.w = src0.w << (0x1f \& src1.w)
  1333.  
  1334.  
  1335. .. opcode:: ISHR - Arithmetic Shift Right (of Signed Integer)
  1336.  
  1337.    The shift count is masked with 0x1f before the shift is applied.
  1338.  
  1339. .. math::
  1340.  
  1341.   dst.x = src0.x >> (0x1f \& src1.x)
  1342.  
  1343.   dst.y = src0.y >> (0x1f \& src1.y)
  1344.  
  1345.   dst.z = src0.z >> (0x1f \& src1.z)
  1346.  
  1347.   dst.w = src0.w >> (0x1f \& src1.w)
  1348.  
  1349.  
  1350. .. opcode:: USHR - Logical Shift Right
  1351.  
  1352.    The shift count is masked with 0x1f before the shift is applied.
  1353.  
  1354. .. math::
  1355.  
  1356.   dst.x = src0.x >> (unsigned) (0x1f \& src1.x)
  1357.  
  1358.   dst.y = src0.y >> (unsigned) (0x1f \& src1.y)
  1359.  
  1360.   dst.z = src0.z >> (unsigned) (0x1f \& src1.z)
  1361.  
  1362.   dst.w = src0.w >> (unsigned) (0x1f \& src1.w)
  1363.  
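The three shift opcodes differ only in how the value is interpreted once the count has been masked. The Python sketch below models one component of each; the function names and the ``to_signed`` helper are assumptions of the sketch, needed because Python integers are not fixed-width.

```python
MASK32 = 0xFFFFFFFF

def to_signed(v):
    """Reinterpret a 32-bit pattern as a two's complement signed integer."""
    v &= MASK32
    return v - (1 << 32) if v & 0x80000000 else v

def shl(a, n):
    return (a << (n & 0x1F)) & MASK32

def ishr(a, n):
    # Python's >> on a negative int is arithmetic, so the sign bit is copied in.
    return (to_signed(a) >> (n & 0x1F)) & MASK32

def ushr(a, n):
    # On a non-negative int, >> shifts in zeros.
    return (a & MASK32) >> (n & 0x1F)

print(shl(1, 33))                  # 2, because 33 & 0x1f == 1
print(hex(ishr(0x80000000, 4)))    # 0xf8000000
print(hex(ushr(0x80000000, 4)))    # 0x8000000
```
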
  1364.  
  1365. .. opcode:: UCMP - Integer Conditional Move
  1366.  
  1367. .. math::
  1368.  
  1369.   dst.x = src0.x ? src1.x : src2.x
  1370.  
  1371.   dst.y = src0.y ? src1.y : src2.y
  1372.  
  1373.   dst.z = src0.z ? src1.z : src2.z
  1374.  
  1375.   dst.w = src0.w ? src1.w : src2.w
  1376.  
  1377.  
  1378.  
  1379. .. opcode:: ISSG - Integer Set Sign
  1380.  
  1381. .. math::
  1382.  
  1383.   dst.x = (src0.x < 0) ? -1 : (src0.x > 0) ? 1 : 0
  1384.  
  1385.   dst.y = (src0.y < 0) ? -1 : (src0.y > 0) ? 1 : 0
  1386.  
  1387.   dst.z = (src0.z < 0) ? -1 : (src0.z > 0) ? 1 : 0
  1388.  
  1389.   dst.w = (src0.w < 0) ? -1 : (src0.w > 0) ? 1 : 0
  1390.  
  1391.  
  1392.  
  1393. .. opcode:: FSLT - Float Set On Less Than (ordered)
  1394.  
  1395.    Same comparison as SLT, but returns an integer mask instead of a 1.0/0.0 float.
  1396.  
  1397. .. math::
  1398.  
  1399.   dst.x = (src0.x < src1.x) ? \sim 0 : 0
  1400.  
  1401.   dst.y = (src0.y < src1.y) ? \sim 0 : 0
  1402.  
  1403.   dst.z = (src0.z < src1.z) ? \sim 0 : 0
  1404.  
  1405.   dst.w = (src0.w < src1.w) ? \sim 0 : 0
  1406.  
  1407.  
  1408. .. opcode:: ISLT - Signed Integer Set On Less Than
  1409.  
  1410. .. math::
  1411.  
  1412.   dst.x = (src0.x < src1.x) ? \sim 0 : 0
  1413.  
  1414.   dst.y = (src0.y < src1.y) ? \sim 0 : 0
  1415.  
  1416.   dst.z = (src0.z < src1.z) ? \sim 0 : 0
  1417.  
  1418.   dst.w = (src0.w < src1.w) ? \sim 0 : 0
  1419.  
  1420.  
  1421. .. opcode:: USLT - Unsigned Integer Set On Less Than
  1422.  
  1423. .. math::
  1424.  
  1425.   dst.x = (src0.x < src1.x) ? \sim 0 : 0
  1426.  
  1427.   dst.y = (src0.y < src1.y) ? \sim 0 : 0
  1428.  
  1429.   dst.z = (src0.z < src1.z) ? \sim 0 : 0
  1430.  
  1431.   dst.w = (src0.w < src1.w) ? \sim 0 : 0
  1432.  
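ISLT and USLT share the same formula and differ only in how the 32-bit pattern is interpreted. The Python sketch below shows the difference for one component and how the resulting all-ones mask can feed UCMP. All function names are illustrative assumptions, not TGSI identifiers.

```python
MASK32 = 0xFFFFFFFF
TRUE = MASK32  # ~0 as an unsigned 32-bit value

def to_signed(v):
    """Reinterpret a 32-bit pattern as a two's complement signed integer."""
    v &= MASK32
    return v - (1 << 32) if v & 0x80000000 else v

def islt(a, b):
    return TRUE if to_signed(a) < to_signed(b) else 0

def uslt(a, b):
    return TRUE if (a & MASK32) < (b & MASK32) else 0

def ucmp(cond, a, b):
    # UCMP selects the second operand when the condition is non-zero.
    return a if (cond & MASK32) != 0 else b

bits = 0xFFFFFFFF                    # -1 when signed, 4294967295 when unsigned
print(hex(islt(bits, 1)))            # 0xffffffff
print(hex(uslt(bits, 1)))            # 0x0
print(ucmp(islt(bits, 1), 10, 20))   # 10
```
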
  1433.  
  1434. .. opcode:: FSGE - Float Set On Greater Than or Equal (ordered)
  1435.  
  1436.    Same comparison as SGE, but returns an integer mask instead of a 1.0/0.0 float.
  1437.  
  1438. .. math::
  1439.  
  1440.   dst.x = (src0.x >= src1.x) ? \sim 0 : 0
  1441.  
  1442.   dst.y = (src0.y >= src1.y) ? \sim 0 : 0
  1443.  
  1444.   dst.z = (src0.z >= src1.z) ? \sim 0 : 0
  1445.  
  1446.   dst.w = (src0.w >= src1.w) ? \sim 0 : 0
  1447.  
  1448.  
  1449. .. opcode:: ISGE - Signed Integer Set On Greater Than or Equal
  1450.  
  1451. .. math::
  1452.  
  1453.   dst.x = (src0.x >= src1.x) ? \sim 0 : 0
  1454.  
  1455.   dst.y = (src0.y >= src1.y) ? \sim 0 : 0
  1456.  
  1457.   dst.z = (src0.z >= src1.z) ? \sim 0 : 0
  1458.  
  1459.   dst.w = (src0.w >= src1.w) ? \sim 0 : 0
  1460.  
  1461.  
  1462. .. opcode:: USGE - Unsigned Integer Set On Greater Than or Equal
  1463.  
  1464. .. math::
  1465.  
  1466.   dst.x = (src0.x >= src1.x) ? \sim 0 : 0
  1467.  
  1468.   dst.y = (src0.y >= src1.y) ? \sim 0 : 0
  1469.  
  1470.   dst.z = (src0.z >= src1.z) ? \sim 0 : 0
  1471.  
  1472.   dst.w = (src0.w >= src1.w) ? \sim 0 : 0
  1473.  
  1474.  
  1475. .. opcode:: FSEQ - Float Set On Equal (ordered)
  1476.  
  1477.    Same comparison as SEQ, but returns an integer mask instead of a 1.0/0.0 float.
  1478.  
  1479. .. math::
  1480.  
  1481.   dst.x = (src0.x == src1.x) ? \sim 0 : 0
  1482.  
  1483.   dst.y = (src0.y == src1.y) ? \sim 0 : 0
  1484.  
  1485.   dst.z = (src0.z == src1.z) ? \sim 0 : 0
  1486.  
  1487.   dst.w = (src0.w == src1.w) ? \sim 0 : 0
  1488.  
  1489.  
  1490. .. opcode:: USEQ - Integer Set On Equal
  1491.  
  1492. .. math::
  1493.  
  1494.   dst.x = (src0.x == src1.x) ? \sim 0 : 0
  1495.  
  1496.   dst.y = (src0.y == src1.y) ? \sim 0 : 0
  1497.  
  1498.   dst.z = (src0.z == src1.z) ? \sim 0 : 0
  1499.  
  1500.   dst.w = (src0.w == src1.w) ? \sim 0 : 0
  1501.  
  1502.  
  1503. .. opcode:: FSNE - Float Set On Not Equal (unordered)
  1504.  
  1505.    Same comparison as SNE, but returns an integer mask instead of a 1.0/0.0 float.
  1506.  
  1507. .. math::
  1508.  
  1509.   dst.x = (src0.x != src1.x) ? \sim 0 : 0
  1510.  
  1511.   dst.y = (src0.y != src1.y) ? \sim 0 : 0
  1512.  
  1513.   dst.z = (src0.z != src1.z) ? \sim 0 : 0
  1514.  
  1515.   dst.w = (src0.w != src1.w) ? \sim 0 : 0
  1516.  
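The "ordered" and "unordered" labels describe what happens when an operand is NaN: ordered comparisons are false, while the unordered FSNE is true. The Python sketch below illustrates this for one component. It assumes Python floats (doubles) stand in for 32-bit floats, which does not change the NaN behavior, and the function names are illustrative only.

```python
import math

TRUE = 0xFFFFFFFF  # ~0 as an unsigned 32-bit value

def fslt(a, b):
    return TRUE if a < b else 0

def fsge(a, b):
    return TRUE if a >= b else 0

def fseq(a, b):
    return TRUE if a == b else 0

def fsne(a, b):
    return TRUE if a != b else 0

nan = math.nan
print(fslt(nan, 1.0), fsge(nan, 1.0), fseq(nan, nan))  # 0 0 0
print(hex(fsne(nan, nan)))                             # 0xffffffff
```
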
  1517.  
  1518. .. opcode:: USNE - Integer Set On Not Equal
  1519.  
  1520. .. math::
  1521.  
  1522.   dst.x = (src0.x != src1.x) ? \sim 0 : 0
  1523.  
  1524.   dst.y = (src0.y != src1.y) ? \sim 0 : 0
  1525.  
  1526.   dst.z = (src0.z != src1.z) ? \sim 0 : 0
  1527.  
  1528.   dst.w = (src0.w != src1.w) ? \sim 0 : 0
  1529.  
  1530.  
  1531. .. opcode:: INEG - Integer Negate
  1532.  
  1533.   Two's complement.
  1534.  
  1535. .. math::
  1536.  
  1537.   dst.x = -src.x
  1538.  
  1539.   dst.y = -src.y
  1540.  
  1541.   dst.z = -src.z
  1542.  
  1543.   dst.w = -src.w
  1544.  
  1545.  
  1546. .. opcode:: IABS - Integer Absolute Value
  1547.  
  1548. .. math::
  1549.  
  1550.   dst.x = |src.x|
  1551.  
  1552.   dst.y = |src.y|
  1553.  
  1554.   dst.z = |src.z|
  1555.  
  1556.   dst.w = |src.w|
  1557.  
  1558. Bitwise ISA
  1559. ^^^^^^^^^^^
  1560. These opcodes are used for bit-level manipulation of integers.
  1561.  
  1562. .. opcode:: IBFE - Signed Bitfield Extract
  1563.  
  1564.   See SM5 instruction of the same name. Extracts a set of bits from the input,
  1565.   and sign-extends them if the high bit of the extracted window is set.
  1566.  
  1567.   Pseudocode::
  1568.  
  1569.     def ibfe(value, offset, bits):
  1570.       offset = offset & 0x1f
  1571.       bits = bits & 0x1f
  1572.       if bits == 0: return 0
  1573.       # Note: >> sign-extends
  1574.       if bits + offset < 32:
  1575.         return (value << (32 - offset - bits)) >> (32 - bits)
  1576.       else:
  1577.         return value >> offset
  1578.  
  1579. .. opcode:: UBFE - Unsigned Bitfield Extract
  1580.  
  1581.   See SM5 instruction of the same name. Extracts a set of bits from the input,
  1582.   without any sign-extension.
  1583.  
  1584.   Pseudocode::
  1585.  
  1586.     def ubfe(value, offset, bits):
  1587.       offset = offset & 0x1f
  1588.       bits = bits & 0x1f
  1589.       if bits == 0: return 0
  1590.       # Note: >> does not sign-extend
  1591.       if bits + offset < 32:
  1592.         return (value << (32 - offset - bits)) >> (32 - bits)
  1593.       else:
  1594.         return value >> offset
  1595.  
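The two pseudocode listings rely on fixed-width shifts. A runnable Python approximation has to add the 32-bit masking and the signed reinterpretation explicitly; both are assumptions of this sketch, as is the ``to_signed`` helper. Results are returned as unsigned 32-bit patterns.

```python
MASK32 = 0xFFFFFFFF

def to_signed(v):
    """Reinterpret a 32-bit pattern as a two's complement signed integer."""
    v &= MASK32
    return v - (1 << 32) if v & 0x80000000 else v

def ubfe(value, offset, bits):
    offset &= 0x1F
    bits &= 0x1F
    if bits == 0:
        return 0
    value &= MASK32
    if bits + offset < 32:
        # Move the field to the top of the word, then shift zeros in.
        return ((value << (32 - offset - bits)) & MASK32) >> (32 - bits)
    return value >> offset

def ibfe(value, offset, bits):
    offset &= 0x1F
    bits &= 0x1F
    if bits == 0:
        return 0
    if bits + offset < 32:
        # Move the field to the top of the word, then shift the sign bit in.
        top = (value << (32 - offset - bits)) & MASK32
        return (to_signed(top) >> (32 - bits)) & MASK32
    return (to_signed(value) >> offset) & MASK32

print(hex(ubfe(0xABCD1234, 8, 8)))   # 0x12
print(hex(ubfe(0x000000F0, 4, 4)))   # 0xf
print(hex(ibfe(0x000000F0, 4, 4)))   # 0xffffffff
```
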
  1596. .. opcode:: BFI - Bitfield Insert
  1597.  
  1598.   See SM5 instruction of the same name. Replaces a bit region of 'base' with
  1599.   the low bits of 'insert'.
  1600.  
  1601.   Pseudocode::
  1602.  
  1603.     def bfi(base, insert, offset, bits):
  1604.       offset = offset & 0x1f
  1605.       bits = bits & 0x1f
  1606.       mask = ((1 << bits) - 1) << offset
  1607.       return ((insert << offset) & mask) | (base & ~mask)
  1608.  
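The BFI pseudocode runs almost unchanged in Python. The sketch below only adds masking to 32 bits, which is an assumption needed because Python integers are unbounded.

```python
MASK32 = 0xFFFFFFFF

def bfi(base, insert, offset, bits):
    offset &= 0x1F
    bits &= 0x1F
    mask = (((1 << bits) - 1) << offset) & MASK32
    # Keep base outside the mask, take the shifted insert inside it.
    return (((insert << offset) & mask) | (base & ~mask)) & MASK32

print(hex(bfi(0xFFFFFFFF, 0x00, 8, 8)))  # 0xffff00ff
print(hex(bfi(0x00000000, 0xAB, 4, 8)))  # 0xab0
```

Note that with ``bits`` masked to 0x1f, a width of zero leaves ``base`` unchanged.
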
  1609. .. opcode:: BREV - Bitfield Reverse
  1610.  
  1611.   See SM5 instruction BFREV. Reverses the bits of the argument.
  1612.  
  1613. .. opcode:: POPC - Population Count
  1614.  
  1615.   See SM5 instruction COUNTBITS. Counts the number of set bits in the argument.
  1616.  
  1617. .. opcode:: LSB - Index of lowest set bit
  1618.  
  1619.   See SM5 instruction FIRSTBIT_LO. Computes the 0-based index of the first set
  1620.   bit of the argument. Returns -1 if none are set.
  1621.  
  1622. .. opcode:: IMSB - Index of highest non-sign bit
  1623.  
  1624.   See SM5 instruction FIRSTBIT_SHI. Computes the 0-based index of the highest
  1625.   non-sign bit of the argument (i.e. highest 0 bit for negative numbers,
  1626.   highest 1 bit for positive numbers). Returns -1 if all bits are the same
  1627.   (i.e. for inputs 0 and -1).
  1628.  
  1629. .. opcode:: UMSB - Index of highest set bit
  1630.  
  1631.   See SM5 instruction FIRSTBIT_HI. Computes the 0-based index of the highest
  1632.   set bit of the argument. Returns -1 if none are set.
  1633.  
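The five bit-query opcodes above can be modeled per component in Python as sketched below. The sketch assumes that bit indices count up from the least significant bit (bit 0) and returns the "none found" result as the Python integer -1; the function names are illustrative only.

```python
MASK32 = 0xFFFFFFFF

def brev(v):
    """Reverse the order of the 32 bits."""
    return int(format(v & MASK32, "032b")[::-1], 2)

def popc(v):
    """Count the set bits."""
    return bin(v & MASK32).count("1")

def lsb(v):
    """Index of the lowest set bit, or -1 if none are set."""
    v &= MASK32
    return (v & -v).bit_length() - 1

def umsb(v):
    """Index of the highest set bit, or -1 if none are set."""
    return (v & MASK32).bit_length() - 1

def imsb(v):
    """Index of the highest non-sign bit, or -1 for inputs 0 and -1."""
    v &= MASK32
    if v & 0x80000000:
        v = ~v & MASK32  # negative input: look for the highest 0 bit
    return v.bit_length() - 1

print(hex(brev(1)), popc(0xF0F0), lsb(8), umsb(0x80000000))  # 0x80000000 8 3 31
print(imsb(5), imsb(0xFFFFFFF0), imsb(0), imsb(0xFFFFFFFF))  # 2 3 -1 -1
```
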
  1634. Geometry ISA
  1635. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  1636.  
  1637. These opcodes are only supported in geometry shaders; they have no meaning
  1638. in any other type of shader.
  1639.  
  1640. .. opcode:: EMIT - Emit
  1641.  
  1642.   Generate a new vertex for the current primitive into the specified vertex
  1643.   stream using the values in the output registers.
  1644.  
  1645.  
  1646. .. opcode:: ENDPRIM - End Primitive
  1647.  
  1648.   Complete the current primitive in the specified vertex stream (consisting of
  1649.   the emitted vertices), and start a new one.
  1650.  
  1651.  
  1652. GLSL ISA
  1653. ^^^^^^^^^^
  1654.  
  1655. These opcodes are part of :term:`GLSL`'s opcode set. Support for these
  1656. opcodes is determined by a special capability bit, ``GLSL``.
  1657. Some require GLSL version 1.30 (UIF/BREAKC/SWITCH/CASE/DEFAULT/ENDSWITCH).
  1658.  
  1659. .. opcode:: CAL - Subroutine Call
  1660.  
  1661.   push(pc)
  1662.   pc = target
  1663.  
  1664.  
  1665. .. opcode:: RET - Subroutine Call Return
  1666.  
  1667.   pc = pop()
  1668.  
  1669.  
  1670. .. opcode:: CONT - Continue
  1671.  
  1672.   Unconditionally moves the point of execution to the instruction after the
  1673.   last bgnloop. The instruction must appear within a bgnloop/endloop.
  1674.  
  1675. .. note::
  1676.  
  1677.    Support for CONT is determined by a special capability bit,
  1678.    ``TGSI_CONT_SUPPORTED``. See :ref:`Screen` for more information.
  1679.  
  1680.  
  1681. .. opcode:: BGNLOOP - Begin a Loop
  1682.  
  1683.   Start a loop. Must have a matching endloop.
  1684.  
  1685.  
  1686. .. opcode:: BGNSUB - Begin Subroutine
  1687.  
  1688.   Starts definition of a subroutine. Must have a matching endsub.
  1689.  
  1690.  
  1691. .. opcode:: ENDLOOP - End a Loop
  1692.  
  1693.   End a loop started with bgnloop.
  1694.  
  1695.  
  1696. .. opcode:: ENDSUB - End Subroutine
  1697.  
  1698.   Ends definition of a subroutine.
  1699.  
  1700.  
  1701. .. opcode:: NOP - No Operation
  1702.  
  1703.   Do nothing.
  1704.  
  1705.  
  1706. .. opcode:: BRK - Break
  1707.  
  1708.   Unconditionally moves the point of execution to the instruction after the
  1709.   next endloop or endswitch. The instruction must appear within a
  1710.   bgnloop/endloop or switch/endswitch.
  1711.  
  1712.  
  1713. .. opcode:: BREAKC - Break Conditional
  1714.  
  1715.   Conditionally moves the point of execution to the instruction after the
  1716.   next endloop or endswitch. The instruction must appear within a
  1717.   bgnloop/endloop or switch/endswitch.
  1718.   The condition evaluates to true if src0.x != 0, where src0.x is interpreted
  1719.   as an integer register.
  1720.  
  1721. .. note::
  1722.  
  1723.    Considered for removal, as it is inconsistent with other opcodes
  1724.    (it could be emulated with UIF/BRK/ENDIF).
  1725.  
  1726.  
  1727. .. opcode:: IF - Float If
  1728.  
  1729.   Start an IF ... ELSE ... ENDIF block. The condition evaluates to true if
  1730.  
  1731.     src0.x != 0.0
  1732.  
  1733.   where src0.x is interpreted as a floating point register.
  1734.  
  1735.  
  1736. .. opcode:: UIF - Bitwise If
  1737.  
  1738.   Start a UIF ... ELSE ... ENDIF block. The condition evaluates to true if
  1739.  
  1740.     src0.x != 0
  1741.  
  1742.   where src0.x is interpreted as an integer register.
  1743.  
  1744.  
  1745. .. opcode:: ELSE - Else
  1746.  
  1747.   Starts an else block, after an IF or UIF statement.
  1748.  
  1749.  
  1750. .. opcode:: ENDIF - End If
  1751.  
  1752.   Ends an IF or UIF block.
  1753.  
  1754.  
  1755. .. opcode:: SWITCH - Switch
  1756.  
  1757.    Starts a C-style switch statement. The switch consists of one or more
  1758.    CASE statements and at most one DEFAULT statement. Execution of a case
  1759.    ends when a BRK is hit but, just as in C, falling through to the next case
  1760.    without a BRK is allowed. Similarly, the DEFAULT label may appear anywhere,
  1761.    not just as the last statement, and fallthrough is allowed into and out of it.
  1762.    CASE src arguments are compared bitwise against the SWITCH src argument.
  1763.  
  1764.    Example::
  1765.  
  1766.      SWITCH src[0].x
  1767.      CASE src[0].x
  1768.      (some instructions here)
  1769.      (optional BRK here)
  1770.      DEFAULT
  1771.      (some instructions here)
  1772.      (optional BRK here)
  1773.      CASE src[0].x
  1774.      (some instructions here)
  1775.      (optional BRK here)
  1776.      ENDSWITCH
  1777.  
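The fallthrough and DEFAULT placement rules can be illustrated with a small Python model. This is a deliberately simplified sketch: the tuple encoding of statements and the ``run_switch`` helper are hypothetical, every BRK is assumed to be unconditional and at the top level of the switch, and nested control flow is not modeled.

```python
def run_switch(selector, body):
    """Return the names of the OP statements executed for a selector.

    body is a list of tuples: ("CASE", value), ("DEFAULT",), ("BRK",)
    or ("OP", name).
    """
    start = None
    # A matching CASE label takes priority, wherever DEFAULT appears.
    for i, stmt in enumerate(body):
        if stmt[0] == "CASE" and stmt[1] == selector:
            start = i
            break
    # Otherwise fall back to the DEFAULT label, if there is one.
    if start is None:
        for i, stmt in enumerate(body):
            if stmt[0] == "DEFAULT":
                start = i
                break
    executed = []
    if start is None:
        return executed
    # Run until a BRK is hit, falling through any labels on the way.
    for stmt in body[start:]:
        if stmt[0] == "BRK":
            break
        if stmt[0] == "OP":
            executed.append(stmt[1])
    return executed

body = [
    ("CASE", 1), ("OP", "a"),             # no BRK: falls through into DEFAULT
    ("DEFAULT",), ("OP", "b"), ("BRK",),
    ("CASE", 2), ("OP", "c"), ("BRK",),
]
print(run_switch(1, body))  # ['a', 'b']
print(run_switch(2, body))  # ['c']
print(run_switch(7, body))  # ['b']
```
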
  1778.  
  1779. .. opcode:: CASE - Switch case
  1780.  
  1781.    This represents a switch case label. The src arg must be an integer immediate.
  1782.  
  1783.  
  1784. .. opcode:: DEFAULT - Switch default
  1785.  
  1786.    This represents the default case in the switch, which is taken if no other
  1787.    case matches.
  1788.  
  1789.  
  1790. .. opcode:: ENDSWITCH - End of switch
  1791.  
  1792.    Ends a switch expression.
  1793.  
  1794.  
  1795. Interpolation ISA
  1796. ^^^^^^^^^^^^^^^^^
  1797.  
  1798. The interpolation instructions allow an input to be interpolated in a
  1799. different way than its declaration. This corresponds to the GLSL 4.00
  1800. interpolateAt* functions. The first argument of each of these must come from
  1801. ``TGSI_FILE_INPUT``.
  1802.  
  1803. .. opcode:: INTERP_CENTROID - Interpolate at the centroid
  1804.  
  1805.    Interpolates the varying specified by src0 at the centroid.
  1806.  
  1807. .. opcode:: INTERP_SAMPLE - Interpolate at the specified sample
  1808.  
  1809.    Interpolates the varying specified by src0 at the sample id specified by
  1810.    src1.x (interpreted as an integer).
  1811.  
  1812. .. opcode:: INTERP_OFFSET - Interpolate at the specified offset
  1813.  
  1814.    Interpolates the varying specified by src0 at the offset src1.xy
  1815.    (interpreted as floats) from the pixel center.
  1816.  
  1817.  
  1818. .. _doubleopcodes:
  1819.  
  1820. Double ISA
  1821. ^^^^^^^^^^^^^^^
  1822.  
  1823. The double-precision opcodes reinterpret four-component vectors into
  1824. two-component vectors with doubled precision in each component.
  1825.  
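To make the reinterpretation concrete, the Python sketch below splits a 64-bit double into two 32-bit words and joins them again, the way a pair of components such as ``.xy`` carries one double. The word order (low word first) is an assumption of this sketch; this section does not specify which component holds which half. The helper names are hypothetical.

```python
import struct

def pack_double(value):
    """Split a double into two 32-bit words, low word first (assumed order)."""
    lo, hi = struct.unpack("<II", struct.pack("<d", value))
    return lo, hi

def unpack_double(lo, hi):
    """Join two 32-bit words back into a double."""
    return struct.unpack("<d", struct.pack("<II", lo, hi))[0]

lo, hi = pack_double(1.0)
print(hex(lo), hex(hi))       # 0x0 0x3ff00000
print(unpack_double(lo, hi))  # 1.0
```
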
  1826. .. opcode:: DABS - Absolute
  1827.  
  1828. .. math:: dst.xy = |src0.xy|
  1829. .. math:: dst.zw = |src0.zw|
  1830.  
  1831. .. opcode:: DADD - Add
  1832.  
  1833. .. math::
  1834.  
  1835.   dst.xy = src0.xy + src1.xy
  1836.  
  1837.   dst.zw = src0.zw + src1.zw
  1838.  
  1839. .. opcode:: DSEQ - Set on Equal
  1840.  
  1841. .. math::
  1842.  
  1843.   dst.x = src0.xy == src1.xy ? \sim 0 : 0
  1844.  
  1845.   dst.z = src0.zw == src1.zw ? \sim 0 : 0
  1846.  
  1847. .. opcode:: DSNE - Set on Not Equal
  1848.  
  1849. .. math::
  1850.  
  1851.   dst.x = src0.xy != src1.xy ? \sim 0 : 0
  1852.  
  1853.   dst.z = src0.zw != src1.zw ? \sim 0 : 0
  1854.  
  1855. .. opcode:: DSLT - Set on Less than
  1856.  
  1857. .. math::
  1858.  
  1859.   dst.x = src0.xy < src1.xy ? \sim 0 : 0
  1860.  
  1861.   dst.z = src0.zw < src1.zw ? \sim 0 : 0
  1862.  
  1863. .. opcode:: DSGE - Set on Greater equal
  1864.  
  1865. .. math::
  1866.  
  1867.   dst.x = src0.xy >= src1.xy ? \sim 0 : 0
  1868.  
  1869.   dst.z = src0.zw >= src1.zw ? \sim 0 : 0
  1870.  
  1871. .. opcode:: DFRAC - Fraction
  1872.  
  1873. .. math::
  1874.  
  1875.   dst.xy = src.xy - \lfloor src.xy\rfloor
  1876.  
  1877.   dst.zw = src.zw - \lfloor src.zw\rfloor
  1878.  
  1879. .. opcode:: DTRUNC - Truncate
  1880.  
  1881. .. math::
  1882.  
  1883.   dst.xy = trunc(src.xy)
  1884.  
  1885.   dst.zw = trunc(src.zw)
  1886.  
  1887. .. opcode:: DCEIL - Ceiling
  1888.  
  1889. .. math::
  1890.  
  1891.   dst.xy = \lceil src.xy\rceil
  1892.  
  1893.   dst.zw = \lceil src.zw\rceil
  1894.  
  1895. .. opcode:: DFLR - Floor
  1896.  
  1897. .. math::
  1898.  
  1899.   dst.xy = \lfloor src.xy\rfloor
  1900.  
  1901.   dst.zw = \lfloor src.zw\rfloor
  1902.  
  1903. .. opcode:: DROUND - Round
  1904.  
  1905. .. math::
  1906.  
  1907.   dst.xy = round(src.xy)
  1908.  
  1909.   dst.zw = round(src.zw)
  1910.  
  1911. .. opcode:: DSSG - Set Sign
  1912.  
  1913. .. math::
  1914.  
  1915.   dst.xy = (src.xy > 0) ? 1.0 : (src.xy < 0) ? -1.0 : 0.0
  1916.  
  1917.   dst.zw = (src.zw > 0) ? 1.0 : (src.zw < 0) ? -1.0 : 0.0
  1918.  
  1919. .. opcode:: DFRACEXP - Convert Number to Fractional and Integral Components
  1920.  
  1921. Like the ``frexp()`` routine in many math libraries, this opcode stores the
  1922. exponent of its source to ``dst0``, and the significand to ``dst1``, such that
  1923. :math:`dst1 \times 2^{dst0} = src` .
  1924.  
  1925. .. math::
  1926.  
  1927.   dst0.xy = exp(src.xy)
  1928.  
  1929.   dst1.xy = frac(src.xy)
  1930.  
  1931.   dst0.zw = exp(src.zw)
  1932.  
  1933.   dst1.zw = frac(src.zw)
  1934.  
  1935. .. opcode:: DLDEXP - Multiply Number by Integral Power of 2
  1936.  
  1937. This opcode is the inverse of :opcode:`DFRACEXP`. The second
  1938. source is an integer.
  1939.  
  1940. .. math::
  1941.  
  1942.   dst.xy = src0.xy \times 2^{src1.x}
  1943.  
  1944.   dst.zw = src0.zw \times 2^{src1.y}
  1945.  
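Since DFRACEXP is described in terms of ``frexp()``, the relationship between the two opcodes can be sketched with Python's ``math.frexp`` and ``math.ldexp`` for a single double. The wrapper names and the ``(dst0, dst1)`` return order are assumptions chosen to mirror the text above.

```python
import math

def dfracexp(src):
    """Return (dst0, dst1) = (exponent, significand) of one double."""
    significand, exponent = math.frexp(src)
    return exponent, significand

def dldexp(src0, src1):
    """Return src0 * 2**src1, where src1 is an integer."""
    return math.ldexp(src0, src1)

dst0, dst1 = dfracexp(8.0)
print(dst0, dst1)          # 4 0.5
print(dldexp(dst1, dst0))  # 8.0
```

Feeding the two results of ``dfracexp`` back into ``dldexp`` recovers the source value, which is the sense in which DLDEXP is the inverse.
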
  1946. .. opcode:: DMIN - Minimum
  1947.  
  1948. .. math::
  1949.  
  1950.   dst.xy = min(src0.xy, src1.xy)
  1951.  
  1952.   dst.zw = min(src0.zw, src1.zw)
  1953.  
  1954. .. opcode:: DMAX - Maximum
  1955.  
  1956. .. math::
  1957.  
  1958.   dst.xy = max(src0.xy, src1.xy)
  1959.  
  1960.   dst.zw = max(src0.zw, src1.zw)
  1961.  
  1962. .. opcode:: DMUL - Multiply
  1963.  
  1964. .. math::
  1965.  
  1966.   dst.xy = src0.xy \times src1.xy
  1967.  
  1968.   dst.zw = src0.zw \times src1.zw
  1969.  
  1970.  
  1971. .. opcode:: DMAD - Multiply And Add
  1972.  
  1973. .. math::
  1974.  
  1975.   dst.xy = src0.xy \times src1.xy + src2.xy
  1976.  
  1977.   dst.zw = src0.zw \times src1.zw + src2.zw
  1978.  
  1979.  
  1980. .. opcode:: DFMA - Fused Multiply-Add
  1981.  
  1982. Perform a * b + c with no intermediate rounding step.
  1983.  
  1984. .. math::
  1985.  
  1986.   dst.xy = src0.xy \times src1.xy + src2.xy
  1987.  
  1988.   dst.zw = src0.zw \times src1.zw + src2.zw
  1989.  
  1990.  
  1991. .. opcode:: DRCP - Reciprocal
  1992.  
  1993. .. math::
  1994.  
  1995.    dst.xy = \frac{1}{src.xy}
  1996.  
  1997.    dst.zw = \frac{1}{src.zw}
  1998.  
  1999. .. opcode:: DSQRT - Square Root
  2000.  
  2001. .. math::
  2002.  
  2003.    dst.xy = \sqrt{src.xy}
  2004.  
  2005.    dst.zw = \sqrt{src.zw}
  2006.  
  2007. .. opcode:: DRSQ - Reciprocal Square Root
  2008.  
  2009. .. math::
  2010.  
  2011.    dst.xy = \frac{1}{\sqrt{src.xy}}
  2012.  
  2013.    dst.zw = \frac{1}{\sqrt{src.zw}}
  2014.  
  2015. .. opcode:: F2D - Float to Double
  2016.  
  2017. .. math::
  2018.  
  2019.    dst.xy = double(src0.x)
  2020.  
  2021.    dst.zw = double(src0.y)
  2022.  
  2023. .. opcode:: D2F - Double to Float
  2024.  
  2025. .. math::
  2026.  
  2027.    dst.x = float(src0.xy)
  2028.  
  2029.    dst.y = float(src0.zw)
  2030.  
  2031. .. opcode:: I2D - Int to Double
  2032.  
  2033. .. math::
  2034.  
  2035.    dst.xy = double(src0.x)
  2036.  
  2037.    dst.zw = double(src0.y)
  2038.  
  2039. .. opcode:: D2I - Double to Int
  2040.  
  2041. .. math::
  2042.  
  2043.    dst.x = int(src0.xy)
  2044.  
  2045.    dst.y = int(src0.zw)
  2046.  
  2047. .. opcode:: U2D - Unsigned Int to Double
  2048.  
  2049. .. math::
  2050.  
  2051.    dst.xy = double(src0.x)
  2052.  
  2053.    dst.zw = double(src0.y)
  2054.  
  2055. .. opcode:: D2U - Double to Unsigned Int
  2056.  
  2057. .. math::
  2058.  
  2059.    dst.x = unsigned(src0.xy)
  2060.  
  2061.    dst.y = unsigned(src0.zw)
  2062.  
  2063. .. _samplingopcodes:
  2064.  
  2065. Resource Sampling Opcodes
  2066. ^^^^^^^^^^^^^^^^^^^^^^^^^
  2067.  
  2068. These opcodes closely follow the semantics of the corresponding Direct3D
  2069. instructions. If in doubt, consult the Direct3D documentation.
  2070. Note that the swizzle on SVIEW (src1) determines texel swizzling
  2071. after lookup.
  2072.  
  2073. .. opcode:: SAMPLE
  2074.  
  2075.   Using the provided address, sample data from the specified texture using the
  2076.   filtering mode identified by the given sampler. The source data may come from
  2077.   any resource type other than buffers.
  2078.  
  2079.   Syntax: ``SAMPLE dst, address, sampler_view, sampler``
  2080.  
  2081.   Example: ``SAMPLE TEMP[0], TEMP[1], SVIEW[0], SAMP[0]``
  2082.  
  2083. .. opcode:: SAMPLE_I
  2084.  
  2085.   Simplified alternative to the SAMPLE instruction.  Using the provided
  2086.   integer address, SAMPLE_I fetches data from the specified sampler view
  2087.   without any filtering.  The source data may come from any resource type
  2088.   other than CUBE.
  2089.  
  2090.   Syntax: ``SAMPLE_I dst, address, sampler_view``
  2091.  
  2092.   Example: ``SAMPLE_I TEMP[0], TEMP[1], SVIEW[0]``
  2093.  
  2094.   The 'address' is specified as unsigned integers. If the 'address' is out of
  2095.   range [0...(# texels - 1)] the result of the fetch is always 0 in all
  2096.   components.  As such the instruction doesn't honor address wrap modes; where
  2097.   that behavior is desirable, the 'SAMPLE' instruction should be used instead.
  2098.   address.w always provides an unsigned integer mipmap level. If the value is
  2099.   out of the range then the instruction always returns 0 in all components.
  2100.   address.yz are ignored for buffers and 1d textures.  address.z is ignored
  2101.   for 1d texture arrays and 2d textures.
  2102.  
  2103.   For 1D texture arrays address.y provides the array index (also as unsigned
  2104.   integer). If the value is out of the range of available array indices
  2105.   [0... (array size - 1)] then the opcode always returns 0 in all components.
  2106.   For 2D texture arrays address.z provides the array index, otherwise it
  2107.   exhibits the same behavior as in the case for 1D texture arrays.  The exact
  2108.   semantics of the source address are presented in the table below:
  2109.  
  2110.   +---------------------------+----+-----+-----+---------+
  2111.   | resource type             | X  |  Y  |  Z  |    W    |
  2112.   +===========================+====+=====+=====+=========+
  2113.   | ``PIPE_BUFFER``           | x  |     |     | ignored |
  2114.   +---------------------------+----+-----+-----+---------+
  2115.   | ``PIPE_TEXTURE_1D``       | x  |     |     |   mpl   |
  2116.   +---------------------------+----+-----+-----+---------+
  2117.   | ``PIPE_TEXTURE_2D``       | x  |  y  |     |   mpl   |
  2118.   +---------------------------+----+-----+-----+---------+
  2119.   | ``PIPE_TEXTURE_3D``       | x  |  y  |  z  |   mpl   |
  2120.   +---------------------------+----+-----+-----+---------+
  2121.   | ``PIPE_TEXTURE_RECT``     | x  |  y  |     |   mpl   |
  2122.   +---------------------------+----+-----+-----+---------+
  2123.   | ``PIPE_TEXTURE_CUBE``     | not allowed as source    |
  2124.   +---------------------------+----+-----+-----+---------+
  2125.   | ``PIPE_TEXTURE_1D_ARRAY`` | x  | idx |     |   mpl   |
  2126.   +---------------------------+----+-----+-----+---------+
  2127.   | ``PIPE_TEXTURE_2D_ARRAY`` | x  |  y  | idx |   mpl   |
  2128.   +---------------------------+----+-----+-----+---------+
  2129.  
  2130.   Where 'mpl' is a mipmap level and 'idx' is the array index.
  2131.  
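The out-of-range rules for SAMPLE_I can be sketched for the ``PIPE_TEXTURE_2D`` row of the table. In the Python model below, the nested-list texture layout ``levels[mpl][y][x]`` and the function name are assumptions made for illustration, and address components are assumed to be non-negative already, since they are unsigned integers.

```python
ZERO = (0, 0, 0, 0)

def sample_i_2d(levels, address):
    """Unfiltered fetch from a 2D texture stored as levels[mpl][y][x]."""
    x, y, _z, mpl = address  # address.z is ignored for 2D textures
    if mpl >= len(levels):
        return ZERO          # mipmap level out of range
    image = levels[mpl]
    if y >= len(image) or x >= len(image[0]):
        return ZERO          # texel address out of range, no wrap mode applied
    return image[y][x]

levels = [
    [[(1, 2, 3, 4), (5, 6, 7, 8)],
     [(9, 9, 9, 9), (0, 1, 0, 1)]],  # level 0: 2x2 texels
    [[(7, 7, 7, 7)]],                # level 1: 1x1 texel
]
print(sample_i_2d(levels, (1, 0, 0, 0)))  # (5, 6, 7, 8)
print(sample_i_2d(levels, (2, 0, 0, 0)))  # (0, 0, 0, 0)
print(sample_i_2d(levels, (0, 0, 0, 2)))  # (0, 0, 0, 0)
```
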
  2132. .. opcode:: SAMPLE_I_MS
  2133.  
  2134.   Just like SAMPLE_I but allows fetching data from multi-sampled surfaces.
  2135.  
  2136.   Syntax: ``SAMPLE_I_MS dst, address, sampler_view, sample``
  2137.  
  2138. .. opcode:: SAMPLE_B
  2139.  
  2140.   Just like the SAMPLE instruction with the exception that an additional bias
  2141.   is applied to the level of detail computed as part of the instruction
  2142.   execution.
  2143.  
  2144.   Syntax: ``SAMPLE_B dst, address, sampler_view, sampler, lod_bias``
  2145.  
  2146.   Example: ``SAMPLE_B TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2].x``
  2147.  
  2148. .. opcode:: SAMPLE_C
  2149.  
  2150.   Similar to the SAMPLE instruction but it performs a comparison filter. The
  2151.   operands to SAMPLE_C are identical to SAMPLE, except that there is an
  2152.   additional float32 operand, the reference value, which must be a
  2153.   single-component register or a scalar literal.  SAMPLE_C makes the hardware
  2154.   use the current sampler's compare_func (in pipe_sampler_state) to compare the
  2155.   reference value against the red component of the source resource at each
  2156.   texel that the currently configured texture filter covers based on the
  2157.   provided coordinates.
  2158.  
  2159.   Syntax: ``SAMPLE_C dst, address, sampler_view.r, sampler, ref_value``
  2160.  
  2161.   Example: ``SAMPLE_C TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x``
  2162.  
  2163. .. opcode:: SAMPLE_C_LZ
  2164.  
  2165.   Same as SAMPLE_C, but LOD is 0 and derivatives are ignored. The LZ stands
  2166.   for level-zero.
  2167.  
  2168.   Syntax: ``SAMPLE_C_LZ dst, address, sampler_view.r, sampler, ref_value``
  2169.  
  2170.   Example: ``SAMPLE_C_LZ TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x``
  2171.  
  2172.  
  2173. .. opcode:: SAMPLE_D
  2174.  
  2175.   SAMPLE_D is identical to the SAMPLE opcode except that the derivatives for
  2176.   the source address in the x direction and the y direction are provided by
  2177.   extra parameters.
  2178.  
  2179.   Syntax: ``SAMPLE_D dst, address, sampler_view, sampler, der_x, der_y``
  2180.  
  2181.   Example: ``SAMPLE_D TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2], TEMP[3]``
  2182.  
  2183. .. opcode:: SAMPLE_L
  2184.  
  2185.   SAMPLE_L is identical to the SAMPLE opcode except that the LOD is provided
  2186.   directly as a scalar value, representing no anisotropy.
  2187.  
  2188.   Syntax: ``SAMPLE_L dst, address, sampler_view, sampler, explicit_lod``
  2189.  
  2190.   Example: ``SAMPLE_L TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2].x``
  2191.  
  2192. .. opcode:: GATHER4
  2193.  
  2194.   Gathers the four texels to be used in a bi-linear filtering operation and
  2195.   packs them into a single register.  Only works with 2D, 2D array, cubemap,
  2196.   and cubemap array textures.  For 2D textures, only the addressing modes of
  2197.   the sampler and the top level of any mip pyramid are used. Set W to zero.  It
  2198.   behaves like the SAMPLE instruction, but a filtered sample is not
  2199.   generated. The four samples that contribute to filtering are placed into
  2200.   xyzw in counter-clockwise order, starting with the (u,v) texture coordinate
  2201.   delta at the following locations (-, +), (+, +), (+, -), (-, -), where the
  2202.   magnitude of the deltas is half a texel.
  2203.  
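The component ordering of GATHER4 can be illustrated with a small Python model of a single-channel 2D texture. Several things here are assumptions of the sketch rather than statements from the text above: coordinates are given in unnormalized texel units, out-of-range texels are clamped to the edge, and the texture is stored as ``image[v][u]``.

```python
import math

def gather4(image, u, v):
    """Return the four texels around (u, v) in xyzw order."""
    height, width = len(image), len(image[0])

    def texel(i, j):
        # Clamp-to-edge addressing, assumed for this sketch.
        i = min(max(i, 0), width - 1)
        j = min(max(j, 0), height - 1)
        return image[j][i]

    # Half-texel deltas in the documented order: (-,+), (+,+), (+,-), (-,-).
    deltas = [(-0.5, 0.5), (0.5, 0.5), (0.5, -0.5), (-0.5, -0.5)]
    return tuple(texel(math.floor(u + du), math.floor(v + dv))
                 for du, dv in deltas)

image = [[0, 1],
         [2, 3]]
print(gather4(image, 1.0, 1.0))  # (2, 3, 1, 0)
```
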
  2204.  
  2205. .. opcode:: SVIEWINFO
  2206.  
  2207.   Query the dimensions of a given sampler view.  dst receives width, height,
  2208.   depth or array size and number of mipmap levels as int4. The dst can have a
  2209.   writemask which specifies which of these values the caller is interested in.
  2210.  
  2211.   Syntax: ``SVIEWINFO dst, src_mip_level, sampler_view``
  2212.  
  2213.   Example: ``SVIEWINFO TEMP[0], TEMP[1].x, SVIEW[0]``
  2214.  
  2215.   src_mip_level is an unsigned integer scalar. If it is out of range, the
  2216.   instruction returns 0 for width, height and depth/array size, but the total
  2217.   number of mipmap levels is still returned correctly for the given sampler
  2218.   view.  The returned width, height and depth values are for the mipmap level
  2219.   selected by src_mip_level and are given in texels.  For a 1d texture array,
  2220.   width is in dst.x, array size is in dst.y and dst.z is 0. The number of
  2221.   mipmap levels is still in dst.w.  In contrast to d3d10 resinfo, there's no
  2222.   way in the tgsi instruction encoding to specify the return type
  2223.   (float/rcpfloat/uint), so uint is always used. Also, unlike the SAMPLE
  2224.   instructions, the swizzle on src1 (which resinfo uses to swizzle the dst
  2225.   values) is ignored, due to the interaction with the rcpfloat modifier, which
  2226.   requires some swizzle handling in the state tracker anyway.
  2227.  
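The 1D texture array case described above can be sketched in Python as follows. The function name and parameters are hypothetical, and the sketch assumes the usual mip chain in which each level halves the width down to a minimum of one texel.

```python
def sviewinfo_1d_array(width, array_size, num_levels, src_mip_level):
    """Return (dst.x, dst.y, dst.z, dst.w) for a 1D texture array view."""
    if src_mip_level >= num_levels:
        # Out of range: sizes are 0, the level count is still reported.
        return (0, 0, 0, num_levels)
    level_width = max(1, width >> src_mip_level)
    return (level_width, array_size, 0, num_levels)

print(sviewinfo_1d_array(64, 4, 7, 2))  # (16, 4, 0, 7)
print(sviewinfo_1d_array(64, 4, 7, 7))  # (0, 0, 0, 7)
```
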
  2228. .. opcode:: SAMPLE_POS
  2229.  
  2230.   Query the position of a given sample.  dst receives float4 (x, y, 0, 0)
  2231.   indicating where the sample is located. If the resource is not a multi-sample
  2232.   resource and not a render target, the result is 0.
  2233.  
  2234. .. opcode:: SAMPLE_INFO
  2235.  
  2236.   dst receives number of samples in x.  If the resource is not a multi-sample
  2237.   resource and not a render target, the result is 0.
  2238.  
  2239.  
  2240. .. _resourceopcodes:
  2241.  
  2242. Resource Access Opcodes
  2243. ^^^^^^^^^^^^^^^^^^^^^^^
  2244.  
  2245. .. opcode:: LOAD - Fetch data from a shader resource
  2246.  
  2247.                Syntax: ``LOAD dst, resource, address``
  2248.  
  2249.                Example: ``LOAD TEMP[0], RES[0], TEMP[1]``
  2250.  
  2251.                Using the provided integer address, LOAD fetches data
  2252.                from the specified buffer or texture without any
  2253.                filtering.
  2254.  
  2255.                The 'address' is specified as a vector of unsigned
  2256.                integers.  If the 'address' is out of range the result
  2257.                is unspecified.
  2258.  
  2259.                Only the first mipmap level of a resource can be read
  2260.                from using this instruction.
  2261.  
  2262.                For 1D or 2D texture arrays, the array index is
  2263.                provided as an unsigned integer in address.y or
  2264.                address.z, respectively.  address.yz are ignored for
  2265.                buffers and 1D textures.  address.z is ignored for 1D
  2266.                texture arrays and 2D textures.  address.w is always
  2267.                ignored.
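
The addressing rules above can be summarized as a small lookup, sketched here in C. The enum and function names are hypothetical. The 3D entry is an inference (the text never lists address.z as ignored for 3D resources), not something the text states outright.

```c
#include <assert.h>

/* Hypothetical names for the resource kinds discussed in the text. */
enum res_target {
    RES_BUFFER, RES_1D, RES_2D, RES_3D, RES_1D_ARRAY, RES_2D_ARRAY
};

enum { ADDR_X = 1, ADDR_Y = 2, ADDR_Z = 4, ADDR_W = 8 };

/* Return the mask of address components that LOAD/STORE consult. */
static unsigned used_address_components(enum res_target t)
{
    switch (t) {
    case RES_BUFFER:
    case RES_1D:       return ADDR_X;                   /* yz ignored */
    case RES_1D_ARRAY: return ADDR_X | ADDR_Y;          /* y = array index */
    case RES_2D:       return ADDR_X | ADDR_Y;          /* z ignored */
    case RES_2D_ARRAY: return ADDR_X | ADDR_Y | ADDR_Z; /* z = array index */
    case RES_3D:       return ADDR_X | ADDR_Y | ADDR_Z; /* inferred */
    }
    assert(!"unknown resource target");
    return 0;
}
```

Note that ``ADDR_W`` never appears in a result, matching the statement that address.w is always ignored.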
  2268.  
  2269. .. opcode:: STORE - Write data to a shader resource
  2270.  
  2271.                Syntax: ``STORE resource, address, src``
  2272.  
  2273.                Example: ``STORE RES[0], TEMP[0], TEMP[1]``
  2274.  
  2275.                Using the provided integer address, STORE writes data
  2276.                to the specified buffer or texture.
  2277.  
  2278.                The 'address' is specified as a vector of unsigned
  2279.                integers.  If the 'address' is out of range the result
  2280.                is unspecified.
  2281.  
  2282.                Only the first mipmap level of a resource can be
  2283.                written to using this instruction.
  2284.  
  2285.                For 1D or 2D texture arrays, the array index is
  2286.                provided as an unsigned integer in address.y or
  2287.                address.z, respectively.  address.yz are ignored for
  2288.                buffers and 1D textures.  address.z is ignored for 1D
  2289.                texture arrays and 2D textures.  address.w is always
  2290.                ignored.
  2291.  
  2292.  
  2293. .. _threadsyncopcodes:
  2294.  
  2295. Inter-thread synchronization opcodes
  2296. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  2297.  
  2298. These opcodes are intended for communication between threads running
  2299. within the same compute grid.  For now they're only valid in compute
  2300. programs.
  2301.  
  2302. .. opcode:: MFENCE - Memory fence
  2303.  
  2304.   Syntax: ``MFENCE resource``
  2305.  
  2306.   Example: ``MFENCE RES[0]``
  2307.  
  2308.   This opcode forces strong ordering between any memory access
  2309.   operations that affect the specified resource.  This means that
  2310.   previous loads and stores (and only those) will be performed and
  2311.   visible to other threads before the program execution continues.
  2312.  
  2313.  
  2314. .. opcode:: LFENCE - Load memory fence
  2315.  
  2316.   Syntax: ``LFENCE resource``
  2317.  
  2318.   Example: ``LFENCE RES[0]``
  2319.  
  2320.   Similar to MFENCE, but it only affects the ordering of memory loads.
  2321.  
  2322.  
  2323. .. opcode:: SFENCE - Store memory fence
  2324.  
  2325.   Syntax: ``SFENCE resource``
  2326.  
  2327.   Example: ``SFENCE RES[0]``
  2328.  
  2329.   Similar to MFENCE, but it only affects the ordering of memory stores.
  2330.  
  2331.  
  2332. .. opcode:: BARRIER - Thread group barrier
  2333.  
  2334.   ``BARRIER``
  2335.  
  2336.   This opcode suspends the execution of the current thread until all
  2337.   the remaining threads in the working group reach the same point of
  2338.   the program.  Results are unspecified if any of the remaining
  2339.   threads terminates or never reaches an executed BARRIER instruction.
  2340.  
  2341.  
  2342. .. _atomopcodes:
  2343.  
  2344. Atomic opcodes
  2345. ^^^^^^^^^^^^^^
  2346.  
  2347. These opcodes provide atomic variants of some common arithmetic and
  2348. logical operations.  In this context atomicity means that another
  2349. concurrent memory access operation that affects the same memory
  2350. location is guaranteed to be performed strictly before or after the
  2351. entire execution of the atomic operation.
  2352.  
  2353. For the moment they're only valid in compute programs.
  2354.  
  2355. .. opcode:: ATOMUADD - Atomic integer addition
  2356.  
  2357.   Syntax: ``ATOMUADD dst, resource, offset, src``
  2358.  
  2359.   Example: ``ATOMUADD TEMP[0], RES[0], TEMP[1], TEMP[2]``
  2360.  
  2361.   The following operation is performed atomically on each component:
  2362.  
  2363. .. math::
  2364.  
  2365.   dst_i = resource[offset]_i
  2366.  
  2367.   resource[offset]_i = dst_i + src_i
  2368.  
  2369.  
  2370. .. opcode:: ATOMXCHG - Atomic exchange
  2371.  
  2372.   Syntax: ``ATOMXCHG dst, resource, offset, src``
  2373.  
  2374.   Example: ``ATOMXCHG TEMP[0], RES[0], TEMP[1], TEMP[2]``
  2375.  
  2376.   The following operation is performed atomically on each component:
  2377.  
  2378. .. math::
  2379.  
  2380.   dst_i = resource[offset]_i
  2381.  
  2382.   resource[offset]_i = src_i
  2383.  
  2384.  
  2385. .. opcode:: ATOMCAS - Atomic compare-and-exchange
  2386.  
  2387.   Syntax: ``ATOMCAS dst, resource, offset, cmp, src``
  2388.  
  2389.   Example: ``ATOMCAS TEMP[0], RES[0], TEMP[1], TEMP[2], TEMP[3]``
  2390.  
  2391.   The following operation is performed atomically on each component:
  2392.  
  2393. .. math::
  2394.  
  2395.   dst_i = resource[offset]_i
  2396.  
  2397.   resource[offset]_i = (dst_i == cmp_i ? src_i : dst_i)
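
As a rough analogy, one component of ATOMCAS behaves like a C11 compare-and-exchange that always hands back the previous value. The sketch below models a single 32-bit component in host memory; ``atomcas_component`` is a hypothetical name, and nothing here reflects how a driver actually lowers the opcode.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Model of one ATOMCAS component: returns the old value (dst_i) and
 * stores src only when the old value equals cmp. */
static uint32_t atomcas_component(_Atomic uint32_t *slot,
                                  uint32_t cmp, uint32_t src)
{
    assert(slot);
    uint32_t expected = cmp;
    /* On success the slot becomes src and expected still holds the old
     * value; on failure expected is overwritten with the current value. */
    atomic_compare_exchange_strong(slot, &expected, src);
    return expected;
}
```

Either way the return value is the content of the slot before the operation, which is what the formula above assigns to dst.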
  2398.  
  2399.  
  2400. .. opcode:: ATOMAND - Atomic bitwise And
  2401.  
  2402.   Syntax: ``ATOMAND dst, resource, offset, src``
  2403.  
  2404.   Example: ``ATOMAND TEMP[0], RES[0], TEMP[1], TEMP[2]``
  2405.  
  2406.   The following operation is performed atomically on each component:
  2407.  
  2408. .. math::
  2409.  
  2410.   dst_i = resource[offset]_i
  2411.  
  2412.   resource[offset]_i = dst_i \& src_i
  2413.  
  2414.  
  2415. .. opcode:: ATOMOR - Atomic bitwise Or
  2416.  
  2417.   Syntax: ``ATOMOR dst, resource, offset, src``
  2418.  
  2419.   Example: ``ATOMOR TEMP[0], RES[0], TEMP[1], TEMP[2]``
  2420.  
  2421.   The following operation is performed atomically on each component:
  2422.  
  2423. .. math::
  2424.  
  2425.   dst_i = resource[offset]_i
  2426.  
  2427.   resource[offset]_i = dst_i | src_i
  2428.  
  2429.  
  2430. .. opcode:: ATOMXOR - Atomic bitwise Xor
  2431.  
  2432.   Syntax: ``ATOMXOR dst, resource, offset, src``
  2433.  
  2434.   Example: ``ATOMXOR TEMP[0], RES[0], TEMP[1], TEMP[2]``
  2435.  
  2436.   The following operation is performed atomically on each component:
  2437.  
  2438. .. math::
  2439.  
  2440.   dst_i = resource[offset]_i
  2441.  
  2442.   resource[offset]_i = dst_i \oplus src_i
  2443.  
  2444.  
  2445. .. opcode:: ATOMUMIN - Atomic unsigned minimum
  2446.  
  2447.   Syntax: ``ATOMUMIN dst, resource, offset, src``
  2448.  
  2449.   Example: ``ATOMUMIN TEMP[0], RES[0], TEMP[1], TEMP[2]``
  2450.  
  2451.   The following operation is performed atomically on each component:
  2452.  
  2453. .. math::
  2454.  
  2455.   dst_i = resource[offset]_i
  2456.  
  2457.   resource[offset]_i = (dst_i < src_i ? dst_i : src_i)
  2458.  
  2459.  
  2460. .. opcode:: ATOMUMAX - Atomic unsigned maximum
  2461.  
  2462.   Syntax: ``ATOMUMAX dst, resource, offset, src``
  2463.  
  2464.   Example: ``ATOMUMAX TEMP[0], RES[0], TEMP[1], TEMP[2]``
  2465.  
  2466.   The following operation is performed atomically on each component:
  2467.  
  2468. .. math::
  2469.  
  2470.   dst_i = resource[offset]_i
  2471.  
  2472.   resource[offset]_i = (dst_i > src_i ? dst_i : src_i)
  2473.  
  2474.  
  2475. .. opcode:: ATOMIMIN - Atomic signed minimum
  2476.  
  2477.   Syntax: ``ATOMIMIN dst, resource, offset, src``
  2478.  
  2479.   Example: ``ATOMIMIN TEMP[0], RES[0], TEMP[1], TEMP[2]``
  2480.  
  2481.   The following operation is performed atomically on each component:
  2482.  
  2483. .. math::
  2484.  
  2485.   dst_i = resource[offset]_i
  2486.  
  2487.   resource[offset]_i = (dst_i < src_i ? dst_i : src_i)
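
ATOMUMIN and ATOMIMIN share the same formula and differ only in how the 32 bits are compared. A minimal C11 sketch of that difference follows, using a compare-and-exchange retry loop as a stand-in for the atomic minimum. The function names are hypothetical and the loop is an illustration, not the method any particular driver uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* ATOMUMIN model: unsigned comparison, returns the previous value. */
static uint32_t atom_umin(_Atomic uint32_t *slot, uint32_t src)
{
    assert(slot);
    uint32_t old = atomic_load(slot);
    /* Retry until min(old, src) is stored over an unchanged slot; a failed
     * exchange reloads the current value into old. */
    while (!atomic_compare_exchange_weak(slot, &old, old < src ? old : src))
        ;
    return old;
}

/* ATOMIMIN model: the same loop, but the value is compared as signed. */
static int32_t atom_imin(_Atomic int32_t *slot, int32_t src)
{
    assert(slot);
    int32_t old = atomic_load(slot);
    while (!atomic_compare_exchange_weak(slot, &old, old < src ? old : src))
        ;
    return old;
}
```

With the bit pattern 0xFFFFFFFF in memory and a source of 1, the unsigned variant stores 1, whereas the signed variant sees -1 and leaves it in place.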
  2488.  
  2489.  
  2490. .. opcode:: ATOMIMAX - Atomic signed maximum
  2491.  
  2492.   Syntax: ``ATOMIMAX dst, resource, offset, src``
  2493.  
  2494.   Example: ``ATOMIMAX TEMP[0], RES[0], TEMP[1], TEMP[2]``
  2495.  
  2496.   The following operation is performed atomically on each component:
  2497.  
  2498. .. math::
  2499.  
  2500.   dst_i = resource[offset]_i
  2501.  
  2502.   resource[offset]_i = (dst_i > src_i ? dst_i : src_i)
  2503.  
  2504.  
  2505.  
  2506. Explanation of symbols used
  2507. ------------------------------
  2508.  
  2509.  
  2510. Functions
  2511. ^^^^^^^^^^^^^^
  2512.  
  2513.  
  2514.   :math:`|x|`       Absolute value of `x`.
  2515.  
  2516.   :math:`\lceil x \rceil` Ceiling of `x`.
  2517.  
  2518.   clamp(x,y,z)      Clamp x between y and z.
  2519.                     (x < y) ? y : (x > z) ? z : x
  2520.  
  2521.   :math:`\lfloor x\rfloor` Floor of `x`.
  2522.  
  2523.   :math:`\log_2{x}` Logarithm of `x`, base 2.
  2524.  
  2525.   max(x,y)          Maximum of x and y.
  2526.                     (x > y) ? x : y
  2527.  
  2528.   min(x,y)          Minimum of x and y.
  2529.                     (x < y) ? x : y
  2530.  
  2531.   partialx(x)       Derivative of x relative to fragment's X.
  2532.  
  2533.   partialy(x)       Derivative of x relative to fragment's Y.
  2534.  
  2535.   pop()             Pop from stack.
  2536.  
  2537.   :math:`x^y`       `x` to the power `y`.
  2538.  
  2539.   push(x)           Push x on stack.
  2540.  
  2541.   round(x)          Round x.
  2542.  
  2543.   trunc(x)          Truncate x, i.e. drop the fraction bits.
  2544.  
  2545.  
  2546. Keywords
  2547. ^^^^^^^^^^^^^
  2548.  
  2549.  
  2550.   discard           Discard fragment.
  2551.  
  2552.   pc                Program counter.
  2553.  
  2554.   target            Label of target instruction.
  2555.  
  2556.  
  2557. Other tokens
  2558. ---------------
  2559.  
  2560.  
  2561. Declaration
  2562. ^^^^^^^^^^^
  2563.  
  2564.  
  2565. Declares a register that will be referenced as an operand in Instruction
  2566. tokens.
  2567.  
  2568. The File field contains the register file that is being declared and is one
  2569. of TGSI_FILE_*.
  2570.  
  2571. The UsageMask field specifies which of the register components can be accessed
  2572. and is one of TGSI_WRITEMASK_*.
  2573.  
  2574. The Local flag specifies that a given value isn't intended for
  2575. subroutine parameter passing and, as a result, the implementation
  2576. isn't required to give any guarantees of it being preserved across
  2577. subroutine boundaries.  As it's merely a compiler hint, the
  2578. implementation is free to ignore it.
  2579.  
  2580. If Dimension flag is set to 1, a Declaration Dimension token follows.
  2581.  
  2582. If Semantic flag is set to 1, a Declaration Semantic token follows.
  2583.  
  2584. If Interpolate flag is set to 1, a Declaration Interpolate token follows.
  2585.  
  2586. If file is TGSI_FILE_RESOURCE, a Declaration Resource token follows.
  2587.  
  2588. If Array flag is set to 1, a Declaration Array token follows.
  2589.  
  2590. Array Declaration
  2591. ^^^^^^^^^^^^^^^^^^^^^^^^
  2592.  
  2593. Declarations can optionally have an ArrayID attribute which can be referred to
  2594. by indirect addressing operands. An ArrayID of zero is reserved and treated as
  2595. if no ArrayID is specified.
  2596.  
  2597. If an indirect addressing operand refers to a specific declaration by using
  2598. an ArrayID, only the registers in this declaration are guaranteed to be
  2599. accessed; accessing any register outside this declaration results in undefined
  2600. behavior. Note that for compatibility the effective index is zero-based and
  2601. not relative to the specified declaration.
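
The zero-based indexing rule can be sketched in C as a bounds check. The ``array_decl`` struct and ``indirect_access_is_defined`` function are hypothetical, and the model assumes the effective index is simply the address register value plus the immediate index.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical view of a declaration that carries an ArrayID and covers
 * registers first..last of one register file. */
struct array_decl {
    unsigned array_id;  /* 0 is reserved and means "no ArrayID" */
    unsigned first;
    unsigned last;
};

/* An operand such as TEMP[ADDR[0].x + imm] selects register addr + imm.
 * The sum indexes the whole register file (zero-based); it is not an
 * offset from decl->first. */
static bool indirect_access_is_defined(const struct array_decl *decl,
                                       int addr, int imm)
{
    assert(decl->array_id != 0 && decl->first <= decl->last);
    long long idx = (long long)addr + imm;
    return idx >= (long long)decl->first && idx <= (long long)decl->last;
}
```

For a declaration covering registers 4..7, an effective index of 2 is outside the declaration even though it would be in range if indices were relative to the first register.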
  2602.  
  2603. If no ArrayID is specified with an indirect addressing operand, the whole
  2604. register file might be accessed by this operand. This is strongly discouraged
  2605. and will prevent packing of scalar/vec2 arrays and effective alias analysis.
  2606.  
  2607. Declaration Semantic
  2608. ^^^^^^^^^^^^^^^^^^^^^^^^
  2609.  
  2610. Vertex and fragment shader input and output registers may be labeled
  2611. with semantic information consisting of a name and index.
  2612.  
  2613. Follows Declaration token if Semantic bit is set.
  2614.  
  2615. Since its purpose is to link a shader with other stages of the pipeline,
  2616. it is valid to follow only those Declaration tokens that declare a register
  2617. either in INPUT or OUTPUT file.
  2618.  
  2619. SemanticName field contains the semantic name of the register being declared.
  2620. There is no default value.
  2621.  
  2622. SemanticIndex is an optional subscript that can be used to distinguish
  2623. different register declarations with the same semantic name. The default value
  2624. is 0.
  2625.  
  2626. The meanings of the individual semantic names are explained in the following
  2627. sections.
  2628.  
  2629. TGSI_SEMANTIC_POSITION
  2630. """"""""""""""""""""""
  2631.  
  2632. For vertex shaders, TGSI_SEMANTIC_POSITION indicates the vertex shader
  2633. output register which contains the homogeneous vertex position in the clip
  2634. space coordinate system.  After clipping, the X, Y and Z components of the
  2635. vertex will be divided by the W value to get normalized device coordinates.
  2636.  
  2637. For fragment shaders, TGSI_SEMANTIC_POSITION is used to indicate that
  2638. fragment shader input contains the fragment's window position.  The X
  2639. component starts at zero and always increases from left to right.
  2640. The Y component starts at zero and always increases but Y=0 may either
  2641. indicate the top of the window or the bottom depending on the fragment
  2642. coordinate origin convention (see TGSI_PROPERTY_FS_COORD_ORIGIN).
  2643. The Z coordinate ranges from 0 to 1 to represent depth from the front
  2644. to the back of the Z buffer.  The W component contains the interpolated
  2645. reciprocal of the vertex position W component (corresponding to gl_FragCoord,
  2646. but unlike d3d10 which interpolates the same 1/w but then gives back
  2647. the reciprocal of the interpolated value).
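
The difference between the two W conventions can be sketched in C for a fragment lying between two vertices. All function names are hypothetical, and the sketch assumes 1/w is interpolated linearly in screen space with weight ``t``.

```c
#include <assert.h>

/* Screen-space linear interpolation of 1/w between two vertices. */
static float interpolated_inv_w(float w0, float w1, float t)
{
    assert(w0 != 0.0f && w1 != 0.0f);
    return (1.0f - t) * (1.0f / w0) + t * (1.0f / w1);
}

/* TGSI, like gl_FragCoord.w: the interpolated 1/w itself. */
static float tgsi_position_w(float w0, float w1, float t)
{
    return interpolated_inv_w(w0, w1, t);
}

/* d3d10 convention: the reciprocal of that interpolated value. */
static float d3d10_position_w(float w0, float w1, float t)
{
    return 1.0f / interpolated_inv_w(w0, w1, t);
}
```

Halfway between vertices with w = 1 and w = 4, the TGSI value is 0.625 while the d3d10 value is about 1.6.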
  2648.  
  2649. Fragment shaders may also declare an output register with
  2650. TGSI_SEMANTIC_POSITION.  Only the Z component is writable.  This allows
  2651. the fragment shader to change the fragment's Z position.
  2652.  
  2653.  
  2654.  
  2655. TGSI_SEMANTIC_COLOR
  2656. """""""""""""""""""
  2657.  
  2658. For vertex shader outputs or fragment shader inputs/outputs, this
  2659. label indicates that the register contains an R,G,B,A color.
  2660.  
  2661. Several shader inputs/outputs may contain colors so the semantic index
  2662. is used to distinguish them.  For example, color[0] may be the diffuse
  2663. color while color[1] may be the specular color.
  2664.  
  2665. This label is needed so that the flat/smooth shading can be applied
  2666. to the right interpolants during rasterization.
  2667.  
  2668.  
  2669.  
  2670. TGSI_SEMANTIC_BCOLOR
  2671. """"""""""""""""""""
  2672.  
  2673. Back-facing colors are only used for back-facing polygons, and are only valid
  2674. in vertex shader outputs. After rasterization, all polygons are front-facing
  2675. and COLOR and BCOLOR end up occupying the same slots in the fragment shader,
  2676. so all BCOLORs effectively become regular COLORs in the fragment shader.
  2677.  
  2678.  
  2679. TGSI_SEMANTIC_FOG
  2680. """""""""""""""""
  2681.  
  2682. Vertex shader inputs and outputs and fragment shader inputs may be
  2683. labeled with TGSI_SEMANTIC_FOG to indicate that the register contains
  2684. a fog coordinate.  Typically, the fragment shader will use the fog coordinate
  2685. to compute a fog blend factor which is used to blend the normal fragment color
  2686. with a constant fog color.  However, the fog coordinate is really just an
  2687. ordinary vec4 register, like those with regular semantics.
  2688.  
  2689.  
  2690. TGSI_SEMANTIC_PSIZE
  2691. """""""""""""""""""
  2692.  
  2693. Vertex shader input and output registers may be labeled with
  2694. TGSI_SEMANTIC_PSIZE to indicate that the register contains a point size
  2695. in the form (S, 0, 0, 1).  The point size controls the width or diameter
  2696. of points for rasterization.  This label cannot be used in fragment
  2697. shaders.
  2698.  
  2699. When using this semantic, be sure to set the appropriate state in the
  2700. :ref:`rasterizer` first.
  2701.  
  2702.  
  2703. TGSI_SEMANTIC_TEXCOORD
  2704. """"""""""""""""""""""
  2705.  
  2706. Only available if PIPE_CAP_TGSI_TEXCOORD is exposed.
  2707.  
  2708. Vertex shader outputs and fragment shader inputs may be labeled with
  2709. this semantic to make them replaceable by sprite coordinates via the
  2710. sprite_coord_enable state in the :ref:`rasterizer`.
  2711. The semantic index permitted with this semantic is limited to <= 7.
  2712.  
  2713. If the driver does not support TEXCOORD, sprite coordinate replacement
  2714. applies to inputs with the GENERIC semantic instead.
  2715.  
  2716. The intended use case for this semantic is gl_TexCoord.
  2717.  
  2718.  
  2719. TGSI_SEMANTIC_PCOORD
  2720. """"""""""""""""""""
  2721.  
  2722. Only available if PIPE_CAP_TGSI_TEXCOORD is exposed.
  2723.  
  2724. Fragment shader inputs may be labeled with TGSI_SEMANTIC_PCOORD to indicate
  2725. that the register contains sprite coordinates in the form (x, y, 0, 1), if
  2726. the current primitive is a point and point sprites are enabled. Otherwise,
  2727. the contents of the register are undefined.
  2728.  
  2729. The intended use case for this semantic is gl_PointCoord.
  2730.  
  2731.  
  2732. TGSI_SEMANTIC_GENERIC
  2733. """""""""""""""""""""
  2734.  
  2735. All vertex/fragment shader inputs/outputs not labeled with any other
  2736. semantic label can be considered to be generic attributes.  Typical
  2737. uses of generic inputs/outputs are texcoords and user-defined values.
  2738.  
  2739.  
  2740. TGSI_SEMANTIC_NORMAL
  2741. """"""""""""""""""""
  2742.  
  2743. Indicates that a vertex shader input is a normal vector.  This is
  2744. typically only used for legacy graphics APIs.
  2745.  
  2746.  
  2747. TGSI_SEMANTIC_FACE
  2748. """"""""""""""""""
  2749.  
  2750. This label applies to fragment shader inputs only and indicates that
  2751. the register contains front/back-face information of the form (F, 0,
  2752. 0, 1).  The first component will be positive when the fragment belongs
  2753. to a front-facing polygon, and negative when the fragment belongs to a
  2754. back-facing polygon.
  2755.  
  2756.  
  2757. TGSI_SEMANTIC_EDGEFLAG
  2758. """"""""""""""""""""""
  2759.  
  2760. For vertex shaders, this semantic label indicates that an input or
  2761. output is a boolean edge flag.  The register layout is [F, x, x, x]
  2762. where F is 0.0 or 1.0 and x = don't care.  Normally, the vertex shader
  2763. simply copies the edge flag input to the edgeflag output.
  2764.  
  2765. Edge flags are used to control which lines or points are actually
  2766. drawn when the polygon mode converts triangles/quads/polygons into
  2767. points or lines.
  2768.  
  2769.  
  2770. TGSI_SEMANTIC_STENCIL
  2771. """""""""""""""""""""
  2772.  
  2773. For fragment shaders, this semantic label indicates that an output
  2774. is a writable stencil reference value. Only the Y component is writable.
  2775. This allows the fragment shader to change the fragment's stencil reference value.
  2776.  
  2777.  
  2778. TGSI_SEMANTIC_VIEWPORT_INDEX
  2779. """"""""""""""""""""""""""""
  2780.  
  2781. For geometry shaders, this semantic label indicates that an output
  2782. contains the index of the viewport (and scissor) to use.
  2783. This is an integer value, and only the X component is used.
  2784.  
  2785.  
  2786. TGSI_SEMANTIC_LAYER
  2787. """""""""""""""""""
  2788.  
  2789. For geometry shaders, this semantic label indicates that an output
  2790. contains the layer value to use for the color and depth/stencil surfaces.
  2791. This is an integer value, and only the X component is used.
  2792. (Also known as rendertarget array index.)
  2793.  
  2794.  
  2795. TGSI_SEMANTIC_CULLDIST
  2796. """"""""""""""""""""""
  2797.  
  2798. Used as distance to plane for performing application-defined culling
  2799. of individual primitives against a plane. When components of vertex
  2800. elements are given this label, these values are assumed to be a
  2801. float32 signed distance to a plane. Primitives will be completely
  2802. discarded if the plane distances for all of the vertices in the
  2803. primitive are < 0. If a vertex has a cull distance of NaN, that
  2804. vertex counts as "out" (as if it's < 0).
  2805. The limits on both clip and cull distances are bound
  2806. by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_COUNT define which defines
  2807. the maximum number of components that can be used to hold the
  2808. distances and by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT
  2809. which specifies the maximum number of registers which can be
  2810. annotated with those semantics.
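
The culling rule for a single plane can be sketched in C as follows. The function names are hypothetical, and the sketch covers one cull distance per vertex only; how several cull distances combine is not modeled here.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* A vertex is "out" for a cull plane when its distance is negative or NaN. */
static bool vertex_is_out(float cull_dist)
{
    return isnan(cull_dist) || cull_dist < 0.0f;
}

/* Discard the primitive only when every vertex is out for this plane. */
static bool primitive_is_culled(const float *cull_dist, int num_verts)
{
    assert(cull_dist && num_verts > 0);
    for (int i = 0; i < num_verts; i++)
        if (!vertex_is_out(cull_dist[i]))
            return false;
    return true;
}
```

A triangle with distances (-1, NaN, -0.5) is discarded, while (-1, 0, -0.5) is kept because one vertex lies exactly on the plane.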
  2811.  
  2812.  
  2813. TGSI_SEMANTIC_CLIPDIST
  2814. """"""""""""""""""""""
  2815.  
  2816. When components of vertex elements are identified this way, these
  2817. values are each assumed to be a float32 signed distance to a plane.
  2818. Primitive setup only invokes rasterization on pixels for which
  2819. the interpolated plane distances are >= 0. Multiple clip planes
  2820. can be implemented simultaneously, by annotating multiple
  2821. components of one or more vertex elements with the above specified
  2822. semantic. The limits on both clip and cull distances are bound
  2823. by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_COUNT define which defines
  2824. the maximum number of components that can be used to hold the
  2825. distances and by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT
  2826. which specifies the maximum number of registers which can be
  2827. annotated with those semantics.
  2828.  
  2829. TGSI_SEMANTIC_SAMPLEID
  2830. """"""""""""""""""""""
  2831.  
  2832. For fragment shaders, this semantic label indicates that a system value
  2833. contains the current sample id (i.e. gl_SampleID).
  2834. This is an integer value, and only the X component is used.
  2835.  
  2836. TGSI_SEMANTIC_SAMPLEPOS
  2837. """""""""""""""""""""""
  2838.  
  2839. For fragment shaders, this semantic label indicates that a system value
  2840. contains the current sample's position (i.e. gl_SamplePosition). Only the X
  2841. and Y values are used.
  2842.  
  2843. TGSI_SEMANTIC_SAMPLEMASK
  2844. """"""""""""""""""""""""
  2845.  
  2846. For fragment shaders, this semantic label indicates that an output contains
  2847. the sample mask used to disable further sample processing
  2848. (i.e. gl_SampleMask). Only the X value is used, up to 32x MS.
  2849.  
  2850. TGSI_SEMANTIC_INVOCATIONID
  2851. """"""""""""""""""""""""""
  2852.  
  2853. For geometry shaders, this semantic label indicates that a system value
  2854. contains the current invocation id (i.e. gl_InvocationID).
  2855. This is an integer value, and only the X component is used.
  2856.  
  2857. TGSI_SEMANTIC_INSTANCEID
  2858. """"""""""""""""""""""""
  2859.  
  2860. For vertex shaders, this semantic label indicates that a system value contains
  2861. the current instance id (i.e. gl_InstanceID). It does not include the base
  2862. instance. This is an integer value, and only the X component is used.
  2863.  
  2864. TGSI_SEMANTIC_VERTEXID
  2865. """"""""""""""""""""""
  2866.  
  2867. For vertex shaders, this semantic label indicates that a system value contains
  2868. the current vertex id (i.e. gl_VertexID). It does (unlike in d3d10) include the
  2869. base vertex. This is an integer value, and only the X component is used.
  2870.  
  2871. TGSI_SEMANTIC_VERTEXID_NOBASE
  2872. """""""""""""""""""""""""""""""
  2873.  
  2874. For vertex shaders, this semantic label indicates that a system value contains
  2875. the current vertex id without including the base vertex (this corresponds to
  2876. d3d10 vertex id, so TGSI_SEMANTIC_VERTEXID_NOBASE + TGSI_SEMANTIC_BASEVERTEX
  2877. == TGSI_SEMANTIC_VERTEXID). This is an integer value, and only the X component
  2878. is used.
  2879.  
  2880. TGSI_SEMANTIC_BASEVERTEX
  2881. """"""""""""""""""""""""
  2882.  
  2883. For vertex shaders, this semantic label indicates that a system value contains
  2884. the base vertex (i.e. gl_BaseVertex). Note that for non-indexed draw calls,
  2885. this contains the first (or start) value instead.
  2886. This is an integer value, and only the X component is used.
  2887.  
  2888. TGSI_SEMANTIC_PRIMID
  2889. """"""""""""""""""""
  2890.  
  2891. For geometry and fragment shaders, this semantic label indicates the value
  2892. contains the primitive id (i.e. gl_PrimitiveID). This is an integer value,
  2893. and only the X component is used.
  2894. FIXME: This right now can be either an ordinary input or a system value...
  2895.  
  2896.  
  2897. TGSI_SEMANTIC_PATCH
  2898. """""""""""""""""""
  2899.  
  2900. For tessellation evaluation/control shaders, this semantic label indicates a
  2901. generic per-patch attribute. Such semantics will not implicitly be per-vertex
  2902. arrays.
  2903.  
  2904. TGSI_SEMANTIC_TESSCOORD
  2905. """""""""""""""""""""""
  2906.  
  2907. For tessellation evaluation shaders, this semantic label indicates the
  2908. coordinates of the vertex being processed. This is available in XYZ; W is
  2909. undefined.
  2910.  
  2911. TGSI_SEMANTIC_TESSOUTER
  2912. """""""""""""""""""""""
  2913.  
  2914. For tessellation evaluation/control shaders, this semantic label indicates the
  2915. outer tessellation levels of the patch. Isoline tessellation will only have XY
  2916. defined, triangle will have XYZ and quads will have XYZW defined. This
  2917. corresponds to gl_TessLevelOuter.
  2918.  
  2919. TGSI_SEMANTIC_TESSINNER
  2920. """""""""""""""""""""""
  2921.  
  2922. For tessellation evaluation/control shaders, this semantic label indicates the
  2923. inner tessellation levels of the patch. The X value is only defined for
  2924. triangle tessellation, while quads will have XY defined. This is entirely
  2925. undefined for isoline tessellation.
  2926.  
  2927. TGSI_SEMANTIC_VERTICESIN
  2928. """"""""""""""""""""""""
  2929.  
  2930. For tessellation evaluation/control shaders, this semantic label indicates the
  2931. number of vertices provided in the input patch. Only the X value is defined.
  2932.  
  2933.  
  2934. Declaration Interpolate
  2935. ^^^^^^^^^^^^^^^^^^^^^^^
  2936.  
  2937. This token is only valid for fragment shader INPUT declarations.
  2938.  
  2939. The Interpolate field specifies the way input is being interpolated by
  2940. the rasteriser and is one of TGSI_INTERPOLATE_*.
  2941.  
  2942. The Location field specifies the location inside the pixel that the
  2943. interpolation should be done at, one of ``TGSI_INTERPOLATE_LOC_*``. Note that
  2944. when per-sample shading is enabled, the implementation may choose to
  2945. interpolate at the sample irrespective of the Location field.
  2946.  
  2947. The CylindricalWrap bitfield specifies which register components
  2948. should be subject to cylindrical wrapping when interpolating by the
  2949. rasteriser. If TGSI_CYLINDRICAL_WRAP_X is set to 1, the X component
  2950. should be interpolated according to cylindrical wrapping rules.
  2951.  
  2952.  
  2953. Declaration Sampler View
  2954. ^^^^^^^^^^^^^^^^^^^^^^^^
  2955.  
  2956. Follows Declaration token if file is TGSI_FILE_SAMPLER_VIEW.
  2957.  
  2958. DCL SVIEW[#], resource, type(s)
  2959.  
  2960. Declares a shader input sampler view and assigns it to a SVIEW[#]
  2961. register.
  2962.  
  2963. resource can be one of BUFFER, 1D, 2D, 3D, 1DArray and 2DArray.
  2964.  
  2965. type must be 1 or 4 entries (if specifying on a per-component
  2966. level) out of UNORM, SNORM, SINT, UINT and FLOAT.
  2967.  
  2968.  
  2969. Declaration Resource
  2970. ^^^^^^^^^^^^^^^^^^^^
  2971.  
  2972. Follows Declaration token if file is TGSI_FILE_RESOURCE.
  2973.  
  2974. DCL RES[#], resource [, WR] [, RAW]
  2975.  
  2976. Declares a shader input resource and assigns it to a RES[#]
  2977. register.
  2978.  
  2979. resource can be one of BUFFER, 1D, 2D, 3D, CUBE, 1DArray and
  2980. 2DArray.
  2981.  
  2982. If the RAW keyword is not specified, the texture data will be
  2983. subject to conversion, swizzling and scaling as required to yield
  2984. the specified data type from the physical data format of the bound
  2985. resource.
  2986.  
  2987. If the RAW keyword is specified, no channel conversion will be
  2988. performed: the values read for each of the channels (X,Y,Z,W) will
  2989. correspond to consecutive words in the same order and format
  2990. they're found in memory.  No element-to-address conversion will be
  2991. performed either: the value of the provided X coordinate will be
  2992. interpreted in byte units instead of texel units.  The result of
  2993. accessing a misaligned address is undefined.
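
A RAW access to a buffer can be modeled in C as a plain copy of four consecutive 32-bit words starting at a byte offset. The ``raw_load`` function is a hypothetical illustration. The 4-byte alignment requirement and the in-range check are assumptions, since the text only says that misaligned accesses are undefined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of a RAW LOAD from a buffer: x is a byte offset, and the four
 * result channels (X, Y, Z, W) are four consecutive words, unconverted. */
static void raw_load(const uint8_t *mem, size_t size, uint32_t x,
                     uint32_t dst[4])
{
    /* Assumptions: word alignment is required and the access is in range. */
    assert(x % 4 == 0 && size >= 16 && x <= size - 16);
    memcpy(dst, mem + x, 16);
}
```

With six words 10..15 in memory, a byte offset of 8 returns (12, 13, 14, 15): the offset counts bytes, not elements.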
  2994.  
  2995. Usage of the STORE opcode is only allowed if the WR (writable) flag
  2996. is set.
  2997.  
  2998.  
  2999. Properties
  3000. ^^^^^^^^^^^^^^^^^^^^^^^^
  3001.  
  3002. Properties are general directives that apply to the whole TGSI program.
  3003.  
  3004. FS_COORD_ORIGIN
  3005. """""""""""""""
  3006.  
  3007. Specifies the fragment shader TGSI_SEMANTIC_POSITION coordinate origin.
  3008. The default value is UPPER_LEFT.
  3009.  
  3010. If UPPER_LEFT, the position will be (0,0) at the upper left corner and
  3011. increase downward and rightward.
  3012. If LOWER_LEFT, the position will be (0,0) at the lower left corner and
  3013. increase upward and rightward.
  3014.  
  3015. OpenGL defaults to LOWER_LEFT, and is configurable with the
  3016. GL_ARB_fragment_coord_conventions extension.
  3017.  
  3018. DirectX 9/10 use UPPER_LEFT.
  3019.  
  3020. FS_COORD_PIXEL_CENTER
  3021. """""""""""""""""""""
  3022.  
  3023. Specifies the fragment shader TGSI_SEMANTIC_POSITION pixel center convention.
  3024. The default value is HALF_INTEGER.
  3025.  
  3026. If HALF_INTEGER, the fractional part of the position will be 0.5.
  3027. If INTEGER, the fractional part of the position will be 0.0.
  3028.  
  3029. Note that this does not affect the set of fragments generated by
  3030. rasterization, which is instead controlled by half_pixel_center in the
  3031. rasterizer.
  3032.  
  3033. OpenGL defaults to HALF_INTEGER, and is configurable with the
  3034. GL_ARB_fragment_coord_conventions extension.
  3035.  
  3036. DirectX 9 uses INTEGER.
  3037. DirectX 10 uses HALF_INTEGER.
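
The combined effect of FS_COORD_ORIGIN and FS_COORD_PIXEL_CENTER on the X and Y values of the fragment position can be sketched in C. The enum, struct and function names are hypothetical. The sketch assumes a window of ``fb_height`` pixels and identifies a pixel by its column from the left and its row from the top.

```c
#include <assert.h>

enum coord_origin { UPPER_LEFT, LOWER_LEFT };
enum pixel_center { HALF_INTEGER, INTEGER };

struct frag_xy { float x, y; };

/* Position reported for the pixel in column col (from the left) and row
 * row_from_top (from the top) of a window fb_height pixels tall. */
static struct frag_xy fs_position(int col, int row_from_top, int fb_height,
                                  enum coord_origin origin,
                                  enum pixel_center center)
{
    assert(row_from_top >= 0 && row_from_top < fb_height);
    float frac = (center == HALF_INTEGER) ? 0.5f : 0.0f;
    int row = (origin == UPPER_LEFT) ? row_from_top
                                     : fb_height - 1 - row_from_top;
    struct frag_xy p = { (float)col + frac, (float)row + frac };
    return p;
}
```

In a window 100 pixels tall, the top row of column 3 is reported as (3.5, 0.5) with UPPER_LEFT and HALF_INTEGER, and as (3.0, 99.0) with LOWER_LEFT and INTEGER.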
  3038.  
  3039. FS_COLOR0_WRITES_ALL_CBUFS
  3040. """"""""""""""""""""""""""
  3041. Specifies that writes to the fragment shader color 0 are replicated to all
  3042. bound cbufs. This facilitates OpenGL's fragColor output vs fragData[0] where
  3043. fragData is directed to a single color buffer, but fragColor is broadcast.
  3044.  
  3045. VS_PROHIBIT_UCPS
  3046. """"""""""""""""""""""""""
  3047. If this property is set on the program bound to the shader stage before the
  3048. fragment shader, user clip planes should have no effect (be disabled) even if
  3049. that shader does not write to any clip distance outputs and the rasterizer's
  3050. clip_plane_enable is non-zero.
  3051. This property is only supported by drivers that also support shader clip
  3052. distance outputs.
  3053. This is useful for APIs that don't have UCPs and where clip distances written
  3054. by a shader cannot be disabled.
  3055.  
  3056. GS_INVOCATIONS
  3057. """"""""""""""
  3058.  
  3059. Specifies the number of times a geometry shader should be executed for each
  3060. input primitive. Each invocation will have a different
  3061. TGSI_SEMANTIC_INVOCATIONID system value set. If not specified, assumed to
  3062. be 1.
  3063.  
  3064. VS_WINDOW_SPACE_POSITION
  3065. """"""""""""""""""""""""""
  3066. If this property is set on the vertex shader, the TGSI_SEMANTIC_POSITION output
  3067. is assumed to contain window space coordinates.
  3068. Division of X,Y,Z by W and the viewport transformation are disabled, and 1/W is
  3069. directly taken from the fourth component of the shader output.
  3070. Naturally, clipping is not performed on window coordinates either.
  3071. The effect of this property is undefined if a geometry or tessellation shader
  3072. is in use.
  3073.  
  3074. TCS_VERTICES_OUT
  3075. """"""""""""""""
  3076.  
  3077. The number of vertices written by the tessellation control shader. This
  3078. effectively defines the patch input size of the tessellation evaluation shader
  3079. as well.
  3080.  
  3081. TES_PRIM_MODE
  3082. """""""""""""
  3083.  
  3084. This sets the tessellation primitive mode, one of ``PIPE_PRIM_TRIANGLES``,
  3085. ``PIPE_PRIM_QUADS``, or ``PIPE_PRIM_LINES``. (Unlike in GL, there is no
  3086. separate isolines setting; regular lines are assumed to mean isolines.)
  3087.  
  3088. TES_SPACING
  3089. """""""""""
  3090.  
  3091. This sets the spacing mode of the tessellation generator, one of
  3092. ``PIPE_TESS_SPACING_*``.
  3093.  
  3094. TES_VERTEX_ORDER_CW
  3095. """""""""""""""""""
  3096.  
  3097. This sets the vertex order to be clockwise if the value is 1, or
  3098. counter-clockwise if set to 0.
  3099.  
  3100. TES_POINT_MODE
  3101. """"""""""""""
  3102.  
  3103. If set to a non-zero value, this turns on point mode for the tessellator,
  3104. which means that points will be generated instead of lines or triangles.
  3105.  
  3106.  
  3107. Texture Sampling and Texture Formats
  3108. ------------------------------------
  3109.  
  3110. This table shows how texture image components are returned as (x,y,z,w) tuples
  3111. by TGSI texture instructions, such as :opcode:`TEX`, :opcode:`TXD`, and
  3112. :opcode:`TXP`. For reference, OpenGL and Direct3D conventions are shown as
  3113. well.
  3114.  
  3115. +--------------------+--------------+--------------------+--------------+
  3116. | Texture Components | Gallium      | OpenGL             | Direct3D 9   |
  3117. +====================+==============+====================+==============+
  3118. | R                  | (r, 0, 0, 1) | (r, 0, 0, 1)       | (r, 1, 1, 1) |
  3119. +--------------------+--------------+--------------------+--------------+
  3120. | RG                 | (r, g, 0, 1) | (r, g, 0, 1)       | (r, g, 1, 1) |
  3121. +--------------------+--------------+--------------------+--------------+
  3122. | RGB                | (r, g, b, 1) | (r, g, b, 1)       | (r, g, b, 1) |
  3123. +--------------------+--------------+--------------------+--------------+
  3124. | RGBA               | (r, g, b, a) | (r, g, b, a)       | (r, g, b, a) |
  3125. +--------------------+--------------+--------------------+--------------+
  3126. | A                  | (0, 0, 0, a) | (0, 0, 0, a)       | (0, 0, 0, a) |
  3127. +--------------------+--------------+--------------------+--------------+
  3128. | L                  | (l, l, l, 1) | (l, l, l, 1)       | (l, l, l, 1) |
  3129. +--------------------+--------------+--------------------+--------------+
  3130. | LA                 | (l, l, l, a) | (l, l, l, a)       | (l, l, l, a) |
  3131. +--------------------+--------------+--------------------+--------------+
  3132. | I                  | (i, i, i, i) | (i, i, i, i)       | N/A          |
  3133. +--------------------+--------------+--------------------+--------------+
  3134. | UV                 | XXX TBD      | (0, 0, 0, 1)       | (u, v, 1, 1) |
  3135. |                    |              | [#envmap-bumpmap]_ |              |
  3136. +--------------------+--------------+--------------------+--------------+
  3137. | Z                  | XXX TBD      | (z, z, z, 1)       | (0, z, 0, 1) |
  3138. |                    |              | [#depth-tex-mode]_ |              |
  3139. +--------------------+--------------+--------------------+--------------+
  3140. | S                  | (s, s, s, s) | unknown            | unknown      |
  3141. +--------------------+--------------+--------------------+--------------+
  3142.  
  3143. .. [#envmap-bumpmap] http://www.opengl.org/registry/specs/ATI/envmap_bumpmap.txt
  3144. .. [#depth-tex-mode] the default is (z, z, z, 1) but may also be (0, 0, 0, z)
  3145.    or (z, z, z, z) depending on the value of GL_DEPTH_TEXTURE_MODE.
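
The Gallium column of the table can be read as a per-layout expansion rule.
The sketch below spells it out in C. The enum and function names are
hypothetical and not part of Gallium, and the UV and Z rows are left out
because the table marks them as TBD.

```c
/* Hypothetical names, not part of Gallium. One entry per table row that
 * has a defined Gallium result. */
enum tex_layout {
    TEX_R, TEX_RG, TEX_RGB, TEX_RGBA,
    TEX_A, TEX_L, TEX_LA, TEX_I, TEX_S
};

/* Expand the components stored in a texel into the (x,y,z,w) tuple
 * listed in the Gallium column. `in` holds the stored components in the
 * order the layout names them (for LA: in[0] = l, in[1] = a). */
void expand_texel(enum tex_layout layout, const float *in, float out[4])
{
    switch (layout) {
    case TEX_R:    /* (r, 0, 0, 1) */
        out[0] = in[0]; out[1] = 0.0f;  out[2] = 0.0f;  out[3] = 1.0f;  break;
    case TEX_RG:   /* (r, g, 0, 1) */
        out[0] = in[0]; out[1] = in[1]; out[2] = 0.0f;  out[3] = 1.0f;  break;
    case TEX_RGB:  /* (r, g, b, 1) */
        out[0] = in[0]; out[1] = in[1]; out[2] = in[2]; out[3] = 1.0f;  break;
    case TEX_RGBA: /* (r, g, b, a) */
        out[0] = in[0]; out[1] = in[1]; out[2] = in[2]; out[3] = in[3]; break;
    case TEX_A:    /* (0, 0, 0, a) */
        out[0] = 0.0f;  out[1] = 0.0f;  out[2] = 0.0f;  out[3] = in[0]; break;
    case TEX_L:    /* (l, l, l, 1) */
        out[0] = in[0]; out[1] = in[0]; out[2] = in[0]; out[3] = 1.0f;  break;
    case TEX_LA:   /* (l, l, l, a) */
        out[0] = in[0]; out[1] = in[0]; out[2] = in[0]; out[3] = in[1]; break;
    case TEX_I:    /* (i, i, i, i) */
    case TEX_S:    /* (s, s, s, s) */
        out[0] = in[0]; out[1] = in[0]; out[2] = in[0]; out[3] = in[0]; break;
    }
}
```

For example, a luminance-alpha texel with l = 0.25 and a = 0.5 expands to
(0.25, 0.25, 0.25, 0.5) under this rule.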
  3146.