Welcome to Mesa's GLSL compiler. A brief overview of how things flow:

1) A lex and yacc-based preprocessor takes the incoming shader string
and produces a new string containing the preprocessed shader. This
takes care of things like #if, #ifdef, #define, and preprocessor macro
invocations. Note that #version, #extension, and some others are
passed straight through. See glcpp/*

2) A lex and yacc-based parser takes the preprocessed string and
generates the AST (abstract syntax tree). Almost no checking is
performed in this stage. See glsl_lexer.lpp and glsl_parser.ypp.

3) The AST is converted to "HIR". This is the intermediate
representation of the compiler. Constructors are generated, function
calls are resolved to particular function signatures, and all the
semantic checking is performed. See ast_*.cpp for the conversion, and
ir.h for the IR structures.

4) The driver (Mesa, or main.cpp for the standalone binary) performs
optimizations. These include copy propagation, dead code elimination,
constant folding, and others. Generally the driver will call
optimizations in a loop, as each may open up opportunities for other
optimizations to do additional work. See most files called ir_*.cpp.
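
The loop in step 4 can be sketched as a run-to-fixed-point driver. This
is a toy illustration under stated assumptions, not the driver's actual
code: Program, Pass, halve_evens, remove_ones, and run_to_fixed_point
are all hypothetical names, and a list of integers stands in for the IR.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for a shader's IR: a plain list of integers.
using Program = std::vector<int>;

// A pass returns true if it changed the program (made "progress").
using Pass = std::function<bool(Program &)>;

// Toy pass: halve every positive even value once per invocation.
bool halve_evens(Program &p)
{
   bool progress = false;
   for (int &v : p) {
      if (v > 0 && v % 2 == 0) {
         v /= 2;
         progress = true;
      }
   }
   return progress;
}

// Toy pass: remove every value equal to 1, standing in for dead code.
bool remove_ones(Program &p)
{
   const Program::size_type before = p.size();
   p.erase(std::remove(p.begin(), p.end(), 1), p.end());
   return p.size() != before;
}

// Run all passes repeatedly until one full round changes nothing.
// Returns the number of rounds executed, including the final idle one.
unsigned run_to_fixed_point(Program &p, const std::vector<Pass> &passes)
{
   unsigned rounds = 0;
   bool progress;
   do {
      progress = false;
      for (const Pass &pass : passes)
         progress = pass(p) || progress;   // always run the pass
      rounds++;
      assert(rounds < 1000 && "passes failed to converge");
   } while (progress);
   return rounds;
}
```

With the input {4, 3}, remove_ones has nothing to do until halve_evens
has run twice, which is the kind of interaction the loop exists to
catch.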

5) Linking is performed. This checks that the outputs of the vertex
shader match the inputs of the fragment shader, and assigns locations
to uniforms, attributes, and varyings. See linker.cpp.

6) The driver may perform additional optimization at this point. For
example, dead code elimination previously couldn't remove functions
or global variable usage when we didn't know what other code would be
linked in.

7) The driver performs code generation out of the IR, taking a linked
shader program and producing a compiled program for each stage. See
ir_to_mesa.cpp for Mesa IR code generation.

FAQ:

Q: What is HIR versus IR versus LIR?

A: The idea behind the naming was that ast_to_hir would produce a
high-level IR ("HIR"), with things like matrix operations, structure
assignments, etc., present. A series of lowering passes would occur
that do things like break matrix multiplication into a series of dot
products/MADs, turn structure assignment into a series of component
assignments, flatten if statements into conditional moves, and such,
producing a low-level IR ("LIR").

However, it now appears that each driver will have different
requirements from a LIR. A 915-generation chipset wants all functions
inlined, all loops unrolled, all ifs flattened, no variable array
accesses, and matrix multiplication broken down. The Mesa IR backend
for swrast would like matrices and structure assignment broken down,
but it can support function calls and dynamic branching. A 965 vertex
shader IR backend could potentially even handle some matrix operations
without breaking them down, but the 965 fragment shader IR backend
would want to break (almost) all operations down channel-wise
and perform optimization on that. As a result, there's no single
low-level IR that will make everyone happy. So that usage has fallen
out of favor, and each driver will perform a series of lowering passes
to take the HIR down to whatever restrictions it wants to impose
before doing codegen.

Q: How is the IR structured?

A: The best way to get started seeing it would be to run the
standalone compiler against a shader:

    ./glsl_compiler --dump-lir \
        ~/src/piglit/tests/shaders/glsl-orangebook-ch06-bump.frag

So for example one of the ir_instructions in main() contains:

(assign (constant bool (1)) (var_ref litColor) (expression vec3 *
(var_ref SurfaceColor) (var_ref __retval) ) )

Or more visually:
                     (assign)
                 /       |        \
        (var_ref)  (expression *)  (constant bool 1)
         /          /           \
(litColor)      (var_ref)    (var_ref)
                  /                  \
           (SurfaceColor)          (__retval)

which came from:

litColor = SurfaceColor * max(dot(normDelta, LightDir), 0.0);

(the max call is not represented in this expression tree, as it was a
function call that got inlined but not brought into this expression
tree)

Each of those nodes is a subclass of ir_instruction. A particular
ir_instruction instance may only appear once in the whole IR tree,
with the exception of ir_variables, which appear once as variable
declarations:

(declare () vec3 normDelta)

and multiple times as the targets of variable dereferences:
...
(assign (constant bool (1)) (var_ref __retval) (expression float dot
(var_ref normDelta) (var_ref LightDir) ) )
...
(assign (constant bool (1)) (var_ref __retval) (expression vec3 -
(var_ref LightDir) (expression vec3 * (constant float (2.000000))
(expression vec3 * (expression float dot (var_ref normDelta) (var_ref
LightDir) ) (var_ref normDelta) ) ) ) )
...

Each node has a type. Expressions may involve several different types:
(declare (uniform ) mat4 gl_ModelViewMatrix)
(assign (constant bool (1)) (var_ref constructor_tmp) (expression
vec4 * (var_ref gl_ModelViewMatrix) (var_ref gl_Vertex) ) )

An expression tree can be arbitrarily deep, and the compiler tries to
keep them structured like that so that things like algebraic
optimizations ((color * 1.0 == color) and ((mat1 * mat2) * vec == mat1
* (mat2 * vec))) or recognizing operation patterns for code generation
(vec1 * vec2 + vec3 == mad(vec1, vec2, vec3)) are easier. This comes
at the expense of additional trickery in implementing some
optimizations like CSE where one must navigate an expression tree.
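
To make the (color * 1.0 == color) case concrete, here is a minimal
sketch of an algebraic rewrite over a deep expression tree. It assumes
a made-up miniature tree: Expr, constant, var_ref, mul, simplify, and
dump are hypothetical and are not the classes declared in ir.h. The
printed form is only loosely modeled on the dump format above and
omits types.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical miniature expression tree, for illustration only.
struct Expr {
   enum Kind { Constant, VarRef, Mul } kind = Constant;
   float value = 0.0f;              // used by Constant
   std::string name;                // used by VarRef
   std::unique_ptr<Expr> lhs, rhs;  // used by Mul
};
using ExprPtr = std::unique_ptr<Expr>;

ExprPtr constant(float v)
{
   ExprPtr e(new Expr);
   e->kind = Expr::Constant;
   e->value = v;
   return e;
}

ExprPtr var_ref(const std::string &name)
{
   ExprPtr e(new Expr);
   e->kind = Expr::VarRef;
   e->name = name;
   return e;
}

ExprPtr mul(ExprPtr a, ExprPtr b)
{
   ExprPtr e(new Expr);
   e->kind = Expr::Mul;
   e->lhs = std::move(a);
   e->rhs = std::move(b);
   return e;
}

bool is_one(const Expr &e)
{
   return e.kind == Expr::Constant && e.value == 1.0f;
}

// Bottom-up rewrite: (x * 1.0) and (1.0 * x) both become x.
ExprPtr simplify(ExprPtr e)
{
   assert(e && "null expression");
   if (e->kind != Expr::Mul)
      return e;
   e->lhs = simplify(std::move(e->lhs));
   e->rhs = simplify(std::move(e->rhs));
   if (is_one(*e->rhs))
      return std::move(e->lhs);
   if (is_one(*e->lhs))
      return std::move(e->rhs);
   return e;
}

// Print the tree as an S-expression.
std::string dump(const Expr &e)
{
   switch (e.kind) {
   case Expr::Constant:
      return "(constant float (" + std::to_string(e.value) + "))";
   case Expr::VarRef:
      return "(var_ref " + e.name + ")";
   case Expr::Mul:
      return "(expression * " + dump(*e.lhs) + " " + dump(*e.rhs) + ")";
   }
   return "";
}
```

Because the multiply by 1.0 can sit at any depth, the rewrite has to
recurse before it checks the current node, which is the tree
navigation cost mentioned above.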

Q: Why no SSA representation?

A: Converting an IR tree to SSA form makes dead code elimination,
common subexpression elimination, and many other optimizations much
easier. However, in our primarily vector-based language, there are
some major questions as to how it would work. Do we do SSA on the
scalar or vector level? If we do it at the vector level, we're going
to end up with many different versions of the variable when
encountering code like:

(assign (constant bool (1)) (swiz x (var_ref __retval) ) (var_ref a) )
(assign (constant bool (1)) (swiz y (var_ref __retval) ) (var_ref b) )
(assign (constant bool (1)) (swiz z (var_ref __retval) ) (var_ref c) )

If every masked update of a component relies on the previous value of
the variable, then we're probably going to be quite limited in our
dead code elimination wins, and recognizing common expressions may
just not happen. On the other hand, if we operate channel-wise, then
we'll be prone to optimizing the operation on one of the channels at
the expense of making its instruction flow different from the other
channels, and a vector-based GPU would end up with worse code than if
we didn't optimize operations on that channel!
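
The vector-level problem can be seen by renaming the three masked
writes above by hand. The sketch below does that mechanically. It is
an illustration only: MaskedWrite, to_vector_ssa, and the insert(...)
notation are all invented here and do not correspond to anything in
the compiler.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One masked write, e.g. (assign ... (swiz x (var_ref __retval)) (var_ref a)).
struct MaskedWrite {
   char component;      // 'x', 'y', 'z', or 'w'
   std::string source;  // name of the scalar being stored
};

// Rename a sequence of masked writes to one vector variable into
// vector-level SSA form. Every write creates a new version that reads
// the previous one, so the versions form a dependency chain.
std::vector<std::string>
to_vector_ssa(const std::string &var, const std::vector<MaskedWrite> &writes)
{
   std::vector<std::string> out;
   unsigned version = 0;
   for (const MaskedWrite &w : writes) {
      assert(std::string("xyzw").find(w.component) != std::string::npos);
      const std::string prev = var + "_" + std::to_string(version);
      const std::string next = var + "_" + std::to_string(++version);
      // insert(v, c, s): made-up notation for "copy v, replace channel c with s".
      out.push_back(next + " = insert(" + prev + ", " + w.component + ", " +
                    w.source + ")");
   }
   return out;
}
```

Three writes yield three versions, each depending on the one before
it, so none of the intermediate versions can be dropped as dead
without first reasoning about individual channels.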

Once again, it appears that our optimization requirements are driven
significantly by the target architecture. For now, targeting the Mesa
IR backend, SSA does not appear to be that important to producing
excellent code, but we do expect to do some SSA-based optimizations
for the 965 fragment shader backend when that is developed.

Q: How should I expand instructions that take multiple backend instructions?

A: Sometimes you'll have to do the expansion in your code generation;
see, for example, ir_to_mesa.cpp's handling of ir_unop_sqrt. However,
in many cases you'll want to do a pass over the IR to convert
non-native instructions to a series of native instructions. For
example, for the Mesa backend we have ir_div_to_mul_rcp.cpp because
Mesa IR (and many hardware backends) have only a reciprocal
instruction, not a divide. Implementing non-native instructions this
way gives the chance for constant folding to occur, so (a / 2.0)
becomes (a * 0.5) after codegen instead of (a * (1.0 / 2.0)).
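
The (a / 2.0) example can be traced with a small sketch: lower the
divide first, then fold constants. This is not the code in
ir_div_to_mul_rcp.cpp. Node, make, lower_div, fold_constants, and dump
are hypothetical names on a made-up miniature tree, used only to show
the order of the two steps.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical miniature expression tree, for illustration only.
struct Node {
   enum Kind { Constant, Var, Mul, Div, Rcp } kind = Constant;
   float value = 0.0f;          // used by Constant
   std::string name;            // used by Var
   std::unique_ptr<Node> a, b;  // operands; Rcp uses only `a`
};
using NodePtr = std::unique_ptr<Node>;

NodePtr make(Node::Kind kind, NodePtr a = nullptr, NodePtr b = nullptr)
{
   NodePtr n(new Node);
   n->kind = kind;
   n->a = std::move(a);
   n->b = std::move(b);
   return n;
}

NodePtr constant(float v)
{
   NodePtr n = make(Node::Constant);
   n->value = v;
   return n;
}

NodePtr var(const std::string &name)
{
   NodePtr n = make(Node::Var);
   n->name = name;
   return n;
}

// Lowering pass: rewrite every (a / b) as (a * rcp(b)).
void lower_div(Node &n)
{
   if (n.a) lower_div(*n.a);
   if (n.b) lower_div(*n.b);
   if (n.kind == Node::Div) {
      assert(n.a && n.b && "divide needs two operands");
      n.kind = Node::Mul;
      n.b = make(Node::Rcp, std::move(n.b));
   }
}

// Constant folding: rcp of a constant becomes a constant.
void fold_constants(Node &n)
{
   if (n.a) fold_constants(*n.a);
   if (n.b) fold_constants(*n.b);
   if (n.kind == Node::Rcp && n.a->kind == Node::Constant) {
      const float folded = 1.0f / n.a->value;
      n.kind = Node::Constant;
      n.value = folded;
      n.a.reset();
   }
}

std::string dump(const Node &n)
{
   switch (n.kind) {
   case Node::Constant: return std::to_string(n.value);
   case Node::Var:      return n.name;
   case Node::Mul:      return "(" + dump(*n.a) + " * " + dump(*n.b) + ")";
   case Node::Div:      return "(" + dump(*n.a) + " / " + dump(*n.b) + ")";
   case Node::Rcp:      return "rcp(" + dump(*n.a) + ")";
   }
   return "";
}
```

Doing the lowering on the IR, before code generation, is what lets the
ordinary constant folder see rcp(2.0) and replace it with 0.5.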

Q: How should I handle my special hardware instructions with respect to IR?

A: Our current theory is that if multiple targets have an instruction
for some operation, then we should probably be able to represent that
in the IR. Generally this is in the form of an ir_{bin,un}op
expression type. For example, we initially implemented fract() using
(a - floor(a)), but both 945 and 965 have instructions to give that
result, and it would also simplify the implementation of mod(), so
ir_unop_fract was added. The following areas need updating to add a
new expression type:

ir.h (new enum)
ir.cpp:get_num_operands() (used for ir_reader)
ir.cpp:operator_strs (used for ir_reader)
ir_constant_expression.cpp (you probably want to be able to constant fold)
ir_validate.cpp (check users have the right types)

You may also need to update the backends if they will see the new expr type:

../mesa/shaders/ir_to_mesa.cpp

You can then use the new expression from builtins (if all backends
would rather see it), or scan the IR and convert to use your new
expression type (see ir_mod_to_fract, for example).
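
As a rough numeric check of why fract() helps with mod(), the sketch
below compares the GLSL definition mod(x, y) = x - y * floor(x / y)
with a form written in terms of fract. The rewrite y * fract(x / y) is
our assumption about what such a conversion would produce, not a
quotation of ir_mod_to_fract, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// fract() as first implemented in the compiler: a - floor(a).
float glsl_fract(float a)
{
   return a - std::floor(a);
}

// mod() as defined by GLSL: x - y * floor(x / y).
float mod_reference(float x, float y)
{
   assert(y != 0.0f);
   return x - y * std::floor(x / y);
}

// mod() expressed with a single fract, assuming the identity
// x - y * floor(x / y) == y * (x / y - floor(x / y)).
float mod_via_fract(float x, float y)
{
   assert(y != 0.0f);
   return y * glsl_fract(x / y);
}
```

The two forms agree exactly on values such as (5.5, 2.0) and
(-1.0, 4.0); in general they can differ by floating-point rounding,
since the division result is rounded before fract is taken.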

Q: How is memory management handled in the compiler?

A: The hierarchical memory allocator "talloc" developed for the Samba
project is used, so that things like optimization passes don't have to
worry about their garbage collection so much. It has a few nice
features, including low performance overhead and good debugging
support that's trivially available.

Generally, each stage of the compile creates a talloc context and
allocates its memory out of that or children of it. At the end of the
stage, the pieces still live are stolen to a new context and the old
one freed, or the whole context is kept for use by the next stage.

For IR transformations, a temporary context is used, then at the end
of all transformations, reparent_ir reparents all live nodes under the
shader's IR list, and the old context full of dead nodes is freed.
When developing a single IR transformation pass, this means that you
want to allocate instruction nodes out of the temporary context, so
that a node which becomes dead doesn't live on as the child of a live
node. At the moment, optimization passes aren't passed that temporary
context, so they find it by calling talloc_parent() on a nearby IR
node. The talloc_parent() call is expensive, so many passes will cache
the result of the first talloc_parent(). Cleaning up all the
optimization passes to take a context argument and not call
talloc_parent() is left as an exercise.
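
The "allocate in a temporary context, reparent the survivors, free the
rest" pattern can be modeled in a few lines. This sketch does not use
talloc and does not mirror its API: Context, IRNode, steal, and the
live_nodes counter are hypothetical names chosen for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

int live_nodes = 0;  // instrumentation for this example only

struct Context;

// Hypothetical IR node that remembers which context owns it.
struct IRNode {
   Context *parent;
   int id;
   IRNode(Context *p, int i) : parent(p), id(i) { live_nodes++; }
   ~IRNode() { live_nodes--; }
};

// Toy allocation context: freeing it frees every node still under it.
struct Context {
   std::vector<IRNode *> children;

   Context() = default;
   Context(const Context &) = delete;
   Context &operator=(const Context &) = delete;

   ~Context()
   {
      for (IRNode *n : children)
         delete n;
   }

   IRNode *alloc(int id)
   {
      IRNode *n = new IRNode(this, id);
      children.push_back(n);
      return n;
   }
};

// Move a live node under a new context, as a reparenting step would.
void steal(Context &new_parent, IRNode *n)
{
   assert(n && n->parent && "node must have an owner");
   std::vector<IRNode *> &old = n->parent->children;
   old.erase(std::remove(old.begin(), old.end(), n), old.end());
   n->parent = &new_parent;
   new_parent.children.push_back(n);
}
```

A pass would allocate everything from the temporary Context, call
steal() on the nodes that are still referenced, and let the temporary
Context go out of scope to release the dead ones in one step.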

Q: What is the file naming convention in this directory?

A: Initially, there really wasn't one. We have since adopted one:

- Files that implement code lowering passes should be named lower_*
  (e.g., lower_noise.cpp).
- Files that implement optimization passes should be named opt_*.
- Files that implement a class that is used throughout the code should
  take the name of that class (e.g., ir_hierarchical_visitor.cpp).
- Files that contain code not fitting in one of the previous
  categories should have a sensible name (e.g., glsl_parser.ypp).