The Mesa 3D Graphics Library

GL Dispatch in Mesa

Several factors combine to make efficient dispatch of OpenGL functions
fairly complicated.  This document attempts to explain some of the issues
and introduce the reader to Mesa's implementation.  Readers already familiar
with the issues around GL dispatch can safely skip ahead to the overview of
Mesa's implementation.

1. Complexity of GL Dispatch

Every GL application has at least one object called a GL context.
This object, which is an implicit parameter to every GL function, stores all
of the GL related state for the application.  Every texture, every buffer
object, every enable, and much, much more is stored in the context.  Since
an application can have more than one context, the context to be used is
selected by a window-system dependent function such as
glXMakeContextCurrent.

In environments that implement OpenGL on the X Window System using GLX,
every GL function, including the pointers returned by glXGetProcAddress, is
context independent.  This means that no matter what context is
currently active, the same glVertex3fv function is used.

This creates the first bit of dispatch complexity.  An application can
have two GL contexts.  One context is a direct rendering context where
function calls are routed directly to a driver loaded within the
application's address space.  The other context is an indirect rendering
context where function calls are converted to GLX protocol and sent to a
server.  The same glVertex3fv has to do the right thing depending
on which context is current.
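
As a rough illustration of the problem (this is not Mesa's code), the sketch
below gives each context its own implementation pointer and lets a single
public entry point forward to whichever context is current.  The names
struct context, make_current, my_vertex3fv, and the two counting back ends
are hypothetical stand-ins chosen for this example.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins; none of these names come from Mesa. */
typedef void (*vertex3fv_fn)(const float *v);

struct context {
    vertex3fv_fn vertex3fv;  /* back end chosen when the context is created */
};

static struct context *current_ctx;  /* context selected by make_current() */
static int direct_calls;             /* calls that reached the "driver" */
static int indirect_calls;           /* calls that were "sent to a server" */

/* Stand-in for a driver loaded into the application's address space. */
static void direct_vertex3fv(const float *v)
{
    (void) v;
    direct_calls++;
}

/* Stand-in for encoding the call as GLX protocol. */
static void indirect_vertex3fv(const float *v)
{
    (void) v;
    indirect_calls++;
}

/* Plays the role of glXMakeContextCurrent in this sketch. */
static void make_current(struct context *ctx)
{
    current_ctx = ctx;
}

/* The single, context-independent entry point the application calls. */
static void my_vertex3fv(const float *v)
{
    current_ctx->vertex3fv(v);
}
```

An application would build one context around direct_vertex3fv and another
around indirect_vertex3fv, then switch between them with make_current; the
call site of my_vertex3fv never changes.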

Highly optimized drivers or GLX protocol implementations may want to
change the behavior of GL functions depending on current state.  For
example, glFogCoordf may operate differently depending on whether
or not fog is enabled.

In multi-threaded environments, it is possible for each thread to have a
different GL context current.  This means that poor old glVertex3fv
has to know which GL context is current in the thread where it is being
called.

2. Overview of Mesa's Implementation

Mesa uses two per-thread pointers.  The first pointer stores the address
of the context current in the thread, and the second pointer stores the
address of the dispatch table associated with that context.  The
dispatch table stores pointers to functions that actually implement
specific GL functions.  Each time a new context is made current in a thread,
these pointers are updated.
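
The bookkeeping described above can be sketched roughly as follows.  This is
a minimal illustration under stated assumptions, not Mesa's implementation:
the structure layouts and the names make_current, get_current_context, and
get_current_dispatch are hypothetical, and the GCC __thread extension is
used only as a convenient way to obtain per-thread variables.

```c
#include <stddef.h>

/* Hypothetical miniature dispatch table with a single entry. */
struct dispatch_table {
    void (*Vertex3f)(float x, float y, float z);
};

/* Hypothetical context that refers to its own dispatch table. */
struct context {
    const struct dispatch_table *table;
};

/* The two per-thread pointers: every thread sees its own copy. */
static __thread struct context *current_context;
static __thread const struct dispatch_table *current_dispatch;

/* Called whenever a context is made current in the calling thread. */
void make_current(struct context *ctx)
{
    current_context = ctx;
    current_dispatch = (ctx != NULL) ? ctx->table : NULL;
}

struct context *get_current_context(void)
{
    return current_context;
}

const struct dispatch_table *get_current_dispatch(void)
{
    return current_dispatch;
}
```

Because both pointers are updated together, a dispatch function only ever
needs to read the second one.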

The implementation of functions such as glVertex3fv becomes
conceptually simple:

  • Fetch the current dispatch table pointer.
  • Fetch the pointer to the real glVertex3fv function from the
    table.
  • Call the real function.

This can be implemented in just a few lines of C code.  The file
src/mesa/glapi/glapitemp.h contains code very similar to this.

    void glVertex3f(GLfloat x, GLfloat y, GLfloat z)
    {
        const struct _glapi_table * const dispatch = GET_DISPATCH();

        (*dispatch->Vertex3f)(x, y, z);
    }

Sample dispatch function

The problem with this simple implementation is the large amount of
overhead that it adds to every GL function call.

    94
     
    95

    In a multithreaded environment, a naive implementation of

    96
    GET_DISPATCH involves a call to pthread_getspecific or a
    97
    similar function.  Mesa provides a wrapper function called
    98
    _glapi_get_dispatch that is used by default.
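
To make the cost concrete, a naive per-thread lookup built on POSIX
thread-specific data might look like the sketch below.  It is an assumption
about the general shape of such a wrapper, not the body of
_glapi_get_dispatch; the names dispatch_key, set_dispatch, and get_dispatch
are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct _glapi_table. */
struct dispatch_table {
    void (*Vertex3f)(float x, float y, float z);
};

static pthread_key_t dispatch_key;  /* hypothetical key name */
static pthread_once_t dispatch_key_once = PTHREAD_ONCE_INIT;

static void create_dispatch_key(void)
{
    /* No destructor: the table belongs to the context, not the thread. */
    (void) pthread_key_create(&dispatch_key, NULL);
}

/* Store the dispatch table for the calling thread. */
void set_dispatch(const struct dispatch_table *table)
{
    pthread_once(&dispatch_key_once, create_dispatch_key);
    pthread_setspecific(dispatch_key, table);
}

/* Naive lookup: library calls on every single GL entry point. */
const struct dispatch_table *get_dispatch(void)
{
    pthread_once(&dispatch_key_once, create_dispatch_key);
    return pthread_getspecific(dispatch_key);
}
```

Every GL call pays for at least one function call into the threading
library before the real work starts, which is the overhead the following
sections try to remove.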

3. Optimizations

A number of optimizations have been made over the years to diminish the
performance hit imposed by GL dispatch.  This section describes these
optimizations.  The benefits of each optimization and the situations where
each can or cannot be used are listed.

3.1. Dual dispatch table pointers

The vast majority of OpenGL applications use the API in a single threaded
manner.  That is, the application has only one thread that makes calls into
the GL.  In these cases, not only do the calls to
pthread_getspecific hurt performance, but they are completely
unnecessary!  It is possible to detect this common case and avoid these
calls.

Each time a new dispatch table is set, Mesa examines and records the ID
of the executing thread.  If the same thread ID is always seen, Mesa knows
that the application is, from OpenGL's point of view, single threaded.

As long as an application is single threaded, Mesa stores a pointer to
the dispatch table in a global variable called _glapi_Dispatch.
The pointer is also stored in a per-thread location via
pthread_setspecific.  When Mesa detects that an application has
become multithreaded, NULL is stored in _glapi_Dispatch.
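
The detection logic described in the last two paragraphs can be sketched as
follows.  This is a simplified illustration, not Mesa's code: it ignores the
locking a real implementation would need around the shared variables, and
the names global_dispatch, first_thread, multithreaded, and set_dispatch are
hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct _glapi_table. */
struct dispatch_table {
    void (*Vertex3f)(float x, float y, float z);
};

/* Fast-path pointer playing the role of _glapi_Dispatch in this sketch. */
static const struct dispatch_table *global_dispatch;

static pthread_key_t dispatch_key;
static pthread_once_t dispatch_key_once = PTHREAD_ONCE_INIT;

static pthread_t first_thread;  /* thread ID recorded on the first call */
static int have_first_thread;   /* nonzero once first_thread is valid */
static int multithreaded;       /* nonzero once a second thread ID is seen */

static void create_dispatch_key(void)
{
    (void) pthread_key_create(&dispatch_key, NULL);
}

void set_dispatch(const struct dispatch_table *table)
{
    pthread_t self = pthread_self();

    pthread_once(&dispatch_key_once, create_dispatch_key);

    /* Record the first thread ID, then compare every later caller to it. */
    if (!have_first_thread) {
        first_thread = self;
        have_first_thread = 1;
    } else if (!pthread_equal(first_thread, self)) {
        multithreaded = 1;
    }

    /* The per-thread slot is always kept up to date. */
    pthread_setspecific(dispatch_key, table);

    /* The global is only trustworthy while a single thread uses the GL. */
    global_dispatch = multithreaded ? NULL : table;
}
```

Once multithreaded becomes nonzero it is never cleared, so the global
pointer stays NULL for the rest of the process lifetime.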

Using this simple mechanism the dispatch functions can detect the
multithreaded case by comparing _glapi_Dispatch to NULL.
The resulting implementation of GET_DISPATCH is slightly more
complex, but it avoids the expensive pthread_getspecific call in
the common case.

    /* Use the global pointer while single threaded; otherwise fall back
     * to the per-thread slot. */
    #define GET_DISPATCH() \
        ((_glapi_Dispatch != NULL) \
            ? _glapi_Dispatch : pthread_getspecific(_glapi_Dispatch_key))

Improved GET_DISPATCH Implementation

3.2. ELF TLS

Starting with the 2.4.20 Linux kernel, each thread is allocated an area
of per-thread, global storage.  Variables can be put in this area using some
extensions to GCC.  By storing the dispatch table pointer in this area, the
expensive call to pthread_getspecific and the test of
_glapi_Dispatch can be avoided.

The dispatch table pointer is stored in a new variable called
_glapi_tls_Dispatch.  A new variable name is used so that a single
libGL can implement both interfaces.  This allows the libGL to operate with
direct rendering drivers that use either interface.  Once the pointer is
properly declared, GET_DISPATCH becomes a simple variable
reference.

    extern __thread struct _glapi_table *_glapi_tls_Dispatch
        __attribute__((tls_model("initial-exec")));

    #define GET_DISPATCH() _glapi_tls_Dispatch

TLS GET_DISPATCH Implementation

Use of this path is controlled by the preprocessor define
GLX_USE_TLS.  Any platform capable of using TLS should use this as
the default dispatch method.

3.3. Assembly Language Dispatch Stubs

Many platforms have difficulty properly optimizing the tail-call in the
dispatch stubs.  Platforms like x86 that pass parameters on the stack seem
to have even more difficulty optimizing these routines.  All of the dispatch
routines are very short, and it is trivial to create optimal assembly
language versions.  The amount of optimization provided by using assembly
stubs varies from platform to platform and application to application.
However, by using the assembly stubs, many platforms can use an additional
space optimization (see below).

The biggest hurdle to creating assembly stubs is handling the various
ways that the dispatch table pointer can be accessed.  There are four
different methods that can be used:

  1. Using _glapi_Dispatch directly in builds for non-multithreaded
     environments.
  2. Using _glapi_Dispatch and _glapi_get_dispatch in
     multithreaded environments.
  3. Using _glapi_Dispatch and pthread_getspecific in
     multithreaded environments.
  4. Using _glapi_tls_Dispatch directly in TLS enabled
     multithreaded environments.

People wishing to implement assembly stubs for new platforms should focus
on #4 if the new platform supports TLS.  Otherwise, implement #2 followed by
#3.  Environments that do not support multithreading are uncommon and not
terribly relevant.

Selection of the dispatch table pointer access method is controlled by a
few preprocessor defines.

  • If GLX_USE_TLS is defined, method #4 is used.
  • If HAVE_PTHREAD is defined, method #3 is used.
  • If WIN32_THREADS is defined, method #2 is used.
  • If none of the preceding are defined, method #1 is used.
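
One plausible way to express these rules is a preprocessor chain like the
one sketched here.  It assumes the defines are tested in the order listed,
so that GLX_USE_TLS wins when several are set; the macro DISPATCH_METHOD and
the function dispatch_method are hypothetical names used only to make the
choice visible.

```c
/* Hypothetical constant holding the access method number from the
 * numbered list of four methods. */
#if defined(GLX_USE_TLS)
#  define DISPATCH_METHOD 4  /* _glapi_tls_Dispatch */
#elif defined(HAVE_PTHREAD)
#  define DISPATCH_METHOD 3  /* _glapi_Dispatch and pthread_getspecific */
#elif defined(WIN32_THREADS)
#  define DISPATCH_METHOD 2  /* _glapi_Dispatch and _glapi_get_dispatch */
#else
#  define DISPATCH_METHOD 1  /* _glapi_Dispatch only */
#endif

/* Returns the method selected at compile time. */
int dispatch_method(void)
{
    return DISPATCH_METHOD;
}
```

In an assembly source file the same chain would select between macro
bodies rather than integer constants.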

Two different techniques are used to handle the various cases.
On x86 and SPARC, a macro called GL_STUB is used.  In the preamble
of the assembly source file different implementations of the macro are
selected based on the defined preprocessor variables.  The assembly code
then consists of a series of invocations of the macros such as:

    GL_STUB(Color3fv, _gloffset_Color3fv)

SPARC Assembly Implementation of glColor3fv

The benefit of this technique is that changes to the calling pattern
(i.e., addition of a new dispatch table pointer access method) require fewer
changed lines in the assembly code.

However, this technique can only be used on platforms where the function
implementation does not change based on the parameters passed to the
function.  For example, since x86 passes all parameters on the stack, no
additional code is needed to save and restore function parameters around a
call to pthread_getspecific.  Since x86-64 passes parameters in
registers, varying amounts of code need to be inserted around the call to
pthread_getspecific to save and restore the GL function's
parameters.

The other technique, used by platforms like x86-64 that cannot use the
first technique, is to insert #ifdef within the assembly
implementation of each function.  This makes the assembly file considerably
larger (e.g., 29,332 lines for glapi_x86-64.S versus 1,155 lines for
glapi_x86.S) and causes simple changes to the function
implementation to generate many lines of diffs.  Since the assembly files
are typically generated by scripts (see below), this
isn't a significant problem.

Once a new assembly file is created, it must be inserted in the build
system.  There are two steps to this.  The file must first be added to
src/mesa/sources.  That gets the file built and linked.  The second
step is to add the correct #ifdef magic to
src/mesa/glapi/glapi_dispatch.c to prevent the C version of the
dispatch functions from being built.

3.4. Fixed-Length Dispatch Stubs

To implement glXGetProcAddress, Mesa stores a table that
associates function names with pointers to those functions.  This table is
stored in src/mesa/glapi/glprocs.h.  For different reasons on
different platforms, storing all of those pointers is inefficient.  On most
platforms, including all known platforms that support TLS, we can avoid this
added overhead.
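
For readers who have not seen such a table, the sketch below shows the
general idea of a name-to-pointer lookup.  It does not reproduce the layout
of glprocs.h; the table type, the two placeholder stubs, and the function
lookup_proc are hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef void (*proc_fn)(void);

/* Two placeholder entry points standing in for real dispatch stubs. */
static void stub_Color3fv(void)  { }
static void stub_Vertex3fv(void) { }

/* Hypothetical name table: one stored pointer per function. */
static const struct {
    const char *name;
    proc_fn     address;
} proc_table[] = {
    { "glColor3fv",  stub_Color3fv  },
    { "glVertex3fv", stub_Vertex3fv },
};

/* Linear search by name; returns NULL for unknown names. */
proc_fn lookup_proc(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(proc_table) / sizeof(proc_table[0]); i++) {
        if (strcmp(proc_table[i].name, name) == 0)
            return proc_table[i].address;
    }
    return NULL;
}
```

With many hundreds of GL functions, the address column of a table like
this is the per-function pointer storage that the text describes as
inefficient.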

If the assembly stubs are all the same size, the pointer need not be
stored for every function.  The location of the function can instead be
calculated by multiplying the size of the dispatch stub by the offset of the
function in the table.  This value is then added to the address of the first
dispatch stub.
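
The calculation amounts to a single multiply and add.  The helper below is
a minimal sketch of that arithmetic; the function name stub_address and its
parameters are hypothetical, and byte pointers are used so that the
arithmetic is performed in bytes.

```c
#include <stddef.h>

/* Address of stub number `offset`, given the address of the first stub
 * and the fixed size (in bytes) shared by every stub. */
const unsigned char *stub_address(const unsigned char *first_stub,
                                  size_t stub_size, size_t offset)
{
    return first_stub + stub_size * offset;
}
```

For example, assuming 16-byte stubs, the stub at table offset 3 starts 48
bytes after the first stub.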

This path is activated by adding the correct #ifdef magic to
src/mesa/glapi/glapi.c just before glprocs.h is
included.

4. Automatic Generation of Dispatch Stubs
