gxvalid: TrueType GX validator
==============================
 
 
1. What is this
---------------

  `gxvalid' is a module to validate TrueType GX tables: a collection of
  additional tables in TrueType fonts which are used by `QuickDraw GX
  Text' and Apple Advanced Typography (AAT).  In addition, gxvalid can
  validate `kern' tables which have been extended for AAT.  Like the
  otvalid module, gxvalid uses FreeType 2's validator framework
  (ftvalid).

  You can link gxvalid with your program; before running your own layout
  engine, gxvalid validates a font file.  As a result, you can remove
  error-checking code from the layout engine.  It is also possible to
  use gxvalid as a stand-alone font validator; the `ftvalid' test
  program included in the ft2demo bundle calls gxvalid internally.
  A stand-alone font validator may be useful for font developers.

  This document covers the following issues.

  - supported TrueType GX tables
  - fundamental validation limitations
  - permissive error handling of broken GX tables
  - `kern' table issues
 
 
2. Supported tables
-------------------

  The following GX tables are currently supported.

    bsln
    feat
    just
    kern(*)
    lcar
    mort
    morx
    opbd
    prop
    trak

  The following GX tables are currently unsupported.

    cvar
    fdsc
    fmtx
    fvar
    gvar
    Zapf

  The following GX tables won't be supported.

    acnt(**)
    hsty(***)

  The following undocumented tables in TrueType fonts designed for the
  Apple platform aren't handled either.

    addg
    CVTM
    TPNM
    umif
 
 
  *)   The `kern' validator handles both the classic and the new kern
       formats; the former is supported on both Microsoft and Apple
       platforms, while the latter is supported on Apple platforms.

  **)  `acnt' tables are not supported by currently available Apple font
       tools.

  ***) There is one more Apple extension, `hsty', but it is for
       Newton-OS, not GX (Newton-OS is a platform by Apple, but it can
       use sfnt-housed bitmap fonts only).  Therefore, it should be
       excluded from `Apple platform' in the context of TrueType.
       gxvalid ignores it, as Apple font tools do.
 
 
  We have checked 183 fonts bundled with MacOS 9.1, MacOS 9.2, MacOS
  10.0, MacOS X 10.1, MSIE for MacOS, and AppleWorks 6.0.  In addition,
  we have checked 67 Dynalab fonts (designed for MacOS) and 189 Ricoh
  fonts (designed for Windows and MacOS dual platforms).  The number of
  fonts including each TrueType GX table is as follows.

    bsln:  76
    feat: 191
    just:  84
    kern:  59
    lcar:   4
    mort: 326
    morx:  19
    opbd:   4
    prop: 114
    trak:  16

  Dynalab and Ricoh fonts don't have GX tables except for `feat' and
  `mort'.
 
 
3. Fundamental validation limitations
-------------------------------------

  TrueType GX provides layout information to libraries for font
  rasterizers and text layout.  gxvalid can check whether the layout
  data in a font conforms to the TrueType GX format specified by
  Apple.  But gxvalid cannot check how a QuickDraw GX/AAT renderer uses
  the stored information.

  3-1. Validation of State Machine activity
  -----------------------------------------

    QuickDraw GX/AAT uses a `State Machine' to provide `stateful' layout
    features, and TrueType GX stores the state transition diagram of
    this `State Machine' in a `StateTable' data structure.  As the
    State Machine receives a series of glyph IDs, it starts in the
    `start of text' state, walks through various states while
    generating layout information for the renderer, and finally
    reaches the `end of text' state.

    gxvalid can check essential errors like:

      - possibility of state transitions to undefined states
      - existence of glyph IDs that the State Machine doesn't know how
        to handle
      - the State Machine cannot compute the layout information from
        the given diagram

    These errors can be checked within a finite number of steps, and
    without the State Machine itself, because they are `expression'
    errors of the state transition diagram.
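
    As an illustration of why such checks need only finitely many
    steps, the following C sketch scans a simplified transition table
    once and counts the transitions whose target state is undefined.
    The `ToyStateTable' layout and the function name are hypothetical;
    a real StateTable stores its entries differently, and this is not
    gxvalid's actual code.

```c
#include <stddef.h>

/* Hypothetical, simplified transition table: `next_state' holds */
/* n_states * n_classes entries, each naming the state to enter. */
typedef struct
{
  size_t                 n_states;
  size_t                 n_classes;
  const unsigned short*  next_state;

} ToyStateTable;


/* Count the transitions that lead to a state that is not defined. */
size_t
toy_count_bad_transitions( const ToyStateTable*  t )
{
  size_t  bad = 0;
  size_t  i;


  for ( i = 0; i < t->n_states * t->n_classes; i++ )
    if ( t->next_state[i] >= t->n_states )
      bad++;

  return bad;
}
```

    Each entry is visited exactly once, so the cost is bounded by the
    size of the table, regardless of the text the State Machine would
    later receive.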
 
    There is no limit on how long the State Machine walks around, so
    validating the algorithm in the state transition diagram requires
    infinite steps, even if we had a State Machine in gxvalid.
    Therefore, the following errors and problems cannot be checked.

      - existence of states which the State Machine never transitions to
      - the possibility that the State Machine never reaches `end of
        text'
      - the possibility of stack underflow/overflow in the State Machine
        (in ligature and contextual glyph substitutions, the State
        Machine can store 16 glyphs onto its stack)

    In addition, gxvalid doesn't check `temporary glyph IDs' used in the
    chained State Machines (in `mort' and `morx' tables).  If a layout
    feature is implemented by a single State Machine, a glyph ID
    converted by the State Machine is passed to the glyph renderer, thus
    it should not point to an undefined glyph ID.  But if a layout
    feature is implemented by chained State Machines, a component State
    Machine (if it is not the final one) is permitted to generate
    undefined glyph IDs for temporary use, because they are handled by
    the next component State Machine and not by the glyph renderer.  To
    validate such temporary glyph IDs, gxvalid would have to collect all
    undefined glyph IDs which can occur in the output of the previous
    State Machine and search for them in the `ClassTable' structure of
    the current State Machine.  It is too complex to list all possible
    glyph IDs from the StateTable, especially from a ligature
    substitution table.
 
  3-2. Validation of relationship between multiple layout features
  ----------------------------------------------------------------

    gxvalid does not validate the relationship between multiple layout
    features at all.

    If multiple layout features are defined in TrueType GX tables,
    possible interactions, overrides, and conflicts between layout
    features are implicitly given in the font too.  For example, there
    are several predefined spacing control features:

      - Text Spacing          (Proportional/Monospace/Half-width/Normal)
      - Number Spacing        (Monospaced-numbers/Proportional-numbers)
      - Kana Spacing          (Full-width/Proportional)
      - Ideographic Spacing   (Full-width/Proportional)
      - CJK Roman Spacing     (Half-width/Proportional/Default-roman
                               /Full-width-roman/Proportional)

    If all layout features are independently managed, we can activate
    inconsistent typographic rules like `Text Spacing=Monospace' and
    `Ideographic Spacing=Proportional' at the same time.

    The combination of layout features is managed by a 32bit integer
    (one bit for each selector setting), so we can define relationships
    between up to 32 features, theoretically.  But if one feature
    setting affects another feature setting, we need typographic
    priority rules to validate the relationship.  Unfortunately, the
    TrueType GX format specification does not give such information even
    for predefined features.
 
 
4. Permissive error handling of broken GX tables
------------------------------------------------

  When Apple's font rendering system finds an inconsistency, like a
  specification violation or an unspecified value in a TrueType GX
  table, it does not always return an error.  In most cases, the
  rendering engine silently ignores such wrong values or even whole
  tables.  In fact, MacOS is shipped with fonts including broken GX/AAT
  tables, but no harmful effects due to `officially broken' fonts are
  observed by end-users.

  gxvalid is designed to continue the validation process as long as
  possible.  When gxvalid finds a wrong value, it at least emits a
  warning, and takes a fallback procedure if possible.  The fallback
  procedure depends on the validation level.

  We used the following three tools to investigate Apple's error
  handling.

    - FontValidator  (for MacOS 8.5 - 9.2)  resource fork font
    - ftxvalidator   (for MacOS X 10.1 -)   dfont or naked-sfnt
    - ftxdumperfuser (for MacOS X 10.1 -)   dfont or naked-sfnt

  However, all tests were done on a PowerPC-based Macintosh; at present,
  we have not checked those tools on an m68k-based Macintosh.

  In total, we checked 183 fonts bundled with MacOS 9.1, MacOS 9.2,
  MacOS 10.0, MacOS X 10.1, MSIE for MacOS, and AppleWorks 6.0.  These
  fonts are distributed officially, but many broken GX/AAT tables were
  found by Apple's font tools.  In the following, we list typical
  violations of the GX specification in fonts officially distributed
  with those Apple systems.
 
  4-1. broken BinSrchHeader (19/183)
  ----------------------------------

    `BinSrchHeader' is a header of a data array for m68k platforms to
    access memory efficiently.  Although there are only two truly
    independent parameters (`unitSize' and `nUnits'), BinSrchHeader has
    three additional parameters which can be calculated from `unitSize'
    and `nUnits', for fast setup.  Apple font tools ignore them
    silently, so gxvalid warns if it finds an inconsistency, and always
    continues validation.  The additional parameters are ignored
    regardless of their consistency.

      19 fonts include such inconsistencies; all breaks are in the
      BinSrchHeader structure of the `kern' table.
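
    The consistency check can be sketched in C as follows.  The sketch
    assumes the usual definitions from Apple's TrueType reference:
    `searchRange' is unitSize times the largest power of 2 not
    exceeding nUnits, `entrySelector' is the base-2 logarithm of that
    power, and `rangeShift' is unitSize times the remainder.  The
    structure and function names are hypothetical and are not taken
    from gxvalid.

```c
#include <stdint.h>

/* Hypothetical in-memory copy of a BinSrchHeader (not gxvalid's type). */
typedef struct
{
  uint16_t  unitSize;
  uint16_t  nUnits;
  uint16_t  searchRange;
  uint16_t  entrySelector;
  uint16_t  rangeShift;

} SketchBinSrchHeader;


/* Fill `out' with the values the three derived fields should have. */
void
sketch_binsrch_expected( uint16_t              unitSize,
                         uint16_t              nUnits,
                         SketchBinSrchHeader*  out )
{
  uint32_t  pow2     = 1;  /* largest power of 2 <= nUnits */
  uint16_t  exponent = 0;  /* base-2 logarithm of `pow2'   */


  out->unitSize      = unitSize;
  out->nUnits        = nUnits;
  out->searchRange   = 0;
  out->entrySelector = 0;
  out->rangeShift    = 0;

  if ( nUnits == 0 )
    return;

  while ( pow2 * 2 <= nUnits )
  {
    pow2 *= 2;
    exponent++;
  }

  out->searchRange   = (uint16_t)( unitSize * pow2 );
  out->entrySelector = exponent;
  out->rangeShift    = (uint16_t)( unitSize * ( nUnits - pow2 ) );
}


/* Return 1 if the stored derived fields match the expected ones. */
int
sketch_binsrch_is_consistent( const SketchBinSrchHeader*  h )
{
  SketchBinSrchHeader  e;


  sketch_binsrch_expected( h->unitSize, h->nUnits, &e );

  return h->searchRange   == e.searchRange   &&
         h->entrySelector == e.entrySelector &&
         h->rangeShift    == e.rangeShift;
}
```

    A validator following the policy described above would only warn
    when this check fails, and then go on using `unitSize' and `nUnits'
    alone.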
 
  4-2. too-short LookupTable (5/183)
  ----------------------------------

    LookupTable format 0 is a simple array to get a value from a given
    GID (glyph ID); the index of this array is a GID too.  Therefore,
    the length of the array is expected to be the same as the maximum
    GID value defined in the `maxp' table, but there are some fonts
    whose LookupTable format 0 is too short to cover all GIDs.
    FontValidator ignores this error silently; ftxvalidator and
    ftxdumperfuser both warn and continue.  Similar problems are found
    in format 3 subtables of `kern'.  gxvalid always warns, and aborts
    if the validation level is set to FT_VALIDATE_PARANOID.

      5 fonts include too-short kern format 0 subtables.
      1 font includes a too-short kern format 3 subtable.
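
    The policy above can be sketched as a small C function.  The enum
    and function names are hypothetical stand-ins; FreeType's own level
    constants are deliberately not used so that the sketch stays
    self-contained.

```c
/* Hypothetical stand-ins for the three validation levels. */
typedef enum
{
  SKETCH_LT_DEFAULT,
  SKETCH_LT_TIGHT,
  SKETCH_LT_PARANOID

} SketchLtLevel;

typedef enum
{
  SKETCH_LT_OK,     /* the array covers all glyph IDs      */
  SKETCH_LT_WARN,   /* too short: warn, then continue      */
  SKETCH_LT_ABORT   /* too short: warn, then stop entirely */

} SketchLtResult;


/* Compare the array length with the number of glyphs in the font. */
SketchLtResult
sketch_check_lookup0( unsigned int   n_entries,
                      unsigned int   n_glyphs,
                      SketchLtLevel  level )
{
  if ( n_entries >= n_glyphs )
    return SKETCH_LT_OK;

  return level == SKETCH_LT_PARANOID ? SKETCH_LT_ABORT
                                     : SKETCH_LT_WARN;
}
```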
 
  4-3. broken LookupTable format 2 (1/183)
  ----------------------------------------

    LookupTable format 2, subformat 4 covers the GID space by a
    collection of segments which are specified by `firstGlyph' and
    `lastGlyph'.  Some fonts store `firstGlyph' and `lastGlyph' in
    reverse order, so the segment specification is broken.  Apple font
    tools ignore this error silently; a broken segment is ignored as if
    it did not exist.  gxvalid warns and normalizes the segment at
    FT_VALIDATE_DEFAULT, ignores the segment at FT_VALIDATE_TIGHT, or
    aborts at FT_VALIDATE_PARANOID.

      1 font includes broken LookupTable format 2, in the `just' table.

    *) It seems that all fonts manufactured by ITC for AppleWorks have
       this error.
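
    A minimal C sketch of the three fallback behaviours for a reversed
    segment follows.  All type and function names are hypothetical;
    only the field names `firstGlyph' and `lastGlyph' come from the
    text above.

```c
/* Hypothetical stand-ins for the three validation levels. */
typedef enum
{
  SKETCH_SEG_DEFAULT,
  SKETCH_SEG_TIGHT,
  SKETCH_SEG_PARANOID

} SketchSegLevel;

typedef enum
{
  SKETCH_SEG_KEEP,   /* use the segment (possibly after a swap) */
  SKETCH_SEG_SKIP,   /* pretend the segment does not exist      */
  SKETCH_SEG_ABORT   /* stop validation                         */

} SketchSegAction;

typedef struct
{
  unsigned short  firstGlyph;
  unsigned short  lastGlyph;

} SketchSegment;


/* Decide what to do with one segment; may swap its bounds in place. */
SketchSegAction
sketch_handle_segment( SketchSegment*  seg,
                       SketchSegLevel  level )
{
  unsigned short  tmp;


  if ( seg->firstGlyph <= seg->lastGlyph )
    return SKETCH_SEG_KEEP;

  /* a real validator would emit a warning at this point */
  switch ( level )
  {
  case SKETCH_SEG_DEFAULT:
    tmp             = seg->firstGlyph;
    seg->firstGlyph = seg->lastGlyph;
    seg->lastGlyph  = tmp;
    return SKETCH_SEG_KEEP;

  case SKETCH_SEG_TIGHT:
    return SKETCH_SEG_SKIP;

  default:
    return SKETCH_SEG_ABORT;
  }
}
```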
 
  4-4. bad bracketing in glyph property (14/183)
  ----------------------------------------------

    GX/AAT defines a `bracketing' property of the glyphs in the `prop'
    table, to control layout features of strings enclosed inside and
    outside of brackets.  Some fonts give inappropriate bracket
    properties to glyphs.  Apple font tools warn about this error;
    gxvalid warns too and aborts at FT_VALIDATE_PARANOID.

      14 fonts include wrong bracket properties.
 
 
  4-5. invalid feature number (117/183)
  -------------------------------------

    The GX/AAT extension can include 255 different layout features, but
    popular layout features are predefined (see
    http://developer.apple.com/fonts/Registry/index.html).  Some fonts
    include feature numbers which are incompatible with the predefined
    feature registry.

    In our survey, there are 140 fonts including a `feat' table.

    a) 67 fonts use a feature number which should not be used.
    b) 117 fonts set the wrong feature range (nSetting).  This is mostly
       found in the `mort' and `morx' tables.

    Apple font tools give no warning, although they cannot recognize
    what the feature is.  At FT_VALIDATE_DEFAULT, gxvalid warns but
    continues in both cases (a, b).  At FT_VALIDATE_TIGHT, gxvalid warns
    and aborts for (a), but continues for (b).  At FT_VALIDATE_PARANOID,
    gxvalid warns and aborts in both cases (a, b).
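
    The abort/continue decisions listed above form a small table,
    which the following C sketch encodes.  The names are hypothetical
    stand-ins for illustration; a warning is assumed to be emitted in
    every case before the decision is taken.

```c
/* Hypothetical stand-ins for the three validation levels. */
typedef enum
{
  SKETCH_FEAT_DEFAULT,
  SKETCH_FEAT_TIGHT,
  SKETCH_FEAT_PARANOID

} SketchFeatLevel;

typedef enum
{
  SKETCH_FEAT_BAD_NUMBER,  /* case (a): feature number not allowed */
  SKETCH_FEAT_BAD_RANGE    /* case (b): wrong nSetting range       */

} SketchFeatProblem;


/* Return 1 if validation should abort, 0 if it should continue. */
int
sketch_feat_should_abort( SketchFeatProblem  problem,
                          SketchFeatLevel    level )
{
  if ( level == SKETCH_FEAT_PARANOID )
    return 1;

  if ( level == SKETCH_FEAT_TIGHT )
    return problem == SKETCH_FEAT_BAD_NUMBER;

  return 0;
}
```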
 
  4-6. invalid prop version (10/183)
  ----------------------------------

    Like most TrueType GX tables, the `prop' table must start with a
    32bit version identifier: 0x00010000, 0x00020000 or 0x00030000.  But
    some fonts store nonsense binary data instead.  When Apple font
    tools find such data, they abort the processing immediately, and
    the data which follows is unhandled.  gxvalid does the same.

      10 fonts include a broken `prop' version.

    All of these fonts are classic TrueType fonts for the Japanese
    script, manufactured by Apple.
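
    The version test can be sketched in C as below.  The function
    names are hypothetical; the sketch only assumes that the table
    data is big-endian, as sfnt data is, and that at least four bytes
    are available.

```c
#include <stdint.h>

/* Read a big-endian 32bit value from a byte buffer. */
uint32_t
sketch_read_be32( const unsigned char*  p )
{
  return ( (uint32_t)p[0] << 24 ) |
         ( (uint32_t)p[1] << 16 ) |
         ( (uint32_t)p[2] <<  8 ) |
           (uint32_t)p[3];
}


/* Return 1 if the table starts with one of the three versions */
/* named in the text, 0 otherwise.                             */
int
sketch_prop_version_ok( const unsigned char*  table )
{
  uint32_t  version = sketch_read_be32( table );


  return version == 0x00010000UL ||
         version == 0x00020000UL ||
         version == 0x00030000UL;
}
```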
 
  4-7. unknown resource name (2/183)
  ----------------------------------

    NOTE: THIS IS NOT A TRUETYPE GX ERROR.

    If a TrueType font is stored in the resource fork or in dfont
    format, the data must be tagged as `sfnt' in the resource fork index
    to invoke the TrueType font handler for the data.  But the TrueType
    font data in `Keyboard.dfont' is tagged as `kbd', and that in
    `LastResort.dfont' is tagged as `lst'.  Apple font tools can detect
    that the data is in TrueType format and successfully validate it.
    Maybe this is possible because the files are known to be dfont.
    The current implementation of the resource fork driver of FreeType
    cannot do that, thus gxvalid cannot validate them.

      2 fonts use an unknown tag for the TrueType font resource.
 
5. `kern' table issues
----------------------

  In common terminology of TrueType, `kern' is classified as a basic and
  platform-independent table.  But there are Apple extensions of `kern',
  and there is an extension which requires a GX state machine for
  contextual kerning.  Therefore, gxvalid includes a special validator
  for `kern' tables.  Unfortunately, there is no exact algorithm to
  check Apple's extension, so gxvalid includes a heuristic algorithm to
  find the proper validation routines for all possible data formats,
  including the data format for Microsoft.  By calling
  classic_kern_validate() instead of gxv_validate(), you can specify the
  `kern' format explicitly.  However, current FreeType2 uses the
  Microsoft `kern' format only; others are ignored (and should be
  handled in a library one level higher than FreeType).
 
  5-1. History
  ------------

    The original 16bit version of `kern' was designed by Apple in the
    pre-GX era, and it was also approved by Microsoft.  Afterwards,
    Apple designed a new 32bit version of the `kern' table.  According
    to the documentation, the difference between the 16bit and 32bit
    versions is only the size of variables in the `kern' header.  In
    the following, we call the original 16bit version `classic', and
    the 32bit version `new'.
 
  5-2. Versions and dialects which should be differentiated
  ---------------------------------------------------------

    The `kern' table consists of a table header and several subtables.
    The version number which identifies a `classic' or a `new' version
    is explicitly written in the table header, but there are
    undocumented differences between Microsoft's and Apple's formats.
    Each variant is called a `dialect' in the following.  There are
    three cases which should be handled: the new Apple-dialect, the
    classic Apple-dialect, and the classic Microsoft-dialect.  An
    analysis of the formats and the auto detection algorithm of gxvalid
    is described in the following.
 
    5-2-1. Version detection: classic and new kern
    ----------------------------------------------

      According to the Apple TrueType specification, there are only two
      differences between the classic and the new version:

        - The `kern' table header starts with the version number.
          The classic version starts with 0x0000 (16bit),
          the new version starts with 0x00010000 (32bit).

        - In the `kern' table header, the number of subtables follows
          the version number.
          In the classic version, it is stored as a 16bit value.
          In the new version, it is stored as a 32bit value.

      The output of Apple's font tools (DumpKERN was tested in addition
      to the three Apple font tools listed above) reveals another,
      undocumented difference.  In the new version, the subtable header
      includes a 16bit variable named `tupleIndex' which does not exist
      in the classic version.

      The new version can store all subtable formats (0, 1, 2, and 3),
      but the Apple TrueType specification does not mention the subtable
      formats available in the classic version.
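
      The two documented header differences can be sketched in C as
      follows.  The structure and function names are hypothetical and
      do not come from gxvalid; the sketch reads only the table header
      and does not look at any subtable.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of a parsed `kern' table header. */
typedef struct
{
  int       is_new;       /* 0 = classic (16bit), 1 = new (32bit) */
  uint32_t  n_tables;     /* number of subtables                  */
  size_t    header_size;  /* bytes occupied by the table header   */

} SketchKernHeader;


/* Parse the table header; return 0 on success, -1 if the data is */
/* too short or the version is neither 0x0000 nor 0x00010000.     */
int
sketch_kern_parse_header( const unsigned char*  p,
                          size_t                len,
                          SketchKernHeader*     out )
{
  if ( len < 4 )
    return -1;

  if ( p[0] == 0x00 && p[1] == 0x00 )
  {
    /* classic: 16bit version, 16bit number of subtables */
    out->is_new      = 0;
    out->n_tables    = ( (uint32_t)p[2] << 8 ) | (uint32_t)p[3];
    out->header_size = 4;
    return 0;
  }

  if ( p[0] == 0x00 && p[1] == 0x01 && p[2] == 0x00 && p[3] == 0x00 )
  {
    /* new: 32bit version, 32bit number of subtables */
    if ( len < 8 )
      return -1;

    out->is_new      = 1;
    out->n_tables    = ( (uint32_t)p[4] << 24 ) |
                       ( (uint32_t)p[5] << 16 ) |
                       ( (uint32_t)p[6] <<  8 ) |
                         (uint32_t)p[7];
    out->header_size = 8;
    return 0;
  }

  return -1;
}
```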
 
    5-2-2. Available subtable formats in classic version
    ----------------------------------------------------

      Although the Apple TrueType specification recommends using the
      classic version if the font is designed for both the Apple and
      Microsoft platforms, it does not document the available subtable
      formats in the classic version.

      According to the Microsoft TrueType specification, the only
      subtable format assured for Windows and OS/2 support is subtable
      format 0.  The Microsoft TrueType specification also describes
      subtable format 2, but does not mention which platforms support
      it.  Subtable formats 1, 3, and higher are documented as reserved
      for future use.  Therefore, the classic version can store subtable
      formats 0 and 2, at least.  `ttfdump.exe', a font tool provided by
      Microsoft, ignores the subtable format written in the subtable
      header, and parses the table as if all subtables are in format 0.

      `kern' subtable format 1 uses a StateTable, so it cannot be
      utilized without a GX State Machine.  Therefore, it is reasonable
      to assume that format 1 (and 3) were introduced after Apple had
      introduced GX and moved to the new 32bit version.
 
    5-2-3. Apple and Microsoft dialects
    -----------------------------------

      The `kern' subtable has a 16bit `coverage' field to describe
      kerning attributes, but the bit interpretations by Apple and
      Microsoft are different.  For example, Apple uses bits 0-7 to
      identify the subtable format, while Microsoft uses bits 8-15.

      In addition, according to the output of DumpKERN and
      FontValidator, Apple's bit interpretations of coverage in the
      classic and new versions are also incompatible.  In summary,
      there are three dialects: classic Apple dialect, classic Microsoft
      dialect, and new Apple dialect.  The classic Microsoft dialect and
      the new Apple dialect are documented by each vendor's TrueType
      font specification, but the documentation for classic Apple
      dialect is not available.

      For example, in the new Apple dialect, bit 15 is documented as
      `set to 1 if the kerning is vertical'.  On the other hand, in
      classic Microsoft dialect, bit 0 is documented as `set to 1 if the
      kerning is horizontal'.  From the outputs of DumpKERN and
      FontValidator, classic Apple dialect recognizes bit 15 as `set to
      1 when the kerning is horizontal'.  From the results of similar
      experiments, classic Apple dialect seems to be the Endian reverse
      of the classic Microsoft dialect.

      In conclusion, it must be noted that no font tool can identify
      classic Apple dialect or classic Microsoft dialect automatically.
 
    5-2-4. gxvalid auto dialect detection algorithm
    -----------------------------------------------

      The first 16 bits of the `kern' table are enough to identify the
      version:

        - if the first 16 bits are 0x0000, the `kern' table is in
          classic Apple dialect or classic Microsoft dialect
        - if the first 16 bits are 0x0001, and the next 16 bits are
          0x0000, the `kern' table is in new Apple dialect.

      If the `kern' table is a classic one, the 16bit `coverage' field
      is checked next.  Firstly, the coverage bits are decoded for the
      classic Apple dialect using the following bit masks (this is based
      on DumpKERN output):

        0x8000: 1=horizontal, 0=vertical
        0x4000: not used
        0x2000: 1=cross-stream, 0=normal
        0x1FF0: reserved
        0x000F: subtable format

      If any of the reserved bits are set or the subtable format bits
      are interpreted as format 1 or 3, we take it as `impossible in
      classic Apple dialect' and retry, using the classic Microsoft
      dialect.

        The most popular coverage in new Apple-dialect:         0x8000
        The most popular coverage in classic Apple-dialect:     0x0000
        The most popular coverage in classic Microsoft dialect: 0x0001
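
      The heuristic described above can be sketched in C as follows.
      The enum and function names are hypothetical, and the sketch
      simply falls back to the classic Microsoft dialect without
      checking that the coverage value is valid there; it illustrates
      the decision order only and is not gxvalid's implementation.

```c
typedef enum
{
  SKETCH_KERN_UNKNOWN,
  SKETCH_KERN_NEW_APPLE,
  SKETCH_KERN_CLASSIC_APPLE,
  SKETCH_KERN_CLASSIC_MS

} SketchKernDialect;


/* `first16' and `next16' are the first two 16bit words of the    */
/* `kern' table; `coverage' is the coverage field of a subtable.  */
SketchKernDialect
sketch_kern_guess_dialect( unsigned int  first16,
                           unsigned int  next16,
                           unsigned int  coverage )
{
  unsigned int  reserved;
  unsigned int  format;


  if ( first16 == 0x0001 && next16 == 0x0000 )
    return SKETCH_KERN_NEW_APPLE;

  if ( first16 != 0x0000 )
    return SKETCH_KERN_UNKNOWN;

  /* classic version: try the classic Apple bit layout first */
  reserved = coverage & 0x1FF0;
  format   = coverage & 0x000F;

  if ( reserved == 0 && format != 1 && format != 3 )
    return SKETCH_KERN_CLASSIC_APPLE;

  /* impossible in classic Apple dialect: retry as Microsoft */
  return SKETCH_KERN_CLASSIC_MS;
}
```

      With the `most popular' coverage values listed above, 0x0000 is
      accepted as classic Apple dialect, while 0x0001 decodes to format
      1 under the Apple masks and is therefore handed to the classic
      Microsoft branch.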
 
  5-3. Tested fonts
  -----------------

    We checked 59 fonts bundled with MacOS and 38 fonts bundled with
    Windows, all of which include a `kern' table.

      - fonts bundled with MacOS
        * new Apple dialect
          format 0: 18
          format 2:  1
          format 3:  1
        * classic Apple dialect
          format 0: 14
        * classic Microsoft dialect
          format 0: 15

      - fonts bundled with Windows
        * classic Microsoft dialect
          format 0: 38

    It looks strange that classic Microsoft-dialect fonts are bundled
    with MacOS: they come from MSIE for MacOS, except for
    MarkerFelt.dfont.
 
 
  ACKNOWLEDGEMENT
  ---------------

  Some parts of gxvalid are derived from both the `gxlayout' module and
  the `otvalid' module.  Development of gxlayout was supported by the
  Information-technology Promotion Agency (IPA), Japan.

  The detailed analysis of undefined glyph ID utilization in `mort' and
  `morx' tables is provided by George Williams.
 
------------------------------------------------------------------------

Copyright 2004, 2005, 2007 by
suzuki toshiya, Masatake YAMATO, Red hat K.K.,
David Turner, Robert Wilhelm, and Werner Lemberg.

This  file is  part  of the  FreeType  project, and  may  only be  used,
modified,  and  distributed under  the  terms  of  the FreeType  project
license, LICENSE.TXT.  By continuing  to use, modify, or distribute this
file  you indicate that  you have  read the  license and  understand and
accept it fully.
 
 
--- end of README ---