Hubbub parser architecture
==========================

Introduction
------------

  Hubbub is a flexible HTML parser. It offers two interfaces:

    * a SAX-style event interface
    * a DOM-style tree-based interface

Overview
--------

  Hubbub comprises two parts:

    * a tokeniser
    * a tree builder

  Tokeniser
  ---------

    The tokeniser divides the data held in the document buffer into chunks
    and emits a SAX-style event for each chunk.
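
    The chunk-and-event flow can be sketched as follows. This is a toy
    illustration, not Hubbub's tokeniser: the names event, event_handler,
    toy_tokenise and count_tags are hypothetical, and the splitting rule
    (a "<...>" run versus a run of text) is a deliberate simplification of
    real HTML tokenisation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical event kinds; a real HTML tokeniser has many more. */
typedef enum { EV_TAG, EV_CHARACTERS } event_type;

/* An event refers directly to a segment of the document buffer. */
typedef struct {
    event_type type;
    const char *data;
    size_t len;
} event;

typedef void (*event_handler)(const event *ev, void *pw);

/* Split buf into "<...>" chunks and runs of text, calling handler once
 * per chunk. Returns the number of events emitted. */
static size_t toy_tokenise(const char *buf, size_t len,
        event_handler handler, void *pw)
{
    size_t pos = 0, count = 0;

    assert(buf != NULL && handler != NULL);

    while (pos < len) {
        event ev;
        size_t end = pos;

        if (buf[pos] == '<') {
            /* Tag chunk: up to and including '>' (or end of input). */
            while (end < len && buf[end] != '>')
                end++;
            if (end < len)
                end++;
            ev.type = EV_TAG;
        } else {
            /* Text chunk: up to, but not including, the next '<'. */
            while (end < len && buf[end] != '<')
                end++;
            ev.type = EV_CHARACTERS;
        }

        ev.data = buf + pos;
        ev.len = end - pos;
        handler(&ev, pw);
        count++;
        pos = end;
    }

    return count;
}

/* Example client handler: counts the tag events it receives. */
static void count_tags(const event *ev, void *pw)
{
    if (ev->type == EV_TAG)
        (*(size_t *) pw)++;
}
```

    Passing count_tags as the handler with a pointer to a size_t counter
    lets a client tally tags without the tokeniser knowing anything about
    the client's data.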

  Tree builder
  ------------

    The tree builder constructs a DOM-like tree from the SAX events emitted by
    the tokeniser. The exact representation of the tree is up to the client,
    which must provide a number of tree-building handler functions.
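
    A minimal sketch of this division of labour is given below. The builder
    only sees opaque node pointers and a table of client callbacks. All
    names here (tree_handler, toy_build_tree, client_create_element,
    client_append_child) are hypothetical, and the two-callback table is far
    smaller than the set of handlers a real tree builder would require.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, much-reduced handler table supplied by the client. */
typedef struct {
    void *(*create_element)(void *ctx, const char *name);
    void (*append_child)(void *ctx, void *parent, void *child);
    void *ctx;
} tree_handler;

typedef enum { EV_START_TAG, EV_END_TAG } event_type;
typedef struct { event_type type; const char *name; } event;

#define MAX_DEPTH 16

/* Build a tree beneath root from a sequence of start/end tag events.
 * Nodes are opaque to the builder. Returns 0 on success, -1 on failure. */
static int toy_build_tree(const tree_handler *h, void *root,
        const event *evs, size_t n)
{
    void *stack[MAX_DEPTH];
    size_t depth = 0, i;

    assert(h != NULL && root != NULL);

    stack[depth++] = root;
    for (i = 0; i < n; i++) {
        if (evs[i].type == EV_START_TAG) {
            void *node;
            if (depth == MAX_DEPTH)
                return -1;
            node = h->create_element(h->ctx, evs[i].name);
            if (node == NULL)
                return -1;
            h->append_child(h->ctx, stack[depth - 1], node);
            stack[depth++] = node;
        } else if (depth > 1) {
            depth--;    /* end tag: pop the current element */
        }
    }
    return 0;
}

/* Client side: the tree representation is entirely the client's choice. */
typedef struct node {
    const char *name;
    struct node *first_child, *last_child, *next;
} node;

static void *client_create_element(void *ctx, const char *name)
{
    node *n = calloc(1, sizeof(*n));
    (void) ctx;
    if (n != NULL)
        n->name = name;
    return n;
}

static void client_append_child(void *ctx, void *parent, void *child)
{
    node *p = parent, *c = child;
    (void) ctx;
    if (p->last_child != NULL)
        p->last_child->next = c;
    else
        p->first_child = c;
    p->last_child = c;
}
```

    Because the builder never dereferences a node, the client could swap
    the linked-list representation for any other without touching
    toy_build_tree.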

Memory usage and ownership
--------------------------

  Memory usage within the library is well defined, as is ownership of allocated
  memory.

  Raw input data provided by the library client is owned by the client.

  SAX events which refer to document segments contain direct references to
  internal data. Token objects are transient, and the data within them is no
  longer valid once the event handler has returned control to the tokeniser.
  All data returned by a SAX event is owned by the library, so a client that
  needs it beyond the lifetime of the handler call must take its own copy.
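
  One way for a client to respect this lifetime rule is to copy token data
  inside the handler. The sketch below assumes a hypothetical token type
  holding a pointer and a length; token, saved_text and save_token are
  illustrative names, not part of Hubbub.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token: data points at library-owned memory that is only
 * valid for the duration of the handler call. */
typedef struct {
    const char *data;
    size_t len;
} token;

/* Client-owned, NUL-terminated accumulation buffer. */
typedef struct {
    char *text;
    size_t len;
} saved_text;

/* Event handler: appends a private copy of the token data to the
 * client's buffer. Returns 0 on success, -1 on memory exhaustion. */
static int save_token(const token *tok, void *pw)
{
    saved_text *out = pw;
    char *grown;

    assert(tok != NULL && tok->data != NULL && out != NULL);

    grown = realloc(out->text, out->len + tok->len + 1);
    if (grown == NULL)
        return -1;    /* out->text is left intact */

    memcpy(grown + out->len, tok->data, tok->len);
    out->len += tok->len;
    grown[out->len] = '\0';
    out->text = grown;
    return 0;
}
```

  The copy in saved_text belongs to the client, which is responsible for
  freeing it. It stays valid even after the library reuses or releases the
  memory the token pointed at.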

  The tree builder uses client callbacks to create the objects used within
  the tree. Tree objects may be reference counted; alternatively, the client
  may do nothing in the ref/unref callbacks and use garbage collection
  instead. The resultant tree is owned by the client.
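
  Both strategies can be sketched as pairs of callbacks. The signatures
  below are assumptions made for illustration (Hubbub's actual callback
  types may differ), and node, node_create and the nodes_alive counter are
  hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical client node carrying its own reference count. */
typedef struct {
    unsigned int refcnt;
} node;

static int nodes_alive;    /* bookkeeping for illustration only */

/* Create a node holding one reference, owned by the caller. */
static node *node_create(void)
{
    node *n = calloc(1, sizeof(*n));
    if (n != NULL) {
        n->refcnt = 1;
        nodes_alive++;
    }
    return n;
}

/* Reference-counting strategy: take a reference. */
static void ref_node(void *ctx, void *n)
{
    (void) ctx;
    ((node *) n)->refcnt++;
}

/* Reference-counting strategy: drop a reference, freeing on the last. */
static void unref_node(void *ctx, void *n)
{
    node *nd = n;
    (void) ctx;
    assert(nd->refcnt > 0);
    if (--nd->refcnt == 0) {
        free(nd);
        nodes_alive--;
    }
}

/* Garbage-collection strategy: both callbacks do nothing, and node
 * lifetime is left to the client's collector. */
static void ref_node_gc(void *ctx, void *n)
{
    (void) ctx;
    (void) n;
}

static void unref_node_gc(void *ctx, void *n)
{
    (void) ctx;
    (void) n;
}
```

  Either pair can be registered with the tree builder. The choice only
  affects how the client reclaims nodes, not how the tree is built.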

Parse errors
------------

  Parse errors are reported through a dedicated event. This event contains
  the line and column of the error location, along with a message detailing
  the error.
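
  As a rough illustration of what a client might do with such an event,
  the sketch below formats one for logging. The parse_error layout and the
  format_parse_error helper are hypothetical; only the three pieces of
  information (line, column, message) come from the description above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical payload of a parse error event. */
typedef struct {
    unsigned int line;
    unsigned int col;
    const char *message;
} parse_error;

/* Format the error as "line:col: message" into buf. Returns the value
 * returned by snprintf: the length the full string requires, which may
 * exceed size if the output was truncated. */
static int format_parse_error(const parse_error *err, char *buf, size_t size)
{
    assert(err != NULL && err->message != NULL && buf != NULL);
    return snprintf(buf, size, "%u:%u: %s",
            err->line, err->col, err->message);
}
```

  A handler for the error event could call format_parse_error and pass the
  result to whatever logging facility the client uses.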

Exceptional circumstances
-------------------------

  Exceptional circumstances (such as memory exhaustion) are reported
  immediately.

  The parser's state in such situations is undefined. There is no recovery
  mechanism.

  Therefore, if the client is able to recover from the exceptional
  circumstance (e.g. by making more free memory available), the only valid
  way to proceed is to create a new parser instance and start parsing from
  scratch. The client should destroy the old parser instance and any DOM
  tree produced by it, to avoid resource leaks.
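
  The destroy-and-restart procedure might look like the following sketch.
  It uses a hypothetical stand-in parser (toy_parser and its functions)
  rather than Hubbub's API, and simulates memory exhaustion with a flag so
  that the control flow can be followed end to end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef enum { PARSE_OK, PARSE_NOMEM } parse_result;

/* Hypothetical stand-in for a parser instance and the tree it builds. */
typedef struct {
    size_t consumed;
} toy_parser;

static int parsers_alive;      /* bookkeeping for illustration only */
static bool simulate_nomem;    /* when set, parsing reports exhaustion */

static toy_parser *toy_parser_create(void)
{
    toy_parser *p = calloc(1, sizeof(*p));
    if (p != NULL)
        parsers_alive++;
    return p;
}

/* A real client would also destroy any tree built by this instance. */
static void toy_parser_destroy(toy_parser *p)
{
    assert(p != NULL);
    free(p);
    parsers_alive--;
}

static parse_result toy_parser_parse(toy_parser *p,
        const char *data, size_t len)
{
    (void) data;
    if (simulate_nomem)
        return PARSE_NOMEM;    /* parser state is now undefined */
    p->consumed += len;
    return PARSE_OK;
}

/* Parse the whole input, restarting from scratch with a fresh parser
 * after each failure. free_memory is a client hook that returns true if
 * it recovered enough to make a retry worthwhile. Returns the parser
 * that succeeded, or NULL. */
static toy_parser *parse_with_restart(const char *data, size_t len,
        bool (*free_memory)(void), int max_attempts)
{
    int attempt;

    for (attempt = 0; attempt < max_attempts; attempt++) {
        toy_parser *p = toy_parser_create();
        if (p == NULL)
            return NULL;
        if (toy_parser_parse(p, data, len) == PARSE_OK)
            return p;
        /* Never reuse a failed instance: destroy it, then retry. */
        toy_parser_destroy(p);
        if (free_memory == NULL || !free_memory())
            return NULL;
    }
    return NULL;
}

/* Example recovery hook: pretends to release a cache. */
static bool release_cache(void)
{
    simulate_nomem = false;
    return true;
}
```

  Note that the retry feeds the input from its beginning. Nothing from the
  failed instance, including any partially built tree, is carried over.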