Lexical Tools

LVG Rules: Trie

The trie mechanism is used in Lexical Tools for inflectional rules and derivational (suffix) rules. A persistent trie (which uses the Java RandomAccessFile class) was used before lvg.2002. Because the persistent trie was relatively slow, RamTrie replaced it after Lvg.2003. The initial memory footprints of im.rul and dm.rul in Lexical Tools are about 2.0 MB and 3.0 MB, respectively. After Lvg.2014, dm.rul is generated automatically to include the optimal set of Sd-Rules, with associated exceptions, so that precision and recall (over all candidate Sd-Rules) are above 95%. This new approach increased the footprint of dm.rul to 4.3 MB. With this small footprint and a one-time initialization overhead, the trie performs relatively well. The design is detailed as follows:

  1. Introduction
  2. File Format
  3. Wild Card
  4. Trie Tree
  5. Persistent Trie
  6. Java Class Usage
  7. Reference Documents
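
As a rough illustration of why an in-memory trie suits suffix rules, the following minimal Java sketch stores each rule's suffix back to front and applies the longest matching rule to a word. It is not the Lexical Tools RamTrie class: the class name SuffixTrieSketch, its methods, and the sample rules are hypothetical, and the sketch does not read im.rul or dm.rul or handle wild cards or exceptions.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory suffix trie; not the Lexical Tools RamTrie class. */
final class SuffixTrieSketch {
    /** One trie node; children are keyed by the next character toward the start of the word. */
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        String replacement; // non-null only where a rule's suffix ends
    }

    private final Node root = new Node();

    /** Registers a rule: a word ending in inSuffix has that suffix replaced by outSuffix. */
    void addRule(String inSuffix, String outSuffix) {
        Node node = root;
        // Insert the suffix back to front so a lookup can walk from the end of a word.
        for (int i = inSuffix.length() - 1; i >= 0; i--) {
            node = node.children.computeIfAbsent(inSuffix.charAt(i), c -> new Node());
        }
        node.replacement = outSuffix;
    }

    /** Applies the rule with the longest matching suffix, or returns the word unchanged. */
    String apply(String word) {
        Node node = root;
        int matchStart = -1;
        String matchReplacement = null;
        for (int i = word.length() - 1; i >= 0; i--) {
            node = node.children.get(word.charAt(i));
            if (node == null) {
                break; // no stored suffix continues with this character
            }
            if (node.replacement != null) {
                // Remember the deepest (longest) rule seen so far.
                matchStart = i;
                matchReplacement = node.replacement;
            }
        }
        return matchStart < 0 ? word : word.substring(0, matchStart) + matchReplacement;
    }

    public static void main(String[] args) {
        SuffixTrieSketch trie = new SuffixTrieSketch();
        // Made-up sample rules for illustration only.
        trie.addRule("s", "");
        trie.addRule("ies", "y");
        System.out.println(trie.apply("ponies"));
        System.out.println(trie.apply("cats"));
        System.out.println(trie.apply("fish"));
    }
}
```

Each lookup touches at most one node per character of the word, and the rules are loaded into memory once, which is in line with the one-time initialization cost and small footprint described above.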