Lexical Tools

LVG Rules: Persistent Trie

I. Persistent Data

A persistent data system is to store data in files instead of RAM. In addition, these data can be randomly accessed by address from files. This function can be performed by a Java class, RandomAccessFile( ). The key issue of using this technique is to define the format (protocol) of how the data is sorted and retrieved. Usually, a header is needed to put at the front of each segment of data to describe the Meta data of storing data.

II. Persistent List

A persistent link class is used to perform one dimension linked list through persistent files.

  • File Data format:
    header + node1 + node2 + ...
    where header is:
    • 0~3 : int : numbers of nodes
    • 4~11: long: next address (address will be used for next node)

    node:
    • 0~x: specific data (length x varies)
    • x+1~x+8: long: address for next node
  • One persistent file is capable of storing multiple linked lists.
  • All nodes belong to one linked list need to be added (stored) into persistent file at once (before storing next list) if multiple lists is stored in the same file.
  • The node must be the child class of PersistentListNode( ).
  • The data structure in a list node includes address for this node, address for next node, and other node specific data.
  • Only same type of node can be stored for one linked list.

III. Persistent Tree

A persistent tree class is used to perform tree operations (two dimensional) through persistent files.

  • File Data format:
    node1 + node2 + ...
    where node:
    • 0~3: int: level
    • 4~11: long: address for next node
    • 12~19: long: address for parent node
    • 20~27: long: address for child list
    • 27+: specific data (length varies)
  • One persistent file is capable of storing multiple trees. However, we use one file to store one tree in our trie application for simplification.
  • The node must be the child class of PersistentTreeNode( ).
  • The data structure in a tree node includes address for this node, address for next node, address for the child, address for the parent, level of the node, and other node specific data.
  • Only same type of node can be stored for one linked list.

IV. Persistent Trie A persistent trie is developed to get inflections and derivations in LVG.

    < pre-process >
  • It reads in all rules from flat files and then store trie, rules, and exceptions into persistent files.
  • It uses PersistentTree to store trie tree in the persistent file.
  • It uses PersistentList to store rules and exceptions in the persistent file.

< process >

  • It opens persistent files.
  • It retrieves data from persistent files.

< post-process >

  • It closes persistent files.

V. Notes

The persistent trie uses Java RandomAccessFile class, which is not buffered and thus is slow. Buffered RandamAccessFile will improve the performance. In a study in 2002 (for 2003 release), replace PersistentTrie with RamTrie (put all infromation in the memory) improve performance more than 50% without too much heap size increased. Thus. PersistentTrie is not used after Lexical Tool 2013 release (use RamTire instead).