LVG Rules: Persistent Trie
I. Persistent Data
A persistent data system is to store data in files instead of RAM. In addition, these data can be randomly accessed by address from files. This function can be performed by a Java class, RandomAccessFile( ). The key issue of using this technique is to define the format (protocol) of how the data is sorted and retrieved. Usually, a header is needed to put at the front of each segment of data to describe the Meta data of storing data.
II. Persistent List
A persistent link class is used to perform one dimension linked list through persistent files.
- File Data format:
header + node1 + node2 + ...
where header is:
- 0~3 : int : numbers of nodes
- 4~11: long: next address (address will be used for next node)
node:
- 0~x: specific data (length x varies)
- x+1~x+8: long: address for next node
- One persistent file is capable of storing multiple linked lists.
- All nodes belong to one linked list need to be added (stored) into persistent file at once (before storing next list) if multiple lists is stored in the same file.
- The node must be the child class of PersistentListNode( ).
- The data structure in a list node includes address for this node, address for next node, and other node specific data.
- Only same type of node can be stored for one linked list.
III. Persistent Tree
A persistent tree class is used to perform tree operations (two dimensional) through persistent files.
- File Data format:
node1 + node2 + ...
where node:
- 0~3: int: level
- 4~11: long: address for next node
- 12~19: long: address for parent node
- 20~27: long: address for child list
- 27+: specific data (length varies)
- One persistent file is capable of storing multiple trees. However, we use one file to store one tree in our trie application for simplification.
- The node must be the child class of PersistentTreeNode( ).
- The data structure in a tree node includes address for this node, address for next node, address for the child, address for the parent, level of the node, and other node specific data.
- Only same type of node can be stored for one linked list.
IV. Persistent Trie
A persistent trie is developed to get inflections and derivations in LVG.
< pre-process >
- It reads in all rules from flat files and then store trie, rules, and exceptions into persistent files.
- It uses PersistentTree to store trie tree in the persistent file.
- It uses PersistentList to store rules and exceptions in the persistent file.
< process >
- It opens persistent files.
- It retrieves data from persistent files.
< post-process >
- It closes persistent files.
V. Notes
The persistent trie uses Java RandomAccessFile class, which is not buffered and thus is slow. Buffered RandamAccessFile will improve the performance. In a study in 2002 (for 2003 release), replace PersistentTrie with RamTrie (put all infromation in the memory) improve performance more than 50% without too much heap size increased. Thus. PersistentTrie is not used after Lexical Tool 2013 release (use RamTire instead).