Lexical Tools

LVG Rules: Persistent Trie

I. Persistent Data

A persistent data system is to store data in files instead of RAM. In addition, these data can be randomly accessed by address from files. This function can be performed by a Java class, RandomAccessFile( ). The key issue of using this technique is to define the format (protocol) of how the data is sorted and retrieved. Usually, a header is needed to put at the front of each segment of data to describe the Meta data of storing data.

II. Persistent List

A persistent link class is used to perform one dimension linked list through persistent files.

File Data format:
header + node1 + node2 + ...
where header is:
- 0~3 : int : numbers of nodes
- 4~11: long: next address (address will be used for next node)
node:
- 0~x: specific data (length x varies)
- x+1~x+8: long: address for next node
One persistent file is capable of storing multiple linked lists.
All nodes belong to one linked list need to be added (stored) into persistent file at once (before storing next list) if multiple lists is stored in the same file.
The node must be the child class of PersistentListNode( ).
The data structure in a list node includes address for this node, address for next node, and other node specific data.
Only same type of node can be stored for one linked list.

III. Persistent Tree

A persistent tree class is used to perform tree operations (two dimensional) through persistent files.

File Data format:
node1 + node2 + ...
where node:
- 0~3: int: level
- 4~11: long: address for next node
- 12~19: long: address for parent node
- 20~27: long: address for child list
- 27+: specific data (length varies)
One persistent file is capable of storing multiple trees. However, we use one file to store one tree in our trie application for simplification.
The node must be the child class of PersistentTreeNode( ).
The data structure in a tree node includes address for this node, address for next node, address for the child, address for the parent, level of the node, and other node specific data.
Only same type of node can be stored for one linked list.

IV. Persistent Trie A persistent trie is developed to get inflections and derivations in LVG.

pre-process >

It reads in all rules from flat files and then store trie, rules, and exceptions into persistent files.
It uses PersistentTree to store trie tree in the persistent file.
It uses PersistentList to store rules and exceptions in the persistent file.

< process >

It opens persistent files.
It retrieves data from persistent files.

< post-process >

It closes persistent files.

V. Notes

The persistent trie uses Java RandomAccessFile class, which is not buffered and thus is slow. Buffered RandamAccessFile will improve the performance. In a study in 2002 (for 2003 release), replace PersistentTrie with RamTrie (put all infromation in the memory) improve performance more than 50% without too much heap size increased. Thus. PersistentTrie is not used after Lexical Tool 2013 release (use RamTire instead).