Lexical Tools

LVG Rules: Trie Introduction

I. Background

A trie algorithm is used in LVG to find inflection variants and derivation. Rules are:

  • used when LVG does not find the result from fact (database table)
  • used as an independent program to find the results
  • driven by rules and exceptions specified in files
  • implements as trie tree (refer to Data Structure and Algorithms, by Aho, Hopcroft, and Ullman, PP.163-169)

II. Procedures

The high level algorithm flows are described as follows:

  1. Establish the trie tree by reading data (tree structure, rules, and exceptions) from files.
  2. Read in input term.
  3. Traverse the trie tree and find all match nodes.
  4. Check if the input term belongs to exceptions of the applied rules.
  5. Apply rules in match nodes to the input term.

III. Trie and Tree

Tree:
Tree has been used as powerful search algorithm. It includes binary tree and B-tree. A binary tree (searching) stores all data into a tree structure with all nodes have 2 branches at most. The B-tree was invented by Bayer and McCreight in the early 70's. It's now part of every database system. A node in a B-tree may have multiple branches.

Trie:
Trie is derived from the middle letters of the word "retrieval.

Difference:
In real applications, tree stores a whole word as a node while trie break a word into characters and store each single character into the tree structure.

< Example >
Data: be, bed, beset, bess, best, bet, better.

Binary Tree:


root (bess)
|
+--- Node (bed)
| |
| +--- Node (be)
| |
| +--- Node (beset)
|
+--- Node (bet)
|
+--- Node (best)
|
+--- Node (better)

Trie:


root (b)
|
+--- Node (e *)
|
+--- Node (d *)
|
+--- Node (s)
| |
| +--- Node (e)
| | |
| | +--- Node (t *)
| |
| +--- Node (s *)
| |
| +--- Node (t *)
|
+--- Node (t *)
|
+--- Node (t)
|
+--- Node (e)
|
+--- Node (r *)