SPT - Terms Trie Tree Design
I. Introduction
The next two sections are used to find all multi-words/terms in a given term match terms from a list of terms. A retrieve algorithm (Trie) is applied for this purpose.
II. Trie Algorithm
- Trie Node:
- Trie nodes are used to compose the Trie tree.
- Each node has a key, which is a (single) word
- Each node has a level, the top (root) level is 0
- Each node has a child, which is a Vector<TrieNode>
- Use "$_ROOT" for the word of root node
- Use "$_END" for the word of end node
- Trie Tree:
- Each branch (path) starts from root node to the end node is a term
- All terms are stored in the branch of the Trie tree
- Tokenize input term into words
- Go through each word:
- curNode starts from the top (root) node
- Init curWordNode by curWord
- Check if Childs contains curWordNode
- if yes, update curNode to the matched node in Childs
- If no, add curWordNode to Childs and update curNode to curWordNode
III. Java Classes
- TrieNode.java: a Java object for TrieNode
- TrieTree.java: a Java class for TrieTree
IV. Examples
- Synonym Rules:
word | synonym
|
---|
dog | canine
|
cat | feline
|
canine | K9
|
K9 | bull dog
|
Dog and cat | pets
|
puppy and kitty | pets
|
- Synonym Terms:
Terms
|
---|
dog
|
canine
|
cat
|
feline
|
k9
|
bull dog
|
dog and cat
|
pets
|
puppy and kitty
|
- Trie Tree (from Synonym Terms):