Lexical Tools

SPT - Terms Trie Tree Design

I. Introduction
The next two sections are used to find all multi-words/terms in a given term match terms from a list of terms. A retrieve algorithm (Trie) is applied for this purpose.

II. Trie Algorithm

  • Trie Node:
    • Trie nodes are used to compose the Trie tree.
    • Each node has a key, which is a (single) word
    • Each node has a level, the top (root) level is 0
    • Each node has a child, which is a Vector<TrieNode>

    • Use "$_ROOT" for the word of root node
    • Use "$_END" for the word of end node

  • Trie Tree:
    • Each branch (path) starts from root node to the end node is a term
    • All terms are stored in the branch of the Trie tree

    • Tokenize input term into words
    • Go through each word:
      • curNode starts from the top (root) node
      • Init curWordNode by curWord
      • Check if Childs contains curWordNode
        • if yes, update curNode to the matched node in Childs
        • If no, add curWordNode to Childs and update curNode to curWordNode

III. Java Classes

  • TrieNode.java: a Java object for TrieNode
  • TrieTree.java: a Java class for TrieTree

IV. Examples

  • Synonym Rules:

    wordsynonym
    dogcanine
    catfeline
    canineK9
    K9bull dog
    Dog and catpets
    puppy and kittypets

  • Synonym Terms:

    Terms
    dog
    canine
    cat
    feline
    k9
    bull dog
    dog and cat
    pets
    puppy and kitty

  • Trie Tree (from Synonym Terms):