Sub-Term Mapping Tools

UMLS-Core: Sub-Term

  • Descriptions:
    • Finds all terms in a Trie Tree (of synonyms) that match within the input term
    • Each matched term (a sub-term of the input term) is returned with its starting and ending word indexes in the input term

  • Examples - Test Cases:

    • Terms in corpus:

      Terms
      dog
      canine
      cat
      feline
      k9
      bull dog
      dog and cat
      pets
      puppy and kitty

    • Input Term:
      Who let dogs and CAT out
      • SynonymMapNorm: who let dog and cat out
      • Go through terms from "who let dog and cat out"
        i | curTerm                 | branchMatches    | matchTerms
        0 | who let dog and cat out |                  |
        1 | let dog and cat out     |                  |
        2 | dog and cat out         | dog; dog and cat | dog; dog and cat
        3 | and cat out             |                  | dog; dog and cat
        4 | cat out                 | cat              | dog; dog and cat; cat
        5 | out                     |                  | dog; dog and cat; cat

    • Outputs:

      Returns matched term | start index | end index:

      • dog|2|3
      • dog and cat|2|5
      • cat|4|5
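
Reading the outputs above, the end index appears to be exclusive: "dog" starts at word 2 and ends at 3. The short Python sketch below checks that interpretation against the normalized input; `parse_match` is a hypothetical helper introduced only for this illustration, not part of the tool.

```python
# Normalized input term from the example, split into words.
words = "who let dog and cat out".split()

# Output lines copied from the example above.
outputs = ["dog|2|3", "dog and cat|2|5", "cat|4|5"]

def parse_match(line):
    """Split 'term|start|end' into (term, start, end); hypothetical helper."""
    term, start, end = line.rsplit("|", 2)
    return term, int(start), int(end)

for line in outputs:
    term, start, end = parse_match(line)
    # With an exclusive end index, words[start:end] is exactly the matched term.
    assert " ".join(words[start:end]) == term
    print(f"{term!r} covers words[{start}:{end}]")
```

Under this reading, a caller can recover the matched span from the tokenized input with a plain slice.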

    • Trie Tree
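
As a rough illustration of the Trie Tree for the corpus terms above, the following Python sketch builds a word-level trie from nested dictionaries and prints it as an indented outline. The nested-dict layout and the names `build_trie` and `render` are assumptions made for this sketch; only the "$_END" marker comes from the algorithm description, and the actual tool may store its nodes differently.

```python
END = "$_END"  # marks "a term ends at this node" (END node in the algorithm)

def build_trie(terms):
    """Build a word-level trie as nested dicts; every term's path ends with END."""
    root = {}
    for term in terms:
        node = root
        for word in term.split():
            node = node.setdefault(word, {})
        node[END] = {}
    return root

def render(node, depth=0):
    """Return the trie as indented lines, children in sorted order."""
    lines = []
    for word in sorted(node):
        lines.append("  " * depth + word)
        lines.extend(render(node[word], depth + 1))
    return lines

# Terms in corpus, copied from the example.
CORPUS = ["dog", "canine", "cat", "feline", "k9", "bull dog",
          "dog and cat", "pets", "puppy and kitty"]

trie = build_trie(CORPUS)
print("\n".join(render(trie)))
```

In this layout "dog" and "dog and cat" share the "dog" node: that node has an END child (because "dog" is a term) and an "and" child leading on to "cat".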

  • Algorithm:
    • Init Vector<String> matchTerms
    • SynonymMapNorm the input term to newInTerm
    • Get inWords by tokenizing newInTerm
    • Go through inWords, one startIndex at a time
      • Get curTerm as the words of inWords from startIndex to the end
      • Find branchMatches
        • Normalize the input term:
          • SynonymMapNorm
          • Add " $_END" (the END node)
        • Tokenize normalized term into inWords as a Vector<String>
        • Set the curNode to ROOT node
        • Init Vector branchMatches
        • Go through the inWords
          • Initialize curWordNode from curWord
          • Get curChilds from curNode
          • Check if curChilds has the END node
            • Yes => add the branch term to branchMatches
          • Check if curChilds contains curWordNode
            • Yes => update curNode to the matching child
            • No => no further match; break
      • Add branchMatches to matchTerms
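
The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the tool's implementation: the trie is a nested dict, `normalize` is a hypothetical stand-in for SynonymMapNorm (it only lowercases and applies a tiny hand-written word map), and the function names are invented for illustration.

```python
END = "$_END"  # END node marker from the algorithm

# Hypothetical stand-in for SynonymMapNorm: the real normalization is not
# specified here, so this only lowercases and maps one known word form.
WORD_MAP = {"dogs": "dog"}

def normalize(term):
    return " ".join(WORD_MAP.get(w, w) for w in term.lower().split())

def build_trie(terms):
    """Word-level trie as nested dicts; each term's path ends with END."""
    root = {}
    for term in terms:
        node = root
        for word in term.split():
            node = node.setdefault(word, {})
        node[END] = {}
    return root

def find_branch_matches(root, words):
    """Return every trie term that is a leading word sequence of `words`."""
    matches, node, path = [], root, []
    for word in words + [END]:          # appended END lets the last word be checked
        if END in node:                 # a term ends at the current node
            matches.append(" ".join(path))
        if word not in node:            # no child for this word: stop the branch
            break
        node = node[word]
        path.append(word)
    return matches

def find_sub_terms(root, in_term):
    """Return (term, start, end) for every trie term found inside in_term."""
    in_words = normalize(in_term).split()
    match_terms = []
    for start in range(len(in_words)):
        for term in find_branch_matches(root, in_words[start:]):
            match_terms.append((term, start, start + len(term.split())))
    return match_terms

trie = build_trie(["dog", "canine", "cat", "feline", "k9", "bull dog",
                   "dog and cat", "pets", "puppy and kitty"])
for term, start, end in find_sub_terms(trie, "Who let dogs and CAT out"):
    print(f"{term}|{start}|{end}")
```

For the example input this prints `dog|2|3`, `dog and cat|2|5`, and `cat|4|5`, in that order. Checking for the END node before looking up the current word is what lets a single walk from one start index collect both "dog" and the longer "dog and cat".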