Sub-Term Mapping Tools

SMT: Synonym Files

  • Descriptions:

    This page describes how to convert your own synonym file to an SMT standard normalized synonym file and then load it into the SMT corpus tree.

  • Standard normalized synonyms file
    • A Java class, SynonymsTable, in the STMT package can be used to:
      • read in synonym file(s) in different formats
      • store normTerm|synonym in a Hashtable
      • convert to standard format normTerm|synonym
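
As a rough illustration of the storage step, the sketch below loads lines in the standard normTerm|synonym format into a java.util.Hashtable keyed by normTerm. It is not the SynonymsTable source: the class name SynonymLookupSketch, the method load, and the use of a list of synonyms as the value type are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Hashtable;
import java.util.List;

/** Hypothetical sketch; not the SynonymsTable class from the STMT package. */
final class SynonymLookupSketch {

    /** Loads "normTerm|synonym" lines into a table keyed by normTerm. */
    static Hashtable<String, List<String>> load(List<String> lines) {
        Hashtable<String, List<String>> table = new Hashtable<>();
        for (String line : lines) {
            int bar = line.indexOf('|');
            // Skip lines without a key or without a synonym (assumed policy).
            if (bar <= 0 || bar == line.length() - 1) {
                continue;
            }
            String normTerm = line.substring(0, bar);
            String synonym = line.substring(bar + 1);
            table.computeIfAbsent(normTerm, k -> new ArrayList<>()).add(synonym);
        }
        return table;
    }

    public static void main(String[] args) {
        Hashtable<String, List<String>> table =
            load(Arrays.asList("af|ATRIAL FIBRILLATION", "af|A FIB", "a fib|AF"));
        System.out.println(table.get("af")); // [ATRIAL FIBRILLATION, A FIB]
    }
}
```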

    • normalize key:

      Variations                 Standardization                              Lvg flow  Notes and Examples
      -------------------------  -------------------------------------------  --------  ------------------
      genitive                   remove genitive                              -f:g      • down's|mongoloid
                                                                                        • bristowe's|split brain
                                                                                        • cowper's|bulbourethral
      parenthetic plural form    remove (s), (es), (ies)                      -f:rs     This is needed to standardize the term (not the synonym)
      spelling variants          get citation forms of all words in the term  -f:Ct
      inflectional variants
      punctuation                replace with space                           -f:o      • trim multiple spaces
                                                                                        • % is removed
      Upper, lower, mixed cases  lower case                                   -f:l
      stopWord                   Not implemented                              -f:t
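
The rules in the table above can be approximated in plain Java. The sketch below is not the Lvg implementation and does not call Lvg: the class KeyNormalizer and its regular expressions are assumptions for illustration, and the citation-form step (-f:Ct) is omitted because it needs the lexicon. It covers only the genitive, parenthetic plural, punctuation, and lower-case steps.

```java
import java.util.Locale;

/** Hypothetical helper that approximates the key normalization without Lvg. */
final class KeyNormalizer {

    static String normalizeKey(String term) {
        String s = term;
        // Genitive: drop a possessive 's (other genitive forms are not handled here).
        s = s.replaceAll("(?i)'s\\b", "");
        // Parenthetic plural forms: drop a literal (s), (es), or (ies).
        s = s.replaceAll("\\((?:s|es|ies)\\)", "");
        // Punctuation: replace each punctuation character, including %, with a space.
        s = s.replaceAll("\\p{Punct}", " ");
        // Trim leading, trailing, and repeated spaces.
        s = s.trim().replaceAll("\\s+", " ");
        // Upper, lower, mixed cases: lower case.
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(normalizeKey("Down's"));                  // down
        System.out.println(normalizeKey("finger(s)"));               // finger
        System.out.println(normalizeKey("Non-Hodgkin's   lymphoma")); // non hodgkin lymphoma
    }
}
```

Because the citation-form step is skipped, keys produced by this sketch can differ from the keys produced by the Lvg flows for terms with spelling or inflectional variants.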

    • User's input synonym file to SMT standard normalized synonym file

      Features                Descriptions                              User's Input                    SMT Standard Normalized
      ----------------------  ----------------------------------------  ------------------------------  -----------------------------
      # for comments          A line that starts with # is a comment    # This is a comment
      Duplicates              All duplicated synonym pairs are removed  • TB|TUBERCULOSIS               • tb|TUBERCULOSIS
                                                                        • TB|TUBERCULOSIS               • tuberculosis|TB
      Redundancy              Remove the synonym pair if the synonym    • decal|decal
                              terms are the same
      Norm Redundancy         Remove the synonym pair if the normed     • Fecal|fecal
                              synonym terms are the same. This is
                              optional because no new CUI will be
                              found for such a substitution
      Bi-direction            Generate symmetric synonym pairs          • AB|ANTIBODY                   • ab|ANTIBODY
                                                                                                        • antibody|AB
      Multiple synonym pairs  Convert to all possible double pairs      • AF|ATRIAL FIBRILLATION|A FIB  • af|ATRIAL FIBRILLATION
                                                                                                        • af|A FIB
                                                                                                        • atrial fibrillation|A FIB
                                                                                                        • atrial fibrillation|AF
                                                                                                        • a fib|AF
                                                                                                        • a fib|ATRIAL FIBRILLATION
      Recursive synonyms      Not implemented (manually add them)       • abut|adjoin                   • adjoin|to touch
                                                                        • abut|to touch                 • ...
                                                                        • ...                           • chlorosis|chlorotic anaemia
                                                                        • anaemia|anemia                • ...
                                                                        • chlorosis|chlorotic anemia
                                                                        • ...
      suffixes                Not implemented (manually remove them)    • Knife|-tome
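
The implemented features listed above (comments, duplicates, redundancy, norm redundancy, bi-direction, and multiple synonym pairs) can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not the SynonymsTable code: the class SynonymPairSketch and its methods are hypothetical, and normKey stands in for the real key normalization by lower-casing and collapsing spaces only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of the user-input to normTerm|synonym conversion. */
final class SynonymPairSketch {

    // Stand-in for key normalization (assumption): lower case, single spaces.
    static String normKey(String term) {
        return term.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    static List<String> convert(List<String> inputLines) {
        // A set removes duplicated pairs; LinkedHashSet keeps first-seen order.
        Set<String> pairs = new LinkedHashSet<>();
        for (String line : inputLines) {
            // Skip blank lines and comment lines that start with #.
            if (line.trim().isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] terms = line.split("\\|");
            // Every ordered pair of distinct terms covers both the
            // bi-direction rule and lines with more than two synonyms.
            for (String rawKey : terms) {
                for (String rawSynonym : terms) {
                    String key = rawKey.trim();
                    String synonym = rawSynonym.trim();
                    if (key.isEmpty() || synonym.isEmpty()) {
                        continue;
                    }
                    // Redundancy: identical terms (decal|decal).
                    if (key.equals(synonym)) {
                        continue;
                    }
                    // Norm redundancy (optional in SMT): Fecal|fecal.
                    if (normKey(key).equals(normKey(synonym))) {
                        continue;
                    }
                    pairs.add(normKey(key) + "|" + synonym);
                }
            }
        }
        return new ArrayList<>(pairs);
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
            "# This is a comment",
            "TB|TUBERCULOSIS",
            "TB|TUBERCULOSIS",
            "decal|decal",
            "Fecal|fecal");
        System.out.println(convert(input)); // [tb|TUBERCULOSIS, tuberculosis|TB]
    }
}
```

Recursive synonyms and suffixes are left out of the sketch on purpose, since both are listed as not implemented and need manual editing of the input file.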
       

    • Java class to use

  • Load synonyms file to SMT corpus tree

    In the SMT configuration file (${STMT}/data/Config/stm.properties), set SYNONYM_FILE to the standard normalized synonyms file generated above.
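
To check that the setting is picked up, the value can be read back with java.util.Properties. The sketch below assumes that stm.properties uses the usual Java key=value properties syntax (suggested by the file extension, not confirmed here). The class SynonymFileSetting and the file name normSynonyms.data are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical check that SYNONYM_FILE is present in a properties text. */
final class SynonymFileSetting {

    /** Returns the SYNONYM_FILE value, or null when the key is missing. */
    static String synonymFile(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("SYNONYM_FILE");
    }

    public static void main(String[] args) {
        // Hypothetical excerpt; the real file holds other settings as well.
        String excerpt = "# synonyms\nSYNONYM_FILE=normSynonyms.data\n";
        System.out.println(synonymFile(excerpt)); // normSynonyms.data
    }
}
```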