Sub-Term Mapping Tools

UMLS-Core: Synonym Files

  • Descriptions:
    • A Java class, GenDefaultSynonymsFile, is implemented in the STMT package to read synonym files (in different formats) and store them in the normTerm-synonym table.
    • There are 8 files (from different sources) in different formats that need to be standardized first:

      File                              Description                                              Notes
      --------------------------------  -------------------------------------------------------  ---------------------------------------------------------
      acronym_edited.txt                acronyms and abbreviations                               acronym, upper case, punctuation
      british.txt                       British English                                          spelling variants, lower case, punctuation
      ecri.txt                          ECRI (Emergency Care Research Institute) Medical Device  mixed case, multiple synonym pairs
      greco-latin.txt                   Greco-Latin                                              lower case, bi-directional
      lvg.txt                           Lexical Tools                                            mixed case, genitive, punctuation, bi-directional
      ramiller-prune_kwf.txt            Ramiller (after human review)                            mixed case, genitive, punctuation, multiple synonym pairs
      To_add_UMLS_syn_consolidated.txt  To be added to UMLS                                      mixed case, punctuation, multiple synonym pairs
      UMLS_syn_consolidated.txt         UMLS                                                     lower case, punctuation
    • All files have different formats and need to be standardized.
    • Synonym variations (variation: standardization; Lvg flow, if any; then notes and examples):

      • genitive: remove the genitive; Lvg flow -f:g
        • down's|mongoloid
        • bristowe's|split brain
        • cowper's|bulbourethral
      • parenthetic plural form: remove (s), (es), (ies); Lvg flow -f:rs
        • This is needed to standardize the term (not the synonym).
      • punctuation: replace with a space; Lvg flow -f:o
        • Multiple spaces are trimmed.
        • % is removed.
      • upper, lower, and mixed case: convert to lower case; Lvg flow -f:l
      • spelling variants: use the citation form instead; Lvg flow -f:Ct
      • inflectional variants: use the citation form instead; Lvg flow -f:Ct
      • stop words: do nothing
      • suffixes: remove them manually
        • Knife|-tome
      • recursive synonyms: not implemented (pairs derived through a shared term are not generated)
        • abut|adjoin
        • abut|to touch
          => adjoin|to touch
        • anaemia|anemia
        • chlorosis|chlorotic anemia
          => chlorosis|chlorotic anaemia
      • bi-direction: all synonym pairs are bi-directional
        • TB|TUBERCULOSIS
          => TUBERCULOSIS|TB
      • multiple synonym pairs: convert to all possible ordered two-term pairs
        • AF|ATRIAL FIBRILLATION|A FIB
          => AF|ATRIAL FIBRILLATION
          => AF|A FIB
          => ATRIAL FIBRILLATION|A FIB
          => ATRIAL FIBRILLATION|AF
          => A FIB|AF
          => A FIB|ATRIAL FIBRILLATION
      • redundancy: remove the pair if its normalized synonyms are the same. This removes redundant synonyms that are already handled by Norm, such as spelling variants, inflectional variants, etc.
        • fecal|fecal
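
The standardizations above are performed with Lvg flows. As a rough illustration only, the hypothetical Java class below (SimpleKeyNormalizer is not part of STMT or Lvg) approximates the lower-case (-f:l), genitive (-f:g), parenthetic-plural (-f:rs), and punctuation (-f:o) steps with regular expressions. The citation-form step (-f:Ct) needs the SPECIALIST Lexicon and is deliberately left out.

```java
import java.util.Locale;

// Hypothetical approximation of the Lvg flows listed above; not STMT or Lvg code.
// The citation-form flow (-f:Ct) needs the SPECIALIST Lexicon and is omitted.
final class SimpleKeyNormalizer {

    static String normalize(String term) {
        String s = term.toLowerCase(Locale.ROOT);       // case: roughly -f:l
        s = s.replaceAll("'s\\b", "");                  // genitive 's: roughly -f:g
        s = s.replaceAll("\\((?:s|es|ies)\\)", "");     // (s), (es), (ies): roughly -f:rs
        s = s.replaceAll("\\p{Punct}", " ");            // punctuation (incl. %) -> space: roughly -f:o
        return s.replaceAll("\\s+", " ").trim();        // trim multiple spaces
    }

    public static void main(String[] args) {
        System.out.println(normalize("Down's"));                  // down
        System.out.println(normalize("Finger(s)"));               // finger
        System.out.println(normalize("50%  Dextrose, Solution")); // 50 dextrose solution
    }
}
```

The genitive and parenthetic-plural patterns run before punctuation is replaced, because they rely on the apostrophe and parentheses that the punctuation step would otherwise turn into spaces. Real Lvg flows cover many cases these patterns do not.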

  • Examples - Test Cases:

    Skip

  • Algorithm:
    • Manually review all 8 files and remove suffix synonyms.
    • Read in all 8 files, one by one.
    • Decompose multiple synonym pairs:
      • Input:
        • AF|ATRIAL FIBRILLATION|A FIB
      • Outputs:
        • AF|ATRIAL FIBRILLATION
        • AF|A FIB
        • ATRIAL FIBRILLATION|A FIB
    • Ignore comments (lines starting with #).
    • Normalize the key (normTerm) of each synonym pair.
    • Generate symmetric synonym pairs:
      • Input:
        • AB|ANTIBODY
      • Outputs:
        • ab|ANTIBODY
        • antibody|AB
    • Remove duplicates:
      • Remove duplicated synonyms from different files.
      • Remove a pair if both of its synonyms normalize to the same string; such a pair cannot yield a new CUI because both terms have the same key (normTerm) in CuiMapping.
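
A minimal sketch of the algorithm steps above, assuming each synonym file has already been read into a list of lines in the a|b|c format. The class SynonymPairBuilder and its methods are hypothetical (this is not the GenDefaultSynonymsFile source), and normalize() is only a placeholder for the Lvg-based key normalization. Each output entry is a normKey|synonym string: the key is normalized and the synonym is kept as written, as in the AB|ANTIBODY example.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the steps above; not the STMT GenDefaultSynonymsFile source.
// Output entries have the form "normKey|synonym".
final class SynonymPairBuilder {

    // Placeholder for the Lvg-based key normalization (lower case, punctuation -> space).
    static String normalize(String term) {
        return term.toLowerCase(Locale.ROOT)
                .replaceAll("\\p{Punct}", " ")
                .replaceAll("\\s+", " ")
                .trim();
    }

    // Adds the symmetric pairs derived from one input line to the shared output set.
    static void addLine(String line, Set<String> out) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return;                                   // ignore blank lines and comments
        }
        String[] terms = trimmed.split("\\|");
        // Decompose a|b|c into the unordered pairs a|b, a|c, b|c.
        for (int i = 0; i < terms.length; i++) {
            for (int j = i + 1; j < terms.length; j++) {
                addSymmetric(terms[i].trim(), terms[j].trim(), out);
            }
        }
    }

    // Emits the pair in both directions unless both terms normalize identically.
    static void addSymmetric(String a, String b, Set<String> out) {
        String normA = normalize(a);
        String normB = normalize(b);
        if (normA.isEmpty() || normB.isEmpty() || normA.equals(normB)) {
            return;                                   // redundant: same normTerm, no new CUI
        }
        out.add(normA + "|" + b);                     // the Set drops duplicates across files
        out.add(normB + "|" + a);
    }

    static List<String> build(List<List<String>> files) {
        Set<String> out = new LinkedHashSet<>();      // keeps first-seen order
        for (List<String> lines : files) {
            for (String line : lines) {
                addLine(line, out);
            }
        }
        return new ArrayList<>(out);
    }

    public static void main(String[] args) {
        List<String> fileA = List.of("# comment", "AF|ATRIAL FIBRILLATION|A FIB", "AB|ANTIBODY");
        List<String> fileB = List.of("AB|ANTIBODY", "Fecal|fecal");
        // Prints 8 entries; AB|ANTIBODY from fileB and Fecal|fecal are dropped.
        build(List.of(fileA, fileB)).forEach(System.out::println);
    }
}
```

A LinkedHashSet removes duplicates coming from different files while keeping the first-seen order. A pair whose two terms normalize to the same string is dropped before it is added, since both terms share one normTerm and cannot lead to a new CUI.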