Sub-Term Mapping Tools

UMLS-Core: Synonym Files

  • Descriptions:
    • A Java class, GenDefaultSynonymsFile, is implemented in the STMT package to read in the synonym files (which have different formats) and store the synonyms in the normTerm-synonym table
    • There are 8 files (from different sources) with different formats; they need to be standardized first:

      File                              Description                                              Notes
      acronym_edited.txt                acronyms and abbreviations                               acronym|upper case|punctuation
      british.txt                       British English                                          spelling variants|lower case|punctuation
      ecri.txt                          ECRI (Emergency Care Research Institute) Medical Device  mixed case|multiple synonym pairs
      greco-latin.txt                   Greco-Latin                                              lower case|bi-direction
      lvg.txt                           Lexical Tools                                            mixed case|genitive|punctuation|bi-direction
      ramiller-prune_kwf.txt            Ramiller (after human review)                            mixed case|genitive|punctuation|multiple synonym pairs
      To_add_UMLS_syn_consolidated.txt  To be added to UMLS                                      mixed case|punctuation|multiple synonym pairs
      UMLS_syn_consolidated.txt         UMLS                                                     lower case|punctuation
    • All files have different formats and need to be standardized
    • Synonym variations:

      Variations                 Standardization                                 Lvg flow  Notes and Examples
      genitive                   remove genitive                                 -f:g
      • down's|mongoloid
      • bristowe's|split brain
      • cowper's|bulbourethral
      parenthetic plural form    remove (s), (es), (ies)                         -f:rs     This is needed to standardize the term (not the synonym)
      punctuation                replace with space                              -f:o
      • trim multiple spaces
      • % is removed
      Upper, lower, mixed cases  lower case                                      -f:l
      spelling variants          use citation form instead                       -f:Ct
      inflectional variants      use citation form instead                       -f:Ct
      stopWord                   do nothing
      suffixes                   manually remove them
      • Knife|-tome
      recursive synonyms         not implemented
      • abut|adjoin
      • abut|to touch
        =>adjoin|to touch
      • anaemia|anemia
      • chlorosis|chlorotic anemia
        =>chlorosis|chlorotic anaemia
      bi-direction               all synonym pairs are bi-directional
      • TB|TUBERCULOSIS
        =>TUBERCULOSIS|TB
      multiple synonym pairs     convert to all possible double pairs
      • AF|ATRIAL FIBRILLATION|A FIB
        =>AF|ATRIAL FIBRILLATION
        =>AF|A FIB
        =>ATRIAL FIBRILLATION|A FIB
        =>ATRIAL FIBRILLATION|AF
        =>A FIB|AF
        =>A FIB|ATRIAL FIBRILLATION
      redundancy                 remove if the normalized synonyms are the same              This removes redundant synonyms that are already handled in Norm, such as spelling variants and inflectional variants.
      • fecal|fecal
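
The purely string-level standardizations in the table above (genitive, parenthetic plural form, punctuation, case) can be approximated without Lvg. The following is a minimal sketch under that assumption; it does not call Lvg, and it does not attempt the citation-form step (-f:Ct), which needs the lexicon. The class and method names (KeyNormalizerSketch, normalizeKey) are hypothetical and are not part of the STMT package.

```java
import java.util.Locale;

/** Hypothetical helper: approximates the string-level Lvg flows with plain regexes. */
final class KeyNormalizerSketch {
    private KeyNormalizerSketch() {}

    static String normalizeKey(String term) {
        String s = term;
        // Parenthetic plural forms "(s)", "(es)", "(ies)" are dropped (rough stand-in for -f:rs).
        // This must run before punctuation is replaced, or the parentheses are lost.
        s = s.replaceAll("\\((?:s|es|ies)\\)", "");
        // Genitive "'s" at the end of a word is dropped (rough stand-in for -f:g).
        // Simplification: a bare trailing apostrophe (boys') is not treated as a genitive here.
        s = s.replaceAll("'s\\b", "");
        // ASCII punctuation, including '%', becomes a space (rough stand-in for -f:o).
        s = s.replaceAll("\\p{Punct}", " ");
        // Lower case (rough stand-in for -f:l).
        s = s.toLowerCase(Locale.ROOT);
        // Trim and collapse multiple spaces.
        return s.trim().replaceAll("\\s+", " ");
    }
}
```

With this sketch, normalizeKey("Down's") yields "down" and normalizeKey("Finding(s)") yields "finding". Spelling and inflectional variants are left untouched, so the result is only an approximation of the real normalized key.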

  • Examples - Test Cases:

    Skip

  • Algorithm:
    • Manually review all 8 files and remove suffix synonyms
    • Read in all 8 files, one by one
    • Decompose multiple synonym pairs:
      • Input:
        • AF|ATRIAL FIBRILLATION|A FIB
      • Outputs:
        • AF|ATRIAL FIBRILLATION
        • AF|A FIB
        • ATRIAL FIBRILLATION|A FIB
    • Ignore comments (lines starting with #)
    • Normalize key:
    • Generate symmetric synonym pairs
      • Input:
        • AB|ANTIBODY
      • Outputs:
        • ab|ANTIBODY
        • antibody|AB
    • Remove duplicates:
      • Remove duplicated synonyms that come from different files
      • Remove a synonym pair if both of its normalized synonyms are the same: no new CUI can be found, because both have the same key (normTerm) in CuiMapping.
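
The steps above can be sketched end to end as follows. This is an illustration only, not the GenDefaultSynonymsFile implementation: the class name SynonymPipelineSketch and its methods are hypothetical, input lines are assumed to be already read from the files, the output row format "normTerm|synonym" is an assumption, and normalize() is a simplified stand-in (punctuation to space, lower case) for the Lvg-based key normalization.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of the synonym-file pipeline; not the STMT implementation. */
final class SynonymPipelineSketch {
    private SynonymPipelineSketch() {}

    /** Simplified stand-in for the Lvg key normalization. */
    static String normalize(String term) {
        return term.replaceAll("\\p{Punct}", " ")
                   .toLowerCase(Locale.ROOT)
                   .trim()
                   .replaceAll("\\s+", " ");
    }

    /** Splits "A|B|C" into the unordered pairs (A,B), (A,C), (B,C). */
    static List<String[]> decompose(String line) {
        String[] terms = line.split("\\|");
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i < terms.length; i++) {
            for (int j = i + 1; j < terms.length; j++) {
                pairs.add(new String[] {terms[i].trim(), terms[j].trim()});
            }
        }
        return pairs;
    }

    /** Returns de-duplicated "normTerm|synonym" rows in first-seen order. */
    static List<String> build(List<String> lines) {
        Set<String> rows = new LinkedHashSet<>(); // the set drops rows repeated across files
        for (String line : lines) {
            if (line.trim().isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            for (String[] pair : decompose(line)) {
                addRow(rows, pair[0], pair[1]); // term -> synonym
                addRow(rows, pair[1], pair[0]); // symmetric pair
            }
        }
        return new ArrayList<>(rows);
    }

    private static void addRow(Set<String> rows, String term, String synonym) {
        String key = normalize(term);
        if (key.equals(normalize(synonym))) {
            return; // redundant: both sides normalize to the same key
        }
        rows.add(key + "|" + synonym);
    }
}
```

For the input line AB|ANTIBODY this produces the rows ab|ANTIBODY and antibody|AB, matching the symmetric-pair example above. A pair such as fecal|fecal produces no rows.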