
Sub-Term Mapping Tools

Synonym Norm Development

I. Requirements
Use normalization to aggressively map a term to its synonyms by abstracting away from:

  • g: Genitive
  • rs: parenthetical plural forms (s), (es), (ies)
  • o: Punctuation
  • l: case (lowercase)
  • Ct: spelling variants and inflectional variants

  • remove duplicated spaces
  • trim leading and trailing spaces
  • remove duplicated results
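The requirements above amount to a chain of string operations applied in a fixed order. The following is a minimal Python sketch of the non-lexical steps, assuming simplified regular-expression versions of lvg's g, rs, o and l (these are not lvg's actual implementations, and the function names are hypothetical). Ct is left out because it needs the SPECIALIST Lexicon.

```python
import re

def remove_genitive(term: str) -> str:
    """g (simplified): drop a possessive 's, or a bare ' after a final s."""
    return re.sub(r"'s\b|(?<=s)'(?=\s|$)", "", term, flags=re.IGNORECASE)

def remove_paren_plural(term: str) -> str:
    """rs (simplified): drop the parenthetical plural markers (s), (es), (ies)."""
    return re.sub(r"\((?:s|es|ies)\)", "", term)

def replace_punctuation(term: str) -> str:
    """o (simplified): replace each punctuation character with a space."""
    return re.sub(r"[^\w\s]", " ", term)

def lowercase(term: str) -> str:
    """l: lowercase the whole term."""
    return term.lower()

# Hypothetical mapping from flow codes to the simplified steps above.
STEPS = {"g": remove_genitive, "rs": remove_paren_plural,
         "o": replace_punctuation, "l": lowercase}

def normalize(term: str, flow: str = "g:rs:o:l") -> str:
    """Apply the steps named in `flow` left to right, then remove duplicated
    spaces and trim (str.split() with no argument does both)."""
    for step in flow.split(":"):
        term = STEPS[step](term)
    return " ".join(term.split())

def normalize_all(terms: list[str]) -> list[str]:
    """Normalize a batch and drop duplicated results, keeping first-seen order."""
    return list(dict.fromkeys(normalize(t) for t in terms))

print(normalize("  Addison's  Disease "))        # addison disease
print(normalize("Blood vessel(s), Injury"))       # blood vessel injury
print(normalize("Addison's disease", "o:g:l"))    # addison s disease
print(normalize_all(["Addison's disease", "ADDISON'S DISEASE"]))  # ['addison disease']
```

The third call shows why step order matters: when o runs before g, the apostrophe has already become a space, so g no longer sees a genitive and a stray "s" token survives.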

II. Developments

  • Approach 1 (Ct on the input term):
    • Use lvg -f:g:rs:o:l:Ct
    • Ct gets the citation form of the whole input term
    • Fast performance
    • Lower coverage rate (about 98% of the approaches below)
    • Example 1:
      ID: KP102818
      Term: CLOTTING FACTOR DEFICIENCY, CONGENITAL
      Norm term: (empty)
      Synonym substitutions:
        • ...
      CUI: not found
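Approach 1 can be sketched as a single lookup of the whole normalized term, followed by synonym substitution and a CUI lookup. The tables below are hypothetical toy data reconstructed from Example 1 (not the real Lexicon, synonym, or UMLS tables), and returning the input unchanged when no citation form exists is an assumption of this sketch.

```python
from itertools import product

# Hypothetical toy tables reconstructed from Example 1 (not the real
# Lexicon, synonym, or UMLS data).
CITATION = {"clotting": "clot"}
SYNONYMS = {"clot": ["coagulation"], "congenital": ["hereditary"]}
CUI_TABLE = {"coagulation factor deficiency hereditary": "C0272316"}

def substitutions(term: str) -> list[str]:
    """Every string formed by keeping each word or swapping in one of its synonyms."""
    options = [[w] + SYNONYMS.get(w, []) for w in term.split()]
    return [" ".join(combo) for combo in product(*options)]

def approach1(norm_term: str) -> tuple[str, list[str]]:
    """Ct on the whole (already g:rs:o:l-normalized) term, then synonyms and CUIs.
    Assumption: a term with no citation entry is returned unchanged."""
    cited = CITATION.get(norm_term, norm_term)
    cuis = sorted({CUI_TABLE[s] for s in substitutions(cited) if s in CUI_TABLE})
    return cited, cuis

print(approach1("clotting factor deficiency congenital"))
# ('clotting factor deficiency congenital', [])
```

Because the toy synonym table is keyed by the citation form "clot", the inflected word "clotting" left inside the whole-term result never matches; this is one way the lower coverage of Approach 1 can arise.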

  • Approach 2 (Ct on every word of the input term):
    • Use lvg -f:g:rs:o:l:Ct
    • Customize Ct to get the citation form of every word of the input term
    • More mutations result in slower performance but a higher coverage rate
    • Example 1:
      ID: KP102818
      Term: CLOTTING FACTOR DEFICIENCY, CONGENITAL
      Norm term: clot factor deficiency congenital
      Synonym substitutions:
        • coagulation factor deficiency hereditary
        • ...
      CUI: C0272316
    • However, it still misses some mappings when the citation form contains punctuation; for example, "carcino-embryonic" is the citation form of "carcinoembryonic"
    • Example 2:
      ID: KP194142
      Term: Elevated carcinoembryonic antigen
      Norm term: elevate carcino-embryonic antigen
      Synonym substitutions:
        • increase carcino-embryonic antigen
        • increased carcino-embryonic antigen
        • high carcino-embryonic antigen
        • ...
      CUI: C0549371
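Approach 2 applies Ct word by word. The sketch below uses hypothetical toy tables reconstructed from Examples 1 and 2 (not the real data) and also reproduces the punctuation problem: because o has already run, the hyphen that Ct introduces survives, so a synonym keyed by the normalized phrase "carcino embryonic antigen" never matches. The "cea" synonym entry is an assumption inferred from the Approach 3 output.

```python
# Hypothetical toy tables reconstructed from the examples (not the real data).
CITATION = {"clotting": "clot", "elevated": "elevate",
            "carcinoembryonic": "carcino-embryonic"}
SYNONYMS = {"clot": ["coagulation"], "congenital": ["hereditary"],
            "elevate": ["increase", "increased", "high"],
            "carcino embryonic antigen": ["cea"]}

def ct_per_word(term: str) -> str:
    """Approach 2: replace each word by its citation form when one is known."""
    return " ".join(CITATION.get(w, w) for w in term.split())

def substitutions(term: str) -> set[str]:
    """Every string obtained by replacing whole-word occurrences of synonym keys
    (single- or multi-word) with one of their synonyms, in any combination."""
    results = {term}
    for key, syns in SYNONYMS.items():
        for t in list(results):
            padded = f" {t} "
            if f" {key} " in padded:
                results.update(padded.replace(f" {key} ", f" {s} ").strip()
                               for s in syns)
    return results

# Inputs are assumed to be g:rs:o:l-normalized already, since Ct runs last.
print(ct_per_word("clotting factor deficiency congenital"))
# clot factor deficiency congenital
cited = ct_per_word("elevated carcinoembryonic antigen")
print(cited)  # elevate carcino-embryonic antigen
print(any("cea" in s.split() for s in substitutions(cited)))  # False
```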

  • Approach 3 (Move Ct before removing punctuation):
    • Use lvg -f:g:rs:Ct:l:o
    • Example 2:
      ID: KP194142
      Term: Elevated carcinoembryonic antigen
      Norm term: elevate carcino embryonic antigen
      Synonym substitutions:
        • increase carcino embryonic antigen
        • increased carcino embryonic antigen
        • high carcino embryonic antigen
        • ...
      CUI: C0549371
      Synonym substitution:
        • elevate cea
      CUI: C0742014
    • Add genitive removal (g) after Ct as well, because a citation form can itself contain a genitive:
      • E0000135|Addison's disease|Addisons disease
      • No citation form contains (s), (es), or (ies), so -f:rs is not needed after Ct
      • Use a database for CUI mapping to improve performance
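Approach 3 reorders the flow so that Ct runs before l and o, and adds a second genitive removal after Ct. The sketch below uses simplified regex versions of the lvg steps and hypothetical toy tables; the single-word entry "addisons" -> "addison's" is a hypothetical stand-in for the Lexicon record E0000135, and g:rs:Ct:g:l:o is one plausible reading of the final flow, not a flow string quoted from the tools.

```python
import re

# Hypothetical toy tables (illustration only, not the real Lexicon/synonym data).
CITATION = {"elevated": "elevate", "carcinoembryonic": "carcino-embryonic",
            "addisons": "addison's"}
SYNONYMS = {"elevate": ["increase", "increased", "high"],
            "carcino embryonic antigen": ["cea"]}

def remove_genitive(t: str) -> str:      # g (simplified)
    return re.sub(r"'s\b|(?<=s)'(?=\s|$)", "", t, flags=re.IGNORECASE)

def remove_paren_plural(t: str) -> str:  # rs (simplified)
    return re.sub(r"\((?:s|es|ies)\)", "", t)

def citation_per_word(t: str) -> str:    # Ct per word; case-insensitive lookup here
    return " ".join(CITATION.get(w.lower(), w) for w in t.split())

def replace_punctuation(t: str) -> str:  # o (simplified)
    return re.sub(r"[^\w\s]", " ", t)

STEPS = {"g": remove_genitive, "rs": remove_paren_plural,
         "Ct": citation_per_word, "l": str.lower, "o": replace_punctuation}

def run_flow(term: str, flow: str) -> str:
    """Apply the named steps left to right, then collapse and trim spaces."""
    for name in flow.split(":"):
        term = STEPS[name](term)
    return " ".join(term.split())

def substitutions(term: str) -> set[str]:
    """Replace whole-word occurrences of synonym keys, in every combination."""
    results = {term}
    for key, syns in SYNONYMS.items():
        for t in list(results):
            padded = f" {t} "
            if f" {key} " in padded:
                results.update(padded.replace(f" {key} ", f" {s} ").strip()
                               for s in syns)
    return results

term = "Elevated carcinoembryonic antigen"
print(run_flow(term, "g:rs:o:l:Ct"))   # elevate carcino-embryonic antigen
norm = run_flow(term, "g:rs:Ct:l:o")
print(norm)                            # elevate carcino embryonic antigen
print("elevate cea" in substitutions(norm))           # True
print(run_flow("Addisons disease", "g:rs:Ct:l:o"))    # addison s disease
print(run_flow("Addisons disease", "g:rs:Ct:g:l:o"))  # addison disease
```

The last two calls show why genitive removal has to run again after Ct: the citation form reintroduces an apostrophe that o would otherwise turn into a stray "s" token.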

III. Comparisons

Performance | Approach 1 (Ct on term) | Approach 2 (CuiMap) | Approach 3 (Smt)
Overall | Fast | Slow | Fast
KP | 27 min. | 78 min. | 22 min.
VA | 23 min. | 321 min. | 68 min.

Coverage-KP (26890 terms) | Approach 1 | Approach 2 | Approach 3
CUI with norm | 12165 (45.24%) | 12165 (45.24%) | 12165 (45.24%)
CUI with 1 synonym | 1673 (6.22%) | 1692 (6.29%) | 1692 (6.29%)
CUI with 2 synonyms | 168 (0.62%) | 174 (0.65%) | 174 (0.65%)
No CUI found | 12884 (47.91%) | 12859 (47.82%) | 12859 (47.82%)
Total term-CUIs found | 31643 | 31660 | 31661

Coverage-VA (21221 terms) | Approach 1 | Approach 2 | Approach 3
CUI with norm | 16937 (79.81%) | 16937 (79.81%) | 16937 (79.81%)
CUI with 1 synonym | 221 (1.04%) | 228 (1.07%) | 228 (1.07%)
CUI with 2 synonyms | 12 (0.06%) | 15 (0.07%) | 15 (0.07%)
No CUI found | 4051 (19.09%) | 4041 (19.04%) | 4041 (19.04%)
Total term-CUIs found | 27478 | 27498 | 27498
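Each coverage percentage is a category's term count divided by the number of input terms. A short check, using only the counts copied from the tables, confirms that the four categories partition each corpus and reproduces the reported percentages (Approaches 2 and 3 report identical counts).

```python
# Counts copied from the coverage tables, in order: CUI with norm,
# CUI with 1 synonym, CUI with 2 synonyms, No CUI found.
COVERAGE = {
    ("KP", "Approach 1"): (26890, [12165, 1673, 168, 12884]),
    ("KP", "Approaches 2 and 3"): (26890, [12165, 1692, 174, 12859]),
    ("VA", "Approach 1"): (21221, [16937, 221, 12, 4051]),
    ("VA", "Approaches 2 and 3"): (21221, [16937, 228, 15, 4041]),
}

def percentages(total: int, counts: list[int]) -> list[float]:
    """Share of input terms per category, rounded to two decimals."""
    if sum(counts) != total:  # the four categories should partition the terms
        raise ValueError(f"counts sum to {sum(counts)}, expected {total}")
    return [round(100 * c / total, 2) for c in counts]

for (corpus, approach), (total, counts) in COVERAGE.items():
    print(corpus, approach, percentages(total, counts))
# KP Approach 1 [45.24, 6.22, 0.62, 47.91]
# KP Approaches 2 and 3 [45.24, 6.29, 0.65, 47.82]
# VA Approach 1 [79.81, 1.04, 0.06, 19.09]
# VA Approaches 2 and 3 [79.81, 1.07, 0.07, 19.04]
```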

IV. Notes
In practice, we normalize only the key of each synonym pair. This can make lookups non-symmetric. For example, the synonym pair impaired|abnormality is stored in the database table as follows:

normalized key | synonym
impair | abnormality
abnormality | impaired
impair|abnormality

The mapping then results in a non-symmetric lookup (input -> normalized key -> synonym):

  • abnormality -> abnormality -> impaired
  • impair -> impair -> abnormality (not symmetric)
  • impaired -> impair -> abnormality
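The asymmetry can be reproduced with an in-memory SQLite table. The table and column names (synonym_norm, norm_key, synonym) and the one-entry citation map are hypothetical stand-ins for the real database and normalizer; as described above, only the key side is normalized.

```python
import sqlite3

# Hypothetical stand-in for the real normalizer: only the citation step matters here.
CITATION = {"impaired": "impair"}

def normalize(term: str) -> str:
    return " ".join(CITATION.get(w, w) for w in term.lower().split())

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE synonym_norm (norm_key TEXT, synonym TEXT)")
conn.execute("CREATE INDEX idx_norm_key ON synonym_norm (norm_key)")

# Store the pair impaired|abnormality in both directions, normalizing ONLY the key.
pair = ("impaired", "abnormality")
for key, syn in (pair, pair[::-1]):
    conn.execute("INSERT INTO synonym_norm VALUES (?, ?)", (normalize(key), syn))

def lookup(term: str) -> list[str]:
    """Normalize the input, then fetch the synonyms stored under that key."""
    rows = conn.execute("SELECT synonym FROM synonym_norm WHERE norm_key = ?",
                        (normalize(term),))
    return [row[0] for row in rows]

for term in ("abnormality", "impair", "impaired"):
    print(term, "->", normalize(term), "->", lookup(term))
# abnormality -> abnormality -> ['impaired']
# impair -> impair -> ['abnormality']
# impaired -> impair -> ['abnormality']

# Stored rows whose mirror image (synonym, key) is missing.
rows = set(conn.execute("SELECT norm_key, synonym FROM synonym_norm"))
print(sorted((k, s) for k, s in rows if (s, k) not in rows))
# [('abnormality', 'impaired'), ('impair', 'abnormality')]
```

Both stored rows lack a mirror image because the keys are normalized ("impair") while the stored synonyms are not ("impaired").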