Text Categorization

PreProcess: STs (Semantic Types)

  • Description:

    STI uses the set of 135 Semantic Types (STs) defined in the Semantic Network of NLM's Unified Medical Language System (UMLS). Each concept in the UMLS Metathesaurus is assigned one or more STs, which characterize the concept semantically. For example, the concept Aspirin is assigned the STs [Pharmacologic Substance] and [Organic Chemical]. Each Semantic Type has an ID and an abbreviation, called StId and StAbbr, respectively. This information can be derived from the latest MRSTY (IDs and names) and SRDEF (abbreviations).

  • Input:
    • MRSTY (Semantic types list)
      CUI | ST ID | ST Name
    • SRDEF (Semantic types Abbreviations)
      TUI | ... | ST Abbreviation

  • Java File & Algorithm:
    • GenerateStFromMrSty.java
      • Read the CUI, ST ID, and ST name from MRSTY
      • Update the Semantic Type list
      • Read the TUI and ST abbreviation from SRDEF
      • Map each TUI to its ST abbreviation
      • Print each Semantic Type with "|" as the field separator
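
The steps above can be sketched in Java roughly as follows. This is not the actual GenerateStFromMrSty.java source, only a minimal illustration under stated assumptions: the class and method names are hypothetical, MRSTY lines are assumed to follow the three-field layout listed under Input (CUI | ST ID | ST Name), SRDEF semantic type rows are assumed to start with "STY" and carry the abbreviation in field 9 (as the fgrep/flds commands in the Notes suggest), and the abbreviation is assumed to be what the output calls the ST Index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; names and field positions are assumptions, not the real tool.
class StGeneratorSketch {

    // Collects distinct ST IDs and names, assuming lines look like "CUI|ST ID|ST Name".
    static Map<String, String> readSemanticTypes(List<String> mrstyLines) {
        Map<String, String> nameById = new TreeMap<>(); // TreeMap keeps ST IDs sorted
        for (String line : mrstyLines) {
            String[] f = line.split("\\|", -1);
            if (f.length >= 3) {
                nameById.put(f[1], f[2]); // repeated STs collapse into one entry
            }
        }
        return nameById;
    }

    // Maps TUI to abbreviation, assuming "STY" rows with the abbreviation in field 9.
    static Map<String, String> readAbbreviations(List<String> srdefLines) {
        Map<String, String> abbrById = new HashMap<>();
        for (String line : srdefLines) {
            String[] f = line.split("\\|", -1);
            if (f.length >= 9 && f[0].equals("STY")) {
                abbrById.put(f[1], f[8]);
            }
        }
        return abbrById;
    }

    // Builds one "abbreviation|ST ID|ST Name" line per Semantic Type.
    static List<String> generate(List<String> mrstyLines, List<String> srdefLines) {
        Map<String, String> nameById = readSemanticTypes(mrstyLines);
        Map<String, String> abbrById = readAbbreviations(srdefLines);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : nameById.entrySet()) {
            String abbr = abbrById.getOrDefault(e.getKey(), ""); // empty if unmapped
            out.add(abbr + "|" + e.getKey() + "|" + e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // Illustrative rows only, not copied from a real UMLS release.
        List<String> mrsty = List.of(
            "C0004057|T121|Pharmacologic Substance",
            "C0004057|T109|Organic Chemical");
        List<String> srdef = List.of(
            "STY|T109|Organic Chemical|A1.4.1.2.1|def|||N|orch|",
            "STY|T121|Pharmacologic Substance|A1.4.1.1.1|def|||N|phsu|");
        generate(mrsty, srdef).forEach(System.out::println);
    }
}
```

Keeping the ST list in a sorted map is a design choice of this sketch; the ordering used in the real sts.txt is not stated here.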

  • Output File:
    sts.txt:
    ST Index | ST ID | ST Name

  • Notes:
    Semantic Types can be generated from MRSTY with the following steps:
    • shell> flds 2,3 MRSTY | sort -u > SemanticType
    • Manually add the index/abbreviation for each ST (usually at the end of the file)

      or

    • Get SRDEF.txt
    • shell> fgrep "STY|T" SRDEF.txt > SRDEF.STY.txt
    • shell> flds 1,2,3,9 SRDEF.STY.txt > STY.txt
    • Manually add the index for each ST
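
For readers without the flds utility, the two command pipelines in these notes can be approximated in Java. The sketch below assumes that flds selects "|"-delimited fields by 1-based position, which is inferred from how it is used here rather than from its documentation; the class and method names are hypothetical. The manual step of adding an index is not covered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch of the shell steps; the behavior of flds is an assumption.
class FieldSelectSketch {

    // Keeps the given 1-based fields of a "|"-delimited line, joined by "|".
    static String selectFields(String line, int... positions) {
        String[] f = line.split("\\|", -1);
        List<String> kept = new ArrayList<>();
        for (int p : positions) {
            kept.add(p >= 1 && p <= f.length ? f[p - 1] : ""); // missing field -> empty
        }
        return String.join("|", kept);
    }

    // Rough equivalent of: flds 2,3 MRSTY | sort -u
    static List<String> uniqueSemanticTypes(List<String> mrstyLines) {
        TreeSet<String> unique = new TreeSet<>(); // sorted and de-duplicated
        for (String line : mrstyLines) {
            unique.add(selectFields(line, 2, 3));
        }
        return new ArrayList<>(unique);
    }

    // Rough equivalent of: fgrep "STY|T" SRDEF.txt, then flds 1,2,3,9
    static List<String> semanticTypeRows(List<String> srdefLines) {
        List<String> rows = new ArrayList<>();
        for (String line : srdefLines) {
            if (line.contains("STY|T")) { // literal substring match, like fgrep
                rows.add(selectFields(line, 1, 2, 3, 9));
            }
        }
        return rows;
    }
}
```

Note that TreeSet orders strings by their natural (code point) order, which can differ from the locale-dependent ordering of sort -u.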