Sub-Term Mapping Tools

STMT - PreProcessed Files

The STMT software was developed in 2013 and is not expected to change unless new software change requests (SCRs) are made. The routine annual changes are to update the JDK and HSqlDb and to refresh the data. The data can be obtained in this PreProcess session as described below:

  • Make sure Lvg is installed under ${PROJECTS}
  • Root directory: ${STMT_DIR}/PreProcess
    shell> cd ${STMT_DIR}/PreProcess
    • I. Data from Lexicon (${DATA}/Lex/${DataYear})
      Data are from Lexicon, so they are ready whenever the Lexicon release is ready

      Org:${LEXICON}/data/${YEAR}/tables/inflVars.data
      Src:${STMT_DIR}/PreProcess/data/Lex/${YEAR}/inflVars.data
      Tar:${STMT_DIR}/PreProcess/data/Lex/${YEAR}/normInflvarEui.data

      1. Setup:
      shell> mkdir ${STMT_DIR}/PreProcess/data/Lex/${YEAR}
      shell> ln -sf ${Org} ${STMT_DIR}/PreProcess/data/Lex/${YEAR}/inflVars.data

      2. Run Program (to generate ${Tar}):
      shell> cd ${STMT_DIR}/PreProcess/bin/1.PreStmt
      lvgDist.jar version (${YEAR} => same as data year)
      stmt.jar version (2015 => the latest STMT version)
      data year, one of:

      • ${YEAR}
      • ${YEAR}Ascii => used for MetaMap BDB

      option 1

      The above process can be run as soon as the Lexicon release is completed.
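The setup in step I.1 can be wrapped in a small function that refuses to create the link when the Lexicon file is not there yet. This is only a sketch: `link_lex_data` is a hypothetical helper name, not part of STMT, and it assumes the Org/Src layout listed above.

```shell
# Sketch of step I.1: create the per-year Lex directory and link inflVars.data.
# link_lex_data is a hypothetical helper, not an STMT program.
link_lex_data() {
    lexicon_dir=$1   # value of ${LEXICON}
    stmt_dir=$2      # value of ${STMT_DIR}
    year=$3          # data year (${YEAR})

    org="${lexicon_dir}/data/${year}/tables/inflVars.data"
    dest_dir="${stmt_dir}/PreProcess/data/Lex/${year}"

    # Stop early if the Lexicon release has not produced the file yet.
    if [ ! -f "${org}" ]; then
        echo "missing source file: ${org}" >&2
        return 1
    fi

    # -p makes the call safe to repeat when the directory already exists.
    mkdir -p "${dest_dir}" || return 1
    ln -sf "${org}" "${dest_dir}/inflVars.data"
}
```

It would be called as `link_lex_data "${LEXICON}" "${STMT_DIR}" "${YEAR}"` before running 1.PreStmt.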

    • II. Data from Metathesaurus (${DATA}/Meta/${DataYear})
      Data are provided by Dr. Kin Wah Fung and Joe Chow; ask them for the AA release in May and the AB release in November.

      Org:
      • lhc-lx-ash3:/data/Releases/${YEAR}${VERSION}/RRF/META/MRCONSO.RRF
      • lhc-lx-ash3:/data/Releases/${YEAR}${VERSION}/RRF/META/MRXNS_ENG.RRF

      Src:
      • ${STMT_DIR}/PreProcess/data/Meta/${YEAR}/MRCONSO.RRF
      • ${STMT_DIR}/PreProcess/data/Meta/${YEAR}/MRCONSO_ENG.RRF

      • ${STMT_DIR}/PreProcess/data/Meta/${YEAR}/MRXNS_ENG.RRF

      Tar:
      • ${STMT_DIR}/PreProcess/data/Meta/${YEAR}/nonSuppressCui.data
      • ${STMT_DIR}/PreProcess/data/Meta/${YEAR}/normTermCui.data
      • ${STMT_DIR}/PreProcess/data/Meta/${YEAR}/cuiPreferredTerms.data

      1. Setup:
      shell> mkdir ${STMT_DIR}/PreProcess/data/Meta/${YEAR}AA
      shell> link MRCONSO.RRF
      shell> link MRXNS_ENG.RRF
      shell> link MRCONSO.RRF.ENG
      => must run ${META_DIR}/bin/1.ProcessUmlsFiles option 2 first to generate MRCONSO.RRF.ENG
      In practice, we run through the test programs for the UMLS-Metathesaurus release before we work on STMT data.

      2. Run Program:
      shell> cd ${STMT_DIR}/PreProcess/bin/1.PreStmt
      lvgDist.jar version (2016 => not used)
      stmt.jar version (2015 => the latest STMT version)
      data year (2018AA)
      options 2, 3, 4

      Option 3: run on lexdev; it requires a large amount of memory

      The above process can be run after the UMLS release is out (AA: May; AB: Nov.).
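Because the Metathesaurus step depends on three linked inputs and produces three outputs, a quick file check before and after the run can catch a dangling link or a missing output. The sketch below is an assumption-laden convenience, not part of STMT: `check_files` is a hypothetical helper, and the directory passed to it is whichever Meta data directory was created in the setup step.

```shell
# check_files is a hypothetical helper: report every listed file that is
# missing or empty in the given directory. A dangling symbolic link counts
# as missing because [ -s ] follows the link.
check_files() {
    dir=$1
    shift
    missing=0
    for name in "$@"; do
        if [ ! -s "${dir}/${name}" ]; then
            echo "missing or empty: ${dir}/${name}" >&2
            missing=1
        fi
    done
    return "${missing}"
}

# Before running the program (inputs linked in the setup step):
#   check_files "${meta_data_dir}" MRCONSO.RRF MRXNS_ENG.RRF MRCONSO.RRF.ENG
# After running the program (the Tar files listed above):
#   check_files "${meta_data_dir}" nonSuppressCui.data normTermCui.data cuiPreferredTerms.data
```

Here `meta_data_dir` is a placeholder for the Meta data directory of the release being processed.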

    • III. Data from UMLS-Core Synonyms (${DATA}/Synonyms/${DataYear})
      Data are static (UmlsCore.2016-), so they are always ready unless there are new updates
      Org/Src:
      • ${STMT_DIR}/PreProcess/data/Synonyms/UmlsCore.2016-/acronym_edited.txt
      • ${STMT_DIR}/PreProcess/data/Synonyms/UmlsCore.2016-/british.txt
      • ${STMT_DIR}/PreProcess/data/Synonyms/UmlsCore.2016-/ecri.txt
      • ${STMT_DIR}/PreProcess/data/Synonyms/UmlsCore.2016-/greco-latin.txt
      • ${STMT_DIR}/PreProcess/data/Synonyms/UmlsCore.2016-/lvg.txt
      • ${STMT_DIR}/PreProcess/data/Synonyms/UmlsCore.2016-/ramiller-prune_kwf.txt
      • ${STMT_DIR}/PreProcess/data/Synonyms/UmlsCore.2016-/To_add_UMLS_syn_consolidated.txt
      • ${STMT_DIR}/PreProcess/data/Synonyms/UmlsCore.2016-/UMLS_syn_consolidated.txt

      The synonyms from Lexicon are updated annually as of the 2017 release. This file needs to be updated accordingly for 2017 and later (UmlsCore.2017+).

      • ${STMT_DIR}/PreProcess/data/Synonyms/UmlsCore.2017+/acronym_edited.txt
      • ${STMT_DIR}/PreProcess/data/Synonyms/UmlsCore.2017+/british.txt
      • ${STMT_DIR}/PreProcess/data/Synonyms/UmlsCore.2017+/ecri.txt
      • ${STMT_DIR}/PreProcess/data/Synonyms/UmlsCore.2017+/greco-latin.txt
      • ${STMT_DIR}/PreProcess/data/Synonyms/UmlsCore.2017+/lvg.txt => symbolic link to ../${YEAR}/synonyms.data.2.4
        • => Needs to be updated annually
        • ${STMT_DIR}/PreProcess/data/Synonyms/${YEAR}/synonyms.data
        • shell> flds 2,4 synonyms.data > synonyms.data.2.4
          ${STMT_DIR}/PreProcess/data/Synonyms/${YEAR}/synonyms.data.2.4
      • ${STMT_DIR}/PreProcess/data/Synonyms/UmlsCore.2017+/ramiller-prune_kwf.txt
      • ${STMT_DIR}/PreProcess/data/Synonyms/UmlsCore.2017+/To_add_UMLS_syn_consolidated.txt
      • ${STMT_DIR}/PreProcess/data/Synonyms/UmlsCore.2017+/UMLS_syn_consolidated.txt

      Tar:${STMT_DIR}/PreProcess/data/Synonyms/${YEAR}/normTermSynonyms.data

      1. Setup:
      shell> mkdir ${STMT_DIR}/PreProcess/data/Synonyms/${YEAR}

      • shell> cd ${YEAR}
      • shell> link synonyms.data
      • shell> flds 2,4 synonyms.data > synonyms.data.2.4
      • shell> sort -u synonyms.data.2.4 > synonyms.data.2.4.uSort

      • shell> cd ../UmlsCore.2017+
      • Update link lvg.txt.2017+ (link to ../${YEAR}/synonyms.data.2.4)

      2. Run Program:
      shell> cd ${STMT_DIR}/PreProcess/bin/1.PreStmt
      lvgDist.jar version (2018 => same as data year)
      stmt.jar version (2015 => the latest STMT version)
      data year (2018)
      option 5

      Options 6 and 7 are not used in normal annual operation.
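The synonym setup in section III (extract fields 2 and 4, then sort uniquely) relies on `flds`, which is not a standard Unix command. Where `flds` is not available, the same two files could be produced with `awk` and `sort`. The sketch below is a stand-in, not the original tool: it assumes that synonyms.data is pipe-delimited and that the extracted fields should stay pipe-delimited, and both function names are hypothetical.

```shell
# extract_syn_fields is a hypothetical stand-in for `flds 2,4`.
# Assumption: fields are separated by '|' and the output keeps that separator.
extract_syn_fields() {
    awk -F'|' 'BEGIN { OFS = "|" } { print $2, $4 }' "$1"
}

# prepare_synonyms writes synonyms.data.2.4 and synonyms.data.2.4.uSort
# next to synonyms.data in the given per-year directory.
prepare_synonyms() {
    year_dir=$1
    extract_syn_fields "${year_dir}/synonyms.data" \
        > "${year_dir}/synonyms.data.2.4" || return 1
    sort -u "${year_dir}/synonyms.data.2.4" \
        > "${year_dir}/synonyms.data.2.4.uSort"
}
```

It would be called as `prepare_synonyms "${STMT_DIR}/PreProcess/data/Synonyms/${YEAR}"` after synonyms.data has been linked into that directory. If `flds` uses a different output separator, the `OFS` value needs to be changed to match.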