The SPECIALIST Lexicon

Generating Synonyms from Meta-Thesaurus

This page describes how LexSynonyms are generated from the Meta-Thesaurus. Most LexSynonyms come from this source. The candidate list was generated in 2017 through this model and was completed in 2021. The following steps were used during that period for LexSynonym generation.

I. Pre-Process

  • Directory: ${LEXICON_SYNONYMS}
  • program: ./Synonym/GenSynonymFromMeta.java
  • Inputs:
    • ./Results/sClass.data.tag
    • ${IN_DIR}LRSPL
    • ${IN_DIR}LRNOM
  • Outputs:
    • ./Results/synonymFromMeta.data

II. Process

  • Directory: ${LEXICON_SYNONYMS}/bin
  • program: GetSynonyms ${year}

    Option | Descriptions | Inputs | Outputs
    Option 10
    • Description: Apply tags to the raw synonym candidates to generate the synonym candidate file, then update the tags
    • program: Synonym.TagSynonymClassCand.java
    • Inputs:
      • ./Candidates/synonymCan.raw.data
      • ./Tags/SynonymCan_Tagged.txt
        => Manually copy from last year
    • Outputs:
      • ./Results/sClass.out.tag
      • ./Results/sClass.out.notTag
        • These are the sClasses that have not been tagged (no CUI, previously).
        • These synonyms come from updates of the UMLS-Metathesaurus and the Lexicon.
        • Should be 0 when completed (2022+)
        • If not 0, send them to linguists to tag (new synonym candidate list: new sClasses)
        • Append the tagged results to ./Tags/SynonymCan_Tagged.txt and re-run Steps 10-11 until it is 0.
      • ./Results/sClass.out.tag.tbd
        • These are the sClasses that have been tagged, but with new synonym(s) found in the class.
        • Send to linguists for tagging.
        • These synonyms come from updates of the UMLS-Metathesaurus and the Lexicon.
        • Should be 0 when completed.
        • If not 0, send them to linguists to tag the synonyms that carry the TBD tag (new synonyms within existing sClasses)
        • Save the tagged file as the updated tagged file: ./Results/Tags/sClass.out.tag.tbd.tagged
          => Run Step 10a to update those new tags (from new data) into ./Tags/SynonymCan_Tagged.txt.tbdUpdated (see Step-11a). Then re-run Steps 10-11 until it is 0.
      • If both of the above files are 0, go to Step 11.
    • Note: The tagged sClass No. could be different from the final sTagClass No. that is generated.
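
The exit condition for Step 10 (both pending files at 0) can be checked with a small script. The following is a minimal sketch, not part of the Lexicon tooling: it assumes that "0" means the file contains no non-blank lines, and the helper names count_entries, pending_counts, and ready_for_step_11 are hypothetical.

```python
from pathlib import Path

# File names taken from the Step 10 outputs above.
PENDING_FILES = ("sClass.out.notTag", "sClass.out.tag.tbd")


def count_entries(path):
    """Count non-blank lines in a file (assumption: one entry per line)."""
    with Path(path).open(encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


def pending_counts(results_dir):
    """Return {file name: number of entries} for both pending files."""
    base = Path(results_dir)
    return {name: count_entries(base / name) for name in PENDING_FILES}


def ready_for_step_11(results_dir):
    """True only when both pending files are empty."""
    return all(count == 0 for count in pending_counts(results_dir).values())
```

For example, ready_for_step_11("./Results") would return False while either file still holds entries that need to go to the linguists.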
    Option 10a
    • Description: Update ./Tags/SynonymCan_Tagged_all.txt with the new synonyms that carry the TBD tag
    • program: Synonym.UpdateTagSynonymClass.java
    • Inputs:
      • ./Tags/SynonymCan_Tagged.txt
      • ./Results/Tags/sClass.out.tag.tbd.tagged
    • Outputs:
      • ./Tags/SynonymCan_Tagged.txt.tbdUpdated
      • Tbd no. must be 0 in the log file for both input files.
        => If so, then:
        • cp -p ./Tags/SynonymCan_Tagged.txt.tbdUpdated to ./Tags/SynonymCan_Tagged.txt
        • ln -sf ./Tags/SynonymCan_Tagged_all.txt ./Tags/SynonymCan_Tagged.txt
      • Continue to run Step 11, then Steps 10-11, until ./Results/sClass.out.tag.tbd is empty (0)
    Option 11
    • Description: Validate and analyze synonym tag file(s)
    • program: Synonym.CheckTagSClassFile.java
    • Inputs:
      • ./Tags/SynonymCan_Tagged.txt

      =================

      The steps below were used in releases before 2022:

      • If there are separate tagged files (2022-), combine all tagged synonym candidates (in ./Tags/curTags) into
        ./Tags/SynonymCan_Tagged_${YEAR}.txt.org
        shell> cat *.tag > SynonymCan_Tagged_${YEAR}.txt.org
      • Fix issues and save to ./Tags/SynonymCan_Tagged_${YEAR}.txt.fixed
        Tag issues show up while running this step.
      • Update and copy SynonymCan_Tagged_${YEAR}.txt.fixed to SynonymCan_Tagged_${YEAR}.txt
        => This is the tag file for the current year ${YEAR} only
        => They are under ./Tags/curTags
      • Append/change
        SynonymCan_Tagged_${YEAR}.txt to SynonymCan_Tagged_all.txt
      • Link SynonymCan_Tagged_all.txt to SynonymCan_Tagged.txt
    • Outputs: follow the log to check and fix the following:
      • ./Tags/SynonymCan_Tagged.txt.tbd:
        • TBD tags must be added manually to the synonyms that are missing tags
        • Tbd should be 0 to complete the tagging (send to linguists for tagging)
      • ./Tags/SynonymCan_Tagged.txt.err:
        • Err no. must be 0; send to linguists to re-tag if not 0
      • ./Tags/SynonymCan_Tagged.txt.sortByEui
        • Should be the same as ./Tags/SynonymCan_Tagged_all.txt except for the return characters (PC and Linux); check with wc -l
        • Use this file for the next clean-up check in Step-11a and 11
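
The comparison between the .sortByEui file and SynonymCan_Tagged_all.txt, which should differ only in return characters, can be sketched as below. This is an illustration under assumptions: the helper names read_lines and compare_tag_files are hypothetical, and comparing the lines as sorted lists (so that ordering differences are ignored) is an assumption about what "same" means here. The line count is roughly what wc -l reports.

```python
from pathlib import Path


def read_lines(path):
    """Read a file as bytes and split on LF, CR, or CRLF line endings."""
    return Path(path).read_bytes().splitlines()


def compare_tag_files(sorted_path, all_path):
    """Compare two tag files while ignoring PC/Linux line-ending differences."""
    sorted_lines = read_lines(sorted_path)
    all_lines = read_lines(all_path)
    return {
        # Same number of lines in both files.
        "same_line_count": len(sorted_lines) == len(all_lines),
        # Same set of lines with the same multiplicity, in any order.
        "same_lines": sorted(sorted_lines) == sorted(all_lines),
    }
```

A call such as compare_tag_files("./Tags/SynonymCan_Tagged.txt.sortByEui", "./Tags/SynonymCan_Tagged_all.txt") should report True for both keys before the file is used in the clean-up step.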
    Option 11a
    • Description: Clean up synonym tag file(s)
    • program: Synonym.CleanUpTagSClassFile.java
    • Inputs:
      • ./Tags/SynonymCan_Tagged.txt
      • ./Candidates/cuiPreferredTerm.data
    • Outputs: follow the log to check and fix the following:
      • ./Tags/SynonymCan_Tagged.txt.cleanUp:
        => This is the clean-up file, which merges tags and removes duplicates.
        => Copy this file to the input file (./Tags/SynonymCan_Tagged.txt), then re-run Steps 10, 11, and 11a to make sure both TBD files are 0:
        • ./Results/sClass.out.notTag
        • ./Results/sClass.out.tag.tbd
        => In the re-run, this file should be identical to the input file
      • ./Tags/SynonymCan_Tagged.txt.conflict
        => This file shows all conflicting tags (must = 0)
      • ./Tags/SynonymCan_Tagged.txt.ptErr
        => This file shows all conflicting preferred terms (for the same CUI). This file is for reference only. The PT is auto-updated to the latest UMLS.
        Must = 0 (in the final re-run after 11a)
      • ./Tags/SynonymCan_Tagged.txt.ptNull
        All conflicts in this file should be null; check in the log file (no PT for the updated associated EUIs, could be > 0).
      • Fix tags manually in SynonymCan_Tagged.txt until there are no errors or conflicts.
        Copy back and re-run Steps 10-11 until there are no errors and no conflicts.
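
The two checks described for Step 11a (the .cleanUp file is identical to the input file on the re-run, and the .conflict file is 0) can be sketched as follows. This is a minimal illustration, not the Lexicon program: the function names cleanup_is_stable and conflict_count are hypothetical, and it assumes "must = 0" means the file has no non-blank lines.

```python
import filecmp
from pathlib import Path


def cleanup_is_stable(tag_file):
    """True when <tag_file>.cleanUp is byte-for-byte identical to <tag_file>."""
    tag_file = Path(tag_file)
    cleaned = tag_file.with_name(tag_file.name + ".cleanUp")
    # shallow=False forces a content comparison instead of a stat() comparison.
    return filecmp.cmp(tag_file, cleaned, shallow=False)


def conflict_count(tag_file):
    """Count non-blank lines in <tag_file>.conflict (assumed one conflict per line)."""
    tag_file = Path(tag_file)
    conflict = tag_file.with_name(tag_file.name + ".conflict")
    text = conflict.read_text(encoding="utf-8")
    return sum(1 for line in text.splitlines() if line.strip())
```

When cleanup_is_stable("./Tags/SynonymCan_Tagged.txt") is True and conflict_count("./Tags/SynonymCan_Tagged.txt") is 0, another clean-up pass would not change the tag file.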
    Option 12
    • Description: Generate current-year synonyms from the same CUI in the Meta-thesaurus
      • Go through all tagged sClasses (same CUI)
        • Collect all synonyms of [Pos|EUI|Base] with the [Y] tag
        • Find all spVars and nominalizations of the above [Y]-tagged synonyms
        • Generate sPairs from all permutations of the above synonyms, their spVars, and noms
        • Use the CUI of the sClass for extra information
      • Print out the sPairSet in alphabetical order
    • Inputs:
      • ./Tags/SynonymCan_Tagged.txt (all accumulated tags)
        => ./Results/sClass.out.tag (not used after 2022+, replaced by the above file)
      • ./inData/LRSPL
      • ./inData/LRNOM
    • Outputs:
      • Print out No. of tags [Y|N|S] for [Yes|No|Skip]
      • ./Results/synonymFromMeta.data.${YEAR}
        Columns: NpLc Synonym-1 | Synonym-1 | Pos-1 | Synonym-2 | Pos-2 | CUI
      • shell> cp -p synonymFromMeta.data.${YEAR} synonymFromMeta.data
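
The pair generation in Step 12 can be illustrated with a short sketch. It is not the actual Java implementation: the record layouts (tuples of base, POS, and tag; dictionaries standing in for LRSPL and LRNOM lookups), the pipe-separated output line, the function names expand_terms and generate_spairs, and the toy data with the placeholder CUI C0000000 are all assumptions made for illustration.

```python
from itertools import permutations


def expand_terms(tagged, sp_vars, noms):
    """Collect [Y]-tagged synonyms plus their spelling variants and nominalizations.

    tagged:  iterable of (base, pos, tag) for one sClass
    sp_vars: dict mapping (base, pos) -> list of spelling variants (same pos)
    noms:    dict mapping (base, pos) -> list of (nominalization, pos)
    """
    terms = set()
    for base, pos, tag in tagged:
        if tag != "Y":
            continue  # only [Y] tags contribute; [N] and [S] are skipped
        terms.add((base, pos))
        for variant in sp_vars.get((base, pos), []):
            terms.add((variant, pos))
        for nom in noms.get((base, pos), []):
            terms.add(nom)
    return terms


def generate_spairs(cui, tagged, sp_vars, noms):
    """Return all ordered pairs of expanded terms as sorted text lines."""
    terms = expand_terms(tagged, sp_vars, noms)
    lines = {
        f"{a[0]}|{a[1]}|{b[0]}|{b[1]}|{cui}"
        for a, b in permutations(sorted(terms), 2)
    }
    return sorted(lines)  # alphabetical order, as in the sPairSet printout


# Toy data only; the CUI is a placeholder, not a real concept.
example = generate_spairs(
    "C0000000",
    [("color", "noun", "Y"), ("hue", "noun", "Y"), ("tint", "noun", "N")],
    {("color", "noun"): ["colour"]},
    {},
)
```

With three expanded terms (color, colour, hue), the sketch yields six ordered pairs; the [N]-tagged entry contributes nothing.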
    Option 13
    • Description: Combine previous-year and current-year synonyms from the Meta-thesaurus
      • Not used after 2022+
    • Inputs:
      • ./inData/synonymFromMeta.data.{PREV_YEAR}
        => link to ../../${PREVIOUS_YEAR}/outData/Results/synonymFromMeta.data
        This is the accumulated synonyms from Meta (check WC with dGrowth)
        This is the file released in LVG and Lexicon
      • ./outData/Results/synonymFromMeta.data.${YEAR}
    • Outputs:
      • ./outData/Results/synonymFromMeta.data (accumulated)
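
Combining the previous-year and current-year files can be pictured as a de-duplicated, sorted union of their lines. The sketch below rests on that assumption; the actual merge rules of the Lexicon tooling are not stated on this page, and the names combine_synonyms and growth are hypothetical.

```python
def combine_synonyms(prev_lines, cur_lines):
    """Union of two line collections with duplicates and blank lines removed."""
    merged = {line.rstrip("\r\n") for line in prev_lines}
    merged |= {line.rstrip("\r\n") for line in cur_lines}
    merged.discard("")
    return sorted(merged)


def growth(prev_lines, cur_lines):
    """Number of accumulated lines that were not in the previous-year file."""
    previous = {line.rstrip("\r\n") for line in prev_lines} - {""}
    return len(combine_synonyms(prev_lines, cur_lines)) - len(previous)
```

The growth count gives one number to compare against the line-count check mentioned for the accumulated file.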