
The SPECIALIST Lexicon

LMW Candidates from Expansions of Abb/Acr in Lexicon

I. Introduction

The SPECIALIST Lexicon includes expansions of abbreviations and acronyms. These are good LMW candidates. These expansions are cross-referenced (with an EUI) if they exist in the Lexicon. Those without cross-ref EUIs are:

  • Invalid LMWs (base)
    • phrases like "law(s) of articulation": a noun with a postmodifying prepositional phrase rather than a single NP, which cannot be a LexBuild base. For example: condition on discharge|COD|E0453760
    • chemical names that are more like formulas than like words, such as 1-oleoyl-2-acetyl-sn-glycerol|OAG|E0698010
    • names of studies, which are considered too ephemeral as terms, such as acquired immunodeficiency syndrome test|AIDS test|E0776477
  • Valid LMWs that are to be added to the Lexicon: used as candidates.

Since 2020, this process has been run as a pre-process before the Lexicon is frozen. Through 2019, the validation and candidate generation processes were done during the Lexicon release. The corrections (adding valid LMWs) were made in the post-process, so records received their updated cross-ref EUIs in the following release.

II. Models

Implemented in ${LMW_DIR}/LexCandidates/GetAbbAcrExpansions.java.
The output file is: abbAcrExpansions.data.tag

  • From a LEXICON file, retrieve all abb/acr expansions
    • has cross-ref EUI:
      • [TAG_Y]: correct CR EUI vs. expansion (case sensitive)
      • [TAG_I]: Incorrect CR EUI vs. expansion
        • [|NO_EUI]: EUI is not in the Lexicon: deleted records
        • [baseSet]: Expansion is not base of CR EUI, could be modified records, requires manual fixes
    • no cross-ref EUI:
      • [TAG_P]: expansions are known exceptions (in the abbAcr_expansion_has_EUI_exception list): the expansions are invalid LMWs but have the same spelling as LMWs.
        For example, in E0688694|Lin|noun, the expansion 'lines' is not the plural of line, but a gene name.
      • [TAG_M]: expansions are LMWs (inflVars, lowercased) with multiple matched EUIs: add the matched EUIs and send the file to a linguist to tag:
        • file: abbAcrExpansions.data.hasEui.M
        • Format:
          EUI | POS | Citation | expansion | matched EUI | ACR or ABB | found matching EUI | Tag
      • [TAG_E]: expansions are LMWs (inflVars, lowercased) with 1 matched EUI: add the matched EUI and send the file to a linguist to tag:
        • file: abbAcrExpansions.data.hasEui.E
        • Format:
          EUI | POS | Citation | expansion | matched EUI | ACR or ABB | found matching EUI | Tag

        Tags:
        • [C]: correct as is; the expansion is an invalid LMW and should not have a cross-ref EUI. No fix in LexBuild.
          => Add to ${LMW_DIR}/data/${YEAR}/inData/abbAcrExpansions.data.hasEui.Exception.${YEAR}
        • [Y]: the matched EUI is correct; manually add the EUI to the lexRecord in LexBuild.
          => update in the LB and the Lexicon release
        • [- EUI: E0xxxxxxx]: the expansion is a valid LMW but the matched EUI is not correct; add the correct EUI to the end of the line and fix the record in LexBuild.
      • [TAG_N]: expansions are known invalid base forms (lowercased)
        => Add to ${LEX_CHECK}/data/Files/notBaseForm.data.${YEAR}
      • [TAG_C]: others, candidate list (sent to linguist)
        Tags:
        • [N]: invalid LMW: do nothing
          These tags will become [TAG_N] on the next run.
        • [Y]: valid LMW: add to Lexicon by LexBuild; add CR-EUI to the expansion
        • [M]: modify lexRecords if needed, for cases that require more modifications than just adding a cross-ref EUI.
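
The decision tree above can be sketched roughly as follows. This is a minimal Python illustration and not the logic of GetAbbAcrExpansions.java: the function name `tag_expansion`, the in-memory lookup arguments, the combined labels such as `TAG_I|NO_EUI`, and the order in which the no-cross-ref checks run (taken from the order of the list) are all assumptions made for the example.

```python
from typing import Dict, Optional, Set


def tag_expansion(
    expansion: str,
    cr_eui: Optional[str],
    base_by_eui: Dict[str, str],             # EUI -> base form of each Lexicon record
    euis_by_infl_var: Dict[str, Set[str]],   # lowercased inflectional variant -> EUIs
    known_exceptions: Set[str],              # expansions on the has-EUI exception list
    not_base_forms: Set[str],                # lowercased known invalid base forms
) -> str:
    """Return a tag for one abb/acr expansion (hypothetical helper)."""
    if cr_eui is not None:
        base = base_by_eui.get(cr_eui)
        if base is None:
            return "TAG_I|NO_EUI"    # cross-ref EUI is not in the Lexicon
        if base == expansion:        # case-sensitive comparison
            return "TAG_Y"
        return "TAG_I|baseSet"       # expansion is not the base of the CR EUI

    # No cross-ref EUI from here on.
    if expansion in known_exceptions:
        return "TAG_P"
    matched = euis_by_infl_var.get(expansion.lower(), set())
    if len(matched) > 1:
        return "TAG_M"               # multiple matched EUIs, linguist review
    if len(matched) == 1:
        return "TAG_E"               # one matched EUI, linguist review
    if expansion.lower() in not_base_forms:
        return "TAG_N"
    return "TAG_C"                   # everything else becomes a candidate
```

Keeping the lookups as plain dictionaries and sets makes each branch easy to check against a handful of hand-built records before comparing with the real output file.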

III. Processes

  • Source directory: ${LMW_DIR}/sources/LexCandidates
  • Input Data directory (${IN_DIR}): ${LMW_DIR}/data/${YEAR}/inData/
  • Current Data directory (${CUR_DIR}): ${LMW_DIR}/data/current/
  • Out Data directory (${OUT_DIR}): ${LMW_DIR}/data/${YEAR}/outData/12.LexCandidates
  • Program: ${LMW_DIR}/bin/12.LexAbbAcrCand <YEAR>
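
As a quick illustration of how the directory variables above relate to one another, the sketch below derives them from `LMW_DIR` and `YEAR`. The function name `lex_cand_dirs` and the key `SRC_DIR` are hypothetical; only the path layouts come from the list above.

```python
from pathlib import PurePosixPath
from typing import Dict


def lex_cand_dirs(lmw_dir: str, year: int) -> Dict[str, PurePosixPath]:
    """Build the directory layout listed above (illustrative helper only)."""
    root = PurePosixPath(lmw_dir)
    return {
        "SRC_DIR": root / "sources" / "LexCandidates",   # key name is an assumption
        "IN_DIR": root / "data" / str(year) / "inData",
        "CUR_DIR": root / "data" / "current",
        "OUT_DIR": root / "data" / str(year) / "outData" / "12.LexCandidates",
    }
```

For example, `lex_cand_dirs("/lmw", 2021)["IN_DIR"]` resolves to `/lmw/data/2021/inData`.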

    Step | Description | Inputs | Outputs | Notes
    Pre-Process:
    0
    • Update the latest valid and invalid LMW lists
    • Update candidates
    • Run ${LMW_DIR}/bin/00.CandidateList, steps 1-4
      => Setup: the latest Lexicon and inflVars from the LexBuild daily backup must be linked to ${LMW_DIR}/data/current/inData/.
      => After running 00.CandidateList, two files used in the steps below are auto-updated:
      • ${LMW_DIR}/data/current/inData/notLmw.data.current -> ${LMW_DIR}/data/Candidates/totalTerms.all.lmw.no
      • ${LMW_DIR}/data/current/inData/notBase.data.current -> ${LMW_DIR}/data/Candidates/totalTerms.all.base.no
    Process:
    1  Generate candidate list from Abb/Acr expansions
    • GetAbbAcrExpansions.java
    Inputs:
    • ${IN_DIR}/LEXICON (input)
    • ${IN_DIR}/inflVars.data (valid LMWs)
    • ${CUR_DIR}/notBase.data.current
      => linked to ${LMW_DIR}/data/Candidates/totalTerms.1_2.base.no
      => auto-updated after running ${LMW_DIR}/bin/00.CandidateList, steps 1-4
    • ${LMW_DIR}/data/${YEAR}/inData/abbAcrExpansions.data.hasEui.Exception.${YEAR} (modified from the previous year)
    Outputs:
    • abbAcrExpansions.tag (all tags)
    • abbAcrExpansions.invEui (the cross-ref EUI is invalid)
    • abbAcrExpansions.hasEui (no cross-ref EUI, but the expansion matches EUIs)
    • abbAcrExpansions.rpt (summary report)
    • abbAcrExpansions.data.cand (candidate list)
      => manually copy to ./Cand/abbAcrExpansions.data.cand.${YEAR}
      => link to ./Stats/abbAcrExpansions.data.cand.${YEAR}
      => the first time, go to step 10 to generate the candidate list
      => then repeat steps 0-2 until abbAcrExpansions.data.cand is empty (0)
    2  Split the invalid cross-ref EUI file and the file of expansions that have no cross-ref EUI but match EUIs
    Inputs:
    • abbAcrExpansions.data.invEui
    • abbAcrExpansions.data.hasEui
    Outputs:
    • abbAcrExpansions.data.invEui.NO_EUI
      => Send to linguist to tag [D]
      • [D]: if the CR of the expansion is a deleted record (invalid LMW), the cross-ref EUI should be manually removed.
      • Others: the expansion is a valid LMW; this case might require changing the expansion to citation form, restoring the deleted record, or creating a new lexRecord, and modifying the CR-EUI, etc.

      => update ${LEX_CHECK}/data/File/notBaseForm.data.${YEAR}
      • this file should be empty after the update (notBaseForm.data)
    • abbAcrExpansions.data.invEui.WRONG_CIT
      => wrong citation; once fixed, this file should be empty
    • abbAcrExpansions.data.hasEui.E
      => Exceptions, expansion has 1 matched EUI
      => Send to linguist to tag:
      • [C]: correct as is; the expansion is an invalid LMW and should not have a cross-ref EUI. No fix in LB.
      • [Y]: the suggested matched EUI is correct; manually add the EUI to the lexRecord in LB.
      • [- EUI: E0xxxxxxx]: the expansion is a valid LMW but the suggested matched EUI is not correct; add the correct EUI to the end of the line and fix the record in LB.
    • abbAcrExpansions.data.hasEui.M
      => Exceptions, expansion has multiple matched EUIs
      => Send to linguist to tag:
      • [C]: correct as is; the expansion should not have a cross-ref EUI (even though the spelling is a valid base). => add to abbAcrExpansions.data.hasEui.Exception.${YEAR}
      • [Y]: if the 1 matched EUI is correct (need to update the Lexicon in LexBuild)
      • EUI: add the correct EUI; might need to update the cross-ref EUI, modify the expansion, or add a new record (if the expansion is an LMW) to the Lexicon
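
The split of the hasEui file into its .E and .M parts comes down to counting matched EUIs per expansion. A minimal sketch under stated assumptions: the `Record` shape (an expansion paired with its matched EUIs) and the function name `split_has_eui` are hypothetical, and reading or writing the actual files is left out.

```python
from typing import Iterable, List, Tuple

# (expansion, matched EUIs): an assumed in-memory shape, not the real file format
Record = Tuple[str, List[str]]


def split_has_eui(records: Iterable[Record]) -> Tuple[List[Record], List[Record]]:
    """Separate records with exactly one matched EUI from those with several."""
    single: List[Record] = []     # would go to the *.hasEui.E file
    multiple: List[Record] = []   # would go to the *.hasEui.M file
    for expansion, euis in records:
        unique = sorted(set(euis))            # ignore repeated EUIs
        if len(unique) == 1:
            single.append((expansion, unique))
        elif len(unique) > 1:
            multiple.append((expansion, unique))
        # Records with no matched EUI are skipped here; they do not belong in hasEui.
    return single, multiple
```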
    Post-Process:
    10  Auto-tag candidate list
    • CandidateUtil.FilterTagCandFile
    Inputs:
    • ${STATS_DIR}/abbAcrExpansions.data.cand.${YEAR}
    • ${LMW_DIR}/data/Candidates/0.LexiconInflVars/inflVars.data.current (valid LMWs)
      => ${LMW_DIR}/data/Candidates/0.LexiconInflVars/inflVars.data.current.1.uSort
    • ${LMW_DIR}/data/Candidates/totalTerms.all.base.no (invalid LMWs)
      => generated from step 0 (00.CandidateList, steps 1-4)
    Outputs (Dir: ./Stats):
    • abbAcrExpansions.data.cand.${YEAR}.autoTag (all tags)
    • abbAcrExpansions.data.cand.${YEAR}.rmYesNo
      This file must be empty (wc=0) once updates/tags are completed
    • abbAcrExpansions.data.cand.${YEAR}.rmYesTagNo
      => Before update, this file is used as candidate list sent to linguist
      • No tag
      • if the expansion is a valid LMW, add to Lexicon, add CR-EUI to the expansion
    • notBaseFormUpdate.data.${YEAR}
      • flds 4,2 abbAcrExpansions.data.cand.${YEAR}.rmYesTagNo.${YEAR} > notBaseFormUpdate.data.${YEAR}
      • Append notBaseFormUpdate.data.${YEAR} to ${LexCheck}/data/Files/notBaseForm.data.${YEAR}
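
The `flds 4,2` call above appears to pull fields 4 and 2, in that order, from each line. The sketch below shows that selection in Python; it assumes pipe-delimited lines and 1-based field numbers, and the function name `select_fields` is hypothetical rather than part of the LMW tools.

```python
from typing import Iterable, List, Sequence


def select_fields(line: str, field_numbers: Sequence[int] = (4, 2), sep: str = "|") -> str:
    """Return the requested 1-based fields of one line, joined in the given order."""
    parts = line.rstrip("\n").split(sep)
    return sep.join(parts[n - 1] for n in field_numbers)


def select_fields_from_lines(lines: Iterable[str]) -> List[str]:
    """Apply select_fields to every non-empty line."""
    return [select_fields(line) for line in lines if line.strip()]
```

A line such as `a|b|c|d` would yield `d|b` under these assumptions.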
    After the candidate list is completed:
    • Add/Link candidates to ${Candidates}/1.LexiconAbbAcrExpansion/abbAcrExpansions.data.cand.${YEAR}
    • Run 00.CandidateList, steps 1-4
      This step updates the valid and invalid LMW lists, and thus updates the candidates.
    • Rerun steps 1-2 until *.cand = 0. Because candidates that are LMWs are now in the Lexicon and invalid LMWs are tagged as invalid automatically (by the updated totalTerms.all.base.no from 00.CandidateList), no new candidates should be found.
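
The rerun-until-empty cycle described above can be pictured as a simple loop. This is only a sketch: `run_steps` stands in for running 00.CandidateList and steps 1-2 and returning the remaining candidates, `review` stands in for the linguist tagging and LexBuild updates, and the `max_rounds` guard is an added assumption, not something the process defines.

```python
from typing import Callable, List


def run_until_no_candidates(
    run_steps: Callable[[], List[str]],
    review: Callable[[List[str]], None],
    max_rounds: int = 10,
) -> int:
    """Repeat the steps until the candidate list is empty; return the round count."""
    for round_number in range(1, max_rounds + 1):
        candidates = run_steps()      # hypothetical: regenerate the *.cand list
        if not candidates:
            return round_number       # *.cand = 0, nothing left to tag
        review(candidates)            # hypothetical: tag and update the Lexicon
    raise RuntimeError("candidate list still not empty after max_rounds")
```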