The SPECIALIST Lexicon

Not Base Forms/LMWs Files From LexCheck

The LexCheck releases include files of terms that are not base forms (invalid LMWs and inflections of LMWs) and of terms that are not valid LMWs. These files are derived from the expansions of abbreviations or acronyms in LEXICON. Some expansions are not valid LMWs and thus do not have a cross-ref EUI because:

  • not a single POS:
    These terms often match the pattern “law(s) of articulation”: a noun with a postmodifying prepositional phrase. Because such a term is not a single NP, it cannot be a Lexbuild base.
    such as cause of death|COD|E0453760, condition on discharge|COD|E0453760
  • chemical names that are more like formulas than like words:
    such as “1-oleoyl-2-acetyl-sn-glycerol”, which is an expansion of OAG|E0698010 but is not word-like enough to be a Lexbuild record.
  • names of studies:
    We have also declined to make Lexbuild records for names of studies, considering them to be too ephemeral as terms. If those studies have acronyms or abbreviations, the study names can appear as expansions in those records.
    such as "acquired immunodeficiency syndrome test"
These files are good sources of invalid LMWs. They are added to prevCand.data for the overall list of valid and invalid LMWs. This page is a snapshot of the tagging status of the latest candidate list.
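
The example records above (such as cause of death|COD|E0453760) appear to be pipe-delimited. A minimal Python sketch for reading them follows. It assumes three fields per line: expansion, acronym/abbreviation, and the EUI of the acronym/abbreviation record. The names Expansion and parse_expansion are hypothetical and not part of LexCheck.

```python
from collections import defaultdict
from typing import NamedTuple


class Expansion(NamedTuple):
    """One expansion record; the field meanings are an assumption."""
    expansion: str  # expanded term, e.g. "cause of death"
    acronym: str    # acronym or abbreviation, e.g. "COD"
    eui: str        # EUI of the acronym/abbreviation record


def parse_expansion(line: str) -> Expansion:
    """Split one 'expansion|acronym|EUI' line into its three fields."""
    expansion, acronym, eui = line.strip().split("|")
    return Expansion(expansion, acronym, eui)


# The two sample records quoted in the text above.
records = [
    parse_expansion("cause of death|COD|E0453760"),
    parse_expansion("condition on discharge|COD|E0453760"),
]

# Group expansions by the EUI of the acronym record they belong to.
by_eui = defaultdict(list)
for rec in records:
    by_eui[rec.eui].append(rec.expansion)

for eui, expansions in by_eui.items():
    print(eui, expansions)
```

Grouping by EUI makes it easy to see that one acronym record can carry several expansions, each of which is then judged separately as a valid or invalid LMW.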

  • Program: ${MULTIWORDS}/bin/00.CandidateList
    2
  • Data directory: ${MULTIWORDS}/data/Candidate/
  • In Files:
    • ./5.LexCheckNotBaseForm/
    • ./6.LexCheckNotLmw/
  • Out Files:
    • notBaseLmw.data
    • notBaseLmw.data.yes
    • notBaseLmw.data.no
    • notBaseLmw.data.rpt
    • 5.LexCheckNotBaseForm
      • terms that are not base forms
      • This file is updated during the validation step of the annual Lexicon release
        • An expansion of an acr/abb or a nominalization has a cross-ref EUI if it is a valid LMW
        • Those without an EUI are invalid LMWs; those tagged [I] or [N] are not base forms
        • A term tagged [I] can still be a valid LMW if it is an inflectional variant (inflVars)

      Year   Total  Valid        Invalid
      2015   6661   201 (3.02%)  6460 (96.98%)
      2016   8418   276 (3.28%)  8142 (96.72%)
      2017   8688   287 (3.30%)  8401 (96.70%)
      2018   9196   300 (3.26%)  8896 (96.74%)
      2019   9335   301 (3.22%)  9034 (96.78%)
      2020   9395   336 (3.58%)  9059 (96.42%)
      2021   9426   337 (3.58%)  9089 (96.42%)
      Accu.  9426   337 (3.58%)  9089 (96.42%)

      * These files are cumulative, so the Accu. row must be the same as the latest release.

    • 6.LexCheckNotLmw
      • terms that are not valid LMWs
      • This file is updated during the validation step of the annual Lexicon release
        • An expansion of an acr/abb or a nominalization has a cross-ref EUI if it is a valid LMW
        • Those without an EUI are invalid LMWs; those tagged [N] are invalid LMWs
        • A term in this file can still be a valid LMW because of tagging errors or changes in linguistic usage

        Year   Total  Valid       Invalid
        2017   407    24 (5.90%)  383 (94.10%)
        2018   777    27 (3.47%)  750 (96.53%)
        2019   916    28 (3.06%)  888 (96.94%)
        2020   918    28 (3.05%)  890 (96.95%)
        2021   918    28 (3.05%)  890 (96.95%)
        Accu.  918    28 (3.05%)  890 (96.95%)

        * These files are cumulative, so the Accu. row must be the same as the latest release.

    • Out Tagged Not Base/LMW Files:
      • terms from all of the above sources that were evaluated previously. Most of them are invalid LMWs.
      • The NotBaseForm files seem to contain the NotLmw files.
      • The combined file is auto-tagged as valid/invalid LMWs against the latest Lexicon (inflVars.data)
      • out files: notBaseLmw.data.*

        Total  Valid        Invalid        Date        Notes
        9335   291 (3.12%)  9044 (96.88%)  2018-11-15  2.MNSMatcherParAcr, 2017
        9335   293 (3.14%)  9042 (96.86%)  2019-01-03  2.MNSMatcherParAcr, 2018
        9335   301 (3.22%)  9034 (96.78%)  2019-05-20  3.DMNSMatcherCuiEndWord, 2017
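
The auto-tagging step described above can be illustrated with a rough Python sketch. It is not the actual 00.CandidateList program. It assumes that a candidate counts as valid when it matches, case-insensitively, the first pipe-delimited field of a line in inflVars.data. The function names and the toy lexicon lines (including their EUIs) are made up for illustration.

```python
def load_lexicon_terms(lines):
    """Collect lowercased terms from the first field of pipe-delimited lines.

    Assumption: the term is the first '|'-separated field of each line.
    """
    terms = set()
    for line in lines:
        line = line.strip()
        if line:
            terms.add(line.split("|")[0].lower())
    return terms


def auto_tag(candidates, lexicon_terms):
    """Split candidates into (valid, invalid) by lookup in the lexicon terms."""
    valid, invalid = [], []
    for term in candidates:
        (valid if term.lower() in lexicon_terms else invalid).append(term)
    return valid, invalid


def summarize(n_valid, n_invalid):
    """Format a Total / Valid / Invalid row like the tables on this page."""
    total = n_valid + n_invalid
    if total == 0:
        return "0\t0 (0.00%)\t0 (0.00%)"
    return (f"{total}\t{n_valid} ({100 * n_valid / total:.2f}%)"
            f"\t{n_invalid} ({100 * n_invalid / total:.2f}%)")


# Toy stand-in for inflVars.data; these lines and EUIs are invented.
toy_inflvars = [
    "heart attack|noun|base|E0000001|heart attack|heart attack",
    "heart attacks|noun|plural|E0000001|heart attack|heart attack",
]
candidates = [
    "heart attack",
    "cause of death",
    "acquired immunodeficiency syndrome test",
]

valid, invalid = auto_tag(candidates, load_lexicon_terms(toy_inflvars))
print(summarize(len(valid), len(invalid)))
```

In this sketch the valid and invalid lists would correspond to notBaseLmw.data.yes and notBaseLmw.data.no, and the summary row to a line in the table above. Calling summarize(301, 9034) reproduces the counts and percentages of the 2019-05-20 row.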