Not Base Forms/LMWs Files From LexCheck
The LexCheck releases include files of terms that are not base forms (invalid LMWs and inflections of LMWs) and of terms that are not valid LMWs. These files are derived from the expansions of abbreviations and acronyms in the LEXICON. Some expansions are not valid LMWs, and thus do not have a cross-ref EUI, because:
- not a single POS:
These terms often match the pattern of “law(s) of articulation”: a noun with a postmodifying prepositional phrase. Because such a term is not a single NP, it cannot be a Lexbuild base.
such as cause of death|COD|E0453760, condition on discharge|COD|E0453760
- chemical names that are more like formulas than like words:
For example, “1-oleoyl-2-acetyl-sn-glycerol” is an expansion of OAG|E0698010, but that expansion is not word-like enough to be a Lexbuild record.
- names of studies:
We have also declined to make Lexbuild records for names of studies, considering them to be too ephemeral as terms. If those studies have acronyms or abbreviations, the study names can appear as expansions in those records.
such as "acquired immunodeficiency syndrome test"
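The example records above appear to be pipe-delimited, with the expansion first, then the acronym or abbreviation, then the EUI of the acronym record. A minimal Python sketch for reading such lines, assuming that three-field layout (the names `ExpansionRecord` and `parse_expansion` are illustrative and are not part of LexCheck):

```python
from typing import NamedTuple


class ExpansionRecord(NamedTuple):
    expansion: str  # expanded term, e.g. "cause of death"
    acronym: str    # acronym or abbreviation the expansion belongs to
    eui: str        # EUI of the acronym record (assumed meaning of field 3)


def parse_expansion(line: str) -> ExpansionRecord:
    """Split one 'expansion|acronym|EUI' line into its three fields."""
    fields = [f.strip() for f in line.strip().split("|")]
    if len(fields) != 3:
        raise ValueError(
            f"expected 3 pipe-delimited fields, got {len(fields)}: {line!r}"
        )
    return ExpansionRecord(*fields)


record = parse_expansion("cause of death|COD|E0453760")
print(record.expansion, record.acronym, record.eui)
```

Note that the EUI in such a record identifies the acronym, not the expansion, which is why the expansion itself still has no cross-ref EUI.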
These files are good sources of invalid LMWs. They are added to prevCand.data, the overall list of valid and invalid LMWs. This page is a snapshot of the tagging completion status of the latest candidate list.
- Program: ${MULTIWORDS}/bin/00.CandidateList
2
- Data directory: ${MULTIWORDS}/data/Candidate/
- In Files:
- ./5.LexCheckNotBaseForm/
- ./6.LexCheckNotLmw/
- Out Files:
- notBaseLmw.data
- notBaseLmw.data.yes
- notBaseLmw.data.no
- notBaseLmw.data.rpt
5.LexCheckNotBaseForm
Year | Total | Valid | Invalid
---|---|---|---
2015 | 6661 | 201 (3.02%) | 6460 (96.98%)
2016 | 8418 | 276 (3.28%) | 8142 (96.72%)
2017 | 8688 | 287 (3.30%) | 8401 (96.70%)
2018 | 9196 | 300 (3.26%) | 8896 (96.74%)
2019 | 9335 | 301 (3.22%) | 9034 (96.78%)
2020 | 9395 | 336 (3.58%) | 9059 (96.42%)
2021 | 9426 | 337 (3.58%) | 9089 (96.42%)
Accu. | 9426 | 337 (3.58%) | 9089 (96.42%)
* These files are cumulative, so the Accu. row must match the latest release.
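The percentages in the table can be recomputed from the Total and Valid counts, which is a quick way to catch transcription slips. A minimal Python sketch, assuming Invalid is simply Total minus Valid and that percentages are rounded to two decimals (the function name `split_percentages` is illustrative):

```python
from typing import Tuple


def split_percentages(total: int, valid: int) -> Tuple[float, float]:
    """Return (valid %, invalid %) of total, each rounded to two decimals."""
    if total <= 0 or not 0 <= valid <= total:
        raise ValueError("need total > 0 and 0 <= valid <= total")
    invalid = total - valid
    return round(100 * valid / total, 2), round(100 * invalid / total, 2)


# 2021 row: Total 9426, Valid 337, Invalid 9089
valid_pct, invalid_pct = split_percentages(9426, 337)
print(valid_pct, invalid_pct)  # 3.58 96.42
```

Because the two counts always sum to the total, the two percentages in a row should sum to 100 up to rounding.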
6.LexCheckNotLmw
- terms that are not valid LMWs
- This file is updated during the validation step of the annual Lexicon release. However, it is no longer updated after 2023.
- The expansion of an acr/abb or a nominalization has a cross-ref EUI if it is a valid LMW
- Those without an EUI are invalid LMWs, as are those tagged [N]
- A term in this file can still be a valid LMW because of tagging errors or changes in linguistic usage.
Year | Total | Valid | Invalid
---|---|---|---
2017 | 407 | 24 (5.90%) | 383 (94.10%)
2018 | 777 | 27 (3.47%) | 750 (96.53%)
2019 | 916 | 28 (3.06%) | 888 (96.94%)
2020 | 918 | 28 (3.05%) | 890 (96.95%)
2021 | 918 | 28 (3.05%) | 890 (96.95%)
Accu. | 918 | 28 (3.05%) | 890 (96.95%)
* These files are cumulative, so the Accu. row must match the latest release.
Out Tagged Not Base/LMW Files:
- terms from all of the above sources that were previously evaluated. Most of them are invalid LMWs.
- The NotBaseForm files seem to contain the NotLmw files.
- The combined file is auto-tagged as valid/invalid LMWs against the latest Lexicon (inflVars.data)
- out files: notBaseLmw.data.*
Total | Valid | Invalid | Date | Notes
---|---|---|---|---
9335 | 291 (3.12%) | 9044 (96.88%) | 2018-11-15 | 2.MNSMatcherParAcr, 2017
9335 | 293 (3.14%) | 9042 (96.86%) | 2019-01-03 | 2.MNSMatcherParAcr, 2018
9335 | 301 (3.22%) | 9034 (96.78%) | 2019-05-20 | 3.DMNSMatcherCuiEndWord, 2017
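The auto-tagging step described above can be sketched roughly as follows. This is not the actual 00.CandidateList program; it is a minimal Python illustration that assumes inflVars.data is pipe-delimited with the term in the first field, that candidates are one term per line, and that matching is case-insensitive. The function names and the sample lines (including the EUI) are made up for the example.

```python
from typing import Iterable, List, Set, Tuple


def load_lexicon_terms(infl_vars_lines: Iterable[str]) -> Set[str]:
    """Collect lowercase multiword terms from the first pipe-delimited field."""
    terms: Set[str] = set()
    for line in infl_vars_lines:
        term = line.split("|", 1)[0].strip().lower()
        if " " in term:  # keep only multiword entries
            terms.add(term)
    return terms


def auto_tag(
    candidates: Iterable[str], lexicon_terms: Set[str]
) -> Tuple[List[str], List[str]]:
    """Split candidates into (valid, invalid) by lookup in the lexicon terms."""
    yes: List[str] = []
    no: List[str] = []
    for cand in candidates:
        term = cand.strip()
        if not term:
            continue  # skip blank lines
        (yes if term.lower() in lexicon_terms else no).append(term)
    return yes, no


# Made-up sample data; real inflVars.data lines and EUIs will differ.
sample_infl_vars = [
    "heart attack|128|1|E0000000|heart attack|heart attack",
    "heart|128|1|E0000001|heart|heart",
]
sample_candidates = ["Heart attack", "cause of death", ""]

valid, invalid = auto_tag(sample_candidates, load_lexicon_terms(sample_infl_vars))
print(valid)    # ['Heart attack']
print(invalid)  # ['cause of death']
```

In this sketch the two returned lists play the roles of notBaseLmw.data.yes and notBaseLmw.data.no. Reading and writing the real files is left out.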