SPECIALIST Lexicon

Total Data Set Files and Usage

The procedure for adding LMWs is:

It would be very useful to collect all manually tagged terms. The collection can be used for:

Final filter to exclude previously tagged terms
- Use all inflVars to filter out valid LMWs (terms already in the Lexicon)
- Use the collected invalid LMWs to tag candidates as invalid, as a reference for linguists (some invalid terms might become valid).
  => Please note that the above two data sets change whenever the Lexicon is updated, a new candidate list is completed, or new not-base/LMW files are updated in LexCheck
Use as a training/test data set for deep learning models
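
The final filter described above can be sketched as follows. This is only a minimal illustration, not the actual implementation: the function name split_candidates, the in-memory lists, and the case-insensitive comparison are all assumptions made for the example.

```python
def split_candidates(candidates, infl_vars, tagged_invalid):
    """Split candidate terms into (remaining, already_valid, previously_invalid).

    infl_vars:      terms already in the Lexicon (valid LMWs)
    tagged_invalid: terms previously tagged as invalid LMWs
    Matching is case-insensitive here; that is an assumption of this sketch.
    """
    def norm(term):
        return term.strip().lower()

    valid_keys = {norm(t) for t in infl_vars}
    invalid_keys = {norm(t) for t in tagged_invalid}

    remaining, already_valid, previously_invalid = [], [], []
    for term in candidates:
        key = norm(term)
        if key in valid_keys:
            # Already in the Lexicon: filter out
            already_valid.append(term)
        elif key in invalid_keys:
            # Previously tagged invalid: keep as a reference for linguists
            previously_invalid.append(term)
        else:
            # Not seen before: still needs manual tagging
            remaining.append(term)
    return remaining, already_valid, previously_invalid
```

Because both reference sets change when the Lexicon or the LexCheck files are updated, a filter like this would need to be rerun against the refreshed sets rather than cached.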

These manually tagged data are collected from two sources:

Out Files:

totalData.data.*

Date	Notes	Total Candidate	Valid LMWs	Invalid LMWs
		totalData.data	totalData.data.yes	totalData.data.no
2018-11-15	2.MNSMatcherParAcr, 2017	31004	16331 (52.67%)	14673 (47.33%)
2019-01-03	2.MNSMatcherParAcr, 2018	31806	16924 (53.21%)	14882 (46.79%)
2019-05-20	3.DMNSMatcherCuiEndWord, 2017	33751	16924 (50.14%)	16827 (49.86%)
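
The Total Candidate and percentage columns can be recomputed from the two counts. The sketch below assumes that Total Candidate is the sum of Valid LMWs and Invalid LMWs, which holds for the rows shown; tag_summary is a hypothetical helper introduced only for this example.

```python
def tag_summary(valid_count, invalid_count):
    """Return (total, valid_cell, invalid_cell) formatted like the table cells."""
    total = valid_count + invalid_count
    if total == 0:
        raise ValueError("no tagged candidates")
    valid_pct = 100 * valid_count / total
    invalid_pct = 100 * invalid_count / total
    return (
        total,
        f"{valid_count} ({valid_pct:.2f}%)",
        f"{invalid_count} ({invalid_pct:.2f}%)",
    )


# Counts taken from the 2019-01-03 row
print(tag_summary(16924, 14882))
# → (31806, '16924 (53.21%)', '14882 (46.79%)')
```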