Lexicon Test - Establish the Gold Standard
Introduction
The SPECIALIST Lexicon is a good corpus for testing the spVar model.
It includes spelling variants of base forms as well as inflectional spelling variants.
Model (GetGoldStdFromLex.java)
- Inputs:
- inflVars.data:
inflVar | cat | infl | EUI | base | citation
- LRSPL:
- inflSpVars.data:
- Outputs:
- goldStd.data
where:
- inflVar: lowercased inflVar, unique
- spVar tag: true|false
- Lex.terms.out (all terms from Lexicon)
- Algorithm:
- Go through inflVars.data
- Tag an inflVar as true if its EUI is in the EUI set of base spVars (from LRSPL)
- Tag an inflVar as true if the term is an inflSpVar (from inflSpVars.data)
- If an inflVar exists in multiple lexRecords (EUIs), it is tagged as true if any one of them has spVars
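The tagging steps above can be sketched roughly as follows. This is a minimal illustration, not the actual GetGoldStdFromLex.java: the class name GoldStdSketch, the method tagInflVars, and the sample records and EUIs in main are all hypothetical. It assumes the EUI set of base spVars (from LRSPL) and the set of lowercased inflSpVar terms (from inflSpVars.data) have already been loaded into memory, since the formats of those two files are not described here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class GoldStdSketch {

    /**
     * Tags each unique, lowercased inflVar as true (has spVars) or false.
     *
     * @param inflVarLines   lines in the form inflVar|cat|infl|EUI|base|citation
     * @param baseSpVarEuis  EUIs that have base spVars (assumed preloaded from LRSPL)
     * @param inflSpVarTerms lowercased inflSpVar terms (assumed preloaded from inflSpVars.data)
     */
    public static Map<String, Boolean> tagInflVars(
            List<String> inflVarLines,
            Set<String> baseSpVarEuis,
            Set<String> inflSpVarTerms) {
        Map<String, Boolean> goldStd = new LinkedHashMap<>();
        for (String line : inflVarLines) {
            String[] fields = line.split("\\|", -1);
            if (fields.length < 4) {
                continue; // skip lines without an EUI field
            }
            String inflVar = fields[0].trim().toLowerCase(Locale.ROOT);
            String eui = fields[3].trim();
            boolean isSpVar = baseSpVarEuis.contains(eui)
                    || inflSpVarTerms.contains(inflVar);
            // An inflVar seen under several EUIs is true if any one of them is true.
            goldStd.merge(inflVar, isSpVar, Boolean::logicalOr);
        }
        return goldStd;
    }

    public static void main(String[] args) {
        // Invented sample records, for illustration only.
        List<String> lines = List.of(
                "Cat|noun|base|E0000002|cat|cat",
                "cat|verb|base|E0000003|cat|cat",
                "dog|noun|base|E0000004|dog|dog",
                "foetuses|noun|plural|E0000005|fetus|fetus");
        Map<String, Boolean> goldStd =
                tagInflVars(lines, Set.of("E0000003"), Set.of("foetuses"));
        goldStd.forEach((term, tag) -> System.out.println(term + "|" + tag));
    }
}
```

Merging with a logical OR keeps one entry per lowercased inflVar, so a single pass over inflVars.data is enough and the order of records does not affect the final tag.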
- What is missing:
The following spelling variants are missed by this program (false negatives). These missing spVars are not included in the gold standard (final submission) for the AIAM.2016 multiword paper.