ASCII LEXICON, 1st Version (09~10)
I. Introduction
The Specialist LEXICON is distributed annually with the UMLS in UTF-8 format. Some NLP projects that use the Specialist LEXICON still handle only ASCII characters. In response to requests from user groups, a pure ASCII version of the LEXICON has been distributed since 2009.
II. Algorithm
LEXICON content | Action | Notes & Example |
---|---|---|
{base=filler | N/A | All bases are unique |
spelling_variant=filler | remove if it is duplicated | spelling_variant=résumé |
abbreviation_of=abbreviation | remove if it is duplicated | None |
acronym_of=acronyms | remove if it is duplicated | None |
nominalization_of=filler | remove if it is duplicated | None |
variants=irreg | remove if it is duplicated | `irreg\|saute\|sautes\|sauted\|sauted\|sauteing\|` |
compl=pphr( | N/A | Needs manual cleanup (none) |
trademark=filler( | N/A | Needs manual cleanup (none) |
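The actions in the table can be illustrated as a per-record conversion pass. The following is a minimal Python sketch, not the tool actually used to build the ASCII LEXICON, and it rests on several assumptions: each record is a list of `field=value` lines that opens with `{base=` and closes with `}`; ASCII conversion is approximated with Unicode NFKD decomposition plus a small hypothetical `FALLBACK` table rather than the official mapping; "duplicated" is read as a converted line repeating an earlier line of the same field, or (an extra assumption) a spelling variant that folds to the base; and non-ASCII `compl`/`trademark` lines are left unchanged and reported for manual cleanup. The names `to_ascii`, `convert_record`, `DEDUP_FIELDS`, `MANUAL_FIELDS`, and `FALLBACK` are all hypothetical.

```python
import unicodedata

# Fields the table says to remove when their ASCII form is duplicated.
DEDUP_FIELDS = {"spelling_variant", "abbreviation_of", "acronym_of",
                "nominalization_of", "variants"}
# Fields the table marks "needs manual cleanup": never auto-converted here.
MANUAL_FIELDS = {"compl", "trademark"}
# Hypothetical fallback for letters that NFKD does not decompose.
FALLBACK = {"æ": "ae", "Æ": "AE", "œ": "oe", "Œ": "OE",
            "ø": "o", "Ø": "O", "ß": "ss"}


def to_ascii(text: str) -> str:
    """Approximate ASCII folding (a stand-in, not the official mapping).

    Each character is NFKD-decomposed and its non-ASCII parts (e.g. combining
    accents) are dropped; a character with no ASCII part and no FALLBACK entry
    disappears entirely.
    """
    out = []
    for ch in text:
        if ch in FALLBACK:
            out.append(FALLBACK[ch])
        else:
            decomposed = unicodedata.normalize("NFKD", ch)
            out.append("".join(c for c in decomposed if c.isascii()))
    return "".join(out)


def convert_record(lines):
    """Convert one LEXICON record (a list of "field=value" lines) to ASCII.

    Returns (ascii_lines, manual_lines); manual_lines are compl/trademark
    lines left unchanged because they still contain non-ASCII characters.
    """
    result, manual = [], []
    seen = set()   # (field, ASCII value) pairs already emitted
    base = None    # ASCII form of this record's base
    for line in lines:
        stripped = line.strip()
        key, _, value = stripped.partition("=")
        field = key.lstrip("{")   # the first line of a record is "{base=..."
        if field in MANUAL_FIELDS and not stripped.isascii():
            manual.append(line)   # keep as-is and flag for a human
            result.append(line)
            continue
        ascii_value = to_ascii(value)
        if field == "base":
            base = ascii_value    # "N/A": bases are unique, never removed
        elif field in DEDUP_FIELDS:
            # Assumption: a spelling variant that folds to the base is a duplicate.
            if (field, ascii_value) in seen or (
                    field == "spelling_variant" and ascii_value == base):
                continue
            seen.add((field, ascii_value))
        result.append(to_ascii(line))   # indentation (tabs) is preserved
    return result, manual
```

For a record whose two `variants=irreg|...` lines differ only in accents (for example, one spelled with `sauté` and one with `saute`), only the first line survives after conversion. Keying the duplicate check on the (field, value) pair keeps identical strings in different fields, such as an abbreviation and an acronym, from removing each other.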