Classification Types: A New Feature in the SPECIALIST Lexicon.

Lu C, Payne A, Demner-Fushman D

AMIA Fall Symposium, 2019.


The SPECIALIST Lexicon (hereafter, the Lexicon), distributed by the National Library of Medicine (NLM) as one of the Unified Medical Language System (UMLS) knowledge sources, serves as an underlying resource for popular NLP tools such as SemRep, MetaMap, cTAKES, CSpell, and the SPECIALIST Lexical Tools. A new feature, the classification type (CT), is proposed as an addition to the Lexicon. A term's classification type can be archaic, source, informal, or other. First, terms classified as archaic, such as cozen, colde, and benight, are no longer in common use in modern corpora such as MEDLINE. These terms may have modern equivalents in the same lexical record (colde for cold) or in a separate record (ye for the). Second, when the source text originates outside the US, its non-US spelling variants need to be normalized to US English. For example, the British English spellings analyse, leukaemia, and tumour can be normalized to the US English analyze, leukemia, and tumor. Such terms are classified as source. Third, consumers often use informal language when they ask questions. For example, bomb (for success) and grandpa (for grandfather) are used primarily in colloquial contexts. Automated understanding of consumer questions could improve if the Lexicon provided informal terms together with their cross-referenced (CR) formal terms (synonyms).
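To make the idea concrete, here is a minimal Python sketch of how a downstream tool might use classification types to normalize text. It assumes a hypothetical, flattened term table in which each entry carries a CT and one preferred equivalent (a modern, US, or cross-referenced formal term). The names `ClassificationType`, `TermEntry`, `ENTRIES`, and `normalize` are illustrative only. They are not part of the Lexicon's distributed format or of any NLM tool API, and the entries are toy data built from the examples above, not real Lexicon records.

```python
from dataclasses import dataclass
from enum import Enum


class ClassificationType(Enum):
    """The four classification type values named in the abstract."""
    ARCHAIC = "archaic"
    SOURCE = "source"
    INFORMAL = "informal"
    OTHER = "other"


@dataclass(frozen=True)
class TermEntry:
    """Hypothetical flat entry: a term, its CT, and a preferred equivalent.

    `preferred` stands in for either a variant in the same lexical record
    (colde -> cold) or a cross-referenced term in a separate record (ye -> the).
    """
    term: str
    ct: ClassificationType
    preferred: str


# Toy entries taken from the abstract's examples (assumption: not real Lexicon data).
ENTRIES = [
    TermEntry("colde", ClassificationType.ARCHAIC, "cold"),
    TermEntry("ye", ClassificationType.ARCHAIC, "the"),
    TermEntry("analyse", ClassificationType.SOURCE, "analyze"),
    TermEntry("leukaemia", ClassificationType.SOURCE, "leukemia"),
    TermEntry("tumour", ClassificationType.SOURCE, "tumor"),
    TermEntry("grandpa", ClassificationType.INFORMAL, "grandfather"),
]

# Index entries by their lowercased term for constant-time lookup.
INDEX = {entry.term: entry for entry in ENTRIES}


def normalize(tokens, types=frozenset(ClassificationType)):
    """Replace each token whose CT is in `types` with its preferred equivalent.

    Lookup is on the lowercased token; tokens not in the index, or whose CT
    is not selected, are returned unchanged.
    """
    result = []
    for token in tokens:
        entry = INDEX.get(token.lower())
        if entry is not None and entry.ct in types:
            result.append(entry.preferred)
        else:
            result.append(token)
    return result


if __name__ == "__main__":
    print(normalize("my grandpa has a tumour".split()))
    # Normalize only source (non-US spelling) variants, leaving informal terms alone.
    print(normalize(["analyse", "grandpa"], types={ClassificationType.SOURCE}))
```

The `types` parameter reflects one possible design choice: different applications may want to normalize different classes, for example converting British spellings while preserving informal wording. Note also that a naive lookup like this ignores context, which is why an ambiguous informal term such as bomb is left out of the toy table.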
