The SPECIALIST Lexicon

About Lexicon

The SPECIALIST Lexicon was first released in 1994 and is designed as a general English lexicon with extensive biomedical terminology. It encodes a broad range of linguistic knowledge, including syntactic categories, variant forms, and specifications of acronyms and abbreviations. The Lexicon is a fundamental resource for biomedical natural language processing (NLP) and played a central role in the development of the UMLS Metathesaurus.

The SPECIALIST Lexicon is composed of unit lexical records, each of which can be represented in text format, in XML format, or as a Java object. For example, the record for "medicine" can be represented as follows:

  • Text Format
    {base=medicine
    entry=E0039272
    	cat=noun
    	variants=reg
    	variants=uncount
    }
    

  • XML Format
    <?xml version="1.0" encoding="UTF-8"?>
    <lexRecord>
    	<base>medicine</base>
    	<eui>E0039272</eui>
    	<cat>noun</cat>
    	<inflVars cit="medicine" unInfl="medicine" eui="E0039272" cat="noun" infl="base" type="basic">medicine</inflVars>
    	<inflVars cit="medicine" unInfl="medicine" eui="E0039272" cat="noun" infl="singular" type="basic">medicine</inflVars>
    	<inflVars cit="medicine" unInfl="medicine" eui="E0039272" cat="noun" infl="plural" type="reg">medicines</inflVars>
    	<nounEntry>
    		<variants>reg</variants>
    		<variants>uncount</variants>
    	</nounEntry>
    </lexRecord>
    

  • Java Object
    lexRecord
    • base:string
    • eui:string
    • cat:string
    • inflVars: vector(inflVars)
    • nounEntry: (nounEntry)
    inflVars
    • inflVar:string
    • cit:string
    • unInfl:string
    • eui:string
    • cat:string
    • infl:string
    • type:string
    nounEntry
    • variants: vector(string)
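
The three representations above carry the same information, which a short script can make concrete. The following is a minimal Python sketch, not part of the Lexicon's own tools: the names `LexRecord`, `InflVar`, `parse_text_record`, and `parse_xml_record` are hypothetical and only mirror the object layout shown above. It assumes a text record is a brace-delimited block of `key=value` lines in which a key may repeat, and it covers only the fields that appear in the "medicine" example. Note that in this example the text field `entry` holds the same identifier as the XML `eui` element.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class InflVar:
    """One inflectional variant; fields follow the XML attributes (hypothetical class)."""
    infl_var: str  # element text, e.g. "medicines"
    cit: str
    un_infl: str
    eui: str
    cat: str
    infl: str
    type: str


@dataclass
class LexRecord:
    """Illustrative mirror of the lexRecord layout (hypothetical class)."""
    base: str
    eui: str
    cat: str
    infl_vars: list = field(default_factory=list)
    variants: list = field(default_factory=list)  # taken from <nounEntry>


def parse_text_record(text: str) -> dict:
    """Parse one brace-delimited text record into a dict mapping key -> list of values."""
    fields: dict = {}
    for raw in text.strip().splitlines():
        line = raw.strip()
        if line.startswith("{"):
            line = line[1:]          # drop the opening brace on the first line
        if not line or line == "}":
            continue                 # skip blank lines and the closing brace
        key, _, value = line.partition("=")
        fields.setdefault(key, []).append(value)  # repeated keys accumulate
    return fields


def parse_xml_record(xml_text: str) -> LexRecord:
    """Parse one <lexRecord> element into a LexRecord."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    record = LexRecord(
        base=root.findtext("base"),
        eui=root.findtext("eui"),
        cat=root.findtext("cat"),
    )
    for node in root.findall("inflVars"):
        record.infl_vars.append(InflVar(
            infl_var=node.text,
            cit=node.get("cit"),
            un_infl=node.get("unInfl"),
            eui=node.get("eui"),
            cat=node.get("cat"),
            infl=node.get("infl"),
            type=node.get("type"),
        ))
    record.variants = [v.text for v in root.findall("nounEntry/variants")]
    return record


TEXT_RECORD = """{base=medicine
entry=E0039272
\tcat=noun
\tvariants=reg
\tvariants=uncount
}"""

XML_RECORD = """<?xml version="1.0" encoding="UTF-8"?>
<lexRecord>
  <base>medicine</base>
  <eui>E0039272</eui>
  <cat>noun</cat>
  <inflVars cit="medicine" unInfl="medicine" eui="E0039272" cat="noun" infl="base" type="basic">medicine</inflVars>
  <inflVars cit="medicine" unInfl="medicine" eui="E0039272" cat="noun" infl="singular" type="basic">medicine</inflVars>
  <inflVars cit="medicine" unInfl="medicine" eui="E0039272" cat="noun" infl="plural" type="reg">medicines</inflVars>
  <nounEntry>
    <variants>reg</variants>
    <variants>uncount</variants>
  </nounEntry>
</lexRecord>"""

if __name__ == "__main__":
    print(parse_text_record(TEXT_RECORD))
    print(parse_xml_record(XML_RECORD))
```

Collecting repeated keys into lists keeps both `variants` lines of the text record, which a plain one-value-per-key dictionary would silently overwrite.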