Text Categorization

Word Sense Disambiguation

NLP (Natural Language Processing) applications use MetaMap (or MMTX) to map arbitrary text to concepts in the UMLS Metathesaurus. MetaMap (MMTX) tokenizes the input free text into words or terms (phrases). Each word or term is then mapped to UMLS concepts, each with a confidence score. If the text maps to more than one Metathesaurus concept with the same high confidence score, MetaMap cannot tell which concept is the correct mapping, which causes an ambiguity problem. STI can be used for word sense disambiguation by selecting the best meaning among the ambiguous Metathesaurus concepts.

  • Method
    Apply STI to the free text and, among the ambiguous Metathesaurus concepts, select the one whose Semantic Type has the highest score (best rank).
  • Processes
    • Run the free text through MetaMap (MMTX)
    • Run the free text through STI
    • Apply the candidates output filter option to the ambiguous Semantic Types
    • Select the Semantic Type with the best rank (score) from the STI results
  • Examples
    • Input:
      	Race, ethnicity, culture, and disparities in health care
      	
    • Ambiguity:
      Culture has two mapped UMLS concepts:
      UMLS Concept              Semantic Type
      Anthropological culture   Idea or Concept
      Laboratory culture        Laboratory Procedure
    • STI Result:
      	--- ST scores and rank based on word frequency --- 
      	6|0.7184|idcn|Idea or Concept
      	47|0.3718|lbpr|Laboratory Procedure
      	--- ST scores and rank based on document count for word --- 
      	4|0.7385|idcn|Idea or Concept
      	50|0.3749|lbpr|Laboratory Procedure
      	
    • Output:
      Idea or Concept; Anthropological culture
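
The selection step in the example above can be sketched in a few lines of Python. This is only an illustration, not part of STI or MetaMap: the helper names `parse_sti_line` and `select_best` are hypothetical, and the `rank|score|abbreviation|name` line layout is assumed from the sample STI output shown in the example. The sketch keeps only the Semantic Types of the ambiguous candidates and picks the one with the best (lowest) rank.

```python
# Hypothetical helpers for illustration only. The "rank|score|abbrev|name"
# layout is assumed from the sample STI output above, not from an STI API.

def parse_sti_line(line):
    """Parse one 'rank|score|abbrev|name' line; return None for other lines."""
    parts = line.strip().split("|")
    if len(parts) != 4:
        return None  # e.g. the '--- ST scores and rank ... ---' headers
    rank, score, abbrev, name = parts
    return {"rank": int(rank), "score": float(score),
            "abbrev": abbrev, "name": name}


def select_best(sti_lines, candidates):
    """Pick the candidate whose Semantic Type has the best (lowest) rank.

    candidates maps a Semantic Type abbreviation to the ambiguous
    UMLS concept that carries it.
    """
    rows = [parse_sti_line(line) for line in sti_lines]
    # Filter step: keep only the Semantic Types of the ambiguous concepts.
    rows = [r for r in rows if r is not None and r["abbrev"] in candidates]
    if not rows:
        return None
    best = min(rows, key=lambda r: r["rank"])
    return best["name"], candidates[best["abbrev"]]


sti_output = [
    "--- ST scores and rank based on word frequency ---",
    "6|0.7184|idcn|Idea or Concept",
    "47|0.3718|lbpr|Laboratory Procedure",
]
candidates = {"idcn": "Anthropological culture", "lbpr": "Laboratory culture"}

result = select_best(sti_output, candidates)
print("; ".join(result))  # Idea or Concept; Anthropological culture
```

In this example the word-frequency and document-count rankings agree, so the sketch processes one ranking section at a time. How to combine the two rankings when they disagree is left open here.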