The Importance of the Lexicon in Tagging Biological Text

Smith LH, Rindflesch TC, Wilbur WJ
Natural Language Engineering 2005.
Abstract: 

Motivation: A part-of-speech tagger is a fundamental and indispensable tool in computational linguistics, typically employed at the critical early stages of processing. Although taggers are widely available that achieve high accuracy in very general domains, these do not perform nearly as well when applied to novel specialized domains, and this is especially true with biological text.

Results: We present a stochastic tagger that achieves over 97.44% accuracy on MEDLINE abstracts. A primary component of the tagger is its lexicon, which enumerates the permitted parts-of-speech for the 10,000 words most frequently occurring in MEDLINE. We present evidence for the conclusion that the lexicon is as vital to tagger accuracy as a training corpus, and more important than previously thought.

Smith LH, Rindflesch TC, Wilbur WJ. The Importance of the Lexicon in Tagging Biological Text. Natural Language Engineering 2005.