Word Sense Disambiguation

Original Test Collection

To support research on the automatic resolution of word sense ambiguity using natural language processing techniques, we constructed this test collection of medical text in which the ambiguities were resolved by hand. Evaluators were asked to examine instances of an ambiguous word and determine the intended sense by selecting the Metathesaurus concept (if any) that best represents its meaning.

The test collection consists of 50 highly frequent ambiguous UMLS concepts from 1998 MEDLINE. For each of the 50 ambiguous cases, 100 instances were randomly selected from the 1998 MEDLINE citations, for a total of 5,000 instances. Of the 11 evaluators, 8 completed 100% of the 5,000 instances, 1 completed 56%, 1 completed 44%, and the final evaluator completed 12%. An evaluator's judgments for a given ambiguity were used only if that evaluator completed all 100 instances for that ambiguity.
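
The inclusion rule described above (an evaluator's judgments count for an ambiguity only when all 100 of its instances were completed) can be illustrated with a minimal Python sketch. The tuple layout, the function name `usable_evaluations`, the evaluator labels, the term "cold", and the sense labels are all hypothetical placeholders chosen for illustration; they are not the collection's actual file format or contents.

```python
from collections import defaultdict

# From the description above: each ambiguous term has 100 instances.
INSTANCES_PER_TERM = 100


def usable_evaluations(judgments):
    """Return the (evaluator, term) pairs in which every instance was judged.

    `judgments` is a hypothetical iterable of
    (evaluator, term, instance_id, sense) tuples. This layout is an
    assumption for illustration, not the collection's real format.
    """
    seen = defaultdict(set)
    for evaluator, term, instance_id, _sense in judgments:
        # Track distinct instances so duplicate rows are not double counted.
        seen[(evaluator, term)].add(instance_id)
    return {key for key, ids in seen.items() if len(ids) == INSTANCES_PER_TERM}


# Toy data: evaluator "A" judges all 100 instances of one term,
# evaluator "B" judges only 40 of them.
toy = [("A", "cold", i, "sense-1") for i in range(100)]
toy += [("B", "cold", i, "sense-2") for i in range(40)]

kept = usable_evaluations(toy)
print(kept)  # only evaluator "A" qualifies for this term
```

Under this sketch, a partially completed set such as evaluator "B"'s is dropped as a whole rather than contributing a subset of judgments, which matches the all-or-nothing wording of the rule.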

The following links provide information about the process of building the WSD Test Collection:

The following paper describes in more detail the development of the test collection:

Please Note: Users are responsible for compliance with the UMLS Metathesaurus License Agreement.

To use this test collection, you must have accepted the terms of the UMLS Metathesaurus License Agreement, which requires you to respect the copyrights of the constituent vocabularies and to file a brief annual report on your use of the UMLS. You also must have activated a UMLS Terminology Services (UTS) account.

For details of the licenses see the UMLS Metathesaurus License Agreement and How to License and Access the Unified Medical Language System® (UMLS®) Data.

The 5,000 MEDLINE citations included at this site are for exclusive use with the Test Collection and cannot be redistributed. In addition, the citations were retrieved in late 1999 and represent a static view of MEDLINE at that time.

Access WSD Test Collection (RESTRICTED)

You will be sent to the UMLS Terminology Services (UTS) login page to log in, and then returned to the WSD Test Collection web page once you have successfully logged in.