WSD Test Suite Design
The design of WsdTest (for the NLM WSD Collection) and WsdTest2 (for the MSH WSD Set) is similar. This page uses WsdTest2 as the example to illustrate the structure of the WSD test suite.
- Test Suite Software:
- The test suite consists of the following components:
Component | Description | Values
---|---|---
Year | TC release | 2007, 2008, 2009, 2010, 2011
Case | StWsd case type | Sentence, TiAb, Sentences
Score | Score type | Cs, Dc, Rdc, Wc, Rwc
TestCase | A collection of WSD test instances for a specified word-score-case-year combination |
Instance | A single case (PMID) for the WSD test |
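As a rough illustration of how the components in the table relate to each other, they could be modeled along the following lines. This is only a sketch under assumptions: the names `WsdComponentsSketch`, `CaseType`, `ScoreType`, `goldSense`, and `key()` are hypothetical and are not taken from the suite's actual Java classes or its UML diagram.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: these types are NOT the suite's real classes.
class WsdComponentsSketch {
    // Constant names copy the table values so that name() returns the same token.
    enum CaseType { Sentence, TiAb, Sentences }
    enum ScoreType { Cs, Dc, Rdc, Wc, Rwc }

    /** One WSD test instance, identified by its PMID. */
    static final class Instance {
        final String pmid;
        final String goldSense; // assumed to hold a sense ID such as "M1"

        Instance(String pmid, String goldSense) {
            this.pmid = pmid;
            this.goldSense = goldSense;
        }
    }

    /** All instances for one word-score-case-year combination. */
    static final class TestCase {
        final String word;
        final ScoreType score;
        final CaseType caseType;
        final int year;
        private final List<Instance> instances = new ArrayList<>();

        TestCase(String word, ScoreType score, CaseType caseType, int year) {
            // The table lists releases 2007 through 2011 only.
            if (year < 2007 || year > 2011) {
                throw new IllegalArgumentException("Unsupported year: " + year);
            }
            this.word = word;
            this.score = score;
            this.caseType = caseType;
            this.year = year;
        }

        void add(Instance instance) {
            instances.add(instance);
        }

        List<Instance> instances() {
            return Collections.unmodifiableList(instances);
        }

        /** Key in the word-score-case-year order used by the table. */
        String key() {
            return word + "-" + score + "-" + caseType + "-" + year;
        }
    }
}
```

With this sketch, a test case for a placeholder word `cold`, scored with Cs on TiAb input for the 2011 release, would carry the key `cold-Cs-TiAb-2011` and hold one `Instance` per PMID.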
- A simplified UML diagram of the Java classes is shown below:
- Input Data:
${WSD_TEST}/WsdTest2/data/Input
- allAmbiguousWords.txt: all ambiguous words
(spaces within a word are replaced with "_")
- MRSTY: maps CUIs to semantic types (STs)
- release: original data from MSH WSD Set
- TestSet: Modified data set for StWsd
=> ambiguous Words/
- answers: gold-standard WSD meaning for each PMID
PMID | Meaning (sense ID: M1, M2, etc.)
- choices: all possible senses, ST candidates
- testCase.Sentence: the ambiguous sentence (the sentence that contains the ambiguous word to be disambiguated)
- testCase.TiAb: MEDLINE title and abstract. This can be used to retrieve the ambiguous sentences (all sentences that contain the ambiguous word or its inflections)
PMID | Title & abstract (TiAb)
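The input conventions described above (underscores in place of spaces, and an answers file that pairs each PMID with a sense ID) can be sketched as follows. This is a minimal illustration that assumes the answers file is stored as one `PMID|Meaning` record per line; that on-disk delimiter is an assumption, and the class and method names (`WsdInputSketch`, `toFileToken`, `parseAnswers`) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: not part of the actual test suite.
class WsdInputSketch {

    /** Replaces spaces with underscores, as described for allAmbiguousWords.txt. */
    static String toFileToken(String word) {
        return word.replace(' ', '_');
    }

    /**
     * Parses answers lines into a PMID -> sense ID map.
     * ASSUMPTION: each non-blank line has the form "PMID|Meaning".
     */
    static Map<String, String> parseAnswers(Iterable<String> lines) {
        Map<String, String> gold = new LinkedHashMap<>(); // keeps file order
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = trimmed.split("\\|", -1);
            if (fields.length < 2) {
                throw new IllegalArgumentException("Malformed answers line: " + line);
            }
            gold.put(fields[0].trim(), fields[1].trim());
        }
        return gold;
    }
}
```

The lines could come from `java.nio.file.Files.readAllLines` on an answers file; the resulting map then gives the gold-standard sense to compare against the StWsd prediction for each PMID.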
- Output Data:
${WSD_TEST}/WsdTest2/data/Output/
- ${YEAR}/: results for each year (version) of StWsd
- ${YEAR}/${TEST_CASE}/:
- ${TEST_CASE}:
Ambiguous sentence | Title & abstract (TiAb, citation) | Ambiguous sentences
- ${YEAR}/${TEST_CASE}/${SCORE_TYPE}/:
- ${SCORE_TYPE}:
Cs (combined score) | Dc (document counts) | Rdc (real-time Dc) | Rwc (real-time Wc) | Wc (word counts)
- ${AMBIGUOUS_WORDS}.out: detailed WSD results for each ambiguous word
- Stats.rpt.abbr: statistics report for all ambiguous abbreviations
- Stats.rpt.all: statistics report for all ambiguous words
- Stats.rpt.both: statistics report for all ambiguous abbreviations and terms
- Stats.rpt.term: statistics report for all ambiguous terms
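The output layout above can be summarized as a small path builder. This is a sketch only: it assumes that `${YEAR}` is the four-digit year, that `${TEST_CASE}` and `${SCORE_TYPE}` are the literal tokens from the component table (for example `TiAb` and `Cs`), and that the word is passed as it appears in allAmbiguousWords.txt. The class and method names are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: not part of the actual test suite.
class WsdOutputPathSketch {

    /** ${WSD_TEST}/WsdTest2/data/Output/${YEAR}/${TEST_CASE}/${SCORE_TYPE}/ */
    static Path scoreDir(String wsdTestRoot, int year, String testCase, String scoreType) {
        return Paths.get(wsdTestRoot, "WsdTest2", "data", "Output",
                String.valueOf(year), testCase, scoreType);
    }

    /** Detailed results file for one ambiguous word: ${AMBIGUOUS_WORDS}.out */
    static Path wordResult(String wsdTestRoot, int year, String testCase,
                           String scoreType, String word) {
        return scoreDir(wsdTestRoot, year, testCase, scoreType).resolve(word + ".out");
    }

    /** Statistics report; scope is one of "abbr", "all", "both", "term". */
    static Path statsReport(String wsdTestRoot, int year, String testCase,
                            String scoreType, String scope) {
        return scoreDir(wsdTestRoot, year, testCase, scoreType).resolve("Stats.rpt." + scope);
    }
}
```

For example, with a placeholder root of `/tmp/wsd`, `statsReport("/tmp/wsd", 2011, "TiAb", "Cs", "all")` resolves to the `Stats.rpt.all` file under `WsdTest2/data/Output/2011/TiAb/Cs/`.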