Test Set
II. Description
The CSpell test set was built by selecting the consumer health questions with the highest counts of out-of-vocabulary (OOV) terms from the NER (Named Entity Recognition) collections. The SPECIALIST Lexicon 2017 release served as the dictionary for identifying OOVs. Two annotators (Dr. Alan R. Aronson and Sonya E. Shooshan) annotated the errors manually and independently. They then reconciled their disagreements, with arbitration by Dr. Dina Demner-Fushman as needed. The test set is summarized as follows:
Item | Value |
---|---|
Consumer health questions | 224 |
Tokens | 16,707 |
Annotation tags | 1,946 |
Instances of non-word corrections | 974 |
Instances of real-word corrections | 1,178 |
Word count per question | 3 - 337 |
Average word count per question | 72.36 |
Errors per question | 0 - 22 |
Average errors per question | 4.90 |
Error rate (errors per token) | 0.07 (= 1,178/16,707) |
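The selection step described above (ranking questions by how many of their tokens are missing from a dictionary) can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: `TOY_LEXICON` is a tiny stand-in for the SPECIALIST Lexicon, the tokenizer is deliberately naive, and the function names are hypothetical rather than part of CSpell.

```python
import re

# Hypothetical toy dictionary; the test set itself was built against
# the SPECIALIST Lexicon 2017 release, which is far larger.
TOY_LEXICON = {
    "i", "have", "a", "and", "what", "is", "the", "for", "my",
    "pain", "headache", "treatment",
}

def tokenize(text):
    """Lowercase the text and return runs of ASCII letters (naive tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def oov_count(text, lexicon):
    """Count the tokens that do not appear in the lexicon."""
    return sum(1 for tok in tokenize(text) if tok not in lexicon)

def rank_by_oov(questions, lexicon, top_n):
    """Return the top_n questions with the highest OOV counts, highest first."""
    ordered = sorted(questions, key=lambda q: oov_count(q, lexicon), reverse=True)
    return ordered[:top_n]

questions = [
    "What is the treatment for my headache?",
    "I have a hedache and stomache pain",
    "What is the treatmnt for my pain",
]
ranked = rank_by_oov(questions, TOY_LEXICON, top_n=2)
```

With this toy data, the question containing two misspellings ranks first and the one containing a single misspelling ranks second. Note that an OOV count only flags candidate non-word errors. Real-word errors consist of valid dictionary words, which is why the test set relied on manual annotation.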
III. Distribution of Errors in the Test Set
Count per question | Minimum | Maximum | Average |
---|---|---|---|
Character | 23 | 1,985 | 504.71 |
Word | 3 | 337 | 72.36 |
Error Tag | 0 | 22 | 4.90 |
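As a rough illustration of how per-question minimum, maximum, and average figures like those in the table could be derived, the sketch below summarizes character, word, and error-tag counts over a small sample. The sample questions and tag counts are invented for illustration and are not drawn from the test set. The helper `summarize` is hypothetical, and splitting on whitespace is an assumed definition of a word. The final line reproduces the error-rate arithmetic from the summary table.

```python
def summarize(counts):
    """Return (minimum, maximum, average rounded to 2 decimals) of a list of counts."""
    return min(counts), max(counts), round(sum(counts) / len(counts), 2)

# Hypothetical sample: (question text, number of error tags in that question).
sample = [
    ("my docter said i have diabetis", 2),
    ("what is asthma", 0),
    ("is ibuprofin safe for a hedache or migrane", 3),
]

char_stats = summarize([len(text) for text, _ in sample])
word_stats = summarize([len(text.split()) for text, _ in sample])
tag_stats = summarize([tags for _, tags in sample])

# Error rate as given in the summary table: 1,178 errors over 16,707 tokens.
error_rate = round(1178 / 16707, 2)

for label, stats in [("Character", char_stats), ("Word", word_stats), ("Error Tag", tag_stats)]:
    print(label, *stats, sep=" | ")
```

Running the same summary over all 224 annotated questions would be one way to produce a table in the format shown above, provided the character and word counts follow the same definitions the authors used.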
IV. Generation Processes
V. Performance Tests