CSpell

Test Set

Test Set, 136 KB
- 2.7 MB
- OrgData.224: the 224 consumer health questions with the highest OOV counts in the NER collection
- GoldStd-NonWord: non-word gold standard
- GoldStd-RealWord: real-word gold standard
Test Set (Brat format), 90 KB

II. Description

The CSpell test set was built by selecting the consumer health questions with the highest counts of out-of-vocabulary (OOV) terms from the NER (Named Entity Recognition) collection. The SPECIALIST Lexicon 2017 release served as the dictionary for identifying OOV terms. Two annotators (Dr. Alan R. Aronson and Sonya E. Shooshan) annotated the errors manually and independently. The annotators then reconciled their disagreements, with arbitration by Dr. Dina Demner-Fushman as needed. The test set is summarized as follows:

Summary statistics:

Consumer health questions	224
Tokens	16,707
Annotation tags	1,946
Instances of non-word corrections	974
Instances of real-word corrections	1,178
Word count per question	3 - 337
Average word count per question	72.36
Errors per question	0 - 22
Average errors per question	4.90
Error rate (errors per token)	0.07 (= 1,178/16,707)
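
The error rate in the last row is a plain ratio of two figures from the table. A minimal Python check, using only the numbers shown above (the variable names are illustrative):

```python
# Figures copied from the summary table above.
tokens = 16_707   # total tokens in the test set
errors = 1_178    # numerator used in the table's error-rate row

error_rate = errors / tokens
print(f"Error rate: {error_rate:.2f}")  # prints "Error rate: 0.07"
```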

III. Distribution of Errors in the Test Set

Stats on file size and error tags

Count	Minimum	Maximum	Average
Character	23	1985	504.71
Word	3	337	72.36
Error Tag	0	22	4.90
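
Minimum, maximum, and average figures of this kind can be computed per question along the following lines. This is a sketch under stated assumptions: the questions below are toy strings, not items from the test set, and words are counted by whitespace splitting, which may differ from the tokenization used for the published figures.

```python
from statistics import mean

def summarize(values):
    """Return (minimum, maximum, average rounded to 2 decimal places)."""
    return min(values), max(values), round(mean(values), 2)

# Toy stand-ins for question texts (assumption: not from the actual test set).
toy_questions = [
    "what is teh dose",
    "my docter said i have diabetis and i am worried",
]

# Assumption: characters counted with len(), words by whitespace splitting.
char_counts = [len(q) for q in toy_questions]
word_counts = [len(q.split()) for q in toy_questions]

print("Character", *summarize(char_counts), sep="\t")
print("Word", *summarize(word_counts), sep="\t")
```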

IV. Generation Processes

Generating the test set
- Generating Test Set from NER collection
- Annotation Guidelines
- Annotation
- Reconciled disagreements on the annotation data
- Computer-Aided Revision
  - Revision logs
- Final revised Brat Annotation data
Generating Gold Standard from the (NER) Test Set
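
The first step listed above, selecting the questions with the most OOV terms, can be sketched roughly as follows. This is an illustration only, not the CSpell implementation: the tokenizer, the function names `count_oov` and `rank_by_oov`, and the toy lexicon (a stand-in for the SPECIALIST Lexicon) are all assumptions.

```python
import re

# Assumption: a token is a run of ASCII letters, compared in lowercase.
TOKEN_RE = re.compile(r"[A-Za-z]+")

def count_oov(text, lexicon):
    """Count tokens whose lowercase form is not in the lexicon."""
    return sum(1 for tok in TOKEN_RE.findall(text) if tok.lower() not in lexicon)

def rank_by_oov(questions, lexicon, top_n):
    """Return the top_n (question_id, oov_count) pairs, highest count first."""
    scored = [(qid, count_oov(text, lexicon)) for qid, text in questions.items()]
    # Sort by descending OOV count; break ties by question id for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]

# Toy data (hypothetical): a tiny lexicon and three short questions.
toy_lexicon = {
    "i", "have", "been", "diagnosed", "with", "diabetes", "what", "is",
    "the", "treatment", "for", "my", "knee", "pain",
}
toy_questions = {
    "q1": "I have been diagnosd with diabetis",
    "q2": "What is the treatment for diabetes",
    "q3": "my kne pain",
}

top = rank_by_oov(toy_questions, toy_lexicon, top_n=2)
print(top)  # [('q1', 2), ('q3', 1)]
```

In this toy run, `q1` ranks first with two OOV tokens ("diagnosd", "diabetis") and `q2` is dropped because all of its tokens are in the lexicon.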

V. Performance Tests

Performance Test for Ensemble on Test Set