Corpora in CSpell
I. Introduction
The corpus is used to:
II. Corpora Tested in CSpell
Three corpora were tested for comparison:
| | Baseline | Consumer Health Corpus | Medline N-gram Set |
---|---|---|---|
Resources | 7 web sites | 20 (16) web sites | The Medline N-gram Set (2017 release) |
Statistics | | | |
PS.
shell> ${PRE_PROCESS}/bin/RunCorpus
shell> ${PRE_PROCESS}/bin/RunPreProc
4
71
III. Corpora
IV. Development Tests
Model | Ensemble Corpus | Consumer Health Corpus | Medline.2017 |
---|---|---|---|
Frequency Only | | | |
Frequency - Halil | 438 / 770 / 774; 0.5688 / 0.5659 / 0.5674 | 438 / 770 / 774; 0.5688 / 0.5659 / 0.5674 | 404 / 770 / 774; 0.5247 / 0.5220 / 0.5233 |
Frequency - cSpell-Dev-1 | 536 / 769 / 774; 0.6970 / 0.6925 / 0.6948 | 534 / 770 / 774; 0.6935 / 0.6899 / 0.6917 | 521 / 770 / 774; 0.6766 / 0.6731 / 0.6749 |
Frequency - cSpell-Dev-2 | 536 / 769 / 774; 0.6970 / 0.6925 / 0.6948 | 534 / 770 / 774; 0.6935 / 0.6899 / 0.6917 | 522 / 770 / 774; 0.6779 / 0.6744 / 0.6762 |
Combined method | | | |
Noisy Channel | 552 / 769 / 774; 0.7178 / 0.7132 / 0.7155 | 551 / 770 / 774; 0.7156 / 0.7119 / 0.7137 | 523 / 770 / 774; 0.6792 / 0.6757 / 0.6775 |
CSpell Combined Orthographic and Frequency | 598 / 769 / 774; 0.7776 / 0.7726 / 0.7751 | 598 / 769 / 774; 0.7776 / 0.7726 / 0.7751 | 597 / 769 / 774; 0.7763 / 0.7713 / 0.7738 |
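
Each cell in the table above appears to hold three counts followed by three ratios. The page does not label them, but the arithmetic is consistent with the counts being true positives, retrieved (returned) corrections, and relevant (gold-standard) corrections, and the ratios being precision, recall, and F1. Under that assumption, the following minimal Python sketch reproduces the ratios from the counts. The names `prf` and `Scores` are hypothetical and are not part of CSpell.

```python
from typing import NamedTuple


class Scores(NamedTuple):
    precision: float
    recall: float
    f1: float


def prf(true_positives: int, retrieved: int, relevant: int) -> Scores:
    """Compute precision, recall, and F1 from raw counts.

    Assumed reading of a table cell (not stated by the source):
    true_positives / retrieved / relevant.
    """
    if retrieved <= 0 or relevant <= 0:
        raise ValueError("retrieved and relevant must be positive")
    precision = true_positives / retrieved
    recall = true_positives / relevant
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall; 0.0 when both are 0.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Scores(precision, recall, f1)


if __name__ == "__main__":
    # Counts from the "CSpell Combined Orthographic and Frequency" row,
    # Ensemble Corpus column.
    scores = prf(598, 769, 774)
    print(" / ".join(f"{value:.4f}" for value in scores))
```

Run against the counts 598, 769, and 774, the sketch prints 0.7776 / 0.7726 / 0.7751, which matches the ratios listed for that row.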
V. Notes