Performance Tests on Dictionaries
I. Introduction
Performance tests were conducted on different dictionaries to find the best-performing dictionary for CSpell. Both the original and the revised gold standards of the training set were tested.
II. Setup
Test program: ${C_SPELL}/PostProcess/bin/PerformanceTest
Baseline data: ${C_SPELL}/PostProcess/data/Test/Baseline
III. Performance Results on Training Set
The ranking method used is basic orthographic similarity (as of December 2017). At the time of the test, no more sophisticated ranking method had been developed, because ranking development was planned to follow dictionary generation. Recall was therefore prioritized over precision when developing the dictionary-generation algorithm. These results are not the final performance of CSpell; rather, this setup served as the baseline for a systematic approach to generating the dictionary for the CSpell release.
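The exact orthographic-similarity formula used by CSpell is not given here; as a hedged illustration only, a common basic approach is to rank candidates by normalized edit (Levenshtein) distance. The function names below are hypothetical, not CSpell identifiers.

```python
def levenshtein(source: str, target: str) -> int:
    """Minimum number of insertions, deletions, and substitutions
    needed to turn `source` into `target` (classic DP, two rows)."""
    prev = list(range(len(target) + 1))
    for i, sc in enumerate(source, 1):
        curr = [i]
        for j, tc in enumerate(target, 1):
            cost = 0 if sc == tc else 1
            curr.append(min(prev[j] + 1,        # deletion
                            curr[j - 1] + 1,    # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def orthographic_similarity(a: str, b: str) -> float:
    """Normalize edit distance into a 0..1 similarity score
    (1.0 means identical strings). Illustrative, not CSpell's formula."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))
```

Under this sketch, a misspelling such as "diabetis" would rank the candidate "diabetes" (distance 1) above more distant dictionary entries.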
| Dictionary | Original: TP / Ret / Rel | Original: Precision / Recall / F1 | Revised: TP / Ret / Rel | Revised: Precision / Recall / F1 |
|---|---|---|---|---|
| CSpell software (orthographic similarity) with various dictionaries | | | | |
| Jazzy | 498 / 2606 / 814 | 0.1911 / 0.6118 / 0.2912 | 514 / 2606 / 774 | 0.1972 / 0.6641 / 0.3041 |
| Ensemble | 548 / 845 / 814 | 0.6485 / 0.6732 / 0.6606 | 553 / 825 / 774 | 0.6703 / 0.7145 / 0.6917 |
| MEDLINE | 524 / 809 / 814 | 0.6477 / 0.6437 / 0.6457 | 550 / 809 / 774 | 0.6799 / 0.7106 / 0.6949 |
| Lexicon 2017 | 535 / 829 / 814 | 0.6454 / 0.6572 / 0.6512 | 565 / 829 / 774 | 0.6815 / 0.7300 / 0.7049 |
| Lexicon.E* 2017 | 534 / 814 / 814 | 0.6560 / 0.6560 / 0.6560 | 567 / 814 / 774 | 0.6966 / 0.7326 / 0.7141 |
| Combo1** 2017 | 543 / 737 / 814 | 0.7368 / 0.6671 / 0.7002 | 577 / 737 / 774 | 0.7829 / 0.7455 / 0.7637 |
| Combo2*** | 529 / 695 / 814 | 0.7612 / 0.6499 / 0.7011 | 557 / 695 / 774 | 0.8014 / 0.7196 / 0.7583 |
| CSpell-1 (All, Med) *4 | 553 / 778 / 814 | 0.7108 / 0.6794 / 0.6947 | 591 / 778 / 774 | 0.7596 / 0.7636 / 0.7616 |
| CSpell-1 (Medline, Med) *5* | 553 / 777 / 814 | 0.7117 / 0.6794 / 0.6952 | 592 / 777 / 774 | 0.7619 / 0.7649 / 0.7634 |
| CSpell-1 (Medline, EngMed) *6* | 549 / 757 / 814 | 0.7252 / 0.6744 / 0.6989 | 584 / 757 / 774 | 0.7715 / 0.7545 / 0.7629 |
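The table's derived columns follow the standard retrieval definitions: precision = TP / Retrieved, recall = TP / Relevant, and F1 is their harmonic mean. A minimal sketch (the function name is illustrative, not part of CSpell):

```python
def precision_recall_f1(tp: int, retrieved: int, relevant: int):
    """Standard retrieval metrics from true-positive, retrieved,
    and relevant counts, as reported in the table above."""
    precision = tp / retrieved if retrieved else 0.0
    recall = tp / relevant if relevant else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Reproducing the Jazzy row (revised gold standard):
p, r, f = precision_recall_f1(tp=514, retrieved=2606, relevant=774)
# round(p, 4) == 0.1972, round(r, 4) == 0.6641, round(f, 4) == 0.3041
```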