CSpell

Performance Tests on Dictionaries

I. Introduction

Performance tests were run with different dictionaries to find the best-performing dictionary for CSpell. Both the original and the revised gold standards of the training set were tested.

II. Setup

  • Program: ${C_SPELL}/PostProcess/bin/PerformanceTest
  • Data: ${C_SPELL}/PostProcess/data/Test/Baseline

III. Performance Results on Training Set

The ranking method used is basic orthographic similarity (as of December 2017). At the time of testing, no more sophisticated ranking method had been developed, because ranking development was planned to follow dictionary generation. Recall was therefore favored over precision when developing the dictionary generation algorithm. These results are not the final performance of CSpell. However, this setup served as the baseline for a systematic approach to generating the dictionary for the CSpell release.
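As a rough illustration of ranking suggestions by orthographic similarity, the sketch below scores each candidate by a normalized edit distance and sorts the candidates by that score. This is an assumption made for illustration only: CSpell's actual orthographic similarity score is not described in this section, and the function names (`edit_distance`, `similarity`, `rank_candidates`) and the sample words are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; NOT CSpell's actual formula."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def rank_candidates(word: str, candidates: list[str]) -> list[tuple[str, float]]:
    """Return (candidate, score) pairs, highest similarity first."""
    scored = [(c, similarity(word, c)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical misspelling and suggestion list.
ranked = rank_candidates("asprin", ["aspen", "sprint", "aspirin"])
print(ranked[0][0])
```

With a ranker this simple, the set of candidates the dictionary supplies largely determines which corrections are reachable at all, which is consistent with the emphasis on recall in this test.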

CSpell Software (orthographic similarity) with various dictionaries

Original gold standard:

Dictionary                       TP   Ret   Rel  Precision  Recall  F1
Jazzy                            498  2606  814  0.1911     0.6118  0.2912
Ensemble                         548  845   814  0.6485     0.6732  0.6606
MEDLINE                          524  809   814  0.6477     0.6437  0.6457
Lexicon 2017                     535  829   814  0.6454     0.6572  0.6512
Lexicon.E* 2017                  534  814   814  0.6560     0.6560  0.6560
Combo1** 2017                    543  737   814  0.7368     0.6671  0.7002
Combo2***                        529  695   814  0.7612     0.6499  0.7011
CSpell-1 (All, Med) *4*          553  778   814  0.7108     0.6794  0.6947
CSpell-1 (Medline, Med) *5*      553  777   814  0.7117     0.6794  0.6952
CSpell-1 (Medline, EngMed) *6*   549  757   814  0.7252     0.6744  0.6989

Revised gold standard:

Dictionary                       TP   Ret   Rel  Precision  Recall  F1
Jazzy                            514  2606  774  0.1972     0.6641  0.3041
Ensemble                         553  825   774  0.6703     0.7145  0.6917
MEDLINE                          550  809   774  0.6799     0.7106  0.6949
Lexicon 2017                     565  829   774  0.6815     0.7300  0.7049
Lexicon.E* 2017                  567  814   774  0.6966     0.7326  0.7141
Combo1** 2017                    577  737   774  0.7829     0.7455  0.7637
Combo2***                        557  695   774  0.8014     0.7196  0.7583
CSpell-1 (All, Med) *4*          591  778   774  0.7596     0.7636  0.7616
CSpell-1 (Medline, Med) *5*      592  777   774  0.7619     0.7649  0.7634
CSpell-1 (Medline, EngMed) *6*   584  757   774  0.7715     0.7545  0.7629

  • * Lexicon.E: uses Lexicon, with Aa, unit, and Mw (includes spVar and Pn)
  • ** Combo1: uses Lexicon.E, with suggDic replaced by the baseline (eng_med.dic)
  • *** Combo2: uses Lexicon.E + Medline, with suggDic replaced by the baseline (eng_med.dic)
  • *4* CSpell-1 (All, Med): uses Lexicon.E, with suggDic replaced by Lexicon.enEwLc.dic.AddRm + UMLS_ST (Med.dic, All)
  • *5* CSpell-1 (Medline, Med): uses Lexicon.E, with suggDic replaced by Lexicon.enEwLc.dic.AddRm + UMLS_ST (Med.dic, All and existing in Medline)
  • *6* CSpell-1 (Medline, EngMed): uses Lexicon.E, with suggDic replaced by Lexicon + UMLS_ST (EngMed.dic)
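
The Precision, Recall, and F1 values reported above appear to follow the standard definitions computed from the three counts. The sketch below assumes that TP is the number of correct corrections (true positives), Ret is the number of corrections the system returned (retrieved), and Rel is the number of errors annotated in the gold standard (relevant). These readings of the column labels are an assumption, and the function name `prf` is hypothetical. The example uses the Combo1 counts on the original gold standard (543|737|814).

```python
def prf(tp: int, retrieved: int, relevant: int) -> tuple[float, float, float]:
    """Compute (precision, recall, F1) from raw counts.

    Assumed meanings: tp = true positives, retrieved = Ret, relevant = Rel.
    """
    precision = tp / retrieved if retrieved else 0.0
    recall = tp / relevant if relevant else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Combo1, original gold standard: TP|Ret|Rel = 543|737|814
p, r, f1 = prf(543, 737, 814)
print(f"{p:.4f}|{r:.4f}|{f1:.4f}")  # prints 0.7368|0.6671|0.7002
```

Because Rel is fixed by the gold standard (814 original, 774 revised), a dictionary can raise recall only by increasing TP, while precision also depends on how many corrections are returned in total.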