CSpell

Performance Tests on Test Set

I. Test Setup

  • Data: Test Set
  • Baseline: the Ensemble program from Dr. Kilicoglu, an enhanced version of the approach described in the Ensemble paper.

II. Test Results

  • Non-word Only (Precision, Recall, and F1 are derived from TP, T. Ret, and T. Rel; see the metric-check sketch at the end of this section):

    Non-word, Detection
    Method     TP    FP    FN    T. Ret  T. Rel  Precision  Recall  F1
    Ensemble   736   230   238   966     974     0.7619     0.7556  0.7588
    CSpell     852   118   122   970     974     0.8784     0.8747  0.8765

    Non-word, Correction
    Method     TP    FP    FN    T. Ret  T. Rel  Precision  Recall  F1
    Ensemble   598   368   376   966     974     0.6190     0.6140  0.6165
    CSpell     743   227   231   970     974     0.7660     0.7628  0.7644

  • Real-word Included:

    Real-word Included, Detection
    Method     TP    FP    FN    T. Ret  T. Rel  Precision  Recall  F1
    Ensemble   665   145   513   810     1178    0.8210     0.5645  0.6690
    CSpell     874   108   304   982     1178    0.8900     0.7419  0.8093

    Real-word Included, Correction
    Method     TP    FP    FN    T. Ret  T. Rel  Precision  Recall  F1
    Ensemble   565   245   613   810     1178    0.6975     0.4796  0.5684
    CSpell     747   235   431   982     1178    0.7607     0.6341  0.6917

  • Real-word correction, elapsed running time:
    • Ensemble:
      • 1st: 34'13" => 2053 sec.
      • 2nd: 34'35" => 2075 sec.
      • Avg. 34'24" => 2064 sec.
    • CSpell:
      • 181.28 sec.
    • Accordingly, CSpell is about 11.38 (= 2064/181.28) times faster than Ensemble.
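
  • Metric and speed-up check (a minimal sketch in Python, not part of CSpell or
    Ensemble): in the tables above, Precision = TP / T. Ret (total retrieved),
    Recall = TP / T. Rel (total relevant), and F1 is their harmonic mean; the
    speed-up is the ratio of the average elapsed times. The row values below are
    taken from the CSpell non-word detection table.

      # Recompute Precision/Recall/F1 for one table row, plus the speed-up.
      def prf1(tp, total_retrieved, total_relevant):
          precision = tp / total_retrieved      # TP / T. Ret
          recall = tp / total_relevant          # TP / T. Rel
          f1 = 2 * precision * recall / (precision + recall)
          return precision, recall, f1

      print(prf1(852, 970, 974))   # ~ (0.8784, 0.8747, 0.8765) after rounding
      print(34 * 60 + 24)          # average Ensemble running time: 2064 sec
      print(2064 / 181.28)         # ~ 11.38, i.e. roughly 11.4x faster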

III. Discussion

  • The improvement in F1 from Ensemble to CSpell for non-word detection and correction is 11.77 and 14.79 percentage points, respectively (see the check at the end of this section).
  • The improvement in F1 from Ensemble to CSpell for real-word detection and correction is 14.03 and 12.33 percentage points, respectively.
  • The test set is harder for spelling correction because it was sampled from the questions with the highest OOV rate. Its error rate (0.07) is much higher than that of the training set (0.04). Accordingly, both CSpell and Ensemble performed worse on the test set than on the development set.
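
  • A quick check of the improvement figures (a sketch, assuming they are
    absolute F1 differences expressed as percentage points, which matches the
    F1 columns in Section II):

      # Delta = 100 * (CSpell F1 - Ensemble F1), in percentage points.
      f1 = {
          ("non-word", "detection"):   (0.7588, 0.8765),
          ("non-word", "correction"):  (0.6165, 0.7644),
          ("real-word", "detection"):  (0.6690, 0.8093),
          ("real-word", "correction"): (0.5684, 0.6917),
      }
      for task, (ensemble, cspell) in f1.items():
          print(task, round(100 * (cspell - ensemble), 2))
      # -> 11.77, 14.79, 14.03, 12.33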