CSpell

Performance Tests on Training Set

I. Test Setup

  • Data: Training Set
  • The ESpell and Jazzy correction outputs provided by Dr. Kilicoglu are used directly in these test results.
  • The Ensemble program from Dr. Kilicoglu is an enhanced version of the method described in the Ensemble paper, so its results here are slightly better than those reported in the paper.

II. Test Results

  • Columns: T. Ret (total retrieved) = TP + FP; T. Rel (total relevant) = TP + FN.

  • Non-word Only:

    Non-word, Detection
    Method     TP    FP    FN  T. Ret  T. Rel  Precision  Recall      F1
    ESpell    395   785   379    1180     774     0.3347  0.5103  0.4043
    Jazzy     324    69   450     393     774     0.8244  0.4186  0.5553
    Ensemble  655   170   119     825     774     0.7939  0.8463  0.8193
    CSpell    667    55   107     722     774     0.9238  0.8618  0.8917

    Non-word, Correction
    Method     TP    FP    FN  T. Ret  T. Rel  Precision  Recall      F1
    ESpell    237   943   537    1180     774     0.2008  0.3062  0.2426
    Jazzy     187   206   587     393     774     0.4758  0.2416  0.3205
    Ensemble  552   273   222     825     774     0.6691  0.7132  0.6904
    CSpell    607   115   167     722     774     0.8407  0.7842  0.8115

  • Real-word Included:

    Real-word Included, Detection
    Method     TP    FP    FN  T. Ret  T. Rel  Precision  Recall      F1
    ESpell    410   770   554    1180     964     0.3475  0.4253  0.3825
    Jazzy     334    59   630     393     964     0.8499  0.3465  0.4923
    Ensemble  580   138   384     718     964     0.8078  0.6017  0.6897
    CSpell    692    53   272     745     964     0.9289  0.7178  0.8098

    Real-word Included, Correction
    Method     TP    FP    FN  T. Ret  T. Rel  Precision  Recall      F1
    ESpell    245   935   719    1180     964     0.2076  0.2541  0.2285
    Jazzy     191   202   773     393     964     0.4860  0.1981  0.2815
    Ensemble  517   201   447     718     964     0.7201  0.5363  0.6147
    CSpell    627   118   337     745     964     0.8416  0.6504  0.7338

  • Speed:
    • Elapsed time: 56.91 sec
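
As a cross-check, the Precision, Recall, and F1 columns can be recomputed from the TP, FP, and FN counts. This is a minimal sketch assuming the standard definitions (Precision = TP / T. Ret, Recall = TP / T. Rel, F1 = harmonic mean of the two), which agree with every row in the tables. The `Counts` class is a hypothetical helper for illustration, not part of CSpell.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """TP/FP/FN counts from one row of a results table (hypothetical helper)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives

    @property
    def total_retrieved(self) -> int:
        # "T. Ret": every item the system flagged or corrected
        return self.tp + self.fp

    @property
    def total_relevant(self) -> int:
        # "T. Rel": every error in the gold standard
        return self.tp + self.fn

    def precision(self) -> float:
        return self.tp / self.total_retrieved if self.total_retrieved else 0.0

    def recall(self) -> float:
        return self.tp / self.total_relevant if self.total_relevant else 0.0

    def f1(self) -> float:
        # Harmonic mean of precision and recall
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r) if p + r else 0.0


# Counts copied from the "Non-word, Detection" table.
rows = {
    "ESpell": Counts(tp=395, fp=785, fn=379),
    "Jazzy": Counts(tp=324, fp=69, fn=450),
    "Ensemble": Counts(tp=655, fp=170, fn=119),
    "CSpell": Counts(tp=667, fp=55, fn=107),
}

for name, c in rows.items():
    print(f"{name:<8} T.Ret={c.total_retrieved:<5} T.Rel={c.total_relevant:<4} "
          f"P={c.precision():.4f} R={c.recall():.4f} F1={c.f1():.4f}")
```

Each printed line matches the Non-word, Detection table to four decimal places. The counts from any of the other three tables can be substituted the same way.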

III. Discussion

  • Ensemble outperformed ESpell and Jazzy (ASpell) by a large margin in F1 (over 30% relative improvement in every table), because Ensemble was developed specifically to correct errors in consumer health questions.
  • The F1 improvement from Ensemble to CSpell for non-word detection and correction is 7.24 and 12.11 percentage points, respectively.
  • The F1 improvement from Ensemble to CSpell for detection and correction with real-word errors included is 12.01 and 11.91 percentage points, respectively.
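
The CSpell improvements above correspond to absolute F1 differences (CSpell F1 minus Ensemble F1, in percentage points), while the large margin of Ensemble over ESpell and Jazzy holds when read as a relative F1 gain. The sketch below recomputes both from the F1 values in the tables; `point_gain` and `relative_gain` are hypothetical helper names used only for illustration.

```python
# F1 scores copied from the four result tables above.
f1 = {
    "Non-word, Detection": {"ESpell": 0.4043, "Jazzy": 0.5553, "Ensemble": 0.8193, "CSpell": 0.8917},
    "Non-word, Correction": {"ESpell": 0.2426, "Jazzy": 0.3205, "Ensemble": 0.6904, "CSpell": 0.8115},
    "Real-word Included, Detection": {"ESpell": 0.3825, "Jazzy": 0.4923, "Ensemble": 0.6897, "CSpell": 0.8098},
    "Real-word Included, Correction": {"ESpell": 0.2285, "Jazzy": 0.2815, "Ensemble": 0.6147, "CSpell": 0.7338},
}


def point_gain(new: float, old: float) -> float:
    """Absolute F1 difference, in percentage points."""
    return round((new - old) * 100, 2)


def relative_gain(new: float, old: float) -> float:
    """Relative F1 improvement, as a percentage of the old score."""
    return round((new / old - 1) * 100, 2)


for task, scores in f1.items():
    cspell_gain = point_gain(scores["CSpell"], scores["Ensemble"])
    # Smallest relative margin of Ensemble over either baseline
    ens_margin = min(relative_gain(scores["Ensemble"], scores[b]) for b in ("ESpell", "Jazzy"))
    print(f"{task:32} CSpell vs Ensemble: +{cspell_gain:.2f} pts; "
          f"Ensemble vs best baseline: +{ens_margin:.2f}%")
```

The point gains come out to 7.24, 12.11, 12.01, and 11.91. The smallest relative margin of Ensemble over a baseline is about 40% (Real-word Included, Detection, versus Jazzy); measured in absolute points, that same margin is under 20.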