CSpell

Spell Correction Evaluation

I. Introduction

Both spelling error detection and correction should be evaluated:

  • Spelling error detection is the first step of spelling error correction. Terms identified as spelling errors are then passed on to be corrected.
  • Spelling error correction is the core of automatic spelling correction: misspellings are corrected automatically by the spelling tool.
Precision, recall, and the F1 measure quantify the performance of spelling error detection and correction. They are used to evaluate:
  • Spelling error detection, including the spell-checking dictionary and its associated algorithm (spelling error exceptions).
  • Spelling error correction, i.e., the overall performance, including detection and the ranking algorithm.

II. Performance of Spelling Correction

  • 1. Original to GoldStd: all changes form the total relevant set (= [TP] + [FN])
  • 2. Original to Correction: all changes form the total retrieved set (= [TP] + [FP])

  • Changes in both 1 and 2 are [TP], retrieved-relevant
  • Changes only in 2, but not in 1, are [FP], retrieved-not-relevant
  • Changes only in 1, but not in 2, are [FN], not-retrieved-relevant

  • Precision = [TP] / ([TP] + [FP])
  • Recall = [TP] / ([TP] + [FN])
  • F1 = (2 x P x R) / (P + R)
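The counting rules and formulas above can be sketched in Python as follows. This is a minimal illustration, not the CSpell implementation: each change is assumed to be a hashable record such as a (token position, new token) pair, and the function name `evaluate` and the convention of returning 0.0 when a denominator is zero are assumptions of this sketch.

```python
from typing import Hashable, Set, Tuple


def evaluate(gold_changes: Set[Hashable],
             corr_changes: Set[Hashable]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for two sets of changes.

    gold_changes: changes from Original to GoldStd (set 1, all relevant)
    corr_changes: changes from Original to Correction (set 2, all retrieved)
    """
    tp = len(gold_changes & corr_changes)  # in both 1 and 2: retrieved-relevant
    fp = len(corr_changes - gold_changes)  # only in 2: retrieved-not-relevant
    fn = len(gold_changes - corr_changes)  # only in 1: not-retrieved-relevant

    # Assumption: report 0.0 instead of dividing by zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical changes, written as (token position, new token) pairs.
gold = {(1, "receive"), (2, "the"), (5, "separate")}
corr = {(1, "receive"), (2, "the"), (4, "form")}
print(evaluate(gold, corr))
```

Here two changes appear in both sets ([TP] = 2), one only in the correction ([FP] = 1), and one only in the gold standard ([FN] = 1), so precision, recall, and F1 are all 2/3.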

III. Detection and Correction Algorithm in Evaluation Tools

  • Compare the differences between:
    • OrgFile and GoldStd (DiffOrgGold)
    • OrgFile and Corrected (DiffOrgCorr)
  • Detection:
    • [TP]
      • Changes in both DiffOrgGold and DiffOrgCorr
        • the change in DiffOrgCorr is a detection
        • the change in DiffOrgGold marks a term that should be corrected
        • so these changes (which need not be the same) count as correct detections
    • [FP]
      • Changes only in DiffOrgCorr
    • [FN]
      • Changes only in DiffOrgGold
  • Correction:
    • [TP]
      • Changes in both DiffOrgGold and DiffOrgCorr
        • the changes are identical, or
        • the changes are spelling variants (spVar) of each other
    • [FP]
      • Changes only in DiffOrgCorr
    • [FN]
      • Changes only in DiffOrgGold
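A minimal Python sketch of the detection and correction counts described above follows; it is not the PerformanceTest implementation. It assumes, as a hypothetical simplification, that OrgFile, GoldStd, and Corrected have already been tokenized into lists of equal length that align token-for-token, so each diff is simply a map from position to new token (the real tool may also have to handle split and merged tokens). The names `diff`, `detection_counts`, `correction_counts`, and the `is_sp_var` callback (a stand-in for a spelling-variant lookup) are hypothetical. The rules above do not say how a change present in both diffs but with a different, non-variant correction is counted for correction; this sketch counts it as both [FP] and [FN], which is an assumption.

```python
from typing import Callable, Dict, List

Diff = Dict[int, str]      # token position -> new token
Counts = Dict[str, int]    # {"TP": ..., "FP": ..., "FN": ...}


def diff(before: List[str], after: List[str]) -> Diff:
    """Positions whose token changed from `before` to `after`.

    Hypothetical simplification: both lists are aligned token-for-token.
    """
    if len(before) != len(after):
        raise ValueError("this sketch needs aligned token lists of equal length")
    return {i: new for i, (old, new) in enumerate(zip(before, after)) if old != new}


def detection_counts(diff_org_gold: Diff, diff_org_corr: Diff) -> Counts:
    gold, corr = set(diff_org_gold), set(diff_org_corr)  # changed positions only
    return {
        "TP": len(gold & corr),  # changed in both; new tokens need not match
        "FP": len(corr - gold),  # change only in DiffOrgCorr
        "FN": len(gold - corr),  # change only in DiffOrgGold
    }


def correction_counts(diff_org_gold: Diff, diff_org_corr: Diff,
                      is_sp_var: Callable[[str, str], bool] = lambda a, b: False
                      ) -> Counts:
    counts = {"TP": 0, "FP": 0, "FN": 0}
    for pos, gold_tok in diff_org_gold.items():
        corr_tok = diff_org_corr.get(pos)
        if corr_tok is None:
            counts["FN"] += 1  # change only in DiffOrgGold
        elif corr_tok == gold_tok or is_sp_var(corr_tok, gold_tok):
            counts["TP"] += 1  # same change, or spelling variants
        else:
            # Assumption: a wrong correction counts as both FP and FN.
            counts["FP"] += 1
            counts["FN"] += 1
    # Changes only in DiffOrgCorr
    counts["FP"] += sum(1 for pos in diff_org_corr if pos not in diff_org_gold)
    return counts


org = ["I", "recieve", "teh", "colr", "from", "you"]
gold = ["I", "receive", "the", "color", "from", "you"]
corr = ["I", "receive", "tea", "colour", "form", "you"]
dg, dc = diff(org, gold), diff(org, corr)


def toy_sp_var(a: str, b: str) -> bool:
    # Hypothetical stand-in for a real spelling-variant lookup.
    return {a, b} == {"color", "colour"}


print(detection_counts(dg, dc))               # {'TP': 3, 'FP': 1, 'FN': 0}
print(correction_counts(dg, dc, toy_sp_var))  # {'TP': 2, 'FP': 2, 'FN': 1}
```

Detection only asks whether both diffs changed the same position, while correction also compares the new tokens. That is why "teh" changed to "tea" counts as a detection [TP] but not as a correction [TP], and why "colour" still counts as a correction [TP] against the gold "color" once they are treated as spelling variants.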

IV. Evaluation Tools

  • shell> cd ${POST_PROCESS}/bin
  • shell> PerformanceTest
    • 0: run the correction first
    • 2: run the performance test for detection and correction