CSpell

Spell Correction Evaluation

I. Introduction

Both spelling error detection and correction should be evaluated:

  • Spelling error detection is the first step of spelling error correction. Terms identified as spelling errors are passed on for correction.
  • Spelling error correction is the core of automatic spelling correction. Misspellings are corrected automatically by the spelling tool.
Precision, recall, and the F measure are used to measure the performance of spelling error detection and correction. These measures are used to evaluate:
  • Spelling error detection, including the spell-checking dictionary and the associated algorithm (spelling error exceptions).
  • Spelling error correction, which reflects overall performance, including detection and the ranking algorithm.

II. Performance of Spelling Correction

  • 1. Original to GoldStd: all changes are the total relevant (= [TP] + [FN])
  • 2. Original to Correction: all changes are the total retrieved (= [TP] + [FP])

  • Changes in both 1 and 2 are [TP], retrieved-relevant
  • Changes only in 2, but not in 1, are [FP], retrieved-not-relevant
  • Changes only in 1, but not in 2, are [FN], not-retrieved-relevant

  • Precision = [TP] / ([TP] + [FP])
  • Recall = [TP] / ([TP] + [FN])
  • F1 = (2 x Precision x Recall) / (Precision + Recall)
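
The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the CSpell implementation: it assumes each change is represented as a (token position, replacement) pair, and the function name `precision_recall_f1` and the sample data are hypothetical.

```python
from typing import Set, Tuple

Change = Tuple[int, str]  # (token position, replacement text); assumed representation


def precision_recall_f1(gold: Set[Change], corrected: Set[Change]) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 from two sets of changes.

    gold:      changes from Original to GoldStd (total relevant = TP + FN)
    corrected: changes from Original to Correction (total retrieved = TP + FP)
    """
    tp = len(gold & corrected)   # changes in both: retrieved-relevant
    fp = len(corrected - gold)   # changes only in the correction: retrieved-not-relevant
    fn = len(gold - corrected)   # changes only in the gold standard: not-retrieved-relevant

    # Guard against division by zero when a set is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall) / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1


# Hypothetical sample data: three gold changes, three tool changes, two in common.
gold_changes = {(3, "diabetes"), (7, "doctor"), (12, "medicine")}
tool_changes = {(3, "diabetes"), (7, "doctor"), (9, "their")}
p, r, f = precision_recall_f1(gold_changes, tool_changes)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

With two shared changes, one change made only by the tool, and one change found only in the gold standard, all three measures come out to 2/3.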

III. Detection and Correction Algorithm in Evaluation Tools

  • Compare the differences between:
    • OrgFile and GoldStd (DiffOrgGold)
    • OrgFile and Corrected (DiffOrgCorr)
  • Detection:
    • [TP]
      • Changes in both DiffOrgGold and DiffOrgCorr
        • A change in DiffOrgCorr means the term was detected as an error
        • A change in DiffOrgGold means the term should be corrected
        • So any change present in both, whether or not the changes are the same, counts as a correct detection
    • [FP]
      • Changes only in DiffOrgCorr
    • [FN]
      • Changes only in DiffOrgGold
  • Correction:
    • [TP]
      • Changes in both DiffOrgGold and DiffOrgCorr
        • Changes are the same, or
        • Changes are spelling variants (spVar) of each other
    • [FP]
      • Changes only in DiffOrgCorr
    • [FN]
      • Changes only in DiffOrgGold
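
The difference between the detection and correction counts can be illustrated with a short Python sketch. It is not the evaluation tool's code and rests on several assumptions: each diff is modeled as a dictionary from token position to replacement text, the helper `is_sp_var` is a hypothetical stand-in for the spelling variant check, and a change at the same position whose text does not match the gold standard is counted as both [FP] and [FN] for correction (it is retrieved but not relevant, and the relevant change is not retrieved).

```python
from typing import Callable, Dict, Tuple

Diff = Dict[int, str]  # token position -> replacement text (assumed representation)
Counts = Tuple[int, int, int]  # (TP, FP, FN)


def count_detection(diff_org_gold: Diff, diff_org_corr: Diff) -> Counts:
    """Detection: only the position of a change matters, not its text."""
    gold_pos, corr_pos = set(diff_org_gold), set(diff_org_corr)
    tp = len(gold_pos & corr_pos)  # changed in both diffs
    fp = len(corr_pos - gold_pos)  # changed only in DiffOrgCorr
    fn = len(gold_pos - corr_pos)  # changed only in DiffOrgGold
    return tp, fp, fn


def count_correction(
    diff_org_gold: Diff,
    diff_org_corr: Diff,
    is_sp_var: Callable[[str, str], bool] = lambda a, b: False,
) -> Counts:
    """Correction: a change is a TP only if the texts match or are spelling variants."""
    tp = 0
    for pos, corr_text in diff_org_corr.items():
        gold_text = diff_org_gold.get(pos)
        if gold_text is not None and (gold_text == corr_text or is_sp_var(gold_text, corr_text)):
            tp += 1
    fp = len(diff_org_corr) - tp  # retrieved changes that are not relevant
    fn = len(diff_org_gold) - tp  # relevant changes that were not retrieved
    return tp, fp, fn


# Hypothetical spelling variant table, used for illustration only.
VARIANTS = {frozenset({"color", "colour"})}


def demo_is_sp_var(a: str, b: str) -> bool:
    return frozenset({a, b}) in VARIANTS


diff_org_gold = {2: "diabetes", 5: "doctor", 9: "color"}
diff_org_corr = {2: "diabetes", 5: "doctors", 7: "there", 9: "colour"}

print("detection :", count_detection(diff_org_gold, diff_org_corr))
print("correction:", count_correction(diff_org_gold, diff_org_corr, demo_is_sp_var))
```

In this sample, position 5 counts as a correct detection because both diffs change it, but not as a correct correction because the replacement texts differ and are not spelling variants.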

IV. Evaluation Tools

  • shell> cd ${POST_PROCESS}/bin
  • shell> PerformanceTest
    • 0: run the correction first
    • 2: run the performance test for detection and correction