Spell Correction Evaluation
I. Introduction
Both spelling error detection and correction should be evaluated:
- Spelling error detection is the first step of spelling error correction. Terms identified as spelling errors are passed on to the correction step.
- Spelling error correction is the core of automatic spelling correction: the spelling tool corrects detected misspellings automatically.
Precision, recall, and the F measure (F1) are used to measure the performance of spelling error detection and correction. These measures are used to evaluate:
- Spelling error detection, including the spell-checking dictionary and the associated detection algorithm (spelling error exceptions).
- Spelling error correction, i.e., the overall performance, including detection and the ranking algorithm.
II. Performance of Spelling Correction
- 1. Original to GoldStd: all changes are relevant (total relevant = [TP] + [FN])
- 2. Original to Correction: all changes are retrieved (total retrieved = [TP] + [FP])
- Changes in both 1 and 2 are [TP], retrieved-relevant
- Changes only in 2, but not in 1, are [FP], retrieved-not-relevant
- Changes only in 1, but not in 2, are [FN], not-retrieved-relevant
- Precision = [TP] / ([TP] + [FP])
- Recall = [TP] / ([TP] + [FN])
- F1 = (2 x Precision x Recall) / (Precision + Recall)
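The counting rules and formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation tool's code: it assumes each change is encoded as a hashable tuple such as (position, original_token, new_token), a hypothetical encoding chosen only so that "a change in both 1 and 2" becomes a set intersection. It also assumes a score of 0.0 whenever a denominator is zero.

```python
def evaluate(gold_changes, corr_changes):
    """Count [TP], [FP], [FN] and compute precision, recall and F1.

    gold_changes: changes from Original to GoldStd (all relevant).
    corr_changes: changes from Original to Correction (all retrieved).
    Each change is assumed to be a hashable tuple, e.g.
    (position, original_token, new_token); this encoding is hypothetical.
    """
    gold, corr = set(gold_changes), set(corr_changes)
    tp = len(gold & corr)  # in both 1 and 2: retrieved-relevant
    fp = len(corr - gold)  # only in 2: retrieved-not-relevant
    fn = len(gold - corr)  # only in 1: not-retrieved-relevant
    # Assumption: report 0.0 instead of dividing by zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return tp, fp, fn, precision, recall, f1
```

For example, if the gold standard fixes "teh", "recieve" and "adn", while the corrector fixes "teh", "recieve" and an unrelated token, the sketch yields [TP] = 2, [FP] = 1, [FN] = 1, so precision, recall and F1 are all 2/3.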
III. Detection and Correction Algorithms in the Evaluation Tools
- Compare the differences between:
  - OrgFile and GoldStd (DiffOrgGold)
  - OrgFile and Corrected (DiffOrgCorr)
- Detection:
  - [TP]
    - Changes in both DiffOrgGold and DiffOrgCorr
    - A change in DiffOrgCorr means the tool detected an error.
    - A change in DiffOrgGold means the term should be corrected.
    - So, a change in both counts as a correct detection; the two changes do not need to be the same.
  - [FP]
    - Changes only in DiffOrgCorr
  - [FN]
    - Changes only in DiffOrgGold
- Correction:
  - [TP]
    - Changes in both DiffOrgGold and DiffOrgCorr, where
    - the changes are the same, or
    - the changes are spelling variants (spVar) of each other
  - [FP]
    - Changes only in DiffOrgCorr
  - [FN]
    - Changes only in DiffOrgGold
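The difference between the detection and correction rules can be illustrated with a small Python sketch. It is not the evaluation tool's implementation and makes several assumptions: each diff is a dict mapping the original token's position to its replacement text (a hypothetical representation); the spVar check is a hypothetical set of known variant pairs standing in for a real lexicon lookup; and, for correction, a change at the same position with a non-matching replacement is treated as a change only in DiffOrgCorr (one [FP]) plus a change only in DiffOrgGold (one [FN]), which is one reading of the rules above.

```python
def score(tp, fp, fn):
    """Precision, recall, F1 from counts (0.0 when a denominator is 0)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def detection_counts(diff_org_gold, diff_org_corr):
    """Detection: a change at the same position in both diffs is a [TP],
    even if the replacement texts differ."""
    gold, corr = set(diff_org_gold), set(diff_org_corr)
    return len(gold & corr), len(corr - gold), len(gold - corr)


def correction_counts(diff_org_gold, diff_org_corr, spvar_pairs=frozenset()):
    """Correction: a [TP] needs the same position in both diffs AND
    replacements that are identical or spelling variants (spVar).

    spvar_pairs is a hypothetical set of frozenset({a, b}) variant pairs.
    Assumption: a mismatched replacement counts as one [FP] and one [FN].
    """
    def matches(a, b):
        return a == b or frozenset((a, b)) in spvar_pairs

    tp = sum(1 for pos, new in diff_org_corr.items()
             if pos in diff_org_gold and matches(diff_org_gold[pos], new))
    fp = len(diff_org_corr) - tp  # corrected changes with no matching gold
    fn = len(diff_org_gold) - tp  # gold changes with no matching correction
    return tp, fp, fn
```

With DiffOrgGold = {0: "the", 2: "color", 3: "receive", 5: "and"}, DiffOrgCorr = {0: "the", 2: "colour", 3: "relieve", 7: "cat"} and "color"/"colour" listed as spelling variants, detection gives ([TP], [FP], [FN]) = (3, 1, 1), while correction gives (2, 2, 2), because position 3 was detected but corrected to the wrong word.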
IV. Evaluation Tools
shell> cd ${POST_PROCESS}/bin
shell> PerformanceTest
- 0: run the correction first
- 2: run the performance test for detection and correction