CSpell

Spelling (1-to-1) Candidates

I. Introduction

A token is a single word (term without spaces). If a token is corrected to a single word for spelling is called 1-to-1 spelling correction. If the token is a spelling error, it is a non-word spelling error correction. Otherwise, it is a real-word correction. This page focus on the non-word correction.

II. Algorithm

Get all possible edit combinations from the spelling error words within 1 edit distance by:
- Deletion: abc -> {bc, ac, ab}
- Insertion: abc -> {aabc, babc, ..., zabc, aabc, abbc, ..., abcz}
  => Do not insert space (split has different suggestion dictionary filter)
- Substitution: bc -> {abc, bbc, ..., zbc, aac, abc, ..., abz}
- Transposition (Swap): abc -> {bac, acb}
- Sum all above combinations
Get all possible combinations from the spelling error words within 2 edit distance by:
- Apply the same algorithm on the result of above (edit distance 1)
- Sum all combinations
Check all combinations within 2 edit-distance if there are dic.IsDicWord( )
- If yes, they are suggested candidates

III. Test Examples

Error token	Corrected word	Edit Distance	Notes
non-word 1-to-1 correction
neef	need	1
thier	their	1
shud	should	2	Metaphone2: [XT]\|[XLT]
cort	caught	4	Metaphone2: [KRT]\|[KFT]\|1
real-word 1-to-1 correction
there	their	2	Metaphone2: [0R]\|[OR]

IV. TBD

Algorithm:
- Substitution might need to add phonetic in, such as "f" to "ph", "k" -> "ch",
- Substitution might need to consider the adjacent keys for edit distance of three. For example, "f" -> {t, r, e, d, c, v, b, g}.