CSpell

Corrector

This page describes the corrector algorithm, which updates the text by replacing each spelling error with its top-ranked candidate.

I. One-To-One

Finding: Find the top-ranked candidate (TokenObj)
Correction: Add the top-ranked candidate to the outTokenList
Java: OneToOneSplitCorrector.AddToFlatMapList
Example:

Input ... dianosed ...
Top Candidate ... diagnosed ...
Correction ... diagnosed ...

II. Split

Finding: Find the top-ranked candidate (TokenObj)
Correction: Flat-map the top-ranked candidate into the outTokenList
The top-ranked candidate (the split words) must be flat-mapped to a list of TokenObjs, which are then added to the outTokenList.
Java: OneToOneSplitCorrector.AddToFlatMapList
Example:

Input ... brokenbonecannotsleep ...
Top Candidate ... broken bone can not sleep ...
Correction ... broken bone can not sleep ...
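
The one-to-one and split corrections share the same flat-map step, which can be sketched as follows. This is a minimal illustration, not the CSpell source: plain strings stand in for TokenObj, the class name FlatMapSketch is hypothetical, and it assumes that the spaces between split words are stored as tokens of their own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: plain strings stand in for CSpell's TokenObj.
class FlatMapSketch {
    // Flat-map one corrected candidate into outTokenList.
    // A one-to-one candidate ("diagnosed") contributes a single token;
    // a split candidate ("broken bone") contributes one token per word
    // plus a space token between words (an assumption of this sketch).
    static void addToFlatMapList(String candidate, List<String> outTokenList) {
        String[] words = candidate.split(" ");
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                outTokenList.add(" ");
            }
            outTokenList.add(words[i]);
        }
    }

    public static void main(String[] args) {
        List<String> outTokenList = new ArrayList<>();
        addToFlatMapList("diagnosed", outTokenList);                 // one-to-one
        outTokenList.add(" ");
        addToFlatMapList("broken bone can not sleep", outTokenList); // split
        System.out.println(outTokenList);
        System.out.println(String.join("", outTokenList));
    }
}
```

Because a one-to-one candidate is simply a split candidate with a single word, one method can serve both cases, which is consistent with both sections naming the same Java method.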

III. Merge

Finding: Find the top-ranked candidate (TokenObj)
Correction:
- Update tokens for all MergeObjs
  - Go through all MergeObjs
  - Update the tokens before the target merge start
  - Update the merge at the target
- Add the tokens after the last MergeObj
Java: ProcessNonWordMerge.CorrectTokenListByMerge
Example:

Input ... problems dur ing her pregnancies.
Correction-1 ... problems
Correction-2 ... problems during
Correction-3 ... problems during her pregnancies.
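
The three correction steps in the example can be sketched as a single pass over the token list. This is an illustrative sketch, not the CSpell implementation: the Merge class is a hypothetical, reduced stand-in for MergeObj, plain strings stand in for TokenObj, and it assumes that merges are sorted by startIndex, do not overlap, and that endIndex is inclusive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MergeSketch {
    // Hypothetical, reduced stand-in for MergeObj.
    static final class Merge {
        final String mergeWord;
        final int startIndex; // first token replaced (index includes space tokens)
        final int endIndex;   // last token replaced, assumed inclusive

        Merge(String mergeWord, int startIndex, int endIndex) {
            this.mergeWord = mergeWord;
            this.startIndex = startIndex;
            this.endIndex = endIndex;
        }
    }

    static List<String> correctTokenListByMerge(List<String> inTokens, List<Merge> merges) {
        List<String> outTokens = new ArrayList<>();
        int cur = 0;
        for (Merge m : merges) {
            // Copy the tokens before the start of this merge unchanged.
            while (cur < m.startIndex) {
                outTokens.add(inTokens.get(cur++));
            }
            // Replace the merged span with the merged word.
            outTokens.add(m.mergeWord);
            cur = m.endIndex + 1;
        }
        // Add the tokens after the last merge.
        while (cur < inTokens.size()) {
            outTokens.add(inTokens.get(cur++));
        }
        return outTokens;
    }

    public static void main(String[] args) {
        List<String> in = Arrays.asList(
            "problems", " ", "dur", " ", "ing", " ", "her", " ", "pregnancies.");
        List<Merge> merges = Arrays.asList(new Merge("during", 2, 4));
        // prints: problems during her pregnancies.
        System.out.println(String.join("", correctTokenListByMerge(in, merges)));
    }
}
```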

* MergeObj:

Fields: tarWord, mergeWord, coreMergeWord, mergeNo, tarIndex, startIndex, endIndex, tarPos, startPos, endPos
- xxxIndex is the index in the original text (including space tokens); it is used in the merge operation to correct the input text.
- xxxPos is the index in the non-space token list; it is used to find the context for context scores.
- coreMergeWord is used to handle ending punctuation, for example when merging "disap point ment." into "disappointment."
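
The difference between xxxIndex and xxxPos can be made concrete with a small conversion sketch. It is only an illustration under stated assumptions: the class and method names are hypothetical, plain strings stand in for TokenObj, and a token counts as a space token when it contains nothing but whitespace.

```java
import java.util.Arrays;
import java.util.List;

class IndexPosSketch {
    // Convert an index in the full token list (space tokens included)
    // to a position in the non-space token list.
    // Returns -1 if the token at that index is itself a space token.
    static int indexToPos(List<String> tokens, int index) {
        if (tokens.get(index).trim().isEmpty()) {
            return -1;
        }
        int pos = 0;
        for (int i = 0; i < index; i++) {
            if (!tokens.get(i).trim().isEmpty()) {
                pos++;
            }
        }
        return pos;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
            "problems", " ", "dur", " ", "ing", " ", "her", " ", "pregnancies.");
        System.out.println(indexToPos(tokens, 2)); // "dur": index 2, pos 1
        System.out.println(indexToPos(tokens, 4)); // "ing": index 4, pos 2
    }
}
```

In the merge example, "dur ing" spans indexes 2 to 4 in the full token list but positions 1 to 2 in the non-space list, so the Index values address the text to rewrite while the Pos values address the neighboring words used as context.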