CSpell

Corrector

This page describes the corrector algorithm, which updates the text by replacing each spelling error with its top-ranked candidate.

I. One-To-One

  • Finding: Find the top-ranked candidate (TokenObj)
  • Correction: Add the top-ranked candidate to the outTokenList
  • Java: OneToOneSplitCorrector.AddToFlatMapList
  • Example:

    Input:          ...dianosed...
    Top Candidate:  ...diagnosed...
    Correction:     ...diagnosed...

II. Split

  • Finding: Find the top-ranked candidate (TokenObj)
  • Correction: Flat map the top-ranked candidate into the outTokenList
    The top-ranked candidate (the split words) needs to be flat mapped to a list of TokenObjs, which are then added to the outTokenList.
  • Java: OneToOneSplitCorrector.AddToFlatMapList
  • Example:

    Input:          ...brokenbonecannotsleep...
    Top Candidate:  ...broken bone can not sleep...
    Correction:     ...broken bone can not sleep...
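
    The one-to-one and split corrections above can be sketched with a single flat-map step. This is a minimal Java sketch, not the CSpell source: tokens are modeled as plain strings instead of TokenObj, and the names FlatMapCorrectionSketch, flatten, and correct are hypothetical. It also assumes that space tokens are kept in the output list and that the split words are separated by single space tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class FlatMapCorrectionSketch {
    // Expand one top-ranked candidate into the tokens it contributes.
    // Assumption: a one-to-one candidate ("diagnosed") yields one token,
    // while a split candidate ("broken bone") yields word and space tokens.
    static List<String> flatten(String candidate) {
        // A pure space token passes through unchanged.
        if (candidate.trim().isEmpty()) {
            return Collections.singletonList(candidate);
        }
        List<String> out = new ArrayList<>();
        for (String word : candidate.trim().split(" +")) {
            if (!out.isEmpty()) {
                out.add(" "); // re-insert a space token between split words
            }
            out.add(word);
        }
        return out;
    }

    // Flat map every candidate into one output token list
    // (the counterpart of the outTokenList described above).
    static List<String> correct(List<String> topCandidates) {
        return topCandidates.stream()
            .flatMap(c -> flatten(c).stream())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> candidates =
            Arrays.asList("diagnosed", " ", "broken bone can not sleep");
        System.out.println(String.join("", correct(candidates)));
    }
}
```

    Handling both cases in one method mirrors the fact that the page lists the same Java entry point for the one-to-one and split corrections: a one-to-one candidate is simply a flat map that produces a single token.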

III. Merge

  • Finding: Find the top-ranked candidate (TokenObj)
  • Correction:
    • Update tokens for all MergeObjs
      • Go through all MergeObjs
      • Update the tokens before the start of the target merge
      • Update the merge at the target
    • Add the tokens after the last MergeObj
  • Java: ProcessNonWordMerge.CorrectTokenListByMerge
  • Example:

    Input:         ...problemsduringherpregnancies.
    Correction-1:  ...problems
    Correction-2:  ...problemsduring
    Correction-3:  ...problemsduringherpregnancies.

    * MergeObj:

    tarWord | mergeWord | coreMergeWord | mergeNo | tarIndex | startIndex | endIndex | tarPos | startPos | endPos
    • xxxIndex is the index in the original text (including space tokens). It is used in the merge operation to correct the input text.
    • xxxPos is the index in the non-space token list. It is used to find the context for context scores.
    • coreMergeWord is used to take care of ending punctuation, for example when merging "disap point ment." to "disappointment."
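
    The merge correction steps and the two index types above can be sketched as follows. This is a minimal Java sketch under stated assumptions, not the CSpell implementation: tokens are plain strings, MergeSpan is a hypothetical reduced stand-in for MergeObj that keeps only mergeWord, startIndex, and endIndex, the merges are assumed to be sorted and non-overlapping, and the tokenized input in main is illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class MergeCorrectionSketch {
    // Hypothetical, reduced stand-in for MergeObj.
    static final class MergeSpan {
        final String mergeWord; // merged word to write at the target
        final int startIndex;   // first merged token (space tokens counted)
        final int endIndex;     // last merged token, inclusive
        MergeSpan(String mergeWord, int startIndex, int endIndex) {
            this.mergeWord = mergeWord;
            this.startIndex = startIndex;
            this.endIndex = endIndex;
        }
    }

    static List<String> correctByMerge(List<String> inTokens, List<MergeSpan> merges) {
        List<String> out = new ArrayList<>();
        int cur = 0;
        for (MergeSpan m : merges) {
            // Tokens before the start of this merge are carried over as-is.
            for (; cur < m.startIndex; cur++) {
                out.add(inTokens.get(cur));
            }
            // The merged word replaces the tokens startIndex..endIndex.
            out.add(m.mergeWord);
            cur = m.endIndex + 1;
        }
        // Tokens after the last merge are carried over as-is.
        for (; cur < inTokens.size(); cur++) {
            out.add(inTokens.get(cur));
        }
        return out;
    }

    // Convert an xxxIndex (space tokens counted) to an xxxPos
    // (position in the non-space token list).
    static int toPos(List<String> tokens, int index) {
        int pos = 0;
        for (int i = 0; i < index; i++) {
            if (!tokens.get(i).trim().isEmpty()) {
                pos++;
            }
        }
        return pos;
    }

    public static void main(String[] args) {
        List<String> in = Arrays.asList("disap", " ", "point", " ", "ment.");
        List<MergeSpan> merges =
            Collections.singletonList(new MergeSpan("disappointment.", 0, 4));
        System.out.println(String.join("", correctByMerge(in, merges)));
    }
}
```

    Keeping both index types lets the correction step address tokens in the full list, where spaces are real tokens, while context scoring can work on the non-space token list.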