CSpell

Detectors

Detectors identify spelling errors, both non-word and real-word. Different corrections require different detectors. For example, the detector for non-word correction checks whether a token is a non-word error (i.e., a word not in the dictionary), while the detector for real-word correction checks whether a token is a real-word error (a valid word that is not the intended one). This page uses the detector for non-word (1-to-1) spelling correction to illustrate the concept of a detector. Please refer to each process for details on the other types of detectors.

The non-word 1-to-1 detector checks whether a token is a spelling error. A token is valid (does not need to be corrected) if it is known to the checking dictionary or is a spelling error exception. These are described as follows:

I. Dictionary

  • Checking Dictionary
    • Element (single) words only; multiwords are not needed because a token is a single word
    • Lower case
    • Verified lexicon is better
    • More coverage of the lexicon is better

II. Algorithm

  • Checking Dictionary
    Check the following on both the token and the core-term of the token
    • check possessive
    • check slash used as "or" (e.g. case/test)
    • check parenthetical plural forms: (s), (es), (ies)
  • Error Exceptions
    • IsDigit
    • IsPunc
    • IsDigitPunc
    • IsUrl
    • IsEmail
    • IsEmptyString
    • IsMeasurements (includes simplified measurements and pure units)
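
The dictionary check above can be sketched as follows. This is only an illustrative Python sketch, not CSpell's actual implementation: the word list CHECK_DIC, the helper names, and the definition of the core-term as the token with leading and trailing punctuation stripped are all assumptions made for this example.

```python
import re

# Hypothetical checking dictionary: lowercase element (single) words only.
CHECK_DIC = {"doctor", "case", "test", "tablet", "allergy"}


def in_dictionary(word: str) -> bool:
    """Case-insensitive lookup, since the dictionary is stored in lowercase."""
    return word.lower() in CHECK_DIC


def core_term(token: str) -> str:
    """Assumed core-term: the token without leading/trailing punctuation."""
    return re.sub(r"^\W+|\W+$", "", token)


def strip_possessive(word: str) -> str:
    """doctor's -> doctor, doctors' -> doctors."""
    if word.endswith("'s"):
        return word[:-2]
    if word.endswith("s'"):
        return word[:-1]
    return word


def strip_paren_plural(word: str) -> str:
    """tablet(s) -> tablet, allergy(ies) -> allergy."""
    for suffix in ("(ies)", "(es)", "(s)"):
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word


def is_known(term: str) -> bool:
    """Apply the three checks (possessive, slash, parenthetical plural)."""
    candidates = {term, strip_possessive(term), strip_paren_plural(term)}
    if any(in_dictionary(c) for c in candidates):
        return True
    # Slash used as "or": every part must be a known word.
    parts = term.split("/")
    return len(parts) > 1 and all(in_dictionary(p) for p in parts)


def is_valid_by_dictionary(token: str) -> bool:
    """Run the checks on both the token and its core-term."""
    return is_known(token) or is_known(core_term(token))
```

For instance, under these assumptions is_valid_by_dictionary("case/test") and is_valid_by_dictionary("(doctor),") both hold, while is_valid_by_dictionary("docter") does not, so "docter" would be passed on as a non-word error candidate unless it is an exception.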

III. Exception Examples

Input                                       Notes
year-long                                   Spelling variants
dont's                                      possessive
123                                         digit
123.456                                     digit
_                                           punctuation
12-35-00                                    digit and punctuation
12.35.00                                    digit and punctuation
clinicaltrials.gov                          url
http://www.yahoo.com?test=1%20try%20abc     url
123@gmail.com                               email
-0.25mm                                     measurement
30mg/50kg                                   measurement
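
The error exception checks can be approximated with a few regular expressions. The sketch below is a rough Python illustration only: the patterns, the short lists of top-level domains and units, the order of the checks, and the classify function are assumptions for this example and are not taken from CSpell's source. It covers the digit, punctuation, url, email, and measurement examples listed above; the spelling-variant and possessive examples are not handled by these predicates.

```python
import re
import string

_P = re.escape(string.punctuation)           # ASCII punctuation, regex-safe
_NUM = r"[+-]?\d+(?:\.\d+)?"                 # 123, 123.456, -0.25
_UNIT = r"(?:mm|cm|mg|kg|ml)"                # hypothetical unit list
_ITEM = rf"(?:{_NUM})?{_UNIT}"               # number + unit, or a pure unit
_TLD = r"(?:com|org|gov|edu|net)"            # hypothetical domain list

_DIGIT = re.compile(_NUM)
_PUNC = re.compile(rf"[{_P}]+")
_DIGIT_PUNC = re.compile(rf"[\d{_P}]+")
_URL = re.compile(rf"(?:https?://)?[\w-]+(?:\.[\w-]+)*\.{_TLD}(?:[/?#]\S*)?",
                  re.IGNORECASE)
_EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
_MEASUREMENT = re.compile(rf"{_ITEM}(?:/{_ITEM})*", re.IGNORECASE)

# Checked in order; the first pattern that matches the whole token wins.
_CHECKS = [
    ("digit", _DIGIT),
    ("punctuation", _PUNC),
    ("digit and punctuation", _DIGIT_PUNC),
    ("url", _URL),
    ("email", _EMAIL),
    ("measurement", _MEASUREMENT),
]


def classify(token: str):
    """Return the exception label for a token, or None if it is not one."""
    if token == "":
        return "empty string"
    for label, pattern in _CHECKS:
        if pattern.fullmatch(token):
            return label
    return None
```

A token for which classify returns a label would be treated as valid and skipped by the detector; a token for which it returns None would still need to pass the dictionary check.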