CSpell

CSpell, Java 8.0, UTF-8, 2018 Release:

CSpell, a distributable spell checker for consumer language, is designed to detect and correct various types of spelling errors in consumer health questions. It handles non-word errors, real-word errors, word boundary infractions, punctuation errors, informal expressions, and combinations of these, with a high F1 score and real-time performance. CSpell provides many correction features, configurable options, and Java APIs, and can be used as a general-purpose spelling tool. The following table shows examples of errors from consumer health questions that CSpell corrects. The errors are underlined and the corrections are in italics. NW, RW, and ND stand for non-word, real-word, and non-dictionary, respectively.

Ex-1
  Text from consumer health question: My mom was dianosed early on set deminita 3 years ago.
  Corrected text by CSpell: My mom was diagnosed early onset dementia 3 years ago.
  Corrections:
    • dianosed -> diagnosed (NW spelling)
    • on set -> onset (RW merge)
    • deminita -> dementia (NW spelling)

Ex-2
  Text from consumer health question: ... doctors treated tricho rhino phalangeal syndrome.
  Corrected text by CSpell: ... doctors treated trichorhinophalangeal syndrome.
  Corrections:
    • tricho rhino phalangeal -> trichorhinophalangeal (NW merge)

Ex-3
  Text from consumer health question: Irregular bowl movement
  Corrected text by CSpell: Irregular bowel movement
  Corrections:
    • bowl -> bowel (RW spelling)

Ex-4
  Text from consumer health question: Sounding in my ear every time for along time.
  Corrected text by CSpell: Sounding in my ear every time for a long time.
  Corrections:
    • along -> a long (RW split)

Ex-5
  Text from consumer health question: Who need to do test?pls guide me thank u.
  Corrected text by CSpell: Who need to do test? please guide me thank you.
  Corrections:
    • test?pls -> test? pls (ND split) -> test? please (ND informal expression)
    • u -> you (ND informal expression)

Ex-6
  Text from consumer health question: I have a shuntfrom2007.How OftenDo they need changed?
  Corrected text by CSpell: I have a shunt from 2007. How often do they need changed?
  Corrections:
    • shuntfrom2007.How -> shuntfrom 2007. How (ND split) -> shunt from 2007. How (NW split)
    • OftenDo -> often do (NW split)

Ex-7
  Text from consumer health question: I am permanently depressed and was on 2 or 3 different anti depresants.
  Corrected text by CSpell: I am permanently depressed and was on 2 or 3 different antidepressants.
  Corrections:
    • anti depresants -> anti depressants (NW spelling) -> antidepressants (RW merge)
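
The split and merge corrections in the examples above can be illustrated with a minimal Java sketch. This is not CSpell's implementation or API: the class `BoundaryCandidates`, its method names, and the toy dictionary are all hypothetical, chosen only to show how a dictionary lookup can flag a non-word and propose split or merge candidates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical illustration only; not part of CSpell.
public class BoundaryCandidates {

    /** A token is treated as a non-word (NW) when the dictionary does not contain it. */
    public static boolean isNonWord(String token, Set<String> dict) {
        return !dict.contains(token.toLowerCase());
    }

    /** Split candidates: every way to cut the token into two dictionary words. */
    public static List<String> splitCandidates(String token, Set<String> dict) {
        List<String> out = new ArrayList<>();
        String t = token.toLowerCase();
        for (int i = 1; i < t.length(); i++) {
            String left = t.substring(0, i);
            String right = t.substring(i);
            if (dict.contains(left) && dict.contains(right)) {
                out.add(left + " " + right);
            }
        }
        return out;
    }

    /** Merge candidate: adjacent tokens joined, kept only if the result is a dictionary word. */
    public static Optional<String> mergeCandidate(List<String> tokens, Set<String> dict) {
        String merged = String.join("", tokens).toLowerCase();
        return dict.contains(merged) ? Optional.of(merged) : Optional.empty();
    }

    public static void main(String[] args) {
        // Toy dictionary (an assumption for this sketch).
        Set<String> dict = new HashSet<>(Arrays.asList(
                "a", "long", "along", "on", "set", "onset", "diagnosed"));

        System.out.println(isNonWord("dianosed", dict));                    // true
        System.out.println(splitCandidates("along", dict));                 // [a long]
        System.out.println(mergeCandidate(Arrays.asList("on", "set"), dict)
                .orElse("(no merge)"));                                     // onset
    }
}
```

Note that "along" and "on set" are valid dictionary words, so the sketch only generates candidates for these real-word cases. Deciding whether to apply a real-word split or merge requires context, which this sketch does not model.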

Please see:

  • Download CSpell, 2018 for installing CSpell on your local machine.

Release Notes

  • Developed in Java 1.8.0_171
  • Correction features:
    • Errors: non-word errors and real-word errors
    • Corrections: spelling, split and merge corrections
    • Dictionary: dictionary-based and non-dictionary-based corrections
    • Ranking Techniques: Combination of context, edit distance, phonetic, overlap, word frequency, noisy channel, etc.
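
As a rough illustration of how two of the ranking signals listed above might be combined, the following Java sketch orders correction candidates by Levenshtein edit distance and breaks ties by word frequency. It is a simplified assumption, not CSpell's ranking method: the class `CandidateRanker`, the tie-break scheme, and the frequency counts are hypothetical, and the context, phonetic, overlap, and noisy channel signals are not modeled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of CSpell.
public class CandidateRanker {

    /** Levenshtein edit distance (insertions, deletions, substitutions), two-row DP. */
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Rank candidates: smaller edit distance first, then higher frequency first. */
    public static List<String> rank(String token, Map<String, Integer> freq) {
        List<String> candidates = new ArrayList<>(freq.keySet());
        candidates.sort((x, y) -> {
            int byDistance = Integer.compare(editDistance(token, x), editDistance(token, y));
            if (byDistance != 0) {
                return byDistance;
            }
            return Integer.compare(freq.get(y), freq.get(x)); // higher frequency wins ties
        });
        return candidates;
    }

    public static void main(String[] args) {
        // Made-up frequency counts (an assumption for this sketch).
        Map<String, Integer> freq = new LinkedHashMap<>();
        freq.put("dined", 40);
        freq.put("diagnose", 120);
        freq.put("diagnosed", 300);

        System.out.println(rank("dianosed", freq)); // [diagnosed, diagnose, dined]
    }
}
```

A strict tie-break keeps the sketch easy to follow. A weighted score over several signals would be a natural alternative when more than two signals are combined.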