
Spell checker for consumer language (CSpell).

J Am Med Inform Assoc. 2019 Jan 21. doi: 10.1093/jamia/ocy171. [Epub ahead of print]
Abstract: 

Objective:

Automated understanding of consumer health inquiries might be hindered by misspellings. To detect and correct various types of spelling errors in consumer health questions, we developed a distributable spell-checking tool, CSpell, that handles nonword errors, real-word errors, word boundary infractions, punctuation errors, and combinations of the above.

Methods:

We developed a novel approach of using dual embedding within Word2vec for context-dependent corrections. This technique was used in combination with dictionary-based corrections in a 2-stage ranking system. We also developed various splitters and handlers to correct word boundary infractions. All correction approaches are integrated to handle errors in consumer health questions.
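
The 2-stage idea can be illustrated with a minimal sketch. It is not the CSpell implementation: it assumes stage 1 keeps the dictionary words closest to the token by edit distance and stage 2 breaks ties with a context score. The names `two_stage_rank`, `toy_context_score`, `max_dist`, and all toy data are hypothetical.

```python
from typing import Callable, Iterable, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def two_stage_rank(
    token: str,
    context: Sequence[str],
    dictionary: Iterable[str],
    context_score: Callable[[str, Sequence[str]], float],
    max_dist: int = 2,  # assumed cutoff, not taken from the paper
) -> str:
    """Return the best correction for `token`, or `token` if none qualifies."""
    # Stage 1 (dictionary-based): keep only the closest dictionary words.
    scored = [(edit_distance(token, word), word) for word in dictionary]
    scored = [(dist, word) for dist, word in scored if dist <= max_dist]
    if not scored:
        return token
    best = min(dist for dist, _ in scored)
    finalists = [word for dist, word in scored if dist == best]
    # Stage 2 (context-dependent): break ties among the finalists.
    return max(finalists, key=lambda word: context_score(word, context))


# Toy stand-in for a context model; the scores are invented for illustration.
TOY_CONTEXT_SCORES = {"diabetes": 0.9, "diabetic": 0.2}


def toy_context_score(candidate: str, context: Sequence[str]) -> float:
    return TOY_CONTEXT_SCORES.get(candidate, 0.0)


DICTIONARY = ["diabetes", "diabetic", "headache"]
correction = two_stage_rank("diabetis", ["i", "have"], DICTIONARY, toy_context_score)
print(correction)
```

Here "diabetes" and "diabetic" are both one edit away from "diabetis", so only the second stage can separate them.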

Results:

Our approach achieves F1 scores of 80.93% for spelling error detection and 69.17% for spelling error correction.
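
For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below computes it from raw counts. The counts are made-up toy values, not figures from the study, and the function name `f1_score` is only illustrative.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Toy counts (hypothetical): 8 errors handled correctly,
# 2 spurious changes, 2 errors missed.
toy_f1 = f1_score(true_pos=8, false_pos=2, false_neg=2)
print(f"{toy_f1:.2%}")
```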

Discussion:

The dual-embedding model shows a significant improvement (9.13%) in F1 score compared with the general practice of using cosine similarity with word vectors in Word2vec for context ranking. Our 2-stage ranking system shows a 4.94% improvement in F1 score compared with the best 1-stage ranking system.
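
The two context-ranking styles compared above can be contrasted in a small sketch. It assumes that "dual embedding" means scoring a candidate's output-layer vector against the averaged input-layer vectors of the context words, whereas the baseline takes cosine similarity between input-layer vectors only. The function names and the toy vectors are hypothetical, chosen only so the two scorers disagree; they are not evidence for either method.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine_context_score(candidate: str, context: Sequence[str],
                         in_vecs: Dict[str, Vector]) -> float:
    """Baseline: cosine similarity using input-layer vectors only."""
    ctx = [in_vecs[w] for w in context if w in in_vecs]
    if not ctx or candidate not in in_vecs:
        return 0.0
    mean, cand = mean_vector(ctx), in_vecs[candidate]
    norm = math.sqrt(dot(mean, mean)) * math.sqrt(dot(cand, cand))
    return dot(mean, cand) / norm if norm else 0.0


def dual_embedding_context_score(candidate: str, context: Sequence[str],
                                 in_vecs: Dict[str, Vector],
                                 out_vecs: Dict[str, Vector]) -> float:
    """Assumed dual-embedding score: context input vectors dotted with
    the candidate's output-layer vector."""
    ctx = [in_vecs[w] for w in context if w in in_vecs]
    if not ctx or candidate not in out_vecs:
        return 0.0
    return dot(mean_vector(ctx), out_vecs[candidate])


# Invented 2-dimensional vectors for a real-word error ("wait" vs "weight").
IN_VECS = {"lose": [1.0, 0.0], "weight": [0.6, 0.8], "wait": [0.8, 0.6]}
OUT_VECS = {"weight": [0.9, 0.1], "wait": [0.1, 0.9]}

context = ["lose"]
candidates = ["weight", "wait"]
by_cosine = max(candidates,
                key=lambda w: cosine_context_score(w, context, IN_VECS))
by_dual = max(candidates,
              key=lambda w: dual_embedding_context_score(w, context, IN_VECS, OUT_VECS))
print(by_cosine, by_dual)
```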

Conclusion:

CSpell improves over the state of the art and provides near real-time automatic misspelling detection and correction in consumer health questions. The software and the CSpell test set are available at https://umlslex.nlm.nih.gov/cSpell.


Lu C, Aronson AR, Shooshan SE, Demner-Fushman D. Spell checker for consumer language (CSpell). J Am Med Inform Assoc. 2019 Jan 21. doi: 10.1093/jamia/ocy171. [Epub ahead of print]