Non-dictionary-based Corrections
This is the first step of spelling correction. It corrects errors that can be fixed without a dictionary. The non-dictionary-based correction model consists of handlers and splitters, arranged as a chain of intermediate operators. The chain handles HTML/XML tags introduced by the software that consumers use to submit questions, as well as informal expressions. It also handles missing spaces next to punctuation or digits. Pattern matching (regular expressions) and table lookup are used for this type of correction. The software components developed to resolve these issues are detailed as follows:
Types of Splitter | Error | Correction | File Name |
---|---|---|---|
Leading Digit Splitter | 20years | 20 years | 10349 |
Ending Digit Splitter | disease3 | disease 3 | 26 |
Leading Punctuation Splitter | volunteers( | volunteers ( | 12353 |
Ending Punctuation Splitter | cancer?if | cancer? if | 10004 |
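
The four splitters in the table above can be sketched as a small chain of regular-expression rules. This is only an illustration, not the actual implementation: the patterns, the punctuation sets, and the names `SPLITTERS` and `correct` are assumptions chosen to reproduce the examples in the table. A real system would likely need exception lists (for example, ordinals such as `2nd` or measurements written without a space), which this sketch does not handle.

```python
import re

# Hypothetical patterns; the real rules are not given in the text.
LEADING_DIGIT = re.compile(r"^(\d+)([A-Za-z]+)$")        # 20years   -> 20 years
ENDING_DIGIT = re.compile(r"^([A-Za-z]+)(\d+)$")         # disease3  -> disease 3
LEADING_PUNCT = re.compile(r"(\w)([(\[{])")              # word(     -> word (
ENDING_PUNCT = re.compile(r"([?!,;:)\]}])([A-Za-z])")    # cancer?if -> cancer? if


def split_leading_digit(token: str) -> str:
    return LEADING_DIGIT.sub(r"\1 \2", token)


def split_ending_digit(token: str) -> str:
    return ENDING_DIGIT.sub(r"\1 \2", token)


def split_leading_punct(token: str) -> str:
    # Insert a space before an opening bracket glued to the previous word.
    return LEADING_PUNCT.sub(r"\1 \2", token)


def split_ending_punct(token: str) -> str:
    # Insert a space after closing punctuation glued to the next word.
    return ENDING_PUNCT.sub(r"\1 \2", token)


# The chain: each token is passed through every splitter in order.
SPLITTERS = [
    split_leading_digit,
    split_ending_digit,
    split_leading_punct,
    split_ending_punct,
]


def correct(text: str) -> str:
    corrected = []
    for token in text.split():
        for splitter in SPLITTERS:
            token = splitter(token)
        corrected.append(token)
    return " ".join(corrected)


if __name__ == "__main__":
    for sample in ["20years", "disease3", "volunteers(", "cancer?if"]:
        print(sample, "->", correct(sample))
```

Keeping each rule as a separate function in a list mirrors the chain-of-operators arrangement described above: rules can be reordered, removed, or extended without touching the others.
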
File Name | Error | Correction |
---|---|---|
14 | knowabout | know about |
26 | diseaseany | disease any |
11841 | Iam | I am |
11186 | tbinthe | tb in the |
14849 | shuntfrom | shunt from |
10349 | along | a long |