Non-word Correction
This page describes the algorithm for non-word correction.
I. Functions
II. Results on Training Set
CSpell's ranking mode was tested on the development set for non-word correction with different function modes:
| Function Mode | TP | Retrieved | Relevant | Precision | Recall | F1 |
|---|---|---|---|---|---|---|
| ESpell | 230 | 1180 | 774 | 0.1949 | 0.2972 | 0.2354 |
| Jazzy (ASpell) | 186 | 393 | 774 | 0.4733 | 0.2403 | 0.3188 |
| Ensemble | 552 | 825 | 774 | 0.6691 | 0.7132 | 0.6904 |
| CSpell, non-dictionary-based | | | | | | |
| non-dictionary-based | 340 | 373 | 774 | 0.9115 | 0.4393 | 0.5929 |
| CSpell, non-word, Single Function | | | | | | |
| 1-to-1 | 588 | 699 | 774 | 0.8412 | 0.7597 | 0.7984 |
| Split | 365 | 469 | 774 | 0.7783 | 0.4716 | 0.5873 |
| Merge | 343 | 382 | 774 | 0.8979 | 0.4432 | 0.5934 |
| CSpell, non-word, Combined Functions | | | | | | |
| 1-to-1 + Split | 603 | 724 | 774 | 0.8329 | 0.7791 | 0.8051 |
| 1-to-1 + Split + Merge | 606 | 731 | 774 | 0.8290 | 0.7829 | 0.8053 |
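
The performance figures in each row follow from its three raw counts, assuming those counts are true positives, retrieved corrections, and relevant corrections (an interpretation that reproduces every row). A minimal sketch of that arithmetic, with a hypothetical helper name `prf`:

```python
from typing import Tuple


def prf(tp: int, retrieved: int, relevant: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from raw counts.

    Assumes the counts are true positives, retrieved items, and relevant
    items; the function name and signature are illustrative only.
    """
    precision = tp / retrieved if retrieved else 0.0
    recall = tp / relevant if relevant else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Counts taken from the "1-to-1 + Split + Merge" row.
p, r, f = prf(606, 731, 774)
print(f"{p:.4f} {r:.4f} {f:.4f}")  # prints "0.8290 0.7829 0.8053"
```

The same call with `prf(230, 1180, 774)` gives the ESpell row, which makes it easy to check any row against its raw counts.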
From the results:
III. Examples
Non-dictionary-based:

| ID | Input | Output | Notes |
|---|---|---|---|
| ND-1 | `&quot;Good&quot;` | "Good" | Xml/Html handler |
| ND-2 | pls | please | Informal Expression handler |
| ND-3 | 20years | 20 years | Leading Digit Splitter |
| ND-4 | from2007 | from 2007 | Ending Digit Splitter |
| ND-5 | volunteers(healthy) | volunteers (healthy) | Leading Punctuation Splitter |
| ND-6 | pain.help! | pain. help! | Ending Punctuation Splitter |
| ND-7 | pain.pls help! | pain. please help! | Combo |
| ND-8 | visit at pain.com! | visit at pain.com! | No correction! |
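
The handlers named in the Notes column can be approximated with a few regular expressions. The sketch below is not CSpell's implementation: the `INFORMAL` map, the `URL_RE` pattern, and both function names are hypothetical, and it assumes the Xml/Html handler decodes character entities.

```python
import html
import re

# Toy informal-expression map (hypothetical; a real list would be much larger).
INFORMAL = {"pls": "please"}

# Hypothetical URL check so that tokens such as "pain.com!" are left alone.
URL_RE = re.compile(r"^[\w.-]+\.(?:com|org|gov|edu|net)\b", re.IGNORECASE)


def split_token(token):
    """Split one whitespace-delimited token at digit and punctuation boundaries."""
    if URL_RE.match(token):
        return [token]
    s = token
    # Leading punctuation: add a space before an opening bracket that follows a letter.
    s = re.sub(r"(?<=[A-Za-z])(?=[(\[{])", " ", s)
    # Ending punctuation: add a space after punctuation that is followed by a letter.
    s = re.sub(r"(?<=[.?!,;:])(?=[A-Za-z])", " ", s)
    # Leading digit: "20years" -> "20 years".
    s = re.sub(r"(?<=\d)(?=[A-Za-z])", " ", s)
    # Ending digit: "from2007" -> "from 2007".
    s = re.sub(r"(?<=[A-Za-z])(?=\d)", " ", s)
    return s.split()


def correct_text(text):
    """Decode entities, split tokens, then expand informal expressions."""
    out = []
    for token in html.unescape(text).split():
        for piece in split_token(token):
            out.append(INFORMAL.get(piece.lower(), piece))
    return " ".join(out)


print(correct_text("pain.pls help!"))      # pain. please help!
print(correct_text("visit at pain.com!"))  # visit at pain.com!
```

These rules are deliberately naive (for example, they would also split "2nd"), so a production handler would need exception lists that this sketch omits.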

Merge:

| ID | Input | Output | Notes |
|---|---|---|---|
| M-1 | dur ing | during | Merge |
| M-2 | non drug | nondrug | Merge |
| M-3 | non protein | non-protein | Merge with hyphen |
| M-4 | non surgical | non surgical | No merge |
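
One way to reproduce the four merge outcomes is to try joining a non-word with the token that follows it, with and without a hyphen, and to skip pairs that are known multiwords. This is only a sketch under those assumptions: `DICTIONARY`, `MULTIWORDS`, and the function names are toy, hypothetical stand-ins, and the real merge logic may look at more context.

```python
# Toy lexicon and multiword list (hypothetical contents, for illustration only).
DICTIONARY = {"during", "nondrug", "non-protein", "drug", "protein", "surgical"}
MULTIWORDS = {"non surgical", "in vitro", "per se"}


def merged_form(left, right):
    """Return a dictionary word formed by merging two tokens, or None."""
    if f"{left} {right}".lower() in MULTIWORDS:
        return None  # a known multiword is kept as separate tokens
    for candidate in (left + right, f"{left}-{right}"):
        if candidate.lower() in DICTIONARY:
            return candidate
    return None


def merge_nonwords(text):
    """Merge each non-word with its right neighbor when that yields a word."""
    tokens = text.split()
    out, i = [], 0
    while i < len(tokens):
        token = tokens[i]
        candidate = None
        if token.lower() not in DICTIONARY and i + 1 < len(tokens):
            candidate = merged_form(token, tokens[i + 1])
        if candidate is not None:
            out.append(candidate)
            i += 2  # both tokens were consumed by the merge
        else:
            out.append(token)
            i += 1
    return " ".join(out)


for sample in ("dur ing", "non drug", "non protein", "non surgical"):
    print(sample, "->", merge_nonwords(sample))
```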

Multiwords with a non-word element:

| Multiword | Non-word element |
|---|---|
| non surgical | non |
| in vitro | vitro |
| in vivo grown | vivo |
| intra articular route | intra |
| per se | se |

1-to-1:

| ID | Input | Output |
|---|---|---|
| 1-1 | good diagnosised | good diagnosis |
| 1-2 | was diagnosised with | was diagnosed with |
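
Examples 1-1 and 1-2 correct the same non-word to different words, so the choice must depend on the neighboring tokens. The sketch below illustrates the idea with edit-distance candidates ranked by a toy context score. Everything in it is an assumption made for illustration: the `DICTIONARY`, the made-up `BIGRAMS` counts, and the function names are hypothetical, and the bigram score merely stands in for whatever context model the real ranker uses.

```python
# Toy lexicon and made-up bigram counts (hypothetical, for illustration only).
DICTIONARY = {"good", "was", "with", "diagnosis", "diagnosed"}
BIGRAMS = {("good", "diagnosis"): 5, ("was", "diagnosed"): 9, ("diagnosed", "with"): 7}


def edit_distance(a, b):
    """Levenshtein distance computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def context_score(left, candidate, right):
    """Sum the toy bigram counts of the candidate with its two neighbors."""
    return BIGRAMS.get((left, candidate), 0) + BIGRAMS.get((candidate, right), 0)


def correct_one_to_one(text, max_dist=2):
    """Replace each non-word with the best-ranked dictionary candidate."""
    tokens = text.split()
    out = []
    for i, token in enumerate(tokens):
        if token in DICTIONARY:
            out.append(token)
            continue
        left = tokens[i - 1] if i > 0 else None
        right = tokens[i + 1] if i + 1 < len(tokens) else None
        candidates = [w for w in DICTIONARY if edit_distance(token, w) <= max_dist]
        if not candidates:
            out.append(token)  # nothing close enough: leave the token unchanged
            continue
        # Highest context score first, then smallest distance, then alphabetical.
        best = min(
            candidates,
            key=lambda w: (-context_score(left, w, right), edit_distance(token, w), w),
        )
        out.append(best)
    return " ".join(out)


print(correct_one_to_one("good diagnosised"))      # good diagnosis
print(correct_one_to_one("was diagnosised with"))  # was diagnosed with
```

Both candidates are two edits away from "diagnosised", so in this sketch only the context score separates them.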

Split:

| ID | Input | Output |
|---|---|---|
| S-1 | thankyou | thank you |
| S-2 | shuntfrom2007.how | shunt from 2007. how |
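
A split correction can be sketched as dictionary-based segmentation: break a non-word into the fewest dictionary words that concatenate to it. The code below assumes a toy `DICTIONARY` and a hypothetical `max_parts` limit, and it uses a simple regular expression as a stand-in for the digit and punctuation boundary handling that example S-2 also needs. It does not rank competing splits the way a full system would.

```python
import re

# Toy lexicon (hypothetical contents, for illustration only).
DICTIONARY = {"thank", "you", "shunt", "from", "how"}


def segment(word, max_parts=3):
    """Return the shortest list of dictionary words that spell `word`, or None."""
    if word.lower() in DICTIONARY:
        return [word]
    if max_parts <= 1:
        return None
    best = None
    for i in range(1, len(word)):
        head = word[:i]
        if head.lower() in DICTIONARY:
            tail = segment(word[i:], max_parts - 1)
            if tail is not None and (best is None or len(tail) + 1 < len(best)):
                best = [head] + tail
    return best


def split_nonwords(text):
    """Separate digit and punctuation boundaries, then segment letter runs."""
    # Stand-in for the boundary handling: letter/digit and punctuation/letter.
    text = re.sub(
        r"(?<=[A-Za-z])(?=\d)|(?<=\d)(?=[A-Za-z])|(?<=[.?!])(?=[A-Za-z])", " ", text
    )

    def fix(match):
        word = match.group(0)
        parts = segment(word)
        return " ".join(parts) if parts else word  # unknown words stay unchanged

    return re.sub(r"[A-Za-z]+", fix, text)


print(split_nonwords("thankyou"))           # thank you
print(split_nonwords("shuntfrom2007.how"))  # shunt from 2007. how
```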