Non-word Correction
This page describes the algorithm for non-word correction.
I. Functions
II. Results on Training Set
Results of CSpell non-word correction in ranking mode on the development set, for different function modes:
Function Mode | Raw data (TP / Retrieved / Relevant) | Performance (Precision / Recall / F1) |
---|---|---|
ESpell | 230 / 1180 / 774 | 0.1949 / 0.2972 / 0.2354 |
Jazzy (ASpell) | 186 / 393 / 774 | 0.4733 / 0.2403 / 0.3188 |
Ensemble | 552 / 825 / 774 | 0.6691 / 0.7132 / 0.6904 |
CSpell, non-dictionary-based | | |
non-dictionary-based | 340 / 373 / 774 | 0.9115 / 0.4393 / 0.5929 |
CSpell, non-word, Single Function | | |
1-to-1 | 588 / 699 / 774 | 0.8412 / 0.7597 / 0.7984 |
Split | 365 / 469 / 774 | 0.7783 / 0.4716 / 0.5873 |
Merge | 343 / 382 / 774 | 0.8979 / 0.4432 / 0.5934 |
CSpell, non-word, Combined Functions | | |
1-to-1 + Split | 603 / 724 / 774 | 0.8329 / 0.7791 / 0.8051 |
1-to-1 + Split + Merge | 606 / 731 / 774 | 0.8290 / 0.7829 / 0.8053 |
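
The three performance figures in each row can be reproduced from the three raw counts. The sketch below assumes the raw counts are, in order, true positives, retrieved corrections, and relevant corrections; this reading is inferred from the arithmetic, and the function name `prf` is illustrative only.

```python
def prf(tp, retrieved, relevant):
    """Return (precision, recall, F1) rounded to 4 decimals.

    Assumed meaning of the inputs (inferred from the table, not stated in it):
    tp = correct corrections, retrieved = corrections made,
    relevant = corrections that should have been made.
    """
    precision = tp / retrieved
    recall = tp / relevant
    f1 = 2 * precision * recall / (precision + recall)
    return round(precision, 4), round(recall, 4), round(f1, 4)


# Row "1-to-1 + Split + Merge": 606, 731, 774
print(prf(606, 731, 774))  # (0.829, 0.7829, 0.8053)
```

The same call with the ESpell counts (230, 1180, 774) yields 0.1949, 0.2972, and 0.2354, matching the first row.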
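From the results: 1-to-1 is the strongest single function (F1 0.7984), while Split and Merge on their own have much lower recall (0.4716 and 0.4432). Adding Split and Merge to 1-to-1 raises recall and gives the highest F1 in the table (0.8053), above ESpell, Jazzy (ASpell), and Ensemble.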
III. Examples
ID | Input | Output | Notes |
---|---|---|---|
ND-1 | "Good" | "Good" | Xml/Html handler |
ND-2 | pls | please | Informal Expression handler |
ND-3 | 20years | 20 years | Leading Digit Splitter |
ND-4 | from2007 | from 2007 | Ending Digit Splitter |
ND-5 | volunteers(healthy) | volunteers (healthy) | Leading Punctuation Splitter |
ND-6 | pain.help! | pain. help! | Ending Punctuation Splitter |
ND-7 | pain.pls help! | pain. please help! | Combo |
ND-8 | visit at pain.com! | visit at pain.com! | No correction! |
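
The splitter and informal-expression rows above (ND-2 to ND-8) can be approximated with a few regular expressions. This is a minimal sketch, not CSpell's implementation: the `INFORMAL` map, the `URL_RE` pattern, and the function name `correct_non_dictionary` are assumptions made for illustration, and the XML/HTML handler of ND-1 is left out.

```python
import re

# Hypothetical informal-expression map; the real list is not shown on this page.
INFORMAL = {"pls": "please"}

# Crude URL test (an assumption of this sketch) so that "pain.com!" is left alone.
URL_RE = re.compile(r"[\w-]+\.(com|org|net|edu|gov)\b", re.IGNORECASE)


def correct_non_dictionary(text):
    out = []
    for token in text.split():
        if URL_RE.search(token):
            out.append(token)  # ND-8: no correction for URLs
            continue
        # Leading digit splitter: 20years -> 20 years
        token = re.sub(r"^(\d+)([A-Za-z])", r"\1 \2", token)
        # Ending digit splitter: from2007 -> from 2007
        token = re.sub(r"([A-Za-z])(\d+)$", r"\1 \2", token)
        # Leading punctuation splitter: volunteers(healthy) -> volunteers (healthy)
        token = re.sub(r"([A-Za-z])([(\[{])", r"\1 \2", token)
        # Ending punctuation splitter: pain.help! -> pain. help!
        token = re.sub(r"([.,;:!?)\]}])([A-Za-z])", r"\1 \2", token)
        # Informal expression handler on each resulting piece: pls -> please
        out.extend(INFORMAL.get(piece, piece) for piece in token.split())
    return " ".join(out)


print(correct_non_dictionary("pain.pls help!"))  # pain. please help!
```

The order of the steps matters in this sketch: the URL check runs first so that the ending punctuation splitter never sees `pain.com!`.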

ID | Input | Output | Notes |
---|---|---|---|
M-1 | dur ing | during | Merge |
M-2 | non drug | nondrug | Merge |
M-3 | non protein | non-protein | Merge with hyphen |
M-4 | non surgical | non surgical | No merge |

Multiword | Non-word element |
---|---|
non surgical | non |
in vitro | vitro |
in vivo grown | vivo |
intra articular route | intra |
per se | se |
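
One way to read the two tables above is that adjacent tokens are merged when the merged form (with or without a hyphen) is a known word, unless the pair is itself a known multiword. The sketch below illustrates that reading only; the toy `DICTIONARY` and `MULTIWORDS` sets and the function name `merge_pair` are hypothetical, and CSpell's actual candidate generation and ranking are not shown here.

```python
# Toy word list, made up for this sketch. "nonsurgical" is included on purpose
# to show that the multiword check is what prevents the merge in M-4.
DICTIONARY = {"during", "nondrug", "non-protein", "nonsurgical",
              "drug", "protein", "surgical"}

# Multiwords taken from the table above.
MULTIWORDS = {"non surgical", "in vitro", "in vivo grown",
              "intra articular route", "per se"}


def merge_pair(left, right):
    phrase = f"{left} {right}"
    if phrase in MULTIWORDS:
        return phrase  # M-4: a known multiword is left unmerged
    if left in DICTIONARY and right in DICTIONARY:
        return phrase  # both tokens are real words, so not a non-word case
    # Try the plain merge first, then the hyphenated merge.
    for candidate in (left + right, f"{left}-{right}"):
        if candidate in DICTIONARY:
            return candidate
    return phrase


print(merge_pair("non", "protein"))  # non-protein
```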

ID | Input | Output |
---|---|---|
1-1 | good diagnosised | good diagnosis |
1-2 | was diagnosised with | was diagnosed with |
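
Examples 1-1 and 1-2 correct the same non-word differently depending on its neighbors. The sketch below shows one simple way such a choice could be made, assuming the candidates have already been generated and that some context score is available. The `CONTEXT_COUNTS` values are made up for illustration (they are not CSpell data), and the scoring function is a stand-in, not the method CSpell uses.

```python
# Hypothetical co-occurrence counts, invented for this sketch only.
CONTEXT_COUNTS = {
    ("good", "diagnosis"): 5,
    ("good", "diagnosed"): 1,
    ("was", "diagnosed"): 9,
    ("was", "diagnosis"): 1,
    ("diagnosed", "with"): 8,
    ("diagnosis", "with"): 2,
}


def context_score(prev_word, candidate, next_word):
    """Sum the toy counts of the candidate with its left and right neighbors."""
    score = 0
    if prev_word is not None:
        score += CONTEXT_COUNTS.get((prev_word, candidate), 0)
    if next_word is not None:
        score += CONTEXT_COUNTS.get((candidate, next_word), 0)
    return score


def pick_candidate(words, index, candidates):
    """Pick the candidate for words[index] that best fits its neighbors."""
    prev_word = words[index - 1] if index > 0 else None
    next_word = words[index + 1] if index + 1 < len(words) else None
    return max(candidates, key=lambda c: context_score(prev_word, c, next_word))


candidates = ["diagnosis", "diagnosed"]
print(pick_candidate(["good", "diagnosised"], 1, candidates))         # diagnosis
print(pick_candidate(["was", "diagnosised", "with"], 1, candidates))  # diagnosed
```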

ID | Input | Output |
---|---|---|
S-1 | thankyou | thank you |
S-2 | shuntfrom2007.how | shunt from 2007. how |
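
A split like S-1 can be sketched as a search for a position where both halves are known words. The toy `DICTIONARY` and the function name `split_candidates` are assumptions for illustration. The sketch covers a single split point only, so the digit and punctuation handling seen in S-2, and any ranking among several candidates, are outside its scope.

```python
# Toy word list, made up for this sketch.
DICTIONARY = {"thank", "you", "shunt", "from", "how"}


def split_candidates(token, dictionary=DICTIONARY):
    """Return every two-word split of token whose halves are both dictionary words."""
    return [
        f"{token[:i]} {token[i:]}"
        for i in range(1, len(token))
        if token[:i] in dictionary and token[i:] in dictionary
    ]


print(split_candidates("thankyou"))  # ['thank you']
```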