Factor Analysis Results
I. Error Types
Correction Type | Details |
---|---|
PreCorrection | |
Dictionary-based Correction | |
Combination | TBD |

Correction Type | Details |
---|---|
Not in checkDic, Not Correct | |
II. Analysis Results
The results on the baseline data are shown below:
Results | Jazzy | Baseline | Medline | Lexicon | Lexicon.E* | Combo1** | Combo2*** |
---|---|---|---|---|---|---|---|
Performance (by Baseline program) | |||||||
TP / Ret. / Rel.; Precision, Recall, F1 | | | | | | | |
Tagged terms (833), should be corrected | |||||||
B2.1. DicCorr (T) | 227 (48.5043%) | 232 (49.5726%) | 205 (43.8034%) | 234 (50.0000%) | 235 (50.2137%) | 226 (48.2906%) | 210 (44.8718%) |
B2.2. DicCorr (F) | 241 (51.4957%) | 236 (50.4274%) | 263 (56.1966%) | 234 (50.0000%) | 233 (49.7863%) | 242 (51.7094%) | 258 (55.1282%) |
Tag issue: re-check the annotation | |||||||
B2.2.1. Not detect, real-word (error tag) | 36 (7.6923%) | 49 (10.4701%) | 43 (9.1880%) | 50 (10.6838%) | 50 (10.6838%) | 50 (10.6838%) | 50 (10.6838%) |
Detection issue: Check dictionary + exception algorithm | |||||||
B2.2.2. Not detect, spelling error (non-word) | 20 (4.2735%) | 54 (11.5385%) | 76 (16.2393%) | 57 (12.1795%) | 57 (12.1795%) | 57 (12.1795%) | 85 (18.1624%) |
Candidate issue: edit distance + phonetic + Suggesting dictionary | |||||||
B2.2.3. Detect, not candidates by edit-distance | 37 (7.9060%) | 34 (7.2650%) | 29 (6.1966%) | 32 (6.8376%) | 32 (6.8376%) | 32 (6.8376%) | 28 (5.9829%) |
B2.2.4. Detect, not candidates by suggestion Dic | 79 (16.8803%) | 11 (2.3504%) | 19 (4.0598%) | 17 (3.6325%) | 20 (4.2735%) | 15 (3.2051%) | 15 (3.2051%) |
B2.2.5. Detect, not candidates by multi-corrections | 2 (0.4274%) | 6 (1.2821%) | 13 (2.7778%) | 5 (1.0684%) | 5 (1.0684%) | 6 (1.2821%) | 6 (1.2821%) |
Ranking issue: in candidate list | |||||||
B2.2.6. Detect, Candidates, wrong (not top) rank | 62 (13.2479%) | 75 (16.0256%) | 77 (16.4530%) | 65 (13.8889%) | 57 (12.1795%) | 75 (16.0256%) | 69 (14.7436%) |
B2.2.7. Detect, Candidates, wrong top rank | 5 (1.0684%) | 7 (1.4957%) | 6 (1.2821%) | 8 (1.7094%) | 12 (2.5641%) | 7 (1.4957%) | 5 (1.0684%) |
Valid word (not-tagged), but not in checkDic, corrected wrong | |||||||
A2.2.1. Not in checkDic, corrected wrong, by Dic | 1912 (7.8287%) | 139 (0.5691%) | 121 (0.4954%) | 143 (0.5855%) | 137 (0.5609%) | 70 (0.2866%) | 51 (0.2088%) |
A2.2.2. Not in checkDic, corrected wrong, by Pre | 41 (0.1679%) | 33 (0.1351%) | 27 (0.1106%) | 31 (0.1269%) | 31 (0.1269%) | 31 (0.1269%) | 26 (0.1065%) |
Summary | |||||||
Check Dic B2.2.2+A2.2.1+A2.2.2 | 1973 | 226 | 224 | 231 | 225 | 158 | 162 |
Sugg Dic B2.2.3+B2.2.4+B2.2.5 | 118 | 51 | 61 | 54 | 57 | 53 | 49 |
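
The two summary rows can be re-derived from the component rows. The following is a minimal sketch, not part of the original analysis: the counts are copied from the table above, and the names `COUNTS`, `SYSTEMS`, and `sum_rows` are hypothetical helpers introduced only for illustration. It shows that the Check Dic totals equal B2.2.2 + A2.2.1 + A2.2.2, and that the Sugg Dic totals equal B2.2.3 + B2.2.4 + B2.2.5.

```python
# Column order follows the results table header.
SYSTEMS = ["Jazzy", "Baseline", "Medline", "Lexicon", "Lexicon.E", "Combo1", "Combo2"]

# Counts copied from the results table, one list entry per system.
COUNTS = {
    "B2.2.2": [20, 54, 76, 57, 57, 57, 85],
    "B2.2.3": [37, 34, 29, 32, 32, 32, 28],
    "B2.2.4": [79, 11, 19, 17, 20, 15, 15],
    "B2.2.5": [2, 6, 13, 5, 5, 6, 6],
    "A2.2.1": [1912, 139, 121, 143, 137, 70, 51],
    "A2.2.2": [41, 33, 27, 31, 31, 31, 26],
}


def sum_rows(row_ids):
    """Add the named rows column by column (one total per system)."""
    return [sum(column) for column in zip(*(COUNTS[r] for r in row_ids))]


check_dic = sum_rows(["B2.2.2", "A2.2.1", "A2.2.2"])
sugg_dic = sum_rows(["B2.2.3", "B2.2.4", "B2.2.5"])

for name, check, sugg in zip(SYSTEMS, check_dic, sugg_dic):
    print(f"{name:10s} check_dic={check:5d} sugg_dic={sugg:4d}")
```

Recomputing the totals this way makes it easy to see which category dominates: for Jazzy, nearly all of the Check Dic total comes from A2.2.1 (1912 of 1973).
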
Edit distance | Instances | Percentage | Cumulative percentage |
---|---|---|---|
1 | 317 | 67.74% | 67.74% |
2 | 110 | 23.50% | 91.24% |
3 | 24 | 5.13% | 96.37% |
4 | 8 | 1.71% | 98.08% |
5 | 6 | 1.28% | 99.36% |
6 | 2 | 0.43% | 99.79% |
7 | 1 | 0.21% | 100.00% |
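
The running percentage in the last column can be reproduced from the instance counts alone. The sketch below is illustrative only: the counts are copied from the table above, and `cumulative_percentages` and `min_distance_for_coverage` are hypothetical helper names, not functions from the analyzed programs. The second helper shows one way such a table might be used, namely to pick the smallest edit distance that covers a target share of the errors.

```python
from itertools import accumulate

# Instances per edit distance 1..7, copied from the table above.
instances = [317, 110, 24, 8, 6, 2, 1]


def cumulative_percentages(counts):
    """Running share of the total, in percent, rounded to two decimals."""
    total = sum(counts)
    return [round(100 * running / total, 2) for running in accumulate(counts)]


def min_distance_for_coverage(counts, target_percent):
    """Smallest edit distance whose cumulative share reaches target_percent."""
    for distance, share in enumerate(cumulative_percentages(counts), start=1):
        if share >= target_percent:
            return distance
    return None  # target above 100% cannot be reached


print(sum(instances))
print(cumulative_percentages(instances))
print(min_distance_for_coverage(instances, 90))
```

With these counts, an edit distance limit of 2 already covers more than 90% of the 468 instances, while the remaining distances each add only a few cases.
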