Ensemble Performance
This page describes the initial performance tests on the Ensemble method (from Dr. Halil).
The source code of the Ensemble spelling correction, used as the baseline for development and comparison, performs slightly better than reported in the paper, for the following reasons.
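Results on 472 files, tested on lexdev, are listed in the following tables. In each row, Retrieved = TP + FP and Relevant = TP + FN. The Relevant counts of the second and third tables (637 and 177) add up to the 814 of the first, so the latter two appear to break the first down into two subsets: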
Type | Option | TP | FP | FN | Retrieved | Relevant | Precision | Recall | F-1 | Run Time |
---|---|---|---|---|---|---|---|---|---|---|
Non-word | PreProcess Only | 289 | 58 | 525 | 347 | 814 | 0.8329 | 0.3550 | 0.4978 | 87 min |
Non-word | W/ Orthographic Similarity | 495 | 329 | 319 | 824 | 814 | 0.6007 | 0.6081 | 0.6044 | 82 min |
Non-word | W/ Corpus Frequency | 361 | 449 | 453 | 810 | 814 | 0.4457 | 0.4435 | 0.4446 | 83 min |
Non-word | W/ Context Similarity | 350 | 457 | 464 | 807 | 814 | 0.4337 | 0.4300 | 0.4318 | 80 min |
Non-word | All (Ensemble) | 531 | 294 | 283 | 825 | 814 | 0.6436 | 0.6523 | 0.6480 | 80 min |
Real-word | All (Ensemble) | | | | | | | | | |
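The Precision, Recall and F-1 columns follow directly from TP, FP and FN. As a minimal sketch, the Python below recomputes them; `Counts` and `prf` are hypothetical helpers written for this page, not part of the Ensemble source code. The example reproduces the Ensemble row of the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Counts for one table row (hypothetical helper, not from the Ensemble code)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives

    @property
    def retrieved(self) -> int:
        # Retrieved = TP + FP, as in every row of the tables
        return self.tp + self.fp

    @property
    def relevant(self) -> int:
        # Relevant = TP + FN, as in every row of the tables
        return self.tp + self.fn


def prf(c: Counts) -> tuple[float, float, float]:
    """Return (precision, recall, F-1); a zero denominator yields 0.0."""
    precision = c.tp / c.retrieved if c.retrieved else 0.0
    recall = c.tp / c.relevant if c.relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


ensemble = Counts(tp=531, fp=294, fn=283)
p, r, f = prf(ensemble)
print(f"{p:.4f} {r:.4f} {f:.4f}")  # 0.6436 0.6523 0.6480
```

Guarding each division keeps the helper usable for a configuration that retrieves nothing, where precision would otherwise be undefined.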
Type | Option | TP | FP | FN | Retrieved | Relevant | Precision | Recall | F-1 | Run Time |
---|---|---|---|---|---|---|---|---|---|---|
Non-word | PreProcess Only | 221 | 53 | 416 | 274 | 637 | 0.8066 | 0.3469 | 0.4852 | 80 min |
Non-word | W/ Orthographic Similarity | 388 | 267 | 249 | 655 | 637 | 0.5924 | 0.6091 | 0.6006 | 71 min |
Non-word | W/ Corpus Frequency | 278 | 363 | 359 | 641 | 637 | 0.4337 | 0.4364 | 0.4351 | 72 min |
Non-word | W/ Context Similarity | 268 | 371 | 369 | 639 | 637 | 0.4194 | 0.4207 | 0.4201 | 70 min |
Non-word | All (Ensemble) | 413 | 243 | 224 | 656 | 637 | 0.6296 | 0.6484 | 0.6388 | 70 min |
Real-word | All (Ensemble) | | | | | | | | | |
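The F-1 column can also be checked straight from the TP, Retrieved and Relevant columns, without computing precision and recall first. With $P = TP/\text{Retrieved}$ and $R = TP/\text{Relevant}$:

$$
F_1 = \frac{2PR}{P+R}
    = \frac{2\,\dfrac{TP}{\text{Retrieved}}\cdot\dfrac{TP}{\text{Relevant}}}{\dfrac{TP}{\text{Retrieved}}+\dfrac{TP}{\text{Relevant}}}
    = \frac{2\,TP}{\text{Retrieved}+\text{Relevant}}
$$

For example, for the Ensemble row of the table above, $2 \cdot 413 / (656 + 637) = 826 / 1293 \approx 0.6388$.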
Type | Option | TP | FP | FN | Retrieved | Relevant | Precision | Recall | F-1 | Run Time |
---|---|---|---|---|---|---|---|---|---|---|
Non-word | PreProcess Only | 68 | 5 | 109 | 73 | 177 | 0.9315 | 0.3842 | 0.5440 | 10 min |
Non-word | W/ Orthographic Similarity | 107 | 62 | 70 | 169 | 177 | 0.6331 | 0.6045 | 0.6185 | 10 min |
Non-word | W/ Corpus Frequency | 83 | 86 | 94 | 169 | 177 | 0.4911 | 0.4689 | 0.4798 | 10 min |
Non-word | W/ Context Similarity | 83(82) | 85(86) | 94(95) | 168 | 177 | 0.4940 | 0.4689 | 0.4812 | 10 min |
Non-word | All (Ensemble) | 117 | 52 | 60 | 169 | 177 | 0.6923 | 0.6610 | 0.6763 | 10 min |
Real-word | All (Ensemble) | | | | | | | | | |
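If the second and third tables cover two disjoint subsets of the 472 files (an assumption: their TP, FP and FN counts add up to the first table's for most rows, e.g. 388 + 107 = 495 TP with orthographic similarity, although the Ensemble rows give 413 + 117 = 530 rather than 531), then the first table's scores are micro-averaged: counts are summed over the subsets before the metrics are computed. The sketch below uses hypothetical helpers `pool` and `f1` to show that pooling reproduces the first table's F-1 of 0.6044 for orthographic similarity, whereas averaging the two subset F-1 scores gives a different number.

```python
from statistics import mean


def pool(*rows: tuple[int, int, int]) -> tuple[int, ...]:
    """Sum (TP, FP, FN) over disjoint subsets, i.e. micro-averaging (hypothetical helper)."""
    return tuple(sum(column) for column in zip(*rows))


def f1(tp: int, fp: int, fn: int) -> float:
    # 2PR / (P + R) simplifies to 2TP / (2TP + FP + FN) = 2TP / (Retrieved + Relevant)
    return 2 * tp / (2 * tp + fp + fn)


# "W/ Orthographic Similarity" rows of the second and third tables, as (TP, FP, FN).
# The names subset_a / subset_b are placeholders; the page does not name the subsets.
subset_a = (388, 267, 249)
subset_b = (107, 62, 70)

pooled = pool(subset_a, subset_b)                        # counts summed first
micro_f1 = f1(*pooled)                                   # metric on pooled counts
macro_f1 = mean([f1(*subset_a), f1(*subset_b)])          # mean of per-subset F-1

print(pooled, round(micro_f1, 4), round(macro_f1, 4))    # (495, 329, 319) 0.6044 0.6096
```

Micro-averaging weights each subset by its number of errors, so the larger subset (637 relevant errors versus 177) dominates the pooled score; the plain mean of the two F-1 values treats both subsets equally.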