Performance Tests on Ranking Systems (2-Stage)
I. Test Setup
- Data: Training Set
- Gold Standard: non-word only
- Dictionary: CSpell
- Corpus: consumer Health corpus
- Ranking: Combined scores
II. Test Results
| Ranking type | Stage-1 | Stage-2 | Precision | Recall | F1 |
|---|---|---|---|---|---|
| 1-Stage Single Ranking | Orthographic | N/A | 0.7606 | 0.7636 | 0.7621 |
| 1-Stage Single Ranking | Word Frequency | N/A | 0.6970 | 0.6925 | 0.6948 |
| 1-Stage Single Ranking | Noisy Channel | N/A | 0.7134 | 0.7171 | 0.7152 |
| 1-Stage Single Ranking | Context Score | N/A | 0.8035 | 0.5917 | 0.6815 |
| 1-Stage Combined Ranking | Ensemble | N/A | 0.7516 | 0.7545 | 0.7531 |
| 2-Stage Single Ranking | Orthographic | Word Frequency | 0.8241 | 0.7687 | 0.7955 |
| 2-Stage Single Ranking | Orthographic | Noisy Channel | 0.8255 | 0.7700 | 0.7968 |
| 2-Stage Single Ranking | Orthographic | Context Score | 0.8996 | 0.5672 | 0.6957 |
| 2-Stage Combined Ranking: CSpell | Orthographic | Context Score, Noisy Channel | 0.8407 | 0.7842 | 0.8115 |
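The F1 values in the results are consistent with the usual harmonic mean of precision and recall. The source does not state the formula, so the sketch below assumes that standard definition; the function name `f1_score` is illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the results table.
rows = {
    "Orthographic (1-stage)": (0.7606, 0.7636),
    "Context Score (1-stage)": (0.8035, 0.5917),
    "CSpell (2-stage combined)": (0.8407, 0.7842),
}

for name, (p, r) in rows.items():
    print(f"{name}: F1 = {f1_score(p, r):.4f}")
```

The Context Score row shows why F1 matters here: its precision is the highest of the 1-stage rankings, but its low recall pulls the harmonic mean below the other three.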
III. Discussion
- Stage-1 ranking:
  - The non-word spelling and split candidate generator relies on the edit distance measure alone, so it generates irrelevant candidates.
  - Orthographic similarity scores, which include phonetic and overlap similarity, were used to exclude irrelevant candidates.
  - The orthographic score ranking has the highest F1 (0.7621) among the 1-stage rankings.
- Stage-2 ranking:
  In both the 1-stage and 2-stage results, we observed:
  - Context score ranking had the highest precision (0.8035 in 1-stage, 0.8996 in 2-stage).
  - Noisy channel ranking had higher recall and F1 than word frequency ranking. In the 1-stage results, F1 improved by 2.04 percentage points (from 69.48% to 71.52%).
  - Thus, in the stage-2 ranking we use a chain comparator that applies the context score first (for high precision) and then the noisy channel score (to increase recall).
- Comparison of 1-stage and 2-stage:
  - The best 1-stage ranking is Orthographic, with an F1 of 0.7621.
    => This is better than the combined Ensemble ranking (F1 of 0.7531).
  - The best 2-stage ranking is CSpell, with an F1 of 0.8115.
  - The improvement from the best 1-stage ranking to the best 2-stage ranking is 4.94 percentage points (from 76.21% to 81.15%).
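The two-stage idea discussed above can be sketched roughly as follows. This is a minimal illustration, not the CSpell implementation: the `Candidate` class, the `min_ratio` cutoff used for stage 1, and the example scores are all hypothetical, and scores are assumed to be non-negative with higher meaning better. Stage 1 drops candidates whose orthographic score falls too far below the best one; stage 2 orders the survivors by context score and uses the noisy channel score only to break ties, which is the chain-comparator behavior.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    word: str
    orthographic: float   # stage-1 score (assumed in [0, 1])
    context: float        # stage-2 primary key
    noisy_channel: float  # stage-2 tie-breaker


def two_stage_rank(candidates, min_ratio=0.9):
    """Filter by orthographic score, then rank by (context, noisy channel).

    min_ratio is a hypothetical cutoff: a candidate survives stage 1 only if
    its orthographic score is at least min_ratio times the best score.
    """
    if not candidates:
        return []
    # Stage 1: exclude orthographically dissimilar candidates.
    top = max(c.orthographic for c in candidates)
    shortlist = [c for c in candidates if c.orthographic >= min_ratio * top]
    # Stage 2: chained comparison; noisy channel matters only on context ties.
    return sorted(
        shortlist,
        key=lambda c: (c.context, c.noisy_channel),
        reverse=True,
    )


# Made-up scores for illustration only.
candidates = [
    Candidate("diabetes", orthographic=0.95, context=0.0, noisy_channel=0.60),
    Candidate("diabetic", orthographic=0.90, context=0.0, noisy_channel=0.75),
    Candidate("dialysis", orthographic=0.40, context=0.9, noisy_channel=0.10),
]

ranked = two_stage_rank(candidates)
print([c.word for c in ranked])
```

In this example "dialysis" is removed in stage 1 despite its high context score, and the two remaining candidates tie on context score, so the noisy channel score decides their order. That fallback is how the second key can recover a suggestion when the context score alone does not separate the candidates.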