CSpell

Performance Tests on Context Window Size

I. Test Setup

Data: Training Set
Gold Standard: non-word errors only
Dictionary: CSpell (Lexicon-based)
Corpus: Consumer health corpus
Ranking: Context score and CSpell ranking

II. Test Results

Tests on various context window sizes (context radii) with context score ranking

Context Radius	Precision	Recall	F1
1	0.7780	0.6111	0.6845
2	0.8035	0.5917	0.6815
3	0.8044	0.5685	0.6662
4	0.8156	0.5543	0.6600
5	0.8252	0.5491	0.6594
6	0.8281	0.5413	0.6547
7	0.8240	0.5323	0.6468
8	0.8320	0.5310	0.6483
9	0.8443	0.5323	0.6529
10	0.8374	0.5258	0.6460
25	0.8433	0.5078	0.6339
50	0.8442	0.5039	0.6311
100	0.8442	0.5039	0.6311

Tests on various context window sizes (context radii) with CSpell ranking

Context Radius	Precision	Recall	F1
1	0.8380	0.7817	0.8088
2	0.8407	0.7842	0.8115
3	0.8366	0.7804	0.8075
4	0.8352	0.7791	0.8061
5	0.8352	0.7791	0.8061
6	0.8296	0.7739	0.8008
7	0.8310	0.7752	0.8021
8	0.8310	0.7752	0.8021
9	0.8310	0.7752	0.8021
10	0.8296	0.7739	0.8008
25	0.8283	0.7726	0.7995
50	0.8283	0.7726	0.7995
100	0.8283	0.7726	0.7995
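The F1 values in both tables agree, to within rounding, with the standard definition F1 = 2 * P * R / (P + R). The short Python sketch below uses only the precision and recall values copied from the two tables. It recomputes F1 and picks the radius with the highest score for each ranking method. The names `f1`, `best_radius`, `CONTEXT_SCORE`, and `CSPELL` are illustrative and are not part of CSpell.

```python
# (precision, recall, reported F1) per context radius, copied from the tables above.
CONTEXT_SCORE = {
    1: (0.7780, 0.6111, 0.6845), 2: (0.8035, 0.5917, 0.6815),
    3: (0.8044, 0.5685, 0.6662), 4: (0.8156, 0.5543, 0.6600),
    5: (0.8252, 0.5491, 0.6594), 6: (0.8281, 0.5413, 0.6547),
    7: (0.8240, 0.5323, 0.6468), 8: (0.8320, 0.5310, 0.6483),
    9: (0.8443, 0.5323, 0.6529), 10: (0.8374, 0.5258, 0.6460),
    25: (0.8433, 0.5078, 0.6339), 50: (0.8442, 0.5039, 0.6311),
    100: (0.8442, 0.5039, 0.6311),
}
CSPELL = {
    1: (0.8380, 0.7817, 0.8088), 2: (0.8407, 0.7842, 0.8115),
    3: (0.8366, 0.7804, 0.8075), 4: (0.8352, 0.7791, 0.8061),
    5: (0.8352, 0.7791, 0.8061), 6: (0.8296, 0.7739, 0.8008),
    7: (0.8310, 0.7752, 0.8021), 8: (0.8310, 0.7752, 0.8021),
    9: (0.8310, 0.7752, 0.8021), 10: (0.8296, 0.7739, 0.8008),
    25: (0.8283, 0.7726, 0.7995), 50: (0.8283, 0.7726, 0.7995),
    100: (0.8283, 0.7726, 0.7995),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def best_radius(results):
    """Radius with the highest recomputed F1; ties go to the smaller radius."""
    # max() keeps the first maximal item, and the radii are iterated in
    # ascending order, so the smallest radius wins a tie.
    return max(sorted(results), key=lambda rad: f1(*results[rad][:2]))


if __name__ == "__main__":
    for name, table in (("context score", CONTEXT_SCORE), ("CSpell", CSPELL)):
        rad = best_radius(table)
        print(f"{name}: best radius {rad}, F1 {f1(*table[rad][:2]):.4f}")
```

The sketch breaks ties in favor of the smaller radius, which keeps the window as small as possible. With these numbers it selects radius 1 for context score ranking and radius 2 for CSpell ranking.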

III. Discussion

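Closer (local) context is more important than distant (global) context. With context score ranking, F1 is highest at radius 1 (0.6845) and generally declines as the radius grows. Recall falls from 0.6111 to 0.5039, while precision rises only from 0.7780 to 0.8442.
Distant (global) context contributes little to the context score. The results are identical for radii 50 and 100 with context score ranking, and for radii 25, 50, and 100 with CSpell ranking.
The context radius should correspond to the window size used in the training set, where training window size = (2 * context radius + 1).
A radius of 2 (a total window size of 5) was chosen because it gives the best F1 score with CSpell ranking (0.8115).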
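The following minimal Python sketch illustrates the relation between context radius and window size. It assumes the input is already tokenized into a list of strings. `context_window` and `window_size` are hypothetical helpers, not CSpell functions. With radius r, the window holds up to r tokens on each side of the target, so it contains at most 2 * r + 1 tokens including the target. The window is truncated at the start and end of the token list.

```python
def context_window(tokens, index, radius):
    """Hypothetical helper (not part of CSpell): return the left and right
    context of tokens[index], each at most `radius` tokens long."""
    if radius < 0:
        raise ValueError("radius must be non-negative")
    if not 0 <= index < len(tokens):
        raise IndexError("index out of range")
    start = max(0, index - radius)             # truncate at the list start
    end = min(len(tokens), index + radius + 1)  # truncate at the list end
    return tokens[start:index], tokens[index + 1:end]


def window_size(radius):
    """Full window length (left context + target + right context)."""
    return 2 * radius + 1
```

Take `tokens = "i have a bad hedache since yesterday".split()` with the misspelling at index 4. Here `context_window(tokens, 4, 2)` returns `(["a", "bad"], ["since", "yesterday"])`, a window of `window_size(2) == 5` tokens including the target.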