CSpell

Performance Tests on Context Window Size

I. Test Setup

  • Data: Training Set
  • Gold Standard: non-word errors only
  • Dictionary: CSpell (Lexicon-based)
  • Corpus: Consumer health corpus
  • Ranking: Context score and CSpell ranking

II. Test Results

  • Tests on various context radii using context score ranking

    Context Radius   Precision   Recall   F1
    1                0.7780      0.6111   0.6845
    2                0.8035      0.5917   0.6815
    3                0.8044      0.5685   0.6662
    4                0.8156      0.5543   0.6600
    5                0.8252      0.5491   0.6594
    6                0.8281      0.5413   0.6547
    7                0.8240      0.5323   0.6468
    8                0.8320      0.5310   0.6483
    9                0.8443      0.5323   0.6529
    10               0.8374      0.5258   0.6460
    25               0.8433      0.5078   0.6339
    50               0.8442      0.5039   0.6311
    100              0.8442      0.5039   0.6311

  • Tests on various context radii using CSpell score ranking

    Context Radius   Precision   Recall   F1
    1                0.8380      0.7817   0.8088
    2                0.8407      0.7842   0.8115
    3                0.8366      0.7804   0.8075
    4                0.8352      0.7791   0.8061
    5                0.8352      0.7791   0.8061
    6                0.8296      0.7739   0.8008
    7                0.8310      0.7752   0.8021
    8                0.8310      0.7752   0.8021
    9                0.8310      0.7752   0.8021
    10               0.8296      0.7739   0.8008
    25               0.8283      0.7726   0.7995
    50               0.8283      0.7726   0.7995
    100              0.8283      0.7726   0.7995
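
The F1 column is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The Python sketch below recomputes F1 for the CSpell score ranking rows and selects the radius with the highest reported F1. The helper names `f1_score` and `best_radius` are illustrative only and are not part of CSpell. Because precision and recall are themselves rounded to four decimals, a recomputed F1 may differ from the table in the last digit.

```python
from typing import List, Tuple

# Rows copied from the CSpell score ranking table:
# (context radius, precision, recall, reported F1)
CSPELL_RANKING: List[Tuple[int, float, float, float]] = [
    (1, 0.8380, 0.7817, 0.8088),
    (2, 0.8407, 0.7842, 0.8115),
    (3, 0.8366, 0.7804, 0.8075),
    (4, 0.8352, 0.7791, 0.8061),
    (5, 0.8352, 0.7791, 0.8061),
    (6, 0.8296, 0.7739, 0.8008),
    (7, 0.8310, 0.7752, 0.8021),
    (8, 0.8310, 0.7752, 0.8021),
    (9, 0.8310, 0.7752, 0.8021),
    (10, 0.8296, 0.7739, 0.8008),
    (25, 0.8283, 0.7726, 0.7995),
    (50, 0.8283, 0.7726, 0.7995),
    (100, 0.8283, 0.7726, 0.7995),
]


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def best_radius(rows: List[Tuple[int, float, float, float]]) -> int:
    """Return the radius with the highest reported F1; max() keeps the first row on ties."""
    return max(rows, key=lambda row: row[3])[0]


if __name__ == "__main__":
    for radius, p, r, f1 in CSPELL_RANKING:
        print(f"radius={radius:>3}  reported F1={f1:.4f}  recomputed F1={f1_score(p, r):.4f}")
    print("best radius:", best_radius(CSPELL_RANKING))
```

Selecting by the reported F1 rather than the recomputed one avoids letting rounding noise decide between rows whose scores differ only in the fourth decimal.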

III. Discussion

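  • Closer (local) context is more important than distant (global) context: under context score ranking, F1 is highest at radius 1 (0.6845) and generally declines as the radius grows.
  • Distant (global) context contributes little to the context score: results are identical for radii 50 and 100 under context score ranking, and for radii 25, 50, and 100 under CSpell ranking.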
  • The context radius should match the window size used in training, where training window size = (2 * context radius + 1).
  • A radius of 2 (total window size of 5) was chosen because it gives the best F1 score (0.8115) under CSpell ranking.
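
The relationship between context radius and window size can be illustrated with a minimal Python sketch. It assumes whitespace tokenization, a window that is truncated at the sentence boundaries, and that the target (misspelled) token is excluded from its own context. `window_size` and `context_window` are hypothetical helpers, not CSpell's actual implementation, and the example sentence is made up.

```python
from typing import List


def window_size(radius: int) -> int:
    """Total window width: the target token plus `radius` tokens on each side."""
    return 2 * radius + 1


def context_window(tokens: List[str], target_index: int, radius: int) -> List[str]:
    """Return the tokens within `radius` positions of the target, excluding the target.

    Assumption (hypothetical): the window is truncated at the sentence boundaries.
    """
    if radius < 0:
        raise ValueError("radius must be non-negative")
    start = max(0, target_index - radius)
    end = min(len(tokens), target_index + radius + 1)
    return tokens[start:target_index] + tokens[target_index + 1:end]


# Made-up sentence; "hedache" (index 4) plays the role of a non-word error.
tokens = "i have a bad hedache since monday morning".split()
print(window_size(2))                # 5
print(context_window(tokens, 4, 2))  # ['a', 'bad', 'since', 'monday']
```

With radius 2 the window spans 5 tokens: the target plus two on each side. Under the truncation assumption, any radius that reaches past both sentence edges returns the same context, so very large radii stop changing the input.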