CSpell

Ensemble Method Score

I. Introduction

An ensemble method was implemented in CSpell for comparison. The original equation is:

Ensemble Score = 0.15 * (Context Score) + 0.25 * (Frequency Score) + 0.2 * (Orthographic Score)

where:

  • Orthographic Score = (Edit Distance Score + Phonetic Similarity Score + Overlap Similarity Score)
  • Please note that the overlap similarity implementation differs slightly from the original.
  • The word frequency score uses a different equation.
  • The context score uses dual embeddings, the input matrix (syn0) and the output matrix (syn1n), instead of a single embedding (the input matrix, syn0) to predict words.
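The weighted sum above can be sketched in a few lines of Python. This is only an illustration of the equation as written, not CSpell's actual code: the function names and the dictionary input format are hypothetical, and the component scores are assumed to be already computed and passed in as plain numbers.

```python
# Weights copied from the ensemble equation above.
CONTEXT_WEIGHT = 0.15
FREQUENCY_WEIGHT = 0.25
ORTHOGRAPHIC_WEIGHT = 0.2


def orthographic_score(edit_distance, phonetic, overlap):
    """Sum the three similarity components, as in the definition above."""
    return edit_distance + phonetic + overlap


def ensemble_score(context, frequency, orthographic):
    """Weighted sum of the three ranking components."""
    return (CONTEXT_WEIGHT * context
            + FREQUENCY_WEIGHT * frequency
            + ORTHOGRAPHIC_WEIGHT * orthographic)


def rank_candidates(candidates):
    """Order candidate words by ensemble score, best first.

    `candidates` maps each word to a (context, frequency, orthographic)
    tuple; this input format is an assumption made for the example.
    """
    return sorted(candidates,
                  key=lambda word: ensemble_score(*candidates[word]),
                  reverse=True)
```

For example, rank_candidates({"word": (0.2, 0.9, 2.5), "ward": (0.1, 0.4, 2.1)}) places "word" first because each of its component scores is higher.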

II. Results

Non-word tests on the development set with different ranking modes, for the 1-to-1 and Split function modes:

Ranking Mode    Raw data       Performance (precision|recall|F1)
Orthographic    592|769|774    0.7698|0.7649|0.7673
Frequency       534|770|774    0.6935|0.6899|0.6917
Context         446|554|774    0.8051|0.5762|0.6713
Ensemble        586|769|774    0.7620|0.7571|0.7596
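The three performance values in each row can be reproduced from the three raw counts. The sketch below assumes the raw data reads as correct | returned | total and the performance values as precision | recall | F1. The source does not state this, but the arithmetic matches every row. The function name is hypothetical.

```python
def precision_recall_f1(correct, returned, total):
    """Compute precision, recall, and F1 from raw counts.

    Assumed meaning of the counts (inferred from the table, not stated):
      correct  - corrections that were right
      returned - corrections the system produced
      total    - errors present in the test set
    """
    precision = correct / returned
    recall = correct / total
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Orthographic row: 592|769|774
p, r, f = precision_recall_f1(592, 769, 774)
print(f"{p:.4f}|{r:.4f}|{f:.4f}")
```

Under this reading, the Context mode has the highest precision but returns far fewer corrections (554), which lowers its recall and F1.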

From the results:

  • The ensemble is a good ranking method, with better performance than word frequency and context (0.7596 vs. 0.6917 and 0.6713), despite the differences in how the ranking components are implemented.