CSpell

Performance Tests on Phonetic Similarity Score

I. Test Setup

  • Data: Training Set
  • Gold Standard: non-word only
  • Dictionary: CSpell (Lexicon-based)
  • Corpus: none
  • Ranking: Orthographic ranking

II. Test Results

  • Tests of various phonetic coding systems within the orthographic similarity score ranking.

    ID  Phonetic          Precision  Recall  F1
    11  Double Metaphone  0.7490     0.7519  0.7505
    12  Refined Soundex   0.7332     0.7370  0.7351
    13  Caverphone-2      0.7172     0.7209  0.7191
    14  Metaphone         0.7487     0.7506  0.7497
    15  Metaphone-3       0.7452     0.7481  0.7466
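Each phonetic coding system maps a word to a code that approximates its pronunciation. As an illustration only, the sketch below implements Refined Soundex (one of the tested systems) and turns two codes into a similarity in [0, 1] with a normalized Levenshtein distance. That comparison rule and the function names are assumptions for this sketch; CSpell's own phonetic similarity computation may differ.

```python
# Refined Soundex digit for each letter: B,P=1; F,V=2; C,K,S=3; G,J=4;
# Q,X,Z=5; D,T=6; L=7; M,N=8; R=9; vowels and H,W,Y=0.
REFINED_SOUNDEX = dict(zip("ABCDEFGHIJKLMNOPQRSTUVWXYZ",
                           "01360240043788015936020505"))


def refined_soundex(word: str) -> str:
    """Keep the first letter, then append one digit per letter (the first
    letter included), skipping a digit equal to the previous one."""
    letters = [c for c in word.upper() if c in REFINED_SOUNDEX]
    if not letters:
        return ""
    code = [letters[0]]
    previous = None
    for letter in letters:
        digit = REFINED_SOUNDEX[letter]
        if digit != previous:
            code.append(digit)
        previous = digit
    return "".join(code)


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def phonetic_similarity(word1: str, word2: str, encode=refined_soundex) -> float:
    """Hypothetical score: 1 - (distance between codes / longer code length)."""
    code1, code2 = encode(word1), encode(word2)
    if not code1 or not code2:
        return 0.0
    return 1.0 - levenshtein(code1, code2) / max(len(code1), len(code2))


print(refined_soundex("testing"))                # T6036084
print(phonetic_similarity("tesing", "testing"))  # 0.875
```

Passing a different `encode` function is how this sketch would swap in another coding system, which is the kind of substitution tests 11-15 compare.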

  • Tests of various weighting factors (WFs) for the edit-distance costs (delete, insert, substitute, and transpose), using Double Metaphone (Metaphone 2) in the orthographic similarity score.

    ID    Delete  Insert  Substitute  Transpose  Precision  Recall  F1      Notes
    1     0.95    0.95    0.95        0.95       0.7490     0.7519  0.7505  Equal WFs (baseline)
    2     1.00    0.95    0.95        0.95       0.7349     0.7377  0.7363  Increase one WF
    3     0.95    1.00    0.95        0.95       0.7275     0.7313  0.7294
    4     0.95    0.95    1.00        0.95       0.7413     0.7442  0.7427
    5     0.95    0.95    0.95        1.00       0.7490     0.7519  0.7505
    6     0.90    0.95    0.95        0.95       0.7275     0.7313  0.7294  Decrease one WF
    7     0.95    0.90    0.95        0.95       0.7439     0.7468  0.7453
    8     0.95    0.95    0.90        0.95       0.7172     0.7209  0.7191
    9     0.95    0.95    0.95        0.90       0.7439     0.7468  0.7453
    10    0.95    0.90    0.95        1.00       0.7439     0.7468  0.7453  Trial and error to find the cost and phonetic WFs
    99-1  0.95    0.95    0.95        0.90       0.7375     0.7403  0.7389
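The four WFs scale the costs of the four edit operations. Below is a minimal sketch of a weighted Damerau-Levenshtein distance in its restricted (optimal string alignment) form, assuming each WF is used directly as the cost of its operation. The function name and keyword arguments are hypothetical and are not CSpell's API.

```python
def weighted_edit_distance(source: str, target: str,
                           delete: float = 0.95, insert: float = 0.95,
                           substitute: float = 0.95,
                           transpose: float = 0.95) -> float:
    """Restricted Damerau-Levenshtein distance with a cost per operation."""
    n, m = len(source), len(target)
    # d[i][j] = cheapest cost of turning source[:i] into target[:j].
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * delete
    for j in range(1, m + 1):
        d[0][j] = j * insert
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            d[i][j] = min(
                d[i - 1][j] + delete,                          # drop source char
                d[i][j - 1] + insert,                          # add target char
                d[i - 1][j - 1] + (0.0 if same else substitute),
            )
            # Swap of two adjacent characters, e.g. "or" -> "ro".
            if (i > 1 and j > 1 and source[i - 1] == target[j - 2]
                    and source[i - 2] == target[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + transpose)
    return d[n][m]


print(weighted_edit_distance("form", "from"))                  # 0.95 (test 1 WFs)
print(weighted_edit_distance("form", "from", transpose=1.00))  # 1.0 (test 5 WFs)
print(weighted_edit_distance("cat", "cats", insert=0.90))      # 0.9 (test 7 WFs)
```

In this sketch, changing one WF only changes the cost of candidates whose cheapest edit path uses that operation, which is why the tests above vary one WF at a time.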

  • Tests of various weighting factors (WFs) for the edit-distance costs (delete, insert, substitute, and transpose), with the orthographic WFs set to 1.0, 1.0, and 1.0.
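The three orthographic WFs weight component scores that are combined into one orthographic score for ranking. A minimal sketch follows, assuming the score is a weighted sum of three component similarities, taken here to be edit-distance, phonetic, and overlap similarity. These component names, the weighted-sum form, and all function names are assumptions for illustration, and the component scores in the usage lines are made-up inputs, not results from these tests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrthographicWeights:
    # Hypothetical component names; the test only states the values 1.0, 1.0, 1.0.
    edit_distance: float = 1.0
    phonetic: float = 1.0
    overlap: float = 1.0


def orthographic_score(edit_sim: float, phonetic_sim: float, overlap_sim: float,
                       w: OrthographicWeights = OrthographicWeights()) -> float:
    """Weighted sum of three component similarity scores (assumed form)."""
    return (w.edit_distance * edit_sim
            + w.phonetic * phonetic_sim
            + w.overlap * overlap_sim)


def rank_candidates(components: dict[str, tuple[float, float, float]],
                    w: OrthographicWeights = OrthographicWeights()) -> list[str]:
    """Order candidate corrections by orthographic score, highest first."""
    return sorted(components,
                  key=lambda cand: orthographic_score(*components[cand], w),
                  reverse=True)


# Made-up component scores for two candidates of one misspelling.
candidates = {"form": (0.80, 0.70, 0.90), "from": (0.90, 1.00, 0.80)}
print(rank_candidates(candidates))  # ['from', 'form']
```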

III. Discussion

  • From the results of tests 11-15, we chose Double Metaphone, which has the highest F1 (0.7505), as the phonetic system in the orthographic similarity score.

  • From the results of tests 2-5, raising the transpose WF (test 5) gives the best F1 (0.7505, the same as test 1), while raising any other WF lowers F1.
  • From the results of tests 6-9, lowering the insert WF (test 7) gives the best F1 (0.7453), tied with lowering the transpose WF (test 9); all four decreases score below test 1.
  • Tests 10 through 99-1 search for the best F1 by trial and error, lowering the insert cost and raising the transpose cost.

  • Use the weighting factors from test 13 for the delete, insert, substitute, and transpose costs.
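Tests 2-9 vary one WF at a time by 0.05 around the test 1 baseline of 0.95 and compare F1. A minimal sketch of that one-factor-at-a-time scheme follows. `evaluate` is a hypothetical stand-in for running the spelling corrector on the training set and computing F1, and the toy scorers in the usage lines are not real data.

```python
from typing import Callable

Weights = dict[str, float]


def one_factor_variants(base: Weights, step: float = 0.05) -> list[tuple[str, Weights]]:
    """Baseline, then each WF raised by `step`, then each WF lowered by `step`
    (all other WFs unchanged), in the same order as tests 1-9."""
    variants = [("baseline", dict(base))]
    for sign, label in ((+1, "increase"), (-1, "decrease")):
        for name in base:
            weights = dict(base)
            weights[name] = round(base[name] + sign * step, 2)
            variants.append((f"{label} {name}", weights))
    return variants


def best_variant(base: Weights, evaluate: Callable[[Weights], float],
                 step: float = 0.05) -> tuple[float, str, Weights]:
    """Score every variant with `evaluate` (e.g. F1) and return the best one;
    ties go to the variant listed first."""
    scored = [(evaluate(w), label, w) for label, w in one_factor_variants(base, step)]
    return max(scored, key=lambda item: item[0])


base = {"delete": 0.95, "insert": 0.95, "substitute": 0.95, "transpose": 0.95}
for label, weights in one_factor_variants(base):
    print(label, weights)
# Toy scorer that simply prefers a high transpose WF (not real F1 data).
print(best_variant(base, lambda w: w["transpose"])[1])  # increase transpose
```

A search like this only explores single-WF changes; combined changes such as test 10 (lower insert and higher transpose together) need a further manual or grid search.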