CSpell

Performance Tests on Orthographic Similarity Score

I. Test Setup

  • Data: Training Set
  • Gold Standard: non-word only
  • Dictionary: CSpell (Lexicon-based)
  • Corpus: none
  • Ranking: Orthographic ranking

II. Test Results

  • Tests on token (edit distance), phonetic, and overlap similarity scores:
    ID   Ranking        Precision  Recall  F1
    0-1  Edit Distance  0.7606     0.7636  0.7621
    0-2  Phonetic       0.7490     0.7519  0.7505
    0-3  Overlap        0.7542     0.7571  0.7556

  • Tests on orthographic similarity scores using various weighting factors (WF) of token (edit distance), phonetic, and overlap similarity scores:

    ID  Edit Distance  Phonetic  Overlap  Precision  Recall  F1      Notes
    1   1.00           1.00      1.00     0.7580     0.7610  0.7595  Same ratio of WF
    2   0.95           0.95      0.95     0.7580     0.7610  0.7595
    3   0.90           0.90      0.90     0.7580     0.7610  0.7595
    4   1.00           0.90      0.90     0.7593     0.7623  0.7608  Increase 1 WF
    5   0.90           1.00      0.90     0.7580     0.7610  0.7595
    6   0.90           0.90      1.00     0.7580     0.7610  0.7595
    7   0.80           0.90      0.90     0.7580     0.7610  0.7595  Decrease 1 WF
    8   0.90           0.80      0.90     0.7593     0.7623  0.7608
    9   0.90           0.90      0.80     0.7580     0.7610  0.7595
    10  1.00           0.80      0.90     0.7593     0.7623  0.7608  Trial and error: increase edit distance WF, decrease phonetic WF
    11  1.00           0.80      0.85     0.7593     0.7623  0.7608
    12  1.00           0.70      0.80     0.7606     0.7636  0.7621
    13  1.00           0.70      0.90     0.7593     0.7623  0.7608
    14  1.00           0.00      0.00     0.7606     0.7636  0.7621
    15  1.00           0.50      0.80     0.7606     0.7636  0.7621
    16  1.00           0.60      0.80     0.7606     0.7636  0.7621
    17  1.00           0.65      0.80     0.7606     0.7636  0.7621
    18  1.00           0.65      0.90     0.7606     0.7636  0.7621
    19  1.00           0.65      0.90     0.7606     0.7636  0.7621
    20  1.00           0.75      0.90     0.7593     0.7623  0.7608
    21  1.00           0.85      0.90     0.7593     0.7623  0.7608
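
The F1 values reported in this section are consistent with the usual harmonic mean of precision and recall. The sketch below recomputes F1 for a few rows as a sanity check. The function name f1_score and the choice of rows are illustrative assumptions, not part of CSpell. Because the published precision and recall are already rounded to four decimals, a recomputed F1 can differ from the reported one in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from tests 0-1, 0-3, and 4.
rows = {
    "0-1": (0.7606, 0.7636, 0.7621),
    "0-3": (0.7542, 0.7571, 0.7556),
    "4": (0.7593, 0.7623, 0.7608),
}

for test_id, (precision, recall, reported) in rows.items():
    recomputed = round(f1_score(precision, recall), 4)
    print(f"test {test_id}: recomputed F1 = {recomputed}, reported F1 = {reported}")
```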

III. Discussion

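  • From tests 0-1 to 0-3, the individual similarity scores rank, from best to worst F1, as edit distance, overlap, phonetic.
  • Tests 1-3 give identical results: changing all three weighting factors while keeping the same ratio between them does not change the results.
  • In tests 4-6, raising the weighting factor of the edit distance similarity score improved F1 (test 4); raising the phonetic or overlap weighting factor did not (tests 5 and 6).
  • In tests 7-9, lowering the weighting factor of the phonetic similarity score improved F1 (test 8); lowering the edit distance or overlap weighting factor did not (tests 7 and 9).
  • Tests 10-21 searched for the best F1 by trial and error. The best F1, 0.7621, was reached in tests 12 and 14-19, which equals the F1 of edit distance ranking alone (test 0-1).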
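
The observation that equal ratios of weighting factors give equal results is what one would expect if the orthographic score is a weighted sum of the three component scores, since multiplying every weight by the same constant rescales all candidates equally and leaves their order unchanged. The sketch below illustrates this under that assumption. The weighted-sum form, the function names, and the candidate words with their component scores are all hypothetical and are not taken from the CSpell source or the test data.

```python
from typing import Dict, List, Tuple

# (edit distance, phonetic, overlap) similarity scores or weighting factors.
Triple = Tuple[float, float, float]


def orthographic_score(scores: Triple, weights: Triple) -> float:
    """Weighted sum of the three component similarity scores (assumed form)."""
    return sum(w * s for w, s in zip(weights, scores))


def rank(candidates: Dict[str, Triple], weights: Triple) -> List[str]:
    """Candidate words sorted from highest to lowest combined score."""
    return sorted(
        candidates,
        key=lambda word: orthographic_score(candidates[word], weights),
        reverse=True,
    )


# Hypothetical component scores for three candidate corrections.
candidates = {
    "diabetes": (0.90, 0.50, 0.80),
    "diabetic": (0.80, 0.90, 0.75),
    "dialysis": (0.50, 0.40, 0.45),
}

# Same ratio of weighting factors (as in tests 1-3): the order does not change.
for weights in [(1.00, 1.00, 1.00), (0.95, 0.95, 0.95), (0.90, 0.90, 0.90)]:
    print(weights, rank(candidates, weights))

# A different ratio (edit distance only, as in test 14) can reorder candidates.
print((1.00, 0.00, 0.00), rank(candidates, (1.00, 0.00, 0.00)))
```

Only the ratio between the weighting factors matters for ranking in this sketch, which is why a search over weights can fix one factor (for example, edit distance at 1.00) and vary the other two.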