Overlap Similarity Score
Introduction:
This page describes the algorithm for calculating a similarity score from the overlap of leading and trailing characters between two strings. To give better ranking from the Ensemble, the algorithm weights the leading and trailing overlaps differently (see the examples at the end). The score is used as a tiebreaker when the edit distance and phonetic scores are tied.
Algorithm:
Condition | Formula | Example |
---|---|---|
leadOverlap = minLen | Score = (1.0*leadOverlap + 0.1*trailOverlap)/(1.0*maxLen) | Ex-1: spel, spell |
trailOverlap = minLen | Score = (0.1*leadOverlap + 1.0*trailOverlap)/(1.0*maxLen) | Ex-2: spell, sspell |
else | Score = (1.0*leadOverlap + 1.0*trailOverlap)/(1.0*maxLen) | Ex-1: spel, speil |
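where:
leadOverlap is the number of characters that match at the start of the two strings
trailOverlap is the number of characters that match at the end of the two strings
minLen is the length of the shorter string
maxLen is the length of the longer string
The two overlaps are counted independently, so the same character can contribute to both (in Ex-1, the final l of spel counts toward leadOverlap and trailOverlap).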
Examples:
Example | String 1 | String 2 | Calculation | Score | Notes |
---|---|---|---|---|---|
Ex-1 | spel | spell | =(4+0.1)/5 | 0.82 | spel is closer to spell than to speil |
Ex-1 | spel | speil | =(3+1)/5 | 0.80 | |
Ex-2 | spell | sspell | =(0.1+5)/6 | 0.85 | spell is closer to sspell than to nspell |
Ex-2 | spell | nspell | =(0+5)/6 | 0.83 | |
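The scoring rules can be sketched in Python as follows. This is a minimal illustration, not the page's actual source code. The function names (`leading_overlap`, `trailing_overlap`, `overlap_score`) are hypothetical, the conditions are assumed to be checked in the order the Algorithm table lists them, and returning 0.0 for two empty strings is an assumption, since the page does not cover that case.

```python
def leading_overlap(a: str, b: str) -> int:
    """Count the characters that match from the start of both strings."""
    count = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        count += 1
    return count


def trailing_overlap(a: str, b: str) -> int:
    """Count the characters that match from the end of both strings."""
    return leading_overlap(a[::-1], b[::-1])


def overlap_score(a: str, b: str) -> float:
    """Score two strings with the weighted lead/trail overlap rules."""
    min_len = min(len(a), len(b))
    max_len = max(len(a), len(b))
    if max_len == 0:
        return 0.0  # assumption: two empty strings are not covered by the page

    lead = leading_overlap(a, b)
    trail = trailing_overlap(a, b)

    # Conditions are tested in table order (an assumption of this sketch).
    if lead == min_len:
        return (1.0 * lead + 0.1 * trail) / max_len
    if trail == min_len:
        return (0.1 * lead + 1.0 * trail) / max_len
    return (1.0 * lead + 1.0 * trail) / max_len


if __name__ == "__main__":
    pairs = [("spel", "spell"), ("spel", "speil"),
             ("spell", "sspell"), ("spell", "nspell")]
    for s1, s2 in pairs:
        print(f"{s1} / {s2}: {overlap_score(s1, s2):.2f}")
    # prints 0.82, 0.80, 0.85 and 0.83, matching Ex-1 and Ex-2
```

If the table is applied literally, two identical strings satisfy the first condition and score 1.1, so the score is not capped at 1.0 in this sketch.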
Source code: