Overlap Similarity Score
Introduction:
This page describes the algorithm for calculating a similarity score from the overlap of leading and trailing characters between two strings. The algorithm is enhanced for better ranking in the Ensemble by weighting the leading and trailing overlap (see the examples at the end), and it is used as a tiebreaker when the edit distance and phonetic scores are tied.
Algorithm:
condition | formula | example |
---|---|---|
leadOverlap = minLen | Score = (1.0*leadOverlap + 0.1*trailOverlap)/(1.0*maxLen) | spel, spell |
trailOverlap = minLen | Score = (0.1*leadOverlap + 1.0*trailOverlap)/(1.0*maxLen) | spell, sspell |
else | Score = (1.0*leadOverlap + 1.0*trailOverlap)/(1.0*maxLen) | spel, speil |
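where leadOverlap is the number of matching leading characters, trailOverlap is the number of matching trailing characters, minLen is the length of the shorter string, and maxLen is the length of the longer string.
Examples: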
Example | String 1 | String 2 | Calculation | Score | Notes |
---|---|---|---|---|---|
Ex-1 | spel | spell | =(4+0.1)/5 | 0.82 | spel is closer to spell than to speil |
Ex-1 | spel | speil | =(3+1)/5 | 0.80 | |
Ex-2 | spell | sspell | =(0.1+5)/6 | 0.85 | spell is closer to sspell than to nspell |
Ex-2 | spell | nspell | =(0+5)/6 | 0.83 | |
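The formulas and examples above can be sketched in Python as follows. This is an illustrative sketch, not the original source code. The function names (`lead_overlap`, `trail_overlap`, `overlap_score`) are hypothetical, and two behaviors are assumptions the page does not state: the conditions are tested in the order the table lists them, and a pair of empty strings scores 0.

```python
def lead_overlap(a: str, b: str) -> int:
    """Count the characters that match from the start of both strings."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def trail_overlap(a: str, b: str) -> int:
    """Count the characters that match from the end of both strings."""
    return lead_overlap(a[::-1], b[::-1])


def overlap_score(a: str, b: str) -> float:
    """Overlap similarity score following the three-row formula table."""
    min_len = min(len(a), len(b))
    max_len = max(len(a), len(b))
    if max_len == 0:
        return 0.0  # assumption: empty input is not covered by the page
    lead = lead_overlap(a, b)
    trail = trail_overlap(a, b)
    # Assumption: conditions are checked in the order given in the table.
    if lead == min_len:
        return (1.0 * lead + 0.1 * trail) / (1.0 * max_len)
    if trail == min_len:
        return (0.1 * lead + 1.0 * trail) / (1.0 * max_len)
    return (1.0 * lead + 1.0 * trail) / (1.0 * max_len)


if __name__ == "__main__":
    # The four string pairs from the examples table.
    pairs = [("spel", "spell"), ("spel", "speil"),
             ("spell", "sspell"), ("spell", "nspell")]
    for s1, s2 in pairs:
        print(f"{s1} | {s2} | {overlap_score(s1, s2):.2f}")

    # Used as a tiebreaker: rank candidates for "spel" by overlap score.
    ranked = sorted(["speil", "spell"],
                    key=lambda c: overlap_score("spel", c), reverse=True)
    print(ranked)
```

Sorting by the score in descending order places spell ahead of speil for the input spel, which matches the ranking described in the Ex-1 notes.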
Source code: