
CSpell

Phonetic Similarity Score

Introduction

This page describes the algorithm for calculating a similarity score using a phonetic approach. The idea is to convert both the token and the candidate to Metaphone codes and then compute the edit distance between the two codes.

Algorithm

  • Convert each string to its Double Metaphone code, with a maximum code length of 10
  • Get edit distance for Metaphone code:
    • delete cost = 95
    • insert cost = 95
    • replace cost = 100
    • swap cost = 90
    • case change cost = 10
    • split cost = insert cost = 95, for each split
  • Get penalty for split
  • Similarity Score = Edit Distance + penalty, capped at a ceiling of 1000 and scaled so that 0.00 <= similarity score <= 1.00
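
The steps above can be sketched in Python as follows. This is an illustrative sketch only, not the CSpell source. It assumes the two inputs are already Double Metaphone codes (no encoder is included), that the edit distance is an optimal-string-alignment distance with the listed costs, that the split penalty is passed in by the caller (its formula is not given on this page), and that the capped cost is mapped to a score as (ceiling - cost) / ceiling. All function and parameter names are hypothetical.

```python
# Costs taken from the list above.
DELETE_COST = 95
INSERT_COST = 95   # a split is charged as an insert (95 per split)
REPLACE_COST = 100
SWAP_COST = 90
CASE_COST = 10
CEILING = 1000


def substitution_cost(a: str, b: str) -> int:
    """Cost of turning character a into character b."""
    if a == b:
        return 0
    if a.lower() == b.lower():
        return CASE_COST  # same letter, different case
    return REPLACE_COST


def weighted_edit_distance(src: str, tgt: str) -> int:
    """Weighted edit distance with adjacent swaps (optimal string alignment)."""
    rows, cols = len(src) + 1, len(tgt) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * DELETE_COST
    for j in range(1, cols):
        dist[0][j] = j * INSERT_COST
    for i in range(1, rows):
        for j in range(1, cols):
            best = min(
                dist[i - 1][j] + DELETE_COST,
                dist[i][j - 1] + INSERT_COST,
                dist[i - 1][j - 1] + substitution_cost(src[i - 1], tgt[j - 1]),
            )
            # Swap: the last two characters appear in reversed order.
            if (i > 1 and j > 1
                    and src[i - 1] == tgt[j - 2]
                    and src[i - 2] == tgt[j - 1]
                    and src[i - 1] != src[i - 2]):
                best = min(best, dist[i - 2][j - 2] + SWAP_COST)
            dist[i][j] = best
    return dist[-1][-1]


def phonetic_similarity(code_a: str, code_b: str, split_penalty: int = 0) -> float:
    """Score in [0.0, 1.0]; the normalization is an assumption (see above)."""
    cost = min(weighted_edit_distance(code_a, code_b) + split_penalty, CEILING)
    return (CEILING - cost) / CEILING
```

Under these assumptions, identical codes score 1.00, a single swap such as "TKNS" versus "TNKS" costs 90, and any total cost of 1000 or more scores 0.00.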

Source Code: