Candidate Generators
I. Introduction
Tokens identified as spelling errors by the detectors are then corrected. The first step is to generate correct-spelling candidates; a ranking system then selects the best match (the highest-ranked candidate) from the candidate list. To avoid expensive edit distance computations between a misspelled word and every word in the dictionary (about 0.6 million), Church's reverse minimum edit distance technique is used to generate non-word spelling candidates and split candidates. In general, two steps are implemented to find candidates for 1-to-1 and split correction:
Edit Distance | Instances | Percentage | Cumulative Percentage |
---|---|---|---|
1 | 317 | 67.74% | 67.74% |
2 | 110 | 23.50% | 91.24% |
3 | 24 | 5.13% | 96.37% |
4 | 8 | 1.71% | 98.08% |
5 | 6 | 1.28% | 99.36% |
6 | 2 | 0.43% | 99.79% |
7 | 1 | 0.21% | 100.00% |
Total | 468 | 100.00% | 100.00% |
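The percentage columns follow directly from the instance counts: each row's percentage is its count divided by the total of 468, and the cumulative percentage is the running sum of counts divided by the same total. The short Python sketch below recomputes both columns from the counts in the table (the function name `distance_distribution` is hypothetical, introduced only for illustration).

```python
# Instance counts per edit distance, copied from the table above.
counts = {1: 317, 2: 110, 3: 24, 4: 8, 5: 6, 6: 2, 7: 1}


def distance_distribution(counts):
    """Return {distance: (percentage, cumulative percentage)}, each rounded to 2 decimals."""
    total = sum(counts.values())
    running = 0  # running sum of instances up to and including this distance
    rows = {}
    for dist in sorted(counts):
        running += counts[dist]
        rows[dist] = (
            round(100 * counts[dist] / total, 2),
            round(100 * running / total, 2),
        )
    return rows


if __name__ == "__main__":
    for dist, (pct, cum) in distance_distribution(counts).items():
        print(f"{dist} | {counts[dist]} | {pct:.2f}% | {cum:.2f}%")
```

The cumulative column makes the skew easy to read off: edit distances 1 and 2 together account for 91.24% of the 468 instances.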
II. Algorithm
CS_CAN_MAX_CANDIDATE_NO
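The idea behind the reverse minimum edit distance technique can be sketched as follows: rather than computing the edit distance from the misspelling to each of the ~0.6 million dictionary words, generate every string reachable from the misspelling by a few single-character edits and keep only those that appear in the dictionary. The minimal Python sketch below is an illustration under stated assumptions, not the original implementation: it assumes a lowercase ASCII alphabet, a Python `set` as the dictionary, deletion, transposition, substitution, and insertion as the edit operations, and a default search bound of 2 edits (our choice, motivated by the table above). Split candidates are found by trying every two-way split of the token. All function names here are hypothetical.

```python
import string

ALPHABET = string.ascii_lowercase  # assumption: lowercase English letters only


def edits1(word):
    """All strings one deletion, transposition, substitution, or insertion away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    transposes = {left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1}
    replaces = {left + c + right[1:] for left, right in splits if right for c in ALPHABET}
    inserts = {left + c + right for left, right in splits for c in ALPHABET}
    return deletes | transposes | replaces | inserts


def one_to_one_candidates(word, dictionary, max_distance=2):
    """Expand edits outward from the misspelling and keep only dictionary hits,
    so the dictionary is probed by lookup instead of scanned word by word."""
    frontier = {word}
    seen = {word}  # each string is expanded at most once
    found = set()
    for _ in range(max_distance):
        next_frontier = set()
        for w in frontier:
            for e in edits1(w):
                if e not in seen:
                    seen.add(e)
                    next_frontier.add(e)
        found |= {e for e in next_frontier if e in dictionary}
        frontier = next_frontier
    return found


def split_candidates(word, dictionary):
    """Split the token into two dictionary words at every inner position."""
    return {f"{word[:i]} {word[i:]}" for i in range(1, len(word))
            if word[:i] in dictionary and word[i:] in dictionary}


def generate_candidates(word, dictionary, max_distance=2):
    """Union of 1-to-1 and split candidates, sorted; ranking is left to a later step."""
    return sorted(one_to_one_candidates(word, dictionary, max_distance)
                  | split_candidates(word, dictionary))


if __name__ == "__main__":
    words = {"spelling", "the", "cat", "in", "hat", "cut"}  # toy dictionary
    print(generate_candidates("speling", words))                 # ['spelling']
    print(generate_candidates("thecat", words, max_distance=1))  # ['the cat']
```

The cost of this approach grows with the number of generated edits (roughly quadratic in word length and alphabet size per extra edit), not with dictionary size, which is why bounding the edit distance matters.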