LexBuild - Check Spelling Variants
The spelling variants could be typos, which results in:
This feature identifies possible errors by comparing the edit distance among
Edit Distance | Percent | Cumulative Percent | Notes |
---|---|---|---|
0 | 4.87% | 4.87% | Different cases |
1 | 75.38% | 80.26% | |
2 | 15.15% | 95.41% | Used to cover 95% |
3 | 3.65% | 99.07% | |
4 | 0.68% | 99.75% | |
5+ | 0.25% | 100.00% | |
Based on the table above, an edit distance threshold of 2 is used to trigger a warning message in LexBuild, giving LB users a second chance to verify the spelling variants they entered. A threshold of 2 covers more than 95% of correct spelling variants. LB users can ignore the warning message if the input is in fact correct (the remaining 5%).
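The threshold check described above can be sketched as follows. This is a minimal illustration, not the LexBuild implementation: it assumes the edit distance is the standard Levenshtein distance, that case is ignored before comparing (suggested by the "Different cases" note for distance 0), and that a warning is issued only when the distance is greater than 2. The names `edit_distance`, `needs_warning`, and `WARN_THRESHOLD` are hypothetical.

```python
WARN_THRESHOLD = 2  # assumed: distances above this trigger a warning


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def needs_warning(base: str, variant: str,
                  threshold: int = WARN_THRESHOLD) -> bool:
    """Return True if the variant is far enough from the base to warn.

    Assumption: both strings are lowercased first, so that variants
    differing only in case have distance 0.
    """
    return edit_distance(base.lower(), variant.lower()) > threshold


if __name__ == "__main__":
    for base, variant in [("color", "colour"),
                          ("anesthesia", "anaesthesia"),
                          ("kitten", "sitting")]:
        print(base, variant, needs_warning(base, variant))
```

Under these assumptions, "color"/"colour" (distance 1) passes silently, while a pair such as "kitten"/"sitting" (distance 3) would produce a warning that the user can still dismiss.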
Program:
shell> $LEXBUILD_DIR/Tools/PostProcessing/AnalyzeEditDistance
Inputs:
$LEXBUILD_DIR/data/WebApp/Outputs/Lexicon/LEXICON
Outputs:
Format:
Edit Distance|EUI|Base|Spelling Vars|Category
Notes:
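editDistance.data is sorted (> sort -r editDistance.data > editDistance.sort.data) for further analysis. Because sort -r compares lines as text rather than numerically, two-digit distances are not placed in numeric order, so a manual sort might be needed for records with an edit distance of more than 10.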
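As an alternative to sorting by hand, the records could be sorted on the numeric value of the first field. The sketch below assumes each line follows the pipe-delimited format listed above with exactly five fields and an integer edit distance; the `Record` type, the function names, and the sample lines (including their EUIs) are made up for illustration and are not taken from the real lexicon.

```python
from typing import Iterable, List, NamedTuple


class Record(NamedTuple):
    distance: int
    eui: str
    base: str
    spelling_var: str
    category: str


def parse_line(line: str) -> Record:
    """Split one 'Edit Distance|EUI|Base|Spelling Vars|Category' line."""
    dist, eui, base, var, cat = line.rstrip("\n").split("|")
    return Record(int(dist), eui, base, var, cat)


def sort_by_distance(lines: Iterable[str]) -> List[Record]:
    """Sort records by edit distance, largest first, comparing as integers."""
    records = [parse_line(line) for line in lines if line.strip()]
    return sorted(records, key=lambda r: r.distance, reverse=True)


if __name__ == "__main__":
    # Hypothetical sample records, for illustration only.
    sample = [
        "2|E9999901|baseA|variantA|noun",
        "12|E9999902|baseB|variantB|noun",
        "9|E9999903|baseC|variantC|adj",
    ]
    # A text sort puts "9" ahead of "12"; the numeric sort does not.
    print([line.split("|")[0] for line in sorted(sample, reverse=True)])
    print([r.distance for r in sort_by_distance(sample)])
```

With the three sample lines, the text-based ordering yields 9, 2, 12, while the numeric ordering yields 12, 9, 2, which is the order wanted when reviewing the largest distances first.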