Spelling Variant Patterns - Test on Lexicon-LRSPL
Norm, MES, and ES are applied sequentially to retrieve as many spelling variant groups as possible. The model is tested on the Lexicon (inflVars.data) for recall, precision, F1, and accuracy. The results are as follows:
2015 (Used in AMIA paper submission)
Step | Methods | Edit Distance | Sample No. | ret-rel | ret-irrel | notRet-rel | notRet-irrel | Precision | Recall | F1 | Accuracy |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Lexicon.2015 | N/A | 867,728 | 363,217 | 0 | 0 | 504,511 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
1 | Norm | N/A | 867,728 | 306,387 | 19,374 | 56,830 | 485,137 | 0.9405 | 0.8435 | 0.8894 | 0.9122 |
2 | MES | 2 | 867,728 | 355,423 | 173,647 | 7,794 | 330,864 | 0.6718 | 0.9785 | 0.7967 | 0.7909 |
3 | ES | 1 | 867,728 | 360,599 | 286,932 | 2,618 | 217,579 | 0.5569 | 0.9928 | 0.7135 | 0.6663 |
4 | MES | 3 | 867,728 | 360,956 | 301,097 | 2,261 | 203,414 | 0.5452 | 0.9938 | 0.7041 | 0.6504 |
5 | ES | 2 | 867,728 | 362,082 | 353,512 | 1,135 | 150,999 | 0.5060 | 0.9969 | 0.6713 | 0.5913 |
6 | MES | 4 | 867,728 | 362,159 | 356,156 | 1,058 | 148,355 | 0.5042 | 0.9971 | 0.6697 | 0.5883 |
Step 6 gives the final results used for the matcher. Its values serve as an example for checking the calculations:
Check Item | Check numbers |
---|---|
Total sample no | 867,728 = 362,159 + 356,156 + 1,058 + 148,355 |
Precision | 0.5042 = 362,159 / (362,159 + 356,156) |
Recall | 0.9971 = 362,159 / (362,159 + 1,058) |
F1 | 0.6697 = (2 * 0.5042 * 0.9971) / (0.5042 + 0.9971) |
Accuracy | 0.5883 = (362,159 + 148,355) / 867,728 |
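
The calculation check above can be reproduced with a short script. This is a minimal sketch: the `Counts` class and `metrics` function are hypothetical names introduced for illustration, and the mapping of ret-rel, ret-irrel, notRet-rel, and notRet-irrel to true positives, false positives, false negatives, and true negatives is inferred from the formulas in the check table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Confusion-matrix counts, named after the table columns (hypothetical helper)."""
    ret_rel: int        # retrieved and relevant (true positives)
    ret_irrel: int      # retrieved but irrelevant (false positives)
    not_ret_rel: int    # relevant but not retrieved (false negatives)
    not_ret_irrel: int  # irrelevant and not retrieved (true negatives)


def metrics(c: Counts) -> dict:
    """Compute the four reported measures from one row of counts."""
    total = c.ret_rel + c.ret_irrel + c.not_ret_rel + c.not_ret_irrel
    precision = c.ret_rel / (c.ret_rel + c.ret_irrel)
    recall = c.ret_rel / (c.ret_rel + c.not_ret_rel)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (c.ret_rel + c.not_ret_irrel) / total
    return {
        "total": total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Step 6 counts copied from the results table.
step6 = Counts(ret_rel=362_159, ret_irrel=356_156,
               not_ret_rel=1_058, not_ret_irrel=148_355)
result = metrics(step6)

print(f"total     = {result['total']:,}")
for name in ("precision", "recall", "f1", "accuracy"):
    print(f"{name:<9} = {result[name]:.4f}")
```

Rounded to four decimal places, the script yields the same values as the Step 6 row, and the four counts add up to the sample size of 867,728.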
Step | Methods | Edit Distance | Sample No. | ret-rel | ret-irrel | notRet-rel | notRet-irrel | Precision | Recall | F1 | Accuracy |
---|---|---|---|---|---|---|---|---|---|---|---|
Baseline (step-2 from above) | MES | 2 | 867,728 | 355,423 | 173,647 | 7,794 | 330,864 | 0.6718 | 0.9785 | 0.7967 | 0.7909 |
1 | Double Metaphone (10) | 2 | 867,728 | 356,375 | 178,698 | 6,842 | 325,813 | 0.6660 | 0.9812 | 0.7935 | 0.7862 |
2 | | 2 | 867,728 | 354,790 | 151,028 | 8,427 | 353,483 | 0.7014 | 0.9768 | 0.8165 | 0.8162 |
3 | | 2 | 867,728 | 352,911 | 115,531 | 10,306 | 388,980 | 0.7534 | 0.9716 | 0.8487 | 0.8550 |
Enhanced SpVarNorm | |||||||||||
Baseline (step-1 from above) | Norm | N/A | 867,728 | 306,387 | 19,374 | 56,830 | 485,137 | 0.9405 | 0.8435 | 0.8894 | 0.9122 |
New Baseline | Norm | N/A | 867,728 | 304,831 | 3,973 | 58,386 | 500,538 | 0.9871 | 0.8393 | 0.9072 | 0.9281 |
4 | | 2 | 867,728 | 352,826 | 114,271 | 10,391 | 390,240 | 0.7554 | 0.9714 | 0.8499 | 0.8563 |
5 | | 2 | 867,728 | 352,675 | 105,623 | 10,542 | 398,888 | 0.7695 | 0.9710 | 0.8586 | 0.8661 |
New GoldStandard - with inflectional Spelling Variants | |||||||||||
6.0 | Norm | 2 | 867,728 | 305,329 | 3,475 | 74,447 | 484,477 | 0.9887 | 0.8040 | 0.8868 | 0.9102 |
6.1?? | | 2 | 867,728 | 369,200 | 97,897 | 10,576 | 390,055 | 0.7904 | 0.9722 | 0.8719 | 0.8750 |
6.1 | | 1 | 867,728 | 369,049 | 89,249 | 10,727 | 398,703 | 0.8053 | 0.9718 | 0.8807 | 0.8845 |
6.2 | | 2 | 867,728 | 369,049 | 89,249 | 10,727 | 398,703 | 0.8053 | 0.9718 | 0.8807 | 0.8845 |
Tried:
Example | Term | Metaphone 1 | Metaphone 2 | Notes |
---|---|---|---|---|
1 | meagreness | MKRNS | MKRNS | |
1 | meagerness | MJRNS | MKRNS | |
2 | abkhasian | ABKHXN | APKSN | |
2 | abkhazian | ABKHSN | APKSN | |
3 | toxic edema | TKSSTM | TKSKTM | |
3 | toxic oedema | TKSKTM | TKSKTM | |
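
The Edit Distance column in the results tables limits how many character edits may separate two candidate terms. As a rough illustration, the sketch below computes plain Levenshtein distance for the three example pairs. This is an assumption: the page does not state which edit distance variant the tools use, and `edit_distance` is a hypothetical helper, not the tools' implementation.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


# Example pairs copied from the table above.
pairs = [
    ("meagreness", "meagerness"),
    ("abkhasian", "abkhazian"),
    ("toxic edema", "toxic oedema"),
]
distances = {pair: edit_distance(*pair) for pair in pairs}

for (a, b), d in distances.items():
    print(f"{a!r} vs {b!r}: {d}")
```

Under this assumption, all three pairs fall within an edit distance of 2: the "re"/"er" swap costs two edits, while the "s"/"z" substitution and the inserted "o" cost one each.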
Example | Term | Metaphone 1 | Metaphone 2 | Caverphone 2.0 | Notes |
---|---|---|---|---|---|
1 | zymographical | SMKRFKL | SMKRFKL | SMKRFKA111 | |
1 | zymographically | SMKRFKL | SMKRFKL | SMKRFKLA11 | |
2 | absorption test | ABSRPXNTST | APSRPXNTST | APSPSNTST1 | |
2 | absorption tests | ABSRPXNTST | APSRPXNTST | APSPSNTSTS | |
3 | bacterial culture media | BKTRLKLTRM | PKTRLKLTRM | PKTRKTRMTA | |
3 | bacterial culture medium | BKTRLKLTRM | PKTRLKLTRM | PKTRKTRMTM | |
Example | Term | Metaphone (10) | Metaphone (60) | Notes |
---|---|---|---|---|
1 | 2-item patient health questionnaire | TMPTNTL0KS | ITMPTN0L0KSXNR | |
1 | 2-item patient health questionnaires | TMPTNTL0KS | TMPTNTL0KSXNRS | |
2 | bacterial culture media | PKTRLKLTRM | PKTRLKLTRMT | |
2 | bacterial culture medium | PKTRLKLTRM | PKTRLKLTRMTM | |
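
The effect of the maximum code length can be sketched by grouping terms on their phonetic codes. The example below assumes that a shorter maximum length behaves like truncating the longer code to a prefix, which matches the "bacterial culture" rows above but is not confirmed for every term. `group_by_code` is a hypothetical helper, and the codes are copied from the table rather than computed.

```python
from collections import defaultdict

# Metaphone (60) codes copied from the table above.
codes_60 = {
    "bacterial culture media": "PKTRLKLTRMT",
    "bacterial culture medium": "PKTRLKLTRMTM",
}


def group_by_code(codes: dict, max_len: int) -> dict:
    """Group terms whose codes agree on the first max_len characters."""
    groups = defaultdict(list)
    for term, code in codes.items():
        groups[code[:max_len]].append(term)
    return dict(groups)


groups_10 = group_by_code(codes_60, 10)  # short codes: terms collide
groups_60 = group_by_code(codes_60, 60)  # long codes: terms stay apart

print(groups_10)
print(groups_60)
```

With a maximum length of 10, both terms share the key `PKTRLKLTRM` and land in one candidate group. With a length of 60 the codes differ in their final characters, so the terms are no longer matched by code alone.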
Example | Singular | Plural | Notes |
---|---|---|---|
1 | aan | aan's | |
2 | dcmp deaminase | dcmp deaminase's | |
Example | Term | Metaphone (60) | Caverphone 2.0 | Greco-Latin | Notes |
---|---|---|---|---|---|
1 | acroscleroses | AKRSKRSS | AKRSKLRSS1 | plural | |
1 | acrosclerosis | AKRSKRSS | AKRSKLRSS1 | singular | |
2 | ammon's horn scleroses | AMNSRNSKRSS | AMNSNSKLRS | plural | |
2 | ammon's horn sclerosis | AMNSRNSKRSS | AMNSNSKLRS | singular | |
3 | fimbria | FMPR | FMPRA11111 | singular | |
3 | fimbriae | FMPR | FMPRA11111 | plural | |
4 | infraorbital foramen | ANFRRPTLFRMN | ANFRPTFRMN | singular | |
4 | infraorbital foramina | ANFRRPTLFRMN | ANFRPTFRMN | plural | |
Example | Term | Metaphone (60) | Caverphone 2.0 | Greco-Latin | Notes |
---|---|---|---|---|---|
1 | zygomycetes | SKMSTS | SKMSTS1111 | ? | |
1 | zygomycetous | SKMSTS | SKMSTS1111 | ? | |
Example | Term | Metaphone (60) | Caverphone 2.0 | Greco-Latin | Notes |
---|---|---|---|---|---|
1 | zygapophyseal joint | TBD | TBD | ? | |
1 | zygapophysial joint | TBD | TBD | ? | |
2 | zuclomifene | TBD | TBD | ? | |
2 | zuclomiphene | TBD | TBD | ? | |