Spelling Variant Patterns - Test on Lexicon-LRSPL
Norm, MES, and ES are applied in sequence to retrieve as many spelling variant groups as possible. The model is tested on the Lexicon (inflVars.data) for precision, recall, F1, and accuracy. The results are as follows:
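In the table below, ret-rel and ret-irrel never decrease from one step to the next. This is consistent with a cumulative cascade, in which a candidate pair counts as retrieved once any method applied so far accepts it. The sketch below shows such a cascade and how the four count columns follow from it. The two matchers (`norm_match`, `digraph_match`), the toy pairs, and all function names are hypothetical stand-ins, not the actual Norm, MES, or ES.

```python
import re
from typing import Callable, FrozenSet, List, Set, Tuple

Pair = Tuple[str, str]
Matcher = Callable[[str, str], bool]


def norm_key(term: str) -> str:
    """Hypothetical normalizer: lowercase and drop everything but a-z and 0-9."""
    return re.sub(r"[^a-z0-9]", "", term.lower())


def norm_match(a: str, b: str) -> bool:
    """Hypothetical first step: the normalized keys are identical."""
    return norm_key(a) == norm_key(b)


def digraph_match(a: str, b: str) -> bool:
    """Hypothetical later step: additionally fold 'ae' and 'oe' to 'e'."""
    def fold(t: str) -> str:
        return re.sub(r"[ao]e", "e", norm_key(t))
    return fold(a) == fold(b)


def cascade(pairs: List[Pair], steps: List[Tuple[str, Matcher]]) -> List[Tuple[str, FrozenSet[Pair]]]:
    """Apply matchers in order; once any step accepts a pair it stays retrieved."""
    retrieved: Set[Pair] = set()
    history: List[Tuple[str, FrozenSet[Pair]]] = []
    for name, matcher in steps:
        retrieved |= {p for p in pairs if matcher(*p)}
        history.append((name, frozenset(retrieved)))
    return history


def confusion_counts(pairs: List[Pair], retrieved: FrozenSet[Pair],
                     relevant: Set[Pair]) -> Tuple[int, int, int, int]:
    """Return (ret-rel, ret-irrel, notRet-rel, notRet-irrel) over the candidate pairs."""
    ret_rel = sum(1 for p in pairs if p in retrieved and p in relevant)
    ret_irrel = sum(1 for p in pairs if p in retrieved and p not in relevant)
    not_ret_rel = sum(1 for p in pairs if p not in retrieved and p in relevant)
    return ret_rel, ret_irrel, not_ret_rel, len(pairs) - ret_rel - ret_irrel - not_ret_rel


# Toy data: three true variant pairs and one look-alike ("mae"/"me") that is not.
pairs = [("AAN", "aan"), ("toxic edema", "toxic oedema"), ("fimbria", "fimbriae"), ("mae", "me")]
relevant = {("AAN", "aan"), ("toxic edema", "toxic oedema"), ("fimbria", "fimbriae")}
history = cascade(pairs, [("Norm", norm_match), ("Digraph", digraph_match)])
for name, retrieved in history:
    print(name, confusion_counts(pairs, retrieved, relevant))
```

The second step recovers the edema/oedema pair but also lets in the look-alike pair, which mirrors how later steps in the table raise recall while lowering precision.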
2015 (Used in AMIA paper submission)
Step | Methods | Edit Distance | Sample No. | ret-rel | ret-irrel | notRet-rel | notRet-irrel | Precision | Recall | F1 | Accuracy |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Lexicon.2015 | N/A | 867,728 | 363,217 | 0 | 0 | 504,511 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
1 | Norm | N/A | 867,728 | 306,387 | 19,374 | 56,830 | 485,137 | 0.9405 | 0.8435 | 0.8894 | 0.9122 |
2 | MES | 2 | 867,728 | 355,423 | 173,647 | 7,794 | 330,864 | 0.6718 | 0.9785 | 0.7967 | 0.7909 |
3 | ES | 1 | 867,728 | 360,599 | 286,932 | 2,618 | 217,579 | 0.5569 | 0.9928 | 0.7135 | 0.6663 |
4 | MES | 3 | 867,728 | 360,956 | 301,097 | 2,261 | 203,414 | 0.5452 | 0.9938 | 0.7041 | 0.6504 |
5 | ES | 2 | 867,728 | 362,082 | 353,512 | 1,135 | 150,999 | 0.5060 | 0.9969 | 0.6713 | 0.5913 |
6 | MES | 4 | 867,728 | 362,159 | 356,156 | 1,058 | 148,355 | 0.5042 | 0.9971 | 0.6697 | 0.5883 |
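The Edit Distance column gives the threshold used at each MES or ES step, and raising it (ES 1 to 2, MES 2 to 3 to 4) trades precision for recall. This section does not define MES or ES precisely, so the following is only a sketch of the edit-distance ingredient. It assumes the standard Levenshtein distance and a simple "at most the threshold" test. `within_edit_distance` is a hypothetical name.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as the shorter string
    previous: List[int] = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free when the letters agree
            ))
        previous = current
    return previous[-1]


def within_edit_distance(a: str, b: str, max_dist: int) -> bool:
    """Hypothetical ES-style test: accept the pair when the distance is <= max_dist."""
    return levenshtein(a, b) <= max_dist


for a, b in [("fimbria", "fimbriae"), ("meagreness", "meagerness"), ("zuclomifene", "zuclomiphene")]:
    print(a, b, levenshtein(a, b), within_edit_distance(a, b, 1), within_edit_distance(a, b, 2))
```

Plain Levenshtein counts a swapped pair of letters (meagreness/meagerness) and an f/ph substitution (zuclomifene/zuclomiphene) as two edits each, so such pairs only pass at a threshold of 2 or more.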
Step 6 gives the final results used for the matcher. It serves as the worked example for checking the calculations:
Check Item | Check numbers |
---|---|
Total sample no | 867,728 = 362,159 + 356,156 + 1,058 + 148,355 |
Precision | 0.5042 = 362,159 / (362,159 + 356,156) |
Recall | 0.9971 = 362,159 / (362,159 + 1,058) |
F1 | 0.6697 = (2 * 0.5042 * 0.9971) / (0.5042 + 0.9971) |
Accuracy | 0.5883 = (362,159 + 148,355) / 867,728 |
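The same check can be scripted from the four count columns. The sketch below uses only the formulas in the check table. `scores` and `Scores` are hypothetical names.

```python
from typing import NamedTuple


class Scores(NamedTuple):
    precision: float
    recall: float
    f1: float
    accuracy: float


def scores(ret_rel: int, ret_irrel: int, not_ret_rel: int, not_ret_irrel: int) -> Scores:
    """Compute the four metrics from the four count columns, as in the check table."""
    total = ret_rel + ret_irrel + not_ret_rel + not_ret_irrel
    precision = ret_rel / (ret_rel + ret_irrel)
    recall = ret_rel / (ret_rel + not_ret_rel)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (ret_rel + not_ret_irrel) / total
    return Scores(precision, recall, f1, accuracy)


step6 = scores(362_159, 356_156, 1_058, 148_355)
print([round(value, 4) for value in step6])  # [0.5042, 0.9971, 0.6697, 0.5883]
```

Here F1 is computed from the unrounded precision and recall. For step 6 it still rounds to the table's 0.6697.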
Step | Methods | Edit Distance | Sample No. | ret-rel | ret-irrel | notRet-rel | notRet-irrel | Precision | Recall | F1 | Accuracy |
---|---|---|---|---|---|---|---|---|---|---|---|
Baseline (step-2 from above) | MES | 2 | 867,728 | 355,423 | 173,647 | 7,794 | 330,864 | 0.6718 | 0.9785 | 0.7967 | 0.7909 |
1 | Double Metaphone (10) | 2 | 867,728 | 356,375 | 178,698 | 6,842 | 325,813 | 0.6660 | 0.9812 | 0.7935 | 0.7862 |
2 | | 2 | 867,728 | 354,790 | 151,028 | 8,427 | 353,483 | 0.7014 | 0.9768 | 0.8165 | 0.8162 |
3 | | 2 | 867,728 | 352,911 | 115,531 | 10,306 | 388,980 | 0.7534 | 0.9716 | 0.8487 | 0.8550 |
Enhanced SpVarNorm | |||||||||||
Baseline (step-1 from above) | Norm | N/A | 867,728 | 306,387 | 19,374 | 56,830 | 485,137 | 0.9405 | 0.8435 | 0.8894 | 0.9122 |
New Baseline | Norm | N/A | 867,728 | 304,831 | 3,973 | 58,386 | 500,538 | 0.9871 | 0.8393 | 0.9072 | 0.9281 |
4 | | 2 | 867,728 | 352,826 | 114,271 | 10,391 | 390,240 | 0.7554 | 0.9714 | 0.8499 | 0.8563 |
5 | | 2 | 867,728 | 352,675 | 105,623 | 10,542 | 398,888 | 0.7695 | 0.9710 | 0.8586 | 0.8661 |
New GoldStandard - with inflectional Spelling Variants | |||||||||||
6.0 | Norm | 2 | 867,728 | 305,329 | 3,475 | 74,447 | 484,477 | 0.9887 | 0.8040 | 0.8868 | 0.9102 |
6.1?? | | 2 | 867,728 | 369,200 | 97,897 | 10,576 | 390,055 | 0.7904 | 0.9722 | 0.8719 | 0.8750 |
6.1 | | 1 | 867,728 | 369,049 | 89,249 | 10,727 | 398,703 | 0.8053 | 0.9718 | 0.8807 | 0.8848 |
6.2 | | 2 | 867,728 | 369,049 | 89,249 | 10,727 | 398,703 | 0.8053 | 0.9718 | 0.8807 | 0.8848 |
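The count columns also give three derived quantities. Summing all four gives the sample total. ret-rel + notRet-rel gives the size of the gold standard, and ret-rel + ret-irrel gives the number of retrieved pairs. Applying this to a few rows (counts copied from the table; the dictionary labels and `summarize` are informal, hypothetical names) shows that the gold standard grows from 363,217 to 379,776 relevant pairs at the "New GoldStandard" divider. New Baseline and 6.0 retrieve the same 308,804 pairs, and rows 5 and 6.1 retrieve the same 458,298 pairs. The metric changes between those rows therefore appear to come from the relabeled gold standard rather than from a different set of retrieved pairs.

```python
from typing import Dict, Tuple

Counts = Tuple[int, int, int, int]  # (ret-rel, ret-irrel, notRet-rel, notRet-irrel)

ROWS: Dict[str, Counts] = {
    "New Baseline (Norm)": (304_831, 3_973, 58_386, 500_538),
    "5": (352_675, 105_623, 10_542, 398_888),
    "6.0 (Norm)": (305_329, 3_475, 74_447, 484_477),
    "6.1 (edit distance 1)": (369_049, 89_249, 10_727, 398_703),
}


def summarize(counts: Counts) -> Dict[str, int]:
    """Derive the sample total, gold-standard (relevant) size, and retrieved size."""
    rr, ri, nr, ni = counts
    return {"total": rr + ri + nr + ni, "relevant": rr + nr, "retrieved": rr + ri}


summary = {label: summarize(c) for label, c in ROWS.items()}
for label, s in summary.items():
    print(label, s)
```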
Tried:
Example | Term | Metaphone 1 | Metaphone 2 | Notes |
---|---|---|---|---|
1 | meagreness | MKRNS | MKRNS | |
| | meagerness | MJRNS | MKRNS | |
2 | abkhasian | ABKHXN | APKSN | |
| | abkhazian | ABKHSN | APKSN | |
3 | toxic edema | TKSSTM | TKSKTM | |
| | toxic oedema | TKSKTM | TKSKTM | |
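In all three examples, the Metaphone 1 codes of the two spellings differ while the Metaphone 2 codes agree. A minimal sketch of that comparison follows, with the codes copied from the table. `EXAMPLES` and `code_agreement` are hypothetical names.

```python
from typing import Dict, List, Tuple

# (term, Metaphone 1, Metaphone 2) for both spellings, copied from the table above.
Entry = Tuple[str, str, str]
EXAMPLES: List[Tuple[Entry, Entry]] = [
    (("meagreness", "MKRNS", "MKRNS"), ("meagerness", "MJRNS", "MKRNS")),
    (("abkhasian", "ABKHXN", "APKSN"), ("abkhazian", "ABKHSN", "APKSN")),
    (("toxic edema", "TKSSTM", "TKSKTM"), ("toxic oedema", "TKSKTM", "TKSKTM")),
]


def code_agreement(examples: List[Tuple[Entry, Entry]]) -> Dict[int, int]:
    """Count, per code column (1 or 2), the pairs whose spellings share a code."""
    counts = {1: 0, 2: 0}
    for first, second in examples:
        for column in counts:
            if first[column] == second[column]:
                counts[column] += 1
    return counts


print(code_agreement(EXAMPLES))  # {1: 0, 2: 3}
```

A pair counts as a phonetic match for a column only when both spellings receive exactly the same code in that column.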
Example | Term | Metaphone 1 | Metaphone 2 | Caverphone 2.0 | Notes |
---|---|---|---|---|---|
1 | zymographical | SMKRFKL | SMKRFKL | SMKRFKA111 | |
| | zymographically | SMKRFKL | SMKRFKL | SMKRFKLA11 | |
2 | absorption test | ABSRPXNTST | APSRPXNTST | APSPSNTST1 | |
| | absorption tests | ABSRPXNTST | APSRPXNTST | APSPSNTSTS | |
3 | bacterial culture media | BKTRLKLTRM | PKTRLKLTRM | PKTRKTRMTA | |
| | bacterial culture medium | BKTRLKLTRM | PKTRLKLTRM | PKTRKTRMTM | |
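Caverphone 2.0 works in four stages:

1. Lowercase the term.
2. Drop everything except a-z, so spaces and apostrophes vanish.
3. Apply a fixed, ordered list of rewrite rules.
4. Pad the result to ten characters with 1s, which is why codes such as SMKRFKA111 end in 1s.

The sketch below transcribes that published rule list (as implemented, for example, in Apache Commons Codec's `Caverphone2` class) into Python. `caverphone2` is a hypothetical name, and this is not necessarily the encoder used to build these tables, although it reproduces the six Caverphone 2.0 codes in the table above.

```python
import re

# Ordered (pattern, replacement) rewrite rules of Caverphone 2.0.
RULES = [
    (r"e$", ""),
    (r"^cough", "cou2f"), (r"^rough", "rou2f"), (r"^tough", "tou2f"),
    (r"^enough", "enou2f"), (r"^trough", "trou2f"), (r"^gn", "2n"),
    (r"mb$", "m2"),
    (r"cq", "2q"), (r"ci", "si"), (r"ce", "se"), (r"cy", "sy"),
    (r"tch", "2ch"), (r"c", "k"), (r"q", "k"), (r"x", "k"), (r"v", "f"),
    (r"dg", "2g"), (r"tio", "sio"), (r"tia", "sia"), (r"d", "t"),
    (r"ph", "fh"), (r"b", "p"), (r"sh", "s2"), (r"z", "s"),
    (r"^[aeiou]", "A"), (r"[aeiou]", "3"),                   # vowels
    (r"j", "y"), (r"^y3", "Y3"), (r"^y", "A"), (r"y", "3"),  # y and j
    (r"3gh3", "3kh3"), (r"gh", "22"), (r"g", "k"),
    (r"s+", "S"), (r"t+", "T"), (r"p+", "P"), (r"k+", "K"),  # collapse runs
    (r"f+", "F"), (r"m+", "M"), (r"n+", "N"),
    (r"w3", "W3"), (r"wh3", "Wh3"), (r"w$", "3"), (r"w", "2"),
    (r"^h", "A"), (r"h", "2"),
    (r"r3", "R3"), (r"r$", "3"), (r"r", "2"),
    (r"l3", "L3"), (r"l$", "3"), (r"l", "2"),
    (r"2", ""), (r"3$", "A"), (r"3", ""),                    # final removals
]


def caverphone2(term: str) -> str:
    """Return the 10-character Caverphone 2.0 code of a term."""
    txt = re.sub(r"[^a-z]", "", term.lower())  # spaces, digits, apostrophes are dropped
    for pattern, replacement in RULES:
        txt = re.sub(pattern, replacement, txt)
    return (txt + "1" * 10)[:10]


for term in ["zymographical", "absorption test", "absorption tests", "bacterial culture media"]:
    print(term, caverphone2(term))
```

Unlike both Metaphone columns, Caverphone 2.0 keeps the trailing s, so it separates "absorption test" from "absorption tests".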
Example | Term | Metaphone (10) | Metaphone (60) | Notes |
---|---|---|---|---|
1 | 2-item patient health questionnair | TMPTNTL0KS | ITMPTN0L0KSXNR | |
| | 2-item patient health questionnaires | TMPTNTL0KS | TMPTNTL0KSXNRS | |
2 | bacterial culture media | PKTRLKLTRM | PKTRLKLTRMT | |
| | bacterial culture medium | PKTRLKLTRM | PKTRLKLTRMTM | |
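The number in parentheses appears to be the maximum code length, since every Metaphone (10) code above has exactly ten characters. This sketch assumes that a shorter code is simply a prefix of the longer one. That holds for the bacterial culture media/medium row, where both (60) codes start with PKTRLKLTRM. Under that assumption, the effect of the limit reduces to truncate-then-compare. `codes_match` is a hypothetical helper.

```python
def codes_match(code_a: str, code_b: str, max_len: int) -> bool:
    """Compare two phonetic codes after truncating both to max_len characters."""
    return code_a[:max_len] == code_b[:max_len]


# Metaphone (60) codes for "bacterial culture media" / "bacterial culture medium".
media, medium = "PKTRLKLTRMT", "PKTRLKLTRMTM"
for max_len in (10, 60):
    print(max_len, codes_match(media, medium, max_len))
```

A shorter limit tends to merge more pairs: it raises recall for long multi-word terms at some cost in precision.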
Example | Singular | Plural | Notes |
---|---|---|---|
1 | aan | aan's | |
2 | dcmp deaminase | dcmp deaminase's | |
Example | Term | Metaphone (60) | Caverphone 2.0 | Greco-Latin | Notes |
---|---|---|---|---|---|
1 | acroscleroses | AKRSKRSS | AKRSKLRSS1 | plural | |
| | acrosclerosis | AKRSKRSS | AKRSKLRSS1 | singular | |
2 | ammon's horn scleroses | AMNSRNSKRSS | AMNSNSKLRS | plural | |
| | ammon's horn sclerosis | AMNSRNSKRSS | AMNSNSKLRS | singular | |
3 | fimbria | FMPR | FMPRA11111 | singular | |
| | fimbriae | FMPR | FMPRA11111 | plural | |
4 | infraorbital foramen | ANFRRPTLFRMN | ANFRPTFRMN | singular | |
| | infraorbital foramina | ANFRRPTLFRMN | ANFRPTFRMN | plural | |
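The Greco-Latin column marks irregular singular/plural pairs (-is/-es, -a/-ae, -en/-ina). Within each pair above, both forms receive identical Metaphone (60) and Caverphone 2.0 codes. As a sketch of how such pairs could be recognized directly, the helper below applies a tiny, hypothetical table of singular-to-plural endings to the last word of a term. It covers only the endings in these examples plus -um/-a (as in medium/media from the earlier tables), and it is not the Lexicon's inflection logic. `greco_latin_plural` and `is_greco_latin_pair` are hypothetical names.

```python
from typing import List, Optional, Tuple

# Hypothetical rule subset: (singular ending, plural ending).
ENDINGS: List[Tuple[str, str]] = [("is", "es"), ("a", "ae"), ("en", "ina"), ("um", "a")]


def greco_latin_plural(term: str) -> Optional[str]:
    """Pluralize the last word of a term by the first matching ending, or return None."""
    head, _, last = term.rpartition(" ")
    for singular, plural in ENDINGS:
        if last.endswith(singular):
            last = last[: -len(singular)] + plural
            return f"{head} {last}" if head else last
    return None


def is_greco_latin_pair(singular: str, plural: str) -> bool:
    """True when `plural` is the rule-generated plural of `singular`."""
    return greco_latin_plural(singular) == plural


for s, p in [
    ("acrosclerosis", "acroscleroses"),
    ("fimbria", "fimbriae"),
    ("infraorbital foramen", "infraorbital foramina"),
    ("zygomycetes", "zygomycetous"),
]:
    print(s, "->", greco_latin_plural(s), is_greco_latin_pair(s, p))
```

The check is directional: it only accepts the pair when the first term is the singular, so it does not treat acroscleroses as the singular of acrosclerosis.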
Example | Term | Metaphone (60) | Caverphone 2.0 | Greco-Latin | Notes |
---|---|---|---|---|---|
1 | zygomycetes | SKMSTS | SKMSTS1111 | ? | |
| | zygomycetous | SKMSTS | SKMSTS1111 | ? | |
Example | Term | Metaphone (60) | Caverphone 2.0 | Greco-Latin | Notes |
---|---|---|---|---|---|
1 | zygapophyseal joint | TBD | TBD | ? | |
| | zygapophysial joint | TBD | TBD | ? | |
2 | zuclomifene | TBD | TBD | ? | |
| | zuclomiphene | TBD | TBD | ? | |