Test on Lexicon: For AMIA Final Submission
Norm, MES, and ES are applied in sequential order to retrieve as many spelling variant groups as possible. The model is tested on the Lexicon (inflVars.data) and LRSPL, and evaluated for recall, precision, F1, and accuracy. The details are as follows:
2015
Step | Methods | Edit Distance | Sample No. | ret-rel | ret-irrel | notRet-rel | notRet-irrel | Precision | Recall | F1 | Accuracy | Notes |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | GoldStd | N/A | 867,728 | 379,776 | 0 | 0 | 487,952 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1 min. |
1 | Norm | N/A | 867,728 | 315,241 | 10,520 | 64,535 | 477,432 | 0.9677 | 0.8301 | 0.8936 | 0.9135 | 2 min. |
2 | MES | 2 | 867,728 | 371,982 | 157,088 | 7,794 | 330,864 | 0.7031 | 0.9795 | 0.8186 | 0.8100 | 6 hr. |
3 | ES | 1 | 867,728 | 377,158 | 270,373 | 2,618 | 217,579 | 0.5825 | 0.9931 | 0.7343 | 0.6854 | 26 hr. |
4 | MES | 3 | 867,728 | 377,515 | 284,538 | 2,261 | 203,414 | 0.5702 | 0.9940 | 0.7247 | 0.6695 | 8 min. |
5 | ES | 2 | 867,728 | 378,641 | 336,953 | 1,135 | 150,999 | 0.5291 | 0.9970 | 0.6913 | 0.6104 | 29 hr. |
6 | MES | 4 | 867,728 | 378,718 | 339,597 | 1,058 | 148,355 | 0.5272 | 0.9972 | 0.6898 | 0.6074 | 2 min. |

Step | Methods | Edit Distance | Sample No. | ret-rel | ret-irrel | notRet-rel | notRet-irrel | Precision | Recall | F1 | Accuracy | Notes |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | GoldStd | N/A | 867,728 | 379,269 | 0 | 0 | 488,459 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1 min. |
1 | Norm | N/A | 867,728 | 305,309 | 3,495 | 73,960 | 484,964 | 0.9887 | 0.8050 | 0.8874 | 0.9107 | 1 min. |
2 | MES | 2 | 867,728 | 371,385 | 156,648 | 7,884 | 331,811 | 0.7033 | 0.9792 | 0.8187 | 0.8104 | 7 hr. |
3 | ES | 1 | 867,728 | 376,646 | 270,881 | 2,623 | 217,578 | 0.5817 | 0.9931 | 0.7336 | 0.6848 | 23 hr. |
4 | MES | 3 | 867,728 | 377,004 | 285,046 | 2,265 | 203,413 | 0.5694 | 0.9940 | 0.7241 | 0.6689 | 8 min. |
5 | ES | 2 | 867,728 | 378,134 | 337,461 | 1,135 | 150,998 | 0.5284 | 0.9970 | 0.6907 | 0.6098 | 26 hr. |
6 | MES | 4 | 867,728 | 378,211 | 340,105 | 1,058 | 148,354 | 0.5265 | 0.9972 | 0.6892 | 0.6068 | 2 min. |
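
The metric columns in the tables above can be recomputed from the four count columns. The following is a minimal sketch, not the code used to produce these results. It assumes that ret-rel, ret-irrel, notRet-rel, and notRet-irrel correspond to true positives, false positives, false negatives, and true negatives; the names `Counts` and `metrics` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """The four count columns of one table row (assumed mapping in comments)."""
    ret_rel: int        # retrieved and relevant (assumed: true positives)
    ret_irrel: int      # retrieved but irrelevant (assumed: false positives)
    not_ret_rel: int    # relevant but not retrieved (assumed: false negatives)
    not_ret_irrel: int  # not retrieved and irrelevant (assumed: true negatives)


def metrics(c: Counts) -> dict:
    """Compute precision, recall, F1, and accuracy from the four counts."""
    total = c.ret_rel + c.ret_irrel + c.not_ret_rel + c.not_ret_irrel
    retrieved = c.ret_rel + c.ret_irrel
    relevant = c.ret_rel + c.not_ret_rel
    # Guard against empty denominators so the function never divides by zero.
    precision = c.ret_rel / retrieved if retrieved else 0.0
    recall = c.ret_rel / relevant if relevant else 0.0
    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
    accuracy = (c.ret_rel + c.not_ret_irrel) / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Step 1 (Norm) of the first table.
norm = Counts(ret_rel=315_241, ret_irrel=10_520, not_ret_rel=64_535, not_ret_irrel=477_432)
result = metrics(norm)
print({name: round(value, 4) for name, value in result.items()})
# {'precision': 0.9677, 'recall': 0.8301, 'f1': 0.8936, 'accuracy': 0.9135}
```

Running the same function over every row is a quick consistency check: the four counts in each row should sum to the sample number (867,728), and the rounded metrics should match the reported columns.
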
Both tests show that this model has high recall but low precision after 6 steps. A high recall rate was required for the task in the AMIA.2016 multiword paper because the model was used as one of the filters to retrieve the LMW candidate list. However, this model needs to be improved in the following respects: