Precision, Recall, and F1 Analysis of LMW Candidates from the Parenthetic Acronym (ACR) Matcher - Papers on the ACR Matcher
I. Introduction
All lexical multiwords (LMWs) in the domain of interest must be identified in order to compute the recall rate. In this analysis, we used LMW candidates from the Parenthetic Acronym Pattern matcher (ACR) to calculate precision, recall, and F1 score. The examples below are based on 2015 data.
II. Data-1 (Table-3 in 2016 AMIA paper, initial version)
Case | Description | TP | FP | Total Retrieved | Total Relevant | Precision | Recall | F1 |
---|---|---|---|---|---|---|---|---|
1 | Parenthetic Acronym - Gold Standard | 13170 | 1230 | 14400 | 13170 | 0.9146 | 1.0000 | 0.9554 |
Filters or a single matcher | ||||||||
2 | Distilled MEDLINE N-gram Set (16 filters) | 13165 | 795 | 13960 | 13170 | 0.9431 | 0.9996 | 0.9705 |
3 | Spelling Variant Pattern Matcher | 6837 | 293 | 7130 | 13170 | 0.9589 | 0.5191 | 0.6736 |
4 | Metathesaurus CUI Pattern matcher | 8678 | 512 | 9190 | 13170 | 0.9443 | 0.6589 | 0.7762 |
5 | EndWord Pattern matcher | 1587 | 108 | 1695 | 13170 | 0.9363 | 0.1205 | 0.2135 |
Combination of filters and matchers | ||||||||
6 | SpVar + CUI + Distilled | 5108 | 129 | 5237 | 13170 | 0.9754 | 0.3879 | 0.5550 |
7 | SpVar + CUI + EndWord + Distilled | 703 | 5 | 708 | 13170 | 0.9929 | 0.0534 | 0.1013 |
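The metric columns in Data-1 follow directly from the count columns: precision is TP over total retrieved, recall is TP over total relevant (the gold-standard LMW count), and F1 is their harmonic mean. A minimal sketch (the function name `prf` is ours, not from the papers) reproduces Case 2:

```python
def prf(tp, retrieved, relevant):
    """Precision, recall, and F1 from the Data-1 count columns.

    precision = TP / total retrieved (TP + FP)
    recall    = TP / total relevant (gold-standard LMWs)
    F1        = harmonic mean of precision and recall
    """
    precision = tp / retrieved
    recall = tp / relevant
    f1 = 2 * precision * recall / (precision + recall)
    return round(precision, 4), round(recall, 4), round(f1, 4)

# Case 2: Distilled MEDLINE N-gram Set (16 filters)
print(prf(13165, 13960, 13170))  # → (0.9431, 0.9996, 0.9705)
```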
III. Data-2 (Table-3 in 2016 AMIA paper, final version)
Case | Description | TP | FP | FN | TN | Precision | Recall | F1 | Accuracy |
---|---|---|---|---|---|---|---|---|---|
1 | Parenthetic Acronym - Gold Standard | 14805 | 1870 | 0 | 0 | 0.8879 | 1.0000 | 0.9406 | 0.8879 |
Filters or a single matcher | |||||||||
2 | Distilled MEDLINE N-gram Set (16 filters) | 14796 | 1305 | 9 | 565 | 0.9189 | 0.9994 | 0.9575 | 0.9212 |
3 | Spelling Variant Pattern Matcher | 7509 | 482 | 7296 | 1388 | 0.9397 | 0.5072 | 0.6588 | 0.5336 |
4 | Metathesaurus CUI Pattern matcher | 9488 | 752 | 5317 | 1118 | 0.9266 | 0.6409 | 0.7577 | 0.6360 |
5 | EndWord Pattern matcher (top 20) | 1710 | 180 | 13095 | 1690 | 0.9048 | 0.1155 | 0.2049 | 0.2039 |
Combination of filters and matchers | |||||||||
6 | SpVar + CUI + Distilled | 5510 | 206 | 9295 | 1664 | 0.9640 | 0.3722 | 0.5370 | 0.4302 |
7 | SpVar + CUI + EndWord (20) + Distilled | 727 | 11 | 14078 | 1859 | 0.9851 | 0.0491 | 0.0935 | 0.1551 |
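Data-2 (and Data-3 below) report the full confusion matrix, so accuracy can be derived as well. A minimal sketch (the function name `metrics` is ours) that reproduces the metric columns from the TP/FP/FN/TN counts:

```python
def metrics(tp, fp, fn, tn):
    """Precision, recall, F1, and accuracy from a confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return tuple(round(v, 4) for v in (precision, recall, f1, accuracy))

# Case 1: Parenthetic Acronym gold standard (FN = TN = 0 by construction,
# so recall is 1 and accuracy equals precision)
print(metrics(14805, 1870, 0, 0))  # → (0.8879, 1.0, 0.9406, 0.8879)
```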
IV. Data-3 (Table-2 in 2017 HealthInf paper, final version)
Case | Description | TP | FP | FN | TN | Precision | Recall | F1 | Accuracy |
---|---|---|---|---|---|---|---|---|---|
1 | Parenthetic Acronym - Gold Standard | 15850 | 1857 | 0 | 0 | 0.8951 | 1.0000 | 0.9447 | 0.8951 |
Filters or a single matcher | |||||||||
2 | Distilled MEDLINE N-gram Set (16 filters) | 15840 | 1299 | 10 | 558 | 0.9242 | 0.9994 | 0.9603 | 0.9261 |
3 | Spelling Variant Pattern Matcher | 8094 | 499 | 7756 | 1358 | 0.9419 | 0.5107 | 0.6623 | 0.5338 |
4 | Metathesaurus CUI Pattern matcher | 10056 | 755 | 5794 | 1102 | 0.9302 | 0.6344 | 0.7544 | 0.6301 |
5 | EndWord Pattern matcher (top 20) | 1804 | 178 | 14046 | 1679 | 0.9102 | 0.1138 | 0.2023 | 0.1967 |
5A | EndWord Pattern matcher (top 33) | 2346 | 251 | 13504 | 1606 | 0.9034 | 0.1408 | 0.2544 | 0.2232 |
Combination of filters and matchers | |||||||||
6 | SpVar + CUI + Distilled | 5892 | 212 | 9958 | 1645 | 0.9653 | 0.3717 | 0.5368 | 0.4257 |
7 | SpVar + CUI + EndWord (20) + Distilled | 777 | 11 | 15073 | 1846 | 0.9860 | 0.0490 | 0.0934 | 0.1481 |
7A | SpVar + CUI + EndWord (33) + Distilled | 992 | 15 | 14858 | 1842 | 0.9851 | 0.0626 | 0.1177 | 0.1600 |
8 | CUI + EndWord (33) + Distilled | 1766 | 113 | 14084 | 1744 | 0.9399 | 0.1114 | 0.1992 | 0.1982 |
V. Automatic Tagging Model
Tag | Notes |
---|---|
valid | |
invalid | |
tbd | |