SPECIALIST Lexicon

Precision, Recall, and F1 Analysis for LMW Candidates from the Parenthetic Acronym Pattern (ACR) Matcher

I. Introduction

Calculating recall requires identifying all multiwords (LMWs) in the domain of interest. In this analysis, we used LMW candidates from the Parenthetic Acronym Pattern matcher (ACR) to calculate precision, recall, and F1 score. The examples below are based on the 2015 and 2016 MEDLINE n-gram sets.

II. Data-1 (Table-3 in 2016 AMIA paper initial version)

Setup
- Apply the Parenthetic Acronym Pattern matcher to the MEDLINE n-gram set (2015)
- The lowercased core terms of these acronym expansions are used as LMW candidates
- 14,400 LMW candidates are retrieved and tagged, first automatically by programs (valid if known in the Lexicon, invalid if tagged invalid previously), then manually by linguists
- 13,170 are valid (TP); 1,230 are invalid (FP), as shown in case 1 in the table below
- This is used as the gold standard for further analysis of other combinations of filters and matchers
- Case 2: test on filters
  In practice, apply the distilled MEDLINE n-gram set as the domain filter instead of applying all 16 filters sequentially
- Cases 3 ~ 5: test on a single matcher
- Cases 6 ~ 7: test on combinations of filters and matchers

Results

Case	Description	TP	FP	Total Retrieved	Total Relevant	Precision	Recall	F1
1	Parenthetic Acronym - Gold Standard	13170	1230	14400	13170	0.9146	1.0000	0.9554
Filters or a single matcher
2	Distilled MEDLINE N-gram Set (16 filters)	13165	795	13960	13170	0.9431	0.9996	0.9705
3	Spelling Variant Pattern Matcher	6837	293	7130	13170	0.9589	0.5191	0.6736
4	Metathesaurus CUI Pattern matcher	8678	512	9190	13170	0.9443	0.6589	0.7762
5	EndWord Pattern matcher	1587	108	1695	13170	0.9363	0.1205	0.2135
Combination of filters and matchers
6	SpVar + CUI + Distilled	5108	129	5237	13170	0.9754	0.3879	0.5550
7	SpVar + CUI + EndWord + Distilled	703	5	708	13170	0.9929	0.0534	0.1013
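
The precision, recall, and F1 columns can be recomputed from TP, FP, and the total number of relevant terms in the gold standard. The following is a minimal Python sketch, not the code used for the papers; the function name `prf1` is hypothetical, and the inputs are the case 1 and case 3 rows of the table above.

```python
def prf1(tp, fp, total_relevant):
    """Return (precision, recall, F1), each rounded to 4 decimals.

    tp             -- retrieved terms that are valid LMWs
    fp             -- retrieved terms that are invalid
    total_relevant -- all valid LMWs in the gold standard
    """
    retrieved = tp + fp
    precision = tp / retrieved if retrieved else 0.0
    recall = tp / total_relevant if total_relevant else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return round(precision, 4), round(recall, 4), round(f1, 4)


# Case 1: the gold standard itself (every relevant term is retrieved)
print(prf1(13170, 1230, 13170))  # (0.9146, 1.0, 0.9554)

# Case 3: Spelling Variant Pattern Matcher
print(prf1(6837, 293, 13170))    # (0.9589, 0.5191, 0.6736)
```

Recall is measured against the 13,170 valid terms of case 1, which is why the Total Relevant column is the same in every row.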

III. Data-2 (Table-3 in 2016 AMIA paper final)

Setup
- Same as in Data-1 (completed all n-grams for 2015 MEDLINE n-gram set)
- 16,675 LMW candidates are retrieved and tagged, first automatically by programs (valid if known in the Lexicon, invalid if tagged invalid previously), then manually by linguists
- 14,805 are valid (TP - total relevant); 1,870 are invalid (FP - total irrelevant), as shown in case 1 in the table below
- This is used as the gold standard for further analysis of other combinations of filters and matchers

Results

Case	Description	TP	FP	FN	TN	Precision	Recall	F1	Accuracy
1	Parenthetic Acronym - Gold Standard	14805	1870	0	0	0.8879	1.0000	0.9406	0.8879
Filters or a single matcher
2	Distilled MEDLINE N-gram Set (16 filters)	14796	1305	9	565	0.9189	0.9994	0.9575	0.9212
3	Spelling Variant Pattern Matcher	7509	482	7296	1388	0.9397	0.5072	0.6588	0.5336
4	Metathesaurus CUI Pattern matcher	9488	752	5317	1118	0.9266	0.6409	0.7577	0.6360
5	EndWord Pattern matcher (top 20)	1710	180	13095	1690	0.9048	0.1155	0.2049	0.2039
Combination of filters and matchers
6	SpVar + CUI + Distilled	5510	206	9295	1664	0.9640	0.3722	0.5370	0.4302
7	SpVar + CUI + EndWord (20) + Distilled	727	11	14078	1859	0.9851	0.0491	0.0935	0.1551
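
The FN, TN, and Accuracy columns are consistent with a derivation from the gold standard totals: FN is the number of valid terms not retrieved, and TN is the number of invalid terms not retrieved. The sketch below illustrates that reading in Python under that assumption; the function name `fn_tn_accuracy` is hypothetical, and the inputs are the case 4 row and the gold standard totals of this section.

```python
def fn_tn_accuracy(tp, fp, gold_valid, gold_invalid):
    """Derive FN, TN, and accuracy from the gold standard totals.

    gold_valid   -- valid terms in the gold standard (TP of case 1)
    gold_invalid -- invalid terms in the gold standard (FP of case 1)
    """
    fn = gold_valid - tp       # valid terms the matcher missed
    tn = gold_invalid - fp     # invalid terms the matcher rejected
    total = gold_valid + gold_invalid
    accuracy = (tp + tn) / total if total else 0.0
    return fn, tn, accuracy


# Case 4: Metathesaurus CUI Pattern matcher, Data-2 gold standard
fn, tn, acc = fn_tn_accuracy(9488, 752, 14805, 1870)
print(fn, tn, f"{acc:.4f}")  # 5317 1118 0.6360
```

For case 1 this gives FN = 0 and TN = 0, so accuracy equals precision, as in the first row of the table.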

IV. Data-3 (Table-2 in 2017 HealthInf paper final)

Setup
- Similar to Data-1 and 2 (completed all n-grams for 2016 MEDLINE n-gram set)
- 17,707 LMW candidates are retrieved and tagged, first automatically by programs (valid if known in the Lexicon, invalid if tagged invalid previously), then manually by linguists
- 15,850 are valid (TP - total relevant); 1,857 are invalid (FP - total irrelevant), as shown in case 1 in the table below
- This is used as the gold standard for further analysis of other combinations of filters and matchers

Results

Case	Description	TP	FP	FN	TN	Precision	Recall	F1	Accuracy
1	Parenthetic Acronym - Gold Standard	15850	1857	0	0	0.8951	1.0000	0.9447	0.8951
Filters or a single matcher
2	Distilled MEDLINE N-gram Set (16 filters)	15840	1299	10	558	0.9242	0.9994	0.9603	0.9261
3	Spelling Variant Pattern Matcher	8094	499	7756	1358	0.9419	0.5107	0.6623	0.5338
4	Metathesaurus CUI Pattern matcher	10056	755	5794	1102	0.9302	0.6344	0.7544	0.6301
5	EndWord Pattern matcher (top 20)	1804	178	14046	1679	0.9102	0.1138	0.2023	0.1967
5A	EndWord Pattern matcher (top 33)	2346	251	13504	1606	0.9034	0.1480	0.2544	0.2232
Combination of filters and matchers
6	SpVar + CUI + Distilled	5892	212	9958	1645	0.9653	0.3717	0.5368	0.4257
7	SpVar + CUI + EndWord (20) + Distilled	777	11	15073	1846	0.9860	0.0490	0.0934	0.1481
7A	SpVar + CUI + EndWord (33) + Distilled	992	15	14858	1842	0.9851	0.0626	0.1177	0.1600
8	CUI + EndWord (33) + Distilled	1766	113	14084	1744	0.9399	0.1114	0.1992	0.1982

V. Automatic Tagging Model

Input: LMW candidates

Algorithm:

Tag	Notes
valid	known in Lexicon: ${MULTIWORDS}/data/current/inData/inflVars.data.current (inflVars.data from the latest Lexicon) ${MULTIWORDS}/data/${YEAR}/outData/7.MatcherParAcr/acronymExp.tag.data.tag.${YEAR}.yes
invalid	known in the previous ACR tag: ${MULTIWORDS}/data/current/inData/invalidMwForParAcr.data.current (all invalid LMWs from previous tagging) ${MULTIWORDS}/data/${YEAR}/outData/7.MatcherParAcr/acronymExp.tag.data.tag.${YEAR}.no
tbd	unknown: used as LMW candidates ${MULTIWORDS}/data/${YEAR}/outData/7.MatcherParAcr/acronymExp.tag.data.tag.${YEAR}.tbd; sent to linguists for manual tagging
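
The tagging rules in the table can be sketched as follows. This is an illustrative Python sketch only, not the Lexicon tooling: the function name `auto_tag` is hypothetical, the term lists are held in memory as sets rather than read from the files above (whose formats are not described here), the sample terms are toy data, and checking the Lexicon before the invalid list is an assumption.

```python
def auto_tag(candidates, lexicon_terms, known_invalid):
    """Tag each LMW candidate as 'valid', 'invalid', or 'tbd'.

    lexicon_terms -- terms already known in the Lexicon
    known_invalid -- terms tagged invalid in previous ACR tagging
    """
    tags = {}
    for term in candidates:
        if term in lexicon_terms:      # assumed to take precedence
            tags[term] = "valid"
        elif term in known_invalid:
            tags[term] = "invalid"
        else:
            tags[term] = "tbd"         # left for manual tagging by linguists
    return tags


# Toy data for illustration only
lexicon_terms = {"magnetic resonance imaging"}
known_invalid = {"in this study"}
candidates = ["magnetic resonance imaging", "in this study", "new candidate term"]

for term, tag in auto_tag(candidates, lexicon_terms, known_invalid).items():
    print(f"{tag}\t{term}")
```

Only the terms tagged tbd need manual review, which keeps the linguists' workload limited to candidates not seen before.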

Next Steps:
Use the automatic and manual tags to calculate precision, recall, and F1. The fully tagged candidate set serves as the gold standard.
