The SPECIALIST Lexicon

Precision, Recall, and F1 Analysis for LMW Candidates from SpVar Model - Paper on SpVar

I. Introduction

In the previous study (AMIA paper), multiple MES and ES models were used to retrieve SpVars from the distilled MEDLINE n-gram set, and the retrieved SpVars were then used as a filter to retrieve LMWs. This model works, but it has several issues:

  • Performance and Frequency:
    Because of the complexity of the algorithm, the program runs very slowly. We therefore had to reduce the size of the MEDLINE n-gram set by applying a high frequency threshold (WC = 150). Even with this reduction, the program took weeks to complete.
  • The precision on the benchmark test against Lexicon.2015 is only 52.65% (even though the recall reaches 99.72%). This is acceptable when the model serves as an additional filter for the high-precision (ACR) matcher. However, it cannot be used as a standalone matcher to retrieve high-precision LMW candidates.
  • LMWs with a WC below 150 are missed.

An improved model, M2CES, was developed to address these issues.
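
The effect of the frequency cutoff described above can be illustrated with a minimal sketch. The n-gram records, their word counts, and the function name `filter_by_wc` are hypothetical and chosen only for illustration. The actual n-gram file format and the filtering code of the program are not shown in this document, and the cutoff is assumed to be inclusive (WC >= threshold).

```python
from typing import List, Tuple

# Hypothetical (n-gram, word count) records for illustration only;
# these are not taken from the MEDLINE n-gram set.
NGRAMS: List[Tuple[str, int]] = [
    ("example term one", 420),
    ("example term two", 150),
    ("example term three", 75),
    ("example term four", 31),
]


def filter_by_wc(ngrams: List[Tuple[str, int]], min_wc: int) -> List[Tuple[str, int]]:
    """Keep only n-grams whose word count (WC) is at least min_wc (assumed inclusive)."""
    return [(term, wc) for term, wc in ngrams if wc >= min_wc]


# Lowering the threshold keeps more n-grams, at the cost of a larger input set.
for threshold in (150, 100, 50, 30):
    kept = filter_by_wc(NGRAMS, threshold)
    print(f"WC >= {threshold}: {len(kept)} of {len(NGRAMS)} n-grams kept")
```

Any term whose WC falls below the chosen threshold never reaches the matcher, which is why candidates with a WC below 150 are lost under the original setting.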

II. Development

  • Comparison with Table 3 in the final 2016 (ACR) paper
    Case  Description                          TP      FP     T. Retrieved  T. Relevant  FN      TN     Precision  Recall  F1      Accuracy
    31    Parenthetic Acronym - Gold Standard  14,805  1,870  16,675        14,805       0       0      0.8879     1.0000  0.9406  0.8879
    Filters or a single matcher
    33    SpVar Matcher - MES + ES, WC: 150    7,509   482    7,991         14,805       7,296   1,388  0.9397     0.5072  0.6588  0.5336
    33A   SpVar Matcher - M2CES, WC: 150       3,695   226    3,921         14,805       11,110  1,644  0.9424     0.2496  0.3946  0.3202
    33B   SpVar Matcher - M2CES, WC: 100       4,485   283    4,768         14,805       10,320  1,587  0.9406     0.3029  0.4583  0.3641
    33C   SpVar Matcher - M2CES, WC: 50        5,748   421    6,169         14,805       9,057   1,449  0.9318     0.3882  0.5481  0.4316
    33D   SpVar Matcher - M2CES, WC: 30        6,682   513    7,195         14,805       8,123   1,357  0.9287     0.4513  0.6075  0.4821
    Combination: SpVar + CUI + Distilled
    36    SpVar - MES + ES, WC: 150            5,510   206    5,716         14,805       9,295   1,664  0.9640     0.3722  0.5370  0.4302
    36A   SpVar - M2CES, WC: 150               2,793   106    2,899         14,805       12,012  1,764  0.9634     0.1887  0.3155  0.2733
    36B   SpVar - M2CES, WC: 100               3,319   118    3,437         14,805       11,486  1,752  0.9657     0.2242  0.3639  0.3041
    36C   SpVar - M2CES, WC: 50                4,102   162    4,264         14,805       10,703  1,708  0.9620     0.2771  0.4302  0.3484
    36D   SpVar - M2CES, WC: 30                4,699   189    4,888         14,805       10,106  1,681  0.9613     0.3174  0.4772  0.3826
    Combination: SpVar + CUI + Distilled
    37    SpVar - MES + ES, WC: 150            727     11     738           14,805       14,078  1,859  0.9851     0.0491  0.0935  0.1551
    37A   SpVar - M2CES, WC: 150               354     5      359           14,805       14,451  1,865  0.9861     0.0239  0.0467  0.1331
    37B   SpVar - M2CES, WC: 100               427     5      432           14,805       14,378  1,865  0.9884     0.0288  0.0560  0.1375
    37C   SpVar - M2CES, WC: 50                568     8      576           14,805       14,237  1,862  0.9861     0.0384  0.0739  0.1457
    37D   SpVar - M2CES, WC: 30                654     10     664           14,805       14,151  1,860  0.9849     0.0442  0.0846  0.1508
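
The metric columns in the table follow from the four confusion counts. The sketch below uses the standard definitions of precision, recall, F1, and accuracy and checks them against case 33 (TP = 7,509, FP = 482, FN = 7,296, TN = 1,388). The names `Counts` and `metrics` are illustrative and are not taken from the original study.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Counts:
    """Confusion counts for one table row."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def metrics(c: Counts) -> Dict[str, float]:
    """Compute precision, recall, F1, and accuracy with the standard definitions."""
    retrieved = c.tp + c.fp   # "T. Retrieved" column
    relevant = c.tp + c.fn    # "T. Relevant" column
    total = c.tp + c.fp + c.fn + c.tn
    precision = c.tp / retrieved if retrieved else 0.0
    recall = c.tp / relevant if relevant else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (c.tp + c.tn) / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Case 33: SpVar Matcher - MES + ES, WC: 150
row_33 = Counts(tp=7509, fp=482, fn=7296, tn=1388)
for name, value in metrics(row_33).items():
    print(f"{name}: {value:.4f}")
```

For case 33 this reproduces the reported values 0.9397, 0.5072, 0.6588, and 0.5336. The rows also appear consistent with FN = T. Relevant - TP and TN = 1,870 - FP, where 1,870 is the FP count of the gold-standard case 31; this is an observation from the numbers, not a definition stated in the source.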