The SPECIALIST Lexicon

SpVar Normalization Development Notes

I. Introduction

An iterative process was developed to improve the precision and recall of the SpVarNorm algorithm:

  • Run SpVarNorm on Lexicon.2015
  • Check all false-positive instances
  • Enhance the SpVarNorm algorithm, then repeat these three steps
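
The loop above can be sketched as a small evaluation harness. This is a minimal sketch, not the actual SpVarNorm code: it assumes that a pair of terms counts as "retrieved" when both terms normalize to the same string, and `toy_normalize`, `evaluate`, and `metrics` are hypothetical names introduced for illustration. The four counters use the column names of the result tables in Section II (ret-rel, ret-irrel, notRet-rel, notRet-irrel, i.e. true positives, false positives, false negatives, true negatives), and the metric formulas are the standard definitions.

```python
from collections import Counter


def evaluate(pairs, gold, normalize):
    """Classify term pairs into the four cells used in the result tables.

    pairs:     iterable of (term_a, term_b) candidates
    gold:      set of pairs the gold standard marks as spelling variants
    normalize: candidate normalizer; a pair is "retrieved" when both terms
               normalize to the same string (an assumption of this sketch)
    """
    counts = Counter()
    false_positives = []
    for pair in pairs:
        a, b = pair
        retrieved = normalize(a) == normalize(b)
        relevant = pair in gold
        if retrieved and relevant:
            counts["ret-rel"] += 1
        elif retrieved:
            counts["ret-irrel"] += 1
            false_positives.append(pair)  # kept for manual review
        elif relevant:
            counts["notRet-rel"] += 1
        else:
            counts["notRet-irrel"] += 1
    return counts, false_positives


def metrics(counts):
    """Standard precision, recall, F1 and accuracy from the four counts.

    Raises ZeroDivisionError if nothing was retrieved or nothing is relevant.
    """
    tp, fp = counts["ret-rel"], counts["ret-irrel"]
    fn, tn = counts["notRet-rel"], counts["notRet-irrel"]
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


def toy_normalize(term):
    # Hypothetical stand-in for SpVarNorm: lowercase, drop hyphens and spaces.
    return term.lower().replace("-", "").replace(" ", "")


# Toy data, invented for illustration only.
pairs = [("e-mail", "email"), ("colour", "color"),
         ("re-sign", "resign"), ("cat", "dog")]
gold = {("e-mail", "email"), ("colour", "color")}

counts, false_positives = evaluate(pairs, gold, toy_normalize)
print(dict(counts))
print(false_positives)
print(metrics(counts))
```

Applied to the counts of the Baseline row in Section II (305,309 / 3,495 / 73,960 / 484,964), `metrics` gives the precision, recall, F1, and accuracy listed there to four decimal places. The list of false positives returned by `evaluate` is what the second step of the loop would inspect by hand.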

II. Process

  • Enhanced Norm to increase precision by removing the genitive only at the end of a term (not anywhere in the term). This version is used in the AMIA paper final submission.

  • Test on false positives (to increase precision), using the AMIA final submission as the baseline

    Step | Methods               | Edit Distance | Sample No. | ret-rel | ret-irrel | notRet-rel | notRet-irrel | Precision | Recall | F1     | Accuracy | Notes
    0    | GoldStd               | N/A           | 867,728    | 379,269 | 0         | 0          | 488,459      | 1.0000    | 1.0000 | 1.0000 | 1.0000   | 1 Min.
    1    | Baseline (AMIA-Final) | N/A           | 867,728    | 305,309 | 3,495     | 73,960     | 484,964      | 0.9887    | 0.8050 | 0.8874 | 0.9107   | 1 Min.
    1.1  | Genitive SpVars       | N/A           | 867,728    | 303,818 | 1,759     | 75,451     | 486,700      | 0.9942    | 0.8011 | 0.8873 | 0.9110   | 1 Min.
    1.2  | Dash SpVars           | The number of false positives is very small (199), so no enhanced algorithm was implemented.
    1.3  | Space SpVars          | The number of false positives is very small (41), so no enhanced algorithm was implemented.
    1.4  | Mixed case SpVars     | These false positives are actually valid (true positives); they are caused by errors in the gold standard.

  • Test on not removing the genitive at all in Norm

    Step | Methods                          | Edit Distance | Sample No. | ret-rel | ret-irrel | notRet-rel | notRet-irrel | Precision | Recall | F1     | Accuracy
    0    | GoldStd                          | N/A           | 867,728    | 379,776 | 0         | 0          | 487,952      | 1.0000    | 1.0000 | 1.0000 | 1.0000
    1    | Norm                             | N/A           | 867,728    | 315,241 | 10,520    | 64,535     | 477,432      | 0.9677    | 0.8301 | 0.8936 | 0.9135
    1.1  | Norm, no genitive removal at all | N/A           | 867,728    | 302,580 | 1,620     | 77,196     | 486,332      | 0.9947    | 0.7967 | 0.8848 | 0.9092
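
To make concrete the difference between removing a genitive only at the end of a term and removing it anywhere in the term, here is a minimal sketch. It is not the actual Norm code: the function names are hypothetical, and the rule that a genitive is a trailing `'s` (dropped) or `s'` (apostrophe dropped) is an assumption made for illustration.

```python
import re


def strip_genitive_at_end(term):
    """Remove a genitive marker only if it is the last thing in the term."""
    if term.endswith("'s"):
        return term[:-2]          # "Down's"  -> "Down"
    if term.endswith("s'"):
        return term[:-1]          # "boys'"   -> "boys"
    return term


def strip_genitive_anywhere(term):
    """Remove a genitive marker from every word in the term."""
    term = re.sub(r"'s\b", "", term)           # "Alzheimer's disease" -> "Alzheimer disease"
    return re.sub(r"s'(?=\s|$)", "s", term)    # "boys' shoes"         -> "boys shoes"


for term in ["Alzheimer's disease", "Down's", "boys' shoes"]:
    print(term, "|", strip_genitive_at_end(term), "|", strip_genitive_anywhere(term))
```

With the end-only rule, "Alzheimer's disease" is left unchanged, so it no longer normalizes to the same string as "Alzheimer disease". Under the assumptions of this sketch, that is the kind of change that retrieves fewer pairs: fewer false positives, but also some lost true positives.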

III. Discussion

  • We want a Norm with very high precision, even if the recall is lower.
  • The recall can then be improved by applying a phonetic algorithm, such as Metaphone or Caverphone.
  • If the precision is already low at the beginning (Norm), it will keep going down when a phonetic algorithm is applied.
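
The trade-off described above can be illustrated with a toy sketch. The phonetic key below is a deliberately crude, hypothetical stand-in (it is neither Metaphone nor Caverphone), and `norm_match` is a placeholder for Norm. The point is only that OR-ing a phonetic match onto Norm can add retrieved pairs but never remove any, so recall can rise while every extra wrong pair lowers precision.

```python
def crude_phonetic_key(term):
    """Hypothetical phonetic key: keep the first letter, then drop vowels,
    'h', 'w', 'y' and spaces, and collapse repeated letters.
    NOT Metaphone or Caverphone; for illustration only."""
    term = term.lower()
    key = term[:1]
    for ch in term[1:]:
        if ch in "aeiouyhw ":
            continue
        if key and ch == key[-1]:
            continue
        key += ch
    return key


def norm_match(a, b):
    # Placeholder for Norm: case-insensitive exact match (an assumption).
    return a.lower() == b.lower()


def combined_match(a, b):
    # Norm first, then the phonetic key as a recall-oriented fallback.
    return norm_match(a, b) or crude_phonetic_key(a) == crude_phonetic_key(b)


# "colour"/"color" is a true spelling variant that the placeholder Norm misses;
# "form"/"farm" is not a spelling variant, yet the phonetic key conflates it.
for a, b in [("colour", "color"), ("form", "farm")]:
    print(a, b, norm_match(a, b), combined_match(a, b))
```

In this toy case the phonetic step recovers one missed variant and introduces one false positive. Starting from a high-precision Norm leaves room for that kind of loss.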