The SPECIALIST Lexicon

SpVar - Phonetic Tests

I. Introduction

SpVarNorm is used to find spVars from a corpus by normalizing lexical information. The other chracteristic of spVars is having same pronunciation. Serveral Phonetics algorithms are tests for this task.

II. Processes

  • Most of phonetic algorithmis very agreesive and result in high recall and low precision.
  • We perfromed two types of tests: unit test and application test.

III. Results

  • Unit Tests:
    We tested all above algorithms. Soundex, RefinedSoundex, Caverphone 1.0, and Cologne are too aggresive and results in poor precision.

  • Application Tests
    • First Test:
      StepMethodsEdit DistanceSample No.ret-relret-irrelnotRet-relnotRet-irrelPrecisionRecallF1AccuracyNotes
      0GoldStdN/A867,728379,26900488,4591.00001.00001.00001.00001 min.
      1NormN/A867,728305,3093,49573,960484,9640.98870.80500.88740.91071 min.
      2M2ES2867,728364,39664,62214,873423,8370.84940.96080.90170.90842 hr.
      2M3ES2867,728364,87064,37114,399424,0880.85000.96200.90260.90922 hr.
      2C2ES2867,728366,78896,11612,481392,3430.79240.96710.87110.87482 hr.
      2M2CES2867,728363,60957,25915,660431,2000.86400.95870.90890.91602 hr.
      2M3CES2867,728363,59356,95315,676431,5060.86460.95870.90920.91632 hr.

    • 2nd Test (from new spVarNorm):
      StepMethodsEdit DistanceSample No.ret-relret-irrelnotRet-relnotRet-irrelPrecisionRecallF1AccuracyNotes
      0GoldStdN/A867,728379,26900488,4591.00001.00001.00001.00001 min.
      1NormN/A867,728305,3093,49573,960484,9640.98870.80500.88740.91071 min.
      2M2ES2867,728363,95662,97915,313425,4800.85250.95960.90290.9098113 min.
      2M3ES2867,728364,44762,72614,822425,7330.85320.96090.90380.9106114 min.
      2C2ES2867,728366,57494,65412,695393,8050.79480.96650.87230.8763133 min.
      2M2CES2867,728363,22255,61816,047432,8410.86720.95770.91020.917496 min.
      2M3CES2867,728363,20355,31016,066433,1490.86780.95670.91050.917797 min.

    • M2CES and M3CES have very similar results. Considering the recall and software maturity, M2CES model is used to on MEDLINE n-gram distillled set for retrieve LMS candidates.