The SPECIALIST Lexicon

Test on Lexicon: For AMIA Final Submission

Norm, MES, and ES are applied in sequence to retrieve as many spelling variant groups as possible. The model is tested on the Lexicon (inflVars.data) and LRSPL for recall, precision, F1, and accuracy. The details are as follows:

  • Setup:
    • Test name: LexTest.Amia.2.Final
    • Input File: (Lexicon.2015)
      • inflVars.data
      • LRSPL
    • Software:
      • GoldStd:
        Add inflectional spelling variants as valid spVars.
        • 1st attempt: GetGoldStdFromLex.java.2.InflSpVarCode
        • 2nd attempt: GetGoldStdFromLex.java.3.InflSpVarFile
          PhoneticExceptionObj.java
          PhoneticExceptionPattern.java
      • Norm: SpVarNorm.java.3.AmiaFinal
      • MES: GroupSpVarByMES.java
      • ES: GroupSpVarByES.java

  • Results:

    2015

    • Changed Gold Standard from the initial submission (base spVar) to final submission (base spVars and inflSpVars)
    • Tag inflectional spelling variants as true

    • Results from 1st attempt (inflSpVars identified in Java code)
      Step  Methods  Edit Distance  Sample No.  ret-rel  ret-irrel  notRet-rel  notRet-irrel  Precision  Recall  F1      Accuracy  Notes
      0     GoldStd  N/A            867,728     379,776  0          0           487,952       1.0000     1.0000  1.0000  1.0000    1 min.
      1     Norm     N/A            867,728     315,241  10,520     64,535      477,432       0.9677     0.8301  0.8936  0.9135    2 min.
      2     MES      2              867,728     371,982  157,088    7,794       330,864       0.7031     0.9795  0.8186  0.8100    6 hr.
      3     ES       1              867,728     377,158  270,373    2,618       217,579       0.5525     0.9931  0.7343  0.6854    26 hr.
      4     MES      3              867,728     377,515  284,538    2,261       203,414       0.5702     0.9940  0.7247  0.6695    8 min.
      5     ES       2              867,728     378,641  336,953    1,135       150,999       0.5291     0.9970  0.6913  0.6104    29 hr.
      6     MES      4              867,728     378,718  339,597    1,058       148,355       0.5272     0.9972  0.6898  0.6074    2 min.

    • Results from 2nd attempt (inflSpVars identified from a file, generated separately)
      Step  Methods  Edit Distance  Sample No.  ret-rel  ret-irrel  notRet-rel  notRet-irrel  Precision  Recall  F1      Accuracy  Notes
      0     GoldStd  N/A            867,728     379,269  0          0           488,459       1.0000     1.0000  1.0000  1.0000    1 min.
      1     Norm     N/A            867,728     305,309  3,495      73,960      484,964       0.9887     0.8050  0.8874  0.9107    1 min.
      2     MES      2              867,728     371,385  156,648    7,884       331,811       0.7033     0.9792  0.8187  0.8104    7 hr.
      3     ES       1              867,728     376,646  270,881    2,623       217,578       0.5817     0.9931  0.7336  0.6848    23 hr.
      4     MES      3              867,728     377,004  285,046    2,265       203,413       0.5694     0.9940  0.7241  0.6689    8 min.
      5     ES       2              867,728     378,134  337,461    1,135       150,998       0.5284     0.9970  0.6907  0.6098    26 hr.
      6     MES      4              867,728     378,211  340,105    1,058       148,354       0.5265     0.9972  0.6892  0.6068    2 min.
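
The four scores in the results above can be recomputed from the four count columns (ret-rel, ret-irrel, notRet-rel, notRet-irrel). The following is a minimal sketch using the standard definitions of precision, recall, F1, and accuracy. The class name SpVarMetrics and its method names are hypothetical and are not part of the test software listed under Setup.

```java
import java.util.Locale;

// Hypothetical helper (not part of the Lexicon test software): recomputes
// the reported scores from the four count columns of the results tables.
public class SpVarMetrics {

    // Precision = ret-rel / (ret-rel + ret-irrel).
    // Returns NaN if nothing was retrieved (0/0 in floating point).
    public static double precision(long retRel, long retIrrel) {
        return (double) retRel / (retRel + retIrrel);
    }

    // Recall = ret-rel / (ret-rel + notRet-rel).
    public static double recall(long retRel, long notRetRel) {
        return (double) retRel / (retRel + notRetRel);
    }

    // F1 = harmonic mean of precision and recall.
    public static double f1(double precision, double recall) {
        return 2.0 * precision * recall / (precision + recall);
    }

    // Accuracy = (ret-rel + notRet-irrel) / all samples.
    public static double accuracy(long retRel, long retIrrel,
                                  long notRetRel, long notRetIrrel) {
        long total = retRel + retIrrel + notRetRel + notRetIrrel;
        return (double) (retRel + notRetIrrel) / total;
    }

    public static void main(String[] args) {
        // Counts from step 1 (Norm) of the 1st attempt.
        long retRel = 315_241, retIrrel = 10_520;
        long notRetRel = 64_535, notRetIrrel = 477_432;

        double p = precision(retRel, retIrrel);
        double r = recall(retRel, notRetRel);
        System.out.println(String.format(Locale.ROOT,
            "P=%.4f R=%.4f F1=%.4f Acc=%.4f",
            p, r, f1(p, r),
            accuracy(retRel, retIrrel, notRetRel, notRetIrrel)));
    }
}
```

For the Norm row of the 1st attempt, the recomputed values should agree to four decimals with the reported 0.9677, 0.8301, 0.8936, and 0.9135. The four counts in every row also sum to the sample number, 867,728.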

  • Discussion:

    Both tests show that this model has high recall but low precision after 6 steps. A high recall rate was required for the task in the AMIA.2016 multiword paper because the model was used as one of the filters to retrieve the LMW candidate list. However, this model needs to be improved in the following respects:

    • Precision
    • F1
    • Accuracy: most applications that only need to retrieve spVars require high accuracy.
    • Running time: needs to improve before the model can be run on large data sets such as the MEDLINE n-gram set.