CSpell

Performance Tests - Ensemble on Training Set

I. Introduction

Performance tests were conducted on the different ranking methods of Ensemble Spelling (original code), scored against both the original and the revised gold standard.

II. Setup

  • Program:
    ${C_SPELL}/SpellCorrection/bin/runSpellingAllData
    0 (all data)
    3, 4 (nonword, real-word)
    0,1,2,3,4 (methods)
  • InData:
    ${C_SPELL}/SpellCorrection/CHQA_SpellCorrection_Dataset/AllData/
  • OutData:
    ${C_SPELL}/SpellCorrection/CHQA_SpellCorrection_Dataset/ResultAllData/LinearWeighted_nw_OUT_*
    ${C_SPELL}/SpellCorrection/CHQA_SpellCorrection_Dataset/ResultAllData/LinearWeighted_rw_OUT_*

    Backups are stored in:

    • ${C_SPELL}/SpellCorrection/CHQA_SpellCorrection_Dataset/ResultAllData.baseline
    • ${C_SPELL}/PostProcess/data/Test/Baseline/TestData/9_Baseline/Offical/*

III. Performance Results

  • Non-word Only

    Methods               Original GoldStd                         Revised GoldStd
                          TP|Ret|Rel    Precision|Recall|F1        TP|Ret|Rel    Precision|Recall|F1
    0. PreProcess         289|347|814   0.8329|0.3550|0.4978       289|347|774   0.8329|0.3734|0.5156
    1. Orthographic       495|824|814   0.6007|0.6081|0.6044       511|824|774   0.6201|0.6602|0.6395
    2. Corpus Frequency   361|810|814   0.4457|0.4435|0.4446       366|810|774   0.4519|0.4729|0.4621
    3. Word Embedding     350|807|814   0.4337|0.4300|0.4318       358|807|774   0.4436|0.4625|0.4529
    4. Ensemble           530|825|814   0.6424|0.6511|0.6467       552|825|774   0.6691|0.7132|0.6904

  • Real-word Included (the Ensemble option is used for real-word correction)

    Methods                Original GoldStd                         Revised GoldStd
                           TP|Ret|Rel    Precision|Recall|F1        TP|Ret|Rel    Precision|Recall|F1
    Ensemble (non-word)    531|825|926   0.6436|0.5734|0.6065       556|825|964   0.6739|0.5768|0.6216
    Ensemble (real-word)   498|718|926   0.6936|0.5378|0.6058       517|718|964   0.7201|0.5363|0.6147
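
The Precision|Recall|F1 values in the tables above can be reproduced from the TP|Ret|Rel counts. The sketch below is illustrative only and is not part of the CSpell code: it assumes TP is the number of correct corrections, Ret the number of corrections returned, and Rel the number of errors in the gold standard, and the function name prf is hypothetical.

```python
def prf(tp: int, ret: int, rel: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from TP, Ret, Rel counts.

    Assumed meanings (not stated in the report):
      tp  = true positives (correct corrections)
      ret = retrieved (all corrections returned)
      rel = relevant (all errors in the gold standard)
    """
    precision = tp / ret if ret else 0.0
    recall = tp / rel if rel else 0.0
    total = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def format_row(tp: int, ret: int, rel: int) -> str:
    """Format the metrics in the report's Precision|Recall|F1 style."""
    p, r, f1 = prf(tp, ret, rel)
    return f"{p:.4f}|{r:.4f}|{f1:.4f}"


# Non-word only, 4. Ensemble, original gold standard: 530|825|814
print(format_row(530, 825, 814))  # 0.6424|0.6511|0.6467
```

With these definitions, the recall difference between the original and revised gold standard comes from Rel (and, for some methods, TP) changing while Ret stays fixed.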