CSpell

Performance Tests for Ensemble

I. Introduction

Performance tests are run on the test set using Ensemble spelling correction, which serves as the baseline for comparison with CSpell.

II. Setup

  • Program:
    ${C_SPELL}/SpellCorrection/bin/runSpellingAllData
    4 (CSpell data - NER)
    3, 4 (nonword, real-word)
    4 (methods)
  • InData:
    ${C_SPELL}/SpellCorrection/CHQA_SpellCorrection_Dataset/ResultCSpellData/
  • OutData:
    ${C_SPELL}/SpellCorrection/CHQA_SpellCorrection_Dataset/ResultCSpellData/LinearWeighted_nw_OUT_4
    ${C_SPELL}/SpellCorrection/CHQA_SpellCorrection_Dataset/ResultCSpellData/LinearWeighted_rw_OUT_4

    Backup on:

    • ${C_SPELL}/SpellCorrection/CHQA_SpellCorrection_Dataset/ResultCSpellData.baseline
    • ${C_SPELL}/PostProcess/data/Test/NewTest/TestData/9_Baseline/Offical/*

III. Performance Results

  • Non-word Only GoldStd

    Methods             Revised GoldStd
                        TP|Ret|Rel      Precision|Recall|F1
    4. Ensemble         559|966|974     0.5787|0.5739|0.5763

  • Real-word Included GoldStd (only the Ensemble option works for real-word)

    Methods             Revised GoldStd
                        TP|Ret|Rel      Precision|Recall|F1
    4. Ensemble (NW)    560|966|1178    0.5797|0.4754|0.5224
    4. Ensemble (RW)    520|810|1178    0.6420|0.4414|0.5231

The non-word and real-word options of Ensemble yield nearly the same F1 (0.5224 vs. 0.5231). The real-word option has higher precision (0.6420 vs. 0.5797) but lower recall (0.4414 vs. 0.4754).
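
The precision, recall, and F1 values reported above can be rechecked from the TP|Ret|Rel counts. The following is a minimal sketch, assuming the usual definitions (precision = TP/Ret, recall = TP/Rel, F1 = harmonic mean of the two). The function name `prf` and the row labels are hypothetical and are not part of the CSpell tooling.

```python
def prf(tp, ret, rel):
    """Return (precision, recall, F1) rounded to 4 decimals.

    tp  -- true positives (correct corrections)
    ret -- retrieved (all corrections made by the system)
    rel -- relevant (all corrections in the gold standard)
    """
    precision = tp / ret
    recall = tp / rel
    f1 = 2 * precision * recall / (precision + recall)
    return round(precision, 4), round(recall, 4), round(f1, 4)


# Counts copied from the result tables; labels are illustrative only.
rows = [
    ("Ensemble, non-word only GoldStd", 559, 966, 974),
    ("Ensemble (NW), real-word included GoldStd", 560, 966, 1178),
    ("Ensemble (RW), real-word included GoldStd", 520, 810, 1178),
]

for name, tp, ret, rel in rows:
    p, r, f = prf(tp, ret, rel)
    print(f"{name}: {p:.4f}|{r:.4f}|{f:.4f}")
```

Under these definitions, each row reproduces the Precision|Recall|F1 values listed in the tables.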