CSpell

Factor Process

I. Introduction

We examine each error type (training set) to:

  • Find the cause of the error type
  • Fix the error type
    • Enhance the data or algorithm if it is a generic pattern
    • Correct the gold standard (if that is the cause)
  • Rerun the program

II. Detailed Process

  • Tokens are not in Brat annotation data (correct spelling)
    • A2.2.1. Not in checkDic, corrected wrong, by dictionary
      => Need to add these words to checkDic
      => Combo-6 (best performance) is chosen as the base model for further enhancement

      Model & Enhancement | TP  | Ret | Rel | P      | R      | F
      Combo-6             | 543 | 737 | 814 | 0.7368 | 0.6671 | 0.7002
      Add Shorthand

    • A2.2.2. Not in checkDic, corrected wrong, by preCorrection
      => TBD (this analysis focuses on dictionary-based correction first)

  • Tokens are in Brat annotation data (spelling error)
    • B1.1. PreCorr (T)
    • B1.2. PreCorr (F)
    • B2.1. DicCorr (T)
    • B2.2. DicCorr (F)
      • B2.2.1. Not detected, real-word (error tag)
      • B2.2.2. Not detected, spelling error (non-word)
      • B2.2.3. Detected, not in candidates by edit distance
      • B2.2.4. Detected, not in candidates by suggestion Dic
      • B2.2.5. Detected, not in candidates by multi-corrections
      • B2.2.6. Detected, in candidates, wrong (not top) rank
      • B2.2.7. Detected, in candidates, wrong top rank
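
The P, R, and F values in the Combo-6 row of the results table can be reproduced from the TP, Ret, and Rel counts. The sketch below assumes that Ret is the number of corrections the program returned and Rel is the number of corrections in the gold standard; the source does not define these columns, but the arithmetic agrees with that reading. The function name precision_recall_f1 is hypothetical and is not part of CSpell.

```python
def precision_recall_f1(tp, ret, rel):
    """Return (precision, recall, F1) from raw counts.

    tp  -- true positives: corrections that match the gold standard
    ret -- retrieved: corrections returned by the program (assumed meaning)
    rel -- relevant: corrections in the gold standard (assumed meaning)
    """
    precision = tp / ret if ret else 0.0
    recall = tp / rel if rel else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts taken from the Combo-6 row of the table.
p, r, f = precision_recall_f1(tp=543, ret=737, rel=814)
print(f"{p:.4f} {r:.4f} {f:.4f}")  # prints "0.7368 0.6671 0.7002"
```

Rerunning this calculation after each enhancement (for example, after adding words to checkDic) gives a row that can be compared directly against the Combo-6 baseline.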