CSpell

Factor Process

I. Introduction

We examine each error type in the training set in order to:

Find the cause of each error type
Fix each error type:
- Enhance the data or algorithm if the error follows a generic pattern
- Correct the gold standard (if it is the cause)
Rerun the program
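As a minimal sketch of the first step, assume each training-set error has already been hand-labeled with one of the category codes from Section II. Counting the labels then shows which error types are most common and worth examining first. The function name `rank_error_types` and the sample labels are hypothetical, not taken from CSpell.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_error_types(labels: Iterable[str]) -> List[Tuple[str, int]]:
    """Count errors per category code (hypothetical labels), most frequent first."""
    return Counter(labels).most_common()


# Made-up labels, one per training-set error.
labels = ["A2.2.1", "B2.2.6", "A2.2.1", "B2.2.2", "A2.2.1", "B2.2.6"]
print(rank_error_types(labels))  # [('A2.2.1', 3), ('B2.2.6', 2), ('B2.2.2', 1)]
```

After each fix, rerunning the program and recounting shows whether a category shrank.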

II. Detailed Process

Tokens not in the Brat annotation data (correctly spelled)
- A2.2.1. Not in checkDic, wrongly corrected by the dictionary
  => Add these words to checkDic
  => Combo-6 (the best-performing model) is chosen as the base model for further enhancement
  
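  Model & Enhancement | TP  | Ret | Rel | P      | R      | F
  --------------------+-----+-----+-----+--------+--------+-------
  Combo-6             | 543 | 737 | 814 | 0.7368 | 0.6671 | 0.7002
  Add Shorthand       |     |     |     |        |        |
  (TP = true positives, Ret = retrieved, Rel = relevant; P = TP/Ret, R = TP/Rel, F = 2PR/(P+R))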
- A2.2.2. Not in checkDic, wrongly corrected by preCorrection
  => TBD (this analysis focuses on dictionary-based correction first)
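The P, R and F columns of the Combo-6 row follow from its three counts. The sketch below reproduces them, assuming precision = TP/Ret, recall = TP/Rel and F = harmonic mean of P and R, which matches the reported figures. The function name `prf` is hypothetical.

```python
from typing import Tuple


def prf(tp: int, ret: int, rel: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from true positives, retrieved and relevant counts."""
    p = tp / ret if ret else 0.0
    r = tp / rel if rel else 0.0
    # Harmonic mean of P and R; algebraically equal to 2*TP / (Ret + Rel).
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f


p, r, f = prf(543, 737, 814)
print(f"{p:.4f} {r:.4f} {f:.4f}")  # 0.7368 0.6671 0.7002
```

The same function can score the "Add Shorthand" row once its counts are available.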
Tokens in the Brat annotation data (spelling errors)
- B1.1. PreCorr (T)
- B1.2. PreCorr (F)
- B2.1. DicCorr (T)
- B2.2. DicCorr (F)
  - B2.2.1. Not detected, real-word error (error tag)
  - B2.2.2. Not detected, non-word spelling error
  - B2.2.3. Detected, no candidates from edit distance
  - B2.2.4. Detected, no candidates from the suggestion Dic
  - B2.2.5. Detected, no candidates from multi-corrections
  - B2.2.6. Detected, candidates found, wrong rank (not top)
  - B2.2.7. Detected, candidates found, wrong top rank
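One way to apply the B2.2 breakdown consistently is a small decision function over each failed token. The sketch below is hypothetical: the `SpellErrorCase` record, its fields, and the `source` field (which generator is assumed responsible for producing the gold correction) are illustrative assumptions, not CSpell data structures. B2.2.6 and B2.2.7 are both ranking failures, and the notes do not spell out how they differ, so the sketch lumps them into one bucket.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed mapping from the responsible candidate generator to its category code.
SOURCE_TO_CODE = {
    "edit_distance": "B2.2.3",
    "suggestion_dic": "B2.2.4",
    "multi_correction": "B2.2.5",
}


@dataclass
class SpellErrorCase:
    """Hypothetical record for one annotated misspelled token."""
    detected: bool                 # flagged as an error by the detector?
    real_word: bool                # is the misspelling itself a valid word?
    gold: str                      # gold correction from the Brat annotation
    source: str = "edit_distance"  # generator assumed to be responsible for gold
    candidates: List[str] = field(default_factory=list)  # all generated candidates
    ranked: List[str] = field(default_factory=list)      # candidates after ranking


def classify(case: SpellErrorCase) -> Optional[str]:
    """Return a B2.2.x code for a failed correction, or None if it succeeded."""
    if not case.detected:
        # Missed detection: split on real-word vs. non-word misspellings.
        return "B2.2.1" if case.real_word else "B2.2.2"
    if case.gold not in case.candidates:
        # Detected, but the responsible generator never produced the gold form.
        return SOURCE_TO_CODE[case.source]
    if case.ranked and case.ranked[0] == case.gold:
        return None  # gold correction ranked first: not an error
    # Gold was generated but not ranked first (B2.2.6 and B2.2.7 lumped together).
    return "B2.2.6/7"


cases = [
    SpellErrorCase(detected=False, real_word=True, gold="their"),
    SpellErrorCase(detected=True, real_word=False, gold="diabetes",
                   candidates=["diabetic"], ranked=["diabetic"]),
    SpellErrorCase(detected=True, real_word=False, gold="pain",
                   candidates=["pain", "paint"], ranked=["paint", "pain"]),
]
print([classify(c) for c in cases])  # ['B2.2.1', 'B2.2.3', 'B2.2.6/7']
```

Feeding the resulting codes into a per-category count gives the error-type tally used to decide what to fix next.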