Factor Process
I. Introduction
We examine each error type (in the training set) to:
- Find the cause of the error type
- Fix the error type
- Enhance the data or algorithm if it is a generic pattern
- Correct the gold standard (if that is the cause)
- Rerun the program
II. Detailed Process
- Tokens are not in Brat annotation data (correct spelling)
- A2.2.1. Not in checkDic, wrongly corrected by dictionary
=> Need to add these words to checkDic
=> Combo-6 (best performance) is chosen as the base model for further enhancement
Model & Enhancement | TP | Ret | Rel | P | R | F
---|---|---|---|---|---|---
Combo-6 | 543 | 737 | 814 | 0.7368 | 0.6671 | 0.7002
Add Shorthand | | | | | |
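The P, R, and F columns for Combo-6 can be reproduced from the three counts in the same row. The sketch below assumes that TP is the number of true positives, Ret the number of retrieved items (corrections the system made), and Rel the number of relevant items (errors in the gold standard). The source does not define these abbreviations, but the Combo-6 figures are consistent with that reading. The function name `prf` is hypothetical.

```python
def prf(tp: int, retrieved: int, relevant: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from raw counts.

    Assumed meanings (not defined in the source):
      tp        - true positives (TP)
      retrieved - items returned by the system (Ret)
      relevant  - items in the gold standard (Rel)
    """
    precision = tp / retrieved if retrieved else 0.0
    recall = tp / relevant if relevant else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Counts taken from the Combo-6 row.
p, r, f = prf(tp=543, retrieved=737, relevant=814)
print(f"P={p:.4f} R={r:.4f} F={f:.4f}")  # P=0.7368 R=0.6671 F=0.7002
```

The same function can be applied to the counts of any later enhancement row, so each result is compared against the Combo-6 baseline on the same basis.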
- A2.2.2. Not in checkDic, wrongly corrected by preCorrection
=> TBD (this analysis focuses on dictionary-based correction first)
- Tokens are in Brat annotation data (spelling error)
- B1.1. PreCorr (T)
- B1.2. PreCorr (F)
- B2.1. DicCorr (T)
- B2.2. DicCorr (F)
- B2.2.1. Not detected, real-word (error tag)
- B2.2.2. Not detected, spelling error (non-word)
- B2.2.3. Detected, not in candidates by edit distance
- B2.2.4. Detected, not in candidates by suggestion Dic
- B2.2.5. Detected, not in candidates by multi-corrections
- B2.2.6. Detected, candidates, wrong (not top) rank
- B2.2.7. Detected, candidates, wrong top rank