The ensemble spelling correction system (by Halil) is used as the baseline for this project. The review status and suggested plan for the original source code are as follows:
| Original Java Code | Notes | Module | Status | Plan
|
---|
1 | SpellingPreProcessor.java
|
- Preprocesses the input text (contractions, punctuation, digit splitting, etc.)
- Decomposition:
- PreProcXml.java (refactoring with modifications)
- PreProcContractions.java (rewrite)
- PreProcSentence.java (redesign)
- PreProcSplit.java (refactoring with modifications)
| PreProcessor
|
| Rewrite or refactoring
|
2 | DictionaryBasedSpellChecker.java
|
- Uses Jazzy for the dictionary and suggestions
- Planned to be replaced by a better mechanism.
| Dictionary
|
| Rewrite
|
3 | SpellingCandidateGenerator.java
|
- Uses an exhaustive (slow) mechanism to get candidates
- The method getLevenshteinEdits() is used to get all candidates from the dictionary
- Does not use jazzySpellChecker
- Decomposition:
- EditDistance.java (refactoring)
- OverLapUtil.java (refactoring)
- MergeUtil.java (refactoring)
- SplitUtil.java (rewrite for better performance and bug fixes)
| Candidate
|
| Rewrite or refactoring
|
4 | CorpusFrequencyCounts.java
|
- Gets the frequency score
- A bug was found in getUnigramScore()
| Ranking
|
| Rewrite
|
5 | Word2Vector.java
|
- Used for the word embedding algorithm (contextual similarity)
| Ranking
|
| TBD
|
6 | SpellCorrectionEvaluator.java
|
- The code is OK to use as is.
- A rewrite is suggested for simplicity and speed.
- Decomposition:
- Span.java
- TokenSpan.java
- TokenSpanUtil.java
- CoreNLPWrapper.java
- FileUtils.java
| Evaluator
|
| Rewrite
|
7 | diff_match_patch.java
|
- Library code for comparing two texts
- Useful as is; a rewrite is planned for simplicity, maintainability, and speed
| Evaluator
|
| Rewrite
|
8 | SpellCorrection.java
| Interface; might not be needed
| System
|
| Remove or redesign
|
9 | LinearWeightedEnsembleSpellCorrection.java
|
- Add a new class for configurable settings
- ESpellCorrection.java (process multiple files)
- Decomposition:
- EnsembleSpellCorrectText.java (correct 1 text file)
- EnsembleSpellPreProcess.java
- EnsembleSpellPreProcessObj.java
- EnsembleSpellSpans.java (convert input text to spanText)
- EnsembleSpellProcess.java
- EnsembleSpellCorrectSentence.java (correct a sentence)
- EnsembleSpellFindCandidates.java (find candidates for an instance)
- EnsembleSpellCandidates.java (find candidates for a token)
- EnsembleSpellMergeCandidates.java (find merge candidates for a token)
- EnsembleSpellFindRanking.java (find the best-ranked candidate for an instance)
| System
|
| Rewrite
|
10 | JazzySpellCorrection.java
| Use ASpell (Jazzy) to correct text
| System
|
| Remove
|
11 | ESpellCorrection.java
| Use ESpell to correct text
| System
|
| Remove
|
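
Rows 3, 4, and 9 of the table mention exhaustive Levenshtein-based candidate generation, frequency scoring, and a linear weighted ensemble. The following is a minimal sketch of how those three pieces might fit together; it is not the original code. The class name `EnsembleSketch`, its method names, the similarity normalization, and the weights are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: names, scoring formula, and weights are hypothetical
// and are not taken from the original ensemble source code.
class EnsembleSketch {

    // Standard dynamic-programming Levenshtein distance, using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // the finished row becomes prev
        }
        return prev[b.length()];
    }

    // Exhaustive candidate search: the token is compared against every dictionary
    // word, which is simple but slow for large dictionaries.
    static List<String> candidates(String token, Map<String, Integer> wordCounts, int maxEdits) {
        List<String> result = new ArrayList<>();
        for (String word : wordCounts.keySet()) {
            if (levenshtein(token, word) <= maxEdits) result.add(word);
        }
        return result;
    }

    // Linear weighted score: wEdit * edit similarity + wFreq * relative unigram frequency.
    static double score(String token, String cand, Map<String, Integer> wordCounts,
                        long total, double wEdit, double wFreq) {
        int maxLen = Math.max(token.length(), cand.length());
        double editSim = maxLen == 0 ? 1.0 : 1.0 - (double) levenshtein(token, cand) / maxLen;
        double freq = (double) wordCounts.getOrDefault(cand, 0) / total;
        return wEdit * editSim + wFreq * freq;
    }

    // Returns the highest-scoring candidate, or the token itself if none qualifies.
    static String best(String token, Map<String, Integer> wordCounts,
                       int maxEdits, double wEdit, double wFreq) {
        long sum = 0;
        for (int c : wordCounts.values()) sum += c;
        final long total = Math.max(sum, 1); // avoid division by zero on an empty dictionary
        return candidates(token, wordCounts, maxEdits).stream()
                .max(Comparator.comparingDouble(
                        (String c) -> score(token, c, wordCounts, total, wEdit, wFreq)))
                .orElse(token);
    }

    public static void main(String[] args) {
        // Toy dictionary with made-up counts, for demonstration only.
        Map<String, Integer> counts = Map.of("diabetes", 50, "diabetic", 30, "dialysis", 20);
        System.out.println(best("diabetis", counts, 2, 0.7, 0.3)); // prints "diabetes"
    }
}
```

In this toy run, "diabetes" and "diabetic" are both one edit away from "diabetis", so the frequency term breaks the tie. A rewrite aimed at speed would likely replace the full dictionary scan in `candidates` with an index-based lookup.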