Lazy Tokenizer
I. Introduction
This page describes the lazy implementation of the tokenizer. "Lazy" means that a processing step is deferred until its result is actually needed, which improves speed. In CSpell, the input text is tokenized into words, which are then processed sequentially. Tokenization on punctuation is implemented lazily with the coreTerm class: splitting on punctuation is delayed until the last moment. This avoids unnecessary computation for tokenizing and reassembling text on punctuation. The implementation saves time and is easier to maintain, because it skips tokenization that is not needed and fits naturally with Java 8 stream operations.
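The following is a minimal sketch of the idea, not CSpell's actual code. The class LazyTokenizerSketch and its methods isPlainWord, toCoreTerm and tokenize are hypothetical. toCoreTerm is a simplified stand-in for the coreTerm class that only trims leading and trailing punctuation. Text is split on whitespace into a Java 8 stream. Punctuation handling runs only for tokens that contain non-alphanumeric characters, and only when a terminal operation actually pulls that token.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LazyTokenizerSketch {
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Counts calls to the punctuation step so the laziness is observable.
    static int punctuationCalls = 0;

    // A token made only of letters/digits needs no punctuation handling.
    static boolean isPlainWord(String token) {
        return token.chars().allMatch(Character::isLetterOrDigit);
    }

    // Hypothetical stand-in for the core-term step (assumption, not CSpell's API):
    // trims leading and trailing punctuation, e.g. "world!" -> "world".
    static String toCoreTerm(String token) {
        punctuationCalls++;
        int start = 0;
        int end = token.length();
        while (start < end && !Character.isLetterOrDigit(token.charAt(start))) {
            start++;
        }
        while (end > start && !Character.isLetterOrDigit(token.charAt(end - 1))) {
            end--;
        }
        return token.substring(start, end);
    }

    // Returns a lazy stream: nothing runs until a terminal operation pulls tokens,
    // and the punctuation step runs only for tokens that are not plain words.
    static Stream<String> tokenize(String text) {
        return WHITESPACE.splitAsStream(text)
                .filter(t -> !t.isEmpty())
                .map(t -> isPlainWord(t) ? t : toCoreTerm(t))
                .filter(t -> !t.isEmpty()); // drop tokens that were pure punctuation
    }

    public static void main(String[] args) {
        String text = "Hello, world! This is CSpell.";

        // Pulling only the first two terms touches only "Hello," and "world!".
        List<String> firstTwo = tokenize(text).limit(2).collect(Collectors.toList());
        System.out.println(firstTwo + " punctuationCalls=" + punctuationCalls);

        // Consuming the whole stream processes every token.
        punctuationCalls = 0;
        List<String> all = tokenize(text).collect(Collectors.toList());
        System.out.println(all + " punctuationCalls=" + punctuationCalls);
    }
}
```

Running main prints `[Hello, world] punctuationCalls=2` and then `[Hello, world, This, is, CSpell] punctuationCalls=3`. With limit(2), the stream never reaches "CSpell.", so its punctuation step is skipped entirely. Plain words such as "This" and "is" never enter the punctuation step at all.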
II. Source code
TokenObj.java
TokenUtil.java
TextObj.java
TermUtil.java
III. Design and Algorithm