Tokenize
Break up a string into a list of unique "words". What counts as a word depends on how the string is tokenized: here, a word is any token that consists solely of a run of alphanumeric characters. The definition also depends on the minimum number of characters in the run. The default minimum is two, and it is configurable with the -ws:INT global option. Some applications find it convenient to throw away single-character words, while others need to keep them.
This flow has no effect on the -m option: "none" is added at the end of the output.
shell> lvg -f:c
the club-foot
the club-foot|the|2047|16777215|c|1|
the club-foot|club|2047|16777215|c|1|
the club-foot|foot|2047|16777215|c|1|
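
The word definition above (runs of alphanumeric characters, a minimum run length, no duplicates) can be illustrated with a minimal sketch. This is not the lvg implementation: the function name tokenize, the parameter min_word_size (standing in for the -ws:INT option), the restriction to ASCII letters and digits, and the first-seen ordering of the result are all assumptions made for illustration.

```python
import re


def tokenize(text, min_word_size=2):
    """Return the unique alphanumeric runs in text, in first-seen order.

    Hypothetical helper: min_word_size mirrors the role of the -ws:INT
    option; ASCII-only matching and result ordering are assumptions.
    """
    seen = set()
    words = []
    # Every maximal run of ASCII letters and digits is a candidate word.
    for run in re.findall(r"[A-Za-z0-9]+", text):
        # Keep the run only if it is long enough and not already listed.
        if len(run) >= min_word_size and run not in seen:
            seen.add(run)
            words.append(run)
    return words


print(tokenize("the club-foot"))              # ['the', 'club', 'foot']
print(tokenize("a b-cd a", min_word_size=1))  # ['a', 'b', 'cd']
```

With the default minimum of two, the single-character runs in the second input would be dropped and only "cd" would remain.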