Tokenize
Breaks a string into a unique list of "words". What counts as a word depends on how the string is tokenized: here, a word is any token that consists only of a run of alphanumeric characters. It also depends on the minimum number of characters in the run, which defaults to two and is configurable with the -ws:INT global option. Some applications are better served by discarding single-character words, and others by keeping them.
No effect on the -m option. "none" is added at the end of the output.
shell> lvg -f:c
the club-foot
the club-foot|the|2047|16777215|c|1|
the club-foot|club|2047|16777215|c|1|
the club-foot|foot|2047|16777215|c|1|
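The tokenization rule described above can be approximated outside of lvg with a short Python sketch. This is not lvg's implementation: the function name `tokenize`, the `min_size` parameter (standing in for the -ws:INT option), and the ASCII-only character class are all assumptions made for illustration, and the sketch returns only the words, not the pipe-delimited output fields.

```python
import re


def tokenize(text, min_size=2):
    """Return the unique alphanumeric runs in text, in first-seen order.

    min_size is a hypothetical stand-in for lvg's -ws:INT option:
    runs shorter than min_size characters are discarded.
    """
    seen = set()
    words = []
    # Each maximal run of ASCII letters/digits is a candidate word;
    # anything else (spaces, hyphens, punctuation) acts as a separator.
    for run in re.findall(r"[A-Za-z0-9]+", text):
        if len(run) >= min_size and run not in seen:
            seen.add(run)
            words.append(run)
    return words


print(tokenize("the club-foot"))                        # ['the', 'club', 'foot']
print(tokenize("vitamin a and vitamin d"))              # ['vitamin', 'and']
print(tokenize("vitamin a and vitamin d", min_size=1))  # ['vitamin', 'a', 'and', 'd']
```

With the default minimum of two, the single-character words "a" and "d" are dropped; lowering the minimum to one keeps them. The sketch leaves case untouched, since the description above does not say whether tokens are case-normalized.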