Lexical Tools

Tokenize, no break on hyphens

  • Short Description: Tokenize, but do not break on hyphens.

  • Full Description:

    Breaks up a string into a unique list of "words", but does not break on hyphens.

    The -m option has no effect on this flow beyond adding "none" at the end of the output.

  • Difference: None

  • Features:
    1. Breaks up the input term into tokens separated by delimiters.
    2. Delimiters include space, tab, and all punctuation except the hyphen (-).


  • Symbol: ch

  • Examples:

    shell> lvg -f:ch

    
    the club-foot
    the club-foot|the|2047|16777215|ch|1|
    the club-foot|club-foot|2047|16777215|ch|1|
    

  • Implementation Logic:
    1. Uses the Java StringTokenizer class.
    2. Delimiters include space, tab, and all punctuation except the hyphen (-).
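
    The logic above can be sketched roughly as follows. This is a minimal illustration, not the contents of ToTokenizeNoHyphens.java: the class name TokenizeNoHyphensSketch, the method tokenize, and the exact DELIMITERS string (space, tab, and ASCII punctuation without the hyphen) are assumptions made for the example. A LinkedHashSet is used here as one way to keep the token list unique while preserving input order.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.StringTokenizer;

public class TokenizeNoHyphensSketch {
    // Assumed delimiter set: space, tab, and ASCII punctuation except '-'.
    // The real tool may use a different punctuation list.
    private static final String DELIMITERS =
        " \t!\"#$%&'()*+,./:;<=>?@[\\]^_`{|}~";

    // Splits the term on the delimiters and returns the unique tokens
    // in the order they first appear.
    public static List<String> tokenize(String term) {
        Set<String> unique = new LinkedHashSet<>();
        StringTokenizer tokenizer = new StringTokenizer(term, DELIMITERS);
        while (tokenizer.hasMoreTokens()) {
            unique.add(tokenizer.nextToken());
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        // Print one token per line for the example term from this page.
        for (String token : tokenize("the club-foot")) {
            System.out.println(token);
        }
    }
}
```

    For the input "the club-foot", this sketch yields the two tokens "the" and "club-foot", which correspond to the output terms in the lvg example above. It does not reproduce the other fields of the lvg output lines.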

  • Source Code: ToTokenizeNoHyphens.java

  • Hierarchy: Object -> Transformation -> ToTokenizeNoHyphens