Lexical Tools

Tokenize, no break on hyphens

  • Short Description: Tokenize, but do not break on hyphens.

  • Full Description:

    Break up a string into a unique list of "words", but do not break on hyphens.

    No effect on the -m option. "none" is added at the end of the output.

  • Difference: None

  • Features:
    1. Breaks up the input term into tokens separated by delimiters.
    2. Delimiters include space, tab, and all punctuation except the hyphen (-).


  • Symbol: ch

  • Examples:

    shell> lvg -f:ch

    
    the club-foot
    the club-foot|the|2047|16777215|ch|1|
    the club-foot|club-foot|2047|16777215|ch|1|
    

  • Implementation Logic:
    1. Use the Java StringTokenizer class.
    2. Delimiters include space, tab, and all punctuation except the hyphen (-).
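
    The logic above can be sketched as follows. This is a minimal
    illustration, not the contents of ToTokenizeNoHyphens.java: the class
    name NoHyphenTokenizerSketch, the method tokenize, and the exact
    delimiter string (space, tab, and ASCII punctuation minus the hyphen)
    are assumptions made for the example. A LinkedHashSet is used here to
    keep the token list unique while preserving input order.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.StringTokenizer;

public class NoHyphenTokenizerSketch {
    // Assumed delimiter set: space, tab, and ASCII punctuation except '-'.
    // The actual set used by the lvg source is not listed in this document.
    private static final String DELIMITERS =
        " \t!\"#$%&'()*+,./:;<=>?@[\\]^_`{|}~";

    // Splits the term on the delimiters and returns the unique tokens
    // in the order they first appear. Hyphenated words stay whole.
    public static List<String> tokenize(String term) {
        Set<String> unique = new LinkedHashSet<>();
        StringTokenizer tokenizer = new StringTokenizer(term, DELIMITERS);
        while (tokenizer.hasMoreTokens()) {
            unique.add(tokenizer.nextToken());
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        // Prints: [the, club-foot]
        System.out.println(tokenize("the club-foot"));
    }
}
```

    For the input "the club-foot" the sketch yields the same two tokens as
    the lvg example above ("the" and "club-foot"); it does not reproduce
    the other fields of the lvg output record.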

  • Source Code: ToTokenizeNoHyphens.java

  • Hierarchy: Object -> Transformation -> ToTokenizeNoHyphens