Strip Punctuation
- Short Description:
Strip punctuation.
- Full Description:
This flow strips punctuation from the input term. The stripped characters are not replaced by spaces. Punctuation characters are defined by the general category types of the Java Character class and include:
- DASH_PUNCTUATION (20): -
- START_PUNCTUATION (21): ( { [
- END_PUNCTUATION (22): ) } ]
- CONNECTOR_PUNCTUATION (23): _
- OTHER_PUNCTUATION (24): ! @ # % & * \ : ; " ' , . ? /
- MATH_SYMBOL (25): ~ + = | < >
- CURRENCY_SYMBOL (26): $
- MODIFIER_SYMBOL (27): ` ^
The -m option has no effect on this flow: "none" is added at the end of the output.
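The category list above can be checked directly against java.lang.Character. The following is a minimal sketch, not the LVG source: the class name PunctuationCategorySketch and the method isPunctuation are hypothetical, and only Character.getType and the eight Character constants come from the JDK.

```java
// Hypothetical helper for illustration; not part of the LVG distribution.
final class PunctuationCategorySketch {

    // Returns true if the character's general category is one of the
    // eight types listed in the description (numeric values 20 to 27).
    static boolean isPunctuation(char c) {
        switch (Character.getType(c)) {
            case Character.DASH_PUNCTUATION:      // 20
            case Character.START_PUNCTUATION:     // 21
            case Character.END_PUNCTUATION:       // 22
            case Character.CONNECTOR_PUNCTUATION: // 23
            case Character.OTHER_PUNCTUATION:     // 24
            case Character.MATH_SYMBOL:           // 25
            case Character.CURRENCY_SYMBOL:       // 26
            case Character.MODIFIER_SYMBOL:       // 27
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        // Print the category number and the decision for a few sample characters.
        for (char c : "-(_!+$^a".toCharArray()) {
            System.out.println("'" + c + "' type=" + Character.getType(c)
                + " punctuation=" + isPunctuation(c));
        }
    }
}
```

Letters, digits, and spaces fall outside these categories, so they are left untouched.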
- Difference:
- The Java version trims output terms (removes spaces at the beginning and end of the term).
- Results differ when testing diacritics, such as \345\346... in the unit test.
- Features:
- Strip a character from the input term if the character belongs to the list above.
- Symbol:
p
- Examples:
shell> lvg -f:p
St. John's
St. John's|St Johns|2047|16777215|p|1|
More examples
Implementation Logic:
- Go through every character in the input term; strip it if it is a punctuation character.
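The logic above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the contents of ToStripPunctuation.java: the class name StripPunctuationSketch and the method strip are hypothetical. The range test relies on the eight category values in the description being contiguous (20 through 27), and the final trim() reflects the trimming noted for the Java version.

```java
// Hypothetical stand-alone sketch; the real flow lives in ToStripPunctuation.java.
final class StripPunctuationSketch {

    // Removes every character whose general category is between
    // DASH_PUNCTUATION (20) and MODIFIER_SYMBOL (27), inclusive.
    // Stripped characters are dropped, not replaced by spaces.
    static String strip(String term) {
        StringBuilder out = new StringBuilder(term.length());
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            int type = Character.getType(c);
            boolean isPunctuation = type >= Character.DASH_PUNCTUATION
                && type <= Character.MODIFIER_SYMBOL;
            if (!isPunctuation) {
                out.append(c);
            }
        }
        // Assumption: trim leading and trailing spaces, as the Java version does.
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(strip("St. John's")); // prints "St Johns"
    }
}
```

The sketch returns only the stripped term; it does not produce the other fields of the lvg output line.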
Source Code: ToStripPunctuation.java
Hierarchy: Object -> Transformation -> ToStripPunctuation