Ending Punctuation Splitter
This splitter splits a token by inserting a space after any ending punctuation the token contains. Ending punctuation includes: .?!,:;&)]}
Examples of tokens split after ending punctuation:
File Name | Input | Output |
---|---|---|
10023.txt | down.please | down. please |
10286.txt | ...my | ... my |
10004.txt | cancer?if | cancer? if |
11186.txt | ?pls | ? please |
97.txt | suggestions?thanks | suggestions? thanks |
53.txt | hello!can | hello! can |
11186.txt | ,she | , she |
16823.txt | :by | : by |
22.txt | ;syrinx | ; syrinx |
2.txt | )why | ) why |
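The basic behavior in the examples above can be sketched in Java as follows. This is a minimal illustration, not the tool's actual implementation: the class name `EndingPunctSplitSketch`, the method `split`, and the regular expression are all assumptions made for this sketch. It only inserts a space after a run of ending punctuation that is directly followed by other text, and it ignores the matchers and filters described in the rest of this page.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: names and regex are illustrative, not taken from the tool.
final class EndingPunctSplitSketch {
    // Group 1: a run of ending punctuation (.?!,:;&)]}).
    // Lookahead: the next character is neither whitespace nor ending punctuation,
    // so trailing punctuation at the end of a token is left alone.
    private static final Pattern GLUED = Pattern.compile(
        "([.?!,:;&)\\]}]+)(?=[^\\s.?!,:;&)\\]}])");

    // Insert one space after each such run of ending punctuation.
    static String split(String token) {
        return GLUED.matcher(token).replaceAll("$1 ");
    }

    public static void main(String[] args) {
        System.out.println(split("down.please")); // down. please
        System.out.println(split("...my"));       // ... my
        System.out.println(split(")why"));        // ) why
    }
}
```

Treating a run such as `...` as one unit is a design choice of this sketch so that `...my` becomes `... my` rather than `. . . my`.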
Broader Generic Matchers
Matcher | Regular Expression | Examples |
---|---|---|
Contains Ending Punctuation | ^.*[\\.\\?!,;:&\\)\\]\\}].*$ | |
Email (false) | ^[\\w!#$%&'*+-/=?^_`{|}~]+@(\\w+(\\.\\w+)*(\\.(gov|com|org|edu|mil|net)))$ | |
Url (false) | ^((ftp|http|https|file)://)?(\\w+(\\.\\w+)*(\\.(gov|com|org|edu|mil|net|uk)).*)$ | |
Pure digit or punctuation (false) | ^([\\W_\\d&&\\S]+)$ | |
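One way to read the generic matchers is as a gate that decides whether a token is considered for splitting at all. The Java sketch below copies the four regular expressions from the table verbatim. It assumes that "(false)" means the token must not match that matcher and that each pattern is applied to the whole token. The class name `GenericMatcherSketch`, the method `isSplitCandidate`, and the sample tokens are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: regexes are copied from the table, everything else is assumed.
final class GenericMatcherSketch {
    private static final Pattern HAS_ENDING_PUNCT =
        Pattern.compile("^.*[\\.\\?!,;:&\\)\\]\\}].*$");
    private static final Pattern EMAIL = Pattern.compile(
        "^[\\w!#$%&'*+-/=?^_`{|}~]+@(\\w+(\\.\\w+)*(\\.(gov|com|org|edu|mil|net)))$");
    private static final Pattern URL = Pattern.compile(
        "^((ftp|http|https|file)://)?(\\w+(\\.\\w+)*(\\.(gov|com|org|edu|mil|net|uk)).*)$");
    // [..&&\\S] is Java's character class intersection: digits, underscore,
    // and non-word characters, excluding whitespace.
    private static final Pattern PURE_DIGIT_OR_PUNCT =
        Pattern.compile("^([\\W_\\d&&\\S]+)$");

    // Assumed reading: split only if the token contains ending punctuation
    // and is not an email, a URL, or made purely of digits and punctuation.
    static boolean isSplitCandidate(String token) {
        return HAS_ENDING_PUNCT.matcher(token).matches()
            && !EMAIL.matcher(token).matches()
            && !URL.matcher(token).matches()
            && !PURE_DIGIT_OR_PUNCT.matcher(token).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSplitCandidate("down.please"));      // true
        System.out.println(isSplitCandidate("user@example.com")); // false
        System.out.println(isSplitCandidate("www.nih.gov"));      // false
    }
}
```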
Filters (Specific Exceptions for Each Ending Punctuation)
Ending Punctuation | Filter (Exception) | Regular Expression | Examples |
---|---|---|---|
Period [.] | 1. Plural form | (.*\\.s) | |
Period [.] | 2. surrounded by digit [char]*[digit].[digit][char]* | ((\\w*\\d\\.\\d\\w*)+) | |
Period [.] | 3. surrounded by single characters [single non-digit].[single non-digit]? | ((\\D\\.)+\\D?) | |
Period [.] | 4. followed by a hyphen [word]*.-[word]* | (\\w*\\.-\\w*) | |
Period [.] | 5. followed by a quote [char]*.['"] | (.*\\.['\"]) | |
Question Mark [?] | 1. followed by a quote [char]*?['"] | (.*\\?['\"]) | |
Exclamation Mark [!] | 1. followed by a quote [char]*!['"] | (.*!['\"]) | |
Comma [,] | 1. digit group separator [digit]+,[digit]{3} | (\\d+(,[\\d]{3})+) | |
Colon [:] | 1. ratio [digit]+:[digit]+ | (\\d+:\\d+) | |
Semicolon [;] | 1. No exceptions found | $^ | None |
Ampersand [&] | 1. Abbreviations [A-Z]+&[A-Z]+ | [A-Z]+&[A-Z]+ | |
Right Parenthesis [)] | 1. single char surrounded by parenthesis [non-space]*([+char])[non-space]* | ((\\S)*\\([+\\w]\\)(\\S)*) | |
Right Parenthesis [)] | 2. chars surrounded by parenthesis and followed by a hyphen [non-space]*(char+)-[non-space]* | ((\\S)*\\([+\\w]+\\)-(\\S)*) | |
Right Parenthesis [)] | 3. digit surrounded by parenthesis [non-space]*(digit+)[non-space]* | ((\\S)*\\(\\d+\\)(\\S)*) | |
Right Square Bracket []] | 1. [digit]+[Upper] surrounded by [] [non-space]*[[digit]+[Upper]][non-space]* | (\\S*\\[\\d+[A-Z]\\]\\S*) | |
Right Square Bracket []] | 2. [lower] surrounded by [] [Upper]+ | (\\S*\\[[a-z]\\]\\S*) | |
Right Curly Brace [}] | 1. No exceptions found | $^ | None |
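The filter table can be sketched in Java as a lookup from each ending punctuation character to its exception patterns. The regular expressions are copied from the table. Everything else is an assumption for illustration: the class name `EndingPunctFilterSketch`, the helper `compile`, the method `isException`, whole-token matching, and the rule that a token is exempt from splitting when it matches any filter of a punctuation character it contains. The sample tokens are invented, since the Examples column is empty in the source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch: regexes are copied from the table, everything else is assumed.
final class EndingPunctFilterSketch {
    // Compile the regex strings listed for one punctuation character.
    private static List<Pattern> compile(String... regexes) {
        List<Pattern> out = new ArrayList<>();
        for (String r : regexes) out.add(Pattern.compile(r));
        return out;
    }

    private static final Map<Character, List<Pattern>> FILTERS = Map.of(
        '.', compile("(.*\\.s)", "((\\w*\\d\\.\\d\\w*)+)", "((\\D\\.)+\\D?)",
                     "(\\w*\\.-\\w*)", "(.*\\.['\"])"),
        '?', compile("(.*\\?['\"])"),
        '!', compile("(.*!['\"])"),
        ',', compile("(\\d+(,[\\d]{3})+)"),
        ':', compile("(\\d+:\\d+)"),
        ';', compile("$^"),   // no exceptions: never matches a non-empty token
        '&', compile("[A-Z]+&[A-Z]+"),
        ')', compile("((\\S)*\\([+\\w]\\)(\\S)*)", "((\\S)*\\([+\\w]+\\)-(\\S)*)",
                     "((\\S)*\\(\\d+\\)(\\S)*)"),
        ']', compile("(\\S*\\[\\d+[A-Z]\\]\\S*)", "(\\S*\\[[a-z]\\]\\S*)"),
        '}', compile("$^"));  // no exceptions

    // True if the whole token matches a filter of any punctuation it contains.
    static boolean isException(String token) {
        for (Map.Entry<Character, List<Pattern>> e : FILTERS.entrySet()) {
            if (token.indexOf(e.getKey().charValue()) < 0) continue;
            for (Pattern p : e.getValue()) {
                if (p.matcher(token).matches()) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isException("1,000"));       // true
        System.out.println(isException("AT&T"));        // true
        System.out.println(isException("down.please")); // false
    }
}
```

Under these assumptions, a token would be split only when it passes the generic matchers and `isException` returns false.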