Ending Punctuation Splitter
This splitter splits a token that contains ending punctuation by inserting a space after the punctuation. Ending punctuation includes: . ? ! , : ; & ) ] }
Split a token immediately after its ending punctuation, as in the examples below.
File Name | Input | Output |
---|---|---|
10023.txt | down.please | down. please |
10286.txt | ...my | ... my |
10004.txt | cancer?if | cancer? if |
11186.txt | ?pls | ? please |
97.txt | suggestions?thanks | suggestions? thanks |
53.txt | hello!can | hello! can |
11186.txt | ,she | , she |
16823.txt | :by | : by |
22.txt | ;syrinx | ; syrinx |
2.txt | )why | ) why |
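Broader Generic Matchers

A token is considered for splitting only if it matches Contains Ending Punctuation. Matchers marked (false) describe tokens that are not split even though they contain ending punctuation: email addresses, URLs, and tokens made up only of digits and punctuation. The regular expressions use Java syntax (for example, `&&` for character-class intersection) and are written as Java string literals, so `\\` stands for a single backslash.

Matcher | Regular Expression | Examples |
---|---|---|
Contains Ending Punctuation | `^.*[\\.\\?!,;:&\\)\\]\\}].*$` | |
Email (false) | ``^[\\w!#$%&'*+-/=?^_`{\|}~]+@(\\w+(\\.\\w+)*(\\.(gov\|com\|org\|edu\|mil\|net)))$`` | |
Url (false) | `^((ftp\|http\|https\|file)://)?(\\w+(\\.\\w+)*(\\.(gov\|com\|org\|edu\|mil\|net\|uk)).*)$` | |
Pure digit or punctuation (false) | `^([\\W_\\d&&\\S]+)$` | |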
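The generic matchers act as a gate before any splitting. The Java sketch below shows one way to apply them. It assumes each regular expression is tested against the whole token with `Matcher.matches()` and that a token is a split candidate when it contains ending punctuation and matches none of the (false) matchers. The class and method names are hypothetical; the regular expressions are copied unchanged from the table, since they are already valid Java string literals.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: applies the generic matchers from the table.
class GenericMatcherSketch {
    // Regexes copied verbatim from the table (they are Java string literals).
    static final Pattern CONTAINS_ENDING_PUNC =
        Pattern.compile("^.*[\\.\\?!,;:&\\)\\]\\}].*$");
    static final Pattern EMAIL = Pattern.compile(
        "^[\\w!#$%&'*+-/=?^_`{|}~]+@(\\w+(\\.\\w+)*(\\.(gov|com|org|edu|mil|net)))$");
    static final Pattern URL = Pattern.compile(
        "^((ftp|http|https|file)://)?(\\w+(\\.\\w+)*(\\.(gov|com|org|edu|mil|net|uk)).*)$");
    static final Pattern PURE_DIGIT_OR_PUNC =
        Pattern.compile("^([\\W_\\d&&\\S]+)$");

    // Matchers marked (false): a match excludes the token from splitting.
    static final List<Pattern> EXCLUSIONS = List.of(EMAIL, URL, PURE_DIGIT_OR_PUNC);

    // Assumption: whole-token matching via Matcher.matches().
    static boolean isSplitCandidate(String token) {
        if (!CONTAINS_ENDING_PUNC.matcher(token).matches()) {
            return false;
        }
        for (Pattern p : EXCLUSIONS) {
            if (p.matcher(token).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Prints true for down.please and false for the other four tokens.
        for (String t : List.of("down.please", "john@nih.gov", "www.nih.gov/about", "3.5", "hello")) {
            System.out.println(t + " -> " + isSplitCandidate(t));
        }
    }
}
```

Here `john@nih.gov` is excluded by the Email matcher, `www.nih.gov/about` by the Url matcher, `3.5` by the pure digit or punctuation matcher, and `hello` fails the first check because it has no ending punctuation.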
Filters (Specific Exceptions for Each Ending Punctuation)

Each filter is an exception: a token that matches a filter is not split at the corresponding ending punctuation.

Ending Punctuation | Filter (Exception) | Regular Expression | Examples |
---|---|---|---|
Period [.] | 1. Plural form | `(.*\\.s)` | |
Period [.] | 2. Surrounded by digits: `[char]*[digit].[digit][char]*` | `((\\w*\\d\\.\\d\\w*)+)` | |
Period [.] | 3. Surrounded by single characters: `[single non-digit].[single non-digit]?` | `((\\D\\.)+\\D?)` | |
Period [.] | 4. Followed by a hyphen: `[word]*.-[word]*` | `(\\w*\\.-\\w*)` | |
Period [.] | 5. Followed by a quote: `[char]*.['"]` | `(.*\\.['\"])` | |
Question Mark [?] | 1. Followed by a quote: `[char]*?['"]` | `(.*\\?['\"])` | |
Exclamation Mark [!] | 1. Followed by a quote: `[char]*!['"]` | `(.*!['\"])` | |
Comma [,] | 1. Digit group separator: `[digit]+,[digit]{3}` | `(\\d+(,[\\d]{3})+)` | |
Colon [:] | 1. Ratio: `[digit]+:[digit]+` | `(\\d+:\\d+)` | |
Semicolon [;] | 1. No exceptions found | `$^` | None |
Ampersand [&] | 1. Abbreviations: `[A-Z]+&[A-Z]+` | `[A-Z]+&[A-Z]+` | |
Right Parenthesis [)] | 1. Single character in parentheses: `[non-space]*([+char])[non-space]*` | `((\\S)*\\([+\\w]\\)(\\S)*)` | |
Right Parenthesis [)] | 2. Characters in parentheses followed by a hyphen: `[non-space]*(char+)-[non-space]*` | `((\\S)*\\([+\\w]+\\)-(\\S)*)` | |
Right Parenthesis [)] | 3. Digits in parentheses: `[non-space]*(digit+)[non-space]*` | `((\\S)*\\(\\d+\\)(\\S)*)` | |
Right Square Bracket []] | 1. `[digit]+[Upper]` in square brackets: `[non-space]*[[digit]+[Upper]][non-space]*` | `(\\S*\\[\\d+[A-Z]\\]\\S*)` | |
Right Square Bracket []] | 2. Single `[lower]` in square brackets: `[non-space]*[[lower]][non-space]*` | `(\\S*\\[[a-z]\\]\\S*)` | |
Right Curly Brace [}] | 1. No exceptions found | `$^` | None |
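The filters act as exceptions to the basic split. The Java sketch below combines the basic split rule with the filter table under two assumptions of ours: a filter applies when its regular expression matches the whole token, and a matching filter suppresses every split at that punctuation mark in the token. It also assumes the token has already passed the generic matchers. The class and method names are hypothetical; the regular expressions are copied unchanged from the table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch: basic split rule plus the per-punctuation filters.
class EndingPuncFilterSketch {
    static final String ENDING_PUNC = ".?!,:;&)]}";

    // Filter regexes copied from the table (Java string literals).
    // Assumption: a filter applies when it matches the whole token.
    static final Map<Character, List<Pattern>> FILTERS = Map.of(
        '.', compile("(.*\\.s)", "((\\w*\\d\\.\\d\\w*)+)", "((\\D\\.)+\\D?)",
                     "(\\w*\\.-\\w*)", "(.*\\.['\"])"),
        '?', compile("(.*\\?['\"])"),
        '!', compile("(.*!['\"])"),
        ',', compile("(\\d+(,[\\d]{3})+)"),
        ':', compile("(\\d+:\\d+)"),
        ';', compile("$^"),
        '&', compile("[A-Z]+&[A-Z]+"),
        ')', compile("((\\S)*\\([+\\w]\\)(\\S)*)", "((\\S)*\\([+\\w]+\\)-(\\S)*)",
                     "((\\S)*\\(\\d+\\)(\\S)*)"),
        ']', compile("(\\S*\\[\\d+[A-Z]\\]\\S*)", "(\\S*\\[[a-z]\\]\\S*)"),
        '}', compile("$^"));

    static List<Pattern> compile(String... regexes) {
        List<Pattern> patterns = new ArrayList<>();
        for (String r : regexes) {
            patterns.add(Pattern.compile(r));
        }
        return patterns;
    }

    static boolean isEndingPunc(char c) {
        return ENDING_PUNC.indexOf(c) >= 0;
    }

    // True when any filter for this punctuation mark matches the whole token.
    static boolean isException(String token, char punc) {
        for (Pattern p : FILTERS.getOrDefault(punc, List.of())) {
            if (p.matcher(token).matches()) {
                return true;
            }
        }
        return false;
    }

    // Basic rule (space after ending punctuation that is followed by a
    // non-space, non-punctuation character), skipped when a filter matches.
    static String split(String token) {
        StringBuilder out = new StringBuilder(token.length() + 4);
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            out.append(c);
            if (!isEndingPunc(c) || i + 1 >= token.length()) {
                continue;
            }
            char next = token.charAt(i + 1);
            if (!Character.isWhitespace(next) && !isEndingPunc(next) && !isException(token, c)) {
                out.append(' ');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        for (String t : List.of("down.please", "3.5mg", "e.g.", "AT&T", "IL(2)receptor", ")why")) {
            System.out.println(t + " -> " + split(t));
        }
    }
}
```

With these assumptions, `3.5mg`, `e.g.`, `AT&T`, and `IL(2)receptor` come back unchanged because a filter for `.`, `&`, or `)` matches, while `down.please` and `)why` are still split into `down. please` and `) why`.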