Leading Punctuation Splitter
This splitter inserts a space before leading punctuation when a token contains it, so that the punctuation is split off from the text that precedes it. Leading punctuation includes: & ( [ {
Examples of splitting a token in front of leading punctuation:
File Name | Input | Output |
---|---|---|
12134.txt | doppler( | doppler ( |
12271.txt | 1-plug& | 1-plug & |
12353.txt | epilepsy( | epilepsy ( |
12353.txt | volunteers( | volunteers ( |
12706.txt | dr.[ | dr. [ |
18186.txt | test( | test ( |
18341.txt | vain( | vain ( |
2.txt | one( | one ( |
30.txt | folitrax( | folitrax ( |
50.txt | ,[ | , [ |
78.txt | genes[ | genes [ |
Broader Generic Matchers (Qualifiers)
Matcher | Regular Expression | Examples |
---|---|---|
Contains Leading Punctuation | ^.*[&\\(\\[\\{].*$ | |
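The qualifier above can be tried out with a minimal Python sketch. It assumes the doubled backslashes in the table are string-literal escapes, so the raw-string pattern uses single backslashes. The name `contains_leading_punctuation` is hypothetical and is not taken from the tool itself.

```python
import re

# Qualifier pattern from the table, written as a Python raw string.
# Assumption: the doubled backslashes in the table are string-literal escapes.
LEADING_PUNCT_QUALIFIER = re.compile(r"^.*[&\(\[\{].*$")


def contains_leading_punctuation(token: str) -> bool:
    """Return True if the token contains any of & ( [ { (hypothetical helper)."""
    return LEADING_PUNCT_QUALIFIER.match(token) is not None


# Inputs taken from the examples table, plus one token without punctuation.
for token in ["doppler(", "1-plug&", "genes[", "test"]:
    print(token, contains_leading_punctuation(token))
```

A token that passes this qualifier is only a candidate for splitting; the filters listed next decide whether it is left unchanged.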
Filters (Specific Exceptions for Each Leading Punctuation)
Leading Punctuation | Filter (Exception) | Regular Expression | Examples |
---|---|---|---|
Ampersand [&] | 1. Abbreviations: [A-Z]+&[A-Z]+ | ^[A-Z]+&[A-Z]+$ | |
Left Parenthesis [(] | 1. Contains digits or plus sign: [non-space]*([digit]+\+?)[non-space]* | ((\\S)*\\([\\d]+(\\+)?\\)(\\S)*) | |
Left Parenthesis [(] | 2. Max or min: [non-space]*(max|min)[non-space]* | ((\\S)*\\((max|min)\\)) | |
Left Parenthesis [(] | 3. Contains a single char or plus: [non-space]*(+char)[non-space]* | ((\\S)*\\([+\\w]\\)(\\S)*) | |
Left Parenthesis [(] | 4. Parenthetic plural forms: [word]+(s(es)|y(ies)) | ([\\w]+((s\\(es\\))|(y\\(ies\\)))) | |
Left Parenthesis [(] | 5. After a hyphen: [non-space]*-([non-space]* | ((\\S)*-\\((\\S)*) | |
Left Square Bracket [[] | 1. Single lowercase letter in brackets: [non-space]*[[lower]][non-space]* | (\\S*\\[[a-z]\\]\\S*) | |
Left Square Bracket [[] | 2. Leads with tilde or hyphen: (tilde|hyphen)[ | ([~\\-]\\[\\S*) | |
Left Curly Brace [{] | 1. No exceptions found | $^ | None |
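
The way the filters interact with the split can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's actual implementation: the function name `split_leading_punctuation` is hypothetical, a filter is assumed to apply only when it matches the whole token, and the space is assumed to go before every occurrence of the punctuation that directly follows a non-space character. The patterns are copied from the table with the doubled backslashes reduced to single ones for Python raw strings.

```python
import re

# Exception patterns per leading punctuation, copied from the Filters table.
# Assumption: the doubled backslashes in the table are string-literal escapes.
FILTERS = {
    "&": [r"^[A-Z]+&[A-Z]+$"],
    "(": [
        r"((\S)*\([\d]+(\+)?\)(\S)*)",    # 1. digits or plus sign
        r"((\S)*\((max|min)\))",          # 2. max or min
        r"((\S)*\([+\w]\)(\S)*)",         # 3. single char or plus
        r"([\w]+((s\(es\))|(y\(ies\))))", # 4. parenthetic plural forms
        r"((\S)*-\((\S)*)",               # 5. after a hyphen
    ],
    "[": [
        r"(\S*\[[a-z]\]\S*)",             # 1. single lowercase letter in brackets
        r"([~\-]\[\S*)",                  # 2. leads with tilde or hyphen
    ],
    "{": [],                              # no exceptions found
}


def split_leading_punctuation(token: str) -> str:
    """Insert a space before leading punctuation unless a filter applies."""
    result = token
    for punct, patterns in FILTERS.items():
        if punct not in token:
            continue
        # Assumption: an exception applies only if it matches the whole token.
        if any(re.fullmatch(pattern, token) for pattern in patterns):
            continue
        # Add a space before each occurrence that follows a non-space character.
        result = re.sub(r"(?<=\S)" + re.escape(punct), " " + punct, result)
    return result


# The first four inputs come from the examples table; the last two are
# illustrative tokens (not from the source) that should hit a filter.
for token in ["doppler(", "1-plug&", "dr.[", ",[", "AT&T", "class(es)"]:
    print(repr(token), "->", repr(split_leading_punctuation(token)))
```

Checking the filters against the original token rather than the partially split result keeps the exceptions independent of the order in which the punctuation characters are processed.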