Leading Digit Splitter
This splitter handles tokens that begin with digits. It splits such a token at the end of its leading digits by inserting a space between the digits and the rest of the token.
File Name | Input | Output |
---|---|---|
73.txt | 4miscarriages | 4 miscarriages |
10349.txt | 20years | 20 years |
11579.txt | 29yrs | 29 yrs |
10349.txt | 1.5years | 1.5 years |
13082.txt | 3weeks | 3 weeks |
13175.txt | 50mg | 50 mg |
Matchers

Matcher | Regular Expression | Examples |
---|---|---|
Leads with digit(s) | ^(\\d*\\.?\\d+)([a-zA-Z]{2,})(.*)$ | |
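
The matcher pattern above can be tried out with a minimal Python sketch. It assumes the doubled backslashes in the table are Java string-literal escapes, so the pattern is written here with single backslashes in a raw string. The function name `match_leading_digits` is hypothetical, and `re.ASCII` is used on the assumption that `\d` should match only the digits 0-9.

```python
import re

# Python translation of the "Leads with digit(s)" matcher.
# Assumption: the doubled backslashes in the table are Java string escapes.
LEADING_DIGITS = re.compile(r"^(\d*\.?\d+)([a-zA-Z]{2,})(.*)$", re.ASCII)


def match_leading_digits(token: str):
    """Return (digits, letters, rest) if the token leads with digits, else None."""
    m = LEADING_DIGITS.match(token)
    return m.groups() if m else None


print(match_leading_digits("20years"))   # ('20', 'years', '')
print(match_leading_digits("1.5years"))  # ('1.5', 'years', '')
print(match_leading_digits("5g"))        # None: fewer than two letters follow the digits
```

The first group captures the leading number (with an optional decimal point), the second group requires at least two letters, and the third group keeps whatever follows.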
Filters (Exceptions)

Filter (Exception) | Regular Expression | Examples |
---|---|---|
1. ordinal number | ^((\\d*)(1st|2nd|3rd))|((\\d+)(th))$ | |
2. [single chars] after the leading digit | ^(\\d+)([a-zA-Z])$ | |
3. [Upper], [Upper or digit]* after leading digit | ^(\\d+)([A-Z]+)([A-Z0-9]*)$ | |
4. [Upper, lower]+, [-], [word]* after leading digit | ^(\\d+)([a-zA-Z]+)-(\\w*)$ | |
5. [Upper, lower], [punc, digit]* after leading digit | ^(\\d+)([a-zA-Z])([\\p{Punct}\\d]*)$ | |
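
One way the matcher and the filters might fit together is sketched below in Python. This is an illustration under stated assumptions, not the splitter's actual implementation. It assumes that a token matching any filter is left unchanged, that each pattern must match the whole token, and that the output is the leading digits, a space, and the remainder. The name `split_leading_digits` is hypothetical. Python's `re` module has no `\p{Punct}`, so the ASCII punctuation set from `string.punctuation` stands in for it. Test tokens that do not appear in the example table (`21st`, `4th`, `5FU`, `3yr-old`) are made-up inputs.

```python
import re
import string

# ASCII punctuation stands in for Java's \p{Punct}, which Python's re lacks.
PUNCT = re.escape(string.punctuation)

# Matcher: leading digits followed by at least two letters.
MATCHER = re.compile(r"^(\d*\.?\d+)([a-zA-Z]{2,})(.*)$", re.ASCII)

# Filters (exceptions), in the order listed in the table.
FILTERS = [
    re.compile(r"^((\d*)(1st|2nd|3rd))|((\d+)(th))$", re.ASCII),       # 1. ordinal number
    re.compile(r"^(\d+)([a-zA-Z])$", re.ASCII),                        # 2. single letter
    re.compile(r"^(\d+)([A-Z]+)([A-Z0-9]*)$", re.ASCII),               # 3. uppercase/digits
    re.compile(r"^(\d+)([a-zA-Z]+)-(\w*)$", re.ASCII),                 # 4. hyphenated
    re.compile(r"^(\d+)([a-zA-Z])([" + PUNCT + r"\d]*)$", re.ASCII),   # 5. letter + punct/digits
]


def split_leading_digits(token: str) -> str:
    """Insert a space after leading digits unless a filter says otherwise."""
    # Assumption: any filter that matches the whole token blocks the split.
    if any(f.fullmatch(token) for f in FILTERS):
        return token
    m = MATCHER.fullmatch(token)
    if m is None:
        return token
    digits, letters, rest = m.groups()
    return f"{digits} {letters}{rest}"


for tok in ["4miscarriages", "1.5years", "50mg", "21st", "4th", "5FU", "3yr-old"]:
    print(tok, "->", split_leading_digits(tok))
```

Under these assumptions the first three tokens are split as in the example table, while the last four are caught by filters 1, 1, 3, and 4 respectively and come back unchanged.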