Leading Digit Splitter
This splitter handles tokens that begin with digits: it splits the token at the end of the leading digits (including decimals such as 1.5) by inserting a space there.
| File Name | Input | Output |
|---|---|---|
| 73.txt | 4miscarriages | 4 miscarriages |
| 10349.txt | 20years | 20 years |
| 11579.txt | 29yrs | 29 yrs |
| 10349.txt | 1.5years | 1.5 years |
| 13082.txt | 3weeks | 3 weeks |
| 13175.txt | 50mg | 50 mg |
Matchers
| Matcher | Regular Expression | Examples |
|---|---|---|
| Leads with digit(s) | ^(\\d*\\.?\\d+)([a-zA-Z]{2,})(.*)$ | |
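The matcher can be tried out with a minimal Python sketch. It assumes that the doubled backslashes in the table are Java string-literal escapes (so `\\d` stands for the regex `\d`), and that the output is formed by joining the first capture group, a space, and the remaining groups, which is consistent with the input/output examples. The function name `split_leading_digits` is hypothetical, and the exception filters are not checked here.

```python
import re

# Matcher from the table, written as a Python raw string.
# Assumption: "\\d" in the table is the Java-escaped form of "\d".
# re.ASCII keeps \d limited to 0-9, as in Java's default behavior.
LEADING_DIGITS = re.compile(r"^(\d*\.?\d+)([a-zA-Z]{2,})(.*)$", re.ASCII)


def split_leading_digits(token: str) -> str:
    """Insert a space after the leading number of a token.

    The token is returned unchanged when the matcher does not apply.
    """
    m = LEADING_DIGITS.fullmatch(token)
    if m is None:
        return token
    number, letters, rest = m.groups()
    return f"{number} {letters}{rest}"


for tok in ["4miscarriages", "1.5years", "50mg", "mg50"]:
    print(tok, "->", split_leading_digits(tok))
```

Because the second group requires at least two letters, a token such as `5x` is left as it is by the matcher alone.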
Filters (Exceptions)
| Filter (Exception) | Regular Expression | Examples |
|---|---|---|
| 1. Ordinal number | ^((\\d*)(1st\|2nd\|3rd))\|((\\d+)(th))$ | |
| 2. [single char] after the leading digits | ^(\\d+)([a-zA-Z])$ | |
| 3. [Upper]+, [Upper or digit]* after the leading digits | ^(\\d+)([A-Z]+)([A-Z0-9]*)$ | |
| 4. [Upper or lower]+, [-], [word]* after the leading digits | ^(\\d+)([a-zA-Z]+)-(\\w*)$ | |
| 5. [Upper or lower], [punc or digit]* after the leading digits | ^(\\d+)([a-zA-Z])([\\p{Punct}\\d]*)$ | |
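The five filters can be checked in the same way. The sketch below is illustrative only: `is_exception` is a hypothetical helper, whole-token matching via `fullmatch` is assumed, and because Python's `re` module does not support `\p{Punct}`, ASCII punctuation from `string.punctuation` is substituted for it.

```python
import re
import string

# Assumption: Python's re lacks \p{Punct}, so ASCII punctuation is
# substituted; re.escape makes it safe inside a character class.
PUNCT = re.escape(string.punctuation)

# Filter patterns from the table, as Python raw strings.
FILTER_PATTERNS = [
    r"^((\d*)(1st|2nd|3rd))|((\d+)(th))$",     # 1. ordinal number
    r"^(\d+)([a-zA-Z])$",                      # 2. single char
    r"^(\d+)([A-Z]+)([A-Z0-9]*)$",             # 3. upper, then upper/digit
    r"^(\d+)([a-zA-Z]+)-(\w*)$",               # 4. letters, hyphen, word
    r"^(\d+)([a-zA-Z])([" + PUNCT + r"\d]*)$",  # 5. letter, then punc/digit
]
FILTERS = [re.compile(p, re.ASCII) for p in FILTER_PATTERNS]


def is_exception(token: str) -> bool:
    """Return True if any filter matches the whole token."""
    return any(f.fullmatch(token) is not None for f in FILTERS)


for tok in ["21st", "4th", "5x", "5FU", "5alpha-reductase", "2a.", "20years"]:
    print(tok, "->", "keep" if is_exception(tok) else "candidate for split")
```

A splitter built this way would presumably call `is_exception` first and apply the matcher only to tokens that no filter catches, so that ordinals such as `21st` are not split into `21 st`.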