Split Ligatures in Unicode
Unicode | Mapped String | Char | Unicode Name |
---|---|---|---|
U+00C6 | AE | Æ | LATIN CAPITAL LETTER AE |
As discussed in Unicode Normalization, Unicode normalization form KC (NFKC) can be used to split ligatures: it decomposes a ligature into several Unicode characters. This step is applied after the table mapping. Please note:
Unicode | Mapped String | Char | Unicode Name |
---|---|---|---|
U+00B5 | µ | µ | MICRO SIGN |
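The behavior of NFKC on a few of these characters can be checked directly. The following is a minimal sketch, assuming the JDK's `java.text.Normalizer` in place of the ICU4J normalizer; the class name `NfkcCheck` and the method `nfkc` are hypothetical and used only for illustration.

```java
import java.text.Normalizer;

// Hypothetical demo class; not part of the original tool.
class NfkcCheck {
    // Applies NFKC with the JDK normalizer (an assumption; ICU4J could be used instead).
    static String nfkc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // U+00C6 has no compatibility decomposition, so NFKC leaves it unchanged.
        System.out.println(nfkc("\u00C6").equals("\u00C6"));
        // U+00B5 MICRO SIGN is replaced by U+03BC GREEK SMALL LETTER MU.
        System.out.println(nfkc("\u00B5").equals("\u03BC"));
        // U+00BC becomes "1", U+2044 FRACTION SLASH, "4" (not an ASCII slash).
        System.out.println(nfkc("\u00BC").equals("1\u20444"));
        // U+FB01 is split into the two letters "fi".
        System.out.println(nfkc("\uFB01").equals("fi"));
    }
}
```

Each comparison is written with Unicode escapes so the result does not depend on the source file encoding.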
Unicode | Mapped ASCII | Char | Unicode Name |
---|---|---|---|
U+00BC | 1/4 | ¼ | VULGAR FRACTION ONE QUARTER |
U+00BD | 1/2 | ½ | VULGAR FRACTION ONE HALF |
U+00BE | 3/4 | ¾ | VULGAR FRACTION THREE QUARTERS |
U+00C6 | AE | Æ | LATIN CAPITAL LETTER AE |
U+00E6 | ae | æ | LATIN SMALL LETTER AE |
U+0132 | IJ | Ĳ | LATIN CAPITAL LIGATURE IJ |
U+0133 | ij | ĳ | LATIN SMALL LIGATURE IJ |
U+0152 | OE | Œ | LATIN CAPITAL LIGATURE OE |
U+0153 | oe | œ | LATIN SMALL LIGATURE OE |
U+FB00 | ff | ﬀ | LATIN SMALL LIGATURE FF |
U+FB01 | fi | ﬁ | LATIN SMALL LIGATURE FI |
U+FB02 | fl | ﬂ | LATIN SMALL LIGATURE FL |
U+FB03 | ffi | ﬃ | LATIN SMALL LIGATURE FFI |
U+FB04 | ffl | ﬄ | LATIN SMALL LIGATURE FFL |
U+FB05 | st | ﬅ | LATIN SMALL LIGATURE LONG S T |
U+FB06 | st | ﬆ | LATIN SMALL LIGATURE ST |
- import com.ibm.icu.text.*;
- If the character is in the ligature mapping table:
  - Perform the mapping.
- Else:
  - String normStr = Normalizer.normalize(inChar, Normalizer.NFKC);
  - Set the split string to normStr.trim().
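The steps above can be sketched in Java as follows. This is an illustration under stated assumptions, not the tool's actual implementation: it uses the JDK's `java.text.Normalizer` rather than ICU4J, the class name `LigatureSplitter` and method `splitChar` are hypothetical, and the mapping table holds only the entries listed in the table above.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; names and structure are illustrative only.
class LigatureSplitter {
    // Ligature mapping table, copied from the table in this section.
    private static final Map<Integer, String> TABLE = new HashMap<>();
    static {
        TABLE.put(0x00BC, "1/4");
        TABLE.put(0x00BD, "1/2");
        TABLE.put(0x00BE, "3/4");
        TABLE.put(0x00C6, "AE");
        TABLE.put(0x00E6, "ae");
        TABLE.put(0x0132, "IJ");
        TABLE.put(0x0133, "ij");
        TABLE.put(0x0152, "OE");
        TABLE.put(0x0153, "oe");
        TABLE.put(0xFB00, "ff");
        TABLE.put(0xFB01, "fi");
        TABLE.put(0xFB02, "fl");
        TABLE.put(0xFB03, "ffi");
        TABLE.put(0xFB04, "ffl");
        TABLE.put(0xFB05, "st");
        TABLE.put(0xFB06, "st");
    }

    // Splits a single code point: table mapping first, NFKC as the fallback.
    static String splitChar(int codePoint) {
        String mapped = TABLE.get(codePoint);
        if (mapped != null) {
            return mapped;
        }
        String in = new String(Character.toChars(codePoint));
        // JDK normalizer used here as an assumption in place of ICU4J.
        return Normalizer.normalize(in, Normalizer.Form.NFKC).trim();
    }

    public static void main(String[] args) {
        System.out.println(splitChar(0xFB01)); // table hit
        System.out.println(splitChar(0x2168)); // not in the table, falls back to NFKC
    }
}
```

Checking the table first matters for characters such as U+00C6, which NFKC would leave unchanged. Because the result is trimmed, whitespace characters come back as an empty string, so a caller that processes whole strings would need to handle whitespace separately.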