Split ligatures
This flow splits ligatures in the input using the Unicode Normalization Form KC (NFKC) algorithm. Users may also define their own ligatures and the strings they split into in the file $LVG/data/Unicode/ligatureMap.data. This flow has been enhanced since 2008 and is used both to split ligatures and to normalize Unicode characters in the fullwidth block. Please refer to the design documents on splitting ligatures for details. Two typical uses of this split-ligatures flow component are to:
As mentioned above, users may define their own ligature-to-string mappings in "data/Unicode/ligatureMap.data". The mapping list is configured by editing this file: users may add to or modify the default set to suit their applications. Please refer to the design documents on splitting ligatures in Unicode for details.
When the -m flag is specified, the detailed mutate operations for each character of the input string are appended after the standard set of lvg output fields. There are three basic mutate operations for splitting ligatures, as shown in the following table:
Operations | Descriptions | Example |
---|---|---|
NO | No operation | A -> A |
MP | Table mapping | Æ -> AE |
NFKC | Normalization KC | ﬀ -> ff |
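
The three operations in the table can be illustrated with a minimal Java sketch. This is not the LVG implementation. The class name `SplitLigaturesSketch`, the method `split`, and the contents of `TABLE` are hypothetical, and the real format and default entries of ligatureMap.data are not shown here. The sketch also assumes that the mapping table is consulted before NFKC, and that normalization is applied one code point at a time. It relies only on the standard `java.text.Normalizer` class.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SplitLigaturesSketch {
    // Hypothetical stand-in for user-defined mappings (assumption:
    // not the actual contents of ligatureMap.data).
    static final Map<Integer, String> TABLE = new HashMap<>();
    static {
        TABLE.put(0x00C6, "AE"); // U+00C6 LATIN CAPITAL LETTER AE
        TABLE.put(0x00E6, "ae"); // U+00E6 LATIN SMALL LETTER AE
        TABLE.put(0x0152, "OE"); // U+0152 LATIN CAPITAL LIGATURE OE
        TABLE.put(0x0153, "oe"); // U+0153 LATIN SMALL LIGATURE OE
    }

    /**
     * Returns the input with ligatures split. One operation name
     * (MP, NFKC, or NO) per input code point is appended to ops.
     */
    static String split(String input, List<String> ops) {
        StringBuilder out = new StringBuilder();
        input.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            String mapped = TABLE.get(cp);
            if (mapped != null) {
                // MP: the character is found in the mapping table
                out.append(mapped);
                ops.add("MP");
            } else {
                // NFKC: compatibility normalization changes the character;
                // NO: the character is left as it is
                String nfkc = Normalizer.normalize(ch, Normalizer.Form.NFKC);
                out.append(nfkc);
                ops.add(nfkc.equals(ch) ? "NO" : "NFKC");
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Inputs: "spaelsau" written with U+00E6, U+0153 (oe ligature),
        // U+FB00 (ff ligature), U+FF21 (fullwidth A)
        String[] terms = {"sp\u00E6lsau", "\u0153", "\uFB00", "\uFF21"};
        for (String term : terms) {
            List<String> ops = new ArrayList<>();
            String result = split(term, ops);
            System.out.println(term + "|" + result + "|" + String.join(",", ops));
        }
    }
}
```

The characters æ and œ have no compatibility decomposition in Unicode, so NFKC alone leaves them unchanged. This is why a mapping table is needed alongside normalization.
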
Taking advantage of Unicode capabilities, the new Java version splits ligatures and normalizes fullwidth characters.
shell> lvg -f:q2
spælsau
spælsau|spaelsau|2047|16777215|q2|1|

shell> lvg -f:q2 -m
œ
œ|oe|2047|16777215|q2|1|MP|