Get Unicode Names
This flow returns:
The format of a Unicode name starts with a starting tag ![ , followed by the Unicode name, and ends with an ending tag ]! . Please refer to the design documents of Get Unicode Name for details. This flow component is used to convert UTF-8 text to pure ASCII in NLP, since all Unicode names are ASCII. In addition, this flow preserves the information of the original Unicode character.
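The tagging format described above can be illustrated with a minimal Python sketch. This is not the lvg implementation: the function name to_unicode_names is hypothetical, the names come from Python's standard unicodedata module rather than from lvg's own tables, and the fallback for code points that have no name is an assumption made only for this example.

```python
import unicodedata


def to_unicode_names(text: str) -> str:
    """Replace each non-ASCII character with ![UNICODE NAME]!.

    Hypothetical helper for illustration; not part of lvg.
    """
    parts = []
    for ch in text:
        if ord(ch) < 128:
            # ASCII characters pass through unchanged.
            parts.append(ch)
            continue
        # unicodedata.name returns the default when no name is defined.
        name = unicodedata.name(ch, None)
        if name is None:
            # Assumption: fall back to the hex code point for unnamed
            # characters. lvg's actual behavior here is not documented
            # in this section.
            name = f"U+{ord(ch):04X}"
        parts.append(f"![{name}]!")
    return "".join(parts)


if __name__ == "__main__":
    print(to_unicode_names("lvg \u00a92008"))
```

Because every Unicode name is itself ASCII, the result contains only ASCII characters, while the tag still records which character was originally present.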
When the -m flag is specified, detailed Unicode information for each character of the input string is added after the standard set of lvg output fields. Three pieces of information are included:
Unicode Hex Value, Unicode Name, Unicode Block
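As a rough illustration of these three fields, the sketch below builds a "hex value, name, block" string for a single character. It is an assumption-laden example, not lvg code: unicode_details and _BLOCKS are hypothetical names, Python's standard library has no block lookup, so the block table here is a tiny hand-written subset covering only two ranges, and the block labels are chosen to resemble the lvg output.

```python
import unicodedata

# Illustrative subset only: (first code point, last code point, label).
# A real lookup would need the full Unicode block table.
_BLOCKS = [
    (0x0080, 0x00FF, "LATIN_1_SUPPLEMENT"),
    (0x0370, 0x03FF, "GREEK"),
]


def unicode_details(ch: str) -> str:
    """Return 'U+XXXX, NAME, BLOCK' for one character (hypothetical helper)."""
    cp = ord(ch)
    # "UNKNOWN" is a placeholder chosen for this sketch, not an lvg value.
    name = unicodedata.name(ch, "UNKNOWN")
    block = next(
        (label for first, last, label in _BLOCKS if first <= cp <= last),
        "UNKNOWN",
    )
    return f"U+{cp:04X}, {name}, {block}"


if __name__ == "__main__":
    for ch in ("\u00b5", "\u03bc"):
        print(unicode_details(ch))
```

The micro sign (U+00B5) and the Greek small letter mu (U+03BC) look nearly identical but fall in different blocks, which is the kind of distinction the detail fields make visible.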
The lvg-defined symbols have been removed since 2008. This flow is simplified to return the Unicode name whenever the input character is not ASCII (regardless of whether it is a diacritic, ligature, or symbol).
shell> lvg -f:q3
lvg ©2008
lvg ©2008|lvg ![COPYRIGHT SIGN]!2008|2047|16777215|q3|1|

shell> lvg -f:q3 -m
µ
µ|![MICRO SIGN]!|2047|16777215|q3|1|U+00B5, MICRO SIGN, LATIN_1_SUPPLEMENT|
μ
μ|![GREEK SMALL LETTER MU]!|2047|16777215|q3|1|U+03BC, GREEK SMALL LETTER MU, GREEK|

More examples