Lexical Tools

Map Unicode to ASCII

Short Description: Converts input Unicode characters to ASCII characters.
Full Description:
This flow converts Unicode characters to ASCII characters. Some Unicode characters cannot be converted to ASCII by Unicode normalization operations such as stripping diacritics or splitting ligatures. These characters are normalized to ASCII by table lookup mapping. The mapping table is defined in the file $LVG/data/Unicode/unicodeMap.data. Users may add to or modify the default entries in this file to suit their applications. Please refer to the design documents of Map Unicode to ASCII for details.
When the -m flag is specified, the detailed mutate operation for each character of the input string is appended after the standard set of lvg output fields. This flow uses two basic mutate operations to normalize Unicode to ASCII, as shown in the following table:

Operations	Descriptions	Example
NO	No operation	ø -> ø
MP	Table lookup mapping	Ƽ -> 5
Difference:
None.
Features:
1. Get the ASCII representation of the Unicode characters from the input term.
Symbol: q1

Examples:


shell> lvg -f:q1 -m
⅝
⅝|5/8|2047|16777215|q1|1|MP|
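The output line above is a set of fields separated by "|". The following is a minimal sketch of splitting such a line in Java. The class name LvgOutputSketch and its methods are hypothetical and not part of lvg, and the field positions (input term, output term, ..., flow symbol, flow number, mutate operation) are an assumption read off the example line rather than taken from the lvg specification.

```java
public class LvgOutputSketch {
    // Split one lvg output line on the "|" separator.
    // String.split drops the trailing empty field left by the final "|".
    static String[] splitFields(String line) {
        return line.split("\\|");
    }

    // Assumption: with -m, the mutate operation is the 7th field (index 6),
    // as in the example line for this flow.
    static String mutateInfo(String line) {
        String[] fields = splitFields(line);
        return fields.length > 6 ? fields[6] : "";
    }

    public static void main(String[] args) {
        // The example output line; \u215D is U+215D VULGAR FRACTION FIVE EIGHTHS.
        String line = "\u215D|5/8|2047|16777215|q1|1|MP|";
        String[] fields = splitFields(line);
        // Index 1 holds the mapped ASCII term in the example line.
        System.out.println("output=" + fields[1] + " ops=" + mutateInfo(line));
    }
}
```

Run against the example line, the sketch prints output=5/8 ops=MP.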

Implementation Logic:
1. For each character, check whether it is in the Unicode mapping table:
  - if yes, return the mapped ASCII string (operation MP)
  - if no, return the original Unicode character unchanged (operation NO)
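The lookup logic above can be sketched in Java as follows. This is an illustration only, not the code in ToMapUnicodeToAscii.java: the class MapUnicodeSketch, its map method, and the in-memory TABLE are hypothetical, and the two table entries are taken from the examples in this document rather than loaded from unicodeMap.data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MapUnicodeSketch {
    // Hypothetical stand-in for the mapping table (code point -> ASCII string).
    // The real entries are defined in $LVG/data/Unicode/unicodeMap.data.
    static final Map<Integer, String> TABLE = new HashMap<>();
    static {
        TABLE.put(0x215D, "5/8"); // U+215D VULGAR FRACTION FIVE EIGHTHS
        TABLE.put(0x01BC, "5");   // U+01BC LATIN CAPITAL LETTER TONE FIVE
    }

    // Map each character of the input; record one operation per character in ops.
    static String map(String input, List<String> ops) {
        StringBuilder out = new StringBuilder();
        input.codePoints().forEach(cp -> {
            String mapped = TABLE.get(cp);
            if (mapped != null) {
                out.append(mapped);      // found in table: MP (table lookup mapping)
                ops.add("MP");
            } else {
                out.appendCodePoint(cp); // not in table: NO (keep original character)
                ops.add("NO");
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> ops = new ArrayList<>();
        String result = map("\u215D", ops);
        System.out.println(result + " " + ops); // prints "5/8 [MP]"
    }
}
```

A character that is absent from the table, such as ø (U+00F8) in this sketch, passes through unchanged with the operation NO.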
Source Code: ToMapUnicodeToAscii.java
Hierarchy: Object -> Transformation -> ToMapUnicodeToAscii