
Lexical Tools

Map Unicode to ASCII

  • Short Description: Converts input Unicode characters to ASCII characters.

  • Full Description:

    This flow converts Unicode characters to ASCII characters. Some Unicode characters cannot be converted to ASCII by the Unicode normalization operations, such as stripping diacritics and splitting ligatures. These characters are normalized to ASCII by a table lookup. The mapping table is defined in the file $LVG/data/Unicode/unicodeMap.data. Users may add to or modify the default entries in this file for their applications. Please refer to the design documents of Map Unicode to ASCII for details.

    When the -m flag is specified, the detailed mutate operations for each character of the input string are added after the standard set of lvg output fields. This flow uses two basic mutate operations to normalize Unicode to ASCII, as shown in the following table:

    Operation   Description            Example
    NO          No operation           ø -> ø
    MP          Table lookup mapping   Ƽ -> 5


  • Difference:

    None.

  • Features:
    1. Get the ASCII representation of the Unicode characters from the input term.


  • Symbol: q1

  • Examples:
    
    shell> lvg -f:q1 -m
    ⅝
    ⅝|5/8|2047|16777215|q1|1|MP|
    
    More examples

  • Implementation Logic:
    1. Check if the character is in the Unicode mapping table:
      • if yes, return the mapped ASCII character
      • if no, return the original input Unicode character
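
    The lookup logic above can be sketched in Java as follows. This is a minimal illustration, not the code in ToMapUnicodeToAscii.java: the class name MapUnicodeSketch, its two methods, and the small in-memory table are assumptions made for this example (the two entries are taken from the examples on this page), whereas lvg reads its mappings from unicodeMap.data. The second method reports one NO or MP operation per input character, mirroring the operations listed for the -m flag.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a per-character table lookup (not the lvg source).
class MapUnicodeSketch {
    // Assumed in-memory stand-in for the mapping table, keyed by code point.
    private static final Map<Integer, String> TABLE = new HashMap<>();
    static {
        TABLE.put(0x215D, "5/8"); // VULGAR FRACTION FIVE EIGHTHS -> 5/8
        TABLE.put(0x01BC, "5");   // LATIN CAPITAL LETTER TONE FIVE -> 5
    }

    // Replace each character found in the table; keep all others unchanged.
    static String mapText(String input) {
        StringBuilder out = new StringBuilder();
        input.codePoints().forEach(cp -> {
            String mapped = TABLE.get(cp);
            if (mapped != null) {
                out.append(mapped);       // in table: use the mapped ASCII string
            } else {
                out.appendCodePoint(cp);  // not in table: return the original character
            }
        });
        return out.toString();
    }

    // One operation per input character: "MP" (table lookup mapping) or "NO" (no operation).
    static List<String> operations(String input) {
        List<String> ops = new ArrayList<>();
        input.codePoints().forEach(cp -> ops.add(TABLE.containsKey(cp) ? "MP" : "NO"));
        return ops;
    }

    public static void main(String[] args) {
        System.out.println(mapText("\u215D"));    // prints 5/8
        System.out.println(operations("\u215D")); // prints [MP]
    }
}
```

    Keying the table by code point rather than by char keeps the lookup correct for characters outside the Basic Multilingual Plane, should such entries be added to the table.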

  • Source Code: ToMapUnicodeToAscii.java

  • Hierarchy: Object -> Transformation -> ToMapUnicodeToAscii