Lexical Tools

Map Symbols & Punctuation to ASCII

  • Short Description: Converts input Unicode symbols and punctuation to ASCII characters.

  • Full Description:

    This flow converts Unicode symbols and punctuation to their ASCII equivalents. Unicode punctuation and symbols are easy to confuse, not only because many of them look alike and are defined more than once (in different Unicode blocks), but also because text editors change them automatically during editing and transactions. In recent years, text editors have come to replace ASCII (dumb) quotes with smart quotes automatically, which turns ASCII punctuation into non-ASCII Unicode punctuation. This flow component reverses that process using a table lookup. The mapping table is defined in the file $LVG/data/Unicode/symbolMap.data. Users may add to or modify the default set in this file for their applications. Please refer to the design document, Map Unicode symbols and punctuation to ASCII, for details.

    When the -m flag is specified, the detailed mutate operations for each character of the input string are appended after the standard set of lvg output fields. This flow has two basic mutate operations for mapping symbols and punctuation to ASCII, as shown in the following table:

    Operation   Description            Example
    NO          No operation           ø -> ø
    MP          Table lookup mapping   “ -> "
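
The per-character operations can be illustrated with a minimal Java sketch. It is not the lvg implementation: the class name MutateDetailSketch, the method operations, and the two-entry in-memory table are assumptions made for illustration, and the real mapping comes from symbolMap.data. The sketch emits MP for a character found in the table and NO otherwise, one code per input character, each followed by a "|" separator.

```java
import java.util.Map;

class MutateDetailSketch {
    // Hypothetical stand-in for the mapping table: only the two smart double
    // quotes that appear in this page's example are included.
    static final Map<Character, String> TABLE =
        Map.of('\u201C', "\"", '\u201D', "\"");

    // Returns one operation code per input character, each followed by '|':
    // "MP" when the character is in the table, "NO" when it is not.
    static String operations(String input) {
        StringBuilder ops = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            boolean mapped = TABLE.containsKey(input.charAt(i));
            ops.append(mapped ? "MP" : "NO").append('|');
        }
        return ops.toString();
    }

    public static void main(String[] args) {
        // Smart-quoted "Quote": the first and last characters are mapped.
        System.out.println(operations("\u201CQuote\u201D"));
    }
}
```

For the smart-quoted input “Quote”, the sketch reports a mapping for the first and last characters and no operation for the five letters in between.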


  • Difference:

    None.

  • Features:
    1. Get the ASCII representation of the Unicode symbols and punctuation from the input term.


  • Symbol: q0

  • Examples:
    
    shell> lvg -f:q0 -m
    “Quote”
    “Quote” |"Quote"|2047|16777215|q0|1|MP|NO|NO|NO|NO|NO|MP|
    
    More examples

  • Implementation Logic:
    1. Check if the character is in the Unicode symbol mapping table:
      • if yes, return the mapped ASCII symbol
      • if no, return the original Unicode character unchanged
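
The lookup logic above can be sketched in a few lines of Java. This is a hedged illustration rather than the code in ToMapSymbolToAscii.java: the class name SymbolMapSketch, the method toAscii, and the hard-coded table entries are assumptions, and the sketch does not read symbolMap.data, whose format is not described here.

```java
import java.util.HashMap;
import java.util.Map;

class SymbolMapSketch {
    // Hypothetical in-memory mapping table. The real entries are defined in
    // symbolMap.data; only the two smart double quotes are listed here.
    static final Map<Character, String> TABLE = new HashMap<>();
    static {
        TABLE.put('\u201C', "\""); // left double quotation mark
        TABLE.put('\u201D', "\""); // right double quotation mark
    }

    // For each character: if it is in the table, append the mapped ASCII
    // string; otherwise append the original character unchanged.
    // Note: iterating by char ignores supplementary (non-BMP) code points.
    static String toAscii(String input) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            String mapped = TABLE.get(c);
            out.append(mapped != null ? mapped : String.valueOf(c));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toAscii("\u201CQuote\u201D"));
    }
}
```

Keeping the table in a map mirrors the idea that users can extend the default set: adding an entry changes the output without touching the lookup loop.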

  • Source Code: ToMapSymbolToAscii.java

  • Hierarchy: Object -> Transformation -> ToMapSymbolToAscii