Lexical Tools

Map Symbols & Punctuation to ASCII

  • Short Description: Converts input Unicode symbols and punctuation to ASCII characters.

  • Full Description:

    This flow converts Unicode symbols and punctuation to their ASCII equivalents. Unicode punctuation and symbols are easily confused: many look alike, some are defined more than once (in different Unicode blocks), and text editing software often changes them automatically during editing and transfer. For example, many recent text editors automatically replace ASCII (dumb) quotes with smart quotes, which turns ASCII punctuation into non-ASCII Unicode punctuation. This flow component reverses that process using a table lookup. The mapping table is defined in the file $LVG/data/Unicode/symbolMap.data. Users may add to or modify the default entries in this file to suit their applications. Please refer to the design documents of Map Unicode symbols and punctuation to ASCII for details.

    When the -m flag is specified, the detailed mutate operation for each character of the input string is appended after the standard lvg output fields. This flow has two basic mutate operations for mapping symbols and punctuation to ASCII, as shown in the following table:

    Operation   Description            Example
    NO          No operation           ø -> ø
    MP          Table lookup mapping   “ -> "
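
    As an illustration of how the per-character operation codes could be derived, the following is a minimal sketch, not the code in ToMapSymbolToAscii.java. The class name MutateTraceSketch, the trace method, and the two-entry in-memory table are assumptions made for this example; the real table is loaded from symbolMap.data. The sketch walks the input one char at a time, so it does not handle characters outside the Basic Multilingual Plane.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: emits "MP" or "NO" for each character of the input.
class MutateTraceSketch {
    // Assumed sample entries; the real mapping comes from symbolMap.data.
    static final Map<Character, String> TABLE = new HashMap<>();
    static {
        TABLE.put('\u201C', "\""); // left double quotation mark
        TABLE.put('\u201D', "\""); // right double quotation mark
    }

    // Returns one operation code per character, each followed by '|'.
    static String trace(String input) {
        StringBuilder ops = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            boolean mapped = TABLE.containsKey(input.charAt(i));
            ops.append(mapped ? "MP" : "NO").append('|');
        }
        return ops.toString();
    }

    public static void main(String[] args) {
        // Smart-quoted "Quote": 7 characters, first and last are mapped.
        System.out.println(trace("\u201CQuote\u201D")); // MP|NO|NO|NO|NO|NO|MP|
    }
}
```

    For the smart-quoted input “Quote”, this yields one code per character, in the same shape as the mutate fields shown in the Examples section.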

  • Difference:


  • Features:
    1. Get the ASCII representation of the Unicode symbols and punctuation from the input term.

  • Symbol: q0

  • Examples:
    shell> lvg -f:q0 -m
    “Quote” |"Quote"|2047|16777215|q0|1|MP|NO|NO|NO|NO|NO|MP|

  • Implementation Logic:
    1. For each character of the input, check whether it is in the Unicode symbol mapping table:
      • if yes, return the mapped ASCII symbol
      • if no, return the original input character unchanged
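
    The lookup logic above can be sketched in Java as follows. This is a simplified illustration under stated assumptions, not the actual ToMapSymbolToAscii implementation: the class name SymbolLookupSketch, the toAscii method, and the hard-coded sample entries are hypothetical, and the real flow reads its table from $LVG/data/Unicode/symbolMap.data.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the table lookup: mapped characters are replaced,
// all other characters pass through unchanged.
class SymbolLookupSketch {
    // Assumed sample entries standing in for the contents of symbolMap.data.
    static final Map<Character, String> TABLE = new HashMap<>();
    static {
        TABLE.put('\u201C', "\""); // left double quotation mark
        TABLE.put('\u201D', "\""); // right double quotation mark
        TABLE.put('\u2018', "'");  // left single quotation mark
        TABLE.put('\u2019', "'");  // right single quotation mark
    }

    static String toAscii(String input) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            String mapped = TABLE.get(c);
            if (mapped != null) {
                out.append(mapped); // in the table: use the ASCII mapping
            } else {
                out.append(c);      // not in the table: keep the original
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toAscii("\u201CQuote\u201D")); // "Quote"
    }
}
```

    Characters that are absent from the table, such as ø, are returned as they are; this flow does not strip diacritics or convert letters.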

  • Source Code: ToMapSymbolToAscii.java

  • Hierarchy: Object -> Transformation -> ToMapSymbolToAscii