Lexical Tools

Norm Unicode to ASCII with Synonym Option

  • Introduction:
    This normalization converts a Unicode string to pure ASCII. This Norm is equivalent to Unicode synonym conversion followed by Unicode Norm. In other words:
    • First, it converts Unicode characters to their defined synonym base (-f:q4)
    • Then, it uses core Norm to normalize the converted Unicode string (-f:q7)
    • Last, it converts the remaining non-ASCII Unicode characters to ![Unicode Name]! format (-f:q3)

    The main advantages of using this Norm are:

    • Pure ASCII results
    • Broader normalization, because Unicode synonyms are mapped to a common base
    • Preserved Unicode information
      Unicode characters can be retrieved from ![Unicode Name]!

  • Algorithm:
    • Convert Unicode characters to Unicode synonym base
    • Perform core norm
    • Convert the non-ASCII characters that remain in the result to their Unicode names, written as ![Unicode name]!
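
The three steps above can be sketched in Python as follows. This is only an illustration of the pipeline shape, not the Lexical Tools implementation: the `SYNONYMS` table and its single entry are hypothetical, `core_norm` is a simplified stand-in (NFKD decomposition with combining marks removed) for the real core Norm, and the `U+XXXX` fallback for unnamed characters is an assumption.

```python
import unicodedata

# Hypothetical synonym table (character -> synonym base).
# The real mappings come from the Lexical Tools data; this entry is illustrative.
SYNONYMS = {
    "\u00b5": "\u03bc",  # MICRO SIGN -> GREEK SMALL LETTER MU
}


def to_synonym_base(text):
    """Step 1: replace each character that has a synonym with its base."""
    return "".join(SYNONYMS.get(ch, ch) for ch in text)


def core_norm(text):
    """Step 2 (simplified stand-in): decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def to_unicode_name(text):
    """Step 3: write each remaining non-ASCII character as ![Unicode name]!."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        else:
            # Assumed fallback: use the code point when the character has no name.
            name = unicodedata.name(ch, "U+%04X" % ord(ch))
            out.append("![" + name + "]!")
    return "".join(out)


def norm_to_ascii_with_synonym(text):
    """Run the three steps in order: synonym base, core norm, Unicode names."""
    return to_unicode_name(core_norm(to_synonym_base(text)))


print(norm_to_ascii_with_synonym("caf\u00e9"))   # cafe
print(norm_to_ascii_with_synonym("5 \u00b5g"))   # 5 ![GREEK SMALL LETTER MU]!g
```

Because every Unicode character name is itself ASCII, the output of the last step is always pure ASCII, and the bracketed name can be mapped back to the character (for example with `unicodedata.lookup`).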

  • References: