Using Lexical tools to convert Unicode characters to ASCII

Lu C, Browne AC, Allen C, Divita G
AMIA Annu Symp Proc. 2008 Nov 6:1031.
Abstract: 

Unicode is an industry standard that allows computers to represent and manipulate text consistently in most of the world's writing systems. It is widely used in multilingual natural language processing (NLP) projects. However, some NLP projects still handle only ASCII characters. This paper describes methods that use lexical tools to convert Unicode characters (UTF-8) to 7-bit ASCII characters.
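To make the idea of Unicode-to-ASCII conversion concrete, here is a minimal sketch that uses only Python's standard `unicodedata` module. It is not the authors' lexical tools or their method, which the abstract does not detail. The function name `to_ascii`, the `FALLBACK` table, and the `unknown` placeholder are all hypothetical, chosen for illustration. The sketch assumes three steps: pass ASCII through unchanged, look up characters that have no Unicode decomposition in a small hand-written table, and otherwise apply NFKD normalization and drop combining marks.

```python
import unicodedata

# Hypothetical, deliberately tiny table for characters that have no
# Unicode decomposition; a real converter would need a far larger one.
FALLBACK = {
    "\u00df": "ss",  # sharp s
    "\u00e6": "ae",  # ae ligature
    "\u00c6": "AE",
    "\u00f8": "o",   # o with stroke
    "\u00d8": "O",
    "\u2013": "-",   # en dash
    "\u2014": "-",   # em dash
    "\u2018": "'",   # curly single quotes
    "\u2019": "'",
    "\u201c": '"',   # curly double quotes
    "\u201d": '"',
}


def to_ascii(text, unknown="?"):
    """Return a 7-bit ASCII approximation of text (illustrative only)."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)  # already ASCII
            continue
        if ch in FALLBACK:
            out.append(FALLBACK[ch])  # hand-mapped character
            continue
        # Compatibility decomposition, e.g. e-acute -> "e" + combining acute,
        # or the "fi" ligature -> "f" + "i".
        decomposed = unicodedata.normalize("NFKD", ch)
        kept = "".join(c for c in decomposed if not unicodedata.combining(c))
        if not kept:
            continue  # a bare combining mark: drop it
        if kept.isascii():
            out.append(kept)
        else:
            out.append(unknown)  # no ASCII approximation known
    return "".join(out)
```

For example, `to_ascii("na\u00efve caf\u00e9")` returns `"naive cafe"` and `to_ascii("Stra\u00dfe")` returns `"Strasse"`. Characters from scripts with no Latin decomposition, such as CJK ideographs, fall through to the `unknown` placeholder. Whether to substitute a placeholder, drop such characters, or raise an error is a design choice that depends on the downstream NLP task.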

Lu C, Browne AC, Allen C, Divita G. Using Lexical tools to convert Unicode characters to ASCII. AMIA Annu Symp Proc. 2008 Nov 6:1031.