Lexical Tools

Unicode Introduction

Why Unicode?

Fundamentally, computers deal only with numbers. They store letters and other characters by assigning a number to each one. Before Unicode was invented, there were hundreds of different encoding systems for assigning these numbers. No single encoding could contain enough characters, and the encodings also conflicted with one another: two encodings could use the same number for two different characters, or different numbers for the same character.
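
The conflict described above can be reproduced with legacy code pages that are still shipped with most language runtimes. The following is a minimal sketch in Python; the choice of Latin-1, Windows-1251, and code page 437 is only for illustration and is not taken from the Lexical Tools documentation.

```python
# Same number, different characters:
# the single byte 0xE9 means different things in different encodings.
raw = b"\xe9"
as_latin1 = raw.decode("latin-1")   # Western European: U+00E9 (e with acute)
as_cp1251 = raw.decode("cp1251")    # Cyrillic: U+0439 (short i)

# Same character, different numbers:
# U+00E9 is stored as a different byte in each encoding.
e_acute = "\u00e9"
in_latin1 = e_acute.encode("latin-1")  # b'\xe9'
in_cp437 = e_acute.encode("cp437")     # b'\x82'

print(f"0xE9 as Latin-1: U+{ord(as_latin1):04X}")
print(f"0xE9 as CP1251:  U+{ord(as_cp1251):04X}")
print(f"U+00E9 in Latin-1: {in_latin1!r}")
print(f"U+00E9 in CP437:   {in_cp437!r}")
```

Without knowing which encoding produced a byte, a program has no reliable way to tell which of these characters was intended.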

What is Unicode?

The Unicode Standard is a character encoding specification published by the Unicode Consortium. Unicode is designed to be a universal character set that includes all of the major scripts of the world in a simple and consistent manner. Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language. This allows data to be transported through many different systems without corruption.
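
As a small illustration of "a unique number for every character", the sketch below prints the Unicode number (code point) of characters from several scripts and checks that the text survives a round trip through two Unicode encoding forms. The sample characters are arbitrary choices made for this example.

```python
# Characters from different scripts, written as escapes so the example
# does not depend on the encoding of this file.
samples = ["A", "\u00e9", "\u0439", "\u4e2d", "\U0001F600"]

# ord() returns the Unicode code point, which is the same on every platform.
code_points = [ord(ch) for ch in samples]
for cp in code_points:
    print(f"U+{cp:04X}")

# The same text can be serialized as UTF-8 or UTF-16 and decoded again
# without loss, because both forms encode the same code points.
text = "".join(samples)
round_trip_utf8 = text.encode("utf-8").decode("utf-8")
round_trip_utf16 = text.encode("utf-16").decode("utf-16")
print(round_trip_utf8 == text, round_trip_utf16 == text)
```

The byte sequences differ between UTF-8 and UTF-16, but the code points they represent do not.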

Categories in Unicode:

  • Blocks
  • General Categories
  • Name
  • Value
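
The four properties listed above can be looked up programmatically. The sketch below uses Python's standard unicodedata module for the general category and the name. It assumes that "Value" refers to the code point written in U+XXXX notation. Python's standard library has no block lookup, so SAMPLE_BLOCKS is a hypothetical, deliberately partial table with only four block ranges, and describe is a helper name invented for this example.

```python
import unicodedata

# Hypothetical, partial block table for illustration only.
# The full list of blocks is defined by the Unicode Standard.
SAMPLE_BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
]

def describe(ch):
    """Return block, general category, name, and value for one character."""
    cp = ord(ch)
    # Linear scan is fine for a four-entry table.
    block = next(
        (name for lo, hi, name in SAMPLE_BLOCKS if lo <= cp <= hi),
        "(not in sample table)",
    )
    return {
        "block": block,
        "general_category": unicodedata.category(ch),  # e.g. 'Lu', 'Ll', 'Nd'
        "name": unicodedata.name(ch, "(unnamed)"),
        "value": f"U+{cp:04X}",
    }

info_a = describe("A")
info_e = describe("\u00e9")
print(info_a)
print(info_e)
```

Here 'Lu' and 'Ll' are the general category codes for uppercase and lowercase letters. A character outside the four sample ranges is reported as "(not in sample table)" rather than guessed.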

Reference