Lexical Tools

About Lexical Tools

The Lexical Tools include:

  • norm/luiNorm:
    A word or string normalization tool
  • wordInd:
    A word tokenization tool
  • lvg:
    A suite of text utilities that can generate, mutate, and filter lexical variants of the given input. The name lvg stands for Lexical Variants Generation.
  • toAscii:
    A tool to convert Unicode to pure ASCII
  • lgt:
    lvg with a graphical user interface
  • fields:
    A tool to cut out and/or rearrange fields

All of the above tools are available as:

  • Command line tools
  • Tools with a GUI (graphical user interface), except for fields
  • Web tools (run over the Internet), except for fields
  • Java APIs

These tools (except for toAscii) have some general characteristics:

  • take input from the standard input stream
  • send results to the standard output stream
  • interpret fielded text

    Default field separator: |
    Default end of record: new line

In other words, these tools can be told:

  • which field to apply the text transformation to
  • which fields to pass through to the output
  • which fields contain other relevant information, such as categories and inflections
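
The fielded-text convention described above can be illustrated with a minimal Java sketch. This is not the Lexical Tools API: the class FieldedTextSketch, the method transformRecord, and the choice to append the result as a new last field are all hypothetical, and lowercasing stands in for a real lexical transformation. The sketch reads one record per line from standard input, splits it on the default separator |, transforms one chosen field, and passes every original field through unchanged.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the Lexical Tools distribution.
public class FieldedTextSketch {

    // Applies `transform` to the 1-based field `fieldNumber` of `record` and
    // appends the result as a new last field. All original fields are passed
    // through untouched. Records with too few fields are returned unchanged.
    static String transformRecord(String record, int fieldNumber,
                                  String separator, UnaryOperator<String> transform) {
        // Quote the separator so "|" is not treated as regex alternation;
        // the limit -1 keeps trailing empty fields.
        String[] fields = record.split(Pattern.quote(separator), -1);
        if (fieldNumber < 1 || fieldNumber > fields.length) {
            return record;
        }
        return record + separator + transform.apply(fields[fieldNumber - 1]);
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        String line;
        // One record per line: the new line is the end-of-record marker.
        while ((line = in.readLine()) != null) {
            System.out.println(
                    transformRecord(line, 1, "|", s -> s.toLowerCase(Locale.ROOT)));
        }
    }
}
```

Under these assumptions, the input record "Running Dogs|noun|base" with field 1 selected yields "Running Dogs|noun|base|running dogs". The actual tools define their own output fields, so their documentation remains the reference for the real record layout.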

The current version of Lexical Tools is written entirely in Java. It is intended to be used both as a tool for creating robust indexes and as a tool for transforming user queries into entries that can be retrieved from those indexes.

These tools are intended to be embedded in users' applications. Java APIs and accompanying documentation are provided for this purpose.