Lexical Tools



Introduction

WordInd creates word indexes. It breaks a string into a list of unique, lowercased "words", following the UMLS definition of a word: a sequence of one or more alphanumeric characters.
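The word definition can be illustrated with a short Java sketch. WordIndSketch and its words method are hypothetical names, not the Lexical Tools API. The sketch assumes that "alphanumeric" means any character for which Character.isLetterOrDigit is true (including non-ASCII letters), and that the unique words keep their first-occurrence order; the real WordInd may differ on both points.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class WordIndSketch {
    // Hypothetical helper, not the Lexical Tools API: splits a term into
    // unique, lowercased words, where a word is a maximal run of characters
    // for which Character.isLetterOrDigit is true (assumed to match
    // "alphanumeric" in the UMLS definition).
    public static List<String> words(String term) {
        Set<String> unique = new LinkedHashSet<>(); // drops duplicates, keeps first-occurrence order
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < term.length()) {
            int cp = term.codePointAt(i);
            if (Character.isLetterOrDigit(cp)) {
                current.appendCodePoint(cp); // extend the current word
            } else if (current.length() > 0) {
                // any other character ends the current word
                unique.add(current.toString().toLowerCase(Locale.ROOT));
                current.setLength(0);
            }
            i += Character.charCount(cp); // step over surrogate pairs as one code point
        }
        if (current.length() > 0) {
            unique.add(current.toString().toLowerCase(Locale.ROOT)); // trailing word
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        // Same input as the Test Run example.
        for (String w : words("aaaa bbbb:cccc")) {
            System.out.println(w);
        }
    }
}
```

For the Test Run input "aaaa bbbb:cccc", this prints aaaa, bbbb and cccc on separate lines. A term such as "Heart heart-ATTACK" yields only heart and attack, because lowercasing happens before duplicates are removed.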

Since the 2004 release, WordInd has used UTF-8 for both input and output.

Set Up

Follow the installation instructions to install Lexical Tools and run the wordInd program. Check the following items only if you did not use the provided script to install Lexical Tools.

  • CLASSPATH:
    1. Include the Lexical Tools distribution jar file, ${LVG_DIR}/lib/lvg${YEAR}dist.jar, in your CLASSPATH.
    2. Include the lvg top directory, ${LVG_DIR}, in your CLASSPATH.

  • Database: no database required.

  • Configuration File: assign the full path of the lvg${YEAR} top directory to the variable LVG_DIR in the configuration file, ${LVG_DIR}/data/config/lvg.properties.
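If you set things up by hand, a small Java program can confirm the CLASSPATH and configuration items. LvgSetupCheck is a hypothetical helper, not part of Lexical Tools. It assumes that lvg.properties uses the standard java.util.Properties key=value format, that YEAR in the jar name is a four-digit number, and that the classpath to check is the one used to start the JVM that runs it.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Hypothetical setup checker; not part of Lexical Tools.
public class LvgSetupCheck {

    // Splits a classpath string on the platform separator (":" on Unix, ";" on Windows).
    public static List<String> entries(String classpath) {
        return Arrays.asList(classpath.split(File.pathSeparator));
    }

    // True if some entry's file name matches lvg<YEAR>dist.jar (YEAR assumed to be 4 digits).
    public static boolean hasDistJar(List<String> entries) {
        for (String e : entries) {
            if (new File(e).getName().matches("lvg\\d{4}dist\\.jar")) {
                return true;
            }
        }
        return false;
    }

    // True if lvgDir itself is a classpath entry; File drops a trailing separator.
    public static boolean hasTopDir(List<String> entries, String lvgDir) {
        String want = new File(lvgDir).getAbsolutePath();
        for (String e : entries) {
            if (new File(e).getAbsolutePath().equals(want)) {
                return true;
            }
        }
        return false;
    }

    // Reads LVG_DIR from a key=value properties file; returns null if the key is missing.
    public static String readLvgDir(Path configFile) throws IOException {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(configFile)) {
            props.load(in);
        }
        return props.getProperty("LVG_DIR");
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java LvgSetupCheck <LVG_DIR>");
            return;
        }
        String lvgDir = args[0];
        List<String> cp = entries(System.getProperty("java.class.path"));
        System.out.println("dist jar on CLASSPATH: " + hasDistJar(cp));
        System.out.println("LVG_DIR on CLASSPATH:  " + hasTopDir(cp, lvgDir));

        Path config = Paths.get(lvgDir, "data", "config", "lvg.properties");
        if (Files.isRegularFile(config)) {
            System.out.println("LVG_DIR in lvg.properties: " + readLvgDir(config));
        } else {
            System.out.println("config file not found: " + config);
        }
    }
}
```

Run it with the same classpath you plan to use for wordInd, passing the lvg top directory as the only argument. The value printed for lvg.properties should be the full path of that directory.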

Test Run

  • run java program

    Enter the command:

    
    shell> wordInd -p
    - Please input a term (type "Ctl-d" to quit) >
    aaaa bbbb:cccc
    aaaa
    bbbb
    cccc
    

    where:

    • wordInd: the script that runs the WordInd Java class
    • -p: set the WordInd system option to show the prompt (try the -h option for help)

Output Format

WordInd reads from standard input and writes to standard output, one line per word. The output fields appear in the order given by the -F options.
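The stream handling can be sketched as a Java filter that decodes standard input as UTF-8, treats each line as one term, and writes that term's unique, lowercased words as UTF-8, one per line. WordIndFilterSketch is a hypothetical name. The sketch writes only the word itself, as in the Test Run example, and does not implement -F field selection or any other WordInd option. It assumes a word is a run of Unicode letters (\p{L}) or decimal digits (\p{Nd}).

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordIndFilterSketch {
    // A word: one or more Unicode letters or decimal digits (assumption).
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{Nd}]+");

    // Reads one term per line as UTF-8 and writes each term's unique,
    // lowercased words as UTF-8, one word per line.
    public static void filter(InputStream in, OutputStream out) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        Writer writer = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        String line;
        while ((line = reader.readLine()) != null) {
            Set<String> words = new LinkedHashSet<>(); // uniqueness is per input term
            Matcher m = WORD.matcher(line);
            while (m.find()) {
                words.add(m.group().toLowerCase(Locale.ROOT));
            }
            for (String w : words) {
                writer.write(w);
                writer.write('\n'); // fixed "\n" rather than the platform line separator
            }
        }
        writer.flush(); // flush but do not close, so System.out stays usable
    }

    public static void main(String[] args) throws IOException {
        filter(System.in, System.out);
    }
}
```

Because both directions use an explicit charset, the output does not depend on the platform's default encoding. Piping "aaaa bbbb:cccc" into it produces the same three lines as the Test Run example.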

Global Behavior Options

Please refer to the design document.

Input Field Options

Please refer to the design document.