Lexical Tools


Introduction

WordInd creates word indexes: it breaks a string into a unique list of lowercased "words". WordInd follows the UMLS definition of a word: a sequence of one or more alphanumeric characters.
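
The word-splitting rule described above can be illustrated with a minimal sketch. This is not the WordInd implementation; the class name WordIndSketch and the method words are hypothetical, the use of Character.isLetterOrDigit as the test for "alphanumeric" is an assumption, and keeping words in first-occurrence order is also an assumption, since the actual ordering used by WordInd is not stated here.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of the Lexical Tools distribution.
class WordIndSketch {

    // Splits a term into unique, lowercased runs of alphanumeric characters.
    // Assumption: "alphanumeric" is approximated by Character.isLetterOrDigit.
    static List<String> words(String term) {
        Set<String> unique = new LinkedHashSet<>(); // keeps first-occurrence order
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < term.length(); ) {
            int cp = term.codePointAt(i);
            if (Character.isLetterOrDigit(cp)) {
                // Still inside a word: append the lowercased character.
                current.appendCodePoint(Character.toLowerCase(cp));
            } else if (current.length() > 0) {
                // A non-alphanumeric character ends the current word.
                unique.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            unique.add(current.toString()); // flush the last word
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        // Prints one word per line, similar in spirit to the wordInd output.
        for (String word : words("aaaa bbbb:cccc")) {
            System.out.println(word);
        }
    }
}
```

For the input "aaaa bbbb:cccc", the sketch yields the three words aaaa, bbbb, and cccc, because both the space and the colon act as word separators.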

Since the 2004 release, WordInd has used UTF-8 for both input and output.

Set Up

Follow the installation instructions to install the Lexical Tools and run the wordInd program. Check the following items only if you did not use the provided script to install the Lexical Tools.

  • CLASSPATH:
    1. Include the Lexical Tools distribution jar file, ${LVG_DIR}/lib/lvg${YEAR}dist.jar, in your CLASSPATH.
    2. Include the lvg top directory, ${LVG_DIR}, in your CLASSPATH.

  • Database: no database required.

  • Configuration File: assign the full path of the lvg${YEAR} top directory to the variable LVG_DIR in the configuration file, ${LVG_DIR}/data/config/lvg.properties.
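
As a quick way to confirm the configuration item above, the following sketch reads a properties file and reports the value assigned to LVG_DIR. It assumes that lvg.properties can be parsed as a standard Java properties file; the class name LvgConfigCheck and the method readLvgDir are hypothetical and are not part of the Lexical Tools.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper; not part of the Lexical Tools distribution.
class LvgConfigCheck {

    // Returns the value of LVG_DIR from the given file, or null if it is not set.
    // Assumption: the file uses the standard java.util.Properties key=value syntax.
    static String readLvgDir(Path configFile) throws IOException {
        Properties props = new Properties();
        try (Reader in = Files.newBufferedReader(configFile, StandardCharsets.UTF_8)) {
            props.load(in);
        }
        return props.getProperty("LVG_DIR");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java LvgConfigCheck <path to lvg.properties>");
            return;
        }
        System.out.println("LVG_DIR = " + readLvgDir(Paths.get(args[0])));
    }
}
```

Pass the path to your own lvg.properties file as the single argument; a null result suggests that LVG_DIR has not been assigned in that file.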

Test Run

  • Run the Java program

    Enter the command:

    
    shell> wordInd -p
    - Please input a term (type "Ctl-d" to quit) >
    aaaa bbbb:cccc
    aaaa
    bbbb
    cccc
    

    where:

    • wordInd: the script that runs the WordInd Java class
    • -p: option to show the input prompt (try the -h option for help)

Output Format

WordInd reads from standard input and writes to standard output, one line per word. The fields in the output are in the order of the -F options.

Global Behavior Options

Please refer to the design document.

Input Field Options

Please refer to the design document.