Lexical Tools

WordInd System Options

WordInd breaks a string into a unique list of lowercased "words". What counts as a word depends on how the string is tokenized: currently, a word is any token that consists solely of alphanumeric characters and has a length of at least 1.

A token is a run of one or more non-whitespace, non-punctuation characters, as defined within the ISO Latin-1 character set and as classified by Java. WordInd discards anything that is not a word.
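
The word definition above can be illustrated with a minimal Java sketch. It is not WordInd's actual implementation: the class name WordSketch and the method words are hypothetical, and the sketch assumes that every non-alphanumeric character acts as a delimiter and that words are returned in first-seen order. Those assumptions reproduce the documented examples on this page (including splitting on "~"), but the tool's real tokenizer may differ in edge cases.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; not part of the Lexical Tools API.
final class WordSketch {

    // Returns the unique words of text in first-seen order.
    // keepCase mirrors the -c option; otherwise words are lowercased.
    static List<String> words(String text, boolean keepCase) {
        Set<String> unique = new LinkedHashSet<>();
        StringBuilder run = new StringBuilder();
        for (int i = 0; i <= text.length(); i++) {
            // Treat the end of the string as a delimiter so the last run is flushed.
            char ch = i < text.length() ? text.charAt(i) : ' ';
            if (Character.isLetterOrDigit(ch)) {
                run.append(ch);
            } else if (run.length() > 0) {
                // Assumption: any non-alphanumeric character ends the current word.
                String word = run.toString();
                unique.add(keepCase ? word : word.toLowerCase(Locale.ROOT));
                run.setLength(0);
            }
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        System.out.println(words("Denis-Browne splint strapping", false));
        // prints [denis, browne, splint, strapping]
        System.out.println(words("This is a book.", true));
        // prints [This, is, a, book]
    }
}
```

A LinkedHashSet is used so that duplicates are dropped while the order of first occurrence is kept; the ordering WordInd itself uses for its unique list is not stated here.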

This page lists all system options for the WordInd program.

Original Flag   New Flag          Feature Description

Input Filter Options:
tN              t:INT             Define the field to use as the input term field. The default is 1.

Global Behavior Options:
(none)          c                 Preserve the case of input terms.
h               h                 Print program help information.
(none)          hs                Print the hierarchy structure of the options.
(none)          i:STR             Define the input file name. The default is screen input.
(none)          o:STR             Define the output file name. The default is screen output.
(none)          p                 Show the prompt. The default is no prompt.
s'Char'         s:STR             Define the field separator for the input. The default is "|".
v               v                 Return the current version identification of WordInd.

Output Filter Options:
oN              F:INT             Copy the specified field(s) from input to output.
                F:INT:INT:...
n               n                 Return a "-No Output-" message when an input produces no output.

Examples (where a command takes input, the first line after the command is the input and the remaining lines are the output):

  • shell> wordInd -c
    This is a book.
    This
    is
    a
    book

  • shell> wordInd -F:2:1
    aa~bb~cc|dd~ee
    dd~ee|aa~bb~cc|aa
    dd~ee|aa~bb~cc|bb
    dd~ee|aa~bb~cc|cc

  • shell> wordInd -t:7 -F:1:6
    C0185495|ENG|P|L0223844|PF|S0298948|Denis-Browne splint strapping|3|
    C0185495|S0298948|denis
    C0185495|S0298948|browne
    C0185495|S0298948|splint
    C0185495|S0298948|strapping

  • shell> wordInd -i:in.data -o:out.data
    (Reads data from the file in.data and sends output to the file out.data.)

  • shell> wordInd -n
    $$$
    -No Output-

  • shell> wordInd -s:/
    aa/bb/cc|dd/ee
    aa

  • shell> wordInd -t:2
    a~bb~cc|dd~ee
    dd
    ee

  • shell> wordInd -t:2 -n
    a~bb~cc||dd~ee
    -No Output-

  • shell> wordInd -v
    wordInd.2024
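
How the field options (t, s, F, and n) interact in the examples above can be sketched in Java as follows. This is an illustrative model only, not WordInd's source code: the class FieldSketch, the method process, and its parameters are hypothetical names. The sketch assumes that copied fields are written before the word and joined with the same separator, that any non-alphanumeric character delimits words, and that the "-No Output-" message is printed on its own without copied fields.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the Lexical Tools API.
final class FieldSketch {
    static final String NO_OUTPUT = "-No Output-";

    // Produces one output line per unique word found in the term field.
    // termField and copyFields are 1-based, mirroring t:INT and F:INT:INT:...
    static List<String> process(String line, String sep, int termField,
                                int[] copyFields, boolean noOutputFlag) {
        // Limit -1 keeps trailing empty fields, as in "...|3|".
        String[] fields = line.split(Pattern.quote(sep), -1);
        String term = (termField >= 1 && termField <= fields.length)
                ? fields[termField - 1] : "";

        // Assumption: copied fields come first, in the order requested.
        StringBuilder prefix = new StringBuilder();
        for (int f : copyFields) {
            String value = (f >= 1 && f <= fields.length) ? fields[f - 1] : "";
            prefix.append(value).append(sep);
        }

        // Assumption: every run of non-letter, non-digit characters is a delimiter.
        Set<String> words = new LinkedHashSet<>();
        for (String w : term.split("[^\\p{L}\\p{Nd}]+")) {
            if (!w.isEmpty()) {
                words.add(w.toLowerCase(Locale.ROOT));
            }
        }

        List<String> out = new ArrayList<>();
        if (words.isEmpty()) {
            if (noOutputFlag) {
                out.add(NO_OUTPUT); // mirrors the n option
            }
            return out;
        }
        for (String w : words) {
            out.add(prefix + w);
        }
        return out;
    }

    public static void main(String[] args) {
        // Corresponds to: wordInd -F:2:1
        for (String s : process("aa~bb~cc|dd~ee", "|", 1, new int[] {2, 1}, false)) {
            System.out.println(s);
        }
        // Corresponds to: wordInd -t:2 -n
        System.out.println(process("a~bb~cc||dd~ee", "|", 2, new int[0], true));
        // prints [-No Output-]
    }
}
```

Pattern.quote is used because the default separator "|" is a regular-expression metacharacter and would otherwise be interpreted as alternation by String.split.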