Lexical Variant Generation (lvg) is a suite of utilities that can generate, transform, and filter lexical variants from the given input. Lvg is intended to be used to create robust indexes and to transform user queries into retrievable entries from those indexes.
Since 2002, lvg has been developed and released in pure Java.
In the 2004 release, lvg used UTF-8 as the default format for input and output.
In the 2008 release, lvg contained 62 flow components and 37 command options.
In the 2013 release, lvg enhanced derivations with 2 more options, increasing the number of command options to 39.
The design features of lvg are described below:
Follow the installation instructions to install the Lexical Tools and run the lvg program. Check the following items only if you did not use the provided script to install the Lexical Tools.
Enter the command:
shell> lvg -f:n -f:i -p
Please input a term (type "Ctl-d" to quit) > sleep
sleep|sleep|2047|16777215|n|1|
sleep|sleep|128|1|i|2|
sleep|sleep|128|512|i|2|
sleep|sleep|1024|1|i|2|
sleep|sleep|1024|262144|i|2|
sleep|sleep|1024|1024|i|2|
sleep|slept|1024|32|i|2|
sleep|slept|1024|64|i|2|
sleep|sleeps|1024|128|i|2|
sleep|sleeping|1024|16|i|2|
where:
Lvg copies its input from standard input to standard output and appends 6 or more fields. In general the output consists of:
Field 1 | Field 2 | Field 3 | Field 4 | Field 5 | Field 6 | Field 7+ |
---|---|---|---|---|---|---|
Input | Output Term | Categories | Inflections | Flow History | Flow Number | Additional Information |
Field 1: Input Line
The input may have one or more fields.
Field 2: Output Term
The output term field contains the transformed term. Since the input may be fielded, this output term will be a transformation of only one of the input fields. The default field for transformation is the first field. This behavior may be changed with the -t:INT input filter option.
Field 3: Category
The category field contains the decimal representation of a bit vector representing all the possible categories that the output term may have. The bit vector is a compact way of representing multiple categories with one number. This format is intended for use by a program or parser. The -SC filter displays the category information in human-readable form.
Field 4: Inflection
The inflection field contains the decimal representation of a bit vector representing all the possible inflection types that the output term may have. As with the category field, this compact format is intended for use by a program or parser. The -SI filter displays the inflection information in human-readable form.
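Because the category and inflection fields are both bit vectors, a program can split either value into its individual bits. The following is a minimal Python sketch of that idea, not part of lvg. The function names are hypothetical. Of the category names in the table, only adj (1), noun (128), and verb (1024) are confirmed by the sample output in this document; the remaining names are assumptions and should be checked against the lvg category documentation.

```python
# Category names keyed by bit value. Only adj, noun, and verb are confirmed
# by the sample output in this document; the other names are assumptions.
CATEGORY_NAMES = {
    1: "adj", 2: "adv", 4: "aux", 8: "compl", 16: "conj", 32: "det",
    64: "modal", 128: "noun", 256: "prep", 512: "pron", 1024: "verb",
}


def set_bits(value):
    """Return the powers of two that sum to a non-negative integer."""
    if value < 0:
        raise ValueError("a bit vector must be non-negative")
    bits = []
    bit = 1
    while bit <= value:
        if value & bit:
            bits.append(bit)
        bit <<= 1
    return bits


def decode_categories(value):
    """Translate a category bit vector into a list of category names."""
    return [CATEGORY_NAMES.get(b, f"unknown({b})") for b in set_bits(value)]


print(decode_categories(1024))   # ['verb']
print(len(set_bits(2047)))       # 11, every category bit is set
print(len(set_bits(16777215)))   # 24, every inflection bit is set
```

The same set_bits helper works for the inflection field; mapping those bits to names would need the inflection table from the lvg documentation.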
Field 5: Flow History
The flow history lists the mnemonics of the flow components that were applied to produce the output. Generally, these mnemonics reflect the flow options specified on the command line.
Field 6: Flow Number
The flow number field contains a number indicating which flow produced the output. Flows are composed of command line options starting with -f:. Lvg can transform terms in parallel flows. For instance, one may want to generate both synonyms and derivations for any given input. One would do this via two parallel flows, -f:y -f:d. This differs from -f:y:d, which would produce the derivations of the synonyms for any given input. In the above example, the synonyms would be produced by the first flow and the derivations by the second flow. The flow number field would indicate this.
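The difference between parallel flows (-f:y -f:d) and a serial flow (-f:y:d) can be illustrated with a small Python sketch. This is not how lvg is implemented; the lookup tables are hypothetical toy data standing in for the synonym and derivation components, and the function names are invented for illustration.

```python
# Toy lookup tables standing in for lvg's synonym (y) and derivation (d)
# components. The entries are hypothetical, not real lvg output.
SYNONYMS = {"sleep": ["slumber"]}
DERIVATIONS = {"sleep": ["sleepy"], "slumber": ["slumberous"]}


def synonyms(term):
    return SYNONYMS.get(term, [])


def derivations(term):
    return DERIVATIONS.get(term, [])


def run_flow(term, components):
    """Apply components in series: each consumes the previous outputs."""
    terms = [term]
    for component in components:
        terms = [out for t in terms for out in component(t)]
    return terms


def run_flows(term, flows):
    """Run each flow on the same input; tag outputs with a 1-based flow number."""
    results = []
    for number, components in enumerate(flows, start=1):
        for out in run_flow(term, components):
            results.append((out, number))
    return results


# Analogous to -f:y -f:d: two flows, each applied to the original input.
parallel = run_flows("sleep", [[synonyms], [derivations]])
# Analogous to -f:y:d: one flow, derivations applied to the synonyms.
serial = run_flows("sleep", [[synonyms, derivations]])

print(parallel)  # [('slumber', 1), ('sleepy', 2)]
print(serial)    # [('slumberous', 1)]
```

In the parallel case the second element of each tuple plays the role of the flow number field; in the serial case every result comes from flow 1.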
Field 7+: Additional Information
The additional information field(s) contain information that is specific to the flow option applied. The contents of these fields are generally governed by the -m global option.
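A program that consumes lvg output can split each line on the "|" separator and map the pieces onto the fields described above. The sketch below is one possible approach, written in Python; LvgRecord, parse_line, and the input_field_count parameter are hypothetical names, not part of lvg. It assumes the caller knows how many fields the echoed input occupies and that each output line ends with a trailing separator, as in the sample output.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LvgRecord:
    input_fields: List[str]
    output_term: str
    categories: int
    inflections: int
    flow_history: str
    flow_number: int
    additional: List[str] = field(default_factory=list)


def parse_line(line, input_field_count=1):
    """Split one lvg output line into the fields described above."""
    line = line.rstrip("\n")
    if line.endswith("|"):
        line = line[:-1]  # drop the trailing separator
    parts = line.split("|")
    n = input_field_count
    if len(parts) < n + 5:
        raise ValueError(f"expected at least {n + 5} fields, got {len(parts)}")
    return LvgRecord(
        input_fields=parts[:n],
        output_term=parts[n],
        categories=int(parts[n + 1]),
        inflections=int(parts[n + 2]),
        flow_history=parts[n + 3],
        flow_number=int(parts[n + 4]),
        additional=parts[n + 5:],
    )


record = parse_line("sleep|slept|1024|32|i|2|")
print(record.output_term, record.categories, record.inflections, record.flow_number)
# slept 1024 32 2
```

Any fields beyond the sixth are collected in the additional list, which stays empty when no extra details are present in the line.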
The following example shows the command, input, and outputs for lvg:
shell> lvg -t:2 -f:y -ti -m
C0037313|sleep
sleep|hypnic|1|1|y|1|FACT|sleep|sleep|noun|hypnic|adj|NLP_LVG|
sleep|sleep|128|1|y|1|FACT|sleep|sleep|verb|sleep|noun|C0037313|
sleep|sleep|1024|1|y|1|FACT|sleep|sleep|noun|sleep|verb|C0037313|
Field Num | Field 1 | Field 2 | Field 3 | Field 4 | Field 5 | Field 6 | Field 7+ |
---|---|---|---|---|---|---|---|
Field | Input | Output | Category | Inflection | Flow History | Flow Number | Additional Information |
Result-1 | sleep | hypnic | 1 | 1 | y | 1 | FACT|sleep|sleep|noun|hypnic|adj|NLP_LVG| |
Result-2 | sleep | sleep | 128 | 1 | y | 1 | FACT|sleep|sleep|verb|sleep|noun|C0037313| |
Result-3 | sleep | sleep | 1024 | 1 | y | 1 | FACT|sleep|sleep|noun|sleep|verb|C0037313| |
Please refer to design document