Uninflect Words
Lvg can uninflect both words and terms. That is, it can make plural nouns into singular nouns, inflected verbs into their infinitive forms, and adjectives and adverbs into their positive forms.
The -m flag option has no effect on this flow; "none" is added at the end of the output.
There is a subtle difference between uninflecting the input as a term (-f:b) and uninflecting the input as a sequence of words (-f:B). When the input is viewed as one term, a quick lookup of the term is made, and if it is not found, the rules kick in to create an uninflected form. When the input is viewed as a sequence of words, each word is looked up to find its uninflected forms, and what is returned is every combination of the uninflected words, rather than rule-generated forms. As an example of this difference, take the term "alpha beta", which is not in the lexicon as a term, although "alpha" is in the lexicon and "beta" is in the lexicon. When this is pushed through the command
lvg -f:b -m
the result is:
alpha beta|alpha beta|1|1|b|1|RULE|$|adj|base|$|adj|base
alpha beta|alpha beta|2|1|b|1|RULE|$|adv|base|$|adv|base
alpha beta|alpha beta|128|1|b|1|RULE|$|noun|base|$|noun|base
alpha beta|alpha beta|1024|1|b|1|RULE|$|verb|base|$|verb|base
alpha beta|alpha beton|128|512|b|1|RULE|a$|noun|plural|on$|noun|singular
alpha beta|alpha betum|128|512|b|1|RULE|a$|noun|plural|um$|noun|singular
As can be seen, rules were applied to the end of the term (in this case "beta") to come up with rule-generated uninflected forms for "alpha beta". When the input is viewed as a sequence of words, however, the resulting uninflection is different. When the command
lvg -f:B
is used, the result is:
alpha beta|alpha beta|2047|1|B|1|
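The word-by-word behaviour can be pictured as a Cartesian product over the uninflected forms of each word. The Python sketch below is only an illustration under stated assumptions, not lvg's implementation: `WORD_FORMS` is a hypothetical stand-in for the lexicon lookup, filled with forms that appear in the examples of this section, and a word missing from the table is simply kept unchanged.

```python
from itertools import product

# Hypothetical stand-in for the lexicon lookup: word -> uninflected forms.
WORD_FORMS = {
    "left": ["left", "leave"],
    "data": ["data", "datum"],
}

def uninflect_words(term):
    """Return every combination of the per-word uninflected forms."""
    words = term.split()
    # Assumption of this sketch: an unlisted word uninflects to itself.
    choices = [WORD_FORMS.get(word, [word]) for word in words]
    return [" ".join(combo) for combo in product(*choices)]

print(uninflect_words("left data"))
print(uninflect_words("alpha beta"))
```

With this table, "left data" yields four combinations and "alpha beta" yields only itself. The order of the list is whatever `itertools.product` produces and is not meant to reproduce lvg's output order.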
A heuristic within this uninflection flow that should be pointed out is that words that uninflect to more than 10 forms are treated differently. In such a case, only the input form is used as the uninflected form. For example, the nonsense term PIIA clA CuUM TIAA has only one uninflected form as a result of this heuristic, because each word of this term generates three variants apiece, whereas the nonsense term PIIA clA cuUM produces nine normalized forms due to the rule-generated uninflected forms. The reasoning behind the heuristic is that the aggressive rule-generated forms, when not pruned, can produce an explosive number of irrelevant forms. The limit can be set in the lvg configuration file (${LVG_DIR}/data/config/lvg.properties, MAX_RULE_UNINFLECTED_TERMS; the default value is 10).
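A minimal Python sketch of this cap follows. It assumes that the limit is compared against the number of combined forms, which is one reading of the description above; the function name `combine_with_cap` and the placeholder forms in the usage lines are hypothetical, and the code is not taken from lvg.

```python
from itertools import product
from math import prod

MAX_RULE_UNINFLECTED_TERMS = 10  # default value quoted in the text

def combine_with_cap(term, forms_per_word, limit=MAX_RULE_UNINFLECTED_TERMS):
    """Combine per-word forms, falling back to the input when there are too many.

    forms_per_word holds one list of candidate forms per word of `term`.
    """
    total = prod(len(forms) for forms in forms_per_word)
    if total > limit:
        # Too many candidates: keep only the input form.
        return [term]
    return [" ".join(combo) for combo in product(*forms_per_word)]

# Two words with three placeholder forms each give 9 combinations (kept).
two_words = [["a1", "a2", "a3"], ["b1", "b2", "b3"]]
print(len(combine_with_cap("a1 b1", two_words)))

# Four words with three placeholder forms each would give 81 (input only).
four_words = [["x", "y", "z"]] * 4
print(combine_with_cap("a b c d", four_words))
```

Counting the combinations before building them keeps the fallback cheap even when the product would be very large.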
An additional heuristic has also been implemented within the uninflectional morphology unit to limit spurious variants. If a term goes through an uninflectional morphology mutation, and the term is not known to the lexicon, but its rule generated form is known to the lexicon, this variant is thrown out, because it is likely to be wrong.
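The filter described here can be expressed as a small predicate. The sketch below is an illustration only: the function name `keep_rule_variant`, the use of a plain set as the lexicon, and the sample strings are all hypothetical, and exact string membership is assumed as the test for "known to the lexicon".

```python
def keep_rule_variant(term, variant, lexicon):
    """Return False for a rule-generated variant that the heuristic discards."""
    if variant == term:
        # The input form itself is not a mutation, so it is always kept.
        return True
    term_known = term in lexicon
    variant_known = variant in lexicon
    # Discard when the input is unknown but the rule-generated form is known.
    return not (not term_known and variant_known)

lexicon = {"xyz"}  # hypothetical lexicon content
print(keep_rule_variant("xyzs", "xyz", lexicon))   # discarded
print(keep_rule_variant("xyzs", "xyzs", lexicon))  # kept: input form
print(keep_rule_variant("abcs", "abc", lexicon))   # kept: neither form is known
```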
The results are sorted by length and then in case-insensitive alphabetical order.
shell> lvg -f:B
alpha beta
alpha beta|alpha beta|2047|1|B|1|
left
left|left|2047|1|B|1|
left|leave|2047|1|B|1|
data
data|data|2047|1|B|1|
data|datum|2047|1|B|1|
left data
left data|left data|2047|1|B|1|
left data|leave data|2047|1|B|1|
left data|left datum|2047|1|B|1|
left data|leave datum|2047|1|B|1|
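The sort order of the results (length first, then alphabetical order ignoring case) can be reproduced with a two-part sort key. This Python sketch uses `str.casefold` as an approximation of the case-insensitive comparison; the function name `sort_results` is hypothetical.

```python
def sort_results(forms):
    """Sort by length first, then alphabetically while ignoring case."""
    return sorted(forms, key=lambda form: (len(form), form.casefold()))

unsorted_forms = ["leave datum", "left datum", "leave data", "left data"]
print(sort_results(unsorted_forms))
# → ['left data', 'leave data', 'left datum', 'leave datum']
```

The resulting order matches the "left data" records in the example above: "leave data" and "left datum" have the same length, so the alphabetical comparison decides between them.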