Lexical Tools

Uninflect a Term

  • Short Description: Reduce the input term to its uninflected form(s).

  • Full Description:

    Lvg can uninflect both words and terms. That is, it can turn plural nouns into singular nouns, inflected verbs into their infinitive forms, and adjectives and adverbs into their positive forms.

    First, this flow finds the uninflected terms from facts (or from rules if the facts yield none). Then it combines records whose uninflected (base) forms have the same spelling and category. Finally, it sorts the results alphabetically, then by category. For example, if the input is Li, six lexical records are found from facts:

    • Li|Li|noun|base|FACT|Li|noun|base|Li|noun|base|E0003847|
    • Li|Li|noun|base|FACT|Li|noun|singular|Li|noun|base|E0003847|
    • Li|Li|noun|base|FACT|Li|noun|base|Li|noun|base|E0355488|
    • Li|Li|noun|base|FACT|Li|noun|singular|Li|noun|base|E0355488|
    • Li|LI|noun|base|FACT|LI|noun|base|LI|noun|base|E0699468|
    • Li|LI|noun|base|FACT|LI|noun|singular|LI|noun|base|E0699468|

    They are combined and sorted into the final results:
    • Li|LI|noun|base|FACT|LI|noun|base|LI|noun|base|E0699468|
    • Li|Li|noun|base|FACT|Li|noun|base|Li|noun|base|E0003847|

    Note that two combining operations are performed in this example:

    • Combining the inflections "base" and "singular", because they mean the same thing when the category is noun.
    • Combining the EUIs "E0003847" and "E0355488", because they have the same base form, "Li". The EUI that is kept is selected randomly. In this case, "E0355488" is removed.
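
    The combine-and-sort step above can be sketched in Python as follows. This is an illustration under stated assumptions, not Lvg's implementation: the name combine_and_sort is hypothetical, the sketch keeps the first EUI it encounters (the text says the choice is random), and it relies on plain string ordering, which places "LI" before "Li".

```python
# The six fact records for the input "Li", copied from the example above.
RECORDS = [
    "Li|Li|noun|base|FACT|Li|noun|base|Li|noun|base|E0003847|",
    "Li|Li|noun|base|FACT|Li|noun|singular|Li|noun|base|E0003847|",
    "Li|Li|noun|base|FACT|Li|noun|base|Li|noun|base|E0355488|",
    "Li|Li|noun|base|FACT|Li|noun|singular|Li|noun|base|E0355488|",
    "Li|LI|noun|base|FACT|LI|noun|base|LI|noun|base|E0699468|",
    "Li|LI|noun|base|FACT|LI|noun|singular|LI|noun|base|E0699468|",
]

# Positions of the inflection values within one record.
INFLECTION_FIELDS = (3, 7, 10)


def combine_and_sort(lines):
    """Merge records sharing base spelling and category, then sort them."""
    kept = {}
    for line in lines:
        fields = line.rstrip("|").split("|")
        base, category = fields[1], fields[2]
        # For nouns, "singular" and "base" are treated as the same inflection.
        if category == "noun":
            for i in INFLECTION_FIELDS:
                if fields[i] == "singular":
                    fields[i] = "base"
        # Assumption: keep the first record seen for each (base, category).
        kept.setdefault((base, category), fields)
    # Sort by base spelling, then by category.
    return ["|".join(kept[key]) + "|" for key in sorted(kept)]


for record in combine_and_sort(RECORDS):
    print(record)
```

    With the six records above, the sketch yields the same two records, in the same order, as the final results shown in the example.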

    When the -m flag is specified, the fact or rule that was used to do the uninflection is added after the standard set of lvg output fields. The additional information takes one of two formats:

    • |FACT|input term|category|inflected inflection|uninflected term|category|uninflected inflection|EUI|
    • |RULE|matched pattern|category|inflection|replaced pattern|category|uninflected inflection|
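
    A short Python sketch of reading these two formats from an output line follows. The function name parse_mutation_info and the dictionary keys are hypothetical labels derived from the field names above. The number of standard fields is passed as a parameter because it is an assumption: the shell examples in this document show six fields before the marker, while the Li records show four.

```python
# Field labels taken from the two formats listed above (hypothetical key names).
FACT_FIELDS = (
    "input_term", "category", "inflected_inflection",
    "uninflected_term", "uninflected_category", "uninflected_inflection", "eui",
)
RULE_FIELDS = (
    "matched_pattern", "category", "inflection",
    "replaced_pattern", "replaced_category", "uninflected_inflection",
)


def parse_mutation_info(line, standard_fields=6):
    """Return the FACT or RULE details that follow the standard fields."""
    fields = line.rstrip("|").split("|")
    extra = fields[standard_fields:]
    if not extra or extra[0] not in ("FACT", "RULE"):
        raise ValueError("no FACT or RULE information found")
    names = FACT_FIELDS if extra[0] == "FACT" else RULE_FIELDS
    info = dict(zip(names, extra[1:]))
    info["kind"] = extra[0]
    return info


rule_line = "alpha beta|alpha beton|128|512|b|1|RULE|a$|noun|plural|on$|noun|singular"
print(parse_mutation_info(rule_line))
```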

    An additional heuristic has been implemented within the inflectional morphology unit to limit spurious variants. If a term goes through an inflectional morphology mutation (-f:i, -f:B, -f:b) and the term is not known to the lexicon, but its rule-generated form is known to the lexicon, the variant is thrown out because it is likely to be wrong. This heuristic is overruled only when the -ki flag is set to return all forms.
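
    The heuristic can be read as a simple filter, sketched below in Python. Everything here is illustrative: the function name filter_rule_variants, the keep_all parameter (standing in for the effect of -ki), and the toy lexicon are assumptions, and the lexicon content is made up for the demonstration rather than taken from the real LEXICON.

```python
def filter_rule_variants(input_term, rule_variants, lexicon, keep_all=False):
    """Drop rule-generated variants that the heuristic considers spurious.

    A variant is dropped when the input term is unknown to the lexicon
    but the variant itself is known. keep_all=True disables the filter.
    """
    if keep_all or input_term in lexicon:
        return list(rule_variants)
    return [variant for variant in rule_variants if variant not in lexicon]


# Toy data (hypothetical): pretend only "alpha beton" is in the lexicon.
toy_lexicon = {"alpha beton"}
variants = ["alpha beta", "alpha beton", "alpha betum"]

print(filter_rule_variants("alpha beta", variants, toy_lexicon))
print(filter_rule_variants("alpha beta", variants, toy_lexicon, keep_all=True))
```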

    The results are sorted alphabetically, then by category.

  • Difference:
    1. The inflection scheme is redesigned in the Java version. A new database table for fact inflection is created. Accordingly, results are different.
    2. The Java version stores the case of each uninflected term in the database (IDB). In other words, results are case sensitive.
    3. Results are displayed sorted by the frequency of category, then by dictionary order.
    4. The Java version is capable of handling punctuation. The old version stripped punctuation from the input term before uninflecting it.

      For example: "isn't" should be uninflected as "be" not "isn t"
      For example: "wasn't" should be uninflected as "be" not "wasn t"
      For example: "doesn't" should be uninflected as "do" not "doesn t"
      For example: "won't" should be uninflected as "will" not "win t"
      For example: "Vit's" should be uninflected as "Vit" not "Vit s"
      For example: "cit's" should be uninflected as "cit" not "cit s"

    5. After the lvg 2004 release, the output filter function -CR:oc is applied to this flow component by default. In other words, only one base form (with the same spelling) per category appears in the output.
    6. After the lvg 2006 release, information on irreg variants of spelling variants was added to the LEXICON. Accordingly, this flow returns the corresponding base form. For example, club-feet returns club-foot.
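
    To illustrate item 4 above, here is a minimal Python sketch of why stripping punctuation first loses information. The function strip_punctuation is a hypothetical stand-in for the old pre-processing, assumed here to replace each punctuation character with a space. The old version's exact behavior is not documented in this section.

```python
import string

# Map every ASCII punctuation character to a space (assumed old behavior).
_PUNCT_TO_SPACE = str.maketrans({ch: " " for ch in string.punctuation})


def strip_punctuation(term):
    """Replace punctuation with spaces and collapse repeated whitespace."""
    return " ".join(term.translate(_PUNCT_TO_SPACE).split())


for term in ("isn't", "doesn't", "Vit's"):
    # After stripping, the contraction or possessive can no longer be recognized.
    print(term, "->", strip_punctuation(term))
```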

  • Features:
    1. The input term is viewed as a term.
    2. This term is looked up to find the uninflected form(s) by fact and rules.

  • Symbol: b

  • Examples:
    shell> lvg -f:b -m
    alpha beta
    alpha beta|alpha beta|1|1|b|1|RULE|$|adj|base|$|adj|base
    alpha beta|alpha beta|2|1|b|1|RULE|$|adv|base|$|adv|base
    alpha beta|alpha beta|128|1|b|1|RULE|$|noun|base|$|noun|base
    alpha beta|alpha beta|1024|1|b|1|RULE|$|verb|base|$|verb|base
    alpha beta|alpha beton|128|512|b|1|RULE|a$|noun|plural|on$|noun|singular
    alpha beta|alpha betum|128|512|b|1|RULE|a$|noun|plural|um$|noun|singular
    shell> lvg -f:b -CR:o
    alpha beta
    alpha beta|alpha beta|1155|1|b|1|
    alpha beta|alpha beton|128|512|b|1|
    alpha beta|alpha betum|128|512|b|1|
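
    The category value 1155 in the -CR:o output is consistent with a bitwise OR of the four category values reported for "alpha beta" in the -m output (1 for adj, 2 for adv, 128 for noun, 1024 for verb). The Python sketch below reproduces that arithmetic. It assumes that -CR:o merges records sharing the same input and output term by OR-ing their category and inflection values. The function name combine_by_output is hypothetical.

```python
# The first six fields of each record from the "lvg -f:b -m" example.
RULE_OUTPUT = [
    "alpha beta|alpha beta|1|1|b|1|",
    "alpha beta|alpha beta|2|1|b|1|",
    "alpha beta|alpha beta|128|1|b|1|",
    "alpha beta|alpha beta|1024|1|b|1|",
    "alpha beta|alpha beton|128|512|b|1|",
    "alpha beta|alpha betum|128|512|b|1|",
]


def combine_by_output(lines):
    """Merge records with the same input/output term by OR-ing bit values."""
    combined = {}
    for line in lines:
        term_in, term_out, category, inflection, history, number = (
            line.rstrip("|").split("|")
        )
        entry = combined.setdefault((term_in, term_out), [0, 0, history, number])
        entry[0] |= int(category)
        entry[1] |= int(inflection)
    return [
        f"{term_in}|{term_out}|{cat}|{infl}|{history}|{number}|"
        for (term_in, term_out), (cat, infl, history, number) in combined.items()
    ]


for record in combine_by_output(RULE_OUTPUT):
    print(record)
```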

  • Implementation Logic:
    1. Find all uninflected form(s) for the input term.
      • Find uninflected terms from fact (Lexicon Database).
      • If no result from fact, find uninflected terms from rule (Trie).
      • Filter out rule-generated terms from the result if they are in the Lexicon Database.
      • Combine records if the spelling of the base form and the category are the same.
      • Sort results alphabetically, then by category.
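
    The fact-first lookup with rule fallback described above can be sketched in Python as follows. This is a toy model, not the code in ToUninflectTerm.java: FACTS is a hypothetical in-memory stand-in for the Lexicon Database (seeded with the Li example), RULES is a plain list standing in for the trie (seeded with the suffix patterns visible in the -m example output), and the filtering and combining steps are omitted.

```python
# Hypothetical stand-in for the Lexicon Database: term -> (base form, category).
FACTS = {
    "Li": [("Li", "noun"), ("LI", "noun")],
}

# Hypothetical stand-in for the rule trie: (suffix, replacement, category).
# An empty suffix corresponds to the "$" pattern in the -m output.
RULES = [
    ("", "", "adj"),
    ("", "", "adv"),
    ("", "", "noun"),
    ("", "", "verb"),
    ("a", "on", "noun"),
    ("a", "um", "noun"),
]


def uninflect(term):
    """Return (base form, category, source) tuples, facts first, rules second."""
    if term in FACTS:
        results = [(base, category, "FACT") for base, category in FACTS[term]]
    else:
        results = []
        for suffix, replacement, category in RULES:
            if term.endswith(suffix):
                stem = term[: len(term) - len(suffix)]
                results.append((stem + replacement, category, "RULE"))
    # Sort by base form, then by category.
    return sorted(results)


print(uninflect("Li"))
print(uninflect("alpha beta"))
```

    For "alpha beta", which has no entry in the toy FACTS table, the rules produce the same six base forms and categories as the -m example, in the same order.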

  • Source Code: ToUninflectTerm.java

  • Hierarchy: Object -> Transformation -> ToUninflectTerm