Lexical Tools

Anti-Normalize (Approximate Match)

  • Short Description: Returns the inflected terms in the Lexicon whose normalized form matches the normalized input term. This flow can be used as a basic approximate match.

  • Full Description:

    This process involves normalization and table lookup. The flow normalizes the input and then uses the normalized terms to find (inflected) terms in the lexicon. A database table, AntiNorm (antiNorm.data), is generated that holds the normalized forms of all inflected terms in the lexicon (infl.data). Please refer to the database table design documents for details. The same approach can be applied to a different corpus for a basic approximate match.
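
    The table generation and lookup described above can be sketched as follows. This is a minimal illustration, not the lvg implementation: toy_normalize is a hypothetical stand-in for the real normalization flow, the record layout (term, EUI, category, inflection) is an assumption, and the sample records are taken from the example output on this page.

```python
import re
from collections import defaultdict


def toy_normalize(term):
    """Hypothetical, much-simplified stand-in for lvg normalization."""
    text = term.lower()
    text = re.sub(r"'s\b", "", text)           # drop possessive 's
    text = re.sub(r"[^a-z0-9 ]+", " ", text)   # other punctuation -> space
    words = []
    for word in text.split():
        # Crude plural stripping; real uninflection is lexicon-driven.
        if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
            word = word[:-1]
        words.append(word)
    return " ".join(sorted(words))             # word order is ignored


def build_anti_norm(inflection_records):
    """Map each normalized form to the inflected records that produce it."""
    table = defaultdict(list)
    for record in inflection_records:
        table[toy_normalize(record[0])].append(record)
    return table


def lookup(table, input_term):
    """Normalize the input, then return the matching inflected records."""
    return list(table.get(toy_normalize(input_term), []))


# (term, EUI, category, inflection), copied from the example output.
INFL = [
    ("Abrami's disease", "E0000141", 128, 1),
    ("Abrami's disease", "E0000141", 128, 512),
    ("Abrami's diseases", "E0000141", 128, 8),
    ("Andersen syndrome", "E0000363", 128, 1),
    ("Andersen syndrome", "E0000363", 128, 512),
    ("Andersen's syndrome", "E0000363", 128, 1),
    ("Andersen's syndrome", "E0000363", 128, 512),
]

anti_norm = build_anti_norm(INFL)
for record in lookup(anti_norm, "andersens syndrome"):
    print(record)
```

    Because the table is keyed by normalized form, one lookup per input replaces the enumeration of inflected candidates.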

    This flow is useful when we are not privy to the words contained in a target system, and/or we cannot index that set of words. The brute-force alternative is to take each word of a given input and inflect it, then combine every form of each word (the word plus its inflections) with every form of the other words in the input. We would like to constrain that explosion of output terms where possible. The output of this flow is one of the major constraints applied to the results. This flow can be used as the basic approximate match in the Lexicon.
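
    To make the explosion concrete, here is a small sketch of the brute-force approach. The variant lists in TOY_VARIANTS are made up for illustration and do not come from the lexicon; the point is only that the number of candidates is the product of the per-word variant counts.

```python
from itertools import product

# Hypothetical per-word variants (the word itself plus inflected forms).
TOY_VARIANTS = {
    "andersen": ["andersen", "andersens", "andersen's"],
    "syndrome": ["syndrome", "syndromes"],
}


def brute_force_candidates(term):
    """Combine every variant of each word with every variant of the others."""
    per_word = [TOY_VARIANTS.get(word, [word]) for word in term.lower().split()]
    return [" ".join(combo) for combo in product(*per_word)]


candidates = brute_force_candidates("andersen syndrome")
print(len(candidates))  # 3 variants * 2 variants = 6 candidates
```

    Each additional word multiplies the candidate count by its number of variants, which is the growth this flow is meant to avoid.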

    Results are sorted alphabetically, then by EUI, category, and inflection.

    The -m option displays the additional information that this flow can retrieve: the EUI of the term found in the lexicon.

  • Difference:

    None (new flow component).

  • Features:
    1. Find normalized forms for the inputs.
    2. Find terms in lexicon for the found normalized forms.


  • Symbol: An

  • Examples:
    
    shell> lvg -f:An -m
    Abrami disease
    Abrami disease|Abrami's disease|128|1|An|1|E0000141|
    Abrami disease|Abrami's disease|128|512|An|1|E0000141|
    Abrami disease|Abrami's diseases|128|8|An|1|E0000141|
    
    andersens syndrome
    andersens syndrome|Andersen syndrome|128|1|An|1|E0000363|
    andersens syndrome|Andersen syndrome|128|512|An|1|E0000363|
    andersens syndrome|Andersen's syndrome|128|1|An|1|E0000363|
    andersens syndrome|Andersen's syndrome|128|512|An|1|E0000363|
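
    The pipe-delimited records above can be split into named fields with a few lines of code. The field names below are assumptions inferred from the examples on this page (input term, output term, category, inflection, flow, flow number, and the EUI shown with -m); they are not taken from an official field list.

```python
from typing import NamedTuple


class AnRecord(NamedTuple):
    # Field names are assumed from the example output.
    input_term: str
    output_term: str
    category: int
    inflection: int
    flow: str
    flow_number: int
    eui: str


def parse_line(line):
    """Parse one output line such as 'a|b|128|1|An|1|E0000141|'."""
    fields = line.rstrip("\n").split("|")
    # The trailing '|' yields an empty last field, so keep the first seven.
    in_term, out_term, cat, infl, flow, num, eui = fields[:7]
    return AnRecord(in_term, out_term, int(cat), int(infl), flow, int(num), eui)


record = parse_line("Abrami disease|Abrami's diseases|128|8|An|1|E0000141|")
print(record.output_term, record.eui)
```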
    

  • Implementation Logic:
    1. Use flow component N to normalize the input term.
    2. Use the found normalized terms to retrieve terms from the lexicon (by order).
    3. Filter out identical records.
    4. Sort results by alphabetic order, EUI, category, and inflection (Db.AntiNormComparator).
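
    Steps 3 and 4 above can be sketched as follows. This is an illustration only: the record layout (term, EUI, category, inflection) is an assumption, and the case-insensitive term comparison is a guess, since the exact ordering rules of Db.AntiNormComparator are not shown here.

```python
def dedupe_and_sort(records):
    """Drop identical records, then sort by term, EUI, category, inflection."""
    unique = list(dict.fromkeys(records))  # keeps first occurrence of each
    return sorted(
        unique,
        # Assumed ordering: case-insensitive term, then EUI, category, inflection.
        key=lambda r: (r[0].lower(), r[1], r[2], r[3]),
    )


# Retrieved records in arbitrary order, with one duplicate.
retrieved = [
    ("Andersen's syndrome", "E0000363", 128, 512),
    ("Andersen syndrome", "E0000363", 128, 1),
    ("Andersen's syndrome", "E0000363", 128, 1),
    ("Andersen syndrome", "E0000363", 128, 512),
    ("Andersen syndrome", "E0000363", 128, 1),
]

for record in dedupe_and_sort(retrieved):
    print(record)
```

    With the sample records, the resulting order matches the "andersens syndrome" example: both "Andersen syndrome" records first, then both "Andersen's syndrome" records, each pair ordered by inflection.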

  • Source Code: ToAntiNorm.java

  • Hierarchy: Object -> Transformation -> ToAntiNorm