Lexical Tools

Generate all fruitful variants

  • Short Description: Generate all fruitful variants for the input.

  • Full Description:

    The variants are created by generating inflectional variants, spelling variants, acronyms and abbreviations, expansions, derivational variants (recursively), synonyms (recursively), and combinations of these, as specified in Aronson AR, The Effect of Textual Variation on Concept Based Information Retrieval. Proceedings of the AMIA Symposium, 373-377, 1996. This flow option is useful for building an aggressive retrieval index.

    The history notation and distance score are shown below:

    Operation                Notation   Distance score
    No Operations            n          0
    Spelling Variant         s          0
    Inflectional Variant     i          1
    Uninflectional Variant   b          1
    Synonym                  y          2
    Acronym/Abbreviation     A          2
    Expansion                a          2
    Derivational Variant     d          3
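
    As a minimal sketch of how these scores combine, the Python snippet below sums the per-operation scores along a flow history string. The additive reading, and the convention that a recursive operation appears as a repeated letter (for example "dd"), are assumptions inferred from the example output on this page. distance_score is a hypothetical helper, not part of the Lexical Tools API.

```python
# Per-operation distance scores, copied from the notation table.
DISTANCE = {"n": 0, "s": 0, "i": 1, "b": 1, "y": 2, "A": 2, "a": 2, "d": 3}


def distance_score(history):
    """Sum the scores of every operation letter in a history such as 'n+dd+y+i'.

    Assumption: the total distance is the plain sum of the per-operation
    scores, and '+' is only a separator.
    """
    return sum(DISTANCE[op] for op in history if op != "+")


if __name__ == "__main__":
    for history in ("n", "n+A", "n+d", "n+dd", "n+dd+y+i"):
        print(history, distance_score(history))
```

    Under these assumptions, the histories n+d, n+dd, and n+dd+y+i give 3, 6, and 9, which agrees with the distance column of the example output.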

    The MetaMap variants are generated by the following combinations:

    • n
      • n + d
      • n + d + y

    • A/a
      • A/a + d
      • A/a + d + y

    • y
      • y + d
      • y + d + y

    • A/a + y
    • y + A/a

    Derivational variants, synonyms, acronyms/abbreviations, and expansions are all based on the base form of the input, so an uninflect operation must be performed first. After 2018, only suffix derivations are used to generate fruitful variants.

    Outputs of these combinations that share the same spelling and category are filtered so that only the one with the shortest distance score is kept. Inflections are simplified, and only inflections with a value less than 256 are kept.

    Then, spelling variants and inflectional variants (fact and rule) are generated for all items in the combined list above. The spelling variants are generated first because they have a lower distance score.

    Finally, the outputs are filtered again: items with the same spelling and category are reduced to the one with the shortest distance score. In addition, any item with an inflection value greater than 256 is filtered out.
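
    The filtering step can be sketched in Python as follows. Variant and keep_shortest are hypothetical names introduced for illustration and do not mirror the Java implementation. The text gives the inflection cutoff both as "less than 256" and as "greater than 256"; this sketch assumes that only values below 256 are kept.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    spelling: str
    category: int
    inflection: int
    history: str
    distance: int


def keep_shortest(variants, inflection_limit=256):
    """Keep one variant per (spelling, category): the one with the lowest distance.

    Assumption: variants whose inflection value is not below inflection_limit
    are dropped before the comparison.
    """
    best = {}
    for v in variants:
        if v.inflection >= inflection_limit:
            continue
        key = (v.spelling, v.category)
        if key not in best or v.distance < best[key].distance:
            best[key] = v
    return list(best.values())


if __name__ == "__main__":
    candidates = [
        Variant("neurology", 128, 1, "n+y+d", 5),
        Variant("neurology", 128, 1, "n+d", 3),
        Variant("neurology", 128, 512, "n+d+i", 4),
    ]
    print(keep_shortest(candidates))
```

    On a tie, the sketch keeps the variant that was seen first; the source does not say how ties are resolved.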

    The -m option adds 5 fields that show detailed mutate information:

    initial category | category after the very first operation | flow history | distance score | tag information

    The tag information is represented as a long value that combines tag bit values. Currently, two bits are defined, as listed in the following table:

    Bit   Value   Tag
    0     1       Noun/Adj only in recursive derivation
    1     2       Unique acronyms/expansion

    Please refer to the Tag class for details.
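
    As an illustration of the bit encoding, the Python sketch below decodes a tag value into the names of the tags that are set. decode_tag is a hypothetical helper; the actual Tag class in the Java source may expose this differently.

```python
# Bit position -> tag name, copied from the tag table.
TAG_BITS = {
    0: "Noun/Adj only in recursive derivation",
    1: "Unique acronyms/expansion",
}


def decode_tag(tag):
    """Return the names of all tag bits that are set in the integer tag."""
    return [name for bit, name in sorted(TAG_BITS.items()) if tag & (1 << bit)]


if __name__ == "__main__":
    for value in (0, 1, 2, 3):
        print(value, decode_tag(value))
```

    A tag value of 3, for example, has both bits set, while a value of 2 has only the unique acronyms/expansion bit set.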

  • Difference:
    1. The algorithm in the new version is much more thorough.
    2. Much more information is provided in the mutate information.


  • Features:
    1. Generates all fruitful variants and provides initial category, flow history, distance score, and tag information.


  • Symbol: G

  • Examples:
    
    shell> lvg -f:G -m
    neurological
    neurological|nervous systems|128|8|G|1|1|1|n+dd+y+i|9|2|
    neurological|neurologies|128|8|G|1|1|1|n+d+i|4|3|
    neurological|neurologists|128|8|G|1|1|1|n+dd+i|7|2|
    neurological|nervous system|128|1|G|1|1|1|n+dd+y|8|2|
    neurological|neurology|128|1|G|1|1|1|n+d|3|3|
    neurological|neurologist|128|1|G|1|1|1|n+dd|6|2|
    neurological|neurologically|2|1|G|1|1|1|n+d|3|2|
    neurological|neurol|1|1|G|1|1|1|n+A|2|1|
    neurological|neurological|1|1|G|1|1|1|n|0|3|
    neurological|neuro|1|1|G|1|1|1|n+A|2|1|
    neurological|neurologic|1|1|G|1|1|1|n+y|2|3|
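
    The output lines above can be split into named fields with a few lines of Python. The last five names follow the -m field list given on this page. The names chosen for the first six fields (input, output, category, inflection, flow, flow_number) are assumptions about the general lvg output layout, and parse_mutate_line is a hypothetical helper.

```python
FIELD_NAMES = [
    # Assumed names for the leading output fields.
    "input", "output", "category", "inflection", "flow", "flow_number",
    # The five mutate fields added by -m.
    "initial_category", "first_operation_category",
    "history", "distance", "tag",
]
INT_FIELDS = (
    "category", "inflection", "flow_number",
    "initial_category", "first_operation_category", "distance", "tag",
)


def parse_mutate_line(line):
    """Split one pipe-delimited output line into a dict of named fields."""
    values = line.strip().split("|")
    if values and values[-1] == "":
        values.pop()  # the trailing '|' produces an empty last element
    if len(values) != len(FIELD_NAMES):
        raise ValueError("expected %d fields, got %d" % (len(FIELD_NAMES), len(values)))
    record = dict(zip(FIELD_NAMES, values))
    for name in INT_FIELDS:
        record[name] = int(record[name])
    return record


if __name__ == "__main__":
    print(parse_mutate_line("neurological|nervous systems|128|8|G|1|1|1|n+dd+y+i|9|2|"))
```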
    

  • Implementation Logic:
    1. Retrieve all possible categories and inflections ( < 256), put them into the list originalSet, and mark them as no operation.
    2. Generate variants from the flows b/n, A, a, and y, and save them in variantSet.
      • Get the base form of the input and put it into list nb (n/b). If the base differs from the input, mark it as "b"; otherwise, mark it as "n". The category and inflection ( < 256, or category is modal or aux) are also retrieved.
      • Get the spelling variants of nb and save them as list b.
      • Get the acronyms and expansions from list b and save them as list A (A/a).
      • Get the recursive synonyms from list b and save them as list y (y).
      • Add lists b, A, and y to list 1 (n/b, A/a, y).
      • Generate recursive derivational variants from list 1 and save them as list 2 (d, A/a + d, y + d).
      • Generate recursive synonyms from list 2 and save them as list 3 (d + y, A/a + d + y, y + d + y).
      • Generate recursive synonyms from list A and save them as list 4 (A/a + y).
      • Generate acronyms/abbreviations from list y and save them as list 5 (y + A/a).
      • Add lists 1, 2, 3, 4, and 5 to list 6 (n/b, A/a, y, d, A/a + d, y + d, d + y, A/a + d + y, y + d + y, A/a + y, y + A/a). Within list 6, items with the same spelling are filtered so that only the one with the shortest distance score is kept.
    3. Generate the spelling and inflectional variants list sivSet from list 6.
      • Generate spelling variants from variantSet and save them as list 7 (n + s, A/a + s, y + s, d + s, A/a + d + s, y + d + s, d + y + s, A/a + d + y + s, y + d + y + s, A/a + y + s, y + A/a + s).
      • Generate inflectional variants from variantSet and save them as list 8 (n + i, A/a + i, y + i, d + i, A/a + d + i, y + d + i, d + y + i, A/a + d + y + i, y + d + y + i, A/a + y + i, y + A/a + i).
    4. Combine sivSet into originalSet.
    5. Finally, filter out items with the same spelling and inflections, keeping the one with the shortest distance. In addition, filter out items with an inflection value greater than 256. Then update the mutate information with the flow history, distance, and tag information.
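
    The list bookkeeping in step 2 can be sketched in Python as below. This is a simplified, non-recursive illustration under stated assumptions: the three lookup tables are made-up toy data (the real tool consults the lexicon), acronyms and expansions are collapsed into a single "A" step, spelling variants and categories are ignored, and step and combine are hypothetical helper names. Its results are therefore not expected to match real lvg output.

```python
# Distance scores for the operations used in this sketch.
DISTANCE = {"A": 2, "y": 2, "d": 3}

# Hypothetical toy lookup tables, for illustration only.
ACRONYMS = {"neurological": ["neuro"]}
SYNONYMS = {"neurological": ["neurologic"], "neurology": ["nervous system"]}
DERIVATIONS = {"neurological": ["neurology"]}


def step(op, table, items):
    """Apply one lookup to every (term, history, distance) item."""
    out = []
    for term, history, dist in items:
        for new_term in table.get(term, []):
            out.append((new_term, history + "+" + op, dist + DISTANCE[op]))
    return out


def combine(base):
    """Build lists 1 to 5 for a base form and merge them into list 6."""
    list_b = [(base, "n", 0)]
    list_a = step("A", ACRONYMS, list_b)       # A/a
    list_y = step("y", SYNONYMS, list_b)       # y
    list_1 = list_b + list_a + list_y          # n/b, A/a, y
    list_2 = step("d", DERIVATIONS, list_1)    # d, A/a + d, y + d
    list_3 = step("y", SYNONYMS, list_2)       # d + y, A/a + d + y, y + d + y
    list_4 = step("y", SYNONYMS, list_a)       # A/a + y
    list_5 = step("A", ACRONYMS, list_y)       # y + A/a
    # List 6: keep, per spelling, the entry with the shortest distance.
    list_6 = {}
    for term, history, dist in list_1 + list_2 + list_3 + list_4 + list_5:
        if term not in list_6 or dist < list_6[term][1]:
            list_6[term] = (history, dist)
    return list_6


if __name__ == "__main__":
    for term, (history, dist) in combine("neurological").items():
        print(term, history, dist)
```

    Steps 3 to 5 would then add spelling and inflectional variants for every entry of list 6 and apply the same shortest-distance filter once more.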

  • Source Code: ToFruitfulVariants.java

  • Hierarchy: Object -> Transformation -> ToFruitfulVariants