Lexical Tools

Convert output - from Xerox Parc stochastic tagger into Lvg style pipe delimited format

  • Short Description: Convert the output of the Xerox Parc stochastic tagger into Lvg style pipe delimited format.

  • Full Description:

    Convert the output of the Xerox Parc stochastic tagger into Lvg style pipe delimited format.

    For example, the output of a stochastic part-of-speech tagger can be passed to LVG as hints for determining part of speech and inflection, followed by a word-based lexical lookup.

    The -m option has no effect on this flow: "none" is added at the end of the output.

  • Difference:
    1. The inflection table scheme is different, so the related inflections are different.
    2. The Java version does not accept "," as an input term.
    3. The Java version with IDB performs slowly on the test data [':', 'cl'], ['.', 'pd'], ['-', 'hy'], and [',', 'cm'].


  • Features:
    1. Convert input from Xerox Parc stochastic tagger into Lvg style pipe delimited format.
    2. Xerox Parc stochastic tagger: ['term', 'category']
    3. Lvg style pipe delimited format: term|term|category|inflection|Flow History|
    4. The inflections are retrieved from the database.
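
The two formats listed above can be illustrated with a minimal Java sketch. It is not taken from ToConvertOutput.java: the class name `TaggerFormatSketch`, the regular expression, and the reading of the trailing `1` in the documented example as a flow number are all assumptions made for illustration. The category and inflection values are passed in directly rather than looked up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the Lexical Tools distribution.
public class TaggerFormatSketch {
    // Matches a tagger pair such as ['elderly', 'adj'] (assumed surface form).
    private static final Pattern PAIR =
        Pattern.compile("^\\['(.*)',\\s*'([^']*)'\\]$");

    /** Returns {term, category}, or null when the line is not a tagger pair. */
    public static String[] parse(String line) {
        Matcher m = PAIR.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    /** Builds: input|term|category|inflection|flow history|flow number| */
    public static String toLvgRecord(String input, String term,
                                     long category, long inflection,
                                     int flowNumber) {
        return String.join("|",
            input, term,
            Long.toString(category), Long.toString(inflection),
            "U",                       // flow symbol of this component
            Integer.toString(flowNumber)) + "|";
    }

    public static void main(String[] args) {
        String input = "['elderly', 'adj']";
        String[] pair = parse(input);
        // Values 1 and 257 are copied from the documented example.
        System.out.println(toLvgRecord(input, pair[0], 1L, 257L, 1));
    }
}
```

With the values from the documented example, `main` prints `['elderly', 'adj']|elderly|1|257|U|1|`.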


  • Symbol: U

  • Examples:
    
    shell> lvg -f:U
    ['elderly', 'adj']
    ['elderly', 'adj']|elderly|1|257|U|1|
    

  • Implementation Logic:
    1. Get the term and category from the input, which is in Xerox Parc stochastic tagger format.
    2. Retrieve all possible inflections from the inflection table in the database, based on the term and category.
    3. Concatenate all possible inflections together.
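
The three steps above can be sketched in Java as follows. This is an illustration under stated assumptions, not the code in ToConvertOutput.java: the class `ConvertOutputSketch`, the in-memory `TABLE` standing in for the database, the sample rows, and the tag-to-bit mapping are all hypothetical. The sketch also assumes that "concatenating" inflections means combining bit flags with a bitwise OR, which is consistent with the value 257 (256 + 1) in the documented example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; an in-memory list stands in for the inflection table.
public class ConvertOutputSketch {
    static final class Row {
        final String term;
        final long category;
        final long inflection;
        Row(String term, long category, long inflection) {
            this.term = term;
            this.category = category;
            this.inflection = inflection;
        }
    }

    // Assumed mapping from tagger tags to category bit values.
    static final Map<String, Long> CATEGORY_BITS = new HashMap<>();
    // Assumed sample rows; real data comes from the database.
    static final List<Row> TABLE = new ArrayList<>();
    static {
        CATEGORY_BITS.put("adj", 1L);
        CATEGORY_BITS.put("noun", 128L);
        TABLE.add(new Row("elderly", 1L, 1L));
        TABLE.add(new Row("elderly", 1L, 256L));
        TABLE.add(new Row("elderly", 128L, 8L));
    }

    private static final Pattern PAIR =
        Pattern.compile("^\\['(.*)',\\s*'([^']*)'\\]$");

    public static String convert(String input) {
        // Step 1: get the term and category from the tagger pair.
        Matcher m = PAIR.matcher(input.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a tagger pair: " + input);
        }
        String term = m.group(1);
        Long category = CATEGORY_BITS.get(m.group(2));
        if (category == null) {
            throw new IllegalArgumentException("Unknown tag: " + m.group(2));
        }

        // Steps 2 and 3: collect matching rows and OR their inflection bits.
        long inflections = 0L;
        for (Row row : TABLE) {
            if (row.term.equals(term) && (row.category & category) != 0) {
                inflections |= row.inflection;
            }
        }
        return input + "|" + term + "|" + category + "|" + inflections + "|U|1|";
    }

    public static void main(String[] args) {
        System.out.println(convert("['elderly', 'adj']"));
    }
}
```

Filtering by category before combining matters here: the hypothetical noun row for "elderly" is skipped, so only the two adjective rows contribute to the result.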

  • Source Code: ToConvertOutput.java

  • Hierarchy: Object -> Transformation -> ToConvertOutput