Convert output - from Xerox Parc stochastic tagger into Lvg style pipe delimited format
- Short Description:
Convert the output of the Xerox Parc stochastic tagger into Lvg style pipe delimited format.
- Full Description:
Convert the output of the Xerox Parc stochastic tagger into Lvg style pipe delimited format.
One use of this option is to pass the output of a stochastic part of speech tagger to LVG as hints for determining part of speech and inflection, and then perform a word based lexical lookup.
No effect on the -m option. "none" is added at the end of the output.
- Difference:
- The inflection table scheme is different and thus the related inflections are different.
- The Java version does not take "," as the input term.
- The Java version with IDB performs slowly on the test data [':', 'cl'], ['.', 'pd'], ['-', 'hy'], and [',', 'cm'].
- Features:
- Convert input from Xerox Parc stochastic tagger into Lvg style pipe delimited format.
- Xerox Parc stochastic tagger: ['term', 'category']
- Lvg style pipe delimited format: term|term|category|inflection|Flow History|
- The inflections are retrieved from the database.
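The format conversion listed above can be illustrated with a minimal Java sketch. It is not the code in ToConvertOutput.java: the class name TaggerFormatSketch and its methods are hypothetical, and the category value 1, the inflection value 257, and the trailing flow number 1 are copied from the example output on this page rather than looked up in a database.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the LVG source.
class TaggerFormatSketch {
    // Matches tagger output such as ['elderly', 'adj'] and captures
    // the term (group 1) and the category name (group 2).
    private static final Pattern TAGGED =
        Pattern.compile("^\\[\\s*'(.*)'\\s*,\\s*'([^']*)'\\s*\\]$");

    /** Returns {term, category}, or null if the line is not in tagger format. */
    static String[] parse(String line) {
        Matcher m = TAGGED.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    /** Joins the fields with "|" and appends a trailing "|". */
    static String toLvgLine(String input, String term,
                            long category, long inflection, int flowNumber) {
        return String.join("|",
            input,
            term,
            Long.toString(category),
            Long.toString(inflection),
            "U",                          // flow symbol of this component
            Integer.toString(flowNumber)) + "|";
    }

    public static void main(String[] args) {
        String input = "['elderly', 'adj']";
        String[] parts = parse(input);
        // Assumption: 1 (category) and 257 (inflection) are taken from the
        // example on this page; the real values come from the database.
        System.out.println(toLvgLine(input, parts[0], 1L, 257L, 1));
        // prints: ['elderly', 'adj']|elderly|1|257|U|1|
    }
}
```

The sketch keeps the original input as the first field and the extracted term as the second, which is how the example output on this page is laid out.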
- Symbol:
U
- Examples:
shell> lvg -f:U
['elderly', 'adj']
['elderly', 'adj']|elderly|1|257|U|1|
Implementation Logic:
- Get the term and category from the input term, which is in Xerox Parc stochastic tagger format.
- Retrieve all possible inflections from the inflection table in the database based on the term and category.
- Combine all possible inflections into a single inflection value.
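The lookup and combination steps might be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: an in-memory list stands in for the database inflection table, the rows in sampleTable() are made up, and "concatenating" inflections is assumed to mean combining numeric inflection values with a bitwise OR. That assumption is at least consistent with the example output, where 257 equals 1 + 256.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; class, method, and row contents are illustrative only.
class InflectionLookupSketch {
    // One row of a stand-in inflection table.
    static final class Row {
        final String term;
        final long category;
        final long inflection;

        Row(String term, long category, long inflection) {
            this.term = term;
            this.category = category;
            this.inflection = inflection;
        }
    }

    /** Made-up rows, chosen so that "elderly" reproduces the value 257. */
    static List<Row> sampleTable() {
        List<Row> table = new ArrayList<>();
        table.add(new Row("elderly", 1L, 1L));
        table.add(new Row("elderly", 1L, 256L));
        table.add(new Row("other", 1L, 2L));   // unrelated term, ignored below
        return table;
    }

    /**
     * Collects every row matching the term and category and ORs the
     * inflection values together (assumed meaning of "concatenate").
     * Returns 0 when no row matches.
     */
    static long combineInflections(List<Row> table, String term, long category) {
        long combined = 0L;
        for (Row row : table) {
            if (row.term.equals(term) && row.category == category) {
                combined |= row.inflection;
            }
        }
        return combined;
    }

    public static void main(String[] args) {
        System.out.println(combineInflections(sampleTable(), "elderly", 1L));
        // prints: 257
    }
}
```

A real implementation would query the inflection table in the database instead of scanning a list, so the cost of this step depends on how that table is indexed.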
Source Code: ToConvertOutput.java
Hierarchy: Object -> Transformation -> ToConvertOutput