Prefix Computer Programs
A set of computer programs was developed to retrieve prefix word|word pairs from the LEXICON and to validate them as derivations. These programs are run annually for each lvg release.
- Get all base forms from LEXICON (inflVars.data)
  - Program: GetBaseForms.java
  - Input: ./dataOrg/inflVars.data
  - Output: ./data/bases.data
  - Descriptions
    - go through all lines (inflectional variants) in the file "inflVars.data"
    - retrieve the base forms (infl = 1)
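The base-form filter described above can be sketched as follows. This is a minimal illustration, not the actual GetBaseForms.java: the class and method names are hypothetical, the sample lines are made up, and the sketch assumes that each line of inflVars.data is pipe-delimited with the term in the first field and the inflection code in the third field.

```java
import java.util.List;
import java.util.TreeSet;

public class BaseFormSketch {
    // Assumed field layout (not confirmed by the source): term|category|infl|...
    static final int TERM_FIELD = 0;
    static final int INFL_FIELD = 2;

    /** Returns the sorted, de-duplicated terms whose inflection code is 1 (base). */
    static TreeSet<String> baseForms(List<String> lines) {
        TreeSet<String> bases = new TreeSet<>();
        for (String line : lines) {
            String[] fields = line.split("\\|", -1);
            if (fields.length <= INFL_FIELD) {
                continue; // skip lines that do not have enough fields
            }
            if (fields[INFL_FIELD].equals("1")) {
                bases.add(fields[TERM_FIELD]);
            }
        }
        return bases;
    }

    public static void main(String[] args) {
        // Made-up sample lines in the assumed layout.
        List<String> sample = List.of(
            "dog|128|1|E0000001|dog|dog",
            "dogs|128|8|E0000001|dog|dog",
            "run|1024|1|E0000002|run|run");
        System.out.println(baseForms(sample)); // prints [dog, run]
    }
}
```

In a real run the lines would come from ./dataOrg/inflVars.data and the result would be written to ./data/bases.data; the sketch works on in-memory lists to stay self-contained.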
- Retrieve and validate prefix words|words
  - Program: GetPrefixWordsFromFile.java
  - Input:
    - ./dataOrg/prefix.data
    - ./data/bases.data
    - ./data/prefix.tag.data
  - Output: ./data/prefixWords.meta.data
  - Descriptions
    - get prefixes from a file (./dataOrg/prefix.data)
    - get base forms from a file (./data/bases.data)
    - get prefix tags from a file (./data/prefix.tag.data)
    - Find all pairs of prefix words|words in LEXICON:
      - go through all prefixes from the sorted prefixes list
      - find all pairs of prefix word|word (prefixWordList) if:
        - prefix word is in base forms
        - word is also in base forms
    - validate all pairs of prefix words|words in prefixWordList
      - go through all pairs of prefixWord|words in prefixWordList
      - print tag ("yes" or "no") to ./data/prefixWords.meta.data
      - print "tbd" if no tag found
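The pairing and tagging steps above can be sketched as follows. This is an illustration under stated assumptions, not the actual GetPrefixWordsFromFile.java: the class and method names are hypothetical, a prefix word is assumed to be the plain concatenation prefix + word (no hyphen handling), and the output line format prefixWord|word|tag is assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class PrefixPairSketch {
    /** Finds prefixWord|word pairs where both the prefix word and the word are base forms. */
    static List<String> findPairs(List<String> prefixes, Set<String> bases) {
        TreeSet<String> sortedBases = new TreeSet<>(bases);
        List<String> pairs = new ArrayList<>();
        for (String prefix : new TreeSet<>(prefixes)) { // go through the sorted prefixes
            for (String base : sortedBases) {
                if (base.length() > prefix.length() && base.startsWith(prefix)) {
                    String word = base.substring(prefix.length());
                    if (bases.contains(word)) {
                        pairs.add(base + "|" + word);
                    }
                }
            }
        }
        return pairs;
    }

    /** Appends the known tag ("yes" or "no") to each pair, or "tbd" when no tag is found. */
    static List<String> tagPairs(List<String> pairs, Map<String, String> knownTags) {
        List<String> tagged = new ArrayList<>();
        for (String pair : pairs) {
            tagged.add(pair + "|" + knownTags.getOrDefault(pair, "tbd"));
        }
        return tagged;
    }

    public static void main(String[] args) {
        // Made-up sample data.
        Set<String> bases = Set.of("happy", "unhappy", "do", "undo", "under", "pre");
        List<String> pairs = findPairs(List.of("un", "pre"), bases);
        Map<String, String> tags = Map.of("unhappy|happy", "yes");
        tagPairs(pairs, tags).forEach(System.out::println);
        // prints:
        // undo|do|tbd
        // unhappy|happy|yes
    }
}
```

Note that "under" is not paired because "der" is not a base form, which is the point of requiring both sides to be in the base-form list.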
- Generate various reports from ./data/prefixWords.meta.data by tag
  - Program: GeneratePrefixFiles.java
  - Input:
    - ./data/prefixWords.meta.data
  - Output:
    - ./data/prefix.tbd.data
    - ./data/prefixWords.data
    - ./data/prefix.newTag.data
  - Descriptions
    - go through all pairs of tagged prefixWord|words in prefixWords.meta.data
      - send all "tbd" tags to prefix.tbd.data
      - send all "yes" and "no" tags to prefix.newTag.data
      - send all "yes" tags to prefixWords.data
    - Check for invalid tags
    - Check all comment lines
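The routing by tag described above can be sketched as follows. This is a hypothetical illustration, not the actual GeneratePrefixFiles.java: it assumes the tag is the last pipe-delimited field of each line, writes whole lines unchanged to each report, and collects unrecognized tags in an "invalid" bucket that is not a file named in the source. Comment-line handling is left out because the source does not describe it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TagSplitSketch {
    static final String TBD = "prefix.tbd.data";
    static final String NEW_TAG = "prefix.newTag.data";
    static final String PAIRS = "prefixWords.data";
    static final String INVALID = "invalid"; // hypothetical bucket, not a file in the source

    /** Groups tagged lines by the report file they would be sent to. */
    static Map<String, List<String>> split(List<String> metaLines) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (String name : List.of(TBD, NEW_TAG, PAIRS, INVALID)) {
            out.put(name, new ArrayList<>());
        }
        for (String line : metaLines) {
            // Assumed: the tag is everything after the last '|'.
            String tag = line.substring(line.lastIndexOf('|') + 1);
            switch (tag) {
                case "tbd":
                    out.get(TBD).add(line);
                    break;
                case "yes":
                    out.get(NEW_TAG).add(line);
                    out.get(PAIRS).add(line);
                    break;
                case "no":
                    out.get(NEW_TAG).add(line);
                    break;
                default:
                    out.get(INVALID).add(line); // flag anything that is not yes/no/tbd
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample lines.
        List<String> meta = List.of(
            "unhappy|happy|yes", "undo|do|tbd", "under|der|no", "preview|view|maybe");
        split(meta).forEach((file, lines) -> System.out.println(file + ": " + lines));
    }
}
```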
- Validate results:
  - Program: 2.GetPrefixWords
  - Input:
    - ./data/prefix.tag.data
    - ./data/prefix.newTag.data
    - ./data/prefixWords.data.new
  - Output:
    - ./data/prefix.tag.data.noComment.sort
    - ./data/prefix.newTag.data.all.sort
  - Descriptions
    - Remove all comment lines from prefix.tag.data
      - fgrep -v '#' prefix.tag.data > prefix.tag.data.noComment
      - sort -u prefix.tag.data.noComment > prefix.tag.data.noComment.sort
    - Combine results and new prefixWords (will be added in the future)
      - cat prefix.newTag.data prefixWords.data.new > prefix.newTag.data.all
      - sort -u prefix.newTag.data.all > prefix.newTag.data.all.sort
    - Compare the input and result tagged files
      - diff prefix.tag.data.noComment.sort prefix.newTag.data.all.sort > prefix.tag.diff
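The shell steps above amount to a set comparison: after comments are dropped and duplicates removed, the old tag file and the combined new file should contain exactly the same lines. A rough Java equivalent is sketched below; the class and method names are hypothetical, and it reports lines that appear on only one side rather than reproducing the exact output format of diff.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class TagDiffSketch {
    /** Mirrors fgrep -v '#' followed by sort -u: drops lines containing '#', de-duplicates. */
    static TreeSet<String> withoutComments(List<String> lines) {
        TreeSet<String> kept = new TreeSet<>();
        for (String line : lines) {
            if (!line.contains("#")) {
                kept.add(line);
            }
        }
        return kept;
    }

    /** Lines found on one side only: "< " for the old tag file, "> " for the new files. */
    static List<String> differences(List<String> oldTags, List<String> newTags,
                                    List<String> futurePairs) {
        TreeSet<String> left = withoutComments(oldTags);
        TreeSet<String> right = new TreeSet<>(newTags); // cat + sort -u
        right.addAll(futurePairs);
        List<String> diffs = new ArrayList<>();
        for (String line : left) {
            if (!right.contains(line)) {
                diffs.add("< " + line);
            }
        }
        for (String line : right) {
            if (!left.contains(line)) {
                diffs.add("> " + line);
            }
        }
        return diffs;
    }

    public static void main(String[] args) {
        // Made-up sample data.
        List<String> oldTags = List.of("# reviewed tags", "unhappy|happy|yes", "under|der|no");
        List<String> newTags = List.of("under|der|no", "unhappy|happy|yes");
        System.out.println(differences(oldTags, newTags, List.of()).size()); // prints 0
    }
}
```

An empty result corresponds to an empty prefix.tag.diff, meaning every previously tagged pair was regenerated with the same tag.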
- Usage for (future) releases:
  - update inflVars.data from the new release of LEXICON
  - update prefix.data
  - update prefixWords.data.new (for new prefix words that are not in this release)
  - ./bin/1.GetBaseForms ${YEAR}
  - ./bin/2.GetPrefixWords ${YEAR}
  - Check the number of lines in prefix.tag.diff (should be 0)
  - prefixWords.data (to be added to derivations.data)
  - prefix.tbd.data (send to linguists for validation)