LVG Trie File Format
I. File Distribution (rule files)
- im.rul
- plural.rul
- verbinfl.rul
- dm.rul
II. File Format
The original format specifies four types of data in a file:
- Comments: lines start with #
- Rules: lines with the format
    key | Input Category | Input Inflection
    | value | Output Category | Output Inflection
- Exceptions: lines start with a space, followed by key | value;
- Included files with additional rules: lines start with #include "file name"
This format works, but it is not intuitive for humans to read and is therefore hard to maintain. A new format is proposed to improve maintainability:
- Comments: lines start with #
- Rules: lines start with RULE:
- Exceptions: lines start with EXCEPTION:
- Another file: lines start with FILE:
- Category and Inflection abbreviations are standardized to be consistent across all LVG components:
  - Category: noun, verb, adj, adv
  - Inflection:
      base
      singular, plural
      infinitive, pres, past, presPart, pastPart
      positive, comparative, superlative
< Example >
# This is a comment
RULE: e$|adj|positive|er$|adj|comparative
EXCEPTION: inhale|inhaler;
FILE: verbinfl.rul
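A minimal sketch of a reader for the proposed format, written in Python for illustration only. The names `Rule` and `parse_rule_file` are hypothetical and are not part of LVG. The sketch attaches each exception to the rule located right above it, checks categories and inflections against the standardized abbreviations listed above, and only records FILE: names rather than loading the included files.

```python
from dataclasses import dataclass, field

# Allowed values, taken from the standardized category and inflection lists.
CATEGORIES = {"adj", "adv", "noun", "verb"}
INFLECTIONS = {"base", "singular", "plural", "infinitive", "pres", "past",
               "presPart", "pastPart", "positive", "comparative", "superlative"}


@dataclass
class Rule:  # hypothetical container, not an LVG class
    in_suffix: str
    in_cat: str
    in_infl: str
    out_suffix: str
    out_cat: str
    out_infl: str
    # Exception pairs (key -> value) listed directly below this rule.
    exceptions: dict = field(default_factory=dict)


def parse_rule_file(text):
    """Return (rules, included_file_names) for text in the proposed format."""
    rules, includes = [], []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("RULE:"):
            fields = [f.strip() for f in line[len("RULE:"):].split("|")]
            if len(fields) != 6:
                raise ValueError(f"line {line_no}: expected 6 '|'-separated fields")
            rule = Rule(*fields)
            if {rule.in_cat, rule.out_cat} - CATEGORIES:
                raise ValueError(f"line {line_no}: unknown category")
            if {rule.in_infl, rule.out_infl} - INFLECTIONS:
                raise ValueError(f"line {line_no}: unknown inflection")
            rules.append(rule)
        elif line.startswith("EXCEPTION:"):
            if not rules:
                raise ValueError(f"line {line_no}: exception before any rule")
            body = line[len("EXCEPTION:"):].strip().rstrip(";")
            parts = [p.strip() for p in body.split("|")]
            if len(parts) != 2:
                raise ValueError(f"line {line_no}: expected key|value;")
            # An exception belongs to the rule located right above it.
            rules[-1].exceptions[parts[0]] = parts[1]
        elif line.startswith("FILE:"):
            # Included file names are recorded here, not loaded.
            includes.append(line[len("FILE:"):].strip())
        else:
            raise ValueError(f"line {line_no}: unrecognized line type")
    return rules, includes


sample = """# This is a comment
RULE: e$|adj|positive|er$|adj|comparative
EXCEPTION: inhale|inhaler;
FILE: verbinfl.rul
"""
rules, includes = parse_rule_file(sample)
print(rules[0].in_suffix, rules[0].out_suffix, rules[0].exceptions, includes)
# e$ er$ {'inhale': 'inhaler'} ['verbinfl.rul']
```

Keeping each line type behind an explicit prefix is what makes this kind of reader simple: the line's role is decided by its first token rather than by leading whitespace, as in the original format.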
III. File Characteristics
- All rules and exceptions are bi-directional (forward and reverse)
- key is used as a pattern to match the input term
- value is used as a pattern to transform the input term into the output term
- Exceptions apply to the rule located immediately above them
- The key of a rule is called the input suffix
- The value of a rule is called the output suffix
- Wildcards (such as $, which marks the end of the term) may be used in the input suffix and output suffix
- input category must be one of: adj, adv, noun or verb
- input inflection must be one of: base, singular, positive, infinitive, plural, comparative, superlative, pres, presPart, past, pastPart
- output category must be one of: adj, adv, noun or verb
- output inflection must be one of: base, singular, positive, infinitive, plural, comparative, superlative, pres, presPart, past, pastPart
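The bidirectional characteristics above can be illustrated by applying a single rule forward and in reverse. This is a minimal sketch under stated assumptions, not LVG's implementation: `SuffixRule` and `apply_rule` are hypothetical names, only the `$` end-of-term wildcard is modeled, and an exception pair is assumed to suppress the rule for that pair (matched on the key going forward and on the value in reverse).

```python
from typing import NamedTuple, Optional, Tuple


class SuffixRule(NamedTuple):  # hypothetical, mirrors key|inCat|inInfl|value|outCat|outInfl
    in_suffix: str
    in_cat: str
    in_infl: str
    out_suffix: str
    out_cat: str
    out_infl: str
    exceptions: dict  # key (input term) -> value (output term)


def _literal(suffix: str) -> str:
    """Strip the '$' anchor; other LVG wildcards are not modeled in this sketch."""
    if not suffix.endswith("$"):
        raise ValueError(f"only '$'-anchored suffixes are modeled: {suffix!r}")
    return suffix[:-1]


def apply_rule(rule: SuffixRule, term: str,
               reverse: bool = False) -> Optional[Tuple[str, str, str]]:
    """Return (new_term, category, inflection), or None if the rule does not apply."""
    if reverse:
        # Reverse: match the output suffix and restore the input suffix.
        pattern, replacement = rule.out_suffix, rule.in_suffix
        cat, infl = rule.in_cat, rule.in_infl
        blocked = set(rule.exceptions.values())
    else:
        pattern, replacement = rule.in_suffix, rule.out_suffix
        cat, infl = rule.out_cat, rule.out_infl
        blocked = set(rule.exceptions)
    if term in blocked:
        return None  # assumption: an exception pair suppresses the rule
    tail = _literal(pattern)
    if not term.endswith(tail):
        return None
    stem = term[: len(term) - len(tail)]
    return stem + _literal(replacement), cat, infl


rule = SuffixRule("e$", "adj", "positive", "er$", "adj", "comparative",
                  {"inhale": "inhaler"})
print(apply_rule(rule, "fine"))                 # ('finer', 'adj', 'comparative')
print(apply_rule(rule, "finer", reverse=True))  # ('fine', 'adj', 'positive')
print(apply_rule(rule, "inhale"))               # None
```

In reverse mode the output suffix acts as the match pattern and the input suffix as the replacement, and the result carries the input category and inflection, which reflects the key/value roles described in the list above.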