Lexical Tools

LVG Trie File Format

I. File Distribution (data for rules)

  • im.rul
  • plural.rul
  • verbinfl.rul

  • dm.rul

II. File Format

A rule file contains four types of data:

  1. Comments: lines that start with #
  2. Rules: lines of the form
    key | Input Category | Input Inflection | value | Output Category | Output Inflection
  3. Exceptions: lines that start with a space, followed by key | value;
  4. Included files with additional rules: lines of the form #include "file name"
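A minimal Python sketch of how a reader might classify lines in the original format, assuming fields are separated by "|" with optional surrounding spaces. The function name classify_old_line is hypothetical. Because #include lines also begin with #, the sketch checks for #include before treating a line as a comment.

```python
import re

# Hypothetical classifier for the original rule-file syntax.
# "#include" is tested before the generic "#" comment check,
# because both kinds of line start with "#".
INCLUDE_RE = re.compile(r'^#include\s+"([^"]+)"')


def classify_old_line(line):
    """Return (kind, payload) for one line of an old-format rule file."""
    line = line.rstrip("\n")
    m = INCLUDE_RE.match(line)
    if m:
        return ("include", m.group(1))
    if line.startswith("#"):
        return ("comment", line[1:].strip())
    if not line.strip():
        return ("blank", None)
    if line.startswith(" "):
        # Exception: leading space, then "key | value;"
        key, value = (f.strip() for f in line.strip().rstrip(";").split("|"))
        return ("exception", (key, value))
    # Rule: six "|"-separated fields
    fields = tuple(f.strip() for f in line.split("|"))
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    return ("rule", fields)
```

For example, classify_old_line(" inhale | inhaler;") yields ("exception", ("inhale", "inhaler")).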

This format works, but it is not intuitive to read and is therefore hard to maintain. A new format has been proposed and adopted to improve maintainability:

  1. Comments: lines that start with #
  2. Rules: lines that start with RULE:
  3. Exceptions: lines that start with EXCEPTION:
  4. Included files: lines that start with FILE:
  5. The abbreviations for categories and inflections are standardized to be consistent across all LVG components:
    • Category: noun, verb, adj, adv
    • Inflection:
      base
      singular, plural
      infinitive, pres, past, presPart, pastPart
      positive, comparative, superlative
< Example >
# This is a comment
RULE: e$|adj|positive|er$|adj|comparative
EXCEPTION: inhale|inhaler;
FILE: verbinfl.rul
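Because the new syntax keeps the same field order and changes only the line prefixes, converting an old file line by line is mechanical. The Python sketch below (convert_line is a hypothetical name) assumes comments and blank lines are copied unchanged. It does not rename category or inflection abbreviations, since the old abbreviations are not listed here.

```python
import re

# Hypothetical old-to-new converter: only line prefixes change.
INCLUDE_RE = re.compile(r'^#include\s+"([^"]+)"\s*$')


def convert_line(line):
    """Convert one old-format line to the RULE:/EXCEPTION:/FILE: syntax."""
    line = line.rstrip("\n")
    m = INCLUDE_RE.match(line)
    if m:
        return f"FILE: {m.group(1)}"
    if line.startswith("#") or not line.strip():
        return line  # comment or blank line: copied unchanged
    if line.startswith(" "):
        # Old exception " key | value;" -> "EXCEPTION: key|value;"
        fields = (f.strip() for f in line.strip().rstrip(";").split("|"))
        return "EXCEPTION: " + "|".join(fields) + ";"
    # Old rule: six "|"-separated fields, spaces around "|" removed
    return "RULE: " + "|".join(f.strip() for f in line.split("|"))
```

Applied to each line of an old file in turn, this produces a file in the same shape as the example above.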

III. File Characteristics

  • All rules and exceptions are bidirectional (they apply in both the forward and the reverse direction)
  • key is used as a pattern to match the input term
  • value is used as a pattern to transform the matched input term into the output term
  • Exceptions apply to the rule located immediately above them
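The following Python sketch illustrates forward and reverse application of a single rule, such as RULE: e$|adj|positive|er$|adj|comparative with its exception inhale|inhaler. It rests on two simplifying assumptions: $ (end of term) is the only wildcard handled, and an exception pair means the rule is not applied to that pair. Rule and apply_rule are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical in-memory form of one RULE: line and its exceptions."""
    in_suffix: str
    in_cat: str
    in_infl: str
    out_suffix: str
    out_cat: str
    out_infl: str
    exceptions: set = field(default_factory=set)  # {(key, value), ...}


def _swap_suffix(term, old, new):
    """Replace suffix `old` with `new`; "$" (end of term) is stripped first.
    Returns None if the term lacks the suffix or the stem would be empty."""
    old, new = old.rstrip("$"), new.rstrip("$")
    if not term.endswith(old) or len(term) == len(old):
        return None
    return term[:len(term) - len(old)] + new


def apply_rule(rule, term, cat, infl, reverse=False):
    """Apply a rule forward (input -> output) or in reverse (output -> input).
    Returns (new_term, new_cat, new_infl), or None if the rule does not fire."""
    src = (rule.in_suffix, rule.in_cat, rule.in_infl)
    dst = (rule.out_suffix, rule.out_cat, rule.out_infl)
    if reverse:
        src, dst = dst, src
    if (cat, infl) != src[1:]:
        return None
    result = _swap_suffix(term, src[0], dst[0])
    if result is None:
        return None
    # Exceptions are stored as (key, value), i.e. in forward order.
    pair = (result, term) if reverse else (term, result)
    if pair in rule.exceptions:
        return None  # assumed meaning: the rule is blocked for this pair
    return (result, dst[1], dst[2])
```

Under these assumptions, apply_rule maps large to larger, maps larger back to large in reverse, and leaves inhale/inhaler untouched in both directions.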

  • key in a rule is called the input suffix
  • value in a rule is called the output suffix
  • Wildcards may be used in both the input suffix and the output suffix
  • input category must be one of: adj, adv, noun or verb
  • input inflection must be one of: base, singular, positive, infinitive, plural, comparative, superlative, pres, presPart, past, pastPart
  • output category must be one of: adj, adv, noun or verb
  • output inflection must be one of: base, singular, positive, infinitive, plural, comparative, superlative, pres, presPart, past, pastPart
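A Python sketch of a loader for the new format that checks categories and inflections against the lists above and attaches each EXCEPTION: line to the nearest preceding RULE: line. parse_rules is a hypothetical name; FILE: entries are collected but not followed, and suffix wildcards are not interpreted.

```python
# Allowed values, copied from the lists above.
CATEGORIES = {"adj", "adv", "noun", "verb"}
INFLECTIONS = {"base", "singular", "positive", "infinitive", "plural",
               "comparative", "superlative", "pres", "presPart",
               "past", "pastPart"}


def parse_rules(lines):
    """Parse new-format lines into (rules, files).

    Each rule is a dict with its six fields and a list of exception pairs.
    EXCEPTION: lines attach to the most recent RULE: line."""
    rules, files = [], []
    for n, raw in enumerate(lines, 1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("RULE:"):
            fields = [f.strip() for f in line[len("RULE:"):].split("|")]
            if len(fields) != 6:
                raise ValueError(f"line {n}: RULE needs 6 fields")
            _, in_cat, in_infl, _, out_cat, out_infl = fields
            for cat in (in_cat, out_cat):
                if cat not in CATEGORIES:
                    raise ValueError(f"line {n}: bad category {cat!r}")
            for infl in (in_infl, out_infl):
                if infl not in INFLECTIONS:
                    raise ValueError(f"line {n}: bad inflection {infl!r}")
            rules.append({"fields": tuple(fields), "exceptions": []})
        elif line.startswith("EXCEPTION:"):
            if not rules:
                raise ValueError(f"line {n}: EXCEPTION before any RULE")
            body = line[len("EXCEPTION:"):].strip().rstrip(";")
            key, value = (f.strip() for f in body.split("|"))
            rules[-1]["exceptions"].append((key, value))
        elif line.startswith("FILE:"):
            files.append(line[len("FILE:"):].strip())
        else:
            raise ValueError(f"line {n}: unrecognized line {line!r}")
    return rules, files
```

Rejecting unknown categories or inflections at load time keeps a typo such as adjective instead of adj from silently producing a rule that never fires.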