prevariants
Description:
Prevariants tell whether a lower-cased form is, or could be, an acronym or abbreviation.
Fields:
Field | Name | Notes
---|---|---
1 | FORM | Inflected variant, lower cased, unique
2 | FLAG | 0: FORM is never an abbreviation or acronym (dog); 1: FORM is sometimes an abbreviation or acronym (aids, yes, bras); 2: FORM is always an abbreviation or acronym (aca)
3 | SCA | Syntactic category (combined value)
Notes:
Many of the ambiguities appear to result from lower-casing (AIDS/aids). For example:
dog|0|1152
aids|1|1152
nih|2|128
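As a rough illustration of the record layout, the following sketch reads one pipe-delimited line into its three fields and splits the combined category value into its set bits. The class name `PrevariantEntry` and its methods are hypothetical and not part of the original tool. The sketch also assumes that each syntactic category occupies a distinct bit of the combined value; the source does not list which bit belongs to which category, so no names are attached to them here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder for one prevariants line; not part of the original tool. */
final class PrevariantEntry {
    final String form;  // field 1: lower-cased inflected form
    final int flag;     // field 2: 0 = never, 1 = sometimes, 2 = always an abbreviation
    final int category; // field 3: combined syntactic category value

    PrevariantEntry(String form, int flag, int category) {
        this.form = form;
        this.flag = flag;
        this.category = category;
    }

    /** Parses a line such as "aids|1|1152". */
    static PrevariantEntry parse(String line) {
        String[] fields = line.split("\\|", -1);
        if (fields.length != 3) {
            throw new IllegalArgumentException("Expected 3 fields: " + line);
        }
        return new PrevariantEntry(
                fields[0], Integer.parseInt(fields[1]), Integer.parseInt(fields[2]));
    }

    /** Returns the individual bits set in the combined category (assumed bit flags). */
    List<Integer> categoryBits() {
        List<Integer> bits = new ArrayList<>();
        for (int bit = 1; bit > 0 && bit <= category; bit <<= 1) {
            if ((category & bit) != 0) {
                bits.add(bit);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        PrevariantEntry e = parse("dog|0|1152");
        // prints: dog flag=0 bits=[128, 1024]
        System.out.println(e.form + " flag=" + e.flag + " bits=" + e.categoryBits());
    }
}
```

Under that assumption, 1152 decomposes into 128 and 1024, which would mean "dog" and "aids" each belong to two categories while "nih" belongs to one.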
Algorithm:
- Go through the input file (the Lexicon) and put all lexical records into a Vector of LexRecord Java objects.
- Go through all lexical record objects:
  - Get the lower-cased inflectional variants and put them into forms. Go through all forms:
    - If the form does not exist in the hash table, preVars:
      => Put the form as the key in the hash table.
      => Instantiate a preVar that includes form, cat, abb, and nonAbb.
      => Put the preVar into the hash table.
    - If the form exists in the hash table:
      => Get the preVar out of the hash table.
      => Update the preVar with a logical OR (|) on cat, abb, and nonAbb.
      => Put the preVar into the hash table.
- Go through the hash table, preVars, to print:
  field 1: form (inflected, lower cased, unique)
  field 2:

  Value | Condition | Notes
  ---|---|---
  0 | abb = false | Never an abbreviation
  1 | abb = true and nonAbb = true | Sometimes an abbreviation
  2 | abb = true and nonAbb = false | Always an abbreviation

  field 3: category (combined number)
- Sort the result (done with the Unix sort command).
- Generate an ASCII-only file.
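The merge and flag steps above can be sketched roughly as follows. This is a minimal illustration, not the original implementation: the class `PreVarSketch`, its `add` and `toLines` methods, and the boolean `isAbb` input are hypothetical names, and the sketch assumes that each lexical record already tells us whether it is an acronym or abbreviation. A `TreeMap` stands in for the hash table plus the separate Unix sort step, so its ordering (by UTF-16 code unit) may differ from a locale-aware `sort`. Reading the Lexicon and the ASCII-only filter are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of the preVars accumulation; not the original code. */
final class PreVarSketch {

    /** Accumulated state for one lower-cased form. */
    static final class PreVar {
        int cat;        // OR of all category values seen for this form
        boolean abb;    // true if any record with this form is an abbreviation/acronym
        boolean nonAbb; // true if any record with this form is not one

        void merge(int recordCat, boolean isAbb) {
            cat |= recordCat;
            abb |= isAbb;
            nonAbb |= !isAbb;
        }

        /** Maps (abb, nonAbb) to the field 2 value from the table above. */
        int flag() {
            if (!abb) {
                return 0;          // never an abbreviation
            }
            return nonAbb ? 1 : 2; // sometimes : always
        }
    }

    // Sorted map used here instead of a hash table followed by an external sort.
    private final Map<String, PreVar> preVars = new TreeMap<>();

    /** Adds one inflectional variant of a lexical record. */
    void add(String inflectedForm, int recordCat, boolean isAbb) {
        String form = inflectedForm.toLowerCase(Locale.ROOT);
        preVars.computeIfAbsent(form, k -> new PreVar()).merge(recordCat, isAbb);
    }

    /** Renders each entry as form|flag|category. */
    List<String> toLines() {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, PreVar> e : preVars.entrySet()) {
            PreVar p = e.getValue();
            lines.add(e.getKey() + "|" + p.flag() + "|" + p.cat);
        }
        return lines;
    }

    public static void main(String[] args) {
        PreVarSketch sketch = new PreVarSketch();
        // Illustrative inputs only; the category values are taken from the examples above.
        sketch.add("AIDS", 128, true);
        sketch.add("aids", 128, false);
        sketch.add("aids", 1024, false);
        sketch.add("NIH", 128, true);
        sketch.toLines().forEach(System.out::println);
    }
}
```

With these illustrative inputs, "AIDS" and "aids" collapse into a single entry whose flag is 1 because both an abbreviation and a non-abbreviation record contribute to it, while "nih" keeps flag 2.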