The SPECIALIST Lexicon

prevariants

Descriptions:

Prevariants tells whether a lower-cased form is, or could be, an acronym or abbreviation.

Fields:

Field | Name | Notes
1 | FORM | Inflected variant, lower cased, unique
2 | FLAG | Whether FORM is an abbreviation or acronym:
  • 0: means FORM is never an abbreviation or acronym (dog)
  • 1: means FORM is sometimes an abbreviation or acronym (aids, yes, bras)
  • 2: means FORM is always an abbreviation or acronym (aca)
3 | SCA | Syntactic category (combined value)
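
The combined value in field 3 can be taken apart again if it is a bitwise OR of per-category values, which is what the OR update in the algorithm section suggests. The following is a minimal sketch under that assumption; the class and method names are hypothetical and are not part of the SPECIALIST tools.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative helper (hypothetical name) for working with a combined category value. */
final class CombinedCategory {

    /** Splits a combined value into its individual set bits, lowest bit first. */
    static List<Integer> components(int combined) {
        List<Integer> bits = new ArrayList<>();
        int rest = combined;
        while (rest != 0) {
            int bit = Integer.lowestOneBit(rest); // isolate the lowest set bit
            bits.add(bit);
            rest &= ~bit;                         // clear that bit and continue
        }
        return bits;
    }

    /** Returns true if the given single-category bit is part of the combined value. */
    static boolean includes(int combined, int categoryBit) {
        return (combined & categoryBit) != 0;
    }
}
```

For instance, CombinedCategory.components(1152) yields the two values 128 and 1024, so a combined value of 1152 would stand for two categories.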

Notes:

Many of the ambiguities seem to be the result of lower-casing (AIDS/aids). For example:

dog|0|1152
aids|1|1152
nih|2|128
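
A consumer of this table might read each row as follows. This is only a sketch: it assumes one record per line with exactly three fields separated by "|", as in the rows above, and the class and method names are hypothetical.

```java
/** One row of the prevariants table, FORM|FLAG|SCA (illustrative class, not from the Lexicon tools). */
final class PrevariantRecord {
    final String form;   // field 1: lower-cased inflected form
    final int flag;      // field 2: 0 = never, 1 = sometimes, 2 = always an abbreviation or acronym
    final int category;  // field 3: combined syntactic category value

    PrevariantRecord(String form, int flag, int category) {
        this.form = form;
        this.flag = flag;
        this.category = category;
    }

    /** Parses a single "form|flag|category" line; rejects malformed input. */
    static PrevariantRecord parse(String line) {
        String[] fields = line.split("\\|", -1); // -1 keeps empty trailing fields
        if (fields.length != 3) {
            throw new IllegalArgumentException("expected 3 fields: " + line);
        }
        int flag = Integer.parseInt(fields[1]);
        if (flag < 0 || flag > 2) {
            throw new IllegalArgumentException("flag must be 0, 1 or 2: " + line);
        }
        return new PrevariantRecord(fields[0], flag, Integer.parseInt(fields[2]));
    }

    /** True when the form is sometimes or always an abbreviation or acronym. */
    boolean couldBeAbbreviation() {
        return flag != 0;
    }

    /** True when the form is always an abbreviation or acronym. */
    boolean isAlwaysAbbreviation() {
        return flag == 2;
    }
}
```

With the rows above, PrevariantRecord.parse("aids|1|1152").couldBeAbbreviation() is true, while the same call on "dog|0|1152" is false.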

Algorithm:

  • Go through the input file (the Lexicon) and put all lexical records into a Vector of LexRecord Java objects.
  • Go through all lexical record objects:
    • Get the lower-cased inflectional variants and put them into forms. Go through all forms:
      • If the form does not exist in the hash table, preVars:
        => Put the form as key in the hash table
        => Instantiate a preVar that includes form, cat, abb, and nonAbb
        => Put the preVar into the hash table

      • If the form exists in the hash table:
        => Get the preVar out of the hash table
        => Update the preVar with a logical OR (|) on cat, abb, and nonAbb
        => Put the preVar into the hash table
  • Go through the hash table, preVars, to print:
    field 1: form (inflected, lower cased, unique)
    field 2:
    Value | Condition | Notes
    0 | abb = false | Never an abbreviation
    1 | abb = true and nonAbb = true | Sometimes an abbreviation
    2 | abb = true and nonAbb = false | Always an abbreviation

    field 3: category (combined number)
  • Sort the result (done with the Unix sort command).
  • Generate an ASCII-only file.
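
The accumulation and flag derivation above can be sketched in Java roughly as follows. This is an illustration under stated assumptions, not the actual generator: the Observation class is a hypothetical stand-in for whatever a LexRecord exposes (one inflected form, its category value, and whether that entry is an abbreviation or acronym), computeIfAbsent replaces the explicit get/update/put steps, an in-memory sort replaces the Unix sort command, and the ASCII-only filtering step is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class PreVarSketch {

    /** Hypothetical input: one inflected form taken from one lexical record. */
    static final class Observation {
        final String form;
        final int cat;                 // category value of the record
        final boolean isAbbreviation;  // true if the record is an abbreviation or acronym

        Observation(String form, int cat, boolean isAbbreviation) {
            this.form = form;
            this.cat = cat;
            this.isAbbreviation = isAbbreviation;
        }
    }

    /** Accumulator kept per lower-cased form. */
    static final class PreVar {
        int cat;         // OR of all categories seen for the form
        boolean abb;     // seen at least once as an abbreviation or acronym
        boolean nonAbb;  // seen at least once as an ordinary word

        void update(int cat, boolean isAbbreviation) {
            this.cat |= cat;
            this.abb |= isAbbreviation;
            this.nonAbb |= !isAbbreviation;
        }

        /** Field 2: 0 = never, 1 = sometimes, 2 = always an abbreviation. */
        int flag() {
            if (!abb) {
                return 0;
            }
            return nonAbb ? 1 : 2;
        }
    }

    /** Builds sorted "form|flag|category" lines from the observations. */
    static List<String> build(List<Observation> observations) {
        Map<String, PreVar> preVars = new HashMap<>();
        for (Observation o : observations) {
            String form = o.form.toLowerCase(Locale.ROOT);
            // Create the accumulator on first sight of the form, then OR in the new values.
            preVars.computeIfAbsent(form, k -> new PreVar()).update(o.cat, o.isAbbreviation);
        }
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, PreVar> e : preVars.entrySet()) {
            PreVar p = e.getValue();
            lines.add(e.getKey() + "|" + p.flag() + "|" + p.cat);
        }
        Collections.sort(lines); // stand-in for the external sort step
        return lines;
    }
}
```

As a usage example, feeding in "AIDS" (category 128, abbreviation) together with "aids" (categories 128 and 1024, not abbreviations) produces the single line aids|1|1152, because both abb and nonAbb end up true after lower-casing. The category values 128 and 1024 are chosen here only so that the result matches the example rows in the Notes section.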