The SPECIALIST Lexicon

Antonym Generation

This page describes the antonym generation design, implementation, and processes as follows:

  • ${ANTONYM}/input/antCand.data.tag.${YEAR}

    This is the latest accumulated, tagged antonym candidate file. Antonym candidates are gathered from 6 different sources (LEX|SD|PD|TT|CC|SN), and every candidate is tagged for each source. Candidates from source TT are further assigned to one of the other sources (LEX|SD|PD|CC|SN). This file includes both valid and invalid antonym candidates. It is used as a baseline for:

    • tagging for future releases
    • generating antonyms for current release

  • ${ANTONYM}/output/aPair.data.${YEAR}

    aPairs are generated from the above tagged file. Antonym candidates with a [Y] tag (canonical antonym) and valid fields are used to generate aPairs.

    Spelling variants (spVars) are added in this step. Different spVars sometimes have slightly different meanings (and thus could theoretically have different aPairs), but that is also the case for base forms without spVars (such as "bank" being a riverbank or a financial institution).

    For example, Hell and hell are spVars of the same entry, E0031052. We had interpreted Hell as slightly more of a "location" (e.g. "The Bible talks about sinners going to Hell") and hell as more of a "quality" ("Traffic today was hell"), based only on the way they are typically seen in examples. In fact, both hell and Hell are ambiguous between the two uses.

    So although it is sometimes hard to decide on the "best" tag, whatever tag is used should apply to the entire set of spVars. It is therefore reasonable to conclude that if colorful and colorless are an aPair, so are all the permutations of {color/colour}-ful & -less.
    That is, four aPairs are generated from the tagged antonym candidate: colorful|E0017909|colorless|E0017911|adj|Y|AB2|O|quality|SD
    The output aPairs with spVars are:

    • colorful|E0017909|colorless|E0017911|adj|AB2|O|quality|SD
    • colorful|E0017909|colourless|E0017911|adj|AB2|O|quality|SD
    • colourful|E0017909|colorless|E0017911|adj|AB2|O|quality|SD
    • colourful|E0017909|colourless|E0017911|adj|AB2|O|quality|SD
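
    The spVar expansion above can be sketched in Python as follows. This is a minimal illustration, not the actual generation program: the SPVARS table and the expand_apairs function are hypothetical names, the spVar lists are hard-coded only for this one example, and the field positions are assumed from the sample candidate line shown above.

```python
from itertools import product

# Hypothetical spVar lookup keyed by EUI. In practice the spelling variants
# would come from the Lexicon; these two entries cover only the example.
SPVARS = {
    "E0017909": ["colorful", "colourful"],
    "E0017911": ["colorless", "colourless"],
}


def expand_apairs(candidate_line, spvars=SPVARS):
    """Expand one tagged antonym candidate into aPairs for all spVars.

    Assumed field layout (taken from the example candidate line):
    word1|EUI1|word2|EUI2|POS|tag|...remaining fields...
    """
    fields = candidate_line.strip().split("|")
    word1, eui1, word2, eui2, pos, tag = fields[:6]
    rest = fields[6:]

    # Only candidates tagged [Y] (canonical antonym) produce aPairs.
    if tag != "Y":
        return []

    # Fall back to the word itself when no spVars are known for the EUI.
    forms1 = spvars.get(eui1, [word1])
    forms2 = spvars.get(eui2, [word2])

    # Every combination of spVars forms an aPair; the tag field is dropped.
    return [
        "|".join([f1, eui1, f2, eui2, pos] + rest)
        for f1, f2 in product(forms1, forms2)
    ]


if __name__ == "__main__":
    line = "colorful|E0017909|colorless|E0017911|adj|Y|AB2|O|quality|SD"
    for apair in expand_apairs(line):
        print(apair)
```

    Using the Cartesian product keeps the tag and all other fields identical across the whole set of spVars, which matches the decision described above.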

  • ${ANTONYM}/output/antonyms.data.${YEAR}

    This file is the final delivered antonym file. It is used in the LVG database. It is generated from the above aPair.data.${YEAR} file by the following operations:

    • adds a key field (1st field, LC & NoPunc)
    • converts POS to number
    • converts 1-way aPairs to 2-way antonyms
    • unifies all antonyms
    • reassigns the source if an antonym is duplicated among sources (LEX > SD > PD > CC > SN)
    • finds issues, such as tag conflicts, tag duplicates, and source conflicts
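
    A rough sketch of the first five operations is given below, under several assumptions: the output field order (key first, then the aPair fields) is invented for illustration, "NoPunc" is read as removing ASCII punctuation, "LEX > SD > PD > CC > SN" is read as "the leftmost source wins", and the POS numbers are assumed values that should be checked against the LVG category codes. The names make_key, rank and to_antonyms are hypothetical.

```python
import string

# Source priority, highest first (assumed reading of LEX > SD > PD > CC > SN).
SOURCE_PRIORITY = ["LEX", "SD", "PD", "CC", "SN"]

# Assumed POS-to-number mapping; verify against the LVG category codes.
POS_CODE = {
    "adj": 1, "adv": 2, "aux": 4, "conj": 16, "det": 32,
    "modal": 64, "noun": 128, "prep": 256, "pron": 512, "verb": 1024,
}

_NO_PUNC = str.maketrans("", "", string.punctuation)


def make_key(word):
    """Key field: lowercase the word and remove ASCII punctuation."""
    return word.lower().translate(_NO_PUNC)


def rank(source):
    """Lower rank means higher priority."""
    return SOURCE_PRIORITY.index(source)


def to_antonyms(apair_lines):
    """Convert 1-way aPairs into unified 2-way antonym records.

    Assumed aPair layout: word1|EUI1|word2|EUI2|POS|...|source
    """
    best = {}
    for line in apair_lines:
        w1, e1, w2, e2, pos, *rest = line.strip().split("|")
        source = rest[-1]
        # Emit both directions so that each word can be looked up by key.
        for a, ea, b, eb in ((w1, e1, w2, e2), (w2, e2, w1, e1)):
            ident = (a, ea, b, eb, pos)
            record = [make_key(a), a, ea, b, eb, str(POS_CODE[pos])] + rest
            kept = best.get(ident)
            # Unify duplicates, keeping the record from the preferred source.
            if kept is None or rank(source) < rank(kept[-1]):
                best[ident] = record
    return ["|".join(r) for r in sorted(best.values())]


if __name__ == "__main__":
    apairs = [
        "colorful|E0017909|colorless|E0017911|adj|AB2|O|quality|SD",
        "colorful|E0017909|colorless|E0017911|adj|AB2|O|quality|LEX",
    ]
    for antonym in to_antonyms(apairs):
        print(antonym)
```

    Conflict detection (tag conflicts, tag duplicates, source conflicts) is not covered by this sketch.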

  • ${ANTONYM}/output/analysis/

    This directory contains analyses of the generated antonyms. It includes:

    • domain.out.cand.canon: valid domain names used
    • *.stats:
      statistical data on antonym candidates and antonyms. The analysis includes:
      • Source distribution: LEX|SD|PD|CC|SN
      • POS distribution: adj|noun|verb|adv|pron|modal|prep|aux|det|conj
      • Type: bounded|unbounded|asymmetric bounded|NA
      • Negation: True negative|Broadly negative|Otherwise negative
      • Domain distribution: existence|frequency|physical_property|possibility|quality|quantity|role|size|temperature|temporal
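
    Distributions of this kind can be tallied with a few counters, as in the sketch below. It is only an illustration: the distributions function is a hypothetical name, and the field positions (POS in the 5th field, domain second to last, source last) are assumed from the sample aPair lines shown earlier on this page. The type and negation fields are left out because their encoding is not described here.

```python
from collections import Counter


def distributions(apair_lines):
    """Count source, POS and domain values over aPair lines.

    Assumed aPair layout: word1|EUI1|word2|EUI2|POS|...|domain|source
    """
    source, pos, domain = Counter(), Counter(), Counter()
    for line in apair_lines:
        fields = line.strip().split("|")
        pos[fields[4]] += 1
        domain[fields[-2]] += 1
        source[fields[-1]] += 1
    return {"source": source, "pos": pos, "domain": domain}


if __name__ == "__main__":
    apairs = [
        "colorful|E0017909|colorless|E0017911|adj|AB2|O|quality|SD",
        "colourful|E0017909|colorless|E0017911|adj|AB2|O|quality|SD",
    ]
    for name, counts in distributions(apairs).items():
        print(name, dict(counts))
```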