Antonym Tags
Antonym candidates are manually tagged. The tagged information is saved and reused in future releases. The tag definitions and the tagging process are described below.
I. Fields of Antonym Candidates
The candidate file has 10 fields, as shown in the table below.
Field 1 | Field 2 | Field 3 | Field 4 | Field 5 | Field 6 | Field 7 | Field 8 | Field 9 | Field 10
---|---|---|---|---|---|---|---|---|---
Ant-1 | EUI-1 | Ant-2 | EUI-2 | POS | Canon | Type | Negation | Domain | Source
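As a minimal sketch of how the ten fields might be read into a record: the on-disk layout of the candidate file is not specified here, so the pipe delimiter, the unbracketed tag values, and the names `AntonymCandidate` and `parse_candidate` are all assumptions made for illustration.

```python
from typing import NamedTuple


class AntonymCandidate(NamedTuple):
    """One candidate record; attribute names mirror Fields 1-10 (hypothetical names)."""
    ant1: str
    eui1: str
    ant2: str
    eui2: str
    pos: str
    canon: str
    ant_type: str
    negation: str
    domain: str
    source: str


def parse_candidate(line: str, sep: str = "|") -> AntonymCandidate:
    """Split one line into the 10 fields. The '|' separator is an assumption."""
    fields = [field.strip() for field in line.rstrip("\n").split(sep)]
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}")
    return AntonymCandidate(*fields)


# Made-up example line, not taken from a real candidate file.
record = parse_candidate(
    "dead|EUI_TBD|alive|EUI_TBD|adj|CANON_TBD|TYPE_TBD|O|DOMAIN_TBD|CC"
)
print(record.ant1, record.ant2, record.pos, record.source)
```

Rejecting lines that do not have exactly 10 fields keeps malformed records out of the later tagging steps.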
II. Definition of Tags
The generic definitions of tags are described in this section.
- Field 1 – Ant-1
This field specifies the 1st antonym (base form) of the aPair. It is the base form of the antonym for Ant2, which must have the same POS. For the current study, antonyms are restricted to single words (multiwords are not allowed). This field is automatically generated by computer programs.
- Field 2 – EUI-1
This field specifies the associated EUI for antonym-1, automatically generated by computer programs.
- Field 3 – Ant-2
This field specifies the 2nd antonym (base form) of the aPair. It is automatically generated by programs.
- Negative antonyms are assigned to Ant2 by programs if the source is LEX, SD, or PD.
- Ant1 and Ant2 are assigned in alphabetical order by programs if the source is CC or SN.
- If the EUI is [EUI_TBD], the associated antonym is not in the Lexicon. Ant1 or Ant2 should be added to the Lexicon if it is a valid word with the given POS. [EUI_TBD] is used in the TT model, not in the annual production models.
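The ordering rules for Ant1 and Ant2 can be sketched as follows. This is an illustration only, not the actual generation program: the function name `order_pair`, the `negative` argument, and the plain lowercase string comparison used for alphabetical order are assumptions.

```python
from typing import Optional, Tuple

NEGATIVE_LAST_SOURCES = {"LEX", "SD", "PD"}   # negative antonym goes to Ant2
ALPHABETICAL_SOURCES = {"CC", "SN"}           # Ant1/Ant2 in alphabetical order


def order_pair(word_a: str, word_b: str, source: str,
               negative: Optional[str] = None) -> Tuple[str, str]:
    """Return (ant1, ant2) for a candidate pair according to its source."""
    if source in NEGATIVE_LAST_SOURCES:
        if negative not in (word_a, word_b):
            raise ValueError("the negative antonym must be one of the two words")
        other = word_b if negative == word_a else word_a
        return other, negative
    if source in ALPHABETICAL_SOURCES:
        first, second = sorted((word_a, word_b))
        return first, second
    raise ValueError(f"unknown source: {source}")


# Hypothetical inputs chosen for illustration.
print(order_pair("unhappy", "happy", "PD", negative="unhappy"))
print(order_pair("wide", "narrow", "CC"))
```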
- Field 4 – EUI-2
This field specifies the associated EUI for antonym-2, automatically generated by computer programs.
- Field 5 – POS
This field specifies the Part-Of-Speech (category) of the aPair. Antonyms must have the same POS. Legal values include: [adj], [noun], [verb], [adv], [prep], [modal], [aux], [pron], [det] and [conj]. POS of [compl] is not included for aPairs. POS is automatically generated by computer programs.
- Field 6 – Canon
Canonicity: a canonical aPair must have a generic domain (central to human life and way of living across times and cultures). The tag [CANON_TBD] is assigned by computer programs during aPair candidate generation. Legal tags include:
- [Y] – canonical. Only canonical antonyms are used as aPairs in the Lexicon.
- [N] – non-canonical. For non-canonical aPairs:
- Type must be [NA].
Computer programs change the type from [TYPE_TBD] to [NA] automatically. No manual tagging is needed.
- Domain must be [DOMAIN_NONE].
Computer programs change the domain from [DOMAIN_TBD] to [DOMAIN_NONE] automatically. No manual tagging is needed.
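The automatic defaults for non-canonical aPairs could look roughly like this. The dictionary keys (`canon`, `type`, `domain`), the unbracketed tag strings, and the function name are assumptions for the sketch; the real programs may represent records differently.

```python
def apply_non_canonical_defaults(record: dict) -> dict:
    """Return a copy of the record with Type and Domain forced for Canon = N."""
    updated = dict(record)  # leave the caller's record untouched
    if updated.get("canon") == "N":
        updated["type"] = "NA"               # replaces TYPE_TBD
        updated["domain"] = "DOMAIN_NONE"    # replaces DOMAIN_TBD
    return updated


# Example based on the non-canonical pair [hot|iced] listed under Field 9.
tagged = {"ant1": "hot", "ant2": "iced", "canon": "N",
          "type": "TYPE_TBD", "domain": "DOMAIN_TBD"}
print(apply_non_canonical_defaults(tagged))
```

Records tagged [Y] pass through unchanged, so their type and domain still have to be tagged manually.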
- Field 7 – Type
This field specifies the type of antonyms by their meaning in relation to negation. Bounded antonyms represent two endpoints on a domain, with no middle ground, such as [dead|alive]. Unbounded antonyms, such as [long|short], are unbounded in the sense that extreme values never reach an endpoint. In practice, if the negation criteria listed below conflict with the endpoint view, the negation criteria (X ≠ not Y, Y ≠ not X) take precedence. For example, [always|never] is tagged as [UB] even though the two words are endpoints (which would suggest [B]). Asymmetric bounded antonyms are pairs with one negative word and/or endpoint on a scale and one non-endpoint/negative word. For example, impossible (negative/endpoint) – possible (non-negative/endpoint). This allows us to apply bounded and asymmetric bounded antonyms in subterm substitution for concept mapping applications. Tags for type include:
- [B]: bounded antonym, if X = not Y, Y = not X
- [UB]: unbounded antonym, if X ≠ not Y, Y ≠ not X
- [AB1]: asymmetric bounded, if X = not Y, Y ≠ not X, where X is the negative/endpoint
- [AB2]: asymmetric bounded, if Y = not X, X ≠ not Y, where Y is the negative/endpoint
- [NA]: not applicable (none of the above cases). This tag is used when Canon is [N] or when the type is none of [B|UB|AB1|AB2].
Examples:
- [B]: [dead|alive|adj], [false|true|adj], [closed|open|adj], [with|without|prep], [asleep|awake|adj], [irregularly|regularly|adv], [emergency|nonemergency|noun], [exclude|include|verb]
- [UB]: [narrow|wide|adj], [light|dark|adj], [low|high|noun], [sad|happy|adj], [rich|poor|adj], [careful|careless|adj], [first|last|adv], [form|destroy|verb]
- [AB1]: [bad|good|adj], [asleep|alert|adj], [bumpless|bumpy|adj], [chilly|warm|adj], [colorless|colorful|adj], [exactly|approximately|adv], [silence|noise|noun], [disbelieve|believe|verb]
- [AB2]: [good|bad|adj], [alert|asleep|adj], [bumpy|bumpless|adj], [warm|chilly|adj], [colorful|colorless|adj], [approximately|exactly|adv], [noise|silence|noun], [believe|disbelieve|verb]
- [NA]: Used in the tagging process when the antonym candidates are not a valid canonical antonym pair.
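The four negation criteria for the type tag reduce to two yes/no judgments, which a tagger still has to make manually. A minimal sketch, assuming the two judgments are passed in as booleans (the function name and parameters are hypothetical):

```python
def antonym_type(x_equals_not_y: bool, y_equals_not_x: bool) -> str:
    """Map the two negation judgments for a canonical pair [X|Y] to a type tag."""
    if x_equals_not_y and y_equals_not_x:
        return "B"    # bounded: X = not Y, Y = not X
    if x_equals_not_y:
        return "AB1"  # asymmetric bounded: X = not Y, Y != not X
    if y_equals_not_x:
        return "AB2"  # asymmetric bounded: Y = not X, X != not Y
    return "UB"       # unbounded: X != not Y, Y != not X


print(antonym_type(True, True))    # e.g. [dead|alive]
print(antonym_type(True, False))   # e.g. [bad|good]
print(antonym_type(False, False))  # e.g. [narrow|wide]
```

[NA] is not returned here because it applies to non-canonical candidates, which are handled before the type is judged.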
Usage:
APairs tagged [B], [AB1], or [AB2] can be used in subterm substitution for better recall. For example, from the examples above:
- not asleep = awake; not awake = asleep
- bad = not good; however, good ≠ not bad
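The substitutions licensed by each type tag can be sketched as a small rule generator. This only illustrates the relations stated by the tag definitions; the function name and the "not <word>" string form are assumptions, and a real concept mapping application would need proper tokenization.

```python
def substitution_rules(ant1: str, ant2: str, ant_type: str) -> dict:
    """Return {"not <word>": replacement} rules implied by the type tag."""
    rules = {}
    if ant_type in {"B", "AB1"}:   # ant1 = not ant2
        rules[f"not {ant2}"] = ant1
    if ant_type in {"B", "AB2"}:   # ant2 = not ant1
        rules[f"not {ant1}"] = ant2
    return rules                   # empty for UB and NA


print(substitution_rules("asleep", "awake", "B"))
print(substitution_rules("bad", "good", "AB1"))
print(substitution_rules("narrow", "wide", "UB"))
```

For [bad|good] tagged [AB1], only "not good" is rewritten to "bad"; no rule is produced for "not bad", matching good ≠ not bad.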
- Field 8 – Negation
This field specifies the negation of an aPair. Tags for negation include:
- [N1]: true/strict negative, ant1 is the negative antonym
- [N2]: true/strict negative, ant2 is the negative antonym
- [BN1]: broadly negative, ant1 is the negative antonym
- [BN2]: broadly negative, ant2 is the negative antonym
- [O]: Otherwise
Examples:
- [N1]: [exclude|include|verb], [false|true|adj], [lack|plenty|noun], [never|always|adv]
- [N2]: [include|exclude|verb], [true|false|adj], [plenty|lack|noun], [always|never|adv]
- [BN1]: [failure|success|noun], [fake|real|adj], [rarely|usually|adv], [repel|attract|verb]
- [BN2]: [success|failure|noun], [real|fake|adj], [usually|rarely|adv], [attract|repel|verb]
- [O]: [ask|reply|verb], [relaxed|upset|adj], [slow|quick|adv], [student|teacher|noun]
Negative aPairs with sources of LEX, SD, and PD only have negations of N2|BN2 because negative antonyms are assigned to ant2 automatically by computer programs. However, aPairs with a source of CC can have negations of N1|N2|BN1|BN2 because ant1 and ant2 are arranged in alphabetical order by computer programs.
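The source constraint on negation tags lends itself to a simple consistency check during tag validation. A sketch under stated assumptions: the function name is hypothetical, and since the text only constrains LEX, SD, and PD, any legal tag is accepted for other sources.

```python
NEGATION_TAGS = {"N1", "N2", "BN1", "BN2", "O"}
ANT2_NEGATIVE_SOURCES = {"LEX", "SD", "PD"}  # negative antonym is always ant2


def negation_is_consistent(negation: str, source: str) -> bool:
    """Check a negation tag against the constraint implied by the source."""
    if negation not in NEGATION_TAGS:
        return False
    if source in ANT2_NEGATIVE_SOURCES:
        return negation in {"N2", "BN2"}
    return True  # assumption: no further restriction for other sources


print(negation_is_consistent("N2", "PD"))
print(negation_is_consistent("N1", "PD"))
print(negation_is_consistent("BN1", "CC"))
```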
Usage:
Negative antonyms can be used as negation detection cue words: a sentence is detected as negative if it contains one of these cue words. For example, unsuccessful, useless, and without are negation detection cue words.
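A minimal sketch of cue-word negation detection built from negation tags. The tuple layout `(ant1, ant2, negation)`, the function names, the simple lowercase word tokenization, and the choice to treat broadly negative antonyms as cues alongside strict ones are all assumptions for illustration.

```python
import re


def negation_cues(apairs):
    """Collect the negative antonym of each (ant1, ant2, negation) tuple."""
    cues = set()
    for ant1, ant2, negation in apairs:
        if negation in {"N1", "BN1"}:
            cues.add(ant1)
        elif negation in {"N2", "BN2"}:
            cues.add(ant2)
    return cues


def is_negative(sentence: str, cues: set) -> bool:
    """A sentence is flagged as negative if any of its words is a cue word."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return any(word in cues for word in words)


# Pairs taken from the negation examples for [N2], [BN1], and [O].
pairs = [("include", "exclude", "N2"),
         ("failure", "success", "BN1"),
         ("ask", "reply", "O")]
cues = negation_cues(pairs)
print(sorted(cues))
print(is_negative("The trial was a failure.", cues))
print(is_negative("The trial was a success.", cues))
```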
- Field 9 – Domain
This field specifies the domain that is central to human life and way of living across times and cultures (generic). Domain is used to determine if an aPair is a canonical antonym. For example, color, temperature, existence, length, weight, speed, time, and quality are legitimate domains, while chocolate, tea, and fruit are not, as shown in the examples below.
- Canonical: [black|white|color], [hot|cold|temperature], [dead|alive|existence], [short|long|length], [slow|fast|speed], [slow|quick|time]
- Non-Canonical: [white|dark|chocolate], [hot|iced|tea], [dry|fleshy|fruit]
- [DOMAIN_NONE]: used when canonical is tagged as [N].
Our goal is to keep the number of domains as small, and each domain property as generic, as possible. A domain list will be generated automatically by the computer programs during the tag validation processes.
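How the domain list is produced is not described here; one plausible sketch is to count the domains of canonical aPairs so that rare or overly specific domains are easy to spot. The record keys (`canon`, `domain`) and the function name are assumptions.

```python
from collections import Counter


def domain_list(records):
    """Return sorted (domain, count) pairs over canonical aPairs only."""
    counts = Counter(r["domain"] for r in records if r["canon"] == "Y")
    return sorted(counts.items())


# Records based on the canonical and non-canonical examples above.
records = [
    {"ant1": "black", "ant2": "white", "canon": "Y", "domain": "color"},
    {"ant1": "hot", "ant2": "cold", "canon": "Y", "domain": "temperature"},
    {"ant1": "slow", "ant2": "fast", "canon": "Y", "domain": "speed"},
    {"ant1": "hot", "ant2": "iced", "canon": "N", "domain": "DOMAIN_NONE"},
]
for domain, count in domain_list(records):
    print(domain, count)
```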
- Field 10 – Source (Model)
Sources are assigned by computer programs. This field specifies the source (model) from which the aPair is generated. Valid tags include:
- [LEX]: Lexical records with negative tags.
- [SD]: suffix derivation with negation (derived antonyms)
- [PD]: prefix derivation with negation (derived antonyms)
- [CC]: co-occurrence in a corpus (co-occurring model from MEDLINE n-gram set)
- [SN]: semantic network (aPairs are found in a semantic network - WordNet)
Antonym candidates from the Training and Test Set [TT] are re-assigned to the sources listed above.
Please refer to the original design documents.