The SPECIALIST Lexicon

CC Source Model - Co-occurrence in Corpus (MEDLINE)

I. Introduction

The co-occurrence hypothesis is one of the most popular approaches to antonym identification [Charles & Miller, 1989; Fellbaum, 1995; Tesfaye, 2015]. In this Co-occurrence in Corpus (CC) model, we first extended the co-occurrence patterns from previous research [Justeson and Katz, 1991] to identify 10 co-occurrence patterns. These patterns were derived from a collection of 1000 antonyms from the internet domain [Lu, 2021], with the MEDLINE n-gram set [Lu, 2015] used as the corpus. Each pattern has the form [X keyword Y], where the keyword is one of: -and-, -or-, -to-, -versus-, -than-, -vs-, -from-, -nor-, -and/or-, and -as well as-. High-frequency co-occurring terms from the corpus (MEDLINE n-gram set) that match these patterns, are not Lexicon synonyms [Lu, 2017], have CUIs, and meet the STI rules are retrieved as aPair candidates, such as [above|below|prep], [accept|reject|verb], [sick|well|adj], and [birth|death|noun]. Both the frequency in MEDLINE (word count) and the frequency in the keywords (pattern count) are taken into consideration during this process.

II. Design

Two MEDLINE n-gram files are used for this model:

  • 3-gram.2024.30.core: for [X keyword Y], where keywords are: -and-, -or-, -to-, -versus-, -than-, -vs-, -from-, -nor-, -and/or-.
  • 5-gram.2024.30.core: for [X as well as Y]
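
As an illustration of how a single n-gram line might be split into [X keyword Y], here is a minimal sketch. It assumes each line has the form "frequency|w1 w2 ... wN" (the form shown in the co-occurrence examples in this document) and that tokens are already lowercased. The class name NGramPatternSketch and its Match type are hypothetical and are not taken from the project's source code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

/** Hypothetical helper (not the project's code): splits an n-gram line into [X keyword Y]. */
class NGramPatternSketch {
    // Keywords from the list above; "as well as" only occurs in 5-grams.
    static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
        "and", "or", "to", "versus", "than", "vs", "from", "nor", "and/or", "as well as"));

    /** One matched pattern: first gram, keyword, last gram, corpus frequency. */
    static final class Match {
        final String x;
        final String keyword;
        final String y;
        final long freq;

        Match(String x, String keyword, String y, long freq) {
            this.x = x;
            this.keyword = keyword;
            this.y = y;
            this.freq = freq;
        }
    }

    /** Assumes the line format "frequency|w1 w2 ... wN" with lowercased tokens. */
    static Optional<Match> parse(String line) {
        int bar = line.indexOf('|');
        if (bar < 0) {
            return Optional.empty();
        }
        long freq;
        try {
            freq = Long.parseLong(line.substring(0, bar).trim());
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
        String[] words = line.substring(bar + 1).trim().split("\\s+");
        if (words.length != 3 && words.length != 5) {
            return Optional.empty();
        }
        // Everything between the first and last gram is the candidate keyword.
        String keyword = String.join(" ", Arrays.copyOfRange(words, 1, words.length - 1));
        if (!KEYWORDS.contains(keyword)) {
            return Optional.empty();
        }
        return Optional.of(new Match(words[0], keyword, words[words.length - 1], freq));
    }
}
```

The sketch does no normalization of its own, so a token with attached punctuation (for example "internal,") would be returned as-is; the model itself works on normalized (coreterm) grams.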

Derived pattern details (please see the design documents for more):

Ant-1 | Ant-2 | Co-occurrence Examples
normal | abnormal
  • 11160|normal and abnormal
  • 2387|normal nor abnormal
  • 1917|normal or abnormal
  • 463|abnormal and normal
  • 385|normal from abnormal
  • 243|normal versus abnormal
  • 159|normal to abnormal
  • 125|abnormal or normal
  • 69|abnormal as well as normal
external | internal
  • 15160|internal and external
  • 6836|external and internal
  • 1667|internal or external
  • 898|external or internal
  • 184|internal versus external
  • 164|internal as well as external
  • 124|internal to external
  • 122|internal, and external
  • 116|internal and/or external
  • 114|external to internal
...

From the table above, we observe:

  • Most of these aPairs fall into the collocation pattern [Ant-1 keyword Ant-2]. The keyword sits in the middle of the 3-gram and includes “and”, “or”, “versus”, “to”, etc.
  • Some aPairs, such as calm|excited and buyer|seller, do not co-occur in the MEDLINE n-grams. Plausible explanations are:
    • The MEDLINE n-gram set does not cover these aPairs. In that case, we suggest applying this co-occurrence model to another corpus to find collocation patterns.
    • These aPairs cannot be derived by a collocation model. In that case, we suggest further research focused on semantics. These types of aPairs are categorized with the source [SN] (semantic in corpus).
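
The per-pair totals behind a table like the one above can be tallied with a small accumulator. The sketch below is an illustration only: the class name PairTallySketch is hypothetical, and it assumes that "word count" means the summed n-gram frequency for a pair and that "pattern count" means the number of distinct keywords the pair occurs with. The document does not spell out either definition.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical accumulator (not the project's code) for per-pair co-occurrence totals. */
class PairTallySketch {
    /** Running totals for one unordered pair. */
    static final class Tally {
        long wordCount;                                 // summed n-gram frequency (assumed meaning)
        final Set<String> keywords = new TreeSet<>();   // distinct keywords seen with the pair

        int patternCount() {
            return keywords.size();                     // assumed meaning of "pattern count"
        }
    }

    private final Map<String, Tally> tallies = new LinkedHashMap<>();

    /** Order-independent key, so "normal and abnormal" and "abnormal and normal" merge. */
    static String key(String a, String b) {
        return a.compareTo(b) <= 0 ? a + "|" + b : b + "|" + a;
    }

    /** Records one matched n-gram [x keyword y] with its corpus frequency. */
    void add(String x, String keyword, String y, long freq) {
        Tally t = tallies.computeIfAbsent(key(x, y), k -> new Tally());
        t.wordCount += freq;
        t.keywords.add(keyword);
    }

    /** Returns the tally for a pair in either order, or null if the pair was never seen. */
    Tally get(String x, String y) {
        return tallies.get(key(x, y));
    }
}
```

Merging both word orders under one key is a design choice of this sketch; keeping the two orders separate would instead show which direction dominates, as in "internal and external" versus "external and internal".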

III. Implementation

The Java source code is in the Medline directory:

  • GetAntCandFrom3GramPatMid.java
  • GetAntCandFrom5GramPatMid.java

Algorithm:

  • go through all n-grams (N = 3 or 5) to retrieve antonym candidates from the normalized (coreterm) first and last grams. The middle word(s) are treated as the keyword.
  • check whether the middle word(s) match one of the keywords
  • check whether the normalized first and last grams meet the criteria for antonyms:
    • have EUIs (in the Lexicon)
    • single words
    • have the same POS
    • are not words that are invalid as antonyms in the CC model, such as "the", "a", "which", "not", etc.
    • not synonyms
    • have CUIs
    • have STIs, either same STIs or legal STI pairs
      Legal STI pairs were derived from tagged aPair candidates, keeping STI pairs with a frequency of 10 or more among canonical aPairs. The report is in the file: ${ANTONYM}/${YEAR}/output/Analysis/antCand.data.tag.cuiSti.rpt.

      STI-1                         STI-2                         Frequency
      T033|Finding                  T080|Qualitative Concept      38
      T033|Finding                  T121|Pharmacologic Substance  10
      T033|Finding                  T169|Functional Concept       19
      T033|Finding                  T170|Intellectual Product     11
      T033|Finding                  T184|Sign or Symptom          15
      T078|Idea or Concept          T080|Qualitative Concept      10
      T080|Qualitative Concept      T081|Quantitative Concept     13
      T080|Qualitative Concept      T082|Spatial Concept          10
      T080|Qualitative Concept      T121|Pharmacologic Substance  10
      T080|Qualitative Concept      T169|Functional Concept       37
      T121|Pharmacologic Substance  T169|Functional Concept       10
  • convert to base form (citation form) for aPair candidates
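
The criteria in the algorithm above can be sketched as a single filter. This is a rough illustration, not the code in GetAntCandFrom3GramPatMid.java: the Lookup interface and all of its methods are hypothetical stand-ins for the Lexicon and UMLS lookups, "same POS" is interpreted as sharing at least one part of speech, and legal STI pairs are assumed to be unordered. The STI codes are those listed in the legal STI pair table.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical filter (not the project's code) for CC-model aPair candidates. */
class AntonymCandidateFilterSketch {
    /** Hypothetical lookup interface; the real Lexicon/UMLS access is not shown in this document. */
    interface Lookup {
        boolean hasEui(String word);                 // word is in the Lexicon
        Set<String> pos(String word);                // parts of speech for the word
        boolean areSynonyms(String a, String b);     // Lexicon synonym check
        boolean hasCui(String word);                 // word maps to a UMLS concept
        Set<String> stis(String word);               // semantic type identifiers, e.g. "T033"
    }

    // Example invalid words named in the criteria; the full list is not given there.
    static final Set<String> INVALID = new HashSet<>(Arrays.asList("the", "a", "which", "not"));

    // Legal STI pairs from the table, stored as "lower|higher" code (assumed unordered).
    static final Set<String> LEGAL_STI_PAIRS = new HashSet<>(Arrays.asList(
        "T033|T080", "T033|T121", "T033|T169", "T033|T170", "T033|T184", "T078|T080",
        "T080|T081", "T080|T082", "T080|T121", "T080|T169", "T121|T169"));

    /** True if the two words share an STI or have at least one legal STI pair. */
    static boolean stiCompatible(Set<String> a, Set<String> b) {
        for (String s : a) {
            for (String t : b) {
                if (s.equals(t)) {
                    return true;
                }
                String pair = s.compareTo(t) < 0 ? s + "|" + t : t + "|" + s;
                if (LEGAL_STI_PAIRS.contains(pair)) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Applies the criteria in the order they are listed in the algorithm. */
    static boolean isCandidate(String x, String y, Lookup lx) {
        if (!lx.hasEui(x) || !lx.hasEui(y)) return false;                 // have EUIs
        if (x.contains(" ") || y.contains(" ")) return false;            // single words
        if (Collections.disjoint(lx.pos(x), lx.pos(y))) return false;    // share a POS
        if (INVALID.contains(x) || INVALID.contains(y)) return false;    // not invalid words
        if (lx.areSynonyms(x, y)) return false;                          // not synonyms
        if (!lx.hasCui(x) || !lx.hasCui(y)) return false;                 // have CUIs
        return stiCompatible(lx.stis(x), lx.stis(y));                    // same or legal STIs
    }
}
```

The conversion to base (citation) form is left out, since it depends on Lexicon tooling that this page does not describe.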

IV. References

  • Walter G. Charles, George A. Miller, Contexts of Antonymous Adjectives, Applied Psycholinguistics, Vol 10, 1989, 357-375
  • Christiane Fellbaum, Co-Occurrence and Antonymy, International Journal of Lexicography, Vol 8, No 4, Oxford University Press, 1995, 281-303
  • Debela Tesfaye, Carita Paradis, On the use of antonyms and synonyms from a domain perspective, Proceedings of the NetWordS Final Conference, Pisa, March 30-April 1, 2015, 150-154
  • John S. Justeson, Slava M. Katz, Co-occurrences of Antonymous Adjectives and Their Contexts, Computational Linguistics, Vol 17, No 1, Association for Computational Linguistics, 1991, 1-19