MEDLINE Co-Occurrences (MRCOC) Files

The MEDLINE co-occurrences file summarizes the MeSH Descriptors that occur together in MEDLINE citations from the MEDLINE/PubMed Baseline. The MEDLINE/PubMed Baseline is a snapshot, created at the beginning of each new MeSH Indexing Year, that contains the MEDLINE, OLDMEDLINE, and PubMed-not-MEDLINE records.


Indexing                                     Co-Occurrences Generated
MH  - *Poisoning                             Poisoning — Poisons
MH  - *Poisons                               Poisoning — Vanillic Acid
MH  - Vanillic Acid/analogs & derivatives    Poisoning — Veratrum
MH  - Veratrum/*metabolism                   Poisons — Vanillic Acid
                                             Poisons — Veratrum
                                             Vanillic Acid — Veratrum

Co-Occurrences Example

The example above shows the indexing from a sample MEDLINE citation on the left and the list of co-occurrences generated from that indexing on the right. We also track whether each MeSH Descriptor is considered a Major Topic (starred). In this example, Poisoning, Poisons, and Veratrum are considered Major Topics; Veratrum is a Major Topic through its starred Qualifier (Veratrum/*metabolism). A more complete example is available in the documentation.

Asterisks (stars) on MeSH Descriptors and Qualifiers (e.g., Veratrum/*metabolism) designate that they are the Major Topics of the article. Non-Major (non-asterisked) Descriptors and Qualifiers are usually additional topics substantively discussed within the article, terms added to qualify a Major Topic, or Check Tags (excerpt from Medical Subject Headings (MeSH®) in MEDLINE®/PubMed®: A Tutorial with slight modification for this description). We specifically identify the co-occurrences where both MeSH Descriptors are marked as Major Topics for backward compatibility with the legacy MRCOC file. The legacy MRCOC file only tracked co-occurrences where both MeSH Descriptors were marked as Major Topics. We now track all MeSH Descriptor co-occurrences for completeness.
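The pairing rule in the example can be sketched in a few lines of Python. This is an illustrative sketch, not the production MRCOC code: it works on Descriptor names rather than DUIs, and it assumes (consistent with the example) that a Descriptor counts as a Major Topic when either the Descriptor itself or any of its Qualifiers carries an asterisk. The function names `parse_heading` and `co_occurrences` are hypothetical.

```python
from itertools import combinations

# MH lines copied from the sample citation in the Co-Occurrences Example.
MH_LINES = [
    "MH  - *Poisoning",
    "MH  - *Poisons",
    "MH  - Vanillic Acid/analogs & derivatives",
    "MH  - Veratrum/*metabolism",
]

def parse_heading(line: str) -> tuple[str, bool]:
    """Return (descriptor, is_major) for one MEDLINE-format MH line.

    Assumption: the Descriptor is a Major Topic if the Descriptor or any of
    its Qualifiers is starred (e.g. Veratrum/*metabolism).
    """
    heading = line.partition("- ")[2].strip()  # drop the "MH  - " tag
    parts = heading.split("/")                 # Descriptor, then Qualifiers
    descriptor = parts[0].lstrip("*")
    is_major = any(part.startswith("*") for part in parts)
    return descriptor, is_major

def co_occurrences(mh_lines: list[str]):
    """Yield (descriptor1, descriptor2, both_major) for every unordered pair."""
    headings = [parse_heading(line) for line in mh_lines]
    for (d1, m1), (d2, m2) in combinations(headings, 2):
        yield d1, d2, m1 and m2

for d1, d2, both_major in co_occurrences(MH_LINES):
    print(f"{d1} - {d2}" + ("  [both Major Topics]" if both_major else ""))
```

Four headings give 4 × 3 / 2 = 6 pairs, matching the six co-occurrences in the example, and three of them (Poisoning/Poisons, Poisoning/Veratrum, Poisons/Veratrum) are the both-Major pairs that the legacy MRCOC file would have kept.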

For each MEDLINE/PubMed Baseline, we have created two files: one with the complete details (detailed_CoOccurs_YYYY.txt) and one with a summarized version (summary_CoOccurs_YYYY.txt) of the identified co-occurrences. The summary file replaces the legacy UMLS MRCOC file. The detailed file provides richer data when the summary counts are not enough. For example, from the detailed file you can identify all of the PMIDs where the MeSH Descriptors Poisoning and Vanillic Acid co-occur and are both identified as Major Topics. Likewise, if you are trying to identify the earliest paper discussing both Poisoning and Vanillic Acid, the detailed file contains that information.

In 2016, we began providing two additional files that help put the co-occurrence information into context. The first file, MH_freq_counts_YYYY.txt, provides a frequency count for every MeSH heading over the full MEDLINE Baseline, both in total and broken down by three time periods: MED (the last five years), MBD (the five years before that, years 6-10), and RST (everything prior to MBD). The second file, summary_CoOccurs_asPctOverall_YYYY.txt, is a summary file that brings together the MRCOC information and the MeSH heading frequency information to show how frequent each co-occurrence is compared with how often each of the co-occurring terms appears within the MEDLINE Baseline.

For example, the summary_CoOccurs_asPctOverall_YYYY.txt file shows that Cervical Plexus (D002572) co-occurring with Female (D005260) has increased over time as a percentage of all Cervical Plexus indexing: RST (11 or more years ago): 270/841 (32.10%), MBD (6-10 years ago): 78/135 (57.78%), and MED (last five years): 64/97 (65.98%). So, while the overall use of Cervical Plexus seems to be decreasing, the indexing of Female along with it is increasing.
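Each percentage in this example is the co-occurrence count divided by the count for the base Descriptor in the same time period. The sketch below recomputes the three Cervical Plexus figures from the counts given above; `share_pct` is a hypothetical helper name, not a field or tool from the files themselves.

```python
def share_pct(co_count: int, descriptor_count: int) -> float:
    """Percent of a Descriptor's occurrences that also include the partner Descriptor."""
    return 100 * co_count / descriptor_count

# (co-occurrences with Female, all Cervical Plexus occurrences) per time period,
# taken from the example above.
cervical_plexus_female = {"RST": (270, 841), "MBD": (78, 135), "MED": (64, 97)}

for period, (co_count, total) in cervical_plexus_female.items():
    print(f"{period}: {co_count}/{total} ({share_pct(co_count, total):.2f}%)")
# RST: 270/841 (32.10%)
# MBD: 78/135 (57.78%)
# MED: 64/97 (65.98%)
```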

The co-occurrences are summarized by timeframe (MED: the last five years of MEDLINE; MBD: the previous five years of MEDLINE, years 6-10; RST: the remaining years of MEDLINE) based on the Year from the Date Completed (the date indexing processing was completed). For each co-occurrence, we track the MeSH Descriptor Unique Identifiers (DUIs), the UMLS Concept Unique Identifiers (CUIs), the overall frequency with which the two MeSH Descriptors occur in the same MEDLINE citation, the frequency with which both MeSH Descriptors are starred (identified as Major Topics) in the same citation, the Date Completed Year, the timeframe, and several supplemental frequencies detailed in the documentation. The co-occurrences where both MeSH Descriptors are starred are flagged for backward compatibility with the legacy MRCOC file.
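A rough sketch of this per-timeframe summarization follows. It assumes a simplified, hypothetical row layout of (DUI1, DUI2, Date Completed year, both-Major flag) and assumes that the most recent Date Completed year in the baseline is year 1 of the MED window; the actual column layout and year boundaries are defined in the documentation.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

def timeframe(completed_year: int, latest_year: int) -> str:
    """Map a Date Completed year to MED, MBD, or RST.

    Assumption: latest_year is the most recent Date Completed year in the
    baseline and counts as year 1 of the five-year MED window.
    """
    years_back = latest_year - completed_year  # 0 for the latest year
    if years_back < 0:
        raise ValueError("completed_year is later than latest_year")
    if years_back < 5:
        return "MED"  # years 1-5
    if years_back < 10:
        return "MBD"  # years 6-10
    return "RST"      # everything earlier

# Hypothetical simplified row: (dui1, dui2, completed_year, both_major)
Row = Tuple[str, str, int, bool]

def summarize_by_timeframe(
    rows: Iterable[Row], latest_year: int
) -> Dict[Tuple[str, str, str], Tuple[int, int]]:
    """Return {(dui1, dui2, timeframe): (all co-occurrences, both-Major co-occurrences)}."""
    totals: Counter = Counter()
    both_major: Counter = Counter()
    for dui1, dui2, year, major in rows:
        key = (dui1, dui2, timeframe(year, latest_year))
        totals[key] += 1
        if major:
            both_major[key] += 1
    # Counter returns 0 for keys never incremented, so pairs with no starred rows get 0.
    return {key: (totals[key], both_major[key]) for key in totals}
```

Keeping both counters side by side mirrors the description above: the overall frequency and the both-starred frequency are tracked for the same pair and timeframe.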

  • detailed_CoOccurs_YYYY.txt [Very Large File]
    The detailed descriptor co-occurrences file contains the complete information for each MeSH Descriptor co-occurrence and allows for identifying PMIDs for specific sets of co-occurrences.

The file is sorted by DUI1, DUI2, Completed Year, and PMID, which clusters all of the DUI1/DUI2 co-occurrence combinations by year completed for easier summarization. The file also identifies which MeSH Qualifiers are associated with which MeSH Descriptors in each co-occurrence, so it is possible to recreate the legacy MRCOC file LQ and LQB two-way view from this file if desired. The file contains multiple dates, allowing identification of the earliest occurrence of a co-occurrence. Please see the documentation for a more detailed explanation of this file and of the various dates that are tracked.
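Because rows for the same DUI1/DUI2 pair are adjacent and ordered by year and PMID, a single pass can answer the kinds of questions mentioned earlier (earliest year, PMIDs where both Descriptors are Major Topics). The sketch below assumes a hypothetical `DetailRow` with only five fields; the real file has more columns, and its earliest-occurrence logic may use dates other than the Completed Year.

```python
from itertools import groupby
from typing import Iterable, Iterator, List, NamedTuple, Tuple

class DetailRow(NamedTuple):
    """Hypothetical subset of a detailed_CoOccurs row; see the documentation for the real columns."""
    dui1: str
    dui2: str
    completed_year: int
    pmid: int
    both_major: bool

class PairSummary(NamedTuple):
    pair: Tuple[str, str]
    earliest_year: int
    major_pmids: List[int]

def summarize_pairs(rows: Iterable[DetailRow]) -> Iterator[PairSummary]:
    """Walk rows already sorted by DUI1, DUI2, Completed Year, and PMID.

    groupby only merges adjacent rows, so this relies on the file's sort order.
    """
    for pair, group in groupby(rows, key=lambda r: (r.dui1, r.dui2)):
        group_rows = list(group)
        yield PairSummary(
            pair=pair,
            earliest_year=group_rows[0].completed_year,  # first row has the earliest year
            major_pmids=[r.pmid for r in group_rows if r.both_major],
        )
```

Streaming with `groupby` keeps only one pair's rows in memory at a time, which matters for a file of this size.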

  • MH_freq_counts_YYYY.txt [Small File]
    The descriptor frequency counts file contains a set of frequency counts for each MeSH Descriptor found in the MEDLINE Baseline, sorted in DUI order. The file contains the DUI, UMLS CUI information, and the overall total, MED time-period, MBD time-period, and RST time-period frequencies of occurrence in the MEDLINE Baseline.
  • summary_CoOccurs_asPctOverall_YYYY.txt [Large File]
    The summary_CoOccurs_asPctOverall_YYYY.txt file combines the final MRCOC summary file with the Descriptor Frequency Counts file to show how often each co-occurrence occurs as a percentage of each Descriptor's occurrences. The file has two entries for each co-occurrence, one giving the percentage relative to each of the two Descriptors in the pairing.
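The "two entries per co-occurrence" layout amounts to joining each summary pair against the frequency counts twice, once from each Descriptor's side. The sketch below is a minimal illustration that assumes both files have already been loaded into hypothetical in-memory dictionaries keyed by DUI; it uses overall totals only, and the real file's columns and per-timeframe breakdown are described in the documentation.

```python
from typing import Dict, Iterator, Tuple

def pct_overall_rows(
    summary: Dict[Tuple[str, str], int],
    freq: Dict[str, int],
) -> Iterator[Tuple[str, str, int, int, float]]:
    """Yield two rows per co-occurrence, one relative to each Descriptor.

    summary: {(dui1, dui2): co-occurrence count}, a hypothetical in-memory
             form of summary_CoOccurs_YYYY.txt
    freq:    {dui: overall frequency}, a hypothetical in-memory form of
             MH_freq_counts_YYYY.txt
    Each row is (base_dui, other_dui, co_count, base_count, percent).
    """
    for (dui1, dui2), co_count in summary.items():
        for base, other in ((dui1, dui2), (dui2, dui1)):
            base_count = freq[base]
            yield base, other, co_count, base_count, 100 * co_count / base_count
```

The same join could be repeated with the MED, MBD, and RST counts to get per-period percentages like those in the Cervical Plexus example.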

  (237 KB) Building an Updated MEDLINE Co-Occurrences (MRCOC) File
Provides a detailed explanation of how and why the files are created, and details the format of each of the download files.

Historically, the UMLS (Unified Medical Language System) MRCOC file tracked the co-occurrences of important concepts from three sources: MEDLINE, AI/RHEUM (The Artificial Intelligence Rheumatology Consultant System), and CCPSS (The Canonical Clinical Problem Statement System). The AI/RHEUM and CCPSS data are not available for updating, so they are not part of the new MRCOC file. The existing AI/RHEUM and CCPSS records from the MRCOC file are available in the historical versions of the UMLS releases up through the 2013AA release, or as a static file from our archive representing the 2013AA version of the AI/RHEUM and CCPSS data.