INFORMATION & RESOURCES
The MEDLINE co-occurrences file summarizes the MeSH Descriptors that occur together in MEDLINE citations from the MEDLINE/PubMed Baseline. The MEDLINE/PubMed Baseline is a snapshot created at the beginning of each new MeSH Indexing Year containing the MEDLINE, OLDMEDLINE, and PubMed-not-MEDLINE records.
| MEDLINE indexing | Co-occurrences generated |
| MH - *Poisoning | Poisoning - Poisons |
| MH - *Poisons | Poisoning - Vanillic Acid |
| MH - Vanillic Acid/analogs & derivatives | Poisoning - Veratrum |
| MH - Veratrum/*metabolism | Poisons - Vanillic Acid |
| | Poisons - Veratrum |
| | Vanillic Acid - Veratrum |
The example above shows the indexing from a sample MEDLINE citation on the left and the list of co-occurrences that would be generated from the indexing on the right. We also track whether each of the MeSH Descriptors is considered a Major Topic (starred). In this example, Poisoning, Poisons, and Veratrum are considered Major Topics. A more complete example is available in the documentation.
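The pairing in this example can be sketched in a few lines of Python. This is an illustration only, not the code used to build the files: the function names are hypothetical, the alphabetical pair ordering is an assumption, and the parser assumes the simple "MH - Descriptor/qualifier" layout shown in the example.

```python
from itertools import combinations


def parse_mh_line(line):
    """Return (descriptor, is_major) for one 'MH - ...' indexing line."""
    value = line.split("-", 1)[1].strip()
    parts = value.split("/")
    # A star on the Descriptor or on any of its Qualifiers marks the
    # Descriptor as a Major Topic (e.g. Veratrum/*metabolism).
    is_major = any(part.startswith("*") for part in parts)
    descriptor = parts[0].lstrip("*")
    return descriptor, is_major


def cooccurrences(mh_lines):
    """Return (descriptor_a, descriptor_b, both_major) for every pair."""
    majors = {}
    for line in mh_lines:
        descriptor, is_major = parse_mh_line(line)
        majors[descriptor] = majors.get(descriptor, False) or is_major
    # Assumption: pairs are ordered alphabetically for readability.
    return [
        (a, b, majors[a] and majors[b])
        for a, b in combinations(sorted(majors), 2)
    ]


EXAMPLE_INDEXING = [
    "MH - *Poisoning",
    "MH - *Poisons",
    "MH - Vanillic Acid/analogs & derivatives",
    "MH - Veratrum/*metabolism",
]

for a, b, both_major in cooccurrences(EXAMPLE_INDEXING):
    print(a, "-", b, "(both Major Topics)" if both_major else "")
```

For the sample citation this yields the six pairs listed above, with the three pairs drawn from Poisoning, Poisons, and Veratrum flagged as both Major Topics.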
Asterisks (stars) on MeSH Descriptors and Qualifiers (e.g., Veratrum/*metabolism) designate that they are the Major Topics of the article. Non-Major (non-asterisked) Descriptors and Qualifiers are usually additional topics substantively discussed within the article, terms added to qualify a Major Topic, or Check Tags (excerpt from Medical Subject Headings (MeSH®) in MEDLINE®/PubMed®: A Tutorial with slight modification for this description). We specifically identify the co-occurrences where both MeSH Descriptors are marked as Major Topics for backward compatibility with the legacy MRCOC file. The legacy MRCOC file only tracked co-occurrences where both MeSH Descriptors were marked as Major Topics. We now track all MeSH Descriptor co-occurrences for completeness.
For each MEDLINE/PubMed Baseline, we create two files: one with the complete details (detailed_CoOccurs_YYYY.txt) and one with a summarized version (summary_CoOccurs_YYYY.txt) of the identified co-occurrences. The summary file replaces the legacy UMLS MRCOC file. The detailed file provides citation-level data for cases where the summary is not enough. For example, from the detailed file you can identify all of the PMIDs where the MeSH Descriptors Poisoning and Vanillic Acid co-occur and are both identified as Major Topics, or find the earliest paper that discusses both Poisoning and Vanillic Acid.
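The two lookups described above might look like the following sketch. The actual column layout of the detailed file is defined in the documentation and is not reproduced here, so the DetailRow fields, the single year field, and the sample rows (including the PMIDs) are all hypothetical stand-ins for illustration.

```python
from typing import Iterable, List, NamedTuple, Optional


class DetailRow(NamedTuple):
    """Hypothetical, simplified view of one detailed-file record."""
    dui1: str
    dui2: str
    pmid: int
    year: int          # assumption: one year per record; the real file tracks several dates
    both_major: bool   # True when both Descriptors are Major Topics


def _is_pair(row: DetailRow, dui_a: str, dui_b: str) -> bool:
    # Match the pair regardless of which Descriptor is stored first.
    return {row.dui1, row.dui2} == {dui_a, dui_b}


def pmids_both_major(rows: Iterable[DetailRow], dui_a: str, dui_b: str) -> List[int]:
    """PMIDs where the two Descriptors co-occur and both are Major Topics."""
    return sorted(r.pmid for r in rows if _is_pair(r, dui_a, dui_b) and r.both_major)


def earliest_cooccurrence(rows: Iterable[DetailRow], dui_a: str, dui_b: str) -> Optional[DetailRow]:
    """Earliest record (by year, then PMID) in which the pair co-occurs."""
    matches = [r for r in rows if _is_pair(r, dui_a, dui_b)]
    return min(matches, key=lambda r: (r.year, r.pmid)) if matches else None


# Made-up sample rows; the DUIs are the Cervical Plexus / Female pair
# mentioned elsewhere on this page, the PMIDs and years are invented.
SAMPLE_ROWS = [
    DetailRow("D002572", "D005260", 111, 1998, False),
    DetailRow("D002572", "D005260", 222, 2005, True),
    DetailRow("D002572", "D005260", 333, 2012, True),
]
```

With these sample rows, pmids_both_major(SAMPLE_ROWS, "D002572", "D005260") returns the two PMIDs flagged as both Major Topics, and earliest_cooccurrence returns the 1998 record.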
In 2016, we began providing two additional files that help put the co-occurrence information into context. The first, MH_freq_counts_YYYY.txt, provides a frequency count for every MeSH heading over the full MEDLINE Baseline, broken down by three time periods: MED (last five years), MBD (the five years before that, years 6-10), and RST (everything prior to MBD). The second, summary_CoOccurs_asPctOverall_YYYY.txt, combines the MRCOC information with the MeSH heading frequency counts to show how frequent each co-occurrence is relative to how often each of the co-occurring terms appears within the MEDLINE Baseline.
For example, the summary_CoOccurs_asPctOverall_YYYY.txt file shows that the share of Cervical Plexus (D002572) citations also indexed with Female (D005260) has increased over time: RST (11 or more years ago): 270/841 (32.10%); MBD (6-10 years ago): 78/135 (57.78%); and MED (last five years): 64/97 (65.98%). So, while the overall use of Cervical Plexus seems to be decreasing, the indexing of Female along with it is increasing.
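The percentages in this example are simply the co-occurrence count divided by the Descriptor's own frequency in the same time period. A minimal sketch of that arithmetic, using the counts quoted above (the function and variable names are illustrative, not taken from the files):

```python
def pct_of_overall(cooccurrence_count, descriptor_count):
    """Co-occurrence count as a percentage of the Descriptor's frequency."""
    return 100.0 * cooccurrence_count / descriptor_count


# (Female co-occurrences, Cervical Plexus citations) per time period,
# as quoted in the example above.
CERVICAL_PLEXUS_WITH_FEMALE = {
    "RST": (270, 841),
    "MBD": (78, 135),
    "MED": (64, 97),
}

for period, (co, total) in CERVICAL_PLEXUS_WITH_FEMALE.items():
    print(f"{period}: {co}/{total} ({pct_of_overall(co, total):.2f}%)")
```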
The co-occurrences are summarized by timeframe based on the year of the Date Completed (the date indexing processing was completed): MED (the last five years of MEDLINE), MBD (the previous five years, years 6-10), and RST (the remaining years of MEDLINE). For each co-occurrence, we track the MeSH Descriptor Unique Identifier (DUI), the UMLS Concept Unique Identifier (CUI), the overall frequency with which the two MeSH Descriptors occur in the same MEDLINE citation, the frequency with which both MeSH Descriptors are starred (identified as Major Topics) in the same citation, the Date Completed year, the timeframe, and several supplemental frequencies detailed in the documentation. We specifically flag the co-occurrences where both MeSH Descriptors are marked as Major Topics (starred) for backward compatibility with the legacy MRCOC file.
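As a rough illustration of the timeframe buckets, the sketch below assigns a Date Completed year to MED, MBD, or RST. The exact boundary convention is described in the documentation; here it is an assumption that the most recent year covered counts as year 1, and both the function name and the reference_year parameter are hypothetical.

```python
def timeframe(completed_year, reference_year):
    """Bucket a Date Completed year into MED, MBD, or RST.

    Assumption: reference_year is the most recent completed year in the
    Baseline and counts as year 1 of the "last five years".
    """
    years_back = reference_year - completed_year + 1
    if years_back < 1:
        raise ValueError("completed_year is later than reference_year")
    if years_back <= 5:
        return "MED"   # last five years
    if years_back <= 10:
        return "MBD"   # years 6-10
    return "RST"       # 11 or more years back
```

Under this assumption, with a reference year of 2022, the years 2018 through 2022 fall in MED, 2013 through 2017 in MBD, and 2012 or earlier in RST.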
2023 MEDLINE Baseline Version (1.9GB; 18GB uncompressed); md5sum: da888a18161339ca76073d9ae1990b5b
2022 MEDLINE Baseline Version (1.8GB; 17GB uncompressed)
2021 MEDLINE Baseline Version (1.7GB; 17GB uncompressed)
2020 MEDLINE Baseline Version (1.7GB; 16GB uncompressed)
2019 MEDLINE Baseline Version (1.6GB; 16GB uncompressed)
2018 MEDLINE Baseline Version (1.6GB; 15GB uncompressed)
2017 MEDLINE Baseline Version (1.5GB; 14GB uncompressed)
The file is sorted by DUI1, DUI2, Completed Year, and PMID, which clusters all of the DUI1/DUI2 co-occurrence combinations by the year completed for easier summarization. The file also identifies which MeSH Qualifiers are associated with which MeSH Descriptors in the co-occurrence, so it is possible to recreate the legacy MRCOC file LQ and LQB two-way view from this file if desired. The file contains multiple dates, which makes it possible to identify the earliest occurrence of a co-occurrence. Please see the documentation for a more detailed explanation of this file and of the various dates that are tracked.
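Because the rows for each DUI1/DUI2/year combination are adjacent, the file can be summarized in a single streaming pass without loading it into memory. The sketch below shows the idea with itertools.groupby; the tuple layout (dui1, dui2, year, pmid, both_major) is a simplified assumption, not the real record format, and the sample rows are made up.

```python
from itertools import groupby
from operator import itemgetter


def summarize(sorted_rows):
    """Yield (dui1, dui2, year, total, both_major_total) per cluster.

    Relies on the input already being sorted by (dui1, dui2, year, pmid),
    so each cluster arrives as one consecutive run of rows.
    """
    for key, cluster in groupby(sorted_rows, key=itemgetter(0, 1, 2)):
        total = 0
        both_major_total = 0
        for row in cluster:
            total += 1
            both_major_total += 1 if row[4] else 0
        yield key + (total, both_major_total)


# Made-up rows: (dui1, dui2, completed year, pmid, both_major)
SAMPLE_SORTED_ROWS = [
    ("D002572", "D005260", 2010, 101, False),
    ("D002572", "D005260", 2010, 102, True),
    ("D002572", "D005260", 2011, 103, True),
]

for summary in summarize(SAMPLE_SORTED_ROWS):
    print(summary)
```

Since the first row of each DUI1/DUI2 run has the lowest Completed Year, the same pass could also record the earliest year for each pair.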
2023 MEDLINE Baseline Version (26GB; 196GB uncompressed); md5sum: 0d772c5918c5dc9051b76bc80ff4b192
2022 MEDLINE Baseline Version (26GB; 191GB uncompressed)
2021 MEDLINE Baseline Version (24GB; 183GB uncompressed)
2020 MEDLINE Baseline Version (23GB; 176GB uncompressed)
2019 MEDLINE Baseline Version (22GB; 169GB uncompressed)
2018 MEDLINE Baseline Version (21GB; 161GB uncompressed)
2017 MEDLINE Baseline Version (20GB; 154GB uncompressed)
2023 MEDLINE Baseline Version (388KB; 986KB uncompressed); md5sum: 9a110ab5aa80bc823132a961deaae7b1
2022 MEDLINE Baseline Version (385KB; 977KB uncompressed)
2021 MEDLINE Baseline Version (379KB; 965KB uncompressed)
2020 MEDLINE Baseline Version (375KB; 955KB uncompressed)
2019 MEDLINE Baseline Version (371KB; 945KB uncompressed)
2018 MEDLINE Baseline Version (364KB; 926KB uncompressed)
2017 MEDLINE Baseline Version (356KB; 906KB uncompressed)
2023 MEDLINE Baseline Version (1.2GB; 8.4GB uncompressed); md5sum: 98816780cf032fbe5d77d31f191fa87d
2022 MEDLINE Baseline Version (1.1GB; 8.2GB uncompressed)
2021 MEDLINE Baseline Version (1.1GB; 7.9GB uncompressed)
2020 MEDLINE Baseline Version (1.1GB; 7.8GB uncompressed)
2019 MEDLINE Baseline Version (1.1GB; 7.6GB uncompressed)
2018 MEDLINE Baseline Version (1007MB; 7.3GB uncompressed)
2017 MEDLINE Baseline Version (977MB; 7.1GB uncompressed)
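The md5sum values listed with the 2023 downloads can be used to check that a file arrived intact. A minimal sketch using Python's standard hashlib module follows; it assumes the published checksum applies to the file exactly as downloaded, and the function names are illustrative.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat even for multi-gigabyte files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, published_md5):
    """Compare a file's MD5 digest with the value listed on this page."""
    return md5_of_file(path) == published_md5.strip().lower()
```

For example, calling matches_published_md5 with the local path of a downloaded file and the md5sum string shown next to its entry returns True when the two agree.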
Building an Updated MEDLINE Co-Occurrences (MRCOC) File (237 KB)
Provides a detailed explanation of how and why the files are created and details the format of each of the download files.
Historically, the UMLS (Unified Medical Language System) MRCOC file tracked the co-occurrences of important concepts from three sources: MEDLINE, AI/RHEUM (The Artificial Intelligence Rheumatology Consultant System), and CCPSS (The Canonical Clinical Problem Statement System). The AI/RHEUM and CCPSS data are not available to update the information for use in the new MRCOC file. The existing AI/RHEUM and CCPSS records from the MRCOC file are available in the historical versions of the UMLS releases up through the 2013AA release, or as a static file representing the 2013AA version of the AI/RHEUM and CCPSS data from our archive.