INFORMATION & RESOURCES

MEDLINE/PubMed Baseline Repository (MBR)

The MEDLINE/PubMed Baseline Repository (MBR) provides access to each MEDLINE/PubMed Baseline snapshot starting with the 2002 MEDLINE Baseline. Each baseline contains a snapshot of MEDLINE citations in the state they were at a given moment in time without the MeSH vocabulary updates and other revisions that occur during the year. The baseline snapshot is created at the beginning of each new MeSH Indexing Year. The records included in the MEDLINE/PubMed Baseline databases represent a static view of the data at the time each baseline database was created.

Terms and Conditions   NLM Data Distribution Site   Repository Details (PDF: 37 kb)

The baselines are normally generated towards the middle of November each year and contain all completed citations in MEDLINE as of that date. They represent MEDLINE after year-end processing has been completed, which means the records have already been revised with the upcoming year's new MeSH vocabulary terms. The 2002 through current-year MEDLINE/PubMed Baselines are currently available. Each baseline is named for the MeSH year applied during this year-end processing. For example, the 2002 MEDLINE/PubMed Baseline contains all completed citations from the mid-1960s until the date the baseline was created in late November 2001, with year-end processing assigning the appropriate 2002 MeSH vocabulary terms; it is therefore a baseline for the 2002 year.
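The naming rule can be written as a one-line function. This is only a sketch: the function name is made up for illustration, and it assumes the snapshot is taken in November or December, after year-end processing.

```python
from datetime import date


def baseline_year(created: date) -> int:
    """Return the baseline label for a snapshot created on the given date.

    Assumption: the snapshot is taken late in the year, after year-end
    processing, so the label is the following calendar year.
    """
    return created.year + 1


# The 2002 baseline was created in late November 2001.
print(baseline_year(date(2001, 11, 21)))  # 2002
```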

The baselines also contain citations that are not MEDLINE. All of the stored baselines (2002 onward) contain "Out-of-scope" citations, which were renamed "PubMed-not-MEDLINE" starting with the 2004 MEDLINE/PubMed Baseline. The PubMed-not-MEDLINE status refers to citations that reside in PubMed, come from journals included in MEDLINE, and have undergone quality review, but are not assigned MeSH headings because the cited item is not in scope for MEDLINE, either by topic or by date of publication. Citations in the Out-of-scope or PubMed-not-MEDLINE status make up a very small percentage of the total number of citations contained in the baselines (for example, 0.51% or 75,271 records in the 2005 baseline and 1.8% or 323,919 records in the 2009 baseline).
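The quoted percentages can be checked against the citation totals listed in the baseline table on this page. The short sketch below only redoes that arithmetic; the variable and function names are illustrative.

```python
# Record counts quoted in the paragraph above.
not_medline = {2005: 75_271, 2009: 323_919}
# Total citations per baseline, taken from the baseline table on this page.
total = {2005: 14_792_864, 2009: 17_764_826}


def share_percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


for year in sorted(not_medline):
    pct = share_percent(not_medline[year], total[year])
    print(f"{year}: {pct:.2f}%")
# 2005: 0.51%
# 2009: 1.82%
```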

Starting with the 2005 MEDLINE/PubMed Baseline, OLDMEDLINE citations are also included in the baselines. The OLDMEDLINE citations make up approximately 11% of the total number of baseline citations. The OLDMEDLINE citations are from international biomedical journals covering the fields of medicine, preclinical sciences, and allied health sciences. The citations were originally printed in hardcopy indexes published prior to 1966. For additional information, please refer to the following URL: https://www.nlm.nih.gov/databases/databases_oldmedline.html.

In the 2005 baseline, the subject indexing from the OLDMEDLINE citations was stored solely in the "Other Term" (or "OT") tagged fields and not in the MeSH Terms (or "MH") tagged fields. This means that searching the 2005 baseline from our MBR Query Tool via the MH field does not retrieve any OLDMEDLINE citations. The only way to include OLDMEDLINE records from the 2005 baseline is to run a timeframe query without specifying any field-specific search criteria. Beginning with the 2006 baseline, Other Terms are being mapped to current MeSH Terms, so searching via the MH field may retrieve some OLDMEDLINE records, but not necessarily the complete set.

Starting with the 2007 MEDLINE/PubMed Baseline, on records where all the OLDMEDLINE terms are converted to MeSH Headings, the citation status changes to MEDLINE. You need to rely on the <CitationSubset>OM</CitationSubset> element to determine if a citation is in the OLDMEDLINE subset.
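As a rough illustration of that check, the sketch below scans citation XML for the OM subset using Python's standard library. The sample records and PMIDs are invented, and the wrapper element is an assumption (it differs between baseline years), so the code looks for MedlineCitation elements at any depth.

```python
import xml.etree.ElementTree as ET

# Invented sample data: both records carry Status="MEDLINE", so only the
# CitationSubset element distinguishes the OLDMEDLINE one.
SAMPLE = """\
<MedlineCitationSet>
  <MedlineCitation Status="MEDLINE">
    <PMID>1</PMID>
    <CitationSubset>OM</CitationSubset>
  </MedlineCitation>
  <MedlineCitation Status="MEDLINE">
    <PMID>2</PMID>
    <CitationSubset>IM</CitationSubset>
  </MedlineCitation>
</MedlineCitationSet>
"""


def oldmedline_pmids(xml_text: str) -> list[str]:
    """Return the PMIDs of citations that list OM as a CitationSubset."""
    root = ET.fromstring(xml_text)
    pmids = []
    for citation in root.iter("MedlineCitation"):
        # A citation may carry several CitationSubset elements.
        subsets = {el.text for el in citation.findall("CitationSubset")}
        if "OM" in subsets:
            pmids.append(citation.findtext("PMID"))
    return pmids


print(oldmedline_pmids(SAMPLE))  # ['1']
```

Real baseline files are large, so a streaming parser such as `ET.iterparse` would likely be a better fit than loading a whole file as a string.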

Select a baseline year to see the directory listing for that MEDLINE/PubMed Baseline, from which you can download all of the files for that year. The DTDs link points to a gzipped tar file containing all of the DTD files required for that year's baseline files.
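Once a DTDs archive has been downloaded, its contents can be inspected before unpacking. The following is a minimal sketch using Python's standard tarfile module; the function name and the archive filename in the comment are hypothetical, not the names used on the download site.

```python
import tarfile


def list_dtd_members(archive_path: str) -> list[str]:
    """Return the sorted names of .dtd files inside a gzipped tar archive."""
    with tarfile.open(archive_path, mode="r:gz") as tar:
        return sorted(
            member.name
            for member in tar.getmembers()
            if member.isfile() and member.name.lower().endswith(".dtd")
        )


# Hypothetical usage, assuming the archive was saved locally as "dtds.tar.gz":
# for name in list_dtd_members("dtds.tar.gz"):
#     print(name)
```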

Baseline  Created                               # Files  # Citations  DTD Files
2021      December 14, 2020                     1062     31,850,051   DTDs
2020      November 19, 2019 - December 3, 2020  1015     30,420,660   DTDs
2019      December 10 & 11, 2018                972      29,138,916   DTDs
2018      November 27 & 28, 2017                928      27,837,540   DTDs
2017      December 13, 2016                     892      26,759,399   DTDs
2016      November 20, 2015                     812      24,358,442   DTDs
2015      November 24, 2014                     779      23,343,329   DTDs
2014      November 21, 2013                     746      22,376,811   DTDs
2013      November 15 & 16, 2012                717      21,508,439   DTDs
2012      November 18, 2011                     684      20,494,848   DTDs
2011      November 19, 2010                     653      19,569,568   DTDs
2010      November 20, 2009                     617      18,502,916   DTDs
2009      November 21 & 22, 2008                593      17,764,826   DTDs
2008      November 16 & 17, 2007                563      16,880,015   DTDs
2007      November 17 & 18, 2006                538      16,120,074   DTDs
2006      November 18 & 19, 2005                516      15,433,668   DTDs
2005      November 20, 2004                     500      14,792,864   DTDs
2004      November 14-18, 2003                  417      12,421,396   DTDs
2003      November 1-4, 2002                    396      11,847,524   DTDs
2002      Approx. November 21, 2001             379      11,299,108   DTDs
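For download planning, it can help to derive a few figures from the table, such as the average number of citations per file and the growth between consecutive baselines. The sketch below does this for the three most recent rows; the row values are copied from the table, and the helper names are illustrative.

```python
# (baseline year, number of files, number of citations), copied from the table.
ROWS = [
    (2019, 972, 29_138_916),
    (2020, 1015, 30_420_660),
    (2021, 1062, 31_850_051),
]


def citations_per_file(files: int, citations: int) -> float:
    """Average number of citations held in each baseline file."""
    return citations / files


def yearly_growth(rows):
    """Yield (year, citations added since the previous row) for each later row."""
    for (_, _, previous), (year, _, current) in zip(rows, rows[1:]):
        yield year, current - previous


for year, files, citations in ROWS:
    print(year, round(citations_per_file(files, citations)))

for year, added in yearly_growth(ROWS):
    print(year, f"{added:,} citations added")
```

For these rows the average comes out at just under 30,000 citations per file.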

We generate a large number of data files during our normal processing of each set of baseline files. We make available the files that we think others might be able to use with the goal of trying to reduce any duplication of effort.

The MeSH FTP download site (ftp://nlmpubs.nlm.nih.gov/online/mesh/) now includes separate directories for each release year of MeSH. In addition, the "MESH_FILES" folder holds the latest release files, which are updated every morning, Monday through Friday. The yearly release folders span from 2011 to the latest full release, which occurs in November of the preceding year (for example, 2016 MeSH was released in November 2015). A single directory holds the earlier files from 1999-2010.

Semantic Types and Groups: A parsable list of Semantic Types and their abbreviations from the UMLS and a parsable list of Semantic Groups and their mappings to the Semantic Types.
Semantic Types and Groups

The MEDLINE N-gram Set from the SPECIALIST Lexicon. N-grams of size 1 to 5 are identified from the title and abstract of each MEDLINE citation in the baseline.
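To make the idea of n-grams of size 1 to 5 concrete, here is a minimal sketch of n-gram extraction. It assumes lowercasing and plain whitespace tokenization, which is a simplification chosen for illustration and not the tokenization used to build the actual N-gram Set.

```python
def ngrams(text: str, max_n: int = 5) -> list[tuple[str, ...]]:
    """Return all n-grams of size 1..max_n from whitespace-separated tokens.

    Assumption: lowercasing plus str.split() stands in for real tokenization.
    """
    tokens = text.lower().split()
    result = []
    for n in range(1, max_n + 1):
        # Slide a window of n tokens across the token list.
        for start in range(len(tokens) - n + 1):
            result.append(tuple(tokens[start:start + n]))
    return result


sample_title = "Heart failure in adults"  # invented example text
for gram in ngrams(sample_title, max_n=2):
    print(" ".join(gram))
```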
Latest information on the SPECIALIST Lexicon MEDLINE N-Gram Set


Medical Subject Headings (MeSH) Files Available to Download web page. This page describes the various files that are available to download from MeSH and detailed information about each of the files.
https://wayback.archive-it.org/org-350/20191102205209/https://www.nlm.nih.gov/mesh/filelist.html

Unified Medical Language System (UMLS) main information web page. This page provides links to information that fully explains the UMLS and the UMLS data files.
https://www.nlm.nih.gov/research/umls/

NLM Bibliographic Services Division description of the MEDLINE/PubMed XML elements.
https://www.nlm.nih.gov/bsd/licensee/data_elements_doc.html

Information on OLDMEDLINE records in MEDLINE.
https://www.nlm.nih.gov/databases/databases_oldmedline.html