The National Library of Medicine (NLM) provides information about biomedical datasets. Inclusion on an NLM Web page does not imply endorsement of, or agreement with, the contents by NLM or the National Institutes of Health. Learn more: NLM Web Policies (


Comparison Table

| Attribute | All of Us | CMS VRDC | CPRD AURUM | UK Biobank |
| --- | --- | --- | --- | --- |
| Patient Count | 413,457 | 285,300,139 | 49,102,289 | 502,219 |
| Data Start Year | 1981 | 1999 | 1948 | 1980 |
| Data End Year | 2022 | 2020 | 2021 | 2021 |
| Region | United States | United States | United Kingdom | United Kingdom |
| Deceased Count | 3,256 | 49,151,453 | 3,141,446 | 44,498 |
| Primary Dataset Type | EHR, Research, Surveys, Genetic | Administrative Claims | EHR | EHR, Research, Surveys, Genetic |
| Visit Context | All | All | Outpatient | Outpatient, Inpatient |
| Recruitment Status | Active | Active | Active | Active |
| Recruitment Age | 18+ | N/A | N/A | 40-60 |
| Native Data Format | OMOP | Custom | Custom | Custom |
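
The counts in the comparison table can be turned into derived figures such as the share of deceased patients or the number of calendar years covered. The sketch below is illustrative only: it assumes the four value columns correspond, in order, to All of Us, CMS VRDC, CPRD AURUM, and UK Biobank (the order in which the datasets are described on this page), and the names `DatasetProfile`, `deceased_share`, and `coverage_years` are hypothetical helpers, not part of any NLM tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetProfile:
    """One column of the comparison table (column-to-name mapping is assumed)."""
    name: str
    patient_count: int
    deceased_count: int
    start_year: int
    end_year: int
    region: str


# Values copied from the comparison table; dataset labels are an assumption.
PROFILES = [
    DatasetProfile("All of Us", 413_457, 3_256, 1981, 2022, "United States"),
    DatasetProfile("CMS VRDC", 285_300_139, 49_151_453, 1999, 2020, "United States"),
    DatasetProfile("CPRD AURUM", 49_102_289, 3_141_446, 1948, 2021, "United Kingdom"),
    DatasetProfile("UK Biobank", 502_219, 44_498, 1980, 2021, "United Kingdom"),
]


def deceased_share(profile: DatasetProfile) -> float:
    """Fraction of patients in the dataset recorded as deceased."""
    return profile.deceased_count / profile.patient_count


def coverage_years(profile: DatasetProfile) -> int:
    """Number of calendar years spanned, counting both start and end year."""
    return profile.end_year - profile.start_year + 1


if __name__ == "__main__":
    for p in PROFILES:
        print(f"{p.name}: {deceased_share(p):.1%} deceased, "
              f"{coverage_years(p)} years of data ({p.region})")
```

These ratios only summarize the table. They are not mortality rates, because each dataset has a different enrollment window and population.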

All of Us

The National Institutes of Health’s All of Us Research Program is a large-scale, United States-based research program that began nationwide enrollment in May 2018 and intends to recruit more than one million participants. The program integrates Electronic Health Records (EHR) with survey questionnaires to develop a diverse, information-rich database that serves as a central resource for many secondary research studies and reduces the need to develop single-use, study-specific data collection protocols. The program offers two tiers of data access: the Registered tier and the more restricted Controlled tier.


CMS Virtual Research Data Center (VRDC)

The Centers for Medicare and Medicaid Services (CMS) Virtual Research Data Center (VRDC) collection contains populated claim forms and administrative metadata describing individual providers, facilities, patients, care plans, and transactions known to CMS. The data is sourced from Medicare, Medicaid, Children's Health Insurance Program (CHIP), and Social Security Disability Insurance (SSDI) encounters, among others.


CPRD AURUM

Clinical Practice Research Datalink (CPRD) is a real-world research service supporting retrospective and prospective public health and clinical studies. CPRD includes de-identified, patient-level Electronic Health Record data from a network of UK-based general practitioners (GPs). This profile is for CPRD AURUM, which includes data collected from practices that use EMIS clinical systems.

UK Biobank

The UK Biobank program is a large health and biomedical database that supports multiple retrospective observational studies and includes over half a million participants from the United Kingdom who were between the ages of 40 and 69 at recruitment. UK Biobank contains a combination of health, questionnaire, and genetic data that is regularly updated and enriched with new data fields.