Large clinical datasets are an essential resource for biomedical research as they can provide data on millions of patients, which allows for greater strength and reliability in biomedical research. The valuable data found within these existing real-world, large-scale clinical datasets may reduce the need for some traditional intentional trials. However, accessing large clinical datasets can be challenging due to associated costs and license restrictions, among other barriers.

To address these challenges, the National Library of Medicine launched a new Center for Clinical Observational Investigations in 2023.

As a first step, NLM is curating a list of nationally and internationally available clinical datasets. Then, using informatics, data science, and statistical analysis, NLM will create and make available dataset profiles to include key information such as participants, demographics, diseases, and other characteristics important to research. The Center will also aim to employ a consistent approach to organize the data to foster standardization across the datasets and reduce ambiguity, improve reliability of research, and lower barriers to the use of data.


The stated clinical domains, visit contexts, and individual concepts used in the CCOI dataset profiles were generated from data structured in the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). The OMOP CDM is a data standard designed to standardize the structure and content of observational data. For more information on the OMOP CDM, please visit OMOP Common Data Model. For more information on individual OMOP concepts, please visit Athena, the OMOP vocabulary library, which is searchable by OMOP concept ID, source code, and item description.


The Center for Clinical Observational Investigations (CCOI) Dataset Profiles are a free, web-based resource for researchers interested in using clinical observational datasets. Each dataset has a metadata profile composed of three components: 1) dataset overview, 2) basic statistics, and 3) concept counts. The dataset profiles are carefully curated from multiple data sources.
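As an illustration only, the three-component profile described above could be modeled roughly as follows. This is a minimal sketch, not the CCOI's actual schema: every class name, field name, and value below is a hypothetical placeholder chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetOverview:
    """Component 1: descriptive information (hypothetical fields)."""
    name: str
    provider: str
    access_url: str  # link to the external data source


@dataclass
class BasicStatistics:
    """Component 2: summary statistics (hypothetical fields)."""
    participant_count: int
    visit_count: int


@dataclass
class DatasetProfile:
    """A profile bundles the three components named in the text."""
    overview: DatasetOverview
    basic_statistics: BasicStatistics
    # Component 3: concept ID -> number of records using that concept.
    concept_counts: dict[int, int] = field(default_factory=dict)


# Placeholder values only; none of these describe a real dataset.
example_profile = DatasetProfile(
    overview=DatasetOverview(
        name="Example Dataset",
        provider="Example Provider",
        access_url="https://example.org/dataset",
    ),
    basic_statistics=BasicStatistics(participant_count=1000, visit_count=5000),
    concept_counts={1: 250, 2: 40},  # made-up concept IDs and counts
)
```

Keeping the concept counts as a simple mapping from concept ID to count is one possible design; it makes later concept-level comparisons between profiles straightforward.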

To ensure uniformity of the dataset profiles and enable efficient interoperability of disparate datasets, the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) is used to harmonize each dataset with the others whenever an OMOP mapping does not already exist.

The CCOI Dataset Profiles allow clinical researchers to discover and understand available datasets and make informed decisions in dataset selection through the ability to assess project feasibility and compare metadata across different datasets. 

The CCOI Dataset Profiles are designed to: 

  • Allow researchers to search, discover, compare, and understand nuances across clinical observational datasets from various data sources
  • Provide a searchable, trusted single-source knowledge base of clinical dataset metadata that investigators will be able to query to identify real-world clinical observational data that can be used for their projects 

The CCOI does not host the individual-level datasets or their repositories but provides a link to these data sources for access and further exploration. If a CCOI dataset is no longer available, a message will be displayed that states, "The dataset is no longer available. For more information contact the dataset provider directly".

Through our dedication to curating dataset profiles and ensuring their ethical use and inclusivity, we aim to provide a resource that allows researchers to strategically select data aligned with their research hypothesis and scientific inquiry and that facilitates informed decisions about the feasibility of their research. By harnessing comprehensive and diverse datasets for their investigations, researchers can unlock the full potential of these resources to advance scientific knowledge and generate transformative insights leading to meaningful progress towards achieving improved health and health equity for all.

Clinical observational datasets are reviewed by NLM’s CCOI for potential and continued inclusion in the CCOI Dataset Profiles using the criteria listed below.

  1. Inclusion Criteria
    1. Dataset supports clinical observational research
    2. For the CCOI Dataset Profiles development, the included datasets are:
      1. Publicly available or by request from the data source with sufficient information about how its dataset is managed to determine its appropriateness for inclusion in the CCOI
      2. Dataset from observational cohort studies will be included in the future
      3. Dataset passes CCOI Dataset Profiles technical review
      4. Metadata can be freely and easily accessed based on at least one of the criteria listed:
        • Provides an access point with link to dataset and includes documentation about its usage 
        • Provides the ability to find and select clinical observational datasets from a broad spectrum of datasets
        • Has a responsive point of contact listed by the dataset provider
  2. Criteria for removal/exclusion
    1. Dataset owner no longer wants dataset to be included in the CCOI 
    2. Content is no longer within the scope of the CCOI
    3. A new dataset version is available to replace the existing dataset
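The criteria above can be read as a simple decision rule. The following is a minimal sketch under that reading, not the CCOI's actual review process: the `Candidate` class and all field and function names are hypothetical, and the real review involves human judgment that boolean flags cannot capture.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Inclusion criteria
    supports_observational_research: bool
    available_publicly_or_by_request: bool
    passes_technical_review: bool
    # Metadata access: at least one of these three must hold
    has_documented_access_point: bool = False
    supports_dataset_discovery: bool = False
    has_responsive_contact: bool = False
    # Removal/exclusion criteria
    owner_requested_removal: bool = False
    out_of_scope: bool = False
    replaced_by_new_version: bool = False


def meets_inclusion_criteria(c: Candidate) -> bool:
    """All core criteria hold and at least one metadata-access criterion holds."""
    metadata_accessible = (
        c.has_documented_access_point
        or c.supports_dataset_discovery
        or c.has_responsive_contact
    )
    return (
        c.supports_observational_research
        and c.available_publicly_or_by_request
        and c.passes_technical_review
        and metadata_accessible
    )


def should_be_removed(c: Candidate) -> bool:
    """Any single removal/exclusion criterion is sufficient."""
    return c.owner_requested_removal or c.out_of_scope or c.replaced_by_new_version


def is_listed(c: Candidate) -> bool:
    return meets_inclusion_criteria(c) and not should_be_removed(c)
```

For example, a candidate that meets every core criterion and has a responsive point of contact would be listed, and would stop being listed once a newer version replaces it.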

NLM will routinely examine dataset profiles to confirm alignment with policies and best practices.  Should non-compliance be identified, the team will collaborate with the dataset provider to rectify gaps or potentially exclude the dataset from CCOI.

Content in the CCOI Dataset Profiles may be collected from data sources and repositories managed by government agencies and other non-governmental organizations.

The standards specified under the CCOI Content and Inclusion Policy and Inclusion Criteria are taken into account when assessing a dataset for inclusion. NLM is not responsible for the quality of individual datasets.  Any inquiries about the datasets or their contents should be directed to the dataset provider. The inclusion of a dataset in the CCOI does not represent its endorsement.

The beta version of the CCOI’s Dataset Profiles is being launched to gather user feedback, which will guide future development efforts.

The inaugural launch of the CCOI includes:

  1. Limited collection of datasets: The number of datasets with available profiles is currently limited to four, listed under CCOI Datasets. Future plans may include incorporating additional datasets and their profiles.
  2. Dataset profiles: CCOI dataset profiles consist of three levels: 1) Dataset overview, 2) Basic Statistics, and 3) Concept.
  3. Comparison of dataset contents and features: For the initial datasets included, a comparison of the features and contents is available both in general and on a concept level.
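To make the idea of a concept-level comparison concrete, here is a minimal sketch that lines up per-concept counts from two datasets side by side. It assumes each dataset's counts are available as a mapping from concept ID to count; the function names, concept IDs, and counts are hypothetical and do not reflect how the CCOI site implements its comparison.

```python
def compare_concept_counts(counts_a, counts_b):
    """Return {concept_id: (count_in_a, count_in_b)} over the union of concept IDs.

    A concept missing from one dataset is reported with a count of 0.
    """
    all_ids = sorted(set(counts_a) | set(counts_b))
    return {cid: (counts_a.get(cid, 0), counts_b.get(cid, 0)) for cid in all_ids}


def shared_concepts(counts_a, counts_b):
    """Return the sorted concept IDs that appear in both datasets."""
    return sorted(set(counts_a) & set(counts_b))


# Made-up concept IDs and counts for two hypothetical datasets.
dataset_a = {1: 250, 2: 40}
dataset_b = {2: 10, 3: 5}

comparison = compare_concept_counts(dataset_a, dataset_b)
# comparison == {1: (250, 0), 2: (40, 10), 3: (0, 5)}
overlap = shared_concepts(dataset_a, dataset_b)
# overlap == [2]
```

A researcher assessing feasibility could use this kind of side-by-side view to see which datasets contain enough records for the concepts their study depends on.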