
Discoveries from MIMIC II/III and Other Sources

Large database collections of clinical data -- from longitudinal research projects, electronic medical records, and health information exchanges -- provide opportunities to re-examine controversial findings from smaller-scale clinical studies and to conduct retrospective epidemiological studies in areas that lack clinical trials.

NLM established a goal to integrate biomedical, clinical, and public health information systems that promote scientific discovery and speed the translation of research into practice (NLM Long Range Plan, 2006-2016, Goal 3).  One of NLM's key recommendations to fulfill this goal is to "develop linked databases for discovering relationships between clinical data, genetic information, and environmental factors."

LHNCBC's biostatistician and clinicians are using MIT’s large longitudinal MIMIC-II database (33,000 patients with 40,000 intensive care unit (ICU) visits and 180 million rows of data) to answer clinical research questions. We also contributed standard clinical vocabulary code mappings to the latest MIMIC-II release (v 2.6).

MIMIC II is a de-identified collection of ten years of almost complete intensive care unit (ICU) records, organized and maintained by MIT (Massachusetts Institute of Technology), and derived from ICU records at Beth Israel, Boston. It includes all laboratory test results, vital signs, nursing notes, radiology reports, drug orders, discharge summaries, and other test results. It also includes vital status. On the one hand, MIMIC II delivers provider notes and radiology reports as narrative text – grist for exercising LHNCBC’s natural language processing methods. On the other hand, it delivers medications, diagnoses, and observations, including nursing and ICU measures and laboratory tests, as structured data with defined fields and “coded identifiers”. MIMIC has recently added images for more than 200,000 chest X-ray studies, providing more grist of interest to NLM researchers.

We have completed a study on the impact of obesity on outcomes after critical illness, which was published in the journal Critical Care.

Ongoing studies include: 1) the relationship between vitamin B12 levels and mortality; and 2) the relationship between blood transfusions, feeds, and necrotizing enterocolitis (NEC) in newborns.

We developed and implemented Natural Language Processing (NLP) algorithms to extract patients’ smoking status and discharge destinations from the MIMIC-II physician discharge summaries. For the NEC study, we extracted information on episodes of neonatal apnea and bradycardia, as well as maternal history, from clinical notes for infants in the neonatal intensive care unit (NICU). We also extracted data about hypertension and antihypertensive medications from free-text notes, and compared those data with ICD-9 hypertension diagnosis codes to evaluate underreporting of certain common conditions after ICU admission.
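The comparison between note-derived findings and ICD-9 codes can be illustrated with a small Python sketch. It is not the LHNCBC algorithm, which this page does not describe in detail. The regular expressions, the record layout (`note`, `icd9_codes`), and the function names are assumptions made for illustration. The only coding fact relied on is that ICD-9-CM categories 401 through 405 cover hypertensive disease.

```python
import re

# Hypothetical, deliberately simple patterns. A production system would need
# much more robust handling (for example, "pulmonary hypertension" would be
# counted here as a hypertension mention).
HTN_MENTION = re.compile(r"\b(hypertension|htn)\b", re.IGNORECASE)
NEGATION = re.compile(
    r"\b(no|denies|without|negative for)\s+(?:\w+\s+){0,3}?(hypertension|htn)\b",
    re.IGNORECASE,
)

# ICD-9-CM categories 401-405: hypertensive disease.
HTN_ICD9_PREFIXES = {"401", "402", "403", "404", "405"}


def note_mentions_hypertension(note: str) -> bool:
    """Return True if any sentence has a non-negated hypertension mention."""
    for sentence in re.split(r"[.\n]", note):
        if HTN_MENTION.search(sentence) and not NEGATION.search(sentence):
            return True
    return False


def has_hypertension_code(icd9_codes) -> bool:
    """Return True if any code falls in ICD-9-CM categories 401-405.

    Accepts codes with or without the decimal point ("401.9" or "4019").
    """
    return any(code.strip()[:3] in HTN_ICD9_PREFIXES for code in icd9_codes)


def underreporting_rate(patients):
    """Fraction of note-positive patients who have no hypertension code.

    Returns None when no patient has a hypertension mention in the notes.
    """
    note_positive = [p for p in patients if note_mentions_hypertension(p["note"])]
    if not note_positive:
        return None
    uncoded = [p for p in note_positive if not has_hypertension_code(p["icd9_codes"])]
    return len(uncoded) / len(note_positive)


if __name__ == "__main__":
    # Made-up example records, not MIMIC-II data.
    example = [
        {"note": "History of hypertension, on lisinopril.", "icd9_codes": ["4019"]},
        {"note": "PMH: HTN. Admitted with sepsis.", "icd9_codes": ["0389"]},
        {"note": "No history of hypertension.", "icd9_codes": ["486"]},
    ]
    print(underreporting_rate(example))
```

In this sketch, a patient whose notes mention hypertension but whose discharge codes contain no 401-405 entry counts as underreported. The rate is conditional on the note-based finding being correct, which is itself subject to NLP error.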

To assist with integrating and analyzing the data, LHNCBC's researchers are using NLM-supported clinical vocabulary standards to improve the utility of the MIMIC-II database. We mapped the laboratory tests to LOINC, the medications to RxNorm, and the radiology reports to the LOINC codes that describe the radiology study.
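As a rough illustration of what such a mapping looks like in practice, the following Python sketch attaches LOINC codes to local laboratory rows through a lookup table. The local identifiers (`LAB_GLUCOSE` and so on), the row layout, and the function names are hypothetical and do not reflect the actual MIMIC-II schema or the released mapping files. The three LOINC codes shown are real codes for common tests.

```python
# Hypothetical local lab identifiers mapped to real LOINC codes.
LOCAL_TO_LOINC = {
    "LAB_GLUCOSE": "2345-7",     # Glucose [Mass/volume] in Serum or Plasma
    "LAB_CREATININE": "2160-0",  # Creatinine [Mass/volume] in Serum or Plasma
    "LAB_HEMOGLOBIN": "718-7",   # Hemoglobin [Mass/volume] in Blood
}


def annotate_with_loinc(rows, mapping=LOCAL_TO_LOINC):
    """Return copies of the rows with a 'loinc' field (None if unmapped)."""
    return [{**row, "loinc": mapping.get(row["local_code"])} for row in rows]


def mapping_coverage(rows, mapping=LOCAL_TO_LOINC):
    """Fraction of rows whose local code has a LOINC mapping."""
    if not rows:
        return 0.0
    mapped = sum(1 for row in rows if row["local_code"] in mapping)
    return mapped / len(rows)


if __name__ == "__main__":
    # Made-up rows for illustration only.
    labs = [
        {"local_code": "LAB_GLUCOSE", "value": 112, "unit": "mg/dL"},
        {"local_code": "LAB_UNKNOWN", "value": 3.1, "unit": "mg/dL"},
    ]
    for row in annotate_with_loinc(labs):
        print(row)
    print(mapping_coverage(labs))
```

Keeping unmapped rows with an explicit `None` rather than dropping them makes gaps in the mapping visible, which matters when coverage is reported by row volume.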

We are also developing a Maximum Likelihood (ML) statistical method to address measurement error in NLP-derived variables. By reducing the bias that such error introduces, the method could increase the utility of NLP-derived data.
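The general idea of correcting for measurement error by maximum likelihood can be shown with a textbook special case, sketched below in Python. It is not the method under development at LHNCBC, which this page does not specify. The sketch assumes a binary NLP-derived flag whose sensitivity and specificity are known (for example, from a manually reviewed validation sample), and estimates the true prevalence. Under that model the probability of a positive flag is q = p·Se + (1 − p)·(1 − Sp), and the maximum likelihood estimate of p has a closed form, truncated to the interval [0, 1]. All names and numbers are illustrative.

```python
import math


def log_likelihood(p, positives, n, sensitivity, specificity):
    """Binomial log-likelihood of the observed flag counts at prevalence p."""
    # Probability that the NLP flag is positive when the true prevalence is p.
    q = p * sensitivity + (1 - p) * (1 - specificity)
    q = min(max(q, 1e-12), 1 - 1e-12)  # guard against log(0)
    return positives * math.log(q) + (n - positives) * math.log(1 - q)


def mle_prevalence(positives, n, sensitivity, specificity):
    """Closed-form maximum likelihood estimate of the true prevalence.

    Assumes sensitivity and specificity are known and that the flag is
    better than chance (sensitivity + specificity > 1).
    """
    if sensitivity + specificity <= 1:
        raise ValueError("sensitivity + specificity must exceed 1")
    observed = positives / n
    p = (observed + specificity - 1) / (sensitivity + specificity - 1)
    return min(max(p, 0.0), 1.0)  # truncate to a valid probability


if __name__ == "__main__":
    # Illustrative numbers only: 260 of 1000 notes flagged positive.
    naive = 260 / 1000
    corrected = mle_prevalence(260, 1000, sensitivity=0.80, specificity=0.95)
    print(f"naive prevalence:     {naive:.3f}")
    print(f"corrected prevalence: {corrected:.3f}")
```

With these illustrative inputs the naive estimate is the raw flagged fraction, while the corrected estimate accounts for missed cases and false positives. Realistic settings, such as misclassified covariates in a regression model or uncertain sensitivity and specificity, require a fuller likelihood than this one-parameter case.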

This LHNCBC research aligns closely with NIH's Big Data to Knowledge (BD2K) initiative, which "seeks to facilitate broad use of biomedical big data through new data sharing policies, catalogs of datasets, and enhanced training for early career scientists entering the new world of big data" by supporting "the management, analysis and integration of large-scale data and informatics."

Publications/Tools: 
Ben Abacha A, Long LR, Seco de Herrera AG, Antani SK, Wang K, Demner-Fushman D. Named Entity Recognition in Functional Neuroimaging Literature. BIBM 2017.
Kury F, Baik SH, McDonald CJ. Cardioprotective Drugs and Incident Dementias in Medicare's Big Data. AMIA 2017.
Bhupatiraju R, Huser V, Fung K. Phenotype modelling tools utilizing standardized EHR data in a Common Data Model format [Poster]. NIH Research Festival 2017.
Roberts K, Gururaj A, Chen X, Pournejati S, Cohen T, Hersh WR, Demner-Fushman D. Information Retrieval for Biomedical Datasets. 2016 bioCADDIE Challenge. AMIA 2017.
Mundkur ML, Callaghan FM, Abhyankar S, McDonald CJ. Use of Electronic Health Record Data to Evaluate the Impact of Race on 30-Day Mortality in Patients Admitted to the Intensive Care Unit. J Racial Ethn Health Disparities. 2017 Aug;4(4):539-548. doi: 10.1007/s40615-016-0256-6. Epub 2016 Jun 20.
Chakrabarti S, Sen A, Huser V, Hruby GW, Rusanova A, Albers DJ, Weng C. An Interoperable Similarity-based Cohort Identification Method Using the OMOP Common Data Model version 5.0. J Healthc Inform Res. 2017 Jun;1(1):1-18. doi: 10.1007/s41666-017-0005-6. Epub 2017 Jun 8.
Kury F, Baik SH, McDonald CJ. Analysis of Healthcare Cost and Utilization in the First Two Years of the Medicare Shared Savings Program Using Big Data from the CMS Enclave. AMIA Annu Symp Proc. 2017 Feb 10;2016:724-733. eCollection 2016.
Cahan A, Cimino JJ. Improving precision medicine using individual patient data from trials. CMAJ. 2017 Feb 6;189(5):E204-E207. doi: 10.1503/cmaj.160267. Epub 2016 Aug 29.
Cohen T, Roberts K, Gururaj AE, Chen X, Pournejati S, Alter G, Hersh WR, Demner-Fushman D, Ohno-Machado L, Lu H. A publicly available benchmark for biomedical dataset retrieval: the reference standard for the 2016 bioCADDIE dataset retrieval challenge. Database (Oxford). 2017 Jan 1;2017. doi: 10.1093/database/bax061.
Huser V, DeFalco FJ, Schuemie M, Ryan PB, Shang N, Velez M, Park RW, Boyce RD, Duke J, Khare R, Utidjian L, Bailey L. Multisite Evaluation of a Data Quality Tool for Patient-Level Clinical Data Sets. EGEMS (Wash DC). 2016 Nov 30;4(1):1239. doi: 10.13063/2327-9214.1239. eCollection 2016.
