Discoveries from Clinical Data

Large collections of clinical data -- from longitudinal research projects, electronic medical records, and health information exchanges -- provide opportunities to re-examine controversial findings from smaller-scale clinical studies and to conduct retrospective epidemiological studies in areas that lack clinical trials.

NLM established a goal to integrate biomedical, clinical, and public health information systems that promote scientific discovery and speed the translation of research into practice (NLM Long Range Plan, 2006-2016, Goal 3).  One of NLM's key recommendations to fulfill this goal is to "develop linked databases for discovering relationships between clinical data, genetic information, and environmental factors."

LHNCBC's biostatistician and clinicians are using MIT’s large longitudinal MIMIC-II database (33,000 patients with 40,000 intensive care unit (ICU) visits and 180 million rows of data) to answer clinical research questions. We also contributed standard clinical vocabulary code mappings to the latest MIMIC-II release (v 2.6).

We have completed a study on the impact of obesity on outcomes after critical illness, which was published in the journal Critical Care.

Ongoing studies include: 1) the relationship between vitamin B12 levels and mortality; and 2) the relationship between blood transfusions, feeds, and necrotizing enterocolitis (NEC) in newborns.

We developed and implemented natural language processing (NLP) algorithms to extract patients’ smoking status and discharge destinations from the MIMIC-II physician discharge summaries. For the NEC study, we extracted information on episodes of neonatal apnea and bradycardia, as well as maternal history, from clinical notes for infants in the neonatal intensive care unit (NICU). We also extracted data about hypertension and hypertensive medications from free-text notes and compared those data with ICD-9 hypertension diagnosis codes to evaluate underreporting of certain common conditions after ICU admission.
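
The comparison of free-text findings against diagnosis codes can be illustrated with a minimal sketch. It is not the algorithm used in this work: the keyword rule, the record layout, and the sample patients are all hypothetical, and a real extractor would also need to handle negation, family history, and phrases such as "pulmonary hypertension". The only fixed fact it relies on is that ICD-9-CM categories 401 through 405 cover hypertensive disease.

```python
import re

# ICD-9-CM categories 401-405 cover hypertensive disease.
HTN_ICD9 = re.compile(r"^40[1-5](\.|$)")
# Hypothetical keyword rule (assumption): no negation or context handling.
HTN_TEXT = re.compile(r"\b(hypertension|htn)\b", re.IGNORECASE)


def coded_htn(icd9_codes):
    """True if any diagnosis code falls in ICD-9-CM categories 401-405."""
    return any(HTN_ICD9.match(code) for code in icd9_codes)


def note_htn(note):
    """True if the free-text note mentions hypertension (naive keyword match)."""
    return bool(HTN_TEXT.search(note))


def underreporting_rate(patients):
    """Share of patients whose note mentions hypertension but who lack a 401-405 code."""
    mentioned = [p for p in patients if note_htn(p["note"])]
    if not mentioned:
        return 0.0
    uncoded = [p for p in mentioned if not coded_htn(p["icd9"])]
    return len(uncoded) / len(mentioned)


# Made-up records for illustration only.
patients = [
    {"note": "History of HTN, on lisinopril.", "icd9": ["401.9", "250.00"]},
    {"note": "Longstanding hypertension.", "icd9": ["428.0"]},
    {"note": "No significant past medical history.", "icd9": ["486"]},
    {"note": "HTN controlled with metoprolol.", "icd9": ["038.9"]},
]
rate = underreporting_rate(patients)
print(f"Hypertension mentioned but not coded: {rate:.0%}")
```

In this toy data, three notes mention hypertension and two of those patients carry no hypertension code, so the sketch reports the uncoded share among patients with a text mention.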

To assist with integrating and analyzing the data, LHNCBC researchers are using NLM-supported clinical vocabulary standards to improve the utility of the MIMIC-II database. We mapped its laboratory tests and medications to LOINC and RxNorm, respectively, and its radiology reports to the LOINC codes that describe each radiology study.
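
As a rough illustration of what such a mapping looks like in practice, the sketch below attaches LOINC codes to laboratory rows through a lookup table and reports which local items remain unmapped. The local item identifiers (`LAB_GLUCOSE` and so on) and the row layout are hypothetical and do not reflect the actual MIMIC-II schema; the three LOINC codes are real codes for common tests. A medication-to-RxNorm mapping would follow the same pattern.

```python
# Hypothetical local lab item IDs (assumption) mapped to real LOINC codes.
LOCAL_TO_LOINC = {
    "LAB_GLUCOSE": "2345-7",  # Glucose [Mass/volume] in Serum or Plasma
    "LAB_CREAT": "2160-0",    # Creatinine [Mass/volume] in Serum or Plasma
    "LAB_HGB": "718-7",       # Hemoglobin [Mass/volume] in Blood
}


def annotate(rows, mapping):
    """Attach a LOINC code to each row and collect local IDs with no mapping."""
    annotated, unmapped = [], set()
    for row in rows:
        code = mapping.get(row["item"])
        if code is None:
            unmapped.add(row["item"])
        annotated.append({**row, "loinc": code})
    return annotated, unmapped


# Made-up rows for illustration only.
rows = [
    {"item": "LAB_GLUCOSE", "value": 112.0},
    {"item": "LAB_HGB", "value": 13.4},
    {"item": "LAB_LOCAL_PANEL_X", "value": 1.0},  # no mapping on purpose
]
annotated, unmapped = annotate(rows, LOCAL_TO_LOINC)
print(annotated)
print("Unmapped local items:", sorted(unmapped))
```

Tracking the unmapped items explicitly makes it easy to measure mapping coverage and to prioritize which local codes still need review.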

We are also developing a maximum likelihood (ML) statistical method to address measurement error in NLP-derived variables and thereby reduce bias, which could increase the utility of NLP-derived data.
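
To show why a likelihood-based correction can reduce bias, here is a textbook-style sketch rather than the method under development. It assumes a binary NLP-derived variable whose sensitivity and specificity are known (for example, from a validation sample) and estimates the true prevalence by maximizing a binomial likelihood. All numbers and function names are illustrative assumptions.

```python
import math


def log_likelihood(p, k, n, se, sp):
    """Binomial log-likelihood of k NLP-positive cases out of n at true prevalence p.

    The constant binomial coefficient is omitted because it does not depend on p.
    """
    # Probability that the NLP flag is positive, given sensitivity and specificity.
    q = se * p + (1.0 - sp) * (1.0 - p)
    q = min(max(q, 1e-12), 1.0 - 1e-12)  # guard against log(0)
    return k * math.log(q) + (n - k) * math.log(1.0 - q)


def mle_prevalence(k, n, se, sp, steps=10000):
    """Maximize the log-likelihood over a grid of prevalence values in [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda p: log_likelihood(p, k, n, se, sp))


def closed_form(k, n, se, sp):
    """Closed-form maximizer for this simple model, clipped to [0, 1]."""
    estimate = (k / n + sp - 1.0) / (se + sp - 1.0)
    return min(1.0, max(0.0, estimate))


# Illustrative numbers (assumptions): 300 of 1000 patients flagged by NLP,
# with assumed sensitivity 0.80 and specificity 0.95.
k, n = 300, 1000
se, sp = 0.80, 0.95
naive = k / n
corrected = mle_prevalence(k, n, se, sp)
print(f"naive prevalence: {naive:.3f}, ML-corrected prevalence: {corrected:.3f}")
```

With these assumed inputs the naive proportion understates the prevalence because missed cases outnumber false positives; the likelihood-based estimate adjusts for both kinds of error. A grid search is used only to keep the example dependency-free.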

This LHNCBC research aligns closely with NIH's Big Data to Knowledge (BD2K) initiative, which "seeks to facilitate broad use of biomedical big data through new data sharing policies, catalogs of datasets, and enhanced training for early career scientists entering the new world of big data" by supporting "the management, analysis and integration of large-scale data and informatics."
