RESEARCH/R&D
Health information standards and discovery research focuses on the development of methods to gain insights from large health databases while learning the strengths and weaknesses of datasets and improving them, when possible. This area of research assesses whether specific standards are fit for purpose (e.g., quality assurance and interoperability assessments of biomedical terminologies) and investigates standards in action (e.g., in support of tasks such as natural language processing, annotation, data integration, and mapping across terminologies).
The Center for Clinical Observational Investigations (CCOI) seeks to reduce barriers to accessing data for researchers through an evolving, multi-pronged approach. As a first step, the project will curate a list of clinical datasets and catalog high-level data into Dataset Profiles.
The goal of this project is to develop a tool that can generate data entry forms dynamically based on specifications stored in a database. The development platform is Ruby on Rails, an open-source web application framework. Developers are using this tool in the data capture function of personal health records. They are also using several terminology resources from the UMLS (e.g., RxNorm, ICD-9-CM) in data entry fields that require a set of controlled terms. Further development will involve work with very large databases of de-identified patient data. The goal is to create additional reusable software tools, some of which will involve biostatistical analysis with the "R" package.
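As a minimal sketch of the idea, assuming a hypothetical specification format (a list of field records, standing in for rows read from a database), a form can be rendered from data rather than written by hand. The record layout, the field names, and the `render_field` and `render_form` helpers are illustrative assumptions; they are not the project's schema or its Ruby on Rails implementation.

```python
from html import escape

# Hypothetical form specification, standing in for rows loaded from a database.
FORM_SPEC = [
    {"name": "drug_name", "label": "Medication", "type": "text"},
    {"name": "route", "label": "Route", "type": "select",
     "options": ["Oral", "Topical", "Injectable"]},
]


def render_field(field):
    """Render one field record as an HTML label plus an input control."""
    name = escape(field["name"], quote=True)
    label = f'<label for="{name}">{escape(field["label"])}</label>'
    if field["type"] == "select":
        # Fields restricted to controlled terms become a drop-down list.
        options = "".join(
            f'<option value="{escape(o, quote=True)}">{escape(o)}</option>'
            for o in field.get("options", [])
        )
        control = f'<select id="{name}" name="{name}">{options}</select>'
    else:
        control = f'<input id="{name}" name="{name}" type="text">'
    return label + control


def render_form(spec):
    """Build a complete form from a list of field records."""
    body = "\n".join(render_field(f) for f in spec)
    return f"<form>\n{body}\n</form>"


print(render_form(FORM_SPEC))
```

Because the form is derived entirely from the specification, adding or changing a field in this sketch means editing a record, not the rendering code.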
Multiple projects in this area continue to promote the development, enhancement, and adoption of clinical vocabulary standards. Inter-terminology mapping promotes the use of standard terminologies by creating maps to administrative terminologies, which allows re-use of encoded clinical data.
Provided by the Centers for Medicare & Medicaid Services (CMS), the Virtual Research Data Center (VRDC) now carries 17 years of Medicare Parts A and B claims data, including diagnoses, procedures, medications dispensed in offices (mostly injectable), and vital status derived from Social Security death records. Since late 2006, it has also contained Part D prescription medication claims (dispensed by community pharmacies). More recently, cause of death (captured by CDC) has become available for 1999-2016.
We apply cutting-edge data science approaches, including artificial intelligence and machine learning, to existing large-scale clinical datasets (LSCDs) and rearrange the data by grouping people with HIV who are highly similar to each other into their own cohorts. Research conducted on such cohorts is expected to be more reproducible, and its conclusions more robust. We will do this by automating the segmentation of people who are described in LSCDs and living with HIV. We will segment their clinical events into cohorts with reproducible cohort definitions. Our reproducible cohort definitions can be used to design novel studies or to compare LSCDs with one another before a study begins, supporting an intentional choice of LSCD. Nationality, demography, geography, treatment era, comorbidities, and preexisting conditions (prior to HIV infection) should inform treatment outcomes and efficacy when studying people living with HIV.
We developed and implemented natural language processing algorithms to extract patients’ smoking status and discharge destinations from the MIMIC-II physician discharge summaries. We extracted information on episodes of neonatal apnea and bradycardia as well as maternal history from clinical notes for infants in the neonatal intensive care unit (NICU) for the NEC study. We also extracted data about hypertension and hypertensive medications from free-text notes and compared those data with ICD-9 hypertension diagnosis codes to evaluate underreporting of certain common conditions after ICU admission.
To assist with integrating and analyzing the data, LHNCBC's researchers are using NLM-supported clinical vocabulary standards to improve the utility of the MIMIC-II database. We mapped its laboratory tests and medications to LOINC and RxNorm, respectively, and its radiology reports to the LOINC codes that describe each radiology study.
The current version of the LHC-Forms is at https://lhcforms.nlm.nih.gov/.
This is a collection of components used to create forms for use in Electronic Health Records.
The Medical Informatics Pioneers oral history project is here.
Oral history is a method for documenting history in a vivid way by recording the voices of those who have experienced it.
Beginning in 2004, Drs. Joan S. Ash and Dean F. Sittig chose and interviewed 17 medical informatics pioneers to capture their memories.
In 2013, NLM acquired the transcripts from the first 15 interviews and began work to make them publicly available, including recruiting and placing photographs to enliven the written words.
The LHNCBC Medical Terminology Standards project seeks to facilitate the development, promotion, and dissemination of health data standards, as well as to support the use of terminology standards in health care, public health, and research. The Project focuses on the integration, dissemination, quality assurance and applications of drug ontologies and on quality assurance in biomedical ontologies. We also develop application programming interfaces (APIs) and browsers for RxNorm and related drug resources.
Visit RxNav at https://rxnav.nlm.nih.gov/.
The RxNorm browser RxNav and application programming interfaces (APIs) support the adoption and distribution of RxNorm, the NLM standard terminology for drugs. RxNav and companion APIs also extend the scope of RxNorm by linking RxNorm drugs to physician-friendly terms (RxTerms), and drug classes (RxClass). RxMix allows users to combine API functions to build applications. RxNav-in-a-Box provides users with a locally installable version of the APIs and applications.
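As a rough sketch of calling these APIs from a program, the following looks up RxNorm identifiers (RxCUIs) by drug name. The endpoint path (`/REST/rxcui.json` with a `name` parameter) and the response shape (`idGroup` containing an `rxnormId` list) are stated here as assumptions and should be checked against the current API documentation. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the RxNorm REST API hosted with RxNav.
BASE_URL = "https://rxnav.nlm.nih.gov/REST"


def rxcui_lookup_url(drug_name):
    """Build the lookup URL for finding RxNorm identifiers by drug name."""
    return f"{BASE_URL}/rxcui.json?{urlencode({'name': drug_name})}"


def parse_rxcuis(payload):
    """Extract identifiers from a decoded response; empty list when none are present."""
    return payload.get("idGroup", {}).get("rxnormId", [])


def find_rxcuis(drug_name, timeout=10):
    """Query the service over the network and return any matching identifiers."""
    with urlopen(rxcui_lookup_url(drug_name), timeout=timeout) as response:
        return parse_rxcuis(json.load(response))
```

Calling `find_rxcuis("aspirin")` requires network access. Keeping URL construction and response parsing in separate functions lets both be checked offline.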
The current version of RxTerms is at https://mor.nlm.nih.gov/RxTerms/.
RxTerms is a drug interface terminology derived from RxNorm for prescription writing or medication history recording (e.g., in e-prescribing systems and PHRs). RxTerms is free to use (see terms and conditions). It links directly to RxNorm, the U.S. drug terminology standard, and facilitates the inclusion of RxNorm identifiers in electronic health records.
The Top LOINC Codes can be downloaded from https://loinc.org/usage/obs/.
LHNCBC, in cooperation with Regenstrief Institute, obtained and analyzed statistical data from many health care organizations to identify the most frequently used subset of codes that organizations could target for mapping. It obtained three years of laboratory test frequency distributions from sources including Partners of Boston, the Indiana Network for Patient Care (an HIE), and United Healthcare, all of which had mapped their test results to LOINC.
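One way to picture how a frequency distribution yields a target subset is a cumulative-coverage cut: rank codes by volume and keep the smallest leading set that reaches a chosen share of all results. This is a sketch under that assumption, not a description of the method actually used. The `target` threshold, the function name, and the counts are made up for illustration.

```python
def top_codes_for_coverage(counts, target=0.99):
    """Return the most frequent codes whose combined volume reaches `target`."""
    total = sum(counts.values())
    if total == 0:
        return []
    selected, covered = [], 0
    # Walk codes from most to least frequent, accumulating volume.
    for code, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= target * total:
            break
        selected.append(code)
        covered += n
    return selected


# Made-up counts for illustration only; "T1".."T4" are placeholder codes.
example_counts = {"T1": 700, "T2": 200, "T3": 90, "T4": 10}
print(top_codes_for_coverage(example_counts, target=0.8))
```

With skewed distributions like this one, a small number of codes accounts for most of the volume, which is what makes a targeted mapping subset practical.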
The natural language processing (NLP), or text mining, research of the Lister Hill National Center for Biomedical Communications (LHNCBC) focuses on the development and evaluation of computer algorithms for automated text analysis. This area of research works primarily with text from the biomedical literature or electronic medical records and examines a wide variety of NLP tasks, including information extraction, literature searches, question answering, and text summarization.
BabelMeSH is a multi-language tool for searching MEDLINE/PubMed.
The consumer health question answering project was launched to support NLM customer services that receive about 90,000 requests a year from a world-wide pool of customers. The requests are categorized by the customer support services staff and are either answered using about 300 stock answers (with or without modifications) or researched and answered by the staff manually. Responding to a customer with a stock reply takes approximately 4 minutes; answering with a personalized stock reply takes about 10 minutes. To reduce the time and cost of customer services, NLM launched the Consumer Health Information and Question Answering (CHIQA) project. The CHIQA project conducts research in both the automatic classification of customers’ requests and the automatic answering of consumer health questions.
The current version of NLM-Scrubber, NLM's HIPAA-compliant clinical text de-identification tool, is at https://scrubber.nlm.nih.gov/.
LHNCBC is developing a new software application that is capable of de-identifying many kinds of clinical reports with high accuracy. The software design uses a number of deterministic and probabilistic pattern recognition algorithms and various computational linguistic methods. The application accepts narrative reports in plain text or in HL7 format. When the reports are formatted as HL7 messages, the application leverages the labeled patient-related information embedded in various HL7 segments to find such information in the free text narrative.
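The HL7-assisted step can be illustrated with a small sketch: read the patient name from the PID segment of an HL7 v2 message, then redact those name parts wherever they occur in the narrative. This covers only that one idea and is not the application's implementation, which combines several deterministic and probabilistic methods. The function names, the `[NAME]` placeholder, and the synthetic message are assumptions made for the example.

```python
import re


def patient_name_tokens(hl7_message):
    """Collect name parts from the PID-5 (patient name) field of an HL7 v2 message."""
    tokens = set()
    for segment in re.split(r"[\r\n]+", hl7_message.strip()):
        fields = segment.split("|")
        if fields[0] == "PID" and len(fields) > 5:
            # Name components (family, given, ...) are separated by "^".
            tokens.update(part for part in fields[5].split("^") if part)
    return tokens


def scrub(narrative, tokens, placeholder="[NAME]"):
    """Replace whole-word, case-insensitive occurrences of each token."""
    for token in sorted(tokens, key=len, reverse=True):
        pattern = r"\b" + re.escape(token) + r"\b"
        narrative = re.sub(pattern, placeholder, narrative, flags=re.IGNORECASE)
    return narrative


# Synthetic message and note; no real patient data.
message = "MSH|^~\\&|LAB|HOSP\rPID|1||0000||DOE^JANE\rOBX|1|TX|||Jane Doe was seen today."
note = "Jane Doe was seen today. Ms. Doe reports no pain."
print(scrub(note, patient_name_tokens(message)))
```

Exact matching of this kind misses misspellings and nicknames, which is one reason a real de-identifier would pair structured-field lookups with pattern-based and probabilistic recognition.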
The Indexing Initiative (II) project investigates language-based and machine learning methods for the automatic selection of subject headings for use in both semi-automated and fully automated indexing environments at NLM. Its major goal is to facilitate the retrieval of biomedical information from textual databases such as MEDLINE.
The current version of the SPECIALIST Lexicon and NLP Tools is at https://lhncbc.nlm.nih.gov/LSG. LHNCBC's Lexical Systems Group develops and maintains the SPECIALIST Lexicon and the tools that support and exploit it. The SPECIALIST Lexicon and NLP Tools are at the center of NLM's natural language research, providing a foundation for all our natural language processing efforts. In general, we investigate the contributions that natural language processing techniques can make to the task of mediating between the language of users and the language of online biomedical information resources. The SPECIALIST NLP Tools facilitate natural language processing by helping application developers with lexical variation and text analysis tasks in the biomedical domain.
This project seeks to improve information retrieval from collections of full-text biomedical articles, images, and patient cases, by moving beyond conventional text-based searching to combining both text and visual features.
The current version of Open-i is at https://openi.nlm.nih.gov/.
The Open-i® (pronounced “open eye”) experimental multimedia search engine retrieves and displays structured MEDLINE citations augmented by image-related text and concepts and linked to images based on image features.
PubMed for Handhelds research brings medical information to the point of care via devices like smartphones. This includes developing algorithms and public-domain tools for searching by text message (askMEDLINE and txt2MEDLINE), applying clinical filters (PICO) and viewing summary abstracts (The Bottom Line and Consensus Abstracts) in MEDLINE/PubMed, and evaluating the use of these tools in Clinical Decision Support.
Image processing focuses on data science research in biomedical image and signal processing, artificial intelligence, and machine learning to support automated clinical decision-making in disease screening and diagnostics. This area of research includes image and text analysis for clinical research, exploration of visual content relevant to disease in images and video, and visual information retrieval for embedding automated decision-support systems in diagnostic and treatment pathways.
Advances in machine learning and artificial intelligence techniques promise to supplement clinical practice with rapid, accurate, and reliable computer-assisted disease screening. Such techniques are particularly valuable in overburdened or resource-constrained regions, which also tend to exhibit a high prevalence of infectious diseases and report high mortality. Our research in machine learning and artificial intelligence algorithms aims to improve disease detection accuracy and reliability, with the further goal of explaining algorithm behavior.
The goal of our work in Biomedical Imaging is two-fold: first, to develop advanced imaging tools for biomedical research in partnership with the National Cancer Institute and other organizations; second, to conduct research in Content Based Image Retrieval (CBIR) to index and retrieve medical images by image features (e.g., shape, color, and texture), augmented by textual features. This work includes the development of the CervigramFinder for retrieval of uterine cervix images by image features; SPIRS for retrieval of digitized x-ray images of the spine from NHANES II; and SPIRS-IRMA, a distributed global system for image retrieval by both high-level and detailed features of medical images, developed in collaboration with Aachen University, Germany.
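To make retrieval by image features concrete, here is a deliberately simple sketch that ranks images by color similarity using normalized RGB histograms and histogram intersection. A color histogram stands in for the richer shape, color, and texture features mentioned above. It is not the method used by CervigramFinder, SPIRS, or SPIRS-IRMA, and all names and pixel data are illustrative assumptions.

```python
def color_histogram(pixels, bins=4):
    """Normalized joint RGB histogram for a list of (r, g, b) tuples in 0..255."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index in 0..bins-1.
        rb, gb, bb = r * bins // 256, g * bins // 256, b * bins // 256
        hist[rb * bins * bins + gb * bins + bb] += 1.0
    total = len(pixels) or 1
    return [count / total for count in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def rank_by_similarity(query_pixels, collection):
    """Return image names ordered from most to least similar to the query."""
    q = color_histogram(query_pixels)
    scores = {name: histogram_intersection(q, color_histogram(px))
              for name, px in collection.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Tiny synthetic "images" (lists of pixels) for illustration only.
reddish = [(250, 10, 10)] * 4
collection = {
    "also_red": [(240, 20, 20)] * 3 + [(10, 10, 240)],
    "blue": [(10, 10, 250)] * 4,
}
print(rank_by_similarity(reddish, collection))
```

A production system would index precomputed feature vectors instead of recomputing them per query, and would combine several feature types with text.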
To improve malaria diagnostics, the Lister Hill National Center for Biomedical Communications, an R&D division of the US National Library of Medicine, in collaboration with NIH’s National Institute of Allergy and Infectious Diseases (NIAID) and Mahidol-Oxford University, is developing a fully-automated system for parasite detection and counting in blood films.