Health Information Standards and Discovery

Health information standards and discovery research focuses on the development of methods to gain insights from large health databases while learning the strengths and weaknesses of datasets and improving them, when possible. This area of research assesses whether specific standards are fit for purpose (e.g., quality assurance and interoperability assessments of biomedical terminologies) and investigates standards in action (e.g., in support of tasks such as natural language processing, annotation, data integration, and mapping across terminologies).

Related Projects

Clinical Data Entry Tools

The goal of this project is to develop a tool that generates data entry forms dynamically from specifications stored in a database. The development platform is Ruby on Rails, an open-source web application framework. Developers are using this tool for the data capture function of personal health records. They are also using several terminology resources from the UMLS (e.g., RxNorm, ICD-9-CM) in data entry fields that require a set of controlled terms. Further development will involve work with very large databases of de-identified patient data.
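As a rough illustration of specification-driven form generation, the sketch below renders an HTML form from field definitions held in a small in-memory SQLite table. It is written in Python rather than the project's Ruby on Rails, and the table layout, column names, sample rows, and the `render_form` helper are all hypothetical; the project's actual schema is not described here.

```python
import html
import sqlite3

# Hypothetical specification table: one row per form field.
SCHEMA = """
CREATE TABLE field_spec (
    form_name TEXT,
    position  INTEGER,
    name      TEXT,
    label     TEXT,
    control   TEXT,  -- 'text' or 'select'
    choices   TEXT   -- comma-separated controlled terms for 'select'
)
"""


def render_form(conn, form_name):
    """Build an HTML form string from the stored field specifications."""
    rows = conn.execute(
        "SELECT name, label, control, choices FROM field_spec "
        "WHERE form_name = ? ORDER BY position",
        (form_name,),
    ).fetchall()
    parts = [f'<form id="{html.escape(form_name)}">']
    for name, label, control, choices in rows:
        name, label = html.escape(name), html.escape(label)
        parts.append(f'<label for="{name}">{label}</label>')
        if control == "select":
            # A controlled-term field becomes a drop-down list.
            options = "".join(
                f"<option>{html.escape(c.strip())}</option>"
                for c in (choices or "").split(",")
                if c.strip()
            )
            parts.append(f'<select name="{name}" id="{name}">{options}</select>')
        else:
            parts.append(f'<input type="text" name="{name}" id="{name}">')
    parts.append("</form>")
    return "\n".join(parts)


def demo_connection():
    """Create an in-memory database holding a made-up two-field form."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO field_spec VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("medication", 1, "drug", "Drug name", "select", "Drug A, Drug B"),
            ("medication", 2, "notes", "Notes", "text", None),
        ],
    )
    return conn


if __name__ == "__main__":
    print(render_form(demo_connection(), "medication"))
```

Because the form layout lives in data rather than in code, adding or reordering a field in this sketch only requires changing rows in the specification table.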

Clinical Vocabulary Standards

Multiple projects in this area continue to promote the development, enhancement, and adoption of clinical vocabulary standards. Inter-terminology mapping promotes the use of standard terminologies by creating maps to administrative terminologies, which allows re-use of encoded clinical data.
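To make the idea of re-using encoded data through a map concrete, here is a minimal Python sketch that translates codes from a clinical terminology into an administrative one. The function names and the `CLIN-`/`ADMIN-` identifiers are placeholders invented for illustration, not real terminology codes, and real maps usually carry extra rules that this sketch ignores.

```python
from collections import defaultdict


def build_map(rows):
    """Index (source_code, target_code) pairs; one source may map to several targets."""
    mapping = defaultdict(list)
    for source, target in rows:
        if target not in mapping[source]:
            mapping[source].append(target)
    return dict(mapping)


def translate(record_codes, mapping):
    """Split a record's source codes into mapped targets and unmapped leftovers."""
    mapped, unmapped = [], []
    for code in record_codes:
        targets = mapping.get(code)
        if targets:
            mapped.extend(targets)
        else:
            # Keep track of codes the map does not cover so they can be reviewed.
            unmapped.append(code)
    return mapped, unmapped


# Placeholder identifiers, not real terminology codes.
MAP_ROWS = [
    ("CLIN-001", "ADMIN-A"),
    ("CLIN-002", "ADMIN-B"),
    ("CLIN-002", "ADMIN-C"),
]

if __name__ == "__main__":
    print(translate(["CLIN-001", "CLIN-002", "CLIN-999"], build_map(MAP_ROWS)))
```

Returning the unmapped codes separately is a design choice of this sketch: it makes gaps in the map visible instead of silently dropping data.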

CMS’s Virtual Research Data Center (VRDC)

Provided by the Centers for Medicare & Medicaid Services (CMS), the VRDC now carries 17 years of Parts A and B claims data, including diagnoses, procedures, medications dispensed in offices (mostly injectables), and vital status derived from Social Security death records. Since late 2006, it has also contained Part D prescription medication claims (for drugs dispensed by community pharmacies). More recently, cause of death (captured by CDC) has become available for 1999-2016.

Discoveries from MIMIC II/III and Other Sources

Large database collections of clinical data -- from longitudinal research projects, electronic medical records, and health information exchanges -- provide opportunities to examine controversial findings from smaller scale clinical studies and to conduct retrospective epidemiological studies in areas that lack clinical trials.


A collection of components used to create forms for use in electronic health records.

Medical Informatics Pioneers

Drs. Joan S. Ash and Dean F. Sittig selected and interviewed 17 medical informatics pioneers to capture their memories. In 2013, NLM acquired the transcripts of the first 15 interviews and began work to make them publicly available, including gathering and placing photographs to enliven the written words.

Medical Ontology Research

The LHNCBC Medical Ontology Research project encompasses basic research on biomedical terminologies and ontologies and their applications to natural language processing, clinical decision support, translational medicine, data integration and interoperability.

Profiles in Science

The Profiles in Science Web site showcases digital reproductions of items selected from the personal manuscript collections of prominent biomedical researchers, medical practitioners, and those fostering science and health. The Web site provides worldwide access to this unique biomedical information.


RxNav

RxNav is an interface to the RxNorm database, designed for displaying relations among drug entities. In addition to the browser, we created SOAP-based and RESTful application programming interfaces (APIs) enabling users to integrate RxNorm in their applications.
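As a hedged sketch of what integrating RxNorm through a RESTful API might look like, the Python below builds a lookup request for a drug name and extracts RxNorm identifiers from a JSON reply. The base URL, the `rxcui.json` resource with its `name` parameter, and the `idGroup`/`rxnormId` response shape are assumptions that should be checked against the current API documentation; the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL for the RESTful API; verify before use.
BASE_URL = "https://rxnav.nlm.nih.gov/REST"


def rxcui_url(name):
    """Build the (assumed) request URL for looking up identifiers by drug name."""
    return f"{BASE_URL}/rxcui.json?{urlencode({'name': name})}"


def parse_rxcuis(payload):
    """Extract identifiers from a decoded reply, assuming an idGroup/rxnormId layout."""
    return payload.get("idGroup", {}).get("rxnormId", [])


def find_rxcuis(name, timeout=10):
    """Send the request over the network and return the identifiers found, if any."""
    with urlopen(rxcui_url(name), timeout=timeout) as resp:
        return parse_rxcuis(json.load(resp))


if __name__ == "__main__":
    # Requires network access; prints whatever identifiers the service returns.
    print(find_rxcuis("aspirin"))
```

Separating URL construction and response parsing from the network call keeps the parts that depend on the assumed API shape small and easy to adjust.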


RxTerms

RxTerms is a drug interface terminology derived from RxNorm for prescription writing or medication history recording (e.g., in e-prescribing systems and PHRs). RxTerms has three main advantages: it is free to use; it links directly to RxNorm, facilitating inclusion of RxNorm identifiers in electronic health records; and it supports efficient data entry.

Top LOINC Codes – Orders and Observations

LHNCBC, in cooperation with Regenstrief Institute, obtained and analyzed statistical data from many health care organizations to identify the most frequently used subset of LOINC codes that organizations could target for mapping. It obtained frequency distributions covering three years of laboratory tests from several sources, including Partners of Boston, the Indiana Network for Patient Care (a health information exchange, or HIE), and United Healthcare, all of which had mapped their test results to LOINC. The combined sample comprised 490 million test results.
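One way such a frequency analysis could be carried out is sketched below in Python: per-source code counts are pooled, and codes are taken in descending order of frequency until they account for a chosen share of all results. This is not the project's documented method; the `top_codes` helper, the coverage threshold, and the single-letter codes and counts in the demo are made up for illustration.

```python
from collections import Counter


def top_codes(sources, coverage=0.99):
    """Return the most frequent codes that together reach `coverage` of all results.

    `sources` is an iterable of {code: count} dictionaries, one per data source.
    """
    totals = Counter()
    for counts in sources:
        totals.update(counts)  # pool counts across sources
    grand_total = sum(totals.values())
    selected, running = [], 0
    for code, n in totals.most_common():
        # Stop once the codes selected so far cover the requested share.
        if grand_total == 0 or running / grand_total >= coverage:
            break
        selected.append(code)
        running += n
    return selected


if __name__ == "__main__":
    # Placeholder codes and counts, not real data.
    demo_sources = [{"A": 70, "B": 20}, {"A": 5, "C": 4, "D": 1}]
    print(top_codes(demo_sources, coverage=0.9))
```

In a skewed distribution like the demo, a small number of codes covers most results, which is what makes a short target list for mapping practical.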