LHNCBC Project List

Current Projects

Advances in machine learning and artificial intelligence promise rapid, accurate, and reliable computer-assisted disease screening. Such techniques are particularly valuable in overburdened or resource-constrained regions, which also tend to have a high prevalence of infectious diseases and high mortality. Our research in machine learning and artificial intelligence algorithms aims to improve the accuracy and reliability of disease detection, and also to explain algorithm behavior.

The goal of this project is to develop a tool that generates data entry forms dynamically from specifications stored in a database. The development platform is Ruby on Rails, an open-source web application framework. Developers are using this tool in the data capture function of personal health records. They are also using several terminology resources from the UMLS (e.g., RxNorm, ICD-9-CM) in data entry fields that require a set of controlled terms. Further development will involve work with very large databases of de-identified patient data. The goal is to create additional reusable software tools, some of which will involve biostatistical analysis with the "R" package.
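The idea of generating a form from specifications stored in a database can be sketched as follows. This is only an illustration in Python, not the project's Ruby on Rails code; the `form_fields` table, its columns, and the sample rows are hypothetical.

```python
import html
import sqlite3


def load_field_specs(conn, form_name):
    """Return the field specifications for one form, in display order."""
    rows = conn.execute(
        "SELECT name, label, field_type FROM form_fields "
        "WHERE form_name = ? ORDER BY position",
        (form_name,),
    ).fetchall()
    return [{"name": n, "label": l, "type": t} for n, l, t in rows]


def render_form(specs):
    """Build a simple HTML form from a list of field specifications."""
    lines = ["<form>"]
    for spec in specs:
        label = html.escape(spec["label"])
        name = html.escape(spec["name"], quote=True)
        ftype = html.escape(spec["type"], quote=True)
        lines.append(
            f'  <label>{label} <input type="{ftype}" name="{name}"></label>'
        )
    lines.append("</form>")
    return "\n".join(lines)


# Hypothetical schema and rows, held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE form_fields ("
    "form_name TEXT, position INTEGER, name TEXT, label TEXT, field_type TEXT)"
)
conn.executemany(
    "INSERT INTO form_fields VALUES (?, ?, ?, ?, ?)",
    [
        ("medications", 2, "dose", "Dose", "text"),
        ("medications", 1, "drug_name", "Drug name", "text"),
        ("vitals", 1, "weight", "Weight (kg)", "number"),
    ],
)

form_html = render_form(load_field_specs(conn, "medications"))
print(form_html)
```

Because the layout lives in the table rather than in the code, adding or reordering a field means changing a row, not the rendering function.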

Multiple projects in this area continue to promote the development, enhancement, and adoption of clinical vocabulary standards. Inter-terminology mapping promotes the use of standard terminologies by creating maps to administrative terminologies, which allows re-use of encoded clinical data.

Provided by the Centers for Medicare & Medicaid Services (CMS), the Virtual Research Data Center (VRDC) now carries 17 years of Parts A and B claims data, including diagnoses, procedures, medications dispensed in offices (mostly injectable), and vital status derived from Social Security death records. Since late 2006, it has also contained Part D medication prescription claims (dispensed by community pharmacies). More recently, cause of death (captured by CDC) has become available for 1999-2016.

The consumer health question answering project was launched to support NLM customer services that receive about 90,000 requests a year from a worldwide pool of customers. The requests are categorized by the customer support services staff and are either answered using about 300 stock answers (with or without modifications) or researched and answered by the staff manually. Responding to a customer with a stock reply takes approximately 4 minutes; answering with a personalized stock reply takes about 10 minutes. To reduce the time and cost of customer services, NLM launched the Consumer Health Information and Question Answering (CHIQA) project. The CHIQA project conducts research in both the automatic classification of customers’ requests and the automatic answering of consumer health questions.

NLM-Scrubber is the NLM HIPAA-compliant clinical text de-identification tool.

LHNCBC is developing a new software application that is capable of de-identifying many kinds of clinical reports with high accuracy. The software design uses a number of deterministic and probabilistic pattern recognition algorithms and various computational linguistic methods. The application accepts narrative reports in plain text or in HL7 format. When the reports are formatted as HL7 messages, the application leverages the labeled patient-related information embedded in various HL7 segments to find such information in the free text narrative.
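The HL7-assisted step described above can be illustrated with a minimal Python sketch. It is not NLM-Scrubber's implementation. It assumes HL7 v2 pipe-delimited messages in which PID-3 holds the patient identifier, PID-5 holds the name as family^given, and OBX-5 holds the narrative. The sample message and the placeholder tokens are invented for illustration.

```python
import re


def patient_terms(hl7_message):
    """Collect patient name parts and the identifier from the PID segment."""
    terms = {}
    for segment in hl7_message.split("\r"):
        fields = segment.split("|")
        if fields[0] != "PID":
            continue
        # PID-3: patient identifier list; the first component is the ID itself.
        if len(fields) > 3 and fields[3]:
            terms[fields[3].split("^")[0]] = "[PATIENT_ID]"
        # PID-5: patient name, encoded as family^given^...
        if len(fields) > 5:
            for part in fields[5].split("^")[:2]:
                if part:
                    terms[part] = "[PATIENT_NAME]"
    return terms


def scrub_narrative(hl7_message):
    """Return the OBX-5 narratives with the labeled patient terms replaced."""
    terms = patient_terms(hl7_message)
    scrubbed = []
    for segment in hl7_message.split("\r"):
        fields = segment.split("|")
        if fields[0] == "OBX" and len(fields) > 5:
            text = fields[5]
            for term, placeholder in terms.items():
                pattern = r"\b" + re.escape(term) + r"\b"
                text = re.sub(pattern, placeholder, text, flags=re.IGNORECASE)
            scrubbed.append(text)
    return scrubbed


# Invented sample message; HL7 v2 separates segments with carriage returns.
message = "\r".join([
    "MSH|^~\\&|LAB|HOSP|||202001011200||ORU^R01|1|P|2.3",
    "PID|1||123456^^^HOSP||Doe^Jane||19700101|F",
    "OBX|1|TX|NOTE||Ms. Doe (MRN 123456) was seen today. Jane reports no pain.",
])

print(scrub_narrative(message))
```

Exact matching of this kind misses misspellings and nicknames, and it cannot find identifiers that never appear in a labeled segment. The sketch therefore covers only the HL7-assisted step, not the pattern recognition and linguistic methods the application also uses.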

We developed and implemented Natural Language Processing algorithms to extract patients’ smoking status and discharge destinations from the MIMIC-II physician discharge summaries. We extracted information on episodes of neonatal apnea and bradycardia, as well as maternal history, from clinical notes for infants in the neonatal intensive care unit (NICU) for the NEC study. We also extracted data about hypertension and hypertensive medications from free-text notes and compared those data with ICD-9 hypertension diagnosis codes to evaluate underreporting of certain common conditions after ICU admission.

To assist with integrating and analyzing the data, LHNCBC's researchers are using NLM-supported clinical vocabulary standards to improve the utility of the MIMIC-II database. We mapped its laboratory tests and medications to LOINC and RxNorm, respectively, and its radiology reports to the LOINC codes that describe the radiology study.

The goal of our work in Biomedical Imaging is twofold: first, to develop advanced imaging tools for biomedical research in partnership with the National Cancer Institute and other organizations; second, to conduct research in Content Based Image Retrieval (CBIR) to index and retrieve medical images by image features (e.g., shape, color, and texture), augmented by textual features. This work includes the development of the CervigramFinder for retrieval of uterine cervix images by image features; SPIRS for retrieval of digitized x-ray images of the spine from NHANES II; and SPIRS-IRMA, a distributed global system for image retrieval by both high-level and detailed features of medical images, developed in collaboration with Aachen University, Germany.

The Indexing Initiative (II) project investigates language-based and machine learning methods for the automatic selection of subject headings for use in both semi-automated and fully automated indexing environments at NLM. Its major goal is to facilitate the retrieval of biomedical information from textual databases such as MEDLINE.

InfoBot automatically augments a patient's Electronic Health Record (EHR) with pertinent information from NLM resources. The software runs as background agents, both at a hospital and at NLM. The hospital uses our APIs to integrate the search setup and to display and store results in its existing EHR system. For clinical settings that have no means to use the API, a Web-based interface allows information requests to be entered manually. The InfoBot API, integrated with the NIH Clinical Center’s EMR system, CRIS, has been in daily use through the Evidence-Based Practice tab in CRIS since July 2009. Information provided to a medical institution is customized according to the institution's requirements. The requirements define the EMR fields that are provided to InfoBot and the knowledge sources that InfoBot mines for information. Each set of requirements for a specific clinical task and user group is called a Ruleset. Medical institutions can define as many Rulesets as they need to support their daily practice with evidence.

LHNCBC's Lexical Systems Group develops and maintains the SPECIALIST Lexicon and the tools that support and exploit it. The SPECIALIST Lexicon and NLP Tools are at the center of NLM's natural language research, providing a foundation for all our natural language processing efforts. In general, we investigate the contributions that natural language processing techniques can make to the task of mediating between the language of users and the language of online biomedical information resources. The SPECIALIST NLP Tools facilitate natural language processing by helping application developers with lexical variation and text analysis tasks in the biomedical domain.

LHC-Forms is a collection of components used to create forms for use in Electronic Health Records.

To improve malaria diagnostics, the Lister Hill National Center for Biomedical Communications, an R&D division of the US National Library of Medicine, in collaboration with NIH’s National Institute of Allergy and Infectious Diseases (NIAID) and Mahidol-Oxford University, is developing a fully automated system for parasite detection and counting in blood films.

The Medical Informatics Pioneers project is an oral history project.

Oral history is a method for documenting history in a vivid way by recording the voices of those who have experienced it.

Beginning in 2004, Drs. Joan S. Ash and Dean F. Sittig chose and interviewed 17 medical informatics pioneers to capture their memories.

In 2013, NLM acquired the transcripts from the first 15 interviews and began work to make them publicly available, including recruiting and placing photographs to enliven the written words.

The LHNCBC Medical Ontology Research project encompasses basic research on biomedical terminologies and ontologies and their applications to natural language processing, clinical decision support, translational medicine, data integration and interoperability. In the past few years, our research has focused on the integration, dissemination, quality assurance and applications of drug ontologies and on quality assurance in biomedical ontologies. We also develop application programming interfaces (APIs) and browsers for drug resources including RxNorm, RxTerms, NDF-RT and ATC.

This project seeks to improve information retrieval from collections of full-text biomedical articles, images, and patient cases by moving beyond conventional text-based searching to combining both text and visual features.

Newborn screening (NBS) in the United States is a complex public health program. The goals of NBS are to identify infants who appear healthy but have serious conditions, to begin treatment before they suffer significant disability or death, and in doing so to decrease the burden of disease on society. In 2006, a recommended uniform screening panel was published that included conditions based on detailed criteria for the condition itself, screening and diagnostic tests, and treatment and management.

The Open-i® (pronounced “open eye”) experimental multimedia search engine retrieves and displays structured MEDLINE citations augmented by image-related text and concepts and linked to images based on image features.

PubMed for Handhelds research brings medical information to the point of care via devices like smartphones. This includes developing algorithms and public-domain tools for searching by text message (askMEDLINE and txt2MEDLINE), applying clinical filters (PICO) and viewing summary abstracts (The Bottom Line and Consensus Abstracts) in MEDLINE/PubMed, and evaluating the use of these tools in Clinical Decision Support.

The Repository for Informed Decision Making (RIDeM) is one of the LHNCBC Clinical Decision Support projects. Its long-term goal is to provide access to key facts needed to support clinical decision making. The facts are extracted from biomedical literature and clinical text sources. The development of the Repository is guided by the Evidence Based Medicine (EBM) principles for finding and appraising information.

Released in September 2004, RxNav was developed as an interface to the RxNorm database and was primarily designed for displaying relations among drug entities. In addition to the browser, SOAP-based and RESTful application programming interfaces (APIs) were created, enabling users to integrate RxNorm in their applications. Examples of use include mapping drug names to RxNorm, finding the ingredient(s) corresponding to a brand name, and obtaining the list of NDCs for a given drug.
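The three example uses can be illustrated by building the corresponding RESTful request URLs. This is a sketch in Python. The base URL and resource paths (`rxcui.json`, `related.json`, `ndcs.json`) are assumptions based on the public RxNorm API and should be checked against the current API documentation. The identifier `12345` is a made-up placeholder, not a real RxNorm concept.

```python
from urllib.parse import quote, urlencode

# Assumed base URL of the RxNorm RESTful API; verify before use.
BASE_URL = "https://rxnav.nlm.nih.gov/REST"


def rxcui_by_name_url(name):
    """URL for mapping a drug name to an RxNorm identifier (RxCUI)."""
    return f"{BASE_URL}/rxcui.json?{urlencode({'name': name})}"


def ingredients_url(rxcui):
    """URL for the ingredient concepts (term type IN) related to an RxCUI."""
    return f"{BASE_URL}/rxcui/{quote(str(rxcui))}/related.json?{urlencode({'tty': 'IN'})}"


def ndcs_url(rxcui):
    """URL for the list of NDCs associated with an RxCUI."""
    return f"{BASE_URL}/rxcui/{quote(str(rxcui))}/ndcs.json"


print(rxcui_by_name_url("Lipitor 10 mg"))
print(ingredients_url(12345))  # placeholder identifier
print(ndcs_url(12345))         # placeholder identifier
```

An application would fetch these URLs with any HTTP client and parse the JSON response. The sketch stops at URL construction so that it runs without network access.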

RxTerms is a drug interface terminology derived from RxNorm for prescription writing or medication history recording (e.g., in e-prescribing systems and PHRs). RxTerms is free to use (see terms and conditions). It links directly to RxNorm, the U.S. drug terminology standard, and facilitates the inclusion of RxNorm identifiers in electronic health records.

SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms) is the most comprehensive, multi-lingual medical terminology in the world. It is emerging as the standard clinical terminology for use in the Electronic Health Record (EHR). Under the "Meaningful Use" EHR incentive program of the Centers for Medicare & Medicaid Services (CMS), one of the EHR certification criteria is that problem list data be encoded in SNOMED CT. The problem list is considered an essential part of the EHR by various sanctioning bodies and medical information standards organizations, including the Institute of Medicine, Joint Commission, American Society for Testing and Materials and Health Level Seven. The lack of a common standard for problem lists leads to duplication of effort and impedes data interoperability.

The Top LOINC Codes are available for download.

LHNCBC, in cooperation with Regenstrief Institute, obtained and analyzed statistical data from many health care organizations to identify the most frequently used subset of LOINC codes that organizations could target for mapping. It obtained frequency distributions covering three years of laboratory tests from several sources, including Partners of Boston, the Indiana Network for Patient Care (an HIE), and United Healthcare, all of which had mapped their test results to LOINC.
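One way to pick a "most frequent subset" from a frequency distribution is to take codes in descending order of use until they cover a chosen share of all test results. The Python sketch below illustrates that idea only. The selection rule, the 90% target, the placeholder code names, and the counts are all assumptions, not the method or data used in the actual analysis.

```python
from collections import Counter


def top_codes_for_coverage(counts, target_share):
    """Return the most frequent codes that together reach target_share of
    all results, plus the share those codes actually cover."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no test results to analyze")
    selected, covered = [], 0
    # most_common() yields (code, count) pairs, highest count first.
    for code, n in Counter(counts).most_common():
        if covered >= target_share * total:
            break
        selected.append(code)
        covered += n
    return selected, covered / total


# Invented counts for placeholder codes (not real LOINC codes).
counts = {"CODE-A": 500, "CODE-B": 300, "CODE-C": 150, "CODE-D": 40, "CODE-E": 10}

selected, share = top_codes_for_coverage(counts, 0.90)
print(selected, share)
```

With a skewed distribution like this one, a small number of codes accounts for most results, which is what makes a targeted mapping subset practical.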