Natural Language Processing

Research in natural language processing (NLP), or text mining, at the Lister Hill National Center for Biomedical Communications (LHNCBC) focuses on developing and evaluating computer algorithms for automated text analysis. This research works primarily with text from the biomedical literature and from electronic medical records, and it addresses a wide variety of NLP tasks, including information extraction, literature search, question answering, and text summarization.

Related Projects

BabelMeSH

BabelMeSH is a multi-language tool for searching MEDLINE/PubMed.

Consumer Health Question Answering

The Consumer Health Question Answering project was launched to support NLM Customer Services, which receives about 90,000 requests a year from customers worldwide.

De-Identification Tools

Computational de-identification uses natural language processing (NLP) tools and techniques to recognize individually identifiable patient information in text (e.g., names, addresses, telephone numbers, and Social Security numbers) and to redact it. In this way, patient privacy is protected while clinical knowledge is preserved.
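
As a rough illustration of the recognize-then-redact idea, the sketch below replaces a few identifier types with category tags. It is a toy, pattern-based example under stated assumptions, not LHNCBC's de-identification method: the regular expressions, the `redact` function, and the caller-supplied list of known patient names are all hypothetical. Production systems rely on much richer NLP (named-entity recognition, dictionaries, context rules).

```python
import re

# Hypothetical patterns for two identifier types (US-style formats only).
# Real de-identification tools use far more than regular expressions.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b"),
}


def redact(text, known_names=()):
    """Replace known names and pattern-matched identifiers with tags.

    `known_names` is an assumed input, e.g. names taken from a record header.
    """
    for name in known_names:
        # Whole-word match of each literal name.
        text = re.sub(r"\b" + re.escape(name) + r"\b", "[NAME]", text)
    # Apply patterns in insertion order (SSN before PHONE).
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    note = "John Smith, SSN 123-45-6789, call (301) 555-0100."
    print(redact(note, known_names=["John Smith"]))
```

Replacing each identifier with a category tag rather than deleting it keeps the surrounding clinical text readable, which is the trade-off the paragraph above describes.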

Indexing Initiative

The Indexing Initiative (II) project investigates language-based and machine learning methods for the automatic selection of subject headings for use in both semi-automated and fully automated indexing environments at NLM. Its major goal is to facilitate the retrieval of biomedical information from textual databases such as MEDLINE.
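
One very simple language-based approach to suggesting subject headings is dictionary lookup: match known entry terms (synonyms) in the text and rank the headings they map to by frequency. The sketch below assumes a tiny, hypothetical vocabulary (`ENTRY_TERMS`) and a hypothetical `suggest_headings` function. It is not how the Indexing Initiative's tools work; it only illustrates the general idea of mapping free text to controlled headings.

```python
import re
from collections import Counter

# Hypothetical mini-vocabulary: entry terms mapped to the subject heading
# they suggest. A real system would draw on the full MeSH vocabulary.
ENTRY_TERMS = {
    "heart attack": "Myocardial Infarction",
    "myocardial infarction": "Myocardial Infarction",
    "high blood pressure": "Hypertension",
    "hypertension": "Hypertension",
    "aspirin": "Aspirin",
}


def suggest_headings(text, top_n=3):
    """Rank candidate headings by how often their entry terms occur in text."""
    lowered = text.lower()
    counts = Counter()
    for term, heading in ENTRY_TERMS.items():
        # Whole-word match of each entry term in the lowercased text.
        hits = re.findall(r"\b" + re.escape(term) + r"\b", lowered)
        counts[heading] += len(hits)
    # Keep only headings that were actually matched, most frequent first.
    return [h for h, c in counts.most_common(top_n) if c > 0]


if __name__ == "__main__":
    abstract = "Aspirin after a heart attack may prevent another myocardial infarction."
    print(suggest_headings(abstract))
```

Because synonyms map to the same heading, "heart attack" and "myocardial infarction" both count toward Myocardial Infarction, which is the main benefit of indexing with a controlled vocabulary.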

Lexical Systems & Tools (SPECIALIST)

LHNCBC's Lexical Systems Group develops and maintains the SPECIALIST Lexicon and the tools that support and exploit it. The SPECIALIST Lexicon and NLP Tools are central to NLM's natural language research and provide a foundation for all our natural language processing efforts.

Multi-modal Information Retrieval and Question Answering

This project seeks to improve information retrieval from collections of full-text biomedical articles, images, and patient cases by moving beyond conventional text-based searching to methods that combine text and visual features.
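
A common way to combine text and visual evidence is late fusion: score each document separately by text similarity and by image similarity, then rank by a weighted sum. The sketch below assumes both score sets are already normalized to [0, 1]; the `fuse_scores` function and its `alpha` weight are hypothetical and are not taken from this project's systems.

```python
def fuse_scores(text_scores, image_scores, alpha=0.5):
    """Rank documents by a weighted sum of text and image similarity scores.

    Assumes both dicts map document ids to scores normalized to [0, 1];
    a document missing from one modality contributes 0 for that modality.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    docs = set(text_scores) | set(image_scores)
    fused = {
        d: alpha * text_scores.get(d, 0.0) + (1 - alpha) * image_scores.get(d, 0.0)
        for d in docs
    }
    # Sort by descending fused score; break ties by document id for stable output.
    return sorted(docs, key=lambda d: (-fused[d], d))


if __name__ == "__main__":
    text = {"case_a": 0.9, "case_b": 0.6, "case_c": 0.2}
    image = {"case_b": 0.9, "case_c": 0.5}
    print(fuse_scores(text, image, alpha=0.5))
```

With text alone (`alpha=1.0`), `case_a` ranks first; giving the image scores equal weight moves `case_b`, which also has a strong visual match, to the top. Tuning `alpha` controls how much the visual features influence the ranking.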

Open-i

The Open-i (pronounced “open eye”) experimental multimedia search engine retrieves and displays structured MEDLINE citations that are augmented with image-related text and concepts and linked to images on the basis of image features.

PubMed for Handhelds

PubMed for Handhelds research brings medical information to the point of care via devices like smartphones. This includes developing algorithms and public-domain tools for searching by text message (askMEDLINE), applying clinical filters (PICO) and viewing summary abstracts (The Bottom Line and Consensus Abstracts) in MEDLINE/PubMed, and evaluating the use of these tools in Clinical Decision Support.