Natural Language Processing

Multi-modal Information Retrieval and Question Answering

Research Area: Natural Language Processing

Researchers: Sameer Antani, Dina Demner-Fushman

This project seeks to improve information retrieval from collections of full-text biomedical articles, images, and patient cases by moving beyond conventional text-based searching to combine text and visual features in order to:

  • Build text processing and image processing tools to index images and image-related text, and enable searching of the literature by textual, visual and hybrid search queries.
  • Build tools employing a combination of text and image features to enrich traditional bibliographic citations with relevant biomedical images, charts, graphs, diagrams and other illustrations, as well as with patient-oriented outcomes from the literature.
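The hybrid querying described above combines a text-relevance score with a visual-similarity score. A minimal late-fusion sketch is shown below; the function, the linear weighting scheme, and the sample scores are illustrative assumptions, not the system's actual method.

```python
# Hypothetical late-fusion ranking: combine a text score and an image
# score (both assumed normalized to [0, 1]) into one hybrid score.
def hybrid_score(text_score, image_score, alpha=0.6):
    """Linear fusion; alpha weights the text component."""
    return alpha * text_score + (1 - alpha) * image_score

# Toy candidates: (text score, image score) per document.
candidates = {
    "doc1": (0.9, 0.2),  # strong text match, weak visual match
    "doc2": (0.4, 0.9),  # weaker text match, strong visual match
}

# Rank documents by the fused score, best first.
ranked = sorted(candidates,
                key=lambda d: hybrid_score(*candidates[d]),
                reverse=True)
```

In practice the fusion weight would be tuned on retrieval benchmarks such as ImageCLEF rather than fixed by hand.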

In addition to developing these tools, we test them in two related initiatives that seek to:

  • Improve the retrieval of the biomedical literature by targeting the visual content in articles. Within this broad goal, we initially focus on finding information in the literature relevant to a patient's medical case, linking that information to the health record, and supporting clinical question answering.
  • Improve the retrieval of semantically similar images from the literature and from image databases, with the goal of reducing the "semantic gap" that is a significant hindrance to the use of image retrieval for practical clinical purposes.

You can test the prototype image search engine, Open-i.

See the technical report to the LHNCBC Board of Scientific Counselors (September 2010) for details.


The tools comprise the following components:

  • Text processing:
    • Caption extraction and segmentation
    • Mention extraction
    • Biomedical terminology extraction
  • Image processing:
    • Multi-panel figure segmentation
    • Text and symbol localization
    • Color and texture feature computation
  • Image classification using supervised machine learning
  • Image annotation:
    • Automatic UMLS-based medium-level annotation using textual references to image regions of interest and mark-up
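Two steps in the pipeline above, feature computation and supervised image classification, can be sketched together. The example below is a minimal illustration assuming grayscale pixel intensities and a 1-nearest-neighbour classifier; the feature (a quantized intensity histogram), the labels, and the toy data are hypothetical, and the real system uses richer color/texture features and stronger learners.

```python
def color_histogram(pixels, bins=4):
    """Quantize pixel intensities (0-255) into a normalized histogram."""
    hist = [0] * bins
    for p in pixels:
        hist[min(p * bins // 256, bins - 1)] += 1
    total = len(pixels)
    return [h / total for h in hist]

def classify(features, training_set):
    """1-nearest-neighbour: return the label of the closest training example."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(training_set, key=lambda ex: sq_dist(features, ex[0]))[1]

# Toy training data: dark images labelled "radiograph", bright ones "chart".
train = [
    (color_histogram([10, 20, 30, 40]), "radiograph"),
    (color_histogram([200, 220, 240, 250]), "chart"),
]

# Classify a new (dark) image by its histogram features.
label = classify(color_histogram([15, 25, 35, 45]), train)
```

The same pattern, features extracted per image and fed to a supervised model, extends directly to the modality classification used to filter charts, diagrams, and clinical images.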


Open-i was evaluated in the ImageCLEF medical image retrieval tasks.