Natural Language Processing

Multi-modal Information Retrieval and Question Answering

Research Area: Natural Language Processing

Researchers: Sameer Antani, Dina Demner-Fushman

This project seeks to improve information retrieval from collections of full-text biomedical articles, images, and patient cases by moving beyond conventional text-based searching to combine text and visual features in order to:

  • Build text processing and image processing tools to index images and image-related text, and enable searching of the literature by textual, visual and hybrid search queries.
  • Build tools employing a combination of text and image features to enrich traditional bibliographic citations with relevant biomedical images, charts, graphs, diagrams and other illustrations, as well as with patient-oriented outcomes from the literature.
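The hybrid querying described above combines a text-relevance score with a visual-similarity score. A minimal late-fusion sketch is shown below; the function, the linear weighting scheme, and the sample scores are illustrative assumptions, not the system's actual method.

```python
# Hypothetical late-fusion ranking: combine a text score and an image
# score (both assumed normalized to [0, 1]) into one hybrid score.
def hybrid_score(text_score, image_score, alpha=0.6):
    """Linear fusion; alpha weights the text component."""
    return alpha * text_score + (1 - alpha) * image_score

# Toy candidates: (text score, image score) per document.
candidates = {
    "doc1": (0.9, 0.2),  # strong text match, weak visual match
    "doc2": (0.4, 0.9),  # weaker text match, strong visual match
}

# Rank documents by the fused score, best first.
ranked = sorted(candidates,
                key=lambda d: hybrid_score(*candidates[d]),
                reverse=True)
```

In practice the fusion weight would be tuned on retrieval benchmarks such as ImageCLEF rather than fixed by hand.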

In addition to developing these tools, we test them in two related initiatives that seek to:

  • Improve the retrieval of the biomedical literature by targeting the visual content in articles. Within this broad goal, we initially focus on finding information in the literature relevant to a patient's medical case, linking that information to the health record, and supporting clinical question answering.
  • Improve the retrieval of semantically similar images from the literature and from image databases, with the goal of reducing the "semantic gap" that is a significant hindrance to the use of image retrieval for practical clinical purposes.

You can test the prototype image search engine, Open-i.

See the technical report to the LHNCBC Board of Scientific Counselors (September 2010) for details.


The tools comprise the following components:

  • Text processing:
    • Caption extraction and segmentation
    • Mention extraction
    • Biomedical terminology extraction
  • Image processing:
    • Multi-panel figure segmentation
    • Text and symbol localization
    • Color and texture feature computation
  • Image classification using supervised machine learning
  • Image annotation:
    • Automatic UMLS-based medium-level annotation using textual references to image regions of interest and mark-up
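Two steps in the pipeline above, feature computation and supervised image classification, can be sketched together. The example below is a minimal illustration assuming grayscale pixel intensities and a 1-nearest-neighbour classifier; the feature (a quantized intensity histogram), the labels, and the toy data are hypothetical, and the real system uses richer color/texture features and stronger learners.

```python
def color_histogram(pixels, bins=4):
    """Quantize pixel intensities (0-255) into a normalized histogram."""
    hist = [0] * bins
    for p in pixels:
        hist[min(p * bins // 256, bins - 1)] += 1
    total = len(pixels)
    return [h / total for h in hist]

def classify(features, training_set):
    """1-nearest-neighbour: return the label of the closest training example."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(training_set, key=lambda ex: sq_dist(features, ex[0]))[1]

# Toy training data: dark images labelled "radiograph", bright ones "chart".
train = [
    (color_histogram([10, 20, 30, 40]), "radiograph"),
    (color_histogram([200, 220, 240, 250]), "chart"),
]

# Classify a new (dark) image by its histogram features.
label = classify(color_histogram([15, 25, 35, 45]), train)
```

The same pattern, features extracted per image and fed to a supervised model, extends directly to the modality classification used to filter charts, diagrams, and clinical images.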


Open-i was evaluated in the ImageCLEF medical image retrieval tasks.