PUBLICATIONS

A Hybrid Approach to Generation of Missing Abstracts in Biomedical Literature.


Chachra S, Ben Abacha A, Shooshan SE, Rodriguez L, Demner-Fushman D

Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers: 1093-1100.

Abstract:

Readers usually rely on abstracts to identify relevant medical information from scientific articles. Abstracts are also essential to advanced information retrieval methods. More than 50 thousand scientific publications in PubMed Central lack author-generated abstracts, and the relevancy judgements for these papers have to be based on their titles alone. In this paper, we propose a hybrid summarization technique that aims to select the most pertinent sentences from articles to generate an extractive summary in lieu of a missing abstract. We combine i) health outcome detection, ii) keyphrase extraction, and iii) textual entailment recognition between sentences. We evaluate our hybrid approach and analyze the improvements of multi-factor summarization over techniques that rely on a single method, using a collection of 295 manually generated reference summaries. The obtained results show that the hybrid approach outperforms the baseline techniques with an improvement of 13% in recall and 4% in F1 score.
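The combination of the three signals can be illustrated with a minimal sketch. It is not the authors' implementation: the per-sentence scores are assumed to come from separate, hypothetical components for health outcome detection, keyphrase extraction, and textual entailment, and the weighted sum, the `SentenceScores` type, and the sentence-level matching used for precision, recall, and F1 are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SentenceScores:
    """One article sentence with three precomputed scores (all hypothetical)."""
    text: str
    outcome: float      # assumed output of a health outcome detector
    keyphrase: float    # assumed output of a keyphrase extractor
    entailment: float   # assumed output of an entailment recognizer


def hybrid_summary(sentences, k=3, weights=(1.0, 1.0, 1.0)):
    """Pick the k highest-scoring sentences and return them in article order.

    The weighted sum is an illustrative way to combine the factors; the
    paper may combine them differently.
    """
    w_out, w_key, w_ent = weights
    scored = [
        (w_out * s.outcome + w_key * s.keyphrase + w_ent * s.entailment, i)
        for i, s in enumerate(sentences)
    ]
    # Highest score first; ties are broken by earlier position in the article.
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:k]
    keep = sorted(i for _, i in top)
    return [sentences[i].text for i in keep]


def precision_recall_f1(selected, reference):
    """Sentence-level overlap between a summary and a reference summary."""
    sel, ref = set(selected), set(reference)
    overlap = len(sel & ref)
    precision = overlap / len(sel) if sel else 0.0
    recall = overlap / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    article = [
        SentenceScores("A", 0.1, 0.2, 0.0),
        SentenceScores("B", 0.9, 0.5, 0.4),
        SentenceScores("C", 0.0, 0.1, 0.1),
        SentenceScores("D", 0.6, 0.7, 0.8),
    ]
    summary = hybrid_summary(article, k=2)
    print(summary)
    print(precision_recall_f1(summary, ["B", "C"]))
```

Setting two of the three weights to zero reduces the sketch to a single-method baseline, which is the kind of comparison the abstract describes.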


