Identification of "comment-on sentences" in online biomedical documents using support vector machines.
Kim IC, Le DX, Thoma GR
Proc. SPIE conference on Document Recognition and Retrieval, 6500:65000O (1-8), San Jose, January 2007.
Abstract:
MEDLINE is the premier bibliographic online database of the National Library of Medicine, containing approximately 14 million citations and abstracts from over 4,800 biomedical journals. This paper presents an automated method, based on support vector machines (SVMs), for identifying the "comment-on" list, a MEDLINE citation field that denotes previously published articles commented on by a given article. For comparison, we also introduce a method based on scoring functions that estimate the significance of each sentence in a given article. Preliminary experiments on HTML-formatted online biomedical documents collected from 24 journal titles show that the SVM with a polynomial kernel function performs best in terms of recall and F-measure.
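The abstract names two standard quantities: the polynomial kernel used by the best-performing SVM and the recall/F-measure used to evaluate it. The sketch below shows both, using their textbook definitions. It is not the authors' implementation. The abstract gives no kernel degree, offset, or sentence features, so the parameter defaults, the feature vectors, and the support-vector values are all hypothetical. Label 1 is assumed to mark a "comment-on" sentence.

```python
def polynomial_kernel(x, y, degree=2, gamma=1.0, coef0=1.0):
    """Standard polynomial kernel K(x, y) = (gamma * <x, y> + coef0) ** degree.
    The default degree/gamma/coef0 are hypothetical, not taken from the paper."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree


def svm_decision(x, support_vectors, dual_coefs, bias, **kernel_args):
    """Kernel SVM decision value f(x) = sum_i (alpha_i * y_i) K(sv_i, x) + b.
    dual_coefs[i] holds the product alpha_i * y_i for support vector i.
    A positive value would classify sentence vector x as a comment-on sentence."""
    return sum(
        c * polynomial_kernel(sv, x, **kernel_args)
        for sv, c in zip(support_vectors, dual_coefs)
    ) + bias


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F-measure (F1) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    print(polynomial_kernel([1, 2], [3, 4]))  # (11 + 1) ** 2 = 144.0
    # Hypothetical trained model with two support vectors.
    score = svm_decision([1, 0], [[1, 0], [0, 1]], [1.0, -1.0], 0.0)
    print(score)  # 4.0 - 1.0 + 0.0 = 3.0 -> positive class
    print(precision_recall_f1([1, 0, 1, 1], [1, 1, 0, 1]))
```

In practice a library SVM (for example scikit-learn's `SVC` with `kernel="poly"`) would learn the support vectors and coefficients. The hand-written decision function only illustrates how the polynomial kernel enters the classifier.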