Identifying “comment-on” citation data in online biomedical articles using SVM-based text summarization technique.
Comment-on (CON), a MEDLINE® citation field, indicates previously published articles commented on by authors expressing possibly complimentary or contradictory opinions. This paper presents an automated method using a support vector machine (SVM)-based text summarization technique that identifies CON data by distinguishing CON sentences from “citation sentences” and analyzing their corresponding bibliographic data in the references. We compare the performance of two types of SVM, one with a linear kernel function and the other with a radial basis function (RBF) kernel. Input feature vectors for these SVMs are created by combining five feature types: 1) word statistics, 2) frequency of occurrence of author names, 3) sentence positions, 4) similarity between titles, and 5) difference of publication years. Experiments conducted on a set of online biomedical articles show that the SVM with an RBF kernel performs better than the SVM with a linear kernel in terms of precision, recall, and F-measure for identifying CON.
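As a minimal illustration of the two kernel functions being compared and of the evaluation measures named above, the following Python sketch may help. It is not the authors' implementation: the function names, the `gamma` value, and the toy label lists are assumptions chosen only for illustration, and the kernels are written in their standard textbook forms.

```python
import math


def linear_kernel(x, y):
    """Linear kernel: K(x, y) = x . y (dot product of two feature vectors)."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel: K(x, y) = exp(-gamma * ||x - y||^2).

    gamma=0.5 is an illustrative value, not one taken from the paper.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def precision_recall_f(gold, predicted):
    """Precision, recall, and F-measure for binary labels.

    gold and predicted are equal-length sequences of booleans,
    where True marks a sentence labeled as a CON sentence.
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)        # true positives
    fp = sum(1 for g, p in pairs if not g and p)    # false positives
    fn = sum(1 for g, p in pairs if g and not p)    # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical toy labels for four sentences (not data from the paper).
gold = [True, True, False, False]
predicted = [True, False, True, False]
p, r, f = precision_recall_f(gold, predicted)
```

In a full system, each sentence would be represented by a feature vector built from the five feature types listed above, and a trained SVM would use one of these kernels to compare vectors. The feature extraction and SVM training steps are omitted here.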
Keywords: “Comment-on” identification, online biomedical documents, support vector machine, MEDLINE