18 January 2010
Naïve Bayes and SVM classifiers for classifying databank accession number sentences from online biomedical articles
Abstract
This paper describes two classifiers, Naïve Bayes and Support Vector Machine (SVM), that identify sentences containing Databank Accession Numbers, a key piece of bibliographic information, in online biomedical articles. Correctly identifying these sentences is a prerequisite for extracting the numbers themselves. The classifiers use the words that occur most frequently in the sentences as classification features. Twelve sets of word features are collected to train and test the classifiers, with set sizes ranging from 100 to 1,200 word features. The performance of each classifier is evaluated using four measures: Precision, Recall, F-Measure, and Accuracy. The Naïve Bayes classifier scores above 93.91% on all four measures at 200 word features. The SVM achieves 98.80% Precision at 200 word features, 94.90% Recall at 500 and 700 word features, 96.46% F-Measure at 200 word features, and 99.14% Accuracy at 200 and 400 word features. To improve classification performance, we propose two merging operators, Max and Harmonic Mean, that combine the results of the two classifiers. The final results show a measurable improvement in Recall, F-Measure, and Accuracy.
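The abstract names the two merging operators and the four evaluation measures without giving formulas. The following Python sketch illustrates one plausible reading and is not the authors' implementation. It assumes each classifier emits a per-sentence score in [0, 1] for the positive class (sentence contains an accession number) and that a merged score is thresholded at 0.5; the function names, the threshold, and the sample scores are all hypothetical.

```python
from typing import Dict, Sequence


def merge_max(score_nb: float, score_svm: float) -> float:
    """Max operator: keep the more confident of the two scores (assumed form)."""
    return max(score_nb, score_svm)


def merge_harmonic_mean(score_nb: float, score_svm: float) -> float:
    """Harmonic mean of the two scores; defined as 0.0 when both are zero."""
    total = score_nb + score_svm
    if total == 0:
        return 0.0
    return 2 * score_nb * score_svm / total


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute Precision, Recall, F-Measure, and Accuracy for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    # Hypothetical per-sentence scores; not data from the paper.
    nb_scores = [0.9, 0.4, 0.2, 0.7]
    svm_scores = [0.8, 0.6, 0.1, 0.3]
    labels = [1, 1, 0, 0]
    threshold = 0.5  # assumed decision threshold

    for name, op in (("max", merge_max), ("harmonic mean", merge_harmonic_mean)):
        merged = [op(a, b) for a, b in zip(nb_scores, svm_scores)]
        predictions = [1 if s >= threshold else 0 for s in merged]
        print(name, evaluate(labels, predictions))
```

Under this reading, Max labels a sentence positive when either classifier is confident, which tends to favor Recall, while Harmonic Mean is pulled toward the lower of the two scores and so requires both classifiers to agree reasonably well.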
© (2010) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Jongwoo Kim, Daniel X. Le, George R. Thoma, "Naïve Bayes and SVM classifiers for classifying databank accession number sentences from online biomedical articles", Proc. SPIE 7534, Document Recognition and Retrieval XVII, 75340U (18 January 2010); doi: 10.1117/12.838961; https://doi.org/10.1117/12.838961
PROCEEDINGS
8 PAGES

