14 February 2020
Heterogeneous features extraction based on deep learning for drug-related webpages classification
Ruiguang Hu, Qi Gao, Yujiao Jia, Yanxin Liu
Proceedings Volume 11430, MIPPR 2019: Pattern Recognition and Computer Vision; 1143002 (2020) https://doi.org/10.1117/12.2535014
Event: Eleventh International Symposium on Multispectral Image Processing and Pattern Recognition (MIPPR2019), 2019, Wuhan, China
Abstract
In this paper, heterogeneous feature extraction based on deep learning is applied to the classification of drug-related webpages. First, body text and image-label text are extracted through HTML parsing, and effective images are chosen by the FOCARSS algorithm. Second, a text-based BOW (bag-of-words) model is used to generate the text representation, and an image-based BOW model is used to generate the image representation. The webpage representation is generated by concatenating the text and image representations. Heterogeneous feature extraction is conducted separately by deep learning and by classical methods such as PCA. Feature selection is also conducted using information theory. Last, the extracted features and the selected features are classified. Experimental results demonstrate that the classification accuracy of features extracted by deep learning is higher than that of features extracted or selected by classical methods, and also higher than the accuracy of single-modal classification.
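The concatenation and feature-selection steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the vocabularies, the toy pages, and all function names are hypothetical, the visual words are assumed to be integer ids already assigned to each image (for example by clustering local descriptors), and mutual information is assumed as one possible information-theoretic selection criterion, since the abstract does not name the specific measure. The deep learning and PCA extraction steps are not covered here.

```python
import math
from collections import Counter

def bow_vector(tokens, vocabulary):
    """Count how often each vocabulary entry occurs in tokens."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

def webpage_vector(text_tokens, visual_words, text_vocab, visual_vocab):
    """Concatenate the text BOW and the image BOW into one representation."""
    return bow_vector(text_tokens, text_vocab) + bow_vector(visual_words, visual_vocab)

def mutual_information(feature_column, labels):
    """Mutual information (in bits) between a binarized feature and the labels."""
    n = len(labels)
    present = [1 if value > 0 else 0 for value in feature_column]
    joint = Counter(zip(present, labels))
    p_x = Counter(present)
    p_y = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((p_x[x] / n) * (p_y[y] / n)))
    return mi

def select_top_k(vectors, labels, k):
    """Return the indices of the k features with the highest mutual information."""
    n_features = len(vectors[0])
    scores = [
        mutual_information([v[j] for v in vectors], labels)
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k]

# Hypothetical toy data: (text tokens, visual-word ids, label); label 1 = drug-related.
text_vocab = ["dose", "pill", "recipe"]
visual_vocab = [0, 1]
pages = [
    (["pill", "dose", "pill"], [0, 0, 1], 1),
    (["dose"], [0], 1),
    (["recipe", "recipe"], [1], 0),
    (["recipe"], [1, 1], 0),
]

vectors = [webpage_vector(t, v, text_vocab, visual_vocab) for t, v, _ in pages]
labels = [label for _, _, label in pages]
selected = select_top_k(vectors, labels, k=3)
```

In this toy setting each webpage becomes a five-dimensional vector (three text dimensions followed by two image dimensions), and `selected` holds the indices of the dimensions that share the most information with the class label. A classifier would then be trained on either the selected dimensions or on features extracted from the full concatenated vector.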
© (2020) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Ruiguang Hu, Qi Gao, Yujiao Jia, and Yanxin Liu "Heterogeneous features extraction based on deep learning for drug-related webpages classification", Proc. SPIE 11430, MIPPR 2019: Pattern Recognition and Computer Vision, 1143002 (14 February 2020); https://doi.org/10.1117/12.2535014
KEYWORDS
Feature extraction

Feature selection

Information theory

Principal component analysis
