In this paper, multi-modal local decision fusion is used to classify drug-related webpages. First, meaningful text is extracted by parsing the HTML, and relevant images are selected with the FOCARSS algorithm. Second, six SVM classifiers are trained for six kinds of drug-taking instruments, each represented by PHOG (pyramid histogram of oriented gradients) features, and one SVM classifier is trained for cannabis, represented by the mid-level features of a bag-of-words (BOW) model. For each instance in a webpage, the seven SVMs assign seven labels to its image, and seven further labels are obtained by searching the instance's related text for the names of the drug-taking instruments and cannabis. Concatenating the seven image labels and the seven text labels yields the representation of each instance. Finally, Multi-Instance Learning is used to classify the webpages, with each webpage treated as a bag of such instances. Experimental results demonstrate that the classification accuracy of multi-instance learning with multi-modal local decision fusion is much higher than that of single-modal classification.
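To make the instance representation concrete, the sketch below builds the 14-dimensional vector (7 image labels followed by 7 text labels) and classifies a webpage as a bag of such vectors. It is a minimal illustration under stated assumptions, not the authors' implementation: the image labels are passed in as if already produced by the seven SVMs, the keyword list is hypothetical (the instrument names are not given here), text labels come from plain case-insensitive substring search, and the bag classifier uses the standard multi-instance assumption (a bag is positive if any instance is positive) with a toy, hypothetical instance rule in place of the paper's unspecified MIL algorithm.

```python
from typing import Callable, List, Sequence

NUM_CATEGORIES = 7  # six drug-taking instruments + cannabis


def text_labels(text: str, keywords: Sequence[str]) -> List[int]:
    """Return one 0/1 label per category: 1 if that category's name occurs in the text."""
    if len(keywords) != NUM_CATEGORIES:
        raise ValueError(f"expected {NUM_CATEGORIES} keywords")
    lowered = text.lower()
    return [1 if kw.lower() in lowered else 0 for kw in keywords]


def instance_vector(image_labels: Sequence[int], text: str,
                    keywords: Sequence[str]) -> List[int]:
    """Concatenate the 7 image labels (one per SVM) with the 7 text labels: 14 dims."""
    if len(image_labels) != NUM_CATEGORIES:
        raise ValueError(f"expected {NUM_CATEGORIES} image labels")
    return list(image_labels) + text_labels(text, keywords)


def classify_bag(instances: Sequence[Sequence[int]],
                 instance_is_positive: Callable[[Sequence[int]], bool]) -> int:
    """Standard MI assumption: the webpage (bag) is positive if any instance is positive."""
    return int(any(instance_is_positive(x) for x in instances))


def image_text_agree(v: Sequence[int]) -> bool:
    """Hypothetical toy instance rule: positive if image and text flag the same category."""
    return any(a and b for a, b in zip(v[:NUM_CATEGORIES], v[NUM_CATEGORIES:]))


if __name__ == "__main__":
    # Hypothetical keywords: placeholders for the six instrument names, plus cannabis.
    kws = ["instrument_a", "instrument_b", "instrument_c",
           "instrument_d", "instrument_e", "instrument_f", "cannabis"]
    page = [
        instance_vector([0, 0, 0, 0, 0, 0, 1], "Fresh cannabis, fast shipping", kws),
        instance_vector([0] * 7, "About us", kws),
    ]
    print(classify_bag(page, image_text_agree))  # prints 1
```

In practice the hand-written instance rule would be replaced by a learned multi-instance classifier operating on these 14-dimensional vectors; the sketch only shows how the two modalities' local decisions are fused into a single instance representation.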