Modality filtering is an important feature of biomedical image search systems and can significantly improve retrieval performance. This paper presents a new method for extracting endoscopic image figures from the photograph images in biomedical literature, which have highly diverse content and vary widely in appearance. The proposed method consists of three main stages: tissue image extraction, endoscopic image candidate extraction, and ophthalmic image filtering. For tissue image extraction, we use image-patch-level clustering and Markov random field (MRF) relabeling to detect images containing skin or tissue regions. Next, we find candidate endoscopic images by exploiting the round shape that commonly appears in these images; this step must also compensate for images in which the endoscopic region is not entirely round. In the third stage, we filter out ophthalmic images, whose shape characteristics are very similar to those of endoscopic images, using text information extracted from the figure caption, specifically anatomy terms. We evaluated the method on a dataset of 115,370 photograph figures and achieved precision and recall of 87% and 84%, respectively.
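The caption-based third stage can be illustrated with a minimal sketch. It assumes a simple keyword match against a small set of eye-related anatomy terms; the term list, function names, and sample captions are hypothetical and are not taken from the paper, whose actual vocabulary and matching rules are not given here.

```python
import re

# Hypothetical anatomy vocabulary, for illustration only.
# The paper does not specify which terms it uses.
OPHTHALMIC_TERMS = {"retina", "retinal", "cornea", "corneal", "macula", "iris", "eye"}


def is_ophthalmic_caption(caption, terms=OPHTHALMIC_TERMS):
    """Return True if the caption contains any eye-related anatomy term."""
    words = set(re.findall(r"[a-z]+", caption.lower()))
    return bool(words & terms)


def filter_candidates(candidates, terms=OPHTHALMIC_TERMS):
    """Keep (figure_id, caption) pairs whose caption has no ophthalmic term.

    `candidates` stands for the round-shaped figures that survived the
    second stage; how they are produced is outside this sketch.
    """
    return [(fid, cap) for fid, cap in candidates
            if not is_ophthalmic_caption(cap, terms)]


candidates = [
    ("fig1", "Endoscopic view of the gastric antrum."),
    ("fig2", "Retinal photograph of the left eye."),
]
kept = filter_candidates(candidates)
print(kept)
```

A plain keyword match is the simplest choice; ambiguous terms (for example "fundus", which names a region of both the eye and the stomach) would need more careful handling than this sketch provides.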
Accuracy of content-based image retrieval is affected by image resolution among other factors. Higher resolution images
enable extraction of image features that more accurately represent the image content. In order to improve the relevance
of search results for our biomedical image search engine, Open-I, we have developed techniques to extract and label
high-resolution versions of figures from biomedical articles supplied in the PDF format. Open-I uses the open-access
subset of biomedical articles from the PubMed Central repository hosted by the National Library of Medicine. Articles
are available in XML and in publisher-supplied PDF formats. As these PDF documents contain little or no metadata to
identify the embedded images, the task includes labeling images according to their figure number in the article after they
have been successfully extracted. For this purpose we use the labeled small-size images provided with the XML web
version of the article. This paper describes the image extraction process and two alternative approaches to image
labeling: one measures the similarity between two images from the projections of image intensity onto the coordinate
axes, and the other from the normalized cross-correlation between the intensities of the two images. Using image
identification based on image intensity projection, we were able to achieve a precision of 92.84% and a recall of 82.18%
in labeling of the extracted images.
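The two similarity measures can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes grayscale images stored as lists of rows, already resampled to the same size, and it compares projection profiles with a zero-mean normalized correlation, which is one plausible choice the abstract does not confirm. All function names are hypothetical.

```python
from math import sqrt


def projections(img):
    """Return (row_sums, column_sums) of a 2-D grayscale image."""
    rows = [sum(r) for r in img]
    cols = [sum(c) for c in zip(*img)]
    return rows, cols


def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # a constant signal carries no shape information
    return sum(x * y for x, y in zip(da, db)) / denom


def projection_similarity(img1, img2):
    """Average correlation of the row and column intensity projections."""
    r1, c1 = projections(img1)
    r2, c2 = projections(img2)
    return 0.5 * (ncc(r1, r2) + ncc(c1, c2))


def image_ncc(img1, img2):
    """Normalized cross-correlation computed over all pixel intensities."""
    flat1 = [p for row in img1 for p in row]
    flat2 = [p for row in img2 for p in row]
    return ncc(flat1, flat2)


def label_image(extracted, references, similarity=projection_similarity):
    """Return the label of the reference image most similar to `extracted`.

    `references` maps a figure label to a labeled small-size image,
    assumed here to be resized to the dimensions of `extracted`.
    """
    return max(references, key=lambda k: similarity(extracted, references[k]))


extracted = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
references = {
    "Figure 1": [[9, 0, 0], [0, 0, 0], [0, 0, 0]],
    "Figure 2": [[0, 0, 0], [0, 5, 0], [0, 0, 0]],
}
print(label_image(extracted, references))
```

Projection profiles reduce each image to two one-dimensional signals, so the comparison is cheaper than a full pixel-wise correlation; passing `similarity=image_ncc` to `label_image` switches to the second measure.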