Content-based visual information (or image) retrieval (CBIR) has been an extremely active research domain within medical imaging over the past ten years, with the goal of improving the management of visual medical information. Many technical solutions have been proposed, and application scenarios for image retrieval as well as image classification have been set up. However, in contrast to medical information retrieval using textual methods, visual retrieval has only rarely been applied in clinical practice. This is despite the large amount and variety of visual information produced in hospitals every day. This information overload imposes a significant
burden upon clinicians, and CBIR technologies have the potential to help the situation. However, in order for CBIR to become an accepted clinical tool, it must demonstrate a higher level of technical maturity than it has to date. Since 2004, the ImageCLEF benchmark has included a task for the comparison of visual information
retrieval algorithms for medical applications. In 2005, a task for medical image classification was introduced and both tasks have been run successfully for the past four years. These benchmarks allow an annual comparison of visual retrieval techniques based on the same data sets and the same query tasks, enabling the meaningful
comparison of various retrieval techniques. The datasets used from 2004-2007 contained images and annotations from medical teaching files. In 2008, however, the dataset used was made up of 67,000 images (along with their associated figure captions and the full text of their corresponding articles) from two Radiological Society of North America (RSNA) scientific journals.
This article describes the results of the medical image retrieval task of the ImageCLEF 2008 evaluation campaign. We compare the retrieval results of both visual and textual information retrieval systems from 15 research groups on the aforementioned data set. The results show clearly that, currently, visual retrieval
alone does not achieve the performance necessary for real-world clinical applications. Most of the common visual retrieval techniques have a MAP (Mean Average Precision) of around 2-3%, which is much lower than that achieved using textual retrieval (MAP=29%). Advanced machine learning techniques, together with good training data, have been shown to improve the performance of visual retrieval systems in the past. Multimodal retrieval (basing retrieval on both visual and textual information) can achieve better results than purely visual,
but only when carefully applied. In many cases, multimodal retrieval systems performed even worse than purely textual retrieval systems. On the other hand, some multimodal retrieval systems demonstrated significantly increased early precision, which has been shown to be a desirable behavior in real-world systems.