Characterizing the high-level content of natural images using lexical basis functions
Abstract
The performance of content-based image retrieval using low-level visual content has largely been judged to be unsatisfactory. Perceived performance could probably be improved if retrieval were based on higher-level content. However, researchers have not been very successful in bridging what is now called the "semantic gap" between low-level content detectors and higher-level visual content. This paper describes a novel "top-down" approach to bridging this semantic gap. A list of primitive words (which we call "lexical basis functions") is selected from a lexicon of the English language and used to characterize the higher-level content of natural outdoor images. Visual similarity between pairs of images is then "computed" based on the degree of similarity between their respective word lists. These "computed" similarities are shown to correlate with subjectively perceived similarities between pairs of images. This demonstrates that the chosen set of lexical basis functions is able to characterize the multidimensional content (and similarity) of these image pairs in a manner that parallels their subjectively perceived content (and similarity). If a retrieval system could be designed to automatically detect the visual content represented by these basis functions, it could compute a similarity measure that would correlate with human subjective similarity rankings.
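To make the word-list comparison concrete, the sketch below shows one plausible way to "compute" similarity from lexical annotations and check its agreement with subjective ratings. The abstract does not specify the similarity measure or the data, so the Jaccard overlap, the example word sets, and the toy subjective scores here are illustrative assumptions, not the authors' method.

```python
# Minimal sketch (assumptions, not the paper's implementation): each image is
# annotated with the subset of lexical basis functions that apply to it, and
# pairwise similarity is taken as the overlap between those word sets.

def jaccard_similarity(words_a: set[str], words_b: set[str]) -> float:
    """Overlap between two lexical annotations, in [0, 1]."""
    if not words_a and not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)

# Hypothetical annotations for three outdoor images (illustrative only).
image_words = {
    "img1": {"sky", "cloud", "mountain", "snow"},
    "img2": {"sky", "cloud", "lake", "forest"},
    "img3": {"sand", "dune", "sky"},
}

pairs = [("img1", "img2"), ("img1", "img3"), ("img2", "img3")]
computed = [jaccard_similarity(image_words[a], image_words[b]) for a, b in pairs]

# Toy subjective similarity ratings for the same pairs; in the study these
# would come from averaged human judgments.
subjective = [0.7, 0.3, 0.2]

def pearson(x: list[float], y: list[float]) -> float:
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

print("computed similarities:", computed)
print("correlation with subjective ratings:", round(pearson(computed, subjective), 3))
```

A high correlation under this kind of measure is what would indicate that the chosen basis functions capture the dimensions along which observers judge the images to be similar.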
© 2003 Society of Photo-Optical Instrumentation Engineers (SPIE).
John Arthur Black, Kanav Kahol, Prem Kuchi, Gamal F. Fahmy, Sethuraman Panchanathan, "Characterizing the high-level content of natural images using lexical basis functions," Proc. SPIE 5007, Human Vision and Electronic Imaging VIII (17 June 2003); https://doi.org/10.1117/12.477775