Multimedia input in automated image annotation and content-based retrieval
23 March 1995
Abstract
This research explores the interaction of linguistic and photographic information in an integrated text/image database. By utilizing linguistic descriptions of a picture (speech and text input) coordinated with pointing references to the picture, we extract information useful for two tasks: image interpretation and image retrieval. In the image interpretation phase, objects and regions mentioned in the text are identified, and the annotated image is stored in a database for future use. We incorporate techniques from our previous research on photo understanding using accompanying text: a system, PICTION, which identifies human faces in a newspaper photograph based on its caption. In the image retrieval phase, images matching natural language queries are presented to the user in ranked order. This phase combines the output of (1) the image interpretation/annotation phase, (2) statistical text retrieval methods, and (3) image retrieval methods (e.g., color indexing). The system supports both point-and-click querying on a given image and intelligent querying across the entire text/image database.
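The abstract does not specify how the three evidence sources are combined for ranking. As an illustration only, the sketch below shows one common way such a pipeline can be assembled: color indexing via normalized histograms compared with histogram intersection, fused with text-retrieval and annotation scores by a weighted sum. The bin count, the weights, and the function names are assumptions for this sketch, not details from the paper.

```python
# Hypothetical sketch: fusing text, annotation, and color-indexing scores
# for ranked image retrieval. Weights and bin counts are illustrative.

def color_histogram(pixels, bins=4):
    """Quantize RGB pixels into a normalized color histogram.

    `pixels` is a list of (r, g, b) tuples with channel values in 0..255.
    """
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        idx = ((r * bins // 256) * bins * bins
               + (g * bins // 256) * bins
               + (b * bins // 256))
        hist[idx] += 1.0
    total = len(pixels) or 1
    return [h / total for h in hist]

def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def combined_score(text_score, annotation_score, color_score,
                   weights=(0.5, 0.3, 0.2)):
    """Weighted linear fusion of the three retrieval scores."""
    wt, wa, wc = weights
    return wt * text_score + wa * annotation_score + wc * color_score

# Example: a query image's histogram against two candidates.
query = color_histogram([(250, 10, 10)] * 8 + (([(10, 10, 250)]) * 2))
red_image = color_histogram([(255, 0, 0)] * 10)
green_image = color_histogram([(0, 255, 0)] * 10)
print(histogram_intersection(query, red_image) >
      histogram_intersection(query, green_image))  # True
```

Images would then be sorted by `combined_score` to produce the ranked list the abstract describes; the linear weighting is one simple choice among many fusion schemes.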
© 1995 Society of Photo-Optical Instrumentation Engineers (SPIE).
Rohini K. Srihari, "Multimedia input in automated image annotation and content-based retrieval", Proc. SPIE 2420, Storage and Retrieval for Image and Video Databases III (23 March 1995); https://doi.org/10.1117/12.205290
Proceedings paper, 12 pages

