We study the problem of retrieving images using a small template. The goal is to allow a user to search for images containing a pattern similar to the template, adding to the capability of a search engine. We propose to employ a segmentation-based approach. As a specific example, we introduce a quadtree segmentation technique for textured images and a distance measure, Sum of Minimum Distance, suitable for template-based image retrieval applications.
There is a growing need for the ability to query image databases based on image content rather than strict keyword search. Most current image database systems that perform query by content require a distance computation for each image in the database. Distance computations can be time consuming, limiting the usability of such systems. There is thus a need for indexing systems and algorithms that can eliminate candidate images without performing distance calculations. As user needs may change from session to session, there is also a need for run-time creation of distance measures. In this paper, we introduce FIDS, or `Flexible Image Database System.' FIDS allows the user to query the database based on user-defined polynomial combinations of predefined distance measures. Using an indexing scheme and algorithms based on the triangle inequality, FIDS can return matches to the query image without directly comparing the query image to much of the database. FIDS is currently being tested on a database of eighteen hundred images.
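The triangle-inequality pruning mentioned above can be illustrated with a minimal sketch. It is not the FIDS implementation: the function name `prune_with_keys`, the use of a few "key" images with precomputed distances, and the toy one-dimensional metric are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def prune_with_keys(
    query_to_key: Sequence[float],
    key_to_item: Dict[str, Sequence[float]],
    threshold: float,
) -> Tuple[List[str], List[str]]:
    """Split items into (candidates, eliminated) using only stored distances.

    query_to_key[j] is d(query, key_j); key_to_item[name][j] is d(key_j, item).
    By the triangle inequality, |d(q, k) - d(k, i)| <= d(q, i), so an item whose
    lower bound exceeds the threshold cannot match and is never compared directly.
    """
    candidates: List[str] = []
    eliminated: List[str] = []
    for name, dists in key_to_item.items():
        lower_bound = max(abs(qk - ki) for qk, ki in zip(query_to_key, dists))
        if lower_bound > threshold:
            eliminated.append(name)
        else:
            candidates.append(name)
    return candidates, eliminated


# Toy example on the real line, where d(a, b) = |a - b| (hypothetical data).
keys = [0.0, 10.0]
items = {"a": 1.0, "b": 4.0, "c": 9.0}
query = 1.5
index = {name: [abs(k - v) for k in keys] for name, v in items.items()}
q_dists = [abs(k - query) for k in keys]
candidates, eliminated = prune_with_keys(q_dists, index, threshold=1.0)
```

Only the surviving candidates would then need a direct distance computation against the query.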
Color is one of the most recognizable elements of image content, and the color histogram is the most commonly used technique for indexing colors. Faloutsos et al. propose using a 3D index to perform histogram filtering. Sawhney and Hafner later generalize the filtering approach by using k-dimensional indices. The main contribution of this paper is the development and analysis of multi-level color histograms. The key idea is to insert additional levels of abstracted histograms in between a low-dimensional index and the original histograms. Based on a cost model we developed, our analysis shows that in most cases, the optimal 3-level and 4-level configurations, when compared with the Faloutsos configuration and the optimal Sawhney-Hafner configuration, require lower CPU and I/O costs. Experimental results indicate that the gain in total time can vary from 22% to 400%. Our analysis also shows that the overhead required by 3-level and 4-level histograms is negligible.
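The idea of filtering through intermediate levels of abstracted histograms can be sketched as follows. The abstract does not state the distance measure or how the levels are built, so this sketch assumes the L1 distance and builds coarser levels by merging adjacent bins; under that assumption the coarse distance never exceeds the fine one, so filtering cannot discard a true match. The names `coarsen` and `filtered_search` are hypothetical.

```python
from typing import List, Sequence, Tuple


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 distance between two histograms of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))


def coarsen(hist: Sequence[float], factor: int) -> List[float]:
    """Merge each run of `factor` adjacent bins into a single bin."""
    return [sum(hist[i:i + factor]) for i in range(0, len(hist), factor)]


def filtered_search(
    query: Sequence[float],
    database: Sequence[Sequence[float]],
    threshold: float,
    factors: Sequence[int] = (4, 2),
) -> Tuple[List[int], int]:
    """Return (matching indices, number of full-resolution comparisons).

    Levels are applied coarsest first. Merging bins can only shrink the L1
    distance, so a histogram rejected at a coarse level is a true non-match.
    """
    survivors = list(range(len(database)))
    for factor in factors:
        coarse_query = coarsen(query, factor)
        survivors = [
            i for i in survivors
            if l1(coarse_query, coarsen(database[i], factor)) <= threshold
        ]
    matches = [i for i in survivors if l1(query, database[i]) <= threshold]
    return matches, len(survivors)


# Hypothetical 8-bin histograms.
query = [4, 0, 0, 0, 0, 0, 0, 0]
database = [
    [3, 1, 0, 0, 0, 0, 0, 0],  # close to the query
    [0, 0, 0, 0, 4, 0, 0, 0],  # rejected at the coarsest level
    [0, 0, 4, 0, 0, 0, 0, 0],  # rejected at the middle level
    [2, 2, 0, 0, 0, 0, 0, 0],  # needs a full comparison, then rejected
]
matches, full_comparisons = filtered_search(query, database, threshold=2)
```

In a real system the coarse histograms would be precomputed and stored rather than rebuilt per query.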
The large amount of available multimedia information (e.g. videos, audio, images) requires efficient and effective annotation and retrieval methods. As videos start playing a more important role in multimedia, we want to make them available for content-based retrieval. The ImageMiner-System, which was developed at the University of Bremen in the AI group, is designed for content-based retrieval of single images by a new combination of techniques and methods from computer vision and artificial intelligence. In our approach to making videos available for retrieval in a large database of videos and images, there are two necessary steps: first, the detection and extraction of shots from a video, which is done by a histogram-based method, and second, the combination of the separate frames in a shot into a single still image. This is performed by a mosaicing technique. The resulting mosaiced image gives a one-image visualization of the shot and can be analyzed by the ImageMiner-System. ImageMiner has been tested on several domains (e.g. landscape images, technical drawings), which cover a wide range of applications.
In digital libraries and the Internet, large amount of data in various modalities has to be transmitted and delivered across the networks, and is subject to bandwidth constraints and network congestion. Among all multimedia data, video is the most difficult to handle, both in terms of its size and the scarcity of tools and techniques available for efficient delivery, storage and retrieval. Providing tools to help users search and browse large collections of video documents is important. Equally important are the means to deliver and present the essence of video content to the user without noticeable delay. In this paper, we focus on the characterization of video by means of automatic analysis of its visual content and the compact presentation of the underlying story content built upon the derived characteristics. We develop models to capture and characterize video by temporal events, namely, dialogues, actions and story units. We then present these events using succinct visual summaries that depict and differentiate the underlying dramatic elements in an intuitive manner. The combination of video characterization and visual summary offers significant compaction of data size in video far beyond the numbers achieved by traditional video compression, while retaining essential meanings and semantics of the content, and is particularly useful for digital library and Internet applications.
This paper presents a novel approach for video retrieval from a large archive of MPEG or Motion JPEG compressed video clips. We introduce a retrieval algorithm that takes a video clip as a query and searches the database for clips with similar contents. Video clips are characterized by a sequence of representative frame signatures, which are constructed from DC coefficients and motion information (`DC+M' signatures). The similarity between two video clips is determined by using their respective signatures. This method facilitates retrieval of clips for the purpose of video editing, broadcast news retrieval, or copyright violation detection.
There has been work on database systems that can retrieve multimedia objects by their content. We are extending this work by using the World Wide Web as source and storage for multimedia objects, much as current text search engines do for textual information. Building a system that can access all types of multimedia objects by their content is a formidable task, and improvements are constantly being made to indexing techniques. We have taken an important first step in demonstrating the viability of this technique while laying the groundwork for a larger, more capable system. We have implemented a simple indexing scheme while concentrating on building the infrastructure to support this system. Our system can retrieve references to images on the WWW, index those images, and store those images using spatial access methods. We then use query by example to find a set of images on the WWW that resemble our query image. Due to its design, it is easy to include additional context features, to substitute different indexing schemes, and to add other types of multimedia to our system, such as time sequences, voice and video.
We describe a visual information system prototype for searching for images and videos on the World-Wide Web. New visual information in the form of images, graphics, animations and videos is being published on the Web at an incredible rate. However, cataloging this visual data is beyond the capabilities of current text-based Web search engines. In this paper, we describe a complete system by which visual information on the Web is (1) collected by automated agents, (2) processed in both text and visual feature domains, (3) catalogued and (4) indexed for fast search and retrieval. We introduce an image and video search engine which utilizes both text-based navigation and content-based technology for searching visually through the catalogued images and videos. Finally, we provide an initial evaluation based upon the cataloging of over one half million images and videos collected from the Web.
Most current image retrieval systems use holistic comparisons that require a global match between images or presegmented objects in images. However, often the user of an image database system is interested in a local match between images. For example, `Find images from the database with something like this anywhere in the image,' or `Find images with something like this in some region of any image in the database,' or `Find images with this spatial configuration of regions like this.' In this paper, we provide an overview of a new framework that should help to allow these types of queries to be answered efficiently. In order to illustrate the usefulness of our framework, we have developed a complete image retrieval system based on local color information. Our system features fully automatic insertion and very efficient query execution, rivaling the efficiency of systems that can only handle global image comparisons. The query execution engine, called the ImageGREP Engine, can process queries at a speed of approximately 3000 images per second (or better) on a standard workstation when the index can be stored in main memory. In the future, we believe our framework should be used in other domains and applications, to handle queries based on texture or other material properties and perhaps domain-specific image properties.
In this paper, we describe the SS+-tree, a tree structure for supporting similarity searches in a high-dimensional Euclidean space. Compared to the SS-tree, the SS+-tree uses a tighter bounding sphere for each node, which is an approximation to the smallest enclosing sphere, and it makes better use of the clustering property of the available data by using a variant of the k-means clustering algorithm as the split heuristic for its nodes. A local reorganization rule is also introduced during tree building to reduce the overlap between the nodes' bounding spheres.
A method for categorizing images using N X M-grams is presented. The goal of categorization is to find all images from the same category as a given query image. Some example categories are hand-written documents, printed documents, floor plans, satellite images, and fingerprints. The categorization method is based on the N-gram technique that is commonly used for determining similarity of text documents. Intuitively an N X M-gram is a small pattern in an image. The hypothesis that two images that have the same recurring patterns are likely to belong to the same category is examined. The notion of N X M-grams is defined and the process of computing an image profile in terms of the N X M-grams termed an N X M-gram vector is explained. Two similarity measures to compare images based on their N X M-gram vectors are proposed. Results of an experiment on images from various categories are presented. The two similarity measures for N X M-gram vectors are compared to each other as well as to results of categorization using a method based on color distribution features. The results show that for our test images N X M-gram based methods were more successful in finding images from the same category as a given query image than color distribution based methods.
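As a rough illustration of an image profile built from small recurring patterns, the sketch below counts every N-row by M-column window of a binarized image and compares two profiles. The abstract does not give the exact N X M-gram definition or name the two similarity measures, so the window-counting scheme and the cosine similarity used here are stand-in assumptions, and the function names are hypothetical.

```python
from collections import Counter
from math import sqrt
from typing import List


def nxm_gram_vector(image: List[List[int]], n: int, m: int) -> Counter:
    """Count every n-row by m-column window that occurs in the image."""
    rows, cols = len(image), len(image[0])
    counts: Counter = Counter()
    for r in range(rows - n + 1):
        for c in range(cols - m + 1):
            gram = tuple(tuple(image[r + i][c:c + m]) for i in range(n))
            counts[gram] += 1
    return counts


def cosine_similarity(u: Counter, v: Counter) -> float:
    """Cosine of the angle between two sparse count vectors (assumed measure)."""
    dot = sum(u[g] * v[g] for g in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


# Hypothetical binary images: two striped patterns and one blank image.
stripes = [[1, 0, 1, 0] for _ in range(3)]
shifted = [[0, 1, 0, 1] for _ in range(3)]
blank = [[0, 0, 0, 0] for _ in range(3)]

sim_stripes = cosine_similarity(nxm_gram_vector(stripes, 2, 2), nxm_gram_vector(shifted, 2, 2))
sim_blank = cosine_similarity(nxm_gram_vector(stripes, 2, 2), nxm_gram_vector(blank, 2, 2))
```

The two striped images share the same recurring windows and score high, while the blank image shares none.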
This paper presents a formal framework for designing search algorithms which can identify target images by the spatial distribution of color, edge and texture attributes. The framework is based on a multiscale representation of both the image data and the associated parameter space that must be searched. We define a general form for the distance function which ensures that branch and bound search can be used to find the globally optimal match. Our distance function depends on the choice of a convex measure of feature distance. For this purpose, we propose the L1 norm and some other alternative choices such as the Kullback-Leibler and divergence distances. Experimental results indicate that the multiscale approach can improve search performance with minimal computational cost.
A powerful new set of video playback control functions is proposed which aids subscribers in finding specific programs or contents from among a vast store of video materials. For conducting title-based searches, repeat and clip capabilities are proposed as ways of previewing or browsing a program's contents. For retrieving information from within a program, skip and fast forward/rewind functions are effective when searching through video materials that are familiar, while midway playback is an effective approach when searching through material that has never been seen before. Most significantly, this set of video playback controls permits visual searches without reducing the number of concurrent users that can be supported, while preserving a video access response time of under 1 second. The proposed methods are implemented in an experimental system and evaluated.
In this paper we present a framework for content-based query and retrieval of information from large video databases. This framework enables content-based retrieval of video sequences by characterizing the sequences using motion, texture and colorimetry cues. This characterization is biologically inspired and results in a compact parameter space where every segment of video is represented by an 8-dimensional vector. Searching and retrieval are done in real time with accuracy in this parameter space. Using this characterization, we then evolve a set of discriminators using Genetic Programming. Experiments indicate that these discriminators are capable of analyzing and characterizing video. The VideoBook is able to search and retrieve video sequences with 92% accuracy in real time. Experiments thus demonstrate that the characterization is capable of extracting higher-level structure from raw pixel values.
In general, video shots need to be clustered to form more semantically significant units, such as scenes, sequences, programs, etc. This is the so-called story-based video structuring. Automatic video structuring is of great importance for video browsing and retrieval. The shots or scenes are usually described by one or several representative frames, called key frames. Viewed from a higher level, key frames of some shots might be redundant in terms of semantics. In this paper, we propose automatic solutions to the problems of key frame computing and key frame pruning. We develop an original image similarity criterion, which considers both spatial layout and detail content in an image. Coefficients of wavelet decomposition are used to derive parameter vectors accounting for the above two aspects. The parameters exhibit (quasi-) invariant properties. The novel `Seek and Spread' strategy used in key frame computing allows us to obtain a large representative range for the key frames. Inter-shot redundancy of the key frames is suppressed using the same image similarity measure. Experimental results demonstrate the effectiveness and efficiency of our techniques.
Automated analysis and annotation of video sequences are important for digital video libraries, content-based video browsing and data mining projects. A successful video annotation system should provide users with a useful video content summary in a reasonable processing time. Given the wide variety of video genres available today, automatically extracting meaningful video content for annotation remains hard using currently available techniques. However, a wide range of video has inherent structure such that some prior knowledge about the video content can be exploited to improve our understanding of the high-level video semantic content. In this paper, we develop tools and techniques for analyzing structured video by using the low-level information available directly from MPEG compressed video. Being able to work directly in the compressed video domain can greatly reduce the processing time and enhance storage efficiency. As a testbed, we have developed a basketball annotation system which combines the low-level information extracted from the MPEG stream with prior knowledge of basketball video structure to provide high-level content analysis, annotation and browsing for events such as wide-angle and close-up views, fast breaks, steals, potential shots, number of possessions and possession times. We expect that our approach can also be extended to structured video in other domains.
The temporal and multi-modal nature of video increases the dimensionality of the content-based retrieval problem. This places new demands on the indexing and retrieval tools required. The Virage Video Engine (VVE), with its default set of primitives, provides the necessary framework and basic tools for video content-based retrieval. The video engine is a flexible, platform-independent architecture which provides support for processing multiple synchronized data streams such as image sequences, audio and closed captions. The architecture allows for multi-modal indexing and retrieval of video through the use of media-specific primitives. This paper presents the use of the VVE framework for content-based video retrieval.
Development of various multimedia applications hinges on the availability of fast and efficient storage, browsing, indexing, and retrieval techniques. Given that video is typically stored efficiently in a compressed format, if we can analyze the compressed representation directly, we can avoid the costly overhead of decompressing and operating at the pixel level. Compressed domain parsing of video has been presented in earlier work where a video clip is divided into shots, subshots, and scenes. In this paper, we describe key frame selection, feature extraction, and indexing and retrieval techniques that are directly applicable to MPEG compressed video. We develop a frame-type independent representation of the various types of frames present in an MPEG video in which all frames can be considered equivalent. Features are derived from the available DCT, macroblock, and motion vector information and mapped to a low-dimensional space where they can be accessed with standard database techniques. The spatial information is used as the primary index while the temporal information is used to enhance the robustness of the system during the retrieval process. The techniques presented enable fast archiving, indexing, and retrieval of video. Our operational prototype typically takes a fraction of a second to retrieve similar video scenes from our database, with over 95% success.
We propose a new simple image coder based on the Discrete Wavelet Transform (DWT). The DWT coefficients are coded in bitplanes. We use a variable-order Markovian model to code the DWT coefficient bitplanes. We recently developed a version of this method that used 65 contexts. In this paper, the number of contexts is reduced to 34. We show the experimental results, both in terms of distortion measurement and visual comparison, and compare them to well-known methods.
Video content characterization is a challenging problem in video databases. The aim of such characterization is to generate indices that can describe a video clip in terms of objects and their actions in the clip. Generally, such indices are extracted by performing image analysis on the video clips. Many such indices can also be generated by analyzing the embedded audio information of video clips. Indices pertaining to context, scene emotion, and actors or characters present in a video clip appear especially suitable for generation via audio analysis techniques of keyword spotting, and speech and speaker recognition. In this paper, we examine the potential of speaker identification techniques for characterizing video clips in terms of actors present in them. We describe a three-stage processing system consisting of a shot boundary detection stage, an audio classification stage, and a speaker identification stage to determine the presence of different actors in isolated shots. Experimental results using the movie A Few Good Men are presented to show the efficacy of speaker identification for labeling video clips in terms of persons present in them.
The National Library of Medicine, in collaboration with the National Center for Health Statistics and the National Institute for Arthritis and Musculoskeletal and Skin Diseases, has built a system for collecting radiological interpretations for a large set of x-ray images acquired as part of the data gathered in the second National Health and Nutrition Examination Survey. This system is capable of delivering across the Internet 5- and 10-megabyte x-ray images to Sun workstations equipped with X Window based 2048 X 2560 image displays, for the purpose of having these images interpreted for the degree of presence of particular osteoarthritic conditions in the cervical and lumbar spines. The collected interpretations can then be stored in a database at the National Library of Medicine, under control of the Illustra DBMS. This system is a client/server database application which integrates (1) distributed server processing of client requests, (2) a customized image transmission method for faster Internet data delivery, (3) distributed client workstations with high resolution displays, image processing functions and an on-line digital atlas, and (4) relational database management of the collected data.
Although content based retrieval of images is increasingly common, the use of media content as a basis for navigation has received relatively little attention. In this paper we describe our recent development of facilities in the MAVIS/Microcosm architecture for generic link authoring and following from non-text media and in particular, the use of shape and texture for content based navigation from images. Applications from a product catalogue and an archaeological collection are presented, together with an outline of an image viewer providing rapid delineation of object shapes in images when authoring or following links.
In India a temple is not only a structure of religious significance and celebration, but it also plays an important role in the social, administrative and cultural life of the locality. Temples have served as centers for learning Indian scriptures. Music and dance were fostered and performed in the precincts of the temples. Built at the end of the 10th century, the Brihadisvara temple signified new design methodologies. We have access to a large number of images, audio and video recordings, architectural drawings and scholarly publications of this temple. A multimedia system for this temple is being designed which is intended to be used for the following purposes: (1) to inform and enrich the general public, and (2) to assist the scholars in their research. Such a system will also preserve and archive old historical documents and images. The large database consists primarily of images which can be retrieved using keywords, but the emphasis here is largely on techniques which will allow access using image content. Besides classifying images as either long shots or close-ups, deformable template matching is used for shape-based query by image content, and digital video retrieval. Further, to exploit the non-linear accessibility of video sequences, key frames are determined to aid the domain experts in getting a quick preview of the video. Our database also has images of several old, and rare manuscripts many of which are noisy and difficult to read. We have enhanced them to make them more legible. We are also investigating the optimal trade-off between image quality and compression ratios.
This paper presents an additive watermarking technique for grey scale pictures, which can be extended to video sequences. It consists of secretly embedding copyright information (a binary scale) in the picture without degrading its quality. Those bits are encoded through the phase of Maximal Length Sequences (MLS). MLS are sequences having good correlation properties, which means that the result of the autocorrelation is far greater than the crosscorrelations, i.e. correlations made with shifted versions of the sequence. The embedding is performed line by line, going from the top to the bottom of the picture, as the objective was to implement a low-cost, real-time embedding method able to work with common video equipment. The embedding process itself relies on a masking criterion that guarantees the invisibility of the watermark. This perceptual criterion, deduced from physiological and psychophysical studies, has already proved its efficiency in a previously presented paper. It is combined with an edge and texture discrimination to determine the embedding level of the MLS, whose bits are actually spread over 32 by 8 pixel squares. Finally, some preliminary results are presented, which analyze the efficacy of the decoding as well as the resistance of the watermark to compression and its robustness against malevolent treatments.
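The correlation property of maximal length sequences, and the idea of carrying information in the phase (circular shift) of such a sequence, can be checked with a short sketch. The sequence length and generator are not given in the abstract; the degree-5 recurrence below, corresponding to the primitive polynomial x^5 + x^2 + 1 with period 31, and the shift value 7 are illustrative assumptions, and the function names are hypothetical.

```python
from typing import List, Sequence


def maximal_length_sequence(degree: int = 5, offsets: Sequence[int] = (0, 2)) -> List[int]:
    """One period of the recurrence a[k + degree] = XOR of a[k + o] for o in offsets.

    The defaults correspond to x^5 + x^2 + 1, which is primitive over GF(2),
    so any non-zero start state yields a sequence of period 2**5 - 1 = 31.
    """
    bits = [1] + [0] * (degree - 1)
    period = 2 ** degree - 1
    while len(bits) < period:
        k = len(bits) - degree
        nxt = 0
        for o in offsets:
            nxt ^= bits[k + o]
        bits.append(nxt)
    return bits


def circular_correlation(a: Sequence[int], b: Sequence[int], shift: int) -> int:
    """Correlation of a with b circularly advanced by `shift` positions."""
    n = len(a)
    return sum(a[i] * b[(i + shift) % n] for i in range(n))


bits = maximal_length_sequence()
chips = [1 if b else -1 for b in bits]  # map bits to +1 / -1

# Autocorrelation: a single sharp peak at zero shift, -1 everywhere else.
autocorrelation = [circular_correlation(chips, chips, s) for s in range(len(chips))]

# Encode a value as a phase (here 7, chosen arbitrarily) and recover it
# by locating the correlation peak against the reference sequence.
embedded = chips[7:] + chips[:7]
detected_phase = max(range(len(chips)), key=lambda s: circular_correlation(embedded, chips, s))
```

A real decoder would correlate against a noisy, masked watermark extracted from the picture rather than a clean copy of the sequence.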
To guarantee security and privacy in image transmission and archival applications, adequate and efficient bulk encryption techniques are necessary which are able to cope with the vast amounts of image data involved. Experience has shown that block-oriented symmetric product ciphers constitute an adequate design paradigm for resolving this task, since they can offer a very high level of security as well as very high encryption rates. In this contribution we introduce a new product cipher which encrypts large blocks of plain-text (images) by repeated intertwined application of substitution and permutation operations. While almost all current product ciphers use fixed (predefined) permutation operations on small data blocks, our approach involves parameterizable (keyed) permutations on large data blocks (whole images) induced by specific chaotic systems (Kolmogorov flows). By combining these highly unstable dynamics with an adaptation of a very fast shift-register-based pseudo-random number generator, we obtain a new class of computationally secure product ciphers which are firmly grounded on systems-theoretic concepts, offering many features that make them superior to contemporary bulk encryption systems when aiming at efficient image data encryption.
Those who have seen the movie Forrest Gump witnessed former U.S. President J.F. Kennedy shaking the hand of Tom Hanks. Nevertheless, they have never met. Though this image seems real, how can we be certain it is not? How long can we still believe the images? This paper presents an efficient system guaranteeing to detect whether the images of an original video sequence have been modified between the original recording and the moment of viewing. Moreover, it takes into account the intervention of an editor between the original recording and the projection. Whatever the final edited tape is, the cryptographic information generated by the camera allows the images of the edited version to be authenticated. This scheme is dedicated to images coming from digital cameras using the DV format but can be extended to any other standard.
In the European project SMASH a mass multimedia storage device for home usage is being developed. The success of such a storage system depends not only on technical advances, but also on the existence of an adequate copy protection method. Copy protection for visual data requires fast and robust labeling techniques. In this paper, two new labeling techniques are proposed. The first method extends an existing spatial labeling technique. This technique divides the image into blocks and searches an optimal label-embedding level for each block instead of using a fixed embedding-level for the complete image. The embedding-level for each block is dependent on a lower quality JPEG compressed version of the labeled block. The second method removes high frequency DCT-coefficients in some areas to embed a label. A JPEG quality factor and the local image structure determine how many coefficients are discarded during the labeling process. Using both methods a perceptually invisible label of a few hundred bits was embedded in a set of true color images. The label added by the spatial method is very robust against JPEG compression. However, this method is not suitable for real-time applications. Although the second DCT-based method is slightly less resistant to JPEG compression, it is more resistant to line-shifting and cropping than the first one and is suitable for real-time labeling.
Digital watermarks have been proposed in recent literature as the means for copyright protection of multimedia data. In this paper we address the capability of invisible watermarking schemes to resolve copyright ownerships. We will show that rightful ownerships cannot be resolved by current watermarking schemes alone. In addition, in the absence of standardization of watermarking procedures, anyone can claim ownership of any watermarked image. Specifically, we provide counterfeit watermarking schemes that can be performed on a watermarked image to allow multiple claims of rightful ownerships. We also propose non-invertible watermarking schemes in this paper and discuss in general the usefulness of digital watermarks in identifying the rightful copyright owners. The results, coupled with the recent attacks on some image watermarks, further imply that we have to carefully re-think our approaches to invisible watermarking of images, and re-evaluate the promises, applications and limitations of such digital means of copyright protection.
This paper describes the development of a prototype of a video database system, called VidIO, that takes account of the importance of different perspectives in video retrieval. Text-based hierarchical structures are used for representing the contents of a video. The structures are used for supporting the required functionalities in organizing personalized video materials. In addition to support for indexing original video materials, the system also supports tools for re-indexing and maintaining the results of video retrieval. In other words it tries to fulfill the requirement of personalized video information management. The paper defines the requirement, outlines the key considerations in providing such support and describes the implemented system.
Video databases can be searched for visual content by searching over automatically extracted key frames rather than the complete video sequence. Many video materials used in the humanities and social sciences contain a preponderance of shots of people. In this paper, we describe our work in semantic image retrieval of person-rich scenes (key frames) for video databases and libraries. We use an approach called retrieval through segmentation. A key-frame image is first segmented into human subjects and background. We developed a specialized segmentation technique that utilizes both human flesh-tone detection and contour analysis. Experimental results show that this technique can effectively segment images with low time complexity. Once the image has been segmented, we can then extract features or pose queries about both the people and the background. We propose a retrieval framework that is based on the segmentation results and the extracted features of people and background.
While histogram or global feature approaches are powerful methods to encode image information for retrieval purposes, they suffer from a complete lack of spatial information. One possibility to overcome this drawback is the storage of the feature vectors of subregions. However, this increases the size of the index vector. The paper suggests storing only the differences of the features between a region and its subregions, instead of the whole feature vector of each subregion. This distance is called the inter hierarchical distance (IHD). A new index, which combines the IHD and the global color feature of the whole image, is suggested. The subregions are obtained by a fixed tessellation. Experimental results, using an image database with more than 12'000 color images, are presented. The retrieval power of the combined index matches that of an index which is 2.5 times larger in size and uses only global color features. The IHD is invariant to linear color transformation, which ensures a more stable performance of the index under gamma corrections.
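The idea of storing only the differences between a region's feature and those of its subregions can be sketched with a deliberately simple feature. The abstract does not specify the feature or the tessellation, so the mean intensity and the fixed 2 by 2 split used here are assumptions, and the names `region_mean` and `ihd_vector` are hypothetical.

```python
from typing import List, Tuple


def region_mean(image: List[List[float]], r0: int, r1: int, c0: int, c1: int) -> float:
    """Mean value of the pixels in rows r0..r1-1 and columns c0..c1-1."""
    values = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    return sum(values) / len(values)


def ihd_vector(image: List[List[float]]) -> Tuple[float, List[float]]:
    """Return the global feature and the per-quadrant differences from it."""
    rows, cols = len(image), len(image[0])
    whole = region_mean(image, 0, rows, 0, cols)
    rm, cm = rows // 2, cols // 2
    quadrants = [(0, rm, 0, cm), (0, rm, cm, cols), (rm, rows, 0, cm), (rm, rows, cm, cols)]
    return whole, [region_mean(image, *q) - whole for q in quadrants]


# Hypothetical 4x4 grey-level image with four uniform quadrants.
image = [
    [10, 10, 20, 20],
    [10, 10, 20, 20],
    [30, 30, 40, 40],
    [30, 30, 40, 40],
]
global_feature, differences = ihd_vector(image)

# With this particular feature, adding a constant offset to every pixel
# changes the global feature but leaves the stored differences unchanged.
brighter = [[p + 7 for p in row] for row in image]
_, brighter_differences = ihd_vector(brighter)
```

The index entry would then hold the global feature plus the short difference vector instead of one full feature vector per subregion.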
In an architectural database that is to be used by architects, urbanists, sociologists, geometers, etc., querying must be simplified. The aim of this work is to retrieve the images of a building that best fit a specified point of view. Original data are provided in DXF and TIFF formats (maps and images respectively). A loose linking between these two types of information is obtained through textual attributes. However, the same building is photographed several times and more than a single building can appear in a picture. After determining the point of view by simple `clicks' on a map, we take advantage of the geometrical description of the building in order to draw its outline. Then, the images that have been textually associated with the selected building undergo a five-step image-processing algorithm: conversion from the RGB color space to an intensity component, Nagao filtering, oriented gradient filtering, thresholding, and correlation-based hierarchical full search matching. If the building objects are not completely masked by natural ones, the `rectangular' shapes of frontage and side walls correspond well to the sketch and the requested images are returned to the user.
An indexing method for content-based image retrieval by using textual information in video is proposed. Indices extracted from textual information make it possible to retrieve video data by a conceptual query, such as a topic or a person's name, and organize flat video data into structured video data based on its conceptual content. To this end, we developed a text extraction and recognition algorithm and a visual feature matching algorithm for indexing and organizing video data at a conceptual level. The text extraction and recognition algorithm identifies frames in the video which contain text, extracts the text regions from the frame, finds text lines, and recognizes characters in the text line. The visual feature matching algorithm measures the similarity of frames containing text to find frames with similar-looking text, which can be considered topic change frames. Experiments using real video data showed that our algorithm can index textual information reliably and that it has good potential as a tool for making content-based conceptual-level queries to video databases.
Histogram comparison is a popular technique for image indexing. Given a query image, histogram-based techniques can retrieve similar images from a database, which were acquired under similar illumination levels. However, these techniques fail when images are acquired under different illumination conditions. In this paper, we propose two novel histogram-based techniques which are robust to the changes in illumination. First, we propose to employ moments of the image histogram which are invariant to translation and scaling of image gray levels. Secondly, we propose to compare the parameters of histograms of the wavelet subbands for indexing. These parameters are modified appropriately to counter the effect of changes in illumination. The proposed techniques are computationally inexpensive and can be easily integrated within a wavelet-based image coder.
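As a hedged illustration of the first idea above, the sketch below computes standardized central moments (skewness and kurtosis) of the gray levels. These are one family of statistics that stay unchanged when all gray levels are shifted by a constant or multiplied by a positive factor. The abstract does not say which moments the authors use, so the function name `standardized_moments` and the choice of orders 3 and 4 are assumptions made for illustration only.

```python
def standardized_moments(pixels, orders=(3, 4)):
    """Return standardized central moments of a list of gray levels.

    Assumption: orders 3 and 4 (skewness, kurtosis) stand in for the
    invariant moments mentioned in the abstract.
    """
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    if var == 0:
        # A flat image has no spread; report zeros rather than divide by zero.
        return tuple(0.0 for _ in orders)
    std = var ** 0.5
    # Subtracting the mean removes any gray-level translation;
    # dividing by the standard deviation removes any positive scaling.
    return tuple(sum(((p - mean) / std) ** k for p in pixels) / n for k in orders)


if __name__ == "__main__":
    dark = [10, 20, 20, 30, 80, 90]
    bright = [2 * p + 15 for p in dark]  # simulated illumination change
    print(standardized_moments(dark))
    print(standardized_moments(bright))
```

Both calls print the same values up to floating-point rounding, so an index built on such moments would not be disturbed by this kind of illumination change.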
This paper discusses a novel data placement scheme which optimizes the storage utilization of an NVOD system. The scheme is most distinctive in the following two aspects: (1) It considers the file block placement of programs featuring different numbers of NVOD channels. (2) The file block grouping scheme optimizes the storage utilization of an NVOD system.
We test the performance of a texture feature constructed from the variance of the first eight AC Discrete Cosine Transform (DCT) coefficients of JPEG compressed images. We break the image into sub-images, each consisting of many 8×8 blocks, and then calculate the variance of each DCT coefficient across the sub-image. We evaluate the texture feature at two different image resolutions, and at three different quality factors. In our high resolution image a pixel covered a square of side 4 cm on the ground. Our low resolution image was generated by subsampling. Representative feature vectors were generated for five subjectively identified textures, by averaging a small training set. Each sub-image was then classified according to the representative feature vector closest in feature space. Compression ratio had little effect on the classification result in our study. However, image resolution significantly altered the classification result. Classification correlated much more closely with a subjective classification for the low resolution image. Feature vectors also fell into much more clearly defined clusters at the lower resolution. Although more research is required across different photo-scales and sets of images, we conclude that texture features generated from compressed JPEG images have potential for content-based image retrieval based on texture.
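The feature described above can be sketched as follows. This is a minimal illustration under two assumptions that the abstract does not confirm: the "first eight" AC coefficients are taken in JPEG zigzag order, and the DCT is recomputed from pixel values rather than read from the JPEG stream. The names `ZIGZAG_AC`, `dct_coeff` and `texture_feature` are hypothetical.

```python
import math

N = 8
# Assumption: first eight AC coefficients in JPEG zigzag order, as (row, col).
ZIGZAG_AC = [(0, 1), (1, 0), (2, 0), (1, 1), (0, 2), (0, 3), (1, 2), (2, 1)]


def dct_coeff(block, u, v):
    """Orthonormal 2-D DCT-II coefficient (u, v) of one 8x8 block."""
    cu = math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)
    cv = math.sqrt(1 / N) if v == 0 else math.sqrt(2 / N)
    total = 0.0
    for x in range(N):
        for y in range(N):
            total += (block[x][y]
                      * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                      * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
    return cu * cv * total


def texture_feature(sub_image):
    """Variance of each selected AC coefficient across all 8x8 blocks.

    `sub_image` is a list of pixel rows and must hold at least one full block.
    """
    rows, cols = len(sub_image), len(sub_image[0])
    samples = [[] for _ in ZIGZAG_AC]
    for r in range(0, rows - N + 1, N):
        for c in range(0, cols - N + 1, N):
            block = [row[c:c + N] for row in sub_image[r:r + N]]
            for i, (u, v) in enumerate(ZIGZAG_AC):
                samples[i].append(dct_coeff(block, u, v))
    feature = []
    for values in samples:
        mean = sum(values) / len(values)
        feature.append(sum((x - mean) ** 2 for x in values) / len(values))
    return feature
```

A sub-image would then be assigned to the texture class whose representative vector is nearest to `texture_feature(sub_image)`; the distance used for that step is not specified in the abstract.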
This article proposes a way to automatically retrieve images from the World Wide Web using a semantic description of images and an agent concept for their retrieval. The system represents images in a textual way, e.g. look for a portrait of a specific person, or fetch an image showing a countryside in Southern California. These textual descriptions are fed into search engines, e.g. Yahoo, AltaVista. The resulting HTML documents are searched for links. The next step processes each link by fetching the document over the net, converting it to an ASCII representation, and performing a full text search using the image description. This leads to starting points of images which are retrieved via a web-agent. The image descriptions are decomposed into a set of parts containing image operations which are further processed, e.g. a set for representing the background of a portrait tries to find a homogeneous region in the image because this is likely to be found in a portrait. Additional operations are performed on the foreground, i.e. the image region which contains e.g. the face of a person. The system is realized using two C++ libraries: one for building up web-agents, LIWA++, and one for processing images, HORUS.
The technique of symbolic projection has been widely studied in the area of image database systems as a first step towards content-based indexing and retrieval of images. In this paper we have extended the idea of symbolic projections to video and audio data as well as to multimedia documents containing combinations of these data types. Formal definitions of symbolic video sequence, symbolic audio sequence and symbolic multimedia documents are given as are definitions of their symbolic projections. An indexing methodology based on these symbolic projections is presented. Operators which allow multimedia documents to be constructed from the basic multimedia data types are also presented. The main contribution of this paper is to provide a basis for the development of content-based retrieval of multimedia documents via extended symbolic projections.
In the European project SMASH, mass-market storage systems for domestic use are under study. Besides the storage technology that is developed in this project, the related objective of user-friendly browsing/query of video data is studied as well. Key issues in developing a user-friendly system are (1) minimizing the user intervention in preparatory steps (extraction and storage of representative information needed for browsing/query), (2) providing an acceptable representation of the stored video content in view of a higher automation level, (3) the possibility of performing these steps directly on the incoming stream at storage time, and (4) parameter-robustness of the algorithms used for these steps. This paper proposes and validates novel approaches for automation of the mentioned preparatory phases. A detection method for abrupt shot changes is proposed, using a locally computed threshold based on a statistical model for frame-to-frame differences. For the extraction of representative frames (key frames) an approach is presented which distributes a given number of key frames over the sequence depending on content changes in a temporal segment of the sequence. A multimedia database is introduced, able to automatically store all bibliographic information about a recorded video as well as a visual representation of the content without any manual intervention from the user.
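To make the idea of a locally computed threshold concrete, here is a minimal sketch. It is not the statistical model of the paper, which the abstract does not spell out. As a stand-in, it flags a frame-to-frame difference as an abrupt shot change when it exceeds the mean of its neighbours by `k` standard deviations. The function name `detect_cuts` and the parameters `half_window` and `k` are assumptions.

```python
def detect_cuts(diffs, half_window=3, k=4.0):
    """Return indices i where diffs[i] stands out from its local neighbourhood.

    diffs[i] is assumed to be a non-negative dissimilarity between
    frame i and frame i + 1 (for example a histogram difference).
    """
    cuts = []
    for i, d in enumerate(diffs):
        lo = max(0, i - half_window)
        hi = min(len(diffs), i + half_window + 1)
        # Exclude the candidate itself so a cut does not inflate its own threshold.
        neighbours = diffs[lo:i] + diffs[i + 1:hi]
        if not neighbours:
            continue
        mean = sum(neighbours) / len(neighbours)
        var = sum((x - mean) ** 2 for x in neighbours) / len(neighbours)
        threshold = mean + k * var ** 0.5  # recomputed for every position
        if d > threshold:
            cuts.append(i)
    return cuts


if __name__ == "__main__":
    diffs = [1.0, 1.2, 0.9, 1.1, 9.0, 1.0, 1.3, 0.8, 1.1]
    print(detect_cuts(diffs))
```

Because the threshold adapts to the surrounding activity, the same settings can be used for quiet and busy passages, which relates to the parameter-robustness requirement listed above.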
This paper presents a model of image coding and delivery for multimedia and image browsing systems based on a multiresolution format. The multiresolution format coding is suitable for evaluations either of server performance or of the effect on the image content, in terms of semantic and syntactic degradation. Multimedia and image browsing systems are used in image based services (IBS). Pictorial, technical, medical and geographic information management, as well as services such as home shopping and WWW services, are based on images organised in databases. In such systems a large amount of resources is devoted to image coding, transmission, decoding and display. Considering that not every image retrieved corresponds to the user's needs, a non-negligible share of resources is wasted. The need for full-resolution image retrieval involves high time consumption for data retrieval and for image viewing. Likewise, in a window system, the user wants to be able to resize the image frame flexibly. Several techniques are available [1], with different performance in terms of content maintenance and complexity, but the network load is not reduced if the resizing is done by the client. A simple solution to guarantee a client-independent service, in terms of client image resolution, is to store each image on the server in several files with different resolutions, Multi File Coding (MFC), but with the image information degraded as a function of the downsized images and the algorithm used. It is well known that an image can be represented in mathematical form as a continuous function f of two variables x and y. Using x and y as coordinates of the point on the screen, f is an attribute of the point (like luminance, HSV component etc.). Assuming that the information contained in one image is localized in the points where the function f is defined, in a picture the information is uniformly distributed over a large part of the image.
On the other hand, in technical images the function f is defined only in a small area (the plotting area), while the remaining area represents the background and is devoid of information. Resizing operations are characterized by a reduction of the image pixels (understood as the basic elements of the image); likewise, the information located in the image decreases as a function of the lost pixels or as a function of the reduction ratio [2]. This situation involves a degradation of the image. In the case of pictorial images the information loss is uniformly distributed and usually counterbalanced by human reasoning-driven mechanisms. So in the reduction of pictorial images the visual information loss is less than the pixel loss. In the resizing of technical images the use of symbols, thin lines and type localized in small areas involves the loss of information content.
Partitioning video sequences into individual shots is one of the fundamental processes in video content parsing and content-based video retrieval. Up to now, a variety of algorithms and systems have been developed to perform this task. However, most of these algorithms show their weakness when applied to detect gradual transitions such as dissolves, wipes, fade-ins and fade-outs. In this paper, we present an integrated scheme for the detection of abrupt camera breaks and gradual scene changes using DCT coefficients and motion data encoded in the MPEG compression stream. The core of the proposed approach is a tree-like classifier. Three algorithms are organized in the classifier to deal separately with the complicated situations in real-world video sequences.
In this paper we address image retrieval by similarity in multimedia databases. We discuss the generation and use of signatures computed from image content. The proposed technique is not based on image annotation, therefore it does not require human assistance. Signatures abstract the directionality of image objects. They are computed from the image Fourier transform, and the influence of computation parameters on signature effectiveness is discussed. Retrieval is based on spectrum comparison between a reference image, assumed as the query, and the images in a collection. We introduce a metric for comparing the spectra and ranking the result, and approach the issue of partial query specification. Sample results on a small test collection are given.
The development of increasingly complex multimedia applications calls for new methodologies for the organization and retrieval of still images and video sequences. Query and retrieval methods based on image content promise good results, are currently widely investigated and, to some extent, already commercially available. Yet a large number of issues remain unsolved. In this paper we describe some results of a study on similarity evaluation in image retrieval using color, object orientation and relative position as content features. A simple prototype system is also introduced that computes the feature descriptors and performs queries. Although not trivial, the feature extraction process is completely automated and requires no user intervention. The system is admittedly not a general purpose tool, but is oriented to thematic image repositories where the semantics of stored images are limited to a specific domain.
We are exploring the use of highly distributed computing and storage architectures to provide all aspects of collecting, storing, analyzing, and accessing large data-objects. These data-objects can be anywhere from tens of MBytes to tens of GBytes in size. Examples are: single large images from electron microscopes, video images such as cardio-angiography, sets of related images such as MRI data, and images and numerical data such as the output from a particle accelerator experiment. The sources of such data-objects are often remote from the users of the data and from available large-scale storage and computation systems. Our Large Data-object Management system provides a network interface between the object sources, the data management system and the user of the data. As the data is being stored, a cataloguing system automatically creates and stores condensed versions of the data, textual metadata and pointers to the original data. The catalogue system provides a Web based graphical interface to the data. The user is able to view the low-resolution data with a standard internet connection and Web browser, or if high resolution is required can use a high-speed connection and special application programs to view the high-resolution original data.
A picture knowledge base management system is described that is used to represent, organize and retrieve pictures from a frame knowledge base. Experiments with human test subjects were conducted to obtain further descriptions of pictures from news magazines. These descriptions were used to represent the semantic content of pictures in frame representations. A conceptual clustering algorithm is described which organizes pictures not only on the observable features, but also on implicit properties derived from the frame representations. The algorithm uses inheritance reasoning to take into account background knowledge in the clustering. The algorithm creates clusters of pictures using a group similarity function that is based on the gestalt theory of picture perception. For each cluster created, a frame is generated which describes the semantic content of pictures in the cluster. Clustering and retrieval experiments were conducted with and without background knowledge. The paper shows how the use of background knowledge and semantic similarity heuristics improves the speed, precision, and recall of queries processed. The paper concludes with a discussion of how natural language processing can be used to assist in the development of knowledge bases and the processing of user queries.
Visual (image and video) database systems require efficient indexing to enable fast access to the images in a database. In addition, the large memory capacity and channel bandwidth requirements for the storage and transmission of visual data necessitate the use of compression techniques. Vector quantization (VQ) is an efficient technique for low bit rate image and video compression. In addition, the low complexity of the decoder makes VQ attractive for low power systems and applications which require fast decoding. The detection of camera operations provides a mechanism to segment a long video shot into short clips defined by homogeneous camera operations, which can then be used for indexing. In this paper, we present a technique for the detection of camera operations in video sequences compressed using VQ. The proposed technique is executed in the compressed domain. This entails significant savings in computational and storage costs, resulting in faster execution.
The wavelet packet transform and the successive approximation quantization techniques, which have been adopted in modern wavelet coding, are exploited for content-based image retrieval in this research. By adopting this approach, images can be compressed and indexed simultaneously, and the complexity of database management can be significantly reduced. The proposed new feature for image indexing is the number of significant wavelet coefficients in each wavelet packet subband. This feature not only serves as a good representation of image content but also allows hierarchical retrieval and browsing of images and facilitates the progressive transmission of retrieved images. Extensive experimental results are provided to demonstrate the retrieval efficiency of the proposed new method.
Watermarking techniques, also referred to as digital signatures, sign images by introducing changes that are imperceptible to the human eye but easily recoverable by a computer program. Generally, the signature is a number which identifies the owner of the image. The locations in the image where the signature is embedded are determined by a secret key. Doing so prevents possible pirates from easily removing the signature. Furthermore, it should be possible to retrieve the signature from an altered image. Possible alterations of signed images include blurring, compression and geometrical transformations such as rotation and translation. These alterations are referred to as attacks. A new method based on amplitude modulation is presented. Single signature bits are multiply embedded by modifying pixel values in the blue channel. These modifications are either additive or subtractive, depending on the value of the bit, and proportional to the luminance. This new method has been shown to be resistant to both classical attacks, such as filtering, and geometrical attacks. Moreover, the signature can be extracted without the original image.
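The embedding step described above can be sketched as follows, for a single signature bit. This is only an illustration: the function name `embed_bit`, the `strength` factor, the number of `copies`, the luminance weights and the use of a seeded pseudo-random generator to turn the secret key into pixel positions are all assumptions, and the extraction step is not shown because the abstract does not describe it.

```python
import random


def embed_bit(pixels, bit, key, copies=4, strength=0.1):
    """Embed one signature bit several times in the blue channel.

    `pixels` is a flat list of (r, g, b) tuples. Returns the marked copy
    and the list of positions that were modified.
    """
    rng = random.Random(key)  # assumption: the secret key seeds the position choice
    positions = rng.sample(range(len(pixels)), copies)
    sign = 1 if bit else -1   # additive for a 1 bit, subtractive for a 0 bit
    marked = list(pixels)
    for p in positions:
        r, g, b = marked[p]
        # Assumption: standard luma weights approximate "the luminance".
        luminance = 0.299 * r + 0.587 * g + 0.114 * b
        # The change is proportional to the luminance, then clamped to 0..255.
        new_b = min(255, max(0, round(b + sign * strength * luminance)))
        marked[p] = (r, g, new_b)
    return marked, positions


if __name__ == "__main__":
    image = [(120, 130, 100)] * 16
    marked, where = embed_bit(image, bit=1, key=42)
    print(where, [marked[p] for p in where])
```

Only the blue value changes at the keyed positions, so someone without the key does not know which pixels carry the signature.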
The SPYRUS Metering System allows intellectual property suppliers to control access to electronic information with content-based meters. Information is distributed to users in encrypted form, and each user has a hardware token that contains the cryptographic keys necessary to decrypt the information as well as the meters that control the use of each key. The hardware token will not grant access to the information by decrypting it unless the supplier-provided meter constraints are met. Since the hardware token includes a built-in real-time clock, time-based meters can be enforced without relying on the easily modified computer system clock.