In content-based image retrieval, systems allow users to ask for objects similar in shape to a query object. However, there is no clear understanding of how computational shape similarity corresponds to human shape similarity. In this paper, several shape similarity measures were evaluated on planar, connected, non-occluded binary shapes. Shape similarity using algebraic moments, spline curve distances, cumulative turning angle, sign of curvature and Hausdorff distance were compared to human similarity judgments on twenty test shapes against a large image database.
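The Hausdorff distance mentioned above can be illustrated with a minimal sketch: each binary shape is reduced to a set of boundary points, and the distance is the worst-case nearest-neighbour gap between the two sets. The point sets here are illustrative, not taken from the paper's test shapes.

```python
import math

def hausdorff(a, b):
    """Undirected Hausdorff distance between two point sets a and b."""
    def directed(src, dst):
        # For each point in src, find its nearest neighbour in dst,
        # then take the worst (largest) of those nearest distances.
        return max(min(math.dist(p, q) for q in dst) for p in src)
    return max(directed(a, b), directed(b, a))

# Two unit squares, the second shifted right by 0.5.
square = [(0, 0), (0, 1), (1, 0), (1, 1)]
shifted = [(0.5, 0), (0.5, 1), (1.5, 0), (1.5, 1)]
print(hausdorff(square, shifted))  # 0.5
```

Unlike the moment- or turning-angle-based measures, this distance is sensitive to outlier points, which is one reason the paper compares several measures against human judgments.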
Most color indexing techniques proposed in the literature are similar: images are represented by color histograms, and a metric on the color histogram space is used to determine the similarity of images. In this paper we determine the limits of these color indexing techniques. We propose two functions to measure the discrimination power of indexing techniques: the capacity (how many distinguishable histograms can be stored) and the maximal match number (the maximal number of retrieved images). We derive bounds for these functions. These bounds have two practical aspects. First, they help a user to decide whether color histograms effectively index database images from a given domain. Second, they facilitate the choice of a good threshold for the distance below which histograms are considered similar. Our arguments are based on an analysis of the metrical properties of the histogram space and results from coding theory. The results show that over a large range of reasonable parameters the capacity is very large. Thus, the set of parameters for which color indexing works well can be described as the set of parameters for which the maximal match number is below an application-dependent maximum.
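The indexing scheme the paper analyzes can be sketched as follows: each image is reduced to a normalized color histogram, and a query matches a database image when the distance between their histograms falls below a threshold t. The L1 metric, bin counts, and threshold here are illustrative assumptions.

```python
def normalize(hist):
    """Scale a histogram so its bins sum to 1."""
    total = sum(hist)
    return [h / total for h in hist]

def l1_distance(h1, h2):
    """L1 (city-block) distance between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def matches(query, database, t):
    """Return indices of database histograms within distance t of the query."""
    q = normalize(query)
    return [i for i, h in enumerate(database)
            if l1_distance(q, normalize(h)) < t]

db = [[10, 0, 0], [5, 5, 0], [9, 1, 0]]
print(matches([8, 2, 0], db, t=0.5))  # histograms 0 and 2 are close
```

The paper's capacity and maximal-match-number bounds are, in effect, statements about how many distinguishable histograms this space can hold and how many images such a call can return for a given t.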
Academic and photo libraries require the reuse of images. If the database is large, similarity retrieval and retrieval by sketch are effective. However, similarity retrieval can retrieve only images that are similar to one key image. Retrieval by sketch can retrieve images that are similar to the sketch, but it is not always easy to make a sketch sufficiently similar to the desired image. To overcome these limitations, we propose a new flexible montage retrieval method (FMR) that allows the user to retrieve images by appropriately combining characteristics of the key images, in the same way that a montage picture of a criminal is constructed. In FMR, the user selects one or more key images and specifies areas (called `key regions') manually or by using image manipulation tools. The user then indicates the retrieval conditions, for example, the requirement that the retrieved image include the specified areas. This paper describes an FMR algorithm that uses a color histogram and texture to quantify the similarity between a key region and an image in the database. The effectiveness of FMR is then statistically demonstrated by comparing its correct retrieval rate with that of retrieval by sketch. The retrieval rate is calculated from the result of retrieving the images that include a specified sky image from among more than 150 images containing various sky images.
We describe the Photobook system, which is a set of interactive tools for browsing and searching images and image sequences. These tools differ from those used in standard image databases in that they make direct use of the image content rather than relying on annotations. Direct search on image content is made possible by use of semantics-preserving image compression, which reduces images to a small set of perceptually-significant coefficients. We describe three Photobook tools in particular: one that allows search based on grey-level appearance, one that uses 2D shape, and a third that allows search based on textural properties.
This paper describes the retrieval process from image databases, based on a partial match between the query and the images. The proposed approach makes it possible to measure the similarity between the query and the images in the database and to retrieve those having the highest probability of being relevant. The paper describes the query processing and the access structures, based on the `signature method'. Four levels of signature files are associated with the image database and a signature is associated with the query. The query signature is compared with the image signatures in a four-step image processing algorithm. The result of the process is a set of images with an associated recognition degree, measured by using information provided by the user during query formulation (such as the importance of the presence of each object) and by using the image structure and the recognition degree associated with each object. The retrieved images are presented to the user in decreasing order of relevance. The method described so far is inefficient, since the selection of the most relevant images is executed over all relevant images (even those having a low relevance). The paper presents two approaches for improving the efficiency of query processing: (a) reducing the number of accesses to the image database, and (b) reducing the number of accesses to the signature file. The two approaches are discussed in detail, and the advantages and drawbacks of each method are illustrated.
Indexing of scientific image databases is a difficult task, due to their extraordinary sizes and to the complex nature of the visual information contained in them. This data volume and complexity require an automatic indexing scheme that will categorize this visual content; without it, the data will be essentially useless to scientists and medical doctors. A method for automatic indexing of scientific image databases is presented which involves a wavelet packet decomposition of images in the frequency domain, resulting in a quad-tree of subbands. These subbands are regarded as realizations of random fields, and statistical measures are computed on them. One class of newly derived measures determines whether the subbands contain any significant organization of pixels beyond what chance would imply. If this is found to be true for a subband, its node is retained on an index tree, and other identifying measurements may be added. The structure of the resulting pruned subband tree constitutes the first level of index; the node statistics form a second indexing level. Results of a pilot study are reported; they suggest that further investigation of this approach is warranted.
The paper presents mathematical and empirical results on the behavior of a new multidimensional neural computing paradigm called multidimensional holographic associative computing (MHAC). MHAC can potentially be used for high-density associative storage and retrieval of image information. Unlike conventional neural computing, each morsel of information in MHAC is represented as a complex vector in a multidimensional unit spherical space. Each of the individual phases of the vector enumerates a value of the information. The magnitude of the vector represents the associated confidence in the information. In contrast, conventional neural computing operates only on the notion of confidence. The proposed multidimensional generalization demonstrates significant improvement in associative storage capacity without loss of generalization space. Virtually unlimited pattern associations can be enfolded over a single holographic memory substrate by higher-order encoding. In addition, its well-structured computation, simultaneous multi-channel learning, and single-step non-iterative retrieval promise highly scalable parallelism. The paper presents the theory of operation of MHAC, which is founded on generalized holographic principles and multidimensional Hebbian learning. The paper also presents analytical as well as empirical evidence from computer simulation supporting the superior performance of MHAC cells.
The Manchester Content Addressable Image Database (CAID) is a generic tool designed for image informatics and computer vision problems. The system stores pre-computed feature tokens, obtained by conventional processing of the input images, in a database which is accessed by a specialized query language (MVQL). MVQL extends the usual notion of a query language to include creation and refinement of groupings based on the stored attributes of the pre-computed features, and these groupings can be continued in a nested fashion. The aim of this paper is to report on two recent application studies. The first study concerns the automatic registration of volume MR data sets. This is achieved entirely within the CAID environment (i.e., as a database query) using only a short MVQL specification. The method used is based on curve token correspondence and an exhaustive search of transformation space. Some results on the estimated accuracy of the registration are included. The second application is concerned with the classification of microfossil images into morphological groups. Preliminary work on detecting microfossil structures was reported in Shann et al. Here we address the problem of classifying detected structures into six broad morphological groups. This is achieved using MVQL to define `structure' measures from the distribution of curve tokens in a circular region around each microfossil. The results are surprisingly good considering the very limited feature evidence used. Both application studies confirm the general applicability of the CAID structure and its query language.
The design of an electronic archive of digitized images of thousands of x-rays collected as part of nationwide health surveys has raised several issues related to user interface design, image presentation and image compression. The project involves developing an image archive implemented with an optical disk jukebox, and user workstations that allow Internet access to the images. This paper describes: the physical layout design of the workstation screens; desirable image processing functions contributing to better viewing and minimizing artifacts; interface design factors contributing to ease of use and speed of task completion; and work toward the selection of a suitable image compression technique.
Computer-assisted content-based indexing is a critical enabling technology and currently a bottleneck in productive use of video resources. This paper presents the Video Classification Project, an effort toward automating content-based video indexing and retrieval, at the Institute of Systems Science of the National University of Singapore. We discuss in detail three goals of the project: image processing tools for video parsing, feature extraction and retrieval; a knowledge-based approach to representing video content; and stratified tools which allow greater flexibility in browsing a video resource, either before or after performing specific retrieval operations.
Video-on-Demand systems have received a good deal of attention recently. Few studies, however, have addressed the problem of locating a video of interest in a large video database. This paper describes the design and implementation of a metadata database and query interface that attempts to solve this information retrieval problem.
In this paper we present a multiresolution approach for video indexing and feature matching of subband coded video databases. Subband coding refers to a coding technique where the input images are quantized after being decomposed into several narrow spatial frequency bands by filtering and decimation. Five different approaches were tested for scene change detection which is applied only on the lowest subband for computational efficiency. Two kinds of scene changes, abrupt and smoothly accumulated scene changes, mark the beginning of new scene segments. An index for each scene segment is the histogram of two representative frames, which we take to be the first and the last frame of the scene for simplicity. Using the approach of query by example, the index matching algorithm takes a multi-resolution approach by hierarchically comparing histograms at different resolutions. The search algorithm for the match between example query and its target scene segment starts from the coarsest resolution, and moves to the next finer resolution until a single match is obtained or the finest resolution is reached. Experimental results are presented, and the proposed indexing technique appears to be promising for its computational efficiency and its inherent hierarchical search procedure.
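The coarse-to-fine matching idea described above can be sketched simply: build progressively coarser versions of each scene-segment histogram by merging adjacent bins, then prune candidates at the coarsest level first and refine only the survivors. The bin-merging scheme, threshold, and data below are illustrative assumptions, not the paper's exact procedure.

```python
def coarsen(hist):
    """Halve the resolution by merging adjacent bin pairs."""
    return [hist[i] + hist[i + 1] for i in range(0, len(hist), 2)]

def l1(h1, h2):
    """L1 distance between two histograms of equal length."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def hierarchical_match(query, segments, threshold):
    """Coarse-to-fine search: prune with coarse histograms first."""
    candidates = list(range(len(segments)))
    levels = [(query, segments)]
    # Build levels from fine to coarse.
    while len(levels[-1][0]) > 2:
        q, s = levels[-1]
        levels.append((coarsen(q), [coarsen(h) for h in s]))
    # Search from the coarsest level toward the finest.
    for q, s in reversed(levels):
        candidates = [i for i in candidates if l1(q, s[i]) < threshold]
        if len(candidates) <= 1:
            break
    return candidates
```

Most non-matching segments are rejected on the cheap coarse histograms, which mirrors the computational-efficiency argument made for the multiresolution approach.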
Owing to the advent of multimedia, simultaneous access to text, images, sound, graphics and video has now been made possible. However, video on computers is a very recent capability. It has been achieved thanks to progress in hardware components, digitization equipment, storage media (CD-ROM) and, above all, image compression techniques (JPEG for still pictures and MPEG for motion pictures). The representational power of video makes it the favourite medium for many multimedia applications. The various media are not used in the same way. A structured text (with a table of contents), a still image, graphics or sound (with a representative graph) may be seen as a whole, allowing them to be handled (for editing purposes, for example). Furthermore, these media support information-based navigation (hypertext is a good example of this capability). On the other hand, video is basically a sequential medium. Operations such as fast-forward, rewind and pause allow only limited interactivity when consulting video. This lack of interactivity is due to the fact that video information, unlike structured text, is delivered "flat", with no indications of its structure. In order to have the same capabilities as with structured text, it is necessary to have an interpretation level above the raw video information. In the case of structured text, the interpretation level corresponds to paragraph and chapter titles and to the table of contents. Furthermore, text has a basic semantic entity, the word, which is used with the search function and the index to create a multiplicity of navigational possibilities through the information. The goal of our work is to offer ways of accessing videos which are similar to those of structured texts. This will make it possible to quickly become acquainted with the video content, quickly access the part of the video of particular interest, and have entry points into the video. Our methodology comprises two stages.
In the first stage, we seek to extract the maximum information from the video by using digital image processing techniques. This stage enables us to obtain a layer of interpretation over the raw video data. In the second stage, an operator builds a high-level video representation using the results of the first stage. In the next section, we introduce the terms used throughout this paper. Related work is presented in the third section, and our own work is described in the fourth. The conclusion outlines our present work and its potential developments.
The design of a distributed video-on-demand system suitable for large video libraries is described. The system is designed to store thousands of hours of video material on tertiary storage devices. A video that a user wants to view is loaded onto a video file server close to the user's desktop, from where it can be played. The system manages the distributed cache of videos on the file servers and schedules load requests to the tertiary storage devices. The system also includes a metadata database, described in a companion paper, that the user can query to locate video material of interest. This paper describes the software architecture, storage organization, application protocols for locating and loading videos, and the distributed cache management algorithm used by the system.
In this paper, we evaluate the performance of our trial hierarchical storage server by simulation. Our system is composed of magnetic disk drives and an optical mass storage system (MSS). The design is based on three key techniques: (1) multiple readout control using the time-slot synchronous method, (2) quick access control of the MSS by using magnetic disk drives, and (3) a storage hierarchy method for video programs that has three configurations. To verify the effectiveness of our storage hierarchy, we construct a queuing network simulation model of the system based on experimentally measured values, and analyze the average waiting time for each of the three configurations of video programs. Selecting the appropriate video program configuration shortens the average waiting time to half of its previous value, with quick access of less than 1 second and with 8 simultaneous readouts. Storage costs are reduced to about half that of magnetic disk drives alone.
We present three strategies for placement of video data on parallel disk arrays. Using a low-level disk model and video data from a scalable subband coding technique, we derive constraints with which to compare the three strategies. One strategy, constant frame grouping, is shown to be superior. Two methods for interleaving multiple videos under the constant frame grouping strategy are presented: nonperiodic and periodic. Periodic interleaving is shown to have the advantages of a lower access time and limited scan and pause functions. The constant frame grouping strategy is tested on an actual array of 8 disks and shown to have performance close to the theoretical prediction. The scalable nature of the compressed data is used to relieve disk system overload under an overly high request rate.
This paper describes a novel system for real-time object extraction from a moving video image. The object extraction method employed in this system has two features. The first is multi-channel thresholding in a color space for extracting a target with several colors as a single object. The color space is normalized by luminance to make the process robust against light intensity fluctuations. The second feature is a key algorithm, called sequential growing, which exploits the interfield correlation of a video image and realizes field-rate operation. In this algorithm, a binary image representing a target object is generated by growing an initially given mask image horizontally and vertically until the whole object is extracted. We have also fabricated a 0.8-micrometer CMOS chip to incorporate the system into a compact video camera, which has successfully extracted objects in a number of general scenes.
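The luminance-normalized thresholding idea can be sketched as follows: each pixel's chromaticity (r, g) = (R, G) / (R + G + B) is tested against per-channel ranges, so the resulting mask is insensitive to uniform brightness changes. The exact color space and the threshold ranges here are illustrative assumptions, not the chip's specification.

```python
def chromaticity(pixel):
    """Luminance-normalized (r, g) coordinates of an (R, G, B) pixel."""
    r, g, b = pixel
    s = r + g + b or 1  # avoid division by zero on black pixels
    return r / s, g / s

def mask(image, r_range, g_range):
    """Binary mask: 1 where chromaticity falls inside both ranges."""
    out = []
    for row in image:
        out.append([
            1 if (r_range[0] <= chromaticity(p)[0] <= r_range[1]
                  and g_range[0] <= chromaticity(p)[1] <= g_range[1])
            else 0
            for p in row
        ])
    return out

img = [[(200, 20, 20), (20, 200, 20)],
       [(100, 10, 10), (30, 30, 30)]]
# Select reddish pixels: high r-chromaticity, low g-chromaticity.
# Note the dim red pixel (100, 10, 10) is still selected.
print(mask(img, r_range=(0.6, 1.0), g_range=(0.0, 0.3)))
```

Because (200, 20, 20) and the much darker (100, 10, 10) have the same chromaticity, both land inside the mask, which is the robustness to light intensity fluctuations the abstract describes.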
Motion and structure analysis in video sequences can lead to efficient descriptions of objects and their motions. Interesting events in videos can be detected using such an analysis: for instance, independent object motion when the camera itself is moving, or figure-ground segregation based on the saliency of a structure compared to its surroundings. In this paper we present a method for 3D motion and structure analysis that uses a planar surface in the environment as a reference coordinate system to describe a video sequence. The motion in the video sequence is described as the motion of the reference plane and the parallax motion of all the non-planar components of the scene. It is shown how this method simplifies the otherwise hard general 3D motion analysis problem. In addition, a natural coordinate system in the environment is used to describe the scene, which can simplify motion-based segmentation. This work is part of an ongoing effort in our group towards video annotation and analysis for indexing and retrieval. Results from a demonstration system being developed are presented.
The main aim of this paper is to describe a method for locating a subimage of a stored image that approximately matches a given query image. This matching can support naive users in accessing an image database according to image contents rather than symbolic attributes. The query image can be either composed using painting tools or cut out of an actual scanned image. Our method is based on the extraction of features from the query image and from the stored images. The following three steps are involved: (1) an ISODATA algorithm is applied to segment (into regions) both the query image and the stored images; (2) normalized moment and geometrical features are computed from the segmented regions; and (3) a matching process is run on the resulting features to find those stored images which should be retrieved. The result is an ordered list of stored images or subimages from the database.
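The normalized moment features in step (2) can be illustrated with a minimal sketch for a region given as a list of (x, y) pixel coordinates. These are the standard scale-invariant normalized central moments; the exact feature set used in the paper may differ.

```python
def central_moment(region, p, q):
    """Central moment mu_pq of a region (list of (x, y) pixels)."""
    n = len(region)
    cx = sum(x for x, _ in region) / n  # centroid x
    cy = sum(y for _, y in region) / n  # centroid y
    return sum((x - cx) ** p * (y - cy) ** q for x, y in region)

def normalized_moment(region, p, q):
    """Scale-invariant moment eta_pq = mu_pq / mu_00^((p + q)/2 + 1)."""
    mu00 = central_moment(region, 0, 0)  # equals the region's area
    return central_moment(region, p, q) / mu00 ** ((p + q) / 2 + 1)

# A 3x3 square region of pixels.
square = [(x, y) for x in range(3) for y in range(3)]
print(normalized_moment(square, 2, 0))
```

Because the moments are computed relative to the centroid and normalized by area, the resulting features are invariant to the region's position and size, which makes them usable for the feature-matching step.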
Visual query programs are an effective tool for searching large collections of images. Unlike most programs of this type, which rely on text to access categories of images, IBM's QBIC software searches for images based on what they look like. This is of value in situations where the content of an image is not described in the text catalog or when the user is seeking related images. The Art and Art History Department at the University of California, Davis is using QBIC to determine what role it may play in the retrieval of fine art images. This paper describes the department's construction of a pilot database and presents preliminary findings.