Color histograms have been shown to be very effective in representing images for content-based retrieval systems. They are compact, robust and amenable to computer analysis. However, such histograms convey only global image properties and do not embody the local color information that is so important when comparing and contrasting images. We present a novel image coding scheme which captures some of this locally correlated color information and improves the selectivity of the retrieval mechanism -- an important issue for very large databases. The technique uses a histogram of features representing frequently occurring local combinations of color tuples throughout the image. It outperforms straight color histogram matching. We outline the thrust of our approach and discuss the factors affecting the efficacy of the retrieval mechanism using a database of color images.
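The abstract does not give the exact feature definition, but the idea of histogramming frequently occurring local color combinations can be sketched as follows: quantize each pixel, then count co-occurring quantized values of horizontally adjacent pixels. This is an illustrative toy (single-channel, pairs only), not the paper's coding scheme.

```python
from collections import Counter

def color_tuple_histogram(image, levels=4):
    """Normalized histogram of horizontally adjacent quantized color pairs.

    `image` is a 2-D list of gray levels in [0, 255]; a real system would
    quantize color triples and may use larger local neighborhoods.
    """
    quantized = [[px * levels // 256 for px in row] for row in image]
    counts = Counter()
    for row in quantized:
        for a, b in zip(row, row[1:]):   # each adjacent pair is one "tuple"
            counts[(a, b)] += 1
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}
```

Unlike a plain color histogram, two images with the same global color mix but different spatial layout produce different pair histograms.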
Color space selection and quantization are critical to content-based image retrieval based on color histograms. In this work, we first examine the color distribution in different color spaces, including RGB, HSV, YUV and Munsell spaces, and discuss appropriate quantization strategies in these color spaces based on the distribution of colors. Then, we propose a color quantization scheme which applies the Lloyd-Max quantizer along each axis independently. The proposed scheme is simple yet efficient. Retrieval results using the proposed quantization scheme on different color spaces are compared through experiments.
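A one-dimensional Lloyd-Max quantizer of the kind applied per axis alternates two steps: place decision boundaries midway between reproduction levels, then move each level to the centroid of its cell. The sketch below assumes a plain sample list per color axis; initialization and stopping criteria are choices of this sketch, not taken from the paper.

```python
def lloyd_max_1d(samples, n_levels, iters=50):
    """Scalar Lloyd-Max quantizer: returns `n_levels` reproduction levels
    minimizing mean squared quantization error for the given samples."""
    samples = sorted(samples)
    lo, hi = samples[0], samples[-1]
    # initialize reproduction levels uniformly over the data range
    levels = [lo + (hi - lo) * (i + 0.5) / n_levels for i in range(n_levels)]
    for _ in range(iters):
        # decision boundaries sit midway between adjacent levels
        bounds = [(a + b) / 2 for a, b in zip(levels, levels[1:])]
        cells = [[] for _ in range(n_levels)]
        for s in samples:
            cells[sum(s > b for b in bounds)].append(s)
        # each level moves to the centroid of its cell (kept if empty)
        levels = [sum(c) / len(c) if c else levels[i]
                  for i, c in enumerate(cells)]
    return levels
```

Applying this independently along each axis of a color space yields a non-uniform product quantization adapted to the observed color distribution.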
Successful retrieval of images by shape feature is likely to be achieved only if we can mirror human similarity judgments. Following Biederman's theory of recognition-by-components, we postulate that shape analysis for retrieval should characterize an image by identifying properties such as collinearity, shape similarity and proximity in component boundaries. Such properties can then be used to group image components into families, from which indexing features can be derived. We are currently applying these principles in the development of the ARTISAN shape retrieval system for the UK Patent Office. The trademark images, supplied in compressed bit-map format, are processed using standard edge-extraction techniques to derive a set of region boundaries, which are approximated as a sequence of straight-line and circular-arc segments. These are then grouped into families using criteria such as proximity and shape similarity. Shape features for retrieval are then extracted from the image as a whole, each boundary family, and each individual boundary. Progress to date with the project is analyzed, evaluation plans described, and possible future directions for the research discussed.
To improve the discrimination power of color indexing techniques we encode a minimal amount of spatial information in the index. We propose an approach that lies between uniformly tessellating the images with rectangular regions and relying on fully segmented images. For each image we define 5 partially overlapping, fuzzy regions. From each region in the image we extract the first three moments of the color distribution and store them in the index. The feature vectors in the index are relatively insensitive to small translations and small rotations of an image because they are extracted from fuzzy regions. To retrieve images we define a function which measures the similarity of two color feature vectors. Invariance of retrieval results with respect to the typical image rotations of 90 degrees around the center of the image is guaranteed because our feature similarity function exploits the spatial arrangement of the 5 image regions. We present experimental results using an image database which contains more than 11,000 color images. Our experiments demonstrate clearly that our weak encoding of spatial information significantly increases the discrimination power of the index compared to plain color indexing techniques.
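The per-region index entry is built from the first three moments of the color distribution. A minimal sketch of that computation for one color channel, with fuzzy region membership expressed as per-pixel weights (the region shapes and weighting scheme here are placeholders, not the paper's five regions):

```python
def color_moments(values, weights):
    """Weighted mean, standard deviation, and signed cube root of the
    third central moment of one color channel under fuzzy membership
    weights -- one region's contribution to the index feature vector."""
    w = sum(weights)
    mean = sum(v * wt for v, wt in zip(values, weights)) / w
    var = sum(wt * (v - mean) ** 2 for v, wt in zip(values, weights)) / w
    third = sum(wt * (v - mean) ** 3 for v, wt in zip(values, weights)) / w
    # signed cube root keeps the skewness feature on the same scale as the mean
    cbrt = lambda x: x ** (1 / 3) if x >= 0 else -((-x) ** (1 / 3))
    return mean, var ** 0.5, cbrt(third)
```

Stacking these three values per channel per region gives a short, fixed-length feature vector, which is what makes the index compact.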
The CANDID project (comparison algorithm for navigating digital image databases) employs probability density functions (PDFs) of localized feature information to represent the content of an image for search and retrieval purposes. A similarity measure between PDFs is used to identify database images that are similar to a user-provided query image. Unfortunately, signature comparison involving PDFs is a very time-consuming operation. In this paper, we look into some efficiency considerations when working with PDFs. Since PDFs can take on many forms, we look into tradeoffs between accurate representation and efficiency of manipulation for several data sets. In particular, we typically represent each PDF as a Gaussian mixture (i.e. as a weighted sum of Gaussian kernels) in the feature space. We find that by constraining all Gaussian kernels to have principal axes that are aligned to the natural axes of the feature space, computations involving these PDFs are simplified. We can also constrain the Gaussian kernels to be hyperspherical rather than hyperellipsoidal, simplifying computations even further, and yielding an order of magnitude speedup in signature comparison. This paper illustrates the tradeoffs encountered when using these constraints.
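The simplification bought by axis-aligned (diagonal-covariance) kernels can be illustrated with one building block common to PDF similarity measures: the integral of the product of two Gaussians, which factorizes into a product of one-dimensional closed forms. This is an illustrative sketch of why the constraint helps, not the CANDID similarity measure itself.

```python
import math

def gauss_product_integral(m1, s1, m2, s2):
    """Closed-form integral of the product of two axis-aligned Gaussians
    with means m1, m2 and per-axis standard deviations s1, s2.
    With diagonal covariances the integral is a per-dimension product,
    so no matrix inversion or determinant is needed."""
    val = 1.0
    for a, sa, b, sb in zip(m1, s1, m2, s2):
        v = sa * sa + sb * sb
        val *= math.exp(-(a - b) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
    return val
```

Constraining kernels further to be hyperspherical makes every `v` above identical across axes, collapsing the loop to a single squared-distance computation, which is the source of the additional speedup.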
Content-based retrieval in image management systems requires indexing of image feature vectors. Most feature vectors have a high number of dimensions (15+). This makes indexing difficult since most existing multi-dimensional indexing structures grow exponentially in size as dimensions increase. We approach this problem in three stages: (1) reduce the dimensionality of the feature space, (2) evaluate existing multi-dimensional indexing structures to determine which one can best organize the reduced feature space, and (3) customize the selected structure to improve search performance. To reduce the dimensionality of the feature space without losing much information we apply a statistical technique called principal component analysis (PCA), using Turk and Pentland's eigenimages approach. We then conduct a comparative analysis of a wide range of existing multi-dimensional indexing structures, selecting and implementing three of them (bucket adaptive KD-tree, gridfile, R-tree) for further empirical comparisons. Tests show that the adaptive KD-tree uses the least storage and performs the best during search. Finally, we customize the bucket adaptive KD-tree by implementing techniques that take advantage of the characteristics of the transformed space -- namely, ranked dimensions by decreasing variance, and known dynamic ranges. This prunes the search space and results in very efficient searches. The number of page accesses is reduced significantly, sometimes leading to savings as high as 70%.
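The PCA step projects feature vectors onto the directions of largest variance. As a dependency-free sketch of the core computation, the leading principal component can be found by power iteration on the sample covariance matrix (the paper uses the full eigenimages machinery; this toy extracts only the first component):

```python
def principal_component(data, iters=100):
    """Leading principal component of row-vector `data`, via power
    iteration on the sample covariance matrix."""
    n, d = len(data), len(data[0])
    mean = [sum(col) / n for col in zip(*data)]
    x = [[v - m for v, m in zip(row, mean)] for row in data]
    cov = [[sum(x[k][i] * x[k][j] for k in range(n)) / n
            for j in range(d)] for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(c * c for c in v) ** 0.5
        v = [c / norm for c in v]
    return v
```

Because PCA ranks dimensions by decreasing variance, a KD-tree built over the transformed space can split on the most discriminating axes first, which is exactly the customization the abstract exploits.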
Efficient indexing support is essential to allow content-based image and video databases using similarity-based retrieval to scale to large databases (tens of thousands up to millions of images). In this paper, we take an in-depth look at this problem. One of the major difficulties in solving this problem is the high dimension (6-100) of the feature vectors that are used to represent objects. We provide an overview of the work in computational geometry on this problem and highlight the results we found most useful in practice, including the use of approximate nearest neighbor algorithms. We also present a variant of the optimized k-d tree we call the VAM k-d tree, and provide algorithms to create an optimized R-tree we call the VAMSplit R-tree. We found that the VAMSplit R-tree provided better overall performance than all competing structures we tested for main memory and secondary memory applications. We observed large improvements in performance relative to the R*-tree and SS-tree in secondary memory applications, and modest improvements relative to optimized k-d tree variants.
Content-based image retrieval techniques are being developed for automatic indexing and retrieval of images in many applications. One of the main features of an image is its dominant colors, hence the development of color-based image retrieval techniques. In these techniques, images are indexed using their dominant colors and images with perceptually similar dominant colors to the query are retrieved. In a large image database, images may come from many different sources, may be captured using different devices and represented using different color spaces. These differences may be subtle, but result in different meanings of image data. If these differences are not accounted for, image retrieval performance may suffer. In existing systems, this factor is normally not considered. In this paper, we present various image representations. We then discuss effects on retrieval performance using color histograms when images in a database are represented differently. It is shown that different image representations have serious effects on image retrieval performance. We discuss the conversion between different image representations, and the information required to carry out these conversions. This information is normally not available in most current image formats, which indicates a need for a common color image interchange format.
The R-tree and its variants are examples of spatial data structures for paged secondary memory. To process a query, these structures require multiple path traversals. In this paper, we present a new image access method, the SB+-tree, which requires a single path traversal to process a query. The SB+-tree also gives commercial databases an access method for spatial objects without major changes, since most commercial databases already support the B+-tree as an access method for text data. The SB+-tree can be used for zero and non-zero size data objects. Non-zero size objects are approximated by their minimum bounding rectangles (MBRs). The number of SB+-trees generated is dependent upon the number of dimensions of the approximation of the object. The structure supports efficient spatial operations such as region-overlap, distance and direction. In this paper, we experimentally and analytically demonstrate the superiority of the SB+-tree over the R-tree.
This paper describes an algorithm for searching image databases for images that match a specified pattern. The application in mind for this algorithm is a query system for a large library of digitized satellite images. The algorithm has two thresholds that allow the user to adjust independently the closeness of a match. One threshold controls an intensity match and the other controls a texture match. The thresholds are correlations that can be computed efficiently in the Fourier transform domain of an image, and are particularly efficient to compute when the Fourier coefficients are mostly zero. Thus the scheme works well with image-compression algorithms that replace small Fourier coefficients by zeros. For compressed images, the majority of the processing cost lies in computing the inverse transforms, plus a few operations per pixel for nonlinear threshold operations. The quality of retrieval for this algorithm has not been evaluated at this writing. We show the use of this technique on a typical satellite image. The technique may be suitable for automatic identification of cloud-free images, for making crude classifications of land use, and for finding isolated features that have unique intensity and texture characteristics. We discuss how to generalize the algorithm from matching gray-scale intensity to color or multispectral images.
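The transform-domain correlation underlying such matching follows from the correlation theorem: circular cross-correlation equals the inverse transform of one spectrum times the conjugate of the other, so coefficients that compression has zeroed simply drop out of the product. A 1-D sketch with a naive DFT (real systems would use an FFT and 2-D transforms):

```python
import cmath

def dft(x, inverse=False):
    """Naive discrete Fourier transform (O(n^2)), for illustration only."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * j * k / n)
               for k in range(n)) for j in range(n)]
    return [v / n for v in out] if inverse else out

def circular_correlation(a, b):
    """Circular cross-correlation of equal-length signals via the
    transform domain: IDFT(DFT(a) * conj(DFT(b)))."""
    fa, fb = dft(a), dft(b)
    prod = [x * y.conjugate() for x, y in zip(fa, fb)]
    return [c.real for c in dft(prod, inverse=True)]
```

The correlation peaks at the lag that best aligns the pattern with the image, which is what the intensity and texture thresholds are compared against.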
This paper presents the software architecture for DocBrowse: a system for mixed text/graphics document image analysis and retrieval. DocBrowse is an open and extensible environment that permits the user to visually manage and perform queries on highly degraded document image databases. DocBrowse also serves as a research environment for developing document image analysis and query by image example (QBIE) algorithms. The system consists of a user interface, an object-relational document database and a variety of document image analysis engines. Using DocBrowse, it is possible to perform queries that retrieve documents based on both graphical and textual content. We describe the graphical user interface and visual image browser that is used to perform such queries. We also describe our approach to QBIE, the database structure, and the analysis engines incorporated in DocBrowse.
At the Lister Hill National Center for Biomedical Communications, a research and development division of the National Library of Medicine (NLM), a prototype image database retrieval system has been built. This medical information retrieval system (MIRS) is a client/server application which provides Internet access to biomedical databases, including both text search/retrieval and retrieval/display of medical images associated with the text records. The MIRS graphical user interface (GUI) allows a user to formulate queries by simple, intuitive interactions with screen buttons, list boxes, and edit boxes; these interactions create structured query language (SQL) queries, which are submitted to a database manager running at NLM. The result of a MIRS query is a display showing both scrollable text records and scrollable images returned for all of the 'hits' of the query. MIRS is designed as an information-delivery vehicle intended to provide access to multiple collections of medical text and image data. The database used for initial MIRS evaluation consists of national survey data collected by the National Center for Health Statistics, including 17,000 spinal x-ray images. This survey, conducted on a sample of 27,801 persons, collected demographic, socioeconomic, and medical information, including both interview results and results acquired by direct examination by physician.
The major problem facing video databases is that of content characterization of video clips once the cut boundaries have been determined. The current efforts in this direction are focused exclusively on the use of pictorial information, thereby neglecting an important supplementary source of content information, i.e. the embedded audio or sound track. The current research in audio processing can be readily applied to create many different video indices for use in Video On Demand (VOD), educational video indexing, sports video characterization, etc. MPEG is an emerging video and audio compression standard with rapidly increasing popularity in the multimedia industry. Compressed bit stream processing has gained good recognition among researchers. We have also demonstrated feature extraction in MPEG compressed video which implements a majority of scene change detection schemes on compressed video. In this paper, we examine the potential of audio information for content characterization by demonstrating the extraction of widely used features in audio processing directly from the compressed data stream and their application to video clip classification.
This paper studies the impact of different disk array configurations on the size of data buffer required to support video-on-demand. The study is based on a general two-level hierarchical disk array structure that provides both parallelism and concurrency. The study reveals that, for disk arrays with the same total number of disks, a higher degree of parallelism means that the minimum size of data buffer required is larger. This result provides valuable insight about how to organize the disk array in order to minimize system costs.
Dissimilarity measures, the basis of similarity-based retrieval, can be viewed as a distance and a similarity-based search as a nearest neighbor search. Though there has been extensive research on data structures and search methods to support nearest-neighbor searching, these indexing and dimension-reduction methods are generally not applicable to non-coordinate data and non-Euclidean distance measures. In this paper we reexamine and extend previous work of other researchers on best match searching based on the triangle inequality. These methods can be used to organize both non-coordinate data and non-Euclidean metric similarity measures. The effectiveness of the indexes depends on the actual dimensionality of the feature set, data, and similarity metric used. We show that these methods provide significant performance improvements and may be of practical value in real-world databases.
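The triangle inequality underlying these methods gives a lower bound on an unknown distance from precomputed pivot distances: for any pivot p, d(q, x) >= |d(q, p) - d(x, p)|. A candidate whose bound already exceeds the best distance found so far can be skipped without computing its (possibly expensive) dissimilarity. A minimal sketch, with the pivot set and data representation as assumptions of this sketch:

```python
def nn_triangle(query, data, dist, pivots):
    """Best-match search that prunes candidates with the triangle
    inequality, using precomputed distances from each item to each pivot.
    `dist` must be a metric for the bound to be valid."""
    pivot_d = {x: [dist(x, p) for p in pivots] for x in data}  # built offline
    qd = [dist(query, p) for p in pivots]
    best, best_d, skipped = None, float('inf'), 0
    for x in data:
        # tightest available lower bound on dist(query, x)
        lower = max(abs(a - b) for a, b in zip(qd, pivot_d[x]))
        if lower >= best_d:
            skipped += 1           # cannot beat the current best match
            continue
        d = dist(query, x)
        if d < best_d:
            best, best_d = x, d
    return best, best_d, skipped
```

Note that nothing here requires coordinates: `dist` can be any metric over arbitrary objects, which is why the approach applies to non-coordinate data and non-Euclidean measures.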
This paper examines the issue of direct extraction of low level features from compressed images. Specifically, we consider the detection of areas of interest and edges in images compressed using the discrete cosine transform (DCT). For interest areas, we show how a measure based on certain DCT coefficients of a block can provide an indication of underlying activity. For edges, we show using an ideal edge model how the relative values of different DCT coefficients of a block can be used to estimate the strength and orientation of an edge. Our experimental results indicate that coarse edge information from compressed images can be extracted up to 20 times faster than conventional edge detectors.
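The flavor of these DCT-domain cues can be shown on a single block: AC-coefficient energy serves as an "activity" or interest measure, and the relative magnitudes of the first horizontal- and vertical-frequency coefficients hint at edge orientation. This is a toy version of the idea, not the paper's estimator, and it works on raw blocks rather than an actual compressed bit stream.

```python
import math

def dct_1d(vec):
    """Orthonormal 1-D DCT-II."""
    n = len(vec)
    return [sum(vec[x] * math.cos(math.pi * (2 * x + 1) * u / (2 * n))
                for x in range(n))
            * (math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n))
            for u in range(n)]

def block_activity_and_tilt(block):
    """2-D DCT of a square block via separable 1-D transforms, returning
    (AC energy, first horizontal-frequency coeff, first vertical-frequency
    coeff). Large AC energy marks an interest area; |horiz| >> |vert|
    suggests intensity varies mainly left-to-right."""
    rows = [dct_1d(r) for r in block]
    coef = [dct_1d(col) for col in zip(*rows)]   # coef[u][v]
    total = sum(c * c for line in coef for c in line)
    ac_energy = total - coef[0][0] ** 2          # drop the DC term
    return ac_energy, coef[1][0], coef[0][1]
```

Since these coefficients are exactly what a JPEG/MPEG decoder recovers before the inverse transform, the cues come almost for free in the compressed domain, which is the source of the reported speedup.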
In this paper we propose a media-independent knowledge indexing and retrieval system as a basis for an information retrieval system. The representation allows for sharing of low-level information-bearing objects and at the same time allows for the maintenance of user-dependent views. The tools for the maintenance and manipulation of concepts focus on the user and the user's intentions. The aim of the system is to provide a set of flexible tools and let the user structure the knowledge in his or her own way, instead of attempting to build an all-encompassing common sense, or general knowledge representation.
The growth of digital image and video archives is increasing the need for tools that effectively filter and efficiently search through large amounts of visual data. Towards this goal we propose a technique by which the color content of images and videos is automatically extracted to form a class of meta-data that is easily indexed. The color indexing algorithm uses the back-projection of binary color sets to extract color regions from images. This technique provides for both the automated extraction of regions and representation of their color content. It overcomes some of the problems with color histogram techniques such as high-dimensional feature vectors, spatial localization, indexing and distance computation. We present the binary color set back-projection technique and discuss its implementation in the VisualSEEk content-based image/video retrieval system for the World Wide Web. We also evaluate the retrieval effectiveness of the color set back-projection method and compare its performance to other color image retrieval methods.
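In its simplest form, back-projecting a binary color set marks every pixel whose quantized color belongs to the set; spatially coherent runs of marked pixels then yield candidate color regions. The sketch below uses a bounding box as a crude stand-in for VisualSEEk's actual region extraction and filtering stages.

```python
def backproject(image, color_set):
    """Binary back-projection: 1 where the pixel's (quantized) color is
    in the color set, 0 elsewhere."""
    return [[1 if px in color_set else 0 for px in row] for row in image]

def bounding_box(mask):
    """Bounding box (x0, y0, x1, y1) of all marked pixels -- a minimal
    placeholder for the region-extraction step."""
    pts = [(x, y) for y, row in enumerate(mask)
           for x, v in enumerate(row) if v]
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    return min(xs), min(ys), max(xs), max(ys)
```

Because each region is indexed by a binary set plus its location, the index entries are far lower-dimensional than full color histograms, which is the advantage the abstract highlights.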
The retrieval of images through the use of content-based search techniques often requires inferencing and reasoning. As a result the processing of content-based queries in large databases is frequently reduced to membership and range queries on data having vague or uncertain attributes. We refer to these as fuzzy attributes. In this paper, a hierarchical indexing technique for membership and range queries in databases containing data having fuzzy attributes is proposed. This approach is suitable for both unimodal and multimodal fuzzy membership functions. In the proposed approach, an index using a multi-attribute indexing scheme such as a hierarchical hash table is generated based on the discrete representation of each fuzzy attribute. Indexing is then performed by traversing the data structure looking for the activation values provided by the query.
HNC Software Inc. has developed a technology for automatic indexing and retrieval of free text and images. This technique is based on the concept of 'context vectors' which encode a succinct representation of the associated text and features of images. In this paper, we describe some technical issues in the image content addressable retrieval system (ICARS) including image context vector representation, clustering algorithm, retrieval and indexing techniques. ICARS has the capability to retrieve images based on similarity of content by texture and color without performing segmentation or object recognition.
Until recently, the management of large image databases has relied exclusively on manually entered alphanumeric annotations. Systems are beginning to emerge in both the research and commercial sectors based on 'content-based' image retrieval, a technique which explicitly manages image assets by directly representing their visual attributes. The Virage image search engine provides an open framework for building such systems. The Virage engine expresses visual features as image 'primitives.' Primitives can be very general (such as color, shape, or texture) or quite domain specific (face recognition, cancer cell detection, etc.). The basic philosophy underlying this architecture is a transformation from the data-rich representation of explicit image pixels to a compact, semantic-rich representation of visually salient characteristics. In practice, the design of such primitives is non-trivial, and is driven by a number of conflicting real-world constraints (e.g. computation time vs. accuracy). The Virage engine provides an open framework for developers to 'plug in' primitives to solve specific image management problems. The architecture has been designed to support both static images and video in a unified paradigm. The infrastructure provided by the Virage engine can be utilized to address high-level problems as well, such as automatic, unsupervised keyword assignment, or image classification.
The virtual digital library, a concept that is quickly becoming a reality, offers rapid and geography-independent access to stores of text, images, graphics, motion video and other datatypes. Furthermore, a user may move from one information source to another through hypertext linkages. The projects described here further the notion of such an information paradigm from an end user viewpoint.
Multimedia information is now routinely available in the forms of text, pictures, animation and sound. Although text objects are relatively easy to deal with (in terms of information search and retrieval), other information bearing objects (such as sound, images, animation) are more difficult to index. Our research is aimed at developing better ways of representing multimedia objects by using a conceptual representation based on Schank's conceptual dependencies. Moreover, the representation allows for users' individual interpretations to be embedded in the system. This will alleviate the problems associated with traditional semantic networks by allowing for coexistence of multiple views of the same information. The viability of the approach is tested, and the preliminary results reported.
The paper addresses a fundamental problem for image retrieval systems: how is the content information to be used in answering user queries? Our answer to this question is a retrieval model based on logic that offers: (a) an abstract representation of the visual appearance of an image, allowing any image retrieval technique based on the similarity of physical features such as region, color, and shape to be incorporated in a principled way; (b) a semantic data modeling styled representation of the image content, independent from how the content information is obtained; (c) a functional representation of the association between portions of the image form and content objects. This three-level image representation is queried via a logical language spanning four dimensions: the visual dimension, in which queries are images themselves, and the content, mapping, and spatial dimensions, in which queries are symbolic expressions. An image is retrieved in response to a query if it satisfies, in a logical sense, the query.
To the end-user of a video database, content consists of objects and events occurring in the video. A video database system must be designed to extract, represent and organize this information in a fashion that supports querying, manipulation and data visualization by a user. As a data modeling exercise, objects and events are defined in terms of semantic attributes such that an end-user's queries are expressible through the modeling language. On the other hand, as a feature extraction exercise, objects are defined as solutions to equations, often in terms of low-level visual primitives like voxels or contours. These two formalisms constitute entirely different languages. However, integration of these two approaches can provide a powerful mechanism for description and manipulation of complex visual data. This paper explores issues involved with this integration. We introduce the notion of a visual data modeling language (VDML), which supports data definition and data manipulation operations over complex visual data characteristic of video database systems. We discuss this data-modeling effort in the context of our multiple perspective interactive video system which generates three-dimensional data sets using input from multiple video cameras.
Many applications would benefit if media objects such as images could be selected and classified (or clustered) such that 'conceptually similar' images are grouped together by content. This requires that image content be described by some coherent semantic domain model rather than relying on the use of keywords as in most commercial image database systems. However, a description of image contents cannot be predefined by prescribing what should be in the images but must incrementally evolve to link image instances with descriptions of what is actually there. Flexibility is required as the same image may be reused from many different application perspectives, and classified and reclassified by many different, unpredictable, and possibly contradictory interpretations of the same contents. We present preliminary work on the incremental and flexible description of image and video semantic content by the use of a description logic (DL), GRAIL, developed at the University of Manchester. GRAIL progressively bridges the gap between the uninterpreted raw image and the application's semantic domain of 'world' objects by supporting the incremental specification of a schema, the automatic classification of descriptions (and hence images), the notion of 'conceptual similarity' for imprecise queries, multiple granularity of views and reuse. We then present a model for a video database system based on this approach. A primary aim is to determine if GRAIL in particular, and DLs in general, are suitable for such an application.
Two approaches for integrating images into the framework of a database management system are presented. The classification approach preprocesses all images and attaches a semantic classification and an associated certainty factor to each object found in the image. The abstraction approach describes each object in the image by using a vector consisting of the values of some of its features (e.g., shape, genus, etc.). The approaches differ in the way in which responses to queries that are based on image content are computed. In the classification approach, images are retrieved on the basis of whether or not they contain objects that have the same classification as query objects. In the abstraction approach, retrieval is on the basis of similarity of feature vector values of these objects. Both the pattern recognition and indexing aspects of the method are addressed for each approach. The emphasis is on extracting both contextual and spatial information from the raw images. Methods for storing and indexing symbolic images as tuples in a relation are presented for each approach. Indices are constructed for both the contextual and the spatial data. The user interface for a pictorial information system based on these two approaches is also presented.
Video data management is fast becoming one of the most important topics in multimedia databases. Most of the recent work on video databases has so far focused on video classification, feature extraction, spatial reasoning and image retrieval (video access); little work has been done on supporting advanced video editing and production activities, nor has there been much work done on providing facilities for efficient and versatile video data management. In this paper, we describe the development of an experimental video database system being implemented at HKUST, which employs extended object-oriented features and techniques. By incorporating conceptual object clustering concepts and techniques, it enables users to dynamically form, among other things, video programs (or segments) from existing objects based on semantic features/index terms. A prototype of this system has been constructed, using a persistent object storage manager (viz. EOS), on Sun4 workstations.
Many algorithms have been proposed for detecting video shot boundaries and classifying shot and shot transition types. Few published studies compare available algorithms, and those that do have looked at a limited range of test material. This paper presents a comparison of several shot boundary detection and classification techniques and their variations, including histogram, discrete cosine transform, motion vector, and block matching methods. The performance and ease of selecting good thresholds for these algorithms are evaluated based on a wide variety of video sequences with a good mix of transition types. Threshold selection requires a trade-off between recall and precision that must be guided by the target application.
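The histogram family of methods compared here declares a cut wherever the distance between successive frame histograms exceeds a threshold, and the recall/precision trade-off the abstract mentions is controlled entirely by that threshold. A minimal sketch (frames as flat lists of gray levels, L1 distance; both are simplifying assumptions of this sketch):

```python
def shot_cuts(frames, bins=8, threshold=0.5):
    """Indices of frames that start a new shot, detected where the L1
    distance between successive normalized histograms exceeds `threshold`.
    Lowering the threshold raises recall at the cost of precision."""
    def hist(frame):
        h = [0] * bins
        for px in frame:                 # px assumed in [0, 255]
            h[px * bins // 256] += 1
        n = len(frame)
        return [c / n for c in h]
    hs = [hist(f) for f in frames]
    return [i + 1 for i, (a, b) in enumerate(zip(hs, hs[1:]))
            if sum(abs(x - y) for x, y in zip(a, b)) > threshold]
```

Gradual transitions such as dissolves spread the histogram change over many frames, which is why simple single-threshold variants miss them and why the compared algorithms differ mainly in how they accumulate evidence across frames.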
Indexing and editing digital video directly in the compressed domain offer many advantages in terms of storage efficiency and processing speed. We have designed automatic tools in the compressed domain for extracting key visual features such as scene cut, dissolve, camera operations (zoom, pan), and moving object detection and tracking. In addition, we have developed algorithms to solve the decoder buffer control problems and allow users to 'cut, copy and paste' arbitrary compressed video segments directly in the compressed domain. The compressed-domain approach does not require full decoding. Thus fast software implementations can be achieved. Our compressed video editing techniques enhance the reusability of existing compressed videos.
A computer environment, called the Toolkit for Image Mining (TIM), is being developed with the goal of enabling users with diverse interests and varied computer skills to create search tools for content-based image retrieval and other pattern matching tasks. Search tools are generated using a simple paradigm of supervised learning that is based on the user pointing at mistakes of classification made by the current search tool. As mistakes are identified, a learning algorithm uses the identified mistakes to build up a model of the user's intentions, construct a new search tool, apply the search tool to a test image, display the match results as feedback to the user, and accept new inputs from the user. Search tools are constructed in the form of functional templates, which are generalized matched filters capable of knowledge-based image processing. The ability of this system to learn the user's intentions from experience contrasts with other existing approaches to content-based image retrieval that base searches on the characteristics of a single input example or on a predefined and semantically-constrained textual query. Currently, TIM is capable of learning spectral and textural patterns, but should be adaptable to the learning of shapes, as well. Possible applications of TIM include not only content-based image retrieval, but also quantitative image analysis, the generation of metadata for annotating images, data prioritization or data reduction in bandwidth-limited situations, and the construction of components for larger, more complex computer vision algorithms.
The specification of image content is a critical issue in image databases. In this paper we explore the problem of specifying an important visual cue, that of image texture. The approach we have taken is to separately categorize texture images and texture words (in the English language), and then explore the relationships between the identified categories of images and words. These relationships are expressed as association matrices, and measure the mapping between the visual texture space and lexical texture space. Based on experiments with human subjects, we determined Pearson's coefficient of contingency (which measures the degree of association) to be 0.63 for the association matrix mapping images to words, and 0.56 for the association matrix mapping words to images. These indicate a strong association between texture words and images. Furthermore, like categories of texture words map onto like categories of texture images, e.g. words dealing with repetition map onto images of repetitive texture.
With the advances in digital imaging and storage technologies, images are becoming a common type of data in many applications. This creates a demand to manage such databases of images. One of the issues is providing a query interface that allows effective and efficient retrieval. Our study of this issue has led to the development of a query interface system which we call Exquisi. The main contribution of Exquisi is an expressive query language and interface that allow effective querying of a database of similar images. In particular, it provides querying tools for a user to express subtle differences that may exist between the images to be retrieved and other, similar images. The query interface also allows the user to incorporate ambiguity and imprecision in specifying his/her query. Another important aspect of Exquisi is the provision of a reformulation language by which the user can incorporate more specific aspects of the query results into the refinement of the query. One key part of this paper is the description of the input and reformulation features supported by Exquisi and a discussion of the relationship between the reformulation and input modules.
In order to flexibly and efficiently store, manage, and present video data streams, continuous video data must be segmented into video objects and stored in a database. This paper investigates systematic strategies for supporting continuous and synchronized presentation of video data streams in multimedia database systems. Compressed video data streams are segmented and stored as sets of video objects coupled with specified synchronization requirements. Strategies for efficiently scheduling and buffering video objects are presented which guarantee hiccup-free presentation of video streams. Delay effects are considered in these strategies. We propose to extend existing object-oriented database system (OODBS) techniques to include the proposed video presentation mechanisms. We are currently designing and implementing a multimedia presentation tool (termed MediaShow) on top of O2, a well-known OODBS. However, the design strategies can be used in any OODBS environment that supports a C++ interface.
The large amount of video data makes browsing and annotating it by fast forward and rewind alone a tedious and difficult job. Recent work in video parsing provides a foundation for building interactive, content-based video browsing systems. In this paper, a generalized top-down hierarchical clustering process, which applies partition clustering recursively at each level of the hierarchy, is studied and used to build hierarchical views of video shots. With this clustering process, when a list of video programs or clips is provided, a browsing system can use key-frame and/or shot features to cluster shots into classes, each of which consists of shots of similar content. After such clustering, each class of shots can be represented by an icon, which can then be displayed at the higher levels of a hierarchical browser. As a result, users can get a rough sense of the content of video shots even without moving down to a lower level of the hierarchy.
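The top-down scheme described above, partition clustering applied recursively at each level, can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it uses plain k-means as the partitioning step and invented two-dimensional "key-frame features" for the shots.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means over tuples of floats (stand-in for shot features)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k),
                      key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
            clusters[idx].append(p)
        for i, c in enumerate(clusters):
            if c:  # keep the old center if a cluster emptied out
                centers[i] = tuple(sum(xs) / len(c) for xs in zip(*c))
    return clusters

def build_hierarchy(shots, k=2, min_size=2):
    """Recursively partition shots: a node is either a leaf (list of shots)
    or a list of child subtrees, one per cluster found at this level."""
    if len(shots) <= min_size:
        return shots
    clusters = [c for c in kmeans(shots, k) if c]
    if len(clusters) < 2:
        return shots
    return [build_hierarchy(c, k, min_size) for c in clusters]

def leaves(node):
    """Collect all shots stored in a hierarchy's leaves."""
    out = []
    for x in node:
        out.extend([x] if isinstance(x, tuple) else leaves(x))
    return out

# Hypothetical shot features, e.g. mean (R, G) of each shot's key frame.
shots = [(0.10, 0.10), (0.12, 0.08), (0.90, 0.85),
         (0.88, 0.90), (0.50, 0.10), (0.52, 0.12)]
tree = build_hierarchy(shots)
```

Each internal node of `tree` corresponds to one icon in the hierarchical browser; descending a level expands that icon into its child classes.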
In this paper, we propose an algorithm based on vector quantization (VQ) for indexing video sequences in compressed form. In VQ, the image to be compressed is decomposed into L-dimensional vectors. Each vector is mapped into one of a finite set of codewords (codebook). Vectors are encoded in the intraframe mode using adaptive VQ. Each frame is represented by a set of labels and a codebook. We note that the codebook reflects the contents of the frame being compressed and similar frames have similar codebooks. The labels are used for cut detection and to generate indices to store and retrieve video sequences. The proposed technique provides fast access to the sequences in the database. In addition, this technique combines video compression and video indexing. Simulation results confirm the substantial gains of the proposed technique in comparison with other techniques reported in the literature.
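The key observation, that similar frames have similar codebooks, can be turned into a simple cut detector. The sketch below is a deliberately simplified stand-in for the paper's adaptive VQ: instead of training a codebook per frame, it coarsely quantizes each frame's block-mean vectors to a grid and treats the distinct quantized vectors as the "codebook", flagging a cut wherever consecutive codebooks share few codewords. The frame data are invented for illustration.

```python
def frame_codebook(blocks, step=32):
    """Crude stand-in for adaptive VQ codebook training: quantize each
    block-mean vector to a coarse grid and keep the distinct codewords."""
    return {tuple(int(v // step) for v in b) for b in blocks}

def codebook_distance(cb1, cb2):
    """Fraction of codewords NOT shared between two frames' codebooks."""
    union = cb1 | cb2
    return len(cb1 ^ cb2) / len(union) if union else 0.0

def detect_cuts(frames, threshold=0.5):
    """Flag a cut wherever consecutive frames have dissimilar codebooks."""
    cbs = [frame_codebook(f) for f in frames]
    return [i for i in range(1, len(cbs))
            if codebook_distance(cbs[i - 1], cbs[i]) > threshold]

# Hypothetical frames: each frame is a list of 3-D block-mean (RGB) vectors.
scene_a = [[(10, 10, 10), (40, 40, 40)]] * 3    # three similar dark frames
scene_b = [[(200, 50, 50), (220, 60, 40)]] * 2  # abrupt change of content
cuts = detect_cuts(scene_a + scene_b)           # cut detected at frame 3
```

The per-frame codebooks double as compact indices: a query frame can be matched against stored codebooks without decompressing the sequences.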
I/Browse: The Bellcore Video Library Toolkit is a set of tools for constructing and browsing libraries of digital video. The toolkit is designed to work with video libraries on local or network disks, CD-ROMs or a multimedia server. There are three main components, a preprocessor, a tagger, and a browser. Particular attention is focused on the tools and techniques we have developed to rapidly tag videos. The tagging system allows text fields, type information, and other resources to be associated with frames in the video. The tags are further organized into a hierarchical 'table of contents', which is suitable for browsing and searching.
An experimental video server for middle-scale video-on-demand services that uses a 'redundant double-layered disk array' can read out 100 MPEG-1 1.5-Mbps video streams simultaneously with a response time of under one second through an FDDI-LAN. An exclusive data method that switches between normal data and fast data and a skip-search method are used to provide fast visual search. The gateway connecting the video server LAN to a 6.312-Mbps constant bit-rate line allows broadcast services to be integrated with on-demand services. The protocol implemented in this gateway controls the visual search rate, corrects errors in downloaded data, and accelerates the playback mode changes.
Parallelism with merging (PWM) techniques for storing and retrieving continuous media combine the use of I/O parallelism and object interleaving to meet the playback requirements of continuous media. The PWM techniques previously proposed do not explicitly address support for scalable video, i.e. the ability to retrieve a video at various bandwidths or data rates. Compression techniques for video such as MPEG and 3D subband coding encode video sequences in such a way as to allow subsets of the full resolution video bit stream to be decoded to recreate lower resolution videos. This facilitates support for various quality of service levels without having to store each level separately. In this paper, we extend the PWM techniques by integrating scalable video support. We show how this integration impacts the continuity requirements, buffering requirements, and admission control criteria. Results of a simulation study are presented which compare the various scalable-PWM strategies.
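The interaction between scalable video and admission control can be made concrete with a small sketch. This is not the paper's admission criterion, only an illustrative model under invented numbers: a stream played at quality level q consumes the base-layer rate plus the enhancement layers up to q, and a new request is admitted only if the aggregate rate of all streams stays within the total bandwidth of the array of parallel disks.

```python
def stream_rate(layer_rates, level):
    """Data rate of a scalable stream at a given quality level: the base
    layer plus all enhancement layers up to and including that level."""
    return sum(layer_rates[:level + 1])

def admit(active_levels, requested_level, layer_rates, disk_bw_each, n_disks):
    """Toy admission test for a PWM-style array of n_disks parallel disks:
    admit the new stream only if the aggregate rate of all active streams
    plus the new one fits within the array's total bandwidth."""
    total = sum(stream_rate(layer_rates, lv) for lv in active_levels)
    total += stream_rate(layer_rates, requested_level)
    return total <= disk_bw_each * n_disks

# Hypothetical 3-layer stream (Mbps): base + two enhancement layers;
# full resolution is 1.5 + 1.0 + 1.5 = 4.0 Mbps.
layers = [1.5, 1.0, 1.5]
# Two full-resolution viewers active; a base-layer-only request still fits
# on four 3-Mbps disks (9.5 <= 12), so it is admitted.
ok = admit(active_levels=[2, 2], requested_level=0,
           layer_rates=layers, disk_bw_each=3.0, n_disks=4)
```

The same request at full resolution on a three-disk array (12 Mbps needed versus 9 Mbps available) would be rejected, or, with scalable support, degraded to a lower level instead of refused outright; that is the quality-of-service flexibility the integration provides.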
Multimedia interfaces increase the need for large image databases, capable of storing and reading streams of data with strict synchronicity and isochronicity requirements. In order to fulfill these requirements, we consider a parallel image server architecture which relies on arrays of intelligent disk nodes, each disk node being composed of one processor and one or more disks. This contribution analyzes through bottleneck performance evaluation and simulation the behavior of two multi-processor multi-disk architectures: a point-to-point architecture and a shared-bus architecture similar to current multiprocessor workstation architectures. We compare the two architectures on the basis of two multimedia algorithms: the compute-bound frame resizing by resampling and the data-bound disk-to-client stream transfer. The results suggest that the shared bus is a potential bottleneck despite its very high hardware throughput (400 Mbytes/s) and that an architecture with addressable local memories located close to their respective processors could partially remove this bottleneck. The point-to-point architecture is scalable and able to sustain high throughputs for simultaneous compute-bound and data-bound operations.
This paper proposes a design procedure that exploits storage devices with different cost/bandwidth and cost/capacity ratios to build a hierarchical storage system for video-on-demand with minimum cost. The storage system is assumed to have two levels of hierarchy. The level-1 storage devices feature a low cost/bandwidth ratio and a high cost/capacity ratio. On the other hand, the level-2 storage devices feature a high cost/bandwidth ratio and a low cost/capacity ratio. The proposed procedure determines, with respect to overall system cost, which level of the hierarchy each program should be placed into. Based on the decisions, the designer then can figure out the appropriate configuration of the hierarchical storage system. The optimization target is to minimize the overall cost of the storage system.
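The intuition behind the two-level placement can be sketched with a toy cost model (this is an illustration under invented cost figures, not the paper's actual procedure): charging each program for the bandwidth its peak demand requires and the capacity its size occupies, a frequently requested "hot" program is cheaper on the low cost/bandwidth level 1, while a rarely requested "cold" program is cheaper on the low cost/capacity level 2.

```python
def placement_cost(size_gb, peak_streams, rate_mbps, cost_per_mbps, cost_per_gb):
    """Cost of hosting one program on a storage level, charging for the
    bandwidth of its peak concurrent streams plus its stored capacity."""
    return cost_per_mbps * peak_streams * rate_mbps + cost_per_gb * size_gb

def assign_levels(programs, level1, level2):
    """Place each program independently on the cheaper of the two levels.
    level1 / level2 are (cost_per_mbps, cost_per_gb) pairs."""
    plan = {}
    for name, size_gb, peak_streams, rate_mbps in programs:
        c1 = placement_cost(size_gb, peak_streams, rate_mbps, *level1)
        c2 = placement_cost(size_gb, peak_streams, rate_mbps, *level2)
        plan[name] = 1 if c1 <= c2 else 2
    return plan

# Hypothetical catalog: (title, size in GB, peak concurrent streams, Mbps).
programs = [("hit-movie", 4, 200, 1.5),   # hot: bandwidth cost dominates
            ("archive-doc", 4, 1, 1.5)]   # cold: capacity cost dominates
level1 = (0.5, 100.0)   # low cost/bandwidth, high cost/capacity
level2 = (5.0, 1.0)     # high cost/bandwidth, low cost/capacity
plan = assign_levels(programs, level1, level2)
```

Summing the chosen costs over the catalog gives the overall system cost the procedure minimizes; the resulting per-level totals of bandwidth and capacity then dictate how many devices of each type the designer must configure.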