Our goal is to enable viewers to access TV programs based on their content. Towards this end, we present a system that automatically captures and processes TV news programs into a database that can be searched over the internet. Users browse this database by submitting simple English queries. The result of a query is a hyperlinked list of matching news stories. Clicking on any item in the list immediately launches a video of the pertinent part of the news broadcast. We segment TV news broadcasts into distinct news stories. We then index each story as a separate entity. In reply to a query, videos for these news stories are displayed rather than the whole TV program. News programs are usually accompanied by a transcript in closed caption text. The closed caption text contains markers for story boundaries. Due to the live nature of TV news programs, the closed caption lags the actual audio/video by varying amounts of time, up to a few seconds. The closed caption text thus has to be shifted to be aligned in time with the video. We use video and audio events to do this synchronization. The closed caption for each story is entered into a database. In response to a query, the database retrieves and ranks the matching closed caption stories. An HTML document is returned to the user which lists: 1) the name and time of the news program that the story belongs to, 2) thumbnails providing a visual summary of the story, 3) the closed caption text. To view a news story, the user simply clicks on an item from the list and the video for that story is streamed to a media player at the user side. This system maintains the manner of presentation of the medium, namely video for TV programs, while allowing the common search and selection techniques used on the web.
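The caption-to-video alignment step can be illustrated with a minimal sketch. The Python below shifts caption cues by a lag estimated from a matched audio/video event, as the abstract describes; all names are illustrative, not the system's actual interfaces:

```python
# Hedged sketch: align closed-caption cues to video by a constant lag.
# The system estimates the lag from audio/video events; here the lag is
# assumed to come from matching one caption timestamp to one detected
# event. Function and variable names are illustrative.

def estimate_lag(caption_time, event_time):
    """Lag = how far the caption trails the detected audio/video event."""
    return caption_time - event_time

def align_captions(cues, lag_seconds):
    """Shift each (start, end, text) cue earlier by the estimated lag."""
    aligned = []
    for start, end, text in cues:
        aligned.append((max(0.0, start - lag_seconds),
                        max(0.0, end - lag_seconds), text))
    return aligned
```

For example, a caption cue stamped at 5.0 s whose spoken audio event was detected at 3.2 s yields a lag of 1.8 s, and every cue in the story is shifted earlier by that amount.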
Redundant Array of Inexpensive Disks (RAID) vendors rely on multi-megabyte files and
large numbers of physical disks to achieve the high transfer rates and Input/Output
Operations Per Second (IOPS) quoted in the promotional literature. Practical image
database applications do not always deliver such large files and cannot always afford the
cost of the large numbers of disks required to match the vendors' performance claims.
Because the user is often waiting on-line to view the images, applications deployed on the
World Wide Web (WWW) are especially sensitive to keeping inline images relatively
small. For such applications, the expected performance advantages of RAID storage
may not be achieved.
The Lister Hill National Center for Biomedical Communications houses three image
datasets on a SPARCstorage Array RAID system. Applications deliver these images to
users via the Internet using the WWW and other client/server programs. Although
approximately 3% of the images exceed 1 MB in size, the average file size is less than
200 KB and approximately 60% of the files are less than 100 KB in size. A study was
undertaken to determine the configuration of the RAID system that will provide the
fastest retrieval of these image files and to discover general principles of RAID
performance. Average retrieval times with single processes and with concurrent processes
are measured and compared for several configurations of RAID levels 5 and 0+1. A few
trends have emerged showing a tradeoff between optimally configuring the RAID for a
single process or for concurrent processes.
This paper describes the Data Distribution Laboratory and discusses issues involved in building multimedia CD-ROMs. It describes the modeling philosophy for cataloging multimedia products and the world-wide web-based multimedia archive and retrieval system built on that model.
In this paper, we present an approach to implementing intelligent information retrieval systems. We have constructed a multilingual information system which combines both image and text retrieval. We have developed an English/Chinese text retrieval tool on the WWW, and later incorporated an image retrieval tool based on associated multilingual captions. The system allows the general public to locate and keep abreast of information about Singapore. It has a novel user interface which accepts queries expressed in English, Chinese, or mixed text. The titles, summaries, URLs and the matching scores of retrieved documents are then returned, and a thumbnail is displayed as well if an image document is retrieved.
In multimedia databases, one major class of user queries requires retrieving those database images that are spatially similar to a query image. To rank order the database images with respect to the query, the existing spatial similarity algorithms compute the similarity of every database image with the query. For large multimedia databases, this task is computationally expensive and renders interactive query processing difficult. In this paper, we propose an indexing scheme which eliminates images that are not relevant to a query before the actual similarity computation. In other words, the indexing scheme serves as a filter, and spatial similarity computation is done only on those images that pass through the filter. Some non-relevant images may pass through the filter, but the proposed indexing scheme guarantees that no relevant images are eliminated. The indexing scheme is robust in the sense that it recognizes translation, scaling, and rotation variant images of the query image as relevant to the query.
Image indexing, namely, the problem of retrieving content information from images in response to queries, is a key problem underlying several operations in image databases. Indexing for object queries, in particular, is a difficult problem, as it requires localizing an unanticipated object in unsegmented images. This inevitably involves search, a computationally intensive operation when based entirely on image features. It is desirable to have efficient data structures that avoid the need for sequential search through images and their features for query localization. Conventional data structures used for database organization are not adequate for image indexing, where the object query has to be located in images depicting changed imaging conditions that include pose changes and occlusions. In this paper, we explore the use of a geometric hash table as a suitable data structure for fast image indexing. The technique of geometric hashing has been used in computer vision for indexing a library of models to find candidate model objects for recognition in an isolated image region. Here, however, we use geometric hashing as a technique for fast query localization in unsegmented images of a database. Specifically, we show that by using three consecutive features along a curve as basis points for affine invariance, a hash table can be constructed for images that is quadratic in the number of features. The resulting indexing method is also quadratic in the number of features. The query localization by geometric hashing is demonstrated for the problem of indexing of handwritten documents based on handwriting pattern queries.
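The hashing scheme described above can be sketched in a few dozen lines. The sketch below uses three consecutive curve features as an affine basis and hashes the quantized affine coordinates of the remaining features; the quantization step, voting scheme, and all names are illustrative assumptions, not the paper's implementation:

```python
from collections import defaultdict

def affine_coords(p0, p1, p2, q):
    """Coordinates of point q in the affine basis defined by (p0, p1, p2).
    These coordinates are invariant under any affine transform applied
    to all four points. Returns None for a degenerate (collinear) basis."""
    e1 = (p1[0] - p0[0], p1[1] - p0[1])
    e2 = (p2[0] - p0[0], p2[1] - p0[1])
    det = e1[0] * e2[1] - e1[1] * e2[0]
    if abs(det) < 1e-9:
        return None
    dx, dy = q[0] - p0[0], q[1] - p0[1]
    a = (dx * e2[1] - e2[0] * dy) / det
    b = (e1[0] * dy - e1[1] * dx) / det
    return a, b

def build_hash_table(images, step=0.25):
    """images: {image_id: list of curve features as (x, y) points}.
    For every triple of consecutive features (the basis), hash the
    quantized affine coordinates of the remaining features. With n
    features there are O(n) bases and O(n) entries per basis, giving
    the quadratic table size the abstract mentions."""
    table = defaultdict(list)
    for img_id, pts in images.items():
        for i in range(len(pts) - 2):
            basis = (pts[i], pts[i + 1], pts[i + 2])
            for j, q in enumerate(pts):
                if i <= j <= i + 2:
                    continue
                c = affine_coords(*basis, q)
                if c is None:
                    continue
                key = (round(c[0] / step), round(c[1] / step))
                table[key].append((img_id, i))
    return table

def query(table, pts, step=0.25):
    """Vote for (image, basis) entries sharing invariant coordinates."""
    votes = defaultdict(int)
    for i in range(len(pts) - 2):
        basis = (pts[i], pts[i + 1], pts[i + 2])
        for j, q in enumerate(pts):
            if i <= j <= i + 2:
                continue
            c = affine_coords(*basis, q)
            if c is None:
                continue
            key = (round(c[0] / step), round(c[1] / step))
            for entry in table.get(key, []):
                votes[entry] += 1
    return votes
```

Because the coordinates are affine invariants, an affinely transformed copy of an indexed curve votes for the original image without any search through the database.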
In this work, we investigate the nearly lossless image compression technique, which provides a better compression ratio than purely lossless compression schemes and has a better reconstructed image quality than lossy ones. In particular, we introduce a new idea called the soft decision quantization and integrate it with the binary arithmetic QM coder. The superior performance of the developed algorithm is demonstrated with numerical experiments.
Fast and efficient storage, indexing, browsing, and retrieval of video is a necessity for the development of various multimedia database applications. This can be achieved by analyzing the video directly in the compressed domain, thereby avoiding the overhead of decompressing video into individual frames in the pixel domain. Our compressed domain parsing of video performs shot change detection and motion detection using the data readily accessible from MPEG, with minimal decoding. Key frames are identified and are used for indexing, retrieval, and browsing. In this paper, we describe feature extraction and key frame indexing and retrieval techniques that are directly applicable to compressed video. The features are derived from the available DCT, macroblock, and motion vector information, and the techniques enable fast parsing and archiving of video.
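One common compressed-domain heuristic of this kind compares histograms of the per-block DC coefficients (a coarse "DC image") between consecutive frames. The abstract does not specify the exact detector, so the sketch below is only a hedged illustration of the idea; thresholds and names are assumptions:

```python
def dc_histogram(dc_values, bins=8, max_val=256):
    """Normalized histogram of per-block DC luminance terms, which are
    readable from MPEG intra-coded data with minimal decoding."""
    hist = [0] * bins
    width = max_val / bins
    for v in dc_values:
        hist[min(bins - 1, int(v / width))] += 1
    total = len(dc_values) or 1
    return [h / total for h in hist]

def detect_shot_changes(frames_dc, threshold=0.5):
    """Flag frame i as a shot change when the L1 distance between
    consecutive DC histograms exceeds the threshold."""
    cuts = []
    prev = dc_histogram(frames_dc[0])
    for i in range(1, len(frames_dc)):
        cur = dc_histogram(frames_dc[i])
        if sum(abs(a - b) for a, b in zip(prev, cur)) > threshold:
            cuts.append(i)
        prev = cur
    return cuts
```

Because only DC terms are touched, the detector never performs a full inverse DCT, which is the source of the speed advantage the abstract claims for compressed-domain parsing.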
Current image retrieval systems have many important limitations. Many are specialized for a particular domain of images, and are not applicable to other image domains. The more general systems treat all images uniformly. Consequently, the power of their query facility is limited to color, texture, shape, and other features that are common to all images, with no deeper understanding of the structure of a given image. Few systems have addressed the issue of scalability with respect to the size of the image collection and with respect to the underlying techniques. There are two communities that can contribute to image databases: computer vision and database systems. In this paper we focus on the database side of the issue. We consider how to design a database system that supports a rich class of content-based queries on image collections, scales with collection size, and easily incorporates future advances in computer vision. This paper outlines one approach, in the form of the design, implementation and testing of an image database system called PIQ.
Image retrieval based on JPEG compressed data is investigated in this research. Two low level features are extracted in the DCT domain to facilitate content-based retrieval. They are tree-structured color representations of the DC coefficients of Y, Cb, and Cr block arrays and the energy distribution of AC coefficients to denote different textured patterns. Multiresolution indexing schemes are employed to achieve fast retrieval with both color and texture features. Finally, the retrieval performance is improved by a new method which combines both color and energy features.
We examine the problem of finding similar tumor shapes. The main contribution of this work is the proposal of a natural (dis-)similarity function for shape matching called the 'morphological distance'. This function has two desirable properties: a) it matches human perception of similarity, as we illustrate with precision/recall experiments; b) it can be lower-bounded by a set of features, leading to fast indexing for range queries and nearest neighbor queries. We use state-of-the-art methods from morphology both in defining our distance function and for feature extraction. In particular, we use the 'size-distribution', related to the 'pattern spectrum', to extract features from shapes. Following Jagadish and Faloutsos et al., we organize the n-d feature points in a spatial access method. We show that any Lp norm in the n-d space lower-bounds the morphological distance. This guarantees no false dismissals for range queries. In addition, we present a nearest neighbor algorithm that also guarantees no false dismissals. We implemented the method and tested it against a testbed of realistic tumor shapes generated by an established tumor-growth model. The response time of our method is up to 27 times faster than sequential scanning. Moreover, precision/recall experiments show that the proposed distance captures very well the dissimilarity as perceived by humans.
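The no-false-dismissal guarantee follows directly from the lower-bounding property, and the filtering step it enables can be sketched compactly. In the sketch below, `exact_distance` stands in for the expensive morphological distance and the feature vectors stand in for the size-distribution features; both are placeholders, not the paper's code:

```python
def lp_distance(u, v, p=2):
    """Lp norm between two feature vectors."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

def range_query(query_feat, db, epsilon, exact_distance, p=2):
    """db: {shape_id: (feature_vector, shape_data)}.
    Because the Lp feature distance lower-bounds the (expensive)
    morphological distance, any shape whose Lp distance already
    exceeds epsilon can be discarded without risk of a false
    dismissal; only the survivors are verified exactly."""
    results = []
    for sid, (feat, shape) in db.items():
        if lp_distance(query_feat, feat, p) > epsilon:
            continue                    # safe prune: lower bound exceeded
        if exact_distance(shape) <= epsilon:
            results.append(sid)
    return results
```

False alarms (shapes that pass the filter but fail the exact test) cost extra computation but never harm correctness, which is exactly the asymmetry the abstract relies on.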
In this work, we study the relationship between content-based image retrieval and pattern recognition by modeling the image retrieval process in a probabilistic manner. A model called the random image database will be presented, together with a retrieval quality measure called the probability of self-similarity, which enables us to establish the link between image retrieval and pattern recognition. The main result is that such a quality measure is uniformly upper-bounded by its pattern recognition counterpart using the nearest neighbor rule, when only one training sample is available for each class. Therefore, a feature measure having better performance in the one-training-sample-per-class case should be favored over features doing well in large training sample situations.
It is very hard to realize a general purpose, domain independent content-based image retrieval system. To implement a CBIR system rapidly for some specific purpose, tools that support application development are needed. The objective of this study has been to design a framework for content-based image retrieval application development. With this environment it is fast and easy to implement prototype systems and test their performance and usability. In order to speed up experiments, a complete computational chain has been implemented in such a way that rapid changes are possible. This has been realized utilizing atomic, re-usable software components. The designed framework permits the development of new CBIR systems by utilizing existing components and by building new standardized components.
Content-based indexing of images and videos based on texture features is a powerful mechanism to retrieve images and video scenes. However, the feature extraction process from these images and videos is time consuming and is not suitable for interactive query processing. A progressive texture extraction and matching algorithm is proposed and evaluated in this paper. This algorithm takes advantage of multi-resolution analysis: starting at a resolution lower than the full resolution of an image or video, the proposed algorithm performs feature extraction and matching hierarchically. Only those regions matched to the target template at a lower resolution level will be further compared at a higher resolution. The computation speed of this algorithm is shown to be significantly improved over conventional algorithms while maintaining the same accuracy.
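The coarse-to-fine pruning described above can be illustrated with a minimal sketch. Average pooling, mean absolute difference, and the two tolerances below are simplifying assumptions standing in for the paper's texture features and matching criterion:

```python
def downsample(img, factor):
    """Average-pool a 2-D list image by the given factor."""
    h, w = len(img) // factor, len(img[0]) // factor
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for di in range(factor):
                for dj in range(factor):
                    s += img[i * factor + di][j * factor + dj]
            out[i][j] = s / (factor * factor)
    return out

def mean_abs_diff(a, b):
    n = len(a) * len(a[0])
    return sum(abs(x - y)
               for ra, rb in zip(a, b)
               for x, y in zip(ra, rb)) / n

def progressive_match(regions, template, factor=2,
                      coarse_tol=20.0, fine_tol=5.0):
    """Compare every region to the template at low resolution first;
    only regions passing the cheap coarse test are compared at full
    resolution, mirroring the hierarchical matching in the abstract."""
    tmpl_lo = downsample(template, factor)
    matches = []
    for rid, region in regions.items():
        if mean_abs_diff(downsample(region, factor), tmpl_lo) > coarse_tol:
            continue                    # rejected cheaply at low resolution
        if mean_abs_diff(region, template) <= fine_tol:
            matches.append(rid)
    return matches
```

The speedup comes from the fact that most regions are rejected at the low resolution, where each comparison touches a factor² fewer values.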
Classification of farm animal behaviors is based upon oral or written descriptions of the activity that an animal is engaged in. The quantification of animal behavior for research requires that an individual recognize and code the behavior of the animal under study. The classification of these behaviors can be subjective and may differ among observers. Illustrated guides to animal behavior do not convey the motion associated with most behaviors. Video-based guides offer a method of quantifying behaviors with real-time demonstrations of the components that make up a behavior. In this paper, we propose an animal behavior video database system which can automatically extract animal motion information from an input animal activity video clip by a multi-object tracking and reasoning system. The extracted information is then analyzed and described using a set of standard animal behavior terms we are developing. The behavior description is used to automatically annotate the given video clip, and serves as the content-based index. The user of the system is able to use a keyword description of the behavior to retrieve the corresponding video object. The intended applications of the system are animal and veterinary science education, and animal behavior research. The prototype system is built for swine and will be extended to other farm animal species.
In this paper, we first describe a structural compression technique which has been designed to facilitate document text image storage, retrieval, and processing. This technique provides an efficient representation of textual images and lends itself to lossy compression, progressive transmission, direct access to sub-regions of the document, and document processing in the compressed domain. We describe a data structure which can be used to efficiently store the compressed information, provide algorithms for creating and manipulating it, and present results of document processing on images compressed from the University of Washington database.
We introduce a pattern matching algorithm and a bitmap reconstruction method used in document image compression. This pattern matching algorithm uses the cross entropy between two patterns as the criterion for a match. We use a physical model, based on the finite resolution of the scanner, to estimate the probability values used in the cross entropy calculation. The matching algorithm is enhanced by the bitmap reconstruction method, which infers a good high resolution image from a series of poor low resolution images. This bitmap reconstruction method is based on the naive image averaging method, but it uses a preprocessing smoothing filter and operates at a higher resolution than the original image. Experimental results show that this pattern matching algorithm and this bitmap reconstruction method compare favorably to previous techniques.
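A cross-entropy match criterion of this kind can be sketched with a much cruder noise model than the scanner model the abstract describes. In the sketch below, a constant per-pixel flip probability `p_flip` is a stand-in assumption for the probabilities the physical model would supply:

```python
import math

def cross_entropy(template, candidate, p_flip=0.1):
    """Average cross entropy (bits per pixel) of the candidate bitmap
    under a simple noise model: each template pixel is assumed to flip
    independently with probability p_flip. Lower is a better match.
    (The paper derives these probabilities from a scanner model; the
    constant p_flip here is only an illustrative stand-in.)"""
    total, n = 0.0, 0
    for row_t, row_c in zip(template, candidate):
        for t, c in zip(row_t, row_c):
            q = (1.0 - p_flip) if c == t else p_flip
            total -= math.log2(q)
            n += 1
    return total / n

def is_match(template, candidate, threshold=0.5):
    """Declare a match when the cross entropy is below a threshold."""
    return cross_entropy(template, candidate) <= threshold
```

Identical bitmaps score about 0.15 bits per pixel under this model, while a fully inverted bitmap scores over 3 bits per pixel, so a mid-range threshold separates them cleanly.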
Due to the increasing complexity of applications in the field of multimedia information systems, new querying techniques need to be studied in order to better satisfy user needs. Our aim is to define a general model to facilitate retrieval of multimedia data in the context of object-oriented databases. With this model, the user expresses his preferences, and the answers to a query are then presented from the most pertinent to the least. We show that systems which manage temporal aspects of multimedia data need such a model, and we propose that the definition of 'fuzzy' temporal predicates be integrated into the general model.
Research on computer-based recognition of violence is scant. We are working on the automatic recognition of violence in digital movies, a first step towards the goal of a computer-assisted system capable of protecting children against TV programs containing a great deal of violence. In the video domain, a collision detection and a model-mapping to locate human figures are run, while the creation and comparison of fingerprints to find certain events are run in the audio domain. This article centers on the recognition of fist-fights in the video domain and on the recognition of shots, explosions and cries in the audio domain.
In this paper we present a geometry-based indexing method for the semantic retrieval of large video databases. It combines two separate modules, i.e., 3D object shape inferencing from a video sequence and geometric modeling from the reconstructed shape, to achieve better performance. First, a motion-based segmentation algorithm employing feature block tracking and hierarchical principal component split is used for multi-moving-object motion classification and segmentation. After segmentation, feature blocks for an individual moving scene or object can be used to reconstruct the 3D motion and shape structure of this scene or object by a factorization method. We assume the object is rigid and relatively far away from the camera so that perspective distortion can be ignored. The estimated shape structure and motion parameters are then used to generate the implicit polynomial (IP) representation of the object. The system starts with a very coarse representation of the 3D shape. When more frames are available from the video stream and are properly segmented and classified, the IP representation changes accordingly by varying the coefficients of the implicit polynomial to minimize the estimation error. This process stops when enough information is obtained to generate a reliable IP shape representation or when the video stream runs out. The semantic retrieval of the video databases is achieved by using the geometric structure of the objects and their spatial relationships. We generalize the 2D string concept to 3D to compactly encode the spatial relationships among objects. The algebraic invariants of the implicit polynomial are used as the geometric feature vector for the object. A similarity value can be computed for two sets of objects or two video sequences to allow fast retrieval from video databases.
In this paper, we present a new method for video sequence segmentation which can be used in video indexing applications. Our approach uses the image content as indices for segmentation: as in most video sequences, the images contain 3D hints. In order to detect these indices efficiently, we develop a two-step Hough transformation (HT). The first HT tries to find all lines contained in a video image. The second one, according to the theory of projective geometry, gives the possible focus of expansion (FOE) point. Once we have all possible FOE positions, a simple comparison of these positions reveals the differences between video sequences. This method is robust not only for images of well structured objects in the scene, such as buildings, roads and other man-made entities, but also for scenes containing flower fields or other aligned natural objects. The results of the approach are shown in the final part of the paper.
This paper describes our experiences in video analysis for a video library on the World Wide Web. News and documentary programs, though seemingly simple, have some characteristics which can cause problems in simple shot segmentation algorithms. We have developed a methodology, based on our experience with the analysis of several hours of news/documentary footage, which improves the results of shot segmentation on this type of material and which in turn allows for higher-quality storyboards for our video library.
A video is a multimedia document which is structured in scenes and shots. Scenes are lists of consecutive shots characterized by common visual and audio features. Shots are sets of consecutive frames separated by cuts, which can be easily recognized by existing techniques. Video segmentation into scenes is a new and open problem. It is needed for scene retrieval, especially in authoring and interactive video applications. We propose a new approach to video segmentation into scenes, which is based on several media and takes into account the film syntax. We characterize a scene by some similarity between the color histograms of the current shot and of one of the most recent previous shots. Similarity between a shot frame and a frame of a previous shot may indicate the presence of alternate shots, which belong to the same scene. Other techniques based on projective geometry are presented in a companion paper. These techniques enable detection of camera movement. We recognize the speakers of a scene by AR vector model techniques, such as the one proposed by some of the authors in the Orphee system, implemented at Laforia. However, the speaker recognition problem is much more difficult when applied to video CD-I, due to several transition types and various types of noise. We present experimental results based on this approach. Detection of alternate shots is efficient, but speaker recognition needs improvement.
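The grouping rule based on similarity to recent shots can be sketched as follows. Histogram intersection, the window size, and the threshold are illustrative assumptions, not the authors' exact choices:

```python
def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two normalized color histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def group_into_scenes(shot_histograms, window=3, threshold=0.7):
    """Assign each shot a scene id. A shot joins the scene of the most
    similar shot among the `window` previous shots when the similarity
    exceeds the threshold (so alternate shots, e.g. A-B-A dialogue
    cutting, fall in the same scene); otherwise it opens a new scene."""
    scene_ids = []
    next_scene = 0
    for i, h in enumerate(shot_histograms):
        best, best_j = 0.0, -1
        for j in range(max(0, i - window), i):
            s = histogram_intersection(h, shot_histograms[j])
            if s > best:
                best, best_j = s, j
        if best >= threshold:
            scene_ids.append(scene_ids[best_j])
        else:
            scene_ids.append(next_scene)
            next_scene += 1
    return scene_ids
```

On an A-B-A-C shot pattern this rule merges the two A shots into one scene while B and C each open their own, which is the alternate-shot behavior the abstract describes.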
In this paper we present the most recent evolution of JACOB, a system we developed for image and video content-based storage and retrieval. The system is based on two separate archives: a 'features DB' and a 'raw-data DB'. When a user submits a query, a search is done in the 'features DB'; the selected items are taken from the 'raw-data DB' and shown to the user. Two kinds of sessions are allowed: 'database population' and 'database querying'. During a 'database population' session the user inserts new data into the archive. The input data can consist of digital images or videos. Videos are split into shots, and for each shot one or more representative frames are automatically extracted. Shots and r-frames are then characterized, either in an automatic or a semi-automatic way, and stored in the archives. Automatic feature extraction consists of computing some low-level global features. Semi-automatic feature extraction is done by using annotation tools that perform operations that are not currently possible with fully automatic methods. To this aim, semi-automatic motion-based segmentation and labeling tools have been developed. During a 'database querying' session, direct queries and queries by example are allowed. Queries may be iterated and variously combined to satisfy the query in the smallest number of steps. Multifeature querying is based on statistical analysis of the feature space.
In information system servers, the information access task can be carried out by means of what we call an 'information engine'. The information engine composes interactive multimedia documents making use of storage and retrieval services. Therefore, it is responsible for translating each particular data structure into a common data representation, providing access to the storage systems independently of their internal per-site structure. The information engine is composed of the document engine and the storage and retrieval services. The former manages data browsing, composition, and formatting, while the latter provides the gateway to the different systems, offering a common database abstraction to the document engine.
A video-on-demand system in a distributed environment relies on a video server that simultaneously provides video services to multiple clients and guarantees the quality of service (QoS) for each client. Because video data is categorized as continuous media, which has an implied dynamic temporal dimension, resource management of a video server has to be specially designed to meet new requirements. In this paper we develop the policy and the mechanism of resource management for implementing an MPEG-based video server that supports common VCR functionality. Our goal is to maximize the number of video streams to clients, while maintaining the quality of service of each video stream, under the limitation of system resources available on a workstation. We first define the policy and the QoS parameters for different video services. Then, we describe the mechanisms of admission control and resource control to efficiently implement the policy and guarantee the QoS for video services. A dynamic approach of resource reservation dealing with state change is also presented. The major contribution of this paper is to present a framework that integrates CPU/disk/network scheduling and memory management for video services. Under such a framework, we can manage system resources and optimize individual resources systematically.
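The admission-control idea reduces to a simple invariant: never reserve more than the server can deliver. The sketch below checks only one aggregate bandwidth budget; a real server of the kind the abstract describes would check CPU, memory, disk rounds, and network capacity separately, so this is an illustration, not the paper's mechanism:

```python
class AdmissionController:
    """Minimal sketch of bandwidth-based admission control: a new
    stream is admitted only if the aggregate reserved rate stays
    within the server's capacity, which preserves the QoS of the
    streams already admitted."""

    def __init__(self, capacity_mbps):
        self.capacity = capacity_mbps
        self.reserved = {}              # stream_id -> reserved rate

    def admit(self, stream_id, rate_mbps):
        """Admit the stream only if the budget allows it."""
        if sum(self.reserved.values()) + rate_mbps > self.capacity:
            return False                # would violate existing QoS
        self.reserved[stream_id] = rate_mbps
        return True

    def release(self, stream_id):
        """Return the stream's reservation to the pool."""
        self.reserved.pop(stream_id, None)
```

Rejecting at admission time, rather than degrading all streams after the fact, is what lets the server guarantee each admitted stream's QoS.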
High performance servers and high speed networks will form the backbone of the infrastructure required for distributed multimedia information systems. A server for an interactive distributed multimedia system may require thousands of gigabytes of storage space and high I/O bandwidth. In order to maximize system utilization, and thus minimize cost, it is essential that the load be balanced among each of the server's components, viz. the disks, the interconnection network and the scheduler. Many algorithms for maximizing retrieval capacity from the storage system have been proposed in the literature. This paper presents techniques for improving server capacity by assigning media requests to the nodes of a server so as to balance the load on the interconnection network and the scheduling nodes. Five policies for request assignment are developed: round robin, minimum link allocation, minimum contention allocation, weighted minimum link allocation and weighted minimum contention allocation. We also consider the issue of file replication, and develop two schemes for storing the replicas. The performance of these policies on a server model developed earlier is presented.
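The contrast between a static policy and a load-aware one can be sketched briefly. The `minimum_load` function below is a simplified greedy stand-in for the minimum-link/contention family of policies, not the paper's exact algorithms; `cost` is an assumed per-request bandwidth estimate:

```python
from itertools import cycle

def round_robin(requests, nodes):
    """Assign requests to scheduling nodes in fixed rotation,
    ignoring how much load each request actually generates."""
    rr = cycle(nodes)
    return {req: next(rr) for req in requests}

def minimum_load(requests, nodes, cost):
    """Greedy load-aware stand-in for the minimum-contention policies:
    send each request to the node with the least accumulated load,
    where cost[req] approximates the bandwidth the request consumes."""
    load = {n: 0.0 for n in nodes}
    assignment = {}
    for req in requests:
        node = min(load, key=load.get)  # least-loaded node so far
        assignment[req] = node
        load[node] += cost[req]
    return assignment
```

With uneven request costs, round robin can pile several heavy requests onto one node while a load-aware policy spreads them, which is the imbalance the weighted policies in the paper are designed to avoid.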
We consider the problem of data retrieval from disk storage in a video server where the video data are read in constant size blocks. Retrieval algorithms of this type are referred to as constant data length (CDL) retrieval algorithms. We recently introduced a new retrieval algorithm called GCDL that generalizes the CDL retrieval algorithm: GCDL reads, for a video stream i, a constant size block from the disk during k·m_i (k ∈ ℕ) consecutive disk rounds, which may result in a large read-ahead requiring a large amount of buffer. In this paper, we propose two new retrieval algorithms, called static and dynamic GCDLb, that minimize the number of reads during consecutive disk rounds while still maintaining continuous delivery to the client. Compared to GCDL, we show that GCDLb requires less buffer per client and can admit more clients.
The main motivation for disk arrays is the opportunity to increase data parallelism to satisfy the escalating demands of a large class of applications such as multimedia, which is characterized as a real-time IO-intensive application. However, traditional disk arrays suffer from contention in several components: memory, bus, disk controllers and processing power. This contention degrades performance and causes delivery system bottlenecks. We propose MP-RAID: a parallel architecture for redundant arrays of inexpensive disks (RAID) which extends data parallelism and introduces control parallelism to disk arrays. MP-RAID is a transputer-based multiple parallel RAID that employs data parallelism on two levels. The lower level has multiple disks grouped in a single parity group and operated simultaneously. The higher level connects multiple decentralized RAID modules via a high speed interconnect network with multiple I/O paths. Control parallelism can be achieved by either of these operating modes: SCMS (single controller multiple servers) or MCMS (multiple controller multiple servers). In SCMS parallel operation mode, requests are queued in the main array controller unit (ACU). The ACU distributes requests among modules and establishes one or more links with host applications. It instructs one or more modules to serve a single large request or multiple small requests. In MCMS mode, each storage module receives requests directly, acting as an independent ACU.
The problem of multimedia document retrieval is investigated. The representation of document features and queries is discussed. It is then shown how the feature descriptions and the query description can be matched and aggregated to obtain an overall document score. Finally, we look at the problem of fusion of structured objects, a problem often arising in image databases.
We propose an approach to the design of a holographic disk memory which can be applied to multimedia and video archiving. This approach consists of the synthesis of a 3D memory by overlapping 1D expanded partial holograms on a moving carrier (disk) and the use of collinear optical heterodyning for data readout. The general architecture of such a system is presented. Results of simulation experiments on the synthesis of a hologram of a 2D image and on the recording and reproduction of wideband RF signals in the acousto-optical scheme are demonstrated.
The Alexandria Digital Library Project has been tasked with the goal of building a digital geographical library. To meet these requirements we have designed and prototyped an intelligent data store to store its holdings; the library's map, image, and geographical data are viewed as a collection of distributed objects. Developed using the Java language, the Data Store was designed to address the problems associated with digital library storage: interoperability, extensibility, distribution, and elimination of server bottlenecks.
The creation of training, expert and similar systems, using archives of complex color images, is a dream of many specialists in various subject matter areas, including medicine. However, the problem of high-quality metrological input of complex color images into a computer has not yet been solved. This problem is now being approached mainly in two directions: 1) creation of black-and-white high-resolution hardware that photographs an object alternately through R, G, and B filters and subsequently superposes the three images on each other; 2) creation of high-resolution color hardware. At the same time, in inputting static objects, one more possibility exists, namely, successive input of separate fragments of a complex image with subsequent mathematical joining of the fragments in a computer. Here, the effective resolution of the hardware increases by a factor roughly proportional to the number of fragments input, excluding the common zones required for high-quality joining. The present work is dedicated to the practical realization of the latter approach as applied to the input of images of cytological preparations.
In this paper we model each 2D image as a generalized extended pseudo-symbolic picture based on the absolute spatial relationships in the image. Each generalized extended pseudo-symbolic picture can then be represented by a GEP-2D string. We combine GEP-2D strings and usual 2D strings into generalized combined 2D strings to represent 2D images in order to capture both the absolute and relative spatial relationships in the image. Then we address how to maintain the complete information about the absolute spatial relationships in the image. Picture retrieval by generalized combined 2D strings is discussed.
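The classic 2D-string component that this representation builds on can be sketched briefly. The sketch below encodes only the standard 2D string ('<' for strictly increasing coordinate, '=' for equal coordinate) and keeps absolute coordinates alongside it; the GEP-2D string and the generalized combined string are the paper's own constructs and are not reproduced here:

```python
def two_d_string(objects, axis):
    """Classic 2-D string along one axis: '<' separates objects with
    strictly increasing coordinate, '=' joins objects at the same
    coordinate. objects: {symbol: (x, y)}; axis: 0 for x, 1 for y."""
    items = sorted(objects.items(), key=lambda kv: kv[1][axis])
    parts, prev = [], None
    for sym, pos in items:
        if prev is None:
            parts.append(sym)
        elif pos[axis] == prev:
            parts.append("=" + sym)
        else:
            parts.append("<" + sym)
        prev = pos[axis]
    return "".join(parts)

def combined_representation(objects):
    """Keep the two relational strings together with the raw
    coordinates, mirroring (in spirit only) the idea of combining
    relative and absolute spatial information."""
    return {"u": two_d_string(objects, 0),
            "v": two_d_string(objects, 1),
            "absolute": dict(objects)}
```

For three objects a(0,0), b(1,0), c(1,2) this yields u-string "a<b=c" and v-string "a=b<c", from which the relative spatial layout can be recovered without the original image.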