This paper presents a method for recognizing human posture from a single image. We first segment an image into homogeneous regions and extract curve segments corresponding to human body parts. Each body part is considered as a 2D ribbon. From the smooth curve segments in skin regions, 2D ribbons are extracted and a human body model is constructed. We assign a predefined posture type to the image according to the constructed body model. When a user submits a query to retrieve images containing a human in a specific posture, the system converts the query to a body model. This body model is compared with the body models stored for the target images, and images with good matches are retrieved. When a face detection result is available for the given image, it is also used to increase the reliability of the body model. For the query human posture, our system retrieves images of the corresponding posture. As another application, the proposed method provides an initial location of a human body to track in a video sequence.
A content-based image retrieval system framework using human-level features of image contents is presented. The system consists of face detection, which locates human faces within images, and face recognition, which measures the similarity between a query face and the indexed faces in storage. The face detection and recognition techniques used in the system are kept very simple because short response time is essential in interactive multimedia systems. Color and elongated Gaussian filter responses are used for face detection, and simple structural and normalized template matching are used for face similarity. Simulation results are summarized using restricted test images.
In this paper, we present an approach to effectively access desired images in a large image database. Similarity-based image retrieval and image classification are representative methods for accessing images in an image database. With similarity-based image retrieval, it is difficult to specify a proper query image. Image classification cannot achieve the high classification accuracy required by users. We present methods for hierarchical classification and visualization. In addition, we present an image access system which integrates image retrieval with the classification and the visualization in order to overcome the issues mentioned above and improve access efficiency. The hierarchical classification is dynamically structured according to users' needs, and the image feature space visualization is generated in real time to enable users to interactively browse images. We also show the usability of the image access system that integrates image retrieval with the classification and visualization methods.
In this paper, we propose a new scalable simultaneous learning and indexing technique for efficient content-based retrieval of images that can be described by high-dimensional feature vectors. This scheme combines the elements of an efficient nearest neighbor search algorithm and a relevance feedback learning algorithm, which refines the raw feature space to the specific subjective needs of each new application, around a commonly shared compact indexing structure based on recursive clustering. Consequently, much better time efficiency and scalability can be achieved compared to techniques that do not make provisions for efficient indexing or fast learning steps. After an overview of the current related literature, and a presentation of our objectives and foundations, we describe in detail the three aspects of our technique: learning, indexing and similarity search. We conclude with an analysis of the objectives met, and an outline of the current work and considered future enhancements and variations on this technique.
We have previously introduced a Bayesian framework for content-based image retrieval that relies on a generative model for feature representation based on embedded mixtures. This is a truly generic image representation that can jointly model color and texture and has been shown to perform well across a broad spectrum of image databases. In this paper, we expand the Bayesian framework along two directions.
This paper presents work toward indexing the image content in a collection of 17,000 cervical spine and lumbar spine images, for purposes of public dissemination by systems such as the Web-based Medical Information System. These images were collected as part of a national health survey, and to date no radiological or quantitative content has been derived from the images, except for our work, described in this paper. Practical considerations, primarily of labor cost, make the job of deriving radiological interpretations or quantitative anatomical measures by manual methods very difficult. For this reason, the acquisition of content information by automated means, or even by semi-automated means that require human interaction but significantly reduce the required labor, is very important. This field is not in an advanced state of development, and the results we present are necessarily work in progress.
It has become increasingly important for multimedia databases to provide capabilities for content-based retrieval of multi-modal data at multiple abstraction levels for various decision support applications. These decision support applications commonly require the evaluation of fuzzy spatial or temporal Cartesian product of objects that have been retrieved based on their similarity to the target object in terms of color, shape or texture features.
Similarity indexing is the supporting technology for fast content-based retrieval of large media databases, and many similarity index structures have been proposed. Compared with the many structures available, less attention has been paid to performance evaluation of index structures and theoretical analysis of the factors influencing index performance. In this paper, we attempt to solve part of the problem and focus our research on analyzing the influence of data splitting methods. To give a formal definition for index structure performance evaluation, we introduce the query distribution probability concept and propose using average search cost to evaluate the performance of a similarity indexing structure. We choose the simplest case of similarity indexing, nearest-neighbor search, for our discussion and deduce an expression for the average search cost function. Based on analysis of the expression, we propose some criteria that may be useful in index design and implementation. Then we extend these conclusions to the general similarity indexing case and use these criteria as general rules in index design and implementation. Basic thoughts and analysis are detailed, as well as experimental results.
Shape is a popular feature used for content-based image retrieval. In this paper we propose a new method for image retrieval using a shape boundary represented in scale-space. The proposed method is suggested by the notion of 'dynamic shape' where all 2D boundary representations evolve from a single, primeval, featureless shape - a circle. Shape is represented by linearizing the boundary based on the polar coordinates of boundary points relative to the object's centroid. Points on the shape boundary are mapped to a primeval circle, and two functions are defined, the Radius Difference Function and the Angle Difference Function, and smoothed through scale-space to devolve the shape. Maxima and minima of the Radius Difference Function are extracted and used to calculate similarity between objects. Similarity is calculated using Euclidean distance. Other scale-space approaches to shape representation use various techniques to maintain constant boundary arc length, which may otherwise change in non-intuitive ways over scale. We introduce the contour stability over scale property, stating that the perceived boundary length should not change significantly over scale. Experiments show that significant similarity computation may be saved by using coarser scales without effectively reducing retrieval performance.
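The polar representation described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact definition of the Radius Difference Function, so the code below assumes it is the distance of each boundary point from the centroid minus the radius of a reference circle, and assumes that radius is the mean boundary radius. The 3-tap circular filter is only a stand-in for one step of scale-space smoothing, and the Angle Difference Function is not shown. All function names are hypothetical.

```python
import math


def radius_difference(boundary):
    """Radius of each boundary point minus the reference circle radius.

    Assumption: the reference ("primeval") circle is centred on the
    centroid of the boundary points and its radius is the mean radius.
    """
    n = len(boundary)
    cx = sum(x for x, _ in boundary) / n
    cy = sum(y for _, y in boundary) / n
    radii = [math.hypot(x - cx, y - cy) for x, y in boundary]
    circle_radius = sum(radii) / n
    return [r - circle_radius for r in radii]


def smooth_circular(values):
    """One pass of a [1/4, 1/2, 1/4] filter on a closed (circular) sequence.

    Stand-in for a single coarsening step in scale-space.
    """
    n = len(values)
    return [
        0.25 * values[i - 1] + 0.5 * values[i] + 0.25 * values[(i + 1) % n]
        for i in range(n)
    ]


def extrema(values):
    """Indices of strict local maxima and minima on a circular sequence."""
    n = len(values)
    maxima, minima = [], []
    for i in range(n):
        prev, cur, nxt = values[i - 1], values[i], values[(i + 1) % n]
        if cur > prev and cur > nxt:
            maxima.append(i)
        elif cur < prev and cur < nxt:
            minima.append(i)
    return maxima, minima


# A square sampled at its corners and edge midpoints, in boundary order.
square = [(1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0)]
rdf = radius_difference(square)
corner_indices, midpoint_indices = extrema(rdf)
```

For the square, the corners come out as maxima and the edge midpoints as minima. Smoothing shrinks these deviations, which matches the idea of the shape devolving toward a circle at coarser scales.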
In this paper we present a technique for shape similarity estimation for content-based indexing and retrieval over large image databases. High curvature points are detected using wavelet decomposition. The feature set is extracted under the framework of polygonal approximation and uses simple features extracted at the high curvature points. The experimental results and comparisons show the performance of the proposed technique. The technique can also be extended to the retrieval of 3D objects.
Most current image processing systems work on color images, and color is a precious perceptual clue for determining image similarity. Working with color images, however, is not the same thing as working with images taking values in a 3D Euclidean space. Not only are color spaces bounded, but the characteristics of the observer endow the space with a 'perceptual' metric that in general does not correspond to the metric naturally inherited from R^3. This paper studies the problem of filtering color images abstractly. It begins by determining the properties of the color sum and color product operations such that the desirable properties of orthonormal bases will be preserved. The paper then defines a general scheme, based on the action of the additive group on the color space, by which operations that satisfy the required properties can be defined.
The purpose of this paper is two-fold. We begin by exploring the emerging trend to view multimedia information in terms of low-level and high-level components, the former being feature-based and the latter the 'semantics' intrinsic to what is portrayed by the media object. Traditionally, this has been viewed by employing analogies with generative linguistics. Recently, a new perspective based on the semiotic tradition has been alluded to in several papers. We believe this to be a more appropriate approach. From this, we propose an approach for tackling this problem which uses an associative data structure expressing authored information together with intelligent agents acting autonomously over this structure. We then show how neural networks can be used to implement such agents. The agents act as 'vehicles' for bridging the gap between multimedia semantics and concrete expressions of high-level knowledge, but we suggest that traditional neural network techniques for classification are not architecturally adequate.
This paper presents some methodological observations on the measurement of performance in Visual Information Retrieval systems. The paper identifies three different types of measures, two of which can be determined with methods inherited from the physical and social sciences respectively. The third model is more typical of the design and construction of complicated systems, since it allows us to measure the performance of individual modules before their insertion in a particular application. This paper presents some methodologies for the decontextualized evaluation, anchoring them to a case study of evaluation of several subsystems of an image database.
There has been significant progress in the area of content-based still image retrieval systems. However, most of the existing visual information management systems use static feature analysis models decided by database implementers based on their heuristics, and use indexing-oriented data modeling techniques. In other words, such systems are limited in areas including scalability, extensibility and adaptability. In this paper, we attempt to resolve the problems that surface in content modeling, description and sharing of distributed heterogeneous multimedia information. A language named UCDL, for heterogeneous multimedia content description, is presented to resolve the related problem. The resulting UCDL facilitates a formal content modeling and description method for complex multimedia content and the exchange of heterogeneous content information. The proposed language has several advantages. For instance, an individual user can easily create audio-visual descriptions by using a library of automated tools. Users can perform automated testing of content description becomes implementation independent, thus offering portability across a number of applications for authoring tools to database management systems. Users can have a personalized retrieval view through content filtering, and can easily share the heterogeneous content descriptions of various information sources. In addition, the proposed language can be a part of MPEG-7 DDL.
An anchor person is the hosting character in broadcast programs. Anchor segments in video often provide the landmarks for detecting content boundaries, so it is important to identify such segments during automatic content-based multimedia indexing. Previous efforts have mostly focused on audio information or visual information alone for anchor detection, using either model-based methods via off-line trained models or unsupervised clustering methods. The inflexibility of the off-line model-based approach and the increasing difficulty in achieving detection reliability using the clustering approach lead to the new approach proposed in this paper. The goal is to detect an arbitrary anchor in a given broadcast news program. The proposed approach exploits both audio and visual cues so that on-line acoustic and visual models for the anchor can be built dynamically during data processing. In addition to the capability of identifying any given anchor, the proposed method can also be used to enhance performance by combining it with an algorithm that detects a predefined anchor. Preliminary experimental results are shown and discussed. It is demonstrated that the proposed approach enables the flexibility of detecting an arbitrary anchor without losing performance.
Tools for efficient and intelligent management of digital content are essential for digital video data management. An extremely challenging research area in this context is that of multimedia analysis and understanding. The capabilities of audio analysis in particular for video data management are yet to be fully exploited. We present a novel scheme for indexing and segmentation of video by analyzing the audio track. This analysis is then applied to the segmentation and indexing of movies. We build models for some interesting events in the motion picture soundtrack. The models built include music, human speech and silence. We propose the use of hidden Markov models to model the dynamics of the soundtrack and detect audio events. Using these models we segment and index the soundtrack. A practical problem in motion picture soundtracks is that the audio in the track is of a composite nature. This corresponds to the mixing of sounds from different sources. Speech in the foreground and music in the background are common examples. The coexistence of multiple individual audio sources forces us to model such events explicitly. Experiments reveal that explicit modeling gives better results than modeling individual audio events separately.
In this paper, we propose four index structures for music data retrieval. Based on suffix trees, we develop two index structures called combined suffix tree and independent suffix trees. These methods still show shortcomings for some search functions. Hence we develop another index, called Twin Suffix Trees, to overcome these problems. However, the Twin Suffix Trees lack scalability when the amount of music data becomes large. Therefore we propose the fourth index, called Grid-Twin Suffix Trees, to provide scalability and flexibility for a large amount of music data. For each index, we can use different search functions, like exact search and approximate search, on different music features, like melody, rhythm or both. We compare the performance of the different search functions applied to each index structure in a series of experiments.
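As a rough illustration of suffix-based indexing for exact melody search, the sketch below builds a naive, uncompressed suffix trie over sequences of pitch intervals. It is not any of the four structures named in the abstract, whose details are not given here. The class names, the interval encoding and the song identifiers are all assumptions made for the example.

```python
class Node:
    """One trie node: outgoing edges plus the songs that pass through it."""

    __slots__ = ("children", "songs")

    def __init__(self):
        self.children = {}
        self.songs = set()


class SuffixTrieIndex:
    """Naive suffix trie; every suffix of every sequence is inserted."""

    def __init__(self):
        self.root = Node()

    def add(self, song_id, sequence):
        # Insert each suffix so that any contiguous subsequence can be found
        # by walking down from the root.
        for start in range(len(sequence)):
            node = self.root
            for symbol in sequence[start:]:
                node = node.children.setdefault(symbol, Node())
                node.songs.add(song_id)

    def search(self, pattern):
        """Return the ids of songs containing `pattern` as a contiguous run."""
        node = self.root
        for symbol in pattern:
            node = node.children.get(symbol)
            if node is None:
                return set()
        return set(node.songs)


index = SuffixTrieIndex()
index.add("song1", [0, 2, 2, -4])   # hypothetical melodies as pitch intervals
index.add("song2", [2, -4, 5])
matches = index.search([2, -4])
```

This naive trie needs space quadratic in the sequence length, which hints at why scalability becomes a concern for large collections. A compressed suffix tree would reduce that cost.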
The Lockheed Martin (LM) team had garnered over a decade of operational experience in digital imagery management and analysis for the US Government at numerous worldwide sites. Recently, it set out to create a new commercial product to serve the needs of large-scale imagery archiving and analysis markets worldwide. LM decided to provide a turnkey commercial solution to receive, store, retrieve, process, analyze and disseminate in 'push' or 'pull' modes components and adapted and developed its own algorithms to provide added functionality not commercially available elsewhere. The resultant product, Intelligent Library System, satisfies requirements for (a) a potentially unbounded data archive; (b) automated workflow management for increased user productivity; (c) automatic tracking and management of files stored on shelves; (d) ability to ingest, process and disseminate data involves with bandwidths ranging up to multi-gigabit per second; (e) access through a thin client-to-server network environment; (f) multiple interactive users needing retrieval of filters in seconds from both archived images or in real time; and (g) scalability that maintains information throughput performance as the size of the digital library grows.
In this paper we extend the shot transition detection component of the ViBE video database system to include gradual scene changes. ViBE, a browsable/searchable paradigm for organizing video data containing a large number of sequences, is being developed at Purdue as a testbed to explore ideas and concepts in video databases. We also present results on the performance of our cut detection algorithm using a large test set. The performance of two other techniques is compared against our method.
This paper presents a new approach to content-based retrieval in image databases. The basic new idea in the proposed technique is to organize the quantized and truncated wavelet coefficients of an image into a suitable tree structure. The tree structure respects the natural hierarchy imposed on the coefficients by the successive resolution levels. All the trees relative to the images in a database are organized into a trie. This structure helps in the error-tolerant retrieval of queries. The results obtained show that this approach is promising provided that a suitable distance function between trees is adopted.
Most content-based image retrieval systems require a distance computation for each candidate image in the database. As a brute-force approach, exhaustive search can be employed for this computation. However, exhaustive search is time-consuming and limits the usefulness of such systems. Thus, there is a growing demand for a fast algorithm which provides the same retrieval results as the exhaustive search. In this paper, we propose a fast search algorithm based on a multi-resolution data structure. The proposed algorithm computes the lower bound of the distance at each level and compares it with the latest minimum distance, starting from the low-resolution level. Once the lower bound is larger than the latest minimum distance, we can exclude the candidate without calculating the full-resolution distance. By doing this, we can dramatically reduce the total computational complexity. Notably, the proposed fast algorithm not only provides the same retrieval results as the exhaustive search but is also faster than existing fast algorithms. For additional performance improvement, we can easily combine the proposed algorithm with existing tree-based algorithms. The algorithm can also be used for the fast matching of various features such as luminance histograms, edge images, and local binary partition textures.
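The pruning idea in this abstract can be sketched as follows. The abstract does not specify the distance measure or how the multi-resolution structure is built, so this example assumes an L1 distance on feature vectors whose length is a power of two, and builds coarser levels by summing adjacent pairs of entries. Under that assumption the L1 distance at a coarser level never exceeds the distance at a finer level, because |a1 + a2 - b1 - b2| <= |a1 - b1| + |a2 - b2|. The function names are hypothetical.

```python
def coarsen(vec):
    """Halve the resolution by summing adjacent pairs of entries."""
    return [vec[i] + vec[i + 1] for i in range(0, len(vec), 2)]


def pyramid(vec):
    """Levels from coarsest (length 1) to full resolution.

    Assumption: len(vec) is a power of two.
    """
    levels = [list(vec)]
    while len(levels[-1]) > 1:
        levels.append(coarsen(levels[-1]))
    return levels[::-1]


def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def fast_nearest(query, database):
    """Nearest neighbour under L1 with coarse-to-fine lower-bound pruning.

    Returns (best index, best distance, number of full-resolution distances).
    """
    q_levels = pyramid(query)
    best_index, best_dist = -1, float("inf")
    full_evaluations = 0
    for index, candidate in enumerate(database):
        c_levels = pyramid(candidate)  # would be precomputed in practice
        pruned = False
        for q_level, c_level in zip(q_levels[:-1], c_levels[:-1]):
            # The coarse distance is a lower bound on the full distance, so a
            # bound above the current minimum rules the candidate out.
            if l1(q_level, c_level) > best_dist:
                pruned = True
                break
        if pruned:
            continue
        full_evaluations += 1
        dist = l1(q_levels[-1], c_levels[-1])
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index, best_dist, full_evaluations


query = [4, 0, 0, 0]
database = [[0, 0, 0, 4], [4, 0, 0, 0], [1, 1, 1, 1], [0, 0, 8, 8]]
result = fast_nearest(query, database)
```

Because a candidate is discarded only when its lower bound strictly exceeds the current minimum, the result is the same as an exhaustive scan. In this small example, two of the four candidates are rejected at a coarse level.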
This paper presents a method for fast and effective similarity-based shape retrieval. Shape similarity is determined by comparing the frequencies with which different types of local structure occur in each shape. The system consists of three processes. (1) The segmentation process uses a scale-space approach to find convex segments that lie between curvature zero-crossings at all scales. Local shape structure is represented by short sequences of segments, called terms. (2) The representation process classifies the terms into types based on a set of local shape features. Then the distribution of term types within the shape is computed. (3) The retrieval process compares the term type distribution of the query shape to the term type distributions of the database shapes and retrieves the most similar database shapes. Efficient data structures are used to store the distributions compactly and to support fast retrieval. The performance of the method on a test database ranged from 69 percent to 100 percent of ideal performance, depending on the number of items retrieved.
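The retrieval step (3) can be sketched as a comparison of normalized frequency distributions. The abstract does not say which comparison measure is used, so histogram intersection is assumed here purely for illustration. The term type labels and shape names are made up, and the segmentation and term classification steps are taken as already done.

```python
from collections import Counter


def term_distribution(term_types):
    """Relative frequency of each term type within one shape."""
    counts = Counter(term_types)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def intersection(p, q):
    """Histogram intersection: 1.0 for identical distributions, 0.0 if disjoint."""
    return sum(min(value, q.get(term, 0.0)) for term, value in p.items())


def retrieve(query_terms, database, k):
    """Names of the k database shapes most similar to the query shape."""
    query_dist = term_distribution(query_terms)
    scored = [
        (intersection(query_dist, term_distribution(terms)), name)
        for name, terms in database.items()
    ]
    # Highest similarity first; ties are broken by name for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:k]]


shapes = {
    "s1": ["a", "a", "b", "c"],
    "s2": ["a", "b", "b", "b"],
    "s3": ["d", "d", "d", "d"],
}
top_two = retrieve(["a", "a", "b", "c"], shapes, 2)
```

A real system would store the distributions rather than recompute them per query, in line with the compact data structures the abstract mentions.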
In this paper, we propose a self-learning content-based image indexing and retrieval system. Our system employs a hierarchical content representation and a hierarchical content matching method for effective and efficient image/object retrieval. The 'learning' behavior is enabled by our proposed hierarchical content representation, which allows easy storage of combinations of regions that have resulted in successful matches to objects of interest as determined by user search patterns and profiles. The learning step effectively performs an automatic analysis of database images into meaningful objects given certain user search patterns and interest profiles. The advantages of the proposed hierarchical content representation and 'learning' schemes are demonstrated on a collection of car and face images, where the significant improvements in search and retrieval speed are described both theoretically and experimentally.
We propose an image clustering algorithm which uses fuzzy graph theory. First, we define a fuzzy graph and the concept of connectivity for a fuzzy graph. Then, based on our definition of connectivity, we propose an algorithm which finds connected subgraphs of the original fuzzy graph. Each connected subgraph can be considered as a cluster. As an application of our algorithm, we consider a database of images. We calculate a similarity measure between every pair of images in the database and generate the corresponding fuzzy graph. Then, we find the subgraphs of the resulting fuzzy graph using our algorithm. Each subgraph corresponds to a cluster. We apply our image clustering algorithm to the key frames of news programs to find the anchorperson clusters. Simulation results show that our algorithm succeeds in finding most of the anchorperson frames in the database.
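The clustering pipeline can be sketched under a simplifying assumption. The abstract does not state its definition of connectivity, so the example below substitutes the simplest common choice: two images are linked when their similarity (the edge membership value) is at least a threshold `alpha`, and clusters are the connected components of the resulting graph. The function name, the threshold and the similarity values are all hypothetical.

```python
def fuzzy_graph_clusters(similarity, alpha):
    """Connected components of the graph that keeps edges with weight >= alpha.

    `similarity` is a symmetric matrix of pairwise similarities in [0, 1].
    """
    n = len(similarity)
    seen = [False] * n
    clusters = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack = [start]
        component = []
        # Depth-first traversal over the thresholded edges.
        while stack:
            u = stack.pop()
            component.append(u)
            for v in range(n):
                if v != u and not seen[v] and similarity[u][v] >= alpha:
                    seen[v] = True
                    stack.append(v)
        clusters.append(sorted(component))
    return clusters


# Made-up similarities for five key frames.
sim = [
    [1.0, 0.9, 0.3, 0.1, 0.1],
    [0.9, 1.0, 0.8, 0.1, 0.1],
    [0.3, 0.8, 1.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.7],
    [0.1, 0.1, 0.1, 0.7, 1.0],
]
clusters = fuzzy_graph_clusters(sim, 0.6)
```

Frames 0 and 2 end up in the same cluster even though their direct similarity is low, because both are strongly linked to frame 1. Raising `alpha` splits clusters into smaller ones.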
Using recent advances in scale-space decomposition techniques, it is possible to transform images into trees in scale space, which represent the topology of regions within the image. We propose that complex objects can be found in the trees using a combination of subgraph isomorphism testing and conventional feature matching, and hence provide a vehicle for achieving content-based retrieval and navigation for complex objects in images.
Multimedia database systems have recently emerged as a fruitful area for research due to progress in high-speed communication networks, large capacity storage devices, digitized media, and data compression technologies over the last few years. Multimedia information has been used in a variety of applications including manufacturing, education, medicine, entertainment, etc. A multimedia database system integrates text, images, audio, graphics, database system is that all of the different media are brought together into one single unit, all controlled by a computer. As more information sources become available in multimedia systems, how to model and search the image processing techniques to model multimedia data. A Simultaneous Partition and Class Parameter Estimation algorithm that considers the problem of video frame segmentation as a joint estimation of the partition and class parameter variables has been developed and implemented to identify objects and their corresponding spatial relations. Based on the obtained object information, a web spatial model (WSM) is constructed. A WSM is a multimedia database searching structure to model the temporal and spatial relations of semantic objects so that multimedia database queries related to the objects' temporal and spatial relations on the images or video frames can be answered efficiently.
In this paper, we propose a new method for indexing large numbers of points in high-dimensional space. The basic principle is as follows: data points or feature vectors extracted from objects are first quantized into lattice points by using lattice vector quantization. An inverted file is adopted to organize those lattice points. Fast retrieval is implemented by using the good properties of algebraic lattices. We first tested the indexing performance for range queries by using lattice E8 and Hash. The initial experimental results show that our method has good indexing performance. However, we found that Hash has to search the inverted file for a large number of lattice points if the query window is bigger or the dimension is high. To solve this problem, we use Trie instead of Hash and propose the Trie Parallel Search Algorithm to access the inverted file quickly. Further experiments have been done for n-dimensional data points by using lattice Zn. The results show that the proposed index structure has many good properties, such as low CPU cost and low I/O cost in comparison to the R-tree.
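The hash-based variant of this idea can be sketched for the integer lattice Zn. The example below quantizes each point by rounding every coordinate to the nearest multiple of a step size, stores point ids in an inverted file keyed by lattice point (a Python dict standing in for the hash), and answers a range query by visiting every lattice point the query window can touch. The class name, the step parameter and the sample points are assumptions. The trie-based parallel search from the abstract is not shown.

```python
import math
from collections import defaultdict
from itertools import product


def quantize(point, step):
    """Nearest point of the scaled integer lattice, one coordinate at a time."""
    return tuple(math.floor(x / step + 0.5) for x in point)


class LatticeIndex:
    """Inverted file from lattice points to the ids of points quantized there."""

    def __init__(self, step):
        self.step = step
        self.inverted = defaultdict(list)
        self.points = []

    def add(self, point):
        point_id = len(self.points)
        self.points.append(tuple(point))
        self.inverted[quantize(point, self.step)].append(point_id)
        return point_id

    def range_query(self, low, high):
        """Ids of points p with low[i] <= p[i] <= high[i] in every dimension."""
        lo_cell = quantize(low, self.step)
        hi_cell = quantize(high, self.step)
        hits = []
        # Quantization is monotone per coordinate, so every matching point
        # lies in a cell between lo_cell and hi_cell.
        axes = (range(a, b + 1) for a, b in zip(lo_cell, hi_cell))
        for cell in product(*axes):
            for point_id in self.inverted.get(cell, ()):
                p = self.points[point_id]
                if all(l <= x <= h for l, x, h in zip(low, p, high)):
                    hits.append(point_id)
        return sorted(hits)


index = LatticeIndex(step=1.0)
for p in [(0.2, 0.1), (1.4, 1.6), (3.0, 3.0), (0.9, 0.4)]:
    index.add(p)
found = index.range_query((0.0, 0.0), (1.5, 1.0))
```

The number of cells visited grows with the window size and exponentially with the dimension. That is the cost the abstract attributes to the hash approach for large query windows or high dimensions.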
It has been argued that future image coding techniques should allow for 'midstream access', i.e. allow image query, retrieval, and modification to proceed on the compressed representation. In a recent work, we introduced the color visual pattern image coding (CVPIC) technique for color image compression. An image is divided into blocks, and each block is coded locally by mapping it to one of a predefined, universal set of visually significant image patterns consisting of representations for both edge and uniform regions. The pattern and color information is then stored, following a color quantization algorithm and an entropy encoding stage. Compression ratios between 40:1 and 60:1 were achieved while maintaining high image quality on a variety of natural color images. It was also shown that CVPIC could achieve comparable performance to state-of-the-art techniques such as JPEG.
As an increasing amount of audio-visual data is stored, distributed, and used in compressed form, compressed-domain techniques will be favorable. However, as conventional features may not be accessible in the compressed domain, exploration of new compressed-domain features will become mandatory. Studies have shown that the DC coefficients of a DCT-compressed video can be used to detect shot transitions for relatively simple video sequences. In this work, the use of the energy histogram of the lower frequency DCT coefficients as features for video parsing was examined. The experimental results show an improvement over those obtained with the DC coefficients alone.
A method for lossless compression of large binary images is proposed for applications where spatial access to the image is needed. The method utilizes the advantages of (1) variable-size context modeling in the form of context trees, and (2) forward-adaptive statistical compression. New strategies for constructing the context tree are considered, including a fast two-stage bottom-up approach. The proposed technique achieves higher compression rates and allows dense tiling of images down to 50 x 50 pixels without sacrificing the compression performance. It enables partial decompression of large images far more efficiently than if the standard JBIG were applied.
In the past few years, immense improvement was obtained in the field of content-based image retrieval. Nevertheless, existing systems still fail when applied to medical image databases. Simple feature-extraction algorithms that operate on the entire image for characterization of color, texture, or shape cannot be related to the descriptive semantics of medical knowledge that is extracted from images by human experts.
Image retrieval in medical applications (IRMA) requires the cooperation of experts in the fields of medicine, image analysis, feature analysis and systems engineering. A distributed development platform was implemented to support the progress of the IRMA system. As the concept for this system strictly separates the steps for medical image retrieval, its components can be developed separately by work groups in different departments. The development platform provides location and access transparency for its resources. These resources are images and extracted features as well as methods, all of which are distributed automatically between the work groups. Replications are created to avoid repeated network transfers. All resources are administered in one central database. Computationally expensive feature extraction tasks are also distributed automatically to be processed on concurring workstations of different work groups. The development platform intensifies and simplifies the cooperation of the interdisciplinary IRMA development team by providing fast and automated deliveries of components from software developers to physicians for evaluation.
Automated classification of digital video is emerging as an important piece of the puzzle in the design of content management systems for digital libraries. The ability to classify videos into various classes such as sports, news, movies, or documentaries increases the efficiency of indexing, browsing, and retrieval of video in large databases. In this paper, we discuss the extraction of features that enable identification of sports videos directly from the compressed domain of MPEG video. These features include detecting the presence of action replays, determining the amount of scene text in video, and calculating various statistics on camera and/or object motion. The features are derived from the macroblock, motion, and bit-rate information that is readily accessible from MPEG video with very minimal decoding, leading to substantial gains in processing speed. Full decoding of selected frames is required only for text analysis. A decision tree classifier built using these features is able to identify sports clips with an accuracy of about 93 percent.
In this paper, we propose a dynamic approach to feature and classifier selection. In our approach, visual features and classifiers are selected automatically based on performance. In earlier work, we presented the Visual Apprentice, in which users can define visual object models via a multiple-level object definition hierarchy. Visual Object Detectors are learned using various learning algorithms: as the user provides examples from images or video, visual features are extracted and multiple classifiers are learned for each node of the hierarchy. In this paper, features and classifiers are selected automatically at each node, depending on their performance over the training set. We introduce the concept of Recurrent Visual Semantics and show how it can be used to identify domains in which performance-based learning techniques such as the one presented can be applied. We then show experimental results in detecting Baseball video shots, images that contain handshakes, and images that contain skies. These results demonstrate the importance, feasibility, and usefulness of dynamic feature/classifier selection for classification of visual information, and the performance benefits of using multiple learning algorithms to build classifiers. Based on our experiments, we also discuss some of the issues that arise when applying learning techniques in real-world content-based applications.
Consumer digital video devices are becoming computing platforms. As computing platforms, digital video devices are capable of crunching the compressed bits into the best displayable picture and delivering enhanced services. Although these devices will primarily aim to continue their traditional functions of display and storage, there are additional functions such as content management for real-time and stored video, tele-shopping, banking, Internet connectivity, and interactive services, which the device could also handle.
In this paper we propose an interactive tool for generating dynamic markers for video objects in a distributed video content discussion environment. We address interactive video object selection and real-time video object marker generation, which is supported by an automatic object tracking method. The proposed system satisfies the following criteria: (i) automatic object tracking has to be in real time; (ii) the video object selection has to be carried out with minimal effort and knowledge; (iii) the user has to be notified by the system when the automatic object tracking method encounters problems; and (iv) interactive rectification of the object marker has to be instantaneous and direct. Our experimental results indicate that the proposed tool is very effective and intuitive in creating dynamic object markers for video content on the fly. The automatic object tracking method yields reliable results on a desktop PC in real time, even with a busy background and/or partial occlusion.
An increasing number of people own and use camcorders to make videos that capture their experiences and document their lives. These videos easily add up to many hours of material. Oddly, most of them are put into a storage box and never touched or watched again. The reasons for this are manifold. Firstly, the raw video material is unedited, and is therefore long-winded and lacking visually appealing effects. Video editing would help, but it is still too time-consuming; people rarely find the time to do it. Secondly, watching the same tape more than a few times can be boring, since the video lacks any variation or surprise during playback. Automatic video abstracting algorithms can provide a method for processing videos so that users will want to play the material more often. However, existing automatic abstracting algorithms have been designed for feature films, newscasts or documentaries, and thus are inappropriate for home video material and raw video footage in general. In this paper, we present new algorithms for automatically generating amusing, visually appealing and variable video abstracts of home video material. They make use of a new, empirically motivated approach, also presented in the paper, to cluster time-stamped shots hierarchically into meaningful units. Last but not least, we propose a simple and natural extension of the way people acquire video - so-called on-the-fly annotations - which will allow a completely new set of applications on raw video footage as well as enable better and more selective automatic video abstracts. Moreover, our algorithms are not restricted to home video but can also be applied to raw video footage in general.
In this paper we present a new descriptor for the spatial distribution of motion activity in video sequences. We use the magnitude of the motion vectors as a measure of the intensity of motion activity in a macro-block. We construct a matrix Cmv consisting of the magnitudes of the motion vector for each macro-block of a given P frame. We compute the average magnitude of the motion vector per macro-block, Cavg, and then use Cavg as a threshold on the matrix Cmv by setting the elements of Cmv that are less than Cavg to zero. We classify the runs of zeros into three categories based on length, and count the number of runs of each category in the matrix Cmv. Our activity descriptor for a frame thus consists of four parameters, viz. the average magnitude of the motion vectors and the numbers of runs of short, medium and long length. Since the feature extraction is simple and is carried out in the compressed domain, it is extremely fast. We have tested it on the MPEG-7 test content set, which consists of approximately 14 hours of MPEG-1 encoded video content of different kinds. We find that our descriptor enables fast and accurate indexing of video. It is robust to noise and to changes in encoding parameters such as frame size, frame rate, encoding bit rate, encoding format, etc. It is a low-level non-semantic descriptor that gives semantic matches within the same program, and is thus very suitable for applications such as video program browsing. We also find that indirect and computationally simpler measures of the magnitude of the motion vectors, such as the bits taken to encode the motion vectors, though less effective, can also be used in our run-length framework.
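The run-length descriptor described in this abstract can be illustrated with a minimal sketch. The abstract does not state the scan order or the run-length boundaries, so the raster (row-by-row) scan and the `short_max` and `medium_max` thresholds below are assumptions made only for illustration; the function name `activity_descriptor` is likewise hypothetical.

```python
from typing import List, Tuple


def activity_descriptor(cmv: List[List[float]],
                        short_max: int = 2,
                        medium_max: int = 5) -> Tuple[float, int, int, int]:
    """Return (Cavg, n_short, n_medium, n_long) for one P frame.

    cmv holds one motion-vector magnitude per macro-block.
    short_max and medium_max are assumed run-length boundaries.
    """
    # Flatten the matrix in raster-scan order (an assumption).
    values = [m for row in cmv for m in row]
    c_avg = sum(values) / len(values)

    # Threshold: magnitudes below the average are set to zero.
    thresholded = [m if m >= c_avg else 0.0 for m in values]

    counts = [0, 0, 0]  # short, medium, long runs of zeros
    run = 0
    # A non-zero sentinel closes a run that reaches the end of the frame.
    for m in thresholded + [1.0]:
        if m == 0.0:
            run += 1
        elif run > 0:
            idx = 0 if run <= short_max else 1 if run <= medium_max else 2
            counts[idx] += 1
            run = 0
    return c_avg, counts[0], counts[1], counts[2]


if __name__ == "__main__":
    frame = [[0, 0, 4, 0],
             [0, 0, 0, 4],
             [0, 0, 0, 0]]
    print(activity_descriptor(frame))
```

Two frames could then be compared by any distance on the resulting four-element tuples; the abstract does not specify which one is used.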
Color object recognition methods that are based on image retrieval algorithms can handle changes of illumination via image normalization, e.g. simple color-channel normalization or forming a doubly-stochastic image matrix. However, these methods fail if the object sought is surrounded by clutter. Rather than directly trying to find the target, a viable approach is to grow a small number of feature regions called locales. These are defined as a non-disjoint coarse localization based on image tiles. In this paper, locales are grown based on chromaticity, which is less sensitive to illumination change than is color. Using a diagonal model of illumination changes, a least-squares optimization on chromaticity recovers the best set of diagonal coefficients for candidate assignments from model to test locales stored in a database. If locale centroids are also stored then, adapting a displacement model to include model locale weights, transformed pose and scale can be recovered. Tests on databases of real images show promising results for object queries.
Developing semantic indices into large image databases is a challenging and important problem in content-based image retrieval. We address the problem of detecting objects in an image based on color and texture features. Specifically, we consider the two problems of detecting sky and vegetation in outdoor images. An image is divided into 16 x 16 sub-blocks, and color, texture, and position features are extracted from every sub-block. We demonstrate how a small set of codebook vectors, extracted from a learning vector quantizer, can be used to estimate the class-conditional densities of the low-level observed features needed for the Bayesian methodology. The sky and vegetation detectors have been trained on over 400 color images from the Corel database. We achieve classification accuracies of over 94 percent for both classifiers on the training data. We are currently extending our evaluation to a larger database of 1,700 images.
An effective image management system should provide efficient storage of the image collection, while simultaneously providing fast content-based retrieval of the images. The traditional image management approach is to handle compression and indexing separately. Compression does not address the issue of image indexing.
This paper describes an image retrieval system that searches a database for images similar to a target imagined by a user. The system uses image features, rather than keywords, and retrieves images by estimating distance parameters. We use the Mahalanobis distance as the distance measure. The system first presents the user with some images that have suitable feature vector values and asks the user to select the images that are similar to the target he has in mind. Based on the user's selection, the system revises the distance parameters. This process is continued until the target region is reduced to a suitable volume. Since this method requires neither a real target image nor keywords to do the retrieval, it is quite simple and practical. Experimental results show the advantage and efficiency of the proposed system.
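One feedback round of this kind of retrieval can be sketched as follows. The abstract does not say how the distance parameters are revised, so this sketch assumes the simplest case: a diagonal covariance matrix whose mean and variances are re-estimated from the feature vectors of the images the user selected. The function names and the `eps` regularizer are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def estimate_parameters(selected: Sequence[Sequence[float]],
                        eps: float = 1e-6) -> Tuple[List[float], List[float]]:
    """Estimate mean and per-feature variance from user-selected vectors.

    eps keeps every variance positive (an assumed regularizer).
    """
    n = len(selected)
    dim = len(selected[0])
    mean = [sum(v[i] for v in selected) / n for i in range(dim)]
    var = [sum((v[i] - mean[i]) ** 2 for v in selected) / n + eps
           for i in range(dim)]
    return mean, var


def mahalanobis_diag(x: Sequence[float],
                     mean: Sequence[float],
                     var: Sequence[float]) -> float:
    """Mahalanobis distance under the assumed diagonal covariance."""
    return math.sqrt(sum((xi - mi) ** 2 / vi
                         for xi, mi, vi in zip(x, mean, var)))


def rank_database(database: Sequence[Sequence[float]],
                  mean: Sequence[float],
                  var: Sequence[float]) -> List[int]:
    """Return database indices ordered from nearest to farthest."""
    return sorted(range(len(database)),
                  key=lambda i: mahalanobis_diag(database[i], mean, var))


if __name__ == "__main__":
    picked = [[0, 0], [2, 0], [0, 2], [2, 2]]
    mean, var = estimate_parameters(picked)
    print(rank_database([[5, 5], [1, 1], [3, 1]], mean, var))
```

Repeating the estimate on each new selection shrinks the variances, which corresponds loosely to the abstract's description of the target region being reduced round by round.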
Content-based multimedia information retrieval is a hot topic for researchers in many domains. However, traditional feature-vector-based retrieval methods cannot provide retrieval at the semantic level. We present an approach that generates semantic templates automatically in the process of relevance feedback and constructs a network of semantic templates with the support of WordNet in the retrieval process, which helps the user to retrieve at the semantic level. With our approach, for a keyword query from the user, relevant images are returned with the help of semantic template association even if those images are not annotated with keywords. This paper introduces the approach in detail and presents experimental results.
This paper presents a wavelet-transform-based algorithm for fast and accurate retrieval of images from large image databases. We describe the architecture and the implementation of such a framework to perform a search for an image in a database. The retrieval algorithm entails a wavelet-based methodology that yields an elegant and computationally efficient implementation. This scheme matches the images directly in the wavelet domain, based on the information extracted from only a few of the wavelet coefficients. Experiments show that this novel algorithm provides highly accurate search results.
This paper investigates retrieval and indexing schemes in the pixel domain, and points to future work on image retrieval schemes. Image retrieval schemes generate indices for images in the pixel or compressed domain based on their features in the corresponding domain. These indices are used to retrieve images from a database. The features in the pixel domain are extracted from the color, shape or texture characteristics of images. The application of these three methods depends on the characteristics of the image database and of the query image. In the near future, image databases will contain compressed versions of images; therefore there will be a high demand for image retrieval techniques in the compressed domain.
The amount of pictorial data grows enormously with the expansion of the WWW. Given the large number of images, it is very important for users to be able to retrieve desired images via an efficient and effective mechanism. In this paper we propose two efficient approaches to facilitate image retrieval by using a simple method to represent the image content. Each image is partitioned into m x n equal-sized sub-images. A color that covers a sufficient number of pixels in a block is extracted to represent its content. In the first approach, the image content is represented by the extracted colors of the blocks. The spatial information of images is considered in image retrieval. In the second approach, the colors of the blocks in an image are used to extract objects. A block-level process is proposed to perform the region extraction. The spatial information of regions is considered unimportant in image retrieval. Our experiments show that these two block-based approaches can speed up image retrieval. Moreover, the two approaches are effective for different requirements of image similarity. Users can choose a proper approach to process their queries based on their similarity requirements.
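The block-level color representation of the first approach can be sketched as below. The abstract does not state how many pixels count as "enough" or how two block grids are compared, so the `min_fraction` threshold and the position-wise matching score are assumptions for illustration only; the image is assumed to be a 2-D grid of already quantized color labels whose dimensions are divisible by m and n.

```python
from collections import Counter
from typing import List, Optional

Grid = List[List[Optional[int]]]


def block_colors(image: List[List[int]], m: int, n: int,
                 min_fraction: float = 0.5) -> Grid:
    """Return an m x n grid holding each block's representative color.

    A block gets None when no single color reaches min_fraction of its
    pixels (min_fraction is an assumed threshold).
    """
    height, width = len(image), len(image[0])
    bh, bw = height // m, width // n
    grid: Grid = []
    for bi in range(m):
        row: List[Optional[int]] = []
        for bj in range(n):
            pixels = [image[y][x]
                      for y in range(bi * bh, (bi + 1) * bh)
                      for x in range(bj * bw, (bj + 1) * bw)]
            color, count = Counter(pixels).most_common(1)[0]
            row.append(color if count >= min_fraction * len(pixels) else None)
        grid.append(row)
    return grid


def block_similarity(a: Grid, b: Grid) -> float:
    """Fraction of block positions whose representative colors agree."""
    pairs = [(ca, cb) for ra, rb in zip(a, b) for ca, cb in zip(ra, rb)]
    matches = sum(1 for ca, cb in pairs if ca is not None and ca == cb)
    return matches / len(pairs)


if __name__ == "__main__":
    img = [[1, 1, 2, 3],
           [1, 1, 2, 2],
           [4, 5, 6, 6],
           [7, 8, 6, 6]]
    grid = block_colors(img, 2, 2)
    print(grid, block_similarity(grid, grid))
```

Because the comparison is position-wise, this sketch keeps the spatial information, as in the first approach; the region-based second approach would need an additional block-merging step that is not shown here.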
A significant number of color images on the WWW and in many other applications are color quantized. This paper presents a new technique for content-based retrieval of this type of color image. We introduce a technique that compares the color contents of images by comparing their color tables. An effective measure of the similarity of color tables based on a modified Hausdorff distance has been developed. Computer simulation results on retrieving three sets of colorful images are presented, which demonstrate the high effectiveness of the method.
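To make the idea of comparing color tables concrete, here is a minimal sketch. The abstract does not define its modification of the Hausdorff distance, so the sketch substitutes the widely used mean-of-minimum-distances variant (often attributed to Dubuisson and Jain) with Euclidean distance in RGB; it should not be read as the measure developed in the paper.

```python
import math
from typing import Sequence, Tuple

Color = Tuple[float, float, float]


def directed_mean_min(a: Sequence[Color], b: Sequence[Color]) -> float:
    """Average, over colors in a, of the distance to the nearest color in b."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)


def modified_hausdorff(a: Sequence[Color], b: Sequence[Color]) -> float:
    """Symmetric distance between two color tables (assumed variant)."""
    return max(directed_mean_min(a, b), directed_mean_min(b, a))


if __name__ == "__main__":
    table_a = [(0, 0, 0), (3, 4, 0)]
    table_b = [(0, 0, 0)]
    print(modified_hausdorff(table_a, table_b))
```

Averaging the nearest-neighbor distances, instead of taking their maximum as the classical Hausdorff distance does, makes the measure less sensitive to a single outlying palette entry.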
Video segmentation is an important step in many video applications. We observe that the video shot boundary is a multi-resolution edge phenomenon in the feature space. Based on this observation, we have developed a novel temporal multi-resolution analysis (TMRA) based algorithm using Canny wavelets to perform temporal video segmentation. Information across multiple resolutions is used to help detect as well as locate abrupt and gradual transitions. We present the theoretical basis of the algorithm, followed by the implementation and the results. In this paper the TMRA technique has been implemented using color histograms in the raw domain and DCT coefficients in the compressed video streams as the feature space. Experimental results show that this method can detect as well as characterize both abrupt and gradual shot boundaries. The technique also shows good noise tolerance.
In this work, we present a system for the automatic segmentation, indexing and retrieval of audiovisual data based on the combination of audio, visual and textual content analysis. The video stream is demultiplexed into audio, image and caption components. Then, a semantic segmentation of the audio signal based on audio content analysis is conducted, and each segment is indexed as one of the basic audio types. The image sequence is segmented into shots based on visual information analysis, and keyframes are extracted from each shot. Meanwhile, keywords are detected from the closed caption. Index tables are designed for both linear and non-linear access to the video. Experiments show that the proposed methods for multimodal media content analysis are effective and that the integrated framework achieves satisfactory results for video information filtering and retrieval.
In this paper, we present a fade detection technique for indexing of MPEG-2 and MPEG-4 compressed video sequences. We declare a fade-in if the number of positive residual dc coefficients in P frames exceeds a certain percentage of the total number of non-zero dc coefficients consistently over several consecutive frames. Our fade-detection technique has fair accuracy and the advantage of high simplicity since it uses only entropy decoding and does not use computationally expensive inverse DCTs.
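The fade-in rule stated in this abstract can be sketched as follows. The percentage and the required number of consecutive frames are not given in the abstract, so `ratio` and `min_run` are assumed values, and the input is assumed to be the residual DC coefficients already obtained by entropy decoding, one list per P frame.

```python
from typing import List, Sequence, Tuple


def detect_fade_in(frames: Sequence[Sequence[int]],
                   ratio: float = 0.7,
                   min_run: int = 3) -> List[Tuple[int, int]]:
    """Return (first, last) P-frame index pairs flagged as fade-ins.

    ratio and min_run are assumed thresholds, not values from the paper.
    """
    fades: List[Tuple[int, int]] = []
    start = None
    # An empty sentinel frame closes a run that reaches the end.
    for i, dcs in enumerate(list(frames) + [[]]):
        nonzero = [c for c in dcs if c != 0]
        positive = sum(1 for c in nonzero if c > 0)
        is_candidate = bool(nonzero) and positive > ratio * len(nonzero)
        if is_candidate and start is None:
            start = i
        elif not is_candidate and start is not None:
            if i - start >= min_run:
                fades.append((start, i - 1))
            start = None
    return fades


if __name__ == "__main__":
    p_frames = [[1, -1, 0, 2, -3],
                [1, 2, 3, 0, -1],
                [2, 2, 1, 1, 0],
                [5, 1, 1, 1, 1],
                [-1, -2, 1, 0, 0]]
    print(detect_fade_in(p_frames))
```

A fade-out detector could plausibly mirror this rule by counting negative residual DC coefficients, although the abstract only spells out the fade-in case.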
In this paper we address the problem of detecting abrupt shot changes in videos. Unlike the majority of the techniques in the literature, we perform this task directly on the stream coded in the MPEG format, without resorting to any decoding procedure. The proposed algorithm proceeds according to a step-wise refinement strategy, combining different cut detection criteria. Experimental results are presented and discussed.
The current trend in content-based retrieval is the development of object-based systems. Such systems enable users to make higher-level queries, which are more intuitive to them than queries based on visual primitives. In this paper, we present OVID, our Object-based VIDeo retrieval system. It currently consists of a video parsing module, an annotation module, a user interface and a search mechanism. A combined multiple-expert approach is at the heart of the video parsing routine for improved performance. The annotation module extracts color and texture-based region information, which is used by the neural-network-based search routine at query time. The iconic query paradigm on which the system is based provides users with a flexible means to define object-based queries.
In this paper, we propose an efficient wavelet-based approach to achieve flexible and robust motion trajectory matching of video objects. By using the wavelet transform, our algorithm decomposes the raw object trajectory into components at different scales. We use the coarsest scale components to approximate the global motion information and the finer scale components to partition the global motion into subtrajectories. Each subtrajectory is then modeled by a set of spatial and temporal translation invariant attributes. Motion retrieval based on subtrajectory modeling has been tested and compared against other global trajectory matching schemes to show the advantages of our approach in achieving spatio-temporal invariance properties.
This paper addresses key-frame selection for content-based video indexing and access. The proposed key-frame selection method is intended to operate in real time irrespective of the available computation resources and memory. Hence, we provide three solutions to content-based key-frame selection with different costs, and suggest three operation levels. The suggested key-frame selection method has two major parts: (i) segmentation of the video into shots; (ii) analysis of the motion and color activity within each video shot to select additional frames. We also propose a new color-based approach to key-frame selection and discuss how to fuse color-based and motion-based key-frame selection results.
Efficient ways to manage digital video data have assumed enormous importance lately. An integral aspect is the ability to browse, index and search huge volumes of video data automatically and efficiently. This paper presents a novel scheme for matching video sequences based on low-level features. The scheme supports fast and efficient matching and can search 450,000 frames of video data within 72 seconds on a 400 MHz Pentium II, for a 50-frame query. Video sequences are processed in the compressed domain to extract the histograms of the images in the DCT sequence is implemented for matching video clips. The bins of the histograms of successive for comparison. This leads to efficient storage and transmission. The histogram representation can be compacted to 4.26 real numbers per frame, while achieving high matching accuracy. Multiple temporal resolution sampling of the videos to be matched is also supported, and any key-frame-based matching scheme thus becomes a particular implementation of this scheme.
Studies on MPEG-7 have so far been concentrated only on the normative components. While feature extraction has been an active research subject in content-based retrieval studies, work on content-based search engines is scarce. In this paper, we attempt to perceive the constructs of an MPEG-7 optimum search engine by examining various requirements as well as functionality imposed on, or enabled for, the search engine by the MPEG-7 standard.
Histograms are the most prevalently used representation for the color content of images and video. An elaborate representation of a histogram requires specifying the color centers of the histogram bins and the count of the number of image pixels with each color. Such an elaborate representation, though expressive, may not be necessary for some tasks in image search, filtering and retrieval. A qualitative representation of the histogram is sufficient for many applications. Such a representation will be compact and will greatly simplify the storage and transmission of the image representation. It will also reduce the computational complexity of search and filtering algorithms without adversely affecting the quality. We present such a compact binary descriptor for color representation. This descriptor consists of the quantized Haar transform coefficients of the color histogram. We show the use of this descriptor for browsing large image databases without the need for computationally expensive clustering algorithms. The compact nature of the descriptor and the associated simple similarity measure allow searching over a database of about four hours of video in less than 5-6 seconds without the use of any sophisticated indexing scheme.
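A compact binary descriptor of this kind can be sketched as follows. The abstract does not give the quantization rule or the similarity measure, so the sketch assumes one-bit quantization (the sign of each Haar coefficient) and Hamming distance; the histogram length is assumed to be a power of two, and the Haar transform is left unnormalized because only the signs are kept.

```python
from typing import List, Sequence


def haar_transform(hist: Sequence[float]) -> List[float]:
    """Unnormalized 1-D Haar transform; len(hist) must be a power of two.

    Output order: [overall sum, coarsest difference, ..., finest differences].
    """
    coeffs: List[float] = []
    current = list(hist)
    while len(current) > 1:
        sums = [current[i] + current[i + 1] for i in range(0, len(current), 2)]
        diffs = [current[i] - current[i + 1] for i in range(0, len(current), 2)]
        coeffs = diffs + coeffs  # coarser levels end up in front
        current = sums
    return current + coeffs


def binary_descriptor(hist: Sequence[float]) -> List[int]:
    """One bit per difference coefficient (assumed sign quantization)."""
    # The overall sum at index 0 carries no shape information and is dropped.
    return [1 if c > 0 else 0 for c in haar_transform(hist)[1:]]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of differing bits (assumed similarity measure)."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    d1 = binary_descriptor([4, 2, 1, 3])
    d2 = binary_descriptor([1, 3, 4, 2])
    print(d1, d2, hamming(d1, d2))
```

Each bit records only whether one half of a histogram range holds more pixels than the other, which is what makes the descriptor qualitative and cheap to compare.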