In an earlier study, a semantic content-based image retrieval system was developed. The system requires a Visual Object Process Diagram (VOPD) to be created for each image in the database and for the query. This is a major drawback, since it requires the user and the database manager to be acquainted with the rules and structure of the VOPD, which is not trivial and in fact troublesome for the naive user. To overcome this drawback, this work presents two approaches that provide an interface to the image retrieval system and bypass the need to create VOPD representations manually.
We present a prototype system for managing and searching collections of personal digital images. The system allows the collection to be stored across a mixture of local and remote computers and managed seamlessly. It provides multiple ways of organizing and viewing the same collection. It also provides a search function that combines face detection, low-level color, texture, and edge features with digital camera capture settings to provide high-quality search, computed at the server but available from all other networked devices accessing the photo collection. Evaluations of the search facility using human relevancy experiments are provided.
Image retrieval (IR) means taking a probe image and finding the most appropriate match in a (possibly very large) image database. Unlike keyword indexing, our approach is to compute a feature vector (FV) for each image and to compute the distance from the probe to each image in the database. As a starting point, we studied the system of Jacobs et al., developed at the University of Washington, which uses the Haar wavelet transform to produce feature vectors from images. A genetic algorithm was then used to develop weighting parameters that yielded significantly improved image retrieval performance.
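The distance-ranking scheme described above can be sketched as follows. This is a simplified, hypothetical illustration (a single-level Haar decomposition and a plain weighted Euclidean distance), not the actual Jacobs et al. metric or the evolved weights:

```python
import numpy as np

def haar2d(img):
    # one level of the 2-D Haar transform: average + 3 detail subbands
    a = (img[0::2, 0::2] + img[0::2, 1::2] + img[1::2, 0::2] + img[1::2, 1::2]) / 4
    h = (img[0::2, 0::2] - img[0::2, 1::2] + img[1::2, 0::2] - img[1::2, 1::2]) / 4
    v = (img[0::2, 0::2] + img[0::2, 1::2] - img[1::2, 0::2] - img[1::2, 1::2]) / 4
    d = (img[0::2, 0::2] - img[0::2, 1::2] - img[1::2, 0::2] + img[1::2, 1::2]) / 4
    return a, np.concatenate([h.ravel(), v.ravel(), d.ravel()])

def feature_vector(img):
    # FV = coarse average subband followed by the detail coefficients
    a, details = haar2d(img)
    return np.concatenate([a.ravel(), details])

def retrieve(probe, database, weights=None):
    # rank every database image by (optionally weighted) distance to the probe;
    # in the paper's setting the weights would be tuned by a genetic algorithm
    fp = feature_vector(probe)
    ranked = []
    for i, img in enumerate(database):
        diff = fp - feature_vector(img)
        if weights is not None:
            diff = diff * weights
        ranked.append((float(np.linalg.norm(diff)), i))
    return sorted(ranked)
```

An exact duplicate of the probe comes back at rank 0 with distance 0, which makes the scheme easy to sanity-check before tuning any weights.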
Digital image retrieval systems allow sophisticated querying and searching by image contents. Since the 1990s, Content-Based Image Retrieval (CBIR) has attracted great research attention. In this paper, we propose a new approach to shape-based image retrieval. We perform an independent edge self-reinforcement algorithm on the edge map to yield the salient edges. The content of a salient edge is characterized by its low-level features, including length, rotation-angle histogram, and corner frequency. The image similarity measure is then based on the Earth Mover’s Distance (EMD), and is referred to in this article as integrated salient edge matching. Preliminary experimental results on a database containing 7,000 images demonstrate that the proposed method is promising.
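For two histograms with equal total mass and unit ground distance between adjacent bins, the 1-D EMD reduces to the area between the cumulative distributions. Below is a minimal sketch of that distance component only (not the full integrated salient edge matching, and ignoring the circular wrap-around a rotation-angle histogram would need):

```python
def emd_1d(h1, h2):
    # normalize so both histograms carry equal total mass
    s1, s2 = float(sum(h1)), float(sum(h2))
    a = [x / s1 for x in h1]
    b = [x / s2 for x in h2]
    # with unit ground distance, EMD = total "earth" carried past each bin
    # boundary, i.e. the sum of |running cumulative difference|
    carry, work = 0.0, 0.0
    for x, y in zip(a, b):
        carry += x - y
        work += abs(carry)
    return work
```

Identical histograms cost 0, and moving all mass one bin over costs exactly 1, matching the intuition of "earth moving" that makes EMD tolerant of small shape perturbations.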
An image and object search and retrieval algorithm is devised that combines color and spatial information. Spatial characteristics are described in terms of Wiskott’s jets formulation, based on a set of Gabor wavelet functions at varying scales, orientations and locations. Color information is first converted to a form more impervious to illumination color change, reduced to 2D, and encoded in a histogram based on a new stretched chromaticity space for which all bins are populated. An image database of 27,380 images is devised by replicating 2,738 JPEG images by a set of transforms that include resizing, various cropping attacks, JPEG quality changes, aspect ratio alteration, and reducing color to grayscale. Correlation of the complete encoded vector is used as the similarity measure. For both searches with the original image as probe within the complete dataset, and with the altered images as probes against the original dataset, the grayscale, the stretched, and the resized images had near-perfect results. The most formidable challenge was found to be images that were cropped both horizontally and vertically. The algorithm’s ability to identify objects, as opposed to just images, is also tested. In searching for images in a set of 4 classifications, the jets were found to contribute most analytic power when objects with distinctive spatial characteristics were the target.
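The color half of the scheme can be sketched with a plain chromaticity histogram and normalized correlation as the similarity measure. Note this sketch omits the paper's illumination-invariance conversion and the "stretched" remapping that populates all bins:

```python
import numpy as np

def chromaticity_histogram(img, bins=16):
    # img: H x W x 3 array of RGB values; chromaticity (r, g) discards
    # overall intensity, keeping only 2-D color information
    s = img.sum(axis=2).astype(float)
    s[s == 0] = 1.0                      # avoid division by zero on black pixels
    r = img[..., 0] / s
    g = img[..., 1] / s
    hist, _, _ = np.histogram2d(r.ravel(), g.ravel(),
                                bins=bins, range=[[0, 1], [0, 1]])
    h = hist.ravel()
    return h / h.sum()                   # normalize to unit mass

def similarity(h1, h2):
    # normalized correlation of the two (mean-centered) histogram vectors
    a, b = h1 - h1.mean(), h2 - h2.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```

An image compared against itself scores ~1.0; replicas that preserve the color distribution (resizing, grayscale excepted) stay close to 1, which is why correlation works as the ranking measure.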
In this paper we propose a photo browsing system that uses image classification results in an error-tolerant manner. Images are hierarchically classified into indoor/outdoor and further into city/landscape. We employ simple classifiers based on global color histograms, wavelet subband energies, and contour directions, with medium recall rates of around 85%. This paper delivers two contributions to cope with classification errors in the context of image browsing. The first contribution is a method to associate confidence measures with classification results. The second contribution is a browsing tool that does not reveal classification results to the user. Instead, browsing options are generated. These browsing options are thumbnails representing semantic topics such as indoor and outdoor. User studies showed that thumbnails and semantic topics are highly demanded features for a photo-browsing tool. The thumbnails are representative images from the database with high confidence values. They are chosen context-based, such that they share class labels with the currently displayed images or the usage history.
The focus of this paper is on similarity modeling. In the first part we revisit the underlying concepts of similarity modeling and sketch the currently most used VIR similarity model, Linear Weighted Merging (LWM). Motivated by its drawbacks, we introduce a new general similarity model called Logical Retrieval (LR) that offers more flexibility than LWM. In the second part we integrate into this environment the Feature Contrast Model (FCM), developed by psychologists to explain human peculiarities in similarity perception. FCM is integrated as a general method for distance measurement. The results show that FCM performs (in the LR context) better than metric-based distance measurement. Euclidean distance is used for comparison because it is used in many VIR systems and is based on the questionable metric axioms. FCM minimizes the number of clusters in distance space; therefore it is the ideal distance measure for LR. FCM allows a number of different parameterizations. The tests reveal that, on average, a symmetric, non-subtractive configuration that emphasizes common properties of visual objects performs best. Its major drawback in comparison to Euclidean distance is its worse performance in terms of query execution time.
An original image retrieval framework is proposed and developed. Aiming at semantic retrieval, a novel cognitive model, the feature element constructional model, is proposed. With its hierarchical constructional structure and bias-competition mechanism, the new model provides great power for semantic retrieval. Two retrieval modes are presented in the new system, both of which try to analyze the semantic concept in the query image or semantic command. Matching from the object to the feature elements is then carried out to obtain the final result, embodying our understanding of retrieval as “providing a way of approaching the accurate result”.
Enabling semantic detection and indexing is an important task in multimedia content management. Learning and classification techniques are increasingly relevant to state-of-the-art content management systems. From relevance feedback to semantic detection, there is a shift in the amount of supervision that precedes retrieval, from lightweight classifiers to heavyweight classifiers. In this paper we compare the performance of some popular classifiers for semantic video indexing. Among other techniques, we mainly compare one technique for generative modeling with one for discriminant learning, and show how they behave depending on the number of examples the user is willing to provide to the system. We report results using the NIST TREC Video Corpus.
Static multimedia on the Web is already hard to structure manually. Although unavoidable and necessary, manual annotation of dynamic multimedia becomes even less feasible as multimedia quickly grows in complexity, i.e. in volume, modality, and usage context; the latter context may be set by learning or other purposes of the multimedia material. This multimedia dynamics calls for categorisation systems that index, query, and retrieve multimedia objects on the fly, in a similar way to a human expert. We present and demonstrate such a supervised dynamic multimedia object categorisation system. Our categorisation system comes about by continuously gauging it against a group of human experts who annotate raw multimedia for a certain domain ontology given a usage context. Thus our system effectively learns the categorisation behaviour of human experts. By inducing supervised multi-modal content- and context-dependent potentials, our categorisation system associates field strengths of raw dynamic multimedia object categorisations with those human experts would assign. After a sufficiently long period of supervised machine learning we arrive at automated, robust, and discriminative multimedia categorisation. We demonstrate the usefulness and effectiveness of our multimedia categorisation system in retrieving semantically meaningful soccer-video fragments, in particular by taking advantage of multimodal and domain-specific information and knowledge supplied by human experts.
Music query-by-humming has attracted much research interest recently. It is a challenging problem, since the hummed query inevitably contains much variation and inaccuracy. Furthermore, the similarity computation between the query tune and the reference melody is not easy due to the difficulty in ensuring proper alignment. This is because the query tune can be rendered at an unknown speed and is usually an arbitrary subsequence of the target reference melody. Many of the previous methods, which adopt note segmentation and string matching, suffer drastically from errors in the note segmentation, which affect retrieval accuracy and efficiency. Some methods solve the alignment issue by controlling the speed of the articulation of queries, which is inconvenient because it forces users to hum along to a metronome. Other techniques introduce arbitrary rescaling in time, but this is computationally very inefficient. In this paper, we introduce a melody alignment technique that addresses the robustness and efficiency issues. We also present a new melody similarity metric, computed directly on the melody contours of the query data. This approach cleanly separates alignment and similarity measurement in the search process. We show how to robustly and efficiently align the query melody with the reference melodies and how to measure the similarity subsequently. We have carried out extensive experiments. Our melody alignment method can reduce the matching candidates to 1.7% with a 95% correct alignment rate. The overall retrieval system achieved 80% recall in the top-10 rank list. The results demonstrate the robustness and effectiveness of the proposed methods.
In western art music, composers communicate their work to performers via a standard notation which specifies the musical pitches and relative timings of notes. This notation may also include some higher-level information such as variations in the dynamics, tempo and timing. Famous performers are characterised by their expressive interpretation, the ability to convey structural and emotive information within the given framework. The majority of work on audio content analysis focuses on retrieving score-level information; this paper reports on the extraction of parameters describing the performance, a task which requires a much higher degree of accuracy. Two systems are presented: BeatRoot, an off-line beat tracking system which finds the times of musical beats and tracks changes in tempo throughout a performance, and the Performance Worm, a system which provides a real-time visualisation of the two most important expressive dimensions, tempo and dynamics. Both of these systems are being used to process data for a large-scale study of musical expression in classical and romantic piano performance, which uses artificial intelligence (machine learning) techniques to discover fundamental patterns or principles governing expressive performance.
We give an overview of existing audio analysis approaches in the compressed domain and incorporate them into a coherent formal structure. After examining the kinds of information accessible in an MPEG-1 compressed audio stream, we describe a coherent approach to determine features from them and report on a number of applications they enable. Most of them aim at creating an index to the audio stream by segmenting the stream into temporally coherent regions, which may be classified into pre-specified types of sounds such as music, speech, speakers, animal sounds, sound effects, or silence. Other applications centre around sound recognition such as gender, beat or speech recognition.
Video contains multiple types of audio and visual information, which are difficult to extract, combine or trade off in general video information retrieval. This paper provides an evaluation of the effects of different types of information used for video retrieval from a video collection. A number of different sources of information are present in most typical broadcast video collections and can be exploited for information retrieval. We discuss the contributions of automatically recognized speech transcripts, image similarity matching, face detection and video OCR in the context of experiments performed as part of the 2001 TREC Video Retrieval Track evaluation conducted by the National Institute of Standards and Technology. For the queries used in this evaluation, image matching and video OCR proved to be the deciding aspects of video information retrieval.
Casey describes a generalized sound recognition framework based on reduced-rank spectra and minimum-entropy priors. This approach enables successful recognition of a wide variety of sounds, such as male speech, female speech, music, and animal sounds. In this work, we apply this recognition framework to news video to enable quick video browsing. We identify speaker-change positions in broadcast news using the sound recognition framework. We combine the speaker-change positions with color and motion cues from the video and are able to locate the beginning of each of the topics covered by the news video. We can thus skim the video by merely playing a small portion starting from each of the locations where one of the principal cast begins to speak. In combination with our motion-based video browsing approach, our technique provides simple automatic news video browsing. While similar work has been done before, our approach is simpler and faster than competing techniques, and provides a rich framework for further analysis and description of content.
We present a framework for analyzing the structure of digital media streams. Though our methods work for video, text, and audio, we concentrate on detecting the structure of digital music files. In the first step, spectral data is used to construct a similarity matrix calculated from inter-frame spectral similarity. The digital audio can be robustly segmented by correlating a kernel along the diagonal of the similarity matrix. Once segmented, spectral statistics of each segment are computed. In the second step, segments are clustered based on the self-similarity of their statistics. This reveals the structure of the digital music in a set of segment boundaries and labels. Finally, the music is summarized by selecting clusters with repeated segments throughout the piece. The summaries can be customized for various applications based on the structure of the original music.
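The segmentation step can be sketched as a cosine similarity matrix with a checkerboard kernel correlated along its diagonal, in the spirit of Foote's novelty-score approach; the spectral feature extraction and the later clustering stage are omitted, and the frame features below are a hypothetical input:

```python
import numpy as np

def similarity_matrix(frames):
    # frames: N x d matrix of per-frame spectral features;
    # S[i, j] = cosine similarity between frames i and j
    f = frames / (np.linalg.norm(frames, axis=1, keepdims=True) + 1e-12)
    return f @ f.T

def checkerboard_kernel(w):
    # +1 on the two within-segment quadrants, -1 on the cross-segment ones
    k = np.ones((2 * w, 2 * w))
    k[:w, w:] = -1
    k[w:, :w] = -1
    return k

def novelty(S, w=4):
    # correlate the kernel along the diagonal; peaks mark segment boundaries
    k = checkerboard_kernel(w)
    n = len(S)
    scores = np.zeros(n)
    for i in range(w, n - w):
        scores[i] = np.sum(S[i - w:i + w, i - w:i + w] * k)
    return scores
```

On a stream whose first half sounds alike and whose second half sounds alike, the novelty score peaks exactly at the changeover frame, giving the segment boundaries that the clustering step then labels.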
By extracting representative image features and employing our recently developed perceptual distance function (dynamic partial function), image copy detection can be performed effectively. Our empirical study shows that our scheme can detect various forms of near-replicas with high accuracy. Thus, our system has application for protection of copyrighted images and trademarks.
In this paper, we present the architecture of a resource management and adaptation framework that goes beyond existing peer-to-peer and content delivery infrastructures to accommodate and accelerate multimedia peer applications and services. We propose key technology components that allow the seamless adaptation of resources to enhance quality of service, and the building of better tools and applications that utilize the underlying power of the peer-computing network. We also show a prototype system that integrates the various components, as well as some sample applications that can be built on the proposed infrastructure.
Detecting unauthorized copies of digital media (images, audio, and video) is a basic requirement for Intellectual Property Rights (IPR) protection. This paper proposes a novel method to detect copies of digital images. This copy detection scheme can be used as either an alternative or a complementary approach to watermarking. A test image is first reduced to an 8×8 sub-image by intensity averaging; then the AC coefficients of its discrete cosine transform (DCT) are used to compute its distance from those generated from the query image, of which a user wants to find copies. A challenge arises when copies are processed to avoid copy detection or to enhance image quality. We show that the ordinal measure of DCT coefficients, which is based on the relative ordering of AC magnitude values and uses distance metrics between two rank permutations, is robust to various modifications of the original image. An optimal threshold selection scheme using the maximum a posteriori (MAP) criterion is also addressed. Through simulations on a database of 40,000 images we show the effectiveness of the proposed system.
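The ordinal signature can be sketched as follows: average-reduce the image to 8×8, apply a 2-D DCT, and keep only the rank permutation of the AC magnitudes. This is a simplified reading of the scheme (the actual system also selects the detection threshold via the MAP criterion, which is not shown):

```python
import numpy as np

def dct2(block):
    # unnormalized 2-D DCT-II of a square block; only the relative
    # *ordering* of coefficients matters here, so scale factors are omitted
    n = block.shape[0]
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    return C @ block @ C.T

def ordinal_signature(img):
    # average-reduce the grayscale image to an 8x8 sub-image
    h, w = img.shape
    bh, bw = h // 8, w // 8
    sub = img[:8 * bh, :8 * bw].reshape(8, bh, 8, bw).mean(axis=(1, 3))
    ac = dct2(sub).ravel()[1:]                 # drop the DC coefficient
    return np.argsort(np.argsort(np.abs(ac)))  # rank permutation of AC magnitudes

def ordinal_distance(s1, s2):
    # L1 distance between rank permutations: 0 means an exact ordinal match
    return int(np.sum(np.abs(s1 - s2)))
```

The robustness argument is visible directly: a uniform brightness shift only moves the DC coefficient, and uniform contrast scaling multiplies all AC magnitudes equally, so neither changes the rank permutation at all.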
This document presents several approaches to extracting interest points within compressed images (based on DCT compression methods). The goal is to minimize the stages and/or the computational cost of image-sequence indexing tasks or database retrieval from a large MPEG file repository.
Initially, only the still images (I-frames) are taken into consideration; motion will be integrated in further research. The traditional invariant feature points (Harris corner points, points with remarkable principal curvatures) are extracted from images using a gradient estimate (first-order derivative) or the Laplacian (second-order derivative) of an image. The first part of this paper therefore handles in detail the derivation of the signal from DCT blocks.
The attempts to implement feature-point detection as close as possible to the DCT coefficients are explained. Results provided by our latest DCT-blockwise curvature estimator are also shown.
Whilst storage and capture technologies are able to cope with huge numbers of images, the difficulty of access is in danger of rendering many image repositories valueless. This paper proposes a similarity measure that imposes only very weak assumptions on the nature of the features used in the recognition process. This approach does not make use of a pre-defined set of feature measurements which are extracted from a query image and used to match those from database images, but instead generates features on a trial-and-error basis during the calculation of the similarity measure. This has the significant advantage that the features that determine similarity can match whatever image property is important in a particular region, whether it be a shape, a texture, a colour or a combination of all three. It means that effort is expended searching for the best feature for the region rather than expecting that a fixed feature set will perform optimally over the whole area of an image and over every image in a database. The similarity measure is evaluated on the problem of distinguishing similar shapes in sets of black and white symbols.
In content-based image indexing and retrieval (IIR), hue-component histograms are widely used for indexing the images in an image database. The task is to retrieve all color images whose hue distributions are within some threshold distance of the query image’s. Edit distance has been successfully used as a similarity measure. Our earlier O(b²) algorithm for computing the edit distance between two angular histograms, where b is the number of bins in the hue histogram, tends to be too slow for users to wait for the outputs when applied to every image in the database. For this reason, we design two filtration functions that quickly eliminate most color images from consideration as possible outputs; exact edit distances are then computed only for the remaining images. We are still guaranteed to find all similar hue distributions, and the filtration technique gives significant speed-ups.
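The filtration idea is an instance of the generic filter-and-refine pattern: any cheap function that never exceeds the true distance can safely discard candidates before the expensive O(b²) computation, and no true match is ever lost. A hypothetical sketch, with L1 distance standing in for the paper's edit distance and the absolute mass difference as its lower bound:

```python
def l1_distance(h1, h2):
    # stand-in "exact" distance (the real system computes edit distance)
    return sum(abs(x - y) for x, y in zip(h1, h2))

def mass_bound(h1, h2):
    # |sum(h1) - sum(h2)| <= L1(h1, h2), so this is a valid lower bound
    return abs(sum(h1) - sum(h2))

def filter_and_refine(query, db, threshold, lower_bound=mass_bound, exact=l1_distance):
    # filtration step: any histogram whose lower bound already exceeds the
    # threshold cannot be within threshold of the query, so drop it cheaply
    survivors = [i for i, h in enumerate(db) if lower_bound(query, h) <= threshold]
    # refinement step: run the expensive distance only on the survivors
    return [i for i in survivors if exact(query, db[i]) <= threshold]
```

The speed-up comes entirely from how many candidates the bound removes; the correctness guarantee needs only the lower-bound property, which is why two different filtration functions can be chained without affecting the result set.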
The explosive growth of images and videos on the World Wide Web (WWW) is making the Web into a huge resource of visual information. Among the various types of multimedia information, still images and dynamic images (video clips) in compressed formats are the most widely accepted on the WWW. Therefore, achieving maximum efficiency in transmitting and decoding those compressed images on the Internet becomes an essential issue. Progressive coding provides a mode that allows a coarse version of an image to be transmitted at a lower bit rate and then gradually refined by subsequent transmissions. Compared with conventional coding, it is more suitable for interactive applications such as those involving JPEG images on the Internet. In this paper, we first give an approximation of the cosine function used in the IDCT for various orders. Based on this approximation and a series analysis, we then develop a progressive decoding scheme that combines successive approximation and spectral selection. The analysis and experiments establish that our proposed method saves significant computational cost in comparison with the existing spectral-selection-based progressive decoding defined by JPEG. Extensive experiments carried out to evaluate the proposed algorithm reveal that the reconstructed images, even at the lowest bit rate and with a lower-order approximation, can still achieve encouraging PSNR values.
In this paper, we present an approach to extract scenes in video. The approach is top-down and uses video editing rules and audio cues to extract simple dialog and action scenes. The underlying model is a finite state machine coupled with audio cues that are determined using an audio classifier.
Video segmentation is fundamental to a number of applications related to video retrieval and analysis. Shot change detection is the initial step of video segmentation and indexing. There are two basic types of shot changes. One is the abrupt change or cut, and the other is the gradual shot transition. The smooth variations of the video feature values in a gradual transition produced by the editing effects are often confused with those caused by camera or object motions. To overcome this difficulty, it is reasonable to estimate the motions and suppress the disturbance caused by them. In this paper, we explore the possibility of exploiting motion and illumination estimation in a video sequence to detect both abrupt and gradual shot changes. A generalized optical flow constraint that includes an illumination parameter to model local illumination changes is employed in the motion and illumination estimation. An iterative process is used to refine the generalized optical flow constraints step by step. A robust measure, the likelihood ratio of corresponding motion-compensated blocks in consecutive frames, is used for detecting abrupt changes. For the detection of gradual shot transitions, we compute the average monotony of intensity variations on the stable pixels in the images in a twin-comparison framework. We test the proposed algorithm on a number of video sequences in TREC 2001 and compare the detection results with the best results reported in the TREC 2001 benchmark. The comparisons indicate that the proposed shot change detection algorithm is competitive with the best existing algorithms.
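The likelihood-ratio measure between corresponding blocks is commonly computed from block means and variances; a sketch assuming the standard formulation (the paper's exact measure may differ in detail) — the ratio is near 1 for statistically similar blocks and grows when a cut changes the block statistics:

```python
def likelihood_ratio(block_a, block_b):
    """Likelihood ratio between two (motion-compensated) pixel blocks."""
    def stats(block):
        m = sum(block) / len(block)
        v = sum((x - m) ** 2 for x in block) / len(block)
        return m, v
    m1, v1 = stats(block_a)
    m2, v2 = stats(block_b)
    num = ((v1 + v2) / 2 + ((m1 - m2) / 2) ** 2) ** 2
    den = v1 * v2 if v1 * v2 > 0 else 1e-12  # guard flat blocks
    return num / den
```

A cut is declared when the ratio exceeds a threshold for enough blocks of the frame.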
We present a technique for rapidly generating highlights of soccer videos using peaks in audio volume in conjunction with temporal patterns of motion activity extracted in the compressed domain. Our intuition is that any interesting event, such as a goal, in a soccer match leads to an interruption of the game for a non-trivial duration. Furthermore, interesting events are associated with a sharp increase (or peak) in audio volume since the crowd noise goes up in anticipation of the event or as a result of the event. We thus use the temporal patterns of motion activity around each audio peak to detect and capture interesting events. Our preliminary results indicate that the scheme works well for a variety of soccer content from different parts of the world. The computational simplicity of our scheme enables rapid and flexible generation of highlights.
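The intuition can be sketched as follows (the thresholds and window length below are illustrative, not the paper's values): flag an audio-volume peak as a candidate event when motion activity stays low, i.e. play pauses, just after it.

```python
def find_highlights(volume, activity, vol_thresh=0.8, act_thresh=0.2, pause_len=3):
    """Candidate events: audio-volume peaks followed by a run of
    low motion activity (the post-event interruption of play)."""
    events = []
    for t, v in enumerate(volume):
        if v < vol_thresh:
            continue  # no crowd-noise peak here
        window = activity[t + 1 : t + 1 + pause_len]
        if len(window) == pause_len and all(a < act_thresh for a in window):
            events.append(t)
    return events
```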
Many recent efforts have been made to automatically index multimedia content with the aim of bridging the semantic gap between syntax and semantics. In this paper, we propose a novel framework to automatically index video using context for video understanding. First we discuss the notion of context and how it relates to video understanding. Then we present the framework we are constructing, which is modeled as an expert system that uses a rule-based engine, domain knowledge, visual detectors (for objects and scenes), and different data sources available with the video (metadata, text from automatic speech recognition, etc.). We also describe our approach to align text from speech recognition and video segments, and present experiments using a simple implementation of our framework. Our experiments show that context can be used to improve the performance of visual detectors.
This work aims at recovering the temporal structure of a broadcast tennis video from an analysis of the raw footage. Our method relies on a statistical model of the interleaving of shots in order to group shots into predefined classes representing structural elements of a tennis video. This stochastic modeling is performed in the global framework of Hidden Markov Models (HMMs). The fundamental units are shots and transitions. In a first step, color and motion attributes of segmented shots are used to map shots into two classes: game (view of the full tennis court) and non-game (medium views, close-ups, and commercials). In a second step, a trained HMM is used to analyze the temporal interleaving of shots. This analysis results in the identification of more complex structures, such as first missed services, short rallies that could be aces or services, long rallies, breaks that signal the end of a game, and replays that highlight interesting points. These higher-level unit structures can be used either to create summaries or to allow non-linear browsing of the video.
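At the core of the second step is standard HMM decoding; a minimal Viterbi sketch with made-up states and probabilities (the paper's trained model is of course richer):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence given shot observations."""
    # V[t][s] = (best probability of a path ending in state s at t, predecessor)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prev = max(states, key=lambda p: V[t - 1][p][0] * trans_p[p][s])
            V[t][s] = (V[t - 1][prev][0] * trans_p[prev][s] * emit_p[s][obs[t]], prev)
    # backtrack from the best final state
    last = max(states, key=lambda s: V[-1][s][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        last = V[t][last][1]
        path.append(last)
    return path[::-1]
```

In the real system the observations are shot classes ("game"/"non-game") and durations, and the hidden states are structural units such as rallies and breaks.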
In this paper, we present a unified framework for semantic shot classification in sports videos. Unlike previous approaches, which focus on clustering by aggregating shots with similar low-level features, the proposed scheme makes use of domain knowledge of a specific sport to perform top-down video shot classification, including identification of the video shot classes for each sport, and supervised learning and classification of the given sports video with low-level and middle-level features extracted from the video. It is observed that for each sport we can predefine a small number of semantic shot classes, about 5-10, which cover 90-95% of sports broadcast video. With the supervised learning method, we can map the low-level features to middle-level semantic video shot attributes such as dominant object motion (a player), camera motion patterns, and court shape. On the basis of an appropriate fusion of these middle-level attributes, we classify video shots into the predefined shot classes, each of which has a clear semantic meaning. The proposed method has been tested on four types of sports videos: tennis, basketball, volleyball, and soccer. Good classification accuracy of 85-95% has been achieved. With correctly classified sports video shots, further structural and temporal analysis, such as event detection, video skimming, and table-of-contents generation, will be greatly facilitated.
One of the major challenges facing current media management systems and the related applications is the so-called “semantic gap” between the rich meaning that a user desires and the shallowness of the content descriptions that are automatically extracted from the media. In this paper, we address the problem of bridging this gap in the sports domain. We propose a general framework for indexing and summarizing sports broadcast programs. The framework is based on a high-level model of sports broadcast video using the concept of an event, defined according to domain-specific knowledge for different types of sports. Within this general framework, we develop automatic event detection algorithms that are based on automatic analysis of the visual and aural signals in the media. We have successfully applied the event detection algorithms to different types of sports including American football, baseball, Japanese sumo wrestling, and soccer. Event modeling and detection contribute to the reduction of the semantic gap by providing rudimentary semantic information obtained through media analysis. We further propose a novel approach, which makes use of independently generated rich textual metadata, to fill the gap completely through synchronization of the information-laden textual data with the basic event segments. An MPEG-7 compliant prototype browsing system has been implemented to demonstrate semantic retrieval and summarization of sports video.
Using video analysis for detecting hazardous events such as fire/smoke activity, impending threats, or suspicious behaviors has spurred new research for security concerns. To make such detection reliable, researchers must overcome difficulties such as classification by the importance of consequences, imbalances of positive and negative data, environmental factors, and variation in camera capabilities. This paper puts forward a general framework for hazardous event detection which includes spatial-temporal feature extraction, statistical-based classification for biased data and calibration for environmental change. At the current stage of development, the framework can work effectively for detecting hazardous events like fire/smoke from video sequences.
We propose a fully automatic and computationally efficient framework for analysis and summarization of soccer videos using cinematic and object-based features. The proposed framework includes some novel low-level soccer video processing algorithms, such as dominant color region detection, robust shot boundary detection, and shot classification, as well as some higher-level algorithms for goal detection, referee detection, and penalty-box detection. The system can output three types of summaries: i) all slow-motion segments in a game, ii) all goals in a game, and iii) slow-motion segments classified according to object-based features. The first two types of summaries are based on cinematic features only for speedy processing, while the summaries of the last type contain higher-level semantics. The proposed framework is efficient, effective, and robust for soccer video processing. It is efficient in the sense that there is no need to compute object-based features when cinematic features are sufficient for the detection of certain events, e.g. goals in soccer. It is effective in the sense that the framework can also employ object-based features when needed to increase accuracy (at the expense of more computation). The efficiency, effectiveness, and the robustness of the proposed framework are demonstrated over a large data set, consisting of more than 13 hours of soccer video, captured at different countries and conditions.
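The cinematic side of the framework can be sketched from the dominant (field) color ratio alone; the thresholds and color test below are illustrative, not the paper's learned values:

```python
def dominant_color_ratio(frame, is_field_color):
    """Fraction of pixels matching the learned dominant (grass) color.
    `frame` is a 2-D grid of (r, g, b) tuples."""
    pixels = [p for row in frame for p in row]
    return sum(1 for p in pixels if is_field_color(p)) / len(pixels)

def classify_shot(ratio, long_t=0.5, medium_t=0.15):
    """Toy cinematic shot classifier from the grass-pixel ratio:
    long shots show mostly field, close-ups show almost none."""
    if ratio >= long_t:
        return "long"
    if ratio >= medium_t:
        return "medium"
    return "close-up"
```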
Digital television (digiTV) is an additional multimedia environment, where metadata is one key element for the description of arbitrary content. This implies adequate structures for content description, which is provided by XML metadata schemes (e.g. MPEG-7, MPEG-21). Content and metadata management is the task of a multimedia repository, from which digiTV clients - equipped with an Internet connection - can access rich additional multimedia types over an “All-HTTP” protocol layer. Within this research work, we focus on conceptual design issues of a metadata repository for the storage of metadata, accessible from the feedback channel of a local set-top box. Our concept describes the whole heterogeneous life-cycle chain of XML metadata from the service provider to the digiTV equipment, device independent representation of content, accessing and querying the metadata repository, management of metadata related to digiTV, and interconnection of basic system components (http front-end, relational database system, and servlet container). We present our conceptual test configuration of a metadata repository that is aimed at a real-world deployment, done within the scope of the future interaction (fiTV) project at the Digital Media Institute (DMI) Tampere (www.futureinteraction.tv).
In general, video is too lengthy to browse in full, so many efforts have been made to browse video content quickly and effectively. Video summarization is one such technique: a video summary comprises a number of key frames. We therefore propose a method to extract key frames from video in the MPEG compressed domain. The proposed method extracts a simple 2D content curve reflecting the variation of the video content directly from the compressed MPEG video, approximates the curve by polygonal lines, and then extracts key frames from the approximated lines effectively and rapidly. The proposed method also lets the user set the number of key frames.
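The curve-approximation step can be sketched as a greedy polyline fit (a sketch; the paper's approximation procedure may differ): repeatedly add the frame that deviates most from the current chord, then return the polyline vertices as key frames.

```python
def keyframes_from_curve(curve, n_keyframes):
    """Approximate the content curve by a polyline and return the
    indices of its vertices as key frames (greedy max-deviation split)."""
    def deviation(i, a, b):
        # distance of point i from the straight line between vertices a, b
        t = (i - a) / (b - a)
        interp = curve[a] + t * (curve[b] - curve[a])
        return abs(curve[i] - interp)
    verts = [0, len(curve) - 1]
    while len(verts) < n_keyframes:
        verts.sort()
        best = None
        for a, b in zip(verts, verts[1:]):
            for i in range(a + 1, b):
                d = deviation(i, a, b)
                if best is None or d > best[0]:
                    best = (d, i)
        if best is None or best[0] == 0:
            break  # curve already fits the polyline exactly
        verts.append(best[1])
    return sorted(verts)
```

Because the user asks for `n_keyframes`, the loop stops exactly when that many vertices have been placed, matching the abstract's requirement that the number of key frames is user-selectable.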
We propose a semantic event detection method using MPEG-7. In the proposed method, the content description techniques of MPEG-7 are adopted in the detection algorithm to extract, represent, reuse, and interoperate low-level features, and multiple descriptors are used to improve efficiency. In this paper, shots and key frames provide hints for semantic event detection via predefined inference. Each shot is assigned a semantic meaning using MPEG-7 descriptors with an example image or image sequence. Events are then detected by segmenting the shots.
In this paper we present an audio segmentation technique that searches for similar sections of a song. The search is performed on MPEG-7 low-level audio feature descriptors, a growing source of multimedia metadata. These descriptors are available for every 10 ms of audio data. For each block, the similarity to every other block is determined. The result of this operation is a matrix containing off-diagonal stripes that represent similar regions. Some post-processing is then necessary because the similarity matrix has a very noisy structure. Using the a priori knowledge that the off-diagonal stripes we seek must represent several seconds of audio data, we implemented a filter to enhance the structure of the similarity matrix. The last step is to extract the off-diagonal stripes and map them into the time domain of the audio data.
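The matrix construction and stripe-enhancing filter can be sketched as follows (toy features and a simple diagonal-averaging filter; the system's actual filter is tuned to multi-second stripes):

```python
def similarity_matrix(features, sim):
    """Pairwise block-similarity matrix for per-block feature vectors."""
    n = len(features)
    return [[sim(features[i], features[j]) for j in range(n)] for i in range(n)]

def diagonal_filter(S, length):
    """Average similarity along diagonal segments of `length` blocks --
    emphasizes off-diagonal stripes, i.e. repeated sections, while
    suppressing isolated noisy matches."""
    n = len(S)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n - length + 1):
        for j in range(n - length + 1):
            out[i][j] = sum(S[i + k][j + k] for k in range(length)) / length
    return out
```

A high value at `out[i][j]` then means the `length` blocks starting at `i` repeat the blocks starting at `j`, which maps directly back to a time range of the song.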
Object shape features are powerful when used in similarity search-and-retrieval and object recognition, because object shape is usually strongly linked to object functionality and identity. Many applications, including those concerned with visual object retrieval or indexing, are likely to use shape features. Those systems have to cope with scaling, rotation, deformation, and partial occlusion of the objects to be described. The ISO standard MPEG-7 contains different shape descriptors, of which we focus especially on the region-shape descriptor. Since we found that the region-shape descriptor is not very robust against partial occlusion, we propose a slightly modified feature extraction method based on central moments. Furthermore, we compare our method with the original region-shape implementation and show that, with the proposed changes, the robustness of the region-shape descriptor against partial occlusion can be significantly increased.
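Central moments of a binary region are straightforward to compute; a minimal sketch (the descriptor's actual feature set built on these moments is more involved):

```python
def central_moment(pixels, p, q):
    """Central moment mu_pq of a binary shape given as (x, y) pixels:
    moments are taken about the centroid, so they are translation-invariant."""
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    return sum((x - cx) ** p * (y - cy) ** q for x, y in pixels)

def normalized_moment(pixels, p, q):
    """Scale-normalized central moment eta_pq: dividing by a power of
    the area (mu_00) additionally removes scale dependence."""
    mu00 = len(pixels)  # mu_00 is the area of a binary shape
    return central_moment(pixels, p, q) / mu00 ** (1 + (p + q) / 2)
```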
Driven by the increasing amount of music available electronically, the need for and feasibility of automatic music classification systems are becoming more and more important. Currently, most search engines for music are based on textual descriptions such as artist and/or title.
This paper presents a system for automatic music description, classification, and visualization of a set of songs. The system is designed to extract significant features of a piece of music in order to find songs of a similar genre or with similar sound characteristics. The description is done with the help of MPEG-7 only; the classification and visualization are done with the self-organizing map algorithm.
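A minimal 1-D self-organizing map sketch (toy one-dimensional feature vectors; the system's MPEG-7 features and map are far richer):

```python
import math

def train_som(data, n_units, epochs=30, lr=0.5):
    """Train a 1-D SOM: units start evenly spread over the data range,
    then each sample pulls its best-matching unit (and, more weakly,
    that unit's neighbors) toward itself."""
    dim = len(data[0])
    lo = [min(v[d] for v in data) for d in range(dim)]
    hi = [max(v[d] for v in data) for d in range(dim)]
    units = [[lo[d] + (hi[d] - lo[d]) * u / (n_units - 1) for d in range(dim)]
             for u in range(n_units)]
    for e in range(epochs):
        # neighborhood radius shrinks over time so units specialize
        radius = max(0.5, (n_units / 2) * (1 - e / epochs))
        for x in data:
            bmu = min(range(n_units),
                      key=lambda u: sum((w - a) ** 2 for w, a in zip(units[u], x)))
            for u in range(n_units):
                h = math.exp(-((u - bmu) ** 2) / (2 * radius ** 2))
                units[u] = [w + lr * h * (a - w) for w, a in zip(units[u], x)]
    return units

def map_song(units, x):
    """Index of the map unit a song's feature vector lands on; nearby
    indices mean similar-sounding songs."""
    return min(range(len(units)),
               key=lambda u: sum((w - a) ** 2 for w, a in zip(units[u], x)))
```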
Efficient content-based image retrieval of biomedical images is a challenging problem of growing research interest. Feature representation algorithms used in indexing medical images on the pathology of interest have to address the conflicting goals of reducing feature dimensionality while retaining important and often subtle biomedical features. At the Lister Hill National Center for Biomedical Communications, an R&D division of the National Library of Medicine, we are developing a content-based image retrieval system for digitized images from a collection of 17,000 cervical and lumbar x-rays taken as part of the second National Health and Nutrition Examination Survey (NHANES II). Shape is the only feature that effectively describes the various pathologies identified by medical experts as being consistently and reliably found in the image collection. In order to determine whether the state of the art in shape representation methods is suitable for this application, we have evaluated representative algorithms selected from the literature. The algorithms were tested on a subset of 250 vertebral shapes. In this paper we present the requirements of an ideal algorithm, define the evaluation criteria, and present the results and our analysis of the evaluation. We observe that while the shape methods perform well on visual inspection of the overall shape boundaries, they fall short of meeting the need to determine similarity between vertebral shapes based on the pathology.
This paper presents the development of a human brain multimedia database for surgical candidacy determination in temporal lobe epilepsy. The focus of the paper is on content-based image management, navigation, and retrieval. Several medical image-processing methods, including our newly developed segmentation method, are utilized for information extraction/correlation and indexing. The input data include T1- and T2-weighted MRI and FLAIR MRI and ictal and interictal SPECT modalities, with associated clinical data and EEG data analysis. The database can answer queries regarding issues such as the correlation between the attribute X of the entity Y and the outcome of a temporal lobe epilepsy surgery. The entity Y can be a brain anatomical structure such as the hippocampus. The attribute X can be either a functionality feature of the anatomical structure Y, calculated with SPECT modalities, such as signal average, or a volumetric/morphological feature of the entity Y such as volume or average curvature. The outcome of the surgery can be any surgery assessment such as memory quotient. A determination is made regarding surgical candidacy by analysis of both textual and image data. The current database system suggests a surgical determination for cases with a relatively small hippocampus and a high signal intensity average on FLAIR images within the hippocampus. This indication largely agrees with the surgeons' expectations/observations. Moreover, as the database becomes more populated with patient profiles and individual surgical outcomes, data mining methods may discover partially invisible correlations between the contents of different modalities of data and the outcome of the surgery.
This paper presents the use of wavelet techniques to compress and store the electroencephalographic (EEG) signal in a multichannel EEG system. The system consists of the following components: a multichannel bio-amplifier, analog filters, an ADC, a microprocessor, a DSP, PCMCIA memory, etc. The algorithms to compress the EEG signal have been implemented in C/C++. The proposed digital FIR compression filter uses coefficients chosen as the coefficients of Daubechies wavelets. Experiments with the implemented procedures yield the compression ratio and SNR values for the EEG signal in the case of real-time compression. Values of the real-time compression and storage parameters are presented for the DSP and an AMD586 processor. A backpropagation neural network was used to identify EEG patterns in the case of epilepsy.
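One low-pass analysis branch of such a filter can be sketched with the db2 (4-tap) Daubechies coefficients; FIR filtering followed by decimation by two halves the data rate, which is the basic compression step (a sketch only; the paper's filter bank and quantization are not specified here):

```python
# db2 (Daubechies 4-tap) low-pass analysis coefficients
DB2 = [0.48296291314453416, 0.8365163037378079,
       0.2241438680420134, -0.12940952255126037]

def fir_downsample(signal, taps):
    """FIR filter + decimate by 2: one low-pass branch of a wavelet
    analysis step, producing half as many output samples."""
    out = []
    for i in range(0, len(signal) - len(taps) + 1, 2):
        out.append(sum(t * signal[i + k] for k, t in enumerate(taps)))
    return out
```

For a constant input each output equals the sum of the taps, which for a Daubechies low-pass filter is sqrt(2).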
This paper proposes an algorithm for relevance feedback in region-based image retrieval systems. In such systems one image is usually represented by many regions, each of which is represented by a feature vector. Because traditional feedback algorithms are based on the one-vector model, it is hard to apply them directly to a region-based image retrieval system. In this paper we propose a novel feedback algorithm that clusters the regions of all feedback images in a region-based image retrieval system. The regions of an image are divided into two parts based on the feedback images: foreground regions and background regions. Foreground regions capture the common property of all feedback images and can be viewed as belonging to one semantic category; the remaining background regions may belong to different semantic categories. During feedback, the two kinds of regions are treated differently. Experimental results show that the algorithm improves the retrieval performance of region-based image retrieval systems.
Relevance feedback in content-based image retrieval has been an active research focus for many years. It uses user-labeled information to re-adjust the measurement of similarity between images and thus improve the retrieval results. In this paper we propose a simple and effective approach to image relevance feedback, which uses both positive and negative examples labeled by users to refine the query and update the distance measurement dynamically. Our method not only has very low complexity but also adapts well to changes in the user's retrieval interests. Experimental results on a database of 7,000 images represented by MPEG-7 color and texture descriptors show the efficiency of our algorithm compared with two other existing algorithms.
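A sketch assuming a Rocchio-style update (one standard formulation of query refinement from positive and negative examples; the paper's exact rule and weights may differ):

```python
def refine_query(query, positives, negatives, alpha=1.0, beta=0.75, gamma=0.25):
    """Move the query vector toward the centroid of positive examples
    and away from the centroid of negative examples."""
    dim = len(query)
    def centroid(vecs):
        if not vecs:
            return [0.0] * dim
        return [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]
    cp, cn = centroid(positives), centroid(negatives)
    return [alpha * query[d] + beta * cp[d] - gamma * cn[d] for d in range(dim)]
```

Each feedback round re-runs retrieval with the refined query, so the result list tracks the user's shifting interests.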
A data warehouse contains many materialized views over the data provided by distributed heterogeneous databases, for the purpose of efficiently implementing decision-support or OLAP queries. It is important to select the right views to materialize to answer a given set of queries. In this paper, we address this problem and design an algorithm that selects a set of views to materialize so as to answer the most queries under a given space constraint. The algorithm aims at finding a minimal set of views with which we can directly respond to as many user query requests as possible. We demonstrate our approach with experiments. A performance study shows that the proposed algorithm achieves lower complexity, higher speed, and good expandability.
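The abstract does not give the algorithm in detail; a plausible greedy sketch (an assumption, not the paper's method) picks, at each step, the view with the best queries-answered-per-unit-space ratio that still fits the budget:

```python
def select_views(views, space_budget):
    """Greedy view selection under a space constraint.
    `views` maps name -> (size, set_of_queries_answered)."""
    chosen, answered, used = [], set(), 0
    remaining = dict(views)
    while remaining:
        def gain(name):
            size, queries = remaining[name]
            # newly answered queries per unit of storage
            return len(queries - answered) / size
        best = max(remaining, key=gain)
        size, queries = remaining.pop(best)
        if used + size <= space_budget and queries - answered:
            chosen.append(best)
            answered |= queries
            used += size
    return chosen, answered
```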
Clustering is considered as one of the most important tools to organize and analyze large multimedia databases. In Content Based Image Retrieval (CBIR), Clustering can be used to categorize a large collection of images. This organization can be used to: (i) build an indexing structure; (ii) build a navigation system; or (iii) show the user the most representative images in a query by visual example. Most existing clustering techniques assume that the clusters have well-defined shapes (spherical or ellipsoidal). Thus, they are not suitable for image database categorization where images are usually mapped to high-dimensional feature vectors, and it is hard to even guess the shape of the clusters in the feature space. In this paper, we first describe a clustering approach, called SyMP, that can identify clusters of various shapes. Then, we demonstrate its ability to generate an efficient and compact summary of an image database. SyMP is based on synchronization of pulse-coupled oscillators. It is robust to noise and outliers, determines the number of clusters in an unsupervised manner, and identifies clusters of arbitrary shapes. The robustness of SyMP is an intrinsic property of the synchronization mechanism. To determine the optimum number of clusters, SyMP uses a dynamic and cluster dependent resolution parameter. To identify clusters of various shapes, SyMP models each cluster by an ensemble of Gaussian components.
Anchoring is a technique for representing objects by their distances to a few well chosen landmarks, or anchors. Objects are mapped to distance-based feature vectors, which can be used for content-based retrieval, classification, clustering, and relevance feedback of images, audio, and video. The anchoring transformation typically reduces dimensionality and replaces expensive similarity computations in the original domain with simple distance computations in the anchored feature domain, while guaranteeing lack of false dismissals. Anchoring is therefore surprisingly simple, yet effective, and flavors of it have seen application in speech recognition, audio classification, protein homology detection, and shape matching.
In this paper, we describe the anchoring technique in some detail and study methods for anchor selection, both from an analytical, as well as empirical, standpoint. Most work to date has largely ignored this problem by fixing the anchors to be the entire set of objects or by using greedy selection from among the set of objects. We generalize previous work by considering anchors from outside of the object space, and by deriving an analytical upper bound on the distance-approximation error of the method.
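The core mapping and its no-false-dismissal guarantee can be sketched for any metric distance (1-D points and anchors chosen purely for illustration):

```python
def anchor_embed(obj, anchors, dist):
    """Map an object to its vector of distances to the anchors."""
    return [dist(obj, a) for a in anchors]

def lower_bound(emb_a, emb_b):
    """For a metric `dist`, the triangle inequality gives
    |d(a, p) - d(b, p)| <= d(a, b) for every anchor p, so this
    L-infinity distance in anchor space never exceeds the true
    distance -- hence filtering with it dismisses no true matches."""
    return max(abs(x - y) for x, y in zip(emb_a, emb_b))
```

In a retrieval system, candidates whose lower bound already exceeds the search radius are pruned cheaply; the expensive true distance is computed only for the survivors.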
Content-based image retrieval has been an active research topic for more than a decade. Nevertheless, current image retrieval systems still have major difficulties bridging the gap between the user's implied concept and the low-level image description. To address these difficulties, this paper presents a novel image retrieval model integrating long-term learning with short-term learning. The model constructs a semantic image link network by long-term learning, which simply accumulates previous users' relevance feedback. The semantic information learned in the long-term learning process then guides the short-term learning of a new user. Image retrieval is based on a seamless combination of long-term and short-term learning. The model is easy to implement and can be efficiently applied in a practical image retrieval system. Experimental results on 10,000 images demonstrate that the proposed model is promising.
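The long-term accumulation step can be sketched as a weighted co-relevance graph (hypothetical image ids; the paper's link network and its use in short-term learning are richer):

```python
def update_links(links, relevant_images):
    """Long-term learning: strengthen pairwise links between images a
    user marked relevant in the same feedback session."""
    for i in relevant_images:
        for j in relevant_images:
            if i != j:
                links[(i, j)] = links.get((i, j), 0) + 1
    return links

def semantic_neighbors(links, image, k=5):
    """Images most strongly linked to `image` by accumulated feedback,
    used to guide a new user's short-term session."""
    cand = [(j, w) for (i, j), w in links.items() if i == image]
    return [j for j, _ in sorted(cand, key=lambda t: -t[1])[:k]]
```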