Among the most popular multimedia formats available today are QuickTime, Shockwave, Advanced Streaming Format, RealVideo and MPEG-4. Since broadband Internet became widely available, these multimedia formats have evolved considerably and become extremely popular. This article analyzes these formats against an existing reference model built on the state of the art in three areas: temporal models, computer-based descriptions and synchronization mechanisms. From these three areas, a set of ten criteria describing the reference model was derived. In this paper we first briefly explain the reference model and its ten criteria. Each of the listed multimedia formats is then mapped onto the reference model, and a comparison based on the model is given. In the conclusions we point out some of the strong and weak points of the different multimedia formats based on this comparison.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks. You are receiving this notice because your organization may not have SPIE eBooks access. Shibboleth/OpenAthens users: please sign in to access your institution's subscriptions. To obtain this item, you may purchase the complete book in print or electronic format on SPIE.org.
The number of terminals that access multimedia content over a network is rapidly increasing, and the characteristics of these terminals are growing ever more varied. In addition, their users can have different preferences. The adaptation of multimedia content to a specific terminal and/or its user has therefore become an important research issue. Such adaptation rests mainly on two aspects: the description of the multimedia content and the description of the user environment. Both can be considered metadata and can be formatted in an XML-based language (e.g., MPEG-7 and CC/PP). However, it is not yet clear how a generic mapping mechanism between two such vocabularies can be realized; we believe such a mechanism is necessary for a mature content adaptation framework. This paper describes how it can be achieved. We attach requirements and preferences of the user environment to specific aspects of the description of the multimedia content. Based on this information, we try to maximize the value of the adapted content while keeping it appropriate for the terminal. We also take into account the extensibility of the vocabularies we focus on, which means our mechanism is extensible as well.
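The adaptation decision described above can be illustrated with a small sketch: each content variation carries a value score and resource needs, a terminal description (e.g. derived from a CC/PP profile) imposes constraints, and we pick the highest-value variation that fits. All names and numbers here are illustrative assumptions, not the paper's actual mechanism.

```python
def select_variation(variations, terminal):
    """Return the highest-value variation satisfying the terminal's limits."""
    feasible = [v for v in variations
                if v["width"] <= terminal["max_width"]
                and v["bitrate"] <= terminal["max_bitrate"]]
    if not feasible:
        return None  # no variation fits this terminal
    return max(feasible, key=lambda v: v["value"])

# Hypothetical variations of one media resource and a terminal description.
variations = [
    {"name": "hd",     "width": 1280, "bitrate": 4000, "value": 1.0},
    {"name": "sd",     "width": 640,  "bitrate": 1500, "value": 0.7},
    {"name": "mobile", "width": 320,  "bitrate": 300,  "value": 0.4},
]
terminal = {"max_width": 800, "max_bitrate": 2000}
best = select_variation(variations, terminal)  # picks "sd"
```

A generic vocabulary mapping would populate the constraint checks from the metadata documents themselves rather than hard-coding the keys as done here.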
A comprehensive approach to the access of archival collections necessitates the interplay of various types of metadata standards, each fulfilling its own part within a 'metadata infrastructure'. Moreover, present-day digital libraries are often limited to managing mainly textual and image-based material; archival information systems dealing with various media types are still very rare. There is a need for a methodology for handling time-dependent media in an archival context. The aim of our research is to investigate and implement a number of tools supporting the content management of multimedia data within digital collections. A flexible and extensible framework is proposed, based on the emerging Metadata Encoding and Transmission Standard (METS). Firstly, we focus on describing archival collections according to the archival mandate of provenance, supporting art-historical research in an archive-theoretically correct manner. Secondly, we examine description tools that represent the semantics and structure of multimedia data. In this respect, we propose extending the present archival metadata framework to time-based media content described by standards such as the MPEG-7 multimedia content description standard.
One of the major challenges in digital, interactive television is to provide facilities for intelligent multimedia presentation at the consumer terminal. The end-user should benefit from a web-page-like structure whose content is browsable, rather than one monolithic broadcast stream without any interaction facilities or content adaptation. We therefore introduce a Digital Broadcast Item (DBI) that structures the broadcast content into an interactive, intelligent multimedia presentation: alongside the pushed content (the broadcast audio/video stream), adaptive content elements are streamed with the help of binary metadata streaming solutions and synchronized to the audio/video stream. Until now, broadcasting has only provided content as a monolithic structure composed of an image flow, graphics, special effects, sound effects, a single-path story flow, etc. The transport medium is a high-bitrate MPEG-2 Transport Stream (MPEG-2 TS) carrying audio/video and some low-level metadata, such as an Electronic Programme Guide (EPG). The aim of this paper is to demonstrate the concept of adaptive content customisation for white pre-marked rectangular areas in multiple orientations.
As the number of installed surveillance cameras increases, and the cost of storing the compressed digital multimedia decreases, the CCTV industry is facing the prospect of large multimedia archives where it may be very difficult to locate specific content. To be able to get the full benefit of this wealth of multimedia data, we need to be able to automatically highlight events of interest to the operator in real-time. We also need to make it possible to quickly identify and retrieve content which meets particular criteria. We show how advances in the Internet and multimedia systems can be used to effectively analyze, tag, store, search and retrieve multimedia content in surveillance systems. IP cameras are utilized for multimedia compression and delivery over the Internet or intranet. The recorded multimedia is analyzed in real-time, and metadata descriptors are automatically generated to describe the multimedia content. The emerging ISO MPEG-7 standard is used to define application-specific multimedia descriptors and description schemes, and to enforce a standard Description Definition Language (DDL) for multimedia management. Finally, a graphical multimedia retrieval application is used to provide content-based searching, browsing, retrieval and playback over the Internet or intranet.
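The automatic metadata generation step described above can be sketched as follows: an analysis module emits an XML descriptor for each detected event. The element names loosely mimic MPEG-7 style but are illustrative only; real MPEG-7 descriptions follow the standard's DDL schema.

```python
import xml.etree.ElementTree as ET

def describe_event(camera_id, start, duration, label):
    """Build a simplified, MPEG-7-flavored XML descriptor for one event.
    Element names are an assumption for illustration, not the real schema."""
    root = ET.Element("VideoSegment", id=camera_id)
    time = ET.SubElement(root, "MediaTime")
    ET.SubElement(time, "MediaTimePoint").text = start
    ET.SubElement(time, "MediaDuration").text = duration
    ET.SubElement(root, "TextAnnotation").text = label
    return ET.tostring(root, encoding="unicode")

# Hypothetical event emitted by a real-time analysis module.
xml_doc = describe_event("cam-07", "T13:45:02", "PT8S", "person enters zone A")
```

A retrieval application could then index such descriptors and answer queries like "all events on camera cam-07 longer than 5 seconds" without touching the video itself.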
This paper centers on the problem of automated visual content classification. To enable classification-based image or visual-object retrieval, we propose a new image representation scheme called the visual context descriptor (VCD), a multidimensional vector in which each element represents the frequency of a unique visual property of an image or region. The VCD utilizes predetermined quality dimensions (i.e., feature types and quantization levels) and semantic model templates mined a priori. Not only observed visual cues but also contextually relevant visual features are proportionally incorporated in the VCD. The contextual relevance of a visual cue to a semantic class is determined by correlation analysis of ground-truth samples. Such co-occurrence analysis of visual cues requires transforming a real-valued visual feature vector (e.g., color histogram, Gabor texture) into a discrete event (analogous to a term in text). Good-features-to-track, the rule of thirds, iterative k-means clustering and TSVQ are involved in transforming feature vectors into unified symbolic representations called visual terms. Similarity-based visual cue frequency estimation is also proposed and used to ensure the correctness of model learning and matching, since sparse sample data makes frequency estimation of visual cues unstable. The proposed method naturally allows the integration of heterogeneous visual, temporal or spatial cues in a single classification or matching framework, and can easily be integrated with a semantic knowledge base such as a thesaurus or ontology. Robust semantic visual model template creation and object-based image retrieval are demonstrated based on the proposed content description scheme.
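The quantization step at the heart of such a scheme can be sketched simply: real-valued feature vectors are mapped to discrete "visual terms" by nearest-centroid assignment, and a region is then summarized as a term-frequency vector. The centroids would come from k-means over training features; here they are fixed toy values, and the whole example is an illustration rather than the paper's implementation.

```python
def nearest_term(feature, centroids):
    """Index of the closest centroid: the feature's 'visual term'."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)), key=lambda i: dist2(feature, centroids[i]))

def term_frequency(features, centroids):
    """Frequency vector counting how often each visual term occurs."""
    counts = [0] * len(centroids)
    for f in features:
        counts[nearest_term(f, centroids)] += 1
    return counts

centroids = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]  # toy k-means output, k=3
features = [(0.1, 0.2), (0.9, 0.8), (0.2, 0.1), (0.1, 0.9)]
vcd = term_frequency(features, centroids)  # e.g. [2, 1, 1]
```

Once features are symbolic terms, text-retrieval machinery (co-occurrence statistics, term weighting) applies directly, which is what makes the transformation useful.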
In recent years, there has been growing interest in developing effective methods for searching large image databases based on image content. The commonly used search-by-query approach is often unsatisfactory: it can be difficult to find or produce good query images, and repetitive queries tend to become trapped among a small group of undesirable images. To overcome these problems, the user should be given easy and intuitive access to the information in image databases. In this paper we present a new browsing environment that uses the metaphor of maps. Like street maps at different scales, from a world map to a city map, the image space is represented through
Video databases became an active field of research during the last decade. The main objective of such systems is to let users search, access and play back distributed stored video data as conveniently as they do with traditional distributed databases. Such systems must therefore deal with hard issues: (a) video documents generate huge volumes of data and are time-sensitive (streams must be delivered at a specific bitrate), and (b) the content of video data is very hard to extract automatically and must be annotated manually. To cope with these issues, many approaches have been proposed in the literature, including data models, query languages and video indexing. In this paper, we present SIRSALE: a set of video database management tools that allow users to manipulate video documents and streams stored in large distributed repositories. All the proposed tools are based on generic models that can be customized for specific applications using ad-hoc adaptation modules. More precisely, SIRSALE allows users to: (a) browse video documents by structure (sequences, scenes, shots) and (b) query the video database content using a graphical tool adapted to the nature of the target video documents. This paper also presents an annotation interface that allows archivists to describe the content of video documents. All these tools are coupled to a video player integrating remote VCR functionalities and are based on active network technology. We show how dedicated active services allow optimized transport of video streams (with Tamanoir active nodes). We then describe experiments using SIRSALE on an archive of news videos and soccer matches. The system has been demonstrated to professionals with positive feedback. Finally, we discuss open issues and present some perspectives.
Multimedia database interfaces should be designed to be highly user-adaptive, since there is no generally applicable model of a user's search behavior or search intention. First, the challenging task for the interface is to present the most representative objects in an appealing and concise manner. Second, the interface has to identify the user's search intention from very little positive feedback. For the latter in particular there exist many Relevance Feedback implementations.
While most of them amount to more or less heuristically justified parameter adjustment procedures, we treat Relevance Feedback as direct probability density estimation. Our density is defined as the
In traditional Query-by-Example (QbE) user interfaces for Content-Based Image Retrieval (CBIR), finding good combinations of query examples is essential for successful retrieval. Unfortunately, traditional user interfaces are not suitable for trying different combinations of query examples: they assume that query examples are added incrementally, so the only way to refine the query is to add more example images. When no additional example is found in the result set, the search is considered to have converged. Furthermore, no place is provided to hold previous query results.
We are developing ImageGrouper, a new interface for content-based image retrieval. In this system, users can interactively compare different combinations of query examples by dragging and grouping images on the workspace (Query-by-Group). Unlike traditional systems, a group of images is treated as the basic unit of a query. Because query results are displayed on a separate pane, the user can quickly review them. Combining different query results is also easier.
ImageGrouper makes possible new image search methods that were difficult with traditional user interfaces. First, since the user can attach text annotations to each group, integrating keyword-based and content-based search becomes easy. Second, by creating a hierarchy of query examples, the user can begin by collecting relatively generic images, then narrow the search down to more specific images. Finally, the user can create multiple groups of positive examples, which lets us extend relevance feedback algorithms from a two-class problem (positive and negative) to a multi-class problem (multiple positive and negative classes).
MPEG-21 is an emerging MPEG standard that specifies a framework for transactions of multimedia content. MPEG-21 defines the fundamental concept of a digital item, the unit of transaction in the multimedia framework. A digital item can be used to package content such as a digital photograph, a video clip or movie, a musical recording with graphics and liner notes, a photo album, and so on. The packaging of the media resources, corresponding identifiers, and associated metadata is provided in the declaration of the digital item. The digital item declaration allows for more effective transaction, distribution, and management of multimedia content and its corresponding metadata, rights expressions, and variations of media resources. In this paper, we describe various challenges for multimedia content management in the MPEG-21 framework.
The design of an effective architecture for image retrieval requires careful consideration of the interplay between the three major components of a retrieval system: feature transformation, feature
representation, and similarity function. We present a review of
ongoing work on a decision theoretic formulation of the retrieval problem that enables the design of systems where all components are optimized with respect to the same end-to-end performance criteria: the minimization of the probability of retrieval error. In addition to some previously published results on the theoretical characterization of the impact of the feature transformation and representation in the probability of error, we present an efficient algorithm for optimal feature selection. Experimental results show that decision-theoretic retrieval performs well on color, texture, and generic image databases in terms of both retrieval accuracy and perceptual relevance of similarity judgments.
We present a psychophysical and analytical framework for comparing the performance of different analytical measures of motion activity in video segments against a subjective ground truth. We first construct a test set of video segments and conduct a psychophysical experiment to obtain a ground truth for motion activity. We then present several low-complexity motion activity descriptors computed from compressed-domain block motion vectors. In the first analysis, we quantize the descriptors and show that they perform well against the ground truth; we also show that the MPEG-7 motion activity descriptor is among the best. In the second analysis, we find the pairs of video segments for which the human subjects unanimously rate one as higher activity than the other, and examine the specific cases where each descriptor fails to give the correct ordering. We show that distance from the camera and strong camera motion are the main cases where motion-vector-based descriptors tend to overestimate or underestimate the intensity of motion activity. We finally discuss the experimental methodology and analysis methods we used, along with possible alternatives, and review the applications of motion activity and how the results presented here relate to those applications.
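A compressed-domain activity measure of the kind discussed above can be sketched as the average magnitude of a frame's block motion vectors, quantized into five levels as in the MPEG-7 intensity-of-activity descriptor. The thresholds below are illustrative assumptions, not the values from the standard or the paper.

```python
import math

def motion_activity(motion_vectors, thresholds=(0.5, 2.0, 5.0, 10.0)):
    """Map a frame's block motion vectors to an activity level 1 (low)..5 (high)."""
    if not motion_vectors:
        return 1  # no motion information: lowest activity
    avg = sum(math.hypot(dx, dy) for dx, dy in motion_vectors) / len(motion_vectors)
    level = 1
    for t in thresholds:  # count how many thresholds the average exceeds
        if avg > t:
            level += 1
    return level

calm = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.1)]   # e.g. a static interview shot
busy = [(6.0, 8.0), (3.0, 4.0), (12.0, 5.0)]  # e.g. a sports action shot
```

The failure modes noted in the abstract are visible even in this sketch: a distant object moving fast yields small vectors (underestimate), while a camera pan over a static scene yields large vectors everywhere (overestimate).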
Media analysis for video indexing is witnessing an increasing influence of statistical techniques. Examples of these techniques include the use of generative models as well as discriminant techniques for video structuring, classification, summarization, indexing, and retrieval. Advances in multimedia analysis are
related directly to advances in signal processing, computer vision, pattern recognition, multimedia databases, and smart sensors. This paper highlights the statistical techniques in
multimedia retrieval with particular emphasis on semantic characterization.
This paper addresses the problem of identifying speakers for movie content analysis. While most previous work on speaker identification was carried out in a supervised mode using pure audio data, more robust results can be obtained in real time by integrating knowledge from multiple media sources in an unsupervised mode. In this work, both audio and visual cues are employed and subsequently combined in a probabilistic framework to identify speakers. In particular, audio information is used to identify speakers with a maximum likelihood (ML)-based approach, while visual information is used to distinguish speakers by detecting and recognizing their talking faces, based on face detection/recognition and mouth tracking techniques. Moreover, to accommodate speakers' acoustic variations over time, we update their models on the fly by adapting to their newly contributed speech data. Encouraging results have been achieved in extensive experiments, which shows a promising future for the proposed audiovisual unsupervised speaker identification system.
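The probabilistic combination of the two modalities can be illustrated with a toy log-linear fusion: per-speaker confidences from the audio model and from the visual (talking-face) model are combined with modality weights, and the highest joint score wins. The weights and scores below are made-up illustrations, not the paper's actual formulation.

```python
import math

def fuse_scores(audio, visual, w_audio=0.6, w_visual=0.4):
    """Combine per-speaker audio and visual confidences (log-linear fusion).
    `audio` and `visual` map speaker name -> confidence in (0, 1]."""
    joint = {}
    for spk in audio:
        joint[spk] = (w_audio * math.log(audio[spk])
                      + w_visual * math.log(visual[spk]))
    return max(joint, key=joint.get)

audio_conf = {"alice": 0.5, "bob": 0.4}    # audio alone is ambiguous
visual_conf = {"alice": 0.2, "bob": 0.7}   # visual strongly favors bob
speaker = fuse_scores(audio_conf, visual_conf)  # "bob"
```

The point of the fusion is exactly the case shown: when one modality is ambiguous (similar voices), the other can resolve the decision.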
Face detection and recognition is becoming increasingly important in the contexts of surveillance, credit card fraud detection, assistive devices for the visually impaired, etc. A number of face recognition algorithms have been proposed in the literature, and the availability of a comprehensive face database is crucial for testing their performance. However, while existing publicly available face databases contain face images with a wide variety of pose angles, illumination angles, gestures, face occlusions, and illuminant colors, these images have not been adequately annotated, limiting their usefulness for evaluating the relative performance of face detection algorithms. For example, many of the images in existing databases are not annotated with the exact pose angles at which they were taken. In order to compare the performance of the various face recognition algorithms presented in the literature, there is a need for a comprehensive, systematically annotated database populated with face images captured (1) at a variety of pose angles (to permit testing of pose invariance), (2) with a wide variety of illumination angles (to permit testing of illumination invariance), and (3) under a variety of commonly encountered illumination color temperatures (to permit testing of illumination color invariance). In this paper, we present a methodology for creating such an annotated database, employing a novel apparatus for the rapid capture of face images from a wide variety of pose and illumination angles. Four different types of illumination are used, including daylight, skylight, incandescent and fluorescent. The entire set of images, as well as the annotations and the experimental results, is being placed in the public domain and made available for download over the World Wide Web.
We propose the use of approximate digital signatures of selected multimedia feature vectors for fast content-based retrieval in very large multimedia databases. We adapt and extend the Approximate Message Authentication Code (AMAC), recently introduced by some of the authors in the area of message authentication, to the multimedia search problem. An AMAC is a binary signature with the ability to reflect changes in the message it represents. The Hamming distance between two AMACs is used to measure the degree of similarity between multimedia objects. We develop a method to compress AMAC signatures into a direct lookup table that allows fast searching of a database. The color histogram is used as an example feature space to show how the signature is applied. Experimental results show that the performance of the proposed method is comparable with existing methods based on other popular metrics, while significantly decreasing search time.
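The core idea can be sketched as follows: a color histogram is thresholded into a compact binary signature, and similarity is the Hamming distance between signatures. This illustrates the principle only; a real AMAC is a keyed, majority-voted construction from the message-authentication literature, not a simple mean threshold.

```python
def binary_signature(histogram):
    """1 bit per bin: set if the bin exceeds the mean bin count."""
    mean = sum(histogram) / len(histogram)
    return [1 if h > mean else 0 for h in histogram]

def hamming(sig_a, sig_b):
    """Number of bit positions where the two signatures differ."""
    return sum(a != b for a, b in zip(sig_a, sig_b))

h1 = [10, 2, 0, 14, 3, 1, 9, 0]   # toy 8-bin color histograms
h2 = [11, 1, 0, 13, 2, 2, 8, 1]   # near-duplicate of h1
h3 = [0, 12, 9, 0, 11, 10, 0, 2]  # very different image
s1, s2, s3 = map(binary_signature, (h1, h2, h3))
```

Because the signatures are short bit strings, Hamming comparisons are cheap, and (as the abstract notes) they can be compressed further into a direct lookup table for sublinear search.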
Content-based image retrieval (CBIR) has become an active research topic in the field of multimedia information retrieval. Interest points are local features with high information content, so this paper proposes a novel method for image retrieval using interest points. It comprises three key stages: interest point detection, image feature description based on interest points, and similarity measurement between two images. To detect interest points, we first use a self-adaptive filter to smooth the image and then apply a detector to find interest points. In the feature description stage, we design a histogram that represents the image by capturing local gray-level changes at interest points, the mutual positional relations among interest points, and the distribution of interest points over the whole image. In the similarity measurement stage, we use the distance between two histograms to compute the similarity between two images. Experimental results on a database of 1500 images demonstrate that the proposed approach is effective.
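The final stage of such a pipeline reduces to ranking by histogram distance. A minimal sketch, with toy histograms standing in for the interest-point-based descriptors (the distance measure and data are illustrative assumptions):

```python
def l1_distance(h1, h2):
    """City-block distance between two equal-length histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def rank_images(query_hist, database):
    """Return image ids sorted from most to least similar to the query."""
    return sorted(database,
                  key=lambda img_id: l1_distance(query_hist, database[img_id]))

db = {
    "sunset1": [8, 1, 0, 5],
    "sunset2": [7, 2, 1, 4],
    "forest":  [0, 9, 6, 1],
}
ranking = rank_images([8, 1, 1, 5], db)  # sunsets rank above the forest image
```

Whatever the descriptor encodes (gray-level changes, positional relations, spatial distribution), retrieval quality ultimately hinges on this distance behaving like perceptual similarity.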
Technology in the field of digital media generates huge amounts of non-textual information that proves very difficult to index. The key problem in achieving efficient and user-friendly retrieval of multimedia images is developing a search mechanism that guarantees the delivery of minimal irrelevant information (high precision) while ensuring that relevant information is not overlooked (high recall). To provide more accurate search results, we propose a system that examines the relationships among objects in images to achieve a more detailed understanding of the content and meaning of individual images. We solve the problem of creating a meaning-based index structure through the design and implementation of a concept-based model using domain-dependent ontologies. The system converts objects to their meanings by identifying appropriate concepts that both describe and identify images. It can automatically select concepts using a disambiguation algorithm that prunes irrelevant concepts and allows relevant ones to be associated with images. The system uses a neural network to identify the objects present in images; once identified, the objects are fed into the domain-dependent ontologies for high-precision classification of the image based on its contents.
The World Wide Web has become an essential tool as well as an entertainment medium in our daily life. Efficient web browsing from anywhere at any time, however, presents both an opportunity and a challenge to web document authors. Mobile users have totally different requirements from conventional wired users: mobile networks have limited bandwidth, and mobile terminals have limited display resolution. Hence a scheme is needed to adapt web data and multimedia content for devices with limited resources such as bandwidth, resolution and memory. In this paper we propose a novel mobile proxy server architecture suitable for resource-limited multimedia terminals. A distributed method is employed to adapt the multimedia data, so that the adaptation is neither fully based on the proxy server nor fully based on the web server. The adaptation process is expected to be faster since the adaptation load is distributed between the proxy and the web server. Simulation results suggest that a significant performance improvement can be achieved with the proposed architecture.
In this paper, we present a novel scheduling algorithm for multimedia data retrieval from mobile disk drives, focused on minimizing the power consumed by the retrieval.
While disk-based storage devices such as hard disks and optical disks have become small enough to be used in mobile devices, their practical usage leaves much to be desired due to the stringent power constraints of mobile devices.
The playback of multimedia data requires that data blocks be delivered to their destination in a periodic fashion, and the major issue has been how to guarantee this continuous flow of data. Most preceding works assume that the disk drive always operates in the steady state. However, this does not hold for modern mobile disk drives: a modern low-power drive goes into a standby state when it is not in use. While this feature can significantly extend battery life, it adds another dimension of complexity to the scheduling of multimedia data retrieval.
We carefully model the power consumption behavior of the low-power mobile drive and develop an Adaptive Round Merge (ARM) scheduling algorithm that guarantees a certain disk bandwidth for multimedia playback while minimizing the power consumption of the storage device.
According to our simulation-based experiments, the ARM algorithm reduces power consumption by as much as 23%. The benefit is most pronounced when the video clip is relatively short, typically less than 30 seconds.
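The intuition behind round merging can be sketched with a toy power model. All parameter values below (power levels, spin-up time) are assumptions for illustration, not the authors' measurements: batching several playback rounds into one retrieval amortizes the spin-up cost and lengthens the standby intervals.

```python
# Toy power model in the spirit of round merging (all numbers are
# assumptions): merging k rounds into one batched retrieval means one
# spin-up per batch instead of one per round, so the drive spends more
# time in low-power standby.

P_ACTIVE, P_STANDBY = 2.0, 0.2   # watts, assumed
T_SPINUP = 1.5                   # seconds of active-power spin-up, assumed

def energy(duration, round_len, active_per_round, merge=1):
    """Energy (joules) to play a clip of `duration` seconds when every
    `merge` rounds of length `round_len` are batched together."""
    period = round_len * merge
    busy = active_per_round * merge + T_SPINUP   # one spin-up per batch
    busy = min(busy, period)                     # cannot exceed the period
    n_batches = duration / period
    return n_batches * (busy * P_ACTIVE + (period - busy) * P_STANDBY)

baseline = energy(30, 1.0, 0.3, merge=1)
merged = energy(30, 1.0, 0.3, merge=4)
# Batching amortizes the spin-up cost, so `merged` comes out lower.
```

Under these assumed parameters the merged schedule consumes noticeably less energy than round-by-round retrieval, which mirrors the qualitative claim of the abstract; the actual ARM algorithm additionally adapts the merge factor to guarantee playback bandwidth.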
The current explosive expansion of mobile communication systems will lead to an increased demand for multimedia applications. However, due to the large variety of mobile terminals (such as mobile phones, laptops, ...) and, consequently, the wide range of terminal capabilities and characteristics, it is difficult to create a mobile multimedia application that can be used on mobile devices of different types. In this paper, we propose a mobile multimedia application that adapts its content to the capabilities of the mobile terminal and to the end-user's preferences. The application also takes changing device characteristics into account. To make this possible, a software framework is set up to enable negotiation between the mobile terminal and the content server. During the initial negotiation, the concept of the Universal Multimedia Access framework is used. Subsequent negotiations take place after changes in terminal characteristics or end-user preferences, by means of time-dependent metadata. This newly created flexible and extendable framework makes it possible for multimedia applications to interact with the content provider in order to deliver an optimal multimedia presentation for any arbitrary mobile terminal at any given time.
A video personalization and summarization system is designed and implemented, incorporating the usage environment to dynamically generate a personalized video summary. The personalization system adopts a three-tier server-middleware-client architecture in order to select, adapt, and deliver rich media content to the user. The server stores the content sources along with their corresponding MPEG-7 metadata descriptions. Our semantic metadata is provided through the use of the VideoAnnEx MPEG-7 Video Annotation Tool. When the user initiates a request for content, the client communicates the MPEG-21 usage environment description along with the user query to the middleware. The middleware is powered by the personalization engine and the content adaptation engine. Our personalization engine includes the VideoSue Summarization on Usage Environment engine, which selects the optimal set of desired contents according to user preferences. Afterwards, the adaptation engine performs the required transformations and compositions of the selected contents for the specific usage environment using our VideoEd Editing and Composition Tool. Finally, two personalization and summarization systems are demonstrated: one for the IBM WebSphere Portal Server and one for pervasive PDA devices.
We present a framework, motivated by rate-distortion theory and the human visual system, for optimally representing the real world given limited video resolution. To provide users with high fidelity views, we built a hybrid video camera system that combines a fixed wide-field panoramic camera with a controllable pan/tilt/zoom (PTZ) camera. In our framework, a video frame is viewed as a limited-frequency representation of some "true" image function. Our system combines outputs from both cameras to construct the highest fidelity views possible, and controls the PTZ camera to maximize information gain available from higher spatial frequencies. In operation, each remote viewer is presented with a small panoramic view of the entire scene, and a larger close-up view of a selected region. Users may select a region by marking the panoramic view. The system operates the PTZ camera to best satisfy requests from multiple users. When no regions are selected, the system automatically operates the PTZ camera to minimize predicted video distortion. High-resolution images are cached and sent if a previously recorded region has not changed and the PTZ camera is pointed elsewhere. We present experiments demonstrating that the panoramic image can effectively predict where to gain the most information, and also that the system provides better images to multiple users than conventional camera systems.
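The "where to point the PTZ camera next" decision can be sketched under one simple assumption (mine, not necessarily the authors' exact criterion): the predicted information gain of a region is approximated by the high-frequency energy visible in the low-resolution panoramic image, here measured as local gradient energy.

```python
# Hedged sketch: pick the window of the panoramic image with the largest
# local gradient energy as the region the PTZ camera should zoom into.
# Gradient energy as a proxy for information gain is an assumption for
# illustration, not the paper's exact distortion model.

def best_region(panorama, size):
    """panorama: 2-D list of gray values. Returns the (row, col) corner
    of the size-by-size window with the largest horizontal gradient
    energy, i.e. the window predicted to hold the most detail."""
    h, w = len(panorama), len(panorama[0])

    def energy(y0, x0):
        s = 0
        for y in range(y0, y0 + size):
            for x in range(x0, x0 + size - 1):
                s += (panorama[y][x + 1] - panorama[y][x]) ** 2
        return s

    corners = [(y, x) for y in range(h - size + 1)
               for x in range(w - size + 1)]
    return max(corners, key=lambda c: energy(*c))

# A mostly flat scene with texture in the bottom-right corner:
panorama = [[0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 9, 0],
            [0, 0, 0, 9]]
print(best_region(panorama, 2))
```

In the real system this prediction would be recomputed continuously from the live panoramic stream and traded off against pending user requests.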
An Internet-based interactive walkthrough virtual environment is presented in this work to facilitate interactive streaming and browsing of 3D graphic models across the Internet. The models are compressed by the view-dependent progressive mesh compression algorithm to enable the decorrelation of partitions and finer granularity. Following the fundamental framework of mesh representation, an interactive protocol based on the real time streaming protocol (RTSP) is developed to enhance the interaction between the server and the client. Finally, the data of the virtual world is re-organized and transmitted according to the viewer's requests. Experimental results demonstrate that the proposed algorithm reduces the required transmission bandwidth, and provides an acceptable visual quality even at low bit rates.
Due to the skewed popularity of objects in many continuous media applications, data placement techniques such as selective replication have been introduced to resolve the potential load imbalance problem by providing more replicas of more popular objects, resulting in a higher availability of hot objects and a more efficient usage of bounded storage space. To fully harness the advantage of the selective replication technique, one may need to periodically reconfigure the number of instances of the objects and their data placement to tune the system performance because, in reality, access frequency varies over time for many reasons. Reconfiguration usually requires time and disk bandwidth, resulting in a degradation of system performance during the process. This paper proposes algorithms for the dynamic reconfiguration of continuous media servers based on the ever-changing popularity of objects, and quantifies the expected startup latency and reconfiguration overhead. The proposed analytic models and simulation results demonstrate that the reconfiguration process is feasible in a reasonable amount of time. They also show tolerable performance degradation due to bandwidth overhead during the reconfiguration process, which is critical for most real applications.
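The core idea of selective replication can be sketched in a few lines. This is a minimal illustration of the general technique, not the paper's reconfiguration algorithm: each object keeps at least one copy, and the remaining storage budget is handed out roughly in proportion to access frequency.

```python
# Minimal sketch of selective replication (an illustration of the
# technique, not the paper's algorithm): replica counts are roughly
# proportional to popularity, within a total storage budget.

def assign_replicas(popularity, capacity):
    """popularity: access frequency per object; capacity: total number
    of copies the storage can hold. Each object keeps at least one copy;
    spare capacity goes to the most popular objects first."""
    n = len(popularity)
    counts = [1] * n                      # availability floor: one copy each
    spare = capacity - n
    total = sum(popularity)
    for i, p in sorted(enumerate(popularity), key=lambda x: -x[1]):
        extra = min(spare, round(p / total * (capacity - n)))
        counts[i] += extra
        spare -= extra
    return counts

# Skewed popularity over 4 objects, room for 8 copies in total:
print(assign_replicas([50, 30, 15, 5], 8))
```

A dynamic reconfiguration scheme like the one in the abstract would rerun an assignment of this kind as popularities drift, then migrate only the difference between the old and new layouts to bound the bandwidth overhead.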
In this paper, the notion of a multimedia ship area network is developed. A review of possible multimedia services, candidate physical layers, different IP architectures and levels of service, and the resulting simulations for hundreds of end-users are presented and discussed.
In this paper, we investigate mechanisms for a video streaming system in which proxies cooperate to provide users with low-latency, high-quality services in a heterogeneous and mobile environment where hosts have different capabilities and dynamically change their locations. Each proxy is capable of adapting incoming or cached video data to a user's demand by means of transcoders and filters. With such proxies, heterogeneous QoS requirements on the delivered stream can be fully satisfied by keeping high-quality video data in the local cache buffer and adjusting it to the requirements. On receiving a request from a user, the proxy first checks its cache. If no appropriate data is available, it retrieves video data of satisfactory quality from the video server or from nearby proxies. The proxy communicates with the others and finds an appropriate source for data retrieval by taking into account the transfer delay and the video quality. We propose a cooperative video caching mechanism for the video streaming system and evaluate its performance in terms of the delay introduced and the resulting video quality.
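The lookup order described above (local cache first, then the cheapest adequate remote source) can be sketched as follows. All names and the quality encoding are illustrative assumptions, not the paper's interfaces.

```python
# Hedged sketch of cooperative proxy lookup (names are illustrative):
# serve locally when the cached quality suffices; otherwise pick the
# source -- a nearby proxy or the origin server -- with the smallest
# transfer delay among those holding adequate quality.

def choose_source(request, local_cache, peers, server_delay):
    """request: {"id": video_id, "quality": minimum acceptable level};
    local_cache: {video_id: cached quality level};
    peers: list of (delay, {video_id: quality}) for nearby proxies.
    Returns "local", "peerN", or "server"."""
    vid, need = request["id"], request["quality"]
    if local_cache.get(vid, 0) >= need:
        return "local"
    # Remote candidates that can satisfy the quality requirement:
    candidates = [(delay, f"peer{i}")
                  for i, (delay, cache) in enumerate(peers)
                  if cache.get(vid, 0) >= need]
    candidates.append((server_delay, "server"))  # origin always suffices
    return min(candidates)[1]

peers = [(20, {"v1": 3}), (5, {"v1": 1})]
print(choose_source({"id": "v1", "quality": 2}, {}, peers, 100))
```

Note that the nearest peer (delay 5) is skipped here because its cached copy is below the requested quality; the selection trades delay against quality exactly as the abstract describes.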
To meet QoS requirements for multimedia transmission over packet-lossy networks such as IP networks, two approaches can be followed: either the source scalability is extended to packets, or multiple description schemes are used. In the latter case, equivalence between packets is assumed and forward error correction is needed. In this paper, the proposed solution provides multiple descriptions of a scalable bitstream source using a backprojection operator. This operator belongs to the class of Mojette transforms, already presented at ITCom 2001. In this scheme, a redundant set of projections is first computed for different angles. In a second step, only a few projections are selected to ensure reconstructibility (the quantization step). Third, entropy coding is applied to the remaining projections. The Mojette transform is an exact discrete Radon transform that generates bins from pixels (information elements) computed as XOR or standard additions. The transform is linear (in the number of pixels and number of projections) for both coding and decoding. In the new scheme we propose, subflows (assuming source scalability) issued from the application's output bitstreams are mapped into buffers. The projections issued from these buffers meet both the compression of the bitstreams and the multiple description constraints.
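A single Mojette projection is simple enough to sketch. This is a toy illustration of the general transform (using the common bin-index convention and the XOR variant mentioned in the abstract), not the authors' codec: for a direction (p, q), every pixel (x, y) contributes to the bin indexed by -qx + py.

```python
# Toy Mojette projection: for direction (p, q), each pixel (x, y)
# contributes to bin b = -q*x + p*y; contributions along a line are
# combined by XOR (the XOR variant named in the abstract).

def mojette_projection(block, p, q):
    """block: 2-D list of byte values; returns {bin_index: xor_of_pixels}."""
    bins = {}
    for y, row in enumerate(block):
        for x, value in enumerate(row):
            b = -q * x + p * y
            bins[b] = bins.get(b, 0) ^ value
    return bins

block = [[1, 2],
         [3, 4]]
# Direction (1, 0): bins are indexed by y, i.e. an XOR along each row.
print(mojette_projection(block, 1, 0))
```

Taking projections at several directions (p, q) yields the redundant description set; the reconstructibility condition in the abstract then determines how few of these projections suffice to invert the transform exactly.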