This PDF file contains the front matter associated with SPIE Proceedings Volume 6504, including the Title Page, Copyright information, Table of Contents, Introduction, and the Conference Committee listing.
We present optimal schemes for allocating bits of fine-grained scalable video sequences among multiple senders streaming
to a single receiver. This allocation problem is critical in optimizing the perceived quality in peer-to-peer and distributed
multi-server streaming environments. Senders in such environments are heterogeneous in their outgoing bandwidth and
they hold different portions of the video stream. We formulate the allocation problem as an optimization problem, which
is nonlinear in general. We use rate-distortion models in the formulation to achieve the minimum distortion in the rendered
video, constrained by the outgoing bandwidth of senders, availability of video data at senders, and the incoming bandwidth of
the receiver. We show how the adopted rate-distortion models transform the nonlinear problem to an integer linear programming
(ILP) problem. We then design a simple rounding scheme that transforms the ILP problem to a linear programming
(LP) one, which can be solved efficiently using common optimization techniques such as the Simplex method. We prove
that our rounding scheme always produces a feasible solution, and the solution is within a negligible margin from the
optimal solution. We also propose a new algorithm (FGSAssign) for the allocation problem that runs in O(n log n) steps,
where n is the number of senders. We prove that FGSAssign is optimal. Because of its short running time, FGSAssign can
be used in real time during the streaming session. Our experimental study validates our analysis and shows the
effectiveness of our allocation algorithm in improving the video quality.
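The allocation idea can be illustrated with a small greedy sketch. This is not the paper's FGSAssign, whose details the abstract does not give. It assumes, purely for illustration, that each sender holds a leading prefix of the FGS enhancement layer and that the receiver needs one contiguous prefix; the function name `assign_prefix` and the tuple layout are hypothetical.

```python
from typing import List, Tuple

# (name, outgoing bandwidth in bits per slot, number of leading FGS bits held)
Sender = Tuple[str, int, int]


def assign_prefix(senders: List[Sender], receiver_bw: int) -> List[Tuple[str, int, int]]:
    """Return (name, start, end) half-open bit ranges forming one contiguous prefix.

    Illustrative greedy only: senders holding the shortest prefixes serve the
    earliest bits, so senders with longer prefixes stay free for later bits.
    """
    plan = []
    pos = 0  # next bit of the enhancement layer that still needs a sender
    for name, bandwidth, held in sorted(senders, key=lambda s: s[2]):
        # Limited by this sender's bandwidth, its available data,
        # and the receiver's incoming bandwidth.
        end = min(pos + bandwidth, held, receiver_bw)
        if end > pos:
            plan.append((name, pos, end))
            pos = end
    return plan


senders = [("A", 300, 1000), ("B", 200, 250), ("C", 400, 600)]
plan = assign_prefix(senders, receiver_bw=800)
```

In this sketch the sort dominates the running time, so it takes O(n log n) steps for n senders.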
A ubiquitous computing concept permits end users to have access to multimedia and digital content anywhere, anytime and in any way they want. As a consequence, customizing resources according to user preferences and device requirements is the primary challenge on the way to seamless access. Moreover, once a suitable customization approach has been decided (e.g. adaptation), deploying it in the existing network requires a generic and widely accepted standard applied to the process. Encryption in the compressed domain should also be addressed, not only for serving sensitive digital content but also for offering security as an embedded feature of the adaptation practice to ensure digital rights management and confidentiality. In this paper, we present an architecture for temporal adaptation of ITU-T H.264 video conforming to ISO/IEC MPEG-21 DIA. In addition, we present a perceptual encryption scheme that is integrated in the system for video encryption. The framework enables video bitstreams to be adapted and encrypted in the compressed domain, eliminating cascaded adaptation (i.e. decoding - adaptation - encoding). The encryption framework is applied to the adapted video content, which reduces computational overhead compared to encrypting the original content. A prototype based on the proposed architecture is also presented, along with experimental evaluations of the system and its performance that support the architecture.
The MPEG-4 Fine Grained Scalability (FGS) profile aims at scalable layered video encoding, in order to ensure efficient
video streaming in networks with fluctuating bandwidths. In this paper, we propose a novel technique, termed FMOE-MR,
which delivers significantly improved rate distortion performance compared to existing MPEG-4 Base Layer
encoding techniques. The video frames are re-encoded at high resolution at semantically and visually important regions
of the video (termed as Features, Motion and Objects) that are defined using a mask (FMO-Mask) and at low resolution
in the remaining regions. The multiple-resolution re-rendering step is implemented such that further MPEG-4
compression leads to low bit rate Base Layer video encoding. The Features, Motion and Objects Encoded-Multi-
Resolution (FMOE-MR) scheme is an integrated approach that requires only encoder-side modifications, and is
transparent to the decoder. Further, since the FMOE-MR scheme incorporates "smart" video preprocessing, it requires
no change in existing MPEG-4 codecs. As a result, it is straightforward to use the proposed FMOE-MR scheme with any
existing MPEG codec, thus allowing great flexibility in implementation. In this paper, we describe and
implement unsupervised and semi-supervised algorithms to create the FMO-Mask from a given video sequence, using
state-of-the-art computer vision algorithms.
Transmitting high-quality, real-time interactive video over lossy networks is challenging because data loss due to the network can severely degrade video quality. A promising feedback technique for low-latency video repair is Reference Picture Selection (RPS), whereby the encoder selects one of several previous frames as a reference frame for predictive encoding of subsequent frames. RPS can operate in two different modes: an optimistic policy that uses negative acknowledgements (NACKs) and a more conservative policy that relies upon positive acknowledgements (ACKs). The choice between RPS ACK mode and NACK mode to some extent depends upon the effects of reference distance on the encoded video quality. This paper provides a systematic study of the effects of reference distance on video quality for a range of video coding conditions. High-quality videos with a wide variety of scene complexity and motion characteristics are selected and encoded using H.264 with a bandwidth constraint and a range of reference distances. Two objective measures of video quality, PSNR and VQM, are analyzed to show that scene complexity and motion characteristics of the video determine the amount of degradation in quality as reference distance increases. In particular, videos with low motion degrade in quality more with an increase in reference distance since they cannot take advantage of the strong similarity between adjacent frames. Videos with high motion do not suffer as much with higher reference distance since the similarity between adjacent frames is already low. The motion characteristics also determine the initial quality under the bandwidth constraint. The data presented should be useful for selecting ACK or NACK mode or for modeling video repair techniques.
In this paper we propose an algorithm for predicting a person's perceptual attention focus (PAtF) through the use of a
Kalman Filter design of the human visual system. The concept of the PAtF allows significant reduction of the bandwidth
of a video stream and computational burden reduction in the case of 3D media creation and transmission. This is
possible due to the fact that the human visual system has limited perception capabilities and only 2 degrees out of the
total of 180 provide the highest quality of perception. The peripheral image quality can be decreased without a viewer
noticing image quality reduction. Multimedia transmission through a network introduces a delay. This delay reduces the
benefits of using a PAtF due to the fact that the person's attention area can change drastically during the delay period,
thus increasing the probability of peripheral image quality reduction being detected. We have created a framework which
uses a Kalman Filter to predict future PAtFs in order to compensate for the delay/lag and to reduce the
bandwidth/creation burden of any visual multimedia.
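The delay-compensation idea can be sketched with a scalar constant-velocity Kalman filter that tracks one gaze coordinate and extrapolates it over the network lag. The constant-velocity model, the noise values `q` and `r`, and the class name `GazeKalman1D` are assumptions made for illustration; the abstract does not specify the authors' filter design.

```python
class GazeKalman1D:
    """Constant-velocity Kalman filter for one gaze coordinate (assumed model)."""

    def __init__(self, q: float = 1e-3, r: float = 1.0) -> None:
        self.pos, self.vel = 0.0, 0.0
        # 2x2 state covariance; a large start means "initial state unknown".
        self.p00, self.p01, self.p10, self.p11 = 1000.0, 0.0, 0.0, 1000.0
        self.q, self.r = q, r  # process and measurement noise (hypothetical)

    def predict(self, dt: float) -> None:
        """Advance the state by dt: x <- F x, P <- F P F^T + q I."""
        self.pos += dt * self.vel
        p00 = self.p00 + dt * (self.p10 + self.p01) + dt * dt * self.p11 + self.q
        p01 = self.p01 + dt * self.p11
        p10 = self.p10 + dt * self.p11
        p11 = self.p11 + self.q
        self.p00, self.p01, self.p10, self.p11 = p00, p01, p10, p11

    def update(self, z: float) -> None:
        """Fold in a measured gaze position z (measurement matrix H = [1, 0])."""
        s = self.p00 + self.r                   # innovation variance
        k0, k1 = self.p00 / s, self.p10 / s     # Kalman gain
        y = z - self.pos                        # innovation
        self.pos += k0 * y
        self.vel += k1 * y
        p00 = (1.0 - k0) * self.p00
        p01 = (1.0 - k0) * self.p01
        p10 = self.p10 - k1 * self.p00
        p11 = self.p11 - k1 * self.p01
        self.p00, self.p01, self.p10, self.p11 = p00, p01, p10, p11

    def look_ahead(self, lag: float) -> float:
        """Predicted gaze position after a transmission delay of `lag`."""
        return self.pos + lag * self.vel


kf = GazeKalman1D()
for t in range(30):
    kf.predict(1.0)
    kf.update(10.0 + 5.0 * t)   # synthetic gaze moving at constant speed
predicted = kf.look_ahead(3.0)  # where to place the high-quality region
```

A real gaze trace is noisy and includes rapid jumps, so a practical design would need a richer motion model than this sketch uses.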
Peer-to-peer (P2P) systems are traditionally designed to scale to a large number of nodes. However, we focus on scenarios where the sharing is effected only among neighbors. Localized sharing is particularly attractive in scenarios where wide area network connectivity is undesirable, expensive or unavailable. On the other hand, local neighbors may not offer the wide variety of objects possible in a much larger system. The goal of this paper is to investigate a P2P system that shares contents with its neighbors. We analyze the sharing behavior of Apple iTunes users in a university setting. iTunes restricts the sharing of audio and video objects to peers within the same LAN sub-network. We show that users are already making a significant amount of content available for local sharing. We show that these systems are not appropriate for applications that require access to a specific object. We argue that mechanisms that allow the user to specify classes of interesting objects are better suited for these systems. Mechanisms such as Bloom filters can allow each peer to summarize the contents available in the neighborhood, reducing network search overhead. This research can form the basis for future storage systems that utilize the shared storage available in neighbors and build a probabilistic storage for local consumption.
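As a minimal sketch of how a Bloom filter could summarize neighborhood contents, the following uses only the Python standard library. The class name, the default sizes, and the object keys are hypothetical, and the abstract does not say how the authors would configure such filters.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Compact set summary: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

    def merge(self, other: "BloomFilter") -> "BloomFilter":
        """Bitwise OR of two same-shaped filters: a summary of both peers."""
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("filters must share num_bits and num_hashes")
        out = BloomFilter(self.num_bits, self.num_hashes)
        out.bits = bytearray(a | b for a, b in zip(self.bits, other.bits))
        return out


peer_a = BloomFilter()
peer_a.add("jazz/kind-of-blue")
peer_b = BloomFilter()
peer_b.add("rock/abbey-road")
neighborhood = peer_a.merge(peer_b)  # one summary for the local neighborhood
```

A peer can test `"rock/abbey-road" in neighborhood` before querying anyone. A negative answer is definitive, while a positive answer may be a false positive and still needs a real lookup.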
This paper experimentally examines the performance of streaming media applications over a CDMA2000 1xEV-DO
network. The performance of streaming in a cellular network is tested across three different levels of mobility,
two applications, and the two transport layer protocols, TCP and UDP. Findings of this study are that streaming
applications are impacted more by sources of interference such as high-rise buildings than by increased velocity.
Also, when the mobile client is stationary, high data rates and high video quality are consistently achieved. We
also find that for the streaming applications considered, UDP streams outperform TCP streams, consistently
achieving higher bandwidth.
iTVP is a system built for IP-based delivery of live TV programming, video-on-demand and audio-on-demand
with interactive access over IP networks. It has a country-wide range and is designed to provide service to a large
number of concurrent users. The iTVP prototype contains the backbone of a two-level hierarchical system designed
for distribution of multimedia content from a content provider to end users. In this paper we present experience
gained during a few months of the prototype operation. We analyze efficiency of iTVP content distribution
system and resource usage at various levels of the hierarchy. We also characterize content access patterns
and their influence on system performance, as well as quality experienced by users and user behavior. In our
investigation, scalability is one of the most important aspects of the system performance evaluation. Although
the range of the prototype operation is limited, as far as the number of users and the content repository is
concerned, we believe that data collected from such a large scale operational system provides a valuable insight
into efficiency of a CDN-type of solution to large scale streaming services. We find that the system exhibits
good performance and low resource usage.
The increasing demand for multi-modal systems and applications that are highly interactive and multi-sensory in nature
has led to the introduction of new media and new user interface devices in multimedia computing. Computer generated
smell, also known as olfactory data, is one of such media objects currently generating a lot of interest in the multimedia
industry. We are currently focusing our attention on exploring user perception of computer generated smell when
combined with other media to enrich their multimedia experience. In this paper, we present the results of an empirical
study into users' perception of olfactory enhanced multimedia displays. Results showed that users generally enjoy an
enhanced multimedia experience when augmented by olfactory stimuli, and that the presence of such stimuli increases
the sense of relevance. Whilst there is a general positive bias towards olfactory enhanced multimedia applications,
specific properties of smell such as diffusion and lingering mean that in practice specific attention needs to be given
when a mix of smells is associated with visualised multimedia content; moreover, it was found that whilst smell was
incorrectly identified in some instances, the presence of smell per se is generally enough to create a richer user
Computer games are often played on devices with varying display resolutions. While higher resolutions generally provide
more immersive game play, they can yield reduced frame rates and/or increased costs, making the choice of an optimal resolution
important. Despite this importance, to the best of our knowledge, there has been no extensive study of the effects of
resolution on users playing computer games. This paper presents results from extensive user studies measuring the impact
of resolution on users playing First Person Shooter games. The studies focus on the effects of resolution in conjunction
with low and high contrast virtual environments, full screen and windowed modes and identification of long-range objects.
Analysis indicates resolution has little impact on performance over the range of conditions tested and only matters when
the objects being identified are far away or small and are reduced to too few pixels to be distinguishable.
The number of media streams that can be supported concurrently is highly constrained by the stringent requirements of real-time playback and high transfer rates. To address this problem, media delivery techniques, such as Batching and Stream Merging, utilize the multicast facility to increase resource sharing. The achieved resource sharing depends greatly on how the waiting requests are scheduled for service. Scheduling has been studied extensively when Batching is applied, but to the best of our knowledge, it has not been investigated in the context of stream merging techniques, which achieve much better resource sharing. In this study, we analyze scheduling when stream merging is employed and propose a simple, yet highly effective scheduling policy, called Minimum Cost First (MCF). MCF exploits the wide variation in stream lengths by favoring the requests that require the least cost. We present two alternative implementations of MCF: MCF-T and MCF-P. We compare various scheduling policies through extensive simulation and show that MCF achieves significant performance benefits in terms of both the number of requests that can be serviced concurrently and the average waiting time for service.
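The least-cost selection rule can be sketched as follows. The cost model here is an assumption borrowed from patching-style stream merging, where a new request only needs a stream as long as the time elapsed since the last full stream of the same video started. The function names are hypothetical, and the sketch does not model the MCF-T and MCF-P variants, which the abstract does not describe.

```python
from typing import Dict, Optional


def stream_cost(now: float, last_full_start: Optional[float], video_length: float) -> float:
    """Length of the stream a new request would need (assumed cost model)."""
    if last_full_start is None:
        return video_length  # nothing to merge with: a full stream is needed
    return min(now - last_full_start, video_length)


def pick_mcf(
    now: float,
    waiting: Dict[str, int],
    last_full_start: Dict[str, float],
    lengths: Dict[str, float],
) -> Optional[str]:
    """Return the video whose waiting requests are cheapest to serve, if any."""
    candidates = [video for video, count in waiting.items() if count > 0]
    if not candidates:
        return None
    # Ties are broken by video id so the choice is deterministic.
    return min(
        candidates,
        key=lambda v: (stream_cost(now, last_full_start.get(v), lengths[v]), v),
    )


lengths = {"a": 7200.0, "b": 5400.0, "c": 3600.0}
last_full_start = {"a": 900.0}          # only video "a" has a stream in flight
waiting = {"a": 2, "b": 5, "c": 1}
choice = pick_mcf(1000.0, waiting, last_full_start, lengths)
```

Here video "a" wins because its requests can merge into a stream that started 100 seconds ago, even though video "b" has more waiting requests.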
In any large scale distribution architecture, considerable thought needs to be given to resource management, particularly in
the case of high quality TV on-demand. This work presents a globally accessible network storage architecture operating
over a shared infrastructure, termed the Video Content Distribution Network (VCDN). Its goal is to store all TV
content broadcast over a period of time within the network and make it available to clients in an on-demand fashion. This
paper evaluates a number of content placement approaches in terms of their ability to efficiently manage system resources.
Because TV viewing patterns are dynamic, the effectiveness of a given content placement is expected to
change over time, so the placement itself should change too. The placement of content within such a system is the
single most influential factor in resource usage. Intuitively, the further content is placed from a requesting client, the
higher the total bandwidth requirements are. Likewise, the more replicas of an object that are distributed throughout the
network, the higher the storage costs will be. Ideally, the placement algorithm should consider both these resources when
making placement decisions. Another desirable property of the placement algorithm is that it should be able to converge
on a placement solution quickly. A number of placement algorithms are examined, each with different properties, such as
minimizing delivery path. Such a system has a large number of variables; we examine them and show their impact on
the algorithms' performance.
Mobile multimedia computers require large amounts of data storage, yet must consume low power in order to
prolong battery life. Solid-state storage offers low power consumption, but its capacity is an order of magnitude
smaller than the hard disks needed for high-resolution photos and digital video. In order to create a device with
the space of a hard drive, yet the low power consumption of solid-state storage, hardware manufacturers have
proposed using flash memory as a write buffer on mobile systems. This paper evaluates the power savings of such
an approach and also considers other possible flash allocation algorithms, using both hardware- and software-level
flash management. Its contributions also include a set of typical multimedia-rich workloads for mobile systems
and power models based upon current disk and flash technology. Based on these workloads, we demonstrate
an average power savings of 267 mW (53% of disk power) using hardware-only approaches. Next, we propose
another algorithm, termed Energy-efficient Virtual Storage using Application-Level Framing (EVS-ALF), which
uses both hardware and software for power management. By collecting information from the applications and
using this metadata to perform intelligent flash allocation and prefetching, EVS-ALF achieves an average power
savings of 307 mW (61%), another 8% improvement over hardware-only techniques.
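The quoted figures can be cross-checked with a few lines of arithmetic. The baseline disk power below is derived from the stated 267 mW and 53%; it is not a number given in the abstract.

```python
# Figures quoted in the abstract.
hw_savings_mw = 267.0    # hardware-only savings
hw_savings_frac = 0.53   # stated as 53% of disk power
evs_savings_mw = 307.0   # EVS-ALF savings

# Implied baseline disk power (derived here, not stated in the abstract).
disk_power_mw = hw_savings_mw / hw_savings_frac

# EVS-ALF savings as a fraction of that same baseline.
evs_savings_frac = evs_savings_mw / disk_power_mw

# Difference between the two approaches, in percentage points.
gain_points = round(evs_savings_frac * 100) - round(hw_savings_frac * 100)
```

The implied baseline is roughly 504 mW, which makes 307 mW about 61% of disk power. The "another 8%" therefore reads as 8 percentage points of disk power rather than a relative 8% gain.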
Although many overlay and P2P approaches have been proposed to assist large-scale live video streaming, how to ensure
service quality and reliability still remains a challenging issue. Peer dynamics, especially unscheduled node departures,
affect the perceived video quality at a peer node in two ways. In particular, the amplitude of quality fluctuations and the
duration for which stable quality video is available at a node heavily depend on the nature of peer departures in the system.
In this paper, we first propose a service quality model to quantify the quality and stability of a video stream in a P2P
streaming environment. Based on this model, we further develop tree construction algorithms that ensure that every peer
in the collaborative network receives a video stream with a statistical reliability guarantee on quality. A key advantage
of the proposed approach is that we can now explicitly control the quality and stability of the video stream supplied to
every node. This is the fundamental difference of our approach from existing approaches that provide stream stability by
over-provisioning resources allocated to every peer. Also, the proposed tree construction schemes decide the position of
a node in the delivery tree based on both its estimated reliability and upstream bandwidth contribution while striving to
minimize the overall load on the server. Our simulations show that our algorithms use the server resources very efficiently
while significantly improving the video stability at peers.
As multimedia-capable, network-enabled devices become ever more abundant, device heterogeneity and resource sharing
dynamics remain difficult challenges in networked continuous media applications. These challenges often cause the
applications to exhibit very brittle real-time performance. Due to heterogeneity, minor variations in encoding can mean
a continuous media item performs well on some devices but very poorly on others. Resource sharing can mean that
content can work for some of the time, but real-time delivery is frequently interrupted due to competition for resources.
Quality-adaptive approaches seek to preserve real-time performance, by evaluating and executing trade-offs between the
quality of application results and the resources required and available to produce them. Since the approach requires the
applications to adapt the results they produce, we refer to them as elastic real-time applications. In this paper, we use
video as a specific example of an elastic real-time application. We describe a general strategy for CPU adaptation called
Priority-Progress adaptation, which complements and extends previous work on adaptation for network bandwidth. The
basic idea of Priority-Progress is to utilize priority and timestamp attributes of the media to reorder execution steps, so that
low priority work can be skipped in the event that the CPU is too constrained to otherwise maintain real-time progress.
We have implemented this approach in a prototype video application. We will present benchmark results that demonstrate
the advantages of Priority-Progress adaptation in comparison to techniques employed in current popular video players.
These advantages include better timeliness as CPU utilization approaches saturation, and more user-centric control over
quality adaptation (for example to boost the video quality of selected video in a multi-video scenario). Although we focus
on video in this paper, we believe that the Priority-Progress technique is applicable to other multimedia and other real-time
applications, and can similarly help them address the challenges of device heterogeneity and dynamic resource sharing.
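The reordering idea can be sketched for a single window of media work units. This is a simplified illustration, not the authors' implementation: it assumes a known per-unit CPU cost and a fixed budget per window, whereas a real player would check elapsed time against a deadline. The names `WorkUnit` and `run_window` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WorkUnit:
    timestamp: int   # presentation order within the window
    priority: int    # larger means more important
    cost: float      # estimated CPU time (hypothetical, assumed known)


def run_window(units: List[WorkUnit], budget: float) -> Tuple[List[WorkUnit], List[WorkUnit]]:
    """Process units in priority order until the CPU budget runs out.

    Returns (done, skipped). Once one unit does not fit, all lower-priority
    work in the window is skipped so real-time progress is maintained.
    """
    ordered = sorted(units, key=lambda u: (-u.priority, u.timestamp))
    done: List[WorkUnit] = []
    skipped: List[WorkUnit] = []
    spent = 0.0
    out_of_time = False
    for unit in ordered:
        if not out_of_time and spent + unit.cost <= budget:
            spent += unit.cost
            done.append(unit)
        else:
            out_of_time = True
            skipped.append(unit)
    done.sort(key=lambda u: u.timestamp)  # restore presentation order
    return done, skipped


window = [
    WorkUnit(0, 3, 4.0),  # e.g. a reference frame
    WorkUnit(1, 2, 2.0),
    WorkUnit(2, 1, 1.0),
    WorkUnit(3, 1, 1.0),
    WorkUnit(4, 2, 2.0),
]
done, skipped = run_window(window, budget=8.0)
```

With a budget of 8.0 the three highest-priority units are processed and the two lowest-priority ones are skipped, whereas processing strictly in timestamp order would have spent the budget before reaching the unit at timestamp 4.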
The computation and communication abilities of modern platforms are enabling increasingly capable cooperative
distributed mobile systems. An example is distributed multimedia processing of sensor data in robots deployed
for search and rescue, where a system manager can exploit the application's cooperative nature to optimize the
distribution of roles and tasks in order to successfully accomplish the mission. Because of limited battery capacities,
a critical task a manager must perform is online energy management. While support for power management
has become common for the components that populate mobile platforms, what is lacking is integration and explicit
coordination across the different management actions performed in a variety of system layers. This paper
develops an integration approach for distributed multimedia applications, where a global manager specifies both
a power operating point and a workload for a node to execute. Surprisingly, when jointly considering power and
QoS, experimental evaluations show that using a simple deadline-driven approach to assigning frequencies can
be non-optimal. These trends are further affected by certain characteristics of the underlying power management
mechanisms, which our research groups into classes of component power management that are
"compatible" (VFC) or "incompatible" (VFI) with voltage and frequency scaling. We build on these findings
to develop CompatPM, a vertically integrated control strategy for power management in distributed mobile
systems. Experimental evaluations of CompatPM indicate average energy improvements of 8% when platform
resources are managed jointly rather than independently, demonstrating that previous attempts to maximize
battery life by simply minimizing frequency are inappropriate from a platform-level perspective.
The MPEG-21 standard defines a framework for the interoperable delivery and consumption of multimedia content.
Within this framework the adaptation of content plays a vital role in order to support a variety of terminals and to
overcome the limitations of the heterogeneous access networks. In most cases the multimedia content can be adapted by
applying different adaptation operations that result in certain characteristics of the content. Therefore, an instance within
the framework has to decide which adaptation operations have to be performed to achieve a satisfactory result. This
process is known as adaptation decision-taking and makes extensive use of metadata describing the possible adaptation
operations, the usage environment of the consumer, and constraints concerning the adaptation. Based on this metadata a
mathematical optimization problem can be formulated and its solution yields the optimal parameters for the adaptation
operations. However, the metadata is represented in XML resulting in a verbose and inefficient encoding. In this paper,
an architecture for an Adaptation Decision-Taking Engine (ADTE) is introduced. The ADTE operates both on XML
metadata and on metadata encoded with MPEG's Binary Format for Metadata (BiM) enabling an efficient metadata
processing by separating the problem extraction from the actual optimization step. Furthermore, several optimization
algorithms which are suitable for scalable multimedia formats are reviewed and extended where appropriate.
Peer-to-peer (P2P) streaming has become a very popular technique to realize live media broadcast over the Internet. Most previous research on P2P streaming focuses on the delivery of a single media stream (called a channel). The widely deployed implementations, however, all concurrently offer multiple channels through their P2P networks. This paper investigates the overlay organization for multi-channel P2P streaming systems through modeling and simulations. In particular, this paper examines the potential collaborations among nodes across multiple channels. Our investigation shows that collaboration among nodes across different channels can improve the overall performance of the multi-channel P2P streaming system. However, the collaboration strategies need to be carefully selected. Simple collaboration strategies, such as treating collaborative nodes (those "borrowed" from other channels) the same as a channel's native nodes (those playing the channel), tend to have marginal or even negative effects on the whole system performance. This result is contrary to the common impression that a larger population means better P2P system performance, and we find that it is caused by the differences between P2P streaming and traditional P2P file-sharing systems. Furthermore, this paper proposes a set of simple strategies that control the upload-download ratio of collaborative nodes. We show that this set of strategies produces a much better collaboration result for multi-channel P2P streaming systems. Although this is only a preliminary study, we believe the results will promote further investigation of multi-channel P2P streaming.
Reducing the product development cycle time is one of the most important and challenging problems faced by the
industry today. As the functionality and complexity of devices increases, so does the time required to design, test, and
develop the devices. Developing products rapidly in the face of this increasing complexity requires new methodologies
and tools. This paper presents a methodology for estimating the resources consumed by a video decoder. The proposed
methodology enables resource estimation based on high level user requirements. A component architecture for an H.264
video decoder is developed to enable design space exploration. The resources required to decode H.264 video are
estimated based on a measure of the complexity of the H.264 bitstreams and the target architecture. The proposed
approach is based on the hypothesis that the complexity of an H.264 video bitstream significantly influences resource
consumption and the complexity of a bitstream can thus be used to determine resource estimation. The bitstream
complexity is characterized to capture the data dependencies using a process called Bitstream Abstraction. The decoder
is componentized and component level resource requirements determined in a process called Decoder Abstraction. The
proposed methodology uses Bitstream Abstraction together with Decoder Abstraction to estimate resource
requirements. A component model for the H.264 video decoder is developed. Resources consumed by each component
are determined using the VTune performance analyzer. These resource estimates and video bitstream complexity are
used in developing a parametric model for resource estimation based on bitstream complexity. The proposed
methodology enables high level resource estimation for multimedia applications without a need for extensive and time
Applications in the fields of virtual and augmented reality as well as image-guided medical applications make use of a
wide variety of hardware devices. Existing frameworks for interconnecting low-level devices and high-level application
programs do not exploit the full potential for processing events coming from arbitrary sources and are not easily generalizable.
In this paper, we will introduce a new multi-modal event processing methodology using dynamically-typed
event attributes for event passing between multiple devices and systems. The existing OpenTracker framework was
modified to incorporate a highly flexible and extensible event model, which can store data that is dynamically created
and arbitrarily typed at runtime. The main factors impacting the library's throughput were determined and the performance
was shown to be sufficient for most typical applications. Several sample applications were developed to take advantage
of the new dynamic event model provided by the library, thereby demonstrating its flexibility and expressive power.
The wireless industry has seen a surge of interest in upcoming broadband wireless access (BWA) networks
like WiMAX that are based on orthogonal frequency division multiplexing (OFDM). These wireless access
technologies have several key features such as centralized scheduling, fine-grained allocation of
transmission slots, adaptation of the modulation and coding schemes (MCS) to the SNR variations of the
wireless channel, flexible and connection oriented MAC layer as well as QoS awareness and
differentiation for applications. As a result, such architectures provide new opportunities for cross-layer
optimization, particularly for applications that can tolerate some bit errors. In this paper, we describe a
multi-channel video streaming protocol for video streaming over such networks. In addition, we propose a
new combined channel coding and proportional share allocation scheme for multicast video distribution
based upon a video's popularity. Our results show that we can more efficiently allocate network
bandwidth while providing high quality video to the application.
Efficient delivery of video data over computer networks has been studied extensively for decades. Still, multi-receiver video
delivery is challenging, due to heterogeneity and variability in network availability, end node capabilities, and receiver
preferences. Our earlier work has shown that content-based networking is a viable technology for fine-granularity multi-receiver
video streaming. By exploiting this technology, we have demonstrated that each video receiver is provided with
fine-grained and independent selectivity along the different video quality dimensions: region of interest, signal-to-noise
ratio for the luminance and the chrominance planes, and temporal resolution. Here we propose a novel adaptation scheme
combining such video streaming with state-of-the-art techniques from the field of adaptation to provide receiver-driven
multi-dimensional adaptive video streaming. The scheme allows each client to individually adapt the quality of the received
video according to its currently available resources and own preferences. The proposed adaptation scheme is validated
experimentally. The results demonstrate adaptation to variations in available bandwidth and CPU resources over roughly
two orders of magnitude and show that fine-grained adaptation is feasible given radically different user preferences.
New applications like remote surveillance and online environmental or traffic monitoring are making it increasingly
important to provide flexible and protected access to remote video sensor devices. Current systems use application-level
codes like web-based solutions to provide such access. This requires adherence to user-level APIs provided by such
services, access to remote video information through given application-specific service and server topologies, and that
the data being captured and distributed is manipulated by third party service codes. CameraCast is a simple, easily used
system-level solution to remote video access. It provides a logical device API so that an application can identically
operate on local vs. remote video sensor devices, using its own service and server topologies. In addition, the
application can take advantage of API enhancements to protect remote video information, using a capability-based
model for differential data protection that offers fine grain control over the information made available to specific codes
or machines, thereby limiting their ability to violate privacy or security constraints. Experimental evaluations of
CameraCast show that the performance of accessing remote video information approximates that of accesses to local
devices, given sufficient networking resources. High performance is also attained when protection restrictions are
enforced, due to an efficient kernel-level realization of differential data protection.
Media streaming has found applications in many domains such as education, entertainment, communication
and video surveillance. Many of these applications require non-trivial manipulations of media streams, beyond
the usual capture/playback operations supported by typical multimedia software and tools. To support rapid
development of such applications, we have designed and implemented a scripting language called Plasma. Plasma
treats media streams as first-class objects, and caters to the characteristic differences between stored media files
and live media streams. In this paper, we illustrate the design and features of Plasma through several small
examples, and describe two example applications that we developed on top of Plasma. These two applications
demonstrate that using Plasma, complex applications that compose, mix, and filter multimedia streams can be
written with relatively little effort.
This paper describes the design and implementation of a multi-modal, multimedia capable sensor networking framework called SenseTK.
SenseTK allows application writers to easily construct multi-modal, multimedia sensor networks that include both traditional scalar-based
sensors as well as sensors capable of recording sound and video. The distinguishing features of such systems include the need to push
application processing deep within the sensor network, the need to bridge extremely low power and low computation devices, and the need
to distribute and manage such systems. This paper describes the design and implementation of SenseTK and provides several diverse
examples to show the flexibility and unique aspects of SenseTK. Finally, we experimentally measure several aspects of SenseTK.