Multiview video in "texture-plus-depth" format enables a decoder to synthesize freely chosen intermediate views
for an enhanced visual experience. Nevertheless, transmission of multiple texture and depth maps over bandwidth-constrained
and loss-prone networks is challenging, especially for conferencing applications with stringent deadlines.
In this paper, we examine the problem of loss-resilient coding of depth maps by exploiting two observations.
First, different depth macroblocks have significantly different error sensitivities with respect to the reconstructed
images. Second, unlike texture, the relative overhead of using reference pictures with large prediction distance is
low for depth maps. This motivates our approach of assigning a weight to represent the varying error sensitivity
of each macroblock and using these weights to guide the selection of reference frames. Results show that (1) errors in
depth maps of sequences with high motion yield a significant drop in the quality of reconstructed images, and (2)
the proposed scheme can efficiently maintain the quality of reconstructed images even at relatively high packet
loss rates of 3-5%.
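The idea of weight-guided reference selection can be sketched as follows. This is an illustrative model, not the paper's actual algorithm: the cost function, the acknowledgment horizon `ack_delay`, and all constants are assumptions invented for the example. The intuition it captures is the paper's second observation: because a longer prediction distance is cheap for depth maps, an error-sensitive macroblock can afford to reference an older, already-acknowledged frame and so avoid error propagation from recently lost frames.

```python
def expected_cost(weight, distance, loss_prob, ack_delay=3, overhead_per_dist=0.1):
    """Hypothetical expected cost of referencing a frame `distance` pictures back.

    A longer prediction distance adds a small bit-rate overhead (low for depth
    maps, per the paper's observation). Frames older than `ack_delay` pictures
    are assumed acknowledged and intact; more recent references may have been
    lost with probability `loss_prob`, hurting sensitive macroblocks more.
    """
    coding_overhead = overhead_per_dist * distance
    p_ref_lost = loss_prob if distance < ack_delay else 0.0
    # Scale factor 10.0 is an arbitrary distortion weight for illustration.
    return coding_overhead + weight * p_ref_lost * 10.0

def select_reference(weight, loss_prob, max_distance=4):
    """Pick the prediction distance with minimum expected cost."""
    return min(range(1, max_distance + 1),
               key=lambda d: expected_cost(weight, d, loss_prob))
```

Under this toy model, a highly error-sensitive macroblock chooses a safe, distant reference while an insensitive one keeps the cheapest short-distance prediction.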
We describe a real-time CUDA implementation of an error concealment algorithm for high-definition video at
720p. The concealment method is based on decoder motion search on the high-resolution frame, using a thumbnail
as a guide, and is therefore comparable in complexity to encoder motion search. We discuss how the
requirements of decoder motion search differ from those of encoder search, and present a fast motion search algorithm
suitable for parallel implementation on a GPU. The design of the real-time CUDA implementation and its
performance analysis are also presented.
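The thumbnail-guided search idea can be illustrated with a minimal sketch (pure Python rather than CUDA, and not the paper's exact algorithm; function names and the nearest-sample thumbnails are assumptions): a coarse full search on the low-resolution thumbnails yields a motion vector for the lost block, which is then scaled up to copy the matched block from the full-resolution reference frame.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2D blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def get_block(frame, y, x, size):
    """Extract a size x size block at (y, x) from a frame (list of rows)."""
    return [row[x:x + size] for row in frame[y:y + size]]

def full_search(target, ref, y, x, radius):
    """Exhaustive search for `target` within `radius` of (y, x) in `ref`."""
    size = len(target)
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ry, rx = y + dy, x + dx
            if 0 <= ry <= len(ref) - size and 0 <= rx <= len(ref[0]) - size:
                cost = sad(target, get_block(ref, ry, rx, size))
                if best is None or cost < best[0]:
                    best = (cost, dy, dx)
    return best[1], best[2]

def conceal_block(thumb_cur, thumb_ref, ref, y, x, size, scale=4):
    """Conceal a lost full-resolution block using only the thumbnails.

    The coarse vector found on the thumbnails is scaled up and the matched
    block is copied from the full-resolution reference frame.
    """
    bs, ty, tx = size // scale, y // scale, x // scale
    dy, dx = full_search(get_block(thumb_cur, ty, tx, bs),
                         thumb_ref, ty, tx, radius=2)
    return get_block(ref, y + dy * scale, x + dx * scale, size)
```

Because each lost block's search is independent, this structure maps naturally onto one GPU thread block per macroblock, which is what makes a parallel CUDA implementation attractive.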
We describe a networked video application where personalized avatars, controlled by a group of "hecklers", are
overlaid on top of a real-time encoded video stream of an Internet game for multicast consumption. Rather
than passively observing the streamed content individually, the interactivity of the controllable avatars, along
with heckling voice exchange, engenders a sense of community during group viewing. We first describe how
the system splits video into independent regions with and without avatars for processing in order to minimize
complexity. Observing that the region with avatars is more delay-sensitive due to their interactivity, we then
show that the regions can be logically packetized into separable sub-streams, and be transported and buffered
with different delay requirements, so that the interactivity of the avatars can be maximized. The utility of our
system extends beyond Internet game watching to general community streaming of live or pre-encoded video
with visual overlays.
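The separable sub-stream idea can be sketched as below. This is a hypothetical illustration, not the system's actual packet format: the field names and the two delay budgets are invented for the example. The point is that each region travels in its own logically separable sub-stream with its own playout deadline, so the interactive avatar region uses a short buffer while the background video absorbs jitter with a long one.

```python
AVATAR_DELAY_MS = 100      # short buffer: keep avatar interaction responsive
BACKGROUND_DELAY_MS = 500  # long buffer: absorb jitter for the game video

def packetize(frame_id, capture_ms, avatar_payload, background_payload):
    """Split one frame into two logically separable sub-stream packets,
    each carrying its own playout deadline."""
    return [
        {"frame": frame_id, "stream": "avatar", "data": avatar_payload,
         "deadline_ms": capture_ms + AVATAR_DELAY_MS},
        {"frame": frame_id, "stream": "background", "data": background_payload,
         "deadline_ms": capture_ms + BACKGROUND_DELAY_MS},
    ]

def playable(packet, arrival_ms):
    """A packet is useful only if it arrives before its sub-stream deadline."""
    return arrival_ms <= packet["deadline_ms"]
```

A packet arriving 300 ms after capture would still be playable in the background sub-stream but not in the avatar sub-stream, which is exactly the asymmetry the system exploits.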
Streaming video in consumer homes over wireless IEEE 802.11 networks is becoming commonplace. Wireless 802.11 networks
pose unique difficulties for streaming high definition (HD), low latency video due to their error-prone physical layer and media
access procedures which were not designed for real-time traffic. HD video streaming, even with sophisticated H.264 encoding, is
particularly challenging due to the large number of packet fragments per slice. Cross-layer design strategies have been proposed
to address the issues of video streaming over 802.11. These designs increase streaming robustness by imposing some degree of
monitoring and control over 802.11 parameters from the application level, or by making the 802.11 layer media-aware. These
are important contributions, but none of the existing approaches directly takes 802.11 queuing into account. In this paper we take
a different approach and propose a cross-layer design allowing direct, expedient control over the wireless packet queue, while
obtaining timely feedback on transmission status for each packet in a media flow. This method can be fully implemented on a
media sender with no explicit support or changes required to the media client. We assume that due to congestion or deteriorating
signal-to-noise levels, the available throughput may drop substantially for extended periods of time, and thus propose video
source adaptation methods that allow matching the bit-rate to available throughput. A particular H.264 slice encoding is presented
to enable seamless stream switching between streams at multiple bit-rates, and we explore using new computationally efficient
transcoding methods when only a high bit-rate stream is available.
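The source adaptation loop can be sketched as follows. This is a hedged illustration under assumed names and numbers, not the paper's implementation: the stream rates, EWMA smoothing, and safety margin are all invented for the example. It combines the two ingredients the design relies on: per-packet transmission feedback to estimate available throughput, and a set of pre-encoded streams from which the sender picks the highest rate that fits.

```python
class ThroughputEstimator:
    """EWMA of delivered rate, fed by per-packet transmission feedback."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha
        self.kbps = None

    def on_packet_feedback(self, bits, tx_time_ms):
        # bits per millisecond == kbit/s, so no unit conversion is needed.
        sample = bits / tx_time_ms
        self.kbps = sample if self.kbps is None else \
            self.alpha * sample + (1 - self.alpha) * self.kbps
        return self.kbps

STREAM_RATES_KBPS = [500, 1500, 3000, 6000]  # hypothetical encodings

def select_stream(throughput_kbps, margin=0.8):
    """Highest pre-encoded rate under margin * throughput; lowest as fallback.
    Switching would occur only at slice boundaries so it remains seamless."""
    fitting = [r for r in STREAM_RATES_KBPS if r <= margin * throughput_kbps]
    return max(fitting) if fitting else STREAM_RATES_KBPS[0]
```

With an estimated throughput of 4 Mbit/s and a 0.8 margin, the sender would stay on the 3 Mbit/s stream; when throughput collapses, it drops to the lowest rate rather than overflowing the wireless queue.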
In this paper we evaluate a layered coding technique based on subband coding for the purpose of encoding medical images for real-time transmission over heterogeneous networks. The objective of this research is to support a medical conference in a heterogeneous networking scenario. The scalable coding scheme under study in this paper generates a single bit-stream, from which a number of sub-streams of varying bit-rates can be extracted. This makes it possible to support a multicast transmission scenario, where the different receivers are capable of receiving different bit-rate streams from the same source, in an efficient and scalable way. The multirate property also allows us to provide graceful degradation under loss when used over networks that support multiple priorities. This paper evaluates the quality of the video images encoded with the layered encoding technique at different bit-rates in terms of the peak signal-to-noise ratio for cine-angiogram video. It also describes experiments with the transmission of the video across an asynchronous transfer mode (ATM) local area network, using a two-layer encoded video stream and assigning different network service classes to the two layers. We study how the quality of the reconstructed signal changes with the ratio of the bit-rates of the high and low priority layers, for various levels of congestion in the ATM network.
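The multirate extraction property can be illustrated with a minimal sketch (layer names and rates are invented for the example, not taken from the paper): each receiver keeps the longest prefix of layers whose cumulative bit-rate fits its capacity, so the base layer degrades gracefully as enhancement layers are dropped.

```python
# Hypothetical layered bit-stream: (layer name, rate in kbit/s).
# The base layer would be sent in the high-priority network service class.
LAYERS = [("base", 256), ("enh1", 256), ("enh2", 512)]

def extract_substream(capacity_kbps):
    """Keep the longest prefix of layers whose cumulative rate fits.

    Returns the chosen layer names and their total bit-rate. Because layers
    are strictly ordered, every receiver of a multicast session can extract
    its own sub-stream from the same single bit-stream.
    """
    chosen, total = [], 0
    for name, rate in LAYERS:
        if total + rate > capacity_kbps:
            break
        chosen.append(name)
        total += rate
    return chosen, total
```

A 600 kbit/s receiver would keep the base layer plus one enhancement layer, while a 300 kbit/s receiver falls back to the base layer alone.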