The bulk of the video content available today over the Internet and over mobile networks suffers from many
imperfections caused during acquisition and transmission. In the case of user-generated content, which is typically
produced with inexpensive equipment, these imperfections manifest themselves as noise, temporal flicker,
and blurring, to name just a few. Imperfections caused by compression noise and temporal flicker are
present in both studio-produced and user-generated video content transmitted at low bit-rates. In this paper,
we introduce an algorithm designed to reduce temporal flicker and noise in video sequences. The algorithm takes
advantage of the sparse nature of video signals in an appropriate transform domain that is chosen adaptively based
on local signal statistics. Since the signal of interest has a sparse representation in this transform domain while
flicker and noise are spread over the entire domain, the latter can be reduced easily by enforcing sparsity. Our results
show that the proposed algorithm reduces flicker and noise significantly and enables better presentation of compressed video.
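The following is a minimal Python sketch of the sparsity-enforcement idea only, with a fixed 8x8 block DCT standing in for the paper's adaptively chosen transform; the function name and the threshold tau are illustrative assumptions, not the paper's actual parameters.

```python
# Minimal sketch: hard thresholding of transform coefficients per block.
# A fixed 2-D DCT stands in for the adaptively selected sparsifying transform.
import numpy as np
from scipy.fft import dctn, idctn

def denoise_block_threshold(frame, block=8, tau=25.0):
    """Zero out small transform coefficients in each block of one frame.

    Flicker and noise spread thinly over many coefficients, so removing
    small coefficients suppresses them while the sparse signal survives.
    """
    out = frame.astype(float).copy()
    h, w = frame.shape
    for y in range(0, h - h % block, block):
        for x in range(0, w - w % block, block):
            coeffs = dctn(out[y:y+block, x:x+block], norm='ortho')
            coeffs[np.abs(coeffs) < tau] = 0.0   # enforce sparsity
            out[y:y+block, x:x+block] = idctn(coeffs, norm='ortho')
    return out
```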
Multimedia services for mobile phones are becoming increasingly popular thanks to capabilities brought about
by location awareness, customized programming, interactivity, and portability. As these services attract more
users, there is a growing desire to seamlessly extend the mobile multimedia experience to stationary environments,
where high-resolution displays can offer significantly better viewing conditions. In this paper, we propose a
fast, high-quality super-resolution algorithm that enables high-resolution display of low-resolution video. The
proposed algorithm, SWAT, accomplishes sparse reconstructions using directionally warped transforms and spatially
adaptive thresholding. Comparisons are made with some existing techniques in terms of PSNR and visual
quality. Simulation examples show that SWAT significantly outperforms these techniques while staying within
a limited computational complexity envelope.
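As a rough illustration of sparse reconstruction for super-resolution, the sketch below alternates a sparsity step with a data-consistency step; the plain block DCT, fixed threshold, and interpolation kernels are stand-ins for SWAT's directionally warped transforms and spatially adaptive thresholding, which are not reproduced here.

```python
# Illustrative super-resolution sketch, NOT the actual SWAT algorithm:
# alternate transform-domain thresholding with consistency against the
# observed low-resolution frame.
import numpy as np
from scipy.fft import dctn, idctn
from scipy.ndimage import zoom

def sr_sketch(lr, scale=2, iters=20, tau=10.0, block=8):
    hr = zoom(lr.astype(float), scale, order=3)      # initial interpolation
    for _ in range(iters):
        h, w = hr.shape
        # 1) sparsity step: threshold transform coefficients blockwise
        for y in range(0, h - h % block, block):
            for x in range(0, w - w % block, block):
                c = dctn(hr[y:y+block, x:x+block], norm='ortho')
                c[np.abs(c) < tau] = 0.0
                hr[y:y+block, x:x+block] = idctn(c, norm='ortho')
        # 2) data-consistency step: push the downsampled estimate back
        #    toward the observed low-resolution frame
        residual = lr - zoom(hr, 1.0 / scale, order=3)
        hr += zoom(residual, scale, order=3)
    return hr
```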
In this paper, within the context of the MPEG-4 standard, we report on preliminary experiments in three areas -- authoring of MPEG-4 content, a player/browser for MPEG-4 content, and streaming of MPEG-4 content. MPEG-4 is a new standard for coding of audiovisual objects; the core of the MPEG-4 standard is complete, while amendments are in various stages of completion. MPEG-4 addresses compression of audio and visual objects, their integration by scene description, and interactivity of users with such objects. The MPEG-4 scene description is based on a VRML-like language for 3D scenes, extended to 2D scenes, and supports integration of 2D and 3D scenes. This scene description language is called BIFS. First, we introduce the basic concepts behind BIFS and then show, with an example, the textual authoring of the different components needed to describe an audiovisual scene in BIFS; the textual BIFS is then saved as compressed binary file(s) for storage or transmission. Then, we discuss a high-level design of an MPEG-4 player/browser that uses the main components from authoring, such as the encoded BIFS stream, the media files it refers to, and the multiplexed object descriptor stream, to play an MPEG-4 scene. We also discuss our extensions to such a player/browser. Finally, we present our work on streaming of MPEG-4 -- the payload format, modifications to the client MPEG-4 player/browser, the server-side infrastructure, and the example content used in our MPEG-4 streaming experiments.
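To make the scene-description concept concrete, here is a hypothetical Python sketch of the kind of tree structure BIFS expresses; the node and field names are illustrative placeholders, not BIFS syntax or any MPEG-4 API.

```python
# Hypothetical scene tree in the spirit of BIFS: audiovisual objects are
# composed hierarchically, and a player resolves the media streams the
# scene refers to. Node/field names here are invented for illustration.
scene = {
    "type": "Group",
    "children": [
        {"type": "VideoObject", "source": "stream_1", "position": (0, 0)},
        {"type": "AudioObject", "source": "stream_2"},
        {"type": "Text", "string": "Welcome", "position": (10, 120)},
    ],
}

def enumerate_media(node):
    """Walk the scene tree and collect the media streams it refers to."""
    if "source" in node:
        yield node["source"]
    for child in node.get("children", []):
        yield from enumerate_media(child)

print(list(enumerate_media(scene)))  # ['stream_1', 'stream_2']
```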
Remote authentication is vital for many network-based applications. As the number of such applications increases, the user-friendliness of the authentication process, particularly as it relates to password management, becomes as important as its reliability. The multimedia capabilities of modern terminal equipment can provide the basis for a dependable and easy-to-use authentication system that does not require the user to memorize passwords. This paper outlines our implementation of an authentication system based on the joint use of the speech and facial video of a user. Our implementation shows that the voice and the video of the associated lip movements, when used together, can be very effective for password-free authentication.
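A minimal sketch of the joint-use idea is shown below as late score fusion; the scoring inputs, weight, and threshold are illustrative assumptions and do not reflect the paper's actual models.

```python
# Hypothetical late fusion of two modality match scores into one decision.
def authenticate(audio_score, video_score, weight=0.5, threshold=0.7):
    """Fuse speech and lip-movement scores (each normalized to [0, 1])."""
    fused = weight * audio_score + (1.0 - weight) * video_score
    return fused >= threshold

# Example: strong voice match, moderate lip-movement match -> accepted
print(authenticate(audio_score=0.9, video_score=0.6))  # True
```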
This paper describes a system that has been designed and built at AT&T Bell Labs for studying transmission of real-time MPEG-2 video over ATM networks for multicast applications. The set-up comprises a hardware real-time MPEG-2 video, audio, and system encoder, an ATM network adaptation module for MPEG-2 transport over AAL-5, an ATM switch, a software system decoder, and a hardware elementary stream decoder. The MPEG-2 transport stream has been characterized in terms of robustness to errors. This preliminary study showed the higher importance of the structural information of the stream (PES packet headers, TS headers, sequence and picture headers, etc.) relative to the coded video data (motion vectors, DCT coefficients, etc.). A brief study of current MPEG-2 hardware decoding architectures allowed us to better understand the effects of bit-stream errors on the resulting video quality. In our experiments, while the loss of some structural data such as picture start codes led the hardware decoder to lose synchronization or to freeze, the loss of video data only affected the image quality. Furthermore, the recovery times from a loss of synchronization were orders of magnitude longer than those from a loss of video data. An error-resilient real-time software transport stream decoder has been developed. In multiplex-wide operations (i.e., operations on the entire transport stream) it takes advantage of ring buffers and manages the timing information appropriately. In video-stream-specific operations it uses resynchronization mechanisms at the picture level that exploit the redundancy of the PES and transport stream syntax. Furthermore, timely data transfers between the system decoder and the elementary stream decoder are employed. Experiments show that proper use of these methods can significantly improve the system performance.
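The picture-level resynchronization idea can be sketched as a scan for the next safe decoding point in the video elementary stream; the start-code values below are from the MPEG-2 video syntax, while the function itself is a simplified illustration of the decoder's recovery logic.

```python
# Sketch of picture-level resynchronization: after an error, resume
# decoding at the next picture or sequence header in the elementary stream.
PICTURE_START  = b"\x00\x00\x01\x00"   # MPEG-2 picture_start_code
SEQUENCE_START = b"\x00\x00\x01\xb3"   # MPEG-2 sequence_header_code

def next_resync_point(es: bytes, pos: int) -> int:
    """Return the offset of the next safe decoding point, or -1 if none."""
    hits = [es.find(code, pos) for code in (PICTURE_START, SEQUENCE_START)]
    hits = [h for h in hits if h != -1]
    return min(hits) if hits else -1
```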
The increasing availability of high-speed local area networks (LANs) and the recent developments in image coding hardware following international standards have made multipoint desktop video teleconferencing feasible using mostly off-the-shelf hardware and software. We have implemented a multipoint video teleconferencing system using workstations and personal computers (PCs) on a fiber distributed data interface (FDDI) LAN. The system is based on motion JPEG image coding and uses commercially available hardware and software wherever possible. In this paper, we outline our design experiences.
Scalable video coding is important in a number of applications where video needs to be decoded and displayed at a variety of resolution scales. It is more efficient than simulcasting, in which all desired resolution scales are coded totally independently of one another within the constraint of a fixed available bandwidth. In this paper, we focus on scalability using the frequency domain approach. We employ the framework proposed for the ongoing second phase of the Moving Picture Experts Group standard (MPEG-2) to study the performance of one such scheme and investigate improvements aimed at increasing its efficiency. Practical issues related to multiplexing of encoded data of various resolution scales to facilitate decoding are considered. Simulations are performed to investigate the potential of a chosen frequency domain scheme. Various prospects and limitations are also discussed.
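As a concrete illustration of the frequency domain approach, the sketch below splits each 8x8 DCT block into a low-frequency base layer and a high-frequency enhancement layer; quantization, motion compensation, and the MPEG-2 multiplexing details are omitted, and the 4x4 split is an illustrative choice.

```python
# Minimal frequency-domain scalability sketch: the 4x4 low-frequency corner
# of each 8x8 DCT block serves a quarter-resolution base layer.
import numpy as np
from scipy.fft import dctn, idctn

def split_layers(block8):
    """Split an 8x8 pixel block into base and enhancement coefficients."""
    coeffs = dctn(block8, norm='ortho')
    base = coeffs[:4, :4].copy()        # coarse scale: low-frequency band
    enhancement = coeffs.copy()
    enhancement[:4, :4] = 0.0           # residual high-frequency detail
    return base, enhancement

def decode_base(base):
    """A base-layer-only decoder yields a 4x4 (quarter-resolution) block."""
    return idctn(base, norm='ortho') / 2.0   # /2 offsets the 8->4 rescale
```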
This paper presents a parallel scheme for three-dimensional data visualization at interactive rates. The scheme is
particularly suitable for multiprocessor systems with distributed frame buffers and is currently implemented on an AT&T
Pixel Machine, a parallel computer based on mesh-connected digital signal processors with a distributed frame buffer.
A nearly linear performance increase with the number of processors in the mesh is obtained by partitioning the original
three-dimensional data into sub-blocks and processing each sub-block in parallel. The approach is very flexible in
implementing a variety of visualization techniques, such as volume compositing (translucent models), binary-class and
percentage mixtures, and surface-based volume rendering.
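A shared-memory Python sketch of the partitioning idea follows; it uses maximum intensity projection because that operation combines associatively across sub-blocks, whereas the Pixel Machine implementation and the compositing variants above require more careful ordering.

```python
# Illustrative sketch: split the volume into sub-blocks along the
# projection axis, project each in parallel, then merge the partials.
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def render_parallel(volume, n_blocks=4):
    """volume: 3-D array; maximum intensity projection along axis 0."""
    sub_blocks = np.array_split(volume, n_blocks, axis=0)
    with ThreadPoolExecutor(max_workers=n_blocks) as pool:
        partials = list(pool.map(lambda b: b.max(axis=0), sub_blocks))
    return np.maximum.reduce(partials)   # combine per-block projections
```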
This course explains the basic principles and current applications of video streaming. A primary goal of the course is to reveal the underlying mechanisms and techniques shared by all networked video applications. In addition, the relevant properties of related networking protocols (e.g., TCP and UDP) and of the latest video codec (H.264/AVC) will be explained. A classification of networked video applications, from video-on-demand to interactive video, will be presented together with their specific problems and the current solutions. The emphasis will be on transport layer techniques, including packetization issues, loss recovery, delay jitter removal, synchronization, and multiplexing. The Real-time Transport Protocol (RTP/RTCP) will be presented in detail. Examples will be given from the popular video streaming solutions available today, either as proprietary offerings or as open-source solutions. Technologists and managers who are considering deploying video streaming solutions or evaluating the relative merits of existing technologies will benefit from this course.
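As a small taste of the transport-layer material, here is a Python sketch that parses the 12-byte RTP fixed header defined in RFC 3550; header extensions and CSRC lists are ignored for brevity.

```python
# Parse the RTP fixed header (RFC 3550); only the first 12 bytes.
import struct

def parse_rtp_header(packet: bytes) -> dict:
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version":      b0 >> 6,        # always 2 for RTP
        "padding":      (b0 >> 5) & 1,
        "extension":    (b0 >> 4) & 1,
        "csrc_count":   b0 & 0x0F,
        "marker":       b1 >> 7,
        "payload_type": b1 & 0x7F,
        "sequence":     seq,            # loss detection and reordering
        "timestamp":    timestamp,      # jitter removal, synchronization
        "ssrc":         ssrc,           # stream source identifier
    }
```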