This article attempts to identify some of the technology and research challenges facing the digital media industry in the future. We first discuss several trends in the industry, such as the rapid growth of broadband Internet networks and the emergence of networking and media-capable devices in the home. Next, we present technical challenges that result from these trends, such as effective media interoperability in devices, and provide a brief overview of Windows Media, which is one of the technologies in the market attempting to address these challenges. Finally, given these trends and the state of the art, we argue that further research on data compression, encoder optimization, and multi-format transcoding can potentially make a significant technical and business impact in digital media. We also explore the reasons that research on related techniques such as wavelets or scalable video coding is having a relatively minor impact in today’s practical digital media systems.
One of the largest sources of compression in vision is the reduction of sensitivity and resolution as a function of eccentricity. However, use of this visual property has been limited to systems that directly measure the viewer's gaze position. We have applied visual eccentricity models to videophone compression applications without using eye tracking by combining the visual model with a face tracking algorithm. In lieu of a gaze detector, we assume the gaze will be directed to the faces appearing in images. The incorporation of both resolution-based and sensitivity-based eccentricity models in a low bitrate video compression standard will be discussed. For videophone applications, the reduction in bitrate while retaining similar image quality is up to 50 percent. Problems arising from the improved temporal sensitivity of the periphery, despite its reduced spatial bandwidth, will be discussed.
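The idea of substituting tracked faces for a measured gaze point can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the function names, the hyperbolic falloff `e2 / (e2 + e)`, the half-resolution eccentricity of 2 degrees, and the viewing distance expressed in pixels. The sketch only computes a relative resolution per pixel; how an encoder would use it (for example, to scale quantization) is left open.

```python
import math

def eccentricity_deg(px, py, faces, viewing_distance_px):
    """Visual angle, in degrees, between pixel (px, py) and the nearest face center."""
    nearest = min(math.hypot(px - fx, py - fy) for fx, fy in faces)
    return math.degrees(math.atan2(nearest, viewing_distance_px))

def relative_resolution(px, py, faces, viewing_distance_px=1000.0,
                        half_res_ecc_deg=2.0):
    """Relative spatial resolution in (0, 1]; 1.0 at an assumed gaze point.

    Hypothetical falloff model: resolution halves at `half_res_ecc_deg`
    degrees of eccentricity. Both defaults are illustrative values.
    """
    ecc = eccentricity_deg(px, py, faces, viewing_distance_px)
    return half_res_ecc_deg / (half_res_ecc_deg + ecc)

# Face centers as they might come from a face tracker (made-up coordinates).
faces = [(100, 100), (400, 120)]
for point in [(100, 100), (150, 100), (250, 110)]:
    print(point, round(relative_resolution(*point, faces), 3))
```

Taking the minimum distance over all faces reflects the stated assumption that the viewer may be looking at any face in the image, so every face is treated as a possible gaze point.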
In block-based video coding, the current frame to be encoded is decomposed into blocks of the same size, and a motion vector is used to improve the prediction for each block. The motion vectors and the difference frame, which contains the blocks' prediction errors, must be encoded with bits. Typically, choosing a smaller block size will improve the prediction and hence decrease the number of difference frame bits, but it will increase the number of motion bits since more motion vectors need to be encoded. Not surprisingly, there must be some value for the block size that optimizes the tradeoff between motion and difference frame bits, and thus minimizes their sum. Despite the widespread experience with block-based video coders, there is little analysis or theory that quantitatively explains the effect of block size on encoding bit rate, and ordinarily the block size for a coder is chosen based on empirical experiments on video sequences of interest. In this work, we derive a procedure to determine the optimal block size that minimizes the encoding rate for a typical block-based video coder. To do this, we analytically model the effect of block size and derive expressions for the encoding rates for both motion vectors and difference frames, as functions of block size. Minimizing these expressions leads to a simple formula that indicates how to choose the block size in these types of coders. This formula also shows that the best block size is a function of the accuracy with which the motion vectors are encoded and several parameters related to key characteristics of the video scene, such as image texture, motion activity, interframe noise, and coding distortion. We implement the video coder and use our analysis to optimize and explain its performance on real video frames.
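The tradeoff between motion bits and difference frame bits can be made concrete with a toy rate model. The model below is a hypothetical stand-in, not the formula derived in the paper: it assumes a fixed cost of `mv_bits` per motion vector (so motion bits per pixel fall as 1/B² for block size B) and difference frame bits per pixel that grow linearly with B at a rate `growth`. Both parameters and all function names are illustrative.

```python
def total_bits_per_pixel(block_size, mv_bits=12.0, growth=0.01):
    """Toy rate model (an assumption, not the paper's expression).

    motion term:     one vector of `mv_bits` bits per block of B*B pixels
    difference term: prediction gets worse as blocks grow, modeled as growth*B
    """
    motion = mv_bits / block_size ** 2
    difference = growth * block_size
    return motion + difference

def best_block_size(candidates, mv_bits=12.0, growth=0.01):
    """Pick the candidate block size with the smallest modeled rate."""
    return min(candidates,
               key=lambda b: total_bits_per_pixel(b, mv_bits, growth))

def continuous_optimum(mv_bits=12.0, growth=0.01):
    """Setting d/dB (mv_bits/B^2 + growth*B) = 0 gives B = (2*mv_bits/growth)^(1/3)."""
    return (2.0 * mv_bits / growth) ** (1.0 / 3.0)

for b in (4, 8, 16, 32, 64):
    print(b, round(total_bits_per_pixel(b), 4))
print("best candidate:", best_block_size((4, 8, 16, 32, 64)))
print("continuous optimum:", round(continuous_optimum(), 2))
```

Even in this simplified form, the optimum shifts in the direction the abstract describes: costlier (more accurate) motion vectors favor larger blocks, while a faster-growing prediction error favors smaller ones.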
In block-based motion-compensated video coding, a fixed-resolution motion field with one motion vector per image block is used to improve the prediction of the frame to be coded. All motion vectors are encoded with the same fixed accuracy, typically 1 or 1/2 pixel accuracy. In this work, we explore the benefits of encoding the motion vectors with other accuracies, and of encoding different motion vectors with different accuracies within the same frame. To do this, we analytically model the effect of motion vector accuracy and derive expressions for the encoding rates for both motion vectors and difference frames, in terms of the accuracies. Minimizing these expressions leads to simple formulas that indicate how accurately to encode the motion vectors in a classical block-based motion-compensated video coder. These formulas also show that the motion vectors must be encoded more accurately where more texture is present, and less accurately when there is much interframe noise. We implement video coders based on our analysis and present experimental results on real video frames. These results suggest that our equations are accurate, and that significant bit rate savings can be achieved when our optimal motion vector accuracies are used.
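The qualitative conclusion, finer vectors for textured blocks and coarser vectors under heavy interframe noise, can be reproduced with a small sketch. The model is an assumption chosen for illustration and should not be read as the paper's equations: prediction error variance is taken as noise variance plus texture (mean squared gradient) times step²/12, the variance of a uniform rounding error; difference bits use a high-rate log approximation; each vector component is charged log2(range/step) bits. All names and default values are hypothetical.

```python
import math

def block_bits(step, texture, noise_var, block_size=16,
               search_range=16.0, distortion=1.0):
    """Modeled bits for one block when its vector is coded with accuracy `step` pixels."""
    # Rounding the displacement to a grid of width `step` adds error energy
    # proportional to the local texture (assumed model).
    error_var = noise_var + texture * step ** 2 / 12.0
    diff_bits_per_pixel = max(0.0, 0.5 * math.log2(error_var / distortion))
    # Two vector components, each with (2 * search_range / step) possible values.
    motion_bits = 2.0 * math.log2(2.0 * search_range / step)
    return block_size ** 2 * diff_bits_per_pixel + motion_bits

def best_step(texture, noise_var, candidates=(1.0, 0.5, 0.25, 0.125)):
    """Accuracy (in pixels) that minimizes the modeled bits for this block."""
    return min(candidates, key=lambda s: block_bits(s, texture, noise_var))

print("high texture, low noise :", best_step(texture=400.0, noise_var=4.0))
print("low texture, high noise :", best_step(texture=1.0, noise_var=40.0))
```

Because the choice is made per block, the same frame can mix accuracies, which is the second possibility the abstract raises.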
Classical block-based motion-compensated video coders need to find and code a motion field with one motion vector per image block. All motion vectors are computed and encoded with the same fixed accuracy, typically 1 or one-half pixel accuracy. Higher motion vector accuracies have been shown to significantly reduce the total bit rate in some video sequences, but motion estimation at such subpixel accuracies is computationally expensive and is usually not performed in practice. In this paper we show that computing and encoding different motion vectors with different accuracies in the same frame can significantly reduce the total bit rate, and that the complexity of the block adaptive motion estimation procedure can be as low as that of the classical motion estimation at 1 pixel accuracy. Our new block adaptive motion estimator uses a simple technique to decide how accurately to compute the motion vector for each block. This technique results from an analysis of the effect of the motion vector accuracy on the total bit rate, and indicates that motion vectors of higher texture blocks must be computed more accurately and that at higher levels of compression lower motion vector accuracies are needed. We implement two video coders based on our technique, present results on real video frames, and describe the rate/complexity benefits of our procedure.
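A per-block decision rule of the kind described might look like the following sketch. The rule itself is a hypothetical illustration, not the technique from the paper: texture is measured as the mean squared difference between neighboring pixels, and the vector is refined only while the modeled rounding error energy (texture × step²) exceeds the quantization noise energy (quant_step²). The function names, the candidate accuracies, and the use of the quantizer step as a proxy for compression level are all assumptions.

```python
def block_texture(block):
    """Mean squared difference between horizontally and vertically adjacent pixels."""
    height, width = len(block), len(block[0])
    total, count = 0.0, 0
    for y in range(height):
        for x in range(width):
            if x + 1 < width:
                total += (block[y][x + 1] - block[y][x]) ** 2
                count += 1
            if y + 1 < height:
                total += (block[y + 1][x] - block[y][x]) ** 2
                count += 1
    return total / count

def choose_accuracy(block, quant_step, candidates=(1.0, 0.5, 0.25)):
    """Return the coarsest accuracy whose rounding error is masked by quantization.

    `candidates` must be ordered from coarse to fine. Blocks that stop at 1.0
    skip the subpixel search entirely, which is where the complexity saving
    would come from in this sketch.
    """
    texture = block_texture(block)
    for step in candidates:
        if texture * step ** 2 <= quant_step ** 2:
            return step
    return candidates[-1]

flat = [[128] * 8 for _ in range(8)]
checker = [[255 * ((x + y) % 2) for x in range(8)] for y in range(8)]
print("flat block, fine quantizer      :", choose_accuracy(flat, quant_step=8))
print("textured block, fine quantizer  :", choose_accuracy(checker, quant_step=8))
print("textured block, coarse quantizer:", choose_accuracy(checker, quant_step=200))
```

The texture measurement costs one pass over the block, which is small next to a motion search, so a rule of this shape adds little overhead to integer-pixel estimation.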