Nowadays, a strong need exists for the efficient organization of an ever-increasing amount of home video content. To create an efficient system for the management of home video content, this content needs to be categorized in a semantic way. So far, a significant amount of research has been dedicated to semantic video categorization. However, conventional categorization approaches often rely on concepts that are unnecessary in, and complicated algorithms that are not suited to, the context of home video categorization. To overcome this problem, this paper
proposes a novel home video categorization method that adopts semantic home photo categorization. To use home photo categorization in the context of home video, we segment the video content into shots and extract key frames that represent each shot. To extract the semantics from these key frames, we divide each key frame into ten local regions and extract low-level features from each region. Based on the low-level features extracted for each local region, we can predict the semantics of a particular key frame.
particular key frame. To verify the usefulness of the proposed home video categorization method, experiments were
performed with home video sequences, labeled by concepts part of the MPEG-7 VCE2 dataset. To verify the usefulness
of the proposed home video categorization method, experiments were performed with 70 home video sequences. For the
home video sequences used, the proposed system produced a recall of 77% and an accuracy of 78%.
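To make the region-based feature extraction concrete, the following sketch splits a frame into ten local regions and computes one trivial low-level feature per region; the 2x5 grid layout and the mean-intensity feature are illustrative assumptions, not the configuration used in the paper.

```python
# Hedged sketch: split a key frame into ten local regions (here a 2x5
# grid -- an assumption) and compute a placeholder low-level feature
# (mean intensity) for each region.

def split_into_regions(frame, rows=2, cols=5):
    """Split a frame (a list of pixel rows) into rows*cols regions."""
    h, w = len(frame), len(frame[0])
    rh, rw = h // rows, w // cols
    regions = []
    for r in range(rows):
        for c in range(cols):
            region = [row[c * rw:(c + 1) * rw]
                      for row in frame[r * rh:(r + 1) * rh]]
            regions.append(region)
    return regions

def mean_intensity(region):
    """A placeholder low-level feature: mean pixel intensity."""
    pixels = [p for row in region for p in row]
    return sum(pixels) / len(pixels)

# Example: a synthetic 4x10 frame yields ten 2x2 regions.
frame = [[x + 10 * y for x in range(10)] for y in range(4)]
features = [mean_intensity(r) for r in split_into_regions(frame)]
```

The ten per-region feature vectors (real systems would use color, edge, and texture descriptors rather than a single mean) are what a classifier would consume to predict the semantics of the key frame.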
Environments for the delivery and consumption of multimedia are often very heterogeneous, due to the use of various
terminals in varying network conditions. One example of such an environment is a wireless network providing
connectivity to a plethora of mobile devices. H.264/AVC Scalable Video Coding (SVC) can be utilized to deal with
diverse usage environments. However, in order to optimally tailor scalable video content along the temporal, spatial, or
perceptual quality axes, a quality metric is needed that reliably models subjective quality. The major contribution of this
paper is the development of a novel quality metric for scalable video bit streams having a low spatial resolution,
targeting consumption in wireless video applications. The proposed quality metric allows modeling the temporal, spatial,
and perceptual quality characteristics of SVC bit streams. This is realized by taking into account several properties of the
compressed bit streams, such as the temporal and spatial variation of the video content, the frame rate, and PSNR values.
An extensive set of subjective experiments was conducted to construct our quality metric and to verify its reliability. The experimental results show that the proposed quality metric accurately reflects subjective quality.
Moreover, the performance of the quality metric is uniformly high for video sequences with different temporal and spatial characteristics.
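A metric of the kind described above can be sketched as a combination of normalized bit-stream features. The weights, ranges, and functional form below are purely hypothetical placeholders for illustration; they are not the model constructed in the paper.

```python
# Illustrative only: combine bit-stream properties (PSNR, frame rate,
# temporal and spatial variation of the content) into one quality score.
# All weights and normalization ranges are hypothetical assumptions.

def quality_score(psnr_db, frame_rate, temporal_var, spatial_var,
                  w=(0.6, 0.25, 0.1, 0.05)):
    """Combine normalized bit-stream features into a score in [0, 1]."""
    psnr_n = min(max((psnr_db - 20.0) / 30.0, 0.0), 1.0)  # assume 20-50 dB
    rate_n = min(frame_rate / 30.0, 1.0)                  # cap at 30 fps
    tv_n = 1.0 / (1.0 + temporal_var)   # more motion -> harder to encode
    sv_n = 1.0 / (1.0 + spatial_var)    # more detail -> harder to encode
    return w[0] * psnr_n + w[1] * rate_n + w[2] * tv_n + w[3] * sv_n
```

In practice the weights of such a model are fitted to subjective test scores, which is exactly the role of the subjective experiments mentioned above.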
The coding efficiency of the H.264/AVC standard makes the decoding process computationally demanding. This has
limited the availability of cost-effective, high-performance solutions. Modern computers are typically equipped with
powerful yet cost-effective Graphics Processing Units (GPUs) to accelerate graphics operations. These GPUs can be
addressed by means of a 3-D graphics API such as Microsoft Direct3D or OpenGL, using programmable shaders as
generic processing units for vector data. The new CUDA (Compute Unified Device Architecture) platform of NVIDIA
provides a straightforward way to address the GPU directly, without the need for a 3-D graphics API in the middle. In
CUDA, a compiler generates executable code from C code with specific modifiers that determine the execution model.
This paper first presents a custom-developed H.264/AVC renderer, capable of executing motion compensation
(MC), reconstruction, and Color Space Conversion (CSC) entirely on the GPU. To steer the GPU, Direct3D combined
with programmable pixel and vertex shaders is used. Next, we also present a GPU-enabled decoder utilizing the new
CUDA architecture from NVIDIA. This decoder performs MC, reconstruction, and CSC on the GPU as well. Our results
compare both GPU-enabled decoders with a CPU-only decoder in terms of speed, complexity, and CPU requirements. Our measurements show that a significant speedup is possible relative to a CPU-only solution. As an example, real-time playback of high-definition video (1080p) was achieved with both our Direct3D-based and our CUDA-based decoders.
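Of the three decoding stages moved to the GPU, Color Space Conversion is the easiest to illustrate in isolation. The sketch below shows the standard full-range BT.601 YCbCr-to-RGB transform, written for the CPU in Python purely to clarify what the CSC stage computes per pixel; on the GPU the same arithmetic runs massively in parallel.

```python
# Hedged sketch of the CSC stage: full-range BT.601 YCbCr -> RGB for a
# single pixel. GPU shaders/kernels apply this transform per pixel in
# parallel; this scalar version is for illustration only.

def ycbcr_to_rgb(y, cb, cr):
    """Convert one full-range BT.601 YCbCr pixel to an RGB triple."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    clip = lambda v: max(0, min(255, round(v)))  # clamp to 8-bit range
    return clip(r), clip(g), clip(b)
```

Because the transform is identical for every pixel and has no data dependencies, it maps naturally onto pixel shaders (Direct3D) and thread blocks (CUDA) alike.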
Three low-complexity algorithms that allow spatial scalability in the context of video coding are presented in this paper. We discuss the feasibility of reusing motion and residual texture information of the base layer in the enhancement layer. The prediction errors that arise from the discussed filters and schemes are evaluated in terms of the Mean of Absolute Differences (MAD). For the interpolation of the decoded pictures of the base layer, the presented 6-tap and bicubic filters perform significantly better than the bilinear and nearest-neighbour filters. In contrast, when reusing the motion vector field and the error pictures of the base layer, the bilinear filter performs best for the interpolation of residual texture information. In general, reusing the motion vector field and the error pictures of the base layer gives the lowest prediction errors. However, our tests showed that for some sequences containing regions with complex motion activity, interpolating the decoded picture of the base layer gives the best results. This means that an encoder would have to compare all possible prediction schemes, combined with all interpolation filters, in order to achieve optimal prediction; obviously, this is not feasible for real-time content creation.
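As an illustration of the evaluation criterion, the sketch below upsamples a base-layer picture with the simplest of the discussed filters (nearest neighbour) and measures the Mean of Absolute Differences against a reference picture; picture sizes and values are illustrative.

```python
# Hedged sketch: nearest-neighbour 2x spatial upsampling of a base-layer
# picture, evaluated with the Mean of Absolute Differences (MAD).

def upsample_nearest(pic, factor=2):
    """Nearest-neighbour spatial upsampling of a 2-D picture."""
    return [[row[x // factor] for x in range(len(row) * factor)]
            for row in pic for _ in range(factor)]

def mad(a, b):
    """Mean of Absolute Differences between two equally sized pictures."""
    diffs = [abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    return sum(diffs) / len(diffs)

# Example: upsample a tiny 2x2 base-layer picture to 4x4, then compare
# the prediction against an enhancement-layer picture with mad().
base = [[0, 2], [4, 6]]
prediction = upsample_nearest(base)
```

The 6-tap and bicubic filters mentioned above replace the `x // factor` lookup with a weighted sum over neighbouring samples; the MAD criterion stays the same.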
H.264/AVC is a new specification for digital video coding that aims at deployment in a broad range of multimedia applications, such as video conferencing, digital television broadcasting, and Internet streaming. This is, for instance, reflected by the design goals of the standard, which include the provision of an efficient compression scheme and a network-friendly representation of the compressed data. These requirements have resulted in a very flexible syntax and architecture that is fundamentally different from previous standards for video compression. In this paper, a detailed discussion is provided on how to apply an extended version of the MPEG-21 Bitstream Syntax Description Language (MPEG-21 BSDL) to the Annex B syntax of the H.264/AVC specification. This XML-based language facilitates the high-level manipulation of an H.264/AVC bitstream in order to take into account the constraints and requirements of a particular usage environment. Our performance measurements and optimizations show that it is possible to use MPEG-21 BSDL in the context of the current H.264/AVC standard with feasible computational complexity when exploiting temporal scalability.
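The kind of adaptation that such bitstream descriptions enable can be illustrated in miniature: exploiting temporal scalability amounts to dropping coded pictures above a target temporal level. The frame records and level assignments below are hypothetical stand-ins for what a BSDL-driven transformation would select from a real Annex B bitstream.

```python
# Illustrative sketch: temporal scalability as a filter over coded
# pictures. "poc" (picture order count) and "level" assignments are
# hypothetical; a real adaptation engine would derive them from the
# BSDL description of the bitstream.

def adapt_temporal(frames, max_level):
    """Keep only the frames whose temporal level is within the target."""
    return [f for f in frames if f["level"] <= max_level]

# A dyadic 8-frame hierarchy: level 0 anchors, level 1, level 2.
stream = [{"poc": i,
           "level": 0 if i % 4 == 0 else (1 if i % 2 == 0 else 2)}
          for i in range(8)]
half_rate = adapt_temporal(stream, 1)  # drop the highest temporal level
```

Dropping a full temporal level halves the frame rate while leaving the remaining pictures decodable, which is why this is the cheapest scalability axis to exploit.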
Video coding is used under the hood of many multimedia applications, such as video conferencing, digital storage media, television broadcasting, and Internet streaming. Recently, new standards-based and proprietary video coding technologies have emerged. An interesting problem is how to evaluate these different video coding solutions in terms of delivered quality.
In this paper, a PSNR-based approach is applied in order to compare the coding potential of H.264/AVC AHM 2.0 with the compression efficiency of XviD 0.9.1, DivX 5.05, Windows Media Video 9, and MC-EZBC. Our results show that MPEG-4-based tools, and in particular H.264/AVC, can keep pace with proprietary solutions. The rate-distortion performance of MC-EZBC, a wavelet-based video codec, also looks very promising.
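For reference, the PSNR criterion underlying this comparison is sketched below for 8-bit pictures; the nested-list picture representation is an illustrative simplification.

```python
import math

# The standard PSNR criterion for 8-bit video:
#   PSNR = 10 * log10(255^2 / MSE), computed per picture.

def psnr(original, decoded):
    """PSNR in dB between two equally sized 8-bit pictures."""
    diffs = [(p - q) ** 2 for ro, rd in zip(original, decoded)
             for p, q in zip(ro, rd)]
    mse = sum(diffs) / len(diffs)
    if mse == 0:
        return float("inf")  # identical pictures
    return 10.0 * math.log10(255.0 ** 2 / mse)
```

In a rate-distortion comparison such as the one above, per-picture PSNR values are averaged over each sequence and plotted against bit rate for every codec.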
While the price of mobile devices is dropping quickly, the set of features and capabilities of these devices is advancing dramatically. Because of this, new mobile multimedia applications are conceivable, also thanks to the availability of high-speed mobile networks such as UMTS and Wireless LAN. However, creating such applications is still difficult due to the huge diversity in features and capabilities of mobile devices. Software developers also have to take into account the severe limitations on processing power, display capabilities, and battery life of these devices. On top of that, the availability of device resources fluctuates strongly during the execution of an application, directly and strongly influencing the user experience, whereas equivalent fluctuations on traditional desktop PCs are far less prominent. Using new technologies such as MPEG-4, MPEG-7, and MPEG-21 can help application developers overcome these problems. We have created an MPEG-21-based Video-on-Demand application optimized for mobile devices that is aware of the usage environment (i.e., user preferences, device capabilities, device conditions, network status, etc.) of the client and adapts MPEG-4 videos to it. The application is compliant with the Universal Multimedia Access framework, supports Time-Dependent Metadata, and relies on both MPEG-4 and MPEG-21 technology.