The MPEG-4 video standard extends the traditional frame-based processing with the option to compose several
video objects (VOs) superimposed on a background sprite image. In our previous work, we presented a distributed,
multiprocessor-based, scalable implementation of an MPEG-4 arbitrary-shaped video-object decoder which, together
with the background sprite decoder, forms an essential part of further scene rendering. To control the multiprocessor
architecture, we constructed a Quality-of-Service (QoS) management scheme that monitors the availability of
required data and distributes the processing of individual tasks over the guaranteed or best-effort services of the
platform. However, the proposed architecture with combined guaranteed and best-effort services poses
problems for real-time scene rendering.
In this paper, we present a technique for proper run-time rendering of the final scene after decoding one VO
Layer. The individual video-object monitors check the data availability and select the highest available quality for the
final scene rendering. The algorithm operates hierarchically, both at the scene level and at the task level of the
video-object processing. Whereas the earlier work on scalable implementation concentrated only on guaranteed
services, we now introduce a new element in the system architecture for the real-time control and fallback
mechanism of the best-effort services. This element is based on, first, controlling data availability at the task level
and, second, introducing a propagation service into the QoS management. We compare our simulation results
with the standard "frame-skipping" technique, which is currently the only available solution for rendering with this
type of scalable processing.
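The hierarchical selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class `VideoObjectMonitor`, the functions `select_scene` and `frame_skipping`, and the integer quality levels are all hypothetical names and simplifications introduced here. The sketch assumes each monitor knows, per quality level, whether the required data arrived in time.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class VideoObjectMonitor:
    """Hypothetical per-object monitor (task level)."""
    name: str
    # quality level -> True if all data for that level is available in time
    available: Dict[int, bool]

    def best_quality(self) -> Optional[int]:
        """Return the highest quality level with available data, or None."""
        ready = [q for q, ok in self.available.items() if ok]
        return max(ready) if ready else None


def select_scene(monitors: List[VideoObjectMonitor]) -> Dict[str, Optional[int]]:
    """Scene level: collect the best renderable quality of every object."""
    return {m.name: m.best_quality() for m in monitors}


def frame_skipping(monitors: List[VideoObjectMonitor], target: int) -> bool:
    """Baseline: render the frame only if every object reached the target quality."""
    return all(m.available.get(target, False) for m in monitors)


monitors = [
    VideoObjectMonitor("sprite", {0: True, 1: True}),
    VideoObjectMonitor("vo1", {0: True, 1: False}),  # enhancement data arrived late
]
scene = select_scene(monitors)         # vo1 falls back to quality 0
skipped = not frame_skipping(monitors, target=1)
```

In this toy case the fallback approach still renders every object, with `vo1` at its lower quality, whereas the frame-skipping baseline would drop the whole frame.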
MPEG-4 is the first standard that combines synthetic objects, like 2D/3D graphics objects, with natural rectangular and non-rectangular video objects. Independent access to individual synthetic video objects for further manipulation opens up a large space for future applications. This paper addresses the optimization of such complex multimedia algorithms for implementation on multiprocessor platforms. We show that choosing the correct granularity of processing for enhanced parallelism, and splitting time-critical tasks, yields a substantial improvement in processing efficiency. In our work, we focus on the decoder for non-rectangular (also called arbitrary-shaped) video objects. In previous work, we motivated the use of a multiprocessor System-on-Chip (SoC) setup that satisfies the requirements on the overall computation capacity. We propose optimizations of the MPEG-4 decoding algorithm that increase the decoding throughput and use the multiprocessor architecture more efficiently. First, we present a modification of the Repetitive Padding that increases the pipelining at block level. We identified the part of the padding algorithm that can be executed in parallel with the DCT-coefficient decoding and split the original algorithm into two communicating tasks. Second, we introduce a synchronization mechanism that allows the Extended Padding and the postprocessing (deblocking and deringing) filters to be processed at block level. The first optimization reduces the computational requirements of the original Repetitive-Padding task by about 58%. With the previously proposed data-level parallelism and by exploiting the inherent parallelism between the separate color components (Y, Cr, Cb), the computational savings are about 72% on average. Moreover, the proposed optimizations reduce the processing latency from the order of a frame to the order of a slice.
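For readers unfamiliar with repetitive padding, the following sketch shows the horizontal pass for a single row of a boundary block in its generic form. It does not show the modified two-task version proposed here. The function name `pad_row`, the boolean shape mask, and the rounding used when averaging are assumptions made for illustration.

```python
from typing import List, Optional


def pad_row(samples: List[int], mask: List[bool]) -> List[int]:
    """Horizontally pad one row.

    samples: texture values of the row
    mask:    True where the sample lies inside the object shape (opaque)
    Transparent samples take the value of the nearest opaque sample in the
    row; if opaque samples exist on both sides, the two are averaged.
    """
    n = len(samples)
    if not any(mask):
        # No opaque sample in this row: leave it for the vertical pass.
        return list(samples)

    # Nearest opaque value to the left of (or at) each position.
    left: List[Optional[int]] = [None] * n
    last = None
    for i in range(n):
        if mask[i]:
            last = samples[i]
        left[i] = last

    # Nearest opaque value to the right of (or at) each position.
    right: List[Optional[int]] = [None] * n
    last = None
    for i in range(n - 1, -1, -1):
        if mask[i]:
            last = samples[i]
        right[i] = last

    out = list(samples)
    for i in range(n):
        if mask[i]:
            continue
        if left[i] is not None and right[i] is not None:
            out[i] = (left[i] + right[i] + 1) // 2  # assumed rounding
        elif left[i] is not None:
            out[i] = left[i]
        else:
            out[i] = right[i]
    return out


padded = pad_row([0, 10, 0, 0, 20, 0], [False, True, False, False, True, False])
```

Each row depends only on its own samples and shape mask, which gives an intuition for why parts of the padding can be moved to block level and overlapped with other decoding tasks.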
Component-based software development is very attractive, because it allows a clear decomposition of logical
processing blocks into software blocks and it offers wide reuse. The strong real-time requirements of media
processing systems should be validated as soon as possible to avoid costly system redesign. This can be achieved
by prediction of timing and performance properties. In this paper, we propose a scenario-simulation design
approach featuring early performance prediction of a component-based software system. We validated this
approach through a case study, for which we developed an advanced MPEG-4 coding application. The benefits
of the approach are threefold: (a) high accuracy of the predicted performance data; (b) an efficient
real-time software-hardware implementation, because the generic computational costs become known in advance;
and (c) improved ease of use because of the high abstraction level of the modelling. Experiments showed that the
prediction accuracy of the system performance is about 90% or higher, while the prediction accuracy of the
time-detailed processor usage (performance) does not drop below 70%. However, the real-time performance
requirements are sometimes not met, e.g. when other applications use memory intensively, thereby
delaying the retrieval of the decoder data from memory.