The MPEG-21 standard defines a framework for the interoperable delivery and consumption of multimedia content.
Within this framework the adaptation of content plays a vital role in order to support a variety of terminals and to
overcome the limitations of the heterogeneous access networks. In most cases the multimedia content can be adapted by
applying different adaptation operations that result in certain characteristics of the content. Therefore, an instance within
the framework has to decide which adaptation operations have to be performed to achieve a satisfactory result. This
process is known as adaptation decision-taking and makes extensive use of metadata describing the possible adaptation
operations, the usage environment of the consumer, and constraints concerning the adaptation. Based on this metadata, a
mathematical optimization problem can be formulated, and its solution yields the optimal parameters for the adaptation
operations. However, the metadata is represented in XML, which results in a verbose and inefficient encoding. In this paper,
an architecture for an Adaptation Decision-Taking Engine (ADTE) is introduced. The ADTE operates both on XML
metadata and on metadata encoded with MPEG's Binary Format for Metadata (BiM), and it enables efficient metadata
processing by separating the problem extraction from the actual optimization step. Furthermore, several optimization
algorithms that are suitable for scalable multimedia formats are reviewed and extended where appropriate.
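As an illustration of the separation between problem extraction and optimization described above, the following minimal sketch treats adaptation decision-taking as a search over a small, discrete set of operating points of a scalable stream. It is not the ADTE presented in the paper: the `OperatingPoint` fields, the usage-environment keys, and the exhaustive search are all assumptions chosen for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class OperatingPoint:
    """One candidate adaptation result (all fields are hypothetical)."""
    temporal_layers: int
    spatial_layers: int
    bitrate_kbps: float
    width: int
    quality: float  # higher is better, e.g. a PSNR-like score


Constraint = Callable[[OperatingPoint], bool]


def extract_problem(usage_env: dict) -> List[Constraint]:
    """Problem extraction: turn usage-environment metadata into constraints.

    The dictionary keys are assumed names, not taken from any MPEG-21 schema.
    """
    return [
        lambda p: p.bitrate_kbps <= usage_env["max_bitrate_kbps"],
        lambda p: p.width <= usage_env["display_width"],
    ]


def decide(points: Iterable[OperatingPoint],
           constraints: List[Constraint]) -> Optional[OperatingPoint]:
    """Optimization: pick the feasible point with the highest quality."""
    feasible = [p for p in points if all(c(p) for c in constraints)]
    return max(feasible, key=lambda p: p.quality, default=None)


points = [
    OperatingPoint(1, 1, 128.0, 176, 30.0),
    OperatingPoint(2, 1, 256.0, 176, 32.5),
    OperatingPoint(2, 2, 512.0, 352, 35.0),
    OperatingPoint(3, 2, 1024.0, 352, 37.0),
]
usage_env = {"max_bitrate_kbps": 600, "display_width": 352}
best = decide(points, extract_problem(usage_env))
```

Because `decide` only sees a list of constraints, the same optimization step can be reused regardless of whether the constraints were extracted from XML or from a binary encoding of the metadata.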
XML-based metadata is widely adopted across different communities, and plenty of commercial and open-source tools for processing and transforming it are available. However, all of these tools have one thing in common: they operate on plain-text-encoded metadata, which may become a burden in constrained and streaming environments, i.e., when metadata needs to be processed together with multimedia content on the fly. In this paper, we present an efficient approach for transforming metadata encoded using MPEG's Binary Format for Metadata (BiM) without additional encoding or decoding overhead, i.e., entirely within the binary domain. To this end, we have developed an event-based push parser for BiM-encoded metadata that transforms the metadata using a limited set of processing instructions, based on traditional XML transformation techniques, which operate on bit patterns instead of costly string comparisons.
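The idea of an event-based push parser that transforms metadata in the binary domain can be sketched as follows. The byte layout used here is a toy encoding invented for this example and is not the BiM format; the opcodes, the one-byte element codes, and the `DropSubtree` instruction are all hypothetical. The point is only that element matching is an integer comparison and that the output is written directly in binary form.

```python
START, END, DATA = 0x01, 0x02, 0x03  # toy opcodes, not BiM


class DropSubtree:
    """Handler that copies events to `out`, skipping subtrees of one element code."""

    def __init__(self, drop_code: int) -> None:
        self.drop_code = drop_code
        self.skip_depth = 0
        self.out = bytearray()

    def start(self, code: int) -> None:
        # Integer comparison on the element code; no tag names are decoded.
        if self.skip_depth or code == self.drop_code:
            self.skip_depth += 1
            return
        self.out += bytes([START, code])

    def end(self) -> None:
        if self.skip_depth:
            self.skip_depth -= 1
            return
        self.out.append(END)

    def data(self, payload: bytes) -> None:
        if not self.skip_depth:
            self.out += bytes([DATA, len(payload)]) + payload


class PushParser:
    """Accepts arbitrary chunks and pushes complete events to the handler."""

    def __init__(self, handler: DropSubtree) -> None:
        self.handler = handler
        self.buf = bytearray()

    def feed(self, chunk: bytes) -> None:
        self.buf += chunk
        i = 0
        while i < len(self.buf):
            op = self.buf[i]
            if op == START:
                if i + 2 > len(self.buf):
                    break  # element code not received yet
                self.handler.start(self.buf[i + 1])
                i += 2
            elif op == END:
                self.handler.end()
                i += 1
            elif op == DATA:
                if i + 2 > len(self.buf):
                    break  # length byte not received yet
                n = self.buf[i + 1]
                if i + 2 + n > len(self.buf):
                    break  # payload incomplete
                self.handler.data(bytes(self.buf[i + 2:i + 2 + n]))
                i += 2 + n
            else:
                raise ValueError(f"unknown opcode {op:#x}")
        del self.buf[:i]  # keep only the unparsed tail


# <1><2>a</2><3>b</3></1> in the toy encoding
encoded = bytes([START, 1, START, 2, DATA, 1, ord("a"), END,
                 START, 3, DATA, 1, ord("b"), END, END])
handler = DropSubtree(drop_code=2)
parser = PushParser(handler)
for byte in encoded:          # feed one byte at a time to mimic streaming
    parser.feed(bytes([byte]))
transformed = bytes(handler.out)
```

Feeding the input byte by byte shows that the parser never needs the whole description in memory, which is the property that matters when metadata arrives together with the media stream.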
Due to the heterogeneity of current terminal and network infrastructures, multimedia content needs to be adapted to the specific capabilities of these terminals and network devices. Furthermore, user preferences and user environment characteristics must also be taken into consideration. The problem is further complicated by the diversity of multimedia content types and encoding formats. To cope with this heterogeneity and to be applicable to different coding formats, the adaptation must be performed in a generic and interoperable way. As a response to this problem and in the context of MPEG-21, we present an approach that uses XML to describe the high-level structure of a multimedia resource in a generic way, i.e., how the multimedia content is organized, for instance in layers, frames, or scenes. For this purpose, a schema for XML-based bitstream syntax descriptions (generic Bitstream Syntax Descriptions, or gBSDs) has been developed. A gBSD can describe the high-level structure of a multimedia resource in a coding-format-independent way. Adaptation of the resource is based on elementary transformation instructions formulated with respect to the gBSDs. These instructions have been separated from the gBSDs so that the same descriptions can be used for different adaptations, e.g., temporal scaling, SNR scaling, or semantic adaptations. In the MPEG-21 framework, those adaptations can be steered, for instance, by the network characteristics and the user preferences. As a result, it becomes possible for coding-format-agnostic adaptation engines to transform media bitstreams and associated descriptions to meet the requirements imposed by network conditions, device capabilities, and user preferences.
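The two-step process described above, transforming the description and then deriving the adapted bitstream from it, can be sketched as follows. The description is a simplified stand-in loosely modelled on gBSD units: namespaces and addressing attributes are omitted, and the marker values, the `transform` and `generate` helpers, and the byte layout of the example bitstream are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified description: each unit names a byte range and carries a marker
# (here, a hypothetical temporal-layer label).
DESCRIPTION = """
<gBSD>
  <gBSDUnit start="0" length="4" marker="header"/>
  <gBSDUnit start="4" length="3" marker="T0"/>
  <gBSDUnit start="7" length="3" marker="T1"/>
  <gBSDUnit start="10" length="3" marker="T0"/>
  <gBSDUnit start="13" length="3" marker="T1"/>
</gBSD>
"""


def transform(root: ET.Element, drop_marker: str) -> ET.Element:
    """Transformation instruction: remove all units carrying `drop_marker`."""
    for unit in list(root):
        if unit.get("marker") == drop_marker:
            root.remove(unit)
    return root


def generate(root: ET.Element, bitstream: bytes) -> bytes:
    """Copy the byte ranges of the remaining units; no codec knowledge needed."""
    out = bytearray()
    for unit in root.iter("gBSDUnit"):
        start = int(unit.get("start"))
        length = int(unit.get("length"))
        out += bitstream[start:start + length]
    return bytes(out)


bitstream = b"HDR!" + b"aaa" + b"bbb" + b"ccc" + b"ddd"
root = ET.fromstring(DESCRIPTION.strip())
adapted = generate(transform(root, "T1"), bitstream)
```

Dropping the units marked `T1` corresponds to a simple form of temporal scaling. The same description could be reused with a different instruction, for example one that selects units by a semantic marker, which is the reason the instructions are kept separate from the description. A complete system would also update the `start` offsets in the transformed description so that it remains valid for the adapted bitstream; this sketch omits that step.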