Maestro is a middleware support tool for distributed multimedia and collaborative computing applications. These applications share a common need to manage multiple subgroups while providing possibly different quality-of-service guarantees for each group. Maestro's functionality maps well onto these requirements and can significantly shorten the development time of such applications. In this paper we report on Maestro and demonstrate its utility in implementing IMUX, a pseudo X-server (proxy) for collaborative computing applications. Examples of other multimedia applications that benefit from Maestro appear in the full version of the paper.
We describe a compositional simulation system that predicts streamed video performance on multiple platform configurations. System behavior is modeled with a set of key deterministic and stochastic variables, each of which characterizes part of a 'virtual system component,' e.g., an I/O device, a particular CPU, a codec, etc. These variables are profiled in isolation, by inserting lightweight instrumentation code into the main threads and upcalls involved in the playout datapath. Then, a post-processor converts the derived samples into synthesized probability distribution functions, after which the results are stored in the simulator's library. At simulation time, the user selects a set of 'virtual components,' which are then composed into an 'end-user system.' The resulting model is used to predict the performance of any video, which usually requires no more than a few seconds. The output is a list of frame-display times, accompanied by statistics on the mean playout rate, variance, jitter, etc. This scheme lets developers extend the range of their target platforms by automatically benchmarking new components, storing the results, and then simulating an entirely new set of systems. Thus, a developer who possesses a large set of pre-profiled components can quickly estimate a video's performance on a huge spectrum of target platforms -- without ever having to actually assemble them. In this paper we present evidence that our method achieves a reasonable degree of accuracy when compared to actual on-line playout. We present results for a generic, streamed QuickTime video system subjected to multiple configurations. These were assembled (combinatorially) using four different CPUs, three types of SCSI devices, two common codecs (Radius Cinepak and Intel Indeo), and two full-frame video masters. On most configurations tested, the simulator's predictions were accurate to within a 15% margin of error.
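As a rough illustration of the composition step, the sketch below draws per-component service times from profiled distributions (here, hypothetical Gaussians), sums them along the playout datapath, and derives frame-display times and playout statistics. The component parameters, the Gaussian form, and the max-of-schedule-and-service playout model are our own simplifications, not the system's actual machinery.

```python
import random
import statistics

def make_component(mean, stdev):
    """Sampler for one hypothetical 'virtual component' (e.g., an I/O device,
    a CPU, a codec): per-frame service time in seconds."""
    return lambda: max(0.0, random.gauss(mean, stdev))

def simulate_playout(components, n_frames, frame_period=1 / 30):
    """Compose the selected components into an 'end-user system' and predict
    frame-display times: a frame appears once every datapath stage is done,
    but never before its scheduled playout instant."""
    times, t = [], 0.0
    for i in range(n_frames):
        service = sum(draw() for draw in components)
        t = max(i * frame_period, t + service)
        times.append(t)
    return times

def playout_stats(times):
    """Summary statistics over the inter-display gaps."""
    gaps = [b - a for a, b in zip(times, times[1:])]
    return {"mean_rate_fps": 1.0 / statistics.mean(gaps),
            "jitter_s": statistics.stdev(gaps)}
```

Swapping in a different pre-profiled CPU or disk sampler re-predicts the whole configuration without reassembling any hardware, which is the point of the compositional approach.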
In this paper, we describe the design and implementation of a prototype software video production switcher, vps, that improves the quality of the content of MBone broadcasts. vps is modeled after the broadcast television industry's studio production switcher. It provides special effects processing to incorporate audience discussions, add titles and other information, and integrate stored videos into the presentation. vps is structured to work with other MBone conferencing tools. The ultimate goal is to automate the production of MBone broadcasts.
Due to the heterogeneity and shared-resource nature of today's computer network environments, the end-to-end delivery of multimedia requires adaptive mechanisms to be effective. We present a framework for the adaptive streaming of heterogeneous media, introducing the application of online statistical process control (SPC) to the problem of dynamic rate control. In SPC, the goal is to establish (and preserve) a state of statistical quality control (i.e., controlled variability around a target mean) over a process. We consider the end-to-end streaming of multimedia content over the Internet as the process to be controlled. First, at each client, we measure process performance and apply statistical quality control (SQC) with respect to application-level requirements. Then, we guide an adaptive rate control (ARC) problem at the server based on the statistical significance of trends and departures in these measurements. We show that this scheme facilitates the handling of heterogeneous media. Finally, because SPC is designed to monitor long-term process performance, we show that our online SPC scheme can be used to adapt to various degrees of long-term (network) variability (i.e., statistically significant process shifts as opposed to short-term random fluctuations). We develop several examples and analyze the scheme's statistical behavior and guarantees.
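A minimal sketch of the idea, assuming an x-bar-style control chart with 3-sigma limits and a single run rule; the class name, thresholds, and the 10% multiplicative rate steps are illustrative stand-ins for the paper's full SQC/ARC machinery.

```python
from collections import deque

class SPCRateController:
    """Illustrative online SPC-driven rate adaptation: react only to
    statistically significant shifts, not short-term random fluctuation."""

    def __init__(self, target, sigma, rate_kbps, window=8):
        self.target = target            # target mean of the client metric
        self.ucl = target + 3 * sigma   # upper control limit
        self.lcl = target - 3 * sigma   # lower control limit
        self.rate_kbps = rate_kbps
        self.recent = deque(maxlen=window)

    def observe(self, sample):
        """Feed one client-side measurement; adapt the server rate only on
        a 3-sigma violation or a one-sided run (a significant shift)."""
        self.recent.append(sample)
        out_of_control = not (self.lcl <= sample <= self.ucl)
        run = (len(self.recent) == self.recent.maxlen
               and (all(s > self.target for s in self.recent)
                    or all(s < self.target for s in self.recent)))
        if out_of_control or run:
            self.rate_kbps *= 0.9 if sample > self.target else 1.1
            self.recent.clear()       # restart monitoring after adapting
            return True
        return False
```

Samples hovering around the target leave the rate untouched, which is exactly the separation of random fluctuation from significant process shifts that SPC provides.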
Current multimedia server systems are based largely on UNIX variants such as 4.4BSD. The design of the I/O subsystem in these operating systems was originally intended to mediate access by applications to I/O devices. In the 1990s, multimedia applications such as web and continuous media servers have stressed the original design by streaming data between I/O devices. This work revamps the I/O system design from the ground up. It presents a detailed description of a new I/O system architecture for UNIX and an implementation in the roadrunner kernel. The design is ambitious, requiring changes to all I/O elements including file systems, network protocols, and the device driver interface. The result is a clean architecture that maintains support for all I/O functions found in current UNIX systems but also provides an efficient, general mechanism for streaming data with or without transformations.
Modern networks are now capable of guaranteeing a consistent quality of service (QoS) to multimedia traffic streams. A number of major operating system vendors are also working hard to extend these guarantees into the end-system. In both cases, however, there remains the problem of determining a service rate sufficient to ensure the desired quality of service. Source modeling is not a sustainable approach in the network case, and it is even less feasible to model the demands of multimedia applications. The ESPRIT Measure project is successfully using on-line measurement and estimation to perform resource allocation for bursty traffic in ATM networks. In this paper we consider the applicability of the same theory to resource allocation in a multimedia operating system which offers QoS guarantees to its applications.
Efficient support for interactive VCR-like operations in video-on-demand (VOD) systems is an important factor in making such systems commercially appealing. In this paper, we focus on a particular type of interactive operation, namely fast forward and rewind with scanning. Conventional approaches to supporting scanning operations have significant shortcomings in terms of their network bandwidth and/or set-top box buffering requirements. We introduce an efficient method for supporting scanning operations that does not require any extra bandwidth beyond what is already allocated for the normal playback operation. To support both forward and reverse scanning operations at several speedups, multiple, differently coded versions of a movie are stored at the VOD server. These versions include a 'normal version' for the normal playback operation and a 'scan version' for every speedup that has to be supported. A request for a scanning operation is serviced by switching to the appropriate version at the VOD server. Only one version is transported and decoded at a time. Each scan version is produced by encoding a sample of the raw movie frames, so that when decoded and played back at the normal frame rate, this version gives a perceptual speedup in the motion picture. Our approach for providing scanning operations is integrated into a previously proposed framework for distributing archived MPEG-coded video streams. To incorporate interactive scanning operations in that framework, we investigate mechanisms for controlling the traffic envelopes of the scan versions during the encoding phase, and we discuss the pre-processing steps needed to generate scan versions with a desired traffic envelope. We then discuss the storage requirements of the scan versions and the latency for switching from one version to another. Simple and effective ways to control this latency are presented. Finally, we discuss the implications of implementing our approach for both the server and the client.
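The version-switching idea can be sketched as follows, assuming each scan version is produced by keeping every k-th raw frame; the class and method names are hypothetical, and real scan versions would be re-encoded with controlled traffic envelopes rather than merely subsampled.

```python
def scan_version(frames, speedup):
    """Keep every speedup-th frame; normal-rate playback then looks faster."""
    return frames[::speedup]

def reverse_scan_version(frames, speedup):
    """Sample the frames in reverse order, for rewind-with-scanning."""
    return frames[::-1][::speedup]

class VersionSwitcher:
    """Server-side bookkeeping: all versions are pre-encoded, and only the
    currently selected one would be transported and decoded at a time."""

    def __init__(self, raw_frames, speedups=(2, 4)):
        self.versions = {("play", 1): list(raw_frames)}
        for k in speedups:
            self.versions[("ff", k)] = scan_version(raw_frames, k)
            self.versions[("rew", k)] = reverse_scan_version(raw_frames, k)
        self.current = ("play", 1)

    def switch(self, mode, speedup=1):
        """Service an interactive request by switching to another version."""
        self.current = (mode, speedup)
        return self.versions[self.current]
```

Because each version streams at the normal frame rate, no bandwidth beyond the playback allocation is needed, at the cost of extra server storage per supported speedup.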
Bandwidth smoothing algorithms can effectively reduce the network resource requirements for the delivery of compressed video streams. For stored video, a large number of bandwidth smoothing algorithms have been introduced that are optimal under certain constraints but require access to all the frame size data in order to achieve their optimal properties. Meeting this constraint, however, can be expensive in both resources and computation, especially for moderately priced set-top boxes. In this paper, we introduce a movie approximation technique for representing the frame sizes of a video, reducing both the complexity of the bandwidth smoothing algorithms and the amount of frame data that must be transmitted prior to the start of playback. Our results show that the proposed technique can accurately approximate the frame data with a small number of piece-wise linear segments, while changing the performance measures that the bandwidth smoothing algorithms attempt to optimize by no more than 1%. In addition, we show that implementations of this technique can speed up execution times by 100 to 400 times, allowing the bandwidth plan calculation times to be reduced to tens of milliseconds. An evaluation using a compressed full-length motion-JPEG video is provided.
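A simple greedy segmentation conveys the flavor of such a piece-wise linear approximation, though the paper's own algorithm and error criterion may differ; the tolerance-based endpoint-interpolation test here is our own illustrative choice.

```python
def _fits(sizes, i, j, tol):
    """True if every frame in [i, j] lies within tol of the straight line
    through the segment's endpoints."""
    slope = (sizes[j] - sizes[i]) / (j - i)
    return all(abs(sizes[i] + slope * (k - i) - sizes[k]) <= tol
               for k in range(i, j + 1))

def piecewise_linear(frame_sizes, tol):
    """Greedily cover the frame-size trace with linear segments, extending
    each one while the tolerance still holds; returns (start, end) pairs."""
    segments, start, n = [], 0, len(frame_sizes)
    while start < n - 1:
        end = start + 1
        while end + 1 < n and _fits(frame_sizes, start, end + 1, tol):
            end += 1
        segments.append((start, end))
        start = end
    return segments
```

A smoothing algorithm can then run over the few segment endpoints instead of every frame size, which is where the 100- to 400-fold speedup comes from.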
Conventional video-on-demand (VoD) servers providing VCR-like interactivity allocate a separate channel for each user. Various schemes have been proposed for aggregating users into groups to improve resource utilization. By bridging the temporal skew between users viewing the same content, multiple users can be served from a single channel. Dynamic aggregation techniques such as rate adaptation and content insertion attempt to optimize resource utilization on the fly, improving performance in interactive situations. In this paper, we describe a VoD system which delivers continuous media content to multiple clients using dynamic service aggregation to reduce bandwidth requirements. MPEG-1 system streams are used as the content format and IP multicast is used for video delivery. To the best of our knowledge, we are the first to demonstrate the viability of dynamic service aggregation using rate-adaptive merging in a VoD system. We present our experiences with building the system and address some important issues relating to aggregation in video servers, including server-directed channel switching in the client and stream merging by acceleration. We show simulations of our clustering and merging algorithms for large user populations and report piggybacking an average of over 2.5 users per physical channel.
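Rate-adaptive merging can be sketched with a toy model: the trailing stream plays slightly faster than the leader until their playout points coincide, at which point both can share one multicast channel. The 5% speedup and the discrete-time simulation step are illustrative assumptions, not the system's parameters.

```python
def merge_time(skew_s, speedup=1.05):
    """Seconds of accelerated playout needed to close a skew of skew_s
    seconds: the follower gains (speedup - 1) seconds of content per
    second of wall-clock time."""
    return skew_s / (speedup - 1.0)

def simulate_merge(skew_s, speedup=1.05, dt=0.1):
    """Tick-by-tick version of the same catch-up process."""
    leader, follower, t = skew_s, 0.0, 0.0
    while follower < leader:
        leader += dt             # leader advances at normal rate
        follower += dt * speedup # follower advances slightly faster
        t += dt
    return t
```

The closed-form and simulated catch-up times agree, and the formula makes the trade-off explicit: a barely perceptible 5% acceleration closes a 5-second skew in about 100 seconds.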
An integrated multimedia file system supports the storage and retrieval of multiple data types. In this paper, we first discuss various design methodologies for building integrated file systems and examine their tradeoffs. We argue that, to efficiently support the storage and retrieval of heterogeneous data types, an integrated file system should enable the coexistence of multiple data type specific techniques. We then describe the design of Symphony -- an integrated file system that achieves this objective. Some of the novel features of Symphony include: a QoS-aware disk scheduling algorithm; support for data type specific placement, failure recovery, and caching policies; and support for assigning data type specific structure to files. We discuss the prototype implementation of Symphony, and present results of our preliminary experimental evaluation.
Device-independent I/O has been a holy grail to operating system designers since the early days of UNIX. Unfortunately, existing operating systems fall short of this goal for multimedia applications. Techniques such as caching and sequential read-ahead can help mask I/O latency in some cases, but in others they increase latency and add substantial jitter. Multimedia applications, such as video players, are sensitive to vagaries in performance since I/O latency and jitter affect the quality of presentation. Our solution uses adaptive prefetching to reduce both latency and jitter. Applications submit file access plans to the prefetcher, which then generates I/O requests to the operating system and manages the buffer cache to isolate the application from variations in device performance. Our experiments show that device independence can be achieved: an MPEG video player sees the same latency when reading from a local disk as from an NFS server. Moreover, our approach reduces jitter substantially.
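A minimal sketch of plan-driven prefetching, assuming a plan is an ordered list of (offset, length) pairs and a fixed read-ahead depth; the class and its interface are hypothetical, not the paper's API.

```python
class PlanPrefetcher:
    """Consume an application-supplied access plan and keep a fixed-depth
    window of reads resident ahead of the consumer, so device-latency
    variation is absorbed by the buffer rather than seen by the player."""

    def __init__(self, read_fn, plan, depth=4):
        self.read_fn = read_fn   # performs one (offset, length) read
        self.plan = list(plan)   # ordered (offset, length) pairs
        self.depth = depth       # read-ahead window size
        self.buffer = {}         # plan index -> prefetched data
        self.next_fetch = 0

    def _fill(self, upto):
        """Issue reads so entries upto..upto+depth-1 are resident."""
        while self.next_fetch < min(upto + self.depth, len(self.plan)):
            off, length = self.plan[self.next_fetch]
            self.buffer[self.next_fetch] = self.read_fn(off, length)
            self.next_fetch += 1

    def get(self, i):
        """Hand entry i to the application, refilling the window."""
        self._fill(i)
        return self.buffer.pop(i)
```

Because the window is refilled on every `get`, the application's view is a steady stream of already-buffered data regardless of whether `read_fn` hits a local disk or an NFS server.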
We consider the problem of OS resource management for real-time and multimedia systems where multiple activities with different timing constraints must be scheduled concurrently. Time on a particular resource is shared among its users and must be globally managed in real-time and multimedia systems. A resource kernel is meant for use in such systems and is defined to be one which provides timely, guaranteed, and protected access to system resources. The resource kernel allows applications to specify only their resource demands, leaving the kernel to satisfy those demands using hidden resource management schemes. This separation of resource specification from resource management allows OS-subsystem-specific customization by extending, optimizing, or even replacing resource management schemes. As a result, this resource-centric approach can be implemented with any of several different resource management schemes. We identify the specific goals of a resource kernel: applications must be able to explicitly state their timeliness requirements; the kernel must enforce maximum resource usage by applications; the kernel must support high utilization of system resources; and an application must be able to access different system resources simultaneously. Since the same application consumes a different amount of time on different platforms, the resource kernel must allow such resource consumption times to be portable across platforms and to be automatically calibrated. Our resource management scheme is based on resource reservation and satisfies these goals. The scheme is not only simple but also captures a wide range of solutions developed by the real-time systems community over several years. One potentially serious problem that any resource management scheme must address is that of allowing access to multiple resources simultaneously and in a timely fashion, a problem which is known to be NP-complete. We show that this problem of simultaneous access to multiple resources can be practically addressed by resource decoupling and by resolving critical resource dependencies immediately. Finally, we demonstrate our resource kernel's functionality and flexibility in the context of multimedia applications which need processor cycles and/or disk bandwidth.
The design of file systems is strongly influenced by measurements of the use of existing file systems, such as file size distributions and patterns of access. We believe that a similar characterization of video stored on the Internet will help network engineers, codec designers, and other multimedia researchers. We therefore conducted an experiment to measure how video data is used on the Web today. In this experiment, we downloaded and analyzed over 57,000 AVI, QuickTime, and MPEG files stored on the Web -- approximately 100 gigabytes of data. Among our more interesting discoveries, we found that the most common video technology in use today is QuickTime, and that the image resolution and frame rate of video files that include audio are much more uniform than those of video-only files. The majority of all audio/video files have dimensions of CIF or QCIF (or very similar) at 10, 12, 15, or 30 fps, whereas the dimensions and frame rates of video-only files are more uniformly distributed. We also experimentally verified the conjecture that current Internet bandwidth is at least an order of magnitude too low to support streaming playback of video. We present these results and other statistical information characterizing video on the Web in this paper.
With the increasing popularity of the World Wide Web, the amount of information available and the use of Web servers are growing exponentially. As a consequence, the number of requests to popular Web servers increases exponentially as well. In order to reduce the overhead induced by frequent requests for the same documents, server caching, also referred to as main memory caching, has been proposed and implemented. In this work, we propose a static caching mechanism that updates the contents of the cache periodically, bringing into the cache at each update only the documents most requested during the previous time interval. This caching policy has lower management overhead than others. Under some statistical assumptions, we show that static caching achieves the highest hit rate. We also provide empirical comparison results obtained by trace-driven simulations. It turns out that static caching is more efficient, in terms of hit rate, than the policies analyzed in the literature.
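The update step can be sketched in a few lines, assuming per-document request counts from the previous interval and a greedy fill of the cache by popularity; the greedy packing under a capacity limit is our simplification.

```python
def static_cache_update(request_counts, doc_sizes, capacity):
    """At an interval boundary, repopulate the cache with the documents
    most requested during the previous interval, greedily packed under
    the capacity limit. Between updates the cache contents are fixed,
    which is what keeps the management overhead low."""
    cache, used = set(), 0
    for doc in sorted(request_counts, key=request_counts.get, reverse=True):
        if used + doc_sizes[doc] <= capacity:
            cache.add(doc)
            used += doc_sizes[doc]
    return cache
```

Unlike LRU-style policies, no per-request bookkeeping or eviction happens between updates; only the counters are maintained.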
The WWW employs a hierarchical data dissemination architecture in which hyper-media objects stored at a remote server are served to clients across the Internet, and cached on disks at intermediate proxy servers. One of the objectives of web caching algorithms is to maximize the data transferred from the proxy servers or cache hierarchies. Current web caching algorithms are designed only for text and image data. Recent studies predict that within the next five years more than half the objects stored at web servers will contain continuous media data. To support this trend, the next generation of proxy cache algorithms will need to handle multiple data types, each with different cache resource usage, for a cache limited by both bandwidth and space. In this paper, we present a resource-based caching (RBC) algorithm that manages the heterogeneous requirements of multiple data types. The RBC algorithm (1) characterizes each object by its resource requirement and its caching gain, (2) dynamically selects the granularity of the entity to be cached that minimally uses the limited cache resource (i.e., bandwidth or space), and (3) if required, replaces cached entities based on their cache resource usage and caching gain. We have performed extensive simulations to evaluate our caching algorithm and present simulation results showing that RBC outperforms other known caching algorithms.
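A sketch of the replacement step (3) alone, collapsing the two-resource usage into a single scalar for the eviction order; the dictionary fields and the gain-per-usage metric are illustrative simplifications of RBC.

```python
def rbc_admit(cache, new_obj, space_cap, bw_cap):
    """Admit new_obj if it fits under both the space and bandwidth caps,
    evicting cached entities with the lowest caching gain per unit of
    resource usage until it does (or no victims remain)."""
    def fits():
        return (sum(o["space"] for o in cache) + new_obj["space"] <= space_cap
                and sum(o["bw"] for o in cache) + new_obj["bw"] <= bw_cap)

    # Cheapest-to-lose first: low gain relative to the resources it holds.
    victims = sorted(cache, key=lambda o: o["gain"] / (o["space"] + o["bw"]))
    while not fits() and victims:
        cache.remove(victims.pop(0))
    if fits():
        cache.append(new_obj)
        return True
    return False
```

The two-constraint `fits()` check captures what distinguishes continuous-media caching: a cached video can exhaust the proxy's bandwidth budget even when disk space remains.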
This paper presents a system that combines audio and visual cues for locating and tracking an object, typically a person, in real time. It is shown that combining a speech source localization algorithm with a video-based head tracking algorithm results in a more accurate and robust tracker than that obtained using any one of the audio or visual modalities. Performance evaluation results are presented with a system that runs in real time on a general purpose processor. The multimodal tracker has several applications such as teleconferencing, multimedia kiosks and interactive games.
Software implementation of multimedia compression is an enabling technology for the widespread use of computer-based multimedia communications. M-JPEG offers reasonable compression at reasonable computational cost. This paper presents modifications to the well-known JPEG compression algorithm that achieve a 60-70% speedup of digital video compression. The scheme presented exploits dependencies between frames to predict the DCT coefficients of a frame based on previous frames in the sequence. This knowledge is used to reduce the computational complexity of the DCT transforms. The prediction-based approach introduces only a mild overhead, of about 0.4%, into the compression. Performance measurements with real video sequences demonstrate the increased performance of the modified JPEG process for digital video.
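The prediction idea can be illustrated by reusing a block's previous DCT coefficients whenever the block has changed little, so the transform is skipped entirely for static regions; the naive pure-Python DCT, the mean-absolute-difference test, and the threshold are illustrative, not the paper's actual scheme.

```python
import math

def dct2(block):
    """Naive 2-D DCT-II of a square block (pure Python, for illustration)."""
    n = len(block)
    def c(u):
        return math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = sum(block[x][y]
                    * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                    * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                    for x in range(n) for y in range(n))
            out[u][v] = c(u) * c(v) * s
    return out

def mean_abs_diff(a, b):
    n = len(a)
    return sum(abs(a[i][j] - b[i][j]) for i in range(n) for j in range(n)) / n**2

def encode_frame(blocks, prev_blocks, prev_coeffs, thresh=2.0):
    """Predict coefficients from the previous frame for blocks that changed
    little; compute the full DCT only where prediction fails."""
    coeffs, reused = [], 0
    for blk, pblk, pcof in zip(blocks, prev_blocks, prev_coeffs):
        if pcof is not None and mean_abs_diff(blk, pblk) < thresh:
            coeffs.append(pcof)       # predicted: DCT skipped
            reused += 1
        else:
            coeffs.append(dct2(blk))  # changed block: full transform
    return coeffs, reused
```

On typical video, where most blocks are static from frame to frame, the fraction of skipped transforms directly becomes the compression speedup.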
This paper describes an approach to the problem of searching speech-based digital audio using cross-modal information retrieval. Audio containing speech (speech-based audio) is difficult to search. Open-vocabulary speech recognition is advancing rapidly, but cannot yield high accuracy in either the search or the transcription modality. Text, however, can be searched quickly and efficiently with high accuracy. Script-light digital audio is audio that has an available transcription. This is a surprisingly large class of content, including legal testimony, broadcasting, dramatic productions, and political meetings and speeches. An automatic mechanism for deriving the synchronization between the transcription and the audio allows for very accurate retrieval of segments of that audio. The mechanism described in this paper is based on building a transcription graph from the text and computing biphone probabilities for the audio. A modified beam search algorithm is presented to compute the alignment.
Hybrid ARQ schemes can yield much better throughput and reliability than static FEC schemes for the transmission of data over time-varying wireless channels. However, these schemes incur higher delay: they adapt to varying channel conditions by retransmitting erroneous packets, which results in variable effective data rates on current PCS networks because the channel bandwidth is constant. Hybrid ARQ schemes are currently being proposed as the error control schemes for real-time video transmission, with standardization on-going in the ITU, MPEG-4, and the wireless ATM forum. The important issue is how to ensure low delay while taking advantage of the high throughput and reliability that these schemes provide. In this paper we propose an adaptive source rate control (ASRC) protocol that works together with hybrid ARQ error control schemes to achieve efficient transmission of real-time video with low delay and high reliability. The ASRC scheme adjusts the source rate based on the channel conditions, the transport buffer occupancy, and the delay constraints. It optimizes the video quality by dynamically changing both the number of forced-update (intracoded) macroblocks and the quantization scale used in a frame. The number of forced-update macroblocks used in a frame is first adjusted according to the allocated source rate. This reduces the fluctuation of the quantization scale as channel conditions change during encoding, improving the uniformity of the video quality. Simulation results show that the proposed ASRC protocol performs very well for both slow-fading and fast-fading channels.
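The two adaptation knobs can be sketched as a function from the allocated per-frame bit budget to a (forced-update macroblock count, quantization scale) pair; the linear bit-cost model and all constants below are illustrative assumptions, not the ASRC protocol's actual rules.

```python
def asrc_frame_params(alloc_bits, mb_total=396, intra_cost=600,
                      inter_cost=200, q_min=2, q_max=31):
    """Pick the forced-update (intra) macroblock count from the allocated
    source rate first, then a quantization scale for the remaining budget.
    Assumes each inter macroblock costs inter_cost bits and each intra
    macroblock intra_cost bits (purely illustrative constants)."""
    # More rate -> afford more intra macroblocks (better error resilience).
    surplus = alloc_bits - mb_total * inter_cost
    n_intra = min(mb_total, max(1, surplus // (intra_cost - inter_cost)))
    # Coarser quantization when the per-macroblock budget is tight.
    budget_per_mb = alloc_bits / mb_total
    q = max(q_min, min(q_max, round(16 * inter_cost / budget_per_mb)))
    return int(n_intra), q
```

Fixing the intra count before choosing the quantizer is what damps quantization-scale fluctuation as the channel (and hence the allocated rate) varies, which is the uniformity property the abstract describes.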
The emergence of streaming multimedia players provides users with low-latency audio and video content over the Internet. Providing high-quality, best-effort, real-time multimedia content requires adaptive delivery schemes that fairly share the available network bandwidth with reliable data protocols such as TCP. This paper proposes a new flow and congestion control scheme, SCP (streaming control protocol), for real-time streaming of continuous multimedia data across the Internet. The design of SCP arose from several years of experience in building and using adaptive real-time streaming video players. SCP addresses two issues associated with real-time streaming. First, it uses a congestion control policy that allows it to share network bandwidth fairly with both TCP and other SCP streams. Second, it improves smoothness in streaming and ensures low, predictable latency. This distinguishes it from TCP's jittery congestion avoidance policy, which is based on linear growth and one-half reduction of its congestion window. In this paper, we present a description of SCP and an evaluation of it using Internet-based experiments.
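The contrast with TCP's one-half window reduction can be illustrated with two toy window-update rules; the smoother decrease factor below is an illustrative assumption, not SCP's actual policy.

```python
def tcp_like(window, congested):
    """TCP-style AIMD: additive increase, one-half multiplicative decrease.
    The large downward steps are what make the send rate jittery."""
    return window / 2 if congested else window + 1

def smooth_like(window, congested, beta=0.875):
    """Gentler multiplicative decrease (beta is an illustrative choice):
    the same additive increase, but much smaller rate drops on congestion,
    in the spirit of SCP's smoothness goal."""
    return window * beta if congested else window + 1
```

For a streaming player, the difference matters because every halving of the window translates into a visible dip in delivered frame rate, whereas a gentle decrease can be absorbed by a small playout buffer.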