Despite advances in networking technology, the limitation of server bandwidth prevents multimedia applications from taking full advantage of next-generation networks. This constraint sets a hard limit on the number of users the server is able to support simultaneously. To address this bottleneck, we propose a Caching Multicast Protocol (CMP) to leverage the in-network bandwidth. Our solution caches video streams in the routers to facilitate regional services in the immediate future. In other words, the network storage is managed as a huge `video server' to allow the application to scale far beyond the physical limitation of its video server. The tremendous increase in the service bandwidth also enables the system to provide true on-demand services. To assess the effectiveness of this technique, we develop a detailed simulator to compare its performance with that of our earlier scheme called Chaining. The simulation results indicate that CMP is substantially better, with many desirable properties: (1) it is optimized to reduce traffic congestion; (2) it uses much less caching space; (3) client workstations are not involved in the caching protocol; (4) it can work on the network layer to leverage modern routers.
Current media servers do not provide the generality required to easily integrate arbitrary isochronous processing algorithms into streams of continuous media. Specifically, present day video server architectures primarily focus on disk and network strategies for efficiently managing available resources under stringent QoS guarantees. However, they do not fully consider the problems of integrating the wide variety of algorithms required for interactive multimedia applications. Examples of applications benefiting from a more flexible server environment include watermarking, encrypting or scrambling streams, visual VCR operations, and multiplexing or demultiplexing of live presentations. In this paper, we detail the MediaMesh architecture for integrating arbitrary isochronous processing algorithms into general purpose media servers. Our framework features a programming model through which user-written modules can be dynamically loaded and interconnected in self-managing graphs of stream processing components. Design highlights include novel techniques for distributed stream control, efficient buffer management and QoS management. To demonstrate its applicability, we have implemented the MediaMesh architecture in the context of a commercial video server. We illustrate the viability of the architecture through performance data collected from four processing modules that were implemented to facilitate new classes of applications on our video server.
Internet video-on-demand (VoD) today streams videos directly from server to clients, because re-distribution infrastructure is not yet established. Intranet solutions exist but are typically managed centrally. Caching may overcome these management needs; however, existing web caching strategies are not applicable because they were designed for different conditions. We propose movie distribution by means of caching, and study its feasibility from the service providers' point of view. We introduce the combination of our reliable multicast protocol LCRTP for caching hierarchies with our enhancement to the patching technique for bandwidth-friendly True VoD, which does not depend on network resource guarantees.
In this paper we first analyze the concealment performance of the G.729 decoder. We find that the loss of unvoiced frames can be concealed well. Also, the loss of voiced frames is concealed well once the decoder has obtained sufficient information on them. However, the decoder fails to conceal the loss of voiced frames at an unvoiced/voiced transition because it extrapolates internal state (filter coefficients and excitation) for an unvoiced sound. Moreover, once the decoder has failed to build the appropriate linear prediction synthesis filter, it takes a long time for the decoder to resynchronize with the encoder. Using this result, we then develop a new FEC scheme to support frame-based codecs, which adjusts the amount of added redundancy adaptively to the properties of the speech signal. Objective quality measures (ITU P.861A and EMBSD) show that our speech property-based FEC scheme achieves almost the same speech quality as current FEC schemes while approximately halving the amount of redundant data necessary to adequately protect the voice flow.
In this paper, we introduce a novel technique for the modeling of variable-bit-rate MPEG video streams. This technique is designed to aid multimedia systems researchers interested in using real compressed video data to study the systems they are designing without the overhead of having to digitize and compress the video stream multiple times. Our proposed approach differs from current approaches in that it uses an inexpensive non-inter-coded video capture board to digitize the movie data and then statistically generates MPEG data using the captured movie itself. Our model uses a subsample from the digitized data to create a linear regression model of the target video (both quality and frame pattern). Using this linear regression model, our approach statistically generates the target video. We have digitized and compressed 29 hours of constant-quality MPEG video to verify our pseudo-modeling approach and to allow other researchers to verify this model as well.
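The regression step described in the abstract above can be pictured with a small sketch. This is purely an illustration, not the authors' code: it assumes the model maps cheap capture-board frame sizes to MPEG frame sizes with a per-frame linear fit learned on a subsample, and all names and numbers here are invented.

```python
# Illustrative sketch (not the paper's implementation): fit a linear
# regression from captured (non-inter-coded) frame sizes to MPEG frame
# sizes on a subsample, then statistically generate a full-length trace.
import numpy as np

def fit_frame_model(capture_sizes, mpeg_sizes):
    """Least-squares fit of mpeg_size ~ a * capture_size + b."""
    a, b = np.polyfit(capture_sizes, mpeg_sizes, 1)
    return a, b

def synthesize(capture_sizes, model, noise_std=0.0, rng=None):
    """Generate a statistical MPEG-size trace from the captured movie."""
    rng = rng or np.random.default_rng(0)
    a, b = model
    sizes = a * np.asarray(capture_sizes, dtype=float) + b
    # Optional noise term models residual variation; clamp at zero bytes.
    return np.maximum(sizes + rng.normal(0.0, noise_std, len(sizes)), 0.0)

# Usage: fit on a small subsample, then generate the whole trace.
sub_cap = [1000, 1200, 900, 1500]   # hypothetical captured frame sizes
sub_mpeg = [300, 370, 260, 460]     # hypothetical true MPEG sizes
model = fit_frame_model(sub_cap, sub_mpeg)
trace = synthesize([1000, 1100, 1300], model)
```

In the paper the model additionally accounts for frame type (I/P/B pattern) and target quality; the sketch collapses that to a single fit for brevity.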
Universal access to the WWW is the vision in which all information, from any source, can be accessed anywhere, by any device, in a consistent and straightforward way. However, the existing web paradigm, in which the web server defines the content delivered to the Internet client device, has hindered content accessibility for pervasive and ubiquitous devices. When delivering information over the Internet to pervasive and ubiquitous devices, content providers face the considerable challenge of sending and presenting content in a way that makes it usable to resource-limited devices. Transcoding is instrumental to enabling universal access to the web for pervasive and ubiquitous computing devices. In this paper, we propose a taxonomy of transcoding techniques. The main contributions of the proposed taxonomy are (1) to provide a road map of the work accomplished to date by presenting dimensions of characteristics derived from our study of transcoding models and techniques, and (2) to help identify unsolved problems that exist in the domain today. Based on the proposed taxonomy, we have analyzed several existing commercially available systems, and investigated possible future improvements on these technologies.
With rapid progress in both computers and networks, real-time multimedia applications are now possible on the Internet. Since the Internet was designed to support traditional applications, multimedia applications on the Internet often suffer from unacceptable delay, jitter, and data loss. Among these, data loss often has the largest impact on quality. In this paper, we propose a new forward error correction technique for video that compensates for lost packets while maintaining minimal delay. Our approach transmits a small, low-quality redundant frame after each full-quality primary frame. In the event the primary frame is lost, we display the low-quality frame rather than display the previous frame or retransmit the primary frame. To evaluate our approach, we simulated the effect of network data loss on MPEG video clips and repaired the data loss by using redundant frames. We conducted user studies that experimentally measured users' opinions on the quality of the video streams in the presence of data loss, both with and without our redundancy approach. In addition, we analyze the system overhead incurred by the redundancy. We find that video redundancy can greatly improve the perceptual quality of video in the presence of network data loss. The system overhead that redundancy introduces depends on the quality of the redundant frames, but a typical redundancy overhead will be approximately 10% that of the original frames.
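The repair mechanism in the abstract above is easy to simulate. The sketch below is our own illustration, not the paper's evaluation code: it assumes independent packet loss for primary and redundant frames and invented parameter values, and simply counts how often the low-quality copy hides a lost primary frame.

```python
# Hedged sketch: after each full-quality frame a small redundant copy is
# sent; when the primary is lost, the low-quality copy is displayed
# instead of freezing on the previous frame. All parameters are invented.
import random

def simulate(n_frames, loss_rate, seed=0):
    rng = random.Random(seed)
    shown_primary = shown_redundant = frozen = 0
    for _ in range(n_frames):
        primary_lost = rng.random() < loss_rate
        redundant_lost = rng.random() < loss_rate  # assume independent loss
        if not primary_lost:
            shown_primary += 1
        elif not redundant_lost:
            shown_redundant += 1   # low-quality frame conceals the loss
        else:
            frozen += 1            # both lost: previous frame is repeated
    return shown_primary, shown_redundant, frozen

p, r, f = simulate(10_000, 0.10)
# With 10% independent loss, roughly 9% of frames are repaired by the
# redundant copy and only about 1% still freeze.
```

Under this independence assumption the residual freeze rate falls from the raw loss rate to roughly its square, which is the intuition behind the scheme.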
The explosive growth of the Internet has come with increasing diversity and heterogeneity in terms of client device capability, network bandwidth, and user preferences. To date, most Web content has been designed with desktop computers in mind, and often contains rich media such as images, audio, and video. In many cases, this content is not suitable for devices like netTVs, handheld computers, personal digital assistants, and smart phones with relatively limited display capability, storage, processing power, and network access. Thus, Internet access is still constrained on these devices and there is a need to develop alternative approaches for information delivery. In this paper, we propose a framework for adaptive content delivery in heterogeneous environments. The goal is to improve content accessibility and perceived quality of service for information access under changing network and viewer conditions. The framework includes content adaptation algorithms, client capability and network bandwidth discovery methods, and a Decision Engine for determining when and how to adapt content. We describe this framework, initial system implementations based upon this framework, and the issues associated with the deployment of such systems based on different architectures.
In application areas such as education, entertainment, medical surgery, and space shuttle launching, distributed visual tracking systems are of increasing importance. In this paper we describe the design, implementation, and evaluation of OmniTrack, a distributed omni-directional visual tracking system, developed at the University of Illinois at Urbana-Champaign, with an Adaptive Middleware Architecture as the core of the system. With respect to both operating systems and network connections, adaptation is of fundamental importance to the tracking system, since it runs in an environment with large performance variations and without support of Quality of Service guarantees.
This paper provides two contributions to the study of OS endsystem support for real-time Object Request Broker (ORB) middleware. First, we empirically compare and evaluate the suitability of real-time operating systems, VxWorks and LynxOS, and general-purpose operating systems with real-time scheduling classes, Windows NT, Solaris, and Linux, for real-time ORB middleware. While holding the hardware and ORB constant, we systematically vary the OS and measure key platform-specific variations in latency, jitter, operation throughput, and CPU processing overhead. Second, we describe specific areas where these operating systems must improve so that ORB middleware will be predictable, efficient, and scalable enough to support the QoS requirements of multimedia applications.
Despite evidence of the rising popularity of video on the web (or VOW), little is known about how users access video. However, such a characterization can greatly benefit the design of multimedia systems such as web video proxies and VOW servers. Hence, this paper presents an analysis of trace data obtained from an ongoing VOW experiment at Lulea University of Technology, Sweden. This experiment is unique in that video material is distributed over a high bandwidth network, allowing users to make access decisions without the network being a major factor. Our analysis revealed a number of interesting discoveries regarding user VOW access. For example, accesses display high temporal locality: several requests for the same video title often occur within a short time span. Accesses also exhibited spatial locality of reference, whereby a small number of machines accounted for a large number of overall requests. Another finding was a browsing pattern in which users preview the initial portion of a video to find out if they are interested. If they like it, they continue watching; otherwise they halt it. This pattern suggests that caching the first several minutes of video data should prove effective. Lastly, the analysis shows that, contrary to previous studies, the ranking of video titles by popularity did not fit a Zipfian distribution.
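The Zipf claim in the abstract above can be checked with a standard rank-frequency fit. The snippet below is an illustrative sketch on synthetic counts, not the paper's trace data: rank titles by access count and fit a least-squares line in log-log space; a Zipfian popularity distribution gives a slope near -1.

```python
# Illustrative Zipf check (synthetic counts, not the actual trace):
# fit log(count) against log(rank) and inspect the slope.
import math

def loglog_slope(counts):
    """Least-squares slope of log(count) vs log(rank), counts sorted
    in decreasing order of popularity (rank 1 = most popular)."""
    counts = sorted(counts, reverse=True)
    xs = [math.log(r + 1) for r in range(len(counts))]   # log rank
    ys = [math.log(c) for c in counts]                   # log count
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

zipf_like = [1000 // r for r in range(1, 51)]  # counts ~ 1/rank
slope = loglog_slope(zipf_like)                # close to -1 for Zipf data
```

A trace whose slope deviates substantially from -1, or whose log-log plot is visibly curved, fails the Zipf fit in the sense the abstract reports.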
Multimedia objects are difficult to query since much of the content of the object is beyond the interpretive capabilities of a computer. State-of-the-art multimedia content-based retrieval systems focus on post mortem analysis of the underlying data objects to derive semantic information. With the emergence of inexpensive I/O technology we can shift some of the burden of this content analysis to `smart' devices that can capture context information about a multimedia object. Context information includes the geographical location an object is created or used, identities of users, activities of users and other applications, etc. Such information can be used to index the multimedia objects to perform content-based queries with minimal signal processing. We are concerned with the design and implementation of a context-based retrieval system for IP-multicast videoconferences. The Multimedia Internet Recorder and Archive (MIRA) models a videoconference as a Bayesian network that describes bounds on `cost' and `reliability' for a recording task. Context about a videoconference is derived from monitoring the RTCP control channel and messaging events that occur among meeting participants. MIRA creates an index of context information as an XML document that is used by clients to visualize, browse, and retrieve multimedia objects. MIRA also creates event-based transcripts as summaries of meetings. These transcripts are available over the WWW.
The number of video conferences conducted over the Internet has increased steadily in recent years. The need to archive the multimedia data streams of these conferences became apparent, and a number of tools accomplishing this task for audio and video streams were developed. In many video conferencing scenarios, shared whiteboards are used in addition to audio and video to transmit slides or to sketch ideas. However, none of the existing recording tools provides an efficient recording service for the data streams of these tools. In this paper we present a new approach to the recording and playback of shared whiteboard media streams. We discuss generic design issues of a shared whiteboard recorder, and we present a novel algorithm that enables efficient random access to the recorded streams. We describe an implementation of our algorithms for the media streams of our digital lecture board.
This paper describes the implementation of a system to deliver Quality of Service for IP flows using a DiffServ-like packet marking mechanism. The system uses an unmodified commodity operating system (Windows NT), and a policy daemon is employed to implement arbitrary policies for QoS via a scripting mechanism. By interposing an agent in the protocol stack used by the application runtime system, off-the-shelf applications can have different packet forwarding policies assigned to different flows they originate, without any need to recompile either the operating system or the application. The principle of the system can be naturally extended to implement more widely coordinated policy-based networking, and network reservations using protocols such as RSVP, without any need to recompile applications.
For the same long-term loss ratio, different loss patterns lead to different application-level Quality of Service (QoS) perceived by the users (short-term QoS). While basic packet loss measures like the mean loss rate are widely used in the literature, much less work has been devoted to capturing a more detailed characterization of the loss process. In this paper, we provide means for a comprehensive characterization of loss processes by employing a model that captures loss burstiness and distances between loss bursts. Model parameters can be approximated based on run-lengths of received/lost packets. We show how the model serves as a framework in which packet loss metrics existing in the literature can be described as model parameters and thus integrated into the loss process characterization. Variations of the model with different complexity are introduced, including the well-known Gilbert model as a special case. Finally we show how our loss characterization can be used by applying it to actual Internet loss traces.
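The run-length approach in the abstract above can be sketched for the two-parameter Gilbert model it names as a special case. The code below is our own illustration, not the paper's: it assumes the usual two-state formulation in which p is the good-to-bad transition probability and q the bad-to-good one, so the mean loss-burst length is 1/q and the mean gap between bursts is 1/p.

```python
# Sketch: estimate Gilbert-model parameters from a binary loss trace
# (0 = packet received, 1 = packet lost) using run-lengths, as the
# abstract suggests. Not the paper's code; the estimator is the
# standard method-of-moments one for the two-state model.
def gilbert_params(trace):
    """Return (p, q) with p = P(good -> bad), q = P(bad -> good)."""
    runs = []                       # list of (value, run_length)
    cur, length = trace[0], 1
    for bit in trace[1:]:
        if bit == cur:
            length += 1
        else:
            runs.append((cur, length))
            cur, length = bit, 1
    runs.append((cur, length))
    loss_bursts = [l for v, l in runs if v == 1]
    gaps = [l for v, l in runs if v == 0]
    q = len(loss_bursts) / sum(loss_bursts)   # 1 / mean burst length
    p = len(gaps) / sum(gaps)                 # 1 / mean gap length
    return p, q

# Bursty example trace: two loss bursts of length 2 among received runs
# of lengths 4, 5, and 3.
p, q = gilbert_params([0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0])
# mean burst = 2 -> q = 0.5; mean gap = (4 + 5 + 3) / 3 = 4 -> p = 0.25
```

The richer models in the paper add further parameters on top of this (e.g. distances between bursts), but they are fitted from the same run-length statistics.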
Shared networks are now able to support a wide range of applications, including real-time multimedia. This has led the networking community to consider a wider range of network Quality of Service (QoS) guarantees and pricing schemes. To date, the QoS required by networked multimedia applications has been described in terms of technical parameters. We argue that, in order to maximize the realized quality of any network, the QoS requirements of networked multimedia applications should be based on the value that users ascribe to the media quality they receive in the context of a particular task. This argument is supported with results from a set of studies in which users' perceptions of media quality were investigated for a listening task. We found that users' expectancies of quality directly influenced their ratings: low expectancies produce higher ratings for the same level of objective quality, provided that quality is predictable. In conclusion, we outline the implications of our studies for the design of networked multimedia applications and the network services that support them.
The transition to digital information and the networking of new digital devices are leading to considerable changes in the consumer electronics industry. New applications will arise, offering more entertainment, comfort, and flexibility. To achieve this, complex problems in communication and distributed systems need to be solved. High requirements on stability, usability, quality, and price call for new solutions. This paper describes the concept of In-Home Digital Networks and then addresses the WWICE system in detail. This new architecture provides a coherent system environment for the home. It focuses on services and applications, which can easily be accessed and controlled by the user. The application framework and middleware services of the layered software architecture efficiently support the development of IHDN applications as well as flexible application control at runtime.
This paper proposes a new technique for on-demand delivery of streaming media. The idea is to hold in reserve, or `skim', a portion of the client reception bandwidth that is sufficiently small that display quality is not impacted significantly, and yet that is nonetheless enough to support substantial reductions in server and network bandwidth through near-optimal hierarchical client stream merging. In this paper we show that this objective is feasible, and we develop practical techniques that achieve it. The results indicate that server and network bandwidth can be reduced to on the order of the logarithm of the number of clients who are viewing the object, using a small `skim' (e.g., 15%) of client reception bandwidth. These low server and network bandwidths are achieved for every media file, while providing immediate service to each client, and without having to pre-load initial portions of the video at each client.
We propose a reactive broadcasting protocol that addresses the problem of distributing moderately popular videos in a more efficient fashion. Like all efficient broadcasting protocols, reactive broadcasting assumes that the customer set-top box has enough local storage to store at least one half of each video being watched. Unlike other broadcasting protocols, reactive broadcasting only broadcasts the later portions of each video. The initial segment of each video is distributed on demand using a stream tapping protocol. Our simulations show that reactive broadcasting outperforms both conventional broadcasting protocols and pure stream tapping for a wide range of video request rates.
This paper addresses optimizing cache allocation in a distributed image database system over computer networks. We consider progressive image file formats, and `soft' caching strategies, in which each image is allocated a variable amount of cache memory, in an effort to minimize the expected image transmission delay time. A simple and efficient optimization algorithm is proposed, and is generalized to include multiple proxies in a network scenario. With optimality proven, our algorithms are surprisingly simple, and are based on sorting the images according to a special priority index. We also present an adaptive cache allocation/replacement strategy that can be incorporated into web browsers with little computational overhead. Simulation results are presented.
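The sorting-based allocation in the abstract above can be sketched in a few lines. This is an illustrative sketch only: the paper derives its priority index from its delay model, while here we substitute an invented index (expected access probability per cached byte) and invented data, keeping only the structure of "sort by priority, then fill the budget greedily", which progressive formats make possible because images can be cached partially.

```python
# Hedged sketch of priority-index cache allocation for progressive
# images. The index used here (access_prob / size) is an illustrative
# stand-in for the paper's delay-derived priority index.
def allocate(images, cache_bytes):
    """images: list of (name, size_bytes, access_prob).
    Greedily assign cache to the highest-priority images first;
    progressive coding allows a partial (soft) allocation."""
    ranked = sorted(images, key=lambda im: im[2] / im[1], reverse=True)
    plan, left = {}, cache_bytes
    for name, size, _prob in ranked:
        take = min(size, left)      # partial caching is allowed
        if take:
            plan[name] = take
            left -= take
    return plan

plan = allocate([("a.jpg", 400, 0.5),
                 ("b.jpg", 100, 0.3),
                 ("c.jpg", 300, 0.1)], 500)
# b.jpg has the highest index (0.003/byte) and is cached first; a.jpg
# then fills the remaining 400 bytes; c.jpg gets nothing.
```

The attraction of this family of algorithms, as the abstract notes, is that optimality reduces to a sort, which also makes the adaptive replacement variant cheap enough for a web browser.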
The transfer of live media streams such as video and audio over the Internet is subject to several problems, static and dynamic by nature. Important quality of service (QoS) parameters not only differ between receivers depending on their network access, service provider, and nationality; the QoS also varies over time. Moreover, the installed receiver base is heterogeneous with respect to operating system, browser or client software, and browser version. We present a new concept for serving live media streams. It is no longer based on the current one-size-fits-all paradigm, in which the server offers just one stream. Our compresslet system takes the opposite approach: it builds media streams `to order' and `just in time'. Every client subscribing to a media stream uses a servlet loaded into the media server to generate a data stream tailored to its resources and constraints. The server is designed so that commonly used components of media streams are computed once. The compresslets use these prefabricated components, code additional data if necessary, and construct the data stream based on the dynamically available QoS and other client constraints. Client-specific encoding leads to a resource-optimal presentation that is especially useful for presenting complex multimedia documents on a variety of output devices.
Digital teletext will be the most prominent multimedia service offered in digital television networks in Europe. Teletext is well known and has been available for many years in analog television in Europe and elsewhere as an additional service provided by television operators. Digital broadcast technology allows this service to be enhanced in quality and functionality. It provides many advantages in comparison to the analog distribution methods, such as possible reuse of content from Internet-based services, improved navigation, better layout capabilities, and support for animated content. These capabilities give more freedom to application designers. This paper gives an overview of the teletext technologies currently being introduced in various European countries. It explains which standards are used, what kinds of services are being provided, and how they differ from each other and from the services offered in the World Wide Web.
Scene is considered a good unit for indexing and retrieving data from large video databases. In this paper, we present a new content-based approach for detecting and classifying scene changes in video sequences. Our technique can detect and classify not only abrupt changes (i.e., hard cuts) but also gradual changes such as fades and dissolves. We compute background difference between frames, and use background tracking to handle various camera motions. Although our method processes significantly less data, it results in more semantically rich pieces (i.e., scenes). Our experiments on various types of videos indicate that the proposed technique is much less sensitive to the predefined threshold values, and is very effective in reducing the number of false hits. Our approach is particularly suitable for very large video databases because it is both space and time efficient.
Today's wide variety of computing devices offer a large range of resource availability. These resources include CPU speed, bandwidth, and memory. Workstations and PCs typically are rich in resources, whereas palmtop devices are generally quite limited. This disparity offers challenges to integrating these heterogeneous devices into a single distributed system. Services must be available to each device, but it may be necessary to modify certain services if the connected device does not have the required resources to support them. Proxies may be introduced into the system to off-load computations that would preclude certain services to resource-deprived devices. We have implemented one such proxy that enables the viewing of live MPEG video on the 3Com PalmPilot. The proxy is able to transform the video feed on-the-fly, removing extraneous information, thereby reducing CPU and memory requirements and allowing palm devices to participate in video sessions. This paper discusses the design and performance of our video proxy targeted for 3Com PalmPilot handheld computers. Several protocols are compared and the advantages of each are discussed.
In this paper we present the WebSTAR system, which is a prototype of a video database on the World Wide Web. The WebSTAR system allows owners of video material to make their video available on the Internet together with suitable metadata descriptions. Through an ordinary web browser, users on the Internet can search, browse, and play back the available video material on their computer screen. Users who want a deeper understanding of the contents of the video can have the relevant metadata visualized in synchrony with the video playback.