Shot boundary detection (SBD) is the fundamental first step in managing video databases: it segments video data into the basic units for indexing and retrieval. Many automatic SBD techniques exist. Most, however, are based on sequential search and are therefore too expensive for practical use. To address this problem, we explore a different direction in this paper: a non-linear approach in which most video frames never need to be compared. This idea is fundamentally different from all existing methods. In fact, it is orthogonal to these schemes in the sense that it can be applied to substantially improve their performance. Our experiments show that this idea speeds up a conventional method based on color histograms by up to 16 times while preserving the same accuracy. On average, the improvement is fivefold across our experiments on 26 videos of six different types.
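As a rough illustration of the non-linear idea (the sketch below is ours, not the paper's algorithm; frames are simplified to flat lists of gray values 0-255), a conventional histogram comparison can skip ahead several frames at a time and bisect only when the endpoint histograms differ, so most frames are never compared:

```python
def hist(frame, bins=8):
    """Coarse gray-level histogram of a frame, normalized to sum to 1."""
    h = [0] * bins
    for p in frame:
        h[p * bins // 256] += 1
    n = len(frame)
    return [c / n for c in h]

def hist_dist(a, b):
    """L1 distance between two normalized histograms."""
    return sum(abs(x - y) for x, y in zip(a, b))

def find_boundaries(frames, skip=8, threshold=0.5):
    """Non-linear search: compare frames `skip` apart; bisect only when
    the endpoints differ, so uneventful stretches are never examined."""
    cuts = []
    def bisect(lo, hi):
        if hi - lo == 1:
            cuts.append(hi)          # boundary lies between lo and hi
            return
        mid = (lo + hi) // 2
        if hist_dist(hist(frames[lo]), hist(frames[mid])) > threshold:
            bisect(lo, mid)
        if hist_dist(hist(frames[mid]), hist(frames[hi])) > threshold:
            bisect(mid, hi)
    i = 0
    while i + skip < len(frames):
        if hist_dist(hist(frames[i]), hist(frames[i + skip])) > threshold:
            bisect(i, i + skip)
        i += skip
    return cuts
```

On footage with few cuts, the number of histogram comparisons grows with the number of boundaries rather than with the number of frames, which is the source of the speedup the abstract reports.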
Video conferencing and high-quality video-on-demand services are very desirable for many Internet users. However, Internet access channels ranging from wireless connections to high-speed ATM networks mean great heterogeneity with respect to bandwidth. Hierarchical video encoders that scale and distribute video data over different layers enable users to adapt video quality to the capacity of their Internet connection. However, the construction of the layers at the encoder determines the video quality that can be expected at the receiver. To achieve an optimal configuration of the different layers with respect to visual quality, we propose a hybrid scaling algorithm that scales video data in both the spatial and temporal dimensions. Using a quality metric based on properties of the human visual system, our algorithm calculates an optimal ratio between spatial and temporal information. Additionally, we present experimental results that demonstrate the capabilities of our approach.
e-Seminar is an IBM-Research internal prototype platform that gives all IBM-Research employees access to videos and slides of talks, seminars, presentations and other events at any IBM-Research campus worldwide. The platform facilitates an increased use of digital video by lowering the cost and complexity thresholds for the use of video in live and in on-demand applications. The e-Seminar system therefore increases the leverage of intellectual property and helps to improve the communication between the laboratories of the IBM-Research Division. All of the above is achieved through the use of a distributed Video-on-Demand system at all sites of the IBM-Research Division. It is a highly integrative project, assembling tools, techniques and insights from a broad variety of fields including, but not limited to, networking, video recording/encoding, data streaming, video analysis, HCI, multimedia authoring, data visualization and distributed systems management. The e-Seminar system is designed to serve as a base platform for research across all of these areas. Currently ongoing research in the project explores how such a system can be built and operated with the fewest people involved while at the same time reducing the breadth and depth of skills necessary. Additional research addresses real-time analysis and indexing of the given material in ways that are relevant to the teaching purpose of the system. New approaches to video network caching are also being explored.
Complex multimedia applications have diverse resource and timing requirements. A platform for building such programs should therefore supply the developer with mechanisms for managing concurrency, communication, and real-time constraints, but should remain flexible with regard to scheduling policies and interaction models. We have developed such a platform consisting of a user-level threads package and operating system extensions. The threads package offers a message-based threading model uniformly integrating synchronous and asynchronous communication, inter-thread synchronization, and signal handling, as well as real-time functionality and application-specific scheduling. To support this user-space flexibility, an upcall mechanism links the user-level scheduler to the kernel.
A packet scheduler is an operating system component that controls the allocation of network interface bandwidth to outgoing network flows. By deciding which packet to send next, packet schedulers not only determine how bandwidth is shared among flows, but also play a key role in determining the rate and timing behavior of individual flows. The recent explosion of rate- and timing-sensitive flows, particularly in the context of multimedia applications, has focused new interest on packet schedulers. Next-generation packet schedulers must not only ensure separation among flows and meet real-time performance constraints, they must also support dynamic fine-grain reallocation of bandwidth for flows with variable-bit-rate requirements. Unfortunately, today's packet schedulers either do not support rate- and timing-sensitive flows, or do so with reservation systems that are relatively coarse-grain and inflexible. This paper makes two contributions. First, it shows how bandwidth requirements can be inferred directly from real-rate flows, without requiring explicit specifications from the application. Second, it presents the design, implementation, and performance evaluation of a rate-matching packet scheduler that uses these inferred requirements to automatically and dynamically control the bandwidth allocation to flows.
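The first contribution, inferring bandwidth needs without explicit specifications, amounts to measuring a flow's own arrival process. A minimal sketch of such an estimator (our illustration, assuming a simple exponentially weighted moving average; the paper's actual mechanism may differ):

```python
class RateEstimator:
    """Infer a flow's bandwidth need from its own packet arrivals:
    an exponentially weighted moving average of instantaneous bits/sec."""
    def __init__(self, alpha=0.5):
        self.alpha = alpha     # smoothing weight for the newest sample
        self.rate = 0.0        # current estimate, bits per second
        self.last_t = None     # timestamp of previous packet

    def observe(self, t, packet_bits):
        """Record a packet of `packet_bits` bits arriving at time `t` (sec);
        return the updated rate estimate."""
        if self.last_t is not None and t > self.last_t:
            inst = packet_bits / (t - self.last_t)   # instantaneous rate
            self.rate = self.alpha * inst + (1 - self.alpha) * self.rate
        self.last_t = t
        return self.rate
```

A scheduler can then allocate each flow roughly its estimated rate, tracking variable-bit-rate traffic without any reservation from the application.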
To give continuous media applications fine-grained control over their CPU allocations, and to protect these allocations from each other, thread priorities must have a quality-of-service (QoS) interpretation. To this end, we present a CPU scheduler based on the well-defined resource specification of the service curve. The service curve is distinguished from the traditional notion of rate by its ability to flexibly decouple delay and rate performance. Beyond the computation of thread priorities, predictable performance is also hard to achieve because threads can interact with each other and contend for synchronization resources. Such interactions can contribute to various forms of priority inversion. We discuss a new approach, dynamic priority inheritance, in our CPU scheduler that solves priority inversion due to lock contention. To solve priority inversion arising from incompatible client/server resource specifications, we employ a train abstraction that allows a thread of control to visit multiple protection domains while carrying its resource and scheduling state intact. The train abstraction has been applied to real applications such as a Solaris X window server. Finally, we present a mechanism for Internet flow specifications to reserve CPU time for network receive interrupt processing. We demonstrate an experimental system in which the combined techniques provide effective CPU isolation under various conditions of lock contention, client/server programming, and network processing.
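To make the decoupling concrete, a service curve can be pictured as a two-piece linear function S(t) = max(0, rate * (t - delay)): the delay term bounds how long a thread may wait before service begins, independently of its long-term rate. The sketch below (our illustration with hypothetical thread names and an earliest-deadline selection rule; not the paper's scheduler) derives per-thread deadlines from such curves:

```python
def service_deadline(cum_work, delay, rate):
    """Solve S(t) = max(0, rate * (t - delay)) = cum_work for t:
    the time by which `cum_work` units of CPU must have been served."""
    return delay + cum_work / rate

class ServiceCurveScheduler:
    """Pick the runnable thread whose service-curve deadline for its
    next quantum is earliest."""
    def __init__(self):
        self.threads = {}   # name -> [delay, rate, work_done]

    def add(self, name, delay, rate):
        self.threads[name] = [delay, rate, 0.0]

    def pick(self, quantum=1.0):
        name = min(self.threads,
                   key=lambda n: service_deadline(self.threads[n][2] + quantum,
                                                  *self.threads[n][:2]))
        self.threads[name][2] += quantum   # account the served quantum
        return name
```

With two threads of equal rate, the one with the smaller delay term is served first and for longer; delay performance is controlled independently of rate, which a single rate parameter cannot express.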
Providing interactive video on hand-held, mobile devices is extremely difficult. These devices are subject to processor, memory, and power constraints, and communicate over wireless links of rapidly varying quality. Furthermore, the size of encoded video is difficult to predict, complicating the encoding task. We present Fugue, a system that copes with these challenges through a division along time scales of adaptation. Fugue is structured as three separate controllers: transmission, video, and preference. This decomposition provides adaptation along different time scales: per-packet, per-frame, and per-video. The controllers are provided at modest time and space costs compared to the cost of video encoding. We present simulations confirming the efficacy of our transmission controller, and compare our video controller to several alternatives. We find that, in situations amenable to adaptive compression, our scheme provides video quality equal to or better than the alternatives at a comparable or substantially lower computational cost. We also find that distortion, the metric commonly used to compare mobile video, undervalues the contribution smooth motion makes to perceived video quality.
Until there is greater consensus on proposals for realizing better-than-best-effort services on the Internet, developers of multimedia and distributed virtual environment applications must rely on best-effort media adaptations to ameliorate the effects of network congestion. We present the results of a study on the use of adaptations originally developed for audio and video applications for the data flows generated by the UNC nanoManipulator. The nanoManipulator is a virtual environment interface to a scanned-probe microscope that has been used by scientists as a tool for basic research in the material and biological sciences. We are building a distributed version of the system for operation over the Internet and are investigating media adaptations for realizing application performance requirements. The results of early experiments with audio- and video-centric media adaptations applied to the flows generated by a microscope and a haptic force feedback device are promising. A simple forward error correction scheme provides good recovery from packet loss, and an elastic display-queue management scheme limits the impact of delay-jitter and results in more continuous playout of media samples. These preliminary results provide evidence that a sophisticated virtual environment interface can operate over modest distances over the Internet to control a remote microscope in real-time.
In this work, we propose a virtual media (Vmedia) access protocol for interactive image browsing over the Internet. Vmedia is a multimedia communication protocol for a server-client environment. It enables the access and delivery of segments of media, and manages the delivered media segments in a local Vmedia cache. An interactive JPEG 2000 image browser has been implemented with Vmedia. The image is compressed with JPEG 2000 into a single bitstream and placed on the server. In the interactive browsing process, the user specifies a region of interest (ROI) with certain spatial and resolution constraints. The Vmedia browser downloads only the media segments covering the current ROI, and the download is performed progressively so that a coarse view of the ROI can be rendered very quickly and then gradually refined as more bits arrive. When the ROI changes, e.g., by zooming in/out or panning around, the Vmedia browser uses existing media segments in the cache to quickly render a coarse view of the new ROI and, at the same time, requests a new set of media segments corresponding to the updated ROI. Vmedia greatly improves the browsing experience for large images over slow networks.
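The mapping from an ROI to the media segments that must be fetched can be pictured as a simple grid intersection (a sketch under our own assumption that segments correspond to fixed-size tiles; the actual Vmedia segmentation is not specified here):

```python
def tiles_for_roi(roi, tile, image):
    """Return the (row, col) indices of tiles intersecting an ROI.
    roi = (x0, y0, x1, y1) in pixels (x1, y1 exclusive);
    tile = (tile_w, tile_h); image = (img_w, img_h)."""
    x0, y0, x1, y1 = roi
    tw, th = tile
    w, h = image
    # clip the ROI to the image bounds
    x0, y0 = max(0, x0), max(0, y0)
    x1, y1 = min(w, x1), min(h, y1)
    return {(r, c)
            for r in range(y0 // th, (y1 - 1) // th + 1)
            for c in range(x0 // tw, (x1 - 1) // tw + 1)}
```

On an ROI switch, the browser would render immediately from whichever of these tiles are already cached and request only the missing ones, which is what makes zooming and panning feel responsive over a slow link.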
Live Internet streaming media programs, called webcasts, can adopt techniques developed by television to obtain higher quality. We have developed a general webcast production model composed of three stages (i.e., sources, broadcast, and transmission) and a tool, called the Director's Console (dc), to control live webcasts. The tool is one component of a distributed service architecture, which adapts to varying physical infrastructure and broadcast configurations.
Constructs to assist in the design of multimedia documents are different in nature from constructs for document descriptions to be processed by multimedia systems. In this article we introduce the media construction pattern (MCP), a new formalism supporting abstraction and reuse at the artifact design level. MCP relies on a temporal model similar to that of SMIL Boston, but targets design issues. The main characteristics of MCP are: (1) its ability to support an arbitrary level of abstraction; (2) its ability to describe generic behavior, using the notions of roles, players, and actors; (3) its use of evolution laws to provide an integrated specification of spatial, temporal, and individual behavior; (4) its specification of computable compositions, which may depend on run-time events; and (5) an author-friendly graphical representation. We present a set of criteria for the expressive power of constructs supporting multimedia design, and we discuss how MCP satisfies these criteria.
Transcoding is a technique employed by network proxies to dynamically customize multimedia objects for prevailing network conditions and individual client characteristics. Transcoding can be performed along a number of different axes, and the specific technique used depends on the type of multimedia object. Our goal in this paper is to understand the nature of typical Internet images and their transcoding characteristics. We focus our attention on transcodings intended to customize an image for file-size savings. Our results allow the developers of a transcoding proxy server to choose the appropriate transcoding techniques for the important classes of Internet images. We analyze the characteristics of images available on the Web through a representative trace. We show that most GIF images accessed on the Internet are small; about 80% of GIF images are smaller than 6 KB. JPEG images are larger than GIF images; about 40% of JPEG images are larger than 6 KB. We also establish the characteristics of popular image transcoding operations. We show that for JPEG images, lowering the JPEG compression metric and reducing the spatial geometry are both productive transcoding operations (saving at least 50% of the file size for 50% of the images). Our systematic study of image characteristics leads to some surprising results. For example, a naive spatial geometry reduction of GIF images by a factor of 2 along each axis actually increases the file size relative to the original for 40% of the images. It is thus important to understand the characteristics of an individual image before choosing the proper transcoding operation.
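The measurements above suggest a simple selection policy for a transcoding proxy. The sketch below encodes them as illustrative rules (the function and operation names are our placeholders; the 6 KB threshold comes from the trace statistics quoted above):

```python
def choose_transcoding(fmt, size_bytes):
    """Pick a transcoding operation for an inline Web image, following the
    trace findings (illustrative policy, not the paper's implementation)."""
    if fmt == "JPEG":
        # Quality and spatial-geometry reduction were both productive
        # for JPEGs: at least 50% savings for half of the images studied.
        return "reduce_quality" if size_bytes > 6_000 else "pass_through"
    if fmt == "GIF":
        # Most GIFs are already under 6 KB, and naive 2x spatial
        # reduction *grew* 40% of them; leave small GIFs alone.
        return "requantize_palette" if size_bytes > 6_000 else "pass_through"
    return "pass_through"
```

The point of the policy is the asymmetry: a size threshold that makes JPEG transcoding worthwhile would still be a losing proposition for many GIFs, so the proxy must branch on format before choosing an operation.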
This paper describes the development of a prototype Web Image Search Engine (WISE), which allows users to search for images on the WWW by image example, in a fashion similar to current search engines that let users find related Web pages by text matching on keywords. The system takes an image specified by the user and finds similar images available on the WWW by comparing image contents using low-level image features. The current version of the WISE system consists of a graphical user interface (GUI), an autonomous Web agent, an image comparison program, and a query processing program. The user specifies the URL of a target image and the URL of the starting Web page from which the program will 'crawl' the Web, finding images along the way and retrieving those satisfying certain constraints. The program then computes the visual features of the retrieved images and performs content-based comparison with the target image. The results of the comparison are sorted according to a similarity measure and, together with thumbnails and information associated with the images (such as URL and image size), written to an HTML page. The resultant page is stored on a Web server and displayed in the user's Web browser once the search process is complete. A unique feature of the current version of WISE is its image content comparison algorithm. It is based on the comparison of image palettes and is therefore very efficient at retrieving images in one of the two universally accepted formats on the Web, GIF. In GIF images, the color palette is contained in the header, so only the header information needs to be retrieved rather than the whole image, making the comparison very efficient.
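Because the GIF global color table sits immediately after the 13-byte header, a palette-based comparison needs only the first few hundred bytes of each file. The sketch below shows one way this could work (the distance measure is our own illustrative choice, not WISE's actual algorithm):

```python
import struct

def gif_palette(data):
    """Extract the global color palette from the leading bytes of a GIF;
    only the header region needs to be fetched, not the whole image."""
    if data[:3] != b"GIF":
        raise ValueError("not a GIF")
    packed = data[10]                  # packed fields of the screen descriptor
    if not packed & 0x80:              # no global color table present
        return []
    n = 2 ** ((packed & 0x07) + 1)     # number of palette entries
    table = data[13:13 + 3 * n]        # palette starts at byte 13, 3 bytes/entry
    return [tuple(table[i:i + 3]) for i in range(0, len(table), 3)]

def palette_distance(p1, p2):
    """Crude palette similarity: mean squared distance from each color in p1
    to its nearest color in p2 (illustrative, not WISE's measure)."""
    def d(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return sum(min(d(c, e) for e in p2) for c in p1) / len(p1)
```

A crawler can thus issue small ranged fetches, score each candidate against the target's palette, and sort by the resulting distance without ever downloading full images.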
Supporting collaborative multimedia content management activities, such as image and video acquisition, exploration, and access dialogues between naive users and multimedia information systems, is a non-trivial task. Although a wide variety of experimental and prototypical multimedia storage technologies, as well as corresponding indexing and retrieval engines, are available, most of them lack appropriate support for collaborative, end-user-oriented interface front ends. The development of advanced, user-adaptable interfaces is necessary for building collaborative multimedia information-space presentations based upon advanced tools for information browsing, searching, filtering, and brokering, applied to potentially very large and highly dynamic multimedia collections with large numbers of users and user groups. Developing advanced, adaptable, and collaborative graphical information presentation schemes that make it easy to apply adequate visual metaphors for defined target user stereotypes must therefore become a key focus of ongoing research that aims to support collaborative information work with multimedia collections.
Scalable network servers are increasingly identified as a critical component in the exponential growth of the Internet. We focus on media servers for variable bit rate streams and study the scalability of alternative disk striping policies, both previously known and new. In contrast to the results of previous studies, we show that the highest sustained number of streams that can be supported increases almost linearly with the number of disks. We believe that the performance evaluation method we use and a new disk space allocation technique we introduce play an important role in this conclusion. Moreover, under reasonable technological projections, our arguments remain valid into the foreseeable future.
In traditional near-VOD (NVOD), the number of streams required is still high if the user delay goal is low (say, 2 minutes). In this paper, we study the use of client buffering to reduce the bandwidth requirement under a given maximum user delay goal. Driven by the observation that broadband networks to the home have become a reality, we propose a broadcasting scheme termed 'stream-bundling,' which proves to be effective for popular movies. The scheme groups (i.e., 'bundles') the server streams into channels of incrementally increasing bandwidth. Such high-speed bundled channels are used to deliver the beginning segment of a video to the client so that it can quickly merge with an on-going multicast stream. Compared with previously proposed broadcasting schemes such as Pyramid Broadcasting, Skyscraper Broadcasting, and Harmonic Broadcasting, stream-bundling is shown to achieve a similar level of performance with much lower complexity (there are not many channels to manage and hop between). We then derive the minimum request rate for a movie above which stream-bundling should be used instead of the one-stream-per-request approach. Considering a complete video system, we show that its bandwidth requirement can be reduced significantly compared with traditional NVOD (by more than 50%), at the cost of only a little client buffering (at most 20% of the movie length).
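The catch-up arithmetic behind such merging can be sketched with a deliberately simplified model (ours, not the paper's exact scheme or notation): a client arriving between multicasts must fetch the missed prefix over a fast bundled channel before its playback point reaches the multicast's position.

```python
import math

def nvod_streams(movie_len, max_delay):
    """Traditional NVOD: a new full-length stream starts every max_delay
    seconds, so the server concurrently carries this many streams."""
    return math.ceil(movie_len / max_delay)

def bundled_streams(movie_len, max_delay, multicast_gap, speedup):
    """Illustrative stream-bundling cost: full multicasts every
    `multicast_gap` seconds, plus one bundled catch-up channel running at
    `speedup` times the playback rate (counted as `speedup` stream
    equivalents).  A client arriving just after a multicast must fetch up
    to `multicast_gap` seconds of prefix; at `speedup`x that takes
    multicast_gap / speedup seconds, which must fit the delay goal."""
    if multicast_gap / speedup > max_delay:
        raise ValueError("catch-up channel too slow for the delay goal")
    return math.ceil(movie_len / multicast_gap) + speedup
```

For a two-hour movie and a 2-minute delay goal, NVOD needs 60 streams; with 20-minute multicast gaps and a 10x bundled channel, the model needs 16 stream equivalents, and the client buffers at most 20 minutes, about 17% of the movie, consistent in spirit with the savings and buffering figures quoted above.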
A recently proposed streaming media file caching algorithm, called Resource Based Caching (RBC), considers the impact of both file size and required delivery bandwidth in making cache insertion and replacement decisions. Previous comparisons between RBC and the least-frequently-used (LFU) policy conclude that RBC provides a better byte hit ratio in the cache. This paper revisits this policy comparison over a much broader region of the system design space than previously considered. The results provide more complete insight into the behavior of RBC and support new conclusions about the relative performance of RBC and LFU. A new policy, Pooled RBC, is proposed. Pooled RBC includes three improvements to the original RBC policy and has significantly better performance than RBC. Finally, a new hybrid LFU/interval caching strategy is proposed. The new hybrid policy is significantly simpler to implement than RBC and performs as well as or better than both Pooled RBC and LFU.
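The core of a resource-based policy is that a cached stream consumes both cache space and cache-device bandwidth, so candidates should be ranked by payoff per unit of whichever resource binds. The greedy sketch below illustrates that idea (our simplification, not the published RBC formula):

```python
def select_cache_set(files, space, bandwidth):
    """Greedy sketch of a two-resource caching decision in the spirit of
    RBC.  `files` is a list of (name, access_freq, size, delivery_bw);
    rank by expected byte hits per unit of the binding resource, then
    cache greedily until space or cache-device bandwidth runs out."""
    S, B = float(space), float(bandwidth)   # totals, fixed for ranking

    def goodness(f):
        _, freq, size, bw = f
        # byte hits (freq * size) per largest fractional resource use
        return freq * size / max(size / S, bw / B)

    chosen = []
    for name, _, size, bw in sorted(files, key=goodness, reverse=True):
        if size <= space and bw <= bandwidth:
            chosen.append(name)
            space -= size
            bandwidth -= bw
    return chosen
```

A pure LFU ranking would ignore the bandwidth column entirely, which is exactly the regime where the comparisons above find the two policies diverge: a frequently accessed but bandwidth-hungry stream can crowd out several cheaper ones.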