Hyperlinked video is video in which specific objects are made selectable by some form of user interface, and the user's interactions with these objects modify the presentation of the video. Identifying and tracking the objects remains one of the chief difficulties in authoring hyperlinked video; we solve this problem through the use of a video tracking and segmentation algorithm that uses color, texture, motion, and position parameters. An author uses a computer mouse to scribble roughly on each desired object in a frame of video, and the system generates a segmentation mask for that frame and following frames. We have applied this technique in the production of a soap opera program, with the result that the user can inquire about purchasing clothing and furnishings used in the show. We will discuss this and other uses of the technology, describe our experiences in using the segmentation algorithm for hyperlinked video purposes, and present several different user-interface methods appropriate for hyperlinked video.
In this paper we describe elements of a ubiquitous vision system, a system that enables viewers to observe a remote dynamic scene from any viewing perspective at any time instant. This integrated framework has the potential to generate many 3D virtual views in real time, concurrently with highly dynamic events in the scene, without using a 3D model. The paper presents a brief overview of the system, including its requirements, components, and preliminary results of 3D range estimation and virtual view generation.
This work describes a new method for locating moving lips in digital video sequences and applications for this technology in multimedia information retrieval and robotics. The capture on digital video of groups of speakers is very common. Determining which person in such a group is speaking is useful in many applications, including robot navigation, speaker identification and video segmentation for the purpose of retrieval in recorded proceedings.
To assist people who have a tendency to wander away, a human tracking system using wireless multimedia technology was studied. The system consists of a compact mobile unit, which can be carried in one's pocket, and a base unit at home, connected through a public mobile network. Positioning is performed using GPS. In this paper, the structure of the tracking system and its operating principle are described first. Fundamental experiments, such as position measurement and position data transmission, indicate that the system can be adapted for use as a human tracking system. Several issues, such as compactness of the mobile unit, improvement of position precision, and high reliability of human tracking, continue to be studied.
This paper presents the future multimedia standard MPEG-4 and discusses its potential use for broadband applications. After an introduction to MPEG-4, scenarios for broadband MPEG-4 sessions are described and some examples are presented. At the same time, the structure of a decoder terminal is explained, and the parts of the standard relevant to key broadband issues are presented in more detail.
In this paper, we present an implementation of an MPEG-4 decoder using Java for dynamic processing, i.e. providing flexibility and extensibility. The advantage of Java is its platform independent paradigm using a virtual machine. This enables us to provide downloading of tools and also dynamic configuration of already downloaded tools. However, the disadvantage of Java is its low performance. Therefore we propose a hybrid implemented approach using Java implementations only for flexibility and extensibility. All the rest of the decoder is implemented in native code, providing the high performance necessary for real time issues. We use Java only where Java is necessary. To integrate Java with the native code implementations we utilize the Java native interface (JNI). We use the JNI to create an instance of the Java virtual machine (JVM) in the running MPEG-4 application. This JVM instance handles all Java decoder tool implementations as well as incoming Java bit streams. All the other data streams are handled by the native implemented part.
MPEG-4 is meant to be the integrating standard for multimedia in the future. This paper highlights experiences from participation in the EU project EMPHASIS, in which the authors, working within a research institution, were involved mainly in application definition and system integration. Based on this, a general framework for system design comparison is outlined.
Soon the days of separate transmission facilities for video will be over, and video will be just another traffic type in a multipurpose network. These networks today employ a variety of transport mechanisms, for historical as well as technical reasons, and, at least experimentally, enterprises, network operators, and service providers are sending video over these networks with their associated transport mechanisms. But video is not just a random bit stream, so it is natural to ask how well the transport mechanisms perform when transporting video. In this paper we examine what video needs from capture to playback and how these needs can be met by the end system or the network using the MPEG transport stream, asynchronous transfer mode, and the Internet Protocol. We concentrate on the Internet Protocol because of the opportunity for networked video applications that its pervasiveness provides.
In this paper, we address the issue of error control in transmitting MPEG-2 encoded video streams over broadband fixed wireless access networks for broadcast or multicast services. Because of the error-prone nature of wireless channels, error control is mandatory when MPEG-2 video streams are transported over wireless access networks to the end user. To prevent overloading the reliable wireline networks, error control has to be applied locally. FEC is a must for broadcast or multicast services. Because of the important role of MPEG-2 control information in the decoding process, it must be given priority service in the form of excess error protection in order to achieve the desired QoS. In this paper, a header redundancy FEC (HRFEC) strategy is introduced and an implementation of it (the type-I HRFEC scheme) is described. The overhead and delay jitter associated with type-I HRFEC are also estimated. Simulation results on the performance of type-I HRFEC indicate that it improves the reception statistics of MPEG-2 control information. As a direct result, the quality of the reconstructed video sequence, measured in terms of objective grade point and PSNR, is improved.
The transmission of video over variable bit rate channels is a challenging problem. The requirements for an acceptable video quality are typically negotiated between the application layer and the network. The encoder of a variable bit rate video application can adjust several of its parameters to meet the requirements of the network. The main aim is to decrease loss and delay variation. Network resources have to be properly assigned to meet a required service request. Network buffers need to be controlled carefully, since buffers are responsible for delay and loss distribution. On the other hand, during video transmission, the states of the buffers at the encoder output and decoder input are also very important. Since a buffer overflow or underflow indicates inappropriate resource distribution, it is essential to know when and how buffer overflows and underflows occur. The proposed research is based on the encoder and decoder buffer boundary definitions introduced in Nguyen et al. 97. The bit rate of the video is adjusted so that the encoder and decoder buffers neither overflow nor underflow at any instant during the transmission. This is guaranteed by estimating the state of the decoder buffer at a fixed frame rate, which in turn is derived for the variable channel rates for the duration of transmission. Both the encoder and decoder buffer fullness information is used as feedback to the encoder. The encoder, upon receiving this state information, adjusts the bit rate for the subsequent frames. This research uses the analytically proven boundaries provided for the variable bit rate encoder and decoder, and the behavior is simulated using several video sequences. Simulation results provide a better understanding of the boundary definitions and their practical applications.
In addition to understanding the dynamic behavior in terms of the bit rates, this investigation also addresses the impact of the changes in the frame rate and other scalability factors controlled by the encoder. The results show that the boundary definitions, which in turn are used for rate control of the encoder and decoder, improve the quality of the video received.
An enhancement of the Piecewise Constant Rate Transmission and Transport (PCRTT) algorithm for reducing the burstiness of a video stream based on smoothing constant intervals is proposed. The two algorithms are compared by testing 12 video streams compressed in the Motion-JPEG format. The new algorithm, called e-PCRTT, is shown to construct transmission rate plans with smaller buffer sizes compared to the original PCRTT. Alternatively, for the same buffer size, e-PCRTT reduces the number of bandwidth changes compared to PCRTT. In addition, e-PCRTT produces a rate plan with smaller initial playback delay, which applications based on. We also introduce a new scheme for multiplexing several smoothed video traces that are synchronized and have fixed-size intervals into a single constant-bit-rate channel.
We are concerned with the transmission of delay-sensitive Variable Bit Rate (VBR) MPEG-2 video sources over Asynchronous Transfer Mode (ATM) networks. VBR sources allow the use of statistical multiplexing in the network to improve bandwidth utilization. To allow efficient multiplexing in the network, video sources must be well characterized in order to provide a precise estimation of the network resources needed by the source to achieve quality of service (QoS) requirements. The VBR-rt transfer capability defined by the ATM Forum and the ITU-T uses two pairs of leaky-bucket parameters as traffic descriptors for the VBR traffic: (the Peak Cell Rate, the CDVT) and (the Sustainable Cell Rate, the Intrinsic Burst Tolerance). In previous works, we have shown that these traffic descriptors are too conservative and could allow only very poor network utilization. We have also shown that efficient multiplexing of VBR video sources could be achieved if the source rate distribution is modeled by a Gamma distribution characterized by the source mean rate and variance. We present in this paper an algorithm that allows the translation of the leaky-bucket parameters into the mean rate and variance needed for efficient video multiplexing. Therefore, for a source controlling its rate to be compliant with the leaky-bucket parameters, the corresponding maximum rate variance could be estimated, thus allowing more efficient video multiplexing while maintaining QoS requirements.
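For reference, conformance to a leaky-bucket descriptor pair such as (Peak Cell Rate, CDVT) is commonly checked with the generic cell rate algorithm (GCRA). The following is a minimal sketch of its virtual-scheduling form. It is not the translation algorithm of the paper, and the function and parameter names are illustrative assumptions.

```python
def gcra(arrivals, increment, limit):
    """Virtual-scheduling GCRA(I, L).

    arrivals  -- cell arrival times, in non-decreasing order
    increment -- I, the nominal inter-cell time (for example 1 / Peak Cell Rate)
    limit     -- L, the tolerance (for example the CDVT)
    Returns one boolean per cell: True if the cell conforms.
    """
    tat = None  # theoretical arrival time of the next cell
    flags = []
    for t in arrivals:
        if tat is None:
            tat = t  # the first cell defines the starting point
        if t < tat - limit:
            # Cell arrived too early: non-conforming, TAT is left unchanged.
            flags.append(False)
        else:
            # Conforming cell: schedule the next theoretical arrival time.
            tat = max(t, tat) + increment
            flags.append(True)
    return flags


if __name__ == "__main__":
    # Hypothetical arrival times; the second cell is early by half a slot.
    print(gcra([0.0, 0.5, 1.0], increment=1.0, limit=0.0))
    print(gcra([0.0, 0.5, 1.0], increment=1.0, limit=0.5))
```

Running the same check a second time with the Sustainable Cell Rate and its burst tolerance would cover the second descriptor pair.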
This work addresses the optimization of TV-resolution MPEG-2 video streams to be transmitted over lossy packet networks. This paper introduces a new scene-complexity adaptive mechanism, namely the adaptive MPEG-2 Information Structuring (AMIS) mechanism. AMIS adaptively modulates the number of resynchronization points in order to maximize the perceived video quality assuming it is aware of the packet loss probability and the error concealment technique implemented in the decoder. The perceived video quality depends both on the encoding quality and the degradation due to data loss. Therefore, AMIS constantly determines the best compromise between the rate allocated to pure video information and the rate aiming at reducing the sensitivity to packet loss. Results show that the proposed algorithm behaves much better than the traditional MPEG-2 encoding scheme in terms of perceived video quality under the same traffic constraints.
In this paper, we describe modifications to the rate multicast audio tool to allow audio streams to be hierarchically encoded. This allows the rate of audio data transmission to be selected by the receiver, which is particularly useful in multicast sessions where many users participate with diverse connectivity. The hierarchical encoder also allows dynamic adjustment of the received data rate to network conditions. The algorithm is based upon a simple pulse code modulation scheme. Speech is sampled at 8,000 samples per second with 16 bits per sample. The bits are then divided into groups based on their overall relative importance to the sample. A base group is defined which provides a reasonable approximation to the sample, but which discards most details. Each successive group adds more detail. The groups, which together represent different resolutions, are sent as separate data streams. The receiver subscribes to a set of data streams such that the total data rate can feasibly be delivered by the network. Bandwidth usage and signal resolution are easily adapted by adding or dropping groups. Performance was evaluated based on continuity and intelligibility of the speech segments as well as bandwidth usage and loss.
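The bit-grouping idea can be illustrated with a small sketch. The layer widths below are an assumption chosen for illustration (the abstract does not state the actual group sizes), and the helper names are hypothetical. Each 16-bit sample is split into groups from most to least significant bits, and a receiver rebuilds an approximation from whichever leading groups it subscribes to.

```python
SAMPLE_RATE = 8000          # samples per second, as stated in the abstract
LAYER_BITS = (6, 4, 3, 3)   # assumed group widths, most significant first (sum = 16)


def split_sample(sample, layer_bits=LAYER_BITS):
    """Split a signed 16-bit PCM sample into bit groups, most significant first."""
    u = sample & 0xFFFF  # two's-complement bit pattern as an unsigned value
    groups = []
    shift = 16
    for width in layer_bits:
        shift -= width
        groups.append((u >> shift) & ((1 << width) - 1))
    return groups


def merge_layers(groups, layer_bits=LAYER_BITS):
    """Rebuild a sample from the leading groups; bits of missing groups are zero."""
    u = 0
    shift = 16
    for value, width in zip(groups, layer_bits):
        shift -= width
        u |= value << shift
    return u - 0x10000 if u & 0x8000 else u  # back to a signed value


def layer_rates(layer_bits=LAYER_BITS, sample_rate=SAMPLE_RATE):
    """Cumulative bit rate (bit/s) when subscribing to the first 1, 2, ... groups."""
    rates, total = [], 0
    for width in layer_bits:
        total += width * sample_rate
        rates.append(total)
    return rates


if __name__ == "__main__":
    groups = split_sample(-12345)
    for n in range(1, len(groups) + 1):
        print(n, "group(s):", merge_layers(groups[:n]))
    print("cumulative rates:", layer_rates())
```

Dropping a group only zeroes low-order bits, so the reconstruction error is bounded by the weight of the first missing bit group.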
Satellite-communication systems are expected to play a significant role in meeting global telecommunication demands in the 21st century. Recent convergence of computing and communications has opened new markets for services and applications such as web surfing, video conferencing, Intranet, and High Speed Internet Access. Hybrid satellite and terrestrial solutions will provide interconnectivity with distant or isolated nodes of the terrestrial network. A number of new satellite-communication systems have been proposed in geosynchronous earth orbit (GEO) and low earth orbit (LEO) configurations, operating at Ka-band and higher frequencies. Most of these systems are planned to use on-board processing and ATM or 'ATM-like' switching. Although ATM technology has been designed to provide a transparent service over terrestrial networks, satellite ATM networks will provide global connectivity and statistical multiplexing gain while maintaining quality of service (QoS) requirements. To make future satellite systems for multimedia services feasible, several technical and regulatory/standards challenges have to be solved. This paper briefly reviews the planned GEO and LEO systems, addresses multimedia satellite network architectural issues, and discusses technical challenges such as traffic management, QoS, and performance requirements, along with possible solutions. The current status of the standards activities for satellite ATM networking is also provided.
Within a few years, the number of live audio and video programs available on the Internet will increase dramatically. As the Internet grows as an entertainment medium, content providers are eager to access ratings information to know better what people are interested in and to be able to personalize their programs. However, whereas it is easy to know how many people have connected to a given Web site, gathering ratings about what multicast viewers are watching does not scale, because there could potentially be millions of viewers connected to a multicast group. This paper presents a novel scheme at the application level to gather feedback from multicast receivers in a scalable way. Our model, based on the random sampling of receivers, scales well for very large groups since we only need partial feedback to estimate the ratings.
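Why partial feedback suffices can be shown with a small simulation. This is a sketch under stated assumptions, not the scheme of the paper: each receiver is assumed to reply independently with a known probability p, and the function names and program labels are hypothetical.

```python
import random


def poll(receivers, p, rng):
    """Each receiver independently replies with probability p.

    receivers -- iterable of the program each receiver is currently watching
    Returns the list of replies that reach the collector.
    """
    return [program for program in receivers if rng.random() < p]


def estimate(replies, p):
    """Estimate the audience size and per-program shares from sampled replies."""
    audience = len(replies) / p  # scale the reply count by the sampling rate
    counts = {}
    for program in replies:
        counts[program] = counts.get(program, 0) + 1
    shares = {prog: n / len(replies) for prog, n in counts.items()}
    return audience, shares


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical audience: 60,000 receivers on program "A", 40,000 on "B".
    receivers = ["A"] * 60000 + ["B"] * 40000
    replies = poll(receivers, p=0.01, rng=rng)
    audience, shares = estimate(replies, p=0.01)
    print(len(replies), "replies; estimated audience:", round(audience))
    print("estimated shares:", shares)
```

With p = 0.01 the collector handles roughly one reply per hundred receivers, and the estimate's precision depends on the number of replies rather than on the size of the group.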
Video communication is one of the most resource-demanding services any network has to support, due to its high bandwidth requirement and its bursty, real-time nature. This paper investigates a new multiplexing scheme that improves the network efficiency by reducing the burstiness or variability of the combined MPEG video streams. Six MPEG sources were individually processed and packetized; the packets then joined a common queue and were served by an ATM server controlled by a leaky bucket. The simulations were run based on both real MPEG-1 video sequences and sequences modeled by the TES technique. The TES model adopted was a composite of a number of individual processes in order to capture the characteristics of an MPEG video source. The advantages and limitations of the network efficiency significantly while maintaining the acceptable quality of service. The proposed TES model is a versatile method to simulate random video sources with various characteristics. The result can be used as a basis for the design and management of VOD-like applications.
The combination of ATM and ADSL is fast becoming an attractive alternative for Internet access for homes and small businesses. ADSL modems allow the use of the existing copper plant at speeds much higher than those afforded by traditional modem technologies. The use of ATM both enables the long-sought goal of an end-to-end ATM network and allows, through the use of QoS guarantees, efficient use of the limited upstream bandwidth of ADSL. Although the client-server model, which typifies classical Internet traffic and newer multimedia IP services, fits an asymmetric network model well, performance can be greatly impacted unless the interactions between ADSL, ATM, and Internet protocols are well understood and taken into account in the design of ATM interfaces. In this paper we investigate the potential limitations on performance in IP/ATM/ADSL networks and explain how, in our ATM interface designs, we have ameliorated these problems and optimized the use of IP services over such networks. We discuss the importance of 'traffic shaping', heretofore afforded little importance for IP traffic, and the impact of the latency and asymmetric bandwidth of ADSL on both traditional and multimedia IP services in our implementations.
Congestion control is important for multimedia transport in ATM networks. A congestion situation can be indicated by single-bit congestion information, called the congestion bit, carried in each cell header. Several congestion bit setting schemes have been used for congestion control in recent years. Although the existing schemes work well, they cannot provide the accurate congestion information needed by some congestion control algorithms. In this paper, we propose a new scheme that sets the single-bit congestion information using a statistical approach so that it can convey more accurate information. The existing statistical schemes cannot provide correct congestion information when congestion occurs at more than one switch simultaneously. In contrast, the new scheme can provide correct information about the congestion situation at the bottleneck, where the congestion is heaviest on the way from the traffic source to its destination. Theoretical analyses and simulation results show that the new approach is efficient.
The demand for Internet bandwidth has grown rapidly in the past few years. A new generation of broadband satellite constellations promises to provide high speed Internet connectivity to areas not served by optical fiber, cable, or other high speed terrestrial connections. However, using satellite links to supply high bandwidth has been difficult due to the inefficient performance of the Internet's TCP/IP protocols over heterogeneous network environments, especially networks containing satellite links. The end-to-end connection is split into segments, and the protocol on the satellite segment is optimized for the satellite link characteristics. TCP congestion control mechanisms are maintained on each segment, with some coupling between the segments to produce the effect of end-to-end TCP flow control. We have implemented this design and present results showing that using such gateways can improve throughput for individual connections by a large factor over paths containing a satellite link.
A video-on-demand (VOD) system provides a service which enables users of the system to request, in real time, the transmission of a video stream from a collection of available video material. Because different layers of a VOD system (resource management in the session and connection layers, and service control in the application layer) involve different interaction issues, a good design for controlling interactivity is needed. This paper studies a conceptual two-level gateway model that handles the system's interactions so as to build an easily extended VOD system. We present our work on implementing an Ethernet-based VOD system that is easy to extend to an interactive multimedia system built from multiple service providers supplying different kinds of services.
We address a new error-resilient scheme for the as-yet-defined H.263++ standard of Question 15/Study Group 16 in ITU-T. The key idea is the use of essential administration at the end of each data-partitioned packet in order to provide error localization with increased credibility. Furthermore, since the additional CODs consist of fixed length codes, the effect of inaccurate rate control can be minimized. Computer simulations show that the proposed auxiliary information at the end of each packet improves picture quality, especially in low error-prone channels. The proposed method is a part of layered data partitioning to be presented in ITU-T, and the rudimentary results are introduced in this paper.
The delay experienced by each packet in packet-based communication networks is variable due to the dynamic nature of queuing. Thus, it is unlikely that the time difference between two packets at the source will be the same at the receiver. The presence of these delay variations, known as jitter, can have an impact on the audio-visual quality as perceived by the human user. In this paper a new synchronization algorithm for Adaptive Jitter Control (AJC) is introduced. It provides an integrated solution for single and multiple stream synchronization problems. AJC builds its own recent history of network delay and adapts to network changes, which allows it to work with any underlying delay distribution. It provides optimal delay and buffering for a given Quality of Service (QoS) requirement. The algorithm's performance can be fine-tuned by several parameters, which can be supplied by the application involved, and it does not need a global clock or synchronized clocks for its operation. AJC has shown significant performance enhancement over other approaches, especially for bursty traffic, as it was able to track changes in the network much more quickly.
We present a pure Java-based streaming MPEG-1 video player. By implementing the player entirely in Java, we guarantee its functionality across platforms within any Java-enabled web browser, without the need for native libraries. This allows greater use of MPEG video sequences, because users will no longer need to pre-install any software to display video, beyond Java compatibility. This player features a novel forward-mapping IDCT algorithm that allows it to play locally stored, CIF-sized video sequences at 11 frames per second when run on a personal computer with a Java 'just-in-time' compiler. The IDCT algorithm can run with greater speed when the sequence is viewed at reduced size; e.g., performing approximately 1/4 the amount of computation when the user resizes the sequence to 1/2 its original width and height. We are able to play video streams stored anywhere on the Internet with acceptable performance using a proxy server, eliminating the need for large-capacity auxiliary storage. Thus, the player is well suited to small devices, such as digital TV set-top decoders, requiring little more memory than is required for three video frames. Because of our modular design, it is possible to assemble multiple video streams from remote sources and present them simultaneously to the viewers, subject to network and local performance limitations. The same modular system can further provide viewers with their own customized view of each session; e.g., moving and resizing the video display window dynamically, and selecting their preferred set of video controls.
Digital video decoding, enabled by the MPEG-2 Video standard, is an important future application for embedded systems, particularly PDAs and other information appliances. Many such systems require portability and wireless communication capabilities, and thus face severe limitations in size and power consumption. This places a premium on integration and efficiency, and favors software solutions for video functionality over specialized hardware. The processors in most embedded systems currently lack the computational power needed to perform video decoding, but a related and equally important problem is the required data bandwidth, and the need to cost-effectively ensure adequate data supply. MPEG data sets are very large, and generate significant amounts of excess memory traffic for standard data caches, up to 100 times the amount required for decoding. Meanwhile, cost and power limitations restrict cache sizes in embedded systems. Some systems, including many media processors, eliminate caches in favor of memories under direct, painstaking software control in the manner of digital signal processors. Yet MPEG data has locality which caches can exploit if properly optimized, providing fast, flexible, and automatic data supply. We propose a set of enhancements which target the specific needs of the heterogeneous data types within the MPEG decoder working set. These optimizations significantly improve the efficiency of small caches, reducing cache-memory traffic by almost 70 percent, and can make an enhanced 4 KB cache perform better than a standard 1 MB cache. This performance improvement can enable high-resolution, full frame rate video playback in cheaper, smaller systems than would otherwise be possible.
We present several compressed-domain methods for reverse-play transcoding of MPEG video streams. A reverse-play transcoder takes any original MPEG IPB bitstream as input and creates an output MPEG IPB bitstream which, when decoded by a generic MPEG decoder, displays the original video frames in reverse order. A baseline spatial-domain method requires decoding the MPEG bitstream, storing and reordering the decoded video frames, and re-encoding the reordered video. The proposed compressed-domain transcoding methods achieve an order of magnitude reduction in computational complexity over the baseline spatial-domain approach. Much of the savings is achieved by using the forward motion vector fields available in the forward-play MPEG bitstream to efficiently generate the reverse motion vector fields used in the reverse-play MPEG bitstream. Furthermore, the storage requirements of the compressed-domain methods are reduced, and the resulting image quality is within 0.6 dB of the baseline spatial-domain approach for a difficult, highly detailed computer-generated video sequence. For more typical video sequences, the resulting image quality is even closer to the baseline spatial-domain approach.
In this paper, a new rate control algorithm is introduced to meet the need for fast rate control in transcoding. The specific transcoding issue addressed in this paper is bit-rate conversion. For video services in heterogeneous network environments, it is necessary to convert the bit rate of compressed video to match it to given transmission channels of lower capacity. The transcoding of coded video streams by requantization in the DCT domain is considered a promising technique because of its low complexity and acceptable picture quality. We carried out experiments on various video sequences to validate the existence of a novel R-Q model in the requantization process and found a piecewise linearly decreasing model. Based on this model, we present an efficient rate control algorithm for transcoding that has much lower complexity than the conventional one. Simulation results show that the proposed algorithm provides a significant advantage over the conventional method in both picture quality and bit-rate deviation.
Real-time inverse-telecine detection algorithms are important video pre-processing components in video-compression systems. Conversion of film source material to NTSC video is typically performed using a frame rate conversion algorithm called 3-2 pulldown or telecine. The telecine algorithm generates duplicate video fields to convert from film's 24 frames per second to NTSC's 29.97 fps. This redundancy is exploited in video-compression algorithms such as MPEG-2. Instead of encoding the repeated field, the compression algorithm sets a flag indicating a repeated field, minimizing the redundant information that is encoded. Using the inverse-telecine algorithm to encode film-source video preserves information integrity with a ten-percent bit-rate reduction. Detection of the telecine 3-2 pulldown pattern is achieved using field differencing, where repeated fields are detected as anomalies in the frame difference signal. Knowledge of the pulldown pattern and local statistics enhances the detection of repeated fields. Since the pulldown pattern repeats on an interval of five frames within a given field, only one repeated field will lie within a five-frame window. The repeated field is detected by selecting the field with the smallest SAD value within the window. A correlation circuit ensures that the number and pattern of repeated frames in the video source correspond to an ideal telecine source. Comprehensive results using movie trailers and typical video sequences show excellent performance for the 3-2 pulldown detection algorithm presented in this paper. One-pass detection of the 3-2 pulldown pattern and a low-delay, low-complexity implementation are useful in commercial encoders tasked with compression of entertainment material originating on film.
As digital video becomes more prevalent, video signal processors (VSPs) emerge as a solution to accelerate computationally intensive multimedia applications. While dedicated VSPs are available for MPEG CODECs, the need for greater functionality and time-to-market pressures will push the video industry towards programmable VSPs. Combining a high degree of parallelism with the efficiency of statically scheduled instructions, the very long instruction word micro-architecture is becoming very popular and widely adopted. In this paper, we provide an overview of problems involved in VSP design, including the hierarchical architectural paradigm, register and memory sizing, streaming computation, and so on. Analyses and solutions are also presented in addition to problem formulation. Understanding these problems well would help us realize architectural tradeoffs and pinpoint bottlenecks in different stages of design sophistication.
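The idea of adapting a playout delay from a recent delay history can be sketched as follows. This is not the AJC algorithm itself, whose details the abstract does not give. It is a minimal illustration that assumes a sliding window of observed delays and a target fraction of late packets, and the class and parameter names are hypothetical. Because only the ordering of delays within the window matters, a constant clock offset between sender and receiver shifts the result without changing which packets are late.

```python
from collections import deque


class PlayoutEstimator:
    """Pick a playout delay from a sliding window of recently observed delays."""

    def __init__(self, window=100, late_fraction=0.05):
        self.history = deque(maxlen=window)  # oldest delays are dropped automatically
        self.late_fraction = late_fraction   # tolerated share of late packets

    def observe(self, delay):
        """Record the measured delay of a newly received packet."""
        self.history.append(delay)

    def playout_delay(self):
        """Smallest delay that lets all but the tolerated share of the window arrive in time."""
        if not self.history:
            raise ValueError("no delay samples observed yet")
        ordered = sorted(self.history)
        allowed_late = int(self.late_fraction * len(ordered))
        return ordered[len(ordered) - 1 - allowed_late]


if __name__ == "__main__":
    est = PlayoutEstimator(window=10, late_fraction=0.1)
    # Hypothetical delays in milliseconds; the network slows down halfway through.
    for d in [20, 22, 21, 25, 23, 60, 62, 61, 65, 63, 64, 66]:
        est.observe(d)
        print(d, "->", est.playout_delay())
```

A shorter window tracks network changes faster at the cost of a noisier estimate, which is the kind of tradeoff an application-supplied parameter would control.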
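The minimum-SAD selection step can be illustrated with a toy example. This is a sketch under simplifying assumptions, not the detector of the paper: fields are short lists of pixel values, only the top-field sequence is modeled (in 3-2 pulldown, four film frames A B C D yield top fields A A B C D), there is no noise, and the correlation stage is omitted. All function names are hypothetical.

```python
def sad(a, b):
    """Sum of absolute differences between two fields of equal size."""
    return sum(abs(x - y) for x, y in zip(a, b))


def pulldown_top_fields(film_frames):
    """Top-field sequence of a 3-2 pulldown: A B C D becomes A A B C D."""
    fields = []
    for i, frame in enumerate(film_frames):
        fields.append(frame)
        if i % 4 == 0:
            fields.append(frame)  # the repeated field
    return fields


def detect_repeats(fields, window=5):
    """For each complete window, return the index of the field that best matches its predecessor."""
    # sads[j] compares field j + 1 with field j.
    sads = [sad(fields[i], fields[i - 1]) for i in range(1, len(fields))]
    picks = []
    for start in range(0, len(sads) - window + 1, window):
        block = sads[start:start + window]
        picks.append(start + 1 + block.index(min(block)))
    return picks


if __name__ == "__main__":
    # Nine hypothetical film frames of four pixels each, all different.
    film = [[k, k + 1, k + 2, k + 3] for k in range(0, 90, 10)]
    fields = pulldown_top_fields(film)
    print(len(film), "film frames ->", len(fields), "top fields")
    print("repeated fields at indices:", detect_repeats(fields))
```

With real video the repeated field's SAD is small rather than zero, which is why the selection relies on the minimum within the window instead of a fixed threshold.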
Multimedia, as an application, has at its very core the field of signal processing technology. Although, multimedia has leveraged on numerous disciplines, signal processing is the most relevant. Some of the basic concepts, such as spectral analyses, sampling theory and the theory of partial differential equations have become the fundamental building blocks for numerous applications and subsequently have been reinvested in such diverse areas as transform coding, display technology and neural networks. The latter, most recently, leads to a fast implementation of vector quantization. It is evident, that the diverse signal processing algorithms, concepts and applications are interconnected and in numerous instances appear in various reincarnated forms. For example, sub-band coding existed for many years before wavelets became fashionable. In this paper, an attempt will be made to provide a historical overview of signal processing through the present, followed by a highly personal speculation for the future.
Conference applications facilitate communication and cooperation between users at different locations. Crucial for the acceptance of a conference application is the optimal mapping of well known social group behavior to widely spread distributed system within a network. Efficient standardized protocols become mandatory to enable the connectivity among systems of different vendors and to facilitate the implementation of conference applications by providing a generic group communication functionality. For the performance evaluation of group communication protocols in terms of scalability, load and latency, simulations may be used. For the evaluation of the protocol, the generated load within the simulation is crucial for the significance of the evaluation. Statements about usability are useless, if the given load is not realistic. For that, our work presents a model to generate loads, which are typical of different group communication scenarios. These scenarios are evaluated and defined in a formal load model which may be used in simulations. We also present simulation result for the comparison of the resource management of the T.120 group communication standard and an improved scheme, which was developed at our department. It is shown that using our load mode, a detailed evaluation and improvement of group communication protocols is feasible without implementing and testing in large environments.
Improvements in networking allow for increasingly complex collaboration environments with regard to session scale, range of shared tasks, and distance between remote parties. Floor control protocols add an access discipline to such environments that makes it possible to mitigate race conditions on shared resources and throttle media transmission. Primary causes for resource competition among users may be the lack of mutual awareness and formal session orchestration, or network and host limitations. Various, often proprietary and unscalable, solutions for floor control have been implemented for telemedicine, video conferencing, or distributed interactive simulation. To date, an analytic comparison of the efficacy of these solutions is lacking. By efficacy, we mean the proportion of time that a protocol takes to allocate a resource, accounting for social and technical overhead for user behavior, protocol cost, and network conditions. We present a novel taxonomy and comparative performance analysis of known classes of floor control protocols, including socially driven protocols, collision sensing on shared resources, floor token passing in fully-connected and ring topologies, and, innovatively, across shared control trees. Accordingly, aggregated and selective transmission of control information over a multicast control tree offers the best scalability and efficacy. A novel hierarchical floor control protocol correlating in its operation with tree-based reliable multicast is outlined.
Multimedia communication over packet switched networks, commonly referred to as Internet telephony, has generated immense interest in recent years. The International Telecommunication Union is currently developing the H.323 umbrella of standards for this purpose. In the H.323 standard, a gatekeeper is a control entity that performs functions such as address translation, bandwidth management etc. An H.323 terminal needs to discover a suitable gatekeeper and register with it. When an H.323 terminal wishes to call another H.323 terminal, communication between the gatekeepers results in the determination of the transport layer address of the destination terminal to which the connection setup message is sent. There is thus a need not only for terminals to discover gatekeepers but also for gatekeepers to discover other gatekeepers. In this paper, we propose a mechanism whereby terminals and gatekeepers can discover other gatekeepers. The proposed mechanisms result in efficient use of network bandwidth within the administrative domain. The security issues involved in such communications are discussed, and a proposal is made to establish a shared secret during the discovery process. Such a shared secret may be used for subsequent confidentiality or authentication of messages.
An adaptive multi-level diagnostic system capable of processing and filtering multimedia information to extract the most relevant information is described. Experimental results show that such an adaptive multi-level diagnosis system in general performs better than a single-stage detector both in processing speed and in recognition rate and overall rejection rate.
Da CaPo is a highly flexible communication subsystem designed to perform all communication tasks of middleware solutions like CORBA. The main focus of the first Da CaPo prototype is on the relationship of protocol functionality, QoS, and resource utilization. We are currently redesigning Da CaPo for the real-time μ-kernel operating system Chorus. Support for guaranteed QoS via appropriate scheduling mechanisms, efficiency, and a simple programming model are the main goals for the new design and implementation. In this paper, we analyze three multithreading architectures for Da CaPo, i.e., thread per module, thread per packet, and thread per control, in combination with the real-time scheduling policies rate monotonic (RM) and earliest deadline first (EDF). We use measurements from the first Da CaPo prototype to quantify the execution times of Da CaPo modules, and measurements from the electronic classroom as real-life workload. Our simulation results show that thread per module and thread per packet perform better than thread per control, and RM should be preferred to EDF.
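For readers unfamiliar with the two scheduling policies named above, the following minimal sketch shows the classical utilization-based admission tests for rate monotonic and earliest deadline first scheduling. It is not the simulation described in the abstract: it assumes independent, preemptible periodic tasks whose deadlines equal their periods, and the task sets and function names are hypothetical.

```python
def utilization(tasks):
    """Total processor utilization of (execution_time, period) pairs."""
    return sum(c / t for c, t in tasks)


def rm_bound(n):
    """Liu and Layland utilization bound for n periodic tasks under RM."""
    return n * (2 ** (1.0 / n) - 1)


def rm_schedulable_sufficient(tasks):
    """Sufficient (not necessary) RM test: utilization within the bound."""
    return utilization(tasks) <= rm_bound(len(tasks))


def edf_schedulable(tasks):
    """EDF test for deadlines equal to periods: utilization at most 1."""
    return utilization(tasks) <= 1.0


# Hypothetical task sets, times in milliseconds.
light = [(1, 4), (2, 6), (1, 8)]
heavy = [(2, 4), (2, 5)]
for name, tasks in (("light", light), ("heavy", heavy)):
    print(name, round(utilization(tasks), 3),
          rm_schedulable_sufficient(tasks), edf_schedulable(tasks))
```

A task set that fails the sufficient RM test may still be schedulable under RM; the test only gives a quick guarantee, whereas the EDF condition is exact under the stated assumptions.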
To support real-time multimedia applications in wireless packet networks, it is an essential challenge to provide seamless quality of service (QoS) to mobile users. In this paper, we address the problem of real-time multimedia multicast in cellular networks, and present our solution to avoid large QoS fluctuations during handoffs. Specifically, during a multicast session, a mobile host may experience varying packet delay, delay jitter, and channel error when it moves from one cell to another. It is thus desirable that these location-dependent QoS parameters appear as seamless as possible to mobile hosts. We present protocols to achieve a degree of transmission synchronization among multiple cells, so that the delays and delay jitters of each packet to all subscribing mobile hosts do not vary substantially. In addition, we apply a Forward Error Correction technique to recover the QoS-mandatory packets from wireless channel errors. We show through analysis and simulation that the mobile hosts will experience handoffs that are brief and smooth and that have a low packet loss rate.
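The abstract does not say which Forward Error Correction code is used. As a generic illustration of the idea, the sketch below uses the simplest possible scheme, a single XOR parity packet per group, which lets a receiver rebuild one lost packet without retransmission. The packet contents and function names are hypothetical.

```python
def xor_parity(packets):
    """Return the byte-wise XOR of equally sized packets."""
    length = len(packets[0])
    parity = bytearray(length)
    for packet in packets:
        if len(packet) != length:
            raise ValueError("all packets must have the same length")
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(received, parity):
    """Rebuild the single lost packet (marked None) from the parity packet."""
    missing = [i for i, packet in enumerate(received) if packet is None]
    if len(missing) != 1:
        raise ValueError("XOR parity recovers exactly one lost packet")
    present = [packet for packet in received if packet is not None]
    return missing[0], xor_parity(present + [parity])


group = [b"abcd", b"efgh", b"ijkl"]
parity = xor_parity(group)
index, rebuilt = recover_missing([group[0], None, group[2]], parity)
print(index, rebuilt)
```

One parity packet per group trades extra bandwidth for protection against a single loss; stronger codes are needed when several packets in a group can be lost.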
Wireless ATM networks will bring broadband applications like video conferencing to the mobile user. Due to the inherently unreliable nature of the wireless link, cell loss is likely to occur, which may result in a violation of the Quality of Service contracts. In this paper we present our solution in terms of a generic Quality of Service (QoS) aware Audio/Video Transport Subsystem for wireless ATM. By using an object oriented approach, our system can use different video coding methods for different partners simultaneously. A QoS controller maps user defined quality parameters down to network and system QoS parameters. Our system works together with network filters and error control modules to help recover from data loss at the wireless or wired part of the ATM network. We evaluate the performance of our subsystem based on simulations for the Magic WAND (Wireless ATM Network Demonstrator) system, which provides wireless access to a fixed ATM network in the 5.2 GHz band. To evaluate the impact of errors at the physical layer, we simulated the transmission of several MPEG-2 transport streams (TS) under different channel conditions resulting in different bit error rates. We analyze the suitability of different packing schemes for encapsulating the MPEG-2 TS packets into AAL5 frames based on the WAND radio model with respect to error resilience.
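The trade-off behind the packing schemes mentioned above can be made concrete with some arithmetic. The sketch below assumes the standard sizes (188-byte MPEG-2 TS packets, an 8-byte AAL5 trailer, 48-byte cell payloads, 53-byte cells) and computes how many cells and padding bytes an AAL5 frame needs when it carries a given number of TS packets. The function name is hypothetical and the WAND radio model is not represented.

```python
TS_PACKET = 188     # bytes per MPEG-2 transport stream packet
AAL5_TRAILER = 8    # bytes of AAL5 CPCS trailer
CELL_PAYLOAD = 48   # payload bytes per ATM cell
CELL_SIZE = 53      # payload plus 5-byte cell header


def aal5_packing(n_ts):
    """Return (cells, padding_bytes, efficiency) for n_ts TS packets per frame."""
    payload = n_ts * TS_PACKET
    total = payload + AAL5_TRAILER
    cells = (total + CELL_PAYLOAD - 1) // CELL_PAYLOAD  # round up to whole cells
    padding = cells * CELL_PAYLOAD - total
    efficiency = payload / (cells * CELL_SIZE)
    return cells, padding, efficiency


for n in (1, 2, 3):
    cells, padding, efficiency = aal5_packing(n)
    print(n, cells, padding, round(efficiency, 3))
```

Two TS packets fill eight cells with no padding, which is efficient, but larger frames also mean that the loss of a single cell invalidates more video data, which is presumably why error resilience has to be weighed against efficiency.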
Keywords: Quality of Service, Wireless ATM, Video Compression, Video Transmission, MPEG-2 Transport Streams
In this paper, we investigated the possibility and effectiveness of applying ARQ schemes to the transmission of MPEG-2 encoded video streams over fixed wireless links in B-FWANs in video-on-demand services. Simulations were conducted to evaluate the overall system performance, which is then compared to a scheme using FEC with cell interleaving. Important metrics relevant to the broadband video services are analyzed, including the quality of the reconstructed video sequences, the excess delay and delay variation introduced by the ARQ scheme, the throughput, and the required average channel bandwidth. A hidden Markov model was also introduced to model the variation of the wireless channel.
The early re-synchronization technique is a useful method for recovering the error-free video data in the cells that follow a lost cell or a transmission error. In this paper, we propose a method in which the MPEG-2 video data is packetized into cells, and the first macroblock in each cell is always located at a special bit, such as an odd bit. Hence, the early re-synchronization method can be improved to halve the computational complexity, while increasing bit-stream data content by only 13 percent. This is because the decoder will not waste time trying to decode a macroblock starting at an even bit. Furthermore, the probability of mis-decoding the macroblock can also be reduced by 50 percent. We also propose another method that utilizes the start code in a cell to help correctly decode the macroblocks located before it. A basic theoretical analysis is presented in the paper to show that the proposed method is more effective than the existing one. The results show that the effectiveness of the re-synchronization method can be greatly improved by adopting the proposed packetizing technique.
IP telephony, a new technology to provide voice communication over traditional data networks, has the potential to revolutionize telephone communication within the modern enterprise. This innovation uses packetization techniques to carry voice conversations over IP networks. This packet switched technology promises new integrated services and lower cost long-distance communication compared to traditional circuit switched telephone networks. Future satellites will need to carry IP traffic efficiently in order to stay competitive in servicing the global data-networking and global telephony infrastructure. However, the effects of Voice over IP over switched satellite channels have not been investigated in detail. To fully understand the effects of satellite channels on Voice over IP quality, several experiments were conducted at Lockheed Martin Telecommunications' Satellite Integration Lab. The results of those experiments, along with suggested improvements for voice communication over satellite, are presented in this document. First, a detailed introduction of IP telephony as a suitable technology for voice communication over future satellites is presented. This is followed by procedures for the experiments, along with results and strategies. In conclusion, we hope that these capability demonstrations will alleviate any uncertainty regarding the applicability of this technology to satellite networks.
Wireless ATM (WATM) will be used to support broadband multimedia services. To achieve this, the existing ATM protocol must be augmented with mobility functions. Handover is one of the important functions of mobility management in the WATM system. This paper describes a handover mechanism for intra-switch handovers in wireless ATM. In addition, this paper presents a simple signaling and control architecture for mobility support in a wireless ATM network that provides integrated multimedia services to mobile terminals.
The recent proliferation of digital multimedia content has raised the issue of authentication techniques for multimedia content that is composed of still images, video and audio. Consequently, many authentication techniques for multimedia objects have recently been proposed. One such class of techniques is based on digital watermarks, and in this paper we focus on such techniques. There are basically two types of watermarks that have been proposed for purposes of authentication: fragile watermarks and content-based authentication watermarks. In this paper we survey different types of fragile and content-based authentication watermarking techniques that have been proposed in the literature. We point to new issues raised by the problem of authentication of multimedia content. We also discuss some shortcomings of proposed techniques and list open problems that still do not admit a satisfactory solution.
In this paper we propose a new watermarking scheme for digital images that allows watermark recovery even if the image has been subjected to generalized geometrical transforms. The watermark is given by a binary number and every watermark bit is represented by a 2D function. The functions are weighted, using a mask that is proportional to the luminance, and then modulated onto the blue component of the image. To recover an embedded bit, the embedded watermark is estimated using a prediction filter. The sign of the correlation between the estimated watermark and the original function determines the embedded bit. The watermark is embedded several times at horizontally and vertically shifted locations. In the watermark recovery process we first compute a prediction of the embedded watermark. Then the autocorrelation function is computed for this prediction. The multiple embedding of the watermark results in additional autocorrelation peaks. By comparing the configuration of the extracted peaks with their expected configuration we can determine the affine distortion applied to the image. The distortion can then be inverted and the watermark recovered in a standard way.
As more and more digital images are distributed on-line via the Internet and WWW, many copyright owners are concerned about protecting the copyright of digital images. This paper describes WaveMark, a novel wavelet-based multiresolution digital watermarking system for color images. The algorithm in WaveMark uses discrete wavelet transforms and error-correcting coding schemes to provide robust watermarking of digital images. Unlike other wavelet-based algorithms, our watermark recovery procedure does not require a match with an uncorrupted original image. Our algorithm uses Daubechies' advanced wavelets and extended Hamming codes to deal with problems associated with JPEG compression and random additive noise. In addition, the algorithm is able to withstand intentional disturbances introduced by professional robustness testing programs such as StirMark. The use of Daubechies' advanced wavelets makes the watermarked images more perceptually faithful than images watermarked with the Haar wavelet transform. The watermark is adaptively applied to different areas of the image, based on the smoothness of the areas, to increase robustness within the limits of perception. The system is practical for real-world applications, encoding or decoding images in less than one second each on a Pentium Pro PC.
A wavelet-based watermark casting scheme and a blind watermark retrieval technique are investigated in this research. An adaptive watermark hiding method is first developed to determine significant wavelet subbands and to select a couple of significant wavelet coefficients in these subbands automatically. Then, a blind watermark retrieval technique that can detect the embedded watermark in the wavelet domain without the help of the original image is proposed. Experimental results show that the embedded watermark can be retrieved successfully without the original image even after JPEG compression with a quality factor of 20 percent and a wavelet-based compression attack with a 64 to 1 compression ratio. With the help of the original image, the watermark can be detected after more serious attacks.
A novel spatial watermarking scheme that is highly robust is proposed. The embedding is achieved through a process similar to direct sequence spread function techniques. Integral to this is modulation according to a perceptual classification of the cover pixels. The classifier is based on the Sobel gradient operator and identifies the cover pixel region as either constant, edge, pseudo-constant, texture, or mixed. Recovery of the watermark does not require the original image and is achieved through a simple prediction process based on an adaptive method.
Steganography is a technique to hide secret information in some other data without leaving any apparent evidence of data alteration. All of the traditional steganographic techniques have limited information-hiding capacity. They can hide only 10 percent of the data amount of the vessel. This is because the principle of those techniques was either to replace a special part of the frequency components of the vessel image, or to replace all the least significant bits of a multi-valued image with the secret information.
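The second traditional principle mentioned above, replacing the least significant bits of a multi-valued image, can be sketched as follows. This is a generic illustration rather than any specific published method; it assumes 8-bit samples given as a flat list of integers, and the function names are hypothetical.

```python
def embed_lsb(pixels, bits):
    """Replace the least significant bit of each leading sample with a secret bit."""
    if len(bits) > len(pixels):
        raise ValueError("not enough samples to carry the message")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit  # clear the lowest bit, then set it
    return stego


def extract_lsb(pixels, count):
    """Read back the first `count` hidden bits."""
    return [sample & 1 for sample in pixels[:count]]


cover = [200, 13, 54, 255]
secret = [1, 0, 1, 0]
stego = embed_lsb(cover, secret)
print(stego, extract_lsb(stego, len(secret)))
```

Because only one bit out of eight per sample carries the message, the hidden payload is at most one eighth of the vessel data, which illustrates the kind of capacity limit the abstract refers to.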
We propose a novel, robust scheme for data hiding/oblivious detection of watermarks in still images. While the low-frequency image coefficients are robust, they cannot be used effectively for oblivious detection methods when correlative processing is employed for detection. However, in the proposed non-linear detection method, the robust low-frequency bands can be used effectively. Thus the proposed method turns out to be more robust than methods employing linear addition and correlative extraction of the signature. We report the results obtained for 7 test images in terms of probability of error in detection of the watermark/hidden bits.
We present an information-theoretic approach to obtain an estimate of the number of bits that can be hidden in still images, or, the capacity of the data-hiding channel. We show how the addition of the message signal or signature in a suitable transform domain rather than the spatial domain can significantly increase the channel capacity. Most of the state-of-the-art schemes developed thus far for data-hiding have embedded bits in some transform domain, as it has always been implicitly understood that a decomposition would help. In this paper we compare the achievable data-hiding capacities for different decompositions like DCT, Hartley, Hadamard, and subband transforms. We show that transforms with inferior energy compaction property like Hartley and Hadamard are better choices for the decomposition, than transforms with good energy-compaction property, like DCT or subband transforms.
In this paper, an image sequence coding scheme for very low bit-rate video coding is presented. The new technique utilizes windowed overlapped block matching motion compensation for the temporal coding scheme, and vector quantization (VQ) to reduce the spatial redundancy within the predicted image. There are many advantages associated with VQ, especially its capacity to operate in error-resilient coding systems; furthermore, VQ does not suffer from blocking effects that are visually disjointed and therefore has a major advantage over DCT based methods. We examine the performance of various codebooks to remove the spatial redundancy within the difference frame. When the codec is configured to operate at 10.1 kbit/s, average PSNR values in excess of 32.86 dB and 25.6 dB are achieved for the 'Miss America' and 'Carphone' sequences respectively.
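The PSNR figures quoted above follow the usual definition, PSNR = 10 log10(peak^2 / MSE). A minimal sketch of that computation is given below, assuming 8-bit samples (peak value 255) supplied as flat lists; the sample values are made up for illustration and are unrelated to the test sequences.

```python
import math


def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized sample lists."""
    if len(original) != len(reconstructed):
        raise ValueError("inputs must have the same number of samples")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(peak ** 2 / mse)


print(round(psnr([100, 100, 100, 100], [101, 99, 100, 100]), 2))
```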
When choosing a video conferencing system, it is natural for the potential buyer to look very closely at the quality of the audio. To ensure high-quality video conferencing, it is important to have a good picture, but it is vital to have high quality audio. The reason for this is that most of the information transferred in a video conference is actually in the audio channel. Not only is good audio mandatory for the effective exchange of information, it has also been found that it can affect the perceived video quality. In this paper several key issues in audio quality are discussed. Since audio delay is among the most vexing problems, each video conferencing system has to make efforts to shorten it. The causes of audio delay, how to measure the actual delay, and how to shorten it are described in detail. Finally, several system design strategies are presented in order to improve audio quality. All of the above is based on video conferencing systems developed according to H.324 and is applicable to video conferencing systems implemented entirely in software.
In this paper, we propose a layered video coding scheme for multimedia applications which adopts a new motion prediction structure with temporal hierarchy of frames to afford temporal resolution scalability. Moreover, the proposed scheme can offer spatial scalability using image pyramid decomposition. Thus it provides a 2D scalable multi-layered bitstream for heterogeneous environments, so that spatial resolution and frame rate can be independently adjusted according to various user requirements. In addition, it can have a higher compression ratio than replenishment schemes since motion estimation and compensation further reduce the temporal redundancy.
Rate control is very important in video coding and transmission systems. Traditional rate control algorithms are constrained by bandwidth and buffer size on a per-frame basis. Today's prevalent video coding standards, such as ITU H.261 and H.263 and ISO MPEG-1, MPEG-2 and MPEG-4, are all based on macroblocks. Several macroblock coding modes are also defined in these international video coding standards. The optimal rate-distortion function is not constrained only by bandwidth, bit rate and buffer size; the choice of macroblock mode also influences the rate-distortion function. This paper presents a new algorithm based on the traditional rate control method, namely an optimal rate control algorithm based on macroblocks. Its main contribution is that it considers not only bandwidth, bit rate and buffer size but also the influence of the macroblock mode choice. Some related experimental results are given and compared with the traditional rate control method. The results show that the PSNR of the proposed algorithm is better than that of the traditional method under various rate conditions.
In this paper, we consider an efficient method for distributing video which surmounts the difficulties inherent in transmitting video over a standard digital network. This approach uses a parallel analog network which is dedicated to video in tandem with a digital network. We consider three groups of video network operations that can be used for collaborative work, the physical and logical architecture of the Analog/Digital Video Network, and a multi-layer control and management communication protocol. An example of an adaptive collaborative video network environment is also considered.
We propose a novel watermarking method. The proposed method inserts watermark information not into the whole image region but only into interesting regions. To extract the interesting regions, we divide the original image into sub-blocks by using Chang's PIM criteria for region segmentation. Considering directional information and the frequency band, we insert the watermark information into the sub-blocks in the DCT domain. As a result, the proposed method yields a much less damaged image in comparison with other methods, and the resulting image is more robust to outside attacks. The proposed method can also reduce the distortion in comparison with other methods that utilize the whole image as the interesting region.
Recently, various kinds of multimedia application systems have actively been developed based on the achievement of advanced high speed communication networks, computer processing technologies, and digital contents-handling technologies. Against this background, this paper proposes a new distributed multimedia database system which can effectively perform a new function of cooperative retrieval among distributed databases. The proposed system introduces a new concept of 'Retrieval manager' which functions as an intelligent controller so that the user can recognize a set of distributed databases as one logical database. The logical database dynamically generates and performs a preferred combination of retrieving parameters on the basis of both directory data and the system environment. Moreover, a concept of 'domain' is defined in the system as a managing unit of retrieval. The retrieval can effectively be performed by cooperation of processing among multiple domains. Communication language and protocols are also defined in the system. These are used in every action for communications in the system. A language interpreter in each machine translates the communication language into an internal language used in that machine. Because of the language interpreter, internal processing modules such as the DBMS and user interface modules can freely be selected. A concept of 'content-set' is also introduced. A content-set is defined as a package of contents. Contents in the content-set are related to each other. The system handles a content-set as one object. The user terminal can effectively control the displaying of retrieved contents, referring to data indicating the relation of the contents in the content-set. In order to verify the function of the proposed system, a networked electronic museum was experimentally built. The results of this experiment indicate that the proposed system can effectively retrieve the objective contents under the control of a number of distributed domains. The results also indicate that the system can effectively work even if the system becomes large.
The received quality of MPEG-2 video depends crucially on the performance of the underlying transport technology. Loss or delays in the delivery of MPEG-2 packets can result in defects such as jagged movements or screen freezes when the video is played back. Partly because of its high bandwidth and its ability to provide QoS guarantees, ATM is widely seen as an ideal transport technology for MPEG-2 video. ATM in itself, however, is not a complete solution to the successful deployment of MPEG-2 video. The ATM connections must be configured correctly, and the MPEG-2 source must be well behaved, otherwise poor performance can still result. Traffic policing is a key mechanism used by ATM to achieve its QoS guarantees. Policing will discard traffic from sources that do not stay within the bounds of the bandwidth agreed upon when the ATM connection was set up. If the MPEG-2 source itself is badly behaved and MPEG-2 packets are lost due to policing, then the received video quality may be poor. This paper uses commercially available equipment to assess the impact on video quality of loss due to policing in an ATM network.
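The policing behavior described above is commonly modeled with the Generic Cell Rate Algorithm (GCRA). The sketch below implements its virtual scheduling form, assuming an increment (the nominal spacing between cells at the agreed rate) and a limit (the tolerated earliness). The arrival times and parameter values are hypothetical and are not taken from the equipment used in the paper.

```python
def gcra_police(arrivals, increment, limit):
    """Classify each cell arrival time as conforming (True) or not (False)."""
    tat = None  # theoretical arrival time of the next cell
    verdicts = []
    for t in arrivals:
        if tat is None:
            tat = t  # the first cell defines the schedule
        if t < tat - limit:
            # Cell arrived too early: non-conforming, schedule unchanged.
            verdicts.append(False)
        else:
            tat = max(t, tat) + increment
            verdicts.append(True)
    return verdicts


# A source that bursts at times 15 and 16 against a spacing of 10 time units.
print(gcra_police([0, 10, 15, 16, 30], increment=10, limit=2))
```

A bursty MPEG-2 source whose cells arrive faster than the agreed spacing has its early cells flagged, which is the loss mechanism whose effect on video quality the paper assesses.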
In this paper, we present our design and implementation of the ASIS multimedia digital library. Besides bibliographic information, our service includes an archive of digital audio and video streams, CD titles, and WWW contents. This system not only aims at end customer support, e.g., searching, streaming, and browsing operations, but also facilitates admission control, content production, distribution processes, license control, etc. We address several issues in designing a multimedia digital library, e.g., the need for accounting support and system management, for integrating third party system components and varied kinds of multimedia, and for a computer aided tool to automatically process the backend production. Our solutions to these problems are presented.
Multimedia applications, and specifically voice and video conferencing tools, are widely used in business communications and are quickly being discovered by the consumer market as well. At the same time, wireless communication services such as PCS voice and cellular data are becoming very popular, leading to the desire to deploy multimedia applications in the wireless environment. Wireless links, however, exhibit several characteristics which are different from traditional wired networks. These include dynamically changing bandwidth due to mobile host movement in and out of cells where bandwidth is shared, high rates of packet corruption and subsequent loss, and frequent and lengthy disconnections due to obstacles, fading, and movement between cells. In addition, these effects are short-lived and difficult to reproduce, leading to a lack of adequate testing and analysis for applications used in wireless environments.