We propose a new algorithm for synthesis of novel views based on existing stereo 3D imagery for both glasses-based
and glasses-free 3D displays. Because audience preferences for stereo depth perception on glasses-based
displays often differ, the range of perceived 3D depth needs to be either compressed or expanded. The proposed algorithm
enables this depth adjustment through the synthesis of new virtual views, by incorporating intensity, disparity, and
geometric saliency cues present in the stereo image pair. The proposed algorithm is further capable of eliminating the
grid quantization artifacts, a common phenomenon when manipulating discrete image disparities. The algorithm can
also be applied to generate multiple views for glasses-free 3D displays from the same stereo imagery.
Results are demonstrated on real-world video datasets and validated by human subject studies.
One of the key issues associated with 3D TVs is the tradeoff between comfort and 3D visual impact. Large disparity is
often preferred for strong visual impact but often leads to viewer discomfort, depending on the viewer's condition, display
size, and viewing distance. The algorithm proposed in this paper provides viewers with a tool to adjust disparity
according to the environment, the content, and their preferences, for a more comfortable and higher-quality 3D
experience. More specifically, given a planar stereoscopic display, the algorithm takes in a stereoscopic image pair that
causes viewing discomfort/fatigue, and outputs a modified stereoscopic pair that causes less or no viewing
discomfort/fatigue. The algorithm comprises disparity estimation, occlusion detection, disparity adjustment,
and view synthesis. A novel pixel weighting mechanism in regularized-block-matching based disparity estimation helps
improve the robustness, accuracy and speed of matching. Occlusion detection uses multiple cues in addition to matching
errors to improve the accuracy. An accommodation/vergence mismatch visual model is used in disparity adjustment to
predict discomfort/fatigue from the disparity information, the viewing conditions, and the display characteristics. Hole
filling is performed in the disparity map of the new view rather than in the new view itself, which reduces blurriness.
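The idea of filling holes in the disparity map of the new view, rather than in the synthesized image, can be illustrated on a single scanline. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `alpha` view-position parameter, the sign convention (a left-view pixel at `x` maps to `x - alpha * d`), and the rule of filling with the smaller neighbouring disparity (the background) are all illustrative choices.

```python
import numpy as np

def warp_disparity_row(disp, alpha):
    """Forward-warp one row of a left-view disparity map to a virtual view.

    Assumed convention: alpha = 0 is the left view, alpha = 1 the right view,
    and a left-view pixel at x lands at x - alpha * disp[x].
    Unfilled target positions (disocclusion holes) are marked with NaN.
    """
    w = disp.shape[0]
    out = np.full(w, np.nan)
    for x in range(w):
        xt = int(round(x - alpha * disp[x]))
        if 0 <= xt < w:
            # On collisions keep the larger disparity (the nearer surface).
            if np.isnan(out[xt]) or disp[x] > out[xt]:
                out[xt] = disp[x]
    return out

def fill_holes_row(warped):
    """Fill each NaN run with the smaller of its two bordering disparities.

    Assumption: disoccluded regions belong to the background, which has
    the smaller disparity under the convention above.
    """
    filled = warped.copy()
    w = len(filled)
    x = 0
    while x < w:
        if np.isnan(filled[x]):
            start = x
            while x < w and np.isnan(filled[x]):
                x += 1
            left = filled[start - 1] if start > 0 else np.nan
            right = filled[x] if x < w else np.nan
            candidates = [v for v in (left, right) if not np.isnan(v)]
            if candidates:
                filled[start:x] = min(candidates)
        else:
            x += 1
    return filled

# A foreground object (disparity 4) in front of a background (disparity 0).
disp = np.array([0, 0, 0, 4, 4, 4, 0, 0, 0, 0], dtype=float)
warped = warp_disparity_row(disp, alpha=0.5)
filled = fill_holes_row(warped)
```

Because the filled disparity map is piecewise constant, colours for the hole pixels can then be fetched from the source views at well-defined positions, instead of being interpolated across the hole in the image itself.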
In this paper, we propose a novel, computationally efficient post-processing algorithm that reduces ringing artifacts in decoded DCT-coded video without using coding information. Like most filtering-based de-ringing algorithms, the proposed algorithm relies on edge information, but it uses a single computationally efficient nonlinear filter, the sigma filter, for both edge detection and smoothing. Specifically, the sigma filter, which was originally designed for nonlinear filtering, is extended to generate edge proximity information. Unlike other adaptive filtering-based methods, whose filters typically use a fixed small window with flexible weights, this sigma filter adaptively switches between small and large windows. The adaptation is designed for removing ringing artifacts only, so the algorithm cannot be used for de-blocking. Overall, the proposed algorithm achieves a good balance among ringing removal, preservation of edges and details, and computational complexity.
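For readers unfamiliar with the sigma filter, the following minimal sketch shows its classic form: each output pixel is the average of the window pixels whose intensity lies within a threshold of the center pixel. It does not include the edge-proximity extension or the window switching described above; the function name, the `radius` and `threshold` parameters, and the edge-replicating border handling are assumptions made for illustration.

```python
import numpy as np

def sigma_filter(img, radius=2, threshold=20.0):
    """Classic sigma filter on a 2D grayscale image (illustrative sketch).

    Each output pixel is the mean of those pixels in the
    (2*radius+1) x (2*radius+1) window whose value differs from the
    center pixel by at most `threshold`.
    """
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    padded = np.pad(img, radius, mode="edge")  # assumed border handling
    acc = np.zeros_like(img)   # sum of accepted neighbours
    cnt = np.zeros_like(img)   # number of accepted neighbours
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            shifted = padded[dy:dy + h, dx:dx + w]
            mask = np.abs(shifted - img) <= threshold
            acc += np.where(mask, shifted, 0.0)
            cnt += mask
    # The center pixel always accepts itself, so cnt >= 1 everywhere.
    return acc / cnt

# A vertical step edge with a small checkerboard perturbation.
step = np.zeros((6, 8))
step[:, 4:] = 100.0
ii, jj = np.indices(step.shape)
noisy = step + 4.0 * (((ii + jj) % 2) * 2 - 1)
smoothed = sigma_filter(noisy, radius=2, threshold=20.0)
```

Pixels across the step differ from the center by far more than the threshold, so they are excluded from the average and the edge stays sharp while the small perturbation is attenuated.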
In this paper, we first analyze a computationally efficient nonlinear edge-preserving smoothing filter, the sigma filter, in the framework of robust estimation. The sigma filter is nearly a W-estimator that uses the metric trimming influence function, with a subtle difference. Based on this analysis, we develop another form of the sigma filter as an M-estimator using the same metric trimming influence function. The new form is more suitable for hardware implementation. We also compare the sigma filter with other nonlinear edge-preserving filters, such as the bilateral filter and the median filter, in the same framework. Finally, experimental results are reported and discussed.
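The relationship between the sigma filter and robust estimators can be written out using the standard textbook definitions. The notation below is an assumption for illustration ($x_i$ for the window samples, $x_c$ for the center sample, $c$ for the threshold, $\hat{\mu}$ for the location estimate, $y$ for the filter output); it restates the general definitions only and does not reproduce the paper's own derivation or its M-estimator form of the filter.

```latex
% Metric trimming influence function and its weight function, psi(r) = r w(r)
\[
\psi(r) =
\begin{cases}
r, & |r| \le c \\
0, & |r| > c
\end{cases}
\qquad
w(r) =
\begin{cases}
1, & |r| \le c \\
0, & |r| > c
\end{cases}
\]

% M-estimator of location: the root of the summed influence function
\[
\sum_i \psi(x_i - \hat{\mu}) = 0
\]

% W-estimator of location: the equivalent weighted-mean fixed point
\[
\hat{\mu} = \frac{\sum_i w(x_i - \hat{\mu})\, x_i}{\sum_i w(x_i - \hat{\mu})}
\]

% Classic sigma filter: one weighted mean with the weights centered at x_c
\[
y = \frac{\sum_i w(x_i - x_c)\, x_i}{\sum_i w(x_i - x_c)}
\]
```

The last expression has the same weighted-mean shape as the W-estimator, with the weights evaluated at the center sample $x_c$ rather than at the estimate $\hat{\mu}$.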
We address the problem of robust streaming of high-quality video over wireless local area networks in a home environment. By robust streaming, we mean maintaining the highest possible video quality and preventing interruptions to the video under varying bandwidth conditions, which may be due to distance, interference, obstructions, and the presence of multiple streams. We propose an application-layer approach in which we provide algorithms for dynamic on-line network bandwidth estimation and dynamic on-line adaptation of the video rate to the available network bandwidth. The proposed system employs a packet scheduler and a video rate control and adaptation mechanism at the sender, and bandwidth measurement and feedback mechanisms at the receiver. Our bandwidth estimation approach uses the actual video data in real time by transmitting it in packet bursts; hence, separate test traffic is not required. Since the proposed method operates at the application layer, it is flexible and applicable to different local area network types and implementations. We also propose an extension to multiple streams: an algorithm for joint rate allocation that enables network-adaptive simultaneous streaming of multiple high-quality video streams over one network.
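Burst-based bandwidth measurement at the receiver can be sketched as follows. This is a generic packet-train dispersion estimate under stated assumptions, not the algorithm from the paper: the function names, the smoothing gain, the safety margin, and the discrete rate ladder are all hypothetical.

```python
def estimate_bandwidth_bps(arrival_times_s, packet_sizes_bytes):
    """Estimate bandwidth from one burst of back-to-back packets.

    The bits of every packet after the first are divided by the time
    between the first and last arrival (packet-train dispersion).
    """
    if len(arrival_times_s) != len(packet_sizes_bytes):
        raise ValueError("one arrival time is needed per packet")
    if len(arrival_times_s) < 2:
        raise ValueError("a burst needs at least two packets")
    span = arrival_times_s[-1] - arrival_times_s[0]
    if span <= 0:
        raise ValueError("arrival times must increase")
    return 8 * sum(packet_sizes_bytes[1:]) / span

def smooth_estimate(previous_bps, sample_bps, gain=0.25):
    """Exponentially smooth successive burst estimates (assumed gain)."""
    if previous_bps is None:
        return sample_bps
    return (1.0 - gain) * previous_bps + gain * sample_bps

def choose_video_rate(bandwidth_bps, available_rates_bps, safety=0.8):
    """Pick the highest rate that fits within a safety fraction of the estimate.

    Falls back to the lowest rate when none fits.
    """
    budget = safety * bandwidth_bps
    feasible = [r for r in sorted(available_rates_bps) if r <= budget]
    return feasible[-1] if feasible else min(available_rates_bps)

# Four 1500-byte packets arriving 1 ms apart.
sample = estimate_bandwidth_bps([0.000, 0.001, 0.002, 0.003], [1500] * 4)
rate = choose_video_rate(sample, [2e6, 6e6, 10e6, 20e6])
```

Because the burst consists of real video packets, the measurement adds no extra traffic; the receiver would feed the smoothed estimate back to the sender, which then adjusts the video rate.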
One of the major challenges facing current media management systems and the related applications is the so-called “semantic gap” between the rich meaning that a user desires and the shallowness of the content descriptions that are automatically extracted from the media. In this paper, we address the problem of bridging this gap in the sports domain. We propose a general framework for indexing and summarizing sports broadcast programs. The framework is based on a high-level model of sports broadcast video using the concept of an event, defined according to domain-specific knowledge for different types of sports. Within this general framework, we develop automatic event detection algorithms that are based on automatic analysis of the visual and aural signals in the media. We have successfully applied the event detection algorithms to different types of sports including American football, baseball, Japanese sumo wrestling, and soccer. Event modeling and detection contribute to the reduction of the semantic gap by providing rudimentary semantic information obtained through media analysis. We further propose a novel approach, which makes use of independently generated rich textual metadata, to fill the gap completely through synchronization of the information-laden textual data with the basic event segments. An MPEG-7 compliant prototype browsing system has been implemented to demonstrate semantic retrieval and summarization of sports video.