This PDF file contains the front matter associated with SPIE-IS&T Proceedings Volume 6822, including the Title Page, Copyright information, Table of Contents, and the Conference Committee listing.
This paper presents the results of measuring the Hurst parameter for a wide selection of video sequences encoded with H.264 compression with constant quantization index. Six techniques for measuring the Hurst parameter are compared and used on the source data. The Hurst parameter was found to have an average value of 0.9, with a minimum of about 0.8, over the whole set of sequences, indicating the presence of a significant degree of long range dependence in the statistics of the coded data. It is known that traffic with long range dependence causes increased queue length and more packet loss than traffic with only short range dependence. We conclude that if video is to be transmitted at near constant quality, the fact that the resulting data has a significant degree of long range dependence must be taken into account when provisioning the traffic in a network in order to meet quality of service guarantees.
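The aggregated-variance estimator is one common way to measure the Hurst parameter of a frame-size trace. The sketch below illustrates only that single estimator (the abstract does not name the six techniques compared in the paper), applied to a synthetic per-frame byte-count series.

```python
# Sketch: estimating the Hurst parameter of a frame-size series with the
# aggregated-variance method. This is one standard estimator; the six
# techniques compared in the paper are not named in the abstract, so this
# is only an illustrative example, not the authors' procedure.
import numpy as np

def hurst_aggregated_variance(x, block_sizes=(4, 8, 16, 32, 64, 128)):
    x = np.asarray(x, dtype=float)
    log_m, log_var = [], []
    for m in block_sizes:
        n_blocks = len(x) // m
        if n_blocks < 2:
            continue
        # Mean of each non-overlapping block of length m.
        means = x[:n_blocks * m].reshape(n_blocks, m).mean(axis=1)
        log_m.append(np.log(m))
        log_var.append(np.log(means.var()))
    # For long-range-dependent data, Var(X^(m)) ~ m^(2H - 2), so the slope of
    # log-variance versus log-m is 2H - 2.
    slope, _ = np.polyfit(log_m, log_var, 1)
    return 1.0 + slope / 2.0

# Example: per-frame byte counts of a coded sequence (synthetic here).
frame_bytes = np.random.lognormal(mean=9.0, sigma=0.4, size=4096)
print("H estimate:", hurst_aggregated_variance(frame_bytes))
```

An estimate near 0.5 would indicate only short-range dependence; values approaching 1, as reported above, indicate strong long-range dependence.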
In this paper, we consider the problem of matching users for multimedia transmission in peer-to-peer (P2P)
networks and identify strategies for fair resource division among the matched multimedia peers. We propose
a framework for coalition formation, which enables users to form a group of matched peers where they can
interact cooperatively and negotiate resources based on their satisfaction with the coalition, determined by
explicitly considering the peers' multimedia attributes. In addition, our proposed approach goes a step further
by introducing the concept of marginal contribution, the value improvement of the coalition induced by an
incoming peer. We show that the best strategy for a peer is to join the coalition that grants it the largest share
of its marginal contribution under the deployed value-division scheme. Moreover,
we model the utility function by explicitly considering each peer's attributes as well as the cost for uploading
content. To quantify the benefit that users derive from a coalition, we define the value of a coalition based on
the total utility that all peers can achieve jointly in the coalition. Based on this definition of the coalition value,
we use an axiomatic bargaining solution in order to fairly negotiate the value division of the upload bandwidth
given each peer's attributes.
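As a rough illustration of the coalition-selection rule described above, the sketch below lets a peer join the coalition that grants it the largest share of its marginal contribution. The coalition value function and the fixed-fraction division rule are placeholders; the paper's utility model and axiomatic bargaining solution are more elaborate.

```python
# Sketch of the coalition-selection rule: a peer joins the coalition that
# grants it the largest share of its marginal contribution. The value
# function v() and the fixed-fraction split used here are stand-ins for the
# paper's utility model and axiomatic bargaining solution.

def coalition_value(peers):
    # Placeholder: total utility of a coalition, e.g. the sum of upload
    # capacities discounted by a per-peer upload cost. Purely illustrative.
    return sum(p["upload_kbps"] for p in peers) - 10.0 * len(peers)

def marginal_contribution(coalition, newcomer):
    return coalition_value(coalition + [newcomer]) - coalition_value(coalition)

def share_of_marginal_contribution(coalition, newcomer, fraction=0.5):
    # Assumed value-division scheme: the newcomer keeps a fixed fraction of
    # the value improvement it brings (hypothetical rule).
    return fraction * marginal_contribution(coalition, newcomer)

def best_coalition(coalitions, newcomer):
    return max(coalitions, key=lambda c: share_of_marginal_contribution(c, newcomer))

peer = {"id": "p9", "upload_kbps": 300}
coalitions = [
    [{"id": "p1", "upload_kbps": 500}, {"id": "p2", "upload_kbps": 200}],
    [{"id": "p3", "upload_kbps": 800}],
]
print(best_coalition(coalitions, peer))
```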
To cope with the time-varying network conditions, various error-protection and channel adaptation strategies have been
proposed at different layers of the protocol stack. However, these cross-layer strategies can be efficiently optimized only
if they act on accurate information about the network conditions and can therefore adapt to network changes in a timely manner.
We analyze the impact of such information feedback on the video quality performance of collaborative multimedia
users sharing the same multi-hop wireless infrastructure. Based on the information feedback, we can estimate the risk that
packets from different priority and deadline classes will not arrive at their destination before their decoding deadline.
Subsequently, cross-layer optimization strategies, such as packet scheduling and retransmission limits (for transmission errors),
are adapted to jointly consider the estimated risk as well as the distortion impact of not receiving
different priority packets. Our results quantify the risk estimation and its benefit in different network conditions and for
various video applications with different delay constraints.
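To make the notion of risk concrete, the sketch below estimates the probability that a packet misses its decoding deadline given delay feedback for the multi-hop path. The Erlang (sum-of-exponentials) delay model and the parameter values are assumptions for illustration, not the estimator used in the paper.

```python
# Sketch of a deadline-miss risk estimate: given feedback on the mean per-hop
# delay, estimate the probability that a packet of a given class misses its
# decoding deadline. The exponential per-hop delay model is an assumption.
import math

def deadline_miss_risk(mean_hop_delay_ms, deadline_ms, hops=1):
    # Model end-to-end delay as the sum of 'hops' i.i.d. exponential per-hop
    # delays, i.e. an Erlang(hops) distribution; return P(delay > deadline).
    rate = 1.0 / mean_hop_delay_ms
    x = rate * deadline_ms
    # Survival function of Erlang(k, rate): exp(-x) * sum_{n<k} x^n / n!
    return math.exp(-x) * sum(x**n / math.factorial(n) for n in range(hops))

# Risk for a high-priority packet (tight 80 ms deadline) versus a low-priority
# one (200 ms deadline), given a measured 25 ms mean per-hop delay over 3 hops.
print(deadline_miss_risk(25.0, 80.0, hops=3))
print(deadline_miss_risk(25.0, 200.0, hops=3))
```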
We address the problem of rate allocation for video multicast over wireless mesh networks. An optimization
framework is established to incorporate the effects of heterogeneity in wireless link capacities, traffic contention
among neighboring links and different video distortion-rate (DR) characteristics. We present a distributed rate
allocation scheme with the goal of minimizing total video distortion of all peers without excessive network
utilization. The scheme relies on cross-layer information exchange between the MAC and application layers. It
adopts the scalable video coding (SVC) extensions of H.264/AVC for video rate adaptation, so that graceful
quality reduction can be achieved at intermediate nodes within each multicast tree. The performance of the
proposed scheme is compared with a heuristic scheme based on TCP-Friendly Rate Control (TFRC) for individual
peers. Network simulation results show that the proposed scheme tends to allocate higher rates for peers
experiencing higher link speeds, leading to higher overall video quality than the TFRC-based heuristic scheme.
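A centralized counterpart of this kind of rate allocation can be written with the common parametric model D(R) = D0 + theta / (R - R0): minimizing total distortion under a rate budget equalizes the distortion-rate slopes across peers. The sketch below solves that simplified problem; the paper's distributed scheme, MAC-layer information exchange and multicast-tree structure are not modeled.

```python
# Sketch: minimize the sum of per-peer distortions under a total rate budget
# with D(R) = D0 + theta / (R - R0). Lagrangian condition:
# theta / (R - R0)^2 = lam  =>  R(lam) = R0 + sqrt(theta / lam).
# Find lam by bisection so the rates sum to the budget (illustrative only).
import math

def allocate_rates(peers, total_rate):
    def rates(lam):
        return [p["R0"] + math.sqrt(p["theta"] / lam) for p in peers]

    lo, hi = 1e-9, 1e9
    for _ in range(200):
        lam = math.sqrt(lo * hi)          # geometric bisection on lam
        if sum(rates(lam)) > total_rate:
            lo = lam                      # rates too large -> increase lam
        else:
            hi = lam
    return rates(math.sqrt(lo * hi))

peers = [
    {"theta": 2500.0, "R0": 50.0},        # steep D-R curve: complex content
    {"theta": 400.0,  "R0": 30.0},        # easy content
]
print(allocate_rates(peers, total_rate=600.0))
```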
Large-scale, widespread distribution of high definition multimedia content over IP networks is extremely resource intensive. Service providers have to employ an expensive network of servers, routers, link infrastructures and set-top boxes to accommodate the generated traffic. The goal in this paper is to develop network-aware media communication solutions that help service providers efficiently utilize their deployed network infrastructures for media delivery. In particular, we investigate the following fundamental problem: given a fixed network infrastructure, what is the best strategy for multicasting multiple multimedia contents from a set of server nodes to a set of clients so as to realize the best reconstruction quality at the client nodes? We use rate-distortion theory to formalize the notion of media quality and to formulate the corresponding optimization problem.
We show that current approaches, in which multimedia compression and network delivery mechanisms are designed separately, are inherently suboptimal. Better utilization of network resources therefore requires joint consideration of media compression and network delivery. We develop one such approach based on optimized delivery of balanced Multiple Description Codes (MDC), in which the MDC itself is also optimized with respect to the optimized delivery strategy. Simulation results are reported, verifying that our solution can significantly outperform existing layered solutions. As a byproduct, our solution introduces a fundamentally different use of MDC. Until now, MDC has been adopted mostly to combat losses in packet-lossy networks. We show that MDC is an efficient tool for network communication even in error-free networks. In particular, MDC, when properly duplicated at routers, can exploit the rich topological structure of networks to maximize the utilization of network resources beyond conventional coding techniques.
In this paper, we investigate a distributed fine grain adaptive FEC (FGA-FEC) scheme for scalable video streaming to heterogeneous users over a congested multihop network, in which FGA-FEC decoding/recoding is performed at selected intermediate overlay nodes and FGA-FEC adaptation at the remaining nodes. To reduce the overall computational burden, we propose two methods: (1) coordination between the optimization processes running at adjacent nodes to reduce the optimization computation, and (2) an extension of our overlay multihop FEC (OM-FEC) to reduce the number of FGA-FEC decode/recode nodes. Simulations show that the proposed scheme can greatly reduce computation and can provide video quality close to the best achievable for diverse users.
For effective noise removal prior to video processing, the noise power or noise variance of an input video sequence needs to
be estimated accurately, which is in practice a difficult task. This paper presents an accurate noise variance estimation
algorithm based on motion compensation between two adjacent noisy pictures. Firstly, motion estimation is performed
for each block in a picture, and the residue variance of the best motion-compensated block is calculated. Then, a noise
variance estimate of the picture is obtained by adaptively averaging and properly scaling the variances close to the best
variance. The simulation results show that the proposed noise estimation algorithm is very accurate and stable
irrespective of noise level.
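A minimal sketch of the estimation idea is given below: compute the residue variance of the best motion-compensated match for each block, keep the smallest residue variances, and average and scale them into a noise-variance estimate. The block matching, adaptive averaging and scaling used here are simplified stand-ins for the paper's procedure.

```python
# Sketch: noise-variance estimation from motion-compensated block residues
# between two adjacent noisy frames (simplified full-search block matching).
import numpy as np

def estimate_noise_variance(prev, curr, block=16, search=4, keep_ratio=0.25):
    h, w = curr.shape
    residue_vars = []
    for y in range(search, h - block - search, block):
        for x in range(search, w - block - search, block):
            cur_blk = curr[y:y + block, x:x + block].astype(float)
            best = None
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    ref = prev[y + dy:y + dy + block, x + dx:x + dx + block].astype(float)
                    v = np.var(cur_blk - ref)
                    best = v if best is None else min(best, v)
            residue_vars.append(best)
    residue_vars.sort()
    kept = residue_vars[:max(1, int(keep_ratio * len(residue_vars)))]
    # The residue of two noisy frames carries noise from both frames, so
    # divide by 2 (assumes independent, equal-variance noise in both frames).
    return float(np.mean(kept)) / 2.0

# Example with synthetic noisy frames (true noise variance = 25).
rng = np.random.default_rng(0)
clean = rng.integers(0, 256, size=(144, 176)).astype(float)
f0 = clean + rng.normal(0, 5, clean.shape)
f1 = clean + rng.normal(0, 5, clean.shape)
print(estimate_noise_variance(f0, f1))
```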
In this paper, we propose a new representation of the location of the target, aimed at object tracking with non-stationary cameras and non-rigid motion: the area-weighted mean of the centroids corresponding to each color bin of the target. With this representation, the target can be localized in the next frame by a direct one-step computation. Tracking based on this representation has several advantages: it remains possible at low frame rates, it tolerates partial occlusion, and it is fast owing to the one-step computation. We also propose a background feature elimination algorithm, based on level-set bimodal segmentation, which is incorporated into the tracking scheme to increase its robustness.
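The sketch below illustrates the proposed representation on a single patch: the centroid of the pixels in each color bin, and the area-weighted mean of those centroids, which serves as the location estimate. Intensity bins stand in for color bins, and the level-set background-feature elimination is not included.

```python
# Sketch of the target representation: per-bin centroids and their
# area-weighted mean (intensity bins used here instead of color bins).
import numpy as np

def bin_centroids_and_location(patch_gray, n_bins=16):
    h, w = patch_gray.shape
    ys, xs = np.mgrid[0:h, 0:w]
    bins = (patch_gray.astype(int) * n_bins // 256).ravel()
    ys, xs = ys.ravel(), xs.ravel()

    centroids, areas = [], []
    for b in range(n_bins):
        mask = bins == b
        area = int(mask.sum())
        if area == 0:
            continue
        centroids.append((ys[mask].mean(), xs[mask].mean()))  # centroid of this bin
        areas.append(area)

    centroids = np.asarray(centroids)
    areas = np.asarray(areas, dtype=float)
    location = (areas[:, None] * centroids).sum(axis=0) / areas.sum()
    return centroids, location   # per-bin centroids and their area-weighted mean

patch = np.random.randint(0, 256, size=(40, 40))
_, loc = bin_centroids_and_location(patch)
print(loc)   # (row, column) location estimate within the patch
```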
One of the most frequently used coding modes in H.264 is the skip mode. In the conventional approach, a macroblock is switched to skip mode only after the best RD mode has been computed and the resulting block of prediction-error coefficients has been quantized entirely to zero. This wastes computational resources, because skip mode does not require the forward transform and quantization. In this paper, the skip mode condition is checked for the macroblock prior to multi-block motion estimation. If the condition is satisfied, motion estimation is not performed, which drastically reduces computation. The condition considers the zero-block property after the 4x4 block transform/quantization and caters for the noise inherent in natural video images. In addition, the color components are also taken into consideration for the skip mode decision. Experimental results show that the approach can improve encoder speed greatly with negligible bit rate increase or PSNR degradation.
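The following is a hedged sketch of a skip-mode pre-check of this kind: form the skip-mode prediction, and declare skip if every 4x4 residual sub-block falls below a QP-dependent zero-block threshold, so motion estimation and the transform/quantization can be bypassed. The threshold formula, noise margin and chroma handling here are illustrative assumptions, not the paper's exact condition.

```python
# Sketch of a skip-mode pre-check: compare per-4x4-block residual SAD against
# a QP-dependent zero-block threshold before running motion estimation.
import numpy as np

def quant_step(qp):
    # H.264 quantization step roughly doubles every 6 QP values
    # (Qstep ~ 0.625 * 2^(QP/6)).
    return 0.625 * 2.0 ** (qp / 6.0)

def is_skip_candidate(mb_curr, mb_pred, qp, alpha=2.0):
    # alpha scales the zero-block threshold; chosen here ad hoc.
    thr = alpha * quant_step(qp)          # per-4x4-block SAD threshold (assumed)
    resid = np.abs(mb_curr.astype(float) - mb_pred.astype(float))
    for y in range(0, 16, 4):
        for x in range(0, 16, 4):
            if resid[y:y + 4, x:x + 4].sum() > thr:
                return False              # some sub-block would not quantize to zero
    return True                           # safe to take skip mode, no ME needed

curr = np.random.randint(0, 256, (16, 16))
pred = curr + np.random.randint(-1, 2, (16, 16))   # nearly identical co-located MB
print(is_skip_candidate(curr, pred, qp=32))
```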
Network video cameras, invented in the last decade or so, today permit pervasive, wide-area visual surveillance. However, because of the vast amounts of visual data that such cameras produce, human-operator monitoring is not possible and automatic algorithms are needed. One monitoring task of particular interest is the detection of suspicious behavior, i.e., identification of individuals or objects whose behavior differs from the behavior usually observed. Many methods based on object path analysis have been developed to date (motion detection followed by tracking and inferencing), but they are sensitive to motion detection and tracking errors and are also computationally complex. We propose a new surveillance method capable of abnormal behavior detection without explicit estimation of object paths. Our method is based on a simple model of video dynamics. We propose one practical implementation of this general model via temporal aggregation of motion detection labels. Our method requires little processing power and memory, is robust to motion segmentation errors, and is general enough to monitor humans, cars or any other moving objects in uncluttered as well as highly cluttered scenes. Furthermore, on account of its simplicity, our method can provide performance guarantees. It is also robust in harsh environments (jittery cameras, rain/snow/fog).
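A minimal sketch of abnormality detection by temporal aggregation of motion-detection labels is shown below. The window length, threshold and decision rule are arbitrary illustrations; the paper's model of video dynamics is not reproduced.

```python
# Sketch: flag pixels whose motion-detection labels, aggregated over a
# temporal window, deviate from the usually observed activity level.
import numpy as np

def abnormality_map(motion_labels, window=100, threshold=0.6):
    # motion_labels: (T, H, W) array of binary motion-detection masks.
    # Aggregate labels over the last 'window' frames at every pixel; pixels
    # "in motion" for an unusually large fraction of the window are flagged
    # (e.g. loitering, an abandoned object, motion where none is usual).
    recent = motion_labels[-window:]
    activity = recent.mean(axis=0)            # fraction of frames with motion
    return activity > threshold               # boolean abnormality mask

T, H, W = 200, 60, 80
labels = np.zeros((T, H, W), dtype=np.uint8)
labels[:, 20:30, 40:50] = 1                   # persistent motion in one region
print(abnormality_map(labels).sum(), "pixels flagged")
```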
A method is introduced to track an object's motion and estimate its pose from multiple cameras. We focus on direct estimation of the 3D pose from 2D image sequences. We derive a distributed solution that is equivalent to centralized pose estimation from multiple cameras. Moreover, we show that, by using a proper rotation between each camera and a fixed camera view, we can rely on independent pose estimation from each camera. We then propose a robust solution to the centralized pose estimation problem by deriving a best linear unbiased estimate from the rotated pose estimates obtained from each camera. The resulting pose estimate is therefore robust to errors from specific camera views. Moreover, the distributed solution is computationally efficient, with complexity that grows linearly with the number of camera views. Finally, computer simulation experiments demonstrate that our algorithm is fast and accurate.
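The fusion step can be sketched as follows: rotate each camera's pose estimate into a common reference frame and combine the estimates with inverse-covariance (best linear unbiased) weighting. The per-camera pose estimators and the exact parameterization used in the paper are not shown; the covariances and rotations below are illustrative.

```python
# Sketch: BLUE fusion of per-camera pose estimates after rotating them into
# a common reference frame (inverse-covariance weighting).
import numpy as np

def blue_fusion(estimates, covariances, rotations):
    # estimates[i]:   3-vector pose estimate from camera i
    # covariances[i]: 3x3 error covariance of that estimate
    # rotations[i]:   3x3 rotation taking camera i's frame to the reference frame
    info_sum = np.zeros((3, 3))
    weighted_sum = np.zeros(3)
    for x, C, R in zip(estimates, covariances, rotations):
        x_ref = R @ x                       # express the estimate in the reference frame
        C_ref = R @ C @ R.T                 # propagate its covariance
        W = np.linalg.inv(C_ref)            # information (inverse covariance)
        info_sum += W
        weighted_sum += W @ x_ref
    return np.linalg.solve(info_sum, weighted_sum)

# Two cameras, the second noisier, both observing the same pose [0.1, -0.2, 0.05].
R_id = np.eye(3)
est = [np.array([0.11, -0.19, 0.05]), np.array([0.05, -0.25, 0.10])]
cov = [0.01 * np.eye(3), 0.05 * np.eye(3)]
print(blue_fusion(est, cov, [R_id, R_id]))
```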
This paper proposes a novel technique for estimating focused video frames captured by an out-of-focus moving camera. It builds on the idea of Depth from Defocus (DFD), but overcomes the shortcomings of DFD by reformulating the problem in a computer vision framework. It introduces a moving-camera scenario and explores the relationship between the camera motion and the resulting blur characteristics in the captured images. This knowledge leads to successful blur estimation and focused image estimation. The performance of the algorithm is demonstrated through error analysis and computer-simulated experiments.
This paper presents a novel particle allocation approach to particle filtering for multiple object tracking which
minimizes the total tracking distortion given a fixed number of particles over a video sequence. Under the
framework of distributed multiple object tracking, we propose the dynamic proposal variance and optimal particle
number allocation algorithm for multi-object tracking to allocate particles among multiple targets as well as
multiple frames. Experimental results show the superior performance of our proposed algorithm compared to traditional
particle allocation methods, i.e., a fixed number of particles for each object in each frame. To the best of
our knowledge, our approach is the first to provide an optimal allocation of a fixed number of particles among
multiple objects and multiple frames.
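For contrast with equal allocation, the sketch below splits a fixed particle budget among targets in proportion to each target's current uncertainty (trace of its state covariance). This simple proportional rule is only an assumed illustration; the paper derives the allocation from a tracking-distortion model over both objects and frames.

```python
# Sketch: uncertainty-proportional particle allocation among multiple targets,
# as a simple baseline against equal per-target allocation.
import numpy as np

def allocate_particles(total_particles, covariances, min_per_target=20):
    uncertainty = np.array([np.trace(C) for C in covariances], dtype=float)
    weights = uncertainty / uncertainty.sum()
    counts = np.maximum(min_per_target, np.round(weights * total_particles)).astype(int)
    # Re-balance so the counts sum exactly to the budget.
    while counts.sum() > total_particles:
        counts[counts.argmax()] -= 1
    while counts.sum() < total_particles:
        counts[counts.argmin()] += 1
    return counts

covs = [np.diag([1.0, 1.0]), np.diag([9.0, 9.0]), np.diag([4.0, 4.0])]
print(allocate_particles(600, covs))   # the harder target receives more particles
```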
In recent years, the amount of streaming video has grown rapidly on the Web. Often, retrieving these streaming videos
offers the challenge of indexing and analyzing the media in real time because the streams must be treated as effectively
infinite in length, thus precluding offline processing. Generally speaking, captions are important semantic clues for video
indexing and retrieval. However, existing caption detection methods often have difficulty performing real-time detection
for streaming video, and few of them address differentiating captions from scene text and scrolling text. In
general, these texts have different roles in streaming video retrieval. To overcome these difficulties, this paper proposes a
novel approach which explores the inter-frame correlation analysis and wavelet-domain modeling for real-time caption
detection in streaming video. In our approach, the inter-frame correlation information is used to distinguish caption texts
from scene texts and scrolling texts. Moreover, wavelet-domain Generalized Gaussian Models (GGMs) are utilized to
automatically remove non-text regions from each frame and only keep caption regions for further processing.
Experimental results show that our approach is able to offer real-time caption detection with high recall and a low false
alarm rate, and can also effectively discern caption text from the other kinds of text even at low resolutions.
An image can be considered as a collection of small regions. Most image-understanding research extracts features of these regions and investigates relationships between the regions and keywords of images that are annotated manually. Some studies also explore the ontology of words. However, little attention has been paid to the relationships among regions within an image. In this paper, we make a close study of this type of relationship for visual content understanding, without assuming that the regions are independent. We first analyze the co-occurrence of regions using a statistical relevance probability (SRP) model. Since human attention in the perception of an image first focuses on one region and then moves on to other relevant regions, we propose a novel model, called the region sequence prediction (RSP) model, to describe this behavior. In RSP, annotation keywords for region sequences of the image and their probabilities are generated by a hidden Markov model. Experimental results for both image annotation and retrieval on the Corel dataset (an open image dataset) show that mining the relationships of image regions achieves comparable or better performance in visual content understanding.
This paper proposes an efficient multi-ranking algorithm for content-based image retrieval based on view selection.
The algorithm treats multiple sets of features as views, and selects effective ones from them for ranking tasks
using a data-driven training algorithm. A set of views with different weights are obtained through interaction
between all the views by both self-enforcement and co-reduction. The final sets of views are quite small and
reasonable, yet the effectiveness of original feature sets is preserved. This algorithm provides the potential of
scaling up to large data sets without losing retrieval accuracy. Our experimental retrieval results on real world
image sets demonstrate the effectiveness and efficiency of our framework.
The light field is a novel image-based representation of a 3D object, in which the object is described by a group of images captured from many viewpoints; it is independent of the complexity of the 3D scene or objects. Exploiting this advantage, we propose a 3D object retrieval framework based on the light field. An effective distance measure through subspace analysis of light field data is defined, and our method makes use of the structural information hidden in the images of the light field. To obtain a more reasonable distance measure, the distance in low-dimensional spaces is supplemented. Additionally, our algorithm can handle arbitrary camera numbers and positions when capturing the light field. In our experiments, a standard 3D object database is adopted, and our proposed distance measure shows better performance than the "LFD" in 3D object retrieval and recognition.
In this paper, we explore the complexity-performance trade-offs for camera surveillance applications. For this
purpose, we propose a Scalable Video Codec (SVC), based on wavelet transformation in which we have adopted a
t+2D architecture. Complexity is adjusted by adapting the configuration of the lifting-based motion-compensated
temporal filtering (MCTF). We discuss various configurations and have found an SVC that has a scalable complexity
and performance, enabling embedded applications. The paper discusses the trade-offs among coder complexity (e.g., the number of motion-compensation stages), compression efficiency and end-to-end delay of the video coding chain. Our SVC has a lower complexity than H.264 SVC, while its quality at full resolution is close to that of H.264 SVC (within 1 dB for surveillance-type video at 4CIF, 60 Hz) and at lower resolutions is sufficient for our video surveillance application.
H.264/AVC Scalable Video Coding (SVC) is an emerging video coding standard developed by the Joint Video Team
(JVT), which supports multiple scalability features. With scalabilities, SVC video data can be easily adapted to the
characteristics of heterogeneous networks and various devices. Furthermore, SVC targets a coding efficiency that is competitive with or better than single-layer H.264/AVC. Motion prediction at the level of Fine Grain Scalability (FGS) enhancement layers was proposed to improve coding efficiency as well as inter-layer motion prediction. However, removing the FGS enhancement layer used for inter-layer motion prediction causes significant visual errors due to encoder-decoder mismatches of motion vectors and MB modes. In this paper, we analyze the visual errors to find their cause
as well as the method for reducing such errors. Experimental results showed that the proposed method allowed SVC
bitstreams decoding with reduced visual errors, even when the FGS enhancement layer used for the inter-layer motion
prediction was removed.
In this paper, a fast intermode decision scheme is introduced that is suited to the hierarchical B-picture structure, in which much computational power is spent on combined variable block sizes and bi-predictive motion estimation. In the proposed method, hypothesis testing that considers the characteristics of the hierarchical B-picture structure is performed on 16x16 and 8x8 blocks to allow early termination of the RD computation over all possible modes. The early termination in intermode decision is performed by comparing the pixel values of the current blocks with those of the corresponding motion-compensated blocks. When the hypothesis tests are performed, the confidence intervals for accepting or rejecting the null hypothesis are chosen according to the temporal scalability level, taking the properties of hierarchical B-pictures into consideration. The proposed scheme exhibits effective early termination in intermode decision across temporal scalability levels and leads to a reduction of up to 69% in computational complexity with a slight increase in bit rate. The degradation of visual quality turns out to be negligible in terms of PSNR.
For rate-distortion optimized rate allocation in JVT Scalable Video Coding (SVC), the distortion impact of every FGS NAL unit on the global reconstruction quality is computed by repeated bitstream decoding, which leads to high complexity. In this paper, a fast rate allocation algorithm based on model-based distortion estimation is proposed. Based on the hypothesis that DCT residual coefficients follow a Laplacian distribution, we establish the distortion estimation model by calculating the quantization error of each FGS NAL unit and analyzing the prediction in the hierarchical B coding structure. In addition, the model parameter is updated according to the distribution of the residual coefficients decoded at the base layer within every frame. Experimental results show that, compared to the existing method of R-D optimized rate allocation in SVC, the proposed method reduces decoding time by nearly 50% and saves 45.3% of the rate allocation runtime, while the PSNR loss of the decoded sequence is only 0.04 dB on average.
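The modeling idea can be illustrated with a small Monte Carlo estimate of quantization distortion for a Laplacian residual and a uniform quantizer, shown below. The closed-form per-NAL-unit model and the hierarchical-B prediction analysis in the paper are not reproduced; the parameter values are arbitrary.

```python
# Sketch: Monte Carlo estimate of E[(x - Q(x))^2] for a Laplacian residual
# quantized with a uniform mid-tread quantizer of step delta.
import numpy as np

def laplacian_quant_distortion(lam, delta, n_samples=200000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.laplace(loc=0.0, scale=1.0 / lam, size=n_samples)
    q = np.round(x / delta) * delta          # uniform mid-tread quantizer
    return float(np.mean((x - q) ** 2))

# Distortion drops as each FGS refinement roughly halves the quantization step.
for delta in (8.0, 4.0, 2.0):
    print(delta, laplacian_quant_distortion(lam=0.2, delta=delta))
```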
Fine-Granular SNR scalable (FGS) technologies in H.264/AVC-based scalable video coding (SVC) provide a flexible
and effective foundation for scaling the FGS enhancement layer (EL) to accommodate different and variable network capacities. To support smooth quality extraction of SVC FGS videos, it is important to obtain the rate-distortion (R-D) function of each picture or group of pictures (GOP). In this paper, we first review the R-D analysis of SVC FGS coding from our prior work. With this analysis and the associated models, we present the virtual GOP concept and propose a virtual-GOP-based packet scheduling algorithm to obtain the optimal packet scheduling order within a virtual GOP. Based on the packet scheduling algorithm and the R-D analysis of the FGS EL, an effective and flexible D-R model is proposed to describe the D-R function of the virtual GOP. With this R-D model of virtual GOPs, a practical non-search
algorithm for smooth quality reconstruction is introduced. Compared to the quality layer method, the reconstructed video
quality is improved not only objectively but also subjectively.
This paper presents a technique for coding high dynamic range videos. The proposed coding scheme is scalable, such that both standard dynamic range and high dynamic range representations of a video can be extracted from one bit stream. A localized inverse tone mapping method is proposed for efficient inter-layer prediction, which applies a scaling factor and an offset to each macroblock, per color channel. The scaling factors and offsets are predicted from neighboring macroblocks, and then the differences are entropy coded. The proposed inter-layer prediction technique is independent of the forward tone mapping method and is able to cover a wide range of bit-depths and various color spaces. Simulations are performed based on H.264/AVC SVC common software and core experiment conditions. Results show the effectiveness of the proposed method.
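The inter-layer prediction idea can be sketched as below: fit one scale and one offset per macroblock and color channel that map the standard-dynamic-range macroblock to its HDR counterpart, and code the residual along with the predicted scale/offset differences. The least-squares fit and the synthetic data are illustrative assumptions, not the proposal's exact method.

```python
# Sketch: per-macroblock scale + offset inverse tone mapping used as an
# inter-layer prediction from the SDR layer to the HDR layer (one channel).
import numpy as np

def fit_scale_offset(sdr_mb, hdr_mb):
    # Least-squares fit of hdr ~ scale * sdr + offset for one MB and channel.
    s = sdr_mb.astype(float).ravel()
    h = hdr_mb.astype(float).ravel()
    A = np.stack([s, np.ones_like(s)], axis=1)
    (scale, offset), *_ = np.linalg.lstsq(A, h, rcond=None)
    return scale, offset

def interlayer_prediction(sdr_mb, scale, offset):
    return scale * sdr_mb.astype(float) + offset

sdr = np.random.randint(0, 256, (16, 16))
hdr = 4.1 * sdr + 37 + np.random.normal(0, 2, (16, 16))   # synthetic HDR macroblock
scale, offset = fit_scale_offset(sdr, hdr)
residual = hdr - interlayer_prediction(sdr, scale, offset)
print(round(scale, 2), round(offset, 1), float(np.abs(residual).mean()))
```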
This paper develops a new intraframe scalable coding framework based on a subband/wavelet coding approach for MPEG-4 AVC/H.264 scalable video coding (SVC). It joins subband filter banks with the traditional macroblock- and DCT-based video coding system. We demonstrate that the current H.264 coding system can be efficiently integrated with traditional subband filter banks to provide improved efficiency for intraframe scalable coding. More importantly, unlike classical wavelet coding, the proposed framework still allows the downsampling filter to be designed flexibly to generate the ideal low-resolution video for target applications. Under a wavelet critical-sampling setting, the proposed system performs similarly to a conventional single-layer (non-scalable) coder without any performance overhead. Thus, by simply reusing the existing H.264 coding tools together with the
added subband filter banks, we can provide the H.264 standard with an alternative intraframe coding scheme that is primarily based on the transform coding approach, offers additional spatial scalability and other attractive benefits, and is useful for both scalable and conventional single-layer intraframe coding applications. The proposed algorithm has been thoroughly evaluated against the current SVC test model JSVM, Motion-JPEG2000, and H.264 high-profile intra coding through extensive coding experiments for both scalable and single-layer coding. The simulation results show that the proposed algorithm consistently outperforms JSVM and is competitive with Motion-JPEG2000 in PSNR performance. As such, the efficient and highly scalable wavelet image/video compression demonstrated by JPEG2000 can additionally be accommodated by a slightly modified MPEG-4 AVC/H.264 standard with low extra implementation cost. Image and video coding applications, traditionally served by separate coders, can thus be efficiently provided by an integrated coding system.
In recent years, digital imaging devices have become an integral part of our daily lives owing to advancements in imaging, storage and wireless communication technologies. Power-rate-distortion (P-R-D) efficiency is the key factor common to all resource-constrained portable devices. In addition, especially in real-time wireless multimedia applications, channel-adaptive and error-resilient source coding techniques should be considered in conjunction with P-R-D efficiency, since Automatic Repeat-reQuest (ARQ) and Forward Error Correction (FEC) are often either not feasible or costly in terms of bandwidth efficiency and delay. In this work, we focus on real-time video communication for resource-constrained devices over bandwidth-limited and lossy channels, and propose an analytic power-channel error-rate-distortion (P-E-R-D) model. In particular, the probabilities of macroblock coding modes are intelligently controlled through an optimization process according to their distinct rate-distortion-complexity performance for a given channel error rate. The
framework provides theoretical guidelines for the joint analysis of error resilient source coding and resource allocation.
Experimental results show that our optimal framework provides consistent rate-distortion performance gain under different
power constraints.
In mobile-to-mobile video communications, both the transmitting and receiving ends may not have the necessary
computing power to perform complex video compression and decompression tasks. Traditional video codecs
typically have highly complex encoders and less complex decoders. However, Wyner-Ziv (WZ) coding allows for
a low complexity encoder at the price of a more complex decoder. We propose a video communication system
where the transmitter uses a WZ (reversed complexity) coder, while the receiver uses a traditional decoder,
hence minimizing complexity at both ends. To make this work, we propose inserting a transcoder in the network to
convert the video stream. We present an efficient transcoder from a simple WZ approach to H.263. Our approach
saves a large amount of the computation by reusing the motion estimation performed at the WZ decoder stage,
among other things. Results are presented to demonstrate the transcoder performance.
In this paper, a redundant picture formation algorithm that takes into account a given redundancy rate constraint is
presented for error resilient wireless video transmission without reliance on retransmissions. The algorithm assigns
priorities to MBs according to two suggested metrics and ranks macroblocks accordingly. The first metric is based on an
end-to-end distortion model and aims at maximising the reduction in distortion per redundancy bit. The end-to-end
distortion accounts for the effects of error propagation, mismatch between the primary and redundant descriptions, and
error concealment. Macroblocks providing large distortion reduction for fewer bits spent are assigned a higher priority.
The second metric employs the variance of the motion vectors of a macroblock and those of its neighbouring blocks.
Results show that the rate distortion metric outperforms other examined metrics by up to 2dB. Moreover, gains over
existing error resilience schemes, such as LA-RDO, are presented.
Compressed video is very sensitive to channel errors. A few bit losses can stop the entire decoding process.
Therefore, protecting compressed video is always necessary for reliable visual communications. In recent years,
Wyner-Ziv lossy coding has been used for error resilience and has achieved improvement over conventional
techniques. In our previous work, we proposed an unequal error protection algorithm for protecting data elements
in a video stream using a Wyner-Ziv codec. We also presented an improved method by adapting the parity
data rates of protected video information to the video content. In this paper, we describe a feedback aided error
resilience technique, based on Wyner-Ziv coding. By utilizing feedback regarding current channel packet-loss
rates, a turbo coder can adaptively adjust the amount of parity bits needed for correcting corrupted slices at the
decoder. This results in an efficient usage of the data rate budget for Wyner-Ziv coding while maintaining good
quality decoded video when the data has been corrupted by transmission errors.
Recently, several methods have been proposed that divide the original bitstream of a progressive wavelet-based image/video source coder into multiple correlated substreams. The principle behind transmitting multiple independent substreams is to generate multiple descriptions of the source such that graceful degradation is achieved when transmitting over severely fading channels and lossy packet networks, since some of the streams may still be recovered. Noting that multiple substreams can benefit from multiple independent channel paths, we naturally consider Multi-Input Multi-Output (MIMO) communication systems, where we obtain multiple independent fading channels. Depending on several factors, including the number of antennas employed, the transmission energy, the Doppler shift (due to the motion between the transmitter antenna and receiver antennas), the total transmission rate and the distortion-rate (D-R) behavior of the source, there exists an optimal number of balanced substreams and an optimal joint source-channel coding policy such that the expected distortion at the receiver is minimized. In this paper, we derive an expected distortion function at the receiver based on all of these parameters and provide a fast real-time numerical technique to find the optimal or near-optimal number of balanced substreams to be transmitted. This expected distortion is based on our derivation of the probabilistic loss patterns of a balanced multiple-substream progressive source coder. The accuracy of the derived expected distortion estimator is confirmed by Monte Carlo simulation employing Dent's modification of Jakes' model. By accurately estimating the optimal number of balanced substreams to be transmitted, a substantial gain in visual quality at low and intermediate signal-to-noise ratio (SNR) is obtained over severely fading channels. At high SNR, the single-stream source coder's source efficiency makes it slightly better than the multiple-substream source coder. Overall, using our analytic development, we provide a systematic, real-time, non-ad-hoc method that achieves high-quality image and video at low and moderate signal-to-noise ratios in severely fading channels for single and multiple antenna systems.
This paper demonstrates the effectiveness of employing MDC (multiple-description coding) as a video decomposition for
transmission in the presence of error bursts, with application to both SISO and MIMO-STBC (space-time block coding)
systems envisaged. Various trade-offs involving video encoding parameters are investigated that offer improved
performance and reduce the decoding delay for the given channel conditions, based on the Gilbert-Elliot channel model.
This results in a joint source-channel coding approach that significantly enhances the quality of the transmitted video.
While interleaving without the use of MDC does yield improvements in average PSNR of up to 1dB, these may not be
justified given the high decoding delay incurred. The use of MDC increases these improvements to over 2dB, and also
outperforms SDC coupled with cross-packet FEC. In addition, when FEC is combined with MDC, a gain of up to 2dB is
obtained compared to the equivalent SDC+FEC scheme (for average PERs above 5%), and up to 5dB compared to the
values obtained from simple SDC interleaving.
Unlike simple interleaving, the use of MDC and FEC entails some increased complexity and a decrease in error-free
quality. Simple interleaving, however, cannot achieve the gains available from MDC/FEC in the presence of error bursts,
irrespective of the interleaving depth employed and the resulting decoding delay.
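For reference, the Gilbert-Elliott channel used as the loss model above can be simulated with a two-state Markov chain with state-dependent loss probabilities, as in the sketch below. The transition and loss probabilities are arbitrary example values, not those used in the experiments.

```python
# Sketch: Gilbert-Elliott burst-loss channel with a "good" and a "bad" state
# and state-dependent packet-loss probabilities.
import numpy as np

def gilbert_elliott_losses(n_packets, p_gb=0.02, p_bg=0.25,
                           loss_good=0.001, loss_bad=0.3, seed=1):
    rng = np.random.default_rng(seed)
    losses = np.zeros(n_packets, dtype=bool)
    bad = False
    for i in range(n_packets):
        # State transition: good->bad with prob p_gb, bad->good with prob p_bg.
        bad = (rng.random() < p_gb) if not bad else (rng.random() >= p_bg)
        losses[i] = rng.random() < (loss_bad if bad else loss_good)
    return losses

loss_pattern = gilbert_elliott_losses(10000)
print("average PER:", loss_pattern.mean())
```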
Progressive transmission of images is an important functionality for communicating high resolution images over limited bandwidth networks. By encoding the image data in an accessible and hierarchical format, the JPEG 2000 standard supports many types of image progressions, e.g., based on quality, resolution, component and position. This paper considers a progressive transmission scheme in which codestream ordering and transmission decisions are driven entirely by the server, which is useful for classes of applications that employ image analysis at the server and perform streaming based on the results of this analysis. The proposed system aims to minimize signaling overhead and allow for incremental decoding and display with minimal processing delay. It also aims to fully exploit the various styles of progression that are enabled by the JPEG 2000 coding format. The performance of our proposed scheme is reported in terms of signaling overhead, complexity and visual effectiveness.
Low-bitrate digital video often suffers from the artifact of texture flattening. Texture synthesis can be used to revive the
removed texture. Patch-based synthesis provides a quite general method for texture synthesis. However, this method still
requires a substantial bitrate to transmit example patches. We propose a method for stochastic texture synthesis which
requires only a very low bitrate (less than 1 kbit/sec) that can replace patch-based synthesis for random textures. Spatial
correlation is modeled as a 2-dimensional Moving Average (MA) process. To achieve a faithful representation of
temporal evolution, we use a translation+scaling motion model combined with a finite impulse response (FIR) filter.
Experiments show that we can successfully reduce texture flattening for a range of random textures such as grass and
roads.
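The spatial part of the model can be sketched by driving a 2D moving-average (MA) filter with white noise, which yields a spatially correlated random texture. The kernel below is an arbitrary smoothing kernel; in the described method, the MA coefficients and the temporal translation+scaling/FIR model are estimated from the source texture and transmitted at a very low bitrate.

```python
# Sketch: synthesize a spatially correlated random texture by filtering white
# noise with a 2D moving-average (MA) kernel.
import numpy as np
from scipy.signal import convolve2d

def synthesize_ma_texture(height, width, kernel, mean=128.0, std=40.0, seed=0):
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((height, width))
    ma = convolve2d(noise, kernel, mode="same", boundary="wrap")
    ma = (ma - ma.mean()) / ma.std()             # renormalize after filtering
    return np.clip(mean + std * ma, 0, 255).astype(np.uint8)

# Slightly anisotropic 5x5 MA kernel (illustrative values only).
kernel = np.outer([1, 2, 3, 2, 1], [1, 1, 2, 1, 1]).astype(float)
texture = synthesize_ma_texture(144, 176, kernel / kernel.sum())
print(texture.shape, texture.dtype)
```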
All existing video coding standards are based on block-wise motion compensation and block-wise DCT. At high
levels of quantization, block-wise motion compensation and transform coding produce blocking artifacts in the decoded
video, a form of distortion to which the human visual system is very sensitive. The latest video coding standard,
H.264/AVC, introduces a deblocking filter to reduce the blocking artifacts. However, there is still visible distortion
after the filtering when compared to the original video. In this paper, we propose a non-conventional filter to
further reduce the distortion and to improve the decoded picture quality. Different from conventional filters, the
proposed filter is based on a machine learning algorithm (decision tree). The decision trees are used to classify
the filter's inputs and select the best filter coefficients for the inputs. Experimental results with 4 × 4 DCT
indicate that using the filter holds promise in improving the quality of H.264/AVC video sequences.
A 3D fuzzy-filtering scheme is proposed for reduction of compression artifacts such as blocking and ringing noises. The
proposed scheme incorporates information from temporally-neighboring frames as well as from spatially-neighboring
pixels by accounting for the spatio-temporal relationships in the definitions of spatial-rank orders and spread information
for the fuzzy-filter. Extra information from a 3D set of pixels of the surrounding frames helps enhance the clustering
characteristic of the fuzzy filter while preserving the frame edges. The proposed scheme also exploits the chroma
components from neighboring frames to reconstruct the color of the current frame more faithfully. The experimental
results show that both the subjective and the objective qualities of post-processed video are significantly improved.
The quantization matrix is an important encoding tool for discrete cosine transform (DCT) based perceptual image/video encoding, in that DCT coefficients can be quantized according to the sensitivity of the human visual system to the coefficients' corresponding spatial frequencies. A quadratic model is introduced to parameterize the quantization matrices. This model is then used to optimize quantization matrices for a specific bitrate or bitrate range by maximizing the expected encoding quality via a trial-based multidimensional numerical search method. The model is simple, yet it characterizes the slope and the convexity of the quantization matrices along the horizontal, vertical and diagonal directions. The advantage of the model for improving perceptual video encoding quality is demonstrated with simulations using H.264/AVC video encoding.
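A quadratic parameterization of this kind can be sketched as follows: each entry of an 8x8 quantization matrix is a quadratic function of its horizontal and vertical frequency indices, so a few parameters control the slope and convexity along the horizontal, vertical and diagonal directions. The parameter names, values and clipping below are illustrative assumptions, not the paper's model.

```python
# Sketch: an 8x8 quantization matrix parameterized by a quadratic function of
# the horizontal (v) and vertical (u) frequency indices.
import numpy as np

def quadratic_qmatrix(base, slope_h, slope_v, slope_d, curv_h, curv_v, size=8):
    u, v = np.meshgrid(np.arange(size), np.arange(size), indexing="ij")
    q = (base
         + slope_h * v + slope_v * u + slope_d * u * v
         + curv_h * v**2 + curv_v * u**2)
    return np.clip(np.round(q), 1, 255).astype(int)

# A matrix that grows (and curves upward) toward high frequencies, which a
# numerical search could then tune for a target bitrate range.
print(quadratic_qmatrix(base=16, slope_h=2.0, slope_v=2.0, slope_d=0.3,
                        curv_h=0.2, curv_v=0.2))
```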
In this paper, we introduce a systematic approach to configuring video encoding parameters for optimal encoding. The determination of optimal video encoding parameters is formulated as an optimization problem of maximizing the expected video encoding quality under a set of constraints that may include a video quality measure, a target bitrate, computation, memory bandwidth, etc. We use the Video Quality Metric (VQM), an objective measurement paradigm for video quality, to assess the expected encoding performance. The optimization problem can be solved through an efficient multidimensional numerical search, the direct simplex search method, by encoding various sequences with different encoding parameter settings. We illustrate the approach by determining the parameters that enable optimal MB-level quantization parameter adaptation in H.264/AVC.
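Driving the parameter search with a direct simplex method can be sketched as below using scipy's Nelder-Mead implementation. The objective here is a smooth stand-in; in the described approach, each evaluation would encode test sequences with the candidate parameters and score them with VQM under the bitrate and resource constraints.

```python
# Sketch: direct simplex (Nelder-Mead) search over two encoder parameters,
# with a hypothetical surrogate objective standing in for encode-and-measure.
import numpy as np
from scipy.optimize import minimize

def encoding_cost(params):
    # Hypothetical smooth surrogate for "negative quality + rate penalty" as a
    # function of two adaptation parameters; replace with a real encode + VQM.
    a, b = params
    return (a - 1.5) ** 2 + 0.5 * (b + 0.7) ** 2 + 0.1 * abs(a * b)

result = minimize(encoding_cost, x0=np.array([0.0, 0.0]), method="Nelder-Mead",
                  options={"xatol": 1e-3, "fatol": 1e-3, "maxiter": 500})
print(result.x, result.fun)
```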
Codecs such as H.264/AVC involve computationally intensive tasks that often prohibit real-time implementation. It has been observed that the complexity of such video encoders can be tuned gracefully to a desired level by using a smaller set of macroblock types in mode decision and a lower motion vector precision in motion estimation. The rate-distortion performance, however, is affected as a consequence. In this paper, we propose a flexible syntax mechanism (FSM) to tune the encoder complexity while maintaining sufficient rate-distortion performance. The key idea inherent in the proposed FSM is twofold: first, the specification, at a higher level of the bitstream syntax, of both the subset of macroblock types and the precision of the motion vectors to be evaluated by the encoder; and second, the corresponding redesign of the entropy coders to effectively represent the selected macroblock types and motion vectors. Since the entropy coding is optimized in terms of bitrate consumption specifically for the chosen subset of macroblock modes and motion vector precision, the rate-distortion performance is enhanced compared to the scenario where identical entropy codes are adopted regardless. Another advantage of our approach is its intrinsic complexity scalability for video encoding under different complexity constraints. The proposed approach may be considered for the next generation of video codecs with flexible complexity profiles.
Sign language users are eager for the freedom and convenience of video communication over cellular devices. Compression of sign language video in this setting offers unique challenges. The low bitrates available make encoding decisions extremely important, while the power constraints of the device limit the encoder complexity.
The ultimate goal is to maximize the intelligibility of the conversation given the rate-constrained cellular channel and power constrained encoding device. This paper uses an objective measure of intelligibility, based on subjective testing with members of the Deaf community, for rate-distortion optimization of sign language video within the H.264 framework. Performance bounds are established by using the intelligibility metric in a Lagrangian cost function along with a trellis search to make optimal mode and quantizer decisions for each macroblock. The optimal QP values are analyzed and the unique structure of sign language is exploited in order to reduce
complexity by three orders of magnitude relative to the trellis search technique with no loss in rate-distortion performance. Further reductions in complexity are made by eliminating rarely occurring modes in the encoding process. The low-complexity SL optimization technique increases the measured intelligibility by up to 3.5 dB, at
fixed rates, and reduces rate by as much as 60% at fixed levels of intelligibility with respect to a rate control algorithm designed for aesthetic distortion as measured by MSE.
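As a minimal illustration of the Lagrangian mode decision described above, the per-macroblock choice can be sketched as follows; the candidate list, the intelligibility-weighted distortion values, and the Lagrange multiplier are all hypothetical, and the paper's trellis search and intelligibility metric are not reproduced.

    def best_mode(candidates, lam):
        # Each candidate is (mode, qp, distortion, rate); the distortion would be
        # an intelligibility-weighted measure rather than plain MSE, and the
        # values are assumed to come from trial encoding of the macroblock.
        return min(candidates, key=lambda c: c[2] + lam * c[3])

    # Hypothetical usage for one macroblock:
    # best_mode([("skip", 30, 12.0, 2), ("inter16x16", 28, 6.5, 40)], lam=0.1)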
In this paper, we analyze focus mismatches among the cameras of a multiview system and propose techniques to efficiently apply our previously proposed adaptive reference filtering (ARF) scheme to inter-view prediction in multiview video coding (MVC). We show that, with heterogeneous focus settings, the differences exhibited in images captured by different cameras can be represented in terms of the focus-setting mismatches (view dependency) and the depths of objects (depth dependency). We then analyze the performance of the previously proposed ARF in MVC inter-view prediction. The gains in coding efficiency show a strong view-wise variation. Furthermore, the estimated filter coefficients demonstrate strong correlation when the depths of objects in the scene remain similar. By exploiting the properties derived from the theoretical and performance analysis, we propose two techniques to achieve an efficient ARF coding scheme: i) view-wise ARF adaptation based on RD-cost prediction, which determines whether ARF is beneficial for a given view, and ii) filter updating based on depth-composition change, in which the same set of filters is used (i.e., no new filters are designed) until there is a significant change in the depth composition of the scene. Simulation results show that significant complexity savings are possible (e.g., the complete ARF encoding process needs to be applied to only 20%-35% of the frames) with negligible quality degradation (e.g., around 0.05 dB loss).
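A rough sketch of the two decisions above, with hypothetical thresholds and a hypothetical depth-histogram scene descriptor (the paper's actual RD-cost predictor and depth-composition test are not reproduced):

    import numpy as np

    def apply_arf_for_view(pred_cost_arf, pred_cost_no_arf, margin=0.02):
        # View-wise ARF adaptation: run the full filter design for this view only
        # if the predicted RD cost improves by more than a small relative margin.
        return pred_cost_arf < (1.0 - margin) * pred_cost_no_arf

    def need_filter_update(prev_depth_hist, curr_depth_hist, thresh=0.2):
        # Filter updating based on depth-composition change: redesign the filters
        # only when the normalized depth histogram of the scene changes markedly;
        # otherwise the previously designed filter set is reused.
        prev = np.asarray(prev_depth_hist, dtype=float)
        curr = np.asarray(curr_depth_hist, dtype=float)
        return np.abs(curr / curr.sum() - prev / prev.sum()).sum() > thresh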
The end of the performance entitlement historically achieved by classic scaling of CMOS devices is within sight, driven
ultimately by fundamental limits. Performance entitlements predicted by classic CMOS scaling have progressively failed
to be realized in recent process generations due to excessive leakage, increasing interconnect delays and scaling of gate
dielectrics. Prior to reaching fundamental limits, trends in technology, architecture and economics will pressure the
industry to adopt new paradigms. A likely response is to repartition system functions away from digital implementations
and into new architectures. Future architectures for visual communications will require extending the implementation
into the optical and analog processing domains. The fundamental properties of these domains will in turn give rise to
new architectural concepts. The limits of CMOS scaling and impact on architectures will be briefly reviewed. Alternative
approaches in the optical, electronic and analog domains will then be examined for advantages, architectural impact and drawbacks.
This paper presents a novel formulation of classical mean filtering, which is shown to stem from the theory of continued fractions as well as from the rules of binomial expansion. This alternative formulation requires only a few primitive operations, namely binary shifts and additions (subtractions), in the integer domain. Consequently, smoothing a digital image with the mean filter involves no floating-point computation and can therefore be implemented with simple hardware. In addition, the formulation can yield an approximate solution using even fewer operations, which brings the hardware cost further down. We have tested our method on various images and report relevant results that demonstrate its elegance, versatility, and effectiveness, especially when an approximate solution is called for.
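The flavour of such a shift-and-add mean filter can be illustrated with the following sketch; the exact continued-fraction construction of the paper is not reproduced, and the 7/64 approximation of 1/9 (relative error about 1.6%) is just one convenient choice.

    import numpy as np

    def mean3x3_shift_add(img):
        # 3x3 neighbourhood sum using only integer additions of shifted copies.
        img = img.astype(np.int32)
        padded = np.pad(img, 1, mode="edge")
        h, w = img.shape
        s = np.zeros_like(img)
        for dy in range(3):
            for dx in range(3):
                s += padded[dy:dy + h, dx:dx + w]
        # Division by 9 approximated with shifts and one subtraction:
        # s / 9 ~= 7 * s / 64 = (8*s - s) >> 6.
        return ((s << 3) - s) >> 6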
JPEG 2000 is the new standard for image compression. Its features make it suitable for imaging and multimedia applications in this era of wireless and Internet communications. The discrete wavelet transform (DWT) and embedded bit-plane coding are the two key building blocks of the JPEG 2000 encoder. The JPEG 2000 architecture also makes high-quality compression possible in video mode, i.e., Motion JPEG 2000. In this paper, we present a study of the compression impact of using a variable code-block size at different DWT levels instead of the fixed code-block size specified in the original standard. We also discuss the advantages of using variable code-block sizes and their VLSI implementation.
In this paper, we investigate the problem of enabling block-level parallelism for multi-dimensional data sets with arbitrary but static causal dependencies between the blocks that constitute the data set. As the use of video and other multi-dimensional data sets becomes more commonplace and the algorithms used to process them become more complex, there is a greater need for effective parallelization schemes. We describe a method for synchronizing the execution of multiple processors so as to respect the dependency structure, and we calculate the total processing time as a function of the number of parallel processors. We also provide an algorithm to calculate the optimal starting time for each processor, which enables the processors to process blocks continuously without synchronizing with one another, under the assumption that the time to process each block is fixed.
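For a concrete (and deliberately simplified) instance, assume each block depends only on its left neighbour and on blocks up to the top-right neighbour in the row above, rows are assigned round-robin to processors, and every block takes the same time; the staggered start times can then be computed as below. The paper's general dependency structure is not reproduced here.

    def row_start_times(num_rows, blocks_per_row, num_procs, t_block, lag=2):
        # start[r]: earliest time row r may begin so that its dependencies in
        # row r-1 are always ready and its assigned processor is free.
        start = [0.0] * num_rows
        proc_free = [0.0] * num_procs
        for r in range(num_rows):
            p = r % num_procs                       # round-robin row assignment
            dep = 0.0 if r == 0 else start[r - 1] + lag * t_block
            start[r] = max(dep, proc_free[p])
            proc_free[p] = start[r] + blocks_per_row * t_block
        return start

    # Total processing time = row_start_times(...)[-1] + blocks_per_row * t_block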
This paper presents an energy efficient VLSI architecture for motion estimation using image processing assisted voltage
overscaling (VOS). Motion estimation is the most computationally expensive block inside any video encoder, typically
consuming 40-60% of the total power. This work focuses on using VOS to reduce power consumption at the expense of
marginal loss of visual quality. Some image processing techniques are used to assist VOS so that a better trade-off
between power and visual quality can be achieved. The design is demonstrated using full search and three step search
algorithms. Simulation results in 65nm CMOS technology show that the proposed technique can save up to 30% power
at the cost of 0.5dB loss of PSNR.
The bilateral filter is a nonlinear filter that performs spatial averaging without smoothing edges; it has been shown to be an effective image denoising technique, among other applications. This paper makes two main contributions. First, we provide an empirical study of optimal parameter selection for the bilateral filter in image denoising applications. Second, we present an extension of the bilateral filter, the multi-resolution bilateral filter, in which bilateral filtering is applied to the low-frequency subbands of a signal decomposed using an orthogonal wavelet transform. Combined with wavelet thresholding, this new image denoising framework turns out to be very effective in eliminating noise in real noisy images. We provide experimental results with both simulated and real data.
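For reference, a plain single-scale bilateral filter (the building block, not the authors' multi-resolution variant) can be sketched as follows; the parameter values are illustrative. In the multi-resolution scheme described above, such a filter would be applied to the approximation subband at each level of an orthogonal wavelet decomposition, with thresholding applied to the detail subbands.

    import numpy as np

    def bilateral_filter(img, radius=3, sigma_s=2.0, sigma_r=0.1):
        # img: 2-D float array scaled to [0, 1].
        h, w = img.shape
        ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
        spatial = np.exp(-(xs ** 2 + ys ** 2) / (2 * sigma_s ** 2))
        padded = np.pad(img, radius, mode="reflect")
        out = np.zeros_like(img)
        norm = np.zeros_like(img)
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                shifted = padded[radius + dy:radius + dy + h,
                                 radius + dx:radius + dx + w]
                weight = spatial[dy + radius, dx + radius] * \
                    np.exp(-(shifted - img) ** 2 / (2 * sigma_r ** 2))
                out += weight * shifted
                norm += weight
        return out / norm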
For real-time imaging in surveillance applications, visibility of details is of primary importance to ensure customer confidence. Usually, image quality is improved by enhancing contrast and sharpness. Many complex scenes require local contrast improvements that bring details to the best possible visibility. However, local enhancement methods mainly suffer from ringing artifacts and noise over-enhancement. In this paper, we present a new multi-window, real-time high-frequency enhancement scheme in which the gain is a non-linear function of the detail energy. Our algorithm simultaneously controls perceived sharpness, ringing artifacts (contrast), and noise, resulting in a good balance between visibility of details and suppression of disturbing artifacts. The overall quality enhancement is based on a careful selection of the filter types for the multi-band decomposition and a detailed analysis of the signal per frequency band. The advantage of the proposed technique is that detail gains can be set much higher than usual, and the algorithm reduces them only where it is really needed.
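One hypothetical way to make the detail gain a non-linear function of the local detail energy, in the spirit of the scheme above (the constants and the exact curve are not the paper's):

    import numpy as np

    def detail_gain(energy, g_max=4.0, e_noise=25.0, e_ring=2500.0):
        # Low gain for noise-level energies, up to g_max in the mid range, and a
        # roll-off for very strong details that are prone to ringing.
        energy = np.asarray(energy, dtype=float)
        rise = 1.0 - np.exp(-energy / e_noise)
        fall = np.exp(-energy / e_ring)
        return 1.0 + (g_max - 1.0) * rise * fall

    # Per frequency band: enhanced_detail = detail_gain(local_energy) * detail_band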
This paper introduces a new image super-resolution algorithm in an adaptive, robust M-estimation framework. Super-resolution reconstruction is formulated as an optimization (minimization) problem whose objective function is based on a robust error norm. The effectiveness of the proposed scheme lies in the selection of a specific class of robust M-estimators, the redescending M-estimators, and in the incorporation of a similarity measure that adapts the estimation process to each of the low-resolution frames. This choice helps in dealing with violations of the assumed imaging model that could have generated the low-resolution frames from the unknown high-resolution one. The proposed approach effectively suppresses outliers without the use of regularization in the objective function and results in high-resolution images with crisp details and no artifacts. Experiments on both synthetic and real sequences demonstrate superior performance over methods based on the L2 and L1 norms in the objective function.
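As an example of the redescending class of M-estimators mentioned above, the Tukey biweight assigns the following weights to residuals; the specific estimator and tuning constant are illustrative, not necessarily those used in the paper.

    import numpy as np

    def tukey_weight(residual, c=0.2):
        # Weight function of the Tukey biweight: large residuals (likely model
        # violations or outliers) receive zero weight instead of being penalized.
        r = np.abs(np.asarray(residual, dtype=float)) / c
        return np.where(r <= 1.0, (1.0 - r ** 2) ** 2, 0.0)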
An attraction-repulsion expectation-maximization (AREM) algorithm for density estimation is proposed in this paper. We introduce a Gibbs distribution function for attraction and an inverse Gibbs distribution for repulsion as an augmented penalty function in order to find an equilibrium between over-smoothing and over-fitting. The log-likelihood function, augmented with the Gibbs density mixture, is maximized using the expectation-maximization (EM) method. We demonstrate the application of the proposed attraction-repulsion expectation-maximization algorithm to image reconstruction and sensor field estimation problems using computer simulation, and we show that the proposed algorithm improves performance considerably.
In this paper, we propose a two-dimensional distributed hidden Markov model (2D-DHMM) in which the state transition probability may depend on any state as long as causality is preserved. The proposed 2D-DHMM is the result of a novel solution to a more general non-causal two-dimensional hidden Markov model (2D-HMM) that we propose. Our proposed models can capture, for example, dependencies among diagonal states, which can be critical in many image processing applications such as image segmentation. A new set of basic image patterns is designed to enrich the variability of states, which in turn largely improves the accuracy of state estimation and the segmentation performance. We provide three algorithms for the training and classification of our proposed model. A new Expectation-Maximization (EM) algorithm suitable for estimation of the new model is derived, in which a novel General Forward-Backward (GFB) algorithm is proposed for recursive estimation of the model parameters. A new conditionally independent subset-state sequence structure decomposition of state sequences is proposed for the 2D Viterbi algorithm. Application to aerial image segmentation shows the superiority of our model compared to existing models.
This paper proposes a line-segment-based image registration method. Edges are detected and partitioned into line segments, and line fitting is applied to every segment to rule out those with high fitting error. For each segment in a reference image, putative matching segments in a test image are picked using constraints obtained by analyzing affine transformations. Putative segment correspondences yield correspondences between segment intersections, which are used as matching points. An affine matrix is derived from those point correspondences and evaluated with a similarity metric. The segment correspondences that end up with the highest similarity scores are used to compute the final transformation. Experimental results show that the proposed method is robust, especially when salient points cannot be detected accurately.
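Two of the geometric steps above admit a compact sketch: the intersection of two matched segments (treated as homogeneous lines) and the least-squares affine estimate from the resulting point correspondences. The matching and verification logic of the paper is not reproduced.

    import numpy as np

    def line_intersection(l1, l2):
        # Lines in homogeneous form (a, b, c) with a*x + b*y + c = 0.
        p = np.cross(l1, l2)
        if abs(p[2]) < 1e-12:          # (near-)parallel lines: no intersection
            return None
        return p[:2] / p[2]

    def estimate_affine(src_pts, dst_pts):
        # Least-squares 2x3 affine matrix mapping src_pts (Nx2) to dst_pts (Nx2).
        src = np.hstack([np.asarray(src_pts, dtype=float),
                         np.ones((len(src_pts), 1))])
        A, _, _, _ = np.linalg.lstsq(src, np.asarray(dst_pts, dtype=float),
                                     rcond=None)
        return A.T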
Image demosaicing is the problem of interpolating full-resolution color images from so-called color-filter-array (CFA) samples. Among various CFA patterns, the Bayer pattern has been the most popular choice, and demosaicing of the Bayer pattern has attracted renewed interest in recent years, partly due to the increased availability of source codes/executables in response to the principle of "reproducible research". In this article, we provide a systematic survey of over seventy works published in this field since 1999 (complementary to previous reviews [22, 67]). Our review attempts to address important issues in demosaicing and to identify fundamental differences among competing approaches. Our findings suggest that most existing works belong to the class of sequential demosaicing, i.e., the luminance channel is interpolated first and the chrominance channels are then reconstructed based on the recovered luminance information. We report comparative study results for a collection of eleven competing algorithms whose source codes or executables were provided by the authors. Our comparison is performed on two data sets: Kodak PhotoCD (a popular choice) and IMAX high-quality images (more challenging). While most existing demosaicing algorithms achieve good performance on the Kodak data set, their performance on the IMAX set (images with varying-hue and high-saturation edges) degrades significantly. This observation suggests the importance of properly addressing the mismatch between the assumed model and the observed data in demosaicing, which calls for further investigation of issues such as model validation, test data selection and performance evaluation.
We propose a novel approach for joint denoising and interpolation of noisy Bayer-patterned data acquired from a digital imaging sensor (e.g., CMOS, CCD). The aim is to obtain a full-resolution RGB noiseless image. The proposed technique is specifically targeted at filtering signal-dependent (e.g., Poissonian) or heteroscedastic noise, and it effectively exploits the correlation between the different color channels. The joint technique for denoising and interpolation is based on the concepts of local polynomial approximation (LPA) and intersection of confidence intervals (ICI). These directional filters utilize the green, red, and blue color channels simultaneously. This is achieved by a linear combination of complementary-supported smoothing and derivative kernels designed for the Bayer data grid. With these filters, the denoised and interpolated estimates are obtained by convolutions over the Bayer data. The ICI rule is used for data-adaptive selection of the length of the designed cross-color directional filter. Fusing estimates from multiple directions provides the final anisotropic denoised and interpolated values. The full-size RGB image is obtained by placing these values into the corresponding positions in the image grid. The efficiency of the proposed approach is demonstrated by experimental results with simulated and real camera data.
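The ICI rule used for the data-adaptive selection of the filter length can be sketched in its generic form as follows; the directional LPA kernels, the noise model, and the threshold value of the paper are not reproduced.

    import numpy as np

    def ici_select(estimates, stds, gamma=2.0):
        # estimates/stds: the same pixel estimated with filters of increasing
        # support (standard deviation decreases with support). Return the index
        # of the largest support whose confidence interval still intersects all
        # previous ones.
        lo, hi = -np.inf, np.inf
        best = 0
        for k, (e, s) in enumerate(zip(estimates, stds)):
            lo = max(lo, e - gamma * s)
            hi = min(hi, e + gamma * s)
            if lo > hi:
                break
            best = k
        return best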
Demosaicking is the process of reconstructing a full-resolution color image from the sampled data acquired by a digital camera that applies a color filter array to a single sensor. In this paper, we propose a regularization approach to demosaicking that makes use of prior knowledge about natural color images, such as the smoothness of each color component and the correlation between the different color channels. Initially, a quadratic strategy is considered and a general approach is reported. Then, an adaptive technique is analyzed in order to improve the reconstruction near the edges and discontinuities of the image. This is performed using a novel strategy that avoids computationally demanding iterations. The proposed approach provides good performance and lends itself to many applications.
In this paper, we restate the model of spatio-chromatic sampling in single-chip digital cameras covered by a color filter array (CFA) [1]. The model shows that a periodic arrangement of chromatic samples in the CFA gives luminance and chromatic information that is localized in the Fourier domain. This representation allows the definition of a space-invariant, uniform demosaicking method based on frequency selection of the luminance and chrominance information. We then show two extended methods that use the frequency representation of the Bayer CFA [2, 3] to derive adaptive demosaicking. Finally, we show the application of the model to CFAs with a random arrangement of chromatic samples, using either a linear method based on Wiener estimation [4] or an adaptive method [5].
Almost all existing color demosaicking algorithms for digital cameras are designed on the assumption of high
correlation between red, green, blue (or some other primary color) bands. They exploit spectral correlations
between the primary color bands to interpolate the missing color samples. The interpolation errors increase in
areas of no or weak spectral correlations. Consequently, objectionable artifacts tend to occur on highly saturated
colors and in the presence of large sensor noise, whenever the assumption of high spectral correlations does
not hold. This paper proposes a remedy to the above problem that has long been overlooked in the literature.
The main contribution of this work is a technique of correcting the interpolation errors of any existing color
demosaicking algorithm by piecewise autoregressive modeling.
The recovery of a full resolution color image from a color filter array like the Bayer pattern is commonly regarded
as an interpolation problem for the missing color components. But it may equivalently be viewed as the problem
of channel separation from a frequency multiplex of the color components. By using linear band-pass filters in
a locally adaptive manner, this latter view has earlier been successfully approached, providing state-of-the-art
performance in demosaicking. In this paper, we address remaining shortcomings of this frequency domain method
and discuss a locally adaptive restoration filter. By implementing restoration as an extension of the bilateral
filter, the complexity of the method remains reasonable while the resulting image quality improves by more than 1 dB in the best cases.
Recent developments in spatio-spectral sampling theory for color imaging devices show that the choice of color
filter array largely determines the spatial resolution of color images achievable by subsequent processing schemes
such as demosaicking and image denoising. This paper highlights the cost-effectiveness of a new breed of color
filter array patterns based on this sampling theory by detailing an implementation of the demosaicking method
consisting of entirely linear elements and comprising a total of only ten add operations per full-pixel reconstruction.
With color fidelity that rivals state-of-the-art interpolation methods and a complexity close to that of bilinear interpolation, this joint sensor-demosaicking solution for digital camera architectures can simultaneously fulfill the image quality and complexity needs of future digital multimedia.
This paper presents a distributed coding scheme for the representation of 3D scenes captured by stereo omni-directional
cameras. We consider a scenario where images captured from two different viewpoints are encoded
independently, with a balanced rate distribution among the different cameras. The distributed coding is built on
multiresolution representation and partitioning of the visual information in each camera. The encoder transmits
one partition after entropy coding, as well as the syndrome bits resulting from the channel encoding of the
other partition. The decoder exploits the intra-view correlation and attempts to reconstruct the source image
by combination of the entropy-coded partition and the syndrome information. At the same time, it exploits the
inter-view correlation using motion estimation between images from different cameras. Experiments demonstrate
that the distributed coding solution performs better than a scheme where images are handled independently,
and that the coding rate stays balanced between encoders.
We consider the problem of communicating compact descriptors for the purpose of establishing visual correspondences
between two cameras operating under rate constraints. Establishing visual correspondences is a critical
step before other tasks such as camera calibration or object recognition can be performed in a network of cameras.
We verify that descriptors of regions which are in correspondence are highly correlated, and propose the use
of distributed source coding to reduce the bandwidth needed for transmitting descriptors required to establish
correspondence. Our experiments demonstrate that the proposed scheme is able to provide compression gains of
57% with minimal loss in the number of correctly established correspondences compared to a scheme that communicates
the entire image of the scene losslessly in compressed form. Over a wide range of rates, the proposed
scheme also provides superior performance when compared to simply transmitting all the feature descriptors.
We investigate compression techniques to support flexible video decoding. In these, encoders generate a single
compressed bit-stream that can be decoded in several different ways, so that users or decoders can choose among
several available decoding paths. Flexible decoding has several advantages, including improved accessibility
of the compressed data for emerging applications (e.g., multiview video) and enhanced robustness for video
communication. Flexible decoding, however, makes it difficult for compression algorithms to exploit temporal
redundancy: when the decoder can choose among different decoding paths, the encoder no longer knows deterministically
which previously reconstructed frames will be available for decoding the current frame. Therefore, to
support flexible decoding, encoders need to operate under uncertainty on the decoder predictor status. This paper
extends our previous work on video compression with decoder predictor uncertainty using distributed source
coding (DSC). We present a thorough discussion of flexible decoding, including its theoretical performance. The
main advantage of a DSC approach to flexible decoding is that the information communicated from the encoder
to the decoder (namely, the parity bits) is independent of a specific predictor. By "decoupling" the compressed
information from the predictor, we will demonstrate that, theoretically and experimentally, DSC can lead to a
solution that compares favorably to one based on conventional "closed loop" prediction (CLP), where multiple
prediction residues are sent, one for each possible predictor available at the decoder. The main novelties of the
proposed algorithm are that it incorporates different macroblock modes and significance coding within the DSC
framework. This, combined with a judicious exploitation of correlation statistics, allows us to achieve competitive
coding performance. Experimental results using multiview video coding and forward/backward video playback
suggest the proposed DSC-based solution can outperform flexible decoding techniques based on CLP coding.
Wyner-Ziv video coders perform simple intra-frame encoding and complex inter-frame decoding. This feature makes this type of coder suitable for applications that require low-complexity encoders. Video coding algorithms provide coding modes and parameters so that encoders can fulfill rate constraints and improve the coding
efficiency. However, in most Wyner-Ziv video coders, no algorithm is used to optimally choose the coding modes and parameters. In this paper, we present a rate control algorithm for pixel-domain Wyner-Ziv video coders. Our algorithm predicts the rate and distortion of each video frame as a function of the coding mode and the quantization parameter. In this way, our algorithm can properly select the best mode and quantization for each video frame. We show experimentally that, even though the rate and distortion cannot be accurately predicted in Wyner-Ziv video encoders, rate constraints are approximately fulfilled and good coding efficiency is obtained
by using our algorithm.
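The selection step can be sketched as an exhaustive search over the predicted operating points; the rate and distortion prediction models themselves (the core of the paper) are left as hypothetical callables here.

    def select_mode_and_qp(modes, qps, predict_rate, predict_dist, rate_budget):
        # Pick the (mode, QP) pair with the lowest predicted distortion whose
        # predicted rate fits the per-frame budget; fall back to the coarsest
        # quantizer of the first mode if nothing fits.
        best = None
        for m in modes:
            for q in qps:
                r, d = predict_rate(m, q), predict_dist(m, q)
                if r <= rate_budget and (best is None or d < best[0]):
                    best = (d, m, q)
        return (best[1], best[2]) if best else (modes[0], max(qps))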
In this paper we consider Wyner-Ziv video compression using rateless LDPC codes. It is shown that the advantages of using rateless LDPC codes in Wyner-Ziv video compression, compared to traditional fixed-rate LDPC codes, are at least threefold: 1) they significantly reduce the storage complexity; 2) they allow seamless integration with mode selection; and 3) they greatly improve the overall system performance. Experimental results on the standard CIF-sized sequence mobile_and_calendar show that by combining rateless LDPC coding with simple skip-mode selection, one can build a Wyner-Ziv video compression system that is, at a rate of 0.2 bits per pixel, about 2.25 dB away from the standard JM software implementation of the H.264 main profile, more than 8.5 dB better than H.264 Intra (where all frames are coded as H.264 intra-predicted frames), and about 2.3 dB better than the same Wyner-Ziv system using fixed-rate LDPC coding. In terms of encoding complexity, the Wyner-Ziv video compression system is two orders of magnitude less complex than the JM implementation of the H.264 main profile.
Existing 3-D dynamic mesh compression methods directly exploit temporal redundancy through predictive coding, and the coded bitstreams are sensitive to transmission errors. In this paper, an efficient and error-resilient compression paradigm based on Wyner-Ziv coding (WZC) is proposed. We first apply an anisotropic wavelet transform (AWT) to each frame to exploit its spatial redundancy. The wavelet coefficients of every frame are then compressed by a Wyner-Ziv codec composed of a nested scalar quantizer and a turbo-code-based Slepian-Wolf codec. Benefiting from the inherent robustness of WZC, the proposed coding scheme alleviates the error-propagation problem associated with conventional predictive coding schemes. Furthermore, being based on a wavelet transform, our method can be extended to support progressive coding, which is desirable for streaming 3D meshes. Experimental results show that our scheme is competitive with other compression methods in compression performance, and that it is more robust when transmission errors occur.
The accuracy of motion estimation (ME) plays an important role in improving the coding efficiency of Wyner-Ziv video coding (WZVC). Most existing WZVC schemes perform ME at the decoder. The unavailability of the current frame on the decoder side usually impairs the accuracy of ME, which also causes the degradation of coding efficiency of WZVC. To improve the accuracy of ME, some works in the literature assume the current frame can be progressively decoded, and the decoder iteratively refines the motion field based on each partially decoded image. In this paper, we present an
analytical model to estimate the potential gain by employing multi-resolution motion refinement (MMR), assuming the current frame is progressively decoded in the frequency domain. The theoretical results show that at high rates, WZVC with MMR falls about 1.5 dB behind the conventional inter-frame coding, but outperforms WZVC with motion extrapolation by 0.9 to 5 dB. Significant gain has also been observed in the simulations using real video data.
Wyner-Ziv video coding targets the compression of video with low computing resources. Since encoding may stop at any time on mobile devices with limited computing resources and bandwidth, scalable Wyner-Ziv video coding is also desirable. Bit-plane coding is an inherent solution for scalable video coding; however, the conventional bit-plane representation used in hybrid video coding does not work well in the Wyner-Ziv scenario. Since the bit-plane representation is closely related to quantization, we propose a new bit-plane representation with optimal quantization at any bit-plane in terms of Wyner-Ziv coding. In particular, for DCT-domain Wyner-Ziv video coding, the distribution of DCT coefficients and the conditional distribution given the side information can be modeled with symmetric Laplacian functions. Accordingly, a simplified adaptive bit-plane representation is proposed that does not require prior knowledge of the Laplacian distribution parameters. A DCT-domain scalable Wyner-Ziv video coding scheme is then developed, in which encoding can stop at any bit-plane and the bit-stream can also be flexibly truncated. Tests show that there is no performance penalty due to unpredicted bit-plane truncation.
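For reference, the symmetric Laplacian model of the conditional distribution of a DCT coefficient X given its side information Y is commonly written as below (a standard form in the Wyner-Ziv literature; the band-dependent scale parameter alpha is a generic symbol, not a value from this paper):

    f_{X|Y}(x \mid y) = \frac{\alpha}{2}\, e^{-\alpha \lvert x - y \rvert}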
The coding efficiency of Wyner-Ziv frames relies considerably on the quality of side information and the capability to
model the statistical dependency between the original frame and side information. In the field of distributed multi-view
video coding (DMVC), there are two kinds of side information, namely temporal side information (TSI), which is
generated by exploiting the temporal correlation, and inter-view side information (IVSI), which is generated by
exploiting the inter-view correlation. This paper proposes a new fusion method to get better side information by the
region-based combination of TSI and IVSI. In addition, an improved statistical model of the "correlation channel" is proposed to estimate the statistical dependency between the original frame and the side information at the decoder. We call it the Region-based Correlation Channel Model (RCCM). The RCCM models the "correlation channel" between the original frame and the side information at a fine granularity by detecting the spatial quality variation of the side information within each Wyner-Ziv frame. Experimental results demonstrate that the RCCM models this "correlation channel" more accurately and that, consequently, fewer bits are required to decode a Wyner-Ziv frame.
Source classification has been widely studied in conventional coding of image and video signals. This paper
explores the idea of exploiting so-called classification gain in Wyner-Ziv (WZ) video coding. We first provide
theoretical analysis of how source classification can lead to improved Rate-Distortion tradeoff in WZ coding and
quantify the classification gain by the ratio of weighted arithmetic mean to weighted geometric mean over subsources.
Then we present a practical WZ video coding algorithm based on the source classification principle. The
statistics of both spatial and temporal correlation are taken into account in our classification strategy. Specifically, the subsource with the steepest R-D slope is identified to be the class of significant wavelet coefficients
of the blocks that are poorly motion-compensated in WZ frames. In such classification-based approach, rate
control is performed at the decoder which can be viewed as the dual to conventional video coding where R-D optimization
stays with the encoder. By combining powerful LDPC codes (for generating coded information) with
advanced temporal interpolation (for generating side information), we have observed that the new Wyner-Ziv
coder achieves highly encouraging performance for the test sequences used in our experiments. For example, the gap between H.264 JM 11.0 (I-B-I-B...) and the proposed WZ video coder is dramatically reduced for the foreman and hall QCIF sequences when compared with the best results reported in the literature.
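The classification gain mentioned above is commonly written as the ratio of the weighted arithmetic mean to the weighted geometric mean of the subsource variances; with generic weights p_i (summing to one) and variances sigma_i^2 (symbols not taken from the paper):

    G_c = \frac{\sum_i p_i\, \sigma_i^2}{\prod_i \left(\sigma_i^2\right)^{p_i}} \;\ge\; 1

with equality only when all subsources share the same variance, so classification pays off exactly when the subsources are statistically distinct.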
Horizon detection in still images or video sequences contributes to applications such as image understanding, automatic correction of image tilt, and image quality enhancement. In this paper, we propose an algorithm for detecting the horizon line in digital images that employs an edge-based and a new color-based horizon detection technique. The color-based detector calculates an estimate of the horizon line by analyzing the color transition in the clear-sky areas of the image. The edge-based detector computes the horizon line by finding the most prominent line or edge in the image, based on Canny edge detection and the Hough transform. The proposed algorithm combines the two detectors into a hybrid detection system, thereby taking advantage of their complementary strengths. We have applied the algorithm to a manually annotated set of images and evaluated the accuracy of the position and angle of the detected horizon line. The experiments indicate the usefulness of the proposed color-based detector (40% lower error vs. the edge-based detector) and the benefit of the adopted approach for combining the two individual detectors (57% and 17% lower error vs. the edge-based and the color-based detectors, respectively).
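The edge-based detector can be approximated with standard OpenCV primitives as below; the thresholds are illustrative, and the paper's color-based detector and fusion logic are not reproduced.

    import cv2
    import numpy as np

    def edge_based_horizon(gray):
        # Canny edge map followed by a standard Hough transform; return the
        # (rho, theta) of the most prominent line (OpenCV lists stronger
        # accumulator peaks first), or None if no line is found.
        edges = cv2.Canny(gray, 50, 150)
        lines = cv2.HoughLines(edges, 1, np.pi / 180, threshold=100)
        if lines is None:
            return None
        rho, theta = lines[0][0]
        return rho, theta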
This paper deals with a technique for generating free-viewpoint images from multi-focus imaging sequences. The method may enable dense 4-D ray-space reconstruction from 3-D multi-focus imaging sequences. Previously, we proposed a method for generating free viewpoint, iris and focus images directly from multi-focus imaging sequences with a 3-D convolution filter that expresses how the scene is defocused in the sequence. However, the cost of the spatial-frequency analysis based on the 3-D FFT is considerable. In this paper, efficient reconstruction of free-viewpoint images by dimension reduction and the 2-D FFT is discussed. We show experimental results using synthetic and real images. Epipolar-plane images are also reconstructed in order to clearly show the disparity between generated free-viewpoint images.
Super-resolution (SR) is a technique for obtaining a higher-resolution image (frame) by fusing multiple low-resolution (LR) images (frames) of the same scene. In a typical super-resolution algorithm, image registration is one of the most critical steps. The difficulty of this step means that most existing SR algorithms cannot cope with local motions, because image registration generally assumes global motion. Moreover, how the SR noise, including the image registration error, is modeled has a great influence on the performance of SR algorithms. In this paper, we report that a Laplacian distribution assumption is a good choice for global and slow motions that can be easily registered, while for fast-motion sequences containing multiple moving objects, a Gaussian distribution is better for error modeling. Based on these results, we propose a cost function with a weighted L2-norm reflecting the SR noise model, where the weights are generated from the registration error and penalize parts that are inaccurately registered. These weights serve to reject outlier image regions. Both objective and subjective results demonstrate that the proposed algorithm gives better results for slow- and fast-motion sequences.
In this paper, a color transfer technique based on the Adaptive Directional Wavelet Transform with Quincunx Sampling (ADWQS) is proposed to transfer color from a color reference image to a grayscale target image. Owing to the directional selectivity and symmetry of the ADWQS, the proposed scheme yields the best color transfer performance. We search for the best matching only in the LL subband of the luminance coefficients of the two images and then transfer the chromaticity coefficients to the corresponding positions. This operation greatly accelerates the colorization process while maintaining good performance. In addition, the proposed method places no constraint on the image size, i.e., the color reference image and the grayscale target image can be of different sizes.
In this paper, we use a set-theoretic approach to provide an efficient and deterministic iterative solution for the compensated signature embedding (CSE) scheme introduced in earlier work [4]. In CSE, a fragile signature is derived and embedded into the media using a robust watermarking technique. Since the embedding process alters the media, the media samples are iteratively adjusted to compensate for the embedding distortion. Projections Onto Convex Sets (POCS) is an iterative set-theoretic approach that is known to be deterministic and effective and has been used in many image processing applications. We propose to use POCS to provide a compensation mechanism for the CSE problem. We identify two convex constraint sets, defined according to image fidelity and signature-generation criteria, and use them in a POCS-based CSE image authentication system. The system uses the wavelet transform domain for embedding and compensation. Simulation results show that the proposed scheme is efficient and accurate, achieving high convergence speed while maintaining image fidelity.
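A generic POCS iteration, alternating projections onto the constraint sets, looks as follows; the paper's fidelity and signature-generation sets are not reproduced, and the two sets in the usage example are just simple convex stand-ins.

    import numpy as np

    def pocs(x0, projections, n_iter=50):
        # Repeatedly project onto each convex set in turn.
        x = x0
        for _ in range(n_iter):
            for project in projections:
                x = project(x)
        return x

    # Hypothetical usage with two simple convex sets (a box and a mean constraint):
    # proj_box  = lambda x: np.clip(x, 0.0, 1.0)
    # proj_mean = lambda x: x - x.mean() + 0.5
    # x = pocs(np.random.rand(8, 8), [proj_box, proj_mean])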
This paper proposes a novel semi-supervised dimensionality reduction learning algorithm for the ranking problem. In general, we do not assume the existence of classes and do not seek classification boundaries. Instead, we only assume that the data point cloud can be used to construct a graph that describes the manifold structure, and that there are multiple concepts on different parts of the manifold. By maximizing the distance between different concepts while preserving the local structure of the manifold, the learned metric can indeed give good ranking results. Moreover, based on a theoretical analysis of the relationship between the graph Laplacian and the manifold Laplace-Beltrami operator, we develop an online learning algorithm that can incrementally learn from unlabeled data.
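The graph construction alluded to above is typically the standard Gaussian-affinity Laplacian; a minimal sketch follows (the bandwidth and neighbourhood rule are illustrative, and the paper's ranking-specific objective is not reproduced).

    import numpy as np

    def graph_laplacian(X, sigma=1.0):
        # Unnormalized graph Laplacian L = D - W with Gaussian-kernel affinities
        # between all pairs of data points (rows of X).
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        W = np.exp(-d2 / (2 * sigma ** 2))
        np.fill_diagonal(W, 0.0)
        return np.diag(W.sum(axis=1)) - W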
Automatic linguistic annotation is a promising solution for bridging the semantic gap in content-based image retrieval. However, two crucial issues are not well addressed by state-of-the-art annotation algorithms: 1) the Small Sample Size (3S) problem in keyword classifier/model learning, and 2) the fact that most annotation algorithms cannot be extended to real-time online use because of their low computational efficiency. This paper presents a novel Manifold-based Biased Fisher Discriminant Analysis (MBFDA) algorithm that addresses these two issues through transductive semantic learning and keyword filtering. To address the 3S problem, co-training-based manifold learning is adopted for keyword model construction. To achieve real-time annotation, a Biased Fisher Discriminant Analysis (BFDA) based semantic feature reduction algorithm is presented for keyword confidence discrimination and semantic feature reduction. Unlike existing annotation methods, MBFDA views image annotation from a novel Eigen semantic feature (corresponding to keywords) selection perspective. As demonstrated in the experiments, our manifold-based biased Fisher discriminant analysis annotation algorithm outperforms classical and state-of-the-art annotation methods (K-NN expansion, one-to-all SVM, and PWC-SVM) in both computational time and annotation accuracy by a large margin.
We propose a progressive mesh geometry coder, which expresses geometry information in terms of spectral coefficients obtained through a transformation and codes these coefficients using a hierarchical set partitioning algorithm. The spectral transformation used is the one proposed in [10] where the spectral coefficients are obtained by projecting the mesh geometry onto an orthonormal basis determined by mesh topology. The set partitioning method that jointly codes the zeroes of these coefficients, treats the spectral coefficients for each of the three spatial coordinates with the right
priority at all bit planes and realizes a truly embedded bitstream by implicit bit allocation. The experiments on common irregular meshes reveal that the distortion-rate performance of our coder is significantly superior to that of the spectral coder of [10].
We consider the performance of two candidate approaches to compressing stereoscopic digital cinema distribution images: decorrelation transforms and disparity compensation. We show that disparity compensation generally can provide superior performance when significant disparity exists, and furthermore, that the consideration of vertical displacement can be an important factor in maximizing this performance under certain conditions. For context, we also provide details about the current state of both 2D and stereoscopic digital cinema distribution as of the end of the year
2007.
A novel statistical image model is proposed to facilitate the design and analysis of image processing algorithms. A mean-removed image neighborhood is modeled as a scaled segment of a hypothetical texture source, characterized as a 2-D stationary zero-mean unit-variance random field, specified by its autocorrelation function. Assuming that statistically similar image neighborhoods are derived from the same texture source, a clustering algorithm is developed to optimize both the texture sources and the cluster of neighborhoods associated with each texture source. Additionally, a novel parameterization of the texture source autocorrelation function and the corresponding power spectral density is incorporated into the clustering algorithm. The parametric auto-correlation function is anisotropic, suitable for describing directional features such as edges and lines in images.
Experimental results demonstrate the application of the proposed model for designing linear predictors and analyzing the performance of wavelet-based image coding methods.
This paper presents the results of two computational experiments designed to investigate whether the success
of recent image fidelity metrics can be attributed to the fact that these metrics implicitly incorporate region-
of-interest information. Modified versions of four metrics (PSNR, WSNR, SSIM, and VIF) were created by incorporating spatially varying weights chosen to maximize correlation between each metric and subjective ratings of fidelity for images from the LIVE image database. The results reveal that all metrics can benefit from spatially varying weights, especially when the regions are hand-chosen based upon the objects in an image. However, the results suggest that PSNR and VIF would benefit the most from spatial weighting in which the weights are determined based on region of interest information. Additionally, the results show that object based regions follow an intuitive weighting pattern.
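A spatially weighted PSNR of the kind studied above can be written as below; the weight map here is any non-negative mask (e.g., hand-chosen object regions), not the optimized weights of the paper.

    import numpy as np

    def weighted_psnr(ref, test, weights, peak=255.0):
        # Weighted MSE with weights normalized to sum to one, converted to dB.
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()
        err = (np.asarray(ref, dtype=float) - np.asarray(test, dtype=float)) ** 2
        wmse = float(np.sum(w * err))
        return 10.0 * np.log10(peak ** 2 / wmse)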
VP6 is a video coding standard developed by On2 Technologies. It is the preferred codec in the Flash 8/9 format used by
many popular online video services and user generated content sites. The wide adoption of Flash video for video delivery
on the Internet has made VP6 one of the most widely used video compression standards on the Internet. With the wide
adoption of VP6 comes the need for transcoding other video formats to the VP6 format. This paper presents algorithms
to transcode H.263 to the VP6 format. This transcoder has applications in media adaptation including converting older
Flash video formats to Flash 8 format. The transcoding algorithms reuse the information from the H.263 decoding stage
and accelerate the VP6 encoding stage. Experimental results show that the proposed algorithms are able to reduce the
encoding complexity by up to 52% while reducing PSNR by at most 0.42 dB.
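The VP6 bitstream tools themselves are not shown here; the hypothetical sketch below only illustrates the general acceleration idea of reusing a motion vector taken from the H.263 decoding stage as the centre of a small refinement search, instead of running a full-range search in the re-encoding stage. Block size and refinement radius are assumptions.

```python
import numpy as np

def refine_mv(cur, ref, by, bx, mv_init, block=16, radius=2):
    """Refine a motion vector inherited from the decoded H.263 stream
    within a small window around mv_init = (dy, dx), returning the best
    (dy, dx) and its SAD.  This replaces a full search in the encoder."""
    h, w = ref.shape
    target = cur[by:by + block, bx:bx + block].astype(np.int32)
    best_mv, best_sad = mv_init, np.inf
    for dy in range(mv_init[0] - radius, mv_init[0] + radius + 1):
        for dx in range(mv_init[1] - radius, mv_init[1] + radius + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + block > h or x + block > w:
                continue
            cand = ref[y:y + block, x:x + block].astype(np.int32)
            sad = np.abs(target - cand).sum()
            if sad < best_sad:
                best_mv, best_sad = (dy, dx), sad
    return best_mv, best_sad
```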
Interactive Paper Session: Distributed Source Coding
A printed photograph is difficult to reuse because the digital information that generated the print may no longer be
available. This paper describes a mechanism for approximating the original digital image by combining a scan of the
printed photograph with small amounts of digital auxiliary information kept together with the print. The auxiliary
information consists of a small amount of digital data to enable accurate registration and color-reproduction,
followed by a larger amount of digital data to recover residual errors and lost frequencies by distributed Wyner-Ziv
coding techniques. Approximating the original digital image enables many uses, including making good quality
reprints from the original print, even when it has faded many years later. In essence, the print itself becomes the
currency for archiving and repurposing digital images, without requiring computer infrastructure.
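As a toy illustration of the final combination step only (registration, color correction, and the Wyner-Ziv decoding of the auxiliary data are assumed to have already produced the inputs), the reconstruction amounts to adding the recovered residual back onto the corrected scan; the gain factor is a hypothetical knob, not part of the paper's scheme.

```python
import numpy as np

def reconstruct_from_print(scan, residual, gain=1.0):
    """Approximate the original digital image from a registered,
    color-corrected scan of the print plus the residual recovered from
    the auxiliary (Wyner-Ziv coded) data stored with the print."""
    approx = scan.astype(np.float64) + gain * residual.astype(np.float64)
    return np.clip(approx, 0, 255).astype(np.uint8)
```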
In the past few years, practical distributed video coding systems have been proposed based on Slepian-Wolf and
Wyner-Ziv theorems. The quality of side information plays a critical role in the overall performance for such a system.
In this paper, we present a novel approach to generating the side information using optimal filtering techniques. The
motion vectors (MVs) that define the motion activity between the main information and the side information are first
predicted by an optimal filter; the MVs obtained from a decoded WZ frame by a conventional motion search method then correct the prediction. The side information is generated from the updated MVs via a motion
compensated interpolation (MCI) process and can be subsequently fed into the decoding process to further improve the
quality of a decoded WZ frame. We studied several variations of optimal filters and compared them with other DVC
systems in terms of rate-distortion performance.
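The paper compares several optimal-filter variants; the sketch below is only a generic scalar Kalman predict/correct step applied to one motion-vector component, with the noise variances as illustrative assumptions. The updated MVs would then drive the MCI step that produces the refined side information.

```python
def kalman_update(x_prev, p_prev, z, q=0.5, r=2.0):
    """One predict/correct step for a single motion-vector component.
    The prediction (constant-motion model) is corrected by the MV `z`
    measured by a conventional motion search on the decoded WZ frame.
    q and r are process/measurement noise variances (assumed values)."""
    # Predict.
    x_pred, p_pred = x_prev, p_prev + q
    # Correct with the measured motion vector.
    k = p_pred / (p_pred + r)
    x_new = x_pred + k * (z - x_pred)
    p_new = (1.0 - k) * p_pred
    return x_new, p_new

# Example: filter the horizontal MV component across successive WZ frames.
# x, p = 0.0, 1.0
# for z in measured_mv_x:
#     x, p = kalman_update(x, p, z)
```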
In this paper, a novel video scrambling scheme is introduced for Distributed Video Coding. The goal is to conceal
video information to preserve privacy in several applications such as video surveillance and anonymous video
communications. This is achieved by performing a transform domain scrambling on both Key and Wyner-Ziv
frames. More specifically, the signs of the scrambled transform coefficients are inverted at the encoder side. The scrambling pattern is defined by a secret key, which is required at the decoder for descrambling. The scheme is shown to provide a good level of security in addition to a flexible scrambling level (i.e., the amount of
distortion introduced). Finally, it is shown that the original DVC scheme and the one with scrambling have a
similar rate distortion performance. In other words, the DVC compression efficiency is not negatively impacted
by the introduction of the scrambling.
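A minimal sketch of key-driven sign scrambling in the transform domain, under the assumption that the scrambled subset is selected pseudo-randomly from an integer key; the `fraction` knob stands in for the flexible scrambling level mentioned above. Because sign inversion is an involution, running the same function with the same key descrambles.

```python
import numpy as np

def scramble_signs(coeffs, key, fraction=0.5):
    """Invert the sign of a key-selected subset of transform coefficients.
    `key` is an integer seed; the same key regenerates the same pattern,
    so applying the function twice restores the original coefficients."""
    rng = np.random.default_rng(key)
    mask = rng.random(coeffs.shape) < fraction  # fraction controls the scrambling level
    out = coeffs.copy()
    out[mask] = -out[mask]
    return out

# scrambled = scramble_signs(dct_block, key=1234)
# restored  = scramble_signs(scrambled, key=1234)   # identical to dct_block
```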
A large number of practical coding scenarios deal with sources such as transform coefficients that can be well modeled as Laplacians. For regular coding of such sources, samples are often quantized by a family of uniform quantizers, possibly with a deadzone, and then entropy coded. For the Wyner-Ziv coding problem, when correlated side-information is available at the decoder, the side-information can be modeled as obtained by additive Laplacian or Gaussian noise on the source. This paper deals with the optimal choice of parameters for practical Wyner-Ziv coding in such scenarios, using the same quantizer family as in the regular codec to cover a range of rate-distortion trade-offs, given the variances of the source and additive noise. We propose and analyze a general encoding model that combines source coding and channel coding and show that at practical block lengths and code complexities, not pure channel coding but a hybrid combination of source coding and channel coding with the right parameters provides optimal rate-distortion performance. Further, for the channel-coded bit-planes we observe that only high-rate codes are useful. We also provide a framework for on-the-fly parameter choice based on a non-parametric representation of a set of seed functions, for use in scenarios where variances are estimated during encoding. A good understanding of the optimal parameter selection mechanism is essential for building practical distributed codecs.
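To make the quantizer family concrete, here is a sketch of a uniform quantizer with an adjustable deadzone applied to a toy Laplacian source with Laplacian correlation noise; the deadzone width, step size, and midpoint reconstruction used here are simplifying assumptions (the paper's parameter choices are precisely what is being optimized).

```python
import numpy as np

def deadzone_quantize(x, step, dz=1.5):
    """Uniform quantizer whose zero bin is dz*step wide (dz=1 gives a plain
    uniform midtread quantizer); other bins have width `step`."""
    x = np.asarray(x, dtype=np.float64)
    mag = np.abs(x) - dz * step / 2.0
    q = np.where(mag <= 0, 0.0, np.ceil(mag / step))
    return np.sign(x) * q

def dequantize(q, step, dz=1.5):
    """Midpoint reconstruction of the non-zero bins (a simplification; a
    centroid reconstruction would be better for a Laplacian source)."""
    q = np.asarray(q, dtype=np.float64)
    rec = dz * step / 2.0 + (np.abs(q) - 0.5) * step
    return np.sign(q) * np.where(q == 0, 0.0, rec)

# Toy Laplacian source and noisy side information, to show where the
# trade-off studied in the paper arises: the quantizer indices would be
# split into bit-planes, some entropy-coded and some protected by a
# high-rate channel code against the correlation noise seen through y.
rng = np.random.default_rng(0)
x = rng.laplace(scale=10.0, size=10000)        # source samples
y = x + rng.laplace(scale=3.0, size=x.size)    # decoder side information
step = 8.0
q = deadzone_quantize(x, step)
mse = np.mean((x - dequantize(q, step)) ** 2)  # distortion for this step size
```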