This work addresses the problem of fairness and efficiency evaluation of various resource allocation schemes for
wireless visual sensor networks (VSNs). These schemes are used to optimally allocate the source coding rates, channel coding rates, and power levels among the nodes of a wireless direct sequence code division multiple access (DS–CDMA) VSN. All of the considered schemes optimize a function of the video qualities of the nodes. However, there is no single scheme that maximizes the video quality of each node simultaneously. In fact, all presented schemes are able to provide a Pareto–optimal solution, meaning that there is no other solution that is simultaneously preferred by all nodes. Thus, it is not clear which scheme results in the best resource allocation for the whole network. To handle the resulting tradeoffs, in this study we examine four metrics that investigate fairness and efficiency from different perspectives. Specifically, we apply a metric that considers both fairness and performance issues, and another metric that measures the “equality” of a resource allocation (equal utilities for the nodes). The third metric computes the total system utility, while the last metric computes the total power consumption of the nodes. Ideally, a scheme would achieve high total utility while being equally fair to all nodes and requiring little power.
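The abstract does not name the specific metrics, so the sketch below uses common stand-ins: Jain's fairness index for the "equality" measure, plus total-utility and total-power sums. It is a minimal Python illustration of how two hypothetical allocations could be compared under such metrics; the node utilities and power values are invented for the example.

```python
import numpy as np

def jain_fairness(utilities):
    """Jain's index: 1/n when one node takes everything, 1.0 when all nodes are equal."""
    u = np.asarray(utilities, dtype=float)
    return float(u.sum() ** 2 / (len(u) * (u ** 2).sum()))

def total_utility(utilities):
    """Aggregate system utility (sum of per-node video qualities)."""
    return float(np.sum(utilities))

def total_power(powers):
    """Total transmit power consumed by the nodes."""
    return float(np.sum(powers))

# Hypothetical per-node video qualities (e.g. PSNR in dB) and power levels for two schemes.
scheme_a = {"utility": [32.0, 31.5, 30.8, 31.2], "power": [0.20, 0.22, 0.25, 0.21]}
scheme_b = {"utility": [36.0, 27.0, 28.5, 33.0], "power": [0.30, 0.15, 0.18, 0.28]}

for name, s in (("A", scheme_a), ("B", scheme_b)):
    print(name, jain_fairness(s["utility"]), total_utility(s["utility"]), total_power(s["power"]))
```

A scheme that scores well on all three quantities at once would match the desirable behavior described above.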
Generally, the design of digital image processing algorithms and the design of image gathering devices are treated separately. However, experiments show that the image gathering process profoundly affects the performance of digital image processing and the quality of the resulting images. We propose an end-to-end, information-theoretic system for assessing linear shift-invariant edge detection algorithms, in which the different stages, such as the scene, image gathering, and processing, are evaluated jointly using Shannon's information theory. We evaluate the performance of the different algorithms as a function of the characteristics of the scene and of the parameters, such as sampling and additive noise, that define the image gathering system. An edge detection algorithm is regarded as performing well only if the information rate from the scene to the edge image approaches its maximum possible value, a goal that can be achieved only by jointly optimizing all processes. To validate the information-theoretic conclusions, we conducted a series of experiments simulating the whole image acquisition process. The comparison between the theoretical and simulation analyses shows that the proposed information-theoretic assessment provides a new tool for comparing different linear shift-invariant edge detectors in a common environment.
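As a rough illustration of measuring how much scene information survives image gathering and edge detection, the sketch below simulates blur, sampling, and noise, runs a simple Sobel-based (linear shift-invariant) edge detector, and estimates a histogram-based mutual information between the scene's edges and the detected edge image. This is a simplified proxy, not the paper's exact end-to-end information-theoretic formulation; all parameter values are arbitrary.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def mutual_information(x, y, bins=32):
    """Histogram-based estimate of I(X;Y) in bits between two images of equal size."""
    joint, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(0)
scene = rng.standard_normal((256, 256)).cumsum(axis=0).cumsum(axis=1)       # smooth synthetic scene
gathered = gaussian_filter(scene, sigma=1.5)[::2, ::2]                      # optics blur + sampling
gathered = gathered + 0.05 * gathered.std() * rng.standard_normal(gathered.shape)  # sensor noise
edge_image = np.hypot(sobel(gathered, 0), sobel(gathered, 1))               # LSI edge detector output
scene_edges = np.hypot(sobel(scene, 0), sobel(scene, 1))[::2, ::2]          # edges of the ideal scene
print("information proxy (bits/pixel):", mutual_information(scene_edges, edge_image))
```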
The next generation video coding standard, High Efficiency Video Coding (HEVC), is under development by the
Joint Collaborative Team on Video Coding (JCT-VC) of the ITU-T VCEG and the ISO/IEC MPEG. As the first
version of the single-layer HEVC standard comes close to completion, there is great interest in extending the standard with scalable capabilities. In this paper, an inter-layer Motion Field Mapping (MFM) algorithm is proposed for the scalable extension of HEVC to generate the motion field of inter-layer reference pictures, so that the correlation between the motion vectors (MVs) of the base layer and the enhancement layer can be exploited. Moreover, because the proposed method does not change any block-level operation, the existing single-layer encoder and decoder logic of HEVC can be applied directly, without modifying motion vector prediction for the enhancement layer. The
experimental results show the effectiveness of the proposed MFM method in improving the performance of
enhancement-layer motion prediction in scalable HEVC.
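The abstract does not give the mapping rule in detail, so the following sketch illustrates only the basic idea: each enhancement-layer block inherits the motion vector of its co-located base-layer block, scaled by the spatial ratio, to populate the motion field of the inter-layer reference picture. The block granularity, rounding, and handling of intra-coded base-layer blocks are assumptions of the sketch, not the JCT-VC design.

```python
import numpy as np

def map_motion_field(base_mvs, scale=2):
    """Build an enhancement-layer motion field from base-layer MVs (illustrative only).

    base_mvs: array of shape (Hb, Wb, 2) holding one MV per base-layer block,
    in quarter-pel units.  Every enhancement-layer block (same block grid, `scale`
    times as many blocks per dimension) copies the MV of its co-located base-layer
    block and scales it by the spatial ratio.
    """
    hb, wb, _ = base_mvs.shape
    he, we = hb * scale, wb * scale
    el_mvs = np.zeros((he, we, 2), dtype=base_mvs.dtype)
    for by in range(he):
        for bx in range(we):
            el_mvs[by, bx] = base_mvs[min(by // scale, hb - 1), min(bx // scale, wb - 1)] * scale
    return el_mvs

base = np.zeros((4, 4, 2), dtype=np.int32)
base[1, 2] = (-8, 4)                                        # quarter-pel MV of one base-layer block
print(map_motion_field(base)[2, 4], map_motion_field(base)[3, 5])   # both inherit (-16, 8)
```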
This paper describes an extension of the upcoming High Efficiency Video Coding (HEVC) standard for supporting
spatial and quality scalable video coding. Besides scalable coding tools known from scalable profiles of prior video
coding standards such as H.262/MPEG-2 Video and H.264/MPEG-4 AVC, the proposed scalable HEVC extension
includes new coding tools that further improve the coding efficiency of the enhancement layer. In particular, new coding modes by which base and enhancement layer signals are combined to form an improved enhancement layer prediction signal have been added. All scalable coding tools have been integrated in such a way that the low-level syntax and decoding process of HEVC remain unchanged to a large extent. Simulation results for typical application scenarios demonstrate the effectiveness of the proposed design. For spatial and quality scalable coding with two layers, bit-rate savings of about 20-30% have been measured relative to simulcasting the layers, which corresponds to a bit-rate overhead of about 5-15% relative to single-layer coding of the enhancement layer.
Google has recently been developing a next-generation open-source video codec called VP9, as part of the experimental branch of the libvpx repository included in the WebM project (http://www.webmproject.org/). Starting from the VP8 video codec released by Google in 2010 as the baseline, a number of enhancements and new tools have been added to improve the coding efficiency. This paper provides a technical overview of the current status of this project, along with comparisons against other state-of-the-art video codecs, H.264/AVC and HEVC. The new tools that have been added so far include: larger prediction block sizes up to 64x64, various forms of compound INTER prediction, more modes for INTRA prediction, ⅛-pel motion vectors and 8-tap switchable sub-pel interpolation filters, improved motion reference generation and motion vector coding, improved entropy coding and frame-level entropy adaptation for various symbols, improved loop filtering, incorporation of Asymmetric Discrete Sine Transforms and larger 16x16 and 32x32 DCTs, frame-level segmentation to group similar areas together, etc. Other tools and various bitstream features are being actively worked on as well. The VP9 bitstream is expected to be finalized by early-to-mid 2013. Results show VP9 to be quite competitive in performance with mainstream state-of-the-art codecs.
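To make the sub-pel tools concrete, here is a small sketch of 8-tap interpolation at half-sample positions. The coefficients are a generic windowed-sinc example, not one of the normative VP9 filter sets, and the function name is purely illustrative.

```python
import numpy as np

def interpolate_halfpel(row, taps):
    """Apply an 8-tap interpolation filter at the half-sample positions of one row.

    `taps` is an illustrative symmetric 8-tap filter; VP9 normatively defines
    several switchable 8-tap sub-pel filters that are not reproduced here.
    """
    pad = np.pad(np.asarray(row, dtype=float), (3, 4), mode="edge")  # 3 samples left, 4 right
    return np.array([np.dot(pad[i:i + 8], taps) for i in range(len(row))])

taps = np.array([-0.012, 0.060, -0.166, 0.618, 0.618, -0.166, 0.060, -0.012])  # sums to 1
row = [10, 12, 40, 80, 82, 81, 40, 12, 10]
print(interpolate_halfpel(row, taps))
```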
The high efficiency video coding (HEVC) standard being developed by ITU-T VCEG and ISO/IEC MPEG
achieves a compression goal of reducing the bitrate by half for the same visual quality when compared with
earlier video compression standards such as H.264/AVC. It achieves this goal with the use of several new tools
such as quad-tree based partitioning of data, larger block sizes, improved intra prediction, the use of sophisticated
prediction of motion information, inclusion of an in-loop sample adaptive offset process etc. This paper describes
an approach where the HEVC framework is extended to achieve spatial scalability using a multi-loop approach.
The enhancement layer inter-predictive coding efficiency is improved by including within the decoded picture
buffer multiple up-sampled versions of the decoded base layer picture. This approach has the advantage of
achieving significant coding gains with a simple extension of the base layer tools such as inter-prediction, motion
information signaling etc. Coding efficiency of the enhancement layer is further improved using adaptive loop
filter and internal bit-depth increment. The performance of the proposed scalable video coding approach is
compared to simulcast transmission of video data using high efficiency model version 6.1 (HM-6.1). The bitrate
savings are measured using the Bjontegaard Delta (BD) rate for spatial scalability factors of 2 and 1.5 when compared with simulcast anchors. It is observed that the proposed approach provides average luma BD-rate gains of 33.7% and 50.5%, respectively.
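A rough sketch of the multi-loop idea: the decoded base-layer picture is up-sampled and appended to the enhancement layer's reference picture list, so that ordinary inter prediction and motion signaling can reference it unchanged. The up-sampling filters (simple spline interpolation here) and the use of two spline orders as a stand-in for the "multiple up-sampled versions" are assumptions of this illustration, not the proposal's actual filters.

```python
import numpy as np
from scipy.ndimage import zoom

def build_el_reference_list(el_temporal_refs, decoded_base_picture, scale=2.0):
    """Assemble an enhancement-layer reference picture list (illustrative sketch)."""
    refs = list(el_temporal_refs)
    for order in (1, 3):   # two up-sampled versions of the decoded base-layer picture
        refs.append(zoom(decoded_base_picture, scale, order=order))
    return refs

base = np.random.default_rng(0).integers(0, 255, (36, 64)).astype(float)
refs = build_el_reference_list([np.zeros((72, 128))], base, scale=2.0)
print(len(refs), [r.shape for r in refs])   # 3 references, all 72 x 128
```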
The hypothetical reference decoder (HRD) is a hypothetical decoder model that specifies constraints on the variability of conforming network abstraction layer (NAL) unit streams or conforming byte streams that an encoding process may produce. High Efficiency Video Coding (HEVC) builds upon and improves the design of the generalized hypothetical reference decoder of H.264/AVC. This paper describes some of the main improvements to the hypothetical reference decoder of HEVC.
In this paper, we first review the lossless coding mode in version 1 of the HEVC standard, which has recently been finalized. We then provide a performance comparison between the lossless coding modes in the HEVC and MPEG-AVC/H.264 standards and show that HEVC lossless coding has limited coding efficiency. To improve the performance of the lossless coding mode, several new coding tools that were contributed to JCT-VC but not adopted in version 1 of the HEVC standard are introduced. In particular, we discuss sample-based intra prediction and the coding of residual coefficients in more detail. At the end, we briefly address a new class of coding tools, namely dictionary-based coding, which is efficient for encoding screen content, including graphics and text.
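Sample-based intra prediction can be pictured as DPCM inside the block: each sample is predicted from its immediate, already reconstructed neighbour rather than from the block boundary, which shrinks residuals for lossless coding. The sketch below shows only the horizontal case and is a simplification of the contributed tools, not their exact specification.

```python
import numpy as np

def horizontal_sample_prediction_residual(block):
    """DPCM-style horizontal prediction: each sample predicted from its left neighbour."""
    block = np.asarray(block, dtype=np.int32)
    residual = block.copy()
    residual[:, 1:] = block[:, 1:] - block[:, :-1]
    return residual

def reconstruct(residual):
    """Exact (lossless) inverse: running sum along each row."""
    return np.cumsum(residual, axis=1)

block = np.array([[100, 101, 103, 104],
                  [102, 102, 105, 107],
                  [103, 104, 106, 108],
                  [105, 106, 107, 110]])
res = horizontal_sample_prediction_residual(block)
assert np.array_equal(reconstruct(res), block)     # perfectly reversible
print(res)                                         # small residuals, cheaper to entropy-code
```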
Expanding an image by an arbitrary scale factor, and thereby creating an enlarged image, is a crucial image processing operation. De-interlacing is an example of such an operation, where a video field is enlarged in the vertical direction by a 1-to-2 scale factor. The most advanced de-interlacing algorithms use several consecutive input fields to generate one output frame. In order to save hardware resources in video processors, missing lines in each field may be generated without reference to the other fields. Line doubling, known as “bobbing,” is the simplest intra-field de-interlacing method. However, it may generate visual artifacts. For example, interpolating an inserted line from a few neighboring lines with a vertical filter may produce visual artifacts such as “jaggies.” In this work we present an edge-adaptive image up-scaling and/or enhancement algorithm that can produce jaggies-free output video frames. As a first step, an edge and its parameters at each interpolated pixel are detected from the gradient squared tensor based on local signal variances. Then, according to the edge parameters, including orientation, anisotropy, and variance strength, the algorithm determines the footprint and frequency response of the two-dimensional interpolation filter for the output pixel. The filter's coefficients are defined by the edge parameters, so that the quality of the output frame is controlled by the local content. The proposed method may be used for image enlargement or enhancement (for example, anti-aliasing without resampling). It has been implemented in hardware in a video display processor for intra-field de-interlacing of video images.
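The edge analysis step can be sketched as follows: the gradient squared (structure) tensor is accumulated locally, and its eigen-structure yields the edge orientation, an anisotropy measure, and a strength term that would then steer the footprint and frequency response of the interpolation filter. The smoothing kernel and normalizations are assumptions of this sketch; the filter design itself is not reproduced.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def edge_parameters(image, sigma=1.5):
    """Orientation, anisotropy and strength per pixel from the gradient squared tensor."""
    img = np.asarray(image, dtype=float)
    gx, gy = sobel(img, axis=1), sobel(img, axis=0)
    jxx = gaussian_filter(gx * gx, sigma)             # locally averaged tensor entries
    jxy = gaussian_filter(gx * gy, sigma)
    jyy = gaussian_filter(gy * gy, sigma)
    trace = jxx + jyy                                 # local gradient energy (strength)
    root = np.sqrt((jxx - jyy) ** 2 + 4.0 * jxy ** 2)
    lam1, lam2 = 0.5 * (trace + root), 0.5 * (trace - root)   # eigenvalues, lam1 >= lam2
    orientation = 0.5 * np.arctan2(2.0 * jxy, jxx - jyy)      # dominant gradient direction
    anisotropy = (lam1 - lam2) / (lam1 + lam2 + 1e-12)        # 0 = isotropic, 1 = pure edge
    return orientation, anisotropy, trace
```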
Intra prediction is a fundamental tool in video coding with hybrid block-based architecture. Recent investigations have
shown that one of the most beneficial elements for a higher compression performance in high-resolution videos is the
incorporation of larger block structures. Thus, in this work, we investigate the performance of novel intra prediction modes based on different image completion techniques in a new video coding scheme with large block structures. Image completion methods exploit the fact that high-frequency image regions yield high coding costs when classical H.264/AVC prediction modes are used. This problem is tackled by incorporating several intra predictors based on the Laplace partial differential equation (PDE), least-squares (LS) based linear prediction, and an autoregressive model. A major aspect of this article is the evaluation of the coding performance in a quantitative (i.e., coding efficiency) manner. Experimental results show significant improvements in compression (up to 7.41%) by integrating the LS-based linear intra prediction.
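A minimal sketch of LS-based linear intra prediction, assuming a three-tap causal predictor (left, top, top-left) whose weights are fitted by least squares on the already reconstructed region above and to the left of the block; the training window size and predictor support are choices of this illustration, not the paper's configuration.

```python
import numpy as np

def ls_linear_intra_predict(recon, y0, x0, size):
    """Predict a size x size block at (y0, x0) with an LS-fitted causal linear predictor."""
    # Training samples from the causal neighbourhood (rows above the block).
    ys, xs = np.mgrid[max(1, y0 - 8):y0, max(1, x0 - 8):x0 + size]
    feats = np.stack([recon[ys, xs - 1], recon[ys - 1, xs], recon[ys - 1, xs - 1]], axis=-1)
    targets = recon[ys, xs]
    w, *_ = np.linalg.lstsq(feats.reshape(-1, 3), targets.ravel(), rcond=None)
    # Sequentially predict the block, reusing earlier predictions as context.
    pred = recon.copy().astype(float)
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            pred[y, x] = w @ [pred[y, x - 1], pred[y - 1, x], pred[y - 1, x - 1]]
    return pred[y0:y0 + size, x0:x0 + size]

rng = np.random.default_rng(1)
recon = rng.integers(0, 255, (32, 32)).astype(float)
print(ls_linear_intra_predict(recon, y0=16, x0=16, size=8).shape)   # (8, 8) predicted block
```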
In an interactive multiview image navigation system, a user requests switches to adjacent views as he observes
the static 3D scene from different viewpoints. In response, the server transmits encoded data to enable client-
side decoding and rendering of the requested viewpoint images. Clearly, there is correlation between consecutively requested viewpoint images that can be exploited to lower the transmission rate. In previous works, this is done using a pixel-based synthesis and coding approach for view-switches along the x-dimension (horizontal camera motion): given the texture and depth maps of the previous view, texture pixels are individually shifted horizontally to the newly requested view, each according to its disparity value, via depth-image-based rendering (DIBR). Unknown pixels in the disoccluded region of the new view (pixels not visible in the previous view) are either inpainted, or intra-coded and transmitted by the server for reconstruction at the decoder.
In this paper, to enable efficient view-switch along the z-dimension (camera motion into / out of the scene),
we propose an alternative layer-based synthesis and coding approach. Specifically, we first divide each multiview
image into depth layers, where adjacent pixels with similar depth values are grouped into the same layer. During a view-switch into the scene, the spatial region of a layer is enlarged via super-resolution, where the scale factor is determined by the distance between the layer and the camera. Conversely, during a view-switch out of the scene, the spatial region of a layer is shrunk via low-pass filtering and down-sampling. Because the depth layers of the new view can be reconstructed at high quality via rescaling, coding and transmission of a depth layer by the server is necessary only in the rare case when the quality of the layer-based reconstruction is poor, which saves transmission rate. Experiments show that our layer-based approach can reduce bit-rate by up to 35% compared to the previous pixel-based approach.
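The layer-based synthesis for a forward view-switch can be sketched as follows: pixels are grouped into depth layers, each layer is rescaled by a factor that grows as the layer gets closer to the camera, and the rescaled layers are composited far-to-near. The quantile layering, the scale law z/(z - dz), and the simple compositing are assumptions of this sketch, not the paper's exact procedure.

```python
import numpy as np
from scipy.ndimage import zoom

def layer_based_forward_switch(texture, depth, dz=0.1, n_layers=4):
    """Synthesize the view after moving the camera forward by dz (illustrative sketch)."""
    h, w = depth.shape
    edges = np.quantile(depth, np.linspace(0.0, 1.0, n_layers + 1))   # layer boundaries
    out = np.zeros((h, w), dtype=float)
    # Composite from the farthest layer to the nearest so near layers occlude far ones.
    for lo, hi in zip(edges[:-1][::-1], edges[1:][::-1]):
        mask = (depth >= lo) & (depth <= hi)
        z = 0.5 * (lo + hi)                           # representative depth of the layer
        scale = z / max(z - dz, 1e-3)                 # forward motion enlarges the layer
        layer = zoom(np.where(mask, texture, 0.0), scale, order=1)
        lmask = zoom(mask.astype(float), scale, order=0) > 0.5
        y0, x0 = (layer.shape[0] - h) // 2, (layer.shape[1] - w) // 2   # crop around the centre
        layer, lmask = layer[y0:y0 + h, x0:x0 + w], lmask[y0:y0 + h, x0:x0 + w]
        out[lmask] = layer[lmask]
    return out
```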
Distributed Video Coding of multi-view data and depth maps is an interesting and challenging research field, and interest in it is growing thanks to recent advances in depth estimation and the development of affordable devices able to acquire depth information. In applications like video surveillance and object tracking, the
availability of depth data can be beneficial and allow for more accurate processing. In these scenarios, the
encoding complexity is typically limited and therefore distributed coding approaches are desirable. In this
paper a novel algorithm for distributed compression of depth maps exploiting corresponding color information is
proposed. Due to the high correlation of the motion in color and corresponding depth videos, motion information
from the decoded color signal can effectively be exploited to generate accurate side information for the depth
signal, allowing for higher rate-distortion performance without increasing the delay at the decoder side. The
proposed scheme has been evaluated against state-of-the-art distributed video coding techniques applied on depth
data. Experimental results show that the proposed algorithm can provide PSNR improvements between 2.18 dB and 3.40 dB on depth data compared to the reference DISCOVER decoder, for a GOP size of 2 and QCIF resolution.
This paper proposes a multimodal image registration algorithm that searches for the best-matched keypoints by employing global information. Keypoints are detected in both the reference and test images. For each test keypoint, a certain number of reference keypoints are chosen as mapping candidates. A triplet of keypoint mappings determines an affine transformation, which is then evaluated with a similarity metric between the reference image and the test image transformed by the determined transformation. An iterative process is conducted on triplets of keypoint mappings, and for every test keypoint the best-matched reference keypoint is updated and stored. The similarity metric is defined as the number of overlapping edge pixels over the entire images, which allows global information to be incorporated in evaluating triplets of mappings. Experimental results show that the proposed algorithm provides more accurate registration than existing methods on EO-IR images.
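The core of the evaluation can be sketched in two small routines: one solves the affine transformation defined by a triplet of keypoint mappings, and one scores it by counting how many transformed test edge pixels land on reference edge pixels. Function names and the rounding of coordinates are illustrative choices, not the paper's implementation.

```python
import numpy as np

def affine_from_triplet(src_pts, dst_pts):
    """Affine transform (2x3) mapping three non-collinear src points onto three dst points."""
    A = np.array([[x, y, 1, 0, 0, 0] for x, y in src_pts] +
                 [[0, 0, 0, x, y, 1] for x, y in src_pts], dtype=float)
    b = np.array([x for x, _ in dst_pts] + [y for _, y in dst_pts], dtype=float)
    p = np.linalg.solve(A, b)
    return np.array([[p[0], p[1], p[2]], [p[3], p[4], p[5]]])

def edge_overlap_score(ref_edges, test_edges, M):
    """Count test edge pixels that land on reference edge pixels after applying M."""
    ys, xs = np.nonzero(test_edges)
    u = np.rint(M[0, 0] * xs + M[0, 1] * ys + M[0, 2]).astype(int)
    v = np.rint(M[1, 0] * xs + M[1, 1] * ys + M[1, 2]).astype(int)
    ok = (u >= 0) & (u < ref_edges.shape[1]) & (v >= 0) & (v < ref_edges.shape[0])
    return int(ref_edges[v[ok], u[ok]].sum())

# Tiny usage example: a pure translation by (+2, +1) gives a perfect overlap of 10 pixels.
test = np.zeros((20, 20), bool); test[5:15, 8] = True
ref = np.zeros((20, 20), bool);  ref[6:16, 10] = True
M = affine_from_triplet([(0, 0), (1, 0), (0, 1)], [(2, 1), (3, 1), (2, 2)])
print(edge_overlap_score(ref, test, M))   # 10
```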
Contrary to common assumptions in the literature, the blur kernel corresponding to lens-effect blur has been
demonstrated to be spatially-varying across the image plane. Existing models for the corresponding point spread
function (PSF) are either parameterized and spatially-invariant, or spatially-varying but ad-hoc and discretely-defined.
In this paper, we develop and present a novel, spatially-varying, parameterized PSF model that accounts for
Seidel aberrations and defocus in an imaging system. We also demonstrate that the parameters of this model can
easily be determined from a set of discretely-defined PSF observations, and that the model accurately describes
the spatial variation of the PSF from a test camera.
In this paper we address the problem of disparity estimation required for free navigation in acquired cubic-panorama image datasets. A client-server scheme is assumed, in which a remote user seeks information at each navigation step. This work addresses both the initial compression of such image datasets for storage and the transmission of the required data. For compression of the data for storage, a fast method that uses properties of the epipolar geometry together with the cubic format of panoramas is used to estimate disparity vectors efficiently. Assuming the use of B pictures, the concept of forward and backward prediction is addressed. For the transmission stage, a new disparity vector transcoding-like scheme is introduced and a frame conversion scenario is addressed. Details on how to pick the best vector among candidate disparity vectors are explained. In all of the above cases, results are compared both visually, through error images, and objectively, using Peak Signal-to-Noise Ratio (PSNR) versus time.
In this paper, we propose a method to extract depth from motion, texture and intensity. We first analyze the depth map to
extract a set of depth cues. Then, based on these depth cues, we process the colored reference video, using texture, motion,
luminance, and chrominance content, to extract the depth map. Each channel in the YCbCr color space is processed separately. We tested this approach on different video sequences with different monocular properties. The
results of our simulations show that the extracted depth maps generate a 3D video with quality close to the video rendered
using the ground truth depth map. We report objective results using 3VQM and subjective analysis via comparison of
rendered images. Furthermore, we analyze the savings in bitrate as a consequence of eliminating the need for two video
codecs, one for the reference color video and one for the depth map. In this case, only the depth cues are sent as side information alongside the color video.
In lossy image/video encoding, there is a compromise between the number of bits (rate) and the extent of distortion. Bits need to be properly allocated to different sources, such as frames and macroblocks (MBs). Since the human eye is more sensitive to differences than to the absolute values of signals, the MINMAX criterion suggests minimizing the maximum distortion over the sources to limit quality fluctuation. There are many works aimed at such constant-quality encoding; however, almost all of them focus on frame-layer bit allocation and use PSNR as the quality index. We suggest that the bit allocation for MBs should also be constrained to constant quality and, furthermore, that perceptual quality indices should be used instead of PSNR. Based on this idea, we propose a multi-pass block-layer bit allocation scheme for quality-constrained encoding. The experimental results show that the proposed method achieves much better encoding performance.
Keywords: bit allocation, block layer, perceptual quality, constant quality, quality constrained
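A toy version of the MINMAX idea at block level, assuming a placeholder distortion model D_i = c_i / r_i (in practice a perceptual quality index would replace it): each pass moves a small share of bits from the block with the lowest distortion to the one with the highest, driving all blocks toward equal quality. Function and variable names are illustrative.

```python
import numpy as np

def minmax_block_allocation(complexity, total_bits, passes=50):
    """Multi-pass block-layer allocation that evens out per-block distortion.

    Distortion model D_i = c_i / r_i is a placeholder, not the paper's model.
    """
    c = np.asarray(complexity, dtype=float)
    r = np.full(len(c), total_bits / len(c))          # start from a uniform split
    for _ in range(passes):
        d = c / r
        hi, lo = int(np.argmax(d)), int(np.argmin(d))
        step = 0.05 * r[lo]                           # move a small share of bits per pass
        r[lo] -= step
        r[hi] += step
    return r

print(minmax_block_allocation([4.0, 1.0, 2.5, 0.5], total_bits=4000))
```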
In this paper, we present a simple but effective hole filling algorithm for depth images acquired by time-of-flight (ToF) cameras. The proposed algorithm recovers a hole region of a depth image by taking into account contour pixels surrounding the hole region. In particular, eight contour pixels are selected and then grouped into four pairs according to the four representative directions, i.e., horizontal, vertical, and the two diagonal directions. The four pairs of contour pixels are then combined via a bilateral filtering framework in which the filter coefficients are obtained by considering the photometric distance between the two depth pixels in each pair and the geometric distance between the hole pixel and the contour pixels. The experimental results demonstrate that the proposed algorithm effectively recovers depth edges disconnected by the hole region.
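A minimal sketch of the described filling rule, under simplifying assumptions about how the eight contour pixels are located: for each of the four directions, the first valid depth samples on either side of the hole pixel form a pair, and the pairs are blended with weights that decay with the depth difference within the pair (photometric term) and with the pair's distance from the hole pixel (geometric term). Parameter values are illustrative.

```python
import numpy as np

DIRS = [(0, 1), (1, 0), (1, 1), (1, -1)]   # horizontal, vertical, two diagonals

def fill_hole_pixel(depth, hole, y, x, sigma_d=20.0, sigma_s=15.0):
    """Fill one hole pixel (y, x must lie inside the hole) from eight contour pixels."""
    h, w = depth.shape
    num, den = 0.0, 0.0
    for dy, dx in DIRS:
        pair = []
        for s in (1, -1):                                  # walk both ways until leaving the hole
            yy, xx, k = y, x, 0
            while 0 <= yy < h and 0 <= xx < w and hole[yy, xx]:
                yy += s * dy; xx += s * dx; k += 1
            if 0 <= yy < h and 0 <= xx < w:
                pair.append((depth[yy, xx], k))
        if len(pair) == 2:
            (d1, k1), (d2, k2) = pair
            wgt = np.exp(-abs(d1 - d2) ** 2 / (2 * sigma_d ** 2)) * \
                  np.exp(-(k1 + k2) ** 2 / (2 * sigma_s ** 2))
            num += wgt * (d1 * k2 + d2 * k1) / (k1 + k2)   # inverse-distance pair estimate
            den += wgt
    return num / den if den > 0 else depth[y, x]

depth = np.array([[50, 50, 50, 50, 50],
                  [50,  0,  0,  0, 52],
                  [50,  0,  0,  0, 52],
                  [50, 52, 52, 52, 52]], dtype=float)
hole = depth == 0
print(round(fill_hole_pixel(depth, hole, 2, 2), 2))
```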
This PDF file contains the front matter associated with SPIE Proceedings Volume 8666, including the Title Page, Copyright Information, Table of Contents, and the Conference Committee listing.