Natural and artificial textures occur frequently in images and in video sequences. Image/video coding systems based on texture synthesis can make use of a reliable texture synthesis quality assessment method in order to improve the compression performance in terms of perceived quality and bit-rate. Existing objective visual quality assessment methods do not perform satisfactorily when predicting the synthesized texture quality. In our previous work, we showed that texture regularity can be used as an attribute for estimating the quality of synthesized textures. In this paper, we study the effect of another texture attribute, namely texture granularity, on the quality of synthesized textures. For this purpose, subjective studies are conducted to assess the quality of synthesized textures with different levels (low, medium, high) of perceived texture granularity using different types of texture synthesis methods.
The varying quality of face images is an important challenge that limits the effectiveness of face recognition technology when applied in real-world applications. Existing face image databases do not consider the effect of distortions that commonly occur in real-world environments. This database (QLFW) represents an initial attempt to provide a set of labeled face images spanning the wide range of quality, from no perceived impairment to strong perceived impairment for face detection and face recognition applications. Types of impairment include JPEG2000 compression, JPEG compression, additive white noise, Gaussian blur and contrast change. Subjective experiments are conducted to assess the perceived visual quality of faces under different levels and types of distortions and also to assess the human recognition performance under the considered distortions. One goal of this work is to enable automated performance evaluation of face recognition technologies in the presence of different types and levels of visual distortions. This will consequently enable the development of face recognition systems that can operate reliably on real-world visual content in the presence of real-world visual distortions. Another goal is to enable the development and assessment of visual quality metrics for face images and for face detection and recognition applications.
Texture granularity is an important visual characteristic that is useful in a variety of applications, including analysis, recognition, and compression, to name a few. A texture granularity measure can be used to quantify the perceived level of texture granularity. The granularity level of a texture is influenced by the size of its primitives, where a primitive is defined as the smallest recognizable repetitive object in the texture. If the texture has large primitives, then the perceived granularity level tends to be lower as compared to a texture with smaller primitives. In this work, we present a texture granularity database, referred to as GranTEX, which consists of 30 textures with varying levels of primitive sizes and granularity levels. The GranTEX database consists of both natural and man-made textures. A subjective study is conducted to measure the perceived granularity level of the textures present in the GranTEX database. An objective metric that automatically measures the perceived granularity level of textures is also presented as part of this work. It is shown that the proposed granularity metric correlates well with the subjective granularity scores.
Blur is an important attribute in the study and modeling of the human visual system. Blur discrimination has been studied
extensively using 2D test patterns. In this study, we present the details of subjective tests performed to measure blur
discrimination thresholds using stereoscopic 3D test patterns. Specifically, the effect of disparity on the blur
discrimination thresholds is studied on a passive stereoscopic 3D display. The blur discrimination thresholds are
measured using stereoscopic 3D test patterns with positive, negative and zero disparity values, at multiple reference blur
levels. A disparity value of zero represents the 2D viewing case, where both eyes observe the same image. The
subjective test results indicate that the blur discrimination thresholds remain constant as we vary the disparity value. This
further indicates that binocular disparity does not affect blur discrimination thresholds and the models developed for 2D
blur discrimination thresholds can be extended to stereoscopic 3D blur discrimination thresholds. We have presented
fitting of the Weber model to the 3D blur discrimination thresholds measured from the subjective experiments.
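The Weber-model fit mentioned above can be sketched as a simple least-squares problem. The sketch below is illustrative only, with made-up data rather than the paper's measured thresholds; it assumes a linear Weber model, threshold = a·(reference blur) + b, and fits it with NumPy.

```python
import numpy as np

def fit_weber(ref_blur, thresholds):
    """Least-squares fit of a linear Weber model:
    threshold = a * ref_blur + b  (a ~ Weber fraction, b ~ base threshold)."""
    A = np.column_stack([ref_blur, np.ones_like(ref_blur)])
    (a, b), *_ = np.linalg.lstsq(A, thresholds, rcond=None)
    return a, b

# Illustrative (not measured) data that follows the model exactly.
ref = np.array([0.5, 1.0, 2.0, 4.0])
thr = 0.25 * ref + 0.1
a, b = fit_weber(ref, thr)   # recovers a = 0.25, b = 0.1
```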
In this paper we study the application of visual saliency models for the simultaneous localization and mapping (SLAM) problem. We consider visual SLAM, where the location of the camera and a map of the environment can be generated using images from a single moving camera. In visual SLAM, the interest point detector is of key importance. This detector must be invariant to certain image transformations so that features can be matched across different frames. Recent work has used a model of human visual attention to detect interest points; however, it is unclear as to what is the best attention model for this purpose. To this aim, we compare the performance of interest points from four saliency models (Itti, GBVS, RARE, and AWS) with the performance of four traditional interest point detectors (Harris, Shi-Tomasi, SIFT, and FAST). We evaluate these detectors under several different types of image transformation and find that the Itti saliency model, in general, achieves the best performance in terms of keypoint repeatability.
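Keypoint repeatability, the evaluation measure used above, is commonly computed as the fraction of detections in one frame that reappear within a small pixel tolerance after mapping through the known image transformation. A minimal sketch of that measure (the function name, tolerance, and homography-based mapping are illustrative assumptions, not the paper's evaluation code):

```python
import numpy as np

def repeatability(kp_a, kp_b, H, tol=2.0):
    """Fraction of frame-A keypoints that, after mapping through the
    homography H, lie within tol pixels of some frame-B keypoint."""
    pts = np.column_stack([kp_a, np.ones(len(kp_a))]) @ H.T
    pts = pts[:, :2] / pts[:, 2:3]           # homogeneous -> Cartesian
    d = np.linalg.norm(pts[:, None, :] - np.asarray(kp_b)[None, :, :], axis=2)
    return float((d.min(axis=1) <= tol).mean())

H = np.eye(3)                                 # identity transform
kp_a = [[0.0, 0.0], [10.0, 10.0]]
r_all = repeatability(kp_a, kp_a, H)          # 1.0: every point re-detected
r_half = repeatability(kp_a, [[0.0, 0.0]], H) # 0.5: one of two re-detected
```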
Many important applications in clinical medicine can benefit from the fusion of spectroscopy data with anatomical
images. For example, the correlation of metabolite profiles with specific regions of interest in anatomical tumor images
can be useful in characterizing and treating heterogeneous tumors that appear structurally homogeneous. Such
applications can build on the correlation of data from in-vivo Proton Magnetic Resonance Spectroscopy Imaging (1H-MRSI) with data from genetic and ex-vivo Nuclear Magnetic Resonance spectroscopy. To establish that correlation, tissue samples must be neurosurgically extracted from specifically identified locations with high accuracy. Toward that end, this paper presents new neuronavigation technology that enhances current clinical capabilities in the context of neurosurgical planning and execution. The proposed methods improve upon the current state-of-the-art in neuronavigation through the use of detailed three dimensional (3D) 1H-MRSI data. MRSI spectra are processed and analyzed, and specific voxels are selected based on their chemical contents. 3D neuronavigation overlays are then generated and applied to anatomical image data in the operating room. Without such technology, neurosurgeons must rely on memory and other qualitative resources alone for guidance in accessing specific MRSI-identified voxels. In contrast, MRSI-based overlays provide quantitative visual cues and location information during neurosurgery. The proposed methods enable a progressive new form of online MRSI-guided neuronavigation that we demonstrate in this study through phantom validation and clinical application.
Three-dimensional (3-D) stereo video is becoming widely available, and one needs to consider depth effects
when extending 2-D video processing algorithms to the 3-D stereo setup. Depth is an additional
attribute that contributes to the overall visual quality of 3-D stereo video. Sharpness enhancement
algorithms are commonly applied in the 2-D video processing chain; here, the effect on depth perception when
a sharpness enhancement algorithm is applied to 3-D stereo video is studied. A subjective
experiment is presented to study the relation between blur/sharpness and depth. The concept of just
noticeable blur (JNB) at different depths is introduced for stereo image pairs. Based on the
results of the experiment, an adaptive sharpness enhancement algorithm is proposed. Visual
quality results for the proposed depth-aware sharpness enhancement algorithm are presented.
In this paper, a technique is presented to alleviate ghosting artifacts in the decoded video sequences for low-bit-rate
video coding. Ghosting artifacts can be defined as the appearance of ghost like outlines of an object in a decoded video
frame. Ghosting artifacts result from the use of a prediction loop in the video codec, which is typically used to increase
the coding efficiency of the video sequence. They appear in the presence of significant frame-to-frame motion in the
video sequence, and are typically visible for several frames until they eventually die out or an intra-frame refresh occurs.
Ghosting artifacts are particularly annoying at low bit rates since the extreme loss of information tends to accentuate
their appearance. To mitigate this effect, a procedure with selective in-loop filtering based on motion vector information
is proposed. In the proposed scheme, the in-loop filter is applied only to the regions where there is motion. This is done
so as not to affect the regions that are devoid of motion, since ghosting artifacts only occur in high-motion regions. It is
shown that the proposed selective filtering method dramatically reduces ghosting artifacts in a wide variety of video
sequences with pronounced frame-to-frame motion, without degrading the motionless regions.
In this paper, we present a region-of-interest-based video coding system for use in real-time applications. Region-of-interest (ROI) coding methodology specifies that targets or ROIs be coded at higher fidelity using more of the available bits, while the remainder of the scene, or background, is coded using fewer bits. This allows the target regions within the scene to be well preserved, while dramatically reducing the number of bits required to code the video sequence, thus reducing the transmission bandwidth and storage requirements. In the proposed system, the ROI contours can be selected arbitrarily by the user via a graphical user interface (GUI), or they can be specified via a text file interface by an automated process such as a detection/tracking algorithm. Additionally, these contours can be specified at either the transmitter or receiver. Contour information is efficiently exchanged between the transmitter and receiver and can be adjusted on the fly and in real time. Coding results are presented for both electro-optical (EO) and infrared (IR) video sequences to demonstrate the performance of the proposed system.
In this paper, we present a memory-efficient, contour-based, region-of-interest (ROI) algorithm designed for ultra-low-bit-
rate compression of very large images. The proposed technique is integrated into a user-interactive wavelet-based
image coding system in which multiple ROIs of any shape and size can be selected and coded efficiently. The coding
technique compresses region-of-interest and background (non-ROI) information independently by allocating more bits to
the selected targets and fewer bits to the background data. This allows the user to transmit large images at very low
bandwidths with lossy/lossless ROI coding, while preserving the background content to a certain level for contextual
purposes. Extremely large images (e.g., 65,000 × 65,000 pixels) with multiple large ROIs can be coded with minimal
memory usage by using intelligent ROI tiling techniques. The foreground information at the encoder/decoder is
independently extracted for each tile without adding extra ROI side information to the bit stream. The arbitrary ROI
contour is down-sampled and differential chain coded (DCC) for efficient transmission. ROI wavelet masks for each tile
are generated and processed independently to handle any size image and any shape/size of overlapping ROIs. The
resulting system dramatically reduces the data storage and transmission bandwidth requirements for large digital images
with multiple ROIs.
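The differential chain coding (DCC) of the down-sampled ROI contour can be illustrated with the classic 8-directional Freeman chain code: the first direction is sent as-is, and only the mod-8 differences between successive directions follow, which concentrates the symbols near zero for smooth contours. This is a generic sketch of the technique, not the paper's implementation:

```python
# 8-directional Freeman chain code with differential encoding.
DIRS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]

def chain_code(points):
    """Absolute direction codes between successive contour points
    (points are assumed 8-connected after down-sampling)."""
    return [DIRS.index((x1 - x0, y1 - y0))
            for (x0, y0), (x1, y1) in zip(points, points[1:])]

def differential(codes):
    """First code as-is, then mod-8 differences between directions."""
    return codes[:1] + [(c - p) % 8 for p, c in zip(codes, codes[1:])]

# A small closed square contour, traversed counter-clockwise.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2),
          (1, 2), (0, 2), (0, 1), (0, 0)]
codes = chain_code(square)    # [0, 0, 2, 2, 4, 4, 6, 6]
diffs = differential(codes)   # [0, 0, 2, 0, 2, 0, 2, 0]
```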
KEYWORDS: Video, Video coding, Motion estimation, Video compression, Super resolution, Computer programming, Video processing, Process control, Video surveillance, Visualization
This paper presents a software-only, real-time video coder/decoder (codec) with super-resolution-based enhancement
for ultra-low-bit-rate compression. The codec incorporates a modified JPEG2000 core and interframe predictive
coding, and can operate with network bandwidths as low as 500 bits/second. Highly compressed video exhibits
severe coding artifacts that degrade visual quality. To lower the level of noise and retain the sharpness of the video
frames, we build on our previous work in super-resolution-based video enhancement and propose a new version that is
suitable for real-time video coding systems. The adopted super-resolution-based enhancement uses a constrained set
of motion vectors that is computed from the original (uncompressed) video at the encoder. Artificial motion is also
added to the difference frame to maximize the enhancement performance. The encoder can transmit either the full set
of motion vectors or the constrained set of motion vectors depending upon the available bandwidth. At the decoder,
each pixel of the decoded frame is assigned to a motion vector from the constrained motion vector set. L2-norm
minimization super-resolution is then applied to the decoded frame set (previous frame, current frame, and next frame).
A selective motion estimation scheme is proposed to prevent ghosting, which otherwise would result from the super-resolution
enhancement when the motion estimation fails to find appropriate motion vectors. Results using the
proposed system demonstrate significant improvements in the quantitative and visual quality of the coded video
sequences.
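For frames that have already been aligned by motion compensation, the L2-norm minimization step has a closed form: argmin_x Σ_k w_k‖f_k − x‖² is the weighted pixelwise mean of the aligned frames. A minimal sketch under that assumption (the function name and uniform weighting are illustrative, not the codec's actual implementation, which also handles the downsampling and motion-assignment steps described above):

```python
import numpy as np

def l2_sr(frames, weights=None):
    """L2-norm minimization over motion-compensated frames:
    argmin_x sum_k w_k * ||f_k - x||^2, whose closed-form solution
    is the weighted pixelwise mean of the aligned frames."""
    stack = np.stack([np.asarray(f, dtype=float) for f in frames])
    if weights is None:
        weights = np.ones(len(frames))
    w = np.asarray(weights, dtype=float).reshape(-1, 1, 1)
    return (w * stack).sum(axis=0) / w.sum()

# Previous, current, and next decoded frames (toy constant images).
prev, cur, nxt = (np.full((2, 2), v) for v in (1.0, 2.0, 3.0))
est = l2_sr([prev, cur, nxt])   # pixelwise average -> all pixels 2.0
```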
KEYWORDS: Video, Video compression, Video coding, Video surveillance, Computer programming, Process control, Video processing, JPEG2000, Receivers, Wavelets
This paper presents a software-only, real-time video coder/decoder (codec) for use with low-bandwidth channels where the bandwidth is unknown or varies with time. The codec incorporates a modified JPEG2000 core and interframe predictive coding, and can operate with network bandwidths of less than 1 kbits/second. The encoder and decoder establish two virtual connections over a single IP-based communications link. The first connection is UDP/IP guaranteed throughput, which is used to transmit the compressed video stream in real time, while the second is TCP/IP guaranteed delivery, which is used for two-way control and compression parameter updating. The TCP/IP link serves as a virtual feedback channel and enables the decoder to instruct the encoder to throttle back the transmission bit rate in response to the measured packet loss ratio. It also enables either side to initiate on-the-fly parameter updates such as bit rate, frame rate, frame size, and correlation parameter, among others. The codec also incorporates frame-rate throttling whereby the number of frames decoded is adjusted based upon the available processing resources. Thus, the proposed codec is capable of automatically adjusting the transmission bit rate and decoding frame rate to adapt to any network scenario. Video coding results for a variety of network bandwidths and configurations are presented to illustrate the vast capabilities of the proposed video coding system.
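The feedback-driven rate throttling described above can be illustrated with a simple policy: back off multiplicatively when the packet loss ratio measured over the TCP/IP feedback link exceeds a target, and probe gently upward otherwise. The thresholds, factors, and 500 bits/second floor below are illustrative assumptions, not the codec's actual control parameters:

```python
def throttle(bitrate, loss_ratio, target=0.02, down=0.75, up=1.05, floor=500):
    """Adjust the transmission bit rate from the packet loss ratio
    reported over the TCP/IP feedback link: multiplicative back-off
    above the loss target, gentle probing upward below it."""
    if loss_ratio > target:
        return max(floor, bitrate * down)
    return bitrate * up
```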
KEYWORDS: Video, Motion estimation, Super resolution, Video compression, Image processing, Image enhancement, Image quality, Point spread functions, Electrical engineering, Digital signal processing
In this paper, a new technique for robust super-resolution (SR) from compressed video is presented. The proposed method exploits the differences between low-resolution images at the pixel level in order to determine the usability of every pixel in the low-resolution images for SR enhancement. Only the pixels from the low-resolution images that are determined to be usable are included in the L2-norm minimization procedure. Three different usability criteria are proposed: maximum distance from the median (MDM), maximum distance from the initial image (MDIM), and maximum distance from the SR estimate (MDSRE). The results obtained with real video sequences demonstrate superior quality of the resulting enhanced image in the presence of outliers, and the same quality without outliers, when compared to existing L2-norm minimization techniques. At the same time, the proposed scheme produces sharper images as compared to L1-norm minimization techniques.
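The MDM criterion can be sketched as a pixelwise usability mask: a low-resolution pixel is deemed usable when it lies within a threshold of the pixelwise median across the frame stack, which rejects outliers before the L2-norm minimization. The threshold value and function name below are illustrative, not the paper's:

```python
import numpy as np

def mdm_mask(frames, tau):
    """Maximum-distance-from-the-median (MDM) usability test: a pixel
    is 'usable' when it lies within tau of the pixelwise median
    taken across the low-resolution frame stack."""
    stack = np.stack([np.asarray(f, dtype=float) for f in frames])
    median = np.median(stack, axis=0)
    return np.abs(stack - median) <= tau

f1 = np.array([[1.0, 1.0], [1.0, 1.0]])
f2 = np.array([[1.0, 1.0], [1.0, 9.0]])   # outlier at (1, 1)
f3 = np.array([[1.0, 1.0], [1.0, 1.0]])
mask = mdm_mask([f1, f2, f3], tau=2.0)    # outlier pixel marked unusable
```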
KEYWORDS: Video, Video surveillance, Video compression, Motion estimation, Super resolution, Signal to noise ratio, Receivers, Video processing, Video coding, Target detection
Modern video surveillance and target tracking applications utilize multiple cameras transmitting low-bit-rate
video through channels of very limited bandwidth. The highly compressed video exhibits coding artifacts that
can cause target detection and tracking procedures to fail. Thus, to lower the level of noise and retain the
sharpness of the video frames, super-resolution techniques can be employed for video enhancement. In this
paper, we propose an efficient super-resolution video enhancement scheme that is based on a constrained set
of motion vectors. The proposed scheme computes the motion vectors using the original (uncompressed) video
frames, and transmits only a small set of these vectors to the receiver. At the receiver, each pixel is assigned
a motion vector from the constrained set to maximize the motion prediction performance. The size of the
transmitted vector set is constrained to be less than 3% of the total coded bit stream. In the video enhancement
process, an L2-norm minimization super-resolution procedure is applied. The proposed scheme is applied to
enhance highly compressed, real-world video sequences. The results obtained show significant improvement in
the visual quality of the video sequences, as well as in the performance of subsequent target detection and
tracking procedures.
This paper presents an error-resilient wavelet-based multiple
description video coding scheme for the transmission of video over
wireless channels. The proposed video coding scheme has been
implemented and successfully tested over the wireless Iridium
satellite communication network. As a test bed for the developed codec, we also present an inverse multiplexing unit that simultaneously combines several Iridium channels to form an effective higher-rate channel, where the total bandwidth is directly proportional to the number of channels combined. The developed unit can be integrated into a variety of systems such as ISR sensors, aircraft, vehicles, ships, and end user terminals (EUTs), or can operate as a standalone device. The combination of the multi-channel unit with our proposed multi-channel video codec facilitates global and on-the-move video communications without reliance on any terrestrial or airborne infrastructure whatsoever.
Context-based arithmetic coding has been widely adopted in image and video compression and is a key component of the new JPEG2000 image compression standard. In this paper, the contexts used in JPEG2000 are analyzed using mutual information, which has a direct link to compression performance. We first show that, when combining contexts, the mutual information between the contexts and the encoded data will decrease unless the conditional probability distributions of the combined contexts are the same. Given I, the
initial number of contexts, and F, the final desired number of contexts, there are S(I, F) possible context classification schemes, where S(I, F) is called the Stirling number of the second kind. The optimal classification scheme is the one that gives the maximum mutual information. Instead of an exhaustive search, the optimal classification scheme can be obtained through a modified Generalized Lloyd algorithm with the relative entropy as the distortion metric. For binary arithmetic coding, the search complexity can be reduced by using dynamic programming. Our experimental results show that the JPEG2000 contexts capture the correlations among the wavelet coefficients very well. At the same time, the number of contexts used as part of the standard can be reduced without loss in coding performance.
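The core ideas above, that merging contexts loses no mutual information exactly when their conditional symbol distributions coincide, and that the pair closest in relative entropy is the cheapest merge, can be sketched for a binary alphabet as follows. This is a greedy one-step illustration, not the paper's modified Generalized Lloyd algorithm:

```python
import numpy as np

def mutual_info(joint):
    """I(C;X) in bits, from a joint count table (contexts x symbols)."""
    p = joint / joint.sum()
    pc = p.sum(axis=1, keepdims=True)     # context marginal
    px = p.sum(axis=0, keepdims=True)     # symbol marginal
    nz = p > 0
    return float((p[nz] * np.log2(p[nz] / (pc @ px)[nz])).sum())

def merge_closest(joint):
    """Merge the two contexts whose conditional symbol distributions are
    closest in symmetrized relative entropy (terms where either
    probability is zero are skipped, a simplification for this sketch)."""
    cond = joint / joint.sum(axis=1, keepdims=True)
    best, pair = np.inf, None
    for i in range(len(cond)):
        for j in range(i + 1, len(cond)):
            nz = (cond[i] > 0) & (cond[j] > 0)
            kl = (cond[i][nz] * np.log2(cond[i][nz] / cond[j][nz])).sum() \
               + (cond[j][nz] * np.log2(cond[j][nz] / cond[i][nz])).sum()
            if kl < best:
                best, pair = kl, (i, j)
    i, j = pair
    merged = np.delete(joint, j, axis=0)
    merged[i] += joint[j]                 # pool the counts of the pair
    return merged

# Contexts 0 and 1 share the conditional distribution (0.8, 0.2), so
# merging them leaves the mutual information unchanged.
joint = np.array([[8.0, 2.0], [4.0, 1.0], [1.0, 9.0]])
before = mutual_info(joint)
merged = merge_closest(joint)             # [[12, 3], [1, 9]]
after = mutual_info(merged)               # equal to `before`
```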
This paper presents a wavelet-based image coder that is optimized for transmission over binary channels with memory. The proposed coder uses a channel-optimized trellis-coded quantizer (COTCQ) designed for a binary first-order Markov channel. The quantizer stage exploits the channel memory by incorporating the characteristics of the additive correlated channel noise during the quantizer design and by using a new trellis structure. The performance of the proposed memory-optimized COTCQ (MCOTCQ) image coding system is presented for different bit error probabilities and noise correlation parameters. It is shown that the performance of the coder is improved significantly when the second-order statistics of the noise are incorporated at the quantizer design level.
This paper presents a DCT-based image coder optimized for transmission over binary symmetric channels. The proposed coder uses a robust channel-optimized trellis-coded quantization stage that is designed to optimize the image coding based on the channel characteristics. This optimization is performed only at the level of the source encoder, and does not include any channel coding for error protection. The robust nature of the coder increases the security level of the encoded bit stream and provides a much more visually pleasing rendition of the decoded image. Consequently, the proposed robust channel-optimized image coder is especially suitable for wireless transmission due to its reduced complexity, its robustness to non-stationary signals and channels, and its increased security level.
This paper presents a wavelet-based hyperspectral image coder that is optimized for transmission over the binary symmetric channel. The proposed coder uses a robust channel-optimized trellis-coded quantization stage that is designed to optimize the image coding based on the channel characteristics. This optimization is performed only at the level of the source encoder, and does not include any channel coding for error protection. The robust nature of the coder increases the security level of the encoded bit stream and provides a much more visually pleasing rendition of the decoded image. In the absence of channel noise, the proposed coder is shown to achieve a compression ratio greater than 70:1, with an average PSNR of the coded hyperspectral sequence exceeding 40 dB. Additionally, the coder is shown to exhibit graceful degradation with increasing channel errors.
This paper presents a wavelet-based image coder optimized for transmission over binary symmetric channels (BSC). The proposed coder uses a robust channel-optimized trellis-coded quantization (COTCQ) stage that is designed to optimize the image coding based on channel characteristics. This optimization is performed only at the level of the source encoder, and does not include any channel coding for error protection. The robust nature of the coder increases the security level of the encoded bit stream and provides a much more visually pleasant rendition of the decoded image. Consequently, the proposed robust channel-optimized image coder is especially suitable for wireless transmission due to its reduced complexity, its robustness to non-stationary signals and channels, and its increased security level.
This paper treats the compression of Synthetic Aperture Radar (SAR) imagery. SAR images are difficult to compress, relative to natural images, because SAR contains inherent high-frequency speckle. Today's state-of-the-art coders are designed to work with natural images, which have a lower frequency content; thus, their performance on SAR is subpar. In this paper, we give an overview performance report on popular compression techniques and investigate three approaches to improve the quality of SAR compression at low bit rates. First, we look at the design of optimal quantizers, which we obtain by training on SAR data. Second, we explore the use of perceptual properties of the human visual system to improve subjective coding quality. Third, we consider the use of a model that separates the SAR image into structural and textural components. The paper concludes with a subjective evaluation of the algorithms based on the CCIR recommendation for the assessment of picture quality.