In this paper, we present a novel method for inverse filtering a two-dimensional (2-D) signal using phase-based processing techniques. A 2-D sequence can be represented by a sufficient number of samples of the phase of its Fourier transform and its region of support. This is exploited to perform deconvolution. We examine the effects of additive noise and incomplete knowledge of the point spread function on the performance of this deconvolution method and compare it with other 2-D deconvolution methods. The problem of finding the region of support will also be briefly addressed. Finally, an application example will be presented.
The electronic rolling shutter mechanism found in many digital cameras may result in spatially-varying blur kernels if camera motion occurs during an imaging exposure. However, existing deblurring algorithms cannot remove the blurs in this case since the blurred image doesn't typically meet the assumptions embedded in these algorithms. This paper attempts to address the problem of modeling and correcting non-uniform image blurs caused by the rolling shutter effect. We introduce a new operator and a mask matrix into the projective motion blur model to describe the blurring process of each row in the image. Based on this modified geometric model, an objective function is formulated and optimized in an alternating scheme. In addition, noisy accelerometer data along x and y directions is incorporated as a regularization term to constrain the solution. The effectiveness of this approach is demonstrated by experimental results on synthetic and real images.
In this paper, an innovative method of HEVC video pre-processing is proposed. The method applies simple linear iterative clustering (SLIC), which adapts k-means clustering to group pixels into perceptually meaningful atomic regions called superpixels. By calculating the average of the weighted averages of luminance differences around each pixel in the superpixel, a suitable Gaussian filter parameter for the superpixel is determined. Experimental results show that the bit rate can be reduced by up to 29% without loss of visual quality.
Block-transform lossy image compression is the most widely used approach for compressing and storing images or video. A novel algorithm to restore highly compressed images with greater image quality is proposed. Since many block-transform coefficients are reduced to zero after quantization, the compressed image restoration problem can be treated as a sparse reconstruction problem where the original image is reconstructed based on sparse, degraded measurements in the form of highly quantized block-transform coefficients. The sparse reconstruction problem is solved by minimizing a homotopic regularized function, subject to data fidelity in the block-transform domain. Experimental results using compressed natural images at different levels of compression show improved performance by using the proposed algorithm compared to other methods.
This paper proposes a novel rain detection and removal algorithm that is robust against camera motion. Detecting and removing rain in video with camera motion is very difficult, so most previous works assume a fixed camera, which limits their usefulness in practical applications. The proposed algorithm initially detects possible rain streaks by using spatial properties such as the luminance and structure of rain streaks. Then, rain streak candidates are selected based on a Gaussian distribution model. Next, a non-rain block matching algorithm is performed between adjacent frames to find blocks similar to each block that includes rain pixels. If similar blocks are found, the rain region of the block is reconstructed by non-local means (NLM) filtering using the similar neighbors. Experimental results show that the proposed method outperforms previous works in terms of objective and subjective visual quality.
Exploiting perceptual redundancy plays an important role in image processing. Conventional just-noticeable difference (JND) models describe the visibility of the minimally perceptible difference by assuming that visual acuity is consistent over the whole image. Some earlier work considers the space-variant properties of the human visual system (HVS) based on the non-uniform density of photoreceptor cells. In this paper, we aim to exploit the relationship between the masking effects and the foveation properties of the HVS. We design psychophysical experiments to model the foveation properties in response to the masking effects. The experiments examine the reduction of visual sensitivity in the HVS due to increased retinal eccentricity. Based on these experiments, the developed foveated JND (FJND) model measures the perceptible difference of images according to masking effects and therefore provides the information needed to quantify the perceptual redundancy in images. Subjective evaluations validate the proposed FJND model.
This paper describes a technique for performing intra prediction of the chroma planes based on the reconstructed luma plane in the frequency domain. This prediction exploits the fact that while RGB to YUV color conversion has the property that it decorrelates the color planes globally across an image, there is still some correlation locally at the block level [1]. Previous proposals compute a linear model of the spatial relationship between the luma plane (Y) and the two chroma planes (U and V) [2]. In codecs that use lapped transforms this is not possible, since transform support extends across the block boundaries [3] and thus neighboring blocks are unavailable during intra prediction. We design a frequency domain intra predictor for chroma that exploits the same local correlation with lower complexity than the spatial predictor and which works with lapped transforms. We then describe a low-complexity algorithm that directly uses luma coefficients as a chroma predictor based on gain-shape quantization and band partitioning. An experiment is performed that compares these two techniques inside the experimental Daala video codec and shows the lower complexity algorithm to be a better chroma predictor.
This paper applies energy conservation principles to the Daala video codec using gain-shape vector quantization to encode a vector of AC coefficients as a length (gain) and direction (shape). The technique originates from the CELT mode of the Opus audio codec, where it is used to conserve the spectral envelope of an audio signal. Conserving energy in video has the potential to preserve textures rather than low-passing them. Explicitly quantizing a gain allows a simple contrast masking model with no signaling cost. Vector quantizing the shape keeps the number of degrees of freedom the same as scalar quantization, avoiding redundancy in the representation. We demonstrate how to predict the vector by transforming the space it is encoded in, rather than subtracting off the predictor, which would make energy conservation impossible. We also derive an encoding of the vector-quantized codewords that takes advantage of their non-uniform distribution. We show that the resulting technique outperforms scalar quantization by an average of 0.90 dB on still images, equivalent to a 24.8% reduction in bitrate at equal quality, while for videos, the improvement averages 0.83 dB, equivalent to a 13.7% reduction in bitrate.
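As a minimal sketch of the gain-shape idea in the preceding abstract, the Python below splits a coefficient vector into a gain (its L2 norm) and a shape (its unit direction), quantizes each separately, and reconstructs the vector. The shape codebook used here (integer vectors whose absolute values sum to a fixed number of pulses `k`), the uniform gain step, and all function names are assumptions chosen for illustration; the abstract does not specify them, and the prediction and entropy-coding steps it mentions are not shown.

```python
import math

def split_gain_shape(x):
    """Split a coefficient vector into its L2 norm (gain) and unit direction (shape)."""
    gain = math.sqrt(sum(v * v for v in x))
    if gain == 0.0:
        return 0.0, [0.0] * len(x)
    return gain, [v / gain for v in x]

def quantize_gain(gain, step):
    """Uniform scalar quantization of the gain (step size is a hypothetical parameter)."""
    return round(gain / step) * step

def quantize_shape(shape, k):
    """Approximate the shape by an integer vector whose absolute values sum to k (k >= 1)."""
    l1 = sum(abs(v) for v in shape)
    if l1 == 0.0:
        pulses = [0] * len(shape)
        pulses[0] = k
        return pulses
    scaled = [k * abs(v) / l1 for v in shape]
    pulses = [int(math.floor(s)) for s in scaled]
    # Hand the leftover pulses to the positions with the largest fractional parts.
    remaining = k - sum(pulses)
    order = sorted(range(len(shape)), key=lambda i: scaled[i] - pulses[i], reverse=True)
    for i in order[:remaining]:
        pulses[i] += 1
    return [p if v >= 0 else -p for p, v in zip(pulses, shape)]

def reconstruct(gain_q, pulses):
    """Rescale the integer shape to unit length, then apply the quantized gain."""
    norm = math.sqrt(sum(p * p for p in pulses))
    return [gain_q * p / norm for p in pulses]

coeffs = [3.0, -1.0, 0.5, 0.0]
gain, shape = split_gain_shape(coeffs)
gain_q = quantize_gain(gain, step=0.5)
pulses = quantize_shape(shape, k=4)
decoded = reconstruct(gain_q, pulses)
decoded_energy = math.sqrt(sum(v * v for v in decoded))
```

Because the shape is renormalized before the gain is applied, the norm of `decoded` equals the quantized gain regardless of how coarsely the shape is quantized, which is the energy-conserving property the abstract refers to.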
In the Differential Pulse-code Modulation (DPCM) image coding, the intensity of a pixel is predicted as a linear combination of a set of surrounding pixels and the prediction error is encoded. In this paper, we propose the adaptive residual DPCM (ARDPCM) for intra lossless coding. In the ARDPCM, intra residual samples are predicted using adaptive mode-dependent DPCM weights. The weights are estimated by minimizing the Mean Squared Error (MSE) of coded data and they are synchronized at the encoder and the decoder. The proposed method is implemented on the High Efficiency Video Coding (HEVC) reference software. Experimental results show that the ARDPCM significantly outperforms HEVC lossless coding and HEVC with the DPCM. The proposed method is also computationally efficient.
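The basic DPCM loop described in the preceding abstract can be sketched as follows. This is an illustration under stated assumptions, not the ARDPCM method itself: it uses fixed predictor weights (left + top - top-left) applied to pixel values, whereas the abstract describes adaptive, mode-dependent weights applied to intra residual samples. The function names are hypothetical.

```python
def predict(img, r, c):
    """Predict pixel (r, c) from neighbours earlier in raster order.

    Fixed weights (+1 left, +1 top, -1 top-left) are an assumption for this sketch;
    missing neighbours at the image border are treated as 0.
    """
    left = img[r][c - 1] if c > 0 else 0
    top = img[r - 1][c] if r > 0 else 0
    top_left = img[r - 1][c - 1] if r > 0 and c > 0 else 0
    return left + top - top_left

def dpcm_encode(img):
    """Return the prediction errors that a lossless coder would entropy-code."""
    rows, cols = len(img), len(img[0])
    return [[img[r][c] - predict(img, r, c) for c in range(cols)] for r in range(rows)]

def dpcm_decode(residuals):
    """Rebuild the image in raster order, predicting from pixels already decoded."""
    rows, cols = len(residuals), len(residuals[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            out[r][c] = residuals[r][c] + predict(out, r, c)
    return out

image = [[10, 12, 13],
         [11, 13, 15],
         [12, 14, 17]]
residuals = dpcm_encode(image)
restored = dpcm_decode(residuals)
```

Since the coding is lossless, the decoder's neighbours match the encoder's exactly, so both sides compute the same prediction without any side information.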
Buffer or counter-based techniques are adequate for dealing with carry propagation in software implementations of arithmetic coding, but create problems in hardware implementations due to the difficulty of handling worst-case scenarios, defined by very long propagations. We propose a new technique for constraining the carry propagation, similar to “bit-stuffing,” but designed for encoders that generate data as bytes instead of individual bits. It is based on the fact that the encoder and decoder can maintain the same state, and both can identify the situations when it is desired to limit carry propagation. The new technique adjusts the coding interval in a way that corresponds to coding an unused data symbol, but selected to minimize overhead. Our experimental results demonstrate that the loss in compression can be made very small using regular precision for arithmetic operations.
The Block Matching Algorithms used in most popular video codec standards introduce blocking artifacts which must be removed via residual coding or deblocking filters. Alternative transform stages that do not cause blocking artifacts, such as lapped transforms or wavelets, require motion compensation methods that do not produce blocking artifacts, since they are expensive to remove. We design a new Overlapped Block Motion Compensation (OBMC) scheme that avoids these artifacts while allowing adaptive blending window sizes. This has the potential to show significant visual quality improvements over traditional OBMC.
We propose the use of the Least Absolute Shrinkage and Selection Operator (LASSO) regression method to predict the Cumulative Mean Squared Error (CMSE) incurred by the loss of individual slices in video transmission. We extract a number of quality-relevant features from the H.264/AVC video sequences, which are given as input to the LASSO. This method has the benefit of not only keeping the subset of features that have the strongest effects on video quality, but also producing accurate CMSE predictions. In particular, we study the LASSO regression through two different architectures: the Global LASSO (G.LASSO) and the Local LASSO (L.LASSO). In G.LASSO, a single regression model is trained for all slice types together, while in L.LASSO, motivated by the fact that the values of some features depend closely on the considered slice type, each slice type has its own regression model, in an effort to improve LASSO's prediction capability. Based on the predicted CMSE values, we group the video slices into four priority classes. Additionally, we consider a video transmission scenario over a noisy channel, where Unequal Error Protection (UEP) is applied to all prioritized slices. The provided results demonstrate the efficiency of LASSO in estimating CMSE with high accuracy, using only a few features.
files that typically contain high-entropy data, producing a footprint that is far less conspicuous than existing methods. The system uses a local web server to provide a file system, user interface and applications through a web architecture.
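To illustrate how the LASSO in the slice-loss abstract above keeps only the most influential features, here is a minimal coordinate-descent sketch. It minimizes (1/(2n))·||y − Xw||² + λ·||w||₁ without an intercept; the toy data, the value of `lam`, and the function names are assumptions made for this example and are not taken from the paper.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly zero."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/(2n))*||y - Xw||^2 + lam*||w||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for i in range(n):
                partial = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
            rho /= n
            w[j] = soft_threshold(rho, lam) / col_sq[j]
    return w

# Toy data: the target depends only on the first (hypothetical) feature.
X = [[1.0, 0.1], [2.0, -0.1], [3.0, 0.1], [4.0, -0.1]]
y = [2.0, 4.0, 6.0, 8.0]
weights = lasso_coordinate_descent(X, y, lam=0.1)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
```

The soft-thresholding step sets the weight of the uninformative second feature to exactly zero, so `selected` contains only the first feature, while the surviving weight is shrunk slightly below its least-squares value of 2.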
We propose a method for the fair and efficient allocation of wireless resources over a cognitive radio system network to transmit multiple scalable video streams to multiple users. The method exploits the dynamic architecture of the Scalable Video Coding extension of the H.264 standard, along with the diversity that OFDMA networks provide. We use a game-theoretic Nash Bargaining Solution (NBS) framework to ensure that each user receives the minimum video quality requirements, while maintaining fairness over the cognitive radio system. An optimization problem is formulated, where the objective is the maximization of the Nash product while minimizing the waste of resources. The problem is solved by using a Swarm Intelligence optimizer, namely Particle Swarm Optimization. Due to the high dimensionality of the problem, we also introduce a dimension-reduction technique. Our experimental results demonstrate the fairness imposed by the employed NBS framework.
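The Nash product that the preceding abstract maximizes can be illustrated with a toy two-user example. This sketch assumes a single divisible resource, utilities equal to each user's share, and a plain grid search in place of the Particle Swarm Optimization used in the paper; the disagreement points (minimum acceptable utilities) and all names are hypothetical.

```python
def nash_product(utilities, disagreement):
    """Product of each user's gain over its disagreement point (0 if any user is below it)."""
    product = 1.0
    for u, d in zip(utilities, disagreement):
        if u <= d:
            return 0.0  # allocation violates a user's minimum requirement
        product *= (u - d)
    return product

def best_split(disagreement, steps=100):
    """Grid search over the share given to user 1; user 2 receives the remainder."""
    best_share, best_value = None, -1.0
    for i in range(steps + 1):
        share = i / steps
        value = nash_product([share, 1.0 - share], disagreement)
        if value > best_value:
            best_share, best_value = share, value
    return best_share, best_value

share, value = best_split(disagreement=[0.2, 0.1])
```

With minimum requirements of 0.2 and 0.1, the search settles on a share of 0.55 for the first user, which gives both users the same surplus (0.35) above their minimum. That equal-surplus outcome is what the bargaining solution yields for linear utilities.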
In this paper, we are concerned with unsupervised natural image matting. Due to the under-constrained nature of the problem, image matting algorithms are usually provided with user interactions, such as scribbles or trimaps. Providing these is a very tedious task and may even become impractical for some applications. For unsupervised matte calculation, we can either adopt a technique that supports an unsupervised mode for alpha map calculation, or we may automate the process of acquiring the user interactions provided to a matting algorithm. Our proposed technique contributes to both approaches and is based on spectral matting. The latter is the only technique in the literature that supports automatic matting, but it suffers from critical limitations, among which is its unreliable unsupervised operation. In particular, spectral matting may produce erroneous mattes in the absence of guiding scribbles or trimaps. Using the Gestalt laws of grouping, we propose a method that automatically produces more truthful mattes than spectral matting. In addition, it can be used to generate trimaps, eliminating the required user interactions and making it possible to harness the power of matting techniques that are better than spectral matting but don't support unsupervised operation. The main contribution of this research is the introduction of the Gestalt laws of grouping to the matting problem.
Law enforcement is interested in exploiting tattoos as an information source to identify, track and prevent gang-related crimes. Many tattoo image retrieval systems have been described. In a retrieval system, tattoo segmentation is an important step for retrieval accuracy since segmentation removes background information in a tattoo image. Existing segmentation methods do not extract the tattoo very well when the background includes textures and colors similar to skin tones. In this paper we describe a tattoo segmentation approach based on determining skin pixels in regions near the tattoo. In these regions, graph-cut segmentation using a skin color model and a visual saliency map is used to find skin pixels. After segmentation, we determine which sets of skin pixels are connected to each other and form a closed contour enclosing a tattoo. The regions surrounded by the closed contours are considered tattoo regions. Our method segments tattoos well when the background includes textures and colors similar to skin.
Human object tracking is a challenging problem in video processing applications and is an important step toward the development of surveillance systems. In this paper, we propose a new method for tracking a human object in a video sequence based on the Contourlet transform. We have chosen the Contourlet transform because it has high directionality and represents salient image features such as edges, curves and contours better than the wavelet transform. The proposed method is simple and does not require any parameter other than the Contourlet coefficients. Results of the proposed method for human object tracking are compared with other state-of-the-art methods in terms of visual as well as quantitative performance measures, viz. Euclidean distance and Mahalanobis distance. The proposed method is found to perform better than the other methods.
In this paper, we introduce a computational model of top-down saliency based on multiscale orientation information for artificial object detection in satellite images. Furthermore, the top-down saliency is integrated with bottom-up saliency to obtain the saliency map of a satellite image. We compare our method to several state-of-the-art saliency detection models and demonstrate its superior performance in artificial object detection for satellite images.
In this paper, we propose to use a quantitative approach based on LS-SVM to estimate the impact of lossy compression on remote sensing image compression. Kernel function selection and the computation of model parameters are studied for remote sensing image classification when the LS-SVM analysis model is established. The experiments show that our LS-SVM model achieves good performance in remote sensing image compression analysis. Variations in classification accuracy across compression ratio scales are summarized based on our experiments.
This paper is concerned with the problem of image completion where the goal is to fill large missing parts (holes) in an image, video or a scene in a visually-plausible and a computationally-efficient manner. Recently, the literature on hole filling was dominated by exemplar-based (patch-based) filling techniques with a two-stage unified pipeline that starts by building a bag of significant patches (BoSP), and then uses that bag to fill the hole. In this paper, we propose a new framework which addresses the inherent limitations of the state-of-the-art techniques. Our method capitalizes on a newly-developed technique for image skimming, followed by a novel procedure to propagate the constructed skim to within the hole. Experimental results show that our method compares favourably with the state-of-the-art.
This article discusses the use of parallel hashing in the design of frame filtering tables in distributed computing systems. The proposed method of filtering table design can reduce the time needed by network bridges and switches to process frames and provides a low probability of filtering table overflow. The optimal number of parallel tables is determined for a given amount of memory available for the tables.
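One way to picture a filtering table built from several parallel hash tables is sketched below. The abstract does not describe the table layout, so everything here is an assumption made for illustration: each table has one entry per slot, each table uses its own salted hash, an address is stored in the first table whose slot is free, and overflow means that every candidate slot is already occupied. The class and method names are hypothetical.

```python
import hashlib

class ParallelFilteringTable:
    """Hypothetical MAC filtering table split across several independently hashed tables."""

    def __init__(self, num_tables, slots_per_table):
        self.num_tables = num_tables
        self.slots = slots_per_table
        self.tables = [[None] * slots_per_table for _ in range(num_tables)]

    def _index(self, table_id, mac):
        # Salting with the table number gives each table a different hash function.
        digest = hashlib.sha256(f"{table_id}:{mac}".encode()).digest()
        return int.from_bytes(digest[:4], "big") % self.slots

    def learn(self, mac, port):
        """Store mac -> port; return False if every candidate slot is taken (overflow)."""
        candidates = [(t, self._index(t, mac)) for t in range(self.num_tables)]
        for t, i in candidates:  # refresh an existing entry first
            entry = self.tables[t][i]
            if entry is not None and entry[0] == mac:
                self.tables[t][i] = (mac, port)
                return True
        for t, i in candidates:  # otherwise take the first free slot
            if self.tables[t][i] is None:
                self.tables[t][i] = (mac, port)
                return True
        return False

    def lookup(self, mac):
        """Return the port for mac, or None if it is unknown (the frame would be flooded)."""
        for t in range(self.num_tables):
            entry = self.tables[t][self._index(t, mac)]
            if entry is not None and entry[0] == mac:
                return entry[1]
        return None

single = ParallelFilteringTable(num_tables=1, slots_per_table=1)
double = ParallelFilteringTable(num_tables=2, slots_per_table=1)
single_results = [single.learn("aa:aa", 1), single.learn("bb:bb", 2)]
double_results = [double.learn("aa:aa", 1), double.learn("bb:bb", 2)]
```

In hardware the candidate slots could be probed simultaneously, which is where a time saving would come from. With the same total memory, splitting it into more tables trades fewer slots per table against more alternatives per address, which is the kind of trade-off behind the optimal table count the abstract mentions.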
As the demand for higher quality and higher resolution video increases, many applications fail to meet this demand due to low bandwidth restrictions. One factor contributing to this problem is the high bitrate requirement of the intra-coded Instantaneous Decoding Refresh (IDR) frames featuring in all video coding standards. Frequent coding of IDR frames is essential for error resilience in order to prevent the occurrence of error propagation. However, as each one consumes a huge portion of the available bitrate, the quality of future coded frames is hindered by high levels of compression. This work presents a new technique, known as Spatial Resampling of IDR Frames (SRIF), and shows how it can increase the rate distortion performance by providing a higher and more consistent level of video quality at low bitrates.
This work deals with the problem of high computational complexity in image registration. A hierarchical multiresolution strategy is used to speed up SIFT processing by starting on a low-resolution octave, from which an initial affine transformation model is obtained. In each subsequent octave, we apply the affine transformation model obtained from the upper octave to the current octave and combine it with the geometric distribution of matched keypoints to further remove incorrect mappings and update the affine transformation model. The strategy ends with the best affine transformation model on the bottom octave (full-size image). Experimental results show that the proposed method achieves comparable accuracy with less computation than the original SIFT.
This paper presents a cascade of classifiers with a “resurrection” mechanism for building reliable keypoint matches. In many existing image registration solutions, overly strict rules are likely to remove correct keypoint mappings. To avoid this and obtain accurate results, a multi-step cascade framework is proposed to remove incorrect keypoint mappings. To further reduce the rate at which correct mappings are misjudged in each step, we introduce “resurrection” into the cascade structure. Keypoint mappings are initially built from their associated descriptors, and in each step some of the mappings are determined to be incorrect and deleted completely. Meanwhile, mappings that perform relatively poorly are left undetermined, and their fate is decided in the next step according to their performance. In this way, the multiple steps are used efficiently and the misjudgment of correct mappings is reduced. Experimental results show that the presented cascade structure can robustly remove outlier keypoint mappings and achieve accurate image registration.
We propose a method for automatically coloring a freehand line image drawn on a PC screen on which a user-selected reference image, e.g., a color photograph, is faintly displayed with low contrast as a model. The method uses a constrained Delaunay triangulation that divides the image into small triangles. Using a prototype system based on the proposed method, users can complete impressive pictures by only drawing lines. Our coloring method begins with the triangulation of the set of sampling points on the drawn lines, followed by sampling of the color in each triangle from the reference image, smoothing of color among neighboring triangles, and painting of each triangle with the smoothed color. The triangulation result is modified to satisfy the constraint that its edges must not cross the drawn lines, so that colors do not mix across a drawn line. Our prototype system displays the coloring result of the current drawing immediately, so the user can check the effect of a newly drawn line on the coloring at any time. Because the coloring result depends on how the user draws the freehand lines, it can be seen as an artwork reflecting the individuality of each user’s drawing.
Most image sensors mimic film, integrating light during an exposure interval and then reading the "latent" image as a complete frame. In contrast, frameless image capture attempts to construct a continuous waveform for each sensel describing how the Ev (exposure value required at each pixel) changes over time. This allows great flexibility in computationally extracting frames after exposure. An overview of how this could be accomplished was presented at EI2014, with an emphasis on frameless sensor technology. In contrast, the current work centers on deriving frameless data from sequences of conventionally captured frames.