It is our pleasure to introduce the second special section on Retinex in the Journal of Electronic Imaging. The previous one, "Retinex at 40," appeared in Vol. 13; here we celebrate the theory's 50th anniversary. Let us briefly recall the origins of Retinex
We present a faster and noise-free implementation of the RACE algorithm. RACE combines characteristics of the well-known Retinex model of Land and McCann with those of the automatic color equalization (ACE) color-correction algorithm. The original random-spray-based RACE implementation suffers from two main problems: its computational time and the presence of noise. Here, we show that two techniques recently proposed by Banić et al. can be adapted to the RACE framework to drastically reduce both the computational time and the generated noise. We call the resulting implementation smart-light-memory-RACE (SLMRACE).
Retinex theory estimates the human color sensation at any observed point by correcting its color based on the spatial arrangement of the colors in nearby regions. We revisit two recent path-based, edge-aware Retinex implementations: Termite Retinex (TR) and Energy-driven Termite Retinex (ETR). Like the original Retinex implementation, TR and ETR scan the neighborhood of each image pixel along paths and rescale its chromatic intensities by intensity levels computed by reworking the colors of the pixels on those paths. Our interest in TR and ETR is due to their unique, content-based scanning scheme, which uses the image edges to define the paths and exploits a swarm intelligence model to guide the spatial exploration of the image. The exploration scheme of ETR has been shown to be particularly effective: its paths are local minima of an energy functional designed to favor the sampling of image pixels highly relevant to color sensation. Nevertheless, since its computational complexity makes ETR impractical, here we present a light version of it, named Light Energy-driven TR, obtained from ETR by implementing a modified, optimized minimization procedure and by exploiting parallel computing.
Retinex imaging comprises two distinct elements: first, a model of human color vision; second, a spatial-imaging algorithm for making better reproductions. Edwin Land's 1964 Retinex color theory began as a model of human color vision of real complex scenes. He designed many experiments, such as the Color Mondrians, to understand why retinal cone quanta catch fails to predict color constancy. Land's Retinex model used three spatial channels (L, M, S) that calculated three independent sets of monochromatic lightnesses. Land and McCann's lightness model used spatial comparisons followed by spatial integration across the scene. The parameters of their model were derived from extensive observer data. This work was the beginning of the second Retinex element, namely, using models of spatial vision to guide image-reproduction algorithms. Today, there are many different Retinex algorithms. This special section, "Retinex at 50," describes a wide variety of them, along with their different goals and the ground truths used to measure their success. This paper reviews (and provides links to) the original Retinex experiments and image-processing implementations. Observer matches (measuring appearances) have extended our understanding of how human spatial vision works. This paper also describes a collection of very challenging datasets, accumulated by Land and McCann, for testing algorithms that predict appearance.
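Although this paper is primarily historical, the lightness computation it refers to is concrete: the spatial comparisons along paths are usually summarized as a ratio-product-reset chain. Below is a minimal sketch of that chain for a single path and a single channel, assuming a linear-luminance array `lum` and a precomputed list of pixel coordinates `path` (both names are hypothetical); the full model computes such estimates over many paths and three channels and averages them per pixel.

```python
import numpy as np

def ratio_product_reset(lum, path, threshold=0.01):
    """One Retinex path: accumulate log-ratios with threshold and reset.

    lum  : 2-D array of linear luminances (one channel), values > 0
    path : sequence of (row, col) pixel coordinates
    Returns the relative lightness assigned to the last pixel of the path.
    """
    log_l = np.log(lum)
    acc = 0.0  # accumulated log-ratio; 0.0 plays the role of "white"
    for (r0, c0), (r1, c1) in zip(path[:-1], path[1:]):
        step = log_l[r1, c1] - log_l[r0, c0]
        if abs(step) < threshold:   # threshold: ignore slow gradients
            step = 0.0
        acc += step
        if acc > 0.0:               # reset: nothing is brighter than white
            acc = 0.0
    return np.exp(acc)
```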
When we perform a visual analysis of a photograph of a cosmic object, contrast plays a fundamental role. We present an approach based on spatial color algorithms that enhances local contrast to make relevant information easier to detect. We show very promising results on amateur photographs of deep-sky objects. The results are presented both for a qualitative, subjective visual evaluation and for a quantitative evaluation through image-quality measures.
Some spatial color algorithms, such as Brownian Milano retinex (MI-retinex) and random spray retinex (RSR), are based on sampling. In Brownian MI-retinex, memoryless random walks (MRWs) explore the neighborhood of a pixel and are then used to compute its output. Given the relative redundancy and inefficiency of MRW exploration, RSR replaced the walks with samples of points (the sprays). Recent works point out that mapping the sampling formulation to the probabilistic formulation of the corresponding sampling process can offer useful insight into the models while yielding intrinsically noise-free outputs. This paper continues the development of this concept and shows that the population-based versions of RSR and Brownian MI-retinex can be used to obtain analytical expressions for the outputs of some test images. The comparison of the two analytic expressions from RSR and from Brownian MI-retinex demonstrates not only that the two outputs are, in general, different but also that they depend in qualitatively different ways on the features of the image.
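For readers unfamiliar with the sampling formulation that is being mapped to its population version, here is a sketch of the basic RSR estimate for one pixel, under assumed parameter names and a simple radial spray density; the paper works with the noise-free analytical (population) counterpart of exactly this kind of estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

def rsr_pixel(channel, r, c, n_sprays=20, n_points=30, radius=50):
    """Random spray Retinex estimate for one pixel of one channel (sketch).

    For each spray, the pixel value is rescaled by the maximum value found
    among randomly sampled neighbors; the sprays are then averaged.
    """
    h, w = channel.shape
    est = 0.0
    for _ in range(n_sprays):
        rho = radius * rng.random(n_points)        # radial distances
        theta = 2 * np.pi * rng.random(n_points)   # angles
        rows = np.clip((r + rho * np.sin(theta)).astype(int), 0, h - 1)
        cols = np.clip((c + rho * np.cos(theta)).astype(int), 0, w - 1)
        local_max = max(channel[rows, cols].max(), channel[r, c])
        est += channel[r, c] / local_max
    return est / n_sprays
```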
Several different implementations of the Retinex model have been derived from Land and McCann's original paper. This paper presents the Milano-Retinex family, a collection of slightly different Retinex implementations developed at the Department of Computer Science of the Università degli Studi di Milano. One important difference lies in their goals: while the original Retinex aims at modeling vision, the Milano-Retinex family is mainly applied as an image enhancer that mimics some mechanisms of the human visual system.
Following Land and McCann's first proposal of the Retinex theory, numerous Retinex algorithms have been developed that differ considerably both algorithmically and functionally. We clarify the relationships among the various Retinex families by associating their spatial processing structures with the neural organizations of the retina and of the primary visual cortex. Some Retinex algorithms have a retina-like processing structure (Land's designator idea and NASA Retinex), and some show a close connection with the cortical structures of the primary visual area (two-dimensional L&M Retinex). A third group of Retinexes (the variational Retinexes) manifests an explicit algorithmic relation to Wilson–Cowan's physiological model. We overview these three groups of Retinexes within the frame of reference of these biological visual mechanisms.
A model of achromatic color computation by the human visual system is presented and shown to account in an exact quantitative way for a large body of appearance-matching data collected with simple visual displays. The model equations are closely related to those of the original Retinex model of Land and McCann. However, the present model differs in important ways from Land and McCann's theory: it invokes additional biological and perceptual mechanisms, including contrast gain control, different inherent neural gains for incremental and decremental luminance steps, and two types of top-down influence on the perceptual weights applied to local luminance steps in the display, namely edge classification and attentional windowing of spatial integration. Arguments are presented to support the claim that these visual processes must be instantiated by a particular underlying neural architecture. By pointing to correspondences between the architecture of the model and findings from visual neurophysiology, this paper suggests that edge classification involves a top-down gating of neural edge responses in early visual cortex (cortical areas V1 and/or V2), while spatial integration windowing occurs in cortical area V4 or beyond.
Radar coincidence imaging (RCI) is a staring imaging technique that originated from optical coincidence imaging. In RCI, the reference matrix must be computed precisely to reconstruct the image. However, it is difficult to calculate the reference matrix exactly because model mismatch exists in most applications. The signal model of RCI with model mismatch is derived. Based on a Bayesian framework and a regularization method, an algorithm called regularization focal underdetermined system solver (R-FOCUSS) is proposed to solve the RCI problem with model mismatch. In the proposed method, the scattering coefficients and the perturbation matrix are estimated during the iterations, so the image can be reconstructed. A norm-ratio method is also proposed to determine the regularization parameters in the objective function, which makes the algorithm suitable for situations where the distributions of the noise, the model error, and the target's sparsity are unknown. The constrained Cramér–Rao bound for scatterer estimation is derived. Compared with existing sparse reconstruction methods, R-FOCUSS is more robust and has lower computational complexity. Results of numerical experiments demonstrate that the algorithm achieves outstanding imaging performance, both in suppressing noise and in adapting to model mismatch.
In many countries, the motorcyclist fatality rate is much higher than that of other vehicle drivers. Among many other factors, motorcycle rear-end collisions contribute to these rider fatalities. To increase the safety of motorcyclists and reduce their road fatalities, this paper introduces a vision-based rear-end collision detection system. The binary road detection scheme contributes significantly to reducing false detections and helps to achieve reliable results even when shadows and different lane markers are present on the road. The methodology is based on Harris corner detection and the Hough transform. To validate the methodology, two types of datasets are used: (1) self-recorded datasets (obtained by placing a camera at the rear end of a motorcycle) and (2) online datasets (recorded by placing a camera at the front of a car). The method achieves 95.1% accuracy on the self-recorded datasets and gives reliable rear-end vehicle detections under different road scenarios; it also performs well on the online car datasets. The proposed technique's high detection accuracy using a monocular camera, coupled with its low computational complexity, makes it a suitable candidate for a motorcycle rear-end collision detection system.
Correlation filter-based tracking has exhibited impressive robustness and accuracy in recent years. Standard correlation filter-based trackers are restricted to translation estimation and equipped with a fixed target response, so they perform poorly under significant scale variation or appearance change. We propose a log-polar mapping-based scale-space tracker with an adaptive target response. The tracker transforms scale variation of the target in Cartesian space into a shift along the logarithmic axis in log-polar space, and a one-dimensional scale correlation filter is learned online to estimate this shift. With the log-polar representation, scale is estimated accurately without a multiresolution pyramid. To achieve an adaptive target response, the variance of the Gaussian response function is computed from the response map and updated online with a learning-rate parameter. Our log-polar scale correlation filter and adaptive target response can be combined with any correlation filter-based tracker. In addition, the scale correlation filter can be extended to a two-dimensional correlation filter to jointly estimate scale variation and in-plane rotation. Experiments on the OTB50 benchmark demonstrate that our tracker achieves superior performance against state-of-the-art trackers.
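The core trick is the representation change. A sketch of log-polar resampling in plain NumPy (grid parameters are illustrative assumptions) shows why scale estimation reduces to a one-dimensional shift search:

```python
import numpy as np

def to_log_polar(patch, n_r=32, n_theta=64):
    """Resample a square patch onto a log-polar grid (nearest neighbor).

    Scaling the patch by a factor s in Cartesian space shifts this
    representation by log(s) along the radial axis, so a 1-D correlation
    filter along that axis suffices for scale estimation.
    """
    h, w = patch.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r_max = min(cy, cx)
    log_r = np.linspace(0.0, np.log(r_max), n_r)       # log-spaced radii
    theta = np.linspace(0.0, 2 * np.pi, n_theta, endpoint=False)
    rr = cy + np.exp(log_r)[:, None] * np.sin(theta)[None, :]
    cc = cx + np.exp(log_r)[:, None] * np.cos(theta)[None, :]
    return patch[np.round(rr).astype(int), np.round(cc).astype(int)]
```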
A method of feature extraction and small-target detection based on infrared polarization is proposed; it exploits the advantages of infrared polarization imaging for artificial-target detection to address the clutter-interference problem in infrared target detection. First, using the differences between the polarization characteristics of artificial targets and the natural background, infrared polarization information models for the target and the background are established. The components of the intensity information, the polarization information, and the target polarization information are extracted, and enhancement measures are analyzed. Then, variable polarization theory is applied to extract the target's polarization characteristics and suppress the background clutter. Finally, the infrared small target is detected, and comparisons with existing methods demonstrate the effectiveness and reliability of the proposed method.
Person reidentification (re-id) has been widely studied because of its extensive use in video surveillance and forensics applications. It aims to find a specific person across a nonoverlapping camera network, which is highly challenging due to large variations in cluttered background, human pose, and camera viewpoint. We present a metric learning algorithm that learns a Mahalanobis distance for re-id. Generally speaking, two forces act in the conventional metric learning process: a pulling force that pulls points of the same class closer, and a pushing force that pushes points of different classes as far apart as possible. We argue that, when only a limited amount of training data is available, forcing interclass distances to be as large as possible may drive the metric to overfit uninformative parts of the images, such as noise and background. To alleviate overfitting, we propose the ring-push metric learning algorithm. Unlike other metric learning methods, which only penalize too-small interclass distances, the proposed method penalizes both too-small and too-large interclass distances. By introducing the generalized logistic function as the loss, we formulate ring-push metric learning as a convex optimization problem and solve it with projected gradient descent. Experimental results on four public datasets demonstrate the effectiveness of the proposed algorithm.
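As a rough illustration of the "ring" idea, under assumed margin radii and a softplus-style generalized logistic loss (the paper's exact parameterization may differ), the objective on squared Mahalanobis distances could look like this:

```python
import numpy as np

def generalized_logistic(x, beta=10.0):
    """Smooth hinge: (1/beta) * log(1 + exp(beta * x)) (a softplus)."""
    return np.logaddexp(0.0, beta * x) / beta

def ring_push_loss(d2, same_class, r_in=1.0, r_out=4.0):
    """Ring-push objective on squared Mahalanobis distances (a sketch).

    d2         : array of squared distances for training pairs
    same_class : boolean array, True for positive (same-identity) pairs
    Pulls positive pairs inside r_in; pushes negative pairs outside r_in
    but penalizes them again beyond r_out, forming a 'ring'.
    """
    pos = generalized_logistic(d2[same_class] - r_in).sum()
    neg_near = generalized_logistic(r_in - d2[~same_class]).sum()
    neg_far = generalized_logistic(d2[~same_class] - r_out).sum()
    return pos + neg_near + neg_far
```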
Occlusion is one of the most challenging problems in visual object tracking, and many discriminative methods have recently been proposed to deal with it. For discriminative methods, it is difficult to select representative samples for updating the target template. In general, the holistic bounding boxes that contain the tracked results are selected as positive samples. However, when the object is occluded, this simple strategy easily introduces noise into the training set and the target template, causing the tracker to drift away from the target. To address this problem, we propose a robust patch-based visual tracker with online representative-sample selection. Unlike previous works, we divide the object and the candidates uniformly into several patches and propose a score function that evaluates each patch independently; the average score then determines the optimal candidate. Finally, we use the nonnegative least-squares method to find the representative samples, which are used to update the target template. Experimental results on the Object Tracking Benchmark 2013 and on 13 challenging sequences show that the proposed method is robust to occlusion and achieves promising results.
The high computational complexity of tree-based multipath search approaches makes them difficult to put into practical use. However, reselection of candidate atoms can make the search path more accurate and efficient. We propose a multipath greedy approach called fast sparsity-adaptive multipath matching pursuit (fast SAMMP), which performs a sparsity-adaptive tree search to find the sparsest solution with better performance. Each tree branch acquires K atoms, and fast SAMMP reselects the best K atoms among 2K candidates. Fast SAMMP adopts sparsity-adaptive techniques that make the algorithm practical in more applications. We demonstrate the reconstruction performance of the proposed fast scheme on both synthetically generated one-dimensional signals and two-dimensional images using Gaussian observation matrices. The experimental results indicate that fast SAMMP achieves much lower reconstruction time and a much higher exact-recovery ratio than conventional algorithms.
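For context, plain orthogonal matching pursuit, the single-path baseline that multipath approaches such as fast SAMMP generalize, fits in a few lines; the tree search keeps several candidate supports instead of the single greedy one sketched below.

```python
import numpy as np

def omp(A, y, k):
    """Plain orthogonal matching pursuit for y ≈ A @ x with k-sparse x.

    Greedily selects the column of A most correlated with the residual,
    then refits on the chosen support by least squares.
    """
    residual = y.copy()
    support = []
    x = np.zeros(A.shape[1])
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))  # best matching atom
        support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x[support] = coef
    return x
```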
Estimating a three-dimensional (3-D) pose from a single image is usually performed by retrieving pose candidates with two-dimensional (2-D) features. However, pose retrieval usually relies on the acquisition of sufficient labeled data and suffers from low retrieval accuracy, and acquiring a large number of unconstrained 2-D images annotated with 3-D poses is difficult. To address these issues, we propose a coupled-source framework that integrates two independent training sources: the first contains only 3-D poses, and the second contains images annotated with 2-D poses. For accurate retrieval, we present a local-topology-preserved sparse coding (LTPSC) scheme to generate pose candidates, where the estimated 2-D pose of a test image serves as the feature for pose retrieval and is represented as a sparse combination of features in the exemplar database. LTPSC ensures that semantically similar poses are retrieved with larger probabilities. Extensive experiments validate the effectiveness of our method.
In most sparse-representation methods for face recognition (FR), occlusion is handled by removing the occluded parts of both query samples and training samples before recognition. This practice ignores the global features of the facial image and may lead to unsatisfactory results due to the limitations of local features. Considering this drawback, we propose a method called varying occlusion detection and iterative recovery for FR. The main contributions of our method are as follows: (1) to detect the occluded area of facial images accurately, a combination of image processing and intersection-based clustering is used for occlusion FR; (2) according to the resulting occlusion map, new integrated facial images are recovered iteratively and fed into the recognition process; and (3) the effect of our method on recognition accuracy is verified by comparing it with three typical occlusion-map detection methods. Experiments show that the proposed method has highly accurate detection and recovery performance and that it outperforms several similar state-of-the-art methods against partial contiguous occlusion.
A quaternion vector gradient filter is proposed for RGB-depth (RGB-D) video contour detection. First, a holistic quaternion vector system is introduced to jointly express the color and depth information by adding the depth to the scalar part. Then, a convolution differential operator for quaternion vectors is proposed to highlight edges with both depth and chromatic variations while suppressing the gradient of the intensity term. In addition, the quaternion vector gradients are adaptively weighted using a depth-confidence measure and the quadtree decomposition of the coding tree units in the video stream. Results on the 3-D high-efficiency video coding test sequences and quantitative simulated experiments on the Berkeley segmentation datasets both indicate the effectiveness of the proposed gradient-based method in detecting the semantic contours of RGB-D videos.
This paper describes a central processing unit (CPU)-based technique for terrain-geometry rendering that could relieve the graphics processing unit (GPU) from computing the appropriate level of detail (LOD) of the geometric surface. The proposed approach alleviates the computational load on the CPU and approaches GPU-based efficiency. As the datasets of realistic terrains are usually too large for real-time rendering, we suggest using a training stage to handle large tiled quadtree terrain representations. The training stage is based on multiresolution wavelet decomposition and is used to limit the region of error control inside each tile. Maximum approximation errors are then calculated for each tile at different resolutions. The maximum world-space error of a tile at each resolution permits selection of the appropriate downsampling resolution to represent the tile at run time. Tests and experiments demonstrate that B-spline 0 and B-spline 1 wavelets, well known for their localization properties and compact support, are suitable for fast and accurate localization of the maximum approximation error. The experimental results demonstrate that the proposed approach drastically reduces CPU computation time; the technique is therefore also suitable for low- and medium-end PCs and for embedded systems that are not equipped with the latest graphics hardware.
We introduce a hybrid fingerprint recognition method built from minutiae and quaternion orthogonal moments. The proposed algorithm comprises four steps: extraction of minutiae triplets (m-triplets); a first pass of triplet minutiae matching; validation of these triplets by characterizing their neighboring gray-level image information through feature vectors of quaternion radial moments; and an adequate similarity measure. By strengthening the local minutiae matching step, we avoid consolidation and global matching. To show the added value of our method, several algorithms for extracting and matching m-triplets are considered and compared experimentally. Experiments are carried out on all four parts of the FVC2004 dataset. The results indicate that combining the geometric features and the quaternion radial moments of the m-triplets improves overall fingerprint matching performance and demonstrates the expected gain of integrating a validation step into an m-triplet-based fingerprint matching algorithm.
Abnormal event detection in crowded scenes is a challenging problem due to the high density of the crowds and the occlusions between individuals. We propose a method using two sparse dictionaries with saliency to detect abnormal events in crowded scenes. By combining a multiscale histogram of optical flow (MHOF) and a multiscale histogram of oriented gradient (MHOG) into a single descriptor, we represent a spatial–temporal cuboid without separating the individuals in the crowd. While MHOF captures temporal information, MHOG encodes both spatial and temporal information; together the two features represent the cuboid's appearance and motion characteristics even when the crowd density becomes high. An abnormal dictionary is added to the traditional sparse model, which includes only a normal dictionary. In addition, the saliency of the test sample is combined with the two sparse reconstruction costs on the normal and abnormal dictionaries to measure its normalness. The experimental results show the effectiveness of our method.
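Here is a sketch of a simplified MHOF for one spatial–temporal cuboid, assuming a dense flow field and two magnitude rings; the bin count and the magnitude threshold are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def mhof(flow, n_bins=8, mag_threshold=1.0):
    """Simplified multiscale histogram of optical flow (a sketch).

    flow : array of shape (h, w, 2) holding (fx, fy) per pixel.
    Flow vectors are binned by orientation; a second ring of bins keeps
    large-magnitude vectors separate, giving 2 * n_bins bins in total.
    """
    fx, fy = flow[..., 0].ravel(), flow[..., 1].ravel()
    mag = np.hypot(fx, fy)
    ang = np.mod(np.arctan2(fy, fx), 2 * np.pi)
    ori_bin = (ang / (2 * np.pi) * n_bins).astype(int) % n_bins
    ring = (mag >= mag_threshold).astype(int)   # inner vs outer ring
    hist = np.zeros(2 * n_bins)
    np.add.at(hist, ring * n_bins + ori_bin, mag)  # magnitude-weighted
    return hist / (hist.sum() + 1e-12)
```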
Omnidirectional vision, with the advantage of a large field of view, overcomes the problem that a target is easily lost due to the narrow sight of perspective vision. We improve a target-tracking algorithm based on discriminative tracking features in several respects and propose a target-tracking algorithm for an omnidirectional vision system. (1) An elliptical target-window model is presented to represent the target's outline, which can adapt to object deformation and reduce background interference. (2) A background-weighted linear RGB histogram target feature is introduced, which decreases the weight of the background features. (3) A Bhattacharyya-coefficient-based feature identification method is employed, which reduces the computation time of the tracking algorithm. (4) An adaptive target scale and orientation measurement method is applied to adapt to severe deformations of the target's outline. (5) A model-update strategy based on similarity measurements is put forward to achieve effective and accurate model updates. The experimental results show that the proposed algorithm achieves better performance than state-of-the-art algorithms when using omnidirectional vision for long-term target-tracking tasks.
We consider the problem of automatically recognizing human activities from videos through the fusion of the two most important cues: appearance metric features and kinematic features. A system of two-dimensional (2-D) Poisson equations is introduced to extract a more discriminative appearance metric feature. Specifically, the moving human blobs are first detected in the video by background subtraction to form a binary image sequence, from which the appearance feature (the motion accumulation image) and the kinematic feature (the centroid instantaneous velocity) are extracted. Second, 2-D discrete Poisson equations are employed to reinterpret the motion accumulation image and produce a more discriminative Poisson silhouette image, from which the appearance feature vector is created through bidirectional 2-D principal component analysis, a dimension-reduction technique chosen to balance classification accuracy and computation time. Finally, a cascaded classifier, based on the nearest-neighbor classifier and two directed acyclic graph support vector machines and integrating the fusion of the appearance feature vector and the centroid instantaneous velocity vector, is applied to recognize the human activities. Experimental results on open databases and a homemade one confirm the recognition performance of the proposed algorithm.
Aerial images are often degraded by space-varying motion blur and simultaneous uneven illumination. To recover a high-quality aerial image from its nonuniformly degraded version, we propose a patchwise restoration approach based on the key observation that the degree of blurring is inevitably affected by the illumination conditions. A nonlocal Retinex model is developed to accurately estimate the reflectance component of the degraded aerial image, after which the uneven illumination is corrected. The nonuniform coupled blurring in the enhanced reflectance image is then alleviated and transformed toward a uniform distribution, which facilitates the subsequent deblurring. To construct the multiscale sparsified regularization, the discrete shearlet transform is improved to better represent anisotropic image features in terms of directional sensitivity and selectivity. In addition, a new adaptive variant of total generalized variation is proposed as the structure-preserving regularizer. These complementary regularizers are integrated into a single objective function, and the final deblurred image with uniform illumination is obtained by solving it with a fast alternating-direction scheme. The experimental results demonstrate that our algorithm not only effectively removes both the space-varying illumination and the motion blur in aerial images but also recovers abundant details of aerial scenes with top-level objective and subjective quality, outperforming other state-of-the-art restoration methods.
Due to variations in background, illumination, and viewpoint, license plate detection in an open environment is challenging. We propose a detection method based on boundary clustering. First, a boundary map is obtained through the Canny edge detector and the removal of unwanted horizontal background edges. Second, boundaries are grouped into clusters by a density-based approach, in which the density of each boundary is defined as the total gradient intensity of its neighboring and reachable boundaries, and the cluster centers and their number are determined automatically according to a minimum-distance principle. Finally, a set of horizontal candidate regions with accurately located borders is extracted for classification. The classifier is trained on histogram-of-oriented-gradient features with a linear support vector machine. Experiments on three public datasets, including images captured under different scenarios, demonstrate that the proposed method outperforms several state-of-the-art methods in detection accuracy while remaining comparable in efficiency.
The focal length of an image is indispensable for many computer vision tasks. In general, focal length can be obtained via camera calibration using specific planar patterns. However, for images taken by an unknown device, focal length can only be estimated from the image itself. Currently, most single-image focal length estimation methods rely on predefined geometric cues (such as vanishing points or parallel lines) to infer focal length, which restricts their application mainly to man-made scenes. Machine learning algorithms have demonstrated great performance in many computer vision tasks, but they are seldom used for focal length estimation, partially due to the shortage of labeled images for training. To bridge this gap, we first introduce FocaLens, a large-scale dataset specifically designed for single-image focal length estimation. Taking advantage of FocaLens, we also propose a new focal length estimation model that exploits a multiscale detection architecture to encode object distributions in images to assist focal length estimation. Additionally, an online focal-transformation approach is proposed to further improve the model's generalization ability. Experimental results demonstrate that the proposed model trained on FocaLens not only achieves state-of-the-art results on scenes with distinct geometric cues but also obtains comparable results on scenes without them.
To meet the demands of rapid, real-time three-dimensional optical measurement, a fast point cloud registration algorithm using multiscale axis-angle features is proposed. Key points are selected based on the mean of the scalar projections, onto the normal of the estimated point, of the vectors from that point to the points in its neighborhood; this measure requires little computation and has good discriminating ability. A rotation-invariant feature is proposed using angle information calculated from multiscale coordinate axes: the feature descriptor of a key point is computed from the cosines of the angles between corresponding coordinate axes. In this way, the surface information around a key point is captured along all three axis directions and is easy to recognize. The similarity of descriptors is used to quickly determine initial correspondences, and rigid spatial-distance invariance together with a clustering selection method makes the correspondences more accurate and evenly distributed. Finally, the rotation matrix and translation vector are determined by singular value decomposition. Experimental results show that the proposed algorithm has high precision, fast matching speed, and good noise resistance.
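The final SVD step is the standard Kabsch/Umeyama solution for corresponded 3-D point sets; a compact version for reference:

```python
import numpy as np

def rigid_transform(src, dst):
    """Best-fit rotation R and translation t with R @ src[i] + t ≈ dst[i].

    src, dst : arrays of shape (n, 3) with corresponded points.
    """
    src_c = src - src.mean(axis=0)
    dst_c = dst - dst.mean(axis=0)
    H = src_c.T @ dst_c                       # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst.mean(axis=0) - R @ src.mean(axis=0)
    return R, t
```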
Most pedestrian detection approaches use multiscale detection and a sliding-window search scheme with high computational complexity. We present a fast pedestrian detection method using the deformable part model and pyramid layer location (PLL). First, an object proposal method is used instead of the traditional sliding window to obtain pedestrian proposal regions. Then, the PLL method selects the optimal root level in the feature pyramid for each candidate window. On this basis, a single-point calculation scheme is designed to score candidate windows efficiently. Finally, pedestrians are located in the images. The INRIA (Institut national de recherche en informatique et en automatique) dataset for human detection is used to evaluate the performance of the proposed method. The experimental results demonstrate that the proposed method reduces the number of feature maps and windows that must be computed during detection; consequently, the computing cost is significantly reduced, with fewer false positives.
Low-rank representation (LRR) has been successfully applied to subspace clustering. However, the nuclear norm in standard LRR is not optimal for approximating the rank function in many real-world applications, and the L2,1 norm in LRR also fails to characterize various noises properly. To address these issues, we propose an improved LRR method that achieves the low-rank property via a new formulation with a weighted Schatten-p norm and an Lq norm (WSPQ). Specifically, the nuclear norm is generalized to the Schatten-p norm, and different weights are assigned to the singular values, so the rank function can be approximated more accurately. In addition, an Lq norm is incorporated into WSPQ to model different noises and improve robustness. An efficient algorithm based on the inexact augmented Lagrange multiplier method is designed for the formulated problem. Extensive experiments on face clustering and motion segmentation clearly demonstrate the superiority of the proposed WSPQ over several state-of-the-art methods.
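In symbols, reconstructing the objective only from the description above (the exact weighting scheme and constraint set are given in the paper), the WSPQ formulation plausibly reads:

```latex
\min_{Z,E}\; \|Z\|_{w,S_p}^{p} + \lambda \|E\|_{q}^{q}
\quad \text{s.t.} \quad X = XZ + E,
\qquad \text{where} \quad
\|Z\|_{w,S_p}^{p} = \sum_{i} w_i\, \sigma_i^{p}(Z),
```

with \(\sigma_i(Z)\) the singular values of \(Z\), \(X\) the data matrix, and \(E\) the noise term.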
Reconstructing three-dimensional (3-D) poses from matched feature correspondences is widely used in 3-D object tracking, and the precision of correspondence matching plays a major role in pose reconstruction. Without prior knowledge of the perspective camera model, state-of-the-art methods only handle two-dimensional (2-D) planar affine transforms. An interest-point detector and descriptor, perspective scale-invariant feature transform (perspective SIFT), is proposed to overcome the side effects of viewpoint change; i.e., our detector is invariant to viewpoint change. Perspective SIFT is detected by the SIFT approach, with the sample region determined by projecting the original sample region onto the image plane based on the established camera model. An iterative algorithm then modifies the pose of the tracked object and generally converges to a 3-D perspective-invariant point. The pose of the tracked object is finally estimated by combining template warping and perspective SIFT correspondences. Thorough evaluations are performed on two public databases, the Biwi head pose dataset and the Boston University dataset. Comparisons show that the proposed keypoint detector largely improves tracking performance.
Previous gesture recognition methods usually focus on recognizing gestures after the entire gesture sequence has been obtained. However, in many practical applications, a system must identify gestures before they end in order to give instant feedback. We present an online gesture recognition approach that achieves early recognition of unfinished gestures with low latency. First, a curvature buffer-based point context (CBPC) descriptor is proposed to extract the shape feature of a gesture trajectory; it is a complete descriptor with simple computation and is therefore well suited to online scenarios. Then, we introduce an online windowed dynamic time warping algorithm to match the ongoing gesture against the template gestures; its computational complexity is effectively reduced by adding a sliding window to the accumulative distance matrix. Finally, experiments are conducted on the Australian Sign Language dataset and the Kinect hand gesture (KHG) dataset. Results show that the proposed method outperforms other state-of-the-art methods, especially when gesture information is incomplete.
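A sketch of band-limited DTW on scalar sequences shows the effect of the sliding window on the accumulative distance matrix; the paper's online variant additionally updates this matrix incrementally as new frames arrive and matches CBPC descriptors rather than scalars.

```python
import numpy as np

def windowed_dtw(query, template, window=50):
    """DTW with a Sakoe–Chiba-style band (a sketch of the windowing idea).

    Restricting |i - j| <= window turns the full O(n * m)
    accumulative-distance matrix into a narrow band, which is what makes
    online matching cheap. Returns inf if |n - m| > window.
    """
    n, m = len(query), len(template)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        lo, hi = max(1, i - window), min(m, i + window)
        for j in range(lo, hi + 1):
            cost = abs(query[i - 1] - template[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```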
Since blur kernel estimation is an ill-posed problem, it must be constrained by parametric image priors. However, the previously proposed normalized sparsity measure alters the kernel structure during estimation. To address single-image blur kernel estimation, a local smoothness prior is added to the normalized sparsity model, constraining the blurred image gradient to be similar to the unblurred one. Moreover, based on inequality constraints, a kernel optimization algorithm is proposed to attenuate the noise. Experimental results show that the proposed method is robust to noise and estimates a stable blur kernel, outperforming other state-of-the-art methods on both synthetic and real data.
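For reference, the baseline normalized sparsity model of Krishnan et al. that this work builds on has roughly the following form; the local smoothness term added by this paper is described only qualitatively above, so it is omitted here.

```latex
\min_{x,k}\; \lambda\,\frac{\|\nabla x\|_{1}}{\|\nabla x\|_{2}}
\;+\; \|x \otimes k - y\|_{2}^{2}
\;+\; \psi\,\|k\|_{1},
```

where \(y\) is the blurred image, \(x\) the latent sharp image, \(k\) the blur kernel, and \(\otimes\) denotes convolution; the \(\ell_1/\ell_2\) ratio is the normalized sparsity measure.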
A hyperspectral image (HSI) is a three-dimensional data cube containing two spatial dimensions and one spectral dimension. The spectral vectors of different classes may have similar trends and values, which can negatively affect classification. It is, therefore, important to introduce signal preprocessing techniques in the spatial domain to improve the classification accuracy of HSIs. Assuming that neighboring pixels in an HSI are correlated, this paper proposes a spatial filtering model for HSI based on an adaptive manifold (AM). The AM spatial filter emphasizes similar neighboring pixels, is robust to noisy points, and is fast. The rich information in the filtered data is effective for improving the performance of the subsequent classification. The filtered data are classified by an extreme learning machine (ELM). The experimental results indicate that the framework built on AM and ELM provides competitive performance; specifically, classifying the filtered data improves the average accuracy of ELM by as much as 30.54%, while performing tens to hundreds of times faster than state-of-the-art classifiers.
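An ELM readout is simple enough to sketch in full: a fixed random hidden layer followed by a regularized least-squares fit, which is what makes it so much faster than iteratively trained classifiers. The hidden-layer size and ridge strength below are illustrative assumptions.

```python
import numpy as np

def elm_train(X, Y, n_hidden=500, ridge=1e-3, seed=0):
    """Basic extreme learning machine: random hidden layer + ridge readout.

    X : (n_samples, n_features) inputs; Y : (n_samples, n_classes) one-hot.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)                     # fixed random feature map
    beta = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ Y)
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Class scores; take argmax along axis 1 for hard labels."""
    return np.tanh(X @ W + b) @ beta
```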
Inspired by the theoretical advances of compressed sensing, many sparsity-aware methods have been proposed for squinted synthetic aperture radar (SAR) imaging based on the single measurement vector (SMV) model. Compared with SMV, the multiple measurement vectors (MMV) model has been demonstrated to have better reconstruction performance, and the echoes received by SAR at different azimuth positions can indeed be viewed as MMVs. However, the MMV model cannot be used directly in squinted SAR imaging: MMV requires multiple sparse vectors with a common sparse structure, while the high-resolution range profiles (HRRPs) obtained by squinted SAR at different azimuth positions have different sparse structures due to the range migration effect. A squinted SAR imaging method based on MMV is therefore proposed. First, a modified MMV model that accounts for range migration is built to realize a sparse representation of the echo. Additionally, an improved orthogonal matching pursuit algorithm is developed to reconstruct the HRRPs. Finally, a high-resolution two-dimensional image is obtained via traditional azimuth matched filtering. Experimental results based on both simulated and real data demonstrate that the proposed MMV-based method provides better computational efficiency and noise resistance than the SMV-based method.
To prevent the halo artifacts produced by edge-preserving smoothing methods that use a local filter, we propose a guided-image filter that fuses multiple kernels. The method first computes the coefficients of different local kernels at the pixel level and then linearly fuses these coefficients to obtain the final coefficients, from which the filtered image is generated. Compared with existing methods, including the popular bilateral filter and guided filter, our experimental results show that the proposed method not only yields images with better visual quality but also prevents halo artifacts in detail enhancement, haze removal, and noise reduction.
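For comparison, the single-kernel guided filter of He et al. that serves as the baseline here can be written compactly with box filters; the proposed method computes such per-pixel coefficients for several local kernels and linearly fuses them instead of using one kernel.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(I, p, radius=8, eps=1e-3):
    """Single-kernel guided filter: output q = a * I + b, fitted per window.

    I : guidance image (float array in [0, 1]); p : input image to filter.
    """
    size = 2 * radius + 1
    mean_I = uniform_filter(I, size)
    mean_p = uniform_filter(p, size)
    cov_Ip = uniform_filter(I * p, size) - mean_I * mean_p
    var_I = uniform_filter(I * I, size) - mean_I * mean_I
    a = cov_Ip / (var_I + eps)     # per-pixel linear coefficients
    b = mean_p - a * mean_I
    # average the coefficients over each window, then apply to the guide
    return uniform_filter(a, size) * I + uniform_filter(b, size)
```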
Authenticity is one of the most important evaluation factors of images for photography competitions or journalism. An unusual compression history often implies illicit intent on the part of an image's author. Our work aims at distinguishing truly uncompressed images from fake uncompressed images that are saved in uncompressed formats but have previously been compressed. To detect potential JPEG compression, we analyze JPEG compression artifacts using the tetrolet covering, which corresponds to the local geometric structure of the image. Since compression can alter this structure, the tetrolet covering indexes may change if the test image has been compressed, and such changes provide valuable clues about its compression history. Specifically, the test image is first compressed at different quality factors to generate a set of temporary images. The test image is then compared with each temporary image block by block to determine whether the tetrolet covering index of each 4×4 block differs between them. The percentages of changed tetrolet covering indexes at each quality factor (from low to high) form the p-curve, whose local minimum may indicate the potential compression. Our experimental results demonstrate the advantage of our method in detecting high-quality JPEG compression from uncompressed-format images, even at the highest quality factors (such as 98, 99, or 100) of standard JPEG compression, while accurately identifying the corresponding compression quality factor.
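A heavily simplified sketch of the p-curve mechanics: recompress at each quality factor and record the fraction of changed 4×4 blocks. The real method compares tetrolet covering indexes per block; the pixel-change test below is only a stand-in to show the recompress-and-compare loop.

```python
from io import BytesIO

import numpy as np
from PIL import Image

def p_curve(img, qualities=range(50, 101)):
    """Fraction of changed 4x4 blocks after recompression at each quality.

    img : a PIL Image. The curve tends to dip near the quality factor of a
    prior JPEG compression, which is the cue the p-curve method exploits.
    """
    gray = img.convert("L")
    arr = np.asarray(gray, dtype=np.int16)
    h, w = arr.shape[0] - arr.shape[0] % 4, arr.shape[1] - arr.shape[1] % 4
    arr = arr[:h, :w]
    curve = []
    for q in qualities:
        buf = BytesIO()
        gray.save(buf, format="JPEG", quality=q)   # temporary image
        buf.seek(0)
        rec = np.asarray(Image.open(buf), dtype=np.int16)[:h, :w]
        changed = (arr != rec).reshape(h // 4, 4, w // 4, 4).any(axis=(1, 3))
        curve.append(changed.mean())               # per-quality fraction
    return np.asarray(curve)
```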