This paper explores sparse coding of natural images in the highly overcomplete regime. We show that as the overcompleteness ratio approaches 10x, new types of dictionary elements emerge beyond the classical Gabor function shape obtained from complete or only modestly overcomplete sparse coding. These more diverse dictionaries allow images to be approximated with lower L1 norm (for a fixed SNR), and the coefficients exhibit steeper decay. We also evaluate the learned dictionaries in a denoising task, showing that higher degrees of overcompleteness yield modest gains in performance.
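The L1-penalized coding described above can be sketched with a simple iterative soft-thresholding (ISTA) solver. This is a minimal illustration, not the paper's implementation: the dictionary here is random rather than learned, and the 10x-overcomplete sizes are toy values.

```python
# Minimal sketch of L1-penalized sparse coding with an overcomplete
# dictionary, solved by ISTA. D and the patch x are random stand-ins,
# not a learned dictionary or a natural-image patch.
import numpy as np

def ista(x, D, lam=0.05, n_iter=200):
    """Minimize 0.5*||x - D a||^2 + lam*||a||_1 by iterative soft thresholding."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)           # gradient of the quadratic term
        z = a - grad / L
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return a

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 640))         # 10x overcomplete: 640 atoms, 64-dim patches
D /= np.linalg.norm(D, axis=0)             # unit-norm atoms
x = D[:, :3] @ np.array([1.0, -0.5, 0.8])  # patch built from 3 atoms
a = ista(x, D)
print("nonzero coefficients:", np.sum(np.abs(a) > 1e-3))
```

With a sparse generating code, the recovered coefficient vector uses far fewer than the 640 available atoms, which is the sense in which the L1 norm trades off against SNR.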
We show how an overcomplete dictionary may be adapted to the statistics of natural images so as to provide
a sparse representation of image content. When the degree of overcompleteness is low, the basis functions
that emerge resemble those of Gabor wavelet transforms. As the degree of overcompleteness is increased, new
families of basis functions emerge, including multiscale blobs, ridge-like functions, and gratings. When the basis
functions and coefficients are allowed to be complex, they provide a description of image content in terms of local
amplitude (contrast) and phase (position) of features. These complex, overcomplete transforms may be adapted
to the statistics of natural movies by imposing both sparseness and temporal smoothness on the amplitudes.
The basis functions that emerge form Hilbert pairs such that shifting the phase of the coefficient shifts the phase
of the corresponding basis function. This type of representation is advantageous because it makes explicit the
structural and dynamic content of images, which in turn allows later stages of processing to discover higher-order
properties indicative of image content. We demonstrate this point by showing that it is possible to learn the
higher-order structure of dynamic phase - i.e., motion - from the statistics of natural image sequences.
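The amplitude/phase description above rests on quadrature (Hilbert) pairs: rotating the phase of a complex coefficient translates the underlying feature while leaving its contrast envelope fixed. A minimal 1-D sketch, using a hand-built cosine/sine Gabor pair rather than learned basis functions:

```python
# Sketch: a complex coefficient acting on a Hilbert pair. The even/odd
# quadrature pair is a hand-built Gabor-like cosine/sine; rotating the
# coefficient phase shifts the position of the reconstructed feature
# while the amplitude envelope stays put.
import numpy as np

t = np.linspace(-1, 1, 256)
envelope = np.exp(-t**2 / 0.1)
b_even = envelope * np.cos(2 * np.pi * 4 * t)   # even-symmetric member
b_odd  = envelope * np.sin(2 * np.pi * 4 * t)   # odd-symmetric (Hilbert) member

def reconstruct(amplitude, phase):
    """Real part of (amplitude * e^{i*phase}) applied to b_even + i*b_odd."""
    return amplitude * (np.cos(phase) * b_even - np.sin(phase) * b_odd)

f0 = reconstruct(1.0, 0.0)
f1 = reconstruct(1.0, np.pi / 2)                # quarter-cycle phase shift
# the carrier shifts position; the envelope (local contrast) is unchanged
print(np.argmax(f0), np.argmax(f1))
```

The identity hypot(f0, f1) = envelope makes the amplitude/phase split explicit: amplitude carries contrast, phase carries position.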
Researchers studying human and computer vision have found that the description and construction of these systems is greatly aided by analysis of the statistical properties of naturally occurring scenes. More specifically, receptive fields with directional selectivity and bandwidth properties similar to those of mammalian visual systems have been found to be closely matched to the statistics of natural scenes. It is argued that this allows for a sparse representation of the independent
components of natural images [Olshausen and Field, Nature, 1996]. These theories have important implications for
medical image perception. For example, will a system that is designed to represent the independent components of
natural scenes, where objects occlude one another and illumination is typically reflected, be appropriate for X-ray
imaging, where features superimpose on one another and illumination is transmissive?
In this research we begin to examine these issues by evaluating higher-order statistical properties of breast images from
X-ray projection mammography (PM) and dedicated breast computed tomography (bCT). We evaluate kurtosis in
responses of octave bandwidth Gabor filters applied to PM and to coronal slices of bCT scans. We find that kurtosis in
PM rises and quickly saturates with filter center frequency, at an average value above 0.95. By contrast, kurtosis in
bCT peaks near 0.20 cyc/mm with kurtosis of approximately 2. Our findings suggest that the human visual system may
be tuned to represent breast tissue more effectively in bCT over a specific range of spatial frequencies.
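The kurtosis measurement described above can be sketched end to end: filter an image with an octave-bandwidth Gabor and compute the excess kurtosis of the response histogram. Since the mammography data are not available here, the sketch uses a synthetic Gaussian 1/f image, for which the kurtosis comes out near zero; the point of the measure is that natural and medical images yield markedly higher values because of their sparse structure. Filter parameters are illustrative.

```python
# Sketch: excess kurtosis of a Gabor filter response on a synthetic
# 1/f Gaussian image (a stand-in for the PM/bCT data, which yield
# high kurtosis; a Gaussian field yields kurtosis near zero).
import numpy as np

rng = np.random.default_rng(1)
n = 128
# synthetic image with a 1/f amplitude spectrum
fx = np.fft.fftfreq(n)[:, None]
fy = np.fft.fftfreq(n)[None, :]
f = np.hypot(fx, fy)
f[0, 0] = 1.0                                   # avoid division by zero at DC
spectrum = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
img = np.real(np.fft.ifft2(spectrum / f))

def gabor(n, f0, sigma):
    """Even-symmetric Gabor filter at center frequency f0 (cycles/pixel)."""
    x = np.arange(n) - n // 2
    X, Y = np.meshgrid(x, x)
    return np.exp(-(X**2 + Y**2) / (2 * sigma**2)) * np.cos(2 * np.pi * f0 * X)

def excess_kurtosis(v):
    v = v - v.mean()
    return np.mean(v**4) / np.mean(v**2) ** 2 - 3.0

# filter via FFT; ifftshift centers the kernel at the origin
resp = np.real(np.fft.ifft2(np.fft.fft2(img) *
                            np.fft.fft2(np.fft.ifftshift(gabor(n, 0.1, 8)))))
k = excess_kurtosis(resp.ravel())
print("excess kurtosis:", k)
```

On real images the same pipeline, swept over center frequency, produces the kurtosis-vs-frequency curves compared between PM and bCT.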
Previous work on unsupervised learning has shown that it is possible to learn Gabor-like feature representations,
similar to those employed in the primary visual cortex, from the statistics of natural images. However, such
representations are still not readily suited for object recognition or other high-level visual tasks because they
can change drastically as the image changes due to object motion, variations in viewpoint, lighting, and other
factors. In this paper, we describe how bilinear image models can be used to learn independent representations
of the invariances, and their transformations, in natural image sequences. These models provide the foundation
for learning higher-order feature representations that could serve as models of higher stages of processing in the
cortex, in addition to having practical merit for computer vision tasks.
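The bilinear model referred to above factors an image into a "what" code and a "how" code combined multiplicatively through a weight tensor. A minimal sketch with a random (unlearned) tensor and hypothetical dimensions, just to show the factorization and its linearity in each factor:

```python
# Sketch of a bilinear generative model I_p = sum_ij W_pij x_i y_j,
# where x codes content ("what") and y codes transformation ("how").
# W is random here; in a bilinear image model it would be learned
# from natural image sequences. All sizes are illustrative.
import numpy as np

rng = np.random.default_rng(3)
n_pix, n_content, n_transf = 64, 8, 4
W = rng.standard_normal((n_pix, n_content, n_transf))

def render(x, y):
    """Image generated by content code x and transformation code y."""
    return np.einsum('pij,i,j->p', W, x, y)

x = rng.standard_normal(n_content)
y0 = rng.standard_normal(n_transf)
y1 = rng.standard_normal(n_transf)
# the same content under two transformation codes gives two different images
img0, img1 = render(x, y0), render(x, y1)
print(img0.shape, np.allclose(img0, img1))
```

Because the model is linear in each factor separately, content can be held fixed while transformations vary, which is what makes the representation invariant in the sense described above.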
Overcomplete wavelet representations have become increasingly popular for their ability to provide highly sparse and robust descriptions of natural signals. We describe a method for incorporating an overcomplete wavelet representation as part of a statistical model of images which includes a sparse prior distribution over the wavelet coefficients. The wavelet basis functions are parameterized by a small set of 2-D functions. These functions are adapted to maximize the average log-likelihood of the model for a large database of natural images. When adapted to natural images, these functions become selective to different spatial orientations, and they achieve a superior degree of sparsity on natural images as compared with traditional wavelet bases. The learned basis is similar to the Steerable Pyramid basis, and yields slightly higher SNR for the same number of active coefficients. Inference with the learned model is demonstrated for applications such as denoising, with results that compare favorably with other methods.
We show how a wavelet basis may be adapted to best represent natural images in terms of sparse coefficients. The wavelet basis, which may be either complete or overcomplete, is specified by a small number of spatial functions which are repeated across space and combined in a recursive fashion so as to be self-similar across scale. These functions are adapted to minimize the estimated code length under a model that assumes images are composed as a linear superposition of sparse, independent components. When adapted to natural images, the wavelet bases become selective to different spatial orientations, and they achieve a superior degree of sparsity on natural images as compared with traditional wavelet bases.
We describe a method for learning an overcomplete set of basis functions for the purpose of modeling data with sparse structure. Such data are characterized by the fact that they require a relatively small number of non-zero coefficients on the basis functions to describe each data point. The sparsity of the basis function coefficients is modeled with a mixture-of-Gaussians distribution. One Gaussian captures non-active coefficients with a small-variance distribution centered at zero, while one or more other Gaussians capture active coefficients with a large-variance distribution. We show that when the prior is in such a form, there exist efficient methods for learning the basis functions as well as the parameters of the prior. The performance of the algorithm is demonstrated on a number of test cases and also on natural images. The basis functions learned on natural images are similar to those obtained with other methods, but the sparse form of the coefficient distribution is much better described. Also, since the parameters of the prior are adapted to the data, no assumptions about sparse structure in the images need be made a priori; rather, it is learned from the data.
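The two-component prior described above can be sketched directly: a narrow Gaussian at zero for inactive coefficients mixed with a broad Gaussian for active ones. The parameter values below are illustrative, not fitted values from the paper; in the actual method they would be adapted to the data.

```python
# Sketch of a two-component mixture-of-Gaussians prior on a coefficient:
# a narrow "inactive" Gaussian at zero plus a broad "active" Gaussian.
# p_active and the two standard deviations are illustrative values.
import numpy as np

def mog_logpdf(a, p_active=0.1, sigma_off=0.05, sigma_on=1.0):
    """log p(a) under a mixture of two zero-mean Gaussians."""
    def gauss(a, s):
        return np.exp(-a**2 / (2 * s**2)) / (np.sqrt(2 * np.pi) * s)
    return np.log((1 - p_active) * gauss(a, sigma_off)
                  + p_active * gauss(a, sigma_on))

# the mixture is sharply peaked at zero and falls off with |a|,
# giving the heavy-tailed, sparse shape the model assumes:
print(mog_logpdf(0.0), mog_logpdf(0.5), mog_logpdf(3.0))
```

The sharp peak at zero is what favors exactly-inactive coefficients, while the broad component keeps large activations from being heavily penalized.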
A number of recent efforts have been made to account for the response properties of cells in the visual pathway by considering the statistical structure of the natural environment. Previously, the wavelet-like properties of cells in primary visual cortex have been proposed to provide an efficient representation of the structure in natural scenes captured by the phase spectrum. In this paper, we take a closer look at the amplitude spectra of natural scenes and their role in understanding visual coding. We propose that one of the principal insights gained from the amplitude spectra is in understanding the relative sensitivity of cells tuned to different frequencies. It is suggested that the response magnitude of cells tuned to different frequencies increases with frequency out to about 20 cycles/deg. The result is a code in which the response to natural scenes with a 1/f falloff is approximately flat out to 20 cycles/deg. The variability in the amplitude spectra of natural scenes is also investigated. Using a measure called the 'thresholded contrast spectrum' (TCS), it is demonstrated that a good proportion of the variability in the spectra is due to the relative sparseness of structure at different frequencies. The slope of the TCS was found to provide a reasonable prediction of blur across a variety of scenes, in spite of the variability in their amplitude spectra.
An algorithm is described which allows for the learning of sparse, overcomplete image representations. Images are modeled as a linear superposition of basis functions, and a set of basis functions is sought which maximizes the sparseness of the representation (the fewest active units per image). When applied to natural scenes, the basis functions converge to localized, oriented, bandpass functions that bear a strong resemblance to the receptive fields of neurons in the primate striate cortex. Importantly, the code can be made overcomplete, which allows for an increased degree of sparseness in which the basis functions can become more specialized. The learned basis functions constitute an efficient representation of natural images because sparseness forces a form of reduced entropy representation that minimizes statistical dependencies among outputs.
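The alternating scheme this class of algorithms uses, inferring sparse coefficients for a batch of patches and then nudging the basis functions along the residual, can be sketched as follows. This is a toy illustration under stated assumptions: random patches instead of natural images (so the learned atoms are not Gabor-like here), an L1 penalty as the sparseness cost, and illustrative sizes and step sizes.

```python
# Minimal sketch of alternating sparse-coding learning: infer sparse
# coefficients A for a batch of patches X, then update the basis Phi
# with a Hebbian-style step on the residual and renormalize its columns.
# Data are random, so the learned atoms are only illustrative.
import numpy as np

rng = np.random.default_rng(2)
patch_dim, n_basis, n_patches = 16, 32, 200    # 2x overcomplete toy setting
Phi = rng.standard_normal((patch_dim, n_basis))
Phi /= np.linalg.norm(Phi, axis=0)             # unit-norm basis functions
X = rng.standard_normal((patch_dim, n_patches))

lam, eta = 0.1, 0.05
for step in range(50):
    # inference: ISTA steps on 0.5*||x - Phi a||^2 + lam*||a||_1
    L = np.linalg.norm(Phi, 2) ** 2            # step size from Lipschitz constant
    A = np.zeros((n_basis, n_patches))
    for _ in range(30):
        G = Phi.T @ (Phi @ A - X)
        Z = A - G / L
        A = np.sign(Z) * np.maximum(np.abs(Z) - lam / L, 0.0)
    # learning: move the basis toward the residual, then renormalize
    R = X - Phi @ A
    Phi += eta * R @ A.T / n_patches
    Phi /= np.linalg.norm(Phi, axis=0)

print("relative residual:", np.linalg.norm(X - Phi @ A) / np.linalg.norm(X))
```

Renormalizing the columns after each update keeps the trivial solution of growing the basis norms (and shrinking the coefficients) out of reach, which is the standard device in this family of algorithms.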
In its evolution, the primate visual system has developed impressive capabilities for recognizing complex patterns in natural images. This process involves many stages of analysis and a variety of information processing strategies. This paper concentrates on the importance of 'information bottlenecks,' which restrict the amount of information that can be handled at different stages of analysis. These steps are crucial for reducing the overwhelming computational complexity associated with recognizing countless objects from arbitrary viewing angles, distances, and perspectives. The process of directed visual attention is an especially important information bottleneck because of its flexibility in determining how information is routed to high-level pattern recognition centers.