## 1. Introduction

This tutorial describes new methods and computational imagers for increasing system resolution based on recently developed compressed sensing (CS, also referred to as compressive sampling)^{1, 2} techniques. CS is a mathematical framework with several powerful theorems that provide insight into how a high resolution image can be inferred from a relatively small number of measurements using sophisticated computational methods. For example, in theory a 1 mega-pixel array could be used to reconstruct a 4 mega-pixel image by projecting the desired high resolution image onto a set of low resolution measurements (via spatial light modulators, for instance) and then recovering the 4 mega-pixel scene through sparse signal reconstruction software. However, it is not immediately clear how to build a practical system that incorporates these theoretical concepts. This paper provides a tutorial on CS for optical engineers that focuses on 1. a brief overview of the main theoretical tenets of CS, 2. physical systems designed with CS theory in mind and the various tradeoffs associated with these systems, and 3. an overview of the state-of-the-art in sparse reconstruction algorithms used for CS image formation. There are several other tutorials on CS available in the literature which we highly recommend;^{3, 4, 5, 6, 7} however, these papers do not address important technical issues specific to optical systems, including the tradeoffs associated with non-negativity, photon noise, and the practicality of implementation in real imaging systems.

Although the CS theory and methods we describe in this paper can be applied to many general imaging systems, we concentrate on infrared (IR) technology as a specific example to highlight the challenges associated with applying CS to practical optical systems and to illustrate its potential benefits for improving system resolution. Much of the research and development in IR imaging is driven by a continued desire for high resolution, large-format focal plane arrays (FPAs). The push for high quality, wide field-of-view IR imagery is particularly strong in military applications. Modern maritime combat, for example, requires consideration of small, difficult-to-detect watercraft operating in low-light environments or in areas where obscurants such as smoke or marine-layer haze make conventional imaging difficult. While high resolution IR imaging can meet these challenges, it comes at considerable cost in power, form factor, and price, in part due to the higher cost of IR detector materials and the material processing challenges of constructing small-pitch FPAs. Despite recent advances,^{8, 9, 10} copious research funds continue to be spent on developing larger format arrays with small pitch. CS addresses the question of whether it is necessary to physically produce smaller sensors in order to achieve higher resolution. Conventional practice requires one sensor (e.g., focal plane array element) per image pixel; however, the theory, algorithms, and architectures described in this paper may alleviate this constraint.

### 1.1. Paper Structure and Contribution

In Sec. 2, we introduce the key theoretical concepts of CS with an intuitive explanation of how high resolution imagery can be achieved with a small number of measurements. Practical architectures which have been developed to exploit CS theory and associated challenges of photon efficiency and noise, non-negativity, and dynamic range are described in Sec. 3. Section 4 describes computational methods designed for inferring high resolution imagery from a small number of compressive measurements. This is an active research area, and we provide a brief overview of the broad classes of techniques used and their tradeoffs. These concepts are then brought together in Sec. 5 with the description of a physical system in development for IR imaging and how CS impacts the contrast of bar target images commonly used to assess the resolution of IR cameras. We offer some brief concluding remarks in Sec. 6.

## 2. Compressed Sensing

The basic idea of CS theory is that when the image of interest is very sparse or highly compressible in some basis (i.e., most basis coefficients are small or zero-valued), relatively few well-chosen observations suffice to reconstruct the most significant nonzero components. In particular, judicious selection of the type of image transformation introduced by measurement systems may dramatically improve our ability to extract high quality images from a limited number of measurements. In this section we review the intuition and theory underlying these ideas. By designing optical sensors to collect measurements of a scene according to CS theory, we can use sophisticated computational methods to infer critical scene structure and content.

### 2.1. Underlying Intuition

One interpretation for why CS is possible is based on the inherent compressibility of most images of interest. Since images may be stored with modern compression methods using fewer than one bit per pixel, we infer that the critical information content of the image is much less than the number of pixels times the bit depth of each pixel. Thus rather than measuring each pixel and then computing a compressed representation, CS suggests that we can measure a “compressed” representation directly.

More specifically, consider what would happen if we knew ahead of time that our scene consisted of one bright point source against a black background. Conventional measurement of this scene would require an FPA with *N* elements to localize this bright spot with accuracy 1/*N*, as depicted in Fig. 1. Here, *I*_{j} is an indicator image for the *j*'th pixel on a √*N* × √*N* grid, so that *x*_{j} is a direct measurement of the *j*'th pixel's intensity. In other words, the point spread function being modeled is a simple Dirac delta function at the resolution of the detector. If the one bright spot in our image is in pixel *k*, then *x*_{k} will be proportional to its brightness, and the remaining *x*_{j}'s will be zero-valued.

However, our experience with binary search strategies and group testing suggests that, armed with our prior knowledge about the sparsity of the signal, we should be able to determine the location of the bright pixel with significantly fewer than *N* measurements. For instance, consider the binary sensing strategy depicted in Fig. 1. The first measurement immediately narrows down the set of possible locations for our bright pixel to half its original size, and the second measurement reduces the size of this set by half again. Thus with only *M* = log_{2}*N* measurements of this form, we may accurately localize our bright source.
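The binary search idea can be sketched in a few lines (a toy example with hypothetical dimensions, not a description of any particular hardware): each mask sums the pixels whose index has a given bit set, so each noise-free measurement reveals one bit of the source location.

```python
import numpy as np

N = 16                      # number of pixels in the toy scene
true_loc = 11               # unknown location of the single bright source
f = np.zeros(N)
f[true_loc] = 1.0           # unit-brightness point source, black background

# log2(N) binary masks: mask j sums the pixels whose index has bit j set
M = int(np.log2(N))
masks = np.array([[(i >> j) & 1 for i in range(N)] for j in range(M)])
x = masks @ f               # one aggregate measurement per mask

# each measurement reveals one bit of the source location
recovered = sum(int(x[j] > 0) << j for j in range(M))
print(recovered)  # -> 11
```

With only *M* = 4 measurements instead of *N* = 16, the source is localized exactly; this is precisely the noise-free, single-source special case that CS generalizes.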

It is easiest to see that this approach should work in the setting where there is only one nonzero pixel in the original scene and measurements are noise-free. Compressed sensing provides a mechanism for transforming this intuition into settings with noisy measurements where (i) the image contains a small, unknown number (*K* ≪ *N*) of nonzero pixels or (ii) the image contains significantly more structure, such as texture, edges, boundaries, and smoothly varying surfaces, but can be approximated accurately with *K* ≪ *N* nonzero coefficients in some basis (e.g., a wavelet basis). Case (i) is depicted in Fig. 1, and case (ii) is depicted in Fig. 1. The CS approach requires that the binary projections be constructed slightly differently than those considered in Fig. 1, e.g., using random matrices, but each measurement is nevertheless the (potentially weighted) sum of a subset of the original pixels.

### 2.2. Underlying Theory

The above concepts are formalized in this section. We provide only a brief overview of a few main theoretical results in this burgeoning field and refer readers to other tutorials^{3, 4, 5, 6, 7} for additional details. Consider an *N*-pixel image (which we represent as a length-*N* column vector) *f*^{⋆}, represented in terms of a basis expansion with *N* coefficients:

$$f^\star = \sum_{i=1}^{N} w_i \theta^\star_i,$$

where *w*_{i} is the *i*'th basis vector and θ^{⋆}_{i} is the corresponding coefficient. In many settings, the basis *W* ≜ [*w*_{1}, …, *w*_{N}] can be chosen so that only *K* ≪ *N* coefficients have significant magnitude, i.e., many of the θ^{⋆}_{i}'s are zero or very small for large classes of images; we then say that θ^{⋆} ≜ [θ^{⋆}_{1}, …, θ^{⋆}_{N}]^{T} is sparse or compressible. Sparsity (or, more generally, low-dimensional structure) has long been recognized as highly useful in a variety of inverse problems, but much of the underlying theoretical support was lacking. More recent theoretical studies have provided strong justification for the use of sparsity constraints and quantified the accuracy of sparse solutions to underdetermined systems.^{11, 12}
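Compressibility in a fixed basis is easy to demonstrate numerically. The sketch below is illustrative only: it uses a discrete cosine basis as a stand-in for the wavelet bases mentioned above, and an arbitrary smooth test signal; it keeps the *K* largest transform coefficients and discards the rest.

```python
import numpy as np
from scipy.fft import dct, idct

N = 256
t = np.linspace(0.0, 1.0, N)
f = np.sin(2 * np.pi * 3 * t) + 0.5 * np.sin(2 * np.pi * 7 * t)  # smooth signal

theta = dct(f, norm="ortho")            # coefficients in an orthonormal DCT basis
K = 20
keep = np.argsort(np.abs(theta))[::-1][:K]
theta_K = np.zeros(N)
theta_K[keep] = theta[keep]             # best K-term approximation in this basis
f_K = idct(theta_K, norm="ortho")       # reconstruct from K of the N coefficients

rel_err = np.linalg.norm(f - f_K) / np.linalg.norm(f)
print(rel_err)  # small: most of the energy lives in a few coefficients
```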

In such cases, it is clear that if we knew which *K* of the θ^{⋆}_{i}'s were significant, we would ideally just measure these *K* coefficients directly, resulting in far fewer measurements to obtain an accurate representation of *f*^{⋆}. Of course, in general we do not know *a priori* which coefficients are significant. The key insight of CS is that, with slightly more than *K* well-chosen measurements, we can determine which θ^{⋆}_{i}'s are significant and accurately estimate their values. Furthermore, fast algorithms which exploit the sparsity of θ^{⋆} make this recovery computationally feasible.

The data collected by an imaging or measurement system are represented as

$$x = Rf^\star + n = RW\theta^\star + n, \tag{1}$$

where *R* ∈ ℝ^{M × N} maps the scene onto an *M*-dimensional set of observations, *n* ∈ ℝ^{M} is noise associated with the physics of the sensor, and *x* ∈ ℝ_{+}^{M} is the observed data. (Typically *n* is assumed bounded or bounded with high probability to ensure that *x*, which is proportional to photon intensities, is non-negative.) Sparse recovery algorithms address the problem of solving for *f*^{⋆} when the number of unknowns, *N*, is much larger than the number of observations, *M*. In general, this is an ill-posed problem, as there are infinitely many candidate solutions for *f*^{⋆}; nevertheless, CS theory provides a set of conditions on *R*, *W*, and *f*^{⋆} that, if satisfied, assure an accurate estimate of *f*^{⋆}.
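A minimal simulation of the observation model in Eq. 1, with *W* taken as the identity so that *f*^{⋆} is sparse in the pixel basis (the dimensions, support, and noise level are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 64, 16                                 # N unknowns, M << N observations
f_star = np.zeros(N)
f_star[[5, 20, 40]] = [1.0, 0.7, 0.3]         # sparse scene (W = identity)

R = rng.standard_normal((M, N)) / np.sqrt(M)  # pseudorandom sensing matrix
n = 0.01 * rng.standard_normal(M)             # sensor noise
x = R @ f_star + n                            # Eq. (1): x = R f* + n
print(x.shape)
```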

Much of the CS literature revolves around determining when a sensing matrix *A* ≜ *RW* allows accurate reconstruction using an appropriate algorithm. One widely used property in such discussions is the restricted isometry property (RIP):

**Definition 2.1** *(Restricted Isometry Property^{11}): The matrix A satisfies the restricted isometry property of order K with parameter δ_{K} ∈ [0, 1) if*

$$(1 - \delta_K)\Vert\theta\Vert_2^2 \le \Vert A\theta\Vert_2^2 \le (1 + \delta_K)\Vert\theta\Vert_2^2$$

*holds simultaneously for all sparse vectors θ having no more than K nonzero entries. Matrices with this property are denoted RIP(K, δ_{K}).* [In the above, ‖θ‖_{2} ≜ (∑_{i = 1}^{N}θ_{i}^{2})^{1/2}.]

For example, if the entries of *A* are independent and identically distributed zero-mean Gaussian random variables with variance 1/*M*, then *A* satisfies RIP(*K*, δ_{K}) with high probability for any integer *K* = *O*(*M*/log *N*).^{11, 13, 14} Matrices which satisfy the RIP, combined with sparse recovery algorithms, are guaranteed to yield accurate estimates of the underlying function *f*^{⋆}, as specified by the following theorem. [Below, ‖θ‖_{1} ≜ ∑_{i = 1}^{N}|θ_{i}|.]
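The near-isometry in Definition 2.1 can be observed empirically. The sketch below draws random *K*-sparse vectors and checks that ‖*A*θ‖₂²/‖θ‖₂² stays close to 1 for a Gaussian matrix with variance-1/*M* entries (this samples a few random supports only, whereas the RIP requires the bound uniformly over all *K*-sparse vectors; the dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
N, M, K = 256, 128, 5
A = rng.standard_normal((M, N)) / np.sqrt(M)   # i.i.d. N(0, 1/M) entries

ratios = []
for _ in range(200):
    theta = np.zeros(N)
    support = rng.choice(N, K, replace=False)  # random K-sparse support
    theta[support] = rng.standard_normal(K)
    ratios.append(np.linalg.norm(A @ theta) ** 2 / np.linalg.norm(theta) ** 2)

print(min(ratios), max(ratios))  # both near 1, i.e., small delta_K
```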

**Theorem 2.2 (Noisy Sparse Recovery with RIP Matrices^{3, 15}):** *Let A be a matrix satisfying RIP(2K, δ_{2K}) with δ_{2K} < √2 − 1, and let x = Aθ^{⋆} + n be a vector of noisy observations of any signal θ^{⋆} ∈ ℝ^{N}, where n is a noise or error term with ‖n‖_{2} ⩽ ε. Let θ^{⋆}_{K} be the best K-sparse approximation of θ^{⋆}; that is, θ^{⋆}_{K} is the approximation obtained by keeping the K largest entries of θ^{⋆} and setting the others to zero. Then the estimate*

$$\widehat{\theta} = \mathop{\arg\min}_{\theta \in \mathbb{R}^N} \Vert\theta\Vert_1 \quad \text{subject to } \Vert x - A\theta\Vert_2 \le \epsilon \tag{2}$$

*satisfies*

$$\Vert\widehat{\theta} - \theta^\star\Vert_2 \le C_{1,K}\,\epsilon + C_{2,K}\,\frac{\Vert\theta^\star - \theta^\star_K\Vert_1}{\sqrt{K}},$$

*where C_{1, K} and C_{2, K} are constants which depend on K but not on N or M.*

In other words, the accuracy of the reconstruction of a general image *f*^{⋆} from measurements collected using a system which satisfies the RIP depends on a. the amount of noise present and b. how well *f*^{⋆} may be approximated by an image sparse in *W*. In the special case of *noiseless acquisition of K-sparse signals*, we have ε = 0 and θ^{⋆} ≡ θ^{⋆}_{K}, and Theorem 2.2 implies exact recovery, i.e., θ^{⋆} = θ̂.
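The exact-recovery statement can be checked numerically by writing the ℓ₁ problem of Eq. 2 with ε = 0 as a linear program, splitting θ into non-negative parts θ = *u* − *v*. This sketch uses arbitrary small dimensions and SciPy's general-purpose LP solver rather than a dedicated CS code:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
N, M, K = 64, 32, 3
A = rng.standard_normal((M, N)) / np.sqrt(M)    # random Gaussian sensing matrix
theta_star = np.zeros(N)
theta_star[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
x = A @ theta_star                               # noiseless measurements

# min 1^T (u + v)  subject to  A(u - v) = x,  u, v >= 0
c = np.ones(2 * N)
res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=x, bounds=(0, None))
theta_hat = res.x[:N] - res.x[N:]

print(np.linalg.norm(theta_hat - theta_star))  # ~0: exact recovery from M < N data
```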

Note that if the noise were Gaussian white noise with variance σ^{2}, then ‖*n*‖_{2} ≈ σ√*M*, so the first term in the error bound in Theorem 2.2 scales like σ√*M*. If the noise is Poisson, corresponding to a low-light or infrared setting, then ‖*n*‖_{2} may be arbitrarily large; theoretical analysis in this setting must include several physical constraints not considered above but described in Sec. 3.2. Additional theory on reconstruction accuracy, optimality, stability with respect to noise, and the uniqueness of solutions is prevalent in the literature.^{3, 12, 16} Finally, note that the reconstruction in Eq. 2 of Theorem 2.2 is equivalent to

$$\widehat{\theta} = \mathop{\arg\min}_{\theta \in \mathbb{R}^N} \frac{1}{2}\Vert x - A\theta\Vert_2^2 + \tau\Vert\theta\Vert_1 \quad \text{and} \quad \widehat{f} = W\widehat{\theta}, \tag{3}$$

for an appropriate choice of the regularization parameter τ > 0.

A related criterion for determining the quality of the measurement matrix for CS is the worst-case *coherence* of *A* ≡ *RW*.^{17, 18, 19} Formally, denote the Gram matrix *G* ≜ *A*^{T}*A*, where the columns of *A* have unit norm, and let

$$\mu(A) \triangleq \max_{\substack{1 \le i,j \le N \\ i \ne j}} |G_{i,j}|. \tag{4}$$

Smaller coherence indicates a better sensing matrix; ideally, *R* and *W* are designed so that μ is as close as possible to *M*^{−1/2}. There also exist matrices *A* associated with highly overcomplete dictionaries where *N* ≈ *M*^{2} and μ(*A*) ≈ 1/√*M*.^{20} Several recent works have examined the recovery guarantees of CS as a function of μ(*A*),^{17, 21, 22, 23, 24, 25, 26} such as the following:

**Theorem 2.3 (Noisy Sparse Recovery with Incoherent Matrices^{21}):** *Let x = Aθ^{⋆} + n be a vector of noisy observations of any K-sparse signal θ^{⋆} ∈ ℝ^{N}, where K ⩽ [μ(A)^{−1} + 1]/4 and n is a noise or error term with ‖n‖_{2} ⩽ ε. Then the estimate in Eq. 2 obeys*

$$\Vert\widehat{\theta} - \theta^\star\Vert_2^2 \le \frac{4\epsilon^2}{1 - \mu(A)(4K - 1)}.$$

One of the main practical advantages of coherence-based theory is that it is possible to compute μ for a given CS system. Furthermore, Geršgorin's circle theorem^{27} states that if a matrix has coherence μ, then it satisfies the RIP of order (*k*, ε) with *k* ∼ ε/μ; this fact was used in several papers analyzing the performance of CS.^{19, 28, 29}
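Because μ(*A*) is directly computable, a candidate sensing matrix can be scored before any hardware is built. A sketch with arbitrary dimensions (a random Gaussian matrix will not reach the *M*^{−1/2} ideal, but its coherence is easy to inspect):

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 256, 64
A = rng.standard_normal((M, N))
A /= np.linalg.norm(A, axis=0)       # normalize columns to unit norm

G = A.T @ A                          # Gram matrix
mu = np.max(np.abs(G - np.eye(N)))   # worst-case coherence, Eq. (4)
print(mu, 1.0 / np.sqrt(M))          # mu exceeds the ideal M^{-1/2}
```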

Many of the ideas connected to CS, such as using the ℓ_{1} norm during reconstruction to achieve sparse solutions, existed in the literature long before the central theory of CS was developed. However, we had limited fundamental insight into why these methods worked, including necessary conditions which would guarantee successful image recovery. CS theory provides significant inroads in this direction. In particular, we now understand how to assess when a sensing matrix will facilitate accurate sparse recovery, and we can use this insight to guide the development of new imaging hardware. Given a particular CS system, governed by *A* = *RW*, let the ratios ρ = *K*/*M* and δ = *M*/*N*, respectively, quantify the degree of sparsity and the degree to which the problem is underdetermined. It has been shown that for many CS matrices, there exist sharp boundaries in (ρ, δ) space that clearly divide the “solvable” from the “unsolvable” problems in the noiseless case.^{30, 31} This boundary is shown in Fig. 2 for the case of a random Gaussian CS matrix (entries of *A* drawn independently from a Gaussian distribution); however, it has been shown by Donoho and Tanner^{30} that the same boundary holds for many other CS matrices as well. Above the boundary, the system lacks sparsity and/or is too underdetermined to solve; below the boundary, solutions are readily obtainable by solving Eq. 2 with ε = 0.

Finally, it should be noted that the matrix *R* is a model for the propagation of light through an optical system. The reconstruction performance is thus going to depend not only on the RIP or incoherence properties of *R*, but also on modeling inaccuracies due to misalignments, calibration, diffraction, sensor efficiency and bit depth, and other practical challenges. These model inaccuracies can often be incorporated in the noise term for the theoretical analysis above, and typically play a significant role in determining overall system performance.

## 3. CS Imaging Systems and Practical Challenges

To date there have been several CS imaging devices built and tested in the laboratory. In general, the central challenge addressed by these devices is to find an architecture which effectively balances a. physical considerations such as size and cost, b. reconstruction accuracy, and c. reconstruction speed. In this section we review several recently proposed architectures and outline some of the challenges associated with non-negativity and photon noise.

### 3.1. Imaging Systems

Developing practical optical systems to exploit CS theory is a significant challenge being explored by investigators in the signal processing, optics, optimization, astronomy, and coding theory communities. In addition to implicitly placing hard constraints on the nature of the measurements which can be collected, such as non-negativity of both the projection vectors and the measurements, practical CS imaging systems must also be robust and reasonably sized. Neifeld and Ke^{32} describe three general optical architectures for compressive imaging: 1. sequential, where measurements are taken one at a time, 2. parallel, where multiple measurements are taken simultaneously using a fixed mask, and 3. photon-sharing, where beam-splitters and micromirror arrays are used to collect measurements. Here, we describe optical hardware based on these architectures that has recently been considered in the literature.

#### 3.1.1. Rice single-pixel camera

Perhaps the most well-known example of a CS imager is the Rice single-pixel camera developed by Duarte et al.^{33, 34} and the more recent single-pixel microscope.^{35} This architecture uses only a single detector element to image a scene. A digital micromirror array is used to represent a pseudorandom binary array, and the scene of interest is then projected onto that array before the aggregate intensity of the projection is measured with a single detector. Since the individual orientations of the mirrors in the micromirror array can be altered very rapidly, a series of different pseudorandom projections can be measured successively in relatively little time. The original image is then reconstructed from the resulting observations using CS reconstruction techniques such as those described in Sec. 4. One of the chief benefits of this architecture is that any binary projection matrix can readily be implemented, so that existing CS theory can be directly applied to the measurements. The drawback is that the camera must remain focused on the object of interest until enough samples have been collected for reconstruction, and the time required may be prohibitive in some applications. Although many projections can be collected sequentially at lower exposure, doing so increases the amount of noise per measurement, thus diminishing the architecture's potential for video imaging applications.

#### 3.1.2. Coded aperture imagers

Marcia and Willett,^{36} Marcia et al.,^{37} Romberg,^{38} and Stern and Javidi^{39} propose practical implementations of CS ideas using coded apertures, demonstrating that if the coded apertures are designed using a pseudorandom construction, then the resulting observation model satisfies the RIP. Furthermore, the resulting sensing matrix *R* has a Toeplitz structure that allows for very fast computation within reconstruction algorithms, providing a significant speedup over random matrix constructions.^{40} The “random lens imaging” optical system^{41} is another parallel architecture well suited to practical compressive imaging, since it provides a snapshot image (i.e., all *M* measurements are collected simultaneously) and does not require complex, and potentially large, imaging apparatuses.

#### 3.1.3. CMOS CS imagers

Both Robucci et al.^{42} and Majidzadeh et al.^{43} have proposed performing an analog, random convolution step in complementary metal-oxide-semiconductor (CMOS) electronics. A clear advantage of this architecture is that the additional optics required for spatial light modulation are removed in favor of additional circuitry. In general, this seems to be a wise trade-off, considering the immediate reduction in imager size. The device described by Robucci et al.^{42} also leverages the ability of CMOS electronics to perform fast, block-wise inner products between the incoming data and a pre-defined random sequence. In the cited work the authors used a noiselet basis with binary {1, −1} coefficients for the projection operation; however, the architecture is extremely flexible and could admit many other choices (e.g., the discrete cosine transform). The CMOS implementation is also likely to be low cost relative to other approaches requiring expensive optical components.

#### 3.1.4. Spectral imagers

Tradeoffs between spectral and spatial resolution limit the performance of modern spectral imagers, especially in photon-limited settings where a small number of photons must be apportioned among the voxels in the data cube, resulting in a low signal-to-noise ratio (SNR) per voxel. Gehm et al.^{44} and Wagadarikar et al.^{45} proposed innovative, real-time spectral imagers in which each pixel measurement is the coded projection of the spectrum at the corresponding spatial location in the data cube. This was first implemented using two dispersive elements separated by binary-coded masks; later, simpler designs omitted one dispersive element. In related work, objects are illuminated by light sources with tunable spectra using spatial light modulators to facilitate compressive spectral image acquisition.^{46} Finally, we mention more recent examples of compressive spectral imagers, such as compressive structured light codes, where each camera pixel measures light from points along the line of sight within a volume density,^{47} and cameras that use dispersers for imaging piecewise “macropixel” objects (e.g., biochip microarrays in biochemistry).^{48}

#### 3.1.5. Application-specific architectures

Compressive imaging can afford other advantages besides simply reducing the number of pixels. Shankar et al. used CS to develop an infrared camera that significantly reduced the thickness of the optics (by an order of magnitude).^{49} Coskun et al.^{50} also used CS principles to eliminate optical components (a lens) in a fluorescent imaging application involving naturally sparse signals. Other recent works include the application of CS theory to radar imaging^{51} and to the recovery of volumetric densities associated with translucent media (e.g., smoke, clouds).^{52} Still other CS optical systems have been proposed for a variety of applications, including DNA microarrays,^{53, 54} magnetic resonance imaging (MRI),^{55} ground penetrating radar,^{56} confocal microscopy,^{57} and astronomical imaging.^{58}

### 3.2. Non-negativity and Photon Noise

The theory in Sec. 2 described pseudorandom sensing matrices that satisfy the RIP and hence lead to theoretical guarantees on reconstruction accuracy in the presence of Gaussian or bounded noise. However, these sensing matrices are zero mean, so that approximately half of their elements are negative. Such a system is impossible to construct with linear optical elements. In addition, Gaussian or bounded noise models are not appropriate for all optical systems. Finally, we have the (typically unmodeled) physical constraint that the total light intensity incident upon our detector cannot exceed the total light intensity entering our aperture, i.e., ‖*Rf*‖_{1} ⩽ ‖*f*‖_{1}. These practical considerations lead to an active area of ongoing research. Several of the physical architectures described above were designed based on zero-mean sensing models and then subjected to a mean shift to make every element of the sensing matrix non-negative. In high SNR settings, reconstruction algorithms can compensate for this offset, rendering it negligible. This adjustment is critical to the success of many sparse reconstruction algorithms discussed in the literature, which perform best when *R*^{T}*R* ≈ *I*. In particular, assume we measure *x*_{p} = *R*_{p}*f*^{⋆} + *n*, where *R*_{p} is defined as *R* − μ_{R} for a zero-mean CS matrix *R* and μ_{R} ≜ (min_{i, j}*R*_{i, j})**1**_{M × N}; every element of *R*_{p} is non-negative by definition. Note that *x*_{p} = *Rf*^{⋆} − μ_{R}*f*^{⋆} + *n*. Since −μ_{R}*f*^{⋆} is a constant vector proportional to the total scene intensity, we can easily estimate *z* ≜ −μ_{R}*f*^{⋆} from the data and apply sparse reconstruction algorithms to x̂ ≜ *x*_{p} − ẑ ≈ *Rf*^{⋆} + *n*.^{37}
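A sketch of this mean-shift correction (noise-free and with hypothetical dimensions for clarity): shift a zero-mean *R* to a physically realizable non-negative matrix, then estimate the constant offset from the data mean, since the entries of *Rf*^{⋆} are approximately zero-mean:

```python
import numpy as np

rng = np.random.default_rng(4)
N, M = 256, 128
f_star = np.abs(rng.standard_normal(N))       # non-negative scene intensities
R = rng.standard_normal((M, N)) / np.sqrt(M)  # zero-mean CS matrix

shift = -R.min()                              # -min(R) > 0
R_p = R + shift                               # shifted matrix: non-negative
x_p = R_p @ f_star                            # physically realizable measurements

z_hat = x_p.mean()                            # estimate of shift * sum(f*)
x_hat = x_p - z_hat                           # ~ R f*
rel_err = np.linalg.norm(x_hat - R @ f_star) / np.linalg.norm(R @ f_star)
print(rel_err)  # small: x_hat approximates the zero-mean measurements
```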

In low SNR photon-limited settings, however, the combination of non-negative sensing matrices and light intensity preservation presents a significant challenge to compressive optical systems. In previous work, we evaluated CS approaches for generating high resolution images and video from photon-limited data.^{59, 60} In particular, we showed how a feasible positivity- and flux-preserving sensing matrix can be constructed, and we analyzed the performance of a CS reconstruction approach for Poisson data that minimizes an objective function consisting of a negative Poisson log-likelihood term and a penalty term which measures image sparsity. We showed that for a fixed image intensity, the error bound actually grows with the number of measurements or sensors. This surprising fact can be understood intuitively by noting that dense positive sensing matrices will yield measurements proportional to the average intensity of the scene plus small fluctuations about that average. Accurate measurement of these fluctuations is critical to CS reconstruction, but in photon-limited settings the noise variance is proportional to the mean background intensity and overwhelms the desired signal.
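The intuition behind this result can be reproduced in a few lines (illustrative numbers only; this is not the analysis of Refs. 59 and 60): with a dense non-negative sensing matrix, the informative fluctuations of the mean photon counts across measurements are smaller than the Poisson noise level, which scales as the square root of the mean intensity.

```python
import numpy as np

rng = np.random.default_rng(5)
N, M = 256, 128
f_star = np.zeros(N)
f_star[rng.choice(N, 5, replace=False)] = 200.0   # sparse scene, fixed total flux
R_p = rng.uniform(0.0, 1.0, (M, N)) / N           # dense, non-negative matrix

lam = R_p @ f_star                                 # mean photon count per measurement
x = rng.poisson(lam)                               # photon-limited observations

signal_fluct = lam.std()                           # informative fluctuation (signal)
noise_std = np.sqrt(lam.mean())                    # Poisson noise scale
print(signal_fluct, noise_std)                     # noise exceeds the fluctuations
```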

### 3.3. Dynamic Range

An important consideration in practical implementations of CS hardware architectures is the quantization of the measurements, i.e., encoding the values of *x* in Eq. 1 as finite-length bit strings. The representation of real-valued measurements as bit strings introduces error in addition to the noise discussed above. Moreover, if the dynamic range of the sensor is limited, very high and low intensity values outside this range will be clipped to the maximum and minimum values of the quantizer. These measurement inaccuracies can be mitigated by incorporating the quantization distortion within the observation model^{61, 62} and by either judiciously rejecting “saturated” measurements or factoring them in as inequality constraints within the reconstruction method.^{63}
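A toy uniform quantizer with saturation illustrates the two error sources (the bit depth and range here are hypothetical): values inside the range incur at most half a quantization step of error, while values outside it are clipped to the extreme levels.

```python
import numpy as np

def quantize(x, bits, lo, hi):
    """Uniform quantizer with saturation: inputs outside [lo, hi] are clipped."""
    levels = 2 ** bits
    step = (hi - lo) / (levels - 1)
    xc = np.clip(x, lo, hi)               # saturation at the dynamic-range limits
    return lo + np.round((xc - lo) / step) * step

rng = np.random.default_rng(6)
x = rng.standard_normal(1000)
xq = quantize(x, bits=8, lo=-2.0, hi=2.0)

step = 4.0 / 255.0
in_range = np.abs(x) <= 2.0
max_err_in = np.max(np.abs(x - xq)[in_range])    # bounded by step / 2
max_err_out = np.max(np.abs(x - xq)[~in_range])  # saturation error, unbounded
print(max_err_in, max_err_out)
```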

## 4. CS Reconstruction Methods

The ℓ_{2}-ℓ_{1} CS problem in Eq. 3 can be solved in a variety of ways. However, many off-the-shelf optimization software packages are unsuitable because the size of imaging problems is generally prohibitively large. For instance, second-order methods require solving an *N* × *N* linear system at each iteration of the underlying Newton's method. In our setting, *N* corresponds to the number of pixels; thus, reconstructing a 1024 × 1024 pixel image requires solving a 1024^{2} × 1024^{2} linear system at each iteration, which is computationally expensive and memory intensive. In addition, the ℓ_{1} term in the objective function in Eq. 3 is not differentiable, so the ℓ_{2}-ℓ_{1} CS reconstruction problem must be reformulated before gradient-based methods can be applied. Finally, the ℓ_{1} regularization common in CS can be implemented efficiently using simple thresholding schemes within the optimization algorithms; computational shortcuts like this are not exploited in most off-the-shelf software. Current sparse reconstruction methods are designed with these computational considerations in mind.

For example, gradient projection methods^{64, 65} introduce additional variables and recast Eq. 3 as a constrained optimization problem with a differentiable objective function. Gradient descent directions, which are generally easy to compute, are used at each iteration and are then projected onto the constraint set so that each step is feasible. The projection involves only simple thresholding and can be done very quickly, which leads to fast computation at each iteration. Alternatively, methods called iterative shrinkage/thresholding algorithms^{66, 67, 68, 69} map the objective function onto a sequence of simpler optimization problems which can be solved efficiently by shrinking or thresholding small values in the current estimate of θ. Another family of methods based on matching pursuits (MP)^{70, 71, 72} starts with $\widehat{\theta} = 0$ and greedily chooses elements of $\widehat{\theta}$ to have nonzero magnitude by iteratively processing residual errors between *y* and $A\widehat{\theta}$. MP approaches are well-suited to settings with little or no noise; in contrast, gradient-based methods are more robust on noisier problems. Gradient-based methods are also generally faster than MP methods, as *A* is not formed explicitly and is used only for computing matrix-vector products. Finally, while these methods do not have the fast quadratic convergence properties that are theoretically possible with some black-box optimization methods, they scale better with the size of the problem, which is perhaps the most important issue in practical CS imaging.
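To make the shrinkage/thresholding idea concrete, the following minimal sketch implements a basic ISTA-style iteration for the ℓ_{2}-ℓ_{1} objective. It uses an explicit matrix *A* for clarity, whereas practical implementations would supply matrix-vector product routines instead; the function names are ours, not those of any cited package.

```python
import numpy as np

def soft_threshold(v, t):
    """Soft thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, y, tau, n_iter=500):
    """Minimize 0.5 * ||y - A @ theta||_2^2 + tau * ||theta||_1 by
    iterative shrinkage/thresholding: a gradient step on the quadratic
    term followed by soft thresholding of the estimate."""
    theta = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz const. of gradient
    for _ in range(n_iter):
        grad = A.T @ (A @ theta - y)         # gradient of the quadratic term
        theta = soft_threshold(theta - step * grad, step * tau)
    return theta
```

Note that the only interaction with *A* is through the products `A @ theta` and `A.T @ (...)`, which is what makes matrix-free implementations possible.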

There is a wide variety of algorithms available in the literature and online for solving Eq. 3 and its variants. Many of these algorithms aim to solve the same convex optimization problem, and hence will all yield the same reconstruction result. However, because of specific implementation aspects and design decisions, some algorithms require fewer iterations or less computation time per iteration depending on the problem structure. For instance, some algorithms will exhibit faster convergence with wavelet sparsity regularization, while others may converge more quickly with a total variation regularizer. Similarly, if the *A* matrix has a helpful structure (e.g., Toeplitz), then multiplications by *A* and *A*^{T} can be computed very quickly and algorithms which exploit this structure will converge most quickly, while if *A* is a pseudorandom matrix then algorithms which attempt to limit the number of multiplications by *A* or *A*^{T} may perform better.
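For instance, when *A* corresponds to (circular) convolution with a known point spread function, the products by *A* and *A*^{T} can be applied via the FFT in O(*N* log *N*) without ever forming a dense matrix. A minimal sketch, assuming periodic boundary conditions:

```python
import numpy as np

def make_conv_op(psf):
    """Return fast matvec routines for the circular-convolution operator
    A f = psf * f and its adjoint, computed via the FFT instead of an
    explicit, dense N x N matrix."""
    H = np.fft.fft2(psf)
    apply_A = lambda f: np.real(np.fft.ifft2(np.fft.fft2(f) * H))
    apply_At = lambda g: np.real(np.fft.ifft2(np.fft.fft2(g) * np.conj(H)))
    return apply_A, apply_At
```

The adjoint is convolution with the reversed kernel, which for a real point spread function corresponds to multiplying by the conjugate transfer function in the Fourier domain.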

## 4.1.

### Alternative Sparsity Measures

While most of the early work in CS theory and methods focused on measuring and reconstructing signals which are sparse in some basis, current thinking in the community focuses more broadly on high-dimensional data with underlying low-dimensional structure. Examples include conventional sparsity in an orthonormal basis,^{11} small total variation,^{73, 74, 75, 76, 77} a low-dimensional submanifold of possible scenes,^{78, 79, 80, 81, 82} a union of low-rank subspaces,^{83, 84} or a low-rank matrix in which each column is a vector representation of a small image patch, video frame, or small localized collection of pixels.^{85, 86} Each of these models has been used successfully to capture structure in “natural” images and hence facilitate accurate image formation from compressive measurements (although some models do not admit reconstruction methods based on convex optimization, and hence may be computationally complex or yield only locally optimal reconstructions). A variety of penalties based on similar intuition have been developed from a Bayesian perspective. While these methods can require significant burn-in and computation time, they allow for complex, nonparametric models of structure and sparsity within images, producing compelling empirical results in denoising, interpolation, and compressive sampling that are on par with, if not better than, established methods.^{87}

## 4.2.

### Non-negativity and Photon Noise

In optical imaging, we often estimate light intensity, which is *a priori* non-negative. Thus, it is necessary that the reconstruction $\widehat{f} = W\widehat{\theta}$ be non-negative, which involves adding constraints to the CS optimization problem in Eq. 3, i.e.,

$$
\begin{aligned}
\widehat{\theta} &= \operatorname*{arg\,min}_{\theta \in \mathbb{R}^N} \;\; \frac{1}{2}\|x - RW\theta\|_2^2 + \tau\,\mathrm{pen}(\theta) \quad \text{subject to } W\theta \ge 0,\\
\widehat{f} &= W\widehat{\theta}.
\end{aligned}
\tag{5}
$$^{88}

In low-light settings, where measurements are inherently noisy due to low count levels, the inhomogeneous Poisson process model^{89} has been used in place of the $\|x - RW\theta\|_2^2$ term in Eq. 5. Adapting the reconstruction methods described previously can be very challenging in these settings: in addition to enforcing non-negativity, the negative Poisson log likelihood used in the formulation of the objective function often requires the application of relatively sophisticated optimization theory.^{90, 91, 92, 93, 94, 95, 96}
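A minimal sketch may help make Eq. 5 concrete. The projected-gradient loop below assumes *W* = *I* (the canonical basis), so the constraint *W*θ ≥ 0 reduces to elementwise non-negativity and the ℓ_{1} penalty becomes a linear term on the feasible set; it is an illustration only, not the specialized Poisson methods of the cited references.

```python
import numpy as np

def nonneg_sparse_recon(R, x, tau, n_iter=300):
    """Projected gradient for Eq. (5) in the special case W = I:
    min_theta 0.5 * ||x - R @ theta||_2^2 + tau * sum(theta)
    subject to theta >= 0 (the l1 norm is linear on the feasible set)."""
    theta = np.zeros(R.shape[1])
    step = 1.0 / np.linalg.norm(R, 2) ** 2        # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = R.T @ (R @ theta - x) + tau        # gradient of smooth objective
        theta = np.maximum(theta - step * grad, 0.0)  # project onto theta >= 0
    return theta
```

For a general orthonormal *W*, the constraint couples the coefficients and the projection is no longer a simple clipping operation, which is part of what makes the constrained problem harder.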

## 4.3.

### Video

Compressive image reconstruction naturally extends to sparse video recovery since videos can be viewed as sequences of correlated images. Video compression techniques can be incorporated into reconstruction methods to improve speed and accuracy. For example, motion compensation and estimation can be used to predict changes within a scene to achieve a sparse representation.^{97} Here, we describe a less computationally intensive approach based on exploiting interframe differences.^{98}

Let $f_1^\star, f_2^\star, \ldots$ be a sequence of images comprising a video, and let *W* be an orthonormal basis in which each $f_t^\star$ is sparse, i.e., $f_t^\star = W\theta_t^\star$, where $\theta_t^\star$ is mostly zeros. To recover the video sequence $\{f_t^\star\}$, we need to solve for each $f_t^\star$. Simply applying Eq. 5 at each time frame works well, but this approach can be improved upon by solving for multiple frames simultaneously. In particular, rather than treating each frame separately, we can exploit interframe correlation and solve for $\theta_t^\star$ and the difference $\Delta\theta_t^\star \triangleq \theta_{t+1}^\star - \theta_t^\star$ instead of $\theta_t^\star$ and $\theta_{t+1}^\star$. This results in the coupled optimization problem:

$$
\begin{aligned}
\begin{bmatrix} \widehat{\theta}_t \\ \Delta\widehat{\theta}_t \end{bmatrix}
&= \operatorname*{arg\,min}_{\theta_t,\, \Delta\theta_t} \;
\frac{1}{2} \left\| \begin{bmatrix} x_t \\ x_{t+1} \end{bmatrix}
- \begin{bmatrix} R_t & 0 \\ 0 & R_{t+1} \end{bmatrix}
\begin{bmatrix} W & 0 \\ W & W \end{bmatrix}
\begin{bmatrix} \theta_t \\ \Delta\theta_t \end{bmatrix} \right\|_2^2
+ \tau_1 \|\theta_t\|_1 + \tau_2 \|\Delta\theta_t\|_1 \\
&\qquad \text{subject to } W\theta_t \ge 0, \;\; W(\theta_t + \Delta\theta_t) \ge 0,
\end{aligned}
\tag{6}
$$

where $\tau_1, \tau_2 > 0$ and *R*_{t} and *R*_{t + 1} are the observation matrices at times *t* and *t* + 1, respectively. When $\theta_{t+1}^\star \approx \theta_t^\star$, the difference $\Delta\theta_t^\star = \theta_{t+1}^\star - \theta_t^\star$ is very sparse compared to $\theta_{t+1}^\star$, which makes it even better suited to the sparsity-inducing ℓ_{1} penalty terms in Eq. 6. We use different regularization parameters $\tau_1$ and $\tau_2$ to promote greater sparsity in $\Delta\theta_t$. This approach can easily be extended to solve for many frames simultaneously.^{98, 99, 100}
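For small, explicit matrices, the block operator of Eq. 6 can be assembled directly to check its structure; practical solvers would instead apply the blocks matrix-free. A sketch (function and variable names are ours):

```python
import numpy as np

def stacked_operator(R_t, R_t1, W):
    """Assemble the sensing operator of Eq. (6), which maps
    [theta_t; dtheta_t] to the stacked measurements [x_t; x_{t+1}]:
    blockdiag(R_t, R_{t+1}) @ [[W, 0], [W, W]]."""
    m_t, N = R_t.shape
    m_t1 = R_t1.shape[0]
    R_blk = np.block([[R_t, np.zeros((m_t, N))],
                      [np.zeros((m_t1, N)), R_t1]])
    W_blk = np.block([[W, np.zeros_like(W)],
                      [W, W]])
    return R_blk @ W_blk
```

The second block row encodes $x_{t+1} \approx R_{t+1} W (\theta_t + \Delta\theta_t)$, which is how the two frames become coupled through the shared variable $\theta_t$.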

## 4.4.

### Connection to Super-Resolution

The above compressive video reconstruction method has several similarities with super-resolution reconstruction. With super-resolution, observations are typically not compressive in the CS sense, but rather downsampled observations of a high resolution scene. Reconstruction is performed by: a. estimating any shift or dithering between successive frames, b. estimating the motion of any dynamic scene elements, c. using these motion estimates to construct a sensing matrix *R*, and d. solving the resulting inverse problem, often with a sparsity-promoting regularization term.

There are two key differences between super-resolution image/video reconstruction and compressive video reconstruction. First, CS recovery guarantees place specific demands on the size and structure of the sensing matrix; these requirements may not be satisfied in many super-resolution contexts, making it difficult to predict performance or assess optimality. Second, super-resolution reconstruction as described above explicitly assumes that the observed frames consist of a small number of moving elements superimposed upon slightly dithered versions of the same background; thus estimating motion and dithering is essential to accurate reconstruction. CS video reconstruction, however, does not require dithering or motion modeling as long as each frame is sufficiently sparse. Good motion models can improve reconstruction performance, as shown in Sec. 5.2, by increasing the amount of sparsity in the variable (i.e., sequence of frames) to be estimated, but even if successive frames are very different, accurate reconstruction is still feasible. This is not the case with conventional super-resolution estimation.

## 5.

## Infrared Camera Examples

IR cameras are a particularly promising target for compressive sampling, owing in large part to the manufacturing costs associated with the FPA. Demands for large-format, high resolution FPAs have historically meant ever smaller pixels grown on ever larger substrates. For typical materials used in infrared imagers (e.g., HgCdTe, InSb), the manufacturing process is extremely challenging and requires (among other things) controlling material geometry at the pixel level. Recent developments in FPA technology have resulted in 20 μm pitch FPAs across the infrared spectrum. Yuan et al. (Teledyne Judson Technologies) recently demonstrated a 1280 × 1024, 20 μm pitch InGaAs FPA for the shortwave infrared (SWIR) waveband, 1.0 to 1.7 μm.^{8} In the midwave infrared, 3 to 5 μm, Nichols et al.^{9} used a 2048 × 2048, 20 μm pitch InSb FPA (Cincinnati Electronics), while Carmody et al. report on a 640 × 480, 20 μm pitch HgCdTe FPA.^{10} Despite these advances, substantial research funds continue to be spent on developing even larger format arrays while attempting to decrease the pixel pitch to ⩽15 μm. The ability to improve the resolution of existing FPA technology without physically reducing pixel size therefore has potentially powerful implications for the IR imaging community.

In this section, we explore taking existing focal plane array technology and using spatial light modulation (SLM) to produce what is effectively a higher resolution camera. The specific architecture used will depend on whether one is imaging coherent or incoherent light and on the physical quantity being measured at the detector. For incoherent light, typical imaging systems are linear in intensity, and the focal plane array directly measures this intensity. Thus, one way to design a CS imager is to use a fixed, coded aperture as part of the camera architecture.^{36, 38, 42, 101} The imager shown schematically at the top of Fig. 3 illustrates one such approach, whereby the aperture modulates the light in the image's Fourier plane. Mathematically, this architecture convolves the desired image intensity *f* with the magnitude squared of the point spread function, $|h|^2 = |\mathcal{F}^{-1}(H)|^2$, where *H* is the mask pattern and $\mathcal{F}$ denotes the Fourier transform. Low-resolution observations are then obtained by downsampling the impinging light intensity, either by integration downsampling at the detector or by subsampling using SLMs. Each downsampling approach has its advantages and disadvantages: integration downsampling allows for more light and hence higher SNR observations, but results in loss of high frequency components and adversely impacts the incoherence of *A*, whereas subsampling yields much stronger theoretical properties but allows for less light. Thus, depending on the light conditions, one approach may be more suitable than the other; for example, in high photon count regimes, subsampling will likely yield a more accurate reconstruction. Finally, if the mask is designed so that the resulting observation model satisfies the RIP and the image is sparse in some basis (e.g., wavelet), then the underdetermined problem can be solved for the high-resolution image *f*.^{36} Using this approach, the spatial resolution of the modulation becomes the resolution of the recovered image. That is to say, if we desire a 2 × improvement in resolution, our coded aperture would possess 1/2 the pixel width of the FPA. Heuristically, one can think of the convolution as a mechanism for spreading localized (spatial) image information over all pixels at the detector, and of the fixed, known mask pattern as the unique “key” that allows the reconstruction algorithm to extract the true image despite the apparent ambiguity introduced by the detector downsampling. This is precisely what the RIP means physically for the system architecture: it (a) modulates the image in a manner that allows all of the needed information to be encoded in the compressed samples, yet (b) ensures that a unique solution for the true image can be found.
Although all the information needed for reconstruction is present in the compressed samples, the light blocked by the aperture decreases the signal-to-noise ratio. One possible solution is to increase the camera integration time, though this can result in motion blur artifacts.
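The forward model just described can be sketched numerically as follows. This is a simplified, hypothetical simulation (names are ours) assuming periodic boundary conditions and integration downsampling by pixel-block summation.

```python
import numpy as np

def coded_aperture_measurements(f, H, d=2):
    """Simulate the incoherent coded-aperture architecture: the scene
    intensity f is convolved with |h|^2 = |F^{-1}(H)|^2, where H is the
    Fourier-plane mask pattern, and the detector integration-downsamples
    by summing d x d blocks of the blurred intensity."""
    psf_sq = np.abs(np.fft.ifft2(H)) ** 2                        # |h|^2 >= 0
    blurred = np.real(np.fft.ifft2(np.fft.fft2(f) * np.fft.fft2(psf_sq)))
    n1, n2 = f.shape
    return blurred.reshape(n1 // d, d, n2 // d, d).sum(axis=(1, 3))
```

Because the block summation only regroups the blurred intensity, no photons are discarded by the downsampling itself; the light loss in this architecture comes from the mask attenuation, as noted above.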

Another possibility is to use the architecture shown at the bottom of Fig. 3, as proposed by Rivenson et al.^{102} (see also Romberg^{38}). This architecture is more appropriate for imaging coherent light. In this approach, one makes use of Fourier optics to convolve the electromagnetic field associated with the image with a random phase pattern. A second phase modulator, located at the image plane, is necessary in order for the resulting sensing matrix to satisfy the RIP.^{102} Note that the detector in this architecture must be capable of measuring both the real and imaginary parts of the complex image (as opposed to image intensity). The potential advantage of this architecture is that it modulates phase only; thus, light is not blocked as it is in the coded aperture architecture. However, this camera architecture requires two SLMs, each with its own inherent losses. Furthermore, detectors that can measure the complex image field are far less common than those that measure image intensity.

Regardless of the specific architecture used, one could either treat the coded apertures as fixed patterns that do not change in time or, given the state of SLM technology, dynamically modify the pattern for video data. This could potentially benefit (i.e., help guarantee the RIP for) the multiframe reconstruction methods described in Sec. 4. Finally, we should point out that in a typical camera the image is not located exactly one focal length away. Rather, the system possesses a finite aperture *D*, designed in conjunction with the focal length *FL* to collect imagery at some prescribed range from the camera. In this more general case there will still be a lens Fourier plane at which we can apply our modulation; however, the location of the mask will need to be specified to within δ ∝ λ(*FL*/*D*)^{2} m. Diffraction effects may also need to be considered in developing the signal model used in solving the ℓ_{2}-ℓ_{1} reconstruction problem; potential solutions to this problem for conventional coded apertures have been described by Stayman et al.^{103}

Central to each of the above-described architectures is the availability of fixed apertures, digital micromirror devices, or SLMs capable of spatially modulating the light at the desired spatial resolution. A fixed coded aperture is clearly the simplest design and can easily be manufactured with micrometer resolution using chrome on quartz. SLMs allow codes to be changed over time, but current SLM technology restricts the masks to lower resolution than chrome-on-quartz solutions.^{104} For example, Boulder Nonlinear Systems manufactures a 512 × 512, 15 μm pitch device that can operate in the visible, short-wave IR, midwave IR, or long-wave IR wavelengths and can be used to spatially modulate the amplitude, phase, or both.^{105} Sections 5.1 and 5.2 illustrate how coded-aperture IR imaging can be used to enhance the resolution of existing IR cameras.

## 5.1.

### Midwave Camera Example

As an illustrative example, we consider bar target imagery collected from a midwave infrared camera. The camera, shown in Fig. 3, was designed for maritime search and tracking applications. Optically, the camera is a catadioptric, f/2.5 aperture system that provides a 360° horizontal and a −10° to +30° elevation field of view. The core of the imaging system is a 2048 × 2048 pixel, 15 μm pitch, cryogenically cooled indium antimonide focal plane array. The sensor is designed to operate in the 3.5 to 5 μm spectral band with a CO_{2} notch filter near 4.2 μm. Additional details for the camera are described by Nichols et al.^{9}

In order to evaluate the compressive imaging reconstruction algorithm, we collected bar target imagery generated using a CI Systems extended blackbody source held at fixed temperatures ranging from 5 to 45 C. Four bar patterns of varying spatial frequencies occupied a 128 × 128 pixel area. Figure 4 shows the image acquired for a bar target temperature of *T* = 45 C. This image was then downsampled by a factor of 2 in order to simulate observations from a lower resolution, 30 μm focal plane array. The downsampled 64 × 64 image is shown in Fig. 4. Low resolution, compressed imagery was also generated by numerically applying a fixed, coded aperture with random, Gaussian distributed entries in the lens Fourier plane prior to downsampling [e.g., to simulate the architecture of Fig. 3]. That is to say, the degree of transparency for each of the elements in the mask *H* is chosen as a random draw from a truncated Gaussian distribution spanning the range 0 (no light blocked) to 1 (all light blocked). Each sample collected by a 30 μm pixel is therefore modeled as a summation over four 15 μm blocks of the image intensity convolved with the magnitude squared of the point spread function $h = \mathcal{F}^{-1}(H)$. This process is shown at the single pixel level in Fig. 5.

There are certainly multiple images that could give rise to the same compressed sample value. What is remarkable about CS theory is that it tells us that if the image is sparse in the chosen basis and if we modulate the image in accordance with the RIP, then this ambiguity is effectively removed and the true image can be recovered. The practical advantage of this system, as was pointed out in Sec. 3, is that the information needed to recover the high resolution image is encoded in a single low-resolution image; thus there is no need for multiple shots for the recovery to work. The cost, however, as was already mentioned, is a decrease in signal-to-noise ratio.

The compressed measurements are shown in Fig. 4 and clearly illustrate the influence of the randomly coded aperture on the image. Finally, to recover the image, we solved Eq. 3 using the gradient projection for sparse reconstruction (GPSR) algorithm,^{64} a fast and accurate gradient-based optimization method. The basis used in the recovery was the length-12 Coiflet. We have no reason to suspect this is the optimal basis (in terms of promoting sparsity) for recovery; however, among the various wavelet bases tried by the authors, little difference in the results was observed. The image in Fig. 4 is the full resolution (128 × 128) reconstruction.

Qualitatively, it can be seen that the reconstruction algorithm correctly captures some of the details lost in the downsampling. This comparison can be made quantitative by considering a “slice” through the 0.36 cycle/mrad resolution bar target (lower right set of bars). Figure 4 compares the measured number of photons in the original 45 C bar target image to that achieved by interpolating the downsampled image and by the recovered image. Clearly, some of the contrast is lost in the downsampled imagery, whereas the compressively sampled, reconstructed image suffers little loss. Figure 4 shows how the estimated image contrast behaves as a function of bar target temperature. Contrast was determined for the bar target pixels as $|\max(f) - \min(f)|/\bar{f}$, where $\bar{f}$ is the mean number of photons collected from the background and max (*f*), min (*f*) are, respectively, the largest and smallest numbers of photons collected across the bar target.
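For reference, this contrast metric amounts to the following simple computation (function and argument names are ours):

```python
import numpy as np

def bar_target_contrast(bar_pixels, background_pixels):
    """Contrast metric used for the bar targets:
    |max(f) - min(f)| / fbar, where max(f) and min(f) are taken over the
    bar target pixels and fbar is the mean photon count over the
    background pixels."""
    f = np.asarray(bar_pixels, dtype=float)
    fbar = np.mean(np.asarray(background_pixels, dtype=float))
    return abs(f.max() - f.min()) / fbar
```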

Regardless of sampling strategy, the contrast is greater near the extremes (*T* = 5 C, *T* = 45 C) and is lowest near the ambient (room) temperature of *T* = 27 C as expected. However, for areas of high contrast we see a factor of 2 improvement by using the compressive sampling strategy over the more conventional bi-cubic interpolation approach. We have also tried linear interpolation of the low-resolution imagery and found no discernible difference in the results. Based on these results we see that a CS camera with half the resolution of a standard camera would be capable of producing the same resolution imagery as the standard camera without any significant reduction in image contrast.

## 5.2.

### SWIR Video Example

We now consider the application of compressive coded apertures to video collected by a short-wave IR (0.9 to 1.7 μm) camera. The camera is based on a 1024 × 1280 InGaAs (indium gallium arsenide) focal plane array with 20 μm pixel pitch. Optically, the camera was built around a fixed, *f*/2 aperture and provides a 6° field of view along the diagonal with a focus range of 50 m → ∞. Imagery was output at the standard 30 Hz frame rate with a 14 bit dynamic range. The video used in the following example consists of 25 frames of 256 × 256 pixels cropped from the full imagery. All video data were collected at the Naval Research Laboratory, Chesapeake Bay Detachment, in November 2009. We simulate the performance of a low-resolution noncoded video system by simply downsampling the original video by a factor of 2 in each direction. We also simulate the performance of a low-resolution coded video system and perform two methods for reconstruction: the first solves for each frame individually, while the second solves for two frames simultaneously. For this experiment, we again used the GPSR algorithm. The maximum number of iterations for each method was chosen so that the aggregate time per frame for each method is approximately the same. The 25th frames for the original signal, the reconstruction using the single-frame noncoded method, and the reconstruction using the two-frame coded method are presented in Fig. 6. The root-mean squared error (RMSE) values for this frame for the single-frame noncoded and two-frame coded methods are 3.78% and 2.03%, respectively.
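An RMSE expressed in percent can be computed as in the sketch below; the normalization by the reference norm is one common convention, assumed here since the text does not spell out the exact definition used for the reported figures.

```python
import numpy as np

def rmse_percent(estimate, reference):
    """Relative root-mean squared error in percent:
    100 * ||estimate - reference||_2 / ||reference||_2.
    (One common convention; the normalization is an assumption.)"""
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return 100.0 * np.linalg.norm(estimate - reference) / np.linalg.norm(reference)
```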

## 6.

## Conclusions

This tutorial is aimed at introducing optical engineers to several of the theoretical breakthroughs and practical challenges associated with compressed sensing in optical systems. While many of the theoretical results are promising, in that a relatively small focal plane array can be used to collect high resolution imagery, translating this theory to practice requires careful attention to the tradeoffs between focal plane array size; optical component size, weight, and expense; admissibility of theory in practical systems; and choice of reconstruction method. Through proof-of-concept experiments with a bar target and with video in SWIR systems, we demonstrate how compressed sensing concepts can be used to improve contrast and resolution in practical optical imaging settings.

## Acknowledgments

The authors would like to thank Zachary Harmany for sharing his sparse reconstruction algorithms for compressive coded apertures. This work was supported by NSF CAREER Awards CCF-06-43947, DMS-08-11062, DARPA Grant No. HR0011-09-1-0036, NGA Award HM1582-10-1-0002, and AFRL Grant FA8650-07-D-1221.


## Biography

**Rebecca M. Willett** is an assistant professor in the Electrical and Computer Engineering Department at Duke University. She completed her PhD in Electrical and Computer Engineering at Rice University in 2005. Prof. Willett received the National Science Foundation CAREER Award in 2007, is a member of the DARPA Computer Science Study Group, and received an Air Force Office of Scientific Research Young Investigator Program award in 2010. Prof. Willett has also held visiting researcher positions at the Institute for Pure and Applied Mathematics at UCLA in 2004, the University of Wisconsin-Madison in 2003-2005, the French National Institute for Research in Computer Science and Control (INRIA) in 2003, and the Applied Science Research and Development Laboratory at GE Healthcare in 2002. Her research interests include network and imaging science with applications in medical imaging, wireless sensor networks, astronomy, and social networks. Additional information, including publications and software, is available online at http://www.ee.duke.edu/~willett/.

**Roummel F. Marcia** received a BA in Mathematics from Columbia University in 1995 and his PhD in Mathematics from the University of California, San Diego in 2002. He was a Computation and Informatics in Biology and Medicine Postdoctoral Fellow in the Biochemistry Department at the University of Wisconsin-Madison and a Research Scientist in the Electrical and Computer Engineering Department at Duke University. He is currently an Assistant Professor of Applied Mathematics at the University of California, Merced. His research interests include nonlinear optimization, numerical linear algebra, signal and image processing, and computational biology.

**Jonathan M. Nichols** received a BSc degree from the University of Delaware in 1997 and MSc and PhD degrees from Duke University in 1999 and 2002, respectively, all in Mechanical Engineering. He currently works for the Naval Research Laboratory in Washington, DC as a member of the Maritime Sensing Section in the Optical Sciences Division. His research interests include infrared imaging, signal processing, and inverse problems.