## 1. Introduction

Coded-aperture imaging was introduced by Dicke^{1} and Ables^{2} in the 1960s, and the field developed rapidly for astronomical x-ray and gamma-ray imaging. Coded-aperture imagers extend the pinhole camera concept, which requires no lenses, has unlimited depth of focus, and can image radiation of any wavelength. By placing a large number of pinholes in a common aperture plane, the light-gathering capability is greatly increased. Improved sensitivity comes at the price of having the focal plane record a multiplex of overlapping images, requiring algorithmic reconstruction to render a clear image. Thus, coded-aperture imaging is a subset of the general field of computational imaging.

Some limitations of a random pinhole distribution, primarily numerical instability due to zeros in the inverse-mask Fourier-domain transfer function, were mitigated by the development of coded apertures based on uniformly redundant arrays (URAs)^{3} and modified URAs (MURAs)^{4} of pinholes. By placing the pinholes in prime-number-based patterns determined by sampling theory, it was thought that near-delta-function system point-spread functions (PSFs) could be achieved (after deconvolving the distribution with appropriate filter functions), greatly improving image reconstruction.

Use of coded-aperture imaging for general terrestrial applications poses two challenges. First, strictly speaking, the information-theoretic advantages of URAs and MURAs apply only for radiation wavelengths short enough that diffraction is negligible, since a diffracted MURA is no longer a MURA. For this reason, coded-aperture imaging has been applied mainly to x-ray and gamma-ray astronomy. Second, MURAs have limited effectiveness for extended scenes in general, regardless of radiation wavelength, due to the fundamentally ill-posed nature of reconstructing an extended scene from a finite-sized coded-aperture image. Exploiting the full potential of coded-aperture imaging therefore requires addressing both diffraction and ill-posed reconstruction with very fast and efficient decoding methods. The work presented in this paper shows how to accomplish that goal.

### 1.1. Objective: Lensless Imaging of Extended Scenes with Natural Illumination

In the astronomical applications for which coded aperture imaging was originally developed, the scene typically contains a discrete bright object in a largely dark field. A key assumption behind MURA encoding is that “all [nonzero] object points contribute a complete cyclic version of the basic [aperture] pattern,”^{3} and the imaging geometry is designed to ensure that the complete aperture “shadows” are collected on the focal-plane array. However, terrestrial applications involve nonzero scene elements extending to and past the edge of the effective field-of-view: these scene elements will cast only partial aperture shadows on the focal plane. This effectively causes more object points to contribute to the image than there are image pixels, creating an inherently ill-posed problem for reconstructing the “true” scene.

To address the ill-posed imaging problem, we developed a new class of coded aperture masks which provide compelling advantages over MURAs. The new class of masks, which we call separable Doubly-Toeplitz apertures, provides images that are orders of magnitude faster to decode and are less affected by the ill-posed nature of the imaging of extended scenes—their transfer functions have far fewer zero eigenvalues than do those of MURA masks. Unlike MURAs, separable Doubly-Toeplitz apertures retain their properties when diffracted, making them much more robust for broadband imaging than MURAs.

This paper addresses the optics and mathematics of the new class of coded apertures.

The method has also been validated with real-world imaging of extended scenes in natural light. Such validations are absolutely necessary: much of the published work based on simulations alone presents overly optimistic predictions of coded-aperture performance. We therefore provide a demonstration of coded-aperture imaging in visible light acquired with off-the-shelf hardware. This work also uncovered several optical phenomena that are minor nuisances for lensed imaging but cause significant performance degradation for coded apertures, and that would not have been uncovered by modeling and simulation alone. A future publication will provide an in-depth exposition of the practical optical challenges of developing software-defined optics.

### 1.2. Former State-of-the-Art: Fourier Decoding and Modified Uniformly Redundant Array Masks

The coded-aperture mask is a flat plane parallel to an imaging array. In this configuration, the image $\mathbf{I}$ is simply a correlation between the object $\mathbf{O}$ and the mask $\mathbf{M}$:

## (1)

$$\mathbf{I}=\mathbf{M}\otimes \mathbf{O},$$

or in terms of matrix elements

## (2)

$${I}_{i,j}=\sum _{k,l}{M}_{k,l}{O}_{i+k,j+l}.$$

The goal of decoding is to use knowledge of $\mathbf{M}$ to recover the object scene $\mathbf{O}$ from the encoded image $\mathbf{I}$.

The prior state of the art in coded-aperture imaging was to select binary codes that can be inverted via forward correlation with slightly modified versions of themselves. Specifically, mask codes $\mathbf{M}$ were selected for which the mask is simply a collection of binary (0% or 100% transparent) elements and for which there exists an inverse matrix $\mathbf{G}$ such that

## (3)

$${\mathbf{O}}_{\text{Estimated}}=\mathbf{G}\otimes \mathbf{I}=\mathbf{G}\otimes \mathbf{M}\otimes \mathbf{O}\approx \mathbf{O},$$

where, for URA^{3} and MURA^{4} masks, $\mathbf{G}$ is a slightly modified version of the mask $\mathbf{M}$ itself.

URAs and MURAs satisfy Eq. (3) under periodic boundary conditions. In practice, diffraction is not always negligible. An example of an ideal mask is shown in Fig. 1, along with a diffracted mask image. In this case, the MURA was lithographically deposited as a pattern on a glass slide, so that it was truly binary, either passing or blocking light. However, the collected mask image shows effects of scattering and diffraction, compromising the self-inverse property. In addition, as is explained below, even ideal MURAs have limited utility for extended scenes.

Because diffracted MURAs do not obey Eq. (3), inversion of the transfer function is required. The most straightforward approach to inverting Eq. (1) is by Fourier transform. Our initial investigations using spatial-light-modulator (SLM)-defined masks showed noisy, low-contrast recovered images. We successively eliminated candidate causes such as diffraction, polarization, and nonuniform SLM-element throughput. Finally, we traced the performance limitations to the ill-posed mathematics inherent in imaging extended scenes. The larger the mask, the more ill-posed the inversion, thus requiring regularization.

Initially, we implemented regularized decoding by introducing a real, positive function ${c}^{2}(\mathbf{k})$ into the Fourier-transform-based decoding and computing an estimated image from

## (5)

$${O}_{\text{Recovered}}(\mathbf{k})\approx I(\mathbf{k}){M}^{*}(\mathbf{k})/[{|M(\mathbf{k})|}^{2}+{c}^{2}(\mathbf{k})].$$

This is a special case of the Tikhonov^{5} regularization method used in image deblurring^{6}: ${c}^{2}(\mathbf{k})$ plays the dual roles of regularizing the inversion and controlling the effects of noise. With this regularization, recognizable decodings of laboratory images were achieved with both lithographically produced and programmable-SLM-based masks.
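A minimal numerical sketch of the regularized Fourier decoding of Eq. (5), using an illustrative random binary mask, a cyclic (FFT-consistent) forward model, and a constant regularizer; none of these choices reflect our actual laboratory configuration:

```python
import numpy as np

def fourier_decode(image, mask, c2):
    """Tikhonov-regularized Fourier decode: O(k) ~ I(k) M*(k) / (|M(k)|^2 + c^2)."""
    I = np.fft.fft2(image)
    M = np.fft.fft2(mask, s=image.shape)
    return np.real(np.fft.ifft2(I * np.conj(M) / (np.abs(M) ** 2 + c2)))

rng = np.random.default_rng(0)
scene = rng.random((32, 32))                       # illustrative object
mask = rng.integers(0, 2, (32, 32)).astype(float)  # illustrative binary mask

# Cyclic encoding, matching the periodic assumption implicit in the FFT
encoded = np.real(np.fft.ifft2(np.fft.fft2(scene) * np.fft.fft2(mask)))
recovered = fourier_decode(encoded, mask, c2=1e-3)
```

Frequencies where $|M(\mathbf{k})|$ is small are suppressed rather than amplified, which is precisely the regularizing role of ${c}^{2}(\mathbf{k})$.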

## 2. New Approach to Ill-Posed Coded-Aperture Decoding: Theory

Even with regularized Fourier inversion, decoded images of extended scenes were usually unrecognizable. We traced this to the Fourier method itself, which has the limitation of implicitly assuming cyclic or periodic behavior of the object $\mathbf{O}$. While the cyclic assumption is acceptable for compact objects against dark backgrounds, it is inappropriate for extended scenes viewed through large aperture masks.

To relax the assumption of cyclic boundaries, we recast the decoding problem in the most general terms. The correlation of a filter $\mathbf{M}$ with an object scene $\mathbf{O}$ to yield an image $\mathbf{I}$ in Eq. (1) can also be expressed as a simple matrix multiplication:

## (6)

$$\mathbf{I}\mathbf{1}={\mathbf{M}}^{\prime \prime}\cdot \mathbf{O}\mathbf{1},\quad \text{so that}\ \ I{1}_{i}=\sum _{k}{{M}^{\prime \prime}}_{i,k}\,O{1}_{k},$$

where the 2-D image and object arrays are unwrapped into 1-D column vectors,

## (7)

$$I{1}_{{c}_{I}\times i+j}={I}_{i,j},\quad \text{and}\quad O{1}_{{c}_{O}\times i+j}={O}_{i,j},$$

and the first row of ${\mathbf{M}}^{\prime \prime}$ is

## (8)

$${{M}^{\prime \prime}}_{1,{c}_{O}\times i+j}=\begin{cases}{M}_{i,j}& \text{for }j\in \text{the first }{c}_{F}\text{ columns}\\ 0& \text{for later columns}\end{cases},$$

with subsequent rows containing shifted copies of the mask values, in accord with Eqs. (6) and (7). The size of the matrix ${\mathbf{M}}^{\prime \prime}$ grows as the sizes of the image and the filter mask grow. If the image has dimensions ${r}_{I}\times {c}_{I}$ and the filter mask has dimensions ${r}_{F}\times {c}_{F}$, then the sampled object is of size ${r}_{O}\times {c}_{O}$, with ${r}_{O}=({r}_{I}+{r}_{F}-1)$ and ${c}_{O}=({c}_{I}+{c}_{F}-1)$. This yields for ${\mathbf{M}}^{\prime \prime}$ dimensions of (${r}_{I}\times {c}_{I}$) rows and (${r}_{O}\times {c}_{O}$) columns.
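The bookkeeping of Eqs. (6)–(8) can be verified numerically. The following sketch, with small illustrative dimensions, builds ${\mathbf{M}}^{\prime \prime}$ explicitly and confirms that the matrix-vector product reproduces a direct evaluation of the correlation sum:

```python
import numpy as np

rng = np.random.default_rng(1)
rF, cF = 3, 4                          # mask (filter) size
rI, cI = 5, 6                          # image size
rO, cO = rI + rF - 1, cI + cF - 1      # object area that contributes

M = rng.random((rF, cF))
O = rng.random((rO, cO))

# Direct correlation sum: I[i,j] = sum_{k,l} M[k,l] * O[i+k, j+l]
I_direct = np.array([[np.sum(M * O[i:i + rF, j:j + cF])
                      for j in range(cI)] for i in range(rI)])

# Unwrapped matrix form: I1 = M'' . O1
Mpp = np.zeros((rI * cI, rO * cO))
for i in range(rI):
    for j in range(cI):
        for k in range(rF):
            for l in range(cF):
                Mpp[cI * i + j, cO * (i + k) + (j + l)] = M[k, l]

I1 = Mpp @ O.ravel()
print(np.allclose(I1, I_direct.ravel()))  # True
```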

The relationship between image size, filter size, and the extent of the object area sampled illustrates the inherently ill-posed nature of any imaging system. Since the PSF of a real optical system is never a delta function, there are always more object points contributing to an image than there are points in the image itself. For conventional imaging systems, where the PSF may be very compact, it is possible to clip or “throw away” points near a reconstructed image’s edges and to retain only those that represent an approximately well-posed inversion. If significant diffraction is present, the clipping strategy is inadequate; hence the emphasis on the diffraction limit in conventional optics.

For coded-aperture imaging, the filter area (and the spread function) is directly proportional to the light-gathering power of the system. Because of this proportionality of spread to light-gathering, it is not possible to use a clipping strategy to select well-posed portions of the recovered image, except in the trivial case of a single-pinhole aperture. Thus coded-aperture imaging must address inherently ill-posed encoding and decoding.

While the forward correlation can be done exactly using Eq. (6), the inversion (estimating $\mathbf{O}$ from the measurements $\mathbf{I}$) presents a seriously ill-posed^{7} and computationally challenging task. The general case is commonly solved via the regularized normal equations:

## (9)

$${{\mathbf{M}}^{\prime \prime}}^{\mathrm{T}}\mathbf{I}\mathbf{1}=({{\mathbf{M}}^{\prime \prime}}^{\mathrm{T}}{\mathbf{M}}^{\prime \prime}+{\alpha}^{2}{\mathbf{1}}_{{r}_{O}\times {c}_{O}})\mathbf{O}\mathbf{1},$$

where $\alpha$ is a regularization parameter and ${\mathbf{1}}_{{r}_{O}\times {c}_{O}}$ is the identity matrix with ${r}_{O}{c}_{O}$ rows. The operations count for solving Eq. (9) is on the order of^{8}

## (10)

$${N}_{\text{Operations},\text{General}}\propto {({r}_{O}{c}_{O})}^{3}.$$

For 1 megapixel images, this yields ${10}^{18}$ operations, requiring on the order of ${10}^{6}\text{\hspace{0.17em}}\mathrm{s}$, or about 12 days, on a teraflops processor. Thus, computations on large images using generalized masks are usually impractical. Further, the matrix ${\mathbf{M}}^{\prime \prime}$ is so large that the accumulation of round-off errors would likely swamp the inversion even if it could be accomplished in a manageable time. The only exception would be if the encoding mask were a single-pixel pinhole aperture, a well-posed but trivial case.

### 2.1. Separable Doubly-Toeplitz Masks

We have developed a method to specify finite-sized masks that greatly reduce the degree to which the imaging is ill-posed and render the inversions of Eq. (9) tractable. If the mask $\mathbf{M}$ can be expressed as an outer product of two 1-D vectors $\mathbf{A}$ and $\mathbf{B}$, with lengths ${N}_{\mathrm{A}}$ and ${N}_{\mathrm{B}}$, respectively, then $\mathbf{M}$ is an ${N}_{\mathrm{A}}\times {N}_{\mathrm{B}}$-dimensional matrix of the form

## (11)

$${M}_{i,j}=A(i)B(j),$$

where $A(i)$ is the $i$’th element of vector $\mathbf{A}$, and $B(j)$ is the $j$’th element of vector $\mathbf{B}$. The correlation of Eq. (6) can now be written simply as a product of much smaller two-dimensional (2-D) matrices:

## (13)

$$\mathbf{I}={\mathbf{M}}_{\mathrm{A}}\cdot \mathbf{O}\cdot {{\mathbf{M}}_{\mathrm{B}}}^{\mathrm{T}}.$$

The matrices ${\mathbf{M}}_{\mathrm{A}}$ and ${\mathbf{M}}_{\mathrm{B}}$ are Toeplitz, with the elements shifted by 1 pixel on each successive row:

## (14)

$${\mathbf{M}}_{\mathrm{A}}=\left[\begin{array}{ccccccccc}{A}_{1}& {A}_{2}& \dots & {A}_{{N}_{A}}& 0& 0& 0& \dots & 0\\ 0& {A}_{1}& {A}_{2}& \dots & {A}_{{N}_{A}}& 0& 0& \dots & 0\\ \vdots & \ddots & \ddots & \ddots & \ddots & \ddots & \ddots & \ddots & \ddots \end{array}\right].$$

The matrix ${\mathbf{M}}_{\mathrm{B}}$ is similarly constructed from the elements of the vector $\mathbf{B}$.
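To make the construction concrete, the sketch below (with small, illustrative vectors $\mathbf{A}$ and $\mathbf{B}$) builds the two Toeplitz matrices and verifies that ${\mathbf{M}}_{\mathrm{A}}\mathbf{O}{{\mathbf{M}}_{\mathrm{B}}}^{\mathrm{T}}$ reproduces the direct 2-D correlation of the object with the outer-product mask $\mathbf{M}=\mathbf{A}{\mathbf{B}}^{\mathrm{T}}$:

```python
import numpy as np

def toeplitz_mask(v, n_rows):
    """Banded Toeplitz matrix per Eq. (14): row i holds v shifted i places."""
    T = np.zeros((n_rows, n_rows + len(v) - 1))
    for i in range(n_rows):
        T[i, i:i + len(v)] = v
    return T

rng = np.random.default_rng(2)
A = rng.random(3)                    # 1-D mask profiles (illustrative)
B = rng.random(4)
rI, cI = 5, 6
rO, cO = rI + len(A) - 1, cI + len(B) - 1

MA = toeplitz_mask(A, rI)            # rI x rO
MB = toeplitz_mask(B, cI)            # cI x cO
O = rng.random((rO, cO))

I_sep = MA @ O @ MB.T                # separable form: I = MA . O . MB^T

# Direct correlation with the separable mask M = A B^T
M = np.outer(A, B)
I_direct = np.array([[np.sum(M * O[i:i + len(A), j:j + len(B)])
                      for j in range(cI)] for i in range(rI)])
print(np.allclose(I_sep, I_direct))  # True
```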

The difficulty of inversion using Eq. (13) is vastly less than for solving Eq. (9). While this is still an ill-posed problem, the sizes of matrices ${\mathbf{M}}_{\mathrm{A}}$ and ${\mathbf{M}}_{\mathrm{B}}$ are much smaller than ${\mathbf{M}}^{\prime \prime}$ in Eq. (9). With the matrix dimensions summarized in Table 1, the inversions require an operations count on the order of

## (15)

$${N}_{\text{Operations},\text{Doubly-Toeplitz}}\propto [{({r}_{I}{r}_{O})}^{3/2}+{({c}_{I}{c}_{O})}^{3/2}].$$

## Table 1

Relationship of dimensions of the collected image, the encoding filter mask, and the object area contributing to the image.

| Matrix | Rows | Columns |
| --- | --- | --- |
| Coded-aperture mask filter, $\mathbf{M}$ | ${r}_{F}$ | ${c}_{F}$ |
| Image on focal plane, $\mathbf{I}$ | ${r}_{I}$ | ${c}_{I}$ |
| Object area, $\mathbf{O}$ | ${r}_{O}=({r}_{I}+{r}_{F}-1)$ | ${c}_{O}=({c}_{I}+{c}_{F}-1)$ |
| Left Toeplitz mask, ${\mathbf{M}}_{\mathrm{A}}$ | ${r}_{I}$ | ${r}_{O}$ |
| Right Toeplitz mask, ${\mathbf{M}}_{\mathrm{B}}$ | ${c}_{I}$ | ${c}_{O}$ |

For image sizes on the order of 1 megapixel, this yields on the order of $2\times {10}^{9}$ operations, requiring just milliseconds on a teraflops processor. Thus, unlike arbitrary general masks or diffracted MURAs, separable Doubly-Toeplitz masks can be inverted in a usefully short time.
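The scaling advantage can be checked with a quick back-of-the-envelope calculation, counting roughly ${n}^{3}$ operations to solve an $n\times n$ linear system (the mask size below is illustrative):

```python
# Rough operations count: solving an n x n system costs ~n^3 operations.
rI = cI = 1000                      # ~1-megapixel image
rF = cF = 100                       # mask size (illustrative)
rO, cO = rI + rF - 1, cI + cF - 1

ops_general = (rO * cO) ** 3        # one huge (rO*cO) x (rO*cO) solve
ops_separable = rO ** 3 + cO ** 3   # two small solves, rO x rO and cO x cO

print(f"general mask:    ~{ops_general:.0e} operations")
print(f"doubly-Toeplitz: ~{ops_separable:.0e} operations")
print(f"speedup:         ~{ops_general / ops_separable:.0e}x")
```

The two counts land at roughly ${10}^{18}$ and a few times ${10}^{9}$, matching the orders of magnitude quoted in the text.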

To our knowledge, this Doubly-Toeplitz strategy for specifying coded apertures has not been previously applied to the spatial imaging problem, though there are hints of 1-D Toeplitz masks in the temporal motion-deblurring work of Agrawal and Raskar.^{9}^{,}^{10}

## 3. Diffraction Effects on Doubly-Toeplitz Masks

Having identified an approach to reduce the impact of ill-posed inversions, the investigations next turned to mitigating the effects of diffraction. Simulations verified that the Doubly-Toeplitz property is preserved in a diffracted mask. Simulations also provide a means of synthesizing decoding matrices representing diffracted mask images for use when good laboratory-collected mask images are not available.

Diffraction simulations are particularly straightforward for a collection of identical rectangular apertures placed at points on a regular-rectangular-grid SLM. Let ${\mathbf{I}}_{\mathbf{O}}$ and ${\mathbf{I}}_{\mathbf{P}}$ denote the intensity fields at the object and image planes, respectively, and let ${\mathbf{E}}_{\mathbf{O}}$ denote the complex optical field at the object plane. When the illumination is incoherent, the diffracted intensity in a pixel at location $(i,j)$ on the image plane is

## (16)

$${I}_{P}(i,j)=[({|{Q}_{X}|}^{2}{|{Q}_{Y}|}^{2})\otimes {\mathbf{I}}_{\mathbf{O}}](i,j),$$

that is, the object-plane intensity correlated with the separable diffracted-mask intensity pattern ${|{Q}_{X}|}^{2}{|{Q}_{Y}|}^{2}$, where

## (17)

$${Q}_{X}({k}_{x})=\sum _{{j}_{X}}A({j}_{X})\mathrm{exp}\left\{i\frac{2\pi}{\lambda}{j}_{X}{p}_{X}\left[{k}_{x}\frac{\mathrm{\Delta}{x}_{P}}{{f}_{P}}-\frac{{j}_{X}{p}_{X}}{2}\left(\frac{1}{R}+\frac{1}{{f}_{P}}\right)\right]\right\}\times \mathrm{sinc}\left\{\frac{\pi}{\lambda}{a}_{X}\left[{k}_{x}\frac{\mathrm{\Delta}{x}_{P}}{{f}_{P}}-{j}_{X}{p}_{X}\left(\frac{1}{R}+\frac{1}{{f}_{P}}\right)\right]\right\},$$

## (18)

$${Q}_{Y}({k}_{y})=\sum _{{j}_{Y}}B({j}_{Y})\mathrm{exp}\left\{i\frac{2\pi}{\lambda}{j}_{Y}{p}_{Y}\left[{k}_{y}\frac{\mathrm{\Delta}{y}_{P}}{{f}_{P}}-\frac{{j}_{Y}{p}_{Y}}{2}\left(\frac{1}{R}+\frac{1}{{f}_{P}}\right)\right]\right\}\times \mathrm{sinc}\left\{\frac{\pi}{\lambda}{a}_{Y}\left[{k}_{y}\frac{\mathrm{\Delta}{y}_{P}}{{f}_{P}}-{j}_{Y}{p}_{Y}\left(\frac{1}{R}+\frac{1}{{f}_{P}}\right)\right]\right\}.$$

The parameters in Eqs. (16)–(18) are

$({a}_{X},{a}_{Y})$ = aperture opening sizes,

$({p}_{X},{p}_{Y})$ = aperture pitch, $>({a}_{X},{a}_{Y})$,

$(\mathrm{\Delta}{x}_{P},\mathrm{\Delta}{y}_{P})$ = image-plane pixel pitch,

$\mathbf{A}$, $\mathbf{B}$ = vectors of weights of mask elements in the $x$ and $y$ directions,

$R$ = range from aperture mask to object plane,

${f}_{P}$ = range from aperture mask to image plane, and

$\lambda $ = wavelength of light.
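Equations (17) and (18) are straightforward to evaluate numerically. Below is a sketch of the 1-D diffracted-mask amplitude, with parameter values loosely based on the setup described in this paper but otherwise illustrative:

```python
import numpy as np

def Q(k, weights, pitch, opening, dx_P, f_P, R, lam):
    """Diffracted 1-D mask amplitude per Eqs. (17)/(18).

    k: image-plane pixel index (scalar or array)
    weights: 1-D mask element weights (vector A or B)
    pitch, opening: SLM element pitch and opening size [m]
    dx_P: image-plane pixel pitch [m]; f_P: mask-to-focal-plane distance [m]
    R: mask-to-object range [m]; lam: wavelength [m]
    """
    k = np.asarray(k, dtype=float)[..., None]
    j = np.arange(len(weights), dtype=float)[None, :]
    curv = 1.0 / R + 1.0 / f_P
    u = k * dx_P / f_P                        # image-plane angle per pixel
    phase = (2 * np.pi / lam) * j * pitch * (u - 0.5 * j * pitch * curv)
    # np.sinc(x) = sin(pi x)/(pi x), so the pi/lambda prefactor becomes 1/lambda
    sinc_arg = (opening / lam) * (u - j * pitch * curv)
    amp = weights[None, :] * np.exp(1j * phase) * np.sinc(sinc_arg)
    return amp.sum(axis=-1)

# Illustrative values loosely based on the experimental setup described above
A = np.array([1, 1, 0, 1, 0, 1, 1], dtype=float)
q = Q(np.arange(-256, 256), A, pitch=36e-6, opening=24e-6,
      dx_P=7.4e-6, f_P=65e-3, R=100.0, lam=550e-9)
psf_1d = np.abs(q) ** 2    # incoherent 1-D intensity profile, |Q_X|^2
```

The incoherent PSF is the product of two such profiles, one per axis, which is what preserves the separable Doubly-Toeplitz structure under diffraction.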

Diffraction is a significant contributor to extended-scene coded-aperture imaging in visible light. Our experimental investigations were conducted with a bandpass filter centered on 550 nm, and a coded aperture implemented by varying the throughput of elements of a transmissive SLM. The SLM had elements with dimensions of approximately $24\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mu \mathrm{m}\times 31\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mu \mathrm{m}$ on a $36\text{-}\mu \mathrm{m}$ pitch. The total area of active pixels measured $1024\times 768\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{pixels}$, corresponding to a maximum aperture of approximately $27\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{mm}\times 36\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{mm}$. The distance from the aperture to the focal plane was typically 65 mm. With these parameter values, the Fresnel number for the coded aperture optics ranged from under 0.02 (with a single SLM element used as a pinhole) to over 40,000. Because the objective of coded-aperture imaging is to capture the advantages of pinhole imaging while maximizing light-gathering capability, many SLM elements were set to the “on” state, so that our experiments were conducted predominantly at Fresnel numbers between 150 and 40,000. We simulated the Fresnel diffraction with a custom-written numerical code developed for coded-aperture imaging and validated by comparison with laboratory results using both programmable and fixed apertures. Results of one of these experiments are shown in Fig. 2. Within the experimental uncertainties caused by random variations of the transmissivity of the SLM elements and the finite resolution of the focal plane, the agreement is very good. 
In future papers, we plan to present the effects of diffraction, noise, and variable transmissivity on the quality of coded-aperture images via objective and subjective metrics such as peak signal-to-noise ratio and structural similarity.

Examination of Eq. (16) shows that, if the raw aperture mask is an outer product of two 1-D vectors $\mathbf{A}$ and $\mathbf{B}$, then the effects of the diffracted mask can still be expressed as an operation of a pair of Toeplitz matrices. Thus, a separable Doubly-Toeplitz aperture mask will yield a separable Doubly-Toeplitz encoding even when there is significant diffraction. If the object is an incoherent point-like source, then its image will be decomposable into $\mathbf{A}$ and $\mathbf{B}$ vectors which can be used to synthesize decoding masks.

To verify the diffraction model, we conducted a series of experiments comparing simulations with image data, obtaining excellent agreement between model and data. Diffracted coded apertures were also modeled and compared with “collected masks”—images of a point-source viewed through the spatial-coding masks. We verified that the collected masks decompose into the outer products of diffracted 1-D functions, in accord with Eq. (16). The mask shown in Fig. 3 is the basic “workhorse” mask used in the investigations presented in this paper. Three scaled versions of this mask were used: “02” is 2.2-mm wide, “04” is 4.4-mm wide, and “08” is 8.8-mm wide. The “02” mask spans 62 elements of our SLM, with a minimum feature size of two SLM elements. The SLM elements had a nominal $35.5\text{-}\mu \mathrm{m}$ pitch. The pitches were identical along the axes of the SLM, but the aperture openings were asymmetric—$24\text{-}\mu \mathrm{m}$ high and $31\text{-}\mu \mathrm{m}$ wide, accounting for the asymmetry of the diffraction pattern. The 31-element base mask is adapted from the one published by Agrawal and Raskar^{10} for temporal encoding and decoding to compensate for motion blur. In a future publication, we will report on methods for efficient searches for masks optimized specifically for 2-D spatial coded-aperture imaging.

Figure 3 also demonstrates that the diffraction increases the size of the effective encoding masks, increasing the ill-posedness of the inverse problem. This makes separable Doubly-Toeplitz masks even more valuable for cases in which fine imaging resolution is required.

## 4. Solution: Method

In real-imaging systems, noise will always arise from effects such as photon counting or read noise, so that the actual inverse problem includes a noise term $\mathbf{N}$. The matrix ${\mathbf{I}}^{\delta}$ denotes the image containing noise effects which are not encoded by the aperture mask. In addition, there is the ill-posed nature of the problem embodied in the nonsquare matrices ${\mathbf{M}}_{\mathrm{A}}$ and ${\mathbf{M}}_{\mathrm{B}}$. Both the noise and the ill-posedness require a regularization method. We begin by multiplying Eq. (13) by the transposes of the mask matrices, obtaining the normal equation:

## (19)

$${{\mathbf{M}}_{\mathbf{A}}}^{\mathrm{T}}{\mathbf{I}}^{\delta}{\mathbf{M}}_{\mathbf{B}}=({{\mathbf{M}}_{\mathbf{A}}}^{\mathrm{T}}{\mathbf{M}}_{\mathbf{A}})\mathbf{O}{({{\mathbf{M}}_{\mathbf{B}}}^{\mathrm{T}}{\mathbf{M}}_{\mathbf{B}})}^{\mathrm{T}}.$$

The matrix products $({{\mathbf{M}}_{\mathbf{A}}}^{\mathrm{T}}{\mathbf{M}}_{\mathbf{A}})$ and $({{\mathbf{M}}_{\mathbf{B}}}^{\mathrm{T}}{\mathbf{M}}_{\mathbf{B}})$ are square and symmetric, but are not invertible, due to degenerate zero eigenvalues. The degree of degeneracy increases with the size of the filter masks: there are (${r}_{O}-{r}_{I}$) zero eigenvalues for ${\mathbf{M}}_{\mathrm{A}}$, and (${c}_{O}-{c}_{I}$) zero eigenvalues for ${\mathbf{M}}_{\mathrm{B}}$. [Compare this with solving Eq. (9) for a MURA or generic mask, which is orders of magnitude more degenerate, having (${r}_{O}{c}_{O}-{r}_{I}{c}_{I}$) zero eigenvalues.]

To relieve the degeneracies, we introduce regularization parameters ${\alpha}_{\mathrm{A}}$ and ${\alpha}_{\mathrm{B}}$, so that

## (20)

$${{\mathbf{M}}_{\mathbf{A}}}^{\mathrm{T}}\mathbf{I}\,{\mathbf{M}}_{\mathbf{B}}\approx ({{\mathbf{M}}_{\mathbf{A}}}^{\mathrm{T}}{\mathbf{M}}_{\mathbf{A}}+{{\alpha}_{A}}^{2}{\mathbf{1}}_{{r}_{O}})\mathbf{O}{({{\mathbf{M}}_{\mathbf{B}}}^{\mathrm{T}}{\mathbf{M}}_{\mathbf{B}}+{{\alpha}_{B}}^{2}{\mathbf{1}}_{{c}_{O}})}^{\mathrm{T}}+{{\mathbf{M}}_{\mathbf{A}}}^{\mathrm{T}}{\mathbf{N}\mathbf{M}}_{\mathbf{B}},$$

Neglecting the noise term and solving for the object yields the estimate

## (21)

$${\mathbf{O}}_{\mathrm{Est}}={({{\mathbf{M}}_{\mathbf{A}}}^{\mathrm{T}}{\mathbf{M}}_{\mathbf{A}}+{{\alpha}_{A}}^{2}{\mathbf{1}}_{{r}_{O}})}^{-1}{{\mathbf{M}}_{\mathbf{A}}}^{\mathrm{T}}\mathbf{I}\,{\mathbf{M}}_{\mathbf{B}}{({{\mathbf{M}}_{\mathbf{B}}}^{\mathrm{T}}{\mathbf{M}}_{\mathbf{B}}+{{\alpha}_{B}}^{2}{\mathbf{1}}_{{c}_{O}})}^{-1}.$$
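A compact sketch of the regularized separable decode of Eq. (21), with small illustrative sizes and regularization constants; `toeplitz_mask` builds the banded matrices of Eq. (14):

```python
import numpy as np

def toeplitz_mask(v, n_rows):
    """Banded Toeplitz matrix of Eq. (14): row i holds v shifted i places."""
    T = np.zeros((n_rows, n_rows + len(v) - 1))
    for i in range(n_rows):
        T[i, i:i + len(v)] = v
    return T

def decode(I, A, B, alpha_A, alpha_B):
    """Regularized separable decode of Eq. (21)."""
    MA = toeplitz_mask(A, I.shape[0])       # rI x rO
    MB = toeplitz_mask(B, I.shape[1])       # cI x cO
    SA = np.linalg.inv(MA.T @ MA + alpha_A**2 * np.eye(MA.shape[1]))
    SB = np.linalg.inv(MB.T @ MB + alpha_B**2 * np.eye(MB.shape[1]))
    return SA @ MA.T @ I @ MB @ SB          # SB is symmetric, so SB == SB.T

rng = np.random.default_rng(3)
A, B = rng.random(4), rng.random(5)         # illustrative 1-D mask profiles
O_true = rng.random((12, 14))               # object area, rO x cO
MA = toeplitz_mask(A, 12 - len(A) + 1)      # rI = rO - NA + 1 = 9
MB = toeplitz_mask(B, 14 - len(B) + 1)      # cI = cO - NB + 1 = 10
I = MA @ O_true @ MB.T                      # separable encode

O_est = decode(I, A, B, alpha_A=1e-3, alpha_B=1e-3)
print(O_est.shape)  # (12, 14): the decode spans the full object area
```

Note that the estimate covers more points than the measured image, which is exactly the ill-posedness the regularization terms control.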

## 5. Methods for Improving the Solution

### 5.1. Normalization of Reconstructed Images

The reconstructed images initially show patterns at their edges within a distance of one mask width (or height) of the border. We noted similar artifacts in the encoded images of uniform (all white) scenes, which led us to a method by which the artifacts could be normalized out of the reconstructed images. The normalization is accomplished by encoding and decoding a uniform all-white object $\mathbf{W}$ to create a matrix ${\mathbf{W}}_{\mathrm{Est}}$ that includes the spurious brightness differences accrued as the coded-aperture mask straddles the edge of the image area:

## (22)

$${\mathbf{W}}_{\mathrm{Est}}={[{{\mathbf{M}}_{A}}^{\mathrm{T}}{\mathbf{M}}_{A}+{{\alpha}_{A}}^{2}{\mathbf{1}}_{{r}_{O}\times {r}_{O}}]}^{-1}({{\mathbf{M}}_{A}}^{\mathrm{T}}{\mathbf{M}}_{A})\mathbf{W}({{\mathbf{M}}_{B}}^{\mathrm{T}}{\mathbf{M}}_{B}){[{{\mathbf{M}}_{B}}^{\mathrm{T}}{\mathbf{M}}_{B}+{{\alpha}_{B}}^{2}{\mathbf{1}}_{{c}_{O}\times {c}_{O}}]}^{-1}.$$

${\mathbf{O}}_{\mathrm{Est}}$ is then divided by ${\mathbf{W}}_{\mathrm{Est}}$, element-by-element, to obtain a normalized object estimate ${\mathbf{O}}_{\text{Norm}}$:

## (23)

$${O}_{\text{Norm}}(i,j)={O}_{\mathrm{Est}}(i,j)/{W}_{\mathrm{Est}}(i,j).$$

The normalization can also be expressed as multiplication by two normalization matrices ${\mathbf{D}}_{\mathrm{A}}$ and ${\mathbf{D}}_{\mathrm{B}}$:

## (24)

$${\mathbf{O}}_{\text{Norm}}={\mathbf{D}}_{A}{\mathbf{O}}_{\mathrm{Est}}{{\mathbf{D}}_{B}}^{\mathrm{T}}.$$

This is because the all-white matrix $\mathbf{W}$ can be expressed as an outer product of two 1-D vectors of constant value, which in turn allows the normalization estimate to be decomposed into an outer product of two 1-D vectors:

## (25)

$${W}_{\mathrm{Est}}(i,j)={W}_{A}(i)\,{W}_{B}(j).$$

The normalization matrices ${\mathbf{D}}_{\mathrm{A}}$ and ${\mathbf{D}}_{\mathrm{B}}$ each have identical columns, given by

## (26)

$${D}_{A}(i,j)=\frac{1}{{W}_{A}(i)}=1\bigg/\sum _{k=1}^{{r}_{O}}\left[{({{\mathbf{M}}_{A}}^{\mathrm{T}}{\mathbf{M}}_{A}+{{\alpha}_{A}}^{2}{\mathbf{1}}_{{r}_{O}\times {r}_{O}})}^{-1}({{\mathbf{M}}_{A}}^{\mathrm{T}}{\mathbf{M}}_{A})\right](i,k),$$

$${D}_{B}(i,j)=\frac{1}{{W}_{B}(i)}=1\bigg/\sum _{k=1}^{{c}_{O}}\left[{({{\mathbf{M}}_{B}}^{\mathrm{T}}{\mathbf{M}}_{B}+{{\alpha}_{B}}^{2}{\mathbf{1}}_{{c}_{O}\times {c}_{O}})}^{-1}({{\mathbf{M}}_{B}}^{\mathrm{T}}{\mathbf{M}}_{B})\right](i,k).$$

The decoded and normalized object estimates are then

## (27)

$${\mathbf{O}}_{\text{Norm}}={\mathbf{D}}_{A}{({{\mathbf{M}}_{A}}^{\mathrm{T}}{\mathbf{M}}_{A}+{{\alpha}_{A}}^{2}\mathbf{1})}^{-1}{{\mathbf{M}}_{A}}^{\mathrm{T}}{\mathbf{I}}^{\delta}{\mathbf{M}}_{B}{[{({{\mathbf{M}}_{B}}^{\mathrm{T}}{\mathbf{M}}_{B}+{{\alpha}_{B}}^{2}\mathbf{1})}^{\mathrm{T}}]}^{-1}{{\mathbf{D}}_{B}}^{\mathrm{T}}.$$
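Because the normalization matrices have identical columns, the whole normalization of Eqs. (22)–(27) reduces to computing two vectors of reciprocal row sums. A sketch under the same illustrative assumptions as the earlier sketches; decoding and normalizing a uniformly white object should return a flat field:

```python
import numpy as np

def toeplitz_mask(v, n_rows):
    T = np.zeros((n_rows, n_rows + len(v) - 1))
    for i in range(n_rows):
        T[i, i:i + len(v)] = v
    return T

def norm_vector(Mx, alpha):
    """Reciprocal row sums of (M^T M + a^2 1)^-1 (M^T M), per Eq. (26)."""
    G = Mx.T @ Mx
    W = np.linalg.solve(G + alpha**2 * np.eye(G.shape[0]), G).sum(axis=1)
    return 1.0 / W

rng = np.random.default_rng(4)
A, B = rng.random(4), rng.random(5)
alpha = 1e-2
MA, MB = toeplitz_mask(A, 9), toeplitz_mask(B, 10)

dA = norm_vector(MA, alpha)     # every column of D_A equals dA
dB = norm_vector(MB, alpha)

# Encode, decode, and normalize a uniformly white object: the result
# should come back flat (all ones), with the edge artifacts divided out.
W = np.ones((MA.shape[1], MB.shape[1]))
SA = np.linalg.inv(MA.T @ MA + alpha**2 * np.eye(MA.shape[1]))
SB = np.linalg.inv(MB.T @ MB + alpha**2 * np.eye(MB.shape[1]))
W_est = SA @ MA.T @ (MA @ W @ MB.T) @ MB @ SB    # white-scene estimate
O_norm = dA[:, None] * W_est * dB[None, :]       # rank-1 normalization
print(np.allclose(O_norm, 1.0))  # True
```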

### 5.2. Improvement via Landweber Iteration

The results can be further improved by iterating^{11} from the solution in Eq. (27). We settled on the method of Landweber,^{12} which iterates using a regularization constant ${\epsilon}_{L}$:

## (28)

$${\mathbf{O}}_{(n+1)}={\mathbf{O}}_{(n)}+{\epsilon}_{L}{\mathbf{P}}_{A}{{\mathbf{M}}_{A}}^{\mathrm{T}}[{\mathbf{I}}^{\delta}-{\mathbf{M}}_{A}{\mathbf{O}}_{(n)}{{\mathbf{M}}_{B}}^{\mathrm{T}}]{\mathbf{M}}_{B}{{\mathbf{P}}_{B}}^{\mathrm{T}},$$

where the preconditioners are

## (29)

$${\mathbf{P}}_{A}={\mathbf{D}}_{\mathbf{A}}{\mathbf{S}}_{\mathbf{A}}\quad \text{and}\quad {\mathbf{P}}_{B}={\mathbf{D}}_{\mathbf{B}}{\mathbf{S}}_{\mathbf{B}},$$

with

## (30)

$${\mathbf{S}}_{A}={({{\mathbf{M}}_{\mathbf{A}}}^{\mathrm{T}}{\mathbf{M}}_{\mathbf{A}}+{{\alpha}_{A}}^{2}\mathbf{1})}^{-1},\quad \text{and}\quad {\mathbf{S}}_{B}={({{\mathbf{M}}_{\mathbf{B}}}^{\mathrm{T}}{\mathbf{M}}_{\mathbf{B}}+{{\alpha}_{B}}^{2}\mathbf{1})}^{-1}.$$

The starting estimate ${\mathbf{O}}_{(0)}$ is obtained from Eq. (27), then iterated using

## (31)

$${\mathbf{O}}_{(n+1)}={\mathbf{O}}_{(n)}+{\epsilon}_{L}^{2}{\mathbf{D}}_{\mathbf{A}}{\mathbf{S}}_{\mathbf{A}}{{\mathbf{M}}_{\mathbf{A}}}^{\mathrm{T}}[{\mathbf{I}}^{\delta}-{\mathbf{M}}_{\mathbf{A}}{\mathbf{O}}_{(n)}{{\mathbf{M}}_{\mathbf{B}}}^{\mathrm{T}}]{\mathbf{M}}_{\mathbf{B}}{\mathbf{S}}_{\mathbf{B}}{\mathbf{D}}_{\mathbf{B}}+{a}_{n}({\mathbf{O}}_{(n)}-{\mathbf{O}}_{(0)}),$$

Equation (31) is actually a modified Landweber method^{13} that is guaranteed to converge as long as $0<{\epsilon}_{L}^{2}<2$. However, the convergent solution is not guaranteed to be correct: the results may be noise dominated unless the iterations stop at a suitable point. We apply an *a posteriori* criterion based on the residuals after $n$ iterations:

## (33)

$${\mathbf{R}}_{(n)}={{\mathbf{M}}_{\mathbf{A}}}^{\mathrm{T}}[{\mathbf{I}}^{\delta}-{\mathbf{M}}_{A}{\mathbf{O}}_{(n)}{{\mathbf{M}}_{\mathbf{B}}}^{\mathrm{T}}]{\mathbf{M}}_{\mathbf{B}}.$$

The algorithm stops the iterations when the norm of the residual matrix ${\mathbf{R}}_{(n)}$ begins to increase.
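A sketch of the iteration of Eqs. (28)–(31) with the residual-based stopping rule, using the same illustrative sizes as the earlier sketches and omitting the optional ${a}_{n}$ term and the $\mathbf{D}$ normalizations for brevity:

```python
import numpy as np

def toeplitz_mask(v, n_rows):
    T = np.zeros((n_rows, n_rows + len(v) - 1))
    for i in range(n_rows):
        T[i, i:i + len(v)] = v
    return T

rng = np.random.default_rng(5)
A, B = rng.random(4), rng.random(5)
MA, MB = toeplitz_mask(A, 9), toeplitz_mask(B, 10)
O_true = rng.random((MA.shape[1], MB.shape[1]))
I_noisy = MA @ O_true @ MB.T + 0.01 * rng.standard_normal((9, 10))

alpha2 = 1e-4                       # alpha^2 for both axes
SA = np.linalg.inv(MA.T @ MA + alpha2 * np.eye(MA.shape[1]))
SB = np.linalg.inv(MB.T @ MB + alpha2 * np.eye(MB.shape[1]))

# One-shot regularized decode as the starting estimate O_(0)
O = SA @ MA.T @ I_noisy @ MB @ SB

eps2 = 0.5                          # epsilon_L^2, must lie in (0, 2)
prev = np.inf
for n in range(200):
    E = I_noisy - MA @ O @ MB.T                # data misfit
    res = np.linalg.norm(MA.T @ E @ MB)        # norm of the residual R_(n)
    if res > prev:                             # a posteriori stopping rule
        break
    prev = res
    O = O + eps2 * SA @ MA.T @ E @ MB @ SB     # preconditioned Landweber step
```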

## 6. Encoding and Decoding Real Extended Imagery

The refined methods of coding and decoding made it possible to image extended outdoor scenes in natural light. We used these images to further improve and debug the processing.

One experimental setup is shown in Fig. 4. The camera was situated in a commercial high-rise building in Honolulu, Hawaii, approximately 45 m above the ground, looking east toward the city skyline. The SLM was a Holoeye HEO 0017 LC-SLM with a $1024\times 768$-element active area. This was paired with a Kodak RMV 16000 digital camera. The RMV 16000 was chosen because its focal plane area of $36\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{mm}\times 24\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{mm}$ is close in size to the $33\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{mm}\times 25\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{mm}$ dimensions of the Holoeye SLM and because the RMV pixel pitch of $7.4\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mu \mathrm{m}$ allows oversampling of the nominal $35.5\text{-}\mu \mathrm{m}$ pitch of the SLM elements, reducing sampling as a source of error. The SLM was placed 6.5 cm in front of the camera’s focal plane.

In addition to the programmable SLM, the camera had a green-bandpass filter to reject optical wavelengths that are not adequately blocked by the “off” pixels of the SLM. We also developed methods to deal with “bad” or “dead” pixels, inhomogeneous focal-plane response, fixed-pattern noise, and stray light reflecting inside the camera. Most of these effects are minor nuisances for conventional lens-assisted imaging, but they cause significant degradation of coded-aperture decoding quality, because the decoding algorithms attempt to invert the image, artifacts and all, to construct an object scene. Careful reduction of all nonencoded light is required to achieve satisfactory results.

### 6.1. Encoding and Decoding Real Extended Imagery: Results

The details of the various artifacts and the associated mitigation algorithms will be presented in a future publication. The basic steps to precondition the images then to decode them are outlined in Fig. 5. The denoising step can be either single-frame denoising or denoising by adding multiple raw images before decoding. Stray-light interference limited the effectiveness of single-frame denoising. This was mitigated via a 20-frame average, with the raw images averaged together before application of flat, dark, and defect corrections.

### 6.2. Multimask Image Stacking

Our investigations showed that the regularization artifacts differ from mask to mask. Thus, reconstruction could be improved by using multiple images collected with a sequence of different masks. We tested this approach using four 90-deg rotations of the same basic mask. The four images were decoded separately and stacked to provide an improved image. Results from simply averaging the four resulting decoded images are compared with single-mask imagery in Fig. 6 and also compared with an image acquired with a conventional lensed camera.

## 7. Conclusions

Our goal was to improve the performance of completely lensless imagers, especially for imaging extended scenes in natural light. We have demonstrated a novel approach, separable Doubly-Toeplitz masks, for specifying coded-aperture masks that allow efficient decoding, and have shown the feasibility of using programmable SLMs to implement the new class of masks. This work represents a fundamental change in coded-aperture imaging and lays the foundation for future developments in which we will quantify the performance of separable Doubly-Toeplitz masks for a greater variety of scenes with both programmable and fixed masks, explore the limits of the achievable undistorted field of view, and extend the applicability to bands beyond the visible spectrum.

In particular, it has been suggested that the diffraction could limit the performance of coded apertures for infrared imaging.^{14} The work presented here shows that the diffraction can be readily included in the decoding algorithms for separable Doubly-Toeplitz masks. More serious limiters, especially for the long-wave infrared, are likely to be the influence of nonencoded information from thermal self-emission of the camera body and drifts in the nonuniformity corrections for uncooled thermal-camera focal planes.

Future work will also explore how the separable-Doubly-Toeplitz masks can be combined with refractive elements, extending the super-resolution capabilities studied under the DARPA LACOSTE program.^{15}

The project has also identified other implementation-specific difficulties that need to be addressed in using SLMs for lensless imaging. Since it is still early in the development of the technology, rapid progress can be made. Once the device-specific sources of noise and error are mitigated, the next biggest improvement will come from developing a computationally efficient means of identifying mask candidates. Rather than generating random binary codes and exhaustively testing each one, a more efficient approach is likely to be a genetic algorithm that evolves mask codes from the best currently known starting points.

Since Doubly-Toeplitz masks are continuous, having no isolated open sections, self-supporting nonprogrammable masks can be built into the fuselages and carapaces of small unmanned vehicles with three-dimensional printing methods, reducing the time and cost of fielding new systems. If fabricated from material that blocks x-ray and gamma-ray radiation, this would in turn enable extremely compact noncontact imaging at these wavelengths for medicine, defense, and homeland security.

## Acknowledgments

We are grateful for support for this work provided by the Office of Naval Research under contract N00014-10-C-0193.

## References

## Biography

**Michael J. DeWeert** is a BAE Systems Engineering Fellow and the chief technology officer at BAE Systems in Honolulu. He has over 25 years of experience in remote sensing research and development for intelligence, surveillance, and reconnaissance. His current responsibilities include technical oversight of spectral imaging, LIDAR, interferometric remote-sensing systems, and modeling and simulation for maritime and terrestrial sensing. He is currently the principal investigator for projects developing advanced imaging technologies for long-standoff mine countermeasures.

**Brian P. Farm** has over 25 years of experience in modeling, algorithm/software development, and spatial and image data analyses. He is involved in algorithm/software research and development for data exploitation, real-time image analysis, and sensor-model adjustment and exploitation. He has worked on 3-D imaging LIDARs, developed registration and processing algorithms for imaging and interferometric systems, developed information processing for novel, nontraditional imaging sensors, and created simulation systems to allow advanced algorithm testing on proposed new sensor systems.