**η**matrix for modeling the OTF in the imaging chain.

## 1.

## Introduction

Most photographs captured today are acquired with cameras that produce digital images, i.e., an image that is composed of an array of numbers that represent the brightness of each picture element, called a pixel (Fig. 1). But how are these numbers created and what influenced their values? What drives the pixel values away from the ideal values that would give us the best image quality? Camera designers struggle with these questions every day as they balance the desired image quality with design constraints, most notably size, weight, power, and cost.

The relationship between the final image quality and the camera design factors, such as detector size and optical quality, needs to be well characterized and understood in order to relate system requirements to the image quality required by the image user. This relationship needs to be characterized early in the design phase to prevent costly redesigns later in the development phase, especially when new technologies are being considered. Ideally, we would like to study the design options by conducting image evaluations using images captured from camera prototypes that cover every design option, but this is typically prohibited by the time and cost involved in building all the necessary hardware, so we need other design tools at our disposal.

A historical approach to creating simulated image examples was to create an optical lens that closely matched the predicted system-level image quality effects of the proposed camera. Images were captured using this lens and then used in image quality studies to predict the proposed camera’s performance. This approach greatly restricted the ability to perform design trades of the optical design because a representative lens would need to be created for each modification in the optical design trade. The other difficulty was determining how close the lens performance needed to match the predicted camera imaging performance such that the evaluation results would be valid, i.e., how good is good enough? For camera systems that will drive important decisions, it is critical that the image simulations do not differ in quality from the operational images that will drive those decisions.

With the advent of digital imaging and faster computers, processing techniques were advancing enough to allow accurate digital image simulations using mathematical models that encompass every step of the imaging process, i.e., the imaging chain.^{1}^{,}^{2} Each element of the imaging chain could be mathematically modeled and linked together to capture the interactions between the links. Ultimately, very accurate image simulations could be produced using the imaging chain models that were indistinguishable from the operational images captured when the camera was built from the same design. Image chain models have become invaluable at displaying image quality differences between camera designs that have the same top-level design parameters, e.g., the same camera optics size and sensor size, but subtle differences at the component-level designs that cannot be easily translated into the predicted image quality differences, e.g., optical quality and detector sensitivity. Figure 2 shows image simulations of two design options for a panchromatic remote sensing camera that has the same optics aperture size, focal length, and detector size but has different optical aberrations and detector sensitivities. Modeling the imaging chain has become a critical design tool to assess camera designs before hardware is built and prevent unwanted surprises after the camera is built and the first images are acquired.

## 2.

## Simple Imaging Chain Model

The fundamental links of the imaging chain for a camera system are radiometry, image formation, image sensing, processing, display, and interpretation, as shown in Fig. 3. At first glance, the imaging chain seems fairly simple, but as we dig deeper to develop mathematical models for each of these links, we can quickly get overwhelmed with the subtle complexities involved. It is very important to understand not only the role that each link plays in producing the final image quality, but also how the interaction between the links affects the image quality as well. Let us first look at the most basic imaging chain models for a simple digital camera system before focusing on a more detailed method for modeling the optical wavefronts in the imaging chain.

## 2.1.

### Radiometry

The imaging chain starts with modeling the radiometry, i.e., the electromagnetic energy that is the essence of creating the image. Imaging is fundamentally the act of detecting and measuring electromagnetic radiation in an attempt to understand a remote object or collections of objects. The electromagnetic spectrum ranges from gamma rays with a wavelength $\lambda $ in the picometers to radio waves with wavelengths of hundreds of kilometers. The shorter wavelength electromagnetic spectrum is detected via photonic interactions with matter and is often visualized as packets of energy called photons, where the longer wavelengths are detected with the aid of antennas and are often visualized as waves. In this paper, we are concerned with the range of the electromagnetic spectrum that includes the visible spectrum ($\lambda =400$ to 800 nm) that can be redirected with refractive or reflective optics and detected with bandgap semiconductors. The visible spectrum is a subset of the optical spectrum, which involves frequencies from ultraviolet up to the long-wave infrared (about $\lambda =10\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{nm}$ to 1 mm).

We are most interested in modeling the spectral radiance from the object of interest, ${L}_{\text{object}}(x,y,\lambda )$, but this is only one of the contributions to the spectral radiance of the entire scene, ${L}_{\text{scene}}(x,y,\lambda )$, that enters the camera aperture.^{3} Modeling the spectral radiance for the scene is very challenging given the complex variations and sources of the light energy that will arrive at the camera (Fig. 4). For example, the spectrum of the light changes dramatically in the imaging chain if the camera moves from sunlight illumination to indoor incandescent light. To further complicate the model for outdoor lighting, the radiometry is heavily dependent on the atmospheric conditions that the light travels though, which is constantly changing. The radiometry from indoor light is also complicated by the variety of light sources and the illumination condition from the light reflecting off walls and other objects. Probably, the best example of varying radiometric conditions for indoor lighting occurs when an image is captured with or without a camera flash (Fig. 5).

Various software programs have been created to assist modeling the complex scene spectral radiance. Atmospheric models, such as moderate resolution atmospheric transmission (MODTRAN), produced by Spectral Sciences Inc., Burlington, Massachusetts and the U.S. Air Force, are usually employed to model the atmospheric propagation of the electromagnetic radiation. The scene can be modeled using a physics-based radiometric scene generation tool that builds up an object from smaller facets, with each facet comprising its own material properties. An example of a radiometric scene generating tool is the digital imaging and remote sensing image generation (DIRSIG) model developed at the Rochester Institute of Technology.

## 2.2.

### Image Formation

The image formation element of the imaging chain starts with the spectral radiance from the scene, ${L}_{\text{scene}}(x,y,\lambda )$, entering the aperture of the camera. As the light waves propagate through the imaging elements to form the image on the sensor, we must consider the modifications to the spatial distribution of the light energy before we can calculate the resulting photons that strike each detector. If we assume that the optical system behaves as a linear shift-invariant system, then we can model the spatial modifications by understanding the effects on a single point of light, resulting in a point spread function (PSF).^{4}5.^{–}^{6} The resulting spatial modification to the spectral radiance in the imaging plane, ${L}_{\text{image}}(x,y,\lambda )$, can then be modeled by simply convolving the PSF for the system, ${\mathrm{PSF}}_{\text{system}}$, with scene spectral radiance, i.e.,

## (1)

$${L}_{\text{image}}(x,y,\lambda )={\int}_{-\infty}^{\infty}{\int}_{-\infty}^{\infty}{L}_{\text{scene}}(\alpha ,\beta ,\lambda ){\mathrm{PSF}}_{\text{system}}\phantom{\rule{0ex}{0ex}}(x-\alpha ,y-\beta ,\lambda )\mathrm{d}\alpha \text{\hspace{0.17em}}\mathrm{d}\beta $$## (3)

$${\mathrm{PSF}}_{\text{system}}(x,y,\lambda )={\mathrm{PSF}}_{1}(x,y,\lambda )*{\mathrm{PSF}}_{2}(x,y,\lambda )\phantom{\rule{0ex}{0ex}}*{\mathrm{PSF}}_{3}(x,y,\lambda )\dots .$$It is also very helpful to understand how the imaging system is modifying the contrast of the spatial frequencies in the scene by calculating the Fourier transform of the system PSF to give us the system transfer function. In Fourier space, the convolution operation becomes a multiplication operation, so the transfer function for the entire camera system, ${H}_{\text{system}}$, is obtained by simply multiplying together the individual transfer functions from each element in the imaging chain that contribute to the system transfer function, i.e.,

## (4)

$${H}_{\text{system}}({\xi}_{x},{\xi}_{y},\lambda )={H}_{1}({\xi}_{x},{\xi}_{y},\lambda ){H}_{2}({\xi}_{x},{\xi}_{y},\lambda ){H}_{3}({\xi}_{x},{\xi}_{y},\lambda )\dots ,$$## (5)

$$H({\xi}_{x},{\xi}_{y},\lambda )=\mathrm{MTF}({\xi}_{x},{\xi}_{y},\lambda ){e}^{i\mathrm{PTF}({\xi}_{x},{\xi}_{y},\lambda )}.$$The MTF shows us how the transfer function modifies the contrast of each spatial frequency and the PTF shows us how the transfer function modifies the phase of each spatial frequency.

A broader PSF translates into a transfer function that drops off more rapidly in the higher spatial frequencies, resulting in a blurrier image (Fig. 6). Thus, the PSF and its corresponding transfer function have a direct impact on the image quality, so it is very important to understand the elements in the imaging chain that contribute to them. For most camera systems, the biggest contributor to the system PSF is the optical PSF and its corresponding optical transfer function (OTF).

## 2.2.1.

#### Optical PSF and coherence

To visualize the image formation process of the optical PSF in the imaging chain, let us start with a broadband point source in object space. This point source could be radiating on its own or scattering light, but either way the point source is radiating spherical waves. A small solid angle patch of these spherical waves enters the entrance aperture of our imaging system. At this moment, if we were to stop time and investigate the field that exists at the entrance aperture from the point source, we would notice that the field is spatially correlated across the entrance aperture. This spatial correlation is required for the system to make a diffraction-limited PSF. If there is a loss of spatial correlation across the entrance aperture, then the PSF becomes wider, resulting in a blurry image. Propagation through the atmosphere or aberrations in the optics could cause this loss of spatial correlation. If we now look along the axis of propagation, we would notice a very short correlation length; this direction is often thought of as the temporal correlation length. This lack of temporal correlation is a direct result of the broadband nature of the point source. However, one can imagine that each wavelength creates a colocated PSF in the image plane whose width is determined by the wavelength. These colocated PSF’s then add as intensities, because of the temporal averaging of the detector to form one broad band PSF.

To understand the difference between coherent and incoherent imaging, we must bring in another point source. If both point sources are broadband, then they both have short correlation times. The measured intensity is the time average of the modulus squared of the field and all cross-terms from the two point sources average to zero because there is no correlation between the two point sources, resulting in the image of the two point sources adding as intensities. This type of imaging is, therefore, linear in intensity, i.e., the modulus squared of the field. If both of the point sources are monochromatic with the same wavelength, then the two point sources differ by no more than a relative phase. This time the interference pattern set up by the two fields is stationary, resulting in a distribution of bright and dark regions. In other words, the cross-terms do not average to zero. This type of imaging is, therefore, linear in field. In both cases, coherent and incoherent, it is possible to add the fields, perform a modulus square, and then do a time average to get the measured intensity. It just so happens that in the incoherent case, all the cross-terms average to zero, allowing one to just add the intensities.

A full image can be thought of as a collection of closely spaced point sources. If these point sources are uncorrelated, which is the case with solar illumination of an outdoor scene, then we will have incoherent imaging. If, however, there is a phase relationship between the point sources, which is the case when a scene is illuminated with a laser, then we have coherent imaging. It is important to remember that in both cases, the field from one point source is spatially coherent across the entrance aperture. The previous discussion of coherent and incoherent imaging has only scratched the surface of this rich and complex field; the interested reader is directed to Refs. 5, 7, and 8.

## 2.2.2.

#### Optical diffraction PSF and OTF

Let us go back to the entrance aperture and follow the spatially coherent section of a spherical field to the focal plane. Depending on the distance to the point source and, to a lesser extent, the size of the entrance aperture, the field at the entrance aperture may appear to be a plane wave, a parabolic wave, or a spherical wave. The optical system is designed to take the diverging wavefront at the entrance aperture and turn it into a converging spherical wave at the exit pupil. This converging spherical wave is called the Gaussian reference sphere. The center of the Gaussian reference sphere is on the focal plane and is called the Gaussian reference point. If the point sources in the object space are mapped to their corresponding Gaussian reference points, then we have what is called the Gaussian reference image. The Gaussian reference image is an imaginary perfect unaberrated image that does not have the blurring effect of the PSF and is the starting point for modeling the optical wavefronts in the imaging chain. Each point in the focal plane can be thought of as the center of a set of converging waves on a Gaussian reference sphere from an exit pupil, but each point may be the result of different views of the exit pupil. This effect can manifest itself in two ways, first as a warping of the Gaussian reference image and second as a spatially varying PSF. If the spatially varying PSF is small enough to be ignored, then the system is said to be a linear shift-invariant system. Another term that is often used is isoplanatic patch, which refers to the region of the image plan for which the spatial variability of the PSF is small enough to be considered constant.

If we calculate the Fresnel number to determine what fidelity of the diffraction theory is required to propagate the field at the exit pupil to the image plane, we would find that the Fresnel number is not much less than one, which is required in order to use the Fraunhofer diffraction theory. The diffraction calculation for a camera requires Fresnel diffraction but because the optical system has created a wavefront that is converging to a point in the image plane, the parabolic phase term in Fresnel diffraction is canceled with the parabolic phase of the converging wavefront. This, fortunately, leaves us with the mathematics of the Fraunhofer diffraction theory. It should be pointed out that our visualization is for converging spherical waterfronts, but our mathematics is for converging parabolic waves, which is a result of the paraxial approximation, i.e., considering the rays that are near the optical axis.

The converging field at the exit pupil has an amplitude that depends on the intensity of the point source in object space and on the complexities of the propagation from the source to the imaging system. Normally, the concept of radiometry is separated from the magnitude of the PSF. The field at the exit pupil is taken to be one such that the PSF from all point sources have the same magnitude. The radiometry of the scene is then applied by scaling the values of the Gaussian reference image. The converging field at the exit pupil also has phase aberrations relative to the perfect Gaussian reference sphere. These phase aberrations are called wavefront error (WFE).

As an example of the Fraunhofer diffraction PSF, let us look at a clear circular aperture of diameter $D$ with no WFE. The Fruanhofer diffraction pattern for light at wavelength $\lambda $ with incident electric field ${E}_{o}$ imaged with a lens of focal length $f$ will be

## (6)

$${\mathrm{PSF}}_{\mathrm{diff}\text{-}\mathrm{circ}}(r,\lambda )={\left[\frac{{E}_{o}\pi {D}^{2}}{2\lambda f}\right]}^{2}{\left[\frac{2{J}_{1}\left(\frac{\pi Dr}{\lambda f}\right)}{\frac{\pi Dr}{\lambda f}}\right]}^{2},$$The Fourier transform of the diffraction PSF gives the same result as autocorrelating of the aperture pattern to give us the diffraction OTF.

## (9)

$${H}_{\mathrm{diff}\text{-}\mathrm{circ}}(\rho ,\lambda )=\{\begin{array}{ll}\frac{2}{\pi}[{\mathrm{cos}}^{-1}(\frac{\rho}{{\rho}_{c}})-\frac{\rho}{{\rho}_{c}}\sqrt{1-{(\frac{\rho}{{\rho}_{c}})}^{2}}]& \text{for}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\frac{\rho}{{\rho}_{\mathrm{c}}}\le 1\\ 0& \text{for}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\frac{\rho}{{\rho}_{\mathrm{c}}}>1\end{array},$$## 2.2.3.

#### WFE and OTF

Although many simple imaging chain models only consider the aperture diffraction effects on the OTF, there are many other factors that will alter the functional form of the OTF. As discussed earlier, aberrations in the imaging elements will cause departures from the ideal spherical wavefronts in the optical system. Accurately modeling the WFE caused by these aberrations can be very complicated, but is essential in order to properly model the image quality of the camera. We will return to this point as the focus of Sec. 3.

The addition of the aberrations will result in a MTF that is equal to or lower than the diffraction MTF, i.e.,

## (11)

$${\mathrm{M}\mathrm{T}\mathrm{F}}_{\text{aberrated}}({\xi}_{x},{\xi}_{y})\le {\mathrm{M}\mathrm{T}\mathrm{F}}_{\mathrm{diff}}({\xi}_{x},{\xi}_{y}).$$The WFE caused by aberrations will, therefore, reduce the OTF and degrade the image quality with additional blurring (Fig. 9). It is common to characterize the WFE of an optical system by the root-mean-square (RMS) WFE, ${\mathrm{WFE}}_{\mathrm{RMS}}$, i.e., the statistical variance of the optical path difference over the entire wavefront. When the aberrations are not known, such as early in the design phase of an optical system, a random phase error can be used to model a nominal WFE produced by the aberrations and the optics manufacturing process. The optics degradation is modeled as an optical quality factor (OQF) transfer function that is multiplied with the diffraction transfer function to get the OTF. Hufnagel modeled the OQF transfer function for aberrations and high-frequency surface roughness as^{9}

## (12)

$${H}_{\mathrm{OQF}\text{-}\text{Hufnagel}}(\rho )={e}^{-4{\pi}^{2}{\mathrm{WFE}}_{\mathrm{RMS}}^{2}(1-{e}^{-4\frac{{\rho}^{2}}{{l}^{2}}})},$$^{10}and developed further by Barakat.

^{11}One key idea that is often overlooked is that the term ${e}^{-4({\rho}^{2}/{l}^{2})}$ represents a correlation function that may not be a Gaussian.

Although there is a correlation between ${\mathrm{WFE}}_{\mathrm{RMS}}$ and the reduction of the OTF, different aberrations at the same ${\mathrm{WFE}}_{\mathrm{RMS}}$ can have different effects on the resulting image quality, so it is very important to model the WFE from the actual aberrations if they are known. As an example, let us look at aberrations that cause a low spatial frequency WFE compared to aberrations that cause a high spatial frequency WFE. Figure 10 shows the distinct differences of the low-frequency and high-frequency aberration optics PTF and PSF even though they both have ${\mathrm{WFE}}_{\mathrm{RMS}}=0.2\lambda $. It is also important to note that different weightings between the types of optical aberrations, e.g., spherical aberration and astigmatism, can occur, which will result in the same RMS WFE but result in different optics MTFs and PSFs. This variability as well as the accuracy of the Hufnagel approximation is tighter for a high spatial frequency WFE than it is for a low spatial frequency WFE, as illustrated in Fig. 11. Figure 12 shows the different effects that the aberrations have on image quality even though they have the same ${\mathrm{WFE}}_{\mathrm{RMS}}$. It should be noted that these image examples did not have noise added or any image restoration processes applied to them.

## 2.3.

### Motion

The next consideration in the imaging chain is the relative motion between the camera and the scene that will cause additional blurring to the image. General motion transfer functions can be derived from

## (13)

$${H}_{\text{motion}}(\overrightarrow{\xi})=\frac{1}{{t}_{\mathrm{exp}}}{\int}_{0}^{{t}_{\mathrm{exp}}}{e}^{-2\pi i{\overrightarrow{x}}_{o}(t)\xb7\overrightarrow{\xi}}\mathrm{d}t,$$Smear is directional blurring caused by a constant linear motion, such as snapping a long exposure image from the window of a fast-moving car. If we let ${\overrightarrow{x}}_{o}(t)=vt$, where $v$ is the relative velocity, and evaluate Eq. (13), then we find the transfer function to be

## (14)

$${H}_{\text{smear}}(\xi )=\frac{\mathrm{sin}(\pi {d}_{\text{smear}}\xi )}{\pi {d}_{\text{smear}}\xi}{e}^{-\pi i{d}_{\text{smear}}\xi}\phantom{\rule{0ex}{0ex}}=\mathrm{sinc}({d}_{\text{smear}}\xi ){e}^{-\pi i{d}_{\text{smear}}\xi},$$The phase term ${e}^{-\pi i{d}_{\text{smear}}\xi}$ shifts the centroid of the image by half the smear distance. This phase term is normally ignored because it only represents the shift in the image pixels and does not affect image quality; hence, usually only the smear MTF as a sinc function is used in the modeling. If the smear direction is not along an axis, then a more general smear transfer function can be created by replacing ${d}_{\text{smear}}\xi $ with ${\overrightarrow{d}}_{\text{smear}}\xb7\overrightarrow{\xi}$. The smear PSF along the $x$-direction is modeled as

## (16)

$${\mathrm{PSF}}_{\text{smear}}(x,y)=\mathrm{rect}\left(\frac{x}{{d}_{\text{smear}}}\right)\delta (y),$$## (17)

$$\mathrm{rect}\left(\frac{x}{w}\right)=\{\begin{array}{cc}1& \text{for}\text{\hspace{0.17em}}\text{\hspace{0.17em}}|x|<\frac{w}{2}\\ \frac{1}{2}& \text{for}\text{\hspace{0.17em}}\text{\hspace{0.17em}}|x|=\frac{w}{2}\\ 0& \text{for}\text{\hspace{0.17em}}\text{\hspace{0.17em}}|x|>\frac{w}{2}.\end{array}$$Note that convolving the smear PSF with the scene performs a uniform integration of the scene radiance over the local distance ${d}_{\text{smear}}\xi $.

Jitter is the result of very high-frequency vibrations [${f}_{\text{jitter}}\gg (1/{t}_{\mathrm{exp}})$] of the camera system. Jitter can be derived by visualizing the integration time being broken up into many very short subintegration times $\delta t$ such that the camera motion during the time $\delta t$ can be assumed stationary. These individual points and subintegration times can be placed into Eq. (13), which will return the transfer function that is the average of all of the spatial shifts. However, if the probability distribution is known or assumed for the spatial location of the points, then one can find the expected value of the average of the transfer function. Because of the form of Eq. (13), the expected value of the average of the transfer function is the Fourier transform of the probability distribution. This allows us to visualize the probability distribution as the PSF for jitter. Often jitter is modeled as a Gaussian probability distribution given by

## (18)

$${\mathrm{PSF}}_{\text{jitter}}(x,y)\phantom{\rule{0ex}{0ex}}=\frac{1}{2\pi {\sigma}_{x}{\sigma}_{y}\sqrt{1-{\rho}_{xy}^{2}}}{e}^{-\frac{1}{2(1-{\rho}_{xy}^{2})}[{\left(\frac{x}{{\sigma}_{x}}\right)}^{2}+{\left(\frac{y}{{\sigma}_{y}}\right)}^{2}-2{\rho}_{xy}(\frac{xy}{{\sigma}_{x}{\sigma}_{y}}\left)\right]},$$## (19)

$${H}_{\text{jitter}}({\xi}_{x},{\xi}_{y})={e}^{-2{\pi}^{2}({\xi}_{x}^{2}{\sigma}_{x}^{2}+{\xi}_{y}^{2}{\sigma}_{y}^{2}+2{\rho}_{xy}{\xi}_{x}{\xi}_{y}{\sigma}_{x}{\sigma}_{y})},$$## 2.4.

### Image Sensing

Modeling the sensor requires us to understand the impact that the sensor has on the spatial properties of the image as well as the electromagnetic energy reaching the detector. Let us first consider how a digital sensor alters the spatial properties of the image. The predominant effect in most sensors is the integration of the incident light across the aperture shape of each detector. For a sensor with rectangular detectors of size ${d}_{x}$ by ${d}_{y}$, this integration will cause a blurring modeled as

## (20)

$${\mathrm{PSF}}_{\mathrm{det}\text{-}\mathrm{ap}}(x,y)=\mathrm{rect}(\frac{x}{{d}_{x}},\frac{y}{{d}_{y}})=\mathrm{rect}\left(\frac{x}{{d}_{x}}\right)\mathrm{rect}\left(\frac{y}{{d}_{y}}\right),$$## (21)

$${H}_{\mathrm{det}\text{-}\mathrm{ap}}(\xi )=\mathrm{sinc}({d}_{x}{\xi}_{x},{d}_{y}{\xi}_{y})=\frac{\mathrm{sin}(\pi {d}_{x}{\xi}_{x})}{\pi {d}_{x}{\xi}_{x}}\frac{\mathrm{sin}(\pi {d}_{y}{\xi}_{y})}{\pi {d}_{y}{\xi}_{y}}.$$Other blurring effects need to be incorporated into the sensor model, such as carrier diffusion, but these will depend on the specific sensor design and architecture and will not be considered here for our simple imaging chain model.

The other predominant effect that needs to be modeled for a digital sensor is the sampling. The simplest sensor model can be given by the blurring of the optical image by the detector PSF, ${\mathrm{PSF}}_{\mathrm{det}}(x,y)$, followed by a sampling at each detector location (Fig. 14). For an $M\times N$ detector array with each detector a distance ${p}_{x}$ and ${p}_{y}$ from the next, we can model the detector blur and sampling on the incident radiance $L(x,y)$ as

## (22)

$${g}_{\text{detector}}(x,y)=\sum _{m=1}^{M}\sum _{n=1}^{N}[L(x,y)*{\mathrm{PSF}}_{\text{detector}}(x,y)]\delta (x-m{p}_{x})\delta (y-n{p}_{y}).$$Mathematically, ${g}_{\text{detector}}(x,y)$ is a distribution function because of the Dirac delta distribution in its definition. The operation of sampling is where our visualization transitions from continuous to discrete. An alternative representation of ${g}_{\text{detector}}(x,y)$ can be purely discrete, i.e.,

## (23)

$${{g}_{\text{detector}}(m,n)=L(x,y)*{\mathrm{PSF}}_{\text{detector}}(x,y)|}_{\begin{array}{l}x=m{p}_{x}\\ y=n{p}_{y}\end{array}}.$$The distances ${p}_{x}$ and ${p}_{y}$ are called the sampling pitch of the detector and can have a profound effect on the image quality. All spatial frequencies higher than the Nyquist frequency, defined by

for either direction will appear in the image as a lower alias spatial frequency. This effect is easily seen when the spacing between lines in the scene are scaled down in the image to a spacing smaller than twice the detector sampling pitch (Fig. 15). The sampling pitch of the detector, therefore, imposes a fundamental limitation on the image resolution by limiting the spacing between objects that can be resolved.Let us now consider the light energy that reaches each detector. For a detector with a spectral bandpass of ${\lambda}_{\mathrm{min}}$ to ${\lambda}_{\mathrm{max}}$, the radiant flux (watts) on the detector is given by

## (25)

$${\mathrm{\Phi}}_{\text{detector}}(x,y)=\frac{{A}_{\text{detector}}{A}_{\text{aperture}}}{{f}^{2}{(m+1)}^{2}}{\int}_{{\lambda}_{\mathrm{min}}}^{{\lambda}_{\mathrm{max}}}{\tau}_{\text{optics}}(\lambda ){L}_{\text{image}}(x,y,\lambda )\mathrm{d}\lambda ,$$## (26)

$${\mathrm{\Phi}}_{\text{detector}}(x,y)=\frac{\pi {d}^{2}}{4{f}_{/\#}^{2}}{\int}_{{\lambda}_{\mathrm{min}}}^{{\lambda}_{\mathrm{max}}}{\tau}_{\text{optics}}(\lambda ){L}_{\text{image}}(x,y,\lambda )\mathrm{d}\lambda .$$The digital imaging sensor model converts the spatially distributed light energy of the image into the digital counts that record the image. The most common digital sensors are composed of individual detectors that generate electrons from the incident light through the photoelectric effect; thus, the average number of photoelectrons generated at the detector during the exposure time ${t}_{\mathrm{exp}}$ is given by

## (27)

$${s}_{\text{detector}}(x,y)=\frac{\pi {d}^{2}{t}_{\mathrm{exp}}}{4{f}_{/\#}^{2}}{\int}_{{\lambda}_{\mathrm{min}}}^{{\lambda}_{\mathrm{max}}}\frac{\lambda}{hc}\eta (\lambda ){\tau}_{\text{optics}}(\lambda ){L}_{\text{image}}(x,y,\lambda )\mathrm{d}\lambda ,$$## (28)

$$\text{counts}(x,y)=\lfloor {s}_{\text{detector}}(x,y)\frac{{N}_{\mathrm{DR}}}{{N}_{\text{well}}}\rfloor ,$$Unfortunately, the digital count value of the image does not have a one-to-one relationship with the scene radiance, ${L}_{\text{scene}}(x,y,\lambda )$, due to the blurring caused by the PSF, the sampling, and the noise that is introduced. The noise introduces an undesired randomness to the brightness of the digital count value for each pixel with the primary noise contributors in most digital cameras coming from the photons’ noise, dark noise, and quantization noise. The photons arriving at the sensor do not arrive in a steady stream but in random fluctuations that follow a Poisson distribution, giving rise to the photon noise that increases with the square root of the signal intensity. The dark noise appears as fluctuations in the digital counts even when no signal is present. There are many causes for the dark noise, such as error in the analog-to-digital converter that creates the digital count value, so this value in the imaging chain model is captured from measured data for the sensor being modeled. Finally, we have quantization noise that arises from quantizing the signal electrons into integer digital count values, creating a randomness that follows a uniform distribution. Putting these noise sources together gives us the root-sum-square noise for the sensor electrons as

## (29)

$${\sigma}_{\text{noise}}(x,y)=\sqrt{{\sigma}_{\text{photon}}^{2}(x,y)+{\sigma}_{\text{quantization}}^{2}(x,y)+{\sigma}_{\text{dark}}^{2}(x,y)}$$## (30)

$$=\sqrt{{s}_{\text{detector}}(x,y)+\frac{1}{12}{\left(\frac{{N}_{\text{well}}}{{N}_{\mathrm{DR}}}\right)}^{2}(x,y)+{\sigma}_{\text{dark}}^{2}(x,y)}.$$An interesting detail regarding the photon noise is that the arriving photons arrive following a Poisson distribution but then drive a binomial probability distribution that has a probability of success given by the quantum efficiency. It just so happens that when a Poisson process drives a binomial process, the result is also a Poisson process, so it is, therefore, still valid to use a Poisson distribution for the photoelectrons.

## 2.5.

### Additional Considerations When Putting the Imaging Chain Elements Together

Figure 16 illustrates the fundamental steps of putting together all of the elements of a simple imaging chain model to simulate the image capture portion of the imaging chain. Image processing, such as calibration, is generally applied to the output of the image capture portion of the imaging chain to generate the final image produced by the camera. The image simulations can then be processed to enhance the image quality, displayed, and evaluated by image users to complete the imaging chain for assessing the image quality of the camera design.

It is easy for many developers to oversimplify the imaging chain models, but these simplifications can lead to misleading results when the evaluations are conducted using the image simulations. For example, many simple image chain models will only consider the PSF of the Airy pattern, shown in Fig. 7, in their system. Although it is the optics diffraction that sets the fundamental resolution limit of the system PSF, using the Airy pattern as the system PSF overlooks the blurring effects caused by the other elements of the imaging chain that can significantly impact the resulting image quality. Also, the Airy pattern is the circular aperture diffraction PSF for a single wavelength of light and ignores the integration of the PSF of the spectral bandpass of the imaging system. If the camera integrates the light over the spectral range ${\lambda}_{\mathrm{min}}$ to ${\lambda}_{\mathrm{max}}$, then the radiant flux, ${\varphi}_{\text{image}}(x,y)$, reaching the image plane is a function of ${L}_{\text{scene}}(x,y,\lambda )$ by

## (31)

$${\varphi}_{\text{image}}(x,y)\propto {\int}_{{\lambda}_{\mathrm{min}}}^{{\lambda}_{\mathrm{max}}}{L}_{\text{scene}}(x,y,\lambda )*{\mathrm{PSF}}_{\text{system}}(x,y,\lambda )\mathrm{d}\lambda ,$$As an illustration, let us consider a gray world where the spectral radiance of the scene, ${L}_{\text{scene}}(x,y,\lambda )$, is flat across the spectral range ${\lambda}_{\mathrm{min}}$ to ${\lambda}_{\mathrm{max}}$ and the spectral response of the system is flat across the spectrum as well. In this special case, the integration over the spectral bandpass will simply average the spatial scaling of the PSF over the range ${\lambda}_{\mathrm{min}}$ to ${\lambda}_{\mathrm{max}}$. Even if we only look at the diffraction PSF and corresponding OTF for a circular aperture over the visible spectral bandpass ${\lambda}_{\mathrm{min}}=0.4\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mu \mathrm{m}$ and ${\lambda}_{\mathrm{max}}=0.8\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mu \mathrm{m}$, as shown in Fig. 17, we see differences from the single wavelength model. Note that the side rings of the Airy pattern smooth out after the PSF is integrated over the spectral bandpass.

When building an imaging chain model for a digital imaging system, particular attention also needs to be paid to the relationship between the optics and the digital sensor. We saw earlier that each imposes a fundamental limit on the detail that can be imaged with a digital camera and the image quality is dependent on the relationship between these limiting factors. The diffraction from the optical aperture imposes a limit on the highest spatial frequency that can be captured [Eq. (10)] and likewise the detector sampling imposes a limit from the Nyquist frequency [Eq. (24)]. The ratio of the sampling frequency and the optical cutoff frequency is a design parameter $Q$ defined as^{12}

## (32)

$$Q\equiv \frac{\frac{1}{p}}{\frac{1}{\lambda {f}_{/\#}}}=\frac{2{\xi}_{N}}{{\rho}_{\mathrm{c}}}=\frac{\lambda {f}_{/\#}}{p}.$$Historically, $Q$ has been defined as the ratio of the sampling frequency to the optical cutoff frequency rather than the Nyquist frequency to the optical cutoff frequency, so the detector and optics resolution limits are equal when ${\xi}_{N}={\rho}_{\mathrm{c}}$ and $Q=2$. The resolutions of digital cameras that are designed with $Q>2$ are fundamentally limited by the optics diffraction, whereas digital cameras designed with $Q<2$ are fundamentally limited by the detector sampling.

It may seem intuitive that digital cameras should all be designed with $Q=2$, but other factors, such as signal-to-noise ratio and motion blurring, will influence the final image quality and $Q=2$ may not be the best design when all image quality factors are considered (Fig. 18). Most digital cameras will produce brighter, sharper images when $Q<2$, with $Q$ typically in the range $0.5<Q<1.5$. It is important to note that $Q$ only compares the resolution limits between the optics and the detector sampling and does not take into consideration the system transfer function. The impact of the OTF is a significant factor in the system transfer function when $Q\gtrsim 0.1$ (Fig. 19), so for most imaging systems, we must understand the effect of the optical WFE on the system transfer function, which will influence the image quality.

## 3.

## Wavefront Error and Optical Transfer Function Generation

## 3.1.

### Problem Motivation

In the previous section, we only considered a very simple form of the OTF that included the diffraction from a clear circular aperture with a generalized WFE that can be modeled as an OQF, although the importance of properly modeling the WFE was illustrated. If we simply extend the aperture model to include a circular central obscuration of diameter ${D}_{\mathrm{obs}}$, as we would see in a Cassegrain telescope, then the aperture diffraction PSF will now have the form

## (33)

$${\mathrm{PSF}}_{\mathrm{obs}}(r,\lambda )={\left[\frac{{E}_{o}\pi {D}^{2}}{4\lambda f}\frac{1}{(1-{\u03f5}^{2})}\right]}^{2}{[\frac{2{J}_{1}\left(\frac{\pi Dr}{\lambda f}\right)}{\left(\frac{\pi Dr}{\lambda f}\right)}-\frac{2{\u03f5}^{2}{J}_{1}\left(\u03f5\frac{\pi Dr}{\lambda f}\right)}{\left(\u03f5\frac{\pi Dr}{\lambda f}\right)}]}^{2},$$The diffraction OTF for a circular aperture with a central obscuration is given by

## (35)

$${H}_{\mathrm{diff}\text{-}\mathrm{circ}\text{-}\mathrm{obs}}(\rho ,\lambda )=\frac{2}{\pi}\frac{A+B+C}{1-{\u03f5}^{2}},$$## (36)

$$A={\mathrm{cos}}^{-1}\left(\frac{\rho}{{\rho}_{c}}\right)-\frac{\rho}{{\rho}_{c}}\sqrt{1-{\left(\frac{\rho}{{\rho}_{c}}\right)}^{2}},$$## (37)

$$B=\{\begin{array}{ll}{\u03f5}^{2}\left[{\mathrm{cos}}^{-1}\right(\frac{\rho}{\u03f5{\rho}_{c}})-\frac{\rho}{\u03f5{\rho}_{c}}\sqrt{1-{\left(\frac{\rho}{\u03f5{\rho}_{c}}\right)}^{2}}]& \text{for}\text{\hspace{0.17em}\hspace{0.17em}}0\le \frac{\rho}{{\rho}_{\mathrm{c}}}\le \u03f5\\ 0& \text{for}\text{\hspace{0.17em}\hspace{0.17em}}\frac{\rho}{{\rho}_{\mathrm{c}}}>\u03f5\end{array},$$## (38)

$$C=\{\begin{array}{ll}-\pi {\u03f5}^{2}& \text{for}\text{\hspace{0.17em}}\text{\hspace{0.17em}}0\le \frac{\rho}{{\rho}_{\mathrm{c}}}\le \frac{1-\u03f5}{2}\\ -\pi {\u03f5}^{2}+\u03f5\text{\hspace{0.17em}}\mathrm{sin}(\varphi )+\frac{\varphi}{2}(1+{\u03f5}^{2})-(1-{\u03f5}^{2}){\mathrm{tan}}^{-1}\left[\frac{1+\u03f5}{1-\u03f5}\text{\hspace{0.17em}}\mathrm{tan}\right(\frac{\varphi}{2}\left)\right]& \text{for}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\frac{1-\u03f5}{2}\le \frac{\rho}{{\rho}_{\mathrm{c}}}\le \frac{1+\u03f5}{2}\\ 0& \text{for}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\frac{\rho}{{\rho}_{\mathrm{c}}}>\frac{1+\u03f5}{2}\end{array},$$## (39)

$$\varphi ={\mathrm{cos}}^{-1}\left[\frac{1+{\u03f5}^{2}-4{\left(\frac{\rho}{{\rho}_{c}}\right)}^{2}}{2\u03f5}\right].$$Note that adding the central obscuration to the clear circular aperture has the effect of moving the concentric rings in the PSF closer toward the center while more energy is moved from the central peak to the outer rings (Fig. 20). Also note that the contrast of the mid-spatial frequencies in the diffraction OTF is reduced as the central obscuration increases in size, but the cutoff frequency remains the same.

Although these models for the diffraction OTF are useful for simple imaging chain models, they are still inadequate for more complex optical designs, especially when we need to consider all of the potential sources for WFEs.

The primary driver for the shape of the PSF is the shape of the exit pupil. Historically and in many elementary texts on optics, the shape of the exit pupil is an unobstructed circle. However, in reality, the exit pupil has obstructions and in more complex systems, like segmented arrays and multitelescope systems, the overall shape of the exit pupil is very complex. A secondary effect on the shape of the PSF is that of the WFE. WFE is the spatial variation about the optimal optical path length over the exit pupil. Again, historically and in many elementary texts on optics, WFE would be modeled with a linear combination of Zernike polynomials, which are an infinite set of orthogonal functions over the unit disk. Obviously, this orthogonality is lost on exit pupils with complex shapes. One option is to find a new set of orthogonal functions for the new shapes.^{13} This works well for a small yet common set of shapes. Another option is to accept the loss of orthogonality and develop a method that allows the usage of any collection of basis functions, including the influence functions of adaptive optics on an arbitrary exit pupil geometry.

The method presented below is a generalized method for tracking WFE in a consistent manner that is applicable to arbitrary pupil geometries. WFE will be visualized as a vector in an infinite-dimensional vector space; however, calculations will be done in a finite-dimensional vector space. This vector space visualization will conveniently allow for the separation of WFE into multiple subspaces that can be aligned with the specifications for a future system. Calculations for RMS WFE, line-of-sight (LOS), and other macroscopic parameters are reduced to fast matrix calculations.

## 3.2.

### OTF Generation

The impulse response function $h(\overrightarrow{x})$, which is a precursor to the PSF, can be calculated by evaluating the Fraunhofer diffraction integral.

## (40)

$$h(\overrightarrow{x})={\int}_{\mathcal{D}}P(\overrightarrow{u}){e}^{-\frac{2\pi}{\lambda f}i(\overrightarrow{u}\xb7\overrightarrow{x})}{\mathrm{d}}^{2}\overrightarrow{u},$$Before we go much farther, it is important that we become explicitly clear about our choice of sign conventions. In general, sign conventions and handedness of axes are completely arbitrary, but we must be consistent and when obtaining data from an external source, one must take great care to understand the hidden assumptions with sign conventions.

We start off with the definition of the optical axis that runs along the center line of the optical system in the direction of propagation. This will be the $z$ axis of our coordinate system. In other words, the $z$ axis points from the exit pupil to the focal plane. We will use a right-handed coordinate system, which will dictate the order of the $x$ and $y$ axes in the focal plane, as shown in Fig. 21. The labels $u$ and $v$ have been used for the axes in the exit pupil. The polar angle $\theta $ will be positive when measured from the $x$ axis in the direction the fingers of the right hand point when the thumb of the right hand is pointing in the positive $z$ direction. This is the normal convention for the polar angle $\theta $.

The complex scalar field at the Gaussian reference sphere at the exit pupil is called the pupil function, $P(\overrightarrow{u})$, and is used in Eq. (40) to calculate the impulse response function. The pupil function can be broken into two real functions: the aperture function, $0\le A(\overrightarrow{u})\le 1$, and the WFE function, $\varphi (\overrightarrow{u})\in \mathit{R}$. The aperture function, which is often a binary function but, in principle, can take on values between 0 and 1, represents the relative spatial distribution of the transmission of the exit pupil. The WFE function represents deviations in the wavefront from the ideal Gaussian wavefront.

In this documentation, we have picked $\varphi (\overrightarrow{u})>0$ to be in the direction along positive $z$. This choice forces us to place a minus sign in the exponent of Eq. (40). This choice supports the visualization that the normal vector of the tip-tilt WFE shown in Fig. 21 points toward the center of the PSF.

The Fraunhofer diffraction integral of Eq. (40) has the same form as a Fourier transform integral, provided that we define the forwarded transform as

In practice, one will be creating sampled images of the exit pupil and then using a discrete Fourier transform (DFT) algorithm to calculate the PSF and OTF with the DFT implemented using a fast Fourier transform (FFT).

To avoid aliasing, one must sample the pupil function with sufficient zero padding. For this discussion, we will assume that a square grid is used and we will only talk about the linear scaling along one side. Let $w$ be the physical width of the sampling grid and ${R}_{\text{in}}$ be the enclosing radius of the pupil function. For a simple round optic, the enclosing radius is the radius of the optic, but for a complex multisegmented system, it is the radius of the smallest circle that can contain the full system. The enclosing radius must satisfy the following constraint:

If the square grid is sampled with an $n\times n$ array, the sample spacing is $w/n$. The left most axis on Fig. 22 shows the sampling of one side of this grid. The pupil function, $P({x}_{i})={P}_{i}$, has been sampled at

## (44)

$${x}_{\mathtt{i}}=-\frac{w}{2}+\mathtt{i}\frac{w}{n},\phantom{\rule[-0.0ex]{1em}{0.0ex}}\text{where}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathtt{i}\in (\mathrm{0,1},2,\dots ,n-1).$$An arbitrary choice has been made regarding the sampling that is done from $-(w/2)$ to $(w/2)-(w/n)$. We could have easily sampled from $-(w/2)+(w/n)$ to $w/2$, which would give us consistent results. The current choice is driven by using a computer programming language that references arrays with a zero-based index system. In other words, $\mathtt{A}[\mathtt{0}]$ is the first element of the array and $\mathtt{A}[\mathtt{n}-\mathtt{1}]$ is the last. For an array with an even number of points and length $n$, the center value of the function is at array index $\mathtt{A}[\mathtt{n}/\mathtt{2}]$.

Another requirement for sampling the pupil function is that the smallest structure of the pupil function is adequately sampled. This includes the physical structure in the aperture function and the variations in the WFE function. If the size of the resulting grid gets too large for the computing resources of the day, one can use the techniques of Eikenberry et al. or Ransom et al.^{14}^{,}^{15}

Let $\mathrm{FFT}({P}_{\mathtt{i}})={h}_{\mathtt{j}}$ represent the action of a DFT on the ${P}_{\mathtt{i}}$ data to create the ${h}_{\mathtt{j}}$ data. One should keep in mind that every data value ${h}_{\mathtt{j}}$ is a function of all the input data values ${P}_{\mathtt{i}}$. The PSF data can be calculated from the sampled pupil function, ${P}_{\mathtt{i}}$, as follows:

where the PSF data are sampled at## (46)

$${x}_{\mathtt{j}}=-\frac{n\lambda f}{2w}+\mathtt{j}\frac{\lambda f}{w},\phantom{\rule[-0.0ex]{1em}{0.0ex}}\text{where}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathtt{j}\in (\mathrm{0,1},2,\dots ,n-1),$$## (47)

$$\mathrm{FFT}({\widehat{g}}_{\mathtt{j}})=\mathrm{FFT}({\mathrm{PSF}}_{\mathtt{j}})\times \mathrm{FFT}({g}_{\mathtt{j}})={\mathrm{OTF}}_{\mathtt{k}}\times {G}_{\mathtt{k}},$$## (48)

$${\widehat{g}}_{\mathtt{j}}={\mathrm{FFT}}^{-1}[{\mathrm{OTF}}_{\mathtt{k}}\times \mathrm{FFT}({g}_{\mathtt{j}})],$$## (49)

$${\nu}_{\mathtt{k}}=-\frac{w}{2\lambda f}+\mathtt{k}\frac{w}{n\lambda f},\phantom{\rule[-0.0ex]{1em}{0.0ex}}\text{where}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathtt{k}\in (\mathrm{0,1},2,\dots ,n-1).$$When modeling ${\mathrm{OTF}}_{\mathtt{k}}$, it is often convenient to consider the MTF and PTF, which are related as

The magnitude of the OTF, is the MTF which is scaled such that the zero frequency is equal to one, ${\mathrm{MTF}}_{n/2}=1$. The PSF and OTF are a Fourier Transform pair,

and the PSF is the inverse Fourier transform of the OTF.For a simple round optic of radius $R$ with WFE, one can expand the phase of the pupil function out into a set of orthogonal functions. A common choice for these orthogonal functions are Zernike polynomials.

## (53)

$$\varphi (\overrightarrow{u})=\varphi (r,\theta )=\varphi (\rho R,\theta )=\sum _{\mathtt{j}=0}^{\infty}{a}_{\mathtt{j}}{Z}_{\mathtt{j}}(\frac{r}{R},\theta ).$$Two key benefits of using a set of orthogonal functions to describe WFE are that it is easy to describe the key aberrations with just a few terms and the orthogonality relationship of the basis functions makes it easy to calculate macroscopic parameters with just the coefficients of the basis functions. The orthogonality relationship is as follows:

## (54)

$${\int}_{0}^{2\pi}{\int}_{0}^{1}{Z}_{\mathtt{j}}(\rho ,\theta ){Z}_{{\mathtt{j}}^{\prime}}(\rho ,\theta )\rho \mathrm{d}\rho \text{\hspace{0.17em}}\mathrm{d}\theta =\pi {\delta}_{\mathtt{j},{\mathtt{j}}^{\prime}}.$$The coefficients that describe a given WFE, $\varphi (\overrightarrow{u})$, can be calculated by taking advantage of the orthogonality relationship.

## (55)

$${a}_{\mathtt{i}}=\frac{1}{\pi}{\int}_{0}^{2\pi}{\int}_{0}^{1}{Z}_{\mathtt{i}}(\rho ,\theta )\varphi (\rho R,\theta )\rho \mathrm{d}\rho \text{\hspace{0.17em}}\mathrm{d}\theta .$$Once the coefficients $({a}_{0},{a}_{1},{a}_{2},,{a}_{n})$ have been calculated, we can easily calculate the following integral:

## (56)

$$\langle {\varphi}^{2}\rangle ={\int}_{0}^{2\pi}{\int}_{0}^{R}\varphi {(r,\theta )}^{2}r\mathrm{d}r\text{\hspace{0.17em}}\mathrm{d}\theta ,$$Therefore, if we know the coefficients $({a}_{0},{a}_{1},{a}_{2},,{a}_{n})$, calculating the value for $\langle {\varphi}^{2}\rangle $ is as simple as summing the square of the coefficients. Another very important way of looking at this is if we think of the WFE as a vector made up of the coefficients

then the integral that provides the value of $\langle {\varphi}^{2}\rangle $ can be thought of as the dot product of $\overrightarrow{\varphi}$ with itself.The rest of this work describes how to do these types of integrals when the optic is arbitrarily complex and the orthogonality relationship is no longer valid. This is done by careful book keeping of the WFE vector spaces and the calculation of a metric we will call the eta matrix, $\mathit{\eta}$. This matrix will allow us to keep the visualization that WFE is a vector in a vector space and will be used to calculate the new dot product in this vector space.

## 4.

## Exit Pupil Modeling for Arbitrarily Complex Systems

We are going to consider a general hierarchy of nested apertures that make up the exit pupil. Each aperture can have a WFE that will affect the total system WFE. Also, each aperture can have child apertures, which are completely contained within the domain of the parent aperture. This hierarchy can be thought of as a simple tool to model a complex set of system specs, or as a method of producing a spatially correlated WFE over a large multisegmented aperture array, or even for modeling a multitelescope system. An example of an arbitrary nested aperture system is shown in Fig. 23. In Sec. 4.7, we will present a full numerical example for a nonrealistic multisegmented, multitelescope. This example will contain the elements needed to demonstrate many of the applications of the $\eta $ matrix method.

## 4.1.

### Aperture Nesting Rules

In general, each aperture has an enclosing disk or other enclosing shape that contains the unobscured domain of the basis functions. For this discussion, we will assume that the enclosing shape is a disk with radius of ${R}_{\overrightarrow{q}}$, where $\overrightarrow{q}$ is the vector-index designator for the given aperture. The enclosing disk is the domain over which the WFE basis functions are defined. If one is using Zernike polynomials, then the basis functions are also orthogonal over the domain of the disk. However, if the domain of the basis functions is not the shape of a disk, then one can think of the radius ${R}_{\overrightarrow{q}}$ as a scale parameter of the enclosing shape. For example, if the basis functions are square image files, then the enclosing shape is a square and the scaling parameter ${R}_{\overrightarrow{q}}$ can be the distance from the center to one of the corners.

The domain over which the $\overrightarrow{q}$’th aperture is defined will be referred to as ${\mathcal{D}}_{\overrightarrow{q}}$. This region is shown in Fig. 23 as the gray regions within the supporting circles. The gray regions get darker as there are more nested apertures. The $\overrightarrow{q}$ vector indexing method helps visualize the tree structure of the nested apertures. Figure 24 shows the nesting hierarchy for the general hierarchy of nested apertures shown in Fig. 23. The $\overrightarrow{q}$ vector is as long as the deepest trail on the graph from the root to the leaf. Each level corresponds to a dimension in the $\overrightarrow{q}$ vector. The $\overrightarrow{q}$ vector is the trail from the root to the node in question with zero padding. The union of the domains of the leafs of the aperture tree corresponds to the final pupil mask as shown in the inset of Fig. 23.

Apertures can be combined into a hierarchal arrangement to aid in modeling a complex system. Any aperture can have children apertures, which can be used to build up a complex hierarchy. The following rules apply to hierarchical aperture arrangements:

• The domain of child apertures is completely contained within the domain of their parent aperture.

• No other overlapping of apertures is allowed.

• WFE of the exit pupil is the sum of the WFE on all of the apertures at a given point in the exit pupil. One must work with the optical designer to map the WFE to the exit pupil.

• The exit pupil mask is the union of all of the domains of the leaf nodes of the aperture tree. An aperture tree is shown in Fig. 24.

## 4.2.

### Calculation of the $\mathit{\eta}$ Matrix

The goal of the $\mathit{\eta}$ matrix is to replace time-consuming numerical integrations of the product of two functions defined over the domain of the pupil with a quick matrix multiplication.

## (60)

$${\int}_{\mathcal{D}}\varphi (\overrightarrow{r})\psi (\overrightarrow{r})\mathrm{d}A={\overrightarrow{\varphi}}^{\top}\mathit{\eta}\overrightarrow{\psi}.$$This matrix multiplication can be thought of as the dot product between the two vectors $\overrightarrow{\varphi}$ and $\overrightarrow{\psi}$ in a nonorthogonal space described by the metric $\mathit{\eta}$, where the elements of $\overrightarrow{\varphi}$ and $\overrightarrow{\psi}$ are the expansion coefficients of the functions $\varphi (\overrightarrow{r})$ and $\psi (\overrightarrow{r})$. If we first look at a simple single aperture system where the WFE is expanded on the following basis functions, ${B}_{i}(\overrightarrow{x})$, then Eq. (60) can be rewritten as

## (61)

$${\int}_{\mathcal{D}}\varphi (\overrightarrow{r})\psi (\overrightarrow{r})\mathrm{d}A={\int}_{\mathcal{D}}[\sum _{\mathtt{i}=1}^{\infty}{a}_{\mathtt{i}}{B}_{\mathtt{i}}(\overrightarrow{r})][\sum _{\mathtt{j}=1}^{\infty}{b}_{\mathtt{j}}{B}_{\mathtt{j}}(\overrightarrow{r})]\mathrm{d}A.$$It is now possible to switch the order of the integration and summations to arrive at

## (62)

$${\int}_{\mathcal{D}}\varphi (\overrightarrow{r})\psi (\overrightarrow{r})\mathrm{d}A=\sum _{\mathtt{i}=1}^{\infty}\sum _{\mathtt{j}=1}^{\infty}{a}_{\mathtt{i}}{b}_{\mathtt{j}}{\int}_{\mathcal{D}}{B}_{\mathtt{i}}(\overrightarrow{r}){B}_{\mathtt{j}}(\overrightarrow{r})\mathrm{d}A.$$This seemingly trivial algebraic operation has profound computational effects. The integral in Eq. (62) can be calculated for all combinations of $\mathtt{i}$ and $\mathtt{j}$ with just the knowledge of the geometry of the aperture, $\mathcal{D}$. This will become the $\mathit{\eta}$ matrix. The instance of the WFE is contained in the expansion coefficients ${a}_{\mathtt{i}}$ and ${b}_{\mathtt{j}}$, which become the WFE vectors $\overrightarrow{\varphi}$ and $\overrightarrow{\psi}$. Basically, we have separated the geometry of the aperture from the instance of the wavefront error.

In the most general case, the $\mathit{\eta}$ matrix is the overlap matrix between every combination of two basis functions for every combination of two apertures over the domain these two apertures make in the exit pupil. The $\mathit{\eta}$ matrix is calculated by evaluating the following integral for each matrix element:

## (63)

$${\eta}_{{\overrightarrow{q}}_{a},{\overrightarrow{q}}_{b}:\mathtt{i},\mathtt{j}}={\int}_{{\mathcal{D}}_{\text{children}}}{B}_{{\overrightarrow{q}}_{a}:\mathtt{i}}\left(\frac{\overrightarrow{r}}{{R}_{{\overrightarrow{q}}_{a}}}\right){B}_{{\overrightarrow{q}}_{b}:\mathtt{j}}\left(\frac{\overrightarrow{r}}{{R}_{{\overrightarrow{q}}_{b}}}\right)\mathrm{d}A,$$• For ${\overrightarrow{q}}_{a}=\langle \mathrm{1,0},0\rangle $ and ${\overrightarrow{q}}_{b}=\langle \mathrm{1,0},0\rangle $, the domain of integration is all of the gray boxes in Fig. 24.

• For ${\overrightarrow{q}}_{a}=\langle \mathrm{1,0},0\rangle $ and ${\overrightarrow{q}}_{b}=\langle \mathrm{1,1},0\rangle $, the domain of integration is all of the gray boxes with $\overrightarrow{q}=\langle \mathrm{1,1},n\rangle $, where $n\in (\mathrm{1,2},3)$, in Fig. 24.

• For ${\overrightarrow{q}}_{a}=\langle \mathrm{1,1},0\rangle $ and ${\overrightarrow{q}}_{b}=\langle \mathrm{1,2},0\rangle $, the domain of integration is the empty set because these two $\overrightarrow{q}$’s have no common terminating branches.

${B}_{{\overrightarrow{q}}_{a}:\mathtt{i}}(\overrightarrow{r}/{R}_{{\overrightarrow{q}}_{a}})$ in Eq. (63) is the $i$’th basis function on the ${\overrightarrow{q}}_{a}$’th aperture that is used to describe the WFE of the optical system.

As a result of there being no restrictions on the basis functions used and the fact that the same type of basis functions can be used on parent apertures along with children apertures, the $\mathit{\eta}$ matrix will not have full rank and there will be redundancy in how a given WFE can be represented on the basis functions. However, along the diagonal, there are blocks that have full rank.

## 4.3.

### Derivation of Support Matrices

We need to derive some supporting matrices to assist in the calculation of the system RMS WFE, global pointing plane (GPP), and LOS. To do this, one needs to calculate vector representations for the simple functions 1, $x$, and $y$.

As a result of using multiple complete sets of basis functions over the same region of space, there is some freedom in how one can expand any given function on the basis functions. When modeling a system, one has the freedom to pick appropriate basis functions. If the basis functions are polynomial, then the lowest-order basis functions are most likely of the form ${c}_{0}$, ${c}_{1}x$, and ${c}_{2}y$, or some linear combination thereof, where ${c}_{i}$ are constants. In this situation, the vector expressions for the functions 1, $x$, and $y$ will evaluate to a finite number of nonzero terms. If the basis functions are not polynomial, then the expansion of the functions 1, $x$, and $y$ may not terminate. Due to this condition, it is suggested that at least three basis functions are polynomial and of the form ${c}_{0}$, ${c}_{1}x$, and ${c}_{2}y$. The remainder of the basis functions can be anything else that best matches the needs of the modeling the WFE of the system.

There is no loss in generality by requiring three basis functions for each leaf node to be the first three Zernike polynomials. If one chooses to use a different set for the first three basis functions, one is required to derive the final expressions shown in this section.

An arbitrary function $f(\overrightarrow{r})$ can be written in terms of a linear combination of basis functions ${B}_{{\overrightarrow{q}}_{a}:\mathtt{i}}(\overrightarrow{r})$ over the domain of the ${\overrightarrow{q}}_{a}$’th aperture as

## (64)

$$f(\overrightarrow{r})=\sum _{\mathtt{i}=0}^{\infty}{b}_{{\overrightarrow{q}}_{a}:\mathtt{i}}{B}_{{\overrightarrow{q}}_{a}:\mathtt{i}}\left(\frac{\overrightarrow{r}-{\overrightarrow{x}}_{{\overrightarrow{q}}_{a}}}{{R}_{{\overrightarrow{q}}_{a}}}\right).$$When solving for the expansion coefficients ${b}_{{\overrightarrow{q}}_{a}:\mathtt{i}}$ by multiplying by the $\mathtt{j}$’th basis function, one can integrate over the full domain, not just the domain as masked by the ${\overrightarrow{q}}_{a}$ aperture. This, of course, assumes that the function $f(\overrightarrow{r})$ is known outside of the pupil domain and everywhere within the domain of the basis functions.

If Zernike polynomials are used for the basis functions, the expansion coefficients can be calculated with

## (65)

$${b}_{{\overrightarrow{q}}_{a}:\mathtt{i}}=\frac{1}{\pi}{\int}_{0}^{2\pi}{\int}_{0}^{1}f({\overrightarrow{x}}_{{\overrightarrow{q}}_{a}}+{R}_{{\overrightarrow{q}}_{a}}\overrightarrow{\rho}){Z}_{{\overrightarrow{q}}_{a}:\mathtt{i}}(\rho ,\theta )\rho \mathrm{d}\rho \text{\hspace{0.17em}}\mathrm{d}\theta ,$$## (66)

$${\mathbf{1}}_{{\overrightarrow{q}}_{a}}={\langle \mathrm{1,0},\mathrm{0,0},,0\rangle}^{\mathsf{T}},$$## (67)

$${\mathit{X}}_{{\overrightarrow{q}}_{a}}={\langle {x}_{{\overrightarrow{q}}_{a}},0,\frac{{R}_{{\overrightarrow{q}}_{a}}}{2},0,,0\rangle}^{\mathsf{T}},$$## (68)

$${\mathit{Y}}_{{\overrightarrow{q}}_{a}}={\langle {y}_{{\overrightarrow{q}}_{a}},\frac{{R}_{{\overrightarrow{q}}_{a}}}{2},\mathrm{0,0},,0\rangle}^{\mathsf{T}}.$$The point $({x}_{{\overrightarrow{q}}_{a}},{y}_{{\overrightarrow{q}}_{a}})$ is the center of the ${\overrightarrow{q}}_{a}$’th aperture and ${R}_{{\overrightarrow{q}}_{a}}$ is the radius of the disk that encloses the aperture. Because the $\mathit{\eta}$ matrix of Eq. (63) contains the information for all the apertures and basis functions, the ${\mathbf{1}}_{{\overrightarrow{q}}_{a}}$, ${\mathit{X}}_{{\overrightarrow{q}}_{a}}$, and ${\mathit{Y}}_{{\overrightarrow{q}}_{a}}$ vectors are combined to form full system vectors, for example,

## (69)

$$\mathit{X}={\langle {\mathit{X}}_{{\overrightarrow{q}}_{1}}^{\mathsf{T}},{\mathit{X}}_{{\overrightarrow{q}}_{2}}^{\mathsf{T}},\dots ,{\mathit{X}}_{{\overrightarrow{q}}_{Q}}^{\mathsf{T}}\rangle}^{\mathsf{T}},$$## 4.4.

### Projecting Out the Mean and the GPP

Given an arbitrary WFE function $\varphi (\overrightarrow{r})$ defined over the union of all of the domains of the leafs of the pupil mask tree, the mean of the WFE can be found by integrating $\varphi (\overrightarrow{r})$ over the valid domain and then dividing by the area of the domain, giving

## (70)

$$\langle \varphi (\overrightarrow{r})\rangle =\frac{{\int}_{{\mathcal{D}}_{{\cup}_{a}{\overrightarrow{q}}_{a}}}\varphi (\overrightarrow{r})\mathrm{d}A}{{\int}_{{\mathcal{D}}_{{\cup}_{a}{\overrightarrow{q}}_{a}}}\mathrm{d}A}.$$Our goal is to find a matrix operation that accomplishes this task. By using Eq. (60) and the vector $\mathbf{1}$, we can write down the integration as a matrix operation.

## (71)

$$\langle \varphi (\overrightarrow{r})\rangle =\frac{{\mathbf{1}}^{\mathsf{T}}\mathit{\eta}\overrightarrow{\varphi}}{{\mathbf{1}}^{T}\mathit{\eta}\mathbf{1}}={K}_{\mu}\overrightarrow{\varphi}.$$This leads us to two matrices, one that calculates the mean, ${\mathit{K}}_{\mu}$, and one that projects the mean out of the original WFE vector, ${\mathit{P}}_{\mu}$.

## (72)

$${\mathit{K}}_{\mu}=\left(\frac{1}{{\mathbf{1}}^{\mathsf{T}}\mathit{\eta}\mathbf{1}}\right){\mathbf{1}}^{\mathsf{T}}\mathit{\eta},$$In addition to projecting out the mean, another common operation is calculating and projecting out the best-fit plane to the WFE. This best-fit plane is called the GPP. A normal vector to the GPP that intersects the optical axis points to the location of the maximum of the PSF in the image plane. This location will be called the LOS. GPP is the global piston, tip, tilt (PTT) of a system.

We first start with a least-squares fit cost function for the GPP.

## (74)

$$S(\alpha ,\beta ,\gamma )=\sum _{q=1}^{Q}{\int}_{{\mathcal{D}}_{q}}{[{\varphi}_{q}(x,y)-\alpha x-\beta y-\gamma ]}^{2}\mathrm{d}A.$$We proceed in the normal fashion by setting the partial derivatives to zero and then solving for $\alpha $, $\beta $, and $\gamma $. During the mathematical manipulation, one will come across terms of the following form, which can be rewritten in a matrix form.

## (75)

$$\sum _{q=1}^{Q}{\int}_{{\mathcal{D}}_{q}}x{\varphi}_{q}(x,y)\mathrm{d}A=\sum _{q=1}^{Q}{\overrightarrow{\mathit{X}}}_{q}^{\mathsf{T}}{\mathit{\eta}}_{q}{\overrightarrow{\varphi}}_{q}={\overrightarrow{\mathit{X}}}^{\mathsf{T}}\mathit{\eta}\overrightarrow{\varphi},$$## (76)

$$\sum _{q=1}^{Q}{\int}_{{\mathcal{D}}_{q}}xy\mathrm{d}A=\sum _{q=1}^{Q}{\overrightarrow{\mathit{X}}}_{q}^{\mathsf{T}}{\mathit{\eta}}_{q}{\overrightarrow{\mathit{Y}}}_{q}={\overrightarrow{\mathit{X}}}^{\mathsf{T}}\mathit{\eta}\overrightarrow{\mathit{Y}},$$## (77)

$$\left(\begin{array}{c}{\mathit{X}}^{\mathsf{T}}\mathit{\eta}\\ {\mathit{Y}}^{\mathsf{T}}\mathit{\eta}\\ {\mathbf{1}}^{\mathsf{T}}\mathit{\eta}\end{array}\right)\overrightarrow{\varphi}=\left(\begin{array}{ccc}{\mathit{X}}^{\mathsf{T}}\mathit{\eta}\mathit{X}& {\mathit{X}}^{\mathsf{T}}\mathit{\eta}\mathit{Y}& {\mathit{X}}^{\mathsf{T}}\mathit{\eta}\mathbf{1}\\ {\mathit{X}}^{\mathsf{T}}\mathit{\eta}\mathit{Y}& {\mathit{Y}}^{\mathsf{T}}\mathit{\eta}\mathit{Y}& {\mathit{Y}}^{\mathsf{T}}\mathit{\eta}\mathbf{1}\\ {\mathit{X}}^{\mathsf{T}}\mathit{\eta}\mathbf{1}& {\mathit{Y}}^{\mathsf{T}}\mathbf{\eta}\mathbf{1}& {\mathbf{1}}^{\mathsf{T}}\mathit{\eta}\mathbf{1}\end{array}\right)\left(\begin{array}{c}\alpha \\ \beta \\ \gamma \end{array}\right).$$It is now possible to write down the matrices for calculating GPP and projecting out the GPP.

## (78)

$${\mathit{K}}_{\mathrm{\Gamma}}={\mathit{M}}^{-1}\left(\begin{array}{l}{\mathit{X}}^{\mathsf{T}}\mathit{\eta}\\ {\mathit{Y}}^{\mathsf{T}}\mathit{\eta}\\ {\mathbf{1}}^{\mathsf{T}}\mathit{\eta}\end{array}\right),$$Finally, the ${\mathit{P}}_{\mathrm{\Gamma}}$ matrix, which is used to project out the GPP, can be calculated with

## (79)

$${\mathit{P}}_{\mathrm{\Gamma}}=\mathit{I}-\mathcal{L}{\mathit{K}}_{\mathrm{\Gamma}}=\mathit{I}-(\mathit{X}\vdots \mathit{Y}\vdots \mathbf{1}){\mathit{K}}_{\mathrm{\Gamma}},$$The coefficients for the GPP can be calculated by applying the ${\mathit{K}}_{\mathrm{\Gamma}}$ matrix on the WFE vector, $\overrightarrow{\varphi}$.

## (80)

$$\overrightarrow{\mathrm{\Gamma}}=\left(\begin{array}{c}\alpha \\ \beta \\ \gamma \end{array}\right)={\mathit{K}}_{\mathrm{\Gamma}}\overrightarrow{\varphi}.$$The WFE about the GPP is obtained by subtracting the GPP from the full WFE function, or in matrix terms, the WFE vector can have the GPP projected out, where we let ${\overrightarrow{\varphi}}_{\mathrm{\Gamma}}$ be the WFE vector with the GPP projected out, i.e.,

## 4.5.

### Generalized Projection Operators

In general, one can project one subspace into another provided that there is some overlap between the subspaces. An excellent example is the case of adaptive optics, where a set of influence functions are used to back out a measured WFE. Let ${\widehat{\overrightarrow{\varphi}}}_{1}$ of length $n$ be the measured WFE, which is expanded out on the basis functions ${B}_{1}(\overrightarrow{x})$. This subspace will be called ${\mathcal{S}}_{1}$. Let ${\widehat{\overrightarrow{\varphi}}}_{2}$ of length $m$ be the induced WFE from the influence functions ${B}_{2}(\overrightarrow{x})$ in subspace ${\mathcal{S}}_{2}$. The hat above the vectors is used to denote that these vectors are not the full length that corresponds to the dimensions of the $\mathit{\eta}$ matrix. The influence functions are just another set of basis functions. The full WFE vectors are

## (82)

$${\overrightarrow{\varphi}}_{1}=\left(\begin{array}{c}{\widehat{\overrightarrow{\varphi}}}_{1}\\ \dots \\ {\overrightarrow{0}}_{2}\end{array}\right)\phantom{\rule[-0.0ex]{1em}{0.0ex}}\text{and}\phantom{\rule[-0.0ex]{1em}{0.0ex}}{\overrightarrow{\varphi}}_{2}=\left(\begin{array}{c}{\overrightarrow{0}}_{1}\\ \dots \\ {\widehat{\overrightarrow{\varphi}}}_{2}\end{array}\right),$$## (83)

$$\mathit{\eta}=\left(\begin{array}{cc}{\widehat{\mathit{\eta}}}_{\mathrm{1,1}}& {\widehat{\mathit{\eta}}}_{\mathrm{1,2}}\\ {\widehat{\mathit{\eta}}}_{\mathrm{2,1}}& {\widehat{\mathit{\eta}}}_{\mathrm{2,2}}\end{array}\right).$$It should be pointed out that in this example, the two subspaces ${\mathcal{S}}_{1}$ and ${\mathcal{S}}_{2}$ make up the whole $\eta $ matrix, but this idea can easily be extended without modifying the mathematics when there is an additional subspace ${\mathcal{S}}_{3}$, which holds everything else. One just needs to be careful indexing into the $\eta $ matrix and the WFE vectors.

The derivation of the projection operator that will project ${\widehat{\overrightarrow{\varphi}}}_{1}$ into subspace ${\mathcal{S}}_{2}$ is easily derivable by looking at the functional form of the WFE. We need to find the best coefficients for ${\widehat{\overrightarrow{\varphi}}}_{2}$ that can be used to describe ${\widehat{\overrightarrow{\varphi}}}_{1}$.

## (84)

$$\sum _{\mathtt{j}=1}^{n}{\widehat{\overrightarrow{\varphi}}}_{1(\mathtt{j})}{B}_{1(\mathtt{j})}(\overrightarrow{x})\approx \sum _{\mathtt{i}=1}^{m}{\widehat{\overrightarrow{\varphi}}}_{2(\mathtt{i})}{B}_{2(\mathtt{i})}(\overrightarrow{x}),$$## (85)

$$\sum _{\mathtt{j}=1}^{n}{\widehat{\overrightarrow{\varphi}}}_{1(\mathtt{j})}{\int}_{\mathcal{D}}{B}_{1(\mathtt{j})}(\overrightarrow{x}){B}_{2(\mathtt{k})}(\overrightarrow{x})\mathrm{d}{\overrightarrow{x}}^{2}\approx \sum _{\mathtt{i}=1}^{m}{\widehat{\overrightarrow{\varphi}}}_{2(\mathtt{i})}{\int}_{\mathcal{D}}{B}_{2(\mathtt{i})}(\overrightarrow{x}){B}_{2(\mathtt{k})}(\overrightarrow{x})\mathrm{d}{\overrightarrow{x}}^{2},$$## (86)

$${\mathit{\eta}}_{\mathrm{2,1}}{\widehat{\overrightarrow{\varphi}}}_{1}\approx {\mathit{\eta}}_{\mathrm{2,2}}{\widehat{\overrightarrow{\varphi}}}_{2},$$## (87)

$${\widehat{\overrightarrow{\varphi}}}_{2}^{\prime}={\mathit{\eta}}_{\mathrm{2,2}}^{-1}{\mathit{\eta}}_{\mathrm{2,1}}{\widehat{\overrightarrow{\varphi}}}_{1}={\widehat{\mathit{P}}}_{1\to 2}{\widehat{\overrightarrow{\varphi}}}_{1},$$## (88)

$${\mathit{P}}_{1\to 2}=\left(\begin{array}{ll}\mathbf{0}& {\widehat{\mathit{P}}}_{1\to 2}\\ \mathbf{0}& \mathit{I}\end{array}\right).$$This can be done, in general, where there are many subspaces. One just has to keep track of where the elements of ${\widehat{\mathit{P}}}_{1\to 2}$ are mapped into the larger matrix. $\mathit{I}$ is required to preserve the components in ${\mathcal{S}}_{2}$ and ensure that ${\mathit{P}}_{1\to 2}$ behaves like a projection operator.

Back to the adaptive optics example, if we can find the best instance of the influence functions that can cancel the input measured WFE, we will have optimality decreased the total system WFE. Therefore, using what was just derived above, the optimal setting for the influence functions is

## (89)

$${\widehat{\overrightarrow{\varphi}}}_{2}^{\prime}=-{\mathit{\eta}}_{\mathrm{2,2}}^{-1}{\mathit{\eta}}_{\mathrm{2,1}}{\widehat{\overrightarrow{\varphi}}}_{1}=-{\widehat{\mathit{P}}}_{1\to 2}{\widehat{\overrightarrow{\varphi}}}_{1}.$$Notice the minus sign, which is needed to ensure that the influence functions cancel the incoming WFE and do not amplify it. In Sec. 4.7.2, we will look at an example and point out some options available while modeling a system.

## 4.6.

### System Parameters

Once the $\mathit{\eta}$ matrix and the supporting projection matrices have been calculated, one can start calculating system-level parameters with fast matrix operations.

## 4.6.1.

#### RMS WFE

One of the most popular metrics calculated for an optical system is the RMS WFE. For a simple round unobscured optic, where the WFE is represented by Zernike polynomials, the RMS WFE can be calculated by taking the square root of the sum of the squares of the Zernike polynomial coefficients. However, for a more complex system with obscured optics and when other basis functions are used, this calculation involves performing a numerical integration over the WFE function over the domain of the aperture. If one needs to do Monte Carlo simulations over many wavefront configurations, this numerical integration can become a limiting calculation hurdle.

By using the projection matrices derived in Sec. 4.3, one may write down matrix expressions for RMS WFE relative to mean wavefront and GPP.

1. RMS: This is the normal RMS that is often used for a monolith system. This is the RMS of the WFE after the mean has been removed. In a monolith system, any residual tip-tilt is simply thought of as an LOS offset and is not expected to change with time. The tip-tilt that does change with time is called image motion and is often modeled as smear and/or jitter.

## (90)

$$\mathrm{RMS}(\overrightarrow{\varphi})=\sqrt{\langle {[\varphi (\overrightarrow{r})-\langle \varphi (\overrightarrow{r})\rangle ]}^{2}\rangle}$$2. RMS GPP: This is the RMS WFE with the best-fit plane removed. This RMS is rarely used in monolith systems, but becomes very important in multitelescope and segmented systems. In an actively controlled multitelescope or segmented system, the random motion of the segments can cause an effective image motion that is separate from the bus motion.

## (93)

$${\mathrm{RMS}}_{\mathrm{GPP}}(\overrightarrow{\varphi})=\sqrt{\langle \varphi (\overrightarrow{r})-\alpha x-\beta y-\gamma \rangle}$$where the parameters, $\alpha $, $\beta $, and $\gamma $ are picked to minimize the ${\mathrm{RMS}}_{\mathrm{GPP}}(\overrightarrow{\varphi})$, which was done in the derivation of ${\mathit{P}}_{\mathrm{\Gamma}}$.## (94)

$$=\sqrt{\frac{{\overrightarrow{\varphi}}^{\mathsf{T}}{\mathit{P}}_{\mathrm{\Gamma}}^{\mathsf{T}}\mathit{\eta}{\mathit{P}}_{\mathrm{\Gamma}}\overrightarrow{\varphi}}{{\mathbf{1}}^{\mathsf{T}}\mathit{\eta}\mathbf{1}}},$$

For faster computations, both RMS calculations can be done with fewer operations by noting that $A={\mathbf{1}}^{\mathsf{T}}\mathit{\eta}\mathbf{1}$, where $A$ is the area of the full aperture, and a transformed $\eta $ can be precalculated, ${\mathit{\eta}}_{\mathrm{\Gamma}}={\mathit{P}}_{\mathrm{\Gamma}}^{\mathsf{T}}\mathit{\eta}{\mathit{P}}_{\mathrm{\Gamma}}$, resulting is the following expression for RMS WFE relative to GPP.

## 4.6.2.

#### LOS

The LOS is defined as the location of the maximum of the PSF. This is also the intersection of a normal originating from the GPP at the optical axis with the focal plane. The LOS can be thought of as a 2-D vector.

## (96)

$$\overrightarrow{L}=\frac{\lambda}{2\pi}\left(\begin{array}{ccc}f& 0& 0\\ 0& f& 0\end{array}\right)\overrightarrow{\mathrm{\Gamma}}$$## (97)

$$=\frac{\lambda}{2\pi}\left(\begin{array}{ccc}f& 0& 0\\ 0& f& 0\end{array}\right){\mathit{K}}_{\mathrm{\Gamma}}\overrightarrow{\varphi},$$## 4.6.3.

#### Correlated WFE vector

There are times when you know the coefficients for GPP, $\overrightarrow{\mathrm{\Gamma}}$, and would like to know the correlated WFE vector that would produce the given GPP without adding any RMS WFE. This is called correlated because the PTT of all of the leaf node apertures move in unison to create the desired GPP. The correlated WFE vector is

The subscript GPP is used to distinguish this WFE vector from the WFE vector ${\overrightarrow{\varphi}}_{\mathrm{\Gamma}}$, which is the WFE relative to GPP. The WFE relative to the GPP can be thought of as a n uncorrelated WFE.

## 4.7.

### Extreme Example

The example shown in this section is an unrealistic multitelescope system with each subtelescope consisting of three hexagonal segments. This system has been picked such that it is possible to show many of the features of the $\mathit{\eta}$ matrix methods. This example, shown in Fig. 25(a), contains 13 apertures, 4 parent apertures, which are outlined with dashed circles, and 9 child apertures, which are the 9 hexagonal elements. The dots on the hexagonal elements are the locations of simple Gaussian influence functions.

If the system in Fig. 25(a) is only in the initial concept stage there will be many unknown system parameters, however this will not stop the questions of image quality and overall performance. To precede one must make resalable assumptions. This process will inform the community and help with the creation of system requirements. For this investigation, each of the hexagonal segments will have PTT control and random PTT noise. Each group of three hexagonal segments make up one telescope that is combined with the other two. This can result in the PTT of each multitelescope subsystem. The final combiner optic, which is modeled via the largest enclosing parent aperture, will have aberrations and, in this case, also contain the influence functions. It is obvious that the real complexities of this system are quite involved, but from an initial image science point of view, each image point is made from a converging wavefront that can be represented as coming from an imaginary exit pupil, which is an effective focal length away. Therefore, provided it is possible to make reasonable assumptions on the form and statistical nature of the WFE, it is possible to bound the possible performance and help set requirements for the system concept.

The first thing that needs to be done is to calculate the $\mathit{\eta}$ matrix, which can be done by evaluating Eq. (63) for all of the basis functions and apertures in the given problem. It is not required, but it is convenient to create the smallest possible dimensional $\mathit{\eta}$ matrix. For example, in a situation where you are given some specifications that require a limit of a given RMS WFE on the following basis functions, $({f}_{1},{f}_{2},\dots ,{f}_{n})$, and a subset of these basis functions along with other basis functions are also required to model some external effects, like atmospheric turbulence, the $\mathit{\eta}$ matrix that is created should only have one row-column that corresponds to the basis functions that are common to both subspaces. The process for finding the lowest-dimensional $\mathit{\eta}$ matrix can be as simple as generating a list of all possible basis function/aperture combinations required and then eliminating the duplicates and keeping track of the index mappings back into all of the subspaces. Another way to think about subspaces is that subspaces are the links between a set of basis functions and a group of apertures, like an $n$ to $m$ link table in a database.

In this example, 14 groups of basis functions are used. The basis function groups include sets of Zernike polynomials, like $\mathtt{j}\in (\mathrm{0,1},2)$, $\mathtt{j}\in (\mathrm{3,4},5)$, and $\mathtt{j}\in (\mathrm{6,11},\mathrm{17,24},\mathrm{18,13},9)$, Gaussian influence functions at all the locations which are shown as dots in Fig. 25, and nine instances of a power spectrum density wavefront, one for each hexagonal aperture. When these groups of basis functions are combined with the different combinations of apertures, it results in 17 subspaces that can be used to model the dynamic PTT of the segments of the system, low spatial frequency variations of the apertures or the adaptive optics along with other effects. Anyone of these subspaces could have a temporal dependence, for example, the low spatial frequency could have a time dependence resulting from temperature fluctuations.

The $\mathit{\eta}$ matrix for this example is shown in the right part of Fig. 25, where the white cells represent a value greater than zero, the black cells are less than zero, and the gray cells are identically zero.

## 4.7.1.

#### Dynamic PTT WFE

If the physical structure of the system will result in high-frequency vibrations of the piston, and tip and tilt of the nine hexagonal mirror segments, one must be able to simulate the possible effects. When real PTT test data or simulated data from a structural engineer are provided for the system, one must be able to separate out the dynamical parts that drive the WFE from the dynamical parts that drive image motion or jitter. Otherwise, the image scientist must generate reasonable PTT data based on RMS WFE and image motion requirements. However, either way, a time-dependent PTT WFE vector is created, which has a form similar to

## (99)

$${\overrightarrow{\mathrm{\Phi}}}_{\mathrm{PTT}}({t}_{\mathtt{i}})={\langle \mathrm{0,0},\dots ,{\varphi}_{1}^{\mathsf{T}}({t}_{\mathtt{i}}),{\varphi}_{2}^{\mathsf{T}}({t}_{\mathtt{i}}),\dots ,{\varphi}_{9}^{\mathsf{T}}({t}_{\mathtt{i}}),\dots ,0\rangle}^{\mathsf{T}},$$The time dependence of ${\overrightarrow{\mathrm{\Phi}}}_{\mathrm{PTT}}({t}_{\mathtt{i}})$ can be broken into two components: the correlated component, ${\overrightarrow{\mathrm{\Phi}}}_{\mathrm{C}\text{-}\mathrm{PTT}}({t}_{\mathtt{i}})$, and the uncorrelated component, ${\overrightarrow{\mathrm{\Phi}}}_{\mathrm{U}\text{-}\mathrm{PTT}}({t}_{\mathtt{i}})$. The correlated component represents the motion of the GPP, which is the average direction of the wavefront that determines the location of the PSF in the imaging plane. The uncorrelated component is the deviation from this average wavefront that determines the variation in shape of the PSF that is not driven by the geometry of the aperture. During an integration time with random PTT motion, one can imagine the maximum of the PSF randomly moving around on the image plane. This motion should be considered image motion or jitter. If you were to ride along with the maximum of the PSF, you would see the PSF changing shape as a function of time. This dynamic shape change should be considered to be resulting from WFE.

The two components of ${\overrightarrow{\mathrm{\Phi}}}_{\mathrm{PTT}}({t}_{\mathtt{i}})$ can be calculated with

## (100)

$${\overrightarrow{\mathrm{\Phi}}}_{\mathrm{U}\text{-}\mathrm{PTT}}({t}_{\mathtt{i}})={\mathit{P}}_{\mathrm{\Gamma}}{\overrightarrow{\mathrm{\Phi}}}_{\mathrm{PTT}}({t}_{\mathtt{i}}),$$## (101)

$${\overrightarrow{\mathrm{\Phi}}}_{\mathrm{C}\text{-}\mathrm{PTT}}({t}_{\mathtt{i}})=\mathcal{L}{\mathit{K}}_{\mathrm{\Gamma}}{\overrightarrow{\mathrm{\Phi}}}_{\mathrm{PTT}}({t}_{\mathtt{i}}).$$The operation of applying ${\mathit{K}}_{\mathrm{\Gamma}}$ returns the $\overrightarrow{\mathrm{\Gamma}}({t}_{\mathtt{i}})$ vector as shown in Eq. (80), which is three parameters for the GPP as a function of time. The application of $\mathcal{L}$ takes the GPP vector and creates a full WFE vector. The correlated component, ${\overrightarrow{\mathrm{\Phi}}}_{\mathrm{C}\text{-}\mathrm{PTT}}({t}_{\mathtt{i}})$, produces all of the LOS variation and the uncorrelated component produces all of the RMS WFE. A frame out of an animation is shown in Fig. 26, which shows the RMS WFE, the LOS, and the wavefront on the aperture.

If one is given a specification for RMS WFE, ${\mathrm{RMS}}_{\mathrm{PTT}}$ for dynamical PTT, and one for image motion possibly in the form of jitter, $\sigma $, then one can generate Monte Carlo data that match these specifications in the following manner:

1. Generate random time series for ${\overrightarrow{\mathrm{\Phi}}}_{\mathrm{PTT}}({t}_{\mathtt{i}})$ and $\overrightarrow{\mathrm{\Gamma}}({t}_{\mathtt{i}})$ from whatever method matches your physical situation.

2. Create the uncorrelated vector, ${\overrightarrow{\mathrm{\Phi}}}_{\mathrm{U}\text{-}\mathrm{PTT}}({t}_{\mathtt{i}})$, and scale to the required RMS WFE.

where ${s}_{\mathrm{U}}$ is the required scaling derived from## (102)

$${\overrightarrow{\mathrm{\Phi}}}_{\mathrm{U}\text{-}\mathrm{PTT}}({t}_{\mathtt{i}})={s}_{\mathrm{U}}{P}_{\mathrm{\Gamma}}{\overrightarrow{\mathrm{\Phi}}}_{\mathrm{PTT}}({t}_{\mathtt{i}}),$$## (103)

$${\mathrm{RMS}}_{\mathrm{PTT}}={s}_{\mathrm{U}}\sqrt{\frac{1}{n}\sum _{\mathtt{i}=1}^{n}\frac{{\overrightarrow{\mathrm{\Phi}}}_{\mathrm{PTT}}^{\mathsf{T}}({t}_{\mathtt{i}}){P}_{\mathrm{\Gamma}}^{\mathsf{T}}\mathit{\eta}{P}_{\mathrm{\Gamma}}{\overrightarrow{\mathrm{\Phi}}}_{\mathrm{PTT}}({t}_{\mathtt{i}})}{{\mathbf{1}}^{\mathsf{T}}\mathit{\eta}\mathbf{1}}}.$$3. Create the LOS profile by scaling the correlated ${\overrightarrow{\mathrm{\Phi}}}_{\mathrm{C}\text{-}\mathrm{PTT}}({t}_{\mathtt{i}})$ vector.

where ${s}_{\mathtt{C}}$ is the required scaling derived from## (104)

$${\overrightarrow{\mathrm{\Phi}}}_{\mathrm{C}\text{-}\mathrm{PTT}}({t}_{\mathtt{i}})={s}_{\mathrm{C}}\mathcal{L}\overrightarrow{\mathrm{\Gamma}}({t}_{\mathtt{i}}),$$where the function $\mathcal{J}[{\overrightarrow{x}}_{\mathtt{i}}]$ will map a path into an image motion metric.## (105)

$$\sigma ={s}_{\mathrm{C}}\mathcal{J}\left[\frac{\lambda}{2\pi}\right(\begin{array}{ccc}f& 0& 0\\ 0& f& 0\end{array})\overrightarrow{\mathrm{\Gamma}}({t}_{\mathtt{i}})],$$

The time to do this type of analysis and Monte Carlo simulation is significantly reduced by using the $\mathit{\eta}$ matrix to preform these operations. Without the $\mathit{\eta}$ matrix, each term in the RMS time series above would require fitting a plane to the WFE and a numerical integration over the domain for the aperture. With the $\mathit{\eta}$ matrix, all possible elementary numerical integration combinations have already been done and all one needs to do is fast matrix multiplications.

## 4.7.2.

#### Moving wavefront error between subspaces

As was pointed out in Sec. 4.2, the $\mathit{\eta}$ matrix most likely will not be of full rank. Because this wavefront vector space is overcomplete, it is possible to move the expression of the wavefront from one subspace to another. A common example of this is in adaptive optics, where a measured wavefront is reduced by applying a matching but negative wavefront. We are not going to talk about how one would model a wavefront sensing system or model the error in the application of the adaptive optics basis functions. However, we will show how the $\mathit{\eta}$ matrix methodology can help in separating ideas and the calculation of macroscopic parameters.

The following example will be for a time instance; one can easily add temporal dynamics by using similar methods as shown in the previous section. The total wavefront of a system is the simple vector sum of all of the wavefront subspaces.

## (106)

$$\overrightarrow{\mathrm{\Phi}}={\overrightarrow{\varphi}}_{1}+{\overrightarrow{\varphi}}_{2}+\dots +{\overrightarrow{\varphi}}_{n}.$$These subspaces could represent any WFE source that is included in your modeling of the system. Let the sum of a subset of these vector spaces be ${\overrightarrow{\varphi}}_{m}$, which represents the wavefront that is measured using a wavefront sensing method. If the wavefront sensing method is blind to the GPP, then one needs to break ${\overrightarrow{\varphi}}_{m}$ into the correlated and uncorrelated pieces.

## (107)

$${\overrightarrow{\varphi}}_{m}={\mathit{P}}_{\mathrm{\Gamma}}{\overrightarrow{\varphi}}_{m}+\mathcal{L}{\mathit{K}}_{\mathrm{\Gamma}}{\overrightarrow{\varphi}}_{m}.$$The measurement of ${\mathit{P}}_{\mathrm{\Gamma}}{\overrightarrow{\varphi}}_{m}$ will have some residual error, $\overrightarrow{\u03f5}$, such that the final measured wavefront is

## (108)

$${\tilde{\overrightarrow{\varphi}}}_{m}={\mathit{P}}_{\mathrm{\Gamma}}{\overrightarrow{\varphi}}_{m}-\overrightarrow{\u03f5}.$$Using the projection operator derived in Sec. 4.5, it is possible to map ${\tilde{\overrightarrow{\varphi}}}_{m}$ into another vector space that contains the basis functions for the adaptive optics influence functions.

## (109)

$${\overrightarrow{\varphi}}_{\mathrm{ao}}=-{\mathit{P}}_{m\to \mathrm{ao}}{\tilde{\overrightarrow{\varphi}}}_{m}.$$If one wants to assume that the application of the influence functions does not affect the GPP, then one could project out the GPP by applying the ${\mathit{P}}_{\mathrm{\Gamma}}$ projection operator to ${\overrightarrow{\varphi}}_{\mathrm{ao}}$. One can also add the noise associated with the application of the influence functions by creating an error vector ${\overrightarrow{\u03f5}}_{\mathrm{ao}}$. The final wavefront vector for the system is as follows:

## (110)

$${\overrightarrow{\mathrm{\Phi}}}_{\mathrm{ao}}={\overrightarrow{\varphi}}_{1}+{\overrightarrow{\varphi}}_{2}+\dots +{\overrightarrow{\varphi}}_{n}+{\overrightarrow{\varphi}}_{\mathrm{ao}}+{\overrightarrow{\u03f5}}_{\mathrm{ao}},$$As a final visual example, the top four subspaces shown in Fig. 27 have been applied to the system and assumed to be measured with no error. This measured WFE is then mapped into the influence function basis space using the methods previously discussed. The influence functions are then used to cancel the measured WFE. The system with the original WFE and the reduced WFE are shown in Fig. 28 along with cross-section plots of the MTF and density plots of the PTF. Because of the sparse nature of the geometry of the aperture, the MTF drops much faster than the filled aperture as shown in the top right of Fig. 9. The application of the influence functions improves the MTF but has a considerable effect on the PTF shown in the last row of Fig. 28. The PSF of the system before and after the application of the influence functions is shown in Fig. 29 along with a simple image simulation.

## 5.

## Summary

Modeling the imaging chain has proven to be an invaluable tool for optical designers to assess design trades and their impact on the final image quality. A tutorial for modeling the imaging chain of a digital camera system was provided to give an overview of the concept and the mathematics associated with a simple model for digital cameras. Accurately modeling the OTF in the imaging chain is critical for understanding the relationship between the WFE and the final image quality. Unfortunately, modeling the pupil function for the OTF calculation of an optical system can prove to be the most challenging step in the imaging chain model for dynamically changing complex optical systems, such as segmented or sparse aperture systems. Modeling the OTF over the range of dynamic wavefront instances to ascertain RMS WFE, LOS, and GPP can prove challenging and laborious. We introduced a novel approach that simplifies modeling the pupil function WFE in the imaging chain model. With the calculation of an $\eta $ matrix, the resulting WFE, LOS, and GPP calculations are greatly simplified with fast matrix calculations. No longer is it necessary to perform costly numerical integrations every time the instance of WFE changes. We demonstrate the application of the $\eta $ matrix to provide the dynamical changing WFE for a segmented sparse aperture.

## Acknowledgments

We would like to acknowledge our great appreciation to our colleagues in the Imaging Science group at Exelis for the many useful and engaging conversations while developing the image chain modeling concepts.

## References

## Biography

**Robert D. Fiete** is the director of research and development at Exelis Geospatial Systems. He received his BS in physics and math from Iowa State University and his MS and PhD in optical sciences from the University of Arizona. Over the past 30 years, he has developed imaging chain models to assess the image quality of imaging system designs. He is a senior member of OSA and SPIE as well as a fellow of SPIE.

**Bradley D. Paul** is a senior image scientist at Exelis. He did undergraduate work at Miami University in Oxford, Ohio. He received his PhD in physics from JILA at the University of Colorado in Boulder, Colorado. After biking around the United States and Canada, he worked for a start-up company assisting in the construction of a holographic printer. He joined Eastman Kodak in 2004, where he has been developing new methods of analyzing electro-optical imaging systems.