## 1

## -INTRODUCTION

The images recorded by a telescope are degraded by aberrations. These aberrations can be due to atmospheric turbulence and to distortions of the optical system. Whatever their origins, they lead to phase variations in the pupil plane, which severely reduce the optical transfer function (OTF). The problem is to estimate the phase of the wavefront to correct the images from these distortions. Many wavefront sensing techniques have been proposed but few can be used with extended scenes. One of them is the phase diversity. This technique, first proposed by Gonsalves[l] in 1978, has been significantly developed during the last ten years. It has been used successfully by many authors to determine aberrations[19, 2, 3] and also to restore the quality of images, like solar imaging through turbulence[4, 5]. This technique uses a low-cost, optically simple wavefront sensor but it requires a numerical processing to restore the unknowns from the images. The conventional estimation found in the literature is based on a joint estimation of the aberrations and the observed object. The reconstruction of these parameters is an ill-posed inverse problem and must be regularized. Several method have been proposed, in the literature, to this end. Concerning the aberrations, there is, at least, an implicit regularization by expanding the phase on a finite linear combination of basis functions. Furthermore, in the case of imaging through turbulence, a prior on the turbulent phase is derived from Kolmogorov statistics[6]. For regularizing the object, several approaches have been proposed: some authors use a low-pass noise filter[4, 2], some impose a sieve on the object[5] and recently a Tikhonov regularization on the object was proposed[8]. All these solutions require the tuning of regularization parameters, which must be done by hand, in a joint estimation. When the problem is to estimate the aberrations only, the restoration of the object is not interesting and because of the choice of regularization parameters, not easy. Furthermore, in a joint estimation, the ratio of the number of unknowns (aberrations + object) to the number of data does not tend towards zero so this joint estimation does not have good asymptotic properties. The aim of this article is to solve these problems by proposing a robust estimator derived in a Bayesian framework that restores the sole interesting parameters: the aberrations. It is obtained by integrating the observed object out of the problem. This novel method is called marginal estimator and is based on a Maximum A Posteriori approach. This estimator has good asymptotic properties and allows the unsupervised estimation of the noise variance and of the regularization parameters of the object . The knowledge of these hyperparameters can be used, afterwards, to perform the object restoration. The outline is as follows: we begin with the mathematical formulation of the problem, in Section 3 we recall the usual joint estimation scheme and we present the marginal estimator. In section 4, we compare the properties and performances of these two estimators for the reconstruction of the aberrations, by means of simulations.

## 2

## -PROBLEM STATEMENT AND PHASE DIVERSITY PRINCIPLE

In the isoplanatic patch of the telescope, the image is the noisy convolution of the point-spread function *h* in the observation plane and the object *o*:

where *r* is a two-dimensional vector in the image plane and *n* is an additive noise. Under near field approximation, the point-spread function associated with the focused image is given by:

where *u* is a two-dimensional vector in the pupil plane *ϕ* is the unknown aberrated phase function, *P* is the binary aperture function and *FT* denotes the Fourier transform. The aberrated phase function is expanded on a set of Zernike polynomials[9].

Ideally, *k* tends to infinity but in practice *k* will be a high number. And in the particular case of a space telescope, the first polynomials are enough to describe the aberrations (typically the first twenty). In the following, we will note **a** = (*a*_{1},…,*a _{k}*)

^{t}where

*t*denotes transposition. The estimation of the aberrations from only the focused image, does not ensure the uniqueness of the solution. This indetermination is due to the relationship between the psf and the aberrated phase: a couple (

*ϕ,ϕ’*) exists such that

*h*(

*ϕ*) =

*h*(

*ϕ*′). In particular, when the phase is expanded on the Zernike polynomials, the global sign of the set of

*a*such that

_{i}*i*= 2

*p*could not be estimated. Phase diversity was proposed to add information and thus remove this indetermination. The idea is to collect, at least, one additional image which differs from the focused one, by a known phase variation. To implement simply this technique, the second image is taken in a defocused plane (with a known defocused distance), the added phase is thus quadratic. In the defocused plane:

where *ϕ _{d}* is the known diversity phase function.

In practice, data are discrete arrays because of the spatial sampling of the images. Equation 1 takes the form:

where **H** is the Toeplitz matrix corresponding to the convolution by *h*, and where **i**,**o** and **n** are the discrete forms of the previous variables.

The problem is to estimate the aberration parameters **a** (the set of *a _{i}*) from the data (focused

*i*

_{1}and defocused

*i*

_{2}images) and the defocused distance, without knowing the object

*o*. We begin by recalling the joint estimator, then we will present the marginal one.

## 3

## - ESTIMATORS

## 3.1

### - Joint estimation

The general approach for jointly estimating the object and the aberrations, which is sometimes called the joint Maximum A Posteriori (JMAP) approach, is defined by

where *f*(*i*_{1}, *i*_{2}, *o*, **a**; *θ*) is the joint probability density function of the data (*i*_{1},*i*_{2}), of the object *o* and of the aberrations **a**. It also depends on the set of hyperparameters *θ* (*θ* stands for the regularization parameters). *f*(*i*_{1}|*o*, a; *θ*) and *f*(*i*_{2}| *o*, **a**; *θ*) denote the likelihood of the data *i*_{1} and *i*_{2}, *f*(*o*; *θ*) and *f*(**a**; *θ*) are the *a priori* probability density function of *o* and **a**. The noise in the images is the sum of the photon noise (Poisson distributed random variable) and the Gaussian detector noise. In the case of Earth scenes, the mean signal is high enough so that the photon noise can be approximed as Gaussian. Besides, we can assume that the global noise is a stationary white Gaussian noise with a variance *σ*^{2} (the same for the two images). We choose a Gaussian prior probability distribution for the object with a mean of *o _{m}* and a covariance matrix

*R*also a Gaussian one for the aberrations with a null mean and a covariance matrix

_{o}*R*. Hence we have

_{a}where |*x*| denotes the determinant of *x, N*^{2} is the number of pixels in the image and k the number of aberrations. The criterion to minimize, *L*_{JMAP}(*o*, **a**) = − ln(*f*(*i*_{1}, *i*_{2}, *o*, **a**; *θ*)), can be written in the Fourier domain. Furthermore its derivative with respect to the object gives[10] a closed-form expression for the object *ô*(**a**) that minimizes the criterion for a given **a**.

where tilde denotes the two-dimensional DFT and *S _{o}* is the power spectral density of the object. Substituting

*ô*(

**a**) into the criterion yields a new criterion that does not depend explicitly on the object:

The *a priori* information required on the object consists of the power spectral density of the object. We choose the following model

where *E* stands for the mathematical expectation. This heuristic model and similar ones have been quite widely used[11]. Notice that this model requires the choice of three hyperparamaters *θ _{o}* = (

*k*,

*v*,

_{o}*p*).

We will note *θ _{a}* the hyperparameters due to the aberrations prior, and

*θ*those due to the noise statistics.

_{n}## 3.2

### - Marginal estimation

## 3.2.1

#### - Criterion

When phase diversity is used as a wavefront sensor, the aberrations only need to be estimated, the observed object is a nuisance parameter. We propose to estimate the sole phase or the phase and the hyperparameters by integrating the object out of the problem. This novel estimator is obtained by integrating the probability density function used for the joint estimation and it is based on the Maximum A Posteriori estimator for **a**, which is defined by

We denote *I* = (*i*_{1} *i*_{2})^{T} the vector which concatenates the data. *I* depends linearly on variables (*o* and *n*) that are Gaussian, so it follows a Gaussian statistics. Maximizing *f*(*i*_{1}, *i*_{2}, **a**; *θ*) = *f*(*I*, **a**; *θ*) is thus equivalent to minimizing the following criterion:

where *m _{I}* = (

*H*

_{1}

*o*

_{m}

*H*

_{2}

*o*

_{m})

^{T}is the vector which contains the mean images,

*R*is the covariance matrix of

_{I}*I*and |

*R*| its determinant.

_{I}*R*has the following expression (

_{I}**1**is the identity matrix):

## 3.2.2

#### - Relationship between the joint estimator and the marginal estimator

It can be shown[12] that the two criterions *L*_{MAP} and *L′*_{JMAP} are related by the following relationship:

Thus, the difference between the marginal estimator and the joint estimator consists of a single additional term, which in the Fourier domain is given by:

Although the two estimators differ only by one term, we shall see in section 4 that their properties differ considerably. Before that, let us see how the hyperparameters can be estimated.

## 3.3

### - Estimation of the hyperparameters

The marginal estimator as the joint one calls for the choice of the values of the hyperparameters *θ* = (*θ _{n}, θ_{o}, θ_{a}*). The difficulty for estimating

*θ*and

_{n}, θ_{o}*θ*is not equal : the hyperparameters of the aberrations

_{a}*θ*could be easily adjusted by using Kolmogorov statistics, the hyperparameters of the object

_{a}*θ*must be estimated for each object because they depend on the structure of the object. For the marginal estimator, the estimation of

_{o}*θ*and

_{o}*θ*is not a problem because they can be easily estimated jointly with the aberrations. Indeed, they are given by

_{n}There are four hyperparameters *θ _{n}* =

*σ*

^{2},

*θ*= (

_{o}*k, v*) to estimate but by doing the following change of variable

_{o}, p*μ*=

*σ*

^{2}/

*k*, we can find a closed-form expression for such that = 0. The criterion to minimize becomes

*L*

_{MAP}(

**a**,

*μ,v*). On the contrary, for the joint estimator, hyperparameters

_{o}, p, θ_{a}*θ*and

_{o}*θ*can not be jointly estimated with

_{n}*o*and

*a*. Indeed, the criterion degenerates when one seeks

*θ*together with

_{o}*o*and

**a**; in particular it is possible to find a pair (

*θ*) that does not depend on the data such that the criterion tends to minus infinity. So before minimizing

_{oe}, o_{e}*L*

_{JMAP}, these hyperparameters must be chosen by the user.

## 4

## - COMPARISON

## 4.1

### - Image simulation

In this section, we shall compare the two estimators by means of simulations. The simulations have been obtained in the following way: our object is an Earth view. The phase is generated with the first 21 Zernike polynomials (their values are listed in Table 1) and the estimated phase will be expanded on the same polynomials. Note that the coefficients *a*_{1–3} will not be estimated: the piston coefficient *a*_{1} is a constant added to the phase and has no influence on the point-spread function and the tilts coefficients *a*_{2}, *a*_{3} introduce a shift in the image that is of no importance for extended objects. The defocused amplitude for the second observation plane is 2π rad peak to peak. The simulated images are monochromatic and are sampled at the Shannon rate. They are obtained by the convolution (made by FFT) of the point-spread function and the object (see Figure 2) and corrupted by a stationary white Gaussian noise.

## Table 1:

Values of the coefficients used for simulations.

ai | a4 | a5 | a6 | a7 | a8 | a9 | a10 | a11 | a12 |

(rad) | -0.2 | 0.3 | -0.45 | 0.4 | 0.3 | -0.25 | 0.35 | 0.2 | 0.1 |

ai | a13 | a14 | a15 | a16 | a17 | a18 | a19 | a20 | a21 |

(rad) | 0.05 | -0.05 | 0.05 | 0.02 | 0.01 | -0.01 | -0.02 | 0.01 | 0.01 |

## 4.2

### - Properties

One acknowledged problem with joint estimation is that it does not ensure that bias and variance asymptotically vanish, i.e. the estimate may not converge towards the true value as the data size tends to infinity[13, 14]. On the contrary, the marginal estimator, under not very restrictive conditions, has a null asymptotic bias and variance[15, 16]. Figure 3, obtained with simulations, illustrates these asymptotic behaviors. Notice that we use a gradient-conjugate method to minimize the criterion and that these curves, like all the following ones, are the average on 20 realizations of noisy monochromatic image pairs. Furthermore we have not regularized the aberrations (an explicit regularization is not necessary when the phase is expanded on few Zernike polynomials (21)). For the two estimators, we have set the value of the hyperparameters (noise + object): σ^{2} is equal to the average number of photons per pixels and we fit the power spectral density model with the true object to obtain the hyperparameters (*k, v _{o}* and

*p*), which will call the “true” hyperparameters. For the marginal estimator (left figure 3), the root mean square error (RMSE) of the aberration decreases when the number of data increases. On contrary, for the joint estimator (right figure), the increase of the number of data has no effect on the RMSE. An intuitive explanation of this phenomenon is that, if a bigger image is used to estimate the aberrations, the size of the object, which is jointly reconstructed, increases too, so that the ratio of the number of unknowns to the number of data does not tend towards zero. Additionally the joint criterion may present local minima, whereas according to a Central Limit Theorem, when the number of data (i.e. the image size) increases, the shape of the likelihood function tends to become more and more Gaussian so that it presents less and less local minima.

## 4.3

### - Influence of the hyperparameters

The problem of the hyperparameters is not the same as we use the joint estimator or the marginal one. For the joint estimator, we have pointed out that they must be estimated by hand, we will study now the influence of the global hyperparameter (1/*k*) i.e. the one that quantifies the trade-off between goodness of fit to the data and the *a priori*, on aberrations and object estimates. Figure 4 shows the RMSE on the aberrations estimates and on object estimate as a function of the value of this hyperparameter (we varied 1/*k* around its true value which corresponds to 1 in figure 4). We see that the best value of this hyperparameter is not the same for the object and for the aberrations. For the estimation of the aberrations, the reconstruction is better when the object is under-regularized. The behavior of the curve corresponding to the aberrations depends on the noise level: for a high noise level (14% here), there is one optimal hyperparameter but for lower noise levels (4%), any value under 1, including a null regularization, leads to an accurate estimation of the aberrations even though the jointly estimated object is totally noisy. This observation sheds some light on the fact that, for low noise level, unregularized phase estimation has been successfully used in the literature [17, 18] and also that explains the lack of regularization used on the object[19, 2, 3, 6] (it is difficult to regularize the object without losing on the aberrations estimates). This observation also leads us to study the asymptotic behavior of the joint estimator with no regularization. Figure 6 shows the results. In this case, when the number of data increases, the RMSE on the aberrations estimates decreases. The number of data samples over the number of unknowns is the same as for the estimation with the true hyperparameters but here, the estimator behaves, surprisingly, as if the object were not being estimated. For the marginal estimator, we compare the quality of the aberrations reconstruction obtained by minimizing *L*_{MAP} with the true hyperparameters and by minimizing *L*_{MAP}(*a, μ, v _{o}, p, θ_{a}*) which we call unsupervised estimation, for two different image sizes. From figure 5, we see that for low noise levels, the agreement is very good. For 128 × 128 pixels, it is quite good for any noise level. For 32 × 32 pixels and high noise level, the reconstruction is seriously degraded because of the lack of information contained in the noisy data.

## 4.4

### - Numerical comparison

We present results of the comparison between the joint estimator and the marginal estimator. In order to compare these estimators in a realistic way, we use the joint estimator with a null regularization and for the marginal approach, the hyperparameters are reconstructed jointly with the aberrations. We compared the root mean square error of the aberration estimates as a function of noise level for two image sizes (32×32 pixels and 128×128 pixels). The results for the two estimators are plotted in figure 7. We can see at least two different domains: when the noise level *n* < 5%, the two estimators give approximately the same results. For 5% < *n* < 20%, the MAP estimation is better. We have noticed that the results, for very high noise level *n* > 15% and an image size of 32 × 32 pixels, depends on the value of the aberrations coefficients i.e. the joint estimator could give better results than the marginal estimator. Anyway, in practice, this noise level is never reached. In terms of computational time, these two methods are quite equivalent.

## 5

## - CONCLUSION

We have presented two estimators. The first one is based on joint estimation of the aberratons and the object. It is the usual estimator in the field of phase diversity. We have shown its properties with simulations and pointed out its surprisingly good behavior for null regularization. We have also proposed a novel marginal estimator which estimates the sole aberration parameters and allows unsupervised estimation of the hyperparameters. We have shown, with simulations, its good asymptotic behavior. Finally, we have compared the performance of these estimators, for the reconstruction of the aberrations, for different image sizes and different noise levels. It appears that the marginal estimator gives equal or better reconstructions than the joint estimator while presenting a better coherence.