*ε*

**(k)**describable by a mixture of spectral endmembers. We prove that a simulated annealing search conducted according to a suitable schedule converges to a solution maximizing the

*a-posteriori*probability that spectral radiance detected at the top of the atmosphere originates from a patch with stipulated

**and**

*T**ε*

**(k)**. Any such solution will be nonunique. The average of a large number of simulated annealing solutions, however, converges almost surely to a unique maximum

*a-posteriori*(MAP) solution for

**and**

*T**ε*

**(k)**.

## 1.

## Introduction

The temperature–emissivity separation (TES) problem bedevils any attempt to extract spectral information from remote sensing in the thermal infrared. Numerous methods have been proposed for handling TES.^{1}2.3.4.5.6.7.8.9.10.11.12.13.14.15.16.^{–}^{17} In this letter we investigate the application of simulated annealing to a MAP solution of the TES problem. The approach is an extension of earlier work on Bayesian TES.^{3}^{,}^{5} Simulated annealing cannot give a unique solution to this problem, but we shall show that the average of a large number of simulated annealing TES solutions converges almost surely to a unique TES estimate.

## 2.

## Background

Simulated annealing has traditionally been regarded as a preferred method of global solution for combinatorial optimization problems, such as the traveling salesman. In this letter, we adapt the Metropolis algorithm^{18}19.^{–}^{20} to an optimization problem that lacks a unique global optimal solution: TES. The underdetermined TES problem, notoriously,^{1}2.3.4.5.6.7.8.9.10.11.12.13.14.15.16.^{–}^{17} has a continuous infinity of solutions that yield the identical optimum value for any cost or payoff function one cares to choose.

A key part of any simulated annealing algorithm is the choice of an annealing schedule that causes the posterior probabilities to transition from nearly uniform to very tight in such a way as to evade the risk of the MAP search from converging to a local, rather than a global, optimum. Factors that enter into the choice of the annealing schedule are described in Refs. 19 and 20. In what follows, we shall stipulate that a suitable annealing schedule has been supplied and shall concern ourselves with the existence of a solution to the simulated annealing TES problem.

## 3.

## Simulated Annealing and the Temperature–Emissivity Separation Problem

## 3.1.

### Metropolis Algorithm Search for MAP Solution

Suppose that we have in our possession prior knowledge that a target patch, forming part of the lower boundary of the atmosphere, is composed of an intimate mixture of $m+1$ spectral endmembers $\{{\epsilon}_{i}(k)\}$ at temperature $T$. The label $k$ may, depending upon context, refer to the wavenumber or to a finite number of wavenumber-averaged spectral bands.

For the development that follows, it is desirable to restate some of the main definitions used in spectral mixing theory using language borrowed from topology.^{21}^{,}^{22} Let $m+1$ distinct points ${\mathbf{y}}_{0},{\mathbf{y}}_{1},\cdots {\mathbf{y}}_{m}$ in ${\mathbf{R}}^{\mathbf{m}}$ be chosen so that the vectors ${\mathbf{y}}_{1}-{\mathbf{y}}_{0},{\mathbf{y}}_{2}-{\mathbf{y}}_{0}\cdots {\mathbf{y}}_{m}-{\mathbf{y}}_{0}$ are linearly independent. Then the set

## (2)

$${\lambda}_{i}\ge 0,\phantom{\rule[-0.0ex]{1em}{0.0ex}}\forall \text{\hspace{0.17em}\hspace{0.17em}}i,$$A spectral mixture amounts to a mapping into a geometric $m$-simplex, whose vertices have spectral endmembers at a stipulated temperature $T$ for “coefficients.” A mixture with stipulated weights ${\lambda}_{i}$ corresponds to the vector

The “interior” of ${K}_{m}$ is the subset of ${K}_{m}$ for which ${\lambda}_{i}>0$, i.e., the closure of its interior. The “polyhedron” of ${K}_{m}$, denoted as $|{K}_{m}|$, is the set comprised of the points of $\mathbf{x}\in {K}_{m}$ considered as a subset of ${\mathbf{R}}^{m}$ and is a convex compact subset of ${\mathbf{R}}^{\mathbf{m}}$.In the case $m+1=3$, a familiar example of a 2-simplex is the ternary diagram used to classify phreatic igneous rocks. The double three-component diagram used in the quartz, alkali feldspar, plagioclase, feldspathoid classification^{23} scheme is a union of two 2-simplices, and is an example of a “simplicial complex.”

For present purposes, the $i$’th pure endmember for the $n$’th trial is assigned to the $i$’th vertex of ${K}_{m}$

with the spectral mixture corresponding to a point in the polyhedron of ${K}_{m}$.It is necessary to account for surface temperature in a somewhat different way. Let the minimum and maximum physically admissible surface temperatures be ${T}_{\mathrm{min}}$ and ${T}_{\mathrm{max}}$, respectively. Then the temperature of our target patch is given as

with Corresponding to $\mathbf{x}$ introduced already, we have from Eq. (6) the unit interval, with The quantity that appears in the forward model for the $n$’th trial is## (10)

$$\u27e8\epsilon (k){B}_{k}({T}_{n})\u27e9=\sum _{i=1}^{m}{\lambda}_{i}{\epsilon}_{i}(k){B}_{k}({T}_{n}),$$We score trial mixtures by that we most wish to maximize; the posterior probability for the observed spectral radiance to originate from a surface patch with temperature $T$ and spectral emissivity $\epsilon (k)$. A standard argument^{3}^{,}^{5} gives the posterior probability in terms of a MAXENT estimator

## (12)

$$P(I|T,\epsilon ,\sigma )=\mathrm{exp}[-\frac{{(I-{I}_{\mathrm{FM}})}^{2}}{2{\sigma}^{2}({T}_{\mathrm{a}})}]\frac{\mathrm{d}I}{\sigma ({T}_{\mathrm{a}})},$$## (13)

$${I}_{\mathrm{FM}}=f[\sum _{i=1}^{m}{\lambda}_{i}{\epsilon}_{i}(k){B}_{k}({T}_{n})]\iff f(x),$$## (14)

$$P(\{{I}_{k}\}|T,\epsilon ,\sigma )=\prod _{k=1}^{J}\mathrm{exp}\{-\frac{{[{I}_{k}-{I}_{\mathrm{FM}}(k)]}^{2}}{2{\sigma}^{2}({T}_{\mathrm{a}})}\}\frac{\mathrm{d}I}{\sigma ({T}_{\mathrm{a}})}.$$## (15)

$$P(T,{\epsilon}_{i}(k)|\{{I}_{k}\},K)=P[T,\epsilon (k)|K]\frac{P[\{{I}_{k}\}|T,{\epsilon}_{i}(k),K]}{P(\{{I}_{k}\}|K)}.$$^{24}Equation (15) is evaluated with the aid of the prior probability for the surface to be at temperature $T$ and has spectral emissivity $\epsilon (k)$, given available knowledge $K$

^{5}where $P[T,{\epsilon}_{i}(k)|\{{I}_{i}\},K]$ is the conditional probability for the hypothesis that the surface temperature is $T$, and the spectral emissivity ${\epsilon}_{k}$, given observed radiances $\{{I}_{i}\}$ and prior knowledge $K$.

Each trial is thus scored according to the joint posterior probability for observed spectral radiance ${I}_{i}$ to result from surface temperature $T$ and spectral emissivity ${\epsilon}_{k}$

where $x$ stands for $\{\mathbf{x},{x}_{m+1}\}$. Thus, in going from the $(n-1)$’th to the $n$’th trial, the $n$’th candidate mixture is selected by Monte Carlo draw and ${p}_{n}$ for the new trial is compared to ${p}_{n-1}$ for the last one. The probability that it is accepted is^{18}

^{,}

^{20}

## (18)

$$P=\{\begin{array}{ll}1& \text{if}\text{\hspace{0.17em}\hspace{0.17em}}{p}_{n}/{p}_{n-1}\ge 1\\ P({T}_{\mathrm{a}})& \text{otherwise}\end{array},$$## 3.2.

### Convergence

We now examine the question of convergence. Corresponding to the sequence of $m$-simplices, ${K}_{m}$, as the number of trials $n$ increases without bound is a sequence of trials $\{{T}_{n},{\epsilon}_{i}(k)\}$ with associated loci $\{x\}\in {H}^{m+1}$.

As a closed bounded subset of ${\mathbf{R}}^{m+1}$, ${H}^{m+1}$ is a compactum. Therefore, as $n\to \infty $, the sequence of trials $x$ contains a convergent subsequence, whatever the value of $m$. Correspondingly, the sequence of posterior probabilities likewise has a convergent subsequence that, by construction, tends to the maximum value of the posterior probability, i.e., to an MAP solution for $T$ and $\epsilon (k)$.

Consider the map ${x}^{\prime}=\mathrm{\Phi}(x)$ given by

## (19)

$$\mathrm{\Phi}(x)=\{\begin{array}{ll}{x}^{\prime}& \text{if}\text{\hspace{0.17em}\hspace{0.17em}}{p}_{n}({x}^{\prime})-{p}_{n}(x)\ge 0\\ x& \text{otherwise}.\end{array}.$$The mapping of Eq. (19) generates a sequence of trials $x$ for which ${p}_{n}$ is nondecreasing. By Zorn’s lemma, the set comprised of all admissible trials $x$ has at least one element with a maximal value of ${p}_{n}$. We note that maximizing ${p}_{n}$ also maximizes the information-theoretic entropy by Eq. (12). According to the usual statement of the second law, the state of maximum entropy is one of thermodynamic equilibrium. We may, in view of the thermodynamic analogy underlying the Metropolis algorithm, therefore call the limit Eq. (22) an equilibrium point.

This nomenclature is attractive for another reason. In the limit, Eq. (22) amounts to a fixed point of Eq. (19). Ordinary fixed-point theorems are inapplicable to Eq. (19) because it is neither continuous nor semicontinuous: It can map an open set $\in {H}^{m+1}$ to a singleton ${x}^{\prime}$. We can, however, adapt the celebrated construction introduced by Nash^{25} to prove the existence of a fixed (equilibrium) point of an equivalent self-mapping.

In fact, we shall prove a somewhat stronger result. Consider

for stipulated $x$. The function $\phi $ is continuous in the mixture ${x}_{\alpha}$. Define the mapping $N:x\to {x}^{\prime}$ by## (24)

$${x}^{\prime}=\frac{x+\sum _{\alpha}{\phi}_{\alpha}{x}_{\alpha}}{1+\sum _{\alpha}{\phi}_{\alpha}},$$Equation (24) is continuous and maps points $x$ into a convex compactum $\subset {\mathbf{R}}^{m+1}$. A fixed point

Therefore, exists according to the Brouwer fixed-point theorem that, by construction, maximizes the*a-posteriori*probability of $x$.

The mapping Eq. (19) generates a sequence of trials $x$ for which ${p}_{n}$ is nondecreasing and gives the maximal value of ${p}_{n}$ in the limit, while Eq. (27) demonstrates the existence of a trial ${x}^{*}$ for which ${p}_{n}$ cannot be made greater. In view of the ensemble-theoretic freedom to choose ${x}_{\alpha}$, we may identify the limit in Eq. (22) with the fixed point in Eq. (27). Therefore, a convergent subsequence of annealing trials exists that tends to an equilibrium point. Moreover, Eq. (27) demonstrates that the annealing search can, in principle, find ${x}^{*}$ in a finite number of trials. We conclude that, granted a suitable annealing schedule, there exists at least one convergent sequence of trials that tends to MAP surface temperature and spectral emissivity estimates consistent with observed spectral radiances ${I}_{k}$.

## 3.3.

### Uniqueness

Sequential compactness guarantees the existence of a convergent subsequence of trials. In practice, we must expect that there will be more than one such sequence. The nonuniqueness of solutions to the TES problem suggests that there will be a continuous infinity of possible trials $\{{T}_{n},{\epsilon}_{i}(k)\}$ that yield any stipulated value for the posterior probability. In any realizable search strategy, however, we need only contend with a countable set of convergent subsequences. Among these, there will be one for which the posterior probability is greatest. Its limit will be the closest approach to the MAP solution achieved by simulated annealing. In the nature of things, more than one convergent subsequence may be expected to exist that yields this same maximal estimate, with the same asymptotic annealing temperature ${T}_{\mathrm{a}}^{\infty}$. We ignore all subsequences except these maximal ones.

In Refs. 5 and 3, expectation values for $T$ and $\{\epsilon (k)\}$ over the the posterior probability [Eq. (15)] were shown to give good estimates for physical surface temperatures and emissivities. We claim that the mean of a large number of subsequences that converge to the limiting MAP value will tend to the expectation values for $T$ and $\{\epsilon (k)\}$ with respect to Eq. (15).

The MAXENT estimator is constructed from the posterior probability of noise power in a spectral bin. For the sake of simplicity, we assume identical noise power in each bin. (This assumption is inessential and may be relaxed.) A fully annealed MAP estimate may be thought of as an individual Bernoulli trial drawn from the likelihood function for $\{{T}_{n},{\epsilon}_{i}(k)\}$. By construction, all such trials are independent and identically distributed with bounded expectation values. Moments over Eq. (15) are bounded despite bad behavior of the Jeffreys prior at $T=0$, because of the rapid decay of the exponentials away from the MAP solution, as L’Hôpital’s rule demonstrates.

Let

and be the means of MAP surface temperature and spectral emissivity taken over $N$ convergent subsequences. Suppose the covariance matrix $\mathbf{\Sigma}$ of the trials to be nonsingular. We invoke the multivariate central limit theorem to conclude the mean values converge weakly to the multivariate Gaussian distribution## (30)

$$\sqrt{N}\left[\begin{array}{c}\overline{T}-\u27e8T\u27e9\\ \overline{\epsilon (1)}-\u27e8\epsilon (1)\u27e9\\ \vdots \\ \overline{\epsilon (m)}-\u27e8\epsilon (m)\u27e9\end{array}\right]\rightsquigarrow {\mathbf{N}}_{m}(0,\mathbf{\Sigma}).$$Reliance on the mixing hypothesis in the form given by Eq. (10), however, brings with it the concern that the relevant covariance matrix might be singular. In that event, the strong law of large numbers^{26}^{,}^{27} ensures

To the extent that the estimator used in the simulated annealing search is zero-mean error, we conclude the estimates yield accurate values for the physical values of $T$ and $\{\epsilon (k)\}$. As the spectral weights $\mathbf{x}$, lying as they do between zero and unity, possess bounded moments, this conclusion applies to the limiting mean values of $\{{T}_{n},\{{x}_{m}^{n}\}\}$ as well.

## 4.

## Discussion

The practical utility of the mathematical development in this letter may be questioned. We briefly address two possible concerns.

A legitimate concern is that the spectral emissivity of natural ground covers in the wild will seldom be known to the level of accuracy found in Ref. 28. While true in general, this concern has not dissuaded other researchers from relying upon spectral unmixing.

While convergence of the algorithm has been proved to our satisfaction, we have no equally satisfactory estimates of the rate of convergence, with the consequence that the choice of annealing schedule remains a matter of trial and error. In response to this concern, the availability of massively parallel computation made possible by the ready availability of cheap graphics processing unit arrays means that massive processing requirements need not preclude the use of a resource-hungry algorithm if that algorithm can provide a performance not attainable by other approaches.

## 5.

## Conclusion

In this letter, the Metropolis algorithm has been adapted to formulate a simulated annealing approach to the TES problem for a spectral mixture. We have presented a proof of convergence of simulated annealing searches for candidate MAP TES solutions. We have additionally shown that the average of a large number of these candidate MAP solutions converges almost surely to a unique estimate of surface temperature and spectral emissivity that, given a forward model leading to an unbiassed estimator for $T$ and $\{{\epsilon}_{k}\}$, closely approximates the true values of these quantities.

The simulated annealing approach to TES by spectral unmixing offers something that other TES algorithms do not. By construction, it gives (in the limit) the unique best estimate in an MAP sense, for the remote determination of surface temperature and spectral emissivity of a patch of ground that is known to be comprised of a spectral mixture of a stipulated set of spectral end members.

## Acknowledgments

Support from the Aerospace Corporation Technical Investment Program under USAF Contract No. FA8802-14-C-0001 is acknowledged.

## References

A. Barducci et al., “Maximum entropy temperature-emissivity separation in the TIR spectral range using the MaxEnTES algorithm,” Infrared Phys. Technol. 56, 12–20 (2013).BAMOAD0273-0979http://dx.doi.org/10.1016/j.infrared.2012.09.002IPTEEY1350-4495Google Scholar

Z.-L. Li et al., “Land surface emissivity retrieval from satellite data,” Int. J. Remote Sens. 34, 3084–3127 (2013).IJSEDK0143-1161http://dx.doi.org/10.1080/01431161.2012.716540Google Scholar

J. A. Morgan, “Comparison of Bayesian land surface temperature algorithm performance with Terra MODIS observations,” Int. J. Remote Sens. 32, 8139–8159 (2012).IJSEDK0143-1161http://dx.doi.org/10.1080/01431161.2010.532824Google Scholar

J. A. Sobrino et al., “Land surface temperature derived from airborne hyperspectral scanner thermal infrared data,” Remote Sens. Environ. 102, 99–115 (2006).http://dx.doi.org/10.1016/j.rse.2006.02.001Google Scholar

J. A. Morgan, “Bayesian estimation for land surface temperature retrieval: the nuisance of emissivities,” IEEE Trans. Geosci. Remote Sens. 43, 1279–1288 (2005).IGRSD20196-2892http://dx.doi.org/10.1109/TGRS.2005.845637Google Scholar

P. Dash et al., “Land surface temperature and emissivity estimation from passive sensor data: theory and practice-current trends,” Int. J. Remote Sens. 23, 2563–2594 (2002).IJSEDK0143-1161http://dx.doi.org/10.1080/01431160110115041Google Scholar

Z.-M. Wan, MODIS Land-Surface Temperature Algorithm Theoretical Basis Document, Institute for Computational Earth System Science, University of California, Santa Barbara (1999).Google Scholar

Z.-L. Li et al., “Evaluation of six methods for extracting relative emissivity spectra from thermal infrared images,” Remote Sens. Environ. 69, 97–214 (1999).http://dx.doi.org/10.1016/S0034-4257(99)00049-8Google Scholar

A. B. Kahle and R. E. Alley, “Separation of temperature and emittance in remotely sensed radiance measurements,” Remote Sens. Environ. 42, 107–111 (1992).http://dx.doi.org/10.1016/0034-4257(92)90093-YGoogle Scholar

P. S. Kealy and S. J. Hook, “Separating temperature and emissivity in thermal infrared multispectral scanner data: implications for recovering land surface temperatures,” IEEE Trans. Geosci. Remote Sens. 31, 1155–1164 (1993).IGRSD20196-2892http://dx.doi.org/10.1109/36.317447Google Scholar

F. Petitcolin and E. F. Vermote, “Land surface reflectance, emissivity and temperature from MODIS middle and thermal infrared data,” Remote Sens. Environ. 83(1–2), 112–134 (2002).http://dx.doi.org/10.1016/S0034-4257(02)00094-9Google Scholar

Z.-L. Li and F. Becker, “Feasibility of land surface temperature and emissivity determination from AVHRR data,” Remote Sens. Environ. 43, 67–85 (1993).http://dx.doi.org/10.1016/0034-4257(93)90065-6Google Scholar

K. Watson, “Spectral ratio method for measuring emissivity,” Remote Sens. Environ. 42, 113–116 (1992).http://dx.doi.org/10.1016/0034-4257(92)90094-ZGoogle Scholar

C. C. Borel, J. Szymanski, “Physics-based water and land temperature retrieval,” in Handbook of Science Algorithms for the Multispectral Thermal Imager, and B. W. Smith, Ed., Los Alamos National Laboratory and Savannah River Technology Center, Los Alamos, New Mexico (1998).Google Scholar

Z.-M. Wan and J. Dozier, “A generalized split-window algorithm for retrieving land-surface temperature from space,” IEEE Trans. Geosci. Remote Sens. 34, 892–905 (1996).IGRSD20196-2892http://dx.doi.org/10.1109/36.508406Google Scholar

A. Barducci and I. Pippi, “Temperature and emissivity retrieval from remotely sensed images using the ‘Grey body emissivity’ method,” IEEE Trans. Geosci. Remote Sens. 34, 681–695 (1996).IGRSD20196-2892http://dx.doi.org/10.1109/36.499748Google Scholar

Z.-M. Wan and Z.-L. Li, “A physics-based algorithm for land-surface emissivity and temperature from EOS/MODIS data,” IEEE Trans. Geosci. Remote Sens. 35, 980–996 (1997).IGRSD20196-2892http://dx.doi.org/10.1109/36.602541Google Scholar

N. Metropolis et al., “Equations of state calculations by fast computing machines,” J. Chem. Phys. 21, 1087–1091 (1953).JCPSA60021-9606http://dx.doi.org/10.1063/1.1699114Google Scholar

S. Kirkpatrick, C. D. Gelatt and M. P. Vecchi, “Optimization by simulated annealing,” Science 220, 671–680 (1983).SCIEAS0036-8075http://dx.doi.org/10.1126/science.220.4598.671Google Scholar

W. Press et al., Numerical Recipes in C, pp. 326–334, Cambridge University Press, Cambridge (1988).Google Scholar

C. R. F. Maunder, Algebraic Topology, pp. 31–32, Dover, Mineola, New York (1996).Google Scholar

J. J. Rotman, An Introduction to Algebraic Topology, pp. 31–38, Springer-Verlag, New York (1988).Google Scholar

M. J. Le Bas and A. L. Streckeisen, “The IUGS systematics of igneous rocks,” J. Geol. Soc. London 148, 825–833 (1991).http://dx.doi.org/10.1144/gsjgs.148.5.0825Google Scholar

L. Bretthorst, “Bayesian spectrum analysis and parameter estimation,” in Lecture Notes in Statistics, and J. Berger et al., Eds., Vol. 48, Springer-Verlag, New York (1988).Google Scholar

J. F. Nash, “Non-cooperative games,” Ann. Math. 54, 286–295 (1951).ANMAAH0003-486Xhttp://dx.doi.org/10.2307/1969529Google Scholar

J. W. Salisbury et al., Infrared (2.1–25 μ) Spectra of Minerals, Johns Hopkins University Press, Baltimore, Maryland (1992).Google Scholar

A. W. van der Waart, Asymptotic Statistics, Cambridge University Press, Cambridge (1998).Google Scholar

Z. Artstein and R. A. Vitalem, “A strong law of large numbers for random compact sets,” Ann. Probab. 3, 879–882 (1975).APBYAE0091-1798http://dx.doi.org/10.1214/aop/1176996275Google Scholar