Simulated annealing approach to temperature – emissivity separation in thermal remote sensing

. The method of simulated annealing is adapted to the temperature – emissivity separation problem. A patch of surface at the bottom of the atmosphere is assumed to be a gray-body emitter with spectral emissivity ϵ ð k Þ describable by a mixture of spectral endmembers. We prove that a simulated annealing search conducted according to a suitable schedule converges to a solution maximizing the a-posteriori probability that spectral radiance detected at the top of the atmosphere originates from a patch with stipulated T and ϵ ð k Þ . Any such solution will be nonunique. The average of a large number of simulated annealing solutions, however, converges almost surely to a unique maximum a-posteriori (MAP) solution for T and ϵ ð k Þ . © The Authors. Published by SPIE under a Creative Commons Attribution 3.0 Unported License. Distribution or reproduction of this work in whole or in part requires full attribution of the original publication, including its DOI. [DOI: 10.1117/1.JRS.10.040501]


Introduction
The temperature-emissivity separation (TES) problem bedevils any attempt to extract spectral information from remote sensing in the thermal infrared.][3][4][5][6][7][8][9][10][11][12][13][14][15][16][17] In this letter we investigate the application of simulated annealing to a MAP solution of the TES problem.The approach is an extension of earlier work on Bayesian TES. 3,5Simulated annealing cannot give a unique solution to this problem, but we shall show that the average of a large number of simulated annealing TES solutions converges almost surely to a unique TES estimate.
A key part of any simulated annealing algorithm is the choice of an annealing schedule that causes the posterior probabilities to transition from nearly uniform to very tight in such a way as to evade the risk of the MAP search from converging to a local, rather than a global, optimum.Factors that enter into the choice of the annealing schedule are described in Refs.19  and 20.In what follows, we shall stipulate that a suitable annealing schedule has been supplied and shall concern ourselves with the existence of a solution to the simulated annealing TES problem.
3 Simulated Annealing and the Temperature-Emissivity Separation Problem

Metropolis Algorithm Search for MAP Solution
Suppose that we have in our possession prior knowledge that a target patch, forming part of the lower boundary of the atmosphere, is composed of an intimate mixture of m þ 1 spectral endmembers fϵ i ðkÞg at temperature T. The label k may, depending upon context, refer to the wavenumber or to a finite number of wavenumber-averaged spectral bands.
For the development that follows, it is desirable to restate some of the main definitions used in spectral mixing theory using language borrowed from topology. 21,22Let m þ 1 distinct points y 0 ; y 1 ; • • • y m in R m be chosen so that the vectors y 1 − y 0 ; y 2 − y 0 • • • y m − y 0 are linearly independent.Then the set E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 1 ; 1 1 6 ; 5 8 4 with E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 2 ; 1 1 6 ; 5 2 5 and E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 3 ; 1 1 6 ; 4 8 7 is an m-simplex.
A spectral mixture amounts to a mapping into a geometric m-simplex, whose vertices have spectral endmembers at a stipulated temperature T for "coefficients."A mixture with stipulated weights λ i corresponds to the vector E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 4 ; 1 1 6 ; 3 9 6   x ¼ The "interior" of K m is the subset of K m for which λ i > 0, i.e., the closure of its interior.The "polyhedron" of K m , denoted as jK m j, is the set comprised of the points of x ∈ K m considered as a subset of R m and is a convex compact subset of R m .In the case m þ 1 ¼ 3, a familiar example of a 2-simplex is the ternary diagram used to classify phreatic igneous rocks.The double three-component diagram used in the quartz, alkali feldspar, plagioclase, feldspathoid classification 23 scheme is a union of two 2-simplices, and is an example of a "simplicial complex."For present purposes, the i'th pure endmember for the n'th trial is assigned to the i'th vertex of K m E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 5 ; 1 1 6 ; 2 3 9 with the spectral mixture corresponding to a point in the polyhedron of K m .It is necessary to account for surface temperature in a somewhat different way.Let the minimum and maximum physically admissible surface temperatures be T min and T max , respectively.Then the temperature of our target patch is given as E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 6 ; 1 1 6 ; 1 6 0 with E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 7 ; 1 1 6 ; 1 1 7 0 ≤ λ mþ1 ≤ 1: Corresponding to x introduced already, we have from Eq. ( 6) the unit interval, with E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 9 ; 1 1 6 ; 7 0 1 The quantity that appears in the forward model for the n'th trial is E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 1 0 ; 1 1 6 ; 6 6 3 where B k ðT n Þ is the (band-integrated, as needed) Planck function at temperature T n .The parametrization of the choice fT n ; ϵðkÞg in terms of the vector x is a mapping into the topological product ; t e m p : i n t r a l i n k -; e 0 1 1 ; 1 1 6 ; 5 7 8 of I 1 and jK m j.The set H mþ1 is not a simplex nor is it necessarily a simplicial complex.It is, however, a convex polytope and is the convex hull of its vertices Although we will not need it in what follows, H mþ1 can be decomposed into either a simplicial complex or a union of simplices.We score trial mixtures by that we most wish to maximize; the posterior probability for the observed spectral radiance to originate from a surface patch with temperature T and spectral emissivity ϵðkÞ.A standard argument 3,5 gives the posterior probability in terms of a MAXENT estimator in terms of a forward model E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 1 3 ; 1 1 6 ; 3 9 3 that is some function of the n'th trial, in each spectral bin k.We note that while the equation of transfer is linear, the dependence of its solution I FM upon ϵ i ðkÞB k ðT n Þ need not be.The assumed noise variance σ 2 is shown as having a formal dependence upon a parameter, the "annealing temperature" T a , which governs the annealing schedule for the search for an MAP solution.
The joint posterior probability in J spectral bands is proportional to If radiance I k in each of J bands originating from a patch on the Earth's surface has been detected at the top of the atmosphere, the posterior probability that the surface patch is at a temperature T given prior knowledge K is given by Bayes' theorem as The noise variance is assumed to be known and the functional dependence of probabilities upon σ i is omitted.The prior probability PðfI i gjKÞ for the radiances fI k g has no dependence upon T and for our purposes may be absorbed into an overall normalization. 24Equation ( 15) is evaluated with the aid of the prior probability for the surface to be at temperature T and has spectral emissivity ϵðkÞ, given available knowledge K 5 where P½T; ϵ i ðkÞjfI i g; K is the conditional probability for the hypothesis that the surface temperature is T, and the spectral emissivity ϵ k , given observed radiances fI i g and prior knowledge K.Each trial is thus scored according to the joint posterior probability for observed spectral radiance I i to result from surface temperature T and spectral emissivity ϵ k E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 1 7 ; 1 1 6 ; 6 7 5 p n ¼ P½T n ; ϵðkÞjfI i g; K ≡ p n ðxÞ; (17)   where x stands for fx; x mþ1 g.Thus, in going from the ðn − 1Þ'th to the n'th trial, the n'th candidate mixture is selected by Monte Carlo draw and p n for the new trial is compared to p n−1 for the last one.The probability that it is accepted is 18,20 E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 1 8 ; 1 1 6 ; 6 0 9 where the probability PðT a Þ of taking a downward step in p n is determined by the annealing schedule.The dependence of PðT a Þ on the annealing schedule is symbolized by the annealing temperature T a , which is taken to decrease systematically during the MAP search.The actual form PðT a Þ takes in practical calculations and is determined empirically.

Convergence
We now examine the question of convergence.Corresponding to the sequence of m-simplices, K m , as the number of trials n increases without bound is a sequence of trials fT n ; ϵ i ðkÞg with associated loci fxg ∈ H mþ1 .As a closed bounded subset of R mþ1 , H mþ1 is a compactum.Therefore, as n → ∞, the sequence of trials x contains a convergent subsequence, whatever the value of m.Correspondingly, the sequence of posterior probabilities likewise has a convergent subsequence that, by construction, tends to the maximum value of the posterior probability, i.e., to an MAP solution for T and ϵðkÞ.
Consider the map x 0 ¼ ΦðxÞ given by E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 1 9 ; 1 1 6 ; 3 7 1 The mapping of Eq. ( 19) gives the action of the Metropolis algorithm according to Eq. ( 18) at sufficiently late times in the annealing schedule so that a transition to a state of decreased posterior probability occurs rarely, in the limit, almost never.We have noted that at a sufficiently late point in the annealing schedule, trials that decrease the posterior probability [Eq.( 17)] will become infrequent.We may elide any such trials without affecting the convergence of the subsequence, which then takes the form E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 2 0 ; 1 1 6 ; 2 5 4 For all n greater than some M, convergence of the subsequence implies the Cauchy condition E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 2 1 ; 1 1 6 ; 2 1 1 (with the Euclidean norm supplying a suitable metric for finite m) so that E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 2 2 ; 1 1 6 ; 1 6 8 x → ΦðxÞ: The mapping of Eq. ( 19) generates a sequence of trials x for which p n is nondecreasing.By Zorn's lemma, the set comprised of all admissible trials x has at least one element with a maximal value of p n .We note that maximizing p n also maximizes the information-theoretic entropy by Eq. (12).According to the usual statement of the second law, the state of maximum entropy is one of thermodynamic equilibrium.We may, in view of the thermodynamic analogy underlying the Metropolis algorithm, therefore call the limit Eq. ( 22) an equilibrium point.This nomenclature is attractive for another reason.In the limit, Eq. ( 22) amounts to a fixed point of Eq. ( 19).Ordinary fixed-point theorems are inapplicable to Eq. ( 19) because it is neither continuous nor semicontinuous: It can map an open set ∈ H mþ1 to a singleton x 0 .We can, however, adapt the celebrated construction introduced by Nash 25 to prove the existence of a fixed (equilibrium) point of an equivalent self-mapping.
In fact, we shall prove a somewhat stronger result.Consider E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 2 3 ; 1 1 6 ; 6 6 3 for stipulated x.The function ϕ is continuous in the mixture x α .Define the mapping N∶x → x 0 by E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 2 4 ; 1 1 6 ; 6 0 9 where the index α is taken to run over members of any finite set of admissible trials x α in the execution of the Metropolis algorithm.(One may think of the collection of all sequences x α in ensemble-theoretic terms.)Suppose that x 0 is a fixed point under Eq.( 24).In Eq. ( 24), some values of α correspond to choices for fT i ; ϵ i ðkÞg for which the posterior probability does not increase E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 2 5 ; 1 1 6 ; 4 9 7 For these values of α E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 2 6 ; 1 1 6 ; 4 5 4 If the choice x is fixed under the mapping N in Eq. ( 24), then the contribution to x 0 from any x β must not decrease; therefore, ϕ β ¼ 0, ∀ β, lest the denominator in Φ exceed unity.Put another way, no other choice of fT i ; ϵ i ðkÞg can increase the posterior probability, but that is the definition of an equilibrium point.If, on the other hand, an equilibrium point x maximizes the posterior probability [Eq.( 17)], every ϕ α vanishes, so that x is a fixed point.Equation ( 24) is continuous and maps points x into a convex compactum ⊂ R mþ1 .A fixed point E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 2 7 ; 1 1 6 ; 3 3 9 Therefore, exists according to the Brouwer fixed-point theorem that, by construction, maximizes the a-posteriori probability of x.The mapping Eq. ( 19) generates a sequence of trials x for which p n is nondecreasing and gives the maximal value of p n in the limit, while Eq. ( 27) demonstrates the existence of a trial x Ã for which p n cannot be made greater.In view of the ensemble-theoretic freedom to choose x α , we may identify the limit in Eq. ( 22) with the fixed point in Eq. (27).Therefore, a convergent subsequence of annealing trials exists that tends to an equilibrium point.Moreover, Eq. ( 27) demonstrates that the annealing search can, in principle, find x Ã in a finite number of trials.We conclude that, granted a suitable annealing schedule, there exists at least one convergent sequence of trials that tends to MAP surface temperature and spectral emissivity estimates consistent with observed spectral radiances I k .

Uniqueness
Sequential compactness guarantees the existence of a convergent subsequence of trials.In practice, we must expect that there will be more than one such sequence.The nonuniqueness of solutions to the TES problem suggests that there will be a continuous infinity of possible trials fT n ; ϵ i ðkÞg that yield any stipulated value for the posterior probability.In any realizable search strategy, however, we need only contend with a countable set of convergent subsequences.Among these, there will be one for which the posterior probability is greatest.Its limit will be the closest approach to the MAP solution achieved by simulated annealing.In the nature of things, more than one convergent subsequence may be expected to exist that yields this same maximal estimate, with the same asymptotic annealing temperature T ∞ a .We ignore all subsequences except these maximal ones.
In Refs. 5 and 3, expectation values for T and fϵðkÞg over the the posterior probability [Eq.(15)] were shown to give good estimates for physical surface temperatures and emissivities.We claim that the mean of a large number of subsequences that converge to the limiting MAP value will tend to the expectation values for T and fϵðkÞg with respect to Eq. (15).
The MAXENT estimator is constructed from the posterior probability of noise power in a spectral bin.For the sake of simplicity, we assume identical noise power in each bin.(This assumption is inessential and may be relaxed.)A fully annealed MAP estimate may be thought of as an individual Bernoulli trial drawn from the likelihood function for fT n ; ϵ i ðkÞg.By construction, all such trials are independent and identically distributed with bounded expectation values.Moments over Eq. ( 15) are bounded despite bad behavior of the Jeffreys prior at T ¼ 0, because of the rapid decay of the exponentials away from the MAP solution, as L'Hôpital's rule demonstrates.Let E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 2 8 ; 1 1 6 ; 5 3 2 and E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 2 9 ; 1 1 6 ; 4 7 0 be the means of MAP surface temperature and spectral emissivity taken over N convergent subsequences.Suppose the covariance matrix Σ of the trials to be nonsingular.We invoke the multivariate central limit theorem to conclude the mean values converge weakly to the multivariate Gaussian distribution Reliance on the mixing hypothesis in the form given by Eq. ( 10), however, brings with it the concern that the relevant covariance matrix might be singular.In that event, the strong law of large numbers 26,27 ensures E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 3 1 ; 1 1 6 ; 2 5 3 T ã :s: hTi; and E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 3 2 ; 1 1 6 ; 2 0 3 ϵðkÞ ã :s: hϵðkÞi; but without giving estimated variances of the mean values, such as with Eq. (30).
To the extent that the estimator used in the simulated annealing search is zero-mean error, we conclude the estimates yield accurate values for the physical values of T and fϵðkÞg.As the spectral weights x, lying as they do between zero and unity, possess bounded moments, this conclusion applies to the limiting mean values of fT n ; fx n m gg as well.The practical utility of the mathematical development in this letter may be questioned.We briefly address two possible concerns.A legitimate concern is that the spectral emissivity of natural ground covers in the wild will seldom be known to the level of accuracy found in Ref. 28.While true in general, this concern has not dissuaded other researchers from relying upon spectral unmixing.
While convergence of the algorithm has been proved to our satisfaction, we have no equally satisfactory estimates of the rate of convergence, with the consequence that the choice of annealing schedule remains a matter of trial and error.In response to this concern, the availability of massively parallel computation made possible by the ready availability of cheap graphics processing unit arrays means that massive processing requirements need not preclude the use of a resourcehungry algorithm if that algorithm can provide a performance not attainable by other approaches.

Conclusion
In this letter, the Metropolis algorithm has been adapted to formulate a simulated annealing approach to the TES problem for a spectral mixture.We have presented a proof of convergence of simulated annealing searches for candidate MAP TES solutions.We have additionally shown that the average of a large number of these candidate MAP solutions converges almost surely to a unique estimate of surface temperature and spectral emissivity that, given a forward model leading to an unbiassed estimator for T and fϵ k g, closely approximates the true values of these quantities.
The simulated annealing approach to TES by spectral unmixing offers something that other TES algorithms do not.By construction, it gives (in the limit) the unique best estimate in an MAP sense, for the remote determination of surface temperature and spectral emissivity of a patch of ground that is known to be comprised of a spectral mixture of a stipulated set of spectral end members.
E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 1 6 ; 1 1 6 ; 9 9 P½T; ϵðkÞjKdT ∝ Y k dϵðkÞ dT T ;

E
Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 ↝N m ð0; ΣÞ: