Practical model for improved classification of trace chemical residues on surfaces in active spectroscopic measurements

Abstract. Trace chemical detection and classification in stand-off reflection-based spectroscopic data is challenging due to the variability of measured data and the lack of physics-based models that can accurately predict spectra. Most available models assume that the chemical takes the form of spherical particles or uniform thin films. A more realistic chemical presentation that could be encountered is that of a nonuniform chemical film that is deposited after evaporation of the solvent that contained the chemical. We present an improved signature model for this type of solid film. The proposed model, called sparse transfer matrix, includes a log-normal distribution of film thicknesses and is found to reduce the root mean square error between simulated and measured data by about 25% when compared with either the particle or uniform thin film models. When applied to measured data, the sparse transfer matrix model provides a 10% to 28% increase in classification accuracy over traditional models.


Introduction
Detection of trace amounts of chemicals on surfaces is a desirable capability for a wide range of applications such as forensics, defense, border protection and monitoring, and other uses throughout the law enforcement and intelligence communities. 1 Chemicals of interest for these applications include explosives, chemical warfare agents, narcotics, etc. Mid-infrared (MIR) spectroscopy is intrinsically capable of detecting such chemicals with both high sensitivity and high specificity. 2 Active spectroscopy is arguably the only technique capable of high-sensitivity standoff detection of trace chemicals on surfaces while maintaining high areal coverage rates. 3,4 A notional example of an active MIR hyperspectral imaging (HSI) system is shown in Fig. 1. The system operates by measuring the spectral reflectance of the target surface in the MIR portion of the optical spectrum. The illumination source is typically a quantum cascade laser (QCL). 3,5,6 The measured signature is compared to a spectral library of reference signatures. Because of the wide range of relevant applications for this type of technology, the spectral library often includes hundreds to thousands of reference chemicals, making the association of measured data with the reference data very challenging.
The detection and classification performance of such a system is limited by the availability of relevant datasets for properly training the detection algorithms. Because it is often not possible to measure all combinations of chemicals, chemical presentations (i.e., deposition methods), and substrates, it is important to be able to generate spectral libraries from physics-based signature models. 7 However, developing a signature model for trace chemical detection applications is challenging due to the phenomenological complexities. Representative models must account for multiple types of scattering with dependencies on the chemical, surface, and geometric properties, including particle size and distribution, surface roughness and dielectric properties, illumination angle, etc. 8,9 Furthermore, due to the high variability of chemical and surface properties in real-world data, the signature model parameters must be carefully selected. 10 Of particular interest to the research presented in this paper is the development and testing of a signature model designed specifically for modeling a trace chemical residue. Here, trace chemical residue is defined as the film-like residue that remains on a surface after the evaporation of a solvent that contained the chemical. Previous models developed for trace chemical particles (e.g., Mie scattering) [11][12][13] and uniform thin liquid films (e.g., the transfer matrix (TM) model) [14][15][16][17][18] do not sufficiently capture the case of the film-like residue. Specifically, we have found that chemical residues do not present themselves as uniform films, as shown in the photomicrographs of two chemical residue samples on glass in Fig. 2. This paper proposes a modification to the well-known TM model to specifically handle the physics of trace chemical residue. We call this modified model sparse transfer matrix (STM).
STM assumes a nonuniform film with thickness sampled from a log-normal distribution, as opposed to the standard TM model, which assumes uniform film thickness. Note that STM includes a uniform thin film as a special case. This paper is structured as follows. Section 2 gives a brief overview of previous research in the area of active spectroscopic phenomenology of chemicals. Specifically, we cover the state of the art in extracting optical constants, which are required inputs to any chemical signature model, followed by a summary of the physical, chemical, surface, and geometric properties that have been found to cause spectral signature variability. We finish this section by defining two related physics-based models for the application at hand. The proposed model for this research is described in Sec. 3. The descriptions of real and synthetic data used in this research can be found in Sec. 4.1. Section 4.2 compares the overall fit of the various signature models with measured data. The results on the classification performance improvement on real data are presented in Sec. 4.3.

Fig. 1 Notional depiction of stand-off trace chemical detection via an active spectroscopic instrument. The reference signature library is pertinent to the system's ability to detect chemicals of interest.

Background
The derivation of physics-based signature models used for active spectroscopy requires not only an understanding of the underlying physics, but also information on the various physical properties. We describe the models presented in this paper as "physics-based" to distinguish them from machine/deep learning models, which have also been applied to chemical reflectance modeling. 7 The physical models typically used in standoff active spectroscopy applications combine theoretical physics with empirical measurements or assumptions, because some, if not many, of the physical properties are not known for a given measurement. 19 In the next few subsections, we discuss some of the physical properties and empirical data that we use to estimate and predict chemical reflectance. One of the most crucial inputs to a physics-based model is the set of wavenumber-dependent complex optical constants that are unique to each chemical. 20 There is extensive literature on extracting the optical constants of chemicals, in either liquid or solid phase, as well as on signature models for calculating the reflection spectrum of contaminated surfaces. Although the development of active spectroscopic signature models has been explored for quite some time, we consider two of the most widely accepted models in this paper: one for solid particles on a surface and one for uniform thin liquid films on a surface. 19

Estimating Optical Constants
It is well known that the underlying spectral features (absorption peaks in active spectroscopy) of both the chemical and the surface derive from their complex optical constants. 20 While determining the optical constants of liquids is relatively straightforward, determining them for solids is much more complicated. One of the more widely accepted approaches for crystalline solids is single-angle reflectance spectroscopy followed by a Kramers-Kronig transformation to estimate the optical constants (ñ). [21][22][23][24] For solid minerals, DeVetter et al. have shown the advantage of using carefully designed mask apertures with low reflectance. 25 The optical constants used in this research were measured and provided by Pacific Northwest National Laboratory (PNNL) through IARPA's (Intelligence Advanced Research Projects Activity) SILMARILS (Standoff ILluminator for Measuring Absorbance and Reflectance Infrared Light Signatures) program.

Trace Chemical Phenomenology
In measured data, estimating trace chemical reflectance is not as simple as calculating the reflectance of a solid chemical at the chemical/air boundary. 26 Instead, the reflectance varies greatly with a number of factors. Some of the more significant are particle size 27 and shape 28 for solids or film thickness for liquids, 29,30 sample morphology, 31 surface roughness, 32 and sampling angle (i.e., the bidirectional reflectance distribution function, BRDF). 9,33 In addition, reflectance spectra may vary with the chemical's thermodynamic state, 34 molecular interactions, 35 and humidity. 36,37 Figure 3 demonstrates the expected variability of normalized reflectance for trace chemical films on surfaces. The curves show measurements of six identical samples (saccharin films on glass) collected by the same sensor at two slightly different measurement angles. Both the overall spectral shape and the depth of the spectral features vary considerably because of the differences in measurement geometry.

Existing Signature Models
The implementations of the Mie scattering particle model and TM uniform thin film model considered in this paper are defined by Myers et al. 19 For the reader's convenience, these models are summarized in the next two subsections.

Mie scattering models for particles
Mie scattering describes light scattering from an isolated spherical particle of known complex optical constant and diameter. It is also often used to approximate the scattering of light from particles on a surface, as depicted schematically in Fig. 4(a). For particles on a surface, the substrate has a significant impact on the reflectance spectrum. The Mie scattering-based model accounts for multiple types of scattering: backscattering from the particle back to the sensor, at angles that depend on the particle shape and sensing geometry (e.g., angle), and forward scattering from the particle first to the substrate (i.e., surface) and then back toward the sensor [see Fig. 4(a)]. Finally, there is reflectance from the bare substrate itself in regions not covered by particles. The fraction of a pixel covered by particles is known as the fill factor. The fill factor (FF) depends on the chemical mass loading (i.e., concentration), m (μg∕cm²), chemical density, ρ (μg∕cm³), particle diameter mean, μ (cm), and standard deviation, σ (cm), as

(1)

where D_particle,i is a particle diameter (cm) sampled from a particle size distribution.

Fig. 3 Normalized reflectance measurements of six identical samples (saccharin film on glass at a concentration of 100 μg∕cm²) collected with the same sensor at two measurement angles, demonstrating the high variability of trace chemicals. Note the variability in not only the overall spectral shape, but also the depth of the distinct spectral features.

Fig. 4 (a) A diagram of the types of scattering captured by the Mie scattering particle model. Backscatter interacts with the particle and reflects back toward the sensor, while forward scatter reflects off the particle, onto the substrate, and back toward the sensor. Areas without particles show only substrate reflectance. (b) The TM method models the refraction of light as it travels into and back out of the liquid film on the substrate, as well as scattering within the film as the light interacts with the substrate itself. (c) The STM model includes films of nonuniform thickness sparsely covering the pixels. The film contributions are calculated using the TM method. The reflectance is a linear combination of film and substrate reflectance.
We use a log-normal distribution, which has been shown to be effective for modeling particle sizes, [38][39][40] with mean μ and standard deviation σ. Let R_particle be the particle reflectance based on Mie scattering,

(2)

where λ is the wavenumber (cm⁻¹) and R_sub is the substrate reflectance. S_B, S_LT, and S_QT are the backscattering, linear transflection, and quadratic transflection strength parameters, respectively. Q_B and Q_S are the backward and forward scattering reflectance contributions calculated using the complex optical constants of the specific chemical. Then, the full model for particles on a surface is defined as

(3)

where SF_sub is the substrate scale factor, which may be used as a proxy for BRDF information.
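The log-normal parameterization can be made concrete with a short sketch. It assumes that μ and σ are the arithmetic mean and standard deviation of the diameters, converts them to the parameters of the underlying normal distribution, and draws diameters with NumPy. Because the exact form of Eq. (1) is not given here, the fill factor below is a hypothetical geometric estimate for spheres (projected area per unit deposited volume), capped at 1. It is not necessarily the authors' expression, and the function names and the example density are illustrative assumptions.

```python
import numpy as np


def lognormal_params(mean, std):
    """Map the arithmetic mean/std of a log-normal variable to the
    mean/std of the underlying normal distribution."""
    sigma_log = np.sqrt(np.log1p((std / mean) ** 2))
    mu_log = np.log(mean) - 0.5 * sigma_log**2
    return mu_log, sigma_log


def sample_diameters(mean_cm, std_cm, n, rng):
    """Draw n particle diameters (cm) with the given arithmetic mean and std."""
    mu_log, sigma_log = lognormal_params(mean_cm, std_cm)
    return rng.lognormal(mean=mu_log, sigma=sigma_log, size=n)


def fill_factor_spheres(m, rho, diameters):
    """Hypothetical geometric fill factor for spherical particles.

    m: mass loading (ug/cm^2); rho: density (ug/cm^3); diameters: cm.
    Particles per cm^2 = (m / rho) / mean sphere volume; each particle
    covers pi D^2 / 4, so FF = (m / rho) * <area> / <volume>, capped at 1.
    """
    mean_volume = np.mean(np.pi * diameters**3 / 6.0)
    mean_area = np.mean(np.pi * diameters**2 / 4.0)
    return min((m / rho) * mean_area / mean_volume, 1.0)


# Example: mu = 5.46 um, sigma = 1.14 um, 100 ug/cm^2, and a hypothetical
# density of 1.8 g/cm^3 (1.8e6 ug/cm^3).
rng = np.random.default_rng(0)
diameters = sample_diameters(5.46e-4, 1.14e-4, 10_000, rng)
ff = fill_factor_spheres(100.0, 1.8e6, diameters)
print(f"mean diameter = {diameters.mean() * 1e4:.2f} um, fill factor = {ff:.3f}")
```

Capping at 1 reflects the physical limit of full coverage; in that regime the bare-substrate contribution disappears, so the substrate scale factor can no longer influence the result.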
The user must define the particle diameter mean, μ, standard deviation, σ, and substrate scale factor, SF_sub. Figure 5 shows comparisons of the Mie scattering model predictions to actual measurements of cyclotrimethylenetrinitramine (RDX) and pentaerythritol tetranitrate (PETN) particles on glass (samples prepared by the Naval Research Laboratory). These results were generated using μ = 12 μm, σ = 10 μm, and SF_sub = 1.0.

Transfer matrix model for liquids
The TM method is a standard approach for calculating the reflection and transmission properties of a stack of uniform thin films, each layer having a known complex refractive index and thickness [see Fig. 4(b)]. Recall that the complex optical constant is denoted by ñ. We define the optical constants of the layers as ñ_air for air, ñ_chem for the chemical, and ñ_sub for the substrate. The complex reflection coefficients at the layer interfaces are r_1 and r_2 (for the single-chemical case), 16

(4)
(5)

where * indicates the complex conjugate. Let r_3 be defined as

(6)

where δ is the optical depth through the chemical film. 17,18,26 Finally, the reflectance from a uniform thin film is given as 14,15

$$R_F(\lambda) = r_3(\lambda)\, r_3(\lambda)^{*}. \qquad (7)$$

Note that r_1 and r_2 are calculated at normal incidence. This is an approximation, made because the incidence angle is not necessarily known in standoff active spectroscopy applications.
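As an illustration of the TM calculation, the following sketch uses the textbook Airy formula for a single absorbing film on a thick substrate at normal incidence. Complex indices are written as n + ik, and the optical depth is taken as δ = 2π ñ_chem d ν, with wavenumber ν in cm⁻¹ and thickness d in cm. This is an assumed stand-in for the paper's own expressions for r_1, r_2, and r_3, which may use a different sign or conjugate convention, and the function names are illustrative.

```python
import numpy as np


def fresnel_normal(n1, n2):
    """Normal-incidence amplitude reflection coefficient from medium 1 into 2."""
    return (n1 - n2) / (n1 + n2)


def thin_film_reflectance(wavenumber_cm, n_chem, n_sub, thickness_cm, n_air=1.0):
    """Reflectance R_F of one uniform film on a thick substrate (Airy formula).

    Complex indices use the n + ik convention; wavenumber in cm^-1 and
    thickness in cm, so the optical depth delta is dimensionless.
    """
    r1 = fresnel_normal(n_air, n_chem)   # air/chemical interface
    r2 = fresnel_normal(n_chem, n_sub)   # chemical/substrate interface
    delta = 2.0 * np.pi * n_chem * thickness_cm * wavenumber_cm
    phase = np.exp(2j * delta)           # round trip through the film
    r3 = (r1 + r2 * phase) / (1.0 + r1 * r2 * phase)
    return np.real(r3 * np.conj(r3))     # R_F = r3 r3*


# Example: a 1 um film across the measured band. The constant indices are
# placeholders, not measured optical constants.
nu = np.linspace(980.0, 1290.0, 200)
r_film = thin_film_reflectance(nu, 1.6 + 0.05j, 1.5 + 0.0j, 1.0e-4)
```

Setting the thickness to zero recovers the bare-substrate Fresnel reflectance, and a thick, absorbing film recovers the air/chemical interface reflectance; both limits are quick sanity checks on an implementation.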
We previously used this model as part of a hyperspectral imaging simulator. As shown in Fig. 6, the simulator was able to duplicate the main characteristics of the hyperspectral image of a sample depicting the IARPA logo in two different chemicals. 19 In particular, the TM model effectively predicted the spectra of the two chemicals, silicone oil and triethyl phosphate (TEP), on a plastic surface.

Sparse Transfer Matrix Model for Solid Film Residue
For most samples, however, we have found that neither the Mie scattering model nor the TM method is sufficient to model film-like residues. We therefore developed a model called STM to account for this case. STM assumes that only a portion of the surface is covered by the chemical; this fraction is the fill factor. The remainder is bare substrate, as shown in Fig. 4(c). Furthermore, the film in the contaminated regions is assumed to have a nonuniform thickness. As with the Mie scattering model for particles, we assume that the film thickness follows a log-normal distribution. The STM model is defined as

(8)

Before using the STM model for detection and classification of chemical residue samples, we must set some application-dependent parameters: the particle diameter mean and standard deviation, μ and σ, respectively, and the substrate scale factor, SF_sub.
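A minimal sketch of the STM idea follows, under stated assumptions. The covered fraction contributes the TM reflectance averaged over log-normally distributed film thicknesses, and the uncovered fraction contributes the bare-substrate reflectance scaled by SF_sub. This linear-mixing form is a hypothetical reading of the description above, not a reproduction of the paper's Eq. (8). The helper names, the Monte Carlo averaging, and the direct use of thicknesses (rather than particle diameters) as the log-normal variable are illustrative choices.

```python
import numpy as np


def film_reflectance(nu, n_chem, n_sub, d):
    """Uniform-film TM reflectance at normal incidence (textbook Airy form)."""
    r1 = (1.0 - n_chem) / (1.0 + n_chem)          # air/chemical
    r2 = (n_chem - n_sub) / (n_chem + n_sub)      # chemical/substrate
    phase = np.exp(4j * np.pi * n_chem * d * nu)  # exp(2i * delta)
    r3 = (r1 + r2 * phase) / (1.0 + r1 * r2 * phase)
    return np.abs(r3) ** 2


def stm_reflectance(nu, n_chem, n_sub, mean_d, std_d, fill_factor,
                    sf_sub=1.0, n_samples=256, seed=0):
    """Hypothetical STM sketch (assumed linear mixing, not the paper's Eq. (8)).

    A fraction `fill_factor` of the pixel holds films whose thicknesses (cm)
    are log-normal with arithmetic mean `mean_d` and std `std_d`; the rest
    is bare substrate whose reflectance is scaled by `sf_sub`.
    """
    nu = np.asarray(nu, dtype=float)
    rng = np.random.default_rng(seed)
    s = np.sqrt(np.log1p((std_d / mean_d) ** 2))
    d = rng.lognormal(np.log(mean_d) - 0.5 * s**2, s, n_samples)
    # One row per sampled thickness; average the film reflectance over rows.
    r_film = film_reflectance(nu[None, :], n_chem, n_sub, d[:, None]).mean(axis=0)
    r_sub = np.abs((1.0 - n_sub) / (1.0 + n_sub)) ** 2 * np.ones_like(nu)
    return fill_factor * r_film + (1.0 - fill_factor) * sf_sub * r_sub


# Example with illustrative values: 1 um mean thickness, 30% coverage.
nu = np.linspace(980.0, 1290.0, 200)
r_stm = stm_reflectance(nu, 1.6 + 0.05j, 1.5, 1.0e-4, 0.5e-4, fill_factor=0.3)
```

With zero thickness spread and full coverage, the sketch collapses to the uniform-film TM result, consistent with the statement that STM includes a uniform thin film as a special case. With full coverage, SF_sub has no effect on the output.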

Experiments and Results
The analysis for this effort focuses on demonstrating the utility of the STM model. First, we quantitatively compare the synthetic spectra generated by each model to measured data. We also include qualitative comparisons between the measured and simulated data to demonstrate the phenomenology captured by each of the models. Finally, we evaluate the improvement in classification performance on measured data when using the proposed STM model instead of the better-known models. The measurements used for this analysis were acquired by Kelley et al., 5,6 and the full simulation tool used to produce synthetic spectra with the STM model was developed by Myers et al. 19 as part of the IARPA SILMARILS program.

Description of the Measured Data
Various substrate samples with chemical contamination at a range of concentrations were prepared and provided by the Johns Hopkins University Applied Physics Laboratory (JHU/APL). The solid chemicals were first dissolved in a solvent and then evenly airbrushed over the substrates using a mechanical arm. The active MIR hyperspectral reflectance measurements were collected by the system developed by Block MEMS for the IARPA SILMARILS program. 5,6,41 In total, JHU/APL prepared six different chemicals on eight different substrates, though not every chemical was applied to every substrate. To avoid biasing the classification results toward a particular chemical-substrate combination, we limit the data used for these experiments to the chemicals and substrates for which we have at least one measurement of every unique pair (three chemicals and four substrates in this case). The breakdown of measured samples per chemical, substrate, and concentration is shown in Table 1. As Table 1 shows, the number of measurements differs across chemical-substrate classes. This can bias parameter tuning and results if not addressed properly. We therefore report class-balanced overall performance metrics: fit and classification performance results are averaged within each class before being averaged across the different classes in Table 1.
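The class-balanced averaging described above can be sketched in a few lines of Python. The labels below are hypothetical (chemical, substrate) pairs. Averaging within each class first prevents a class with many measurements from dominating the overall score.

```python
from collections import defaultdict


def class_balanced_mean(scores, labels):
    """Average scores within each class, then average the class means."""
    per_class = defaultdict(list)
    for score, label in zip(scores, labels):
        per_class[label].append(score)
    class_means = [sum(values) / len(values) for values in per_class.values()]
    return sum(class_means) / len(class_means)


# Hypothetical (chemical, substrate) labels; 1 = correct, 0 = incorrect.
labels = [("chem_A", "glass")] * 3 + [("chem_B", "glass")]
scores = [1, 1, 1, 0]
print(class_balanced_mean(scores, labels))
```

Here three correct results in one class and one incorrect result in another give a pooled mean of 0.75 but a class-balanced mean of 0.5.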
Recall from Secs. 2.3.1 and 3 that the user must define several parameters before applying the Mie scattering or STM models: the particle diameter mean and standard deviation, μ and σ, respectively, and the substrate scale factor, SF_sub. The selected parameters should be relevant and physically realistic for the trace chemical detection application. Ideally, a range of values for each parameter should be used so that the simulations capture the full variability. Solid particles with a mean diameter of 10 μm were dissolved to produce the samples discussed. Though dissolved particles may be <0.1 μm in diameter, scattering from such particles is negligible in the MIR, where the illumination wavelengths are roughly 8 to 10 μm (980 to 1290 cm⁻¹). Similarly, we only consider particle diameter standard deviations of 0.1 to 1.25 μm. The substrate scale factor in the Mie scattering and STM models provides a proxy for the substrate BRDF, as this information is not necessarily readily available. We consider scale factors ranging from 0.1 to 10.0, which is the range of BRDF values measured from a clean sample of high-density polyethylene (HDPE) (measurements provided by PNNL under the IARPA SILMARILS program). These parameter ranges are summarized in Table 2. Both the simulated and real data used for this analysis consist of 200 wavenumbers from 980 to 1290 cm⁻¹ with approximately 1.56 cm⁻¹ spacing. Reflectance signatures are normalized to avoid calibration inconsistencies and to emphasize differences in the overall shape and location of spectral features, which are the discriminating features in active spectroscopy detection and classification applications.
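The spectral grid stated above can be reproduced with NumPy, which also confirms the wavelength range of the measured MIR band. The normalization function uses unit Euclidean norm purely as an assumed example; the text does not state which normalization was applied.

```python
import numpy as np

# 200 evenly spaced wavenumbers from 980 to 1290 cm^-1, as stated in the text.
wavenumbers = np.linspace(980.0, 1290.0, 200)
spacing = wavenumbers[1] - wavenumbers[0]   # 310 / 199, about 1.56 cm^-1
wavelengths_um = 1.0e4 / wavenumbers         # convert cm^-1 to micrometres


def normalize(spectrum):
    """Scale a spectrum to unit Euclidean norm (an assumed normalization)."""
    spectrum = np.asarray(spectrum, dtype=float)
    return spectrum / np.linalg.norm(spectrum)


print(f"spacing = {spacing:.3f} cm^-1")
print(f"wavelengths = {wavelengths_um.min():.2f} to {wavelengths_um.max():.2f} um")
```

The wavelengths span about 7.75 to 10.2 μm, far larger than the sub-0.1 μm dissolved particles, which is why their scattering can be neglected.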

Comparisons of Simulated and Measured Data
The plots in Fig. 7 provide qualitative comparisons of the synthetic spectra generated by the STM simulation tool (gray curves) with their corresponding measurements (black curves). The variability in the simulated spectra can be attributed to the wide range of parameter values summarized in Table 2. Of perhaps more interest to this research is the quantitative comparison of the abilities of the various signature models discussed to accurately model real measured reflectance signatures. For this comparison, we calculate the overall root mean square error (RMSE) of the outputs of the Mie scattering, TM, and STM models with their corresponding measurements.
We calculate overall RMSE while varying each of the model parameters to capture the sensitivity of the models considered. Using Eq. (1), we find that the fill factor is below 1 only for mean particle diameters above 2 μm. For smaller particles the surface is fully covered, so the substrate scale factor has no effect [see Eqs. (3) and (8)]. We begin the sensitivity analysis by varying the mean particle diameter, μ, within the range in Table 2. The standard deviation, σ, and substrate scale factor, SF_sub, are set to 0.5 μm and 1.0, respectively. The average RMSE for each of the reflectance models is shown as a function of μ in Fig. 8(a). Because the TM model varies only with the input concentration, its RMSE does not vary with μ. STM outperforms the other two models in terms of overall RMSE. Both STM and Mie scattering reach their minimum RMSE for mean particle diameters >2 μm (i.e., fill factor <1), indicating that the uniform thin film assumption of the TM model is less valid for residue samples. We continue the sensitivity analysis by allowing σ to vary. For this result, μ is set to 5.46 μm to jointly minimize the RMSE for both the Mie scattering and STM models. Figure 8(b) shows that the particle size standard deviation has less effect on the RMSE than the mean particle size. Finally, we set σ to 1.14 μm to minimize the RMSE of both Mie scattering and STM and allow SF_sub to vary. The result is shown in Fig. 8(c). Overall, the STM model achieves an average 25% reduction in overall RMSE.

Table 1 (fragment recovered for the rough aluminum substrate; column headings not recovered): 4 at 10 μg∕cm², 1 at 10 μg∕cm², 3 at 10 μg∕cm², 7 at 100 μg∕cm², 5 at 100 μg∕cm², 5 at 100 μg∕cm², –, 1 at 150 μg∕cm², 1 at 150 μg∕cm².
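The RMSE comparison and parameter selection above can be sketched as follows. The `joint_minimizer` helper interprets "jointly minimize" as minimizing the sum of two models' RMSE curves over a parameter grid, which is an assumption, and `relative_reduction` shows how a percentage reduction such as the reported 25% is computed from two RMSE values. All names are illustrative.

```python
import numpy as np


def rmse(simulated, measured):
    """Root mean square error between two spectra sampled on the same grid."""
    simulated = np.asarray(simulated, dtype=float)
    measured = np.asarray(measured, dtype=float)
    return float(np.sqrt(np.mean((simulated - measured) ** 2)))


def joint_minimizer(param_grid, rmse_a, rmse_b):
    """Grid value minimizing the summed RMSE curves of two models
    (an assumed interpretation of 'jointly minimize')."""
    total = np.asarray(rmse_a, dtype=float) + np.asarray(rmse_b, dtype=float)
    return param_grid[int(np.argmin(total))]


def relative_reduction(rmse_new, rmse_ref):
    """Fractional RMSE reduction of a new model relative to a reference."""
    return (rmse_ref - rmse_new) / rmse_ref
```

In a full sweep, each point of an RMSE curve would itself be a class-balanced average of per-measurement RMSE values.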
The RMSE provides a measure of the overall fit of the simulated data to the measured data, but it does not tell us how well the models capture the phenomenology of the samples. Figure 9 compares example spectra simulated by each of the models with real measurements. As with the previous results, we select the model parameters that jointly minimize the RMSE for both the Mie scattering and STM models: μ = 5.46 μm, σ = 1.14 μm, and SF_sub = 1.0. The examples include all three chemicals on three different substrates. The Mie scattering and TM models capture some of the spectral features in each sample, but the STM model provides an overall better match to the phenomenology. Some of the differences between the STM and TM models in particular appear minor (e.g., the feature at 1025 cm⁻¹ in the pentaerythritol plot). Recall, however, that an active spectrometer has many applications, and the spectral library often contains hundreds or more chemicals. Seemingly minor differences such as this are critical for accurate classification when many chemicals share the same or very similar features.

Classification Results on Real Data
Next, we test the ability of the reflectance models to improve classification results on real measurements. For these results, we use RMSE as the classification metric: we compare each measurement to the full spectral library generated by each model and select the chemical that minimizes the RMSE. The model parameters are varied in the same way as in Sec. 4.2. The overall classification accuracy as a function of each model parameter is shown for each model in Fig. 10. Again, we see that the Mie scattering and STM models perform better for larger particle sizes, suggesting that substrate effects must be considered to capture the phenomenology of residue samples. As before, performance is averaged within each class before being averaged across the different chemical-substrate classes. On average, STM achieves an overall classification accuracy 12% to 15% greater than TM and Mie scattering, respectively. The peak classification accuracy of the STM model is 74%, compared with 46% and 64% for the TM and Mie scattering models, respectively.

Fig. 9 Comparisons of measured data (black solid curves) with example spectra generated by the three signature models considered. The model parameters were selected to jointly minimize the overall RMSE of all models. The STM model (red dotted curves) captures more of the features and provides an overall better fit to the phenomenology of the measured data than the Mie scattering (blue dashed curves) and TM (green dashed curves) models.

Fig. 10 Overall classification accuracy as a function of each of the model parameters for each of the three signature models considered. The STM model (red dotted curves) consistently outperforms the other models in terms of overall classification accuracy. In particular, both the STM and Mie scattering models achieve higher accuracy for larger particle diameters (fill factor <1), indicating that the nonuniform film assumption is valid.
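The minimum-RMSE classification rule described above amounts to a nearest-neighbor search over the reference library. The sketch below assumes the library is a 2-D array with one simulated spectrum per row and a parallel list of chemical labels. Several rows may share a label when spectra are simulated over a parameter range. The names and toy data are illustrative.

```python
import numpy as np


def classify_min_rmse(measurement, library, labels):
    """Label of the library spectrum (one per row) with the smallest RMSE."""
    measurement = np.asarray(measurement, dtype=float)
    library = np.asarray(library, dtype=float)
    errors = np.sqrt(np.mean((library - measurement[None, :]) ** 2, axis=1))
    return labels[int(np.argmin(errors))]


def accuracy(predicted, truth):
    """Fraction of predictions that match the true labels."""
    return float(np.mean([p == t for p, t in zip(predicted, truth)]))


# Toy library: hypothetical chemicals "A", "B", "C" on a 3-point grid.
library = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]
labels = ["A", "B", "C"]
print(classify_min_rmse([0.9, 1.1, 1.0], library, labels))
```

Per-measurement correctness from `accuracy` would then be averaged within and across chemical-substrate classes, as in the class-balanced scheme used throughout.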

Summary
In this work, we present STM, an extension of the physics-based transfer matrix model. The STM model better captures the phenomenology of chemical residues on surfaces by allowing for a log-normal distribution of film thicknesses sparsely covering the surface. We compare the STM model to the well-known Mie scattering and standard transfer matrix models. First, we quantify the overall fit of the simulated spectra to the measured data as a function of the model parameters. Our STM model reduces the overall RMSE between simulated and measured spectra by about 25%. We also calculate the overall classification accuracy achieved when using each of the three models to generate the reference signature library. When the model parameters are optimized, STM outperforms the other two signature models in peak classification accuracy by 10% to 28%.