Transmission versus reflectance spectroscopy for quantitation

Abstract. The objective of this work was to compare the accuracy of analyte concentration estimation when using transmission versus diffuse reflectance spectroscopy of a scattering medium. Monte Carlo ray tracing of light through the medium was used in conjunction with pure component absorption spectra and Beer–Lambert absorption along each ray’s pathlength to generate matched sets of pseudoabsorbance spectra, containing water and six analytes present in skin. PLS regression models revealed an improvement in accuracy when using transmission compared to reflectance for a range of medium thicknesses and instrument noise levels. An analytical expression revealed the source of the accuracy degradation with reflectance was due both to the reduced collection efficiency for a fixed instrument etendue and to the broad pathlength distribution that detected light travels in the medium before exiting from the incident side.


Introduction
For some biomedical optical spectroscopy applications, it is possible to measure the tissue of interest in either transmission mode or diffuse reflectance mode. For example, pulse oximetry is most commonly performed in transmission mode on the finger or earlobe, but devices have also been developed to use reflectance mode on the forehead. 1 Schmitt has shown that pulse oximetry transmission and reflectance modes should yield similar accuracy and sensitivity to potential interferences when the diffusion approximation to light propagation in a turbid medium is applicable. 2 Indeed, the accuracy of pulse oximetry in both modes has been experimentally shown to be equivalent, with a root-mean-squared error (RMSE) of about 2% compared to a blood reference. 3,4 One of the key factors in enabling pulse oximetry to become a standard clinical measurement 5 is the pulsatile nature of the signal of interest, which allows much of the interfering background tissue signals to be easily minimized. 6,7 Other quantitative applications of biomedical optical spectroscopy cannot take advantage of a pulsatile signal. For example, regional tissue oximetry of both the brain and muscle uses reflectance mode sampling of the total remitted intensity. The RMSE of these measurements is about 4%, 8,9 twice that of pulse oximetry. Regional oximetry cannot be made in transmission mode, due to the extreme attenuation of the optical signal along the long transmission path through the head or arm. Another application, noninvasive glucose monitoring using near-infrared (NIR) spectroscopy, has been attempted by many researchers at a variety of sites using either mode, including the finger in transmission, 10 inner lip in reflectance, 11 finger in reflectance, 12 forearm in reflectance, 13,14 and tongue in transmission. 15 However, no direct comparison between transmission and reflectance modes has been made using the same instrumentation and subject population.
Intuitively, it makes sense that a transmission measurement through a tissue that is relatively thin compared to its inverse scattering coefficient would yield better accuracy than a measurement made in diffuse reflectance mode from a thick tissue. This paper quantifies the improvement using realistic simulations of tissue absorption spectroscopy. Monte Carlo ray trace simulations were performed for two measurement geometries: diffuse reflectance from a semi-infinite medium and transmission through a slab medium of several different thicknesses. Then, sets of synthetic spectra with identical analyte concentrations were generated for the two geometries. Partial least-squares (PLS) regression models relating spectral changes to analyte concentration changes were then built to compare the quantitative accuracy of reflectance and transmission.

Theory
The absorbance measured from a homogeneous, scattering medium at a single wavelength is
$$A = -\log_{10}\left[ f \int_0^\infty p(l)\, e^{-\mu_a l}\, dl \right] \tag{1}$$
The absorption coefficient, $\mu_a$ (mm$^{-1}$), is due to the sum of all absorbers in the medium, present at variable concentration. This coefficient can further be expressed as a sum of each analyte's molar absorptivity times its concentration, $\mu_a = \ln(10) \sum_i \varepsilon_i c_i$. The pathlengths, $l$ (mm), traveled by light follow a probability distribution, $p(l)$ (mm$^{-1}$). The factor $f$ accounts for both specular reflection off the medium surface and the collection efficiency of the measurement system.
For a nonscattering medium measured in transmission through a constant pathlength, the pathlength distribution is a delta function, and Eq. (1) reduces to the familiar Beer-Lambert law. For a scattering medium, the pathlength distribution results in a nonlinear relationship between absorbance and analyte concentration. This can be shown by taking the partial derivative of Eq. (1) with respect to one analyte's concentration, with the restriction that a change in the analyte's concentration does not change $p(l)$ 16
$$\frac{\partial A}{\partial c_i} = \varepsilon_i\, l_{\mathrm{eff}}, \qquad l_{\mathrm{eff}} = \frac{\int_0^\infty l\, p(l)\, e^{-\mu_a l}\, dl}{\int_0^\infty p(l)\, e^{-\mu_a l}\, dl} \tag{2}$$
The effective pathlength, $l_{\mathrm{eff}}$ (mm), is the weighted-average pathlength traveled through the sample by detected light; the weighting function is simply the transmission along a pathlength $l$. Although Eq. (2) has the form of the Beer-Lambert law, and is sometimes referred to as the modified Beer-Lambert law, 17 note that the effective pathlength depends on the total absorption coefficient of the medium. Therefore, as the concentration of any analyte changes, the effective pathlength changes, and the relationship between absorbance and concentration is nonlinear. Noting that Eq. (2) defines the mean effective pathlength as the first moment of pathlength under the normalized probability density function proportional to $p(l)\, e^{-\mu_a l}$, I introduce here a new term using the second central moment
$$\sigma^2_{l_{\mathrm{eff}}} = \frac{\int_0^\infty (l - l_{\mathrm{eff}})^2\, p(l)\, e^{-\mu_a l}\, dl}{\int_0^\infty p(l)\, e^{-\mu_a l}\, dl} \tag{3}$$
The square root of this variance term is the standard deviation of effective pathlength, and is a measure of how broad the weighted pathlength distribution is about its mean. In the nonscattering transmission case, this term is zero. As the standard deviation in effective pathlength gets larger compared to the mean, one would expect the relationship between absorbance and concentration to become more nonlinear, and more difficult to model.
Spectroscopy is used to compensate for unwanted signal variation caused by interfering analyte changes, by using measurements at multiple wavelengths. If, for example, there are 10 independent analytes varying in the sample, then at least 10 wavelengths are needed to form an accurate calibration model relating absorbance changes to concentration changes of a single analyte. In a nonscattering medium, the pathlength is constant for all wavelengths. However, in a scattering medium, the effective pathlength depends on the total absorption, which varies as a function of wavelength. This effective pathlength spectrum distorts the pure component spectrum of the analyte of interest. Moreover, when any analyte in the sample changes concentration, the entire effective pathlength spectrum will change by a different amount at each wavelength. This second nonlinearity across wavelengths makes it more difficult for spectroscopy to provide accurate quantitative information about an analyte in a scattering medium.
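The quantities defined in this section can be estimated directly from a list of sampled ray pathlengths. The following Python sketch is illustrative only and is not the code used in this work; the function name `weighted_pathlength_stats`, the argument names, and the made-up "broad" pathlength list are assumptions introduced for the example. It replaces the integrals over $p(l)$ with sums over detected rays.

```python
import math

def weighted_pathlength_stats(pathlengths, mu_a, n_launched=None):
    """Return (absorbance, l_eff, sigma_l_eff) from sampled pathlengths in mm.

    pathlengths : pathlengths of detected rays (hypothetical input)
    mu_a        : absorption coefficient in mm^-1
    n_launched  : number of rays launched; defaults to the number detected,
                  which corresponds to a collection factor f of 1
    """
    if n_launched is None:
        n_launched = len(pathlengths)
    # Beer-Lambert survival weight of each ray along its own pathlength.
    weights = [math.exp(-mu_a * l) for l in pathlengths]
    total = sum(weights)
    absorbance = -math.log10(total / n_launched)
    # Weighted mean and weighted standard deviation of pathlength.
    l_eff = sum(w * l for w, l in zip(weights, pathlengths)) / total
    variance = sum(w * (l - l_eff) ** 2 for w, l in zip(weights, pathlengths)) / total
    return absorbance, l_eff, math.sqrt(variance)

# Nonscattering case: every ray travels exactly 1 mm (delta-function distribution).
a_delta, l_delta, s_delta = weighted_pathlength_stats([1.0] * 1000, mu_a=2.0)

# Made-up broad distribution, uniform from 0.2 to 5 mm, at two absorption levels.
broad = [0.2 + 4.8 * k / 999 for k in range(1000)]
stats_low = weighted_pathlength_stats(broad, mu_a=1.0)
stats_high = weighted_pathlength_stats(broad, mu_a=3.0)
```

In the delta-function case the absorbance equals $\mu_a l / \ln(10)$ and the standard deviation vanishes, while for the broad distribution the effective pathlength shrinks as absorption increases, which is the nonlinearity described above.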

Monte Carlo Simulation
The Monte Carlo method of light propagation in a homogeneous, scattering medium was used. Code by Prahl 18 was modified to include a slab geometry, angled ray launch and detection, and spatially offset ray launch. The modified algorithm was implemented in MATLAB® Release 2016a (The MathWorks, Inc., Natick, Massachusetts). For each spectrum, the medium had a randomly generated scattering coefficient of 9 to 11 mm$^{-1}$, a Henyey-Greenstein scattering anisotropy factor 19 of 0.8 to 0.9, a refractive index of 1.4, and zero absorption. The scattering and refractive index properties are representative of human tissue in the 2100- to 2400-nm spectral region, 20 but were each approximated to be independent of wavelength in order to decrease the total simulation time. Rays that traveled a pathlength longer than 30 mm were approximated to be completely absorbed by the medium, as this distance translates to over 20 orders of magnitude attenuation for water-based tissue in the spectral region of interest. Six ray traces were carried out: one for a planar, semi-infinitely thick medium, and the other five for slabs with thicknesses of 0.5, 1, 1.5, 2, and 2.5 mm (Fig. 1). For all cases, light was launched uniformly in position and angle within a 1-mm-diameter circle and 0.37 numerical aperture (NA) cone on the medium surface. For the slab cases, light was transmitted through the slab into a 2-mm-diameter circle with 0.37 NA, resulting in a collection etendue of 0.43 mm$^2$ sr. For the reflectance case, light was collected by seven 0.6-mm core diameter, 0.37 NA fibers. When packed into a bundle, these seven fibers also result in a 0.43 mm$^2$ sr collection etendue. Depending on the geometry, 0.1 to 2.8 million rays were injected normal to the planar surface and traced, such that about 20,000 rays were detected for each geometry. The pathlengths of rays diffusely reflected and (in the slab cases) transmitted were stored.
Repeat traces with a different starting random number seed and different number of total rays traced were used to verify that the simulations yielded stable results in terms of the quantitative accuracy discussed below.
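The core of such a ray trace can be sketched in a few lines of Python. This is a simplified illustration, not the modified MATLAB code described above: it assumes index-matched boundaries (no Fresnel reflection), a pencil beam launched normal to the surface, and no aperture or NA limits on detection. The function names (`sample_hg_cosine`, `scatter`, `trace_slab`) and default parameter values are hypothetical.

```python
import math
import random

def sample_hg_cosine(g, rng):
    """Draw cos(theta) from the Henyey-Greenstein phase function."""
    if abs(g) < 1e-6:
        return 2.0 * rng.random() - 1.0  # isotropic limit
    frac = (1.0 - g * g) / (1.0 - g + 2.0 * g * rng.random())
    return (1.0 + g * g - frac * frac) / (2.0 * g)

def scatter(ux, uy, uz, g, rng):
    """Rotate the unit direction (ux, uy, uz) by a sampled scattering angle."""
    cos_t = sample_hg_cosine(g, rng)
    sin_t = math.sqrt(max(0.0, 1.0 - cos_t * cos_t))
    phi = 2.0 * math.pi * rng.random()
    cos_p, sin_p = math.cos(phi), math.sin(phi)
    if abs(uz) > 0.99999:  # nearly parallel to z: avoid dividing by ~0
        return sin_t * cos_p, sin_t * sin_p, math.copysign(1.0, uz) * cos_t
    temp = math.sqrt(1.0 - uz * uz)
    new_ux = sin_t * (ux * uz * cos_p - uy * sin_p) / temp + ux * cos_t
    new_uy = sin_t * (uy * uz * cos_p + ux * sin_p) / temp + uy * cos_t
    new_uz = -sin_t * cos_p * temp + uz * cos_t
    return new_ux, new_uy, new_uz

def trace_slab(n_rays, thickness, mu_s=10.0, g=0.85, max_path=30.0, seed=0):
    """Return (reflected, transmitted) lists of ray pathlengths in mm."""
    rng = random.Random(seed)
    reflected, transmitted = [], []
    for _ in range(n_rays):
        z, path = 0.0, 0.0
        ux, uy, uz = 0.0, 0.0, 1.0  # launched normal to the surface at z = 0
        while path < max_path:      # longer rays are treated as fully absorbed
            step = -math.log(1.0 - rng.random()) / mu_s
            z_new = z + uz * step
            if z_new > thickness:   # exits the far side: truncate at the boundary
                transmitted.append(path + (thickness - z) / uz)
                break
            if z_new < 0.0:         # exits the incident side
                reflected.append(path + (-z) / uz)
                break
            z, path = z_new, path + step
            ux, uy, uz = scatter(ux, uy, uz, g, rng)
    return reflected, transmitted

reflected, transmitted = trace_slab(2000, thickness=0.5)
```

Because the medium is traced with zero absorption, the stored pathlengths can be reused for any absorption coefficient by applying the Beer-Lambert weight to each ray afterward.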

Synthetic Spectra
After ray-tracing, six sets of synthetic absorbance spectra were generated using the pathlength distributions and Eq. (1). The absorptivity spectra $\varepsilon(\lambda)$ of water, collagen, elastin, triolein, decorin, ethanol, and glucose in the 2100- to 2400-nm region (Fig. 2) were estimated from nonscattering transmission measurements of each component in water using an FT-NIR spectrometer. This region is important spectroscopically as it contains combinations of the fundamental vibrational modes of important functional groups such as C-H, O-H, and N-H. 21 As such, even minor analytes, such as ethanol 22 and glucose, 23 can be quantitated using this spectroscopic region. Water concentrations in the range 66.5% to 73.5%, typical of skin water content, 24 were chosen at random, and combined with random concentrations of other analytes, plus water temperature variation of ±0.5°C, to form the total absorption coefficient spectrum, $\mu_a = \ln(10) \sum_i \varepsilon_i c_i$. See Table 2 for the uniform concentration ranges of each analyte. In total, 200 absorption coefficient spectra were generated, resulting in matched sets of diffuse reflectance spectra for the semi-infinite medium and transmission spectra for the slab media.
After generating the synthetic spectra, noise was randomly added to the reflectance and transmission spectra, assuming the source power was constant across the spectrum. The noise was modeled as detector noise, which is normally distributed and independent of wavelength in intensity space. The level of intensity noise added was such that the average signal-to-noise ratio for the reflectance mode spectra was ln(10) times $10^3$, $10^4$, $10^5$, or $10^6$. These values in intensity space translate to noise levels in absorbance (-log of reflectance) space of $10^{-3}$, $10^{-4}$, $10^{-5}$, and $10^{-6}$, respectively. These same noise levels were then randomly added to the transmission mode data sets.
After adding noise, the absorbance spectra were either used as is for regression or first preprocessed. The preprocessing was the standard multiplicative scatter correction method, 25 which for each spectrum finds a scalar additive value and scalar multiplicative value such that the least-squares difference between the scaled spectrum and the average spectrum is minimized.
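As an illustration of this preprocessing step, the following Python sketch implements multiplicative scatter correction in its common formulation: each spectrum is regressed onto the mean spectrum, and the fitted offset and slope are then removed. This is an assumed, minimal version and not necessarily identical to the implementation used here; the function name `msc` and the toy spectra are hypothetical.

```python
def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum.

    spectra : list of equal-length lists, one absorbance spectrum per row
    Returns (corrected_spectra, reference_spectrum).
    """
    n_wl = len(spectra[0])
    reference = [sum(s[j] for s in spectra) / len(spectra) for j in range(n_wl)]
    ref_mean = sum(reference) / n_wl
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / n_wl
        # Least-squares fit of s ~ offset + slope * reference.
        slope = sum((r - ref_mean) * (x - s_mean) for r, x in zip(reference, s)) / ref_ss
        offset = s_mean - slope * ref_mean
        # Remove the fitted scalar offset and scalar gain.
        corrected.append([(x - offset) / slope for x in s])
    return corrected, reference

# Toy spectra that differ only by a scalar offset and a scalar gain.
base = [0.1, 0.5, 0.9, 0.4]
toy = [[a + b * x for x in base] for a, b in [(0.0, 1.0), (0.2, 1.5), (-0.1, 0.7)]]
toy_corrected, toy_reference = msc(toy)
```

Spectra that differ only by a scalar offset and gain collapse onto the mean spectrum. Any wavelength-dependent pathlength variation is not captured by the two scalars and is left in the corrected spectra.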

Partial Least-Squares Regression
The PLS method was used to regress analyte concentrations onto spectra. This algorithm is the most common quantitative method used in NIR spectroscopy, as it is suited for cases where there are many predictor variables (wavelengths) compared to training samples, and for which at least some of the predictor variables are correlated. It overcomes these traditional limitations by regressing concentrations onto latent variables (also referred to as loadings or factors) instead of onto individual wavelength channels in a spectrum. As with other regression methods, the result of this algorithm is a vector of regression coefficients, $\beta$, which when multiplied by an absorbance spectrum, $A$, and summed across wavelengths (dot product), gives an estimate of the analyte concentration:
$$\hat{c} = \sum_{\lambda} \beta(\lambda)\, A(\lambda) \tag{4}$$
For each of the seven analytes, PLS models were built and accuracy was assessed using 100 bootstrap iterations, each holding out a random 10% of the samples. The accuracy metric used was the RMSE between PLS model predictions and known analyte concentrations. The minimum RMSE when using from 1 to 40 latent variables was stored for each analytical model.
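The procedure described above can be sketched as follows. This is a dependency-free Python illustration using the NIPALS form of single-response PLS (PLS1), with repeated random hold-out standing in for the resampling scheme; it is an assumption-laden sketch and not the code used in this work. The function names, the three Gaussian "pure component" spectra, and the noiseless toy mixtures are all hypothetical.

```python
import math
import random

def pls1_fit(X, y, n_components):
    """Fit PLS1 by NIPALS; returns the means and per-component (w, p, q)."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centered spectra
    f = [v - y_mean for v in y]                                # centered concentrations
    components = []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # no remaining covariance between spectra and concentration
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate spectra and concentrations before the next latent variable.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components

def pls1_predict(model, x):
    """Predict a concentration for one spectrum by sequential deflation."""
    x_mean, y_mean, components = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat

def holdout_rmse(X, y, n_components, n_iter=100, frac=0.1, seed=0):
    """RMSE over repeated random hold-outs of a fraction of the samples."""
    rng = random.Random(seed)
    n = len(X)
    n_test = max(1, round(frac * n))
    sq_err = []
    for _ in range(n_iter):
        test = set(rng.sample(range(n), n_test))
        train = [i for i in range(n) if i not in test]
        model = pls1_fit([X[i] for i in train], [y[i] for i in train], n_components)
        sq_err += [(pls1_predict(model, X[i]) - y[i]) ** 2 for i in test]
    return math.sqrt(sum(sq_err) / len(sq_err))

# Toy data: noiseless linear mixtures of three made-up component spectra.
rng = random.Random(1)
pure = [[math.exp(-((j - c) / 2.0) ** 2) for j in range(8)] for c in (1.5, 4.0, 6.0)]
conc = [[rng.uniform(0.0, 1.0) for _ in pure] for _ in range(40)]
spectra = [[sum(c[k] * pure[k][j] for k in range(3)) for j in range(8)] for c in conc]
target = [c[0] for c in conc]
rmse_by_lv = {k: holdout_rmse(spectra, target, k, n_iter=20) for k in (1, 2, 3)}
best_lv = min(rmse_by_lv, key=rmse_by_lv.get)  # latent variables giving minimum RMSE
```

For noiseless spectra that are linear in concentration, as in this toy data, three latent variables are enough to recover the target concentration. A broad pathlength distribution breaks that linearity, which is why many more latent variables were tested in the simulations.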

Results
Typical pathlength distributions are shown in Fig. 3. For clarity in the figure, only the 0.5-, 1-, and 2-mm thickness transmission distributions are shown. For a medium with no absorption, there are extreme differences in the pathlength distributions for different modes, with a large percentage of rays traveling more than 5 mm for the reflectance and 2-mm transmission slab cases. When weighted by an absorption coefficient of 2 mm$^{-1}$, typical of tissue around 2200 nm, 20 the reflectance pathlength distribution becomes more like the 0.5- and 1-mm transmission slab cases, but is still much broader. The distribution of the 2-mm transmission slab is most like that of the reflectance distribution, but shifted to longer pathlengths due to the 2-mm minimum allowable path.
Figure 4 shows the mean absorbance and effective pathlength spectra for the reflectance and transmission modes. The absorbance in reflectance mode is most like that of the 2-mm transmission slab. However, the effective pathlength in reflectance mode is most like that of the 1-mm transmission slab. The shape of the effective pathlength spectrum for reflectance is less flat than the pathlength spectra for transmission, with smaller pathlength at either end of the spectrum where water absorption is higher (Fig. 2).
Table 1 summarizes the synthetic spectra for each mode in terms of the mean (across spectra and wavelengths) detected light intensity, effective pathlength, standard deviation of effective pathlength, and effective pathlength coefficient of variation. The intensities vary by four orders of magnitude across the modes, with the reflectance mode having an intensity between those of the 1- and 1.5-mm-thick transmission modes. The coefficient of variation for the reflectance mode is 45%, and is 9% or less for the transmission modes, with a slight decrease as the slab thickness increases.
Figure 5 plots the results of a single quantitative model, for ethanol concentration estimation. Using 12 latent variables to relate spectra to concentrations, the RMSE for the reflectance case is 5.9 mg/dL. For the 1-mm slab transmission case using the same number of latent variables, the RMSE is reduced to 1.1 mg/dL, a factor of 5.4 improvement. When the optimal number of latent variables is chosen independently for both geometries, the improvement factor increases slightly to 5.6.
Table 2 displays a summary of the quantitative accuracy results when using the reflectance mode. Looking down the columns, the accuracy improves (RMSE decreases) dramatically as the concentration range of the analyte decreases. Looking across the rows, the accuracy remains constant for low levels of noise, and then degrades for increasing noise above $10^{-3}$. The relative amount of degradation is more evident for the minor analytes (ethanol and glucose).
The accuracy difference between modes is summarized using an accuracy improvement factor, defined as the ratio of the minimum RMSE for reflectance to the minimum RMSE for transmission. An improvement factor of 10 means that the transmission mode is 10 times more accurate (less error) than reflectance. Across all cases simulated, the improvement factor has a mean of 10.3, a median of 7.3, and ranges from 0.1 to 54. Of the total 175 model comparisons, 149 (85%) favor the transmission mode. All transmission measurements through thicknesses of 1.5 mm or less resulted in an accuracy improvement over reflectance, regardless of analyte spectral shape, analyte concentration level, or spectroscopic noise level.
Tables 3 and 4 display the improvement factors when quantifying water and ethanol, respectively, for the different levels of noise. For most cases, the improvement factor is larger for water, the main absorber in the simulation, than for ethanol, a minor analyte. The improvement factor is less than 1 for one water quantification case: the thickest tissue slab and highest noise level. For ethanol, the improvement factor is less than 1 for the two highest noise levels in the 2-mm slab, and for the four highest noise levels in the 2.5-mm slab.
When the multiplicative scatter correction preprocessing was first applied to the reflectance spectra, the spectral variance between the 200 simulated spectra decreased by over a factor of 1000, but the quantitative accuracy on average degraded. The improvement factor for preprocessed reflectance data compared to the values in Table 1 has a mean of 0.8, a median of 0.7, and ranges from 0.6 to 1.5.

[Table 2: columns include Analyte and Concentration range (mg/dL); table body not recovered.]
Table 3 PLS model accuracy improvement factor when quantifying water in transmission mode, for different slab thicknesses and reflectance-mode absorbance noise levels (σ). [Columns: Transmission slab thickness (mm); Improvement factor over reflectance. Table body not recovered.]
Table 4 PLS model accuracy improvement factor when quantifying ethanol in transmission mode, for different slab thicknesses and reflectance-mode absorbance noise levels (σ). [Columns: Transmission slab thickness (mm); Improvement factor over reflectance. Table body not recovered.]

Discussion
The results of this simulation show that the broad distribution of pathlengths encountered when measuring the diffuse reflectance from a scattering medium results in a significant reduction in quantitative accuracy in measuring the medium's analyte concentrations. This is true for a variety of analyte conditions and instrument noise levels. Even when no noise is added to the synthetic spectra, and with equivalent numbers of detected rays, the transmission mode outperforms reflectance mode for a variety of medium thicknesses. This can be understood by combining Eqs. (2) and (4). A change in the absorbance spectrum, $\partial A$, due to a change in the analyte of interest, $\partial c_1$, plus changes in all $p - 1$ other analytes, results in an estimated change in the analyte of interest's concentration using the PLS regression vector
$$\partial \hat{c}_1 = \sum_{\lambda} \beta(\lambda) \left[ \varepsilon_1(\lambda)\, l_{\mathrm{eff}}(\lambda)\, \partial c_1 + \sum_{i=2}^{p} \varepsilon_i(\lambda)\, l_{\mathrm{eff}}(\lambda)\, \partial c_i + e_A(\lambda) \right] + e_1 \tag{5}$$
Equation (5) has two error terms. The first term, $e_A$, is the spectral error in measuring the true absorbance spectrum and contains the detector noise simulated above, as well as other error terms such as shot noise, relative intensity noise, instrument drift, stray light, and detector nonlinearity. The second term, $e_1$, is the scalar reference error in analyte concentration, which is common in spectroscopic measurements of tissue that use a reference measurement from a blood sample.
The amount of spectral error in absorbance space usually decreases with increasing amount of detected light. The decrease is linear with intensity for an instrument that is detector noise limited, depends on the square root of intensity for a shot noise limited instrument, and is independent of intensity only when relative intensity noise dominates. Since this absorbance noise is random, any regression model can model spectral changes only down to the noise floor. In this work that assumes a detector noise limited instrument, modes that detect a higher level of light, such as transmission through thin slabs (Table 1) will allow for more accurate quantitation than other modes, based solely on the size of e A relative to the other terms in Eq. (5). The same would be true of a shot noise limited instrument, although the differences between modes will be smaller in magnitude.
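The three scaling regimes follow from first-order error propagation through the logarithm, $\sigma_A \approx \sigma_I / (I \ln 10)$. The short Python sketch below illustrates this; the function name `absorbance_noise`, the regime labels, and the reference intensity and noise values are hypothetical choices made for the example.

```python
import math

def absorbance_noise(intensity, regime, ref_intensity=1.0, ref_sigma=1e-3):
    """First-order absorbance noise, sigma_A = sigma_I / (I * ln 10).

    regime    : "detector" (sigma_I constant), "shot" (sigma_I ~ sqrt(I)),
                or "rin" (relative intensity noise, sigma_I ~ I)
    ref_sigma : intensity noise at ref_intensity (made-up reference values)
    """
    exponent = {"detector": 0.0, "shot": 0.5, "rin": 1.0}[regime]
    sigma_i = ref_sigma * (intensity / ref_intensity) ** exponent
    return sigma_i / (intensity * math.log(10))

# Ratio of absorbance noise at 10x the detected intensity to that at 1x.
ratios = {r: absorbance_noise(10.0, r) / absorbance_noise(1.0, r)
          for r in ("detector", "shot", "rin")}
```

A tenfold increase in detected intensity reduces the absorbance noise tenfold for a detector noise limited instrument, by the square root of ten for a shot noise limited instrument, and not at all when relative intensity noise dominates.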
The most important aspect of Eq. (5) is the effective pathlength term. For any new sample, $l_{\mathrm{eff}}$ can be separated into the average effective pathlength spectrum for the calibration samples used to build the regression model, $\bar{l}$, and an additive part specific to this new sample, $\Delta l$. This additive pathlength term varies with wavelength and is a function of the scattering and absorption (and therefore analyte concentration) properties of the medium. It is this term that makes modeling of concentration changes with absorbance spectral changes so complex, as it will change for each sample. Even two samples that have the same concentration of the analyte of interest can have different, unmodeled absorbance differences if their unique combination of scattering and absorption properties is not well represented in the calibration set. As discussed above, the errors due to $\Delta l$ can be modeled only down to the level of random noise present in the spectra, so increasing amounts of spectroscopic noise ($e_A$) limit the ability of regression methods to account for variable pathlengths. But even two modes that produce the same level of absorbance noise will still produce different accuracies, favoring the mode that has the smaller variation in effective pathlength relative to the mean.
To some extent, the terms having Δl can be thought of as additional error terms that can never be fully modeled. On the other hand, if Δl was zero for all samples, then Eq. (5) would reduce to a simple linear relationship between concentration and absorbance, and the multivariate calibration problem would return to the classical problem of quantification in the presence of a linear-additive combination of interferences. 26 The transmission mode for the 0.5 and 1 mm slabs accomplishes this well, as is seen in Fig. 3(b). When using the effective pathlength coefficient of variation as a metric to judge the applicability of linear modeling techniques (Table 1), all transmission modes studied should produce similar results if the absorbance noise can be reduced to a low level, for example, by increasing the source light power. Therefore, finding measurement sites on the body that are optically thin enough to permit transmission spectroscopy would be useful. Measurements of skin thickness have been performed by other researchers, and there are several promising sites already identified. 27,28 Other methods exist to narrow the pathlength distributions that detected light takes through a scattering medium, such as time-resolved diffuse reflectance, 29 spatially resolved diffuse reflectance, 30 patterned illumination, 31 confocal reflectance, 32 and spectroscopic optical coherence tomography. 33 Compared to simple transmission, these techniques should have much lower signal-to-noise for a given acquisition time, because much of the remitted light is not used.
That the multiplicative scatter correction preprocessing method did not improve accuracy can be understood using Eq. (5). The scatter correction estimates a scalar value for $\Delta l$ (and one for $e_A$ as well), whereas in truth these quantities vary as a function of wavelength. More complicated empirical preprocessing methods would need to properly estimate the wavelength dependence of the pathlength variation across samples. Alternatively, radiative transport theory could be used to develop algorithms that recover the absorption coefficient from at least two unique reflectance measurements. For the 2100- to 2400-nm region simulated here, the diffusion approximation cannot be used, because the absorption is on par with the reduced scattering. But an inverse solution to radiative transfer could be employed to separate the effects of scattering from absorption in a homogeneous medium.
The PLS regression method used here is a linear regression method. Because the reflectance mode results in a nonlinear relationship between changes in absorbance and changes in concentration, nonlinear regression methods may offer an improvement. 34
One limitation of this work is that the Monte Carlo method simulated unpolarized light, but did not simulate enpolarization (increasing degree of polarization) due to the highly scattering bulk medium. It has been shown both numerically and experimentally that the average degree of polarization can increase from 0 to 0.75 when the light source is highly coherent, and the medium has little absorption compared to scattering. 35-38 It is possible that enpolarization effects could influence the comparison of reflectance and transmission, especially when the two modes have very different pathlength distributions, and this potential should be investigated in future experimental work.
Another limitation of this work is that a single-layer homogeneous medium was studied. However, the favorability of transmission mode over reflectance mode should become even more evident in the presence of a multilayer medium, as is commonly found in tissue. In that case, each layer has different bulk scattering and absorption levels, and may have different concentrations of the analyte of interest. In reflectance, wavelengths that have deeper penetration due to less absorption and scattering will have longer pathlengths through a deeper layer than other wavelengths. This is especially true in the NIR region modeled in this work, as the water absorption coefficient is similar in magnitude to the reduced scattering coefficient, and varies by a factor of two over the spectral region of interest. But for transmission, light at all wavelengths is forced to travel through each layer, and the effective pathlength spectrum in each layer is more uniform than in the reflectance case.
In future work, other spectral regions should be explored to determine how thick a medium can be successfully interrogated using transmission mode under realistic instrument signal and noise conditions. In addition, these simulations assumed a constant slab thickness across spectra; future work should study the more realistic case where the thickness is variable but measurable to within a small degree of uncertainty.

Conclusion
Diffuse reflectance spectroscopy from a scattering medium can lead to severely degraded quantitative accuracy compared to transmission spectroscopy through a thin slab of the same medium, over a broad range of analyte signal size and instrument noise conditions. This is due to the creation of a broad pathlength distribution for diffusely reflected light, which creates complex, nonlinear changes in absorbance due to any change in analyte concentration or medium scattering that are difficult to model. As applied to medical device development, a transmission mode measurement through a thin slab of tissue should be evaluated, when possible, and compared to optimized reflectance spectroscopy if the quantitative accuracy needs to be improved before successful clinical application.

Disclosures
The author is employed by TruTouch Technologies Inc., a developer of noninvasive blood alcohol measurement devices employing NIR spectroscopy.