## 1.

## Introduction

Diabetes mellitus is a chronic systemic disease with no known cure in which the body either fails to produce, or fails to properly respond to, the glucose regulator hormone insulin. Estimates put the number of diabetics in North America at
$15\phantom{\rule{0.3em}{0ex}}\text{million}$
and the number worldwide at over
$100\phantom{\rule{0.3em}{0ex}}\text{million}$
, with these numbers expected to rise dramatically over the next
$30\phantom{\rule{0.3em}{0ex}}\text{years}$
.^{1} The primary goal of diabetes therapy is to maintain normal blood glucose homeostasis through diet, exercise, medication, or insulin injections. Accordingly, frequent monitoring of blood glucose levels is of paramount importance in guiding these therapies. Conventional “finger-prick” methods have low compliance due to their invasive nature, and as a result, many diabetics do not monitor their blood glucose levels as frequently as needed to obtain the medical care that could avoid long-term complications of the disease. A noninvasive method for glucose monitoring would greatly aid in increasing the frequency of monitoring and lead to a decrease in the complications associated with diabetes.

Due to this tremendous need for a less invasive method of blood glucose monitoring, a significant amount of research in a variety of fields has focused on developing such a method. Apart from diabetes, there is also a need for more frequent or continuous monitoring of preterm infants and other at-risk infants as well as people with rare metabolic disorders.^{2} Among the many approaches being investigated, research is underway to develop optical methods for noninvasive monitoring, including infrared spectroscopy,^{3, 4} optical coherence tomography (OCT)^{5}, photoacoustic techniques,^{6, 7} Raman spectroscopy,^{8} fluorescence techniques,^{9} and polarimetry.^{10, 11, 12} While some of these techniques are approaching acceptable levels of accuracy *in vivo*, none have been approved for clinical use; therefore, improvements in these modalities, combining these modalities, or other modalities are needed.

Nearly all of the techniques in development measure a signal due not only to glucose but also to many other biological constituents. As a result, the techniques suffer in their specificity to glucose. In addition, the techniques lack sensitivity because the signal due to glucose is generally much smaller than that due to other constituents. In this work, we propose a combined modality optical approach to determine glucose concentration in biological media. Specifically, combining near-infrared (NIR) spectroscopy signals, arguably the most promising optical method, with spectral polarization information lends itself well to a hybrid approach, because simultaneous measurements can be made with a single polarization-sensitive optical system.^{13} This combination exploits three optical effects of glucose: its NIR absorption spectrum, its optical rotatory dispersion (ORD, also known as optical activity), and its refractive index matching effect.^{14} For the simulation study described in this paper, only the first two effects were explored, NIR absorption and ORD, since these are specific to glucose. The refractive index matching effect is the change in the refractive index of the media with changes in constituent concentrations and is not specific to a single analyte.^{14} However, this effect may add further sensitivity and will be investigated in future work. Although glucose was used as the constituent of interest for this study, the technique is applicable for other absorbing and optically active constituents and may well have application to other biological and nonbiological systems [e.g., detection of optically active drugs, remote sensing of planetary (chiral) atmospheres].

To test the combination of NIR and ORD spectroscopy for concentration determination, a model of blood plasma containing glucose and plasma proteins was used to generate intensity and polarization spectra in both clear and scattering media. As with nearly all of the developing optical techniques, this requires extraction of the signal specific to glucose from a signal also influenced by many confounding factors (plasma proteins and water in this case). The field of chemometrics^{15} provides a number of well-developed techniques for analyzing measurements of complex chemical systems to yield constituent concentrations or other properties of interest. Most of these techniques regress one block of data, such as a set of NIR absorption spectra, to a single sample property of the sample, such as glucose concentrations, to build a predictive model. In our case, we wish to regress two blocks of data (NIR intensity and ORD polarization spectra) to improve the predictive abilities of the model. This requires the use of multiblock chemometrics to combine two or more of the data sets into a single predictive model.

The outline of this paper is as follows. First, the blood plasma model used in this study to generate the intensity and polarization signals as a function of wavelength is described. Both a clear and a scattering model were employed in this study to investigate our approach under these two conditions of increasing biological relevance. Second, the chemometric techniques used for concentration predictions are briefly reviewed, and the procedure for building and testing the predictive regression model is described. Third, the results of the predictions are presented for both the clear and scattering model, and the findings and their implications are discussed.

## 2.

## Blood Plasma Model

To investigate the abilities of a combined intensity and polarization method to predict analyte concentrations, a simulated blood plasma model was employed as a biological model. The model was implemented using MATLAB 7 (The MathWorks, Inc). Blood plasma is the liquid components of blood with the blood cells removed, the primary constituents being the proteins albumin, globulins, and fibrinogen as well as glucose. Other constituents of blood plasma such as cholesterol and urea were omitted in this model, because they exist in lower levels.^{16} As noted above, due to the considerable interest in noninvasive glucose monitoring, glucose was chosen as the analyte of interest for this study. The mean and standard deviation of the concentrations of plasma components, in units of
$\mathrm{g}\u2215100\phantom{\rule{0.3em}{0ex}}\mathrm{ml}$
, from a group containing both healthy and diabetic individuals are shown in Table 1
.^{16} Concentrations of the plasma components were randomly generated in this model using these means and standard deviations and assuming a normal distribution. The wavelength range of
$500\phantom{\rule{0.3em}{0ex}}\text{to}\phantom{\rule{0.3em}{0ex}}2000\phantom{\rule{0.3em}{0ex}}\mathrm{nm}$
was chosen for this study because it is the most frequently employed spectral window, due to relatively low hemoglobin absorption, and includes the effects of both absorption and optical activity. The following sections describe the methods used to create the clear and scattering blood plasma models.

## Table 1

Mean and standard deviation of blood plasma component concentrations (g∕100ml) . The standard deviation of fibrinogen was estimated based on standard deviations of albumin and globulin.

Analyte | Mean | Standard Deviation |
---|---|---|

Albumin | 4.2 | 0.54 |

Globulin | 2.8 | 0.47 |

Fibrinogen | 0.3 | 0.045 |

Glucose | 0.18 | 0.113 |

## 2.1.

### Absorption and Optical Rotatory Dispersion

The effects of absorption due to water, plasma proteins, and glucose in the visible and NIR were modeled using experimental data from a number of reports.^{17, 18, 19, 20} Since data for individual plasma protein absorption dispersion could not be found, the representative plasma total protein (albumin, globulin, and fibrinogen) absorption was used. Figure 1
shows the absorption coefficients
${\mu}_{a}\left(\lambda \right)$
per concentration of analyte (
${\mathrm{cm}}^{-1}\phantom{\rule{0.2em}{0ex}}{\mathrm{g}}^{-1}$
dl), given as a function of wavelength for water, total protein, and glucose. To calculate the absorption due to water, the mass displacement caused by the addition of the proteins and glucose was calculated and subtracted from the mass of the initial water. This corrected mass of water was then used to calculate the absorption due to water using the data in Fig. 1. The absorption due to the total protein and glucose was calculated simply by using the mass of each constituent and the appropriate data from Fig. 1. The total absorption coefficient was then calculated as the sum of the absorption coefficients due to each component.

The effects of optical activity due to proteins and glucose in the visible and NIR were modeled using Drude’s equation,

where ${\left[\alpha \right]}_{\lambda}$ is the specific rotation of the molecule in units of rotation per concentration and pathlength (deg ml ${\mathrm{g}}^{-1}\phantom{\rule{0.2em}{0ex}}{\mathrm{dm}}^{-1}$ ) at the wavelength $\lambda $ , $A$ is a constant specific to the molecule, and ${\lambda}_{c}$ is the center wavelength.^{21}The parameters of Drude’s equation for the plasma proteins

^{21}and glucose

^{22}are given in Table 2 and plotted in Fig. 2 . The resulting rotation in degrees is calculated aswhere $C$ is the concentration of the molecule in $\mathrm{g}\u2215100\phantom{\rule{0.3em}{0ex}}\mathrm{ml}$ , and $l$ is the optical pathlength in dm $(\mathrm{dm}=0.1\phantom{\rule{0.3em}{0ex}}\mathrm{m})$ . The total rotation due to all components in plasma was calculated as the sum of the rotations due to each component.

## Table 2

Parameters of Drude’s equation for plasma proteins and glucose.

Analyte | A | λc |
---|---|---|

Albumin | $-1.75\times {10}^{7}$ | 264 |

Globulin | $-1.48\times {10}^{7}$ | 211 |

Fibrinogen | $-1.37\times {10}^{7}$ | 260 |

Glucose | $1.72\times {10}^{7}$ | 150 |

Given the concentrations of the plasma components, the absorption and optical rotation signals as a function of wavelength,
${\mu}_{a}\left(\lambda \right)$
and
$\alpha \left(\lambda \right)$
, can be calculated using the data in Fig. 1 and Eq. 2, respectively. The full polarization state of the light before and after passing through the plasma sample can be described with a four-element Stokes vector
$\mathbf{S}$
.^{23} The first element
$I$
represents the intensity of the light beam, the second element
$Q$
represents the linear polarization at 0 and
$90\phantom{\rule{0.3em}{0ex}}\mathrm{deg}$
, the third element
$U$
represents the linear polarization at 45 and
$135\phantom{\rule{0.3em}{0ex}}\mathrm{deg}$
, and the fourth element
$V$
represents the circular polarization. Polarization effects are applied to the Stokes vector using a
$4\times 4$
Mueller matrix
$\mathbf{M}$
. Given the input state of polarization
${\mathbf{S}}_{\mathrm{i}}$
, the output state
${\mathbf{S}}_{\mathrm{o}}$
after passing through the clear sample is calculated as

## Eq. 3

$${\mathbf{S}}_{\mathrm{o}}={\mathbf{M}}_{\mathrm{a}}{\mathbf{M}}_{\mathrm{r}}{\mathbf{S}}_{\mathrm{i}},$$## Eq. 4

$$\left[\begin{array}{c}{I}_{o}\left(\lambda \right)\\ {Q}_{o}\left(\lambda \right)\\ {U}_{o}\left(\lambda \right)\\ {V}_{o}\left(\lambda \right)\end{array}\right]=\left[\begin{array}{cccc}{e}^{-{\mu}_{a}\left(\lambda \right)l}& 0& 0& 0\\ 0& {e}^{-{\mu}_{a}\left(\lambda \right)l}& 0& 0\\ 0& 0& {e}^{-{\mu}_{a}\left(\lambda \right)l}& 0\\ 0& 0& 0& {e}^{-{\mu}_{a}\left(\lambda \right)l}\end{array}\right]\left[\begin{array}{cccc}1& 0& 0& 0\\ 0& \mathrm{cos}\left(2\alpha \left(\lambda \right)\right)& \mathrm{sin}\left(2\alpha \left(\lambda \right)\right)& 0\\ 0& -\mathrm{sin}\left(2\alpha \left(\lambda \right)\right)& \mathrm{cos}\left(2\alpha \left(\lambda \right)\right)& 0\\ 0& 0& 0& 1\end{array}\right]\left[\begin{array}{c}{I}_{i}\left(\lambda \right)\\ {Q}_{i}\left(\lambda \right)\\ {U}_{i}\left(\lambda \right)\\ {V}_{i}\left(\lambda \right)\end{array}\right],$$Six wavelength-dependent signals were available to build the regression model (refer to Sec. 3) and predict the glucose concentrations in clear (nonscattering) media [ ${\mu}_{a}\left(\lambda \right)$ and $\alpha \left(\lambda \right)$ calculated based on the constituent concentrations, and the Stokes parameters ${I}_{o}\left(\lambda \right)$ , ${Q}_{o}\left(\lambda \right)$ , ${U}_{o}\left(\lambda \right)$ , and ${V}_{o}\left(\lambda \right)$ calculated using Eq. 4 based on the input polarization]. In an experimental situation, the measurements of the Stokes parameters would be made and ${\mu}_{a}\left(\lambda \right)$ and $\alpha \left(\lambda \right)$ would be calculated based on measured Stokes parameters. Using the described methods to calculate absorption and optical activity, intensity and polarization spectra were generated from $500\phantom{\rule{0.3em}{0ex}}\text{to}\phantom{\rule{0.3em}{0ex}}2000\phantom{\rule{0.3em}{0ex}}\mathrm{nm}$ in $2\text{-}\mathrm{nm}$ intervals with physiological levels of the plasma constituents, and with the optical pathlength through the plasma being $1\phantom{\rule{0.3em}{0ex}}\mathrm{cm}$ . The light incident on the sample was set to be linearly polarized horizontally $({\mathbf{S}}_{\mathrm{i}}={\left[1\phantom{\rule{0.3em}{0ex}}1\phantom{\rule{0.3em}{0ex}}0\phantom{\rule{0.3em}{0ex}}0\right]}^{\mathrm{T}})$ . To assess the predictive abilities of the models in an experimental situation, simulated Gaussian noise was added to each signal with an increasing standard deviation. This added uncertainty simulated both source and measurement noise.

## 2.2.

### Scattering Monte Carlo Model

To investigate the ability to predict analyte concentrations using the combined method in scattering biological media such as tissue, a Monte Carlo model for polarized light propagation in multiply scattering media was used. This validated model,^{24, 25} available for download,^{26} records individual photon packet polarizations in the form of Stokes vectors and includes the effects of multiple scattering, absorption, optical activity, and linear birefringence. The model tracks a large number of photons as they propagate and multiply scatter through the sample, then sums the photons to calculate their macroscopic properties of interest.

To reduce the computational time needed to generate data, simulations were run to create a lookup-table with absorption and optical activity as the independent parameters. This lookup table approach is similar to that taken in other Monte Carlo studies.^{27} The computational time required was reduced in two ways: first, the number of simulations needed was reduced because data for each full spectrum were not required, since each spectrum could be generated from the lookup table; second, fewer photons were needed in each simulation because the lookup table could be smoothed to reduce the noise due to the statistical nature of the model. To create the lookup table, the values of absorption and optical activity were varied in the full range of blood plasma physiological levels as found from the compiled data. The changes in refractive index and the resulting effects on scattering due to the variations in plasma components were ignored in this study to reduce the parameter space of the lookup table, and thus to reduce the number of simulations required. This approximation is a reasonable first step; in fact, its inclusion, while complicating the analysis, may actually improve the ability to predict analyte concentrations due to an increased sensitivity to concentration changes in both light intensity and polarization.^{13} In addition, refractive index dispersion in all materials was ignored to reduce the required simulations. Finally, the linear birefringence was set to zero since this effect was ignored in the current study.

Simulations were run with the absorption coefficient varied between 0 and $30\phantom{\rule{0.3em}{0ex}}{\mathrm{cm}}^{-1}$ in $2\text{-}{\mathrm{cm}}^{-1}$ increments, and with the optical activity varied between 0 and $0.009\phantom{\rule{0.3em}{0ex}}\mathrm{rad}\phantom{\rule{0.2em}{0ex}}{\mathrm{cm}}^{-1}$ (0 and $0.5157\phantom{\rule{0.3em}{0ex}}\mathrm{deg}\phantom{\rule{0.2em}{0ex}}{\mathrm{cm}}^{-1}$ ) in $0.001\phantom{\rule{0.3em}{0ex}}\mathrm{rad}\phantom{\rule{0.2em}{0ex}}{\mathrm{cm}}^{-1}$ $\left(0.05730\phantom{\rule{0.3em}{0ex}}\mathrm{deg}\phantom{\rule{0.2em}{0ex}}{\mathrm{cm}}^{-1}\right)$ increments, to create a $10\times 16$ lookup table. All simulations were run with $3\times {10}^{8}$ photons. The sample was a $1\times 1\times 1\phantom{\rule{0.3em}{0ex}}{\mathrm{cm}}^{3}$ cube with a scattering coefficient of $60\phantom{\rule{0.3em}{0ex}}{\mathrm{cm}}^{-1}$ , which is somewhat lower than that of tissues; this coefficient was selected for this feasibility investigation to reduce the simulation times by reducing the number of photons required so each photon would retain a higher degree of polarization. The scattering particles in the medium were simulated to represent spherical polystyrene microspheres with a diameter of $1.4\phantom{\rule{0.3em}{0ex}}\mu \mathrm{m}$ . The refractive indices of the scattering particles and the surrounding media were 1.59 and 1.33, respectively (corresponding to polystyrene and water), giving an anisotropy $g=0.93$ for the microspheres at $633\phantom{\rule{0.3em}{0ex}}\mathrm{nm}$ . The photons exiting the sample were binned in $1\phantom{\rule{0.3em}{0ex}}{\mathrm{mm}}^{2}$ detection areas with an acceptance angle of $20\phantom{\rule{0.3em}{0ex}}\mathrm{deg}$ to approximate a typical experimental measurement. The forward direction (direct transmission) was chosen as the geometry of interest to compare the results with those from the clear plasma model. As in the clear media case, the light incident on the sample was set to be horizontally linearly polarized $({\mathbf{S}}_{\mathrm{i}}={\left[1\phantom{\rule{0.3em}{0ex}}1\phantom{\rule{0.3em}{0ex}}0\phantom{\rule{0.3em}{0ex}}0\right]}^{\mathrm{T}})$ . Spectra were then generated with the lookup table to create wavelength-dependent Stokes parameters [ ${I}_{o}\left(\lambda \right)$ , ${Q}_{o}\left(\lambda \right)$ , ${U}_{o}\left(\lambda \right)$ , and ${V}_{o}\left(\lambda \right)$ ] between 500 and $2000\phantom{\rule{0.3em}{0ex}}\mathrm{nm}$ in $2\text{-}\mathrm{nm}$ intervals using the appropriate values for optical activity and absorption at each wavelength. The orientation of polarization $\gamma \left(\lambda \right)$ as a function of wavelength was found from the output Stokes parameters as

## Eq. 5

$$\gamma \left(\lambda \right)=\frac{1}{2}\phantom{\rule{0.2em}{0ex}}{\mathrm{tan}}^{-1}\left[\frac{{U}_{o}\left(\lambda \right)}{{Q}_{o}\left(\lambda \right)}\right].$$## 3.

## Chemometric Methods

Two chemometric algorithms, unfold partial least squares (U-PLS),^{28} and multiblock partial least squares (MB-PLS).^{29} were used to combine the intensity [
$I\left(\lambda \right)$
and
${\mu}_{a}\left(\lambda \right)$
] and polarization [
$Q\left(\lambda \right)$
,
$U\left(\lambda \right)$
,
$V\left(\lambda \right)$
, and
$\gamma \left(\lambda \right)$
] data in a model to predict the glucose concentration. These two methods were chosen because they employ the widely used conventional partial least squares (PLS) regression algorithm currently used in most glucose monitoring techniques. The Multiblock Toolbox for MATLAB was used to implement the algorithms.^{30} This section will briefly review these two algorithms; further details are available in the cited references.

Both algorithms are based on the widely used PLS regression method. In PLS, a regression relationship is found between a descriptor block or matrix of data (in our case, a set of intensity or polarization spectra) and a response block or matrix (in our case, a set of corresponding glucose concentrations). This is achieved by decomposing both the descriptor and response blocks into so-called latent variables that describe the maximum variance in the data. The regression is calculated based on how the variances in each block explain each other—in other words, finding the covariance between the blocks. In situations where there is more than one descriptor block, as is the case with the intensity and polarization measurements, a method of combining the information in both descriptor blocks must be used to predict the response block.

The most rudimentary approach for combining and regressing the data is U-PLS. In this method, each descriptor block is placed in a single combined block, as shown in Fig. 3 , and normal PLS regression is performed on this combined descriptor block. In other words, as the name suggests, U-PLS “unfolds” the multiblock data into a single block. The inclusion of more information in the combined descriptor block can improve the predictive ability of the model. A more sophisticated method of combining and regressing the data is MB-PLS. In this method, the blocks are kept separate; however, the blocks are used to create a single super descriptor block that is then regressed to the response block using PLS. The creation of this super block involves finding the common information contained in each of the descriptor blocks, referred to as the “consensus” in the literature. In other words, the variations in the signals that are common in both descriptor blocks are identified as the consensus between the two blocks. Since each block contains measurements done on the same samples, looking at the common information contained in each block can provide better predictive ability for analytes that affect both measurement techniques (as in the plasma model). For example, glucose (as well as plasma proteins) influences both the intensity and polarization signals, causing a common variation in both signals that should be identified in the calculation of the consensus.

Once the regression techniques have been chosen, two steps are required to develop a chemometric model: calibration and testing. In calibration, often referred to as training, a set of data (both descriptor and response blocks) is used to create the PLS model. In our case, a set of intensity and polarization spectra (descriptor blocks) and glucose concentrations corresponding to each spectra (response blocks) were used to calculate the regression parameters using either U-PLS or MB-PLS. Once the model is built in the calibration step, it must be tested with new data to ensure the validity of the model and to assess its predictive ability. In our test, a new set of intensity and polarization spectra, along with corresponding glucose concentrations, was generated for the purpose of testing both regression techniques. The size of both the calibration and testing data sets was 400 spectra with corresponding glucose concentrations. The predictive ability of the methods was determined by calculating the root mean square error of prediction (percent error) using the testing data set, with increasing noise added to both the training and testing spectra. The percent error was then used to calculate the error as a percentage of the mean glucose concentration. Results from conventional PLS regression on intensity and polarization spectra separately were also calculated to quantify the improvement in the combined methodology.

## 4.

## Results and Discussion

## 4.1.

### Clear Media

The percent error values for the clear plasma model are shown in Figs.
4, 5, 6
. They demonstrate the improved predictive abilities of the combined intensity and polarization approach. The descriptor block contained 400 spectra between 500 and
$2000\phantom{\rule{0.3em}{0ex}}\mathrm{nm}$
in
$2\text{-}\mathrm{nm}$
intervals, with each spectrum having a corresponding glucose concentration in the response block. For reference, a typical human blood glucose level is
$0.18\phantom{\rule{0.3em}{0ex}}\mathrm{g}\u2215100\phantom{\rule{0.3em}{0ex}}\mathrm{ml}$
(from Table 1); to be within the generally acceptable level of error for glucose monitoring of 15%^{31} (some other reports indicate 20%^{32}), the predictive error should be less than
$0.027\phantom{\rule{0.3em}{0ex}}\mathrm{g}\u2215100\phantom{\rule{0.3em}{0ex}}\mathrm{ml}$
. Figure 4 plots the results from regressing
$\alpha \left(\lambda \right)$
and
${\mu}_{a}\left(\lambda \right)$
individually, with increasing noise added to the signals using conventional PLS, and the combined regression results using U-PLS and MB-PLS. In this case, the standard deviation of the added noise was specified as a percentage of the mean value of each intensity and polarization signal. This was done to corrupt the signals at approximately the same rate, because the signals differed considerably in magnitude. Figure 4 shows that regressing the absorption spectra
${\mu}_{a}\left(\lambda \right)$
produced much better predictive results (i.e., less corrupted by noise) than regressing the optical activity spectra
$\alpha \left(\lambda \right)$
. This is due to the similarity in the ORD of the plasma components, as shown in Fig. 1, making them difficult for the PLS regression to separate. However, when the absorption and optical activity spectra were regressed with the combined methods (either with U-PLS or MB-PLS), there was considerable improvement over regressing the absorption spectra (or ORD) alone. Both combined methods produced an approximately 25% reduction in percent error with the maximum added noise over
${\mu}_{a}\left(\lambda \right)$
alone. As hypothesized, the combination of the information in both these spectra decreased the predictive error. Given the poor results of the optical activity spectra regression, the large improvement when it was combined with the absorption spectra was somewhat surprising. Evidently, enough information was contained in the optical activity spectra to complement the absorption spectra and improve the combined regression results.

To test the regression methods on more experimentally realistic signals (i.e., on results from a typical polarimeter^{13}), the Stokes vectors of the output light from the clear plasma model were calculated using Eq. 4 with input horizontally and linearly polarized light
$({\mathbf{S}}_{\mathrm{i}}={\left[1\phantom{\rule{0.3em}{0ex}}1\phantom{\rule{0.3em}{0ex}}0\phantom{\rule{0.3em}{0ex}}0\right]}^{\mathrm{T}})$
. The results from regressing the Stokes parameters individually [including the intensity
$I\left(\lambda \right)$
] using conventional PLS with increasing noise added to each parameter are shown in Fig. 5. Stokes parameter
$V\left(\lambda \right)$
(circular polarization) was omitted because it contained no information in the present model. Predictions with the parameter
$U\left(\lambda \right)$
produced poor results because the input light was horizontally polarized, yielding small values of
$U\left(\lambda \right)$
that were rapidly corrupted by the added noise. A small change due to the rotation of the polarization from optical activity was present; however, this small change was quickly swamped by the added noise. Predictions using either
$I\left(\lambda \right)$
or
$Q\left(\lambda \right)$
parameters produced similar and far superior results. The predictive errors using
$I\left(\lambda \right)$
and
$Q\left(\lambda \right)$
were very similar since they contained nearly identical information due to the fully horizontal polarization of the input light
$[{I}_{i}\left(\lambda \right)={Q}_{i}\left(\lambda \right)]$
. Stokes parameter
$Q\left(\lambda \right)$
was attenuated by the absorption and slightly reduced while the optical activity while the polarization vector was rotated. In this case, even though
$Q\left(\lambda \right)$
was affected by both absorption and optical activity, there was no improvement gained over
$I\left(\lambda \right)$
, which contained only absorption information. This suggests that even though
$Q\left(\lambda \right)$
was affected by both absorption and optical activity, it was difficult for PLS to decouple the effects.

The results from combined regression methods for the Stokes parameters are shown in Fig. 6, along with the result for the regression of $I\left(\lambda \right)$ alone (from Fig. 5) for comparison. Both U-PLS and MB-PLS produced lower predictive errors than regressing $I\left(\lambda \right)$ alone. In both cases, combining either $I\left(\lambda \right)$ and $Q\left(\lambda \right)$ or $I\left(\lambda \right)$ , $Q\left(\lambda \right)$ , and $U\left(\lambda \right)$ resulted in similar predictive errors, yielding little additional information in the $U\left(\lambda \right)$ signal as discussed above. However, MB-PLS produced significantly lower predictive errors than U-PLS. The reductions in percent error for U-PLS and MB-PLS were approximately 8% and 15%, respectively, at the maximum noise level. These results differed from the results of predicting using $\alpha \left(\lambda \right)$ and ${\mu}_{a}\left(\lambda \right)$ , where U-PLS and MB-PLS prediction accuracy were nearly the same. Evidently, the creation of the consensus super-block in MB-PLS by combining the common information in each descriptor block leads to better predictive power in the case of the Stokes parameters. This is likely due to the mix of absorption and optical activity information contained in Stokes parameters $Q\left(\lambda \right)$ and $U\left(\lambda \right)$ as opposed to the pure optical activity and absorption channels $\alpha \left(\lambda \right)$ and ${\mu}_{a}\left(\lambda \right)$ . This mixing may make it difficult for the regression methods to decouple the effects when building the predictive models. These results suggest that regressing absorption combined with optical rotation, rather than the measured Stokes parameters, provides better predictive abilities [an improvement of approximately 25% for combined $\alpha \left(\lambda \right)$ and ${\mu}_{a}\left(\lambda \right)$ versus an improvement of approximately 15% for the combined Stokes parameters]. However, when optical rotation is calculated based on measured Stokes parameters, the magnitude of noise will increase as the parameters are divided in Eq. 5. Therefore, as shown later, the potential improvement is diminished. To summarize, it is apparent from these results that combining the information contained in the intensity and polarization spectra improves the predictive abilities for glucose in simulated clear media.

## 4.2.

### Scattering Media

The results of predictive calculations using the data generated by the Monte Carlo model for scattering media are shown in Figs. 7, 8, 9 . Similar to the clear media, the descriptor block contained 400 spectra between 500 and $2000\phantom{\rule{0.3em}{0ex}}\mathrm{nm}$ in $2\text{-}\mathrm{nm}$ intervals with each spectrum having a corresponding glucose concentration in the response block. As also was the case for the clear media, a reduction in predictive error was achieved by combining the intensity and polarization signals. Figure 7 plots the percent error for the Stokes parameters $I\left(\lambda \right)$ , $Q\left(\lambda \right)$ , $U\left(\lambda \right)$ , and $V\left(\lambda \right)$ , plus the orientation of the polarization $\gamma \left(\lambda \right)$ calculated using Eq. 5, with each regressed individually. In contrast to Fig. 5, the results for the parameter $V\left(\lambda \right)$ are now shown, since scattering produced some transfer of linear-to-circular polarization [giving a nonzero value of $V\left(\lambda \right)$ ] and therefore some information is now contained in $V\left(\lambda \right)$ . As expected, when using single information channels, regression using $I\left(\lambda \right)$ produced the lowest predictive error. The other Stokes parameters and $\gamma \left(\lambda \right)$ suffered (to various extents) in predictive ability as a result of the depolarizing effects of multiple scattering. This depolarizing reduced the values of the Stokes parameters, leading to large corruption, relative to $I\left(\lambda \right)$ , by the added simulated noise. The fraction of retained polarization for the incident fully linear (horizontally) polarized light was approximately 50% after propagating through the media. As a result, the values of the Stokes parameters $Q\left(\lambda \right)$ , $U\left(\lambda \right)$ , and $V\left(\lambda \right)$ were significantly reduced. Because the incident light was horizontally polarized, $Q\left(\lambda \right)$ yielded much lower predictive errors than either $U\left(\lambda \right)$ , $V\left(\lambda \right)$ , or $\gamma \left(\lambda \right)$ , although the percent error was still higher than that for $I\left(\lambda \right)$ due to depolarization.

The predictive errors for the combined regression methods are shown in Figs. 8 and 9, along with the results for the regression of $I\left(\lambda \right)$ alone for comparison (from Fig. 6). In addition to the combinations of the Stokes vectors, the orientation of the polarization vector $\gamma \left(\lambda \right)$ as calculated by Eq. 5 was also combined and regressed with $I\left(\lambda \right)$ using U-PLS and MB-PLS. All the combined methods with Stokes parameters improved the predictive power; however, the improvement was more modest than that for clear media calculations. In addition, little difference was observed between the percent errors for U-PLS and MB-PLS, as was seen for the clear media, with the overall reduction in error being approximately 8% for all combined Stokes parameter prediction methods. Also, the combination of $I\left(\lambda \right)$ and $\gamma \left(\lambda \right)$ through U-PLS or MB-PLS produced little or no improvement. In fact, the predictions when using U-PLS shown in Fig. 8 overlapped with predictions when using $I\left(\lambda \right)$ alone. This is likely due to an increase in noise when calculating $\gamma \left(\lambda \right)$ through Eq. 5, which led to a decrease in predictive power [i.e., the effective error on $\gamma \left(\lambda \right)$ was approximately two times that for the other parameters]. As previously discussed, the lower improvements are due to the depolarizing effects of multiple scattering and the subsequent reduction in the Stokes parameters $Q\left(\lambda \right)$ , $U\left(\lambda \right)$ , and $V\left(\lambda \right)$ . However, even with the lower improvement for the combined approaches in scattering media, the results still demonstrate the potential for the method. A significant improvement in prediction is still achieved with the combination of intensity and polarization information from multiply scattering media. Future work will further investigate the influence of scattering in an effort to determine at what level of scattering an improvement is still realized. Since this rather large reduction in improvement has already been observed from clear media to a scattering coefficient of $60\phantom{\rule{0.3em}{0ex}}{\mathrm{cm}}^{-1}$ , it would seem unlikely that predictions with more realistic tissue-scattering values $(\sim 100\phantom{\rule{0.3em}{0ex}}{\mathrm{cm}}^{-1})$ would exhibit significant improvement. However, this is yet to be determined, and smoothing or other preprocessing steps could be used to reduce the signal noise. The present results demonstrate the potential for the methodology in clear or moderately scattering biological fluids such as blood plasma, or extra-cellular fluid and tissues such as the aqueous humor of the eye.

Four additional areas have been identified for further investigation and will be explored in future work. First, the effects of refractive index matching due to variations in constituents^{13} will be added to the model to investigate how these effects influence the predictive abilities of the regression techniques. The changes in the refractive index and resulting changes in the scattering properties of the media will influence both the intensity and polarization of the light propagating through the media and may add some additional sensitivity to the constituent concentration changes. It is possible that this may improve the predictive power of this methodology. Second, the influence of other common biological optical effects such as linear birefringence^{25} on the abilities of the regression methods will also be investigated. Linear birefringence has the effect of transferring the polarization state from linear to circular or vice versa, and will interfere with the effects of optical activity. The extent to which this influences the abilities of the methodology will be investigated by employing a method of polar Mueller matrix decomposition to decouple the two effects.^{33, 34} This will be done in both clear and scattering media with increasing scattering coefficients. Third, the effect of detection geometry and wavelength range will also be investigated to determine the optimal experimental parameters for the measurement of intensity and polarization signals.^{11, 13} This will involve both simulations and experimental work. Fourth, experimental testing must be completed to validate the combining methodology, because the current results are for simulated measurements only. This will involve the design, construction, and testing of a combined spectral intensity and polarimetry system suitable for experimental measurements.

## 5.

## Conclusions

We have simulated and analyzed a combined intensity and polarization method for analyte concentration determination in clear and scattering biological-like media with glucose as the analyte of interest. The determination of glucose concentrations in biological media is a crucially important unsolved problem in clinical medicine, because a tremendous need exists for a noninvasive method that diabetics can use to monitor their blood glucose levels. The methodology was tested using a simulated blood plasma model for both clear and scattering media. The spectral intensity and polarization data generated through this model were used to build and test predictive chemometric models that combined the two data sets. The results showed that the combination of these two modalities improved the predictive ability over the use of intensity or polarization data alone, for both simulated clear and scattering media, thus demonstrating potential for the methodology. This improvement is reduced in scattering media due to light depolorization. Thus, the applicability of these methods may be limited in higher scattering media; however, further study is required to investigate this. In addition, the effects of refractive index matching, tissue birefringence, detection geometry, and wavelength range, which have not been included in the current model, require further study. The development of a suitable combined intensity and polarization system and experimental testing of the methodology is also required for validation. The applicability of this methodology is not limited to glucose sensing, because other analytes of interest could be targeted. The methodology is also not limited to biological media, since nonbiological applications for concentration determination of absorbing optically active molecules exist—for example, in remote sensing scenarios. To our knowledge, this is the first demonstration of a combined spectral intensity and polarization methodology for analyte concentration determination.

## Acknowledgments

The authors would like to thank Dr. Robert Weersink for assistance with the chemometric software and Drs. Xinxin Guo and Nirmalya Ghosh (all from the Ontario Cancer Institute) for assistance with the preparation of this manuscript. Financial support from the Natural Sciences and Engineering Research Council of Canada and the Ontario Graduate Scholarship Program is gratefully acknowledged.

## References

**,” IEEE Spectrum, 39 52 –57 (2002). https://doi.org/10.1109/6.993789 0018-9235 Google Scholar**

*Keeping watch on glucose: new monitors help fight the long-term complications of diabetes***,” J. Fluoresc., 14 513 –520 (2004). 1053-0509 Google Scholar**

*Current problems and potential techniques in**in vivo*glucose monitoring**,” IEEE J. Sel. Top. Quantum Electron., 9 322 –330 (2003). https://doi.org/10.1109/JSTQE.2003.811283 1077-260X Google Scholar**

*Noninvasive blood glucose assay using a newly developed near-infrared system***,” J. Mol. Struct., 735–736 299 –306 (2005). 0022-2860 Google Scholar**

*Noninvasive glucose determination in the human eye***,” Diabetes Care, 25 2263 –2267 (2002). https://doi.org/10.2337/diacare.25.12.2263 0149-5992 Google Scholar**

*Non-invasive blood glucose monitoring with optical coherence tomography***,” Clin. Chem., 45 1587 –1595 (1999). 0009-9147 Google Scholar**

*Advances in photoacoustic noninvasive glucose testing***,” J. Biomed. Opt., 10 031114 (2005). https://doi.org/10.1117/1.1920212 1083-3668 Google Scholar**

*Raman spectroscopy for noninvasive glucose measurements***,” J. Fluoresc., 14 617 –633 (2004). https://doi.org/10.1023/B:JOFL.0000039349.89929.da 1053-0509 Google Scholar**

*Ophthalmic glucose monitoring using disposable contact lenses—a review***,” J. Biomed. Opt., 9 213 –220 (2004). https://doi.org/10.1117/1.1629683 1083-3668 Google Scholar**

*Balanced detection for low noise precision polarimetric measurements of optically active, multiply scattering tissue phantoms***,” Opt. Express, 15 1348 –1360 (2007). https://doi.org/10.1364/OE.15.001348 1094-4087 Google Scholar**

*Monte Carlo study of pathlength distribution of polarized light in turbid media***,” J. Biomed. Opt., 10 024029 (2005). https://doi.org/10.1117/1.1891175 1083-3668 Google Scholar**

*Dual-wavelength polarimetry for monitoring glucose in the presence of varing birefringence***,” J. Biomed. Opt., 11 041105 (2006). https://doi.org/10.1117/1.2339134 1083-3668 Google Scholar**

*Angular resolved linear Stokes polarimeter measurements of light scattered by turbid chiral media***,” J. Biomed. Opt., 7 291 –299 (2002). https://doi.org/10.1117/1.1483880 1083-3668 Google Scholar**

*Optical rotation and linear and circular depolarization rates in diffusely scattered light from chiral, racemic, and achiral turbid media***,” Anal. Chim. Acta, 371 255 –267 (1998). https://doi.org/10.1016/S0003-2670(98)00318-3 0003-2670 Google Scholar**

*Measurement of glucose and other analytes in undiluted human serum with near-infrared transmission spectroscopy***,” Vib. Spectrosc., 18 33 –40 (1998). 0924-2031 Google Scholar**

*Near-infrared spectra serum albumin and $\gamma $-globulin and determination of their concentrations in phosphate buffer solution by partial least squares regression***,” Appl. Opt., 41 (7), 1469 –1475 (2002). https://doi.org/10.1364/AO.41.001469 0003-6935 Google Scholar**

*Determination of glucose concentration in a scattering medium on selected wavelengths by use of an overtone absorption band***,” Analyst (Cambridge, U.K.), 128 1471 –1477 (2003). 0003-2654 Google Scholar**

*Simultaneous determination of human serum albumin, $\gamma $-globulin, and glucose in a phosphate buffer solution by near-infrared spectroscopy with moving window partial least-squares regression***,” J. Biomed. Opt., 11 (4), 041128 (2006). https://doi.org/10.1117/1.2342076 1083-3668 Google Scholar**

*Prediction of glucose in whole blood near-infrared spectroscopy: influence of wavelength region, preprocessing, and hemoglobin concentration***,” J. Biomed. Opt., 5 (1), 5 –16 (2000). https://doi.org/10.1117/1.429962 1083-3668 Google Scholar**

*Optical glucose monitoring in biological fluids: an overview***,” Absorption and Scattering of Light by Small Particles, Wiley, New York (1983). Google Scholar**

*Electromagnetic theory***,” Opt. Express, 13 148 –163 (2005). https://doi.org/10.1364/OPEX.13.000148 1094-4087 Google Scholar**

*Robust concentration determination of optically active molecules in turbid media with validated three-dimensional polarization sensitive Monte Carlo calculations***,” J. Biomed. Opt., 12 (1), 014029 (2007). https://doi.org/10.1117/1.2434980 1083-3668 Google Scholar**

*Polarized light propagation in multiply scattering media exhibiting both linear birefringence and optical activity: Monte Carlo model and experimental methodology***,” (2005) http://www.novajo.ca/ont-canc-inst-biophotonics/ Google Scholar**

*Pol-MC: a three dimensional polarization sensitive Monte Carlo implementation for light propagation in tissue***,” Phys. Med. Biol., 48 2373 –2390 (2003). https://doi.org/10.1088/0031-9155/48/15/309 0031-9155 Google Scholar**

*Simulation study of in vitro glucose measurements by NIR spectroscopy and a method of error reduction***,” J. Chemom., 1 (1), 41 –56 (1987). https://doi.org/10.1002/cem.1180010107 0886-9383 Google Scholar**

*Multi-way principal components and PLS analysis***,” J. Chemom., 3 3 –20 (1988). 0886-9383 Google Scholar**

*A multiblock partial least squares algorithm for investigating complex chemical systems***,” http://www.models.kvl.dk/source/MBToolbox/ Google Scholar**

*Multi-block toolbox for Matlab***,” Diabetes Care, 10 95 –99 (1987). 0149-5992 Google Scholar**

*Consensus statement on self-monitoring of blood glucose***,” Diabetes Tech. Therapeut., 9 545 –552 (2007). Google Scholar**

*Assessing glucose monitor performance—a standardized approach***,” J. Opt. Soc. Am. A, 13 1106 –1113 (1996). 0740-3232 Google Scholar**

*Interpretation of Mueller matrices based on polar decomposition***,” Opt. Express, 14 190 –202 (2006). https://doi.org/10.1364/OPEX.14.000190 1094-4087 Google Scholar**

*Mueller matrix approach for determination of optical rotation in chiral turbid media in backscattering geometry*