Mapping the probability of freshwater algal blooms with various spectral indices and sources of training data

Abstract. Algal blooms are pervasive in many freshwater environments and can pose risks to the health and safety of humans and other organisms. However, monitoring and tracking of potentially harmful blooms often relies on in-person observations by the public. Remote sensing has proven useful in augmenting in situ observations of algal concentration, but many hurdles hinder efficient application by end users. First, numerous approaches to estimate aquatic chlorophyll-a are available and can produce inconsistent results. Second, lack of quantitative in situ observations limits opportunities to train models for specific waterbodies, such that models developed for other systems must be used instead. We (1) implement univariate and multivariate logistic regression models to estimate the probability that aquatic chlorophyll-a concentrations exceed an accepted threshold beyond which harmful effects become likely and (2) evaluate the use of visually classified bloom/no-bloom satellite imagery to augment in situ training data. Using a binary classification of aquatic chlorophyll-a exceeding 10 μg∕L, we found that (1) logistic regression models were ∼80% accurate, (2) univariate models trained with visually classified data produce nearly the same accuracy (79%) as models trained with in situ observations (80%), and (3) augmenting in situ chlorophyll-a observations with visual classifications outperformed (82% accuracy) models trained on in situ observations alone (80% accuracy). These results provide a framework for evaluating multiple spectral indices in retrieving algal bloom presence or absence and illustrate that training data derived directly from satellite imagery can be useful in augmenting in situ observations.


Introduction
Freshwater algal blooms are a global concern, 1-4 and there is evidence that they are becoming more common in response to climate change. 1,[5][6][7] Because algal blooms can adversely affect public health, economies, and ecosystem services by degrading water quality, 8,9 early identification of algal blooms can improve public safety and mitigate economic concerns.
Algal blooms are often identified via visual inspection of a waterbody, 10 with reports from both waterbody managers and the public playing a fundamental role in algal bloom monitoring for state environmental monitoring agencies. [11][12][13][14][15][16] Although visual inspection by water quality agencies and public health departments is a relatively accurate way to identify the presence of algal blooms, 10 the number of waterbodies that can be monitored this way is limited. Further, visual inspection results can be subjective and conclusions might differ between individuals, even when optical recording devices are used. 17 As a result, some public health agencies maintain reactionary stances to algal bloom monitoring, waiting for a bloom to be reported before investigating, analyzing, and providing public health guidance. 18,19 This stance can result in incomplete monitoring coverage (e.g., omission of algal bloom events unless reported), and delays in public health notices that can have real-world implications on human health and socioeconomics. 19 Remote sensing has the potential to augment in situ visual inspection while increasing the spatial scale of coverage. In the past 50 years, considerable research attention has been devoted to developing remote sensing techniques for identifying and tracking algal blooms. 20,21 Remote sensing of water quality for inland, freshwater systems has lagged marine applications partially due to the optical complexity of inland waters. 22 Despite this lag, nearly 30 years of studies have focused on the development of methods to derive water quality metrics from spectral signatures. 22 In the past 20 years, a shift toward operationalizing freshwater water quality remote sensing has occurred. 22,23 Identifying cyanobacterial blooms has been the focus of significant investment in remote sensing, with particular focus on the ocean and land color instrument (OLCI) on board the Sentinel-3A and Sentinel-3B satellites. [24][25][26][27]
By focusing on spectral features at 665 and 681 nm, this body of work relies on a well characterized two-step approach to identify the presence of phycocyanin and to then quantify the strength of the signal. 24,28,29 OLCI collects imagery with a nominal 300-m ground sampling distance, allowing for the monitoring of larger waterbodies and the production of operational cyanobacterial index products at a large spatial scale. However, these products do not have sufficient spatial resolution to monitor the near-shore environment nor narrow waterbodies that are common in the intermountain west, where deep river valleys have been dammed to create reservoirs that produce hydropower and supply irrigation and drinking water.
Satellite-based sensors with spatial resolution sufficient to resolve narrow waterbodies [e.g., the operational land imager (OLI) on Landsat-8 and Landsat-9, and the multispectral instrument (MSI) on Sentinel-2A and Sentinel-2B] do not have the spectral resolution required to implement the cyanobacteria index approach listed above. 29,30 Instead, work with these images to identify algal conditions has focused on retrieving chlorophyll-a, 31,32 which has been demonstrated to serve as a robust surrogate for cyanobacterial concentrations in conditions dominated by cyanobacteria. 33 Focusing on chlorophyll-a precludes differentiation between harmful algal blooms dominated by cyanobacteria and other aquatic photosynthetic growth. 28,34,35 This lack of specificity leads to a bias toward public health protection when noncyanobacterial blooms are identified. Further, the 10-m spatial resolution delivered by Sentinel-2 imagery used in this study allows waterbody managers and public health officials to monitor relatively small waterbodies, narrow portions of larger waterbodies (e.g., bays), and near-shore environments where blooms can accumulate due to wind driven transport. 36,37 In this work, we evaluate the ability to classify chlorophyll concentrations using imagery with higher spatial resolution but lower spectral and temporal resolution from the MSI on board the Sentinel-2A and Sentinel-2B satellites.
Multiple spectral indices have been developed to retrieve chlorophyll-a conditions from a range of passive optical sensors and are presented in the literature. 30,[38][39][40][41][42][43][44][45][46] However, none of these approaches have been shown to consistently outperform the others in retrieving chlorophyll-a concentrations. Additionally, we typically lack water quality observations for any given waterbody that are coincident with satellite imagery despite large-scale projects to compile such matchups. 47 As such, two distinct challenges must be addressed when using satellite imagery to estimate water quality: (1) identifying spectral indices that describe water quality metrics of interest and (2) relating these spectral indices to water quality metrics in the absence of in situ observations. First, we hypothesize that incorporating multiple spectral indices will describe water quality more robustly than selecting a single spectral index. We test this hypothesis by evaluating the accuracy of single variate logistic regression models for each spectral index against multivariate logistic regression models that incorporate multiple spectral indices. Second, we hypothesize that algal blooms can be identified directly from true color composite satellite imagery, obviating the need for in situ observations. We test this hypothesis by training univariate and multivariate logistic regression models of algal bloom presence with bloom observations identified via visual interpretation of satellite imagery. We evaluate the performance of the logistic regression model calibrated with the visual interpretation calibration dataset relative to those calibrated with in situ samples to determine the efficacy of generating training data from satellite imagery. The work presented here differs from previous efforts by combining bloom presence and absence data with logistic regression models to produce bloom presence probabilities from multivariate models.

Study Site
This work was conducted in Brownlee Reservoir, located on the Idaho-Oregon border (Fig. 1). It is the largest reservoir in the Hells Canyon Complex of hydroelectric reservoirs at 61 km² in surface area, 93 km in length, and 1.8 km³ in volume, with a maximum depth of nearly 100 m near the dam. 50 The reservoir is ∼650 m wide, on average, and is surrounded by hills with 20% to 30% slopes. Brownlee Reservoir has designated beneficial uses of cold-water aquatic life, primary contact recreation, domestic water supply, industrial water supply, irrigation water, livestock watering, salmonid rearing and spawning, resident fish and aquatic life, wildlife and hunting, fishing, boating, aesthetics, and hydropower. 51 Brownlee Reservoir is listed as impaired for excess nutrients associated with nuisance algae growth and has a history of cyanobacteria blooms. 51 The reservoir is an active recreation destination with ∼20,000 nights of camping along the shore of the reservoir in 2013. 11 Additionally, discharge from Brownlee Reservoir flows into the Hells Canyon National Recreational Area, which has been estimated to have more than 50,000 boaters visit per year, making it a significant economic resource where the populace can be impacted by water quality. 50
[Fig. 1 caption, partial] ... where in situ samples were collected. Red "+" symbols and blue "x" symbols indicate sites where chlorophyll-a concentrations were above or below 10 μg∕L, respectively. (c) Locations of manual digitization of bloom presence and absence. Red "+" symbols and blue "x" symbols represent "bloom" and "no-bloom" classifications, respectively. 49

Field Data Collection
Water samples were collected by Idaho Power Company personnel from Brownlee Reservoir. Samples were collected from predetermined locations within the reservoir with known coordinates to match sample collection locations with pixels in the associated satellite imagery. Samples were collected within 2 m of the surface, immediately placed on ice, and delivered to the analysis laboratory within 24 h. Samples were spectrophotometrically analyzed for total chlorophyll-a, corrected for pheophytin following standard method 10200H.2. 52 Only results from samples collected on the same date as Sentinel-2 satellite imagery were included in this analysis.
The World Health Organization has identified chlorophyll-a concentrations exceeding 10 μg∕L to be associated with a transition from slight to moderate risk of adverse health effects from primary contact in cases where Microcystis dominates the chlorophyll-a concentration. 53 Although the dominant taxa are not identified in this work, "bloom" and "bloom conditions" are defined in this work to represent chlorophyll-a concentrations greater than or equal to 10 μg∕L.

Visual Bloom Identification
To evaluate the efficacy of developing training datasets directly from satellite imagery, points representing the distinct presence or absence of an algal bloom were visually interpreted and digitized (Fig. 2) from a series of 26 Sentinel-2 satellite images obtained from the Copernicus Application Programming Interface. 48,49 Digitization was conducted in the Geographic Information System ArcMAP 10.8.1 from Environmental Systems Research Institute, Inc. (Redlands, California) where true-color (red, band 4; green, band 3; and blue, band 2) Sentinel-2 images were displayed. Minimum and maximum values in the visualization were set to the equivalent to 0% and 100% reflectance, respectively. Locations associated with algal blooms were visually identified as pixels with elevated reflectance in the green band arranged in continuous shapes associated with algal blooms [Figs. 2(b) and 2(d)]. Points representing no-bloom conditions were identified based on low reflectance in the red, green, and blue bands to provide class balance in training data [Figs. 2(c) and 2(e)]. Bloom and no-bloom conditions were assigned without knowledge of in situ observations to reduce identification bias. Incorporating these data in the evaluation of spectral indices leverages the information that is readily available within historic satellite imagery via conventional image interpretation and is similar to approaches used to develop training data for pixel-based supervised image classification of land cover. 54,55

Satellite Imagery
Level 1C top of atmosphere imagery collected with the multispectral instrument (MSI) sensors on the Sentinel-2A and Sentinel-2B satellites for tile 11TMK was obtained from the European Space Agency through the Copernicus Application Programming Interface. 48 Top of atmosphere imagery was atmospherically corrected using the dark spectrum fitting algorithm approach implemented in the Atmospheric Correction for OLI 'lite' generic processor version (v20190326.0) to produce aquatic reflectance products. 56 See Ref. 57 for a full description of the dark spectrum fitting approach. Default settings were used in the atmospheric correction with the exception that waterbody elevation was set to 610 m above sea level to account for atmospheric path length.
At each location where an in situ or visually identified observation was made, aquatic reflectance values for each band were extracted from all pixels with centers within 50 m of the observation's location. 49 A 50-m buffer was used to spatially smooth reflectance values and to account for potential positional error in sample collection location. The median reflectance values within the 50-m buffer were used to represent each band's value at the specified location. The median statistic was used rather than the mean to reduce the impact of outliers on the resulting aquatic reflectance values.
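The buffer-and-median extraction described above can be sketched as follows. This is a minimal illustration, not the authors' processing code: it assumes pixel-center coordinates are already in a projected system with units of meters, and the function name, array layout, and demo values are hypothetical.

```python
import numpy as np


def buffered_median_reflectance(pixel_xy, pixel_reflectance, site_xy, radius_m=50.0):
    """Per-band median reflectance over pixels whose centers lie within radius_m of a site.

    pixel_xy          : (n_pixels, 2) pixel-center coordinates in meters (assumed projected CRS)
    pixel_reflectance : (n_pixels, n_bands) aquatic reflectance values
    site_xy           : (x, y) of the in situ or visually identified observation
    """
    pixel_xy = np.asarray(pixel_xy, dtype=float)
    pixel_reflectance = np.asarray(pixel_reflectance, dtype=float)
    # Euclidean distance from every pixel center to the observation location
    dist = np.hypot(pixel_xy[:, 0] - site_xy[0], pixel_xy[:, 1] - site_xy[1])
    inside = dist <= radius_m
    if not inside.any():
        raise ValueError("no pixel centers fall within the buffer")
    # Median (ignoring masked/NaN pixels) damps the influence of outliers
    return np.nanmedian(pixel_reflectance[inside], axis=0)


# Hypothetical single-band example: three pixels inside the buffer, one outside.
demo_xy = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0), (100.0, 0.0)]
demo_refl = [[0.02], [0.03], [0.50], [0.90]]
demo_median = buffered_median_reflectance(demo_xy, demo_refl, site_xy=(0.0, 0.0))
```

In this toy case the single bright pixel (0.50) pulls the mean of the buffered pixels to about 0.18, while the median stays at 0.03, which is the behavior the text cites as the reason for preferring the median.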

Spectral Index Evaluation
Seventeen spectral indices that were expected to be sensitive to chlorophyll-a concentrations were selected from the literature and evaluated 30,[38][39][40][41][42][43][44][45][46] (Table 1). Spectral indices developed for sensors other than the MSI sensor used in this work were selected if the central wavelengths of all bands used in index development [i.e., bands from OLI or the medium resolution imaging spectrometer (MERIS)] fell within unique MSI bands. MSI bands were defined for this work by their central wavelengths and full-width half-maximums. 60

Binary Logistic Regression
The probability that an algal bloom was present for each pixel in a Sentinel-2 image was determined by relating the presence (chlorophyll-a concentration >10 μg∕L) or absence (chlorophyll-a concentration <10 μg∕L) of an algal bloom to the value for one or more spectral indices using a binary logistic regression approach. Binary logistic regression was implemented as follows:
$$p = \frac{e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_n X_n}}{1 + e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_n X_n}}, \qquad (1)$$
where p is the probability that chlorophyll-a concentration exceeded 10 μg∕L, β_0 is an intercept calibration term, and β_1 through β_n are the parameter effects for spectral indices X_1 through X_n.
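As a small illustration of Eq. (1), the sketch below evaluates the exceedance probability for one pixel from a set of spectral index values. The function name and the coefficient values in the example are hypothetical and are not fitted values from this study.

```python
import math


def bloom_probability(index_values, intercept, coefficients):
    """Evaluate Eq. (1): p = e^z / (1 + e^z), with z = b0 + b1*X1 + ... + bn*Xn."""
    if len(index_values) != len(coefficients):
        raise ValueError("one coefficient is required per spectral index")
    z = intercept + sum(b * x for b, x in zip(coefficients, index_values))
    # Algebraically equivalent forms chosen to avoid overflow for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical coefficients: z = -1.0 + 0.5 * 2.0 = 0, so p = 0.5
p_demo = bloom_probability([2.0], intercept=-1.0, coefficients=[0.5])
```

A linear predictor of zero corresponds to a probability of exactly 0.5, which is the classification cutoff used later in the accuracy assessment.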
To address class imbalances in calibration data (i.e., more observations of bloom versus nonbloom conditions), weights were applied to the observations [weighting equation missing from the source text], where W_P and #P are the weights for and number of bloom condition observations, respectively, and W_N and #N are the weights for and number of nonbloom condition observations, respectively. Univariate and multivariate logistic regression models were developed to assess performance of individual spectral indices and combinations of spectral indices to identify algal blooms. Additionally, logistic regression models were trained and tested with different combinations of in situ and visually identified observations to evaluate the impact of different training data sources on model performance (Table 2).
The "gaged" calibration scenario reflects a widely used approach to model calibration using in situ observations. 30 The "ungaged" scenario evaluates the efficacy of training a model based on visually identified bloom occurrences when in situ observations are too sparse or not available. The "augmented" scenario evaluates the utility of augmenting in situ observations with visually identified blooms.
Each modeling scenario and the associated training and testing data are described in the following sections. Performance between and among univariate and multivariate models was

Univariate logistic regression models
Logistic regression models were developed for each of the 17 spectral indices listed in Table 1 and each of the calibration scenarios listed in Table 2. Performance of the resulting univariate models was quantified by assessing the accuracy of each individual index in identifying algal blooms; these results provided benchmarks to compare multivariate models.

Multivariate logistic regression models
Multivariate logistic regressions were produced to test the hypothesis that classifications based on multiple spectral indices are more robust than classifications from single spectral indices. Three multivariate logistic regression models were developed, one for each of the gaged, ungaged, and augmented scenarios in Table 2, to assess how in situ and visually identified training data affect accuracy of algal bloom identification from Sentinel-2 imagery. Multivariate logistic regressions were produced from the spectral indices listed in Table 1 using a three-step approach. First, highly correlated spectral indices were identified based on their variance inflation factor (VIF) values and removed one at a time to achieve a subset of spectral indices where the VIF for each index was <10. 63,64 This was done by removing the index with the highest VIF, recomputing VIF for all remaining indices and removing the subsequent index with the highest VIF. This process was repeated until no indices had VIF values above ten. Second, the scenario-specific training dataset identified in Table 2 was selected. Third, multivariate logistic regressions were calibrated using all spectral indices identified through the VIF-based variable selection process. During the calibration procedure, parsimonious multivariate models were identified using stepwise variate selection with the objective of minimizing the Akaike information criterion (AIC). 65 This procedure was repeated for all three calibration scenarios in Table 2.
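The first step above (iterative removal of the index with the highest VIF) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' code: VIF for an index is computed as 1/(1 - R²) from an ordinary least-squares regression of that index on the remaining indices, and the function names, column names, and synthetic data are hypothetical. The subsequent stepwise AIC selection is not shown.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF for each column of X (n_samples, n_indices): 1 / (1 - R^2) of column j on the others."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        y = X[:, j]
        # Design matrix: intercept plus all other indices
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((y - y.mean()) ** 2).sum())  # assumes the column is not constant
        r2 = 1.0 - ss_res / ss_tot
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs


def prune_by_vif(X, names, threshold=10.0):
    """Drop the highest-VIF column, recompute, and repeat until every VIF is below threshold."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        vifs = variance_inflation_factors(X)
        worst = int(np.argmax(vifs))
        if vifs[worst] < threshold:
            break
        X = np.delete(X, worst, axis=1)
        del names[worst]
    return names


# Synthetic example: idx_c is nearly a copy of idx_a, so one of the pair is removed.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.01 * rng.normal(size=200)
kept = prune_by_vif(np.column_stack([a, b, c]), ["idx_a", "idx_b", "idx_c"])
```

Removing one index at a time and recomputing matters because dropping a single member of a correlated group lowers the VIF of the members that remain, so a one-pass filter would discard more indices than necessary.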

Accuracy Assessment
The accuracy of the logistic regression models was evaluated using a 10-fold cross validation approach with an 80% calibration, 20% validation split (Table 2). For each iteration, 80% of the in situ data were randomly selected as the training dataset, and the remaining 20% were used to test model accuracy. Performance was evaluated using four metrics: precision, recall, F1 score, and overall accuracy. Precision is a measure of how many of a model's positive predictions (e.g., above threshold) were correct [Eq. (4)], and recall is a measure of how many of the observed positive cases were identified by the model [Eq. (5)]:
$$\mathrm{precision} = \frac{\#TP}{\#TP + \#FP}, \qquad (4)$$
$$\mathrm{recall} = \frac{\#TP}{\#TP + \#FN}, \qquad (5)$$
where #TP is the number of true positives, #FP is the number of false positives, and #FN is the number of false negatives. The F1 statistic was used as a multiple-criterion metric to evaluate the performance of logistic regressions that accounts for the trade-off between precision and recall. 66 The F1 statistic was computed as
$$F1 = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}. \qquad (6)$$
Accuracy, defined here as the percent of observations that were correctly classified, was calculated to provide a more intuitive and familiar evaluation of model performance. Accuracy was calculated as the number of true positive and true negative results divided by total number of observations in the validation dataset. 67 An exceedance probability of 50% (0.5) was used to classify model output as exceeding 10 μg∕L. Figure 3 provides a graphical example of the four possible outcomes, true positive, true negative, false positive, and false negative, for each validation data point relative to the 50% and 10 μg∕L thresholds.
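The four metrics can be computed from the confusion counts as in the sketch below. It uses the standard definitions (precision, recall, F1 as their harmonic mean, and accuracy as the fraction correctly classified). The function names and demo data are hypothetical, and treating a probability of exactly 0.5 as a positive prediction is an assumption, since the text does not specify how ties are handled.

```python
def classify(probabilities, cutoff=0.5):
    """Label 1 (bloom) when the exceedance probability meets the cutoff; >= is an assumption."""
    return [1 if p >= cutoff else 0 for p in probabilities]


def evaluate(observed, predicted):
    """Return precision, recall, F1, and accuracy for binary labels (1 = bloom, 0 = no bloom)."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)  # true positives
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)  # false positives
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)  # false negatives
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)  # true negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical validation fold: observed labels and modeled exceedance probabilities
observed_demo = [1, 1, 1, 0, 0, 0, 0, 1]
probability_demo = [0.9, 0.8, 0.4, 0.2, 0.6, 0.1, 0.3, 0.7]
scores = evaluate(observed_demo, classify(probability_demo))
```

In this toy fold there are three true positives, one false positive, one false negative, and three true negatives, so all four metrics equal 0.75. Raising or lowering the `cutoff` argument trades precision against recall.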

Field Observations
Twenty-four in situ observations from 15 sites along Brownlee Reservoir were used in the analysis (Fig. 1). Chlorophyll-a concentrations in these samples ranged from 1.2 to 241 μg∕L with a median value of 7 μg∕L. There were 10 observations (42%) with concentrations of 10 μg∕L or higher, indicating relative parity in observations above and below the 10 μg∕L threshold. An additional 195 points were manually digitized from 26 Sentinel-2 images (Fig. 1). Of the manually digitized points, 109 (56%) were classified as blooming conditions. Data are available in Ref. 49.
Reflectance spectra extracted from imagery at bloom locations showed elevated reflectance in bands 3 (∼559 nm) and 5 (∼704 nm) for both in situ and visually identified bloom locations (Fig. 4). The reflectance values were similar between visually identified and in situ observations for the nonbloom conditions, while reflectance values were higher for bands 3 (∼560 nm), 5 (∼704 nm), 6 (∼740 nm), 7 (∼783 nm), 8 (∼833 nm), and 8a (∼865 nm) for the visually identified data under bloom conditions than for the in situ data.
Fig. 3 Schematic of result quadrant to illustrate the four possible outcomes for each validation data point. Observed data are classified as those that fall above or below 10 μg∕L of chlorophyll-a. Model results are divided between those predicting more or less than 50% probability of exceeding 10 μg∕L.

Univariate Model Performance
With the gaged calibration approach, the relationships between four spectral indices (S02, S08, S10, and S13) and chlorophyll concentration exceeding 10 μg∕L were statistically significant (p < 0.05). Of these four models, the univariate models based on S10 and S13 had the highest classification accuracies of 80% and F1 scores of 0.74 (Table 3). The univariate models established with the gaged calibration approach that used indices S09 and S16 were the highest performing, with clear separation in exceedance probability between concentrations above and below the 10 μg∕L threshold (Fig. 5), but were not found to be statistically significant (i.e., the model β_1 term had p > 0.05). Misclassified observations for models based on S10 and S13 had concentrations within 2.5 μg∕L of the 10 μg∕L threshold on average, illustrating that for the best performing models, cases of misclassification were limited to conditions near the 10 μg∕L threshold (Fig. 5).
When training with the visually identified dataset and testing on the in situ observations in the ungaged approach, all spectral indices, except those based on S06 and S17, had statistically significant relationships (p < 0.05) with the probability of bloom occurrence. Of these models, those based on S08 and S14 were the highest performers with accuracy rates of 79% and F1 scores of 0.67 (Table 4). However, separation in exceedance probabilities across concentrations was less clear (Fig. 6) when compared to the gaged calibration approach (Fig. 5). Misclassified observations for models based on S08 and S14 had concentrations within 4 μg∕L of the 10 μg∕L threshold on average, suggesting that cases of misclassification were limited to conditions near the 10 μg∕L threshold for the best-performing models (Fig. 6).
When in situ observations are augmented with visually identified observations in the augmented calibration approach, all models except those based on S06 and S17 had statistically significant relationships with the probability of bloom occurrence (p < 0.05). Models based on S05, S08, and S14 had the highest F1 scores (0.58) and the highest accuracy (74%, Table 5). Accuracy and F1 values for these highly correlated indices were lower than for the top performing models under the gaged and ungaged calibration approaches because of decreases in precision driven by an increase in false negatives (Fig. 7). Misclassified observations for models based on S05, S08, and S14 had concentrations within 3 μg∕L of the 10 μg∕L threshold on average, indicating that for the best performing models in the augmented calibration approach the cases of misclassification are limited to conditions near the 10 μg∕L threshold (Fig. 7).

Multivariate Model Performance
Of the 17 spectral indices examined for Sentinel-2 (Table 1), S01, S03, S04, S05, S09, S10, S11, S12, S14, S16, and S17 were found to be the most highly correlated with other indices (Fig. 8) and were removed in the stepwise, VIF-based variable selection process. The remaining six indices, S02, S06, S07, S08, S11, and S15, had VIF values <10 at the end of the stepwise removal process and were selected for evaluation in the multivariate regression approach.
The best performing multivariate models for the gaged (M_G), ungaged (M_U), and augmented (M_A) model calibration approaches had accuracies of 0.80, 0.79, and 0.82, respectively (Table 6). For the multivariate models, the augmented calibration approach also had the highest F1 statistic (0.73), although it is rather similar to the F1 score of 0.72 for the gaged calibration approach. Misclassified observations for the gaged, ungaged, and augmented multivariate models had concentrations within 3 μg∕L of the 10 μg∕L threshold, on average, suggesting that for all multivariate models the cases of misclassification are limited to conditions near the 10 μg∕L threshold (Fig. 9). The spectral indices included in the best performing multivariate models varied by calibration scenario (Table 7). The multivariate model calibrated with the gaged approach (M_G) selected two model members. The stepwise parameter selection process for the ungaged multivariate model calibration approach (M_U) resulted in a univariate model (S08), as a balance between model parsimony and maximum model likelihood. The multivariate model calibrated with the augmented dataset incorporated all potential spectral indices except for S02 and S06. In all cases, the models with the lowest AIC also had the highest F1 scores.

Discussion
We developed models that can be applied to identify algal blooms from satellite imagery by evaluating different data sources describing the presence and absence of algal bloom conditions against multiple spectral indices designed to identify chlorophyll presence.

Correlation of Spectral Indices
Although 17 spectral indices were identified in the literature and evaluated in this work, many were found to be highly correlated. Through an iterative index removal process, 11 indices were removed before the remaining indices had VIFs <10. This result indicates that only six of the evaluated 17 spectral indices are required to represent the observed variability in chlorophyll-a concentrations. Reducing the search space by more than 60% is valuable as it reduces the number of indices that require evaluation.

Spectral Indices
As expected, spectral indices developed for and evaluated on MSI imagery outperformed those developed for other sensors when used in isolation in the univariate models calibrated with in situ observations (Tables 1 and 3). Specifically, the top-performing univariate models with the gaged calibration approaches, S10 and S13, were developed for the MSI. 30,43 They also both focus on band 5 (704 nm) relative to band 4 (665 nm) normalized by bands 6 (740 nm) or 8a (865 nm) thus illustrating the importance of the "red edge" and red bands for retrieving a chlorophyll signal in agreement with previous work. 68 However, indices developed for other sensors joined the top performers when model calibration included visually identified data and in multivariate models. Specifically, S08, developed for OLI, 30 and S14, developed for the MSI, were the top performing univariate models in the ungaged calibration scenario (Table 4). Indices S08 and S14 focus on the reflectance peak for band 3 (560 nm), illustrating the influence of "green" light on the identification of algal blooms when using RGB color composites to identify algal blooms. Index S05, developed for MERIS 58 and focused on the "red edge," joined S08 and S14 as a top performer for the augmented calibration scenario ( Table 5). The improved performance from the augmented calibration scenarios (Table 6) highlights the value of using visible, spatial, and infrared cues to identify algal blooms.

Univariate versus Multivariate Results
The multivariate model performed just as well as all the statistically significant univariate models for the gaged and ungaged calibration scenarios. The multivariate model under the augmented calibration scenario was the highest performing of the statistically significant models overall. This increase in the performance could be due to the incorporation of multiple spectral features present in algal blooms (Fig. 4), as the ensemble model calibrated with in situ data augmented with spectra extracted from satellite imagery focused on bands 2-5 and 8a (Table 1). This result is similar to previous studies 69,70 and is consistent with our hypothesis "incorporating multiple spectral indices is more robust than selecting a single spectral index." The improvement in accuracy is attributable to an increase in precision associated with a reduction in false positives as well as an increase in recall. These results suggest that the multivariate models were more skilled in identifying observed bloom conditions (Table 6).

Incorporating Image Derived Training Data
The univariate and multivariate models trained on visually identified training data alone were nearly as accurate (79% accuracy) as training based on in situ observations (80% accuracy). This is a remarkable finding because it implies that training datasets can be built for waterbodies lacking in situ data by extracting the necessary information from the satellite images themselves. Further, the multivariate approach calibrated on the augmented observations provided the highest accuracy overall with a mean accuracy of 82%, indicating a benefit of including visually identified end-member spectra even in cases where in situ data are available. The multivariate model calibrated on visually identified data (M_U) had near perfect model recall, meaning that nearly all the observed bloom conditions in the in situ observation dataset were identified in the resulting model. However, this same model had relatively low precision due to the presence of numerous false positives. The high recall and low precision indicate that classification with visually identified data is best suited to cases where decision makers tend to be more tolerant of false positives than false negatives. Notably, the probability (50%) and concentration (10 μg∕L) thresholds can, and likely should, be adjusted in this approach to fit end-user communication and reporting needs. In fact, it can be seen in Fig. 9(c) that selecting a slightly higher chlorophyll-a threshold (∼15 μg∕L) would result in perfect classification. Figure 4 shows that the visually identified bloom locations had higher NIR reflectance than pixels identified as bloom conditions via in situ observations. This may reflect a bias in the visual interpretation toward identifying floating algae that would have higher NIR reflectance than submerged algae. Further, a robust analysis of the consistency and repeatability of manually classified training data in Brownlee and other waterbodies could improve classification.
The ungaged model results indicate that the use of image-derived spectra for training models could be useful in cases where in situ observations are limited. The reasonable accuracy obtained with the ungaged multivariate calibration (79% for M_U), and the increased accuracy of the augmented multivariate (M_A) relative to the gaged multivariate (M_G), is consistent with our hypothesis "satellite imagery itself contains information useful for evaluating spectral indices."

Spatial Patterns in Model Results
In addition to the correct identification of conditions at observation locations, the spatial patterns of model results can be examined qualitatively to confirm agreement with features visible in satellite imagery. In Fig. 10(a), an algal bloom is clearly seen in the true color composite. A sample collected from within this feature had a chlorophyll-a concentration of 86.6 μg∕L, verifying the feature as an algal bloom. The ribbon-like features of the algal bloom are well described by some of the models (Fig. 10). However, some univariate models do not appear to be sensitive to the presence of the algal mass, returning nearly uniform exceedance probabilities for all pixels in the image. Although this is not a quantitative assessment, examining the models' abilities to reproduce spatial patterns of algal blooms provides insight into an index's general performance.
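The contrast between a spatially sensitive model and one that returns nearly uniform probabilities can be sketched by applying a logistic model to every pixel and inspecting the spread of the result. The grid values and coefficients below are hypothetical and chosen only to illustrate the check; they do not correspond to any index or calibration in this study.

```python
import math


def exceedance_probability(index_value, b0, b1):
    """Logistic probability that chlorophyll-a exceeds the threshold."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * index_value)))


def probability_map(index_grid, b0, b1):
    """Apply the univariate logistic model to every pixel of a 2D grid."""
    return [[exceedance_probability(v, b0, b1) for v in row] for row in index_grid]


def probability_range(prob_grid):
    """Spread (max minus min) of probabilities; near zero means a flat map."""
    flat = [p for row in prob_grid for p in row]
    return max(flat) - min(flat)


# Synthetic index grid with a diagonal, ribbon-like band of high values.
index_grid = [
    [-0.3, -0.3, 0.4, -0.3],
    [-0.3, 0.4, 0.5, -0.2],
    [0.4, 0.5, -0.2, -0.3],
]

sensitive = probability_map(index_grid, b0=0.0, b1=8.0)    # steep response
insensitive = probability_map(index_grid, b0=0.0, b1=0.1)  # nearly flat response

print(f"sensitive range:   {probability_range(sensitive):.3f}")
print(f"insensitive range: {probability_range(insensitive):.3f}")
```

A small probability range across a scene that visibly contains a bloom is a simple flag that an index is not responding to the feature, complementing the qualitative map comparison.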

Sources of Uncertainty
The approach taken here is subject to multiple sources of uncertainty, including but not necessarily limited to the atmospheric correction procedure, interfering effects of sediment and other nonchlorophyll-a containing substances on the chlorophyll-a signal, the presence of nonalgal plants (e.g., submerged aquatic vegetation or sloughed macrophyte mats) obfuscating interpretation of the chlorophyll-a signal as an algal bloom, error rates associated with the visual identification process, the effects of wind-driven sun glint, the use of chlorophyll-a that is not corrected for degradation byproducts like pheophytin, adjacency effects, bottom reflectance, and potential temporal and spatial mismatch between in situ observations and extracted aquatic reflectance values. The limited number of in situ observations likely also contributed to calibration uncertainty, exemplifying the very common challenge of calibrating semiempirical approaches with limited data. Notably, the ungaged calibration approach removes the uncertainty associated with temporal and spatial mismatch because the signals are derived directly from imagery. This, in addition to a larger validation dataset, may have contributed to more univariate models having statistically significant calibrations under the ungaged approach than under the gaged approach. Despite many potential sources of error, the achieved accuracies of 80% and higher indicate that the algal bloom signal is large in comparison with the noise associated with all these potential sources of uncertainty. The encouraging results reported herein notwithstanding, addressing each of these potential sources of uncertainty could improve model accuracy.

Future Applications
Our intent in introducing this approach is to provide an additional tool for public health and natural resource managers to identify potentially harmful conditions that warrant in situ monitoring. Providing timely situational awareness of algal bloom extent has the potential to increase resource efficiency by guiding field staff to priority sampling locations. These methods also afford the potential to identify nascent blooms in remote areas before they would be identified otherwise. Finally, historic satellite imagery contains information on algal bloom dynamics. Reanalysis of these images could provide information on spatial and temporal trends that might yield insight regarding potential drivers of algal blooms.

Conclusion
Multivariate models were as accurate as univariate indices in classifying aquatic chlorophyll-a relative to a 10-μg∕L threshold. Manually digitized observations of end-member conditions (e.g., bloom and nonbloom) were used to calibrate aquatic chlorophyll-a retrieval in the absence of in situ observations, with reasonable accuracy (79%) nearly equal to that obtained using in situ observations only (80%). Augmenting in situ observations with these manually digitized observations improved remote sensing accuracy to 82%. These results suggest that image interpretation might be suitable for deriving training data for algal bloom classification, either in the absence of in situ observations matched with Sentinel-2 satellite imagery or to augment them.
Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. government. The Article was prepared solely by employees of the United States federal government as part of the employees' official duties. It is an official U.S. government publication, and is not subject to copyright protection within the United States.

Data Availability
The data used in this study are available in Ref. 49.