Benthic river algae mapping using hyperspectral imagery from unoccupied aerial vehicles

Abstract. The increasing prevalence of nuisance benthic algal blooms in freshwater systems has led to water quality monitoring programs based on the presence and abundance of algae. Large blooms of the nuisance filamentous algae, Cladophora glomerata, have become common in the waters of the Upper Clark Fork River in western Montana. To aid in the understanding of algal growth dynamics, unoccupied aerial vehicle (UAV)-based hyperspectral images were gathered at three field sites along the length of the river throughout the growing season of 2021. Select regions within images covering the spectral range of 400 to 850 nm were labeled based on a combination of professional judgment and spectral profiles and used to train a random forest classifier to identify benthic algal growth across several classes, including benthic growth dominated by Cladophora (Clado), benthic growth dominated by growth forms other than Cladophora (non-Clado), and areas below a visually detectable threshold of benthic growth (bare substrate). After classification, images were stitched together to produce spatial distribution maps of each river reach while also calculating the average percent cover for each reach, achieving an accuracy of approximately 99% relative to manually labeled images. Results of this analysis showed strong variability across each reach, both temporally (up to 40%) and spatially (up to 46%), indicating that UAV-based imaging with high-spatial resolution could augment and therefore improve traditional measurement techniques that are spatially limited, such as spot sampling.


Introduction
The presence of nuisance benthic algal blooms is becoming an increasing concern in freshwater systems, such as lakes and rivers, where widespread growth of algae has been shown to indicate deteriorated water quality and ecosystem health. 1,2][5] Cladophora glomerata is one of over 400 species of Cladophora, a genus that is considered one of the most abundant alga in alkaline rivers worldwide and one of the most important nuisance filamentous alga in inland waters. 6[9][10] Water quality monitoring can be based on chemical or biological analyses, both of which provide valuable insight into nutrient enrichment and stress in fluvial systems. 11Historically, water quality standards in Montana have been based on either visual standards or numeric levels of chemical properties, such as benthic algal standing crops, measured as the mass of the pigment chlorophyll a (chl a) per square meter; 12,13 however, these assessment methods have limitations.Visual standards lack quantitative biological analysis, whereas numeric standards are based on in-situ sampling that is limited in the spatial distribution and coverage required to accurately assess large-scale algal growth. 14,15ater quality monitoring along the UCFR has a long history, with major impairment recorded as early as 1908. 16Beginning in the 1970s, large blooms of Cladophora became a regular occurrence between the UCFR headwaters near Butte, Montana, and approximately 193 river kilometers downstream, near Missoula, Montana. 7,16Cladophora growth in the UCFR is spatially patchy and dependent on stream morphology and flow, 17 indicating that water quality measurements that include assessment of algal standing crops must take this spatial variability into account; however, laboratory-based analysis of algal samples does not capture information on the spatial extent of algal growth. 5Current methods for determining algal coverage rely on direct visual assessment based on qualitative estimates of percent algal cover, predominant algal color, and growth condition, measured at several points along a transect. 18Transect methodologies require little training and time to perform but only capture small regions, are not often quantitatively rigorous, and are commonly site-specific.Some sampling-based methods are designed to be quantitative, such as the USGS richest targeted habitat method, 19,20 but are resource intensive and have limited spatial representation.
5][26][27] More recently, the use of unoccupied aerial vehicle (UAV) systems has grown in popularity due to their flexibility in revisit times and flight planning, ability to carry a variety of imaging systems, and higher spatial resolution. 28,29UAV-based systems have captured imagery using uncalibrated red-green-blue (RGB) imagers 17,30 and multispectral systems with near-infrared sensitivity [31][32][33] with promising results for identifying algal blooms.Methods to classify imagery and estimate algal coverage typically rely on spectral information that is analyzed using a variety of methods, such as spectral angle mapping, 17 statistical approaches, 31 and machine learning techniques. 30,33,34Though hyperspectral satellite imagery has shown promise in differentiating between cyanobacteria genera in large bodies of inland water using spectral analysis, 26 there is little work exploring the efficacy of UAV-based hyperspectral imagery to map the spatial distribution of benthic macroalgae and SAV in mid-sized inland waterways.
This paper builds on a recently published method in which a UAV-mounted hyperspectral imager was used to estimate river algae pigment abundance. 35In this paper, a random forest classification method was adopted to explore all available wavelengths, but we also report elsewhere progress toward a low-cost multispectral imager that will rely on more traditional spectral classification methods. 36Here, the spatial distribution of river algae is mapped using UAV-based hyperspectral imagery collected across the 2021 growing season, with the following objectives.
1. Differentiate between benthic regions dominated by Cladophora growth (Clado), other forms of benthic growth (non-Clado), and areas below a visually detectable threshold of benthic growth (bare substrate) using a custom-trained random forest classification model.2. Generate georectified classification maps from UAV-based hyperspectral imagery.3. Estimate average percent cover at three field sites along the UCFR sampled between June and September 2021.
Three field sites were selected along the UCFR to capture algal growth characteristics near the headwaters (Deer Lodge, 46.38°N, 112.74°W), midway through its length (Gold Creek, 46.59°N, 112.93°W), and near its confluence with the Blackfoot River (Bear Gulch, 46.71°N, 113.33°W) to capture possible influence of nutrient transport and effects of input drainages (Fig. 1).Macrophyte growth along the UCFR follows a complex seasonal pattern, influenced by many factors, such as net primary productivity, flow rates, temperature, and canopy shading and is typically initiated after snowmelt-fed runoff hydraulically scours benthic material in May or June, 10,37 informing data collection periods.

Data Collection
Image data were collected using a Resonon Pika L Airborne Hyperspectral Imaging System (Resonon Inc., Bozeman, Montana, United States) mounted on a DJI Matrice 600 Pro hexacopter (DJI, Shenzhen, China) using a DJI Ronin-MX gimbal between 30 June 2021 and 16 September 2021 on an approximate biweekly schedule.The Pika L Hyperspectral Imager has a spectral range of 387 to 1023 nm and spectral resolution of 2.1 nm, producing 300 spectral channels per pixel.Images were captured using a 17-mm objective lens with a 17.6 deg across-track full-angle field of view directed nadir, resulting in an across-track ground swath of approximately 37 m when flown at a typical height of 120 m above ground level.Flight paths were established to follow the midline of each river segment for approximately 1 km and held constant throughout the data collection period to ensure the same reach was evaluated throughout the study.All UAV flights were planned and conducted by, or under the supervision of, a pilot certified through the Federal Aviation Administration Part 107 licensure program.

Image Preprocessing
Image data were converted from raw digital number to reflectance using a cosine-corrected Ocean Insight Flame VIS-NIR spectrometer (Ocean Insight, Orlando, Florida, United States) mounted to the top chassis of the UAV and calibrated to measure downwelling spectral irradiance from 350 to 1000 nm with a nominal spectral resolution of 1.34 nm.Downwelling irradiance spectra were measured throughout each UAV flight, with measurements taken each time a new hyperspectral image was recorded by the imager, then a spline interpolation was used to match the spectral channels on the Pika L. Images were converted to reflectance using E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 1 ; 1 1 4 ; 6 4 0 where all terms are a function of wavelength (λ) and R image is the reflectance of the image, L upwelling is the upwelling radiance measured by the Pika L, π accounts for assumed Lambertian reflectors in the scene, and E downwelling is the downwelling irradiance measured by the spectrometer.Once converted to reflectance, images were smoothed using a Savitzky-Golay filter with a window length of 13 and polynomial order of 3 38 to suppress noise in the measured spectra, reducing the risk of spurious classification.Due to strong water absorption in the near-infrared, wavelengths beyond 850 nm were removed, leaving 221 spectral channels in each image; however, the reflectance of the 850 nm channel was used as a method to identify and remove both sun glint and bank pixels from the image.To simplify the classification task, pixels containing material other than water (e.g., floating debris and bank vegetation), or pixels containing shadows or sun glint, needed to be removed from images prior to classification.Using spectral profiles and professional judgement, a reflectance greater than 5% at 850 nm was found to be an effective means of separating pixels containing submerged objects from pixels containing sun glint or non-submerged material, whereas a reflectance less than 1.5% at 550 nm was found to isolate shadowed water pixels for removal.
Hyperspectral image data were spatially corrected using Resonon SpectrononPro software 39 to align the imagery with cardinal directions and latitude and longitude while also accounting for spatial deviations in imagery caused by following bends in the river.This spatial correction was performed using an automated georectification tool that used data from the onboard GPS and inertial measurement unit data, along with ground elevation, flight height, and focal length of the imaging system.

Training
Image classification was based on a supervised random forest classification model implemented using Scikit-Learn. 40The model was trained and validated on 2,157,996 manually labeled pixels from hyperspectral imagery collected along the UCFR during 2021 to 2022.Images were selected between June and September to capture benthic growth under a variety of conditions throughout the summer growing season, including changes in physical characteristics (e.g., flow rates and water temperature) and biological characteristics (e.g., phenological state and biomass accrual).Representative pixels from 13 hyperspectral images were visually identified and labeled into three classes: benthic growth dominated by Cladophora (Clado, n ¼759,873 pixels), benthic growth not dominated by Cladophora (non-Clado, n ¼705,483), and benthic growth below a visually detectable threshold (bare substrate, n ¼692,640).These representative pixels were manually labeled by drawing polygons around homogeneous pixel groups based on professional judgment using RGB composite color images (with the MATLAB Image Processing Toolbox 41 ).The RGB images were created with SpectrononPro using wavelengths of 639.8, 550.0, and 459.7 nm.Contrast was enhanced with the histogram equalization method with the "process bands individually" option selected.The pixel labels in the initial, manually drawn polygons were validated by visual inspection of the pixel spectra that were displayed as the mouse was hovered over individual pixels within the polygon in SpectrononPro, to ensure similarity of the spectral shape of each class (Fig. 2).For example, pixels labeled as Clado contained relatively strong reflectance in green (∼550 to 560 nm) and near-infrared wavelengths (∼700 to 750 nm), although no quantitative threshold was used.Similarly, pixels labeled as non-Clado lacked both a prominent green peak (∼550 to 560 nm) and prominent chl a absorption dip (∼670 to 690 nm), giving them a dark brown visual appearance, with a small near-infrared reflectance peak (∼700 to 750 nm) that indicated the presence of vegetation.Bare substrate pixels had a relatively featureless spectrum across the visible wavelengths (400 to 700 nm) resulting in a light gray or tan appearance.Benthic growth labeled as Clado was often highly separable from the other classes; however, non-Clado and bare substrate were spectrally similar at times, likely due to trace amounts of growth across the benthos, which may have introduced errors in the manual labeling process.Labeled pixels were randomly split into 30% testing and 70% training subsets using the "train_test_split()" function in Scikit-Learn 40 to train and validate the random forest model.Hyper parameters were tuned using a randomized search across the training data subset with 5-fold crossvalidation across 50 iterations, leading to the parameters shown in Table 1.
The fivefold cross-validation on the tuned random forest model produced a mean training accuracy of 99.9% and a mean validation accuracy of 99.6% across all folds.The tuned classifier showed little confusion when predicting classes, with the largest error attributed to incorrect classification  between non-Clado and bare substrate [Fig.3(a)].As full-spectrum images were used during classification (i.e., without dimensionality reduction), the importance of the ten spectral bands with the highest predictive power were analyzed using the mean decrease in impurity [Fig.3(b)].Two of the top ten bands matched spectral channels previously used to predict chl a abundance in the UCFR 35 and surrounded a known chl a absorption line near 675 nm. 42,43In addition, two green wavelengths were identified, likely because of their utility in differentiating between the typical bright green Cladophora and areas of benthic growth other than Cladophora, which are generally dark green or brown.
Though the classifier showed promising early results, the random train/test split was based on pixel-by-pixel assignment, therefore, the accuracy metrics may have been biased high due to spatial autocorrelation (i.e., adjacent pixels that are spectrally related present in both training and validation sets).To further assess the accuracy of the method, two labeled hyperspectral images were withheld from the training and validation process and used only to test accuracy.The trained algorithm was run on these unseen images and the accuracy of the labeled regions was reported per class in terms of precision, recall, and F1-score (Table 2) along with confusion matrices (Fig. 4).Precision is a measure of the positive predictions that were correct (true positives) out of all positive predictions [Eq.( 2)], recall measures true positives identified from all positive labeled data [Eq.( 3)], and F1-score is the harmonic mean of recall and precision [Eq.( 4)]: E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 2 ; 1 1 4 ; 2 7  7 E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 3 ; 1 1 4 ; 2 2 4 Recall ¼ E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 4 ; 1 1 4 ; 1 8 9 where T p is the number of true positives, F p is the number of false positives, and F n is the number of false negatives.The classification algorithm showed good performance, with an average F1-score of 99% across the 122,075 labeled pixels.

Percent cover estimates
After classification and preprocessing, including identification and removal of pixels containing riverbank material, all labeled pixels contained only submerged material, from which percent  42,43 and match salient wavelengths identified in previous work. 35over estimates could be obtained.Percent cover was calculated for both Clado and non-Clado by dividing the number of pixels classified as either class by the total number of pixels identified as being within the water (i.e., pixels with a reflectance less than 5% at 850 nm).The resulting value gave a percent cover estimate for each classified data cube, spanning roughly 80 m.The average percent cover for each UAV flight was calculated as the mean of all classified data cubes collected along each 1-km river reach.

Results and Discussion
UAV flights were conducted between June and September of 2021 (Table 3) at the three field sites discussed in Sec.2.1.Individual, preprocessed hyperspectral data cubes captured during each flight were used as inputs to the trained random forest classifier.Each classified image contained predictions for every pixel in the data cube, where green pixels represent Clado, blue represent non-Clado, gray represent bare substrate, and black represent non-submerged or shadowed pixels removed during preprocessing (Fig. 5).After classification, the spatially corrected images gathered at each site were manually stitched together and the average percent cover was calculated across the entirety of the reach.Percent cover estimates generally showed higher levels of non-Clado across the growing season, with an average percent Clado cover of 26%, 42%, and 29% and average percent non-Clado cover of 58%, 42%, and 37% for Deer Lodge, Gold Creek, and Bear Gulch, respectively (Table 3).Clado coverage estimates generally declined at all sites until late July, with the exception of significant growth at Gold Creek on 15 July, indicating that maximum algal cover occurred prior to UAV flights (Fig. 6).Cladophora growth showed signs of resurgence in early August at Gold Creek and Bear Gulch, with percent cover estimates gradually increasing until the end of the sampling period.Deer Lodge also showed signs of a resurgence event, with percent cover increasing during the final UAV flight in early September.Non-Clado benthic growth showed an inverse relationship with Clado at Deer Lodge and Gold Creek, with more variability observed at Bear Gulch.Surprisingly, the seasonal average of non-Clado coverage was higher than Clado coverage at Deer Lodge and Bear Gulch but was equal at Gold Creek.Non-Clado coverage tended to decrease moving farther downstream from the headwaters of the UCFR, dropping from a seasonal average of 58% at the most upstream site to 37% at the most downstream site.
As noted in previous work on the UCFR, algal growth was spatially variable within each reach, 17 ranging from nearly total coverage to areas with little to no algal growth [Fig.7(a)].Estimating percent cover in 80-m steps along the full 1-km reach at the Bear Gulch field site on 2 September 2021 revealed that Clado and non-Clado varied by up to 42% and 46%, respectively [Fig.7(b)].Given this spatial variability, existing measurement methods that rely on spot sampling along a small section of a reach may not capture the true character of algal activity in the river.
Many factors influence Cladophora growth rates; however, growth has been shown to be related to stream velocity, 44 which may explain the tendency of blooms to form along outer bends in the river, though more work is required to understand the exact relationship between stream morphology and algal growth. 3In addition to spatially variable growth, rapid temporal shifts were also noted across all sites and sampling periods, with Clado cover fluctuating by 40% between 15 and 22 July at Gold Creek and non-Clado coverage following similar patterns.

Conclusion
The significant spatial and temporal variability of Clado and non-Clado in the UCFR suggests that spatially limited traditional methods of measuring percent cover may provide misleading results by missing this variability.Using a custom-trained random forest classifier, georectified spatial distribution maps were created for three field sites along the UCFR between June and September of 2021, highlighting the spatial and temporal variation of algal growth in the UCFR and an inverse relationship between Clado and non-Clado.Though the method showed promising results, with training and validation accuracies greater than 99% and an average F1-score of 99%, there are limitations.First, ground control points were not used during the georectification process, introducing possible errors in spatial accuracy.Second, the removal of bank and shadow pixels was not field-verified, again possibly introducing small errors in the estimation of percent cover.Finally, though the presence of Cladophora and other forms of benthic growth were verified in the field, ground-truth samples were not collected to assess the accuracy of the classification method versus in-situ samples and the role of depth in identifying benthic growth was not explored.
The ability to classify algal growth along continuous river reaches allows for better assessment of benthic algal cover but is also a step forward in species identification by differentiating between areas dominated by either Cladophora or benthic growth other than Cladophora.Separation by growth form allows for further studies of habitat preference, seasonal shifts in dominant growth forms, and a better understanding of eutrophication over larger scales.In future work, the classification methods presented here will be combined with techniques for estimating algal biomass, as chl a abundance, to provide georectified classification and biomass maps.Additionally, the salient spectral bands are being used to develop a low-cost, compact multispectral algae imager. 36

Fig. 2
Fig. 2 Example spectra for the three labeled classes, including pixels containing benthic growth visually labeled as Clado (upper left), non-Clado (bottom right), and bare substrate (upper right).

Fig. 3
Fig.3(a) Confusion matrix of the tuned random forest model, showing a summary of the predictions made by the classifier, where the largest classification error was between the non-Clado and bare substrate classes, though this error was less than 1%; and (b) top 10 wavelengths for making class predictions extracted using mean decrease in impurity, where the most important features (right) surround known chl a absorption lines42,43 and match salient wavelengths identified in previous work.35

Fig. 4
Fig. 4 Confusion matrices of the tuned random forest model tested on (a) validation image 1 and (b) validation image 2. Both images showed minimal confusion between Clado and non-Clado pixels, though there was slight confusion between non-Clado and bare substrate.

Fig. 5
Fig. 5 Hyperspectral data cube (a) before and (b) after classification using the trained random forest model at Gold Creek on 1 July 2021, with percent cover estimations of approximately 55% and 29% for Clado and non-Clado, respectively.Covering approximately 80 m, the classification map shows a large bloom of Clado in green, regions of non-Clado near the banks in blue, areas of bare substrate in gray, and non-submerged objects in black.Note the removal of the bridge and associated shadow.

Fig. 6
Fig. 6 Decimal percent cover estimates for (a) Deer Lodge, (b) Gold Creek, and (c) Bear Gulch across the summer growing season of 2021, with Clado shown in green and non-Clado shown in dashed blue.

Fig. 7
Fig. 7 (a) Example of a full-reach image (∼1 km) at Bear Gulch on 2 September 2021, shown in RGB (top) and after classification (bottom) with an average Clado cover (green) of 40% and non-Clado (blue) cover of 27% with bare substrate shown in gray; and (b) percent cover as a function of distance along the river, where each point represents the calculated percent cover across an 80-m ground swath.Calculated estimates varied by up to 42% for Clado and 46% for non-Clado when sampled with 4-cm spatial resolution across a 1-km reach, with the rest of the reach classified as bare substrate, suggesting that spatially limited sampling techniques, such as spot measurements, could be improved with high-spatial-resolution remote sensing methods.

Table 1
Optimal random forest classifier hyper parameters selected after randomized search with 5-fold cross-validation across 50 iterations.

Table 2
Accuracy metrics of the random forest classification algorithms after testing on unseen validation images, reported as precision, recall, F1-score, and the total number of samples in each class.Validation image 1 contained no examples of bare substrate so no metrics are included.

Table 3
Dates, locations, and percent cover estimates of UAV flights during the 2021 data collection period.The average percent cover (bottom row, bold font) indicates that non-Clado benthic growth generally dominated across the 2021 growing season, with the exception of Gold Creek, where percent cover was approximately equal.
a The percent cover estimates from 17 August at Gold Creek are based on an 80-m reach at Gold Creek due to limited data collection during flight.