Estimating biomass in temperate grassland with high resolution canopy surface models from UAV-based RGB images and vegetation indices

Abstract. Monitoring grassland biomass throughout the growing season is of key importance in sustainable, site-specific management decisions. Precision agriculture applications can support these decisions. However, precision agriculture relies on timely and accurate information on plant parameters with a high spatial and temporal resolution. The use of structural and spectral features derived from unmanned aerial vehicle (UAV)-based image data from low-cost sensors is a promising nondestructive approach to assess plant traits such as above-ground biomass or plant height. Therefore, the main objectives were (1) to evaluate the potential of low-cost UAV-based canopy surface models to monitor sward height as an indicator of grassland biomass, (2) to evaluate the potential of vegetation indices from low-cost UAV-based red-green-blue (RGB) digital image data, and (3) to compare the mentioned methods with established methods for biomass monitoring such as rising plate meters and spectroradiometer-based narrowband vegetation indices over the growing season in 2017, including three cuts. We compared the accuracy of each single UAV-based height feature and vegetation index using a combined multivariate approach to estimate fresh and dry biomass. The heterogeneous sward structure with high spatiotemporal variability led to varying performance in biomass estimation depending on the growths (time between two cuts) and choice of predictor variable. The results showed that biomass prediction by height features provided moderate-to-good results (cross-validation R2  =  0.57 to 0.73 for dry biomass and 0.43 to 0.79 for fresh biomass), but reference measurements based on rising plate meters were more robust when estimating biomass. The spectral features (RGB-based vegetation indices and spectroradiometer-based vegetation indices) yielded varying accuracy and suitability for biomass prediction. Despite the variability, our findings indicate a promising approach for grassland biomass monitoring.

In recent years, structural plant parameters, such as plant height, have become the focus of UAV-based remote sensing approaches for crop monitoring. Plant height derived from multitemporal canopy surface models (CSMs) has been studied as a robust estimator of biomass. [24][25][26][27][28] In addition, crop biomass has been estimated from spectral information from UAV-based standard RGB and multispectral or hyperspectral cameras. [29][30][31][32][33][34] Furthermore, some studies have investigated combining structural and spectral features from ground-based or UAV-based sensors to estimate crop parameters. 7,24,[35][36][37][38] Those approaches have mainly been adopted to monitor arable crops, where spatial heterogeneity is often lower than in grasslands, 10 which typically has a high spatio-temporal heterogeneity as a result of the considerably different floristic compositions and the co-occurrence of different phenological stages. This heterogeneity poses challenges in remote sensing-based estimation of biomass quantity and quality. 39 Some studies have faced these challenges by deploying ground-based measurements using ultrasonic sensors, 39,40 light detection and ranging, 41,42 or a combination of structural and spectral features. 7,36,39,43,44 Furthermore, a few studies have been published on deploying structural features such as height or volume derived from UAV-based camera systems for grassland height or biomass estimation. 6,28,36,[45][46][47][48] However, to our knowledge, no study has utilized low-cost UAV-system-based sward height and vegetation indices (VIs) to estimate grassland biomass for at least one entire growing season with three consecutive growths in mid-latitude Europe.
Therefore, the main objectives of the present study are (1) to evaluate the potential of low-cost UAV-based CSMs to monitor sward height as an indicator of biomass, (2) to evaluate the potential of VIs from low-cost UAV-based RGB digital image data, and (3) to compare those methods with established methods for biomass monitoring such as RPMs and spectroradiometerbased narrowband VIs.

Study Site
The study was conducted in 2017 on an experimental grassland site in Ersdorf (N 50°34′56.4″, E 6°59′21.1″) on the Campus Klein-Altendorf research facility of the Rheinische Friedrich Wilhelms University Bonn, Germany. The site is about 320 m above mean sea level and has a southeast exposition. The experimental site was established in March 2014 as a nitrogen fertilizer gradient experiment to develop a high range in biomass and plant height of perennial ryegrass (Lolium perenne). Further species are also present such as foxtail grass (Alopecurus sp.), timothy-grass (Phleum pratense) and clover (Trifolium sp.), and to a small amount dandelion (Taraxum officinale) and cocksfoot (Ranunculus sp.). The vegetation type was identified as a typical Lolio-Cynosuretum. The percentage contribution of species was limited so that dominance of species did not change significantly and the vegetation type remained the same.
The experimental setup comprised three growths with four cuts (including one equalization cut in early March) within one year, and the plots were set up in six fertilizer levels with three replicates, resulting in 162 plots of 1.5 × 3 m gross area (see Fig. 1). The plots were separated by 20-cm-wide border strips treated with herbicide. Calcium ammonium nitrate fertilizer (CaH 4 N 4 O 9 , 27% N) was applied at the beginning of each growth. Thus, six fertilizer levels were established from 0 to 500 kg N ha −1 in increments of 100 kg N ha −1 . Unfortunately, the, at-times, high activity of moles and the common vole disturbed parts of the sward significantly in several plots of replicate 1 (N1, N2, N5, N6) and replicate 2 (N4) (see Figs. 1 and 2).
The soil type was classified as a stagnosol (soil textural classes: silty clay loam/clay loam/ clay). The annual precipitation in 2017 was 598.5 mm and the mean annual temperature was 10.8°C. Figure 3 displays the monthly mean precipitation and temperature for 2017 along with the long-term monthly means (1956 to 2015). April, May, and June of 2017 were hotter and dryer than the long-term averages. Table 1 lists all sampling dates for the growing season in 2017. For each growth, three sampling dates were scheduled. Data collection included UAV campaigns, spectroradiometer measurements of canopy reflectance, and RPM measurements of compressed sward height and destructive biomass sampling.   T1   T2   T3   T1   T2   T3   T1   T2   T3   T1   T2   T3   T1   T2   T3   T1   T2 T3

Reference Field Data and Biomass Sampling
Reference measurements (SH RPM ) of the compressed sward height were taken using a RPM (Platemeter F400, Farmworks Precision Farming Systems Ltd., Feilding, New Zealand). The device uses an electronic digital counter unit to record the compressed sward height per measurement in millimeters and stores each value in an internal memory. The data were downloaded via USB to a paddock management software (P-PLUS, AgHub Ltd., Feilding, New Zealand) and were exported as a .csv file. Five RPM measurements were taken per plot prior to biomass sampling on each date and averaged to represent mean compressed sward height. A sickle bar mower was used to harvest the biomass of the plots of the respective sampling date directly after each flight campaign. The fresh biomass (FBM) weight for each harvested plot was determined by weighing the clipped biomass per plot. Subsamples of 500 g∕plot were taken, dried in a forced air drier at 65°C to a constant weight and reweighed to determine dry biomass (DBM) yield per unit ground area. Biomass values were upscaled to tons per hectare. Owing to slightly differing plot sizes, the area of each plot was calculated based on a high-resolution UAV-based orthomosaic (1 cm∕pix: see Sec. 2.3) from the first sampling date (April 26, 2017) and double-checked on site with a tape measure to determine the correct upscaling factor per plot.

UAV-based data acquisition
Image data were collected using a consumer-grade DJI Phantom 4 Advanced (DJI, Shenzhen, China). The UAV's camera had a 1" CMOS sensor with 20 megapixels, a field of view of 84 deg (35 mm equivalent format) and was mounted on a three-axis gimbal (Zenmuse X4S). Waypoint navigation and automated camera triggering were programmed using the DJI-GO App (DJI, Shenzhen, China). The flight was set up as a grid pattern (along-and across-track). The flying height was set to 25 m and the speed to 1.8 m∕s. Due to software problems, the across-track flight on June 20 had to be flown manually. Images were captured in 2-s intervals in .jpeg format and saved on an SD card. White balance was set to manual and adjusted to illumination conditions for each sampling date. All flights were performed between 9.30 and 11.30 a.m. CET (solar noon: 12 p.m.) in stable illumination conditions. Forward and lateral image overlaps were >85%.
Twelve ground control points (GCPs) were installed on-site (see Fig. 1). The targets were plywood boards (30 × 30 cm), painted with a black cross on a light-gray background and mounted on wooden poles across the field. The GCPs were measured using a Real Time Kinematic Differential GPS (Topcon HiperPro: Topcon, Tokyo, Japan).

Field spectroradiometer measurements
Canopy reflectance of the sward was measured with an ASD FieldSpec3 spectroradiometer (Analytical Spectral Devices, Boulder, Colorado, USA). The ASD FieldSpec3 acquires reflectance in the wavelength range from 350 to 2500 nm using three detectors: one in the visible-near infrared region (VNIR: 350 to 1000 nm) and two for the shortwave-infrared regions (SWIR1: 1001 to 1830, SWIR2: 1831 to 2500). The fiber optic with a field of view of 25 deg was mounted on an orthogonal suspension rod to ensure fixed viewing geometry for each measurement. The sensor-canopy distance was set to 60 cm, resulting in a footprint diameter of 26.6 cm.
Reflectance measurements with the spectroradiometer were taken directly after each flight campaign (between 11 a.m. and 2 p.m. CET, solar noon: 12 p.m.). Eight reflectance measurements per plot were taken with 10 sample counts per spectra. White reference using a Zenith Lite TM panel and dark current measurements were taken prior to the next plot.
For the first and second growths, reflectance measurements were only possible on two sampling dates due to unstable weather conditions. Furthermore, for the same reason, on three sampling dates, it was only possible to collect reflectance measurements for two of three replicates (see Table 1). Since the UAV flight time to cover the whole experimental field was only about 10 min, it was possible to acquire image data in stable illumination conditions. In contrast, data acquisition with the ASD instrument took about 1 to 1½ h and, depending on changing cloud cover, was sometimes not possible at all.

UAV-based data processing and feature extraction
The image datasets were processed in the SfM software Agisoft Photoscan Professional v1.4 (Agisoft LLC, St. Petersburg, Russia) to obtain digital surface models (DSMs) and orthomosaics, as numerous studies have already demonstrated accurate results. 27,49,50 Image alignment was run on "low" with camera reference preselection, to place one of the 12 GCPs in at least 10 images, preferably more. Reference accuracy settings were set to AE0.01 m for GCPs and to AE10 m for camera positions. After GCP placement, image alignment was run again on "high" quality using 100,000 as the key point limit and 400,000 as the tie point limit. The resulting sparse point clouds were checked for outliers using manual selection. The dense point cloud was computed using high-quality setting and "mild" depth filtering, to preserve the finer details of the grass canopy, based on our own previous tests and as suggested by Cunliffe et al. 51 Table 2 summarizes the outputs of the error reports generated by Agisoft Photoscan v1.4 for each image dataset. The number of images acquired on each sampling date varies due to the start time of camera triggering and slightly varying speed since the speed was needed to be set manually using a slider bar in the app at the beginning of each flight.
The highest errors calculated from the GCPs were found in the Y axis, with no error lower than 4.32 and up to 7.32 cm. Errors in the X axis were between 1.44 and 3.51 cm, and the errors in the Z axis ranged between 0.91 and 5.91 cm. The highest total errors were reported for G1-T3 and G2-T2. The highest reprojection error was reported on G1-T1 with 1.72 pixels and the lowest for G2-T2 with 1.06 pixels. Point densities ranged between 2180 and 2720 points∕m 2 .
The dense point cloud was used to compute a DSM and an orthomosaic ("mosaic blending mode") with a spatial resolution of 2 and 1 cm, respectively, both in WGS 1984, UTM Zone 32 N. Subsequent raster file processing was completed in ArcGIS Pro v2.1 (ESRI, Redlands, California, USA).
The workflow described by Bendig et al. 49 was applied to extract height features. The DSM acquired directly after the equalization cut (T0) was subtracted from the DSM of each sampling date (T1 to T3) per growth to obtain the CSM. Sward height per plot was extracted from each CSM using a polygonal shapefile of the plots (12.5 cm inward buffer to account for border effects). The height features included the mean sward height (SH mean ) and the 90th percentile of the sward height (SH p90 ), since they correlated well with grassland biomass in previous tests and in the studies by Viljanen et al. 7 and Näsi et al. 36 The orthomosaic of each sampling date were used to calculate four RGB-based VIs (VI RGB , see Table 3), from the individual bands of the orthomosaics. For each plot per sampling date, the mean value of the VIs was extracted using the same polygonal shapefile as for the height features.
In addition, two composite indices were assessed, the GrassI and the ExcessGreen Index ðExGIÞ þ SH p90 , calculated from one height feature and one VI RGB . Both indices have performed well in estimating grassland biomass. 7 The VIs were chosen to be comparable to previous studies that used aerial image data. The Red-Green-Blue Vegetation Index (RGBVI) was tested for barley biomass in Bendig et al.'s  56 and for corn and soybean biomass in Hunt et al.'s study. 57 The Excess Green Index was originally intended to calculate fractional vegetation cover 54 but showed good results for grassland biomass estimation. 7 Gitelson et al. 52 introduced the Visible Atmospherically Resistant Index (VARI) as an extension of the NGRDI to assess vegetation fraction, but it also showed good results in estimating grassland biomass. 55

Field spectroradiometer-derived vegetation indices
Spectroradiometer measurements were preprocessed in the ASD-software Indico Pro v5.0 (Analytical Spectral Devices, Boulder, Colorado, USA) for sensor offset correction and subsequently processed in the R package "prospectr" 58 by applying a Savitzky-Golay filter with a second-degree polynomial and a moving window size of 17 to smooth the spectra. For this study, VIs from the VNIR (VI VNIR ) region were tested (see Table 4). The Normalized Difference Vegetation Index (NDVI), Optimized Soil Adjusted Vegetation Index (OSAVI), Renormalized Difference Vegetation Index (RDVI), Red Edge Inflection Point (REIP), and Normalized Difference Red-Edge Index (NDREI) were chosen to be comparable to previous studies. 7,36,[59][60][61] In addition, the VIs listed in Table 3 were calculated from narrowband data acquired by the ASD (with 670 nm as red, 550 nm as green, and 480 nm as blue band) to evaluate the VI RGB obtained by the UAV's camera.

Statistical Analysis and Evaluation
Statistical analysis was performed in R v3.5. 67 The sensitivity of each feature was tested using Pearson's Correlation Coefficient (PCC). Bivariate and multivariate linear regression models were established to estimate biomass as a function of the predictor variables. The package "caret" was used for statistical modeling. 68,69 For the bivariate regression models, each height and VI feature was taken as a predictor of DBM and FBM. For the multivariate linear regression (MLR) models, each VI feature was paired with one of the height features. The data were split based on the three growths and the statistics are reported accordingly. Prediction accuracy of each predictor variable was quantified using the coefficient of determination (R 2 ) and root-mean-squared error (RMSE). The performance metrics were calculated using leave-one-out cross-validation (LOO-CV). LOO-CV holds out one sample point as a reference, while the regression model is trained using all remaining samples. This process was repeated n times. The resulting error estimates for n runs were averaged. 70 3 Results

Orthomosaics and Canopy Surface Models
The orthomosaics and CSMs for the three sampling dates of the first growth are presented in Fig. 4. Both display a good visual correlation to the fertilizer treatments and a detailed representation of sward height is achieved with high spatial resolution. The disturbances by rodents as mentioned above became visible in the CSMs and orthomosaics. In the orthomosaics of Fig. 4, the effect of a rainfall event on G1-T2 shortly before image acquisition can be seen: the wet soil appears darker than the dry soil on the first and third sampling dates.

Reference Measurements
Due to vigorous growth in G3, grass stalks were slightly lodging in the highest fertilizer treatments (N5 and N6) on G3-T3 (see Fig. 2). Plots with high rodent activity were excluded from the analysis, since they produced outliers with unreasonable biomass values. Table 5 summarizes the descriptive statistics of the biomass measurements. G1 and G3 yielded the highest biomass values, while G2 was affected by drought and thus had lower tiller density and biomass development. Table 6 summarizes the descriptive statistics for the RPM sward height reference measurements (SH RPM ) and the mean sward height derived from CSM (SH CSM ). The SH CSM resulted in a higher range of values and a higher standard deviation than the SH RPM .   Table 7 lists the PCCs for the CSM-based height features SH p90 and SH mean and the RPM reference measurements of DBM and FBM. Since the SH p90 correlated slightly better than SH mean for DBM and FBM, only the SH p90 is described in the following section. For completeness, a sensitivity analysis of all single features by growths, by growths and treatment, and by growths and sampling date is included in the Appendix (Tables 12-19).
The relationships between DBM and SH RPM and SH p90 are displayed in Fig. 5. SH RPM shows a strong linear relationship to DBM with little deviation from the regression line for all three growths.  The SH p90 feature shows larger deviation from the regression line, while the second growth falls outside the pattern of the first and third growths. Nevertheless, the R 2 for the second growth was higher for SH p90 , compared to the SH RPM . SH RPM and SH p90 were only moderately well correlated. Figure 6 presents the relationship between selected VIs and DBM. The UAV-based NGRDI performed moderately well for the first growth with an R 2 of 0.58 but shows no correlation for the second and third growths. In comparison, the ASD-based nNGRDI performed more consistently for the three growths (see Table 15). Slight lodging of the grass sward in six plots with high fertilizer treatments at G3-T3 led to a higher deviation from the regression line. The spectroradiometer-based NDVI performed moderately well. In comparison to the NGRDI, the ExGI correlated better with DBM in G3 since the slightly lodging sward in six plots of treatments N5 and N6 seemed to have only a minor effect on the ExGI values.

Cross-Validation Results of Simple Linear Regression for Estimation of Dry Biomass and Fresh Biomass
The predictive accuracy of the height features and VIs for biomass estimation was assessed using LOO-CV.  performance with an R 2 of 0.54 and 0.58, respectively, for G1 but showed no correlation for G2 and G3. Excluding G3-T3 (slightly lodging sward in six plots of treatments N5 and N6) led to a better performance of both indices with an R 2 of 0.42 (RMSE: 0.469 t ha −1 ) and 0.56 (RMSE: 0.409 t ha −1 ), respectively. Similarly, the R 2 of the RGBVI increased to 0.55 (RMSE: 0.416 t ha −1 ). However, the RGBVI showed no predictive accuracy for G1 and G2. The ExGI had little-to-no predictive accuracy for G1 and G2 but an R 2 of 0.55 (RMSE: 0.601 t ha −1 ) for G3. The ASD-based narrowband VI RGB performed well for DBM estimation for G1 and G3, except for the ExGI. However, only a weak correlation was observed for G2. The composite indices GrassI and ExGI þ SH p90 performed as expected in the range of the best-performing component (height features from CSM). The VI VNIR performed best for G1. The NDVI, REIP, and NDREI had an R 2 of 0.83, 0.86, and 0.87, respectively, while the RDVI performed weak for G1. For G2, the VI VNIR showed moderate predictive accuracy, while the REIP only had an R 2 of 0.29. In G3, the red-edge-based indices REIP and NDREI showed no correlation, and the NDVI, OSAVI, and RDVI showed only a weak predictive accuracy. Similar to the VI RGB , the exclusion of G3-T3 increased the performance of the NDVI, REIP, and NDREI, although only a low predictive accuracy could be achieved (NDVI: R 2 0.39, RMSE 0.512 t ha −1 ; REIP: R 2 0.24, RMSE 0.574 t ha −1 ; NDREI R 2 0.26, RMSE 0.567 t ha −1 ). Table 9 presents the cross-validation results of the FBM prediction based on simple linear regression. As for the DBM prediction, SH RPM is the best predictor for FBM for G1 and G3 with an R 2 of 0.78 (RMSE: 1.448 t ha −1 ) and 0.83 (RMSE: 1.367 t ha −1 ), respectively. However, SH p90 has a slightly higher R 2 of 0.79 (RMSE: 1.421 t ha −1 ) for G1 and outperforms SH RPM for G2 with an R 2 of 0.65 (RMSE: 1.166 t ha −1 ). The best-performing VI RGB for G1 were the NGRDI and VARI, while both indices obviously failed to estimate FBM for G2 and G3. The RGBVI showed a better predictive accuracy for FBM (R 2 : 0.55, RMSE: 2.070 t ha −1 ) than for DBM in G2, while the ExGI followed a similar pattern for FBM prediction as for DBM, with the best performance in G3 (R 2 : 0.51, RMSE: 2.331 t ha v1 ). The performance of the ASD-based narrowband VI RGB for FBM estimation was similar to that for DBM estimation. As expected, the composite indices GrassI and ExGI+SHp90 performed in the range of the best-performing single feature (height features from CSM). The VI VNIR performed best for G1. All indices had a high R 2 value above 0.70, and both the rededge-based indices REIP and NDREI had an R 2 of 0.89. Also, for G2, the R 2 values for FBM prediction of the VI VNIR were 0.65 or slightly higher, except for the REIP, which only had an R 2 of 0.36 (RMSE: 1.180 t ha −1 ). For G3, the best-performing indices were the OSAVI and RDVI with an R 2 of 0.57 (RMSE: 2.252 t ha −1 ) and 0.59 (RMSE: 2.203 t ha −1 ), respectively. The lowest RMSE values were observed for the VI VNIR for G2, where all indices except the REIP had an RMSE below 0.900 t ha −1 .
Observed and predicted DBM for selected features are displayed in Fig. 7. SH RPM is closest to the 1:1 line, while predictions based on NDVI and SH p90 deviate more from the regression line, especially for higher DBM values.

Cross-Validation Results of Multivariate Linear Regression for Dry Biomass and Fresh Biomass
Estimations of DBM and FBM using combined structural and spectral features by multivariate linear regression are presented in Tables 10 and 11, respectively. Since the SH p90 outperformed the SH mean in the SLR models, the SH p90 was chosen as a structural feature for the combination with the spectral features (VI RGB and VI VNIR ). The combination of SH p90 and the VI RGB yielded similar results with an R 2 of 0.72 in G1, which is comparable to the highest best-performing single feature (SH p90 ). For G2, the combination of SH p90 and VI RGB led to a better predictive accuracy for the NGRDI and  For the VI VNIR , the results improved significantly for the combination of SH p90 with OSAVI and RDVI for G1, while the other features showed only slight improvement. For G2, the REIP and NDREI showed the best R 2 values with both 0.72, while combining SH p90 and the NDVI, OSAVI or RDVI led to R 2 values lower than the best-performing single feature (SH p90 ). In G3, the combination of OSAVI and RDVI with SH p90 predicted DBM most accurately, while REIP and NDREI performed in the range of the best single feature (SH p90 ). A similar pattern was observable for the multivariate estimation of FBM.

Discussion
The primary aims of this study were (1) evaluating the ability of height features and VI features derived from UAV-based image data to predict DBM and FBM in grassland and (2) comparing the performance of these features with established VIs from the VNIR spectral region and RPM reference measurements. Although Bareth and Schellberg 6 found that SH RPM and the SH CSM correlated well, in this study, both features only correlate at a low-to-medium level depending on the growths. The SH CSM features tend to overestimate the manually measured SH RPM , which was expected due to the compression of the sward when using the RPM. 6 Furthermore, the choice of the RPM instrument might have influenced the correlation due to a different disk weight and diameter.
Recent studies using SfM-MVS to derive canopy height models for grassland have obtained reference measurements in the field with a height stick or a ruler. Dependent on the sward structure, species composition, and growing stage, Grüner et al. 46   when comparing reference measurements from RPMs and SH CSM for a grassland field of mainly perennial ryegrass and clover in southwest England. Higher correlation between RPM reference measurements and low-cost SH CSM (R 2 of 0.83 to 0.91) were reported by Bareth and Schellberg 6 for three consecutive years in the long-term Rengen Grassland Experiment (RGE) in Germany. In contrast to manual measurements, the CSM approach captures the spatial variability of the plant height in much finer detail.
To obtain sward height, the CSM-approach described by Bendig et al. 49 was used. Table 2 shows that the errors of the reconstructed DSMs in X, Y, and Z directions were within the range of up to 7.3 cm, which clearly have an impact on calculating sward height when DSM-T 0 is subtracted from DSM-T 1-n . The high error rates may be due to several factors such as the suboptimal placement of GCPs or the sensor geometry of the DJI P4A camera. Owing to overgrowing of the border strips between the plots, it was not possible to classify a DTM representing the ground for every sampling date. For small-scale field experiments, DTM classification or measuring ground surface points with an real time kinematic differential GPS may be more feasible. Nevertheless, the CSM approach by Bendig et al. 49 may be more applicable to actual field conditions since ground surface classification may not be viable for a fully covered grassland field of several hectares with varying topography. However, the CSM approach by Bendig et al 49 requires that the data for the base model are acquired over the bare soil surface of the field, which is not feasible for grassland fields due to permanent vegetation coverage.
The high rodent activity in a few plots per growth led to higher uncertainties in SH CSM calculation, which, in some cases, resulted in negative sward height values and unreasonable biomass values. The decision to exclude these plots improved the biomass estimation, although these disturbances are likely to occur in the field.
Biomass prediction models based on LOO-CV results showed moderate accuracies depending on the growths when using the height features (SH mean and SH p90 ). Viljanen et al. 7 reported correlation coefficients between 0.75 and 0.98 for biomass prediction using height features from CSM but concluded that biomass prediction accuracy was dependent on the density and growth stage of the sward. Näsi et al. 36 reported correlation coefficients for grassland biomass prediction between 0.10 and 0.41. Grüner et al. 46 achieved R 2 values of 0.46 and up to 0.87 depending on the sward composition for mixed legume-grass swards and pure legumes and grass stands. Roth and Streit 50 analyzed different cover crops, and the prediction accuracy for biomass improved when lodging plants were excluded from the analysis. Our findings are, thus, comparable to other studies. Since G2 was affected by drought (see Fig. 3), low biomass, low tiller density, and low sward heights in G2 led to a weak correlation when all growths were combined in a single regression analysis (see Fig. 8).
Similar results were also observed by Grüner et al. 46 when pure grass stands from the second growth in 2017 were excluded. These findings indicate that challenges remain when using SfM-MVS reconstruction to determine a global model for biomass estimation in pure grass swards.
The varying accuracy especially of the VI RGB from the orthomosaics as biomass predictors might be explained by several factors such as the varying soil color due to rainfall events or drought, bent stalks or flowers. Owing to vigorous growth in plots of treatments N5 and N6 in G3-T3, stalks tended to bend over, and as a result, a higher proportion of the spectral signature of the stalks was captured instead of the desired nadir view of the canopy (see Fig. 2). In this study, the correlation of the UAV-based VI RGB by growth and treatment did not yield more robust results than the analysis of the features excluding the treatment effects. Furthermore, no clear pattern between treatments and VIs is recognizable in the sensitivity analysis (see Tables 14-16). For clarification, an extended sensitivity analysis of all single features can be found in the Appendix.
Dependent on the sampling date, the illumination conditions were stable but either sunny or overcast, which probably contributed significantly to the UAV-based VI RGB values. This observation was also shown by Rasmussen et al. 71 for wheat. Furthermore, the effect of rainfall shortly before image acquisition probably influenced the VI RGB values due to changing soil color (see Fig. 1) The study by Viljanen et al. 7 achieved correlation coefficients of 0.70 to 0.85 for the correlation of biomass and RGBVI, NGRDI, and ExGI. However, for the random forest classifier, the VI RGB was not among the most important features in estimating grass biomass.
The choice of multiple linear regression (one VI feature + one height feature) was motivated by the search for a robust and simple model that can be applied to different growths per season.
Initial tests on different interaction terms for the MLR (such as VI feature × height feature) did not yield more robust results.
The performance of the composite VIs GrassI and ExGI þ SH p90 did not yield significantly better results than the multivariate regression of the respective VI RGB and height features, as they rely on the quality of both features. Owing to the different viewing geometry and technique of the ASD-based VIs, these features are not directly comparable to the results of the UAV-based VI RGB . As expected, VIs from a well-calibrated instrument, and including the NIR spectral region, performed better and more consistently over all growths than the UAV-based VI RGB . 24,38 However, all ASD-based VIs had a decreasing accuracy from G1 to G3 and were affected by slightly lodging sward in six plots with higher N-treatments, as mentioned already. The ASD-based narrowband VI RGB validates the respective VIs as viable for biomass prediction in grassland and demonstrate the challenges associated with uncalibrated sensors such as the camera used in this study.
The heterogeneous sward structure with high spatiotemporal variability compared to crops led to varying performance for biomass estimation depending on the growths and choice of predictor variable. It is therefore essential to study this topic further and to evaluate different swards under varying conditions and sites and over multiple years. A promising approach is the application of high-resolution multispectral imaging sensors, such as the MicaSense RedEdge camera (Micasense Inc., Seattle), which can readily be employed on a UAV. Furthermore, comparing simple regression techniques with more sophisticated algorithms such as random forest should be explored for larger datasets of grassland biomass and UAV-based structural and spectral features.

Conclusions
Estimating biomass in high spatial and temporal resolution is a key component in precision agriculture applications and ecosystem monitoring. In this study, SfM-MVS-derived features of sward height and VIs from UAV-based images were assessed as predictors of grassland above-ground biomass and compared to established narrowband VIs from spectroradiometer measurements. The application of UAV-based imaging sensors serves as a fast and nondestructive method for data acquisition with high spatial and temporal resolution.
This study has shown that, especially for the first growth, which is considered the most important from an agronomical point of view, grassland biomass estimation by SfM-MVS-derived sward height from UAV-based images is feasible and provides an alternative means to manual measurement techniques such as RPMs or clipping. However, the results are influenced by various factors such as growing stage, sward composition, and biotic and abiotic factors, which need to be further investigated. Further research should be focused on integrating structural and spectral features for grassland biomass estimation and on generalizing models to different sites and years.