Modeling wetland aboveground biomass in the Poyang Lake National Nature Reserve using machine learning algorithms and Landsat-8 imagery

Abstract. Quantitative estimation of wetland aboveground biomass (AGB) is an essential aspect in evaluating the health and conservation of this valuable ecosystem. We combine AGB field measurements and remote sensing data to establish a suitable model for estimating wetland AGB in the Poyang Lake National Nature Reserve (PLNNR), which is included in the Ramsar Convention’s List of Wetlands of International Importance. All field sampling points cover four dominant vegetation communities (Carex cinerascen, Phalaris arundinacea, Artemisia selengensis, and Miscanthus sacchariflorus) in the PLNNR. Wetland AGB is retrieved from the Landsat-8 OLI image. To improve the accuracy of wetland AGB estimation, we compare the performances of three machine learning algorithms, namely, random forest (RF), back-propagation neural network (BPNN), and support vector regression (SVR), with linear regression (LR) in estimating the AGB in the PLNNR. Results are as follows: (1) the RF model with a root-mean-square error of 0.25 kg m⁻² performs better than BPNN (0.29 kg m⁻²), SVR (0.27 kg m⁻²), and LR (0.31 kg m⁻²) in our testing dataset, and AGB density in the PLNNR is between 0 and 1.973 kg m⁻². (2) The four most important features for AGB modeling are near-infrared, short-wave infrared 1 band, enhanced vegetation index, and red band. Our study presents an effective and operational RF model that estimates wetland AGB from Landsat data, providing a scientific basis for floodplain wetland carbon accounting and possible future studies, such as the linkage between wetland AGB and the great water level fluctuations.

Rongrong Wan, a,b,* Peng Wang, a,b Xiaolong Wang, a,b,* Xin Yao, c and Xue Dai a,b

Introduction
width, and plant diameter allow SAR to utilize the backscattering ratio for predicting vegetation biomass. Microwave technology not only interacts with the canopy but also can penetrate into vegetation stalks. 5 SAR can obtain the surface and body scattering information of vegetation due to its capability to penetrate clouds and vegetation. Thus, SAR is suitable for inverting vegetation parameters with evident structural characteristics, such as tall trees in forests. However, the saturation problem of backscattering coefficients limits SAR application in wetlands. Therefore, SAR is rarely used for the biomass inversion of wetland vegetation. Recently, the advantages of acquiring structural information about objects on the ground have made LiDAR more attractive to vegetation biomass inversion researchers. The earliest application of LiDAR was in measuring forest biomass. Then, LiDAR was utilized successfully for wetland biomass, especially mangrove forests in coastal zones. [6][7][8][9] However, LiDAR applications are restricted by weak penetrability, low saturation in plants found in high-density canopies, and deficiencies in spectral information, particularly with tall and lush trees, mangrove forests, and low-lying herbs in freshwater wetlands. Optical remote sensing technology is a common and well-tested method in terms of data availability, processing simplicity, and extensive applications over a large region. However, high-resolution remotely sensed images and LiDAR or SAR data are often restricted by their limited spatial and temporal coverage. Accordingly, numerous researchers prefer medium-resolution satellite images for measuring AGB over long time periods and over large areas.
10 Landsat offers a trade-off among spatial, temporal, and spectral resolution; thus, it is a good option for large-scale AGB modeling. [11][12][13][14] Generally, the optical remote sensing method utilizes the spectral characteristics of plants, particularly the large difference in reflectance between the red and near-infrared (NIR) bands, to construct a vegetation index (VI) whose relationship with biomass can be analyzed. Normalized difference vegetation index (NDVI) is the most commonly used VI; [15][16][17][18] however, it is strongly affected by the atmosphere and soil composition and saturates heavily in dense vegetation. Thus, researchers have proposed modified VIs, including soil-adjusted vegetation index (SAVI), 19 modified SAVI, 20 and enhanced vegetation index (EVI). 21 These indices are widely used and frequently combined to overcome the effects of background noise and improve the accuracy of biomass evaluation. [22][23][24][25]
Remote sensing methods for modeling AGB can be divided into two groups: statistical and physical models. Physical models, such as SAIL, Kuusk, and PROSPECT, have been used to establish the link between the vegetation spectral reflectance (leaf or canopy) and biomass by analyzing the entire radiation transmission process of light inside and outside the vegetation. These models are useful under certain circumstances; however, their complexity, the overabundance of parameters, and the uncertainty of measurements limit their application in large-scale regions. 26 For statistical models, one or more VIs are traditionally adopted as the predictors for establishing a linear, exponential, logarithmic, or power model. 27,28 The development of machine learning and artificial intelligence has allowed for improved predictive accuracy. Such techniques can produce complex nonlinear mappings through advanced learning strategies that exploit the information contained in a set of reference samples. 
Another advantage is that no assumptions have to be formulated about data distribution. Thus, nonlinear machine learning methods are often regarded as distribution-free. Given this property, the retrieval process can integrate data from different sources with poorly defined (or unknown) probability density functions that relate well to the target variable. Regardless of the approach, either empirical or physical models, the high complexity and nonlinearity of retrieval problems require the development and usage of more advanced methods. The artificial neural network (ANN) 29 is a commonly used technique in the field of geo-/biophysical variable retrieval. 30,31 Owing to its relatively higher accuracy, the ANN is more effective for estimating wetland AGB than the traditional linear model. 32,33 Numerous studies [34][35][36] have shown that the ANN model exhibits better accuracy, stability, and computational speed than the other investigated strategies. Support vector regression (SVR) 37 has also become popular in the last few years and is particularly effective in the field of wetland AGB retrieval. 30,38 Study results reveal the promising features of this method, such as its good intrinsic generalization capability and its capacity for overcoming noise interference when reference samples are limited. Ensemble methods, such as random forest (RF), 39 have successfully been used to enhance predictive accuracy in the ecology field. 40,41 The RF algorithm is a nonparametric statistical technique that can synthesize regression or classification functions on the basis of discrete or continuous datasets. The RF can also handle complex relationships between predictors and noise in large amounts of data, and it weighs the importance of each input variable. In the remote sensing field, the RF has been widely applied in various domains as a classification algorithm. [42][43][44] Mutanga et al. 
45 investigated the capability of RF to model AGB in iSimangaliso Wetland Park on the basis of WorldView-2 images. Recently, Byrd et al. 46 generated a remote sensing model based on RF to model tidal marsh AGB and carbon stocks in the United States. However, studies that compare the effectiveness of several machine learning algorithms in modeling and mapping AGB based on Landsat images are limited, particularly at the vegetation landscape scale for seasonal lake wetlands in floodplain areas. Such a comparison is therefore needed.
In this study, we evaluated the effectiveness of linear regression (LR), back-propagation neural network (BPNN), SVR, and RF models in estimating wetland AGB in the Poyang Lake National Nature Reserve (PLNNR). The objectives of this study are as follows: (1) to explore which machine learning algorithm and spectral features yield the most accurate AGB estimates, (2) to quantitatively estimate the AGB and its distribution and characteristics in the PLNNR, and (3) to evaluate the importance of each input band variable derived from Landsat images for predicting AGB.

Study Area
Poyang Lake, the largest freshwater lake in China, is located at 115°47′ to 116°45′E and 28°22′ to 29°45′N on the southern bank of the Yangtze River (Fig. 1). The lake is fed primarily by five tributaries (Ganjiang, Fuhe, Xinjiang, Raohe, and Xiushui Rivers) and is connected to the Yangtze River at Hukou. Poyang Lake has a subtropical monsoon climate with an average annual temperature of 17.6°C and a mean annual precipitation of 1450 to 1550 mm, with the rainy season generally occurring in summer. Interactions among the hydrology, soil, and plants of Poyang Lake have formed a unique wetland ecosystem, which provides essential functions, such as water supply, floodwater storage, and biodiversity maintenance. Poyang Lake wetlands are home to 102 species of aquatic plants from 38 families and to 122 species of fish from 23 families. More than 280 bird species have also been recorded, representing 12 genera and 51 families, including 50 rare species.
The PLNNR is located northwest of Poyang Lake, at the intersection of Ganjiang and Xiushui Rivers (Fig. 1). The PLNNR, with an area of 224 km², was established in 1988 to preserve wintering birds. 47 Twenty-three threatened species in the International Union for Conservation of Nature and Natural Resources red list 48 were found in the PLNNR, and approximately 95% of the entire population of critically endangered Siberian cranes (Grus leucogeranus Pallas), nearly 80% of endangered Oriental storks (Ciconia boyciana), and over 70% of vulnerable white-naped cranes (Grus vipio) wintered in the PLNNR. 49,50 For these reasons, Poyang Lake was one of the first wetlands to be included in the Ramsar Convention's List of Wetlands of International Importance. 51 Complex inflow, outflow, and backflow patterns lead to large seasonal water level fluctuations. 52 The plant distribution of the PLNNR wetland, accompanied by the fluctuating water level, is characterized by a typical concentric pattern along the elevation gradient from the lake to the shoreland. 53 Four main types of plants are abundant in the PLNNR wetlands, namely, Carex cinerascen, Phalaris arundinacea, Miscanthus sacchariflorus, and Artemisia selengensis. They form three belts, namely, bulrushes (Miscanthus sacchariflorus or Phragmites australis communities), sedges (Carex cinerascen or Artemisia selengensis communities), and sparse emergent vegetation (Phalaris arundinacea communities), which occur naturally along a moisture gradient from the higher lands to the lake shoreline. Wetland vegetation in the PLNNR is distributed in different types of bottomlands, which are often inundated during flood season. These bottomlands include the littoral land of the main lake, inflow river delta, sublakes detached from the main lake during autumn and winter, and small islands seasonally submerged during summer. 
From October to December, water levels are low, thereby exposing the areas of these vegetation communities in Poyang Lake. At this time of the year, emergent vegetation (e.g., Miscanthus sacchariflorus) experiences a heading stage and subsequently withers and dies; sedges (e.g., Carex cinerascen), which have a long growing period, continue blooming; and sparse emergent vegetation (e.g., Phalaris arundinacea and Artemisia selengensis) begins to wither and die. During this period, the spectral characteristics and AGB of these vegetation communities do not change considerably. Thus, this phenology calls for field work that covers all four dominant vegetation communities in the PLNNR.

Field Surveying and Data Collection
The field campaign was conducted from November 23 to 30, 2016. We selected four typical bottomlands that are representative of the PLNNR wetland for sampling, with a total of 94 sampling points that covered the four main vegetation communities in Poyang Lake wetland (Fig. 1 and Table 1). Then, in view of the concentric pattern of vegetation communities along the elevation gradient, a predetermined fixed number of 1 m × 1 m sample plots at each bottomland were created from the shoreline to the relatively higher land, which floods do not reach. The interval between plots is 50 to 120 m (in accordance with the distribution of slope and vegetation belts in the sites) to cover all the main vegetation communities at different elevations in various types of bottomlands (Fig. 1). Once the sample plot was located, we recorded its geographic coordinates and elevation through GPS (Trimble) with an accuracy of 0 to 0.20 m for position and 0.10 m for elevation. Then, all plant types in the plot were identified, recorded, and excavated. All dead materials were removed from clipped plants, and fresh biomass was measured immediately using a digital scale. Then, the average fresh AGB per plot was calculated from these measurements (n = 3). The Landsat 8 image, which was acquired from USGS, 54 with 11 bands (bands 1 to 7 and 9 to 11 with a spatial resolution of 30 m and band 8 with a spatial resolution of 15 m) from December 16, 2016, covering the PLNNR, was used to complete the study.

Data Preprocessing and Preparation
Image preprocessing, executed in ENVI 5.2, included geometric, radiometric, and atmospheric corrections and spatial subsetting. On the basis of the georeferenced images of Poyang Lake, the root-mean-square errors (RMSEs) of the image registration were kept at <0.3 pixel for the seven images. The FLAASH atmospheric correction module of ENVI 5.2 was used for the atmospheric correction. NDVI, SAVI, EVI, and the second band from the Kauth-Thomas transformation 55 were added to the Landsat 8 OLI image with 7 multispectral bands by layer stacking to create an 11-band layer-stacked Landsat image. Then, the layer-stacked image from December 16, 2016, was used to extract 94 spectral sampling points, based on the geographical coordinates recorded by GPS, in preparation for image classification. The proposed methods are briefly explained in the flowchart (Fig. 2).

AGB Model Methods
We selected eight variables, namely, NDVI, SAVI, EVI, B3 (red band), B4 (NIR band), greenness (the second band from the Kauth-Thomas transformation), B6 (SWIR1, short-wave infrared 1 band), and B7 (SWIR2, short-wave infrared 2 band), as inputs. The descriptions and computational formulas of the VIs in this study are shown in Table 2. The variable values of the 94 sampling points were extracted in accordance with their geographic coordinates. The effectiveness of LR, BPNN, SVR, and RF models in estimating AGB in the PLNNR was evaluated. Then, we utilized the trained model with the highest testing accuracy to map AGB in the PLNNR. We used spectral features as predictors to improve the accuracy of the models.
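Table 2 is not reproduced in this text, so the following is only a minimal sketch of how three of the indices could be computed, assuming the widely used textbook definitions of NDVI, SAVI (soil factor L = 0.5), and EVI and assuming surface reflectance scaled to 0-1. The function name `vegetation_indices` and the sample reflectance values are hypothetical; the Kauth-Thomas greenness band is omitted because its coefficients are not given here.

```python
import numpy as np


def vegetation_indices(blue, red, nir, soil_factor=0.5):
    """Return (NDVI, SAVI, EVI) from surface-reflectance arrays on a 0-1 scale.

    The formulas are the commonly cited definitions, assumed here because
    Table 2 of the paper is not available in this text.
    """
    blue, red, nir = (np.asarray(b, dtype=float) for b in (blue, red, nir))
    ndvi = (nir - red) / (nir + red)
    savi = (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)
    evi = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)
    return ndvi, savi, evi


# Hypothetical reflectance values for a single vegetated pixel.
ndvi, savi, evi = vegetation_indices(blue=[0.04], red=[0.05], nir=[0.35])
print(ndvi[0], savi[0], evi[0])
```

The same function works on whole image bands, because the arithmetic is applied element by element to NumPy arrays of any shape.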

RF model
The RF model is an ensemble learning technique developed by Breiman to improve the classification and regression tree method by combining a large set of decision trees. 39 In RF regression, each tree is grown by a deterministic algorithm from a random sample of the training dataset and a random subset of the variables. Three parameters must be optimized in this model: (1) n_tree, the number of regression trees, each grown on a bootstrap sample of the observations, with a default value of 500 trees; (2) m_try, the number of predictors tested at each node, with a default value of one-third of the total number of variables; and (3) node size, the minimum size of the terminal nodes of the trees. To determine the n_tree and m_try values that most accurately predict the wetland biomass, the two parameters were optimized on the basis of the RMSE. In addition, because the importance of each predictor is measured by the increase in mean squared error and in node purity, we excluded the predictors one at a time from the RF models. In our study, we tried 500 parameter sets, including n_tree, m_try, and node size, for the RF model and selected the one with the highest accuracy.
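The parameter search described above could be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the authors' script: the data are synthetic stand-ins for the 94 samples and 8 predictors, the candidate values are hypothetical, and n_tree, m_try, and node size are mapped to the scikit-learn parameters `n_estimators`, `max_features`, and `min_samples_leaf`. Only 5 parameter sets are drawn to keep the run short, whereas the text reports 500.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the 94 field samples and 8 predictors (assumption).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(94, 8))
y = 1.5 * X[:, 4] + 0.3 * X[:, 6] + rng.normal(0.0, 0.1, size=94)

# Hypothetical candidate values for n_tree, m_try, and node size.
param_distributions = {
    "n_estimators": [50, 100, 200, 500],
    "max_features": [1, 2, 3, 4, 5, 6, 7, 8],
    "min_samples_leaf": [1, 2, 3, 5, 8],
}

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions,
    n_iter=5,  # the paper reports 500 parameter sets; reduced here for speed
    scoring="neg_root_mean_squared_error",  # select on RMSE
    cv=5,
    random_state=0,
)
search.fit(X, y)

best_rmse = -search.best_score_
print(search.best_params_, round(best_rmse, 3))
```

Random sampling of the parameter space is used here because it lets the number of tried sets be fixed in advance; an exhaustive grid would be an equally valid reading of the text.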

SVR model
Support vector machine (SVM) is a supervised nonparametric statistical learning technique that makes no assumptions about data distribution. When the SVM is applied to regression problems, it is generally referred to as SVR. Two advantages of this technique are its unique and globally optimal architectures and its readily accepted results. Nonlinear SVR maps input data X to a high-dimensional feature space using a kernel function. For our study, we utilized the commonly used radial basis function (RBF) kernel, because it is associated with fewer numerical difficulties than other kernels. Given the training data {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, where x_i and y_i are the input and output data, respectively, we used ε-SVR to determine the function f(x) that deviates by at most ε from the observed outputs y_i and that is as flat as possible. The RBF kernel formula is as follows:

$$K(x_i, x_j) = \exp\left(-\gamma \|x_i - x_j\|^2\right), \quad (1)$$

where γ is a parameter and vector x_j denotes the training data input. The unknown vector w is determined to minimize the function:

$$\min_{w} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \max\left(0, \, |y_i - f(x_i)| - \varepsilon\right), \quad (2)$$

where cost (C) > 0 controls the trade-off between the flatness of f(x) and the extent to which deviations greater than ε are tolerated. Further details are provided by Awad and Khanna. 56 We adopted the most commonly used method, in which γ, C, and ε are calibrated within a certain range by a grid search. Similarly, 500 parameter sets were tried, and the set with the best performance was selected.
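As an illustration of the grid search over γ, C, and ε, the sketch below uses scikit-learn's `SVR` with an RBF kernel. The data are synthetic, the grid values are hypothetical, and the `StandardScaler` step is an added assumption (the text does not say whether predictors were scaled). The helper `rbf_kernel_value` simply evaluates the RBF kernel for two vectors.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def rbf_kernel_value(x_i, x_j, gamma):
    """RBF kernel: exp(-gamma * ||x_i - x_j||^2)."""
    diff = np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))


# Synthetic stand-in for the 94 field samples and 8 predictors (assumption).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(94, 8))
y = 1.5 * X[:, 4] + 0.3 * X[:, 6] + rng.normal(0.0, 0.1, size=94)

# Scaling is an assumption of this sketch, not a step reported in the text.
pipeline = make_pipeline(StandardScaler(), SVR(kernel="rbf"))

# Hypothetical grid; the paper reports 500 tried parameter sets.
param_grid = {
    "svr__C": [0.1, 1.0, 10.0],
    "svr__gamma": [0.01, 0.1, 1.0],
    "svr__epsilon": [0.01, 0.05, 0.1],
}
search = GridSearchCV(
    pipeline, param_grid, scoring="neg_root_mean_squared_error", cv=5
)
search.fit(X, y)
print(search.best_params_, round(-search.best_score_, 3))
```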

BPNN model
BPNN has a good generalization capability, 57,58 and it consists of input, hidden, and output layers, including their nodes and activation functions. The main mathematical expression of BPNN is as follows:

$$y_j = f\left(\sum_{i=1}^{n} w_{ji} x_i\right), \quad (3)$$

where x_i is the ith node value of the previous layer, y_j denotes the jth node value of the present layer, w_ji represents the weighted value connecting x_i and y_j, n refers to the number of nodes in the previous layer, and f indicates the activation function. The BPNN model is discussed in further detail by Buscema. 59 The Levenberg-Marquardt algorithm was used to determine the weighting and bias matrices for each iteration. We selected a bagging method (n_estimators: 400, max_samples: 0.2) to ensure the stability and robustness of the trained model. Figure 3 shows the structures of the three models for estimating the AGB of Poyang Lake.
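A bagged neural network with the reported settings (n_estimators = 400, max_samples = 0.2) might look like the sketch below. Several points are assumptions: the data are synthetic, the hidden-layer size is hypothetical, predictors are standardized, and the `lbfgs` solver stands in for Levenberg-Marquardt, which scikit-learn's `MLPRegressor` does not provide. The helper `layer_forward` evaluates the single-layer expression y_j = f(Σ w_ji x_i).

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def layer_forward(x_prev, weights, activation=np.tanh):
    """One layer: y_j = f(sum_i w_ji * x_i), with weights of shape (j, i)."""
    return activation(np.asarray(weights, float) @ np.asarray(x_prev, float))


def build_bagged_bpnn(n_estimators=400, max_samples=0.2, random_state=0):
    """Bagging ensemble of small multilayer perceptrons.

    The 'lbfgs' solver and the 10-node hidden layer are assumptions of this
    sketch; they are not taken from the paper.
    """
    base = make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(10,), solver="lbfgs",
                     max_iter=500, random_state=random_state),
    )
    return BaggingRegressor(base, n_estimators=n_estimators,
                            max_samples=max_samples, random_state=random_state)


# Synthetic stand-in for the 94 field samples and 8 predictors (assumption).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(94, 8))
y = 1.5 * X[:, 4] + 0.3 * X[:, 6] + rng.normal(0.0, 0.1, size=94)

# 20 members keep the demo fast; the text reports 400.
model = build_bagged_bpnn(n_estimators=20)
model.fit(X, y)
pred = model.predict(X[:5])
print(pred)
```

Each ensemble member sees only 20% of the samples, so individual networks are weak, but averaging many of them damps the variance that a single small-sample network would show.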

Accuracy assessment
In this study, we implemented all four modeling methods with the Python package scikit-learn. 60,61 Because we do not have enough sample points for every year, we divided the 94 sample points from the field survey in 2016 into two parts: training (80%) and testing (20%) datasets. We used three criteria, namely, RMSE, coefficient of determination (R²), and mean absolute error (MAE), to evaluate the performance of these models in predicting AGB. RMSE [Eq. (4)] is a standard metric for measuring the discrepancies between the simulated and actual AGB values; however, it is easily influenced by outliers. 62 Therefore, MAE [Eq. (5)] is suggested to be used with RMSE for determining the variations of errors in the model. 63 R² [Eq. (6)] is utilized to determine the degree of linear agreement between the predicted and observed AGB values. RMSE and MAE values close to 0 and an R² value close to 1 indicate that the model is an accurate predictor:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{is} - y_{it}\right)^2}, \quad (4)$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_{is} - y_{it}\right|, \quad (5)$$

where y_is is the ith simulated AGB value, y_it denotes the ith real AGB value among the tested sample points, ȳ_is represents the mean simulated AGB for all tested sample points, and n indicates the size of tested samples. The RMSE, R², and MAE values in Tables 3 and 5 are averages over fivefold cross-validation.
Table 3 presents the values of the three criteria for the four models. Among the 76 sample points from the training dataset, the SVR had the lowest RMSE (0.25 kg m⁻²) and the highest R² (0.84), and it also performed best in MAE (0.20 kg m⁻²), which was considerably lower than that of the other three models. RF had the second-best performance in the training dataset: RMSE (0.30 kg m⁻²), R² (0.71), and MAE (0.31 kg m⁻²). The BPNN and LR were similar in the magnitudes of RMSE, R², and MAE. For the 18 sample … An overfitting problem is readily apparent in the SVR. 
Although the predictive capabilities of BPNN and LR in the training dataset were almost identical, the R² of BPNN (0.59) is considerably higher than that of LR (0.53) in the testing dataset, indicating that BPNN had a relatively better generalization capability in this study. Figure 4 shows that the deviations between the simulated and actual values of RF were relatively evenly distributed compared with those of the other models. In the scatter plots, the predictions of the BPNN and LR were more dispersed around the fitting line than those of the RF and SVR, indicating that these two models yielded less stable predictions. The least accurate model was the LR, with the highest RMSE of 0.31 kg m⁻² and the lowest R² of 0.53 in the testing dataset. The generalization capabilities of the three machine learning models did not differ discernibly. Thus, images with higher spatial and spectral resolution might be needed to improve modeling accuracy. We concluded that RF was a slightly better model for predicting wetland AGB in the PLNNR than the other models.
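The three accuracy criteria and the 76/18 split can be illustrated with the short sketch below. The data are synthetic and the helper name `accuracy_report` is hypothetical. Note that scikit-learn's `r2_score` uses the mean of the observed values in its denominator, which may differ from the definition used in the paper.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def accuracy_report(y_true, y_pred):
    """Return RMSE, MAE, and R2 for observed and simulated values."""
    rmse = float(np.sqrt(mean_squared_error(y_true, y_pred)))
    mae = float(mean_absolute_error(y_true, y_pred))
    r2 = float(r2_score(y_true, y_pred))  # scikit-learn's definition of R2
    return {"RMSE": rmse, "MAE": mae, "R2": r2}


# Synthetic stand-in for the 94 field samples and 8 predictors (assumption).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(94, 8))
y = 1.5 * X[:, 4] + 0.3 * X[:, 6] + rng.normal(0.0, 0.1, size=94)

# An integer test_size reproduces the 76 training / 18 testing points.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=18, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
train_scores = accuracy_report(y_train, model.predict(X_train))
test_scores = accuracy_report(y_test, model.predict(X_test))
print(train_scores)
print(test_scores)
```

Comparing the training and testing reports side by side is the simplest way to spot the kind of overfitting noted for the SVR, where training error is much lower than testing error.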

Predicting AGB in the PLNNR
We utilized the most accurate model, namely, RF, for exploring the AGB distribution in the PLNNR (Fig. 5). The maps show that the AGB density is between 0 and 1.973 kg m⁻². On the whole, higher-than-average AGB values occurred in the north part, including the Ganjiang River delta and Banghu sublake, whereas the south part, including bottomlands at the Dahuchi sublake, experienced relatively low AGB values.
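Mapping AGB with a trained model amounts to predicting one value per pixel of the layer-stacked image. The sketch below shows one way to do this with NumPy reshaping. The function name `predict_agb_map`, the small synthetic raster, and the validity mask (for example, open water) are all hypothetical and are not taken from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def predict_agb_map(model, stack, valid_mask=None):
    """Apply a fitted regressor to a (bands, rows, cols) predictor stack."""
    n_bands, n_rows, n_cols = stack.shape
    # One row per pixel, one column per predictor band.
    pixels = stack.reshape(n_bands, -1).T
    agb = model.predict(pixels).reshape(n_rows, n_cols)
    if valid_mask is not None:
        agb = np.where(valid_mask, agb, np.nan)  # blank out masked pixels
    return agb


# Synthetic training samples and a tiny synthetic 8-band raster (assumptions).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(94, 8))
y = 1.5 * X[:, 4] + 0.3 * X[:, 6] + rng.normal(0.0, 0.1, size=94)
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

stack = rng.uniform(0.0, 1.0, size=(8, 6, 7))
valid_mask = np.ones((6, 7), dtype=bool)
valid_mask[0, 0] = False  # hypothetical non-vegetated pixel

agb_map = predict_agb_map(rf, stack, valid_mask)
print(np.nanmin(agb_map), np.nanmax(agb_map))
```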

Accuracy and Uncertainty of the Study
This study is the first to develop a landscape-scale remote sensing model of AGB for seasonal lake wetland in floodplain areas on the basis of machine learning algorithms and Landsat images. We used RMSE to compare the predictive performance of the RF model with that of other models (Table 4). In our remote sensing analysis of wetland biomass, simple LR had the worst simulation performance, whereas the more advanced machine learning algorithms performed better with regard to RMSE. In our study, the RF model had an RMSE of 0.21 kg m⁻² in the testing dataset, which is considerably lower than the mean level of ∼0.3 to 0.5 kg m⁻². Researchers might select different input variables, and the final simulated results are also affected by the randomness of sample points and the species types in wetlands. Nevertheless, the RF model is useful and effective for predicting AGB in wetlands.
The time gap between image acquisition and field sampling may introduce error into the AGB inversion. For example, the December 16 scene is the only appropriate Landsat image close to our field survey time in 2016. The atmospheric conditions at imaging time can affect the gray value of each pixel, so that the same vegetation type on the ground yields different gray values at different times. Thus, the generalization capability of the trained model will be reduced.

Implications of the Input Variables for Modeling AGB
The rank of a feature used as a decision node in a tree can be utilized to assess the relative importance of that feature with respect to the predictability of the target variable. Features at the top of the tree greatly influence the final prediction decision of a large fraction of the input samples. Thus, the expected fraction of the samples that they contribute to can be used to estimate the relative importance of the features. By averaging the expected activity rates over several randomized trees, one can reduce the variance of such an estimate and use it for feature selection. Fig. 6 shows the results of applying RF with least squares loss and 500 base learners to the AGB in Poyang Lake wetland. Plot (a) shows the training and testing errors at each iteration. Plot (b) shows the feature importance, which can be obtained via the feature importance property. Details on how the RF determined the importance of variables are discussed by Genuer et al. 68 When the number of trees is close to 450 or more, the test dataset error is ∼0.2, which is the minimum value. Thus, we select 450 as the best N of the RF model in this study. Furthermore, the top four most important features were NIR, SWIR1, EVI, and red band. NIR performed best among all the variables, followed by SWIR1. NDVI was not the best predictive factor for estimating AGB in the PLNNR, possibly because it is considerably more affected by Poyang Lake wetland's complex environmental background conditions than the other modified VIs.

Fig. 6 Relative importance of all input variables identified using the RF model in this study.
Wan et al.: Modeling wetland aboveground biomass in the Poyang Lake National Nature Reserve. . . Journal of Applied Remote Sensing 046029-10 Oct-Dec 2018 • Vol. 12 (4)
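The importance ranking discussed above can be obtained from a fitted scikit-learn forest through its `feature_importances_` attribute (impurity-based importance, normalized to sum to 1). The sketch below is illustrative only: the data are synthetic and constructed so that the NIR column dominates, and the feature order is assumed to follow the list of eight input variables given in the methods.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Assumed column order, following the eight input variables listed earlier.
feature_names = ["NDVI", "SAVI", "EVI", "Red", "NIR",
                 "Greenness", "SWIR1", "SWIR2"]

# Synthetic data in which NIR (column 4) drives the response (assumption).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(94, 8))
y = 2.0 * X[:, 4] + 0.5 * X[:, 6] + rng.normal(0.0, 0.05, size=94)

# 450 trees, the value selected in the text.
rf = RandomForestRegressor(n_estimators=450, random_state=0).fit(X, y)

importances = rf.feature_importances_
ranking = sorted(zip(feature_names, importances),
                 key=lambda pair: pair[1], reverse=True)
for name, score in ranking:
    print(f"{name:10s} {score:.3f}")
```

Impurity-based importance can be split unpredictably among strongly correlated predictors such as NDVI, SAVI, and EVI, so a ranking like this is best read alongside an ablation experiment.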
EVI, SAVI, NDVI, NIR, and red band are undoubtedly highly correlated. In general, however, models are more effective for estimation if their input variables are independent of each other. Thus, we conducted an experiment to determine whether using the NIR alone as the input variable of these four models would produce more accurate results for estimating AGB in the PLNNR. Table 5 shows the specific performance. The RMSE, R², and MAE values changed remarkably in the training and testing datasets. The RMSE values for the training dataset increased by 0.11 kg m⁻² on average (RF, 0.15; SVR, 0.12; BPNN, 0.07; and LR, 0.03), whereas the R² values decreased. In the testing dataset, a slight improvement was observed in the LR: the RMSE and MAE decreased by 0.01 kg m⁻² and R² increased by 0.02, indicating that LR could not cope with the collinearity of variables and could not efficiently extract useful information from the other variables. For the other three models, the RMSE values increased by more than 0.05 kg m⁻² and the R² values decreased by 0.18 on average (RF, 0.22; SVR, 0.19; and BPNN, 0.13). Therefore, we maintain that the RF and SVR had a better capability for processing more complicated information than the other methods. We concluded that placing these VIs into the models together, which may decrease the influence of the environmental background to some extent, improves the capability of the models for measuring AGB in the PLNNR. This improvement outweighed the influence of the collinearity of variables to some extent.
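The NIR-only experiment can be sketched as a comparison of cross-validated RMSE between the full predictor set and a single column. The example below uses synthetic data in which the response depends on two columns, so the single-predictor model is expected to do worse; it illustrates the procedure only and does not reproduce Table 5. The helper name `cv_rmse` and the column index of NIR are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score


def cv_rmse(model, X, y, folds=5):
    """Mean RMSE over shuffled k-fold cross-validation."""
    cv = KFold(n_splits=folds, shuffle=True, random_state=0)
    scores = cross_val_score(
        model, X, y, scoring="neg_root_mean_squared_error", cv=cv
    )
    return float(-scores.mean())


# Synthetic data: the response depends on columns 4 (NIR) and 6 (SWIR1).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(94, 8))
y = 1.5 * X[:, 4] + 0.8 * X[:, 6] + rng.normal(0.0, 0.05, size=94)

rmse_all = cv_rmse(LinearRegression(), X, y)
rmse_nir = cv_rmse(LinearRegression(), X[:, [4]], y)  # NIR column only
print(round(rmse_all, 3), round(rmse_nir, 3))
```

The same helper accepts any scikit-learn regressor, so the comparison can be repeated for RF, SVR, and a neural network by swapping the first argument.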

Limitations of Predicting AGB Using Optical Remote Sensing Data such as Landsat
The complexity of species composition and the density of vegetation in wetland areas present a huge challenge for remote sensing. 1 In fact, VIs calculated from the NIR and red bands of medium-spatial-resolution (10 to 100 m) broadband multispectral sensors rapidly approach a saturation level, and AGB estimation is limited by the asymmetrical nature of the relationship between the AGB and these VIs. Therefore, the RF model is likely to overestimate biomass at low observed values and underestimate biomass at high observed values, which may explain why errors are associated with high biomass values. Despite these limitations, our findings showed that the Landsat NIR band was sensitive to the AGB in the PLNNR. Recent efforts have been geared toward using narrow band VIs from hyperspectral data or WorldView-2 (eight bands including red edge band and 2-m spatial resolution) to estimate high canopy density biomass. 15,23,69,70 Results from these studies have shown that modified VIs calculated from the red edge and NIR shoulder domains can more accurately estimate biomass at a full canopy cover than the standard red/NIR indices. 2,23 A reasonable explanation for this finding is that the indices calculated from the red edge are more sensitive to vegetation properties, such as canopy biomass and chlorophyll content, than those from other regions of the electromagnetic spectrum. A slight change in vegetation properties could result in a shift in the red edge curve, and NIR can minimize the influence of the atmospheric and water absorption as well as the soil background. However, the use of fine spatial and spectral resolution sensors (<5 m and >100 bands) for estimating AGB is limited by the cost, availability, and complexity of processing high-dimensional data. 70,71 This technology may be widely used in the future when costs decrease.

Conclusions
The quantitative estimation of wetland AGB is crucial for evaluating the health and conservation of this vital ecosystem. Traditional methods do not meet the requirements for rapid, accurate, and effective observation of a seasonal and changeable wetland such as Poyang Lake wetland. Therefore, numerous researchers are compelled to conduct an overall estimation of wetland AGB without considering the AGB information of different wetland vegetation communities. Landsat is a good remote sensing alternative owing to its relatively high spatial, spectral, and temporal resolution. In this study, we compared the performances of three machine learning algorithms, namely, RF, SVR, and BPNN, with LR in estimating the AGB in the PLNNR. The RF model with 0.25 kg m⁻² RMSE performed better than the BPNN (0.29 kg m⁻²), SVR (0.27 kg m⁻²), and LR (0.31 kg m⁻²) models in the testing dataset. Furthermore, the AGB density in the PLNNR was found to be between 0 and 1.973 kg m⁻² when the trained RF model was used to map the AGB distribution.
Our results indicated that RF had a relatively better generalization capability than LR, BPNN, and SVR in predicting AGB in the PLNNR. By considering the variable importance selection of the RF model, we regarded NIR, SWIR1, EVI, and red band as the most critical variables for estimating AGB in the PLNNR. Moreover, we found that the introduction of modified VIs can greatly improve the estimation accuracy, as opposed to only using NIR as the input variable. Furthermore, images with high spatial and spectral resolution are essential for improving AGB modeling precision and overcoming the saturation problem. This study presents an effective and operational RF model that estimates seasonal lake wetland AGB from Landsat-8 data, thereby providing a scientific basis for floodplain wetland carbon accounting.