Biomass estimation of wetland vegetation in Poyang Lake area using ENVISAT advanced synthetic aperture radar data

Abstract Biomass estimation of wetlands plays a role in understanding dynamic changes of the wetland ecosystem. Poyang Lake is the largest freshwater lake in China, with an area of about 3000 km². The lake's wetland ecosystem has a significant impact on China's environmental change. Synthetic aperture radar (SAR) data are a good choice for biomass estimation during rainy and dry seasons in this region. In this paper, we discuss the use of neural network algorithms (NNAs) to retrieve wetland biomass from alternating-polarization ENVISAT advanced synthetic aperture radar (ASAR) data. Two field campaigns were carried out, coinciding with the satellite overpasses, during the hydrological cycle from April to November. A radiative transfer model of forest canopy, the Michigan Microwave Canopy Scattering (MIMICS) model, was modified to fit herbaceous wetland ecosystems. With both ASAR and MIMICS simulations as input data, the NNA-estimated biomass was validated with ground-measured data. This study indicates the capability of NNA combined with a modified MIMICS model to retrieve wetland biomass from SAR imagery. Finally, the overall biomass of Poyang Lake wetland vegetation has been estimated. It reached a level of 1.09 × 10⁹, 1.86 × 10⁸, and 9.87 × 10⁸ kg in April, July, and November 2007, respectively.


Introduction
Wetlands are an important component of global ecosystems because of their role in maintaining environmental quality and biodiversity. Wetland biomass is a key index of the health of the wetland ecosystem and provides quantitative information for understanding its ecological and environmental functions [1]. Conventional methods of in situ estimation are often time consuming, labor intensive, and difficult to implement, especially in remote areas. Isolated plot measurements cannot provide the spatial distribution of biomass over large areas. The advantages of remote-sensing techniques, such as repeated data collection, a synoptic view, a digital format that allows fast processing of large quantities of data, and high correlations between spectral bands and vegetation parameters, make them an efficient source for large-area biomass estimation, especially in areas of difficult access. Therefore, remote sensing-based biomass estimation has increasingly attracted scientific attention [2]. Poyang Lake is the largest freshwater lake in China, with an area of about 3000 km². The lake's wetland ecosystem has a significant impact on China's environmental change. In previous studies of this region, biomass estimation has been conducted using traditional optical remote-sensing imagery such as Landsat TM/ETM+. Li and Liu [3] estimated the wetland biomass in April 2000 using Landsat ETM+ data. With the vegetation index extracted from Landsat TM data and field measurements, Li et al. [4] also conducted biomass estimation in this region based on nonlinear regression analysis. However, optical remote-sensing data are often limited by heavy cloud cover in the rainy season. With their all-weather, cloud-penetrating capability, synthetic aperture radar (SAR) data have become a good choice for biomass estimation during both rainy and dry seasons in this region.
In recent years, remote-sensing research has led to the development of methods for retrieving wetland biomass from radar backscatter. Several studies (e.g., Refs. [5][6][7][8]) have exploited the sensitivity of the radar signal to biomass parameters of vegetation canopies over a water layer, such as mangrove and rice. The ability of satellite SAR to map wetland biomass was demonstrated with C-band ERS-1/2 data [9,10]. A combination of Radarsat and JERS-1 imagery was used to understand the saturation point in a logarithmic relationship between backscattering coefficients and biomass in the Amazon floodplain [11]. A major problem in wetland biomass inversion from SAR data is the influence of other environmental variables such as water content, vegetation height, and water level [12]. To reduce these effects, multifrequency and multiple datasets have mainly been considered [13]. Most studies on retrieving biomass have focused on the implementation of linear and nonlinear regression models [14][15][16]. However, the interaction between the SAR signal and vegetated surfaces is complex and nonlinear [17]. A semi-empirical regression model based on ground measurements cannot express this relationship sufficiently. Comparison between neural network algorithms (NNAs) and both linear and nonlinear regression algorithms highlights the overall superior performance of NNA using SAR data in both P- and L-band [12].
The aim of this study is to estimate wetland biomass in the Poyang Lake area using alternating-polarization ENVISAT advanced synthetic aperture radar (ASAR) data. An NNA is combined with a canopy-scattering model to establish the relationship between the backscattering values and biomass instead of using linear or nonlinear regression models. In previous studies of biomass estimation based on neural networks, ground-truth measurements were mostly used as input units, and the backscattering coefficients extracted from SAR images were used as output units in the training process [18]. However, the training data relied completely on the ground measurements, which become unavailable when the study area is hard to access or ground data are hard to obtain. The inversion accuracy is then mostly determined by the measurement accuracy. In this paper, biomass is retrieved with NNA in three temporal stages from April to November. This paper is structured in six sections. The next section describes the test site and dataset. Methods are presented in Sec. 3. In Sec. 4, the results are analyzed and the validation of the biomass estimates against ground data is presented. Section 5 discusses the inversion results and their effects. The final section summarizes the results.

Test Site
The test site is located in the Poyang Lake wetland, Jiangxi province, China. It spans 115°47′ to 116°45′E and 28°22′ to 29°15′N (Fig. 1). The climate is a subtropical, humid monsoon climate with 1620 mm mean annual precipitation and an annual average temperature of about 17°C. Poyang Lake exhibits large interannual variations in water level. In summer, it is the largest freshwater body in China and extends up to 3500 km² by the end of the rainy, wet season (June to September). In the dry season (November to April), Poyang Lake can be less than 1000 km² in extent, with only several wandering water courses remaining. In 2007, the fluctuation of the water level was up to 8.37 m (Fig. 2). In the dry season, wetland vegetation emerges above water and starts to grow rapidly from early spring, with the aboveground biomass reaching the highest level in April [19]. In the wet season, wetland vegetation is flooded and hardly grows except at the lakesides with higher ground level. Therefore aboveground biomass is relatively low. In November, the water recedes and vegetation starts to grow again. In general, the biomass dynamics present positive correlations with variations of water level.
The predominant vegetation in Poyang Lake is carex and reed, which account for >90% of the vegetation coverage [19]. The biophysical properties of vegetation vary across the hydrological stages from April to November. In April, plants grow rapidly, with green leaves 50 to 70 cm in length. By July, plants are submerged under water. In November, they enter the senescent stage, with low water content.

SAR Data
Four scenes of ENVISAT ASAR alternating polarization (AP) precision image (PRI) mode data, with a pixel spacing of 12.5 m in both range and azimuth, were collected over this area in three periods of 2007: April, July, and November (Table 1). The ASAR images were radiometrically corrected with the BEST toolbox provided by the European Space Agency. All images were georeferenced with respect to a topographic map. The April data were composed of two different scenes (April 4 and 6, 2007), each covering part of the test site, which were mosaicked together. These two scenes have the same HH/VV polarization and incidence angle (39 to 43 deg) but different overpass directions.

Field Measurements
Field measurements of biophysical parameters of wetland vegetation standing above water were collected during two field campaigns coinciding with satellite overpasses in April and November. Both time periods are in the growing stages of wetland vegetation. Taking accessibility into account, 46 (in April) and 45 (in November) sampling sites with intersite spacings >50 m (four times the pixel size) were randomly selected over the study area. Because the ground status differed between April and November, owing to the flood from July to September (Fig. 2), few sample points have exactly the same location in the two field campaigns. Ground data, including plant water content, aboveground biomass, and plant height, were collected during each field campaign. For biomass measurements at each sampling site, we clipped the total standing biomass above water or above ground within an area of 0.5 × 0.5 m². The sites are covered with water or very wet soil. The clipped samples were weighed in situ and oven dried (at 120°C for 24 h) to calculate the wet and dry total biomass (WTB and DTB, respectively). The DTB is referred to as biomass in the following sections. The weight of plant water per unit area was calculated as the difference between WTB and DTB.
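The quadrat bookkeeping described above can be sketched as follows. This is a minimal illustration that assumes the clipped masses are recorded in grams per 0.5 × 0.5 m quadrat and that biomass is reported in g m⁻². The function names and the sample masses are hypothetical and are not taken from the field data.

```python
QUADRAT_SIDE_M = 0.5  # side length of the clipped quadrat, as stated in the text


def per_unit_area(mass_g, side_m=QUADRAT_SIDE_M):
    """Convert a mass clipped from one square quadrat (g) to g per square metre."""
    return mass_g / (side_m * side_m)


def quadrat_summary(wet_mass_g, dry_mass_g):
    """Return WTB, DTB and plant water per unit area (all in g per square metre)."""
    if dry_mass_g > wet_mass_g:
        raise ValueError("dry mass cannot exceed wet mass")
    wtb = per_unit_area(wet_mass_g)
    dtb = per_unit_area(dry_mass_g)
    # Plant water per unit area is the difference between WTB and DTB.
    return {"WTB": wtb, "DTB": dtb, "plant_water": wtb - dtb}


# Illustrative (hypothetical) masses for one sampling site.
summary = quadrat_summary(wet_mass_g=600.0, dry_mass_g=150.0)
```

With these illustrative masses, a 0.25 m² quadrat scales each mass by a factor of four.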
The plant height was measured three to five times at each sampling site using a meterstick, and the mean value was taken as the ground-truth value. According to the ground data, the DTB was correlated with plant height, with a correlation coefficient of 0.65 at the 90% confidence level (Fig. 3).

Methodology
The neural network is composed of a large number of highly interconnected processing elements (neurons) working in unison to solve complex problems. This structure makes NNA inherently suitable for solving nonlinear problems. In this study, we tested the feasibility of biomass estimation by combining a canopy-scattering model with a neural network. We simulated radar backscatter in herbaceous wetland environments with a modified canopy-scattering model and compared the simulations with ASAR images. A large set of training data was simulated from the model and fed into the NNA, with which biomass was estimated via model inversion. The modeled results were finally validated with ground measurements.

Canopy-Scattering Model and Backscatter Simulation
The Michigan Microwave Canopy Scattering (MIMICS) model [20] has been widely used for tree canopies comprising a crown layer, a trunk layer, and a rough-surface ground boundary. The model assumes that total backscatter is a linear composition of the following four scattering components: direct scattering from the vegetation canopy, backscatter from multiple-path scattering between the surface and the vegetation canopy, double-bounce trunk-ground interactions, and backscatter from the ground surface. Because the MIMICS model requires many input parameters, some assumptions must be made to use it. For ENVISAT ASAR data, the radar parameters (frequency and look angle) are known. For wetland applications in our study area, herbaceous vegetation has no woody layer but only two layers: ground surface and grass canopy. Thus, we modified the MIMICS model input files, turning off the parameters for the trunk, primary branches, and secondary branches. The simulated backscatter then included ground-surface scattering, multipath scattering between the ground surface and vegetation canopy, and canopy volume scattering. Because the study area is wetland with high soil moisture (especially between April and October), the ground is mostly flooded. The ground parameters (soil and snow) were therefore also turned off in the settings, so there is no backscatter from the ground surface. Finally, the parameters for the radar (frequency and look angle), leaf (gravimetric moisture, dry density, number density, diameter, thickness, and temperature), standing water (salinity and temperature), and crown (layer height) were used as MIMICS input parameters. Some of the parameters were treated as constants or replaced by the average of survey data, e.g., standing-water salt content, dry density of leaf material, leaf thickness, and the temperature of vegetation and standing water (Table 2).
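The linear composition of scattering components can be illustrated with a short sketch. It assumes that the components add incoherently as linear power and that switching off a layer simply removes its term from the sum. The component names and dB levels are hypothetical placeholders, not MIMICS outputs, and the code does not reproduce the MIMICS model itself.

```python
import math


def db_to_linear(sigma_db):
    """Convert a backscattering coefficient from dB to linear power units."""
    return 10.0 ** (sigma_db / 10.0)


def linear_to_db(sigma_linear):
    """Convert a backscattering coefficient from linear power units to dB."""
    return 10.0 * math.log10(sigma_linear)


def total_backscatter_db(components_db, active):
    """Sum the active scattering components in the linear domain, return dB."""
    total = sum(db_to_linear(components_db[name]) for name in active)
    return linear_to_db(total)


# Hypothetical component levels in dB, for illustration only.
components = {
    "canopy_volume": -10.0,
    "canopy_ground": -10.0,
    "trunk_ground": -8.0,     # switched off: no trunk layer in herbaceous vegetation
    "ground_surface": -12.0,  # switched off: ground parameters disabled for flooded sites
}
herbaceous_flooded = ("canopy_volume", "canopy_ground")
sigma0 = total_backscatter_db(components, herbaceous_flooded)
```

Two equal components add about 3 dB to either one alone, which is why the sum must be formed in linear units rather than in dB.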
Wetland vegetation consists primarily of carex, grass, and reed and was modeled as a cluster of uniformly distributed vertical dielectric cylinders [20]. In flooded wetlands, radar backscatter is mostly attributed to vegetation properties. With the modified MIMICS model, we simulated the response of backscatter coefficients to increases in a set of vegetation biophysical parameters, including plant height [Fig. 4(a)] and plant water content [Fig. 4(b)]. Figure 4(c) shows that the incidence angle has little impact on backscatter at all polarizations (HH, VV, and HV) owing to the limited double-bounce scattering in the absence of a trunk layer in wetland vegetation. Because of the dense vegetation coverage, the total C-band backscatter comes mainly from canopy volume scattering and canopy-ground multiple interaction [21], which are not strongly affected by variations in incidence angle. The incidence angle has only a small effect on canopy-ground interaction, but little microwave energy can penetrate the dense canopy layer to reach the ground and be reflected to produce canopy-ground interaction. Put simply, the incidence angle has little impact on the backscatter because of the dense canopy layer and the absence of trunk-ground and canopy-ground interactions.
After simulating the backscatter coefficient (σ⁰) with the MIMICS model, we evaluated the accuracy of the April 2007 results by comparing them with the values from the ENVISAT ASAR image using the root mean square error (RMSE). The RMSE is 1.86 dB for VV polarization and 1.42 dB for HH polarization, which is small compared with the actual dynamic range (6.84 dB for VV polarization, 4.52 dB for HH polarization). Figure 5 also shows that the coefficient of determination (r²) between the backscatter coefficients from the ASAR images and those simulated by the MIMICS model is 0.9271 for VV polarization and 0.94961 for HH polarization.
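The agreement measures used in this comparison can be sketched as follows. The sketch assumes paired observed and simulated backscatter values in dB and computes r² as the squared Pearson correlation, which is one common definition; the text does not state which definition was used. The function names are illustrative, and the only numbers taken from the text are the reported RMSE and dynamic-range values.

```python
import math


def rmse_db(observed, simulated):
    """Root mean square error between paired backscatter values (dB)."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    return math.sqrt(squared / len(observed))


def r_squared(observed, simulated):
    """Squared Pearson correlation between observed and simulated values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    return cov * cov / (var_o * var_s)


# Ratio of the reported RMSE to the reported dynamic range (values from the text).
ratio_vv = 1.86 / 6.84
ratio_hh = 1.42 / 4.52
```

Expressed this way, the reported RMSE is roughly 27% of the dynamic range for VV and 31% for HH.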

Generating Training Data for the Neural Network
Training data, as prior knowledge of the network, are very important and largely determine the accuracy of the training results. To estimate aboveground biomass of wetland vegetation, the training data need to satisfy the following conditions [23]: (1) the training data should cover a wide range of biomass from different plant growth stages; (2) the change of backscatter with biomass, height, plant moisture content, and system factors should be consistent with ground measurements and ASAR images; and (3) highly correlated data should be removed to reduce data redundancy and improve simulation accuracy and efficiency. Training data meeting these conditions cannot easily be obtained from ground measurements owing to limitations in time and labor and to large random errors introduced in the field. In this study, we apply the MIMICS model to simulate backscattering coefficients with a set of system factors (polarization, incidence angle, etc.) and biophysical parameters (plant height, water content, biomass, etc.). Training data relating biophysical parameters to backscattering coefficients are thus generated. We first determine the range of the training data based on ground measurements to cover all biomass levels in the study area (Table 3). Then, a set of 50 training data pairs is generated using the MIMICS model, with each pair matching biomass, plant water content, and height to HH and VV backscattering coefficients (Fig. 6). The HV polarization is not considered because it is not recorded in the ASAR imagery.
Figure 7 presents the topology of the neural network used in this study. The neural network is a one-hidden-layer back propagation network with two input elements (HH, VV) and three output elements (biomass, plant height, plant water content). There are eight neurons in the hidden layer. The activation function of each input element in the hidden layer is a sine function, and a logistic function of the output element is defined as

$$f(x) = \frac{1}{1 + e^{-x}} \tag{1}$$

where x represents the input element.

Training the Neural Network
The NNA is trained with the data generated from the MIMICS model. The HH and VV backscatter can also be extracted from the ASAR images acquired in April, July, and November. Therefore, the trained neural network can be applied in an inversion process to estimate biomass in the study area.

Table 3 Range of the training data used in the neural network.
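The network topology and training loop described in this section can be sketched in plain Python. This is a rough illustration under several assumptions: the training pairs come from a toy saturating function that merely stands in for MIMICS simulations, inputs and targets are pre-scaled to roughly the unit interval, training uses per-sample gradient descent on a squared-error loss, and the learning rate, epoch count, and weight initialization are arbitrary choices. Only the 2-8-3 layout, the sine hidden activation, the logistic output, the 50 training pairs, and the 125 to 1300 g m⁻² biomass range come from the text.

```python
import math
import random

BIOMASS_MIN, BIOMASS_MAX = 125.0, 1300.0  # training range in g/m^2 quoted in the text


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def to_biomass(u):
    """Map a normalized network output in [0, 1] back to g/m^2."""
    return BIOMASS_MIN + u * (BIOMASS_MAX - BIOMASS_MIN)


def toy_pair(u):
    """Hypothetical stand-in for one simulated training pair; NOT the MIMICS model."""
    inputs = [1.0 - math.exp(-3.0 * u), 1.0 - math.exp(-2.0 * u)]  # scaled HH, VV
    targets = [u, 0.2 + 0.6 * u, 0.3 + 0.5 * math.sqrt(u)]  # biomass, height, water
    return inputs, targets


class Net:
    """One-hidden-layer network: 2 inputs, 8 sine units, 3 logistic outputs."""

    def __init__(self, n_in=2, n_hid=8, n_out=3, seed=0):
        rng = random.Random(seed)
        # The last weight in every row is the bias term.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hid)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hid + 1)] for _ in range(n_out)]

    def forward(self, x):
        z = [sum(w * v for w, v in zip(row, x + [1.0])) for row in self.w1]
        hid = [math.sin(v) for v in z]
        out = [logistic(sum(w * v for w, v in zip(row, hid + [1.0]))) for row in self.w2]
        return z, hid, out

    def loss(self, data):
        total = 0.0
        for x, y in data:
            out = self.forward(x)[2]
            total += sum((o - t) ** 2 for o, t in zip(out, y))
        return total / (2 * len(data))

    def train_epoch(self, data, lr=0.5):
        for x, y in data:
            z, hid, out = self.forward(x)
            # Output deltas use the logistic derivative o * (1 - o).
            d_out = [(o - t) * o * (1.0 - o) for o, t in zip(out, y)]
            # Hidden deltas use the sine derivative cos(z), computed before any update.
            d_hid = [math.cos(z[j]) * sum(d_out[k] * self.w2[k][j] for k in range(len(out)))
                     for j in range(len(hid))]
            for k, row in enumerate(self.w2):
                for j, v in enumerate(hid + [1.0]):
                    row[j] -= lr * d_out[k] * v
            for j, row in enumerate(self.w1):
                for i, v in enumerate(x + [1.0]):
                    row[i] -= lr * d_hid[j] * v


data = [toy_pair(i / 49.0) for i in range(50)]  # 50 training pairs, as in the text
net = Net()
before = net.loss(data)
for _ in range(300):
    net.train_epoch(data)
after = net.loss(data)
estimate = to_biomass(net.forward(data[25][0])[2][0])  # inverted biomass, g/m^2
```

Because the logistic output is bounded, the estimate can never leave the training range once it is mapped back to g m⁻², which is consistent with the inability to retrieve biomass above the trained maximum.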

Biomass Maps of the Poyang Lake Wetland
Biomass distributions of the Poyang Lake wetlands in April, July, and November 2007 were mapped using ENVISAT ASAR data and MIMICS-fed neural network analysis (Fig. 8). These maps clearly depict the development phases of wetland plants as the water level of the lake changes. In April, the water level starts to rise but most of the wetland is still exposed. The vegetation is mainly distributed in the southern areas of the lake. In July, the water level is about 17 m, close to the peak level for this year. Therefore most areas of the lake are flooded except the shorelines on higher ground in the south. In November, the lake reaches its lowest water level, which results in the smallest water body. Therefore most areas of the lake are covered with wetland vegetation.
Table 4 summarizes the biomass level distributions in the three periods. In April, total dry biomass of the wetland was 1.09 × 10⁹ kg. In April, in the middle of the growing season, 23.66% of the biomass is at the level of 200 to 500 g m⁻² and 39.27% at the level of 500 to 800 g m⁻². In July, the percentage at the high biomass level increases because the plants grow to maturity in this period. However, with the higher water level, most of the wetland is flooded, so the total biomass decreased to 1.86 × 10⁸ kg. In November, as plants wither and leaves dry up, the high-level biomass starts to decrease. About 72.68% of the biomass was in the range of 200 to 800 g m⁻². The total biomass in this period was nearly the same as that in April. The average biomass in the three periods was nearly consistent. In April and November, the wetland areas are widespread but most plants have low biomass. In July, plants grow very well and reach high biomass, although most of the area is flooded.
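The step from a per-pixel biomass map to the totals and class percentages reported here can be sketched as follows. The sketch assumes the map keeps the 12.5 m pixel spacing of the ASAR product, that non-vegetated or flooded pixels are marked as None, and that class shares are counted over vegetated pixels. The text does not say whether the percentages in Table 4 refer to area or to mass, so the area-based share below is an assumption, as are the function names and the sample values.

```python
import bisect

PIXEL_SPACING_M = 12.5  # ASAR pixel spacing stated in the text


def total_biomass_kg(biomass_g_m2, pixel_spacing_m=PIXEL_SPACING_M):
    """Sum per-pixel dry biomass (g/m^2) over vegetated pixels and return kg."""
    pixel_area_m2 = pixel_spacing_m ** 2
    vegetated = [b for b in biomass_g_m2 if b is not None]
    return sum(vegetated) * pixel_area_m2 / 1000.0


def class_shares(biomass_g_m2, edges=(200.0, 500.0, 800.0)):
    """Percentage of vegetated pixels in each biomass class bounded by edges."""
    vegetated = [b for b in biomass_g_m2 if b is not None]
    counts = [0] * (len(edges) + 1)
    for b in vegetated:
        # bisect_right puts a value equal to an edge into the higher class.
        counts[bisect.bisect_right(edges, b)] += 1
    return [100.0 * c / len(vegetated) for c in counts]


# Hypothetical map fragment: None marks open water.
pixels = [100.0, 300.0, 400.0, 600.0, None]
total_kg = total_biomass_kg(pixels)
shares = class_shares(pixels)
```

Each 12.5 m pixel covers 156.25 m², so the four vegetated pixels in this fragment hold 218.75 kg in total.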

Accuracy Assessment
Ground data collected in April and November were used to examine the accuracy of the NNA-estimated biomass. Figure 9 shows the scatterplots between ground truth and estimated biomass in April [Fig. 9(a)] and November [Fig. 9(b)]. The estimated biomass in July is not evaluated because no ground measurements are available for this period. When all data samples are considered, the r² between ground-measured and estimated biomass was 0.718 in April and 0.5579 in November. The intermediate level of biomass between 400 and 800 g m⁻² appeared to be better simulated, whereas the high and low biomass estimation results were not satisfactory. According to the MIMICS simulations, vegetation at high biomass levels (>1800 g m⁻²) has backscattering values similar to those of vegetation at intermediate biomass, resulting in underestimation at the high biomass level. Another reason for the underestimation is that the NNA has been trained for biomass values in the range of 125 to 1300 g m⁻² (as shown in Table 3), so the NNA approach cannot estimate biomass above this range. In future work, the NNA should also be trained for higher biomass. In the two scatterplots in Fig. 9, a few sample points in the top left show apparent overestimation, which may result from the low plant density in April and November. At these points the total backscatter was strongly influenced by surface backscatter from wet soil, which was not taken into account in the modified MIMICS model.
The overall RMSE of biomass estimation in the study area was 141 g m⁻² in April and 104 g m⁻² in November. At the intermediate biomass level, the RMSE was 117 and 91 g m⁻², respectively. This indicates the potential of the NNA approach combined with a MIMICS herbaceous wetland scattering model in large-area, multitemporal biomass estimation.
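The overall and intermediate-level errors can be computed along the following lines. The sketch assumes that samples are assigned to the intermediate class by their ground-measured biomass (400 to 800 g m⁻²) and that a relative RMSE is obtained by dividing by the mean measured biomass. Neither choice is stated in the text, and the sample values are hypothetical.

```python
import math


def rmse(pairs):
    """RMSE over (measured, estimated) biomass pairs in g/m^2."""
    if not pairs:
        raise ValueError("no samples in this class")
    return math.sqrt(sum((m - e) ** 2 for m, e in pairs) / len(pairs))


def stratified_rmse(measured, estimated, low=400.0, high=800.0):
    """Overall RMSE and RMSE restricted to the intermediate biomass class."""
    pairs = list(zip(measured, estimated))
    intermediate = [(m, e) for m, e in pairs if low <= m <= high]
    return {"overall": rmse(pairs), "intermediate": rmse(intermediate)}


def relative_rmse_percent(measured, estimated):
    """RMSE as a percentage of the mean measured biomass (assumed normalization)."""
    mean_measured = sum(measured) / len(measured)
    return 100.0 * rmse(list(zip(measured, estimated))) / mean_measured


# Hypothetical samples: errors only at the low and high ends.
measured = [300.0, 500.0, 700.0, 1000.0]
estimated = [400.0, 500.0, 700.0, 900.0]
errors = stratified_rmse(measured, estimated)
relative = relative_rmse_percent(measured, estimated)
```

In this toy case the low sample is overestimated and the high sample is underestimated, so the overall error is driven entirely by the extremes while the intermediate class shows no error.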

Discussion
This study developed an integrated approach to estimating wetland biomass by combining the MIMICS model, NNA, and ASAR HH/VV radar images. With the MIMICS-simulated radar backscatter from varying biophysical properties, the neural network was trained to estimate biomass in an inversion process with ASAR images. The results clearly show that seasonal variations in water level strongly influence the biomass of wetland vegetation. During the low-water-level period, vegetation grows rapidly, so the total biomass is high. During the high-water-level period, most vegetation is flooded and the total biomass decreases. Therefore, it would make more sense to choose a low-water-level period to monitor the changes in biomass from year to year. Furthermore, the low-water-level period is also the vegetation growing season at Poyang Lake. Therefore data from this period are important in understanding the wetland ecosystem and its role in regulating water balance and environmental change in this region.
Soil properties are not considered in this study, as the wetland is flooded or saturated with water during image acquisition. However, during field trips we found that some of the sample points were not saturated, although soil moisture was still high. In this case, ground-surface backscatter was strong, which resulted in overestimation in the biomass retrieval. If soil properties were considered in the modeling process, the accuracy of the biomass inversion would be expected to improve. At the same time, the fluctuation in water levels also affects radar backscatter. Areas at higher water levels tend to have lower model simulation accuracy owing to the effects of stalk lodging in flooded vegetation.
In comparison with the ground measurements in Fig. 9, our results are overestimated at low biomass levels and underestimated at high biomass levels. For intermediate biomass levels, however, the modeled results agree better with ground measurements. For most vegetation in the study area, especially in periods of low water levels such as April and November, the total biomass estimated from the ASAR images is in a reasonable range. Hence, to some extent, Fig. 9 could represent the real distribution of the biomass. However, uncertainties in biomass measurements have several sources: errors in random sample selection, visual estimates of biomass categories performed during field campaigns, and errors in weighing and drying operations. The RMSEs of overall biomass estimation are similar to those obtained with ERS SAR data in other studies [9].

Conclusions
This study focused on the application of NNA combined with the MIMICS model to retrieve wetland vegetation biomass from ENVISAT ASAR alternating polarization data. The training data of the neural network were simulated with the MIMICS model, in accordance with the bounding parameters obtained during field trips in wetland environments. The trained NNA was used in an inversion process to estimate aboveground dry biomass from ASAR data. In general, the inversion model has RMSEs of 18.9% in April and 14.3% in November. The total biomass of the Poyang Lake wetland reached a level of 1.09 × 10⁹, 1.86 × 10⁸, and 9.87 × 10⁸ kg in April, July, and November 2007, respectively. The results indicate the potential of biomass estimation in wetland environments with combined radar imagery, radiative transfer model, and general classification models. Further investigation will be conducted in the near future to improve the accuracy of model simulation.
A general problem of neural networks is their limited applicability at the global scale. In future work, efforts to better understand the model so that it can be applied globally would be of great value. Furthermore, future RADARSAT-2 polarimetric data should be used to verify how different polarizations could optimize the estimation of wetland vegetation biomass.