Analysis and prediction of rubber tree phenological changes during Pestalotiopsis infection using Sentinel-2 imagery and random forest

Abstract. The fungus Pestalotiopsis sp. causes aberrant leaf fall, resulting in an annual reduction in rubber production. The loss of latex yield has a significant impact on the income of smallholder farmers. The objective of this study is to estimate NDVI using reflectance from the upper canopy based on Sentinel-2 NDVI time series data and daily climate data, analyze phenological changes of rubber trees using a harmonic model and time-lag cross-correlation, and develop a prediction model of vegetation indices using the random forest regressor (RFR) algorithm. The study was conducted on the rubber plantation of the Indonesian Rubber Research Institute in Sembawa, Indonesia. Three rubber clones aged 10 to 11 years were used to study the wintering trend. The study found that the overall trend of the NDVI decreased significantly from 2019 to 2022. The cycle of defoliation and refoliation changed due to disease and climate change. The time lag effects of vegetation index and climate variables are crucial for predicting vegetation dynamics. Increasing temperature and changing precipitation play significant roles in influencing pathogen incidence. The mean absolute percentage error (MAPE) was used to evaluate the trend of the vegetation indices. The RFR algorithm predicts the vegetation indices with a MAPE of 6.07%, 5.96%, and 8.18% of BPM24, GT1, and RRIC 100, respectively. Our finding indicated that the analysis and prediction model can be used to understand the phenological pattern and predict the wintering pattern to prevent disease spread and minimize latex yield loss due to infestation outbreaks.


Introduction
The Pestalotiopsis sp.fungal disease outbreak has significantly reduced the output of Indonesia's rubber plantations over the last 6 years (2016 to 2021). 1,2Pestalotiopsis sp. has infected mature rubber trees on at least 0.4 million hectares.The spread of this disease afflicted South Kalimantan, Central Sulawesi, Lampung, Central Java, and East Java.In Pestalotiopsis, up to 75% to 90% of the defoliation is abnormal, and more than 25% of the latex yield is lost. 1 *Address all correspondence to Yeni Herdiyeni, yeni.herdiyeni@apps.ipb.ac.idRubber cultivation in Indonesia is dominated by smallholder plantations.As a result, it has a negative impact on the well-being of rubber smallholders.
Pestalotiopsis is distinguished by brown spots on juvenile leaves that develop into dark brown spots and cause the leaves to fall prematurely.The newly developed leaves are smaller than ordinary leaves.The transitions from leaf defoliation to new leaf production (refoliation) are the primary phenological stages in rubber plantations.Plant phenology is the study of how vegetation develops throughout the year.4][5] On the ground, species-specific phenology reactions are found at limited scales.This procedure is time-consuming and expensive.Changes in vegetation can also be visible in high spatial and spectral resolution remote sensing imagery.Remote sensing has the potential to detect changes in vegetation features at numerous sizes, from the local to the continental and even the global. 62][13][14][15] Based on phenology, it is feasible to investigate how a rubber plantation changes during the dry season, from defoliation to the growth of new leaves. 15he content of chlorophyll in leaves has been used to track changes in vegetation.The normalized difference vegetation index (NDVI) has been widely utilized and implemented to evaluate the chlorophyll content for vegetation development and health in areas with complete leaf coverage. 16In rubber plantations, NDVI was found to be more susceptible to phenological fluctuations in canopy moisture and vegetation cover.According to the study, NDVI has been successfully employed to detect wintering in rubber plantations using MODIS images. 16ccurate time-series of vegetation indices are crucial for monitoring vegetation phenology in order to detect early stages of diseases and reduce production loss due to infestation outbreaks.Time series of NDVI from satellite data can be used to approximate phenological states and quantify overall vegetation behavior. 179][20][21][22] The prediction of vegetation dynamics using time series data is critical for disease control.Recent research has shown that random forest (RF) may accurately predict vegetation phenology. 23he study of the phenology of the rubber tree is critical for disease prevention.Zhai et al. studied whether prolonged defoliation enhanced powdery mildew disease infection rates and resulted in lower latex yield. 24Gasparotto et al. discovered that the yearly defoliation and refoliation cycles that occur in rubber plants after 5 years of age have a considerable connection with foliar disease attacks. 25Although phenological studies have risen in recent decades, we simply do not have any information on phenological studies of different rubber clones during Pestalotiopsis disease, particularly in tropical rubber plantations, based on Sentinel-2 imagery.
The main purpose of this study is to contribute to rubber disease monitoring by observing the phenological changes of rubber plantations, specifically the pattern of the wintering period based on Sentinel-2 NDVI time series data and local weather station data from 2019 to 2022 during Pestalotiopsis infection.The goals of this study are as follows: (1) estimating NDVI using reflectance from the upper canopy based on Sentinel-2 imagery across three rubber clones (BPM 24, GT 1, and RRIC 100); (2) analyzing phenological changes of rubber trees on the wintering period during Pestalotiopsis infection using a harmonic model and time-lag cross-correlation; and (3) developing a prediction model of vegetation indices using the random forest regressor (RFR) algorithm.

Experimental Locations and Workflow
The study area of this research is the rubber plantation of Indonesian Rubber Research Institutes (IRRI) in Sembawa, South Sumatra, Indonesia.The fields have been infected by Pestalotiopsis diseases since 2016 with varying degrees of canopy infection severity.Three fields with different rubber tree clones, namely BPM 24, RRIC 100, and GT 1, and aged 10 to 11 years were selected as experimental areas in this research.Figure 1 and Table 1 present the location and digital elevation model image of the study area.
The workflow of the proposed work is shown in Fig. 2. The method starts with data acquisition, pre-processing, NDVI calculation, imputation and smoothing, analysis of phenological changes, prediction of vegetation indices, and evaluation.

Mean Absolute Percentage
Error (MAPE)

Data Acquisition
In this study, we obtained a 4-year time series data from the Sentinel-2 level 2A product of the study areas from January 1, 2019, to December 31, 2022.The 291 imagery data were acquired from the Google Earth Engine (GEE) platform.Sentinel-2 provides reflectance data for 13 bands and has a 5-day temporal resolution.For this study, we used four spectral bands from a top-ofatmosphere product with a spatial resolution of 10 m: B02 (blue), B03 (green), B04 (red), and B08 (infrared).We also collected UAV multispectral data and digital hemispherical data during disease outbreak as the ground truth data for disease severity at two time periods, between January 18-19, and June 15-16, 2022.The use of UAV data enables a broader understanding of the affected area, providing increased resolution and enhanced accuracy through hemispherical data.These data were used by experts to assess the health of vegetation and determine the severity of Pestalotiopsis disease.Daily climate data were obtained from local weather stations in the rubber plantation and the Indonesian Central Bureau of Statistics and the Indonesian Meteorology, Climatology, and Geophysical Agency from 2019 to 2022. 263 Pre-Processing Sentinel-2 Imagery Cloud cover is a common problem in evaluating time-series data using optical imaging.Cloud removal is vital for the subsequent data analysis and to increase the integrity and availability of the remote sensing data.There are three approaches of cloud removal, i.e., spatial informationbased method, temporal information-based methods, and multi-source auxiliary informationbased method.27 Many studies on cloud removal have been conducted.[28][29][30][31] To eliminate cloud cover in this experiment, we used s2cloudless, a machine learning-based cloud identification technology developed by Sinergise's EO Research team.This technique leverages the light GBM algorithm, which was trained on a massive dataset of cloudy and non-clouded samples from around the world.32 Before applying the gradient boost-based technique, all bands are upsampled to 10 m resolution using bilinear interpolation.The algorithm employs Sentinel-2 Level-1C TOA reflectance values from the following 10 bands as input: B01, B02, B04, B05, B08, B8A, B09, B10, B11, B12; and the output is a cloud probability map.In this study, we used a threshold value of 0.4 to reduce cloud omission errors.

Calculation of NDVI
To explore the rubber trees phenology stages and define vegetation behavior during Pestalotiopsis infestation, we studied the NDVI-time series data acquired from Sentinel-2 datasets.The time series graph was created using the NDVI from random sites in the three rubber fields.The NDVI is determined using the equation as follows: where NIR is the reflectance of NIR radiation and RED is the reflectance of visible RED energy.The index for Sentinel-2 can be derived as the ratio of the difference and total of reflectance in NIR (B8) and red (B4) areas, as shown as: 33

Imputation and Smoothing NDVI Time Series
The missing information generated by cloud causes spatial and temporal gaps in satellite earth observation data, as well as biases in subsequent image processing and application.Various methods for dealing with missing data have been proposed.Methods are classified into four types: spatial-based methods, spectral-based methods, temporal-based methods, and hybrid methods. 34n this study, we used a temporal-based method or "gap filling" technique.Recent temporal techniques, such as data fusion, 35,36 pixel unmixing, 37 data interpolation, 38 or best-pixel selection, 39,40 have been examined for large-area land cover mapping.To obtain the gap-filled satellite image, we used the linear interpolation method in this work.The linear interpolation function equation is as follows: 41 ; t e m p : i n t r a l i n k -; e 0 0 3 ; 1 1 4 ; 1 0 1 where x is the independent variable, x 1 and x 0 are known values of the independent variable, and fðxÞ is the value of the dependent variable for a value x of the independent variable.The noise of the signals received by the satellite sensors can occur due to cloud cover, atmospheric disturbance, sensor failure, and so on. 42The time series smoothing method can be used to reduce noise and assess plant phenological trends. 42Smoothing methods were investigated by Atkinson et al. in reconstructing consecutive time series of vegetation indices to estimate vegetation phenology. 43There were three types of smoothing methods: 44 temporal methods, frequency methods, and hybrid methods.In this study, we used temporal-based method, a moving average (MA) filter for smoothing NDVI time series data.The MA filter reconstructs NDVI time series by replacing the central point with the average of all points in an equation-based moving window of fixed size.The advantage of this strategy is that the seasonal peak of the curve is preserved.

Analysis of Phenological Changes
The analysis of phenological changes of NDVI was investigated using harmonic analysis.The harmonic analysis of NDVI time series aims to characterize the spectral behavior of NDVI by decomposing the time series into a sequence of basic sinusoids and a set of sine or cosine terms with unique coefficients and intercept (harmonic terms). 45Time-lag cross-correlation analysis was used to understand the relationship and potential lag (lead effects) between NDVI and climate variables.Cross-correlation measures the similarity between two time series as a function of the time lag applied to one of them.Time lag indicates the time delay between precipitation and temperature changes and their impact on NDVI.

Prediction of Vegetation Indices
In this study, we analyzed the relationship between NDVI variations and climatic variables to understand the impact of the Pestalotiopsis diseases on vegetation dynamics in a rubber plantation using RFR.RFR was first proposed by Breiman (1996) as an ensemble learning method that can be used in both classification and regression task. 46RFR is a non-linear ensemble technique that focuses solely on decision trees for classification or regression.The concept behind RF for time series is to replace the standard bootstrap with a dependent bootstrap in order to subsample time series during the tree construction phase in order to account for time dependence. 46

Evaluation
To validate the performance of the model, multi-step-ahead monthly NDVI forecasts were simulated to walk-forward validation.The walk-forward validation was utilized to improve the model's precision.The dataset time series was divided into training, validation, and testing sets, with 90% of the dataset used for training and validation, and 10% retrained for testing.Means absolute percentage error (MAPE) performance evaluation was conducted to validate the accuracy of the proposed model.The following expression is the MAPE equation: where E t is the predicted value for t'th, and the A t is the element is the actual value for t'th.

Results and Discussion
Understanding and predicting the leaf phenology changes of rubber trees can help control diseases.The pattern of defoliation and refoliation of rubber trees (wintering period) is used to understand disease spread and severity, as well as to plan or decide on chemical spraying to control diseases. 45Wintering happens when rubber developed an adaptation to reduce transpiration due to water scarcity during the dry season.Wintering patterns are well known to have an impact on yield.In this study, we analyzed phenological changes during through wintering period during Pestalotiopsis infection using Sentinel-2 NDVI time series data and daily climate data from 2019 to 2022.

Environmental Factors Affecting Pestalotiopsis Disease
The incidence and severity of Pestalotiopsis sp.disease are intricately correlated with various environmental factors that influence the fungus's growth, spread, and impact on plant populations. 47Pestalotiopsis sp.prefers specific temperatures for growth and reproduction.Extreme temperatures cause stress in plants, making them more susceptible to infections. 48emperature fluctuations can also affect the fungus's growth rate and development, as well as the host plants'.High humidity can create an ideal microclimate for fungal growth.Humidity, along with temperature, affects spore production and dispersal.Pestalotiopsis sp.thrives in humid environments, as do many other fungi.
Environmental stressors, such as intense rainfall or flooding, have the potential to accelerate disease spread by creating favorable conditions for spore dispersal and germination. 48,49In contrast, prolonged periods of drought have the potential to diminish the vitality of plants and render them more vulnerable to infections. 50,51he vegetation index is an effective tool for assessing plant health in relation to Pestalotiopsis disease.A declining vegetation index may indicate stress on plant populations, which could be caused by Pestalotiopsis spp.infection.Monitoring this index enables the early detection of potential outbreaks.A thorough understanding of the interactions between temperature, precipitation, vegetation index, and other environmental factors is required for predicting, preventing, and managing Pestalotiopsis disease in rubber plantations.The analysis of the relationships between these variables allows for the identification of patterns and correlations, which aids in the prediction of Pestalotiopsis sp.outbreaks, the implementation of preventive measures, and the development of disease management strategies in affected ecosystems. 47

Phenological Changes Based on Harmonic Analysis of NDVI Time
In this study, we observed three rubber clones (GT 1, BPM24, and RRIC 100) aged 10 to 11 years to study the NDVI trend.Figures 3(a)-3(c) show a harmonic model of NDVI time series from 2019 to 2022 for three rubber clones.We found that the overall trend of the NDVI decreased significantly from 2019 to 2022 for all clones.
Pestalotiopsis was reported in rubber plantations in North Sumatra and South Sumatra since 2016. 1 Figure 4 depicts defoliation and refoliation trends of each clone.According to previous research phenology stage of defoliation and refoliation period for rubber wintering occurs in the dry season, which in Sumatra is between July and September 16 (as shown in Fig. 4 in gray area).Defoliation takes place over a 4-week span from July to August, signifying the conclusion of the growing season.Simultaneously, it marks the onset of the subsequent growing season, transitioning into the refoliation period spanning August to September.During 2019 to 2022, we found that defoliation and refoliation events shifted due to Pestalotiopsis disease.Early wintering was reported in all clones.The first defoliation occurred in March to July 2019, followed by secondary leaf fall in November and January 2020.In 2020 and 2021, the defoliation occurred repeatedly or extensively in all clones, indicating physiological stress, nutrient depletion, and a compromised defense system.These factors create favorable conditions for pests and pathogens to thrive, leading to increased vulnerability and the potential for further damage to the trees.
In 2022, the NDVI gradually decreased and defoliation significantly and occurred earlier in March.All clones suffered from extensive defoliation between March and July in 2022.According to weather statistics data in Fig. 5, in 2022, precipitation increased and caused a decrease in minimum temperature compared to 2019 to 2021.The increase in precipitation can create a cascade of unfavorable conditions for rubber trees, including the spread of fungal diseases, waterlogged soil, nutrient deficiencies, disruption of photosynthesis, pest infestations, and physiological stress.These factors collectively contribute to the extensive defoliation observed in rubber trees during periods of heavy rainfall and this can be reflected in temporal NDVI profiles with shifts in the timing of NDVI peaks and valleys.
The phenology of rubber trees is affected by the period of defoliation.A longer defoliation period would consume more conserved carbohydrates, reducing available reserves for disease defense and latex production. 24This condition is associated with significant defoliation and decrease in rubber yield.The shifts in the starting and ending periods of the annual dry season and wintering are likely the result of a complex interplay between Pestalotiopsis disease and broader climate change impacts.

Analysis and Prediction of NDVI Based on Climate Variable Time Series
Using Random Forest Algorithm The values of NDVI, temperatures, humidity, precipitation, radiation, and wind speed were used to assess phenological changes in response to disease outbreaks and climate changes for three rubber clones.We applied the augmented Dickey-Fuller test (ADF test) to test the stationary time series.If the p-value is >0.05, then the data series is non-stationary.Table 2 shows that the  p-values of NDVI for all clones (GT1, BPM24, RRIC), precipitation, maximum temperature, humidity, and radiation are greater than the significance level of 0.05, and the ADF statistic is higher than any of the critical values.Therefore, the sequence of NDVI and climate data were non-stationary.
We applied time-lag cross-correlation analysis to understand the relationship and potential lag (lead effects) between NDVI and climate variables.We found a time lag which indicates time delay between precipitation and temperature changes and their impact on NDVI. Figure 7 shows the lag value associated with the peak of the correlation function of precipitation, temperature, and NDVI.Results show that vegetation has a 1-to 3-month lag response to precipitation and temperature anomalies.It means the precipitation and temperatures highly correlate with a time shift of 1 to 3 months.A negative peak cross-correlation between NDVI and precipitation and a positive peak cross-correlation between NDVI and temperature anomaly correlation are crucial for assessing the impact of changing climate patterns.Anomaly correlation between NDVI and climate variables indicates how variations in climate patterns affect vegetation growth and disease prevalence.The time lag between climatic conditions and changes in NDVI reflects the health conditions of vegetation caused by Pestalotiopsis disease.The significant reduction in NDVI values resulting from disease outbreaks and climatic conditions provides evidence for the development of Pestalotiopsis.
We applied the lagging time series to the features for predicting the NDVI changes using RFR based on time-lag cross-correlation of temperature and precipitation.Table 3 shows the features of the prediction model.
In order to evaluate the performance of the predictive model, we utilized time series crossvalidation, a process that involves conducting multiple validation iterations.The prediction model was built using data from 2019 to 2023.The entire time period was divided into training and testing sets.The expanding window walk-forward validation method, specifically a fivefold time series cross-validation, was used in our evaluation, as shown in Fig. 8.As the crossvalidation process progressed, the training data expanded to encompass all data that had preceded it, while the size of the testing data remained constant.The objective of walk-forward validation was to enhance the accuracy of the model in predicting the initiation of defoliation, with forecasting intervals varying between 1 and 5 months.
Before starting the model's training and testing phases with the RFR model, we used randomized search and grid search hyperparameter tuning.Randomized search is used as the first step in hyperparameter optimization.It aids in quickly gaining an understanding of the hyperparameter space and identifying promising areas for further investigation through grid search.
The training set was subjected to grid-search time-series cross-validation in order to identify optimal hyperparameters from the search space and to ensure unbiased tuning.We ran a randomized search for each fold in the time series cross-validation.This experimental setup was used for three different rubber clones.Table 4 shows the description and optimal hyperparameters used in the RFR model for each clone.8.18% for RRIC 100, 6.07% for BPM24, and 5.96% for GT1.The model had accuracies of 90.80%, 93.65%, and 93.35% for RRIC 100, BPM24, and GT1, respectively.Figure 10 depicts the important variables in the correlation between NDVI and climate variables for individual rubber clones using RFR model.Incorporating time lag effects of vegetation and climate variables is especially useful in predictive analysis, particularly when vegetation responses to changing conditions follow temporal patterns.Our findings show that time lag effects recognize that the current NDVI value is influenced not only by the immediate past value but also by values from specific time intervals prior to the current observation.In this study, the time-lag effects of NDVI (lag_ndvi1, lag_ndvi2, rolling_mean_ndvi) and climate variables, such as precipitation (lag_pre2, rolling_mean_pre), temperature (lag_diff temp1), radiation (lag_rad1, lag_rad2), humidity (lag_hum2), and wind speed (wind_speed, lag_win1, lag_win2) influenced the prediction of NDVI values in time series data.Significantly, the 1-month interval preceding NDVI features (lag_ndvi1) emerges as the most important NDVI predictor for all rubber clones.
Precipitation and temperature have a much greater impact on the phenological pattern of all rubber clones than humidity, wind speed, and radiation.Important characteristics for NDVI value prediction include the significance of time-lag effects, particularly precipitation (lag_pre2), the variance in maximum and minimum temperatures (diff_temp, lag_diff_temp2), and the ratio of precipitation to temperature (pre_temp, lag_pre_temp2).The ratio of precipitation to temperature offers valuable insights into the equilibrium between temperature and water availability, a critical determinant in evaluating potential stress on vegetation.A disparity in temperature and precipitation may indicate water stress, which has negative impacts on plant development and overall ecosystem health.Furthermore, the combination of increased precipitation and lower temperatures creates an environment conducive to the growth, spread, and establishment of pathogenic microorganisms, particularly fungi.This phenomenon contributes to an increased prevalence of diseases among rubber trees.It is concluded that the temporal lag effects of climate variables and the vegetation index are significant in the prediction of forthcoming vegetation dynamics.
The results show the random-forest-based classification was highly effective in modeling the relationship between NDVI and climate variables.We found a strong correlation between NDVI and climate variables.This correlation indicates that these variables can be used to determine the vegetation cover and to assess phenological changes in response to disease outbreaks.The analysis indicated that the outbreak of Pestalotiopsis disease and the weather pattern have been confirmed to influence the pattern and timing of defoliation and refoliation of rubber trees (wintering period).We conclude that the analysis of the phenological pattern and the proposed prediction model can be used to forecast the early wintering period of rubber trees to prevent the spread of disease and minimize latex yield loss.

Conclusion
During the Pestalotiopsis disease outbreak, the phenological pattern of rubber trees changed.Disease and climate change have shifted the defoliation and refoliation cycles in rubber plantations.Structure of temporal sequences RFR was used in conjunction with NDVI data and climate variables to forecast the phenological stages or wintering pattern of rubber trees.Time lag effects in both climate variables and vegetation indices are critical for predicting future vegetation dynamics.The time lag between temperature and precipitation changes and their impact on the NDVI was determined to be 1 to 3 months.By utilizing this methodology, complex correlations between historical and current NDVI values can be captured, thereby improving the precision of the forecasts.MAPE was applied to evaluate of the NDVI changes with horizon of 1 to 5 months.RFR algorithm exhibits good performance in predicting the NDVI changes for three rubber clones, The MAPE of prediction model for each rubber clone are 8.18%, 6.07%, and 5.96% for RRIC 100, BPM24, and GT1, respectively.The investigation of the phenological pattern and the prediction of the early stages of rubber tree defoliation is essential for disease spread prevention and minimize latex yield loss.

Fig. 1
Fig. 1 Location of the study area.

Fig. 3
Fig. 3 Harmonic model of NDVI time series over four years (from 2019 to 2022) for three clones, (a) GT1, (b) BPM24, and (c) RRIC 100.The red curve illustrates the fitted model, while the blue points denote the actual NDVI data.

Fig. 4
Fig. 4 Defoliation and refoliation trends of several rubber clones (BPM 24, GT 1, and RRIC 100) with Pestalotiopsis infection in 2019 to 2022.The gray area denotes the annual wintering period in Sumatra (between July and September).

Fig. 10
Fig. 10 Importance of variables in relationship between NDVI and climate variables.

Table 1
Extent of each field for three rubber tree clones.

Table 2
ADF test to test the stationary of time series.

Table 3
List of features for prediction model.

Table 4
The description and optimal hyperparameter used in the RF regression model.