Framework for agricultural performance assessment based on MODIS multitemporal data

Abstract. We present a hierarchical classification framework for automated detection and mapping of spatial patterns of agricultural performance using satellite-based Earth observation data exemplified for the Aral Sea Basin (ASB) in Central Asia. The core element of the framework is the derivation of a composite agricultural performance index which is composed of different subindicators taking into account cropping intensity, crop diversity, crop rotations, fallow land frequency, land utilization, water use efficiency, and water availability. We derive these subindicators from net primary productivity and evapotranspiration data obtained from the MODIS sensor on board the Terra satellite during the observation period from 2000 to 2016, as well as from cropland maps created through multiannual classification of normalized difference vegetation index (NDVI). We classified pixel-based NDVI time series covering more than 8  ×  106  ha of irrigated cropland based on a hierarchical approach concatenating unsupervised and supervised classification techniques to automatically generate and refine training labels, which are then used to train a decision fusion classifier, achieving an average overall accuracy of 78%. The results give unprecedented insights into spatial patterns of agricultural performance in the ASB. The proposed method is transferable and applicable for global-scale mapping, and the results of this remote sensing-aided assessment can provide important information for regional agricultural planning purposes.

Framework for agricultural performance assessment based on MODIS multitemporal data 1 Introduction 1.1 Background scarcity, advancing land degradation, or negative effects on irrigation water availability by climate change or conflict, 6 but also competitive water uses from other sectors. It is therefore important to monitor and if necessary adjust agricultural performance. This requires a comprehensive understanding of the pattern of agricultural performance across both space and time, to identify potentially vulnerable regions. Measuring and quantifying agricultural performance is typically based on the relation of outputs (e.g., crop yield) to inputs (e.g., fertilizer or irrigation water, but also labor). Another measure for agricultural performance is the cropping intensity (the number of harvests per year). Spatially and temporally comprehensive and robust information must be collected frequently to trace back and better understand the determinants of the dynamics of agricultural performance, 7 and to evaluate and possibly improve agricultural management. The challenge is retrieving reliable and spatiotemporally consistent information. Agricultural statistics can be outdated, of doubtful quality, 8 or missing adequate sub-national historical inventory data. 9 Doing in-situ sampling can become prohibitively expensive when the areas to be monitored are huge or inaccessible.
Satellite-based Earth observation (EO) is a well-known alternative for assessing large-scale agricultural systems. EO has a great potential for supporting agricultural monitoring due to its ability to provide consistent information across scales, from the level of single parcels to regional-level coverage. Previous studies illustrated that EO-based monitoring can provide detailed information on land use and productivity to support detecting critical zones that are characterized by low agricultural productivity 10 or areas vulnerable to agricultural drought. 11 Such data provide valuable baseline information for decision makers to support implementing countermeasures to the adverse effects of land degradation or drought. Inversely, the detection of zones with a high agricultural productivity and high sustainability is useful to ensure their persistence. 12 The advantage of EO-based systems over conventional, terrestrial crop inventories is the timeliness and regularity, which is crucial for an operational land use monitoring system of a (near) real-time or retrospective concept.
Previous research has focused on land use monitoring and crop mapping using multitemporal satellite EO data, such as that from the Moderate Resolution Imaging Spectroradiometer (MODIS). Such studies present small-scale, regional [13][14][15][16] or large-scale, global [17][18][19][20] crop type mapping, where the majority of works clearly emphasizes the use of time series of normalized difference vegetation index (NDVI) 4,21 or the enhanced vegetation index. 22 Another important agricultural application based on MODIS data is crop condition mapping addressing aspects of vegetation dynamics. 23 In the latter field, MODIS imagery is commonly employed for drought forecast or early warning applications 24,25 and it generally fulfills the requirements of agricultural production monitoring, namely having a quasigapless, consistent archive of regular observations with sufficiently high temporal resolution since the year 2000 26 . Furthermore, the many contributions on agricultural applications employing MODIS data indicate that the relatively coarse spatial resolution provides an analytical scale of suitable spatial granularity for such applications.
Based on such advancements, operational monitoring systems have been developed that make use of meteorological data and remote sensing to support agricultural decision-making and research. 27 Examples are the Global Information and Early Warning System of the United Nations (UN) Food and Agriculture Organization (FAO) 28 or the Monitoring Agricultural Resources Program of the European Commission. 29 These monitoring systems focus on vegetation health, vegetation conditions, yield gap predictions, or assessments for drought risk estimation. Others are based solely on land degradation analysis caused by land abandonment, soil degradation, and desertification. 30 Previously developed monitoring systems for Africa focus on water productivity and take into account various sources of information, such as net primary productivity (NPP) or evapotranspiration (ET), e.g., the FAO water productivity open-access portal. 31 Such monitoring systems usually provide long-term monitoring which not only enables the understanding of spatiotemporal patterns and land use change, but also constitutes a solid basis for predictive analysis. 32 However, existing monitoring systems are mostly noncrop specific, hence, they do not inform about crop-specific land use intensity, or they are developed region by region and missing a comprehensive set of indicators that characterize agricultural performance. 33 In addition, they fall short in providing information about crop area and crop production. Moreover, when different basic (global) products are compared, e.g., cropland extent, crop types, or rainfall estimates, they often show large differences and show inaccuracies, which can partly be explained by restrictions due to the lack of in-situ data for calibrating crop growth models or crop classification, or barriers in operational adoption of EO-based methodologies. Relying on a single indicator to understand agricultural performance is likely insufficient to reveal a comprehensive understanding. For example, lower yields can be counterbalanced by higher intensity. Thus, it is recommended to combine various complementary indicators into an overall composite index of agricultural performance. Originally, the application of composite indices can be found in many studies related to agricultural, environmental, or socioeconomic vulnerability, 34,35 but has not been incorporated explicitly into an EO-based framework.
One region particularly affected by the adverse effects of an increasing food demand of a growing population and increasing agricultural performance is the irrigated agricultural system in the Aral Sea Basin (ASB) in Central Asia. It is one of the largest irrigated systems of the world, but only poorly equipped in terms of comprehensive monitoring systems that allow mapping and back tracing agricultural performance across the countries that belong to the ASB. Archive crop type or performance maps of the whole ASB do not exist yet, and previous studies on this issue were mostly developed region by region.
Against this background, we present a composite agricultural performance index (CAPI), based on satellite EO for assessing and mapping agricultural performance in the ASB. This index is based on a multitemporal sequence of maps that were derived from MODIS NDVI time series data using supervised, machine-learning-based time series classification. Since labeled ground truth data (i.e., crop type represented by a certain NDVI time series) for training and calibration at sufficient quality, quantity, and spatial granularity is often not available at a regional scale, we developed a framework that, first, generates and refines training data using a sequence of unsupervised and supervised classifiers in a hierarchical manner, and second, inputs that training data to a decision fusion classifier to estimate the underlying phenology of a given NDVI time series which at the same time can be used as a proxy for specific crop types (e.g., summer crops are depicted by the peak of season, i.e., maximum NDVI occurring during the summer season, whereas 80% of the summer crops are composed of cotton fields) and other cropland types (e.g., fallow and orchards). Herein, the specific technical goals are to • develop and validate a method to map agricultural land use (i.e., phenological cropland maps depicting the growing season as a proxy for actual crop types), • to derive cropland-specific indicators through automated training data generation and supervised classification of MODIS time series, and • map and describe agricultural performance using the CAPI.

Definition of Agricultural Performance
In this paper, we define agricultural performance as a combined metric of sustainability, efficiency, and productivity. This is an envisaged goal of the Central Asian countries to improve the efficiency of the cropping systems (i.e., the ratio between output and effort) and to increase the food production in order to ensure the food security for a growing population which is expected to reach 80 × 10 6 by 2030. 36 This aims at the UN Sustainable Development Goals to combat hunger and desertification and simultaneously achieving sustainable agricultural practices. 37 The combination of criteria such as sustainability, efficiency, and productivity into a composite agricultural index helps to better understand the patterns of agricultural development as well to identification of problematic (i.e., less sustainable, less efficient, and less productive) regions. The criteria are as follows.
High values of the proposed CAPI depict regions where all of the above criteria are met. Low values stand for the exposure to agricultural threats, such as land degradation or water scarcity, and the inversion of the CAPI can be used as a proxy to map agricultural vulnerability.

Aral Sea Basin
The ASB is a transboundary river basin composed of the six countries Kazakhstan, Kyrgyzstan, Tajikistan, Turkmenistan, Uzbekistan, and part of Northern Afghanistan. The ASB is directly supplied by the Amudarya and the Syrdarya Rivers. Irrigated cropland in the ASB accounts for ∼8.5 Mha that stretches alongside these two major rivers. 38 Under Soviet leadership, the ASB experienced a pronounced increase of the irrigated cropland area from 5.4 Mha (1950s) to around 8 Mha (1990s), mostly with the aim of increasing the production of cotton. 39 However, the excessive withdrawal of freshwater and inefficient water use reduced the inflow of the rivers. As a result, the Aral Sea dried out almost completely, and the irrigated cropland became largely salinized 40 and partly abandoned. 41 After the collapse of the Soviet Union and the following institutional and economic transformation, the agricultural production partly decreased sharply in some regions of the ASB. The socioeconomic and agricultural setting in each of these countries differs. Uzbekistan still has a state-regulated, planned economy, while there is an already well-established market economy in Kazakhstan, where the relative contribution of the agricultural sector to the gross national product has been decreasing since several years in favor of the secondary and tertiary economic sectors. 42 This is reflected by the different water use policies and irrigation practices of the countries. Figure 1 shows the study area with the different irrigation zones within the ASB.
The climate in the irrigated regions of the ASB is dry-arid continental with 100 to 250 mm of annual precipitation mainly falling during the winter (December to February). 43 Annual precipitation in the mountains can exceed 1000 mm. Because of the aridity, agriculture in the study area is fully dependent on a dense irrigation and drainage network, which was extensively developed during the Soviet era. Tajikistan and Kyrgyzstan, both upstream countries, are rather water-rich, whereas Uzbekistan and Turkmenistan experience more often water scarcity due to their dependency on the water withdrawal from the main river systems. In the arid environment of the ASB, precipitation contributes only very marginal to the overall water availability of the croplands whereas irrigation is crucial for the agricultural productivity in the region. 44 To maintain sustainable water use management, it is important that the actual field ET is less than the potential ET. The ASB is characterized by a high variance in climatic conditions ranging from the Kazakh steppe, the irrigation zones in the periphery of the Pamir Mountains with higher irrigation water availability, to the Karakum and Kyzylkum deserts in Turkmenistan and Uzbekistan, respectively.

MODIS net primary productivity data
To assess land and water use productivity, we utilized the annual MODIS MOD17A3H NPP with 500-m pixel size as a proxy for crop biomass. Previous studies showed the utility of MODIS NPP for the assessment of agricultural production. 45,46 Annual NPP is derived from 45 annual, 8-day gross primary productivity products (MOD17A2H) for a given year. 47

MODIS evapotranspiration product
To maintain sustainable water use management, it is important that the actual field ET is less than the potential ET. To assess this, we used the MODIS 8-day MOD16A2 collection 003 data with 500-m pixel size 48 and extracted the actual (ET a ) and potential (ET p ) evapotranspiration data for the observation period. The provided products are based on the Penman-Monteith model that uses other supplementary data such as meteorological data, albedo, land cover, and vegetation property dynamics. 49 The data quality of the MODIS ET products has been validated and regarded as sufficient in various studies. 50,51

Cropland mask
We used an existing cropland mask with ∼230-m pixel size for the year 2015. It covers the whole ASB to mask out nonagricultural land and areas which are not part of the irrigated cropland of the ASB. The mask was derived in a previous study based on MODIS NDVI time series data 5 and contains both actively cultivated (i.e., sown and harvested) and abandoned cropland. The cropland mask captures the maximum cropland extent, i.e., each pixel that has been identified as irrigated cropland at least once during the observation period between 2000 and 2015. We spatially partitioned the ASB in eight different irrigation zones which we defined in accordance to their spatial extent, irrigation system, and their administrative affiliation, i.e., province (see Fig. 1).

Landsat data
We used Landsat 8 data for a cross-comparison with the cropland classification results based on MODIS data. Therefore, we randomly selected five Landsat 8 OLI scenes between the year 2013 and 2016 covering the study area. For each selected year, there were at least eight cloud-free scenes for the observation period between March and October in order to be temporally consistent with the MODIS data that were used in this study.

Ancillary data
We used the first level (country level) of the open Database of Global Administrative Areas (GADM) 52 for spatial aggregation and cartographic purposes.

Method
The proposed framework demonstrates the ability for automated cropland classification without prior in-situ data from field campaigns or other ground-based sources. The cropland classes are linked to the phenology and growing seasons, thus, making an identification through vegetation index time series possible. The transferability to other years and regions requires the calibration of a stable and consistent classification model based on supervised classification trained with automatically obtained samples from data clustering and manual assignation of the target classes to the derived clusters. On the other hand, in-situ data are used for verification and validation purposes only. In a second step, cropland and water use indicators are calculated based on the created cropland maps and primary net productivity data and ET data based on MODIS data products. The final step is the calculation of a composite index (CAPI) based on all derived data products and weighted through a principal component analysis (PCA). In short, the framework consists hereby of several steps as shown in Fig. 3: • creation of cropland maps from MODIS NDVI time series, • derivation of cropland and water use subindicators, and • calculation of a composite index (CAPI).
The subindicators can be roughly divided into two groups: (i) cropland-related indicators and (ii) water use-related indicators. For creating the first group of maps, i.e., cropland-related indicators, we developed a new set of cropland maps for the ASB, based on an approach that is described in the following.

Satellite Data Preprocessing
We calculated the NDVI from the red and near-infrared bands contained in the MOD09Q1 products. We excluded pixels flagged as no data and excluded snow/ice or clouds in the MOD09Q1 pixel reliability layer prior to filtering based on MODIS quality assurance information. Only pixels labeled "good data" or "marginal data" were retained. Then, the annual NDVI data vectors were smoothened through a Savitzky-Golay filter 53 to reduce data contamination by clouds, cloud shadows, climatic effects, and atmospheric noise. Several tests examined and determined the optimum size of the filter window to 5 and the degree of the polynomial to 4, resulting in a good fit of the original NDVI signal and retaining important peaks of the time series. All raster datasets (i.e., the cropland mask, NDVI, NPP, and ET a and ET p ) were resampled to the same spatial resolution as the cropland mask using nearest-neighbor interpolation. To keep the original values encoded in the MODIS grids, we kept the sinusoidal projection for the analysis. Based on the cropland mask (see Sec. 2.2.4), all nonagricultural pixels were masked out and not analyzed further.

Multiannual Cropland Classification
Creating a set of multiannual cropland maps for a larger region, such as the ASB, requires the calibration of a consistent classifier algorithm that can be applied to data of different years and regions to ensure transferability and persistence. Due to the lack of a suitable set of in-situ training data, we employed a hierarchical approach employing unsupervised and supervised classifiers in a sequential manner to create labeled training datasets for the target cropland classes that describe the crop phenology. The assignation (labeling) to the target classes was done by manually selecting suitable clusters by visual interpretation of the respective NDVI time series response pattern, using the year 2016 as a baseline for all other years. Only clusters with explicit correspondence with the target class were considered and fed to a supervised classification algorithm. All other clusters were neglected. The classification algorithm classified the multiannual MODIS NDVI time series for all observation years for all the ASB in the defined cropland classes. The workflow is shown in Fig. 4 and described in the following.

Training data generation
Based on the multitemporal MODIS data in 2016 consisting of 30 individual scenes, we extracted pixel-wise NDVI time series, each represented as a data point in a 30-dimensional feature space defined by the NDVI values from each point in time. We used k-means clustering, which is commonly applied to remote sensing time series data, 54 in order to detect clusters of pixels characterized by similar temporal signatures. To determine an optimum number of clusters, we performed k-means for a systematically varying number of clusters k, and we used silhouette analysis 55 to evaluate the quality of the assigned clusters for each k by measuring the quality of the assigned clusters based on distances to the cluster centroid and to the nearest cluster for each data point. This yields an individual measure (i.e., silhouette score) for each data point, which is typically averaged over all data points to obtain a global separability measure. We visualized the clustering results by mapping a random subset of pixels (i.e., N ¼ 5000) from the 30-dimensional feature space into a two-dimensional (2-D) space using t-distributed stochastic neighbor embedding (t-SNE). 56 This allowed for visual inspection of the clustering results, as shown for a subset of the study area (i.e., Fergana Valley, Uzbekistan) in Fig. 5.
To account for cropland variability across the different irrigation zones in the study area, we run separate k-means clustering for each irrigation zone, allowing for a different optimum number of clusters in each region. The resulting clusters were then assigned to seven target classes  (see Table 1) by visually inspecting and interpreting the annual NDVI time series signature of each cluster and referring to literature research about major crop types in the ASB. The target classes are cropland classes based on phenology and growing season. They depict the major crop types by characteristic phenological signatures captured by the NDVI time series. Clusters that did not unambiguously represent one of these target classes were neglected. Fallow lands represents regions that are temporarily (or permanently) taken out of the production cycle, be it due to water shortage, farmer's decision, land degradation, or land abandonment. On the latter, primary vegetation succession can be found. Due to this, fallow land encompasses several land cover types and has a temporal and spectral similarity with bare soil, shrubland, trees, or pastures and natural grassland. Figure 6 shows the NDVI signatures of the derived target classes.

Supervised classification
After labeling the selected clusters, we randomly extracted training data (i.e., MODIS pixels) from each cluster for calibrating a classifier algorithm. The supervised classification was conducted by a decision fusion approach via a soft voting classifier based on random forest (RF) 57 and gradient boosting trees (GBT). 58 Both classification algorithms are ensemble methods, but handle the variance and bias of the input dataset differently. RF builds parallel decision trees with high variance and low bias, whereas GBT generates sequential low variance and high bias decision trees with an ensemble of weak learners. Hyperparameter settings of both algorithms were set to 300 estimators (trees). The learning rate of the GBT algorithm was set to 0.01. To avoid overfitting, the maximum depth of the trees for both algorithms was set to 10. The voting classifier finally merges the decision functions of both classifiers through soft voting as follows: E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 1 ; 1 1 6 ; 1 0 8 where w j is the weight that is assigned to the j'th classifier, i is the class labels, and p is the predicted class probabilities. 59 As shown in Table 1, the initial training data generated by k-means clustering were utterly imbalanced and did not have a sufficiently even spatial distribution. To further balance the number of training data, we generated additional samples through a preclassification step. We first applied the precalibrated classifier models to classify each of the eight irrigation zones in 2016. The ensemble classifier created two types of outputs: (i) the class labels and (ii) a a-posteriori estimation of the classification probability for each pixel representing a confidence score of the class assignment.
We then selected "pure" pixels with an estimated posterior probability of more than 99.5% and added these to the initial training dataset (Table 1). This high level of confidence was selected based on previous research in this region, 60 where high probabilities were associated with pixels that were characterized by homogeneous land use/land cover.
Based on the training NDVI time series underlying the grid cells from the two-stage selection process, we calibrated the final classifier model in 2016 that was then applied to create the annual cropland maps for the years 2000 to 2015. The year 2016 was chosen for calibration due to the high consistency and availability of high-resolution satellite imagery in Google Earth for the whole ASB where we inspected the selected training data visually. Because of the agroecological gradients and the variety of management practices in the ASB, spectral signatures varied spatially and could lower the recognition ability by a single global classifier. The study area, therefore, was stratified according to eight irrigation zones (see Fig. 1). Hence, for each cropland class, 1000 training pixels were randomly selected in each of the eight irrigation zones and then split into a 60%/40% partition for training and testing, respectively, and forwarded to the above described decision fusion approach. Finally, we applied the trained classification models to each of the 17 annual NDVI time series to create one cropland map per year. This multitemporal cropland information was the basis for the subsequent derivation of several indices describing agricultural performance (Table 2), as presented in Sec. 3.3.1.

Cropland classification validation
We validated the created cropland maps in a three-stage validation procedure: • Landsat cross-comparison: Cross-comparison of the MODIS-based cropland maps to a cropland classification created by the same classifier and same training labels, however, trained on Landsat 8 OLI NDVI time series of higher spatial resolution (i.e., 30 m). • Visual verification: Visual inspection and interpretation of the temporal signatures of the NDVI time series and Google Earth historic imagery based on a stratified random sampling scheme. We visually associated the selected samples with the defined classes and verified in available Google Earth imagery, whether the MODIS pixel is located within cropland or whether the land has been cultivated for this observation time. • External crop type validation: Accuracy assessment through GPS-based in-situ data associated with the target classes used herein.
Landsat cross-comparison: To assess effects of the relatively coarse spatial resolution of MODIS data (i.e., ∼230 m) over time, we cross-compared them to high-resolution Landsat 8 data. [71][72][73] We randomly selected six Landsat 8 OLI scenes within the ASB (Fig. 7) from different years between 2013 and 2016 and applied the same proposed classification approach with the same target classes for each Landsat scene. A direct comparison between both datasets is difficult due to the large spatial resolution differences. The Landsat classification maps were aggregated and reprojected to MODIS spatial resolution and sinusoidal projection. We reassigned the majority class of the Landsat pixels located within a MODIS pixel by zonal statistics through an area majority rule. Based on these pairs of cropland classifications, we performed pixel-wise map comparison to derive agreement measures.
Visual croptype verification: To assess spatial variations of accuracy across the study area, we estimated accuracy in the different subregions using a stratified random sampling approach 74 by visually inspecting the NDVI time series signatures of the samples and the corresponding classes. For each class, 250 samples were generated over the whole ASB for randomly selected years and assigned to the respective target class if the NDVI time series curves could be interpreted unambiguously. Such an approach can be also found in other related studies. 5,75,76 External croptype validation: We used GPS-based crop type in-situ data from two different field campaigns. To associate these discrete measurements with underlying agricultural parcels, we manually digitized the respective using Google Earth imagery. We then reclassified the insitu labels to the phenologic classes used herein and discarded parcels that covered <50% of the overlapping MODIS cells. This procedure affected ∼5% of the overall in-situ data.

Agricultural Performance Indicator Derivation
Based on the multitemporal cropland information, several indices were calculated, each of them describing agricultural performance ( Table 2).

Cropland subindicators
Active cropland index. Active cropland is defined as agricultural land sown and harvested in a given growing season. The active cropland index (ACI) summarizes the number of years within the observation period (2000 to 2016) in which land was classified as active cropland, i.e., all pixels excluding those classified as "fallow" in the cropland maps (see Sec. 4). This indicator supports detecting spatial patterns of land abandonment 5 and tracing the effects of periods of water scarcity or drought. The ACI is calculated as E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 2 ; 1 1 6 ; 1 0 5 where f n is the number of pixels which were classified as fallow land during the observation period and i is the total number of years. Values close to 1 indicate a very high agricultural activity, whereas values close to 0 reveal zones with a high fallow land frequency.

Cropping intensity index. Cropping intensity index (CII) is an indicator for the quanti-
fication of agricultural intensification over time. It supports the identification of single or double cropping systems, i.e., irrigated agricultural land with more than one harvest per growing season or the area cropped more than once per year. In the case of agriculture in the ASB, most of the double cropping systems are supposed to include a temporal sequence or winter wheat followed by rice, maize, sunflower, vegetables (mostly potatoes and mung bean), cucurbits (mostly melons), or sorghum. The CII is calculated as E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 3 ; 1 1 6 ; 5 9 2 where NIA is the average net irrigated area for all years, defined as the sum of all area under crops, and GIA is the average gross irrigated area for all years, defined as the sum of all area under crops, whereby the area under double cropping is counted twice. Values closer to 1 indicate a gapless multiannual double cropping intensity with at least two harvests per growing season, which can be interpreted as high agricultural performance.
Spatial crop diversity index. Crop diversification is a necessary technique to reduce the potential risk of specific replant disease as is often found in monocropping cultivation. 77 This indicator is measured by the Simpson index of diversity 78 where c is the relative share of the respective cropland class i associated with specific crops based on the growing season and phenology.
Temporal crop diversity index. Similar to the spatial crop diversity index (SCDI), the temporal crop diversity index (TCDI) reveals the diversity of crop types and whether monocropping or multicropping is present, but in the temporal domain. The TCDI contains information of crop rotation patterns and the frequency of the pattern change. Additionally, we introduced a measure of crop variety to emphasize not only the frequency of crop rotations but also the occurrence of different crop types. A certain land parcel can include many crop rotations within a certain time period with not many different crop types, e.g., a typical example within the ASB would be a regular cotton-wheat rotation for many years indicating more frequent change, but with only two different crops. On the other hand, a pixel can indicate a lesser amount of crop rotations, but for many different crop types, e.g., wheat-wheat-wheat-cotton-rice. 42 Thus, we propose the TCDI as a fused measure of two temporal diversity measures: • The frequency of crop rotations within the observation period, which indicates how often the cropland (i.e., the phenological type) has changed within a pixel. • The average number of different crop types within the observation period, which indicates how many different crop types have been cultivated within a pixel.
Both measures are highly correlated, thus we propose an aggregated index of their product as follows: where x n is the crop rotation events, y is the number of observation years, and μ c is the average number of different crop types per pixel.
Cultivated land utilization index. The cultivated land utilization index (CLUI) is an indicator of land utilization efficiency and indicates where and how the land was utilized. 79 It considers the total area under crops in relation to the average duration of the growing season per pixel. A longer duration suggests higher irrigation water availability. The CLUI can also support the interpretation of socioeconomic factors, e.g., a longer duration leads to a higher employment rate in the agricultural sector. The duration of the season is derived by the occurrence of bands in the time series whose NDVI value is >0.4, where experiments have shown that this threshold value is very suitable for the distinction of active cropland and nonagricultural vegetation.
where l i is the relative share of the average land conversion for each year (with 2001 as the starting year).

Water use subindicators
In addition to the cropland-related indicators, water use indicators are very useful to measure the water footprint and the water use efficiency of the produced crops. For each of the water use subindicators (Table 3), the cropland classification is essential and serves as input for the computation. Water use efficiency index. The measure of the "drop-per-crop" aspect is an important indicator for the analysis of water footprints in semiarid irrigation agriculture. Water use efficiency index (WUEI) has been defined in different ways according to respective project needs. The most common definition of WUEI is given by the ratio of the crop yield or biomass and the actual ET. 81 In our case, the calculation of WUEI is focused on the ratio of the ET a and the NPP per hectare, scaled by the respective crop share and by a weighting factor for each crop as shown in Table 4. The weighting factor is the transformed crop importance. This is given by the world market prices for commodities 85 which are associated with the respective cropland class, averaged for all possible crop types that can occur within a cropland class. We obtained world market prices in $ per kg for commodities for the period 2000 and 2016. We used the world market prices as factors to weight the importance of the respective cropland class in relation to the water use efficiency. where NPP is the net primary product for each cropland class c i with the associated world market price p and ET a is the average actual evapotranspiration.
Water availability index. Water availability index (WAI) is expressed by ratio of actual and potential ET. If ET a is less than ET p , then the water supply is sufficient, indicated by values close to 1, whereas lower values indicate water stress, drought risk, or drought vulnerability. 86 We adapted this measure to introduce a factor of water stress and inserted the crop coefficients from the FAO-24 report as shown in Table 5. 87 Crop coefficients are crop-specific factors for the  calculation of ET. Higher values indicate less crop water losses, thus higher water availability is present for potentially increasing agricultural performance.
E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 9 ; 1 1 6 ; 7 1 1 WAI ¼ ET a ET p Ã k c (9) where k c is the respective crop coefficient.

Composite agricultural performance index
Each of the calculated subindicators described in Secs. 3.3.1 and 3.3.2 was rescaled to a [0-1] range through a min-max normalization and to ensure indicator comparability within the same data variance. For all indicators, a lower numeric value indicates zones of lower productivity or efficiency. We have chosen a weighted aggregation technique 88 for the construction of the CAPI, which considers the summation of all subindicators with the respective importance to the overall CAPI. We calculated the principal components of all indicators to derive the factor loadings for each component 89 where j is the respective subindicator and ω is the weight for each subindicator.
At the point of this study, a questionnaire of local agronomists and experts who would define the weights of each indicator could not be conducted and information about a knowledge-based use of indicator weights is not present. Thus, the fusion of all subindicators into an aggregated composite index has been done through a weighted additive model, where the weights were retrieved through a PCA. 90 In our case, the averaged factor loadings of the PCA are used to construct the weights for the additive model, where the first three components described about 90% of the overall variance as shown in Table 6. Figure 8 shows the distribution of the initial clusters that were associated with the respective cropland class. As the clusters are spatially not well distributed, additional training data were Table 6 The loadings of the first three principal components and the calculated weights ω. Explained variance (cumulative) ∼60% ∼79% ∼90% -generated by training the voting classifier ensemble with the initial labeled clusters. The resulting classified pixels with more than 99.5% classification probability were added to the labeled training dataset to improve the spatial distribution of the training data and to account for spatial variability. The NDVI signatures of the newly selected training pixels were again visually inspected and used in the consequent cropland classification.

Map comparison, MODIS versus Landsat
The accuracy assessment (Sec. 3.2.3) revealed a high overall agreement between the MODISbased cropland maps from 2016 and the corresponding Landsat based maps (Table 7). Clearly, the distinction of summer crops and rice was sufficiently accurate. Lower precision and recall scores were achieved for the spring crops class that was confused with the double-crops class as both classes include winter wheat and other cereals of the same phenologic characteristics. Moreover, the pattern of the double crops (two maxima in the NDVI time series) is sometimes confused with the spring crops class when the lower second peak of the double-crops time series is similar to the lower second peak of the spring crops, which is often residual vegetation after the harvest. The lowest accuracy scores were achieved for the Landsat scene 158034 located in Turkmenistan due to the lower parcel sizes and a higher within-field heterogeneity. Soil salinity and natural vegetation mixed with agricultural land cause a more mixed NDVI response in the MODIS time series. 91 The highest accuracy scores were achieved for the Landsat scene 157029 located in the Lower Syrdarya irrigation zone (Kazakhstan), where only 3 cropland classes were present (rice, alfalfa, and fallow). Overall, the classifications have a higher precision and a lower recall which means that the classifier identified most pixels semantically, but also missed a lot of actual true positives, resulting in more false negatives. An overall agreement of 74% between the Landsat and MODIS classification datasets can be regarded as sufficient. Figure 9 shows the classification examples and the agreement between both datasets cartographically. A direct comparison is sometimes aggravated due to the different dimensionality of the temporal features of the Landsat dataset. For each year, much fewer Landsat observations (on average 11 observations within a growing season) were available due to cloud cover over the areas of interest. This reduced dimensionality (compared to the 30 available MODIS composite scenes per year) also may lead to a worse distinction of certain crop classes which require a higher data dimensionality for a better class separation, such as orchards and alfalfa.

Visual verification
The multitemporal NDVI signatures for each sample were manually inspected and compared with the respective MODIS classification results. This approach has an average overall classification accuracy of 84% because we draw samples from the same data distribution. If we summarize both validation approaches, we yield an average overall classification accuracy of about 78% for the given target classes and the long-term monitoring. Due to the temporal acquisition regularity, the same data dimensionality is ensured and thus, model transferability across different years, capturing the respective phenological patterns is possible, e.g., through transfer learning, inferred on unseen data from other distributions (years) and still from the same domain. 92 This likely caused the relatively high classification accuracies. In terms of the classification accuracy, the results were very accurate ( Table 8). The proposed method achieved 84% overall classification accuracy.

External verification
We have deployed in-situ GPS data from different field campaigns in Karakalpakstan and the Fergana Valley, both located within the Republic of Uzbekistan. Due to the coarse resolution of the MODIS sensor and the mixed pixel effects caused by the MODIS pixel size of 5.3 ha (230 m 2 ), we achieved a maximum overall accuracy of 75% in the Lower Amudarya, whereas the classification in the Fergana Valley achieves a score of 72% as shown in Table 9.
As can be seen from Table 9, the in-situ data were collected in a systematic manner and some of the target classes could not be captured. Moreover, during the conduction of the field campaigns, certain crop types were not grown in the sampled areas.

Classification Results
The  systems. This suggests that land intensification has been targeted in order to increase food production which is driven by a growing population and at the same time decreasing cropland availability. The acreage of orchards and alfalfa show a positive trend in all countries for the observation period. It is clear that cropland dedicated for cotton growing has been reduced in favor of crop diversification as well as the focus on other crops with a more sustainable water footprint and a more efficient water use. This can be seen by the negative trend of the summer crops, of which cotton is the major crop. More precisely, the aggregated statistics corresponds to the agropolitical developments of Uzbekistan and Turkmenistan since 2011 to reduce cotton production in favor of cereals and orchards. Moreover, the occurrence of water-rich years has increased for the time period since 2008. From the results for the Surxondarya and Vakhsh irrigation systems in Eastern Uzbekistan, Northern Afghanistan, and Tajikistan (Fig. 11), agricultural intensification and a decrease of unused fallow land have occurred during the observation period.

Spatial Patterns of Cropland and Water Use Subindicators
The calculation of the subindicators yielded eight different maps as shown in Fig. 12 with each of them representing agricultural performance in a different manner. The ACI map reveals areas with a higher cropland frequency (or reversely, lower fallow land frequency) that are closer to the main irrigation systems of the Amudarya and Syrdarya rivers. The Fergana Valley mostly includes areas with a high ACI whereas this indicator draws a clear boundary between the provinces of Khorezm (UZB), Karakalpakstan (UZB), and Tashauz (TKM), showing high ACI scores in Khorezm only, whereas the areas peripheral to the deserts are affected with a higher fallow land frequency due to land abandonment, desertification, or soil salinization. The CII maps show a similar pattern of agricultural performance except the irrigation areas of the Lower Syrdarya, where no cropping intensity has been identified. High spatial crop diversity was found in similar areas, whereby cropping diversity in the Fergana Valley had less importance than in the provinces Tashkent, Jizzakh, or Samarkand. Usually, areas with high spatial crop diversity also achieve high temporal crop diversity scores, expressed by the TCDI as areas with a frequent and regular crop rotation and a higher crop variation over time. CLUI scores were high in most irrigation zones except in Turkmenistan, which can be explained through the relatively short duration of the growing season. The LCI shows hotspots of land reclamation in Karakalpakstan, Turkmenistan and other peripheral and arid regions prone to desertification and land abandonment.

Correlation Analysis of the Subindicators
A correlation analysis between all subindicators (Fig. 13) shows the highest Pearson correlation coefficient scores between the CLUI and ACI indicators. The lowest scores were achieved between the CII and WAI indicators. In this sense, cropping intensity is independent from the actual and potential ET. Surprisingly, both water use indicators, WAI and WUEI, are not correlated. This can be explained with the relative high discrepancies between the actual and potential ET values of the WUEI indicator. Moreover, we generated scatterplots between each pair of subindicators and plotted them in combination with the cross-correlation matrix (Fig. 13), giving additional insights in the relationships between the subindicators.

Composite Agricultural Performance Index
The resulting map in Fig. 14 shows high CAPI values in proximity to the mountain areas in the Eastern ASB as well as in close proximity to the main irrigation channels. A very high correlation between both maps, especially in zones with a higher agricultural performance, is evident. In general, the usage of the additional MODIS products for NPP and ET for the calculation of the CAPI through all subindicators seems to suggest higher agricultural performance, even if crop diversity, cropping intensity, and crop rotations are relatively low, and fallow land occurrence relatively high. The computation of the CAPI solely through the cropland subindicators is more precise for the depiction of low agricultural productivity areas. A drawback of the CAPI is the high abstraction and generalization level. Areas that perform better within a specific subindicator might still be interpreted as vulnerable or less productive, e.g., areas with a high ACI (i.e., low fallow land occurrence or frequency) might still have low cropping intensity, low spatiotemporal crop diversity, low land utilization, or a low net irrigated area share. This could be monocropping areas without crop rotation, such as continuous summer cropland (in most cases cotton). The inverse CAPI can be also interpreted as an agricultural vulnerability index depicting critical areas with the following categorization: • general low performance areas, e.g., high fallow land occurrence, • areas with a less-sustainable land use management, e.g., monocropping, and • areas with a less-sustainable water use management, e.g., a low WUEI.
However, the CAPI does not provide information on the causality of the vulnerability or the drivers which have led to such circumstances. They can be numerous and often linked  to socioeconomical factors. The benefit of a CAPI over single indicators is the capability to summarize and to reduce the dimensionality of the observed variables. Especially in the constantly increasing domain of EO data, generalization is essential to build robust and descriptive global or regional models. Moreover, land use is a very dynamic, spatially and temporally varying issue where a single indicator is not sufficiently descriptive enough. Composite indicators had their origin in socioeconomic sciences where a variety of indicators is merged into a more meaningful and comprehensive information. The CAPI can be further used by agrobusiness or agroinsurance companies to enhance the information on the performance, efficiency, vulnerability, and sustainability of the used cropland. As accurate crop yield predictions are still very difficult to obtain, the CAPI can be also used as a proxy indicator for crop performance, identifying hotspots of potentially higher or lower crop yields. If applied in a real-time or nearreal-time manner, the CAPI could also be used as an early warning system for impending land degradation.

Conclusion
We presented a framework for the mapping agricultural performance by fusing various EO-based indicators about agricultural intensity and productivity in the ASB. We created new cropland maps for a 17-year period. Our approach is transferable and applicable on regional or global scales, given the target crop types for classification. It is suitable for monitoring, assessing, and analyzing of agricultural land in any agroecosystem. We can conclude that MODIS data are suitable for land use and cropland mapping and the derivation of cropland and water use indicators for large-scale applications, such as our study area which covers 9 × 10 6 ha. The temporal regularity of the data acquisition makes MODIS data a favorable instrument for the identification of phenological patterns for crop type classification and mapping of inactive or abandoned agricultural land. On the other hand, there is a spatial uncertainty and mixed spectral signatures from ambiguous land use, and cropland types per pixel decrease the analysis accuracy. For instance, the derivation of cropping intensity can be inaccurate, when two or more adjacent fields have a mixed constellation of single spring crops (e.g. winter wheat) and single summer crops (e.g. cotton), respectively. The NDVI signature for this pixel would also produce a bimodal distribution and mimic double cropping systems. In our study, double cropping systems were classified with a low omission error. In some cases, signal confusion happened when the variance and bias of the data were not captured sufficiently by the automatically derived training data.
A shortcoming of the proposed method is the dependency on a high-quality cropland mask which needs to be created or obtained a priori.
We demonstrated how the generation of training data through clustering and the generation of additional training samples by extracting the pixels can help to create reliable cropland maps with a very high classification accuracy. To create a more stable classification model, we deployed a decision fusion algorithm with soft voting based on GBT and RF. The derived cropland classification maps have a sufficient classification accuracy with an average of 73% derived from independent Landsat thematic maps and 84% derived from random stratified sampling data.
The calculation of the CAPI revealed high agricultural performance in regions with a dense population and a more developed infrastructure, such as the Fergana Valley and the Jizzakh-Tashkent-Syrdarya irrigation area. The CAPI mapping results in provinces Karakalpakstan in Uzbekistan, Kazalinsk in Kazakhstan, and Tashauz in Turkmenistan on the contrary suggest a low agricultural performance with longer cycles of fallow land and less spatiotemporal crop diversity. Clearly, the Fergana Valley is a region with a high agricultural performance, 93 but other indicators should be taken into account as well. We suggest the further fusion of our proposed indicators with other hydrological (e.g., average ground water table), pedological (e.g., degree of soil salinization or average soil quality), demographic (e.g., population density, birth rate, or the population share below 30), environmental (e.g., air and water pollution indices), or socioeconomic indicators (e.g., average unemployment rate or people employed within the agricultural sector). Such data were not available during the course of this study and it generally is very difficult to obtain, but would improve the calculation of an overall aggregated index. However, the spatial resolution of such indicators is very coarse and they often exist only on an administrative level and do not provide continuous spatial data.
Additional research needs to be done in the exact extraction of pure MODIS training pixels though a subpixel or cover fraction analysis from high-resolution data, such as Landsat 8 or Sentinel-2. Our workflow was based on k-means clustering for the extraction of homogenous pixel groups as training data, yet there exist other clustering techniques, such as DBSCAN or agglomerative clustering, which could be explored for the same purpose in future research. The semantic definition of the target classes can be further partitioned in a more precise way, resulting in more target classes which explain the feature space of the data.
The increasing amount of cloud-free data from the Sentinel-1, Sentinel-2, and new Landsat constellations will provide a sufficient baseline for high-resolution land use and cropland indicator mapping, e.g., using the Google Earth Engine platform or spatiotemporal data fusion methods.
To increase the meaningfulness of the composite indicator, more subindicators could be integrated. On the one hand, remote sensing based indicators, such as the vegetation condition index, derived from long-term NDVI observations, could insert more valuable information about drought events and desertification processes, which would be an important aspect for measuring agricultural vulnerability.