Detection of wintertime green vegetated cover using object-based classification with open-source remote sensing and geospatial technologies

Abstract. Detecting wintertime green vegetated cover (WGVC) is important because it includes cover crops, one of the most important agricultural best management practices in use today. Cover crop area, which is part of WGVC, has traditionally been estimated using survey methods, but remote sensing offers a more time- and cost-effective assessment. Previously developed pixel-based methods for assessing cover crops with remote sensing can produce a salt-and-pepper effect, which lowers classification accuracy. Therefore, object-based classification was applied to estimate the spatial distribution of WGVC across the entire state of Delaware. To avoid the financial burden of fee-based software products, the workflow was built only with open-source remote sensing software and publicly available imagery. WGVC in this study was defined as any vegetation planted or surviving during winter on field crop areas. As expected from this broader definition, the WGVC area estimated in this study was far more extensive than the surveyed area of conventionally defined cover crops. Applying this methodology across Delaware, the total WGVC was estimated to be 137,297 ha between 12/26/2021 and 04/30/2022. The classification accuracy for each date was evaluated using samples collected from pan-sharpened Landsat 8 and 9 images, and the accuracies were higher than 85%. Kappa statistics were above 74% in all cases. The workflow in this study may improve time, labor, and cost efficiency in other areas.


Introduction
Cover crops are an essential agricultural best management practice benefiting growers. [3][4][5] Also, cover crops can protect perennial crop seedlings during establishment, 6,7 add organic matter, and enhance soil aggregation. 7 In addition, leguminous cover crops can transform atmospheric nitrogen into biomass, which mineralizes in soil for the following grain crop to use. 7 However, understanding the adoption and impacts of cover crops is hindered by the difficulty of accurately characterizing their spatial distribution.
According to Ref. 8, cover crops are generally defined as grasses, legumes, and forbs planted for seasonal cover and associated benefits. These crops are meant to cover and enrich the soil instead of being harvested. The most common types of cover crops are rye and winter wheat. 8 For example, cereal rye is often planted as a winter cover crop in the fall between cash crops (e.g., corn and soybeans). 8 However, we focus on wintertime green vegetated cover (WGVC), which is the overall vegetation cover on croplands during wintertime. The reason is that such vegetation functions similarly to cover crops regardless of the purpose or precise timing of planting, even though some WGVC is not conventionally defined as cover crop. Any vegetation during wintertime is expected to have the ecological functionality of cover crops to a certain degree (such as reducing soil erosion), and perennial crops have been included among the types of cover crop in some past studies (e.g., Refs. 9-11). Consequently, WGVC identified in this study may include perennial crops (e.g., alfalfa), field crops planted in fall or early spring (before May), and green weeds in addition to conventional cover crops. Therefore, we expected the area of WGVC to be larger than the surveyed area of traditionally defined cover crops.
Currently, the usage of cover crops is examined by windshield surveys 5,12 or by questionnaire surveys, which are incomplete, often biased, and require significant time, labor, and cost. 13 Therefore, remote sensing technologies are regarded as promising approaches for the assessment of cover crops, especially on large-scale cropland, as noted in Ref. 13. A pixel-based approach has been applied in many cases (e.g., Refs. 5 and 13-15) to estimate cover crop area in agricultural remote sensing studies. Pixel-based approaches were used with agricultural and vegetation indices, such as a combination of normalized difference vegetation index (NDVI), normalized difference tillage index (NDTI), and normalized difference residue index (NDRI), 5 or a combination of NDVI, difference vegetation index (DVI), normalized green red difference index (NGRDI), and ratio vegetation index (RVI). 13 However, the pixel-based approach is known to have a salt-and-pepper effect, [16][17][18][19] which can cause problems when converting classified pixels to polygons. In Ref. 13, it was noted that the salt-and-pepper effect was observed in pixel-based classification for the assessment of cover crops. The same effect can be expected when classifying WGVC. Therefore, a remote sensing approach other than the pixel-based method was sought to overcome the salt-and-pepper effect when assessing WGVC. Object-based classification was applied in this study to detect WGVC from satellite imagery while avoiding the salt-and-pepper effect.
In many cases, object-based classification is applied using commercial fee-based software. However, an object-based classification workflow using commercial software financially burdens many environmental, agricultural, or educational organizations because of the high cost of software licenses. Therefore, the large-scale WGVC detection workflow proposed in this study was implemented using open-source remote sensing and geographic information system (GIS) technologies. This study is feasible because of the growing maturity of open-source geospatial technologies. Related to this study, there has recently been an unmanned aerial vehicle-based object-based image analysis (OBIA) of cover crop detection in vineyards 20 using eCognition Developer 9.2. 21 However, a workflow for large-scale (e.g., state-level) WGVC identification by supervised object-based classification using open-source technologies was not found in the literature.
For large-scale (e.g., state-level) cropland WGVC detection, it is ideal to use medium-resolution imagery. Processing a high-resolution image for a large area requires large computing resources, and low-resolution images give inaccurate results for WGVC delineation. Sentinel-2 constellation images have 10 m resolution with 5-day temporal resolution. Also, these images are free of charge to public, scientific, and commercial users. 22 Therefore, Sentinel-2 images were used in this study because they are mid-resolution satellite images with near real-time accessibility.
The purpose of this study is to develop a workflow for delineating large-scale WGVC with OBIA and open-source geospatial technologies, using Delaware, United States, as our area of study. Delaware is a good location because it allows us to process and analyze data for an entire

Materials and Methods
The selected area of interest is Delaware, United States. For this area of interest, Copernicus Sentinel-2 23,24 (specifications in Table 1) images were acquired from US Geological Survey (USGS) EarthExplorer. 25 Currently, the Sentinel-2 data are accessible in Copernicus Browser 24 but no longer in USGS EarthExplorer. The revisit time of 5 days was sufficient to find cloudless images at reasonable temporal intervals.
Because WGVC can change during a season, multiple image dates were needed to represent the entire wintertime. The selected images were from December of the harvest year, and from February and April of the following year. Since three tiles are required to cover Delaware, virtual raster images were built with the GDAL program gdalbuildvrt 26 from the tiles listed in Table 2 and then clipped to the state of Delaware in ".tif" format.
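The mosaicking and clipping step can be sketched as two GDAL command lines. The sketch below only assembles the argument lists; it does not run GDAL. The use of gdalwarp with a cutline for the clipping step, and all file names, are assumptions made for illustration, since the text names only gdalbuildvrt.

```python
import shlex
from typing import List


def build_vrt_command(tiles: List[str], vrt_path: str) -> List[str]:
    """Arguments for gdalbuildvrt: the output VRT comes first, then the input tiles."""
    return ["gdalbuildvrt", vrt_path, *tiles]


def clip_command(vrt_path: str, boundary_path: str, out_tif: str) -> List[str]:
    """Arguments for clipping the mosaic to a boundary polygon with gdalwarp (assumed tool)."""
    return [
        "gdalwarp",
        "-of", "GTiff",            # write a GeoTIFF (.tif)
        "-cutline", boundary_path,  # polygon used as the clipping mask
        "-crop_to_cutline",         # shrink the output extent to the polygon
        vrt_path,
        out_tif,
    ]


# Hypothetical file names for the three tiles of one image date.
tiles = ["tile_a.tif", "tile_b.tif", "tile_c.tif"]
vrt_cmd = build_vrt_command(tiles, "mosaic.vrt")
clip_cmd = clip_command("mosaic.vrt", "delaware_boundary.shp", "delaware.tif")

print(shlex.join(vrt_cmd))
print(shlex.join(clip_cmd))
```

Each list can be passed to subprocess.run on a machine where GDAL is installed. Building the mosaic as a virtual raster avoids writing a merged copy of the three tiles before clipping.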
These false color (NIR, R, and G bands) images (Fig. 1, displayed with a cumulative count cut with minimum of 2% and maximum of 98%) were segmented using large-scale mean-shift (LSMS) 27 segmentation with Orfeo ToolBox (OTB). 28,29 The mean shift algorithm 30 has advantages in its use of a hierarchical relationship between segmentation levels, unlike the scale invariance of the watershed algorithm. 31 Also, it does not require prior knowledge about cluster numbers and shape constraints. 32 In addition, more complex shapes can be modeled using the mean shift algorithm compared to K-means. 33 LSMS was chosen as the segmentation algorithm because the image data in this study had 10 m GSD and covered the entire state of Delaware, which was too large to be segmented using the traditional mean shift option in OTB. Table 3 shows the parameters used for LSMS in this study.
The spatial and range radii in Table 3 represent the thresholds of spatial and spectral signature Euclidean distance used in evaluating pixels in an object. 34 The minimum size for segmentation of an object was set to 400 pixels, which at 10 m GSD is equivalent to 4 ha (9.88 acres). This value was determined after visual examination of the fields in Sentinel-2 images showed that most individual fields were larger than 400 pixels. Also, any isolated crop field less than 400 pixels ... To restrict the area of interest to field crop areas, only the segmentations on the field crops were extracted by clipping segmented objects by CDL cropland areas. Beyond a small-scale area (e.g., farm level), it is hard to identify field crops physically for the entire state. Therefore, the 2021 NASS CDL for Delaware was used to identify the field crop area, similar to its use for cover crop identification by Hively et al. 5,36 Because CDL includes land covers other than crops, only field crop pixels were extracted using gdal_calc.py. Table 4 shows the field crops and their area calculated using the QGIS 37 raster layer unique values report tool. The crop name for each pixel value can be found in the Delaware CDL Metadata. 38 The total area of field crops was calculated as 183,754 ha (454,057 acres). The most significant field crops were corn and soybeans, whose sum represented about 81% of the entire field crop area.
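Two small calculations from this paragraph can be sketched in Python: converting the 400-pixel minimum object size to an area, and masking a CDL raster to field crop classes. The crop codes below (1 for corn, 5 for soybeans) and the toy array are illustrative only; the full list of field crop codes used in the study is the one in Table 4, and the study itself used gdal_calc.py rather than NumPy.

```python
import numpy as np

M2_PER_HA = 10_000.0
M2_PER_ACRE = 4_046.8564224


def pixels_to_area(n_pixels: int, gsd_m: float):
    """Return (hectares, acres) covered by n_pixels square pixels of side gsd_m."""
    area_m2 = n_pixels * gsd_m ** 2
    return area_m2 / M2_PER_HA, area_m2 / M2_PER_ACRE


def field_crop_mask(cdl: np.ndarray, crop_codes) -> np.ndarray:
    """Boolean mask that is True where the CDL class is one of crop_codes."""
    return np.isin(cdl, list(crop_codes))


# Minimum object size: 400 Sentinel-2 pixels at 10 m GSD.
min_ha, min_acres = pixels_to_area(400, 10.0)
print(f"{min_ha:.2f} ha, {min_acres:.2f} acres")

# Toy CDL tile: 1 = corn, 5 = soybeans, 121 and 141 = non-crop classes.
toy_cdl = np.array([[1, 5, 121],
                    [141, 1, 5]])
mask = field_crop_mask(toy_cdl, crop_codes={1, 5})

# CDL pixels are 30 m, so the masked area follows from the pixel count.
crop_ha, _ = pixels_to_area(int(mask.sum()), 30.0)
print(mask.astype(int))
print(f"{crop_ha:.2f} ha of field crops in the toy tile")
```

The same pixel-count-to-area conversion is what a unique values report relies on when it lists the area of each class.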
However, the clipping by CDL modified the segmentation object polygons and even created multi-part polygons in some locations after clipping out central areas of polygons. Therefore, the attributes of each object had to be recalculated to reflect these changes. First, possible multipart polygons were separated by using the Multipart to Singleparts tool in QGIS, and the objects' attributes (mean and variance) were calculated with the Zonal Statistics tool in QGIS for each band. In Ref. 39, texture was measured in terms of homogeneity, contrast, dissimilarity, entropy, standard deviation, correlation, angular second moment, and mean. Therefore, using mean and variance addresses not only spectral characteristics but also textures of objects to some degree.
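The per-object mean and variance can be illustrated on a rasterized version of the segments. This is a minimal NumPy sketch, assuming each pixel carries an integer segment ID; it computes the population variance, which may differ from the exact definition used by the QGIS Zonal Statistics tool, and it is not the tool's implementation.

```python
import numpy as np


def zonal_mean_variance(band: np.ndarray, labels: np.ndarray):
    """Mean and population variance of band values within each labeled segment."""
    ids, inverse = np.unique(labels.ravel(), return_inverse=True)
    values = band.ravel().astype(float)

    counts = np.bincount(inverse)
    means = np.bincount(inverse, weights=values) / counts

    # Second pass on deviations from the segment mean for numerical stability.
    deviations = values - means[inverse]
    variances = np.bincount(inverse, weights=deviations ** 2) / counts
    return ids, means, variances


# Toy single-band image with two segments (IDs 1 and 2).
band = np.array([[1.0, 3.0],
                 [10.0, 10.0]])
labels = np.array([[1, 1],
                   [2, 2]])

ids, means, variances = zonal_mean_variance(band, labels)
for seg_id, m, v in zip(ids, means, variances):
    print(f"segment {seg_id}: mean={m:.1f}, variance={v:.1f}")
```

Running the function once per band (NIR, R, and G) gives the six attributes per object described above. A uniform segment has zero variance, which is why variance carries some texture information.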
After the training samples were collected, the field crop segmentations were classified using normal Bayes (NB) 40 supervised classification in OTB. In Ref. 41, NB and support vector machine (SVM) showed better performance than the classification and regression tree (CART) and K nearest neighbor (KNN). Although NB needs a higher number of samples, SVM requires complex tuning parameters. Therefore, NB was preferred in this study over SVM, CART, and KNN. For sampling purposes, both false and true color images, which were displayed with a cumulative count cut with minimum (2%) and maximum (98%), were used as ground truth.
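The idea behind a normal Bayes classifier can be sketched as follows: each class is modeled as a multivariate Gaussian over the object attributes, and an object is assigned to the class with the highest posterior. This is a minimal NumPy illustration under that assumption, not OTB's implementation; the two-feature synthetic samples and the small covariance regularization term are hypothetical.

```python
import numpy as np


class NormalBayesSketch:
    """Gaussian class-conditional model with a full covariance matrix per class."""

    def fit(self, X: np.ndarray, y: np.ndarray) -> "NormalBayesSketch":
        self.classes_ = np.unique(y)
        self.params_ = []
        for c in self.classes_:
            Xc = X[y == c]
            mean = Xc.mean(axis=0)
            # Small ridge keeps the covariance invertible for tiny samples.
            cov = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(X.shape[1])
            inv_cov = np.linalg.inv(cov)
            log_det = np.linalg.slogdet(cov)[1]
            log_prior = np.log(len(Xc) / len(X))
            self.params_.append((mean, inv_cov, log_det, log_prior))
        return self

    def log_posterior(self, X: np.ndarray) -> np.ndarray:
        """Unnormalized log posterior of each class for each row of X."""
        scores = []
        for mean, inv_cov, log_det, log_prior in self.params_:
            d = X - mean
            mahalanobis = np.einsum("ij,jk,ik->i", d, inv_cov, d)
            scores.append(log_prior - 0.5 * (mahalanobis + log_det))
        return np.column_stack(scores)

    def predict(self, X: np.ndarray) -> np.ndarray:
        return self.classes_[np.argmax(self.log_posterior(X), axis=1)]


# Synthetic object attributes (mean NIR, mean R); class 1 = WGVC, 0 = non-WGVC.
rng = np.random.default_rng(0)
wgvc = rng.normal([0.40, 0.05], 0.02, size=(40, 2))
bare = rng.normal([0.20, 0.15], 0.02, size=(40, 2))
X = np.vstack([wgvc, bare])
y = np.array([1] * 40 + [0] * 40)

model = NormalBayesSketch().fit(X, y)
training_accuracy = float((model.predict(X) == y).mean())
print(f"training accuracy: {training_accuracy:.2f}")
```

Because every class needs its own mean vector and covariance matrix, the model has more parameters to estimate than a distance-based rule, which is consistent with the remark that NB needs a higher number of samples.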
Using best judgment, the operator could identify most vegetation cover on the field crop area as red and green color tones in the false and true color images, respectively, for training. Figure 2 shows examples of the sampling of WGVC and non-WGVC on the clipped segmentations for training. The sampling points for each class were marked inside each sample object by the operator. The properties of each object were acquired for each sample point by the join attributes by location tool with the intersects geometric predicate in QGIS.
It was noted that the number of training samples should be between 10 and 100 times the number of bands in practice. 42 With the three bands used here (NIR, R, and G), this corresponds to 30 to 300 samples. Therefore, the number of samples that later gave satisfactory classification results was larger than 30 (Table 5) in every case. This operation was implemented for each false color image for each of the three dates, and the results were combined by the Union operation in QGIS to represent the WGVC between the winter of 2021 and the spring of 2022.
The average NDVI values of WGVC and non-WGVC objects for each date were compared. After creating an NDVI raster, the mean NDVI for each sample object was calculated by zonal statistics in QGIS. 37 In Fig. 3, WGVC samples had higher NDVI values than non-WGVC samples. However, the minimum NDVI of WGVC and the maximum NDVI of non-WGVC overlapped to some degree for 12/26/2021 and 02/09/2022. The NB classifier is expected to address this confusion problem by computing the probability of membership in each class, in contrast to alternative methods relying on NDVI thresholds.
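For reference, NDVI is computed per pixel as (NIR - R) / (NIR + R). The sketch below is a minimal NumPy version with guarded division; the reflectance values are made up for illustration and are not taken from the study.

```python
import numpy as np


def ndvi(nir, red) -> np.ndarray:
    """NDVI = (NIR - R) / (NIR + R); pixels where NIR + R is zero become NaN."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out


# Illustrative reflectances: green vegetation, bare soil, and a no-data pixel.
nir = np.array([0.50, 0.25, 0.0])
red = np.array([0.10, 0.20, 0.0])
values = ndvi(nir, red)
print(values)
```

A single NDVI threshold separates the first two pixels easily, but when the NDVI ranges of the two classes overlap, as reported for the December and February dates, no single threshold classifies every object correctly.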
Figure 4 shows the main workflow described above.
For the accuracy assessment, a dataset other than Sentinel-2 imagery was sought. The criteria for the accuracy assessment data were (1) the data should be publicly available geospatial data and (2) the data should show land cover on dates near those of the Sentinel-2 images. Landsat 8 and 9 images satisfied the criteria, since they are publicly available georeferenced images in USGS EarthExplorer, and Landsat has a temporal resolution frequent enough to provide image capture dates similar to those of the Sentinel-2 images. Using Landsat 8 and 9 images, false color images were composited with NIR, R, and G bands, and true color images were composited with R, G, and B bands, as shown in Table 6. Both false and true color images were used to assess the accuracy of the classification. However, the GSD of the NIR, R, G, and B bands of Landsat 8 and 9 was quite large (30 m). Therefore, pan-sharpened images (GSD = 15 m) were created for both false and true color composite images using panchromatic images (GSD = 15 m) with the Pansharpening tool 43 in the GDAL plugin of QGIS. Figure 5 shows the images used for the accuracy assessment. The time difference between Landsat and Sentinel data was 10 days maximum. This time gap was allowed because some of the Landsat 8 and 9 imagery taken near Sentinel-2 data acquisition dates had severe amounts of clouds. Fifty samples with an area larger than or equal to 10,000 m² were collected randomly from each class of the classification results using the QGIS random extract within subsets tool for the 12/26/2021 and 2/9/2022 results. The classification results of those objects were compared with pan-sharpened false color and true color Landsat images (12/26/2021 and 2/9/2022, respectively) by the operator.
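The validation sampling described above (50 random objects per class, each at least 10,000 m²) can be sketched in plain Python. This is an illustration of the sampling logic only, not the QGIS tool; the object records, field names, and random seed are hypothetical.

```python
import random
from typing import Dict, List


def sample_per_class(objects: List[dict], n_per_class: int = 50,
                     min_area_m2: float = 10_000.0, seed: int = 0) -> Dict[str, List[dict]]:
    """Randomly draw n_per_class objects from each class among those meeting the area threshold."""
    rng = random.Random(seed)
    eligible: Dict[str, List[dict]] = {}
    for obj in objects:
        if obj["area_m2"] >= min_area_m2:
            eligible.setdefault(obj["label"], []).append(obj)
    # random.sample raises ValueError if a class has fewer than n_per_class eligible objects.
    return {label: rng.sample(group, n_per_class) for label, group in eligible.items()}


# Hypothetical classified objects with alternating labels and increasing areas.
objects = [
    {"id": i, "label": "WGVC" if i % 2 == 0 else "non-WGVC", "area_m2": 5_000 + 100 * i}
    for i in range(400)
]

samples = sample_per_class(objects)
for label, group in samples.items():
    print(label, len(group), min(o["area_m2"] for o in group))
```

Filtering by area before sampling, rather than after, keeps the per-class sample size fixed at 50.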
However, the 5/9/2022 Landsat 8 image still had thick cloud cover in the southeastern part of DE. Therefore, 80 initial sample polygons larger than 10,000 m² were randomly chosen from the 4/30/2022 classification results table, and the first 50 samples in row order that were not affected by the cloud cover were used for the accuracy assessment by comparing them with the pan-sharpened false and true color Landsat images of 5/9/2022. The area threshold (10,000 m²) was imposed to include only larger polygons in the accuracy assessment. If a sample object was composed of WGVC and non-WGVC pixels, the object was identified as the class of the majority of pixels. Since the samples for the accuracy assessment were segmented objects, the classification accuracy was calculated considering the area of each sample object using Eq. (1):

$$\pi = \frac{\sum_{i=1}^{n} c_i s_i}{\sum_{i=1}^{n} s_i}, \qquad (1)$$

where $\pi$ is the overall accuracy, $c_i$ is either 1 for correct classification or 0 for incorrect classification, $n$ is the number of validation units, and $s_i$ is the area of single sample unit $i$. Also, the Kappa statistic was calculated with the area of objects instead of the number of pixels in the following equation: 42

$$\hat{K} = \frac{N \sum_{i=1}^{r} x_{ii} - \sum_{i=1}^{r} x_{i+} x_{+i}}{N^{2} - \sum_{i=1}^{r} x_{i+} x_{+i}}, \qquad (2)$$

where $r$ = the number of rows in the error matrix, $x_{ii}$ = area in row $i$ and column $i$, $x_{i+}$ = total area in row $i$ (shown as marginal total to right of the matrix), $x_{+i}$ = total of observations in column $i$ (shown as marginal total at bottom of the matrix), and $N$ = total number of observations included in matrix.
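The area-weighted overall accuracy and the Kappa statistic defined above can be sketched as follows. The function names, sample areas, and error matrix entries are illustrative assumptions and are not values from Tables 7 or 8.

```python
import numpy as np


def overall_accuracy(correct, areas) -> float:
    """Area-weighted overall accuracy: sum(c_i * s_i) / sum(s_i)."""
    correct = np.asarray(correct, dtype=float)
    areas = np.asarray(areas, dtype=float)
    return float((correct * areas).sum() / areas.sum())


def kappa(error_matrix) -> float:
    """KHAT from an error matrix whose cells hold areas (or counts)."""
    m = np.asarray(error_matrix, dtype=float)
    total = m.sum()                                      # N
    diagonal = np.trace(m)                               # sum of x_ii
    chance = (m.sum(axis=1) * m.sum(axis=0)).sum()       # sum of x_i+ * x_+i
    return float((total * diagonal - chance) / (total ** 2 - chance))


# Three sample objects: two correct (2 ha and 1 ha), one incorrect (1 ha).
acc = overall_accuracy(correct=[1, 1, 0], areas=[2.0, 1.0, 1.0])

# Illustrative 2 x 2 error matrix in hectares (rows and columns: WGVC, non-WGVC).
k = kappa([[45.0, 5.0],
           [5.0, 45.0]])

print(f"overall accuracy = {acc:.2f}, KHAT = {k:.2f}")
```

Weighting by area means that misclassifying one large object lowers the accuracy more than misclassifying one small object, which matches the object-based sampling unit.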
The strength of the Kappa statistics was evaluated for agreement as suggested in Ref. 46, with values of 0.41 to 0.60 considered moderate, 0.61 to 0.80 considered substantial, and 0.81 to 1.00 considered almost perfect agreement. Finally, the areas of WGVC for each date and the area of the polygons merged by Union, which is a QGIS vector processing tool, were calculated. ... (Table 7) for WGVC and non-WGVC. The training accuracy of every image was high. WGVC and non-WGVC of Delaware were classified as shown in Figs. 6(a)-6(c) for 12/26/2021, 02/09/2022, and 04/30/2022 by applying an NB classifier with trained models to the extracted objects of the field crop area. Also, visual inspection suggested that WGVC and non-WGVC were usually classified properly (example zoomed area west of Dover, Delaware, United States in Fig. 7). In Fig. 6, there was no substantial WGVC in the northern part of DE, because this part is mostly urban area (e.g., the cities of Newark and Wilmington). Also, the Delaware Bay area (mostly wetlands), Dover (urban), and Redden State Forest (forest) did not have a substantial presence of crops. The polygons merged by Union for the above three dates showed WGVC identification spread across most of the state [Fig. 6(d)]. The accuracy of classification was evaluated against pan-sharpened Landsat 8 and 9 false and true color images by visual inspection, as shown in the confusion matrices (Table 8). In the confusion matrices, user's, producer's, and overall accuracies for all dates were higher than 85% (Table 8). Therefore, the classification results were satisfactory. Also, the kappa statistic (KHAT), which is a measure of the difference between the actual agreement and chance agreement, 42 was higher than 74% in all cases, which corresponds to at least substantial agreement on the scale above. Total WGVC area was found to be about 137,297 ha (Table 9).
It was found that a multi-date WGVC identification strategy was useful because the total area estimated by Union was larger than the area found using the image of any individual date. The total area of field crops from Delaware CDL 2021 (Table 4) was 183,754 ha. Therefore, about three quarters (75%) of the field crop areas were covered by WGVC between 12/26/2021 and 4/30/2022. Since WGVC includes conventionally defined cover crops, it is meaningful to compare the WGVC area of the 2021 to 2022 winter with the cover crop area most recently surveyed in 2022. In the Nonpoint Source Program (NPSP) 2022 Annual Report for Delaware, 12 the cover crop area was 40,811 ha (originally 100,846 acres) or ∼22% of field crops. When we compare this with the total WGVC found (137,297 ha), the difference is 96,486 ha. The first reason for the difference is the additional types of vegetation in WGVC that are not defined as conventional cover crops. The second reason is the difference in traditionally defined cover crop area itself between the survey and the true value, since survey methods can be incomplete and biased. 13 It should also be noted that the usage of cover crops has been increasing. Cover crop usage increased by 50% between 2012 and 2017 in the United States, 8 and fourfold between 2011 and 2021 in the Midwestern United States. 47 The 2016 to 2017 report on the fifth annual cover crop survey by the Sustainable Agriculture Research and Education and the Conservation Technology Information Center, where respondents represented 47 states, showed about 25% and 60% increases in the usage of cover crops from 2014 to 2015 and 2016, respectively. 48 Also, the 2022-2023 report, where respondents represented 49 states, showed that the mean number of acres of cover crop for respondents who used cover crops increased from 324.9 acres to 413.6 acres between 2018 and 2022. 49 This upward trend was verified by comparing the cover crop area surveyed in the NPSP 2022 Annual Report for Delaware 12 with the Census of Agriculture (COA) of 2017 50 for Delaware. The cover crop area of Delaware in the 2022 NPSP annual report was 40,811 ha, which was larger by 5,153 ha than the cropland planted to a cover crop (excluding CRP) (35,658 ha, originally 88,112 acres) in the 2017 COA for Delaware. Because vegetation during winter includes more than planted cover crops, the WGVC estimated by the proposed method in this study may be more beneficial for some agricultural and environmental modeling applications.
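The area comparisons in this discussion can be reproduced with a few lines of arithmetic. The sketch below uses only the figures quoted in the text; the variable names are illustrative.

```python
# Areas in hectares, as quoted in the text.
field_crop_ha = 183_754      # Delaware CDL 2021 field crops
wgvc_ha = 137_297            # union of WGVC over the three dates
cover_crop_2022_ha = 40_811  # NPSP 2022 Annual Report
cover_crop_2017_ha = 35_658  # 2017 Census of Agriculture

wgvc_share = 100 * wgvc_ha / field_crop_ha
cover_crop_share = 100 * cover_crop_2022_ha / field_crop_ha
wgvc_minus_survey = wgvc_ha - cover_crop_2022_ha
survey_increase = cover_crop_2022_ha - cover_crop_2017_ha

print(f"WGVC share of field crops:        {wgvc_share:.1f}%")
print(f"Surveyed cover crop share (2022): {cover_crop_share:.1f}%")
print(f"WGVC minus surveyed cover crop:   {wgvc_minus_survey:,} ha")
print(f"Survey increase, 2017 to 2022:    {survey_increase:,} ha")
```

The shares of about 74.7% and 22.2% are the values rounded in the text to three quarters and ∼22%.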
Using OTB for supervised OBIA is challenging because of the lack of user-friendly documentation for sampling, training, classification, and accuracy assessment. This is often the case for many free, open-source technologies. However, the object-based classification tutorial 51 was informative for forming the workflow of supervised classification using OTB in this study. The current study is expected to contribute to these resources, since we conducted supervised object-based classification with concrete examples. Therefore, this study is applicable to workflow development for other uses of OBIA.

Summary and Conclusion
To identify WGVC areas in Delaware, object-based classification was applied using open-source geospatial technologies. The application of a remote sensing technique (object-based image classification) with LSMS segmentation enabled large-scale WGVC detection, which is efficient in terms of cost, time, and labor. Also, the salt-and-pepper effect was avoided by applying an object-based classification approach instead of the traditional pixel-based methodology.
Another hurdle in object-based classification, the high cost of fee-based commercial software, was overcome using OTB, GDAL, and QGIS, which are free and open-source geospatial technologies. For training the NB classifier, samples were chosen for the WGVC and non-WGVC classes. For accuracy assessment, classification results were compared with the pan-sharpened Landsat 8 and 9 images by the operator. The final NB classification results evaluated against pan-sharpened Landsat 8 and 9 false and true color images were satisfactory, with confusion matrices showing overall accuracies higher than 85% and KHATs higher than 74% in all cases. The input data were created using NIR, R, and G bands of Sentinel-2 images. These images were free and publicly available with a suitable GSD (10 m) for state-level analysis and high temporal resolution (5 days), which enabled the acquisition of cloudless images at reasonable time intervals. The field crop areas from NASS CDL were used as candidate areas of WGVC. The total WGVC area was created by using the Union tool in QGIS to merge the WGVC polygons of three dates (12/26/2021, 02/09/2022, and 04/30/2022), and this area was estimated as 137,297 ha overall. The presented study offers large-scale (state-level) WGVC detection with supervised object-based classification using completely open-source technologies. The presented workflow will be beneficial to future WGVC studies and can be used by organizations with limited time, labor, and funding.

Fig. 7 False color image (a) and classification (b) for an example location west of Dover, Delaware, United States.

Table 2
Description of image data.

Table 3
Parameters for large scale mean shift segmentation.

Table 5
The number of training samples required for classification results to become satisfactory for each image date.

Table 6
Landsat data used for accuracy assessment.

Table 7
Confusion matrix for training sampling (column is reference label, row is produced label).

Table 8
Confusion matrices and Kappa statistics of the three classification results.

Table 9
The total area of WGVC class union.