Secondary succession is a process that is often observed taking place in former agricultural ecosystems. Its characteristics are especially important in protected areas, for the purposes of monitoring and protective measures. Effective mapping of succession is facilitated by the development of automated methodologies based on remote sensing data, which are capable of complementing traditional field research. The objective of this work is to determine whether the classification of high-resolution hyperspectral and light detection and ranging (LiDAR) data with the use of the random forest algorithm enables us to produce an accurate succession species map. First, feature extraction techniques are applied to 1-m hyperspectral images and a ∼7 point / m2 dense point cloud. Minimum noise fraction layers and vegetation indices are calculated from the hyperspectral data and geometry related indices from the LiDAR data. Finally, the recursive feature elimination algorithm is applied to the combined dataset and the reference polygons to select the optimal set of features for subsequent classification. The results indicate that the proposed methodology has the potential to be used operationally. The final classification product is characterized by a relatively high Cohen’s kappa value of 0.68, with single species classified with various accuracies, expressed by F1 scores ranging from 0.45 to 0.87.
The succession process is currently the subject of a large number of research projects all over the world, for example in Europe,1–3 North America,4–9 South America,10–12 and also in Asia.13 The reason behind the interest in this topic is undoubtedly connected with its significance, which can be seen from three different perspectives. In some areas, similar to the current research, the secondary succession is perceived as a threat to certain ecosystems. This is because large areas of seminatural nonforest communities, e.g., grasslands and meadows (and especially the less productive parts) have been abandoned, resulting in secondary succession.14–16 The cessation of mowing or grazing allows species with clonal growth to complete their full development and induces changes in the quantitative and spatial structures of plant communities.17 One of the results of this process is the disappearance of some groups of species (e.g., heliophilous ones) and the formation of shrub and forest communities created by species that are better adapted to poor light conditions. This situation leads to changes in the composition of species in an ecosystem,18,19 in not only plant species but also animal communities.20–22 As a result, secondary succession also leads to changes at the landscape level. This standpoint is generally taken if the process threatens the existence of protected habitats or the vegetation involves invasive species. This view is represented by Szostak et al.,2 Szostak et al.,3 Vanderlinder et al.,7 and others. In other places, secondary succession is recognized as having a highly positive impact on climate and biodiversity and as an important reservoir of carbon.8,9,11 In most cases, however, the importance of secondary succession is simply explained by the large size of the terrain encompassed, land use and land cover changes, and the resultant need to assess its influence on ecosystem functions and services.1,4–6,10,12,13
Regardless of the reason, studies of secondary succession focus mainly on identifying the places, in which the succession occurs, estimating the size of this area3,7 and describing the stages of the process,1,4–6,9–13 and rarely on determining other characteristics such as the stand age or species richness.8 To the authors’ knowledge, few research projects have investigated the subject of the differentiation of succession species. Although many studies have focused on the identification of tree, shrub, or other plant species, only a small number of these have involved the mapping of succession tree species.23,24 The majority of studies have been carried out in an urban environment25–27 or in different types of mature forests.28–30 This indicates a need for further research into succession-specific challenges, e.g., the possibilities of distinguishing different succession species based on their spectral and geometrical characteristics or approaches to the processing of reference data. This study is a response to these needs. Its primary goals are to (i) examine the possibility of effectively mapping succession species present in an agricultural landscape via the use of simultaneously acquired high-resolution hyperspectral and light detection and ranging (LiDAR) data and (ii) to investigate the most important features of tree-species classification using the recursive feature elimination (RFE) algorithm.
The study area is located in the southern part of Poland, in the Silesian Voivodeship, near Czestochowa city (50°45” N; 19°17” E). It encompasses an area of about . This area, according to the physiographic regionalization of Poland,31 is located in the macroregion of the Krakow-Czestochowa Upland, the mesoregion of the Czestochowa Upland, and the microregion of the Mirowsko-Olsztynska Upland. The area is located in the Warta basin, east of Czestochowa, at the northern end of the Czestochowa Upland, being an enclave of natural and seminatural ecosystems, present between the highly urbanized areas of Silesia and Czestochowa industrial districts.
The thermal conditions of the area are typical for the climatic region of the central highlands, with an average annual temperature of 7.6°C. The coldest months of the year are January and February, and the average temperatures during these months are relatively low, approaching . The average annual air temperature amplitude is typical of western and central Poland, amounting to 20.8°C. Due to the thermal conditions, the vegetation period lasts for about 210 days per year. Snow cover, for which the durability depends on a negative air temperature, persists in the upland area for about 50 to 70 days. Precipitation is relatively high, exceeding 700 mm per year.
The area includes a complex of limestone hills (absolute heights range between 278 and 374 m) with numerous karst forms, such as caves, crosses, fissures, sinkholes, and so on. The hills are covered by natural forest communities (mostly beech and hornbeam forests) or seminatural species-rich grasslands.32 The areas adjacent to the hills are occupied by cultivated fields and pine forests. The area is characterized by a wide diversity of habitats, with a considerable wealth of plant and animal species;33 it is, therefore, protected as a Natura 2000 site known as Ostoja Olsztynsko-Mirowska (PLH240015) (Fig. 1).
The most important threatened nonforest habitats in this area are two types of grassland (codes 6120 and 6210), which were analyzed in this research. The first type, habitat 6120, consists of dry, frequently open grasslands on more or less calcareous sands with a subcontinental centre of distribution, comprising seminatural, moderately open to closed, relatively low-grown mesoxeric grasslands on slightly calcareous sands in the lowlands, and medium height mountains throughout temperate Europe. These grasslands are mostly dominated by tussock-forming, narrow-leaved grasses of the Festuca ovina aggregate, often accompanied by Agrostis capillaris, Poa angustifolia, or Carex praecox. If they are in good condition, they are relatively rich in herbs, which form an important nectar source for insects.34
The grasslands of habitat 6210 are among the most species-rich plant communities in Europe in the number of plant species they support per unit area.35 The calcareous grasslands of northwest Europe, for instance, consist of up to 80 plant species per square metre.32,36 They also contain a large number of rare and endangered species, including the priority species listed in Annex II of the Habitats Directive, as well as various bryophytes and lichens.37 In Poland, the floristic diversity of these grasslands is created by the many rare species of vascular plants, including orchids.38 The invertebrate fauna associated with this habitat, particularly butterflies, is also rich37 and includes a number of species listed in the Habitats Directive.39
In moderate climate regions, both types of grasslands described above depend on extensive agricultural management, and primarily grazing. The abandonment of agropastoral activities results in the development of “scrubland facies,” represented by thermophile scrub with an intermediate stage of the thermophilous fringe vegetation (Trifolio-Geranietea).40 Calcareous grasslands evolve toward formations such as Rosa spp., Prunus spinosa, Rhamnus cathartica, Crataegus spp., Cornus sanguinea, Juniperus communis, and Betula pendula (10 to 15 years) and, in the succession process, toward the formation of more complex forest (of the order Fagetalia), over 70 years or more.41 Scrub encroachment is the most frequently documented reason for change in 6210* sites; this is considered to be an acute threat because it can result in an increase in soil nutrients and a decline in the richness of grassland species42 as the succession progresses.
An overview of the landscape and selected succession species of the study area is presented in Fig. 2.
Materials and Methods
The research described here was carried out using two types of data: remote sensing and field data. Remotely acquired data consisted of hyperspectral images, RGB images, and LiDAR data (Fig. 3). The technical parameters characterizing the sensors and the data they provided are presented in Table 1. The collected field data could be divided into two groups: the first was spectrometer measurements that were needed for the evaluation of the atmospheric correction of hyperspectral imagery, and the second comprised botanical measurements acquired in the form of GPS points, with supplementary attributes characterizing the trees and shrubs occurring in the area under study (i.e., the height of an object, its crown density, crown radius, level of discoloration, level of defoliation, and neighboring vegetation type). The species were divided into two groups: (i) species playing a role in the early stages of the succession process, referred to as “promoters” (seven species) and (ii) nonsuccessive species referred to here as “other,” which comprised the remainder of a diverse range of species occurring in the study area. The total number of collected polygons was 406 (Fig. 4); after processing, they served as a classification reference.
Technical parameters for the sensors used and the data they provided.43,44
|Technical parameters||Sensor type|
|Airborne laser scanner (FWF) Riegl LMS-Q680i||Hyperspectral camera HySpex||Medium format camera RGB|
|Flight altitude (m)||735|
|Point density (points per square meter)||7||—||—||—|
|Spatial resolution (m)||—||0.5||1||0.1|
|Spectral resolution||430 bands encompassing 0.4- to spectral range||Red, green, and blue bands|
|Spectral sampling (nm)||—||3.26||5.45||—|
|FOV max (deg)||60||34||34||52|
|Overlap width (m)||855||450||450||720|
It is especially important that all of the data mentioned already were acquired simultaneously. Acquiring all remote sensing data at the same time enabled subsequent uncomplicated and precise (low registration error) data fusion. Remote sensing data were acquired in early autumn, between September 10 and 13, 2016, that is, at a time of intensive discoloration of the leaves, which provided the opportunity to discriminate between species. Field data were gathered as close as possible to the date of the flight: botanical field campaign measurements were made on September 29 and 30, 2016.
Overview of Workflow
The research involved several steps, which are presented schematically in Fig. 5.
First, the raw hyperspectral and LiDAR data were the subject of initial processing. The accuracy of the atmospheric correction of hyperspectral imagery was evaluated, making use of spectrometer measurements made on surfaces that were assumed to be unchangeable, such as asphalt. Raw RGB images were also orthorectified and the final product served for visual interpretation throughout all the stages of the research. In the second step, dedicated to feature extraction, preprocessed hyperspectral and LiDAR data were used to prepare new informative products. The hyperspectral imagery alone and the datasets created by combining it with the LiDAR data are known to be powerful sources of information about vegetation and have been used in many research studies involving species discrimination.23,25,26,28–30,45
In the next stage, botanical field measurements were manually processed, which made it possible to determine the outlines of the research objects. The polygons created at this stage were later used as a reference for species classification. The supervised feature selection done in the fourth step made it possible to choose an optimal set of features for subsequent classification. The result of the classification process was then masked based on three criteria: height, land cover, and the presence of shadows. In the final step, the result was evaluated using statistical measures.
The whole process is described in detail in the following sections, which discuss each of the aforementioned issues.
Preparation of reference data
The source data for preparation of the reference polygons were botanical field measurements indicating the location and characteristics of the research objects, which were trees and shrubs. Points were collected using a differential global positioning system (DGPS) receiver, with error values of between 0.3 and 0.5 m on average and never exceeding 1 m. Although the measurement accuracy was high, the creation of a reference polygon in the form of a buffer around a point would not be an ideal approach. Bearing in mind the relationship between the measurement accuracy and the size of the object, it could be expected that several valid pixels would be omitted and some invalid ones added to a reference set in this way.
The procedure chosen in the first step assumed the creation of a buffer whose radius was determined to be the object’s radius, as measured by the botany expert in the field, plus a few (2 to 3) meters. The additional enlargement of the buffer was done to account for a DGPS measurement accuracy, as well as possible little misalignments between the measurements and remote sensing data. The prepared vectors, i.e., circles, were then projected onto the pixel grid of the hyperspectral data, dividing each circle into a number of elements reflecting pixels. In the last step, these elements were visually evaluated against the presence of an object (a tree or a shrub), mixed pixels on the border of an object and its surroundings, and the presence of shadows. This work was carried out by a botany expert, who utilized many different materials: orthophotos, hyperspectral mosaic, and a crown height model (CHM). An example of the resultant polygons can be seen in Fig. 6.
A few nonsuccessive species present on the research area occurred mostly in high forest stands, and therefore, reference polygons created for these specimens could not be based on precise dGPS measurements. However, they were highly different spectrally and in crown shape, thus easy to identify visually and vectorize on the remote sensing materials.
Finally, the prepared reference polygons were divided into training and validation samples using stratified random sampling. Each stratum was a combination of a species and the values of parameters that in the authors’ opinion characterized the most important of its features: its height, crown density, and size defined by the radius. This operation ensured diversification of both the training and the validation sets for each species. The final number of reference polygons used in the classification process is presented in Table 2.
The number of reference polygons used in the classification process.
|Species name||Number of calibration polygons||Number of validation polygons|
|Silver birch, Betula pendula||14||18|
|Scots pine, Pinus sylvestris||15||19|
|Common juniper, Juniperus communis||20||25|
|Buckthorn, Rhamnus cathartica||13||20|
|Hawthorn, Crataegus spp.||15||19|
|Blackthorn, Prunus spinosa||13||17|
|Common pear, Pyrus communis||13||20|
LiDAR data processing
Raw LiDAR data were preprocessed and oriented according to the standard procedure in Riegl’s dedicated software RiProcess (RIEGL Laser Measurement Systems GmbH, Horn, Austria).46 Full-wave signal decomposition was also performed in Riegl’s software (RIEGL Laser Measurement Systems GmbH, Horn, Austria),47 using a Gaussian decomposition technique to describe the recorded signal, its extracted echoes and their descriptive parameters such as the echo width and amplitude. Next, the preprocessed point clouds were classified using Axelsson’s algorithm48 into ground, vegetation, and unclassified, according to the ASPRS LAS classification standard, in Terrasolid software (Terrasolid Ltd., Helsinki, Finland).49 The result of the automatic classification was manually checked using cross sections views of the point cloud in TerraScan and all detected errors were manually corrected by the airborne data provider.
In the next step, LiDAR-based rasters, an input dataset for species classification, were generated. The processing of point clouds was performed using TU Wien OPALS software50 and standard rasterization methods. To calculate several raster models, several additional calculations were needed directly in the point clouds, for example for the normalized height or echo ratio parameter51 of each point. The final raster datasets were divided into three groups based on the point cloud class used to create each raster and according to the scheme used in BCAL software:52 an ALL group (generated based on the points of all classes except noise), a GRD group (generated based on ground points only), and a VEG group (generated based on vegetation classes points). This kind of division not only makes it possible to describe the parameters of the vegetation itself more effectively but also to gather information about other parameters depending on penetration (i.e., the number of ground returns) and the ratio of vegetation to nonvegetation points. These models include, for instance, total vegetation density, vegetation cover, or Interquartile range of height (IQR).53 All of the products generated in this way are listed in Table 3. Similar features characterizing the structure of the analyzed objects were also used in research studies by Chance et al.,25 Alonzo et al.,26 and Shen and Cao.28
LiDAR-based raster dataset used in the experiment.
|Feature type||Statistics generated||ALS classes|
|Amplitude||Min, max, mean, and variance||ALL, GRD, and VEG|
|Echo width||Min, max, mean, and variance||ALL, GRD, and VEG|
|nDSM||Min, max, mean, and variance||ALL and VEG|
|nDSM||Range, quantiles 0.05 to 0.95||VEG|
|DTM||Min, max, mean, and variance||GRD|
|Echo ratio||Min, max, mean, and variance||ALL, GRD, and VEG|
|Intensity||Min, max, mean, and variance||ALL, GRD, and VEG|
|Point density||Number of||ALL, GRD, and VEG|
|Total vegetation density||Vegetation to ground points ratio||—|
|Vegetation cover||Vegetation to all points ratio||—|
|IQR, interquartile range of height||Range between quantiles 0.75 and 0.25||—|
|Slope||DSM-based slope map||ALL|
Hyperspectral data processing
The processing of hyperspectral data started with the transformation of single raw images into the useful product: a true orthomosaic. First, the images were radiometrically calibrated using Hyspex RAD software (Norsk Elektro Optikk AS, Skedsmokorset, Norway),54 and a geometric correction was then applied using PARGE software (ReSe Applications GmbH, Wil, Switzerland).55 Next, the images were subjected to atmospheric correction, which was done with the use of ATCOR4 software (ReSe Applications GmbH, Wil, Switzerland)56 omitting corrections for terrain topography using variable water vapor and visibility estimation and predetermined aerosol type (rural). The bands covering wavelengths longer than contained high noise, were cropped, leaving 430 bands that were subjected to the spectral polishing process using the Savitzky–Golay algorithm with a range of 13 bands.57 In the last step, the single corrected images were merged to produce a mosaic. The mosaicking process was carried out in PARGE (ReSe Applications GmbH, Wil, Switzerland),55 using the “center cropped” option that finds the middle of the overlapping areas between the images as a cut line. As mentioned in the overview above (Sec. 3.2), the results of the atmospheric correction were evaluated based on field measurements done using ASD FieldSpec 4 spectroradiometer (ASD Inc., Longmont, Colorado). All of this work was done by the airborne data provider.
The output of the processing specified already was used in the next step to produce new, informative products. The choice on the set of products followed extensive research studies which aimed at determining the datasets that perform best in distinguishing between species. Feature extraction involved calculating the minimum noise fraction (MNF)58 and vegetation indices. MNF is a linear transformation consisting of principal component analysis rotation, smoothing, rejection of the most noisy components, and finally retransformation to the original spectral space.58 It was performed using ENVI software (Harris Geospatial Solutions, Broomfield, Colorado)59 and the resultant layers were put through a preselection process, in which the noisy layers were deleted and only informative layers were kept. Vegetation indices were calculated using both ENVI (Harris Geospatial Solutions, Broomfield, Colorado)59 and EnMAP-Box software (The Environmental Mapping and Analysis Program, Earth Observation Center EOC of DLR, Germany),60 and many different vegetation features were characterized.
In the next step, the products of the LiDAR and hyperspectral data processing underwent a feature selection procedure. As described by Archibald and Fann61 and Singh et al.,62 the application of feature selection is an important step that should be implemented before classification, as it can reduce the time taken to build the learning model and increase the accuracy of the classification result, improving its generalization ability as well. These benefits can be achieved by removing redundant and insignificant features. In this research, the feature selection process was performed using the RFE algorithm based on a random forest estimate of feature importance. The chosen algorithm is categorized in the literature as a wrapper approach.62
As a standard part of the random forest learning algorithm, an internal assessment of the variable importance is calculated. For every model learned using the random forest classifier, an estimation of the importance of each feature used for classification is available in the form of a value between 0.0 and 1.0.
When used in an RFE workflow, this possibility is used in the following way. First, a classifier is learned using the full set of features. The resulting importance of the variable (or feature) is then analyzed and typically one feature (or sometimes more) with the lowest score (i.e., the least useful in discriminating between target classes) is eliminated from the feature set. The next round of learning is then performed, eliminating the next weakest feature, and this procedure is repeated until all the features are exhausted or the accuracy of the model starts dropping significantly, which happens as the number of features gets too small to discriminate the classes.
A full set of validation metrics was calculated at each stage of this procedure (i.e., number of features, Kappa values calculated in both hard and fuzzy way, validation accuracies on the training, as well as on validation set), then the metrics from the whole procedure were plotted on a graph. Example of the selected metrics is presented in Fig. 7. The number of features considered “best” was selected after manual analysis, aiming for relatively small number of most informative features still resulting in good learning/validation performance for a given dataset.
These properties of the algorithm made it possible to choose an optimal set of features for classification, a combination of hyperspectral and LiDAR-based layers, making use of Cohen’s kappa value63 to characterize each of the models.
Classification and postprocessing
As indicated in the previous sections, the classification involved making use of both the set of layers determined during the feature selection stage and the reference polygons. The classification was done using a pixel-based random forest algorithm, treating the succession species as separate classes, and the rest of the species present in the research area as one class, in a similar way to Colgan et al.45 and Graves et al.23
The result obtained from a classifier was an image indicating one chosen class for each pixel, meaning that the research area was completely classified. Given the way in which the algorithm works, there was no measure that specified the degree of similarity between the model (training data) and each pixel analyzed. Therefore, areas that were outside the interest of the research could not be eliminated automatically in the same step.
The exclusion process was realized by preparing a specific mask. Four types of deletion criteria were defined:
• Anthropogenic objects (i.e., nonvegetative areas outside the analysis in the research) were selected using 0.4 threshold value of the normalized difference vegetation index (NDVI).
• Shadows (i.e., places without proper spectral information, where a correct classification result could not be expected) were delineated using 0.2 threshold value of a selected near-infrared band (band 140–).
• Very short vegetation, that is, mainly grasses, short herbal, and agricultural species, but sometimes also young individuals representing succession species (the first group do not take part in an ongoing succession process, whereas the second could not be properly classified using data characterized by the parameters specified in Sec. 3.1), was delineated using 0.3-m threshold value of the CHM.
• High vegetation including old trees, some of which could also originate from succession but which were not of primary interest in the research, was delineated using 7-m threshold value of the CHM.
The defined threshold values were determined visually in an expert approach.
Feature Selection Result
The procedures described in the previous section gave rise to the following products. The first one was an optimal set of features (hyperspectral and LiDAR-based raster products) for species discrimination and thus for use in classification. From the 177 features entered into the feature selection algorithm, a set of 24 were chosen, as listed in Table 4.
Set of features chosen at the feature selection stage.
|No.||Feature category||Feature name (for LiDAR-based: class, type, and statistic)|
|1||LiDAR-based||ALL nDSM max|
|2||LiDAR-based||ALL nDSM mean|
|3||LiDAR-based||ALL nDSM min|
|4||LiDAR-based||ALL nDSM variance|
|5||LiDAR-based||ALL Echo ratio mean|
|6||LiDAR-based||nDSM sigma 3 sigma 0|
|7||LiDAR-based||nDSM sigma 5 sigma 0|
|8||LiDAR-based||VEG nDSM mean|
|9||LiDAR-based||VEG echo ratio mean|
|10||LiDAR-based||VEG echo ratio min|
|11||LiDAR-based||VEG intensity mean|
|12||ENVI vegetation indices||ARI2|
|13||EnMap-box vegetation indices||LCI|
|14||EnMap-box vegetation indices||LWVI2|
|15||EnMap-box vegetation indices||PSSrc|
|16||EnMap-box vegetation indices||SRtot|
The set comprised 11 LiDAR-based products and 13 hyperspectral ones (five vegetation indices and eight MNF products) suggesting that both types of information are important and a diversified set is needed. Similar conclusions were drawn by Alonzo et al.,26 Shen and Cao,28 Colgan et al.,44 and Dalponte et al.29
LiDAR-based products express the geometrical characteristics of the analyzed objects. The ability of airborne laser scanning (ALS) to penetrate the vegetation and register multiple echoes allows the determination of many features of trees and shrubs, such as the heights and shapes of individuals, the relative proportions of different parts of the objects, and, indirectly, leafage-type characteristics. The use of these kinds of variables in the feature selection process makes it possible to find the unique characteristics of the species or ranges of values with which the species may be described. As the results show, discrimination between succession species in the research area was mainly achieved in the geometrical domain by the use of features representing different statistics of the normalized digital surface model (nDSM), and therefore focused on the height of the objects. Additionally, three of the chosen features were statistics of the echo ratio, which is a measure of local transparency and roughness.51 Two other features that were used were also connected with roughness. Sigma 0, which is the standard deviation of the distance between the points and a fitted plane, was calculated from the nDSM with a three- or five-pixel kernel. The set of features chosen in the feature selection process also included a variable characterizing the intensity of the LiDAR signal. Since the amount of reflected and registered laser energy depends on the character of an object, this phenomenon also follows the shape and complexity of a tree and is consequently relevant in the differentiation of species.
The chosen hyperspectral raster products were complementary to the characterized LiDAR-based features. Five vegetation indices turned out to be important for species differentiation: anthocyanin reflectance index 2 (ARI2),64 leaf chlorophyll index (LCI),65 leaf water vegetation index (LWVI2),66 simple ratio (SRtot),67 and pigment-specific simple ratio for carotenoids (PSSRc).68 Four of these refer to vegetation characteristics that are related to the visible part of the electromagnetic spectrum, i.e., the presence of leaf pigments (chlorophylls and carotenoids). The amounts of the specified pigments are functions of time and may change differently for different species; these are, therefore, significant in the analysis. The results of the study by Alonzo et al.,26 which focused on urban tree species discrimination, confirm the high information content of the visible spectral range when compared to other parts of the electromagnetic spectrum.
The final index chosen LWVI2 is connected with the amount of water present in leaves, a feature that is distinguished in the short-wave infrared part of the spectrum. The amount of water (leaf turgor) is strongly dependent on the air (temperature and humidity) and soil (primarily moisture) conditions, in which the plant grows. For individuals growing relatively close to each other in similar habitat conditions, it can be expected that the amount of water in the leaves is partially a function of the species.
Eight products of MNF transformation were difficult to analyze in detail; the spectral origin of each MNF band is not easy to trace, as it results from complex processing of source spectral bands. It can only be observed that the chosen set did not comprise the most informative layers, that is, the first ones. On the contrary, it included layers that were characterized by different information contents, proving that the features needed to discriminate between species are very specific.
As shown in the overview of the workflow (Sec. 3.2), the set of features chosen as a result of implementing the feature selection algorithm became the subject of classification. Maps showing the spatial distribution of the succession species in the research area are presented in Fig. 8.
The maps reveal that the most frequently occurring succession species are Pinus sylvestris and Betula pendula (mainly in the northern part of the study area). A large part of the research area is also occupied by nonsuccessive species, referred to here as “other.” Both of the above species, Pinus sylvestris and Betula pendula, occur in compact single-species patches. In most cases, Pinus sylvestris is present close to the border between arable land and dense coniferous stands, whereas Betula pendula occurs in various places. The rest of the classified succession species do not form regularly shaped patches and are mostly scattered. In general, the trees and shrubs shaping the succession process form elongated patches corresponding to the fallows on or close to which they grow. This is obvious given that fallows, as unused land, are places with mostly undisturbed ecological processes.
The classification result was evaluated using statistical measures as part of an accuracy assessment. The report included the value of Cohen’s kappa, which characterizes the global accuracy of the product; scores, which provide information about the classification accuracy of each species (Table 5); and a confusion matrix (Table 6).
F1 scores and Cohen’s kappa.
|Species name||F1 score|
|Silver birch, Betula pendula||0.756|
|Scots pine, Pinus sylvestris||0.874|
|Common juniper, Juniperus communis||0.603|
|Buckthorn, Rhamnus cathartica||0.451|
|Hawthorn, Crataegus spp.||0.456|
|Blackthorn, Prunus spinosa||0.694|
|Common pear, Pyrus communis||0.531|
|Other species||Common juniper||Scots pine||Hawthorn||Buckthorn||Common pear||Silver birch||Blackthorn||Total||Producer accuracy (%)|
|User accuracy (%)||86.5||55.7||93.0||45.0||68.6||88.4||90.1||83.3|
The value of Cohen’s kappa of 0.681 suggests that discrimination between succession species was relatively successful; however, individual species were classified with various levels of accuracy. The highest score of 0.919 characterized the “other” class, which could have been predicted in view of the large amount of reference data in the class and the broad range of values it encompassed. The confusion matrix indicates that this class had a very low omission error (1.9%) and a higher commission error (13.5%), spreading into other classes, especially Pyrus communis, Betula pendula, and Prunus spinosa. The same tendencies in the commission error of the “other” class were observed by Colgan et al.45 and Graves et al.23
It can also be noticed that three succession species, Betula pendula, Pinus sylvestris, and Prunus spinosa, were classified with high accuracy, with scores of 0.756, 0.874, and 0.694, respectively. The four remaining species were classified with average accuracy, ranging from for Rhamnus cathartica to for Juniperus communis.
In this study, a methodology for identifying succession species in an agricultural landscape was proposed and evaluated. The results obtained here indicate that the selected approach is valid, although there are several points that can be further developed.
The first of these is the reference set, which plays a critical role in the classification. The way in which this is prepared affects all stages of processing. The reference polygons determine the training (class definition) and validation (selection of an optimal set of features during the feature selection step, determination of accuracy assessment validity) data. The level of difficulty characterizing the preparation of an appropriate reference set in the agricultural environment can be realized by comparing the nature and size of research objects (trees and shrubs) with the economically viable spatial resolution of the airborne imagery. The monitoring of succession is mainly focused on small vegetation identified at an early stage of the process. The sizes of the objects to be classified are predominantly equal to four or nine times the size of a pixel and represent crown diameters of 1, 2, or 3 m, respectively. Additionally, every individual can be characterized by a different crown density, usually ranging from 30% to 100% at the time of airborne data acquisition. As a result of these two issues and the fact that borders between neighboring image elements (pixel grid) are created randomly in relation to the research objects, a relatively small number of pure pixels and a large number of mixed ones are present. The spectral response from the mixed pixels is a different proportion combination of the signal from the analyzed tree or shrub and other objects present below or next to the crown, such as the ground or shorter or neighboring vegetation. The selection of pixels that have been sufficiently well defined in the spectral and geometrical domains for addition to the reference set is, therefore, very problematic, and it is important for future studies to specify the minimal values for key parameters of the research objects (e.g., crown density) that will allow for a good definition. Determining these lower thresholds will help make the consumer of the final product aware of the map’s information content. It can also support botany experts in planning field campaigns by identifying objects that are suitable or unsuitable for measurement.
As explained in Sec. 3.2.1, the reference polygons used in this research were created manually by selecting groups of valid pixels (in the expert’s opinion) based on field data (points indicating the object’s location) and remote sensing data. This approach has both advantages and drawbacks. On the one hand, the manually prepared outline of a crown is likely to be more precise than one created with the use of an automatic tool. On the other hand, the manual treatment of mixed pixels may be inconsistent between different individuals, while an automated tool would act in a reliable way and would create an approach that is better fitted to the data. Future research on this subject should be focused on testing different segmentation methods or other automated procedures. The results of research studies by Alonzo et al.,26 Graves et al.,23 Colgan et al.,45 and Shen and Cao28 suggest that these techniques can be used for the correct and efficient delineation of single trees based on the remote sensing data.
The most problematic issue concerning the classification stage seems to be the definition of classes. The approach applied in the current work is directly related to the project’s needs: each succession species was treated as a separate class, whereas the rest of the vegetation occurring in the study area was placed in the “other” class. The results of the accuracy assessment suggest that a few species are very similar to each other, based on remote sensing measures, and that merging these should probably be considered in the future. Incorporating an automatic analysis of the separability of the classes into the methodology may be an optimal way of dealing with this problem. This separability analysis needs to be done individually, not only for different groups of species but also for other individuals (due to intraspecies diversity) and obviously for remote sensing data acquisition dates, as a result of the vegetation’s phenological stages over time.
The postprocessing of the final product is also a challenging task. An image resulting from the application of the random forest classifier characterizes each pixel as belonging to a single chosen class; that is, the research area is entirely classified. The correct delineation of the spatial extent of the species, therefore, requires further processing. This can be done in several different ways:
• By collecting the reference polygons for each of the classes present in the research area, this may be time-consuming and therefore expensive, especially if the study site is characterized by high biodiversity, i.e., a large number of species.
• By creating a special mask, i.e., a layer containing the objects and areas for which a reference was not provided and which are therefore to be deleted from the classification map. This method has been proven to be effective for eliminating objects and areas that are relatively easy to define, such as anthropogenic areas or water, but is often useless for deleting nonsuccessive species that are very similar to the research objects.
• Using a measure of similarity between the model (training set) and an analyzed pixel. The calculated values can be thresholded to select pixels with low similarity values, i.e., those that should not be included in any of the analyzed classes. This is the most straightforward approach, but is not always available for a given classifier, for example, the random forest algorithm used in this research. This problem was also discussed by Alonzo et al.26
• By utilizing a combination of the aforementioned approaches.
As indicated in Sec. 3.2.5, the classification map was postprocessed using a combined approach. Different nonvegetative objects and areas were delineated using a special mask, while pixels representing nonsuccessive species were chosen by creating an “other” class from a selection of representative species. This approach is easy to implement: it does not require any complicated calculations and can be applied without collecting a reference for each species. However, it often overestimates the “other” class because of its high variance. The issue of the definition of the “other” class should be studied further in the future.
The aim of this research was to determine the potential of an accurate identification of succession species present in an agricultural landscape with the use of high-resolution aerial remote sensing data. Promising results were obtained from classifying a dataset comprised of hyperspectral and LiDAR-based products using the random forest algorithm. The use of the RFE algorithm also enabled us to observe that both data sources are important in the correct discrimination of species. The operational use of the methodology presented here requires several elements to be developed further; these are the preparation procedure for the reference polygons and the definition of the classes used in classification.
The authors would like to express their gratitude to MGGP Aero Sp. z o.o. for preprocessing the hyperspectral and LiDAR data and to Justyna Wylazłowska for helping with the botanical field campaign. This research was partially funded by the Poland National Centre for Research and Development. Three types of experts are involved in the project’s implementation: airborne remote sensing data provider (Consortium Leader: company MGGP Aero), botany experts (project partners, scientific units: University of Lodz, Institute of Technology and Life Sciences, and University of Silesia in Katowice), and remote sensing specialists (project partners, scientific units: Warsaw University of Technology, University of Warsaw, and Warsaw University of Life Sciences). The authors declare no conflict of interest.
Aleksandra Radecka received her BSc and MSc degrees in spatial planning from Warsaw University of Technology (WUT) in Poland, in 2015 and 2017, respectively. She is currently pursuing her PhD in geodesy and cartography at WUT. Since 2016, she has worked as a research assistant in the Department of Photogrammetry, Remote Sensing and Spatial Information Systems, WUT. Her research interest includes the usage of thermal and hyperspectral remote sensing techniques for environmental studies.
Dorota Michalska-Hejduk received her MSc degree in biology and her PhD in ecology from the University of Lodz in Poland where she is working in the Department of Geobotany and Plant Ecology. Her research interests include plant science, nature protection, monitoring of Natura 2000 habitats and species, and ecological processes. In 2014, she received an award for services to “Environmental Protection and Water Management” from the Ministry of the Environment in Poland. She is a member of the Society of Ecological Restoration.
Katarzyna Osińska-Skotak received her MSc degree in photogrammetry and cartography in 1994, her MSc degree in meteorology and atmosphere protection in 1997, her PhD in 2001, and her DSc degree in photogrammetry and remote sensing in 2011 from Warsaw University of Technology (WUT) in Poland. Since 2012, she has been a head of the Department of Photogrammetry, Remote Sensing and Spatial Information Systems at WUT. Her research interests include the applications of thermal, multi- and hyperspectral remote sensing techniques in environmental and urban monitoring. She is a board member of the Remote Sensing and Geoinformatics Group of the Polish Geographical Society.
Adam Kania has experience in private consultancy for business in areas such as environmental management, quality management systems, IT management, and software development. His areas of interest also included data processing, statistics, machine learning, optimization, and AI algorithms. He is currently leading the development of the Vegetation Classification Studio—a scientific software tool for fast and effective classification of remote sensing data. Last year, he founded Definity Ltd., a company working in research and development of algorithms and tools for AI, machine learning, remote sensing, and scientific computing.
Konrad Górski received his MSc degree in photogrammetry and remote sensing in 2016 from Warsaw University of Technology (WUT) in Poland. He is now a PhD student at the WUT Faculty of Geodesy and Cartography. From 2016 to 2018, he has been working in the Department of Photogrammetry, Remote Sensing and Spatial Information Systems at WUT. His research interests include employing new sensors and processing techniques in a wide range of specialized applications, especially modern ultralight sensors compatible with unmanned platforms, to enhance industry capabilities.
Wojciech Ostrowski received his MSc degree in geodesy and cartography (specialization: photogrammetry and remote sensing) in 2013 from Warsaw University of Technology (WUT) in Poland, where he has been working as a research-teaching assistant in the Department of Photogrammetry, Remote Sensing and Spatial Information Systems, since 2013. His research interests include LIDAR data applications in archaeology prospection and remote sensing as well as photogrammetric processing of UAV and aerial oblique images. He is a member of the Polish Society for Photogrammetry and Remote Sensing, and the Computer Applications and Quantitative Methods in Archaeology and Aerial Archaeology Research Group.