Rapid generation of global forest cover map using Landsat based on the forest ecological zones

Abstract. Easy access to Landsat datasets and the ever-lowering costs of computing make it feasible to monitor the Earth's land cover at the 30-m resolution of Landsat. However, rapidly producing forest cover products at large scales, such as intercontinental or global, is still a challenging task. Using the huge catalog of satellite imagery and the high-performance computing capacity of Google Earth Engine, we propose an automated pipeline for generating a 30-m-resolution global forest map from time series of Landsat images. We describe the methods used to create forest cover products at the global scale. First, we partitioned the landscape into subregions of similar forest type and spatial continuity. Then, a multisource forest/nonforest sample set was established for training the machine learning algorithm. Finally, a random forest classifier was used to obtain samples automatically, extract the characteristics of satellite images, and build the forest/nonforest classification models. Using Landsat 8 images from 2018 as a case study, a novel 30-m-resolution global forest cover (GFC30) map was produced. The results show that by the end of 2018, the total forest area of the world was 3.71 × 10^9 ha. The accuracy of GFC30 for 2018 was assessed using validation points generated by stratified random sampling of a MODIS land cover map (MCD12C1 product for 2012) and verified against high-resolution satellite imagery (e.g., Google Earth). According to the validation results, the overall accuracy of GFC30 for 2018 is 90.94%.


Introduction
Forests cover about 30.6% of the Earth's land area and constitute a critical terrestrial ecosystem. Forest cover change (FCC) is highly relevant to the global carbon cycle, water supplies, and biodiversity, and to understanding the rates and causes of land use change. As the global environment changes and population growth and human activities intensify, global forest cover (GFC) has decreased from 4128 × 10^6 ha in 1990 to 3999 × 10^6 ha in 2015. This net loss of some 129 × 10^6 ha of forest between 1990 and 2015, an area about the size of South Africa, represents an annual net loss rate of 0.13%.[1] Yet deforestation, or the conversion of forest to other land uses, is more complex than these net figures suggest. Forest cover mapping therefore has important practical significance and scientific value for capturing spatially and temporally detailed changes in global forests.[2] In recent years, there has been an increasing need for 30-m forest cover mapping because forest changes, especially those driven by anthropogenic factors, occur at fine scales.[3][4][5] Mapping forest cover and FCC are two of the most common uses of Landsat data. However, Landsat data have mostly been used for land cover mapping at local or national scales. Potapov et al.,[4] Song et al.,[5] and Huang et al.[6] are among the few researchers who have performed wall-to-wall change detection at national scales. The Forest Resources Assessment (FRA) of the Food and Agriculture Organization (FAO) of the United Nations carried out the first systematic estimates of global forest land use and change between 1990 and 2005.[7] More recently, Landsat samples, along with wall-to-wall datasets, have been used to monitor forest loss in the tropics and subsequently to map global forest loss and gain since the beginning of the 21st century.[8][9][10]
Nevertheless, these approaches do not meet the need for continental- or global-scale forest cover mapping with Landsat data, because data processing takes a long time and updates often take several years. Another major constraint is the very large computational and storage demand of processing huge volumes of high-quality data. A new generation of cloud computing platforms, exemplified by Google Earth Engine (GEE), now provides access to a huge catalog of satellite imagery together with global-scale analysis capabilities, making it possible to perform global-scale geospatial analysis efficiently without having to download and preprocess satellite images locally.[11] GEE is a cloud-based platform that provides easy access to high-performance computing resources for processing very large geospatial datasets, without the burden of managing that infrastructure. Midekisa et al.[12] used Landsat data on GEE to map land cover change over continental Africa and found that the platform overcame the computational challenges of handling big Earth data. Sidhu et al.[13] used GEE to detect land cover change in Singapore. Although these studies used GEE for regional or continental land cover mapping, no global-scale results have been reported. Nevertheless, GEE's imagery catalog and global analysis capabilities make efficient parallel processing, and therefore rapid global mapping, feasible.
Since mapping over large landscapes typically involves many satellite scenes and a complicated classifier, processing the entire globe as a single unit is very time-consuming. A common method is to stratify landscapes into subregions with similar biophysical and spectral characteristics. This approach is not new to remote sensing and has been widely used to improve accuracy and efficiency.[14][15][16][17] In 1990, FAO developed an ecological zone map covering only the tropics to present forest data; a Global Ecological Zone (GEZ) map has since been developed, which can be downloaded at Ref. 18. The zones of the GEZ map can broadly be interpreted as forest types (e.g., tropical rain forests and boreal forests). Each ecological zone can therefore be treated as an area of similar features and processed as a single unit.
In addition, a large number of high-quality sample points is essential for global products. Many land cover sample sets and related services have been publicly released worldwide (such as 30-m or coarser-resolution global land cover (GLC) maps and crowdsourced data). Making full use of these existing data is an effective way to obtain the large number of sample points needed at the global scale.
In this study, we test an automated approach for forest cover mapping with Landsat images in GEE. Using forest ecological zones (FEZs) as processing units, making full use of existing data to obtain global forest sample points, and drawing on the huge catalog of satellite imagery in GEE, we propose an automated pipeline for generating a 30-m-resolution global forest cover map. A novel 30-m-resolution global forest map for 2018 was produced and its accuracy was assessed.

Forest Ecological Zone Map
Since global mapping usually involves a great amount of data and a complicated classifier, areas with similar features are grouped into single processing units; the FEZ map provides one such grouping.[18] The underlying concept of FEZ delineation is a preclassification division of the landscape into a finite number of units that are relatively homogeneous with respect to landform, soil, vegetation, spectral reflectance, and image footprints, at a project scale that is computationally affordable.
The FEZs are based on several existing global maps, starting with the FAO GEZ map, which was designed for reporting forest and forest change statistics in the context of the FRAs.[19][20] The GEZ map is essentially a map of natural vegetation types. From the GEZ map, the main forest types were extracted, the other vegetation types were merged, and the zone boundaries were refined with reference to several global land cover and forest cover products. Both forest and nonforest areas were included to provide global coverage, yielding 45 FEZs identified by codes 1 to 45 (Fig. 1). Classifying by FEZ not only allows the classifiers to be optimized for each zone; because the zones are independent, they can also be processed in parallel to reduce processing time.
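Because each FEZ is processed independently, the per-zone work can be dispatched in parallel. The sketch below illustrates only that partition-and-parallelize pattern in plain Python; it is not GEE code. The record fields (`zone`, `label`, `ndvi`) and the nearest-centroid "model" in `train_zone` are hypothetical stand-ins for the paper's per-zone RF classifiers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_by_zone(samples):
    """Partition sample records by their FEZ code (1-45 in the paper)."""
    zones = defaultdict(list)
    for s in samples:
        zones[s["zone"]].append(s)
    return dict(zones)


def train_zone(zone_samples):
    """Hypothetical stand-in for a per-zone classifier: mean NDVI per class."""
    sums, counts = defaultdict(float), defaultdict(int)
    for s in zone_samples:
        sums[s["label"]] += s["ndvi"]
        counts[s["label"]] += 1
    return {label: sums[label] / counts[label] for label in sums}


def train_all_zones(samples, max_workers=4):
    """Train one model per zone; zones are independent, so they run concurrently."""
    zones = group_by_zone(samples)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with zone keys.
        models = pool.map(train_zone, zones.values())
        return dict(zip(zones.keys(), models))
```

On GEE the parallelism is handled server-side; the thread pool here is only a local illustration of why independent zones make the workload easy to split.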

Multisource Reference Data Integration
A large number of high-quality sample points is an important guarantee for extrapolating forest classification from national to global scale. At the global scale, a variety of reference datasets can support 30-m GLC mapping, such as existing coarser-resolution GLC maps and regional land cover data at 30 m or finer resolution. Online geospatial datasets and services (such as GEE, Map World, and OpenStreetMap), as well as land-cover-related services (such as Geo-Wiki and global crowdsourced data), also provide valuable external and interoperable ancillary information on forests.[21][22][23] Ancillary data are less uniform than satellite image data, varying in format, accuracy, and spatial resolution. To facilitate their incorporation into the classification and validation processes, all ancillary data were processed and checked carefully. After all reference data were combined, they were assigned to FEZs and checked by experts for balance between forest and nonforest categories. The samples were supplemented by interpretation of high-resolution images, such as those in Google Maps. With at least 1000 sample points ensured in each ecological zone, a total of 61,653 points were collected for training; their distribution is shown in Fig. 2.
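The two per-zone checks described above (a minimum of 1000 points per zone and a balance between forest and nonforest) can be expressed as a small audit. This is a minimal sketch assuming each sample is a hypothetical `(zone_code, label)` pair; the paper's actual checks were done by experts, and no balance threshold is stated, so the function only reports forest fractions.

```python
from collections import Counter


def check_zone_samples(samples, min_per_zone=1000):
    """Return (zones below the minimum, forest fraction per zone).

    `samples` is an iterable of (zone_code, label) pairs with label in
    {"forest", "nonforest"}; this record layout is an assumption.
    """
    samples = list(samples)  # iterate twice below
    totals = Counter(zone for zone, _ in samples)
    forest = Counter(zone for zone, label in samples if label == "forest")
    short = sorted(z for z, n in totals.items() if n < min_per_zone)
    forest_fraction = {z: forest[z] / n for z, n in totals.items()}
    return short, forest_fraction
```

For reference, the training totals reported in the next subsection are consistent: 23,569 forest + 38,084 nonforest = 61,653 points, which also exceeds the 45 × 1000 = 45,000 minimum implied by the per-zone rule.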

Forest Cover Mapping via Google Earth Engine
In this study, forest is defined as in the FAO FRA: land spanning more than 0.5 ha with trees higher than 5 m and a canopy cover of more than 10%, or trees able to reach these thresholds in situ.[24] An FEZ-based classification approach was developed (Fig. 3), which consists of three main steps: data preprocessing, feature selection and classification, and accuracy assessment. The method takes the FEZs as processing units, classifies each zone in parallel in GEE, and generates GFC products as output. GEE readily supports combining FEZ-based partitioning with its classification tools.
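The FAO thresholds can be written as a simple predicate, and the 0.5-ha area threshold can be translated into 30-m pixels. This is an illustrative sketch only: the paper does not say whether a minimum mapping unit was enforced, and the function names are hypothetical. The "able to reach these thresholds in situ" clause is not modeled.

```python
import math

PIXEL_AREA_M2 = 30 * 30  # nominal Landsat pixel footprint in square metres


def meets_fao_forest_definition(area_ha, tree_height_m, canopy_cover_pct):
    """FAO FRA thresholds: more than 0.5 ha, trees higher than 5 m, cover above 10%.

    Ignores the 'trees able to reach these thresholds in situ' clause.
    """
    return area_ha > 0.5 and tree_height_m > 5 and canopy_cover_pct > 10


def min_pixels_for_area(area_ha, pixel_area_m2=PIXEL_AREA_M2):
    """Smallest whole number of pixels whose total area strictly exceeds area_ha."""
    return math.floor(area_ha * 10_000 / pixel_area_m2) + 1
```

Since 0.5 ha = 5000 m² and one pixel is 900 m², `min_pixels_for_area(0.5)` gives 6: five pixels (4500 m²) fall short, six (5400 m²) exceed the threshold.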

Data preprocessing
In this study, Landsat 8 Operational Land Imager (OLI) surface reflectance data, computed using the Landsat Ecosystem Disturbance Adaptive Processing System, were used.[25][26] A cloud screening algorithm based on the quality assessment (QA) bands was applied to remove snow- and cloud-contaminated pixels from each Landsat image.[27] Annual composites were then produced by taking the per-pixel median of images acquired within the target year plus or minus one year.
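The QA masking and median compositing above can be sketched for a single pixel and band. The bit positions (3 = cloud shadow, 4 = snow, 5 = cloud) are an assumption based on the Landsat Collection 1 surface reflectance `pixel_qa` layout; other collections use different layouts, so they should be checked against the product guide for the data actually used. The tuple format of `observations` is hypothetical.

```python
from statistics import median

# ASSUMED bit positions from the Landsat Collection 1 SR `pixel_qa` band.
CLOUD_SHADOW_BIT = 3
SNOW_BIT = 4
CLOUD_BIT = 5
BAD_MASK = (1 << CLOUD_SHADOW_BIT) | (1 << SNOW_BIT) | (1 << CLOUD_BIT)


def is_clear(qa_value):
    """True if none of the cloud, cloud-shadow, or snow flags is set."""
    return (qa_value & BAD_MASK) == 0


def median_composite(observations, target_year):
    """Median of clear observations acquired within target_year +/- 1.

    `observations` is a list of (year, reflectance, qa_value) tuples for one
    pixel and one band. Returns None if no clear observation is in the window.
    """
    values = [
        refl
        for year, refl, qa in observations
        if abs(year - target_year) <= 1 and is_clear(qa)
    ]
    return median(values) if values else None
```

The median is robust to residual cloud or shadow that slips past the QA flags, which is a common reason for preferring it over the mean in multi-year composites.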
In addition to satellite image data, a digital elevation model (DEM) and global reference sample points grouped by FEZ were used as input data. The global training data consisted of 23,569 forest and 38,084 nonforest samples, which were collected from multiple sources and generated by stratified random sampling (Sec. 2.2). The DEM was used to derive terrain information.
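The text does not specify how terrain features were derived from the DEM. As one common choice, stated here as an assumption, slope can be estimated from central differences on a square grid; the function name and the 30-m cell size are illustrative, and a DEM in geographic coordinates would first need its cell sizes converted to metres.

```python
import math


def slope_degrees(dem, row, col, cell_size_m=30.0):
    """Slope (degrees) at an interior DEM cell using central differences.

    `dem` is a list of rows of elevations in metres on a square grid.
    """
    dz_dx = (dem[row][col + 1] - dem[row][col - 1]) / (2 * cell_size_m)
    dz_dy = (dem[row + 1][col] - dem[row - 1][col]) / (2 * cell_size_m)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
```

A surface rising 30 m per 30-m cell eastward has a gradient of 1 and therefore a slope of 45°.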

Feature selection and classification
In addition to the raw spectral bands, textural and terrain features were computed and combined as input features.
To better characterize the abundance and activity of green vegetation, a series of spectral vegetation indices based on the spectral properties of vegetation have been developed over the past decades. The normalized difference vegetation index (NDVI) and the enhanced vegetation index (EVI), two examples of such indices, perform differently for vegetation of different density. Because of its nonlinear ratio form, NDVI is sensitive to sparse and moderate vegetation but tends to saturate over dense vegetation cover. EVI, which is computed from surface reflectance bands 2 (blue), 4 (red), and 5 (near-infrared) of Landsat 8, performs better over dense vegetation because it reduces the influence of the canopy background. Recent studies have shown that vegetation indices are effective in distinguishing vegetated areas with specific geographical features on the basis of spectral characteristics.[28][29][30] However, spectral characteristics become less reliable at the global scale because forest types and conditions vary. Texture, which reflects the roughness of a surface, is therefore sometimes needed to further extract forest information, because other land covers such as shrubland and other plants have spectral characteristics similar to those of forest. Here, the gray-level co-occurrence matrix (GLCM) was used to extract the texture features needed to distinguish forest. According to Zhang et al.,[31][32][33][34] among the eight GLCM features calculated from the multispectral bands (bands 1 to 7) of Landsat 8, the mean is the most discriminative, and the optimal window size is 19. Furthermore, altitudinal zonation, slope, and aspect strongly influence vegetation distribution in certain high-altitude regions, such as southwestern Tibet. This study therefore used the DEM and its derivatives as input features in those regions.
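The spectral and texture features discussed above can be sketched per pixel. NDVI and EVI use their standard definitions (EVI with the usual coefficients G = 2.5, C1 = 6, C2 = 7.5, L = 1) on surface reflectance scaled to 0 to 1. The GLCM "mean" follows the common definition μ = Σ i·P(i, j) for a single pixel offset with a symmetric matrix; the quantization levels, offsets, and direction averaging used in the paper are not stated, so those choices are assumptions. In the paper's setting the window would be the 19 × 19 neighbourhood of each pixel.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)


def evi(nir, red, blue):
    """Enhanced vegetation index with standard coefficients.

    For Landsat 8 OLI: blue = band 2, red = band 4, NIR = band 5.
    """
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


def glcm_mean(window, levels, offset=(0, 1)):
    """GLCM mean of a quantized window for one offset (symmetric GLCM).

    `window` holds integer gray levels in [0, levels).
    """
    dr, dc = offset
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(window), len(window[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = window[r][c], window[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
    total = sum(map(sum, counts))
    # Normalize to probabilities P(i, j) and take the expected row gray level.
    return sum(i * counts[i][j] for i in range(levels) for j in range(levels)) / total
```

For the window `[[0, 0], [1, 1]]` with a horizontal offset, the symmetric GLCM has equal mass on (0, 0) and (1, 1), so its mean is 0.5.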
The random forest (RF) algorithm provided by GEE was used to train the classifier. An RF classifier with more decision trees usually gives better results but at a higher computational cost. Because the input features were selected for their high sensitivity to forest cover (Ref. 31), we limited the number of decision trees to 100 as a trade-off between accuracy and efficiency.
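GEE's RF implementation is not reproduced here. As a hedged, toy illustration of the ensemble idea and of why cost grows with the tree count, the sketch below builds a random forest from depth-1 trees: each tree is trained on a bootstrap sample using a random subset of √p features, and predictions are a majority vote. The depth-1 restriction and all function names are simplifications chosen for brevity, not the paper's settings beyond `n_trees=100`.

```python
import math
import random
from collections import Counter


def _majority(labels):
    return Counter(labels).most_common(1)[0][0]


def train_stump(X, y, features):
    """Best single-threshold split over the candidate features (fewest errors)."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            left_lab = _majority(left)  # never empty: t is an observed value
            right_lab = _majority(right) if right else left_lab
            errors = sum(l != left_lab for l in left) + sum(r != right_lab for r in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_lab, right_lab)
    return best[1:]  # (feature, threshold, left label, right label)


def train_forest(X, y, n_trees=100, seed=0):
    """Bagged, feature-subsampled stumps: a depth-1 random forest."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    m = max(1, int(math.sqrt(n_features)))  # features tried per tree
    forest = []
    for _ in range(n_trees):  # training cost is linear in n_trees
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        features = rng.sample(range(n_features), m)
        forest.append(train_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return forest


def predict(forest, row):
    """Majority vote over all trees."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return _majority(votes)
```

Prediction cost also scales with the number of trees, since every tree votes for every pixel, which is why capping the ensemble size matters at global scale.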

Accuracy assessment
To ensure a rigorous and complete accuracy assessment of the GFC map, a stratified random sampling method was used to generate validation points. To make these points represent the major biomes identified by Olson et al.,[19] we used the MODIS land cover map [MCD12C1 product for 2012, University of Maryland (UMD) scheme] to partition the Earth's land surface into 14 land cover classes. The forest classes were then merged into five categories based on their similarities (evergreen broad-leaf forest, evergreen needle-leaf forest, deciduous broad-leaf forest, deciduous needle-leaf forest, and mixed forest), and the nonforest classes were grouped into four categories (shrub, meadow, farmland, and water body). A total of 1500 points were randomly selected for each class, and the reliability of these validation points was ensured by careful checking.
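Stratified random sampling with equal allocation per class, as described above, can be sketched as follows. The 1500-per-class figure comes from the text; the class keys, the `(row, col)` location format, and the handling of classes with fewer candidates than requested are assumptions for illustration.

```python
import random


def stratified_sample(pixels_by_class, n_per_class, seed=0):
    """Draw up to n_per_class locations at random, without replacement, per class.

    `pixels_by_class` maps a class name to a list of candidate (row, col)
    locations, e.g. taken from a reclassified MODIS land cover map.
    """
    rng = random.Random(seed)
    sample = {}
    for cls, pixels in pixels_by_class.items():
        k = min(n_per_class, len(pixels))  # small strata contribute all pixels
        sample[cls] = rng.sample(pixels, k)
    return sample
```

Equal allocation guarantees that rare classes are represented, at the cost of needing area weights if class-specific results are later combined into an area-weighted accuracy estimate.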

Forest Cover Map
Using the methodology and reference data described in Sec. 2.3, together with the database and RF classifier provided on GEE, a 30-m-resolution GFC map for 2018 (GFC30,2018) was generated (Fig. 4). It is projected in a geographic (lat/long) coordinate system at 0.00025 deg (∼30 m) resolution, with the WGS84 horizontal datum and the EGM96 vertical datum. The product consists of 10 deg × 10 deg tiles spanning 180°W to 180°E and 80°N to 60°S, which can be freely downloaded from Ref. 35.
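The grid specification above implies some useful arithmetic: each 10-deg tile is 40,000 × 40,000 pixels, the 80°N to 60°S extent holds at most 36 × 14 = 504 tile slots (ocean-only tiles may not be distributed), and a 0.00025-deg pixel is about 27.8 m wide east-west at the equator, narrowing with latitude. The helper names and the upper-left-corner convention below are assumptions; the actual tile naming is documented at Ref. 35 and is not reproduced here.

```python
import math

PIXEL_DEG = 0.00025
TILE_DEG = 10
EQUATORIAL_RADIUS_M = 6_378_137.0  # WGS84 semi-major axis


def tile_upper_left(lat, lon):
    """Upper-left corner (lat, lon) of the 10 x 10 deg tile containing a point."""
    return (math.ceil(lat / TILE_DEG) * TILE_DEG, math.floor(lon / TILE_DEG) * TILE_DEG)


def pixel_width_m(lat):
    """Approximate east-west ground width of one pixel at a latitude (spherical)."""
    return math.radians(PIXEL_DEG) * EQUATORIAL_RADIUS_M * math.cos(math.radians(lat))


n_tiles = (360 // TILE_DEG) * ((80 + 60) // TILE_DEG)  # 36 x 14 tile slots
pixels_per_side = round(TILE_DEG / PIXEL_DEG)  # pixels along each tile edge
```

At 60° latitude the east-west pixel width halves to roughly 13.9 m, so pixel-count statistics on this grid should be converted to area with a latitude-dependent factor.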
According to the statistics, the total global forest area in 2018 is about 3815 × 10^6 ha, or ∼25.60% of the total land area (about 15 × 10^9 ha). The global forest distribution is shown in Fig. 4. Global forests are unevenly distributed, with a zonal pattern along latitude. The major forest regions include South America, the tropical rainforests of Southeast Asia and Central Africa, the boreal forests of Russia and Canada, and the Pacific and Atlantic coasts. This distribution is related to the pattern of global climate zones.
The world is divided into four climatic zones: tropical, subtropical, temperate, and northern frigid. Global forest is distributed unevenly among these zones. In descending order of forest area, they are tropical, northern frigid, temperate, and subtropical (Table 1 and Fig. 5). The tropical zone has the largest forest area, 1.847 × 10^9 ha, accounting for 48.40% of the global forest area, or almost half of the GFC. The subtropical zone has the smallest forest area, 383 × 10^6 ha, accounting for 10.12%.
The regional distribution of global forest is strongly related to temperature and precipitation. Generally, higher temperatures lengthen the growing season of vegetation, and abundant precipitation favors vegetation growth.
At the continental scale, forest distribution also differs greatly among the six continents (Fig. 6). Table 2 lists the forest area of each continent and its proportion of the global forest area. Asia has the largest forest area, mainly because of the extensive boreal forests of Russia. Owing to its relatively small land area, Oceania has the least forest. South America has the highest forest cover percentage, mainly because of the Amazon, the world's largest contiguous tropical rainforest.

Validation Result
A total of 39,900 points generated as described in Sec. 2.3.3 were used for the final validation; their distribution is shown in Fig. 7. According to the validation results, the overall accuracy (OA) of GFC30 (2018) is 90.94%. To further analyze the accuracy of the product, GFC30 was also validated separately for each of the six continents (Table 3). Accuracy was relatively high for Oceania, Europe, and South America, at 95.59%, 94.25%, and 91.79%, respectively. The main reason is that these areas have high image coverage, so a large amount of high-quality data is available. South America, for example, has dense tropical rainforest, high forest cover, and little seasonal variation; with few constraints on acquisition dates, the data for this region are of high quality.
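Overall accuracy is the fraction of validation points whose map label agrees with the reference label. The sketch below computes it from a confusion matrix, together with producer's and user's accuracy, which are standard companions although the text reports only OA and per-continent accuracy. The counts in the usage check are invented for illustration and are not the paper's results.

```python
def accuracy_metrics(confusion):
    """Overall, producer's, and user's accuracy from a square confusion matrix.

    confusion[i][j] = number of validation points whose reference class is i
    and whose mapped class is j.
    """
    k = len(confusion)
    total = sum(map(sum, confusion))
    correct = sum(confusion[i][i] for i in range(k))
    # Producer's accuracy: correct / reference total (row sum).
    producers = [confusion[i][i] / sum(confusion[i]) for i in range(k)]
    # User's accuracy: correct / mapped total (column sum).
    users = [confusion[j][j] / sum(confusion[i][j] for i in range(k)) for j in range(k)]
    return correct / total, producers, users
```

For a hypothetical two-class matrix `[[90, 10], [5, 95]]` (forest, nonforest), OA is 185 / 200 = 0.925.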
However, accuracy for Africa was lower than expected. One important reason is that trees in the savanna, the transition zone between forest and grassland in northern and southern Africa, are unevenly distributed, which makes canopy cover difficult to determine. In addition, savanna regions have distinct wet and dry seasons, so the canopy cover of deciduous forests changes markedly through the year. The selected images may therefore not represent forest conditions well, which leads to classification errors and lower accuracy.

Conclusions
Forests are essential for human well-being, sustainable development, and the health of the Earth system. Rapidly produced 30-m-resolution global forest cover products can provide an important source of information and a reference for studying the status and changes of forests.
In this study, we propose an automated pipeline for generating a 30-m-resolution global forest cover map using GEE. Unlike previous GFC products, the proposed FEZ-based approach divides the globe into 45 FEZs, increases production speed and efficiency by using the GEE platform, and applies the RF algorithm to the zones in parallel. The method and results offer experience in big-data analysis and technical support for producing and analyzing land cover products.
Based on the proposed method, a novel 30-m-resolution GFC map for 2018, with an OA of 90.94%, was produced. GFC30 provides reliable data for analyzing forest cover at different scales, including the spatial distribution of global forest, the forest cover of each continent, and the forest cover of areas of particular concern, such as the Amazon River Basin, the Congo River Basin, and Southeast Asia. In addition, the statistics and conclusions derived from GFC 2018 can support information services and decision-making by the relevant agencies. However, the GFC 2018 product has weaknesses. In Africa, cloud and rain patterns limit the availability of high-quality optical images, so the product is less accurate there. Freely available radar data from the Sentinel-1 satellites could be used to improve the product's quality. In addition, this study used a large number of training samples; future research should focus on achieving high-quality products with fewer sample points.
In further research, building on GFC 2018, our research group will continue to develop FCC products, diversify forest cover products, and analyze trends in forest change, forest disturbance, and their influence on ecosystems.