AI-ForestWatch: semantic segmentation based end-to-end framework for forest estimation and change detection using multi-spectral remote sensing imagery

Abstract. Forest change detection is crucial for sustainable forest management. The changes in the forest area due to deforestation (such as wild fires or logging due to development activities) or afforestation alter the total forest area. Additionally, it impacts the available stock for commercial purposes, climate change due to carbon emissions, and biodiversity of the forest habitat estimations, which are essential for disaster management and policy making. In recent years, foresters have relied on hand-crafted features or bi-temporal change detection methods to detect change in the remote sensing imagery to estimate the forest area. Due to manual processing steps, these methods are fragile and prone to errors and can generate inaccurate (i.e., under or over) segmentation results. In contrast to traditional methods, we present AI-ForestWatch, an end to end framework for forest estimation and change analysis. The proposed approach uses deep convolution neural network-based semantic segmentation to process multi-spectral space-borne images to quantitatively monitor the forest cover change patterns by automatically extracting features from the dataset. Our analysis is completely data driven and has been performed using extended (with vegetation indices) Landsat-8 multi-spectral imagery from 2014 to 2020. As a case study, we estimated the forest area in 15 districts of Pakistan and generated forest change maps from 2014 to 2020, where major afforestation activity is carried out during this period. Our critical analysis shows an improvement of forest cover in 14 out of 15 districts. The AI-ForestWatch framework along with the associated dataset will be made public upon publication so that it can be adapted by other countries or regions.


Introduction
Second, deforestation affects biological diversity, forest habitat, and increases soil erosion. Despite continuous efforts by the world community to curb deforestation, it continues unrelenting in the world, posing a serious threat to human livelihoods, eco-systems, and global climate. There is an urgent need to curb deforestation using automated techniques and tools to detect changes more frequently in the forests for sustainable management of forest inventory.
Historically, the forests are monitored by manual and/or aerial field surveys. These surveys are time-consuming and requires a lot of financial and manpower resources. Due to this, these surveys are conducted after multiple years and in case of developing or under-developed countries after several years. To automate the forest inventory and data management processing, the Food and Agriculture Organization of the United Nations developed an open source tool Open Foris. 2 However, field workers are still required to manually collect the data from forest locations, which are later processed automatically. In the context of automating the monitoring process, there are number of existing approaches that provide a solution to monitor forests either globally [3][4][5] or regionally 6-8 using remote sensing imagery. However, these approaches are limited to bi-temporal change detection or uses hand crafted feature extraction for statistical machine learning techniques to detect changes. The statistical machine learning algorithms are not designed to process high-dimensionality data and huge amount of training samples generated by the remote sensors. In contrast to statistical machine learning algorithms, deep learning-based algorithms can easily manage this huge amount of data due to parallelism offered by the graphical processing units. Furthermore, with the advances in remote sensing and artificial intelligence fields, the whole forest estimation and change detection pipeline can be completely automated. Specifically, this automated chain will be very useful for countries with limited resources as they do not have to rely on expensive tools and surveys to get initial estimates of forest area.
In order to resolve the above-mentioned problems, in this work, we present AI-ForestWatch, an end to end framework for continuous monitoring of the forests by utilizing advancements in the remote sensing and neural network (NN) domains. Our approach is aimed at reducing the financial and manpower burden on the foresters and provides a framework for initial estimations on multi-temporal changes in the forest cover. Specifically, we aim to exploit the available spatiotemporal multi-spectral Landsat satellite data to detect the forest area and changes. The main contributions of this work are as follows.
• An end-to-end semantic segmentation-based framework has been proposed for detailed analysis of forests cover change by utilizing the open-source frameworks (i.e., Pytorch for implementation), tools (i.e., QGIS and Google Maps for dataset preparation), and remote sensing data (i.e., Landsat-8). • The analysis is completely data driven and is done by employing deep learning-based semantic segmentation on extended Landsat-8 imagery to achieve pixel-wise labeling of spatiotemporal multi-spectral images. • Detailed exploration and comparison of our semantic segmentation-based approach with other statistical machine learning algorithms (i.e., random forest, decision tree, and logistic regression) for land cover classification has been carried out using Landsat-8 imagery as input and generated digitized maps as ground truth. • Finally, all the analysis including the source code and the associated dataset will be made public on GitHub (https://github.com/dll-ncai/AI-ForestWatch) upon publication.
The remainder of this paper is organized as follows. The related literature in terms of land cover classification and forest change detection is presented in Sec. 2. Section 3 gives a detailed explanation on the study area, background, and proposed approach for forest change detection using semantic segmentation. Section 4 summarizes the obtained results on basis of our implementation. Section 5 analyzes the obtained results and presents detailed discussion on the AI-ForestWatch framework. Finally, conclusions and outlook to the future work are presented in Sec. 6.

Related Work
The methods related to forest change detection can be divided into two categories: 9-11 (1) lowlevel feature-based methods and (2) object-based methods. The former employ pixel-based statistical inference uses multi-spectral images. 12,13 They perform pixel-wise contrast comparison to detect changes in temporal sequence of images. Few other approaches use spectral mixture analysis, 6 vegetation index, 14 and change vector analysis 15 -based statistical indices for analyzing Landsat time-series imagery for forest change detection. Alternatively, object-based methods 9,10,12 rely on incorporating contextual knowledge by processing similarity-based (e.g., spatial and appearance) homogeneous pixels together. 16 Typically, this is achieved by employing unsupervised machine learning techniques for object-based approaches via prior segmentation to obtain pixel clusters. These approaches perform better as they overcome spatial and spectral variations due to noise or georeferencing effects but suffer from inaccurate (under or over) segmentation results. 9 Shimada et al. 5 and Hansen et al. 17 provided global forest cover change statistics while using ALOS PALSAR data and Landsat data, respectively. Hansen et al. provided forest loss statistics at 30 m per pixel spatial resolution. On the other hand, Shimada et al. provided forest cover maps at a spatial resolution of 25 m until 2018.

Machine Learning-Based Forest Change Detection
Machine learning approaches based on classifiers, such as support vector machines (SVM), 18 random forest classification, 19 and decision trees 20 have also been employed for land cover classification in both pixel and object-based categories. For instance, Zhao et al. 21 compared four machine learning classifiers including classification and regression tree, SVM, artificial NN, and random forest to estimate forest parameters of black locust plantations on the Loess Plateau. Qamer et al. 22 performed a study for analysis of forest cover change in Western Himalaya, Pakistan. In this study, six forest and eight non-forest classes are labeled using GPS-based ground truth measurements, which are used to train a semiautomatic supervised classifier. They considered forest to nonforest transition to be deforestation and dense to sparse forest to be degradation or cover loss of the forests. The results depicted a relative loss in forest cover in the Khyber Pakhtunkhwa province regions with annual forest cover change rate of −0.42% from 1990 to 2010. Similarly, forest cover change study has been carried out by Souza et al. 6 for the Brazilian Amazon for the 2000 to 2010 period using Landsat imagery. Image registration and spectral mixture analysis were performed as preprocessing steps. With the main aim of generating large scale maps of forest change harnessing the power of cloud computing, Lehmann et al. 13 defined forest as tree cover with atleast 20% of canopy reaching at least 2-m height and used National Carbon Accounting System and National Forest Trend datasets for forest cover change detection in the Australian continent. More recently, Voight et al. 23 analyzed forest cover change using Landsat-8 imagery using a multi-layer perceptron for training and using annotations by manual labeling of images via Google Earth Engine to forecast magnitude and spatial patterns of deforestation.

Forest Change Detection Using Deep Learning
In the context of deep learning, the problem of forest cover classification is divided into two categories: one is patch-based classification that divides the full size multi-spectral images into spatially equal sized patches, i.e., 64 × 64 pixels and a deep learning convolution NN model assigns a single class to the whole patch based on the most dominant distribution of the class. 24,25 Deep neural networks (DNNs), such as ResNet, 26 GoogleNet, 27 and other similar topologies have been shown to perform well on this problem but because a whole patch of multiple pixels is labeled with just one class, they produce low-resolution under-segmented output maps. Another method to detect forest change is pixel-wise classification using encoder-decoder topologies. 28 A similar approach is used by de Bem et al. 28 to map deforestation in the Brazilian Amazon using Landsat data. However, the study focused only on mapping the deforestation by channel wise fusing 1 year apart images between 2017 and 2018 and between 2018 and 2019. This technique is different from our approach as we use only 1 year images with extended vegetation indices for estimation of the forest cover and generate change maps for all the other years (i.e., 2014 and 2016 to 2020). Furthermore, in this work, we use UNet-based semantic segmentation on patches of the input multi-spectral Landsat-8 imagery to perform pixel-wise classification. The advantage of using this approach is that the high resolution of the image can be fully exploited to generate detailed forest cover maps at the same spatial resolution.

Study Area
In this study, we selected the KP province, which initiated the Billion Tree Afforestation Project (BTAP) in 2014 to restore 350,000 hectares of deforested land in northern Pakistan. BTAP project is also a winner of the international Bonn challenge 29 due to its environmental and social impact. Before this project, the forest area of Pakistan was <2% of its total land area. This percentage is dangerously low when compared with other parts of the world and 25% recommendation of the united Nations. This afforestation project has considerably improved the forest cover of Pakistan, which will help to reduce the chances of floods, increase the precipitation, provide a guard against soil erosion, and help to control the impacts of the climate change.
We performed our analysis for 15 out of 25 districts of the BTAP regions where most of the afforestation drive was carried out. Figure 1 shows the districts used for this work as shaded regions. The remaining districts showed <1% forest cover for the year 2015 and were not considered for this study. Additionally, Chitral and Upper-Dir districts were also not considered for this study, since most parts of these districts are covered with snow and we were unable to label any useful forest data points in this region. Although, the paper maps labeled forest in these regions. However, it would only introduce noise in the training dataset and can produce wrong segmentation results. The BTAP study regions along with their coordinates are listed in Table 1.

Datasets
Satellite images are available in analysis-ready formats from multiple sensors, i.e., MODIS, Landsat that allow full coverage of the globe at very high spatial and temporal resolution. The Landsat program was launched in 1972 and provides the longest continuous coverage of the whole world through its eight satellite programs. 30 In this study, we have used Landsat-8 top of atmosphere imagery, which is available from 2013 and has a temporal resolution and spatial resolution of 16 days and 30 m per pixel, respectively. For our explorations, we selected images for each year with <10% cloud cover [ Fig. 2-(1)] and the set of these less cloudy images were then passed through a median filter that calculated the median value for each pixel location for each band of the image to generate clean composites representing the land cover for 1 year. The resulting image served as input ] to our framework for forest estimation and change detection.
The foresters of the KP province maintain the land cover maps of the BTAP districts in paper printed form. These maps are publicly available for the year 2015 31 with 10 labeled classes. For our analysis, binary maps with only forest and non-forest classes are created by fusing all the non-forest classes together. Since, no digital version of the maps are available, we used georeferencing technique to digitize these maps. This technique is discussed in detail in the next section.
Digitization of available land cover maps for BTAP districts. The BTAP land cover maps of year 2015 have 10 labeled classes such as alpine pasture, shrub and bushes, forests, agriculture land, and others. Since our aim is to detect forests in these maps, all classes but forest are fused together to form a single non-forest class. We utilized Google Earth Engine 32 for extracting Landsat-8 imagery and an open-source tool, QGIS, for performing this digitization task [ Fig. 2- (3)]. The digitization process consists of the following steps.
1. The paper printed BTAP district land cover maps were used as reference for digitization. A sample publicly available land cover map for Battagram district, KP, is depicted in Fig. 3. We only labeled two classes for digitization, forest and non-forest classes, by combining all non-forest classes into one. 2. The available land cover maps were in the form of images that were georeferenced against Landsat-8 imagery by precisely selecting distinct features such as rivers bends and lakes, as our ground control points (GCP) in the land cover maps and Landsat-8 images. 3. Thin plate spline was utilized as the transformation to map the land cover map on top of Landsat-8 imagery via the labeled GCPs. For all the maps digitized, the georeferencing error was <1 × 10 −13 . Step-(1) to step-(5) depict the digitization process of the available year 2015 maps.
Step-(6) shows the UNet training and the step-(7) and step- (8) show the generation of forest estimation and change detection, respectively.
4. The landcover maps were already color coded with dark green color used for forest pixels. We converted all dark-green pixels in the digitized maps to forest labels and all the other pixels to non-forest. This resulted in a georeferenced digitized map. 5. Since shapefiles of districts for which the maps were digizited were available, we multiplied all the pixels outside the district boundaries with zero to create NULL or invalid pixels outside district boundaries. This was necessary since the labeled maps only contained valid forest/non-forest pixel labels for all locations inside district boundaries, so outer pixels must be considered invalid in the digitized map as well. It is important to mention at this point that the actual Landsat-8 imagery is at 30 m per pixel spatial resolution which makes district images large in size (several thousands by several thousands pixels in every district) but the available labeled land cover maps are low-resolution images, which are georeferenced on the satellite images. This results in some differences in the forest cover percentages (FC%) labeled in the forest cover maps and our digitized maps and cannot be avoided due to the limitation of our ground truth.
The results at different steps of the above-mentioned process are shown in Fig. 4. These digitized maps are in the form of binary images with each pixel labeled as 2 or 1 to indicate forest/non-forest class for each pixel while 0 indicates an invalid/NULL pixel. NULL pixels are taken for training but test scores are reported only for the forest/non-forest classes as a binary classification task. A similar procedure is adopted for all the selected fifteen districts of the KP BTAP districts. These digital maps are used as target annotations for the rest of our research. It is important to mention here that no numerical assessment of the accuracy of these maps is performed since the available ground truth, the actual land cover maps are paper printed maps.

Data preprocessing
Landsat-8 satellite has a temporal resolution of 16 days, which means that every location on Earth is captured by its multi-spectral sensor camera roughly 22 times in a year. Each of these images has a cloud cover score, which can be used to filter them for least cloud cover. For our analysis, we filtered images for each year with <10% cloud cover [ Fig. 2-(1)] and the set of these less cloudy images were then passed through a median filter that calculated the median value for each pixel location for each band of the image to generate clean composites representing the land cover for 1 year. For example, for year 2015, we first filtered the images that had <10% cloud and then for each of the 11 original bands of these images, for every single pixel, median value was calculated across these images to generate clean composites. The indices explained above were calculated on these composites and appended with them afterward. This median filter allows us to minimize the atmospheric and seasonal variation effects and generate single images that represent one whole year. This step was performed to generate clean images for all years from 2014 to 2020 throughout this work [ Fig. 2-(2)]. For some districts, 10% cloud cover criteria was relaxed to 20% if clear images were not available for one whole year.
Landsat-8 provides imagery in 11 spectral bands 30 as presented in Table 2. Considering the high temporal resolution of 30 m, the available spectral bands are appended with seven more It has also been processed using its shape file to nullify all pixels outside the district boundary in order to extract only valid pixels for training. The dark green pixels are color coded in the ground truth map indicating forest pixels. (d) Final digitized map of district Abbotabad after converting all dark green pixels to forest label, shown in green, and all the other non-forest classes into non-forest label, shown in red. The NULL pixels are shown in black which includes all pixels outside the district boundary.
vegetation indices in the form of seven bands that are well-known to provide useful information for land cover classification. 33 Details of these appended seven indices are given below. The Bx represents band number of the Landsat-8 image (x ¼ 1; 2; : : : ; 11), which are required to calculate the appended index value.

Normalized difference vegetation index. Normalized difference vegetation index
(NDVI) is a commonly used index for vegetation detection. 34 It exploits the chlorophyll's absorption property of red wavelength and reflectance of infrared wavelength. Its value is always between −1 and þ1, where close to −1 means water, around 0 means barren land and close to þ1 means healthy vegetation. It is calculated as given under Soil adjusted vegetation index. Soil adjusted vegetation index (SAVI) helps to rectify the effects of surface reflectance in cases where there is low vegetation cover in the area of interest (<40%). 36 The reflectances of red and infrared wavelengths of light affect calculation of NDVI and SAVI modifies the original NDVI for Landsat-8 as given as follows: E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 3 ; 1 1 6 ; 1 4 3 Modified soil adjusted vegetation index. Modified soil adjusted vegetation index (MSAVI) accommodates for the soil brightness in NDVI in areas of low vegetation cover. 37 Although, similar to SAVI, MSAVI also depends on the red (B4) and near-infrared (NIR) (B5) bands of the multi-spectral imagery. However, the MSAVI minimizes the effects of bare soil on the SAVI and in the red-NIR reflectance space, the characteristics of MSAVI are different from SAVI. 38 MSAVI can be calculated using the equation given as follows: E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 4 ; 1 1 6 ; 6 9 9 The work from Barati et al. 39 presents the accuracy of different vegetation indices for estimation of vegetation cover including SAVI and MSAVI. Olmos-Trujillo et al. 40 used vegetation indices (both SAVI and MSAVI) to perform a geostatistical analysis of rainfall and temperature.
Normalized difference moisture index. Normalized difference moisture index (NDMI) is a ratio calculated using the NIR and short-wavelength infrared (SWIR) bands. It is used for calculating the water content in vegetation 41 using the following equation: E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 5 ; 1 1 6 ; 5 7 2 Normalized burn ratio. Normalized burn ratio (NBR) is used for detecting the presence and severity of burned areas. 42 The following expression is used for calculating NBR: E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 6 ; 1 1 6 ; 4 9 9 Normalized burn ratio-2. Normalized burn ratio-2 (NBR2) modifies NBR to estimate water content in burned areas and is helpful for burned areas recovery studies. 43 It is calculated as follows: E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 7 ; 1 1 6 ; 4 1 5 For detailed exploration, different configurations with respect to number of input channels were explored. Machine learning and deep learning algorithms were trained and tested with different numbers of input channels to select an ideal model for the task of forest estimation and change detection. These configurations are as follows.
• RGB. Only three red (R), green (G), and blue (B) channels of the Landsat-8 image are provided as input. For instance, in case of RGB input configuration, the input image size for our network is 128 × 128 × 3. • Full-spectrum. Complete 11 bands (presented in

Forest estimation using semantic segmentation
The AI-ForestWatch framework has a two-stage forest change detection pipeline. First, pixelwise segmentation is performed on a trained model from Fig. 2-(6) to generate the binary forest/ non-forest cover maps [ Fig. 2-(7)]. Second, pixel-wise difference [ Fig. 2-(8)] between two consecutive binary forest/non-forest cover maps generates a change map. For pixel-wise segmentation, a UNet topology as shown in Fig. 5 with encoder and decoder 44 stages is used. An encoder is a fully convolution feature extractor network. The operators used in this encoder are convolution, batch normalization, activation, and pooling. Four of such modules comprise the encoder of our UNet topology, and it is known as a four-stage encoder. The other part of the UNet, the decoder, is the mirror image of the first half, a four-stage decoder. Each module in a decoder consists of transposed convolution operation followed by a copy-and-fuse connection between the corresponding stages of encoder and decoder. It is followed by convolution, batch normalization, activation, and pooling layers. This concatenation operation of encoder and decoder outputs allows the decoder to utilize the features extracted by the encoder at subsequent stages. The detailed network topology is shown in Fig. 5. The encoder (dotted square on the left) is where the input tensor is down-sampled and encoded into a smaller dimensional vector. The decoder (dotted square on the right) decodes this vector and produces full resolution segmentation for the input image. Each decoder module concatenates its output with the encoder output at the corresponding stage and the final decoder tensor is passed through a convolution again followed by a softmax layer, which normalizes the output probabilities for both classes at each pixel.
Implementation and training. The deep learning part of the proposed AI-ForestWatch framework is implemented in Python using the PyTorch framework. The district images at 30 m per pixel spatial resolution are quite large in volume so we divide them into smaller patches of size 256 × 256 × C pixels, where C represents the number of channels of the input image, i.e., C ¼ 3 for RGB, C ¼ 11 for full spectrum, C ¼ 7 for vegetation indices, and C ¼ 18 for augmented input image. All district images are of different sizes, with an average image size of 4000 × 4000 × C pixels, a dataset of 3375 ðð 4000 256 × 4000 256 Þ × 15Þ image patches for training and testing are created. 80% of these patches were randomly chosen for training and while 10% for validation and 10% for testing. Table 3 presents the number of patches used for each of the aforementioned sets. The random selection of the patches for each set results in distinct parts of districts going into training, validation, and test sets, so that we can capture maximum distribution. The network itself accepts inputs of size 128 × 128 × C, which are randomly cropped from these patches at training time [ Fig. 2-(6)]. Furthermore, for fast training of the model, the encoder layers are initialized by pretrained weights of the VGG 45 layers. At inference, these patches are generated directly at 128 × 128 × C dimensions to cover the whole image in a raster scan manner. The network has been trained using back propagation with the RMSprop optimizer. The learning rate was scheduled to drop exponentially from an initial value of 1 × 10 −06 . The model was trained for 200 epochs with batch size of 64. The loss gradients were clipped at 0.05 to avoid exploding gradients. The whole training and testing are performed for the year 2015 since only the annotated maps are available. We also experimented with different loss functions including cross-entropy loss function, focal loss, 46 and dice loss 47 in both weighted and unweighted settings. We found that the unweighted focal loss outperforms the other losses and hence it was adopted for this work.

Change detection statistics
Since the change is detected between consecutive years, our metrics are also based on two images under inspection. We combine these metrics temporally to assess the overall change trends. When two consecutive forest cover maps generated by our classification model are subtracted, we get a pixel-wise change map. Based on this, we can calculate the net change, gain, and loss in our regions of interest. These metrics are explained as follows.
Forest cover percentage. This is calculated on the individual forest cover maps generated by the classification model. This is the ratio of the number of forest pixels to the sum of forest and non-forest pixels in the classified image. This gives a percentage of the forest cover in the district under inspection but no information about change. It is calculated for all the years in the temporal series of images as follows: E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 8 ; 1 1 6 ; 3 3 9 FC% ¼ forest pixels forest pixels þ non-forest pixels : (8) Gain percentage. Gain percentage (GP%) is calculated on the change map and it is the ratio of the gain pixels ð−1Þ to the total number of change map pixels (−1, 0, and 1): E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 9 ; 1 1 6 ; 2 6 5 GP% ¼ gain pixels gain pixels þ loss pixels þ no change pixels : (9) Loss percentage. Loss percentage (LP%) is calculated on the change map and it is the ratio of the loss pixels ðþ1Þ to the total number of change map pixels (−1, 0 and 1): E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 1 0 ; 1 1 6 ; 1 9 2 LP% ¼ loss pixels gain pixels þ loss pixels þ no change pixels : Effective forest cover change percentage. Effective forest cover change percentage (ECP%) is also calculated on the change map to calculate the overall change in forest percentage with respect to the former year's forest percentage. It is calculated as follows:

Results
In this section, we present detailed evaluations of the AI-ForestWatch framework. We present comparisons of the statistical ML-based classification algorithms against UNet semantic segmentation for all the configurations. Furthermore, we also present forest cover and change statistics in the study area from years 2014 to 2020.

Comparison with Classification Algorithms
As already discussed in Sec. 2, statistical machine learning algorithms are commonly employed to solve the land cover classification problem. In addition to the semantic segmentation-based approach, three machine learning algorithms, decision tree, logistic regression, and random forest were also explored in this study. For these machine learning algorithms, 5000 points were chosen for each classifier as training input and 5000 for testing. The Landsat-8 was used as input while the digitized data as the ground truth. Furthermore, extensive exploration was conducted on selection of ideal model by considering different input channel configurations for all abovementioned machine learning and deep learning algorithms.
The results of explorations of the algorithms for RGB, full-spectrum, vegetation indices, and augmented are presented in Tables 4-7, respectively. Decision tree and logistic regression showed the worst results in most configurations. Quantitatively, semantic segmentation-based approach outperformed statistical machine learning algorithms in terms of accuracy. However, this measure should be carefully considered as we are performing binary classification and there exists a class imbalance between the forest and non-forest classes. Due to this reason, Table 4 Test accuracy and per-class F 1 scores comparison of the baseline ML-based land cover classification algorithms with UNet-based semantic segmentation approach for RGB channels.  combination of F1 score, precision, and recall along with accuracy are used as standards to evaluate the performance of the models. Among semantic segmentation models, although, the RGB model yielded the best accuracy (83.88%). However, it lacked in terms of the F1 score (0.52) and recall (0.41) for the forest class. The vegetation indices model performed best in terms of F1 score (0.58) for and recall (0.53) for the forest class but lacked in precision. The augmented model (18 input indices) produced the best results with regard to every measure with the exception of recall for the forest class. Due to this reason, the augmented model was selected for the AI-ForestWatch framework to perform forest estimation and change detection task. The estimation and change detection results discussed from here onward are derived using this model.

Estimation and Change Detection Results
After selection of the augmented model on basis on accuracy and the F1 score, the forest area cover and change maps for years 2014 to 2020 of the 15 districts of the BTAP project are generated. As the model is only trained and tested on year 2015 data, inference is performed on clean images from year 2014 and 2016 to 2020 using the augmented input configuration. The detailed numerical statistics for all the 15 districts of the BTAP project in our case study are summarized in Table 8. Nowshehra is the only district to show a negative change in forest cover from 2014 to 2020. The rest of the districts show an increase in the forestation. The ECP% column in Table 8 is the percentage change in forest cover compared to the forest cover in 2014. Hence some changes are very higher (þ764% for Haripur and þ327.13% for Malakand) because forest cover in 2014 was quite low and majority of the afforestation drive was carried out in these districts. The 14 to 20 LP% and 14 to 20 GP% columns show the forest loss and GP%, respectively, according to the metrics given in Sec. 3.2.4. In all districts except Nowshehra, the GP% is more than the LP%, which explains the overall forest gain seen in most districts.

Discussions
In order to solve the problem of forest estimation and change detection, we compared various machine learning algorithms with a semantic segmentation-based approach. Additionally, we also considered multiple input channel configurations for detailed exploration of the algorithms. to 2020. These change maps indicate the pixels, in which forest changed to non-forest, vice-versa, and the pixels with no change. Such discrete series of images can help in analysis of how the forest cover is changing over the temporal period and it can be extrapolated to see how the forest maps would look in the coming years. Since the maps show pixel-wise change, precise locations of forest loss/gain can be determined at upto 30 m per pixel. This information is very crucial for the foresters as it can be used to determine the reasons for loss and devise policies for forest protection and afforestation.
One artifact seen in our estimations is that the FC% jumps from years 2014 to 2015, came down closer to year 2014 percentage cover again in year 2016, and then increased normally from Table 8 Forest statistics for the 15 BTAP districts produced by our AI-ForestWatch framework from 2014 to 2020. All districts except Nowshehra have shown a positive change in forest cover. It must be noted here that the ECP% compares the forest cover of 2020 relative to 2014 and not with respect to the total land area of the district. years 2016 to 2020. This artifact is clearly visible in Table 8, districts Karak, Haripur, and Buner. In our case, we are constrained by the fact that only the local land cover data for year 2015 are available. Such artifacts can be resolved by cross-temporal training on labeled Landsat-8 data of multiple years, i.e., 2014, and 2016 to 2020. Since for training the model, we are relying on the data from year 2015 only (because that is the only available data) and generating forest cover estimations and maps for rest of the years, then any kind of noise, such as seasonality effects or cloud cover taken from year 2015 data could have generated this anomaly. We surmise that this would normalize or die out as more training data becomes available since in that case there would be a greater chance of receiving images of the same district from different years that have different seasonality effects in them even when they were cleaned by cross-temporal median filtering within 1 year of data. The classification model would then learn these seasonality effects alongside the classification task and would be more robust.
It should be noted that such detailed forest cover change maps and statistics (as shown in Appendix A) are not available for the BTAP project and this is considered a first attempt to assess the effects of afforestation on the forest cover area of the BTAP project at a reasonably high spatial resolution (see images of selected regions in Figs. 6 and 7). Nazir et al. 48 presented a mathematical model which suggests that the BTAP project will increase the forest area of the KP province to 23.59 million hectares, which is a 3.29% improvement over the forest area. Although this mathematical model considers the complete area of the KP province, our study of 15 districts shows a similar trend (presented in Table 8). Along with improvement in the forest area, the BTAP project has a positive socioeconomic impact on the population of KP. Khan et al. 49 showed that social sustainability increased by 69% between years 2014 and 2018. Additionally, the economic condition of the rural households in the region has improved considerably. Although the BTAP project have numerous positive environmental and social impacts, it lacks human resources and funding 50 to continuously monitor the forest cover and change. In order to tackle these problems, the change maps generated by the AI-ForestWatch framework provide the foresters with an initial estimations of the BTAP project and its impacts on land cover and all the associated ecosystems while relying on our digitized maps accuracy and the assumption that our cross-temporal forest cover predictions are accurate. Furthermore, it will provide policy makers with not only essential information for long-term planning but also to devise afforestation strategies.
Although, global forest change cover maps are also presented by Shimada et al. 5 and Hansen et al. 17 until 2012 and 2018, respectively. However, these lack the local knowledge and context, i.e., forest tree species and local forest rules. Globally different countries and locations have different forest species/qualities with respect their climate and region, for instance, forests in Asia will have different specifications as compared to Europe and North America. Therefore, the approach presented herein takes into account the local knowledge using the local training and annotated data to generate detailed maps of the specific region as shown in Appendix A. As we aim to open source the AI-ForestWatch framework, the release will include the dataset (training, validation, and test), the methodology, and the pretrained models. In order to adopt our framework for any other location or region, the pretrained models can be easily adapted by using the transfer learning approach to fine-tune the model for that specific study area or location.
The investigations carried out in this study will lead to new opportunities for forest monitoring specifically for the countries with limited resources. This would surely help foresters to have initial estimations of the available forest inventory and associated ecosystems. Furthermore, multi-label land cover maps can allow the same approach to be provided with, for example, monthly multi-spectral image profiles from Sentinel-2 or Landsat-8 data to monitor the change in water bodies, glaciers, or yearly urban growth.

Conclusion
Forest estimation and change detection are essential parameters for maintenance of the forest inventory, forest ecosystems, and carbon emissions. In order to perform these estimations frequently, in this paper, we presented AI-ForestWatch, an end to end framework for forest estimation and change detection using the open source tools and imagery. Our approach is based on semantic segmentation (UNet), which performs pixel-wise classification to detect the forest and non-forest areas in an extended (eleven Landsat-8 bands concatenated with seven vegetation indices) Landsat-8 imagery. We generate pixel-wise forest and non-forest change maps using multi-temporal change detection from the years 2014 to 2020. As a case study, we analyzed the proposed framework on the 15 BTAP project districts in the KP province of Pakistan. Our evaluations show a positive change with forest cover improvements in the 14 BTAP districts of the study area. In the future, we aim to explore newer CNN topologies such as SegNet and Mask RCNN for more accurate forest cover map predictions. Furthermore, more accurate comparisons will be made when yearly ground truth data for all the years are available. That would allow us to train the classification network cross-temporally for better map predictions.   (e) (f) Fig. 7 Battagram district district 2014 to 2020 statistics generated automatically by the AI-ForestWatch framework: (a)-(g) the forest cover maps generated by the UNet inference from years 2014 to 2020, respectively, and (h) the forest change map of 2020 with respect to 2014. Blue pixels indicate no change in the area, red pixels indicate forest loss, and green pixels indicate forest gain.