Remote sensing analysis of vegetation at the San Carlos Apache Reservation, Arizona and surrounding area

Abstract. Mapping of vegetation types is of great importance to the San Carlos Apache Tribe and their management of forestry and fire fuels. Various remote sensing techniques were applied to classify multitemporal Landsat 8 satellite data, vegetation index, and digital elevation model data. A multitiered unsupervised classification generated over 900 classes that were then recoded to one of the 16 generalized vegetation/land cover classes using the Southwest Regional Gap Analysis Project (SWReGAP) map as a guide. A supervised classification was also run using field data collected in the SWReGAP project and our field campaign. Field data were gathered and accuracy assessments were generated to compare outputs. Our hypothesis was that a resulting map would update and potentially improve upon the vegetation/land cover class distributions of the older SWReGAP map over the 24,000  km2 study area. The estimated overall accuracies ranged between 43% and 75%, depending on which method and field dataset were used. The findings demonstrate the complexity of vegetation mapping, the importance of recent, high-quality-field data, and the potential for misleading results when insufficient field data are collected.


Introduction
Remote sensing technology is often used to document and classify existing vegetation data at the regional scales. 1 Vegetation/land cover maps can be developed either at a community level or species level by discerning spectral characteristics and translating them into classes. 2 Large-scale projects are often constrained by the limited availability of high-resolution imagery and depend on lower resolution remotely sensed imagery as inputs that create lower resolution outputs. 3 Accuracy can increase when newer products are used and when lower resolution data can be validated with field data. 4 Gathering baseline conditions and change monitoring are advantages when creating good maps and improving the quality of the results. 5 A large mapping effort was undertaken to classify vegetation in the USA by the National Gap Analysis Program (GAP), where gap analysis was defined as a method for identifying "gaps" in conservation land and/or water locations. 6 This was improved upon in the five southwestern states (Arizona, Colorado, New Mexico, Nevada, and Utah), by the Southwest Regional Gap Analysis Project (SWReGAP). SWReGAP provided land-cover mapping and assessment of biodiversity using Landsat ETM+ imagery (1999 to 2001) and digital elevation model (DEM) derivatives. 7 The SWReGAP project used terrestrial ecological systems (TES) classification 8 developed by NatureServe 9 to emphasize dominant vegetation types. 7,10,11 The SWReGAP map contains 125 land-cover classes with validation accuracy of 61%. 7 This validation accuracy was based on an intermediate product, whereas the final, published map (based on all of the field observation sites with determined vegetation/land cover class) should be higher. In addition, the process of generalizing the vegetation/land cover classes for this study should result in a more accurate product as fewer, more generalized classes can be easier to map. Therefore, we felt that the SWReGAP map would be acceptable for use in this project.
Landsat Earth Observation Satellites have been collecting space-based imagery of the Earth's surface since 1972. 12 The Landsat 8 satellite was launched in February 2013, with Operational Land Imager (OLI) and Thermal Infrared Sensor (TIRS) instruments onboard. OLI collects data with a spatial resolution of 30 m in the visible, near-IR, and SWIR wavelength regions, and a 15-m panchromatic band. 12 The OLI payload is distinguished for examining vegetative cover types. [13][14][15] Land managers of the San Carlos Apache Tribe identified a need to obtain an updated vegetation map for better management of forest, woodland, and rangelands. The U.S. Geological Survey (USGS) Western Geographic Science Center (WGSC) used remotely sensed data to help the Tribe map rangeland condition. That map was later used to identify areas for spraying pesticides for invasive grasshoppers. 16 In 2017, WGSC attempted a unique mapping approach to provide a more current vegetation map, both on and off the reservation, for the Tribe. The SWReGAP vegetation map was used as a starting point for this mapping effort.
Our hypothesis was that using the SWReGAP map as a base, and adding more recent multitemporal Landsat 8 OLI satellite data, along with vegetation index and DEM data, the SWReGAP map could be improved upon for a relatively small area (24;000 km 2 ). It was postulated that using unsupervised algorithms to split the current multitemporal 30-m satellite data up into many classes, and recombining them using the SWReGAP map as a guide, an updated map, with improved accuracy, could be produced. It was also decided to pursue an initial supervised classification using field data from the SWReGAP and our field campaign. This paper presents an overview of how remotely sensed imagery was used to classify and map vegetation at the San Carlos Apache Reservation, Arizona and surrounding area, while also providing a discussion of its accuracy for the remote sensing of vegetation community.

Study Area
The San Carlos Apache Reservation in east-central Arizona, covers 7502 km 2 and is the 10th largest reservation in the United States by land area, with a population of 10,709. 17 The study area and map includes 16;500 km 2 of adjacent land creating a larger area of interest (AOI; Fig. 1

Classification
SWReGAP mapped 46 discrete classes in the study area; however, the Tribe did not require the specificity of the SWReGAP classes. Based on discussions with the San Carlos Apache Tribe (tribal ethnobotanist), 20 12 classes were determined to be of interest. Two classes were added to identify areas that do not occur on the reservation but occur in the surrounding mapping area ("mine" land cover class and "Montane Grassland" vegetation type). Finally, two classes were added to be more in-line with the SWReGAP vegetation classes ("Madrean Pine-Oak Forest/ Woodland" and "Mesquite Upland Scrub"), as these common study area vegetation classes did not fit well into the original 12-class classification. This resulted in 16 classes for our analysis. To compare these classes with SWReGAP, a cross walk was developed between the generalized vegetation classes used in the unsupervised classification mapping effort and the more specific classes employed by SWReGAP for this mapping area (Table 1).
Of the 46 discrete classes that SWReGAP mapped in the study area, only a relatively small portion of them came up as what were considered, "likely" vegetation types in this analysis. We define "likely" vegetation types as the top three SWReGAP vegetation/land cover classes (nongeneralized) with the most pixels in common with the unsupervised class in question. In other words, the vegetation/land cover classes that best represented the unsupervised class, measured by the number of pixels sharing a common location. Many of the smaller classes, covering a low percentage of the mapping area (∼ < 0.5%), did not fit in this category. Additionally, some vegetation classes were grouped together into a single class (i.e., Dry-Mesic Mixed Conifer, Mesic Mixed Conifer, etc.) for the study.

Remote Sensing
A suite of Landsat 8 OLI images from 2014 to 2013 was used in the classification. These images are virtually cloud-free over the study area and are calibrated to surface reflectance; they were downloaded in February 2017 from the USGS earth resources observation and science (EROS) science processing architecture (ESPA) website 21 Table 2 for information on selected bands.
NDVI was calculated for each of the 13 Landsat 8 scenes and used as inputs to the classification. For Landsat 8, the bands used in NDVI are b4 (red) and b5 (NIR). NDVI values range from −1 to 1; nonland surfaces (such as water and snow) typically assume negative values and land surfaces typically assume positive values. As landscapes become more densely vegetated, the NDVI trends to 1. 11,15,26

Creating layer stacks for different elevation zones
Vegetation communities are generally constrained by elevation gradients and the inclusion of elevation data can greatly increase the accuracy of mapping efforts. 27 A 10-m raster DEM from The National Map 28 was downloaded, mosaiced, and reprojected (UTM, zone 12, WGS84) using bilinear interpolation, converted into units of feet, degraded in resolution to 30 m, and finally subset to the study area coordinates. The newer 10-m data were used as it is an updated version of the 30 m that corrects for some of the issues with the original 30-m data, such as edge matching/seam issues, while maintaining the potential of being updated from other high-resolution sources (such as LIDAR). A mask was created from the DEM data for elevations above and below 2591 m. Snow contamination affecting the pixels of several image dates in areas above 2591 m was the main reason for splitting the scene using this elevation. Masking scenes using this threshold allowed more dates to be utilized in areas below this elevation because snow contamination in upland areas would have caused some otherwise excellent scenes to be rejected.
Initially, two layer stacks were produced, one for each Landsat path/row. The layer stack for P36/R37 consisted of bands 1 to 7 and NDVI for each of the seven dates followed by the DEM image to create a 57 band image. The layer stack for P35/R37 consisted of bands 1 to 7 and NDVI for each of the six dates, followed by the DEM image to create a 49 band image. The P35R37 layer stack was further split into two separate image stacks, one masked to elevations above 2591 m, and one below that elevation. Only P35R37 contained elevations above this threshold in the study area.
For areas above 2591-m elevation, only four of the six dates were uncontaminated by snow cover; these four scenes and their derived NDVIs and DEM imagery were used to create a new 33 band layer stack for analysis in the high elevation zone. The Landsat path/row boundaries cut diagonally across the study area which resulted in strips of "extraneous" pixels as the individual image bands did not exactly cover the same area. To address this, all the individual image bands  (2) in two of the layer stacks affected by this issue (P35R37 below 2591 m and P36R37) were made into a binary mask with 0 representing no data and 1 representing usable image data. The minimum overlap area (intersection of polygons) was then determined and used to mask out the areas of the layer stack that contained any "no data" values. In summary, three layer stacks were made as follows: i

Unsupervised classification
Unsupervised classifications do not require training data like supervised classification and therefore may be more economical in developing land cover maps for large areas. We explored a unique method using iterations of unsupervised classifications and thematic class matching in an attempt to increase the accuracy of the classification. For portions of the map above 2591 m, a four-date layer stack (with corresponding NDVIs and DEM) was classified to initially create 10 unsupervised classes. Each of those 10 classes were then subset from the same layer stack and further classified into 10 additional classes, resulting in a total of 100 classes. One second-generation class was further subdivided into five classes, so 104 classes in total were created. Below 2591 m (P35R37), a six-date layer stack (with corresponding NDVIs and DEM) was classified to create 20 unsupervised classes. Each of those 20 classes were further subdivided into 20 additional classes for a total of 400 classes. For P36R37, a seven-date layer stack (with corresponding NDVIs and DEM) was iteratively classified (20 × 20) to ultimately create 400 unsupervised classes as well. The portion of the mapping area above 2591 m (only in P35R37) was far less than the portion of the image below this elevation, and needed far fewer classes to map.

Vegetation class assignment
To determine the most likely vegetation class of each of the 904 resultant unsupervised classes, they were compared with the original SWReGAP map. Each class was spatially subset from the SWReGAP map, and the top three SWReGAP vegetation class designations were determined for each unsupervised class. If the "top" SWReGAP class, determined by the number of pixels representing that class, was exceeded by summing the second and third place most likely SWReGAP classes, and these classes were to be grouped into a single class (i.e., Dry-Mesic Spruce-Fir, Mesic Spruce-Fir, etc.), the grouped class designation (i.e., Spruce-Fir, etc.) was now considered the most likely vegetation class. In other words, if the second and third place classes were to be grouped into a single class, and the number of pixels representing these summed classes exceeded the original first place selection, it was determined to be the most likely class. See Fig. 2 for further clarification.
After assigning the most likely class to all the unsupervised classes for P35R37, the Ponderosa and Mixed Conifer classes, that straddled the "border" across the 2591-m elevation threshold, were subset out, combined and reclassified (into 30 unsupervised classes) using the same four-date layer stack used for high elevations. Vegetation classes were then assigned using the technique described previously. This resulted in a much "cleaner transition" across this artificial elevation boundary. After the class assignments were made, the below and above 2591-m portions of P35R37 were initially mosaiced together. This was followed by mosaicing (using the overlay function) the classified area that straddled the elevation threshold "over" the original P35R37 mosaic. The resultant P35R37 mosaic was then mosaiced (using the overlay function) "over" the classified P36R37 section of the mapping area. This resulted in roughly the eastern two-thirds of the mapping area being derived from P35R37 imagery, and the remaining western portion being generated from P36R37 imagery. Finally, a 3 × 3 pixel majority filter was applied to eliminate single pixels and small pixel groups that are potential "noise" in the image. Urban area boundary shapefiles from 2016 were then downloaded from the census website, 29 screen digitized, and added to the map to create an urban class. See Fig. 3 for flowchart of process discussed above.
For use in accuracy assessment, the SWReGAP vegetation/land cover map was collapsed and recoded from its original 46 classes found in the study area to the final 16 classes used in the unsupervised classification map using the cross walk shown in Table 1. Classes that did not directly translate from the SWReGAP map to a more generalized class of the unsupervised classification map (i.e., recently burned, etc.) were labeled as "unclassified" on this map. None of the "unclassified" group of vegetation/land cover classes came up as a "likely" class in the analysis.

Supervised classification
A supervised classification was developed to compare with the unsupervised classification. An initial supervised classification was performed using all the field data [specifically vegetation/ land cover class and associated global positioning system (GPS) position] in the study area from the 2001 to 2004 SWReGAP and the 2017 field campaign as training sites (1028 points/pixels in all). No attempt was made to verify that the SWReGAP points were indeed still the same vegetation class, nor was burned vegetation removed or reassigned from the dataset. After the SWReGAP vegetation type field data were generalized into one of the 16 final classes, they were combined with the field data points from the 2017 field campaign. The field data points were then grouped into the individual vegetation class shapefiles using ArcMap (v. 10.5). They were then further subdivided by Landsat path/row. These groups of points that represented the same vegetation class (in a specific path/row) were then opened into ERDAS Imagine (v. 2014) and converted into AOI points, before being combined with the respective Landsat 8, NDVI, and DEM image layer files to create representative signature files. The six-date Landsat 8 image stack (with NDVI and DEM) discussed earlier was used to create the signature files for P35R37, whereas the seven-date Landsat 8 image stack (with NDVI and DEM) was used for P36R37. During the supervised classification, "minimum distance" was used as the parametric rule, whereas a nonparametric rule was not employed. Water, agriculture, Riparian Forest/ Woodland, mine, and urban were added back in from the unsupervised classification map (non 3 × 3 spatial filtered version). Then, a 3 × 3 median filter was run on the image.

Reference Data
Ninety-five observation sites were visited during two-field campaigns in 2017. Additionally, 933 field observation "points" in the mapping area, which were collected from 2001 to 2004, were subset from the historical SWReGAP field database. Each set of reference data are representative samples reflecting the land cover composition in the study area.

2017 field campaign
The 2017 field campaign was based on a map of semirandom, accessible points in the study area that included a number of points for each of the generalized vegetation/land cover classes. Roads layers from the San Carlos Apache Reservation, Tonto and Apache-Sitgreaves National Forest, and other sources were buffered by 200 m in ArcMap 10.5 (ESRI 2016) to constrain the accuracy assessment points to areas near a road. The map was broken into six regions based on the distribution of major roads and land management/ownership regions. Random points were generated for each vegetation class for each region with a minimum allowed distance between points of 100 m. Some classes, such as Spruce-Fir Forest, are less spatially extensive and therefore not represented in all regions; this resulted in 74 class/region combinations. We attempted to create 10 points in each of the 74 class/region areas; however, because of the small spatial extent of some classes as well as the minimum allowed distance constraint, this was not always possible. Eight of the 74 class/region combinations received <10 random sampling points. This resulted in a total of 684 possible field sampling points. Field work was performed September 18 to 22, 2017, and October 31 to November 1, 2017, by driving along well-maintained roads to predetermined accuracy assessment points. Allowing for a suitable area to pull off, we attempted to locate the point to collect field data (Fig. 4). If a fence line (or a steep drop-off) impeded access, a "subjective point" was placed as near to the actual point as possible. Other subjective points were added along the route, often within broad expanses of a single vegetation type, or in unique vegetation "islands" within a more continuous cover. A total of 95 points were sampled. At each site, a GPS point and photographs in the four cardinal directions (north, east, south, and west) were collected and information on the local vegetative cover were documented, including the major vegetation species and their apparent abundances, and overall vegetation type based on one of the mapped vegetation classes. General vegetation of areas bordering the site was noted as well, as was evidence of fire and tree mortality/regeneration.

2001 to 2004 SWReGAP field campaigns
Over the course of 2−1/2 field seasons, approximately 93,000 reference samples were collected directly, or obtained from other contemporary projects, for the SWReGAP land-cover mapping effort in the southwestern states of Arizona, Colorado, Nevada, New Mexico, and Utah. 7 Of those, 933 were in the study area (Fig. 5). In the study area, the vast majority of field plots were field collected by traveling easily navigable roads and selecting plots that were at least 1 hectare in size with stand homogeneity. 7 Ocular estimates included percent cover of dominant species and abiotic site characteristics. GPS coordinates were recorded.

Accuracy Assessment
Classification accuracy measures the agreement between what is known to be on the ground (i.e., reference condition) and the product of a remotely sensed image classification (expected condition). 30 To perform an accuracy assessment, a user compares reference data with the expected class  at a number of locations within the image. An error matrix is developed that provides a descriptive statistical technique to determine overall accuracy. Overall accuracy is the number of pixels correctly classified divided by the total number of pixels. 31 In addition, accuracies of each vegetation class can be computed, with both a "consumer's accuracy (CA)" and "producer's accuracy (PA)." 32 The CA divides the number of correctly classified pixels by the total number of pixels assigned to a class by the classification analysis (expected). The PA divides the number of pixels correctly classified per class as a percentage of the total number of pixels belonging to that class as described by the reference data. Classification errors occur via omission, when a feature is left out of the class, and via commission, when a feature is incorrectly included in a class.

Statistical analysis
The unsupervised classification map was compared with 2017 field data and SWReGAP field data. Additionally, the recoded SWReGAP map was compared with 2017 field campaign data. The SWReGAP map was also compared with the SWReGAP field data to determine an upper boundary for this map's accuracy over the study area. Finally, the supervised classification map was compared with both the SWReGAP and 2017 field campaign data to estimate upper boundaries for this map's accuracy. Accuracy assessments were developed in R 33 using the caret package. 34 The error matrices for the SWReGAP map to SWReGAP field data, and supervised classification are not shown as the field data (100%) were used as the training data to produce these maps. Figure 6 shows the results of our mapping efforts. Figure 6(a) was created from a 904 class unsupervised classification using Landsat 8 OLI satellite and DEM data. The determination of vegetation and land cover class designations was based on reference to the SWReGAP map. Figure 6(b) was created from recoding the SWReGAP vegetation/land cover map for the study area to the same classes used in the unsupervised classification map. An additional "unclassified" class was added to capture SWReGAP vegetation and land cover classes that did not directly fit into the 16-class scheme. Figure 6(c) was created from an initial supervised classification employing training sites derived directly from the SWReGAP and 2017 field campaigns. All SWReGAP field data [specifically vegetation/land cover class (generalized) and associated GPS position] in the study area were used in the analysis, and were not checked to see if had changed (burned, changed vegetation type, etc.) in the intervening years since it was collected (2001 to 2004). In addition, all vegetation/land cover class and associated location observations in the study area from the 2017 field campaign were also utilized to create training sites in the supervised classification.

Unsupervised classification
The results of the unsupervised classification had an overall accuracy of 53% when compared with 2017 field data ( Table 3). In general, CA was higher than PA. Agriculture (Ag) and Pinyon-Juniper-Evergreen Oak Woodland were the most accurate classes overall with both CA and PA >0.50. Madrean Pine-Oak Forest/Woodland was the least accurate class with both CA and PA falling below 0.25. The unsupervised classification map was assessed using SWReGAP field data (Table 4) and compared with the accuracy assessment using the 2017 field data. Overall accuracy was similar at 52% with mixed results in both CA and PA. Not all classes can be compared between the 2017 field data and the SWReGAP field data since the SWReGAP field data are missing any Agricultural observation points and the 2017 field data is missing Spruce-Fir Forest. All of the classes saw changes in either CA or PA and most classes saw changes in both. Aspen Forest and Pinyon-Juniper-Evergreen Oak Woodland were lower in both CA and PA, while Mixed Conifer Forest, Ponderosa Pine Forest, and Madrean Pine-Oak Forest/Woodland were higher in both CA and PA. Interior chaparral, Montane Grassland, and Plains/Semidesert Grassland had higher PA but lower CA. Sonoran/Chihuahuan Desertscrub had lower PA but higher CA. Riparian Forest/ Woodland and Mesquite Upland Scrub maintained their CA but had lower PA. Montane Grassland, Plains/Semidesert Grassland, and Sonoran/Chihuahuan Desertscrub were the most accurate (PA and CA > 0.50), whereas Spruce-Fir Forest was the least accurate (PA and CA < 0.25).

SWReGAP
The SWReGAP map was assessed using 2017 field data ( Table 5). Overall accuracy was 44%, a reduction of 9% from the unsupervised classification map assessed with 2017 field data with mixed results in both CA and PA. Madrean Pine-Oak Forest/Woodland was the only class to improve in both CA and PA. Aspen Forest, Ponderosa Pine Forest, Montane Grassland, and Mesquite Upland Scrub showed decreases in both CA and PA. Pinyon-Juniper-Evergreen Oak Woodland had higher CA but lower PA, whereas Interior Chaparral had lower CA but higher PA. Lowry et al. 7 reported a validation accuracy of 61% for the entire five state mapping area. This validation accuracy was based on an accuracy assessment of an intermediary product and the accuracy of the final SWReGAP map is likely higher than the validation accuracy. To find an upper boundary for accuracy of the final SWReGAP product over the study area, the SWReGAP map was compared with the SWReGAP field data. The resultant accuracy is overestimated because the test data were taken from the training data 30 but it sets the upper boundary of accuracy at 75%. The overall accuracy of the SWReGAP map was roughly considered to be between these boundaries (61% minimum-measured over the five state mapping areas, and 75% maximum-measured over the study area). The error matrix comparing the recoded SWReGAP map to the SWReGAP field data is not presented here, because 100% of the field data were used to create the map, and it does not represent a competent estimate.

Supervised classification
The results of the supervised classification had an overall accuracy of 43% when compared with 2017 field data, and 47% when compared with the SWReGAP field data. The error matrices are not presented here, because 100% of the field data were used to create the map, and they represent an unrealistic upper limit to the accuracy estimates. Even so, the supervised classification map has very low accuracy and probably requires a significant amount of work with the selection of training sites before being useful as an accurate map product. Moreover, this classification was only meant to explore the initial results of using a supervised classification on the field data to see if that is a reasonable avenue for future mapping. Table 3 Error matrix comparing the unsupervised classification results (expected) to the 2017 field data (reference).

Discussion
The overall accuracy for each map iteration compared with the field dataset is shown in Table 6. The unsupervised classification map ranged between 52% and 53%, depending on which field dataset was used as a reference. The overall accuracy of the SWReGAP map over the study area was 44% using the 95 sites of 2017 field data as a reference. However, the SWReGAP field data are much more extensive, with nearly 10-fold more observation sites, potentially allowing for a more definitive accuracy assessment. In addition, the larger multistate SWReGAP map has an accuracy that is likely higher than the reported 61% due to their method of validation. SWReGAP validated an intermediate map developed with 80% of field data, using the remaining 20% to assess the validation accuracy. The final map was developed using 100% of the field data, and thus would be expected to be of even higher accuracy. In addition, if fewer classes were used as in the unsupervised classification map, one would expect the SWReGAP map to be of higher accuracy than the stated 61% as well. The measured 75% accuracy of the SWReGAP map using the SWReGAP field data (used to create it) sets an elevated upper limit on the accuracy of this map over the study area. Although the accuracy results are inconclusive, we deem the SWReGAP map to be more accurate in general. Visually, the unsupervised classification map and SWReGAP map differ greatly. One geographic area of note is the Pinaleño Mountains, a large mountain range in the extreme southcentral part of the study area. This mountain range peaks at 3267 m in elevation, and is portrayed as covered entirely in Ponderosa Pine (green) at its middle and highest elevations on the SWReGAP map. Although Ponderosa Pine is prevalent at middle elevations, the range is actually covered in Mixed Conifer, Spruce Fir, Aspen and Montane Grassland at its highest elevations. The unsupervised classification map depicts these cover types in the Pinaleño Mountains, although they are not entirely representative of actual cover distributions. This Table 4 Error matrix comparing the unsupervised classification map (expected) to the SWReGAP field data (reference). During the analysis, very few "small" SWReGAP vegetation/land cover classes (those with fewer pixels in the study area associated with them) came up as "likely" classes. There may not have been enough unsupervised classes in general, and the chosen 0.001% class size threshold used in the ISODATA unsupervised classification may have been too large, or it is possible that other parameter(s) may be at fault. More unsupervised classes would have resulted in more numerous classes with fewer pixels and potentially, a greater opportunity for small SWReGAP vegetation/land cover classes to represent them. A lower class size threshold percentage in ISODATA could have worked in roughly the same way, allowing smaller classes to be generated and ultimately classified as one of the small SWReGAP classes. This is not believed to be a major issue; however, because small classes, by their definition, do not cover much land area and many (if not all) would have ultimately have been generalized into a larger class anyway.

Reference
SWReGAP classes have a higher thematic class "resolution." During the process of generalizing the SWReGAP classes to fit within the 16 classes of the unsupervised classification map, Table 5 Error matrix comparing the SWREGAP map with generalized classes (expected) to the 2017 field data (reference).  there were some issues that could have affected negatively the overall and individual class accuracy estimates. For instance, we feel that the fairly common SWReGAP class "Juniper Savanna" would have best been split between Pinyon-Juniper-Oak Woodland and Plains/Semidesert Grassland depending on the tree density. It was, however, assigned to Pinyon-Juniper-Oak alone. SWReGAP field data are also significantly older (2001 to 2004 versus 2017) and there were probably some instances of change in the landscape at the field observation sites, decreasing the apparent accuracy results. Areas affected by fire in the intervening years may be extensive in some classes (Ponderosa Pine, Mixed Conifer, and Interior Chaparral, for instance), and this could have significantly skewed the results as well. The accuracy of the individual classes differed greatly and was dependent on what field dataset it was referenced to, further confusing the decision about which map is more accurate. Fire history could have also affected the accuracy of our supervised classification map causing the forest/chaparral classes to be inaccurately assigned. The fact that all of the 2001 to 2004 SWReGAP field data in the study area were used (unchanged) to create training sites and vegetation spectral signatures for the supervised classification needs to be taken into account. We expect Ponderosa Pine and Mixed Conifer in particular, and Madrean Pine-Oak, Interior Chaparral, Spruce-Fir and Aspen to have greater errors compared with other classes because those classes are susceptible to big changes from intervening fire. In this particular case, the SWReGAP training data could be suspect solely because of age, and give poor results unless it is carefully vetted and burned areas accounted for. This may also affect the accuracy numbers when compared against the SWReGAP field data (verses the 2017 field data), making the forest/chaparral classes seem less accurate.

Reference
Why the huge difference in overall accuracy measurements of the SWReGAP map using the 2017 field data (44%) verses the 2001 to 2004 SWReGAP field data (75%)? Despite the 75% overall accuracy measurement being an overestimation based on it being the training data for the creation of the map itself, the overall accuracy using the 2017 field data, 44% seems artificially low. Our only explanation is that there were far too few-field observation sites (95) to give us an accurate portrait of a map covering 24;000 km 2 in such a diverse landscape. Additionally, many of these observation sites are highly spatially correlated as they are found along access roads and not randomly distributed across the mapping area, making our collection of field data less effective than even the low number reflects. The SWReGAP field data have a similar issue (albeit even more pronounced) with high spatial correlation because of the tight distribution along access roads, but the sheer number of points still allows for a better estimate of the map's accuracy.
The technique to classify the image data into many classes, and recombine them using the SWReGAP map as a guide was envisioned as a way to allow the map to differ significantly from and yet potentially improve upon the SWReGAP map. Unsupervised classification algorithms are adept at "pulling apart" the spectral variability of the image data, and minimizing the variability of the individual classes. By iteratively classifying the image into many classes, the variability of each class should be minimized, and the likelihood of it being a similar vegetation or land cover class maximized. The assignment of vegetation and land cover classes to these unsupervised classes, however, is a major potential source of error. In supervised classification, the vegetation or land cover class is usually determined for the training areas "a priori," and this class assignment is applied to the entire supervised class. The use of the SWReGAP map was expected to help account for this disadvantage.
The SWReGAP and our classification maps were done from images acquired at different times (12+ years difference), from different platforms (Landsat ETM+ and Landsat 8 OLI), classified by different procedures and analysts. To be compared, the SWReGAP map was translated into the same classification scheme and both were shown at the same level of detail. The accuracy assessment allows for quantitative comparisons of different interpretations and produces variations as described.
To expand upon this research, we consider that in addition to the comparison with the SWReGAP data, each unsupervised class could also be checked against a NDVI time plot (class average NDVI verses date) and other variables (such as elevation and class distribution) to help decide what class it most resembled. Additionally, we suggest that the supervised classification would be improved upon by checking the SWReGAP field data points to see if they were burned, or had changed class in the intervening years. It is still expected, that a more robust, and iterative supervised classification procedure could ultimately improve the map and result in an accurate rendition of the vegetation classes. Updating the unsupervised version could probably produce a similar result but would probably require even more time. In retrospect, we could have identified unsupervised classes that had no clear winner (where the "most likely" vegetation/land cover class had only a small edge over second place) and then analyzed the NDVI time plot and other variables to make a final determination. This hybrid option could potentially be accomplished in a reasonable amount of time.

Conclusions
Our approach to improve upon a regional, multistate mapping effort using an unsupervised classification to make updates in a smaller area was of limited success. A multitiered unsupervised classification generated over 900 classes, and these classes were then compared individually with the SWReGAP map and recoded to coincide with a class that represented the largest majority of the shared pixels. Our hypothesis was that the recombined, resultant map would update and potentially improve upon the vegetation/land cover class distributions of the SWReGAP map over the 24;000 km 2 study area.
Although a slight improvement (53% versus 44%) was measured in the overall accuracy of the unsupervised classification map versus the SWReGAP map using the limited field data from 2017, the field data collected during the previous SWReGAP mapping effort allowed for an analysis depicting the opposite. As the SWReGAP field campaign collected 933 points over the mapping area compared with the 95 points that were visited during the much shorter 2017 field campaign, more emphasis should be put on the results of the accuracy assessment conducted with a much larger set of reference samples. Therefore, it is surmised that the SWReGAP map is more accurate overall. Just how much more accurate is unknown as the accuracy of the SWReGAP map was measured as an upper limit as the field data (100%) was used to create the map. One would predict that the more classes generated by this multitiered, unsupervised technique, the more it should mimic the comparison SWReGAP map, but even with the 900+ classes, it was not enough. Although errors were certainly propagated from the SWReGAP map, we accept that this experimental technique was insufficient to overcome this shortfall and added new sources of error. Future efforts to develop a new vegetation map would benefit from using a carefully selected subset of the field data collected by SWReGAP and the 2017 field campaigns as training data. It is surmised that this more traditional supervised approach, with numerous, potential training sites, has a greater chance of success in a shorter timeframe than updating the vegetation class assignments of a pre-existing map on a class-by-class basis when there are over 900 classes to analyze. Although it is likely that the vegetation class assignments for the unsupervised classification could be greatly improved upon by analyzing the NDVI time plots (phenology) and other variables, it would almost certainly require a significant amount of time because of the sheer number of classes to consider. Perhaps if the class-by-class analysis was limited in scope to cases where there was no clear winner "most likely" class, the timeconstraint issue could be overcome and the map improved to a sufficient degree.
The final conclusion is that while it is not always possible, attaining a sufficient set of field reference data points is critical, and improves analysis dramatically. If we had relied on the 95 field sites visited during the 2017 field campaign, we would have had a different finding than if using the 933 SWReGAP field sites we ultimately chose to base our conclusions on. It may have been concluded that the unsupervised classification had indeed improved on the SWReGAP map over the study area, while it had not. Ultimately, the technique used to develop this map was insufficient at producing an accurate product.
and Nona Tuchawena helped provide support and staff for the field campaign. We would also like to thank the White Mountain Apache Tribal Forestry, especially Robert Lacapa (with Mike and Sam), for providing assistance and guidance in accessing their lands. We would like to thank Dr. Kathryn Thomas for working with us to interpret the SWReGAP field data. We acknowledge our USGS reviewers, Dr. Christopher Soulard, Dr. Cynthia Wallace, and Dr. Miguel Villarreal for their help in revising this paper. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.