Accuracy assessment model for classification result of remote sensing image based on spatial sampling

Abstract. The classification accuracy of a remote sensing image should be assessed before the classification result is used for scientific investigation and policy decisions. We proposed an accuracy assessment model based on spatial sampling that reflects the regional sensitivity of a remote sensing image. The proposed model addresses three questions: (1) what sample size should be used for accuracy assessment; (2) where the sample points should be located in the region; and (3) how the result of the accuracy assessment should be analyzed. The model is based on the gray-level co-occurrence matrix (GLCM) and considers both sample size calculation and sample point distribution. The overall accuracy and kappa coefficient derived from the model were very close to the true values obtained from the total (all-pixel) assessment, indicating that the assessment accuracy of the model is close to that of the total assessment. Compared with the percent sampling model, the proposed model quantifies the relationship between the GLCM-correlation parameter and sample size, allowing producers and users to determine the sample size according to spatial uniformity and heterogeneity. Compared with the random sampling model, the proposed model ensures that the sample points are uniformly distributed across the spatial region and proportionally distributed among the different types of land cover. Taken together, the proposed model is suitable for the accuracy assessment of the classification result of a remote sensing image.


Introduction
Remote sensing images provide representations of the Earth's surface at different spatial and temporal scales. They are widely used in many fields, including predicting epidemiology and burned area, 1,2 detecting forest and cultivated land changes, monitoring soil erosion and environmental change, 3,4 and mapping land cover and species distribution. [5][6][7][8] In particular, most remote sensing images must be classified before they can be applied, either by visual interpretation or by computer-aided analysis. A key concern is whether the classification result derived from a remote sensing image is of sufficient quality for operational application. An accuracy assessment model is therefore needed to judge whether the accuracy of a classification result meets the requirements of the user's application.
Several methods are currently used to assess the accuracy of remote sensing classification results, including a population-based statistical framework, 9 multiple-objective accuracy assessment, 10 geographically weighted accuracy measures, 11 and stratified random sampling for the National Land Cover Database. 12 Some studies focus mainly on sample size calculation, whereas others focus mainly on sample point distribution. However, the classification result of a remote sensing image is a special product, for which both sample size calculation and sample point distribution are crucial to the assessed accuracy. The sample size should therefore be determined from spatial autocorrelation, the sample points should be selected according to spatial heterogeneity, and the classification accuracy should be quantified by comparing the sample points with reference data.
In this paper, we proposed an accuracy assessment model for the classification result of a remote sensing image based on spatial sampling. The model considers both sample size calculation and sample point distribution. It allows producers and users to determine the sampling rate according to spatial uniformity and heterogeneity, and it ensures that the sample points are uniformly distributed across the spatial region and proportionally distributed among the different types of land cover.

Remote Sensing Data and Study Region
The study region is located in Sichuan Province, western China. The dataset is a fused image of the multispectral and panchromatic bands of a Landsat-8/OLI scene acquired on August 24, 2015, with 15-m spatial resolution [Fig. 1(a)]. The image has 256 gray levels. The original image is available at http://www.gscloud.cn. The reference data are high-resolution aerial images acquired on August 10, 2015, with 0.6-m spatial resolution [Fig. 1(b)]. Both datasets use the same coordinate system, WGS_1984_UTM_zone_48N.

Accuracy Assessment Model
Accuracy is typically used to express the degree of "correctness" of a classification result. We proposed an accuracy assessment model that reduces data redundancy while ensuring assessment precision, based on two parameters: the sample size (n) and the optimal distance (d). In the model, each pixel was defined as an assessed item. Supposing that the remote sensing image is rectangular, with N_x columns and N_y rows, the lot size (N) of assessed items is N = N_x × N_y.
According to the first law of geography, 13 all pixels are spatially autocorrelated, and nearby pixels are more strongly related than distant ones. In this paper, spatial autocorrelation was calculated with the gray-level co-occurrence matrix (GLCM). The sample size n and the optimal distance d were then derived from the GLCM within the accuracy assessment model.

Gray-level co-occurrence matrix
Suppose that the gray level of each pixel is quantized to N_g levels, and let G_x = {0, 1, ..., N_g − 1} be the set of quantized gray levels. The remote sensing image H is then a function that assigns a gray level in G_x to each pixel, that is, to each pair of coordinates in N_y × N_x (the N = N_x × N_y pixel positions).
The texture-context information is specified by the matrix of relative frequencies P_ij with which two neighboring pixels separated by distance d occur in the image, one pixel with gray level i and the other with gray level j (i, j ∈ G_x).
The gray-level co-occurrence frequencies were expressed as a function of the angular relationship (θ) and the distance (d) between neighboring pixels:

$$p(i, j, d, \theta) = \#\{[(k, l), (m, t)] \in (N_y \times N_x) \times (N_y \times N_x) \mid H(k, l) = i,\; H(m, t) = j\} \qquad (1)$$

where # denotes the number of elements in the set, (k, l) and (m, t) are the row and column coordinates of the pixels with gray levels i and j, respectively, and the two pixels are d pixels apart along orientation θ (for θ = 0 deg, k = m and |l − t| = d; for θ = 90 deg, |k − m| = d and l = t). The GLCM-correlation parameter (r) for a given distance d and orientation θ was calculated by the following equation:

$$r = \sum_{i=0}^{N_g - 1} \sum_{j=0}^{N_g - 1} \frac{(i - \mu_i)(j - \mu_j)\, P(i, j, d, \theta)}{\sigma_i \sigma_j} \qquad (2)$$

where P(i, j, d, θ) is the entry of the normalized GLCM, P(i, j, d, θ) = p(i, j, d, θ) / Σ_i Σ_j p(i, j, d, θ). The means (μ) and standard deviations (σ) for the rows and columns of the matrix were calculated as follows:

$$\mu_i = \sum_{i=0}^{N_g - 1} \sum_{j=0}^{N_g - 1} i \, P(i, j, d, \theta) \qquad (3)$$

$$\mu_j = \sum_{i=0}^{N_g - 1} \sum_{j=0}^{N_g - 1} j \, P(i, j, d, \theta) \qquad (4)$$

$$\sigma_i = \sqrt{\sum_{i=0}^{N_g - 1} \sum_{j=0}^{N_g - 1} (i - \mu_i)^2 \, P(i, j, d, \theta)} \qquad (5)$$

$$\sigma_j = \sqrt{\sum_{i=0}^{N_g - 1} \sum_{j=0}^{N_g - 1} (j - \mu_j)^2 \, P(i, j, d, \theta)} \qquad (6)$$

The GLCM-correlation parameter (r) ranges from −1 to 1. When r is close to 1, the pixels located at (k, l) and (m, t) are strongly spatially correlated; as r approaches 0, the spatial correlation weakens.
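The GLCM and its correlation parameter described above can be sketched in Python with NumPy, using the standard GLCM correlation definition. This is a minimal illustration, not the authors' implementation: the function names `glcm` and `glcm_correlation` are hypothetical, only the 0- and 90-deg orientations are handled, and pixel pairs are counted in one direction only (a non-symmetric GLCM). scikit-image's `skimage.feature.graycomatrix` and `graycoprops` offer a library alternative.

```python
import numpy as np


def glcm(image, d, theta, levels=256):
    """Normalized, non-symmetric GLCM P(i, j, d, theta) for theta in {0, 90}.

    theta = 0 pairs pixel (k, l) with (k, l + d); theta = 90 pairs (k, l)
    with (k + d, l). Gray levels must be integers in [0, levels).
    """
    img = np.asarray(image, dtype=np.int64)
    if d < 1:
        raise ValueError("d must be at least 1 pixel")
    if theta == 0:
        ref, nbr = img[:, :-d], img[:, d:]
    elif theta == 90:
        ref, nbr = img[:-d, :], img[d:, :]
    else:
        raise ValueError("this sketch supports theta = 0 or 90 only")
    counts = np.zeros((levels, levels), dtype=np.float64)
    # Count each (gray level i, gray level j) pair: p(i, j, d, theta)
    np.add.at(counts, (ref.ravel(), nbr.ravel()), 1.0)
    return counts / counts.sum()  # normalize so the entries sum to 1


def glcm_correlation(P):
    """GLCM-correlation parameter r from a normalized GLCM P."""
    i = np.arange(P.shape[0])[:, None]  # row index = gray level i
    j = np.arange(P.shape[1])[None, :]  # column index = gray level j
    mu_i = np.sum(i * P)
    mu_j = np.sum(j * P)
    sigma_i = np.sqrt(np.sum((i - mu_i) ** 2 * P))
    sigma_j = np.sqrt(np.sum((j - mu_j) ** 2 * P))
    return np.sum((i - mu_i) * (j - mu_j) * P) / (sigma_i * sigma_j)
```

For example, on an 8 × 8 two-level checkerboard (`np.indices((8, 8)).sum(axis=0) % 2`) every horizontal neighbor pair has opposite gray levels, so `glcm_correlation(glcm(checker, 1, 0, levels=2))` returns −1.0.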

Accuracy assessment model
The sample size (n) and the optimal distance (d) were derived from the GLCM-correlation parameter (r), where ε is an arbitrarily small value; r_0 is the critical value of r chosen by users and producers to balance data redundancy against accuracy; and θ is the orientation. The GLCM is usually defined for four orientations (0, 45, 90, and 135 deg), but for simplicity only 0 and 90 deg were considered here. n is the optimal sample size, and n_0deg and n_90deg are the numbers of interval pixels at 0 and 90 deg, respectively.
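The equations for n and d are not reproduced in this section, so the following sketch only illustrates one plausible reading under explicit assumptions: the optimal interval in an orientation is taken as the first lag at which r has dropped to r_0 (within ε), and one sample point is taken per block of n_0deg × n_90deg pixels. Both `optimal_interval` and `sample_size` are hypothetical helpers, and the block-counting formula is an assumption, not the model's published equation.

```python
import math


def optimal_interval(r_by_lag, r0, eps=1e-6):
    """First lag d (in pixels) at which r(d) has dropped to the threshold r0.

    r_by_lag[k] holds the GLCM-correlation parameter at lag d = k + 1 for one
    orientation. ASSUMPTION: "reached r0" is read as r(d) <= r0 + eps.
    """
    for k, r in enumerate(r_by_lag):
        if r <= r0 + eps:
            return k + 1
    raise ValueError("r never drops to r0; extend r_by_lag to larger lags")


def sample_size(n_x, n_y, d_0, d_90):
    """HYPOTHETICAL sample size: one point per d_0 x d_90 block of the image.

    The n_x columns are split into blocks d_0 pixels wide (0-deg direction)
    and the n_y rows into blocks d_90 pixels high (90-deg direction).
    """
    return math.ceil(n_x / d_0) * math.ceil(n_y / d_90)


# Illustrative correlation values, not the values behind Table 1
r_0deg = [0.95, 0.90, 0.86, 0.82, 0.78]
r_90deg = [0.91, 0.84, 0.77, 0.70, 0.64]
d_0 = optimal_interval(r_0deg, 0.85)    # 4
d_90 = optimal_interval(r_90deg, 0.85)  # 2
n = sample_size(100, 80, d_0, d_90)     # 25 * 40 = 1000
```

Under this reading a higher r_0 is reached at a shorter lag, which shrinks the blocks and increases n, matching the qualitative behavior described for Table 1.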

Accuracy Analysis and Comparison
The feasibility and advantages of the proposed accuracy assessment model were evaluated by comparing it with the total assessment, the percent sampling model, and the random sampling model. Overall accuracy, producer's accuracy, user's accuracy, commission error, omission error, and the kappa coefficient were used as the assessment parameters in these comparisons. [14][15][16][17][18][19]

Results

Classification Result of Remote Sensing Images
Five types of land cover (building, agriculture, bare, water, and forest) were classified from the two images described above using the support vector machine (SVM) in ENVI 5.1 software. The two classification results, in vector form, are shown in Fig. 2. The SVM separates two classes with the hyperplane

$$f(x) = \langle x, w \rangle + b = 0 \qquad (8)$$

where w is the vector orthogonal to the hyperplane, |b|/‖w‖ is the distance from the hyperplane to the origin, and ⟨x, w⟩ denotes the inner product of x and w. The parameters of Eq. (8) were obtained from the following quadratic optimization problem:

$$\max_{\lambda} \sum_{i} \lambda_i - \frac{1}{2} \sum_{i} \sum_{j} \lambda_i \lambda_j y_i y_j \langle \varphi(x_i), \varphi(x_j) \rangle \quad \text{subject to} \quad 0 \le \lambda_i \le C, \quad \sum_{i} \lambda_i y_i = 0 \qquad (9)$$

where λ_i are the Lagrange multipliers, y_i ∈ {−1, +1} is the class label of x_i (SVM is a binary classifier), C is the upper bound of the λ values, and φ(x) is a function that maps the input vectors into a higher-dimensional space. The inner product ⟨φ(x_i), φ(x_j)⟩ is known as the kernel function. A popular kernel is the radial basis function, ⟨φ(x_i), φ(x_j)⟩ = exp(−‖x_i − x_j‖² / 2σ²), σ ∈ R+, which was adopted in this study. 20,21 The parameters were set to C = 100 and σ = 0.25.
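For readers reproducing the classification outside ENVI, note that some libraries write the RBF kernel as exp(−γ‖x_i − x_j‖²), for example the `gamma` argument of scikit-learn's `SVC`; the two forms match when γ = 1/(2σ²). The sketch below assumes that σ in the text is the width in the exp(−‖x_i − x_j‖² / 2σ²) form; under that assumption σ = 0.25 corresponds to γ = 8. Whether ENVI's own kernel parameter follows the same convention is not stated here, and the helper names are hypothetical.

```python
import numpy as np


def rbf_kernel(xi, xj, sigma):
    """RBF kernel exp(-||xi - xj||^2 / (2 * sigma^2)) between two feature vectors."""
    diff = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))


def sigma_to_gamma(sigma):
    """Equivalent gamma for the form exp(-gamma * ||xi - xj||^2)."""
    return 1.0 / (2.0 * sigma ** 2)


print(sigma_to_gamma(0.25))  # 8.0
```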

Sample size calculation
The studied image contains N = 401,888 pixels in total. The sampling rate is the ratio of the sample size (n) to the total number of pixels (N). The relationship between distance (number of interval pixels) and GLCM correlation for the study region, calculated with Eq. (2), is shown in Fig. 3. Taking GLCM-correlation parameters r = 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, and 0.5 as examples, the optimal numbers of interval pixels and the optimal distances in the 0- and 90-deg orientations are listed in Table 1.
Table 1 and Fig. 3 show that the GLCM-correlation parameter is negatively related to the number of interval pixels: once the interval becomes large enough, the GLCM-correlation parameter approaches 0. The GLCM-correlation parameter also decreases at different rates in different orientations; in this study, the decrease was steeper in the 90-deg orientation than in the 0-deg orientation. Consequently, a larger critical GLCM-correlation value is reached at a shorter interval and therefore requires a larger sample size for the accuracy assessment of land cover.
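A curve like Fig. 3 can be computed without building the full matrix: for a normalized, non-symmetric GLCM over the full gray range, the GLCM correlation equals the Pearson correlation between each pixel's gray level and that of its neighbor at lag d. The sketch below relies on that identity; `correlogram` is a hypothetical helper, and the test image is synthetic (white noise smoothed along rows only), not the study data.

```python
import numpy as np


def correlogram(image, max_lag):
    """GLCM-correlation r(d) at 0 deg and 90 deg for d = 1, ..., max_lag.

    Uses the identity that, for a normalized non-symmetric GLCM over the full
    gray range, r equals the Pearson correlation between each pixel's gray
    level and its neighbor's gray level at lag d.
    """
    img = np.asarray(image, dtype=float)
    r_0, r_90 = [], []
    for d in range(1, max_lag + 1):
        # 0 deg: horizontal neighbor (same row, d columns to the right)
        r_0.append(np.corrcoef(img[:, :-d].ravel(), img[:, d:].ravel())[0, 1])
        # 90 deg: vertical neighbor (same column, d rows below)
        r_90.append(np.corrcoef(img[:-d, :].ravel(), img[d:, :].ravel())[0, 1])
    return np.array(r_0), np.array(r_90)


# Synthetic example: noise smoothed with a 5-pixel moving average along rows
rng = np.random.default_rng(0)
noise = rng.normal(size=(200, 300))
smooth = np.array([np.convolve(row, np.ones(5) / 5, mode="valid") for row in noise])
r_0, r_90 = correlogram(smooth, 10)
```

On this synthetic image r_0 falls from about 0.8 at d = 1 toward 0 by d = 5, while r_90 stays near 0 because the rows are independent, mirroring the orientation-dependent gradients described above.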

Sample points distribution
The distribution of sample points affects the assessment precision. In this study, sample points were selected according to the principles of uniformity and heterogeneity. Based on the optimal distance (D) in Table 1, the study region was divided into n rectangles, each with an area of D × D, and one sample point was selected in each rectangle, giving n sample points in total. Taking GLCM-correlation parameters r = 0.85, 0.8, 0.75, 0.7, and 0.65 as examples, the sample points located in the region are shown in Fig. 4. Fig. 4 and Table 2 show that (1) the sample points are uniformly distributed across the study region regardless of sample size (Fig. 3), and (2) the sample points are distributed among the land-cover types in proportion to the area of each type (Table 2). Thus, the proposed model ensures that the sample points are uniformly distributed in the spatial region and proportionally distributed among the land-cover types, for every GLCM-correlation parameter and for land-cover types of every size.
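The grid-based selection described above can be sketched as follows. The text does not say where the point is placed inside each D × D cell, so this sketch draws it at a random pixel within the cell (an assumption; the cell center would be an equally valid choice), and partial cells at the image edge also receive one point. `grid_sample` is a hypothetical name.

```python
import numpy as np


def grid_sample(n_rows, n_cols, D, seed=None):
    """Select one sample point in every D x D cell of an n_rows x n_cols image.

    ASSUMPTIONS: the point is placed at a random pixel inside its cell, and
    partial cells at the right and bottom edges also receive one point.
    Returns an (n, 2) integer array of (row, col) pixel indices.
    """
    rng = np.random.default_rng(seed)
    points = []
    for top in range(0, n_rows, D):
        bottom = min(top + D, n_rows)
        for left in range(0, n_cols, D):
            right = min(left + D, n_cols)
            # Generator.integers excludes the upper bound, so the point stays in its cell
            points.append((rng.integers(top, bottom), rng.integers(left, right)))
    return np.array(points, dtype=np.int64)
```

Because every cell contributes exactly one point, coverage of the region is uniform by construction, and each land-cover type receives points roughly in proportion to the number of cells it occupies.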

Accuracy analysis of classification result of remote sensing image
In this study, the land cover classified from the high-resolution image was taken as the reference data. For each sample point, the land-cover types at the same position in the high-resolution image and in the studied image were compared: if they were consistent, the point was assigned 1; otherwise, it was assigned 0. The confusion matrix of the accuracy assessment for GLCM-correlation parameter r = 0.85 is shown in Table 3. The overall accuracy, kappa coefficient, and other assessment parameters were obtained from this confusion matrix. The accuracy parameters obtained from the total assessment (all 401,888 pixels) were taken as the true values, and those obtained from our model were taken as the assessed values (Tables 4 and 5).
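The point-by-point comparison and the assessment parameters can be sketched as a confusion-matrix tally followed by the standard accuracy measures. Class codes are assumed to be integers 0 to K − 1 (for example, the five land-cover types as 0 to 4), rows are the classified labels, and columns are the reference labels; `confusion_from_points` and `accuracy_metrics` are hypothetical helpers.

```python
import numpy as np


def confusion_from_points(classified, reference, n_classes):
    """Confusion matrix from the labels of the same sample points in two maps.

    classified[k] and reference[k] are integer class codes (0..n_classes-1) of
    sample point k in the studied image and in the reference data.
    Convention (assumed): rows = classified, columns = reference.
    """
    classified = np.asarray(classified)
    reference = np.asarray(reference)
    agree = (classified == reference).astype(int)  # 1 if consistent, else 0
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (classified, reference), 1)
    return cm, agree


def accuracy_metrics(cm):
    """Overall, producer's and user's accuracy, errors, and kappa from cm."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    diag = np.diag(cm)
    row_tot = cm.sum(axis=1)  # points classified as each class
    col_tot = cm.sum(axis=0)  # points belonging to each reference class
    overall = diag.sum() / total
    producer = diag / col_tot  # 1 - omission error
    user = diag / row_tot      # 1 - commission error
    chance = np.sum(row_tot * col_tot) / total ** 2  # expected chance agreement
    kappa = (overall - chance) / (1.0 - chance)
    return {"overall": overall, "producer": producer, "user": user,
            "omission": 1.0 - producer, "commission": 1.0 - user,
            "kappa": kappa}
```

For a two-class matrix [[45, 5], [10, 40]], the overall accuracy is 0.85, the chance agreement is 0.5, and kappa is 0.7.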
The rate of deviation (r) was calculated by Eq. (10):

$$r = \frac{|\hat{P} - P|}{P} \times 100\% \qquad (10)$$

where r is the rate of deviation, P̂ is the overall accuracy or kappa coefficient assessed at a given GLCM-correlation parameter, and P is the corresponding true value. Figure 5 compares the rates of deviation for each GLCM-correlation parameter. Tables 4 and 5 and Fig. 5 show that the overall accuracy and kappa coefficient derived from our model were very close to the true values; the greatest rate of deviation was only 0.54%. As the GLCM-correlation parameter increased, the rates of deviation of the overall accuracy and kappa coefficient decreased. Thus, the assessment accuracy of the proposed model was close to that of the total assessment.
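The rate of deviation can be computed as follows, assuming it is the absolute difference between the assessed value P̂ and the true value P, relative to P and expressed as a percentage. The helper name is hypothetical, and the numbers in the usage line are illustrative, not values from Tables 4 and 5.

```python
def rate_of_deviation(assessed, true_value):
    """ASSUMED form of the rate of deviation: |assessed - true| / true, in percent."""
    return abs(assessed - true_value) / true_value * 100.0


# Illustrative only: an assessed overall accuracy of 0.856 against a true 0.860
print(rate_of_deviation(0.856, 0.860))  # about 0.47
```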

Comparison Results of Different Assessment Models
In this section, three assessment models, namely the percent sampling model, the random sampling model, and our proposed model, were used to assess the accuracy of the classification result of the remote sensing image described above.

Compared with percent sampling model
With a sampling rate of 2%, the percent sampling model was used to assess the accuracy of land cover. As shown in Fig. 6, the percent sampling model uses a fixed sampling rate and ignores the autocorrelation among pixels in the remote sensing image, which makes it difficult to choose an appropriate sampling rate. In contrast, our model quantifies the relationship between the GLCM-correlation parameter and the sampling rate, so producers and users can readily determine the sampling rate according to spatial autocorrelation and heterogeneity.

Table 5 Comparison of different accuracy parameters.
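For scale, the fixed 2% rate applied to the N = 401,888 pixels of the study image gives about 8,038 sample points, whereas the 825 points used in the random sampling comparison correspond to a sampling rate of about 0.21%:

```latex
n_{2\%} = 0.02 \times 401\,888 = 8\,037.76 \approx 8\,038,
\qquad
\frac{825}{401\,888} \approx 0.00205 \approx 0.21\%
```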

Compared with random sampling model
With a sample size of 825, sample points were randomly selected in the region three times. Figure 7 shows the result of one of these random samplings, and Fig. 8 shows the rates of deviation for the random sampling model and our proposed model. The results of the random sampling model were not consistent across repetitions. Sample distribution is an important determinant of accuracy assessment: if it is not considered, the sample selection can be biased and the result cannot be objective. As shown in Table 4, the sampling rate of our model was consistent across the different experiments, and its rate of deviation was smaller than that of the random sampling model.
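The random sampling baseline can be sketched as drawing distinct pixel positions uniformly over the image, with a different seed for each repetition. `random_sample` is a hypothetical helper, and the 500 × 800 image shape in the usage line is hypothetical (the text gives only the total N = 401,888 pixels).

```python
import numpy as np


def random_sample(n_rows, n_cols, n, seed):
    """Draw n distinct pixel positions uniformly at random over the image.

    No spatial constraint is applied, so some areas may receive clustered
    points while others receive none.
    """
    rng = np.random.default_rng(seed)
    flat = rng.choice(n_rows * n_cols, size=n, replace=False)
    rows, cols = np.unravel_index(flat, (n_rows, n_cols))
    return np.column_stack((rows, cols))


# Three repetitions with 825 points each; the image shape is hypothetical
draws = [random_sample(500, 800, 825, seed) for seed in (0, 1, 2)]
```

Each repetition yields a different point set with no guarantee of even coverage, which is why repeated random samples can give inconsistent accuracy estimates, whereas the grid-based selection fixes one point per cell.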

Discussion
Assessing the classification accuracy of a remote sensing image is necessary before the image is used for scientific investigation and policy decisions. In this study, we proposed an accuracy assessment model based on spatial sampling that considers both sample size calculation and sample point distribution. For sample size calculation, we used the GLCM to quantify the relationship between spatial autocorrelation and sample size; the GLCM provides useful information about the spatial relationships of pixels in an image. Compared with percent sampling, which uses a fixed sampling rate, our model allows producers and users to determine the sampling rate according to spatial autocorrelation and heterogeneity. For sample point distribution, our method considers both the uniformity and the heterogeneity of the sample points, ensuring that they are uniformly distributed across the spatial region and proportionally distributed among the different types of land cover. Compared with the random sampling model, our model showed clear advantages in the consistency of the assessment results and in the rate of deviation. Overall, our model is suitable for the accuracy assessment of the classification result of a remote sensing image.
However, the proposed model has some limitations. We calculated the GLCM-correlation parameter (r) in only two orientations, 0 and 90 deg; the 45- and 135-deg orientations should be considered in future studies.

Conclusions

In this study, we proposed an accuracy assessment model for remote sensing classification results based on spatial sampling. The model calculates the sample size required for accuracy assessment, determines how the sample points are distributed in a region, and analyzes the result of the accuracy assessment, considering both sample size calculation and sample point distribution. Our model allows producers and users to easily determine the sample size, and it ensures that the sample points are uniformly distributed in the spatial region and proportionally distributed among the different types of land cover. Thus, the proposed model is suitable for the accuracy assessment of the classification result of a remote sensing image.