Influence of surface height variance on distribution of ground control points

Abstract The process of geometric correction of a satellite image requires a network of ground control points (GCPs), the density of which depends on the accuracy required. In this study, an approach to determine the distribution of the GCPs was developed based on dividing the surface terrain into zones of low- and high-height variances. Comparative statistics were investigated to summarize the optimum number and distribution of the GCPs. For a uniform arrangement of 10 GCPs, the accuracy using root mean square error was 7.19 m. This accuracy was improved to 6.13 m following the inclusion of just two additional GCPs in the zone of high-surface height variance. Thus, the transformation using the polynomial model and a set of GCPs, for which the surface variance in height was considered, resulted in greater accuracy than using the conventional uniformly distributed method. In addition, the time required for the trial-and-error selection of the locations of the GCPs was reduced. Our results suggest that a method that considers the variance in height of the surface terrain could be applied to various types of images such as satellite or aerial photography.


Introduction
Aerial photographs and satellite images are fundamental sources of data used for the analysis in geographic information systems, remote sensing, and mapping. The image acquisition process depends on the type of sensor used. Geometric distortion of the images from current pushbroom satellites, such as SPOT, ASTER, and Quickbird, is derived from two principal sources: inaccurate modeling of the charge-coupled device sensor geometry and small movements of the instrument platform during image acquisition. 1 High-resolution satellite images are prone to geometric distortion and thus, geometric correction is desirable. 2 The primary distortions have both internal and external sources. The former are the distortions inherent within a system that can be adjusted based on the characteristics of the sensor, e.g., lens distortion, which can be compensated for using the colinearity equation. The latter are the errors introduced from sources such as the Earth's curvature and changes in atmospheric conditions, the reflective properties of light, and surface topography. These external sources distort the image directly, causing positions within the image to be inconsistent with those on the ground. Variations in elevation and surface roughness cause a particular type of distortion known as perspective distortion, which normally happens when using a short focal length.
In some cases, images used in satellite data analysis have no accompanying information regarding the sensor model of the satellite and thus, they may require adjustment using other appropriate models. Different methods for the transformation of one coordinate system to another have been proposed, 3 e.g., the Helmert transformation, affine transformation, pseudo-affine transformation, projective transformation, second-order conformal transformation, and polynomial transformation. Of these, the polynomial model is simple and easy to use. Historically, polynomial models are among those empirical models used most frequently for fitting functions. 4 The distortion is predominantly low frequency and therefore, it can be modeled using a low-order polynomial. Polynomials offer many advantages such as simple form, moderate flexibility of shapes, well-known and understood properties, and ease of use computationally. This model transforms three-dimensional (3-D) object coordinates to image coordinates and converts the data from 3D to two-dimensional (2-D). This transformation requires the use of ground control points (GCPs), which may be obtained using integrated information from ground surveys, photogrammetry, or GPS data comprising x, y, and z coordinates. Rizos and Satirapod 5 stated that the differential GPS technique generally provides an accuracy of better than 3 m in the horizontal component at twice the distance root mean square error (RMSE) (at approximately 95% confidence level). The results of the transformation are the coefficients of the polynomial equation.
Generally, GCPs for geometric correction are distributed uniformly throughout the entire image using a square grid. 6 This approach is suitable for the calculation and determination of the GCP positions, but it does not account for onsite topography. The number of GCPs should be higher in mountainous areas or in areas with high variance of surface height. However, obtaining GCPs by field survey or other methods can be costly and time consuming; therefore, the optimization of the GCP network is desirable. The GCP locations should have high-spatial frequency and well-defined positions such as at river crossings and road junctions. 7 Elevation is one of the three important variables needed to determine the coefficients of the 3-D polynomial equation and the positions of the GCPs. The GCPs should be located to sample the 3-D content of the data, i.e., a greater number of samples in regions of high height variation. Wang et al. stated that the spatial distribution of GCPs may affect the accuracy and quality of image correction. 8 This study aims to determine the optimum number of GCPs required for each type of surface by classifying the level of height variance based on the standard deviation (SD) of the elevation distribution.

Study Area and GCP Availability
The study area is located in Chiang Mai and Lampoon, which are northern provinces of Thailand, approximately 600 km from Bangkok. This area has a variety of topographic characteristics including both flat areas and highly complex surfaces. The coordinates of the upper left and lower right corners of the target area are 98.83040°E, 18.7702°N and 99.02550°E, 18.5918°N, respectively, encompassing an area of approximately 406 km 2 . The observed surface variance in height, based on a digital elevation model (DEM), is highly diverse, causing high values of SD and difference in elevation (the highestthe lowest), which are 69 and 643 m, respectively. Areas with high variance in height (mountainous terrain) accounted for ∼35% of the area, mainly to the northwest; and the remaining 65% in eastern and southern parts of the image was areas with low variance in height (urban). Figure 1 shows an image of the landscape of this area acquired from Worldview-1 (Level 1B) on February 23, 2008. Level 1B data are radiometric-corrected, sensor-corrected, and band-aligned, providing imagery as seen from the spacecraft without any further correction of terrain distortion. Worldview-1 operates at an altitude of 496 km with a resolution of 50 cm in the panchromatic band at nadir. The specified geolocation accuracy of the instrument is less than 4.0 m CE90 (Circular Error of 90%) without GCPs in a 2-D image. 9 There are 78 GCP points with 3-D coordinates and accuracy of 0.2 m available for the image, which are provided by the Land Development Department of Thailand; however, only 68 GCP points were used in this study as independent check points (ICPs). The ICPs are those points that have been used in the RMSE process for checking the transformed image. A static and kinematic GPS survey was used to acquire the positions of the GCPs. All GCPs including the ICPs are shown in Fig. 2.

Three-Dimensional Polynomial Model
Using GCPs to determine the coefficients, polynomial modeling is applied to generate a new surface model with enhanced accuracy, both in terms of shape and position of the image. Stability analysis of the polynomial differential equations is a central topic in nonlinear dynamics and control, which in recent years has undergone major algorithmic developments following advances in optimization theory. 10 The transformation model creates a new geocoded image space, in which interpolated pixel values will be placed later during resampling. The procedure requires that the polynomial equations to be fitted to the GCPs using least squares criteria to model the geometrical distortion in the image domain independently of its source. The degree of the polynomial may be chosen based on the desired accuracy and the number of GCPs available. In general, polynomial equations used for modifying images are typically second or third order. The independent variables in the equation are X n , Y n , and Z n . The dependent variables are the row and column positions of the pixel (r p and c p ).
For matching the point of the GCP chip on the ground and the associated point on the distorted image, the surrounding observation has been used. The coordinate functions are as shown below: where r p is the position of the pixel in the row that corresponds to the column of the pixel in the distorted satellite image, c p is the position of the pixel in the column that corresponds to the row of the pixel in the distorted satellite image, and X, Y, Z, and n are the ground coordinates that correspond to the east and north positions, elevation, and local position number, respectively. Function p v ðX n ; Y n ; Z n Þ and p h ðX n ; Y n ; Z n Þ, as a second-order polynomial, can be generated from Eqs. (1) and (2) as follows: Generally, polynomial transformations with terms up to the first order can model rotation, scale, and translation and are computationally economical. As additional terms are added, more complex warping can be achieved. a j and b j ðj ¼ 0; 1; : : : ; 9Þ are coefficients that are unknown but can be calculated using the least squares method from the error E of all data n deviating from the function pðX n ; Y n ; Z n Þ as follows: Then, the minimum error can be found by differentiating E with respect to the coefficient and setting the result to zero; thus, the following 10 subequations are produced: The results of these equations will be in the form of a matrix with the following equation: 2 On the left side of this equation, a symmetric matrix "A" represents the known values square matrix (10 × 10), and matrix "B" denotes the coefficients matrix (10 × 1). On the right side, matrix "C" represents the known value matrix (10 × 1). Therefore, the multiplication of both sides of Eq. (7) with inverse matrix A provides the coefficients a 0 ; a 1 ; a 2 ; : : : ; a 9 .

Surface Variance in Height Classification Method
Bayer et al. 11 stated that the synthetic aperture radar images reveal radiometric image distortions that are caused by terrain undulations, because the 3-D nature of the surface is ignored and GCPs are distributed uniformly here. We account for this and increase the sampling density depending on local variations of elevation. Here, two zones were chosen for these data; however, this may not be optimal. One method to allow for this would be to partition the image into several regions based on height variation and to have different GCP densities in the different regions. Many histogram-based approaches identify a threshold for the binarization of an image by approximating a histogram as two distribution functions. [12][13][14] Wu et al. proposed procedures for changepoint detection (threshold selection) by determining the maximum distance from a line L to a cumulative histogram, where line L connects the first and last points of the cumulative histogram (Fig. 3). 15 The main advantage of this method is being able to use the height to find a changepoint of the surface variance in height. It is also appropriate for a large number of data points.
In Fig. 3, the straight line L connecting ðx 1 ; y 1 Þ and ðx N ; y N Þ is a cumulative "equalized" histogram. Histogram equalization modifies the pixel values in such a way that the intensity histogram of the resulting height becomes uniform. 16 The maximum distance to L provides a baseline for dividing the cumulative histogram into two parts. The point ðx T ; y T Þ, representing the maximum distance to L, is the change-point to be used as the threshold. Line L can be obtained from the following equation: Assume that the obtained line equation is as shown below: ax þ by þ c ¼ 0: The distance from ðx i ; y i Þ to L is expressed as follows: Compare the distances of all ðX i ; Y i Þ points, where 2 ≤ i ≤ N − 1, to L and set T ¼ arg max N−1 i¼2 D½ðX i ; Y i Þ; L. Thus, the selected threshold is X T . 15

Change-Point Determination Using the Elevation Distribution
The classification of surface height variance, based on elevation data, was conducted using grid-based DEM data. The DEM data with resolution of 30 m were downloaded from the Global DEM and operated jointly by NASA and the Japan Ministry of the Economy. These data were then used to create a graph of the frequency distribution. The change-point was determined using the cumulative distribution curve. Figure 4 shows the elevation on the x-axis, the number of pixels in each elevation interval on the left side y-axis, and the cumulative percentage at each elevation interval on the right side y-axis. Based on this graph, the change-point for elevation was determined to be 308 m.

Change-Point Determination Using the SD Distribution
The SDs represent the variance in height. Each SD is calculated using the eight pixels that surround a pixel of interested from a 3 × 3 elevation matrix. Therefore, the same calculation can be used to determine all the SDs for the entire image. The method proposed by Wu et al. 15 was also used to determine the change-point based on the SD histogram. Figure 5 shows a SD histogram with the SDs plotted on the x-axis, the number of pixels in each SD interval shown on the left side y-axis, and the cumulative percentage of the SDs presented on the right side y-axis. Based on this graph, the SD change-point was determined to be 3.

Classification and Zoning
After the change-points were determined, both the elevation frequency distribution and the SD frequency distribution were used to classify the surface into zones. The classification was conducted by designating those areas with elevation >308 m or SD > 3 as zones with high variance of height and necessarily, those areas with elevation <308 m or SD < 3 were classified as zones with low variance of height.
The final zoning of the area using Global Mapper software is shown in Fig. 6. It can be seen clearly that the area is divided into two zones following the change-point detection process. The white part represents the zone of high variance in height, and the black part represents the zone of low variance in height.

Number of GCPs for Each Zone
The model used for geometric correction and for identifying the numbers and positions of the GCPs in this study was a second-order 3-D polynomial. As shown above, there are 10 variables in the form of coefficients in the polynomial: a 0 to a 9 . The number of GCPs required for this model is 10 per image. The image was classified into two zones using the zoning method described above, and the 10 points were distributed between these two zones. The number of GCPs assigned to each zone was determined after consideration of the zones size and degree of height variance.
The percentages of the total area (measured from the image) assigned to the two zones without consideration of the degree of variance in height were 35% (0.35) and 65% (0.65), respectively. The trial weights for the degree of variance in height ranged from 40% to 70% for four cases. Table 1 shows five scenarios for the case being studied. In the first scenario, the 10 GCPs were distributed uniformly throughout the entire area. The proportional areas of the two zones were the same for all cases. For example, the calculation for Case II was conducted with the area of high variance in height assigned an influence of 40% and a proportional area of 35% (fixed). Accordingly, the area of low variance in height had an influence of 60% and a proportional area of 65% (also fixed). Based on this information, the number of GCPs to be distributed into each zone was determined to be 4 and 6 for the low and high variances in height zones, respectively. The results for all scenarios are shown in Table 1.

GCP Allocation
In general, GCPs are used in uniform distribution patterns. Points should first be placed on the four corners of the image to control the accuracy of regional mapping. 17 Then, the other GCPs should be distributed uniformly within each zone. In this study, the selection of the locations of the GCPs was conducted manually. The locations of the GCPs for the five trial cases, based on the numbers of GCPs presented in Table 1, are shown in Fig. 7. Figure 7(a) shows Case I, in which the GCPs were allocated using the uniform distribution method. Four GCPs were placed at the corners to control the boundaries of the study area. The fifth GCP was placed at the center of the area, and the remaining five GCPs were spaced equally around the fifth point, resulting in a pentagon shape. The variance in height was not considered in this case.
For Cases II to V presented in Figs. 7(b) to 7(e), the GCP allocation based on a uniform distribution was conducted as follows: 1. Ten assumed points were defined within the study area, corresponding to the coefficients in the polynomial (a 0 to a 9 ), and each point is circled to show clearly the definition applied in each trial case.

RMSE
where n is the number of all remaining ICPs, X act and Y act are the actual control points' X and Y coordinates, and X est and Y est are the estimated control points' X and Y coordinates. Table 2 shows the RMSEs results when the GCP distribution was selected by varying the numbers and locations of the GCPs. It can be concluded that Case I is the best case because it shows the lowest average RMSE. This is probably because Case I applied a uniform distribution, which is a popular method for allocating GCPs. Furthermore, the variance in height of the surface in this case was ignored. The other cases that resulted in an increasing RMSE after the GCP positions were moved from the area of low variance in height to the area with high variance in height.
To compare the 68 points in the transformed image, 68 ICPs were used and the evaluated direction of error, both in the x-and y-axes, is presented in Fig. 8. It was necessary to enlarge the scale by a factor of 100 to show clearly the direction of error. Furthermore, Fig. 8 shows that there were still errors in the upper-left corner of the map of error, corresponding to the zone of high variance in height.
To confirm that the surface variance in height influenced the accuracy of the geometric correction of the image, a number of GCPs were removed or added to the zone of high variance in height and low variance in height; the different number of GCPs used per trial was two points. The results from this validation are presented in Tables 3 and 4. By increasing and decreasing the number of GCPs in the zone of high variance in height, it was determined that the number of GCPs cannot be reduced below 10, as shown clearly in Fig. 9 by the rapid increase in RMSE when the number of GCPs was reduced from 10 to 8. The RMSE decreased by about 12% following the increase in the number of GCPs from 10 to 12. No significant improvement in accuracy can be observed by increasing the number of GCPs beyond 12.   Fig. 9 Effect of removing/adding GCPs in high-variance zone.
Increasing or decreasing the number of GCPs in the low variance in height zone resulted in the same trend found in the high variance in height zone; however, the results did not show any significant improvement in accuracy following an increase in the number of GCPs beyond 10 ( Fig. 10).

Conclusions
In this study, an image of mountainous terrain of the Chiang-Mai area acquired by Worldview I, a global DEM, and 78 ground truth points with an accuracy of 0.2 m were used to evaluate the accuracy of distortion correction using a polynomial model. Using the turning point from the elevation and SD histogram, the terrain was divided into two regions: low variance in height and high variance in height. This study determined the RMSE of the GCPs' coordinates and demonstrated that the distribution of the GCPs affects the accuracy. In particular, it was shown that a uniform distribution of GCPs was the best arrangement. The effect of increasing or decreasing the number of GCPs in the high/low variance in height zones was investigated for the case of uniform distribution. The results conclude that a suitable minimum number of GCPs was 10 and that the accuracy was improvable by including just two additional GCPs in the zone of high variance in height, but that increase in the number of GCPs in the zone of low variance in height offered no significant improvement in accuracy.
The results of this study could be applied to aerial photographs or satellite images as guidance for identifying the optimum numbers and locations of GCPs for each zone of variance in height, especially for study areas with a variety of surface types. Through this approach, researchers could save time on the trial-and-error selection of locations for GCPs.
Sudniran Phetcharat is a PhD student of the Asian Institute of Technology, Thailand. He received the B.Eng. degree in transportation engineering from Suranaree University of Technology, Nakonrachasima, in 1997, and an M.Eng. degree in civil engineering from Kasetsart University, Bangkok, in 2001. He has been an assistant professor with the Civil Engineering Department, Srinakharinwirot University, College Station. His research interests include crumb rubber-modified asphalt, highway materials, GIS application in transportation, and house saving energy.
Masahiko Nagai is an associate director of the Geoinformatics Center and a visiting faculty in the Asian Institute of Technology (AIT) in Thailand. He received the Master of Science from AIT and the Doctoral of Engineering from the University of Tokyo, Japan. His research focuses on geoinformatics, mainly environmental monitoring, Earth observation data interoperability arrangement with ontological information for remote sensing and GIS, construction of 3-D modeling by integrating multiple sensors, as well as disaster management.
Taravudh Tipdecho is a research specialist I (adjunct faculty) at the Asian Institute of Technology (AIT) in Thailand. He received his BS and MS degrees from Chiangmai University, Thailand, and D Tech Sc. degree from AIT, Thailand. His research focuses on the acquisition methods for large scale geoinformation. In recent years, the research focus has been on information extraction from airborne laser-scanning data. These interests also include accuracy analysis, image understanding, and object modelling.