Lithologic classification using multilevel spectral characteristics

Abstract. Geological maps are commonly used to investigate the distribution of geological natural resources, such as minerals. However, the existing 1:200,000 geological map created in the 1990s for Guangxi, China, cannot support efficient investigation and interpretation of geological surface changes. Therefore, we propose the application of remotely sensed multispectral imagery to update the existing 1:200,000 geological map to a scale of 1:100,000. To this end, the spectral characteristics of six types of lithologies are first analyzed using the USGS spectral library and actual measurements made with a Field Spec® 4 (ASD Inc.). Based on the analyzed spectral characteristics, carbonate rock is separated from the other rocks using band ratios, which divides the study area into carbonate and noncarbonate areas. In the noncarbonate area, five types of rocks, namely, shale, marble, sandstone, granite, and basalt, are classified using supervised classification, in which the training data sets are taken from the 1:200,000 geological map. Field verification of the classified results shows that a classification accuracy of 66% is reached, which meets the accuracy requirement for the creation of 1:100,000 geological maps on the basis of the standard formulated by China Geological Map Remote Sensing Interpretation Technology. The 1:100,000 geological map created will be delivered to the Guangxi Geological Bureau, China, for applications by the geological and remote sensing communities.


Introduction
The geological map of Guangxi Province in China was made at a scale of 1:200,000 at the end of the 20th century using early very-low-resolution remotely sensed image classification and field investigation. For enhanced usability, the 1:200,000 geological map should be updated to a scale of 1:100,000. The prerequisite for this task is that the type and coverage area of each lithology in Guangxi, China, be obtained. To this end, this paper presents a method for the classification of six major types of rocks, i.e., carbonate, shale, basalt, sandstone, marble, and granite.
Since the early 1980s, many investigators worldwide, such as Schetselaar et al., 1 Gomez et al., 2 Zhang et al., 3 Vaughan et al., 4 and Hunt, 5-7 have made considerable efforts to map minerals from remotely sensed imagery. These methods can be categorized as follows.

Information Fusion
Information fusion methods take advantage of different types of data sets. For example, Rowan and Mars 14 applied Radarsat data and ASTER data for effective identification of lithologic information. Chica-Olmo et al. 15 combined SPOT data and Landsat TM data with GIS spatial analysis to interpret the lithologic information.

Texture Extraction
Texture information is very important in identifying the types of rocks and minerals; therefore, the texture extraction method is widely used in the study of lithologic classification. For example, Arivazhagan and Ganesan 16 used wavelet statistical features for texture analysis. They found that the combination of wavelet statistical features and co-occurrence features could better extract the lithology. Chica-Olmo and Abarca-Hernandez 17 proposed the application of statistical functions representing the texture to improve the classification accuracy of the lithology. In recent years, many new methods have been proposed and widely used for lithologic classification, such as spectral angle mapping 18 and MNF transform and related band absorption depth analysis. 19

Pattern Recognition
Methods such as wavelet analysis, neural network, expert knowledge system, pattern recognition, and decision tree classification have been widely applied to lithologic classification. For example, Ishikawa and Gulick 20 and Mobasheri and Ghamary-Asl 21 presented the decision framework of an expert system for lithologic classification using hyperspectral images and achieved a classification accuracy of 96%. In addition, many other researchers such as Wilkinson 22 employed the spatial structures in combination with the spectral band to perform the classification.
Despite considerable effort in this field, each of these approaches has shortcomings. The information fusion method is able to take advantage of different types of data sets, but data collection is expensive; therefore, it is not suitable for lithologic identification over a large area. The texture analysis method is capable of effectively extracting lithologic information, but it is labor-intensive and inefficient. The pattern recognition method depends on the results of an initial classification and identification of rocks. Therefore, this paper takes advantage of the rock spectral information and the prior geological map.

Study Area
The study area is Guangxi, China, which extends from 104°26′ to 112°04′ E and from 20°54′ to 26°24′ N. The study area covers ∼0.238 million km² (Fig. 1). It is a typical karst landform, and its karst rocky desertification area constitutes ∼16.0% of the total karst rocky desertification area in China. 23,24 The strata in the study area are well developed and the outcrop area is large. The three main types of rocks in the study area are sedimentary, magmatic, and marble rocks. The largest coverage is of sedimentary rocks, accounting for 90% of the study area. Because carbonate, shale, basalt, sandstone, granite, and marble rocks together account for more than 95% of the study area, this paper only extracts these six types of rocks.

Data Sets
The data sets used in this study are as follows:
1. Landsat 7 data: Landsat 7 ETM+ satellite imagery was selected as the base data for lithologic classification in this paper. The data employed for this investigation were downloaded from the USGS website (Ref. 25).
2. Geological map data: (1) Numerous Quaternary, Cretaceous, Jurassic, Triassic, Permian, Silurian, and Ordovician strata exist in this study area. (2) The largest coverage is of carbonate, accounting for more than 40% of the study area; the smallest coverage is of basalt (see Fig. 2). Six types of lithologies, i.e., carbonate, shale, basalt, sandstone, marble, and granite, will be classified using the a priori knowledge from the geological map.
3. Laboratory spectral measurement data: Rocks in the field are often mixed with other types of rocks, for instance sandstone with shale, which makes them difficult to classify. This means that the USGS spectral library, which is "theoretically" measured for a pure type of rock or mineral, is not sufficient to express the spectral characteristics of the rocks in the study area. For this reason, this paper measures the spectral curves of seven types of minerals and rocks. The seven types of rock samples are granite porphyry, limestone, sandstone, quartz, quartz porphyry, basalt, and diabase, which were collected from the study area (see Fig. 3).
The laboratory measurements of the seven types of rock samples were conducted as follows. A Field Spec® 4 from ASD Inc., with a wavelength range of 0.350 through 2.500 μm, was used to measure the spectral curves (see Fig. 4). The Field Spec® 4 was first calibrated using a standard white reference board. After the calibration was completed, the spectral curve was immediately measured for a mineral sample. Each type of sample was measured at least 5 times to ensure the reliability of the data. If the measured spectral curves of a sample showed strong fluctuation, the measurements were repeated until a stable spectral curve was obtained.
We screened the measured results for redundant measurements and averaged the data. The spectral curves of the rock samples are shown in Fig. 5.
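The screening and averaging of repeated measurements can be sketched as follows. This is a minimal illustration, not the procedure actually used: the screening rule (dropping a repeat that departs from the per-wavelength median by more than a tolerance), the function name `average_spectra`, the parameter `max_dev`, and all reflectance values are assumptions introduced here.

```python
import numpy as np

def average_spectra(measurements, max_dev=0.05):
    """Average repeated reflectance spectra after dropping unstable repeats.

    measurements: array of shape (n_repeats, n_wavelengths).
    max_dev: hypothetical tolerance; a repeat is dropped when it departs from
             the per-wavelength median by more than this at any wavelength.
    Returns the mean spectrum and the number of repeats that were kept.
    """
    spectra = np.asarray(measurements, dtype=float)
    median = np.median(spectra, axis=0)
    keep = np.max(np.abs(spectra - median), axis=1) <= max_dev
    if not keep.any():
        raise ValueError("no stable measurement left to average")
    return spectra[keep].mean(axis=0), int(keep.sum())

# Five made-up repeats of one sample at three wavelengths; the last one fluctuates.
repeats = np.array([
    [0.30, 0.32, 0.35],
    [0.31, 0.33, 0.36],
    [0.29, 0.31, 0.34],
    [0.30, 0.32, 0.35],
    [0.45, 0.10, 0.60],
])
mean_spectrum, n_used = average_spectra(repeats)
```

With these values, the fluctuating fifth repeat is discarded and the remaining four are averaged.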

Repair the bad sectors
The Landsat 7 ETM+ scan line corrector failed on May 31, 2003, leaving the image data partially overlapped and gapped. Therefore, a gap-filling algorithm was developed to fix the problem. The gap-filling method extracts the "good" image data from other images that cover the same geographic coordinates but were acquired at a different time, and then fills the "bad" sectors with the "good" data. The equation is given as Eq. (1), where α is the image gain calculated as α = …
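Because the gap-filling equation itself is not reproduced here, the following is only a sketch of one common way to fill gapped pixels from a second acquisition: a linear gain and bias adjustment estimated from the pixels that are valid in both images. The gain and bias formulas (ratio of standard deviations, difference of means), the use of scene-wide rather than local statistics, the function name `fill_gaps`, and the gap marker value are all assumptions, not the paper's Eq. (1).

```python
import numpy as np

def fill_gaps(primary, fill, gap_value=0):
    """Fill gap pixels of `primary` with gain/bias-adjusted pixels from `fill`.

    Assumed model (not taken from the paper): filled = gain * fill + bias, with
    gain = std(primary) / std(fill) and bias = mean(primary) - gain * mean(fill),
    both computed over the pixels that are valid in both images.
    """
    primary = np.asarray(primary, dtype=float)
    fill = np.asarray(fill, dtype=float)
    gap = primary == gap_value
    common = ~gap & (fill != gap_value)          # good in both acquisitions
    if common.sum() < 2:
        raise ValueError("not enough common valid pixels to estimate the gain")
    sd_fill = fill[common].std()
    gain = primary[common].std() / sd_fill if sd_fill > 0 else 1.0
    bias = primary[common].mean() - gain * fill[common].mean()
    filled = primary.copy()
    usable = gap & (fill != gap_value)           # gaps that the fill image covers
    filled[usable] = gain * fill[usable] + bias
    return filled, gain, bias

# Made-up 2 x 3 band with two gap pixels (value 0) and a darker fill image.
primary = np.array([[10, 20, 0], [30, 0, 40]])
fill = np.array([[6, 11, 16], [16, 21, 21]])
filled, gain, bias = fill_gaps(primary, fill)
```

In practice the statistics are often estimated in a moving window around each gap pixel rather than over the whole scene; the scene-wide version is used here only to keep the sketch short.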

Atmospheric and geometric corrections
After repairing the defective sectors, a second-order polynomial equation was used for geometric correction of the Landsat 7 ETM+ imagery with ENVI 5.0 software. Seventeen ground control points that were well distributed across the study area were selected. During the correction, two points with large errors were eliminated, and the bilinear interpolation method was used for gray-level resampling. The MODTRAN model was employed for atmospheric correction using the FLAASH module in the ENVI 5.0 software. MODTRAN (MODerate resolution atmospheric TRANsmission) is a widely accepted model for the correction of atmospheric radiation such as solar shortwave radiation, as it offers high accuracy and flexible radiative transfer in a scattering atmosphere. Next, the images were clipped and mosaicked into a complete image for the entire area of Guangxi, China [see Fig. 7(a)]. The results after the atmospheric corrections are shown in Fig. 7(b).
Landsat 7 ETM+ has eight bands. This paper presents the statistical data of each band and studies the correlations between bands. The specific statistical results are shown in Table 1. From Table 1, it can be seen that the correlations between band 2 and band 3 and between band 5 and band 7 are very strong. In order to fully reflect the information carried by the different bands of the remote sensing image, this paper applies band 7, band 4, and band 1 to create color-synthesized imagery [see Fig. 7(a)].
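An inter-band correlation table of the kind summarized in Table 1 can be computed as in the sketch below. It is an illustration only: the function name `band_correlation` and the tiny three-band array are made up, and the actual statistics in this paper were obtained with ENVI 5.0.

```python
import numpy as np

def band_correlation(cube):
    """Pearson correlation between the bands of an image cube.

    cube: array of shape (n_bands, rows, cols).
    Returns an (n_bands, n_bands) correlation matrix.
    """
    cube = np.asarray(cube, dtype=float)
    flat = cube.reshape(cube.shape[0], -1)   # one row of pixel values per band
    return np.corrcoef(flat)

# Made-up 2 x 2 image with three bands; band 1 is a linear function of band 0.
band0 = np.array([[1, 2], [3, 4]])
band1 = 2 * band0 + 3
band2 = np.array([[4, 1], [3, 2]])
corr = band_correlation(np.stack([band0, band1, band2]))
```

Strongly correlated band pairs carry largely redundant information, which is the reason for picking weakly correlated bands for a color composite.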

Analysis of Spectral Characteristics
It has been widely accepted that different ions affect the spectrum of a rock differently, whereas rocks with the same ions and constituent minerals have similar spectral curves. Therefore, this paper attempts to find an effective classification method for lithologic classes by analyzing the spectral characteristics of each type of rock.
The major components of sandstone are quartz and feldspar. The spectral characteristics of the minerals in sandstone from the USGS spectral library are shown in Fig. 8. The fact that different types of rocks have similar minerals presents a challenge for lithologic classification if their spectral characteristics alone are employed. For example, the spectral characteristics of sandstone in Fig. 9 are very similar to those of muscovite shown in Fig. 8, except for an absorption peak at 1.4 μm.
Shale is a type of sedimentary rock composed of mud, which is a mix of flakes of clay with fragments of other minerals such as quartz and feldspar. Shale contains 45% to 80% SiO2. 27 The spectral curves of shale from the USGS spectral library are shown in Fig. 10(a). As observed from Fig. 10(a), a prominent absorption peak occurs at 1.4 μm, but no obvious absorption and reflection peaks appear in other spectral ranges. In addition, both shale and sandstone contain quartz, which implies that both should have very similar spectral curves. In order to further understand their spectral characteristics, the spectral curve of shale was measured and the result is shown in Fig. 10(b). It can be found that although the spectral characteristics of shale and sandstone are similar, the reflectivity of shale is lower than that of sandstone.
Basalt is a rock with low reflectivity. 27 This indicates that the remotely sensed image in the basalt region has a dark tone or is black. The spectral curves of the main minerals in basalt, namely, olivine, pyroxene, hornblende, and biotite, from the USGS spectral library are collected and shown in Fig. 11. This figure shows a large absorption peak located at 1.1 μm and a reflection peak located at 0.7 μm. Therefore, the spectral curve of basalt is obviously different from that of other rocks.
Granite is classified as an igneous rock that contains ∼65% to 75% silicon dioxide. 27 The spectral curves of the constituents of granite rocks, namely, potash feldspar, quartz, muscovite, and biotite, from the USGS spectral library are collected and shown in Fig. 12. It can be found from Fig. 12 that granite and basalt have similar spectral curves, as both of them contain biotite. This implies that an alternate method should be used to classify granite rocks. In addition, the spectral curves of marble minerals, namely, tremolite, andalusite, serpentine, and potash feldspar, from the USGS spectral library are collected and shown in Fig. 13. It is found from Fig. 13 that granite and marble rocks have similar spectral curves because both of them contain potash feldspar.
Carbonate rocks consist mainly of calcite and dolomite and may contain iron-bearing minerals such as siderite and limonite. The spectral curves of calcite and dolomite have significant absorption peaks between 2.075 and 2.351 μm; the spectral curves of siderite and limonite are relatively flat between 1.550 and 1.750 μm (see Fig. 14).
The following conclusions can be drawn from the above spectral analysis:
1. Figures 8, 10(a), 10(b), and 14 show that carbonate, sandstone, and shale have similar constituents, which causes challenges in classification. It is, therefore, not a good approach to classify the three types of rocks simultaneously.
2. As inferred from Fig. 14, a band ratio method may be appropriate to classify carbonate, as this method divides the value of each pixel in one band by its value in another band and uses the ratios to form a new image. Cloutis et al. 28 discovered that carbonate has a total of seven absorption peaks in the ETM+ spectral range, of which one, at 1.750 μm, falls in band 5 and two, at 2.130 and 2.220 μm, fall in band 7. Therefore, the use of the band ratio method should effectively separate carbonate from the other minerals.
3. As observed from Figs. 9, 10(a), and 10(b), the spectral curves of shale and sandstone are similar. Therefore, classification results would not be ideal if a supervised classification method were used to classify them simultaneously. For this reason, this paper uses the minimum distance method, a supervised classification method, to identify shale, and spectral calculation to identify sandstone.
4. As observed from Figs. 10(a), 10(b), and 13, the spectral features and texture features of marble rocks and shale are obviously different. Therefore, a supervised classification method is proposed to classify them.
5. Sandstone, basalt, and granite rocks have similar spectral curves according to the USGS spectral library. This indicates that their classification needs to be combined with the laboratory spectral measurement data.

Classification of Rocks
Based on the above analysis, a flowchart for the classification of six types of lithologies is proposed in Fig. 15. The details are described as follows.

Classification of carbonate rocks
The band ratio method was applied to extract the carbonate rocks. The band ratio method is a numerical treatment applied to multispectral remotely sensed images that enhances the spectral characteristics of a lithology by calculating the ratio of the absorption band to the reflection band. It determines the ratio of the pixel brightness in different bands and then uses the ratios to create a new image.
As mentioned above and shown in Fig. 14, the spectra of the main minerals in carbonate rocks, namely, calcite and dolomite, are relatively flat between 1.545 and 1.755 μm, and obvious absorption peaks occur between 2.075 and 2.351 μm. In the ETM+ image, the carbonate rocks have an absorption peak at 1.750 μm in band 5 and two absorption peaks at 2.130 and 2.220 μm in band 7, so they can be classified by the ratio of these two bands. The band ratio in ETM+ imagery is calculated by
\[ \mathrm{float}(B5) / \mathrm{float}(B7). \tag{2} \]
The result of the band ratio is binarized using the decision tree in ENVI 5.0. Density slicing is then applied to the study area, and the threshold range of the carbonate rocks is selected. The carbonate rocks are classified using the selected threshold range. The results of the classified carbonate rocks are evaluated using 50 checkpoints well distributed across the study area. The classification accuracy is evaluated using the confusion matrix, and an accuracy of ∼64.5% is achieved [see Fig. 20(a)].
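The band ratio of Eq. (2) followed by thresholding can be sketched outside ENVI as follows. The function name `band_ratio_mask`, the pixel values, and the threshold range (`low`, `high`) are hypothetical; the threshold range actually selected for carbonate is not stated in the text.

```python
import numpy as np

def band_ratio_mask(b5, b7, low, high):
    """Compute float(B5) / float(B7) and flag pixels inside [low, high].

    Pixels where band 7 is zero get a NaN ratio and are never flagged.
    """
    b5 = np.asarray(b5, dtype=float)
    b7 = np.asarray(b7, dtype=float)
    valid = b7 != 0
    ratio = np.full(b5.shape, np.nan)
    ratio[valid] = b5[valid] / b7[valid]
    mask = np.zeros(b5.shape, dtype=bool)
    mask[valid] = (ratio[valid] >= low) & (ratio[valid] <= high)
    return ratio, mask

# Made-up digital numbers for a 2 x 2 window of bands 5 and 7.
b5 = np.array([[120, 80], [90, 50]])
b7 = np.array([[60, 80], [0, 20]])
ratio, carbonate = band_ratio_mask(b5, b7, low=1.5, high=2.2)
```

Converting to floating point before dividing, as `float()` does in the band math expression, avoids integer truncation of the ratio.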

Classification of shale and marble rocks
After carbonate is extracted, the remaining five types of rocks, namely, shale, marble, sandstone, granite, and basalt, are classified using five training areas representing the five types of lithologies. The training data sets of the five types of lithologies were selected from the 1:200,000 geological map and are shown in Fig. 17. As observed from Fig. 17, the texture features of shale and marble rocks are obviously different. Figures 10 and 13 show that the spectral curves of shale and marble rocks are also significantly different. Therefore, the minimum distance method is used to classify the shale and the marble rocks. To avoid misclassification of shale and sandstone, shale is classified first and sandstone is extracted afterward. For this reason, the training data sampled in this section must be very accurate and reliable. The classification results are depicted in Fig. 20(a).
With the 50 selected checkpoints, the classification accuracies of shale and marble rocks reach ∼67.4% and 65.7%, respectively.
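The minimum distance method assigns each pixel to the class whose mean training spectrum is nearest in Euclidean distance. The sketch below illustrates the idea with two bands and made-up training values; the function name `minimum_distance_classify` is hypothetical, and the classification in this paper was run with the classifier provided by ENVI 5.0.

```python
import numpy as np

def minimum_distance_classify(pixels, training):
    """Label each pixel with the class whose mean training spectrum is nearest.

    pixels: array of shape (n_pixels, n_bands).
    training: dict mapping class name -> array of shape (n_samples, n_bands).
    """
    labels = list(training)
    means = np.stack(
        [np.asarray(training[name], dtype=float).mean(axis=0) for name in labels]
    )
    pixels = np.asarray(pixels, dtype=float)
    # Euclidean distance from every pixel to every class mean.
    dist = np.linalg.norm(pixels[:, None, :] - means[None, :, :], axis=2)
    return [labels[i] for i in dist.argmin(axis=1)]

# Made-up two-band reflectances for two training areas and three pixels.
training = {
    "shale": [[0.10, 0.12], [0.12, 0.14]],
    "marble": [[0.50, 0.55], [0.52, 0.57]],
}
pixels = [[0.11, 0.13], [0.49, 0.60], [0.20, 0.20]]
classes = minimum_distance_classify(pixels, training)
```

Because the class means come directly from the training areas, errors in the training data shift the means and propagate to every pixel, which is why reliable training samples matter for this classifier.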

Classification of sandstone and granite rocks
In order to further classify these rocks, ENVI 5.0 was used to collect the spectral curves measured in the laboratory, as described in Sec. 3.1. We further resampled these spectral curves at the spectral range of 0.4 through 2.4 μm. Finally, the spectral curves of seven rock samples were created, as shown in Fig. 18.
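Resampling a measured curve onto the 0.4 through 2.4 μm range can be sketched with linear interpolation, as below. The target sampling step, the function name `resample_spectrum`, and the reflectance values are assumptions; the resampling settings used in ENVI 5.0 are not given in the text.

```python
import numpy as np

def resample_spectrum(wavelengths, reflectance, start=0.4, stop=2.4, n_points=5):
    """Linearly interpolate a spectrum onto an evenly spaced wavelength grid (μm).

    `wavelengths` must be increasing; `n_points` is a hypothetical choice.
    """
    target = np.linspace(start, stop, n_points)
    return target, np.interp(target, wavelengths, reflectance)

# Made-up curve sampled between 0.35 and 2.5 μm.
wavelengths = np.array([0.35, 0.40, 1.40, 2.40, 2.50])
reflectance = np.array([0.10, 0.20, 0.40, 0.60, 0.70])
grid, resampled = resample_spectrum(wavelengths, reflectance)
```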
Because the SiO2 content of acidic igneous rocks is generally more than 65%, 27 this paper uses 65% quartz and 35% quartz porphyry and granite porphyry as the rock sample below. Based on the components of granite rocks, the spectral curves can be calculated using the Spectral Math tool in ENVI 5.0 as
\[ s1 \times 0.65 + (s2 + s3) \times 0.35, \tag{3} \]
where s1 represents the spectral curve of quartz, and s2 and s3 represent the spectral curves of quartz porphyry and granite porphyry, respectively. Similarly, the spectral curve of sandstone is expressed by Eq. (4), where s1, s4, and s5 represent the spectral curves of quartz, dolomite, and sandstone, respectively. In addition, the spectral curve of basalt is expressed by Eq. (5), where s1 and s6 represent the spectral curves of quartz and diabase, respectively, and s7 represents that of basalt. Because quartz accounts for ∼48% of basalt, the proportion of quartz is set to 48%. The spectral curves for granite rocks, sandstone, and basalt calculated using Eqs. (3)-(5) are shown in Fig. 19.
The classification of granite rocks, sandstone, and basalt based on the results in Fig. 19 is performed as follows. First, the difference in the spectral information for sandstone and granite rocks at 1.4 μm is used as the basis for classification, as the absorption peaks of granite rocks are obvious at 1.4 and 1.9 μm, whereas the absorption peaks for both sandstone and basalt are not obvious at 1.4 and 1.9 μm. With the interpreted results in Sec. 2.2, we can roughly determine the range of interest of sandstone and acidic igneous rocks on the images. After binarization processing, we conduct image segmentation by selecting thresholds for the classification of sandstone and acidic igneous rocks. Finally, we can segment the areas of sandstone and granite rocks manually.
After the above processing, sandstone and granite rock areas were extracted for analysis. The areas of sandstone and granite rocks were segmented by the decision tree. The classification results were evaluated using 25 checkpoints selected in the study area. The 25 checkpoints were superimposed on the geologic map for accuracy assessment [Fig. 20(a)]. The results of the classification accuracy are shown in Table 2. As seen from Table 2, an average classification accuracy of 65.2% [(60.3 + 70)/2] was achieved.
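The spectral calculation of Eq. (3) is a weighted combination of measured curves evaluated wavelength by wavelength. The sketch below applies the expression exactly as printed; the function name `mix_granite` and the two-point spectra are made up for illustration, and the computation in this paper was done in ENVI 5.0.

```python
import numpy as np

def mix_granite(s1, s2, s3):
    """Evaluate Eq. (3) as printed: s1 * 0.65 + (s2 + s3) * 0.35.

    s1: quartz, s2: quartz porphyry, s3: granite porphyry, all sampled on the
    same wavelength grid. The weights are applied as written and are not
    renormalized here.
    """
    s1, s2, s3 = (np.asarray(s, dtype=float) for s in (s1, s2, s3))
    if not (s1.shape == s2.shape == s3.shape):
        raise ValueError("all spectra must share the same wavelength grid")
    return s1 * 0.65 + (s2 + s3) * 0.35

# Made-up reflectances at two wavelengths.
quartz = [0.6, 0.8]
quartz_porphyry = [0.2, 0.4]
granite_porphyry = [0.4, 0.2]
granite_curve = mix_granite(quartz, quartz_porphyry, granite_porphyry)
```

All input curves need to be resampled to a common wavelength grid before they are combined, otherwise the element-wise sum mixes values from different wavelengths.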

Basalt lithology classification
As seen from Fig. 19, the threshold method cannot effectively extract basalt because the spectral curves of basalt, sandstone, and granite rocks are similar. Moreover, the coverage of basalt in the study area is very small and the collection of samples is limited. However, with the sample collected in Sec. 2, the training data for basalt were selected and the minimum distance classifier (provided by ENVI 5.0) was applied to classify basalt. The operational steps are the same as those for marble rocks and shale in Sec. 3.2.2. Finally, the basalt area was extracted.
In addition, 50 checkpoints were superimposed on the geological map for accuracy assessment [Fig. 20(a)]. The assessment method is the same as that described in Sec. 3.2.3. The classification accuracies for each type of rock are shown in Table 2; the average classification accuracy was 67.4%.
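For the checkpoint-based assessment, accuracy is the share of checkpoints whose classified label agrees with the reference label, that is, the diagonal of the confusion matrix divided by its total. The sketch below uses a made-up two-class matrix with 50 checkpoints; the counts are illustrative and are not taken from Table 2.

```python
def overall_accuracy(confusion):
    """Overall accuracy from a square confusion matrix.

    confusion[i][j] is the number of checkpoints whose reference class is i
    and whose classified class is j.
    """
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no checkpoints")
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total

# Made-up counts: rows are reference classes, columns are classified classes.
confusion = [
    [20, 5],
    [11, 14],
]
accuracy = overall_accuracy(confusion)
```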

Field Verification of Classification Accuracy
In order to verify the classification accuracy, the city of Guanyang, located between 110°43′16″ E and 111°20′13″ E and between 25°10′32″ N and 25°45′37″ N, was selected for field verification [see Figs. 20(b) and 20(c)]. Six types of rocks were sampled and their coverage areas were investigated. Based on the field verification, the classification accuracies of carbonate, sandstone, granite, shale, marble, and basalt rocks were ∼64.5%, 60.3%, 70%, 67.4%, 65.7%, and 67.8%, respectively. The average accuracy was ∼66% [see Table 2 and Fig. 20(a)]. A classification accuracy of 66% meets the requirement of 1:100,000 geological mapping according to the Standard of China Geological Map Remote Sensing Interpretation Technology. 29 In addition, the areas containing the six types of rocks were calculated from the number of pixels assigned to the corresponding lithologies. The results are shown in Fig. 21.
From Figs. 20 and 21, it can be noted that the area occupied by carbonate in the study area is the largest. The carbonate rocks mainly cover the northern part of the study area; they also appear in the southwest part of the study area. The area occupied by sandstone is the second largest. The sandstone is widely distributed in the northwest part of the study area; it also appears in the central and eastern regions. Shale is mainly distributed in the eastern region. The granite distribution is concentrated in the south and northeast parts of the study area. The area covered by marble rocks is mainly located in the northern and southeastern regions. Basalt covers the smallest area and is scattered throughout the study area.
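The average accuracy and the pixel-count-based areas can be checked with a few lines of arithmetic. The six accuracies below are the values quoted above; the 30 m pixel size is the nominal resolution of the ETM+ multispectral bands, while the pixel count and the helper name `area_km2` are made up for illustration.

```python
# Field-verified accuracies (%) quoted in the text.
accuracies = {
    "carbonate": 64.5,
    "sandstone": 60.3,
    "granite": 70.0,
    "shale": 67.4,
    "marble": 65.7,
    "basalt": 67.8,
}
mean_accuracy = sum(accuracies.values()) / len(accuracies)  # about 65.95

def area_km2(n_pixels, pixel_size_m=30.0):
    """Area covered by n_pixels square pixels, in square kilometers."""
    return n_pixels * pixel_size_m ** 2 / 1e6

# Hypothetical pixel count for one lithology.
example_area = area_km2(1_000_000)
```

The mean of the six values is about 65.95%, which rounds to the ∼66% reported; one million 30 m pixels correspond to 900 km².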

Discussions and Remarks
As analyzed above, the classification accuracies of carbonate rocks and sandstone are not ideal. This is probably because their mineral components are similar. For example, carbonate rock consists of quartz and calcite, and over 52% of sandstone is composed of quartz. As shown in Fig. 22, the spectral curves of quartz and calcite differ widely around 0.6 and 1.4 μm. Based on the results of the field verification, the classification results of limestone are correct, as limestone is a type of carbonate rock with calcite as the main mineral and only a small amount of other components. The classification accuracy for carbonate rocks is high because carbonate rocks in the city of Guanyang are mainly composed of limestone, i.e., without the components of sandstone and shale. On the other hand, s1, s2, and s3 in Eq. (4) are based on laboratory measurements, in which the number of samples is limited. Therefore, the classification accuracy of sandstone is low. From Table 2, the classification accuracies for shale and carbonate rocks are close. In addition, the band ratio method affects the spectral characteristics of both clay and green mudstone, and clay is the main component of shale. Therefore, the band ratio method affects the classification accuracies of both carbonate and shale.

Conclusions
In this paper, a rock classification method using multispectral characteristics is proposed. The remotely sensed Landsat 7 ETM+ imagery in combination with the existing 1:200,000 geological map was applied to study the spectral characteristics of six types of rocks. The carbonate rocks were first classified by the band ratio method, which divided the study area into carbonate and noncarbonate areas. In the noncarbonate area, five types of rocks (shale, marble, sandstone, granite, and basalt) were further classified. Using the spectral data measured in the laboratory, the lithology information could be effectively extracted from the ETM+ imagery of the entire area. Field validations and investigations were conducted to validate the proposed methods. The validation results demonstrated that an average classification accuracy of 66% was achieved, which meets the accuracy requirement for a 1:100,000 geological map. The 1:100,000 geological map created in this paper will be delivered to the Guangxi Geological Bureau, China, for application by geological and remote sensing communities nationwide.