## 1. Introduction

Factor analysis is a multivariate method for the comprehensive analysis of observed data that comprise multiple variables and multiple samples.^{1} Starting from a similarity matrix of the variables or samples, it condenses many intercorrelated variables or samples into a few factors. Analyzing how the variables or samples combine reveals the essential factors that play the leading roles in the information extraction process.

Factor analysis has been used to classify geological samples, to identify the main factors of geological structures, to obtain geochemical information for different regions, to explain spatial variables, and to analyze topographic variables from which hydrological factors can be extracted.^{2–6} R-mode factor analysis has been used for geochemical division and for estimating water quality.^{1,7} Combined with cluster analysis, it provides a new quantitative method with explicit geological significance.^{8}

High temperature targets are surface cover types whose temperatures are markedly higher than those of normal surface cover types (approximately $300\,\mathrm{K}$). Such targets, for example forest fires, grassland fires, spontaneous combustion of coal seams, and volcanic eruptions, are generally hotter than 500 K. Because of the higher temperature, their emitted energy in the shortwave infrared ($1.3$ to $3.0\,\mu \mathrm{m}$) can equal or even exceed the energy reflected by surface cover types at normal temperatures. This is a significant feature of high temperature targets and can be derived from the blackbody radiation function.^{9} High temperature targets are of great significance in environmental monitoring, disaster warning, and resource investigation. In this article, factor analysis is used to recognize high temperature targets in remote sensing imagery.
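The dependence on the blackbody radiation function can be illustrated with a short calculation. The sketch below evaluates Planck's law at 2.2 μm, a wavelength chosen here only as an assumed representative of the shortwave infrared, for three illustrative temperatures. It ignores emissivity, the atmosphere, and reflected sunlight, so it shows only how steeply emitted radiance grows with temperature and is not a reproduction of the derivation in Ref. 9.

```python
import math

H = 6.62607015e-34   # Planck constant, J s
C = 2.99792458e8     # speed of light, m/s
K_B = 1.380649e-23   # Boltzmann constant, J/K


def planck_radiance(wavelength_um, temperature_k):
    """Blackbody spectral radiance in W m^-2 sr^-1 um^-1 (Planck's law)."""
    lam = wavelength_um * 1e-6                        # micrometres -> metres
    exponent = H * C / (lam * K_B * temperature_k)
    per_metre = 2.0 * H * C ** 2 / (lam ** 5 * math.expm1(exponent))
    return per_metre * 1e-6                           # per metre -> per micrometre


# Assumed wavelength (2.2 um) and illustrative temperatures.
for temperature in (300.0, 500.0, 800.0):
    print(temperature, planck_radiance(2.2, temperature))

# Emitted radiance of a 500 K surface relative to a 300 K surface at 2.2 um.
ratio = planck_radiance(2.2, 500.0) / planck_radiance(2.2, 300.0)
print(ratio)
```

Under these assumptions the emitted shortwave infrared radiance rises by several orders of magnitude between 300 K and 500 K, which is why emission becomes comparable to reflection only for hot targets.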

## 2. Method

Factor analysis is divided into R-mode factor analysis and Q-mode factor analysis. R-mode factor analysis is mainly used for studying the relationships among variables and realizing the classification of samples, while Q-mode factor analysis is mainly used for studying the relationships among samples and realizing classification of variables. Both of them can be unified by correspondence analysis.^{10}

R-mode factor analysis starts from a similarity matrix (here the correlation coefficient matrix) of the variables. Through correlation analysis of the variables, it condenses multiple variables into a few factors while losing as little information as possible. Each extracted factor can be considered as a linear combination of the original variables, so that the group characteristics of the variables can be analyzed and the thematic significance of every factor can be determined.

R-mode factor analysis follows the form

$$X=F{A}^{\mathrm{T}}, \tag{1}$$

where $X$ is the $n\times m$ original data matrix; $n$ is the number of samples (pixels); $m$ is the number of variables (bands); $A$ is the $m\times p$ loading matrix, which represents the correlation between variables and factors; $F$ is the $n\times p$ factor score matrix, which represents the correlation between samples and factors; and $p$ is the number of factors.

The calculation process of R-mode factor analysis is as follows:

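(1) Column standardization of the original data matrix $X$ (the standardized matrix is still denoted by $X$).

(2) Calculation of the correlation coefficient matrix $R$:

$$R=\frac{1}{n}{X}^{\mathrm{T}}X. \tag{2}$$

(3) Calculation of the eigenvalues $\mathrm{\Lambda}$ and eigenvectors $T$ of $R$.

(4) Calculation of the factor loadings according to the number of factors $p$:

$$A={T}_{p}{\mathrm{\Lambda}}_{p}^{1/2}, \tag{3}$$

where ${T}_{p}$ contains the eigenvectors of the $p$ largest eigenvalues and ${\mathrm{\Lambda}}_{p}$ is the diagonal matrix of those eigenvalues.

(5) Calculation of the factor score matrix $F$:

$$F=XA{\mathrm{\Lambda}}_{p}^{-1}. \tag{4}$$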
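The five steps above can be sketched with NumPy. This is a minimal illustration that assumes the principal-component solution of R-mode factor analysis: loadings are eigenvectors scaled by the square roots of their eigenvalues, and scores are the standardized data projected onto eigenvectors divided by those square roots. The function name `r_mode_factor_analysis`, the population standard deviation used for standardization, and the random demonstration data are all assumptions made here, not the processing chain of this study.

```python
import numpy as np


def r_mode_factor_analysis(data, p):
    """Principal-component R-mode factor analysis of an n x m data matrix."""
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    # (1) Standardize each column (band) to zero mean and unit variance.
    x = (data - data.mean(axis=0)) / data.std(axis=0)
    # (2) Correlation coefficient matrix of the variables (m x m).
    r = x.T @ x / n
    # (3) Eigenvalues and eigenvectors of R, sorted in descending order.
    eigvals, eigvecs = np.linalg.eigh(r)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # (4) Loadings of the first p factors: eigenvectors times sqrt(eigenvalue).
    loadings = eigvecs[:, :p] * np.sqrt(eigvals[:p])
    # (5) Factor scores: projection divided by sqrt(eigenvalue).
    scores = x @ eigvecs[:, :p] / np.sqrt(eigvals[:p])
    return eigvals, loadings, scores


# Hypothetical data: 270 samples of 6 correlated "bands" (not the study data).
rng = np.random.default_rng(0)
latent = rng.normal(size=(270, 3))
demo = latent @ rng.normal(size=(3, 6)) + 0.1 * rng.normal(size=(270, 6))

eigvals, loadings, scores = r_mode_factor_analysis(demo, p=3)
print(np.round(100.0 * eigvals / eigvals.sum(), 3))  # information quantity (%)
```

With this convention the squared loadings in each column sum to the corresponding eigenvalue and the score columns have unit variance, which gives two quick checks on any implementation. The sign of each factor is arbitrary, because an eigenvector and its negative are equally valid.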

## 3. Data Sources

The study area lies at the junction of Fugu in Shaanxi Province and Baode in Shanxi Province, China ($38^{\circ}39'\,\mathrm{N}$ to $39^{\circ}35'\,\mathrm{N}$, $110^{\circ}22'\,\mathrm{E}$ to $111^{\circ}19'\,\mathrm{E}$) (Fig. 1). Baode and Fugu are located on opposite sides of the Yellow River. The study area is rich in coal resources. Around 2002, high temperature targets such as coke ovens and metal smelting factories were widely distributed in this area. One Landsat-7 ETM+ scene acquired on July 14, 2002, is used in this study. A series of image processing steps, namely radiometric calibration, atmospheric correction, and clipping, is performed first. After this preprocessing, the radiant energy of a pixel that contains high temperature targets is the sum of the reflected and emitted energies. The reflectivity derived for such pixels is called the visual reflectivity,^{11} and its physical expression is^{9}

$${\rho}_{0}=\frac{{M}_{1}S+{M}_{2}S+{M}_{3}(1-S)+{M}_{4}(1-S)}{{T}_{\theta}{E}_{0}\,\mathrm{cos}\,\theta}. \tag{5}$$

There are nine surface cover types in the study area: residential area, road, forest land, cultivated land (in the hilly area), cultivated land (in the lowland), river, flood plain, gully, and high temperature targets. On the basis of spectral analysis,^{12} 30 representative pixels of each surface cover type are selected from the remote sensing imagery, giving 270 samples in total.

The Kaiser–Meyer–Olkin (KMO) test and Bartlett's test of sphericity^{13} are applied to all of the selected samples. The KMO statistic compares the simple correlation coefficients with the partial correlation coefficients of the variables and takes values in [0, 1]; the greater the KMO value, the more suitable the original variables are for factor analysis. Bartlett's test examines whether the actual correlation matrix differs from the identity matrix. When its significance value is smaller than the chosen significance level, the correlation among the original variables is significant and the variables are suitable for factor analysis. Here the KMO value is greater than 0.6 and the significance value of Bartlett's test is less than 0.05, so the premise of factor analysis is satisfied.

For the R-mode factor analysis of the representative samples, the eigenvalues and information quantities are listed in Table 1 and the factor loading matrix is given in Table 2. Because the accumulated information quantity of the first three factors reaches 98.141%, which meets the requirement of little information loss, the following analysis focuses on these three factors.
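The two suitability tests described above can be sketched as follows. The code uses the usual textbook definitions, which are assumed here rather than taken from Ref. 13: the KMO statistic is the sum of squared off-diagonal correlations divided by that sum plus the sum of squared off-diagonal partial correlations, and Bartlett's statistic is $-\left[n-1-(2m+5)/6\right]\mathrm{ln}|R|$ with $m(m-1)/2$ degrees of freedom. The function name `kmo_and_bartlett` and the random sample matrix are hypothetical.

```python
import numpy as np


def kmo_and_bartlett(data):
    """Return (KMO, Bartlett chi-square, degrees of freedom) for n x m data."""
    data = np.asarray(data, dtype=float)
    n, m = data.shape
    r = np.corrcoef(data, rowvar=False)

    # Partial correlations from the inverse of the correlation matrix.
    inv_r = np.linalg.inv(r)
    scale = np.sqrt(np.diag(inv_r))
    partial = -inv_r / np.outer(scale, scale)

    # KMO uses only the off-diagonal elements.
    off_diag = ~np.eye(m, dtype=bool)
    r2 = np.sum(r[off_diag] ** 2)
    q2 = np.sum(partial[off_diag] ** 2)
    kmo = r2 / (r2 + q2)

    # Bartlett's test of sphericity: H0 is that R equals the identity matrix.
    _, logdet = np.linalg.slogdet(r)
    chi_square = -(n - 1 - (2 * m + 5) / 6.0) * logdet
    dof = m * (m - 1) // 2
    return kmo, chi_square, dof


# Hypothetical data: 270 samples of 6 variables sharing one common component.
rng = np.random.default_rng(1)
common = rng.normal(size=(270, 1))
samples = common + 0.5 * rng.normal(size=(270, 6))

kmo, chi_square, dof = kmo_and_bartlett(samples)
print(kmo, chi_square, dof)
```

For six bands the test has 15 degrees of freedom, for which the 0.05 critical value of the chi-square distribution is about 25.0. A statistic above that value corresponds to a significance value below 0.05.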

## Table 1

Correlation matrix eigenvalues and information quantity calculated by the representative samples.

| Factor | Eigenvalue | Information quantity (%) | Accumulated information quantity (%) |
|---|---|---|---|
| Factor 1 | 3.982 | 66.365 | 66.365 |
| Factor 2 | 1.127 | 18.785 | 85.150 |
| Factor 3 | 0.779 | 12.990 | 98.141 |
| Factor 4 | 0.059 | 0.981 | 99.121 |
| Factor 5 | 0.048 | 0.793 | 99.914 |
| Factor 6 | 0.005 | 0.086 | 100.000 |
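The information quantity in Table 1 is each eigenvalue expressed as a percentage of the sum of all eigenvalues, which for a correlation matrix of six bands equals 6. The following sketch recomputes the two percentage columns from the rounded eigenvalues in the table, so small differences in the last digits relative to the published values are expected.

```python
# Eigenvalues as printed in Table 1 (rounded to three decimals).
eigenvalues = [3.982, 1.127, 0.779, 0.059, 0.048, 0.005]

total = sum(eigenvalues)
cumulative = 0.0
rows = []
for k, value in enumerate(eigenvalues, start=1):
    share = 100.0 * value / total      # information quantity (%)
    cumulative += share                # accumulated information quantity (%)
    rows.append((k, share, cumulative))
    print(f"Factor {k}: {share:7.3f} %   accumulated {cumulative:8.3f} %")
```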

## Table 2

Factor loading matrix calculated by the representative samples.

| Band | Factor 1 | Factor 2 | Factor 3 |
|---|---|---|---|
| Band 1 | 0.909 | −0.364 | −0.107 |
| Band 2 | 0.929 | −0.360 | 0.054 |
| Band 3 | 0.911 | −0.389 | 0.019 |
| Band 4 | 0.588 | 0.398 | 0.698 |
| Band 5 | 0.821 | 0.540 | −0.037 |
| Band 7 | 0.665 | 0.514 | −0.526 |
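Tables 1 and 2 can be cross-checked against each other. If the loadings are eigenvectors scaled by the square roots of the eigenvalues (the principal-component convention, assumed here), the squared loadings in each column sum to that factor's eigenvalue. The squared loadings in each row give the share of that band's variance retained by the three factors. The sketch below performs both checks on the rounded values from the tables.

```python
# Factor loadings from Table 2 and the first three eigenvalues from Table 1.
loadings = {
    "Band 1": (0.909, -0.364, -0.107),
    "Band 2": (0.929, -0.360, 0.054),
    "Band 3": (0.911, -0.389, 0.019),
    "Band 4": (0.588, 0.398, 0.698),
    "Band 5": (0.821, 0.540, -0.037),
    "Band 7": (0.665, 0.514, -0.526),
}
table1_eigenvalues = (3.982, 1.127, 0.779)

# Column sums of squared loadings should reproduce the eigenvalues.
column_sums = [sum(row[j] ** 2 for row in loadings.values()) for j in range(3)]

# Row sums of squared loadings: variance of each band explained by 3 factors.
communalities = {band: sum(v ** 2 for v in row) for band, row in loadings.items()}

for j, (computed, published) in enumerate(zip(column_sums, table1_eigenvalues), 1):
    print(f"Factor {j}: sum of squared loadings {computed:.3f}, eigenvalue {published:.3f}")
for band, h2 in communalities.items():
    print(f"{band}: communality {h2:.3f}")
```

The column sums agree with the published eigenvalues to within rounding, which supports reading the loading matrix in this convention.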

## 4. Experimental Results

The R-mode factor loading matrix reflects the relationship between variables and factors. Each factor can be considered as a linear combination of the variables, and its thematic significance is determined from the physical meaning of each band variable. The element ${a}_{ij}$ of the factor loading matrix indicates the importance of variable $i$ to factor $j$: the greater the absolute value of ${a}_{ij}$, the more important variable $i$ is to factor $j$. The thematic meaning of a factor can therefore be described by the variables with the larger absolute loadings. Factor 1 has positive loadings in every band; it represents the overall linear combination of the visual reflectivity of all bands and is called the brightness factor. Factor 2 represents the difference between near-infrared and visible reflectivity and is called the vegetation factor. In Factor 3 (i.e., ${f}_{3}$), the loadings with the largest absolute values are those of bands 4 and 7 (Table 2), so ${f}_{3}$ mainly carries the information of these two bands. Because the loading of band 4 is positive and that of band 7 is negative, the inverted factor $-{f}_{3}$ corresponds to the difference between bands 7 and 4. The approximate expression of $-{f}_{3}$ is

$$-{f}_{3}\approx {\rho}_{7}-{\rho}_{4}, \tag{6}$$

where ${\rho}_{4}$ is the visual reflectivity of band 4, and ${\rho}_{7}$ is the visual reflectivity of band 7.

The main difference between high temperature targets and other surface cover types is that, for high temperature targets, ${\rho}_{7}$ is larger and ${\rho}_{4}$ is smaller. Hence, just as the enhanced vegetation index^{14} ($\mathrm{EVI}={\rho}_{4}-{\rho}_{3}$, where ${\rho}_{3}$ is the visual reflectivity of band 3) enhances the information of vegetation, $-{f}_{3}$ can be used to enhance the information of high temperature targets. The value of $-{f}_{3}$ is calculated for each pixel; the greater it is, the more likely the pixel is to contain high temperature targets. In principle, $-{f}_{3}$ is also similar to the normalized difference fire index (NDFI)^{11} $[\mathrm{NDFI}=({\rho}_{7}-{\rho}_{4})/({\rho}_{7}+{\rho}_{4})]$, which in turn follows the same principle as the normalized difference vegetation index (NDVI)^{14} $[\mathrm{NDVI}=({\rho}_{4}-{\rho}_{3})/({\rho}_{4}+{\rho}_{3})]$ used to enhance vegetation. Like NDFI, $-{f}_{3}$ enhances the information of high temperature targets, so $-{f}_{3}$ can be named the fire factor.
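The band indices mentioned above are simple enough to state in code. The sketch below implements NDFI and NDVI as defined in the text, together with a plain band 7 minus band 4 difference as an assumed stand-in for the fire factor. The function names and the two sets of reflectivity values are hypothetical, chosen only to contrast a vegetated pixel with a pixel containing a hot target.

```python
import numpy as np


def band_difference(rho7, rho4):
    """Band 7 minus band 4; assumed stand-in for the fire factor."""
    return np.asarray(rho7, dtype=float) - np.asarray(rho4, dtype=float)


def ndfi(rho7, rho4):
    """Normalized difference fire index, (rho7 - rho4) / (rho7 + rho4)."""
    rho7 = np.asarray(rho7, dtype=float)
    rho4 = np.asarray(rho4, dtype=float)
    return (rho7 - rho4) / (rho7 + rho4)


def ndvi(rho4, rho3):
    """Normalized difference vegetation index, (rho4 - rho3) / (rho4 + rho3)."""
    rho4 = np.asarray(rho4, dtype=float)
    rho3 = np.asarray(rho3, dtype=float)
    return (rho4 - rho3) / (rho4 + rho3)


# Hypothetical visual reflectivities: [vegetated pixel, hot-target pixel].
rho3 = np.array([0.05, 0.10])
rho4 = np.array([0.40, 0.15])
rho7 = np.array([0.10, 0.60])

print("band 7 - band 4:", band_difference(rho7, rho4))
print("NDFI:", ndfi(rho7, rho4))
print("NDVI:", ndvi(rho4, rho3))
```

With these illustrative numbers the hot-target pixel has a positive difference and NDFI while the vegetated pixel has negative values, and the ordering is reversed for NDVI.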

Following the factor analysis of the representative samples, factor scores are calculated from the factor loading matrix and the pixel data of the study area, which gives the factor score images (Fig. 2) and the factor plot of the representative samples (Fig. 3). The R-mode factor score matrix reflects the relationship between samples and factors; thus the DN value in each factor score image represents the weight of the brightness, vegetation, or fire factor in the composition of a pixel. The reflectivity of the flood plain is very high in every band, so it shows high values in the score image of Factor 1. Forest and cultivated land contain more vegetation, so both have high values in the score image of Factor 2. Pixels of high temperature objects have the highest values in the score image of Factor 3. In the factor plot, the representative pixel samples of the different cover types form distinct clusters of points. The pixel samples can therefore be classified and used to recognize targets.

Based on the factor scores, mixture tuned matched filtering (MTMF),^{15} an identification method that uses a matched filtering score and an infeasibility value to measure the similarity between unknown pixels and known samples, is applied with the known high temperature samples as the target. After a threshold is set according to the scatter plot of the matched filtering score image against the infeasibility image, 300 pixels of high temperature targets are extracted. They are then verified one by one in the field. The field verification shows that 285 target pixels are recognized correctly. Most of them belong to coking plants or metal smelting plants, and a few belong to thermal power plants; all of them correspond to objects with high temperature properties. There are 15 erroneous pixels. Eight of them come from flood plains and gullies and are misjudged because their values are high in all bands. The other seven lie next to high temperature pixels and are misjudged because their spectra closely resemble those of high temperature targets. The identification precision of MTMF therefore reaches 95% (285 of 300 pixels).
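The matched filtering score at the core of MTMF can be sketched in a few lines. The code below implements only the classical matched filter, which projects each pixel onto a direction derived from the background mean and covariance and scales the result so that the background mean scores 0 and the target scores 1. It does not compute the infeasibility value and is not the implementation used in this study. The function name, the three-component vectors standing in for factor scores, and the target vector are all hypothetical.

```python
import numpy as np


def matched_filter_scores(pixels, background, target):
    """Classical matched filter score of each pixel for a known target."""
    pixels = np.asarray(pixels, dtype=float)
    background = np.asarray(background, dtype=float)
    target = np.asarray(target, dtype=float)

    mean = background.mean(axis=0)
    cov = np.cov(background, rowvar=False)

    # Filter vector: inverse covariance applied to (target - mean),
    # normalized so that the target itself receives a score of 1.
    diff = target - mean
    weights = np.linalg.solve(cov, diff)
    weights /= diff @ weights

    return (pixels - mean) @ weights


# Hypothetical background of 500 three-factor score vectors and one target.
rng = np.random.default_rng(2)
background = rng.normal(size=(500, 3))
target = np.array([0.0, 0.0, 4.0])

unknown = np.vstack([background.mean(axis=0), target, 0.5 * target])
scores = matched_filter_scores(unknown, background, target)
print(scores)
```

Pixels with scores near 1 resemble the target and pixels near 0 resemble the background. In MTMF the score is combined with the infeasibility value to reject false positives before thresholding.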

In addition, the temperature is inverted from the thermal infrared band (ETM+ band 6) of the same scene with a monowindow algorithm.^{16} The pixels in the thermal imagery that match the positions of the 285 pixels mentioned above are taken as thermal infrared abnormal areas, and the remaining area is taken as the normal temperature background for the statistical analysis of temperature. The results are listed in Tables 3 and 4. They show that the temperatures of the normal temperature background and of the high temperature targets are close to each other, which makes it difficult to distinguish the targets from the background. Moreover, the highest target temperature obtained from this inversion is 324.21 K, far below the actual temperature (above 500 K). In contrast, the temperatures of the high temperature targets inverted from the shortwave infrared imagery (Table 5) on the basis of blackbody radiation characteristics^{17} are relatively consistent with the actual temperatures. That is to say, the thermal infrared data fail to reflect the features of high temperature targets properly. The main reason is that the spatial resolution of the thermal infrared imagery is 60 m × 60 m, whereas that of the shortwave infrared imagery is 30 m × 30 m. A high temperature target of given temperature and area is diluted more strongly by the normal temperature background in the thermal infrared imagery than in the shortwave infrared imagery.
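The dilution effect can be illustrated with a two-component mixed pixel. The sketch below assumes blackbody emission only, with no atmosphere, emissivity differences, or reflected sunlight. It mixes the Planck radiance of a hot target and a normal background by area fraction and converts the mixed radiance back to a brightness temperature. The 100 m² target area, the 600 K and 300 K temperatures, and the band centers of 11.45 μm and 2.2 μm are hypothetical values chosen for illustration, not parameters from this study.

```python
import math

C2_UM_K = 14387.77  # second radiation constant hc/k, in um K


def planck_relative(wavelength_um, temp_k):
    """Blackbody radiance up to a factor that depends only on wavelength."""
    return 1.0 / math.expm1(C2_UM_K / (wavelength_um * temp_k))


def brightness_temperature(wavelength_um, relative_radiance):
    """Invert planck_relative at the same wavelength."""
    return C2_UM_K / (wavelength_um * math.log1p(1.0 / relative_radiance))


def mixed_pixel_temperature(wavelength_um, pixel_size_m, target_area_m2,
                            t_target, t_background):
    """Brightness temperature of a pixel partly covered by a hot target."""
    fraction = min(1.0, target_area_m2 / pixel_size_m ** 2)
    mixed = (fraction * planck_relative(wavelength_um, t_target)
             + (1.0 - fraction) * planck_relative(wavelength_um, t_background))
    return brightness_temperature(wavelength_um, mixed)


# Hypothetical scene: a 100 m^2 target at 600 K on a 300 K background.
t_tir = mixed_pixel_temperature(11.45, 60.0, 100.0, 600.0, 300.0)   # 60 m pixel
t_swir = mixed_pixel_temperature(2.2, 30.0, 100.0, 600.0, 300.0)    # 30 m pixel
print(t_tir, t_swir)
```

Under these assumptions the thermal infrared pixel yields a brightness temperature only slightly above the background, whereas the shortwave infrared pixel yields a value much closer to the target temperature. The difference arises both from the larger area fraction in the 30 m pixel and from the much stronger temperature dependence of emission at shortwave infrared wavelengths.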

## Table 3

Temperature of high temperature targets in the study area inverted from thermal infrared imagery.

| | Min | Max | Mean | SD | Mode | Median |
|---|---|---|---|---|---|---|
| T (K) | 307.41 | 324.21 | 314.08 | 3.63 | 315.65 | 313.39 |

## Table 4

Temperature of the normal temperature background in the study area inverted from thermal infrared imagery.

| | Min | Max | Mean | SD | Mode | Median |
|---|---|---|---|---|---|---|
| T (K) | 294.06 | 323.21 | 311.02 | 7.43 | 311.80 | 310.51 |

## Table 5

Temperature of high temperature targets in the study area inverted from shortwave infrared imagery.

| | Min | Max | Mean | SD | Mode | Median |
|---|---|---|---|---|---|---|
| T (K) | 499.38 | 608.19 | 574.64 | 24.88 | 608.14 | 575.90 |

## 5. Conclusion

R-mode factor analysis starts from the similarity matrix (correlation coefficient matrix) of the variables and condenses multiple variables into a few factors with little information loss. This article applies it to multispectral remote sensing data. In the results of R-mode factor analysis, the factor loading matrix, or the factor loading curves, reflects the correlation between band variables and factors. Each R-mode factor can be considered as a linear combination of the band variables, so that the group characteristics of the variables can be analyzed and the thematic significance of every factor can be determined.

The score matrix of R-mode factor analysis reflects the correlation between samples and factors. It represents the weight of each factor in the composition of a pixel sample and can be used to classify pixel samples and recognize targets.

The factor loadings and factor scores have clear thematic significance, which allows MTMF to be adopted to recognize targets. The field verification shows that the correctly extracted target pixels correspond to objects with high temperature properties and that the identification precision reaches 95%.

## Acknowledgment

This work was supported by the Specialized Research Fund for the Doctoral Program of Higher Education under Grant No. 20110061120067.