Semisupervised deep learning using consistency regularization and pseudolabels for hyperspectral image classification

Abstract. Hyperspectral image (HSI) classification is a focus area in remote sensing research, in which redundant spectral information poses a significant challenge and deep-learning-based classifiers have outperformed traditional methods. However, training a deep-learning-based classifier requires numerous labeled samples, and collecting such a substantial amount of labeled hyperspectral data is difficult. Semisupervised classification of HSIs, in which classifiers learn from both labeled and unlabeled data, has therefore received increasing attention. We propose a new training method for semisupervised HSI classification. Specifically, consistency regularization and pseudolabeling are combined into a semisupervised training framework, without the introduction of a complex mechanism. The proposed algorithm works with a conventional convolutional neural network and requires no change to the model architecture. Unlike previous deep-learning-based methods, our approach does not require data reconstruction to obtain the unsupervised loss, so the model is much less computationally intensive. Experiments on three public hyperspectral datasets show that the proposed method outperforms several state-of-the-art methods. © The Authors. Published by SPIE under a Creative Commons Attribution 4.0 International License. Distribution or reproduction of this work in whole or in part requires full attribution of the original publication, including its DOI. [DOI: 10.1117/1.JRS.16.026513]


Introduction
Hyperspectral image (HSI) classification is a major research topic in hyperspectral remote sensing. HSIs provide rich spectral and spatial information, which makes them useful in a wide range of applications. However, the abundant spectral information can also reduce classification accuracy, an effect known as the Hughes phenomenon. In addition, the limited number of labeled hyperspectral samples makes HSI classification difficult. In practice, increasing quantities of hyperspectral data are becoming available with the development of information acquisition technology, but most of these data are unlabeled, and labeling them is extremely laborious and time-consuming. Semisupervised learning (SSL) methods, which can achieve good performance using a large volume of unlabeled data together with a small amount of labeled data, are therefore increasingly being proposed to address this challenge.
In the early stages of semisupervised HSI classification, several traditional methods were applied. For instance, the transductive support-vector machine (SVM) was proposed for semisupervised HSI classification. 1 Gomez-Chova et al. 2 used the Laplacian SVM (LapSVM), introducing the graph Laplacian matrix to process unlabeled data. Yang et al. 3 designed the spatiospectral LapSVM by considering both spatial and spectral information.
Recently, deep learning has shown great potential for use in computer vision 4,5 and HSI classification. In this regard, Zhang et al. 6 proposed a spectral-spatial residual network consisting of spectral and spatial residual blocks to extract spectral and spatial features. By replacing the residual network with the densely connected structure and using different convolution kernel sizes, Wang et al. 7 proposed a fast dense spectral-spatial convolution network.
Although previous studies on deep learning for supervised HSI classification with a small number of training samples have achieved noteworthy success, they could not use unlabeled data to enhance classification performance. Designing a semisupervised method based on deep learning is therefore of great value, and ladder networks have been introduced to deep learning for semisupervised HSI classification. 8 Fang et al. 9 employed the cotraining method with a model based on ResNets. 10 However, such methods require the addition of new parts to the model or the design of multiple models, which increases the computational effort.
Most recently, a new semisupervised method called Fixmatch 11 has emerged as the state-of-the-art semisupervised framework in computer vision. This method works with existing convolutional neural network (CNN) models, so changes to the model structure are unnecessary. It comprises two components: consistency regularization and pseudolabeling. The consistency regularization component depends on data augmentation. However, many classical data augmentation methods in the computer vision field cannot be applied to HSIs directly. For example, color distortion, a classical image augmentation method, is not suitable for HSIs because it changes the spectral information. Consequently, HSI data augmentation has not been sufficiently researched. We therefore attempted to apply data augmentation to HSI semisupervised classification and designed a simple SSL framework for HSI classification.
In this paper, inspired by Fixmatch, 11 we propose a semisupervised method for HSI classification. Semisupervised methods for HSI classification can work with both labeled and unlabeled data, which is of high practical significance because labeling HSI data is time-consuming and labor-intensive. The contributions of this study are as follows:
1. We propose a method for semisupervised HSI classification based on consistency regularization and pseudolabeling. The pseudolabels are produced from the original image and serve as targets when the augmented images are fed into the model. In our method, both the original image and the augmented image are used when performing consistency regularization.
2. Our proposed algorithm couples SSL with a conventional CNN without the need to change the model structure.
3. We explore the use of data augmentation in HSI semisupervised classification, where the consistency regularization relies on data augmentation. We adjust existing data augmentation methods to make consistency regularization applicable to HSI classification. Specifically, we modify the classical Cutout 12 method to prevent it from removing the spectral information of the center pixel and combine it with a horizontal or vertical flip as our augmentation method. Our experiments show that the proposed method is superior to state-of-the-art approaches.
The remainder of this paper is organized as follows: Sec.
Augment(x) is a stochastic transformation. Thus, the two terms in Eq. (1) are not equal.
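The stochastic nature of the augmentation can be illustrated with a small sketch. The `augment` function below is a hypothetical stand-in for Augment(x), not the authors' implementation: it randomly flips an H × W × Bands patch horizontally or vertically, so two independent calls need not produce the same view, and a model's predictions on the two views generally differ as well.

```python
import numpy as np

rng = np.random.default_rng(7)

def augment(x):
    """Stochastic transform (illustrative stand-in for Augment(x)):
    flip the H x W x Bands patch horizontally or vertically at random."""
    return x[:, ::-1, :] if rng.random() < 0.5 else x[::-1, :, :]

x = np.arange(2 * 2 * 1, dtype=float).reshape(2, 2, 1)
# Two independent draws of the augmentation applied to the same input.
views = [augment(x) for _ in range(8)]
```

Each view is a spatial rearrangement of the same spectral values, which is why flips are safe for HSIs: the spectrum of every pixel is preserved.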

Pseudolabeling
Pseudolabeling 14 is also a typical semisupervised method. It relies on the model to assign labels (the argmax of the model's output) to unlabeled data. Only the labels whose largest predicted probability exceeds a predefined threshold are retained. The pseudolabeling process employs the following loss function:

$$ \frac{1}{\mu B}\sum_{b=1}^{\mu B}\mathbb{1}\big(\max(q_b)\geq\tau\big)\,\mathrm{H}\big(\hat{q}_b,\,q_b\big), \quad (2) $$

where $q_b = p_m(y \mid u_b)$ is the model's predicted class distribution for the unlabeled sample $u_b$, $\hat{q}_b = \arg\max(q_b)$ is the corresponding pseudolabel, $\mu B$ is the number of unlabeled samples in a minibatch, and $\tau$ is the threshold.
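The thresholding step can be sketched in NumPy as follows. This is a simplified illustration of generic pseudolabeling (not the authors' code): only samples whose maximum softmax probability reaches the threshold contribute cross-entropy terms, and the sum is averaged over the whole unlabeled batch.

```python
import numpy as np

def pseudo_label_loss(probs, tau=0.98):
    """Pseudolabeling loss sketch.

    probs: (mu_B, num_classes) array of softmax outputs q_b for the
    unlabeled batch; tau: confidence threshold.
    """
    conf = probs.max(axis=1)                  # max(q_b)
    mask = conf >= tau                        # 1(max(q_b) >= tau)
    labels = probs.argmax(axis=1)             # pseudolabel q_hat_b
    # H(q_hat_b, q_b) = -log q_b[q_hat_b]; masked samples contribute zero,
    # and the sum is averaged over the full batch size mu_B.
    ce = -np.log(probs[np.arange(len(probs)), labels] + 1e-12)
    return (mask * ce).sum() / len(probs)

probs = np.array([[0.99, 0.01],    # confident -> retained
                  [0.60, 0.40]])   # below tau -> ignored
loss = pseudo_label_loss(probs, tau=0.98)
```

With a high threshold such as 0.98, only the first sample is retained, so the loss reduces to -log(0.99) averaged over the batch of two.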

Summary of Our Proposed Model
The complete architecture is shown in Fig. 1. In our model, the loss function consists of two elements, the supervised loss $l_s$ and the unsupervised loss $l_u$. $l_s$ is the standard cross-entropy loss, and $l_u$ is formulated as

$$ l_u = \frac{1}{\mu B}\sum_{b=1}^{\mu B}\mathbb{1}\big(\max(q_b)\geq\tau\big)\,\mathrm{H}\big(\hat{q}_b,\,p_m(y \mid A(u_b))\big), \quad (3) $$

where $u_b$ represents the unlabeled data and $A(u_b)$ represents the augmented unlabeled data. The overall loss function of our method is defined as

$$ l = l_s + \lambda_u l_u, \quad (4) $$

where $\lambda_u$ is the weight parameter. All the above equations are taken from Ref. 11. The augmentation method is illustrated in Algorithm 1; the size of the removed submatrix in Algorithm 1 is not fixed.
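The augmentation can be sketched as below. This is an illustrative reading of the described method, not a verbatim transcription of Algorithm 1, whose exact size and placement rules may differ: a random horizontal or vertical flip is followed by a modified Cutout that zeroes a randomly sized submatrix, re-drawn until it does not cover the center pixel (whose spectrum determines the patch label). The `max_frac` bound on the submatrix size is an assumption made here for concreteness.

```python
import numpy as np

def augment_patch(patch, rng, max_frac=0.5):
    """Sketch of the proposed augmentation for an H x W x Bands patch:
    random flip + center-preserving Cutout of variable size."""
    h, w, _ = patch.shape
    # Random horizontal or vertical flip.
    out = (patch[:, ::-1, :] if rng.random() < 0.5 else patch[::-1, :, :]).copy()
    ch, cw = h // 2, w // 2                      # center pixel location
    while True:
        # Submatrix size is not fixed; draw height/width up to max_frac of patch.
        sh = rng.integers(1, max(2, int(h * max_frac)))
        sw = rng.integers(1, max(2, int(w * max_frac)))
        top = rng.integers(0, h - sh + 1)
        left = rng.integers(0, w - sw + 1)
        # Re-draw until the submatrix does not cover the center pixel.
        if not (top <= ch < top + sh and left <= cw < left + sw):
            break
    out[top:top + sh, left:left + sw, :] = 0.0   # modified Cutout
    return out

rng = np.random.default_rng(1)
patch = np.ones((25, 25, 4))
out = augment_patch(patch, rng)
```

Preserving the center pixel matters because, in patch-based HSI classification, the label of a patch is the label of its center pixel, so removing its spectrum would destroy the very information being classified.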

Experimental Datasets and Parameter Setup
We conducted experiments on three popular hyperspectral datasets: Indian Pines, Pavia University, and Salinas, as shown in Fig. 2.
The Indian Pines image was acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) in 1992 over northwestern Indiana. It contains 145 × 145 pixels with 224 spectral bands in the wavelength range of 0.4 to 2.5 μm. Twenty-four bands affected by water absorption were removed, and the 10,249 labeled pixels were grouped into 16 classes.
The University of Pavia scene was acquired by the Reflective Optics System Imaging Spectrometer in 2002 over Pavia, northern Italy. The image contains 610 × 340 pixels with 103 spectral bands. The spectral wavelength ranges from 0.43 to 0.86 μm. The 42,776 labeled pixels were classified into nine categories.
The Salinas dataset was acquired by the AVIRIS sensor over Salinas Valley, California. The data cube size is 512 × 217 × 224 with the wavelength range from 0.4 to 2.5 μm. Twenty water absorption bands were discarded, and 200 bands were available for analysis. The 54,129 labeled pixels were divided into 16 categories.
All experiments were performed on an RTX Titan GPU. The learning rate was 0.0001, and the minibatch size was 256. The model was trained with the Adam optimizer for 150 epochs. The unsupervised loss weight parameter was min(0.01 × epoch, 1), and the threshold was 0.98. Following Fixmatch, 11 we consider a high threshold value necessary to ensure a high level of accuracy, and we evaluated 0.9, 0.94, and 0.98 as candidate threshold values. We chose HybridSN 15 as the backbone network. The input formats were the same as in Ref. 15: 25 × 25 × 30 for Indian Pines and 25 × 25 × 15 for the Pavia University and Salinas images. Principal component analysis (PCA) was used to reduce the data dimensions. We compared our proposed model with five methods: four traditional methods and one deep-learning-based method. The traditional methods are those evaluated in Ref. 16: the transductive SVM (TSVM), 17 the local and global consistency graph-based algorithm (LGC), 18 the label propagation algorithm (LPA), 19 and the label propagation algorithm combined with particle cooperation and competition (LPAPCC). 16 The deep-learning-based method is SS-CNN, 8 which requires data reconstruction to obtain its unsupervised loss; we trained it for the same number of epochs as our model. The training and testing samples are the same as in Ref. 16, as shown in Tables 1-3, where the percentages in parentheses are the corresponding ratios.
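The preprocessing described above can be sketched as follows. This is an illustrative NumPy implementation under stated assumptions (PCA on flattened pixels and zero-padding at image borders; the function names `pca_reduce` and `extract_patch` are our own), not the authors' pipeline, which may use library routines instead.

```python
import numpy as np

def pca_reduce(cube, n_components):
    """Reduce the spectral dimension of an H x W x B data cube to
    n_components via PCA on the flattened pixel spectra."""
    h, w, b = cube.shape
    flat = cube.reshape(-1, b).astype(np.float64)
    flat -= flat.mean(axis=0)
    # Eigendecomposition of the band covariance matrix.
    cov = flat.T @ flat / (flat.shape[0] - 1)
    vals, vecs = np.linalg.eigh(cov)
    top = vecs[:, np.argsort(vals)[::-1][:n_components]]
    return (flat @ top).reshape(h, w, n_components)

def extract_patch(cube, row, col, size=25):
    """Extract the size x size spatial neighborhood centered on
    (row, col), zero-padding at the image border."""
    pad = size // 2
    padded = np.pad(cube, ((pad, pad), (pad, pad), (0, 0)))
    return padded[row:row + size, col:col + size, :]

# Small synthetic example: an Indian-Pines-like cube would instead be
# 145 x 145 x 200 reduced to 30 components, with 25 x 25 patches.
cube = np.random.default_rng(0).normal(size=(10, 10, 8))
reduced = pca_reduce(cube, 3)
patch = extract_patch(reduced, 0, 0, size=5)
```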

Analysis of Classification Results
The classification results over the three HSI datasets are listed in Tables 4-6. The results of the traditional methods shown in these tables are obtained from Ref. 16. The best values are marked in bold. We conducted each experiment five times consecutively; the numbers after the plus-minus signs are the standard deviations of the corresponding metrics. In this paper, the overall classification accuracy (OA), average classification accuracy (AA), and kappa coefficient (Kappa) are adopted as model performance evaluation metrics. The OA is the ratio of the correctly classified samples to all the test samples. The AA denotes the mean of the per-class classification accuracies. Finally, Kappa measures the agreement between the ground truth and the classification and is commonly interpreted as follows: <0.2, poor agreement; 0.2 to 0.4, fair agreement; 0.4 to 0.6, moderate agreement; 0.6 to 0.8, good agreement; and 0.8 to 1, excellent agreement.

As shown in Tables 4-6, our proposed method achieved the best performance over the three experimental datasets. The AA of our model is much better than that of the other comparison methods, which indicates that our method handles unbalanced data well. Although our model did not achieve the highest classification accuracy on every class, its consistently strong results across classes reflect the high stability of our method. According to the Kappa results, our proposed method performs well on all three datasets, and it achieved the highest classification accuracy on six of the nine classes in the Pavia University dataset. These results further confirm the superiority of our proposed method. From the results in Tables 4-6, we can draw several conclusions. First, our proposed method shows the most accurate and stable performance. Second, a comparison with the TSVM results shows that the deep-learning-based framework is superior to the traditional machine-learning-based algorithm.
Third, the Indian Pines dataset is much more difficult to classify than the Salinas scene. Moreover, the LGC and LPA results indicate that a single mechanism does not perform well in HSI semisupervised classification; methods based on multiple mechanisms therefore need to be investigated further. In addition, our proposed method achieves better performance than SS-CNN, which indicates that deep-learning-based methods can achieve good results in HSI semisupervised classification without changing the model architecture. Finally, the LPAPCC result suggests a significant advantage in using the graph mechanism; in future work, we will attempt to introduce the graph mechanism into a semisupervised framework.
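The three evaluation metrics used above can be computed directly from a confusion matrix. The following sketch shows the standard definitions (the function name and the example matrix are our own, chosen for illustration):

```python
import numpy as np

def classification_metrics(conf):
    """Compute OA, AA, and the kappa coefficient from a confusion matrix
    whose rows are ground-truth classes and columns are predictions."""
    conf = np.asarray(conf, dtype=np.float64)
    total = conf.sum()
    oa = np.trace(conf) / total                    # overall accuracy
    per_class = np.diag(conf) / conf.sum(axis=1)   # per-class accuracy
    aa = per_class.mean()                          # average accuracy
    # Expected agreement by chance, from the row/column marginals.
    pe = (conf.sum(axis=0) * conf.sum(axis=1)).sum() / total**2
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa

conf = np.array([[45, 5],
                 [10, 40]])
oa, aa, kappa = classification_metrics(conf)
# oa = 0.85, aa = 0.85, kappa = 0.70: "good" agreement on the scale above.
```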

Influence of Data Removal
Data removal is an important part of our augmentation method. To evaluate its effect, we compared the performance of our proposed method with that of our method without data removal. The results are shown in Table 7; our proposed method surpasses the variant that uses image flipping only.
Although the enhancement is negligible on the Indian Pines and University of Pavia datasets, we attribute this to the small proportion of unlabeled data in their training sets. This is supported by the larger enhancement on the Salinas dataset, whose unlabeled data occupy a relatively larger proportion. In the real world, however, unlabeled data far outnumber labeled data, so we expect data removal to have a greater impact on accuracy when the proportion of unlabeled data is large.

Influence of Parameter
The threshold and the weight of the unsupervised loss are important parameters of our proposed method. To find the best values, we conducted several experiments comparing different parameter settings; the results are presented in Tables 8 and 9. For the different thresholds, our proposed setting did not achieve the best accuracy on Indian Pines, but the differences are very small, and it is superior to the other settings on University of Pavia and Salinas. This shows that a threshold of 0.98 is strongly robust. For the different weights, our proposed setting achieved the highest accuracy. We also find that the model's sensitivity to these parameters differs across datasets, which we attribute to the different proportions of unlabeled data in the training sets: because this proportion is small on Indian Pines, the model is not sensitive to the parameters there. In general, the smaller the proportion of unlabeled data in the training data, the less sensitive the model is to these parameters.
In summary, these results show that the model accuracy is not significantly affected by the weight and threshold. Our proposed method is thus robust and practical for real-world applications.

Conclusions
A deep-learning-based framework for HSI semisupervised classification is proposed in this paper. The framework comprises two components: consistency regularization and pseudolabeling. It can be applied to existing CNN models to enable them to utilize unlabeled data. Unlike previous research, the SSL framework proposed herein is simpler and introduces no complex mechanism. The proposed method's loss consists of supervised and unsupervised losses; the unsupervised loss is essentially the cross-entropy loss on the augmented version of the unlabeled data, with labels derived from the model's output on the corresponding original unlabeled data. Our proposed algorithm works without any change to the conventional CNN model architecture.
Moreover, unlike previous deep-learning-based methods, our approach does not require data reconstruction to obtain the unsupervised loss, so our model is much less computationally intensive. Our experiments demonstrate that the proposed method achieves high classification accuracy, outperforms state-of-the-art methods, and is robust to its parameters.
In future work, we will develop better data augmentation methods for HSI classification, with the generative adversarial network 20 as the focus of our subsequent study. We will also attempt to introduce additional mechanisms, such as the graph mechanism, into HSI semisupervised classification, because the original pseudolabeling does not solve the problem of incorrect labels. Moreover, the design of a new pseudolabeling method is underway.