Multi-spectral cloud detection based on a multi-dimensional and multi-grained dense cascade forest

Abstract. Cloud detection in satellite images is a vital step for cloud/land recognition, cloud/snow discrimination, and cloud shadow removal. Accurate cloud detection plays an important role in land resource management, environmental pollution monitoring, and land target recognition. Deep learning (DL) algorithms have shown great progress in cloud detection. However, as the complexity of the DL-based model increases, cloud detection efficiency decreases. DL-based cloud detection models are unable to successfully balance the performance-efficiency tradeoff. In our study, a multi-dimensional and multi-grained dense cascade forest (MDForest) is proposed for multi-spectral cloud detection. MDForest is a deep forest structure that automatically extracts low-level and high-level features from satellite cloud images end-to-end; a multi-dimensional and multi-grained scanning mechanism is introduced to capture the spectral information of multi-spectral satellite images while enhancing the representation learning ability of cascade forest. The experimental results on the HJ-1A/1B dataset show that MDForest improves the performance of cloud detection and possesses a good inference efficiency compared with DL-based cloud detection methods, which makes the proposed MDForest satisfy the application where good performance and high efficiency are both required.


Introduction
Accurate cloud detection in satellite images is a vital step for land object recognition, land resource management, and environmental pollution detection. 1,2 The difficulties of satellite image transmission and the high similarity between cloud regions and ground objects (such as ice, snow, and fog) make cloud detection a challenge, 3,4 which drives researchers to develop fast and accurate cloud detection methods. With the development of remote sensing technology, the effective integration and processing of multiple spectral satellite images provide a way to achieve accurate cloud detection by extracting rich information from multi-spectral satellite images. 5 In recent years, the robustness of cloud detection methods has been greatly improved. Based on the optimization pattern of these methods, three groups have emerged: (1) threshold-based methods, 6,7 (2) machine learning (ML)-based approaches with handcrafted features, 8,9 and (3) deep learning (DL)-based algorithms. 2,10 Because of their reliable performance and simplicity, the function of mask (Fmask) 11,12 and automated cloud cover assessment (ACCA) 13,14 algorithms are two representative threshold-based approaches that have been widely employed for cloud detection. To address the poor generalization of threshold-based methods, Yuan and Hu 15 combined a support vector machine (SVM) and the haze optimization transform technique to distinguish fog, cloud-free, and thick cloud areas in satellite imagery by calculating the correlation between the spectral responses. Many ML-based methods that rely on handcrafted feature engineering have been broadly explored for cloud detection. These works include SVM-based multi-feature fusion, 16-18 cloud detection implemented by random forest using spectral reflection characteristics, 19 and fast cloud detection realized by decision tree-based models. 20

*Address all correspondence to Yao Zou, zouyaodhu@mail.dhu.edu.cn

ML-based cloud detection models are a group of algorithms that rely heavily on the domain knowledge of meteorology experts. Compared with ML-based cloud detection models, DL-based techniques extract low-level to high-level features end-to-end and have shown great promise in distinguishing cloudless areas from cloudy regions. 21,22 Based on texture, spectral, and structural information, Shao et al. 10 combined a neural network and fuzzy theory to improve the performance of cloud detection. Subsequently, Shao et al. 23 proposed a multi-scale convolutional neural network (CNN) to automatically extract spatial and spectral information from multi-spectral satellite cloud images, which further enhances the prediction result of cloud detection. Benefiting from the end-to-end low-level and high-level feature extraction mechanism, many DL-based cloud detection models have been established to reduce the detection error. These models include fully convolutional networks, 24,25 deep CNN, 26 cascade CNN, 27 and residual network. 22 DL-based algorithms have made great progress in improving cloud detection. However, DL-based methods have some defects that limit their practical application in cloud detection tasks. (1) It is widely known that the training of DL-based models requires a great amount of labeled training data, while the sample labeling process in the cloud detection task is costly. 28 (2) A neural network (NN) is a pre-defined framework that may lead to under-parameterization or over-parameterization, making the training of an NN a complex hyperparameter fine-tuning process. 29 (3) DL-based cloud detection models are widely criticized because of their black-box nature, which relates to an implicit learning process. 30,31 Based on the above considerations, an improved multi-dimensional dense deep forest is proposed for multi-spectral cloud detection.
The proposed algorithm has the following characteristics: (1) it is a deep framework that is an ensemble of decision trees, (2) end-to-end learning, (3) a multi-dimensional and multi-scale scanning mechanism, and (4) dense connection. These features give the proposed method the following advantages. (1) The deep structure based on tree algorithms effectively extracts the features of cloud samples from satellite cloud images, improving the efficiency of training and inference. (2) End-to-end learning avoids complex manual feature design, allowing the model to extract low-level and high-level features from the original satellite images. (3) The multi-dimensional and multi-scale scanning mechanism effectively captures the spatial and spectral information of cloud images. (4) The dense connection improves the feature reuse ability of the deep forest, enabling the information extracted from each layer to be exploited more efficiently.

Proposed Method
Given a dataset $S^{(0)} = \{(x_i, y_i)\}$, $x_i$ represents the original input sample and $y_i$ is the label corresponding to the $i$'th sample. In the cloud detection task, the input $x_i$ is a cloud detection image sample that contains four spectral bands; $y_i \in \{1, 2, 3\}$, where $y_i = 1$ represents the cloud-free area, $y_i = 2$ is the area of thin cloud, and $y_i = 3$ indicates the thick cloud area. In this study, a multi-dimensional and multi-grained dense deep forest (MDForest) is proposed for multi-spectral cloud detection. Figure 1 shows the structure of MDForest. MDForest consists of two parts: multi-dimensional and multi-grained scanning and dense cascade forest. 32,33 Multi-dimensional and multi-grained scanning realizes the re-representation process of features. The multi-scale features are captured by a multi-grained scanning mechanism to enhance the representation learning ability of the cascade forest. The mechanism also possesses the ability to process multiple spectrums, which can effectively distinguish the similarity between the spectrums in satellite images. Cascade forest is a deep forest that simulates the representation learning of neural networks and achieves good prediction accuracy and efficiency through hierarchical image information processing. To realize multi-spectral cloud detection, we first collect the satellite images with four spectrums from the Chinese satellites HuanJing-1A and HuanJing-1B (HJ-1A/1B) 34,35 and integrate them into multi-spectral satellite images. Then, the multi-dimensional and multi-grained scanning mechanism is applied to extract spatial information and spectral information from cloud samples. Next, a cascade forest is established to realize the representation learning based on the re-represented features. MDForest is constructed in a cascading way, making MDForest an ensemble framework that processes information layer-by-layer.
In addition, a densely connected structure, which is borrowed from DenseNet, 36 is introduced in this study to avoid overfitting by maximizing the utilization of spatial information and spectral information. Figure 2 shows the structure of the dense cascade forest. According to the hierarchical processing mechanism of the neural network, a deep forest structure can be described as $x^{(l)} = F^{(l)}(x^{(l-1)}) = [F^{(l_1)}(x^{(l-1)}), F^{(l_2)}(x^{(l-1)}), \ldots, F^{(l_m)}(x^{(l-1)})]$, where $l \in \{1, 2, \ldots, L\}$ and $x^{(l-1)}$ represents the probabilistic output of the $(l-1)$'th layer. $F^{(l)}$ represents the cascade forest of level $l$ and $F^{(l_i)}$ the $i$'th random forest in the cascade forest of level $l$. As can be seen from Fig. 2, the cascade forest simulates the layer-by-layer information processing mechanism of the neural network: the output of the $(l-1)$'th layer is considered as the input of the $l$'th layer. However, such a structure is prone to overfitting when the number of layers increases, which hinders the diversity of individual learners. Each layer of the cascade forest is only related to the previous layer, which leads to the homogenization of base learners. One way to solve this problem is to design a dense connection structure to maximize the feature reuse rate. In this study, to improve the utilization of features, the dense connection is borrowed to build a dense deep forest, thus avoiding the overfitting problem. Based on the above analysis, the layer-by-layer information processing mechanism can be re-expressed as

$x^{(l)} = F^{(l)}([x^{(0)}, x^{(1)}, \ldots, x^{(l-1)}])$, (1)

where $[\cdot]$ denotes the concatenation of the outputs of all preceding levels.

Fig. 1 The structure of MDForest.

Dense Cascade Forest
The final prediction is obtained by averaging the outputs of the last layer. Figure 3 shows the probabilistic prediction generation process of a random forest. As shown in Fig. 3, the construction of the $l$'th level of the cascade forest is related not only to the output of level $l - 1$ but also to the outputs of all the layers before level $l - 1$. The output of each level of the cascade forest is the concatenation of the probabilistic predictions produced by four random forests, and the prediction of each random forest is calculated as illustrated in Fig. 3.
As shown in Fig. 3, the probabilistic predictions of random forest can be concretely calculated by the following rules: (1) observe the class distribution on each leaf node of each decision tree in the random forest; (2) calculate the proportion of samples of different classes at all the leaf nodes; (3) take the probabilistic result at leaf node as the prediction result of decision tree for the given instance; (4) average all the probabilistic results of all the decision trees as the prediction of the random forest. Based on the prediction process of a random forest and the structure of the dense cascade forest, we concatenate all the class vectors that are generated by four random forests as the output of each level of the dense cascade forest. To further alleviate the overfitting problem, K-fold cross-validation 37-39 is incorporated in each layer of the dense cascade forest to get the robust probabilistic prediction result. As can be seen from Fig. 3, given an instance x, cloud detection can be modeled as an identification process of cloud-free, thin cloud, and thick cloud regions. As a result, a decision tree generates a three-dimensional class vector. The output of a forest is regarded as the average of the outputs of all the decision trees in a random forest.
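The class-vector generation described above can be sketched with scikit-learn. This is only an illustrative sketch, not the authors' implementation: the helper names `forest_class_vectors` and `averaged_tree_probabilities`, the synthetic data, the number of trees, and the choice of K = 3 are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict


def forest_class_vectors(X, y, n_trees=50, k=3, seed=0):
    """Return K-fold (out-of-fold) class vectors, one row per sample."""
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    # Each sample is scored by a forest that did not see it during training,
    # which is the role K-fold cross-validation plays in each cascade layer.
    return cross_val_predict(forest, X, y, cv=k, method="predict_proba")


def averaged_tree_probabilities(forest, X):
    """Rule (4): average the leaf class distributions over all trees."""
    per_tree = np.stack([tree.predict_proba(X) for tree in forest.estimators_])
    return per_tree.mean(axis=0)


# Hypothetical data: 120 samples, 8 features, labels 1 (cloud-free),
# 2 (thin cloud), 3 (thick cloud).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 8))
y_demo = rng.integers(1, 4, size=120)

class_vectors = forest_class_vectors(X_demo, y_demo)  # shape (120, 3)
```

Each row of `class_vectors` is a three-dimensional probability vector; `averaged_tree_probabilities` makes the per-tree averaging explicit and should agree with the forest's own `predict_proba`.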
In this work, four random forests are used as base learners to establish a cascade layer of MDForest, two of which are general random forests and two of which are completely random forests. Each random forest consists of 800 decision trees. Each tree in a completely random forest randomly selects a feature for splitting at each node. The growth of a decision tree stops when there are fewer than 10 samples at each leaf node. Different from the completely random forest, a general random forest randomly selects $\sqrt{d}$ features as candidates for node splitting, where $d$ is the number of features. In the growth of each tree, we utilize the Gini index as the criterion for node splitting.
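A minimal sketch of one way to assemble such a layer and chain layers densely is given below. Several choices are assumptions rather than details confirmed by the text: the completely random forest is approximated by scikit-learn's `ExtraTreesClassifier` with `max_features=1`, the stopping rule is read as `min_samples_split=10`, K = 3 folds are used, and the function names and demo data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_predict


def make_level_forests(n_trees=800, seed=0):
    """Two general random forests plus two completely random forests."""
    # Assumption: "fewer than 10 samples" is modeled as min_samples_split=10.
    shared = dict(n_estimators=n_trees, criterion="gini", min_samples_split=10)
    general = [
        RandomForestClassifier(max_features="sqrt", random_state=seed + i, **shared)
        for i in range(2)
    ]
    # Assumption: one random candidate feature per split approximates a
    # completely random forest.
    completely_random = [
        ExtraTreesClassifier(max_features=1, random_state=seed + 2 + i, **shared)
        for i in range(2)
    ]
    return general + completely_random


def level_output(forests, features, labels, k=3):
    """Concatenate the out-of-fold class vectors of every forest in a level."""
    vectors = [
        cross_val_predict(f, features, labels, cv=k, method="predict_proba")
        for f in forests
    ]
    return np.hstack(vectors)


def dense_cascade(x0, labels, n_levels=2, n_trees=800, k=3):
    """Return [x0, x1, ..., xL]; level l sees all earlier outputs concatenated."""
    outputs = [x0]
    for level in range(n_levels):
        dense_input = np.hstack(outputs)  # dense connection
        forests = make_level_forests(n_trees=n_trees, seed=10 * level)
        outputs.append(level_output(forests, dense_input, labels, k=k))
    return outputs


# Small demo with few trees so that it runs quickly.
rng = np.random.default_rng(0)
x0_demo = rng.normal(size=(90, 6))
y_demo = rng.integers(1, 4, size=90)
outputs_demo = dense_cascade(x0_demo, y_demo, n_levels=2, n_trees=10)
```

With four forests and three classes, every level emits a 12-dimensional vector per sample, and the input to each level grows as earlier outputs are appended.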

Multi-Dimensional and Multi-Grained Scanning
The multi-dimensional and multi-grained scanning mechanism is inspired by the convolution operation of the CNN and uses a similar sliding window. In this study, the input size of each satellite cloud image sample is 28 × 28 × 4. Suppose we set the size of the sampling window to 7 and the sliding stride to 7; then $[(28 - 7)/7 + 1]^2 = 16$ subsamples are obtained after sliding a single window over an original cloud sample of size 28 × 28 × 4. Likewise, in the process of multi-dimensional and multi-grained scanning, if we use two grained windows with sizes 7 and 14 and a sliding stride of 7 for feature representation, we get 25 subsamples from an original cloud sample, including 16 subsamples with the dimension of 7 × 7 × 4 and 9 subsamples with the dimension of 14 × 14 × 4. Subsequently, each subsample is used for training two completely random forests and two general random forests, and each random forest maps one subsample to a $C$-dimensional class vector (in this study, cloud sample images are classified into three categories: cloudless, thin cloud, and thick cloud regions; therefore, $C = 3$). All the outputs of the four random forests are concatenated as the representation of the original sample. In conclusion, an original cloud sample with size 28 × 28 × 4, scanned with windows of size 7 × 7 × 4 and 14 × 14 × 4, is transformed into a probabilistic space with the dimension of 4 × 25 × 3 = 300, where 4 is the number of random forests, 25 is the number of subsamples slid from an original cloud sample, and 3 is the dimension of each predictive class probability vector (Fig. 4).
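The window arithmetic above can be checked with a short NumPy sketch. The helper `scan_patches` and the all-zero sample are hypothetical; only the sample size, window sizes, stride, number of forests, and number of classes come from the text.

```python
import numpy as np


def scan_patches(image, window, stride):
    """Slide a window x window x bands cube over an H x W x bands image."""
    height, width, _ = image.shape
    patches = [
        image[r:r + window, c:c + window, :]
        for r in range(0, height - window + 1, stride)
        for c in range(0, width - window + 1, stride)
    ]
    return np.stack(patches)


sample = np.zeros((28, 28, 4))                       # one multi-spectral sample
patches_7 = scan_patches(sample, window=7, stride=7)    # 16 cubes of 7 x 7 x 4
patches_14 = scan_patches(sample, window=14, stride=7)  # 9 cubes of 14 x 14 x 4

n_subsamples = len(patches_7) + len(patches_14)      # 16 + 9 = 25
n_forests, n_classes = 4, 3
feature_dim = n_forests * n_subsamples * n_classes   # 4 * 25 * 3 = 300
```

Every patch keeps all four bands, so each subsample carries spatial and spectral information together.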

Results
The experimental data are collected from HJ-1A and HJ-1B. HJ-1A/1B are Chinese environmental and disaster monitoring satellites. The HJ-1A satellite is equipped with a CCD camera and a hyperspectral imager while HJ-1B has a CCD camera and an infrared camera. The design principles of the two CCD cameras on the HJ-1A and HJ-1B satellites are the same. They are placed symmetrically at the sub-satellite points, bisecting the field of view and observing in parallel. HJ-1A and HJ-1B jointly complete the push-broom imaging with a swath of 700 km, a resolution of 30 m, and 4 spectrum channels. In this study, cloud detection is modeled as the recognition of cloudless, thin cloud, and thick cloud. To well balance the feasibility of patch-wise cloud detection and the quality of the cloud detection dataset, we extract satellite cloud samples with size 28 × 28 and 4 spectrum channels from satellite imagery. Table 1 shows band information for the four spectral satellite pictures. In this study, 28,800 cloud detection samples are collected for the experiment using HJ-1A/1B satellite images; 80% of the samples are picked randomly for training, with 20% of the remaining samples for testing.
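As a small illustration of the random 80/20 split described above, the following sketch splits sample indices with scikit-learn. The random seed and the use of `train_test_split` are assumptions; the text only states that the split is random.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_samples = 28_800
indices = np.arange(n_samples)

# Random 80/20 split of the sample indices (the seed is an arbitrary choice).
train_idx, test_idx = train_test_split(
    indices, test_size=0.2, shuffle=True, random_state=0
)
```

Under this split, 23,040 samples are used for training and 5,760 for testing.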
A brief comparison of single-spectral cloud detection is explored and numerical experiments are adopted to validate the effectiveness of MDForest. In the study, several standard ML/DL algorithms for cloud detection, which include SVM, decision tree, random forest, neural network, CNN, and ResNet-34, 40 are used for comparison. In the implementation of SVM, the radial basis function kernel 41 is selected. The decision tree is trained with the Gini index as the splitting criterion, and the number of decision trees in the random forest is determined by grid search. The neural network used in our experiment has four hidden layers; each hidden layer is composed of $n$ neurons, where $n$ is grid searched from the value set $\{64, 128, 256\}$, and each hidden layer is activated with the ReLU function. 42 In the output layer, the SoftMax activation function is applied to get the probabilistic prediction. The learning rate, which is an important parameter for the final predictive performance, is determined by grid search over the space $\{10^{-4}, 10^{-3}, 10^{-2}\}$.

Fig. 4 The illustration of the multi-dimensional and multi-grained scanning mechanism.
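The neural network hyperparameter search described above could be set up as in the sketch below. It assumes scikit-learn's `MLPClassifier` as a stand-in for the network (the framework actually used is not stated), and the synthetic data, the fold count, and the small iteration budget are chosen only to keep the demo fast.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Search space taken from the text: n neurons in each of four hidden layers,
# and the initial learning rate.
param_grid = {
    "hidden_layer_sizes": [(n, n, n, n) for n in (64, 128, 256)],
    "learning_rate_init": [1e-4, 1e-3, 1e-2],
}


def build_search(cv=3, max_iter=200):
    """Grid search over a ReLU network trained with Adam."""
    # MLPClassifier applies a softmax output layer for multi-class targets.
    base = MLPClassifier(
        activation="relu", solver="adam", max_iter=max_iter, random_state=0
    )
    return GridSearchCV(base, param_grid, cv=cv, scoring="accuracy")


# Hypothetical three-class data standing in for the cloud samples.
X_demo, y_demo = make_classification(
    n_samples=60, n_features=16, n_informative=6, n_classes=3, random_state=0
)
search = build_search(cv=2, max_iter=5).fit(X_demo, y_demo)
best = search.best_params_
```

The grid contains 3 × 3 = 9 configurations; with such a small iteration budget the demo emits convergence warnings, which are expected here.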
In the implementation of CNN-based cloud detection, we adopt a CNN structure that is similar to LeNet. 43 The CNN is composed of convolution layers and pooling layers. In this study, the pool size of each pooling layer is set to 2. A convolution layer is first applied to the single/multi-spectral cloud detection images to extract the low-level features. We consider the number of filters in each convolution layer as a hyperparameter that needs to be fine-tuned, and its search space is set to $\{64, 128, 256\}$. In the first convolutional layer, the kernel size is set to 5 to enlarge the receptive field. Similar to the activation pattern of the NN, each convolution layer is activated by the ReLU function, and the learning rate of the CNN is searched from the value set $\{10^{-4}, 10^{-3}, 10^{-2}\}$.
Considering the running environment and the scale of the multi-spectral cloud detection, ResNet-34 40,41 is selected for further comparison. The training strategy of ResNet-34 inherited the training pattern from NN and CNN. In this study, all NN-based cloud detection methods are optimized by Adam optimizer.
In the design of MDForest, three grained sliding windows with sliding ratios of 0.125, 0.25, and 0.5 are first applied to the original satellite cloud images. Next, one cascade layer transforms the raw cloud samples into a probabilistic feature space. Dense cascade layers are designed to learn the information from the transformed features; each layer of the dense cascade forest consists of four random forests with $T$ decision trees each, where $T$ is searched from the value set $\{100, 200, 300, 400, 500, 600\}$. MDForest is an ensemble of random forests, and each random forest is itself an ensemble of decision trees. Therefore, MDForest can be regarded as an "ensemble in ensemble" algorithm. This highly tree-based ensemble pattern makes MDForest a robust algorithm that has few hyperparameters to be fine-tuned. Therefore, in this study, we only focus on fine-tuning the number of decision trees in each random forest; the other parameters are consistent with the default implementation of random forest in the scikit-learn 44 package. Table 2 shows the performance comparison of various cloud detection algorithms on single-spectral satellite images. All the experiments are run on an i7-10700K CPU with a memory size of 48 GB, and the comparison results are the averaged results of the predictions on the four wavebands.
As can be seen from Table 2, SVM performs worst for single-spectral satellite cloud detection and MDForest outperforms the other cloud detection methods. Compared with SVM, the decision tree algorithm improves the performance of cloud detection while achieving quick training speed and high inference efficiency, which indicates that tree-based approaches are better choices for single-spectral cloud detection than SVM. Compared with single classifiers such as decision tree and SVM, the random forest classifier achieves a higher accuracy for single-spectral cloud detection, which demonstrates the effectiveness of ensemble-based approaches for cloud detection. Random forest also outperforms CNN and neural network, which indicates that random forest can be a good solution for single-spectral cloud detection. In the comparison of training time and testing time, as can be seen from Table 3, tree-based single-spectral cloud detection methods are more efficient than neural network-based single-spectral cloud detection methods.
To further test the performance of the various cloud detection methods on single-spectral satellite images, we present their prediction results on full single-spectral satellite images in Fig. 5. As can be seen from Figs. 5(a) and 5(b), SVM tends to misclassify the land region as thin cloud, and the cloud areas predicted by SVM differ greatly from the real cloud distribution in Fig. 5(a). In contrast, random forest and neural network-based methods reduce the misclassification error of the cloudless area. In the comparison of the CNN and neural network, CNN shows superiority in the prediction of the cloudless area, which is mainly credited to the spatial information extraction of cloud images. Finally, as can be seen from the comparison of neural network-based and forest-based cloud detection methods, forest-based methods show better predictive ability: their misclassified regions are smaller than those of neural network-based methods. In conclusion, the cloud detection algorithms have a large proportion of false detection regions in the prediction of a single-spectral satellite image, which motivated our exploration of multi-spectral cloud detection.
To further improve the performance of cloud detection, we integrate spectral information for multi-spectral cloud detection. Since random forest and neural network-based cloud detection show good performance in cloud detection, we first focus on the performance comparison of neural network-based multi-spectral cloud detection. Figure 6 shows the performance comparison of neural network-based multi-spectral cloud detection methods. As can be seen from Fig. 6, ResNet outperforms the CNN and neural network, which indicates that a deeper network structure has a stronger ability to extract features from low level to higher level. In addition, as shown in Fig. 6, the training accuracy of the neural network is slightly higher than that of CNN while its testing accuracy is slightly worse than that of CNN, which implies that CNN is more suitable for multi-spectral cloud detection due to its superior spatial and spectral feature extraction ability. Based on the prominent performance of CNN-based multi-spectral cloud detection, we further compare the neural network-based cloud detection methods with the forest-based cloud detection methods to verify the effectiveness of MDForest. Table 3 shows the performance comparison of the neural network-based and forest-based cloud detection methods. As shown in Table 3, MDForest achieves comparable accuracy to ResNet while costing less than ResNet-34 on training/testing single-spectral cloud images. As can be seen from Table 3, though ResNet improves the performance of cloud detection based on multi-spectral satellite images, its high complexity limits its applicability for practical cloud detection and makes it unsuitable for fast cloud detection. By comparison, MDForest satisfies the need where both high efficiency and good accuracy of cloud detection are required.
Combined with Tables 2 and 3, it can be seen that random forest is a robust cloud detection method that has good detection efficiency, but from the perspective of the realization of accurate cloud detection, MDForest is the best choice.
Since NN-based cloud detection methods based on multi-spectral satellite images are good at extracting spatial and spectral information, we further verify the generalization ability of neural network-based cloud detection methods and MDForest. Figure 7 shows the comparison of the prediction images of neural network-based methods and MDForest. Figure 7(a) shows the prediction of the neural network based on a multi-spectral satellite image, and the real cloud image is shown in Fig. 5(a). Figure 7(b) presents the prediction of CNN based on the multi-spectral satellite image, Fig. 7(c) provides the prediction result of ResNet based on the multi-spectral satellite image, and Fig. 7(d) shows the prediction result of MDForest based on the multi-spectral satellite image.
As can be seen from Fig. 7, with reference to the original satellite image in Fig. 5(a), the prediction of the neural network is the poorest among the compared methods since the neural network cannot effectively utilize spectral information. In addition, the learning mechanism of the neural network can be summarized as a fitting process, which cannot extract the spatial information of multi-spectral cloud samples well, making the improvement insufficient compared with other spatial and spectral information learning approaches. As shown in Fig. 7(a), the cloud detection results of the neural network have a high false detection rate in the thin cloud area, and the predicted thick cloud region is larger than that of the other three cloud detection methods. Compared with the real image in Fig. 5(a), the misclassification error of thin cloud as thick cloud is high, making the neural network unsuitable for multi-spectral cloud detection. In comparison, the prediction results of CNN, ResNet, and MDForest are close to the real image, and the misclassified areas of thin cloud are greatly reduced. As can be seen from the comparison of Figs. 7(b) and 7(c), the prediction of ResNet is comparable to that of CNN since ResNet is a deeper framework that inherits the learning pattern and feature extraction mechanism from CNN. As illustrated in Fig. 7, MDForest achieves the best prediction result. Although the areas misclassified by MDForest on complex ground cover (such as the river) are larger than in CNN's predicted image, MDForest's accuracy in predicting thin cloud patches is significantly improved.
To further validate the effectiveness of MDForest for multi-spectral cloud detection, two further groups of predictions on two randomly selected multi-spectral satellite cloud images are compared. Figures 8 and 9 show the prediction results of CNN, ResNet-34, and MDForest. Figure 8(a) shows the real satellite image for the multi-spectral satellite image #2; Fig. 8(b) shows the prediction result of CNN for the multi-spectral satellite image #2; Fig. 8(c) shows the prediction of ResNet-34; Fig. 8(d) presents the prediction result of MDForest for the multi-spectral satellite image #2. Figure 9(a) shows the real satellite image #3; Fig. 9(b) shows the prediction result of CNN for the multi-spectral satellite image #3; Fig. 9(c) shows the predicted image of ResNet-34 for the multi-spectral satellite image #3; Fig. 9(d) shows the prediction result of MDForest for the multi-spectral satellite image #3.
As can be seen from Fig. 8, CNN and ResNet misclassified more thick cloud areas as thin cloud regions than MDForest. Based on the comparison of the prediction images of CNN, ResNet, and MDForest using multi-spectral satellite imagery, the prediction result of MDForest has a higher similarity with the real image in the distribution of cloud-free areas, thin cloud regions, and the land covered by thick cloud. In the prediction of the cloud-free area, MDForest shows significant improvement compared with the prediction results of CNN and ResNet-34. As can be seen from Fig. 9, in the area that is covered with rivers, MDForest gets a relatively accurate prediction result while CNN and ResNet-34 classified more river areas as thin cloud or thick cloud regions. MDForest achieves good cloud detection performance using multi-spectral satellite imagery due to the robustness of the tree-based deep structure and the multi-dimensional and multi-grained scanning mechanism. Compared with neural network-based cloud detection methods, the parameter fine-tuning process of MDForest is much simpler. Consequently, we further study the influence of different parameter settings on the performance of cloud detection. Figure 10 shows the performance of MDForest under different numbers of trees in a random forest and different grained scanning. Figure 10(a) shows the testing accuracy comparison under different numbers of trees in a random forest, and Fig. 10(b) shows the comparison of training accuracy and testing accuracy under different grained scanning. As can be seen from Fig. 10(a), MDForest is an adaptive deep forest that deepens its structure according to the complexity of the input data. When the number of decision trees in a random forest is set to 200, MDForest deepens its structure to layer 4, while the three-layered MDForest seems to be a model complex enough to deal with the input data. This characteristic makes MDForest a data-driven cloud detection method that is superior to NN-based cloud detection methods, whose optimal structure is determined by manual fine-tuning. Figure 10(a) demonstrates that MDForest gets the optimal testing accuracy when each forest is an ensemble of 300 decision trees; when the number of decision trees is lower than 300, more decision trees in each random forest of MDForest lead to better performance; when the number of decision trees in a random forest exceeds the threshold value of 300, overfitting occurs. Moreover, Fig. 10(b) indicates that more grained sliding windows are beneficial to the performance improvement of cloud detection using multi-spectral satellite images, which further verifies the effectiveness of MDForest.

Fig. 10 The performance comparison with different scale sliding windows: training-0 is the training curve of MDForest without the multi-dimensional and multi-grained scanning mechanism; testing-0 is the testing curve of MDForest without the multi-dimensional and multi-grained scanning mechanism; training-1 is the training curve of MDForest with 1-grained multi-dimensional scanning; testing-1 is the testing curve of MDForest with 1-grained multi-dimensional scanning, and so on.

Conclusions
Though CNN-based methods realized high-performance multi-spectral cloud detection by extracting and integrating spatial and spectral information, the performance improvement was based on an increase in model complexity, which hindered progress toward fast cloud detection. In this study, we proposed a multi-dimensional and multi-grained dense deep forest for cloud detection using multi-spectral satellite imagery. The proposed method was a deep forest structure that simulated the layer-by-layer processing mechanism of NN. In addition, the multi-layered structure gave the proposed method representation learning ability, allowing it to automatically extract features of satellite cloud images end-to-end. Moreover, the multi-dimensional and multi-grained scanning mechanism made it possible to deal with multi-spectral satellite cloud detection, which further improved the performance of cloud detection. Finally, a densely connected structure was borrowed in the proposed method to avoid the overfitting problem. Experimental results on HJ-1A/1B demonstrated that the proposed MDForest improved the performance of multi-spectral cloud detection while achieving better cloud detection efficiency, so it can be regarded as an alternative to CNN-based methods for multi-spectral cloud detection.
MDForest improved the efficiency and performance of cloud detection based on multi-spectral satellite cloud images. However, there are still some problems that need to be addressed in future work. (1) The recognition of ground areas is not yet ideal; in future work, some noisy samples such as lakes, rivers, and fogs will be collected to enrich the diversity of the cloud detection dataset. (2) Since MDForest improves the efficiency of cloud detection, an implementation on mobile devices could achieve the goal of fast cloud detection. Therefore, in our future work, embedding MDForest into hardware would be a good solution for practical cloud detection.