Deep convolutional neural network for classifying Fusarium wilt of radish from unmanned aerial vehicles

Abstract. Recently, unmanned aerial vehicles (UAVs) have gained much attention. In particular, there is a growing interest in utilizing UAVs for agricultural applications such as crop monitoring and management. We propose a computerized system that is capable of detecting Fusarium wilt of radish with high accuracy. The system adopts computer vision and machine learning techniques, including deep learning, to process the images captured by UAVs at low altitudes and to identify the infected radish. The whole radish field is first segmented into three distinctive regions (radish, bare ground, and mulching film) via a softmax classifier and K-means clustering. Then, the identified radish regions are further classified into healthy radish and Fusarium wilt of radish using a deep convolutional neural network (CNN). In identifying radish, bare ground, and mulching film from a radish field, we achieved an accuracy of ≥97.4%. In detecting Fusarium wilt of radish, the CNN obtained an accuracy of 93.3%. It also outperformed a standard machine learning algorithm (random forests), which obtained 82.9% accuracy. Therefore, UAVs equipped with computational techniques are promising tools for improving the quality and efficiency of agriculture today.


Research Motivation
Radish is one of the major horticultural crops in Korea, occupying ∼10% of the entire vegetable cultivation area. One of the most destructive and economically damaging diseases of radish is Fusarium wilt of radish. It is a vascular disease that causes chlorosis, necrosis, and abscission of leaves and discoloration of the vascular elements in roots, stems, and petioles, leading to the death of the infected plant. 1 Management and control of Fusarium wilt of radish are challenging for several reasons; for instance, its pathogen is soil inhabiting. Rapid spread of the disease is often observed, resulting in substantial harvest losses. Early detection of the disease could aid in preventing its spread and minimizing the damage. However, manual inspection is inaccurate, inefficient, and time-consuming. Therefore, an automated, fast, and precise surveillance system for detecting Fusarium wilt of radish is needed.
Remote sensing permits the acquisition and recording of information on agricultural produce and the environment. Satellite- and aircraft-based technologies have been the two major remote sensing technologies. Satellite-based remote sensing has been widely studied and applied but suffers from insufficient information due to low-resolution images, inaccurate (or poor quality) information due to local weather conditions, and a high system cost. 2,3 Aircraft-based remote sensing is often equipped with multiple sensors or cameras, providing high-quality information or images. However, the system is still costly and hard to operate. 4 Alternatively, unmanned aerial vehicles (UAVs) are remote-controlled aircraft that offer ad hoc remote sensing of the surface at relatively low altitudes. 5 Due to the rapid advances in sensing, control, and positioning techniques, UAVs are now capable of acquiring high spatial resolution surface images at a low operational cost. With their greater capability, availability, and cost-effectiveness, the applications of UAVs are rapidly growing, 6 including traffic monitoring, 7 forest fire monitoring, 8 and search and rescue operations. 9 UAVs also have great potential for improving agriculture. 10,11 They can not only facilitate obtaining crop and field information in a timely manner but also assist farmers in improving crop management and farm planning.
Computer and information technologies can process and analyze the information or images obtained by remote sensing to monitor and assess the farming condition, e.g., crop health, crop yield, and harvest time. Several computer systems have been developed for improving agriculture, for example, for plant disease detection, 12-15 quality inspection of agricultural products, 16 and vegetable classification. 17 These systems were mainly developed based on standard computer vision and machine learning methods such as support vector machine (SVM). Deep learning is a new paradigm of machine learning. It has recently proved to be useful for several applications, for instance, image recognition, 18-21 speech recognition, 22 and drug discovery. 23 In particular, the technique provides an efficient and effective means of handling large-scale datasets as well as discovering intrinsic feature representations of the datasets.
In this study, we propose a systematic approach that combines UAVs with computerized methods to detect Fusarium wilt of radish. Images of radish fields are obtained from UAVs at low altitudes. The state-of-the-art computer methods, including deep learning, are utilized to process and analyze the radish images. The rest of this paper is organized as follows. In Secs. 2 and 3, we describe the data acquisition, image processing, and classification procedures for detecting Fusarium wilt of radish. In Sec. 4, the performance of our approach in detecting Fusarium wilt of radish is presented. In Sec. 5, we conclude with the summary of our findings and perspectives on future directions.

Image Acquisition
Images of radish fields were acquired in Hongchun-gun and Jungsun-gun, Kangwon-do, Korea, from July to September 2016. A commercial UAV (Phantom 4, DJI co., Ltd.), equipped with an RGB camera (12 megapixels), was used to obtain the field images at an altitude of ∼10 m above ground level. Each image has a spatial dimension of 4000 × 3000 pixels with 72 dpi. In total, 139 images were acquired. Figure 1 shows exemplary images of radish fields that were acquired from the UAV.

Dataset Preparation
Two types of datasets were constructed. The first dataset (dataset1) includes three distinctive regions of radish fields (radish, mulching film, and bare ground) (Fig. 1). Each image was manually reviewed, and the regions of interest (ROIs) corresponding to radish, bare ground, and mulching film were selected. In total, 1734 ROIs were selected from 139 images: 634 ROIs (average size of 475 × 408, ranging from 38 × 69 to 1155 × 1133) for radish, 580 ROIs (average size of 289 × 220, ranging from 19 × 16 to 900 × 1103) for bare ground, and 520 ROIs (average size of 158 × 128, ranging from 22 × 28 to 799 × 510) for mulching film regions. This dataset is used for radish field classification (Sec. 3.1) and segmentation (Sec. 3.2). The second dataset (dataset2) contains ROIs for healthy radish and Fusarium wilt of radish (Fig. 1). During the acquisition of the radish field images, the infected regions were first identified. Afterward, the images were further examined with the prior knowledge of the infected regions, and then the ROIs corresponding to Fusarium wilt of radish and healthy radish were manually identified and delineated. In total, 1158 ROIs (average size of 273 × 280, ranging from 80 × 80 to 836 × 943) for healthy radish and 904 ROIs (average size of 207 × 204, ranging from 39 × 46 to 596 × 632) for Fusarium wilt of radish were selected from 139 images. This dataset is used for detecting Fusarium wilt of radish (Sec. 3.3).

Methodology
The overview of the proposed method is shown in Fig. 2. First, we conduct radish field classification using a softmax classifier. The classification of the radish field aims at identifying the class label of the respective radish, bare ground, and mulching film ROIs. The ROIs are provided in dataset1. Second, we perform the segmentation of a whole radish field. The whole radish field is partitioned into a number of disjoint regions, and their class labels are determined by the radish field classifier. Finally, a convolutional neural network (CNN) model is built for classifying Fusarium wilt of radish using dataset2. Fusarium wilt classification is only applied to the regions of radish that were preidentified in the radish field segmentation step.

Radish Field Classification
We extract texture- and color-based features from the radish, bare ground, and mulching film regions (dataset1). Texture-based features are extracted using local binary pattern (LBP). 24 Color-based features are extracted by applying color-space conversion and an AutoEncoder (AE). 25 After these two feature sets are concatenated, a two-stage feature selection method is applied to choose the most discriminative features, i.e., the subset of features that are the most informative and useful in classifying the radish field. Utilizing the discriminative features, a softmax classifier is constructed for radish field classification. The trained softmax classifier provides the probability that a region belongs to each class label. The class label with the highest probability is assigned to each region. Figure 3 shows the entire process of radish field classification.

Texture feature extraction
LBP can provide texture descriptors that are invariant to rotation and illumination changes at low computational cost. Given a (center) pixel c in an image, LBP examines its neighboring pixels p (p = 0, ..., P − 1), a set of regularly distributed pixels on a circle of radius R, and generates a binary pattern code as follows:

$$\mathrm{LBP}_{P,R} = \sum_{p=0}^{P-1} s(g_p - g_c)\, 2^{p}, \qquad (1)$$

where s(x) is 1 if x ≥ 0 and 0 if x < 0, and g_c and g_p represent the gray levels of the center pixel and its neighboring pixels, respectively. To achieve rotational invariance, binary pattern codes are generated by

$$\mathrm{LBP}^{ri}_{P,R} = \min\{\mathrm{ROR}(\mathrm{LBP}_{P,R}, i) \mid i = 0, 1, \ldots, P-1\}, \qquad (2)$$

where ROR(x, i) performs a circular bitwise right shift of x by i positions; namely, all binary pattern codes that can be generated from one another by this bitwise operation are regarded as one identical pattern. LBP features are computed on a gray-scale image using three neighboring topologies (P, R) = (8, 1), (16, 2), (24, 3), generating 703,404 features.
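The rotation-invariant LBP code described above can be illustrated with a minimal Python sketch. The function names (`lbp_code`, `ror`, `lbp_ri`) are hypothetical, and the neighbors are sampled at the nearest pixel rather than by interpolation on the circle, which is a simplification and not necessarily what was used in this study.

```python
import math

def lbp_code(gray, row, col, P=8, R=1):
    """Basic LBP code at (row, col): sum of s(g_p - g_c) * 2**p over P neighbors."""
    g_c = gray[row][col]
    code = 0
    for p in range(P):
        angle = 2.0 * math.pi * p / P
        # Assumption: nearest-pixel sampling on the circle (no interpolation).
        r = row - int(round(R * math.sin(angle)))
        c = col + int(round(R * math.cos(angle)))
        if gray[r][c] - g_c >= 0:  # s(x) = 1 if x >= 0, else 0
            code |= 1 << p
    return code

def ror(code, i, P=8):
    """Circular bitwise right shift of a P-bit code by i positions."""
    i %= P
    mask = (1 << P) - 1
    return ((code >> i) | (code << (P - i))) & mask

def lbp_ri(code, P=8):
    """Rotation-invariant code: the minimum over all P circular shifts."""
    return min(ror(code, i, P) for i in range(P))
```

For P = 8, this mapping collapses the 256 possible codes into 36 rotation-invariant patterns. The corresponding counts for P = 16 and P = 24 are 4116 and 699,252, and the three counts sum to 703,404, which appears consistent with the number of LBP features quoted above.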

Color feature extraction
Radish field images are initially obtained in RGB (red, green, and blue) color space and converted into HSV (hue, saturation, and value) and L*a*b* (lightness, green-red, blue-yellow) color spaces. Histograms are built on the hue, a*, and b* channels (256 bins or features per histogram). Then, we concatenate these three color histograms into one color histogram, generating 768 color features. The color histogram features are further processed by adopting AE. 25 AE is an unsupervised learning technique, typically used for dimensionality reduction. It consists of input and output layers (of the same dimensionality) and hidden layer(s). It tries to learn an approximation/representation of the input. The dimensionality of the hidden layers is smaller than that of the input and output layers. The hidden layers learn the compressed representation of the input (encoding), i.e., extracting meaningful features from the input. Finally, applying two-stacked AE 26 on the color histogram features, we obtain M reduced and compressed features (Fig. 4; M = 100).
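As an illustration of the 768-dimensional color histogram (before the AE step), the following Python sketch builds 256-bin histograms on the hue, a*, and b* channels and concatenates them. It is a sketch under stated assumptions: the sRGB-to-L*a*b* conversion uses the standard D65 white point, and the binning rules (hue scaled to 256 bins, a* and b* rounded and offset by 128) are hypothetical choices, since the text does not specify them. The function names are also hypothetical.

```python
import colorsys

def _srgb_to_linear(u):
    """Undo the sRGB gamma for a channel value in [0, 1]."""
    return u / 12.92 if u <= 0.04045 else ((u + 0.055) / 1.055) ** 2.4

def _f(t):
    """Nonlinearity used in the CIE L*a*b* definition."""
    d = 6.0 / 29.0
    return t ** (1.0 / 3.0) if t > d ** 3 else t / (3 * d * d) + 4.0 / 29.0

def rgb_to_ab(r, g, b):
    """Return (a*, b*) for an 8-bit sRGB pixel, assuming a D65 white point."""
    rl, gl, bl = (_srgb_to_linear(v / 255.0) for v in (r, g, b))
    x = 0.4124 * rl + 0.3576 * gl + 0.1805 * bl
    y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl
    z = 0.0193 * rl + 0.1192 * gl + 0.9505 * bl
    fx, fy, fz = _f(x / 0.95047), _f(y), _f(z / 1.08883)
    return 500.0 * (fx - fy), 200.0 * (fy - fz)

def color_features(pixels):
    """Concatenate 256-bin histograms of hue, a*, and b* (768 features)."""
    bins = 256
    hist_h, hist_a, hist_b = [0] * bins, [0] * bins, [0] * bins
    for r, g, b in pixels:
        hue, _, _ = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        a_star, b_star = rgb_to_ab(r, g, b)
        # Assumed binning: hue in [0, 1) -> 256 bins; a*, b* offset by 128.
        hist_h[min(int(hue * bins), bins - 1)] += 1
        hist_a[min(max(int(round(a_star)) + 128, 0), bins - 1)] += 1
        hist_b[min(max(int(round(b_star)) + 128, 0), bins - 1)] += 1
    return hist_h + hist_a + hist_b
```

Here `pixels` is any iterable of 8-bit (R, G, B) tuples taken from one region. The resulting 768-element list would then be the input to the stacked AE, which is not sketched here.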

Feature selection
We perform a two-stage feature selection to choose the most discriminative features for radish field classification. In the first stage, the Wilcoxon rank-sum test is used to select statistically significant features for classification (p-value < 0.01). In the second stage, random forests (RF) with 50 trees and an out-of-bag (OOB) scheme are adopted to estimate the importance of the features. RF is one of the standard machine learning algorithms for classification and regression. It constructs multiple decision trees using bootstrap aggregating, combining classification models of a randomly generated training dataset and a random selection of features. The OOB error is a measure of prediction error based on random subsampling of the training dataset.
To assess feature importance, each feature is permuted and the OOB error is computed again. The difference in OOB errors before and after the permutation becomes the importance of each feature. Only features with feature importance >0 are considered. In total, 1770 features are selected from 703,404 features (Sec. 3.1.2).
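The permutation-based importance described above can be sketched in Python as follows. This is a simplified illustration, not the RF implementation used in this study: it takes any prediction function `predict` (a hypothetical callable) and a held-out set of samples in place of the per-tree OOB samples, and measures how much the error rate rises when one feature column is shuffled.

```python
import random

def error_rate(predict, X, y):
    """Fraction of samples whose predicted label differs from the true label."""
    wrong = sum(1 for xi, yi in zip(X, y) if predict(xi) != yi)
    return wrong / len(y)

def permutation_importance(predict, X, y, seed=0):
    """Importance of feature j = error after permuting column j - baseline error."""
    rng = random.Random(seed)
    baseline = error_rate(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)  # break the link between feature j and the labels
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        importances.append(error_rate(predict, X_perm, y) - baseline)
    return importances
```

Each row of `X` is assumed to be a list of feature values. Features whose importance is greater than 0 would be retained, mirroring the selection rule stated in the text; a feature that the model never uses receives an importance of exactly 0.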

Softmax classifier
A softmax classifier is a generalization of the logistic function that can be used for multiclass classification. In an artificial neural network-based classifier, it is mainly adopted as a final classification layer. Given a feature vector x, the softmax classifier outputs the probability for each class label j (j = 1, ..., C) as follows:

$$P(y = j \mid x) = \frac{e^{u(x)_j}}{\sum_{k=1}^{C} e^{u(x)_k}}, \quad \text{for } j = 1, \ldots, C, \qquad (3)$$

$$u(x)_j = \sum_{i=1}^{I} w_{ij} x_i + b_j, \qquad (4)$$

where y is a class label, x_i is the i'th element of x, w_ij is a weight, and b_j is a bias (i = 1, ..., I; j = 1, ..., C). I and C denote the number of features and classes (radish, bare ground, and mulching film), respectively. Computing the softmax classifier amounts to determining the weight w and bias b. These are chosen to minimize mean squared error with 200 iterations. Equation (3) is called the softmax function, which outputs a C-dimensional vector of real values between 0 and 1, representing a categorical probability distribution. The class label with the highest probability is assigned to x.
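The softmax computation and the label assignment can be illustrated with a short Python sketch. The names `softmax_probabilities` and `predict_label` are hypothetical, the weights are assumed to be already trained, and the subtraction of the maximum score is a common numerical-stability step that does not change the resulting probabilities.

```python
import math

def softmax_probabilities(x, W, b):
    """Return P(y = j | x) for each class j, given weights W[i][j] and biases b[j]."""
    num_classes = len(b)
    # Linear score of each class: u_j = sum_i W[i][j] * x[i] + b[j]
    u = [sum(W[i][j] * x[i] for i in range(len(x))) + b[j]
         for j in range(num_classes)]
    u_max = max(u)  # subtracted for numerical stability only
    exps = [math.exp(v - u_max) for v in u]
    total = sum(exps)
    return [e / total for e in exps]

def predict_label(x, W, b, labels=("radish", "bare ground", "mulching film")):
    """Assign the class label with the highest probability."""
    probs = softmax_probabilities(x, W, b)
    return labels[probs.index(max(probs))]
```

For example, with two features, `W = [[1, 0, 0], [0, 1, 0]]`, and zero biases, the input `x = [2, 0]` gives the largest score to the first class and is labeled as radish.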

Radish Field Segmentation
The whole radish field is segmented into radish, bare ground, and mulching film using the radish field classifier (Sec. 3.1). The radish field classifier is built on ROIs, i.e., extracting texture- and color-based features from ROIs and assigning class labels to them. To apply the radish field classifier to the whole radish field image, we first identify a number of distinct regions and conduct radish field classification. K-means clustering is adopted to divide a whole radish field image into a number of disjoint regions. Converting the color space of a radish field image (RGB) into HSV and L*a*b* color spaces, K-means clustering is performed on the hue channel (HSV) and the a* and b* channels (L*a*b*) (K = 3, 5, 10, 15, and 20). For each of the resultant clusters, the texture- and color-based features (Secs. 3.1.1 and 3.1.2) are extracted, and the radish field classifier (Sec. 3.1.3) assigns a class label (radish, bare ground, or mulching film) to each of the disjoint regions. Figure 5 shows the procedure of radish field segmentation.
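The clustering step can be sketched with a plain implementation of Lloyd's K-means algorithm in Python. This is an illustration only: each point is assumed to be a per-pixel (hue, a*, b*) tuple, and the random initialization, fixed iteration count, and function name `kmeans` are assumptions, as the text does not specify these details.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Cluster points into k groups; return (labels, centers)."""
    rng = random.Random(seed)
    # Assumption: initial centers are k distinct points drawn at random.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        for n, p in enumerate(points):
            labels[n] = min(
                range(k),
                key=lambda c: sum((pi - ci) ** 2 for pi, ci in zip(p, centers[c])),
            )
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

The pixels sharing a label form one of the K disjoint regions, from which the texture and color features would then be extracted and passed to the radish field classifier.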

Fusarium Wilt of Radish Detection
We employ a CNN model to detect Fusarium wilt of radish. Radishes are identified via radish field segmentation (Sec. 3.2). A rectangular window of a fixed size (200 × 200 pixels) is slid over the identified radish regions with a step of 50 pixels, and the CNN model determines the disease status of each window.
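The window placement can be sketched as a small Python generator. The function name `sliding_windows` is hypothetical, and the sketch enumerates windows over a full image; restricting the windows to the segmented radish regions and the handling of image borders are not specified in the text and are left out.

```python
def sliding_windows(height, width, size=200, step=50):
    """Yield (top, left) corners of size x size windows, stepping by `step` pixels."""
    for top in range(0, height - size + 1, step):
        for left in range(0, width - size + 1, step):
            yield top, left
```

For a full 4000 × 3000 pixel image, this yields 57 × 77 = 4389 candidate windows, each of which would be passed to the CNN if it overlaps a radish region.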

Convolutional neural network
A VGG-A network 27 is adopted to distinguish Fusarium wilt of radish from healthy radish. VGG networks have been successfully applied to image recognition. 19 The network consists of eight convolutional layers and two fully connected layers (Table 1). The original VGG-A network takes images of size 224 × 224 as input. In this study, images of size 200 × 200 are fed to the network as input.

CNN training
Our CNN model is trained using dataset2 (healthy radish ROIs: 1158 and Fusarium wilt of radish ROIs: 904). Each ROI is drawn on a whole radish or the infected area of a radish. For each of the ROIs, an image patch is generated by drawing the smallest rectangular window encompassing the ROI. All the image patches are resized to a fixed size of 200 × 200 pixels (RGB), which is about the average size of Fusarium wilt of radish ROIs. 20% of the training dataset is randomly selected and left as the validation dataset. The validation dataset is used for tuning the learning rate. In training, we set the batch size to 90 and momentum to 0.9. The learning rate is initially set to 0.01. As the error rate on the validation set reaches a plateau, the learning rate decreases by a factor of 10. This is performed three times, i.e., the learning rate gradually reduces to 0.001. The training runs for 100 epochs, taking ∼40 min. The detailed training steps are available in Ref. 27.
For training our CNN model, the NVIDIA DIGITS 5 toolbox with the Caffe framework was used. The experiments were performed on a Linux machine, with Ubuntu 14.04, Intel® Core i7-5930K processor, three NVIDIA Titan X 12GB GPUs, four 3072 CUDA cores, and 64GB of DDR4 RAM.

Evaluation Methods
We assess the performance of the proposed methods (radish field classification, radish field segmentation, and Fusarium wilt of radish classification) via k-fold cross-validation (k = 3). k-fold cross-validation divides the entire dataset into k roughly equal-sized disjoint subsets. Two subsets are used to train the proposed methods. The remaining subset is used to evaluate the performance of the methods. This is repeated k times with differing choices of the remaining subset. For radish field classification, the confusion matrix is computed to assess the ability of our model to distinguish differing areas of radish fields (radish, bare ground, and mulching film). The confusion matrix CM can be computed by

$$\mathrm{CM}(i, j) = \left|\{\, r \in R \text{ such that } \mathrm{GroundTruth}(r) = i \text{ and } \mathrm{Prediction}(r) = j \,\}\right|, \qquad (5)$$

where R is the set of ROIs and GroundTruth(r) and Prediction(r) denote the ground truth class label and predicted class label of an ROI r, respectively. The pixel-level segmentation accuracy 29 (PSA) is adopted to evaluate the radish field segmentation performance. PSA is measured for differing choices of K in K-means clustering (K = 3, 5, 10, 15, and 20) to examine the effect of the size of the clusters on segmentation performance.
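The cross-validation splits and the confusion matrix can be sketched in Python as follows. The function names are hypothetical, class labels are assumed to be encoded as integers starting at 0, and the interleaved assignment of samples to folds (without shuffling or stratification) is an assumption, since the text only states that the subsets are disjoint and roughly equal in size.

```python
def k_fold_indices(n, k=3):
    """Yield (train, test) index lists for k roughly equal-sized disjoint folds."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f in range(k) if f != held_out for i in folds[f]]
        yield train, test

def confusion_matrix(ground_truth, prediction, num_classes):
    """cm[i][j] counts the ROIs whose true label is i and predicted label is j."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for i, j in zip(ground_truth, prediction):
        cm[i][j] += 1
    return cm
```

With k = 3, each iteration trains on two folds and evaluates on the third, and the per-fold confusion matrices can be summed to obtain the overall matrix.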
To examine the performance of Fusarium wilt of radish classification, the confusion matrix is computed. Also, the true-positive rate (TPR; the rate of Fusarium wilt of radish ROIs that are correctly classified as Fusarium wilt of radish), true-negative rate (TNR; the rate of healthy radish ROIs that are correctly classified as healthy radish), and accuracy (the rate of Fusarium wilt of radish and healthy radish ROIs that are correctly classified as labeled) are measured.
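These three measures can be written compactly in terms of the entries of the two-class confusion matrix. The symbols below are introduced here for illustration and do not appear in the text: TP and FN are the numbers of Fusarium wilt of radish ROIs classified as Fusarium wilt and as healthy, respectively, and TN and FP are the numbers of healthy radish ROIs classified as healthy and as Fusarium wilt, respectively.

```latex
\[
\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad
\mathrm{TNR} = \frac{TN}{TN + FP}, \qquad
\mathrm{Accuracy} = \frac{TP + TN}{TP + FN + TN + FP}
\]
\[
\mathrm{Accuracy}
  = \frac{\mathrm{TPR}\,(TP + FN) + \mathrm{TNR}\,(TN + FP)}{(TP + FN) + (TN + FP)}
\]
```

The second line shows that the accuracy is the average of TPR and TNR weighted by the number of ROIs in each class, so with unequal class sizes it lies closer to the rate of the larger class.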

Radish Field Classification Results
Using dataset1 (radish ROIs: 634, bare ground ROIs: 580, and mulching film ROIs: 506), radish field classification was performed. Table 2 describes the classification results. The experimental results suggest that our model could determine the class label of the radish, bare ground, and mulching film ROIs with high accuracy. Notably, we obtained the highest classification performance (99.7%) for radish ROIs. Fewer than 3% of the bare ground and mulching film ROIs were misclassified.

Radish Field Segmentation Results
Radish field segmentation results (PSA) with differing choices of the number of clusters (K) in K-means clustering are shown in Table 3. As K increases, the overall PSA increases up to ∼93%, but when K > 10, it gradually decreases. This may be ascribable to the size of clusters.
As K increases, the size of each cluster decreases. The features, computed from the smaller clusters, may not be robust enough to provide accurate classification performance. With K = 10, >91% PSA was achieved for radish, bare ground, and mulching film. The experimental results suggest that K = 10 is the optimal number of clusters for radish field segmentation. Figure 6 shows the segmentation results with K = 10. Regions corresponding to radish, bare ground, and mulching film are, in general, correctly classified as labeled. However, misclassified regions are also observed (Fig. 7). These include withered radishes that are mainly brown in color. Due to the similarity in color with bare ground, these regions were clustered together with bare ground by K-means clustering; that is, the misclassification is caused not by the radish field classifier but by the clustering method. As described in Sec. 3.2, K-means clustering is based on three color channels.

Fusarium Wilt of Radish Classification Results
In Table 4, we demonstrate the performance of Fusarium wilt of radish classification. The classification accuracy was measured via k-fold cross-validation (k = 3) on dataset2 (healthy radish ROIs: 1158 and Fusarium wilt of radish ROIs: 904). We first performed radish field segmentation and discarded ROIs that contain ≤1% of radishes. Then, our CNN model distinguished Fusarium wilt of radish from healthy radish, achieving an accuracy of 93.3%. TPR and TNR were 87.2% and 98.0%, respectively. We note that the image patches may include the regions of differing class labels due to the image patch generation process (Sec. 3.3.2); for example, Fusarium wilt of radish ROIs could include a part of a healthy radish. This may have an adverse effect on the performance of Fusarium wilt of radish classification. A finer training strategy utilizing the exact regions will aid in improving the overall performance of our method.
Further, the performance of our CNN model was superior to that of the standard machine learning algorithm. Using RF, 82.9% accuracy, 83.1% TPR, and 82.8% TNR were obtained in detecting Fusarium wilt of radish (Table 4). This confirms that the CNN model could improve upon the standard machine learning scheme. In addition, we repeated the above experiment with varying sizes of the image patches fed to our CNN model. When the image patches were resized to 120 × 120 and 280 × 280, the performance of our CNN model was consistent (Table 5). The results suggest that our method is insensitive to the size of the image patches. Figure 8 shows the detection result of Fusarium wilt of radish. Regions with Fusarium wilt of radish are marked with red circles [Fig. 8(a)]. The region-by-region detection results are provided in Fig. 8(b). The size of the sliding window is 200 × 200, which is about the average size of Fusarium wilt of radish ROIs. The ROIs were drawn on a radish or the infected area of a radish. Hence, our method was able to detect the individual infected areas. Overall, the regions of healthy radish and moderate Fusarium wilt of radish were successfully detected by our method. However, regions of severe Fusarium wilt of radish were often missed (Fig. 9). This is mainly due to segmentation failure, i.e., K-means clustering as mentioned in Sec. 4.2.

Conclusions
We have demonstrated an approach that utilizes UAVs and computational techniques to identify Fusarium wilt of radish. Deep learning, in particular, was able to detect Fusarium wilt of radish with high accuracy. The capability to detect Fusarium wilt of radish from UAVs may offer great potential for reducing the effort and cost of managing and preventing the disease as well as improving the crop yield. Our method can be applied to other crops and plants since Fusarium wilt is a common vascular disease of plants, including tomato, tobacco, and banana. This study has several limitations. First, the performance of our methods was evaluated via cross-validation. A validation study on an extended dataset will further ensure the robustness of our methods. Second, only RGB images were considered. For crop monitoring and management, infrared images are often employed. Developing a methodology to combine RGB images and infrared images may further improve the performance of our methods. Third, severe Fusarium wilt of radish was often missed. Advances in segmentation methods will lead to improved detection accuracy. Last, the severity of Fusarium wilt of radish was not considered. Depending on the level of severity, the plan for controlling and preventing the disease may differ. Further study will be conducted to tackle the present limitations, to improve the accuracy and robustness of the detection, and to facilitate efficient and effective monitoring and prevention of Fusarium wilt of radish.