Automated classification of citrus disease on fruits and leaves using convolutional neural network generated features from hyperspectral images and machine learning classifiers

Abstract. Citrus black spot (CBS) is a fungal disease caused by Phyllosticta citricarpa that poses a quarantine threat and can restrict market access for fruit. It manifests as lesions on the fruit surface and can result in premature fruit drop, leading to reduced yield. Another significant disease affecting citrus is canker, which is caused by the bacterium Xanthomonas citri subsp. citri (syn. X. axonopodis pv. citri); it causes economic losses for growers due to fruit drop and blemishes. Early detection and management of groves infected with CBS or canker through fruit and leaf inspection can greatly benefit the Florida citrus industry. However, manual inspection and classification of disease symptoms on fruits or leaves are labor-intensive and time-consuming processes. Therefore, there is a need to develop a computer vision system capable of autonomously classifying fruits and leaves, expediting disease management in the groves. This paper aims to demonstrate the effectiveness of convolutional neural network (CNN) generated features and machine learning (ML) classifiers for detecting CBS-infected fruits and leaves with canker symptoms. A custom shallow CNN with a radial basis function support vector machine (RBF SVM) achieved an overall accuracy of 92.1% for classifying fruits with CBS and four other conditions (greasy spot, melanose, wind scar, and marketable), and a custom Visual Geometry Group 16 (VGG16) with the RBF SVM classified leaves with canker and four other conditions (control, greasy spot, melanose, and scab) at an overall accuracy of 93%. These preliminary findings demonstrate the potential of utilizing hyperspectral imaging (HSI) systems for automated classification of citrus fruit and leaf diseases using shallow and deep CNN-generated features, along with ML classifiers.


Introduction
Citrus black spot (CBS) is a fungal disease caused by Phyllosticta citricarpa (syn. Guignardia citricarpa) that results in lesions on fruit surfaces and potential premature fruit drop, leading to reduced yield. 1,2 Many markets, including the European Union, prohibit the import of fruit with CBS lesions because CBS is classified as an A1 quarantine disease. It is therefore crucial to identify and control infected fruit to safeguard profitability and marketability for Florida's growers. 3 CBS was initially reported in March 2010 in Collier County, Florida, and it remains a persistent concern in southwest Florida. 4 Infected fruit peels exhibit five distinct lesion types: hard spot, false melanose, freckle spot, cracked spot, and virulent spot. Hard spots are round, sunken lesions with brick-red to brown margins and grey centers. Leaf infections are mostly asymptomatic and leaf lesions are rare, but on younger leaves they manifest as small (< 5 mm), round, reddish-brown lesions. On older leaves, they present tan necrotic centers with dark brown margins and yellow halos. Early detection of CBS-infected trees through fruit examination enables growers to implement necessary mitigation measures, preventing disease spread within orchards and minimizing long-distance transmission, thereby maximizing yields. Current recommendations for CBS-affected Florida orchards include monthly fungicide applications from May to September. 5 Additionally, once CBS-infected fruits are identified, they can be separated during postharvest packinghouse operations to prevent accidental shipment, avoiding rejected loads and potential bans in numerous countries. Presently, the detection of CBS-infected fruits, whether in orchards or packinghouses, relies on manual inspection, a time-consuming process prone to human error due to fatigue.
In addition to CBS, citrus canker is another significant disease; it is caused by the bacterium Xanthomonas citri subsp. citri (syn. X. axonopodis pv. citri). 6,7 This bacterium can infiltrate citrus tree tissues through wounds and leaf stomatal openings. Citrus canker is highly contagious and can result in premature fruit drop, leading to a decrease in yield. Symptoms of this disease manifest as blister-like lesions on fruits and leaves. Early detection plays a crucial role in managing the disease as it can help slow down fruit infection originating from the inoculum produced by leaf infections. Similar to CBS, manual inspection of citrus canker symptoms on leaves is a laborious and time-consuming process that may introduce human error, affecting detection accuracy. 8,9 Therefore, there is a need for automated systems that can perform the classification tasks tirelessly, accurately, and at faster speeds.
Hyperspectral imaging (HSI) systems are used widely in the detection and classification of diseased fruits, vegetables, plants, and crops because they provide unique spectral signatures of these targets over a wide spectral range. 12 Recently, computer vision (CV) algorithms based on machine learning (ML) and its subclass deep learning (DL) have found immense success in disease detection and classification tasks. For instance, Nagasubramanian et al. 14 detected charcoal rot fungal disease on soybeans using images and a DL-based CV algorithm at an accuracy of 95.73%. Zhang et al. 15 detected yellow rust on winter wheat at an accuracy of 85% using HSI and a DL-based CV algorithm. Similarly, Yadav et al. 11,16 used a DL-based CV algorithm with HSI to detect citrus canker and other disease conditions on Ruby Red grapefruit at an average accuracy of 98.87%. Most image-based classification and detection tasks using DL rely on convolutional neural network (CNN) generated features. CNN-based DL for image classification has found tremendous success in a wide range of tasks. Gulzar 17 used a CNN-based DL approach to classify 14 varieties of crop seeds at 99% accuracy. Tiwari et al. 18 classified plant leaf symptoms at an accuracy of 99.2% using a CNN-based DL approach. The advantage of this method is that a CNN can extract spatial features automatically in each spectral band, providing the ability for real-time classification and detection.
However, using the full HSI spectrum with a CNN for real-time applications can be quite challenging because HSI usually consists of >100 bands, or at least >10 bands in general. 19 In HSI, a large number of bands implies a high-dimensional vector space corresponding to the wavelengths. In such a space, normally distributed data concentrate in the tails and uniformly distributed data concentrate in the corners, making statistical density estimation difficult. 19,20 Combining these issues related to high-dimensional data, for a fixed sample size, classification accuracy may first increase with an increasing number of features but then starts to decrease beyond a certain optimal value, which is sometimes referred to as the curse of dimensionality. 19,21,22 This is why, in HSI, optimal band selection for dimension reduction is usually preferred. An optimal band selection technique based on principal component analysis (PCA), as shown by Kim et al., 23 can be used for this purpose. A similar approach was successfully used by Yadav et al. 11 for selecting the five most important bands out of the 92 HSI bands for classifying Ruby Red grapefruits with canker and five other disease conditions. Zhao et al. 24 used PCA to select the most discriminating bands from HSI and then used those bands to train ML classifiers for detecting the severity of powdery mildew infection on wheat leaves at an accuracy of >93%. Selecting optimal bands by reducing the high-dimensional vector space with PCA not only mitigates the issues posed by the curse of dimensionality but also provides an opportunity to develop a multispectral imaging (MSI) system that is a commercially viable option for fruit and leaf inspection. 11,16 The optimal bands selected by the PCA method can be used to train ML- and DL-based classification algorithms for distinguishing CBS-infected fruit and cankerous leaves from healthy ones or those with other conditions.
Among the many existing CNN-based DL algorithms, the Visual Geometry Group 16 (VGG16) 25 has been successfully used for many image classification purposes ranging from medical to agricultural applications. For example, Sholihati et al. 26 used VGG16 to classify four types of potato leaf diseases at an overall accuracy of 91%. Cai et al. 27 used VGG16 to classify six different types of cotton trash at an overall accuracy of 84.14%. Similarly, Gharakhani et al. 28 used VGG16 to classify seven varieties of cotton plants using under-canopy images at an accuracy of 86%. The original VGG16 network architecture uses SoftMax 29 as the classifier. The SoftMax classifier in the VGG16 network was chosen for its suitability in multi-class classification tasks, its ability to provide a probability distribution across classes, and its compatibility with the cross-entropy loss for effective training. The combination of SoftMax and cross-entropy loss has become a standard choice for training deep neural networks for classification. SVMs are known to be robust to outliers in the training data, which is why an SVM may be more resilient than SoftMax, which tends to be relatively more sensitive to outliers. In addition, results provided by SVMs can be more interpretable because they provide a clear decision boundary in feature space. This is why, in certain cases, replacing SoftMax with an SVM may be advantageous.
Just as replacing different classical ML algorithms in the CNN network can outperform the original network's performance in some cases, sometimes a deep CNN may not be required to achieve higher classification accuracy. In other words, a shallow CNN may outperform a deep CNN when hyperparameter values are fine-tuned and optimized for the dataset used. For example, Kim et al. 33 showed that their shallow CNN outperformed the VGG16 deep CNN in the detection of surface cracks on concrete structures in terms of accuracy as well as computation cost (less for the shallow network). Similarly, Li et al. 34 showed that their custom shallow CNN outperformed the Xception 35 and InceptionV3 36 deep CNNs in classifying diseased images of maize, apple, and grape in terms of precision, recall, and F1-score.
The overall goals of this study, which is a derivative of our previous study, 16 are to explore the application of a shallow CNN with SoftMax and SVM to classify HSI images of CBS-infected "Valencia" orange fruit from four other conditions (greasy spot, melanose, wind scar, and marketable) and to use the VGG16 deep CNN with SoftMax and SVM to classify citrus canker-affected leaves from four other conditions (control, greasy spot, melanose, and scab). The specific objectives are as follows: (i) use PCA to select the top five discriminating bands from the 92 HSI bands used in imaging CBS-infected orange fruits, (ii) train a custom shallow CNN with SoftMax and SVM classifiers using the selected five bands for the classification of orange fruits with CBS and four other conditions, (iii) use PCA to select the top five discriminating bands from the 348 HSI bands used in imaging citrus leaves affected with canker and four other conditions, and (iv) train VGG16 with SoftMax and SVM classifiers using the selected five bands for classification of citrus leaves with canker and four other conditions.

Hyperspectral Imagery System
Two types of HSI systems were used in this study. The HSI system used for the CBS dataset on "Valencia" orange fruits, previously described by Bulanon et al. 10 (Fig. 1) and based on the design recommendations of Kim et al., 37 included an EMCCD camera (Luca, Andor Technology), an ImSpector V10E imaging spectrograph, and a C-mount lens. Illumination was achieved using halogen line lamps with a DC voltage-regulated power supply, housed within a dark box to prevent external light interference. Reflectance measurements utilized a similar lighting setup. The system's software, based on Microsoft Visual Basic, facilitated parameterization and data transfer, following the approach of Kim et al. 23 Spectral calibration employed an Hg-Ne lamp, focusing on the effective wavelength range of 450 to 930 nm due to the system's limitations in the visible and NIR regions.
The second HSI system (Fig. 2) was developed recently as an improved and portable version of the previous one; it records a total of 348 spectral bands in the wavelength range of 395 to 1005 nm and was used to image leaf samples with canker and four other conditions.
The leaf samples in the new HSI system were illuminated by two separate LED line lights (Metaphase Technologies, Bristol, PA), which emit visible and near-infrared (VNIR) broadband light for reflectance imaging and ultraviolet-A (UV-A) excitation light for fluorescence imaging. The VNIR light employs LEDs at seven wavelengths, namely 428, 650, 810, 850, 890, 910, and 940 nm, whereas the UV-A light uses a single wavelength at 365 nm. Both units utilize an identical rod focal lens to generate a narrow line beam that is roughly 280 mm long and 15 mm wide on the sample holder. The intensities of the LEDs at the eight wavelengths can be adjusted through two digital dimming controllers with three channels each. Specifically, four channels are used to regulate the intensities at 365, 428, and 650 nm and a bundle of five NIR wavelengths (810, 850, 890, 910, and 940 nm). The lights are angled at approximately 6° from the vertical position to overlap their line illuminations on the sample surface. Reflectance and fluorescence signals in the VNIR range are collected using a miniature line-scan hyperspectral camera (Nano-Hyperspec VNIR, Headwall Photonics, Bolton, MA), which integrates an imaging spectrograph and a CMOS focal plane array detector (12-bit and 1936 × 1216 pixels). To capture a wide-angle view, a lens with a 5-mm focal length (Edmund Optics, Barrington, NJ) is attached to the camera. Finally, a long-pass gelatin filter (>400 nm, Kodak, Rochester, NY) is attached to the lens to remove second-order effects from the UV-A excitation. To facilitate the study of plant samples (leaves and fruits), a custom sample holder was created for the new HSI system using a 3D printer (F370, Stratasys, Eden Prairie, MN). The holder is made of black thermoplastic and measures 254 × 197 × 15 mm³. It is partitioned into four identical sections (2 × 2), each of which can hold samples such as citrus leaves and peels. A reflectance standard panel, measuring 254 × 32 × 15 mm³ and supplied by Labsphere (North Sutton, NH), is mounted alongside the sample holder to enable flat-field correction for reflectance images. For line-scan image acquisition, a linear motorized stage (FUYU Technology, Chengdu, Sichuan, China) is used to move the sample holder and the reflectance panel beneath the hyperspectral camera. The camera has a spatial resolution of 0.33 mm/pixel when set at a lens-to-sample distance of 285 mm. Each camera frame is scanned, and an 810 × 348 (spatial × spectral) pixel region of interest (ROI) is extracted, covering a spectral range of 395 to 1005 nm and a 270 mm instantaneous field of view. To prevent the influence of ambient light on the images, an enclosure measuring 56 × 36 × 56 cm³ was built from black aluminum composite boards on an aluminum frame. The enclosure houses the LED lights, camera, sample holder, and reflectance panel, as well as the moving stage. The power supplies and controllers for the lights and stage are located outside the enclosure. A powered four-port Universal Serial Bus (USB) hub is used to connect the major hardware components (i.e., two lights, a camera, and a stage) to a laptop computer. The compact HSI system, built on a 45 × 60 cm² optical breadboard, is easily transportable, making it ideal for on-site and field experiments.
The system software for the new HSI system was developed using LabVIEW (v2022, National Instruments, Austin, TX) and runs on a Windows 11 (Microsoft Corporation, Redmond, WA) computer. A graphical user interface for the software was developed (Fig. 3) using LabVIEW's vision development module to enable image and spectrum display. To implement parameterization and data transfer functions, software development kits from the hardware manufacturers were used; these include user datagram protocol for LED light control, USB for camera control, and serial communication for stage movement control. The hyperspectral camera continuously collects line-scan reflectance signals from the standard panel and the samples passing below as the sample holder is translated by the motorized stage. When the entire sample holder has passed the camera's scanning line, the reflectance image acquisition is complete. The VNIR line light is then turned off, and the UV-A line light is turned on for 10 s to stabilize the LED output. The camera then begins the continuous acquisition of line-scan fluorescence signals as the stage moves back toward the starting position. When the stage reaches the origin, the UV-A light is turned off, completing a full imaging cycle that creates a pair of hyperspectral reflectance and fluorescence images from the same samples. The spatial resolution along the translation direction depends on the moving speed and the total number of scans for a predetermined scan distance. A moving speed of 3.3 mm/s can scan 250 lines over a one-way travel distance of 250 mm in ~76 s, resulting in a spatial resolution of ~1 mm/pixel. To synchronize continuous line-scan image acquisition with translation stage movement, the stage moving speed is determined from the exposure time of the camera, with a low speed for a long exposure time and a high speed for a short exposure time. An empirical reciprocal relationship was found between the moving speed (V in mm/s) and the exposure time (T in s) based on test results (i.e., V = 0.99/T). For example, for exposure times of 0.3 and 0.6 s, the corresponding moving speeds were determined to be 3.3 and 1.65 mm/s, respectively. In addition to the continuous moving mode, the HSI system can also carry out incremental step-by-step line scanning (i.e., stop-and-go mode). The software displays a pair of reflectance and fluorescence images along with an original spectrum and a spatial profile and updates them line by line to show the real-time scan progress during image acquisition. After each measurement, the reflectance and fluorescence images acquired from the same samples are saved into two separate data files using the standard band-interleaved-by-line format. However, in this study, only the reflectance files, i.e., the files with HSI cubes, were used.
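The scan-timing arithmetic in this section can be checked with a few lines of Python. This is a minimal sketch that only encodes the reported empirical relation V = 0.99/T and the 250-line, 250-mm example; the function names are hypothetical and are not part of the LabVIEW software.

```python
def stage_speed_mm_per_s(exposure_s: float, k: float = 0.99) -> float:
    """Stage speed from the empirical reciprocal relation V = k / T."""
    if exposure_s <= 0:
        raise ValueError("exposure time must be positive")
    return k / exposure_s


def scan_summary(exposure_s: float, travel_mm: float = 250.0, n_lines: int = 250):
    """Return (speed in mm/s, one-way scan time in s, along-track mm per line)."""
    speed = stage_speed_mm_per_s(exposure_s)
    duration_s = travel_mm / speed      # time to cover the one-way travel distance
    mm_per_line = travel_mm / n_lines   # along-track spatial resolution
    return speed, duration_s, mm_per_line


speed, duration, resolution = scan_summary(0.3)
print(f"{speed:.2f} mm/s, {duration:.1f} s, {resolution:.2f} mm/pixel")
# prints "3.30 mm/s, 75.8 s, 1.00 mm/pixel"
```

With T = 0.6 s the same relation gives 1.65 mm/s, matching the second example in the text.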

Citrus Fruit Dataset
The citrus fruit dataset consisted of "Valencia" oranges that were hand-picked from a CBS-infested citrus grove near Immokalee in southwestern Florida in April 2010. 38 The orange fruit samples consisted of 90 marketable, 135 CBS, 90 greasy spot, 105 melanose, and 105 wind scar fruits, for a total of 525 fruit samples. These were split into training and validation datasets in the ratio 4:1. The oranges were washed and stored in an environmental control chamber maintained at 4°C. The fruit samples were removed from the control chamber two hours prior to image acquisition by the HSI system shown in Fig. 1. Pseudo RGB color images for each of the five orange peel conditions are shown in Fig. 4.

Citrus Leaf Dataset
Citrus leaf samples were collected from a citrus grove located at the University of Florida's Citrus Research and Education Center in Lake Alfred, Florida, in November 2022. The leaf dataset included "Valencia" leaves (16 with canker, 20 healthy/control, 16 with greasy spot, and 16 with melanose) and 12 "Furr" mandarin leaves with scab symptoms, for a total of 80 leaf samples. This dataset was too small to train any ML or DL model, which is why image augmentation techniques were applied to increase the dataset size for each of the conditions, resulting in 100 HSI images per class and 500 HSI images in total. Even though the original number of samples in each class was too small to train ML and DL algorithms for classification, the data were used as an exploratory test of the new HSI system shown in Fig. 2. HSI cubes with pseudo RGB images generated by the new system for the five different leaf conditions are shown in Fig. 5. Unlike those in Fig. 5, the HSI cubes in Fig. 6 were generated after the raw HSI cubes were calibrated using the reflectance panel and spectrally binned from the original 348 bands to 116 bands. Reflectance calibration corrects for factors such as sensor response and lighting conditions, ensuring accurate measurements. These cubes were then cropped to individual leaf-sized cubes for further processing.
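As an illustration of the reflectance calibration step, the sketch below applies a common flat-field formula, R = (raw − dark) / (white − dark), using NumPy. The exact formula, the optional dark reference, and the clipping to [0, 1] are assumptions made for illustration; the text states only that the reflectance panel was used for calibration. The function name `to_reflectance` is hypothetical.

```python
import numpy as np


def to_reflectance(raw, white, dark=None, eps=1e-6):
    """Flat-field correction of a line-scan cube.

    raw   : (lines, pixels, bands) raw intensities
    white : (pixels, bands) mean response over the reflectance panel
    dark  : (pixels, bands) dark-current reference, assumed zero if omitted
    """
    raw = np.asarray(raw, dtype=np.float64)
    white = np.asarray(white, dtype=np.float64)
    dark = np.zeros_like(white) if dark is None else np.asarray(dark, dtype=np.float64)
    denom = np.maximum(white - dark, eps)  # guard against division by zero
    # Broadcasting applies the per-pixel, per-band reference to every scan line.
    return np.clip((raw - dark) / denom, 0.0, 1.0)


raw_cube = np.full((4, 3, 2), 500.0)   # toy cube: 4 lines, 3 pixels, 2 bands
panel = np.full((3, 2), 1000.0)        # toy white reference
print(to_reflectance(raw_cube, panel)[0, 0])  # prints [0.5 0.5]
```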

Principal Component Analysis for Optimal Band Selection
PCA is a widely used method for dimension reduction to avoid the effects of the curse of dimensionality, especially when dealing with high-dimensional spaces such as those produced by an HSI system. Even though PCA is a feature extraction approach for dimension reduction, it can be used for feature selection using loading factors, i.e., by determining which original features contribute most to the variance. 39,40 PCA was used for feature selection rather than feature extraction in this study because, in the feature extraction approach, the original features are transformed along latent dimensions, i.e., along the principal components (PCs), which may not be interpretable, or at least may be less interpretable, because of the Karhunen-Loève (KL) transform that it uses. 39,41 Consider a random variable $B \in \mathbb{R}^{p}$ representing the spectral feature bands of the HSI system, with $p = 92$ for the old system and $p = 348$ for the new system, mean $\mu_B$, and covariance matrix $\Sigma_B$. After applying the KL transformation $T$, the features extracted by PCA are given as
$$B' = T(B). \tag{1}$$
Assuming $n$ PCs, given as $Z_1, \ldots, Z_n$, Eq. (1) can be rewritten as Eqs. (2) and (3), where $W = [u_1 \ldots u_n]$ is the loading matrix. Greater values in $u_i$ imply a higher importance of the corresponding feature. It is worth noting that each of the $n$ PCs is orthogonal to the others and lies in the direction of the largest remaining variance.
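The loading-based selection described above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: the band with the largest absolute loading is picked for each of the top PCs, a band already picked is skipped in favor of the next-ranked one, and the input is assumed to be a matrix of spectra with one row per pixel or sample. The function name is hypothetical.

```python
import numpy as np


def select_bands_by_loading(spectra, n_components=5):
    """Pick one band per principal component using the loading matrix.

    spectra : (n_samples, n_bands) array of reflectance spectra
    returns : (list of band indices, explained-variance ratio of each PC)
    """
    X = spectra - spectra.mean(axis=0)          # center each band
    cov = np.cov(X, rowvar=False)               # (n_bands, n_bands) covariance
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]           # sort PCs by descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    W = eigvecs[:, :n_components]               # loading matrix, columns u_1..u_n

    bands = []
    for i in range(n_components):
        ranked = np.argsort(-np.abs(W[:, i]))   # bands ranked by |loading| on PC i
        pick = next(int(b) for b in ranked if int(b) not in bands)
        bands.append(pick)
    ratio = eigvals[:n_components] / eigvals.sum()
    return bands, ratio


# Toy data: 10 bands of weak noise, with bands 3 and 7 carrying most variance.
rng = np.random.default_rng(0)
toy = rng.normal(scale=0.1, size=(500, 10))
toy[:, 3] *= 100.0
toy[:, 7] *= 30.0
bands, ratio = select_bands_by_loading(toy)
print(bands[:2])  # prints [3, 7]
```

The cumulative sum of `ratio` gives the quantity read from a scree plot, i.e., the share of variance explained by the leading PCs.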

CNN with SoftMax and SVM Classifiers
Two types of CNN architectures were used in this study. For the "Valencia" orange dataset consisting of CBS-infected fruits and four other conditions, a custom shallow network was developed; it consists of four convolution layers, three batch normalization (BN) layers, three fully connected (FC) layers, and a SoftMax classifier (Fig. 7). The convolution layers perform a specialized type of linear operation called convolution, which extracts features by applying a kernel (i.e., a small array of numbers) across the input tensor (i.e., an array of numbers). 42 The features are propagated in the forward direction using the training dataset, and the learnable parameters, such as kernels and weights, are then updated based on the calculated loss values through backpropagation and a gradient descent optimization algorithm. The BN layers were used to normalize their input data to zero mean and constant standard deviation, which has been shown to improve the accuracy and training speed of many CNNs. 43 The first BN layer was used after the first two convolution layers, and the second BN layer was used after the next two convolution layers. The output feature map of the fourth layer, i.e., the last convolution layer, was flattened into a one-dimensional array and connected to the first FC layer (i.e., the first dense layer). The FC layers connect every input to every output through learnable weights. 42 The third BN layer followed, and then the remaining two FC layers. The last FC layer consists of five output nodes, corresponding to the output probabilities for each of the five classes of orange fruit peel conditions. The input layer was designed to accept input images of shape n × 211 × 210 × 5, where n represents the number of input images in the dataset. The SoftMax classifier (in the first approach) applied to the last FC layer is used for multiclass classification; it uses the SoftMax activation function to normalize the real-valued outputs from the last FC layer to the range between 0 and 1, corresponding to class probabilities. 42 Assuming $x_i$ is the $i$th element of the input feature vector, the SoftMax function is defined as 44
$$f(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{K} e^{x_j}}, \tag{4}$$
where $K$ is equal to the number of classes and $j \in [1, K]$. Then the SoftMax classifier is defined as
$$F(x_i) = \max_{i \in [1, K]} f(x_i), \tag{5}$$
where $f(x_i)$ is the probability of the input belonging to class $i$ and $F(x_i)$ is the largest of the probabilities calculated over all $K$ classes. In the second approach, the SoftMax classifier was replaced by an SVM, which tries to find an optimal separating hyperplane with a maximum margin between classes by focusing on the training data located at the edges of the class distributions. 45 The SVM classifier was originally designed for binary classification tasks; however, it can be used for multi-class problems. The basic linear SVM classifier is defined as 46
$$f(\mathbf{x}) = \operatorname{sign}(\mathbf{w}^{\mathsf{T}} \mathbf{x} + b), \tag{6}$$
where $\mathbf{w}$ and $b$ are the weights and bias, respectively, $\mathbf{x} \in \mathbb{R}^{d}$ is the input feature vector, +1 represents a positive class, and −1 represents a negative class. A non-linear SVM uses the RBF kernel function given as 46
$$k(\mathbf{x}, \mathbf{x}') = \exp\left(-\frac{\lVert \mathbf{x} - \mathbf{x}' \rVert^{2}}{2\sigma^{2}}\right), \tag{7}$$
where $\sigma$ is the kernel width parameter, which plays a significant role in the performance of the non-linear SVM classifier. 45,46
Fig. 7 Network architecture of the custom CNN that was used for the "Valencia" orange fruit dataset consisting of CBS infection and four other (greasy spot, melanose, wind scar, and marketable) peel conditions.
In the case of the citrus leaf dataset for the classification of canker and four other conditions, a custom VGG16 25 with SoftMax and SVM was used (Fig. 8). The VGG16 architecture consists of 13 convolution layers and three FC layers. The first two blocks consist of two convolution layers each, and the third, fourth, and fifth blocks consist of three convolution layers each. Each block is followed by a max-pooling layer. The last FC layer uses SoftMax as the classifier; it was customized to output five nodes corresponding to the probabilities of the five classes of citrus leaf conditions. The input layer was customized to accept input images of shape n × 123 × 99 × 5, where n represents the number of input images in the dataset (Fig. 8).
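For readers who prefer code to notation, the two functions at the heart of this section, the SoftMax normalization and the RBF kernel, can be written in NumPy using their standard textbook forms. This is a minimal sketch; the function names are hypothetical, and subtracting the maximum logit before exponentiation is a numerical-stability detail added here rather than something stated in the text.

```python
import numpy as np


def softmax(logits):
    """Normalize real-valued outputs into class probabilities that sum to 1."""
    z = np.asarray(logits, dtype=np.float64)
    z = z - z.max(axis=-1, keepdims=True)   # stability shift; result is unchanged
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def softmax_predict(logits):
    """Return (index of the most probable class, its probability)."""
    p = softmax(logits)
    return p.argmax(axis=-1), p.max(axis=-1)


def rbf_kernel(a, b, sigma=1.0):
    """Pairwise RBF kernel exp(-||a - b||^2 / (2 sigma^2)) between row vectors."""
    a = np.atleast_2d(np.asarray(a, dtype=np.float64))
    b = np.atleast_2d(np.asarray(b, dtype=np.float64))
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


label, prob = softmax_predict([1.0, 2.0, 3.0, 4.0, 5.0])  # five output nodes
print(int(label))  # prints 4
```

A smaller `sigma` makes the kernel more local, so each training sample influences only its near neighbors in feature space.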

CNN Model Training and Validation
All CNNs with SoftMax and SVM classifiers were trained on an NVIDIA Tesla P100-PCIE GPU (Santa Clara, CA) running the Compute Unified Device Architecture (CUDA) version 11.2 and driver version 460.32.03 using the Google Colab Pro+ (Google LLC, Mountain View, CA) platform. The networks with SoftMax were trained from scratch using an adaptive learning rate method called Adadelta. 47 The custom shallow CNN with SoftMax was trained for 120 iterations, whereas the custom VGG16 with SoftMax was trained for 500 iterations. In the shallow network, the learning rate was initially set to 0.05, and it was reduced by a factor of two if no improvement in validation loss was observed for 10 consecutive iterations. Early stopping also monitored the validation loss, but with patience set to 20 epochs. The learning rate for the custom VGG16, i.e., for the leaf dataset, was set to 0.01. When SVM was used as the classifier for the shallow network, the RBF kernel was used, implying a non-linear SVM, with a regularization parameter C of 1. For the custom VGG16 with SVM, feature vectors were extracted from the last max-pooling layer, i.e., "block5_pool," and then both linear and RBF SVMs were trained using C values from 1 to 10,000,000.
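The interaction between the learning-rate reduction (patience 10, factor of two) and early stopping (patience 20) can be illustrated with a small framework-free simulation. This is a simplified sketch of the logic described above, not the training code used in the study; the function name and the toy loss sequence are hypothetical.

```python
def run_schedule(val_losses, lr=0.05, factor=0.5, lr_patience=10, stop_patience=20):
    """Simulate plateau-based learning-rate halving with early stopping.

    Returns a list of (epoch, learning rate in effect after that epoch).
    """
    best = float("inf")
    since_best = 0     # epochs since the best validation loss (early stopping)
    since_lr_cut = 0   # epochs since the best loss or the last rate reduction
    history = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, since_best, since_lr_cut = loss, 0, 0
        else:
            since_best += 1
            since_lr_cut += 1
            if since_lr_cut >= lr_patience:
                lr *= factor           # halve the learning rate on a plateau
                since_lr_cut = 0
        history.append((epoch, lr))
        if since_best >= stop_patience:
            break                      # stop: no improvement for stop_patience epochs
    return history


# Toy validation losses: improvement for two epochs, then a long plateau.
losses = [1.0, 0.9] + [0.95] * 30
history = run_schedule(losses)
print(history[-1])  # prints (22, 0.0125)
```

In this toy run the rate is halved at epochs 12 and 22, and training stops at epoch 22 because the loss has not improved for 20 epochs.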

Performance Metrics
The performance of each of the classifiers, i.e., the shallow CNN with SoftMax and SVM as well as the custom VGG16 with SoftMax and SVM, was measured in terms of precision, recall, F1-score, and accuracy, which are defined as
$$\text{Precision} = \frac{TP}{TP + FP}, \tag{8}$$
$$\text{Recall} = \frac{TP}{TP + FN}, \tag{9}$$
$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}, \tag{10}$$
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \tag{11}$$
where TP, TN, FP, and FN represent true positives, true negatives, false positives, and false negatives, respectively. In addition to accuracy, the area under the ROC curve (i.e., AUC) was also used to measure the performance of the custom shallow CNN with SoftMax and the custom VGG16 with SoftMax. AUC was used because it considers the entire range of threshold values between 0 and 1 and is not affected by class distribution and misclassification cost. 11,48,49 The AUC can be treated as a measure of separability, and the class whose ROC curve comes closest to the top-left corner is the most separable one. Apart from these metrics, k-fold cross-validation error estimation was used for the leaf dataset when the custom VGG16 was used with SVM, because only a very limited training dataset was available. One of the benefits of cross-validation is that it helps to reduce the risk of overfitting, which is a common problem when working with limited training data. 39,50
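The per-class precision, recall, and F1-score, together with the overall accuracy, can be computed directly from a confusion matrix. The sketch below uses the standard definitions, with rows taken as true classes and columns as predicted classes (an assumed convention); the function name and the toy matrix are hypothetical, not results from this study.

```python
import numpy as np


def per_class_metrics(confusion):
    """Precision, recall, and F1 per class plus overall accuracy.

    confusion : (K, K) matrix, rows = true class, columns = predicted class
    """
    cm = np.asarray(confusion, dtype=np.float64)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp   # predicted as the class but actually another class
    fn = cm.sum(axis=1) - tp   # belongs to the class but predicted as another
    with np.errstate(divide="ignore", invalid="ignore"):
        precision = np.where(tp + fp > 0, tp / (tp + fp), 0.0)
        recall = np.where(tp + fn > 0, tp / (tp + fn), 0.0)
        f1 = np.where(precision + recall > 0,
                      2 * precision * recall / (precision + recall), 0.0)
    accuracy = tp.sum() / cm.sum()   # correct predictions over all predictions
    return precision, recall, f1, accuracy


toy_cm = [[8, 2],
          [1, 9]]
precision, recall, f1, accuracy = per_class_metrics(toy_cm)
print(round(float(accuracy), 2))  # prints 0.85
```

For more than two classes, the overall accuracy is the trace of the matrix divided by its total, which reduces to (TP + TN) / (TP + TN + FP + FN) in the binary case.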

Workflow Pipeline for Citrus Fruit and Leaf Datasets
The entire workflow pipeline for the "Valencia" orange fruit dataset consisting of CBS and four other peel conditions is shown in Fig. 9.
As seen in Fig. 9, the first step involves HSI cube dataset collection using the old HSI system with 92 spectral bands and then using PCA with loading factors to select the top five discriminating bands, which are then used with the custom shallow CNN for spatial feature extraction and then to train the SoftMax and RBF SVM classifiers. The workflow pipeline for citrus leaf disease with canker and four other conditions is shown in Fig. 10.
As seen in Fig. 10, the original HSI cubes of shape 200 × 800 × 348 consist of four leaf samples, which were first calibrated and then converted into reflectance cubes. Using the spectral binning technique, the original 348 bands were reduced to 116 bands. This resulted in HSI cubes with four leaf samples of shape 250 × 200 × 116. After that, each of the cubes was clipped to the shape of 123 × 99 × 116 to obtain HSI cubes of individual leaf samples for each of the five conditions. Then the first six bands (399.3 to 425.8 nm) and the last eight bands (972 to 1009.1 nm) were removed because of unusual spectral profiles caused by sensor and light artifacts, resulting in HSI cubes with 102 spectral bands, as can be seen in Fig. 12. These cubes were then used for PCA with loading factors to determine the five most important bands of the 102, which were then used with VGG16 for spatial feature extraction. Then SoftMax and linear and RBF SVMs were used for classification.
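The band bookkeeping in this pipeline (348 bands binned to 116, then trimmed to 102) can be sketched with NumPy. The sketch assumes that binning averages groups of three adjacent bands, which is consistent with 348 / 3 = 116 but is not stated explicitly in the text; the function name is hypothetical.

```python
import numpy as np


def bin_and_trim(cube, bin_size=3, drop_first=6, drop_last=8):
    """Average adjacent spectral bands, then drop noisy edge bands.

    cube : (height, width, bands) reflectance cube
    """
    height, width, bands = cube.shape
    if bands % bin_size:
        raise ValueError("number of bands must be divisible by bin_size")
    # Group every `bin_size` adjacent bands and average them (assumed binning rule).
    binned = cube.reshape(height, width, bands // bin_size, bin_size).mean(axis=-1)
    # Remove the first and last binned bands affected by sensor and light artifacts.
    return binned[:, :, drop_first:binned.shape[2] - drop_last]


leaf_cube = np.zeros((123, 99, 348))   # one leaf-sized cube with the original bands
print(bin_and_trim(leaf_cube).shape)   # prints (123, 99, 102)
```

In the described workflow the binning precedes the clipping to individual leaves; the order does not affect the spectral axis, so the sketch applies both spectral steps to a single leaf-sized cube for brevity.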

Spectral Profiles of Fruit and Leaf Datasets
An example of the spectral profiles of each orange fruit condition is shown in Fig. 11. These spectral profiles were obtained from previous work by Kim et al., 38 as the same dataset is used. It can be clearly seen that the reflectance values for the marketable fruit condition are the highest across the spectrum, whereas they are lowest for the CBS symptomatic fruit throughout the spectral range. Between the 650 and 700 nm wavelengths, all five classes look distinct, whereas at the edges of the spectrum, CBS, greasy spot, and melanose look similar.
The spectral profile of the leaf dataset is shown in Fig. 12 for four leaf samples with canker. At the lower and upper wavelength edges of the spectrum, an unusual effect takes place, which is due to LED illumination attenuation and sensor degradation below 430 nm and above 970 nm, as previously described. For this reason, these edges were eliminated from the analysis. In between these boundaries, all canker leaf samples look similar, as would be expected for similar disease conditions. It is noted that leaf 3 seems to have an overall higher reflectance than leaves 1, 2, and 4. As observed in the image cube, some leaves appear to have a brighter reflectance, which could be due to leaf age or surface residues.
Fig. 10 Workflow pipeline for classifying citrus leaves with canker disease and four other conditions (control, greasy spot, melanose, and scab).
Fig. 11 Spectral profiles of each of the five different fruit conditions. Adapted from Kim et al. 38

Band Selection Based on PCA
A scree plot was used to determine the percentage of variance explained by each principal component (PC). Based on the scree plot, the first five PCs explained a total of 91.79% of the variance, and the bands that contributed most to each of the five PCs were 12 (509.22 nm), 28 (592.92 nm), 36 (634.76 nm), 39 (650.46 nm), and 71 (817.86 nm). Two of the selected bands lie between 600 and 750 nm, which agrees with the spectral profiles seen in Fig. 11, where there is a local minimum for all fruit samples in this range that makes them distinguishable. These five selected bands were used to train and validate the custom CNN for classifying fruit samples with all five peel conditions.
In the case of the leaf dataset, based on the scree plot (Fig. 13), the top PC alone explained 73.28% of the variance, whereas the top five PCs collectively explained 99.89% of the variance. The bands that contributed most to these PCs were 7 (431.1 nm), 8 (436.4 nm), 30 (553.1 nm), 54 (680.4 nm), and 95 (897.8 nm). Based on these five selected bands, the custom VGG16 with SoftMax, linear SVM, and RBF SVM classifiers was trained and validated on the citrus leaf dataset.
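The loading-based band selection described in this section can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes that each PC contributes the band with the largest absolute loading and that a band already chosen by an earlier PC is skipped. The function name `select_bands_by_pca` and the synthetic data are hypothetical.

```python
import numpy as np


def select_bands_by_pca(pixels: np.ndarray, n_components: int = 5):
    """Pick one band per principal component from the PCA loadings.

    pixels: array of shape (n_samples, n_bands), one spectrum per row.
    Returns the selected band indices (0-based) and the explained
    variance ratio of the first `n_components` components.
    """
    centered = pixels - pixels.mean(axis=0)
    # Rows of `vt` are the principal axes, i.e., the loadings per band.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)

    selected = []
    for loading in vt[:n_components]:
        # Walk the bands from the largest absolute loading downward and
        # keep the first one that has not been selected already.
        for band in np.argsort(-np.abs(loading)):
            if int(band) not in selected:
                selected.append(int(band))
                break
    return selected, explained[:n_components]


# Synthetic spectra: low-level noise plus three bands with strong variance.
rng = np.random.default_rng(0)
pixels = 0.1 * rng.normal(size=(500, 102))
for band, scale in [(10, 10.0), (40, 5.0), (70, 3.0)]:
    pixels[:, band] += scale * rng.normal(size=500)

selected, explained = select_bands_by_pca(pixels, n_components=5)
print("selected bands:", selected)
print("cumulative explained variance:", explained.cumsum())
```

The cumulative sum of the explained variance ratios is the quantity read off a scree plot when deciding how many PCs to keep.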

Citrus Fruit Datasets
The custom shallow CNN with the SoftMax classifier took an average training time of 34 min and an average inference time of 20.2 ms on an NVIDIA Tesla P100-PCIE GPU using the Google Colab Pro+ AI platform. The training and validation accuracy graphs are shown in Fig. 14.
Figure 14 shows that the custom CNN reached convergence between 100 and 120 iterations. To avoid overfitting, training was stopped before the 120th iteration, and the trained model was saved in ".h5" file format for further applications. The training and validation process was repeated 10 times to evaluate the mean performance of the trained model. The summary of the confusion matrix showing the values of precision, recall, and F1-score for each of the five fruit peel conditions is shown in Table 1.
The overall mean accuracy for all five peel conditions was 89.8%. The confusion matrix heat map, which shows the number of correct classifications and misclassifications for each of the fruit peel conditions, is shown in Fig. 15.
As seen in Fig. 15(a), the confusion matrix is based on the number of test samples available for the trained model. The values of precision, recall, F1-score, and accuracy are determined from this matrix, and when samples are unevenly distributed among the classes, these metrics may become biased toward the majority class. Therefore, in such cases, ROC-AUC [Fig. 15(b)] may represent the performance of the classifier in a more trustworthy way.48 Based on the ROC-AUC, wind scar was the most separable class, followed by melanose, greasy spot, CBS, and marketable. However, based on the available test dataset, CBS was the most accurately classified class according to its F1-score (Table 1).
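The two kinds of metrics discussed above can be computed side by side, as in the following sketch. It uses scikit-learn and synthetic labels and scores; the class order, the score generation, and the one-vs-rest treatment of the ROC-AUC are assumptions made for illustration and do not reproduce the reported values.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support, roc_auc_score
from sklearn.preprocessing import label_binarize

class_names = ["CBS", "greasy spot", "melanose", "wind scar", "marketable"]
class_ids = list(range(len(class_names)))

# Synthetic ground truth and SoftMax-like scores (stand-ins for model output).
rng = np.random.default_rng(0)
y_true = rng.integers(0, len(class_names), size=200)
logits = rng.normal(size=(200, len(class_names))) + 2.0 * np.eye(len(class_names))[y_true]
scores = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
y_pred = scores.argmax(axis=1)

# Threshold-dependent metrics derived from the confusion matrix.
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=class_ids, zero_division=0
)

# Threshold-independent one-vs-rest ROC-AUC, computed per class.
y_onehot = label_binarize(y_true, classes=class_ids)
auc_per_class = {
    name: roc_auc_score(y_onehot[:, k], scores[:, k])
    for k, name in enumerate(class_names)
}

for k, name in enumerate(class_names):
    print(f"{name:12s} P={precision[k]:.2f} R={recall[k]:.2f} "
          f"F1={f1[k]:.2f} AUC={auc_per_class[name]:.2f} n={support[k]}")
```

Because the ROC-AUC ranks scores rather than counting hard decisions, it is less sensitive to class imbalance than the confusion-matrix metrics, which is why the two rankings of classes can differ.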
In the second approach, the SoftMax classifier was replaced by RBF SVM, and the process was repeated 10 times to determine the mean accuracy, precision, recall, and F1-score values, as shown in Table 2. The mean overall accuracy was 92.1%, which is 2.3 percentage points (2.56% in relative terms) higher than the accuracy obtained with the SoftMax classifier. This result is similar to that of Dey et al.,51 who showed that using the SVM classifier with the VGG19 CNN improved pneumonia detection in chest X-rays. A similar improvement was found in the early detection of glaucoma by Raja et al.52 when using a CNN with the SVM classifier. Table 2 shows that wind scar and CBS were the most precisely classified fruit samples, whereas melanose had the lowest precision. The confusion matrix heat map, which shows the number of correct classifications and misclassifications for each of the fruit peel conditions using RBF SVM, is shown in Fig. 16.
From Fig. 16, it can be seen that all of the misclassified CBS samples were assigned to greasy spot, whereas the majority of the misclassified marketable fruits were assigned to wind scar. Although marketable and wind scar fruits look spectrally different in the example spectral profiles (Fig. 11), they look similar in the majority of the collected fruit samples, which explains the misclassifications seen in the confusion matrix (Fig. 16).
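The second approach, in which an RBF SVM takes the place of the SoftMax layer, can be sketched with scikit-learn. The feature matrix below is synthetic and stands in for the activations that would be extracted from the trained CNN; the feature dimension, the standardization step, and the SVM hyperparameters are assumptions and are not taken from the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for CNN-generated features: one cluster per class.
rng = np.random.default_rng(0)
n_per_class, n_features, n_classes = 60, 64, 5
centers = rng.normal(scale=3.0, size=(n_classes, n_features))
features = np.vstack(
    [center + rng.normal(size=(n_per_class, n_features)) for center in centers]
)
labels = np.repeat(np.arange(n_classes), n_per_class)

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, stratify=labels, random_state=0
)

# RBF SVM used as the classification head on top of the fixed features.
svm_head = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
svm_head.fit(X_train, y_train)
accuracy = svm_head.score(X_test, y_test)
print(f"test accuracy on synthetic features: {accuracy:.3f}")
```

In practice, the features would come from a forward pass of the trained network up to the layer before SoftMax, so the convolutional weights stay fixed while only the SVM is fitted.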

Citrus Leaf Dataset
The custom VGG16 CNN with SoftMax, linear SVM, and RBF SVM classifiers was also trained on an NVIDIA Tesla P100-PCIE GPU using the Google Colab Pro+ AI platform. Because this dataset was limited, augmentation was applied during training using the "ImageDataGenerator" class of the Keras deep learning application programming interface. The parameters used for this class were as follows: rotation_range = 20, zoom_range = 0.15, width_shift_range = 0.2, height_shift_range = 0.2, and shear_range = 0.15. The graphs for training and validation accuracy along with their corresponding loss are shown in Fig. 17.
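To make the augmentation ranges above concrete, the following sketch draws one random set of geometric transform parameters from them. It assumes that each range is sampled uniformly and symmetrically and that the shift ranges are fractions of the image size, which is how the Keras documentation describes float-valued ranges; the helper `sample_transform` is hypothetical and is not part of Keras, and the 123 × 99 image size is used only as an example.

```python
import numpy as np

# Values as listed in the text; the same keywords would be passed to
# Keras' ImageDataGenerator as keyword arguments.
augmentation_params = {
    "rotation_range": 20,       # degrees
    "zoom_range": 0.15,         # zoom factor in [1 - 0.15, 1 + 0.15]
    "width_shift_range": 0.2,   # fraction of image width
    "height_shift_range": 0.2,  # fraction of image height
    "shear_range": 0.15,        # shear intensity
}


def sample_transform(params: dict, height: int, width: int,
                     rng: np.random.Generator) -> dict:
    """Draw one set of transform parameters (hypothetical helper).

    Assumption: every range is sampled uniformly and symmetrically.
    """
    return {
        "rotation_deg": rng.uniform(-params["rotation_range"], params["rotation_range"]),
        "zoom": rng.uniform(1 - params["zoom_range"], 1 + params["zoom_range"]),
        "shift_x_px": width * rng.uniform(-params["width_shift_range"],
                                          params["width_shift_range"]),
        "shift_y_px": height * rng.uniform(-params["height_shift_range"],
                                           params["height_shift_range"]),
        "shear": rng.uniform(-params["shear_range"], params["shear_range"]),
    }


rng = np.random.default_rng(0)
transform = sample_transform(augmentation_params, height=123, width=99, rng=rng)
for name, value in transform.items():
    print(f"{name}: {value:.3f}")
```

Each training batch would see a different draw, so the network is shown slightly rotated, zoomed, shifted, and sheared versions of the same leaf image.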
Based on the validation accuracy graph, the network was found to converge around 500 iterations. The overall accuracy on the validation dataset was 82%; the values for precision, recall, and F1-score are summarized in Table 3. Table 3 shows that control, scab, and canker were the most accurately classified classes, with F1-scores of 97%, 91%, and 85%, respectively, whereas greasy spot was the least accurately classified, with an F1-score of 59%. Greasy spot and melanose had similar precision, whereas the latter had a much better sensitivity (recall) than the former.
Because the original dataset was limited and imbalanced, the confusion matrix and the performance values shown in Table 3 may not be a reliable representation of the trained model. Hence, the ROC-AUC is shown in Fig. 18(b), which indicates that scab leaves were the most separable, followed by healthy/control, melanose, canker, and greasy spot.
In the second approach, similar to the case of the fruit dataset, the SoftMax classifier was replaced by SVM. In the fruit dataset, the RBF SVM performed better, which is why only the result of RBF SVM is shown. Similarly, in the case of the leaf dataset, RBF SVM performed slightly better at the C-regularization parameter value of 1,000,000 (Fig. 19).
From Fig. 19, the best accuracy for RBF SVM was 93%, which was obtained at C = 1,000,000 and remained constant for any values of C beyond that. The classification summary of this is shown in Table 4.
From Table 4 and the confusion matrix in Fig. 20(a), canker and scab were the most accurately classified, with the highest F1-score of 97% each. Canker had one misclassification belonging to the control class, whereas scab had none. Melanose had the lowest F1-score, 89%, which resulted from the five misclassified images. The CV error estimate chart [Fig. 20(b)] shows that k-fold values of 2, 3, and 8 resulted in the lowest estimated error for the RBF SVM classifier. The k-fold cross-validation error estimates based on accuracy metrics were included as an additional aid to analyze overfitting and optimistic classification issues because the original leaf datasets were very small.53,54 Similarly, the accuracies obtained for linear SVM at different C-regularization parameter values are shown in Fig. 21.
For the linear SVM, the overall accuracy reached a maximum value of 92% when the C-regularization parameter value was 10,000, and for any values beyond this, no further improvement was observed. The classification summary for this is shown in Table 5. As seen in Table 5 and the confusion matrix in Fig. 22(a), scab was perfectly classified with no misclassification and an F1-score value of 97%, whereas canker had the second highest F1-score value of 94% with a single misclassification, as seen in the case of RBF SVM. Similar to the case of RBF SVM, the linear SVM had minimum error estimates at k-fold values of 2, 3, and 8.
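The sweep over the C-regularization parameter and the k-fold error estimates described in this section can be sketched as follows. The features are synthetic stand-ins for VGG16 activations, and the grid of C values, the hold-out split, and the use of stratified folds are assumptions; only the kernels and the largest C value follow the text.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for VGG16-generated features of five leaf conditions.
rng = np.random.default_rng(0)
n_per_class, n_features, n_classes = 20, 16, 5
centers = rng.normal(scale=3.0, size=(n_classes, n_features))
X = np.vstack([c + rng.normal(size=(n_per_class, n_features)) for c in centers])
y = np.repeat(np.arange(n_classes), n_per_class)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Validation accuracy for each kernel and each value of C.
c_values = [1, 100, 10_000, 1_000_000]
accuracy_by_c = {}
for kernel in ("linear", "rbf"):
    for c in c_values:
        model = SVC(kernel=kernel, C=c, gamma="scale").fit(X_train, y_train)
        accuracy_by_c[(kernel, c)] = model.score(X_val, y_val)

# k-fold cross-validation error estimate (1 - mean accuracy) for k = 2..10.
cv_error = {}
for k in range(2, 11):
    folds = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
    scores = cross_val_score(
        SVC(kernel="rbf", C=1_000_000, gamma="scale"), X, y, cv=folds
    )
    cv_error[k] = 1.0 - scores.mean()

for (kernel, c), acc in accuracy_by_c.items():
    print(f"{kernel:6s} C={c:<9d} accuracy={acc:.3f}")
for k, err in cv_error.items():
    print(f"k={k:2d} estimated error={err:.3f}")
```

A plateau in accuracy as C grows, as reported for both kernels, indicates that the margin violations have already been driven to zero and that further relaxing the regularization changes nothing.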

Conclusions
This study successfully demonstrated the practicality of employing two HSI systems for the classification of "Valencia" orange fruits with CBS and four other peel conditions, as well as "Furr" mandarin leaves with citrus canker and four other conditions. The results indicate that, even with a shallow CNN with properly tuned parameters, CBS-infected orange fruits were classified with a moderately high accuracy of approximately 90%. Moreover, replacing the SoftMax classifier with RBF SVM raised the overall accuracy by 2.3 percentage points (2.56% in relative terms) to 92.1%.
Furthermore, the newly designed HSI system proved effective in classifying citrus leaves with canker and four other conditions. The VGG16 model with SoftMax achieved a weighted average accuracy of 85% and an overall accuracy of 82% for canker leaves, including the remaining four classes. Notably, replacing the SoftMax classifier with linear and RBF SVMs led to a substantial overall accuracy improvement of 10 and 11 percentage points (12.19% and 13.41% in relative terms), respectively. This could be because the leaf dataset might have more complex and non-linear patterns, i.e., the different conditions on leaves were not linearly separable. Based on these findings, it is recommended to use VGG16 for feature extraction and RBF SVM for classification, as this combination achieves a weighted average accuracy of 97% for canker leaves. Importantly, the results were based on whole fruit and leaf HSI image samples rather than isolated sub-image ROIs, demonstrating that excellent results can be obtained without the need for sub-imaging. Future work will involve collecting a larger dataset of actual fruit and peel conditions to minimize the reliance on excessive sample augmentation. Additionally, the utility of generative adversarial network-based augmentation will be evaluated and compared with geometry-based augmentation. Furthermore, future work aims to explore and compare the effectiveness of extracting features solely from diseased regions of leaf samples, and of training and testing the custom VGG16 model with SoftMax and SVM classifiers using the expanded dataset captured by the new HSI system. These efforts aim to develop a more robust and reliable CV algorithm for autonomous detection of citrus diseases on both fruits and leaves.
Mark A. Ritenour received his BA degree from California State University, Fresno, California, United States, in 1988, his MS degree from the University of California, Davis, California, United States, in 1991, and his PhD from the University of California, Davis, California, United States, in 1995. Currently, he is a professor of postharvest physiology and handling of fresh horticultural crops in the Horticultural Sciences Department at the University of Florida's Indian River Research and Education Center. His main research interests are in improving pre- and postharvest practices to maximize fresh produce quality and quality retention during harvest and postharvest handling.

Fig. 1 HSI system for acquiring reflectance images from "Valencia" orange samples.

Fig. 2 Portable HSI system for citrus disease classification on leaves.

Fig. 3 Graphic User Interface of the software for the newly developed HSI system.

Fig. 4 RGB color images for each of the five "Valencia" orange fruit peel conditions. Several diseases and disorders were present on the fruit: (a) CBS, (b) greasy spot, (c) melanose, (d) wind scar, and (e) marketable peel.

Fig. 8 Network architecture of the custom VGG16 that was used for the citrus leaf dataset consisting of canker disease infection and four other conditions.

Fig. 9 Workflow pipeline for classifying "Valencia" orange fruits with CBS infection and four other (marketable, melanose, greasy spot, and wind scar) peel conditions.

Fig. 12 Example of spectral profiles of four leaf samples belonging to the canker condition.

Fig. 13 Example of the scree plot that was obtained when PCA was implemented on the citrus leaf dataset.

Fig. 14 Training and validation accuracy graphs for custom CNN with the SoftMax classifier when used for classification of CBS infected and four other "Valencia" orange fruit peel conditions.

Fig. 15 (a) Confusion matrix heat map for custom shallow CNN with the SoftMax classifier showing the number of correct and misclassified images for all five conditions of "Valencia" orange fruit peels. (b) ROC curve showing different areas under the ROC curves for each of the five different fruit peel classes.

Fig. 16 Confusion matrix heat map for custom shallow CNN with the RBF SVM classifier showing the number of correct and misclassified images for all five conditions of "Valencia" orange fruit peels.

Fig. 17 Training and validation accuracy and loss graphs for custom VGG16 with the SoftMax classifier when used for classification of canker-infected and four other citrus leaf conditions.

Fig. 18 (a) Confusion matrix for the citrus leaf dataset when used with custom VGG16 and SoftMax classifier. (b) ROC curves showing the area under the ROC curves for five different leaf conditions when used with custom VGG16 and SoftMax classifier.

Fig. 19 Different classification accuracies obtained at different values of C-regularization parameter for RBF SVM for the classification of five conditions of citrus leaves.

Fig. 21 Different classification accuracies obtained at different values of the C-regularization parameter for linear SVM for the classification of five conditions of citrus leaves.

Fig. 20 (a) Confusion matrix for the citrus leaf dataset when used with custom VGG16 and RBF SVM classifier.(b) Graphs showing values of cross-validation error estimates for different values of k-fold used between 0 and 10.

Fig. 22 (a) Confusion matrix for the citrus leaf dataset when used with custom VGG16 and linear SVM classifier.(b) Graphs showing values of cross-validation error estimates for different values of k-fold used between 0 and 10.

Table 1
Summary of the confusion matrix showing values of precision, recall, and F1-score for each "Valencia" orange fruit peel condition, including one with CBS infection, after training the custom CNN with the SoftMax classifier.

Table 2
Summary of the confusion matrix showing values of precision, recall, and F1-score for each "Valencia" orange fruit peel condition, including one with CBS infection, after training the custom CNN with the RBF SVM classifier.

Table 3
Summary of the confusion matrix showing values of precision, recall, and F1-score for each "Furr" mandarin leaf condition, including one with canker infection, after training the custom VGG16 with the SoftMax classifier.

Table 4
Summary of the confusion matrix showing values of precision, recall, and F1-score for each "Furr" mandarin leaf condition, including one with canker infection, after training the custom VGG16 with the RBF SVM classifier for C = 1,000,000.

Table 5
Summary of the confusion matrix showing values of precision, recall, and F1-score for each "Furr" mandarin leaf condition, including one with canker infection, after training the custom VGG16 with the linear SVM classifier for C = 1,000,000.