Spectral discrimination of bulloak (Allocasuarina luehmannii) and associated woodland for habitat mapping of the endangered bulloak jewel butterfly (Hypochrysops piceata) in southern Queensland

Abstract The bulloak jewel butterfly (Hypochrysops piceata) is an endangered species due to a highly restricted distribution and complex life history, yet little is known of the availability of suitable habitat for future conservation. The aim of this study was to examine the potential of hyperspectral reflectance data for the discrimination of woodland species in support of habitat mapping for the bulloak jewel butterfly. Sites from known butterfly sightings in Leyburn, southern Queensland, Australia, were examined using hyperspectral scanning and vegetation species discrimination. Reflectance data of eight woodland vegetation species (Allocasuarina luehmannii, Eucalyptus crebra, Eucalyptus populnea, Callitris glauca, Corymbia maculata, Angophora leicarpa, Acacia sparsiflora, and Jacksonia scoparia) were collected at the leaf and canopy levels using a full-range (350 to 2500 nm) hand-held nonimaging spectroradiometer. Partial least squares (PLS) regression was used to discriminate the bulloak tree spectra from those of the other vegetation species. The PLS results indicated prediction accuracies ranging from 78% to 95% and 52% to 94% for canopy and leaf levels, respectively. The highest spectral separability was observed in the near-infrared bands (approximately 700 to 1355 nm), followed by selected ranges in the short-wave infrared band where separability peaked at 1670 and 2210 nm. The results confirmed the feasibility of hyperspectral sensing for discriminating vegetation species and its potential use for habitat mapping of the endangered bulloak jewel butterfly.

Introduction
The assessment and mapping of high biodiversity areas are vital components of conservation planning. 1,2 The location and extent of wildlife habitat, as well as the spatial distribution of wildlife, are important considerations in developing conservation strategies to mitigate habitat loss and degradation. 3 In assessing and mapping habitat areas, a number of bioclimatic and environmental factors need to be considered at the appropriate level of thematic detail and spatial scale. 4,5 With regard to vegetation, the assessment of its structure and composition presents certain challenges whenever vast tracts of land are involved. 6 Traditional field survey and mapping techniques for vegetation classification cannot always provide the required information in a timely and cost-effective manner. 7,8 Geographic information systems (GIS) and remote sensing technologies offer potential solutions to broad-scale vegetation assessment and mapping of wildlife habitat. [9][10][11] In the context of conservation mapping and management of wildlife, habitat is often a species-specific concept. 5 This means that the management of species, including their respective habitat, demands that the mapping be conducted at the individual species level. 12,13 This can be a challenging task, given that individual species may have very specific habitat requirements which cannot be easily mapped. For instance, the bulloak jewel butterfly (Hypochrysops piceata), 14 an endangered species in southern Queensland, is so closely associated with the bulloak tree (Allocasuarina luehmannii) 15 that habitat mapping at the tree species or plant-association level is highly preferred. Yet, mapping at the tree species level is difficult even with contemporary remotely sensed data captured by multispectral sensors.
Recent development of hyperspectral remote sensing systems, such as the Hyper-X and HyspIRI missions, may provide significantly improved mapping of vegetation at the species or plant association levels. 16,17 Data from hyperspectral remote sensing technology increase the capability to accurately map vegetation characteristics which were formerly not measurable with broadband multispectral sensors. [18][19][20] For instance, vegetation discrimination studies have found that hyperspectral imagery can accurately map and differentiate vegetation species, 18,21 and some studies have achieved high classification accuracies in vegetation species discrimination analysis. 17,18,21 Mapping the habitat of the bulloak jewel butterfly (H. piceata) using hyperspectral remote sensing data is currently unexplored, with no reported work published in the literature. Listed as endangered under Queensland's Nature Conservation Act 1992 and classified as high priority by the Department of Environment and Heritage Protection (EHP), the bulloak jewel butterfly depends uniquely on the bulloak tree (A. luehmannii). Through its mutual association with bulloak trees, an ant species (Anonychomyrma sp.), and attendant scale insects (Rhyzococcus sp.), 12,13,15,22 this butterfly occurs in bulloak or mixed bulloak woodland on sandy and alluvial soils in southern Queensland. 14,22 Remnant bulloak woodland appears to provide important resources not only as habitat for the bulloak jewel butterfly, but also as a food resource for a number of bird species. 22,23 Bulloak woodland is also categorized as an endangered ecological community. 24 Thus, there is an urgent need for habitat mapping of woodlands associated with the bulloak tree.
Furthermore, Lundie-Jenkins and Payne 15 highlighted in the Recovery Plan for the Bulloak Jewel Butterfly (Hypochrysops piceatus) 1999-2003 the crucial need to (a) discover and confirm additional populations of the butterfly and (b) conduct spatial mapping and predictive modeling. These suggestions were initially proposed by Dunn and Kitching 12 based on their previous butterfly surveys.
The goal of this study was to evaluate the possible use of hyperspectral data in discriminating bulloak trees from other woodland vegetation species in a woodland remnant in southern Queensland. The specific objectives were to (a) examine whether bulloak tree species can be effectively differentiated from the selected woodland vegetation species at the leaf and canopy levels from the full-range (350 to 2500 nm) hand-held spectroradiometer data and (b) compare the prediction accuracies and errors of the species discrimination produced from raw and transformed data. This study is part of a broader assessment focused on habitat mapping and suitability modeling of the endangered bulloak jewel butterfly in southern Queensland.

Study Area
The study area (33,400 ha) is located near Leyburn, Queensland, a town in the Darling Downs Region (approximately 28.0167°S, 151.5667°E) as depicted in Fig. 1. A small settlement, Leyburn is situated 219 km southwest of Brisbane with an altitude of approximately 423 m above mean sea level. The woodland species in this area are generally mixed in composition and occur in multiple stages of regeneration and degradation. The dominant vegetation species in the survey sites included bulloak (A. luehmannii), narrow-leaf ironbark (Eucalyptus crebra), poplar box (Eucalyptus populnea), white cypress pine (Callitris glauca), spotted gum (Corymbia maculata), apple gum (Angophora leicarpa), currawang (Acacia sparsiflora), and dogwood (Jacksonia scoparia).
The study area is dominated by livestock grazing, representing approximately 85% of the total area. 25 Other land use types include cropping, production forestry, mining, reservoir/dam, residential, intensive animal production, and other minimal uses. 25 There are 19 regional ecosystem (RE) types 26 in the study area, of which five combinations (5730 ha in total) are associated with bulloak trees.
The study area is a natural habitat of the bulloak jewel butterfly, and this species has been sighted in the locality 12,14 as marked in Fig. 1. Data collection conducted by Queensland's EHP 26 from 1969 to 2003 resulted in a total of 19 sightings of the bulloak jewel butterfly in this locality. All sightings were restricted to this area; hence, it was selected as the study area for this project.

Field Data Collection
Field data collection was conducted in June 2013 to measure the reflectance of eight woodland vegetation species (A. luehmannii, E. crebra, E. populnea, C. glauca, C. maculata, A. leicarpa, A. sparsiflora, and J. scoparia) separately at the leaf and canopy levels as depicted in Fig. 2 and Table 1. RE maps, produced by Queensland's EHP, 26 served as a guide in the purposive selection of sample locations at the plant community level. The target sites at this level were selected based on the presence of bulloak trees, accessibility, and proximity to plant communities where the butterfly was sighted. The information about the target sites was collected during a preliminary survey prior to the spectral data collection.
Using a simple random sampling method, a total of 40 sample trees were initially considered within the RE sites. However, for reasons related to accessibility, tree height, apparent tree health, and exposure to sun, only 32 trees were sampled, four for each woodland species. For each woodland species, 45 to 60 spectral measurements were collected at the leaf level, with the same number collected at the canopy level. The selected species were identified by comparing the physical appearance of leaves, flowers, fruits, and bark with the corresponding characteristics described in the field guide book. 27 The identifications were further verified by consulting local experts during the preliminary survey work. A portable full-range (350 to 2500 nm) spectroradiometer (Analytical Spectral Devices, Inc., Boulder, Colorado) 28 was used to record the reflectance data at 1-nm wavelength intervals. This device is a hand-held battery-powered spectrometer with a fore optic cable for light collection and a notebook computer for data logging. 26 For canopy-level measurement, the sensor was positioned about 70 cm above the canopy, using a twin-step ladder, to record the average spectra within a 10-cm diameter on the canopy. For leaf-level measurement, the sensor was held 17 cm above the leaf to record the average spectra over a 2.5-cm diameter. A precalibrated yard stick was used to measure the distance between the sensor and the leaves. These footprint sizes were calculated from the 8-deg field of view (FOV) of the fore optic. The FOV expresses the solid angle through which light incident on the fore optic arrives at the detector system. 28 Reliable field spectrometry data collection depends upon accurate calibration of the devices used. Thus, the sensor was calibrated using a white reference plate prior to scanning at both levels of measurement.
The scanning was carried out on the unshadowed portion of the leaf or canopy on a clear sunny day. For canopy measurement, the most challenging task was to choose a part of the canopy with minimal shadow from other parts of the same tree or from adjoining trees. This was addressed by selecting an exposed part of the canopy (no incident shadow) or a tree standing at a suitable distance from adjacent trees.
Reflectance data were collected between 10:00 am and 2:30 pm local time, when the sun's azimuth and elevation were most favorable. Moreover, additional calibration was performed against the white reference plate if there was cloud interference during the recording session. This helped to compensate for changes in illumination between readings and to standardize the measurements.
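The footprint sizes quoted above can be checked with a short calculation. The sketch below assumes a simple conical field of view looking straight down at a flat target, so that the footprint diameter is 2 d tan(FOV/2); the function name is illustrative and not part of any instrument software.

```python
import math


def footprint_diameter_cm(distance_cm, fov_deg=8.0):
    """Diameter of the circular area seen by a conical FOV at a given distance.

    Assumes nadir viewing of a flat target (a simplification).
    """
    return 2.0 * distance_cm * math.tan(math.radians(fov_deg / 2.0))


canopy_cm = footprint_diameter_cm(70.0)  # sensor about 70 cm above the canopy
leaf_cm = footprint_diameter_cm(17.0)    # sensor 17 cm above the leaf
print(f"canopy footprint: {canopy_cm:.1f} cm, leaf footprint: {leaf_cm:.1f} cm")
```

Under these assumptions the footprints come out slightly under 10 cm and 2.5 cm, consistent with the rounded values reported in the text.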

Data Preprocessing
For both levels of measurement, 10 spectra were internally averaged by the spectrometer for each sample. All spectral datasets were stored in a computer and processed using the RS3 software intended for use with a graphical user interface. The reflectance data were then converted into ASCII format, giving 16 raw files of reflectance data for the eight selected vegetation species sampled at the leaf and canopy levels. These datasets were structured in spreadsheets to produce a data array of 45 to 60 samples for every species by 2151 wavebands. Because the aim was to discriminate bulloak from each of the other vegetation species, bulloak was compared with every other species independently (e.g., bulloak versus apple gum; bulloak versus white cypress pine; and bulloak versus narrow-leaf ironbark). Thus, the variables for the seven pairs of vegetation species were labeled accordingly before being exported into the Unscrambler 9.2 29 software for partial least squares (PLS) regression analysis.
Before the calibration stage was undertaken, the spectral reflectance was preprocessed for optimal performance. A series of "cleaning" operations was applied to eliminate (a) very short wavelengths (350 to 399 nm) and strong water vapor absorption bands (1356 to 1480, 1791 to 2021, and 2396 to 2500 nm) 18,21 and (b) outliers that indicated an abnormal reflectance response compared with other samples. 20,30,31 Pretreatment or transformation of the spectral data is a significant component of many spectral analyses and is intended to improve the accuracy of results. In this study, two chemometric pretreatment methods [moving average smoothing 8,18,20 and multiplicative scatter correction (MSC) 20,31 ] were applied to reduce the noise and to normalize the data. The goals of the smoothing or averaging method were to decrease the number of variables in the dataset, to eliminate uncertainty in measurement, and to reduce the effect of noise. 32 This method replaces each point of the spectra with the average of x adjacent points, where x is a positive integer. 32 In this case, a 3 by 3 matrix or kernel window of moving average was selected to transform the raw bands prior to the PLS regression analysis. The purpose of MSC, on the other hand, is to correct undesired scatter effects, both multiplicative (amplification) and additive (offset). 32
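The preprocessing steps described above can be sketched in plain Python. This is a minimal illustration, not the Unscrambler implementation: the function names are hypothetical, the moving average is taken as a 3-point window along the wavelength axis with shortened windows at the ends (an assumption about the kernel and edge handling), and MSC uses the mean spectrum of the set as its reference, which is the common convention.

```python
# Wavelength axis of the raw data: 350 to 2500 nm at 1-nm steps (2151 bands).
WAVELENGTHS = list(range(350, 2501))

# Ranges removed in the text: very short wavelengths and water vapor bands.
EXCLUDED_RANGES = [(350, 399), (1356, 1480), (1791, 2021), (2396, 2500)]


def keep_band(wavelength):
    """True if the wavelength lies outside every excluded range."""
    return not any(lo <= wavelength <= hi for lo, hi in EXCLUDED_RANGES)


def remove_noisy_bands(spectrum):
    """Drop the excluded bands from one 2151-band spectrum."""
    return [r for wl, r in zip(WAVELENGTHS, spectrum) if keep_band(wl)]


def moving_average(spectrum, window=3):
    """Replace each point by the mean of its neighbors (window shrinks at edges).

    Note: applied to a cleaned spectrum, the window spans the gaps left by
    the removed bands; smoothing before band removal avoids this.
    """
    half = window // 2
    smoothed = []
    for i in range(len(spectrum)):
        segment = spectrum[max(0, i - half):i + half + 1]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum.

    Each spectrum s is regressed on the reference r (s ~ offset + slope * r)
    and corrected as (s - offset) / slope.
    """
    n_bands = len(spectra[0])
    ref = [sum(s[j] for s in spectra) / len(spectra) for j in range(n_bands)]
    ref_mean = sum(ref) / n_bands
    sxx = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / n_bands
        slope = sum((r - ref_mean) * (v - s_mean) for r, v in zip(ref, s)) / sxx
        offset = s_mean - slope * ref_mean
        corrected.append([(v - offset) / slope for v in s])
    return corrected


cleaned = remove_noisy_bands([0.5] * len(WAVELENGTHS))
print(len(WAVELENGTHS), "raw bands ->", len(cleaned), "bands after cleaning")
```

With the ranges listed in the text, 511 of the 2151 bands are removed, leaving 1640 bands for the regression.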

PLS Regression
The most frequently used multivariate techniques in spectroscopy analysis are principal component analysis (PCA) and PLS regression. 30,31,33 The PLS model is comparatively better than PCA because it does not include latent variables that are less important for describing the variance of the quality measurement. 20,30,31,34 PLS regression can be defined as a bilinear modeling method for relating the variations in one or several response variables (Y-variables) to numerous predictors (X-variables). 33,35 In PLS regression, the information in the independent variables (X-variables) is projected onto a small number of latent variables named PLS factors or components. 30,35 The Y-data are used in estimating the latent variables to ensure that the first components are those that are most relevant for predicting the Y-variables.
PLS comprises regression and classification tasks as well as reduction techniques and modeling tools. 31,33 Furthermore, PLS reduces the entire reflectance spectra to a few relevant factors and regresses them against the measured parameter of a given sample. 36,37 Thus, the PLS model is considered to be more robust than a multiple linear regression calibration model. 36 The method performs well when the various X-variables express common information, i.e., when there is a great amount of correlation or collinearity. Because hyperspectral data are inherently highly collinear, PLS regression offers better capabilities for discrimination analysis than traditional regression techniques. 38 Other statistical approaches (e.g., discriminant function analysis) are not well suited to analyses involving over 1000 X-variables because of their limitations in handling multicollinearity of predictor variables.
Furthermore, like other traditional regression techniques, PLS requires a quantitative dependent variable Y, whereas species is categorical. The values of the species-type variable were therefore coded as "0" and "1" [e.g., bulloak (0) versus apple gum (1)]. This coding limited the analysis to two species at a time, with the intention of exploring interspecies differences.
The dataset was divided into calibration and validation sets for PLS regression analysis. Seventy-five percent of the dataset was used to develop a prediction equation (calibration set), while the remaining 25% was used for the validation of the predictive equation. 20,38 In the development of the PLS model, a full cross-validation (leave-one-out) method was used to assess the quality of prediction and to prevent over-fitting of the calibration model. 32,35 This means that only one sample at a time is kept out of the calibration. The performance of the PLS models was evaluated by the root-mean-square error of prediction (RMSEP) and the coefficient of determination (r²) of the model. An accurate PLS model should have a high coefficient of determination (r²) and a low RMSEP between the predicted and measured values of each regression analysis. 20,31,35 Regression analysis was preceded by the identification and elimination of outliers using the outlier list tool offered by the Unscrambler 9.2 software. The outliers were also identified from the influence plot, which displays the sample residual variance against leverage; the outliers were listed under the PLS components (PCs) as indicated by the influence plot. The selection of outliers always starts with those that appear first in the earliest components. 29 Furthermore, samples with high residual variance are likely to be outliers. Thus, if six samples were identified as outliers in the raw bands of an interspecies discrimination, all of them were removed prior to the PLS regression analysis. The same samples were also removed for the smoothing and MSC techniques so that the three techniques could be compared on the same data. Less than 5% of the dataset was considered outliers.
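The pairwise coding can be illustrated as follows. This is a sketch only: `build_pairs` and the two-band toy spectra are hypothetical, and the species labels are the common names used in this paper.

```python
SPECIES = [
    "bulloak", "narrow-leaf ironbark", "poplar box", "white cypress pine",
    "spotted gum", "apple gum", "currawang", "dogwood",
]


def build_pairs(spectra_by_species, target="bulloak"):
    """Build one two-class dataset per (target, other) species pair.

    The target species is coded 0 and the other species 1, as in the text.
    """
    pairs = {}
    target_spectra = list(spectra_by_species[target])
    for other, spectra in spectra_by_species.items():
        if other == target:
            continue
        X = target_spectra + list(spectra)
        y = [0.0] * len(target_spectra) + [1.0] * len(spectra)
        pairs[(target, other)] = (X, y)
    return pairs


# Toy data: two made-up two-band spectra per species, for illustration only.
toy = {name: [[0.1 * k, 0.2 * k], [0.1 * k, 0.3 * k]]
       for k, name in enumerate(SPECIES, start=1)}
pairs = build_pairs(toy)
print(len(pairs), "species pairs")
```

Eight species with bulloak as the fixed target yield the seven pairs analyzed in this study.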
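The calibration and validation workflow can be sketched with a small pure-Python PLS1 (single response) routine based on the NIPALS algorithm. This is an illustrative stand-in for the Unscrambler analysis, not a reproduction of it: the shuffled 75/25 split, the seed, the number of components, and the four-band toy spectra are all assumptions made for the example.

```python
import math
import random


def split_75_25(X, y, seed=0):
    """Shuffle the samples and split them 75% calibration / 25% validation."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(round(0.75 * len(idx)))
    cal, val = idx[:cut], idx[cut:]
    return ([X[i] for i in cal], [y[i] for i in cal],
            [X[i] for i in val], [y[i] for i in val])


def fit_pls1(X, y, n_components=1):
    """Fit PLS1 by NIPALS; returns (x_mean, y_mean, [(w, p, q), ...])."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                # centered y
    comps = []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]                               # weights
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, p, q))
    return x_mean, y_mean, comps


def predict_pls1(model, x):
    """Predict the 0/1-coded response for one spectrum."""
    x_mean, y_mean, comps = model
    e = [v - mu for v, mu in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in comps:
        t = sum(a * b for a, b in zip(e, w))
        y_hat += q * t
        e = [a - t * b for a, b in zip(e, p)]  # deflate the new sample
    return y_hat


def rmsep(y_true, y_pred):
    """Root-mean-square error of prediction."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
                     / len(y_true))


def loo_rmse(X, y, n_components=1):
    """Leave-one-out cross-validation: refit with each sample held out once."""
    preds = []
    for i in range(len(X)):
        model = fit_pls1(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_components)
        preds.append(predict_pls1(model, X[i]))
    return rmsep(y, preds)


# Toy four-band spectra (made up): class 0 stands for bulloak, class 1 for another species.
class0 = [[0.10, 0.20, 0.50, 0.40], [0.11, 0.21, 0.52, 0.41],
          [0.09, 0.19, 0.49, 0.39], [0.10, 0.21, 0.51, 0.40]]
class1 = [[0.10, 0.30, 0.70, 0.40], [0.11, 0.31, 0.72, 0.41],
          [0.09, 0.29, 0.69, 0.39], [0.10, 0.31, 0.71, 0.40]]
X_all, y_all = class0 + class1, [0.0] * 4 + [1.0] * 4
X_cal, y_cal, X_val, y_val = split_75_25(X_all, y_all)
model = fit_pls1(X_cal, y_cal, n_components=2)
print("RMSEP:", rmsep(y_val, [predict_pls1(model, x) for x in X_val]))
print("LOO RMSE on calibration set:", loo_rmse(X_cal, y_cal, n_components=2))
```

A predicted value near 0 indicates the first species of the pair and a value near 1 the second; the choice of how many components to retain would normally be guided by the cross-validation error.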

Vegetation Reflectance Properties
Leaves of woodland species display the typical spectral curves of green healthy vegetation: high reflectance in the near-infrared (NIR) and relatively low reflectance in the visible and short-wave infrared (SWIR). Visual interpretation of raw spectra from Figs. 3-5 indicated the differences in ranges across vegetation species. For example, bulloak species was displayed and compared with dogwood (the closest spectral response) and poplar box (highly separated spectral response).
At both leaf and canopy levels, the highest separability of the spectral magnitude was clearly shown in the NIR band (approximately 700 to 1355 nm) as depicted in Fig. 3. This was followed by selected ranges in the SWIR band (where separability peaked at 1670 and 2210 nm), the green band (550 nm), and the red band (650 nm). This result agrees with other researchers 19,34,35 who concluded that the spectral difference between species is insignificant in the visible band but notable in the NIR and SWIR bands. It means that differences in pigment (chlorophyll) absorption between species pairs were not the discriminating variables; rather, the discriminating variables were those related to leaf internal structure (NIR-related) and to leaf water content and other biochemicals (SWIR-related). 19
At the canopy level, the bulloak-dogwood comparison showed the lowest separability, whereas the bulloak-poplar box pair exhibited the highest spectral discrimination (Fig. 4). Likewise, score plots in PLS demonstrated these patterns as shown in Figs. 6 and 7. These plots clearly show that the bulloak-poplar box pair has high discrimination. Similarly, at the leaf level, the bulloak-dogwood pair revealed the lowest spectral separability, while bulloak-angophora exhibited the highest separability.
The spectral separability of different species pairs, such as bulloak-dogwood and bulloak-poplar box, can be explained by their leaf attributes (e.g., color, size, texture, and shape) as described in Table 1. For instance, the needle-like modified leaves or foliage of the bulloak tree closely resemble those of the dogwood. In comparison, the glossy and rounded dark green leaves of an adult poplar box tree are distinctly different from those of bulloak.
These examples also reflect the same situation in describing the spectral separability of the bulloak-angophora species pair: bulloak's foliage is needle-like, whereas angophora has oppositely arranged lanceolate leaves. These findings are in agreement with Zhang et al. 19 and Taylor et al. 21 who demonstrated that the variations due to leaf attributes, along with other biophysical properties, leaf chemical composition, and leaf water content, were the main factors which could contribute to the differences in spectral separability of plant foliage.
There are two significant issues for every study involving the discrimination and mapping of vegetation: (a) the variability of biological types and (b) the spectral similarity (characteristics) of most vegetation. 40 Some of this variability can be attributed to geographical and environmental backgrounds, some to phenology and seasonal circumstances, but most of it still remains at the scale of communities and individual plants in their habitat. 41 However, the spectral properties of all types of vegetation are governed by a similar set of pigments, structures, and biochemicals throughout the year. 40,42 In this study area, vegetation communities are nondeciduous and exhibit little morphological change.

Bulloak tree versus other woodland species at leaf and canopy levels
Prediction accuracies of the PLS results were obtained by subtracting the RMSEP from one. Tables 2 and 3 summarize the PLS results for the canopy and leaf levels of raw spectra. There were high correlations between predicted and measured values for the validated samples, i.e., r = 0.985 to 0.997 and r = 0.985 to 0.996 for canopy and leaf levels, respectively. The RMSEP was reasonably low on a scale of 0 to 1 (i.e., from 0.0433 to 0.1086 for canopy level and from 0.0542 to 0.0827 for leaf level), indicating good prediction accuracies. Figure 8 shows that the accuracies of raw spectra at the canopy level range from 89.14% to 95.67%, while the accuracies at the leaf level range from 91.73% to 94.58%. Among the seven species pairs, the lowest accuracies correspond to bulloak versus narrow-leaf ironbark (89.14%) and bulloak versus apple gum (91.73%) for canopy and leaf levels, respectively. Even these values are relatively high: the accuracy ranges obtained in this study are close to those attained by Huang and Apan 31 (93.57% to 94.27%), who used PLS regression for detecting Sclerotinia rot disease on celery. In most cases of PLS regression analysis of raw spectra as conducted here, the canopy-level samples produced higher prediction accuracies than the leaf-level sets, except for bulloak versus currawang, bulloak versus dogwood, and bulloak versus narrow-leaf ironbark (Fig. 8). These higher accuracies of canopy samples may be best explained by the differing reflectance of mixed materials integrated within the wider FOV of the sensor. From the point of view of a satellite sensor, a forest or woodland environment reflects an integrated signal from leaves, branches, and trunks of trees as well as from soil and leaf litter. 21,38 Each of these elements has subtractive and additive effects on the spectral curve; thus, the ultimate reflectance is a mixture of all of them. 18
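The accuracy figures follow directly from the RMSEP values quoted above. A minimal sketch, using a hypothetical helper name and the four RMSEP endpoints reported in the text:

```python
def prediction_accuracy_percent(rmsep):
    """Accuracy as defined in the text: (1 - RMSEP), expressed in percent."""
    return (1.0 - rmsep) * 100.0


# RMSEP endpoints for raw spectra, as reported in the text.
reported = [
    ("canopy, lowest RMSEP", 0.0433),
    ("canopy, highest RMSEP", 0.1086),
    ("leaf, lowest RMSEP", 0.0542),
    ("leaf, highest RMSEP", 0.0827),
]
for label, value in reported:
    print(f"{label}: {prediction_accuracy_percent(value):.2f}%")
```

These reproduce the reported ranges of 89.14% to 95.67% (canopy) and 91.73% to 94.58% (leaf). Because the response is coded 0 or 1, the RMSEP is already on a 0 to 1 scale, which is what makes this subtraction meaningful.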

Raw spectra versus transformed data
For vegetation species discrimination involving bulloak tree spectra against the other woodland vegetation species, the prediction accuracy of raw spectra was generally equal to or higher than that of the transformed spectra (smoothing and MSC), as shown in Tables 4 and 5 for both canopy and leaf levels. Overall, the prediction accuracies of raw spectra and smoothing were nearly the same, whereas MSC consistently produced the lowest accuracies. For example, at the canopy level, the accuracies were 92.67% (raw) and 92.64% (smoothing) for bulloak versus apple gum, and 95.67% (raw) and 95.67% (smoothing) for bulloak versus cypress. At the leaf level, similar patterns were observed: for bulloak versus dogwood, the accuracies were 92.45%, 92.54%, and 81.82% for raw spectra, smoothing, and MSC, respectively.
This indicates that the raw and smoothed spectra have equivalent predictive power for species discrimination when evaluated by PLS regression. In most cases, the PLS regression accuracies at the canopy level were higher than those at the leaf level for both raw spectra and transformed data. However, the reverse was found for bulloak versus currawang, bulloak versus dogwood, and bulloak versus narrow-leaf ironbark, as shown in Fig. 9.
Figure 9 shows that the accuracies of raw spectra and transformed data at the canopy level range from 78.68% to 95.67%, while accuracies at the leaf level range from 52.09% to 94.58%. The least accurate pairs correspond to bulloak versus dogwood (78.68%) and bulloak versus cypress (52.09%) at canopy and leaf levels, respectively. Both values were generated from the MSC method. This suggests that the transformations applied to the raw spectra did not improve model prediction and, in the case of MSC, reduced it.

Conclusions
The results from PLS regression confirm the effectiveness of narrow-band spectral reflectance data for discriminating the vegetation species sampled in the study area. Among the seven species pairs analyzed with raw spectra, the least discrimination was observed for bulloak versus narrow-leaf ironbark (89.14%) and bulloak versus apple gum (91.73%) for canopy and leaf levels, respectively. When both raw spectra and transformed data are considered, the least accurate pairs correspond to bulloak versus dogwood (78.68%) and bulloak versus cypress (52.09%) at canopy and leaf levels, respectively. These values were produced by the MSC method. This study revealed that the raw spectra and smoothed (transformed) datasets have comparable predictive power for discriminating between species at the canopy and leaf levels. However, the MSC transformation applied to the raw data did not enhance the accuracy of prediction. We conclude that PLS regression with full cross-validation produced high prediction accuracies for the raw spectra and smoothed datasets.
The spectral separability of the bulloak tree against other woodland vegetation species indicated good discrimination in selected regions of the spectrum. The NIR region (700 to 1355 nm) appeared to play a key role in the discrimination between species in PLS regression. However, a limitation of using PLS for this kind of study is that it confines the analysis to only two species at a time. In addition, studies on interspecies spectral differences have been largely descriptive or quantitative. Discrimination was possible but was statistically based, acknowledging that there is variability within a plant group or species. There were relative differences between spectra rather than an individual signature for each species. Variability can be attributed to phenology, geographical, and environmental settings, but much variation still remains at the scale of communities and individual species in their natural habitat.
This study demonstrated the feasible use of hyperspectral data in discriminating woodland species to help improve the mapping of bulloak jewel butterfly habitat. The next step is to conduct research on the use of a hyperspectral imaging sensor (rather than the nonimaging technique as conducted here) for habitat mapping.
Wan Nor Zanariah Zainol Abdullah is a postgraduate student at the University of Southern Queensland, Australia, as well as a tutor at the University Putra Malaysia, Malaysia. She received her BS degree from University of Technology Malaysia, Malaysia, in 2006. She was awarded her MS degrees in remote sensing and GIS from the University Putra Malaysia, Malaysia, in 2010. Currently, she is doing her PhD degree in fine-scale habitat mapping and modeling of an endangered butterfly species and climate change impact analysis.
Armando A. Apan is currently an associate professor at the University of Southern Queensland. His current research area focuses on the application of GIS and remote sensing to observe terrestrial ecosystems and their responses to environmental change; vegetation and habitat mapping; species distribution modelling; land use/cover change analysis; and monitoring of agricultural crops. He has over 120 papers published in international refereed journals, conference proceedings and book chapters.
Tek N. Maraseni has over 20 years work experience in carbon and forest related areas in Nepal, Thailand, China, and Australia. Currently, he is the VC's senior research fellow with the University of Southern Queensland, Australia, and also a senior international scientist with the Chinese Academy of Sciences, China. He has produced over 90 publications, including two books in the last seven years. His published work has been recognized by several national and international fellowships/grants/awards.
Andrew F. Le Brocque is a vegetation ecologist and has researched and published in the areas of plant-environment relationships, biodiversity and ecosystem services in agricultural landscapes, and plant-fungal relationships for over 20 years. He is currently senior lecturer in ecology and sustainability at the University of Southern Queensland and has taught ecology, conservation and sustainability at USQ since 1996 and currently coordinates and teaches courses in ecology, conservation biology, environmental science, and sustainability.