Learning vector quantization neural network for surface water extraction from Landsat OLI images

Abstract. There is growing concern over surface water dynamics due to an increased understanding of water availability and management under current climate trends. Remote sensing has become an effective means of water extraction owing to the availability of an enormous amount of data with diverse spatial, spectral, and temporal resolutions. However, water extraction from optical remote sensing data faces several major difficulties, such as the applicability of the extraction method over large areas and complex environments; shadow contamination from clouds, buildings, and mountains; and the detection of shadowed water and exclusion of floating and submerged plants. To address these difficulties, a learning vector quantization (LVQ) neural network-based method was proposed and implemented to extract water from Landsat 8 imagery. The method separates water from clouds, built-up areas, shadows, and shadowed water using bands 1 to 7 and the normalized difference vegetation index as input. The model was trained on water samples from across Sri Lanka. Eight OLI scenes were tested, and the performance was compared with five widely used machine learning algorithms: support vector machine, K-nearest neighbor, discriminant analysis, a combination of the modified normalized difference water index and a modified fuzzy clustering method, and K-means clustering. The proposed method performed best, achieving overall accuracies between 97.8% and 99.7% and kappa coefficients between 0.96 and 0.99. The results demonstrated robustness, consistency, and precision over various dark surfaces, the noisiest water environments, and highly water-scarce scenes. LVQ showed a good generalization ability, detecting all types of water with a small number of training samples. The method can be easily adapted to other sensors and to global water mapping to support water resource studies.

2 Study Area and Data

Study Areas
Sri Lanka is located in the tropics and has a large number of small- and medium-sized surface water resources. The study area lies between 5°55′ N and 9°51′ N latitude and 79°41′ E and 81°53′ E longitude, and eight typical test sites are shown in Fig. 1. These test sites represent a wide range of water bodies of various types (i.e., ponds, lakes, reservoirs, streams, rivers, ocean), sizes, and depths, and their background environments include diverse land cover types, such as built-up areas, forest, and paddy fields, on flat, hilly, and mountainous terrain. The sites also contain a range of environmental noise for evaluating the performance of the proposed model; a brief description of the test sites is presented in Table 1.

Data
For this study, four Landsat 8 OLI images (15 March 2016: path 141, rows 55 and 56; 13 January 2017: path 141, row 55; and 13 September 2018: path 141, row 54) were collected from the U.S. Geological Survey (USGS). 35 The images were Level 1 terrain-corrected (L1T) and pregeoreferenced using the WGS84 datum. Sentinel-2 images belonging to five tiles at 10-m resolution were used as reference data and were collected from the European Space Agency (ESA). 36 The corresponding metadata information and a brief description of each test scene are summarized in Table 1.

Methods
The overall methodology adopted in this study for the identification and extraction of water bodies is shown in Fig. 2. This LVQ-based method consists of five stages: (1) applying

Image Preprocessing
Radiometric calibration and atmospheric correction are prerequisites for obtaining consistent, high-quality experimental data from raw Landsat imagery. 37 Due to the challenges associated with atmospheric correction over inland and coastal water, Rayleigh-corrected reflectance has been widely and consistently used in water applications. 38 The images were processed, and Rayleigh-corrected reflectance was derived using the Atmospheric Correction for OLI lite tool (acolite_win_2014).

Training Sample Selection
Uniformly distributed training samples are efficient for training an ANN. 2 Training samples were selected across the study area from the four OLI images, excluding the test areas. A total of 3765 water and 2685 nonwater pixels were selected in order to examine the efficiency of LVQ and accelerate the training process. The selection and distribution of samples were based on experience gained from a series of experiments. Only pixels recognized as true water were selected as water samples to ensure the accuracy and quality of the network. All samples were manually labeled as water or nonwater using region-of-interest polygons in ENVI 5.3 software.

Training LVQ Network
A total of eight layers, comprising OLI bands 1 to 7 and the normalized difference vegetation index (NDVI), were supplied to the network together with the per-pixel class label. The LVQ network was constructed with the newlvq function in the Neural Network Toolbox of MATLAB R2014b. Parameters were selected through a series of experiments, favoring higher efficiency and minimum computational error. 39 Learning rates of 0.01, 0.001, and 0.0001 were tested; 0.01 and 0.001 were excluded because they converged at a lower number of epochs, whereas 0.0001 converged at around 200 epochs. The experimental results show how training and classification accuracies evolve with the number of epochs and time (Fig. 3). As an optimization strategy, we set the number of iterations to 200 and the number of neurons to 50.
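The training configuration can also be illustrated outside MATLAB. The following is a minimal NumPy sketch of the basic LVQ1 update rule, not the toolbox implementation: the function name `train_lvq1`, its arguments, the even split of prototypes between classes, and the initialization from randomly chosen training samples are all assumptions made for illustration. The defaults mirror the reported learning rate (0.0001), 200 epochs, and 50 neurons, and `X` is expected to hold one row per pixel with eight features (OLI bands 1 to 7 plus NDVI). MATLAB's newlvq may initialize and update its weights differently.

```python
import numpy as np

def train_lvq1(X, y, n_prototypes=50, lr=1e-4, epochs=200, seed=0):
    """Train LVQ1 prototypes; X is (n_samples, n_features), y holds integer labels."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    per_class = max(1, n_prototypes // len(classes))

    # Assumption: seed each class's prototypes with random samples of that class.
    protos, proto_labels = [], []
    for c in classes:
        idx = np.flatnonzero(y == c)
        pick = rng.choice(idx, size=per_class, replace=len(idx) < per_class)
        protos.append(X[pick])
        proto_labels.append(np.full(per_class, c))
    W = np.vstack(protos)
    labels = np.concatenate(proto_labels)

    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            # Winner = prototype closest to the sample (squared Euclidean distance).
            k = np.argmin(((W - X[i]) ** 2).sum(axis=1))
            # LVQ1 rule: attract the winner if its label matches, repel it otherwise.
            sign = 1.0 if labels[k] == y[i] else -1.0
            W[k] += sign * lr * (X[i] - W[k])
    return W, labels
```

A call such as `W, labels = train_lvq1(X_train, y_train)` would return 50 prototype vectors and their class labels; with a learning rate as small as 0.0001, the prototypes move slowly, which is consistent with the many epochs reported before convergence.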

Simulation
With the trained LVQ, each pixel was classified into one of two categories: water or nonwater. The sim function in the MATLAB Neural Network Toolbox was applied to simulate the trained network, producing a binary image in which the two classes carry different color labels.
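Conceptually, simulating a trained LVQ assigns each pixel the label of its nearest prototype. The sketch below shows that step in NumPy under stated assumptions: the function name `classify_image`, the (rows, cols, features) layout of the feature cube, and the prototype arrays `W` and `labels` are hypothetical, and the code stands in for, rather than reproduces, the MATLAB sim call.

```python
import numpy as np

def classify_image(cube, W, labels, water_label=1):
    """Return a boolean water mask for a (rows, cols, n_features) feature cube."""
    rows, cols, n_features = cube.shape
    pixels = cube.reshape(-1, n_features).astype(float)
    # Squared Euclidean distance from every pixel to every prototype.
    d2 = ((pixels[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    # Each pixel takes the label of its nearest prototype.
    predicted = labels[np.argmin(d2, axis=1)]
    return (predicted == water_label).reshape(rows, cols)
```

For a full Landsat scene the pixel-by-prototype distance array becomes large, so in practice the cube would likely be processed in blocks of rows.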

Accuracy Assessment
The performance for entire scenes was evaluated both quantitatively and by visual comparison. All eight Landsat image scenes were verified against the corresponding high-resolution Sentinel-2 images, and a confusion matrix was calculated. All water pixels and an equal number of nonwater pixels were used to assess the accuracy. Performance was evaluated with four accuracy measures, namely overall accuracy, kappa coefficient, producer's accuracy, and user's accuracy, which were defined by the following standard equations: 40

$$\text{Overall accuracy} = \frac{TP + TN}{N} \times 100, \tag{1}$$

where TP is true positive: the number of correctly detected water pixels, TN is true negative: the number of correctly detected nonwater pixels, FP is false positive: the number of falsely detected water pixels, FN is false negative: the number of undetected water pixels, and N is the total number of pixels used in accuracy assessment.
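The four measures can be computed directly from the confusion-matrix counts. The sketch below is illustrative only: the overall accuracy follows Eq. (1), whereas the forms used for the kappa coefficient, producer's accuracy, and user's accuracy are the common textbook definitions for a two-class matrix and are assumed here rather than taken from the text. The function name `accuracy_measures` is hypothetical.

```python
def accuracy_measures(tp, tn, fp, fn):
    """Return overall accuracy (%), kappa, producer's and user's accuracy for water."""
    n = tp + tn + fp + fn
    overall = (tp + tn) / n * 100  # Eq. (1)
    # Assumed standard forms (not recovered from the text):
    producers = tp / (tp + fn)  # share of reference water that was detected
    users = tp / (tp + fp)      # share of detected water that is truly water
    p_observed = (tp + tn) / n
    # Chance agreement from the row and column totals of the 2x2 matrix.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (p_observed - p_chance) / (1 - p_chance)
    return overall, kappa, producers, users
```

Under this assumed kappa form, using equal numbers of water and nonwater reference pixels makes the chance agreement exactly 0.5, so kappa reduces to 2 × (overall accuracy / 100) − 1. The reported extremes (97.80% with 0.9559 and 99.69% with 0.9938) agree with this relation to within rounding.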

Water Extraction Result and Quantitative Assessment
The binary water maps produced by the proposed extraction process for all test sites are shown in Fig. 4. Substantial clouds, cloud shadows, terrain shadows, built-up areas, various water types, different shapes and depths, diverse land cover, and vegetation-mixed water are the major influencing factors in these test scenes. A simple visual inspection shows that the method succeeded in enhancing the contrast between water and nonwater in all water environments with confusing surroundings while suppressing low-reflectance surfaces in both fresh and coastal waters. Water extraction was assessed quantitatively by constructing error matrices and calculating producer's accuracy, user's accuracy, overall accuracy, and kappa coefficient. The results are summarized in Table 2. The overall accuracy and kappa coefficient range from 97.80% to 99.69% and from 0.9559 to 0.9938, respectively. Similarly, the producer's accuracy and user's accuracy vary between 0.9717 and 0.9998 and between 0.9783 and 0.9945, respectively. Overall, the results indicate the robustness and high accuracy of the proposed method under diverse water environmental conditions.

Performance Comparison with Other Methods
The results of LVQ were compared with those of five widely used supervised and unsupervised water detection algorithms: SVM, KNN, DA, a combination of the modified normalized difference water index and a modified fuzzy clustering method (MMFCM), and K-means. SVM 41 is a hyperplane-based classification technique, KNN 42 classifies each sample according to its K nearest neighbors, and DA 43 is a multidimensional distance-based parametric classification technique. MMFCM combines the modified normalized difference water index (MNDWI) 9 with a modified fuzzy clustering method, and K-means 44 is a popular unsupervised clustering algorithm. All algorithms were implemented in MATLAB R2014b using the seven OLI bands and NDVI. Water extraction was carried out independently for another 10 scenes affected by various types of noise using LVQ, SVM (type: C-SVC, kernel: radial basis), KNN (NN: 9), DA (type: linear), MMFCM, and K-means clustering, and the results were visually inspected. These test images illustrate the ability to suppress shadows, the potential to detect small water bodies and shadowed water, and the capability to extract boundaries in detail without the influence of surroundings and noise (Fig. 5). The differences in performance among the methods are marked with red squares.
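To make the KNN baseline concrete, the sketch below labels each query pixel by majority vote among its nine nearest training pixels. It is a minimal NumPy illustration, not the MATLAB implementation used in the comparison: the function name `knn_predict`, the Euclidean distance, and the 0/1 label coding (0 = nonwater, 1 = water) are assumptions, since the distance metric and tie handling are not stated in the text.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=9):
    """Label each query row by majority vote among its k nearest training rows."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    y_train = np.asarray(y_train)
    # Squared Euclidean distance between every query and every training sample.
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k smallest distances per query (order within the k is irrelevant).
    nearest = np.argpartition(d2, k - 1, axis=1)[:, :k]
    votes = y_train[nearest]
    # Binary labels (0 = nonwater, 1 = water): water wins with more than half the votes.
    return (votes.sum(axis=1) * 2 > k).astype(int)
```

An odd neighbor count such as 9 avoids tied votes in a two-class problem, which may be one reason for that setting.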
The water extraction result in Fig. 5(a) shows that cloud shadow over the paddy fields can be suppressed by LVQ, KNN, and DA [Figs. 5(a1), 5(a3), and 5(a4)], while SVM, MMFCM, and K-means misclassify a portion of the cloud shadow as water. In the presence of a large amount of cloud over water and built-up area [Fig. 5(b)], MMFCM confused cloud shadow and cloud with water, while SVM, DA, and K-means clustering extracted portions of the shadow as water. However, LVQ and KNN successfully discriminated and accurately identified the shadowed water boundary in the built-up area. In the case of terrain shadow [Fig. 5(c)], all algorithms performed similarly to LVQ except K-means clustering, which overclassified terrain shadows as water.
LVQ has the potential to delineate a very shallow and thin sandy river, whose underwater and adjacent nonwater areas have similar ground conditions, and which was only partially detected by KNN, DA, MMFCM, and K-means clustering. The single-pixel feature was undetected by KNN and MMFCM, while the KNN and K-means methods only partially detected the eight-pixel water body. In addition, SVM and DA misclassified a few additional pixels as water (commission errors) [Figs. 5(i2) and 5(i4)]. Taken together, these results indicate that, among the six methods, LVQ has a higher capacity to detect different types of water distribution in detail, with superior accuracy and stability under various confusing conditions.

Discussion
This study also revealed the superior big-data generalization ability of LVQ, which achieved acceptable accuracy while learning from a small number of training samples. The number and types of training samples were reduced as far as possible using a trial-and-error technique. Consistent with these results, studies have already provided empirical evidence for the excellent generalization ability of LVQ 45 and of ANN methods in water extraction 2 using very few training samples. Further, the results revealed that LVQ can correctly segment rivers, streams, ponds, reservoirs, ocean water, and saltpans even when training samples were not provided for all of those water types. This may be due to the dynamic learning nature and segmenting characteristics of LVQ, 23 the adequacy of the input data in representing water precisely, 18 or the spectral variation of the dataset. Previous studies stated that the performance of LVQ-based water detection largely depends on the selection of training samples. 4,32 This task was highly challenging due to the significant variations in the characteristics of water and nonwater across Sri Lanka. These challenges were addressed by careful manual selection of samples to represent the variation as fully as possible. Refining the selection of training samples and parameters may further improve the accuracy. Furthermore, the accuracy of the per-pixel labeling of training samples is a crucial factor that affects the accuracy of the prediction for each pixel during segmentation. For the sake of high precision, only pixels known with confidence to be water were labeled as water. As a consequence, our model achieved higher accuracy with minimal omission and commission errors. An earlier study reported that a scarcity of water training samples in the scenes adversely affects the output accuracy of LVQ. 4 This study used training samples from all OLI images, as pixels of the same class showed almost the same spectral response in all bands, 46 with two considerations: (1) to achieve accurate water extraction in scenes that lack water pixels and (2) to speed up the prediction process by reducing human interaction in sample selection and the training process for each scene. Moreover, scenes with small, narrow water bodies of various depths and diverse conditions are much more challenging 33,40 than scenes with larger water features or a majority of water. Thus, the test scenes were selected with water bodies of various sizes and water cover ranging between 0.44% and 42% to evaluate consistency. The model segmented water effectively without limitations in extent [single pixel in Fig. 5(i)], shape, depth [about 87 m in Fig. 4(c)], or quantity of water pixels, generating a prime result, as illustrated by Fig. 6. However, the method leads to false-positive pixels in urban areas (test site b).
In this study, evaluations were conducted on four OLI images, which can therefore be considered generally representative of the country's water features. However, the number of scenes was limited mainly by the need for clear skies and for high-resolution Sentinel-2 images acquired on the same or a nearby date for accuracy evaluation. Manual intervention is required in the training process for selecting appropriate training samples and training parameters and for per-pixel labeling. In addition, the time required for training the LVQ and ANN 2,32 is considerable. This is due to the training behavior of the neural network and is proportional to the dimensionality of the input data. However, once the LVQ has been trained, the time required for the simulation of all images is minimal.
Fig. 6 Comparison of water pixels with accuracy.

Optimum Input Selection
Surface water extraction is generally affected by various forms of environmental noise, the surroundings, and the extent, shape, depth, and surface type of the water. In Sri Lanka, with its tropical and heterogeneous regions, extraction is more challenging because the area is always affected by abundant clouds, cloud shadows, terrain shadows, shadowed water, and a high diversity of aquatic plants. These factors may cause confusion during spectral identification, especially among small and high-density water features. An earlier study 4 showed that learning from a few bands may be insufficient for water identification using LVQ, whereas accuracy may increase with the number of bands used. 28,47 A recent study showed that OLI band 7 performs well for water identification. 48 Together, these findings made it necessary to incorporate more elements to represent real water during learning in highly complex areas such as our study site. Hence, a combination of the seven OLI spectral bands with the NDVI distribution 49 was introduced as input data to increase accuracy.
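The eight-layer input can be assembled as follows. This is a minimal NumPy sketch assuming the seven OLI bands are already available as a (7, rows, cols) reflectance array in band order; the function name `build_feature_cube` and the choice to set NDVI to zero where the denominator vanishes are illustrative assumptions. NDVI is computed in its usual form, (NIR − red) / (NIR + red), with OLI band 5 as NIR and band 4 as red.

```python
import numpy as np

def build_feature_cube(bands):
    """Stack OLI bands 1-7 with NDVI; bands is a (7, rows, cols) reflectance array."""
    bands = np.asarray(bands, dtype=float)
    red, nir = bands[3], bands[4]  # OLI band 4 (red) and band 5 (NIR)
    denom = nir + red
    # Assumption: pixels with a zero denominator get NDVI = 0 to avoid division by zero.
    ndvi = np.divide(nir - red, denom, out=np.zeros_like(denom), where=denom != 0)
    # Result has shape (rows, cols, 8): seven bands followed by NDVI.
    return np.concatenate([bands, ndvi[None]], axis=0).transpose(1, 2, 0)
```

Each pixel of the returned cube is then an eight-element feature vector of the kind described as network input.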
The best training input was selected by trial and error with a validation dataset, comparing scenes containing noise, such as cloud, terrain shadow, and cloud shadow, over complex terrain and land cover, and choosing the input that gave the highest overall accuracy and kappa coefficient. The seven OLI bands and NDVI were tested with a few additional water indices, and the outcomes are shown in Fig. 7. All combinations produced visually similar results, but cloud, cloud shadows, and terrain shadows contributed significantly to the false positives in Figs. 7(a2) and 7(b2). The results showed that the combination of OLI bands and NDVI is ideal and delivers remarkable results [Figs. 7(a1) and 7(b1)], since accuracy cannot be increased by adding further indices, such as NDWI 2 and MNDWI.
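For reference, the two additional water indices tested as extra input layers can be computed with the same normalized-difference pattern. The sketch below uses the commonly cited definitions, NDWI = (green − NIR) / (green + NIR) and MNDWI = (green − SWIR1) / (green + SWIR1), with OLI bands 3, 5, and 6; the exact variants used in the experiments are not specified in the text, so these forms, the function names, and the zero fill for a vanishing denominator are assumptions.

```python
import numpy as np

def normalized_difference(a, b):
    """Return (a - b) / (a + b), with 0 where the denominator is zero (assumption)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    return np.divide(a - b, denom, out=np.zeros_like(denom), where=denom != 0)

def water_indices(green, nir, swir1):
    """NDWI and MNDWI from OLI band 3 (green), band 5 (NIR) and band 6 (SWIR1)."""
    ndwi = normalized_difference(green, nir)     # green versus NIR
    mndwi = normalized_difference(green, swir1)  # green versus SWIR1
    return ndwi, mndwi
```

Both indices are functions of bands that are already among the network inputs, which is one plausible reason why adding them did not raise the accuracy further.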

Potential Global Applicability
The proposed model relies only on Landsat 8 OLI bands, without requiring auxiliary data or prior statistical knowledge of the data distribution. With limited training samples, it was able to detect a number of water types with acceptable accuracy. The model architecture has the capacity and flexibility to learn and segment in an automated mode at a large scale. Therefore, the model could be applied to monitor global water by training on more spectrally and geometrically varied water samples from across the world, drawn together with per-pixel labels from available global water datasets, such as GLC2000, 50 the Global Land Survey datasets, 51 and the Global Land Cover Facility, 52 without any preprocessing 20 and in an automated manner.
Learning directly from pixels, without considering the data distribution, makes it possible to combine multisource data. Previous studies showed that ANN is also insensitive to the input multispectral sensor. 2,31 The method can be easily adapted to other multispectral sensors, such as QuickBird 31 and Sentinel-2A/B. Future work will be directed toward time series from Landsat 4, 4 Landsat 5, and Landsat 7 with available open-source tools. 7
Fig. 7 Performance comparison of surface water extraction by various input layer combinations: (a1), (b1) OLI bands 1 to 7 and NDVI; (a2), (b2) OLI bands 1 to 7, NDVI, NDWI, and MNDWI. Commission errors are marked as red squares.

Conclusion
Surface water mapping is getting more attention due to the growing concern about freshwater availability and water-related issues. This study proposes an LVQ neural network model for water extraction using Landsat 8 OLI images, where the heuristic learning process is intrinsically related to the spectral signature, shape, textural information, and spatial dependence. The performance of this model is evaluated under different backgrounds and compared with five commonly used machine learning algorithms (SVM, KNN, DA, MMFCM, and K-means).
The results of the study can be summarized as follows: (a) LVQ can accurately identify water of various types (freshwater and seawater), extents (very narrow and small), shapes, and depths, and can extract detailed water boundaries. The overall accuracy at the eight test sites ranges from 97.8% to 99.7% and the kappa coefficient from 0.96 to 0.99. (b) According to the visual comparison, LVQ performed better than the other five algorithms in complicated water environments. The results show that it can effectively identify water by suppressing dark surfaces and environmental noise, such as cloud shadow, terrain shadow, built-up areas, cloud, and floating vegetation. In addition, it is able to precisely detect water under shadow. (c) The resulting accuracy is acceptable for all types of water bodies and for water-scarce scenes, even when training samples are limited in quantity and water types. Accuracy may be further improved by refining the training samples and parameters. (d) The addition of NDVI improved the consistency of LVQ in complicated scenes by overcoming noise. Adding more indices did not improve performance.
Our results demonstrate that LVQ could be very useful for the accurate automated classification of surface water. Further applications remain to be explored, and we suggest that multisensors and universal water samples may be used to understand the global water dynamics over time and to forecast future trends.