Landslides are a type of natural geohazard that disrupts many economic and social activities and causes serious harm to human life. They are ranked among major disasters, threatening life, property, and the environment. Therefore, early prediction of landslide-prone areas is vital. A variety of causative factors, such as glacier melting, excessive rainfall, mining, volcanic activity, active faults, earthquakes, logging, erosion, urbanization, construction, and other human activities, can trigger landslides. Identifying the factors that directly influence slide events is thus in high demand. Several topographical, geological, and hydrological datasets (e.g., slope, aspect, geology, terrain roughness, vegetation index, distance to stream, distance to road, distance to fault, land use, precipitation, profile curvature, and plan curvature) are considered effective conditioning factors. However, the importance of each factor differs from one study to another. This study investigates the effectiveness of four sets of landslide conditioning variables. Fourteen landslide conditioning variables were considered and divided into four groups: G1, G2, G3, and G4. Three machine learning algorithms, namely Random Forest (RF), Naive Bayes (NB), and Boosted Logistic Regression (LogitBoost), were constructed on each dataset to determine which set is most suitable for landslide susceptibility prediction. In total, 227 landslide inventory records from the study area were used, of which 70% were used for training and 30% for testing.
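The 70/30 partition of the 227 landslide records can be sketched as a seeded random split. This is a minimal illustration, not the authors' procedure: the function name `split_inventory`, the fixed seed, and rounding the training count to the nearest integer are assumptions, and the abstract does not say how the split was drawn or whether non-landslide points were sampled alongside the landslides.

```python
import random


def split_inventory(records, train_fraction=0.7, seed=42):
    """Shuffle inventory records and split them into training and testing subsets.

    Hypothetical helper: the seed and the rounding rule are illustrative assumptions.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    shuffled = list(records)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# 227 landslide records identified by index (placeholder IDs, not real data)
landslide_ids = list(range(227))
train, test = split_inventory(landslide_ids)
print(len(train), len(test))  # 159 68
```

Fixing the seed makes the partition reproducible, so the same training and testing points could be reused when each factor group (G1 to G4) is fed to RF, NB, and LogitBoost, which keeps the comparison between groups fair.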
To this end, the present research had two main objectives. The first was to investigate the effectiveness of 14 landslide conditioning factors (altitude, slope, aspect, total curvature, profile curvature, plan curvature, Stream Power Index (SPI), Topographic Wetness Index (TWI), Terrain Roughness Index (TRI), distance to fault, distance to road, distance to stream, land use, and geology) and to determine the most important ones using the variance inflation factor (VIF), Pearson's correlation, and Chi-square techniques. Consequently, four datasets were defined: the first included all 14 conditioning factors; the second included Digital Elevation Model (DEM) derivatives (morphometric factors); the third was based on only five factors, namely lithology, land use, distance to stream, distance to road, and distance to fault; and the last included eight factors selected through factor analysis and optimization. The second objective was to evaluate the sensitivity of each modeling technique (NB, RF, and LogitBoost) to the different sets of conditioning factors using the area under the curve (AUC). Ultimately, the RF model using the optimized variables (G4) achieved the highest AUC (0.940), followed by LogitBoost (0.898) and NB (0.864).
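The VIF screening step can be illustrated with a dependency-free sketch. It relies on the standard identity that the VIF of factor j, 1/(1 - R_j^2), equals the j-th diagonal element of the inverse of the predictors' Pearson correlation matrix, so one correlation matrix serves both the Pearson and the VIF checks. The helper names and toy values are hypothetical, and the abstract does not report the VIF threshold used; values above 5 or 10 are common rules of thumb for flagging multicollinearity.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Build the augmented matrix [A | I] and reduce the left half to the identity.
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular correlation matrix (perfect multicollinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(factors):
    """Map each factor name to its VIF: the diagonal of the inverse correlation matrix."""
    names = list(factors)
    corr = [[pearson(factors[a], factors[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}


# Toy values for two factors sampled at five points (not data from the study).
toy = {"slope": [1, 2, 3, 4, 5], "twi": [2, 1, 4, 3, 5]}
print(round(pearson(toy["slope"], toy["twi"]), 3))    # 0.8
print({k: round(v, 3) for k, v in vif(toy).items()})  # {'slope': 2.778, 'twi': 2.778}
```

With two factors correlated at r = 0.8, each VIF is 1/(1 - 0.64), about 2.78. Factors whose VIF exceeds the chosen threshold would be candidates for removal before the reduced groups are built; the Chi-square ranking named in the objectives is a separate step and is not shown here.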
This study investigates the effectiveness of groundwater inventory data of different sizes for groundwater spring potential mapping in the Haraz watershed, located in northern Iran. From a total of 917 groundwater inventory points, six random inventory scenarios of 917, 690, 450, 230, 92, and 46 points were generated. We trained two machine learning classifiers, namely Support Vector Machine (SVM) and Random Forest (RF), on each scenario to determine which would be more suitable for spring potential mapping. In each scenario, 70% of the data was used for training and 30% for testing. The resulting classified maps for each classifier and dataset were quantitatively assessed using the area under the curve (AUC) metric. The prediction accuracies (AUC) of the spring potential maps ranged from 0.693 to 0.736 for SVM and from 0.608 to 0.895 for RF. Our findings indicate that 46 random inventory points did not produce a desirable outcome. In contrast, more points generally yielded better results: with SVM, 450 random points produced the highest AUC (0.736), whereas with RF, 917 and 690 random points produced the highest values (0.895 and 0.877, respectively).
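The scenario design can be sketched in a few lines: draw a random subset of each size from the 917-point inventory, split it 70/30, and score test predictions with a rank-based AUC. This is a minimal illustration under stated assumptions, not the study's code: `make_scenarios` and `auc` are hypothetical names, the subsets are drawn independently with a fixed seed (the abstract does not say whether smaller subsets are nested in larger ones), and the AUC compares model scores at spring points against scores at non-spring points, whose selection the abstract does not describe. The Mann-Whitney form used here, with ties counted as one half, equals the area under the ROC curve.

```python
import random

# Scenario sizes reported in the abstract.
SCENARIO_SIZES = [917, 690, 450, 230, 92, 46]


def make_scenarios(inventory, sizes, train_fraction=0.7, seed=0):
    """Draw one random subset per size and split each into training and testing parts.

    Hypothetical helper: subsets are drawn independently of one another.
    """
    rng = random.Random(seed)
    scenarios = {}
    for size in sizes:
        sample = rng.sample(inventory, size)  # returned in random order
        n_train = round(size * train_fraction)
        scenarios[size] = (sample[:n_train], sample[n_train:])
    return scenarios


def auc(pos_scores, neg_scores):
    """Rank-based AUC: probability that a random spring point outscores a
    random non-spring point, counting ties as one half (Mann-Whitney form)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


springs = list(range(917))  # placeholder IDs for the 917 inventory points
for size, (train, test) in make_scenarios(springs, SCENARIO_SIZES).items():
    print(size, len(train), len(test))

print(round(auc([0.9, 0.7, 0.6], [0.2, 0.6, 0.1]), 4))  # 0.9444
```

The pairwise loop costs O(n·m) comparisons, which is adequate for test sets of a few hundred points such as these; a sort-based rank computation would scale better for scoring full rasters.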