A Landslide Susceptibility Assessment Method Combining SMOTE and a Convolutional Neural Network

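  • Summary: Large-scale human engineering activities induce and aggravate landslide hazards, seriously threatening engineering and environmental safety. Landslide susceptibility assessment is a key technique for landslide monitoring and early warning. Traditional landslide monitoring relies on limited data sources and lacks effective methods for mining the spatial distribution of landslide hazards and their inducing factors. To address these problems, this study takes Wanzhou District, Chongqing, China, in the Three Gorges Reservoir area, as the study area. Based on multi-source data including topographic, geological, and remote sensing imagery, 22 landslide susceptibility assessment factors were first extracted and tested for multicollinearity. Next, the synthetic minority oversampling technique (SMOTE) was used to resolve the imbalance between landslide and non-landslide samples and build the input training set. Finally, a convolutional neural network (CNN) model was constructed to quantitatively predict landslide susceptibility and generate a landslide susceptibility zoning map. The results were evaluated with receiver operating characteristic (ROC) curve analysis; the model accuracy on the test dataset reached 89.50%, indicating that the model is a high-performance landslide susceptibility assessment method.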
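The summary states that the 22 factors were tested for multicollinearity but does not name the statistic. A common choice is the variance inflation factor (VIF), and the sketch below assumes VIF together with the frequently used cut-off of 10; both are assumptions, not details reported for this study. It relies on the standard identity that the VIF of factor j equals the j-th diagonal entry of the inverse of the factors' correlation matrix. All function names are hypothetical.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of equally long factor columns.
    Constant columns (zero variance) must be removed beforehand."""
    n = len(columns[0])
    devs = []
    for col in columns:
        mean = sum(col) / n
        devs.append([x - mean for x in col])
    norms = [math.sqrt(sum(d * d for d in dv)) for dv in devs]
    k = len(columns)
    return [[sum(a * b for a, b in zip(devs[i], devs[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("factors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif(columns):
    """VIF_j is the j-th diagonal entry of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


def collinear_factors(names, columns, threshold=10.0):
    """Names of factors whose VIF exceeds the (assumed) threshold of 10."""
    return [name for name, v in zip(names, vif(columns)) if v > threshold]
```

For example, `collinear_factors(["a", "b"], [[1, 2, 3, 4], [1, 2, 3, 5]])` flags both factors (their VIF is about 29), whereas two uncorrelated factors have a VIF of 1 and are kept.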

     

    Abstract:
      Objectives  Large-scale human engineering activities induce and aggravate landslide disasters, seriously threatening engineering and environmental safety. Landslide susceptibility assessment is a key technique for landslide monitoring and early warning. Traditional landslide monitoring methods rely on limited data sources and lack effective means of mining the spatial distribution of landslide hazards and their inducing factors. The main objective of this study was therefore to investigate the predictive performance of the synthetic minority oversampling technique (SMOTE) combined with convolutional neural networks (CNN) for landslide susceptibility assessment in the Wanzhou area of the Three Gorges Reservoir, China.
      Methods   The landslide susceptibility assessment methods used in this paper are SMOTE and CNN. SMOTE is an improved random oversampling algorithm; here it was used to correct the imbalance of the input samples. CNN is a widely used deep learning network that differs from other neural networks mainly in its convolution operations; here it was used to analyze the nonlinear relationship between landslides and their influencing factors. First, 22 influencing factors were extracted from multi-source data (topographic, geological, and remote sensing data), and a multicollinearity test was performed on these factors. Second, grid cells with a spatial resolution of 25 m were selected as the mapping unit. Of the initial sample dataset, 30% was randomly selected for testing the CNN, and the remaining 70% was expanded with SMOTE to obtain a 1:1 landslide to non-landslide dataset. Of this balanced dataset, 90% was randomly selected as the training set and the remaining 10% as the validation set. Third, the training set was input to the CNN, and six model parameters were set to their optimal values to ensure the stability and efficiency of the model. Finally, the trained CNN was used to generate a landslide susceptibility zoning map, and receiver operating characteristic (ROC) curves on the testing and validation sets were used to evaluate the accuracy of the proposed model.
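The data-preparation order described in the Methods (hold out 30% for testing, balance the remaining 70% to 1:1 with SMOTE, then split 90/10 into training and validation) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the neighbour count `k=5` (a common SMOTE default), the label encoding (1 = landslide, 0 = non-landslide), and the function names `smote` and `build_datasets` are all hypothetical.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """Generate n_new synthetic samples with the SMOTE rule
    x_new = x + u * (x_nn - x), u ~ U(0, 1), where x_nn is one of the k
    nearest minority-class neighbours of a randomly chosen seed x.
    Assumes at least two minority samples when n_new > 0."""
    if rng is None:
        rng = random.Random()
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        seed = minority[i]
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(seed, p),
        )[:k]
        partner = rng.choice(neighbours)
        u = rng.random()
        synthetic.append([s + u * (q - s) for s, q in zip(seed, partner)])
    return synthetic


def build_datasets(samples, labels, test_frac=0.3, val_frac=0.1, seed=0):
    """Hypothetical pipeline: 30% original test set, SMOTE on the rest to 1:1,
    then a 90/10 train/validation split. Labels: 1 landslide, 0 non-landslide."""
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    n_test = round(test_frac * len(idx))
    test = [(samples[i], labels[i]) for i in idx[:n_test]]
    rest = [(samples[i], labels[i]) for i in idx[n_test:]]

    # Oversample whichever class is smaller in the remaining 70%.
    pos = [x for x, y in rest if y == 1]
    neg = [x for x, y in rest if y == 0]
    if len(pos) < len(neg):
        minority, majority, min_label = pos, neg, 1
    else:
        minority, majority, min_label = neg, pos, 0
    new = smote(minority, len(majority) - len(minority), rng=rng)

    balanced = rest + [(x, min_label) for x in new]
    rng.shuffle(balanced)
    n_val = round(val_frac * len(balanced))
    return balanced[n_val:], balanced[:n_val], test
```

Applying SMOTE only after the test split keeps synthetic points out of the test set, so test accuracy is measured on real samples only.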
      Results   The SMOTE-CNN model with the highest predictive accuracy and generalization ability was trained and then used to calculate landslide susceptibility indices. The index values of the grid cells range from zero to one, corresponding to landslide susceptibility from low to high, and were used to create the landslide susceptibility map. For easier visual interpretation, the susceptibility values were classified into five classes (very low, low, moderate, high, and very high) using Jenks natural breaks. Approximately 59.57% of the landslides lie in the very-high-susceptibility zone, 18.74% in the high-susceptibility zone, 10.73% in the moderate-susceptibility zone, and 10.96% in the low- and very-low-susceptibility zones. The experimental results demonstrate that the proposed model provides the best predictive accuracy; it can effectively assess landslide susceptibility and offers a novel method for landslide prediction.
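The five-class zoning and the per-class landslide percentages reported in the Results can be sketched as follows. This assumes the natural breaks are computed with the exact Fisher-Jenks dynamic programme, which minimises the total within-class sum of squared deviations; the study does not state which implementation it used, and GIS tools may use approximations. The function names and the boolean landslide mask are hypothetical.

```python
import bisect


def jenks_breaks(values, n_classes=5):
    """Exact Fisher-Jenks natural breaks by dynamic programming.
    Returns the inclusive upper bound of each class, lowest first."""
    xs = sorted(values)
    n = len(xs)
    pre, pre2 = [0.0], [0.0]  # prefix sums of x and x^2
    for x in xs:
        pre.append(pre[-1] + x)
        pre2.append(pre2[-1] + x * x)

    def ssd(i, j):  # sum of squared deviations of xs[i:j]
        s = pre[j] - pre[i]
        return pre2[j] - pre2[i] - s * s / (j - i)

    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    cut = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for c in range(1, n_classes + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):  # last class is xs[i:j]
                v = cost[c - 1][i] + ssd(i, j)
                if v < cost[c][j]:
                    cost[c][j], cut[c][j] = v, i
    bounds, j = [], n
    for c in range(n_classes, 0, -1):
        bounds.append(xs[j - 1])
        j = cut[c][j]
    return bounds[::-1]


def classify(value, bounds):
    """Class index 0 (very low) to len(bounds) - 1 (very high)."""
    return min(bisect.bisect_left(bounds, value), len(bounds) - 1)


def landslide_share(indices, is_landslide, bounds):
    """Percentage of landslide cells falling in each susceptibility class."""
    counts = [0] * len(bounds)
    for v, hit in zip(indices, is_landslide):
        if hit:
            counts[classify(v, bounds)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(bounds)
    return [100.0 * c / total for c in counts]
```

The dynamic programme costs O(k·n²), so for a full 25 m raster the breaks would typically be computed on a random sample of cell indices, after which every cell is assigned with `classify`.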
      Conclusions   SMOTE and CNN were applied to quantitatively map landslide susceptibility in the Wanzhou area of the Three Gorges Reservoir, China. The experimental results show that the SMOTE-CNN approach is stable and that its results agree with field investigations, so it can serve as a reference for landslide prevention and mitigation in the Three Gorges Reservoir area. In addition, CNN is highly adaptable and is good at mining local features of the data, extracting global features during training, and performing classification. However, the physical meaning of the CNN model is unclear, and processing large amounts of data takes a long time, so the method relies on GPU acceleration. Further research on hyperparameter optimization, model structure adjustment, and applicability analysis is therefore needed to improve the accuracy and robustness of the model.

     
