Optimized negative sampling strategies of gradient boosting decision tree and random forest for evaluating Wenchuan coseismic landslides susceptibility mapping
Abstract: Landslides induced by strong earthquakes are numerous, widely distributed, and large in scale, and they seriously threaten lives and property. Landslide susceptibility mapping (LSM) can quickly predict the spatial distribution of landslide-prone areas, which is important for reducing post-earthquake hazards. However, in coseismic LSM, how to select landslide negative samples and couple the selection with machine learning models to improve evaluation accuracy still needs comparative study. This study takes the landslides induced by the Wenchuan Earthquake in a mountainous area as a case study. First, 10 landslide influencing factors covering topography, geological environment, and seismic parameters were selected to analyze the spatial distribution of landslides. Second, collinearity analysis was used to check the factors for redundancy. Third, the frequency ratio (FR) method was used as the sampling strategy, with negative sample points randomly selected from the very low and low susceptibility zones. Finally, Gradient Boosting Decision Tree (GBDT), Random Forest (RF), and the coupled models FR-GBDT and FR-RF were used to map coseismic landslide susceptibility and to compare model accuracy. The results show that: ① the spatial distribution of landslides is controlled by multi-level factors; ② model accuracy ranks as FR-RF (AUC = 0.943) > FR-GBDT (AUC = 0.926) > RF (AUC = 0.901) > GBDT (AUC = 0.856). Selecting negative samples in low-susceptibility areas markedly improves LSM accuracy. The results can serve as a reference for negative sample selection and model construction in LSM, and they provide theoretical support for post-earthquake landslide disaster prevention and mitigation.
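The FR-based negative sampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy data, the use of a summed FR value as the susceptibility index, and the `low_fraction` cutoff standing in for the "very low and low" zones are all assumptions made for illustration. The frequency ratio of a factor class is taken here in its usual form, the share of landslide cells falling in the class divided by the share of all cells falling in the class.

```python
import random
from collections import Counter


def frequency_ratio(classes, labels):
    """FR per class: (landslide share in class) / (area share of class)."""
    total_cells = len(classes)
    total_slides = sum(labels)
    cells_per_class = Counter(classes)
    slides_per_class = Counter(c for c, y in zip(classes, labels) if y == 1)
    return {
        c: (slides_per_class[c] / total_slides) / (n / total_cells)
        for c, n in cells_per_class.items()
    }


def sample_negatives(factors, labels, n_samples, low_fraction=0.4, seed=0):
    """Randomly draw non-landslide cells from the lowest-FR part of the area.

    `low_fraction` is a hypothetical cutoff approximating the low
    susceptibility zones; the paper's zoning rule is not given in the abstract.
    """
    fr_tables = [frequency_ratio(col, labels) for col in factors]
    # Assumed susceptibility index: sum of class FR values over all factors.
    index = [
        sum(table[col[i]] for table, col in zip(fr_tables, factors))
        for i in range(len(labels))
    ]
    candidates = sorted(
        (i for i, y in enumerate(labels) if y == 0), key=lambda i: index[i]
    )
    pool = candidates[: max(n_samples, int(len(candidates) * low_fraction))]
    return random.Random(seed).sample(pool, n_samples)


# Toy example: one factor (slope class) over eight grid cells, 1 = landslide.
slope = ["steep", "steep", "steep", "steep", "gentle", "gentle", "gentle", "gentle"]
labels = [1, 1, 1, 0, 0, 0, 0, 1]

print(frequency_ratio(slope, labels))  # {'steep': 1.5, 'gentle': 0.5}
negatives = sample_negatives([slope], labels, n_samples=2, low_fraction=0.5)
print(sorted(negatives))  # [4, 5]
```

In a full workflow, the selected negative cells would be combined with the landslide cells to train a classifier such as GBDT or RF, and the result compared against purely random negative sampling using AUC. That training step is omitted here.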