Collapse susceptibility evaluation based on an improved two-step sampling strategy and a convolutional neural network
-
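Summary:
Machine learning has been widely studied and applied in the susceptibility analysis and evaluation of geological hazards such as collapses, landslides and debris flows. Selecting nonhazard samples is a key problem in susceptibility modelling, and traditional random sampling and manual labelling can introduce randomness and subjectivity. Treating soil collapse susceptibility evaluation as a positive and unlabeled (PU) learning problem, this study proposes a two-step convolutional neural network (CNN) framework, ISpy-CNN, that combines the information value (IV) model with the Spy technique. Using the collapse inventory of Huangpu District, Guangzhou, and 15 categories of basic environmental factors, a subset of low-information-value samples was first screened with the information value model. A CNN model was then trained with the Spy technique to identify high-confidence reliable negatives among the low-information-value samples, and these were taken as noncollapse samples. The proposed framework, the traditional Spy technique and random sampling were then compared and verified using support vector machine (SVM) and random forest (RF) models. The results show that, on the validation set, the accuracy, F1 score, sensitivity and specificity of the ISpy-CNN framework were 6.82%, 6.82%, 6.82% and 8.23% higher, respectively, than those of random sampling, and 2.86%, 2.89%, 2.86% and 2.31% higher, respectively, than those of the traditional Spy technique. In step 2 of PU learning, the CNN model achieved higher prediction accuracy than the RF and SVM models. Compared with the traditional Spy technique, when the same number of training samples was added, the sample set screened by the ISpy-CNN framework showed higher stability, prediction accuracy and growth rate. The ISpy-CNN framework proposed in this study better assists the selection of high-quality nonhazard samples, and the resulting collapse susceptibility zoning is more consistent with the actual spatial distribution of collapses.
Abstract:
Objective: Machine learning has been widely applied to the susceptibility analysis of collapses, landslides and debris flows. The selection of nonhazard samples is a key issue in susceptibility modelling, and traditional random sampling and manual labelling methods can introduce randomness and subjectivity.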
Methods: To address the randomness and limited representativeness of noncollapse samples, this paper treats soil collapse susceptibility evaluation as a positive-unlabelled (PU) learning problem and proposes a two-step convolutional neural network framework (ISpy-CNN) that combines an information value model with the Spy technique. First, 15 collapse-related factors were selected for modelling based on the geomorphological, geological, hydrological and anthropogenic conditions of the study area. The information value model was then used to screen low-information-value samples that reflect the distribution structure of noncollapse samples. Next, a CNN model was trained with the Spy technique to identify high-confidence negative samples among the low-information-value samples, and these were labelled as noncollapse samples. Finally, based on the proposed framework, the traditional Spy technique and random sampling, support vector machine (SVM) and random forest (RF) models were used for comparison to verify the reliability, prediction accuracy and data sensitivity of the proposed learning framework.
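The two screening steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The information value is taken in its commonly used form, IV = ln((N_i/N)/(S_i/S)), where N_i of the N recorded collapses fall in a factor class covering S_i of the S grid cells. The Spy step follows the usual recipe of hiding a fraction of the positives among the candidates and setting the threshold from the scores of those hidden positives. All function names, the 15% spy ratio, the 5% noise level and the one-feature centroid scorer that stands in for the CNN are hypothetical choices made for this sketch.

```python
import math
import random


def information_value(collapses_in_class, collapses_total, cells_in_class, cells_total):
    """Information value of one factor class: ln((N_i / N) / (S_i / S))."""
    if collapses_in_class == 0:
        return float("-inf")  # no recorded collapse falls in this class
    hazard_share = collapses_in_class / collapses_total
    area_share = cells_in_class / cells_total
    return math.log(hazard_share / area_share)


def fit_centroid_scorer(positives, negatives):
    """Hypothetical stand-in for the CNN: a logistic score on one feature."""
    mean_pos = sum(positives) / len(positives)
    mean_neg = sum(negatives) / len(negatives)
    midpoint = (mean_pos + mean_neg) / 2
    sign = 1.0 if mean_pos >= mean_neg else -1.0
    return lambda x: 1.0 / (1.0 + math.exp(-sign * (x - midpoint)))


def spy_reliable_negatives(positives, candidates, fit_scorer,
                           spy_ratio=0.15, noise=0.05, seed=0):
    """Select candidates that score below almost all hidden positives (spies)."""
    rng = random.Random(seed)
    shuffled = list(positives)
    rng.shuffle(shuffled)
    n_spies = max(1, round(spy_ratio * len(shuffled)))
    spies, kept = shuffled[:n_spies], shuffled[n_spies:]
    # Train with the spies mixed into the candidate set, all treated as negative.
    score = fit_scorer(kept, list(candidates) + spies)
    spy_scores = sorted(score(x) for x in spies)
    # Threshold below which at most a `noise` fraction of the spies fall.
    threshold = spy_scores[int(noise * len(spy_scores))]
    return [x for x in candidates if score(x) < threshold]


# Step 1 (toy numbers): a class holding 5% of the collapses on 20% of the area
# has a negative information value, so its cells are low-IV candidates.
iv_low = information_value(5, 100, 2000, 10000)

# Step 2 (toy one-feature samples): two candidates resemble the positives.
collapse_samples = [2.1, 1.8, 2.5, 1.9, 2.2, 2.8, 1.7, 2.4, 2.0, 2.6]
low_iv_candidates = [-1.5, -0.8, 0.2, 2.9, -2.0, 3.1, -0.3, -1.1]
reliable = spy_reliable_negatives(collapse_samples, low_iv_candidates,
                                  fit_centroid_scorer)
print(iv_low < 0, reliable)
```

Passing the scorer in as `fit_scorer` keeps the Spy logic independent of the classifier, so a CNN, an SVM or a random forest could in principle be swapped in for the second step.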
Results: On the validation set, the proposed ISpy-CNN method improved the accuracy, F1 score, sensitivity and specificity by 6.82%, 6.82%, 6.82% and 8.23%, respectively, compared with random sampling, and by 2.86%, 2.89%, 2.86% and 2.31%, respectively, compared with the traditional Spy technique. In step 2 of PU learning, the CNN model achieved higher prediction accuracy than the RF and SVM models. When the same number of training samples was added, the sample set screened by the ISpy-CNN framework showed greater stability, prediction accuracy and growth rate than the set screened by the traditional Spy technique.
Conclusion: The ISpy-CNN framework proposed in this paper better assists the selection of high-quality nonhazard samples, and the resulting collapse susceptibility zoning is more consistent with the actual spatial distribution of collapses.
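For readers who want to reproduce the four metrics named in the Results, the sketch below computes them from binary confusion-matrix counts using their standard definitions. The function name and the counts are hypothetical and are not taken from the paper. The source does not state whether class-averaged variants of these metrics were used, so this should be read as the plain binary case only.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, F1, sensitivity and specificity from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of collapse samples detected
    specificity = tn / (tn + fp)   # share of noncollapse samples rejected
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


# Hypothetical validation counts, not taken from the paper.
metrics = binary_metrics(tp=90, fp=10, tn=85, fn=15)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```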
-
Table 1. Selected hyperparameter settings of the proposed CNN model
CNN parameter | Value
Convolution kernel size | 3×1
Pooling kernel size | 2×1
Activation function | ReLU
Optimizer | Adam
Loss function | Binary cross-entropy
Learning rate | 0.005
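To make the Table 1 settings concrete, the sketch below pushes one 15-factor sample through a single 3×1 convolution, ReLU and 2×1 max pooling, and evaluates the binary cross-entropy for one prediction. It is an illustration only: the kernel weights and factor values are hypothetical, the 3×1 kernel is assumed to slide along the factor axis without padding, and the number of filters and layers in the actual model is not given here. The Adam optimizer and learning rate are not exercised because no training is performed.

```python
import math


def conv1d_valid(values, kernel):
    """Slide the kernel over the values without padding (output shrinks)."""
    k = len(kernel)
    return [sum(values[i + j] * kernel[j] for j in range(k))
            for i in range(len(values) - k + 1)]


def relu(values):
    """Replace negative activations with zero."""
    return [max(0.0, v) for v in values]


def max_pool(values, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]


def binary_cross_entropy(label, probability, eps=1e-7):
    """Loss for one sample; the probability is clipped to avoid log(0)."""
    p = min(max(probability, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


# Hypothetical normalized values for the 15 factors of one sample.
factors = [0.2, 0.9, 0.4, 0.1, 0.7, 0.3, 0.8, 0.5,
           0.6, 0.0, 1.0, 0.35, 0.15, 0.65, 0.45]
kernel = [0.5, -1.0, 0.5]  # hypothetical 3x1 kernel weights

feature_map = relu(conv1d_valid(factors, kernel))
pooled = max_pool(feature_map)
print(len(factors), len(feature_map), len(pooled))  # prints 15 13 6
```

Under these assumptions a 15-element input shrinks to 13 values after the convolution and to 6 after pooling, which bounds how many such stages a factor vector of this length can pass through.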
Table 2. Performance comparison of CNN models using different sampling methods
Method | ACC/% | F1/% | Sensitivity/% | Specificity/%
Random sampling | 85.08 | 85.07 | 85.08 | 84.94
Spy | 89.04 | 89.00 | 89.04 | 90.86
ISpy | 91.91 | 91.90 | 91.91 | 93.17
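The improvements quoted in the abstract can be checked against Table 2 by simple subtraction, as in the sketch below. The dictionary layout and the function name are assumptions made for this illustration. Differences computed from the rounded table values agree with the abstract's figures to within about 0.01, a gap that may come from the abstract having been computed before rounding. The gains are differences between percentages, that is, percentage points.

```python
# Values copied from Table 2: (ACC, F1, sensitivity, specificity) in percent.
table2 = {
    "Random sampling": (85.08, 85.07, 85.08, 84.94),
    "Spy": (89.04, 89.00, 89.04, 90.86),
    "ISpy": (91.91, 91.90, 91.91, 93.17),
}


def gains(method, baseline):
    """Percentage-point gain of `method` over `baseline` for each metric."""
    return tuple(round(m - b, 2)
                 for m, b in zip(table2[method], table2[baseline]))


print("ISpy vs random sampling:", gains("ISpy", "Random sampling"))
print("ISpy vs Spy:", gains("ISpy", "Spy"))
```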
Table 3. Frequency ratio statistics of collapse susceptibility evaluation
Susceptibility level | Spy SVM | Spy RF | Spy CNN | ISpy SVM | ISpy RF | ISpy CNN
Very low | 0.00 | 0.00 | 0.03 | 0.03 | 0.00 | 0.01
Low | 0.32 | 0.22 | 0.48 | 0.47 | 0.10 | 0.16
Moderate | 1.54 | 0.83 | 1.12 | 1.23 | 0.57 | 0.44
High | 4.94 | 2.25 | 1.35 | 4.49 | 1.59 | 1.06
Very high | 6.48 | 7.46 | 7.83 | 6.75 | 8.00 | 8.19
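As a reading aid for Table 3, the sketch below computes a frequency ratio per susceptibility level using the usual definition: the share of recorded collapses falling in a level divided by the share of the study area that level occupies. The function name and all counts are hypothetical and are not taken from the paper. A ratio above 1 means collapses are over-represented in that level, so a well-separated map shows ratios that rise from the very low to the very high level.

```python
def frequency_ratio(collapses_in_class, collapses_total, cells_in_class, cells_total):
    """Share of collapses in a class divided by the share of area it occupies."""
    return (collapses_in_class / collapses_total) / (cells_in_class / cells_total)


# Hypothetical counts per susceptibility level: (collapses, grid cells).
levels = {
    "Very low": (0, 4000),
    "Low": (5, 2500),
    "Moderate": (10, 1500),
    "High": (25, 1200),
    "Very high": (60, 800),
}
collapses_total = sum(c for c, _ in levels.values())
cells_total = sum(n for _, n in levels.values())

ratios = {name: frequency_ratio(c, collapses_total, n, cells_total)
          for name, (c, n) in levels.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
```

In Table 3, each ISpy model reaches a higher ratio in the very high level than its Spy counterpart (6.75, 8.00 and 8.19 against 6.48, 7.46 and 7.83).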