Application of certainty factor and random forests model in landslide susceptibility evaluation in Mangshi City, Yunnan Province
Abstract: Scientifically sound landslide susceptibility zoning maps can effectively reduce disaster losses. Taking Mangshi City, Yunnan Province, as the study area, this paper uses the certainty factor (CF) method to calculate the sensitivity value of each factor class and feeds these values to a random forests (RF) model as categorical input; suitable training data and optimized model parameters are then selected to predict and zone landslide susceptibility across the study area. Continuous factors are discretized with the frequency ratio method so that the certainty factor can express the landslide susceptibility of each interval of a factor, and the CF model also serves as a prior model for selecting negative samples. The optimal RF parameters are obtained from the out-of-bag (OOB) error, after which the RF model is trained and applied to the study area. The predictions are evaluated quantitatively with the ROC curve and qualitatively against 3D remote sensing imagery. The resulting model reaches an accuracy of 91%, better than the result obtained with random sampling. Finally, the importance of each factor is calculated and assessed with two measures: mean decrease in Gini impurity and mean decrease in accuracy. The resulting susceptibility assessment provides a basis for disaster risk assessment and management in the area.
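The abstract does not spell out the certainty factor formula. The following is a minimal sketch, assuming the piecewise form commonly used in landslide susceptibility studies: `ppa` is taken to be the landslide density within one factor class and `pps` the landslide density over the whole study area. The function name, the parameter names and the sample densities are hypothetical and are not taken from the paper.

```python
def certainty_factor(ppa: float, pps: float) -> float:
    """Certainty factor of one factor class, in [-1, 1].

    ppa: landslide density within the class (assumed definition).
    pps: landslide density over the whole study area (prior).
    """
    if not (0.0 <= ppa < 1.0 and 0.0 < pps < 1.0):
        raise ValueError("ppa must be in [0, 1) and pps in (0, 1)")
    if ppa >= pps:
        # Class is at least as landslide-prone as the area average: CF >= 0.
        return (ppa - pps) / (ppa * (1.0 - pps))
    # Class is less landslide-prone than the area average: CF < 0.
    return (ppa - pps) / (pps * (1.0 - ppa))


# Hypothetical densities, for illustration only.
cf_prone = certainty_factor(ppa=0.02, pps=0.01)    # positive
cf_stable = certainty_factor(ppa=0.005, pps=0.01)  # negative
cf_empty = certainty_factor(ppa=0.0, pps=0.01)     # no landslides in class
```

Under this form, a class with no recorded landslides gets CF = -1 and a class at the area average gets CF = 0, which is what makes the value usable both as a model input and as a prior for picking negative samples from low-CF cells.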
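Table 1. Data categories and features

| Data type | Spatial resolution or scale | Purpose |
| --- | --- | --- |
| GF-1 | Multispectral 8 m, panchromatic 2 m | Supplementing and correcting the road data |
| Landsat 8 | Multispectral 30 m, panchromatic 15 m | Computing NDVI to derive vegetation coverage |
| Topographic map | 1:50 000 | Generating the digital elevation model (DEM); computing slope, aspect, etc. |
| Geological map | 1:50 000 | Extracting stratigraphic lithology |
| Structural outline map | 1:50 000 | Extracting faults |
| Geomorphological map | 1:50 000 | Extracting the landforms of the study area |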
Table 2. Discretization results of factors
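| Factor | Discretization |
| --- | --- |
| Landform type | Lacustrine terrace and hillock terrain; volcanic cone terrain; ridge-and-trough valley terrain; intermontane river-valley alluvial plain; karst mid-mountain canyon terrain; moderately dissected mid-mountain terrain; moderately dissected mid-mountain steep-slope terrain; moderately dissected mid-mountain ridged terrain |
| Lithology | Thin- to medium-bedded, relatively hard clastic rock group; thin- to medium-bedded, weakly weathered carbonate rock group; thin- to medium-bedded, relatively soft, moderately weathered clastic rock group; thin-bedded, strongly weathered metamorphic rock group; massive, relatively soft, strongly weathered granite group; massive, weakly weathered granite group; loose soil; medium- to thick-bedded, weakly weathered metamorphic rock group; medium- to thick-bedded, hard carbonate rock group |
| Slope/(°) | [0, 9], (9, 18], (18, 21], (21, 30], (30, 68] |
| Aspect | N, NE, E, SE, S, SW, W, NW |
| Elevation/m | [528, 900], (900, 1 200], (1 200, 1 600], (1 600, 2 000], (2 000, 2 865] |
| Distance to fault/m | [0, 300], (300, 600], (600, 900], (900, 1 200], (1 200, 1 500], (1 500, +∞) |
| Distance to road/m | [0, 300], (300, 600], (600, 900], (900, 1 200], (1 200, +∞) |
| Profile curvature | [-3, -1.5], (-1.5, -0.5], (-0.5, 0], (0, 1], (1, 3] |
| Vegetation coverage | [0, 0.3], (0.3, 0.5], (0.5, 0.75], (0.75, 0.9], (0.9, 1.0] |
| Topographic relief/m | [0, 10], (10, 30], (30, 40], (40, 70], (70, 170] |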
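Assigning a continuous value to one of the right-closed intervals in Table 2 can be sketched as follows, using the slope classes as the example. The helper name `slope_class` and the 0-based class index are illustrative choices, not part of the paper.

```python
from bisect import bisect_left

# Upper bounds of the right-closed slope classes from Table 2, in degrees:
# [0, 9], (9, 18], (18, 21], (21, 30], (30, 68]
SLOPE_UPPER_BOUNDS = [9, 18, 21, 30, 68]


def slope_class(slope_deg: float) -> int:
    """Return the 0-based index of the slope class containing slope_deg."""
    if not 0 <= slope_deg <= SLOPE_UPPER_BOUNDS[-1]:
        raise ValueError(f"slope {slope_deg} is outside [0, 68]")
    # bisect_left places a value equal to a bound in the class that bound closes,
    # which matches the (lower, upper] convention of the table.
    return bisect_left(SLOPE_UPPER_BOUNDS, slope_deg)


classes = [slope_class(s) for s in (0, 9, 9.5, 21, 30.1, 68)]
```

The same pattern applies to the other interval factors in the table by swapping in their upper bounds; the open-ended distance classes would need the range check relaxed.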
Table 3. Multicollinearity check
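| Factor | Slope | Geological structure | Lithology | Landform | Elevation | Aspect | Profile curvature | Vegetation coverage | Topographic relief | Road |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Tolerance T | 0.776 | 0.934 | 0.711 | 0.665 | 0.743 | 0.831 | 0.969 | 0.810 | 0.752 | 0.988 |
| Variance inflation factor VIF | 1.289 | 1.071 | 1.407 | 1.505 | 1.346 | 1.204 | 1.032 | 1.234 | 1.331 | 1.012 |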
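Tolerance and the variance inflation factor are reciprocals (VIF = 1/T), so the two rows of Table 3 can be cross-checked against each other. The sketch below does that with the tabulated values. The screening thresholds (T > 0.1, VIF < 10) are a widely used rule of thumb and are an assumption here, since this excerpt does not state the criterion the authors applied.

```python
# Tolerance and VIF values as reported in Table 3.
TOLERANCE = {
    "slope": 0.776, "geological structure": 0.934, "lithology": 0.711,
    "landform": 0.665, "elevation": 0.743, "aspect": 0.831,
    "profile curvature": 0.969, "vegetation coverage": 0.810,
    "topographic relief": 0.752, "road": 0.988,
}
REPORTED_VIF = {
    "slope": 1.289, "geological structure": 1.071, "lithology": 1.407,
    "landform": 1.505, "elevation": 1.346, "aspect": 1.204,
    "profile curvature": 1.032, "vegetation coverage": 1.234,
    "topographic relief": 1.331, "road": 1.012,
}


def vif_from_tolerance(tolerance: float) -> float:
    """VIF is the reciprocal of tolerance."""
    return 1.0 / tolerance


# Largest gap between 1/T and the tabulated VIF (rounding differences only).
max_abs_diff = max(
    abs(vif_from_tolerance(t) - REPORTED_VIF[name]) for name, t in TOLERANCE.items()
)

# Assumed rule-of-thumb screen: flag a factor if T <= 0.1 or VIF >= 10.
flagged = [
    name for name, t in TOLERANCE.items()
    if t <= 0.1 or REPORTED_VIF[name] >= 10
]
```

With the values in the table, no factor is flagged under these assumed thresholds.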
Table 4. Confusion matrix
| Count | Predicted 0 | Predicted 1 | Classification error/% |
| --- | --- | --- | --- |
| Actual 0 | 316 | 1 | 3 |
| Actual 1 | 15 | 91 | 14 |
Table 5. Regional statistics of susceptibility results
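| Susceptibility level | Landslide points | Grid cells in class | Share of landslides/% | Share of grid cells/% | Landslide ratio |
| --- | --- | --- | --- | --- | --- |
| Low | 4 | 141 923 | 1.13 | 47.74 | 0.023 7 |
| Relatively low | 38 | 64 075 | 10.76 | 21.55 | 0.499 4 |
| Moderate | 95 | 46 742 | 26.91 | 15.72 | 1.711 6 |
| Relatively high | 99 | 28 904 | 28.05 | 9.72 | 2.884 5 |
| High | 117 | 15 639 | 33.14 | 5.26 | 6.300 5 |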
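The landslide ratio in Table 5 is consistent with dividing each level's share of landslide points by its share of grid cells. A minimal sketch that recomputes the column from the tabulated counts (the function and variable names are illustrative):

```python
# (susceptibility level, landslide points, grid cells) from Table 5.
ROWS = [
    ("low", 4, 141_923),
    ("relatively low", 38, 64_075),
    ("moderate", 95, 46_742),
    ("relatively high", 99, 28_904),
    ("high", 117, 15_639),
]


def landslide_ratios(rows):
    """Share of landslide points divided by share of grid cells, per level."""
    total_points = sum(points for _, points, _ in rows)
    total_cells = sum(cells for _, _, cells in rows)
    return {
        level: (points / total_points) / (cells / total_cells)
        for level, points, cells in rows
    }


ratios = landslide_ratios(ROWS)
for level, ratio in ratios.items():
    print(f"{level}: {ratio:.4f}")
```

A ratio above 1 means the level holds more landslides than its area alone would suggest; the values rise steadily from the low to the high susceptibility level.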