Outlier detection method for geotechnical engineering based on MetaOD model selection
Abstract: Field and laboratory test data of geotechnical parameters are the foundation of engineering construction, design and evaluation. Abnormal data often mislead the determination of construction and design parameters, so data anomaly detection is a basic but extremely important task for ensuring the safety and reliability of a project. Traditional anomaly detection algorithms lack a model selection step, which makes detection blind. To address this, this paper proposes an anomaly detection model system that combines meta-learning outlier detection (MetaOD) with data mining algorithms. According to the characteristics of the data, the system first selects the initial model types and parameters suited to different data types and averages the parameters of the selected algorithms of the same type. It then uses the selected algorithm to diagnose data anomalies, thereby improving the accuracy of anomaly detection. To evaluate the effectiveness of the model, the glass dataset from the University of California, Irvine (UCI) Machine Learning Repository is used as a test case. The results show that the precision of anomaly detection with this model system reaches 96.41%, well above the 89.27% to 90.24% obtained by the other detection algorithms. Finally, the model system is applied to the uniaxial compressive strength dataset of the Macau granite and to the groundwater level monitoring data of the Junchang Tunnel, identifying 9 and 10 outliers, respectively.
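Table 1. Confusion matrix of classification results

| Actual class | Predicted positive | Predicted negative |
|---|---|---|
| Positive | TP (true positive) | FN (false negative) |
| Negative | FP (false positive) | TN (true negative) |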
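Table 1 uses the cell names TP, FN, FP and TN. From these, precision (the precision column of Tables 3 and 4) is TP/(TP + FP) and recall is TP/(TP + FN). The short sketch below computes both for one confusion matrix. The function names are illustrative.

```python
def precision(tp: int, fp: int) -> float:
    """Share of predicted positives that are actual positives: TP / (TP + FP)."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Share of actual positives that are predicted positive: TP / (TP + FN)."""
    return tp / (tp + fn)


# ABOD(16) row of Table 3: TP = 188, FN = 17, FP = 7, TN = 2
tp, fn, fp, tn = 188, 17, 7, 2
print(f"precision = {100 * precision(tp, fp):.2f}%")  # precision = 96.41%
print(f"recall = {100 * recall(tp, fn):.2f}%")  # recall = 91.71%
```

The same functions can be applied to the COF row of Table 4 (TP = 185, FN = 20, FP = 7). They give a precision of 185/192 ≈ 96.35% and a recall of 185/205 ≈ 90.24%. The value listed for COF in Table 4 matches the latter ratio.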
Table 2. Model selection results for the glass dataset

| Rank | Model | Parameters (ABOD: number of neighbors) |
|---|---|---|
| 1 | ABOD | 5 |
| 2 | ABOD | 15 |
| 3 | ABOD | 20 |
| 4 | ABOD | 25 |
| 5 | IForest | (20, 0.2) |
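The parameter-averaging step described in the abstract can be sketched as follows. The sketch rests on two assumptions: only the entries that share the top-ranked model type are averaged, and integer parameters are rounded to the nearest integer. Both rules are chosen because they reproduce ABOD(16) in Table 3 from the four ABOD entries above, since (5 + 15 + 20 + 25) / 4 = 16.25. The function name `averaged_config` is hypothetical.

```python
from statistics import mean

# Top-5 MetaOD ranking for the glass dataset (Table 2): (model, parameters)
ranking = [
    ("ABOD", (5,)),
    ("ABOD", (15,)),
    ("ABOD", (20,)),
    ("ABOD", (25,)),
    ("IForest", (20, 0.2)),
]


def averaged_config(ranking):
    """Average the parameters of the entries sharing the top-ranked model type.

    Assumption: integer-valued parameters are rounded to the nearest integer.
    """
    best_type = ranking[0][0]
    same_type = [params for model, params in ranking if model == best_type]
    averaged = []
    for values in zip(*same_type):  # one tuple per parameter position
        m = mean(values)
        averaged.append(round(m) if all(isinstance(v, int) for v in values) else m)
    return best_type, tuple(averaged)


print(averaged_config(ranking))  # ('ABOD', (16,))
```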
Table 3. Confusion matrix and precision of the glass dataset under different parameters

| Model | TP | FN | FP | TN | Precision (%) | Run time (s) |
|---|---|---|---|---|---|---|
| ABOD(5) | 187 | 18 | 7 | 2 | 96.39 | 0.1008 |
| ABOD(15) | 186 | 19 | 7 | 2 | 96.37 | 0.4359 |
| ABOD(20) | 187 | 18 | 7 | 2 | 96.39 | 0.7700 |
| ABOD(25) | 186 | 19 | 6 | 3 | 96.37 | 1.3644 |
| ABOD(16) | 188 | 17 | 7 | 2 | 96.41 | 0.4849 |
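The ABOD models in Tables 2 to 4 are parameterized by the neighborhood size. Below is a minimal stdlib-only sketch of a neighborhood-restricted angle-based outlier factor, in the spirit of the fast variant described by Kriegel et al. It is a simplification: it uses the plain (unweighted) variance, omits the distance weighting of the original definition, and assumes enough distinct points for every point to have at least two neighbors. It is not the implementation used in the paper, and `abof_scores` is a hypothetical name.

```python
import math
from itertools import combinations


def abof_scores(points, k):
    """Simplified angle-based outlier factor over the k nearest neighbors.

    For each point A and every pair (B, C) among its k nearest neighbors,
    compute <AB, AC> / (|AB|^2 * |AC|^2); the score is the variance of these
    values. A small score means A sees its neighbors within a narrow range of
    directions, so A is more likely to be an outlier.
    """
    scores = []
    for i, a in enumerate(points):
        others = [p for j, p in enumerate(points) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(a, p))[:k]
        values = []
        for b, c in combinations(neighbors, 2):
            ab = [bj - aj for aj, bj in zip(a, b)]
            ac = [cj - aj for aj, cj in zip(a, c)]
            nab2 = sum(x * x for x in ab)
            nac2 = sum(x * x for x in ac)
            if nab2 == 0 or nac2 == 0:  # skip neighbors that coincide with A
                continue
            dot = sum(x * y for x, y in zip(ab, ac))
            values.append(dot / (nab2 * nac2))
        avg = sum(values) / len(values)
        scores.append(sum((v - avg) ** 2 for v in values) / len(values))
    return scores


# Toy data: a 3 x 3 grid plus one distant point (index 9).
grid = [(x, y) for x in (-1, 0, 1) for y in (-1, 0, 1)]
pts = grid + [(10, 10)]
scores = abof_scores(pts, k=4)
print(scores.index(min(scores)))  # 9: the distant point has the smallest ABOF
```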
Table 4. Detection results of the proposed model compared with conventional algorithms

| Model | TP | FN | FP | TN | Precision (%) | Run time (s) |
|---|---|---|---|---|---|---|
| ABOD(16) | 188 | 17 | 7 | 2 | 96.41 | 0.4849 |
| COF | 185 | 20 | 7 | 2 | 90.24 | 0.1958 |
| HBOS | 184 | 21 | 8 | 1 | 89.76 | 1.3035 |
| OCSVM | 184 | 21 | 8 | 1 | 89.76 | 0.0148 |
| LODA | 184 | 21 | 8 | 1 | 89.76 | 0.0274 |
| CBLOF | 185 | 20 | 7 | 2 | 90.24 | 1.0967 |
| COPOD | 184 | 21 | 8 | 1 | 89.76 | 0.0369 |
| MCD | 183 | 22 | 9 | 0 | 89.27 | 0.0605 |
| PCA | 184 | 21 | 8 | 1 | 89.76 | 0.0126 |
| IForest | 185 | 20 | 7 | 2 | 90.24 | 0.1713 |
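Table 5. Correlations among variables of the Grade III granite dataset from Macau

| | UCS | IS50 | RL | vp | ηe | Gs |
|---|---|---|---|---|---|---|
| UCS | 1 | 0.75 | 0.66 | 0.77 | -0.56 | 0.48 |
| IS50 | | 1 | 0.65 | 0.46 | -0.50 | 0.41 |
| RL | | | 1 | 0.50 | -0.68 | 0.57 |
| vp | | | | 1 | -0.44 | 0.37 |
| ηe | | | | | 1 | 0.91 |
| Gs | | | | | | 1 |

Note: UCS, uniaxial compressive strength; IS50, point load index; RL, Schmidt hammer rebound value; vp, P-wave velocity; ηe, effective porosity; Gs, relative density.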
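Table 5 does not state which correlation coefficient was used. Assuming Pearson's r, the upper-triangular layout can be produced as sketched below. The column names and toy values in the usage lines are hypothetical and are not the Macau data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Upper-triangular table {(row, col): r}, rounded to two decimals as in Table 5."""
    names = list(columns)
    return {
        (a, b): round(pearson(columns[a], columns[b]), 2)
        for i, a in enumerate(names)
        for b in names[i:]  # diagonal and entries to its right only
    }


# Hypothetical toy columns, for illustration only.
toy = {"A": [1.0, 2.0, 3.0, 4.0], "B": [2.0, 4.1, 5.9, 8.2], "C": [4.0, 3.0, 2.0, 1.0]}
print(correlation_matrix(toy))
```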
Table 6. Model selection results for the Grade III granite dataset from Macau

| Rank | Model | Parameters |
|---|---|---|
| 1 | HBOS | (5, 0.1) |
| 2 | IForest | (20, 0.5) |
| 3 | IForest | (75, 0.3) |
| 4 | IForest | (200, 0.2) |
| 5 | IForest | (20, 0.2) |
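Table 6 ranks HBOS, the histogram-based outlier score of Goldstein and Dengel, first for the granite data. A minimal stdlib-only sketch of the basic score follows. It assumes that the first value of the pair (5, 0.1) is the number of bins and does not model the second value. The function name and the toy data are illustrative.

```python
import math


def hbos_scores(data, n_bins=5):
    """Basic histogram-based outlier score with equal-width bins per feature.

    Each feature gets its own histogram, normalized so the tallest bin has
    height 1; a sample's score is the sum over features of log(1 / height).
    Higher scores indicate more anomalous samples.
    """
    n_features = len(data[0])
    scores = [0.0] * len(data)
    for f in range(n_features):
        column = [row[f] for row in data]
        lo, hi = min(column), max(column)
        width = (hi - lo) / n_bins or 1.0  # guard against a constant feature
        bins = [min(int((v - lo) / width), n_bins - 1) for v in column]
        counts = [bins.count(b) for b in range(n_bins)]
        peak = max(counts)
        for i, b in enumerate(bins):
            scores[i] += math.log(peak / counts[b])  # log(1 / normalized height)
    return scores


# Toy data: four clustered samples and one distant sample (index 4).
data = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.05], [5.0, 5.0]]
scores = hbos_scores(data, n_bins=5)
print(scores.index(max(scores)))  # 4
```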
Table 7. Model selection results for the groundwater level monitoring data of the Junchang Tunnel

| Rank | Model | Parameters |
|---|---|---|
| 1 | LODA | (25, 200) |
| 2 | LODA | (15, 20) |
| 3 | IForest | (50, 0.2) |
| 4 | LODA | (25, 50) |
| 5 | LODA | (30, 20) |
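LODA, Pevný's lightweight on-line detector of anomalies, dominates the ranking in Table 7. The stdlib-only batch sketch below uses sparse random projections, each with about √d non-zero Gaussian weights. It builds an equal-width histogram on each projection and scores a sample by the negative mean log density. One possible reading of the Table 7 pairs is (number of projections, number of bins). Under that reading the top entry would be `loda_scores(data, n_projections=25, n_bins=200)`, but the source does not state this, so it is an assumption. The function name and toy data are illustrative.

```python
import math
import random


def loda_scores(data, n_projections=20, n_bins=10, seed=0):
    """Batch sketch of LODA: mean negative log histogram density over projections.

    Higher scores indicate more anomalous samples.
    """
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    n_nonzero = max(1, round(math.sqrt(d)))  # sparse projection: ~sqrt(d) weights
    scores = [0.0] * n
    for _ in range(n_projections):
        active = rng.sample(range(d), n_nonzero)
        w = {j: rng.gauss(0.0, 1.0) for j in active}
        z = [sum(w[j] * row[j] for j in active) for row in data]
        lo, hi = min(z), max(z)
        width = (hi - lo) / n_bins or 1.0  # guard against a zero range
        bins = [min(int((v - lo) / width), n_bins - 1) for v in z]
        counts = [bins.count(b) for b in range(n_bins)]
        for i, b in enumerate(bins):
            density = counts[b] / (n * width)  # histogram density estimate
            scores[i] -= math.log(density) / n_projections
    return scores


# Toy data: four clustered samples and one distant sample (index 4).
data = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.05], [5.0, 5.0]]
scores = loda_scores(data)
print(scores.index(max(scores)))  # 4
```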