Reconstructing and interpreting analysis of sonic logging curves based on machine learning and SHAP algorithm
Abstract: Objective Well logging is crucial for determining subsurface lithology, stratigraphy, and formation fluids, and it plays a pivotal role in petroleum exploration. However, instrument damage and poor borehole conditions frequently leave well logs with data gaps or incomplete curves. Traditional multiple linear regression and empirical formulas cannot adequately model the relationships among logging curves, so their reconstruction accuracy is relatively low. Machine learning algorithms can find a suitable mapping across large amounts of data and thereby improve model accuracy, but the resulting black-box models are difficult to interpret. Recently, interpretability algorithms have made the use of machine learning for logging curve reconstruction better justified.
Methods In this work, support vector regression (SVR), random forest (RF), and extreme gradient boosting (XGBoost) are compared with traditional multiple linear regression (LR) in reconstructing the sonic logging curve of the UK NDR well 22-30b-11, and the XGBoost model is interpreted with the SHapley Additive exPlanations (SHAP) algorithm.
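As a rough illustration of the attribution idea behind SHAP, the sketch below computes exact Shapley values for a single sample by enumerating every feature coalition. It is not the implementation used in this study: the SHAP library relies on a much faster tree-specific algorithm and on background data rather than a single baseline, and the names `shapley_values` and `toy_model`, together with all input values, are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample, by enumerating all coalitions.

    Assumption: a feature that is "absent" from a coalition is replaced by
    its value in `baseline` (a single reference point).
    """
    n = len(x)
    features = range(n)

    def value(coalition):
        # Features in the coalition keep the sample's values; the rest use the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in features]
        return predict(z)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical stand-in for a fitted regressor: one additive term, one interaction.
    return 2.0 * z[0] + z[1] * z[2]


sample = [1.0, 2.0, 3.0]
reference = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, sample, reference)
print(phi)
```

The attributions sum to the difference between the prediction for the sample and the prediction for the baseline, which is the additivity property that SHAP explanations rely on. Exhaustive enumeration grows exponentially with the number of features, so it is only practical for small feature sets such as this toy case.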
Results On the test set, XGBoost achieves a coefficient of determination (R²) of 0.996 and a mean squared error (MSE) of 6.371, outperforming SVR (R² = 0.990, MSE = 15.755) and RF (R² = 0.993, MSE = 9.871). LR yields an R² of 0.969 and an MSE of 48.895. XGBoost therefore reconstructs the sonic transit-time curve with higher accuracy and better generalization. Interpreting the XGBoost black-box model with the SHAP algorithm shows that, when important features are selected during model construction, the XGBoost model's use of formation temperature data is more reasonable than the LR model's use of caliper log data. Finally, SHAP is used to interpret the model at the single-point level and through global feature interactions.
Conclusion For sonic logging curve reconstruction, the machine learning algorithms clearly outperform traditional multiple linear regression. The results also demonstrate the feasibility of the SHAP algorithm for interpreting machine learning models of logging curve reconstruction, which offers a new direction for the development of machine learning in well-log interpretation.
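For readers who want to see how the two accuracy metrics quoted in the results are defined, the following minimal sketch computes MSE and R² from measured and reconstructed values. The function names and the four sample values are illustrative assumptions and are not taken from the well data used in this study.

```python
def mean_squared_error(y_true, y_pred):
    """Mean of the squared differences between measured and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and reconstructed sonic transit times (made-up values).
measured = [60.0, 70.0, 80.0, 90.0]
reconstructed = [62.0, 69.0, 81.0, 88.0]

mse = mean_squared_error(measured, reconstructed)
r2 = r_squared(measured, reconstructed)
print(f"MSE = {mse:.3f}, R2 = {r2:.3f}")
```

A lower MSE and an R² closer to 1 both indicate a closer match between the reconstructed and measured curves, which is the basis for ranking the four models above.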
Key words:
- logging curve reconstruction
- machine learning
- model interpretation
- SHAP algorithm
- sonic logging
Table 1. Regression coefficient values for each feature
Feature      CALI  DRHO   FTEMP  GR    NPHI   RHOB    RILD   RILM
Coefficient  4.21  −1.01  −1.83  7.62  19.93  −10.23  −2.21  1.28