Reconstructing Sonic Well Log Curves Based on Machine Learning and Analysis of Model Interpretability
Abstract: (Objective) Well logging is a key technique for characterizing subsurface lithology, strata, and formation fluids, and it plays a pivotal role in petroleum exploration. However, instrument damage, poor borehole conditions, and similar factors frequently leave log data missing and curves incomplete. Traditional multiple linear regression and empirical formulas cannot adequately model the relationships among log curves, so reconstruction accuracy is relatively low. Machine learning algorithms can learn a better-fitting mapping from large volumes of data and thereby improve accuracy, but the resulting black-box models are difficult to explain. Recently, interpretability algorithms have made the use of machine learning for log curve reconstruction easier to justify. (Methods) This paper compares Support Vector Regression (SVR), Random Forest (RF), and eXtreme Gradient Boosting (XGBoost) with traditional multiple linear regression (LR) in reconstructing the sonic log curve of UK NDR well 22-30b-11, and interprets the XGBoost model with the Shapley Additive Explanations (SHAP) algorithm. (Results) On the test set, XGBoost achieves an R² of 0.996 and an MSE of 6.371, outperforming SVR (R² 0.990, MSE 15.755) and RF (R² 0.993, MSE 9.871), whereas LR yields an R² of 0.969 and an MSE of 48.895. XGBoost therefore reconstructs the sonic transit-time curve with higher accuracy and better generalization. The SHAP interpretation of the XGBoost black-box model shows that, in selecting important features, XGBoost's use of formation temperature is clearly more reasonable than the caliper log data relied on by the multiple linear regression scheme. Finally, SHAP is used to interpret the model at individual points and to analyze global feature interactions. (Conclusion) These results show that machine learning algorithms clearly outperform traditional multiple linear regression in sonic log curve reconstruction and demonstrate the feasibility of SHAP for interpreting machine learning models built for this task, offering a new direction for machine learning in well log interpretation.
Key words: log curve reconstruction / machine learning / model interpretation / SHAP
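
The SHAP interpretation mentioned in the abstract builds on Shapley values from cooperative game theory. As a minimal sketch of that underlying idea, and not the paper's workflow or the `shap` library, the following Python computes exact Shapley values by enumerating feature coalitions for a toy model. The names `shapley_values` and `toy_model`, the three feature names, every coefficient, and the sample and baseline values are invented for illustration. Replacing absent features with a single baseline vector is a simplifying assumption.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline vector.

    A feature that is absent from a coalition takes its baseline value
    (a simplifying assumption made for this sketch).
    """
    n = len(x)
    features = range(n)

    def value(coalition):
        # Features in the coalition come from x, all others from the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in features]
        return predict(z)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when it joins coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    """Hypothetical predictor with invented coefficients and one interaction term."""
    gamma_ray, density, temperature = z
    return (100.0 + 0.2 * gamma_ray - 20.0 * density
            + 0.1 * temperature + 0.01 * gamma_ray * temperature)


sample = [80.0, 2.4, 90.0]    # hypothetical point to explain
baseline = [60.0, 2.5, 70.0]  # hypothetical reference point

phi = shapley_values(toy_model, sample, baseline)
for name, contribution in zip(["gamma_ray", "density", "temperature"], phi):
    print(f"{name}: {contribution:.3f}")
```

The attributions sum to the difference between the prediction at the sample and the prediction at the baseline, which is the additivity property that SHAP-style explanations depend on. The cost of exact enumeration grows exponentially with the number of features, so this approach is practical only for small illustrations.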