Citation: PENG Bin, TIAN Yiping, ZENG Bin, WU Xuechao, WU Wenming. Recognition and application of geological entities related to ore-forming conditions in the Kaiyang phosphate mine based on the XLNET model[J]. Bulletin of Geological Science and Technology, 2024, 43(4): 224-234. doi: 10.19509/j.cnki.dzkq.tb20230543
As phosphate ore prospecting becomes more difficult, the number of geological exploration reports keeps growing. Manually identifying geological information related to phosphate mineralization in such massive document collections is time-consuming and inefficient, and it cannot meet the needs of knowledge sharing, dissemination and intelligent management of geological reports.
To quickly obtain the ore-forming geological knowledge hidden in phosphate ore reports, this work establishes an automatic recognition method for ore-forming geological entities based on the XLNet pretrained language model (XLNET). First, entities were labelled with the BIO scheme to build a geological entity dictionary, and XLNET was used as the underlying pretrained model to learn the bidirectional semantics of sentences. Then, a BILSTM-Attention-CRF model, which combines a bidirectional long short-term memory (BILSTM) network, a self-attention layer (Attention) and a conditional random field (CRF), was used to classify the multiple text labels. Finally, the ore-forming conditions and ore-forming model of the phosphate ore were roughly predicted by locating where phosphate ore entities are distributed within the reports.
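The BIO labelling step can be illustrated with a minimal sketch. It assumes character-level tagging driven by longest-match lookup in an entity dictionary; the function name `bio_tag`, the label names `ROCK` and `STRAT`, and the sample sentence are hypothetical and are not taken from the paper, whose actual annotation procedure may differ.

```python
def bio_tag(text, entity_dict):
    """Tag each character as B-<label>, I-<label> or O.

    Uses greedy longest-match lookup against entity_dict
    (an assumed illustration, not the authors' procedure).
    """
    tags = ["O"] * len(text)
    max_len = max((len(key) for key in entity_dict), default=0)
    i = 0
    while i < len(text):
        matched = False
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(text) - i), 0, -1):
            span = text[i:i + n]
            if span in entity_dict:
                label = entity_dict[span]
                tags[i] = "B-" + label          # first character of the entity
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label      # remaining characters
                i += n
                matched = True
                break
        if not matched:
            i += 1                              # character stays "O"
    return tags


# Hypothetical dictionary entries and label names.
entity_dict = {"磷块岩": "ROCK", "白云岩": "ROCK", "陡山沱组": "STRAT"}
sentence = "陡山沱组含磷块岩"
tags = bio_tag(sentence, entity_dict)
for char, tag in zip(sentence, tags):
    print(char, tag)
```

Sequences tagged this way are the kind of training target a sequence-labelling model such as BILSTM-Attention-CRF is fitted to, with one label predicted per character.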
A comparison with three other models shows that the accuracy rate, recall rate and F1 value of this model are all close to 90%, which are 2%, 5% and 6% higher, respectively, than those of the other three models.
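For reference, the reported metrics can be computed from BIO tag sequences as sketched below. This assumes entity-level exact-match scoring and assumes that "accuracy rate" corresponds to precision, as is common in named entity recognition evaluation; the abstract does not state the scoring protocol, and the helper names and sample tags are hypothetical.

```python
def extract_entities(tags):
    """Return a set of (start, end, label) spans from a BIO tag sequence."""
    entities = set()
    start, label = None, None
    # A trailing "O" sentinel closes any entity still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        inside = tag.startswith("I-") and label == tag[2:]
        if start is not None and not inside:
            entities.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return entities


def precision_recall_f1(gold_tags, pred_tags):
    """Entity-level precision, recall and F1 with exact span matching."""
    gold = extract_entities(gold_tags)
    pred = extract_entities(pred_tags)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical gold and predicted labels for one 8-character sentence.
gold = ["B-STRAT", "I-STRAT", "I-STRAT", "I-STRAT", "O", "B-ROCK", "I-ROCK", "I-ROCK"]
pred = ["B-STRAT", "I-STRAT", "I-STRAT", "I-STRAT", "O", "B-ROCK", "I-ROCK", "O"]
precision, recall, f1 = precision_recall_f1(gold, pred)
print(precision, recall, f1)
```

In this example the predicted rock entity is one character short, so it does not count as a match under exact span scoring even though most of its characters are labelled correctly.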
This study provides a more efficient method for automatic geological entity recognition for geological researchers in the Kaiyang phosphate mine.