Title: Investigation of the Geothermal Literature Using Machine Learning Algorithms
Authors: Mohammad (Jabs) ALJUBRAN, Alaa ALAHMED, Ahmed ALKHALIFAH, Matt HALL
Key Words: machine learning, natural language processing, literature
Conference: Stanford Geothermal Workshop
Year: 2022
Session: General
Language: English
Paper Number: Aljubran
File Size: 959 KB
As the volume of geothermal literature grows worldwide, data analysis has become a useful way to advance professional and academic research and development. It is likewise essential to leverage state-of-the-art algorithms to build useful tools on top of existing databases. This work used statistical and deep learning techniques to draw insights from the geothermal literature. We scraped the International Geothermal Association (IGA) database using the Stanford University search engine, then gathered and preprocessed all 21,388 publications archived there; the fields included publication title, authors, year, keywords, abstract, language, conference, and session type. The analysis shows that the three geothermal events with the largest historical publication volume are the Geothermal Resources Council Transactions, the World Geothermal Congress, and the Stanford Geothermal Workshop. Using natural language processing (NLP) techniques, we “geoparsed” each abstract to determine the geographical coordinates of the location it concerns. This allowed us to develop an interactive world heatmap showing where geothermal research efforts have historically focused. Latent Dirichlet Allocation (LDA) was used to cluster the geothermal literature into nine topics. We also developed an intelligent search engine for the geothermal literature using term frequency-inverse document frequency (TF-IDF) and cosine similarity. After preprocessing the “authors” data, we built a coauthorship network graph encompassing researchers within the geothermal community and reflecting the level of collaboration among them. Finally, a deep learning model was developed to perform text generation and auto-completion using a state-of-the-art generative pretrained transformer (GPT-2) fine-tuned on the geothermal literature.
We conclude this paper by introducing an open-source application programming interface (API) demonstrating and offering these insights and tools for public use. This live API is designed to continuously read from the IGA Stanford University search engine to ensure up-to-date results. You may access this API at http://steaming-geothermal-analytics.info.
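To make the TF-IDF and cosine-similarity search mentioned above more concrete, the following is a minimal sketch of how such a ranking could work. It is not the authors' implementation: the function names, the tokenizer, and the smoothed IDF variant are all assumptions chosen for illustration, and it uses only the Python standard library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vector(tokens, idf):
    """Build an L2-normalized TF-IDF vector (as a dict), ignoring unknown terms."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def build_index(docs):
    """Compute IDF weights and one TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumed variant): ln((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [tfidf_vector(toks, idf) for toks in tokenized]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (a dot product)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def search(query, docs, idf, vectors, top_k=3):
    """Return up to top_k (document, score) pairs with nonzero similarity."""
    q = tfidf_vector(tokenize(query), idf)
    scored = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(docs[i], s) for s, i in scored[:top_k] if s > 0]
```

In use, one would call `build_index` once on the list of abstracts and then pass the resulting `idf` and `vectors` to `search` for each query. Because the vectors are normalized to unit length, the dot product equals the cosine similarity, and documents sharing no terms with the query are dropped from the results.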
Copyright 2022, Stanford Geothermal Program: Readers who download papers from this site should honor the copyright of the original authors and may not copy or distribute the work further without the permission of the original publisher.