Title:

Investigation of the Geothermal Literature Using Machine Learning Algorithms

Authors:

Mohammad (Jabs) ALJUBRAN, Alaa ALAHMED, Ahmed ALKHALIFAH, Matt HALL

Key Words:

machine learning, natural language processing, literature

Conference:

Stanford Geothermal Workshop

Year:

2022

Session:

General

Language:

English

Paper Number:

Aljubran

File Size:

959 KB

Abstract:

As the volume of geothermal literature grows worldwide, data analysis has become a useful way to advance professional and academic research and development. It is also important to apply state-of-the-art algorithms to build practical tools on top of existing databases. This work used statistical and deep learning techniques to draw insights from the geothermal literature. We scraped the International Geothermal Association (IGA) database through the Stanford University search engine, then gathered and preprocessed all 21,388 publications archived there. Each record includes the publication title, authors, year, keywords, abstract, language, conference, and session type. Our analysis shows that the three geothermal venues with the largest historical volume of publications are the Geothermal Resources Council Transactions, the World Geothermal Congress, and the Stanford Geothermal Workshop. Using natural language processing (NLP) techniques, we “geoparsed” each abstract to determine the geographic coordinates of the location it concerns. This made it possible to build an interactive world heatmap showing where geothermal research has focused historically. Latent Dirichlet Allocation (LDA) was used to cluster the geothermal literature into nine topics. We also developed an intelligent search engine for the geothermal literature using term frequency-inverse document frequency (TF-IDF) and cosine similarity. After preprocessing the “authors” field, we built a coauthorship network graph that covers researchers in the geothermal community and reflects the level of collaboration among them. Finally, we developed a deep learning model for text generation and auto-completion by fine-tuning the Generative Pre-trained Transformer 2 (GPT-2) on the geothermal literature.
We conclude the paper by introducing an open-source application programming interface (API) that demonstrates these insights and offers the tools for public use. The live API is designed to read continuously from the IGA Stanford University search engine so that results stay up to date. The API can be accessed at http://steaming-geothermal-analytics.info.
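
As a rough illustration of the TF-IDF and cosine-similarity search mentioned in the abstract, the following is a minimal sketch using only the Python standard library. It is not the authors' implementation: the sample abstracts, the smoothed IDF formula, and the function names (`tokenize`, `tfidf_vector`, `build_index`, `search`) are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_vector(tokens, idf):
    """Build an L2-normalized sparse TF-IDF vector (dict of term -> weight)."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def build_index(docs):
    """Compute smoothed IDF weights and one TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumed variant; other formulas are also common).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [tfidf_vector(toks, idf) for toks in tokenized]
    return idf, vectors

def search(query, docs, idf, vectors, top_k=3):
    """Rank documents by cosine similarity to the query; drop zero scores."""
    q = tfidf_vector(tokenize(query), idf)
    # Vectors are unit length, so the dot product equals cosine similarity.
    scores = [
        (sum(w * v.get(t, 0.0) for t, w in q.items()), i)
        for i, v in enumerate(vectors)
    ]
    ranked = sorted(scores, reverse=True)[:top_k]
    return [(docs[i], round(s, 3)) for s, i in ranked if s > 0]

# Hypothetical sample abstracts, not taken from the IGA database.
docs = [
    "Reservoir simulation of an enhanced geothermal system",
    "Tracer testing in fractured geothermal reservoirs",
    "Drilling cost trends for deep geothermal wells",
]
idf, vectors = build_index(docs)
print(search("drilling wells", docs, idf, vectors))
```

Because every vector is normalized to unit length, ranking by dot product is equivalent to ranking by cosine similarity. A query whose terms do not appear in any indexed document returns an empty list.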



Copyright 2022, Stanford Geothermal Program: Readers who download papers from this site should honor the copyright of the original authors and may not copy or distribute the work further without the permission of the original publisher.


