Title:

Predicting Geothermal Favorability in the Western United States by Using Machine Learning: Addressing Challenges and Developing Solutions

Authors:

Stanley MORDENSKY, John LIPOR, Jacob DEANGELO, Erick BURNS, Cary LINDSEY

Key Words:

machine learning, exploration, hydrothermal, class imbalance, resource favorability

Conference:

Stanford Geothermal Workshop

Year:

2022

Session:

Modeling

Language:

English

Paper Number:

Mordensky

File Size:

2309 KB

View File:

Abstract:

Previous moderate- and high-temperature geothermal resource assessments of the western United States utilized weight-of-evidence and logistic regression methods to estimate resource favorability, but these analyses relied upon some expert decisions. While expert decisions can add confidence to aspects of the modeling process by ensuring only reasonable models are employed, expert decisions also introduce human bias into assessments. This bias presents a source of error that may affect the performance of the models and resulting resource estimates. Our study aims to reduce expert input through robust data-driven analyses and better-suited data science techniques, with the goals of saving time, reducing bias, and improving predictive ability. We present six favorability maps for geothermal resources in the western United States created using two strategies applied to three modern machine learning algorithms (logistic regression, support-vector machines, and XGBoost). To provide a direct comparison to previous assessments, we use the same input data as the 2008 U.S. Geological Survey (USGS) conventional moderate- to high-temperature geothermal resource assessment. The six new favorability maps required far less expert decision-making, but broadly agree with the previous assessment. Despite the fact that the 2008 assessment results employed linear methods, the non-linear machine learning algorithms (i.e., support-vector machines and XGBoost) produced greater agreement with the previous assessment than the linear machine learning algorithm (i.e., logistic regression). It is not surprising that geothermal systems depend on non-linear combinations of features, and we postulate that the expert decisions during the 2008 assessment accounted for system non-linearities. Substantial challenges to applying machine learning algorithms to predict geothermal resource favorability include severe class imbalance (i.e., there are very few known geothermal systems compared to the large area considered), and while there are known geothermal systems (i.e., positive labels), all other sites have an unknown status (i.e., they are unlabeled), instead of receiving a negative label (i.e., the known/proven absence of a geothermal resource). We address both challenges through a custom undersampling strategy that can be used with any algorithm and then evaluated using F1 scores.


ec2-3-142-174-55.us-east-2.compute.amazonaws.com, you have accessed 0 records today.

Press the Back button in your browser, or search again.

Copyright 2022, Stanford Geothermal Program: Readers who download papers from this site should honor the copyright of the original authors and may not copy or distribute the work further without the permission of the original publisher.


Attend the nwxt Stanford Geothermal Workshop, click here for details.

Accessed by: ec2-3-142-174-55.us-east-2.compute.amazonaws.com (3.142.174.55)
Accessed: Friday 19th of April 2024 07:17:23 PM