Title:

Data Curation for Machine Learning Applied to Geothermal Power Plant Operational Data for GOOML: Geothermal Operational Optimization with Machine Learning

Authors:

Nicole TAVERNA, Grant BUSTER, Jay HUGGINS, Michael ROSSOL, Paul SIRATOVICH, Jon WEERS, Andrea BLAIR, Christine SIEGA, Warren MANNINGTON, Alex URGEL, Jonathan CEN, Jaime QUINAO, Robbie WATT, John AKERLEY

Key Words:

data curation, machine learning, data-centric AI, data pipeline, GOOML, power plant operations

Conference:

Stanford Geothermal Workshop

Year:

2022

Session:

General

Language:

English

Paper Number:

Taverna

File Size:

417 KB

Abstract:

Geothermal Operational Optimization with Machine Learning (GOOML) is a transferable and extensible component-based geothermal asset modeling framework that considers complex steamfield relationships and identifies optimization prospects using physics-guided, data-centric machine learning. This framework has been used to develop digital twins that provide steamfield operators with operational environments in which to analyze and understand historical and forecasted power production, explore new steamfield configurations, and seek optimal asset management in real-world applications. To create, test, and apply the GOOML framework, diverse time-series datasets spanning multiple years were sourced from various geothermal power plant components within several complex real-world geothermal operations. These operations are based in the United States and New Zealand and include a variety of technologies, end uses, and configurations, collectively covering nearly all relevant operating conditions for modern geothermal fields. Datasets were acquired from multiple sources to ensure that machine learning experiments generalized properly to various operating conditions. The data varied in quality, format, and completeness. To ensure consistency across the datasets, a standardized data curation process was developed to streamline data preparation reliably.
This paper discusses best practices learned from the GOOML data curation process, which consists of the following steps: 1) acquisition of large quantities of data from power plant operators; 2) digestion of the data to gain an initial understanding of what is included; 3) data transformation, which includes converting the data into a standardized machine-readable format so that they can be visualized, quality checked, and cleaned; 4) quality assurance and quality control, involving identification of significant data gaps and apparent anomalies by mapping data features to real-world componentry via the GOOML historical model, followed by discussion with modelers and power plant operators to identify additional data needs and to resolve issues; 5) use in machine learning algorithms; and 6) repetition of steps one through five until all data needs are met and the data are deemed suitable for producing trustworthy modeling results, which may be disseminated, ideally along with the curated dataset. This iterative process focuses on improving the quality of the data rather than on tuning machine learning model parameters, and it supports a shift towards a more data-centric philosophy as a means of improving the real-world applicability of geothermal machine learning projects.
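
As an illustration only, steps 3 and 4 of the process described above (transformation into a standardized format, then identification of data gaps and apparent anomalies) could be sketched roughly as follows. This is a minimal sketch and not the GOOML implementation: the function names, the column name wellhead_pressure_bar, the sample values, the one-hour sampling interval, and the 0 to 50 bar plausibility range are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def standardize(rows, time_key, value_key, time_format):
    """Convert raw rows (dicts of strings) into time-sorted (timestamp, value) pairs.

    Values that cannot be parsed as numbers are stored as None so that
    they can be flagged later instead of being silently dropped.
    """
    records = []
    for row in rows:
        timestamp = datetime.strptime(row[time_key], time_format)
        try:
            value = float(row[value_key])
        except (TypeError, ValueError):
            value = None
        records.append((timestamp, value))
    records.sort(key=lambda record: record[0])
    return records


def find_gaps(records, max_step):
    """Return (start, end) pairs where consecutive timestamps are more than max_step apart."""
    gaps = []
    for (t0, _), (t1, _) in zip(records, records[1:]):
        if t1 - t0 > max_step:
            gaps.append((t0, t1))
    return gaps


def find_out_of_range(records, low, high):
    """Return records whose value is missing or outside the assumed plausible range."""
    return [(t, v) for t, v in records if v is None or not (low <= v <= high)]


# Hypothetical operator export: hourly wellhead pressure in bar (made-up values).
raw = [
    {"time": "2020-01-01 00:00", "wellhead_pressure_bar": "11.8"},
    {"time": "2020-01-01 01:00", "wellhead_pressure_bar": "12.1"},
    {"time": "2020-01-01 05:00", "wellhead_pressure_bar": "-999"},
    {"time": "2020-01-01 06:00", "wellhead_pressure_bar": "bad"},
]

records = standardize(raw, "time", "wellhead_pressure_bar", "%Y-%m-%d %H:%M")
gaps = find_gaps(records, max_step=timedelta(hours=1))
flagged = find_out_of_range(records, low=0.0, high=50.0)

print("gaps:", gaps)
print("flagged:", flagged)
```

In a workflow like the one described, the reported gaps and flagged values would be taken back to modelers and plant operators for discussion rather than corrected automatically, and the checks would be rerun after each new data delivery.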


Copyright 2022, Stanford Geothermal Program: Readers who download papers from this site should honor the copyright of the original authors and may not copy or distribute the work further without the permission of the original publisher.

