Title:

Improving the Quality of Geothermal Data Through Data Standards and Pipelines Within the Geothermal Data Repository

Authors:

Nicole TAVERNA, Jon WEERS, Jay HUGGINS, Sean PORSE, Arlene ANDERSON, Zachary FRONE, RJ SCAVO

Key Words:

GDR, data, standardization, pipelines, data science, machine learning

Conference:

Stanford Geothermal Workshop

Year:

2023

Session:

General

Language:

English

Paper Number:

Taverna

File Size:

898 KB

Abstract:

For machine learning outputs to be applicable to real-world problems, the underlying data must be of high quality. With the recent emphasis on machine learning in geothermal research, there is a growing need to focus on the quality of the data available to these projects. High-quality datasets result from dependable sensors or devices, a high measurement frequency, sufficient data points, adequate metadata, reliable storage, and sufficient curation. Another component that contributes to high-quality data is reusability, which can be enhanced through data standardization. Data standardization creates consistency in the formatting and contents of like datasets, which lessens preprocessing requirements and ensures that a given dataset provides adequate information. The Geothermal Data Repository (GDR), which houses data from research funded by the U.S. Department of Energy Geothermal Technologies Office, aims to help improve data quality by implementing data pipelines that automatically standardize high-value datasets, alongside reliable and accessible long-term storage. To that end, the GDR has decided to shift away from recommending Excel-based content models and toward implementing automated data pipelines. This shift takes the burden of data standardization off the user and project team and will increase the amount of standardized geothermal data available through the GDR. A set of recommendations, or data standard, for each data type will accompany each data pipeline to guide data collection toward maximum usability in future research. This paper describes the GDR’s proposed transition toward data standardization through automated data pipelines, discusses the need for and value of such a shift, and calls for suggestions from the community regarding the most useful data standards and pipelines.
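To make the idea of an automated standardization pipeline concrete, the following is a minimal sketch of a single pipeline step, written under stated assumptions. It is not the GDR's implementation. The column names, the target schema (`depth_m`, `temperature_c`), and the function `standardize` are all hypothetical and chosen only for illustration. The step renames recognized columns to one standard name and converts units, so that like datasets end up with the same formatting.

```python
import csv
import io

# Hypothetical mapping from column names found in submitted files to
# standardized names. This is an illustrative schema, not a GDR standard.
COLUMN_ALIASES = {
    "depth_ft": "depth_m",
    "depth_m": "depth_m",
    "temp_f": "temperature_c",
    "temp_c": "temperature_c",
    "temperature_c": "temperature_c",
}

# Unit conversions for source columns that are not already in the
# assumed standard units (metres and degrees Celsius).
CONVERTERS = {
    "depth_ft": lambda v: v * 0.3048,
    "temp_f": lambda v: (v - 32.0) * 5.0 / 9.0,
}


def standardize(csv_text):
    """Return a list of rows with standardized column names and units.

    Columns that the mapping does not recognize are dropped. Missing or
    non-numeric values are not handled in this sketch.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for raw in reader:
        row = {}
        for name, value in raw.items():
            key = name.strip().lower()
            if key not in COLUMN_ALIASES:
                continue  # unrecognized column: skipped in this sketch
            convert = CONVERTERS.get(key, lambda v: v)
            row[COLUMN_ALIASES[key]] = round(convert(float(value)), 3)
        rows.append(row)
    return rows


if __name__ == "__main__":
    submitted = "Depth_ft,Temp_F\n100,212\n"
    print(standardize(submitted))
```

In a sketch like this, the mapping tables play the role of the data standard for a given data type, while the function plays the role of the pipeline that applies it, so contributors do not have to reformat their files by hand.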



Copyright 2023, Stanford Geothermal Program: Readers who download papers from this site should honor the copyright of the original authors and may not copy or distribute the work further without the permission of the original publisher.

