IMPERFECT DATA IN, IMPERFECT MODEL OUT: USING COMPETING MODELS TO DECIDE IF WE HAVE THE RIGHT DATA

Mordensky, Stanley

Paper No. 172-1

Presentation Time: 9:00 AM-1:00 PM

MORDENSKY, Stanley, U.S. Geological Survey, 2130 SW 5th Ave, Portland, OR 97201; LIPOR, John, Electrical & Computer Engineering, Portland State University, Portland, OR 97201; DEANGELO, Jacob, U.S. Geological Survey, Geology, Minerals, Energy, and Geophysics Science Center, Moffett Field, CA 94025; BURNS, Erick, U.S. Geological Survey, Geology, Minerals, Energy, and Geophysics Science Center, Portland, OR 97201; and LINDSEY, Cary R., Geology, Minerals, Energy, and Geophysics Science Center, U.S. Geological Survey, 2130 SW 5th Ave., Portland, OR 97201

Previous geothermal resource assessments of the western U.S. utilized data-driven methods (i.e., weight-of-evidence and logistic regression) to estimate resource favorability, but these analyses relied upon some non-ideal approaches for data science (i.e., expert decisions). Although expert decisions can add confidence to aspects of the modeling process by ensuring seemingly reasonable models are employed, expert decisions also introduce human bias, which presents a potential source of error that may affect model performance.

To facilitate comparison of methods, we use the same data from the 2008 geothermal resource assessment (e.g., heat flow, horizontal stress) to train models with modern machine learning algorithms (i.e., logistic regression, eXtreme Gradient Boosting, support vector machines, and multilayer perceptron neural networks), which minimize dependence upon expert decisions. Some of these algorithms are simple (e.g., logistic regression), whereas others are highly sophisticated (e.g., the neural network). Despite the contrast in complexity, the results from the very simple and highly complex algorithms are similar. In fact, the results from the most complex machine learning model (i.e., the neural network) appear to be more similar to the results from the simplest machine learning algorithm (i.e., logistic regression) than to either of the models resulting from the expert decisions in the 2008 assessment, indicating that human bias influenced estimates away from a machine-driven optimum.
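The comparison of a simple and a complex algorithm on the same features can be illustrated with a minimal sketch. It is not the workflow used in this study: the data are synthetic (three hypothetical features with a purely linear relationship to the label), the two models are small NumPy stand-ins for logistic regression and a one-hidden-layer perceptron, and the similarity measure (Pearson correlation of predicted probabilities) is an assumption chosen for illustration. eXtreme Gradient Boosting and support vector machines are omitted for brevity.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def make_data(n, rng):
    # Synthetic stand-in for feature data (NOT the 2008 assessment data).
    # Labels follow a purely linear logit, so there is no complex pattern to find.
    X = rng.normal(size=(n, 3))
    logit = 1.5 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2]
    y = (rng.random(n) < sigmoid(logit)).astype(float)
    return X, y


def fit_logistic(X, y, lr=0.5, steps=500):
    # Logistic regression trained by full-batch gradient descent.
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        err = sigmoid(X @ w + b) - y
        w -= lr * (X.T @ err) / len(y)
        b -= lr * err.mean()
    return lambda Xn: sigmoid(Xn @ w + b)


def fit_mlp(X, y, rng, hidden=8, lr=0.5, steps=2000):
    # One-hidden-layer perceptron (tanh units) trained by full-batch gradient descent.
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=hidden)
    b2 = 0.0
    for _ in range(steps):
        H = np.tanh(X @ W1 + b1)
        err = (sigmoid(H @ W2 + b2) - y) / len(y)
        # Backpropagate through the hidden layer before W2 is updated.
        dA = np.outer(err, W2) * (1.0 - H ** 2)
        W2 -= lr * (H.T @ err)
        b2 -= lr * err.sum()
        W1 -= lr * (X.T @ dA)
        b1 -= lr * dA.sum(axis=0)
    return lambda Xn: sigmoid(np.tanh(Xn @ W1 + b1) @ W2 + b2)


rng = np.random.default_rng(0)
X_train, y_train = make_data(400, rng)
X_test, y_test = make_data(200, rng)

p_lr = fit_logistic(X_train, y_train)(X_test)
p_mlp = fit_mlp(X_train, y_train, rng)(X_test)

# Similarity of the two favorability-like outputs on held-out data.
corr = float(np.corrcoef(p_lr, p_mlp)[0, 1])
acc_lr = float(((p_lr > 0.5) == (y_test > 0.5)).mean())
acc_mlp = float(((p_mlp > 0.5) == (y_test > 0.5)).mean())
print(f"correlation of predictions: {corr:.3f}")
print(f"accuracy: logistic={acc_lr:.3f}, mlp={acc_mlp:.3f}")
```

When the features carry only a simple signal, as in this synthetic case, the more flexible model has nothing extra to learn and its predictions track those of the simpler model closely. A high correlation between competing models is therefore a hint about the data rather than about the algorithms.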

The similarity of the models produced by the spectrum of machine learning algorithms is a direct result of the simplicity (and perhaps inadequacy) of the feature data. The feature data used in the 2008 geothermal resource assessment were imperfect approximations of geological conditions (e.g., heat flow and stress were interpolated and informed by measurements only sparsely available for some regions of the U.S.). These results demonstrate that the previous data contain no complex patterns that more sophisticated machine learning can mine, indicating a fundamental limitation of the data previously used to identify geothermal resource favorability. That is, the most important part of the machine learning workflow, the data, needs to be sufficient to make reliable predictions.

Session No. 172--Booth# 147

T13. The Legacy of Kenneth L. Pierce: Interdisciplinary Studies along the Track of the Yellowstone Hotspot and Beyond (Posters)

Tuesday, 11 October 2022: 9:00 AM-1:00 PM

Exhibit Hall F (Colorado Convention Center)

Geological Society of America Abstracts with Programs. Vol 54, No. 5
doi: 10.1130/abs/2022AM-377146

© Copyright 2022 The Geological Society of America (GSA), all rights reserved. Permission is hereby granted to the author(s) of this abstract to reproduce and distribute it freely, for noncommercial purposes. Permission is hereby granted to any individual scientist to download a single copy of this electronic file and reproduce up to 20 paper copies for noncommercial purposes advancing science and education, including classroom use, providing all reproductions include the complete content shown here, including the author information. All other forms of reproduction and/or transmittal are prohibited without written permission from GSA Copyright Permissions.

GSA Connects 2022 meeting in Denver, Colorado