WHAT MATTERS MOST? MEASURING FEATURE IMPORTANCE FOR GEOTHERMAL RESOURCES USING SUPERVISED LEARNING

Mordensky, Stanley

Paper No. 187-8

Presentation Time: 3:45 PM

MORDENSKY, Stanley, U.S. Geological Survey, 2130 SW 5th Ave, Portland, OR 97201; LIPOR, John, Electrical & Computer Engineering, Portland State University, Portland, OR 97201; DEANGELO, Jacob, U.S. Geological Survey, Geology, Minerals, Energy, and Geophysics Science Center, Moffett Field, CA 94025; BURNS, Erick, U.S. Geological Survey, Geology, Minerals, Energy, and Geophysics Science Center, Portland, OR 97201; and LINDSEY, Cary R., Geology, Minerals, Energy, and Geophysics Science Center, U.S. Geological Survey, 2130 SW 5th Ave., Portland, OR 97201

Recent evaluation of strategies for conventional hydrothermal resource assessment in the United States has relied upon machine learning methods (i.e., logistic regression, support vector machines [SVMs], XGBoost, and multilayer perceptron neural networks [MLPs]) to predict resource favorability using features (i.e., heat flow, distance to faults, distance to magma bodies, maximum horizontal strain, and seismic event density) from the U.S. Geological Survey’s 2008 Geothermal Resource Assessment. Two of the machine learning algorithms (i.e., SVMs and MLPs) must rely on model-agnostic measures of feature importance (i.e., measures that are applicable regardless of an algorithm’s conceptual framework; e.g., sensitivity analyses and SHapley Additive exPlanation [SHAP] values), while the other two algorithms also offer straightforward, model-gnostic (i.e., algorithm-specific) measures for interpreting the relative contributions of features to favorability predictions (i.e., feature coefficients for logistic regression; weight, gain, cover, and F score for XGBoost). Relative feature importance is measured for all machine learning algorithms using the model-agnostic measures, and, when possible, model-gnostic measures are shown for comparison.
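As an illustration of what a model-agnostic sensitivity analysis scored by F1 can look like, the following is a minimal sketch in plain Python. It is not the authors' implementation: the permutation scheme, the function names (`f1_score`, `permutation_importance`, `toy_predict`), the synthetic three-feature data, and the hand-written classifier are all assumptions made for the example. The idea is to shuffle one feature at a time and record how much the F1 score falls relative to the unshuffled baseline.

```python
import random


def f1_score(y_true, y_pred):
    """Binary F1 score computed from true/false positives and false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in F1 when each feature column is shuffled in turn.

    `predict` is any callable mapping one feature row to 0 or 1, so the
    measure never looks inside the model (hence "model-agnostic").
    """
    rng = random.Random(seed)
    baseline = f1_score(y, [predict(row) for row in X])
    drops = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the labels
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            total += baseline - f1_score(y, [predict(row) for row in X_perm])
        drops.append(total / n_repeats)
    return baseline, drops


# Synthetic stand-in data (hypothetical, NOT geothermal data): three features,
# where feature 0 matters most, feature 1 matters a little, feature 2 not at all.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(400)]
y = [1 if 2.0 * row[0] + 0.5 * row[1] > 0 else 0 for row in X]


def toy_predict(row):
    """Hypothetical stand-in for a trained classifier (e.g., an SVM or MLP)."""
    return 1 if 2.0 * row[0] + 0.5 * row[1] > 0 else 0


baseline, drops = permutation_importance(toy_predict, X, y)
for j, drop in enumerate(drops):
    print(f"feature {j}: mean F1 drop = {drop:.3f}")
```

Because `permutation_importance` only calls `predict`, the same routine could in principle be applied to any of the four algorithms named above; the choice of 20 repeats is arbitrary and only smooths the randomness of the shuffles.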

The various measures of feature importance are generally consistent with each other. In most of the models produced by the machine learning algorithms, heat flow and distance to faults are identified as the most important features, and strain and seismicity as the least important. However, some measures of feature importance deviate from this behavior when used with some machine learning algorithms (e.g., sensitivity analysis by F1 score or SHAP values with ensembled SVMs). We discuss these differences between measures of feature importance across the machine learning algorithms and what these differences may represent in the context of predicting geothermal resources. In so doing, we caution that the rankings from some measures of feature importance differ considerably from the rankings from other measures. Hence, we demonstrate that the evaluation of feature importance benefits from a multi-measure approach that permits higher confidence when the rankings are consistent across different measures.
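One way to make "consistent across different measures" concrete is to convert each measure's scores to ranks and compare the rankings pairwise. The sketch below uses Spearman's rank correlation for that comparison. The abstract does not say how consistency was assessed, so this choice is an assumption, as are the function names, the ranking by absolute score (so that signed coefficients can be compared), and the placeholder measures and feature names. The scores are invented for illustration and are not results from the study.

```python
from itertools import combinations


def ranks(scores):
    """Rank features from 1 (most important) by descending absolute score.

    Assumes no tied scores; ties would need average ranks instead.
    """
    ordered = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
    return {name: i + 1 for i, name in enumerate(ordered)}


def spearman(rank_a, rank_b):
    """Spearman rank correlation for two tie-free rankings of the same features."""
    n = len(rank_a)
    d2 = sum((rank_a[k] - rank_b[k]) ** 2 for k in rank_a)
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Placeholder scores (hypothetical, NOT values from the study).
measures = {
    "measure_1": {"feat_a": 0.9, "feat_b": 0.6, "feat_c": 0.2, "feat_d": 0.1},
    "measure_2": {"feat_a": -1.4, "feat_b": 1.1, "feat_c": 0.3, "feat_d": -0.2},
    "measure_3": {"feat_a": 0.1, "feat_b": 0.2, "feat_c": 0.6, "feat_d": 0.9},
}

rankings = {name: ranks(scores) for name, scores in measures.items()}
agreement = {
    (a, b): spearman(rankings[a], rankings[b])
    for a, b in combinations(rankings, 2)
}

for pair, rho in agreement.items():
    print(pair, round(rho, 2))
```

A correlation near 1 for a pair of measures indicates that they order the features the same way; a value near -1, as for the deliberately reversed `measure_3` here, flags a measure whose ranking should be examined before it is trusted.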

Session No. 187

T53. Mathematics, Statistics, and Machine Learning in the Geosciences. How Can We Solve Today’s Challenges through Data Mining and Artificial Intelligence?

Tuesday, 11 October 2022: 1:30 PM-5:30 PM

502 (Colorado Convention Center)

Geological Society of America Abstracts with Programs. Vol 54, No. 5
doi: 10.1130/abs/2022AM-376621

© Copyright 2022 The Geological Society of America (GSA), all rights reserved. Permission is hereby granted to the author(s) of this abstract to reproduce and distribute it freely, for noncommercial purposes. Permission is hereby granted to any individual scientist to download a single copy of this electronic file and reproduce up to 20 paper copies for noncommercial purposes advancing science and education, including classroom use, providing all reproductions include the complete content shown here, including the author information. All other forms of reproduction and/or transmittal are prohibited without written permission from GSA Copyright Permissions.

GSA Connects 2022 meeting in Denver, Colorado
