Joint 120th Annual Cordilleran/74th Annual Rocky Mountain Section Meeting - 2024

Paper No. 33-5
Presentation Time: 9:20 AM

INTEGRATING UNCERTAINTY INTO THE HYDROTHERMAL RESOURCE FAVORABILITY MAPS FOR THE GREAT BASIN, USA


MORDENSKY, Stanley1, BURNS, Erick1, LIPOR, John J.2 and DEANGELO, Jacob3, (1)U.S. Geological Survey, Geology, Minerals, Energy, and Geophysics Science Center, Portland, OR 97201, (2)U.S. Geological Survey / Portland State University, Portland, OR 97201, (3)U.S. Geological Survey, Geology, Minerals, Energy, and Geophysics Science Center, Moffett Field, CA 94025

We detail a machine learning workflow that predicts hydrothermal upflow and, by using XGBoost with Monte Carlo cross-validation (n = 120), estimates the uncertainty imparted by bias from different train-test splits. Feature data are extracted from datasets (e.g., strain, gravity) compiled under the INnovative Geothermal Exploration through Novel Investigations Of Undiscovered Systems (INGENIOUS) project. Label data are defined as the difference between the heat flow measured at each well in the study area and the background conductive estimate. That difference is expected to correlate with nearby convective heat flow: a difference of zero implies no nearby convection, and larger differences correspond to larger amounts of convective heat flow. Our previous work found that the convective signals contain large outliers (3,083 signals between -91 and 1,000 mW/m² and 192 signals in the much larger range of 1,000 to 11,105 mW/m²) that bias results for traditional regression; however, these outliers may hold valuable information about large, power-producing hydrothermal systems. Therefore, to minimize outlier effects, we discretize the continuous label values into bins for very low, high, and very high signals, using optimal bin boundaries identified by preliminary modeling, which allows the use of ordinal regression.
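The labeling and resampling steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the bin boundaries (100 and 1,000 mW/m²), the synthetic well values, the 20% test fraction, and all function names are hypothetical, and a one-feature nearest-neighbor rule stands in for XGBoost so that the example needs only the Python standard library.

```python
import random
from bisect import bisect_right
from collections import defaultdict
from statistics import pvariance


def convective_signal(measured, background):
    """Label value: measured heat flow minus the background conductive estimate."""
    return measured - background


def to_ordinal(signal, boundaries):
    """Map a continuous signal to an ordinal bin index (0, 1, 2, ...)."""
    return bisect_right(boundaries, signal)


def fit_nearest_neighbor(x_train, y_train):
    """Stand-in model (NOT XGBoost): predict the label of the closest training point."""
    def predict(x):
        j = min(range(len(x_train)), key=lambda k: abs(x_train[k] - x))
        return y_train[j]
    return predict


def monte_carlo_cv(x, y, fit, n_splits=120, test_frac=0.2, seed=0):
    """Repeat random train-test splits and collect held-out predictions per sample."""
    rng = random.Random(seed)
    n = len(x)
    n_test = max(1, int(round(test_frac * n)))
    held_out = defaultdict(list)
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        predict = fit([x[i] for i in train], [y[i] for i in train])
        for i in test:
            held_out[i].append(predict(x[i]))
    return held_out


def prediction_variance(held_out):
    """Variance of the held-out predictions for each sample across splits."""
    return {i: pvariance(p) for i, p in held_out.items()}


# Synthetic demonstration data (hypothetical, not from the study).
measured = [60.0, 75.0, 90.0, 150.0, 300.0, 820.0, 1400.0, 2500.0, 95.0, 4000.0]
background = [80.0] * 10            # background conductive estimate, mW/m^2
feature = [0.1, 0.2, 0.25, 0.5, 0.9, 1.6, 2.4, 3.1, 0.3, 4.2]
BOUNDARIES = [100.0, 1000.0]        # placeholder bin boundaries, mW/m^2

labels = [to_ordinal(convective_signal(m, b), BOUNDARIES)
          for m, b in zip(measured, background)]
held_out = monte_carlo_cv(feature, labels, fit_nearest_neighbor, n_splits=120)
spread = prediction_variance(held_out)
```

In this sketch, `spread` maps each well index to the variance of its held-out predictions across the 120 random splits, which is one simple way to express uncertainty arising from the choice of train-test split.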

Because hydrothermal systems are sparse (less than approximately 0.1% of the total area), an additional criterion for the resulting models is that the total area predicted to have high convective upflow should be minimized. In this respect, XGBoost with ordinal regression and the binned convective signals predicts hydrothermal systems better than XGBoost with regression and the unbinned convective signals. Well sites with low convective signals have the greatest variance in predictions. We hypothesize that this greater variance results from biased sampling, since many of the wells with low convective signals were expected to host a hydrothermal system before exploration drilling. We are currently testing this hypothesis by randomly subsampling unlabeled locations distant from convective wells and treating them as having low convective signals, to augment the known locations with low convective signals.
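The augmentation test described above might be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: coordinates are assumed to be planar (x, y) values in kilometers, the 3 km distance threshold, the sample size, and the function name `sample_distant_locations` are hypothetical, and the candidate grid is synthetic.

```python
import math
import random


def sample_distant_locations(candidates, convective_wells, min_dist, k, seed=0):
    """Randomly pick up to k candidate points that lie at least min_dist
    from every well with a convective signal."""
    rng = random.Random(seed)
    eligible = [c for c in candidates
                if all(math.dist(c, w) >= min_dist for w in convective_wells)]
    return rng.sample(eligible, min(k, len(eligible)))


# Synthetic demonstration data (hypothetical, not from the study).
candidates = [(float(x), float(y)) for x in range(10) for y in range(10)]
convective_wells = [(2.0, 2.0), (7.0, 7.0)]
known_low = [(0.0, 5.0), (9.0, 1.0)]   # wells already labeled as low signal

pseudo_low = sample_distant_locations(candidates, convective_wells,
                                      min_dist=3.0, k=5)
augmented_low = known_low + pseudo_low  # low-signal set used for retraining
```

Here `augmented_low` combines the drilled low-signal wells with the randomly drawn distant locations; whether prediction variance at low-signal sites drops after retraining on such a set is the question the hypothesis test poses.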