GSA Connects 2023 Meeting in Pittsburgh, Pennsylvania

Paper No. 24-4
Presentation Time: 8:00 AM-5:30 PM

LOCATING LARGE HYDROTHERMAL SYSTEMS


MORDENSKY, Stanley P.1, BURNS, Erick R.1, LIPOR, John J.2 and DEANGELO, Jacob3, (1)U.S. Geological Survey, Geology, Minerals, Energy, and Geophysics Science Center, Portland, OR 97201, (2)Electrical & Computer Engineering, Portland State University, Portland, OR 97201, (3)U.S. Geological Survey, Geology, Minerals, Energy, and Geophysics Science Center, Moffett Field, CA 94025

We detail a machine learning workflow that uses two algorithms (linear regression and XGBoost) to predict hydrothermal upflow in the Great Basin. Feature data are extracted from datasets supporting the INnovative Geothermal Exploration through Novel Investigations Of Undiscovered Systems (INGENIOUS) project. Label data are derived from thermal gradient wells by comparing the measured heat flow at each well with the estimated conductive heat flow. We term that difference the reported convective signal; it is assumed to be due to convective heat flow, with a larger reported convective signal corresponding to greater hydrothermal convective upflow. The reported convective signals contain outliers that may affect regression (3,083 signals between -91 and 1,000 mW/m² and 192 signals spread over the much larger range of 1,000 to 11,105 mW/m²), so the influence of outliers is tested by constructing models for two cases: 1) using all the data, and 2) truncating the range of signals to only -25 to 200 mW/m².
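
The label construction and the two training cases can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the sample values are hypothetical, the sign convention (measured minus conductive) is inferred from the statement that larger signals indicate more upflow, and "truncating" is assumed here to mean dropping wells whose signal falls outside the range rather than clipping their values.

```python
def convective_signal(measured, conductive):
    """Reported convective signal in mW/m^2.

    Assumed sign convention: measured heat flow minus estimated
    conductive heat flow, so positive values suggest upflow.
    """
    return [m - c for m, c in zip(measured, conductive)]


def truncate_labels(rows, labels, low=-25.0, high=200.0):
    """Keep only samples whose label lies in [low, high].

    Assumption: out-of-range wells are dropped, not clipped.
    """
    kept = [(r, y) for r, y in zip(rows, labels) if low <= y <= high]
    return [r for r, _ in kept], [y for _, y in kept]


# Hypothetical heat-flow values (mW/m^2) for five wells.
measured = [80.0, 95.0, 300.0, 1500.0, 60.0]
conductive = [85.0, 90.0, 90.0, 100.0, 90.0]
rows = ["well_a", "well_b", "well_c", "well_d", "well_e"]

signal = convective_signal(measured, conductive)

# Case 1: all data. Case 2: truncated range.
rows_all, y_all = rows, signal
rows_trunc, y_trunc = truncate_labels(rows, signal)
```

In this toy input only the first two wells survive truncation; the other three fall outside -25 to 200 mW/m² and would be seen only by the all-data model.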

Early results demonstrate that XGBoost outperforms linear regression. Because hydrothermal systems are sparse, models that confine high predicted convective signal to smaller areas better match the natural frequency of hydrothermal systems. For XGBoost trained on the truncated range of labels, half of the high reported signals fall within the highest < 3 % of predictions. For XGBoost trained on the entire range of labels, half of the high reported signals fall within the highest < 13 % of predictions. Although this implies that the truncated regression is superior, the all-data model better predicts the locations of power-producing systems (the operating power plants lie within a smaller fraction of the study area, as delineated by the highest predictions).
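
One way to read the comparison above is as a capture metric: rank all locations by predicted signal and find the smallest top-ranked fraction that contains half of the high reported signals. The sketch below is an assumption about how such a metric could be computed, not the authors' procedure. The function name, the `high_threshold` value, and the sample data are hypothetical, and the fraction is taken over samples rather than over map area.

```python
import math


def top_fraction_for_capture(predicted, reported, high_threshold, capture=0.5):
    """Smallest top-ranked fraction of predictions that contains
    `capture` (default half) of the high reported signals."""
    is_high = [r >= high_threshold for r in reported]
    n_high = sum(is_high)
    if n_high == 0:
        raise ValueError("no reported signals at or above high_threshold")
    needed = math.ceil(capture * n_high)

    # Indices sorted from highest to lowest prediction.
    order = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)

    captured = 0
    for rank, i in enumerate(order, start=1):
        captured += is_high[i]
        if captured >= needed:
            return rank / len(predicted)


# Hypothetical predictions and reported signals (mW/m^2) at ten locations.
predicted = [9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0, 0.0]
reported = [300.0, 10.0, 250.0, 5.0, 0.0, 220.0, 1.0, 2.0, 3.0, 400.0]

fraction = top_fraction_for_capture(predicted, reported, high_threshold=200.0)
```

A smaller returned fraction means the model concentrates its highest predictions where high signals were actually reported, which is the sense in which < 3 % compares favorably with < 13 %.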

Although the models reliably predict relatively higher values for higher reported signals and lower values for lower reported signals, both XGBoost models consistently underpredict the magnitude of the higher signals. This behavior is attributed to the coarse granularity of the input features relative to the scale of a hydrothermal upflow zone (a few km or less across). Difficulty estimating exact values, combined with reliable discrimination of high versus low convective signals, suggests that a future strategy such as ranked ordinal regression (classifying signals into ordered bins for low, medium, high, and very high) might produce better models.
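
The ordered-bin idea can be illustrated with a short sketch of how continuous signals might be converted to ordinal targets. The bin edges below are purely hypothetical (the abstract does not give any), as are the function names. The cumulative encoding shown is one common way to decompose an ordinal problem into binary "exceeds bin k" problems; it is offered as an illustration, not as the method the authors intend.

```python
from bisect import bisect_right

# Hypothetical bin edges in mW/m^2; not taken from the abstract.
BIN_EDGES = [25.0, 100.0, 200.0]
BIN_NAMES = ["low", "medium", "high", "very high"]


def to_ordinal_class(signal, edges=BIN_EDGES):
    """Index of the ordered bin containing `signal` (0 = lowest bin).

    A value equal to an edge is assigned to the higher bin.
    """
    return bisect_right(edges, signal)


def cumulative_targets(cls, n_classes=len(BIN_NAMES)):
    """Binary targets [cls >= 1, cls >= 2, ...] that preserve bin order."""
    return [int(cls >= k) for k in range(1, n_classes)]


signals = [-5.0, 25.0, 150.0, 1400.0]
classes = [to_ordinal_class(s) for s in signals]
names = [BIN_NAMES[c] for c in classes]
targets = [cumulative_targets(c) for c in classes]
```

Under this encoding an outlier such as 1,400 mW/m² simply lands in the top bin, so its magnitude no longer dominates the fit the way it can in a regression on raw values.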