APPLYING DATA-DRIVEN MACHINE LEARNING TO GEOTHERMAL FAVORABILITY, WESTERN UNITED STATES

Mordensky, Stanley

Paper No. 170-9

Presentation Time: 3:50 PM

MORDENSKY, Stanley¹, LIPOR, John², DEANGELO, Jacob³, BURNS, Erick R.¹ and LINDSEY, Cary¹, (1)U.S. Geological Survey, 2130 SW 5th Ave., Portland, OR 97201, (2)Electrical & Computer Engineering, Portland State University, Portland, OR 97201, (3)U.S. Geological Survey, MS989, 345 Middlefield Road, Menlo Park, CA 94025

We show that modern machine-learning methods and data-science strategies can reproduce the essential findings of past geothermal energy assessments, and potentially improve on them, while relying less on expert input. Two foundational machine-learning algorithms, logistic regression and XGBoost, implemented with unbiased data-analysis strategies, agree with previous studies that relied much more heavily on expert-systems knowledge. The linear method, logistic regression, conforms well with the binned logistic regression and weights-of-evidence approaches used for the 2008 USGS conventional-hydrothermal resource-favorability maps. The non-linear method, XGBoost, provides an alternative interpretation that broadly agrees with these results and may provide increased granularity in favorability maps.
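One way to see why logistic regression conforms with weights of evidence is that both are additive in log-odds. The textbook forms are shown below as a sketch; they are not necessarily the exact parameterization used in the 2008 study. Here D denotes presence of a hydrothermal system, B_j a binary evidence layer (e.g., a feature value falling inside a chosen bin), and x_j a continuous feature. Weights of evidence (assuming conditional independence of the layers) and binned logistic regression both need bin edges to define each B_j; logistic regression as written uses x_j directly, with no binning decisions.

```latex
W_j^{+} = \ln \frac{P(B_j \mid D)}{P(B_j \mid \bar{D})}, \qquad
W_j^{-} = \ln \frac{P(\bar{B}_j \mid D)}{P(\bar{B}_j \mid \bar{D})}

\operatorname{logit} P(D \mid B_1, \dots, B_n)
  = \operatorname{logit} P(D) + \sum_{j=1}^{n} W_j^{\pm}
  \quad (W_j^{+} \text{ if } B_j \text{ present, } W_j^{-} \text{ if absent})

\operatorname{logit} P(D \mid \mathbf{x}) = \beta_0 + \sum_{j=1}^{n} \beta_j x_j
```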

To provide a direct comparison, we use the same input data as the 2008 conventional-hydrothermal resource-favorability study to create new favorability maps. The 2008 study relied on methods that required input data to be binned, which in turn required exploring and selecting bin values and, consequently, human decisions (e.g., number of bins, bin limits). Our study presents probability maps for the western United States created with modern, data-driven strategies (i.e., no expert choices in the algorithmic application) in an effort to remove human bias and reduce the considerable expert effort involved in creating resource maps. During the analysis, we identified two overarching challenges: (1) the training data contain only positive examples (i.e., known hydrothermal systems) and unlabeled examples (composed of negative examples [i.e., no hydrothermal system present] and unidentified positive examples), and (2) extreme class imbalance (an estimated positive-to-unlabeled ratio of approximately 1:2600). To address the first challenge, we used unsupervised clustering of features to identify groups of likely true negative examples; these likely negatives and the known positives were then sampled proportionally for use with the supervised methods. To address the second challenge, we selected a customized oversampling training strategy to create a reliable classifier.
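The two-step strategy above (find likely negatives among unlabeled examples, then rebalance classes before supervised training) can be sketched in plain Python as follows. This is a toy illustration under stated assumptions, not the study's pipeline: the abstract does not name its clustering algorithm or describe its customized oversampling, so k-means, the "lowest share of known positives" rule for picking negative clusters, and random oversampling with replacement are hypothetical stand-ins. All function names (`kmeans`, `likely_negatives`, `oversample`, `fit_logistic`, `predict_proba`) and the synthetic 2-D data are likewise hypothetical; the toy positive-to-unlabeled ratio is about 1:19, far milder than the 1:2600 reported.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means (hypothetical stand-in); returns a label per point."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Move each centroid to the mean of its members (keep it if empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def likely_negatives(positives, unlabeled, k=4, seed=0):
    """Cluster positives and unlabeled together; unlabeled points in the
    cluster(s) with the lowest share of known positives are kept as likely
    negatives (an assumed selection rule)."""
    points = positives + unlabeled
    labels = kmeans(points, k, seed=seed)
    n_pos = len(positives)
    size, pos = [0] * k, [0] * k
    for i, lab in enumerate(labels):
        size[lab] += 1
        if i < n_pos:
            pos[lab] += 1
    frac = {c: pos[c] / size[c] for c in range(k) if size[c] > 0}
    best = min(frac.values())
    neg_clusters = {c for c, f in frac.items() if f == best}
    return [p for p, lab in zip(unlabeled, labels[n_pos:]) if lab in neg_clusters]


def oversample(minority, n_target, rng):
    # Random oversampling with replacement: a generic stand-in for the
    # study's customized strategy, whose details are not given.
    extra = max(0, n_target - len(minority))
    return minority + [rng.choice(minority) for _ in range(extra)]


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_proba(w, b, x):
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


rng = random.Random(42)
# Toy 2-D features: known positives near (3, 3); the unlabeled set is mostly
# negatives near (0, 0) plus a few hidden (unidentified) positives.
positives = [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(20)]
unlabeled = ([(rng.gauss(0, 0.7), rng.gauss(0, 0.7)) for _ in range(380)]
             + [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(5)])

negatives = likely_negatives(positives, unlabeled)
pos_train = oversample(positives, len(negatives), rng)
X = pos_train + negatives
y = [1] * len(pos_train) + [0] * len(negatives)
w, b = fit_logistic(X, y)
```

Clustering positives and unlabeled points together means hidden positives tend to fall into clusters that also contain known positives, so they are less likely to be mislabeled as negatives. Oversampling then keeps the minority class from being swamped during training.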

Session No. 170

T37. Geologic Energy Research II

Tuesday, 12 October 2021: 1:30 PM-5:30 PM

B115/B116 (Hybrid Room) (Oregon Convention Center)

Geological Society of America Abstracts with Programs. Vol 53, No. 6
doi: 10.1130/abs/2021AM-365177

© Copyright 2021 The Geological Society of America (GSA), all rights reserved. Permission is hereby granted to the author(s) of this abstract to reproduce and distribute it freely, for noncommercial purposes. Permission is hereby granted to any individual scientist to download a single copy of this electronic file and reproduce up to 20 paper copies for noncommercial purposes advancing science and education, including classroom use, providing all reproductions include the complete content shown here, including the author information. All other forms of reproduction and/or transmittal are prohibited without written permission from GSA Copyright Permissions.

GSA Connects 2021 in Portland, Oregon