APPLYING DATA-DRIVEN MACHINE LEARNING TO GEOTHERMAL FAVORABILITY, WESTERN UNITED STATES
To provide a direct comparison, we use the same input data from the 2008 conventional hydrothermal resource favorability study to create new favorability maps. The 2008 study relied on methods that required the input data to be binned, which in turn required exploring and selecting bin values and therefore introduced human decisions (e.g., the number of bins and the bin limits). Our study presents probability maps for the western US created with modern, data-driven strategies (i.e., no expert choices in applying the algorithms), in an effort to remove human bias and to reduce the considerable expert effort involved in creating resource maps. During the analysis, we identified two overarching challenges: 1) the training data contain only positive examples (i.e., known hydrothermal systems) and unlabeled examples (composed of negative examples [i.e., no hydrothermal system present] and unidentified positive examples), and 2) extreme class imbalance (an estimated positive-to-unlabeled ratio of approximately 1 : 2600). To address the first challenge, we used unsupervised clustering of the features to identify groups of likely true negative examples; these likely negatives and the known positives were then sampled proportionally for use with the supervised methods. To address the second challenge, we selected a customized oversampling training strategy to create a reliable classifier.
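The two-step strategy described above (clustering to find likely negatives among unlabeled examples, then oversampling the rare positives) can be illustrated with a minimal, self-contained Python sketch. This is not the authors' implementation. Several choices here are assumptions made only for illustration: the clustering method (plain k-means, written with the standard library), the rule that a cluster counts as "likely negative" only when its fraction of known positives is at or below a hypothetical `max_pos_frac` threshold, and plain random oversampling of positives, used here in place of the customized oversampling strategy. The function names `kmeans`, `likely_negatives`, and `oversample_positives`, as well as the toy data, are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iters: int = 50, seed: int = 0) -> List[int]:
    """Plain Lloyd's k-means (illustrative choice); returns a cluster index per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(list(points), k)]
    assign: List[int] = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center (Euclidean distance).
        new_assign = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_assign == assign:
            break  # converged: assignments did not change
        assign = new_assign
        # Update step: move each center to the mean of its member points.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def likely_negatives(points: Sequence[Point], labels: Sequence[int], k: int,
                     max_pos_frac: float = 0.0, seed: int = 0) -> List[int]:
    """Indices of unlabeled points (label 0) in clusters whose share of known
    positives is <= max_pos_frac (hypothetical selection rule)."""
    assign = kmeans(points, k, seed=seed)
    size, pos = [0] * k, [0] * k
    for a, y in zip(assign, labels):
        size[a] += 1
        pos[a] += y
    keep = {c for c in range(k) if size[c] and pos[c] / size[c] <= max_pos_frac}
    return [i for i, (a, y) in enumerate(zip(assign, labels)) if y == 0 and a in keep]


def oversample_positives(points: Sequence[Point], labels: Sequence[int],
                         neg_idx: Sequence[int], seed: int = 0):
    """Balance the training set by drawing positives with replacement
    (simple random oversampling, standing in for a customized strategy)."""
    rng = random.Random(seed)
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    n_extra = max(0, len(neg_idx) - len(pos_idx))
    extra = [rng.choice(pos_idx) for _ in range(n_extra)]
    train_idx = pos_idx + extra + list(neg_idx)
    return [points[i] for i in train_idx], [labels[i] for i in train_idx]


# Toy example: two known positives near the origin, two unlabeled points that
# share their cluster, and six unlabeled points in a distant cluster.
points = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.4), (0.3, 0.3),
          (10.0, 10.0), (10.2, 9.8), (9.9, 10.1), (10.1, 10.3), (9.7, 9.9), (10.4, 10.0)]
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

neg_idx = likely_negatives(points, labels, k=2)
X_train, y_train = oversample_positives(points, labels, neg_idx)
print(neg_idx)  # [4, 5, 6, 7, 8, 9]
print(sum(y_train), len(y_train) - sum(y_train))  # 6 6
```

On the toy data, the cluster containing the known positives also holds two unlabeled points, so it is excluded; those two points are treated like possible unidentified positives rather than negatives. The six points in the other cluster become likely negatives, and the two positives are resampled to give a balanced 6 : 6 training set. In practice, the number of clusters and the selection threshold would strongly affect which unlabeled examples are treated as negatives.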