GSA 2020 Connects Online

Paper No. 140-7
Presentation Time: 3:15 PM

APPLYING MACHINE LEARNING METHODS TO PREDICT GEOLOGY USING SOIL SAMPLE GEOCHEMISTRY


LUI, Timothy Chee Cheng1, GREGORY, Daniel1, COWLING, Sharon A.1 and LEE, Well-Shen2, (1)Department of Earth Sciences, University of Toronto, 22 Russell Street, Earth Science Centre, Toronto, ON M5S 3B1, Canada, (2)Harquail School of Earth Sciences, Laurentian University, 935 Ramsey Lake Road, Sudbury, ON P3E 2C6, Canada

Understanding the underlying geology of an area can be important for a variety of reasons, but when bedrock is not exposed at the surface, that geology can be hard to determine. In this study, we compared the effectiveness of various machine learning methods that use soil sample geochemistry to predict the underlying geology.

The study area was the Klaza Property, located in the southern part of the Dawson Range in south-central Yukon, Canada. The area is mostly underlain by igneous rock, specifically mid-Cretaceous granodiorite, although intrusive dykes and metamorphosed rocks are also present. It spans a 30 km by 15 km rectangle. Our dataset comprised more than 6700 soil samples, each with measured concentrations of 28 elements. We augmented the dataset with topographic features to test whether such data improve predictions.

We evaluated the performance of 9 different machine learning algorithms: logistic regression, quadratic discriminant analysis, nearest neighbors, support-vector machine, naïve Bayes, artificial neural network, random forest, AdaBoost random forest, and gradient boosted random forest.

We then tested 6 sampling methods for balancing the classes in our dataset: random undersampling, random oversampling, Synthetic Minority Oversampling Technique (SMOTE), Adaptive Synthetic Sampling (ADASYN), SMOTE combined with Edited Nearest Neighbours, and SMOTE combined with Tomek links.
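As an illustration of the simplest of these balancing methods, the following is a minimal sketch of random oversampling in plain Python. It is not the code used in the study: the function name `random_oversample`, the toy element concentrations, and the class labels are all hypothetical. SMOTE and ADASYN differ in that they synthesize new minority samples by interpolating between neighbouring minority samples rather than duplicating existing ones.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        # Rows of this class that can be duplicated
        pool = [s for s, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


# Hypothetical toy data: two element concentrations per soil sample,
# labelled with an assumed underlying rock type.
X = [(12.0, 0.3), (15.0, 0.4), (11.0, 0.2), (14.0, 0.5), (40.0, 2.1)]
y = ["granodiorite", "granodiorite", "granodiorite", "granodiorite", "dyke"]

X_bal, y_bal = random_oversample(X, y)
balanced_counts = Counter(y_bal)
```

After the call, both classes in `balanced_counts` have four samples. Resampling of this kind would normally be applied to the training split only, so that duplicated rows do not leak into the evaluation data.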

Finally, we compared a variety of multiple classifier systems (MCS) that combine algorithms to increase predictive performance. We tested different combinations of algorithms as well as different ensembling methods, including summing the class probabilities predicted by the algorithms and fitting a logistic regression to those probabilities.
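The first ensembling method mentioned above, adding the probabilities of the algorithms, can be sketched as follows. This is an illustrative example only, not the authors' implementation: the function name `sum_probabilities`, the three classifiers, the class names, and the probability values are all assumed for the sake of the example.

```python
def sum_probabilities(model_probs):
    """Combine per-model class probabilities for one sample by adding them,
    then return the class with the largest total and the totals themselves."""
    totals = {}
    for probs in model_probs:
        for cls, p in probs.items():
            totals[cls] = totals.get(cls, 0.0) + p
    best = max(totals, key=totals.get)
    return best, totals


# Hypothetical probability outputs of three classifiers for one soil sample
svm_probs = {"granodiorite": 0.55, "dyke": 0.30, "metamorphic": 0.15}
ann_probs = {"granodiorite": 0.35, "dyke": 0.50, "metamorphic": 0.15}
rf_probs = {"granodiorite": 0.30, "dyke": 0.60, "metamorphic": 0.10}

predicted, totals = sum_probabilities([svm_probs, ann_probs, rf_probs])
```

Here one classifier favours granodiorite, but the summed probabilities favour the dyke class. The second method described in the text replaces the fixed sum with a logistic regression trained on the same per-model probabilities, which allows the combiner to weight some algorithms more heavily than others.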

Through this study we found that: 1) topographic features can increase the performance of the algorithms, 2) gradient boosted random forest was the best individual algorithm, 3) ADASYN was the most effective sampling method, and 4) MCS can improve performance. Our best model was an MCS that used a logistic regression to combine a support-vector machine, artificial neural network, random forest, and gradient boosted random forest. These findings identify which machine learning methods were most effective on this dataset and can help guide the prediction of underlying geology from soil sample geochemistry.