APPLYING MACHINE LEARNING METHODS TO PREDICT GEOLOGY USING SOIL SAMPLE GEOCHEMISTRY
The study area was the Klaza Property, located in the southern part of the Dawson Range in south-central Yukon, Canada. The area is mostly underlain by igneous rock, specifically mid-Cretaceous granodiorite, but intrusive dykes and metamorphosed rocks are also present. It spans a rectangle of 30 km by 15 km. Our dataset comprised more than 6700 soil samples, each with measured concentrations of 28 elements. We augmented the dataset with topographic features to test whether this type of data improves prediction.
We evaluated the performance of 9 different machine learning algorithms: logistic regression, quadratic discriminant analysis, nearest neighbors, support-vector machine, naïve Bayes, artificial neural network, random forest, AdaBoost random forest, and gradient boosted random forest.
We then tested 6 sampling methods for balancing the class distribution of our dataset: random undersampling, random oversampling, Synthetic Minority Oversampling Technique (SMOTE), Adaptive Synthetic Sampling (ADASYN), SMOTE combined with Edited Nearest Neighbours, and SMOTE combined with Tomek links.
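To make the idea of balancing concrete, the following is a minimal sketch of two of the simpler ideas named above: random oversampling, and the interpolation step at the core of SMOTE. It is not the implementation used in the study. The function names and the toy rock-type labels are assumptions chosen for illustration, and the sketch omits the nearest-neighbour search that a full SMOTE performs.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate rows of the smaller classes at random until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def smote_like_point(x, neighbor, rng):
    """Create one synthetic sample on the line segment between a minority
    sample and one of its neighbours (the interpolation step of SMOTE)."""
    t = rng.random()  # position along the segment, in [0, 1)
    return [a + t * (b - a) for a, b in zip(x, neighbor)]


# Hypothetical toy data: two element concentrations per sample.
X = [[1.0, 2.0], [1.1, 2.1], [5.0, 6.0], [0.9, 1.9]]
y = ["granodiorite", "granodiorite", "dyke", "granodiorite"]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Random oversampling only repeats existing samples, whereas SMOTE-style interpolation creates new points between neighbouring minority samples; ADASYN follows the same interpolation idea but generates more points where the minority class is harder to learn.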
Finally, we compared a variety of multiple classifier systems (MCS), which combine several algorithms to increase predictive performance. We tested different combinations of algorithms as well as different ensembling methods, including summing the class probabilities predicted by the algorithms and applying a logistic regression to those probabilities.
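The two ensembling methods mentioned above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the study's code: the function names are hypothetical, the probabilities are made-up toy values, and the logistic-regression combiner is a small binary version fitted by plain gradient descent, whereas the study's task may involve more than two classes.

```python
import math


def sum_rule(prob_sets):
    """Add per-class probabilities across models and pick the top class.

    prob_sets[m][i][k] is model m's probability that sample i is class k.
    """
    n_samples = len(prob_sets[0])
    n_classes = len(prob_sets[0][0])
    predictions = []
    for i in range(n_samples):
        totals = [sum(model[i][k] for model in prob_sets)
                  for k in range(n_classes)]
        predictions.append(totals.index(max(totals)))
    return predictions


def fit_logistic_stacker(features, labels, lr=0.5, epochs=2000):
    """Fit a binary logistic regression on base-model probabilities
    using full-batch gradient descent; returns weights and bias."""
    w = [0.0] * len(features[0])
    b = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, t in zip(features, labels):
            z = b + sum(wj * xj for wj, xj in zip(w, x))
            err = 1.0 / (1.0 + math.exp(-z)) - t  # predicted minus true
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def stacker_predict(w, b, x):
    """Predict class 1 when the fitted linear score is positive."""
    return 1 if b + sum(wj * xj for wj, xj in zip(w, x)) > 0 else 0


# Toy probabilities from two hypothetical base models, two samples, two classes.
model_a = [[0.9, 0.1], [0.4, 0.6]]
model_b = [[0.3, 0.7], [0.2, 0.8]]
print(sum_rule([model_a, model_b]))
```

The sum rule treats every base model equally, while the logistic-regression combiner learns a weight for each model's output. In practice the combiner should be fitted on predictions for samples the base models did not see during training, to avoid rewarding overfitted models.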
Through this study we found that: 1) topographic features can increase the performance of the algorithms, 2) gradient boosted random forest was the best individual algorithm, 3) ADASYN was the most effective sampling method, and 4) MCS can improve performance. Our best model was an MCS that used a logistic regression to combine a support-vector machine, artificial neural network, random forest, and gradient boosted random forest. These findings identify which machine learning methods are most effective for predicting underlying geology from soil sample geochemistry.