UNDERSTANDING THE GEOCHEMISTRY OF SOUTHERN CALIFORNIA PLUTONIC ROCKS USING AUTOMATED MACHINE LEARNING
Over the past 70 years, machine learning has emerged as a way to analyze many variables simultaneously and to identify patterns in geoscience data. However, geologists lack a tool-supported pipeline for applying machine learning from data preparation through model evaluation. Here we present such a pipeline, implemented as an open web application written in Python, to guide geologists in applying automated machine learning (autoML) to multivariate data. We apply it to geochemical analyses of 500 plutonic rock samples from southern California, using both supervised and unsupervised learning algorithms.
Five supervised learning algorithms were used to classify the samples: Decision Tree, K-Nearest Neighbors, Logistic Regression, Support Vector Machines, and Multi-Layer Perceptron. The Decision Tree model gave the best average accuracy (87%), precision (89%), and recall (89%), and it also made explicit the decision rules used in the classification.
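A minimal sketch of the classification and evaluation step, assuming scikit-learn (the abstract does not name the library behind the web application). Synthetic data from `make_classification` stands in for the geochemistry table; the feature names, the three classes, and the tree depth are hypothetical. "Average" is taken here to mean the mean over five cross-validation folds, with precision and recall macro-averaged across classes; the paper's exact averaging scheme may differ.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical stand-in for the geochemistry table: 500 samples, 10 features, 3 classes.
X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
feature_names = [f"element_{i}" for i in range(X.shape[1])]  # hypothetical names

# Shallow tree (depth is an assumption) so the learned rules stay readable.
clf = DecisionTreeClassifier(max_depth=4, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
metrics = ("accuracy", "precision_macro", "recall_macro")
scores = cross_validate(clf, X, y, cv=cv, scoring=metrics)

# Mean of each metric over the five held-out folds.
summary = {name: float(np.mean(scores[f"test_{name}"])) for name in metrics}
for name, value in summary.items():
    print(f"{name}: {value:.2f}")

# Refit on all samples and print the if/else rules the tree learned.
clf.fit(X, y)
print(export_text(clf, feature_names=feature_names))
```

The printed rules are what make a decision tree interpretable: each split names one feature and a threshold, so a geologist can trace why a sample was assigned to a class.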
Two unsupervised learning approaches were used: principal component analysis (PCA) and K-Means clustering. Up to five principal components were retained, explaining 72% of the data variance. These components were input to the K-Means algorithm to generate three clusters. The components and clusters may be related to: 1) mafic-to-felsic differentiation, with compatible elements of small ionic radius (MgO, Co, V, Mn, and HREE) loading positively and incompatible elements of large ionic radius (K2O, Rb, and LREE) loading negatively; 2) pressure effects and magma source depth, with Sr positive, Y negative, and the other REEs arranged between them; and 3) water effects, with immobile elements (Ta, Nb) positive and mobile alkali elements (Na, K, Rb, Cs) negative, possibly also reflecting elements enriched in hydrothermal deposits and radioactive elements.
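The PCA-then-K-Means workflow can be sketched as follows, again assuming scikit-learn. The element list, the lognormal random data, and the log transform before standardization are illustrative assumptions, not the authors' preprocessing; because the data are random, the variance explained will not match the 72% reported for the real samples. The last lines show how the sign of each element's loading on a component is read during interpretation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical stand-in: 500 samples x 12 oxide/trace-element columns.
elements = ["SiO2", "MgO", "K2O", "Na2O", "Co", "V", "Rb", "Sr", "Y", "Nb", "Ta", "La"]
X = rng.lognormal(mean=0.0, sigma=0.5, size=(500, len(elements)))

# Assumed preprocessing: log-transform, then standardize so wt% oxides
# and ppm trace elements contribute on a comparable scale.
Z = StandardScaler().fit_transform(np.log(X))

# Keep five principal components and report cumulative variance explained.
pca = PCA(n_components=5)
scores = pca.fit_transform(Z)
explained = pca.explained_variance_ratio_.cumsum()
print("cumulative variance explained:", np.round(explained, 2))

# Cluster the samples in the five-component score space.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(scores)
print("cluster sizes:", np.bincount(labels))

# Loadings: elements at opposite ends of PC1 define its interpretation.
pc1 = pca.components_[0]
order = np.argsort(pc1)
print("most negative on PC1:", [elements[i] for i in order[:3]])
print("most positive on PC1:", [elements[i] for i in order[-3:]])
```

Clustering on the retained component scores, rather than on every standardized column, means K-Means works in a lower-dimensional space dominated by the main sources of variance; whether this matches the authors' exact configuration is not stated in the abstract.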