Cordilleran Section - 117th Annual Meeting - 2021

Paper No. 7-10
Presentation Time: 12:10 PM

UNDERSTANDING THE GEOCHEMISTRY OF SOUTHERN CALIFORNIA PLUTONIC ROCKS USING AUTOMATED MACHINE LEARNING


ESTEBAN, Oscar1, ALFEREZ, German1, MARTINEZ ARDILA, Ana2 and CLAUSEN, Benjamin L.3, (1)Institute of Data Science, Universidad de Montemorelos, Av Libertad 1300 Pte., Montemorelos, NL 67500, Mexico, (2)Dept of Earth and Biological Sciences, Loma Linda Univ, Loma Linda, CA 92350, (3)Dept of Earth and Biological Sciences, Loma Linda Univ, Geoscience Research Inst, Loma Linda, CA 92350

Southern California plutonic rocks found in the Transverse and Peninsular Ranges have been divided into as many as eight different groups that are separated by faults and distinguished by varying crustal thickness and mantle component sources. Baird et al. (1984) systematically collected about 500 samples of these rocks. Elemental and isotopic data from these samples have been displayed and analyzed using standard discrimination, bivariate, and ternary diagrams; however, displaying only two or three elements at a time does not fully exploit the available multivariate data.

Over the past 70 years, machine learning has emerged as a way to analyze many variables simultaneously and to examine patterns in geoscience data. However, geologists lack a tool-supported pipeline for applying machine learning from data preparation through model evaluation. Here we present such a pipeline to guide geologists in applying automated machine learning (autoML) to multivariate data; it is implemented as an open web application developed in Python. We apply the pipeline to the approximately 500 southern California plutonic geochemistry samples using both supervised and unsupervised learning algorithms.
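The stages of such a pipeline (data preparation, train/test split, model fitting, and evaluation) can be sketched with NumPy alone. This is a minimal illustration, not the authors' web application: the synthetic data, the hypothetical function names (`split`, `standardize`, `knn_predict`, `macro_scores`), the 70/30 split, and k = 5 are all assumptions chosen for the example, and the data do not come from the Baird et al. (1984) samples.

```python
import numpy as np

def split(X, y, test_frac=0.3, seed=0):
    """Shuffle the samples and split them into training and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_test = int(round(test_frac * len(y)))
    test, train = idx[:n_test], idx[n_test:]
    return X[train], X[test], y[train], y[test]

def standardize(X_train, X_test):
    """Scale each element column to zero mean and unit variance using training statistics only."""
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant columns
    return (X_train - mu) / sd, (X_test - mu) / sd

def knn_predict(X_train, y_train, X_test, k=5):
    """K-Nearest Neighbors: majority label among the k closest training samples."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    votes = y_train[nearest]
    return np.array([np.bincount(row).argmax() for row in votes])

def macro_scores(y_true, y_pred):
    """Accuracy, plus precision and recall averaged over classes (macro average)."""
    prec, rec = [], []
    for c in np.unique(y_true):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        prec.append(tp / (tp + fp) if tp + fp else 0.0)
        rec.append(tp / (tp + fn) if tp + fn else 0.0)
    return float(np.mean(y_true == y_pred)), float(np.mean(prec)), float(np.mean(rec))

# Synthetic stand-in for a geochemical table: 3 groups, 6 "element" columns.
rng = np.random.default_rng(42)
centers = rng.normal(0, 3, size=(3, 6))
y = np.repeat(np.arange(3), 100)
X = centers[y] + rng.normal(0, 1, size=(300, 6))

X_tr, X_te, y_tr, y_te = split(X, y)
X_tr, X_te = standardize(X_tr, X_te)
acc, prec, rec = macro_scores(y_te, knn_predict(X_tr, y_tr, X_te))
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
```

Fitting the scaling on the training split only keeps test-set information out of the preparation step, so the reported scores reflect performance on genuinely unseen samples.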

Five supervised learning algorithms were used to classify the samples: Decision Tree, K-Nearest Neighbors, Logistic Regression, Support Vector Machines, and Multi-Layer Perceptron. The Decision Tree model achieved the best average accuracy (87%), precision (89%), and recall (89%), and it also made explicit the decisions used in the classification.
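A Decision Tree can expose its decisions because each internal node is a single threshold test on one variable. The sketch below grows a small CART-style tree by minimizing Gini impurity and prints its splits as if/else rules. The tree builder, the column names (`SiO2`, `MgO`, `Sr`), the maximum depth of 2, and the synthetic data are hypothetical; the abstract does not state which implementation or hyperparameters the pipeline uses.

```python
import numpy as np

def gini(y):
    """Gini impurity of a label array."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Return (feature, threshold) minimizing the weighted Gini impurity of the two children."""
    best = (None, None, gini(y))
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        for t in (values[:-1] + values[1:]) / 2:  # midpoints between sorted unique values
            left = X[:, j] <= t
            score = (left.sum() * gini(y[left]) + (~left).sum() * gini(y[~left])) / len(y)
            if score < best[2]:
                best = (j, t, score)
    return best[0], best[1]

def grow(X, y, depth=0, max_depth=2):
    """Recursively build the tree as nested dicts; leaves hold the majority label."""
    feature, threshold = (None, None) if depth == max_depth else best_split(X, y)
    if feature is None:  # depth limit reached or no split reduces impurity
        return {"label": int(np.bincount(y).argmax())}
    left = X[:, feature] <= threshold
    return {"feature": feature, "threshold": float(threshold),
            "left": grow(X[left], y[left], depth + 1, max_depth),
            "right": grow(X[~left], y[~left], depth + 1, max_depth)}

def predict_one(node, x):
    """Follow threshold tests from the root down to a leaf label."""
    while "label" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["label"]

def rules(node, names, indent=""):
    """Render the tree as human-readable if/else rules."""
    if "label" in node:
        return [f"{indent}-> group {node['label']}"]
    name, t = names[node["feature"]], node["threshold"]
    return ([f"{indent}if {name} <= {t:.2f}:"] + rules(node["left"], names, indent + "    ")
            + [f"{indent}else:  # {name} > {t:.2f}"] + rules(node["right"], names, indent + "    "))

# Synthetic data: three groups, each shifted in a different column.
rng = np.random.default_rng(0)
names = ["SiO2", "MgO", "Sr"]
y = np.repeat(np.arange(3), 50)
X = rng.normal(0, 1, size=(150, 3)) + np.array([[0, 4, 0], [4, 0, 0], [0, 0, 4]])[y]
tree = grow(X, y)
print("\n".join(rules(tree, names)))
```

Printing the rules is what distinguishes a tree from the other listed classifiers here: the fitted model is itself a short list of element thresholds a geologist can read and check.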

Two unsupervised learning approaches were used: principal component analysis (PCA) and K-Means clustering. Up to five principal components were retained, which together explain 72% of the data variance, and these components were input to the K-Means algorithm to generate three clusters. The components and clusters may be related to: 1) mafic-to-felsic differentiation, with compatible elements of small ionic radius (MgO, Co, V, Mn, and HREE) positive and incompatible elements of large ionic radius (K2O, Rb, and LREE) negative; 2) pressure effects and magma source depth, with Sr positive, Y negative, and the other REEs arranged between them; and 3) water effects, with immobile elements (Ta, Nb) positive and mobile alkali elements (Na, K, Rb, Cs) negative, possibly along with elements enriched in hydrothermal deposits and radioactive elements.
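The PCA-then-K-Means step can be sketched in NumPy: standardize the element columns, compute principal components by singular value decomposition, keep the fewest components whose cumulative explained variance reaches a target, and cluster the component scores with Lloyd's K-Means algorithm. The synthetic data, the hypothetical function names `pca` and `kmeans`, the use of 0.72 as the target (borrowed from the abstract only as an example value), and the random-sample centroid initialization are assumptions, not the authors' code.

```python
import numpy as np

def pca(X, target=0.72):
    """PCA via SVD of standardized data; keep the fewest components reaching `target` variance."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # assumes no constant columns
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)        # variance fraction per component
    n = min(int(np.searchsorted(np.cumsum(explained), target)) + 1, len(s))
    scores = Z @ Vt[:n].T                      # sample coordinates on retained components
    loadings = Vt[:n]                          # element weights per component (sign is arbitrary)
    return scores, loadings, explained[:n]

def kmeans(P, k=3, iters=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    rng = np.random.default_rng(seed)
    centroids = P[rng.choice(len(P), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(P[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Keep the old centroid if a cluster ends up empty.
        new = np.array([P[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Synthetic data: 3 latent groups mixed into 8 "element" columns plus noise.
rng = np.random.default_rng(1)
y_true = np.repeat(np.arange(3), 70)
latent = rng.normal(0, 1, size=(210, 2)) + np.array([[0, 0], [5, 0], [0, 5]])[y_true]
mixing = rng.normal(0, 1, size=(2, 8))
X = latent @ mixing + rng.normal(0, 0.3, size=(210, 8))

scores, loadings, explained = pca(X)
labels, _ = kmeans(scores)
print(len(explained), "components explain", round(float(explained.sum()), 2))
print("cluster sizes:", np.bincount(labels, minlength=3))
```

Because the sign of each principal component is arbitrary, an interpretation such as "compatible elements positive, incompatible elements negative" describes the opposition of loadings within one component rather than an absolute direction; flipping every sign in a row of `loadings` gives an equally valid solution.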