GSA Connects 2022 meeting in Denver, Colorado

Paper No. 187-10
Presentation Time: 4:15 PM

CLASSIFYING DETRITAL ZIRCON U-PB AGE DISTRIBUTIONS USING MACHINE LEARNING


FEKETE, Jack, Geosciences, University of Arkansas, Fayetteville, AR 72703

Detrital zircon (DZ) geochronology has become a widely used technique in recent decades, in part because laser ablation-inductively coupled plasma-mass spectrometry (LA-ICP-MS) makes large-n datasets easy to acquire. However, large DZ datasets can be challenging to analyze with classic methods of visual inspection and/or existing quantitative comparisons (e.g., the cross-correlation coefficient, R²). Given the continued growth in the size and complexity of DZ datasets and the lack of quantitative methods for assessing intergroup relationships, machine learning (ML) and artificial intelligence offer a new opportunity not only to handle large amounts of data efficiently but also to identify trends and groupings among samples of unknown affinity that cannot be quickly recognized by visual inspection. Here we test the utility of ML classification of DZ age distributions by training algorithms on synthetic samples derived from eight sources of varying similarity, predicting the source of new derivative samples, and judging each ML algorithm's success by its F1 score (the harmonic mean of precision and recall). Source similarity was quantified with R²: each set of eight sources was constructed such that all inter-source comparisons yield R² values within a 0.1 range (e.g., 0.55-0.65). In total, nine sets of sources were created (0.1-wide R² intervals from 0.15-0.25 through 0.85-0.95, plus 0.95-0.99). Within each R² interval, the number of sources included in the ML pipeline (up to eight) and the number of analyses per derivative sample (10-1000) were varied to observe the resulting change in F1 scores. Overall, ML pipelines achieved higher F1 scores when fewer sources were used, when source similarity was lower, and when synthetic samples contained more analyses. ML pipelines also outperformed predictions based on R² alone, improving F1 scores by as much as 16.9% and by ~3.2% on average across all intervals.
These results provide a proof of concept for using ML classification to quickly and efficiently identify grouping relationships in large DZ datasets. Future research directions include applying ML classification algorithms to derivative synthetic samples with varying source components and testing the approach on real-world datasets.
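The R²-based baseline and the F1 scoring described in this abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the author's pipeline: the three sources in `SOURCES`, their age peaks, the fixed 20 Myr Gaussian KDE bandwidth, the 0-3000 Ma grid, the sample sizes, and every function name are hypothetical; the sources are not tuned to fall within a single 0.1 R² interval; and macro-averaging of per-source F1 is an assumed choice. R² is computed as the squared Pearson correlation between two density curves evaluated on a common age grid. Because the abstract does not name the ML algorithms used, none is shown here; a trained classifier would take the place of `predict_by_r2`.

```python
import math
import random

# Hypothetical sources: each is a list of (peak age in Ma, spread in Ma, weight).
SOURCES = {
    "A": [(1100, 40, 0.6), (450, 30, 0.4)],
    "B": [(1700, 50, 0.5), (1100, 40, 0.5)],
    "C": [(300, 25, 0.7), (1400, 60, 0.3)],
}

def draw_sample(components, n, rng):
    """Draw n synthetic grain ages from a source's mixture of Gaussian age peaks."""
    weights = [c[2] for c in components]
    ages = []
    for _ in range(n):
        peak, spread, _w = rng.choices(components, weights=weights)[0]
        ages.append(rng.gauss(peak, spread))
    return ages

def kde(ages, grid, bandwidth=20.0):
    """Gaussian KDE of ages evaluated on grid (assumed fixed bandwidth, in Myr)."""
    norm = 1.0 / (len(ages) * bandwidth * math.sqrt(2 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((t - a) / bandwidth) ** 2) for a in ages)
            for t in grid]

def r_squared(x, y):
    """Cross-correlation coefficient: squared Pearson correlation of two curves."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov * cov / (vx * vy)

def predict_by_r2(sample_ages, source_curves, grid):
    """Baseline: assign the sample to the source whose curve gives the highest R^2."""
    curve = kde(sample_ages, grid)
    return max(source_curves, key=lambda name: r_squared(curve, source_curves[name]))

def macro_f1(y_true, y_pred):
    """Unweighted mean over classes of F1 = 2PR / (P + R)."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)

rng = random.Random(42)
grid = list(range(0, 3000, 10))  # ages in Ma, 10 Myr spacing
# Reference curve for each source, built from a large synthetic draw.
source_curves = {name: kde(draw_sample(comp, 500, rng), grid)
                 for name, comp in SOURCES.items()}

y_true, y_pred = [], []
for name, comp in SOURCES.items():
    for _ in range(20):  # 20 derivative samples per source
        sample = draw_sample(comp, 60, rng)  # 60 analyses per derivative sample
        y_true.append(name)
        y_pred.append(predict_by_r2(sample, source_curves, grid))

print(f"R^2-baseline macro F1: {macro_f1(y_true, y_pred):.3f}")
```

Changing the 60 analyses per derivative sample, or moving the source peaks closer together, is one way to explore the trends reported above; this toy setup is not calibrated to reproduce the abstract's numbers.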