MACHINE LEARNING APPLIED TO A PETROGRAPHIC DATABASE: INTRODUCING THE GLOBAL PREDICTION OF SAND MINERALOGY (GLOPRSM) MODEL
To date, point count data for 3,512 sand samples have been compiled from 50 published sources. Environmental data, including precipitation, temperature, relief, slope, basin area and source lithology, were extracted for the catchments of a subset of 3,208 fluvial and marine samples. These data were used to train two Random Forest models to predict the log-ratios of samples’ quartz-feldspar-lithic (QFL) proportions. Preliminary results show that the random forests explain 68% and 78% of the variability of the two log-ratios. Using the BasinATLAS dataset, a Global Prediction of Sand Mineralogy (GloPrSM) was generated for level 8 watersheds (mean area ~8,000 km²). In general, GloPrSM predicts quartz enrichment in tropical latitudes (30°N to 30°S), feldspar enrichment near plutonic and metamorphic crystalline terranes, and lithic enrichment near active margins and flood basalts. Feature importance algorithms revealed that slope, temperature, metamorphic source abundance, and felsic to intermediate plutonic source abundance are the most important predictors in the log-ratio models. Future research will increase the granularity of GloPrSM by incorporating additional grain types, including mono- and polycrystalline quartz, chert, alkali feldspar, plagioclase, and volcanic, sedimentary and metamorphic lithics.
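The abstract does not state which log-ratio transform was applied to the QFL proportions. As a minimal sketch, assuming an additive log-ratio with lithics as the common denominator (the function names and the point counts below are hypothetical, not taken from the study), the following Python shows how three proportions reduce to two unconstrained log-ratios, and how a pair of predicted log-ratios could be mapped back to proportions that sum to one.

```python
import math


def qfl_to_log_ratios(q, f, l):
    """Convert Q, F, L counts to two log-ratios: ln(Q/L) and ln(F/L).

    Assumption: additive log-ratio with L as denominator. The study may
    have used a different pair of log-ratios.
    """
    if min(q, f, l) <= 0:
        # Log-ratios are undefined for zero components; zero counts would
        # need a replacement strategy before transforming.
        raise ValueError("all components must be positive")
    total = q + f + l
    q, f, l = q / total, f / total, l / total  # closure to proportions
    return math.log(q / l), math.log(f / l)


def log_ratios_to_qfl(lr_q, lr_f):
    """Invert the transform: two log-ratios back to Q, F, L proportions."""
    eq, ef = math.exp(lr_q), math.exp(lr_f)
    denom = 1.0 + eq + ef
    return eq / denom, ef / denom, 1.0 / denom


# Hypothetical point count: 150 quartz, 90 feldspar, 60 lithic grains.
ratios = qfl_to_log_ratios(150, 90, 60)
recovered = log_ratios_to_qfl(*ratios)
```

Under this assumption, each log-ratio can be modelled by its own regressor, which would be consistent with the two Random Forest models described above, and the inverse transform guarantees that the predicted Q, F and L proportions are positive and sum to one.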