Paper No. 16-3
Presentation Time: 8:45 AM
PREDICTING MODERN SEDIMENT COMPOSITION – MACHINE LEARNING APPLIED TO A GLOBAL PETROGRAPHIC DATABASE
JOHNSON, Isaac1, SHARMAN, Glenn R.1, SZYMANSKI, Eugene2 and HUANG, Xiao1, (1)Geosciences, University of Arkansas, 340 N. Campus Dr., 216 Gearhart Hall, Fayetteville, AR 72701, (2)Utah Geological Survey, 1594 West North Temple, Suite 3110, Salt Lake City, UT 84116
Sandstone petrography has long been used as a tool to infer sedimentary provenance. In the latter half of the 20th century, sandstone petrographers including William R. Dickinson and Paul E. Potter made significant advancements in relating the relative abundance of framework grains in sedimentary deposits to tectonic setting. Grain proportions may also reveal details about the boundary conditions of the systems in which sediments are formed, including source terrane lithology, climate, and transport distance. This research seeks to answer the following questions: (1) can the final modal composition of sand be predicted if boundary conditions are known; and (2) can sand modal composition be used to determine the relative control of the environmental factors that generate sediments? We investigate these questions by analyzing Pleistocene-to-modern samples for which provenance and boundary conditions are known with certainty, using existing data from published studies and new data from marine sand samples across the globe. Moreover, this research will reassess the usefulness of point count data for predicting sedimentary provenance, using Python data analysis libraries to better understand how Earth-surface processes are manifested in the global sedimentary archive.
To date, we have compiled point count data from 3,026 sand samples and 48 published sources and, of these data, we used a subset of 1,554 fluvial samples to train a Random Forest Regressor from Python's scikit-learn library. Numerical data were collected for each fluvial sample's catchment, including precipitation, temperature, relief, slope, area, erosion rate, and source rock proportions; these data comprise the Random Forest independent variables. Preliminary results reveal a positive correlation between predicted and observed composition in the test dataset, with an R² score of 0.719. Permutation feature importance was calculated for each independent variable, revealing that average basin slope is the most important predictor at a mean importance of 55.0%, with basin area and average basin temperature following at 26.3% and 17.9%, respectively. Future research will incorporate data from marine, lacustrine, and littoral depositional environments to improve the predictive capabilities of the Random Forest.
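The workflow described above (fit a Random Forest Regressor on catchment variables, score it on held-out samples, then rank predictors by permutation importance) can be sketched roughly as follows. This is a minimal illustration, not the study's code: the data are synthetic, the feature names, the three-component quartz/feldspar/lithics target, the train/test split, the hyperparameters, and the rescaling of importances to percentages are all assumptions made for the example. Only `RandomForestRegressor`, `permutation_importance`, and `train_test_split` are real scikit-learn calls.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical catchment predictors (names are illustrative assumptions).
feature_names = [
    "precipitation", "temperature", "relief", "slope",
    "area", "erosion_rate", "felsic_source_fraction",
]
n_samples = 600
X = rng.uniform(0.0, 1.0, size=(n_samples, len(feature_names)))

# Synthetic stand-in for modal composition: three fractions
# (quartz, feldspar, lithics) that sum to 1 for each sample.
logits = np.column_stack([
    2.0 * X[:, 3] + 0.5 * X[:, 1],   # "quartz" driven by slope, temperature
    1.0 * X[:, 6] - 1.0 * X[:, 1],   # "feldspar" driven by source rock
    1.5 * X[:, 2] - 1.0 * X[:, 4],   # "lithics" driven by relief, area
]) + rng.normal(0.0, 0.1, size=(n_samples, 3))
y = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Hold out a test set so the score reflects unseen samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# For multi-output targets, score() returns R^2 averaged over outputs.
r2 = model.score(X_test, y_test)
print(f"test R^2: {r2:.3f}")

# Permutation importance: drop in test score when one feature is shuffled.
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=0
)

# Assumption: express importances as percentages of their (non-negative) total.
importance = np.clip(result.importances_mean, 0.0, None)
importance_pct = 100.0 * importance / importance.sum()

for name, pct in sorted(zip(feature_names, importance_pct), key=lambda p: -p[1]):
    print(f"{name:>24s}: {pct:5.1f}%")
```

Computing permutation importance on the held-out set, as here, measures how much each predictor contributes to generalization rather than to fitting the training data; the numbers printed by this sketch reflect only the synthetic relationships built into it.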