MACHINE LEARNING ANALYSIS OF N-ALKANES FROM WOODY AND GRASSY AFRICAN PLANTS
n-Alkane chain length distributions have been suggested as an alternative indicator of woody and grassy African plants, with studies using ensemble mean distributions of these plant types as end members for continental-scale ecosystem reconstructions. However, n-alkane distributions vary considerably among individual plants. This study employs supervised machine learning to classify n-alkane distributions of modern plants, assessing the reliability of this proxy for grassy and woody plant identification.
Using a dataset of n-alkane chain lengths from ~600 modern specimens, including over 220 newly generated distributions, we employ linear (e.g., K-Nearest Neighbors) and non-linear (e.g., Neural Network) classifiers. Both achieve >80% validation accuracy in distinguishing woody and grassy plants, suggesting that n-alkane chain length distributions are reliable indicators of woody and grassy ecosystems. Future studies applying machine learning to modern calibration soils or lake sediments, which integrate plant wax n-alkanes, will establish a foundation for applying this tool to the geologic record. Our method complements established isotopic practices, offering a powerful tool for reconstructing vegetation structure in pure C3 ecosystems.
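The classification step described above can be illustrated with a minimal sketch. The abstract does not specify the implementation, so everything here is an assumption made for illustration: the function names (to_relative_abundance, knn_predict, validation_accuracy) are hypothetical, the K-Nearest Neighbors classifier is hand-written rather than taken from any particular library, and the toy values are arbitrary placeholders for four chain lengths, not measurements and not a claim about real woody or grassy distributions.

```python
import math
import random
from collections import Counter


def to_relative_abundance(concentrations):
    """Normalize raw chain-length concentrations so they sum to 1."""
    total = sum(concentrations)
    return [c / total for c in concentrations]


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training samples."""
    # Euclidean distance between relative-abundance vectors (an assumed metric).
    dists = sorted((math.dist(row, x), label) for row, label in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def validation_accuracy(X, y, k=3, holdout=0.25, seed=0):
    """Hold out a random fraction of samples and score predictions on it."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_val = max(1, int(len(idx) * holdout))
    val, train = idx[:n_val], idx[n_val:]
    train_X = [X[i] for i in train]
    train_y = [y[i] for i in train]
    correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in val)
    return correct / n_val


# Arbitrary placeholder values for four chain lengths; NOT real data.
toy_X = [to_relative_abundance(r) for r in [
    [40, 35, 15, 10], [38, 37, 15, 10], [42, 33, 14, 11],
    [10, 15, 40, 35], [12, 14, 38, 36], [11, 16, 41, 32],
]]
toy_y = ["woody"] * 3 + ["grassy"] * 3

if __name__ == "__main__":
    print(validation_accuracy(toy_X, toy_y, k=3))
```

With real specimens, each row would hold the measured abundance of every chain length in the dataset, and the held-out accuracy would play the role of the validation accuracy reported above.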