GSA Annual Meeting in Denver, Colorado, USA - 2016

Paper No. 218-3
Presentation Time: 2:00 PM


NGUYEN, Thao (1), EBERHARDT, Sven (1), WILF, Peter (2), WING, Scott L. (3) and SERRE, Thomas (1); (1) Dept. of Cognitive, Linguistic and Psychological Sciences, Brown University, Providence, RI 02912, (2) Dept. of Geosciences, Pennsylvania State Univ, University Park, PA 16802, (3) Department of Paleobiology, Smithsonian Institution, P.O. Box 37012 MRC 121, Washington, DC 20013

Leaves are the most conspicuous, abundant, and frequently fossilized plant organs, but until now the vast quantity of evolutionary information encoded in their complex, variable shapes and venation patterns has been largely inaccessible. Machine vision offers opportunities to analyze large numbers of specimens, to discover novel leaf features of angiosperm clades that may have phylogenetic significance, and to use those characters to classify unknowns, so machine learning has enormous potential to guide the identification and evolutionary analysis of fossil leaves. Here, we leverage recent developments in deep learning, a branch of machine learning that is currently revolutionizing artificial intelligence. Deep learning models high-level visual abstractions by training a deep neural network to classify images. The network learns these representations by composing a hierarchy of simple but non-linear modules: starting from pixel intensities, each successive transformation yields an increasingly abstract visual representation. To train and test the algorithm, we have assembled a large image collection of over 25,000 cleared and x-rayed leaves. Significantly, no manual preparation of the images is necessary. We report initial results with a deep learning network that accurately categorizes thousands of angiosperm leaf images into natural botanical groups (APG IV orders and families), far outperforming an earlier computer vision approach (Wilf et al., Computer vision cracks the leaf code, PNAS 2016). We further explore feature-visualization methods to better understand the wealth of novel botanical characters the network uses to categorize leaves. Finally, we report promising initial results toward the automated analysis of fossil leaves.
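
The hierarchy of simple but non-linear modules described in the abstract can be illustrated with a minimal sketch. This is not the network used in the study: it is an untrained, fully connected toy model in plain Python, and the layer sizes, the three output classes, the random weights, and all function names are assumptions made purely for illustration. It only shows how pixel intensities pass through stacked modules, each an affine map followed by a non-linearity, to yield class probabilities.

```python
import math
import random


def relu(values):
    """Simple non-linearity: negative activations are set to zero."""
    return [max(0.0, v) for v in values]


def affine(weights, bias, inputs):
    """Linear map plus bias: one output per row of weights."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_layer(n_in, n_out, rng):
    """Random, untrained weights (illustration only; a real network learns these)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(pixels, layers):
    """Pass pixel intensities through the stack, keeping every intermediate stage."""
    current = list(pixels)
    activations = [current]
    for index, (weights, bias) in enumerate(layers):
        current = affine(weights, bias, current)
        if index < len(layers) - 1:  # hidden modules apply the non-linearity
            current = relu(current)
        activations.append(current)
    return softmax(current), activations


rng = random.Random(0)
LAYER_SIZES = [16, 8, 4, 3]  # hypothetical: 16 pixels -> 8 -> 4 -> 3 classes
layers = [make_layer(a, b, rng) for a, b in zip(LAYER_SIZES, LAYER_SIZES[1:])]
pixels = [rng.random() for _ in range(LAYER_SIZES[0])]  # stand-in for a tiny image
probabilities, activations = forward(pixels, layers)
```

In practice, image classifiers of this kind are typically convolutional and their weights are fitted to labeled images rather than drawn at random; the `activations` list here merely stands in for the intermediate representations that feature-visualization methods inspect.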