TEXT-MINING THE BRYOZOAN FOSSIL RECORD

Kopperud, Bjørn Tore

Paper No. 12-6

Presentation Time: 9:15 AM

KOPPERUD, Bjørn Tore, Natural History Museum, University of Oslo, Blindernveien 31, Oslo, 0371, Norway and LIOW, Lee Hsiang, Natural History Museum, University of Oslo, Blindernveien 31, Oslo, 0371, Norway; Natural History Museum and Centre for Ecological and Evolutionary Synthesis (CEES), University of Oslo, PO Box 1066 Blindern, Oslo, 0316, Norway

An increasing number of observations of fossil organisms are being recorded by paleontologists. When studying the dynamics of speciation and extinction, researchers often use datasets compiled from the literature, such as those in the Paleobiology Database. While the Paleobiology Database is an excellent resource for many groups of fossil animals, its coverage of several groups, such as Cenozoic bryozoans, has substantial gaps. Compiling observational data from the published literature is challenging and labor-intensive. Taking Bryozoa as a case study, we use natural language processing to extract the temporal distributions of fossils automatically. We perform named-entity recognition of bryozoan species names and geological time intervals in published articles and books, using dictionaries of known names. Next, we apply supervised machine-learning techniques to decide, for each sentence in which a fossil species and a geological time interval co-occur, whether the sentence reports an observation of that species in that interval. This type of information retrieval is reproducible end to end, which makes tasks such as reference lookup and outlier inspection of the fossil record more transparent. Our preliminary results indicate that human and machine-based information retrieval are similarly accurate. Raw observed genus counts through time appear congruent with, though not identical to, previous counts of Cenozoic bryozoan richness. Using our machine-compiled data, we present capture-recapture estimates of true richness and diversification rates. Remaining challenges include updating outdated taxonomic names, incorporating non-English texts, and acquiring large volumes of documents under legal restrictions on intellectual property. We argue that this approach can readily be adapted to other groups of fossil animals and would be especially beneficial for groups that are under-represented in curated databases.
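The candidate-generation step described above (dictionary-based recognition of species names and time intervals, then pairing them when they co-occur in a sentence) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the tiny dictionaries, the invented binomials `Examplia prima` and `Examplia secunda`, the naive regex sentence splitter, and the function names are all hypothetical. The supervised classifier that labels each pair as an observation or a non-observation is not shown.

```python
import re
from itertools import product

# Hypothetical toy dictionaries. The study used dictionaries of known
# bryozoan species names and geological time intervals, not reproduced here.
SPECIES = {"Examplia prima", "Examplia secunda"}
INTERVALS = {"Eocene", "Miocene", "Pliocene"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace.

    Real literature needs more care (e.g. abbreviations such as 'sp.').
    """
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_entities(sentence, dictionary):
    """Return the dictionary terms found in the sentence as whole words, sorted."""
    hits = []
    for term in dictionary:
        if re.search(r"\b" + re.escape(term) + r"\b", sentence):
            hits.append(term)
    return sorted(hits)


def candidate_pairs(text):
    """Yield (sentence, species, interval) for every within-sentence co-occurrence.

    Each candidate is only a co-occurrence; a classifier (not shown) would
    still have to decide whether the sentence reports an observation.
    """
    for sentence in split_sentences(text):
        species = find_entities(sentence, SPECIES)
        intervals = find_entities(sentence, INTERVALS)
        for sp, iv in product(species, intervals):
            yield sentence, sp, iv


if __name__ == "__main__":
    # Invented example text; not taken from any real publication.
    text = ("Examplia prima occurs in the Miocene. "
            "Examplia secunda is not known from the Eocene or Pliocene. "
            "No fossils are described here.")
    for sentence, sp, iv in candidate_pairs(text):
        print(f"{sp} | {iv} | {sentence}")
```

The second example sentence yields two candidate pairs even though it reports absence rather than presence, which illustrates why co-occurrence alone is not enough and a trained classifier is applied to each pair.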

Session No. 12

T125. EARTHTIMEs and EarthRates: Exploring the Tempo and Mode of Earth-System Processes and Evolution

Sunday, 4 November 2018: 8:00 AM-12:00 PM

Room 143-144 (Indiana Convention Center)

Geological Society of America Abstracts with Programs. Vol. 50, No. 6
doi: 10.1130/abs/2018AM-320408

© Copyright 2018 The Geological Society of America (GSA), all rights reserved. Permission is hereby granted to the author(s) of this abstract to reproduce and distribute it freely, for noncommercial purposes. Permission is hereby granted to any individual scientist to download a single copy of this electronic file and reproduce up to 20 paper copies for noncommercial purposes advancing science and education, including classroom use, providing all reproductions include the complete content shown here, including the author information. All other forms of reproduction and/or transmittal are prohibited without written permission from GSA Copyright Permissions.

GSA Annual Meeting in Indianapolis, Indiana, USA - 2018