GSA Connects 2023 Meeting in Pittsburgh, Pennsylvania

Paper No. 16-1
Presentation Time: 8:05 AM

THE DIGITAL DATA REVOLUTION IN PALEONTOLOGY: OPPORTUNITIES AND CHALLENGES AMID AN EMBARRASSMENT OF RICHES


HENDY, Austin and HOOK, Juliet, Natural History Museum of Los Angeles County, 900 Exposition Blvd, Los Angeles, CA 90007

The digital revolution has greatly improved the preservation and availability of paleontological data, first from literature sources and more recently from museum collections. While preservation and accessibility are critical milestones in collection stewardship, research utility and ease of public consumption are also important to furthering the broader impacts of paleontological data. Using the literature-derived Paleobiology Database (PBDB) and the biodiversity aggregators Global Biodiversity Information Facility (GBIF) and Integrated Digitized Biocollections (iDigBio), we evaluate how well these datasets represent the Phanerozoic fossil record of North America. In particular, we assess the quality and research readiness of readily downloadable datasets; strengths, gaps, and biases in geographic, stratigraphic, and taxonomic data; and historical trends in collecting, publishing, and digitization activity. In addition, we use case studies of the Late Cretaceous Eastern Pacific and Western Interior Seaway regions, and of the Cenozoic of the Eastern Pacific, to evaluate the completeness, accuracy, and quality of occurrence, locality, and taxonomic data, and the potential and limitations of contextual data.

In general, iDigBio and GBIF datasets are richer in species occurrences and distinct localities than those of the PBDB. However, age and lithostratigraphic data are poorly reported to biodiversity aggregators, greatly limiting their research utility. While the majority of records in both types of dataset are identified to species-level resolution, this percentage is greater in the PBDB. Both types of portal reveal distinct patterns of taxonomic composition, reflecting the biases of museum collecting and digitization activities and the interests of publishing scientists and PBDB enterers.

Large-scale digitization of museum collections has significantly increased the accessibility and discoverability of biodiversity data through enhanced data exploration and visualization experiences and the benefit of formalized vocabulary standards. Nevertheless, variable data quality, lack of taxonomic and stratigraphic resolution, and the significant effort required to clean, standardize, and transform digitized collections present challenges to downstream users looking to activate these data.