EXTENDING THE REACH AND RESOLUTION OF THE PALEOBIOLOGY DATABASE WITH COMPUTATIONAL AND DATA INFRASTRUCTURES
Fossil occurrences within the PBDB derive from field-based observations and fossil specimens, some of which were collected, curated in museum collections, and described in publications. However, not all specimen numbers and specimen-specific data (e.g., size, morphology) are included in the PBDB. We used custom scripts and GDD to automatically locate and extract more than 133K unique numbered specimens from more than 13K papers for an initial set of 71 institutional collections. Specimen-based descriptors of morphology, taphonomy, geology, and more were then linked to these specimens and to PBDB occurrences. The publication footprint of museum collections was also computed. We also used GDD to extract potential PBDB fossil occurrences from the literature. Lithostratigraphic units described as fossil-bearing in the literature, but that are not included in the PBDB, are randomly distributed with respect to age among named rock units.
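As an illustration of the specimen-extraction step, the following is a minimal sketch of how numbered specimens might be located in publication text with a regular expression. It is not the custom scripts referred to above: the acronym list, the pattern, and the function name `extract_specimens` are all assumptions made for this example, and the actual set of 71 institutional collections is not given here.

```python
import re

# Hypothetical institutional acronyms, chosen only for illustration.
ACRONYMS = ("USNM", "YPM", "UCMP")

# Assumed citation style: acronym, optional "No.", then a catalog number.
SPECIMEN_RE = re.compile(
    r"\b(" + "|".join(ACRONYMS) + r")\s*(?:No\.\s*)?(\d{1,7})\b"
)


def extract_specimens(text):
    """Return the sorted unique (acronym, number) pairs mentioned in text."""
    found = {(m.group(1), int(m.group(2))) for m in SPECIMEN_RE.finditer(text)}
    return sorted(found)


sample = "Holotype USNM 12345 and paratype YPM No. 678; see also USNM 12345."
print(extract_specimens(sample))
```

Collecting matches into a set before sorting means that repeated mentions of the same specimen within a paper are counted once, which matches the notion of unique numbered specimens.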
The utility of fossils often depends on the precision and accuracy of age constraints. However, fossil occurrence ages in the PBDB, and within much of the literature and museum collections, remain decoupled from data that constrain geochronology. A continuous-time age model for all sediments in Macrostrat has been generated using basic principles. GDD and cyberinfrastructure exposing results from geochronological lab facilities are being used to improve this age model. Repositioning museum fossil specimens and PBDB occurrences within a continuous-time stratigraphic age model improves the effective temporal resolution of the PBDB by up to an order of magnitude, enables time bin-free quantitative analysis of fossil data, and provides a mechanism for continual refinement of fossil ages as new geochronological measurements are made.
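To make the idea of a continuous-time age concrete, the sketch below assigns an age to a fossil by linear interpolation between the dated base and top of its host rock unit. This assumes a constant sedimentation rate within the unit, which is a simplification introduced for this example and not necessarily the approach used for the Macrostrat age model. The function name `interpolate_age` and its parameters are hypothetical.

```python
def interpolate_age(base_age_ma, top_age_ma, unit_thickness_m, height_above_base_m):
    """Estimate an age in Ma for a position within a rock unit.

    Assumes constant sedimentation between the base and top ages
    (an illustrative simplification).
    """
    if unit_thickness_m <= 0:
        raise ValueError("unit thickness must be positive")
    if not 0 <= height_above_base_m <= unit_thickness_m:
        raise ValueError("position must lie within the unit")
    fraction = height_above_base_m / unit_thickness_m
    return base_age_ma + fraction * (top_age_ma - base_age_ma)


# A fossil halfway up a 50 m unit spanning 100 Ma (base) to 90 Ma (top).
print(interpolate_age(100.0, 90.0, 50.0, 25.0))
```

Under this kind of scheme, a new geochronological date for the base or top of a unit changes the inputs, and every fossil age within the unit updates with it, without reference to fixed time bins.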