Rocky Mountain Section - 75th Annual Meeting - 2025

Paper No. 36-2
Presentation Time: 10:30 AM

STREAMLINING FOSSIL INVENTORY DATA AND ANALYSIS ACROSS U.S. NATIONAL PARKS


ONG, Nathan, Salt Lake City, UT 84111

The Paleontological Resources Preservation Act of 2009 requires the National Park Service (NPS) to inventory fossils. To date, 286 parks with paleontological resources are known. The existing system used to track these data consisted of Word tables cataloging formations, fossils, depositional environments, and notes on fossil condition, cultural significance (e.g., use in building stone), and locality. While extensive and meticulous, the variable and decentralized nature of these data made access and maintenance labor-intensive.

To address this, I compiled these tables into a master spreadsheet, then created a word bank of over 2,600 total keywords, including 800 taxonomic terms, 200 geological time intervals, and 800 formations. Using the statistical software R, I mined these notes to generate individual fossil occurrence records and enrich them with beta taxonomic classifications, geologic ages, and specimen metadata (e.g., anatomical elements, type status, preservation).
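
The keyword-mining step described above can be sketched roughly as follows. This is a minimal illustration in Python rather than the R workflow the abstract refers to. The word bank entries, the record fields, and the `mine_note` helper are all hypothetical stand-ins, not the actual NPS data or code.

```python
import re

# Hypothetical word bank: lowercase keyword -> (category, standardized value).
WORD_BANK = {
    "brachiopod": ("taxon", "Brachiopoda"),
    "brachiopods": ("taxon", "Brachiopoda"),
    "crinoid": ("taxon", "Crinoidea"),
    "permian": ("age", "Permian"),
    "kaibab formation": ("formation", "Kaibab Formation"),
}


def mine_note(park, note):
    """Return one occurrence record per distinct taxon keyword found in a note."""
    text = note.lower()
    hits = {"taxon": [], "age": [], "formation": []}
    for keyword, (category, value) in WORD_BANK.items():
        # Whole-word match so a keyword does not fire inside a longer word.
        if re.search(r"\b" + re.escape(keyword) + r"\b", text):
            if value not in hits[category]:
                hits[category].append(value)
    # Attach the first age and formation found (if any) to every taxon record.
    age = hits["age"][0] if hits["age"] else None
    formation = hits["formation"][0] if hits["formation"] else None
    return [
        {"park": park, "taxon": taxon, "age": age, "formation": formation}
        for taxon in hits["taxon"]
    ]
```

Mapping both singular and plural keywords to one standardized value is one simple way to keep free-text variation from producing duplicate records. A real word bank of several thousand terms would likely need a more careful matching strategy.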

To standardize future data, I built a tool that augments and structures manual data entry while maintaining a change log for transparency. Because R lacks the features of a relational database, I also created a global update tool that propagates changes to indexed values across the dataset.
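
A global update with a change log, as described above, might look something like the sketch below. It is written in Python for illustration and assumes records are held as a list of dictionaries. The `global_update` function and the log fields are hypothetical, not the tool's actual interface.

```python
from datetime import datetime, timezone


def global_update(records, field, old, new, log):
    """Replace every `old` value of `field` with `new`, logging each change.

    Returns the number of records that were modified.
    """
    changed = 0
    for row, rec in enumerate(records):
        if rec.get(field) == old:
            rec[field] = new
            # One log entry per modified record, so each edit can be traced.
            log.append({
                "row": row,
                "field": field,
                "old": old,
                "new": new,
                "time": datetime.now(timezone.utc).isoformat(),
            })
            changed += 1
    return changed
```

Logging the old value alongside the new one means a mistaken global edit can be reversed by replaying the log backwards.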

For externally facing applications, I developed an online query tool that generates customized reports with fossil statistics, taxonomic breakdowns, and diversity curves for NPS administrators. Additionally, this dataset, in conjunction with data from the Bureau of Land Management, was used to train a machine learning algorithm that rates formations on the Potential Fossil Yield Classification (PFYC) Index, aiding paleo-mitigation planning and the identification of under-prospected formations.
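
As a rough illustration of the kind of summary behind a diversity curve, the sketch below counts distinct taxa per time interval. It is a Python sketch under assumed record fields (`age`, `taxon`), and the `richness_by_interval` helper is hypothetical. The abstract does not specify how the query tool computes its curves.

```python
from collections import defaultdict


def richness_by_interval(records, interval_order):
    """Count distinct taxa per time interval, in the order given."""
    taxa = defaultdict(set)
    for rec in records:
        # A set keeps repeated occurrences of one taxon from inflating the count.
        taxa[rec["age"]].add(rec["taxon"])
    # Intervals with no records are reported with a richness of zero.
    return [(interval, len(taxa.get(interval, set()))) for interval in interval_order]
```

This is raw taxonomic richness only. It makes no correction for uneven sampling between intervals, which a real diversity analysis would usually need to consider.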

Future goals include leveraging large language models such as ChatGPT to automate the extraction of fossil occurrence records from publications, and integrating data from accessioned specimens to further expand this vital paleontological database.