GSA Annual Meeting in Indianapolis, Indiana, USA - 2018

Paper No. 3-6
Presentation Time: 9:50 AM

BIG DATA IN PALEONTOLOGY—CREATING A ROADMAP FOR BUILDING A DATA SYNTHESIS CENTER FOR THE PALEOGEOSCIENCES


PARK BOUSH, Lisa E., Center for Integrative Geosciences, University of Connecticut, 354 Mansfield Road, Storrs, CT 06269-1045, WILLIAMS, John W., Department of Geography, University of Wisconsin-Madison, 550 N Park St, Madison, WI 53706, BOWEN, Gabriel, Department of Geology and Geophysics, University of Utah, 115 S 1460 E, Salt Lake City, UT 84112, GORING, Simon, Department of Geography, University of Wisconsin, 550 N Park St, Madison, WI 53706, LEHNERT, Kerstin, Lamont-Doherty Earth Observatory, Columbia University, 61 Route 9W, Palisades, NY 10964, NOREN, Anders J., LacCore/CSDCO, Department of Earth Sciences, University of Minnesota, 500 Pillsbury Dr. SE, Minneapolis, MN 55455, PETERS, Shanan E., Department of Geoscience, University of Wisconsin–Madison, 1215 W. Dayton St, Madison, WI 53706, SESSA, Jocelyn A., Department of Invertebrate Paleontology, The Academy of Natural Sciences of Drexel University, 1900 Benjamin Franklin Parkway, Philadelphia, PA 19103, STIGALL, Alycia L., Department of Geological Sciences and OHIO Center for Ecology and Evolutionary Studies, Ohio University, 316 Clippinger Laboratories, Athens, OH 45701 and UHEN, Mark, Department of Atmospheric, Oceanic and Earth Sciences, George Mason University, Fairfax, VA 22030

Developments in data science and informatics are revolutionizing access to geoscientific data, breaking down barriers among disciplines, and opening new research frontiers, particularly in paleontology and allied disciplines. Successful database projects such as the Neotoma Paleoecology DB, Paleobiology Database, LinkedEarth, SedDB/EarthChem, NOAA Paleoclimatology, MorphoBank, VertNet and iDigBio have enabled data-driven science that has led to important insights into the paleontological record with respect to extinctions, macroevolution, biogeography, and Earth-life systems. The pace of change within cyberinfrastructure is rapid and includes building software, creating best practices and linkages, and changing the culture to form an open-data ecosystem for data-driven paleogeoscience. Cyberinfrastructure investments over the next decade should come in the form of distributed, meso-scale investments, focused on the following priorities: 1) reducing data friction by developing scientific workflows, structured vocabularies, semantic frameworks, and data-tagging systems to pass data and metadata seamlessly within and among community resources; 2) developing automated data-mining systems for extracting information from unstructured data in the scientific literature; 3) supporting the long-term sustainability of existing community cyberinfrastructure resources and the grassroots development of community informatics resources for sub-disciplines that lack data sharing systems; 4) launching data mobilization campaigns to unlock existing data relevant to high-priority scientific research questions; 5) developing and training a distributed scientific workforce for both early career scientists and current practitioners; and 6) establishing a Paleodata Synthesis Center to coordinate activities among individual geoscientists and the federation of CCDRs and sample repositories, promote community best practices and data standards, and develop education and scientific workforce training initiatives. Accomplishing these aims requires the collaboration and cooperation of the entire paleogeoscience community. We present an update on the status of database projects and propose a roadmap to create a Paleodata Synthesis Center to address the challenges of the future.