GSA Annual Meeting in Seattle, Washington, USA - 2017

Paper No. 236-2
Presentation Time: 2:00 PM

OPEN CORE DATA PROJECT: INTEGRATING PHYSICAL SAMPLES INTO THE WEB OF DATA


RICHARD, Stephen, IEDA, Lamont-Doherty Earth Observatory, Columbia University, Route 9W, Palisades, NY 10964, FILS, Douglas, Consortium for Ocean Leadership, 1201 New York Ave, NW, 4th Floor, Washington, DC 20005, NOREN, Anders, CSDCO / LacCore, University of Minnesota, 116 Church St SE, Minneapolis, MN 55455 and LEHNERT, Kerstin A., Lamont-Doherty Earth Observatory, Columbia University, 61 Route 9W, Palisades, NY 10964, smr2209@columbia.edu

Open Core Data (OCD) is an NSF-funded project that facilitates the use of scientific drilling data from the JOIDES Resolution Science Operator (JRSO), the Continental Scientific Drilling Coordination Office (CSDCO), and eventually other drilling operations. Partners include the Consortium for Ocean Leadership and the Interdisciplinary Earth Data Alliance (IEDA). OCD’s goal is to provide identification, access, citation and provenance of borehole-derived samples and related data to support the research community. Persistent identifiers for samples (IGSN) and datasets (DOI) are the foundation for applying Linked Open Data (LOD) patterns and for embedding metadata in web documents to promote discovery. Widely adopted vocabularies (e.g. schema.org, GeoLink, VoID, DCAT) are used to promote interoperability.
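As a rough illustration of embedding identifier-based metadata in a web document, the sketch below builds a schema.org Dataset description as JSON-LD and wraps it in a script tag for a landing page. It is a minimal sketch, not the project's actual markup: the DOI and IGSN values are placeholders, the IGSN resolver URL form is an assumption, and the choice of the schema.org `isBasedOn` property to point from a dataset to its source sample is only one plausible modeling option.

```python
import json


def dataset_jsonld(name, doi, sample_igsn):
    """Describe a dataset with schema.org terms, keyed by persistent identifiers.

    `doi` and `sample_igsn` are bare identifier strings. The resolver prefixes
    used below are assumptions made for this sketch.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "@id": "https://doi.org/" + doi,
        "name": name,
        "identifier": {
            "@type": "PropertyValue",
            "propertyID": "DOI",
            "value": doi,
        },
        # Link the dataset to the physical sample it was derived from.
        "isBasedOn": {"@id": "http://igsn.org/" + sample_igsn},
    }


def as_script_tag(doc):
    """Serialize a JSON-LD document for embedding in an HTML page."""
    # Escape "</" so the payload cannot close the script element early.
    payload = json.dumps(doc, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    doc = dataset_jsonld("Example core measurements", "10.0000/example", "XXX000001")
    print(as_script_tag(doc))
```

Because both the dataset and the sample are named by resolvable URIs, a crawler or notebook that reads this block can follow the link from data to sample without any repository-specific knowledge.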

OCD is implementing data packaging approaches to facilitate data use, starting with simple comma-delimited text (CSV for the Web) and moving on to data packages for more complex data. This data package model provides access to a large suite of tools, libraries, and workbenches that support data utilization, validation and visualization. The system is also testing a framework (based on the W3C PROV-AQ pingback pattern) to record the provenance of data products as they are analyzed. A collection of web interfaces (APIs), which include spatial elements, has been developed to allow use of these data in notebook environments such as Jupyter and R.
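The pingback pattern mentioned above can be sketched roughly as follows. In PROV-AQ, a resource may advertise a pingback URI in an HTTP Link header with the relation `http://www.w3.org/ns/prov#pingback`, and a client that has produced something from that resource reports back by POSTing a `text/uri-list` of provenance record URIs to it. This is a minimal sketch under stated assumptions, not OCD's implementation: all URLs are hypothetical, the Link header parser handles only simple cases, and the request is built but not sent.

```python
import re
import urllib.request

PINGBACK_REL = "http://www.w3.org/ns/prov#pingback"


def find_pingback_uri(link_header):
    """Return the first Link target whose rel includes the PROV pingback relation."""
    # Each link is "<target>" followed by zero or more ";name=value" parameters.
    link_pattern = r'<([^>]+)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^;,\s]+))*)'
    for match in re.finditer(link_pattern, link_header):
        target, params = match.group(1), match.group(2)
        rel = re.search(r'(?:^|;)\s*rel\s*=\s*"?([^";,]+)"?', params)
        if rel and PINGBACK_REL in rel.group(1).split():
            return target
    return None


def build_pingback_request(pingback_uri, provenance_uris):
    """Build (without sending) the POST that reports provenance record URIs."""
    # text/uri-list puts one URI per line, separated by CRLF.
    body = "\r\n".join(provenance_uris).encode("utf-8")
    return urllib.request.Request(
        pingback_uri,
        data=body,
        headers={"Content-Type": "text/uri-list"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical Link header, as a dataset server might return it.
    header = (
        '<http://example.org/pingback/ds1>; rel="http://www.w3.org/ns/prov#pingback", '
        '<http://example.org/prov/ds1>; rel="http://www.w3.org/ns/prov#has_provenance"'
    )
    uri = find_pingback_uri(header)
    request = build_pingback_request(uri, ["http://example.net/analysis/42/prov"])
    print(request.get_method(), request.full_url)
    # Sending would be: urllib.request.urlopen(request)
```

The design point is that the data provider does not need to know in advance who uses a dataset: downstream users announce their derived products, and the provider accumulates those links as forward provenance.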

Together, these technologies will enable analysis of core data and age models in the context of a web of linked samples and datasets (http://www.opencoredata.org/). The participating core repositories will link their physical samples using IGSN URIs registered through the System for Earth Sample Registration (SESAR) and will curate linkages between data derived from the samples. Such digital curation is an important step in realizing the full value of physical samples in repositories.