GSA Annual Meeting in Denver, Colorado, USA - 2016

Paper No. 156-2
Presentation Time: 9:00 AM-6:30 PM

A COMMUNITY METADATA AUGMENTATION AND CURATION MODEL FOR IMPROVED CROSS-DOMAIN GEOSCIENCE DATA DISCOVERY


ZASLAVSKY, Ilya (1), RICHARD, Stephen M. (2), GUPTA, Amarnath (1), VALENTINE, David (1), WHITENACK, Thomas (1), SCHACHNE, Adam (1) and OZYURT, Ibrahim (3), (1) San Diego Supercomputer Center, Univ of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0505, (2) Arizona Geological Survey, 416 W. Congress, #100, Tucson, AZ 85701, (3) University of California San Diego, 9500 Gilman Dr., La Jolla, CA 92093, valentin@sdsc.edu

Cross-disciplinary data discovery in the earth sciences is a complex challenge because data models, semantic conventions, access protocols, and other practices of data description and access differ across geoscience disciplines. The quality, completeness, and standards compliance of available metadata catalogs vary dramatically, while metadata curation remains mostly manual and labor-intensive. Given rapidly growing data volumes and cross-domain data interoperability needs, traditional metadata management models are becoming increasingly inadequate.

CINERGI (Community Inventory of EarthCube Resources for Geoscience Interoperability, http://earthcube.org/group/cinergi) is an NSF EarthCube Building Block project assembling a large cross-disciplinary inventory of geoscience information resources, consistently described and made available via standard service interfaces. Metadata descriptions are obtained from multiple geoscience repository catalogs as well as through community contributions. The metadata documents are converted to a standard representation, analyzed, and automatically enhanced. Enhancement includes generating relevant keywords through text analysis, deriving spatial extent, and validating organization names mentioned in the metadata. Keyword generation, in turn, relies on a cross-domain bridge ontology, which integrates several existing geoscience ontologies and controlled vocabularies, and on GeoSciGraph, a system for text parsing, vocabulary management, and semantic annotation. Once processed, the metadata records are republished as ISO-19115/19139 documents with embedded semantic references to the ontologies integrated into CINERGI, along with provenance information for each record. The CINERGI curation model expects repository curators to examine the results of automatic metadata augmentation and to approve or reject computer-generated metadata elements, which triggers further ontology updates and re-processing. We report on project results and the main system components: the metadata augmentation pipeline; the underlying CINERGI ontology and semantic services; services and user interfaces for resource discovery and access; and accompanying provenance and validation services.
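
The augment-then-review loop described above can be illustrated with a minimal Python sketch. It is not the CINERGI implementation: the vocabulary, the `ex:` concept identifiers, the `Record` and `Suggestion` classes, and the `suggest_keywords` and `review` functions are all hypothetical names introduced for illustration, and simple word matching stands in for ontology-based text analysis. The sketch shows only the general shape: an enhancer proposes keywords with provenance attached, and a curator decision approves or rejects each proposal.

```python
import re
from dataclasses import dataclass, field

# Hypothetical mini-vocabulary: term -> made-up concept identifier.
# A real system would draw these from an ontology, not a hard-coded dict.
VOCABULARY = {
    "groundwater": "ex:Groundwater",
    "aquifer": "ex:Aquifer",
    "seismic": "ex:Seismology",
}

@dataclass
class Suggestion:
    element: str             # metadata element being proposed, e.g. "keyword"
    value: str               # proposed value
    source: str              # provenance: which enhancer produced it
    status: str = "pending"  # pending | approved | rejected

@dataclass
class Record:
    title: str
    abstract: str
    keywords: list = field(default_factory=list)
    suggestions: list = field(default_factory=list)

def suggest_keywords(record):
    """Propose vocabulary concepts whose terms occur in the title or abstract."""
    text = f"{record.title} {record.abstract}".lower()
    words = set(re.findall(r"[a-z]+", text))
    for term, concept in sorted(VOCABULARY.items()):
        if term in words:
            record.suggestions.append(
                Suggestion("keyword", concept, "keyword-enhancer")
            )

def review(record, approve):
    """Apply a curator decision function to every pending suggestion."""
    for s in record.suggestions:
        if s.status != "pending":
            continue
        if approve(s):
            s.status = "approved"
            if s.element == "keyword" and s.value not in record.keywords:
                record.keywords.append(s.value)
        else:
            s.status = "rejected"

record = Record(
    title="Groundwater levels in the Tucson basin",
    abstract="Monthly aquifer measurements from monitoring wells.",
)
suggest_keywords(record)
# The curator in this toy example rejects one of the two proposed concepts.
review(record, approve=lambda s: s.value != "ex:Aquifer")
print(record.keywords)
print([(s.value, s.status) for s in record.suggestions])
```

Keeping each suggestion as a separate object with its own source and status means rejected proposals remain on record, which is one plausible way the curator feedback described above could inform later vocabulary updates and re-processing.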