A COMMUNITY METADATA AUGMENTATION AND CURATION MODEL FOR IMPROVED CROSS-DOMAIN GEOSCIENCE DATA DISCOVERY

ZASLAVSKY, Ilya¹, RICHARD, Stephen M.², GUPTA, Amarnath¹, VALENTINE, David¹, WHITENACK, Thomas¹, SCHACHNE, Adam¹ and OZYURT, Ibrahim³, (1)San Diego Supercomputer Center, Univ of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0505, (2)Arizona Geological Survey, 416 W. Congress, #100, Tucson, AZ 85701, (3)University of California San Diego, 9500 Gilman Dr., La Jolla, CA 92093, valentin@sdsc.edu

Cross-disciplinary data discovery in the earth sciences is a complex challenge because data models, semantic conventions, access protocols, and other practices of data description and access differ across geoscience disciplines. The quality, completeness, and standards compliance of available metadata catalogs vary dramatically, while metadata curation remains mostly manual and labor-intensive. Given rapidly growing data volumes and cross-domain data interoperability needs, traditional metadata management models are becoming increasingly inadequate.

CINERGI (Community Inventory of EarthCube Resources for Geoscience Interoperability, http://earthcube.org/group/cinergi) is an NSF EarthCube Building Block project that is assembling a large cross-disciplinary inventory of geoscience information resources, consistently described and made available via standard service interfaces. Metadata descriptions are obtained from multiple geoscience repository catalogs as well as through community contributions. The metadata documents are converted to a standard representation, analyzed, and automatically enhanced. Enhancement includes automatic generation of relevant keywords based on text analysis, derivation of spatial extent, and validation of organization names mentioned in the metadata. Keyword generation relies on a cross-domain bridge ontology, which integrates several existing geoscience ontologies and controlled vocabularies, and on GeoSciGraph, a system for text parsing, vocabulary management, and semantic annotation. Once processed, the metadata records are republished as ISO 19115/19139 documents with embedded semantic references to the ontologies integrated into CINERGI, along with provenance information for each record. The CINERGI curation model expects repository curators to examine the results of automatic metadata augmentation, approving or rejecting computer-generated metadata elements and thereby triggering further ontology updates and re-processing. We report on project results and the main system components: the metadata augmentation pipeline; the underlying CINERGI ontology and semantic services; services and user interfaces for resource discovery and access; and accompanying provenance and validation services.
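
The augmentation and curation loop described above can be illustrated with a minimal sketch. It is not the CINERGI implementation: the vocabulary, the `ex:` concept identifiers, the `Record` and `Enhancement` classes, and the function names are all hypothetical, and simple word matching stands in for the ontology-driven text analysis that GeoSciGraph performs. The sketch shows only the general pattern: keywords are suggested automatically, each suggestion carries provenance, and nothing is added to the record until a curator approves it.

```python
import re
from dataclasses import dataclass, field

# Hypothetical toy vocabulary: term -> concept identifier.
# These are placeholders, not real CINERGI ontology IRIs.
VOCABULARY = {
    "groundwater": "ex:Groundwater",
    "basalt": "ex:Basalt",
    "seismic": "ex:Seismology",
}


@dataclass
class Enhancement:
    element: str             # metadata element being suggested, e.g. "keyword"
    value: str               # suggested value (a concept identifier)
    source: str              # provenance: which step produced the suggestion
    status: str = "pending"  # pending | approved | rejected


@dataclass
class Record:
    title: str
    abstract: str
    keywords: list = field(default_factory=list)
    enhancements: list = field(default_factory=list)


def suggest_keywords(record):
    """Propose keywords by matching vocabulary terms against the record text."""
    text = f"{record.title} {record.abstract}".lower()
    words = set(re.findall(r"[a-z]+", text))
    for term, concept in sorted(VOCABULARY.items()):
        if term in words and concept not in record.keywords:
            record.enhancements.append(
                Enhancement("keyword", concept, "keyword-enhancer")
            )
    return record


def review(record, decide):
    """Apply curator decisions; only approved suggestions become keywords."""
    for enh in record.enhancements:
        if enh.status != "pending":
            continue
        enh.status = "approved" if decide(enh) else "rejected"
        if enh.status == "approved" and enh.element == "keyword":
            record.keywords.append(enh.value)
    return record
```

In this sketch a curator decision is any function that takes an `Enhancement` and returns `True` or `False`. Rejected suggestions stay in the record's enhancement list with their provenance, so they could in principle inform a later vocabulary update, which is the feedback step the curation model describes.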

Session No. 156--Booth# 127

T93. Use of Geoscience Data Resources in Education and Research (Posters)

Monday, 26 September 2016: 9:00 AM-6:30 PM

Exhibit Hall E/F (Colorado Convention Center)

Geological Society of America Abstracts with Programs. Vol. 48, No. 7
doi: 10.1130/abs/2016AM-288066

© Copyright 2016 The Geological Society of America (GSA), all rights reserved. Permission is hereby granted to the author(s) of this abstract to reproduce and distribute it freely, for noncommercial purposes. Permission is hereby granted to any individual scientist to download a single copy of this electronic file and reproduce up to 20 paper copies for noncommercial purposes advancing science and education, including classroom use, providing all reproductions include the complete content shown here, including the author information. All other forms of reproduction and/or transmittal are prohibited without written permission from GSA Copyright Permissions.

GSA Annual Meeting in Denver, Colorado, USA - 2016
