A COMMUNITY METADATA AUGMENTATION AND CURATION MODEL FOR IMPROVED CROSS-DOMAIN GEOSCIENCE DATA DISCOVERY
CINERGI (Community Inventory of EarthCube Resources for Geoscience Interoperability, http://earthcube.org/group/cinergi) is an NSF EarthCube Building Block project assembling a large cross-disciplinary inventory of geoscience information resources, consistently described and made available via standard service interfaces. Metadata descriptions are obtained from multiple geoscience repository catalogs as well as through community contributions. The metadata documents are converted to a standard representation, then analyzed and automatically enhanced. Enhancement includes automatic generation of relevant keywords based on text analysis, derivation of spatial extent, and validation of organization names mentioned in the metadata. Keyword generation, in turn, relies on a cross-domain bridge ontology, which integrates several existing geoscience ontologies and controlled vocabularies, and on GeoSciGraph, a system for text parsing, vocabulary management, and semantic annotation. Once processed, the metadata records are republished as ISO 19115/19139 documents with embedded semantic references to the ontologies integrated into CINERGI, along with provenance information for each record. The CINERGI curation model expects repository curators to examine the results of automatic metadata augmentation, approving or rejecting computer-generated metadata elements and thereby triggering further ontology updates and reprocessing. We report on project results and the main system components: the metadata augmentation pipeline; the underlying CINERGI ontology and semantic services; services and user interfaces for resource discovery and access; and accompanying provenance and validation services.
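As a rough illustration of the augment-then-curate loop described above, the following minimal Python sketch shows enhancers proposing metadata elements with provenance and a curator approving or rejecting them. It is not the CINERGI implementation: the names `Record`, `Proposal`, `keyword_enhancer`, `augment`, and `review`, as well as the toy vocabulary standing in for the bridge ontology, are hypothetical, and simple substring matching replaces the text analysis performed by the actual pipeline.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for a bridge ontology: surface term -> concept label.
TOY_VOCABULARY = {
    "basalt": "Igneous rock",
    "streamflow": "Hydrology",
    "earthquake": "Seismology",
}


@dataclass
class Proposal:
    """A computer-generated metadata element awaiting curator review."""
    element: str             # kind of element, e.g. "keyword"
    value: str               # proposed value
    source: str              # provenance: which enhancer produced it
    status: str = "pending"  # "pending", "approved", or "rejected"


@dataclass
class Record:
    """A simplified metadata record (assumed fields, not an ISO schema)."""
    title: str
    abstract: str
    proposals: list = field(default_factory=list)


def keyword_enhancer(record):
    """Propose one keyword per vocabulary term found in the record text."""
    text = f"{record.title} {record.abstract}".lower()
    return [
        Proposal("keyword", concept, "keyword_enhancer")
        for term, concept in sorted(TOY_VOCABULARY.items())
        if term in text
    ]


def augment(record, enhancers):
    """Run each enhancer and attach its proposals to the record."""
    for enhance in enhancers:
        record.proposals.extend(enhance(record))
    return record


def review(record, decisions):
    """Apply curator decisions (proposal value -> True/False).

    Returns the rejected values, which in a fuller system could feed
    vocabulary or ontology updates before reprocessing.
    """
    for proposal in record.proposals:
        if proposal.value in decisions:
            proposal.status = "approved" if decisions[proposal.value] else "rejected"
    return [p.value for p in record.proposals if p.status == "rejected"]


record = augment(
    Record("Basalt samples", "Streamflow and earthquake records"),
    [keyword_enhancer],
)
rejected = review(record, {"Igneous rock": True, "Hydrology": False})
```

In this sketch, undecided proposals remain pending, so a record can be reviewed incrementally; additional enhancers (for example, for spatial extent or organization names) would plug into `augment` in the same way.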