2008 Geoinformatics Conference (11-13 June 2008)

Paper No. 8
Presentation Time: 12:40 PM

A COLLABORATIVE ENVIRONMENT FOR CLIMATE-DATA HANDLING


KINDERMANN, Stephan, German Climate Computing Centre, DKRZ, Bundesstrasse 55, Hamburg, 20146, Germany and STOCKHAUSE, Martina, Max Planck Institute for Meteorology, Bundesstrasse 53, Hamburg, 20146, Germany, kindermann@dkrz.de

The Collaborative Climate Community Data and Processing Grid project (C3Grid) is currently building a climate-data handling infrastructure that supports scientists in finding, analyzing, processing and sharing climate data sets. The infrastructure aims to underpin the entire data cycle, from the discovery of input data to the publication and archiving of final results.

The approach consists of three layers, namely a common data discovery layer, a data access layer and a data manipulation layer:

- Data discovery is based on metadata descriptions conforming to International Organisation for Standardization (ISO) standard 19139, which are harvested into a central metadata catalogue with a common web interface.

- Data access, together with provider-specific data extraction and pre-processing steps, is hidden behind simple data request web service interfaces.

- Data processing can be triggered in a collaborative grid environment that provides compute resources as well as short- and long-term data storage components.
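The discovery layer can be illustrated with a minimal sketch, assuming that each provider simply hands over ISO 19139 XML records and that the central catalogue only needs an identifier, a title and an abstract per record. The function names (parse_record, harvest, search) and the sample record are hypothetical; the transport between providers and catalogue is left out, and this is not the C3Grid implementation. Only the gmd/gco namespace URIs and element paths follow the ISO 19139 encoding.

```python
import xml.etree.ElementTree as ET

# XML namespaces used by ISO 19139 encodings (gmd: metadata, gco: basic types).
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}
IDENT = "gmd:identificationInfo/gmd:MD_DataIdentification/"


def parse_record(xml_text: str) -> dict:
    """Extract identifier, title and abstract from one ISO 19139 record."""
    root = ET.fromstring(xml_text)

    def text(path: str) -> str:
        node = root.find(path, NS)
        return node.text.strip() if node is not None and node.text else ""

    return {
        "id": text("gmd:fileIdentifier/gco:CharacterString"),
        "title": text(IDENT + "gmd:citation/gmd:CI_Citation/gmd:title/gco:CharacterString"),
        "abstract": text(IDENT + "gmd:abstract/gco:CharacterString"),
    }


def harvest(providers: dict) -> dict:
    """Merge the records of all providers into one catalogue keyed by identifier."""
    catalogue = {}
    for provider, records in providers.items():
        for xml_text in records:
            record = parse_record(xml_text)
            record["provider"] = provider
            catalogue[record["id"]] = record
    return catalogue


def search(catalogue: dict, term: str) -> list:
    """Case-insensitive free-text search over titles and abstracts."""
    term = term.lower()
    return sorted(rid for rid, rec in catalogue.items()
                  if term in rec["title"].lower() or term in rec["abstract"].lower())


# Heavily trimmed, hypothetical record (a schema-valid one has more mandatory elements).
SAMPLE = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:fileIdentifier><gco:CharacterString>demo-001</gco:CharacterString></gmd:fileIdentifier>
  <gmd:identificationInfo><gmd:MD_DataIdentification>
    <gmd:citation><gmd:CI_Citation>
      <gmd:title><gco:CharacterString>Monthly sea surface temperature</gco:CharacterString></gmd:title>
    </gmd:CI_Citation></gmd:citation>
    <gmd:abstract><gco:CharacterString>Hypothetical demo record.</gco:CharacterString></gmd:abstract>
  </gmd:MD_DataIdentification></gmd:identificationInfo>
</gmd:MD_Metadata>"""

if __name__ == "__main__":
    catalogue = harvest({"provider-a": [SAMPLE]})
    print(search(catalogue, "temperature"))  # ['demo-001']
```

Keying the catalogue by the record's fileIdentifier means that re-harvesting an updated record replaces the old entry rather than duplicating it.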

In building up the infrastructure, we face three main problems, which are discussed here:

1. Establishing a consistent security layer: On the one hand, there is a clear need for a federated Authentication and Authorisation (AA) infrastructure that leaves user identity and role management with the individual home institutes. On the other hand, the collaborative environment uses grid technology and its Public Key Infrastructure (PKI) with Virtual Organisation (VO)-based AA. The C3Grid approach merges the Security Assertion Markup Language (SAML) information provided by Shibboleth with Grid/VO-based AA mechanisms. After a prototyping phase with plain web services, C3Grid is now moving towards an implementation based on Web Services Resource Framework (WSRF) web services.

2. Generation and quality control of ISO metadata: currently, mainly large data providers are able to handle discovery information. Substantial generic tool support is therefore necessary to allow small data providers to join the infrastructure and to support appropriate metadata generation during processing, for example metadata reflecting data provenance. C3Grid aims at automatic data provenance tracking and semi-automatic data archiving with quality-checked discovery metadata.

3. Enabling flexible but modular processing: in the long run, the infrastructure needs to support complex scientific workflows composed of predefined modules. This requires both a sufficiently expressive workflow description and additional descriptions of how the data are to be used. The handling and generation of such metadata can build on the experience gained and the tools developed so far, but additional agreements and information services are needed.
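Points 2 and 3 meet when a workflow of predefined modules records its own provenance, which can then feed quality-checked discovery metadata at archiving time. The following is a minimal sketch under stated assumptions, not the C3Grid design: each module declares a hypothetical input and output "kind" so a chain can be validated before it runs, every step appends a provenance entry linking content fingerprints of its input and output, and a hypothetical set of required fields (including a lineage field, loosely modelled on the lineage information ISO metadata can carry) is checked before a result is archived. All names (Module, Workflow, digest, REQUIRED_FIELDS, missing_fields) and both example modules are illustrative.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List


def digest(data: List[float]) -> str:
    """Short content fingerprint used to link provenance steps."""
    return hashlib.sha256(json.dumps(data).encode("utf-8")).hexdigest()[:12]


@dataclass
class Module:
    """A predefined processing module with declared input and output kinds."""
    name: str
    input_kind: str
    output_kind: str
    func: Callable[[List[float]], List[float]]


@dataclass
class Workflow:
    modules: List[Module]
    provenance: List[Dict[str, str]] = field(default_factory=list)

    def validate(self) -> None:
        """Reject chains where one module's output kind is not the next one's input kind."""
        for prev, nxt in zip(self.modules, self.modules[1:]):
            if prev.output_kind != nxt.input_kind:
                raise ValueError(f"{prev.name} -> {nxt.name}: "
                                 f"{prev.output_kind} != {nxt.input_kind}")

    def run(self, data: List[float]) -> List[float]:
        self.validate()
        for module in self.modules:
            input_hash = digest(data)
            data = module.func(data)
            # One provenance entry per step: which module ran, on what, producing what.
            self.provenance.append({"module": module.name,
                                    "input": input_hash,
                                    "output": digest(data)})
        return data


# Hypothetical minimum set of discovery fields checked before archiving.
REQUIRED_FIELDS = ("id", "title", "abstract", "lineage")


def missing_fields(record: Dict[str, str]) -> List[str]:
    """Quality check: return required discovery-metadata fields that are empty."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Example modules (hypothetical): Kelvin -> Celsius, then anomaly from the mean.
to_celsius = Module("to_celsius", "kelvin_series", "celsius_series",
                    lambda xs: [x - 273.15 for x in xs])
anomaly = Module("anomaly", "celsius_series", "anomaly_series",
                 lambda xs: [x - sum(xs) / len(xs) for x in xs])

if __name__ == "__main__":
    wf = Workflow([to_celsius, anomaly])
    wf.run([280.0, 290.0])
    record = {"id": "demo-002", "title": "Temperature anomaly",
              "abstract": "Hypothetical derived data set.",
              "lineage": " -> ".join(step["module"] for step in wf.provenance)}
    print(record["lineage"], missing_fields(record))  # to_celsius -> anomaly []
```

Because each entry stores the fingerprint of its input, consecutive entries chain together (the input of one step equals the output of the previous one), which is what makes the lineage reconstructable later.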

In general, a major challenge in the project is to reach agreements (some of them even legal ones) that strike a careful balance between technical progress and manageable effort. The established data and compute providers want to reuse their current implementations in order to minimize software-stack maintenance and overall effort. Yet integration into collaborative environments always requires adopting and prototyping technologies that are not yet established. Different technological pathways have to be merged with respect to the specific needs of the community and the future needs of inter-community cyberinfrastructures. This talk discusses the key experiences and design decisions of the C3Grid project, which started in September 2005 as part of the German Grid Initiative (D-Grid).

Abbreviations used:

D-Grid: German Grid Initiative

C3Grid: Collaborative Climate Community Data and Processing Grid project

ISO: International Organisation for Standardization

AA: Authentication and Authorisation

PKI: Public Key Infrastructure

VO: Virtual Organisation

SAML: Security Assertion Markup Language

WSRF: Web Services Resource Framework