Geoinformatics 2007 Conference (17–18 May 2007)

Paper No. 10
Presentation Time: 2:30 PM-4:30 PM

GEOINFORMATICS FOR GEOCHEMISTRY (GFG): INTEGRATED DIGITAL DATA COLLECTIONS FOR THE EARTH & OCEAN SCIENCES


LEHNERT, Kerstin Annette, Lamont-Doherty Earth Observatory, Columbia University, 61 Route 9W, Palisades, NY 10964 and VINAYAGAMOORTHY, Sri, Center for International Earth Science Information Network, Columbia University, 61 Route 9W, Palisades, NY 10964, lehnert@ldeo.columbia.edu

The GfG program, a collaborative enterprise of the Lamont-Doherty Earth Observatory (LDEO) and the Center for International Earth Science Information Network (CIESIN), integrates and consolidates the development, maintenance, and operation of four closely related Geoinformatics projects: the digital data collections for geochemistry (EarthChem, PetDB, and SedDB) and the System for Earth Sample Registration (SESAR), which administers globally unique identifiers for samples. The systems within the GfG program represent core databases for geochemistry and the broader Geosciences, enabling the data they hold to be discovered and reused by a diverse community now and in the future. They evolve dynamically in response to community needs and technical innovation, and they contribute proactively to the construction of a digital information infrastructure for the next generation of Geoscience research and education by establishing links to other Geoinformatics activities and by pursuing developments in interoperability.

The GfG program provides the technical infrastructure (hardware, software, and services), the required range of expertise (a team of scientists, data managers, database administrators, web application programmers, web designers, and project managers), and the organizational structure for the execution of the individual project components. It is managed in a professional and sustainable environment that ensures reliable services, a high level of data quality, and the long-term availability of the datasets. All systems within the program are operated dynamically, continuously responding to the needs and demands of the community and to changes in technologies, metadata and interface standards, data types, policies and procedures for data publication and access, and organizational structures, so that they retain their value to the science community. GfG is dedicated to educating and training the science community, as well as students and teachers, in the use of the data collections through short courses, internships, and lectures, and to advancing the establishment of a new Geoinformatics workforce through the training and education of project staff.

The GfG program includes the following elements:

- System engineering and development: Data modeling and database development; development of data submission and ingestion procedures; development of web applications; interoperability interfaces

- System operation: Database administration; system maintenance, security, and backups; risk management

- Data management: Data compilation and solicitation; data and metadata entry and QC; user support; long-term archiving

- Education, Outreach, & Community Liaison: Short courses, workshops, lectures, and exhibits; collaboration with other Geoinformatics efforts nationally and internationally

- Program & Project Management: Integrated management of the program and its individual projects

 

GfG System Infrastructure and Architecture

The core infrastructure on which the GfG systems are developed and operated is illustrated in Figure 1. It consists of a set of Web, Application, Mapping, and Database servers, all of them Sun servers. WebLogic application server software is used for developing web applications, and IONIC software is used for serving geospatial data and for developing mapping and visualization tools.

Using this infrastructure, we are developing a service-oriented architecture (SOA) to implement an application framework consisting of query models and a data cache that support the web and interoperability interfaces. This framework is shared across the individual projects, thereby making the operation and maintenance of the GfG systems more efficient and sustainable.

At the core of this architecture are the databases: the new Geo-Chemistry Data Model (GCDM) and the SESAR data model. The application software layer, consisting of the query model and data cache, is being developed using object-oriented design (OOD) methodologies on a Java/J2EE platform supported by the WebLogic application server. Web services for querying and serving data are being developed on top of the application software layer, and the user interfaces are built on these web services as well as on the application software modules. The web services will include interoperability interfaces that serve analytical and geospatial data related to samples to external client systems. A conceptual diagram of this service-oriented architecture is shown in Figure 2.
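
To make the layering concrete, the following minimal Java sketch mirrors the stack described above: database access behind a query model, a data cache in front of it, and a service facade that both the web and interoperability interfaces would call. All class and method names here are hypothetical illustrations, not the actual GfG code.

// Illustrative sketch only: the class and method names below are hypothetical and
// do not correspond to the actual GfG codebase. They merely mirror the layering
// described above.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** A query against the geochemical data collections (hypothetical). */
class SampleQuery {
    final String parameter;   // e.g., "SiO2"
    final String material;    // e.g., "basalt"
    SampleQuery(String parameter, String material) {
        this.parameter = parameter;
        this.material = material;
    }
    String key() { return parameter + "|" + material; }
}

/** Query model: translates a SampleQuery into database access (stubbed out here). */
interface QueryModel {
    List<Map<String, Object>> execute(SampleQuery query);
}

/** Data cache: keeps recent query results so repeated requests from the web or
 *  interoperability interfaces do not hit the database again. */
class DataCache {
    private final Map<String, List<Map<String, Object>>> cache =
            new HashMap<String, List<Map<String, Object>>>();
    private final QueryModel model;
    DataCache(QueryModel model) { this.model = model; }
    List<Map<String, Object>> fetch(SampleQuery q) {
        List<Map<String, Object>> result = cache.get(q.key());
        if (result == null) {
            result = model.execute(q);
            cache.put(q.key(), result);
        }
        return result;
    }
}

/** Service facade: both the HTML user interfaces and the interoperability
 *  web services would call through a layer like this. */
class AnalyticalDataService {
    private final DataCache cache;
    AnalyticalDataService(DataCache cache) { this.cache = cache; }
    List<Map<String, Object>> getAnalyses(String parameter, String material) {
        return cache.fetch(new SampleQuery(parameter, material));
    }

    public static void main(String[] args) {
        // Stub query model; a real implementation would query GCDM via JDBC or an ORM.
        QueryModel stub = new QueryModel() {
            public List<Map<String, Object>> execute(SampleQuery query) {
                return new ArrayList<Map<String, Object>>();
            }
        };
        AnalyticalDataService service = new AnalyticalDataService(new DataCache(stub));
        System.out.println(service.getAnalyses("SiO2", "basalt").size() + " analyses (stub)");
    }
}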

 

Geo-Chemistry Data Model (GCDM)

Geochemical data served by GfG includes a substantially broader range of measurements and materials, such as sediment cores, hydrothermal spring fluids and plumes, and xenoliths, and therefore requires the following features to be supported by the underlying data model (a schematic sketch follows the list):

  • description of spatial and temporal components of samples and measurements (e.g., depth in core, time-series and sensor measurements, point analyses on a microprobe slide);
  • capability to store 'derived' (model) types of observed values, such as age models for cores or end-member compositions for seafloor hydrothermal springs;
  • capability to track relationships between samples and sub-samples;
  • ability to integrate data at any level of sample granularity;
  • capability to accommodate analytical metadata at the level of individual measurements.
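
As a schematic illustration of the kind of generic structure these requirements call for, the Java sketch below shows samples with sub-sample links and individual measurements that carry their own spatial and temporal context and analytical metadata. It illustrates the ideas only; it is not the actual GCDM schema.

// Schematic sketch only -- not the actual GCDM schema. It illustrates sub-sample
// relationships, spatial/temporal context, derived (model) values, and
// per-measurement analytical metadata, as listed in the requirements above.

import java.util.ArrayList;
import java.util.List;

class Sample {
    String sampleId;      // unique sample identifier (e.g., an IGSN; see below)
    Sample parent;        // null for a top-level sample; set for a sub-sample
    List<Measurement> measurements = new ArrayList<Measurement>();
}

class Measurement {
    String property;      // e.g., "SiO2 concentration"
    double value;
    String unit;          // e.g., "wt%"
    boolean derived;      // true for model values such as age models or end-member compositions
    Double depthInCore;   // optional spatial context within the sample (null if not applicable)
    Long timestamp;       // optional temporal context, e.g., for time-series or sensor data
    String method;        // analytical metadata attached to this individual measurement
    String laboratory;
}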

Based on these requirements, we have developed a more generic, integrative, and flexible model for geochemical data, the Geo-Chemistry Data Model (GCDM), to serve as the core data structure for our entire suite of geochemical databases (Djapic, Vinayagamoorthy, & Lehnert 2006a,b). This data model is compliant with standards defined in GeoSciML, a markup language developed by the IUGS Commission for Geoscience Information to represent Geoscience information associated with geologic maps and observations (Cox 2006). Attributes in GCDM such as method, sample, and item measured can be mapped to corresponding types within GeoSciML, while others, such as observation point or observed value, can be incorporated into the GeoSciML concepts of method, event, and measured value. We will use GeoSciML to serve geochemical data via interoperable web services. We have presented and discussed the model with the community on various occasions (Geoinformatics 2006, AGU Fall Meeting 2006, a workshop with the IODP Applications Development team at TAMU) and received valuable feedback and validation of the model. Updates that will make the model even more generic and widely applicable are in progress.

Along with the new data model, we will implement the International Geo Sample Number (IGSN), the emerging globally unique identifier for samples, which will allow us to build enhanced interoperability with other data systems at the sample level. The SESAR data model is at the core of the IGSN implementation for broad and diverse earth samples, ranging from holes, cores, and dredges to individual samples and sub-samples. SESAR enables the unique identification of samples and the integration of sample data from various sources and systems.
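
The small Java sketch below illustrates the idea behind sample-level identifiers: every object in a sampling hierarchy receives its own identifier, and a parent link records how sub-samples relate to the samples they were taken from. The registry class and the identifier format shown are hypothetical placeholders, not SESAR's actual service or the real IGSN format.

// Hypothetical sketch only -- not SESAR's actual registration interface or the real
// IGSN format. Each object in a sampling hierarchy (a dredge, a rock from the dredge,
// a sub-sample of the rock) receives its own identifier, with the parent identifier
// recording the relationship.

import java.util.HashMap;
import java.util.Map;

public class SampleRegistrationSketch {

    private final Map<String, String> parentOf = new HashMap<String, String>();
    private int counter = 0;

    /** Assign a placeholder identifier and record the parent relationship, if any. */
    String register(String description, String parentId) {
        String id = "EXAMPLE-IGSN-" + (++counter);   // placeholder, not a real IGSN
        parentOf.put(id, parentId);
        System.out.println(description + " -> " + id
                + (parentId == null ? "" : " (child of " + parentId + ")"));
        return id;
    }

    public static void main(String[] args) {
        SampleRegistrationSketch registry = new SampleRegistrationSketch();
        String dredge = registry.register("dredge haul", null);
        String rock = registry.register("rock sample from the dredge", dredge);
        registry.register("powdered sub-sample of the rock", rock);
    }
}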

 

Web Interfaces

Each GfG system includes the following web-based interfaces:

  • query and browse;
  • visualization and analysis;
  • administrative;
  • data validation and loading.

 

Interoperability Interfaces

To maximize the use of the GfG data collections, we are implementing interoperability interfaces, based on the service-oriented architecture described above, that allow open access by external client systems.

 

  • Analytical Data Services: Geochemical data access via web services based on the XML schema developed by EarthChem to serve complete sample data and metadata. This schema will continue to evolve towards compliance with GeoSciML as a community standard and will enable other systems and tools to access GfG data in real time.
  • Geospatial Data Services: OGC-compliant WMS and WFS services for serving sample locations, including selected data, metadata, and a link to a sample profile. These services will enable any OGC-compliant client to overlay, visualize, and analyze relevant geospatial data layers from multiple sources in conjunction with the GfG layers (a client-side request is sketched below).
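
For illustration, the short Java program below issues a standard WFS 1.1.0 GetFeature request and prints the GML response. The endpoint URL and the feature type name are hypothetical placeholders, not the actual GfG service addresses; only the request parameters follow the cited OGC WFS specification.

// Illustrative WFS client sketch. The endpoint and feature type name below are
// placeholders, not the actual GfG services; the key=value pairs are the standard
// WFS 1.1.0 GetFeature request parameters.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class WfsClientSketch {
    public static void main(String[] args) throws Exception {
        String endpoint = "http://example.org/gfg/wfs";          // hypothetical service address
        String request = endpoint
                + "?service=WFS&version=1.1.0&request=GetFeature"
                + "&typeName=gfg:SampleLocation"                 // hypothetical feature type
                + "&maxFeatures=10";

        // Issue the request and print the GML feature collection returned by the server.
        BufferedReader in = new BufferedReader(
                new InputStreamReader(new URL(request).openStream()));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line);
        }
        in.close();
    }
}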

 

References Cited

 

Cox, S.J.D. (editor) (2006), “Observations and Measurements”. Open Geospatial Consortium, Inc. document OGC 05-087r4, version 0.14.7, 168 pages. http://portal.opengeospatial.org/files/?artifact_id=17038

 

Djapic, B., S. Vinayagamoorthy, K. A. Lehnert (2006), “Serving Geochemical Data Using GeoSciML Compliant Web Service: Next Step in Developing a Generic GeoChemical Database Model.” Eos Trans. AGU, 87(52), Fall Meet. Suppl., Abstract IN51B-0813.

 

Lehnert, K. A., S. Vinayagamoorthy, B. Djapic, J. Klump (2006), “The Digital Sample: Metadata, Unique Identification, and Links to Data and Publications.” Eos Trans. AGU, 87(52), Fall Meet. Suppl., Abstract IN53C-07.

 

OpenGIS® Web Feature Service (WFS) Implementation Specification, Version 1.1.0, Open Geospatial Consortium Inc., Document: OGC 06-027r1, Date: 2006-02-12, http://www.opengeospatial.org/standards/wfs/.