Geoinformatics 2007 Conference (17–18 May 2007)

Paper No. 8
Presentation Time: 2:30 PM-4:30 PM

DEPLOYING DATA PORTALS


CHANDRA, Sandeep, San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, LIN, Kai, San Diego Supercomputer Center, University of California, San Diego, 9500 Gilman Drive, Mail Code 0505, La Jolla, CA 92093-0505, and YOUN, Choonhan, chandras@sdsc.edu

A fundamental objective of the GEON Project (www.geongrid.org) is to develop data sharing frameworks and, in the process, to identify best practices and develop capabilities and tools that enable advances in how geoscience research is done. The GEON portal framework, which plays a key role in meeting this objective, is implemented using GridSphere, a portlet-based framework that provides a uniform authentication environment with access to a rich set of functionality: the ability to register data and ontologies into the system; smart search capability; access to a variety of geoscience-related tools; access to Grid-enabled geoscience applications; and a customizable private work space from which users have access to the scientific workflow environment, where they can easily author and execute processing “pipelines” for data analysis, modeling, and visualization. In this paper, we describe the modular hardware and software components we have developed, which make it possible to easily deploy a data sharing portal environment for any application domain.
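The processing “pipelines” mentioned above can be illustrated with a minimal sketch. The step names below are hypothetical stand-ins for real analysis, modeling, and visualization components; the sketch shows only the chaining pattern, not the actual GEON workflow environment.

```python
# Minimal sketch of a processing "pipeline": an ordered chain of steps,
# each consuming the previous step's output. Step names are hypothetical.

def load_data(source):
    # Stand-in for retrieving a registered dataset.
    return [float(x) for x in source]

def analyze(values):
    # Stand-in for an analysis step: here, simple peak normalization.
    peak = max(values)
    return [v / peak for v in values]

def summarize(values):
    # Stand-in for a reporting/visualization step.
    return {"n": len(values), "mean": sum(values) / len(values)}

def run_pipeline(source, steps):
    """Execute steps in order, feeding each step's output to the next."""
    result = source
    for step in steps:
        result = step(result)
    return result

report = run_pipeline([2, 4, 8], [load_data, analyze, summarize])
```

In a real workflow environment each step would be a reusable component that users compose graphically, but the execution model is the same sequential hand-off shown here.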

In practice, deploying a portal framework like the GEON portal requires an understanding of the various portal software components, and their dependencies, that go into engineering such a system. In addition to sites within the GEON network, the GEON software infrastructure is increasingly being adopted in other projects, such as the Chesapeake Bay Environmental Observatory (CBEO), the Network for Earthquake Engineering Simulations (NEES), an Archeoinformatics project, and the National Ecological Observatory Network (NEON). Based on this experience, we have developed a modular packaging of the various components of the system to allow easy installation and configuration of the hardware, middleware, and other software components.

The data portal infrastructure consists of the following components:

1. A Portal Server, which runs the portal software. The nominal system is a “rack-mounted”, server-class machine with 750 GB of raw disk (5 x 146 GB hot-swappable SAS drives), dual-core 3.0 GHz Intel Xeon processors, 4 GB of RAM, dual gigabit network interfaces, and redundant power supplies. The Portal Server runs the GEON software stack, including the portal software, and provides connectivity to and interoperability among the other GEON systems.

2. A Data Server, which provides storage and other data management services. This system is also a “rack-mounted”, server-class machine with dual-core 3.0 GHz Intel Xeon processors. It includes 5 x 300 GB RAID SAS drives, for a total of 1.5 TB of raw disk space. The data nodes are configured with RAID in order to tolerate unforeseen disk failures.

3. A Certificate Authority (CA) Server, which manages user accounts. The CA server system has the same basic configuration as the Portal Server, but with 2 GB RAM and the disk size reduced to about 36 GB, since the CA Server tasks are not I/O intensive.
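The three-server layout above can be captured in a simple configuration sketch. This is illustrative only: the field names and structure are invented, not an actual GEON configuration format, though the figures mirror the hardware descriptions above.

```python
# Illustrative description of the three-node data portal deployment.
# Field names are invented; the numbers come from the hardware
# descriptions above.

DEPLOYMENT = {
    "portal": {
        "role": "portal software; connectivity to other GEON systems",
        "cpu": "dual-core 3.0 GHz Intel Xeon",
        "ram_gb": 4,
        "disks": {"count": 5, "size_gb": 146, "type": "SAS, hot-swappable"},
    },
    "data": {
        "role": "storage and data management services (SRB)",
        "cpu": "dual-core 3.0 GHz Intel Xeon",
        "disks": {"count": 5, "size_gb": 300, "type": "SAS, RAID"},
    },
    "ca": {
        "role": "user account management (GAMA)",
        "ram_gb": 2,
        "disk_gb": 36,  # CA tasks are not I/O intensive
    },
}

def raw_capacity_gb(node):
    """Raw disk capacity of a node, before any RAID overhead."""
    d = node["disks"]
    return d["count"] * d["size_gb"]

portal_gb = raw_capacity_gb(DEPLOYMENT["portal"])  # 5 x 146 = 730 GB raw
data_gb = raw_capacity_gb(DEPLOYMENT["data"])      # 5 x 300 = 1500 GB raw
```

Capturing the layout declaratively like this makes it easy to sanity-check a new deployment against the nominal specification before installation.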

The Portal Server runs the Rocks cluster management software and a standardized “GEON software stack,” which includes the GEON Portal and its dependent libraries, including software tools developed in the GEON project. The so-called “core” portal functionality (e.g., search and data ingestion capabilities) is generic and can mostly be leveraged “out of the box” by other projects. The Data Server provides the capability to host data registered through the portal and also provides other data management services. The Data Server runs the SDSC Storage Resource Broker (SRB) software, which provides a number of built-in data management services. The Data Server could also host additional data management services, e.g., GIS software. The Portal Server can communicate with Web services hosted at remote locations using the standard web service protocols.
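As a sketch of the standard web service protocols mentioned above, the following constructs a SOAP 1.1 request envelope for a remote operation. The operation name, namespace, and parameter are invented for illustration; in practice the portal's service clients would be generated from each service's WSDL rather than built by hand.

```python
# Sketch: building a SOAP 1.1 envelope for a hypothetical remote
# "searchDatasets" operation. The service namespace and operation
# name are invented for illustration.
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
SVC_NS = "http://example.org/geon/search"  # hypothetical service namespace

def build_search_request(keyword):
    """Build the XML request body that would be POSTed to the service."""
    env = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(env, f"{{{SOAP_NS}}}Body")
    op = ET.SubElement(body, f"{{{SVC_NS}}}searchDatasets")
    ET.SubElement(op, f"{{{SVC_NS}}}keyword").text = keyword
    return ET.tostring(env, encoding="unicode")

request_xml = build_search_request("gravity")
# The envelope would then be sent to the remote service endpoint
# over HTTP; the response arrives as a similar SOAP envelope.
```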

The CA Server runs the Grid Account Management Architecture (GAMA) software, which manages user accounts for the portal. The CA Server is installed using the Rocks software and the GAMA roll. Once installed, the system is fully configured as a CA, and the portal software is pre-configured to communicate with this GAMA server for managing user accounts.
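The division of labor between the portal and the CA server can be sketched as follows. All class and method names here are hypothetical mocks; they illustrate only the pattern of the portal delegating account creation and credential retrieval to a GAMA-style service, not any real GAMA API, and the credential is a fake token standing in for a real certificate.

```python
# Sketch of the portal <-> CA-server interaction for account
# management. All names are hypothetical; a real CA would issue
# X.509 credentials, which we fake here with a hash token.
import hashlib

class MockCAServer:
    """Stand-in for the GAMA service running on the CA server."""
    def __init__(self):
        self._accounts = {}

    def create_account(self, username, password):
        if username in self._accounts:
            raise ValueError(f"account exists: {username}")
        cred = hashlib.sha256(f"{username}:{password}".encode()).hexdigest()
        self._accounts[username] = cred
        return cred

    def get_credential(self, username):
        return self._accounts[username]

class Portal:
    """Stand-in for the portal, pre-configured with its CA server."""
    def __init__(self, ca_server):
        self.ca = ca_server

    def register_user(self, username, password):
        # Account creation is delegated to the CA server.
        return self.ca.create_account(username, password)

    def login(self, username):
        # The portal fetches the user's credential from the CA server
        # and would use it for Grid operations on the user's behalf.
        return self.ca.get_credential(username)

portal = Portal(MockCAServer())
issued = portal.register_user("alice", "s3cret")
fetched = portal.login("alice")
```

The point of the pattern is that the portal itself never stores credentials; it is simply pre-configured with the address of the CA server and defers all account operations to it.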

Acknowledgements

1. Ilya Zaslavsky, Chesapeake Bay Environmental Observatory (CBEO) Project (http://geon16.sdsc.edu:8080/gridsphere/gridsphere)

2. NEESit – Enabling Earthquake Engineering Research, Education and Practice (http://neesphere.sdsc.edu:8080/gridsphere/gridsphere)

3. National Ecological Observatory Network (NEON) (http://neon.sdsc.edu:8080/gridsphere/gridsphere)

4. Karan Bhatia, Kurt Mueller, Sandeep Chandra, Grid Account Management Architecture (GAMA) (http://grid-devel.sdsc.edu/gridsphere/gridsphere?cid=gama)