DEPLOYING DATA PORTALS
In practice, deploying a portal framework such as the GEON Portal requires an understanding of the portal software components, and their dependencies, that go into engineering such a system. In addition to sites within the GEON network, the GEON software infrastructure is increasingly being adopted by other projects, such as the Chesapeake Bay Environmental Observatory (CBEO), the Network for Earthquake Engineering Simulation (NEES), an Archeoinformatics project, and the National Ecological Observatory Network (NEON). Based on this experience, we have developed a modular packaging of the system's components that allows easy installation and configuration of the hardware, middleware, and other software.
The data portal infrastructure consists of the following components:
1. A Portal Server, which runs the portal software. The nominal system is a rack-mounted, server-class machine with 750 GB raw disk (5 x 146 GB hot-swappable SAS drives), dual-core 3.0 GHz Intel Xeon processors, 4 GB of RAM, dual gigabit network interfaces, and redundant power supplies. The Portal Server runs the GEON software stack, including the portal software, and provides connectivity to and interoperability among the other GEON systems.
2. A Data Server, which provides storage and other data management services. This system is also a rack-mounted, server-class machine with dual-core 3.0 GHz Intel Xeon processors. It includes 5 x 300 GB RAID SAS drives, for a total of 1.5 TB of raw drive space. The data nodes are configured with RAID disks so that they can tolerate unforeseen disk failures.
3. A Certificate Authority (CA) Server, which manages user accounts. The CA Server has the same basic configuration as the Portal Server, but with 2 GB of RAM and the disk size reduced to about 36 GB, since the CA Server tasks are not I/O intensive.
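The three nominal configurations above can be written down as a small inventory, which makes it easy to check the quoted raw capacities against the per-drive figures. The following is only a sketch: the names NodeSpec, NODES, and capacity_report are hypothetical and are not part of the GEON software stack, and fields the text does not state (the Data Server's RAM, the CA Server's drive count) are left as None rather than guessed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class NodeSpec:
    """Nominal hardware for one node (hypothetical helper type)."""
    role: str
    drive_count: Optional[int]    # None where the text gives no drive count
    drive_size_gb: Optional[int]  # None where the text gives no drive size
    quoted_raw_gb: int            # raw capacity as quoted in the text
    ram_gb: Optional[int]         # None where the text does not state RAM

    def computed_raw_gb(self) -> Optional[int]:
        """Raw capacity from drive count x drive size, if both are known."""
        if self.drive_count is None or self.drive_size_gb is None:
            return None
        return self.drive_count * self.drive_size_gb


# Values copied from the component list; nothing here is measured.
NODES: List[NodeSpec] = [
    NodeSpec("portal", 5, 146, 750, 4),
    NodeSpec("data", 5, 300, 1500, None),
    NodeSpec("ca", None, None, 36, 2),
]


def capacity_report(nodes: List[NodeSpec]) -> Dict[str, Tuple[Optional[int], int]]:
    """Map each role to (computed raw GB, quoted raw GB)."""
    return {n.role: (n.computed_raw_gb(), n.quoted_raw_gb) for n in nodes}


if __name__ == "__main__":
    for role, (computed, quoted) in capacity_report(NODES).items():
        print(f"{role}: computed={computed} GB, quoted={quoted} GB")
```

For the Data Server the two figures agree (5 x 300 GB = 1.5 TB). For the Portal Server the per-drive figures give 730 GB, slightly under the quoted 750 GB, so the quoted value is best read as approximate. These are raw capacities; usable space after RAID depends on the RAID level, which the text does not specify.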
The Portal Server runs the Rocks cluster management software and a standardized GEON software stack, which includes the GEON Portal and its dependent libraries, as well as software tools developed in the GEON project. The core portal functionality, such as search and data ingestion, is generic and can mostly be used out of the box by other projects. The Data Server hosts data registered through the portal and runs the SDSC Storage Resource Broker (SRB) software, which provides a number of built-in data management services. The Data Server can also host additional data management services, e.g. GIS software. The Portal Server can communicate with Web services hosted at remote locations using the standard Web service protocols.
The CA Server runs the Grid Account Management Architecture (GAMA) software, which manages user accounts through the portal. The CA Server is installed using the Rocks software and the GAMA roll. Once installed, the system is fully configured as a CA. The portal software is also pre-configured to communicate with this GAMA server for managing user accounts.
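The relationships among the software components described above can be sketched as a small dependency map, from which one plausible bring-up order follows. This is an illustration under stated assumptions, not the GEON installation procedure: the component labels and the function bring_up_order are hypothetical, and the edges encode only what the text implies (GAMA is installed on top of Rocks; the portal runs on Rocks, talks to the GAMA server, and hosts its data on SRB). The text does not say how the Data Server's base system is installed, so SRB is given no prerequisites here.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set

# component -> components it relies on (assumed from the description above)
DEPENDS_ON: Dict[str, Set[str]] = {
    "rocks": set(),
    "gama": {"rocks"},                        # CA Server: Rocks plus the GAMA roll
    "srb": set(),                             # base install not stated in the text
    "geon_portal": {"rocks", "gama", "srb"},  # portal uses accounts and storage
}


def bring_up_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return the components ordered so that prerequisites come first."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(" -> ".join(bring_up_order(DEPENDS_ON)))
```

Any order the sorter returns places Rocks before GAMA and places both GAMA and SRB before the portal; the relative position of independent components such as Rocks and SRB is not significant.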
Acknowledgements
1. Ilya Zaslavsky, Chesapeake Bay Environmental Observatory (CBEO) Project (http://geon16.sdsc.edu:8080/gridsphere/gridsphere)
2. NEESit Enabling Earthquake Engineering Research, Education and Practice (http://neesphere.sdsc.edu:8080/gridsphere/gridsphere)
3. National Ecological Observatory Network (NEON) (http://neon.sdsc.edu:8080/gridsphere/gridsphere)
4. Karan Bhatia, Kurt Mueller, Sandeep Chandra, Grid Account Management Architecture (GAMA) (http://grid-devel.sdsc.edu/gridsphere/gridsphere?cid=gama)