Geoinformatics 2007 Conference (17–18 May 2007)

Paper No. 1
Presentation Time: 2:30 PM–4:30 PM

A DEPLOYABLE GEON LIDAR PROCESSING AND ANALYSIS SYSTEM


JAEGER-FRANK, Efrat1, CHANDRA, Sandeep1, CROSBY, Christopher J.2, NANDIGAM, Viswanath1, MEMON, Ashraf1, ARROWSMITH, J. Ramon2, ALTINTAS, Ilkay1 and BARU, Chaitan1, (1)San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093-0505, (2)School of Earth and Space Exploration, Arizona State University, Tempe, AZ 85281-1404, efrat@sdsc.edu

Distribution, processing, and analysis of large LiDAR (Light Detection And Ranging, also known as ALSM, Airborne Laser Swath Mapping) datasets push the computational limits of typical data distribution and processing systems. The high point density of LiDAR datasets makes processing difficult for most geoscience users, who lack the computing and software resources necessary to handle these massive data volumes. Over the past two years, as part of the Geosciences Network project (GEON), we have developed a three-tier architecture, the GEON LiDAR Workflow (GLW), to facilitate community access to LiDAR datasets. The GLW comprises the GEON portal, a workflow system based on the Kepler scientific workflow environment, and a set of services that coordinate distributed resources using emerging Grid technologies and the GEONGrid clusters. The GLW is available to the community via the GEON portal and has proven to be an efficient and reliable LiDAR data distribution and processing tool.
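The abstract does not detail the GLW's service interface, but the three-tier flow described above (portal front end, workflow tier, distributed processing services) can be illustrated with a minimal client sketch in Python. The endpoint URL, parameter names, and request payload below are hypothetical stand-ins, not the actual GLW API:

    import json
    import urllib.request

    # Hypothetical portal endpoint; the real GLW submission interface may differ.
    PORTAL_SUBMIT_URL = "http://portal.example.org/glw/submit"

    def submit_lidar_job(dataset_id, bounds, algorithm):
        """Submit a LiDAR processing request through the portal tier;
        the workflow tier would dispatch it to a processing service.
        Returns a job identifier for later status monitoring."""
        payload = json.dumps({
            "dataset": dataset_id,    # an ALSM point-cloud collection (assumed name)
            "bounds": bounds,         # spatial subset of the point cloud
            "algorithm": algorithm,   # e.g. an interpolation method for DEM generation
        }).encode("utf-8")
        request = urllib.request.Request(
            PORTAL_SUBMIT_URL, data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)["job_id"]

In practice a user would drive this through the GEON portal's web interface rather than a script; the sketch only makes the division of labor among the tiers concrete.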

The increasing popularity of the GLW has led to several requests to deploy the system at additional sites and projects. We are currently creating an automated deployment package, known as a “roll,” that requires minimal user intervention. As an initial phase, we have replicated the processing services, originally deployed on a GEON cluster at Arizona State University, to additional compute clusters at the San Diego Supercomputer Center and at UNAVCO in Boulder, CO. With the GLW deployed on multiple processing clusters, we can improve system load balancing and provide a failover site. We further plan to enhance the system with a Grid scheduler that maps jobs onto the Grid clusters based on the availability of the corresponding compute, storage, and networking resources.

Deploying the GLW across distributed sites also imposes additional requirements for robustness and for richer system monitoring information. For example, users are interested in tracking the execution state of their LiDAR processing jobs in real time. We have added a number of new features to the GLW to address these requirements. We use the Kepler “provenance” capability, which collects job provenance data, to enhance the GLW's job monitoring interface with live job status. The provenance data are also useful when publishing results and sharing GLW products among scientists. With these enhancements, we expect to make the GLW more robust and useful to a wide range of earth science users. The GLW is currently deployed at three sites and has 126 users, who have submitted a total of 1,250 LiDAR processing requests for over 500 gigabytes of data.
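As a rough illustration of the load balancing and failover that replicated deployments enable, the sketch below picks the least-loaded reachable cluster among the three sites. The site names, the load probe, and the simulated outages are assumptions for illustration; as noted above, the system plans to delegate this decision to a Grid scheduler:

    import random
    from typing import Optional

    # The three current GLW processing deployments (labels are illustrative).
    SITES = ["asu-cluster", "sdsc-cluster", "unavco-cluster"]

    def probe_load(site: str) -> Optional[float]:
        """Return a site's current load in [0, 1], or None if unreachable.
        Stand-in for a real resource-availability query."""
        if random.random() < 0.1:   # simulate an occasional site outage
            return None
        return random.random()

    def choose_site(sites=SITES) -> str:
        """Map a job to the least-loaded reachable site (load balancing),
        skipping sites that are down (failover)."""
        loads = {site: probe_load(site) for site in sites}
        reachable = {site: load for site, load in loads.items() if load is not None}
        if not reachable:
            raise RuntimeError("no GLW processing site is available")
        return min(reachable, key=reachable.get)

A production Grid scheduler would also weigh storage and network availability; the sketch covers only the compute-load dimension.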
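The live job-status monitoring built on provenance can likewise be sketched as an append-only event log that workflow steps write and the portal polls. The event structure below is invented for illustration and is not Kepler's actual provenance API:

    import time
    from dataclasses import dataclass, field

    @dataclass
    class ProvenanceStore:
        """Append-only record of (timestamp, job, step, status) events
        emitted as a workflow executes (a toy model of provenance capture)."""
        events: list = field(default_factory=list)

        def record(self, job_id: str, step: str, status: str) -> None:
            self.events.append((time.time(), job_id, step, status))

        def latest_status(self, job_id: str):
            """Most recent step/status pair for a job, as a monitoring
            interface would poll it; None if the job is unknown."""
            for _ts, jid, step, status in reversed(self.events):
                if jid == job_id:
                    return step, status
            return None

    store = ProvenanceStore()
    store.record("job-42", "subset", "completed")
    store.record("job-42", "grid", "running")
    print(store.latest_status("job-42"))   # -> ('grid', 'running')

Because every step is recorded rather than overwritten, the same log supports both live status display and the publication and sharing uses mentioned above.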