Geoinformatics 2007 Conference (17–18 May 2007)

Paper No. 2
Presentation Time: 2:30 PM-4:30 PM

LIDAR-IN-A-BOX: SERVING LIDAR DATASETS VIA COMMODITY CLUSTERS


NANDIGAM, Viswanath1, BARU, Chaitan2, CHANDRA, Sandeep1 and FRANK, Efrat1, (1)San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, (2)San Diego Supercomputer Center, Univ of California, San Diego, La Jolla, CA 92093-0505, viswanat@sdsc.edu

The Geosciences Network (GEON, www.geongrid.org) is an NSF-funded project to create an IT infrastructure that facilitates collaborative, interdisciplinary science in the earth sciences. GEON facilitates data registration, ingestion, and integration of a range of geoscience data types, including LiDAR (Light Detection And Ranging) data. LiDAR datasets can be used to create high-quality digital earth surface models, which are useful in a variety of geoscience and geospatial applications. The recent, rapid increase in the acquisition rate and popularity of these datasets far outpaces the resources available to most geoscientists for processing and using these data.

GEON provides a novel approach for processing and distributing LiDAR datasets and derived products using a high-performance backend database machine, a portal as the front-end user interface, and the Kepler scientific workflow system for managing the computations. Currently, the LiDAR datasets are stored in an IBM DB2 database running on one of the nodes of DataStar, an IBM supercomputer system that is one of the computational resources in the TeraGrid. This node is linked to a large disk subsystem via a high-end Fibre Channel link. This machine configuration is well suited to handling the massive amounts of LiDAR data, which frequently exceed several million data points per dataset.

We propose a new approach to hosting LiDAR data based on commodity clusters, which can provide a better price/performance solution. This is achieved by taking advantage of DB2's "partitioned database" feature. In this approach, the LiDAR database tables would be partitioned across multiple machines or "nodes". Each partition is managed by an independent database manager with its own data, configuration files, indexes, and transaction logs. This architecture provides better scalability: new machines can be added to the cluster and the database can be expanded across them. In this paper, we describe this new, parallel database architecture for hosting LiDAR data. We refer to this system as "LiDAR-in-a-Box", since one of the benefits of this approach is that individual researchers will be able to deploy such a system at their own sites. We also describe the approach that will be used to make LiDAR-in-a-Box easily deployable.
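
The partitioning idea described above can be illustrated with a minimal sketch. It assumes that each point carries a numeric identifier used as the distribution key and that rows are assigned to nodes by a stable hash of that key. The names Point, Node, partition_for, load, and query_all are hypothetical, the hash function is a stand-in rather than DB2's own scheme, and the nodes are scanned one after another here, whereas a real partitioned database would scan them in parallel.

```python
# Illustrative sketch only: hash-partitioning LiDAR points across nodes.
# All names below are hypothetical; this is not DB2's actual hashing scheme.
import zlib
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Point:
    point_id: int  # assumed distribution key
    x: float
    y: float
    z: float


def partition_for(point_id: int, num_nodes: int) -> int:
    """Map a distribution key to a node index using a stable hash."""
    return zlib.crc32(str(point_id).encode("ascii")) % num_nodes


@dataclass
class Node:
    """One partition: it holds only its own rows and answers queries on them."""
    rows: List[Point] = field(default_factory=list)

    def select_bbox(self, xmin: float, ymin: float,
                    xmax: float, ymax: float) -> List[Point]:
        # Local scan of this partition's rows only.
        return [p for p in self.rows
                if xmin <= p.x <= xmax and ymin <= p.y <= ymax]


def load(points: List[Point], num_nodes: int) -> List[Node]:
    """Distribute points across nodes according to the hash of their key."""
    nodes = [Node() for _ in range(num_nodes)]
    for p in points:
        nodes[partition_for(p.point_id, num_nodes)].rows.append(p)
    return nodes


def query_all(nodes: List[Node], xmin: float, ymin: float,
              xmax: float, ymax: float) -> List[Point]:
    """Run the same bounding-box query on every node and merge the results."""
    merged: List[Point] = []
    for node in nodes:  # sequential here; parallel in a real cluster
        merged.extend(node.select_bbox(xmin, ymin, xmax, ymax))
    return sorted(merged, key=lambda p: p.point_id)
```

In this sketch, adding capacity means calling load with a larger num_nodes, which mirrors the scalability argument only loosely: a production system would redistribute existing rows rather than reload them.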