LIDAR-IN-A-BOX: SERVING LIDAR DATASETS VIA COMMODITY CLUSTERS
GEON provides a novel approach for processing and distributing LiDAR datasets and derived products using a high-performance backend database machine, a portal as the front-end user interface, and the Kepler scientific workflow system for managing the computations. Currently, the LiDAR datasets are stored in an IBM DB2 database running on one of the nodes of DataStar, an IBM supercomputer system that is one of the computational resources in the TeraGrid. This node is linked to a large disk subsystem via a high-end Fibre Channel link. This machine configuration is well suited to handling the massive amounts of LiDAR data, which frequently exceed several million data points per dataset.
We propose a new approach to hosting LiDAR data based on commodity clusters, which can provide better price/performance. This is achieved by taking advantage of DB2's "partitioned database" feature. In this approach, the LiDAR database tables would be partitioned across multiple machines, or "nodes". Each partition is managed by an independent database manager with its own data, configuration files, indexes, and transaction logs. This architecture provides better scalability: new machines can be added to the complex, and the database can be expanded across them. In this paper, we describe this new, parallel database architecture for hosting LiDAR data. We refer to this system as "LiDAR-in-a-Box", since one of the benefits of this approach is that individual researchers will be able to deploy such a system at their own sites. We will describe the approach that will be used to make LiDAR-in-a-Box easily deployable.
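The partitioning idea can be illustrated with a minimal Python sketch. It is not DB2 code and does not reproduce DB2's actual hashing scheme; it only assumes that each row is assigned to a partition by hashing a key, and that a spatial query is sent to every partition and the partial results are merged. The function names, the point fields (`id`, `x`, `y`, `z`), and the use of CRC32 as the hash are all illustrative assumptions.

```python
import zlib
from collections import defaultdict


def partition_for(point_id, num_partitions):
    """Map a row key to a partition number with a stable hash.

    CRC32 is an illustrative stand-in, not the hash DB2 uses.
    """
    return zlib.crc32(str(point_id).encode("ascii")) % num_partitions


def distribute(points, num_partitions):
    """Assign each point to exactly one partition (one per node)."""
    partitions = defaultdict(list)
    for point in points:
        partitions[partition_for(point["id"], num_partitions)].append(point)
    return partitions


def bbox_query(partitions, xmin, ymin, xmax, ymax):
    """Run a bounding-box selection on every partition and merge the results.

    In a real cluster each partition would be scanned by its own node
    in parallel; here the partitions are simply visited in turn.
    """
    hits = []
    for rows in partitions.values():
        hits.extend(
            p for p in rows
            if xmin <= p["x"] <= xmax and ymin <= p["y"] <= ymax
        )
    return sorted(hits, key=lambda p: p["id"])


# Hypothetical sample data: 1000 synthetic points on a 100 x 10 grid.
points = [
    {"id": i, "x": float(i % 100), "y": float(i // 100), "z": 0.0}
    for i in range(1000)
]
partitions = distribute(points, num_partitions=4)
subset = bbox_query(partitions, xmin=10, ymin=2, xmax=20, ymax=4)
```

In this sketch, adding a node corresponds to increasing `num_partitions` and redistributing the rows, which mirrors the scalability argument above in simplified form.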