INTEGRATING GEOLOGIC DATA IN THE NGMDB, A STANDARDS-BASED APPROACH USING GEOSCIML AND OPEN SOURCE TOOLS
GeoSciML is a transport mechanism and schema developed under the auspices of the CGI and demonstrated successfully in late 2006 at the International Association for Mathematical Geology conference in Liege, Belgium. In that demonstration, geologic databases from worldwide participants were queried by a desktop client (not browser-based) to show a consistent set of geologic data across disparate data sets from different countries or agencies.
The phase three prototype of the NGMDB integrates data from Arizona, the Pacific Northwest (Oregon, Washington, and Idaho), as well as several national data sets. Additionally, we demonstrate interoperability with other standards-based services such as the NRCS' soils database and NASA's MODIS satellite data.
The prototype is enabled by a custom-built data import tool that allows for matching data in fields from an input database source such as Oregon or Washington geologic compilations to a database schema used as a backend for the web map service (NGMDB-lite). The NGMDB-lite schema is a flat-file view of a subset of data from the NGMDB database design (Richard et al., 2004). Fields in the input table are matched to corresponding fields in NGMDB-lite. Subsequently, unique values from each input field are matched to corresponding terms in controlled vocabularies defined by the NGMDB.
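The two-step matching described above (fields first, then unique values to controlled terms) can be illustrated with a minimal sketch. It is not the project's import tool: the field names, target names, and vocabulary entries below are hypothetical placeholders, and the real NGMDB-lite schema and vocabularies are not reproduced here.

```python
from typing import Dict, List, Set

Row = Dict[str, str]

# Hypothetical mapping from input-database fields to target schema fields.
FIELD_MAP: Dict[str, str] = {
    "UNIT_NAME": "unit_name",
    "LITH": "lithology",
    "AGE": "age",
}

# Hypothetical controlled vocabularies, keyed by target field.
TERM_MAP: Dict[str, Dict[str, str]] = {
    "lithology": {"bas": "basalt", "ss": "sandstone"},
    "age": {"Mioc": "Miocene"},
}


def translate_row(row: Row, field_map: Dict[str, str],
                  term_map: Dict[str, Dict[str, str]]) -> Row:
    """Rename matched fields and replace values with controlled terms.

    Unmatched input fields are dropped; values with no vocabulary
    entry are passed through unchanged.
    """
    out: Row = {}
    for source_field, value in row.items():
        target = field_map.get(source_field)
        if target is None:
            continue
        out[target] = term_map.get(target, {}).get(value, value)
    return out


def unmatched_terms(rows: List[Row], field_map: Dict[str, str],
                    term_map: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """List the unique input values that still lack a controlled term."""
    missing: Dict[str, Set[str]] = {}
    for row in rows:
        for source_field, value in row.items():
            target = field_map.get(source_field)
            if target in term_map and value not in term_map[target]:
                missing.setdefault(target, set()).add(value)
    return {field: sorted(values) for field, values in missing.items()}
```

In this sketch, a report from unmatched_terms would be what a regional expert reviews before the translated rows are exported.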
After matching fields and terms from the input map database to the NGMDB-lite schema and vocabulary, a fully attributed ESRI shapefile is generated. This shapefile is then appended to an aggregated master shapefile for display in an Open Source mapping framework developed at Portland State University and managed on SourceForge as the project Map-Fu (http://sourceforge.net/projects/map-fu/).
Map-Fu is an AJAX/AJAJ browser front end to map data that uses JavaScript and PHP/MapScript to handle individual users' requests for map data. It includes specific tools for zooming, panning, querying, and rendering individual map layers. It is a "thick client" that runs in all modern internet browsers, and is thus inherently cross-platform.
On the server side, we implement an Open Source stack that consists of MapServer and PostGIS (a set of GIS extensions for PostgreSQL, a mature Open Source object-relational database) running atop Apache and Linux. MapServer is configured via mapfiles to display and symbolize map data from shapefiles for quick response to user requests. PostGIS is used to answer user queries and to enable interaction with the database so that registered experts can update the database through a GeoWiki.
The GeoWiki allows users to draw points or polygons on a map interface and to add or update information in the database on paleontology, engineering properties, or hydrogeology, or to leave general comments. This facilitates a wider community's participation in making this resource useful for an even larger user group. Experts' data are stored directly in the NGMDB table structure.
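As an illustration of how a polygon drawn in the map interface might be prepared for a spatial database, the following minimal sketch converts a list of clicked vertices into Well-Known Text (WKT), a format that PostGIS can parse (for example with ST_GeomFromText). The function name and the validation rules are assumptions made for this example and do not describe the GeoWiki's actual code.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def ring_to_wkt(points: List[Point]) -> str:
    """Build a WKT POLYGON from vertices in drawing order.

    The ring is closed automatically by repeating the first vertex
    when the user has not already done so.
    """
    ring = list(points)
    if ring and ring[0] != ring[-1]:
        ring.append(ring[0])
    # A closed ring needs at least three distinct vertices plus the closing one.
    if len(ring) < 4:
        raise ValueError("a polygon needs at least three distinct vertices")
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON(({coords}))"
```

The resulting string would be passed to the database as a query parameter together with the expert's attribute values, never concatenated directly into the SQL text.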
As a proof of concept for integrating multiple data sets stored locally or remotely, we serve a local data set compiled from all of the above-named sources (excluding Oregon) from a shapefile located on the server at PSU. We have configured the Oregon data as a Web Feature Service (WFS) that responds to requests from remote servers. This service is referenced in the mapfile as if it were a remote source and is displayed alongside the other data simply as another layer. Requests for GeoSciML that cross the boundary between local and "remote" services (the border of Washington and Oregon, for example) still return standardized fields and science terminology.
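A request to such a WFS is an ordinary HTTP query built from standard key-value parameters. The sketch below assembles a GetFeature URL for a bounding box that straddles the Washington-Oregon border. The endpoint URL and the layer name are hypothetical placeholders, and the bounding box is only illustrative; the prototype's actual service address and feature type names are not given here.

```python
from typing import Tuple
from urllib.parse import urlencode

BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def build_getfeature_url(base_url: str, type_name: str, bbox: BBox,
                         version: str = "1.0.0") -> str:
    """Assemble a WFS GetFeature request using key-value parameters."""
    params = {
        "SERVICE": "WFS",
        "VERSION": version,
        "REQUEST": "GetFeature",
        "TYPENAME": type_name,
        "BBOX": ",".join(str(value) for value in bbox),
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and layer name, for illustration only.
url = build_getfeature_url(
    "http://example.org/cgi-bin/wfs",
    "geologic_units",
    (-123.0, 45.0, -121.0, 46.5),
)
```

Because the request is defined by the OGC standard rather than by a particular server, the same URL pattern applies whether the data sit on the local machine or on a remote agency's server.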
This system is a model for aggregating multiple data sets from many agencies. Some organizations have sufficient resources to set up a WFS server and maintain their own GeoSciML-compliant data which can be integrated into our system. Some organizations, however, lacking these resources, could simply provide the data to our project for hosting on the NGMDB site. A third option is to allow organizations access to a virtual server on our system; they will have their own subdomain, for example id.ngmdb.us (Idaho), and manage their own data as if it existed on their own local server. Each of these three scenarios is mediated by our custom Data Import Tool, which allows the expert geologist for a region to map their data fields to a common schema and the unique values contained within to controlled science terminology of the NGMDB.
The end user's process of data discovery and use will be further enabled by the ability to overlay standards-compliant WFS and WMS (or generically OWS) services in our mapping framework. Moreover, because all of our services will be published as OWS services, any other agency or organization could create its own "mashup" with data of its own choosing in whatever client it prefers, including proprietary desktop clients such as ArcExplorer, or any of a number of clients, including virtual earths, that will be proliferating in the near term.
References
Richard, S.M., Craigue, J., and Soller, D.R., 2004, Implementing NADM C1 for the National Geologic Map Database, in Soller, David R., Editor, Digital Mapping Techniques '04-Workshop Proceedings, U.S. Geological Survey Open-File Report 2004-1451, p. 111-144, accessed at http://pubs.usgs.gov/of/2004/1451/pdf/richard.pdf