INTEGRATION OF HYDROLOGIC OBSERVATIONS FROM GOVERNMENT AND ACADEMIC DATA COLLECTIONS WITHIN THE CUAHSI HYDROLOGIC INFORMATION SYSTEM
The main challenges of integrating observational data across agencies and academic projects are heterogeneity across information systems; the lack of standard, widely adopted information models, data exchange protocols, and agreed-upon semantics for data interchange; and often incompatible policies on data serving, data retention, security, funding, and related matters. Within the CUAHSI HIS project, these challenges have been addressed by:
- Developing a common information model for observations data collected at stationary points (measurement stations) that is uniform across government and academic sources
- Implementing the common information model as a) a relational schema called Observations Data Model (ODM) that supports publication of observations data collections developed as part of academic projects [Horsburgh and others, in press], b) a series of databases storing observations data catalogs describing agency repositories, and c) a standard XML schema for exchanging water observations, called Water Markup Language, or WaterML [Zaslavsky and others, 2007]
- Developing web services with a common set of method signatures, to return WaterML-compliant information about observation stations (GetSites, GetSiteInfo), variables (GetVariables, GetVariableInfo), and values (GetValues). These services, called WaterOneFlow services, are implemented as XML wrappers over web-based data access systems maintained by federal and state agencies, such as NWISWeb. In the last year, these services have been re-coded to take advantage of the web services developed by our partner agencies. For example, the USGS NWIS team has published a beta version of a web service that provides programmatic access to NWIS Daily Values data in a WaterML-compliant form. A similar effort is under way at NCDC to publish ASOS data following the WaterML schema. At the same time, EPA has developed the Water Quality Exchange (WQX) framework for sharing water quality data and submitting them to the STORET data warehouse. A mapping of WQX elements to WaterML is the basis for recoding WaterOneFlow services for the STORET repository.
- Managing varying semantics by mapping water quantity, water quality, and other parameters collected at government agencies to a common vocabulary. This task is essential because of the size and heterogeneity of the available parameter code lists (for example, both USGS NWIS and EPA STORET have 15,000+ listed parameter codes) and differences in the naming conventions adopted at different agencies and research groups. The mapping supports cross-database search: users navigate a parameter ontology and find the variables at each observation network that have been associated with the search concept. For example, a search for nitrate measurements in a given area may uncover a range of stations maintained by USGS, EPA, other agencies, and academic projects, where nitrate-related variables were measured. The online mapping system for cross-database search and retrieval is called Hydroseek [Beran and Piasecki, in press].
- Creating an observations data publication environment where local data managers can load the observations data they collected into ODM, validate them, publish them as web services, configure them to be presented via an online mapping interface called Data Access System for Hydrology (DASH), associate variable names with common ontology terms, and register the web services at a central HIS site to be included in the global search application, Hydroseek. The components of the data publication workflow are part of the HIS Server, which has been deployed over the last year at hydrologic observatory test beds to support publication of local observations data.
- Developing online user interfaces that support combining disparate data into common spatial and temporal representations.
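To make the common method signatures listed above more concrete, the following is a minimal in-memory sketch in Python. Only the five method names (GetSites, GetSiteInfo, GetVariables, GetVariableInfo, GetValues) come from the text; the record types, their fields, the argument lists, and all codes and values are illustrative assumptions and do not reproduce the WaterML schema or the actual WaterOneFlow service definitions.

```python
from dataclasses import dataclass
from datetime import datetime


# Hypothetical, simplified records; the real WaterML elements are richer.
@dataclass(frozen=True)
class Site:
    code: str
    name: str
    latitude: float
    longitude: float


@dataclass(frozen=True)
class Variable:
    code: str
    name: str
    units: str


@dataclass(frozen=True)
class Value:
    site_code: str
    variable_code: str
    time: datetime
    value: float


class InMemoryWaterOneFlow:
    """Toy stand-in for one service; method names follow the text,
    argument lists are assumptions made for this sketch."""

    def __init__(self, sites, variables, values):
        self._sites = {s.code: s for s in sites}
        self._variables = {v.code: v for v in variables}
        self._values = list(values)

    def GetSites(self):
        return list(self._sites.values())

    def GetSiteInfo(self, site_code):
        # Return the site plus the variables that have values at it.
        site = self._sites[site_code]
        measured = sorted(
            {v.variable_code for v in self._values if v.site_code == site_code}
        )
        return site, [self._variables[code] for code in measured]

    def GetVariables(self):
        return list(self._variables.values())

    def GetVariableInfo(self, variable_code):
        return self._variables[variable_code]

    def GetValues(self, site_code, variable_code, start, end):
        # Inclusive time window on both ends.
        return [
            v for v in self._values
            if v.site_code == site_code
            and v.variable_code == variable_code
            and start <= v.time <= end
        ]


# Made-up demonstration data.
service = InMemoryWaterOneFlow(
    sites=[Site("DEMO:S1", "Example Creek", 40.0, -111.0)],
    variables=[Variable("DEMO:Q", "Discharge", "m^3/s")],
    values=[
        Value("DEMO:S1", "DEMO:Q", datetime(2008, 1, 1), 1.5),
        Value("DEMO:S1", "DEMO:Q", datetime(2008, 1, 2), 1.7),
        Value("DEMO:S1", "DEMO:Q", datetime(2008, 2, 1), 2.1),
    ],
)
january = service.GetValues(
    "DEMO:S1", "DEMO:Q", datetime(2008, 1, 1), datetime(2008, 1, 31)
)
print([v.value for v in january])
```

Because every source exposes the same five calls, a client written against one network could, under these assumptions, be pointed at another without changes to its retrieval logic.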
The components mentioned above are organized in a service-oriented architecture; its general outline is shown in Figure 1. At the physical level, the HIS includes software stacks for HIS Server and HIS Server Lite (the latter based on free software components only), which are being deployed to the 11 NSF-supported hydrologic observatory test beds and enable uniform publication of local observational data from mostly academic sources. The central HIS site at the San Diego Supercomputer Center (SDSC) serves observations data catalogs that contain sufficient information for formulating data retrieval requests against agency data repositories.
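As an illustration of how a central catalog and the parameter mapping could work together, the Python sketch below searches a small catalog by concept and turns each hit into the arguments of a GetValues request. It is a sketch under stated assumptions, not the Hydroseek or HIS catalog implementation: the catalog layout, the concept table, the request dictionary, and every site code, parameter code, and variable name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    network: str
    site_code: str
    variable_code: str   # agency-specific code; values below are made up
    variable_name: str


# Hypothetical mapping from (network, agency code) to a shared concept.
CONCEPT_OF = {
    ("NWIS", "N-001"): "nitrate",
    ("STORET", "NO3-A"): "nitrate",
    ("NWIS", "Q-001"): "streamflow",
}


def search_by_concept(catalog, concept):
    """Return catalog entries whose variable is mapped to the concept."""
    return [
        entry for entry in catalog
        if CONCEPT_OF.get((entry.network, entry.variable_code)) == concept
    ]


def to_get_values_request(entry, start, end):
    """Build GetValues arguments from a catalog entry (assumed layout)."""
    return {
        "method": "GetValues",
        "network": entry.network,
        "site": entry.site_code,
        "variable": entry.variable_code,
        "start": start,
        "end": end,
    }


# Made-up catalog spanning two networks.
catalog = [
    CatalogEntry("NWIS", "NWIS-S1", "N-001", "Nitrate (example)"),
    CatalogEntry("STORET", "STORET-S7", "NO3-A", "Nitrate as N (example)"),
    CatalogEntry("NWIS", "NWIS-S1", "Q-001", "Discharge (example)"),
]

hits = search_by_concept(catalog, "nitrate")
requests = [to_get_values_request(e, "2008-01-01", "2008-01-31") for e in hits]
for request in requests:
    print(request["network"], request["site"], request["variable"])
```

The point of the design is that the search runs entirely against the catalog; the agency repositories are contacted only once a concrete request has been formulated.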
FIGURE 1 NEAR HERE.
Until now, retrieving data from most government repositories was a major bottleneck: CUAHSI web services worked by wrapping the respective agency web sites (NWIS, STORET, etc.) in XML wrappers, and hence were sensitive to changes in page layout, and the data also had to be relayed via SDSC servers. In addition, we used the web service wrappers to harvest observations data catalogs from agency web sites, which was also an error-prone process. As collaboration with the agencies on web services development has intensified over the last year, this situation is changing. The HIS project now receives database snapshots for building catalogs and enabling rapid data discovery, and connects to newly developed WaterML-compliant or other web services that are hosted at agency servers and enable faster data retrieval. The same model is now being extended to state agencies, as the states of Florida, Texas, and Idaho are implementing their HIS systems.
Acknowledgements
NSF award EAR-0622374 is gratefully acknowledged (PI: D.R. Maidment). We also gratefully acknowledge the cooperation, insightful discussions, and help provided by personnel at partner agencies: USGS (R. Hirsch, K. Lins, D. Briar, D. Coyle, M. Hamill and other members of the Division of Water Resources), EPA (C. Spooner, M. Hamilton, R. Hill and the STORET team), and NCDC (R. Baldwin).
References Cited
Bandaragoda, C. J., Tarboton, D. G., and Maidment, D. R., 2005, User Needs Assessment: Chapter 4 in Hydrologic Information System Status Report, Version 1, edited by D. R. Maidment, p. 48-87. Available online at http://www.cuahsi.org/docs/HISStatusSept15.pdf. (Accessed April 20, 2008)
Beran, B., and Piasecki, M., in press, Engineering new paths to water data: Computers and Geosciences.
Horsburgh, J. S., Tarboton, D. G., Maidment, D. R., and Zaslavsky, I., in press, A Relational Model for Environmental and Water Resources Data: Water Resources Research.
Zaslavsky, I., Valentine, D., and Whiteaker, T., eds., 2007, CUAHSI WaterML, Open Geospatial Consortium, Inc., document OGC 07-041. Available online at http://www.opengeospatial.org/standards/dp. (Accessed April 20, 2008)
Figure 1. Main components of the CUAHSI HIS Service Oriented Architecture.
Acronyms
- ASOS: Automated Surface Observing System
- CUAHSI: Consortium of Universities for the Advancement of Hydrologic Sciences, Inc.
- DASH: Data Access System for Hydrology
- HIS: Hydrologic Information System
- NWIS: National Water Information System
- ODM: Observations Data Model
- SNOTEL: Snowpack Telemetry
- STORET: Storage and Retrieval
- WaterML: Water Markup Language