Paper No. 9
Presentation Time: 10:30 AM

THE PROPOSED CUAHSI WATER DATA CENTER: EMPOWERING SCIENTISTS TO DISCOVER, USE, STORE, AND SHARE WATER DATA FROM MULTIPLE SOURCES


ARRIGO, Jennifer S. (1), COUCH, Alva L. (2), HOOPER, Richard P. (2) and POLLAK, Jonathon (2); (1) CUAHSI, 196 Boston Ave. Suite 3800, Medford, MA 02155; (2) CUAHSI, 196 Boston Ave. Suite 3000, Medford, MA 02155; jarrigo@cuahsi.org

The proposed CUAHSI Water Data Center (WDC) will provide production-quality water data resources based on the successful large-scale data services prototype developed by the CUAHSI Hydrologic Information System (HIS) project. These resources provide time series data collected by sensors, primarily (but not exclusively) in the medium of water. The WDC’s mission includes providing simple and effective data discovery tools for researchers in water-related disciplines, as well as simple and cost-effective data publication mechanisms. The WDC’s activities will include:
  1. Rigorous curation of the water data catalog already assembled by the HIS project.
  2. Data backup and failover services for “at risk” data sources.
  3. Creation of, and support for, ubiquitously accessible data discovery and access.
  4. Partnerships with researchers to extend the state of the art in water data use.

The WDC will serve as a knowledge resource for researchers, and will interface with other data centers to make their data more accessible to water researchers. The WDC aims to address some of the grand challenges of accessing and using water data, including:

  1. Cross-domain data discovery: different scientific domains refer to the same kind of data using different terminologies, making discovery of data difficult for researchers in other disciplines.
  2. Cross-validation of data sources: much water data comes from sources lacking information on quality control; such sources can be compared against others with rigorous quality control.
  3. Data provenance: the appropriateness of data for use in a specific model or analysis often depends upon the exact details of how the data were gathered and processed.
  4. Contextual search: discovering data based upon geological (e.g. aquifer) or geographic features.
  5. Data-driven search: discovering data that exhibit quality factors that are not described by the metadata.

Many major data providers (e.g. federal agencies) do not have the mandate to provide access to data other than their own. The HIS has already assembled data from more than 90 different sources, demonstrating the promise of a center that brings together data across providers. Meeting the grand challenges listed above will greatly enhance scientists’ ability to discover, interpret, access, and analyze water data from across domains and sources to test Earth system hypotheses.