A DATA INTEGRATION AND INTEROPERABILITY BLUEPRINT FOR USGS
The USGS Geospatial Information Office is currently leading efforts to develop a long-term plan, or "blueprint," for data integration, accessibility, discovery, and interoperability across the USGS. The Data Integration Blueprint will include projects that provide metadata, data content standards, infrastructure, and informatics. These projects are intended to enhance scientific techniques, improve data access, provide management visibility, advance the strategic directions of the USGS Science Strategy, and connect the USGS to its partners and collaborators through participation in international efforts to develop a global science and computing platform for the 21st century. The plan will be comprehensive by design, incorporating the data integration and scientific tools development efforts of all of USGS into a single framework with common practices and a seamless infrastructure. The USGS will work with its partners and with national and international cyber-infrastructure activities to develop this framework.
Background
The USGS is a world leader in monitoring, assessment, and research related to the natural sciences. With a diverse multidisciplinary workforce, extensive monitoring networks, and national and regional scale approaches, the USGS has earned a reputation as the "authoritative source" of specific national data sets such as water quality, cartographic base, land cover/land use, biological resources, and geologic mapping. As the future unfolds, the ability of the USGS to map and integrate these data will be critical for the advancement of all science directions. Some of the major mission activities that the USGS engages in include:
- Collect and maintain long-term national and regional geologic, hydrologic, biologic, and geographic databases
- Collect, process, and analyze earth and planetary imagery and remote sensing
- Develop open-source models of complex natural systems and human interaction with those systems
- Maintain national and global geologic, biologic, hydrologic, and geographic monitoring systems
- Archive and preserve physical collections of earth materials, biologic materials, reference standards, geophysical recordings, and paper records
- Develop standards of practice for the geologic, hydrologic, biologic, and geographic sciences
The USGS maintains a large number of science data sets at local, regional, and national scales. The ability of the USGS to integrate these data is critical to the achievement of Department of the Interior (DOI) mission objectives in Resource Protection, Resource Use, and Serving Communities, and to the USGS national mission of conducting science and serving earth and biological data. Development of a fully integrated science data environment will improve the accessibility of science data and information within the USGS, across the DOI, and with its scientific partners, collaborators, and customers in other federal agencies and the public. Greater access to a broad range of integrated science data will spark new discovery and support a wider range of inquiry, better informing the decision making of managers, policy makers, and stewards of the Nation's resources.
Examples of some of the long-term national data sets maintained by USGS include:
- The National Map -- topographic, orthoimagery, hydrography, etc.
- MRDATA -- comprehensive source of mineral resource data
- The National Geologic Map Database -- a standardized community collection of geologic mapping
- NWISWeb -- the National Water Information System
- The National Geochemical Database -- collection of rock, stream sediment, and other materials analyzed by the USGS
- National Geophysical Database (aeromagnetic, gravity, and aeroradiometric)
- National and Global Earthquake Catalogs
- North American Breeding Bird Survey
- National Vegetation/speciation maps
- National Oil and Gas Assessment
- National Coal Quality Inventory
The conduct of science is changing worldwide. There is widespread recognition that the earth's complex natural systems are interrelated and that scientific inquiry must be equally integrated to develop new understanding of the implications for the environment, land management, resource utilization, and policy making. Complex scientific questions require the analysis, integration, and modeling of science data and information from multiple disciplines, locations, and timeframes. The USGS and its partners, including industry; Federal, State, and local governments; universities and associations; and international scientific organizations, are beginning to connect and integrate the data and research techniques of the world's scientists, making them accessible to a global science community and transforming the way in which research, engineering, and education are conducted. Science data integration within the USGS is a prerequisite for joining these international efforts to develop a worldwide science collaboration and computing platform that can address future environmental science challenges.
For example, phenology is the study of periodic plant and animal life cycle events that are influenced by environmental changes, especially seasonal variations in temperature and precipitation driven by weather and climate. Phenological events record, immediately and empirically, the consequences of environmental variability and change vital to the public interest. Variability in phenological events, such as the beginning of the growing season, can have important environmental and socio-economic implications for the economy, health, recreation, agriculture, management of natural resources, and natural hazards. Although phenology is a far-reaching component of environmental science, it is not well understood. Realizing the predictive potential of phenology requires a new data resource: a national network of integrated phenological observations. A USA National Phenology Network (USA-NPN) is currently being designed and organized to engage federal agencies, environmental networks and field stations, educational institutions, and mass participation by citizen scientists. The initial phase will establish a continental-scale network focused on phenological observations of a few regionally appropriate native plant species and nationally cultivated indicator plants. The USGS must not only integrate its own scientific data to support this effort, but must also integrate data from other monitoring activities, such as water availability and soil chemistry, to inform larger national issues such as climate change and ecosystem restoration.
Some of the national and global monitoring systems that the USGS maintains include:
- National Stream Flow Information Program
- Advanced National Seismic System
- National Volcano Early Warning System
- Debris Flow Warning System
- Global Terrestrial Network for Permafrost
- Landsat 5 and 7
- Biomonitoring of Environmental Status and Trends
- National Bird Banding Program
- Land Cover/Land Change Monitoring
- Famine Early Warning System
- National Water Quality Assessment Program
In 2006, the Director of the Survey chartered a team to develop a new USGS Science Strategy. That strategy, entitled "Facing Tomorrow's Challenges: USGS Science in the Coming Decade," will be released in April 2007. It includes six major science goals and a special chapter on "New Methods of Investigation and Discovery" that provides the following long-term vision for USGS data integration:
The USGS supplies an information environment where diverse and distributed knowledge is accessed and used seamlessly by scientists, collaborators, customers, and the public to address complex natural science issues.
The USGS Science Strategy also lays out the following strategic actions to accomplish this long-term vision:
- Incorporate planning for long-term data management and dissemination into multidisciplinary science practices.
- Adopt and implement open data standards within USGS and contribute to the creation of new standards through international standards communities.
- Develop and implement a comprehensive scientific cataloging strategy that incorporates existing data sets resulting in an integrated science catalog.
- Develop a sustainable data-hosting infrastructure to support the retention, archiving, and dissemination of valuable USGS data sets in accordance with open standards.
- Develop and enhance tools and methods that facilitate the capture and processing of data and metadata.
- Identify and support authoritative data sources within USGS programs and encourage development and adoption of standards.
- Build and strengthen the internal workforce augmented by external partnerships in environmental information science.
- Identify and leverage national and international efforts that promote comprehensive data and information management and foster greater sharing of knowledge and expertise.
- Partner with collaborators and customers to facilitate data integration across the world-wide science community.
- Partner with collaborators in the development of informatics tools and infrastructure that contribute to the evolving global science computing and collaboration platform.
The last three strategic actions are key to the successful creation of an international cyber-infrastructure for the sciences. One way to achieve this collaboration is through the creation of and participation in "communities of practice". A community of practice is not merely a community with a common interest; it is a group of practitioners who share experiences and learn from each other. They develop a shared repertoire of resources: experiences, stories, tools, vocabularies, and ways of addressing recurring problems. This takes time and sustained interaction. Standards of practice and reference materials will grow out of this experience, but the critical benefits are the creation and sustaining of knowledge, the leveraging of resources, and rapid learning and innovation.
Figure 1. Schematic of a service-oriented architecture for integration of USGS data.
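To make the strategic action on an "integrated science catalog" more concrete, the following is a minimal Python sketch of how metadata records contributed by separate programs might be merged into one searchable catalog. It is an illustration only and does not describe an actual USGS system: the class names `CatalogRecord` and `IntegratedCatalog`, the record fields, and the keyword tags are all hypothetical, and a real catalog would be built on the metadata standards the blueprint calls for.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CatalogRecord:
    """Hypothetical, minimal metadata record describing one data set."""
    title: str
    discipline: str
    keywords: frozenset = field(default_factory=frozenset)


class IntegratedCatalog:
    """Hypothetical in-memory catalog merging records from several sources."""

    def __init__(self):
        # Records are keyed by lower-cased title so that the same data set
        # registered by two programs ends up as a single entry.
        self._records = {}

    def register(self, record):
        """Add a record, merging keywords if the title is already known."""
        key = record.title.lower()
        keywords = frozenset(k.lower() for k in record.keywords)
        existing = self._records.get(key)
        if existing is not None:
            # Keep the first title and discipline; combine the keyword sets.
            merged = CatalogRecord(existing.title, existing.discipline,
                                   existing.keywords | keywords)
        else:
            merged = CatalogRecord(record.title, record.discipline, keywords)
        self._records[key] = merged

    def search(self, keyword):
        """Return the sorted titles of all records tagged with the keyword."""
        kw = keyword.lower()
        return sorted(r.title for r in self._records.values()
                      if kw in r.keywords)


# Illustrative records only; the keyword tags are assumptions for this sketch.
catalog = IntegratedCatalog()
catalog.register(CatalogRecord("The National Map", "geographic",
                               frozenset({"topographic", "orthoimagery"})))
catalog.register(CatalogRecord("NWISWeb", "hydrologic",
                               frozenset({"water"})))
# A second program describes the same data set with a different keyword.
catalog.register(CatalogRecord("the national map", "geographic",
                               frozenset({"hydrography"})))

print(catalog.search("hydrography"))
```

The design choice worth noting is that duplicate registrations are merged rather than rejected, which mirrors the goal of incorporating existing data set descriptions into a single catalog instead of replacing them.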