2008 Geoinformatics Conference (11-13 June 2008)

Paper No. 8
Presentation Time: 12:20 PM

IMPLEMENTATION PLAN FOR THE GEOSCIENCE INFORMATION NETWORK (GIN)


ALLISON, M. Lee (1), GUNDERSEN, Linda C. (2), RICHARD, Stephen M. (1) and DICKINSON, Tamara L. (3)
(1) Arizona Geological Survey, 416 W. Congress, #100, Tucson, AZ 85701-1381
(2) U.S. Geological Survey, MS 911 National Center, Reston, VA 20192
(3) U.S. Geological Survey, 911 National Center, 12201 Sunrise Valley Dr, Reston, VA 20192
lee.allison@azgs.az.gov

Rationale for a Geoscience Information Network

Many of the challenges to creating an Earth science cyberinfrastructure are organizational and cultural rather than technical. Recent workshops have focused on how to achieve cooperation, integration, and community governance of a geoinformatics system. The critical stumbling blocks to creating a wide-reaching geoinformatics component of the cyberinfrastructure for the sciences are as follows: agreement on common standards and protocols; engagement of the vast number of distributed data resources; practices for recognizing and respecting intellectual property; a simple data and resource discovery system (distributed integrated catalogues); mechanisms to encourage the development of web service tools for analysis; and business models for the continuing maintenance and evolution of information resources.

Geoscience Information Network (GIN)

The Association of American State Geologists (AASG) and the U.S. Geological Survey (USGS) agreed in 2007 that "the nation's geological surveys develop a national geoscience information framework that is distributed, interoperable, uses open source standards and common protocols, respects and acknowledges data ownership, fosters communities of practice to grow, and develops new web services and clients" (Allison et al., 2008). The AASG and USGS subsequently formed an interagency Steering Committee to pursue the design and implementation of the Geoscience Information Network (GIN). The national GIN concept involves four modular components:

1. Agreement on open-source standards and common protocols through the use of Open Geospatial Consortium (OGC) standards.

2. A data exchange model that will initially use the geoscience mark-up language GeoSciML (CGI IWG, in press; Cox and Richard, 2006), an application schema of the OGC Geography Markup Language (GML).

3. Prototype data discovery tools or catalogues: the National Data Catalogue (NDC), being developed under the USGS National Geological and Geophysical Data Preservation Program (NGGDPP), and the National Geologic Map Database (NGMDB).

4. Data integration tools, developed or planned by a number of independent projects, that can be applied to a variety of uses.

The "lack of a national (U.S.) civil Earth information strategy" was noted by Gail et al. (2007). They argue that the Global Earth Observation System of Systems (GEOSS) and the U.S. Group on Earth Observations fall short in addressing the nation's Earth information needs, and instead call for the U.S. to "commit to a National Earth-Information Initiative to re-evaluate the national process of collecting and using civil Earth information, including the effectiveness of governmental organizations, the relationship between government functions and private sector activities, and the ability to effectively connect scientific developments to societal uses." We believe that implementing the GIN will effectively fill this role.

National database inventory

As part of the NGGDPP, 36 U.S. state geological surveys are compiling inventories of the data and samples they maintain, as well as those held outside the surveys that are available for archiving or at risk of being lost. The USGS will compile these inventories into a preliminary assessment of the scope and size of the geologic data resources held by, or available to, geological surveys. Next year, the states will start compiling metadata catalogues for these data. These resources are the primary initial target of the GIN.
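The abstract does not describe the inventory format, so the following is only a minimal sketch of how per-state inventory entries might be rolled up into national totals by data category. The fields (`state`, `category`, `item_count`, `at_risk`), the function name `summarize`, and all figures are hypothetical and invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryEntry:
    """One line of a state survey's inventory (hypothetical fields)."""
    state: str
    category: str      # e.g. "core", "well logs", "geologic maps"
    item_count: int
    at_risk: bool      # True if the holding is flagged as at risk of being lost


def summarize(entries: list[InventoryEntry]) -> dict[str, dict[str, int]]:
    """Roll per-state entries up into totals for each data category."""
    items: dict[str, int] = defaultdict(int)
    at_risk: dict[str, int] = defaultdict(int)
    states: dict[str, set[str]] = defaultdict(set)
    for e in entries:
        items[e.category] += e.item_count
        if e.at_risk:
            at_risk[e.category] += e.item_count
        states[e.category].add(e.state)
    return {
        cat: {
            "items": items[cat],
            "at_risk_items": at_risk[cat],      # defaultdict yields 0 if none at risk
            "reporting_states": len(states[cat]),
        }
        for cat in items
    }


# Invented example figures, for illustration only.
entries = [
    InventoryEntry("State A", "core", 120, False),
    InventoryEntry("State B", "core", 30, True),
    InventoryEntry("State B", "geologic maps", 400, False),
]
summary = summarize(entries)
```

Keeping the at-risk count separate from the total lets a preliminary assessment report both the size of the resource and how much of it may need urgent preservation.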

Completing the GIN

The GIN implementation plan will enable basic network operation by establishing service definitions, standard protocols, and best practices through community workshops, and by implementing the architecture through a series of test bed systems. The first test bed will focus on services that deliver interpreted geospatial features (for example, a geologic map), implemented in the context of GeoSciML development by the IUGS-CGI Interoperability Working Group. Priorities for subsequent service development will be set by the Steering Committee; one high-priority candidate is serving observation data recorded at point locations (for example, samples, chemical analyses, boreholes). Test bed network nodes will initially be implemented and tested on a single server; after a demonstration to the community, the service will be rolled out to other nodes in the network. We will seek expressions of interest from state geological surveys and individual USGS programs to participate in each test bed.
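The abstract does not name the service interface for the first test bed. Assuming it is an OGC Web Feature Service (WFS), which is the usual way to serve GML-encoded features, a client could request GeoSciML mapped features with a key-value-pair GetFeature request like the one built below. The endpoint URL is hypothetical, and `gsml:MappedFeature` is assumed to be the feature type the test bed exposes.

```python
from urllib.parse import urlencode

# Hypothetical test bed endpoint; no real GIN node URL is given in the text.
TESTBED_WFS = "http://gin-testbed.example.org/wfs"


def build_getfeature_url(base_url: str, type_name: str, max_features: int = 10) -> str:
    """Build an OGC WFS 1.1.0 GetFeature request using key-value-pair (KVP) encoding."""
    params = {
        "SERVICE": "WFS",
        "VERSION": "1.1.0",
        "REQUEST": "GetFeature",
        "TYPENAME": type_name,             # assumed GeoSciML feature type
        "MAXFEATURES": str(max_features),  # cap the response size for testing
    }
    # urlencode percent-encodes reserved characters, e.g. ":" becomes "%3A".
    return f"{base_url}?{urlencode(params)}"


url = build_getfeature_url(TESTBED_WFS, "gsml:MappedFeature", max_features=5)
print(url)
```

Because the request depends only on the service definition, the same client code would work against any node in the network once the service is rolled out beyond the first server.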

The network will use data discovery services that are being implemented as part of the AASG-USGS NGGDPP and the USGS NGMDB. Web services will enable integration of GIN data with other applications and data sources.
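The abstract does not specify how the NGGDPP and NGMDB discovery services will be queried. As a minimal, hypothetical sketch of the "distributed integrated catalogue" idea, the Python below queries several catalogue nodes and merges their results, keeping the owning organization on every record in line with the GIN principle of acknowledging data ownership. All class names, node names, and records are illustrative; a real client would call each node's catalogue service over the network rather than search an in-memory list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogueRecord:
    title: str
    keywords: frozenset[str]
    owner: str  # organization that owns the data; kept so every hit credits its source


class CatalogueNode:
    """Hypothetical stand-in for one participant's discovery service."""

    def __init__(self, name: str, records: list[CatalogueRecord]):
        self.name = name
        self.records = records

    def search(self, keyword: str) -> list[CatalogueRecord]:
        """Case-insensitive exact match against each record's keywords."""
        kw = keyword.lower()
        return [r for r in self.records if kw in {k.lower() for k in r.keywords}]


def federated_search(nodes: list[CatalogueNode], keyword: str) -> list[CatalogueRecord]:
    """Query every node and merge hits, skipping records another node already returned."""
    seen: set[tuple[str, str]] = set()
    merged: list[CatalogueRecord] = []
    for node in nodes:
        for rec in node.search(keyword):
            key = (rec.title, rec.owner)
            if key not in seen:
                seen.add(key)
                merged.append(rec)
    return merged


# Invented records; node_c mirrors one of node_a's records, as an
# aggregating catalogue might after harvesting it.
node_a = CatalogueNode("State Survey A", [
    CatalogueRecord("Borehole logs", frozenset({"borehole", "well"}), "State Survey A"),
    CatalogueRecord("Geologic map", frozenset({"geologic map"}), "State Survey A"),
])
node_b = CatalogueNode("State Survey B", [
    CatalogueRecord("Core samples", frozenset({"borehole", "sample"}), "State Survey B"),
])
node_c = CatalogueNode("Aggregating catalogue", [node_a.records[0]])

hits = federated_search([node_a, node_b, node_c], "Borehole")
```

Deduplicating on title and owner is a simplifying assumption; a production catalogue would more likely rely on a persistent record identifier.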

Sustainability

Like the Internet, a successful information network will reach a tipping point at which users and providers see it as so critical to their basic functions that populating and maintaining it becomes a necessary cost of doing business. Few organizations are mandated to maintain a web site, yet most realize that without one they essentially do not exist in today's environment. We are quickly moving toward a similar situation for sharing data in an interoperable manner.

Participants in the AASG-USGS workshop acknowledged the need to recognize the provision and use of interoperable, web-enabled information resources as part of their mission, and that the value of the GIN should be compelling enough for them to support network maintenance and development, just as they currently support their web sites. Once the GIN framework is built and the test beds have been successfully demonstrated, we expect that other data providers and users will find compelling uses for the network across a wide variety of specific tasks, which will help fund full implementation and expansion of the GIN. We also expect each network participant to include the costs of expanding its contributions to the GIN in its base operating costs and grant proposals, in the same way that web site activities are funded.

Education and training

We plan a "Circuit Rider" approach in which dedicated GIN staff provide potential network participants with technical training, or carry out the technical work themselves, by "riding the circuit" among them for short periods. The Circuit Riders' services will be free but will need to be prioritized by the Steering Committee. Our goal is to give each geological survey and USGS program the ability to write GeoSciML protocol "wrappers" that translate their data sets, and to guide them on the server configurations needed for those data sets to be discoverable by GIN users. For surveys or programs without the technical expertise to handle these tasks, the Circuit Rider would carry them out on site or remotely as required. Various online services can provide a virtual environment in which Circuit Riders work interactively, in real time, with network participants, including shared access to computers or servers while writing code or tutoring participants in code development.
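To illustrate what a "wrapper" does, the hypothetical sketch below translates one row from a survey's in-house map-unit table into a simplified GeoSciML-style XML element. The input column names, the `gsml` namespace URI, and the flat element layout are assumptions for illustration only: real GeoSciML does not attach lithology directly to a MappedFeature, and a production wrapper would follow the published GeoSciML and GML schemas (usually via server software rather than hand-built XML).

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: placeholder namespace for this sketch; a real wrapper would use the
# namespace URI and element structure published with the GeoSciML schema.
GSML_NS = "urn:example:geosciml-sketch"
GML_NS = "http://www.opengis.net/gml"  # GML 3.1 namespace

ET.register_namespace("gsml", GSML_NS)
ET.register_namespace("gml", GML_NS)


def wrap_map_unit(row: dict) -> str:
    """Translate one in-house table row (hypothetical columns) into a simplified feature."""
    feature = ET.Element(
        f"{{{GSML_NS}}}MappedFeature",
        {f"{{{GML_NS}}}id": f"mf.{row['unit_code']}"},  # prefix keeps the id from starting with a digit
    )
    ET.SubElement(feature, f"{{{GML_NS}}}name").text = row["unit_name"]
    # Simplification: a flat lithology child stands in for GeoSciML's nested
    # GeologicUnit composition, for illustration only.
    ET.SubElement(feature, f"{{{GSML_NS}}}lithology").text = row["lithology"]
    return ET.tostring(feature, encoding="unicode")


row = {"unit_code": "Tb", "unit_name": "Tertiary basalt (hypothetical)", "lithology": "basalt"}
xml_text = wrap_map_unit(row)
print(xml_text)
```

The point of the design is that the survey's internal tables stay unchanged; only the wrapper needs to know both the local column names and the shared exchange format.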

A Help Desk will provide no-cost remote assistance to providers and users. The goal is to serve not only the initial survey data providers but also other organizations that want to be early adopters of the GIN.

Mechanisms for change and adaptation in technologies

The challenge in creating a dynamic, flexible, community-based network is to define and maintain standards sufficient to make the network effective and reliable while keeping it open to new developments. The GIN will be defined by collections of service definitions, interchange formats, and vocabularies that are established, to the degree possible, independently of any particular hardware, operating system, or lower-level network protocol. Adopting a new technology will then require only re-implementing network elements in the new environment, ideally with no change to any network service definition or protocol. The architecture also allows different user groups to use different conventions.
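To illustrate the principle that service definitions stay fixed while implementations change, the hypothetical sketch below defines a feature-service contract as a Python `Protocol` and provides two interchangeable back ends: one holding features in memory and one using an in-memory SQLite database. The interface name, method signature, and field names are assumptions, not a GIN specification; in the real network the contract would be an OGC service definition and interchange format rather than a Python class.

```python
import sqlite3
from typing import Protocol


class FeatureService(Protocol):
    """Hypothetical service definition: the only contract clients depend on."""

    def get_features(self, type_name: str, max_features: int) -> list[dict]:
        ...


class InMemoryFeatureService:
    """One implementation: features held in a Python list (e.g., an early test bed node)."""

    def __init__(self, rows: list[dict]):
        self._rows = rows

    def get_features(self, type_name: str, max_features: int) -> list[dict]:
        matches = [r for r in self._rows if r["type"] == type_name]
        return matches[:max_features]


class SqliteFeatureService:
    """A re-implementation on different technology (SQL) behind the same contract."""

    def __init__(self, rows: list[dict]):
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE features (id TEXT, type TEXT, name TEXT)")
        self._db.executemany("INSERT INTO features VALUES (:id, :type, :name)", rows)

    def get_features(self, type_name: str, max_features: int) -> list[dict]:
        cur = self._db.execute(
            "SELECT id, type, name FROM features WHERE type = ? ORDER BY rowid LIMIT ?",
            (type_name, max_features),
        )
        return [dict(zip(("id", "type", "name"), row)) for row in cur]


def client(service: FeatureService) -> list[str]:
    """A client written against the service definition, unaware of the back end."""
    return [f["name"] for f in service.get_features("MappedFeature", 2)]


# Invented sample features.
rows = [
    {"id": "f1", "type": "MappedFeature", "name": "unit A"},
    {"id": "f2", "type": "Borehole", "name": "hole 1"},
    {"id": "f3", "type": "MappedFeature", "name": "unit B"},
    {"id": "f4", "type": "MappedFeature", "name": "unit C"},
]
names_memory = client(InMemoryFeatureService(rows))
names_sqlite = client(SqliteFeatureService(rows))
```

Because the client depends only on the contract, swapping the back end changes nothing on the client side, which is the property the GIN architecture aims for at network scale.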

Acknowledgements

This project is supported by the NSF, award 0723437 to the AASG, and by the USGS NGMDB, through support of S.M. Richard.