EARTHREF.ORG IN THE CONTEXT OF A NATIONAL CYBERINFRASTRUCTURE FOR THE GEOSCIENCES
1) The Geochemical Earth Reference Model initiative (GERM, http://EarthRef.org/GERM/) is a consortium of leading geochemists who promote the understanding of Earth chemistry on a planetary scale by sponsoring scientific workshops and general improvements in cyberinfrastructure (CI). GERM was started more than a decade ago and, shortly after its inception, played a pivotal role in the creation of the widely used GERM website and in the development of AGU's electronic journal G-cubed (Geochemistry, Geophysics, Geosystems). GERM has also served as a forum for the continued development of several independent databases, such as PetDB and GeoROC. The GERM website offers a range of resources, including compositional data, partition coefficient data and computational tools. It is also a source for geochemical publications, such as GERM conference circulars and abstract volumes.
2) The Seamount Catalog (http://EarthRef.org/SC/) is a data repository for seamount geographic information, maps and a wide range of other seamount data. It is also the central website of the recently founded Seamount Biogeoscience Network (SBN; http://EarthRef.org/SBN/), and it aims to provide CI that is equally useful for disciplines ranging from marine geophysics to microbiology to fisheries, helping to integrate seamount research across these diverse disciplines.
3) The goal of the Magnetics Information Consortium (MagIC; http://EarthRef.org/MAGIC/) is to provide a data archive and interactive visualization website for paleomagnetism and rock magnetism (see Constable abstract). MagIC is a community-driven database where users can upload, search and use newly derived scientific data as well as data from peer-reviewed legacy publications.
4) The Enduring Resources for Earth Science Education (ERESE; http://EarthRef.org/ERESE/) project is part of the National Science Digital Library (NSDL) and offers a wide range of educational resources, in particular for teaching plate tectonics.
ERESE combines professional development for teachers with the development of digital library resources, and provides web seminars in collaboration with the National Science Teachers Association (NSTA). These four initiatives serve a broad range of users, with dramatically different database and IT needs, distinct working styles, and varying levels of comfort with CI and IT. Nevertheless, combining these diverse CI components under the EarthRef.org umbrella has proved cost-effective, because it allows each effort to use common IT resources and to share software development work. This approach has also proved appealing to end users: EarthRef.org now has over 2,500 registered users, has had 188,000 unique visitors over the last three years, and expects roughly 120,000 unique visitors in 2007.
EarthRef.org has a number of key features common to the GERM, MagIC, SBN and ERESE initiatives, many of which we consider important for establishing functional CI in the geosciences.
Collaboration between IT developers and leading members of the science community: Much of EarthRef.org has been developed by, or in close collaboration with, active Earth scientists who produce and use data, advise students, and maintain active publication records and research funding. Their practical experience and their perspectives on how Earth science is done successfully are a major asset in these developments and contribute directly to the resulting EarthRef.org web environment.
Integrating IT development with science conferences: EarthRef.org organizes top-level scientific conferences at which invited speakers synthesize the state of the art in a given field. Leading scientists involved in major recent scientific developments are chosen as keynote speakers. We use these conferences to help a community hone its science vision, with an eye on the CI it will need to address and ultimately solve its grand challenges.
Robust hardware and software requirements: EarthRef.org is hosted by the San Diego Supercomputer Center. This hosting guarantees deep archiving and safeguarding of all scientific data uploaded to the EarthRef.org databases. It also provides EarthRef.org with continuous upgrades of the underlying hardware, ensuring that its databases and websites run on current technology.
Laying the foundation for large-scale cooperation to keep scientific content up to date: Much care has been taken to enable EarthRef.org users to participate in (and ultimately take over) populating the databases with both legacy and primary (new) data. To that end, we have created data formats and easy-to-use upload wizards on the EarthRef.org website.
Data uploaders have the option of a proprietary hold, under which unpublished data are not accessible to the public. In other words, uploaders can keep their data private while still viewing them in the context of the existing EarthRef.org databases. Users can also define group names and passwords to authorize limited groups of co-workers, students, teachers or even reviewers to query and visualize their private data.
Archiving original records of legacy data for user-based quality control and quality assurance: Legacy data entries in EarthRef.org are accompanied by several file types, including the original scanned images of data tables and the data files generated from those images by optical character recognition (OCR). This allows users to trace data back to the original source and to inspect the scanned images when checking whether an error was introduced during scanning or OCR.
Establishing an information continuum from cutting-edge science to basic education: EarthRef.org, and ERESE in particular, work to bridge the gap between science and general education through genuine collaboration between educators and scientists and by making the contents of science databases accessible for education and public outreach. Some of this is accomplished through content developed specifically for the educational community, but basic metadata also allow educators to screen database contents. To this end, all EarthRef.org data contents carry metadata giving the minimum expert level needed to work with a particular digital object. This scale ranges from one (the most basic, primary school level) to nine (the expert scientist level). This relative scale allows educational users to extract objects for classroom use or curriculum design by browsing at a specified expert level.
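The two access rules described above, the proprietary hold with group authorization and browsing by expert level, can be illustrated with a minimal Python sketch. The record fields (`expert_level`, `on_hold`, `groups`), the functions `visible_to` and `browse`, and the example records are all hypothetical and do not represent EarthRef.org's actual schema or code. The sketch assumes that group passwords have already been checked elsewhere, and that browsing at a given level returns every object whose minimum expert level does not exceed that level.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class DigitalObject:
    """Hypothetical archived object carrying the metadata described above."""
    title: str
    expert_level: int  # minimum level needed: 1 = primary school ... 9 = expert scientist
    owner: str
    on_hold: bool = False  # proprietary hold: hidden from the public
    groups: FrozenSet[str] = frozenset()  # groups authorized to see held data

    def __post_init__(self) -> None:
        if not 1 <= self.expert_level <= 9:
            raise ValueError("expert_level must be between 1 and 9")


def visible_to(obj: DigitalObject, user: Optional[str], user_groups: FrozenSet[str]) -> bool:
    """Public objects are visible to everyone; held objects only to the owner or authorized groups.

    Assumes group membership (and its password) was already verified; not shown here.
    """
    if not obj.on_hold:
        return True
    return user == obj.owner or bool(obj.groups & user_groups)


def browse(objects: List[DigitalObject], max_level: int,
           user: Optional[str] = None,
           user_groups: FrozenSet[str] = frozenset()) -> List[str]:
    """Titles of visible objects whose minimum expert level is at most max_level."""
    return [o.title for o in objects
            if o.expert_level <= max_level and visible_to(o, user, user_groups)]


# Made-up example records.
objects = [
    DigitalObject("Plate boundaries map", 3, "alice"),
    DigitalObject("Paleointensity compilation", 8, "bob"),
    DigitalObject("Unpublished seamount ages", 7, "carol",
                  on_hold=True, groups=frozenset({"carol-lab"})),
]

print(browse(objects, 4))  # ['Plate boundaries map']
print(browse(objects, 9))  # ['Plate boundaries map', 'Paleointensity compilation']
print(browse(objects, 9, user="dave", user_groups=frozenset({"carol-lab"})))  # all three titles
```

Treating the level as an upper bound means a teacher browsing at level 4 also sees simpler material at levels 1 to 3; an exact-match filter would be an equally plausible reading of the text.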
Providing a set of basic CI building blocks that can be used by diverse CI efforts: Several features are common to all EarthRef.org initiatives, including an online Earth sciences address book, a digital library archive and an Earth science reference database. The address book allows us to keep track of users uploading new data but, more importantly, because it is fully indexable by search engines such as Google, it allows users to find contact information for their colleagues easily. The EarthRef Digital Archive (ERDA) is a multi-purpose digital library that can archive any arbitrary digital object (ADO); it provides the basic machinery for the deep storage of ADOs and for linking data files to the GERM, MagIC, SBN and ERESE portals. The EarthRef.org Reference Database (ERRD) contains close to 100,000 references from Earth science publications, provided directly by the publishers. This resource allows us to link all user-provided data reliably to the original publications and to the publishers' websites that host them.
Geospatial referencing through Google Maps: Geospatial parameters and data type are among the highest-rated search parameters for geoscience data. The Google Maps interface offers an intuitive representation of any geographically referenced parameter. We have begun developing a Google Maps interface for seamounts that combines the multibeam and satellite altimetry data stored in EarthRef.org. This interface is equally useful for searching the SBN, MagIC and ERDA databases and has been implemented in each of these search portals.
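As an illustration of the kind of query a map-based search portal issues, the following minimal Python sketch returns the records that fall inside a map viewport given by its south, west, north and east edges, including viewports that straddle the 180-degree meridian. The `Seamount` record, the function names and the coordinates are hypothetical, and the sketch is not EarthRef.org's implementation; it only assumes that the map front end reports the viewport edges in decimal degrees, with longitudes in [-180, 180].

```python
from typing import List, NamedTuple


class Seamount(NamedTuple):
    """Hypothetical seamount record with a point location in decimal degrees."""
    name: str
    lat: float  # degrees north, -90..90
    lon: float  # degrees east, -180..180


def in_viewport(lat: float, lon: float,
                south: float, west: float, north: float, east: float) -> bool:
    """True if (lat, lon) lies inside the viewport bounded by the given edges."""
    if not south <= lat <= north:
        return False
    if west <= east:
        # Ordinary viewport: a single longitude interval.
        return west <= lon <= east
    # West edge lies east of the east edge: the viewport crosses the 180-degree
    # meridian, so accept longitudes on either side of it.
    return lon >= west or lon <= east


def search_viewport(records: List[Seamount],
                    south: float, west: float, north: float, east: float) -> List[str]:
    """Names of records whose location falls inside the viewport."""
    return [r.name for r in records
            if in_viewport(r.lat, r.lon, south, west, north, east)]


# Made-up records and viewports for illustration only.
records = [
    Seamount("Seamount A", 19.0, -155.0),
    Seamount("Seamount B", -10.0, 179.5),
    Seamount("Seamount C", -12.0, -179.0),
    Seamount("Seamount D", 40.0, 10.0),
]

print(search_viewport(records, 0.0, -160.0, 30.0, -150.0))  # ['Seamount A']
print(search_viewport(records, -20.0, 170.0, 0.0, -170.0))  # ['Seamount B', 'Seamount C']
```

A simple point-in-box test suffices for point features such as seamount locations; large collections or polygonal features would typically call for a spatial index in the database instead.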
EarthRef.org has been built by scientists and educators for scientists and educators, and our experience has been that much can be achieved with well-known IT components by focusing on key practical matters as well as some more visionary ones. Practical issues include making the path from data acquisition to use, reuse, modeling and visualization easy and efficient, and maintaining a close link between science and education at all levels. Visionary issues focus on anticipating future directions in education and science, in particular the integration of multidisciplinary science. This type of vision has to translate into a specific CI design with a meaningful metadata structure and ontologies that allow us to combine data in new ways. There is a general consensus that a successful CI will create an environment that advances science in profound ways, helping to achieve new levels of understanding and to address the grand challenges. Key to such a CI will be community buy-in and ownership. We argue that this ownership has to start at the very beginning of development, with the science process deeply integrated into CI development.