THE CULTURAL AND SOCIAL CHALLENGES OF DEVELOPING GEOINFORMATICS: INSIGHTS FROM SOCIAL, DOMAIN, AND INFORMATION SCIENCES
Insights gained by historians and social scientists from the analyses of other kinds of infrastructure such as railroads, telephony, and the Internet can help guide Geoinformatics practitioners to more effectively advance the growth of Geoinformatics and cyberinfrastructure in general. The recently released workshop report History & Theory of Infrastructure: Lessons for New Scientific Cyberinfrastructures (Edwards et al. 2007) explains the dynamics, tensions, design challenges, and navigation strategies that emerge as common patterns and practices of infrastructural development. The report emphasizes the relevance of social and organizational factors in infrastructure development, and concludes that robust cyberinfrastructure will develop only when social, organizational, and cultural issues are resolved in tandem with the creation of technology-based services. The workshop recommends strategic collaborations among social, domain, and information scientists to assist the design of effective navigation strategies that will help realize the vision of cyberinfrastructure. A primary target of these collaborations should be the study of existing cyberinfrastructure projects to reveal key factors in success and failure.
Here we present initial insights from analyzing the development of successful Geoinformatics systems for geochemistry. Geochemistry is a discipline characterized by a culture of independent research in the form of small- to medium-scale projects, in which data are acquired by human observers rather than by sensors, often through idiosyncratic data collection practices and in idiosyncratic formats. Due to the large personal effort involved in generating the data, geochemical data are considered private intellectual property, and are shared only through publications in the scientific literature that guarantee the appropriate credit for data authors. This practice has led to a wide dispersion of data in the literature, making it difficult for the broad Geoscience community to access and efficiently use the full range of available data. Data publications frequently lack the contextual information about the complex processes of data gathering that other data users need in order to interpret the data.
In the mid-1990s, domain scientists in the US and in Europe independently recognized the need for more efficient access to data to support new research endeavors (e.g. the NSF-funded RIDGE program or the Geochemical Earth Reference Model initiative) and the potential of emerging technologies such as relational databases and the World Wide Web, and started to develop geochemical databases (PetDB, GEOROC, NAVDAT, EarthRef) that were publicly accessible on the internet. These database projects were of limited scope (PetDB focused on mid-ocean ridge basalts and abyssal peridotites; GEOROC initially focused on ocean-island basalts), motivated by the scientists' personal research agendas. They were rapidly embraced by relevant parts of the community because they provided substantial benefits to researchers and educators, who no longer had to expend significant effort to produce their own data compilations, laboriously typing data from the literature into spreadsheets. The new online databases matched the scientific working environment and workflow, allowing researchers to easily access the data and providing tools to integrate data from hundreds of publications into customized datasets within minutes. Since the databases went online, several hundred scientific articles have cited these databases as the source for datasets used to create or test new hypotheses, providing evidence for their utility and success.
The development of the various databases can be assigned to the 'System Building Phase' in infrastructure development, which is characterized by the successful design of technology-based services. Even though systems were not yet implemented in a sustainable manner, this phase was critical to provide a proof-of-concept to the community that broadened support for an advanced data infrastructure in geochemistry, and increased awareness within the community about deficiencies in the data culture such as the inconsistent and incomplete reporting of data quality information in publications.
The database projects next moved into a phase of 'System Growth & Stabilization', during which the systems were migrated to more professional and sustainable IT environments, with expert teams that supported development and operations. The community increasingly accepted the systems as part of their research infrastructure. Finally, the projects entered the 'Networking & Consolidation Phase' when they founded the EarthChem consortium with the objective of better linking and integrating the independent data collections, nurturing synergies among projects, minimizing duplication of efforts, and sharing tools and approaches. Tensions regarding ownership, control, and design approaches that arose during the Networking Phase were outweighed by the substantial benefits of the collaboration, such as the broader impact of technical or organizational developments including standards and policies, and ultimately led to a more stable and well-considered implementation. The EarthChem network is now expanding with new partners, and has attained a leadership role in the field, advancing a culture change in the geochemistry community and working with other geoinformatics projects, societies, editors, and the science community at large toward standards for data sharing and data reporting.
According to our analyses, the following factors have been key for the success of the geochemical databases: (a) The initial proof-of-concept systems did not rely on contributions from the community. Investigators only experienced the benefits of the systems and thus more readily accepted and supported them. (b) The systems offered capabilities that did not exist before, and that an individual could not achieve. (c) Early collaboration among the database systems led to compatible data models and metadata schemes, increasing the impact of the more widely applicable system designs. (d) The teams provided the necessary organizational and 'marketing' components to successfully advance the projects (the 'wizard - maestro - champion' combination noted by historians as common in system-building teams).
Based on the lessons learned from the geochemistry systems, we see three roles that projects need to fulfill in order to successfully advance and implement Geoinformatics: (a) Service provider: The needs and requests of the users need to be given the highest priority. It is critical to understand that systems are operations, rather than pure research and development efforts. (b) Competent partner: Both data authors and data users need to trust the service provider, who needs to understand and respond to domain science concerns. (c) Team player: In order to advance the infrastructure development via networking, projects have to be willing to collaborate, share expertise and experiences, and acknowledge others' achievements.
Community concerns regarding intellectual property (credit to data authors) and the impact of geoinformatics on core science funding, as well as the broad implementation of new practices and procedures (e.g., unfunded mandates of metadata generation), still represent major challenges that need to be overcome.
Paul N. Edwards, Steven J. Jackson, Geoffrey C. Bowker, and Cory P. Knobel. "Understanding Infrastructure: Dynamics, Tensions, and Design." Report of a Workshop on History and Theory of Infrastructure: Lessons for New Scientific Cyberinfrastructures. January 2007. http://www.si.umich.edu/InfrastructureWorkshop/