2008 Geoinformatics Conference (11-13 June 2008)
Paper No. 5-4
Presentation Time: 10:40 AM-11:00 AM

LONG-TERM AVAILABILITY OF GEOSCIENCE DATA

KLUMP, Jens, GeoForschungsZentrum-Potsdam (DRZ), Telegrafenberg, Potsdam 14473 Germany, jklump@gfz-potsdam.de

In the last decade, research in the geological sciences has produced vast amounts of new data. In some cases it is the enormous volume of data that poses a technical challenge; in other cases it is their semantic complexity. Whatever the volume and format of the data may be, geoscience data are characterized by their origin in a heterogeneous and dynamic research environment. In contrast to a business or administrative context, scientific workflows are characterized by ad hoc changes that become necessary through the incorporation of new results into experimental working hypotheses (Barga and Gannon 2007). For the individual scientist, data curation is not the focus of scientific work, and there are few incentives for scientists to make data accessible for re-use or re-purposing. Only a few science funding agencies ask grant recipients to make their data accessible, and even fewer journals make data access a prerequisite for publication. Furthermore, the roles and responsibilities in long-term curation of scientific data still need to be resolved (Lyon 2007). This situation leads to deficits in data management that put large portions of our scientific heritage at risk of loss. Moreover, the inaccessibility of data might have a negative impact on the quality of research (Nature Editorial 2006). Achieving sustainable long-term accessibility and re-usability of research data requires a combination of organizational and technical measures. On the organizational side, data curation needs to become an integral part of good scientific practice; at the same time, geoinformatics has to develop tools that facilitate the tasks needed for efficient and sustainable data curation.

A key to devising effective and sustainable strategies for the long-term preservation and accessibility of research data is to define “Levels of Persistence” in the data curation process and its supporting technical architecture. The idea is to distinguish the domain of active research, where curation is the responsibility of the scientists, from the long-term preservation domain, where responsibility and expertise lie with the “memory institutions” (library, data center). These domains are not discrete but rather form the end-points of a “curation continuum” (Treloar, Groenewegen and Harboe-Ree 2007). In most cases, the existing disciplinary data repositories are not integrated into the scientific workflow, so only a small proportion of the data is archived in them. This break in the workflow is also reflected in the problems observed in the generation and curation of metadata. More research is needed to determine which kinds of metadata are needed at which level of data curation (Treloar, Groenewegen and Harboe-Ree 2007), and how metadata can be generated automatically in the data curation processes (Robertson 2006). The heterogeneity of data in the geological sciences requires special attention to data and file formats. Not all formats that are popular among scientists are suitable for long-term preservation (Lormant et al. 2005). This also means that preservation metadata need to encode more about the data format than just its MIME type. Because Uniform Resource Locators (URLs) are transient, they are not suitable as a means of referencing data for the purpose of citation. The shortcomings of URLs are overcome by the use of persistent identifiers, such as Digital Object Identifiers (DOI) and Uniform Resource Names (URN) (Altman and King 2007).

Barga, R. and Gannon, D.B., 2007. Scientific versus business workflows. In I. J. Taylor et al. (Eds.), Workflows for e-Science. London, UK: Springer-Verlag, p. 9-16.
Klump, J., 2008. Anforderungen von e-Science und Grid-Technologie an die Archivierung wissenschaftlicher Daten. Göttingen, Germany: Kompetenznetzwerk Langzeitarchivierung (nestor).
Lormant, N. et al., 2005. How to Evaluate the Ability of a File Format to Ensure Long-Term Preservation for Digital Information? In Ensuring Long-term Preservation and Adding Value to Scientific and Technical data (PV 2005). Edinburgh, UK, p. 11. Available at: http://www.ukoln.ac.uk/events/pv-2005/pv-2005-final-papers/003.pdf.
Lyon, L., 2007. Dealing with Data: Roles, Rights, Responsibilities and Relationships. UKOLN, Bath, UK.
Nature Editorial, 2006. A fair share. Nature, v. 444, no. 7120, p. 653-654.
Robertson, R.J., 2006. Evaluation of metadata workflows for the Glasgow ePrints and DSpace services. University of Strathclyde, Glasgow, UK.
Treloar, A., Groenewegen, D. and Harboe-Ree, C., 2007. The Data Curation Continuum - Managing Data Objects in Institutional Repositories. D-Lib Magazine, v. 13, no. 9/10.

Session No. 5
Geoinformatics Oral Session III
GeoForschungsZentrum Potsdam, Building H: Main Lecture Theater
9:00 AM-4:20 PM, Friday, 13 June 2008

© Copyright 2008 The Geological Society of America (GSA), all rights reserved. Permission is hereby granted to the author(s) of this abstract to reproduce and distribute it freely, for noncommercial purposes. Permission is hereby granted to any individual scientist to download a single copy of this electronic file and reproduce up to 20 paper copies for noncommercial purposes advancing science and education, including classroom use, providing all reproductions include the complete content shown here, including the author information. All other forms of reproduction and/or transmittal are prohibited without written permission from GSA Copyright Permissions.