FROM DARKNESS TO LIGHT: THE LONG TAIL OF SAMPLE-BASED DATA IN THE NEXT DECADE
Sample-based observations are part of the long tail of scientific data. They usually come as small datasets, acquired through idiosyncratic data collection practices and organized in customized formats. These data are shared primarily through publication in the scientific literature, but published data are often incomplete or presented only in diagrams rather than in tabular form. Metadata about data quality, analytical procedures, and samples, which are critical for re-use of the data, are often poorly documented. Many data never leave the ‘darkness’ of the investigator’s hard drive or desk drawer. Bringing these dark data into the light is an essential component of building comprehensive Geoscience cyberinfrastructure for the next decade.
New data services and tools are being developed by the Geoinformatics for Geochemistry Program (www.geoinfogeochem.org), as part of the Integrated Earth Data Applications (IEDA, www.iedadata.org) data facility, to support the management of sample-based data and to offer investigators incentives to share their data. These tools and services support the full workflow of geochemical data management: from data acquisition and metadata capture in the lab, through data reduction, analysis, and visualization, to data publication and submission to repositories in compliance with funding agency policies. In doing so, they can advance the preservation of, and access to, geochemical, geochronological, and other sample-based data across the Geosciences.