Paper No. 5
Presentation Time: 2:50 PM


LEHNERT, Kerstin A., Lamont-Doherty Earth Observatory, Columbia University, 61 Route 9W, Palisades, NY 10964 and WALKER, J. Douglas, Department of Geology, University of Kansas, Lawrence, KS 66045

Sample-based observations such as geochemical and geochronological data are part of the “Long Tail of Science Data” as described by Heidorn (DOI: 10.1353/lib.0.0036). Heidorn organizes science projects along an axis from large to small, with very large projects supporting dozens or more scientists on the left side of the axis, and smaller projects sorted by decreasing size trailing off to the right. The area under the right side of the curve is the ‘long tail of science data’, where the majority of scientists produce many small and heterogeneous datasets that are poorly curated, not shared, and often lost, even though this long tail holds greater potential for innovative and transformative science.

Sample-based observations are part of the long tail. They usually come as small datasets, acquired by idiosyncratic data collection practices and organized in customized formats. These data are shared primarily through publication in the scientific literature, but data in publications are often incomplete or presented only in diagrams, not in tabular form. Metadata about data quality, analytical procedures, and samples, which are critical for re-use of the data, are often poorly documented. Many data never leave the ‘darkness’ of the investigator’s hard drive or desk drawer. Bringing these dark data into the light is an essential component of building comprehensive Geoscience cyberinfrastructure for the next decade.

New data services and tools are being developed by the Geoinformatics for Geochemistry Program as part of the Integrated Earth Data Applications (IEDA) data facility to support the management of sample-based data and to offer incentives to investigators to share their data. Tools and services that support the workflow of geochemical data management, from data acquisition and metadata capture in the lab, through data reduction, analysis, and visualization, to data publication and submission to repositories in compliance with funding agency policies, can advance preservation of and access to geochemical, geochronological, and other sample-based data across the Geosciences.

  • GSA2011_LongTail_final.pptx (5.5 MB)