GSA Annual Meeting in Indianapolis, Indiana, USA - 2018

Paper No. 3-3
Presentation Time: 8:35 AM

PROMOTING FAIR DATA PRACTICES FOR THE GEOSCIENCES WITH JUPYTER NOTEBOOKS


RUSSELL, Kevin P., Indiana Geological and Water Survey, Indiana University, 611 North Walnut Grove, Bloomington, IN 47405-2208

Adoption of Jupyter Notebook technology has grown dramatically and now extends to disciplines outside of computer and data science, including the geosciences. Notebooks are a powerful tool that enables users to develop and share documents containing live code, HTML, images, explanatory text, dynamic visualizations, and equations. Together these elements provide guidance and context for communicating ideas in development as well as polished workflows and visualizations.

The FAIR data principles aim to make data findable, accessible, interoperable, and reusable. Jupyter Notebooks support these aims by documenting the steps researchers perform when cleaning, wrangling, and processing their data, which reduces the incidence of dark data and hidden methods. Notebooks give users an open mechanism in which one or more easily shared documents contain the full analytical methodology, connections to data sources, visualizations, and descriptive text to inform interpretation of those data.

Notebooks also allow for the fast prototyping of data workflows. The non-linear, independent, cell-based operation allows a user to quickly test a specific step without re-executing the entire workflow or script. Furthermore, notebooks open the vast Python ecosystem to users. For example, the Python Data Analysis Library (Pandas) provides high-performance, straightforward data structures that allow large datasets to be processed quickly. Finally, Jupyter Notebooks are becoming integrated with an increasing number of platforms, including ArcGIS, and with custom libraries built by organizations or individuals and shared as open source software distributions.

This presentation will introduce Jupyter Notebook technologies to new users through a demonstration of their use in a geoscience application. The walkthrough will cover data import from local files, from external files over a REST API from IndianaMap, and from web-scraping of a table embedded in an HTML web page. These data will be displayed with the Pandas library and visualized with a popular data visualization library. Basic geospatial operations will be performed with the ArcGIS ArcPy library and the output shared through ArcGIS Online. These same geoprocessing operations will be performed with GDAL to allow anyone, anywhere to reproduce the analysis with reliable results.
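
The three import routes named above can be sketched in a few notebook cells. This is a minimal illustration using only the Python standard library, not the code from the presentation. The sample CSV text, the JSON layout (a "features" list with "attributes" objects), the HTML table, and every field name and value are hypothetical stand-ins chosen so the example runs without network access.

```python
import csv
import io
import json
from html.parser import HTMLParser

# Hypothetical sample inputs standing in for a local file, a REST response,
# and a scraped web page. None of these values come from the presentation.
LOCAL_CSV = "well_id,depth_m\nW-1,120.5\nW-2,98.0\n"
REST_JSON = json.dumps({
    "features": [
        {"attributes": {"well_id": "W-1", "county": "Monroe"}},
        {"attributes": {"well_id": "W-2", "county": "Marion"}},
    ]
})
HTML_PAGE = """
<table>
  <tr><th>well_id</th><th>formation</th></tr>
  <tr><td>W-1</td><td>Salem Limestone</td></tr>
  <tr><td>W-2</td><td>New Albany Shale</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the cell text of every table row as a list of strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


def load_csv(text):
    """Read CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def load_rest(text):
    """Pull the attribute dict out of each feature in a JSON response."""
    return [feature["attributes"] for feature in json.loads(text)["features"]]


def load_html_table(text):
    """Scrape an HTML table, treating its first row as the header."""
    parser = TableParser()
    parser.feed(text)
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


def merge_on(key, *sources):
    """Combine records from several sources that share the same key field."""
    merged = {}
    for source in sources:
        for record in source:
            merged.setdefault(record[key], {}).update(record)
    return list(merged.values())


records = merge_on(
    "well_id",
    load_csv(LOCAL_CSV),
    load_rest(REST_JSON),
    load_html_table(HTML_PAGE),
)
for record in records:
    print(record)
```

In a notebook, each loader would typically sit in its own cell so that one source can be re-run without touching the others. With Pandas installed, `pandas.read_csv` and `pandas.read_html` cover the first and third steps directly, and `pandas.DataFrame(records)` would turn the merged list into a table for display.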