GSA Annual Meeting in Phoenix, Arizona, USA - 2019

Paper No. 177-9
Presentation Time: 10:15 AM

DESIGN OF AN OBSERVATORY DATA SYSTEM AND OPEN SOURCE DATA SHARING WITH ACTIVE CURATION AND MACHINE LEARNING


KUMAR, Praveen1, MARINI, Luigi2, PITCEL, Michelle2, KEEFER, Laura3 and MCHENRY, Kenton2, (1)Civil and Environmental Engineering, 205 North Mathews Avenue # 3215 DCL, University of Illinois, Urbana, IL 61801, (2)NCSA, University of Illinois at Urbana-Champaign, 1205 W. Clark St, Room 1008, MC-257, Urbana, IL 61801, (3)University of Illinois Urbana-Champaign, Illinois State Water Survey, 2204 Griffith Dr., Champaign, IL 61820-7463

Over the past six years, the IMLCZO (Intensively Managed Landscape Critical Zone Observatory) has contributed to and benefited from the development and deployment of a system based on the Clowder platform that supports heterogeneous scientific data. Scientific data is often highly heterogeneous: within geoscience alone, it spans time series, geospatial data, remote sensing, geophysical images, geophysical and geochemical laboratory analyses, experimental outcomes, and other images, to name a few. For such data to be usable by others, large collections spanning these types, some of them unstructured, must be annotated and/or processed into more readily usable products. When datasets are large, as is increasingly the case, computational capabilities located near the data are also often essential to usability, sparing the user from having to download the data or find a suitably powerful local computational resource to run analyses. Clowder, an open source data management framework built on the notion of Active Curation, provides machine learning and other analysis-based tools to facilitate the annotation of large, broad, and unstructured datasets. Being customizable from the ground up, Clowder can be deployed as needed at local institutions for specific scientific needs or remotely on cloud/HPC resources, extended to meet new data visualization and analysis needs, used to run custom analyses near where the data resides, and made to interoperate with other data infrastructure components, e.g., for long-term archiving. Clowder has been leveraged to support the data sharing and processing needs of a broad range of communities spanning geoscience, biology, materials science, medicine, social science, cultural heritage, and the arts. This presentation will describe the data system designed to support scientific advancement.