2015 GSA Annual Meeting in Baltimore, Maryland, USA (1-4 November 2015)

Paper No. 324-5
Presentation Time: 2:30 PM

BUILDING AN AUSTRALIAN PETASCALE DATA REPOSITORY TO SUPPORT GEOSCIENCE RESEARCH AND BEYOND: IS CENTRALIZATION THE WAY FORWARD?


WYBORN, Lesley A.I. and EVANS, Benjamin J.K., National Computational Infrastructure, Australian National University, 56 Mills Road, Acton, 2600, Australia, lesley.wyborn@anu.edu.au

Changes in the global computational landscape offer the Geosciences new opportunities to undertake innovative research at scales and resolutions never before attempted. Researchers currently have access to more computational power and storage resources than at any time in the past. But the ways in which the Geosciences are organised to utilise these capabilities are not yet optimal: in particular, valuable data tend to be stored as heterogeneous files, distributed over multiple sites and sectors.

Web services showed promise in the last decade and offered the ability to connect distributed repositories online via standardised interfaces. However, this approach is under-delivering: the level of skill required to set up and maintain these services is high; applications that can read these services are not commonplace; and the user community is small. Multiple small-scale distributed repositories are also inefficient in their use of energy and scarce skills, whilst the chances of all sites having all their data services online 24/7 are remote.

A radical rethink of how data are managed is essential. The community needs to move to a new paradigm where processing power, people and tools are brought to the data, with more emphasis placed on high-quality data-as-a-service. The trends are towards fewer and larger data repositories that are collocated with HPC/cloud resources, with similar files aggregated into self-describing data arrays and cubes of regional to continental scales. Common interfaces to multiple applications allow online access to data from a variety of resources, ranging from petascale computers to clouds (public and private) to mobile devices (smart phones, tablets); skill sets can also be consolidated.

An example of a petascale data repository collocated with significant HPC/cloud resources has been built at the Australian National Computational Infrastructure (NCI). It stores over 10 PB of data collections spanning a range of fields from the geosciences, environment, climate and oceans through to astronomy, bioinformatics, and the social sciences. Researchers can either log on to the HPC infrastructure to work with these collections directly, or access the data via standardised services. The goal is to develop a trusted data platform that supports not only the Geosciences, but also the inclusion of geoscience data in multiple interdisciplinary research projects.