TOWARDS BIG EARTH DATA ANALYTICS: THE EARTHSERVER APPROACH

ABSTRACT WITHDRAWN

, p.baumann@jacobs-university.de

Big Data in the Earth sciences, with archives ranging from Terabytes to Exabytes, consist mostly of coverage data; a "coverage", according to ISO and OGC, is the digital representation of some space-time varying phenomenon. Common examples include 1-D sensor timeseries, 2-D remote sensing imagery, 3-D x/y/t image timeseries and x/y/z geology data, and 4-D x/y/z/t atmosphere and ocean data. Analytics on such data requires on-demand processing of sometimes significant complexity, such as computing the Fourier transform of satellite images. Because network bandwidth limits prohibit the transfer of such Big Data, it is indispensable to devise protocols that allow clients to task flexible and fast processing on the server.
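
The bandwidth argument can be made concrete with a back-of-the-envelope calculation. The following minimal sketch assumes an archive of 1 Petabyte and a sustained 1 Gbit/s link with no protocol overhead; both figures are illustrative assumptions and are not taken from the project.

```python
def transfer_days(size_bytes: float, link_bits_per_s: float) -> float:
    """Idealized transfer time in days, ignoring protocol overhead."""
    seconds = size_bytes * 8 / link_bits_per_s
    return seconds / 86_400  # seconds per day


PETABYTE = 10**15        # assumed archive size in bytes
GIGABIT_PER_S = 10**9    # assumed sustained link speed in bit/s

days = transfer_days(PETABYTE, GIGABIT_PER_S)
print(f"{days:.1f} days")  # prints "92.6 days"
```

Even under these idealized conditions the download takes about three months, which is why the processing is moved to the server.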

The transatlantic EarthServer initiative unites 11 partners to establish Big Earth Data Analytics. A key ingredient is flexibility for users to ask what they want, not impeded and complicated by system internals. The EarthServer answer to this is to use high-level query languages; these have proven tremendously successful on tabular and XML data, and we extend them with a central geo data structure, multi-dimensional arrays.

A second key ingredient is scalability. Ultimately, scalability can only be achieved through parallelization. In the past, code has been parallelized at compile time, usually with manual intervention. The EarthServer approach is to perform a semantic-based dynamic distribution of query fragments based on network optimization and further criteria.
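
The general idea of splitting a query into fragments, evaluating them in parallel, and merging the partial results can be sketched as follows. This is a deliberately simplified illustration and not the EarthServer mechanism: it uses a fixed row-wise partitioning on a single machine rather than a semantic, network-aware distribution, and the function names `split_rows`, `partial_sum`, and `distributed_mean` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(n_rows, n_fragments):
    """Return half-open (start, stop) row ranges covering 0..n_rows (n_rows > 0)."""
    step = -(-n_rows // n_fragments)  # ceiling division
    return [(s, min(s + step, n_rows)) for s in range(0, n_rows, step)]


def partial_sum(array, rows):
    """Evaluate one query fragment: sum and cell count of a row range."""
    start, stop = rows
    block = array[start:stop]
    return sum(sum(row) for row in block), sum(len(row) for row in block)


def distributed_mean(array, n_fragments=4):
    """Compute the mean of a 2-D array from independently evaluated fragments."""
    fragments = split_rows(len(array), n_fragments)
    with ThreadPoolExecutor(max_workers=n_fragments) as pool:
        partials = list(pool.map(lambda r: partial_sum(array, r), fragments))
    total = sum(t for t, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


# A small 10 x 10 grid holding the values 0..99.
grid = [[x + 10 * y for x in range(10)] for y in range(10)]
print(distributed_mean(grid))  # prints 49.5
```

Each fragment returns only a sum and a count, so the amount of data exchanged between workers stays small regardless of the array size.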

The EarthServer platform comprises rasdaman, an Array DBMS built for multi-dimensional raster data of any size, which is being extended with support for irregular grids and general meshes; in-situ retrieval (evaluation of database queries on existing archive structures, avoiding data import and, hence, duplication); and the aforementioned distributed query processing. Additionally, Web clients for multi-dimensional data visualization are being established. Client/server interfaces are strictly based on OGC and W3C standards, in particular the Web Coverage Processing Service (WCPS), which defines a high-level coverage query language.
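
To give an impression of what a WCPS request may look like from the client side, the sketch below builds a request URL in Python. The endpoint URL and the coverage name `HypotheticalCoverage` are placeholders, and the key-value parameters follow the WCS 2.0 ProcessCoverages convention as an assumption; an actual service may expect a different binding.

```python
from urllib.parse import urlencode

# Hypothetical service endpoint; replace with a real WCPS-capable server.
ENDPOINT = "https://example.org/rasdaman/ows"


def wcps_url(endpoint, query):
    """Build a GET URL that submits a WCPS query string to the server."""
    params = {
        "service": "WCS",
        "version": "2.0.1",
        "request": "ProcessCoverages",  # assumed WCS processing request name
        "query": query,
    }
    return endpoint + "?" + urlencode(params)


# WCPS query: average over all cells of a (hypothetical) coverage.
# The server evaluates it and returns only the scalar result.
query = "for $c in (HypotheticalCoverage) return avg($c)"

url = wcps_url(ENDPOINT, query)
print(url)
```

The client sends a few dozen bytes of query text and receives a single number, instead of downloading the coverage and averaging it locally.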

We present the EarthServer project, relate it to the current state of standardization, and demonstrate it by way of large-scale data centers and their services using rasdaman.

Session No. 139

T90. Digital Geosciences: A Framework for Data-Intensive, Multi-Disciplinary Research and Education (Posters)

Monday, 28 October 2013: 9:00 AM-6:30 PM

Hall D (Colorado Convention Center)

Geological Society of America Abstracts with Programs. Vol. 45, No. 7, p.359

© Copyright 2013 The Geological Society of America (GSA), all rights reserved. Permission is hereby granted to the author(s) of this abstract to reproduce and distribute it freely, for noncommercial purposes. Permission is hereby granted to any individual scientist to download a single copy of this electronic file and reproduce up to 20 paper copies for noncommercial purposes advancing science and education, including classroom use, providing all reproductions include the complete content shown here, including the author information. All other forms of reproduction and/or transmittal are prohibited without written permission from GSA Copyright Permissions.


The Geological Society of America 2013 GSA Annual Meeting in Denver: 125th Anniversary of GSA (27-30 October 2013) Denver, Colorado, USA
