APPLYING DATA MINING TECHNIQUES TO CAPTURE OUTLIERS IN GROUNDWATER LEVEL DATA: A CASE STUDY OF EDWARDS AQUIFER, CENTRAL TEXAS

DEDEAUX, Lenee, Biology-Aquatic Resources, Texas State University-San Marcos, San Marcos, TX 78666, lenee.dedeaux@gmail.com

Continuous groundwater level data collected from observation wells in the karstic Edwards Aquifer, Central Texas, USA, provide large time series datasets useful for understanding temporal changes in aquifer conditions. However, because the Edwards Aquifer exhibits varying degrees of spatial dependency and spatial heterogeneity through time, observation well time series frequently do not align and contain outliers. Detecting outliers in the datasets can increase groundwater level prediction accuracy by reducing the uncertainty associated with anomalous data.

Techniques capable of identifying outliers are required to prepare complex spatial-temporal data for predictive modeling. Data mining techniques using large time-series datasets are especially useful when modeling complex systems such as the Edwards Aquifer. Data mining techniques identify distinctive patterns in time series datasets and apply those findings to new data to simulate, predict, and forecast complex systems.

Preliminary analyses confirm that the Edwards Aquifer’s complex hydrogeology and groundwater pumping introduce outlier responses into the data. To better understand outliers and their causes, this research applies hierarchical cluster analysis, with the dynamic time warping (DTW) algorithm as the similarity measure, to daily groundwater levels from observation wells under various temporal scales and hydrologic conditions. Outliers detected in the dataset will then be assessed for future model input. The DTW algorithm measures the similarity of observation wells independent of time, thereby accounting for spatial dependency and spatial heterogeneity in the Edwards Aquifer. As a result, clustering with the time-invariant DTW algorithm produces more robust cluster solutions than clustering with time-variant similarity measures such as Euclidean distance. We believe this study can reveal distinctive spatial-temporal patterns among groundwater wells, such as flow paths that change under different hydrologic conditions and can introduce outliers into the data. Future research will use the clusters identified in this hierarchical cluster analysis as inputs to an artificial neural network to predict groundwater levels in the Edwards Aquifer.
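
The contrast between DTW and Euclidean distance as clustering inputs can be illustrated with a minimal Python sketch. It is not the study's implementation: the four well names and their water-level values are synthetic, and the average-linkage rule, the two-cluster cut, and the convention of treating single-member clusters as outliers are all assumptions made for illustration, since the abstract does not specify them.

```python
import math

def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed warping moves.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def euclidean_distance(a, b):
    """Point-by-point distance; sensitive to lags between series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(series, distance, n_clusters):
    """Agglomerative clustering: merge the two closest clusters until n_clusters remain."""
    names = list(series)
    pair = {(p, q): distance(series[p], series[q])
            for p in names for q in names if p != q}
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        return sum(pair[(p, q)] for p in c1 for q in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        candidates = [(i, j) for i in range(len(clusters))
                      for j in range(i + 1, len(clusters))]
        i, j = min(candidates, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def singleton_outliers(clusters):
    """Assumed rule: a well that ends up alone in its cluster is flagged as an outlier."""
    return [c[0] for c in clusters if len(c) == 1]

# Synthetic daily water-level anomalies (hypothetical wells, arbitrary units).
# A, B, and C share one recharge pulse at different lags; D responds weakly.
wells = {
    "well_A": [0, 0, 1, 3, 5, 3, 1, 0, 0, 0, 0, 0],
    "well_B": [0, 0, 0, 0, 1, 3, 5, 3, 1, 0, 0, 0],
    "well_C": [0, 0, 0, 1, 3, 5, 3, 1, 0, 0, 0, 0],
    "well_D": [0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0],
}

dtw_clusters = average_linkage(wells, dtw_distance, n_clusters=2)
euc_clusters = average_linkage(wells, euclidean_distance, n_clusters=2)

print("DTW clusters:      ", dtw_clusters)
print("Euclidean clusters:", euc_clusters)
print("DTW outliers:      ", singleton_outliers(dtw_clusters))
print("Euclidean outliers:", singleton_outliers(euc_clusters))
```

On this toy input, DTW groups the three lagged wells together and isolates the weakly responding well, whereas Euclidean distance isolates the most lagged well instead. Real groundwater records would need gap handling and normalization before clustering, and a windowed DTW variant may be preferable for long daily series.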

Session No. 99

T109. Karst Hydrology: New Insights of Dynamic Aquifer Systems

Monday, 26 September 2016: 8:00 AM-12:00 PM

Room 504 (Colorado Convention Center)

Geological Society of America Abstracts with Programs. Vol. 48, No. 7
doi: 10.1130/abs/2016AM-287478

© Copyright 2016 The Geological Society of America (GSA), all rights reserved. Permission is hereby granted to the author(s) of this abstract to reproduce and distribute it freely, for noncommercial purposes. Permission is hereby granted to any individual scientist to download a single copy of this electronic file and reproduce up to 20 paper copies for noncommercial purposes advancing science and education, including classroom use, providing all reproductions include the complete content shown here, including the author information. All other forms of reproduction and/or transmittal are prohibited without written permission from GSA Copyright Permissions.


GSA Annual Meeting in Denver, Colorado, USA - 2016