GSA Annual Meeting in Denver, Colorado, USA - 2016

Paper No. 99-3
Presentation Time: 8:35 AM

APPLYING DATA MINING TECHNIQUES TO CAPTURE OUTLIERS IN GROUNDWATER LEVEL DATA: A CASE STUDY OF EDWARDS AQUIFER, CENTRAL TEXAS


DEDEAUX, Lenee, Biology-Aquatic Resources, Texas State University-San Marcos, San Marcos, TX 78666, lenee.dedeaux@gmail.com

Continuous groundwater level data collected from observation wells in the karstic Edwards Aquifer, Central Texas, USA, provide large time series datasets useful for understanding temporal changes in aquifer conditions. However, because the Edwards Aquifer exhibits varying degrees of spatial dependency and spatial heterogeneity through time, observation well time series frequently do not align and contain outliers. Detecting outliers in the datasets can increase groundwater level prediction accuracy by reducing the uncertainty associated with anomalous data.

Techniques capable of identifying outliers are required to prepare complex spatial-temporal data for predictive modeling. Data mining techniques using large time-series datasets are especially useful when modeling complex systems such as the Edwards Aquifer. Data mining techniques identify distinctive patterns in time series datasets and apply those findings to new data to simulate, predict, and forecast complex systems.

Preliminary analyses confirm that the Edwards Aquifer’s complex hydrogeology and groundwater pumping introduce outlier responses into the data. To better understand outliers and their causes, this research applies hierarchical cluster analysis, with the dynamic time warping (DTW) algorithm as the similarity measure, to daily groundwater levels from observation wells under various temporal scales and hydrologic conditions. The detected outliers will then be assessed for use as future model input. The DTW algorithm measures the similarity of observation well time series independent of their alignment in time, thereby accounting for spatial dependency and spatial heterogeneity in the Edwards Aquifer. As a result, clustering using the time-invariant DTW algorithm produces more robust cluster solutions than time-variant similarity measures such as Euclidean distance. We believe the outcome of this study can reveal distinctive spatial-temporal patterns between groundwater wells, such as changing flow paths under different hydrologic conditions, which can introduce outliers in the data. Future research will use clusters identified in this hierarchical cluster analysis as inputs to an artificial neural network to predict groundwater levels in the Edwards Aquifer.
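
The clustering step described above can be illustrated with a minimal sketch. It is not the study's implementation. The well names, the synthetic ten-day water-level series, the absolute-difference DTW cost, the choice of average linkage, and the rule that a single-member cluster marks a candidate outlier are all assumptions made for illustration, since the abstract does not specify them. The sketch shows why a lagged response that Euclidean distance treats as dissimilar can have a DTW distance of zero.

```python
import math

def dtw_distance(a, b):
    """Textbook dynamic time warping distance with absolute-difference cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed warping moves.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def euclidean_distance(a, b):
    """Point-by-point (time-aligned) distance, shown for comparison."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage_clusters(series, n_clusters, dist=dtw_distance):
    """Agglomerative clustering: merge the two clusters with the smallest
    mean pairwise distance until n_clusters remain."""
    names = list(series)
    d = {(p, q): dist(series[p], series[q]) for p in names for q in names}
    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(p, q) for p in clusters[i] for q in clusters[j]]
                link = sum(d[p, q] for p, q in pairs) / len(pairs)
                if best is None or link < best[0]:
                    best = (link, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Synthetic daily water-level anomalies (hypothetical values, not study data).
wells = {
    "well_A": [0, 0, 1, 3, 1, 0, 0, 0, 0, 0],      # recharge pulse
    "well_B": [0, 0, 0, 0, 1, 3, 1, 0, 0, 0],      # same pulse, two days later
    "well_C": [2.0 - 0.2 * t for t in range(10)],  # steady decline
}

clusters = average_linkage_clusters(wells, n_clusters=2)
# Assumption: a well left alone in its own cluster is a candidate outlier.
candidate_outliers = [c[0] for c in clusters if len(c) == 1]

print("DTW A-B:", dtw_distance(wells["well_A"], wells["well_B"]))
print("Euclidean A-B:", euclidean_distance(wells["well_A"], wells["well_B"]))
print("Clusters:", clusters)
print("Candidate outliers:", candidate_outliers)
```

In practice, a library routine such as SciPy's hierarchical clustering applied to a precomputed DTW distance matrix would likely replace the hand-written loop, and the number of clusters would be chosen from the dendrogram rather than fixed in advance.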