Rocky Mountain (63rd Annual) and Cordilleran (107th Annual) Joint Meeting (18–20 May 2011)

Paper No. 9
Presentation Time: 11:00 AM

ENHANCING MULTIVARIATE STATISTICAL CHARACTERIZATION OF HYDROCHEMICAL GROUNDWATER DATA: A COMPARATIVE ANALYSIS OF CLUSTERING METHODS


ASANTE, Joseph (1), KREAMER, David K. (1) and CROSS, Chad L. (2), (1) Department of Geoscience, University of Nevada, Las Vegas, Nevada, 4505 Maryland Parkway, Las Vegas, NV 89154/4010, (2) Epidemiology & Biostatistics Concentration, School of Public Health, University of Nevada, Las Vegas, Nevada, 4505 Maryland Parkway, Las Vegas, NV 89154/3064, asantej@unlv.nevada.edu

Many studies have characterized hydrochemical data using graphical and multivariate statistical methods. Multivariate clustering techniques and Principal Component Analysis are often preferred over graphical methods because they can handle large datasets and many parameters. However, the procedures of clustering techniques and Principal Component Analysis can be subjective. This subjectivity calls into question the significance of the hydrochemical facies delineated using clustering methods and Principal Component Analysis, prevents the full potential of hydrochemical data from being used, and reduces confidence in interpreting the hydrochemical facies to solve hydrologic problems.

We hypothesized that, using Multiple Discriminant Function Analysis and Cross-Tabulation, quantitative decisions can be made about the clustering technique to use for a hydrochemical dataset, the number of hydrochemical facies that are significant, and the effect of hydrochemical data transformation, analytical errors, and outliers on a clustering technique. The goal was to optimize the cluster-analytic characterization of a hydrochemical dataset by integrating quantitative decisions into the cluster analysis. We found quantitatively that hierarchical clustering, using within-groups linkage with squared Euclidean distance, was the best method for our hydrochemical data, and that six hydrochemical facies are significant for the dataset. Also, inappropriate data transformation significantly affected the delineation of the hydrochemical facies (Cramer’s V < 0.8). In addition, the hydrochemical facies delineated using all the data were found to be regionally similar (Cramer’s V > 0.8) to those delineated after separately removing the hydrochemical outlier samples (7%) and the samples with analytical errors (19%).
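
The cross-tabulation comparison reported above can be illustrated with a short sketch. The abstract does not state which software or exact procedure was used, so the following is only an assumed, minimal implementation of Cramér's V computed from the cross-tabulation of two facies assignments for the same samples. The function name `cramers_v` and the example labels are hypothetical; the 0.8 threshold is the one quoted in the abstract.

```python
from collections import Counter
from math import sqrt


def cramers_v(labels_a, labels_b):
    """Cramér's V between two categorical labelings of the same samples.

    Hypothetical helper for illustration: builds the cross-tabulation of
    the two labelings, computes the Pearson chi-square statistic, and
    scales it to the range [0, 1].
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same samples")
    n = len(labels_a)

    # Cross-tabulation: counts of each (label_a, label_b) pair.
    table = Counter(zip(labels_a, labels_b))
    row_totals = Counter(labels_a)
    col_totals = Counter(labels_b)

    k = min(len(row_totals), len(col_totals))
    if k < 2:
        raise ValueError("each labeling needs at least two categories")

    # Pearson chi-square: sum over all cells of (observed - expected)^2 / expected.
    chi2 = 0.0
    for r, row_n in row_totals.items():
        for c, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = table.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected

    return sqrt(chi2 / (n * (k - 1)))


# Made-up facies assignments for six samples under two clustering runs.
run_all_data = [1, 1, 2, 2, 3, 3]
run_no_outliers = ["A", "A", "B", "B", "C", "C"]
v = cramers_v(run_all_data, run_no_outliers)
print("similar" if v > 0.8 else "different", round(v, 3))
```

Because Cramér's V depends only on how the samples are cross-classified, it is unaffected by how each run happens to number its clusters, which makes it convenient for comparing partitions produced by different methods or data treatments.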