ENHANCING MULTIVARIATE STATISTICAL CHARACTERIZATION OF HYDROCHEMICAL GROUNDWATER DATA: A COMPARATIVE ANALYSIS OF CLUSTERING METHODS
We hypothesized that Multiple Discriminant Function Analysis and cross-tabulation can support quantitative decisions about which clustering technique to use for a hydrochemical dataset, how many hydrochemical facies are significant, and how data transformation, analytical errors, and outliers affect a clustering technique. The goal was to optimize the cluster-analytic characterization of a hydrochemical dataset by integrating these quantitative decisions into the cluster analysis. Quantitatively, we found that hierarchical clustering, using within-groups linkage with squared Euclidean distance, was the best method for our hydrochemical data, and that six hydrochemical facies are significant for the dataset. Inappropriate data transformation significantly affected the delineation of the hydrochemical facies (Cramér’s V < 0.8). By contrast, the hydrochemical facies delineated using all the data were regionally similar (Cramér’s V > 0.8) to those delineated after separately removing the hydrochemical outlier samples (7 %) and the samples with analytical errors (19 %).
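The similarity threshold above rests on Cramér’s V computed from a cross-tabulation of two facies assignments of the same samples. The sketch below uses the plain, bias-uncorrected form V = sqrt(χ² / (n·(k − 1))), where n is the number of shared samples and k is the smaller of the two numbers of facies. This variant is an assumption because the abstract does not say which one was used. The function name `cramers_v` and the toy labels are hypothetical and are not the authors' code.

```python
import math
from collections import Counter


def cramers_v(labels_a, labels_b):
    """Bias-uncorrected Cramér's V between two cluster labelings of the same samples.

    labels_a[i] and labels_b[i] must refer to the same sample (assumption:
    partitions built on different sample sets are first restricted to shared samples).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    n = len(labels_a)
    row_tot = Counter(labels_a)             # facies sizes in partition A
    col_tot = Counter(labels_b)             # facies sizes in partition B
    table = Counter(zip(labels_a, labels_b))  # cross-tabulation cell counts
    k = min(len(row_tot), len(col_tot))
    if k < 2:
        raise ValueError("each partition needs at least two facies")
    # Pearson chi-square statistic over every cell of the contingency table
    chi2 = 0.0
    for r, r_count in row_tot.items():
        for c, c_count in col_tot.items():
            expected = r_count * c_count / n
            chi2 += (table[(r, c)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))


# Same grouping, different facies numbers: agreement is perfect (V = 1).
v_same = cramers_v([0, 0, 1, 1, 2, 2], [1, 1, 2, 2, 0, 0])
# Unrelated groupings: no association (V = 0).
v_indep = cramers_v([0, 0, 1, 1], [0, 1, 0, 1])
print(round(v_same, 6), round(v_indep, 6))
```

Because V depends only on the contingency table, relabeling facies does not change it. Two runs of a clustering method that produce the same groups under different facies numbers therefore still score 1, which suits comparisons of clusterings made with and without outliers. Whether V > 0.8 counts as "similar" is the abstract's criterion. It is not a property of the statistic.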