Paper No. 10
Presentation Time: 3:45 PM

MIXTURE-MODEL CLUSTERING OF REGIONAL GEOCHEMICAL DATA


ELLEFSEN, Karl J., U.S. Geological Survey, Box 25046, Mail Stop 964, Denver Federal Center, Denver, CO 80225, SMITH, David B., U.S. Geological Survey, MS 973, Denver Federal Center, Denver, CO 80225 and HORTON, John D., U.S. Geological Survey, Denver Federal Center, MS 973, Denver, CO 80225, ellefsen@usgs.gov

Mixture-model clustering is a statistical procedure that aids the interpretation of regional geochemical data. Because geochemical data are compositional, straightforward application of standard statistical procedures can yield erroneous results. We have therefore developed and implemented (in the R statistical programming language) a robust clustering procedure that accounts for the compositional properties of the data. First, all element concentrations are transformed with the isometric log-ratio transformation. The transformed concentrations are then used to calculate robust principal components. These components are clustered using a mixture model whose probability density functions are multivariate normal, and the conditional probability that a sample is associated with each density function is calculated. In addition, random samples are drawn from each density function and back-transformed to equivalent element concentrations.
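The transformation and back-transformation steps described above can be illustrated with a minimal sketch. It is written in Python with only the standard library for illustration; the authors' implementation is in R and is not reproduced here. The function names (`closure`, `ilr_basis`, `ilr`, `ilr_inverse`) and the particular orthonormal basis are assumptions made for this example, since the abstract does not say which basis was used.

```python
import math


def closure(parts, total=1.0):
    """Rescale positive parts so that they sum to `total`."""
    s = sum(parts)
    return [total * p / s for p in parts]


def ilr_basis(d):
    """Return D-1 orthonormal contrast vectors (each of length D).

    Assumed basis: vector i contrasts part i against the geometric
    mean of all later parts. Any orthonormal basis of the clr
    hyperplane would give a valid isometric log-ratio transform.
    """
    vectors = []
    for i in range(d - 1):
        r = d - i - 1  # number of parts after part i
        v = [0.0] * d
        v[i] = math.sqrt(r / (r + 1))
        for j in range(i + 1, d):
            v[j] = -1.0 / math.sqrt(r * (r + 1))
        vectors.append(v)
    return vectors


def ilr(parts):
    """Isometric log-ratio coordinates of a strictly positive composition."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    clr = [v - mean_log for v in logs]  # centered log-ratio
    return [sum(b * c for b, c in zip(vec, clr)) for vec in ilr_basis(len(parts))]


def ilr_inverse(coords):
    """Back-transform ilr coordinates to a composition that sums to 1."""
    d = len(coords) + 1
    basis = ilr_basis(d)
    clr = [sum(coords[i] * basis[i][j] for i in range(d - 1)) for j in range(d)]
    return closure([math.exp(v) for v in clr])


# Hypothetical three-part composition (not data from the survey).
sample = closure([10.0, 30.0, 60.0])
coords = ilr(sample)
recovered = ilr_inverse(coords)
```

A composition with D parts maps to D-1 unconstrained coordinates, which is why multivariate normal density functions can reasonably be fitted after the transformation. Zero concentrations would need separate treatment before taking logarithms; this sketch does not handle them.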

The clustering procedure is evaluated with soil geochemical data from a survey of the state of Colorado (United States of America). The data comprise 959 samples with 31 element concentrations for each sample. The chosen mixture model has 4 density functions, and the calculated conditional probabilities partition the 959 samples into 4 clusters. For each cluster, most samples are spatially close together and thus are related to specific geologic features such as surficial deposits or bedrock. The independently known geochemical properties of these geologic features are consistent with the random sample concentrations, and the order statistics for the random sample concentrations are almost identical to the corresponding order statistics for the field data (i.e., the measured concentrations for those samples with high conditional probabilities). Both results suggest that the clustering procedure is accurate. Another benefit of mixture-model clustering is that the element concentrations for each cluster are approximately statistically stationary, making them suitable for additional statistical processing such as multivariate kriging.
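The way conditional probabilities partition samples into clusters can be sketched as follows. This is a simplified illustration, not the authors' procedure: it assumes the mixture parameters are already fitted, it uses diagonal covariance matrices to stay within the Python standard library (the abstract specifies general multivariate normal density functions), and all names and parameter values are hypothetical.

```python
import math


def log_normal_diag(x, mean, var):
    """Log density of a normal distribution with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def conditional_probabilities(x, weights, means, variances):
    """Probability that sample x belongs to each mixture component.

    Computed in log space and shifted by the largest term before
    exponentiating, to avoid numerical underflow.
    """
    log_terms = [
        math.log(w) + log_normal_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(log_terms)
    scaled = [math.exp(t - top) for t in log_terms]
    total = sum(scaled)
    return [s / total for s in scaled]


def assign_cluster(probs):
    """Index of the component with the highest conditional probability."""
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical two-component model in two principal-component dimensions.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [4.0, 4.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

probs_near_first = conditional_probabilities([0.2, -0.1], weights, means, variances)
probs_midway = conditional_probabilities([2.0, 2.0], weights, means, variances)
cluster_near_first = assign_cluster(probs_near_first)
cluster_near_second = assign_cluster(
    conditional_probabilities([3.9, 4.2], weights, means, variances)
)
```

A sample midway between two equally weighted components receives equal conditional probabilities, which is why a comparison against field data may reasonably be restricted to samples with high conditional probabilities.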