GSA Connects 2022 meeting in Denver, Colorado

Paper No. 111-11
Presentation Time: 4:45 PM

AN EIGENVALUE BASED K-MEANS APPROACH TO GEOLOGIC CLUSTER ANALYSIS


VOLLMER, Frederick, Geology Department, SUNY New Paltz, 1 Hawk Drive, New Paltz, NY 12561

Geologic directional data is often multimodal, and it is desirable to partition it objectively into clusters. Examples include fracture and fault orientations, whose partitioning may help identify overprinted deformation events or coeval slip partitioning along a major fault system. Cluster analysis is a form of unsupervised learning that includes parametric mixture models and k-means models. In a k-means model each data point is assigned to one of k distributions based on a measure of distance. This nonparametric model is widely applied to geologic data.
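The assignment step of a k-means model can be sketched for undirected axes as follows. This is only an illustration, assuming unit vectors and the acute angle between axes as the distance measure; the function names `axial_distance` and `nearest_center` are hypothetical and are not taken from Orient.

```python
import math


def axial_distance(u, v):
    """Acute angle (radians) between two unit axes; v and -v are equivalent."""
    cosine = abs(sum(a * b for a, b in zip(u, v)))
    return math.acos(min(1.0, cosine))  # clamp guards against rounding above 1


def nearest_center(point, centers):
    """Index of the cluster center closest to the point (assumed distance rule)."""
    return min(range(len(centers)), key=lambda c: axial_distance(point, centers[c]))
```

Because the distance uses the absolute value of the dot product, a downward-pointing pole is assigned to the same cluster as its upward-pointing equivalent.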

The present approach maximizes the sum of an eigenvalue, or resultant vector, index over the clusters. This allows weighting, and is implemented for axis, vector, and girdle distributions. Cluster centers are seeded using random rotation matrices, from which iterative solutions converge by minimizing a cost function. The method requires selection of the number of clusters, k, which can be subjective, so methods are given for evaluating k. A scalar distance-based minimization is also implemented, in which cluster centers are found by minimizing the sum of spherical distances from the data. This alternative is less flexible, but is more stable for large k and small n. Finally, a density-based cluster analysis is implemented.
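One way such an eigenvalue-based iteration might look for axial data is sketched below. It is a minimal illustration under stated assumptions, not the Orient implementation: the index is taken to be the largest eigenvalue of each cluster's weighted orientation matrix, seeding uses random unit vectors rather than random rotation matrices, and power iteration stands in for a full eigen-decomposition to keep the sketch dependency-free. All function names are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def random_axis(rng):
    """Random unit vector from normalised Gaussian components (assumed seeding)."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(dot(v, v))
        if norm > 1e-12:
            return [c / norm for c in v]


def dominant_axis(points, weights, start, iterations=200):
    """Largest eigenvalue and eigenvector of T = sum(w * x x^T), by power iteration.

    start is assumed to be a unit vector.
    """
    t = [[sum(w * p[i] * p[j] for p, w in zip(points, weights)) for j in range(3)]
         for i in range(3)]
    v = list(start)
    for _ in range(iterations):
        tv = [dot(row, v) for row in t]
        norm = math.sqrt(dot(tv, tv))
        if norm < 1e-12:  # start is orthogonal to the data; keep the old axis
            break
        v = [c / norm for c in tv]
    eigenvalue = dot(v, [dot(row, v) for row in t])  # Rayleigh quotient
    return eigenvalue, v


def members(labels, c):
    return [i for i, lab in enumerate(labels) if lab == c]


def axial_kmeans(points, centers, weights=None, max_iter=100):
    """Alternate assignment and eigenvector updates until the labels stop changing."""
    weights = [1.0] * len(points) if weights is None else list(weights)
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assign each axis to the center with the largest |cosine|.
        new_labels = [max(range(len(centers)), key=lambda c: abs(dot(p, centers[c])))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Move each non-empty cluster's center to its dominant eigenvector.
        for c in range(len(centers)):
            idx = members(labels, c)
            if idx:
                _, centers[c] = dominant_axis([points[i] for i in idx],
                                              [weights[i] for i in idx], centers[c])
    # Score: sum of the largest eigenvalues over the non-empty clusters.
    score = 0.0
    for c in range(len(centers)):
        idx = members(labels, c)
        if idx:
            score += dominant_axis([points[i] for i in idx],
                                   [weights[i] for i in idx], centers[c])[0]
    return labels, centers, score
```

In this sketch a run would typically be repeated from several random seedings, for example `[random_axis(rng) for _ in range(k)]` with `rng = random.Random()`, keeping the result with the highest score, since any single seeding may converge to a local optimum.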

Three data sets illustrate these methods. The first is fracture data from the San Manuel copper mine, Arizona. Contouring the data on a Schmidt plot suggests k=3 clusters. The second is a set of magnetic remanence vectors, which on theoretical grounds should have k=4 clusters. The third is an example of fault slip partitioning. Slip partitioning has been suggested to occur on transpressional fault systems, such as the Denali fault, and is a well-established tenet of plate tectonic theory along extensional plate boundaries, such as the Mid-Atlantic Ridge (MAR). This example uses k=3 to distinguish three fault slip modes along the southern MAR.

These implementations are available in the free Orient software.