GSA Connects 2021 in Portland, Oregon

Paper No. 176-5
Presentation Time: 2:40 PM

AVOIDING PITFALLS IN PCA FOR DESCRIBING TAPHONOMY: REFINING THE CHEMOSPACE APPROACH


ROSBACH, Stephanie, Department of Geological Sciences, University of Missouri, Columbia, MO 65211, SNYDER, John C., Department of Statistics, University of Missouri, Columbia, MO 65211, LAMADRID, Hector M., Department of Geological Sciences, University of Missouri--Columbia, 101 Geological Sciences Bldg, Columbia, MO 65211, HUNTLEY, John, Geological Sciences, University of Missouri, 101 Geological Sciences Building, Columbia, MO 65211 and SCHIFFBAUER, James D., Geological Sciences, University of Missouri, 101 Geological Sciences Bldg, Columbia, MO 65211

Principal Component Analysis (PCA) is an exploratory statistical tool that allows researchers to identify the directions of greatest variance within their data. Over the last decade, PCA has become widely used across paleontological subdisciplines, including morphometrics, paleoecology, and, most recently, taphonomy. In taphonomy, PCA has become a prominent method for parsing out the preservational chemistry of a fossil in a “chemospace” using various forms of spectroscopic data (Colleary et al., 2015; McCoy et al., 2019; Selly et al., 2017). Given this widening use, it is necessary to assess adjustments that could make the methodology more effective for evaluating fossil materials.

Here we discuss three scenarios that require methodological adjustments to the PCA to produce useful and interpretable results. First, proportional data, such as data produced by , requires specific transformations before applying PCA to account for the inherent non-negative and constant sum constraints in the data. Second, in the case of performing PCA before supervised learning methods (e.g., classification, clustering), samples of unknown group membership should be excluded when performing PCA to avoid including variance from the out-of-sample data and to improve model generalizability. In the particular case of classification or clustering, it is advisable to use a secondary method beyond PCA to conclude group membership. Lastly, certain spectroscopy methods, such as Raman, can introduce unwanted noise with heterogeneous geologic samples, for example, fossil samples that contain a mixture of mineral and organic materials that can produce overlapping vibrational signals. While PCA is a valid technique to characterize the nature of these spectra, there is a question of how noise can influence the resulting principal components and thus the final interpretations. Here, we test a set of known, generated spectra with varying levels of noise to evaluate the effectiveness of PCA in describing the material.
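The three scenarios can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every name and number in it is an assumption made for illustration. The abstract does not say which transformation is used for proportional data; the centered log-ratio (CLR) transform is shown here as one common choice. The Dirichlet-generated compositions, the known/unknown split, the two Gaussian bands at 1350 and 1600, and the noise levels are all hypothetical, and PCA is computed with a plain SVD in NumPy.

```python
import numpy as np

def clr(parts):
    """Centered log-ratio transform for strictly positive compositional rows."""
    logs = np.log(parts)
    return logs - logs.mean(axis=1, keepdims=True)

def fit_pca(X, n_components=2):
    """PCA by SVD of mean-centered data: returns (mean, components, variance ratios)."""
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    var = s ** 2
    return mean, vt[:n_components], (var / var.sum())[:n_components]

def project(X, mean, components):
    """Project rows of X onto components fitted elsewhere."""
    return (X - mean) @ components.T

rng = np.random.default_rng(0)

# Scenario 1: proportional data (hypothetical 4-part compositions, rows sum to 1).
# CLR removes the constant-sum constraint before PCA is applied.
comp = rng.dirichlet([4.0, 2.0, 1.0, 1.0], size=30)
comp_clr = clr(comp)

# Scenario 2: fit PCA on samples of known group membership only, then
# project the unknowns so they contribute no variance to the components.
known, unknown = comp_clr[:24], comp_clr[24:]
mean_k, pcs_k, ratio_k = fit_pca(known)
unknown_scores = project(unknown, mean_k, pcs_k)

# Scenario 3: generated two-band spectra with increasing Gaussian noise.
x = np.linspace(1000.0, 1800.0, 400)  # hypothetical wavenumber axis

def band(center, width=25.0):
    return np.exp(-0.5 * ((x - center) / width) ** 2)

band_a, band_b = band(1350.0), band(1600.0)  # arbitrary band positions
mix = rng.uniform(0.0, 1.0, size=40)         # mixing fraction per spectrum
clean = np.outer(mix, band_a) + np.outer(1.0 - mix, band_b)

# The only real variation in the clean spectra lies along band_a - band_b.
true_axis = (band_a - band_b) / np.linalg.norm(band_a - band_b)

def pc1_alignment(noise_sd):
    """|cosine| between the first PC of noisy spectra and the true mixing axis."""
    noisy = clean + rng.normal(0.0, noise_sd, clean.shape)
    _, pcs, _ = fit_pca(noisy, n_components=1)
    return abs(float(pcs[0] @ true_axis))

alignment = {sd: pc1_alignment(sd) for sd in (0.0, 0.05, 0.5)}
for sd, value in alignment.items():
    print(f"noise sd={sd}: |cos(PC1, true axis)| = {value:.3f}")
```

In this sketch, an alignment near 1 means the first principal component still tracks the generated mixing trend, and lower values indicate that noise is steering the component. Any group assignment for the projected unknowns would still need a secondary method beyond PCA, as the abstract advises.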