GSA 2020 Connects Online

Paper No. 180-5
Presentation Time: 11:15 AM

NEW APPROACHES FOR HANDLING INAPPLICABLE DATA IN CHARACTER MATRICES FOR BOTH DISPARITY AND PHYLOGENETIC ANALYSIS


HOPKINS, Melanie J., Division of Paleontology (Invertebrates), American Museum of Natural History, Central Park West at 79th Street, New York, NY 10024-5192 and ST. JOHN, Katherine, Division of Invertebrate Zoology, American Museum of Natural History, Central Park West at 79th Street, New York, NY 10024-5192; Department of Computer Science, Hunter College, City University of New York, New York, NY 10065

At broad taxonomic scales, it may be difficult to meaningfully describe morphological variation using quantitative measurements. As a result, it is common for paleobiologists to translate the variation observed across taxa into semi-quantitative characters, where the different states of a character represent categories of biological expression of that character. Complex traits are often translated into multiple characters in order to capture all of the variation, and at broad taxonomic scales, some complex traits may be observed in only a subset of taxa. As a result, any character describing some aspect of such a trait will not be applicable to all of the taxa. Several strategies exist for coding and handling these “inapplicable” or “hierarchical” characters, one of the most common of which is to treat inapplicable characters as missing data. In the context of disparity analyses, this can lead to the reranking of pairwise dissimilarities, such that taxa that share more primary character states are assigned larger dissimilarity values than taxa that share fewer. In the context of phylogenetic analyses, this can lead to favoring trees where internal nodes are assigned impossible states, where the arrangement of taxa within subclades is unduly influenced by variation in distant parts of the tree, and/or where taxa that otherwise share most primary characters are grouped distantly. We introduce a family of dissimilarity metrics that proportionally weights primary characters by the set of potentially inapplicable characters that describe them. Using both synthetic and empirical datasets, we demonstrate how using this approach for disparity analyses, or as a modification to maximum parsimony for phylogenetic analyses, eliminates the problems that arise when treating inapplicable characters as missing data.
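
The reranking problem in disparity analyses can be illustrated with a small synthetic example. The sketch below is not the metric presented in the talk: the three taxa, the character layout, the function names, and the weighting parameter `alpha` are all assumptions made for illustration. It compares a mean character difference that skips inapplicable characters (the missing-data treatment) with one hypothetical way of weighting each primary character together with the secondary characters that depend on it.

```python
from typing import Dict, List, Optional, Sequence

State = Optional[int]  # None marks an inapplicable character


def missing_data_dissimilarity(a: Sequence[State], b: Sequence[State]) -> float:
    """Mean character difference, skipping any character that is
    inapplicable (None) in either taxon, i.e. treating it as missing."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        raise ValueError("no comparable characters")
    return sum(x != y for x, y in pairs) / len(pairs)


def hierarchical_dissimilarity(
    a: Sequence[State],
    b: Sequence[State],
    dependents: Dict[int, List[int]],
    alpha: float = 0.5,
) -> float:
    """Hypothetical weighting (an assumption, not the authors' metric).

    Each primary character and its dependent secondary characters form
    one unit of weight 1. If the primary states differ, the unit
    contributes 1. If they match, the unit contributes (1 - alpha)
    times the mean mismatch over the comparable secondary characters.
    Primary characters are assumed to be applicable in every taxon.
    """
    total = 0.0
    for primary, secondaries in dependents.items():
        if a[primary] != b[primary]:
            total += 1.0
            continue
        pairs = [
            (a[s], b[s])
            for s in secondaries
            if a[s] is not None and b[s] is not None
        ]
        if pairs:
            mismatch = sum(x != y for x, y in pairs) / len(pairs)
            total += (1.0 - alpha) * mismatch
    return total / len(dependents)


# Synthetic character layout: index 0 is a primary character
# (structure absent = 0, present = 1), indices 1-3 describe that
# structure, and index 4 is an unrelated primary character.
DEPENDENTS = {0: [1, 2, 3], 4: []}

TAXON_A = [1, 0, 0, 0, 0]
TAXON_B = [1, 1, 1, 1, 0]           # same primary states as A
TAXON_C = [0, None, None, None, 0]  # structure absent

md_ab = missing_data_dissimilarity(TAXON_A, TAXON_B)  # 3 of 5 differ: 0.6
md_ac = missing_data_dissimilarity(TAXON_A, TAXON_C)  # 1 of 2 differ: 0.5
hd_ab = hierarchical_dissimilarity(TAXON_A, TAXON_B, DEPENDENTS)  # 0.25
hd_ac = hierarchical_dissimilarity(TAXON_A, TAXON_C, DEPENDENTS)  # 0.5

print(md_ab, md_ac, hd_ab, hd_ac)
```

In this toy case, taxa A and B share both primary states yet are scored as more dissimilar (0.6) than A and C (0.5) under the missing-data treatment. Under the hypothetical weighting, the order is reversed (0.25 versus 0.5), because the three dependent characters can no longer outweigh the primary character they describe.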