EFFECTS OF RANDOMLY AND NON-RANDOMLY DISTRIBUTED MISSING DATA IN SUPPORT VALUES OF BAYESIAN AND PARSIMONY ANALYSIS (Invited Presentation)

POL, Diego and HOLLEY, Alfredo, Museo Paleontológico Egidio Feruglio, CONICET, Av Fontana 140, Trelew, 9100, Argentina, dpol@mef.org.ar

Paleontological datasets are characterized by copious amounts of missing data, and the problematic effects of these data gaps on phylogenetic analyses have long been noted. In parsimony analyses, recent advances in numerical methods and their efficient implementation in phylogenetic software now allow numerous characters or taxa with large amounts of missing entries to be incorporated without creating problems related to large numbers of equally parsimonious trees. Furthermore, the taxa that are unstable among the most parsimonious trees can be identified and removed to obtain well-resolved reduced consensus trees. The effects that missing data have on support values, however, are much less understood.

Regarding Bayesian analyses, recent studies using both empirical and simulated data matrices have shown that missing data also affect the performance of this method, especially when the missing data are non-randomly distributed. Non-random distribution of missing data is quite common in paleontological data matrices, as missing entries are usually concentrated in highly incompletely scored taxa and highly incompletely scored characters. As in parsimony, the effects of the amount of missing data (and of its different patterns of distribution) on posterior clade probabilities are poorly understood.

Here we present a study of the effects that randomly and non-randomly distributed missing entries have on support values for both Bayesian and parsimony analyses, using a set of empirical data matrices of morphological characters. Different regimes of missing entries were artificially added to these datasets, and the support/credibility values obtained for the modified datasets were compared with those of the original matrices (without missing data). The results of these analyses show that support/credibility values are highly sensitive to the presence of non-randomly distributed missing entries, in particular in the case of highly incompletely scored taxa. A major difference between the results of the two methods is the frequency of high credibility values obtained for erroneous groups in the Bayesian analyses.
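As a rough illustration of what adding different regimes of missing entries to a complete matrix might look like, the following Python sketch contrasts a random regime with a non-random one concentrated in a few taxa. It is not the authors' procedure, which the abstract does not describe: the function names, the toy matrix, the proportions of missing entries, and the use of "?" as the missing-data symbol are all assumptions made for illustration.

```python
import random

MISSING = "?"  # assumed symbol for a missing entry


def add_random_missing(matrix, fraction, rng):
    """Return a copy with `fraction` of all cells, drawn uniformly, set to missing."""
    n_taxa, n_chars = len(matrix), len(matrix[0])
    cells = [(t, c) for t in range(n_taxa) for c in range(n_chars)]
    out = [list(row) for row in matrix]
    for t, c in rng.sample(cells, round(fraction * len(cells))):
        out[t][c] = MISSING
    return out


def add_taxon_concentrated_missing(matrix, taxa, fraction, rng):
    """Return a copy where only the rows listed in `taxa` lose `fraction` of their cells."""
    n_chars = len(matrix[0])
    out = [list(row) for row in matrix]
    per_taxon = round(fraction * n_chars)
    for t in taxa:
        for c in rng.sample(range(n_chars), per_taxon):
            out[t][c] = MISSING
    return out


def missing_fraction(matrix):
    """Proportion of cells scored as missing."""
    total = sum(len(row) for row in matrix)
    return sum(row.count(MISSING) for row in matrix) / total


# Hypothetical complete matrix: 5 taxa (rows) x 8 binary characters (columns).
toy = [
    list("00110101"),
    list("01110001"),
    list("11010101"),
    list("10011100"),
    list("00101110"),
]

rng = random.Random(0)  # fixed seed so the regimes are reproducible
random_regime = add_random_missing(toy, 0.25, rng)
taxon_regime = add_taxon_concentrated_missing(toy, [0, 1], 0.75, rng)

print(missing_fraction(random_regime))  # share of missing cells, random regime
print(missing_fraction(taxon_regime))   # share of missing cells, taxon regime
```

Each modified matrix would then be analyzed with the same parsimony and Bayesian settings as the original, and the resulting support and credibility values compared clade by clade. A character-concentrated regime could be sketched the same way by sampling within selected columns instead of rows.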

Session No. 312

T152. Troubles and Triumphs with Fossil Phylogenies

Wednesday, 28 September 2016: 1:30 PM-5:30 PM

Mile High Ballroom 4AB (Colorado Convention Center)

Geological Society of America Abstracts with Programs. Vol. 48, No. 7
doi: 10.1130/abs/2016AM-285418

© Copyright 2016 The Geological Society of America (GSA), all rights reserved. Permission is hereby granted to the author(s) of this abstract to reproduce and distribute it freely, for noncommercial purposes. Permission is hereby granted to any individual scientist to download a single copy of this electronic file and reproduce up to 20 paper copies for noncommercial purposes advancing science and education, including classroom use, providing all reproductions include the complete content shown here, including the author information. All other forms of reproduction and/or transmittal are prohibited without written permission from GSA Copyright Permissions.


GSA Annual Meeting in Denver, Colorado, USA - 2016
