GSA Annual Meeting in Phoenix, Arizona, USA - 2019

Paper No. 14-3
Presentation Time: 8:40 AM


LLOYD, Graeme T., School of Earth and Environment, University of Leeds, Leeds, LS2 9JT, United Kingdom

Appropriate visualisation remains critical to sound data analysis and is more valuable than any summary statistic. However, our data are often high-dimensional, either because they are multivariate or due to some other intrinsic property, meaning simple bivariate visualisations are not possible. Phylogenetic trees – a special case of directed acyclic graphs that attempt to capture the relationships between a set of species – fall into this latter category. Typically, researchers work only with samples of optimal trees, choosing to visually summarise their variance using “consensus” methods that capture only very limited information. A more appropriate alternative might be to use tree “spaces” that visually summarise trees in the context of all possible topologies (arrangements of tips). Such visualisations have broad application in paleobiology. For example, by assigning numerical values to tips the space becomes a “landscape”, allowing identification of multiple tree “islands”. Unfortunately, there are multiple challenges to implementing treespace approaches, primarily the sheer number of possible topologies – for 50 tips this number is comparable to estimates of the number of atoms in the observable universe. Here I attempt to derive a novel approach by drawing on mathematical tools from graph theory, geometry and topology. I conjecture that treespace can be first captured as a two-dimensional graph, with vertices corresponding to topologies and edges to adjacencies, and then projected onto the N-sphere (the hyperdimensional extension of the sphere), where the rich toolbox of map projections can be co-opted for subsequent visualisation. Critical to this approach is the consideration of not just fully bifurcating topologies, but also the multifurcating trees that are largely ignored by other workers.
I show practical solutions for the 1-, 2-, 3-, and 4-tip cases that retain both low dimensionality and Euclidean distances, in contrast to currently available methods. Finally, I speculate on what needs to be achieved to generalise this approach to higher tip counts.
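
To give a sense of the scale involved, the following is a minimal sketch (not part of the presented method) that counts tip-labelled topologies for a given number of tips. It assumes rooted trees, which the abstract does not specify. The function names are hypothetical. The bifurcating count uses the standard double-factorial formula (2n - 3)!!, and the count that also admits multifurcations uses the standard recurrence over the number of internal nodes.

```python
from math import prod


def count_bifurcating_rooted(n_tips: int) -> int:
    """Rooted, fully bifurcating, tip-labelled topologies: (2n - 3)!!."""
    if n_tips < 1:
        raise ValueError("n_tips must be at least 1")
    # Product of the odd numbers 1, 3, ..., 2n - 3 (empty product is 1).
    return prod(range(1, 2 * n_tips - 2, 2))


def count_all_rooted(n_tips: int) -> int:
    """Rooted, tip-labelled topologies with multifurcations allowed.

    Assumption: trees are rooted and tips are labelled. Uses the recurrence
    T(n, m) = m * T(n - 1, m) + (n + m - 2) * T(n - 1, m - 1),
    where m is the number of internal nodes.
    """
    if n_tips < 1:
        raise ValueError("n_tips must be at least 1")
    # row[m] holds the number of trees with the current tip count
    # and m internal nodes; a single tip has zero internal nodes.
    row = [1]
    for n in range(2, n_tips + 1):
        new_row = [0] * n  # m runs from 0 to n - 1
        for m in range(1, n):
            same = row[m] if m < len(row) else 0  # tip joins an existing node
            fewer = row[m - 1]                    # tip creates a new node
            new_row[m] = m * same + (n + m - 2) * fewer
        row = new_row
    return sum(row)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 50):
        print(n, count_bifurcating_rooted(n), count_all_rooted(n))
```

Under these assumptions the 1-, 2-, 3-, and 4-tip cases comprise 1, 1, 4, and 26 topologies respectively once multifurcating trees are included, while the bifurcating count alone for 50 tips is a 77-digit number (roughly 2.75 × 10^76).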