GSA Annual Meeting in Denver, Colorado, USA - 2016

Paper No. 289-7
Presentation Time: 9:30 AM

THE SPACE OF SAMPLED ANCESTOR TREES


GAVRYUSHKIN, Alex, Department of Biosystems Science and Engineering, ETH Zürich, Mattenstrasse 26, Basel, 4058, Switzerland, alex@gavruskin.com

With phylogenetic methods being employed in various areas of science, the information carried by a tree may have substantially different meanings. Examples include gene trees, species trees, transmission trees, and language trees. In many of these applications, information about ancestry relationships is explicitly present in the data, and recovering it is often an objective of phylogenetic *sampled ancestor tree* inference. Prominent examples include trees with fossils and transmission trees.

An important property that distinguishes a sampled ancestor tree from a classical phylogenetic tree is that taxa are allowed to be internal nodes of the tree. For example, the information that a particular fossil taxon is ancestral to a particular contemporary taxon can be expressed in a sampled ancestor tree but not in a classical tree.
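As a rough illustration of this distinction, the sketch below represents a tree in which a sampled taxon may sit at an internal node. It is not the coordinate system proposed in this work. The `Node` class, the helper functions, and the taxon names (`fossil_F`, `extant_A`, `extant_B`) are all hypothetical and chosen only for illustration, and the convention that a sampled ancestor is a labelled node with children is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    # label is None for unsampled (hypothetical) nodes; a string for sampled taxa.
    label: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def sampled_ancestors(node: Node) -> List[str]:
    """Return labels of sampled taxa that are internal nodes (have children)."""
    found = []
    if node.label is not None and node.children:
        found.append(node.label)
    for child in node.children:
        found.extend(sampled_ancestors(child))
    return found


def find(node: Node, label: str) -> Optional[Node]:
    """Depth-first search for the node carrying the given label."""
    if node.label == label:
        return node
    for child in node.children:
        hit = find(child, label)
        if hit is not None:
            return hit
    return None


def is_ancestor(root: Node, ancestor: str, descendant: str) -> bool:
    """True if the taxon `ancestor` lies strictly above `descendant` in the tree."""
    start = find(root, ancestor)
    if start is None:
        return False
    return any(find(child, descendant) is not None for child in start.children)


# Hypothetical example: fossil_F is a direct ancestor of extant_A,
# and extant_B is a sister lineage attached at an unsampled root.
tree = Node(children=[Node("fossil_F", [Node("extant_A")]), Node("extant_B")])
```

In a classical phylogenetic tree every labelled taxon would be a leaf, so `sampled_ancestors` would return an empty list for any such tree. Here `fossil_F` is labelled and internal, which is exactly the ancestry statement a classical tree cannot encode.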

Surprisingly little is known about sampled ancestor trees from a computational and mathematical point of view. Indeed, there exists no standard coordinate system for this type of tree. This poor understanding leads to major problems with computational tree inference: MCMC methods lack efficient navigation algorithms, comparison methods lack a sound metric, and statistical summaries lack consistency. The key obstacle to solving these problems is the dimensionality of the trees.

In this work, we suggest an approach to fill this gap by providing a novel coordinate system for phylogenetic sampled ancestor trees. This system scales naturally from continuous to discrete trees by hierarchically approximating continuous time by discrete time segments. Although the elementary moves between trees are inherited from the nearest neighbor interchange (NNI) move, their geometric and algorithmic properties differ greatly.

In this talk, I will introduce the coordinate system and motivate it by popular applications in computational phylogenetics. I will compare the system with classical phylogenetic trees and demonstrate its algorithmic and statistical potential. I will finish by outlining possible directions for future research.

Handouts
  • 2016_GSA.pdf (15.9 MB)