Scaling laws and regression on (phylogenetic) trees

30 June 2017

Jörg Stelling
Department of Biosystems Science and Engineering
ETH Zürich, Basel, Switzerland


Allometric scaling laws linearly relate (transformations of) two biological observables, such as metabolic rate and body size. A linear relation in log-log space corresponds to a power law, and power laws have been proposed to be ubiquitous in nature, including in the structure of genomes and cellular networks. However, current approaches assume that a single scaling law applies to all samples considered (for example, all animals), and there are no adequate methods to test this assumption. We were motivated to address this problem by cases in which two features are observed for samples that are related to each other via a tree structure: species in a phylogenetic tree, or individuals in a lineage tree. Formally, we aim to find an optimal (according to an information criterion) combined linear regression model that allows for local regression models on parts of the data, provided that the partition of the data complies with the given tree: local models have to be assigned to non-overlapping subtrees. Dependencies between subtree choices make this a formidable combinatorial problem that is not amenable to brute-force computation. Here, we show that the provably best model can be found efficiently by combining elementary statistics, dynamic programming ideas, and algorithms from computational geometry. Applying our method to the allometric relation between brain size and body size, using data for several hundred mammal species, shows that not all species follow the same scaling law; it reveals fine structure among mammals and indicates that (in this respect) humans are not special.
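
To make the model-selection idea concrete, the following is a minimal sketch and not the method presented in the talk. It fits a power law by ordinary least squares in log-log space and uses the Bayesian information criterion (BIC) to compare one pooled scaling law against two group-specific laws. Everything in it is an assumption for illustration: the data are synthetic, the two groups are fixed by hand rather than derived from a tree, BIC stands in for the unspecified information criterion, and the names fit_line and bic are hypothetical helpers. It does not search over tree-compatible partitions, which is the hard part the abstract refers to.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, rss)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, rss

def bic(rss, n, k):
    """Gaussian BIC up to an additive constant: n*ln(rss/n) + k*ln(n)."""
    return n * math.log(rss / n) + k * math.log(n)

# Synthetic data (assumption): two groups with different power-law exponents
# and a small deterministic multiplicative perturbation.
masses = [2.0 ** k for k in range(8)]
noise = [1.05, 0.95, 1.03, 0.97, 1.04, 0.96, 1.02, 0.98]
group_a = [0.1 * m ** 0.75 * e for m, e in zip(masses, noise)]
group_b = [1.0 * m ** 0.40 * e for m, e in zip(masses, noise)]

# A power law y = c * m^b becomes the line log y = log c + b * log m.
log_m = [math.log(m) for m in masses]
log_a = [math.log(y) for y in group_a]
log_b = [math.log(y) for y in group_b]

_, slope_a, rss_a = fit_line(log_m, log_a)
_, slope_b, rss_b = fit_line(log_m, log_b)
_, slope_all, rss_all = fit_line(log_m + log_m, log_a + log_b)

n = 2 * len(masses)
# Pooled model: intercept, slope, variance (3 parameters).
bic_single = bic(rss_all, n, 3)
# Split model: two intercepts, two slopes, one shared variance (5 parameters).
bic_split = bic(rss_a + rss_b, n, 5)

print("slopes:", slope_a, slope_b, "pooled:", slope_all)
print("BIC single:", bic_single, "BIC split:", bic_split)
```

A lower BIC indicates the preferred model. In this toy setting the split model is preferred because the two groups were generated with different exponents. In the setting of the abstract, the candidate splits would have to correspond to non-overlapping subtrees of the given tree.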
