Extracting mechanistic insights from statistical analysis of high throughput data

12 Nov 2010

Jayajit Das
Battelle Center for Mathematical Medicine
Children's Hospital
Columbus, Ohio


Hierarchical cell signaling and gene regulatory reactions, organized into rich biochemical networks, produce decisive functional outcomes in cells that interact with diverse stimuli. Recent developments in high throughput experiments provide us with detailed views of these complex phenomena. While the amount of data from such experiments, which contain enormous numbers of variables, is impressive, it is difficult to extract the mechanisms underlying the complex kinetics that determine functional outcomes. Elucidating these mechanisms is essential for both scientific understanding and therapeutic applications. We study covariance-based multivariate statistical methods (e.g., principal component analysis) used in the analysis of high throughput data sets. We show that these methods lead to a dramatic reduction (from hundreds to fewer than five) of the dimensionality of time-dependent data obtained from numerical solutions of coupled ordinary differential equations describing large, biologically significant sets of biochemical reactions. We find that this reduction is independent of the form of the nonlinear interactions and of the network architecture, and that it holds over a wide range of parameter values (rate constants and concentrations). We show how changes in time scales in the system are associated with relative changes in the number of principal components required to capture the maximal variance in the data set. We provide examples where a description of the system kinetics in terms of a few principal components can lead to new insights into complex multi-dimensional systems. This may lead us toward uncovering mechanisms and identifying the key processes in complex biological systems.
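
As a rough illustration of the kind of analysis described above, the following sketch integrates a toy reaction network and counts how many principal components of the covariance matrix are needed to capture most of the variance in the resulting time courses. Everything in it is an assumption made for illustration and is not taken from the talk: the network is a simple linear cascade of first-order conversions (not one of the nonlinear networks studied), the rate constants are drawn at random, the integrator is a hand-written fixed-step Runge-Kutta scheme, and the function names and the 95% variance threshold are hypothetical choices.

```python
import numpy as np


def simulate_cascade(n_species=50, t_end=20.0, n_steps=2000, seed=0):
    """Integrate a toy cascade x_0 -> x_1 -> ... -> x_{n-1} -> (sink).

    Returns an array of shape (n_steps + 1, n_species): rows are time
    points, columns are species concentrations.
    """
    rng = np.random.default_rng(seed)
    k = rng.uniform(0.5, 2.0, size=n_species)  # hypothetical rate constants
    x = np.zeros(n_species)
    x[0] = 1.0  # all material starts in the first species
    dt = t_end / n_steps

    def rhs(state):
        # Each species is consumed at rate k_i * x_i and produced
        # from its upstream neighbor at rate k_{i-1} * x_{i-1}.
        dx = -k * state
        dx[1:] += k[:-1] * state[:-1]
        return dx

    traj = np.empty((n_steps + 1, n_species))
    traj[0] = x
    for i in range(n_steps):
        # Classical fourth-order Runge-Kutta step.
        k1 = rhs(x)
        k2 = rhs(x + 0.5 * dt * k1)
        k3 = rhs(x + 0.5 * dt * k2)
        k4 = rhs(x + dt * k3)
        x = x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        traj[i + 1] = x
    return traj


def explained_variance(data):
    """Fraction of total variance carried by each principal component.

    PCA here is the eigendecomposition of the species-by-species
    covariance matrix; np.cov subtracts the column means itself.
    """
    cov = np.cov(data, rowvar=False)
    eigvals = np.linalg.eigvalsh(cov)[::-1]  # sort in descending order
    eigvals = np.clip(eigvals, 0.0, None)    # remove tiny negative round-off
    return eigvals / eigvals.sum()


def n_components_for(fractions, threshold=0.95):
    """Smallest number of leading components whose variance reaches threshold."""
    return int(np.searchsorted(np.cumsum(fractions), threshold) + 1)


if __name__ == "__main__":
    traj = simulate_cascade()
    frac = explained_variance(traj)
    print("species:", traj.shape[1])
    print("components for 95% of variance:", n_components_for(frac))
```

Changing the spread of the rate constants in this toy model alters the separation of time scales, which gives one simple way to explore how the number of required components responds.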
