17 February 2006

Greg Meredith

Biosimilarity LLC

tuningtheair.com

Both the research biologist and the biologist commercially engaged in finding drug-based therapies find themselves enlisted in the profound task of developing a more detailed understanding of cellular mechanisms, including signaling, metabolism, and regulation, and of how they work together. In pursuit of this goal, the working biologist is faced with an ever-growing mountain of data, growing in large measure as a result of the sustained effort to find successful therapies. She needs a way to access and organize this data that is at once more objective and more semantically sophisticated than her current tools provide. Moreover, the data the biologist is sorting through is increasingly about dynamics. Take the pathway databases as a microcosm of the larger problem: when a user of the Kyoto Encyclopedia of Genes and Genomes (KEGG) searches, she searches by name. Rather than searching KEGG in terms of behavioral characteristics, a user must know in advance what she is looking for, say, a pathway called the "TCA-cycle"; but in many cases these naming conventions are even less well agreed upon than the categories of popular music consumption ("rock, pop, jazz, ..."). We observe, however, that the dynamical data is already in KEGG, and we note that there are methods that allow a much more semantically rich view into the data.

To make this more concrete, we imagine enabling a biologist to ask a query like this: find all the pathways, P, and small molecules, D, such that if we introduce D into P, a communication event is eventually blocked in P and the (P, D) combination eventually reaches a state that decomposes into a healthy part (in concentration k) and a toxic part (in concentration k'). This query exemplifies what is expressible (and, more importantly, what may be processed) in the mathematical framework of process algebras (see references below). While the talk describes work in progress, the ambitions are high; we note that the mathematical view of data in terms of relational algebra gave rise to a revolution in data access and manipulation, known as relational databases. These databases, however, have no innate understanding of data that is dynamical; dynamical content must be captured at a level to which relational search techniques are blind. Recognizing that process-oriented data already dominates what we collect and access over the internet, especially as it relates to biological research, the proposal is to usher in another mathematical point of view, one cognizant of the dynamics in dynamical data. Roughly speaking, relational algebra : SQL :: process algebra : X. Solving for X provides the biologist with the basis of a search language aware of the **dynamics** she is investigating.
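As a rough illustration of what searching by behavior rather than by name could look like, the following Python sketch treats each pathway as a small labelled transition system and asks which (pathway, molecule) pairs block a given communication event. Everything in it is an assumption made for illustration: the pathway names, states, and actions are invented toy data (not KEGG content), a molecule is modelled crudely as removing all transitions carrying one action, and "blocked" is read as "the event can no longer occur in any reachable state". It is not a process-algebraic query engine and does not address the concentration part of the query.

```python
from collections import deque

# Hypothetical toy data: each pathway is a finite labelled transition system
# mapping a state to its outgoing (action, next_state) transitions.
PATHWAYS = {
    "pathway_A": {
        "s0": [("bind", "s1")],
        "s1": [("signal", "s2")],
        "s2": [("release", "s0")],
    },
    "pathway_B": {
        "t0": [("bind", "t1"), ("bind_alt", "t1")],
        "t1": [("signal", "t0")],
    },
}
INITIAL = {"pathway_A": "s0", "pathway_B": "t0"}


def with_inhibitor(lts, inhibited_action):
    """Crude drug model (an assumption): drop every transition with this action."""
    return {
        state: [(a, nxt) for a, nxt in transitions if a != inhibited_action]
        for state, transitions in lts.items()
    }


def occurring_actions(lts, start):
    """Breadth-first search: collect every action enabled in some reachable state."""
    seen, queue, actions = {start}, deque([start]), set()
    while queue:
        state = queue.popleft()
        for action, nxt in lts.get(state, []):
            actions.add(action)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return actions


def blocking_pairs(pathways, initial, drugs, event):
    """Return (pathway, drug) pairs where `event` can occur untreated but not treated."""
    hits = []
    for name, lts in pathways.items():
        if event not in occurring_actions(lts, initial[name]):
            continue  # the event never occurs here, so there is nothing to block
        for drug in drugs:
            treated = with_inhibitor(lts, drug)
            if event not in occurring_actions(treated, initial[name]):
                hits.append((name, drug))
    return hits


result = blocking_pairs(PATHWAYS, INITIAL, ["bind", "release"], "signal")
print(result)
```

In this toy setting the query distinguishes the two pathways by what they do, not by what they are called: inhibiting `bind` silences `signal` in `pathway_A`, while `pathway_B` still reaches it through its alternative `bind_alt` route.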

R. Blossey, L. Cardelli, A. Phillips, *"A compositional approach to the stochastic dynamics of gene networks"*, to appear in Transactions on Computational Systems Biology.

C. Priami, A. Regev, E. Shapiro, W. Silverman, *"Application of a stochastic name-passing calculus to representation and simulation of molecular processes"*, Information Processing Letters, 80:25-31, 2001.