Virtual Cell Program


Modular models in little b

Challenges in implementing modularity

There are many obstacles to building a computational infrastructure that realises our vision. We discuss some of them here, along with the design choices that led to the architecture of little b. For present purposes, a "model" is a dynamical system: a state space equipped with a time evolution. This subsumes most models currently discussed in the literature: Boolean, deterministic, mass-action, stochastic, spatial or hybrid.

Different models may use different names for the same component. We assume the user provides a mapping between namespaces.
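A user-supplied namespace mapping can be pictured as a simple lookup table applied before two models are combined. The sketch below is illustrative only: the function `rename`, the two toy species sets and the mapping `b_to_a` are hypothetical and are not part of little b.

```python
def rename(species, mapping):
    """Translate model-local names to shared names; unmapped names pass through."""
    return {mapping.get(name, name) for name in species}

# Two toy models that refer to the same kinase under different names.
model_a = {"Erk1", "Mek1"}
model_b = {"MAPK3", "Raf1"}

# Mapping provided by the user (hypothetical example data).
b_to_a = {"MAPK3": "Erk1"}

# After renaming, the shared component appears only once in the merged set.
merged = model_a | rename(model_b, b_to_a)
print(sorted(merged))
```

Names without an entry in the mapping are left unchanged, so an incomplete mapping shows up as apparently distinct components rather than as a silent merge.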

The same component may be represented differently in different contexts. One person may regard the MAP kinase Erk1 as being either activated or not, while another may want to know the phosphorylation state of the T and Y residues. Each representation is satisfactory in its own context. This reveals a broader problem: there is no unique model of any biological system. The model depends on the question being asked: a deterministic model may be sufficient to answer one question in the context of the available data, while a stochastic model may be needed in another context. Once again, this is a feature of biological complexity. The best we can hope for is that the computational infrastructure integrates modules correctly when that is feasible, which is already a big advance over what is currently possible, and fails gracefully when it is not, highlighting the inconsistencies and passing them back to the user for resolution.

Biology exists independently of any mathematical assumptions made to model it. Indeed, we can discern at least three knowledge layers implicit in any biological model: the biology (kinase K phosphorylates substrate S), the biochemical mechanism (ATP and substrate bind in random order to form an enzyme-substrate complex which decomposes in a single step into product, ADP and phosphate), and the mathematical description of the mechanism (mass-action kinetics). The computational infrastructure allows each layer to be described separately, so that any of them can be altered independently. Monolithic models make no such distinctions, which greatly inhibits modularity.
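One way to picture the three layers is as three independent pieces that are only combined at the end. The following is a minimal sketch under simplifying assumptions, not little b's actual design: ATP, ADP and phosphate are omitted, and every name in it (`Phosphorylation`, `single_step`, `via_complex`, `mass_action`, `build`, the rate-constant labels) is hypothetical.

```python
from dataclasses import dataclass

# Layer 1: biology. A bare statement of who acts on whom.
@dataclass(frozen=True)
class Phosphorylation:
    kinase: str
    substrate: str

# Layer 2: biochemical mechanism. Expands a biological fact into reactions,
# each written as (reactants, products, rate-constant label).
def single_step(fact):
    k, s = fact.kinase, fact.substrate
    return [((k, s), (k, s + "_p"), "kcat")]

def via_complex(fact):
    k, s = fact.kinase, fact.substrate
    c = k + ":" + s  # intermediate enzyme-substrate complex
    return [((k, s), (c,), "kon"),
            ((c,), (k, s), "koff"),
            ((c,), (k, s + "_p"), "kcat")]

# Layer 3: mathematics. Turns one reaction into a rate expression.
def mass_action(reaction):
    reactants, _products, constant = reaction
    return "*".join((constant,) + tuple("[%s]" % r for r in reactants))

def build(fact, mechanism, rate_law):
    """Combine the three layers; any argument can be swapped independently."""
    return [rate_law(r) for r in mechanism(fact)]

fact = Phosphorylation("K", "S")
print(build(fact, single_step, mass_action))
print(build(fact, via_complex, mass_action))
```

Changing the mechanism from `single_step` to `via_complex` leaves both the biological statement and the rate law untouched, which is the kind of independence the layered description is meant to provide.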

Formalisation may require the construction of entities which have not been explicitly specified. Enzyme-substrate complexes are a case in point. A biologist would say that enzyme E converts substrate S to product P. To formalise this in mathematical terms, the biochemical mechanism has to be made explicit. It is usually assumed that this is a two-step reaction

    E + S ⇌ ES → E + P

with the formation of an intermediate enzyme-substrate complex ES. The computational infrastructure has to bring this component into existence, name it and account for its behaviour in the model, even though it may never be directly assayed in any biological experiment.
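As a worked illustration of why ES must exist as a variable, consider the conventional form of the two-step scheme, in which E and S bind reversibly to form ES and ES then yields E and P. Under mass-action kinetics this gives the equations below. The rate constants $k_1$, $k_{-1}$ and $k_2$, and the assumption that the binding step is reversible, are notation and convention introduced here rather than taken from the text.

```latex
\begin{aligned}
\frac{d[S]}{dt}  &= -k_1 [E][S] + k_{-1} [ES] \\
\frac{d[E]}{dt}  &= -k_1 [E][S] + (k_{-1} + k_2) [ES] \\
\frac{d[ES]}{dt} &= k_1 [E][S] - (k_{-1} + k_2) [ES] \\
\frac{d[P]}{dt}  &= k_2 [ES]
\end{aligned}
```

The concentration $[ES]$ appears in every equation, so the model cannot be written down without it, even though the biologist's statement mentions only E, S and P.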

Modularity requires a symbolic reasoning capability. A model has to keep track of each molecular component in the system. However, the same component may be present in different cellular compartments (the cytoplasm and the nucleus, for instance) which will have different time evolutions. Hence, different variables will be needed to describe the component in different compartments. One solution would be to globally assume that each compartment contains each component, allowing those components not present to have value 0. But this destroys modularity, as it requires the components to be known in advance. To avoid this, the infrastructure must work out which components will be present locally in each compartment using the information provided by the user but in advance of determining the time evolution. This requires reasoning over the information provided by the user. If the user has stipulated that compartment X has an inward transporter for B in its membrane along with an enzyme that converts B to A then X will contain A when its environment contains B but need not otherwise. Symbolic reasoning is one reason for adopting LISP as the basis for the computational infrastructure.
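The transporter example can be sketched as a small forward-chaining computation that runs before any time evolution is solved. This is only an illustration of the idea in Python, not little b's LISP implementation; the function `infer_contents` and its argument names are hypothetical.

```python
def infer_contents(environment, transporters, enzymes):
    """Return the set of components that can be present in a compartment.

    environment  -- components present outside the compartment
    transporters -- components the membrane can carry inward
    enzymes      -- (substrate, product) pairs for enzymes in the compartment
    """
    # Rule 1: an inward transporter brings in whatever the environment offers.
    present = {c for c in transporters if c in environment}
    # Rule 2: an enzyme whose substrate is present makes its product present.
    changed = True
    while changed:  # apply rule 2 until nothing new is derived
        changed = False
        for substrate, product in enzymes:
            if substrate in present and product not in present:
                present.add(product)
                changed = True
    return present

# Compartment X: inward transporter for B, enzyme converting B to A.
x_transporters = {"B"}
x_enzymes = [("B", "A")]

with_b = infer_contents({"B"}, x_transporters, x_enzymes)
without_b = infer_contents(set(), x_transporters, x_enzymes)
print(sorted(with_b))     # ['A', 'B']
print(sorted(without_b))  # []
```

Only the components that the rules can derive receive variables, so nothing has to be declared globally in advance.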

Modularity requires consistent generation and control of names. Names must be assigned to components that the user has not explicitly specified (like component A in compartment X in the previous example) in such a way that these names are accessible to subsequent changes. The user may incrementally add an enzyme C to compartment X, for which A is an effector. The infrastructure must recognise, from the name previously given by the infrastructure to A in X, that it is the effector specified in the description of C, and account for its behaviour accordingly. The consistent assignment of names in a way that facilitates incremental and modular construction is central to the way the infrastructure works.
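A minimal sketch of consistent naming is a registry that hands out one canonical name per (component, compartment) pair and returns the same name on every later request. The class `Registry`, its methods and the `compartment.component` naming pattern are assumptions made for illustration; they do not describe how little b actually generates names.

```python
class Registry:
    """Hands out one canonical name per (component, compartment) pair."""

    def __init__(self):
        self._index = {}  # (component, compartment) -> canonical name

    def variable(self, component, compartment):
        key = (component, compartment)
        if key not in self._index:
            self._index[key] = "%s.%s" % (compartment, component)
        return self._index[key]

    def names(self):
        return sorted(self._index.values())

reg = Registry()

# Step 1: the infrastructure infers that A is present in X and names it.
inferred = reg.variable("A", "X")

# Step 2: the user later adds enzyme C to X, with A as its effector.
enzyme_c = {"name": reg.variable("C", "X"),
            "effector": reg.variable("A", "X")}

# The effector resolves to the variable created in step 1, not a duplicate.
print(enzyme_c["effector"] == inferred)
print(reg.names())
```

Because lookups go through the registry, an incremental addition refers to the existing variable instead of introducing a second one for the same component.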

Biological knowledge is subject to continual refinement. For instance, there is conflicting data on whether or not KSR, the scaffold for the MAP kinase cascade, has any kinase activity (its name reflects homology to known kinases). The consensus is that it does not, despite some reports to the contrary. Current models of the MAP kinase cascade might assume no kinase activity on the part of the scaffold, while future models may need to change that assumption. This is easier to do when biological knowledge is kept separate from mathematical assumptions but still requires that existing descriptions of the scaffold be modified in the light of new knowledge along with descriptions of any other components whose functions are affected by the new knowledge. Such changes can become too complex to be done manually; they need to be written as programmes that can alter the existing knowledge base. These programmes could be written in a separate language to that in which the knowledge is specified but that creates a requirement for two languages: one to specify knowledge and one to modify it. LISP allows both languages to be the same, giving the computational biologist powerful programmatic tools for updating existing knowledge.
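The idea of a programme that revises a knowledge base, and reports what else may need revisiting, can be sketched as follows. This is a toy illustration in Python and does not reproduce the property the text attributes to LISP, namely that knowledge and the programmes that modify it share one language. The entries `P1`, `P2` and `Q`, the `partners` field and the function `add_role` are hypothetical placeholders, not biological claims.

```python
import copy

# Toy knowledge base: each entry lists roles and interaction partners.
knowledge = {
    "KSR": {"roles": {"scaffold"}, "partners": {"P1", "P2"}},
    "P1": {"roles": {"kinase"}, "partners": {"KSR"}},
    "P2": {"roles": {"substrate"}, "partners": {"KSR"}},
    "Q": {"roles": {"kinase"}, "partners": set()},
}

def add_role(kb, name, role):
    """Return an updated copy of kb and the entries that may need review."""
    updated = copy.deepcopy(kb)  # leave the original knowledge base intact
    updated[name]["roles"].add(role)
    # Any entry that lists `name` as a partner may be affected by the change.
    affected = sorted(other for other, entry in updated.items()
                      if name in entry["partners"])
    return updated, affected

revised, to_review = add_role(knowledge, "KSR", "kinase")
print(sorted(revised["KSR"]["roles"]))    # ['kinase', 'scaffold']
print(to_review)                          # ['P1', 'P2']
print(sorted(knowledge["KSR"]["roles"]))  # ['scaffold']
```

Returning a revised copy keeps both versions of the knowledge available, so models built on the old assumption and on the new one can coexist.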

Time evolution may create new state. Endocytosis results in a new vesicular compartment within the cytoplasm of a cell. The components inside the vesicle will then evolve differently to the same components in the cytoplasm. The infrastructure can only determine the emergence of these new states by solving the time evolution of the system; it cannot (at least with present knowledge) work it out symbolically from the information supplied by the user. The infrastructure does not attempt to address this. It remains an open problem.

While each of the problems raised above can be readily solved individually in any particular instance by ad hoc methods, the challenge in building a computational infrastructure lies in solving them all automatically in ways that are mutually consistent. Designing a programming language to do this is as much art as science. We have found it necessary to draw on several developments in computer science: functional programming, forward-chaining algorithms and rule-based systems from AI, object-oriented programming and algebraic simplification. The result is little b.


www.littleb.org description of little b and its semantics (under construction)

 
