Jeremy Zucker

"How to debug a bug: constraint-based analysis of a bioinformatics
pipeline for constructing genome-scale metabolic models from an
annotated genome"

Abstract: 

Despite the increasing availability of annotated genomes, few
genome-scale metabolic models have so far been published. The
bottleneck comes from the extensive manual curation which must take
place before attempting to make predictions with the model. Consequently,
if a bioinformatics pipeline for constructing models of metabolism
from genomes is to succeed, it must also incorporate a systematic
method for curating each model.  We outline a proposal for
constructing such a pipeline by generating these models from metabolic
pathway/genome databases and applying constraint-based reasoning
techniques to discover errors in the underlying data and data
representations.  By creating a feedback loop between model and
database, we hope to accelerate the curation process to the point that
systematic application of these models to a large variety of organisms
becomes feasible.

When the model fails to predict outcomes correctly, we must apply the
scientific method to the system as a whole. Errors may be introduced
at any point in the pipeline and propagate downstream to contribute to
an incorrect result.  For constraint-based models of metabolism, the
most conservative assumptions underlying the model are stoichiometric
constraints such as mass balance, thermodynamic constraints such as
the reversibility of the reaction, and enzyme capacity constraints to
limit the maximum flux through a reaction.  These are hard, inviolable
physicochemical limits that every cell must obey.
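
As a minimal sketch of how these three constraint types can be checked
against a candidate flux distribution, the Python below tests mass
balance (S v = 0), reversibility (a lower bound of zero) and capacity
(an upper bound). The toy network, its names and its bound values are
hypothetical and chosen only for illustration.

```python
# Hypothetical toy network: A is taken up, converted to B, and B is exported.
METABOLITES = ["A", "B"]
REACTIONS = ["uptake_A", "A_to_B", "export_B"]

# Stoichiometric matrix S: rows are metabolites, columns are reactions.
S = [
    [1, -1, 0],   # A: produced by uptake_A, consumed by A_to_B
    [0, 1, -1],   # B: produced by A_to_B, consumed by export_B
]

# (lower, upper) flux bounds. A lower bound of 0 encodes irreversibility;
# the upper bound stands in for an enzyme capacity limit (assumed values).
BOUNDS = [(0.0, 10.0), (0.0, 5.0), (0.0, 1000.0)]


def constraint_violations(S, bounds, v, metabolites, reactions, tol=1e-9):
    """Return a list of messages, one per constraint that flux vector v breaks."""
    problems = []
    # Mass balance: the net production of every metabolite must be zero.
    for name, row in zip(metabolites, S):
        net = sum(coef * flux for coef, flux in zip(row, v))
        if abs(net) > tol:
            problems.append(f"mass balance violated for {name}")
    # Reversibility and capacity: every flux must lie within its bounds.
    for name, (lo, hi), flux in zip(reactions, bounds, v):
        if flux < lo - tol or flux > hi + tol:
            problems.append(f"bound violated for {name}")
    return problems
```

Calling constraint_violations(S, BOUNDS, [5, 5, 5], METABOLITES,
REACTIONS) returns an empty list, whereas the flux vector [6, 6, 6]
exceeds the assumed capacity of A_to_B and is reported as a violation.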

Therefore, the utility of these models lies not in what they predict
correctly, but in what they predict incorrectly, for by rigorously
examining the assumptions underlying the model, new knowledge can be
gained.

Constraint-based analysis (CBA) is a rigorous test of a model's
underlying network topology.  A model is considered incomplete if it
is unable to produce all the biomass components necessary for growth
from that organism's known minimal nutrient set. Conversely, the model
must *not* be able to produce the full set of essential biomass
components from a given medium if the organism is known *not* to grow
in that medium.
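
The growth test can be illustrated with a deliberately simplified
sketch. Instead of a full flux balance computation, the Python below
uses network expansion (a reaction fires once all of its substrates are
available), which ignores stoichiometry and cofactor cycling; it is an
assumption made to keep the example small, not the method proposed
here. All reaction, metabolite and function names are hypothetical.

```python
def producible(reactions, nutrients):
    """Network expansion: fire every reaction whose substrates are all available,
    and repeat until no new metabolite appears."""
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if set(substrates) <= available and not set(products) <= available:
                available |= set(products)
                changed = True
    return available


def compare_with_experiment(reactions, biomass, medium, grows_in_lab):
    """Compare the predicted growth phenotype with an observed one.

    Returns (verdict, missing), where missing lists the biomass
    components the model cannot make from the medium."""
    missing = sorted(set(biomass) - producible(reactions, medium))
    predicted_growth = not missing
    if predicted_growth == grows_in_lab:
        verdict = "consistent"
    elif grows_in_lab:
        verdict = "model incomplete"      # organism grows, model cannot
    else:
        verdict = "model too permissive"  # model grows, organism does not
    return verdict, missing


# Hypothetical toy network: name -> (substrates, products).
TOY_REACTIONS = {
    "R1": (["glucose"], ["pyruvate"]),
    "R2": (["pyruvate", "ammonia"], ["alanine"]),
    "R3": (["pyruvate"], ["acetyl_coa"]),
}
TOY_BIOMASS = ["alanine", "acetyl_coa"]
```

If the organism is known to grow on glucose alone, then
compare_with_experiment(TOY_REACTIONS, TOY_BIOMASS, ["glucose"], True)
flags the model as incomplete and names alanine as the component it
cannot produce, which tells the curator where to look for a gap.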

CBA can also be used to check whether the knowledge contained in a
metabolic database is consistent with the model assumptions.  To
represent chemical and physical constraints faithfully, every reaction
must be carefully mass balanced, and every metabolite must have at
least one consuming flux and one producing flux.
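
Both consistency checks are simple enough to sketch directly. The
Python below assumes uncharged formulas without parentheses, treats
every reaction as irreversible in the direction written, and uses a
hypothetical two-reaction database; a real pathway/genome database
would also need charge balance and reversible reactions handled.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C6H12O6'."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def element_imbalance(stoich, formulas):
    """Net atoms per element for one reaction (products positive,
    substrates negative). An empty dict means the reaction is balanced."""
    net = Counter()
    for metabolite, coef in stoich.items():
        for element, n in parse_formula(formulas[metabolite]).items():
            net[element] += coef * n
    return {element: n for element, n in net.items() if n != 0}


def dead_ends(reactions, boundary=()):
    """Metabolites lacking a producing or a consuming reaction.
    Metabolites in `boundary` (exchanged with the medium) are exempt."""
    produced, consumed = set(), set()
    for stoich in reactions.values():
        for metabolite, coef in stoich.items():
            (produced if coef > 0 else consumed).add(metabolite)
    return (produced ^ consumed) - set(boundary)


# Hypothetical database entries; the second one has lost four hydrogens.
FORMULAS = {"glucose": "C6H12O6", "lactate": "C3H6O3", "pyruvate": "C3H4O3"}
DB_REACTIONS = {
    "glc_to_lac": {"glucose": -1, "lactate": 2},
    "glc_to_pyr": {"glucose": -1, "pyruvate": 2},
}
```

Here element_imbalance(DB_REACTIONS["glc_to_pyr"], FORMULAS) reports a
deficit of four hydrogen atoms, the kind of error that arises when a
redox cofactor is omitted from a database entry, and
dead_ends(DB_REACTIONS, boundary=["glucose"]) reports lactate and
pyruvate as produced but never consumed.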

Finally, CBA can be used to test the relationship of genes to the
reactions their products catalyze. Every metabolic reaction is
catalyzed by one or more enzymes.  Each enzyme is composed of one or
more gene products. By performing in silico gene knockouts and
comparing these predictions with experiment, one can test the Boolean
relationship between genes and reactions inferred from the protein
complex and isozyme annotations.
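
The Boolean gene-reaction relationship can be sketched as follows,
assuming each reaction is annotated with a list of alternative enzymes
(isozymes, combined with OR), each of which is a set of genes that must
all be present (a protein complex, combined with AND). The gene and
reaction names are hypothetical.

```python
# Hypothetical annotations: reaction -> list of isozymes, each a set of genes.
GPR = {
    "R1": [{"geneA", "geneB"}],     # one complex that needs both subunits
    "R2": [{"geneC"}, {"geneD"}],   # two isozymes, either one suffices
    "R3": [{"geneE"}],              # a single-gene enzyme
}


def active_reactions(gpr, knocked_out):
    """Reactions that still have at least one fully intact enzyme."""
    knocked_out = set(knocked_out)
    return {
        reaction
        for reaction, isozymes in gpr.items()
        # An enzyme is intact if none of its genes has been knocked out.
        if any(not (complex_ & knocked_out) for complex_ in isozymes)
    }


def single_knockout_table(gpr):
    """For every gene, list the reactions lost when that gene alone is deleted."""
    genes = sorted(set().union(*(c for isozymes in gpr.values() for c in isozymes)))
    all_reactions = set(gpr)
    return {
        gene: sorted(all_reactions - active_reactions(gpr, [gene]))
        for gene in genes
    }
```

In this toy annotation, deleting geneA or geneB removes R1, deleting
geneE removes R3, and deleting geneC or geneD alone removes nothing
because the other isozyme still catalyzes R2. Removing the lost
reactions from the model and repeating the growth test gives the
prediction to compare with the experimental knockout phenotype; a
mismatch suggests that a complex or isozyme annotation is wrong.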