Jeremy Zucker, "How to debug a bug: constraint-based analysis of a bioinformatics pipeline for constructing genome-scale metabolic models from an annotated genome"

Abstract: Despite the increasing availability of annotated genomes, few genome-scale metabolic models have been published so far. The bottleneck is the extensive manual curation that must take place before a model can be used to make predictions. For a bioinformatics pipeline that constructs models of metabolism from genomes to succeed, it must therefore also incorporate a systematic method for curating each model. We outline a proposal for such a pipeline: generate the models from metabolic pathway/genome databases, then apply constraint-based reasoning techniques to discover errors in the underlying data and data representations. By creating a feedback loop between model and database, we hope to accelerate curation to the point that systematic application of these models to a large variety of organisms becomes feasible.

When the model fails to predict outcomes correctly, we must apply the scientific method to the system as a whole, because errors may be introduced at any point in the pipeline and propagate downstream to contribute to an incorrect result. For constraint-based models of metabolism, the most conservative assumptions are stoichiometric constraints such as mass balance, thermodynamic constraints such as reaction reversibility, and enzyme capacity constraints that limit the maximum flux through a reaction. These are hard physicochemical constraints that every cell must obey. The utility of these models therefore lies not in what they predict correctly but in what they predict incorrectly: rigorously examining the assumptions behind a failed prediction yields new knowledge.

Constraint-based analysis (CBA) is a rigorous test of a model's underlying network topology.
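As a rough illustration of the three kinds of constraint named above, the following sketch checks a candidate flux distribution against mass balance (S·v = 0), reversibility (the sign of the lower bound), and capacity (the upper bound). The toy network, the reaction and metabolite names, the bound values, and the helper `violated_constraints` are all hypothetical and are not taken from the pipeline described in the abstract.

```python
# Hypothetical toy network: uptake -> A, A <-> B, B -> drain.
METABOLITES = ["A", "B"]
REACTIONS = ["uptake_A", "A_to_B", "B_drain"]

# S[i][j] is the stoichiometric coefficient of metabolite i in reaction j
# (negative = consumed, positive = produced).
S = [
    [1, -1, 0],   # A: produced by uptake_A, consumed by A_to_B
    [0, 1, -1],   # B: produced by A_to_B, consumed by B_drain
]

# (lower, upper) flux bounds. A lower bound of 0 encodes irreversibility;
# the upper bound encodes a maximum capacity. Values are illustrative only.
BOUNDS = {
    "uptake_A": (0.0, 10.0),       # irreversible, limited uptake capacity
    "A_to_B": (-1000.0, 1000.0),   # reversible
    "B_drain": (0.0, 1000.0),      # irreversible
}


def violated_constraints(fluxes, tol=1e-9):
    """Return a list of human-readable violations for a flux distribution."""
    problems = []
    # Stoichiometric constraint: at steady state every metabolite balances.
    for i, met in enumerate(METABOLITES):
        net = sum(S[i][j] * fluxes[rxn] for j, rxn in enumerate(REACTIONS))
        if abs(net) > tol:
            problems.append(f"mass balance violated for {met}: net = {net}")
    # Thermodynamic and capacity constraints: each flux stays within bounds.
    for rxn in REACTIONS:
        lo, hi = BOUNDS[rxn]
        if not (lo - tol <= fluxes[rxn] <= hi + tol):
            problems.append(f"{rxn} flux {fluxes[rxn]} outside [{lo}, {hi}]")
    return problems


if __name__ == "__main__":
    print(violated_constraints({"uptake_A": 5, "A_to_B": 5, "B_drain": 5}))
    print(violated_constraints({"uptake_A": 15, "A_to_B": 15, "B_drain": 15}))
```

A full constraint-based analysis would typically search this feasible space with a linear-programming solver rather than test a single flux vector; the check is kept solver-free here so that it runs with the standard library alone.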
A model is considered incomplete if it is unable to produce all the biomass components necessary for growth from that organism's known minimal nutrient set. Conversely, the model must *not* be able to produce the full set of essential biomass components from a given medium if the organism is known *not* to grow in that medium.

CBA can also be used to check whether the knowledge contained in a metabolic database is consistent with the model assumptions. To represent chemical and physical constraints faithfully, every reaction must be carefully mass balanced, and every metabolite must have at least one consuming flux and one producing flux.

Finally, CBA can be used to test the relationship of genes to the reactions their products catalyze. Every metabolic reaction is catalyzed by one or more enzymes, and each enzyme is composed of one or more gene products. By performing in silico gene knockouts and comparing the predictions with experimental results, one can test the Boolean relationship between genes and reactions inferred from the protein complex and isozyme annotations.
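The knockout test described above can be sketched as follows, assuming protein complexes map to a Boolean AND over their genes and isozymes to a Boolean OR. To stay solver-free, growth is approximated here by simple reachability (network expansion from the nutrient set), which ignores stoichiometry and is a stand-in for, not an implementation of, the flux-based analysis the abstract refers to. The reactions, gene names, metabolite names, and helper functions are hypothetical.

```python
# Hypothetical toy model: reaction -> (substrates, products, gene rule).
# A rule is a gene name, or a tuple ("and", ...) for a protein complex,
# or a tuple ("or", ...) for isozymes.
REACTIONS = {
    "R1": ({"glc"}, {"g6p"}, ("or", "geneA", "geneB")),    # two isozymes
    "R2": ({"g6p"}, {"pyr"}, ("and", "geneC", "geneD")),   # two-subunit complex
    "R3": ({"pyr"}, {"biomass_precursor"}, "geneE"),       # single gene
}


def rule_active(rule, knocked_out):
    """Evaluate a gene-reaction rule given a set of knocked-out genes."""
    if isinstance(rule, str):
        return rule not in knocked_out
    op, *args = rule
    results = [rule_active(arg, knocked_out) for arg in args]
    if op == "and":
        return all(results)
    if op == "or":
        return any(results)
    raise ValueError(f"unknown operator in gene rule: {op!r}")


def producible(nutrients, knocked_out=frozenset()):
    """Return every metabolite reachable from the nutrients (network expansion)."""
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for substrates, products, rule in REACTIONS.values():
            if (rule_active(rule, knocked_out)
                    and substrates <= available
                    and not products <= available):
                available |= products
                changed = True
    return available


def grows(nutrients, biomass, knocked_out=frozenset()):
    """Predict growth: all biomass components must be producible."""
    return set(biomass) <= producible(nutrients, knocked_out)


if __name__ == "__main__":
    medium, biomass = {"glc"}, {"biomass_precursor"}
    for genes in [set(), {"geneA"}, {"geneA", "geneB"}, {"geneC"}]:
        print(sorted(genes), grows(medium, biomass, genes))
```

In this toy model, losing one isozyme gene leaves growth intact, while losing both isozymes or any single subunit of the complex abolishes it. A disagreement between such a prediction and an experimental knockout would point to an error in the complex or isozyme annotation, or elsewhere in the network.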