Protein structure and stability deduced from the evolutionary sequence record

8 March 2013

Debora Marks
Department of Systems Biology
Harvard Medical School


The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to functional constraints. This record of evolutionary constraints can, in principle, be exploited for predictive and engineering purposes, and many attempts have been made to do so. I will discuss a new method that allows protein 3D structures, even those of large and transmembrane proteins, to be calculated with surprising accuracy from sequence alone [1-3].

The method is based on the well-explored idea that correlations between mutations in pairs of residues may indicate residue proximity in a protein but, crucially, it deals with the confounding effect of transitive correlations in chains of pairs [4-5]. Such transitive correlations produce statistical noise in the observed data that masks the underlying structure of the network of residue-residue interactions, and they have previously prevented the full potential of the evolutionary information from being exploited.
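The effect of transitive correlations can be seen in a toy calculation. The sketch below is only an illustration under simplifying assumptions (three Gaussian variables standing in for residue positions, with a hand-written covariance matrix); it is not the method of [1-3], and the helper `invert` is a hypothetical name introduced here. Positions 1 and 3 are coupled only through position 2, yet their covariance is non-zero. In the inverse of the covariance matrix, which for Gaussian variables reflects direct couplings, the entry for the pair (1, 3) vanishes.

```python
from fractions import Fraction


def invert(matrix):
    """Invert a small square matrix exactly by Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment the matrix with the identity; Fractions keep the arithmetic exact.
    aug = [
        [Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        # Find a row with a non-zero pivot and swap it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so that the pivot equals 1.
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    # The right-hand half of the augmented matrix is now the inverse.
    return [row[n:] for row in aug]


# Assumed toy chain: x1 = e1, x2 = x1 + e2, x3 = x2 + e3,
# where e1, e2, e3 are independent with unit variance.
covariance = [
    [1, 1, 1],
    [1, 2, 2],
    [1, 2, 3],
]
precision = invert(covariance)

# Positions 1 and 3 look correlated even though they never interact directly.
print("covariance(1, 3) =", covariance[0][2])
# The inverse covariance shows no direct coupling between them.
print("precision(1, 3)  =", precision[0][2])
# The directly coupled neighbours keep a non-zero entry.
print("precision(1, 2)  =", precision[0][1])
```

Real alignments involve discrete amino-acid states and far more positions, so the analogous separation of direct from indirect couplings needs a suitable statistical model; the Gaussian chain only shows why a pairwise correlation on its own can mislead.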

I will explain the basic ideas behind the approach, its application to 3D structures of proteins and their interactions, and (if there is time) new work on plasticity of proteins, their alternative conformations, flexibility and stability. I hope that the audience will suggest additional applications that will benefit from this new window into co-conservation of sequence information across evolution.

More reading on Debbie's blog.


  1. D S Marks, T A Hopf, C Sander, "Protein structure prediction from sequence variation", Nat Biotechnol 30:1072-80 2012.
  2. T A Hopf, L J Colwell, R Sheridan, B Rost, C Sander, D S Marks, "Three-dimensional structures of membrane proteins from genomic sequencing", Cell 149:1607-21 2012.
  3. D S Marks, L J Colwell, R Sheridan, T A Hopf, A Pagnani, R Zecchina, C Sander, "Protein 3D structure computed from evolutionary sequence variation", PLoS ONE 6:e28766 2011.
  4. B G Giraud, J M Heumann, A S Lapedes, "Superadditive correlation", Phys Rev E 59:4983-91 1999.
  5. E T Jaynes, "Probability Theory: The Logic of Science", Cambridge University Press, 2003.
