Protein Structure Prediction and Design

Proteins display remarkable structural and functional diversity, which enables cells to catalyze chemical reactions, regulate cell growth and death, transduce energy into mechanical force, execute complex signaling networks, and carry out the many other processes essential for life. For biological systems to adapt, they must be able to acquire new functions and, consequently, new protein structures. Examining the database of known protein structures reveals that nature reuses protein folds and structural motifs. Some folds are prolific and perform many distinct functions, while others are restricted to one or a few closely related functions. Although proteins with similar structures are likely to have similar functions, the details of specificity and regulation depend on the small differences between globally similar structures. We combine computational and experimental approaches to understand the origin of protein structural diversity and the mechanisms by which protein structure determines functional specificity.

Computationally, we use the Rosetta algorithm, which predicts protein structure by assembling short fragments of known structures. This method has been very successful in large-scale structure prediction tests and has been applied effectively to other modeling problems, including protein design and protein-protein docking. One focus of our research is extending the Rosetta strategy to model structural differences between homologous proteins. Approximately 30% of known sequences have sufficient sequence similarity to a protein of known structure for current homology-based modeling methods to be applied, and this number is expected to increase as high-throughput structural genomics projects continue to map the protein structure universe. Although homology modeling is one of the most reliable methods for structure prediction, related proteins can diverge substantially in structure when their sequence similarity is weak or undetectable. Because sequence and structure divergence between homologous family members is responsible for changes in protein function and specificity, accurately modeling the differences between similar structures is an important application of prediction methods.
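The idea of assembling a structure from short fragments can be illustrated with a minimal sketch. This is not Rosetta's implementation: it is a toy Metropolis Monte Carlo loop over backbone torsion angles, and the function names (assemble, toy_score), the three-residue fragment library, and the scoring function that simply rewards helical angles are all hypothetical choices made for illustration.

```python
import math
import random


def toy_score(torsions):
    """Hypothetical score: squared distance of each (phi, psi) pair from
    helical angles (-60, -45). Lower is better. Not a real energy function."""
    return sum((phi + 60.0) ** 2 + (psi + 45.0) ** 2 for phi, psi in torsions) / 1000.0


def assemble(n_residues, fragments, score, n_steps=2000, temperature=1.0, seed=0):
    """Toy fragment assembly by Metropolis Monte Carlo.

    fragments maps a start position to a list of candidate fragments,
    each a list of (phi, psi) tuples taken from a (hypothetical) library.
    Returns the lowest-scoring torsion list seen and its score.
    """
    rng = random.Random(seed)
    # Start from an extended chain.
    torsions = [(-150.0, 150.0)] * n_residues
    current = score(torsions)
    best, best_score = list(torsions), current
    positions = sorted(fragments)

    for _ in range(n_steps):
        start = rng.choice(positions)
        frag = rng.choice(fragments[start])
        if start + len(frag) > n_residues:
            continue  # skip fragments that would run past the chain end
        # Replace the torsions in the chosen window with the fragment's torsions.
        trial = torsions[:start] + list(frag) + torsions[start + len(frag):]
        trial_score = score(trial)
        delta = trial_score - current
        # Metropolis criterion: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            torsions, current = trial, trial_score
            if current < best_score:
                best, best_score = list(torsions), current
    return best, best_score


# Usage: a 9-residue chain with helix-like and strand-like 3-residue fragments
# available at every window position.
helix = [(-60.0, -45.0)] * 3
strand = [(-120.0, 130.0)] * 3
library = {start: [helix, strand] for start in range(9 - 3 + 1)}
best, best_score = assemble(9, library, toy_score)
print(best_score)
```

In a real setting the fragments would be drawn from known structures with locally similar sequences, and the score would be a physically motivated energy function rather than this placeholder.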

The evolution of diversity in protein structures can be traced by sequence-based methods, and comparison of known protein structures provides ample evidence that protein folds evolve by accumulating insertions, deletions, and conversions of secondary structure elements. Protein pairs that exhibit structural similarity in the absence of significant sequence similarity, however, may have diverged from a common ancestor to such an extent that detectable sequence homology has been lost, or may instead be the product of convergent evolution. Additionally, closely related family members often have multiple insertions and/or deletions in addition to significant local sequence changes, complicating analysis of sequence-structure relationships. A second research focus is the rational design and biophysical characterization of sequence-structure pairs that differ by a single insertion or deletion. The structural, functional, and energetic consequences of a single insertion or deletion event can be examined experimentally in such designed proteins, which makes them ideal model systems for improving computational methods that predict structural perturbations. Rational design also permits putative pathways of structure evolution to be recreated. This allows us to investigate the extent to which nature recycles elements in protein folds and functions, and may make it possible to exploit such modular design to engineer novel protein folds and functions.