A network-based representation of protein fold space. Spencer Bliven. Qualifying Examination. 6/6/2011. Overview. Background & Motivation. Preliminary Research. Proposed Future Research. Fold Space. What protein folds are possible? Discrete or Continuous? Both? Neither?
A network-based representation of protein fold space Spencer Bliven Qualifying Examination 6/6/2011
Overview • Background & Motivation • Preliminary Research • Proposed Future Research
Fold Space • What protein folds are possible? • Discrete or Continuous? Both? Neither? • What portion of fold space is utilized by nature? • Long debated questions. Why? • Understanding of structure-function relationship • Protein design/engineering • Protein evolution • Classification
Previous Work • Orengo, Flores, Taylor, Thornton. Protein Eng (1993) vol. 6 (5) pp. 485-500 • Holm and Sander. J Mol Biol (1993) vol. 233 (1) pp. 123-38 • Holm and Sander. Science (1996) vol. 273 (5275) pp. 595-603 • Shindyalov and Bourne. Proteins (2000) vol. 38 (3) pp. 247-60 • Hou, Sims, Zhang, Kim. PNAS (2003) vol. 100 (5) pp. 2386-90 • Taylor. Curr Opin Struct Biol (2007) vol. 17 (3) pp. 354-61 • Sadreyev et al. Curr Opin Struct Biol (2009) vol. 19 (3) pp. 321-8 [Figure labels: α, β, α/β, α+β]
Why can we do better? • More structures • Sampling of globular folds “saturated” • Few novel folds being discovered • Geometric arguments for saturation of small protein folds • Recent all-vs-all computation • Cluster sequences to 40% identity • 17,852 representatives (updated weekly) • 189 million FATCAT rigid-body alignments [Figure: PDB content growth chart, total 73503. http://www.rcsb.org/pdb/statistics/contentGrowthChart.do?content=total&seqid=100 Accessed 5/31/2011]
Structural Similarity Graph • Nodes: PDB chains, non-redundant to 40% sequence identity • Edges: FATCAT-rigid alignments • “Significant” edges: • p < 0.001 • Length > 25 • Coverage > 50% • Hierarchically cluster to reduce complexity in visualization [Figure legend: a, b, a/b, a+b, Multi, Membrane, Small]
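The edge criteria on this slide can be sketched as a simple filter over alignment records. This is a minimal illustration, not the pipeline used for the talk: the `Alignment` record, its field names, and the reading of "coverage" as a percentage are assumptions, and the chain names and values in the example are made up.

```python
from collections import defaultdict
from typing import NamedTuple


class Alignment(NamedTuple):
    """Hypothetical record for one pairwise alignment (field names are assumptions)."""
    chain_a: str
    chain_b: str
    p_value: float
    length: int       # number of aligned residues
    coverage: float   # percent of the chain covered (assumed unit)


def is_significant(aln, max_p=0.001, min_length=25, min_coverage=50.0):
    """Apply the three thresholds listed on the slide."""
    return (aln.p_value < max_p
            and aln.length > min_length
            and aln.coverage > min_coverage)


def build_graph(alignments):
    """Return an undirected adjacency map containing only significant edges."""
    graph = defaultdict(set)
    for aln in alignments:
        if is_significant(aln):
            graph[aln.chain_a].add(aln.chain_b)
            graph[aln.chain_b].add(aln.chain_a)
    return dict(graph)


# Made-up example alignments; only the first passes all three thresholds.
example = [
    Alignment("c1", "c2", 1e-5, 120, 80.0),
    Alignment("c2", "c3", 0.01, 150, 90.0),   # p-value too high
    Alignment("c1", "c3", 1e-6, 20, 95.0),    # too short
    Alignment("c3", "c4", 1e-4, 60, 40.0),    # coverage too low
]
example_graph = build_graph(example)
```

Chains with no significant alignment do not appear in the map at all, so singleton nodes would need to be added separately if they matter for later statistics.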
Continuity • Skolnick claims ≤ 7 intermediates between any two proteins • We observe a network diameter of 15 • Can find interesting paths (Grishin. J Struct Biol (2001) vol. 134 (2-3) pp. 167-85)
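For reference, the diameter and the paths mentioned here can be computed with breadth-first search on an unweighted graph. The sketch below assumes the graph is stored as an adjacency map (a representation chosen for illustration) and takes the diameter as the longest shortest path within any connected component; the toy graph is made up. A path of k edges has k - 1 intermediates.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def diameter(graph):
    """Longest shortest path, ignoring pairs in different components."""
    return max((max(bfs_distances(graph, n).values()) for n in graph), default=0)


def shortest_path(graph, source, target):
    """One shortest path as a list of nodes, or None if target is unreachable."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None


# Toy chain A - B - C - D (made-up data).
toy = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
toy_path = shortest_path(toy, "A", "D")
toy_intermediates = len(toy_path) - 2  # nodes strictly between the endpoints
```

Running one BFS per node costs O(V·(V+E)), which is workable for a graph of tens of thousands of nodes but would need a sampled or approximate variant for much larger networks.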
Beta Propellers • Symmetry: C4, C5, C6, C7
Symmetry • Functionally important • Protein evolution (e.g. beta-trefoil) • DNA binding • Allosteric regulation • Cooperativity • Widespread (~20% of proteins) • Focus of algorithmic work [Figure labels: FGF-1 (Lee & Blaber. PNAS 2011); TATA Binding Protein (1TGH); Hemoglobin (4HHB)]
Cross-class example • 3GP6.A • PagP, modifies lipid A • f.4.1 (transmembrane beta-barrel) • 1KT6.A • Retinol-binding protein • b.60.1 (Lipocalins)
Summary of Preliminary Research • Calculated all-vs-all alignment • Prlić A, Bliven S, Rose PW, Bluhm WF, Bizon C, Godzik A, Bourne PE. Pre-calculated protein structure alignments at the RCSB PDB website. Bioinformatics (2010) vol. 26 (23) pp. 2983-2985 • Built network of significant alignments • Approximately matches SCOP classifications • Improved structural alignment algorithms • Identify symmetry, circular permutations, topology independent alignments • Discussed more in report
Future Research • Improve the network • Improve all-vs-all comparison algorithm • Tune parameters during graph generation • Annotate the network & draw biological inferences • Annotate nodes with functional information • Compare with other networks • Create new networks • Enhance structural comparison algorithms
1. Improve all-vs-all comparison algorithm • Need domain decomposition • Use Combinatorial Extension (CE)
2. Tune parameters during graph generation • Don’t use p-values • Shouldn’t compare p-values, statistically* • Not normalized by secondary structure • Not accurate due to the multiple testing problem • Use TM-score • RMSD, normalized to the alignment length • Determine optimal thresholds for “significance” • For instance, train an SVM (* Technically OK here, since the p-value is one-to-one with the FATCAT score)
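As background for the TM-score bullet, here is a minimal sketch of the score as commonly defined by Zhang and Skolnick: a sum over aligned residue pairs of 1 / (1 + (d_i / d0)^2), divided by the target length, with d0 = 1.24 · (L − 15)^(1/3) − 1.8. The function names are hypothetical, the 0.5 Å floor on d0 for short chains is an assumption, and the code scores a single given superposition rather than searching for the best one.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 in angstroms.

    The 0.5 floor for short chains is an assumption made to keep
    the scale positive; it is not taken from the slides.
    """
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, target_length):
    """TM-score of one superposition.

    distances: C-alpha distances (angstroms) for each aligned residue pair.
    target_length: length of the chain used for normalization.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


# A perfect superposition over the full target scores 1.0;
# a perfect superposition over half of the target scores 0.5.
full_match = tm_score([0.0] * 100, 100)
half_match = tm_score([0.0] * 50, 100)
```

Because the sum is divided by the target length rather than the alignment length, short but precise alignments are penalized, which is one reason the score is easier to threshold across chains of different sizes than raw RMSD.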
FATCAT p-value by Class • Performs poorly on all-alpha proteins in the “twilight zone” • Terrible on membrane proteins • Probably reflects non-structural considerations in SCOP assignment
3. Annotate nodes with functional information • SCOP/CATH classifications • GO terms • Metal binding • Ligand binding • Symmetry [Figure legend: a, b, a/b, a+b, Multi, Membrane, Small]
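One way to picture node annotation is a per-node attribute map that later analyses can query, for example to list edges that join different SCOP classes. The sketch below uses the two chains and SCOP identifiers from the cross-class example slide; the third node, the edge list, and the attribute layout are hypothetical.

```python
def scop_class(sccs):
    """SCOP class letter from an identifier such as 'b.60.1'."""
    return sccs.split(".")[0]


def cross_class_edges(edges, annotations):
    """Edges whose endpoints carry different SCOP class letters."""
    return [
        (a, b) for a, b in edges
        if scop_class(annotations[a]["scop"]) != scop_class(annotations[b]["scop"])
    ]


# 3GP6.A and 1KT6.A come from the cross-class slide; HYPO.A is a made-up node.
annotations = {
    "3GP6.A": {"scop": "f.4.1", "name": "PagP"},
    "1KT6.A": {"scop": "b.60.1", "name": "Retinol-binding protein"},
    "HYPO.A": {"scop": "b.60.1", "name": "hypothetical lipocalin"},
}
edges = [("3GP6.A", "1KT6.A"), ("1KT6.A", "HYPO.A")]
crossing = cross_class_edges(edges, annotations)
```

The same map could hold GO terms, ligand or metal binding flags, and symmetry labels under additional keys.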
4. Compare with other networks • Define other types of network over the set of protein representatives • Protein-protein interactions • Co-expression • Correlate to the structural similarities [Figure panels: Structural similarity; Protein-protein interaction]
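A simple first measure of how two networks over the same node set relate is the Jaccard index of their edge sets. This is only one possible choice, shown as a sketch; the slides do not specify a correlation measure, and the node names and edges below are made up.

```python
def edge_set(pairs):
    """Normalize undirected edges so (a, b) and (b, a) compare equal."""
    return {frozenset(pair) for pair in pairs if pair[0] != pair[1]}


def jaccard(edges_a, edges_b):
    """Shared edges divided by all edges present in either network."""
    a, b = edge_set(edges_a), edge_set(edges_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Made-up networks over the same four representatives.
structural = [("P1", "P2"), ("P2", "P3"), ("P3", "P4")]
interaction = [("P2", "P1"), ("P3", "P4"), ("P1", "P4")]
overlap = jaccard(structural, interaction)
```

Two of the four distinct edges are shared here, so the overlap is 0.5. Real networks are sparse, so an observed overlap would need to be compared against a randomized baseline before drawing any conclusion.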
5. Enhance structural comparison algorithms • Improve automated pseudo-symmetry detection • Find topology-independent relationships [Figure label: C3]
Summary • Fold space as network • Improve network creation • Annotate network with functional information • Improve structural similarity detection
Acknowledgments • Bourne Lab: Philip Bourne, Andreas Prlić, Lab & PDB members • Qualifying Exam Committee: Ruben Abagyan, Patricia Jennings, Andy McCammon • Collaborators: Philippe Youkharibache, Jean-Pierre Changeux • Rotation Advisors: Pavel Pevzner, Philip Bourne, José Onuchic & Pat Jennings, Mike MacCoss, Virgil Woods