340 likes | 530 Views
Bioinformatics of Protein Structure. Protein structures often characterized by secondary structure content. All a All b a / b a + b There are tools available (for instance at www.expasy.ch that will allow one to predict secondary structure from sequence data. Sequence/structure.
E N D
Protein structures often characterized by secondary structure content • All a • All b • a/b • a+b • There are tools available (for instance at www.expasy.ch that will allow one to predict secondary structure from sequence data
Sequence/structure • All a-proteins begin to reveal sequence/structure relationship • Coiled-coil proteins exhibit periodicity with hydrophobic residues • Observe hydrophobic moments in membrane proteins
~1/4 of all predicted proteins in a genome are membrane proteins
Common structures found in b structures • Barrels • Propellers • Greek key • Jelly roll (Contains one Greek key) • Helix
Anti-parallel structures exhibit every other amino acid periodicity
Propellers • Variable number of propeller blades http://info.bio.cmu.edu/courses/03231/ProtStruc/b-props.htm
g-crystallin has two domains with identical topology • Protein evolution – motif duplication and fusion
Protein structures containing a and b • Distinction between a/b and a + b • a/b -Mainly parallel beta sheets (beta-alpha-beta units) • a + b - Mainly antiparallel beta sheets (segregated alpha and beta regions)
How many folds are there? Proteins have a common fold if they have the same major secondary structures in the same arrangement and with the same topological connections. To date we know ~26,000 protein structures Within this dataset, 945 folds are recognized http://scop.mrc-lmb.cam.ac.uk/scop/
How many non-folds are there? • http://www.scripps.edu/news/press/013102.html • 30-40% of human genome encodes for “unstructured” native proteins
Transition to structural classifications • Several useful databases link sequence analysis and protein structure information • Since structure is more highly conserved than sequence during evolution, structural alignment algorithms and classifications enable more distant evolutionary relatives to be identified. • CATH and SCOP are two databases that “organize” protein structures, each containing 950-1400 protein superfamilies
Structural Alignments • Various algorithms allow structure vs. structure comparisons • VAST, DALI • CATH (http://www.biochem.ucl.ac.uk/bsm/cath/) also has SSAP and GRATH (one computationally intensive, one not) • [Sequence similarity to structural families for modeling often extracted using PSI-BLAST (Gene3D)]
Pairwise Structure Alignment: SSAP [1,4] Comparison of sequence and structure alignments: [1] Taylor WR, Orengo CA, 1989, Protein structure alignment. J Mol Biol 208:1-22[4] Mueller L, 2003, Protein structure alignment. Paper presentations 27.5:16:30h
Multiple structural alignments • CORA – from CATH (where?) • MultiProt - http://bioinfo3d.cs.tau.ac.il/MultiProt/ • DMAPS – (pre-calculated) http://dmaps.sdsc.edu/ • CE-MC - http://cemc.sdsc.edu/ • Others?
CATH • http://cathwww.biochem.ucl.ac.uk/latest/ • Classification Scheme: Class, Architecture, Topology and Homology • Class – secondary structure composition and packing • Architecture – orientation of secondary structures in 3D, regardless of connectivity • Topology – both orientation and connectivity of secondary structure is accounted for • Homologous superfamily – grouped based on whether an evolutionary relationship exists (clustered at different levels of sequence ID)
CATH hierarchy • Structural alignments To homologous super- Family, then sequence Alignments for sequence Family, and then domains.
Protein structure predictions • Identifying similar protein structures using only amino acid sequence • Modeling an amino acid sequence onto a known protein structure • Ab initio protein structure prediction
Test sequence >rsp2570 MTLDGKTIAILIAPRGTEDVEYVRPKEALTQATVVTVSLEPGEAQTVNGDLDPGATHRVDRTFADVSADAFDGLVIPGGTVGADKIRSSEEAVAFVRGFVSAGKPVAAICHGPWALVEADVLKGREVTSYPSLATDIRNAGGRWVDREVVVDSGLVTSRKPDDLDAFCAKMIEEFAEGVHDGQRRSA
SCOP database • Classification scheme: Class, Fold, Superfamily, and Family, • Class – Type and organization of secondary structure • Fold – Share common core structure, same secondary structure elements in the same arrangement with the same topological connections • Superfamily – share very common structure and function • Family – protein domains share a clear common evolutionary origin as evidenced by sequence identity or similar structure/function
HMM’s are useful at SCOP • For instance, SCOP (http://scop.mrc-lmb.cam.ac.uk/scop/) HMMs are derived from the PDB databank at www.rcsb.org • Identify sequence signatures for specific domains
Modeling protein structure based on homology • SWISS-MODEL • http://swissmodel.expasy.org/ • Using first approach mode, submit test sequence, and use your email • PSI-Blast identifies the most similar sequence with a protein structure, and SWISS-MODEL wraps your input sequence around it • Note you can also specify which structure you would like your sequence to wrap around
Ab initio predictions • Protein folding is a complex problem
Ab initio attempts • Based on Ramachandran plot probabilities • Measure interatomic Interactions – has worked for small proteins <85 aa, which appear to Favor H-bonds and van Der Waal and ignore Electrostatic interactions