http://creativecommons.org/licenses/by-sa/2.0/. From Protein Sequence to Protein Structure. Prof:Rui Alves ralves@cmb.udl.es 973702406 Dept Ciencies Mediques Basiques, 1st Floor, Room 1.08 Website of the Course: http://web.udl.es/usuaris/pg193845/Courses/Bioinformatics_2007/
From Protein Sequence to Protein Structure Prof:Rui Alves ralves@cmb.udl.es 973702406 Dept Ciencies Mediques Basiques, 1st Floor, Room 1.08 Website of the Course:http://web.udl.es/usuaris/pg193845/Courses/Bioinformatics_2007/ Course: http://10.100.14.36/Student_Server/
Outline • Fundamentals of protein structure • From protein sequence to secondary structure • Protein tertiary structure • Predicting protein structure
Predicting protein sequence from DNA sequence • Protein sequence can be predicted by translating the cDNA and using the genetic code.
Proteins are the primary functional manifestation of genomes
DNA sequence → (transcription) → RNA sequence → (translation) → protein sequence → protein structure → protein function
DNA sequence: atgcaaactctttctgaacgcctcaagaagaggcgaattgcgttaaaaatgacgcaaaccgaactggcaaccaaagccggtgttaaacagcaatcaattcaactgattgaagctggagtaaccaagcgaccgcgcttcttgtttgagattgctatggcgcttaactgtgatccggtttggttacagtacggaactaaacgcggtaaagccgcttaa
RNA sequence: augcaaacucuuucugaacgccucaagaagaggcgaauugcguuaaaaaugacgcaaaccgaacuggcaaccaaagccgguguuaaacagcaaucaauucaacugauugaagcuggaguaaccaagcgaccgcgcuucuuguuugagauugcuauggcgcuuaacugugauccgguuugguuacaguacggaacuaaacgcgguaaagccgcuuaa
Protein sequence: MQTLSERLKKRRIALKMTQTELATKAGVKQQSIQLIEAGVTKRPRFLFEIAMALNCDPVWLQYGTKRGKAA
Being able to predict the protein sequence from the gene sequence allows us to predict structure, which in turn helps us understand how the protein does what it does.
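The DNA → RNA → protein flow on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard genetic code and a coding sequence that starts in frame at its first base; the function names `transcribe` and `translate` are made up for this example and do not come from the course material.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Return the mRNA that has the same sequence as the coding DNA strand."""
    return dna.upper().replace("T", "U")


def translate(nucleotides):
    """Translate DNA or RNA, codon by codon, up to the first stop codon."""
    seq = nucleotides.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":  # stop codon: translation ends here
            break
        protein.append(aa)
    return "".join(protein)


# First 30 bases of the DNA sequence shown on the slide.
prefix = "atgcaaactctttctgaacgcctcaagaag"
print(transcribe(prefix))
print(translate(prefix))
```

Real genes need extra care that this sketch ignores: choosing the reading frame, handling introns (hence the use of cDNA), and alternative genetic codes.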
Amino acids are the primary building blocks of proteins • The sequence of AAs is the primary structure of proteins • Sequence determines structure • Amino acids don’t fall neatly into classes • How we casually speak of them can affect the way we think about their behavior. For example, if you think of Cys as a polar residue, you might be surprised to find it in the hydrophobic core of a protein unpaired to any other polar group. But this does happen. • The properties of a residue type can also vary with conditions/environment
Grouping the amino acids by properties Livingstone & Barton, CABIOS, 9, 745-756, 1993.
Proteins are made by controlled polymerization of amino acids • Two amino acids condense to form a dipeptide: a peptide bond is formed and water (H2O) is eliminated • [Figure: condensation of two amino acids with side chains R1 and R2 into a dipeptide, with residue 1, residue 2, the N or amino terminus and the C or carboxy terminus labelled] • If there are more, it becomes a polypeptide • Short polypeptide chains are usually called peptides, while longer ones are called proteins
Outline • Fundamentals of protein structure • From protein sequence to secondary structure • Protein tertiary structure • Predicting protein structure
Repeating backbone torsion angles (φ/ψ) characterize the secondary structure
Secondary structure elements in proteins • A secondary structure element is a contiguous region of a protein sequence characterized by a repeating pattern of main-chain hydrogen bonds and backbone phi/psi angles • These elements reflect the tendency of the backbone to hydrogen bond with itself in a semi-ordered fashion when compacted • alpha-helix (local interactions) • beta-strand (nonlocal interactions)
Principal types of secondary structure found in proteins
Repeating (φ, ψ) values:
Structure                          φ        ψ
α-helix (1-5), right-handed       -63°     -42°
3₁₀ helix (1-4)                   -57°     -30°
Parallel β-sheet                 -119°    +113°
Antiparallel β-sheet             -139°    +135°
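As a rough illustration of how the repeating (φ, ψ) values above separate the secondary structure types, the sketch below assigns a residue to the nearest canonical pair in phi-psi space. This is only a toy nearest-point rule with invented names (`CANONICAL`, `nearest_type`); it is not the DSSP method, which relies on hydrogen-bond patterns.

```python
import math

# Canonical (phi, psi) values in degrees, taken from the table on this slide.
CANONICAL = {
    "alpha-helix": (-63.0, -42.0),
    "3-10 helix": (-57.0, -30.0),
    "parallel beta-sheet": (-119.0, 113.0),
    "antiparallel beta-sheet": (-139.0, 135.0),
}


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def nearest_type(phi, psi):
    """Return the canonical type whose (phi, psi) is closest to the input."""
    def distance(name):
        ref_phi, ref_psi = CANONICAL[name]
        return math.hypot(angle_diff(phi, ref_phi), angle_diff(psi, ref_psi))
    return min(CANONICAL, key=distance)


print(nearest_type(-60.0, -45.0))
print(nearest_type(-140.0, 135.0))
```

A single residue in the helical region is not yet a helix: the definition on the earlier slide requires a contiguous run of residues with repeating angles and hydrogen bonds.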
The alpha-helix: repeating i,i+4 h-bonds • [Figure: helix with residues 1-12 numbered and a hydrogen bond marked, next to a φ-ψ plot (both axes from -180° to 180°) highlighting the right-handed helical region of phi-psi space; α-helix (right-handed): φ = -63°, ψ = -42°] • By DSSP definitions, which of residues 1-12 are in the helix? Does this coincide with the residues in the helical region of phi-psi space?
β strands/sheets • [Figure: β-sheet with residues 49-57 numbered, next to a φ-ψ plot (both axes from -180° to 180°) highlighting the beta-strand region of phi-psi space; parallel β-sheet: φ = -119°, ψ = +113°] • Is this a parallel or anti-parallel sheet? • By DSSP definitions, which of residues 49-57 are in the sheet? Does this coincide with the residues in the beta-strand region of phi-psi space?
Contact maps of protein structures • Both axes are the sequence of the protein • The map shows Cα-Cα distances < 6 Å • Near the diagonal: local contacts in the sequence • Off-diagonal: long-range (nonlocal) contacts • [Figure: contact map and rainbow ribbon diagram, coloured blue to red from N to C terminus, for 1avg, the structure of triabin]
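A contact map like the one described above can be computed directly from Cα coordinates. The sketch below is a minimal version that uses the 6 Å cutoff from the slide; the coordinates are invented toy values (not taken from 1avg), and the `min_separation` parameter used to separate local from long-range contacts is an assumption made for illustration.

```python
import math


def contact_map(ca_coords, cutoff=6.0):
    """Boolean matrix: True where two C-alpha atoms are closer than cutoff (in Å)."""
    n = len(ca_coords)
    return [
        [math.dist(ca_coords[i], ca_coords[j]) < cutoff for j in range(n)]
        for i in range(n)
    ]


def long_range_contacts(cmap, min_separation=4):
    """Contacting pairs (i, j) at least min_separation apart in the sequence."""
    n = len(cmap)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + min_separation, n)
        if cmap[i][j]
    ]


# Toy hairpin: six C-alpha atoms, 3.8 Å apart, folding back on themselves.
toy_coords = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
    (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0),
]
cmap = contact_map(toy_coords)
print(long_range_contacts(cmap))
```

In the toy hairpin the first and last residues are far apart in sequence but close in space, which is exactly the kind of off-diagonal signal the slide points out.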
What is secondary structure and what does it teach? • Secondary structure is the sequence of fold elements in a protein (α-β-loop) - The number and order of secondary structures in the sequence (connectivity) and their arrangement in space define a protein’s fold or topology • If one can predict secondary structure from the primary structure, then this may help in predicting protein function, via evolutionary relationships with known folds
Outline • Fundamentals of protein structure • From protein sequence to secondary structure • Protein tertiary structure • Predicting protein structure
Tertiary structure in proteins • Single polypeptide chain • The number and order of secondary structures in the sequence (connectivity) and their arrangement in space defines a protein’s fold or topology • Pattern of contacts between side chains/backbone also an aspect of tertiary structure • Outer surface and interior
Obvious interactions in native protein structures • hydrophobic interactions • polar interactions (hydrogen bond/salt bridge) • disulfide crosslinks
The protein databank The protein databank is a central repository of protein structures http://www.rcsb.org/pdb/home/home.do
Major structure classification systems SCOP (Structural Classification of Proteins) CATH (Class-Architecture-Topology-Homology) DALI/FSSP (Fold classification based on Structure-Structure Alignment) SCOP and CATH are quite similar and generally combine automated and manual aspects. They are both “curated” by human experts.
Outline • Fundamentals of protein structure • From protein sequence to secondary structure • Protein tertiary structure • Predicting protein structure
The nuts and bolts behind fold prediction: secondary structure prediction • Start from a database of known structures and the database of corresponding sequences, with each residue labelled as α-helix, β-strand or coil • Split both into a training set and a test set of known structures with their corresponding sequences • From the training set, estimate for each amino acid the probabilities p(aa1-coil), p(aa1-helix), p(aa1-strand) … • Predict the secondary structure of the test-set sequences (ACDEFGTYAEE…) and compare with the known structures • Bad predictions: reshuffle training set and test set and repeat until predictions are correct • Good predictions: method ready for a new sequence
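The train / predict / compare loop on this slide can be illustrated with a deliberately simple single-residue propensity model. Everything here is a sketch under stated assumptions: the training sequences and their H (helix), E (strand), C (coil) labels are invented toy data, the function names are hypothetical, and real predictors use sequence windows and much richer statistical or machine-learning models.

```python
from collections import Counter, defaultdict


def train(sequences, structures):
    """Estimate p(state | amino acid) from aligned sequence/structure strings."""
    counts = defaultdict(Counter)
    for seq, ss in zip(sequences, structures):
        for aa, state in zip(seq, ss):
            counts[aa][state] += 1
    return {
        aa: {state: n / sum(c.values()) for state, n in c.items()}
        for aa, c in counts.items()
    }


def predict(seq, probs, default="C"):
    """Assign each residue its most probable state; unseen residues get coil."""
    return "".join(
        max(probs[aa], key=probs[aa].get) if aa in probs else default
        for aa in seq
    )


def accuracy(predicted, observed):
    """Fraction of residues whose predicted state matches the known one."""
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


# Toy training set (invented, not real structures).
train_seqs = ["AAEELLKK", "VVIIYY", "GGPPNN"]
train_ss = ["HHHHHHHH", "EEEEEE", "CCCCCC"]
probs = train(train_seqs, train_ss)

# Toy test set: compare the prediction with the "known" structure.
test_seq, test_ss = "AEVIGP", "HHEECC"
print(predict(test_seq, probs), accuracy(predict(test_seq, probs), test_ss))
```

If the accuracy on the test set is poor, the slide's recipe is to reshuffle the training and test sets and repeat; once it is acceptable, `predict` can be applied to a new sequence.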
How does a fold prediction server work? • The server holds a database of known structures, the database of corresponding sequences, a database of probabilities of aa in secondary structure, and a helix-coil-strand profile folds database • YOUR SEQUENCE is compared with the sequences of known structures • Strong homology: homology-based fold prediction • Weak/no homology: helix-coil-strand profile prediction, followed by fold prediction against the profile folds database
Predicting protein structure • Homology Modeling • Phyre, 3D-JIGSAW, SWISSMODEL • Ab initio Modeling • ROBETTA
How does a homology modeling server work? • The server/program holds a database of known structures and the database of corresponding sequences • Your sequence (…YDVRSEQVENCE…) is searched against the sequence database to find strong homologues • Best possible alignment (sequence + structure) is built: …YDVR-SEQVENCE… aligned with …YDVRMSD-VDNCD… • The sequence to predict is threaded over the known structure according to the alignment • Optimization via energy minimization, etc…
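The threading step relies on knowing which template residue each residue of your sequence sits on. The sketch below derives that mapping from a pairwise alignment, using the example alignment from this slide. The function name `map_target_to_template` is hypothetical, and actual modeling programs do far more (loop building, side-chain placement, energy minimization).

```python
def map_target_to_template(target_aln, template_aln, gap="-"):
    """Map target residue indices to template residue indices (None = no template)."""
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have the same length")
    mapping = {}
    t = s = 0  # running residue indices in target and template
    for a, b in zip(target_aln, template_aln):
        if a != gap:
            # Target residue aligned to a template residue, or to a gap.
            mapping[t] = s if b != gap else None
            t += 1
        if b != gap:
            s += 1
    return mapping


# Alignment from the slide: target on top, template of known structure below.
target_aln = "YDVR-SEQVENCE"
template_aln = "YDVRMSD-VDNCD"
mapping = map_target_to_template(target_aln, template_aln)
print(mapping)
```

Target residues mapped to `None` (here the Q aligned to a gap) have no template coordinates to copy and would have to be modeled some other way, which is one reason alignment quality matters so much.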
Predicting protein structure • Homology Modeling • 3D-JIGSAW,SWISSMODEL • Ab initio Modeling • ROSETTA
Predicting protein structure by ab initio methods • Used when the sequence (…YDVRSEQVENCE…) has NO homologues • The server/program holds a database of structures for smaller amino acid runs and the database of corresponding sequences • Pieces of the sequence are matched against these short runs (…YDVR-SEQ with …YDVRMSD-… and …YPVRMSD-…; …VENCE… with …YDNCD… and …VEQCE…) • Assemble the pieces • Energy minimization & optimization
Accuracy of modelling • Accuracy varies widely • The quality of the model is VERY dependent on the quality of the alignment • Globular proteins are more accurately predicted • Membrane proteins are still a big problem • Homology modelling is “bad” if sequence identity to the template is below 30% • CASP is a biennial meeting where the accuracy of the different methods is assessed • The Baker group is usually more accurate than the others http://www.predictioncenter.org/
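The 30% rule of thumb is easy to check once an alignment is available. The sketch below reads the slide's threshold as percent sequence identity over aligned (non-gap) columns, which is an assumption: other conventions divide by the alignment length or by the shorter sequence. The function names are hypothetical.

```python
def percent_identity(aln_a, aln_b, gap="-"):
    """Percent of identical residues among columns where neither sequence has a gap."""
    pairs = [(a, b) for a, b in zip(aln_a, aln_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def template_looks_usable(aln_a, aln_b, threshold=30.0):
    """Apply the rule of thumb: identity below the threshold is a warning sign."""
    return percent_identity(aln_a, aln_b) >= threshold


# Example alignment from the homology modeling slide.
identity = percent_identity("YDVR-SEQVENCE", "YDVRMSD-VDNCD")
print(round(identity, 1), template_looks_usable("YDVR-SEQVENCE", "YDVRMSD-VDNCD"))
```

A short fragment like this one says little about a whole protein; the measure is meaningful only over the full aligned region used for modeling.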
BLAST Algorithm • Sequences are split into words (default n=3) • Speed, computational efficiency • Scoring of matches done using scoring matrices • HSP = high scoring segment pair • BLAST algorithm extends the initial “seed” hit into an HSP • Local optimal alignment • More than one HSP can be found
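The seed-and-extend idea can be sketched as follows. This is a simplified illustration, not the BLAST implementation: it seeds only on exact word matches (BLAST also uses similar "neighbourhood" words), scores with a toy +1/-1 scheme instead of a scoring matrix such as BLOSUM62, and extends without gaps until the score falls too far below its best value. All names and parameter values are assumptions made for the example.

```python
from collections import defaultdict

MATCH, MISMATCH = 1, -1  # toy scores; real BLAST uses a substitution matrix


def index_words(seq, n=3):
    """Map every word of length n to the positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(seq) - n + 1):
        index[seq[i:i + n]].append(i)
    return index


def score(a, b):
    return MATCH if a == b else MISMATCH


def extend(query, subject, qi, si, n=3, x_drop=2):
    """Ungapped extension of a seed in both directions; returns (score, q_start, q_end)."""
    best = current = sum(score(query[qi + k], subject[si + k]) for k in range(n))
    q_start, q_end = qi, qi + n
    q, s = qi + n, si + n
    while q < len(query) and s < len(subject) and best - current <= x_drop:
        current += score(query[q], subject[s])
        q, s = q + 1, s + 1
        if current > best:
            best, q_end = current, q
    current = best
    q, s = qi - 1, si - 1
    while q >= 0 and s >= 0 and best - current <= x_drop:
        current += score(query[q], subject[s])
        if current > best:
            best, q_start = current, q
        q, s = q - 1, s - 1
    return best, q_start, q_end


def find_hsps(query, subject, n=3, x_drop=2):
    """Return HSPs as (score, query_start, query_end, subject_start), best first."""
    subject_index = index_words(subject, n)
    hsps = set()
    for qi in range(len(query) - n + 1):
        for si in subject_index.get(query[qi:qi + n], []):
            sc, q_start, q_end = extend(query, subject, qi, si, n, x_drop)
            hsps.add((sc, q_start, q_end, si - (qi - q_start)))
    return sorted(hsps, reverse=True)


print(find_hsps("ACDYGHIKL", "WWACDYGHIKLWW"))
```

Indexing words first is what gives the speed mentioned on the slide: only positions that share a word are ever extended, and several seeds can converge on the same HSP or yield more than one.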
Sequence-Structure Hybrid alignments
Sequence alignment:
ACEFGHIKLMNPQRSTVWYAALII….
ACDYGHIKLCQANRSTVWY-ALII….
Using a probability model to predict secondary structure, we can align the secondary structures:
aaaaaaaaalllllaaaaaaaaaa….
aaaaaaaaaaaaaaaaaaaaaaaa….
If 3D structures are available for homologues, then structure can be used to improve the alignment. STRAP does that: http://www.charite.de/bioinf/strap/
Summary • DNA sequence to protein sequence • From protein sequence to secondary structure • Protein tertiary structure • Predicting protein structure
To Do • Second task: Use your genes from the first task and obtain the protein sequences of all real genes. Characterize the proteins physico-chemically, and predict or find their localization and their post-translational modifications. Finish by creating structural models of each of your proteins. • Write a small paper describing all your procedures and results in less than 8 pages, double spaced, in Times New Roman font no smaller than 12 points. Tables (maximum 2) and figures (maximum 5) are allowed and are not included in the page limit. • Organize your paper in the following way: introduction, methods, results, conclusions and discussion, bibliography, tables, figures with figure captions.