11/7/05 Protein Structure: Classification, Databases, Visualization D Dobbs ISU - BCB 444/544X: Protein Structure: Classification, Databases, Visualization
Announcements BCB 544 Projects - Important Dates: • Nov 2 Wed noon - Project proposals due to David/Drena • Nov 4 Fri PM - Approvals/responses & tentative presentation schedule to students • Dec 2 Fri noon - Written project reports due • Dec 5,7,8,9 class/lab - Oral Presentations (20') • (Dec 15 Thurs = Final Exam)
Bioinformatics Seminars • Nov 7 Mon 12:10 IG Faculty Seminar in 101 Ind Ed II: Inborn Errors of Metabolism in Humans & Animal Models, Matt Ellinwood, Animal Science, ISU • Nov 10 Thurs 3:40 Com S Seminar in 223 Atanasoff: Computational Epidemiology, Armin R. Mikler, Univ. North Texas http://www.cs.iastate.edu/~colloq/#t3
Bioinformatics Seminars CORRECTION: Next week - Baker Center/BCB Seminars (seminar abstracts available at above link): • Nov 14 Mon 1:10 PM Doug Brutlag, Stanford: Discovering transcription factor binding sites • Nov 15 Tues 1:10 PM Ilya Vakser, Univ Kansas: Modeling protein-protein interactions • Both seminars will be in Howe Hall Auditorium
Protein Structure & Function: Analysis & Prediction • Mon: Protein structure: classification, databases, visualization • Wed: Protein structure: prediction & modeling • Thurs Lab: Protein structure prediction • Fri: Protein-nucleic acid interactions; protein-ligand docking
Reading Assignment (for Mon-Fri) • Mount Bioinformatics • Chp 10 Protein classification & structure prediction http://www.bioinformaticsonline.org/ch/ch10/index.html • pp. 409-491 • Check errata: http://www.bioinformaticsonline.org/help/errata2.html • Other? Additional reading assignments for BCB 544
Review last lecture: RNA Structure Prediction Algorithms
RNA structure prediction strategies Secondary structure prediction 1) Energy minimization (thermodynamics) 2) Comparative sequence analysis (co-variation) 3) Combined experimental & computational
1) Energy minimization method What are the assumptions? Native tertiary structure or "fold" of an RNA molecule is (one of) its lowest free energy configuration(s) • Gibbs free energy = ΔG in kcal/mol at 37°C = equilibrium stability of structure; lower (more negative) values are more favorable • Is this assumption valid in vivo? This may not hold, but we don't really know
Gibbs free energy: G Gibbs free energy (G) is formally defined in terms of the state functions enthalpy & entropy & the state variable temperature: G = H - TS; ΔG = ΔH - TΔS (for constant temp) • Enthalpy (H) = amount of heat absorbed by a system at constant pressure • Entropy (S) = measure of the amount of disorder or randomness in a system • Note: this is not the same as "entropy" in information theory, but is related, see: http://en.wikipedia.org/wiki/Information_theory
Gibbs free energy: ΔG Gibbs free energy for formation of an RNA or protein structure = ΔG = equilibrium stability of that structure at a specific temperature (kcal/mol at 37°C) • ΔG = -RT ln Keq, where R = gas constant, T = absolute temperature & Keq = equilibrium constant
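A short worked example may help make the relation ΔG = -RT ln Keq concrete. The sketch below is not from the lecture: the function names are made up for illustration, and the "fraction folded" helper assumes a simple two-state (folded vs. unfolded) equilibrium, which is an idealization.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal/(mol*K)
T_37C = 310.15      # 37 °C expressed in kelvin


def keq_from_dg(dg_kcal_per_mol, temperature=T_37C):
    """Equilibrium constant implied by dG = -RT ln(Keq)."""
    return math.exp(-dg_kcal_per_mol / (R_KCAL * temperature))


def dg_from_keq(keq, temperature=T_37C):
    """Free energy change (kcal/mol) implied by an equilibrium constant."""
    return -R_KCAL * temperature * math.log(keq)


def fraction_folded(dg_kcal_per_mol, temperature=T_37C):
    """Folded fraction, ASSUMING a two-state folded/unfolded equilibrium."""
    keq = keq_from_dg(dg_kcal_per_mol, temperature)
    return keq / (1.0 + keq)


if __name__ == "__main__":
    for dg in (0.0, -1.0, -5.0, -10.0):
        print(f"dG = {dg:6.1f} kcal/mol  Keq = {keq_from_dg(dg):12.3g}  "
              f"folded = {fraction_folded(dg):.4f}")
```

Because RT is only about 0.6 kcal/mol at 37°C, a structure that is a few kcal/mol more stable than the alternatives already dominates the equilibrium population.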
Nearest-neighbor parameters Most methods for free energy minimization use nearest-neighbor parameters (derived from experiment) for predicting stability of an RNA secondary structure (in terms of ΔG at 37°C) & most available software packages use the same set of parameters: Mathews, Sabina, Zuker & Turner, 1999
Energy minimization - calculations: Total free energy of a specific conformation for a specific RNA molecule = sum of incremental energy terms for: • helical stacking (sequence dependent) • loop initiation • unpaired stacking (favorable "increments" are < 0) Fig 6.3 Baxevanis & Ouellette 2005
But how many possible conformations for a single RNA molecule? Huge number: Zuker estimates (1.8)^N possible secondary structures for a sequence of N nucleotides; for 100 nts (small RNA…) = 3 × 10^25 structures! Solution? Not exhaustive enumeration… • Dynamic programming: O(N^3) in time & O(N^2) in space/storage iff pseudoknots excluded; otherwise O(N^6) in time & O(N^4) in space
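To illustrate why dynamic programming avoids enumerating ~1.8^N structures, here is a minimal sketch. Note that it is not the Mfold energy-minimization recursion: it swaps in the simpler Nussinov-style recursion, which only maximizes the number of base pairs (no nearest-neighbor energies, no pseudoknots). The function names and the minimum hairpin loop size of 3 are assumptions made for this example.

```python
# Watson-Crick pairs plus the G-U wobble pair
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def estimated_structures(n, base=1.8):
    """Rough count of secondary structures for n nucleotides (1.8^n estimate)."""
    return base ** n


def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs (Nussinov-style recursion).

    O(N^3) time and O(N^2) space; pseudoknots are excluded by construction.
    """
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = max pairs achievable within seq[i..j] (inclusive)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]              # case 1: j is unpaired
            for k in range(i, j - min_loop):    # case 2: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


if __name__ == "__main__":
    print(f"{estimated_structures(100):.2e} structures estimated for 100 nts")
    print(max_base_pairs("GGGAAAUCC"), "pairs in a small hairpin")
```

The three nested loops over `span`, `i` and `k` give the O(N^3) running time, and the `best` table gives the O(N^2) storage quoted above.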
Algorithms based on energy minimization For outline of algorithm used in Mfold, including description of dynamic programming recursion, please visit Michael Zuker's lecture: http://www.bioinfo.rpi.edu/~zukerm/lectures/RNAfold-html From this site, you may also download Zuker's lecture as either PDF or PS file.
2) Comparative sequence analysis (co-variation) Two basic approaches: • Algorithms constrained by initial alignment: much faster, but not as robust as unconstrained; base-pairing probabilities determined by a partition function • Algorithms not constrained by initial alignment: genetic algorithms often used for finding an alignment & set of structures
RNA structure prediction strategies Tertiary structure prediction Requires "craft" & significant user input & insight • Extensive comparative sequence analysis to predict tertiary contacts (co-variation) e.g., MANIP - Westhof • Use experimental data to constrain model building e.g., MC-SYM - Major • Homology modeling using sequence alignment & reference tertiary structure (not many of these!) • Low resolution molecular mechanics e.g., yammp - Harvey
New Last Time: Protein Structure & Function
Protein Structure & Function • Protein structure - primarily determined by sequence • Protein function - primarily determined by structure • Globular proteins: compact hydrophobic core & hydrophilic surface • Membrane proteins: special hydrophobic surfaces • Folded proteins are only marginally stable • Some proteins do not assume a stable "fold" until they bind to something = Intrinsically disordered • Predicting protein structure and function can be very hard (& fun!)
4 Basic Levels of Protein Structure
Primary & Secondary Structure • Primary • Linear sequence of amino acids • Description of covalent bonds linking aa's • Secondary • Local spatial arrangement of amino acids • Description of short-range non-covalent interactions • Periodic structural patterns: α-helix, β-sheet
Tertiary & Quaternary Structure • Tertiary • Overall 3-D "fold" of a single polypeptide chain • Spatial arrangement of 2° structural elements; packing of these into compact "domains" • Description of long-range non-covalent interactions (plus disulfide bonds) • Quaternary • In proteins with > 1 polypeptide chain, spatial arrangement of subunits
"Additional" Structural Levels • Super-secondary elements • Motifs • Domains • Foldons
New Today: • Protein Structure & Function • Amino acid characteristics • Structural classes & motifs • Protein functions & functional families (not much - more on this later) • Classification • Databases • Visualization
Amino Acids • Each of 20 different amino acids has a different "R-group" (side chain) attached to Cα
Peptide bond is rigid and planar
Hydrophobic Amino Acids
Charged Amino Acids
Polar Amino Acids
Certain side-chain configurations are energetically favored (rotamers) • Ramachandran plot: "allowable" backbone psi (ψ) & phi (φ) angles
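The phi and psi angles plotted in a Ramachandran plot are dihedral (torsion) angles defined by four consecutive backbone atoms: phi by C(i-1), N(i), Cα(i), C(i) and psi by N(i), Cα(i), C(i), N(i+1). The sketch below shows one common way to compute a dihedral angle from four 3-D points; the function names are made up for this example and the coordinates in the demo are toy values, not taken from a real structure.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)          # unit vector along the central bond
    # Components of the outer bonds perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev, n, ca, c, n_next):
    """Backbone (phi, psi) of one residue from five atom positions."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)


if __name__ == "__main__":
    # Toy geometry: rotate the last atom by 90 degrees about the z axis
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

With real coordinates (for example, atom records read from a PDB file), `phi_psi` would be called once per residue and the resulting pairs plotted against each other.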
Glycine is smallest amino acid: R group = H atom • Glycine residues increase backbone flexibility because their side chain is only an H atom
Proline is cyclic • Proline residues reduce flexibility of polypeptide chain • Proline cis-trans isomerization is often a rate-limiting step in protein folding • Recent work suggests it may also regulate ligand binding in native proteins - Andreotti
Cysteines can form disulfide bonds • Disulfide bonds (covalent) stabilize 3-D structures • In eukaryotes, disulfide bonds are generally found only in secreted proteins or extracellular domains
Globular proteins have a compact hydrophobic core • Packing of hydrophobic side chains into interior is main driving force for folding • Problem? Polypeptide backbone is highly polar (hydrophilic) due to polar -NH and C=O in each peptide unit; these polar groups must be neutralized • Solution? Form regular secondary structures, e.g., α-helix, β-sheet, stabilized by H-bonds
Exterior surface of globular proteins is generally hydrophilic • Hydrophobic core formed by packed secondary structural elements provides compact, stable core • "Functional groups" of protein are attached to this framework; exterior has more flexible regions (loops) and polar/charged residues • Hydrophobic "patches" on protein surface are often involved in protein-protein interactions
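One simple way to explore the core-versus-surface idea computationally is to score a sequence with a hydropathy scale. The sketch below uses the Kyte-Doolittle scale, which is not mentioned in the lecture and is swapped in here only as a widely used example; the function names, the window size of 7 and the demo peptide are assumptions for illustration.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq):
    """Average hydropathy of a peptide (one-letter amino acid codes)."""
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq.upper()) / len(seq)


def window_hydropathy(seq, window=7):
    """Mean hydropathy of every full-length sliding window along the sequence."""
    return [mean_hydropathy(seq[i:i + window])
            for i in range(len(seq) - window + 1)]


if __name__ == "__main__":
    peptide = "MKTLLVAGIVLDEKRS"   # made-up example sequence
    print(f"mean hydropathy: {mean_hydropathy(peptide):.2f}")
    for start, score in enumerate(window_hydropathy(peptide)):
        print(start, f"{score:6.2f}")
```

Stretches with high positive window scores are candidates for buried or membrane-spanning segments, while strongly negative stretches are more likely to be solvent exposed; this is only a rough heuristic, not a structure prediction.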
Protein Secondary Structures • Helix • Sheets • Loops • Coils
α-Helix • Most abundant 2° structure in proteins • Average length = 10 aa's (~15 Angstroms, at a rise of 1.5 Angstroms per residue) • Length varies from 5-40 aa's • Alignment of H-bonds creates dipole moment (partial positive charge at NH end) • Often at surface of core, with hydrophobic residues on inner-facing side, hydrophilic on other side
α-helix is stabilized by H-bonds between ~ every 4th residue (atom colors in figure: C = black, O = red, N = blue)
R-groups are on outside of helix
Types of helices • "Standard" α-helix: 3.6 residues per turn • H-bonds between C=O of residue n and NH of residue n + 4 • Helix ends are polar; almost always on surface of protein • Other types of helices? • n + 5 = π helix • n + 3 = 3₁₀ helix
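The figure of 3.6 residues per turn means each residue is rotated about 100° around the helix axis relative to the previous one, which is why residues spaced 3-4 apart end up on the same face of an α-helix. The sketch below works through that arithmetic; the function names and the 60° "same face" tolerance are assumptions chosen for illustration, and the 1.5 Å rise per residue is the standard value for an ideal α-helix.

```python
DEGREES_PER_RESIDUE = 100.0   # 360 degrees / 3.6 residues per turn
RISE_PER_RESIDUE = 1.5        # Angstroms along the helix axis (ideal alpha-helix)


def wheel_angle(index):
    """Angular position (degrees) of residue `index` on a helical wheel."""
    return (index * DEGREES_PER_RESIDUE) % 360.0


def helix_length(n_residues):
    """Approximate length in Angstroms of an ideal alpha-helix."""
    return n_residues * RISE_PER_RESIDUE


def same_face(i, j, tolerance=60.0):
    """True if residues i and j point in roughly the same direction."""
    diff = abs(wheel_angle(i) - wheel_angle(j)) % 360.0
    return min(diff, 360.0 - diff) <= tolerance


if __name__ == "__main__":
    print("10-residue helix length:", helix_length(10), "Angstroms")
    for j in range(1, 9):
        print(f"residue 0 vs {j}: angle {wheel_angle(j):5.1f}  "
              f"same face: {same_face(0, j)}")
```

Running the loop shows the i, i+3, i+4, i+7 pattern that underlies amphipathic helices, where hydrophobic residues line one side and hydrophilic residues the other.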
Certain amino acids are "preferred" & others are rare in helices • Ala, Glu, Leu, Met = good helix formers • Pro, Gly, Tyr, Ser = very poor • Amino acid composition & distribution varies, depending on location of helix in 3-D structure
β-Strands & Sheets • H-bonds formed between 5-10 consecutive residues in one portion of chain and another set of 5-10 residues farther down chain • Interacting regions may be adjacent (with short loop between) or far apart • β-sheets usually have all strands either parallel or antiparallel
Antiparallel β-sheet
Parallel β-sheet
Mixed β-sheets also occur
Loops • Connect helices and sheets • Vary in length and 3-D configurations • Are located on surface of structure • Are more "tolerant" of mutations • Are more flexible and can adopt multiple conformations • Tend to have charged and polar amino acids • Are frequently components of active sites • Some fall into distinct structural families (e.g., hairpin loops, reverse turns)
Coils • Regions of 2° structure that are not helices, sheets, or recognizable turns • Intrinsically disordered regions appear to play important functional roles
Globular proteins are built from recurring structural patterns • Motifs or supersecondary structures = combinations of 2° structural elements • Domains = combinations of motifs • Independently folding unit (foldon) • Functional unit
A few common structural motifs • Helix-turn-helix: e.g., DNA binding • Helix-loop-helix: e.g., calcium binding • β-hairpin: 2 adjacent antiparallel strands connected by short loop • Greek key: 4 adjacent antiparallel strands • β-α-β: 2 parallel strands connected by helix