11/16/05 Genomics


Presentation Transcript


  1. 11/16/05 Genomics
     D Dobbs ISU - BCB 444/544X: Genomics

  2. Bioinformatics Seminars
     • Nov 17 Thurs 4:10 BBMB Seminar in 1414 MBB: C2 and PH Domains: Diverse regulators of membrane signaling events. Joe Falke, UC Boulder
     • Nov 18 Fri 12:10 BCB Seminar in E164 Lago: Using P-Values for the Planning and Analysis of Microarray Experiments. Dan Nettleton, Stat

  3. Protein Structure Prediction; Genome Analysis
     • Mon: Protein 3° (tertiary) structure prediction
     • Wed: Genome analysis & genome projects; comparative genomics; ENCODE, SNPs, HapMaps, medical genomics
     • Thur Lab: Protein structure prediction, SNPs
     • Fri: Experimental approaches: microarrays, proteomics, metabolomics, chemical genomics

  4. Reading Assignment (for Mon-Fri)
     • Mount, Bioinformatics
     • Chp 11 Genome Analysis, pp. 495-547: http://www.bioinformaticsonline.org/ch/ch11/index.html
     • Check errata: http://www.bioinformaticsonline.org/help/errata2.html

  5. BCB 544 Additional Readings
     • Required:
       • Gene Prediction: Burge & Karlin 1997 JMB 268:78, Prediction of complete gene structures in human genomic DNA
       • Human HapMap (Nature 437, Oct 27, 2005)
         • Commentary (437:1233) http://www.nature.com/nature/journal/v437/n7063/full/4371233a.html
         • News & Views (437:1241) http://www.nature.com/nature/journal/v437/n7063/full/4371241a.html
     • Optional:
       • Article (437:1299) A haplotype map of the human genome, The International HapMap Consortium

  6. Review last lecture: Protein Structure Prediction; focus on: Threading

  7. Protein Structure Prediction using Threading
     (figure labels: Target Sequence ALKKGF…HFDTSE; Structure Templates)
     • Align the target sequence with template structures (fold library) from the Protein Data Bank (PDB)
     • Calculate an energy (score) to evaluate goodness of fit between the target sequence and each template structure
     • Rank models based on energy scores (assumption: native-like structures have the lowest energy)

  8. Protein Threading: typical energy function
     (example sequence: MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE)
     • Ep (pairwise term): What is the "probability" that two specific residues are in contact?
     • Es (single-residue term): How well does a specific residue fit its structural environment?
     • Eg (gap term): Alignment gap penalty?
     Total energy: Ep + Es + Eg
     Find a sequence-structure alignment that minimizes the energy function

  9. A Rapid Threading Approach for Protein Structure Prediction
     Kai-Ming Ho, Physics; Haibo Cao; Yungok Ihm; Zhong Gao; James Morris; Cai-zhuang Wang; Drena Dobbs, GDCB; Jae-Hyung Lee; Michael Terribilini; Jeff Sander

  10. Trick for Fast Threading?
      (target sequence: ALKKGF…HFDTSE)
      • Sequence – Structure (1D – 3D problem)
      • Sequence – Contact Matrix (1D – 2D problem)
      • Sequence – 1D Profile (1D – 1D problem)
      Ihm 2004

  11. Template structure (reduced) representation
      (figure: chain of residues 1 … i … j … N)
      Template structure (contact matrix):
      $C_{ij} = 1$ if $r_{ij} < 6.5$ Å (contact)
      $C_{ij} = 0$ otherwise, or if $i$ and $j$ are neighbors in sequence (non-contact)
      Ihm 2004

  12. Energy function
      • Assumption: at the residue level, pairwise hydrophobic interaction is dominant:
        $E = \sum_{i,j} C_{ij} U_{ij}$
        $C_{ij}$: contact matrix; $U_{ij} = U(\text{residue } i, \text{residue } j)$
      • MJ matrix: $U = U_{ij}$
      • LTW: $U = Q_i Q_j$
      • HP model: $U = \{1, 0\}$
      Ihm 2004
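
The energy function on slide 12 can be sketched in a few lines. This is a minimal illustration, not the authors' code: the hydrophobic residue set, the `q` values, and the choice to count each contact once (i < j) are all assumptions made here; summing over every ordered pair would simply double the result.

```python
# Illustrative hydrophobic set; the slides do not say which residues count.
HYDROPHOBIC = set("ACFILMVW")

def hp_pair(a, b):
    """HP model: U = 1 if both residues are hydrophobic, else 0."""
    return 1 if a in HYDROPHOBIC and b in HYDROPHOBIC else 0

def ltw_pair(q):
    """LTW-style model: U = Q_i * Q_j, with one parameter per residue type."""
    return lambda a, b: q[a] * q[b]

def contact_energy(seq, contacts, pair_energy):
    """E = sum over i < j of C[i][j] * U(seq[i], seq[j])."""
    n = len(seq)
    return sum(contacts[i][j] * pair_energy(seq[i], seq[j])
               for i in range(n) for j in range(i + 1, n))

# Toy 4-residue template: contacts between (0,2), (0,3) and (1,3).
C = [[0, 0, 1, 1],
     [0, 0, 0, 1],
     [1, 0, 0, 0],
     [1, 1, 0, 0]]
seq = "LKVI"

hp_energy = contact_energy(seq, C, hp_pair)

# Hypothetical Q values, chosen only to make the arithmetic easy to follow.
q = {"L": 2.0, "K": -1.0, "V": 1.5, "I": 2.0}
ltw_energy = contact_energy(seq, C, ltw_pair(q))
```

With the HP model the function counts hydrophobic-hydrophobic contacts; swapping in a different `pair_energy` callable changes the potential without touching the sum.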

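  13. Contact energy: pairwise interactions
      • Miyazawa-Jernigan (MJ) matrix: statistical potential, 210 parameters (20 × 21 / 2 residue pairs)
        Row/column labels shown on the slide: C M F I L V W …
        Values shown on the slide (excerpt, in slide order): 0.46 0.54 -0.20 0.49 -0.01 0.06 0.57 0.01 0.03 -0.08 0.52 0.18 0.10 -0.01 -0.04
      • Li-Tang-Wingreen (LTW): 20 parameters
      Contact energy: [equation not preserved in transcript]
      Ihm 2004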

  14. Contact Energy Scoring Function
      (figure: chain of residues 1 … i … j … N; diagram labels: Template Structure, Contact Matrix, Sequence, Sequence Vector, Contact Energy)
      $C_{ij} = 1$ if $r_{ij} < 6.5$ Å
      $C_{ij} = 0$ otherwise (or a neighbor in sequence)
      Sequence: AVFMRIHNDIVYNDIANTTQ
      Cao et al. Polymer 45 (2004); Ihm 2004
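
A minimal sketch of the contact-matrix rule on slide 14, assuming one 3-D point per residue. The slides do not say whether that point is the Cα atom or a side-chain centroid, and the coordinates below are hypothetical.

```python
import math

def contact_matrix(coords, cutoff=6.5):
    """C[i][j] = 1 if residues i and j are closer than `cutoff` angstroms
    and are neither the same residue nor neighbors in sequence; else 0."""
    n = len(coords)
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 2, n):  # skip self (j == i) and neighbors (j == i + 1)
            if math.dist(coords[i], coords[j]) < cutoff:
                c[i][j] = c[j][i] = 1  # the matrix is symmetric
    return c

# Hypothetical coordinates: four residues on the corners of a 3.8 Å square.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
C = contact_matrix(coords)
```

In this toy case residues 0-2, 0-3 and 1-3 are in contact, while the sequence neighbors 0-1, 1-2 and 2-3 are set to 0 even though they are within the cutoff.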

  15. Hydrophobic Contacts
      Quantities defined on the slide (symbols not preserved in the transcript):
      • contact matrix
      • i-th eigenvalue of the contact matrix
      • i-th eigenvector
      • eigenvector with the biggest eigenvalue
      • protein sequence of the template structure
      • fraction of hydrophobic contacts from the i-th eigenvector
      1D profile? The first eigenvector of the contact matrix
      Ihm 2004
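
One way to obtain the first eigenvector of a symmetric contact matrix is power iteration, sketched below. This technique is swapped in here for illustration; the slides do not say how the eigenvectors were computed. The function name and the unit shift are assumptions of this sketch.

```python
import math

def principal_eigenvector(matrix, iterations=500):
    """Approximate the eigenvector with the largest eigenvalue of a
    symmetric, non-negative matrix by power iteration on (matrix + I).
    The identity shift leaves eigenvectors unchanged but keeps the
    iteration from oscillating when eigenvalues come in +/- pairs."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [v[i] + sum(matrix[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Toy contact matrix: the middle residue touches both ends.
C = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
profile = principal_eigenvector(C)
```

The residue with the most contacts gets the largest component, which is why this vector can serve as a per-residue 1D profile of burial.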

  16. Weights of eigenvectors for real proteins • First eigenvector of contact matrix dominates the overlap between sequence and structure • Higher ranking (rank > 4) eigenvectors are “sequence blind”

  17. Fast threading alignment algorithm
      (figure labels: 1D Profile, New profile)
      • Maximize the overlap between the sequence and the profile, allowing gaps
      • Calculate the contact energy using the alignment
      Ihm 2004
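
The overlap maximization on slide 17 can be illustrated with a standard global-alignment dynamic program. This is a sketch under stated assumptions, not the published algorithm: the per-residue sequence values, the product score, and the single uniform gap penalty are all chosen here for simplicity (slide 18 describes structure-dependent gap penalties instead).

```python
def best_overlap(seq_values, profile, gap=0.5):
    """Maximum of sum(seq_values[i] * profile[j]) over aligned pairs,
    minus `gap` for every unaligned position, via dynamic programming."""
    n, m = len(seq_values), len(profile)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = -gap * i          # leading gaps in the profile
    for j in range(1, m + 1):
        score[0][j] = -gap * j          # leading gaps in the sequence
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + seq_values[i - 1] * profile[j - 1],  # align i with j
                score[i - 1][j] - gap,                                      # skip a sequence residue
                score[i][j - 1] - gap,                                      # skip a profile position
            )
    return score[n][m]

# Hypothetical values: 1 marks a hydrophobic residue or a buried position.
same_length = best_overlap([1, 0, 1], [1, 0, 1])
with_gap = best_overlap([1, 1], [1, 0, 1])
```

In the second call the best alignment skips the middle profile position, pairing both hydrophobic residues with buried positions at the cost of one gap.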

  18. Parameters for alignment?
      (figure labels: ALKKGFG…HFDTSE; Loop; Helix or Strand)
      • Gap penalty:
        • Insertions/deletions in helices or strands are strongly penalized; small penalties for indels in loops
        • but, gap penalties do not count in the energy calculation
      • Size penalty:
        • If a target residue & the aligned template residue differ in radius by > 0.5 Å & the residue is involved in > 2 contacts, the alignment contribution is penalized
        • but, size penalties do not count in the energy calculation
      Ihm 2004

  19. How to incorporate secondary structure?
      • Predict the secondary structure of the target sequence (PSIPRED, PROF, JPRED, SAM, GOR V)
      • $N_+$ = number of matches between the predicted 2° structure of the target & the 2° structure of the template
      • $N_-$ = number of mismatches
      • $N_s$ = number of residues selected in the alignment
      • "Global fitness": $f = 1 + (N_+ - N_-) / N_s$
      • $E_{modify} = f \cdot E_{threading}$
      Ihm 2004
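
A small sketch of the global fitness factor from slide 19. It assumes the two secondary-structure strings are already aligned, use "-" for gap positions, and that every aligned pair counts as either a match or a mismatch; these conventions and the energy value are not from the slides.

```python
def global_fitness(predicted_ss, template_ss):
    """f = 1 + (N+ - N-) / Ns for two aligned secondary-structure strings
    (for example H = helix, E = strand, C = coil, '-' = gap)."""
    if len(predicted_ss) != len(template_ss):
        raise ValueError("aligned strings must have equal length")
    pairs = [(p, t) for p, t in zip(predicted_ss, template_ss)
             if p != "-" and t != "-"]
    if not pairs:
        raise ValueError("no aligned residues")
    n_plus = sum(1 for p, t in pairs if p == t)   # matches
    n_minus = len(pairs) - n_plus                  # mismatches
    return 1 + (n_plus - n_minus) / len(pairs)

# Five of six aligned positions agree: f = 1 + (5 - 1) / 6.
f = global_fitness("HHHEEC", "HHHEEE")
e_threading = -12.0            # hypothetical threading energy
e_modify = f * e_threading
```

Because f ranges from 0 (all mismatches) to 2 (all matches), a negative threading energy becomes more favorable when the secondary structures agree and is damped toward zero when they do not.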

  20. Finally, calculate a "relative" score: how much better is this "fit" than random?
      • $E_{modify}$: Sequence vs Structure (adjusted for 2° structure match)
      • $E_{shuffled}$: Shuffled Sequence vs Structure (randomize the order of amino acids in the target sequence 50-200X, calculate the score for each shuffled sequence, take the average = $E_{shuffled}$)
      • $E_{relative} = E_{modify} - E_{shuffled}$
      Ihm 2004
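
The shuffling baseline on slide 20 can be sketched as follows. `energy_fn` is a hypothetical stand-in for whatever computes the adjusted score of a sequence against a fixed template; the function name, the seed, and the default of 100 shuffles (within the 50-200 range on the slide) are assumptions.

```python
import random

def relative_energy(seq, energy_fn, n_shuffles=100, seed=0):
    """E_relative = E(seq) - mean E(shuffled seq) over n_shuffles
    random permutations of the same residues."""
    rng = random.Random(seed)       # fixed seed for reproducibility
    residues = list(seq)
    total = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(residues)       # same composition, random order
        total += energy_fn("".join(residues))
    e_shuffled = total / n_shuffles
    return energy_fn(seq) - e_shuffled

# Toy energy that depends only on composition, so shuffling cannot change it.
def composition_only(s):
    return -float(s.count("L"))

baseline = relative_energy("ALKLGF", composition_only)
```

A score that depends only on amino-acid composition gives a relative energy of zero, which is the point of the correction: only order-dependent fit to the template survives the subtraction.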

  21. Performance Evaluation? In a "Blind Test"
      CASP5 Competition (Critical Assessment of Protein Structure Prediction)
      • Given: amino acid sequence
      • Goal: predict 3-D structure (before experimental results are published)

  22. Typical Results (well, actually, our BEST results): HO = top-ranked CASP5 prediction for this target!
      • Target 174, PDB ID = 1MG7
      (figure: Predicted Structure vs. Actual Structure)
      Ihm 2004

  23. Overall Performance in CASP5 Contest: Ho = 8th out of ~180 (by M. Levitt, Stanford)
      FR: Fold Recognition (targets manually assessed by Nick Grishin)

      Rank  Z-Score  Ngood  Npred  NgNW  NpNW  Group-name
      1     24.26    9.00   12.00  9     12    Ginalski
      2     21.64    7.00   12.00  7     12    Skolnick & Kolinski
      3     19.55    8.00   12.50  9     14    Baker
      4     16.88    6.00   10.00  6     10    BIOINFO.PL
      5     15.25    7.00    7.00  7      7    Shortle
      6     14.56    6.50   11.50  7     13    BAKER-ROBETTA
      7     13.49    4.00   11.00  4     11    Brooks
      8     11.34    3.00    6.00  3      6    Ho-Kai-Ming
      9     10.45    3.00    5.50  3      6    Jones-NewFold

      • FR NgNW: number of good predictions without weighting for multiple models
      • FR NpNW: number of total predictions without weighting for multiple models
      M Levitt 2004

  24. Protein Structure Prediction Servers & Software
      Three basic approaches:
      • 1) Homology modeling (need >30% sequence identity)
        • PredictProtein META, SWISS-MODEL, Cn3D
      • 2) Threading (if <30% sequence identity)
        • Best? Hmm - see CASP & EVA
      • 3) Ab initio (if no template available & many CPUs)
        • Best? Rosetta (Baker) - see CASP & EVA
      But, 27 Sep 2005, Baker: "I think the best group may be Ginalski…" (CASP5 & CASP6)

  25. Protein Structure Prediction META servers & Evaluation
      • Best approach for protein structure prediction? Try several servers
      • How? Submit to a META-server:
        • PredictProtein META
        • 3-D Jury (BioInfoBank) META
      • Also, check continuous benchmarking sites:
        • EVA
        • LiveBench

  26. Baker & Sali (2000)

  27. New today: Genomics

  28. Genomics - for excellent overview lectures, see these posted by NHGRI & Pevsner:
      • 1- Genomic sequencing: Mapping and Sequencing (CTGA2005Lecture1.pdf), Eric Green, NHGRI
      • 2- Human genome project: The Human Genome (2005-10-19_ch17.pdf), Jonathan Pevsner, Kennedy Krieger Institute
      • 3- SNPs: Studying Genetic Variation II: Computational Techniques (CTGA2005Lecture13.pdf), Jim Mullikin, NHGRI
      • 4- Comparative Genomics: Comparative Sequence Analysis (CTGA2005Lecture8.pdf), Elliott Margulies, NHGRI
