Predicting PDZ domain protein-protein interactions from the genome Gary Bader Donnelly Centre for Cellular and Biomolecular Research University of Toronto VanBUG, Vancouver, Jan.8.2009 http://baderlab.org
Computational Cell Map • Map the cell • Predict map from genome • Multiple perturbation mapping • Active cell map • Map visualization and analysis software • Read map to understand • Cell processes • Gene function • Disease effects • Map evolution Cary MP et al. Pathway information… FEBS Lett. 2005 Bader GD et al. Functional genomics and proteomics. Trends Cell Biol. 2003
How are biological networks in the cell encoded in the genome? Can we accurately predict biologically relevant interactions from a genome? How do genome sequence changes underlying disease affect the molecular network in the cell? Can we predict how well model pathways or phenotypes will translate to human? Can we design new networks de novo?
Predicting Protein Interaction Networks From the Genome • Ideally: accurately predict the interaction network directly from the genome • Reality: • Not currently possible • Signaling pathways too divergent to accurately map by orthology • Protein interaction prediction is likely as hard as protein folding in general, e.g. because of induced fit
Predicting Networks • Map via orthology relationships • Metabolic pathways • E.g. KEGG, BioCyc, metaSHARK • Protein-protein interactions • E.g. OPHID, HomoMINT • Signaling pathways • E.g. Reactome • Infer using functional associations • Phylogenetic profile, Rosetta Stone • Infer from molecular profiles • Gene expression → gene regulatory network • E.g. ARACNE, MEDUSA, MatrixREDUCE Pinney et al. NAR 2005 Bader & Enright
Peptide Recognition Domains • Simple binding sites • Well studied • Numerous • Biologically important • Eukaryotic signaling systems often involve modular protein-protein interaction domains http://pawsonlab.mshri.on.ca/ http://nashlab.uchicago.edu/domains/
Protein Domain Interaction Network Prediction Genome Gene and protein prediction Domain prediction Specificity prediction Protein-protein interaction prediction
PDZ Domains • 80-90 amino acids, 5-6 beta strands, 2 alpha helices • Recognize hydrophobic C-termini • Membrane localization of signaling components • Neuronal development, cell polarity, ion channel regulation Par-6 PDZ Domain VKESLV-COOH (1RZX, Fly) Dev Sidhu Tonikian et al. PLoS Biology Sep.2008
~250 Human PDZ Domains Multiple sequence alignment
PDZ Binding Motifs Class 1: X[T/S]X C-Terminus Class 2: XX polar basic acidic hydrophobic
Sequence Logo Position SWWPDSWV NAFEETWV NPFWDVWV NPFWDVWV SVDVDTWV -AYFDTWV STFLETWV KGVFESWV ESWHDSWV -GDQDTWV GRWMDTWV KFWRDTWL … Profile Logo Amino Acid polar=green, basic=blue, acidic=red, hydrophobic=black Schneider TD, Stephens RM. 1990. Nucleic Acids Res. 18:6097-6100 http://weblogo.berkeley.edu/
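The step from aligned phage peptides to a profile and logo can be sketched roughly as follows. This is a minimal illustration, not the pipeline used in the talk: the function names `column_frequencies` and `information_content` are made up for this example, gaps are simply ignored, and no small-sample correction is applied (logo tools such as WebLogo typically apply one). The letter heights in a logo are proportional to each residue's frequency times the column's information content.

```python
import math
from collections import Counter

# Phage-display peptides copied from the slide; '-' marks a gap.
PEPTIDES = [
    "SWWPDSWV", "NAFEETWV", "NPFWDVWV", "NPFWDVWV", "SVDVDTWV", "-AYFDTWV",
    "STFLETWV", "KGVFESWV", "ESWHDSWV", "-GDQDTWV", "GRWMDTWV", "KFWRDTWL",
]

def column_frequencies(peptides):
    """Return one {residue: frequency} dict per alignment column, ignoring gaps."""
    profile = []
    for column in zip(*peptides):
        counts = Counter(res for res in column if res != "-")
        total = sum(counts.values())
        profile.append({res: n / total for res, n in counts.items()})
    return profile

def information_content(freqs, alphabet_size=20):
    """Bits of information in a column: log2(alphabet size) minus Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in freqs.values())
    return math.log2(alphabet_size) - entropy

profile = column_frequencies(PEPTIDES)
bits = [information_content(col) for col in profile]

for position, (col, b) in enumerate(zip(profile, bits)):
    top = max(col, key=col.get)
    print(f"position {position}: most frequent residue {top}, {b:.2f} bits")
```

In this sample the second-to-last column is W in every peptide, so it carries the maximum log2(20) bits, while the N-terminal columns are far less constrained.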
82 worm and human PDZ specificities mapped by phage display ~3100 peptides
PDZ Specificity Map Class 2: XX Class 1: X[T/S]X
PDZ Specificity Map Class 2: XX Class 3: X[D/X]X Class 4: XGX Class 1: X[T/S]X
PDZ Specificity Map Class 2: XX Class 3: X[D/X]X 16 Classes Class 4: XGX Class 1: X[T/S]X
Versatile Position Many Distinct Specificities
Versatile and Robust 91 Erbin mutants phaged, 3400 peptides Mutations cause specificity switch, not function loss
Conserved Specificity, Expanded Use PDZ domains are versatile, but only ~16 classes used from worm to human One billion years of evolution Model: specificities arose early, domains expanded under evolutionary constraints Raffi Tonikian
Protein Domain Interaction Network Prediction Genome Gene and protein prediction Domain prediction Specificity prediction Protein-protein interaction prediction
Predicting PDZ Specificity >ERBB2IP-1 RVRVEKDPELGFSISGGVGGRGNPFRPDDDGIFVTRVQPE GPASKLLQPGDKIIQANGYSFINIEHGQAVSLLKTFQNTVELII Tonikian et al. PDZ specificity map
50 mapped PDZ domains >70% similar to 69 unmapped PDZ Double coverage to 45% of worm/human 33 more PDZ groups 110 singletons Worm Human Mapped Unmapped
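The idea of extending coverage from mapped to unmapped domains can be illustrated with a nearest-neighbour transfer. This is a sketch under stated assumptions: the slide says ">70% similar" without defining the measure, so percent identity over pre-aligned sequences is used here as a stand-in, and the sequences, domain names, and function names are hypothetical toy values rather than real PDZ data.

```python
def percent_identity(a, b):
    """Percent of aligned, non-gap columns where two equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

def transfer_specificity(query, mapped, threshold=70.0):
    """Return (name, identity) of the closest mapped domain above threshold, else None."""
    best = max(mapped.items(),
               key=lambda kv: percent_identity(query, kv[1]),
               default=None)
    if best is None:
        return None
    identity = percent_identity(query, best[1])
    return (best[0], identity) if identity > threshold else None

# Hypothetical toy alignment, not real PDZ sequences.
MAPPED = {"domA": "GFSISGGVGG", "domB": "GLTLKGDKEP"}

hit = transfer_specificity("GFSIAGGVGG", MAPPED)   # close to domA
miss = transfer_specificity("PPPPPPPPPP", MAPPED)  # close to nothing
print(hit, miss)
```

An unmapped domain that clears the threshold would inherit the specificity profile of its nearest mapped neighbour; one that does not stays unpredicted until more domains are mapped.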
Are Residues Correlated? ~80 ~3000 Boris Reva, Chris Sander
Prediction Can Be Accurate Experiment Prediction
Challenge: But Not Always Experiment Prediction Shirley Hui
Predicting PDZ Specificity Consider sequence and physicochemical properties high accuracy at matching known domains to peptides Test Examples (PDZ-Peptide Pairs) Predictions YES NO ? ? … Machine Learning … Training Examples (Binding and Non binding PDZ-Peptide Pairs) NO NO NO NO YES YES YES YES Negative: Positive: … … Shirley Hui, Xiaojian Shao
Protein Domain Interaction Network Prediction Genome Gene and protein prediction Domain prediction Specificity prediction Protein-protein interaction prediction
Genome Search Phage Results SWWPDSWV NAFEETWV NPFWDVWV NPFWDVWV SVDVDTWV -AYFDTWV STFLETWV KGVFESWV ESWHDSWV -GDQDTWV GRWMDTWV KFWRDTWL … Profile PDZ ERBIN polar=green, basic=blue, acidic=red, hydrophobic=black
PDZ ERBIN Genome Search >Q86W91_HUMAN Plakophilin 4, isoform b ...LKSTTNYVDFYSTKRPSYRAEQYPGSPDSWV C-Terminal Match Score QYPGSPDSWV 5.5 DSWV Assumes: Position independence, uniform input, good sampling Physiological binder is similar to phage sequence Predicted C-Terminal Motif
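The genome search described on this slide can be sketched as a position weight matrix scan of protein C-termini. This is an illustration only: the uniform background, the pseudocount of 1, and the function names `build_pwm` and `score_c_terminus` are assumptions made for this example, so the scores it produces are not comparable to the 5.5 shown on the slide. It does share the slide's stated assumption of position independence, because the total score is a plain sum over positions.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# ERBIN phage-display peptides copied from the slide; '-' marks a gap.
PEPTIDES = [
    "SWWPDSWV", "NAFEETWV", "NPFWDVWV", "NPFWDVWV", "SVDVDTWV", "-AYFDTWV",
    "STFLETWV", "KGVFESWV", "ESWHDSWV", "-GDQDTWV", "GRWMDTWV", "KFWRDTWL",
]

def build_pwm(peptides, pseudocount=1.0):
    """Log-odds score per residue and column against a uniform background (assumed)."""
    background = 1.0 / len(AMINO_ACIDS)
    pwm = []
    for column in zip(*peptides):
        counts = Counter(res for res in column if res != "-")
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pwm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pwm

def score_c_terminus(protein, pwm):
    """Sum independent per-position scores over the last len(pwm) residues."""
    if len(protein) < len(pwm):
        raise ValueError("protein shorter than the profile")
    tail = protein[-len(pwm):]
    return sum(col[aa] for col, aa in zip(pwm, tail))

pwm = build_pwm(PEPTIDES)

# C-terminal fragment of Q86W91_HUMAN (Plakophilin 4, isoform b) as shown on the slide.
plakophilin_tail = "LKSTTNYVDFYSTKRPSYRAEQYPGSPDSWV"
match_score = score_c_terminus(plakophilin_tail, pwm)
control_score = score_c_terminus("AAAAAAAAAA", pwm)  # arbitrary control sequence
print(f"match {match_score:.2f} vs control {control_score:.2f}")
```

Sequences containing non-standard residue codes (for example X) would need explicit handling before scoring; this sketch raises a KeyError for them.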
Prediction Can be Accurate ERBIN PDZ Interaction Prediction ERBB2IP-1 10E-5 (High) Probability of PDZ binding 10E-7 (Low) Known Interactor …but requires further experimental support High Score
p-value Network of prioritized human PDZ interactions Matches known biology, significantly enriched in known interactors 8% overlap, p = 8.6 × 10⁻¹⁸ 336 interactions between 54 PDZ domains, 247 proteins
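One common way to assess whether predictions overlap known interactions more than expected by chance is a one-sided hypergeometric test. The slide does not say which test produced the reported p-value, so this is only a sketch of that general approach; the function name and the toy counts are hypothetical and are not the numbers behind the slide.

```python
from math import comb

def hypergeom_enrichment_p(universe, known, predicted, overlap):
    """P(X >= overlap) when `predicted` pairs are drawn without replacement
    from `universe` candidate pairs, of which `known` are true interactions."""
    upper = min(known, predicted)
    numerator = sum(
        comb(known, k) * comb(universe - known, predicted - k)
        for k in range(overlap, upper + 1)
    )
    return numerator / comb(universe, predicted)

# Hypothetical toy counts for illustration only.
p_toy = hypergeom_enrichment_p(universe=1000, known=50, predicted=100, overlap=20)
print(f"p = {p_toy:.3g}")
```

The result depends strongly on how the universe of candidate pairs is defined, which is a modelling choice that has to be made explicitly.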
Genome Future: In vivo Protein Interaction Prediction Biologically Relevant (In vivo) In vitro Peptides PDZ Evolutionary Context Protein Expression Phage Display Protein Function In silico Predictions Protein Structure Bind Network Context Protein Location DLGs NMDAR
PDZ Human-Virus Interactions 89 viral proteins matched better than any human protein (vs. 30 domains) Affinities (ELISA) Yingnan Zhang
Crtam Ig transmembrane protein important in late phase T cell activation Non SCRIB binding SCRIB Binding Crtam peptide inhibitor blocks SCRIB-3 binding and polarization T cell Synthetic viral peptide promotes T cell proliferation Non SCRIB binding SCRIB Binding Jung-Hua Yeh and Andrew Chan
Conclusions • PDZ domains are highly specific, versatile and robust to mutation • Many specificities possible, but only a few are used • Specificity can be predicted from domain sequence • Prioritize predictions for experimental follow up • Use by pathogens • PDZ specificity map useful for: • Novel protein interaction discovery • Peptidomimetic therapeutic design • PDZ design (synthetic biology)
Expert knowledge Experimental Data Cell map exploration and analysis Can we accurately predict protein interactions? Databases Literature Pathway Information Pathway Analysis (Cytoscape)
~280 Pathway Databases! http://pathguide.org Vuk Pavlovic
Pathway Commons: A Public Library http://pathwaycommons.org Sander Lab (MSKCC) Bader Lab • Books: Pathways • Lingua Franca: BioPAX OWL • Index: cPath pathway database software • Translators: translators to BioPAX • Open access, free software • No competition: Author attribution • Aggregate ~20 databases in BioPAX format
http://cytoscape.org Network visualization and analysis Pathway comparison Literature mining Gene Ontology analysis Active modules Complex detection Network motif search UCSD, ISB, Agilent, MSKCC, Pasteur, UCSF, Unilever, U of Toronto, U of Michigan
Gene Function Prediction • Guilt-by-association principle • Biological networks are combined intelligently to optimize prediction accuracy • Algorithm is faster and more accurate than its peers Quaid Morris (CCBR) Rashad Badrawi, Ovi Comes, Sylva Donaldson, Christian Lopes, Jason Montojo, Khalid Zuberi http://www.genemania.org
Canadian Bioinformatics Workshops 2009 • Clinical Genomics and Biomarker Discovery. Date: July 16-17, 2009, Toronto. Faculty: Sohrab Shah • Interpreting Gene Lists from -omics Studies. Date: July 9-10, 2009, Toronto. Faculty: Gary Bader, Quaid Morris & Wyeth Wasserman • Informatics on High-Throughput Sequencing Data. Date: July 23-24, 2009, Toronto. Faculty: Michael Brudno, Asim Siddiqui & Francis Ouellette • Exploratory Data Analysis and Essential Statistics using R. Date: October 2-3, 2009, Toronto. Faculty: Raphael Gottardo and Boris Steipe • Applications now being accepted at www.bioinformatics.ca • Limited registration • Registration fee: $500
Acknowledgements • PDZ Work • Genentech: Dev Sidhu, Yingnan Zhang, Heike Held, Stephen Sazinsky, Yan Wu • University of Toronto: Charlie Boone, Raffi Tonikian, Xiaofeng Xin • MSKCC: Chris Sander, Boris Reva • Bader Lab • G2N: Chris Tan, David Gfeller, Shirley Hui, Xiaojian Shao, Shobhit Jain • MP: Anastasija Baryshnikova, Iain Wallace, Laetitia Morrison, Ron Ammar • ACM: Daniele Merico, Ruth Isserlin, Vuk Pavlovic, Oliver Stueker • Cytoscape • Trey Ideker (UCSD): Kei Ono, Mike Smoot, Peng Liang Wang (Ryan Kelley, Nerius Landys, Chris Workman, Mark Anderson, Nada Amin, Owen Ozier, Jonathan Wang) • Lee Hood (ISB): Sarah Killcoyne, John Boyle, Ilya Shmulevich (Iliana Avila-Campillo, Rowan Christmas, Andrew Markiel, Larissa Kamenkovich, Paul Shannon) • Benno Schwikowski (Pasteur): Mathieu Michaud (Melissa Cline, Tero Aittokallio) • Chris Sander (MSKCC): Ethan Cerami, Ben Gross (Robert Sheridan) • Annette Adler (Agilent): Allan Kuchinsky, Mike Creech (Aditya Vailaya) • Bruce Conklin (UCSF): Alex Pico, Kristina Hanspers • Pathway Commons: Chris Sander, Ethan Cerami, Ben Gross, Emek Demir, Robert Hoffmann, Igor Rodchenkov, Rashad Badrawi • Funding: CIHR, NSERC, NIH, Genome Canada, Canada Foundation for Innovation/ORF • http://baderlab.org