Protein interactions and Pathways
Jyoti Khadake & Vicky Schneider
Joint Wellcome Trust–EBI Summer School
24th June 2011

This morning's session outline
• Where do protein sequences come from?
• Introduction to protein databases
• Introduction to protein interactions
• Standardisation of the protein interaction data
• IntAct and demo
• PSICQUIC/Cytoscape and demo
• Data visualisation and network building, including protein information from other sources to enhance networks

Protein Sequences
Can you name THE database of protein sequences?
Protein databases
• Based on nucleotide sequence similarity
• Based on peptide sequences
Organism database
• The organism a protein comes from is as important as its sequence: taxonomy databases

UniProtKB factsheet
Let’s explore a protein: CDC42
• Cell division control protein 42 homolog, also known as CDC42, is a protein involved in regulation of the cell cycle.
• It is a small GTPase of the Rho subfamily, which regulates signaling pathways that control diverse cellular functions including cell morphology, migration, endocytosis and cell cycle progression.
What could go wrong if CDC42 is not doing its job?

UniProtKB (CDC42 protein)
How are TrEMBL entries generated?
• Search for gene: CDC42
• Check the different proteins retrieved
• Organisms
• Same organism, Swiss-Prot/TrEMBL
• Different referenced databases: PRIDE
• Sequences and references
• Information about the protein: where is it present, how does it act, what are its properties…
• IntAct, Reactome, GOA, InterPro, PDB

UniProt Knowledge Base
• Swiss-Prot: manual annotation (~450,000 proteins)
• TrEMBL: automatic annotation (~3,300,000 proteins)
http://www.ebi.uniprot.org/

UniProt Knowledge Base
• Interactions in IntAct use splice variants
http://www.ebi.uniprot.org/

UniProt Knowledge Base
• Summary:
  • Master protein: P60953
  • Splice variants / isoforms: P60953-1, P60953-2
http://www.ebi.uniprot.org/

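
To illustrate the accession convention on this slide, the sketch below splits an isoform identifier such as P60953-2 into its master accession and isoform number. The function name `split_isoform` is hypothetical, and the parsing assumes only the simple `ACCESSION-N` form shown above.

```python
from typing import Optional, Tuple


def split_isoform(accession: str) -> Tuple[str, Optional[int]]:
    """Split 'P60953-2' into ('P60953', 2).

    A bare master accession such as 'P60953' is returned with None
    as the isoform number.
    """
    master, separator, suffix = accession.partition("-")
    if not separator:
        return master, None
    return master, int(suffix)


# Group the isoform identifiers from the slide under their master protein.
for identifier in ["P60953", "P60953-1", "P60953-2"]:
    print(identifier, "->", split_isoform(identifier))
```

Grouping by the master accession is useful when interaction records are annotated to individual splice variants but the network should show one node per protein.
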
UniProt Knowledge Base Protein Families, domains and motifs
What is a protein family? A protein domain? A protein motif? Why bother creating a database that groups proteins sharing the same domain?
InterPro Protein Families, domains (and motifs) factsheet
UniProt Knowledge Base
• Summary:
  • Master protein: P60953
  • Interaction and pathway databases
http://www.ebi.uniprot.org/

UniProt Taxonomy
• Web interface to the NCBI taxonomy

Newt
PRIDE factsheet
Interactions
• Basis of protein action
• Types
  • Self
  • Binary: homomeric or heteromeric
  • n-ary complexes
  • Co-localisations
• Biological types of interactions
• Information in literature and websites

Types of interaction data in IntAct
1. Direct interactions
2. Association
3. Functional interaction

In pairs, start the next activity: match the types of experimental technique (you can find information in the cards provided) with the types of interaction Jyoti just explained:
• Direct interactions
• Association
• Functional interaction

Standardisation of the protein interaction data Ontologies factsheet
Format for storage and exchange – PSI-MI XML 2.5
Interaction Databases
Deep curation
• IntAct – active curation, broad species coverage, all molecule types
• MINT – active curation, broad species coverage, PPIs
• DIP – active curation, broad species coverage, PPIs
• MPACT – ? curation, limited species coverage, PPIs
• MatrixDB – active curation, extracellular matrix molecules only
• BIND – ceased curating 2006/7, broad species coverage, all molecule types; information becoming dated
Shallow curation
• BioGRID – active curation, limited number of model organisms
• HPRD – active curation, human-centric, modelled interactions
• MPIDB – active curation, microbial interactions

How to model an interaction
[Diagram: Publication; Experiments (Experiment1, Experiment2); Interactions (Interaction1 to Interaction4); Participants (Participant1 to Participant3); Proteins (Protein1, Protein2). Participant is annotated with Roles, Features and Preparations.]

Main objects - Experiment
• Literature references
• Controlled by ontologies
• Confidence measures

Main objects - Participant
[Diagram labels: Interactor; Interactor used experimentally; Building of complex; e.g. enzyme, target; e.g. bait, prey; Delivery method, expression level…]

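
The object model on the last three slides can be sketched as a handful of data classes. This is only an illustration: the class and attribute names are hypothetical, the nesting (Publication, then Experiment, then Interaction, then Participant, then Interactor) is an assumption read off the diagram, and the partner accession and PubMed identifier in the example are placeholders, not real data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Interactor:
    """The molecule itself, e.g. a protein identified by its accession."""
    accession: str
    name: str


@dataclass
class Participant:
    """One interactor as it was used experimentally in an interaction."""
    interactor: Interactor
    roles: List[str] = field(default_factory=list)         # e.g. "bait", "prey", "enzyme", "target"
    features: List[str] = field(default_factory=list)
    preparations: List[str] = field(default_factory=list)  # e.g. delivery method, expression level


@dataclass
class Interaction:
    participants: List[Participant]


@dataclass
class Experiment:
    detection_method: str  # assumed to be an ontology-controlled term
    interactions: List[Interaction] = field(default_factory=list)


@dataclass
class Publication:
    reference: str  # literature reference (placeholder below)
    experiments: List[Experiment] = field(default_factory=list)


# Placeholder example: CDC42 (P60953) as bait and a made-up partner as prey.
bait = Participant(Interactor("P60953", "CDC42"), roles=["bait"])
prey = Participant(Interactor("X00000", "placeholder partner"), roles=["prey"])
publication = Publication(
    "pubmed:0000000",
    [Experiment("two hybrid", [Interaction([bait, prey])])],
)
print(publication)
```

Keeping the Participant separate from the Interactor lets the same protein appear in many interactions with different roles, features and preparations.
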
IntAct
• Search MITAB
• From MITAB to detailed view
• Expanding network
• Network view - TBC
• Other data that can be visualised

IntAct – Home Page
http://www.ebi.ac.uk/intact

Software demonstration
• Many ways to search data!
• Simple, yet powerful search engine
• Advanced search: how to build complex queries
• Searching by ontology terms
• Searching by chemical substructure

Simple Search
First search from the home page…
[Screenshot callouts: Details of interaction; Complex?; UniProt Taxonomy; PubMed; OLS]

Downloading & Customizing
First search from the home page…

Searching – more
How to build complex queries…

Searching – Fields
• Unsure how to build your own complex query?
How to build complex queries…

Searching – Fields
• Some fields provide easy ways to select terms
How to build complex queries…

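
A complex query of this kind is essentially a set of field:value terms joined by Boolean operators. The sketch below assembles such a string. `build_query` is a hypothetical helper, and the field names in the example (`alias`, `species`, `detmethod`) are assumptions that should be checked against the field list offered by the IntAct advanced search.

```python
def build_query(terms: dict, operator: str = "AND") -> str:
    """Join field:value terms into one query string.

    Values containing spaces are wrapped in double quotes so that they
    are treated as a single phrase.
    """
    parts = []
    for field_name, value in terms.items():
        if " " in value:
            value = f'"{value}"'
        parts.append(f"{field_name}:{value}")
    return f" {operator} ".join(parts)


# Field names below are assumed, not taken from the slides.
print(build_query({"alias": "CDC42", "species": "human"}))
print(build_query({"alias": "CDC42", "detmethod": "two hybrid"}))
```

Building the string from a dictionary keeps each term easy to add or remove while experimenting in the advanced search box.
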
Software demonstration
• Single interaction details
• Selecting an interaction
• Looking at the details
• Fetching all other interactions reported in the same paper
• Searching for similar interactions in the database

Interaction Details: Selecting an interaction…
Interaction Details: Looking at the details…
Interaction Details: Looking at the details…
Interaction Details: Searching for similar interactions…

Network visualisation
• In IntAct
  • From IntAct: binary and expanded
  • From IntAct: n-ary and expanded
  Important: type of interaction and method used
• In PSICQUIC
  • Data from other interaction databases

What is PSICQUIC?
• Proteomics Standards Initiative Common QUery InterfaCe.
• Community effort to standardise the way to access and retrieve data from molecular interaction databases.
• PSICQUIC is a specification of a web service.
• Resources already implementing PSICQUIC are listed in a registry.
• Based on the PSI standard formats (XML and MITAB)
• Documentation: http://psicquic.googlecode.com

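
Because PSICQUIC services can return MITAB, a tab-delimited format, results are straightforward to read in a script. The sketch below parses one MITAB 2.5 row into a dictionary, assuming the standard 15-column layout. The dictionary keys and the function name `parse_mitab_line` are hypothetical, and the example row is a placeholder rather than a real interaction record.

```python
# Short names for the 15 MITAB 2.5 columns, in order (the names are illustrative).
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_ids_a", "alt_ids_b", "aliases_a", "aliases_b",
    "detection_methods", "first_authors", "publication_ids",
    "taxid_a", "taxid_b", "interaction_types", "source_databases",
    "interaction_ids", "confidence_scores",
]


def parse_mitab_line(line: str) -> dict:
    """Turn one tab-separated MITAB 2.5 row into {column: [values]}.

    '-' marks an empty cell; multiple values in a cell are separated by '|'.
    Simplification: a '|' inside a quoted value is not handled.
    """
    cells = line.rstrip("\n").split("\t")
    if len(cells) < len(MITAB25_COLUMNS):
        raise ValueError(f"expected at least 15 columns, got {len(cells)}")
    record = {}
    for name, cell in zip(MITAB25_COLUMNS, cells):
        record[name] = [] if cell == "-" else cell.split("|")
    return record


# Placeholder row: the partner accession and PubMed identifier are made up.
example = "\t".join([
    "uniprotkb:P60953", "uniprotkb:X00000", "-", "-", "-", "-",
    'psi-mi:"MI:0018"(two hybrid)', "-", "pubmed:0000000",
    "taxid:9606(human)", "taxid:9606(human)", "-", "-", "-", "-",
])
record = parse_mitab_line(example)
print(record["id_a"], record["id_b"], record["detection_methods"])
```
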
PSICQUIC implementation
[Diagram: User; PSICQUIC client; PSICQUIC Registry; PSICQUIC sources (PSICQUIC services in front of the interaction databases). Further labels: Publications; Sample; Annotation error; Observation error.]

PSICQUIC View
• Enables clustering of queries across providers
• Visualization of graphical network
• Linking back to the original source for more details
• …
http://www.ebi.ac.uk/Tools/webservices/psicquic/view/
http://bit.ly/psicquic-view

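
One simple way to think about clustering results across providers is to merge records that describe the same pair of interactors, whichever way round they are listed, and to remember which providers reported each pair. The sketch below does only that. It is not the method used by PSICQUIC View, and the function name, provider labels and identifiers are illustrative placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def cluster_interactions(
    records: Iterable[Tuple[str, str, str]]
) -> Dict[Tuple[str, str], Set[str]]:
    """Group (provider, id_a, id_b) records by unordered interactor pair."""
    clusters = defaultdict(set)
    for provider, id_a, id_b in records:
        pair = tuple(sorted((id_a, id_b)))  # A-B and B-A are the same pair
        clusters[pair].add(provider)
    return dict(clusters)


# Placeholder records: identifiers A, B, C stand in for real accessions.
records = [
    ("provider1", "A", "B"),
    ("provider2", "B", "A"),
    ("provider2", "A", "C"),
]
for pair, providers in cluster_interactions(records).items():
    print(pair, sorted(providers))
```
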
PSICQUIC Services Tagging
• Content
  • protein-protein
  • small molecule-protein
  • nucleic acid-protein
• Interaction representation
  • evidence
  • clustered
• Curation standards
  • mimix curation
  • imex curation
  • rapid curation
• Source
  • internally curated
  • text mining
  • predicted
  • imported
• Complex expansion
  • spoke
  • matrix
  • bipartite
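
The three complex expansion tags describe different ways of turning an n-ary complex into pairs. The sketch below shows one common reading of each: spoke links a chosen bait to every other participant, matrix links every participant to every other, and bipartite links each participant to a node that stands for the complex itself. The function names and the identifiers in the example are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

Pair = Tuple[str, str]


def spoke_expansion(bait: str, preys: List[str]) -> List[Pair]:
    """Pair the bait with each prey."""
    return [(bait, prey) for prey in preys]


def matrix_expansion(participants: List[str]) -> List[Pair]:
    """Pair every participant with every other participant."""
    return list(combinations(participants, 2))


def bipartite_expansion(complex_id: str, participants: List[str]) -> List[Pair]:
    """Link each participant to a node representing the complex."""
    return [(complex_id, participant) for participant in participants]


# A placeholder complex with bait A and preys B, C, D.
print(spoke_expansion("A", ["B", "C", "D"]))
print(matrix_expansion(["A", "B", "C", "D"]))
print(bipartite_expansion("complex1", ["A", "B", "C", "D"]))
```

For n participants, spoke yields n-1 pairs and matrix yields n(n-1)/2, so the choice of expansion changes the size and density of the resulting network.
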