Introduction to Proteomics Susan Liddell University of Nottingham susan.liddell@nottingham.ac.uk PGT short course 2013 UoN Graduate School Course Post-genomics and bio-informatics
Sutton Bonington Proteomics labs, Division of Animal Sciences – South lab (Susan Liddell, Ken Davies) • Supports proteomics studies & collaborative projects • gel electrophoresis (mainly 2D) • protein identification via tandem MS • Wide variety of types of projects and organisms • including some species with unreported genome sequences • human, cow, horse, fungi, bacteria, archaea, plants • Dr Ken Davies
Overview • what is proteomics? • why study the proteome • proteomic strategies • the 2D gel standard workflow • high throughput LC-MSMS • challenges
Proteomics: an explosive growth in interest & expectations • The NCBI database PubMed was searched using the term “proteomics” • Same search on 11/2012: 36,547 entries • Same search on 12/2013: 44,947 entries • The increase in proteomics publications (Biochem. J. (2012) 444, 169–181)
Driving forces for proteomics • Nucleotide databases with complete genome sequences • Technical advances in 2D gel electrophoresis • IPG strips, multigel runners, buffer components • Enormous advances in MS instrumentation • speed, sensitivity, resolution, soft ionisation • Computer algorithms that search databases with MS data, using correlation-based approaches to identify proteins
Proteome • The term “proteome” was coined by a PhD student, Marc Wilkins: “the entire PROTEin complement expressed by a genOME of a cell or organism” (Wasinger et al 1995 Electrophoresis 16:1090) • Proteomics: “...the identification of all the proteins encoded in the human genome....” including modification, quantification, localisation and functional analysis for every cell type (Human Proteome Organisation, www.hupo.org)
Proteomics: the study of proteins and protein function, usually on a genome-wide scale • Proteomics preceded genomics: Human Protein Index, N & L Anderson 1982
Aims of Proteomics • Global (unbiased) analysis of complex protein samples • Find changes in protein expression (potential biomarkers) in different biological situations (disease) • Development of diagnostic tools and therapeutic agents/drugs • Fundamental understanding of biological processes and mechanisms
Why analyse the proteome? Genome considerations: the genome provides only the blueprint – an inventory of the genes that could be expressed in a cell • genomes are (largely) static • proteomes vary enormously with cell type, developmental stage, environment, etc • proteomes are highly dynamic and constantly changing • the proteome is a lot more complex than the genome, because most proteins are chemically modified and so have multiple forms • Images courtesy of en.wikipedia.org
Why analyse the proteome? Genome considerations: sequence alone does not reveal biological function • Arabidopsis genome annotation, functional characterisation: 26% molecular function unknown • Functional Annotation of the Arabidopsis Genome Using Controlled Vocabularies, Plant Physiology (2004) Vol. 135, p745
Why analyse the proteome? Genome considerations • one gene can code for more than one protein • gene rearrangements • immunoglobulin heavy and light chains • T-cell receptor α and β chains • RNA splicing • alternative splicing in a conserved family of ser/arg rich proteins in Arabidopsis generates up to 95 transcripts from only 15 genes
Why analyse the proteome? Transcript considerations: poor correlation between mRNA levels and protein expression levels • Comprehensive mass-spectrometry-based proteome quantification of haploid versus diploid yeast. De Godoy et al 2008 Nature 455:1251 • “Overall correlation of mRNA and protein changes was poor....” • “the relationship between mRNA and protein levels depends on the proteins investigated” • Correlation between protein and mRNA abundance in yeast. Gygi et al 1999 Mol Cell Biol 19:1720 • A comparison of selected mRNA and protein abundances in human liver. Anderson and Seilhamer 1997 Electrophoresis 18:533
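A minimal sketch of how an mRNA versus protein comparison of this kind can be scored, using a Spearman rank correlation. The abundance values are invented for illustration and are not taken from any of the studies cited on the slide.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical paired measurements for six genes (arbitrary units).
mrna = [120.0, 15.0, 300.0, 42.0, 8.0, 75.0]
protein = [2.1e4, 9.5e4, 3.0e4, 1.2e3, 4.0e4, 6.6e3]

rho = spearman(mrna, protein)
print(f"Spearman rho between mRNA and protein abundance: {rho:.2f}")
```

A rank-based measure is used here because abundances span several orders of magnitude; a value near zero would indicate the kind of poor agreement the slide describes.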
Why analyse the proteome? Transcript considerations: each transcript can give rise to several protein isoforms via post-translational processing (>300 PTMs) • Common covalent modifications of proteins affecting activity (Biochemistry. Jeremy M Berg, John L Tymoczko, Lubert Stryer, Neil D Clarke)
PROTEOMICS • proteins are the main biological effector molecules, providing structures, enzyme activities, transport and more • the next step from determining the genome is to find out the function of the gene products – the proteins • analysis of protein products complements genomics & transcriptomics • “At the end of the day, proteins, not genes, are the business end of biology”
Global Proteomics • identify every protein present in a cell, tissue, biofluid or organism • Catalogue of proteins • most practical for simple organisms, eg yeast, prokaryotes
Targeted (Sub)Proteomics • quantitative: protein abundance • qualitative: post-translational modifications (phosphorylation, glycosylation) • subcellular compartments/organelles: nuclei, plasma membrane, mitochondria • functional complexes of interacting proteins
There are many Proteomic Approaches using many different technologies • Gels: proteins, 1D/2D gels, stains/labels • Liquid Chromatography: peptides/proteins, 1D/2D, labels/label free • Protein Chips: protein arrays on slides (protein spots, tissue sections) • Mass Spectrometry
The Nobel Prize in Chemistry 2002 "for their development of soft desorption ionisation methods for mass spectrometric analyses of biological macromolecules" • Electrospray ionization (ESI): John B Fenn • Matrix-assisted laser desorption/ionization (MALDI): Koichi Tanaka
Applications of mass spectrometry in protein analysis include • Protein identification: peptide mass fingerprinting, tandem MS, de novo sequencing • Recombinant protein evaluation: confirm identity; engineered mutations, sequence changes; cleavages or other modifications; assess homogeneity • Identification of modifications: acetylation, oxidation, glycosylation, phosphorylation ... anything that causes a change in mass
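Because a modification is detected as "anything that causes a change in mass", the idea can be sketched as a lookup of the observed mass difference against known shifts. The table below holds standard monoisotopic mass shifts; the function name, the tolerance and the peptide masses in the example are assumptions made for illustration. Glycosylation is represented by a single HexNAc residue only, since real glycans vary widely.

```python
# Monoisotopic mass shifts (Da) for the modifications named on the slide.
PTM_DELTA = {
    "acetylation": 42.010565,
    "oxidation": 15.994915,
    "phosphorylation": 79.966331,
    "glycosylation (one HexNAc)": 203.079373,
}

def explain_mass_shift(unmodified_mass, observed_mass, tolerance=0.01):
    """Return the modifications whose mass shift matches the observed difference.

    `tolerance` is in Da and is an arbitrary illustrative value.
    """
    delta = observed_mass - unmodified_mass
    return [name for name, shift in PTM_DELTA.items()
            if abs(delta - shift) <= tolerance]

# Hypothetical example: a peptide expected at 1000.5000 Da is observed
# roughly 80 Da heavier.
print(explain_mass_shift(1000.5000, 1080.4663))
```

In practice the achievable tolerance depends on the mass accuracy of the instrument, and several modifications can be present on one peptide at once; this sketch handles neither case.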
Proteomic Workflow: 2D gel/MS • Protein separation • Analysis and protein spot selection • Processing and digestion to peptides • Mass spectrometric analysis • Database interrogation • Protein identification
Protein separation: 2-dimensional gel electrophoresis • 1st dimension: separation by charge (isoelectric focussing), pI axis from pH 3 to pH 10 • 2nd dimension: separation by molecular weight (SDS-PAGE), kDa axis
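To make the two gel axes concrete, the sketch below estimates where a protein would sit: an isoelectric point for the first dimension and an approximate mass for the second. It is a rough illustration only. The pKa values are one assumed set (published sets differ by several tenths of a pH unit), the mass uses the common approximation of about 110 Da per residue, and the sequence is invented.

```python
# Assumed side-chain and terminal pKa values (illustrative only).
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM = 8.6
PKA_C_TERM = 3.6

def net_charge(sequence, ph):
    """Net charge at a given pH from the Henderson-Hasselbalch equation."""
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    negative = 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    for aa in sequence:
        if aa in PKA_POSITIVE:
            positive += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:
            negative += 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return positive - negative

def isoelectric_point(sequence, low=0.0, high=14.0, precision=0.001):
    """Find the pH of zero net charge by bisection.

    Net charge only decreases as pH rises, so bisection converges.
    """
    while high - low > precision:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2

def approx_mass_kda(sequence, average_residue_da=110.0):
    """Rough mass estimate: residue count times an average residue mass."""
    return len(sequence) * average_residue_da / 1000.0

# Hypothetical sequence, used only to show the two coordinates.
seq = "MKTAYIAKQRDEELLKHGDSWE" * 10
print(f"pI ~ {isoelectric_point(seq):.2f}, mass ~ {approx_mass_kda(seq):.1f} kDa")
```

Real proteins deviate from these estimates, particularly when post-translationally modified, which is one reason a single gene product can appear as a train of spots.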
2D gel electrophoresis equipment: 1st dimension IEF • various lengths: 5 - 24 cm • wide range: pH 3-11 • narrow/zoom range: pH 4-5 • loading methods: in-gel rehydration, cup, paper bridge
2-D gel electrophoresis equipment: 2nd dimension SDS-PAGE • various lengths • linear / gradient • reducing / non-reducing • Multi-gel runners: increase reproducibility, increase throughput
Protein detection and image capture • post-gel staining: colloidal Coomassie blue; silver; SYPRO Ruby, Deep Purple, Flamingo • pre-gel sample labelling: 35S-methionine; Cy3, Cy5, Cy2 (DiGE) • Pro-Q Diamond: phosphoproteins • Pro-Q Emerald: glycoproteins • Pro-Q Amber: transmembrane proteins (1D gels)
Example 2D gel: E. coli cell extract (Soo Jin Saa; 100 µg, silver stained, mini-format, pH 4-7 IPG strip, 12.5% PAGE; markers at 250, 150, 100, 75, 50, 37, 25 and 20 kDa) • each spot is a different protein • spot intensity is proportional to the amount of protein • Resolution: how many proteins? Depends on the separation lengths of the gels, ie size of IPG strip and 2nd dimension gel • Mini-gels (7 x 7 cm): a few hundred • Midi-gels (18 x 20 cm): ~1-2,000 • Large format (24 x 20 cm): up to 10,000
Comparison of gel stains • SYPRO Ruby: ~1 ng/mm² • Silver: 0.5 ng/mm² • Colloidal Coomassie Blue: 10-50 ng/mm²
Proteomic Workflow: 2D gel/MS • Protein separation • Analysis and protein spot selection • Processing and digestion to peptides • Mass spectrometric analysis • Database interrogation • Protein identification
Analysis and spot selection • Find differences in spot patterns (protein expression changes) between samples using image analysis software • Image analysis software: PDQuest (BioRad), DeCyder (GE Healthcare), Same Spots (Nonlinear Dynamics) • Steps: image capture, spot detection, spot matching across gel set, statistical evaluations
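The statistical evaluation step can be sketched as follows: for each matched spot, compare replicate spot volumes between two sample groups and report a log2 fold change together with a Welch t statistic. This is a generic illustration, not the algorithm used by any of the packages named above, and the spot identifiers and volumes are invented.

```python
import math
from statistics import mean, variance

def compare_spot(control, treated):
    """Return (log2 fold change, Welch t statistic) for one matched spot.

    Each argument is a list of normalised spot volumes from replicate gels.
    """
    mean_c, mean_t = mean(control), mean(treated)
    log2_fc = math.log2(mean_t / mean_c)
    # Welch's t does not assume equal variances in the two groups.
    se = math.sqrt(variance(control) / len(control)
                   + variance(treated) / len(treated))
    if se == 0:
        raise ValueError("no variation between replicates; t is undefined")
    return log2_fc, (mean_t - mean_c) / se

# Hypothetical normalised spot volumes from four replicate gels per group.
spots = {
    "spot_101": ([1000, 1100, 950, 1050], [2100, 1900, 2050, 2000]),
    "spot_102": ([500, 520, 480, 510], [505, 495, 515, 490]),
}

for spot_id, (control, treated) in spots.items():
    log2_fc, t_stat = compare_spot(control, treated)
    print(f"{spot_id}: log2 fold change = {log2_fc:+.2f}, Welch t = {t_stat:+.1f}")
```

With hundreds or thousands of spots per gel, a real analysis would also need to convert t to a p-value and correct for multiple testing before spots are chosen for excision.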
separate proteins and compare different samples: normal cells vs diseased cells • 2D gel electrophoresis separates proteins in two different dimensions – pH and size • differences in spot intensity = protein expression changes • Figure courtesy of Dr Rob Layfield, School of Life Sciences
separate proteins and compare different samples: normal cells vs diseased cells • You found some changes: what are the actual proteins?
Proteomic Workflow: 2D gel/MS • Protein separation • Analysis and protein spot selection • Processing and digestion to peptides • Mass spectrometric analysis • Database interrogation • Protein identification
Gel spot excision and processing • Pick individual spots into 96-well microtitre plates • Destain • Digest (trypsin) • Peptide extraction
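The trypsin digestion step has a simple in silico counterpart, which is what database search software uses to predict the peptides a protein should yield. The sketch below applies the usual rule (cleave after K or R, but not when the next residue is P) and optionally allows missed cleavages. The function name and the test sequence are illustrative assumptions.

```python
def trypsin_digest(protein, missed_cleavages=0):
    """Return predicted tryptic peptides for a protein sequence.

    Cleaves C-terminal to K or R unless the following residue is P.
    With missed_cleavages > 0, peptides spanning up to that many
    uncleaved sites are also returned.
    """
    fragments = []
    start = 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            fragments.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        fragments.append(protein[start:])  # C-terminal peptide

    peptides = []
    for i in range(len(fragments)):
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides

# Hypothetical sequence: the R followed by P is not cleaved.
print(trypsin_digest("MKWVTFRPLLKAGR"))
print(trypsin_digest("MKWVTFRPLLKAGR", missed_cleavages=1))
```

Allowing one or two missed cleavages is a common search setting, because digestion in a gel plug is rarely complete.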
Identify proteins using Mass Spectrometry • MALDI-ToF • Q-ToF2 (plus capillary/nano flow HPLC)
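For MALDI-ToF data, identification typically relies on peptide mass fingerprinting: measured peptide masses are compared with the masses predicted for each protein in a database. The sketch below shows the principle with a toy, invented database of already-digested peptides and a simple count of matching masses. Real search engines use probabilistic scoring, so this is not how any particular tool ranks hits.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276

def mh_plus(peptide):
    """Singly protonated monoisotopic mass [M+H]+ of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON

def score_proteins(observed_mz, database, tolerance=0.05):
    """Rank proteins by how many observed masses match a predicted peptide."""
    scores = {}
    for name, peptides in database.items():
        theoretical = [mh_plus(p) for p in peptides]
        scores[name] = sum(
            any(abs(mz - t) <= tolerance for t in theoretical)
            for mz in observed_mz
        )
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical database: protein name -> predicted tryptic peptides.
database = {
    "protein_A": ["PEPTIDEK", "SAMPLER", "GELSPK"],
    "protein_B": ["MASSSPECK", "TANDEMR", "PEPTIDEK"],
}
# Simulated spectrum: protein_A's peptides with a small measurement error.
observed = [mh_plus(p) + 0.01 for p in database["protein_A"]]

for name, matches in score_proteins(observed, database):
    print(f"{name}: {matches} of {len(observed)} masses matched")
```

The shared peptide shows why a single matching mass is weak evidence, and why tandem MS sequence information (as obtained on the Q-ToF) gives more confident identifications.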
Limitations of 2D gels • Some classes of proteins are difficult to obtain on 2D gels: basic / acidic proteins, large / small proteins, membrane proteins • Low throughput / difficult to automate • Another approach: high throughput LC-MSMS
Many Proteomic Approaches using many different technologies • Gels: proteins, 1D/2D gels, stains/labels • Liquid Chromatography: peptides/proteins, 1D/2D, labels/label free • Protein Chips: protein arrays on slides (protein spots, tissue sections) • Mass Spectrometry
Proteomic Workflow: high throughput LC-MSMS (bottom up) • Digestion of complex protein sample • Peptide separation: high resolution HPLC (often multidimensional) • Mass spectrometric analysis • Database interrogation • Protein identification (large numbers) • Quantitation: tagging, non-tagging approaches
high throughput LC-MSMS (shotgun) • From Vanderbilt University Medical Centre
LC-MSMS: combines the HPLC separation of peptides with the detection and analytical power of tandem MS • Sample → ESI ionisation source → mass analyser → detector
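The analytical power of tandem MS comes from fragmenting each peptide and reading sequence information from the fragment masses. As a minimal sketch, the code below computes the singly charged b and y ion series that are commonly matched against MS/MS spectra. The residue masses are standard monoisotopic values; the function name and example peptide are illustrative, and other ion types, higher charge states and modifications are ignored.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276

def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_ions[i] holds the first i+1 residues (N-terminal fragment);
    y_ions[i] holds the last i+1 residues plus water (C-terminal fragment).
    """
    residues = [MONO[aa] for aa in peptide]
    b_ions, y_ions = [], []
    running = 0.0
    for mass in residues[:-1]:
        running += mass
        b_ions.append(running + PROTON)
    running = 0.0
    for mass in reversed(residues[1:]):
        running += mass
        y_ions.append(running + WATER + PROTON)
    return b_ions, y_ions

b_series, y_series = fragment_ions("PEPTIDE")
for i, (b, y) in enumerate(zip(b_series, y_series), start=1):
    print(f"b{i} = {b:9.4f}    y{i} = {y:9.4f}")
```

The difference between neighbouring ions in one series equals a residue mass, which is the basis of de novo sequencing. Leucine and isoleucine have identical masses and cannot be told apart this way.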
Liquid Chromatography: MudPIT (Multi-dimensional protein identification technology) • Separate complex mixtures of peptides using multi-dimensional HPLC • 1st dimension: strong cation exchange • 2nd dimension: reversed phase • Analyse using tandem mass spectrometry
multidimensional chromatography: most commonly strong cation exchange chromatography for the first dimension, followed by reversed phase separation of each fraction for the second dimension • Figure courtesy of proteabio.com
LC-MSMS: use labels to make the analysis quantitative (heavy and light isotopes) • ICAT / iTRAQ • Metabolic labelling in vivo or in vitro, e.g. 15N/14N • Trypsin digest in the presence of 18O-labelled water (18O/16O)
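Isotope labelling works because the heavy and light forms of a peptide behave identically in chromatography but differ by a predictable mass, so their signal intensities can be compared directly. The sketch below computes the expected shift for two of the schemes listed above. It assumes complete 15N incorporation and, for enzymatic labelling, that two 18O atoms are incorporated at the peptide C-terminus; the function names and intensities are illustrative.

```python
N15_DELTA = 0.997035  # mass difference 15N - 14N, Da
O18_DELTA = 2.004246  # mass difference 18O - 16O, Da

# Nitrogen atoms per residue; residues not listed contain one (backbone) nitrogen.
NITROGENS = {"R": 4, "H": 3, "K": 2, "N": 2, "Q": 2, "W": 2}

def n15_shift(peptide):
    """Mass shift (Da) of a fully 15N-labelled peptide relative to 14N."""
    return sum(NITROGENS.get(aa, 1) for aa in peptide) * N15_DELTA

def o18_shift(oxygens_incorporated=2):
    """Mass shift (Da) after incorporation of 18O atoms during digestion."""
    return oxygens_incorporated * O18_DELTA

def heavy_to_light_ratio(light_intensity, heavy_intensity):
    """Relative abundance of the heavy-labelled sample versus the light one."""
    return heavy_intensity / light_intensity

peptide = "SAMPLER"  # hypothetical peptide
print(f"15N shift for {peptide}: {n15_shift(peptide):.4f} Da")
print(f"18O shift (two atoms): {o18_shift():.4f} Da")
print(f"heavy/light ratio: {heavy_to_light_ratio(1.5e6, 3.0e6):.2f}")
```

The 15N shift depends on the peptide sequence, whereas the 18O shift is the same for every peptide, which is one practical difference between metabolic and enzymatic labelling.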
gel versus off-gel • limitations of 2D gels: some classes of proteins are difficult to obtain on 2D gels (basic / acidic proteins, large / small proteins, membrane proteins); low throughput / difficult to automate • advantages of 2D gels: allow examination of modifications of intact proteins; relatively low cost equipment; easy to understand/interpret
gel versus off-gel • advantages of LC-MSMS: very high throughput; identify large numbers of proteins; more analytical runs • disadvantages of LC-MSMS: digestion increases the complexity of the sample; lose the connection between peptides and intact proteins; complex analysis procedures/software; need more bioinformatics assistance • Use as complementary approaches
What next? • gel based and high throughput off-gel experiments generate huge quantities of data in the form of long lists of proteins • how can these data be placed in the biological context – what do the data tell you? • this is a huge challenge • the need is to establish what the proteins do, what other proteins they interact with and work out why they have changed - in order to obtain molecular insights into the process in question
Specialised software tools • text mining tools to help establish the function(s) of each protein • DAVID: functional annotation tools to help understand the biological meaning behind large lists of proteins/genes • network analysis tools • Ingenuity Systems IPA: pathway & network analysis of complex 'omics data • Cytoscape: platform for visualizing complex networks and integrating ‘omics data • STRING: database and web resource for known and predicted protein-protein interactions • formulate a testable hypothesis to drive the research forward • Figure from Rathbone, Liddell & Campbell (2013) Cellular Reprogramming 15:269
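Functional annotation tools generally ask whether a category (a pathway, a GO term) appears in a protein list more often than expected by chance. A minimal sketch of that idea is a one-sided hypergeometric test, shown below. This is a generic illustration with invented numbers; each of the tools listed above uses its own statistics and annotation sources.

```python
from math import comb

def enrichment_p_value(population, annotated, sample, hits):
    """Probability of seeing at least `hits` annotated proteins by chance.

    population: proteins in the background (e.g. all proteins identified)
    annotated:  background proteins that carry the annotation
    sample:     proteins in the list of interest (e.g. changed spots)
    hits:       proteins in that list that carry the annotation
    """
    max_hits = min(annotated, sample)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, max_hits + 1)
    )
    return favourable / comb(population, sample)

# Hypothetical numbers: 2,000 background proteins, 100 of them annotated
# as "glycolysis"; 50 proteins changed, 10 of which are glycolytic.
p = enrichment_p_value(population=2000, annotated=100, sample=50, hits=10)
print(f"enrichment p-value: {p:.2e}")
```

Because many categories are tested at once, p-values from such a test need multiple-testing correction before they are used to build a hypothesis.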
Challenges in proteomics: Complexity • Many different molecules that are expressed at different levels, times and places, and that carry different PTMs
Challenges in proteomics: Complexity • Range of sizes: from about ten to many thousands of amino acids, less than 10 kDa to over 1 million Da (1,000 kDa) • Different forms: post-translational modifications hugely increase the number of different molecules • Chemical composition: very different physicochemical characteristics
Challenges in proteomics: Complexity • Diversity of sequences (number of genes/ORFs): Arabidopsis ~28,000 (one of the smallest plant genomes); human > 22,000; worm ~19,000; yeast ~6,000; E. coli ~5,000 (fairly typical for bacteria) • Huge number of proteins: estimates of the size of the proteome vary widely; for human cells they range from 100,000 to 2 million!
Challenges in Proteomics: Dynamic range • Don’t see the lower abundance proteins in complex mixtures
Proteins measured clinically in plasma span > 10 orders of magnitude in abundance • Anderson NL, Anderson NG. The human plasma proteome: history, character, and diagnostic prospects. Molecular and Cellular Proteomics 2002 1:845-867
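To put "orders of magnitude" into numbers, the short calculation below takes the base-10 logarithm of the ratio between a high and a low concentration. The two concentrations are round illustrative values chosen for the example (an abundant carrier protein at tens of g/L, a signalling protein near 1 pg/mL); they are not figures quoted from the cited paper.

```python
import math

def orders_of_magnitude(high_conc, low_conc):
    """Dynamic range between two concentrations given in the same units."""
    return math.log10(high_conc / low_conc)

# Illustrative values only, both in g/L.
abundant_protein = 40.0   # about 40 mg/mL
rare_protein = 1e-9       # 1 pg/mL = 1e-12 g/mL = 1e-9 g/L

span = orders_of_magnitude(abundant_protein, rare_protein)
print(f"dynamic range: about {span:.1f} orders of magnitude")
```

A range this wide exceeds what a single gel or LC-MS run can cover, which is why abundant proteins mask the low abundance ones unless the sample is depleted or fractionated first.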