Presented by: Peter Oledzki, John Pinney, Ashwin Sivakumar
Proteomics • Proteomics has been described as the next step after genomics. • Proteomics is the study of the proteome. • The proteome is the complete complement of proteins encoded by a genome or expressed in a specific tissue.
Proteomics and genomics are inter-dependent [Diagram labels: Genome Sequence; Genomics; mRNA; Proteomics; Protein Fractionation; Primary Protein products; 2-D Electrophoresis; Functional protein products; Protein Identification; Post-Translational Modification; Determination of gene]
Aims of Proteomics • Detect the different proteins expressed by a tissue, cell culture, or organism using 2-Dimensional Gel Electrophoresis. • Store that information in a database. • Compare expression profiles between a healthy cell and a diseased cell. • The data comparison can then be used for testing and rational drug design.
Gel Electrophoresis • Motion of charged molecules in an electric field. • Polyacrylamide gel provides a porous matrix • (PAGE – Polyacrylamide Gel Electrophoresis) • Sample is stained with Coomassie Blue to make it visible in the gel. • Sample is placed in wells on the gel.
1-D Gel electrophoresis • Separation in only 1 dimension: size. • Smaller molecules travel further through the gel than large molecules, which produces the separation.
1-D continued • Electric field across gel separates molecules. • Negatively charged molecules travel towards the positive terminal and vice versa. • Western blotting (protein) is not to be confused with Southern blotting (DNA) or Northern blotting (RNA). • Proteins are treated with the denaturing detergent SDS (sodium dodecyl sulfate), which coats the protein with negative charges, hence SDS-PAGE.
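The size-dependent migration described on the 1-D slides is commonly turned into a size estimate by running marker proteins of known weight alongside the sample. The sketch below assumes that log10(molecular weight) is roughly linear in relative mobility (Rf) over the useful range of the gel; the marker values and the function names `fit_log_mw` and `estimate_mw` are made up for illustration and do not come from the slides.

```python
import math

# Hypothetical marker ladder: (relative mobility Rf, molecular weight in Da).
MARKERS = [(0.15, 97000.0), (0.30, 66000.0), (0.50, 45000.0),
           (0.70, 30000.0), (0.85, 20000.0)]

def fit_log_mw(markers):
    """Least-squares fit of log10(MW) = intercept + slope * Rf."""
    xs = [rf for rf, _ in markers]
    ys = [math.log10(mw) for _, mw in markers]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def estimate_mw(rf, intercept, slope):
    """Molecular weight predicted for a band at relative mobility rf."""
    return 10 ** (intercept + slope * rf)

intercept, slope = fit_log_mw(MARKERS)
unknown_mw = estimate_mw(0.60, intercept, slope)  # band of unknown size
```

The slope comes out negative, which is the slide's point in numerical form: the further a band has travelled, the smaller the molecule.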
2-D – Separation is based on size and charge • First step is to separate based on charge or isoelectric point, called isoelectric focusing. • Then separate based on size (SDS-PAGE).
Isoelectric Focusing • The isoelectric point (pI) is the pH at which the net charge of the protein molecule is zero. • Different proteins have different isoelectric points. • The isoelectric point is found by drawing the sample through a stable pH gradient; each protein stops migrating where the pH equals its pI. • The range of the gradient determines the resolution of the separation.
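The definition of the isoelectric point can be made concrete with a small calculation: sum the charges of the ionisable groups at a given pH and search for the pH where the total is zero. This is a minimal sketch, not a reference implementation. The pKa values are one commonly quoted set and are an assumption here (published sets differ, and so will the resulting pI), and the function names are hypothetical.

```python
# Assumed pKa values for ionisable groups; other published sets differ slightly.
PKA_POSITIVE = {"nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}

def net_charge(seq, ph):
    """Net charge of a peptide at the given pH (Henderson-Hasselbalch)."""
    charge = 0.0
    for group, pka in PKA_POSITIVE.items():
        count = 1 if group == "nterm" else seq.count(group)
        charge += count / (1.0 + 10 ** (ph - pka))
    for group, pka in PKA_NEGATIVE.items():
        count = 1 if group == "cterm" else seq.count(group)
        charge -= count / (1.0 + 10 ** (pka - ph))
    return charge

def isoelectric_point(seq, tol=1e-4):
    """Bisection search for the pH at which the net charge is zero."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        # Net charge falls monotonically as pH rises.
        if net_charge(seq, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0
```

Bisection works because the net charge only decreases as the pH increases, so there is exactly one crossing point between pH 0 and pH 14.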
SDS-PAGE • Second Dimension. • Separation by size. • Run perpendicular to Isoelectric focusing. • The only unresolved proteins after the first and second dimensions are those proteins with the same size and same charge – rare!
2D-PAGE Analysis Software • 2D-PAGE technology has been in use for over 20 years, and potentially provides a vast amount of information about a protein sample. • However, due to difficulties with data analysis, it remains only partially exploited.
Analysis problems • It can be very difficult to compare the results of two experiments to yield a differential expression profile: • Gels can be severely warped by uneven coolant flow, voltage leaks, or tears in the gel. • Normalisation of background and of spot intensity can be problematic. • Sample preparations can differ.
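The normalisation problem can be illustrated with the simplest possible scheme: subtract a background estimate from every spot, then express each spot as a fraction of the total signal on its gel. This is only a sketch under strong assumptions (a single uniform background value per gel, and spots already identified); real packages use more elaborate models. The function name and the numbers are hypothetical.

```python
def normalise_spots(raw_intensity, background):
    """Subtract a uniform background, then scale so the spots sum to 1."""
    corrected = {spot: max(value - background, 0.0)
                 for spot, value in raw_intensity.items()}
    total = sum(corrected.values())
    if total == 0.0:
        raise ValueError("no signal left after background subtraction")
    return {spot: value / total for spot, value in corrected.items()}

# Hypothetical gels: gel_b was loaded with twice as much sample as gel_a.
gel_a = {"s1": 110.0, "s2": 210.0, "s3": 60.0}
gel_b = {"s1": 220.0, "s2": 420.0, "s3": 120.0}
norm_a = normalise_spots(gel_a, background=10.0)
norm_b = normalise_spots(gel_b, background=20.0)
```

After normalisation the two gels give the same relative values, so a difference in loading is not mistaken for a difference in expression.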
Current state of software • Correct identification and alignment of spots from the two gels has generally required a lot of manual intervention, and has therefore been very slow. • The processing power available with today’s PCs means that automated analysis is starting to become possible. • One vendor claims that an experienced user of their package can compare and annotate 4 gel pairs per hour.
Automated gel matching • Gel matching, or “registration”, is the process of aligning two images to compensate for warp. • Some packages still require the user to identify corresponding spots to help with gel matching. • The Z3 program from Compugen has a fully automated gel matching algorithm: (1) define a set of small, unique rectangles; (2) compute optimal local transformations for the rectangles; (3) interpolate to make a smooth global transformation. • Note that this makes use of spot shape, streaks, smears and background structure, which other programs discard.
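The last step of the matching procedure, turning a handful of local transformations into one smooth global transformation, can be sketched as follows. This is not the Z3 algorithm, whose details the slides do not give. It assumes that each local transformation has been reduced to a simple shift measured at the centre of a rectangle, and it uses inverse-distance weighting as a stand-in interpolation method. All names and values are hypothetical.

```python
def interpolate_shift(point, anchors, power=2.0):
    """Inverse-distance-weighted shift at `point`.

    anchors: list of ((x, y), (dx, dy)) pairs, one per local rectangle.
    """
    x, y = point
    weight_sum = shift_x = shift_y = 0.0
    for (ax, ay), (dx, dy) in anchors:
        dist_sq = (x - ax) ** 2 + (y - ay) ** 2
        if dist_sq == 0.0:
            return (dx, dy)  # exactly on an anchor: use its shift
        weight = dist_sq ** (-power / 2.0)
        weight_sum += weight
        shift_x += weight * dx
        shift_y += weight * dy
    return (shift_x / weight_sum, shift_y / weight_sum)

def warp_point(point, anchors):
    """Map a coordinate on gel 1 to its estimated position on gel 2."""
    dx, dy = interpolate_shift(point, anchors)
    return (point[0] + dx, point[1] + dy)

# Hypothetical local shifts measured at two rectangle centres.
LOCAL_SHIFTS = [((0.0, 0.0), (0.0, 0.0)), ((10.0, 0.0), (4.0, 0.0))]
midway = warp_point((5.0, 0.0), LOCAL_SHIFTS)
```

A point midway between the two rectangles receives the average of their shifts, and the shift changes gradually between anchors instead of jumping at rectangle borders.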
Spot detection • Once the gel images have been matched, the program automatically detects spots. Algorithms are generally based on Gaussian statistics.
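As a rough illustration of Gaussian-based spot detection, the sketch below blurs the image with a Gaussian kernel and reports pixels that are local maxima above a threshold. It is a deliberately simple stand-in rather than the method of any particular package: real software fits spot shapes and handles overlapping spots. The function names and default parameters are assumptions.

```python
import math

def gaussian_kernel(sigma, radius):
    """Normalised 1-D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(image, sigma=1.0, radius=2):
    """Separable Gaussian blur of a list-of-rows image, clamping at edges."""
    kernel = gaussian_kernel(sigma, radius)
    height, width = len(image), len(image[0])
    offsets = range(-radius, radius + 1)

    def clamp(i, n):
        return min(max(i, 0), n - 1)

    rows = [[sum(kernel[j + radius] * image[y][clamp(x + j, width)] for j in offsets)
             for x in range(width)] for y in range(height)]
    return [[sum(kernel[j + radius] * rows[clamp(y + j, height)][x] for j in offsets)
             for x in range(width)] for y in range(height)]

def detect_spots(image, threshold, sigma=1.0):
    """Return (x, y) of smoothed pixels that exceed all 8 neighbours."""
    blurred = smooth(image, sigma)
    height, width = len(blurred), len(blurred[0])
    spots = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            value = blurred[y][x]
            if value < threshold:
                continue
            neighbours = [blurred[y + dy][x + dx]
                          for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                          if (dy, dx) != (0, 0)]
            if all(value > n for n in neighbours):
                spots.append((x, y))
    return spots
```

Smoothing first means that single noisy pixels are less likely to be reported as spots, at the cost of merging spots that lie closer together than the kernel width.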
Spot Quantitation • The positions of detected spots are calibrated to give a pI / MW pair for each protein. • A value for the expression level of the protein can be calculated from the overall spot intensity. • Some programs do not quantitate each gel separately, but calculate relative intensity pixel by pixel. This may be a more accurate approach.
Differential Expression • The user can set threshold values for the detection of differential expression. This helps reduce the amount of information displayed at once. • In this example, a protein expressed only in the second sample is circled in red. The yellow circles show proteins which are differentially expressed.
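The thresholding idea can be sketched as a fold-change filter over matched, normalised spot intensities. The slides do not say which measure the programs use, so the fold-change criterion, the default threshold of 2, and the labels below are assumptions chosen for illustration.

```python
def differential_spots(sample_a, sample_b, fold_threshold=2.0):
    """Label spots whose intensity differs by at least fold_threshold."""
    labels = {}
    for spot in sorted(set(sample_a) | set(sample_b)):
        a = sample_a.get(spot, 0.0)
        b = sample_b.get(spot, 0.0)
        if a == 0.0 and b == 0.0:
            continue
        if a == 0.0:
            labels[spot] = "only in B"
        elif b == 0.0:
            labels[spot] = "only in A"
        elif b / a >= fold_threshold:
            labels[spot] = "up in B"
        elif a / b >= fold_threshold:
            labels[spot] = "down in B"
    return labels

# Hypothetical normalised intensities for matched spots.
healthy = {"s1": 1.0, "s2": 4.0, "s3": 2.0}
diseased = {"s1": 1.1, "s2": 1.0, "s3": 5.0, "s4": 0.8}
changes = differential_spots(healthy, diseased)
```

Raising `fold_threshold` leaves fewer spots flagged, which is how the threshold reduces the amount of information shown at once.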
Annotation • Some systems allow semi-automatic annotation of spots, based on a database of proteins listing their pI / MW values. • Proteins of interest can also be excised from the gel and sent on to mass spectrometry for definitive identification. The ProteomeWorks system from Bio-Rad offers such an integrated solution for 2D-PAGE and MALDI.
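Semi-automatic annotation from pI / MW values amounts to a tolerance search against a reference list. The sketch below is an illustration only: the reference entries are invented, and the tolerances (0.2 pH units, 10% of molecular weight) are arbitrary assumptions.

```python
# Hypothetical reference list: (name, pI, molecular weight in Da).
REFERENCE = [
    ("protein A", 5.4, 42000.0),
    ("protein B", 6.8, 42500.0),
    ("protein C", 5.5, 68000.0),
]

def annotate_spot(pi, mw, reference, pi_tol=0.2, mw_rel_tol=0.10):
    """Names of reference proteins whose pI and MW both fall within tolerance."""
    candidates = []
    for name, ref_pi, ref_mw in reference:
        if abs(pi - ref_pi) <= pi_tol and abs(mw - ref_mw) <= mw_rel_tol * ref_mw:
            candidates.append(name)
    return candidates

candidates = annotate_spot(5.45, 41000.0, REFERENCE)
```

A spot may match several entries or none, which is why the slide describes the annotation as semi-automatic and points to mass spectrometry for definitive identification.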
Multi-experiment Analysis • One useful feature of modern programs is the ability to collate data from many runs of the same experiment. • Spots which only appear in one gel are likely to be artifacts, and are removed from the analysis. • This is an excellent way to reduce noise and enhance weak signals.
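The artifact filter described above can be sketched in a few lines, assuming the spots on the replicate gels have already been matched and share identifiers. The requirement of at least two gels follows the slide; averaging the intensities of the surviving spots is an added assumption, as are the names and numbers.

```python
def consensus_spots(gels, min_gels=2):
    """Keep spots seen on at least min_gels gels; return their mean intensity."""
    observations = {}
    for gel in gels:
        for spot, intensity in gel.items():
            observations.setdefault(spot, []).append(intensity)
    return {spot: sum(values) / len(values)
            for spot, values in observations.items()
            if len(values) >= min_gels}

# Hypothetical replicate runs of the same experiment.
replicates = [
    {"s1": 1.0, "s2": 0.2, "dust": 3.0},
    {"s1": 1.2, "s2": 0.3},
    {"s1": 1.1, "s2": 0.1},
]
consensus = consensus_spots(replicates)
```

The spot labelled "dust" appears on one gel only and is dropped, while the weak but reproducible spot "s2" is kept.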
Links • Z3 system (Compugen) - http://www.2dgels.com/ • Melanie3 (SIB) - http://us.expasy.org/melanie/ • ProteomWeaver (Definiens) - http://www.proteomweaver.com/ • PDQuest (Bio-Rad) - http://www.biorad.com/ • Delta2d (Decodon) - http://www.decodon.com/
Introduction to the databases • With the advent of many 2-D PAGE databases, a number of protein spots have already been "identified" in a few cell lines. Combined with the aims of the experiment, these databases may let one guess at the identity of a particular protein spot and then confirm or refute the guess by immunoblotting. Peptide mass fingerprinting, in which accurate peptide masses from specifically cleaved proteins are used to search protein sequence databases, offers another way to identify a previously sequenced protein or (hopefully) to confirm that it is indeed novel. Link: an animated SDS-PAGE presentation.
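Peptide mass fingerprinting can be sketched as three steps: cleave a candidate sequence in silico, compute the peptide masses, and count how many observed masses they explain. The sketch assumes trypsin as the cleaving enzyme (cutting after K or R unless the next residue is P), uses standard monoisotopic residue masses, and takes an arbitrary tolerance of 0.2 Da; the function names are hypothetical, and real search engines also handle missed cleavages and modifications.

```python
# Monoisotopic residue masses in Da.
MONO_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

def tryptic_peptides(seq):
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(seq):
        following = seq[i + 1] if i + 1 < len(seq) else ""
        if residue in "KR" and following != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides

def peptide_mass(peptide):
    """Monoisotopic mass of the neutral peptide."""
    return sum(MONO_MASS[residue] for residue in peptide) + WATER

def count_matches(observed_masses, seq, tol=0.2):
    """Number of observed masses explained by the sequence's tryptic peptides."""
    theoretical = [peptide_mass(p) for p in tryptic_peptides(seq)]
    return sum(1 for mass in observed_masses
               if any(abs(mass - t) <= tol for t in theoretical))
```

Ranking database sequences by `count_matches` gives a crude candidate list; a spot that matches nothing well is a candidate novel protein.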
A number of 2-D gel databases exist. • Quantitative databases: S. cerevisiae and REF52. • Annotative databases: E. coli and human keratinocytes. • An annual issue of the journal “Electrophoresis” serves as a major index of these databases (it has links to many of them). • The best database is obviously one that is regularly updated (e.g. Swiss 2-D PAGE).
List of 2-D GEL DATABASES • One can find an extensive list of such databases by following these links. • We will discuss a few interesting ones. • World 2-D PAGE • NCIFCRF • DEAMBULUM-Protein Databases • Ludwig Institute of Cancer Research • Phoretix
World 2-D PAGE: Index of 2-D PAGE Databases (ExPASy) • Essentially a set of links to various 2-D PAGE databases. • Has a useful tool called 2-D Hunt for searching the web for 2-DE related sites. • Indexed by category: multi-species, mammalia, yeast, plant, bacteria, viruses and parasites, and cell lines.
Swiss 2-D PAGE • A protein databank of 2-D PAGE and SDS-PAGE reference maps. • Gives the exact location of a protein on a map, or the region of the map where it is expected, provided the protein has a SWISS-PROT entry. • Search options: keywords, accession number, spot clicking, full text, author, SWISS-2DPAGE spot serial number, and SRS; most are self-explanatory. • The protein list for a particular reference map is available as a downloadable table. It gives the gene name, protein description, SWISS-2DPAGE reference number, SWISS-2DPAGE accession number, identification method, and experimental molecular weights and pIs for each entry. • We can also find the position of a protein sequence on all, one, or selected reference maps. If it is not found, a temporary virtual entry is created on the ExPASy server.
SWISS 2D-PAGE (contd) • It gives cross-references to Medline and a few other databases. • In addition to this textual data, SWISS-2DPAGE provides several 2-D PAGE images showing the experimentally determined location of the protein, as well as a theoretical region computed from the protein sequence, indicating where the protein might be found in the gel. • GeneBio (Geneva Bioinformatics) sells paid subscriptions to SWISS-2DPAGE to commercial institutions. • Vital Statistics • Current release (15.0) has 861 entries in 33 reference maps.
Vital stats continued... • Sources of reference maps: • Human (liver, plasma, HepG2, RBC, lymphoma, HepG2 secreted proteins, CBF, macrophage-like cell line, erythroleukemia cell, platelet, kidney, promyelocytic leukemia cells, colorectal epithelia cells, colorectal adenocarcinoma cell line (DL-1), soluble nuclear proteins and matrix from liver tissue) • Mouse (liver, gastrocnemius muscle, pancreatic islet cells, brown adipose tissue, white adipose tissue, soluble nuclear proteins, matrix from liver tissue) • Arabidopsis thaliana • Dictyostelium discoideum • Escherichia coli (for 7 pI ranges: 3.5-10, 4-5, 4.5-5.5, 5-6, 5.5-6.7, 6-9, 6-11) • Saccharomyces cerevisiae
Swiss 2D PAGE (cont.) There have been some recent additions to the database. SDS-PAGE and 2-D PAGE maps of nuclear proteins from human HeLa cells have been added to the growing list of reference maps. It is still an ongoing project. Information about known proteins found within that gel stretch has been mapped (see below: right, SDS; left, PAGE).
Swiss 2D PAGE (cont.) Some useful abbreviations: - ID line: comprises the ID, entry name, entry class and the method (2DG), in that order. They follow a specific nomenclature. - AC line: contains the accession numbers, separated by semicolons. It is a stable way of identifying entries from release to release. - DT line: specifies the date (self-explanatory). - DE line: gives descriptive information about the protein. If the complete sequence was not determined, the last line reads “Fragment”.
Some useful abbreviations (cont.) - IM (Images) line: lists the 2-D PAGE and SDS-PAGE images associated with the entry. These may be, for example, TUMORAL LIVER, NORMAL LIVER or just LIVER. - RP (Reference Position) line: describes the extent of the work carried out by the author, e.g. protein sequence, amino acid composition, mapping on gel, characterisation and review. - The “O” series: contains the organism species (OS), taxonomy (OX) and classification (OC). - MT (Master) line: has information about the types of maps used (e.g. plasma, liver).
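The line codes above lend themselves to a simple parser. The sketch below assumes a SWISS-PROT-style flat file in which each line begins with a two-letter code followed by spaces and each entry ends with `//`. The sample entry is invented purely for illustration and is not a real SWISS-2DPAGE record.

```python
def parse_entry(text):
    """Group the lines of one flat-file entry by their two-letter line code."""
    record = {}
    for line in text.splitlines():
        if line.startswith("//"):  # assumed end-of-entry marker
            break
        if len(line) < 2:
            continue
        code, value = line[:2], line[2:].strip()
        if value:
            record.setdefault(code, []).append(value)
    return record

# Made-up entry, for illustration only.
SAMPLE_ENTRY = """\
ID   EXAMPLE_HUMAN; STANDARD; 2DG.
AC   P00000;
DE   Hypothetical example protein.
OS   Homo sapiens (Human).
MT   LIVER, PLASMA.
IM   LIVER, PLASMA.
//
"""

record = parse_entry(SAMPLE_ENTRY)
maps = [name.strip(" .") for name in record["MT"][0].split(",")]
```

Storing a list per code keeps the parser usable for codes that can span several lines, such as DE or OC.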
Methods used to identify the spots • Total of 3398 identified spots (as of the latest version). • Amino acid composition: 5.3% of these spots • Co-migration: 2.6% • Gel matching: 46.7% • Immunoblotting: 20% • Microsequencing: 15.5% • Peptide mass fingerprinting: 26.3% • Tandem mass spectrometry: 2.3% Does this carry a message?
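One possible reading of the closing question comes from adding the figures up. The percentages are taken from the slide; the interpretation in the comment is an inference, not something the slides state.

```python
# Share of the 3398 identified spots attributed to each method (from the slide).
METHOD_SHARE = {
    "amino acid composition": 5.3,
    "co-migration": 2.6,
    "gel matching": 46.7,
    "immunoblotting": 20.0,
    "microsequencing": 15.5,
    "peptide mass fingerprinting": 26.3,
    "tandem mass spectrometry": 2.3,
}

total_share = sum(METHOD_SHARE.values())  # about 118.7
# A total above 100% suggests that some spots were identified by more than
# one method (an inference from the arithmetic, not a statement in the slides).
most_used = max(METHOD_SHARE, key=METHOD_SHARE.get)
```

The figures also show that gel matching accounts for the largest share of identifications.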
Browsing Swiss 2-D PAGE using spot clicking - We can get information about a known protein by clicking on one of the “checks” in the extensive list of image maps available. - Clicking produces a tailor-made image map showing the exact or approximate position of that protein on all the available image maps. For obvious reasons, the best view is obtained from the reference image map we initially clicked. - A hypertext link can then be used to obtain the full SWISS-PROT entry for that protein, displaying protein sequence, domain structure, information on known post-translational processing and modifications, and references.
Image clicking (continued) - From SWISS-PROT, the user can select a link to SWISS-3DIMAGE to see the three-dimensional structure of the protein, if it is known, or submit the sequence to the SWISS-MODEL three-dimensional modelling tool, or view the domain structure. - Also from SWISS-PROT, the user can select links to pertinent information from DNA sequence databases (EMBL/GenBank), chromosomal and genomic maps (GDB Genome Database), bibliographic references and abstracts (Medline), and databases on the association of human proteins with diseases (OMIM, Online Mendelian Inheritance in Man).
Diagrammatic representation of image clicking... Here we click on this spot in the reference map of the colorectal epithelia cell. This produces a screen showing the different image maps for that protein.
Diagrammatic representation (cont.) Protein identification on the chosen reference map. The red rectangle is the expected region of the protein on the gel. Spots are the proteins identified. Dotted lines are extensions of the possible regions if the protein is acetylated, phosphorylated or glycosylated.
Enough of Swiss 2D PAGE!!! - After image clicking we can also calculate the theoretical pI and molecular weight of different sequence fragments with chosen end points. The N-terminal and C-terminal positions can be specified in the options on the screen; by default the calculation covers the entire available sequence. - Swiss 2D PAGE also cross-references another popular 2D gel database, the Siena 2D gel database. Now bye bye, Swiss 2D PAGE!
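The theoretical molecular weight of a fragment with chosen end points can be sketched by summing average residue masses and adding one water. This is an illustration of the idea, not the ExPASy tool itself: the function name and the 1-based, inclusive end points are assumptions, and the calculation ignores post-translational modifications.

```python
# Average residue masses in Da.
AVG_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_AVG = 18.01528

def fragment_mw(seq, start=1, end=None):
    """Average molecular weight of residues start..end (1-based, inclusive).

    With no end points given, the whole sequence is used.
    """
    if end is None:
        end = len(seq)
    if not 1 <= start <= end <= len(seq):
        raise ValueError("end points outside the sequence")
    fragment = seq[start - 1:end]
    return sum(AVG_MASS[residue] for residue in fragment) + WATER_AVG
```

For example, `fragment_mw("AGGA", 2, 3)` gives the weight of the internal dipeptide GG, while `fragment_mw("AGGA")` covers the full sequence.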
Biobase/Julio Celis Database (very well structured and lucid) - Hosted at the Danish Centre for Human Genome Research. - Has the distinction of constructing the first 2D gel database (HeLa cells) in 1981 (Bravo, R., Bellatin, J. and Celis, J.E.). - Human and mouse 2D PAGE databases. - Annotated 2-D gel patterns of fluids from different species can be found in the fluid gallery. - One can find 2D gel immunoblots of selected proteins against various antibodies. - 2D gel gallery of various human cell types and fluids (includes tumors, keratinocytes and post-translational modifications). For the preparation and labelling of human keratinocytes, please visit: http://biosun.biobase.dk/~pdi/procedures/procedure_label.html
Biobase/Julio Celis Database (cont.) Human 2-D PAGE Database: The keratinocyte 2D PAGE database, constructed using carrier ampholytes, is the largest of its kind and currently lists 3625 cellular (2313 isoelectric focusing, IEF; 954 non-equilibrium pH gradient electrophoresis, NEPHGE) and externalised (358, IEF) polypeptides, of which 1285 have been identified using a combination of techniques including immunoblotting [32], Edman degradation of internal peptides [33, 34], and mass spectrometry [35]. (These figures might be outdated.) Representation through flow charts to follow...
Biobase (cont.) By clicking on each of the available reference gels, we can get information (links to Medline, SWISS-PROT and PDB; cellular location; knockout; method used) on the available proteins (checked spots) on the gel. Databases for the study of skin biology: HK-IEF database; HK-NEPHGE database; KP present in medium IEF database. Counts shown on the slide: 372 IP, 865 IP, 59 IP.
Biobase (cont.) Databases for the study of bladder cancer: TCC-NEPHGE database; Urine-IEF database; TCC-IEF database; BSCC-IEF database. Counts shown on the slide: 144 IP, 197 IP, 449 IP, 309 IP.
Biobase (cont.) Other 2D PAGE databases: Human MRC-5 fibroblasts IEF database; Human MRC-5 fibroblasts NEPHGE database. Counts shown on the slide: 84 IP, 262 IP.
Biobase (cont.) Search options: search by protein name, keyword, sample spot number, relative molecular mass, pI, or organelle/component. Other options for listing proteins and viewing the gels are quite self-explanatory. Other utilities of the database: has links to - NCBI’s Human-Mouse Homology maps, through its mouse 2D-PAGE databases. - Interesting projects such as Mouse Genome Informatics (Jackson Laboratory) and the Mouse Atlas Project.
NCIFCRF (National Cancer Institute; the presenters could not work out what FCRF stands for) * Seems a very exhaustive and useful source. Lots of things still to study. * 2D Protein Gel Databases, maintained by the Image Processing Section: WebGel, Flicker, dbEngine. The section also maintains the gel analysis software GELLAB II.
WebGel: WebGel is an Internet-based, interactive, qualitative and quantitative gel database analysis system. A WebGel database contains previously quantified gel data generated by a stand-alone quantitative gel analysis system. • wbdemoDB: demonstration database of serum proteins in a fetal alcohol syndrome study. • melanie2DB: demonstration database of E. coli gels from the Melanie 2.3 demonstration database. • fasDB: database of serum proteins in a fetal alcohol syndrome study.
FLICKER “Flicker is a method for comparing images from different Internet sources on your Web browser. In the case of 2D protein electrophoretic gel images, maps identifying proteins in these gels are becoming increasingly available. Visually comparing 2D sample gels against these 2D gel database maps may suggest putative protein spot identification in many cases. Flicker was originally developed for comparing 2D protein gels across the Internet.” (Part of the description on their web site.) [Figure: Flicker comparing two plasma 2D-PAGE gels.]