Protein Analysis Tools, 2nd April, 2012. Ansuman Chattopadhyay, PhD, Head, Molecular Biology Information Service, Health Sciences Library System, University of Pittsburgh. ansuman@pitt.edu http://www.hsls.pitt.edu/guides/genetics
What we’ll do: • Brief overview of CLC Main Workbench • find genomic context of a protein sequence • search for the presence of conserved domains • create a multiple sequence alignment plot
What we’ll do: • analyze primary structure, such as hydrophobicity, hydrophilicity, antigenicity, repeat sequence detection, etc. • predict secondary structure • predict post-translational modifications, such as phosphorylation, glycosylation, ... • search for interacting partners • predict domain-driven protein-protein interactions
Workshop Resources http://www.hsls.pitt.edu/molbio/tutorials
Sequence Analysis Software Suites • Wisconsin GCG • Vector NTI • DNASTAR Lasergene • Geneious • CLC Main
Why CLC Main? • Windows • Mac • Linux • DNA, RNA, Protein, and Microarray Data Analysis • Regular Updates • HSLS Licensed
CLC Main Access • HSLS CLC Main Registration • Link: http://www.hsls.pitt.edu/molbio/clcmain • Access via Pitt - Network Connect • Instruction video: http://goo.gl/JNjMt
CLC Main Workbench Overview • Graphical User Interface • Protein Sequence Import • Sequence Navigation
Navigate a protein sequence
Videos • CLC Main: getting started (basic navigation steps): http://media.hsls.pitt.edu/media/molbiovideos/clc-navigation-ac0312.swf • CLC Main Workbench Walkthrough (Part 1): http://media.hsls.pitt.edu/media/molbiovideos/clcmain-walkthrough-part1-ac0112.swf • CLC Main Workbench Walkthrough (Part 2): http://media.hsls.pitt.edu/media/molbiovideos/clcmain-walkthrough-part2-ac0112.swf
Protein Sequence • Human PLCg1 • RefSeq accession: NP_002651 • UniProt accession number: P19174 • FASTA file • Raw sequence • CLC features: Search, Import, Create new sequence
Videos • Import a DNA/Protein sequence into CLC Main (Part 1): http://media.hsls.pitt.edu/media/molbiovideos/clc-import-part1-ac0112.swf • Import a DNA/Protein sequence into CLC Main (Part 2): http://media.hsls.pitt.edu/media/molbiovideos/clc-import-part2-ac0112.swf
Protein sequence manipulation • Create a new protein with PLCg1 SH2-SH2-SH3 domains
Sequence Alignment • Pair-wise Alignment • Global • Local • Multiple Sequence Alignment
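To make the global versus local distinction concrete, here is a minimal Python sketch of pairwise alignment scoring by dynamic programming. The function name and the scoring values (match +1, mismatch -1, gap -2) are illustrative assumptions; real protein aligners use substitution matrices such as BLOSUM62 and affine gap penalties, and this is not how CLC Main implements alignment.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best alignment score of sequences a and b.

    local=False: global alignment (Needleman-Wunsch), both sequences end to end.
    local=True:  local alignment (Smith-Waterman), best-matching subsequences.
    Scoring values are illustrative assumptions, not a real substitution matrix.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for leading gaps; local alignment does not.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)      # a local alignment may restart anywhere
                best = max(best, cell)   # and may end anywhere
            score[i][j] = cell
    return best if local else score[-1][-1]


if __name__ == "__main__":
    print(align_score("MKWGPEPRD", "WGPEP"))              # global score
    print(align_score("MKWGPEPRD", "WGPEP", local=True))  # local score
```

A short peptide embedded in a longer sequence scores well locally but poorly globally, because the global alignment must pay for the unmatched flanking residues.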
Multiple Sequence Alignment • Tools: ClustalW and T-Coffee
PLCg1 Orthologous sequences • PLCg1: • Mouse: NP_067255 • Rat: NP_037319 • Cow: NP_776850 • Dog: XP_542998 • Zebrafish: NP_919388 • Human: NP_002651 • NP_067255,NP_037319,NP_776850,XP_542998,NP_919388,NP_002651
Videos • Create a multiple sequence alignment plot using CLC (part 1): http://media.hsls.pitt.edu/media/molbiovideos/msf-clcmain-ac0212 part1.swf • Create a multiple sequence alignment plot using CLC (part 2): http://media.hsls.pitt.edu/media/molbiovideos/msf-clcmain-ac0212-part2.swf • Create a multiple sequence alignment plot: http://media.hsls.pitt.edu/media/clres2705/msa.swf • Compare two peptide sequences: http://media.hsls.pitt.edu/media/clres2705/blast2.swf
Starting with a short peptide sequence, find: • the whole protein sequence • orthologs in other species (nematode) • Tools: UCSC BLAT; NCBI BLAST against Swiss-Prot
Peptide to whole protein • Peptide seq: SPEGCWGPEPRDCVSCRNVSRGRECVDKCNLLEGEPR
Videos • Place an mRNA or peptide sequence into the human genome (BLAT): http://www.hsls.pitt.edu/molbio/videos/play?v=12e • Find homologous sequences: http://media.hsls.pitt.edu/media/clres2705/blast.swf
Find homologous sequence • Peptide seq: SPEGCWGPEPRDCVSCRNVSRGRECVDKCNLLEGEPR
Sequence Manipulation & Format Conversion • Sequence Manipulation Suite • http://bioinformatics.org/sms2/ • Readseq • http://thr.cit.nih.gov/molbio/readseq/ • Example conversion: GenPept to FASTA
Hands-On • Retrieve the amino acid sequence between positions 25 and 45 in Sequence A (MS Word Doc) • Identify the rat gene that encodes this peptide fragment and retrieve its whole protein sequence • Find the fruit fly homolog of this protein • What percent identity does the fruit fly protein share with its rat homolog? • Predict potential MAPK phosphorylation sites in the fruit fly protein
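Two steps of this exercise are simple enough to sketch in Python: pulling out residues by 1-based position, and computing percent identity from an alignment that has already been made. The function names are hypothetical. Excluding gap columns from the denominator is an assumption of this sketch; tools differ on this (BLAST, for instance, reports identities over the full alignment length), so expect small differences from the number a given tool prints.

```python
def subsequence(seq, start, end):
    """Residues from 1-based position `start` through `end`, inclusive."""
    return seq[start - 1:end]


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over aligned columns without a gap.

    Both inputs must come from the same pairwise alignment (equal length).
    Excluding gap columns from the denominator is an assumption of this sketch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


if __name__ == "__main__":
    peptide = "SPEGCWGPEPRDCVSCRNVSRGRECVDKCNLLEGEPR"
    print(subsequence(peptide, 1, 5))
    print(percent_identity("ACDE-F", "ACDQGF"))
```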
Protein Domain Search: InterPro Scan • InterPro is a database of protein families, domains, regions, repeats and sites in which identifiable features found in known proteins can be applied to new protein sequences.
>gi|72198189|ref|NP_000624.2| B-cell lymphoma protein 2 alpha isoform
MAHAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDVGAAPPGAAPAPGIFSSQPGHTPHPAASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLRQAGDDFSRRYRRDFAEMSSQLHLTPFTARGRFATVVEELFRDGVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK
Videos: • Find protein domains, PTMs, secondary structure, etc.: http://media.hsls.pitt.edu/media/clres2705/uniprot.swf • Start with a protein pattern and find which proteins possess that domain: http://media.hsls.pitt.edu/media/clres2705/scanprosite.swf • Search for protein domains, repeats and sites: http://media.hsls.pitt.edu/media/clres2705/interpro.swf
Protein Domain Search: ScanProsite
>gi|72198189|ref|NP_000624.2| B-cell lymphoma protein 2 alpha isoform
MAHAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDVGAAPPGAAPAPGIFSSQPGHTPHPAASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLRQAGDDFSRRYRRDFAEMSSQLHLTPFTARGRFATVVEELFRDGVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK
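Pattern Search • [AC]-x-V-x(4)-{ED}: • This pattern is translated as: [Ala or Cys]-any-Val-any-any-any-any-{any but Glu or Asp} • F-[GSTV]-P-R-L-[G>]: • This pattern is translated as: Phe-[Gly, Ser, Thr or Val]-Pro-Arg-Leu-[Gly or the C-terminal end of the sequence]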
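The pattern syntax above maps closely onto regular expressions. As a rough sketch (the function name is hypothetical, and only the common syntax elements are handled: x, [..], {..}, repeat counts, and the < and > anchors), a PROSITE-style pattern can be translated and searched with Python's re module. This is an illustration of the syntax, not ScanProsite's own implementation.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regex string.

    Handles: x (any residue), [ABC] (allowed), {ABC} (forbidden),
    repeats such as x(4) or x(2,4), and the < / > terminus anchors,
    including '>' inside square brackets. Other syntax is not covered.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Split off an optional repeat count, e.g. "x(4)" -> "x", "4".
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = m.group(1), m.group(2)
        prefix = suffix = ""
        if core.startswith("<"):
            prefix, core = "^", core[1:]
        if core.endswith(">"):
            suffix, core = "$", core[:-1]
        if core == "x":
            body = "."
        elif core.startswith("[") and core.endswith("]"):
            inner = core[1:-1]
            if ">" in inner:
                # Either one of the listed residues or the end of the sequence.
                body = "(?:[" + inner.replace(">", "") + "]|$)"
            else:
                body = "[" + inner + "]"
        elif core.startswith("{") and core.endswith("}"):
            body = "[^" + core[1:-1] + "]"
        else:
            body = core
        if repeat:
            body += "{" + repeat + "}"
        parts.append(prefix + body + suffix)
    return "".join(parts)


if __name__ == "__main__":
    for p in ("[AC]-x-V-x(4)-{ED}", "F-[GSTV]-P-R-L-[G>]"):
        print(p, "->", prosite_to_regex(p))
```

Use re.finditer with the returned string to list every match position in a protein sequence.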
Protein Primary Structure Analysis • Tool: ExPASy from SIB • Calculated molecular weight • Theoretical pI • Extinction coefficients • Estimated half-life • Hydropathicity plot: Kyte & Doolittle • Hydrophilicity plot: Hopp T.P., Woods K.R.
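A hydropathicity plot is a sliding-window average of per-residue scale values. The Python sketch below uses the published Kyte & Doolittle scale; the function names and the default window of 9 are assumptions of this example. Each value here is listed by the window's starting residue, whereas plotting tools typically assign it to the window's center.

```python
# Kyte & Doolittle (1982) hydropathy values per amino acid.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Mean hydropathy of every full window of `window` residues along seq.

    Returns one value per window start; an empty list if the sequence is
    shorter than the window. Raises KeyError on non-standard residues.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    if window < 1 or window > len(values):
        return []
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def gravy(seq):
    """Grand average of hydropathicity: mean scale value over all residues."""
    if not seq:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq.upper()) / len(seq)


if __name__ == "__main__":
    peptide = "SPEGCWGPEPRDCVSCRNVSRGRECVDKCNLLEGEPR"
    print([round(v, 2) for v in hydropathy_profile(peptide)])
    print(round(gravy(peptide), 3))
```

Sustained positive stretches suggest hydrophobic (possibly membrane-spanning) regions; strongly negative stretches are more likely surface exposed.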
Antigenic Site Prediction • Tool: EMBOSS Antigenic
>gi|72198189|ref|NP_000624.2| B-cell lymphoma protein 2 alpha isoform
MAHAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDVGAAPPGAAPAPGIFSSQPGHTPHPAASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLRQAGDDFSRRYRRDFAEMSSQLHLTPFTARGRFATVVEELFRDGVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK
EMBOSS Antigenic • Antigenic predicts potentially antigenic regions of a protein sequence, using the method of Kolaskar and Tongaonkar. Analysis of data from experimentally determined antigenic sites on proteins has revealed that the hydrophobic residues Cys, Leu and Val, if they occur on the surface of a protein, are more likely to be part of antigenic sites. Kolaskar and Tongaonkar developed a semi-empirical method that uses the physicochemical properties of amino acid residues and their frequencies of occurrence in experimentally known segmental epitopes to predict antigenic determinants on proteins. Applied to a large number of proteins, the method predicts antigenic determinants with about 75% accuracy, which is better than most of the known methods. The method is based on a single parameter and is thus very simple to use.
Transmembrane Site Prediction • Tool: TMHMM Server
>gi|72198189|ref|NP_000624.2| B-cell lymphoma protein 2 alpha isoform
MAHAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDVGAAPPGAAPAPGIFSSQPGHTPHPAASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLRQAGDDFSRRYRRDFAEMSSQLHLTPFTARGRFATVVEELFRDGVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK
Protein Secondary Structure
>gi|72198189|ref|NP_000624.2| B-cell lymphoma protein 2 alpha isoform
MAHAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDVGAAPPGAAPAPGIFSSQPGHTPHPAASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLRQAGDDFSRRYRRDFAEMSSQLHLTPFTARGRFATVVEELFRDGVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK
Protein-Protein Interactions Prediction • Tool: STRING
>gi|72198189|ref|NP_000624.2| B-cell lymphoma protein 2 alpha isoform
MAHAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDVGAAPPGAAPAPGIFSSQPGHTPHPAASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLRQAGDDFSRRYRRDFAEMSSQLHLTPFTARGRFATVVEELFRDGVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK
Hands-on • Take the human BCL2 protein sequence and: • Find its domain architecture • Predict the topology of its transmembrane region • Identify a suitable antigenic site for antibody generation • What are its calculated molecular weight and extinction coefficient? • Predict its secondary structure • What percentage of this protein has alpha-helical structure? • Predict its potential interacting partners
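For the alpha-helix percentage question, many secondary structure predictors return one state letter per residue. The Python sketch below assumes a three-state string using H (helix), E (strand) and C (coil); that encoding and the function name are assumptions, so adapt the letters to whatever the chosen tool actually outputs.

```python
from collections import Counter


def structure_composition(states):
    """Percentage of residues in each predicted state.

    `states` is a per-residue string such as "CCHHHHEEC" (assumed encoding:
    H = helix, E = strand, C = coil). Returns a dict mapping state -> percent.
    """
    counts = Counter(states.upper())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {state: 100.0 * n / total for state, n in counts.items()}


if __name__ == "__main__":
    prediction = "CCHHHHHHHHCCEEEECC"  # made-up example, not a real BCL2 prediction
    composition = structure_composition(prediction)
    print(f"helix: {composition.get('H', 0.0):.1f}%")
```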
Hands-on • Predict potential phosphorylation sites in a protein sequence. • Sequence: human BCL2
>gi|72198189|ref|NP_000624.2| B-cell lymphoma protein 2 alpha isoform
MAHAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDVGAAPPGAAPAPGIFSSQPGHTPHPAASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLRQAGDDFSRRYRRDFAEMSSQLHLTPFTARGRFATVVEELFRDGVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK
Phosphorylation Site Prediction • Tool: NetPhos
>gi|72198189|ref|NP_000624.2| B-cell lymphoma protein 2 alpha isoform
MAHAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDVGAAPPGAAPAPGIFSSQPGHTPHPAASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLRQAGDDFSRRYRRDFAEMSSQLHLTPFTARGRFATVVEELFRDGVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK
Phosphorylation Site Prediction • Tool: GPS
>gi|72198189|ref|NP_000624.2| B-cell lymphoma protein 2 alpha isoform
MAHAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDVGAAPPGAAPAPGIFSSQPGHTPHPAASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLRQAGDDFSRRYRRDFAEMSSQLHLTPFTARGRFATVVEELFRDGVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK
Thank you! Any questions? • Carrie Iwema: iwema@pitt.edu, 412-383-6887 • Ansuman Chattopadhyay: ansuman@pitt.edu, 412-648-1297 • http://www.hsls.pitt.edu/guides/genetics