Protein functions prediction

Protein functions prediction: signal peptides, transmembrane regions and topology, PTM (post-translational modifications), low complexity and biased regions, repeats, coils, secondary structure, antigenic peptides, domains/motifs. Tools: the EMBOSS package. Introduction: different techniques.

susan

Presentation Transcript


  1. Protein functions prediction

  2. Signal peptides • Transmembrane regions and topology • PTM (post-translational modifications) • Low complexity and biased regions • Repeats • Coils • Secondary structure • Antigenic peptides • Domain/Motifs • Tools: the EMBOSS package

  3. Introduction: different techniques • Algorithms • Sliding window, nearest neighbor • Patterns, regular expressions • Weight matrices • HMM, profiles • Neural networks • Rules

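  4. Sliding window: a window of width (size) 11 is moved along the sequence THISISATESTSEQVENCETHATDISPLAYSTHESLIDINGWINDQW in steps of 5, and each window position receives a score (Score1, Score2, …, Scoren). Results are usually displayed as a graph (example figure not reproduced).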
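The slide does not say which per-residue values are averaged in each window. As a minimal sketch, assuming the Kyte-Doolittle hydropathy scale (a common choice for such plots), the window of width 11 and step of 5 from the slide could be computed like this; each result pair is one point of the graph.

```python
# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982), used here as an
# example scale; any per-residue scale can be passed instead.
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def sliding_window(seq, scale, width=11, step=5):
    """Return (1-based window start, mean score) for every window position.

    Residues missing from the scale (e.g. X) contribute 0.0 (an assumption).
    """
    scores = []
    for start in range(0, len(seq) - width + 1, step):
        window = seq[start:start + width]
        mean = sum(scale.get(aa, 0.0) for aa in window) / width
        scores.append((start + 1, mean))
    return scores


example = "THISISATESTSEQVENCETHATDISPLAYSTHESLIDINGWINDQW"
for position, score in sliding_window(example, KYTE_DOOLITTLE):
    print(f"{position:3d} {score:6.2f}")
```

Plotting window start (or centre) against the mean score gives the usual profile graph; a step of 1 gives a smoother curve at the cost of more windows.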

  5. Patterns / regular expressions • Pattern (PROSITE syntax): <A-x-[ST](2)-x(0,1)-{V} • Regexp: ^A.[ST]{2}.?[^V] • Text: the sequence must start with an alanine, followed by any amino acid, followed by a serine or a threonine two times, followed by any amino acid or nothing, followed by any amino acid except a valine. • Only the syntax differs…
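To make the correspondence between the two notations concrete, here is a minimal sketch of a converter for the subset of PROSITE syntax used on the slide (`x`, `[..]`, `{..}`, `(n)` / `(n,m)` repeats, `<` and `>` anchors). The function name `prosite_to_regex` is hypothetical, and special cases such as `>` inside brackets are deliberately not handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):          # N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # C-terminal anchor
            suffix, element = "$", element[:-1]
        count = ""
        repeat = re.fullmatch(r"(.+?)\((\d+)(?:,(\d+))?\)", element)
        if repeat:                            # x(0,1) -> {0,1}, [ST](2) -> {2}
            element = repeat.group(1)
            upper = repeat.group(3)
            count = "{" + repeat.group(2) + ("," + upper if upper is not None else "") + "}"
        if element == "x":                   # any amino acid
            core = "."
        elif element.startswith("{"):        # {V}: anything except V
            core = "[^" + element[1:-1] + "]"
        else:                                # single residue or [..] class
            core = element
        parts.append(prefix + core + count + suffix)
    return "".join(parts)


regex = prosite_to_regex("<A-x-[ST](2)-x(0,1)-{V}")
print(regex)  # ^A.[ST]{2}.{0,1}[^V]
print(bool(re.match(regex, "AKSTLG")), bool(re.match(regex, "AKSTV")))
```

The generated `.{0,1}` is equivalent to the `.?` written on the slide.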

  6. Weight matrices (PSSM)
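A position-specific scoring matrix assigns each residue a score at each position of a motif, and a candidate segment scores the sum of its per-position values. The sketch below assumes log-odds scores against a uniform background with a pseudocount of 1; the training sites and the helper names `build_pssm` and `best_hit` are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(sites, pseudocount=1.0):
    """Build a log2-odds PSSM from equal-length aligned sites.

    Assumes a uniform background frequency of 1/20 for every residue.
    """
    background = 1.0 / len(AMINO_ACIDS)
    total = len(sites) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for i in range(len(sites[0])):
        counts = Counter(site[i] for site in sites)
        pssm.append({aa: math.log2(((counts[aa] + pseudocount) / total) / background)
                     for aa in AMINO_ACIDS})
    return pssm


def score(pssm, segment):
    """Sum the per-position scores of a segment as long as the matrix."""
    return sum(column[aa] for column, aa in zip(pssm, segment))


def best_hit(pssm, seq):
    """Return (score, 1-based position) of the best-scoring window in seq."""
    width = len(pssm)
    return max((score(pssm, seq[i:i + width]), i + 1)
               for i in range(len(seq) - width + 1))


pssm = build_pssm(["RKSA", "RRSA", "KRTA"])   # hypothetical aligned sites
print(best_hit(pssm, "GGRRSAGG"))
```

Unlike a pattern, which either matches or not, the PSSM gives a graded score, so a threshold has to be chosen to call a site.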

  7. HMM / profiles
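An HMM labels each residue with a hidden state and finds the most probable state path with the Viterbi algorithm. As a toy sketch only, assume two states, "O" (other) and "M" (membrane), and residues reduced to hydrophobic ("h") or not ("p"); all probabilities below are invented for illustration and are far simpler than the models used by tools such as TMHMM or HMMTop.

```python
import math

HYDROPHOBIC = set("AILMFVWC")                  # assumption: crude residue classes
STATES = ("O", "M")
START = {"O": 0.9, "M": 0.1}                  # invented toy parameters
TRANS = {"O": {"O": 0.95, "M": 0.05}, "M": {"O": 0.05, "M": 0.95}}
EMIT = {"O": {"h": 0.3, "p": 0.7}, "M": {"h": 0.9, "p": 0.1}}


def viterbi(seq):
    """Return the most probable state path for a non-empty sequence."""
    symbols = ["h" if aa in HYDROPHOBIC else "p" for aa in seq]
    # log-probabilities avoid underflow on long sequences
    scores = [{s: math.log(START[s]) + math.log(EMIT[s][symbols[0]]) for s in STATES}]
    pointers = []
    for sym in symbols[1:]:
        column, pointer = {}, {}
        for state in STATES:
            best_prev = max(STATES, key=lambda prev: scores[-1][prev] + math.log(TRANS[prev][state]))
            column[state] = (scores[-1][best_prev] + math.log(TRANS[best_prev][state])
                             + math.log(EMIT[state][sym]))
            pointer[state] = best_prev
        scores.append(column)
        pointers.append(pointer)
    # trace back from the best final state
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for pointer in reversed(pointers):
        state = pointer[state]
        path.append(state)
    return "".join(reversed(path))


print(viterbi("KDEKRDE" + "LLLLIIIVVVLLLLAAFF" + "KDERKDE"))
```

The low switching probability (0.05) is what keeps the predicted segments long rather than flipping state at every residue.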

  8. Neural Networks: general principle; example (figures not reproduced)
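Sequence-based neural networks typically read a window of residues centred on the position being predicted, each residue one-hot encoded. The sketch below only shows that data flow: it assumes a 13-residue window, 21 input units per position (20 amino acids plus one "outside the sequence" unit), one hidden layer of 5 sigmoid units, no bias terms, and random untrained weights, so its output is meaningless until the weights are trained.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # indices 0-19; index 20 = padding/unknown


def encode_window(seq, center, half_width=6):
    """One-hot encode the window centred on seq[center] (0-based).

    Positions past either end of the sequence, or non-standard residues,
    switch on the extra padding unit instead of an amino-acid unit.
    """
    vector = []
    for i in range(center - half_width, center + half_width + 1):
        unit = [0.0] * 21
        if 0 <= i < len(seq) and seq[i] in AMINO_ACIDS:
            unit[AMINO_ACIDS.index(seq[i])] = 1.0
        else:
            unit[20] = 1.0
        vector.extend(unit)
    return vector


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, w_hidden, w_out):
    """One hidden layer, one output unit, no bias terms (for brevity)."""
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in w_hidden]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)))


rng = random.Random(0)
N_INPUTS, N_HIDDEN = 21 * 13, 5
w_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(N_INPUTS)] for _ in range(N_HIDDEN)]
w_out = [rng.uniform(-0.1, 0.1) for _ in range(N_HIDDEN)]

x = encode_window("MKTAYIAK", 0)
print(len(x), forward(x, w_hidden, w_out))
```

Training (e.g. by back-propagation on known examples) is what turns this into a predictor; the encoding and forward pass stay the same.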

  9. Signals found in proteins • N-ter: exportation/secretion, mitochondria, chloroplast • Internal: NLS (nuclear localization signal) • C-ter: GPI-anchor (Glycosyl Phosphatidyl Inositol), other membrane anchors (see PTM) • Other: unknown?

  10. Signal detection tools: SignalP, MitoProt, ChloroP, Predotar, PSort, TargetP, Sigcleave (EMBOSS), Phobius, Big-PI, DGPI

  11. Transmembrane regions • Detection (signal peptide, hydropathy, helices) • Organisation (topology)

  12. Transmembrane detection tools: TMHMM, TMPred, TopPred2, DAS, HMMTop, Tmap (EMBOSS) • Mixture of tools: Phobius, ConPred II

  13. Post-translational modifications (modification: target residues or positions) • Phosphorylation: S, T, Y • N-glycosylation: N • O-glycosylation: S, T, (HO)K • Acetylation, methylation: D, E, K • Sulfation: Y • Farnesylation, myristylation, palmitoylation, geranylgeranylation, GPI-anchor: C, N-ter, C-ter • Ubiquitination and family: K, N-ter • Inteins (protein splicing) • Pre-translational: selenoprotein (C)

  14. PTM detection • Pattern prediction (PROSITE): short or weak signals, patterns that match very frequently • The best method is experimental MS/MS detection • Most methods use « rules » that combine pattern detection with prior knowledge to predict sites • NetOGlyc: prediction of mucin-type GalNAc O-glycosylation sites in mammalian proteins • DictyOGlyc: prediction of GlcNAc O-glycosylation sites in Dictyostelium • YinOYang: O-beta-GlcNAc attachment sites in eukaryotic protein sequences • NetPhos: prediction of Ser, Thr and Tyr phosphorylation sites in eukaryotic proteins • NMT: prediction of N-terminal N-myristoylation • Sulfinator: prediction of tyrosine sulfation sites
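Pattern-based PTM prediction is easy to reproduce, which also shows why such patterns "match very frequently". This sketch scans for the PROSITE N-glycosylation pattern N-{P}-[ST]-{P} (PS00001). A look-ahead is used so that overlapping sites are all reported; the function name is hypothetical.

```python
import re

# PROSITE PS00001 (N-glycosylation site): N-{P}-[ST]-{P}
# The zero-width look-ahead lets overlapping matches be found.
N_GLYC = re.compile(r"(?=N[^P][ST][^P])")


def n_glycosylation_sites(seq):
    """Return 1-based positions of the Asn of every N-{P}-[ST]-{P} match."""
    return [m.start() + 1 for m in N_GLYC.finditer(seq)]


print(n_glycosylation_sites("MNGSAKNPTLNVTP"))  # only the first N qualifies
print(n_glycosylation_sites("NNSTA"))           # two overlapping sites
```

A match only means the sequon is present; whether the site is actually glycosylated depends on context such as topology, which is why knowledge-based rules or experiments are needed.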

  15. Low complexity regions • Repeats • Compositional bias • PEST

  16. Low complexity / Repeats • DUST (DNA) / SEG: de novo detection • RepeatMasker (DNA): search against a collection • REP: search against a collection • REPRO, Radar: de novo detection • PEST, PESTFind: de novo detection • EMBOSS (DNA): einverted, equicktandem, etandem, palindrome • EMBOSS (protein): oddcomp
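De novo low-complexity detection rests on measuring how varied the composition of a window is. The sketch below computes the Shannon entropy (in bits) of each window and flags windows below a threshold. It borrows SEG's usual default window of 12 residues and trigger complexity of 2.2 as assumptions; the real SEG algorithm does considerably more (segment extension and refinement), which is not reproduced here.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Compositional entropy of a window, in bits (0 for a homopolymer)."""
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in Counter(window).values())


def low_complexity_windows(seq, width=12, threshold=2.2):
    """Return 1-based start positions of windows with entropy < threshold."""
    return [start + 1
            for start in range(len(seq) - width + 1)
            if shannon_entropy(seq[start:start + width]) < threshold]


seq = "ACDEFGHIKLMN" + "Q" * 12
print(low_complexity_windows(seq))
```

A window of 12 different residues reaches the maximum of log2(12) ≈ 3.58 bits, while a poly-Q stretch scores 0.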

  17. Coils • Helix of helices: coiled coil • Leu-zipper

  18. Coils detection • COILS: weight matrices • Paircoil, Multicoil: pairwise correlation • Marcoil: HMM • Pepcoil (EMBOSS): weight matrices
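Coiled coils are built from heptad repeats (positions a to g) in which positions a and d are mostly hydrophobic. Tools such as COILS score every heptad position with a weight matrix; the much cruder sketch below only tries the 7 possible registers and reports the one with the highest fraction of hydrophobic residues at a and d. The hydrophobic residue set and the function name are assumptions for illustration.

```python
HYDROPHOBIC = set("LIVMFA")  # assumption: residues counted as hydrophobic


def best_heptad_register(seq):
    """Return (offset, fraction) for the register whose a/d positions are
    most hydrophobic; offset is the 0-based index of the first 'a' position."""
    best = (0, -1.0)
    for offset in range(7):
        # heptad positions a and d are at 0 and 3 within each repeat
        core = [aa for i, aa in enumerate(seq) if (i - offset) % 7 in (0, 3)]
        if not core:
            continue
        fraction = sum(aa in HYDROPHOBIC for aa in core) / len(core)
        if fraction > best[1]:
            best = (offset, fraction)
    return best


print(best_heptad_register("LKELEKK" * 3))
```

A real predictor also weights the other heptad positions (e.g. charged residues at e and g) and reports a per-residue probability rather than a single register.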

  19. Secondary structure • Structures to predict: alpha-helices, beta-sheets, turns, random coil • Tools: Garnier (EMBOSS), PHD, DSC, PREDATOR, NNSSP, Jpred, Jnet, many others

  20. Antigenic peptides • Peptides binding to MHC: class I 8-, 9- and 10-mers; class II 15-mers (3+9+3) • Binding depends highly on the MHC type • Predictions use experimental knowledge: databases of known peptides • Tools: SYFPEITHI, HLA_Bind (BIMAS), MAPPP (combined expert), Antigenic (EMBOSS), many more • Prediction of proteasome cleavage sites: NetChop, PaProc
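MHC binding predictors score every candidate peptide of the right length, so the first step is to enumerate them. This sketch generates the class I candidates (8-, 9- and 10-mers) and splits each class II 15-mer into the 3+9+3 layout from the slide (flank, 9-residue core, flank). The function names are hypothetical; scoring the candidates is left to the tools listed above.

```python
def class_i_candidates(seq, lengths=(8, 9, 10)):
    """All (1-based start, peptide) pairs of the given lengths."""
    return [(start + 1, seq[start:start + k])
            for k in lengths
            for start in range(len(seq) - k + 1)]


def class_ii_candidates(seq, flank=3, core=9):
    """All 15-mers as (1-based start, N-flank, core, C-flank)."""
    size = flank + core + flank
    candidates = []
    for start in range(len(seq) - size + 1):
        peptide = seq[start:start + size]
        candidates.append((start + 1, peptide[:flank],
                           peptide[flank:flank + core], peptide[flank + core:]))
    return candidates


seq = "ACDEFGHIKLMNPQRSTVWY"
print(len(class_i_candidates(seq)), class_ii_candidates(seq)[0])
```

Even a short 20-residue sequence yields 36 class I candidates, which is why predictions rely on fast matrix-based scoring.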

  21. Domain / Motif • All the protein domain descriptors: PROSITE, PFAM, SMART, PRODOM, BLOCKS, PRINTS, TIGRfam … • Federation: InterPro • Many techniques: patterns, regexp, PSSM (PSI-BLAST), profiles, HMM

  22. Other tools • You can find some of them on our servers (www.ch.embnet.org), on the ExPASy server (www.expasy.org/tools), or ask Google! (www.google.com)

  23. European Molecular Biology Open Software Suite

  24. How to use EMBOSS/Jemboss at SIB

  25. Free, open source (for most Unix platforms) • GCG successor (compatible with the GCG file format) • More than 150 programs (ver. 2.9.0) • Easy to install locally, but no interface and requires local databases • Unix command line only • Interfaces: Jemboss, www2gcg, w2h, wEMBOSS… (with account); Pise, EMBOSS-GUI, SRSWWW (no account); Staden, Kaptain, CoLiMate, Jemboss (local) • Access: www.emboss.org or emboss.sourceforge.net

  26. Some details • USA (Uniform Sequence Address) formats: • 'asis' :: Sequence [start : end : reverse] • Format :: '@' ListFile [start : end : reverse] • Format :: 'list' : ListFile [start : end : reverse] • Format :: Database : Entry [start : end : reverse] • Format :: Database-SearchField : Word [start : end : reverse] • Format :: File : Entry [start : end : reverse] • Format :: File : SearchField : Word [start : end : reverse] • Format :: Program Program-parameters '|' [start : end : reverse] • Example: fasta::Swissprot:UBP5_HUMAN[200:300] • Databases: any database can be added; use showdb to display the available databases
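To show how the pieces of a USA fit together, here is a sketch of a parser for just one of the forms listed, Format :: Database : Entry [start : end : reverse], using the slide's example. It is a hypothetical helper, not part of EMBOSS (EMBOSS parses USAs itself); it assumes that any token in the third bracket field means "reverse", and it ignores the other USA forms.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern for: [format::]database:entry[[start:end[:reverse]]]
USA_PATTERN = re.compile(
    r"^(?:(?P<format>[A-Za-z0-9]+)::)?"
    r"(?P<db>[^:\[]+):(?P<entry>[^\[]+)"
    r"(?:\[(?P<start>\d+)?:(?P<end>\d+)?(?::(?P<reverse>\w+))?\])?$"
)


@dataclass
class SequenceAddress:
    db: str
    entry: str
    fmt: Optional[str] = None
    start: Optional[int] = None
    end: Optional[int] = None
    reverse: bool = False


def parse_usa(usa: str) -> SequenceAddress:
    match = USA_PATTERN.match(usa)
    if match is None:
        raise ValueError(f"unsupported USA form: {usa!r}")

    def to_int(text):
        return int(text) if text is not None else None

    return SequenceAddress(db=match["db"], entry=match["entry"], fmt=match["format"],
                           start=to_int(match["start"]), end=to_int(match["end"]),
                           reverse=match["reverse"] is not None)


print(parse_usa("fasta::Swissprot:UBP5_HUMAN[200:300]"))
```

The format prefix and the range are both optional, so `Swissprot:UBP5_HUMAN` alone is also accepted.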

  27. Databases
     • showdb: displays information on the currently available databases
       # Name          Type ID Qry All Comment
       # ====          ==== == === === =======
       ipr_fetch       P    OK OK  OK  InterPro current by fetch
       ipi_fetch       P    OK OK  OK  IPI current by fetch
       refseq_fetch    P    OK OK  OK  refseq current by fetch
       repbase_fetch   P    OK OK  OK  repbase current by fetch
       swiss_fetch     P    OK OK  OK  SwissProt current by fetch
       swissprot       P    OK OK  OK  SWISSPROT sequences
       trembl          P    OK OK  OK  TREMBL sequences
       trembl_fetch    P    OK OK  OK  trembl current by fetch
       tremblnew       P    OK OK  OK  TREMBL New sequences
       ug_fetch        P    OK OK  OK  Unigene by fetch
       embl            N    OK OK  OK  EMBL release
       emhum           N    OK OK  OK  EMBL release, Human section by emboss index
       emrod           N    OK OK  OK  EMBL release, Rodent section by emboss index
       emvrt           N    OK OK  OK  EMBL release, Vertebrate (nonhuman, nonrodent)
     • seqret (seqretall, seqretset, seqretsplit) • entret (retrieves the complete, untouched entry, e.g., for unigene, interpro, swissprot…) • You can define your own « .embossrc » file

  28. Some tools for DNA • redata Search REBASE for enzyme name, references, suppliers etc • remap Display a sequence with restriction cut sites, translation etc • restover Finds restriction enzymes that produce a specific overhang • restrict Finds restriction enzyme cleavage sites • showseq Display a sequence with features, translation etc • silent Silent mutation restriction enzyme scan • cirdna Draws circular maps of DNA constructs • lindna Draws linear maps of DNA constructs • revseq Reverse and complement a sequence • …

  29. Example: remap
     Input: ECLAC, E. coli lactose operon with lacI, lacZ, lacY and lacA genes. The output shows the sequence with the cut sites of AciI, Bsc4I, BsiSI, BssKI, Bsu6I, HhaI, Hin6I and TaqI marked above the top strand and below the complementary strand (marker layout not recoverable):
       GACACCATCGAATGGCGCAAAACCTTTCGCGGTATGGCATGATAGCGCCCGGAAGAGAGT
               10        20        30        40        50        60
       ----:----|----:----|----:----|----:----|----:----|----:----|
       CTGTGGTAGCTTACCGCGTTTTGGAAAGCGCCATACCGTACTATCGCGGGCCTTCTCTCA

       # Enzymes that cut   Frequency   Isoschizomers
         AciI               1
         Bsc4I              1
         BsiSI              1
         BssKI              1
         Bsu6I              1
         HhaI               2
         Hin6I              2           HinP1I,HspAI
         TaqI               1

       # Enzymes that do not cut
         AclI BamHI BceAI Bse1I BshI ClaI EcoRI EcoRII Hin4I HindII HindIII HpyCH4IV KpnI NotI
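The core of what restrict and remap report can be sketched as a recognition-site search on both strands. This minimal version uses a small hypothetical enzyme table (TaqI TCGA, HhaI GCGC, AciI CCGC, EcoRI GAATTC), reports the start of each recognition site rather than the exact cut position, and ignores ambiguity codes, all of which the EMBOSS programs handle via REBASE data.

```python
# Hypothetical mini enzyme table: name -> recognition site (5'->3')
SITES = {"TaqI": "TCGA", "HhaI": "GCGC", "AciI": "CCGC", "EcoRI": "GAATTC"}


def reverse_complement(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq, site):
    """1-based start positions (on the top strand) of a site on either strand.

    For palindromic sites the reverse complement is identical, so each
    occurrence is only reported once.
    """
    seq = seq.upper()
    hits = set()
    for pattern in {site, reverse_complement(site)}:
        start = seq.find(pattern)
        while start != -1:
            hits.add(start + 1)
            start = seq.find(pattern, start + 1)  # allow overlapping hits
    return sorted(hits)


ECLAC_START = "GACACCATCGAATGGCGCAAAACCTTTCGCGGTATGGCATGATAGCGCCCGGAAGAGAGT"
for name, site in SITES.items():
    print(name, len(find_sites(ECLAC_START, site)))
```

On the 60 bases shown, this reproduces the frequencies in the table for these four enzymes. AciI is only found on the complementary strand (as GCGG on the top strand), which is why both strands must be searched for non-palindromic sites.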

  30. Example: cirdna
     • File: ../../data/data.cirp
       Start 1001
       End 4270
       group
       label
       Block 1011 1362 3
       ex1
       endlabel
       label
       Tick 1610 8
       EcoR1
       endlabel
       label
       Block 1647 1815 1
       endlabel
       label
       Tick 2459 8
       BamH1
       endlabel
       label
       Block 4139 4258 3
       ex2
       endlabel
       endgroup
       group
       label
       Range 2541 2812 [ ] 5
       Alu
       endlabel
       label
       Range 3322 3497 > < 5
       MER13
       endlabel
       endgroup

  31. Example: plotorf

  32. EMBOSS input/output formats • Features: UFO (Universal Feature Object): gff, swissprot, embl, pir, nbrf (with or without sequence) • Alignments: multiple and pairwise, many flavors (FASTA, MSF, SRS…) • Reports: feature (UFO), SRS, motif, seqtable, excel, diffseq, listfile (USA), etc. • Sequences (compatible with USA): many, e.g., fasta, clustal, gcg, paup, gff, embl, swissprot, acedb, abi, etc.

  33. Web interfaces • PISE (Pasteur Institute Software Environment): http://www-alt.pasteur.fr/~letondal/Pise/ • wEMBOSS (Belgium & Argentina; not yet at SIB): http://www.wemboss.org

  34. Pise: a tool to generate Web interfaces for molecular biology programs (http://emboss.ch.embnet.org/Pise)

  35. http://www.wemboss.org

  36. Launch Jemboss http://emboss.ch.embnet.org/Jemboss

  37. Launch Jemboss First time only… Each time…

  38. Jemboss windows

  39. Jemboss windows other systems

  40. Summary • Anonymous web access through Pise • Registered access through Jemboss • Registered access through command-line (requires UNIX skills) • Please report problems!

  41. DEA Exercises: web-based sequence analysis
     The goal of this exercise is to use web-based tools for protein sequence analysis.
     a) Take this TrEMBL sequence (Q9X252) and run a BLAST search against swissprot, once with the complete protein and once with the first 70 residues only. Explain the difference. Use TMPred, SignalP and COILS to help you.
     b) Pass this sequence through PFSCAN and search all databases. Compare with the output of this command on ludwig-sun1/2: hits -b "prf pat pfam" tr:Q9X252
     c) Use the different profile, motif and pattern databases to get more information about the domain(s) you found.
     d) How do you evaluate the PRINTS tropomyosin annotation in this TrEMBL entry (Q9WZH0)?
     List of useful links:
     • Basic BLAST, advanced BLAST or PSI-BLAST
     • TMPred: prediction tool for transmembrane regions (or TMHMM)
     • COILS: prediction tool for coiled-coil regions
     • SignalP: prediction tool for signal-peptide cleavage sites
     • Profile, domain and motif databases and search sites: PFSCAN, InterPro (Pfam, PRINTS, PROSITE, SMART), HITS
