
Protein function and classification

Hsin-Yu Chang, www.ebi.ac.uk. Classifying proteins into families and identifying protein homologues can help scientists to characterise unknown proteins.


Presentation Transcript


  1. Protein function and classification. Hsin-Yu Chang, www.ebi.ac.uk

  2. Classifying proteins into families and identifying protein homologues can help scientists to characterise unknown proteins.

  3. Greider and Blackburn discovered telomerase in 1984 and were awarded the Nobel Prize in 2009. Which model organism did they use for this study? (1) Tetrahymena thermophila; (2) Saccharomyces cerevisiae; (3) Mouse; (4) Human.

  4. Telomerase timeline: 1984, discovery of telomerase (Greider and Blackburn); 1989, telomere hypothesis of cell senescence (Szostak); 1995, clone hTR; 1995/1997, clone hTERT; 1997, telomerase knockout mouse; 1998, ectopic expression of telomerase in normal human epithelial cells causes the extension of their lifespan; 1999/2000…, telomerase/telomere dysfunctions and cancer. A single Tetrahymena thermophila cell has 40,000 telomeres, whereas a human cell only has 92. Gilson and Ségal-Bendirdjian, Biochimie, 2010.

  5. Can we identify human telomerase from the Tetrahymena protein sequence?

  6. Let’s pretend that human telomerase has not been identified and we only know the protein sequence of Tetrahymena telomerase. How can we find the human telomerase?

  7. BLAST (Basic Local Alignment Search Tool): compares protein sequences to sequence databases and calculates the statistical significance of matches.

  8. BLAST. Advantages: relatively fast; user friendly; very good at recognising similarity between closely related sequences. Drawbacks: sometimes struggles with multi-domain proteins; less useful for weakly similar sequences (e.g., divergent homologues).

  9. Using the Tetrahymena telomerase protein sequence as a query in BLAST, you will find a few human proteins with very low sequence identity to the query.

  10. Tetrahymena telomerase and the putative human telomerase (AAC51724.1) have a poor protein sequence match.

  11. Can we presume this protein is a telomerase homologue from humans? Can we find more information about it before pursuing it further?

  12. Search for protein signatures (such as domains) in AAC51724.1: Telomerase ribonucleoprotein complex - RNA binding domain; Reverse transcriptase domain.

  13. AAC51724.1 shares 23% identity with Tetrahymena telomerase. It also contains the same domains as telomerase. Plan experiments and find out more!
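
A figure such as "23% identity" can be illustrated with a short sketch. The function below assumes the two sequences are already aligned with "-" as the gap character and counts identical residues over columns where neither sequence has a gap. That denominator is an assumption made here; tools differ in how they define it. The two aligned fragments are invented for illustration and are not real telomerase sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns (assumed convention)
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Toy aligned fragments, invented for illustration only.
print(percent_identity("ALVK-LISG", "AIVHESLSC"))  # 3 of 8 ungapped columns match: 37.5
```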

  14. But where can we search for information about the protein domains?

  15. Protein databases that use signature approaches: Patterns; Fingerprints; Profiles; Hidden Markov Models; HAMAP; Protein features (sites); Structural domains; Functional annotation of families/domains.

  16. Construction of protein signatures: (a) construct a multiple sequence alignment (MSA) from characterised protein sequences; (b) model the pattern of conserved amino acids at specific positions within the MSA; (c) use these models to infer relationships with the characterised sequences.

  17. Three different protein signature approaches, each starting from a sequence alignment: Patterns (single motif methods); Fingerprints (multiple motif methods); Profiles & Hidden Markov Models (HMMs) (full alignment methods).

  18. Patterns

  19. Patterns (pattern signature PS00000). Patterns are usually directed against functional sequence features such as active sites, binding sites, etc. Sequence alignment, then motif, giving the pattern sequences: ALVKLISG, AIVHESAT, CHVRDLSC, CPVESTIS. Regular expression: [AC]-x-V-x(4)-{ED}.

  20. A conserved motif in tubulins. Tubulin signature (PDOC00199): [SAG]-G-G-T-G-[SA]-G.
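
The pattern notation on this slide maps directly onto ordinary regular expressions. The sketch below is a minimal translator that handles only the elements seen here: literal residues, x for any residue, [..] for allowed residues, {..} for forbidden residues, and (n) repeat counts. The function name `prosite_to_regex` is hypothetical, and the full pattern syntax has further elements that this sketch ignores.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.replace(" ", "").split("-"):
        # Separate the element from an optional repeat count such as x(4).
        match = re.fullmatch(r"(.+?)(?:\((\d+)\))?", element)
        core, count = match.group(1), match.group(2)
        if core == "x":
            regex = "."  # any residue
        elif core.startswith("{") and core.endswith("}"):
            regex = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            regex = core  # literal residue or [..] character class
        if count:
            regex += "{" + count + "}"
        parts.append(regex)
    return "".join(parts)


regex = prosite_to_regex("[AC] - x -V- x(4) - {ED}")
for seq in ["ALVKLISG", "AIVHESAT", "CHVRDLSC", "CPVESTIS"]:
    # Each motif sequence from the slide should match the derived expression.
    print(seq, bool(re.fullmatch(regex, seq)))
```

Because {ED} forbids E and D at the last position, a sequence such as ALVKLISE would be rejected even though the rest of the motif matches.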

  21. Patterns. Advantages: strict; a pattern with very little variability can produce highly accurate matches. Drawbacks: simple but less flexible.

  22. Fingerprints

  23. Fingerprints: a multiple motif approach (fingerprint signature PR00000). Sequence alignment; define motifs (Motif 1, Motif 2, Motif 3); motif sequences; weight matrices; fingerprint signature.

  24. Telomerase signature (PR01365): Motif 1, Motif 2, Motif 3, Motif 4.

  25. The significance of motif context: identify small conserved regions in proteins; several motifs characterise a family, together with their order (1, 2, 3) and the intervals between them.

  26. Fingerprints: good at modelling the often small differences between closely related proteins; distinguish individual subfamilies within protein families, allowing functional characterisation of sequences at a high level of specificity. Figure labels: amino acids relatively well conserved across all chloride channel protein family members; amino acids uniquely conserved in chloride channel protein 3 subfamily members.
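
The idea that a family is characterised by several motifs in a fixed order, separated by limited intervals, can be sketched as follows. This is a simplification made for illustration: the motifs are written as regular expressions rather than the weight matrices named on the fingerprint slide, the scan is greedy (it takes the leftmost occurrence of each motif), and the sequence, the second and third motifs, and the `max_gap` value are all invented.

```python
import re


def matches_fingerprint(sequence, motifs, max_gap):
    """Return motif start positions if all motifs occur in order, else None.

    Each motif must start at or after the end of the previous one, and no
    more than max_gap residues beyond it. Greedy leftmost matching only.
    """
    positions = []
    cursor = 0
    for index, motif in enumerate(motifs):
        found = re.compile(motif).search(sequence, cursor)
        if found is None:
            return None  # motif missing, or present only out of order
        if index > 0 and found.start() - cursor > max_gap:
            return None  # interval to the previous motif is too long
        positions.append(found.start())
        cursor = found.end()
    return positions


toy_sequence = "MKAGGTGSGAAAAWVDPLLLLHCRY"  # invented sequence
toy_motifs = ["[SAG]GGTG[SA]G", "W.DP", "HC[RK]"]  # last two motifs are invented
print(matches_fingerprint(toy_sequence, toy_motifs, max_gap=10))
print(matches_fingerprint(toy_sequence, toy_motifs[::-1], max_gap=10))
```

The second call presents the same motifs in reverse order and fails, which is the point of motif context: the same small regions in the wrong order do not identify the family.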

  27. Profiles & HMMs

  28. Profiles & HMMs. Sequence alignment; define coverage (whole protein or entire domain); use the entire alignment of the domain or protein family; build the model (profile or HMM); profile or HMM signature.

  29. Profiles. Start with a multiple sequence alignment. Amino acids at each position in the alignment are scored according to the frequency with which they occur. Scores are weighted according to evolutionary distance using a BLOSUM matrix. Advantage: good at identifying homologues.

  30. HMMs. Start with a multiple sequence alignment. Amino acid frequencies at each position in the alignment and their transition probabilities are encoded. Insertions and deletions are also modelled. Advantages: can model very divergent regions of an alignment; very good at identifying evolutionarily distant homologues.
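
The frequency-based scoring described for profiles can be sketched in a few lines. This example builds a position-specific scoring matrix from the four motif sequences on the Patterns slide, using log-odds scores against a uniform background with a pseudocount. Those choices are assumptions made here for brevity: the BLOSUM-based weighting mentioned on the slide is not implemented, and the alignment is assumed to be ungapped.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Return one dict per alignment column mapping residue to log-odds score."""
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background frequency
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score(profile, sequence):
    """Sum the per-column scores for an ungapped sequence of profile length."""
    if len(sequence) != len(profile):
        raise ValueError("sequence length must equal profile length")
    return sum(column[aa] for column, aa in zip(profile, sequence))


alignment = ["ALVKLISG", "AIVHESAT", "CHVRDLSC", "CPVESTIS"]
profile = build_profile(alignment)
print(score(profile, "ALVKLISG"))  # a member of the alignment scores high
print(score(profile, "WWWWWWWW"))  # residues never seen in the alignment score low
```

Unlike a strict pattern, the profile gives every residue a score at every position, so a sequence with a few unusual residues is penalised rather than rejected outright.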
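
A toy profile HMM shows how insertions and deletions enter the score. In the sketch below, match-state emissions come from column frequencies of the four motif sequences on the Patterns slide, insert states emit from a uniform background, and the transition probabilities are hand-picked constants. All of these are assumptions for illustration; real profile HMM software estimates its parameters from the alignment and is considerably more elaborate. The function returns the log-odds of the best (Viterbi) path against a background-only model.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
NEG_INF = float("-inf")
# Hand-picked transition probabilities (assumed, not estimated from data).
LOG_T = {name: math.log(p) for name, p in {
    "MM": 0.90, "MI": 0.05, "MD": 0.05,
    "IM": 0.70, "II": 0.30,
    "DM": 0.70, "DD": 0.30,
}.items()}


def match_emissions(alignment, pseudocount=1.0):
    """Log emission probabilities for each match state (one per column)."""
    emissions = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        emissions.append({aa: math.log((counts[aa] + pseudocount) / total)
                          for aa in AMINO_ACIDS})
    return emissions


def viterbi_log_odds(alignment, sequence):
    """Best-path log-odds of sequence under the toy profile HMM."""
    emit = match_emissions(alignment)
    length, n = len(emit), len(sequence)
    log_bg = math.log(1.0 / len(AMINO_ACIDS))  # insert and null emissions
    # vm/vi/vd[j][i]: best log probability of emitting the first i residues
    # and ending in the match/insert/delete state of model position j.
    vm = [[NEG_INF] * (n + 1) for _ in range(length + 1)]
    vi = [[NEG_INF] * (n + 1) for _ in range(length + 1)]
    vd = [[NEG_INF] * (n + 1) for _ in range(length + 1)]
    vm[0][0] = 0.0  # the begin state is treated as match state 0
    for j in range(length + 1):
        for i in range(n + 1):
            if j > 0 and i > 0:
                vm[j][i] = emit[j - 1][sequence[i - 1]] + max(
                    vm[j - 1][i - 1] + LOG_T["MM"],
                    vi[j - 1][i - 1] + LOG_T["IM"],
                    vd[j - 1][i - 1] + LOG_T["DM"])
            if i > 0:
                vi[j][i] = log_bg + max(vm[j][i - 1] + LOG_T["MI"],
                                        vi[j][i - 1] + LOG_T["II"])
            if j > 0:
                vd[j][i] = max(vm[j - 1][i] + LOG_T["MD"],
                               vd[j - 1][i] + LOG_T["DD"])
    # Moving to the end state is scored like moving to a match state.
    best = max(vm[length][n] + LOG_T["MM"],
               vi[length][n] + LOG_T["IM"],
               vd[length][n] + LOG_T["DM"])
    return best - n * log_bg


alignment = ["ALVKLISG", "AIVHESAT", "CHVRDLSC", "CPVESTIS"]
print(viterbi_log_odds(alignment, "ALVKLISG"))    # fits the model directly
print(viterbi_log_odds(alignment, "ALVKLWWISG"))  # needs a two-residue insertion
print(viterbi_log_odds(alignment, "WWWWWWWWWW"))  # unrelated toy sequence
```

The second query still scores above the unrelated sequence because the insert state absorbs the extra residues at a transition cost instead of forcing a mismatch at every following position.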

  31. Three different protein signature approaches: Patterns (single motif methods); Fingerprints (multiple motif methods); Profiles & HMMs, hidden Markov models (full alignment methods).

  32. Patterns; Fingerprints; Profiles & HMMs (hidden Markov models). www.ebi.ac.uk/interpro

  33. Patterns; Fingerprints; Profiles; Hidden Markov Models; HAMAP; Protein features (sites); Structural domains; Functional annotation of families/domains.

  34. The aim of InterPro: protein sequences are linked to a family entry, a domain entry and a site entry, each giving a description, the proteins matched and more information.

  35. What is InterPro? InterPro is an integrated sequence analysis resource. It combines predictive models (known as signatures) from different databases. It provides functional analysis of protein sequences by classifying them into families and predicting domains and important sites.

  36. Facts about InterPro: first release in 1999; 11 partner databases; adds annotation to UniProtKB/TrEMBL; provides matches to over 80% of UniProtKB; source of >85 million Gene Ontology (GO) mappings to >24 million distinct UniProtKB sequences; 50,000 unique visitors to the web site per month; >2 million sequences searched online per month, plus offline searches with the downloadable version of the software.

  37. InterPro signature integration process: signatures are provided by member databases; they are scanned against the UniProt database to see which sequences they match; InterPro curators manually inspect the matches before integrating the signatures into InterPro.

  38. InterPro signature integration process: signatures representing the same entity are integrated together; relationships between entries are traced, where possible; curators add literature-referenced abstracts, cross-references to other databases, and GO terms.

  39. http://www.ebi.ac.uk/interpro/

  40. Search using protein sequences

  41. Family

  42. Type

  43. InterPro entry types. Family: proteins that share a common evolutionary origin, as reflected in their related functions, sequences or structure. Ex. Telomerase family. Domain: distinct functional, structural or sequence units that may exist in a variety of biological contexts. Ex. DNA binding domain. Repeats: short sequences typically repeated within a protein. Ex. Tubulin binding repeats in microtubule associated protein Tau. Sites (active site, binding site, conserved site, PTM sites): Ex. Phosphorylation sites, ion binding sites, tubulin conserved site.

  44. Type; Name; Identifier; Contributing signatures; Description; References; GO terms.

  45. Type; Identifier; Name; Relationships; Contributing signatures; Description; References.

  46. InterPro family and domain relationships
