
Pattern databases in protein analysis



Presentation Transcript


  1. Pattern databases in protein analysis • Arthur Gruber • Instituto de Ciências Biomédicas, Universidade de São Paulo • AG-ICB-USP

  2. Protein databases • GenPept – a protein sequence database translated from GenBank • UniProtKB/TrEMBL – a computer-annotated protein sequence database that complements the UniProtKB/Swiss-Prot Protein Knowledgebase • UniProtKB/Swiss-Prot – a curated protein sequence database that provides a high level of annotation, a minimal level of redundancy and a high level of integration with other databases

  3. How to assign protein functions? • Similar proteins may share common functions, but proteins that share common domains may have evolved to perform distinct functions • Proteins that perform similar functions may share common domains, but domain sequences are not always very similar – methods more refined than simple similarity searches are required • Proteins may share common domains but have different architectures – no single domain is necessarily responsible for a protein's function; many proteins use multiple domains to perform their activities

  4. Some conclusions • Similarity searches may reveal proteins that share very similar sequences and functions – high similarity over the full length of the query sequence • An output with no significant hits, or with hits only to unannotated proteins, will not reveal the possible function of the query protein • Similarity searches do not differentiate orthologues from paralogues • When matching multidomain proteins, it may not be appropriate to transfer the functional annotation – the context is important!

  5. So what do proteins with similar function have in common?

  6. Residues, motifs, domains, architecture…

  7. Pattern databases • Databases that contain patterns of residue conservation within groups of related sequences • There are several methods to determine patterns • There are many different pattern databases

  8. Pattern databases

  9. Common protein pattern databases • Prosite patterns – regular expressions • Prosite profiles – weight matrices (profiles) • Pfam – database of protein domain families; contains curated multiple sequence alignments for each family and the corresponding HMMs • PRINTS – database of groups of motifs that, taken together, are more powerful for assigning protein function • ProDom – automatically generated database based on recursive use of PSI-BLAST similarity searches • InterPro – an integrated database that combines different protein signature recognition methods in a single resource
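
The description of Prosite patterns as regular expressions can be made concrete with a small sketch. The helper below, `prosite_to_regex`, is a hypothetical name and not part of any Prosite tool; it assumes a simplified subset of the pattern syntax (elements separated by `-`, `x` for any residue, `[...]` for allowed residues, `{...}` for forbidden residues, `(n)` or `(n,m)` for repeats, `<` and `>` as sequence-end anchors, and a trailing `.`). The example sequence is invented for illustration.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simplified Prosite-style pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        # '<' anchors the pattern at the N-terminus, '>' at the C-terminus.
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        # Separate the residue specification from an optional repeat count.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # forbidden residues
        # Single letters and [..] classes are already valid regex fragments.
        repeat = ""
        if lo is not None:
            repeat = "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


# N-glycosylation site pattern, scanned against a made-up sequence.
regex = prosite_to_regex("N-{P}-[ST]-{P}.")
hits = [(m.start(), m.group()) for m in re.finditer(regex, "MKNATLNPSA")]
print(regex, hits)  # N[^P][ST][^P] [(2, 'NATL')]
```

Note that `re.finditer` reports non-overlapping matches only, so a real scanner would need extra handling to report overlapping sites.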

  10. How to start building a pattern database? • Prosite patterns – regular expressions • Prosite profiles – weight matrices (profiles) • Pfam – database of protein domain families; contains curated multiple sequence alignments for each family and the corresponding HMMs • PRINTS – database of groups of motifs that, taken together, are more powerful for assigning protein function • ProDom – automatically generated database based on recursive use of PSI-BLAST similarity searches • InterPro – an integrated database that combines different protein signature recognition methods in a single resource

  11. How to start building a pattern database?

  12. How to start building a pattern database? • With multiple sequence alignments of functionally related proteins
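
As a rough illustration of how an alignment can be turned into a pattern, the sketch below reads a toy alignment column by column and writes a Prosite-like consensus. The function name `consensus_pattern`, the `max_alternatives` cutoff and the four aligned sequences are all assumptions made for this example; curated databases derive and refine their patterns with considerably more care.

```python
def consensus_pattern(alignment, max_alternatives=3):
    """Derive a Prosite-like pattern from equal-length aligned sequences."""
    elements = []
    for column in zip(*alignment):  # iterate over alignment columns
        residues = sorted(set(column))
        if "-" in residues or len(residues) > max_alternatives:
            elements.append("x")  # gapped or too variable: any residue
        elif len(residues) == 1:
            elements.append(residues[0])  # fully conserved position
        else:
            elements.append("[" + "".join(residues) + "]")  # few alternatives
    return "-".join(elements) + "."


# Hypothetical alignment of a short conserved region.
toy_alignment = ["CAKHC", "CSRHC", "CTKHC", "CGRHC"]
print(consensus_pattern(toy_alignment))  # C-x-[KR]-H-C.
```

The cutoff controls the trade-off between sensitivity and specificity: a larger `max_alternatives` keeps more positions informative but ties the pattern more closely to the sequences already in the alignment.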

  13. Some definitions • Protein motif – a single conserved region • Prosite pattern – a consensus expression of a conserved region • Frequency matrices (PRINTS) – matrices that contain the frequencies with which residues occur at each position of a given motif • PSSMs (BLOCKS) – position-specific scoring (weight) matrices, which add a scoring scheme to the frequency matrices • Profile HMMs – probabilistic models derived from alignment profiles • Protein domain – a part of the protein sequence and structure that can evolve, function, and exist independently of the rest of the protein chain
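
The step from a frequency matrix to a PSSM can be sketched in a few lines. The code below is a minimal illustration, not the procedure used by PRINTS or BLOCKS: it assumes an ungapped motif alignment, a uniform background distribution, a simple additive pseudocount and log-odds scores in bits. All function names and the toy motif are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def frequency_matrix(motif_seqs):
    """Per-position residue frequencies for an ungapped motif alignment."""
    matrix = []
    for column in zip(*motif_seqs):
        counts = Counter(column)
        matrix.append({aa: counts[aa] / len(column) for aa in AMINO_ACIDS})
    return matrix


def pssm(motif_seqs, pseudocount=1.0, background=None):
    """Log-odds scores (bits) per position; uniform background is assumed."""
    if background is None:
        background = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    scores = []
    for column in zip(*motif_seqs):
        counts = Counter(column)
        # Pseudocounts keep unseen residues from scoring minus infinity.
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        scores.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background[aa])
            for aa in AMINO_ACIDS
        })
    return scores


def score(pssm_matrix, segment):
    """Sum the position-specific scores of a segment of motif length."""
    return sum(column[aa] for column, aa in zip(pssm_matrix, segment))


motif = ["CAKHC", "CSRHC", "CTKHC", "CGRHC"]  # hypothetical motif instances
freqs = frequency_matrix(motif)
weights = pssm(motif)
print(freqs[2]["K"], score(weights, "CAKHC") > score(weights, "WWWWW"))
```

The frequency matrix only records what was observed, whereas the PSSM expresses how much more likely each residue is at a position than expected by chance, which is what allows an unseen sequence segment to be scored.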
