Arthur Gruber
Pattern databases in protein analysis
Instituto de Ciências Biomédicas, Universidade de São Paulo
AG-ICB-USP
Protein databases
• GenPept – a protein sequence database translated from GenBank
• UniProtKB/TrEMBL – a computer-annotated protein sequence database that complements the UniProtKB/Swiss-Prot Protein Knowledgebase
• UniProtKB/Swiss-Prot – a curated protein sequence database that provides a high level of annotation, a minimal level of redundancy and a high level of integration with other databases
How to assign protein functions?
• Similar proteins may share common functions, but… proteins that share common domains may have evolved to perform distinct functions
• Proteins that exert similar functions may share common domains, but… domain sequences are not always very similar – more refined methods are required than simple similarity searches
• Proteins may share common domains but have different architectures – no single domain is necessarily responsible for protein function. Many proteins use multiple domains to perform their activities
Some conclusions
• Similarity searches may reveal proteins that share very similar sequences and functions – high similarity over the full length of the query sequence
• An output with no significant hits, or with hits only to unannotated proteins, will not reveal the possible function of the query protein
• Similarity searches do not differentiate orthologues from paralogues
• When matching multidomain proteins, it may not be appropriate to transfer the functional annotation – the context is important!
So what do proteins with similar function have in common?
Residues, motifs, domains, architecture…
Pattern databases
• Databases that contain patterns of residue conservation within groups of related sequences
• There are several methods to determine patterns
• There are many different pattern databases
Common protein pattern databases
• PROSITE patterns – regular expressions
• PROSITE profiles – weight matrices (profiles)
• Pfam – a database of protein domain families. Contains a curated multiple sequence alignment and a corresponding HMM for each family
• PRINTS – a database of groups of motifs that, taken together, are more powerful for assigning protein function than any single motif
• ProDom – an automatically generated database based on the recursive use of PSI-BLAST similarity searches
• InterPro – an integrated database that combines different protein signature recognition methods in a single resource
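Since PROSITE patterns are essentially regular expressions, a pattern can be translated into an ordinary regex and searched against a sequence. The sketch below is a minimal illustration, not the PROSITE scanning software: the function name `prosite_to_regex` is hypothetical, the test sequence is made up, and only the basic syntax is handled (`x`, `[..]`, `{..}`, `(n)`, `(n,m)`, and `<` / `>` anchors outside brackets). The example pattern `N-{P}-[ST]-{P}` is the PROSITE N-glycosylation site pattern.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE-style pattern into a Python regex string."""
    pattern = pattern.rstrip(".")  # PROSITE entries end patterns with a period
    parts = []
    for element in pattern.split("-"):
        # '<' pins the match to the N-terminus, '>' to the C-terminus.
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        # Split off an optional repetition such as x(2) or x(2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":                 # any residue
            core = "."
        elif core.startswith("{"):      # {P} means "anything except P"
            core = "[^" + core[1:-1] + "]"
        # [ST] and single letters are already valid regex syntax.
        repeat = ""
        if low:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


regex = prosite_to_regex("N-{P}-[ST]-{P}")                      # "N[^P][ST][^P]"
hits = [m.start() for m in re.finditer(regex, "MKNGTALNPSA")]   # [2]
```

Note that `re.finditer` reports non-overlapping matches only, so a real scanner would need extra care with overlapping sites.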
How to start building a pattern database?
With multiple sequence alignments of functionally related proteins
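As a rough illustration of how an alignment leads to a pattern, the sketch below reads a gap-free alignment column by column and writes a PROSITE-style expression. The function name `alignment_to_pattern`, the `max_choices` cutoff and the four toy sequences are all assumptions made for this example; curated databases build their patterns with far more care than this.

```python
def alignment_to_pattern(alignment, max_choices=3):
    """Derive a PROSITE-style pattern from gap-free, equal-length aligned sequences."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    elements = []
    for column in zip(*alignment):          # walk the alignment column by column
        residues = sorted(set(column))
        if len(residues) == 1:              # fully conserved position
            elements.append(residues[0])
        elif len(residues) <= max_choices:  # a few alternatives: list them
            elements.append("[" + "".join(residues) + "]")
        else:                               # too variable: any residue
            elements.append("x")
    return "-".join(elements)


# Toy alignment invented for illustration only.
toy_alignment = ["CAHCGK", "CSHCGR", "CAHCPK", "CTHCGR"]
pattern = alignment_to_pattern(toy_alignment)   # "C-[AST]-H-C-[GP]-[KR]"
```

Lowering `max_choices` makes the pattern more permissive (variable columns collapse to `x`), which trades specificity for sensitivity.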
Some definitions
• Protein motif – a single conserved region
• PROSITE pattern – a consensus expression of a conserved region
• Frequency matrices (PRINTS) – matrices that contain the frequencies with which residues occur at each position of a given motif
• PSSMs – position-specific score (weight) matrices (BLOCKS) – add a scoring scheme to the frequency matrices
• Profile HMMs – probabilistic models derived from alignment profiles
• Protein domain – a part of the protein sequence and structure that can evolve, function and exist independently of the rest of the protein chain
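The difference between a frequency matrix and a PSSM can be sketched in a few lines. The code below is an assumption-laden illustration, not the scheme used by PRINTS or BLOCKS: it takes a made-up gap-free alignment, assumes a uniform background frequency of 1/20 for every amino acid, and adds a simple pseudocount before taking log-odds scores. All function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def frequency_matrix(alignment):
    """One dict per alignment column: residue -> observed frequency."""
    matrix = []
    for column in zip(*alignment):
        counts = Counter(column)
        matrix.append({aa: counts[aa] / len(column) for aa in AMINO_ACIDS})
    return matrix


def pssm(alignment, pseudocount=1.0):
    """Log-odds scores (log2) per column, with pseudocounts.

    Assumption: uniform background frequencies; real schemes use
    observed database frequencies and substitution-aware pseudocounts.
    """
    background = 1.0 / len(AMINO_ACIDS)
    matrix = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        matrix.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return matrix


def score(pssm_matrix, segment):
    """Sum the position-specific scores of a segment as long as the motif."""
    return sum(col[aa] for col, aa in zip(pssm_matrix, segment))


# Toy motif alignment invented for illustration only.
toy_motif = ["CAHC", "CSHC", "CAHC", "CTHC"]
freqs = frequency_matrix(toy_motif)   # freqs[1]["A"] is 0.5
weights = pssm(toy_motif)
```

With these toy data a segment resembling the motif, such as `score(weights, "CAHC")`, scores higher than an unrelated one, such as `score(weights, "WWWW")`, which is the property a scoring scheme adds on top of raw frequencies.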