1 / 39

Predicting Protein Function: Understanding Biochemical Functions through Protein Signature and Protein Domains

Learn how to predict protein function through protein signatures and domains. Explore the role of homologous proteins, orthologs, paralogs, and COGs in inferring molecular functions and biological processes. Discover the significance of motifs, domains like DNA-binding, and the Pfam database in functional annotation and protein classification.

janandrews
Download Presentation

Predicting Protein Function: Understanding Biochemical Functions through Protein Signature and Protein Domains

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Predicting Protein Function protein RNA DNA

  2. Biochemical function (molecular function) What does it do? Kinase??? Ligase??? Page 245

  3. Function based on ligand binding specificity What (who) does it bind ?? Page 245

  4. Function based on biological process What is it good for ?? Amino acid metabolism? Page 245

  5. Function based on cellular location DNA RNA Where is it active?? Nucleolus ?? Cytoplasm?? Page 245

  6. Function based on cellular location DNA RNA Where is the RNA/Protein Expressed ?? Brain? Testis? Where it is under expressed?? Page 245

  7. GO (gene ontology)http://www.geneontology.org/ • The GO project is aimed to develop three structured, controlled vocabularies (ontologies) that describe gene products in terms of their associated • molecular functions(F) • biological processes (P) • cellular components (C) Ontology is a description of the concepts and relationships that can exist for an agent or a community of agents

  8. Inferring protein function Bioinformatics approach • Based on homology • Based on functional characteristics • “protein signature”

  9. Homologous proteins • Rule of thumb:Proteins are homologous if 25% identical (length >100)

  10. Proteins with a common evolutionary origin Homologous proteins Orthologs - Proteins from different species that evolved by speciation. Hemoglobin human vsHemoglobin mouse Paralogs - Proteins encoded within a given species that arose from one or more gene duplication events. Hemoglobin human vsMyoglobin human

  11. COGsClustersof Orthologous Groupsof proteins > Each COG consists of individual orthologous proteins or orthologous sets of paralogs. > Orthologs typically have the same function, allowing transfer of functional information from one member to an entire COG. Refence: Classification of conserved genes according to their homologous relationships. (Koonin et al., NAR) DATABASE

  12. Inferring protein function based on the protein signature

  13. The Protein Signature • Motif (or fingerprint): • a short, conserved region of a protein • typically 10 to 20 contiguous amino acid residues • Domain: • A region of a protein that can adopt a 3 dimensional structure

  14. Protein Motifs Protein motifs can be represented as a consensus or a profile 1 50 ecblc MRLLPLVAAA TAAFLVVACS SPTPPRGVTV VNNFDAKRYL GTWYEIARFD vc MRAIFLILCS V...LLNGCL G..MPESVKP VSDFELNNYL GKWYEVARLD hsrbp ~~~MKWVWAL LLLAAWAAAE RDCRVSSFRV KENFDKARFS GTWYAMAKKD GTWYEI K AV M GXW[YF][EA][IVLM]

  15. Searching for Protein Motifs - ProSite a database of protein patterns that can be searched by either regular expression patterns or sequence profiles. - PHI BLASTSearching a specific protein sequence pattern with local alignments surrounding the match. -MEME searching for a common motifs in unaligned sequences

  16. Protein Domains • Domains can be considered as building blocks of proteins. • Some domains can be found in many proteins with different functions, while others are only found in proteins with a certain function.

  17. DNA Binding domainZinc-Finger

  18. Varieties of protein domains Extending along the length of a protein Occupying a subset of a protein sequence Occurring one or more times Page 228

  19. Example of a protein with 2 domains: Methyl CpG binding protein 2 (MeCP2) MBD TRD The protein includes a Methylated DNA Binding Domain (MBD) and a Transcriptional Repression Domain (TRD). MeCP2 is a transcriptional repressor.

  20. Result of an MeCP2 blastp search: A methyl-binding domain shared by several proteins

  21. Are proteins that share only a domain homologous?

  22. Pfam • > Database that contains a large collection of multiple sequence alignments of protein domains • Based on • Profile hidden Markov Models (HMMs).

  23. Profile HMM (Hidden Markov Model) HMM is a probabilistic model of the MSA consisting of a number of interconnected states D19 D16 D17 D18 100% delete 100% 16 17 18 19 50% M16 M17 M18 M19 D R T R D R T S S - - S S P T R D R T R D P T S D - - S D - - S D - - S D - - R 100% 100% 50% Match D 0.8 S 0.2 P 0.4 R 0.6 R 0.4 S 0.6 T 1.0 I16 I17 I18 I19 insert X X X X

  24. Pfam > Database that contains a large collection of multiple sequence alignments of protein domains Based on Profile Hidden Markov Models (HMMs). • > The Pfam database is based on two distinct classes of alignments • Seed alignments which are deemed to be accurate and used to produce Pfam A • -Alignments derived by automatic clustering of SwissProt, which are less reliable and give rise to Pfam B

  25. Physical properties of proteins

  26. DNA binding domains have relatively high frequency of basic (positive) amino acids MKD P A A LKRARN T E A A RRS SRARKL QRM GCN4 zif268 M E R P Y A C P V E S C D RR F S R S D E L T RH I R I H T S K V N E A F E T L KR C T S S N P N Q R L P K V E I L R N A I R myoD

  27. Transmembrane proteins have a unique hydrophobicity pattern

  28. Physical properties of proteins Many websites are available for the analysis of individual proteins for example: EXPASY (ExPASy) UCSC Proteome Browser ProtoNet HUJI The accuracy of the analysis programs are variable. Predictions based on primary amino acid sequence (such as molecular weight prediction) are likely to be more trustworthy. For many other properties (such as Phosphorylation sites), experimental evidence may be required rather than prediction algorithms. Page 236

  29. Knowledge Based Approach • IDEA Find the common properties of a protein family (or any group of proteins of interest) which are unique to the group and different from all the other proteins. Generate a model for the group and predict new members of the family which have similar properties.

  30. Knowledge Based Approach Basic Steps 1. Building a Model • Generate a dataset of proteins with a common function (DNA binding protein) • Generate a control dataset • Calculate the different properties which are characteristic of the protein family you are interested for all the proteins in the data (DNA binding proteins and the non-DNA binding proteins • Represent each protein in a set by a vector of calculated features and build a statistical model to split the groups

  31. Basic Steps 2. Predicting the function of a new protein • Calculate the properties for a new protein And represent them in a vector • Predict whether the tested protein belongs to the family

  32. TEST CASE Y14 – A protein sequence translated from an ORF (Open Reading Frame) Obtained from the Drosophila complete Genome >Y14 PQRSVGWILFVTSIHEEAQEDEIQEKFCDYGEIKNIHLNLDRRTGFSKGYALVEYETHKQALAAKEALNGAEIMGQTIQVDWCFVKG G

  33. >Y14 PQRSVGWILFVTSIHEEAQEDEIQEKFCDYGEIKNIHLNLDRRTGFSKGYALVEYETHKQALAAKEALNGAEIMGQTIQVDWCFVKG G Y14 DOES NOT BIND RNA

  34. Database and Tools for protein families and domains • InterPro - Integrated Resources of Proteins Domains and Functional Sites • Prosite – A dadabase of protein families and domain • BLOCKS - BLOCKS db • Pfam - Protein families db (HMM derived) • PRINTS - Protein Motif fingerprint db • ProDom - Protein domain db (Automatically generated) • PROTOMAP - An automatic hierarchical classification of Swiss-Prot proteins • SBASE - SBASE domain db • SMART - Simple Modular Architecture Research Tool • TIGRFAMs - TIGR protein families db

More Related