
Assessment of Genome-wide Protein Function Classification for Drosophila melanogaster
Huaiyu Mi



Presentation Transcript


  1. Assessment of Genome-wide Protein Function Classification for Drosophila melanogaster
     Huaiyu Mi, mihn@fc.celera.com
     Panther Protein Informatics group, Celera Genomics

  2. How to classify proteins in a robust and accurate way?

  3. Outline
     • Introduction to PANTHER
     • Comparison of functional classification of Drosophila proteins by FlyBase and PANTHER

  4. What is PANTHER?
     PANTHER library (PANTHER/LIB)
       • a family tree
       • a multiple sequence alignment
       • an HMM
     PANTHER index (PANTHER/X)
       • Molecular function
       • Biological process

  5. Building the library
     • 500,000 protein sequences (filtered GenBank NR)
     • clustered into 2200+ protein family clusters
     • for each family: MSA, HMM, and tree
     • 40,000 subfamilies
     • Biologist curation: each family and subfamily was labeled with a name and classified by PANTHER/X categories
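Slides 4 and 5 describe each PANTHER/LIB entry as a family with a tree, a multiple sequence alignment, and an HMM, divided into curated subfamilies that each carry a name and PANTHER/X categories. The Python sketch below is one hypothetical way to hold such an entry in memory; the class names, fields, the Newick tree format, and the HMM file paths are assumptions for illustration, not PANTHER's actual file layout.

```python
from dataclasses import dataclass, field


@dataclass
class Subfamily:
    """One curated subfamily: its own HMM plus a name and PANTHER/X categories."""
    name: str
    hmm_path: str  # hypothetical location of the subfamily HMM file
    molecular_functions: list[str] = field(default_factory=list)
    biological_processes: list[str] = field(default_factory=list)


@dataclass
class Family:
    """One PANTHER/LIB-style family: tree, multiple sequence alignment, and HMM."""
    name: str
    msa: dict[str, str]  # sequence ID -> aligned (gapped) sequence
    tree_newick: str  # family tree serialized as Newick (an assumption)
    hmm_path: str  # hypothetical location of the family HMM file
    molecular_functions: list[str] = field(default_factory=list)
    biological_processes: list[str] = field(default_factory=list)
    subfamilies: list[Subfamily] = field(default_factory=list)

    def alignment_length(self) -> int:
        """Return the number of MSA columns; every aligned row must be equally long."""
        lengths = {len(row) for row in self.msa.values()}
        if len(lengths) != 1:
            raise ValueError("MSA must be non-empty with equally long rows")
        return lengths.pop()


# Usage with made-up data (not taken from PANTHER):
family = Family(
    name="example receptor family",
    msa={"seqA": "MK-LV", "seqB": "MKQLV"},
    tree_newick="(seqA,seqB);",
    hmm_path="family.hmm",
    molecular_functions=["receptor"],
    subfamilies=[Subfamily(name="example subfamily", hmm_path="subfamily1.hmm",
                           molecular_functions=["protein kinase receptor"])],
)
print(family.alignment_length())  # 5
```

Keeping family-level and subfamily-level categories separate mirrors the slides' point that both families and subfamilies are named and classified.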

  6. PANTHER library (PANTHER/LIB)

  7. PANTHER index (PANTHER/X)
     GO:
     signal transducer GO:0004871
     => receptor GO:0004872
     => => transmembrane receptor GO:0004888
     PANTHER/X:
     RECEPTOR
     => G-protein coupled receptor
     => protein kinase receptor
     => => serine/threonine protein kinase receptor
     => => tyrosine protein kinase receptor
     GO (continued):
     transmembrane receptor protein kinase GO:0019199
     => transmembrane receptor protein serine/threonine kinase GO:0004675
     => => transforming growth factor alpha receptor GO:0005023
     => => transforming growth factor beta receptor GO:0005024
     => => => activin receptor GO:0017002
     => => => => type I activin receptor GO:0016361
     => => => => type II activin receptor GO:0016362
     => => => type I transforming growth factor beta receptor GO:0005025
     => => => => type I activin receptor GO:0016361
     => => => type II transforming growth factor beta receptor GO:0005026
     => => => => type II activin receptor GO:0016362
     => transmembrane receptor protein tyrosine kinase GO:0004714
     => => boss receptor GO:0008288
     => => ephrin receptor GO:0005003
     => => => GPI-linked ephrin receptor GO:0005004
     => => => transmembrane-ephrin receptor GO:0005005
     => => epidermal growth factor receptor GO:0005006
     => => => gurken receptor GO:0008313
     => => fibroblast growth factor receptor GO:0005007
     => => hepatocyte growth factor receptor GO:0005008
     => => insulin receptor GO:0005009
     => => insulin-like growth factor receptor GO:0005010
     => => macrophage colony stimulating factor receptor GO:0005011
     => => macrophage receptor GO:0008019
     => => Neu/ErbB-2 receptor GO:0005012
     => => neurotrophin TRK receptor GO:0005013
     => => => neurotrophin TRKA receptor GO:0005014
     => => => neurotrophin TRKB receptor GO:0005015
     => => => neurotrophin TRKC receptor GO:0005016
     => => platelet-derived growth factor receptor GO:0005017
     => => => platelet-derived growth factor, alpha-receptor GO:0005018
     => => => platelet-derived growth factor, beta-receptor GO:0005019
     => => stem cell factor receptor GO:0005020
     => => vascular endothelial growth factor receptor GO:0005021
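Slide 7 sets GO's deep receptor hierarchy next to the shorter PANTHER/X one. A common convention for such ontologies (GO's "true path rule") is that assigning a protein to a category also assigns it to every ancestor category. The sketch below encodes the PANTHER/X receptor fragment from the slide as child-to-parent links and walks up them; it assumes PANTHER/X follows the same convention, and the dictionary and function names are hypothetical.

```python
from typing import Optional

# Child -> parent links for the PANTHER/X "receptor" fragment shown on slide 7;
# None marks a top-level category.
PANTHER_X_PARENT: dict[str, Optional[str]] = {
    "receptor": None,
    "G-protein coupled receptor": "receptor",
    "protein kinase receptor": "receptor",
    "serine/threonine protein kinase receptor": "protein kinase receptor",
    "tyrosine protein kinase receptor": "protein kinase receptor",
}


def with_ancestors(category: str) -> list[str]:
    """Return the category followed by all of its ancestors, most specific first."""
    path: list[str] = []
    current: Optional[str] = category
    while current is not None:
        if current not in PANTHER_X_PARENT:
            raise KeyError(f"unknown PANTHER/X category: {current!r}")
        path.append(current)
        current = PANTHER_X_PARENT[current]
    return path


print(with_ancestors("tyrosine protein kinase receptor"))
# ['tyrosine protein kinase receptor', 'protein kinase receptor', 'receptor']
```

A parent map is enough here because each PANTHER/X category in this fragment has a single parent; GO terms can have several parents, which would need a graph walk instead.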

  8. PANTHER Scoring
     • Input: a FASTA file
     • Score against the family and subfamily HMMs
     • Score above threshold? If yes: classified (name, molecular function, biological process)
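The scoring step on slide 8 can be sketched as: read the FASTA input, take HMM scores for each sequence against the family and subfamily models, and classify only when the best score clears a threshold. This is a minimal sketch under stated assumptions: the scores are assumed to come from an external HMM search tool (for example HMMER), higher scores are assumed to be better, the threshold value is made up, and `parse_fasta`, `HmmHit`, and `classify` are hypothetical helpers. It simply keeps the single best hit, whereas the real pipeline may score families and subfamilies in separate stages.

```python
from dataclasses import dataclass
from typing import Optional


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {sequence ID: sequence}, using the first header word as ID."""
    records: dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)
            header = line[1:].split()
            current_id = header[0] if header else ""
            chunks = []
        else:
            chunks.append(line)  # sequence lines may wrap; join them
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


@dataclass
class HmmHit:
    """Score of one sequence against one family or subfamily HMM (hypothetical)."""
    name: str  # family or subfamily name
    score: float  # assumed: higher means a better fit
    molecular_functions: tuple[str, ...] = ()
    biological_processes: tuple[str, ...] = ()


def classify(hits: list[HmmHit], threshold: float) -> Optional[HmmHit]:
    """Return the best-scoring hit if its score is above the threshold, else None."""
    if not hits:
        return None
    best = max(hits, key=lambda hit: hit.score)
    return best if best.score > threshold else None


# Usage with made-up sequences and scores:
records = parse_fasta(">CG0001 hypothetical fly protein\nMKLV\nQQ\n")
hits = [
    HmmHit("family X", 40.0, ("receptor",)),
    HmmHit("subfamily X.1", 55.0, ("protein kinase receptor",)),
]
best = classify(hits, threshold=20.0)
print(records)  # {'CG0001': 'MKLVQQ'}
print(best.name if best else "unclassified")  # subfamily X.1
```

Returning `None` rather than a low-scoring hit keeps unclassified proteins distinct from weakly matching ones, which matters for the coverage comparison later in the talk.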

  9. How accurate is PANTHER?
     • PANTHER: an automated annotation process
     • FlyBase: a manually curated database for Drosophila genes
     • Assess the associations

  10. Process for comparison
     • Fly protein sequences: PANTHER annotation by scoring against PANTHER; FlyBase annotation with GO terms
     • Automated comparison of FlyBase and PANTHER assignments: Match or Not match
     • Manual review: Correct, Inconclusive, or Incorrect
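The automated half of the comparison on slide 10 can be sketched with set membership, assuming both sources' assignments have already been mapped into one shared category vocabulary (the slides do not say how that mapping was done). Only proteins annotated by both sources are compared; each association is labeled a match when the other source made the same assignment and is otherwise queued for manual review, which stays a human step. The function name and protein IDs are hypothetical.

```python
def auto_compare(flybase: dict[str, set[str]],
                 panther: dict[str, set[str]]) -> list[tuple[str, str, str, str]]:
    """Label each association for proteins annotated by both sources.

    Returns (source, protein, term, outcome) rows; outcome is "match" when the
    other source assigned the same term to that protein, else "manual review".
    """
    rows = []
    for protein in sorted(flybase.keys() & panther.keys()):
        fb_terms, px_terms = flybase[protein], panther[protein]
        for term in sorted(fb_terms):
            rows.append(("FlyBase", protein, term,
                         "match" if term in px_terms else "manual review"))
        for term in sorted(px_terms):
            rows.append(("PANTHER", protein, term,
                         "match" if term in fb_terms else "manual review"))
    return rows


# Usage with hypothetical proteins and categories:
flybase = {"protein1": {"receptor"}, "protein2": {"kinase", "transporter"}}
panther = {"protein2": {"kinase"}, "protein3": {"ligase"}}
for row in auto_compare(flybase, panther):
    print(row)
# ('FlyBase', 'protein2', 'kinase', 'match')
# ('FlyBase', 'protein2', 'transporter', 'manual review')
# ('PANTHER', 'protein2', 'kinase', 'match')
```

Rows are kept per source because each source's associations are counted separately in the assessment (slide 16 gives separate totals for FlyBase and PANTHER).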

  11. Figure: Coverage of Drosophila proteins classified by FlyBase and PANTHER (panels A-C: molecular function; panels D-F: biological process). Groups shown: FlyBase classified / not classified to GO; PANTHER HMM hits classified / not classified to GO; PANTHER not hit; proteins classified by both (overlap: 3283 for molecular function, 1159 for biological process).
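The coverage breakdown on slide 11 comes down to set arithmetic over protein IDs. The sketch below counts each group for one ontology (molecular function or biological process); the function name, group labels, and toy IDs are hypothetical, and it assumes a protein can be classified by PANTHER only if it first has an HMM hit.

```python
def coverage(all_proteins, flybase_classified, panther_hits,
             panther_classified) -> dict[str, int]:
    """Count proteins in each FlyBase/PANTHER coverage group."""
    proteins = set(all_proteins)
    fb = set(flybase_classified) & proteins
    hits = set(panther_hits) & proteins
    px = set(panther_classified) & hits  # assumed: classification requires an HMM hit
    return {
        "classified by both": len(fb & px),
        "FlyBase only": len(fb - px),
        "PANTHER only": len(px - fb),
        "PANTHER hit, not classified": len(hits - px),
        "no PANTHER hit": len(proteins - hits),
    }


# Usage with toy IDs (not the study's data):
result = coverage(
    all_proteins=["p1", "p2", "p3", "p4", "p5", "p6"],
    flybase_classified=["p1", "p2", "p3"],
    panther_hits=["p2", "p3", "p4", "p5"],
    panther_classified=["p2", "p4"],
)
print(result)
# {'classified by both': 1, 'FlyBase only': 2, 'PANTHER only': 1,
#  'PANTHER hit, not classified': 2, 'no PANTHER hit': 2}
```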

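  12. Assessment of molecular function associations (chart comparing FlyBase and PANTHER)
     Categories: Manual match, Inconclusive, Auto match, Correct, Incorrect
     FlyBase segment values (smallest to largest): 37, 58, 195, 663, 2747 (total 3700)
     PANTHER segment values (smallest to largest): 35, 50, 345, 700, 2737 (total 3867)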

  13. Types of errors
     • Homology error: an error caused by an incorrect functional prediction based on sequence homology.
     • Human error: an error on the part of the human curator.
     • Evidence error: an error caused by relying on evidence that is itself incorrect.

  14. Analysis of errors

  15. Example of homology error: PANTHER function inference in the context of a protein sequence tree
     FBgn0032382 (CG14934)
     • FlyBase: alpha glucosidase; neutral amino acid transporter
     • PANTHER: alpha glucosidase
     Figure: protein sequence tree with labeled branches CG14934, Alpha glucosidase, Alpha amylase, Neutral a.a. transporter, Alpha amylase

  16. Summary
     • PANTHER is an automated method to classify proteins in a robust way.
     • The accuracy of PANTHER was assessed by comparing its classification of Drosophila proteins with FlyBase's.
     • A total of 3283 Drosophila proteins were associated with at least one molecular function category by both FlyBase and PANTHER (3867 molecular function associations by PANTHER and 3700 by FlyBase).
     • About 90% of these associations match between FlyBase and PANTHER.
     • The total error rate is below 2% for both methods.
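The match and error rates in the summary are fractions of all associations made by one source. A minimal sketch, assuming "match" means auto match plus manual match and "error" means the Incorrect category from slide 12 (the slides do not state these definitions explicitly); the counts in the usage line are made up, not the study's data.

```python
def assessment_rates(counts: dict[str, int]) -> dict[str, float]:
    """Match and error rates over all associations made by one source.

    Assumes "match" = auto match + manual match and "error" = incorrect.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no associations to assess")
    matched = counts.get("auto match", 0) + counts.get("manual match", 0)
    return {"match rate": matched / total,
            "error rate": counts.get("incorrect", 0) / total}


example = {"auto match": 80, "manual match": 10, "correct": 5,
           "inconclusive": 3, "incorrect": 2}  # made-up counts
print(assessment_rates(example))  # {'match rate': 0.9, 'error rate': 0.02}
```

Computing the rates per source matters because the two sources make different numbers of associations, so the same count of errors gives a different rate for each.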

  17. Acknowledgements
     Celera Genomics: Paul Thomas, Jody Vandergriff, Michael Campbell, Apurva Narechania, William Majoros, Karen Diemer, Olivier Doremieux, Nan Guo, Anish Kejariwal, Steven Ladunga, Betty Lazareva, Anushya Muruganujan, Steve Rabkin
     FlyBase: Michael Ashburner, Susanna Lewis
