
By Xianfeng (Jeff) Chen, Computational and Systems Biologist, May 7, 2009

Bioinformatics Cyber-infrastructure for Genomics and Proteomics in Systems Biology


Presentation Transcript


  1. Bioinformatics Cyber-infrastructure for Genomics and Proteomics in Systems Biology. By Xianfeng (Jeff) Chen, Computational and Systems Biologist, May 7, 2009

  2. Agenda Today • (1) Cyber-infrastructure and systems biology. • (2) High performance computing and software for peptide/protein identification and quantification, data mining/target discovery, on mass spectrometry generated proteomics data. • (3) Relational database management system, genome annotation methodology, systems biology data integration, biology knowledge generation and augmentation.

  3. Section One: Cyber-infrastructure and Systems Biology. Reductionist approach: one gene, one protein. Systems approach: multiple genes, network analysis. Cutting-edge science and technology.

  4. Status of Technologies in Systems Biology

  5. Cyber-infrastructure for Systems Biology • http://www.communitytechnology.org/nsf_ci_report/ • “… build new types of scientific and engineering knowledge environments and organizations to pursue research in new ways and with increased efficacy.” • “… new NSF funding of $1 billion per year is needed to achieve critical mass …” Figure labels: 2008, awarded $50 million; 2004, awarded $85 million; 2004, awarded to $100 million.

  6. Supporting Cyber-infrastructure and Systems Biology Workflow (figure labels: Historic strong area; Supporting)

  7. Cyber-knowledge System to Enable Genomics-based Predictive Medicine (DOE - Genomics: GTL Roadmap, p.52)

  8. Core Laboratory Facility: Data Generation. Core Computational Facility: Data Processing, Storage, and Dissemination. Cyber-infrastructure, Data Management, Data Analysis Pipeline, and Data Display. System Integration at Systems Biology Center: (1) LIMS for raw data & protocol; (2) Preprocessed data management; (3) High throughput computing; (4) Data validation and integration; (5) Knowledge representation. Data Mining and Knowledge Discovery.

  9. Cyber-infrastructure Component (1): High Performance Computing. Migration of Bio-Computing Capability. Figure labels: Start point, Step 1, Step 2; PC single-CPU computing, Unix multiple-CPU computing, cluster computing; 2-4 biological labs, 5-10 biological labs in US, most labs; for large sets of data analysis.

  10. Cyber-infrastructure Component (2) : Integrated Knowledgebase System --- Case Study of National Biodefense Proteomics Data Center

  11. Section Two: High Performance Computing and Proteomics. System Integration Case 1: UVa Proteomics Data Center (figure labels: High Performance and Throughput Computing; Data Management)

  12. Computational Proteomics Software and Algorithms. Protein database search engines: Mascot (Matrix Science); Sequest / Bioworks (Scripps/Thermo); X!Tandem (the GPM); Spectrum Mill (Agilent Technologies); OMSSA (NCBI); PEAKS (Bioinformatics Solutions Inc.); Phenyx (GeneBio). Statistical validation and quantitation: PeptideProphet (Institute for Systems Biology); ProteinProphet (Institute for Systems Biology); ASAPRatio, XPRESS, Libra (Institute for Systems Biology); Scaffold Batch System (Proteome Software, Inc.); SIEVE (Thermo); Census (Scripps Research Institute). Open data standards: FuGE and XAR (FHCRC, ICBC, ITMAT, & Manchester); MIAPE (HUPO PSI and Collaborators); mzXML, pepXML, protXML (Institute for Systems Biology); MS1, MS2, SQT (Scripps Research Institute). Many more …

  13. System Integration Case 2: National Biodefense Proteomics Data Center. http://www.proteomicsresource.org Awarded $14 million.

  14. Proteomics Research Centers (PRC) and Their Major Data Types. PRC organization: major data types. (1) University of Michigan: microarray and mass spectrometry; (2) Caprion Pharmaceuticals: mass spectrometry; (3) Harvard Proteomics Institute: genomics and protein expression array; (4) Albert Einstein College of Medicine: mass spectrometry; (5) PNNL: mass spectrometry; (6) Scripps: NMR structural, X-ray crystal diffraction data, and mass spectrometry; (7) Myriad Genetics: yeast two-hybrid system.

  15. Proteomics Data Flow (diagram labels). Data types: 2D gels, protein array, LC, immunoaffinity purification, Y2H, MS, MS/MS, NMR, X-ray, cryoelectron microscopy, X-ray diffraction, etc. Data sources: PRCs. Steps: data modeling / decomposition; converting to standard format; QA & QC (quality assurance & quality control); standard format for each data type; public relational database. Other labels: VBI. MIAME and MIAPE-like standards/SOP for data submission.

  16. Proteomics Database Architecture

  17. Databases in Proteomics Data Center: Search by Experiment/Sample

  18. Strategies for Annotating Raw Data into Meaningful Knowledge • Annotation improvement and interaction network analysis: (1) Non-homology-based methods: phylogenetic profiling, Rosetta stone pattern, operon analysis, co-expression profiling, gene neighboring, etc. (2) Comparative genomics with reference genomes: E. coli, yeast, Arabidopsis, and other model organisms. • Identifying anchor points for data integration: (1) Known metabolic pathway; (2) Known signal transduction pathway; (3) Known gene regulation machinery; (4) Known protein-protein interaction map. BMC Bioinformatics 2006, 7 (Suppl 4):S18
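
Phylogenetic profiling, one of the methods listed on slide 18, can be illustrated with a minimal sketch: genes whose presence/absence patterns across reference genomes are similar are proposed as functionally linked. The gene names, profiles, Jaccard similarity measure, and 0.7 threshold below are all assumptions chosen for illustration; the slides do not say which measure or cutoff was used.

```python
from itertools import combinations

# Hypothetical presence/absence profiles: 1 = a homolog of the gene is found
# in that reference genome, 0 = not found. Genome order is identical for all genes.
profiles = {
    "geneA": (1, 0, 1, 1, 0, 1),
    "geneB": (1, 0, 1, 1, 0, 1),
    "geneC": (0, 1, 0, 0, 1, 0),
    "geneD": (1, 0, 1, 0, 0, 1),
}


def jaccard(p, q):
    """Jaccard similarity of two binary profiles: shared presences / any presence."""
    both = sum(1 for a, b in zip(p, q) if a and b)
    either = sum(1 for a, b in zip(p, q) if a or b)
    return both / either if either else 0.0


def linked_pairs(profiles, threshold=0.7):
    """Return (gene1, gene2, score) for pairs whose profiles meet the threshold."""
    pairs = []
    for g1, g2 in combinations(sorted(profiles), 2):
        score = jaccard(profiles[g1], profiles[g2])
        if score >= threshold:
            pairs.append((g1, g2, round(score, 2)))
    return pairs


print(linked_pairs(profiles))
```

Pairs that pass the threshold are only candidates for a functional link; in the workflow on this slide they would still need to be anchored to known pathways or interaction maps.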

  19. Qualitative Data Integration and Knowledge Augmentation Based on Networks Biology

  20. Quantitative Proteome Profiling (the field is 2-3 years old). Thermo SIEVE: scatter plot of 14 UVa raw files for validation of data quality and absolute quantification. Scaffold: proteome spectral counts for semi-quantification.
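
Spectral counting, mentioned on slide 20 as a semi-quantitative approach, can be sketched as follows. The example uses the normalized spectral abundance factor (NSAF), a commonly used length normalization; this is a swapped-in illustration and not necessarily what Scaffold or SIEVE computes. The protein names, counts, and lengths are made up.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor for each protein.

    SAF = spectral count / protein length; NSAF = SAF / sum of all SAFs,
    so the values across one sample sum to 1.
    """
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("no spectra were counted for any protein")
    return {p: value / total for p, value in saf.items()}


# Hypothetical sample: spectral counts and protein lengths (in residues).
counts = {"P1": 30, "P2": 10, "P3": 10}
lengths = {"P1": 300, "P2": 100, "P3": 200}

for protein, value in nsaf(counts, lengths).items():
    print(protein, round(value, 3))
```

Dividing by length matters because longer proteins yield more peptides and therefore more spectra at the same abundance; here P1 and P2 end up with the same NSAF despite a threefold difference in raw counts.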

  21. Search Engine Comparison at UVa Proteomics Data Center (1): Low annotation rates; few common annotations

  22. Peptide/Protein Identifications with Various Protein Database Search Engines (2): X!Tandem missed; OMSSA missed; Sequest over-predicted

  23. UVaPDC, MS/MS Search Engine Comparison (3): Common annotations; statistics on confidence values; spectral counts
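
The comparison summarized on slides 21-23 (common annotations, identifications missed by one engine, identifications reported by only one engine) amounts to set arithmetic over each engine's results. The sketch below assumes each engine's output has already been reduced to a set of peptide identifiers; the identifiers are placeholders, not data from the UVa study.

```python
from collections import Counter

# Hypothetical peptide identifications reported by each engine for one run.
ids = {
    "Sequest": {"PEPTIDEA", "PEPTIDEB", "PEPTIDEC", "PEPTIDED"},
    "X!Tandem": {"PEPTIDEA", "PEPTIDEB"},
    "OMSSA": {"PEPTIDEA", "PEPTIDEC"},
}


def compare_engines(ids):
    """Summarize agreement between search engines.

    Returns (common, missed, unique):
      common - peptides reported by every engine
      missed - per engine, peptides found by at least two other engines
               but absent from this one
      unique - per engine, peptides reported by this engine alone
    """
    votes = Counter(pep for peps in ids.values() for pep in peps)
    common = {pep for pep, n in votes.items() if n == len(ids)}
    missed = {
        engine: {pep for pep, n in votes.items() if n >= 2 and pep not in peps}
        for engine, peps in ids.items()
    }
    unique = {
        engine: {pep for pep in peps if votes[pep] == 1}
        for engine, peps in ids.items()
    }
    return common, missed, unique


common, missed, unique = compare_engines(ids)
print("common:", sorted(common))
print("missed:", {e: sorted(p) for e, p in missed.items()})
print("unique:", {e: sorted(p) for e, p in unique.items()})
```

A peptide reported by a single engine is only a candidate over-prediction; deciding that would need the confidence statistics the slide refers to.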

  24. Statistics and Summarization Capability of Scaffold: the best feature of the software

  25. Data Mining on Data Processed via Computational Approach: Knowledge-based Discovery

  26. Inference on Gene Network in Systems Biology. Identified knowledge: (1) Y2H, (2) MS pull-down assay, (3) co-expression assay. Knowledge inference: rate-limiting step; identified target/lead protein. Where are the significant regulatory steps impacting pathway expression?

  27. Urinary Biomarker Identification: EGFR Pathway Related Bladder Cancer (small-scale analysis). Workflow labels: healthy individual vs. patient with bladder cancer; urine; urine exosomes; urine microparticles; ectosomes; LC-MS/MS; SEQUEST; spectral count analysis; Western blotting (EPS8L2). Pathway labels: Mucin-4*, EGFR, adenylate cyclase, cAMP, ATP, P, Gα*, Gβ, Gγ, GTP, GDP, NRas*, EPS15, EPS8L1* or EPS8L2*, EDH1, Raf, MAPK, cell proliferation, MP formation. * Differentially expressed.

  28. Pattern Matching on Gene Signatures at Various Biological States (large-scale analysis). Query signatures are compared to reference gene/protein expression signatures for known perturbations or disease phenotypes (many-to-many association analysis).
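
The many-to-many comparison described on slide 28 can be sketched by scoring every query signature against every reference signature. The sketch assumes signatures are dictionaries of log fold-changes keyed by gene and uses Pearson correlation over shared genes as the score; the slide does not name a scoring method, and all names and values here are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def match_signatures(queries, references, min_shared=3):
    """Score every query against every reference; best matches first."""
    hits = []
    for qname, q in queries.items():
        for rname, r in references.items():
            shared = sorted(set(q) & set(r))
            if len(shared) < min_shared:
                continue  # too few genes in common to compare
            score = pearson([q[g] for g in shared], [r[g] for g in shared])
            hits.append((qname, rname, round(score, 3)))
    return sorted(hits, key=lambda hit: -hit[2])


# Hypothetical log fold-change signatures.
queries = {"sample1": {"geneA": 2.0, "geneB": 1.0, "geneC": 1.5, "geneD": -0.5}}
references = {
    "perturbation_X": {"geneA": 1.8, "geneB": 1.1, "geneC": 1.4, "geneD": -0.4},
    "perturbation_Y": {"geneA": -2.0, "geneB": -1.0, "geneC": -1.5, "geneD": 0.5},
}

for hit in match_signatures(queries, references):
    print(hit)
```

A strongly negative score is as informative as a strongly positive one: it flags a reference perturbation that reverses the query signature.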

  29. Section Three: Knowledge Base Establishment. Database Case 1: Soybean Upstream Regulatory Elements for Ongoing Regulatory Motif Annotation

  30. Nominated Transcription Factor Involved in Stress Response. Implicated in regulating wounding and jasmonate responses. Soybean promoters: GmERFs, Gmubis, Gmcons, GmWRKYs, and more (10 promoters per month). Figure labels: Group IX Promoter; red dot = soybean ERF genes.

  31. Ongoing Effort on Transcription Factor Binding Motifs: identify genetic circuits of cell wall, starch, and lipid biosynthesis and degradation

  32. Elucidation of Conserved Co-expression Networks via Data Integration with Expression Profiling Data

  33. Database Case 2: CGKB and TOBFAC Knowledge Bases • BMC Bioinformatics. 2007, 8:129. • BMC Bioinformatics. 2008, 9:53.

  34. Genome Annotation Strategy (1): Homology-based Annotation. High level coding region detection! 263,425 total cowpea gene space sequences (GSS). BMC Genomics. 2008, 9:103.

  35. Genome Annotation Strategy (2): Metabolic Pathway Integration. BMC Bioinformatics. 2007, 8:129.

  36. Genome Annotation Strategy (3): GO Integration with Distribution of Function Assignments. BMC Genomics. 2008, 9:103.

  37. Genome Annotation Strategy (4): Comparative Genomics at Genome Scale. Example: Medicago vs. cowpea. BMC Genomics. 2008, 9:103.

  38. Genome Annotation Strategy (5): Comparison at Gene Family Level. WRKY and CONSTANS (CO) and CO-like Gene Families of Cowpea Transcription Factors • BMC Genomics. 2008, 9:103. • Plant Physiology. 2008, 147:280-295.

  39. Genome Annotation Strategies: (6) Repeat, (7) Domain, (8) Gene Model. BMC Bioinformatics. 2007, 8:129.

  40. Genome Annotation Strategy (9): Comparative Genomics on Network for Conserved Protein Complexes. Conserved networks; comparative genome analysis.
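
One simple way to look for conserved network elements, as on slide 40, is to map each interaction in one species through an ortholog table and keep the interactions whose mapped pair also interacts in the other species. This is a minimal sketch under that assumption; the slides do not describe the actual method, and the protein identifiers, interaction lists, and one-to-one ortholog mapping are hypothetical.

```python
def conserved_interactions(ppi_a, ppi_b, orthologs):
    """Return interactions from species A whose ortholog pair also interacts in B.

    ppi_a, ppi_b - iterables of (protein, protein) pairs, treated as undirected
    orthologs    - dict mapping a species-A protein to its species-B ortholog
    """
    edges_b = {frozenset(edge) for edge in ppi_b}
    conserved = []
    for p, q in ppi_a:
        if p in orthologs and q in orthologs:
            mapped = frozenset((orthologs[p], orthologs[q]))
            if mapped in edges_b:
                conserved.append((p, q))
    return conserved


# Hypothetical interaction lists and ortholog table.
ppi_a = [("a1", "a2"), ("a2", "a3"), ("a3", "a4")]
ppi_b = [("b2", "b1"), ("b3", "b4")]
orthologs = {"a1": "b1", "a2": "b2", "a3": "b3", "a4": "b4"}

print(conserved_interactions(ppi_a, ppi_b, orthologs))
```

Using `frozenset` makes the comparison independent of the order in which each pair was recorded. Conserved edges that cluster together are candidates for conserved complexes.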

  41. Published Protein-Protein Interactions (PPI) in Organisms: Example of Yeast PPI

  42. Genome Annotation Strategy (10): Functional Validation of Genes of Interest Through Reverse Genetics Program (figure labels: 2008; My name)

  43. Acknowledgement
