
Comparison and analysis of gene expression and protein abundance

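Comparison and analysis of gene expression and protein abundance. Comparative analysis of different label-free mass spectrometry based protein abundance estimates and their correlation with RNA-Seq gene expression data. Ning Kang (ningkang@qibebt.ac.cn), Computational Biology Research Group, Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences (QIBEBT-CAS). http://www.qibebt.ac.cn/ 11/15/2012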


Presentation Transcript


  1. Comparison and analysis of gene expression and protein abundance. Comparative analysis of different label-free mass spectrometry based protein abundance estimates and their correlation with RNA-Seq gene expression data. Ning Kang (ningkang@qibebt.ac.cn), Computational Biology Research Group, Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences (QIBEBT-CAS) http://www.qibebt.ac.cn/ 11/15/2012 http://www.bioenergychina.org/ http://www.computationalbioenergy.org/

  2. Outline • Background • General analysis scheme • Transcriptome analysis • Proteome analysis • Associated analysis • Explanation of the correlations • Technical issues • Biological issues • The best techniques…

  3. The important biological questions • Everything goes high-throughput… • What is the underlying process in transcription and translation? • Transcriptome: gene expression • Proteome: protein abundance • But the correlation between them is not very high…

  4. The techniques for these questions • On the proteomic side, LC–MS/MS or shotgun proteomics is the method of choice for large-scale protein identification • Label-free methods and labeled methods (figure panels: Labeled, Label-free)

  5. The techniques for these questions • On the proteomic side • Label-free: MS-1 based “peak intensity” or MS-2 based “spectrum counting” (figure panels: Spectrum counting, Peak intensity; source: J. Proteome Res, 2012, 11(4), 2261-2271)

  6. The techniques for these questions • On the transcriptomic side, next-generation sequencing has recently emerged as a promising alternative to established microarray-based methods (figure panels: Microarray, RNA-Seq)

  7. The main objectives • Comparative analysis of different label-free protein quantification methods using several software tools on the proteomic side • Correlation analysis of gene expression data derived using microarray and RNA-Seq methods on the genomic side • Better understanding of correlation between gene and protein expression

  8. Outline • Background • General analysis scheme • Transcriptome analysis • Proteome analysis • Associated analysis • Explanation of the correlations • Technical issues • Biological issues • The best techniques…

  9. The overall scheme J. Proteome Res, 2012, 11(4), 2261-2271

  10. The datasets • Comprehensively analyzed mouse mitochondrial genes and proteins in various mouse tissues • MitoCarta database http://www.broadinstitute.org/pubs/MitoCarta/ • RNA-Seq profiling (http://woldlab.caltech.edu/rnaseq/) • GNF1M tissue atlas

  11. The analysis procedure • mzXML, X!Tandem, PeptideProphet, ProteinProphet • Spectral count (NSAF) • msInspect, msBID, SpectrumMill • RPKM values
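
The slide above names two normalized quantities, NSAF for spectral counts and RPKM for RNA-Seq reads, without spelling them out. As a minimal sketch, assuming the standard textbook definitions (NSAF: spectral count divided by protein length, then divided by the sum of that ratio over all proteins; RPKM: reads per kilobase of transcript per million mapped reads), they could be computed as follows. The function names and the example numbers are illustrative and are not taken from the talk.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor for each protein.

    spectral_counts: dict mapping protein id -> spectral count
    lengths:         dict mapping protein id -> protein length (residues)
    """
    # SAF = spectral count / protein length
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    # NSAF = SAF normalized so that all values in the run sum to 1
    return {p: value / total for p, value in saf.items()}


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical example values, for illustration only
counts = {"protA": 10, "protB": 30}
lengths = {"protA": 100, "protB": 600}
print(nsaf(counts, lengths))
print(rpkm(500, 2000, 10_000_000))
```

Both normalizations divide by length, which is one reason gene and protein length can influence the abundance values that are later correlated.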

  12. Outline • Background • General analysis scheme • Transcriptome analysis • Proteome analysis • Associated analysis • Explanation of the correlations • Technical issues • Biological issues • The best techniques…

  13. The number of comparable genes • Mitochondrial (all) genes that could be compared at proteomic level J. Proteome Res, 2012, 11(4), 2261-2271

  14. Different techniques for expression measurements • At gene expression level • At protein abundance level

  15. Correlation between Gene Expression and Protein Abundances

  16. Correlation between Gene Expression and Protein Abundances • MS-1 based “peak intensity” • MS-2 based “spectrum count”

  17. Changes of expression in different tissues: mRNA vs. protein • Direction of changes • The majority of genes exhibited the same direction of change in gene expression (mRNA-Seq) and in protein abundance (msInspect) for brainstem against liver (figure panels: Gene expression, Protein abundance; Brainstem vs. Liver)
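
The direction-of-change comparison on the slide above can be illustrated with a small sketch: compute a log2 fold change between two tissues at the mRNA level and at the protein level, then count the genes whose two fold changes share a sign. This is an assumed formulation for illustration, not necessarily the exact procedure used in the study; the function name, gene names and values are hypothetical.

```python
import math


def direction_concordance(mrna_a, mrna_b, prot_a, prot_b):
    """Fraction of shared genes whose mRNA and protein log2 fold changes
    (tissue A over tissue B) have the same sign.

    Each argument is a dict mapping gene id -> strictly positive abundance.
    Genes with a fold change of exactly zero at either level are skipped.
    """
    shared = mrna_a.keys() & mrna_b.keys() & prot_a.keys() & prot_b.keys()
    same = 0
    total = 0
    for gene in shared:
        mrna_fc = math.log2(mrna_a[gene] / mrna_b[gene])
        prot_fc = math.log2(prot_a[gene] / prot_b[gene])
        if mrna_fc == 0 or prot_fc == 0:
            continue
        total += 1
        if (mrna_fc > 0) == (prot_fc > 0):
            same += 1
    return same / total if total else float("nan")


# Hypothetical abundances for three genes in brainstem and liver
mrna_brain = {"g1": 8.0, "g2": 2.0, "g3": 5.0}
mrna_liver = {"g1": 2.0, "g2": 6.0, "g3": 1.0}
prot_brain = {"g1": 4.0, "g2": 1.0, "g3": 1.0}
prot_liver = {"g1": 1.0, "g2": 3.0, "g3": 2.0}
print(direction_concordance(mrna_brain, mrna_liver, prot_brain, prot_liver))
```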

  18. Technical Factors Affecting the Correlation • The lengths of genes • Gene length affects both gene expression and protein abundance values

  19. Technical Factors Affecting the Correlation • The low-abundance genes • The inclusion of lower intensity genes and proteins does not significantly affect the overall correlation.

  20. Technical Factors Affecting the Correlation • Does number matter? • The standard deviation of correlation coefficients gradually increased: a noticeable shift in the correlation coefficients toward lower values… Increasing R
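
One way to probe the "does number matter?" question is to draw random subsets of genes of a given size, recompute the correlation for each subset, and look at the spread of the resulting coefficients. The sketch below assumes that setup; the slide does not describe the exact procedure, so the function names, subset sizes, repeat count and synthetic data are all illustrative.

```python
import math
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def subsample_correlations(x, y, subset_size, n_repeats=200, seed=0):
    """Pearson coefficients over random gene subsets of a fixed size."""
    rng = random.Random(seed)
    indices = range(len(x))
    coefficients = []
    for _ in range(n_repeats):
        chosen = rng.sample(indices, subset_size)  # without replacement
        coefficients.append(
            pearson([x[i] for i in chosen], [y[i] for i in chosen])
        )
    return coefficients


# Synthetic, noisy "gene" and "protein" values (not real data)
rng = random.Random(1)
gene = [rng.gauss(0, 1) for _ in range(300)]
protein = [g + rng.gauss(0, 1) for g in gene]
for size in (200, 50, 10):
    r = subsample_correlations(gene, protein, size)
    print(size, round(statistics.mean(r), 3), round(statistics.stdev(r), 3))
```

Printing the mean and standard deviation per subset size makes it easy to see how the spread of the coefficients behaves as fewer genes are included.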

  21. Technical Factors Affecting the Correlation • Does gene structure matter? • Restricting the analysis to “coding region dominant” genes only improved the correlation slightly

  22. Biological Factors Affecting the Correlation • The effect of functional annotations • Correlation between gene and protein abundances for selected GO categories

  23. Biological Factors Affecting the Correlation • The sub-location annotation issue • The correlation based on inner membrane genes is better than the one based on all mouse mitochondrial genes

  24. Among the top 5 most read articles in the journal in April 2012 (publication month)

  25. Biological Factors Affecting the Correlation • The RNA/protein stability issue • mRNA and protein half-lives in the mouse • Protein and mRNA stability are among the most significant factors governing the correlation between gene and protein abundances Quantitative model of gene expression in growing cells Chen, et al., Nature, 2011

  26. Next step: from analysis to prediction • A mathematical model predicts protein expression from mRNA expression, using translation rate and degradation rate • Issues: • The translation and protein degradation rates are difficult to measure • The model assumes a steady state in the cell • ……
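
The slide above does not give the model's equations. A minimal sketch, assuming the simplest first-order kinetics (protein is produced at a rate proportional to mRNA and degraded at a rate proportional to its own level, dP/dt = k_translation * M - k_degradation * P), shows what the steady-state assumption buys: the protein level becomes P = k_translation * M / k_degradation. All names and rate values below are hypothetical.

```python
def steady_state_protein(mrna, k_translation, k_degradation):
    """Protein level at which production and degradation balance."""
    return k_translation * mrna / k_degradation


def simulate_protein(mrna, k_translation, k_degradation,
                     p0=0.0, dt=0.01, steps=10_000):
    """Forward-Euler integration of dP/dt = k_translation*M - k_degradation*P."""
    protein = p0
    for _ in range(steps):
        protein += dt * (k_translation * mrna - k_degradation * protein)
    return protein


# Hypothetical values: constant mRNA level and made-up rate constants
mrna_level = 10.0
print(steady_state_protein(mrna_level, 2.0, 0.5))
print(simulate_protein(mrna_level, 2.0, 0.5))
```

Under this assumed model, two genes with identical mRNA levels can differ widely in protein abundance when their translation or degradation rates differ, which is consistent with the slide's point that these hard-to-measure rates matter.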

  27. Divide-and-conquer based on bi-clustering?

  28. Quantification and explanations

  29. Bi-clustering of gene expression / protein abundance • Bi-clustering of expressions…

  30. Factors for protein degradation • Enzyme activities: • Other factors: • Amino acids W, C, T, F, Y, V are enriched in labile proteins, whereas E, D, K, N, R, Q are enriched in stable proteins. • Short half-life proteins are enriched for membrane proteins and signal transduction proteins, whereas long half-life proteins are enriched for cytoskeleton proteins and nuclear proteins with housekeeping functions

  31. Preliminary results • Mouse liver tissue Hierarchical Cluster

  32. Preliminary results Bi-clustering y = 0.7847x + 5.8634 R² = 0.7497 y = 0.8396x + 5.9941 R² = 0.8004
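
The trend lines quoted on this slide have the form y = ax + b with an R² value; the slide does not say which quantities are on each axis. As a hedged illustration of how such a line and its R² are obtained, here is an ordinary least-squares fit in plain Python. The function name and the sample points are made up and do not reproduce the slide's numbers.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - residual sum of squares / total sum of squares
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical points, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [6.5, 7.6, 8.1, 9.2, 9.7]
slope, intercept, r2 = linear_fit(xs, ys)
print(f"y = {slope:.4f}x + {intercept:.4f}, R^2 = {r2:.4f}")
```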

  33. Preliminary results Clusters of interests

  34. Preliminary results Bi-clustering result analysis

  35. Preliminary results • 1. Stable mRNA and protein: (1) Enzymes (citric acid cycle, energy metabolism) (2) Reductases. We reason that many housekeeping genes tend to have stable mRNAs and proteins. • 2. Stable mRNA and labile protein: (1) Regulated gene expression products (2) Dehydrogenases (3) Oxidases

  36. Preliminary results: mathematical modeling • Use SVM (support vector machine) to combine multiple features? • The effect of a single factor: enzyme activity • Plus: 3D structure, enzyme activity, etc. • (Diagram: Cluster1 to Cluster4 mapped to Protein1 to Protein4 through SVM modeling)

  37. Summary • Spectral counts are a good basis for a more comprehensive strategy of evaluating protein abundance trends • Protein abundance estimated from the top 3 normalized peptide area intensities in MS1 correlated best with gene expression data collected through RNA-Seq • Both technical and biological factors affect the correlation between gene expression and protein abundance • A divide-and-conquer method for designing a robust computational model for extracting gene and label-free protein abundance information
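
The "top 3" estimate mentioned in the summary can be sketched as follows: for each protein, take the three most intense peptides and combine their (already normalized) MS1 area intensities. Whether the study summed or averaged them is not stated on the slide; the sketch below averages them, which is an assumption, and the function name and values are hypothetical.

```python
def top3_abundance(peptide_intensities):
    """Mean of the (up to) three highest peptide intensities of a protein.

    peptide_intensities: iterable of normalized MS1 peptide area intensities.
    Returns 0.0 when no peptide was quantified.
    """
    top = sorted(peptide_intensities, reverse=True)[:3]
    return sum(top) / len(top) if top else 0.0


# Hypothetical normalized peptide intensities for two proteins
proteins = {
    "protA": [5.0, 1.0, 9.0, 7.0],
    "protB": [4.0, 2.0],
}
for name, intensities in proteins.items():
    print(name, top3_abundance(intensities))
```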

  38. http://www.computationalbioenergy.org/ Genotype Phenotype Enterotype Big-Data (genomics, proteomics, Raman profiling, etc.) Pure Strain (Genomic method) Community (Metagenomic method) Single-cell (Single-cell method)

  39. Bioenergy Agriculture Fermentation Medicine Bio-resources Cell biology Healthcare Synthetic biology Bio-material Microbial community Molecular biology Environmental monitoring Ecology Bio-defense Bionics Food screening …… Metagenomic technology Single-cell technology

  40. Single-cell data analysis platform

  41. Single-cell data analysis platform • Single-cell manipulation / sorting • Automatic phenotyping

  42. Acknowledgements • Members: • Staff: Q Zhou, XQ Su, LH Ren, JY Wang, AH Wang, XZ Chang, YH Qiao • Students: RR Huang, XJ Wang, BX Song, W Fang, JQ Hu, M Gabriel (visiting), XW Cheng, J Wang • Collaborators: • JIANG Tao (UC Riverside, USA; ACM Fellow) (on metagenomics) • WONG Limsoon (NUS, Singapore) (on network) • CUI Xingping (UC Riverside, USA) (on SNP detection and metagenomics) • Yiu SM, Li SC (Hong Kong) (on network) • Jan Baumbach (MPI, Germany) (on network) • WEI Chaochun (SJTU, China) (on metagenomics) • Alexey Nesvizhskii (U of Michigan, USA) (on proteomics) • Ansgar Poetsch (RUB, Germany) (on proteomics)

  43. Thank you!

  44. http://ComputationalBioenergy.org Example software Research areas Hardware platform Released software

  45. Thank you! Qingdao / Tsingdao
