Comparison and analysis of gene expression and protein abundance: Comparative analysis of different label-free mass spectrometry based protein abundance estimates and their correlation with RNA-Seq gene expression data. Kang Ning (宁康), ningkang@qibebt.ac.cn, Computational Biology Research Group, Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences (QIBEBT-CAS). http://www.qibebt.ac.cn/ 11/15/2012 http://www.bioenergychina.org/ http://www.computationalbioenergy.org/
Outline • Background • General analysis scheme • Transcriptome analysis • Proteome analysis • Associated analysis • Explanation of the correlations • Technical issues • Biological issues • The best techniques…
The important biological questions • Everything goes high-throughput… • What is the underlying process in transcription and translation? • Transcriptome (gene expression) vs. proteome (protein abundance) • But the correlation between them is not very high…
The techniques for these questions • On the proteomic side LC–MS/MS or shotgun proteomics is the method of choice for large-scale protein identification • Label-free methods and labeled methods Labeled Label-free
The techniques for these questions • On the proteomic side • Label-free: MS-1 based “peak intensity” or MS-2 based “spectrum counting” Spectrum counting Peak intensity J. Proteome Res, 2012, 11(4), 2261-2271
The techniques for these questions • On the transcriptomic side Next-generation sequencing has recently emerged as a promising alternative to established microarray based methods Microarray RNA-Seq
The main objectives • Comparative analysis of different label-free protein quantification methods using several software tools on the proteomic side • Correlation analysis of gene expression data derived using microarray and RNA-Seq methods on the genomic side • Better understanding of correlation between gene and protein expression
The overall scheme J. Proteome Res, 2012, 11(4), 2261-2271
The datasets • Comprehensively analyzed mouse mitochondrial genes and proteins in various mouse tissues • MitoCarta database http://www.broadinstitute.org/pubs/MitoCarta/ • RNA-Seq profiling (http://woldlab.caltech.edu/rnaseq/) • GNF1M tissue atlas
The analysis procedure • mzXML • X!Tandem • PeptideProphet • ProteinProphet • Spectral count (NSAF) • msInspect • msBID • SpectrumMill • RPKM values
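The two normalized quantities named on this slide, NSAF for spectral counts and RPKM for RNA-Seq reads, can be sketched with their standard definitions. This is a minimal illustration, not the pipeline used in the talk; the function names and the example numbers are assumptions made up for the sketch.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor per protein.

    spectral_counts: dict protein -> number of MS/MS spectra assigned to it
    lengths: dict protein -> protein length in residues
    NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j)
    """
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    return {p: value / total for p, value in saf.items()}


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical numbers, for illustration only.
counts = {"protA": 30, "protB": 10}
lengths = {"protA": 300, "protB": 100}
print(nsaf(counts, lengths))
print(rpkm(500, 2000, 10_000_000))
```

Both measures divide by length, so a long protein or transcript is not scored as more abundant merely because it yields more spectra or reads.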
The number of comparable genes • Mitochondrial (all) genes that could be compared at proteomic level J. Proteome Res, 2012, 11(4), 2261-2271
Different techniques for expression measurements • At gene expression level • At protein abundance level
Correlation between Gene Expression and Protein Abundances • MS-1 based “peak intensity” • MS-2 based “spectrum count”
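A correlation of this kind can be computed on log-transformed values. The slide does not say which coefficient was used, so the sketch below shows both Pearson and Spearman as an assumption; the input values are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-gene values: RPKM and a protein abundance estimate.
rpkm_values = [1.0, 4.0, 16.0, 64.0]
protein_values = [2.0, 3.0, 40.0, 35.0]
log_rna = [math.log2(v) for v in rpkm_values]
log_prot = [math.log2(v) for v in protein_values]
print(pearson(log_rna, log_prot), spearman(log_rna, log_prot))
```

The same functions apply unchanged whether the protein estimate comes from MS-1 peak intensity or MS-2 spectrum counts, which is what makes the two approaches directly comparable.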
Changes in expression across tissues: mRNA vs. protein • Direction of change • The majority of genes exhibited the same direction of change in gene expression (RNA-Seq) and in protein abundance (msInspect) for brainstem versus liver (Figure panels: gene expression, protein abundance; brainstem vs. liver)
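One way to quantify "same direction of change" is the fraction of genes whose log fold change between two tissues has the same sign at the mRNA and protein levels. The sketch below assumes that reading; the function names, the pseudocount, and all values are hypothetical.

```python
import math


def log2_fold_change(a, b, pseudocount=1e-9):
    """log2(a / b), with a small pseudocount to avoid division by zero."""
    return math.log2((a + pseudocount) / (b + pseudocount))


def direction_agreement(rna_a, rna_b, prot_a, prot_b):
    """Fraction of genes changing in the same direction for mRNA and protein.

    Each argument is a dict gene -> abundance in tissue A or tissue B.
    All four dicts are assumed to share the same gene keys.
    """
    same = 0
    total = 0
    for gene in rna_a:
        rna_fc = log2_fold_change(rna_a[gene], rna_b[gene])
        prot_fc = log2_fold_change(prot_a[gene], prot_b[gene])
        if rna_fc == 0 or prot_fc == 0:
            continue  # no change on one side, so the direction is undefined
        total += 1
        if (rna_fc > 0) == (prot_fc > 0):
            same += 1
    return same / total if total else float("nan")


# Hypothetical values for three genes, tissue A vs. tissue B.
rna_brainstem = {"g1": 10.0, "g2": 5.0, "g3": 8.0}
rna_liver = {"g1": 5.0, "g2": 10.0, "g3": 2.0}
prot_brainstem = {"g1": 3.0, "g2": 1.0, "g3": 1.0}
prot_liver = {"g1": 1.0, "g2": 4.0, "g3": 2.0}
print(direction_agreement(rna_brainstem, rna_liver, prot_brainstem, prot_liver))
```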
Technical Factors Affecting the Correlation • The lengths of genes • Gene length affects both gene expression and protein abundance values
Technical Factors Affecting the Correlation • The low-abundance genes • The inclusion of lower intensity genes and proteins does not significantly affect the overall correlation.
Technical Factors Affecting the Correlation • Does number matter? • The standard deviation of correlation coefficients gradually increased: a noticeable shift in the correlation coefficients toward lower values… Increasing R
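The slide does not spell out the experiment. Assuming "number" means the number of genes included, the effect can be sketched by repeatedly drawing random subsets of genes and recording how much the correlation coefficient varies. The data below are synthetic and the function names are hypothetical.

```python
import math
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_spread(x, y, subset_size, n_trials=200, seed=0):
    """Mean and standard deviation of Pearson r over random gene subsets."""
    rng = random.Random(seed)
    indices = list(range(len(x)))
    coeffs = []
    for _ in range(n_trials):
        chosen = rng.sample(indices, subset_size)
        coeffs.append(pearson([x[i] for i in chosen], [y[i] for i in chosen]))
    return statistics.mean(coeffs), statistics.stdev(coeffs)


# Synthetic data for illustration only: log protein = log mRNA + noise.
rng = random.Random(1)
log_rna = [rng.uniform(0, 10) for _ in range(500)]
log_prot = [v + rng.gauss(0, 3) for v in log_rna]
for size in (400, 100, 25):
    mean_r, sd_r = correlation_spread(log_rna, log_prot, size)
    print(size, round(mean_r, 3), round(sd_r, 3))
```

Comparing the printed standard deviations across subset sizes shows how stable the coefficient is for a given number of genes.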
Technical Factors Affecting the Correlation • Does gene structure matter? • Restricting the analysis to these genes only (termed “coding region dominant” genes) improved the correlation slightly
Biological Factors Affecting the Correlation • The effect of functional annotations • Correlation between gene and protein abundances for selected GO categories
Biological Factors Affecting the Correlation • The sub-location annotation issue • Correlation based on these inner membrane genes is better than based on all mouse mitochondrial genes
Among the top 5 most read articles in the journal in April 2012 (publication month)
Biological Factors Affecting the Correlation • The RNA/protein stability issue • mRNA and protein half-lives in the mouse • Protein and mRNA stability are among the most significant factors governing the correlation between gene and protein abundances Quantitative model of gene expression in growing cells Chen, et al., Nature, 2011
Next step: from analysis to prediction • A mathematical model predicts protein expression from mRNA expression, translation rate, and degradation rate • Issues: • The translation and protein degradation rates are difficult to measure • The model assumes a steady state in the cell • ……
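The slide does not give the model itself. A common first-order sketch, assumed here, treats protein synthesis as proportional to mRNA level and degradation as first-order decay, dP/dt = k_tl * M - k_deg * P, which at steady state gives P = k_tl * M / k_deg. The rate names and example values below are hypothetical, and dilution by cell growth is ignored.

```python
import math


def degradation_rate(half_life_hours):
    """First-order decay constant (per hour) from a half-life in hours."""
    return math.log(2) / half_life_hours


def steady_state_protein(mrna_level, translation_rate, protein_half_life_hours):
    """Steady-state protein level under dP/dt = k_tl * M - k_deg * P.

    mrna_level: mRNA copies per cell (M)
    translation_rate: proteins made per mRNA per hour (k_tl)
    protein_half_life_hours: protein half-life, converted to k_deg
    """
    k_deg = degradation_rate(protein_half_life_hours)
    return translation_rate * mrna_level / k_deg


# Hypothetical parameters: same mRNA level, different protein stability.
for half_life in (6.0, 24.0, 48.0):
    print(half_life, steady_state_protein(10.0, 100.0, half_life))
```

Under this assumption, two genes with identical mRNA levels can differ severalfold in protein abundance purely because of protein half-life, which is one reason mRNA and protein levels correlate imperfectly.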
Bi-clustering of gene expression / protein abundance • Bi-clustering of expressions…
Factors for protein degradation • Enzyme activities: • Other factors: • Amino acids W, C, T, F, Y, V are enriched in labile proteins, whereas E, D, K, N, R, Q are enriched in stable proteins. • Short half-life proteins are enriched for membrane proteins and signal transduction proteins, whereas long half-life proteins are enriched for cytoskeleton proteins and nuclear proteins with housekeeping functions
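The amino acid enrichment noted above could be turned into simple sequence features, for example the fraction of residues from each group. This is only a sketch of one possible feature, not a method stated in the talk; the feature names and the example sequence are made up.

```python
# Residue groups as listed on the slide.
LABILE_ENRICHED = set("WCTFYV")
STABLE_ENRICHED = set("EDKNRQ")


def composition_features(sequence):
    """Fractions of residues in the labile-enriched and stable-enriched groups."""
    seq = sequence.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    labile = sum(1 for aa in seq if aa in LABILE_ENRICHED) / n
    stable = sum(1 for aa in seq if aa in STABLE_ENRICHED) / n
    return {"labile_fraction": labile, "stable_fraction": stable}


# Hypothetical 10-residue sequence, for illustration only.
print(composition_features("MKWVTFEDKR"))
```

Features like these could be one input among several to the kind of combined model mentioned later in the deck.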
Preliminary results • Mouse liver tissue Hierarchical Cluster
Preliminary results • Bi-clustering • Fitted trend lines from the two plots: y = 0.7847x + 5.8634 (R² = 0.7497); y = 0.8396x + 5.9941 (R² = 0.8004)
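Trend lines of the form y = ax + b with an R² value, like those quoted on this slide, are typically obtained by ordinary least squares. The sketch below assumes that is how they were fitted; the data points are invented and do not reproduce the slide's numbers.

```python
def linear_fit(x, y):
    """Ordinary least squares fit y = slope * x + intercept, with R^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical log-scale values for genes in one cluster.
log_mrna = [1.0, 2.0, 3.0, 4.0, 5.0]
log_protein = [6.9, 7.4, 8.6, 8.9, 10.1]
print(linear_fit(log_mrna, log_protein))
```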
Preliminary results Clusters of interests
Preliminary results Bi-clustering result analysis
Preliminary results • 1. Stable mRNA and protein: (1) Enzymes (citric acid cycle, energy metabolism); (2) Reductases. We reason that many housekeeping genes tend to have stable mRNAs and proteins. • 2. Stable mRNA and labile protein: (1) Products of regulated genes; (2) Dehydrogenases; (3) Oxidases
Preliminary results • Mathematical modeling • The effect of a single factor: enzyme activity • Use an SVM (support vector machine) to combine multiple features? Plus: 3D structure, enzyme activity, etc. • SVM modeling (diagram: Cluster1 to Cluster4 paired with Protein1 to Protein4)
Summary • Spectral counts are a good basis for a more comprehensive strategy of evaluating protein abundance trends • Protein abundance estimated from the top 3 normalized peptide area intensities from MS1 correlated best with gene expression data collected through RNA-Seq • Both technical and biological factors affect the correlation between gene expression and protein abundance • A divide-and-conquer method for designing a robust computational model for extracting gene and label-free protein abundance information
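The "top 3" estimate in the summary can be sketched as follows, assuming the protein abundance is taken as the mean of its three most intense peptides. Whether the talk used a mean or a sum, and how the intensities were normalized, is not stated, so both are assumptions here, as are the function name and values.

```python
def top3_abundance(peptide_intensities):
    """Mean of the (up to) three highest peptide intensities for one protein."""
    top = sorted(peptide_intensities, reverse=True)[:3]
    if not top:
        return 0.0  # no quantified peptides for this protein
    return sum(top) / len(top)


# Hypothetical normalized MS1 peak areas for the peptides of one protein.
print(top3_abundance([5.0, 1.0, 9.0, 7.0]))
```

Using only the strongest peptides reduces the influence of poorly ionizing peptides, whose low intensities say little about how much protein is present.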
http://www.computationalbioenergy.org/ Genotype Phenotype Enterotype Big-Data (genomics, proteomics, Raman profiling, etc.) Pure Strain (Genomic method) Community (Metagenomic method) Single-cell (Single-cell method)
Bioenergy Agriculture Fermentation Medicine Bio-resources Cell biology Healthcare Synthetic biology Bio-material Microbial community Molecular biology Environmental monitoring Ecology Bio-defense Bionics Food screening …… Metagenomic technology Single-cell technology
Single-cell data analysis platform • Single-cell manipulation / sorting • Automatic phenotyping
Acknowledgements • Members: • Staff: Q Zhou, XQ Su, LH Ren, JY Wang, AH Wang, XZ Chang, YH Qiao • Students: RR Huang, XJ Wang, BX Song, W Fang, JQ Hu, M Gabriel (visiting), XW Cheng, J Wang • Collaborators: • JIANG Tao (UC Riverside, USA; ACM Fellow) (on metagenomics) • WONG Limsoon (NUS, Singapore) (on network) • CUI Xingping (UC Riverside, USA) (on SNP detection and metagenomics) • Yiu SM, Li SC (Hong Kong) (on network) • Jan Baumbach (MPI, Germany) (on network) • WEI Chaochun (SJTU, China) (on metagenomics) • Alexey Nesvizhskii (U of Michigan, USA) (on proteomics) • Ansgar Poetsch (RUB, Germany) (on proteomics)
http://ComputationalBioenergy.org Example software Research areas Hardware platform Released software
Thank you! Qingdao / Tsingdao