Measuring Isoform Expression from RNA-Seq data Based on LDA 刘学军 2012.9.21
Outline • Background • Modeling RNA-Seq data • Results
RNA-Seq data – an example • Reference: ACGTCCCC • 12 ACGTC reads • 8 CGTCC reads • 9 GTCCC reads • 5 TCCCC reads • This gene can be summarized by the sequence of counts 12, 8, 9, 5.
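The counting step in this example can be sketched in a few lines of Python. This is a minimal illustration, assuming all reads have the same length and match the reference exactly at a single position; the function name `position_counts` is hypothetical, and a real pipeline would use an aligner rather than exact substring matching.

```python
from collections import Counter

def position_counts(reference, reads):
    """Count reads starting at each reference position (exact match only)."""
    read_len = len(reads[0])
    tally = Counter(reads)
    # One window per possible start position of a read on the reference.
    return [
        tally[reference[i:i + read_len]]
        for i in range(len(reference) - read_len + 1)
    ]

reference = "ACGTCCCC"
reads = ["ACGTC"] * 12 + ["CGTCC"] * 8 + ["GTCCC"] * 9 + ["TCCCC"] * 5
counts = position_counts(reference, reads)
print(counts)  # [12, 8, 9, 5]
```

If the same substring occurred at several reference positions, this naive version would attribute each such read to every matching position, which is one reason real tools resolve ambiguous mappings explicitly.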
Convert \theta to expression level • Obtain P(\theta|D) • Normalize counts to sequencing depth and isoform length:
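One common normalization of this kind is RPKM (reads per kilobase of transcript per million mapped reads). The sketch below illustrates that measure only; it is an assumption that it resembles the normalization used on the slide, and the step from \theta to per-isoform read counts is not shown. The numbers are hypothetical.

```python
def rpkm(read_count, isoform_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million).
    return read_count * 1e9 / (isoform_length_bp * total_mapped_reads)

# Hypothetical values, for illustration only.
value = rpkm(read_count=500, isoform_length_bp=2000, total_mapped_reads=10_000_000)
print(value)  # 25.0
```

Dividing by length keeps long isoforms from looking more expressed merely because they yield more reads, and dividing by depth makes samples sequenced to different depths comparable.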
Data set 1 • 3 conditions, each with 2 technical replicates • 9370 genes that contain multiple isoforms
Data set 1 • Histogram of probe number per gene • 72.43 on average
Data set 2 • Two conditions, 8 qRT-PCR validated isoforms
Data set 2 • Histogram of probe number per gene • 72.57 on average
Modelling Multi-response Surfaces for Airfoil Design with Multiple Output Gaussian Process Regression
Gaussian processes • Multiple output GP • MGP in airfoil design
Gaussian Processes • A Gaussian process (GP) is used to describe a distribution over functions. • A GP is a collection of random variables, any finite number of which have a joint Gaussian distribution.
Gaussian Processes • The mean function and the covariance function are defined as m(x) = E[f(x)] and k(x, x') = E[(f(x) - m(x))(f(x') - m(x'))]. • The GP can be written as f(x) \sim GP(m(x), k(x, x')).
Gaussian Processes The covariance function implies the prior distribution over functions.
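To make this concrete, the sketch below draws functions from a zero-mean GP prior under two different length scales. It assumes a squared-exponential covariance, which is a common choice but not necessarily the one used in the talk; the names `rbf_kernel` and `sample_prior` are hypothetical.

```python
import numpy as np

def rbf_kernel(xa, xb, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def sample_prior(x, n_samples=3, lengthscale=1.0, seed=0):
    """Draw functions from a zero-mean GP prior evaluated at the points x."""
    rng = np.random.default_rng(seed)
    # Small jitter on the diagonal keeps the Cholesky factorization stable.
    K = rbf_kernel(x, x, lengthscale) + 1e-6 * np.eye(len(x))
    L = np.linalg.cholesky(K)
    # If z ~ N(0, I), then L z ~ N(0, K).
    return (L @ rng.standard_normal((len(x), n_samples))).T

x = np.linspace(-3.0, 3.0, 50)
smooth = sample_prior(x, lengthscale=2.0)  # slowly varying draws
wiggly = sample_prior(x, lengthscale=0.3)  # rapidly varying draws
```

Only the covariance changes between the two calls, yet the draws differ markedly in smoothness, which is the sense in which the covariance function determines the prior over functions.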
Gaussian Processes • Prediction with noise-free observations: f_* | X_*, X, f \sim N(K(X_*, X) K(X, X)^{-1} f, K(X_*, X_*) - K(X_*, X) K(X, X)^{-1} K(X, X_*))
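Noise-free GP prediction amounts to conditioning a joint Gaussian on the observed function values. The sketch below implements the standard conditioning formulas, assuming a zero mean function and a squared-exponential covariance; the kernel choice, the toy data, and the function names are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(xa, xb, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_predict(x_train, f_train, x_test, lengthscale=1.0, jitter=1e-10):
    """Posterior mean and covariance of a zero-mean GP given exact observations."""
    K = rbf_kernel(x_train, x_train, lengthscale) + jitter * np.eye(len(x_train))
    K_s = rbf_kernel(x_test, x_train, lengthscale)
    K_ss = rbf_kernel(x_test, x_test, lengthscale)
    L = np.linalg.cholesky(K)
    # alpha = K^{-1} f, computed with two triangular solves.
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, f_train))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    cov = K_ss - v.T @ v  # K_ss - K_s K^{-1} K_s^T
    return mean, cov

# Toy data, for illustration only.
x_train = np.array([-2.0, -1.0, 0.0, 1.5])
f_train = np.sin(x_train)
x_test = np.linspace(-3.0, 3.0, 7)
mean, cov = gp_predict(x_train, f_train, x_test)
```

Evaluating the predictor at the training inputs returns the observed values with essentially zero variance, as expected for noise-free observations.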
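Convolution processes for multiple outputs • Consider a set of D output functions {f_d(x)}, d = 1, ..., D, with x \in X, where X is the input domain. • Each f_d(x) is expressed as the convolution of a smoothing kernel G_d with a latent function u: f_d(x) = \int_X G_d(x - z) u(z) dz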
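Convolution processes for multiple outputs • With more than one latent function, each output is f_d(x) = \sum_{q=1}^{Q} \int_X G_{d,q}(x - z) u_q(z) dz, where G_{d,q} are smoothing kernels. • The latent functions u_q(z) are taken to be drawn from independent zero-mean GPs with cov[u_q(z), u_{q'}(z')] = k_q(z, z') \delta_{qq'}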
Convolution processes for multiple outputs If the kernel smoothing function is and the covariance for the latent process is the covariance for the multiple responses is
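For orientation, the general form of this covariance can be written down before any specific kernels are chosen. The block below is a sketch in assumed notation: each output is taken to be f_d(x) = \sum_q \int G_{d,q}(x - z) u_q(z) dz, with independent zero-mean latent GPs u_q that have covariance k_q. The symbols G_{d,q}, k_q, and Q are assumptions and not necessarily those of the slide.

```latex
\begin{align}
\operatorname{cov}\!\left[f_d(x), f_{d'}(x')\right]
  &= \sum_{q=1}^{Q} \sum_{q'=1}^{Q} \iint
     G_{d,q}(x - z)\, G_{d',q'}(x' - z')\,
     \operatorname{cov}\!\left[u_q(z), u_{q'}(z')\right] \, dz \, dz' \\
  &= \sum_{q=1}^{Q} \iint
     G_{d,q}(x - z)\, G_{d',q}(x' - z')\, k_q(z, z') \, dz \, dz'
\end{align}
```

The second line follows because independent latent functions contribute no cross terms. When both the smoothing kernels and k_q are Gaussian-shaped, the double integral has a closed form, which is what makes the construction practical.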
Convolution processes for multiple outputs Each of the outputs can be corrupted with an independent process, The likelihood is and the prediction is
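The standard Gaussian forms for this noisy-output setting are sketched below. The notation is assumed: y stacks all observed outputs, K_{f,f} is the covariance of the noise-free outputs at the training inputs, \Sigma is the covariance of the independent noise processes w_d, and a subscript * marks test inputs.

```latex
\begin{align}
y_d(x) &= f_d(x) + w_d(x), \\
p(y \mid X) &= \mathcal{N}\!\left(y \mid 0,\; K_{f,f} + \Sigma\right), \\
p(f_* \mid y, X, X_*) &= \mathcal{N}\!\left(f_* \mid \mu_*,\; \Sigma_*\right), \\
\mu_* &= K_{f_*,f}\left(K_{f,f} + \Sigma\right)^{-1} y, \\
\Sigma_* &= K_{f_*,f_*} - K_{f_*,f}\left(K_{f,f} + \Sigma\right)^{-1} K_{f,f_*}.
\end{align}
```

These reduce to the noise-free prediction when \Sigma = 0.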
Correlation between Cl and Cd • R^2 = 0.8525
Inverse design • Pressure distribution -> airfoil shape
Acknowledgment • 李蒙 • 闫国启 • 张礼 • 祝青雷