Application of Bayesian methods in genomics
Sylvia Richardson, sylvia.richardson@mrc-bsu.cam.ac.uk
MRC Biostatistics Unit and University of Cambridge
MRC Biostatistics Unit Research Themes: Statistical Genomics
Background
• In integrative genomics, many questions of interest involve linking a large set of p predictors (e.g. SNPs or gene expression levels) to q responses (e.g. disease characteristics or biological phenotypes), using a moderate number n of samples
• For example, finding genetic markers associated with lipid metabolism, or genetic control points regulating the process of transcription
• Statistical framework: sparse Bayesian regression with a model selection component
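To make "sparse Bayesian regression with a model selection component" concrete, a model can be scored by combining a marginal likelihood with a prior on which predictors are included. The sketch below assumes a Zellner g-prior on the coefficients (with the closed-form Bayes factor against the intercept-only model) and independent Bernoulli(omega) inclusion indicators; these choices, and the function name `log_posterior`, are illustrative assumptions and not necessarily the priors used in GUESS.

```python
import numpy as np

def log_posterior(gamma, X, y, g, omega):
    """Unnormalised log posterior of a subset of predictors (illustrative sketch).

    gamma : boolean vector of length p (True = predictor included)
    X     : (n x p) design matrix; y : response vector of length n
    g     : Zellner g-prior scale (assumption); omega : prior inclusion probability
    """
    n, p = X.shape
    k = int(gamma.sum())
    yc = y - y.mean()
    r2 = 0.0
    if k > 0:
        Xg = X[:, gamma] - X[:, gamma].mean(axis=0)  # centred selected columns
        beta, *_ = np.linalg.lstsq(Xg, yc, rcond=None)
        resid = yc - Xg @ beta
        r2 = 1.0 - (resid @ resid) / (yc @ yc)
    # Log Bayes factor against the intercept-only model under the g-prior
    log_bf = 0.5 * (n - 1 - k) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2))
    # Independent Bernoulli(omega) prior on each inclusion indicator
    log_prior = k * np.log(omega) + (p - k) * np.log1p(-omega)
    return log_bf + log_prior

# Simulated example: only predictor 0 affects the response
rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] + rng.standard_normal(n)

true_model = np.zeros(p, dtype=bool); true_model[0] = True
wrong_model = np.zeros(p, dtype=bool); wrong_model[1] = True
print(log_posterior(true_model, X, y, g=n, omega=0.05)
      > log_posterior(wrong_model, X, y, g=n, omega=0.05))  # True
```

Setting g = n is one common default; the prior inclusion probability omega is what enforces sparsity, since every extra predictor pays a log(omega / (1 - omega)) penalty.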
Sparse regression computations
• Model exploration has to search a vast space of possible models when p is large: with p predictors there are 2^p candidate subsets
• Our implementation, GUESS, uses Evolutionary Monte Carlo techniques (Evolutionary Stochastic Search, ESS, which runs several MCMC chains in parallel) and GPU computing
• Subset selection for single and multiple response phenotypes
• An R package, R2GUESS, which calls the C++ code, is soon to be released. R2GUESS runs the ESS algorithm and performs the complex post-processing of the output.
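The idea of running several MCMC chains in parallel can be illustrated with a minimal population of tempered chains over binary inclusion vectors: each chain makes local single-indicator flips, and adjacent chains occasionally propose to exchange their states, so the hotter chains help the chain of interest escape local modes. This is a simplified sketch, assuming only these two move types and a fixed temperature ladder; ESS as implemented in GUESS uses a richer set of moves. The names `tempered_search` and `bic_score` are hypothetical, and `bic_score` is a stand-in BIC-type target, not the marginal likelihood used by GUESS.

```python
import numpy as np

def tempered_search(log_target, p, temps=(1.0, 2.0, 4.0, 8.0), n_iter=2000, seed=0):
    """Population of tempered Metropolis chains over binary inclusion vectors.

    Chain l targets exp(log_target(gamma) / temps[l]); temps[0] = 1 is the chain
    of interest. Returns the best vector visited by that chain and its score.
    """
    rng = np.random.default_rng(seed)
    L = len(temps)
    states = np.zeros((L, p), dtype=bool)  # all chains start at the null model
    scores = np.array([log_target(s) for s in states])
    best, best_score = states[0].copy(), scores[0]
    for _ in range(n_iter):
        # Local move in every chain: flip one randomly chosen inclusion indicator
        for l in range(L):
            prop = states[l].copy()
            j = rng.integers(p)
            prop[j] = not prop[j]
            s = log_target(prop)
            if np.log(rng.random()) < (s - scores[l]) / temps[l]:
                states[l], scores[l] = prop, s
        # Exchange move: propose swapping the states of two adjacent chains
        l = rng.integers(L - 1)
        log_a = (scores[l + 1] - scores[l]) * (1.0 / temps[l] - 1.0 / temps[l + 1])
        if np.log(rng.random()) < log_a:
            states[[l, l + 1]] = states[[l + 1, l]]
            scores[[l, l + 1]] = scores[[l + 1, l]]
        if scores[0] > best_score:
            best, best_score = states[0].copy(), scores[0]
    return best, best_score

def bic_score(gamma, X, y, omega=0.05):
    """Stand-in log target: BIC approximation plus a Bernoulli(omega) inclusion prior."""
    n, p = X.shape
    k = int(gamma.sum())
    Xg = np.column_stack([np.ones(n), X[:, gamma]])  # intercept + selected columns
    beta, *_ = np.linalg.lstsq(Xg, y, rcond=None)
    rss = float(np.sum((y - Xg @ beta) ** 2))
    return (-0.5 * n * np.log(rss / n) - 0.5 * k * np.log(n)
            + k * np.log(omega) + (p - k) * np.log1p(-omega))

# Simulated example: predictors 0 and 3 affect the response
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.standard_normal(100)
best, score = tempered_search(lambda g: bic_score(g, X, y), p=20)
print(np.flatnonzero(best))
```

The exchange acceptance ratio follows from each chain targeting the posterior raised to the power 1/T, so a swap is accepted with probability min(1, exp[(s_{l+1} - s_l)(1/T_l - 1/T_{l+1})]).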
Application of GUESS to the genetic association analysis of lipid phenotypes
Analysis strategy and results for groups of correlated phenotypes
Altogether, 16 markers were found to be associated with different groups of phenotypes, in good agreement with large GWAS analyses.
Extension of the sparse framework to hierarchically related regressions
• Linking a large number q of responses to a large set of p predictors
• Motivation: genetic regulation of expression
Hierarchical structure
• We now have a (p × q) matrix Γ of selection indicators, one column per response
• We want to borrow information across the responses to highlight predictors common to several responses
• Adopt a parametrisation of Ω, the matrix of prior selection probabilities, involving a parameter common to each column, while still controlling sparsity in each regression
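One way to write such a hierarchical prior is sketched below. The slide does not give the exact form, so the multiplicative decomposition is an assumption for illustration: ω_k sets the sparsity of regression k, while ρ_j is a propensity of predictor j shared by all q regressions. A predictor selected for many responses pushes ρ_j up, which raises its prior selection probability in the remaining regressions; this is how information is borrowed across responses.

```latex
\begin{aligned}
y_k &= X_{\gamma_k}\,\beta_{\gamma_k} + \varepsilon_k, && k = 1,\dots,q,\\
\gamma_{jk} \mid \omega_{jk} &\sim \mathrm{Bernoulli}(\omega_{jk}), && j = 1,\dots,p,\\
\omega_{jk} &= \omega_k\,\rho_j, && 0 \le \omega_k\,\rho_j \le 1,
\end{aligned}
```

where γ_k denotes column k of Γ and Ω = (ω_jk) is the (p × q) matrix of prior selection probabilities.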
Mouse gene expression case study
Discovery of 6 “hot spots”, i.e. genetic markers associated with a substantial proportion of the transcripts.
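Given a (p × q) matrix of posterior probabilities of inclusion (markers in rows, transcripts in columns), a hot spot can be flagged as a marker selected for a sufficiently large share of the transcripts. The sketch below assumes simple thresholding; the function name `find_hot_spots` and both cut-offs (`prob_threshold`, `min_fraction`) are hypothetical and are not the criterion used in the case study.

```python
import numpy as np

def find_hot_spots(ppi, prob_threshold=0.5, min_fraction=0.1):
    """Flag predictors (rows) selected for at least `min_fraction` of responses.

    ppi : (p x q) array; entry [j, k] is the posterior probability that
          predictor j is included in the regression for response k.
    Returns the indices of flagged rows and the per-row selected fraction.
    """
    ppi = np.asarray(ppi, dtype=float)
    selected = ppi > prob_threshold      # (p x q) boolean selection matrix
    fraction = selected.mean(axis=1)     # share of responses selecting each predictor
    return np.flatnonzero(fraction >= min_fraction), fraction

# Synthetic example: 50 markers, 200 transcripts, mostly unselected
rng = np.random.default_rng(1)
ppi = rng.uniform(0.0, 0.2, size=(50, 200))
ppi[7, :60] = 0.9    # marker 7 linked to 30% of the transcripts
ppi[12, :5] = 0.9    # marker 12 linked to only 2.5%
hot, frac = find_hot_spots(ppi, min_fraction=0.1)
print(hot)  # [7]
```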
Summary
• Focus on modelling strategies where multidimensional and multivariate aspects are fully exploited
• Perform information synthesis: different sources of data, hierarchical structures, prior models informed by external information, …
• Allow for model uncertainty and compare with alternative analytical strategies
• Embed models and methods within state-of-the-art Bayesian computations