250 likes | 381 Views
EMBL Systems Microscopy Meeting. Bernd Fischer. rhdf5. R-package rhdf5 will appear tonight on http://www.bioconductor.org Easy to install on all platforms (windows, linux, MacOS) No need to install additional hdf5 libraries Ability to read (and write) HDF5 files from cellcognition in R.
E N D
EMBL Systems Microscopy Meeting Bernd Fischer
rhdf5 • R-package rhdf5 will appear tonight on http://www.bioconductor.org • Easy to install on all platforms (windows, linux, MacOS) • No need to install additional hdf5 libraries • Ability to read (and write) HDF5 files from cellcognition in R
What makes us different? • Genome-wide association studies (GWAS) have been enormously successful to identify many new causal players • ... but have not produced models that can explain / predict complex phenotypes at the level of individuals • - Rare polymorphisms. • - Genetic interactions, epistasis: “system is more than the sum of its parts”.
Design of Combinatorial knock-down Screen • 1367 x 72 Drosophila genes • each targeted by two independent dsRNA designs • 1456 plates (=559.104 wells) • ~100.000 distinct gene pairs
Imaging • Cells grow in incubator for five days • Fixate and stain with DAPI, pH3, and a-Tubulin • Image with automated fluorescence microscope Joint Work with Thomas Horn, Thomas Sandmann, Michael Boutros, DKFZ
Image Processing • Segmentation of Nuclei (in DAPI and pH3 channel) • Propagation of Segmentation from nucleus to cell body (region growing in a-Tubulin channel) • Extraction of features (intensity, area, texture, …) per cell, summary per well (mean, sd, quantiles, local cell density).
Estimating Genetic Interactions • For many phenotypes, the main effects (single gene) are multiplicative for non interacting genes i, j: • Additive on logarithmic scale • Estimation of main effects (assume that interactions are rare) • Detect Genetic Interactions: Compare to (t-test) effect of control main effect of dsRNA j error term interaction term 0, for non interacting genes ≠0, for interacting genes measurement (nr cells, growth rate, …) main effect of dsRNA i 8 23/10/2014
Interaction Matrix query genes x phenotypes template genes
Correlation of Interaction profiles preservedover phenotypes Correlation of interaction profiles Interaction
Classification of GO categories The interaction profile of each gene is used as a feature vector for machine learning Query genes are regarded as perturbations Trained classifier for 3311 categories (GO, Reactome, Interpro, …) 284 classifier showed good performance by cross validation (recall > 0.5 at precision 0.5)
Multiple Classification of genes blue: predicted, but not annotated yellow: annotated and predicted gray: annotation, but not predicted GO terms genes
Example APC ana phase promoter complex (APC) Direct inhibition of APC
Matrix Factorization predicted class probability (W) 284 classes genes PI: n template genes x m interaction features W: n template genes x k classes H: k classes x m interaction features n = 1367 template genes m = 720 (= 72 query genes * 10 phenotypes) k = 284 classes
Matrix Factorization class specific interaction profiles (H) 284 classes 72 query genes x 10 phenotypes Matrix factorization
Residuals (Semi-Supervised bi-clustering) Further matrix factorization to extract novel complexes/pathways
Modeling Genetic Interaction Data • View 1: Query genes are perturbations. => Cluster template genes to obtain functional information • View 2: phenotypic measurement in 100 000 different combinatorial genetic backgrounds => genotype to phenotype prediction
Network learningidentify the underlying molecular modules and their relation area number of cells phenotype (observed) activity of core modules (e.g. complexes, ‘path-ways’, mRNA or protein expression) (hidden) genetic perturbation state of each single experiment (observed)
EM algorithm E-Step: Compute expected activity levels on hidden nodes, given network structure and parameter => (loopy) belief propagation M-Step: Estimation network structure and regression parameter by maximizing expected likelihood area number of cells phenotype (observed) activity level (hidden) genetic perturbation (observed)
Summary • rhdf5 is out • Semi-supervised bi-clustering for genetic interaction data • Network learning to obtain causal network • General applicability to high dimensional phenotype data • Acknowledgement: Thomas Horn, Thomas Sandmann, Maximilian Billmann, Michael Boutros, Wolfgang Huber Huber group