Dynamic multivariate phenotypes and genetic interactions. Joining Forces Day, ETH, 20.11.2009. Wolfgang Huber, EMBL Heidelberg.
Complex traits. Do not follow Mendelian inheritance; result from multiple alleles. Responsible alleles contribute different amounts to the phenotype. Alleles may be present in only a fraction of all individuals with the phenotype. Often show environment interactions. Example of allelic variation (two sequences differing at two positions):
ACTGAATTACGATTGATTACAATTAGCGGTACGTAG
ACTGAATTAGGATTGATTACAATTAGCGTTACGTAG
The genetic basis of complex traits. Currently ~400 variants that contribute to common traits and diseases are known. Even when dozens of genes have been linked to a trait, both the individual and the cumulative effects are disappointingly small (<5%). Limitations of current genome-wide association studies: common SNPs miss rare variants with potentially huge effects; structural variants can go undetected if they are unlinked to SNPs; epistasis, where the effect of one variant may not be found without knowing the other, confounds identification. Research with model systems is needed in order to make progress in human genetics.
Simple example: yeast sporulation efficiency. Two strains: YPS606 (oak), sporulation efficiency 99%, and BC187 (vineyard), sporulation efficiency 3.5%. 374 recombinant segregants; linkage analysis; 5 linked loci. Gerke et al., Science 323 (2009).
Analytical series expansion Parameter fit (ANOVA)
Pairwise interactions. [Figure: phenotypes of the wild type (wt), the single perturbations A* and B*, and the double perturbation A*B*, illustrating a positive (synergistic) interaction, a negative (buffering) interaction and a negative (suppressing) interaction.]
Genetic interactions by combinatorial RNAi and multiparametric imaging. Bernd Fischer (EMBL); Thomas Horn, Thomas Sandmann, Michael Boutros (DKFZ).
Imaging. Cells grow for 5 days. Fix and stain with Hoechst (DNA content). Image with a laser-scanning cytometer (TTP LabTech Acumen Explorer).
Image analysis. [Figure panels: original image; foreground and background; segmentation 1 (propagate); segmentation 2 (fixed size).]
Mapping the interaction network among Drosophila kinases and phosphatases. Workflow: image processing, data analysis, interaction network. Subset of 96 Drosophila kinases and phosphatases, expressed in S2 cells (RNA-Seq). Two independent RNAi designs per gene (192 reagents); knock-down validation by qPCR. Screening in 384-well microscopy plates: 192 reagents (query 1) × 192 reagents (query 2), 96 plates (~37,000 wells), 4,600 distinct gene pairs. Readout: Hoechst nuclear staining. Multi-parametric analysis: number of nuclei; summary statistics of size and intensity distribution.
Feature extraction. Features per cell: intensity, area. Per-well statistics: number of cells; median, sum, quantiles and histograms of intensity and area. In total 52 features.
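The per-well summaries named on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the screen: the function names, the choice of quantiles and the number of histogram bins are hypothetical, and the sketch produces far fewer than the 52 features mentioned.

```python
import statistics

def quantile(sorted_vals, q):
    """Linear-interpolation quantile of a pre-sorted, non-empty list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def histogram(vals, n_bins, lo, hi):
    """Counts of values in n_bins equal-width bins spanning [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    if width == 0:
        # All values identical: put everything in the first bin.
        counts[0] = len(vals)
        return counts
    for v in vals:
        if lo <= v <= hi:
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts

def well_features(intensity, area, quantiles=(0.25, 0.75), n_bins=4):
    """Summarize per-cell intensity and area values of one well (assumed layout)."""
    feats = {"n_cells": len(intensity)}
    if not intensity:
        return feats  # empty well: only the cell count is defined
    for name, vals in (("intensity", intensity), ("area", area)):
        s = sorted(vals)
        feats[name + "_median"] = statistics.median(s)
        feats[name + "_sum"] = sum(s)
        for q in quantiles:
            feats[f"{name}_q{int(q * 100)}"] = quantile(s, q)
        feats[name + "_hist"] = histogram(s, n_bins, s[0], s[-1])
    return feats
```

Calling `well_features` once per well gives one feature vector per dsRNA pair, which is the shape of input the interaction analysis needs.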
Screen plot (number of cells). Within-screen replicates: every dsRNA pair is tested twice.
Estimating genetic interactions. For many phenotypes, the main effects (single gene) are multiplicative for non-interacting genes, and therefore additive on the logarithmic scale: log(measurement) = effect of control + main effect of dsRNA i (m_i) + main effect of dsRNA j (m_j) + interaction term (g_ij) + error term, where the measurement is e.g. the number of cells, the growth rate, … [Equation: estimation of main effects.]
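The additive decomposition on the log scale can be illustrated with a small sketch. It assumes a complete matrix of log measurements (rows: dsRNA i, columns: dsRNA j) and uses plain row and column means with the grand mean as the baseline; the estimator actually used in the talk is not preserved in this transcript, and a robust (median-based) variant would be a natural alternative. The function name is hypothetical.

```python
def estimate_interactions(log_y):
    """Split log_y[i][j] into baseline + row effect + column effect + residual.

    The residual plays the role of the interaction term g_ij (plus noise).
    Means are used here as an assumption; they are not robust to outliers.
    """
    n_rows, n_cols = len(log_y), len(log_y[0])
    baseline = sum(sum(row) for row in log_y) / (n_rows * n_cols)
    # Main effect of each row reagent: row mean relative to the baseline.
    m_row = [sum(row) / n_cols - baseline for row in log_y]
    # Main effect of each column reagent: column mean relative to the baseline.
    m_col = [sum(log_y[i][j] for i in range(n_rows)) / n_rows - baseline
             for j in range(n_cols)]
    # Whatever the additive model does not explain is attributed to interaction.
    g = [[log_y[i][j] - baseline - m_row[i] - m_col[j] for j in range(n_cols)]
         for i in range(n_rows)]
    return baseline, m_row, m_col, g
```

For purely additive data the residual matrix is zero up to rounding; a single interacting pair shows up as one large entry, slightly shrunk because it also shifts its own row and column means.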
Reproducibility of the main effects. Comparisons: template dsRNA left vs right; template vs query dsRNA; dsRNA design 1 vs 2. Sources of differences: for a few genes, different efficiency or off-target effect; different spotters.
Main effects across different experiments & conditions. 3 “replicate” screens:
• S2 (serum, seed 15,000 cells, incubate 5d)
• Dmel2 (no serum, seed 15,000 cells, incubate 4d)
• Dmel2 (no serum, seed 13,500 cells, incubate 5d)
z-scores of genetic interactions. Comparing independent dsRNA pairs.
Interaction profiles reflect functional relationships. [Figure: interaction profiles of Gap1, drk, pnt, Sos, csw, phl, rl and Ras85D, annotated as MapK downstream, MapK upstream and MapK inhibitors.] Rho1: positive interaction with MapK, but not with Gap1.
Epistasis. [Figure: interaction term g_ij against main effect m_i, in three panels: j = Gap1, j = negative control, j = Rho1; non-interacting genes are marked.]
Dynamic modelling of cell cycle dynamics from live-cell imaging movies. Gregoire Pau. Collaboration with Thomas Walter, Jean Karime Heriché, Faba Neuman, Jan Ellenberg, and the Mitocheck consortium.
Mitocheck screen. HeLa cells expressing an H2B-GFP tag. Genome-wide siRNA screen: ~21000 genes, 2-3 siRNAs per gene, ~2 replicates each. Time-lapse imaging for 48 h, one frame every 30 min. 186322 video sequences, ~5 TB of video data.
Cell phenotypes. Cell segmentation, feature extraction, classification by SVM. 4 classes considered: interphase, mitotic, polynucleated, apoptotic.
Cell population time courses. [Figure panels: scrambled (two panels), KIF11, COPB1.]
Rationale. Derive phenotypic descriptors from video sequences; they should be robust and biologically relevant. Goals: outlier detection; grouping genes by phenotypic similarity (classification, clustering).
Model. Temporal dynamics of cell state change on the population level. Initial conditions at transfection time, t = 0: the n0 initial cells are split between interphase, nI(0), and mitosis, nM(0), according to an initial mitotic fraction; nP(0) = nA(0) = 0.
Model. Transition rates: each transition i is described by a time of appearance and a final strength.
Model fitting. 13 unknown parameters. Penalized least squares regression, minimizing RSS + penalty. ODE solved by the 4th-order Runge-Kutta method. Minimization done by the Levenberg-Marquardt algorithm, with a positivity constraint. Penalty term to reduce the variance of the estimates. Sample 64 different initial conditions.
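A minimal sketch of the two numerical ingredients named here, a 4th-order Runge-Kutta integrator and a penalized residual sum of squares. Everything model-specific is an assumption made for illustration: the rate names `k_im`, `k_mi`, `k_mp`, `k_ia`, the constant (time-independent) rates, the particular set of transitions, and the quadratic form of the penalty are hypothetical and not taken from the talk. The Levenberg-Marquardt minimization itself is not shown.

```python
def rk4_step(f, t, y, dt):
    """Advance dy/dt = f(t, y) by one classical 4th-order Runge-Kutta step."""
    k1 = f(t, y)
    k2 = f(t + dt / 2, [yi + dt / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + dt / 2, [yi + dt / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + dt, [yi + dt * ki for yi, ki in zip(y, k3)])
    return [yi + dt / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def make_rhs(k_im, k_mi, k_mp, k_ia):
    """Hypothetical constant-rate model for (interphase, mitotic, polynucleated, apoptotic)."""
    def rhs(t, y):
        n_i, n_m, n_p, n_a = y
        d_i = -(k_im + k_ia) * n_i + 2 * k_mi * n_m  # a division yields two interphase cells
        d_m = k_im * n_i - (k_mi + k_mp) * n_m
        d_p = k_mp * n_m
        d_a = k_ia * n_i
        return [d_i, d_m, d_p, d_a]
    return rhs

def simulate(rhs, y0, t_end, dt):
    """Integrate from t = 0 to t_end; returns a list of (t, state) pairs."""
    t, y = 0.0, list(y0)
    traj = [(t, y)]
    for _ in range(round(t_end / dt)):
        y = rk4_step(rhs, t, y, dt)
        t += dt
        traj.append((t, y))
    return traj

def penalized_rss(observed, predicted, params, lam):
    """RSS plus an assumed quadratic penalty on the parameters."""
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return rss + lam * sum(p * p for p in params)
```

In a full fit, an optimizer would repeatedly call `simulate` with candidate parameters and compare the predicted cell counts with the observed time courses through `penalized_rss`, restarting from several initial parameter values.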
Fitting examples. [Figure panels: scrambled (two panels), KIF11, COPB1.]
Fitting error as a quality metric. The residual sum of squares (RSS) measures the discrepancy between the data and the fitted model. Aberrant data is likely to give high RSS values, which makes RSS useful for QC. Examples: loss of focus; giant bright artefact.
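A small sketch of RSS-based quality control. The thresholding rule (median plus a multiple of the median absolute deviation) and the default multiplier are assumptions chosen for illustration; the talk does not state how high RSS values were flagged.

```python
import statistics

def rss(observed, fitted):
    """Residual sum of squares between observed values and model predictions."""
    return sum((o - f) ** 2 for o, f in zip(observed, fitted))

def flag_outliers(rss_values, n_mads=5.0):
    """Indices of sequences whose RSS exceeds median + n_mads * MAD (assumed rule)."""
    med = statistics.median(rss_values)
    mad = statistics.median([abs(v - med) for v in rss_values])
    threshold = med + n_mads * mad
    return [i for i, v in enumerate(rss_values) if v > threshold]
```

Flagged sequences would then be inspected or excluded, for instance movies with loss of focus or large bright artefacts.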
Prediction of past and future. [Figure panels: KIF11, COPB1; n0 is marked in each panel.]
The parameters are biologically relevant. Each phenotype is characterized by 6*2 parameters. For each k: an amplitude and a time of appearance. Amplitudes: the median duration of mitosis is log(2) divided by the sum of the amplitudes with indices 3, 4 and 5; e.g. on negative controls: 1.4 h (mad 0.2 h). Median cell cycle duration: 22.3 h (mad 4.1 h), consistent with what we expect from the HeLa cell line.
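The log(2) factor follows from first-order kinetics. As a short worked derivation, assume cells leave a state at a constant total rate; the symbols $\lambda$ (total exit rate) and $T$ (residence time) are introduced here for illustration and do not appear in the slides.

```latex
% Residence time T in a state with constant total exit rate \lambda:
P(T > t) = e^{-\lambda t}
% The median t_{1/2} satisfies P(T > t_{1/2}) = 1/2:
e^{-\lambda t_{1/2}} = \tfrac{1}{2}
\quad\Longrightarrow\quad
t_{1/2} = \frac{\ln 2}{\lambda}
% Example with the reported median mitosis duration of 1.4 h:
\lambda = \frac{\ln 2}{1.4\ \mathrm{h}} \approx 0.495\ \mathrm{h}^{-1}
```

Under this assumption, the reported 1.4 h median duration of mitosis corresponds to a total exit rate from mitosis of roughly 0.5 per hour.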
LDA projection of parameter space. Classification: 98.78 %. Z'-KIF11 = 0.50; Z'-COPB1 = 0.18.
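Assuming Z' here denotes the usual Z'-factor for separation between a positive control and the negative control, it can be computed from one-dimensional scores (for example, positions along an LDA axis) as sketched below. The function name and the use of sample standard deviations are choices made for this illustration.

```python
import statistics

def z_prime(positive, negative):
    """Z'-factor: 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    mu_p, mu_n = statistics.mean(positive), statistics.mean(negative)
    sd_p, sd_n = statistics.stdev(positive), statistics.stdev(negative)
    return 1.0 - 3.0 * (sd_p + sd_n) / abs(mu_p - mu_n)
```

Values close to 1 indicate well separated controls; values at or below 0 indicate overlapping distributions.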
Cell death phenotype. RNA helicase DDX39: early death phenotype, with onset (time of appearance of transition 2) at 29 h. Previously described loss-of-function phenotype in C. elegans and S. cerevisiae: apoptosis.
Late death phenotype. PRPF8 and SF3B1, components of the spliceosome. PRPF8: median values for transition 2 of 0.02 and 44 h. SF3B1: median values for transition 2 of 0.02 and 44 h.
Next steps. Cell tracking to allow direct observation of residence time distributions in different states (sampling time, computational demand, ...). Integration of tracking with modelling to improve tracking performance. Model cells' "individuality". More markers.
Conclusion. Automated microscopy of cell-based (...) systems perturbed by combinations of RNAi and compounds is allowing major advances in genetics. Genetics: from loss-of-function to allelic variation; from single-gene phenotypes to gene networks with combinatorial phenotypes. Cell biology: quantitative dynamic modelling based on imaging data.
Thank you. Simon Anders, Elin Axelsson, Ligia Bras, Richard Bourgon, Bernd Fischer, Audrey Kauffmann, Gregoire Pau, Oleg Sklyar. Robert Gentleman, F. Hahne, M. Morgan (FHCRC). Lars Steinmetz, J. Gagneur, Z. Xu, W. Wei (EMBL). Michael Boutros, F. Fuchs, D. Ingelfinger, T. Horn, T. Sandmann (DKFZ). Steffen Durinck (Illumina). All contributors to the R and Bioconductor projects.
EMBL is recruiting: Engineering, Mathematics, Informatics, Biology, Chemistry, Physics. www.embl.org/phdprogramme www.embl.org/postdocs www.embl.org.jobs Heidelberg.