
Discovery of (new) phenotypes in a genome-wide RNAi HeLa cell imaging screen

Gregoire Pau, Oleg Sklyar, Wolfgang Huber (EMBL-EBI, Cambridge); Florian Fuchs, Michael Boutros (DKFZ, Heidelberg)


Presentation Transcript


  1. Discovery of (new) phenotypes in a genome-wide RNAi HeLa cell imaging screen • Gregoire Pau, Oleg Sklyar, Wolfgang Huber (EMBL-EBI, Cambridge) • Florian Fuchs, Michael Boutros (DKFZ, Heidelberg)

  2. Experimental setup • Genome-wide cell array screen with HeLa cells • ~18000 gene knockdowns (1 gene per well) • Seeded, incubated for ~48 h and stained with 3 markers: • Actin (TRITC) • Tubulin (Alexa 488) • DNA (Hoechst) • Florian Fuchs, Michael Boutros (DKFZ, Heidelberg)

  3. Gene phenotypes • Gene phenotype: phenotype expressed by a population of cells • Gene phenotype ≠ cell phenotype! • Examples: • No phenotype (observed on negative control empty wells)

  4. Gene phenotypes • Examples: • Apoptotic phenotype (observed on a COPB well) • Elongated phenotype (observed on a LOC51693 well)

  5. Cell phenotypes • Frequently observed cell phenotypes: interphase, mitotic, dead cell • But there are many more!

  6. Goal • Find new gene phenotypes • "Given an input phenotype, how close is it to a known gene phenotype?" • [Figure: an input well image compared against the known gene phenotypes: no phenotype, apoptotic, elongated]

  7. Probabilistic point of view • Denote the features of cell $i$ by $X_i$, where $X_i \in \mathbb{R}^p$ • Each cell has $p$ features: • Cell size, nucleus-to-cell size ratio, nucleus eccentricity… • Actin Haralick moment, total tubulin, nucleus-to-cell actin ratio… • A gene phenotype is then characterized by a multivariate (m.v.) distribution $Z$ • A realization is a set of $n$ cells $(X_1, \dots, X_n)$ drawn from $Z$ • [Figure: cells plotted as points, cell feature 1 (X*1) against cell feature 2 (X*2), with the distribution Z]

  8. Models • Outlier detection problem • Given $n$ cells $(X_1, \dots, X_n)$ where $X_i \in \mathbb{R}^p$ • How well do they fit a phenotype distribution (model) $Z$? • Requires estimating the density of $Z$ (few samples, $n \approx p$: hard!) • Requires a m.v. goodness-of-fit test (hard?!) • Hard! • Different workarounds • Shrinking (by binning) the space $\mathbb{R}^p$ by defining $K$ cell classes • Modeling $Z$ by a simpler (and tractable) distribution

  9. Defining cell classes • Defining $K$ classes (here, $K = 3$: interphase, mitotic, dead cell) • Counting the number of cells belonging to each class • Classical approach, robust • Needs good a priori biological knowledge • Suited to clustering but maybe not to novelty detection • [Figure: two scatter plots of cell feature 1 (X*1) against cell feature 2 (X*2), with cells marked as interphase, mitotic or dead]

  10. Modeling Z • Assuming the phenotype distribution $Z$ is known • Assuming a set of $N$ cells $(x_1, \dots, x_N)$ • $P(X_1 = x_1, \dots, X_N = x_N)$ can be computed, assuming the cell features $X_i$ are independent • Two models: • $Z$ is a normal distribution • $Z$ is a mixture of 3 normal distributions

  11. First model: Z is normal • Independence and normal assumption • $A = \log P(X_1 = x_1, \dots, X_N = x_N) = \sum_{i=1}^{N} \log p(X_i = x_i)$ • $A$ is the log-probability of the observed cell features under $Z$ • Here $Z$ is the distribution of the 'no phenotype' phenotype • Goal: finding phenotypes far away from the 'no phenotype' • $p(X = x) = \mathcal{N}(x; \mu_X, \Sigma_X)$ can be easily estimated on a training set of wells showing no phenotype
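
A minimal sketch of the score A on this slide, under stated assumptions: the function names (`fit_normal`, `log_density`, `well_score`) and the synthetic data are hypothetical and are not taken from the presentation. It only illustrates summing per-cell normal log-densities after estimating the mean and covariance on 'no phenotype' training cells.

```python
import numpy as np

def fit_normal(train_cells):
    """Estimate mean and covariance from an (n_cells, p) training array."""
    mu = train_cells.mean(axis=0)
    sigma = np.cov(train_cells, rowvar=False)
    return mu, sigma

def log_density(cells, mu, sigma):
    """Per-cell log N(x; mu, sigma) for an (n_cells, p) array."""
    p = mu.shape[0]
    diff = cells - mu
    # Squared Mahalanobis distance of each cell; solve() avoids an explicit inverse.
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(sigma, diff.T).T)
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * (p * np.log(2 * np.pi) + logdet + maha)

def well_score(cells, mu, sigma):
    """A = sum_i log p(X_i = x_i), assuming independent cells."""
    return log_density(cells, mu, sigma).sum()

# Synthetic stand-ins for real cell features (assumption: p = 5 features).
rng = np.random.default_rng(0)
train = rng.normal(size=(2000, 5))                  # 'no phenotype' training cells
mu, sigma = fit_normal(train)
normal_well = rng.normal(size=(200, 5))             # resembles the training wells
shifted_well = rng.normal(loc=2.0, size=(200, 5))   # a shifted cell population
score_normal = well_score(normal_well, mu, sigma)
score_shifted = well_score(shifted_well, mu, sigma)
```

In this toy setup the shifted well receives a much lower A than the well drawn from the training distribution, which is the ranking behaviour the slide relies on.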

  12. Result • Using $p = 5$ cell features • Geometric: nucleus-to-cell size ratio, cell size, cell eccentricity • Protein: nucleus-to-cell actin ratio, nucleus-to-cell intensity ratio • Log-probability $A$ can be computed on every well (~17000) • Sorting wells by lowest $A$ • Gives wells with some bluish dead cells, whose very low $p(X = x)$ 'spoils' the sum of log-probabilities • Boring phenotypes: too close to the 'no phenotype'

  13. Workarounds • Naïve solutions? • Trimming: $A'$, keeping only the central 50% (interquartile range) of the $p(X = x)$ values • Median: using $A'' = \operatorname{median}_i \log p(X_i = x_i)$ • Sorting wells by lowest $A''$, 5 new phenotypic classes can be found: • Condensed phenotype • Elongated phenotype • Bi-nucleated phenotype • 'Large cells' phenotype • 'Densely packed small cells' phenotype
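
The two robust scores can be sketched as follows. This is an illustration under assumptions: the slide does not say whether the trimmed score is a sum or a mean, so a sum is used here, and the function names and the example values are made up.

```python
import numpy as np

def trimmed_score(log_p):
    """Trimmed score: sum of per-cell log-densities between the 25% and 75% quantiles."""
    lo, hi = np.quantile(log_p, [0.25, 0.75])
    kept = log_p[(log_p >= lo) & (log_p <= hi)]
    return kept.sum()

def median_score(log_p):
    """Median score: median of the per-cell log-densities."""
    return np.median(log_p)

# Made-up per-cell log-densities for one well; the last value mimics a dead cell.
log_p = np.array([-7.0, -6.5, -7.2, -6.8, -7.1, -6.9, -7.3, -60.0])
plain = log_p.sum()              # dominated by the single extreme cell
trimmed = trimmed_score(log_p)   # the extreme cell falls outside the kept range
median = median_score(log_p)     # unaffected by how extreme that cell is
```

Because a sum depends on how many cells are kept, a mean over the kept cells may be a more comparable variant across wells with different cell counts; the median has no such dependence.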

  14. Results • Condensed phenotype • Elongated phenotype • [Figure: example wells labeled STK39, TENC1, LOC51693 and KCNT1; annotation: curly shaped cells]

  15. Results • Bi-nucleated phenotype • 'Large cells' phenotype • [Figure: example wells labeled KIAA0363 and ADRB2]

  16. Results • 'Densely packed cells' phenotype (empty spot): artefact? • [Figure: example well labeled AFAR3]

  17. Note • Cell features • $A = \log P(X_1 = x_1, \dots, X_N = x_N) = \sum_{i=1}^{N} \log p(X_i = x_i)$ • $A$ is the log-probability that the cell features are distributed in the same way as the model phenotype • Cell numbers • The number of cells $N$ can also be a discriminating factor! • Example: an apoptotic phenotype • $B = \log P(N = n)$ is easy to compute • But how to combine $A$ and $B$ into a 'global outlier' score?
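
The slide does not state which distribution is used for the cell count N. Purely as an illustration, the sketch below assumes a Poisson model whose rate is the mean count over control wells; the function name, the Poisson choice and the counts are all assumptions, and real cell counts may well be overdispersed.

```python
import math

def poisson_log_pmf(n, rate):
    """log P(N = n) under a Poisson(rate) model (an assumed model for the cell count)."""
    return n * math.log(rate) - rate - math.lgamma(n + 1)

# Made-up cell counts from 'no phenotype' control wells.
control_counts = [210, 195, 188, 202, 205]
rate = sum(control_counts) / len(control_counts)

b_typical = poisson_log_pmf(200, rate)  # a well with a typical cell count
b_sparse = poisson_log_pmf(40, rate)    # a sparse well, e.g. after heavy cell death
```

Under this assumed model a sparse well gets a far lower B than a typical one, which is how the cell count could flag an apoptotic well even when the surviving cells look normal.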

  18. Second model: Z is a mixture of 3 normal distributions • Previous model was a coarse approximation • The normal assumption is too simple: 'no phenotype' cell populations exhibit at least 3 different cell phenotypes (mitotic, interphase and dead cells) • New model • $Z$ is a mixture of 3 normal distributions, one each for interphase, mitotic and dead cells

  19. Model • Density of a cell feature $X$ • $p(X = x) = (1 - \pi_M - \pi_D) f_I(x) + \pi_M f_M(x) + \pi_D f_D(x)$ • Where $\pi_M$, $\pi_D$ are the mixture proportions of mitotic and dead cells • Where $f_I$, $f_M$ and $f_D$ are the normal densities of the components • Fitting $X$ on a phenotype • Gives $A$, $B$ but also the mixture proportions $\pi_M$, $\pi_D$ • Can they be used as discriminative parameters? • Approach similar to the definition of cell classes? • How to combine $A$, $B$, $\pi_M$ and $\pi_D$ into a global 'outlier' score? • Ongoing work… • … not yet!
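
To make the mixture density concrete, here is a small sketch that evaluates its logarithm for given parameters. The names `normal_logpdf`, `mixture_logpdf`, `pi_m` and `pi_d` (the mitotic and dead-cell mixture proportions) and all numeric values are hypothetical; fitting the parameters, for instance by EM, is not shown and is not described on the slide.

```python
import numpy as np

def normal_logpdf(x, mu, sigma):
    """Per-row log N(x; mu, sigma) for an (n, p) array x."""
    p = mu.shape[0]
    diff = x - mu
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(sigma, diff.T).T)
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * (p * np.log(2 * np.pi) + logdet + maha)

def mixture_logpdf(x, weights, mus, sigmas):
    """log of sum_k w_k f_k(x), computed with the log-sum-exp trick."""
    comp = np.stack([np.log(w) + normal_logpdf(x, m, s)
                     for w, m, s in zip(weights, mus, sigmas)])
    top = comp.max(axis=0)
    return top + np.log(np.exp(comp - top).sum(axis=0))

# Made-up parameters for p = 2 features.
pi_m, pi_d = 0.15, 0.05
weights = [1.0 - pi_m - pi_d, pi_m, pi_d]   # interphase, mitotic, dead
mus = [np.array([0.0, 0.0]), np.array([3.0, 0.0]), np.array([0.0, 4.0])]
sigmas = [np.eye(2), 0.5 * np.eye(2), 2.0 * np.eye(2)]

cells = np.array([[0.1, -0.2], [2.9, 0.3], [0.2, 4.5]])
log_p = mixture_logpdf(cells, weights, mus, sigmas)  # per-cell log-density
score = log_p.sum()                                  # well score, assuming independent cells
```

Working in log space keeps the sum stable when one component's density underflows, which is likely for cells far from a component mean.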

  20. Conclusion • Probabilistic approach • Suitable for novelty detection • Even the simple normal model led to several phenotype discoveries • May not extend to a clustering approach • Ongoing work • Results using the 3-component mixture model should be promising • … not ready yet!
