New resources meeting: cellular phenotype image databank Wolfgang Huber
Cellular Phenotype Screens • Library of perturbation reagents: RNAi (~20k), over-expression constructs (~1k), small molecules (~1M) • Cell line that does something (e.g. mitotic growth; differentiation; respond to a signal) • Readout
Moffat et al., Nature Reviews Molecular Cell Biology 7, 177–187 (March 2006)
Monitoring tools • Plate reader: 96 or 384 wells, 1…4 measurements per well • Flow cytometer: 4…8 measurements per cell, thousands of cells per well • Automated microscopy: unlimited
Example 1: A genome-wide siRNA screen on HeLa cells to identify modulators of cell morphology (apoptosis, cell cycle, …) Data from Florian Fuchs, Michael Boutros, DKFZ Heidelberg Show imageHTS Original image data: 1. Negative control (siRNA against Renilla luciferase) 2. Elongated cell morphology after silencing GPR124 3. Mitotic arrest after silencing CDCA1 • 12 images per probe: 4 images in each of the Hoechst, TRITC and FITC channels • 22848 probes in total x 2 datasets
Example 2: Mitocheck time-lapse data Ellenberg & Pepperkok groups, Mitocheck consortium Live cell time-lapse imaging • HeLa cell line expressing H2B-GFP • seeded on siRNA spots and grown for ~48 h • fluorescence time-lapse live imaging (sampling interval = 30 min) Experimental output • video sequences of 96 images (1024x1024) • 100 MB per spot • ~200,000 spots (20 TB) Show Movie
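The figures on this slide are mutually consistent, which a few lines of arithmetic can confirm. The sketch below uses only the numbers quoted above; the 8-bit pixel depth and the decimal units (1 MB = 10^6 bytes, 1 TB = 10^6 MB) are assumptions, not stated on the slide.

```python
# Numbers quoted on the slide.
FRAMES_PER_SPOT = 96
SAMPLING_INTERVAL_MIN = 30
IMAGE_SIDE_PX = 1024
MB_PER_SPOT = 100
N_SPOTS = 200_000


def imaging_duration_hours(frames=FRAMES_PER_SPOT, interval_min=SAMPLING_INTERVAL_MIN):
    """Total recording time covered by one video sequence."""
    return frames * interval_min / 60


def raw_mb_per_spot(frames=FRAMES_PER_SPOT, side_px=IMAGE_SIDE_PX, bytes_per_pixel=1):
    """Uncompressed size of one sequence; bytes_per_pixel=1 (8-bit) is an assumption."""
    return frames * side_px * side_px * bytes_per_pixel / 1_000_000


def archive_size_tb(n_spots=N_SPOTS, mb_per_spot=MB_PER_SPOT):
    """Total archive size in decimal terabytes."""
    return n_spots * mb_per_spot / 1_000_000


print(imaging_duration_hours())  # 48.0 hours, matching "~48 h"
print(raw_mb_per_spot())         # about 100.7 MB, matching "100 MB per spot"
print(archive_size_tb())         # 20.0 TB
```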
[Figure: phenotype features vs. genes (reagents)]
Why Now? • Whole genome RNAi screening has become an accessible tool for many labs • Academic institutions are establishing their own compound libraries to do small molecule screens • Advances in automated microscopy
What are the experiments of interest? • Cell-based assay (cell culture; not tissue samples/histology, not organismal development) • High-content assay screening a large or complete set of reagents (RNAi, compounds) • Read-out: for each reagent, a 2D-5D image of the same format (possibly with replicates) • Any biological process: - (de-)activation of a signaling pathway - cell differentiation - changes in the cell cycle dynamics - morphological changes - activation of apoptosis
What are the benefits of collecting these data? • Article supplements hide most of the raw data and are not standardized • Clustering of phenotype profiles becomes more exact the more features are available
User types Gene-oriented researcher who wants to know everything about her gene. Biological process oriented researcher who wants to get all the genes involved in his phenotype. Systems biologist who wants to infer modules (pathways) of gene products from data.
What for? • Gene annotation: link every gene in Ensembl to observed phenotypes in cell-based assays (so do we ignore compound screens?) • Phenotypic clustering: genes with similar phenotype profiles have similar function
Phenotypes are not as simple as sequences or structures The mapping from single-gene knockdown to phenotype depends on genetic background, genetic interactions, environment. The "phenotypic profile of a gene" (in absolute terms) is a misleading oversimplification. But genes can be meaningfully grouped/categorized by comparing their phenotypic profiles if measured uniformly.
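The idea of grouping genes by comparing uniformly measured profiles can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the gene names, the three features and their values are invented toy data, and z-scoring followed by Euclidean nearest-neighbour matching is just one simple way to compare profiles, not the method of any screen mentioned in these slides.

```python
from math import dist
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each feature (column) across all reagents, so that
    profiles are compared relative to the screen, not in absolute terms."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant features
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def nearest_neighbours(profiles):
    """Map each gene to the gene with the most similar standardized profile."""
    names = list(profiles)
    z = dict(zip(names, zscore_columns([profiles[n] for n in names])))
    return {
        n: min((m for m in names if m != n), key=lambda m: dist(z[n], z[m]))
        for n in names
    }


# Invented toy profiles: (cell area, elongation, mitotic fraction).
toy_profiles = {
    "gene_A": [1.00, 0.20, 0.05],
    "gene_B": [1.10, 0.25, 0.06],
    "gene_C": [0.50, 0.90, 0.30],
    "gene_D": [0.55, 0.85, 0.28],
}

print(nearest_neighbours(toy_profiles))
```

With these toy values, gene_A pairs with gene_B and gene_C with gene_D. The standardization step only makes sense because all four profiles come from the same, uniformly measured screen.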
How to navigate the data? • By reagent (e.g. siRNA) • By phenotype (standardized ontology, or as defined by author / publication?) • Should we attempt to reproduce the authors' image analysis and phenotype assignment? To represent and query intermediate results? There are many arguments for making this a separate task.
Three Tiers • Archive: Raw images/movies • Warehouse: Extracted numbers (one 'vector' per reagent) - idiosyncratic per experiment • Compendium: meaningful comparisons of numbers are possible between experiments
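One way to picture the three tiers is as three record types plus a mapping step from the warehouse to the compendium. The sketch below is hypothetical throughout: the class names, field names, experiment and reagent identifiers, and the `term_map` lookup are all assumptions made for illustration, not a proposed schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ArchiveEntry:
    """Tier 1: raw images/movies for one reagent in one experiment."""
    experiment_id: str
    reagent_id: str
    image_uris: List[str]


@dataclass
class WarehouseEntry:
    """Tier 2: one feature vector per reagent; feature names are
    idiosyncratic to the experiment that produced them."""
    experiment_id: str
    reagent_id: str
    features: Dict[str, float]


@dataclass
class CompendiumEntry:
    """Tier 3: values keyed by a shared term, then by experiment."""
    reagent_id: str
    phenotypes: Dict[str, Dict[str, float]] = field(default_factory=dict)


def build_compendium(warehouse: List[WarehouseEntry],
                     term_map: Dict[Tuple[str, str], str]) -> Dict[str, CompendiumEntry]:
    """Keep only features that map to a shared term; the rest stay warehouse-only."""
    compendium: Dict[str, CompendiumEntry] = {}
    for entry in warehouse:
        for name, value in entry.features.items():
            term = term_map.get((entry.experiment_id, name))
            if term is None:
                continue  # experiment-specific feature with no shared meaning
            record = compendium.setdefault(entry.reagent_id,
                                           CompendiumEntry(entry.reagent_id))
            record.phenotypes.setdefault(term, {})[entry.experiment_id] = value
    return compendium


# Invented example: two screens measuring the same thing under different names.
warehouse = [
    WarehouseEntry("screen1", "siRNA_001", {"frac_mitotic": 0.31, "plate_row": 3.0}),
    WarehouseEntry("screen2", "siRNA_001", {"mitotic_index": 0.28}),
]
term_map = {
    ("screen1", "frac_mitotic"): "mitotic fraction",
    ("screen2", "mitotic_index"): "mitotic fraction",
}
compendium = build_compendium(warehouse, term_map)
print(compendium["siRNA_001"].phenotypes)
```

The sketch only aligns feature names. Making the numbers themselves comparable between experiments would need an additional scaling step, which is left out here.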
Preprocessing • The computations that turn images (with varying numbers of "colors", resolutions, 2...5D) into numbers are not standardized and likely not standardizable • They need not work image-by-image, but may depend on the whole set (e.g. training of machine learning algorithms) • Even the authors' software implementations are usually inaccessible to others (we are working on this) • Reproduction is extremely difficult (in practice impossible)
Replacement costs • Simple genome-wide screen: 200-500 k€ • Mitocheck-type: ~ 5 M€
Secondary benefits • Having several different sets of raw data in one place will facilitate development of best-practice analysis strategies
Related projects • Genome RNAi (Boutros, DKFZ) • FLIGHT (Baum, LICR) • Mitocheck (Ellenberg, Durbin/Sanger) • OME
Possible Next Steps • Create prototype database(s) as part of (and funded through) specific collaborations • Assemble an invitation-only workshop (e.g. WT) to test whether there is sufficient interest in the community (e.g. Ellenberg, Krausz/MPG, Pepperkok/EMBL, Durbin, Heriché/Sanger, Altschuler/UTSW, A. Carpenter/MIT, Boutros, Eils/DKFZ, Pelkmans/ETH, Taipale/Helsinki) • Companies?