Getting Started with CellProfiler Mark-Anthony Bray, Ph.D Imaging Platform, Broad Institute Cambridge, Massachusetts, USA
Software Overview • Available from www.cellprofiler.org • Free, open source (Python) • Software available for Windows, Mac and Linux Image Analysis & Quantification Image-centric Data Analysis
CellProfiler: Overview • Process large sets of images • Identifies and measures objects • Export data for further analysis • Goal: Provide powerful image analysis methods with a user-friendly interface • Philosophy: Measure everything, ask questions later... • Support data analysis based on individual cells
Typical CellProfiler Pipeline Workflow • For image-based assays, the basic objective is always to • Identify cells/organisms • Measure feature(s) of interest • The uniqueness of each assay comes in • Deciding what compartments to identify and how to identify them • Determining which measure(s) are most useful to identify interesting samples
The CellProfiler Interface • Pipeline panel: Displays modules in pipeline • Modules executed in order from top to bottom • Panel controls: Module help, add or remove modules, change module position
The CellProfiler Interface • File panel: Displays files in default image folder Load pipeline by double-clicking on it View images by double-clicking on the filename
The CellProfiler Interface • The figure window has additional menu options • Toolbar menu: Pan, zoom in/out • CellProfiler Image Tools • Image Tool (also displayed by clicking on image) • Interactive zoom • Show pixel data (location, intensity)
The CellProfiler Interface • Folder panel: Change default input and output directories • Usually these should be separate folders Input folder: Contains images to be analyzed Output folder: Contains the output file plus exported data and images
The CellProfiler Interface • Settings panel: View and change settings for each module • Clicking on a different module updates the settings view
Module Categories • File processing: Image input, file output • Image processing: Often used for pre-processing prior to object identification • Object processing: Identification, modification of objects of interest • Measurement: Collection of measurements from objects of interest • Data Tools: Measurement exploration, measurement output
The First Module: LoadImages • Loads an “image set” which is a group of related images, in preparation for further processing • Related how? Depending on the imaging device, one file may represent • One channel at one imaging location • Multiple channels at one imaging location • Multiple channels at multiple locations • Etc… • [Figure: example channels, DNA and GFP]
The First Module: LoadImages • Can use text matching to define the difference between images in a set • Example: All images stained for GFP have the text Channel1- in the name; the same applies to DNA images (Channel2-) • Assign each image a meaningful name for downstream reference
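The text-matching idea on this slide can be sketched in a few lines of Python. This is only an illustration of the grouping logic, not CellProfiler's own code: the function name `group_image_sets`, the file names, and the tag-to-name mapping are all hypothetical.

```python
from collections import defaultdict

def group_image_sets(filenames, channel_tags):
    """Group files into image sets, keyed by the file name with the channel tag removed.

    channel_tags maps a text tag found in the file name (e.g. "Channel1-")
    to a meaningful channel name (e.g. "GFP").
    """
    image_sets = defaultdict(dict)
    for name in filenames:
        for tag, channel_name in channel_tags.items():
            if tag in name:
                key = name.replace(tag, "")  # what remains identifies the image set
                image_sets[key][channel_name] = name
                break
    return dict(image_sets)

# Hypothetical file names following the Channel1-/Channel2- convention
files = ["Channel1-A01.tif", "Channel2-A01.tif", "Channel1-A02.tif", "Channel2-A02.tif"]
image_sets = group_image_sets(files, {"Channel1-": "GFP", "Channel2-": "DNA"})
```

Each entry of `image_sets` then holds one GFP file and one DNA file from the same location, which is what downstream steps refer to by name.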
What Is An “Image”? • Images from Carolina Wahlby
Object Identification • Once the images are loaded, how do you find objects of interest? • Step 1: Distinguish the foreground from the background by picking a good threshold • Step 2: Identify objects as regions brighter than the threshold • Step 3: Cut and join objects to “improve” their shape
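Steps 1 and 2 can be illustrated with a small pure-Python sketch: pixels brighter than a threshold are foreground, and touching foreground pixels are grouped into one labeled object. The function name `label_objects` and the choice of 4-connectivity are assumptions made for this example, not a description of CellProfiler's implementation.

```python
from collections import deque

def label_objects(image, threshold):
    """Label connected regions of pixels brighter than threshold (4-connectivity).

    Returns (labels, count), where labels has 0 for background and 1..count for objects.
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] > threshold and labels[r][c] == 0:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                # Flood-fill outward from the first pixel of this object
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] > threshold
                                and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count

# Tiny made-up image with two bright regions
image = [
    [0.1, 0.9, 0.8, 0.1, 0.1],
    [0.1, 0.9, 0.1, 0.1, 0.7],
    [0.1, 0.1, 0.1, 0.1, 0.8],
]
labels, n_objects = label_objects(image, 0.5)
```

Step 3 (cutting and joining objects) is not covered by this sketch.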
Primary Object Identification • Many options for thresholding, cut and join methods, etc.
Thresholding • Definition: Division of the image into background and foreground • What is the best threshold value for dividing the intensity histogram into foreground and background pixels? • [Figure: intensity histogram (frequency vs. pixel values) with two candidate threshold positions marked “Here?” and “Or here?”] • Method: Pick the method that provides the best results • Otsu: Default; good for readily identifiable foreground / background • Background, RobustBackground: Good for images that consist mostly of background
Thresholding • Correction factor • Multiplication factor applied to threshold • Adjusts threshold stringency/leniency • Setting this factor is empirical • Upper/lower bounds • Set safety limits on the automatic threshold to guard against false positives • Helpful for unexpected images: Empty wells, images with dramatic artifacts, etc.
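To make the thresholding settings concrete, here is a minimal sketch of Otsu's method followed by the correction factor and the upper/lower bounds. It assumes 8-bit integer pixel values in a flat list; the names `otsu_threshold` and `adjust_threshold` are hypothetical, and this is a textbook formulation rather than CellProfiler's code.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level that maximizes between-class variance (Otsu's method).

    Pixels at or below the returned level count as background.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_level, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for level in range(levels):
        weight_bg += hist[level]
        sum_bg += level * hist[level]
        weight_fg = total - weight_bg
        if weight_bg == 0 or weight_fg == 0:
            continue  # one class is empty, so no split exists at this level
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_level, best_variance = level, variance
    return best_level

def adjust_threshold(threshold, correction_factor=1.0, lower=0, upper=255):
    """Scale the automatic threshold, then clamp it to the safety bounds."""
    return min(max(threshold * correction_factor, lower), upper)

# An "empty well" with only dim noise: the lower bound keeps the threshold
# from dropping into the noise and producing false-positive objects.
empty_well = [3, 4, 5] * 10
safe_threshold = adjust_threshold(otsu_threshold(empty_well), lower=20)
```

A correction factor above 1 makes the threshold more stringent and one below 1 makes it more lenient, which matches the empirical tuning described on the slide.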
Object Separation • Once the foreground objects have been identified, we need to distinguish multiple objects contained in the same “clump” • Images from Carolina Wahlby
Object Separation • Two step process in “de-clumping” • Identification of the objects in a clump • Drawing boundaries between the clumped objects Adjust settings to “de-clump” objects
Object Separation • Clump identification: Two options • Intensity: Works best if objects are brighter at center, dimmer at edges • Shape: Works best if objects have indentations where clumps touch (esp. if objects are round) • [Figure: example clumps with numbered intensity peaks and shape indentations]
Object Separation • Drawing boundaries: Two options • Distance: Draws boundary lines midway between object centers • Intensity: Draws boundary lines at dimmest line between objects • Test mode allows users to view results of all setting combinations
Object Separation • Additional separation settings: Adjust these settings if objects are being incorrectly split into pieces or merged together • Smoothing: Increase to reduce intensity irregularities which produce over-segmentation of objects • [Figure: original image; smoothing filter size = 4; smoothing filter size = 8]
Object Separation • Suppress Local Maxima • Minimum distance allowed between object intensity peaks; peaks closer than this are treated as one object rather than as separate objects in a clump • Decrease to reduce improper merging of objects in clumps • [Figure: original image with maxima; maxima distance = 4; maxima distance = 8]
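The effect of the maxima distance can be seen in a one-dimensional toy example. The sketch below finds local peaks in an intensity profile and greedily keeps the brightest ones that are at least `min_distance` apart. It is a simplified stand-in chosen for illustration (the function name and the greedy strategy are assumptions), not the algorithm the module uses on 2-D images.

```python
def suppressed_peaks(profile, min_distance):
    """Return indices of local maxima, discarding peaks closer than min_distance
    to a brighter peak that has already been kept."""
    peaks = [
        i for i in range(1, len(profile) - 1)
        if profile[i] > profile[i - 1] and profile[i] >= profile[i + 1]
    ]
    kept = []
    # Visit peaks from brightest to dimmest so the brightest peak in a cluster wins
    for i in sorted(peaks, key=lambda i: profile[i], reverse=True):
        if all(abs(i - k) >= min_distance for k in kept):
            kept.append(i)
    return sorted(kept)

# Made-up intensity profile across a clump
profile = [0, 5, 3, 6, 1, 0, 0, 7, 0]
few_merged = suppressed_peaks(profile, 2)   # small distance: more peaks survive
more_merged = suppressed_peaks(profile, 4)  # large distance: nearby peaks are merged
```

Each surviving peak would seed one object, so a smaller distance splits clumps more aggressively and a larger one merges more.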
Object Separation • However… • Adjusting these parameters can produce more improper segmentation than it solves • The proper settings are usually a matter of trial and error • The automatic settings are a good starting point, though • [Figure: original image; smoothing filter size = 4; smoothing filter size = 8]
Filtering Invalid Objects • See FilterObjects module for more advanced filtering options Discard objects that fail size criterion or touch the image border
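As a rough illustration of this filtering step, the sketch below takes a labeled image (0 = background, positive integers = objects) and keeps only objects that pass a size criterion and do not touch the image border. It uses pixel area as the size criterion for simplicity; the function name and the area bounds are hypothetical.

```python
def filter_objects(labels, min_area, max_area):
    """Return the labels of objects whose area is within bounds and that
    do not touch the image border."""
    rows, cols = len(labels), len(labels[0])
    area = {}
    touches_border = set()
    for r in range(rows):
        for c in range(cols):
            lab = labels[r][c]
            if lab == 0:
                continue
            area[lab] = area.get(lab, 0) + 1
            if r in (0, rows - 1) or c in (0, cols - 1):
                touches_border.add(lab)
    return sorted(
        lab for lab, a in area.items()
        if min_area <= a <= max_area and lab not in touches_border
    )

# Object 1 touches the border, object 2 is valid, object 3 is too small
labels = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 2, 2, 0],
    [0, 0, 0, 2, 2, 0],
    [0, 3, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
valid = filter_objects(labels, min_area=2, max_area=10)
```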
Primary Object Identification • Colors used to label each segmented object • Shows if each object has been identified and separated properly • Outlines highlight valid objects • Green: Valid • Yellow: Invalid – Touching border • Red: Invalid – Size criterion • Gives object count as a measurement
Secondary Object Identification • Goal: Identify individual cell boundaries by “growing” primary objects using a staining channel • Nuclei typically more uniform in shape, more easily separated than cells • Segment nuclei first, then use segmented nuclei to start cell segmentation
Secondary Object Identification • Methods • Distance-N: Ignores image information • Useful in cases where no cell stain is present • Watershed, Propagation, Distance-B: Use image information • Find dividing lines between objects and background / neighbors • Test mode allows user to view results of all methods • [Figure: Distance-N result vs. Propagation result]
Secondary Object Identification • Regularization: Controls the precise dividing line between cells that touch each other • Performed by balancing between intensity and distance • Usually not adjusted • [Figure: regularization = ∞ vs. regularization = 0] • Correction factor, lower/upper bounds on threshold: Same purpose as in IdentifyPrimaryObjects
Tertiary Object Identification • Goal: Identify tertiary objects by removing the primary objects from secondary objects • “Subtract” the nuclei objects from cell objects to obtain cytoplasm • Cells - Nuclei = Cytoplasm
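The subtraction on this slide amounts to a per-pixel mask operation. A minimal sketch, assuming cells and nuclei are given as label images of the same size (0 = background) and using a hypothetical function name:

```python
def subtract_objects(cell_labels, nuclei_labels):
    """Keep each cell pixel only where no nucleus is present (cytoplasm = cells - nuclei)."""
    return [
        [cell if nucleus == 0 else 0 for cell, nucleus in zip(cell_row, nuc_row)]
        for cell_row, nuc_row in zip(cell_labels, nuclei_labels)
    ]

# One cell (label 1) containing one nucleus (label 1)
cells = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 0],
]
nuclei = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
cytoplasm = subtract_objects(cells, nuclei)
```

The resulting cytoplasm object keeps the cell's label, so its measurements can still be related back to the parent cell and nucleus.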
Measurement Modules: Object Morphology Select the objects to measure
Module: MeasureObjectAreaShape • Goal: Measure morphological features such as • Area • Perimeter • Eccentricity • MajorAxisLength • MinorAxisLength • Orientation • FormFactor: Compactness measure, circle = 1, line = 0
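FormFactor is commonly defined as 4π·Area / Perimeter², which is what gives 1 for a perfect circle and values near 0 for a long thin object. A short sketch that checks both ends of the scale (the function name is hypothetical):

```python
import math

def form_factor(area, perimeter):
    """Compactness measure: 4 * pi * area / perimeter ** 2."""
    return 4 * math.pi * area / perimeter ** 2

# Circle of radius 5: area = pi * r^2, perimeter = 2 * pi * r
circle = form_factor(math.pi * 5 ** 2, 2 * math.pi * 5)

# Thin 100 x 1 rectangle: area = 100, perimeter = 202
thin_rectangle = form_factor(100, 202)
```

On real pixelated objects the measured perimeter is only an approximation, so values for round objects will be close to, but not exactly, 1.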
Measurement Modules: Object Intensity Select the image to measure from Select the objects to measure
Module: MeasureObjectIntensity • Goal: Measure object intensity features such as • Integrated intensity: Sum of the pixel intensities within an object • Mean, median, standard deviation intensities • Maximal and minimal pixel intensities • Lower/Upper quartile • The object intensity may be obtained from any image, not just the image used to identify the object • Example: Ph3 intensity may be measured using the nuclei objects
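The following sketch shows how such per-object intensity features could be computed from a label image and any intensity image of the same size. It is an illustration only: the function name and the dictionary keys are hypothetical, and the standard deviation here is the population form.

```python
import statistics

def measure_object_intensity(labels, image):
    """Collect the pixels of each labeled object and summarize their intensities."""
    pixels = {}
    for label_row, image_row in zip(labels, image):
        for lab, value in zip(label_row, image_row):
            if lab:  # skip background (label 0)
                pixels.setdefault(lab, []).append(value)
    return {
        lab: {
            "integrated": sum(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "std": statistics.pstdev(values),
            "min": min(values),
            "max": max(values),
        }
        for lab, values in pixels.items()
    }

# Objects come from one image (e.g. nuclei), intensities from another (e.g. a Ph3 stain)
labels = [[1, 1, 0], [2, 2, 2]]
stain = [[2, 4, 9], [1, 3, 5]]
features = measure_object_intensity(labels, stain)
```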
Measurement Modules: Object Texture Select the image to measure from Select the objects to measure Select the spatial scale
MeasureObjectTexture • Goal: Determine whether the staining pattern is smooth on a particular scale • Selection of the appropriate texture scale is essentially empirical • A higher number measures larger patterns of texture • Smaller numbers measure more localized (finer) patterns of texture • Can also add several texture modules to the pipeline, each measuring a different texture scale
Other Measurement Modules • CalculateMath: Arithmetic operations for measurements • CalculateStatistics: Assay quality (V and Z' factors) and dose response data (EC50) for all measurements • Image-based measures • MeasureImageAreaOccupied • MeasureImageGranularity • MeasureImageIntensity • Object-based measures • MeasureCorrelation • MeasureObjectNeighbors • MeasureRadialDistribution
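For reference, the Z' factor is usually defined as 1 - 3(σ_pos + σ_neg) / |μ_pos - μ_neg|, computed from positive and negative control wells. The sketch below implements that standard formula; the function name is hypothetical, and the use of the sample standard deviation is an assumption that may differ from what CalculateStatistics does internally.

```python
import statistics

def z_prime_factor(positive_controls, negative_controls):
    """Z' = 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    mean_pos = statistics.mean(positive_controls)
    mean_neg = statistics.mean(negative_controls)
    sd_pos = statistics.stdev(positive_controls)  # sample standard deviation (assumption)
    sd_neg = statistics.stdev(negative_controls)
    return 1 - 3 * (sd_pos + sd_neg) / abs(mean_pos - mean_neg)

# Made-up per-well measurements for the two control groups
z_prime = z_prime_factor([9, 10, 11], [0, 1, 2])
```

Values closer to 1 indicate better separation between the controls for that measurement.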
Data Export Modules • User may output images or image measurements Select the objects to export
Measurement Display • The average measurements for all objects in the image are displayed in the figure window • However, the individual measurements for each object are stored in the output file
Data Export Modules • Goal: Retain images of intermediate image processing steps for quality control or save measurements for later analysis and exploration • SaveImages: Writes an image to a file • Intermediate images in the pipeline are not saved unless requested • Choice of many image formats to write → module can be used as an image format converter • ExportToSpreadsheet: Export measurements as a comma-separated file readable by spreadsheet programs • ExportToDatabase: Export measurements as per-image and per-object tables plus a configuration file for upload to a MySQL database
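To show what a comma-separated export of per-object measurements looks like, here is a small sketch using Python's standard `csv` module. The column names and values are made up for illustration and are not meant to reproduce the exact layout written by ExportToSpreadsheet.

```python
import csv
import io

def export_measurements(rows, fieldnames):
    """Write measurement rows (dicts) as comma-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical per-object measurements: one row per object
rows = [
    {"ImageNumber": 1, "ObjectNumber": 1, "Area": 120},
    {"ImageNumber": 1, "ObjectNumber": 2, "Area": 95},
]
csv_text = export_measurements(rows, ["ImageNumber", "ObjectNumber", "Area"])
```

Writing the same text to a file instead of a string buffer gives a file that opens directly in a spreadsheet program.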
Illumination Correction • The physical limitations of any microscope produce nonuniformities in the optical path of the sample, microscope, and/or camera • Example: Tiling raw images shows that there is uneven illumination from left to right in each image • This heterogeneity can lead to inaccurate intensity measurements • A cell located at (a) is brighter than one at (b) even if the cells have the same amount of fluorescent material • Carpenter et al, Genome Biology 2006, 7:R100
Illumination Correction • Illumination correction ensures that object segmentation and measurements (e.g. DNA content) are more accurate Carpenter et al, Genome Biology 2006, 7:R100
Illumination Correction • Two modules • CorrectIlluminationCalculate: Creates an illumination correction function • CorrectIlluminationApply: Applies the function to your images • Available options • Correct each image individually, or all images together as an ensemble? • Calculate the illumination function by using foreground pixels or background pixels? • Apply the function using division or subtraction? • Additional considerations • Create a new illumination correction function if you image on a different microscope or change plates • Correct each channel since absolute illumination intensities may differ between channels • First, create and save the function from the image set, then load and apply it prior to identification
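The calculate-then-apply idea can be sketched as follows. The ensemble function here is just a pixelwise average over all images, which is a deliberate simplification (no smoothing or rescaling is done), and both function names are hypothetical.

```python
def average_images(images):
    """Pixelwise mean over a list of same-sized images (a crude ensemble illumination function)."""
    n = len(images)
    rows, cols = len(images[0]), len(images[0][0])
    return [
        [sum(img[r][c] for img in images) / n for c in range(cols)]
        for r in range(rows)
    ]

def apply_illumination_correction(image, illum_function, method="divide"):
    """Correct an image by dividing by, or subtracting, the illumination function."""
    if method == "divide":
        correct = lambda pixel, illum: pixel / illum
    elif method == "subtract":
        correct = lambda pixel, illum: pixel - illum
    else:
        raise ValueError("method must be 'divide' or 'subtract'")
    return [
        [correct(p, f) for p, f in zip(image_row, illum_row)]
        for image_row, illum_row in zip(image, illum_function)
    ]

# Two made-up one-row images that are brighter on the right-hand side
images = [[[1.0, 3.0]], [[3.0, 5.0]]]
illum = average_images(images)                              # [[2.0, 4.0]]
flattened = apply_illumination_correction([[2.0, 4.0]], illum)  # uneven image becomes flat
```

Division suits multiplicative shading such as uneven excitation, while subtraction suits an additive background; which one applies is an empirical question for each assay.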
Cluster Computing • If processing time is too great on a single computer, then run the pipeline on a cluster • Download and install CellProfiler on a computing cluster • Add the ExportToDatabase module • Add the CreateBatchFiles module to the end of the pipeline and configure it appropriately • Run the first image cycle locally • Submit the batches to your cluster for processing • Check the progress of processing • For really big screens, it is necessary to process images in batches on a computing cluster
Data Analysis • At the end of a pipeline, you may have 500+ features per cell • Size, shape, staining intensity, texture (smoothness), etc • Remember our Philosophy: “Measure everything, ask questions later...”
Data Analysis • What does this data set look like? • Cytological profile, or Cytoprofile • Shows all the measurements acquired • For each individual cell • In every image • In the entire experiment • [Figure: example cytoprofile for Cell #6111617; color scale from -1 to +1; sample values -.2, .7, -.1, 0, .2, -.9]
CellProfiler Analyst: Overview • Explore data from large sets of images • Identify interesting subpopulations and see the original images • Identify interesting phenotypes automatically • Goal: Provide the user with a powerful suite of image exploration and machine learning methods
The CellProfiler Analyst Interface • CellProfiler Analyst (CPA) allows you to explore the data with a variety of tools • Upon startup, CPA requests a properties file which contains • Locations of the measurement tables • How the images are referenced • Other assorted information
Plate Viewer • Displays data in plate layout • 96- or 384-well format • Measurements are shown as color-coded wells or mouse tool-tips • Right-clicking on well reveals list of images to display
Image Viewer • Displays an image referenced by number • Color display • Colors are assigned to each channel of image data • Shown as a merged color image • Toggle channel visibility and color scaling