
Gene Expression Data Analyses (2)

Learn about RNA isolation, labeling, hybridization, image analysis, data representation, and normalization techniques in gene expression data analyses.

adrake




Presentation Transcript


  1. Gene Expression Data Analyses (2) Trupti Joshi Computer Science Department 317 Engineering Building North E-mail: joshitr@missouri.edu 573-884-3528(O)

  2. Recap (Lecture 1) • RNA is first isolated from different tissues, developmental stages, disease states or samples subjected to appropriate treatments. • RNA is then labeled and hybridized to the arrays using an experimental strategy that allows expression to be assayed and compared between appropriate sample pairs. • Use a single label and independent arrays for each sample, or a single array with distinguishable fluorescent dye labels for the individual RNAs. • Regardless of the approach chosen, the arrays are scanned after hybridization and independent grayscale images, typically 16-bit TIFF images, are generated for each pair of samples to be compared. • Images are then analyzed to identify the arrayed spots and to measure the relative fluorescence intensities for each element.

  3. Lecture Outline • Image analysis • Data representation • Data Normalization • Normalization within slides • Scaled normalization • Linear regression normalization • Lowess Normalization • Global vs. Local normalization • Variance regularization • Replicate Filtering • Normalization between slides

  4. Lecture Outline • Image analysis • Data representation • Data Normalization • Normalization within slides • Scaled normalization • Linear regression normalization • Lowess Normalization • Global vs. Local normalization • Variance regularization • Replicate Filtering • Normalization between slides

  5. Spotted Array Cy5 Cy3

  6. Quality of Images Common problems: • Spot is irregular (e.g. not round, or donut-shaped) • Hybridization is uneven (e.g. only half of the spot is good) • Hybridization shows fog • Hybridization signal is too weak or saturated

  7. Image Processing • Gridding • Identifying spot locations • Segmentation • Identifying foreground and background • Processing techniques • Manual vs. semiautomatic gridding • Variety of segmentation techniques

  8. Segmentation

  9. Data Quality (1) • Irregular size or shape • Irregular placement • Low intensity • Saturation • Spot variance • Background variance [Figure: example spots labeled misalignment, artifact, bad print, indistinguishable, saturated]

  10. Data Quality (2) • Calculate numeric characteristics of each spot • Throw out spots that do not meet minimum requirements for each characteristic • Throw out spots that do not have minimum overall combined quality
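
The two filtering steps on this slide can be sketched as follows. This is a minimal illustration, not the lecturer's procedure: the characteristic names (`shape`, `signal`), the 0 to 1 score scale, the thresholds, and the use of a plain average as the "combined quality" are all assumptions made for the example.

```python
def filter_spots(spots, minimums, min_combined):
    """Return the ids of spots that pass both quality filters.

    spots: list of {"id": str, "scores": {characteristic: score}} (hypothetical layout)
    minimums: per-characteristic minimum score
    min_combined: minimum for the combined score (here: the plain average, an assumption)
    """
    kept = []
    for spot in spots:
        scores = spot["scores"]
        # Step 1: drop the spot if any single characteristic is below its minimum.
        if any(scores[name] < minimums[name] for name in minimums):
            continue
        # Step 2: drop the spot if its overall combined quality is too low.
        combined = sum(scores.values()) / len(scores)
        if combined < min_combined:
            continue
        kept.append(spot["id"])
    return kept

# Hypothetical spots with made-up scores.
spots = [
    {"id": "A1", "scores": {"shape": 0.9, "signal": 0.8}},
    {"id": "A2", "scores": {"shape": 0.3, "signal": 0.9}},
    {"id": "A3", "scores": {"shape": 0.6, "signal": 0.6}},
]
kept_ids = filter_spots(spots, {"shape": 0.5, "signal": 0.5}, min_combined=0.7)
```

Here A2 fails the per-characteristic check and A3 passes each minimum but fails the combined check, so only A1 is kept.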

  11. Tips for Image Scan • Image format: 16-bit TIFF (65,536 intensity levels, 0-65,535) • Color: Rainbow palette data display for easy viewing • Adjust scanning resolution: 5, 10, 20 and 50 µm • Adjust the saturation rates (not many red spots)

  12. Signal Extraction • Many software packages are available (ImaGene, GPC VisualGrid, TIGR Spotfinder, etc.) • Most of them are effective

  13. Tips for Signal Extraction • Signal-to-noise ratio > 1.96 • Background area selection • Spot finding automation • Batch processing ability might not be good • Bad spots should be removed

  14. Lecture Outline • Image analysis • Data representation • Data Normalization • Normalization within slides • Scaled normalization • Linear regression normalization • Lowess Normalization • Global vs. Local normalization • Variance regularization • Replicate Filtering • Normalization between slides

  15. Expression Ratio • Consider an array that has N_array distinct elements, and compare a query sample (R) with a reference sample (G), named for the red and green colors commonly used to represent array data. The ratio (T) for the ith gene (where i is an index running over all the arrayed genes from 1 to N_array) is $T_i = \frac{R_i}{G_i}$ • Usually log2(Ti) is used • It reflects both up-regulated and down-regulated genes

  16. Log Transformations • The logarithm base 2 transformation has the advantage of producing a continuous spectrum of values and treating up- and down-regulated genes in a similar fashion. • The logarithms of the expression ratios are treated symmetrically, such that • a gene up-regulated by a factor of 2 has a log2(ratio) of 1, • a gene down-regulated by a factor of 2 has a log2(ratio) of −1, • a gene expressed at a constant level (ratio of 1) has a log2(ratio) equal to zero.

  17. Example • Gene: 1, 2, 3, 4, 5 • R (Cy5): 0.1, 0.6, 0.3, 0.3, 0.5 • G (Cy3): 0.2, 0.3, 0.6, 0.2, 0.5 Thus Gene 1: log2(0.1/0.2) = -1 Gene 2: log2(0.6/0.3) = 1 ….. Gene 4: log2(0.3/0.2) = 0.58 …
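
The arithmetic in this example can be reproduced with a few lines of Python. This is only a sketch; the function and variable names are illustrative and do not come from the lecture.

```python
import math

# Intensities from the example slide (red = query sample, green = reference sample).
red = [0.1, 0.6, 0.3, 0.3, 0.5]
green = [0.2, 0.3, 0.6, 0.2, 0.5]

def log2_ratios(r_values, g_values):
    """Return log2(R_i / G_i) for each array element."""
    return [math.log2(r / g) for r, g in zip(r_values, g_values)]

log_ratios = log2_ratios(red, green)
for gene, value in enumerate(log_ratios, start=1):
    print(f"Gene {gene}: log2(R/G) = {value:+.2f}")
```

Genes 1 and 2 differ by the same factor of 2 in opposite directions, so their log2 ratios have equal magnitude and opposite sign.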

  18. Lecture Outline • Image analysis • Data representation • Data Normalization • Normalization within slides • Scaled normalization • Linear regression normalization • Lowess Normalization • Global vs. Local normalization • Variance regularization • Replicate Filtering • Normalization between slides

  19. Data Normalization • Uncalibrated: red light under-detected • Calibrated: red and green equally detected

  20. Rationale for Data Normalization • Unequal quantities of starting RNA • Differences in labeling • Differences in detection efficiency between the fluorescent dyes • Scanning saturation • Systematic biases in the measured expression levels

  21. Two Types of Normalization • Normalization within slides • Normalization between slides

  22. Normalization Benefits • Can control for many of the experimental sources of variability (systematic ones, not random or gene-specific ones) • Brings each image to the same average brightness

  23. Assumptions for Data Normalization • The average mass of each molecule is approximately the same, so equal masses of RNA contain about the same number of molecules in each sample • The arrayed elements represent a random sampling of the genes in the organism • The number of molecules from each sample that hybridize to the array is similar, so the total intensity for each sample will be the same

  24. Lecture Outline • Image analysis • Data representation • Data Normalization • Normalization within slides • Scaled normalization • Linear regression normalization • Lowess Normalization • Global vs. Local normalization • Variance regularization • Replicate Filtering • Normalization between slides

  25. Data Normalization Methods • Scaled Normalization • By total intensity • By mean • By median • By a group of genes • Linear regression analysis • Lowess normalization • Log centering • Rank invariant methods • Chen’s ratio statistics

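  26. Scaled Normalization by Total Intensity • Gi and Ri are the measured intensities for the ith array element • The normalization factor is $N_{total} = \frac{\sum_{i=1}^{N_{array}} R_i}{\sum_{i=1}^{N_{array}} G_i}$ • One channel is rescaled, $G_i' = N_{total} G_i$ and $R_i' = R_i$, so that $T_i' = \frac{R_i'}{G_i'} = \frac{1}{N_{total}} \frac{R_i}{G_i}$ • log2(Ti') = log2(Ti) - log2(Ntotal) is the normalized value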

  27. Example • Gene: 1, 2, 3, 4, 5 • R (Cy5): 0.1, 0.2, 0.3, 0.3, 0.5 • G (Cy3): 0.2, 0.5, 0.6, 0.2, 0.5 Ntotal = (0.1+0.2+0.3+0.3+0.5)/(0.2+0.5+0.6+0.2+0.5) = 1.4/2 = 0.7 Thus gene 1: log2(0.5) - log2(0.7) ≈ -0.49 …
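
The total-intensity example can be checked numerically. The sketch below assumes the normalized value is log2(Ri/Gi) minus log2(Ntotal), as the gene 1 calculation on this slide suggests; the function name is illustrative.

```python
import math

# Intensities from the example slide.
red = [0.1, 0.2, 0.3, 0.3, 0.5]
green = [0.2, 0.5, 0.6, 0.2, 0.5]

def total_intensity_normalize(r_values, g_values):
    """Return (N_total, normalized log2 ratios) for one two-color array."""
    # Ratio of the summed red intensities to the summed green intensities.
    n_total = sum(r_values) / sum(g_values)
    normalized = [
        math.log2(r / g) - math.log2(n_total)
        for r, g in zip(r_values, g_values)
    ]
    return n_total, normalized

n_total, normalized = total_intensity_normalize(red, green)
print(f"N_total = {n_total:.2f}")
print(f"Gene 1 normalized log2 ratio = {normalized[0]:.2f}")
```

Subtracting log2(Ntotal) shifts every log ratio by the same constant, so the differences between genes are unchanged.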

  28. Other Scaled Normalization • Replace Ntotal with Nmean or Nmedian, computed from the channel means or medians instead of the totals • To normalize by a subset of genes, compute the scaling factor from that subset only, instead of from all genes
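
A sketch of how the alternative scaling factors might be computed. The slide does not define Nmean or Nmedian precisely, so this example assumes each is the ratio of the red-channel statistic to the green-channel statistic, by analogy with Ntotal. The `subset` parameter (a list of element indices, for example control genes) is a hypothetical name.

```python
import statistics

def scaling_factor(r_values, g_values, method="median", subset=None):
    """Return the red/green scaling factor for one array.

    method: "total", "mean", or "median" (ratio of channel statistics, an assumption)
    subset: optional indices of the elements used to compute the factor
    """
    if subset is not None:
        r_values = [r_values[i] for i in subset]
        g_values = [g_values[i] for i in subset]
    if method == "total":
        return sum(r_values) / sum(g_values)
    if method == "mean":
        return statistics.mean(r_values) / statistics.mean(g_values)
    if method == "median":
        return statistics.median(r_values) / statistics.median(g_values)
    raise ValueError(f"unknown method: {method}")

# Same intensities as the total-intensity example.
red = [0.1, 0.2, 0.3, 0.3, 0.5]
green = [0.2, 0.5, 0.6, 0.2, 0.5]
n_mean = scaling_factor(red, green, method="mean")
n_median = scaling_factor(red, green, method="median")
n_subset = scaling_factor(red, green, method="total", subset=[0, 1])
```

With equal numbers of elements in both channels, the mean-based factor equals the total-based one; the median is less sensitive to a few very bright spots.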

  29. Regression Normalization • Fit the linear regression model [equation not preserved] • Assumption: all the genes on the array have the same variance (homogeneity) • Test the significance of the intercept; if it is not significant, refit the regression without it • Transform the treatment data using the fitted model [equation not preserved] • Problems: • The equal-variance assumption may not hold • The trend may be nonlinear (the third replicate of the RL95 data has a slight quadratic trend)
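
The slide's own model did not survive in this transcript, so the following is only one plausible reading, stated as an assumption: regress the log treatment intensity on the log control intensity by ordinary least squares, then map the treatment values back onto the control scale. The significance test for the intercept is omitted, and all names are illustrative.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def regression_normalize(log_treatment, log_control):
    """Remove the fitted linear trend from the treatment channel (assumed model)."""
    intercept, slope = fit_line(log_control, log_treatment)
    return [(y - intercept) / slope for y in log_treatment]

# Toy data: the treatment channel is a linear distortion of the control channel.
log_control = [8.0, 9.0, 10.0, 12.0]
log_treatment = [0.5 + 1.2 * x for x in log_control]
adjusted = regression_normalize(log_treatment, log_control)
```

On this toy data the adjusted treatment values coincide with the control values, because the only difference between the channels was the linear distortion. A curved trend, as noted on the slide, would not be removed by this fit.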

  30. Scatter Plot of Log Intensity before vs. after Regression Normalization

  31. Problems with the Above Normalization Methods • They only correct for overall intensity differences between channels • They do not take into account systematic bias that may appear within the data • The log2(ratio) values can have a systematic dependence on intensity, most commonly a deviation from zero for low-intensity spots.

  32. Systematic Intensity-dependent Effects of log2(ratio) • Examples: • Under-expressed genes appear up-regulated in the red channel. • Moderately expressed genes appear up-regulated in the green channel. • Explanation: Chemical dyes do not fluoresce equally at different levels because of different levels of quenching (a phenomenon where dye molecules in close proximity re-absorb light from each other, thus diminishing the signal) • Solution: The easiest way to visualize intensity-dependent effects is to plot the measured log2(Ri/Gi) for each element on the array as a function of the log2(Ri*Gi) product intensities. • Such an 'R-I' (ratio-intensity) plot can reveal intensity-specific artifacts in the log2(ratio) measurements.
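
The coordinates of an R-I plot follow directly from the description on this slide. The sketch below only computes the (x, y) pairs; drawing them with a plotting library is left out, and the function name is illustrative.

```python
import math

def ri_coordinates(r_values, g_values):
    """Return (log2(R*G), log2(R/G)) for each array element.

    The first value is the intensity axis, the second the ratio axis.
    """
    return [
        (math.log2(r * g), math.log2(r / g))
        for r, g in zip(r_values, g_values)
    ]

# Two made-up elements: one brighter in red, one brighter in green.
points = ri_coordinates([4.0, 2.0], [1.0, 8.0])
```

If the cloud of points bends away from a ratio of zero at low or high intensity, that curvature is the intensity-dependent artifact the slide describes.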

  33. R-I Plots

  34. Lowess Normalization • Lowess (Locally weighted linear regression) analysis • It may remove the intensity-dependent effects in the log2(ratio) values

  35. How to Do Lowess Normalization • Normalize the values point by point • Generally requires a defined fraction of the data for each local fit (e.g. 20%) • Lowess normalization requires a ratio (two-dye experiments only)
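
A minimal pure-Python sketch of the point-by-point procedure. It assumes a tricube weight function and a single fitting pass without robustness iterations, which are common choices but are not stated on the slides; `frac` corresponds to the local-area percentage, and all names are illustrative. Real analyses would normally rely on an established statistics package instead.

```python
import math

def lowess_fit(x, y, frac=0.2):
    """Fit a locally weighted line at every x and return the fitted values."""
    n = len(x)
    # Number of points in each local window (at least 3, at most n).
    k = min(n, max(3, math.ceil(frac * n)))
    fitted = []
    for xi in x:
        # Bandwidth: distance from xi to its k-th nearest point.
        h = sorted(abs(xj - xi) for xj in x)[k - 1]
        weights = []
        for xj in x:
            d = abs(xj - xi)
            if h > 0:
                u = d / h
            else:
                u = 0.0 if d == 0 else 1.0
            # Tricube weight: nearby points count most, points at or beyond h count zero.
            weights.append((1 - u ** 3) ** 3 if u < 1 else 0.0)
        sw = sum(weights)
        mean_x = sum(w * xj for w, xj in zip(weights, x)) / sw
        mean_y = sum(w * yj for w, yj in zip(weights, y)) / sw
        sxx = sum(w * (xj - mean_x) ** 2 for w, xj in zip(weights, x))
        sxy = sum(w * (xj - mean_x) * (yj - mean_y)
                  for w, xj, yj in zip(weights, x, y))
        # Fall back to the weighted mean when the window has no spread in x.
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(mean_y + slope * (xi - mean_x))
    return fitted

def lowess_normalize(intensity, log_ratio, frac=0.2):
    """Subtract the fitted intensity-dependent trend from each log2 ratio."""
    trend = lowess_fit(intensity, log_ratio, frac)
    return [m - t for m, t in zip(log_ratio, trend)]

# Toy data: a log ratio that drifts with intensity and carries no real signal.
intensity = [float(i) for i in range(10)]
log_ratio = [0.3 * a - 1.0 for a in intensity]
normalized = lowess_normalize(intensity, log_ratio, frac=0.5)
```

Because the toy drift is a pure trend, the normalized values come out at essentially zero. On real data the residuals left after subtracting the trend are the normalized log2 ratios.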

  36. Effects of Lowess Normalization on R-I Plot

  37. Global vs. Local Normalization • The print pins may introduce bias: one region of the array may have larger spots than another. • Problem: The variance of one region may differ from that of another region

  38. Variance Regularization • Assume that each subgrid has M elements, with the mean of the log2(ratio) values in each subgrid already adjusted to zero; the variance in the nth subgrid is then computed from those M values [equation not preserved] • If the number of subgrids in the array is Ngrids, an appropriate scaling factor ak is computed for the elements of the kth subgrid [equation not preserved] • All of the elements within the kth subgrid are scaled by dividing by the same value ak computed for that subgrid
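
The formulas for this slide are missing from the transcript, so the sketch below uses one self-consistent definition and labels it as an assumption: ak is taken to be the standard deviation of subgrid k divided by the geometric mean of all subgrid standard deviations. This may differ from the definition used in the lecture or in the suggested reading. Each subgrid is assumed to be mean-centered already and to have nonzero spread.

```python
import math

def regularize_variance(subgrids):
    """Rescale mean-centered log2 ratios so every subgrid has the same spread.

    subgrids: list of lists of log2 ratios, one list per subgrid.
    Returns (scaling factors a_k, rescaled subgrids).
    """
    # Standard deviation of each subgrid (the mean is assumed to be zero already).
    sds = [math.sqrt(sum(v * v for v in grid) / len(grid)) for grid in subgrids]
    # Geometric mean of the subgrid standard deviations.
    geo_mean = math.exp(sum(math.log(sd) for sd in sds) / len(sds))
    # Assumed definition of a_k: subgrid spread relative to the geometric mean.
    factors = [sd / geo_mean for sd in sds]
    scaled = [[v / a for v in grid] for grid, a in zip(subgrids, factors)]
    return factors, scaled

# Two toy subgrids with very different spreads.
factors, scaled = regularize_variance([[-1.0, 1.0], [-4.0, 4.0]])
```

After rescaling, both toy subgrids have the same spread, which is the stated goal of the slide: a subgrid printed by a noisier pin no longer dominates the overall variance.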

  39. Replicate Filtering • Technical replication in two-color spotted array analysis (dye-reversal or flip-dye analysis) consists of duplicating labeling and hybridization while swapping the fluorescent dyes used for each RNA sample. • It may help to compensate for any biases that occur during labeling or hybridization, for example if some genes preferentially label with the red or green dye.
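
One way to turn a dye-swap pair into a filter is sketched below. It assumes the consistency criterion that the two measured ratios for a gene should multiply to about 1 (so the log2 of the product is near zero) and that genes further than two standard deviations from the mean are treated as outliers. Both the criterion and the cutoff are assumptions for illustration, not values given on the slides.

```python
import math
import statistics

def dye_swap_filter(ratios_1, ratios_2, n_sd=2.0):
    """Return the indices of genes whose dye-swap replicates agree.

    ratios_1: R/G ratios from the first hybridization
    ratios_2: R/G ratios from the dye-swapped hybridization
    n_sd: cutoff in standard deviations (hypothetical default)
    """
    # For consistent replicates, T1 * T2 is about 1, so its log2 is about 0.
    products = [math.log2(t1 * t2) for t1, t2 in zip(ratios_1, ratios_2)]
    mean = statistics.mean(products)
    sd = statistics.stdev(products)
    return [i for i, p in enumerate(products) if abs(p - mean) <= n_sd * sd]

# Toy data: seven consistent genes and one inconsistent gene (the last).
ratios_1 = [2.0, 0.5, 1.0, 4.0, 2.0, 0.25, 1.0, 8.0]
ratios_2 = [0.5, 2.0, 1.0, 0.25, 0.5, 4.0, 1.0, 1.0]
kept = dye_swap_filter(ratios_1, ratios_2)
```

The last gene is reported as 8-fold up in the first hybridization but unchanged in the swap, so it falls outside the cutoff and is excluded.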

  40. Replicate Filtering Outliers excluded

  41. Lecture Outline • Image analysis • Data representation • Data Normalization • Normalization within slides • Scaled normalization • Linear regression normalization • Lowess Normalization • Global vs. Local normalization • Variance regularization • Replicate Filtering • Normalization between slides

  42. Normalization between Slides • Use scaled normalization • The median is generally the preferred statistic for this normalization
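
A sketch of scaled normalization between slides, assuming the median is the scaling statistic. On the log2 scale, dividing ratios by a slide's median ratio becomes subtracting the median log2 ratio, which is what the example does. The data layout and names are illustrative.

```python
import statistics

def median_center(slides):
    """Shift each slide's log2 ratios so that its median is zero.

    slides: list of lists of log2 ratios, one list per slide.
    """
    centered = []
    for log_ratios in slides:
        slide_median = statistics.median(log_ratios)
        centered.append([value - slide_median for value in log_ratios])
    return centered

# Two toy slides: the second is shifted upward relative to the first.
centered = median_center([[-1.0, 0.0, 1.0], [1.0, 2.0, 6.0]])
```

After centering, both slides have a median log2 ratio of zero, so values from different slides can be compared on a common scale.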

  43. Lecture Outline • Image analysis • Data representation • Data Normalization • Normalization within slides • Scaled normalization • Linear regression normalization • Lowess Normalization • Global vs. Local normalization • Variance regularization • Replicate Filtering • Normalization between slides

  44. Reading Assignments Suggested reading: • Quackenbush J. 2002. Microarray data normalization and transformation. Nature Genetics 32: 496-501. • Yang YH, Dudoit S, Luu P, Lin DM, Peng V, Ngai J, Speed TP. 2002. Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res. 30: e15.
