
Normalization of Microarray Data

Henrik Bengtsson (hb@maths.lth.se), Terry Speed (terry@stat.berkeley.edu)


Presentation Transcript


  1. Normalization of Microarray Data - how to do it! Henrik Bengtsson (hb@maths.lth.se), Terry Speed (terry@stat.berkeley.edu)

  2. Outline • The X Data Set • (R,G) → (M,A) Transformation • Background correction or not? • Within slide normalization • Across slide normalization • Identifying differentially expressed genes • The X2 Data Set

  3. The X Data Set • All slides are replicates and contain 5184 spots/genes. Three identical RNA preparations were done; (a) was hybridized to slides 1-3, (b) to slides 4-6, and (c) to slides 7-9. • All data were collected with the GenePix™ scanner and software. The following analysis was done using [R] and the sma library by the Terry Speed group.

  4. (R,G) → (M,A) Transformation “Observed” data $\{(R,G)\}_{n=1..5184}$: R = red channel signal, G = green channel signal (background corrected or not). Transformed data $\{(M,A)\}_{n=1..5184}$: $M = \log_2(R/G)$ (ratio), $A = \log_2 (R \cdot G)^{1/2} = \tfrac{1}{2}\log_2(R \cdot G)$ (intensity) $\Leftrightarrow$ $R = (2^{2A+M})^{1/2}$, $G = (2^{2A-M})^{1/2}$
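
The transformation and its inverse can be sketched in a few lines of Python. This is only an illustration of the formulas on this slide; the function names `rg_to_ma` and `ma_to_rg` are made up here and are not part of the sma library.

```python
import math

def rg_to_ma(r, g):
    """Map positive channel signals (R, G) to log-ratio M and log-intensity A."""
    m = math.log2(r / g)          # M = log2(R/G)
    a = 0.5 * math.log2(r * g)    # A = 1/2 * log2(R*G)
    return m, a

def ma_to_rg(m, a):
    """Inverse map: R = 2^((2A+M)/2), G = 2^((2A-M)/2)."""
    r = 2.0 ** ((2.0 * a + m) / 2.0)
    g = 2.0 ** ((2.0 * a - m) / 2.0)
    return r, g

# Illustrative values only: a spot four times brighter in red than in green.
m, a = rg_to_ma(2048.0, 512.0)
r, g = ma_to_rg(m, a)
```

A spot with equal red and green signal gives M = 0, so M measures the log fold change and A the overall brightness of the spot.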

  5. Background correction or not? Decision 1: No background correction

  6. Within Slide Normalization Question: What kind of normalization should be applied: • No normalization, or • Global (lowess) normalization, or • Print-tip normalization, or • Scaled print-tip normalization?

  7. No Normalization Non-normalized data $\{(M,A)\}_{n=1..5184}$: $M = \log_2(R/G)$

  8. Global (lowess) Normalization Global normalized data $\{(M,A)\}_{n=1..5184}$: $M_{\text{norm}} = M - c(A)$, where $c(A)$ is an intensity-dependent function.
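
A minimal sketch of the idea, assuming c(A) is estimated by a simplified lowess: a tricube-weighted local linear fit without the robustness iterations of the full algorithm. The names `lowess_curve` and `global_normalize` and the span `frac=0.4` are assumptions for illustration; this is not the sma implementation.

```python
import math

def lowess_curve(a, m, frac=0.4):
    """Estimate c(A) at every spot with a tricube-weighted local linear fit.

    Simplified lowess (no robustness iterations); O(n^2), fine for a sketch.
    """
    n = len(a)
    k = max(2, math.ceil(frac * n))              # number of neighbours in each window
    curve = []
    for a0 in a:
        # Bandwidth: distance to the k-th nearest spot along the A axis.
        h = sorted(abs(ai - a0) for ai in a)[k - 1] or 1e-12
        sw = sd = sdd = sy = sdy = 0.0
        for ai, mi in zip(a, m):
            d = ai - a0
            u = abs(d) / h
            if u >= 1.0:
                continue
            w = (1.0 - u ** 3) ** 3              # tricube weight
            sw += w
            sd += w * d
            sdd += w * d * d
            sy += w * mi
            sdy += w * d * mi
        denom = sw * sdd - sd * sd
        if denom > 1e-12:
            # Intercept of the weighted line, i.e. the fitted value at a0.
            curve.append((sy * sdd - sd * sdy) / denom)
        else:
            curve.append(sy / sw)                # fall back to a weighted mean
    return curve

def global_normalize(m, a, frac=0.4):
    """Return M_norm = M - c(A) for one slide."""
    return [mi - ci for mi, ci in zip(m, lowess_curve(a, m, frac))]

# Made-up data: a purely intensity-dependent dye bias, M = 0.1*A - 1.
a = [6.0 + 0.4 * i for i in range(21)]
m = [0.1 * ai - 1.0 for ai in a]
m_norm = global_normalize(m, a)
```

Because the made-up bias is exactly linear in A, the local fit recovers it and the normalized M values are zero up to rounding error.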

  9. Print-tip Normalization Print-tip normalized data $\{(M,A)\}_{n=1..5184}$: $M_{p,\text{norm}} = M_p - c_p(A)$; p = print tip (1-16), where $c_p(A)$ is an intensity-dependent function for print tip p. (Figure: print-tip layout)
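
The per-print-tip step amounts to grouping the spots by print tip and subtracting a separately fitted curve in each group. The sketch below takes the curve estimator as a function argument. To stay short, the demo passes a constant (per-tip median) curve as a stand-in for a lowess fit; that stand-in, the function names, and the data are all assumptions for illustration.

```python
from collections import defaultdict
from statistics import median

def print_tip_normalize(m, a, tip, fit):
    """Return M_p - c_p(A) for every spot, fitting c_p separately per print tip.

    `fit(a_values, m_values)` must return the fitted curve value for each spot.
    """
    groups = defaultdict(list)
    for i, p in enumerate(tip):
        groups[p].append(i)                      # spot indices for each print tip
    m_norm = [0.0] * len(m)
    for idx in groups.values():
        curve = fit([a[i] for i in idx], [m[i] for i in idx])
        for i, c in zip(idx, curve):
            m_norm[i] = m[i] - c
    return m_norm

def median_curve(a_values, m_values):
    """Stand-in for a lowess fit: a constant curve at the group median of M."""
    med = median(m_values)
    return [med] * len(m_values)

# Made-up data: two print tips with different offsets in M.
m = [0.5, 0.7, 0.6, -0.2, -0.4, -0.3]
a = [8.0, 9.0, 10.0, 8.0, 9.0, 10.0]
tip = [1, 1, 1, 2, 2, 2]
m_norm = print_tip_normalize(m, a, tip, median_curve)
```

After the call, each print-tip group is centred on zero; swapping `median_curve` for an intensity-dependent smoother gives the print-tip lowess variant described on this slide.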

  10. Scaled Print-tip Normalization Scaled print-tip normalized data $\{(M,A)\}_{n=1..5184}$: $M_{p,\text{norm}} = s_p \cdot (M_p - c_p(A))$; p = print tip (1-16), where $s_p$ is a scale factor for print tip p (Median Absolute Deviation). (Figure panels: after print-tip normalization; after scaled print-tip normalization)
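
The slide does not spell out how the scale factor is derived from the MAD. The sketch below assumes one common choice: s_p is the geometric mean of all print-tip MADs divided by the MAD of print tip p, so that every print tip ends up with the same spread. The function names and data are illustrative.

```python
import math
from collections import defaultdict
from statistics import median

def mad(values):
    """Median absolute deviation from the median (unscaled)."""
    med = median(values)
    return median([abs(v - med) for v in values])

def scale_print_tips(m_centered, tip):
    """Multiply centred M values by s_p, one scale factor per print tip.

    Assumption: s_p = geometric mean of all MADs / MAD of tip p.
    Requires every print tip to have a MAD greater than zero.
    """
    groups = defaultdict(list)
    for v, p in zip(m_centered, tip):
        groups[p].append(v)
    mads = {p: mad(vs) for p, vs in groups.items()}
    geo_mean = math.exp(sum(math.log(d) for d in mads.values()) / len(mads))
    scale = {p: geo_mean / d for p, d in mads.items()}
    return [scale[p] * v for v, p in zip(m_centered, tip)], scale

# Made-up centred M values: print tip 2 is four times more spread out than tip 1.
m_centered = [-0.2, -0.1, 0.0, 0.1, 0.2, -0.8, -0.4, 0.0, 0.4, 0.8]
tip = [1] * 5 + [2] * 5
m_scaled, scale = scale_print_tips(m_centered, tip)
```

With these values the two MADs are 0.1 and 0.4, so the tight print tip is stretched and the wide one is shrunk until both have the same MAD.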

  11. Spatial Effects (Figure panels: no normalization; global normalization; scaled print-tip normalization; print-tip normalization)

  12. Another Quick Example Scaled print-tip normalization:

  13. Within Slide Normalization Summary Question: What kind of normalization should be applied: • No normalization, or • Global (lowess) normalization, or • Print-tip normalization, or • Scaled print-tip normalization? Decision 2: Scaled print-tip normalization.

  14. Across Slides Normalization • Scaled print-tip normalization • Median Absolute Deviation (MAD) scaling • Averaging
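
A rough sketch of the last two steps, assuming each slide has already been normalized within the slide. The scaling convention (bring every slide to the geometric mean of the slide MADs) is an assumption, as are the function names and the data; the slides only state that MAD scaling is followed by averaging.

```python
import math
from statistics import median

def mad(values):
    """Median absolute deviation from the median (unscaled)."""
    med = median(values)
    return median([abs(v - med) for v in values])

def scale_and_average(slides):
    """MAD-scale each slide, then average M gene by gene across slides.

    `slides` is a list of equally long lists of M values, one list per slide.
    Assumption: each slide is rescaled to the geometric mean of the slide MADs.
    """
    mads = [mad(s) for s in slides]
    geo_mean = math.exp(sum(math.log(d) for d in mads) / len(mads))
    scaled = [[geo_mean / d * v for v in s] for s, d in zip(slides, mads)]
    # zip(*scaled) walks over genes, giving one tuple of replicates per gene.
    return [sum(reps) / len(reps) for reps in zip(*scaled)]

# Made-up example: two replicate slides, the second with four times the spread.
slides = [
    [-0.2, -0.1, 0.0, 0.1, 0.2],
    [-0.8, -0.4, 0.0, 0.4, 0.8],
]
m_avg = scale_and_average(slides)
```

The result plays the role of the "average" slide: one M value per gene, computed after the slides have been put on a common scale.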

  15. Average Over All Slides The “average” slide:

  16. Cutoff by M values Top 5% of the absolute M values (|M| > 0.56):
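
Selecting the top fraction by |M| is a ranking step. The sketch below returns the selected gene indices and the smallest |M| among them, which corresponds to the cutoff value quoted on the slide. The function name and the data are made up; the demo uses 10% of 20 genes so the result is easy to check by eye.

```python
def top_fraction_by_abs(values, fraction=0.05):
    """Return (sorted indices of the top `fraction` by absolute value, cutoff)."""
    k = max(1, int(round(fraction * len(values))))
    ranked = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    selected = ranked[:k]
    cutoff = abs(values[selected[-1]])           # smallest |value| still selected
    return sorted(selected), cutoff

# Made-up averaged M values for 20 genes.
m_avg = [0.02, -0.10, 0.75, 0.05, -0.31, 0.12, -0.04, 0.20, -0.90, 0.08,
         0.15, -0.22, 0.01, 0.33, -0.07, 0.11, -0.18, 0.09, 0.27, -0.03]
selected, cutoff = top_fraction_by_abs(m_avg, fraction=0.10)
```

For a slide with 5184 spots and the default 5%, the same function would keep 259 genes.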

  17. Cutoff by T values Top 5% of the absolute T values (|T|>8.6) s.t. SE(M) > 0.03:
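
The slides do not define T explicitly, but the table on slide 21 is consistent with T = Mavg / SE(M) (for gene 1, -2.26 / 0.125 ≈ -18.1 against a listed T of -18.0). The sketch below assumes that definition, takes SE(M) as the standard deviation across replicate slides divided by the square root of their number, and assumes the top fraction is counted against all genes. The names and data are illustrative.

```python
import math
from statistics import mean, stdev

def select_by_t(m_by_gene, fraction=0.05, min_se=0.03):
    """Rank genes by |T| after dropping genes whose SE(M) is at most `min_se`.

    `m_by_gene` holds one list of replicate M values per gene.
    Returns (gene index, Mavg, SE, T) tuples for the top `fraction` of genes.
    Assumptions: T = Mavg / SE and SE = sd / sqrt(number of replicates).
    """
    stats = []
    for gene, reps in enumerate(m_by_gene):
        avg = mean(reps)
        se = stdev(reps) / math.sqrt(len(reps))
        if se > min_se:                          # guard against tiny denominators
            stats.append((gene, avg, se, avg / se))
    k = max(1, int(round(fraction * len(m_by_gene))))
    stats.sort(key=lambda s: abs(s[3]), reverse=True)
    return stats[:k]

# Made-up replicate M values for four genes, three slides each.
m_by_gene = [
    [1.0, 1.2, 1.1],      # large, consistent change
    [0.50, 0.50, 0.51],   # tiny SE: huge T, removed by the SE filter
    [0.1, -0.2, 0.3],     # noisy, no clear change
    [-0.9, -0.5, -0.7],   # clear change, larger spread
]
selected = select_by_t(m_by_gene, fraction=0.25)
```

The second gene shows why the SE condition is there: without it, a very small standard error would put that gene at the top of the T ranking.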

  18. SE Cutoff Level In this data set, the number of genes found is insensitive to the SE cutoff level. About 1000 of the genes with the smallest SE can be cut off before the final results are affected.

  19. 103 Differentially Expressed Genes Top 5% of the absolute T values (|T|>8.6) s.t. SE(M) > 0.03, and top 5% of the absolute M values (|M|>0.56):
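
Combining the criteria means a gene must pass all three conditions at once. The sketch below uses the thresholds quoted on this slide as defaults. The first two example genes are rows from the table on slide 21; the other two are hypothetical values added to show a rejection, and the function name is made up.

```python
def is_differentially_expressed(m_avg, t, se, m_cut=0.56, t_cut=8.6, min_se=0.03):
    """True if a gene passes the |M| cutoff, the |T| cutoff and the SE condition."""
    return abs(m_avg) > m_cut and abs(t) > t_cut and se > min_se

# (Mavg, T, SE) per gene.
genes = {
    "gene 1 (slide 21)": (-2.26, -18.0, 0.125),
    "gene 13 (slide 21)": (1.12, 13.5, 0.083),
    "hypothetical A": (0.70, 4.0, 0.175),    # large |M| but |T| too small
    "hypothetical B": (0.20, 10.0, 0.020),   # large |T| from a tiny SE, small |M|
}
hits = [name for name, (m, t, se) in genes.items()
        if is_differentially_expressed(m, t, se)]
```

Only the two table rows remain in `hits`; each hypothetical gene fails one of the conditions.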

  20. Location of Differentially Expressed Genes (Figure: locations on the microarray, which is laid out as a 4x4 grid)

  21. 25 Differentially Expressed Genes Top 2% of the absolute T values (|T|>11) s.t. SE(M) > 0.03 and top 2% of the absolute M values (|M|>0.9):

      Gene   Mavg   Aavg      T     SE
         1  -2.26    9.9  -18.0  0.125
         2  -1.97   10.3  -14.5  0.136
         3  -1.50    9.6  -14.7  0.102
         4  -1.47    9.8  -12.2  0.121
         5  -1.40    9.3  -11.9  0.118
         6  -1.30    9.9  -14.4  0.090
         7  -1.29    9.7  -14.6  0.088
         8  -1.28   10.0  -12.7  0.101
         9  -1.27    9.2  -13.6  0.094
        10  -1.19   10.7  -13.7  0.087
        11  -1.18    9.8  -11.4  0.103
        12  -1.17    9.9  -20.7  0.057
        13   1.12   11.3   13.5  0.083
        14  -1.07   11.4  -13.3  0.080
        15  -1.05    9.6  -12.8  0.081
        16  -1.02    9.9  -12.0  0.085
        17  -1.01    9.3  -11.8  0.086
        18  -0.99   11.0  -13.6  0.073
        19  -0.99    9.8  -11.4  0.087
        20  -0.97   10.5  -13.8  0.070
        21  -0.96    9.6  -12.5  0.077
        22   0.95   11.5   11.6  0.082
        23  -0.94   10.3  -25.0  0.038
        24  -0.93    9.8  -13.5  0.068
        25  -0.90   11.6  -12.0  0.075

  22. The X2 Data Set All slides are replicates and contain 5184 spots/genes. Three identical RNA preparations were done; (a) was hybridized to slides 1 and 2, (b) to slides 3 and 4, and (c) to slides 5 and 6.

  23. 93 Differentially Expressed Genes Top 5% of the absolute T values (|T|>5.6) s.t. SE(M) > 0.03, and top 5% of the absolute M values (|M|>0.38):

  24. 25 Differentially Expressed Genes Top 2% of the absolute T values (|T|>7.1) s.t. SE(M) > 0.03 and top 2% of the absolute M values (|M|>0.53):

      Gene   Mavg   Aavg      T     SE
         1   1.97   12.5    8.3  0.237
         2   1.27    9.7   18.2  0.070
         3   1.23   13.2    7.5  0.164
         4   1.12   12.3   19.2  0.058
         5   0.93   14.2    7.7  0.122
         6   0.86   13.7   10.2  0.085
         7  -0.86   12.5   -8.1  0.106
         8  -0.85   13.0  -17.0  0.050
         9  -0.81   12.7  -16.3  0.050
        10  -0.75   11.1   -8.6  0.088
        11  -0.72   11.4  -11.4  0.063
        12  -0.71   13.9  -15.6  0.045
        13   0.66   10.0    9.4  0.071
        14   0.66   10.8    9.2  0.072
        15  -0.64   12.5  -15.2  0.042
        16   0.64    9.6    7.9  0.081
        17  -0.61   12.5   -7.5  0.081
        18  -0.60   12.8  -18.2  0.033
        19   0.59   11.4    8.3  0.071
        20  -0.59   13.7   -8.3  0.071
        21  -0.58   10.5   -7.2  0.081
        22  -0.56   12.0  -12.5  0.045
        23   0.55   11.7    9.1  0.061
        24  -0.54   12.6   -7.6  0.071
        25   0.53   11.2    9.5  0.056

  25. Acknowledgement • Thanks to: • Jean Yee Hwa Yang • [R] Software (free): • http://www.r-project.org/ • The Statistical Microarray Analysis (sma) library (free): • http://www.stat.berkeley.edu/users/terry/zarray/Software/smacode.html
