Statistical Analysis of cDNA Microarray Data: Challenges and Solutions

Statistical Analysis of cDNA Microarray Data: Challenges and Solutions. Toni Reverter, CSIRO – Livestock Industries. AAHL Seminar - 12 Dec. 2002.

Presentation Transcript


  1. Statistical Analysis of cDNA Microarray Data: Challenges and Solutions. Toni Reverter, CSIRO – Livestock Industries. AAHL Seminar - 12 Dec. 2002

  2. Challenges
     • Time Dependent; Data Dependent; Human Dependent
     • Human Dependent: Historical, Excitement, Balance, Interdisciplinary
     • Chronology: 1800s – DATA; 30-60s – METHODS; 50-70s – SOFTWARE; 1980s – COMPUTER
     • Skill Integration: Quantitative (Computer Sci., Statisticians, Mathematicians, …) and Non-Q (Biochemists, Physiologists, Pathologists, …); EGG, BANANA, “banana omelette”
     • [Other diagram labels: Paradigm, Logical, cDNA, Distribution, Source, Size]

  3. Challenges: Human Dependent: Historical (“Hysterical”)
     • Traditionally: Statistics grew alongside Agriculture
       • “Introduction to Statistical Analysis”
       • Law of Large Numbers
       • Central Limit Theorem
       • Pythagoras Theorem: SST = SSM + SSE [Figure labels: d, a, b]
     • Nowadays: Statistics alongside (Bio)Technology

  4. Challenges: Human Dependent: Excitement (source of)
     • E.g. Keren Byrne’s Data
     • E.g. “Always log spot intensities and ratios” (T. Speed, “Hints and Prejudices”)
     • Biochemist: My software does it, therefore it’s great!
     • Statistician: Well, I need further evidence to be convinced

  5. Challenges: Human Dependent: Balance
     • Too many Statisticians:
       Evidence: It takes 1 ship, 10 days to cross the ocean
       Question: How many days does it take for 10 ships to cross the ocean?
       Evidence: It takes 1 builder, 10 days to build a wall
       Question: How many days does it take for 10 builders to build a wall?

  6. Challenges: Human Dependent: Balance
     • Too many Statisticians:
       PHD SCHOLARSHIP, Statistical Science Program, MATHEMATICAL SCIENCES INSTITUTE, THE AUSTRALIAN NATIONAL UNIVERSITY
       Stipend $22,771 (2002 rate, indexed annually, tax free)
       A PhD Scholarship (APAI) is being offered by the Mathematical Sciences Institute at The ANU. An ARC Linkage Grant held by Professors Peter Hall (ANU) and Don Poskitt (Monash University), in conjunction with BAE Systems, Melbourne, will fund the scholarship. The research problem is in the area of stochastic control applied to ship motion, and involves the development and implementation of both parametric and nonparametric methods. The successful applicant will have a strong interest in statistical methodology, computational techniques, theoretical analysis, and the development of statistical research problems.

  7. Challenges: Human Dependent: Balance
     • Too many Biochemists:

     Pooled (women and men)
                      Treated? No   Treated? Yes
       Died? No           100           150
       Died? Yes          120           120
     Survival Rates: Treated = 150/270 = 55.55%; Non-Tr = 100/220 = 45.45%. 22% Increase!

     Women
                      Treated? No   Treated? Yes
       Died? No            60            30
       Died? Yes          100            60
     Survival Rates: Treated = 30/90 = 33.33%; Non-Tr = 60/160 = 37.50%. 12.5% Decrease!

     Men
                      Treated? No   Treated? Yes
       Died? No            40           120
       Died? Yes           20            60
     Survival Rates: Treated = 120/180 = 66.66%; Non-Tr = 40/60 = 66.66%. No Difference!
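
The reversal between the stratified and pooled tables can be checked directly from the counts on the slide. The following is a minimal Python sketch; the function and variable names are illustrative and not part of the talk.

```python
from fractions import Fraction

# Counts from the slide: (survived, died) for non-treated and treated patients.
women = {"non_treated": (60, 100), "treated": (30, 60)}
men = {"non_treated": (40, 20), "treated": (120, 60)}


def survival_rate(survived, died):
    """Fraction of patients who survived."""
    return Fraction(survived, survived + died)


def pool(*strata):
    """Add the (survived, died) counts of several strata, arm by arm."""
    return {
        arm: tuple(sum(s[arm][i] for s in strata) for i in range(2))
        for arm in ("non_treated", "treated")
    }


def rates(table):
    """Survival rate for each arm of one table."""
    return {arm: survival_rate(*counts) for arm, counts in table.items()}


for name, table in (("women", women), ("men", men), ("pooled", pool(women, men))):
    r = rates(table)
    print(f"{name:>6}: treated {float(r['treated']):.2%}, "
          f"non-treated {float(r['non_treated']):.2%}")
```

Treatment looks harmful or neutral within each sex but beneficial once the sexes are pooled, because most treated patients are men, who survive more often regardless of treatment.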

  8. Challenges: Human Dependent: Balance
     • Too many Biochemists:
     [Figure: scatter plot of y against x, annotated r = 0.87, r = 0.00 and r = 0.00]

  9. Challenges: Human Dependent: Interdisciplinary Skills
     Minimal knowledge of the application discipline is needed… failing that, the Statisticians will win… but with the wrong weapons.
     • Amount of Expression = Amount of Response
     • Same cut-off point to judge all genes
     • Over-emphasis on normalization (Thus, reject “Boutique Arrays”)
     • Over-emphasis on variance stabilization

  10. Challenges: Human Dependent: Interdisciplinary Skills
      Minimal knowledge of the application discipline is needed: “Animal Breeding & Genetics”
      Ex.1: What’s a Steer?
      Ex.2: Ralf Moser’s Data
      [Figure: scatter plot of Wt Gain, Kg against % Lung Disease]
      Options:
      1. % Gain vs. % Disease
      2. Medians instead of Means
      3. Regression coefficients

  11. Solutions
      O: Control (Untreated); A: Treatment A; B: Treatment B; AB: Both Treatments
      Model: $O = \mu$, $A = \mu + \alpha$, $B = \mu + \beta$, $AB = \mu + \alpha + \beta + \alpha\beta$
      The ratio: estimates $A - AB = -(\beta + \alpha\beta)$
      [Figure: Wt Gain, Kg against Disease, with points O, A, B, AB]
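
Written out step by step, the contrasts follow from the model by subtraction. The Greek symbols $\mu$, $\alpha$, $\beta$ and $\alpha\beta$ are an assumption here (the usual 2×2 factorial parameterisation), chosen to match the number of terms on each line of the slide's model.

```latex
\begin{align*}
A - O  &= (\mu + \alpha) - \mu = \alpha \\
B - O  &= (\mu + \beta) - \mu = \beta \\
A - AB &= (\mu + \alpha) - (\mu + \alpha + \beta + \alpha\beta) = -(\beta + \alpha\beta) \\
AB - A - B + O &= (\mu + \alpha + \beta + \alpha\beta) - (\mu + \alpha) - (\mu + \beta) + \mu = \alpha\beta
\end{align*}
```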

  12. Solutions
      [Figure: diagram with nodes labelled O, A, B, AB]

  13. Solutions: Variance of Estimated Effects (Relative to the All-Pairs)

                            Reference   Loop   All-Pairs
      Main effect of A          1        4/3       1
      Main effect of B          1         1        1
      Interaction AB            3        8/3       2
      Contrast A-B              2         1        1

      [Figure: three design diagrams on nodes O, A, B, AB, labelled Reference, Loop and All-Pairs]
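
As a rough check of the Reference figures, the sketch below assumes that design consists of three arrays, each comparing one treatment with the control O (A vs O, B vs O, AB vs O), with independent log ratios of equal variance. Under that assumption each effect is a linear combination of the three log ratios, and its variance, in units of the single-array variance, is the sum of the squared coefficients. The array layout and all names are illustrative assumptions, not taken from the slide.

```python
# Coefficients applied to the three observed log ratios
# (A - O, B - O, AB - O) to estimate each effect.
# ASSUMPTION: the Reference design is exactly these three arrays,
# with independent errors of equal variance.
ESTIMATORS = {
    "Main effect of A": (1, 0, 0),   # (A - O)
    "Main effect of B": (0, 1, 0),   # (B - O)
    "Interaction AB": (-1, -1, 1),   # (AB - O) - (A - O) - (B - O)
    "Contrast A-B": (1, -1, 0),      # (A - O) - (B - O)
}


def relative_variance(coefficients):
    """Variance of a linear combination of independent unit-variance terms."""
    return sum(c * c for c in coefficients)


variances = {effect: relative_variance(c) for effect, c in ESTIMATORS.items()}
for effect, value in variances.items():
    print(f"{effect}: {value}")
```

Under these assumptions the values agree with the Reference entries (1, 1, 3, 2). Designs with more arrays than parameters would typically need a least-squares fit rather than this direct substitution.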

  14. Solutions: Probability of both Female?
      Case 1. No Information ………………………… 1/4
      Case 2. The one on the left is female ………… 1/2
      Case 3. One of them is female ………………… 1/3
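
The three answers can be reproduced by counting equally likely outcomes. This is a minimal Python sketch, assuming two animals whose sexes are independent and equally likely to be female or male; the names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space for two animals (left, right).
# ASSUMPTION: each is female ("F") or male ("M") with probability 1/2,
# independently of the other.
OUTCOMES = list(product("FM", repeat=2))


def prob_both_female(condition):
    """P(both female | condition), by counting equally likely outcomes."""
    kept = [o for o in OUTCOMES if condition(o)]
    both = [o for o in kept if o == ("F", "F")]
    return Fraction(len(both), len(kept))


case1 = prob_both_female(lambda o: True)          # no information
case2 = prob_both_female(lambda o: o[0] == "F")   # the one on the left is female
case3 = prob_both_female(lambda o: "F" in o)      # at least one of them is female
print(case1, case2, case3)
```

The answer changes with the information supplied because each statement removes a different set of outcomes from the sample space.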

  15. Solutions
      3 Equations > 35,000 Equations!

  16. Solutions: Clever Programming, Tailored to your needs

      N=1
      for filename in R16T0S1.gpr R16T0S2.gpr R16T24S1.gpr R16T24S2.gpr \
                      S32T0S1.gpr S32T0S2.gpr S32T24S1.gpr S32T24S2.gpr
      do
        # Get valid readings, compute log ratios
        awk 'NR>30 && $NF>=0 && $4!="no_spot" && \
             substr($4,1,5)!="score" && substr($4,1,5)!="custo" && \
             substr($4,1,6)!="spotre" && $9>$12 && $18>$21 \
             {print $4, $9-$12, $18-$21, \
              log($9-$12)/log(2.0), log($18-$21)/log(2.0)}' \
            "$filename" | sort > junk1
        # Append the log ratio (field 6) and the mean log intensity (field 7)
        awk '$2!=$3 {print $0, $4-$5, 0.5*($4+$5)}' junk1 > junk2
        # Get the median of log ratios: middle record after a numeric sort on field 6
        REC=$(wc -l junk2 | awk '{print int(($1+1)/2)}')
        MED=$(sort -k6,6n junk2 | awk -v rec="$REC" 'NR==rec {print $6}')
        echo "Median of file $filename = $MED"
        # Global normalization: subtract the median from each log ratio
        awk -v median="$MED" -v slide="$N" \
            '{print "Slide_"slide, int(slide/2+.5), $1, $6-median}' junk2 | \
            sort -k3 > "dat.$N"
        N=$((N + 1))
      done
      cat dat.1 dat.2 dat.3 dat.4 dat.5 dat.6 dat.7 dat.8 > total.dat
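
To see the normalization step in isolation, here is a minimal Python sketch of the same idea: compute log2 ratios from background-corrected intensities, then subtract the slide median from each one. The function names and the example intensities are hypothetical. The script above picks the median by rank in the sorted file, whereas this sketch uses `statistics.median`, which averages the two middle values when the count is even.

```python
import math
from statistics import median


def log2_ratios(spots):
    """log2(R/G) per spot from background-corrected intensities.

    `spots` is a list of (spot_id, red, green) tuples. Spots with a
    non-positive corrected intensity are skipped, which corresponds to
    requiring foreground > background in both channels.
    """
    return {
        spot_id: math.log2(red / green)
        for spot_id, red, green in spots
        if red > 0 and green > 0
    }


def median_normalize(ratios):
    """Global normalization: subtract the slide median from each log ratio."""
    centre = median(ratios.values())
    return {spot_id: value - centre for spot_id, value in ratios.items()}


# Hypothetical intensities, for illustration only.
example = [("g1", 800.0, 200.0), ("g2", 400.0, 200.0), ("g3", 100.0, 200.0)]
normalized = median_normalize(log2_ratios(example))
print(normalized)
```

After this step the median log ratio of each slide is zero, so slides can be compared on a common scale.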

  17. Solutions: Clever Programming, Tailored to your needs
      • Your Needs: “Important values are…”
        • Away from (0,0)
        • In quadrants 1 and 4.
      Interaction: generate a new variable:
        +1.0*[(R24-R0)+(S0-S24)]  if R0<R24 & S0>S24
        +0.5*[(R24-R0)+(S24-S0)]  if R0<R24 & S0<S24
        -0.5*[(R0-R24)+(S0-S24)]  if R0>R24 & S0>S24
        -1.0*[(R0-R24)+(S24-S0)]  if R0>R24 & S0<S24
      …then apply model-based clustering.
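
The four-case definition of the new variable translates directly into a small function. This is a minimal Python sketch; the function name is illustrative, and returning 0.0 when either pair of values is tied is an assumption, since the slide does not define that case.

```python
def interaction_score(r0, r24, s0, s24):
    """New variable for one gene, following the four cases on the slide.

    The arguments correspond to R0, R24, S0 and S24 on the slide.
    """
    if r0 < r24 and s0 > s24:
        return +1.0 * ((r24 - r0) + (s0 - s24))
    if r0 < r24 and s0 < s24:
        return +0.5 * ((r24 - r0) + (s24 - s0))
    if r0 > r24 and s0 > s24:
        return -0.5 * ((r0 - r24) + (s0 - s24))
    if r0 > r24 and s0 < s24:
        return -1.0 * ((r0 - r24) + (s24 - s0))
    # ASSUMPTION: ties (r0 == r24 or s0 == s24) are not covered by the
    # slide; they are mapped to 0.0 here.
    return 0.0


print(interaction_score(1.0, 3.0, 4.0, 2.0))
```

In every case the magnitude is the sum of the absolute changes in R and S; the weight is ±1.0 when R and S move in opposite directions and ±0.5 when they move in the same direction, with the sign set by the direction of R.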

  18. Solutions: Clever Programming, Tailored to your needs

  19. Solutions: Clever Programming, Tailored to your needs
      Get to know/use all the available options:
      1. t-Statistics: Standard; Penalised
      2. Clustering: Location-Based (k-Means, …); Model-Based (Mixtures of Distributions)
      3. ANOVA (Linear Models)
      [Figure labels: High, Medium, Low; Keren’s, Ralf’s]

  20. Conclusions: Statistical Analysis of cDNA Microarray Data
      • GENERAL:
        • Still in its infancy (…possibly even embryonic stage)
        • Many decisions have a heuristic rather than a theoretical foundation
        • No hope for a “One size fits all” software
        • Safer to aim towards “Tailor to one’s needs”
        • Integration of interdisciplinary skills is a must
      • LIVESTOCK SPECIES:
        • Tailing humans (…at the moment)
        • Strong background knowledge of genetics accumulated
        • Journals will soon be inundated
        • CLI has the opportunity to participate
