Classifying defense of responses to Phytophthora infestans using a statistical approach

gilliamj

Presentation Transcript


  1. Classifying defense of responses to Phytophthora infestans using a statistical approach

  2. Phytophthora infestans • Oomycete (fungus-like) pathogen of tomato, potato and a few other plant species in the Solanaceae • Causes devastating disease worldwide • Increasingly difficult to control with common cultural practices, including resistant varieties • Has a large number of nonhost plant species, on which disease symptoms are not expressed

  3. Nonhost resistance to Phytophthora infestans. What is nonhost resistance? • Resistance of all members of a plant species to all isolates or strains of a plant pathogen • Also referred to as 'resistance at the species level' • Arabidopsis thaliana is a nonhost of P. infestans

  4. One of our lab interests... Nonhost resistance of Arabidopsis thaliana to the oomycete pathogen Phytophthora infestans The basis of nonhost resistance • Elicitors of P. infestans in Arabidopsis • Test defense pathways in Arabidopsis involved in nonhost resistance • Identification of novel defense pathways in Arabidopsis

  5. Arabidopsis: a source of nonhost resistance genes to Phytophthora? • Genome sequence available since 2000 • Powerful mutagenesis tools • Large number of characterized mutants available • Wide variety of functional and high-throughput tools available

  6. Arabidopsis mounts an active defense response during Phytophthora infection. Arabidopsis cells display a hypersensitive response (HR) upon P. infestans infection, as shown by cytoplasmic granulation (A) and the accumulation of secondary metabolites (B)

  7. Hybridization-based gene expression analysis techniques • 'Classical': • Northern blot analysis • Macroarray (reverse Northern) • 'Advanced': • Microarrays (DNA immobilized on glass slides) • GeneChips or oligonucleotide arrays

  8. Differential gene expression analysis. Immobilized DNA targets, representing genes, allow gene expression analysis through hybridization. (Figure: one hybridization with a cDNA probe from lesion tissue, one with a cDNA probe from mycelium.)
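A common way to summarize such a two-probe comparison is a per-gene log ratio of the two signals. The sketch below assumes background-corrected signals stored in dictionaries keyed by gene ID; the function name `log2_ratios`, the `offset` of 1.0 that avoids log(0), and the signal values are all hypothetical, not taken from the presentation.

```python
import math


def log2_ratios(reference, test, offset=1.0):
    """Per-gene log2((test + offset) / (reference + offset)).

    reference, test: dicts mapping a gene/spot ID to its hybridization signal.
    offset: hypothetical pseudo-count so that zero signals do not break the log.
    Positive values mean a stronger signal with the test probe.
    """
    return {
        gene: math.log2((test[gene] + offset) / (reference[gene] + offset))
        for gene in reference
        if gene in test
    }


# Hypothetical signals for three spots
mycelium = {"geneA": 99.0, "geneB": 499.0, "geneC": 31.0}
lesion = {"geneA": 399.0, "geneB": 499.0, "geneC": 7.0}
ratios = log2_ratios(mycelium, lesion)
print(ratios)  # geneA: 2.0 (4-fold higher in lesion), geneB: 0.0, geneC: -2.0
```

A log scale makes 4-fold increases and 4-fold decreases symmetric around zero, which is easier to threshold than raw ratios.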

  9. Hybridization-based gene expression analysis techniques (repeat of slide 7)

  10. DNA microarrays, a high-throughput version of the reverse Northern. Pseudocolored image of a microarray slide after hybridization with probes obtained from two experimental conditions

  11. Hybridization-based gene expression analysis techniques (repeat of slide 7)

  12. Affymetrix GeneChip technology: an advanced version of the microarray. A set of 16 oligos (25-mers) is designed for each distinct gene and synthesized at very high density on a solid surface. From: Schadt et al., Journal of Cellular Biochemistry (2000)

  13. Arabidopsis whole-genome chip: 26,413 genes on a chip! The Affymetrix chip was used to assay gene induction across four experimental stages/conditions: 1: mock inoculated, 16 h; 2: mock inoculated, 40 h; 3: P. infestans, 16 h; 4: P. infestans, 40 h. From: http://www.tmri.org/gene_exp_web/index.html. In total, 105,652 data points were generated (26,413 genes × 4 conditions).

  14. Project question: how can we infer differential expression of a gene with a high level of confidence? Potential and/or available methods: • Cluster analysis • Analysis of variance (ANOVA) • Regression analysis

  15. Cluster analysis (hierarchical) • Procedure: • A pairwise similarity matrix is created • The gene pair with the highest similarity score is joined • Nodes are formed iteratively based on similarity • A tree is drawn displaying relatedness. Although visual and intuitive, this approach makes some assumptions about experimental procedures and artifacts. From: Eisen et al. (1998) PNAS Vol. 95, pp. 14863–14868
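The procedure above can be sketched as agglomerative clustering. This minimal version assumes Pearson correlation as the similarity score and average linkage between clusters (one common variant; it is not claimed to reproduce the Eisen et al. software exactly). The function names and the four-condition profiles are hypothetical, and every profile must vary across conditions (a constant profile has no defined correlation).

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage(profiles):
    """Return the merge history as (cluster_a, cluster_b, similarity) tuples.

    profiles: dict gene -> list of expression values across conditions.
    Clusters are tuples of gene names; the most similar pair is joined first.
    """
    # Similarity matrix between individual genes
    sim = {}
    for g1, g2 in combinations(profiles, 2):
        s = pearson(profiles[g1], profiles[g2])
        sim[(g1, g2)] = sim[(g2, g1)] = s

    def cluster_sim(c1, c2):
        # Average of all gene-to-gene similarities between two clusters
        return sum(sim[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [(g,) for g in profiles]
    merges = []
    while len(clusters) > 1:
        c1, c2 = max(combinations(clusters, 2), key=lambda pair: cluster_sim(*pair))
        merges.append((c1, c2, cluster_sim(c1, c2)))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)  # new node joining the two clusters
    return merges


# Hypothetical profiles: mock 16 h, mock 40 h, P. infestans 16 h, P. infestans 40 h
profiles = {
    "g1": [1.0, 1.0, 5.0, 8.0],
    "g2": [2.0, 2.0, 6.0, 9.0],
    "g3": [8.0, 7.0, 1.0, 1.0],
    "g4": [9.0, 8.0, 2.0, 1.0],
}
merges = average_linkage(profiles)
for a, b, s in merges:
    print(a, b, round(s, 3))
```

The merge history is what a dendrogram draws: induced genes g1 and g2 join first, then suppressed genes g3 and g4, and the two groups join last with a negative average similarity.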

  16. ANOVA: a robust method of partitioning variance among experimental components. This approach makes assumptions about the dataset: • Normally distributed data: • Univariate analysis tests whether the data are normally distributed • This analysis can be done in SAS using PROC UNIVARIATE
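Two quick numerical checks of normality are skewness (0 for a symmetric distribution) and excess kurtosis (0 for a normal distribution). The sketch below uses the plain moment formulas; for small samples these differ slightly from the small-sample-adjusted values PROC UNIVARIATE prints, and the function name and data are hypothetical.

```python
def moment_skew_kurtosis(values):
    """Moment-based skewness and excess kurtosis of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skew, excess_kurtosis


sym_skew, sym_kurt = moment_skew_kurtosis([1, 2, 3, 4, 5])  # symmetric: skew 0
right_skew, right_kurt = moment_skew_kurtosis([1, 1, 1, 2, 10])  # long right tail
print(sym_skew, sym_kurt)
print(right_skew, right_kurt)
```

A clearly positive skewness, as in the second call, is the pattern that motivates the log transformation later in the talk.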

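  17. PROC UNIVARIATE SAS code:
      data one;
        infile 'c:\edgar analysis\realdeal.csv' delimiter = ',' firstobs = 5;
        length gene $ 20;  /* default is 8 characters; assumed long enough for gene IDs */
        input gene $ H2O_1 H2O_2 Inf_1 Inf_2;
        /* stack the four columns: one output row per gene per chip */
        lncor = H2O_1; Exp = 1; chip = 1; trt = 1; output;
        lncor = H2O_2; Exp = 2; chip = 3; trt = 1; output;
        lncor = Inf_1; Exp = 1; chip = 2; trt = 2; output;
        lncor = Inf_2; Exp = 2; chip = 4; trt = 2; output;  /* "blot = 4" left chip = 2 */
      run;
      /* H2O_1 etc. repeat 4 times per gene in ONE, so analyze lncor per chip */
      proc univariate data = one normal plot;
        class chip;
        var lncor;
      run;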
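The DATA step above reshapes a "wide" file (one row per gene, four signal columns) into a "long" one (one row per gene per chip). A Python sketch of the same reshaping, for readers checking the logic outside SAS; the function name, the sample text and the gene IDs are hypothetical, and the column-to-(Exp, chip, trt) mapping is copied from the SAS code.

```python
import csv

# Column -> (Exp, chip, trt), mirroring the assignments in the DATA step
LAYOUT = {
    "H2O_1": (1, 1, 1),
    "H2O_2": (2, 3, 1),
    "Inf_1": (1, 2, 2),
    "Inf_2": (2, 4, 2),
}


def wide_to_long(csv_text, skip_lines=4):
    """Stack the four signal columns into one row per (gene, chip).

    skip_lines=4 plays the role of FIRSTOBS=5 (data start on line 5).
    """
    lines = csv_text.splitlines()[skip_lines:]
    rows = []
    for record in csv.reader(lines):
        gene, values = record[0], record[1:5]
        for column, value in zip(LAYOUT, values):
            exp, chip, trt = LAYOUT[column]
            rows.append({"gene": gene, "lncor": float(value),
                         "exp": exp, "chip": chip, "trt": trt})
    return rows


# Hypothetical file: four header lines, then gene, H2O_1, H2O_2, Inf_1, Inf_2
sample = "h1\nh2\nh3\nh4\nAT1G01010,10,12,30,28\nAT1G01020,5,6,5,7\n"
rows = wide_to_long(sample)
for r in rows[:4]:
    print(r)
```

Keeping the full gene ID as a Python string also sidesteps the 8-character truncation that a default SAS `$` informat would apply.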

  18. PROC UNIVARIATE SAS code (repeat of slide 17)

  19. PROC UNIVARIATE

  20. PROC UNIVARIATE on transformed data • Distribution is skewed (not normally distributed) • Log-transform the data points and re-run PROC UNIVARIATE. Reading in the file and transforming the data:
      input gene $ H2O_1 H2O_2 Inf_1 Inf_2;
      /* natural log; adding 1 avoids log(0) */
      lnH2O_1 = log(H2O_1 + 1);
      lnH2O_2 = log(H2O_2 + 1);
      lnInf_1 = log(Inf_1 + 1);
      lnInf_2 = log(Inf_2 + 1);
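Why the log helps: signals spread over several orders of magnitude give a long right tail, and log(x + 1) compresses the large values. The sketch below applies the same natural-log transformation to a hypothetical, deliberately right-skewed set of signals and compares moment-based skewness before and after; the values are chosen for illustration only.

```python
import math


def skewness(values):
    """Moment-based skewness (0 for a symmetric set of values)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def log1p_transform(values):
    # Natural log of (x + 1), as in the SAS statements lnH2O_1 = log(H2O_1 + 1)
    return [math.log(v + 1) for v in values]


signals = [3, 7, 15, 31, 63, 127, 255, 511, 1023]  # hypothetical, right-skewed
raw_skew = skewness(signals)
log_skew = skewness(log1p_transform(signals))
print(round(raw_skew, 2), round(log_skew, 6))
```

Here each signal plus one is a power of two, so the transformed values are evenly spaced and their skewness is essentially zero; real data will not be this tidy.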

  21. PROC UNIVARIATE on transformed data. Run PROC UNIVARIATE:
      lncor = lnH2O_1; Exp = 1; chip = 1; trt = 1; output;
      lncor = lnH2O_2; Exp = 2; chip = 3; trt = 1; output;
      lncor = lnInf_1; Exp = 1; chip = 2; trt = 2; output;
      lncor = lnInf_2; Exp = 2; chip = 4; trt = 2; output;  /* "blot = 4" left chip = 2 */
      run;
      /* analyze each log value once, per chip */
      proc univariate data = one normal plot;
        class chip;
        var lncor;
      run;

  22. PROC UNIVARIATE on transformed data Results:

  23. PROC UNIVARIATE on transformed data. Results: the normal probability plot is closer to a straight line, i.e. the transformed data are closer to a normal distribution
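A normal probability plot pairs each sorted value with the normal quantile expected at its rank; a straight line means approximately normal data. This sketch uses Blom's plotting positions (i − 3/8)/(n + 1/4), one common choice that is not necessarily the exact formula behind PROC UNIVARIATE's PLOT output, and summarizes straightness with the correlation of the plotted points. Function names and data are hypothetical.

```python
from statistics import NormalDist


def normal_plot_points(values):
    """(expected normal quantile, ordered value) pairs using Blom positions."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    return [(std_normal.inv_cdf((i - 0.375) / (n + 0.25)), v)
            for i, v in enumerate(ordered, start=1)]


def plot_correlation(points):
    """Pearson r of the plotted points; values near 1 mean a nearly straight line."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    n = len(points)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


signals = [3, 7, 15, 31, 63, 127, 255, 511, 1023]  # hypothetical raw signals
log_signals = [NormalDist().mean + __import__("math").log(v + 1) for v in signals]
r_raw = plot_correlation(normal_plot_points(signals))
r_log = plot_correlation(normal_plot_points(log_signals))
print(round(r_raw, 3), round(r_log, 3))
```

For these values the raw data bend away from the line (r around 0.84) while the log(x + 1) values lie almost on it (r above 0.99), mirroring the improvement reported on the slide.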

  24. Log-transformed data has better properties. The transformed data can now be used to test our model in ANOVA:
      input gene $ H2O_1 H2O_2 Inf_1 Inf_2;
      lnH2O_1 = log(H2O_1 + 1); lnH2O_2 = log(H2O_2 + 1);
      lnInf_1 = log(Inf_1 + 1); lnInf_2 = log(Inf_2 + 1);
      lncor = lnH2O_1; Exp = 1; chip = 1; trt = 1; output;
      lncor = lnH2O_2; Exp = 2; chip = 3; trt = 1; output;
      lncor = lnInf_1; Exp = 1; chip = 2; trt = 2; output;
      lncor = lnInf_2; Exp = 2; chip = 4; trt = 2; output;  /* "blot = 4" left chip = 2 */
      run;
      proc anova data = one;
        class gene exp chip trt;
        model lncor = gene chip gene*trt chip*trt;
        means gene*trt / lines;
      run;

  25. ANOVA procedure. Model: Y = µ + G + C + G*T + C*T + E, where G = gene, C = chip, T = treatment and E = residual error, fitted with the same SAS code as slide 24. Right???
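The same model written with explicit indices. The notation is ours, not the author's: i indexes genes, j chips and k treatments, and y is the log-transformed signal (lncor).

```latex
y_{ijk} = \mu + G_i + C_j + (GT)_{ik} + (CT)_{jk} + \varepsilon_{ijk},
\qquad i = 1,\dots,26\,413,\quad j = 1,\dots,4,\quad k = 1,2
```

One point worth checking against the DATA step: each chip received only one treatment (chips 1 and 3 mock, chips 2 and 4 infected), so the (CT) cells coincide with the four chip levels and the C*T term cannot add degrees of freedom beyond C.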

  26. ANOVA procedure problem: working with such a large dataset, SAS appeared to have a matrix size limit allowing ‘only’ 32,767 df for the analysis, and the project’s dataset exceeded this roughly 3-fold. SAS can be made to analyze random subsets of the data using a string or bootstrapping approach; however, the code for that still needs to be developed (in progress). For the purposes of this class, the linear regression approach was taken to select genes that are induced or suppressed.
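The random-subset idea can be sketched as follows, under the simplifying assumption (ours) that the 32,767 limit can be treated as a cap on observations per subset; the real SAS constraint may apply to a different quantity. Genes, not individual observations, are shuffled so that each gene's four chip values stay together. This is a random partition, not a bootstrap (which would resample with replacement), and both function names are hypothetical.

```python
import random


def subsets_needed(n_observations, limit=32_767):
    """Smallest number of subsets so that each holds at most `limit` observations."""
    return -(-n_observations // limit)  # ceiling division


def random_gene_subsets(genes, n_subsets, seed=0):
    """Randomly partition gene IDs into n_subsets groups of near-equal size."""
    shuffled = list(genes)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    return [shuffled[i::n_subsets] for i in range(n_subsets)]


n_genes, obs_per_gene = 26_413, 4
n_subsets = subsets_needed(n_genes * obs_per_gene)
subsets = random_gene_subsets(range(n_genes), n_subsets)
print(n_subsets, [len(s) * obs_per_gene for s in subsets])
```

Each subset could then be analyzed separately, with results pooled afterwards; how to pool them is the part the slide describes as still in progress.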

  27. Linear regression: procedure • Treat H2O_1 and H2O_2 as replicates, and likewise Inf_1 and Inf_2 • Use the replicates to calculate each gene's mean for every treatment • Draw a scatterplot of the treatment means and fit a line that assumes a linear relationship
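The first steps of this procedure, sketched in Python: average the two replicates per gene and treatment, then fit an ordinary least-squares line of infected means (t2) on mock means (t1). Row layout, function names and values are hypothetical; the numbers are chosen so the fitted line is exactly t2 = 1 + 2·t1.

```python
def treatment_means(rows):
    """Mean lncor per (gene, trt) from long-format rows."""
    sums, counts = {}, {}
    for r in rows:
        key = (r["gene"], r["trt"])
        sums[key] = sums.get(key, 0.0) + r["lncor"]
        counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


def fit_line(xs, ys):
    """Ordinary least squares; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical (gene, trt, lncor) values: two replicates per gene and treatment
data = [("g1", 1, 1.0), ("g1", 1, 3.0), ("g1", 2, 5.0), ("g1", 2, 5.0),
        ("g2", 1, 4.0), ("g2", 1, 4.0), ("g2", 2, 9.0), ("g2", 2, 9.0),
        ("g3", 1, 5.0), ("g3", 1, 7.0), ("g3", 2, 13.0), ("g3", 2, 13.0)]
rows = [{"gene": g, "trt": t, "lncor": v} for g, t, v in data]

means = treatment_means(rows)
genes = sorted({g for g, _ in means})
t1 = [means[(g, 1)] for g in genes]  # mock means (x axis)
t2 = [means[(g, 2)] for g in genes]  # infected means (y axis)
intercept, slope = fit_line(t1, t2)
print(intercept, slope)
```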

  28. Linear regression, visualized. Example graph obtained from: http://www.tmri.org/gene_exp_web/index.html • Draw lines marking 99% confidence limits for individual predicted values (alpha = 0.01) • Select data points that fall outside these limits
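The selection step can be sketched as follows. For each gene, the limits for an individual predicted value are ŷ ± q · s · √(1 + 1/n + (x − x̄)²/Sxx), where s is the root mean squared error. SAS uses a t quantile with n − 2 df; this standard-library sketch substitutes the normal quantile, an approximation that is very close with roughly 26,000 genes. Points above the band are treated as induced and points below as suppressed. Names and data are hypothetical.

```python
import math
from statistics import NormalDist


def flag_outside_prediction_band(xs, ys, alpha=0.01):
    """Return (induced, suppressed) index lists for points outside the band."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))  # root mean squared error
    q = NormalDist().inv_cdf(1 - alpha / 2)  # normal stand-in for the t quantile
    induced, suppressed = [], []
    for i, (x, y) in enumerate(zip(xs, ys)):
        fitted = intercept + slope * x
        half_width = q * s * math.sqrt(1 + 1 / n + (x - mx) ** 2 / sxx)
        if y > fitted + half_width:
            induced.append(i)
        elif y < fitted - half_width:
            suppressed.append(i)
    return induced, suppressed


# Hypothetical data: y close to x, with one strongly "induced" point at index 100
xs = [float(i) for i in range(200)]
ys = [x + (0.1 if i % 2 == 0 else -0.1) for i, x in enumerate(xs)]
ys[100] += 10.0
induced, suppressed = flag_outside_prediction_band(xs, ys)
print(induced, suppressed)
```

Limits for individual values (rather than for the mean line) are the appropriate choice here, because each gene is a single new observation judged against the overall trend.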

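  29. Linear regression: SAS code:
      proc sort data = one; by gene; run;
      /* mean of the two mock (trt = 1) values per gene */
      proc means noprint data = one; where trt = 1; by gene; var lncor;
        output out = T1means mean = t1;
      run;
      /* mean of the two infected (trt = 2) values per gene */
      proc means noprint data = one; where trt = 2; by gene; var lncor;
        output out = T2means mean = t2;
      run;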

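  30. Linear regression: SAS code (cont'd):
      data both; merge T1means T2means; by gene; run;
      /* alpha = 0.01 also set on PROC REG so the OUTPUT limits are 99% as well */
      proc reg data = both alpha = 0.01;
        model t2 = t1 / cli alpha = 0.01;
        output out = two lcl = lowercl ucl = uppercl;
      run;
      quit;
      /* flag genes whose infected mean falls outside the limits */
      data two; set two;
        marker = 0;
        if t2 gt uppercl or t2 lt lowercl then marker = 1;
      run;
      proc print data = two; where marker eq 1; run;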

  31. Linear regression: non-transformed vs. transformed data. Performance of linear regression was assessed for both transformed and non-transformed data; regression analysis appears more sensitive when log-transformed data are used

  32. Linear regression: advantages and disadvantages • Advantages: • Computationally straightforward • Efficient tool for selecting outliers. Linear regression may be efficient and less computationally intensive than ANOVA, but some caution is needed. • Disadvantage: detected outliers represent observed values; their contribution to the difference cannot be assigned to an experimental variable per se

  33. Acknowledgments for this project: Torrey Mesa Research Institute (TMRI) • Dr. David Francis • Dr. Bert Bishop • Dr. Sophien Kamoun
