Classifying defense responses to Phytophthora infestans using a statistical approach
Phytophthora infestans • Oomycete (fungal-like) pathogen of tomato, potato and a few other plant species within the Solanaceae • Causes devastating disease worldwide • Increasingly difficult to control with common cultural practices, including resistant varieties • Has a great number of nonhosts, on which disease symptoms are not expressed
Nonhost resistance to Phytophthora infestans. What is nonhost resistance? • Resistance of all members of a plant species to all isolates or strains of a plant pathogen • Also referred to as 'resistance at the species level' • Arabidopsis thaliana is a nonhost of P. infestans
One of our lab's interests: nonhost resistance of Arabidopsis thaliana to the oomycete pathogen Phytophthora infestans. The basis of nonhost resistance: • Elicitors of P. infestans in Arabidopsis • Testing Arabidopsis defense pathways involved in nonhost resistance • Identifying novel defense pathways in Arabidopsis
Arabidopsis: a source of nonhost resistance genes to Phytophthora? • Genome sequence available since 2000 • Powerful mutagenesis tools • Large number of characterized mutants available • A wide variety of functional and high-throughput tools available
Arabidopsis mounts an active defense response during Phytophthora infection: Arabidopsis cells display a hypersensitive response (HR) upon P. infestans infection, as seen from cytoplasmic granulation (A) and the accumulation of secondary metabolites (B)
Hybridization-based gene expression analysis techniques • 'Classical': • Northern blot analysis • Macroarray (reverse Northern) • 'Advanced': • Microarrays (DNA immobilized on glass slides) • GeneChips or oligonucleotide arrays
Differential gene expression analysis: immobilized DNA targets representing genes allow gene expression analysis through hybridization. Figure panels: probe = cDNA from lesion; probe = cDNA from mycelium
DNA microarrays: a high-throughput version of the reverse Northern. Pseudo-color image of a microarray slide after hybridization with probes obtained from two experimental conditions
Affymetrix GeneChip technology: an advanced version of the microarray. A set of 16 oligonucleotides (25-mers) is designed for each distinct gene and synthesized at very high density on a solid surface. From: Schadt et al., Journal of Cellular Biochemistry (2000)
Arabidopsis whole-genome chip: 26,413 genes on a chip! The Affymetrix chip was used to assay gene induction across four experimental stages/conditions: 1: mock-inoculated, 16 h; 2: mock-inoculated, 40 h; 3: P. infestans, 16 h; 4: P. infestans, 40 h. From: http://www.tmri.org/gene_exp_web/index.html. 105,652 data points (26,413 genes × 4 conditions) were generated...
Project question: how can we infer differential expression of a gene with a high level of confidence? Potential and available methods: • Cluster analysis • Analysis of variance (ANOVA) • Regression analysis
Cluster analysis (hierarchical) • Procedure: • A similarity matrix is computed for all gene pairs • The gene pair with the highest similarity score is joined • Nodes are formed based on similarity • A tree is drawn displaying relatedness. Although visual and intuitive, this approach makes some assumptions about experimental procedures and artifacts. From: Eisen et al. (1998) PNAS Vol. 95, pp. 14863–14868
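The clustering procedure above can be sketched in Python. This is a minimal, hypothetical illustration, not the Eisen et al. software: it assumes plain Pearson correlation as the similarity score and average linkage (the similarity of two nodes is the mean similarity of all gene pairs across them), and it returns the merge order instead of drawing the tree. Gene names and profiles are made up.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined for constant profiles


def average_linkage(profiles):
    """Agglomerative clustering; returns (node1, node2, similarity) merges in order."""
    # Step 1: similarity matrix for every gene pair, stored in both orders.
    sim = {}
    for a, b in combinations(profiles, 2):
        sim[a, b] = sim[b, a] = pearson(profiles[a], profiles[b])

    def node_sim(c1, c2):
        # Average linkage: mean similarity over all cross-node gene pairs.
        return sum(sim[a, b] for a in c1 for b in c2) / (len(c1) * len(c2))

    nodes = [(gene,) for gene in profiles]  # every gene starts as its own node
    merges = []
    while len(nodes) > 1:
        # Steps 2-3: join the most similar pair of nodes into a new node.
        c1, c2 = max(combinations(nodes, 2), key=lambda pair: node_sim(*pair))
        merges.append((c1, c2, node_sim(c1, c2)))
        nodes.remove(c1)
        nodes.remove(c2)
        nodes.append(c1 + c2)
    return merges  # step 4: the merge order defines the tree


# Hypothetical log-expression profiles across the four conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [1.1, 2.1, 2.9, 4.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
for left, right, s in average_linkage(profiles):
    print(left, right, round(s, 3))
```

Here geneA and geneB, which rise together, are joined first; geneC, with the opposite trend, joins last with a strongly negative similarity.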
ANOVA: a robust method of partitioning variance among experimental components. This analysis approach makes assumptions about the dataset: • Normal distribution of the data: • Univariate analysis tests whether the data are normally distributed • The analysis can be done in SAS using PROC UNIVARIATE
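PROC UNIVARIATE SAS code:
```sas
data one;
  length gene $ 20;  /* assumed length: avoids truncating gene IDs longer than 8 characters */
  infile 'c:\edgar analysis\realdeal.csv' delimiter = ',' firstobs = 5;
  input gene $ H2O_1 H2O_2 Inf_1 Inf_2;
  /* reshape to one row per gene and chip: trt 1 = mock (H2O), trt 2 = infected */
  lncor = H2O_1; exp = 1; chip = 1; trt = 1; output;
  lncor = H2O_2; exp = 2; chip = 3; trt = 1; output;
  lncor = Inf_1; exp = 1; chip = 2; trt = 2; output;
  lncor = Inf_2; exp = 2; chip = 4; trt = 2; output;
run;

/* normality tests (NORMAL) and plots (PLOT), one analysis per chip, each value counted once */
proc univariate data = one normal plot;
  class chip;
  var lncor;
run;
```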
PROC UNIVARIATE on transformed data • The distribution is skewed (not normally distributed) • Log-transform the data points and re-run PROC UNIVARIATE. Reading in the file and transforming the data (SAS's log() is the natural logarithm; adding 1 avoids taking the log of zero):
```sas
input gene $ H2O_1 H2O_2 Inf_1 Inf_2;
lnH2O_1 = log(H2O_1 + 1);
lnH2O_2 = log(H2O_2 + 1);
lnInf_1 = log(Inf_1 + 1);
lnInf_2 = log(Inf_2 + 1);
```
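The effect of the log(x + 1) transform on skewness can be checked with a few lines of Python. This sketch uses the simple moment-based sample skewness (PROC UNIVARIATE reports a small-sample-adjusted version by default, so its numbers will differ somewhat) and a made-up, right-skewed set of signal intensities; `math.log` is the natural log, matching SAS's log().

```python
import math


def skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (0 for symmetric data)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical raw intensities: mostly small, a few very large (right-skewed).
raw = [5, 8, 12, 20, 35, 60, 110, 250, 900, 4000]
# Same transform as the SAS step: natural log of (value + 1).
logged = [math.log(v + 1) for v in raw]

print(f"skewness raw:   {skewness(raw):.2f}")
print(f"skewness log+1: {skewness(logged):.2f}")
```

A value near 0 indicates a roughly symmetric distribution; the long right tail of the raw intensities is largely pulled in by the transform.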
PROC UNIVARIATE on transformed data. Run PROC UNIVARIATE:
```sas
lncor = lnH2O_1; exp = 1; chip = 1; trt = 1; output;
lncor = lnH2O_2; exp = 2; chip = 3; trt = 1; output;
lncor = lnInf_1; exp = 1; chip = 2; trt = 2; output;
lncor = lnInf_2; exp = 2; chip = 4; trt = 2; output;
run;

/* normality tests and plots on the log-transformed values, one analysis per chip */
proc univariate data = one normal plot;
  class chip;
  var lncor;
run;
```
PROC UNIVARIATE on transformed data, results: the normal probability plot shows that the transformed data are closer to a normal distribution
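A normal probability plot pairs the sorted data with the quantiles a normal distribution would predict; the straighter the plot, the closer the data are to normal. The hypothetical Python sketch below summarizes straightness as the correlation between the two (a probability plot correlation coefficient: closer to 1 means more nearly normal). The Blom plotting positions (i - 3/8)/(n + 1/4) and the sample intensities are assumptions for illustration, not values from the TMRI data.

```python
import math
from statistics import NormalDist


def normal_plot_points(values):
    """(normal quantile, sorted value) pairs using Blom plotting positions."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    quantiles = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))


def plot_correlation(values):
    """Pearson correlation of the plot points; 1.0 means a perfectly straight plot."""
    pts = normal_plot_points(values)
    n = len(pts)
    mq = sum(q for q, _ in pts) / n
    mv = sum(v for _, v in pts) / n
    sqv = sum((q - mq) * (v - mv) for q, v in pts)
    sqq = sum((q - mq) ** 2 for q, _ in pts)
    svv = sum((v - mv) ** 2 for _, v in pts)
    return sqv / math.sqrt(sqq * svv)


raw = [5, 8, 12, 20, 35, 60, 110, 250, 900, 4000]  # hypothetical intensities
logged = [math.log(v + 1) for v in raw]
print(f"raw:   r = {plot_correlation(raw):.3f}")
print(f"log+1: r = {plot_correlation(logged):.3f}")
```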
Log-transformed data have better distributional properties. The transformed data can now be used to test our model with ANOVA:
```sas
/* inside the DATA step that reads the file */
input gene $ H2O_1 H2O_2 Inf_1 Inf_2;
lnH2O_1 = log(H2O_1 + 1); lnH2O_2 = log(H2O_2 + 1);
lnInf_1 = log(Inf_1 + 1); lnInf_2 = log(Inf_2 + 1);
lncor = lnH2O_1; exp = 1; chip = 1; trt = 1; output;
lncor = lnH2O_2; exp = 2; chip = 3; trt = 1; output;
lncor = lnInf_1; exp = 1; chip = 2; trt = 2; output;
lncor = lnInf_2; exp = 2; chip = 4; trt = 2; output;
run;

proc anova data = one;
  class gene exp chip trt;
  model lncor = gene chip gene*trt chip*trt;
  means gene*trt / lines;
run;
```
ANOVA procedure. Model: Y = µ + G + C + G*T + C*T + E, where G = gene, C = chip, T = treatment and E = residual error; these terms correspond to the MODEL statement lncor = gene chip gene*trt chip*trt. ...Right?
ANOVA procedure: problem. With such a large dataset, SAS appeared to hit a matrix size limit allowing 'only' 32,767 df for the analysis, and the project's dataset exceeded this roughly 3-fold. SAS can be made to analyze random subsets of the data using a string or bootstrapping approach, but the code for that still needs to be developed (in progress). For the purpose of this class, the linear regression approach was taken to select genes that are induced or suppressed.
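The random-subset idea can be sketched in Python. This is a hypothetical helper, not the code the project was developing. It randomly partitions the genes (a random split, not a bootstrap resample) so that each subset, at four observations per gene (one per chip), stays within 32,767 observations. Treating the limit as a cap on observations rather than on degrees of freedom is an assumption made to keep the example simple.

```python
import random


def random_gene_subsets(genes, obs_per_gene, max_obs, seed=1):
    """Randomly partition genes so each subset has at most max_obs observations.

    Hypothetical helper: every gene lands in exactly one subset (no resampling).
    """
    genes_per_subset = max_obs // obs_per_gene
    if genes_per_subset < 1:
        raise ValueError("one gene already exceeds max_obs")
    shuffled = list(genes)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    return [shuffled[i:i + genes_per_subset]
            for i in range(0, len(shuffled), genes_per_subset)]


genes = [f"gene{i}" for i in range(26413)]  # placeholder gene IDs
subsets = random_gene_subsets(genes, obs_per_gene=4, max_obs=32767)
print([len(s) for s in subsets])
```

With 26,413 genes this gives three subsets of 8,191 genes and a smaller fourth one; how the per-subset ANOVA results would then be combined is left open, as on the slide.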
Linear regression: procedure • Treat H2O_1 and H2O_2 as replicates, and likewise Inf_1 and Inf_2 • Use the replicates to calculate the mean of every treatment for each gene • Draw a scatterplot and fit a line that assumes a linear relationship
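The first steps of the procedure (averaging the replicates per gene, then fitting a straight line through the scatterplot of treatment means) can be sketched in plain Python. The dictionary layout and the helper names `treatment_means` and `fit_line` are hypothetical; the line is an ordinary least-squares fit of the infected mean (t2) on the mock mean (t1), the same regression as `model T2 = T1` in the SAS code.

```python
def treatment_means(rows):
    """Average the two replicates of each treatment for every gene.

    rows maps gene -> (H2O_1, H2O_2, Inf_1, Inf_2), already log-transformed.
    Returns gene -> (t1, t2): the mock mean and the infected mean.
    """
    return {gene: ((h1 + h2) / 2, (i1 + i2) / 2)
            for gene, (h1, h2, i1, i2) in rows.items()}


def fit_line(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope


rows = {  # hypothetical log-transformed values
    "gene1": (1.0, 3.0, 4.0, 6.0),
    "gene2": (3.5, 4.5, 9.0, 9.0),
    "gene3": (6.0, 6.0, 13.5, 12.5),
}
means = treatment_means(rows)
t1 = [m[0] for m in means.values()]
t2 = [m[1] for m in means.values()]
intercept, slope = fit_line(t1, t2)
print(means)
print(intercept, slope)
```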
Linear regression: visualized. Example graph obtained from: http://www.tmri.org/gene_exp_web/index.html • Draw lines marking the 99% prediction interval (P < 0.01) around the fitted line • Select the data points that fall outside the interval
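Drawing the interval lines and selecting the points outside them can be sketched in Python. This is a minimal, hypothetical version of what the SAS `CLI`, `LCL` and `UCL` options compute: the half-width of the interval for an individual observation at x is q · s · sqrt(1 + 1/n + (x − x̄)²/Sxx), where s is the residual standard deviation. One swap to flag plainly: SAS uses a Student's t quantile with n − 2 degrees of freedom, whereas this sketch uses the standard normal quantile from `statistics.NormalDist`, which is close only when n is large (as with roughly 26,000 genes). The data are made up.

```python
import math
from statistics import NormalDist


def select_outliers(genes, t1, t2, alpha=0.01):
    """Return genes whose t2 falls outside the (1 - alpha) prediction
    interval around the least-squares line t2 = b0 + b1 * t1."""
    n = len(t1)
    mx, my = sum(t1) / n, sum(t2) / n
    sxx = sum((x - mx) ** 2 for x in t1)
    b1 = sum((x - mx) * (y - my) for x, y in zip(t1, t2)) / sxx
    b0 = my - b1 * mx
    # Residual standard deviation with n - 2 degrees of freedom.
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(t1, t2))
    s = math.sqrt(sse / (n - 2))
    # Normal quantile in place of Student's t: a large-n approximation.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    selected = []
    for gene, x, y in zip(genes, t1, t2):
        # Half-width of the interval for a single new observation at x.
        half_width = z * s * math.sqrt(1 + 1 / n + (x - mx) ** 2 / sxx)
        if abs(y - (b0 + b1 * x)) > half_width:
            selected.append(gene)
    return selected


# Hypothetical data: 201 genes close to the line t2 = t1, with gene100 induced.
genes = [f"gene{i}" for i in range(201)]
t1 = [i / 10 for i in range(201)]
t2 = [x + (0.1 if i % 2 == 0 else -0.1) for i, x in enumerate(t1)]
t2[100] = t1[100] + 3.0
print(select_outliers(genes, t1, t2))
```

Points above the upper line correspond to induced genes and points below the lower line to suppressed genes.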
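Linear regression: SAS code:
```sas
proc sort data = one;
  by gene;
run;

/* mean of the two mock replicates (trt = 1) for each gene */
proc means noprint data = one;
  where trt = 1;
  by gene;
  var lncor;
  output out = T1means mean = t1;
run;

/* mean of the two infected replicates (trt = 2) for each gene */
proc means noprint data = one;
  where trt = 2;
  by gene;
  var lncor;
  output out = T2means mean = t2;
run;
```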
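Linear regression: SAS code (continued):
```sas
/* one row per gene holding both treatment means */
data both;
  merge T1means T2means;
  by gene;
run;

/* regress the infected mean on the mock mean; CLI requests prediction limits for individual values */
proc reg data = both;
  model t2 = t1 / cli alpha = 0.01;
  output out = two lcl = lowercl ucl = uppercl;
run;
quit;

/* flag genes whose infected mean falls outside the prediction limits */
data two;
  set two;
  marker = 0;
  if t2 gt uppercl or t2 lt lowercl then marker = 1;
run;

proc print data = two;
  where marker eq 1;
run;
```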
Linear regression: non-transformed vs. transformed data. The performance of linear regression was assessed on both transformed and non-transformed data; regression analysis appears more sensitive when log-transformed data are used.
Linear regression: advantages and disadvantages • Advantages: • Computationally straightforward • Efficient tool for selecting outliers. Linear regression may be efficient and less computationally intensive than ANOVA, but it should be used with some caution. • Disadvantage: the detected outliers are observed values, so their deviation cannot be attributed to any experimental variable per se.
Acknowledgments for this project: Torrey Mesa Research Institute (TMRI) • Dr. David Francis • Dr. Bert Bishop • Dr. Sophien Kamoun