Felix Naef & Marcelo Magnasco, GL meeting, Nov. 19 2001, felix@funes.rockefeller.edu
Outline: Excursions into GeneChip data analysis • Background subtraction • Probe set statistics
Background estimation • estimate both the mean B and the fluctuations σ • needed in the low-intensity regime • includes light reflection from the substrate, photodetector dark current, and some cross-hybridization (i.e. small residues) • by the central limit theorem (CLT), the background, being a sum of many small independent contributions, is expected to be a Gaussian variable
[Figure: P(PM), real distribution vs. Gaussian (σ, B) “+” step function; axis marks at B and 0] • idea: B is insensitive to MM and visible at low intensity • select probes such that |PM − MM| < ε (locally?) • use ε = 50 (new) or ε = 100 (old settings) • P(PM) or P(MM) of the selected probes is then the convolution of a Gaussian and a step function
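A minimal sketch of the background estimate described above, in Python with NumPy. It assumes that, for probe pairs with |PM − MM| < ε, the PM histogram near its low end has the form c·Φ((x − B)/σ), i.e. a step function convolved with a Gaussian, and it fits B and σ by a grid-search least-squares fit. The function names, the fit window (lowest quarter of the selected PM values), the grid sizes and the erf approximation (Abramowitz & Stegun 7.1.26) are illustrative choices, not the authors' procedure.

```python
import numpy as np


def norm_cdf(z):
    """Standard normal CDF using the Abramowitz & Stegun 7.1.26 erf approximation."""
    x = np.abs(z) / np.sqrt(2.0)
    t = 1.0 / (1.0 + 0.3275911 * x)
    poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))))
    erf = 1.0 - poly * np.exp(-x * x)
    return 0.5 * (1.0 + np.sign(z) * erf)


def estimate_background(pm, mm, eps=50.0, bins=150, upper_q=0.25, n_grid=300):
    """Estimate (B, sigma) from probe pairs with |PM - MM| < eps.

    Assumed model: near its low end the PM histogram is a step function
    convolved with a Gaussian, counts(x) ~ c * Phi((x - B) / sigma).
    """
    pm, mm = np.asarray(pm, float), np.asarray(mm, float)
    x = pm[np.abs(pm - mm) < eps]               # pairs where PM and MM agree
    lo, hi = x.min(), np.quantile(x, upper_q)   # window around the rising edge
    counts, edges = np.histogram(x, bins=bins, range=(lo, hi))
    centers = 0.5 * (edges[:-1] + edges[1:])
    width = edges[1] - edges[0]
    # Candidate (B, sigma) grid, broadcast to shape (n_grid, 60, bins).
    B = np.linspace(lo, hi, n_grid)[:, None, None]
    sigma = np.linspace(width, (hi - lo) / 4.0, 60)[None, :, None]
    model = norm_cdf((centers - B) / sigma)
    # For fixed (B, sigma) the best plateau height c is a linear least-squares solution.
    c = (model @ counts) / np.sum(model * model, axis=-1)
    sse = np.sum((counts - c[..., None] * model) ** 2, axis=-1)
    i, j = np.unravel_index(np.argmin(sse), sse.shape)
    return float(B[i, 0, 0]), float(sigma[0, j, 0])


# Synthetic check: background-like pairs (PM ~ MM) plus specific pairs (PM >> MM).
rng = np.random.default_rng(0)
n = 100_000
pm_bg = 100.0 + rng.uniform(0, 2000, n) + rng.normal(0, 20, n)
mm_bg = pm_bg + rng.uniform(-10, 10, n)
pm_sp = 500.0 + rng.uniform(0, 3000, n)
mm_sp = pm_sp - 400.0
B_hat, sigma_hat = estimate_background(np.concatenate([pm_bg, pm_sp]),
                                       np.concatenate([mm_bg, mm_sp]))
print(B_hat, sigma_hat)
```

On this synthetic data (B = 100, σ = 20) the fit should land close to the true values; the specific pairs are excluded by the ε cut and do not distort the edge.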
[Figure: example with ε = 100; dependence on ε]
[Figure: PM vs. MM distribution, with the region used for the histogram marked; zoom on the MM > PM region]
MM > PM across different chips: the MM > PM pairs are not concentrated at low intensities; 27% of probe pairs with MM > PM are in the top intensity quartile
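If MM > PM pairs were spread evenly over intensity, about 25% of them would fall in the top quartile, so a value of 27% means they are not a low-intensity artifact. A small sketch of how such a fraction could be computed; taking a pair's intensity as (PM + MM)/2 and the quartiles over all pairs of one chip are assumptions, and the function name is hypothetical.

```python
import numpy as np


def top_quartile_fraction_of_mm_gt_pm(pm, mm):
    """Fraction of MM > PM probe pairs lying in the top intensity quartile.

    Assumption: a pair's intensity is (PM + MM) / 2 and quartiles are
    computed over all pairs on the chip.
    """
    pm, mm = np.asarray(pm, float), np.asarray(mm, float)
    intensity = 0.5 * (pm + mm)
    q3 = np.quantile(intensity, 0.75)   # lower bound of the top quartile
    flipped = mm > pm                   # pairs with MM brighter than PM
    if not flipped.any():
        return float("nan")
    return float(np.mean(intensity[flipped] > q3))


pm = [1, 2, 3, 4, 5, 6, 7, 8]
mm = [2, 1, 4, 3, 6, 5, 9, 7]
print(top_quartile_fraction_of_mm_gt_pm(pm, mm))  # 0.25
```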
Probe pair trajectories (~80 chips) • take all (PM, MM) values for a given probe set • compute the center of mass (x, y) • compute the ellipsoid of inertia → principal widths σ1 and σ2 • histogram the centers of mass • color-code according to s = σ1 / Σ(min(x, y)) (~ noise detrending)
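A sketch of the trajectory statistics listed above. It uses the 2×2 covariance (second-moment) matrix of the (PM, MM) points, whose eigenvectors are the principal axes of the inertia ellipse, and takes σ1 ≥ σ2 as the square roots of its eigenvalues; defining the eccentricity as σ1/σ2 is an assumption. The noise model Σ is not specified on the slide, so `noise_scale` is a hypothetical placeholder the caller must supply.

```python
import numpy as np


def trajectory_ellipse(points):
    """Center of mass and inertia ellipse of a cloud of (PM, MM) points
    (rows = chips, columns = PM, MM).

    Returns (cm, sigma1, sigma2, eccentricity) with sigma1 >= sigma2.
    """
    pts = np.asarray(points, float)
    cm = pts.mean(axis=0)
    evals = np.linalg.eigvalsh(np.cov(pts, rowvar=False))  # ascending order
    sigma2, sigma1 = np.sqrt(np.clip(evals, 0.0, None))
    ecc = sigma1 / sigma2 if sigma2 > 0 else float("inf")
    return cm, float(sigma1), float(sigma2), float(ecc)


def detrended_spread(sigma1, cm, noise_scale):
    """s = sigma1 / Sigma(min(x, y)); `noise_scale` stands in for the noise model Sigma."""
    return sigma1 / noise_scale(min(cm))


pts = np.array([[-3, -1], [-3, 1], [3, -1], [3, 1]]) + [10, 20]
cm, s1, s2, ecc = trajectory_ellipse(pts)
print(cm, round(ecc, 6))  # [10. 20.] 3.0
print(detrended_spread(s1, cm, lambda v: 2.0))
```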
All probe sets (blue: large s; green: mid; red: small)
Probes with well-defined trajectories (eccentricity > 3), about 1/3 of probes (blue: large s; green: mid; red: small)
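The selection and color code on the two slides above can be sketched as follows, given per-trajectory values of s and of the eccentricity. The slides do not state the large/mid/small cut points, so splitting the kept trajectories into tertiles of s is an assumption, as is the function name.

```python
import numpy as np


def select_and_color(s_values, ecc_values, min_ecc=3.0):
    """Keep trajectories with eccentricity > min_ecc and label each kept one by
    tertile of s: 'blue' (large), 'green' (mid), 'red' (small).
    Tertile cut points are an assumption; the slides do not give thresholds.
    """
    s = np.asarray(s_values, float)
    keep = np.asarray(ecc_values, float) > min_ecc
    s_kept = s[keep]
    lo, hi = np.quantile(s_kept, [1 / 3, 2 / 3])
    labels = np.where(s_kept > hi, "blue", np.where(s_kept > lo, "green", "red"))
    return keep, labels


keep, labels = select_and_color([1, 100, 2, 3, 4, 5, 6], [4, 1, 4, 4, 4, 4, 4])
print(keep.mean(), list(labels))
```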
PM within a probe set: Is the brightness of the probes reasonably uniform, or do different probes have very different hybridization efficiencies?
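One way to quantify this question: since all probes of a set target the same transcript, a spread of a decade or more in their typical brightness points to very different hybridization efficiencies. A sketch, assuming a (probes × chips) PM matrix for one probe set; clipping values below `floor` before the log is an assumption needed once background has been subtracted.

```python
import numpy as np


def probe_brightness_spread(pm_matrix, floor=1.0):
    """pm_matrix: (n_probes, n_chips) PM intensities of one probe set.

    Returns each probe's median log10 brightness across chips and the spread
    (max - min) in decades. Values below `floor` are clipped before the log.
    """
    logs = np.log10(np.clip(np.asarray(pm_matrix, float), floor, None))
    per_probe = np.median(logs, axis=1)
    return per_probe, float(per_probe.max() - per_probe.min())


per_probe, spread = probe_brightness_spread(
    [[10, 12, 9], [1000, 900, 1100], [100, 95, 105]])
print(spread)
```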
So what could be happening? • sequence-dependent hybridization efficiencies • are kinetic effects important? • cross-hybridization beyond what MM probes can detect (hard to assess without sequence information) • sequence-dependent fabrication efficiencies? • variable probe densities
Composite scores • What have we learned from the previous slides? • MM probes do not consistently behave as expected • What about not using them? • probe intensities within a probe set vary over decades • this makes absolute intensities difficult to estimate with ‘averages’ (alternative: Li and Wong) • so we focus on ratio scores
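Why ratio scores sidestep the decade-wide spread, under a simplifying assumption of ours (not stated on the slides): the background-corrected intensity of probe i on chip j factorizes into a probe-specific efficiency a_i times the transcript level c_j.

```latex
I_{ij} = a_i \, c_j
\quad\Longrightarrow\quad
\log \frac{I_{i1}}{I_{i2}} = \log \frac{a_i \, c_1}{a_i \, c_2} = \log c_1 - \log c_2 .
```

The efficiency a_i cancels, so every probe of the set estimates the same log-ratio, whereas an average of absolute intensities, \frac{1}{n}\sum_i a_i c_j, still depends on the unknown a_i.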
Outline of algorithm • estimate the background (mean and std) • discard noisy and saturated probes • use either PM only or PM − MM as raw intensities • average the remaining log-ratios in an outlier-robust way (robust regression to the intercept) and compute the standard error (SE) • normalize by centering the log-ratio distribution (possibly locally)
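A compact sketch of the steps above for one probe set compared between two chips. Assumptions (ours, not the slides'): probes whose smaller PM is below B + 3σ count as noisy, `saturation` is a user-supplied ceiling, log-ratios are base 2, the robust regression to the intercept is a Huber M-estimate of location with a MAD scale, the SE uses the standard asymptotic formula for M-estimators, and normalization is a global median shift (the local variant is omitted). All function names are hypothetical.

```python
import numpy as np


def huber_location(x, k=1.345, tol=1e-8, max_iter=100):
    """Intercept-only robust regression: Huber M-estimate of location by
    iteratively reweighted least squares, with a fixed MAD scale.
    Returns (estimate, asymptotic standard error)."""
    x = np.asarray(x, float)
    mu = float(np.median(x))
    scale = 1.4826 * float(np.median(np.abs(x - mu)))
    if scale == 0.0:
        return mu, 0.0
    for _ in range(max_iter):
        r = (x - mu) / scale
        w = np.minimum(1.0, k / np.maximum(np.abs(r), 1e-12))  # Huber weights
        mu_new = float(np.sum(w * x) / np.sum(w))
        converged = abs(mu_new - mu) < tol * scale
        mu = mu_new
        if converged:
            break
    r = (x - mu) / scale
    psi = np.clip(r, -k, k)
    dpsi = np.mean(np.abs(r) <= k)  # average derivative of the Huber psi
    se = scale * np.sqrt(np.mean(psi ** 2)) / (dpsi * np.sqrt(x.size))
    return mu, float(se)


def probe_set_log_ratio(pm1, pm2, B, sigma, mm1=None, mm2=None,
                        k_noise=3.0, saturation=np.inf):
    """log2(chip1 / chip2) for one probe set; arrays run over its probes."""
    pm1, pm2 = np.asarray(pm1, float), np.asarray(pm2, float)
    # Discard noisy (near-background) and saturated probes.
    ok = (np.minimum(pm1, pm2) > B + k_noise * sigma) & (np.maximum(pm1, pm2) < saturation)
    if mm1 is not None and mm2 is not None:
        raw1, raw2 = pm1 - np.asarray(mm1, float), pm2 - np.asarray(mm2, float)  # PM - MM
    else:
        raw1, raw2 = pm1 - B, pm2 - B  # PM only, background-subtracted
    ok &= (raw1 > 0) & (raw2 > 0)  # log-ratios need positive intensities
    if not ok.any():
        return float("nan"), float("nan")
    return huber_location(np.log2(raw1[ok] / raw2[ok]))


def center(log_ratios):
    """Global normalization: shift all probe-set log-ratios to median zero."""
    lr = np.asarray(log_ratios, float)
    return lr - np.nanmedian(lr)


lr, se = probe_set_log_ratio([1100, 2100, 400, 60000, 90],
                             [600, 1100, 250, 30000, 95],
                             B=100.0, sigma=10.0, saturation=50000.0)
print(lr, se)  # 1.0 0.0
```

Here the fourth probe is dropped as saturated and the fifth as noisy; the Huber step keeps a single aberrant probe from dominating the average, which a plain mean would not.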