Simple Methods for Peak Detection in Times Series Microarray Data. I. Azzini R. Dell’Anna F. Ciocchetta F. Demichelis A. Sboner Bioinformatics Group, SRA, ITC-Irst Department of Information and C.T. Trento University, Italy E. Blanzieri A. Malossini
Simple Methods for Peak Detection in Times Series Microarray Data I. Azzini R. Dell’Anna F. Ciocchetta F. Demichelis A. Sboner Bioinformatics Group, SRA, ITC-Irst Department of Information and C.T.Trento University, Italy E. Blanzieri A. Malossini Department of Information and Communication Technology Trento University
Preliminary Analysis • Visual inspection of images • There are blurs on the images • We used alternative software for image analysis • TIGR SpotFinder • ScanAlyze • We reapplied the GenePix Pro 3.0 quality control algorithm to a sample of images • Conclusions • The preliminary analysis produced no evidence against the reliability of the original measurements. • We use QC_Dataset for further analysis
Our analysis problem • To detect and characterize genes that present peaks and singularities over the time series. • Motivations: • Primary: Peak genes could play an intriguing role • Secondary: artifact detection
Our approach • Detection of spike genes • Apply a series of simple methods based on discrete derivative and integral. • Characterization of genes • Functional Classification using Multiclass SVM
Outline of the talk • Preliminary analysis • The analysis problem • Methods for peak detection • M-SVM for oligo classification • Results • Discussion
QC_Dataset • Our notation: $X_{o,t} = E(o,t)$
Missing value management (data imputation) • Up to 2 adjacent missing values were replaced by interpolation • Oligos with more than 2 adjacent missing values were discarded • Extrapolation for TP1 and TP48 (for functional classification)
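The imputation rule on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes missing values are encoded as `None`, uses linear interpolation (the slide does not name the interpolation type), and leaves gaps at the first or last time point untouched, since the slide handles those separately by extrapolation. The function name `impute_series` and the parameter `max_gap` are hypothetical.

```python
def impute_series(values, max_gap=2):
    """Fill interior gaps of up to max_gap adjacent missing values (None)
    by linear interpolation between the two neighbouring observations.

    Returns a new list, or None when an interior gap is longer than
    max_gap (the oligo would be discarded). Gaps touching either end of
    the series are left as they are.
    """
    n = len(values)
    out = list(values)
    i = 0
    while i < n:
        if values[i] is not None:
            i += 1
            continue
        # Find the end of the current run of missing values: values[i:j].
        j = i
        while j < n and values[j] is None:
            j += 1
        if i == 0 or j == n:
            i = j  # boundary gap: not handled here (assumption)
            continue
        if j - i > max_gap:
            return None  # too many adjacent missing values: discard
        left, right = values[i - 1], values[j]
        span = j - (i - 1)
        for k in range(i, j):
            out[k] = left + (right - left) * (k - (i - 1)) / span
        i = j
    return out


print(impute_series([1.0, None, None, 4.0]))
print(impute_series([1.0, None, None, None, 5.0]))
```

The second call returns `None` because three adjacent values are missing, which corresponds to discarding that oligo.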
Methods for peak detection None of the methods is 100% precise and 100% accurate
Methods for peak detection • The combination of M1, M2 and M3 is less prone to detecting ramps instead of peaks
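To illustrate why a derivative-based score can separate peaks from ramps, here is a small sketch. It is not one of the methods M1-M6, whose definitions are not given in this text; `peak_score` is a hypothetical score that only assumes a peak is a rise immediately followed by a fall, whereas a ramp rises without falling back.

```python
def peak_score(x):
    """Hypothetical peak score based on discrete derivatives.

    For every interior time point t, take the rise x[t] - x[t-1] and the
    fall x[t] - x[t+1]. A peak needs both to be large, so the score at t
    is min(rise, fall); a ramp has a non-positive fall and scores 0.
    Only upward peaks are considered in this sketch.
    """
    best = 0.0
    for t in range(1, len(x) - 1):
        rise = x[t] - x[t - 1]
        fall = x[t] - x[t + 1]
        best = max(best, min(rise, fall))
    return best


print(peak_score([0, 0, 5, 0, 0]))  # isolated spike
print(peak_score([0, 1, 2, 3, 4]))  # ramp
```

A score built from a single one-sided difference would rank the ramp and the spike similarly, which is one reason combining several simple criteria can help.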
Detection procedure • Each method M1-M6 scores the oligos. • We selected the oligos that were ranked among the first ten by at least one method
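The selection rule (keep an oligo if at least one method ranks it among the first ten) amounts to a union of per-method top-k lists. The sketch below assumes each method returns a score per oligo and that a higher score means "more peak-like"; the function name, the data layout, and the toy scores are illustrative only.

```python
def select_top_ranked(scores_by_method, k=10):
    """Return the oligos ranked in the top k by at least one method.

    scores_by_method maps a method name to a dict {oligo_id: score};
    higher scores are assumed to rank first.
    """
    selected = set()
    for scores in scores_by_method.values():
        ranked = sorted(scores, key=scores.get, reverse=True)
        selected.update(ranked[:k])
    return selected


toy = {
    "M1": {"a": 3.0, "b": 1.0, "c": 0.5},
    "M2": {"a": 0.1, "b": 0.2, "c": 0.9},
}
print(sorted(select_top_ranked(toy, k=1)))
```

With six methods and k = 10, this union contains at most 60 oligos and usually fewer, because the methods overlap.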
Detection procedure • We discarded oligos whose signal-to-noise ratio is less than 2 • This S/N threshold is higher than the one adopted in the original work • We need such a filter to discard extremely noisy signals • We visually inspected all the oligos in the table and discarded those that do not present peaks
Detection procedure: Selected genes • opfblob0072 n128_25 f65819_1 m364_2 m12963_1 n159_34 ks244_7 n128_61 opfm60504 l1_28 ET ks75_15 ET c154 b593 b597 n176_5 opfh0008 opfblob0105 b541 n132_108 m50253_2 ks1030_4 OM n128_33 f71224_1 opfh0022 e17542_1
Functional Classification (M-SVM) • Multiclass Support Vector Machine • Pairwise classification: N(N-1)/2 classifiers for N classes • Majority vote • Scheme for replacing missing values • Trained on the data of Table S2 • 530 samples and 14 functional classes • LOO accuracy is 73% • We applied the classifiers to the complete_dataset and scored the results depending on the voting.
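The pairwise scheme can be made concrete with a short sketch. It covers only the bookkeeping (how many binary classifiers are needed and how their outputs are combined by majority vote), not the SVM training itself. The function names and the toy class labels are hypothetical, and returning the vote count as a confidence score is an assumption about what "scored depending on the voting" means.

```python
from collections import Counter
from itertools import combinations


def n_pairwise_classifiers(n_classes):
    """Number of binary classifiers in a pairwise (one-vs-one) scheme."""
    return n_classes * (n_classes - 1) // 2


def majority_vote(pairwise_winners):
    """Combine pairwise decisions into one label.

    pairwise_winners maps each class pair (a, b) to the class that the
    corresponding binary classifier chose. Returns the most voted class
    and its number of votes.
    """
    votes = Counter(pairwise_winners.values())
    label, count = votes.most_common(1)[0]
    return label, count


classes = ["A", "B", "C"]
pairs = list(combinations(classes, 2))
decisions = {("A", "B"): "A", ("A", "C"): "A", ("B", "C"): "C"}
print(len(pairs), n_pairwise_classifiers(14))
print(majority_vote(decisions))
```

For the 14 functional classes mentioned on the slide, the formula gives 91 binary classifiers.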
Significant peaks or artifacts? • We tested: • Data quality (from the preliminary analysis) • We discarded oligos with low signal-to-noise ratios • The peaks have different widths and amplitudes (not consistent with a synchronization-induced artifact)
How are the peaks distributed over time? • Plasmodium falciparum goes through different phases during the 48-hour IDC (Ring, Trophozoite, Schizont) • The peaks that we detected seem to concentrate at specific time points. • We used the Kolmogorov-Smirnov test to rule out a uniform distribution
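A one-sample Kolmogorov-Smirnov test against a uniform distribution can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' computation: peak times are taken to be hours in the interval [0, 48], the p-value uses the standard asymptotic Kolmogorov series with a small-sample correction factor, and the example peak times are invented.

```python
import math


def ks_uniform(times, lo=0.0, hi=48.0):
    """One-sample KS test of `times` against Uniform(lo, hi).

    Returns (D, p): the KS statistic and an approximate p-value from the
    asymptotic Kolmogorov distribution.
    """
    xs = sorted((t - lo) / (hi - lo) for t in times)
    n = len(xs)
    d = 0.0
    for i, u in enumerate(xs):
        # Largest gap between the empirical CDF and the uniform CDF F(u) = u.
        d = max(d, (i + 1) / n - u, u - i / n)
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:
        return d, 1.0  # the series below is numerically 1 in this range
    p = 2.0 * sum(
        (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
        for j in range(1, 101)
    )
    return d, min(max(p, 0.0), 1.0)


clustered = [17, 18, 18, 19, 19, 20, 20, 21]  # invented peak times
spread = [3, 9, 15, 21, 27, 33, 39, 45]       # invented, evenly spread
print(ks_uniform(clustered))
print(ks_uniform(spread))
```

A small p-value for the clustered example argues against uniformity, while the evenly spread example gives no such evidence. With hourly time points the data are discrete and contain ties, so the continuous-distribution p-value should be read as approximate.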
Discussion • The peaks are not uniformly distributed over time • A (possibly) interesting high number of peaks occurs near a phase transition.
Conclusions • We presented • Methods for peak detection • Functional classification via M-SVM • The peaks are not uniformly distributed over time
Biological Interpretation • A critical issue for our analysis
Selected genes • opfblob0072 n128_25 f65819_1 m364_2 m12963_1 n159_34 ks244_7 n128_61 opfm60504 l1_28 ks75_15 c154 b593 b597 n176_5 opfh0008 opfblob0105 b541 n132_108 m50253_2 ks1030_4 n128_33 f71224_1 opfh0022 e17542_1