
A Multivariate Biomarker for Parkinson’s Disease

A Multivariate Biomarker for Parkinson’s Disease. M. Coakley, G. Crocetti, P. Dressner, W. Kellum, T. Lamin. The Michael L. Gargano 12th Annual Research Day, Friday, May 2nd, 2014. Introduction. Genomic analysis for the selection of genes associated with Parkinson’s Disease (PD)


  1. A Multivariate Biomarker for Parkinson’s Disease M. Coakley, G. Crocetti, P. Dressner, W. Kellum, T. Lamin The Michael L. Gargano 12th Annual Research Day, Friday, May 2nd, 2014

  2. Introduction • Genomic Analysis for the selection of genes associated with Parkinson’s Disease (PD) • Adoption of Multivariate Techniques • Comparison between several classification algorithms

  3. The Data • Microarray expression data from Affymetrix • Expression dataset from the NCBI Gene Expression Omnibus (GEO accession GSE6613), generated using the MAS5 algorithm • 105 samples, 22,283 measurements of gene expression, from three groups: • Parkinson’s disease group (50 patients) • Healthy control group (22) • Neurodegenerative control group (33)

  4. Data Preparation • Filtering: removed noise in the probesets (measurements) using “Filtering by Present calls” with a threshold of 25%, retaining only genes expressed in at least 25% of the samples. • After filtering, the number of probesets dropped from 22,283 to 8,100.
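
The present-call filter on slide 4 can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each probeset carries one detection call per sample coded as "P" (present), "M" (marginal), or "A" (absent), and the function name and toy probeset IDs are hypothetical.

```python
def filter_by_present_calls(calls, threshold=0.25):
    """Return the IDs of probesets called present ("P") in at least
    `threshold` of the samples.

    `calls` maps a probeset ID to its list of per-sample detection calls,
    assumed here to be "P" (present), "M" (marginal), or "A" (absent).
    """
    kept = []
    for probeset, sample_calls in calls.items():
        present_fraction = sample_calls.count("P") / len(sample_calls)
        if present_fraction >= threshold:
            kept.append(probeset)
    return kept


# Toy data: three hypothetical probesets measured on four samples.
toy_calls = {
    "probe_1": ["P", "P", "A", "P"],  # present in 75% of samples
    "probe_2": ["A", "A", "A", "P"],  # present in 25% of samples
    "probe_3": ["A", "M", "A", "A"],  # never called present
}
print(filter_by_present_calls(toy_calls))  # ['probe_1', 'probe_2']
```

Whether marginal calls should count as present is a design choice; this sketch counts only "P".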

  5. Normality Assumption & Normalization • The data showed strong right skewness • We applied a logarithmic transformation • We normalized the data using z-scores, both for outlier detection (z-score > 5) and for algorithmic optimization
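
The transformation and normalization on slide 5 might look like the sketch below. The base-2 logarithm, the use of the sample standard deviation, and the use of the absolute z-score for flagging are assumptions; the slides state only a logarithmic transformation and a cutoff of 5.

```python
import math
import statistics


def log2_transform(values):
    """Log-transform strictly positive expression values (base 2 is assumed)."""
    return [math.log2(v) for v in values]


def z_scores(values):
    """Standardize values to zero mean and unit (sample) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def flag_outliers(values, cutoff=5.0):
    """Return the indices whose absolute z-score exceeds `cutoff`.

    Using the absolute value is an assumption; the slides say only "z score > 5".
    """
    return [i for i, z in enumerate(z_scores(values)) if abs(z) > cutoff]


# Toy data: 39 identical intensities plus one extreme reading at index 39.
raw = [100.0] * 39 + [100000.0]
print(flag_outliers(log2_transform(raw)))  # [39]
```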

  6. Univariate Analysis • Identify which (single) gene is associated with PD • This corresponds to running 8,100 hypothesis tests: • $H_0: \mu_A = \mu_B$ against the alternative $H_1: \mu_A > \mu_B$ • For this test we use the two-sample t-statistic $t$, with critical region $t \ge z_\alpha$ • Since we have 3 classes, a gene is selected if it is: • up-regulated in PD (Parkinson’s Disease) when compared with the other classes, or • up-regulated in the other classes but down-regulated in PD. • The result of this analysis does not indicate which class contains the up-regulated gene(s), so we need to check.
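
A sketch of the one-sided test on slide 6 for a single gene. The slide does not state which form of the t-statistic was used, so Welch's unequal-variance form is assumed here; the function names and expression values are hypothetical.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Two-sample t-statistic in Welch's unequal-variance form (an assumption)."""
    mean_a, mean_b = statistics.fmean(group_a), statistics.fmean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean_a - mean_b) / standard_error


def is_up_regulated(group_a, group_b, alpha=0.05):
    """Reject H0 (equal means) in favour of mean_a > mean_b when t >= z_alpha."""
    z_alpha = statistics.NormalDist().inv_cdf(1 - alpha)  # upper-tail critical value
    return welch_t(group_a, group_b) >= z_alpha


# Hypothetical log-expression values for one gene in two classes.
pd_group = [8.1, 8.4, 8.0, 8.6, 8.3]
control_group = [7.0, 7.2, 6.9, 7.1, 7.3]
print(is_up_regulated(pd_group, control_group))   # True
print(is_up_regulated(control_group, pd_group))   # False
```

Running the test in both directions, as in the last two lines, is one way to check which class contains the up-regulated gene.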

  7. Up-regulated Features We identified 60 genes out of the 22,283 measured!

  8. Problems of Univariate Analysis in Genomics In array-based differential expression analysis, the goal is to generate a list of differentially expressed genes that is as meaningful and complete as possible. Suppose we have 1,000 genes, of which we expect 40 to be truly differentially expressed, and we test each with a t-test at a significance level of 0.05. Of the remaining 960 non-differentially expressed genes we can expect a 5% error rate, or 0.05 x 960 = 48 false positives. There are more false positives than truly differentially expressed genes: this is called the multiple hypothesis testing problem.
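
The arithmetic on slide 8 can be worked through directly. The final line additionally assumes that all 40 truly differential genes are detected, which the slide does not claim.

```python
total_genes = 1000
truly_differential = 40
alpha = 0.05  # per-test significance level

# Genes with no real difference can still cross the threshold by chance.
null_genes = total_genes - truly_differential
expected_false_positives = alpha * null_genes

# Share of the reported list that is false, assuming (optimistically)
# that every truly differential gene is detected.
false_share = expected_false_positives / (expected_false_positives + truly_differential)

print(null_genes)                       # 960
print(round(expected_false_positives))  # 48
print(round(false_share, 3))            # 0.545
```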

  9. Univariate vs. Multivariate • In univariate analysis we consider the effect of each gene, individually, against the target (PD) • The effect of a disease is rarely the result of a single gene. • Even if good univariate leads are found (the 60 genes), this rarely leads to the identification of useful pathways. • We don’t have information on any group of genes that, together, might be involved in the development of PD • Multivariate approaches test for groups of variables that, together, explain a particular output. • Multivariate theory is much more complex.

  10. Multivariate Mining on Genomics We are trying to identify a subset of genes (as small as possible) to use as a classification model that differentiates the classes in the original data set. • Wrapper Subset Evaluator (WSE): an implementation of the forward wrapper method for feature selection, used to create an optimal subset. • Correlation-based Feature Selection (CFS): these algorithms evaluate different combinations of features to identify an optimal subset. The feature subsets to be evaluated are generated using different search techniques. We used the Best First and Greedy search methods in the forward direction. • Recursive Support Vector Machine (RSVM): a non-probabilistic binary linear classifier, in its recursive version. No matter which algorithm is selected, it must use multivariate hypothesis testing.
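
To make the forward wrapper idea on slide 10 concrete, here is a small greedy sketch. It is not the implementation used in the study: the nearest-centroid classifier, the leave-one-out scoring, and all names and data are illustrative assumptions.

```python
def nearest_centroid_accuracy(rows, labels, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on `features`."""
    correct = 0
    for held_out in range(len(rows)):
        # Build class centroids from every sample except the held-out one.
        sums, counts = {}, {}
        for i, (row, label) in enumerate(zip(rows, labels)):
            if i == held_out:
                continue
            acc = sums.setdefault(label, [0.0] * len(features))
            for j, f in enumerate(features):
                acc[j] += row[f]
            counts[label] = counts.get(label, 0) + 1

        # Predict the class whose centroid is closest (squared Euclidean distance).
        def distance(label):
            return sum((rows[held_out][f] - sums[label][j] / counts[label]) ** 2
                       for j, f in enumerate(features))

        predicted = min(sums, key=distance)
        correct += predicted == labels[held_out]
    return correct / len(rows)


def forward_selection(rows, labels):
    """Greedily add the feature that most improves accuracy; stop when none does."""
    selected, best_score = [], 0.0
    while len(selected) < len(rows[0]):
        candidates = [f for f in range(len(rows[0])) if f not in selected]
        scored = [(nearest_centroid_accuracy(rows, labels, selected + [f]), f)
                  for f in candidates]
        score, feature = max(scored, key=lambda pair: pair[0])
        if score <= best_score:
            break
        selected.append(feature)
        best_score = score
    return selected, best_score


# Toy data: only feature 1 separates the two classes.
rows = [[5.0, 9.0, 1.0], [1.0, 9.2, 5.0], [3.0, 8.8, 2.0], [4.0, 9.1, 4.0],
        [4.5, 2.0, 1.5], [1.5, 2.2, 4.5], [3.5, 1.8, 2.5], [2.0, 2.1, 4.0]]
labels = ["PD"] * 4 + ["control"] * 4
print(forward_selection(rows, labels))  # ([1], 1.0)
```

The wrapper is "multivariate" in that each candidate is scored together with the features already selected, not on its own.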

  11. Multivariate Analysis: Evaluating Several Classification Models

  12. Multivariate Analysis: Evaluating Several Classification Models We used 10-fold cross-validation during the feature selection process. In k-fold cross-validation the original data set is split into k equal-size partitions. Of the k partitions, one is retained as a validation set for testing the model, and the remaining k-1 are used for training the model. The cross-validation is repeated k times, so that each partition serves once as the validation set, and the results are averaged.
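
The k-fold procedure described on slide 12 can be sketched as below. The helper names are hypothetical, and the split is sequential for clarity; in practice the samples would usually be shuffled or stratified by class first. Because 105 is not divisible by 10, the folds here are only nearly equal in size.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


def cross_validate(n_samples, k, score_fold):
    """Average the score returned by `score_fold(train, validation)` over k folds."""
    scores = [score_fold(train, validation)
              for train, validation in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


# 105 samples and 10 folds, as in the study.
folds = list(k_fold_indices(105, 10))
print([len(validation) for _, validation in folds])
# [11, 11, 11, 11, 11, 10, 10, 10, 10, 10]
```

`score_fold` stands in for whatever trains a model on the training indices and evaluates it on the validation indices.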

  13. Multivariate Analysis: Results – WSE The kappa statistic measures the rate of agreement between tests.
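
Assuming the kappa on slide 13 is Cohen's kappa between actual and predicted class labels (the slide does not say which variant was reported), it can be computed as in this sketch with made-up labels:

```python
from collections import Counter


def cohen_kappa(actual, predicted):
    """Agreement between two label sequences, corrected for chance agreement.

    Undefined (division by zero) when chance agreement is exactly 1.
    """
    n = len(actual)
    observed = sum(a == p for a, p in zip(actual, predicted)) / n
    actual_counts, predicted_counts = Counter(actual), Counter(predicted)
    # Chance agreement: probability both sequences pick the same class at random.
    expected = sum(actual_counts[c] * predicted_counts[c] for c in actual_counts) / (n * n)
    return (observed - expected) / (1 - expected)


# Made-up example: 8 of 10 predictions agree with the actual class.
actual = ["PD"] * 5 + ["control"] * 5
predicted = ["PD"] * 4 + ["control"] + ["control"] * 4 + ["PD"]
print(round(cohen_kappa(actual, predicted), 2))  # 0.6
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance.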

  14. Multivariate Analysis: Results – CFS

  15. Multivariate Analysis: Results – CFS This looks like a good starting point. Further investigation is warranted to understand the relationships among the 20 selected genes.

  16. Conclusions Multivariate models are a necessary tool in genomic studies. Among the algorithms tested in this study, RSVM clearly came out as an effective model to adopt in biomarker discovery, with the important ability to successfully discriminate between PD and other neurodegenerative diseases. This research cannot stop here; the natural next step is to look for the biological interpretation of this result.

  17. Thank you
