1 / 29

Proteomics Informatics –  Molecular signatures

This article explains the definition and uses of molecular signatures in proteomics informatics. It also discusses experimental design techniques to ensure valid conclusions. Case studies of MammaPrint and Oncotype DX are included.

schuler
Download Presentation

Proteomics Informatics –  Molecular signatures

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Proteomics Informatics –  Molecular signatures

  2. Definition of a molecular signature A molecular signature is a computational or mathematical model that links high-dimensional molecular information to phenotype or other response variable of interest. FDA calls them “in vitro diagnostic multivariate assays”

  3. Uses of molecular signatures • Models of disease phenotype/clinical outcome • Diagnosis • Prognosis, long-term disease management • Personalized treatment (drug selection, titration) • Biomarkers for diagnosis, or outcome prediction • Make the above tasks resource efficient, and easy to use in clinical practice • Discovery of structure & mechanisms (regulatory/interaction networks, pathways, sub-types) • Leads for potential new drug candidates

  4. Experimental Design Experimental Design by Christine Ambrosino www.hawaii.edu/fishlab/Nearside.htm

  5. Experimental Design Overcoming the threat from chance and bias to the validity of conclusion.

  6. Experimental Design Controllable Factors Process Inputs Outputs Uncontrollable Factors

  7. Experimental Design • Recognition and statement of the problem (e.g. testing a specific hypothesis or open ended discovery). • Selecting a response variable. • Choosing controllable factors and their range. • Listing uncontrollable factors and estimate their effect. • Choosing experimental design. • Performing experiment. • Statistical analysis of data. • Designing the next experiment based on the results.

  8. Exploring the Parameter Space One factor at a time Score Score Score Factor 1 Factor 2 Factor 3 k factors : 2k experiments 2-factor factorial design 3-factor factorial design 4 experiments Factor 2 8 experiments Factor 1 k-factor factorial design (2k experiments) For example, 7 factors: 128 experiments, 10 factors: 1,024 experiments

  9. Randomization • Statistical methods require that observations are independently distributed random variables. Randomization usually makes this assumption valid. • Randomization guards against unknown and uncontrolled factors. • Randomize with respect to analysis order, location, material etc. Not Randomized Randomized p = 0.19 p = 0.32 No change in sensitivity during measurement Order of Measurements Order of Measurements

  10. Randomization Not Randomized Randomized p = 0.19 p = 0.32 Standard Deviation: 0.8, 0.8 Standard Deviation: 0.7, 0.9 No change in sensitivity during measurement Order of Measurements Order of Measurements p = 0.20 p = 5.7x10-6 Standard Deviation: 1.8, 1.3 Change in sensitivity during measurement Order of Measurements Order of Measurements

  11. Blocking Blocking is used to control for known and controllable factors. Randomized Complete Block Design - minimizing the effect of variability associated with e.g. location, operator, plant, batch, time. The Latin Square Design - minimizing the effect of variability associated with two independent factors The rows and columns represent two restrictions on randomization

  12. Replication • Replication is needed to estimate the variance in the measurements. • Technical replicates (repeat measurements). • Process replicates • Biological replicates

  13. Uncertainty in Determining the Mean • Normal • Skewed • Long tails • Complex • n=3 • n=3 • n=3 • n=10 Standard Error of the Mean • n=100 • n=10 • n=10 • n=10 • n=1000 • n=100 • n=100 • n=100 Mean

  14. An example of bad experimental design

  15. Experimental Design - Summary • Chance and bias is a threat to the conclusions from experiments • Controllable and uncontrollable factors • Randomization to guard against unknown and uncontrolled factors • Replication (technical, process, and biological replicates) is used to estimate error in measurement and yields a more precise estimate. • Blocking to control for known and controllable factors • Multiple testing • Molecular markers

  16. Experimental Design - Summary • Use your domain knowledge: using a designed experiment is not a substitute for thinking about the problem. • Keep the design and analysis as simple as possible. • Recognize the difference between practical and statistical significance. • Design iterative experiments.

  17. MammaPrint • Developed by Agendia (www.agendia.com) • 70-gene signature to stratify women with breast cancer that hasn’t spread into “low risk” and “high risk” for recurrence of the disease • Independently validated in >1,000 patients • So far performed >10,000 tests • Cost of the test is ~$3,000 • In February, 2007 the FDA cleared the MammaPrint test for marketing in the U.S. for node negative women under 61 years of age with tumors of less than 5 cm. • TIME Magazine’s 2007 “medical invention of the year”.

  18. Oncotype DX Breast Cancer Assay Developed by Genomic Health (www.genomichealth.com) 21-gene signature to predict whether a woman with localized, ER+ breast cancer is at risk of relapse Independently validated in thousands of patients So far performed >100,000 tests Price of the test is $4,175 Not FDA approved but covered by most insurances including Medicare Its sales in 2010 reached $170M and with a compound annual growth rate is projected to hit $300M by 2015.

  19. Improved Survival and Cost Savings In a 2005 economic analysis of recurrence in LN-,ER+ patients receiving tamoxifen, Hornberger et al. performed a cost-utility analysis using a decision analytic model. Using a model, recurrence Score result was predicted on average to increase quality-adjusted survival by 16.3 years and reduce overall costs by $155,128. In a 2 million member plan, approximately 773 women are eligible for the test. If half receive the test, given the high and increasing cost of adjuvant chemotherapy, supportive care and management of adverse events, the use of the Oncotype DX assay is estimated to save approximately $1,930 per woman tested (given an aggregate 34% reduction in chemotherapy use).

  20. EF Petricoin III, AM Ardekani, BA Hitt, PJ Levine, VA Fusaro, SM Steinberg, GB Mills, C Simone, DA Fishman, EC Kohn, LA Liotta, "Use of proteomic patterns in serum to identify ovarian cancer", Lancet 359 (2002) 572–77

  21. Check E., Proteomics and cancer: running before we can walk? Nature. 2004 Jun 3;429(6991):496-7.

  22. Example: OvaCheck • Developed by Correlogic (www.correlogic.com) • Blood test for the early detection of epithelial ovarian cancer  • Failed to obtain FDA approval • Looks for subtle changes in patterns among the tens of thousands of proteins, protein fragments and metabolites in the blood • Signature developed by genetic algorithm • Significant artifacts in data collection & analysis questioned validity of the signature: • Results are not reproducible • Data collected differently for different groups of patients http://www.nature.com/nature/journal/v429/n6991/full/429496a.html

  23. Main ingredients for developing a molecular signature

  24. Base-Line Characteristics DF Ransohoff, "Bias as a threat to the validity of cancer molecular-marker research", Nat Rev Cancer 5 (2005) 142-9.

  25. How to Address Bias DF Ransohoff, "Bias as a threat to the validity of cancer molecular-marker research", Nat Rev Cancer 5 (2005) 142-9.

  26. Challenges in computational analysis of omics data for development of molecular signatures • Signature multiplicity (Rashomon effect) • Poor experimental design • Is there predictive signal? • Assay validity/reproducibility • Efficiency (Statistical and computational) • Causality vspredictivness • Methods development (reinventing the wheel) • Many variables, few samples, noise, artifacts • Editorialization/Over-simplification/Sensationalism

  27. General conclusions • Molecular signatures play a crucial role in personalized medicine and translational bioinformatics. • Molecular signatures are being used to treat patients today, not in the future. • Development of accurate molecular signature should rely on use of supervised methods. • In general, there are many challenges for computational analysis of omics data for development of molecular signatures. • One of these challenges is molecular signature multiplicity. • There exist an algorithm that can extract the set of maximally predictive and non-redundant molecular signatures from high-throughput data.

  28. Single cell proteomics Mass-spectrometry of single mammalian cells quantifies proteome heterogeneity during cell differentiation BBudnik, ELevy, GHarmange, NSlavov bioRxiv 102681

  29. Single cell proteomics Bendall SC, Nolan GP, Roederer M, Chattopadhyay PK. A deep profiler's guide to cytometry. Trends Immunol. 2012 Jul;33(7):323-32.

More Related