
3rd Summer School in Computational Biology September 10, 2014

3rd Summer School in Computational Biology September 10, 2014. Frank Emmert-Streib & Salissou Moutari, Computational Biology and Machine Learning Laboratory, Center for Cancer Research and Cell Biology, Queen’s University Belfast, UK. Exercise – Survival Analysis. Homework ~ 1.5 hours.


Presentation Transcript


  1. 3rd Summer School in Computational Biology September 10, 2014 Frank Emmert-Streib & Salissou Moutari, Computational Biology and Machine Learning Laboratory, Center for Cancer Research and Cell Biology, Queen’s University Belfast, UK

  2. Exercise – Survival Analysis Homework ~ 1.5 hours

  3. 1. Kaplan-Meier Survival Curves

  4. Result: Survival Curve S(t)

  5. Goal: estimate S(t) from data • A survival curve shows S(t) as a function of t. • S(t): survival function (survivor function) • t: time • T: random variable giving the time at which the event occurs • S(t) gives the probability that T is larger than a specified time t, i.e., S(t) = Pr(T > t) • Problem: censoring
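To see why censoring is a problem, it helps to look first at the case without it. The following is a minimal Python sketch (the slides themselves use R, but that code is not preserved in this transcript). It assumes every event time is fully observed, so S(t) = Pr(T > t) can be estimated as the fraction of subjects whose event time exceeds t. The function name and the five event times are hypothetical.

```python
def empirical_survival(event_times, t):
    """Estimate S(t) = Pr(T > t) as the fraction of subjects with T > t.

    Only valid when there is no censoring, i.e. every event time is observed.
    """
    return sum(1 for T in event_times if T > t) / len(event_times)


# Hypothetical, fully observed event times for five subjects
event_times = [5, 8, 12, 16, 23]

for t in (0, 10, 23):
    print(t, empirical_survival(event_times, t))
```

As soon as some subjects are censored, their true event time is unknown and this simple fraction no longer applies, which is what motivates the Kaplan-Meier estimator.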

  6. Small example: Leukemia • Acute Myelogenous Leukemia (AML) • Data columns: survival time, censoring, chemotherapy (we use this info later) • Only 5 patients

  7. Small example: Leukemia • Number at risk • Number of events • ??? • Legend: event, censoring

  8. Kaplan-Meier estimator for S(t) • Estimator: $\hat{S}(t) = \prod_{t_i \le t} (1 - d_i / n_i)$ • n_i: number of subjects at risk at time t_i • d_i: number of events at time t_i • Kaplan & Meier 1958
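The Kaplan-Meier (product-limit) estimator multiplies the factors (1 - d_i / n_i) over all event times t_i up to t, where n_i is the number of subjects still at risk at t_i and d_i is the number of events at t_i. Below is a minimal Python sketch of that calculation. It is not the R code from the slides, which is not preserved here. The function name, the coding `1 = event, 0 = censored`, and the five-patient data are assumptions made for illustration.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return a list of (t_i, n_i, d_i, S(t_i)) for each distinct event time.

    times  -- observed time per subject (event or censoring time)
    events -- 1 if the event was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    curve = []
    s = 1.0
    for t in sorted(deaths):
        n = sum(1 for u in times if u >= t)  # n_i: still at risk at t_i
        d = deaths[t]                        # d_i: events at t_i
        s *= 1.0 - d / n                     # product-limit update
        curve.append((t, n, d, s))
    return curve


# Hypothetical data for five patients (NOT the values from the slides)
times = [5, 8, 12, 16, 23]
events = [1, 1, 0, 1, 0]  # the patients at t = 12 and t = 23 are censored

for t, n, d, s in kaplan_meier(times, events):
    print(f"t={t:>3}  at risk={n}  events={d}  S(t)={s:.3f}")
```

With this hypothetical data the estimate drops to 0.8, 0.6 and 0.3 at t = 5, 8 and 16. The censored patient at t = 12 causes no drop but reduces the number at risk for later event times.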

  9. Kaplan-Meier estimator for S(t) • Estimator: $\hat{S}(t) = \prod_{t_i \le t} (1 - d_i / n_i)$ • n_i: number of subjects at risk at time t_i • d_i: number of events at time t_i

  10. Check S(t) till t

  11. Kaplan-Meier estimator for S(t) • Estimator: $\hat{S}(t) = \prod_{t_i \le t} (1 - d_i / n_i)$ • n_i: number of subjects at risk at time t_i • d_i: number of events at time t_i

  12. Check S(t) till t

  13. Kaplan-Meier estimator for S(t) • Estimator: $\hat{S}(t) = \prod_{t_i \le t} (1 - d_i / n_i)$ • n_i: number of subjects at risk at time t_i • d_i: number of events at time t_i • Censoring: last time seen, still alive at that time

  14. Check S(t) till t

  15. Kaplan-Meier estimator for S(t) • Estimator: $\hat{S}(t) = \prod_{t_i \le t} (1 - d_i / n_i)$ • n_i: number of subjects at risk at time t_i • d_i: number of events at time t_i

  16. Check S(t) till t

  17. Kaplan-Meier estimator for S(t) • Estimator: $\hat{S}(t) = \prod_{t_i \le t} (1 - d_i / n_i)$ • n_i: number of subjects at risk at time t_i • d_i: number of events at time t_i

  18. Check S(t) till t

  19. Full data set: Leukemia 23 patients

  20. R code

  21. 2. Comparing Survival Curves

  22. Reasons for comparing survival curves (SC) • Treatment vs no treatment: • Compare the SC for patients who have been treated with a certain medication with the SC for patients who have not been treated. • Result: Does the treatment have an effect on the survival of the patients?

  23. Reasons for comparing survival curves • Chemotherapy vs no chemotherapy: • Compare the SC for patients who had chemotherapy with the SC for patients who did not have chemotherapy. • Result: Does the chemotherapy have an effect on the survival of the patients? • Survival analysis has great practical relevance.

  24. Data: Leukemia • Goal: compare the two SCs statistically • Group 1: 11 patients with chemo • Group 2: 12 patients without

  25. R code

  26. Log-rank test (Mantel-Haenszel) • Hypothesis: • Null hypothesis H0: No difference in survival between group 1 and group 2. • Alternative hypothesis H1: Difference in survival between group 1 and group 2. • Mantel and Haenszel 1959

  27. Idea of the test • For each time t, estimate the expected number of events for group 1 and group 2. • Number of events at time t in group i (m_it) • Number at risk at time t in group i
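The formula for the expected number of events is not preserved in this transcript. A common way to write it, given here as a sketch, splits the events observed at time t between the two groups in proportion to their numbers at risk. The symbols m_it and e_it follow the next slide; the symbol n_it for the number at risk in group i at time t is an assumption.

```latex
e_{1t} = \frac{n_{1t}}{n_{1t} + n_{2t}} \, (m_{1t} + m_{2t}),
\qquad
e_{2t} = \frac{n_{2t}}{n_{1t} + n_{2t}} \, (m_{1t} + m_{2t})
```

Under H0 both groups share the same risk at every time t, so each group is expected to receive a share of the events equal to its share of the subjects at risk.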

  28. Sums over all event times: O1 − E1 and O2 − E2 (observed minus expected events), with expected totals E1 and E2. The e_it are obtained assuming H0 is true. Hence, m_it − e_it is a measure of the deviation of the data from H0.

  29. Wrapping up • Test statistic: • Sampling distribution: s follows a chi-square distribution with one degree of freedom
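The test statistic on this slide is not preserved in the transcript. The Python sketch below uses one standard form of the log-rank statistic, s = (O1 − E1)² / Var(O1 − E1) with the hypergeometric variance, which is an assumption about what the slide showed. The p-value uses the chi-square distribution with one degree of freedom stated above, computed through the identity P(χ²₁ > s) = erfc(√(s/2)). The function name and the data are hypothetical, and the code assumes at least one event time at which both groups still have subjects at risk (otherwise the variance is zero).

```python
import math


def logrank_test(times1, events1, times2, events2):
    """Two-group log-rank test; returns (statistic, p_value).

    times*  -- lists of observed times (event or censoring)
    events* -- lists with 1 = event observed, 0 = censored
    """
    all_times = times1 + times2
    all_events = events1 + events2
    event_times = sorted({t for t, e in zip(all_times, all_events) if e == 1})

    o1 = e1 = var = 0.0
    for t in event_times:
        n1 = sum(1 for u in times1 if u >= t)  # at risk in group 1
        n2 = sum(1 for u in times2 if u >= t)  # at risk in group 2
        m1 = sum(1 for u, e in zip(times1, events1) if u == t and e == 1)
        m2 = sum(1 for u, e in zip(times2, events2) if u == t and e == 1)
        n, m = n1 + n2, m1 + m2
        o1 += m1                # observed events in group 1
        e1 += n1 * m / n        # expected events in group 1 under H0
        if n > 1:               # variance term is undefined for n == 1
            var += n1 * n2 * m * (n - m) / (n * n * (n - 1))

    stat = (o1 - e1) ** 2 / var
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square, 1 df
    return stat, p_value


# Hypothetical data (NOT the leukemia data from the slides)
group1_times, group1_events = [1, 2], [1, 1]
group2_times, group2_events = [3, 4], [1, 1]

stat, p = logrank_test(group1_times, group1_events, group2_times, group2_events)
print(f"s = {stat:.3f}, p = {p:.3f}")
```

A small p-value speaks against H0, the hypothesis that the two groups have the same survival.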

  30. R code • Back to our leukemia data set:

  31. Data: Leukemia • Goal: compare the two SCs statistically • Group 1: 11 patients with chemo • Group 2: 12 patients without

  32. Survival Analysis & Biomarkers

  33. NIH Definition of Biomarker A characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacologic responses to therapeutic intervention.

  34. FDA Definition of Biomarker Any measurable diagnostic indicator that is used to assess the risk or presence of disease.

  35. What is a biomarker? These definitions are very broad and do not help in finding practical implementations for a particular disease.

  36. Our “definition” Remark: We do not want to address all possible problems that can involve biomarkers but focus on a particular application. Application: Identify a set of genes that can be used for a prognostic analysis. …that are good!

  37. Definition of ‘prognosis’ Prognosis is a medical term denoting the prediction of how a patient will progress over time. For instance, a patient with a diagnosed disease can have: • Long-term survival • Short-term survival

  38. Our “definition” Remark: We do not want to address all possible problems that can involve biomarkers but focus on a particular application. Application: Identify a set of genes that can be used for a prognostic analysis. • Set of genes: we call these biomarkers • Use biomarkers to predict the prognostic outcome of a patient, i.e., to classify survival

  39. Underlying idea to identify biomarkers The identification of biomarkers is a composite approach (or a procedure) that is based on a couple of other methods. In the previous example: • Survival analysis • Differential expression of genes • Classification

  40. Underlying idea to identify biomarkers The identification of biomarkers is a composite approach (or a procedure) that is based on a couple of other methods. In the previous example: • Clustering • Survival analysis • Differential expression of genes • Classification

  41. Our “definition” Remark: We do not want to address all possible problems that can involve biomarkers but focus on a particular application. Application: Identify a set of genes that can be used for a prognostic analysis. • Structured patient groups vs unstructured patient groups • Statistics: Feature selection problem

  42. Underlying idea to identify biomarkers The identification of biomarkers is a composite approach (or a procedure) that is based on a couple of other methods. The definition of the procedure is part of the experimental design of the whole experiment. Yes, the experimental design includes the analysis of the data!

  43. Summary & Outlook to Genome and Network Medicine Almost there!

  44. Schedule 17 lectures

  45. Interdisciplinary summer school

  46. Vision of the VC Universities require interdisciplinary engagement in the educational and research effort. Professor Patrick Johnston, President and Vice-Chancellor (VC) of Queen’s University

  47. A look 5 years ahead

  48. 1. Single cell experiments • Experimental measurements within single cells of: DNA, gene expression (mRNA), protein binding • What do the other high-throughput data provide information for? Populations of cells. • NGS
