
Applied Statistics – Challenges and Reward

Wenjiang Fu, Ph.D., Computational Genomics Lab, Department of Epidemiology, Michigan State University. fuw@msu.edu, www.msu.edu/~fuw


Presentation Transcript


  1. Applied Statistics – Challenges and Reward. Wenjiang Fu, Ph.D., Computational Genomics Lab, Department of Epidemiology, Michigan State University. fuw@msu.edu, www.msu.edu/~fuw

  2. What is Statistics? • “Lies, Damned Lies, and Statistics” • “Figures fool when fools figure” • A branch of mathematical science that studies data through probability distributions and modeling. • Fields: probability theory, actuarial science, biostatistics, financial statistics, industrial statistics, etc. • Related fields: biometrics, bioinformatics, geostatistics, statistical mechanics, econometrics, etc.

  3. Grand challenges we are facing … [Diagram labels: “Data”, Statistics, Knowledge & Information, Decision] The 21st century will be the golden age of statistics!

  4. Grand challenges we are facing … • Data collection technology has advanced dramatically, but without sufficient statistical sampling design and experimental design. • Advancement of technology for discovering and retrieving useful information has been lagging and has become the bottleneck. • More sophisticated approaches are needed for decision making and risk management.

  5. Statistical Challenges -- Massive Amount of Data

  6. Statistical Challenges – Image Data

  7. Statistical Challenges – Functional Data, Graph (Network) Data, and Shape Data

  8. Statistical Challenges – Click Stream Data

  9. Statistical Challenges – Data Fusion and Assimilation Data

  10. Statistics in Science • Cosmic microwave background radiation • High energy physics • Genomic/proteomic data • Tick-by-tick stock data

  11. Statistics in Science • Microarray • Fingerprints

  12. What do we do? • Find new ways of thinking about and attacking problems. • Find sub-optimal but computationally feasible solutions. • Develop new paradigms for new types of data. • Be satisfied with ‘very rough’ approximations. • Turn research results into easy-to-use, publicly available software and programs. • Join forces with computer scientists.

  13. Some ‘hot’ research directions • Dimension reduction • Visualization • Dynamic systems • Simulation and real time computation • Uncertainty and risk management • Interdisciplinary research

  14. Example 1. Sociology data

  15. Result through statistical modeling

  16. Example 2. Epidemiological study data

  17. Results from statistical modeling

  18. Example 3. Medical study data (Ob/Gyn): modeling of PlGF (Placental Growth Factor)

  19. SNP: Single Nucleotide Polymorphism • Homologous pairs of chromosomes • Paternal allele: ACGAACAGCT (complementary strand TGCTTGTCGA) • Maternal allele: ACGAGCAGCT (complementary strand TGCTCGTCGA) • SNP: A/G
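
The SNP on this slide can be located programmatically by comparing the two allele sequences position by position. The sketch below is only an illustration: the function name `snp_positions` is hypothetical, and it assumes the two sequences are already aligned and of equal length.

```python
def snp_positions(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, base in seq_a, base in seq_b) for every mismatch.

    Assumes the two sequences are already aligned and of equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


# The two allele sequences shown on the slide.
allele_1 = "ACGAACAGCT"
allele_2 = "ACGAGCAGCT"

print(snp_positions(allele_1, allele_2))  # [(5, 'A', 'G')]
```

The single mismatch at position 5 is the A/G SNP marked on the slide.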

  20. The International HapMap Consortium (Nature 2003)

  21. Allele, Haplotype and Diplotype • SNP1: two alleles, A and a • SNP2: two alleles, B and b • Haplotypes: [AB] and [ab] • Diplotype: [AB][ab]
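
The relationship between alleles, haplotypes and diplotypes can be made concrete by enumeration. The following is a minimal sketch using the slide's two biallelic SNPs; the helper names (`genotype`, `compatible_diplotypes`) are hypothetical. It also shows a well-known consequence that the slide does not spell out: unphased genotype data for a double heterozygote is compatible with more than one diplotype.

```python
from itertools import combinations_with_replacement, product

snp1 = ("A", "a")  # SNP1: two alleles
snp2 = ("B", "b")  # SNP2: two alleles

# A haplotype is one allele per SNP on the same chromosome: AB, Ab, aB, ab.
haplotypes = ["".join(h) for h in product(snp1, snp2)]

# A diplotype is an unordered pair of haplotypes (one per chromosome copy).
diplotypes = list(combinations_with_replacement(haplotypes, 2))


def genotype(diplotype: tuple[str, str]) -> tuple[str, ...]:
    """Unphased genotype per SNP, e.g. ('Aa', 'Bb'), for a pair of haplotypes."""
    h1, h2 = diplotype
    return tuple("".join(sorted(pair)) for pair in zip(h1, h2))


def compatible_diplotypes(geno: tuple[str, ...]) -> list[tuple[str, str]]:
    """All diplotypes that would produce the given unphased genotype."""
    return [d for d in diplotypes if genotype(d) == geno]


print(haplotypes)                           # ['AB', 'Ab', 'aB', 'ab']
print(len(diplotypes))                      # 10
print(compatible_diplotypes(("Aa", "Bb")))  # [('AB', 'ab'), ('Ab', 'aB')]
```

The slide's diplotype [AB][ab] and the alternative [Ab][aB] give the same unphased genotype, which is why haplotypes usually have to be inferred statistically from genotype data.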

  22. Microarray Technology: 2 channels. Hybridization pairs complementary bases (A with T, C with G): the strand A T C G T A G binds the strand T A G C A T C.
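
The base pairing drawn on this slide can be checked in a few lines. This is a simplified sketch: it pairs bases position by position exactly as drawn, ignores strand direction (5'→3'), and the function names are hypothetical.

```python
# Watson-Crick base pairing for DNA.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Position-by-position complementary strand (strand direction is ignored)."""
    return "".join(PAIR[base] for base in strand)


def hybridizes(probe: str, target: str) -> bool:
    """True if every base of the probe pairs with the base opposite it."""
    return len(probe) == len(target) and complement(probe) == target


print(complement("ATCGTAG"))             # TAGCATC
print(hybridizes("ATCGTAG", "TAGCATC"))  # True
```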

  23. Microarray normalization: between slides. Boxplots of log ratios from 3 replicate self-self hybridizations. Left panel: before normalization. Middle panel: after within-print-tip-group normalization. Right panel: after a further between-slide scale normalization.
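
A rough sketch of the between-slide scale step follows. It is not the procedure used to produce the slide's boxplots: within-print-tip-group normalization is replaced here by simple median centering, the scale estimate is assumed to be the median absolute deviation (MAD), and the log ratios are made-up toy numbers.

```python
from math import prod
from statistics import median


def mad(values: list[float]) -> float:
    """Median absolute deviation: a robust measure of spread."""
    m = median(values)
    return median(abs(v - m) for v in values)


def scale_normalize(slides: list[list[float]]) -> list[list[float]]:
    """Center each slide at median 0, then rescale all slides to a common MAD.

    The common target is the geometric mean of the per-slide MADs.
    Assumes every slide has a nonzero MAD.
    """
    centered = []
    for slide in slides:
        m = median(slide)
        centered.append([v - m for v in slide])
    mads = [mad(slide) for slide in centered]
    target = prod(mads) ** (1 / len(mads))
    return [[v * target / s for v in slide] for slide, s in zip(centered, mads)]


# Hypothetical log ratios from three replicate slides with different spreads.
slides = [
    [-0.2, -0.1, 0.0, 0.1, 0.2],
    [-0.3, -0.1, 0.1, 0.3, 0.5],
    [-0.85, -0.45, -0.05, 0.35, 0.75],
]

for slide in scale_normalize(slides):
    print(round(median(slide), 6), round(mad(slide), 6))
```

After this step each slide has median 0 and the same MAD, which corresponds to boxplots of equal height centered at zero.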

  24. Affymetrix SNP Array. ‘AB’ annotation for an A/C SNP: allele A is base A, allele B is base C. Illustration of SNP annotation on the Affymetrix SNP array. Adapted from Matsuzaki et al. 2004.

  25. Computational Genomics Data: SNP Genotype • Error rate: 1–5% • GIGO: garbage in, garbage out
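
To see why a 1–5% genotype error rate matters, a little arithmetic helps. The sketch below assumes errors are independent across SNPs; the number of genotype calls and the number of SNPs in the window are hypothetical values chosen only for illustration.

```python
def expected_miscalls(n_genotypes: int, error_rate: float) -> float:
    """Expected number of wrong genotype calls among n_genotypes calls."""
    return n_genotypes * error_rate


def prob_all_correct(n_snps: int, error_rate: float) -> float:
    """Probability that all n_snps calls are correct, assuming independent errors."""
    return (1 - error_rate) ** n_snps


# Hypothetical sizes: 100,000 genotype calls, and a 10-SNP window
# such as might be used to infer a haplotype.
for rate in (0.01, 0.05):
    print(
        f"error rate {rate:.0%}: "
        f"expected miscalls = {expected_miscalls(100_000, rate):.0f}, "
        f"P(10-SNP window fully correct) = {prob_all_correct(10, rate):.3f}"
    )
```

Even at the low end of the range, errors accumulate quickly once several SNPs are analyzed jointly, which is the point of "garbage in, garbage out".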

  26. Computational Genomics Data: SNP Genotype

  27. Prospects I: Genome-oriented Medicine. Genetic variation influences: - disease susceptibility - disease progression - therapeutic response - unwanted drug effects. Genetics is pointing the way to personalized medicine… With the development of the human HapMap project, coupled with advanced statistical approaches, we are entering an era of designing personalized medicine based on an individual’s genetic profile.

  28. Whole Genome-wide Association Studies

  29. Whole Genome-wide Association Studies • Successful study: • Wellcome Trust Case-Control Consortium • GWAS on 7 diseases with 14,000 patients and 3,000 common controls. (Nature 2007) • Hypertension, diabetes, etc.
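
At its simplest, a case-control association study tests each SNP for a difference in allele frequency between patients and controls, then corrects for the very large number of tests. The sketch below shows one standard single-SNP test, the allelic chi-square test on a 2×2 table, with a Bonferroni threshold. It is not the analysis used in the study cited on the slide, and the allele counts and number of SNPs are hypothetical.

```python
from math import erfc, sqrt


def allelic_chi_square(case_a: int, case_b: int, ctrl_a: int, ctrl_b: int):
    """Pearson chi-square test (1 df, no continuity correction) on allele counts.

    Rows are cases and controls; columns are counts of alleles A and B.
    Returns (chi-square statistic, p-value). Assumes no row or column total is zero.
    """
    n = case_a + case_b + ctrl_a + ctrl_b
    numerator = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2
    denominator = (
        (case_a + case_b) * (ctrl_a + ctrl_b) * (case_a + ctrl_a) * (case_b + ctrl_b)
    )
    chi2 = numerator / denominator
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts: 2,000 cases (4,000 alleles), 3,000 controls (6,000 alleles).
chi2, p = allelic_chi_square(case_a=1400, case_b=2600, ctrl_a=1800, ctrl_b=4200)

n_snps_tested = 500_000              # hypothetical number of SNPs on the array
bonferroni_alpha = 0.05 / n_snps_tested

print(f"chi2 = {chi2:.2f}, p = {p:.2e}")
print(f"genome-wide significant: {p < bonferroni_alpha}")
```

The Bonferroni threshold is conservative because nearby SNPs are correlated, but it illustrates why genome-wide studies need thousands of subjects to reach significance.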

  30. Recruiting Graduate Students • Epidemiology: the study of the distribution of disease; • Biostatistics: data modeling, computation; • Quantitative Biology Initiative: MSU cross-disciplinary center. • Background: Mathematics, Statistics, Physics, Biology, Chemistry, and others. • Opportunity: Contact your department graduate director/chairman for funding from the Ministry of Education. MSU Epi/Biostatistics provides partial funding and covers tuition fees. • Qualification: TOEFL, GRE, GPA, reference letters. • My contact: fuw@msu.edu www.msu.edu/~fuw • Application: WWW.MSU.EDU

  31. Thank you! • Q and A. • Office: CMS 415.
