Data Analysis & Basic Statistics

Xiao Wu, xiao.wu@yale.edu. Purpose of this workshop: statistics as a useful tool to analyze results; basic terminology and the most commonly used tests; exposure to more advanced statistical tools.


Presentation Transcript


  1. Data analysis & basic statistics • Xiao Wu • xiao.wu@yale.edu

  2. Purpose of this workshop • Statistics as a useful tool to analyze results • Basic terminology and most commonly used tests • Exposure to more advanced statistical tools

  3. Why do we need statistics?

  4. Why do we need statistics? • Summary • Classification • Interpretation • Pattern searching • Abnormality identification • Prediction • Interpolation • Extrapolation

  5. Summary http://www.mymarketresearchmethods.com/descriptive-inferential-statistics-difference/

  6. Summary • Mean, median, mode • Variance, standard deviation • Max, min values and range • Quartiles http://www.mymarketresearchmethods.com/descriptive-inferential-statistics-difference/
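
The summary measures listed on this slide can be computed with Python's standard `statistics` module. This is a minimal sketch; the `measurements` list is made-up illustration data, not taken from the workshop.

```python
import statistics

# Hypothetical measurements, used only to illustrate the summary measures.
measurements = [2, 4, 4, 5, 7, 9, 11]

mean = statistics.mean(measurements)
median = statistics.median(measurements)
mode = statistics.mode(measurements)          # most frequent value
variance = statistics.variance(measurements)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(measurements)      # square root of the sample variance
value_range = max(measurements) - min(measurements)

# Quartiles: the three cut points that split the sorted data into four parts.
q1, q2, q3 = statistics.quantiles(measurements, n=4)

print(f"mean={mean}, median={median}, mode={mode}")
print(f"variance={variance}, sd={std_dev:.3f}, range={value_range}")
print(f"quartiles: Q1={q1}, Q2={q2}, Q3={q3}")
```

Note that `statistics.quantiles` supports more than one interpolation method, so other tools (Excel, R) may report slightly different quartiles for the same data.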

  7. Example Firm A • Mean: $5,800 Firm B • Mean: $5,000

  8. Example Firm A • Mean: $5,800 • Median: $4,000 • SD: $7,270 • 3rd Quartile: $4,000 • 1st Quartile: $500 Firm B • Mean: $5,000 • Median: $5,000 • SD: $203 • 3rd Quartile: $5,175 • 1st Quartile: $4,825

  9. Example

  10. Classification Identification of variables • Independent vs. dependent • Numeric vs. categorical

  11. Pattern searching • Distribution of data • Some commonly used distributions • Uniform • Binomial • Poisson • … • Central limit theorem http://www.mathwave.com/img/art/graphs_pdf2.gif

  12. Uniform • Every outcome has an equal chance • Example: • Flipping a coin • Rolling a die • What if you need to flip multiple times?

  13. Binomial • Two outcomes, probability p and 1-p • Multiple trials: n • Example: • Flipping a coin 100 times • Germination of multiple seeds https://onlinecourses.science.psu.edu/stat414/sites/onlinecourses.science.psu.edu.stat414/files/lesson09/graph_n15_p02.gif
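
As a small illustration of the binomial distribution, the sketch below computes the probability of exactly `k` successes in `n` trials from the standard formula C(n, k) · p^k · (1 − p)^(n − k). The function name `binomial_pmf` is chosen here for illustration.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Flipping a fair coin 100 times: chance of exactly 50 heads.
p_50_heads = binomial_pmf(50, 100, 0.5)

# Sanity check: the probabilities over all possible counts add up to 1.
total = sum(binomial_pmf(k, 100, 0.5) for k in range(101))

print(f"P(exactly 50 heads) = {p_50_heads:.4f}")
print(f"sum over all k = {total:.6f}")
```

Even the single most likely outcome (50 heads) has a probability of only about 8%, which is why results are usually judged by ranges of outcomes rather than one exact count.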

  14. Poisson • Counts of rare, independent events • Events occur at an average rate p (often written λ) • Example: radioactive decay http://kaffee.50webs.com/Science/images/alpha_decay.gif
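
A Poisson probability can be computed directly from the formula rate^k · e^(−rate) / k!. In this sketch the rate of 3 decays per interval is an assumed value for illustration, not a figure from the workshop.

```python
from math import exp, factorial

def poisson_pmf(k: int, rate: float) -> float:
    """Probability of observing exactly k events when the average count is `rate`."""
    return rate**k * exp(-rate) / factorial(k)

rate = 3.0  # hypothetical average number of decays per counting interval

p_zero = poisson_pmf(0, rate)                               # no decay at all
p_at_most_5 = sum(poisson_pmf(k, rate) for k in range(6))   # 0 through 5 decays

print(f"P(0 events) = {p_zero:.4f}")
print(f"P(at most 5 events) = {p_at_most_5:.4f}")
```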

  15. the most important distribution

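  16. Normal distribution • Central limit theorem • The mean of a large sample is approximately normally distributed, whatever the underlying distribution (given independent draws and finite variance) • Large sample size → normal distribution of the sample mean Parameters: • Mean • Standard deviation https://www.mathsisfun.com/data/images/normal-distrubution-large.gif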
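
The central limit theorem can be checked with a short simulation. This sketch draws from a uniform distribution, which is not bell-shaped, and looks at how the sample means behave. The sample size (50), the number of repetitions (2,000) and the seed are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch gives the same result on every run

def sample_mean(n: int) -> float:
    """Mean of n draws from a uniform distribution on [0, 1)."""
    return statistics.fmean(random.random() for _ in range(n))

# 2,000 sample means, each computed from 50 uniform draws.
means = [sample_mean(50) for _ in range(2000)]

center = statistics.fmean(means)   # theory: 0.5
spread = statistics.stdev(means)   # theory: sqrt(1/12) / sqrt(50), about 0.041

# For a normal distribution, about 68% of values lie within one SD of the mean.
within_one_sd = sum(abs(m - center) <= spread for m in means) / len(means)

print(f"center={center:.3f}, spread={spread:.3f}, within 1 SD={within_one_sd:.2%}")
```

The individual draws are flat between 0 and 1, yet the means cluster around 0.5 with roughly the 68% coverage expected of a normal distribution.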

  17. Pattern searching Hypothesis testing • Difference between two populations • Z-test or t-test? • What does p-value mean? • Family-wise error – Bonferroni correction • More than two possibilities • Chi square test • Fisher’s exact test • More than two variables • ANOVA

  18. Example 1 SAT score is related to gender • Null hypothesis • Alternative hypothesis (3 possibilities) • One- or two-tailed? • Z-test or t-test? • p=0.07, conclusion?
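
To make the questions on this slide concrete, here is a minimal sketch of a two-sample Z-test computed from group summaries. All numbers are invented (they are not real SAT data) and were picked so that the two-sided p-value lands near the slide's 0.07. A Z-test assumes large samples. With small samples a t-test would be the usual choice.

```python
from math import erfc, sqrt

def two_sample_z_test(mean1, sd1, n1, mean2, sd2, n2):
    """Return (z, two-sided p-value) for the difference between two group means."""
    standard_error = sqrt(sd1**2 / n1 + sd2**2 / n2)
    z = (mean1 - mean2) / standard_error
    p_two_sided = erfc(abs(z) / sqrt(2))  # P(|Z| >= |z|) if the null hypothesis holds
    return z, p_two_sided

# Hypothetical group summaries: mean score, standard deviation, group size.
z, p = two_sample_z_test(1060, 200, 400, 1035, 195, 400)

alpha = 0.05
reject_null = p < alpha        # two-tailed decision
p_one_sided = p / 2            # valid only if the direction was predicted in advance

print(f"z = {z:.2f}, two-sided p = {p:.3f}, reject null at 0.05: {reject_null}")
print(f"one-sided p = {p_one_sided:.3f}")
```

With these numbers the two-tailed test does not reject the null hypothesis at the 0.05 level, while the one-tailed p-value falls below 0.05. That is why the choice between one and two tails has to be made before looking at the data.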

  19. Example 2 Predictors of stroke • Age • Hypertension • Gender • …

  20. Example 3 Genome-wide association studies • Scanning markers across the DNA of many people to find genetic variations associated with certain diseases
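
Scanning many markers means running many tests, which is where the Bonferroni correction for family-wise error comes in: each p-value is compared against the significance level divided by the number of tests. The p-values and the one-million-marker count in this sketch are hypothetical.

```python
def bonferroni_threshold(alpha: float, num_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise error at alpha."""
    return alpha / num_tests

def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag which p-values stay significant after the Bonferroni correction."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]

# Five hypothetical tests: the per-test threshold becomes 0.05 / 5 = 0.01.
p_values = [0.001, 0.012, 0.03, 0.04, 0.2]
flags = significant_after_bonferroni(p_values)

# A scan over a hypothetical one million markers needs a far stricter threshold.
gwas_threshold = bonferroni_threshold(0.05, 1_000_000)

print(flags)
print(f"per-marker threshold: {gwas_threshold:.1e}")
```

Four of the five p-values are below 0.05 on their own, but only the first survives the correction.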

  21. Pattern searching Hypothesis testing • One variable • Z-test or t-test? • What does p-value mean? • Family-wise error – Bonferroni correction • Compare two categorical variables • Chi square test • Fisher’s exact test • More than two variables • ANOVA

  22. Chi square Punnett Square • A cross between two pea plants yields 880 plants, 639 green, 241 yellow • Hypothesis: The green allele is dominant and both parents are heterozygous. http://www2.lv.psu.edu/jxm57/irp/chisquar.html

  23. Chi Square • 75% green • 25% yellow

  24. Chi square Degrees of freedom: number of categories – 1 = 1
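
The pea-plant example can be worked through in a few lines. This sketch uses the observed counts from the slides (639 green, 241 yellow) and the 75% / 25% split expected under the hypothesis. The p-value shortcut via `erfc` is valid for one degree of freedom only. For more categories a statistics library or a chi-square table would be needed.

```python
from math import erfc, sqrt

observed = {"green": 639, "yellow": 241}
expected_ratio = {"green": 0.75, "yellow": 0.25}  # dominant green, heterozygous parents

total = sum(observed.values())
expected = {color: total * ratio for color, ratio in expected_ratio.items()}

# Chi-square statistic: sum of (observed - expected)^2 / expected over categories.
chi_square = sum(
    (observed[color] - expected[color]) ** 2 / expected[color] for color in observed
)

degrees_of_freedom = len(observed) - 1

# With exactly 1 degree of freedom, P(chi2 >= x) equals erfc(sqrt(x / 2)).
p_value = erfc(sqrt(chi_square / 2))

print(f"expected: {expected}")
print(f"chi-square = {chi_square:.3f}, df = {degrees_of_freedom}, p = {p_value:.3f}")
```

The statistic comes out below 3.84, the 5% critical value for one degree of freedom, so the observed counts are consistent with the 3:1 hypothesis.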

  25. chi square

  26. Prediction • Regression • Linear regression • Multiple linear regression • Accuracy vs. simplicity • Validation • Leave-k-out http://2.bp.blogspot.com/-W7Ptp8uB02U/T8UAGm4Uw5I/AAAAAAAAC08/DcHCtLWXv-U/s1600/actnactn+1.png
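
A simple linear regression with one predictor can be fitted with the ordinary least-squares formulas. This is a minimal sketch on made-up data points, and the helper name `fit_line` is chosen here for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data that roughly follows y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
prediction_at_6 = slope * 6 + intercept  # prediction for an unseen x value

print(f"slope={slope:.2f}, intercept={intercept:.2f}, prediction at x=6: {prediction_at_6:.2f}")
```

Multiple linear regression follows the same idea with several predictors and is normally done with a statistics package rather than by hand.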

  27. Example • Use brain structural measurements to predict a subject’s performance on a picture vocabulary test • 144 total structural measurements • 521 subjects • First step: eliminate unnecessary variables • All zeros? • Highly correlated pairs • Variables that do not correlate well with performance score
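
The variable-elimination step could look roughly like the sketch below. The cutoffs (0.95 for correlated pairs, 0.3 for correlation with the score), the feature names and the data are all assumptions for illustration. The slides do not say which thresholds were used. The sketch also drops any constant variable, which covers the all-zeros case.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def select_features(features, score, pair_cutoff=0.95, score_cutoff=0.3):
    """Return the names of the features that pass all three filters."""
    kept = []
    for name, values in features.items():
        if len(set(values)) == 1:
            continue  # constant (for example all zeros): carries no information
        if abs(pearson(values, score)) < score_cutoff:
            continue  # barely related to the performance score
        if any(abs(pearson(values, features[k])) > pair_cutoff for k in kept):
            continue  # nearly duplicates a feature that was already kept
        kept.append(name)
    return kept

# Hypothetical measurements for six subjects.
score = [1, 2, 3, 4, 5, 6]
features = {
    "area_a": [2, 4, 6, 8, 10, 12],
    "area_b": [2.1, 3.9, 6.2, 8.1, 9.8, 12.2],  # almost identical to area_a
    "empty": [0, 0, 0, 0, 0, 0],
    "noise": [1, 3, 3, 1, 1, 3],
}

kept = select_features(features, score)
print(kept)
```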

  28. Example • Run regression • Validation: leave 1 out and leave 10 out • Principal component analysis • …
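
Leave-k-out validation repeatedly holds back k observations, fits the model on the rest, and measures how well the held-back points are predicted. This sketch holds out consecutive blocks of k points, which is one simple variant (real analyses usually shuffle the data first), and it uses a one-predictor least-squares fit on synthetic data as a stand-in for the workshop's regression.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def leave_k_out_mse(xs, ys, k):
    """Mean squared prediction error when blocks of k points are held out in turn."""
    squared_errors = []
    for start in range(0, len(xs), k):
        held_out = range(start, min(start + k, len(xs)))
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        slope, intercept = fit_line(train_x, train_y)
        squared_errors.extend(
            (ys[i] - (slope * xs[i] + intercept)) ** 2 for i in held_out
        )
    return sum(squared_errors) / len(squared_errors)

# Synthetic data: y = 2x + 1 with a small alternating disturbance.
xs = list(range(20))
ys = [2 * x + 1 + 0.5 * (-1) ** x for x in xs]

leave_1_out = leave_k_out_mse(xs, ys, k=1)
leave_10_out = leave_k_out_mse(xs, ys, k=10)

print(f"leave-1-out MSE = {leave_1_out:.3f}")
print(f"leave-10-out MSE = {leave_10_out:.3f}")
```

Because the error is measured only on points the model never saw, it is a fairer estimate of predictive accuracy than the fit on the training data itself.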

  29. Prediction More complicated models: • Bayesian approach • Use prior knowledge to update prediction • Diffusion weights • Use local structure to predict neighboring values
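
One of the simplest ways to see prior knowledge being updated is the Beta-Binomial model for a success probability, such as the germination rate of seeds. This model is chosen here as an illustration and is not necessarily the Bayesian approach meant on the slide. The prior and the observed counts are hypothetical.

```python
def update_beta(prior_a, prior_b, successes, failures):
    """Posterior Beta parameters after observing binomial data."""
    return prior_a + successes, prior_b + failures

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution: the point prediction for the rate."""
    return a / (a + b)

# Hypothetical prior belief: the rate is around 0.5, held only weakly.
prior = (2, 2)

# Hypothetical new data: 8 of 10 seeds germinated.
posterior = update_beta(*prior, successes=8, failures=2)

prior_mean = beta_mean(*prior)
posterior_mean = beta_mean(*posterior)

print(f"prior mean = {prior_mean:.3f}, posterior mean = {posterior_mean:.3f}")
```

The updated prediction sits between the prior belief (0.5) and the observed frequency (0.8), and it moves closer to the data as more observations arrive.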

  30. Statistical tools • Excel • MATLAB • R • Minitab • …

  31. Questions?

  32. My own research • Cost-effectiveness analysis • Mathematical modeling in medicine • Simulate iterations rather than actual patients

  33. Recent results

  34. Results

  35. Group exercise
