1 / 29

Statistics for Medical Sciences Division - Kanishka Bhattacharya Resources

Explore lectures, lecture notes, and resources for statistics in medical sciences division. Learn about data analysis, visualization, and statistical tests.

natane
Download Presentation

Statistics for Medical Sciences Division - Kanishka Bhattacharya Resources

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Statistics for Medical Sciences Division Kanishka Bhattacharya

  2. Resources • Lectures • Lecture Notes • Andy Field (2009) Discovering Statistics Using SPSS (3rd Edition) Sage publishers • Ramsey and Schafer (2002) The Statistical Sleuth(2ndEdition) Thomson Learning • Email: kanishka@well.ox.ac.uk

  3. Lesson Structure 4 lectures & practical sessions = ( 2 x 4 ) hrs • Overview of statistics, exploratory data analysis • Z-tests, t-tests, Analysis of Variance (ANOVA), post-hoc analysis • Categorical analysis, odds ratio • Linear regression

  4. Overview of Statistics

  5. Scientific Process

  6. Analysing a set of data • Look at the data (initial checks on the data) • Downloading data, formatting, data collection, discrepant data, missing data • Visualize the data (exploratory data analysis) • Descriptive statistics, informative tables, well-constructed figures • Analyse the data (definitive analysis) • Formal statistical analysis • Quantify any interesting results • Report the findings

  7. Computers and Statistics • Excel, SPSS, Minitab, Stata, Mathlab, R, etc… • Advantages • Speed, accuracy, ease of data manipulation • Easy to produce plots, cross-tabulation tables, summary statistics • Disadvantages • Inappropriate analysis / use of wrong tests • Data dredging

  8. Sample Selection • Simple Random Sample • Stratified Sample • Cluster Sampling • Multistage Sampling • Multi-phase Sampling

  9. Types of Variables • Often, test to use depends on the type of variable at hand • Two main classes of variables: • Categorical • Numerical • Categorical variables further divided into two sub-classes: • Nominal categorical (example: gender, ethnic groups) • Ordinal categorical (example: size of a car, quality of teaching)

  10. Numerical Variables • Distinguish between discrete or continuous numerical variables • Discrete • Integer values (number of male subjects, number of episodes of flu outbreaks) • Continuous • Takes a whole range of values (height, weight) • Continuous variables treated as discrete (age)

  11. Exploratory Data Analysis

  12. EDA • Tabular EDA • Univariate tables, cross-tabulation of categorical variables • Numerical EDA • Location, spread, skewness, covariance and correlation • Graphical EDA • Frequency plots, histograms, boxplots, scatterplots • The precise form of EDA depends on the data at hand.

  13. Tabular EDA • Useful for summarising categorical data. For example, the following table shows the classification of 2,555 DNA samples in a case-control study of Malaria onset into the respective phenotypes:

  14. Tabular EDA • For two categorical variables: i.e. the distribution of genotypes for a sub-sample of the individuals affected and unaffected by severe malaria Question: Appears to be more affected individuals with the C allele (or conversely, more unaffected individuals have the AA genotype). Does this mean anything? (see later for test of independence)

  15. Calculating informative numbers which summarise the dataset • What are the numbers useful for describing the age of 1,059 individuals with diabetes? • Location parameters (mean, median, mode) • Skewness Mean age (54.6 years) 20 30 40 50 60 70 80 AGE Numerical EDA • Spread (range, standard deviation, interquartile range)

  16. Numerical EDA • Sample QuartilesQ1: 25th quantile (or value of the 25% ranked data)Q2: 50th quantile (also known as median of data)Q3: 75th quantile (or value of the 75% ranked data) • Interquartile range (IQR) IQR = Q3 – Q1 • Minimum, Maximum of data

  17. Numerical EDA • Numbers can be informative to identify potential problems with the data • Example: Suppose the height for 1,496 individuals randomly sampled from the population produces the following summary IQR = Q3 – Q1 = 188 – 172 = 16 Range = Max – Min = 201 – 0 = 201

  18. Numerical EDA Skewness b1 is unit-free, where 0 indicates symmetry about sample mean. Positive values indicate right-skewness, and negative values indicate left-skewness.

  19. Covariance and Correlation • Two numerical variables: height and weight • Questions • Are there any relationship between these variables? • If there is, how do we quantify this relationship? • Covariance and correlation Measures the degree of association between two numerical variables.

  20. Covariance and Correlation • Covariance is scale-dependent, and correlation is unit-free. • More intuitive to interpret correlation than covariance. • Example: Covariance for height and weight is 2.4 when assessed using metres and kilograms, but 240,000 when assessed using centimetres and grams. Correlation is a constant value at 0.83 for both scenario. • Correlation always bounded between –1 and 1 inclusive.

  21. Example

  22. Graphical EDA • Visual summaries of the data • Flagging outliers, obvious relationships, check for distribution

  23. Boxplots • Univariate boxplot: for 1 numerical variable Ends of box: Q1 and Q3 Length of box: IQR White line: Sample median Whiskers: 1.5 times IQR Lines outside whiskers: Outliers Circles: Extreme outliers

  24. Boxplots • Multivariate boxplots: for 1 numerical variable across different levels of a categorical variable • Graphical comparison

  25. Scatterplots • Graphical representation for 2 numerical variables

  26. Scatterplots

  27. Scatterplots

  28. Lecture Notes and Datasets: http://www.well.ox.ac.uk/lectures

  29. Login: calPassword: warthog

More Related