
Statistics for Non-Statisticians

Statistics for Non-Statisticians. Kay M. Larholt, Sc.D. Vice President, Biometrics & Clinical Operations Abt Bio-Pharma Solutions. Topics. 1) Basic Statistical Concepts 2) Study Design 3) Blinding and Randomization 4) Hypothesis testing 5) Power and Sample Size. Basic Statistical Concepts.


Presentation Transcript


  1. Statistics for Non-Statisticians Kay M. Larholt, Sc.D. Vice President, Biometrics & Clinical Operations Abt Bio-Pharma Solutions

  2. Topics 1) Basic Statistical Concepts 2) Study Design 3) Blinding and Randomization 4) Hypothesis testing 5) Power and Sample Size

  3. Basic Statistical Concepts

  4. Statistics Per the American Heritage dictionary - “The mathematics of the collection, organization, and interpretation of numerical data, especially the analysis of population characteristics by inference from sampling.” • Two broad areas • Descriptive – Science of summarizing data • Inferential – Science of interpreting data in order to make estimates, hypothesis testing, predictions, or decisions from the sample to target population.

  5. Introduction to Clinical Statistics • Statistics - The science of making decisions in the face of uncertainty • Probability - The mathematics of uncertainty • The probability of an event is a measure of how likely the event is to happen

  6. Sample versus Population

  7. Clinical Statistics • Biostatisticians are statisticians who apply statistics to the biological sciences. • Clinical statistics are statistics that are applied to clinical trials

  8. Basic Statistical Concepts • Types of data • Descriptive statistics • Graphs • Basic probability concepts • Type of probability distributions in clinical statistics • Sample vs. population

  9. Types of Data

  10. Types of Quantitative Variables

  11. Continuous Data • Data should be collected in its “rawest” form. We can always categorize data later. (We can never “uncategorize” data.) • e.g. If you measure prostate size as part of the clinical trial, then capture the size in mm on the CRF. • We can categorize later into: 0-20 mm, 21-40 mm, etc.
      Patient | Size (mm) | Category
      1 | 24 | Between 21 and 40
      2 | 45 | Between 41 and 60
      3 | 26 | Between 21 and 40
      4 | 23 | Between 21 and 40
      5 | 67 | Between 61 and 80
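
The point that raw measurements can always be categorized later (but not the reverse) can be sketched in a few lines of Python. This is only an illustration: the function name `categorize` and the `width` parameter are hypothetical, and the 20 mm bins (0-20, 21-40, ...) are assumed from the categories shown on the slide.

```python
# Raw sizes in mm, in the row order shown on the slide.
sizes_mm = [24, 45, 26, 23, 67]

def categorize(size_mm, width=20):
    """Return a bin label such as '21-40' for a whole-number size in mm.

    Assumed bins: 0-20, 21-40, 41-60, ... ('width' is a hypothetical parameter).
    """
    if size_mm <= width:
        return f"0-{width}"
    lower = ((size_mm - 1) // width) * width + 1
    return f"{lower}-{lower + width - 1}"

# Going from raw values to categories is easy...
categories = [categorize(size) for size in sizes_mm]
print(categories)
# ...but the category '21-40' alone cannot tell us whether the size was 24, 26 or 23.
```

Had only the categories been recorded on the CRF, a different choice of bins could no longer be applied afterwards.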

  12. Basic Data Summarization Techniques • The objective of data summarization is to describe the characteristics of a data set. Ultimately, we want to make the data set more comprehensible and meaningful. • To put data in a concise form, use • Summary descriptive statistics • Graphs • Tables

  13. Descriptive Statistics for Continuous Variables • Measures of central tendency: Mean, Median, Mode • Measures of dispersion: Range, Variance, Standard deviation • Measures of relative standing: Lower quartile (Q1), Upper quartile (Q3), Interquartile range (IQR)

  14. Mean • Arithmetic average: sum of all observations divided by the number of observations. • Example: The average age of a group of 10 people is 24.2 years. Who are they?

  15. Mean • Answer: They could be ten “twenty-somethings” who go out to dinner together: Pete aged 24, Jane aged 26, Louise aged 21, Bob aged 22, Julie aged 23, Sue aged 22, Jenn aged 27, John aged 28, Jeff aged 20 and Mark aged 29. • The mean age for these 10 people is: (24+26+21+22+23+22+27+28+20+29)/10 = 24.2 years

  16. Mean • Or alternatively: They could be Mr. & Mrs. Smith and their 8 grandchildren: Susie aged 3, Abby aged 5, Max aged 8, Laura aged 10, Joshua aged 10, Emma aged 12, Jane aged 13, Sarah aged 18, Mrs. Smith aged 80, Mr. Smith aged 83. • The mean age for these 10 people is: (3+5+8+10+10+12+13+18+80+83)/10 = 24.2 years

  17. Mean • Presenting the average alone does not give you much information about the data you are looking at.
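
As a quick check of the two examples above, the following sketch computes the mean for both groups with Python's standard `statistics` module. The variable names `friends` and `family` are just labels chosen for this illustration.

```python
from statistics import mean

# Ages from the two examples: ten "twenty-somethings" and the Smith family.
friends = [24, 26, 21, 22, 23, 22, 27, 28, 20, 29]
family = [3, 5, 8, 10, 10, 12, 13, 18, 80, 83]

for name, ages in (("friends", friends), ("family", family)):
    # Same mean, very different spread between youngest and oldest.
    print(name, "mean:", mean(ages), "min:", min(ages), "max:", max(ages))
```

Both groups have the same mean even though their youngest and oldest members differ enormously, which is why the mean alone says little about the data.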

  18. Median • The midpoint of the values after they have been ordered from the smallest to the largest, or the largest to the smallest. • There are as many values above the median as below it in the data array.

  19. Median • Example • The age of the people in our data set is: • 24, 26, 21, 23, 22, 27, 28, 20, 29 (I took out one of the 22 year olds to make this example easier) • Arranging the data in ascending order gives: • 20, 21, 22, 23, 24, 26, 27, 28, 29 • The median is 24
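
The same steps (sort, then take the middle value) can be sketched in Python. This is a minimal illustration for an odd number of observations; the standard library's `statistics.median` is shown alongside as a cross-check.

```python
from statistics import median

ages = [24, 26, 21, 23, 22, 27, 28, 20, 29]

# Step 1: arrange the data in ascending order.
ordered = sorted(ages)

# Step 2: with an odd number of values, the median is the single middle one.
middle = ordered[len(ordered) // 2]

print(ordered)
print("middle value:", middle, "statistics.median:", median(ages))
```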

  20. There are three kinds of lies: lies, damned lies, and statistics. • This well-known saying is part of a phrase attributed to Benjamin Disraeli and popularized in the U.S. by Mark Twain

  21. Median Home Price Connecticut: Darien • Median home price: $1,295,000 • Location: about 40 miles northeast of midtown Manhattan • Population: 20,209, households 6,592

  22. Properties of Mean and Median • There are unique means and medians for each variable in the data set. • Median is not affected by extremely large or small values and is therefore a valuable measure of central tendency when such values occur. • Mean is a poor measure of central tendency in skewed distributions.

  23. Mode • The value of the observation that appears most frequently. • Example: The exam scores for ten students are: 81, 93, 84, 75, 68, 87, 81, 75, 81, 87. Since the score of 81 occurs the most, the modal score is 81.
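
Finding the mode amounts to counting how often each value occurs. A minimal sketch using `collections.Counter` from the Python standard library:

```python
from collections import Counter

scores = [81, 93, 84, 75, 68, 87, 81, 75, 81, 87]

# Count how many times each score appears.
counts = Counter(scores)

# The mode is the score with the highest count.
modal_score, frequency = counts.most_common(1)[0]
print("modal score:", modal_score, "occurs", frequency, "times")
```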

  24. Averages and What Else? • As we have seen, just knowing the mean or even the median of a data set does not tell us enough about the data. We need more information to really describe the data.

  25. Measures of Dispersion • Once we know something about the centre of the data we need to understand how the data are dispersed around this centre. • How variable are the data?

  26. Range • Maximum value in the data set minus minimum value in the data set • 1. The age of the patients in our data set is: 21, 25, 19, 20, 22. Range = 25 – 19 = 6 • 2. The age of the patients in our data set is: 21, 45, 19, 20, 22. Range = 45 – 19 = 26 • When max and min are unusual values, range may be a misleading measure of dispersion. • The range only uses the 2 extreme values in the data.
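
The two range calculations can be reproduced with a short Python sketch. The helper name `value_range` is hypothetical, chosen to avoid clashing with Python's built-in `range`.

```python
def value_range(values):
    """Range = maximum value minus minimum value."""
    return max(values) - min(values)

ages_1 = [21, 25, 19, 20, 22]
ages_2 = [21, 45, 19, 20, 22]  # same data, but one patient is 45 instead of 25

print("range 1:", value_range(ages_1))
print("range 2:", value_range(ages_2))
```

Changing a single value moves the range from 6 to 26, which illustrates how sensitive it is to one unusual observation.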

  27. Variance and Standard Deviation • The variance of a data set measures how far each data point is from the mean of the data set. • It provides a measure of how spread out the data points are • The Standard Deviation is the square root of the variance

  28. Variance and Standard Deviation • Variance: • Measure of dispersion, the average of the squared deviations of the data from the mean (dividing by n-1 for a sample) • Standard deviation: • positive square root of the variance • Small std dev: • observations are clustered tightly around the mean • Large std dev: • observations are scattered widely about the mean

  29. Standard Deviation • 1. Subtract the mean of the observations from each observation • 2. Square each difference • 3. Sum up all the squared differences • 4. Divide by n-1 • 5. Take the square root

  30. Example: Standard Deviation • 1. The age of the patients in our data set is: 21, 25, 19, 20, 22 • Mean = 21.4, Median = 21, StdDev = 2.302 • 2. The age of the patients in our data set is: 21, 45, 19, 20, 22 • Mean = 25.4, Median = 21, StdDev = 11.014
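
The five steps from the Standard Deviation slide can be written out directly. The sketch below uses a hypothetical helper name, `sample_std_dev`, and applies it to both example data sets; `statistics.stdev` from the standard library uses the same n-1 divisor and serves as a cross-check.

```python
import math
from statistics import stdev

def sample_std_dev(values):
    """Sample standard deviation, following the five steps on the slide."""
    n = len(values)
    mean = sum(values) / n
    # Steps 1-2: deviation of each observation from the mean, squared.
    squared_deviations = [(x - mean) ** 2 for x in values]
    # Steps 3-4: sum the squares and divide by n - 1 (this is the variance).
    variance = sum(squared_deviations) / (n - 1)
    # Step 5: take the square root.
    return math.sqrt(variance)

ages_1 = [21, 25, 19, 20, 22]
ages_2 = [21, 45, 19, 20, 22]

for ages in (ages_1, ages_2):
    print(ages, "std dev:", round(sample_std_dev(ages), 3),
          "library:", round(stdev(ages), 3))
```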

  31. Choosing an Appropriate Method of Central Tendency • The mean is ordinarily the preferred measure of central tendency. The mean should always be presented along with the variance or the standard deviation • There are situations when a median might be more appropriate: • - a skewed distribution • - a small number of subjects

  32. Measures of Relative Standing • Descriptive measures that locate the relative position of an observation in relation to the other observations.

  33. Measures of Relative Standing • The pth percentile is a number such that p% of the observations of the data set fall below and (100-p)% of the observations fall above it. • Lower quartile = 25th percentile (Q1) • Mid-quartile = 50th percentile (median or Q2) • Upper quartile = 75th percentile (Q3) • Interquartile range (IQR = Q3-Q1)

  34. Measures of Relative Standing… an Example • 1. The age of the patients in our data set is: 21, 25, 19, 20, 22 • Q1 = 20, Q2 = 21, Q3 = 22, IQR = 2 • 2. The age of the patients in our data set is: 21, 45, 19, 20, 22 • Q1 = 20, Q2 = 21, Q3 = 22, IQR = 2
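
The quartiles in this example can be reproduced with `statistics.quantiles` (Python 3.8+). Note that several conventions exist for computing quartiles on small samples; the sketch assumes the `"inclusive"` method, which happens to match the values on the slide, and other methods may give slightly different numbers.

```python
from statistics import quantiles

ages_1 = [21, 25, 19, 20, 22]
ages_2 = [21, 45, 19, 20, 22]

def quartile_summary(values):
    """Return (Q1, Q2, Q3, IQR), assuming the 'inclusive' quartile method."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q2, q3, q3 - q1

print(quartile_summary(ages_1))
print(quartile_summary(ages_2))
```

Both data sets give the same quartiles and IQR, even though their ranges and standard deviations differ, because the single extreme value (45) does not affect the middle of the ordered data.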

  35. Definitions • Statistics - The science of making decisions in the face of uncertainty • Probability - The mathematics of uncertainty • The probability of an event is a measure of how likely the event is to happen

  36. Basic Probability Concepts Sample spaces and events Simple probability Joint probability

  37. Sample Spaces • Collection of all possible outcomes Example: All six faces of a die Example: All 52 cards in a deck

  38. Sample Space Gumballs in a gumball machine 60 red 50 green 40 yellow 30 white 25 pink 20 blue 16 purple Total: 241 gumballs

  39. Events • Simple event: Outcome from a sample space with one characteristic • Examples: A red card from a deck of cards • A purple gumball from the gumball machine • Joint event: Involves two outcomes simultaneously • Example: An ace that is also red from a deck of cards

  40. Events • Mutually exclusive events: Two events that cannot occur together • Example: Drawing one card from a deck • A: Drawing a queen of diamonds • B: Drawing a queen of clubs • Only one of these can happen on a single draw, so events A and B are mutually exclusive

  41. Probability • Probability is the numerical measure of the likelihood that an event will occur • Value is between 0 and 1 • Scale: 0 (Impossible), .5, 1 (Certain)

  42. Computing Probabilities • The probability of an event E: $P(E) = \frac{\text{number of event outcomes}}{\text{total number of possible outcomes in the sample space}}$ • Assumes each of the outcomes in the sample space is equally likely to occur

  43. Computing Probabilities • Example: What is the probability of rolling a 4 when you roll a die? • # of possible outcomes in the sample space = 6 • # of 4s in the sample space = 1 • Prob (rolling a 4 when you roll a die) = 1/6
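
The counting rule for P(E) can be sketched with exact fractions. The helper `probability` is a hypothetical name; the gumball counts come from the earlier Sample Space slide, and applying the rule to them assumes each individual gumball is equally likely to come out.

```python
from fractions import Fraction

def probability(event_outcomes, total_outcomes):
    """P(E) = event outcomes / total outcomes, assuming equally likely outcomes."""
    return Fraction(event_outcomes, total_outcomes)

# Rolling a 4 with one die.
die_faces = [1, 2, 3, 4, 5, 6]
p_four = probability(die_faces.count(4), len(die_faces))

# Drawing a purple gumball from the machine on the Sample Space slide.
gumballs = {"red": 60, "green": 50, "yellow": 40, "white": 30,
            "pink": 25, "blue": 20, "purple": 16}
p_purple = probability(gumballs["purple"], sum(gumballs.values()))

print("P(4) =", p_four)
print("P(purple) =", p_purple)
```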

  44. Computing Probabilities • Example: What is the probability of rolling a six and a four when you roll 2 dice? • # of possible outcomes in the sample space = 36 • # of ways to roll one 6 and one 4 = 2 • P(one 6 and one 4) = 2/36 ≈ 0.0556
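
For two dice, the sample space is small enough to enumerate. The sketch below lists all 36 ordered outcomes with `itertools.product` and counts those showing one 6 and one 4; treating the dice as distinguishable (so that (4, 6) and (6, 4) are separate outcomes) is what gives the count of 2.

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes (first die, second die): 6 x 6 = 36.
rolls = list(product(range(1, 7), repeat=2))

# Outcomes showing one 6 and one 4, in either order.
favourable = [roll for roll in rolls if sorted(roll) == [4, 6]]

p_six_and_four = Fraction(len(favourable), len(rolls))
print(favourable)
print("P =", p_six_and_four, "=", round(float(p_six_and_four), 4))
```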

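  45. Computing Joint Probability • The probability of a joint event, A and B: $P(A \text{ and } B) = \frac{\text{number of outcomes in both } A \text{ and } B}{\text{total number of possible outcomes in the sample space}}$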

  46. Computing Joint Probability • $P(\text{Red Card and an Ace}) = \frac{\text{2 red aces}}{\text{52 cards in total}} = \frac{2}{52} = \frac{1}{26}$
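
The red-ace example can be checked by building a standard 52-card deck and counting the cards that satisfy both conditions at once. The rank and suit labels below are illustrative choices, not taken from the slides.

```python
from fractions import Fraction

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suit_colours = {"hearts": "red", "diamonds": "red",
                "clubs": "black", "spades": "black"}

# Sample space: every (rank, suit) combination, 13 x 4 = 52 cards.
deck = [(rank, suit) for suit in suit_colours for rank in ranks]

# Joint event: the card is an ace AND the card is red.
red_aces = [(rank, suit) for rank, suit in deck
            if rank == "A" and suit_colours[suit] == "red"]

p_red_ace = Fraction(len(red_aces), len(deck))
print(red_aces, "P =", p_red_ace)
```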

  47. Type of Probability Distributions in Clinical Statistics Bernoulli Binomial Normal

  48. Bernoulli Distribution • The Bernoulli distribution is the “coin flip” distribution. • X is Bernoulli if its probability function is: $P(X = 1) = p$, $P(X = 0) = 1 - p$ • Examples: X=1 for heads in coin toss • X=1 for male in survey • X=1 for defective in a test of product

  49. Binomial Distribution • The binomial distribution is just n independent Bernoulli variables added up. • It is the number of “successes” in n trials. • Probability of success is usually denoted by p, and therefore probability of failure is 1-p. • Example: Number of heads when we flip a coin 10 times. Here n = 10, p = 0.5 (the probability of getting a head when we toss the coin once).

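  50. Binomial Distribution • The binomial probability function: $P(X = k) = \frac{n!}{k!\,(n-k)!}\, p^k (1-p)^{n-k}$ for $k = 0, 1, \dots, n$ • Example: X = number of heads when we flip a coin 10 times. Here X ~ Binomial(n = 10, p = 0.5) • n! = n factorial = n × (n-1) × (n-2) × … × 1 • 10! = 10 × 9 × 8 × 7 × 6 × 5 × 4 × 3 × 2 × 1 = 3,628,800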
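
As an illustration of the coin-flip example, the sketch below evaluates the standard binomial probability function, P(X = k) = n! / (k! (n-k)!) × p^k × (1-p)^(n-k), for n = 10 and p = 0.5. The function name `binomial_pmf` is hypothetical; `math.comb` (Python 3.8+) computes n! / (k! (n-k)!).

```python
from math import comb, factorial

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.5

print("10! =", factorial(10))
for k in range(n + 1):
    # Probability of exactly k heads in 10 flips of a fair coin.
    print(k, "heads:", binomial_pmf(k, n, p))

total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
print("sum over all k:", total)
```

The probabilities are symmetric around 5 heads, and summing over all possible values of k gives 1, as any probability function must.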
