
An Introduction to Statistics


tristaj




Presentation Transcript


  1. An Introduction to Statistics

  2. Two Branches of Statistical Methods • Descriptive statistics • Techniques for describing data in abbreviated, symbolic fashion • Inferential statistics • Drawing inferences based on data. Using statistics to draw conclusions about the population from which the sample was taken.

  3. Populations and Samples • A parameter is a characteristic of a population • e.g., the average height of all Americans. • A statistic is a characteristic of a sample • e.g., the average height of a sample of Americans. • Inferential statistics infer population parameters from sample statistics • e.g., we use the average height of the sample to estimate the average height of the population

  4. Descriptive Statistics: Numerical Data Properties • Central Tendency: Mean, Median, Mode • Variation: Range, Interquartile Range, Standard Deviation, Variance • Shape: Skewness, Kurtosis

  5. Ordering the Data: Frequency Tables • Frequency table (distribution) • A listing in order of magnitude of each score achieved and the number of times the score occurred. • Grouped frequency table (distribution) • Range of scores in each of several equally sized intervals • Why Frequency Tables? • Gives some order to a set of data • Can examine data for outliers • Is an introduction to distributions

  6. Frequency Tables

  7. Grouped Frequency Tables

     Range     Number   Percent   Cumulative Percent
     30-39        1       2.08          2.08
     40-49        3       6.25          8.33
     50-59        4       8.33         16.67
     60-69       12      25.00         41.67
     70-79       19      39.58         81.25
     80-89        7      14.58         95.83
     90-100       2       4.17        100.00
     Total       48     100.00
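The Percent and Cumulative columns above can be recomputed from the counts alone. The following is a minimal sketch in Python; the function name `grouped_table` is made up for illustration, and the counts are copied from the table.

```python
# Counts per interval, copied from the grouped frequency table above.
intervals = ["30-39", "40-49", "50-59", "60-69", "70-79", "80-89", "90-100"]
counts = [1, 3, 4, 12, 19, 7, 2]


def grouped_table(labels, counts):
    """Return rows of (label, count, percent, cumulative percent)."""
    total = sum(counts)
    rows = []
    running = 0  # running count, so rounding errors do not accumulate
    for label, count in zip(labels, counts):
        running += count
        percent = round(100 * count / total, 2)
        cumulative = round(100 * running / total, 2)
        rows.append((label, count, percent, cumulative))
    return rows


rows = grouped_table(intervals, counts)
for label, count, percent, cumulative in rows:
    print(f"{label:>6} {count:>4} {percent:>7.2f} {cumulative:>7.2f}")
```

The cumulative percent is computed from the running count rather than by adding rounded percentages, which is why the last row comes out at exactly 100.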

  8. Making a Frequency Table • List each possible value, from highest to lowest • Go one by one through the scores, making a mark for each score next to its value on the list • Make a table showing how many times each value on your list was used • Calculate the percentage of scores for each value
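The four steps for making a frequency table can be sketched in Python as follows. This is an illustration only: the function name `frequency_table` and the example scores are made up, and the sketch assumes whole-number scores.

```python
from collections import Counter


def frequency_table(scores):
    """Return (value, count, percent) rows, highest value first."""
    tally = Counter(scores)  # one "mark" per score
    n = len(scores)
    rows = []
    # List every possible value from highest to lowest, including unused ones.
    for value in range(max(scores), min(scores) - 1, -1):
        count = tally[value]
        rows.append((value, count, 100 * count / n))
    return rows


scores = [7, 8, 8, 5, 7, 8, 10, 7, 8, 5]  # made-up example data
table = frequency_table(scores)
for value, count, percent in table:
    print(f"{value:>3} {count:>3} {percent:>6.1f}%")
```

Values that never occur (9 and 6 in this example) still get a row with a count of zero, which makes gaps and outliers easy to spot.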

  9. Making a Stem-and-Leaf Plot • Each data point is broken down into a “stem” and a “leaf.” • Select one or more leading digits for the stem values; the trailing digit(s) become the leaves • First, “stems” are aligned in a column. • Record the leaf for every observation beside the corresponding stem value
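A minimal Python sketch of this procedure is shown below. It assumes non-negative whole-number scores, uses the tens digit(s) as the stem and the ones digit as the leaf, and the function name `stem_and_leaf` and the data are chosen here for illustration.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (leading digits) to its sorted list of leaves (last digit)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)


data = [21, 23, 24, 32, 32, 33, 36, 43, 48, 48, 52, 55]  # illustrative scores
plot = stem_and_leaf(data)
for stem in sorted(plot):
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))
```

Because the values are sorted before they are split, the leaves beside each stem come out in increasing order.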

  10. Stem and Leaf Plot • Stem-and-leaf of Shoes, N = 139, Leaf Unit = 1.0

       12   0  223334444444
       63   0  555555555555566666666677777778888888888888999999999
      (33)  1  000000000000011112222233333333444
       43   1  555555556667777888
       25   2  0000000000023
       12   2  5557
        8   3  0023
        4   3
        4   4  00
        2   4
        2   5  0
        1   5
        1   6
        1   6
        1   7
        1   7  5

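  11. Stem and Leaf / Histogram • Stem | Leaf • 2 | 1 3 4 • 3 | 2 2 3 6 • 4 | 3 8 8 • 5 | 2 5 • By rotating the stem-and-leaf plot, we can see the shape of the distribution of scores. • [Figure: the same plot rotated, with stems 2, 3, 4, 5 along the horizontal axis and the leaves stacked above each stem]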

  12. Histograms • Depict information from a frequency table or a grouped frequency table as a bar graph

  13. Frequency Polygons • Depict information from a frequency table or a grouped frequency table as a line graph

  14. Shapes of Frequency Distributions • Frequency tables, histograms & polygons describe how the frequencies are distributed • Distributions are a fundamental concept in statistics • Unimodal: one peak • Bimodal: two peaks

  15. Symmetrical vs. Skewed Frequency Distributions • Symmetrical distribution • Approximately equal numbers of observations above and below the middle • Skewed distribution • One side is more spread out than the other, like a tail • Direction of the skew • Positive or negative (right or left) • Named for the side with the fewer scores • The side that looks like a tail

  16. Symmetrical vs. Skewed • [Figure: three distributions labeled Symmetric, Skewed Right, and Skewed Left]

  17. Positively Skewed • In a positively skewed distribution, scores cluster towards the low end of the variable

  18. Skewed Frequency Distributions • Positively skewed • AKA Skewed right • Tail trails to the right

  19. Negatively Skewed • In a negatively skewed distribution, scores cluster towards the high end of the variable

  20. Skewed Frequency Distributions • Negatively skewed • Skewed left • Tail trails to the left
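The direction of skew can also be checked numerically. The slides do not give a formula, so the sketch below assumes one common definition, the moment-based coefficient (third central moment divided by the 1.5 power of the second central moment). The function name `skewness` and the three data sets are made up for illustration.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (an assumed, standard definition)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]
skewed_right = [1, 2, 2, 3, 3, 4, 15]      # tail trails to the right
skewed_left = [1, 12, 13, 13, 14, 14, 15]  # tail trails to the left

for name, data in [("symmetric", symmetric),
                   ("skewed right", skewed_right),
                   ("skewed left", skewed_left)]:
    print(f"{name}: {skewness(data):+.2f}")
```

A positive result corresponds to a positively (right) skewed distribution, a negative result to a negatively (left) skewed one, and a value near zero to a roughly symmetrical one.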

  21. Kurtosis • How peaked or flat the curve is • Leptokurtic: high and thin • Mesokurtic: normal shape • Platykurtic: flat and spread out

  22. Comparing the Kurtosis of Three Curves Curve A: Mesokurtic (Intermediate)

  23. Comparing the Kurtosis of Three Curves • Curve B: Leptokurtic (High & Peaked) • Curve A: Mesokurtic (Intermediate)

  24. Comparing the Kurtosis of Three Curves • Curve B: Leptokurtic (High & Peaked) • Curve A: Mesokurtic (Intermediate) • Curve C: Platykurtic (Broad & Flat)

  25. The Normal Curve • Seen often in the social sciences and in nature generally • Characteristics • Bell-shaped • Unimodal • Symmetrical • Average tails

  26. Central Tendency • Measures of central tendency give information concerning the average or typical score in a set of scores • mean • median • mode

  27. Central Tendency: The Mean • The Mean is a measure of central tendency • What most people mean by “average” • Sum of a set of numbers divided by the number of numbers in the set

  28. Central Tendency: The Mean • If X is the set of scores and • N = the number of numbers in X (10 for this example) • then the mean is $M = \frac{\sum X}{N}$

  29. Central Tendency: The Mean • Important conceptual point: • The mean is the balance point of the data in the sense that if we took each individual score (X) and subtracted the mean from it, some of the differences would be positive and some negative. If we add all of those up we will get zero. • Also, the sum of the absolute values of the negative deviations is equal to the sum of the positive deviations
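The balance-point property can be checked with a few lines of Python. The data set below is made up for illustration and was chosen so that the mean is a whole number.

```python
X = [2, 4, 4, 5, 10]  # made-up scores

mean = sum(X) / len(X)               # sum of the numbers divided by how many there are
deviations = [x - mean for x in X]   # each score minus the mean

below = sum(-d for d in deviations if d < 0)  # total distance below the mean
above = sum(d for d in deviations if d > 0)   # total distance above the mean

print("mean:", mean)
print("deviations:", deviations)
print("sum of deviations:", sum(deviations))
print("below:", below, "above:", above)
```

With real-valued data the sum of deviations may differ from zero by a tiny floating-point error, so a tolerance-based comparison is safer than an exact one in general.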

  30. Central Tendency: The Median • Middlemost or most central item in the set of ordered numbers; it separates the distribution into two equal halves • If odd n, middle value of sequence • if X = [1,2,4,6,9,10,12,14,17] • then 9 is the median • If even n, average of 2 middle values • if X = [1,2,4,6,9,10,11,12,14,17] • then 9.5 is the median; i.e., (9+10)/2 • Median is not affected by extreme values
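The odd and even cases can be written out directly. This is a small sketch using the two data sets from the slide; the function name `median` is chosen here, and Python's standard library offers `statistics.median` for the same purpose.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_set = [1, 2, 4, 6, 9, 10, 12, 14, 17]
even_set = [1, 2, 4, 6, 9, 10, 11, 12, 14, 17]
with_outlier = [1, 2, 4, 6, 9, 10, 12, 14, 1700]  # largest score made extreme

print(median(odd_set))       # 9
print(median(even_set))      # 9.5
print(median(with_outlier))  # 9
```

Replacing 17 with 1700 leaves the median unchanged, which illustrates why the median is not affected by extreme values.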

  31. Central Tendency: The Mode • The mode is the most frequently occurring number in a distribution • if X = [1,2,4,7,7,7,8,10,12,14,17] • then 7 is the mode • Mode is not affected by extreme values • There may be no mode or several modes
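A short Python sketch for finding the mode or modes follows. The function name `modes` is made up, and treating a data set in which every value occurs exactly once as having no mode is a convention assumed here, not something the slides specify.

```python
from collections import Counter


def modes(values):
    """Return all most-frequent values, or [] if every value occurs only once."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1 and len(counts) > 1:
        return []  # assumed convention: no value repeats, so no mode
    return sorted(v for v, c in counts.items() if c == top)


X = [1, 2, 4, 7, 7, 7, 8, 10, 12, 14, 17]
print(modes(X))                # [7]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]
print(modes([1, 2, 3]))        # []
```

Returning a list covers all three situations on the slide: one mode, several modes, or none.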

  32. Mean, Median, Mode • [Figure: positions of the mean, median, and mode in three distributions: Symmetric (Not Skewed), Positively Skewed, and Negatively Skewed]

  33. When to Use What • The mean is a great measure, but there are times when its usage is inappropriate or impossible. • Nominal data: Mode • The distribution is bimodal: Mode • You have ordinal data: Median or mode • There are a few extreme scores: Median

  34. Measures of Central Tendency Overview • Mean • Median: midpoint of ranked values • Mode: most frequently observed value

  35. Variability • How tightly clustered or how widely dispersed the values are in a data set. • Example • Data set 1: [0,25,50,75,100] • Data set 2: [48,49,50,51,52] • Both have a mean of 50, but data set 1 clearly has greater variability than data set 2.

  36. Variability: The Range • The Range is one measure of variability • The range is the difference between the maximum and minimum values in a set • Example • Data set 1: [0,25,50,75,100]; R: 100-0 = 100 • Data set 2: [48,49,50,51,52]; R: 52-48 = 4 • The range ignores how data are distributed and only takes the extreme scores into account
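The range takes one line to compute. The sketch below uses the two data sets from the slide plus a third, made-up set to show that the range only looks at the extremes; the function name `value_range` is chosen here.

```python
def value_range(values):
    """Difference between the maximum and minimum values."""
    return max(values) - min(values)


data_set_1 = [0, 25, 50, 75, 100]
data_set_2 = [48, 49, 50, 51, 52]
clustered = [0, 50, 50, 50, 100]  # made-up: mostly identical scores, same extremes

print(value_range(data_set_1))  # 100
print(value_range(data_set_2))  # 4
print(value_range(clustered))   # 100
```

The third data set is far more tightly clustered than the first, yet both have a range of 100, because only the two extreme scores enter the calculation.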

  37. Quartiles • Split ordered data into 4 quarters, each holding 25% of the observations • Q1 = first quartile • Q2 = second quartile = Median • Q3 = third quartile

  38. Variability: Interquartile Range • Difference between third & first quartiles • Interquartile Range = Q3 - Q1 • Spread in middle 50% • Not affected by extreme values
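A possible Python sketch of Q3 - Q1 is given below. Several conventions exist for locating quartiles in small samples and the slides do not say which one they use; this sketch assumes the "inclusive" method of the standard library's `statistics.quantiles`. The function name `interquartile_range` and the third data set are made up.

```python
import statistics


def interquartile_range(values):
    """Q3 - Q1, with quartiles located by the 'inclusive' method (an assumption)."""
    q1, _q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


data_set_1 = [0, 25, 50, 75, 100]
data_set_2 = [48, 49, 50, 51, 52]
with_outlier = [0, 25, 50, 75, 1000]  # made-up: largest score made extreme

print(interquartile_range(data_set_1))    # 50.0
print(interquartile_range(data_set_2))    # 2.0
print(interquartile_range(with_outlier))  # 50.0
```

Changing the largest score from 100 to 1000 leaves the result unchanged, since the interquartile range depends only on the middle 50% of the data.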

  39. Variability: Standard Deviation • “The Standard Deviation tells us approximately how far the scores vary from the mean on average” The typical deviation in a given distribution

  40. Variability: Standard Deviation • The standard deviation is calculated as the square root of the sum of squares (SS) divided by N: $SD = \sqrt{SS / N}$, where $SS = \sum (X - M)^2$

  41. Variability: Standard Deviation • let X = [3, 4, 5, 6, 7] • M = 5 • (X - M) = [-2, -1, 0, 1, 2] (subtract M from each number in X) • $(X - M)^2$ = [4, 1, 0, 1, 4] (squared deviations from the mean) • $\sum (X - M)^2 = 10$ (sum of squared deviations from the mean, SS) • $\sum (X - M)^2 / N = 10/5 = 2$ (average squared deviation from the mean) • $\sqrt{\sum (X - M)^2 / N} = \sqrt{2} = 1.41$ (square root of the average squared deviation)

  42. Variability: Standard Deviation • let X = [1, 3, 5, 7, 9] • M = 5 • (X - M) = [-4, -2, 0, 2, 4] (subtract M from each number in X) • $(X - M)^2$ = [16, 4, 0, 4, 16] (squared deviations from the mean) • $\sum (X - M)^2 = 40$ (sum of squared deviations from the mean, SS) • $\sum (X - M)^2 / N = 40/5 = 8$ (average squared deviation from the mean) • $\sqrt{\sum (X - M)^2 / N} = \sqrt{8} = 2.83$ (square root of the average squared deviation)
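The two worked examples can be reproduced step by step in Python. This sketch divides by N, as the slides do, which gives the population standard deviation; the function name `standard_deviation` is chosen here for illustration.

```python
import math


def standard_deviation(values):
    """Square root of the average squared deviation from the mean (divides by N)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    ss = sum(squared_deviations)  # sum of squares (SS)
    variance = ss / n             # average squared deviation
    return math.sqrt(variance)


print(round(standard_deviation([3, 4, 5, 6, 7]), 2))  # 1.41
print(round(standard_deviation([1, 3, 5, 7, 9]), 2))  # 2.83
```

Python's standard library provides the same calculation as `statistics.pstdev`; `statistics.stdev` divides by N - 1 instead and is meant for estimating a population's standard deviation from a sample.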

  43. Variability: Standard Deviation • The square of the standard deviation is called the variance • Standard Deviation: $SD = \sqrt{\sum (X - M)^2 / N}$ • Variance: $SD^2 = \sum (X - M)^2 / N$

  44. Standard Deviation & Standard Scores • Z scores are expressed in the following way: $Z = \frac{X - M}{SD}$ • Z scores express how far a particular score is from the mean in units of standard deviation • if (X - M) = SD then (X - M)/SD = 1, and X is said to be one standard deviation above the mean

  45. Standard Deviation & Standard Scores • Z scores provide a common scale to express deviations from a group mean

  46. Standard Deviation and Standard Scores • Let’s say someone has an IQ of 145 and is 52 inches tall • IQ in a population has a mean of 100 and a standard deviation of 15 • Height in a population has a mean of 64 inches with a standard deviation of 4 inches • How many standard deviations is this person away from the average IQ? • How many standard deviations is this person away from the average height?
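The two questions can be answered by converting each score to a Z score, (X - M)/SD. A minimal Python sketch, with the function name `z_score` chosen here for illustration:

```python
def z_score(x, mean, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


iq_z = z_score(145, mean=100, sd=15)    # (145 - 100) / 15
height_z = z_score(52, mean=64, sd=4)   # (52 - 64) / 4

print("IQ:", iq_z)          # 3.0
print("Height:", height_z)  # -3.0
```

The person is three standard deviations above the average IQ and three standard deviations below the average height. Expressed as Z scores, the two measurements sit on a common scale even though the original units (IQ points and inches) differ.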
