Introduction

odeda

  1. Introduction
     Population – the entire group of concern
     Sample – only a part of the whole
     Based on a sample, we'll make a prediction about the population.
     Bad sampling: convenience, bias, voluntary
     Good sampling: simple random sample (SRS)
     • Inferential Stats: making predictions or inferences about a population based on a sample
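
One way to draw a simple random sample can be sketched in Python. The population of ID numbers 1 to 100 and the helper name `simple_random_sample` are assumptions made for illustration; they are not part of the slides.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals; every group of size n is equally likely."""
    rng = random.Random(seed)          # seed only makes the draw repeatable
    return rng.sample(list(population), n)

population = list(range(1, 101))       # hypothetical ID numbers 1..100
sample = simple_random_sample(population, 10, seed=1)
print(sample)
```

Because the draw is made by chance alone, no individual is favored, which is what separates an SRS from a convenience or voluntary sample.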

  2. Experiments
     Observation – no attempt to influence
     Experiment – deliberately imposes some treatment
     Basic design principles:
     • Control the effects of lurking variables
     • Randomize which subject gets which treatment
     • Use large sample size to reduce chance variation
     • Statistical Significance: an observed effect so big that it would rarely occur just by chance.
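
The "randomize" principle could be sketched as follows. The subject labels, the two treatment names, and the function `randomize_treatments` are hypothetical and chosen only to illustrate the idea.

```python
import random

def randomize_treatments(subjects, treatments, seed=None):
    """Shuffle the subjects, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

subjects = [f"subject_{i}" for i in range(1, 21)]   # hypothetical labels
groups = randomize_treatments(subjects, ["treatment", "control"], seed=7)
for name, members in groups.items():
    print(name, len(members))
```

Shuffling before dealing means chance, not the experimenter, decides who gets which treatment.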

  3. Picturing Distributions with Graphs
     What makes up any set of data?
     • Individuals
       • objects described by data
       • can be
     • Variables
       • characteristic of individuals of particular interest
       • different values possible for different people

  4. Two kinds of variables
     • Categorical (Qualitative)
       • describes an individual by category or quality.
       • examples like
     • Numerical (Quantitative)
       • describes an individual by number or quantity.
       • discrete for variables that are
       • continuous for variables that are
       • examples like

  5. Describing Categorical Variables
     Tables summarize the data set by
     • listing possible categories.
     • giving the number of objects in each category.
     • or showing the count as a percentage.
     Picture the distribution of a categorical variable with
     • Pie charts
     • Bar graphs
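
A table of counts and percentages for a categorical variable might be built like this. The category labels and data are made up for illustration, and `frequency_table` is a hypothetical helper.

```python
from collections import Counter

def frequency_table(values):
    """Map each category to (count, percent of all observations)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (count, 100 * count / total) for cat, count in counts.items()}

data = ["A"] * 5 + ["B"] * 3 + ["C"] * 2   # hypothetical categorical data
table = frequency_table(data)
for category, (count, percent) in table.items():
    print(category, count, percent)
```

The counts are what a bar graph plots on a "#" axis, and the percentages are what a pie chart or a "%" axis shows.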

  6. Pie Charts
     The whole is split into appropriate pieces.

  7. Bar Graph
     The horizontal line keeps track of categorical values.
     Vertical bars at each value keep track of # or %.
     [Figure: bar graph of categories A to F, with vertical axes for count (#) and percent (%)]

  8. Example 1
     80 AASU students in an Elem. Stats class come from one of four colleges (S & T, Edu, Health, Lib. Arts). The breakdown of these 80 students is given below.

  9. Ex1 - Pie Chart

  10. Ex1 – Bar Graph
      [Figure: bar graph with vertical axis in % (ticks at 10, 20, 30) and categories U, LA, E, H, ST]

  11. Describing Quantitative Variables
      Tables summarize the data set by
      • listing possible intervals (ranges, classes).
      • giving the number of individuals in each class
      • or showing the number as a percentage.
      Picture the distribution of a quantitative variable with
      • Histogram (similar to a bar graph, but now the vertical bars of neighboring classes touch)
      • Where one class ends, the next begins.
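
Counting individuals per class, the step behind a histogram, can be sketched as below. The ages, the class boundaries, and the helper `class_counts` are all assumptions for illustration; the slides do not give the underlying data.

```python
def class_counts(values, start, width, n_classes):
    """Count values in classes [start, start+width), [start+width, start+2*width), ..."""
    counts = [0] * n_classes
    for v in values:
        idx = int((v - start) // width)
        if 0 <= idx < n_classes:       # values outside all classes are ignored
            counts[idx] += 1
    return counts

ages = [28, 34, 41, 45, 47, 52, 55, 58, 63, 66]   # hypothetical ages
counts = class_counts(ages, start=10, width=20, n_classes=3)
percents = [100 * c / len(ages) for c in counts]
print(counts)
print(percents)
```

Each class includes its left endpoint and excludes its right one, so a value on a boundary is counted exactly once: where one class ends, the next begins.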

  12. Example 2
      Consider the ages of the full-time faculty in the math dept. The breakdown of these 19 individuals is given in the table.
      [Figure: histogram of faculty ages; vertical axis in % (ticks at 10, 20, 30), horizontal axis ticks at 10, 30, 50, 70]

  13. Info from histograms
      Helps to describe a distribution with
      • pattern (shape, center, spread)
      • deviations (outliers) from the rest of the data
        • Could result from unusual observation or typo
      • For shape, look at symmetric vs. skewed

  14. Examples 3 and 4
      [Figure: two histograms with vertical axes in %; horizontal axis ticks from 2 to 12 in one and from 20 to 100 in the other]

  15. Example 4 without outliers
      [Figure: two histograms with vertical axes in % and horizontal axis ticks from 20 to 100]

  16. Describing Distributions with Numbers
      There are better ways to describe a quantitative data set than by an estimation from a graph.
      Center: mean, median, mode
      Spread: quartiles, standard deviation

  17. Center: Mean
      • The mean of a data set is the arithmetic average of all the observations.
      • Given a data set $x_1, x_2, \dots, x_n$: $\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n}$

  18. Mean – Example 1
      • Your test scores in a Stats Class are: 60, 75, 92, 80
      • Your mean score is: $\frac{60 + 75 + 92 + 80}{4} = \frac{307}{4} = 76.75$
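
The mean of the four test scores can be checked with a few lines of Python. This is a minimal sketch; the function name `mean` is just a convenient label.

```python
def mean(values):
    """Arithmetic average: the sum of the observations divided by how many there are."""
    return sum(values) / len(values)

scores = [60, 75, 92, 80]
print(mean(scores))   # 76.75
```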

  19. Mean – Example 2
      • Compare high temperatures in Savannah for July 2010 and July 2011.
      • July 2010 high temps: 83, 87, 84, …, 97, 100, 92
      • July 2011 high temps: 94, 91, 93, …, 97, 99, 99

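  20. Center: Median
      • The median of a data set is the middle value of all the (ordered) observations.
      • Given a data set: put the observations in order. If the number of observations is odd, the median is the middle one; if it is even, the median is the average of the two middle ones.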

  21. Median – Examples 3/4
      • 11 tests: 60, 77, 92, 80, 84, 93, 80, 95, 65, 66, 75
      • Ordered data set: 60, 65, 66, 75, 77, 80, 80, 84, 92, 93, 95
      • 10 dice rolls: 2, 4, 5, 5, 6, 7, 7, 8, 9, 10
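
Both median examples can be worked through with a short sketch. It assumes the usual rule of averaging the two middle values when the number of observations is even.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

tests = [60, 77, 92, 80, 84, 93, 80, 95, 65, 66, 75]
rolls = [2, 4, 5, 5, 6, 7, 7, 8, 9, 10]
print(median(tests))   # 80
print(median(rolls))   # 6.5
```

With 11 tests the median is the 6th ordered value; with 10 rolls it is halfway between the 5th and 6th.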

  22. Center: Mode
      • The mode of a data set is the value that appears the most.
      • Tests data set: 60, 65, 66, 75, 77, 80, 80, 84, 92, 93, 95
      • Dice rolls: 2, 4, 5, 5, 6, 7, 7, 8, 9, 10
      • 2010 July High Temps mode:
      • 2011 July High Temps mode:
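
A data set can have more than one mode, so a sketch that returns every most-frequent value is useful here. The helper name `modes` is an assumption; the temperature data sets are elided on the slide and are not used.

```python
from collections import Counter

def modes(values):
    """Return every value that appears the greatest number of times."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

tests = [60, 65, 66, 75, 77, 80, 80, 84, 92, 93, 95]
rolls = [2, 4, 5, 5, 6, 7, 7, 8, 9, 10]
print(modes(tests))   # [80]
print(modes(rolls))   # [5, 7]
```

The dice rolls have two modes because 5 and 7 each appear twice.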

  23. Spread: Quartiles
      A measure of center is not useful by itself
      • Are other observations close or far from center?
      Take an ordered data set and find:
      • M, the median
      • Q1, the first quartile
      • Q3, the third quartile
      • IQR = Q3 - Q1
      Summary of data in the “Five-Number Summary”: Min, Q1, M, Q3, Max

  24. Quartiles – Example 5
      • 11 tests: 60, 65, 66, 75, 77, 80, 80, 84, 92, 93, 95
      • 5-num-sum:
      • Visualize 5-num-sum with a boxplot.
        • Draw rectangle with ends at Q1 and Q3.
        • Draw line in the box for the median.
        • Draw lines to the last observations within 1.5 × IQR of the quartiles.
        • Observations outside 1.5 × IQR of the quartiles are suspected outliers.

  25. Boxplot – Example 6
      • 5-Num-Sum: 60, ____, 80, ____, 95
      [Figure: boxplot drawn on an axis from 50 to 100]
      • Draw rectangle with ends at Q1 and Q3
      • Draw line in the box for the median
      • Draw lines to last observations within 1.5 × IQR of the quartiles
      • Observations outside 1.5 × IQR of the quartiles are suspected outliers

  26. Boxplot – Example 7
      • July 2010 5-Num-Sum: 83, 92, 94, 97, 102
      • July 2011 5-Num-Sum: 84, 91, 95, 98, 99
      • 2010
        • IQR = 97 - 92 = 5
      • 2011
        • IQR = 98 - 91 = 7
      [Figure: side-by-side boxplots for 2010 and 2011 on an axis from 80 to 105]
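
Applying the 1.5 × IQR rule to the two five-number summaries shows which extremes would be flagged. This sketch uses only the summaries given on the slide; the helper `outlier_fences` is a hypothetical name.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) cutoffs: 1.5 * IQR beyond each quartile."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

summaries = {
    "July 2010": (83, 92, 94, 97, 102),
    "July 2011": (84, 91, 95, 98, 99),
}
for label, (smallest, q1, m, q3, largest) in summaries.items():
    lower, upper = outlier_fences(q1, q3)
    print(label, lower, upper, smallest < lower, largest > upper)
```

For July 2010 the cutoffs are 84.5 and 104.5, so the minimum of 83 is a suspected outlier while the maximum of 102 is not. For July 2011 the cutoffs are 80.5 and 108.5, and neither extreme is flagged.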

  27. Spread: Standard Deviation
      A more common measure of spread (used in conjunction with the mean) is the standard deviation.
      A single deviation from the mean looks like $x_i - \bar{x}$, where $\bar{x}$ is the mean.
      For every value in a data set, the deviation is positive, negative, or zero. Averaging the deviations directly is a problem: when you add them together, you always get 0.
      • Example 1 data: 60, 75, 92, 80
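
The "adding to zero" problem can be seen directly with the Example 1 data. This is a minimal sketch; with other data, floating-point rounding may leave a sum that is only very close to zero.

```python
scores = [60, 75, 92, 80]
mean = sum(scores) / len(scores)             # 76.75
deviations = [x - mean for x in scores]      # each observation minus the mean
print(deviations)        # [-16.75, -1.75, 15.25, 3.25]
print(sum(deviations))   # 0.0
```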

  28. To deal with this “adding to zero”, we get rid of any negative terms by squaring each deviation.
      A single squared deviation from the mean looks like: $(x_i - \bar{x})^2$
      The average of the squared deviations is called the variance: $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
      • n-1 is called the degrees of freedom, since knowledge of the first (n-1) deviations will automatically set the last one.

  29. The standard deviation is the square root of the variance: $s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$
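
The variance and standard deviation of the Example 1 scores can be computed step by step. The sketch divides by n - 1, as the slides describe; the function names are illustrative.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_std_dev(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))

scores = [60, 75, 92, 80]
print(sample_variance(scores))   # about 175.58
print(sample_std_dev(scores))    # about 13.25
```

The squared deviations are 280.5625, 3.0625, 232.5625 and 10.5625, which sum to 526.75; dividing by 3 gives the variance.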

  30. When to use what?
      For skewed data: use the five-number summary (median and quartiles).
      For (nearly) symmetric data: use the mean and standard deviation.
      Outliers have a big impact on the mean and std. dev.
      Consider two data sets:
      • Set 1: 1, 1, 3, 5, 10
      • Set 2: 1, 1, 3, 5, 70
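
The two data sets can be compared with Python's standard `statistics` module to see how a single outlier moves each measure. This is only a sketch of the comparison the slide sets up.

```python
import statistics

set1 = [1, 1, 3, 5, 10]
set2 = [1, 1, 3, 5, 70]

for label, data in (("Set 1", set1), ("Set 2", set2)):
    print(label,
          "mean:", statistics.mean(data),
          "median:", statistics.median(data),
          "std dev:", round(statistics.stdev(data), 2))
```

Replacing 10 with 70 moves the mean from 4 to 16 and the standard deviation from about 3.74 to about 30.23, while the median stays at 3.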
