
An Overview of Statistics

Learn about statistics, what statisticians do, and how statistics is used in politics, industry, decision-making, and data analysis. Explore measurement scales, variables, data collection, and probability concepts.

elishaf

Presentation Transcript


  1. An Overview of Statistics

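  2. What is Statistics? What does a statistician do?

     | Player  | Games | Minutes | Points | Rebounds | FG%  |
     |---------|-------|---------|--------|----------|------|
     | Bob     | 34    | 32.7    | 24     | 7.6      | .552 |
     | Andy    | 36    | 31.5    | 21     | 8.4      | .465 |
     | Larry   | 30    | 33.0    | 18     | 5.6      | .493 |
     | Michael | 31    | 35.1    | 29     | 6.1      | .422 |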

  3. Job of a Statistician • Collects numbers or data • Systematically organizes or arranges the data • Analyzes the data…extracts relevant information to provide a complete numerical description • Infers general conclusions about the problem using this numerical description

  4. POLITICS • Forecasting and predicting winners of elections • Where to concentrate campaign appearances, advertising and $$… • Example poll question: "If the election for president of the United States were held today, who would you be more likely to vote for?" Rudy Giuliani 45%, Hillary Clinton 43%, someone else 2%, wouldn't vote 4%, unsure 6%

  5. INDUSTRY • To market product… • Interested in the average length of life of a light bulb • Cannot test all the bulbs

  6. Uses of Statistics • Statistics is a theoretical discipline in its own right • Statistics is a tool for researchers in other fields • Used to draw general conclusions in a large variety of applications

  7. Common Problem • A decision or prediction must be made about a large body of measurements that cannot be totally enumerated. • Examples: light bulbs (enumerating the population is destructive); forecasting the winner of an election (population too big; people change their minds) • Solution: Collect a smaller set of measurements that will (hopefully) be representative of the larger set.

  8. Data and Statistics • Data consists of information coming from observations, counts, measurements, or responses. • Statistics is the science of collecting, organizing, analyzing, and interpreting data in order to make decisions. • A population is the collection of all outcomes, responses, measurements, or counts that are of interest. • A sample is a subset of a population.

  9. Introduction to Probability and Statistics, Thirteenth Edition • Chapter 1: Describing Data with Graphs

  10. Introduction to Statistical Terms • Variable: something that can assume some type of value • Data: information coming from observations, counts, measurements, or responses • Data set: a collection of data values • Observation: the value, at a particular period, of a particular variable • An experimental unit is the individual or object on which a variable is measured. • A measurement results when a variable is actually measured on an experimental unit. • A set of measurements, called data, can be either a sample or a population.

  11. Example • Variable • Time until a light bulb burns out • Experimental unit • Light bulb • Typical Measurements • 1500 hours, 1535.5 hours, etc.

  12. Populations and Samples • A Population is the set of all items or individuals of interest • Examples: All likely voters in the next election All parts produced today All sales receipts for November • A Sample is a subset of the population • Examples: 1000 voters selected at random for interview A few parts selected for destructive testing Every 100th receipt selected for audit
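The sampling examples above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the receipt numbers, the population size of 5,000, and the sample size of 50 are hypothetical. It contrasts a simple random sample with a systematic "every 100th receipt" sample.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: receipt numbers 1..5000 for one month.
receipts = list(range(1, 5001))

# Simple random sample: 50 receipts chosen without replacement.
random_sample = random.sample(receipts, 50)

# Systematic sample: every 100th receipt (100, 200, ..., 5000).
systematic_sample = receipts[99::100]

print(sorted(random_sample)[:5])
print(systematic_sample[:5])
```

Both samples are subsets of the population; they differ only in how the units are selected.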

  13. population sample Sampling Techniques Parameters Statistics Statistical Procedures inference

  14. Parameters & Statistics • A parameter is a numerical description of a population characteristic. • A statistic is a numerical description of a sample characteristic. • Parameter ↔ Population; Statistic ↔ Sample
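To make the parameter/statistic distinction concrete, here is a small sketch using simulated data. The population of 10,000 bulb lifetimes, its mean of 1500 hours, and its spread are assumptions chosen for illustration only; they are not figures from the slides.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: lifetimes (hours) of 10,000 light bulbs.
population = [random.gauss(1500, 100) for _ in range(10_000)]

# Parameter: a numerical description of the whole population.
population_mean = statistics.mean(population)

# Statistic: the same description computed from a sample (a subset).
sample = random.sample(population, 50)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.1f}")
print(f"statistic (sample mean):     {sample_mean:.1f}")
```

In practice the parameter is usually unknown (testing every bulb would destroy the population), so the statistic is used to estimate it.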

  15. How many variables have you measured? • Univariate data: One variable is measured on a single experimental unit. • Bivariate data: Two variables are measured on a single experimental unit. • Multivariate data: More than two variables are measured on a single experimental unit.

  16. Measurement Scales/Levels • Nominal: categories that are mutually exclusive (non-overlapping), with no order or ranking. For example: gender (male or female), religion. • Ordinal: categories that can be ordered, but the differences between ranks are not precise. For example: health quality (excellent, good, adequate, bad, terrible). • Interval: involves measurements with precise differences, but there is no meaningful zero. For example: temperature (in °C or °F). • Ratio: involves measurements that can be ranked, with precise differences between the ranks and a meaningful zero. For example: height, time, or weight.

  17. Types of Variables • Qualitative • Quantitative: Discrete or Continuous

  18. Types of Variables • Qualitative variables measure a quality or characteristic on each experimental unit. • Examples: • Hair color (black, brown, blonde…) • Make of car (Dodge, Honda, Ford…) • Gender (male, female) • State of birth (California, Arizona, …) • Quantitative variables measure a numerical quantity on each experimental unit. • Discrete if it can assume only a finite or countable number of values. • Continuous if it can assume the infinitely many values corresponding to the points on a line interval.

  19. Examples • For each orange tree in a grove, the number of oranges is measured. • Quantitative discrete • For a particular day, the number of cars entering a college campus is measured. • Quantitative discrete • Time until a light bulb burns out • Quantitative continuous

  20. Statistical Methods: Descriptive Statistics and Inferential Statistics • Descriptive statistics utilizes numerical and graphical methods to look for patterns in the data set. • The data can represent either the entire population or a sample.

  21. Descriptive Statistics
      • Graphical, qualitative data: Bar Chart • Pie Chart
      • Graphical, quantitative data: Bar/Pie Chart • Line Plot (Time Series) • Dotplot • Stem-and-Leaf Plot • Histogram • Ogive • Boxplot
      • Numerical, qualitative data: Tables (frequency, percentage, cumulative percentage) • Cross tabulation
      • Numerical, quantitative data: Central Tendency • Dispersion (Variability)
      Note: Some graphs require a tabular representation (frequency distribution)

  22. Graphing Qualitative Variables • Use a data distribution to describe: • What values of the variable have been measured • How often each value has occurred • "How often" can be measured 3 ways: • Frequency • Relative frequency = Frequency/n • Percent = 100 × Relative frequency • Graphs: Bar Chart • Pie Chart
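The three ways of measuring "how often" can be sketched as a small frequency table. The candy colors below are hypothetical (the slide's own raw data is an image and is not reproduced here); only the formulas come from the slide.

```python
from collections import Counter

# Hypothetical raw data: colors of 10 candies (not the slide's data).
colors = ["red", "blue", "green", "red", "brown",
          "blue", "red", "yellow", "green", "red"]

n = len(colors)
frequency = Counter(colors)  # how many times each value occurs

# For each category: (frequency, relative frequency, percent).
table = {}
for color, count in frequency.items():
    rel = count / n                     # relative frequency = frequency / n
    table[color] = (count, rel, 100 * rel)  # percent = 100 x relative frequency

for color, (count, rel, pct) in sorted(table.items()):
    print(f"{color:<8}{count:>3}{rel:>8.2f}{pct:>7.0f}%")
```

The frequencies sum to n, the relative frequencies to 1, and the percents to 100, which is a quick check on any statistical table.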

  23. Example • A bag of M&Ms contains 25 candies. [Figure: raw data and statistical table]

  24. Graphs Bar Chart Pie Chart

  25. Graphing Quantitative Variables • Bar/Pie Chart • Line Plot (Time Series) • Dotplot • Stem-and-Leaf Plot • Histogram • Ogive • Boxplot

  26. Graphing Quantitative Variables (1) • A single quantitative variable measured for different population segments or for different categories of classification can be graphed using a bar or pie chart. • Example: A Big Mac hamburger costs $4.90 in Switzerland, $2.90 in the U.S. and $1.86 in South Africa.

  27. Graphing Quantitative Variables (2) • A single quantitative variable measured over time is called a time series. It can be graphed using a line or bar chart. • Example: CPI, All Urban Consumers, Seasonally Adjusted

  28. Graphing Quantitative Variables (3): Dotplot • The simplest graph for quantitative data • Plots the measurements as points on a horizontal axis, stacking the points that duplicate existing points. • Example: The set 4, 5, 5, 7, 6, plotted on an axis running from 4 to 7
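A text-only dotplot of the example set can be sketched as follows. The function name `dotplot` is made up for this illustration, and the column alignment assumes single-digit integer measurements.

```python
from collections import Counter

def dotplot(values):
    """Return a text dotplot: one stacked '*' per measurement."""
    counts = Counter(values)
    axis = list(range(min(values), max(values) + 1))
    rows = []
    # Build rows from the tallest stack down to the axis.
    for level in range(max(counts.values()), 0, -1):
        row = " ".join("*" if counts[x] >= level else " " for x in axis)
        rows.append(row.rstrip())
    rows.append(" ".join(str(x) for x in axis))
    return "\n".join(rows)

print(dotplot([4, 5, 5, 7, 6]))
```

The duplicated value 5 shows up as a stack of two points above the same axis position.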

  29. Stem-and-Leaf Plots (4) • A simple graph for quantitative data • Uses the actual numerical values of each data point. • Divide each measurement into two parts: the stem and the leaf. • List the stems in a column, with a vertical line to their right. • For each measurement, record the leaf portion in the same row as its matching stem. • Order the leaves from lowest to highest in each stem. • Provide a key to your coding.

  30. Example: Stem-and-Leaf Plot • The prices ($) of 18 brands of walking shoes: 90 70 70 70 75 70 65 68 60 74 70 95 75 70 68 65 40 65

      4 | 0
      5 |
      6 | 0 5 5 5 8 8
      7 | 0 0 0 0 0 0 4 5 5
      8 |
      9 | 0 5
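The construction steps can be sketched in Python using the 18 shoe prices from the example. The helper name `stem_and_leaf` and the choice of the tens digit as the stem are assumptions made for this illustration.

```python
from collections import defaultdict

prices = [90, 70, 70, 70, 75, 70, 65, 68, 60,
          74, 70, 95, 75, 70, 68, 65, 40, 65]

def stem_and_leaf(values):
    """Map each tens-digit stem to its sorted ones-digit leaves."""
    leaves = defaultdict(list)
    for v in sorted(values):          # sorting orders the leaves per stem
        leaves[v // 10].append(v % 10)
    # Include empty stems so gaps in the data stay visible.
    return {s: leaves.get(s, []) for s in range(min(leaves), max(leaves) + 1)}

plot = stem_and_leaf(prices)
for stem, leaf_list in plot.items():
    print(f"{stem} | {' '.join(map(str, leaf_list))}")
print("Key: 6 | 5 means $65")
```

The final line supplies the key that the construction rules ask for.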

  31. Relative Frequency Histograms (5) • A relative frequency histogram for a quantitative data set is a bar graph in which the height of the bar shows "how often" (measured as a proportion or relative frequency) measurements fall in a particular class or subinterval. • Divide the range of the data into 5-12 subintervals of equal length. • Calculate the approximate width of the subinterval as Range/number of subintervals. • Round the approximate width up to a convenient value. • Use the method of left inclusion, including the left endpoint, but not the right in your tally. • Create a statistical table including the subintervals, their frequencies and relative frequencies.

  32. Relative Frequency Histograms (5): cont'd • Draw the relative frequency histogram, plotting the subintervals on the horizontal axis and the relative frequencies on the vertical axis. • The height of the bar represents: • The proportion of measurements falling in that class or subinterval. • The probability that a single measurement, drawn at random from the set, will belong to that class or subinterval.

  33. Example 1 • The ages of 50 tenured faculty at a state university: 34 48 70 63 52 52 35 50 37 43 53 43 52 44 42 31 36 48 43 26 58 62 49 34 48 53 39 45 34 59 34 66 40 59 36 41 35 36 62 34 38 28 43 50 30 43 32 44 58 53 • We choose to use 6 intervals. • Minimum class width = Range/6 = (70 – 26)/6 = 7.33 • Convenient class width = 8 • Use 6 classes of length 8, starting at 25.
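The class-width calculation and the left-inclusion tally can be sketched with the 50 ages from this example. Using `math.ceil` to reach the "convenient" width of 8 is an assumption; the slide only says to round up to a convenient value.

```python
import math

ages = [34, 48, 70, 63, 52, 52, 35, 50, 37, 43, 53, 43, 52, 44,
        42, 31, 36, 48, 43, 26, 58, 62, 49, 34, 48, 53, 39, 45,
        34, 59, 34, 66, 40, 59, 36, 41, 35, 36, 62, 34, 38, 28,
        43, 50, 30, 43, 32, 44, 58, 53]

k = 6                                        # number of classes
approx_width = (max(ages) - min(ages)) / k   # (70 - 26) / 6 = 7.33...
width = math.ceil(approx_width)              # rounded up to 8
start = 25                                   # left endpoint of the first class

# Left inclusion: an age a falls in the class [lo, lo + width).
freq = [0] * k
for a in ages:
    freq[(a - start) // width] += 1

n = len(ages)
for i, f in enumerate(freq):
    lo = start + i * width
    print(f"{lo} to < {lo + width}: frequency {f:>2}, relative frequency {f / n:.2f}")
```

The relative frequencies in the last column are the bar heights of the relative frequency histogram.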

  34. Describing the Distribution • Shape? Skewed right • Outliers? No. • What proportion of the tenured faculty are younger than 42.5? (16 + 5)/50 = 21/50 = .42 = 42% • What is the probability that a randomly selected faculty member is 52 or older? (10 + 4 + 1)/50 = 15/50 = .30 = 30%
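Both questions can also be answered by counting directly in the raw data from Example 1, without reading bar heights off a histogram. A minimal sketch:

```python
ages = [34, 48, 70, 63, 52, 52, 35, 50, 37, 43, 53, 43, 52, 44,
        42, 31, 36, 48, 43, 26, 58, 62, 49, 34, 48, 53, 39, 45,
        34, 59, 34, 66, 40, 59, 36, 41, 35, 36, 62, 34, 38, 28,
        43, 50, 30, 43, 32, 44, 58, 53]

n = len(ages)

# Proportion of faculty younger than 42.5.
younger = sum(1 for a in ages if a < 42.5)

# Probability that a randomly selected faculty member is 52 or older.
older = sum(1 for a in ages if a >= 52)

print(f"younger than 42.5: {younger}/{n} = {younger / n:.2f}")  # 21/50 = 0.42
print(f"52 or older:       {older}/{n} = {older / n:.2f}")      # 15/50 = 0.30
```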

  35. How Many Class Intervals? • Many (Narrow class intervals) • may yield a very jagged distribution with gaps from empty classes • Can give a poor indication of how frequency varies across classes • Few (Wide class intervals) • may compress variation too much and yield a blocky distribution • can obscure important patterns of variation. (X axis labels are upper class endpoints)

  36. Example 2 • A manufacturer of insulation randomly selects 20 winter days and records the daily high temperature: 24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27

  37. Example 2: Solution (Frequency Distribution) • Sort raw data in ascending order: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58 • Find range: 58 - 12 = 46 • Select number of classes: 5 (usually between 5 and 12) • Compute class interval (width): 10 (46/5 = 9.2, then round up) • Determine class boundaries (limits): 10, 20, 30, 40, 50, 60 • Compute class midpoints: 15, 25, 35, 45, 55 • Count observations & assign to classes

  38. Example 2: Solution (Frequency Distribution) (continued) • Data in ordered array: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

      | Class       | Frequency | Relative Frequency | Percentage |
      |-------------|-----------|--------------------|------------|
      | 10 ≤ X < 20 | 3         | .15                | 15         |
      | 20 ≤ X < 30 | 6         | .30                | 30         |
      | 30 ≤ X < 40 | 5         | .25                | 25         |
      | 40 ≤ X < 50 | 4         | .20                | 20         |
      | 50 ≤ X < 60 | 2         | .10                | 10         |
      | Total       | 20        | 1.00               | 100        |
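The frequency distribution for Example 2 can be reproduced with a short sketch. The class boundaries and temperatures are taken from the slides; the tuple layout of `rows` is just one convenient choice.

```python
temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]
boundaries = [10, 20, 30, 40, 50, 60]

n = len(temps)
rows = []
for lo, hi in zip(boundaries, boundaries[1:]):
    f = sum(1 for t in temps if lo <= t < hi)   # left inclusion
    midpoint = (lo + hi) / 2
    rows.append((lo, hi, midpoint, f, f / n, 100 * f / n))

for lo, hi, mid, f, rel, pct in rows:
    print(f"{lo} <= X < {hi}  midpoint {mid:>4.0f}  freq {f}  rel {rel:.2f}  pct {pct:.0f}")
```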

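  39. Histogram: Example 2 (no gaps between bars; horizontal axis shows class midpoints)

      | Class       | Class Midpoint | Frequency |
      |-------------|----------------|-----------|
      | 10 ≤ X < 20 | 15             | 3         |
      | 20 ≤ X < 30 | 25             | 6         |
      | 30 ≤ X < 40 | 35             | 5         |
      | 40 ≤ X < 50 | 45             | 4         |
      | 50 ≤ X < 60 | 55             | 2         |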

  40. Ogive (6) • An ogive is a curve drawn for the cumulative frequency distribution: mark a dot above the upper boundary of each class at a height equal to that class's cumulative frequency, then join the dots with straight lines. • Two types of ogive: (i) "less than" ogive (ii) "greater than" ogive • First, build a table of cumulative frequencies.

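  41. Cumulative Frequency • Data in ordered array: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

      | Class       | Frequency | Percentage | Cumulative Frequency | Cumulative Percentage |
      |-------------|-----------|------------|----------------------|-----------------------|
      | 10 ≤ X < 20 | 3         | 15         | 3                    | 15                    |
      | 20 ≤ X < 30 | 6         | 30         | 9                    | 45                    |
      | 30 ≤ X < 40 | 5         | 25         | 14                   | 70                    |
      | 40 ≤ X < 50 | 4         | 20         | 18                   | 90                    |
      | 50 ≤ X < 60 | 2         | 10         | 20                   | 100                   |
      | Total       | 20        | 100        |                      |                       |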
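The cumulative columns are running totals of the class frequencies. The sketch below also builds the points of a "less than" ogive, following the definition on slide 40 (each point sits above the upper boundary of its class); starting the curve at (10, 0) is an assumption made so the polygon begins on the axis.

```python
from itertools import accumulate

boundaries = [10, 20, 30, 40, 50, 60]
frequencies = [3, 6, 5, 4, 2]

n = sum(frequencies)
cum_freq = list(accumulate(frequencies))        # running totals
cum_pct = [100 * c / n for c in cum_freq]       # as percentages of n

# "Less than" ogive: (upper class boundary, cumulative percentage),
# preceded by a starting point at the lowest boundary with 0%.
ogive_points = [(boundaries[0], 0.0)] + list(zip(boundaries[1:], cum_pct))

print(cum_freq)
print(cum_pct)
print(ogive_points)
```

Reading the result: the point (30, 45.0) says that 45% of the recorded temperatures are below 30.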

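  42. Graphing Cumulative Frequencies: The Ogive (horizontal axis shows class boundaries, not midpoints)

      | Class       | Lower class boundary | Cumulative Percentage |
      |-------------|----------------------|-----------------------|
      | < 10        | 0                    | 0                     |
      | 10 ≤ X < 20 | 10                   | 15                    |
      | 20 ≤ X < 30 | 20                   | 45                    |
      | 30 ≤ X < 40 | 30                   | 70                    |
      | 40 ≤ X < 50 | 40                   | 90                    |
      | 50 ≤ X < 60 | 50                   | 100                   |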

  43. Interpreting Graphs: Location and Spread • Where is the data centered on the horizontal axis, and how does it spread out from the center?

  44. Interpreting Graphs: Shapes • Mound shaped and symmetric (mirror images) • Skewed right: a few unusually large measurements • Skewed left: a few unusually small measurements • Bimodal: two local peaks

  45. Interpreting Graphs: Outliers • Are there any strange or unusual measurements that stand out in the data set? [Figure: one distribution with no outliers, one with an outlier]

  46. Example • A quality control process measures the diameter of a gear being made by a machine (cm). The technician records 15 diameters, but inadvertently makes a typing mistake on the second entry. 1.991 1.891 1.991 1.988 1.993 1.989 1.990 1.988 1.988 1.993 1.991 1.989 1.989 1.993 1.990 1.994
