Learn about statistics, what statisticians do, and how statistics is used in politics, industry, decision-making, and data analysis. Explore measurement scales, variables, data collection, and probability concepts.
What is Statistics? What does a statistician do?
Player    Games   Minutes   Points   Rebounds   FG%
Bob       34      32.7      24       7.6        .552
Andy      36      31.5      21       8.4        .465
Larry     30      33.0      18       5.6        .493
Michael   31      35.1      29       6.1        .422
Job of a Statistician • Collects numbers or data • Systematically organizes or arranges the data • Analyzes the data…extracts relevant information to provide a complete numerical description • Infers general conclusions about the problem using this numerical description
POLITICS • Forecasting and predicting winners of elections • Deciding where to concentrate campaign appearances, advertising, and money • Example poll: "If the election for president of the United States were held today, who would you be more likely to vote for?" Rudy Giuliani 45%, Hillary Clinton 43%, Someone else 2%, Wouldn't vote 4%, Unsure 6%
INDUSTRY • To market a product, a manufacturer is interested in the average length of life of its light bulbs • It cannot test all the bulbs
Uses of Statistics • Statistics is a theoretical discipline in its own right • Statistics is a tool for researchers in other fields • Used to draw general conclusions in a large variety of applications
Common Problem • A decision or prediction about a large body of measurements that cannot be totally enumerated. Examples • Light bulbs (enumerating the population is destructive) • Forecasting the winner of an election (the population is too big, and people change their minds) Solution: Collect a smaller set of measurements that will (hopefully) be representative of the larger set.
Data and Statistics • Data consists of information coming from observations, counts, measurements, or responses. • Statistics is the science of collecting, organizing, analyzing, and interpreting data in order to make decisions. • A population is the collection of all outcomes, responses, measurements, or counts that are of interest. • A sample is a subset of a population.
Introduction to Probability and Statistics, Thirteenth Edition. Chapter 1: Describing Data with Graphs
Introduction to Statistical Terms • Variable • A characteristic that can take different values • Data • Information coming from observations, counts, measurements, or responses • Data Set • A collection of data values • Observation • The value of a particular variable at a particular time • An experimental unit is the individual or object on which a variable is measured. • A measurement results when a variable is actually measured on an experimental unit. • A set of measurements, called data, can be either a sample or a population.
Example • Variable • Time until a light bulb burns out • Experimental unit • Light bulb • Typical Measurements • 1500 hours, 1535.5 hours, etc.
Populations and Samples • A Population is the set of all items or individuals of interest • Examples: All likely voters in the next election All parts produced today All sales receipts for November • A Sample is a subset of the population • Examples: 1000 voters selected at random for interview A few parts selected for destructive testing Every 100th receipt selected for audit
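[Figure: sampling techniques draw a sample from the population; statistics computed from the sample are used, through statistical procedures (inference), to draw conclusions about the population parameters.]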
Parameters & Statistics • A parameter is a numerical description of a population characteristic. • A statistic is a numerical description of a sample characteristic. [Figure: a parameter describes the population; a statistic describes the sample.]
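To make the parameter/statistic distinction concrete, here is a minimal Python sketch. The population of 1,000 light-bulb lifetimes is simulated (hypothetical values drawn from a normal distribution with mean 1500 hours and standard deviation 100 hours); nothing here comes from real data. The population mean plays the role of a parameter, and the mean of a random sample of 25 bulbs is the corresponding statistic.

```python
import random
import statistics

# Hypothetical population: simulated lifetimes (hours) of 1,000 light bulbs.
rng = random.Random(42)
population = [rng.gauss(1500, 100) for _ in range(1000)]

# Parameter: a numerical description of the whole population.
mu = statistics.mean(population)

# Statistic: the same description computed from a random sample of 25 bulbs.
sample = rng.sample(population, 25)
x_bar = statistics.mean(sample)

print(f"population mean (parameter): {mu:.1f} hours")
print(f"sample mean (statistic):     {x_bar:.1f} hours")
```

Drawing a different sample from the same population changes the statistic, while the parameter stays fixed.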
How many variables have you measured? • Univariate data: One variable is measured on a single experimental unit. • Bivariate data: Two variables are measured on a single experimental unit. • Multivariate data: More than two variables are measured on a single experimental unit.
Measurement Scales (Levels) • Nominal • Categories are mutually exclusive (non-overlapping) • There is no order or ranking • For example: gender (male or female), religion • Ordinal • Categories can be ordered, but the differences between them cannot be measured precisely • For example: health quality (excellent, good, adequate, bad, terrible) • Interval • Involves measurements with meaningful differences, but there is no meaningful zero • For example: temperature in °C or °F • Ratio • Involves measurements that can be ranked, with precise differences between values and a meaningful zero • For example: height, time, or weight
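The difference between the interval and ratio scales can be checked numerically. In this small Python sketch (the conversion helpers are ours, written for illustration), a ratio of two Celsius temperatures changes when the same temperatures are expressed in Fahrenheit, while a ratio of two heights is the same in centimeters and inches, because only height has a meaningful zero.

```python
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32


def cm_to_inches(cm):
    return cm / 2.54


# Temperature (interval scale): 20 °C is not "twice as hot" as 10 °C.
# The ratio changes with the unit because 0 °C is an arbitrary zero.
t1, t2 = 10.0, 20.0
print(t2 / t1)                                                # 2.0
print(celsius_to_fahrenheit(t2) / celsius_to_fahrenheit(t1))  # 68/50 = 1.36

# Height (ratio scale): 180 cm is twice 90 cm in any unit,
# because 0 cm means "no height at all".
h1, h2 = 90.0, 180.0
print(h2 / h1)                              # 2.0
print(cm_to_inches(h2) / cm_to_inches(h1))  # 2.0
```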
Types of Variables [Diagram: variables are either qualitative or quantitative; quantitative variables are either discrete or continuous.]
Types of Variables • Qualitative variables measure a quality or characteristic on each experimental unit. • Examples: • Hair color (black, brown, blonde…) • Make of car (Dodge, Honda, Ford…) • Gender (male, female) • State of birth (California, Arizona,….) • Quantitative variables measure a numerical quantity on each experimental unit. • Discrete if it can assume only a finite or countable number of values. • Continuous if it can assume the infinitely many values corresponding to the points on a line interval.
Examples • For each orange tree in a grove, the number of oranges is measured. • Quantitative discrete • For a particular day, the number of cars entering a college campus is measured. • Quantitative discrete • Time until a light bulb burns out • Quantitative continuous
Statistical Methods: Descriptive Statistics and Inferential Statistics • Descriptive statistics utilizes numerical and graphical methods to look for patterns in the data set. • The data can represent either the entire population or a sample.
Descriptive Statistics
• Graphical methods
  - Qualitative data: Bar Chart, Pie Chart
  - Quantitative data: Bar/Pie Chart, Line Plot (Time Series), Dotplot, Stem-and-Leaf Plot, Histogram, Ogive, Boxplot
• Numerical methods
  - Qualitative data: Tables (frequency, percentage, cumulative percentage), Cross tabulation
  - Quantitative data: Central Tendency, Dispersion (Variability)
Note: Some graphs require a tabular representation (frequency distribution).
Graphing Qualitative Variables • Use a data distribution to describe: • What values of the variable have been measured • How often each value has occurred • "How often" can be measured 3 ways: • Frequency • Relative frequency = Frequency/n • Percent = 100 × Relative frequency • Graphs: Bar Chart, Pie Chart
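The three ways of measuring "how often" can be computed directly. This Python sketch uses a short, hypothetical list of candy colors (invented for illustration, not the slide's data) and `collections.Counter` to build the frequency, relative frequency, and percent columns.

```python
from collections import Counter

# Hypothetical qualitative data: colors of 10 candies.
colors = ["red", "blue", "red", "green", "brown",
          "blue", "red", "yellow", "green", "red"]

n = len(colors)
freq = Counter(colors)  # frequency of each value

print(f"{'Value':<8}{'Freq':>6}{'Rel. freq':>11}{'Percent':>9}")
for value, f in freq.most_common():
    rel = f / n        # relative frequency = frequency / n
    pct = 100 * rel    # percent = 100 x relative frequency
    print(f"{value:<8}{f:>6}{rel:>11.2f}{pct:>9.1f}")
print(f"{'Total':<8}{n:>6}{1:>11.2f}{100:>9.1f}")
```

The relative frequencies always sum to 1 and the percents to 100, which is a quick check on the tally.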
Example • A bag of M&Ms contains 25 candies. • Raw Data: [Figure: the 25 candies, shown by color] • Statistical Table: [Table: frequency of each color]
Graphs: [Figures: bar chart and pie chart of the candy colors]
Graphing Quantitative Variables • Bar/Pie Chart • Line Plot (Time Series) • Dotplot • Stem-and-Leaf Plot • Histogram • Ogive • Boxplot
Graphing Quantitative Variables (1) • A single quantitative variable measured for different population segments or for different categories of classification can be graphed using a bar or pie chart. • Example: A Big Mac hamburger costs $4.90 in Switzerland, $2.90 in the U.S., and $1.86 in South Africa.
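A bar chart draws one bar per category, with length proportional to the value. As a dependency-free sketch, the Python below renders the three Big Mac prices quoted above as a text bar chart; the scale of one `#` per $0.10 is an arbitrary choice for illustration.

```python
# Big Mac prices (US$) quoted on the slide, one category per country.
prices = {"Switzerland": 4.90, "U.S.": 2.90, "South Africa": 1.86}

# One '#' per $0.10, rounded to a whole number of characters.
bars = {country: "#" * round(price * 10) for country, price in prices.items()}

for country, bar in bars.items():
    print(f"{country:<13}{bar} ${prices[country]:.2f}")
```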
Graphing Quantitative Variables (2) • A single quantitative variable measured over time is called a time series. It can be graphed using a line or bar chart. [Figure: CPI: All Urban Consumers, Seasonally Adjusted]
Graphing Quantitative Variables (3): Dotplot • The simplest graph for quantitative data • Plots the measurements as points on a horizontal axis, stacking the points that duplicate existing points. • Example: The set 4, 5, 5, 7, 6 [Figure: dotplot on an axis from 4 to 7, with two stacked points at 5]
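The stacking rule can be sketched in a few lines of Python. `text_dotplot` is a hypothetical helper written for illustration; it assumes small, single-digit integer data (one character column per value) and prints duplicates stacked above the axis.

```python
from collections import Counter


def text_dotplot(data):
    """Return a text dotplot: one column per integer value, duplicates stacked."""
    counts = Counter(data)
    values = range(min(data), max(data) + 1)  # assumes single-digit integers
    height = max(counts.values())
    rows = []
    for level in range(height, 0, -1):  # top row first
        row = " ".join("*" if counts[v] >= level else " " for v in values)
        rows.append(row.rstrip())
    rows.append(" ".join(str(v) for v in values))  # horizontal axis
    return "\n".join(rows)


print(text_dotplot([4, 5, 5, 7, 6]))
```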
Stem-and-Leaf Plots (4) • A simple graph for quantitative data • Uses the actual numerical values of each data point. • Divide each measurement into two parts: the stem and the leaf. • List the stems in a column, with a vertical line to their right. • For each measurement, record the leaf portion in the same row as its matching stem. • Order the leaves from lowest to highest in each stem. • Provide a key to your coding.
Example: Stem-and-Leaf Plot • The prices ($) of 18 brands of walking shoes: 90 70 70 70 75 70 65 68 60 74 70 95 75 70 68 65 40 65
4 | 0
5 |
6 | 0 5 5 5 8 8
7 | 0 0 0 0 0 0 4 5 5
8 |
9 | 0 5
Key: 6 | 5 = $65
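The stem-and-leaf steps translate directly into code. This Python sketch (the helper name `stem_and_leaf` is ours) uses the tens digit as the stem and the units digit as the leaf, which assumes non-negative integer data such as these shoe prices; stems with no leaves are kept so gaps in the data stay visible.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Stem = tens digit, leaf = units digit (assumes non-negative integers)."""
    leaves = defaultdict(list)
    for x in data:
        leaves[x // 10].append(x % 10)
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):  # keep empty stems
        row = " ".join(str(leaf) for leaf in sorted(leaves[stem]))
        lines.append(f"{stem} | {row}".rstrip())
    return "\n".join(lines)


prices = [90, 70, 70, 70, 75, 70, 65, 68, 60, 74, 70, 95, 75, 70, 68, 65, 40, 65]
print(stem_and_leaf(prices))
print("Key: 6 | 5 = $65")
```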
Relative Frequency Histograms (5) • A relative frequency histogram for a quantitative data set is a bar graph in which the height of the bar shows "how often" (measured as a proportion or relative frequency) measurements fall in a particular class or subinterval. • Divide the range of the data into 5-12 subintervals of equal length. • Calculate the approximate width of the subinterval as Range/number of subintervals. • Round the approximate width up to a convenient value. • Use the method of left inclusion, including the left endpoint, but not the right, in your tally. • Create a statistical table including the subintervals, their frequencies, and relative frequencies.
Relative Frequency Histograms (5), cont'd • Draw the relative frequency histogram, plotting the subintervals on the horizontal axis and the relative frequencies on the vertical axis. • The height of the bar represents • The proportion of measurements falling in that class or subinterval. • The probability that a single measurement, drawn at random from the set, will belong to that class or subinterval.
Example 1 • The ages of 50 tenured faculty at a state university:
34 48 70 63 52 52 35 50 37 43 53 43 52 44
42 31 36 48 43 26 58 62 49 34 48 53 39 45
34 59 34 66 40 59 36 41 35 36 62 34 38 28
43 50 30 43 32 44 58 53
• Range = 70 – 26 = 44 • We choose to use 6 intervals. • Minimum class width = (70 – 26)/6 = 7.33 • Convenient class width = 8 • Use 6 classes of length 8, starting at 25.
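The class-width calculation and the left-inclusion tally can be reproduced in Python as a sketch. It uses `math.ceil` to round 7.33 up to 8, which happens to match the slide's convenient width; in general the "convenient value" is a judgment call. The starting point 25 is taken from the slide.

```python
import math

ages = [34, 48, 70, 63, 52, 52, 35, 50, 37, 43, 53, 43, 52, 44,
        42, 31, 36, 48, 43, 26, 58, 62, 49, 34, 48, 53, 39, 45,
        34, 59, 34, 66, 40, 59, 36, 41, 35, 36, 62, 34, 38, 28,
        43, 50, 30, 43, 32, 44, 58, 53]

k = 6                                        # number of classes chosen on the slide
approx_width = (max(ages) - min(ages)) / k   # 44 / 6 = 7.33
width = math.ceil(approx_width)              # rounded up to 8
start = 25                                   # first lower boundary, from the slide

n = len(ages)
freqs = []
for i in range(k):
    lo, hi = start + i * width, start + (i + 1) * width
    f = sum(lo <= x < hi for x in ages)      # left inclusion: lo <= x < hi
    freqs.append(f)
    print(f"{lo} to < {hi}: freq {f:2d}, rel. freq {f / n:.2f}")
```

Counting this way gives class frequencies of 5, 14, 13, 9, 7, and 2, which sum to 50.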
Describing the Distribution • Shape? Skewed right. • Outliers? No. • What proportion of the tenured faculty are younger than 42.5? (16 + 5)/50 = 21/50 = .42 = 42% • What is the probability that a randomly selected faculty member is 52 or older? (10 + 4 + 1)/50 = 15/50 = .30
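The two answers can be cross-checked by counting directly from the 50 raw ages in Example 1, rather than reading bar heights off a histogram. In this Python sketch, 21 ages fall below 42.5 and 15 are 52 or older, giving 21/50 = .42 and 15/50 = .30.

```python
ages = [34, 48, 70, 63, 52, 52, 35, 50, 37, 43, 53, 43, 52, 44,
        42, 31, 36, 48, 43, 26, 58, 62, 49, 34, 48, 53, 39, 45,
        34, 59, 34, 66, 40, 59, 36, 41, 35, 36, 62, 34, 38, 28,
        43, 50, 30, 43, 32, 44, 58, 53]
n = len(ages)

younger = sum(x < 42.5 for x in ages)  # count of ages below 42.5
older = sum(x >= 52 for x in ages)     # count of ages 52 or older

print(f"P(age < 42.5) = {younger}/{n} = {younger / n:.2f}")
print(f"P(age >= 52)  = {older}/{n} = {older / n:.2f}")
```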
How Many Class Intervals? • Many (Narrow class intervals) • may yield a very jagged distribution with gaps from empty classes • Can give a poor indication of how frequency varies across classes • Few (Wide class intervals) • may compress variation too much and yield a blocky distribution • can obscure important patterns of variation. (X axis labels are upper class endpoints)
Example 2 • A manufacturer of insulation randomly selects 20 winter days and records the daily high temperature: 24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27
Example 2: Solution (Frequency Distribution) • Sort raw data in ascending order:12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58 • Find range: 58 - 12 = 46 • Select number of classes: 5 (usually between 5 and 12) • Compute class interval (width): 10 (46/5 then round up) • Determine class boundaries (limits): 10, 20, 30, 40, 50, 60 • Compute class midpoints: 15, 25, 35, 45, 55 • Count observations & assign to classes
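The seven steps above can be sketched in Python. Two choices here are assumptions made to match the slide: the class width is rounded up to the next multiple of 10 as the "convenient" value, and the first class boundary is fixed at 10, a round number just below the minimum of 12.

```python
import math

temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

data = sorted(temps)                             # 1. sort ascending
data_range = data[-1] - data[0]                  # 2. range: 58 - 12 = 46
k = 5                                            # 3. number of classes
width = math.ceil(data_range / k / 10) * 10      # 4. 46/5 = 9.2, rounded up to 10
start = 10                                       # first boundary (assumed, as on the slide)
boundaries = [start + i * width for i in range(k + 1)]  # 5. 10, 20, ..., 60
classes = list(zip(boundaries, boundaries[1:]))
midpoints = [(a + b) / 2 for a, b in classes]    # 6. 15, 25, ..., 55

# 7. count observations with left inclusion: lower <= x < upper
n = len(data)
freqs = [sum(a <= x < b for x in data) for a, b in classes]
for (a, b), m, f in zip(classes, midpoints, freqs):
    print(f"{a} <= X < {b}  midpoint {m:g}  freq {f}  "
          f"rel {f / n:.2f}  pct {100 * f / n:.0f}%")
```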
Example 2: Solution (Frequency Distribution) (continued)
Data in ordered array: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Class          Frequency   Relative Frequency   Percentage
10 ≤ X < 20    3           .15                  15
20 ≤ X < 30    6           .30                  30
30 ≤ X < 40    5           .25                  25
40 ≤ X < 50    4           .20                  20
50 ≤ X < 60    2           .10                  10
Total          20          1.00                 100
Histogram: Example 2
Class          Class Midpoint   Frequency
10 ≤ X < 20    15               3
20 ≤ X < 30    25               6
30 ≤ X < 40    35               5
40 ≤ X < 50    45               4
50 ≤ X < 60    55               2
[Figure: histogram of frequency against class midpoints (no gaps between bars)]
Ogive (6) • An ogive is a curve drawn for the cumulative frequency distribution by joining with straight lines the dots marked above the upper boundaries of classes at heights equal to the cumulative frequencies of the respective classes. • Two types of ogive: (i) the less-than ogive (ii) the greater-than ogive • First, build a table of cumulative frequencies.
Cumulative Frequency
Data in ordered array: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Class          Frequency   Percentage   Cumulative Frequency   Cumulative Percentage
10 ≤ X < 20    3           15           3                      15
20 ≤ X < 30    6           30           9                      45
30 ≤ X < 40    5           25           14                     70
40 ≤ X < 50    4           20           18                     90
50 ≤ X < 60    2           10           20                     100
Total          20          100
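Cumulative frequencies are running totals of the class frequencies. Following the ogive definition given earlier (points above the upper class boundaries), this Python sketch applies `itertools.accumulate` to the Example 2 frequencies and lists the points of a less-than ogive, starting at 0% at the lower boundary of the first class.

```python
from itertools import accumulate

# Example 2 class frequencies and the upper boundary of each class.
upper_boundaries = [20, 30, 40, 50, 60]
freqs = [3, 6, 5, 4, 2]
n = sum(freqs)                               # 20 observations

cum_freqs = list(accumulate(freqs))          # running totals
cum_pcts = [100 * c / n for c in cum_freqs]  # cumulative percentages

# Less-than ogive points: (upper class boundary, cumulative percentage),
# anchored at 0% on the lower boundary of the first class.
ogive = [(10, 0.0)] + list(zip(upper_boundaries, cum_pcts))
for x, y in ogive:
    print(f"less than {x}: {y:.0f}%")
```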
Graphing Cumulative Frequencies: The Ogive (points are plotted at class boundaries, not midpoints)
Class          Upper Class Boundary   Cumulative Percentage
X < 10         10                     0
10 ≤ X < 20    20                     15
20 ≤ X < 30    30                     45
30 ≤ X < 40    40                     70
40 ≤ X < 50    50                     90
50 ≤ X < 60    60                     100
Interpreting Graphs: Location and Spread • Where is the data centered on the horizontal axis, and how does it spread out from the center?
Interpreting Graphs: Shapes • Mound shaped and symmetric (mirror images) • Skewed right: a few unusually large measurements • Skewed left: a few unusually small measurements • Bimodal: two local peaks
Interpreting Graphs: Outliers • Are there any strange or unusual measurements that stand out in the data set? [Figures: one plot with no outliers and one with an outlier]
Example • A quality control process measures the diameter (cm) of a gear being made by a machine. The technician records 16 diameters, but inadvertently makes a typing mistake on the second entry: 1.991 1.891 1.991 1.988 1.993 1.989 1.990 1.988 1.988 1.993 1.991 1.989 1.989 1.993 1.990 1.994
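One way to flag the typing mistake numerically is the 1.5 × IQR fence commonly used to mark outliers on a boxplot; the slides list boxplots but do not state this rule, so treat it as an assumption. The Python sketch below uses `statistics.quantiles` (Python 3.8+, default "exclusive" method) to compute the quartiles and reports any entries outside the fences.

```python
import statistics

diameters = [1.991, 1.891, 1.991, 1.988, 1.993, 1.989, 1.990, 1.988,
             1.988, 1.993, 1.991, 1.989, 1.989, 1.993, 1.990, 1.994]

# Quartiles Q1 and Q3 (the middle value returned is the median).
q1, _, q3 = statistics.quantiles(diameters, n=4)
iqr = q3 - q1

# Rule of thumb: values more than 1.5 * IQR beyond the quartiles are suspect.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [(i + 1, d) for i, d in enumerate(diameters) if d < lower or d > upper]

print(f"fences: [{lower:.4f}, {upper:.4f}]")
print("suspect entries (position, value):", outliers)
```

Only the second entry, 1.891, falls outside the fences, matching the typing mistake described above.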