The art of learning about and understanding our world through data. The Nature of Statistics: • File Information: 31 Slides • To Print : • Set PRINT WHAT to Handouts. • Under HANDOUTS select the number of slides per page. A sample of the layout on a page appears to the right. • To change the orientation of the printing, select the PREVIEW button (lower left) and then the Orientation option on the Print Preview menu.
Essentials: The Nature of Statistics (a.k.a. the bare minimum I should take along from this topic) • Definitions and relationships as presented on the Anatomy of the Basics: Statistical Terms and Relationships sheet • Identification of variables and their characteristics • Careful review of data and their presentation • Providing a context for the data • Why percentages and not numeric counts when making comparisons
7 80 35,000 • What do you know about these numbers? • What do they mean to you? • What is missing?
Okay, so What is Statistics? (or is that What ARE Statistics?) Statistics is the study of how to collect, organize, analyze, interpret and report numerical information in order to make decisions. Statistics are the numeric data we use to better understand our world. They may take the form of frequencies, means, percentages, variances, etc.
Basic Terminology • DATA: Numbers with a context, i.e. numbers with meaning. • Example: not 48.2, but 48.2 kg; not 5.23, but $5.23. • VARIABLE: A characteristic or property of an individual population unit that varies from one person or thing to another. • Example: age, square footage, and assessed value represent three variables associated with homes in Oneonta. • Variables have Values. Example: The variable hair color has the values of brown, blonde, red, etc. • UNIT (Element): Any individual member of the population. • Example: Each bottle of soda in a production run is a unit.
Basic Terminology: Population • POPULATION: • Complete collection of all elements or units (usually people, objects, transactions, or events) that we are interested in studying. • In terms of data, a population is the collection of all outcomes, responses, measurements, or counts that are of interest. • CENSUS: A complete enumeration (or accounting) of the population, i.e. collecting data from every element (or unit) in the population. • PARAMETER: A numeric value associated with a population (e.g. the average height of ALL students in this class, given that the class has been defined as a population).
Basic Terminology: Sample • SAMPLE: A subset of a population from which information is collected. • Example: 25 cans of corn (sample) randomly obtained from a full day's production (population). • STATISTIC: A numeric value associated with a sample (e.g. the average height of 10 individuals randomly selected from the class (defined population)). • INFERENCE: An estimate, prediction, or some other generalization about a population based on information contained in a sample. • Example: Based upon a randomly selected sample of 35 flights at JFK International Airport (the sample; individual flights are units) taken from all flights on Dec. 24, 2009 (defined population), we can state with a degree of confidence that the mean delay for the population of the day's flights was 35 minutes (sample statistic in context being inferred to the population).
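The parameter/statistic distinction can be sketched in a few lines of Python. This is only an illustration: the class of 40 students and their heights are invented (simulated), and the names `population`, `parameter`, `sample`, and `statistic` are chosen here to mirror the terminology above.

```python
import random
from statistics import mean

# Hypothetical population: heights (cm) of ALL 40 students in a class.
# The values are simulated for illustration, not real measurements.
rng = random.Random(42)
population = [round(rng.gauss(170, 10), 1) for _ in range(40)]

# PARAMETER: a numeric value computed from every unit (a census).
parameter = mean(population)

# SAMPLE: 10 units selected at random, without replacement.
sample = rng.sample(population, 10)

# STATISTIC: the same measure, computed from the sample only.
statistic = mean(sample)

print(f"Population mean (parameter): {parameter:.1f} cm")
print(f"Sample mean (statistic):     {statistic:.1f} cm")
```

The two values will usually be close but not identical; inference is the step of using the statistic to say something about the parameter.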
In Summary To include ALL units, you are looking at: • POPULATION • CENSUS • PARAMETERS To work with a subset of all units, you are looking at: • SAMPLE • STATISTICS • INFERENCES to a population [Diagram: Parameter goes with Population; Statistic goes with Sample.]
Example: Identifying Data Sets In a recent survey, 1708 adults in the United States were asked if they think global warming is a problem that requires immediate government action. Nine hundred thirty-nine of the adults said yes. Describe the data set. Identify: The population: The sample: A variable being studied: Values of the variable: (Adapted from: Pew Research Center) Source: Larson/Farber 4th ed.
Examples: Populations & Samples • Smoking: Identify the population and sample. • In a recent survey, 250 college students at Union College were asked if they smoked cigarettes regularly. Thirty-five of the students said yes. Identify the population and the sample. • Student Income: Decide whether the numerical value describes a population parameter or a sample statistic. • A survey of 450 Cornell University students reported their average weekly income from part-time employment was $325. • For both of the above studies: • What are the units of the population/sample? • Identify a variable being studied. • Identify values of the variable.
Descriptive Statistics: • DESCRIPTIVE STATISTICS: Organize and summarize information using numerical and graphical methods. • Examples: • Summarizing the age of cars driven by students in a frequency table. • Graphing the ages of students. • Identifying the mean speed of cars driving in a 30 mph zone. • A descriptive statement describes some aspect of the data. • Examples: • Thirty-eight percent of the orange trees suffered damage due to the cold temperatures. • The average weight of the 23 cars studied was 2,738 lb.
Descriptive Statistics at Work: SUNY Oneonta Car Registrations During 2006 there were 1,346 cars registered at SUNY Oneonta. Car registrations contain many variables, such as car type, car color, year of car, and license plate number. Noted below are ways descriptive statistics are used to convey information about the selected variables: a frequency table of Registrant Type (i.e. who registered the car), a graphic presentation of Vehicle Age, as well as descriptive measures of vehicle age. [Frequency table and histogram not shown.] Mean & Median: The mean age of cars driven by students was 7.45 years (vs. 6.19 years for employees). The median age of registered vehicles for students was 7.0 years (5.0 years for employees).
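As a small illustration of why both measures are reported, the sketch below computes a mean and a median with Python's standard library. The ten vehicle ages are made up for the example and are not the SUNY Oneonta registration data; they include one much older car to show how a single large value pulls the mean above the median.

```python
from statistics import mean, median

# Hypothetical vehicle ages in years (NOT the actual registration data).
ages = [2, 3, 5, 6, 7, 7, 8, 9, 10, 21]

mean_age = mean(ages)      # total of all ages divided by the count
median_age = median(ages)  # middle value of the sorted ages

print(f"Mean age:   {mean_age:.2f} years")
print(f"Median age: {median_age:.1f} years")
```

When the mean sits above the median, as in the student and employee figures quoted above, a few older vehicles are likely stretching the upper end of the distribution.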
Inferential Statistics: • INFERENTIAL STATISTICS: Uses sample data to make estimates, decisions, predictions, or other generalizations about the population. • The aim of inferential statistics is to make an inference about a population, based on a sample (as opposed to a census), AND to provide a measure of precision for the method used to make the inference. • An inferential statement uses data from a sample and applies it to a population.
Examples of Inferential Statistics: • A Gallup Poll found that 57% of dating teens had been out with somebody of another race or ethnic group (+/- 4.5%; 95% CI). • Interpretation: We are 95% confident that between 52.5% and 61.5% of dating teens have been out with someone of a different race/ethnicity. • A Gallup Poll found that 40% of Americans would quit their job if they won the lottery (+/- 4%; 95% CI). • Interpretation: We are 95% confident that the true population proportion of Americans who would quit their job if they were to win a lottery lies between 36% and 44%.
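The interval in each interpretation is just the sample percentage plus and minus the reported margin of error. A minimal sketch of that arithmetic follows; the function name `confidence_interval` is made up for this example, and the margins are taken as given from the polls rather than computed from sample sizes (which the slides do not report).

```python
def confidence_interval(estimate_pct, margin_pct):
    """Return (lower, upper) bounds: estimate minus and plus the margin of error."""
    return (estimate_pct - margin_pct, estimate_pct + margin_pct)

# Dating teens poll: 57% with a margin of +/- 4.5 percentage points.
teens_low, teens_high = confidence_interval(57.0, 4.5)

# Lottery poll: 40% with a margin of +/- 4 percentage points.
lottery_low, lottery_high = confidence_interval(40.0, 4.0)

print(f"Dating teens: {teens_low}% to {teens_high}%")
print(f"Lottery:      {lottery_low}% to {lottery_high}%")
```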
Example: Descriptive and Inferential Statistics A large sample of men, aged 48, was studied for 18 years. For unmarried men, approximately 70% were alive at age 65. For married men, 90% were alive at age 65. (Source: The Journal of Family Issues) Decide which part of the study represents the descriptive branch of statistics. What conclusions might be drawn from the study using inferential statistics? Source: Larson/Farber 4th ed.
Two Types of Data Qualitative Data – can be separated into different categories that are distinguished by some nonnumeric characteristic. Qualitative data is also referred to as categorical or attribute data. Examples include gender, eye color, and car brands Quantitative Data – numbers representing counts or measurements. This type of data may be subdivided into two categories...
Two Types of Quantitative Data • Discrete Data - result when the number of possible values is either a finite or countably infinite number. • Examples: Siblings, Cars, and Coins in a jar (think of whole number counts here; even if you cannot count them all). • Continuous Data - result from infinitely many possible values corresponding to some continuous scale that covers a range of values without gaps, interruptions, or jumps. Continuous data can assume any value, including fractional parts. • Examples: Height, Weight, Time. N.B.: Qualitative data cannot be classified as discrete or continuous.
Example: Classifying Data by Type The base prices of several vehicles are shown in the table. Which data are qualitative data and which are quantitative data? (Source: Ford Motor Company) Source: Larson/Farber 4th ed.
4 Levels of Measurement The level of measurement determines which statistical calculations are meaningful. The four levels of measurement, from lowest to highest, are: nominal, ordinal, interval, and ratio.
Levels of Measurement (cont.) • Nominal – characterized by data that consist of names, labels, or categories only. The data cannot be arranged in an ordering scheme. Qualitative data. • Examples: Gender, Yes/No, Political Party affiliation, names of students. • Ordinal – characterized by data that can be arranged in some order, but the differences between data values either cannot be determined or are meaningless. These variables may be either qualitative (categorical) data or quantitative (numerical) data. • Examples: Military Rank, Position in a race, Attitude scales.
Levels of Measurement (cont.) • Interval – like the ordinal level, with the additional property that the difference between any two data values is meaningful. However, there is no natural zero starting point. Quantitative data. • Examples: Temperature (F or C); IQ scores; Calendar Years. • Ratio – is the interval level modified to include the natural zero starting point. At this level, differences and ratios are both meaningful. Quantitative data. • Examples: Height, Weight, Time, Age.
Summary of Levels of Measurement
Level of measurement | Put data in categories | Arrange data in order | Subtract data values | Determine if one data value is a multiple of another
Nominal | Yes | No | No | No
Ordinal | Yes | Yes | No | No
Interval | Yes | Yes | Yes | No
Ratio | Yes | Yes | Yes | Yes
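The summary has a simple cumulative structure: each level permits everything the levels below it permit, plus one more operation. One way to sketch this, with names (`Level`, `MINIMUM_LEVEL`, `is_meaningful`) invented for the illustration:

```python
from enum import IntEnum

class Level(IntEnum):
    """Levels of measurement, ordered from lowest to highest."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

# Lowest level at which each operation becomes meaningful.
MINIMUM_LEVEL = {
    "categorize": Level.NOMINAL,   # put data in categories
    "order": Level.ORDINAL,        # arrange data in order
    "subtract": Level.INTERVAL,    # subtract data values
    "multiple": Level.RATIO,       # one value is a multiple of another
}

def is_meaningful(level, operation):
    """True if the operation is meaningful for data at the given level."""
    return level >= MINIMUM_LEVEL[operation]

# Temperature in degrees F is interval-level data:
print(is_meaningful(Level.INTERVAL, "subtract"))  # differences make sense
print(is_meaningful(Level.INTERVAL, "multiple"))  # "twice as hot" does not
```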
Example: Classifying Data by Level Two data sets are shown. Which data set consists of data at the nominal level? Which data set consists of data at the ordinal level? (Source: Nielsen Media Research) Source: Larson/Farber 4th ed.
Example: Classifying Data by Level Two data sets are shown. Which data set consists of data at the interval level? Which data set consists of data at the ratio level? (Source: Major League Baseball) Source: Larson/Farber 4th ed.
Anatomy of the Basics: Statistical Terms and Relationships
Statistics is the study of how to collect, organize, analyze, interpret and report numerical information.
• Descriptive Statistics: methods for organizing and summarizing information. E.g. Number of students in this class by major, baseball standings, housing sales by month.
• Inferential Statistics: methods for drawing conclusions and measuring the reliability of those conclusions using sample results. E.g. Political views of all 4-year college students.
Population vs. Sample
• Population: all individuals, items, or objects whose characteristics are being studied.
• Census: data collected from ALL members of the population.
• Parameter: numerical characteristic of a population.
• Sample: a portion of the population selected for study.
• Statistic: numerical characteristic of a sample.
Variable: a characteristic or property of an individual unit. Variables have values.
• Qualitative: a variable that cannot be measured numerically. E.g. Gender, eye color.
• Quantitative: a variable that can be measured numerically. E.g. Income, height, number of siblings one has.
• Discrete: a variable whose values are countable. It can only assume certain values, with no intermediate values. E.g. Number of auto accidents in Oneonta in 1998.
• Continuous: a variable that can assume any numerical value over an interval or intervals. E.g. Time.
Scaling of Variables (Measurement Levels)
No Arithmetic Operations: individual observations can only be categorized.
• Nominal: grouping individual observations into qualitative categories or classes. E.g. Grouping individuals by whether they are left-handed or right-handed.
• Ordinal: individual observations are assigned a number or “ranking.” There is a sense of “more than,” but you cannot say “how much” more than. E.g. Military ranks.
Arithmetic Operations: individual observations have meaningful numeric values.
• Interval: variables have no true zero point. Differences are meaningful, but you cannot say how many times more. E.g. Temperature (F or C), IQ scores.
• Ratio: variables have a true zero point. Can say how many times more. E.g. Weight, height.
Misuse of Statistics (ah yes… the old “torture the data long enough and they will confess to anything” routine...) • Precise Numbers: Tonight’s paid attendance was 56,423. • Guesstimates: It was estimated that one million spectators lined the road to L’Alpe d’Huez for the 16th stage of the 2004 Tour de France race. • Distorted Percentages: New and improved with 50% more ... (50% might not be a meaningful amount). • Partial Pictures: Ford truck ads. • Loaded Questions: Line item veto. • Misleading Graphs: Visual distortions of data. • Pictographs: The crescive cow. • Pollster Pressure: Public bathrooms. • Small/Bad Samples: 67% suspended. • Self-Selected Surveys: CNN phone-in surveys.
Visual Presentations of Data – Beware Source: http://findarticles.com
Data Considerations Anecdotal Evidence – basing our conclusions on a few individual cases. e.g. We remember the airplane crash that kills several hundred people and fail to notice that data for all flights show that flying is much safer than driving. Lurking Variables – almost all relationships between two variables are influenced by other variables lurking in the background.
Airline Flights: Alaska Airlines vs. America West
Airline | On Time | Delayed
Alaska Airlines | 3274 (86.7%) | 501 (13.3%)
America West | 6438 (89.1%) | 787 (10.9%)
Which would you choose to fly?
Alaska Airlines vs. America West: A Closer Look
Departure Location | Alaska Air On Time | Alaska Air Delayed | America West On Time | America West Delayed
Los Angeles | 497 | 62 | 694 | 117
Phoenix | 221 | 12 | 4840 | 415
San Diego | 212 | 20 | 383 | 65
San Francisco | 503 | 102 | 320 | 129
Seattle | 1841 | 305 | 201 | 61
TOTAL | 3274 | 501 | 6438 | 787
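The lurking variable here is the departure city. A short sketch that turns the counts above into on-time percentages, city by city and overall, makes the reversal visible. The counts are copied from the table; the variable and function names are chosen for this illustration.

```python
# (on_time, delayed) counts per departure city, copied from the table above.
flights = {
    "Los Angeles":   {"Alaska": (497, 62),   "America West": (694, 117)},
    "Phoenix":       {"Alaska": (221, 12),   "America West": (4840, 415)},
    "San Diego":     {"Alaska": (212, 20),   "America West": (383, 65)},
    "San Francisco": {"Alaska": (503, 102),  "America West": (320, 129)},
    "Seattle":       {"Alaska": (1841, 305), "America West": (201, 61)},
}

def on_time_pct(on_time, delayed):
    """Percentage of flights that were on time."""
    return 100 * on_time / (on_time + delayed)

# City-by-city comparison.
city_rates = {}
for city, airlines in flights.items():
    city_rates[city] = {name: on_time_pct(*counts) for name, counts in airlines.items()}
    a, w = city_rates[city]["Alaska"], city_rates[city]["America West"]
    print(f"{city:<14} Alaska {a:5.1f}%   America West {w:5.1f}%")

# Overall comparison: add up the counts across all cities first.
overall = {}
for name in ("Alaska", "America West"):
    total_on_time = sum(flights[city][name][0] for city in flights)
    total_delayed = sum(flights[city][name][1] for city in flights)
    overall[name] = on_time_pct(total_on_time, total_delayed)
print(f"{'OVERALL':<14} Alaska {overall['Alaska']:5.1f}%   "
      f"America West {overall['America West']:5.1f}%")
```

Alaska has the higher on-time percentage in every individual city, yet America West has the higher percentage overall, because most America West flights leave from Phoenix (where delays are rare) while most Alaska flights leave from Seattle (where delays are common). This is also why percentages, not raw counts, are the right basis for the comparison.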