Economic Reasoning Using Statistics Econ 138 Dr. Adrienne Ohler
How you will learn. • Textbook: Stats: Data and Models 2nd Ed., by Richard D. De Veaux, Paul E. Velleman, and David E. Bock • Homework: MyStatLab brought to you by www.coursecompass.com
The rest of this class • Attendance Policy • Cellphone Policy • Homeworks (10 out of 12) • Due Sundays by 11:59pm • Quizzes (5 out of 6) • Exams • Oct. 10th • Nov. 28 • Cumulative Optional Final • Data Project
Help for this Class • READ THE BOOK • Come to class prepared and awake • READ THE BOOK • Office Hours: T, H 9-11am and by Appointment • READ THE BOOK • Get a tutor at the Visor Center
Economic reasoning using statistics • What is economics? • The study of scarcity, incentives, and choices. • The branch of knowledge concerned with the production, consumption, and transfer of wealth. (google) • Wealth • The health, happiness, and fortunes of a person or group. (google) • What is/are statistics? • Statistics (the discipline) is a way of reasoning, a collection of tools and methods, designed to help us understand the world. • Statistics (plural) are particular calculations made from data. • Data are values with a context.
Statistics • Statistics (the discipline) is a way of reasoning, a collection of tools and methods, designed to help us understand the world. • Will the sun rise tomorrow?
What is Statistics Really About? • A statistic is a number that summarizes a characteristic of a set of data (e.g., average, standard deviation, maximum, minimum, range). • Statistics is about variation. • All measurements are imperfect, since there is variation that we cannot see. • Statistics helps us to understand the real, imperfect world in which we live and it helps us to get closer to the unveiled truth.
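A minimal Python sketch of what "statistics (plural)" look like in practice: a few numbers calculated from data. The ages are made-up values chosen for illustration, not data from the course.

```python
import statistics

# Hypothetical ages of eight party guests (made-up values for illustration).
ages = [18, 19, 21, 22, 22, 24, 30, 76]

summary = {
    "mean": statistics.mean(ages),
    "stdev": statistics.stdev(ages),  # sample standard deviation
    "min": min(ages),
    "max": max(ages),
    "range": max(ages) - min(ages),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Note how one unusual guest pulls the mean well above most of the ages, which is why a single statistic should always be read alongside the variation in the data.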
The language of Statistics • A form of literacy • 4 cows in a field • 7 cows by the road • 4 cows in a field on the left • 3 cows in a field on the right • At a party • Average age is 18 • Average age is 22 • Average age is 75
In this class • Observe the real world • Create a hypothesis • Collect data • Understand and classify our data • Graph our data • Standardize our data • Apply probability rules to our data • Test our hypothesis • Interpret our results
Questioning a Statistic • ½ of all American children will witness the breakup of a parent’s marriage. Of these, close to ½ will also see the breakup of a parent’s second marriage. • (Furstenberg et al., American Sociological Review, 1983) • 66% of the total adult population in this country is currently overweight or obese. • (http://win.niddk.nih.gov/statistics/) • 28% of American adults have left the faith in which they were raised in favor of another religion - or no religion at all. • (http://religions.pewforum.org/reports)
Chapter 2 - What Are Data? • Information • Data can be numbers, record names, or other labels. • Not all data represented by numbers are numerical data (e.g., 1=male, 2=female). • Data are useless without their context…
The “W’s” • To provide context we need the W’s • Who • What (and in what units) • When • Where • Why (if possible) • and How of the data. • Note: the answers to “who” and “what” are essential.
Who • The Who of the data tells us the individual cases about which (or whom) we have collected data. • Individuals who answer a survey are called respondents. • People on whom we experiment are called subjects or participants. • Animals, plants, and inanimate subjects are called experimental units. • Sometimes people just refer to data values as observations and are not clear about the Who. • But we need to know the Who of the data so we can learn what the data say.
Identify the Who in the following dataset. • Are physically fit people less likely to die of cancer? • Suppose an article in a sports medicine journal reported results of a study that followed 22,563 men aged 30 to 87 for 5 years. • The physically fit men had a 57% lower risk of death from cancer than the least fit group.
Who are they studying? • The cause of death for 22,563 men in the study • The fitness level of the 22,563 men in the study • The age of each of the 22,563 men in the study • The 22,563 men in the study
What and Why • Variables are characteristics recorded about each individual. • Each variable should have a name that identifies What has been measured. • A categorical (or qualitative) variable names categories and answers questions about how cases fall into those categories. • Categorical examples: sex, race, ethnicity
What and Why (cont.) • A quantitative variable is a measured variable (with units) that answers questions about the quantity of what is being measured. • Quantitative examples: income ($), height (inches), weight (pounds)
What and Why (cont.) • Example: In a fitness evaluation, one question asked respondents to evaluate the statement “I consider myself physically fit” on the following scale: • 1 = Disagree Strongly; • 2 = Disagree; • 3 = Neutral; • 4 = Agree; • 5 = Agree Strongly. • Question: Is fitness categorical or quantitative?
What and Why (cont.) • We sense an order to these ratings, but there are no natural units for the variable fitness. • Variables like fitness are often called ordinal variables. • With an ordinal variable, look at the Why of the study to decide whether to treat it as categorical or quantitative.
Are Fit People Less Likely to Die of Cancer? --------------Who is the population of interest? • All people • All men who exercise • All men who die of cancer • All men
Identifying Identifiers • Identifier variables are categorical variables with exactly one individual in each category. • Examples: Social Security Number, ISBN, FedEx Tracking Number • Don’t be tempted to analyze identifier variables. • Be careful not to consider all variables with one case per category, like year, as identifier variables. • The Why will help you decide how to treat identifier variables.
Counts Count • When we count the cases in each category of a categorical variable, the counts are not the data, but something we summarize about the data. • The category labels are the What, and • the individuals counted are the Who.
Where, When, and How • When and Where give us some nice information about the context. • Example: Values recorded at a large public university may mean something different than similar values recorded at a small private college.
Where, When, and How • GPA of Econ 101 classes. • Class 1 – 2.56 • Class 2 – 3.34 • Where – Washington State University • When – during the fall and spring semesters
Where, When, and How (cont.) • How the data are collected can make the difference between insight and nonsense. • Example: results from voluntary Internet surveys are often useless • Example: Data collection of ‘Who will win Republican Primary?’ • Survey ISU students on campus • Run a Facebook survey • Rasmussen Reports national telephone survey
Why is statistics challenging? • Word problems… • Rules of statistics don’t change • Data is information • If you are struggling with a problem, always ask the W questions about the data collected. • Who • What • When • Where • Why
Chapter 3 • Displaying and Describing • Categorical Data
Methods of Displaying Data • Frequency Table • Relative Frequency table • Bar Chart • Relative Frequency bar chart • Pie Chart • Contingency table • Contingency tables and Conditional Distributions • Segmented Bar charts
Frequency Tables: Making Piles • We can “pile” the data by counting the number of data values in each category of interest. • We can organize these counts into a frequency table, which records the totals and the category names.
Frequency Tables: Making Piles (cont.) • A relative frequency table is similar, but gives the percentages (instead of counts) for each category.
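The "piling" described above can be sketched in a few lines of Python. The responses below are made-up answers to a "What year in school are you?" question, used only to illustrate the two tables.

```python
from collections import Counter

# Hypothetical survey responses (made-up data for illustration).
years = ["Freshman", "Sophomore", "Freshman", "Junior", "Senior",
         "Sophomore", "Freshman", "Junior", "Sophomore", "Freshman"]

counts = Counter(years)        # frequency table: category -> count
total = sum(counts.values())   # number of cases (the Who)
relative = {cat: n / total for cat, n in counts.items()}  # proportions

print(f"{'Year':<10} {'Count':>5} {'Percent':>8}")
for cat in ["Freshman", "Sophomore", "Junior", "Senior"]:
    print(f"{cat:<10} {counts[cat]:>5} {relative[cat]:>8.0%}")
```

The counts answer "how many?" and the percentages answer "what share?"; the relative frequencies always add up to 100%.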
Bar Charts • A bar chart displays the distribution of a categorical variable, showing the counts for each category next to each other for easy comparison. • A bar chart stays true to the area principle. • Thus, a better display for the ship data is:
Bar Charts (cont.) • A relative frequency bar chart displays the relative proportion of counts for each category. • A relative frequency bar chart also stays true to the area principle. • Replacing counts with percentages in the ship data:
What year in school are you? • Freshman • Sophomore • Junior • Senior
Pie Charts • When you are interested in parts of the whole, a pie chart might be your display of choice. • Pie charts show the whole group of cases as a circle. • They slice the circle into pieces whose size is proportional to the fraction of the whole in each category.
Methods of Displaying Data • Frequency Table (How much?) • Relative Frequency table (What percentage?) • Bar Chart (How much?) • Relative Frequency bar chart (What percentage?) • Pie Chart (How much?) • Contingency table and Marginal Distributions • Contingency tables and Conditional Distributions
Contingency Tables • A contingency table allows us to look at two categorical variables together. • It shows how individuals are distributed along each variable, contingent on the value of the other variable. • Example: we can examine the class of ticket and whether a person survived the Titanic:
Contingency Table • The two variables in this contingency table are gender and class/section number.
Contingency Tables (cont.) • The margins of the table, both on the right and on the bottom, give totals and the frequency distributions for each of the variables. • Each frequency distribution is called a marginal distribution of its respective variable.
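As a rough sketch of how the margins are obtained, the Python below totals the rows and columns of a small contingency table. The gender-by-section counts are hypothetical, invented for illustration.

```python
# Hypothetical counts of students by gender and section (made-up data).
table = {
    "Female": {"Section 1": 30, "Section 2": 20},
    "Male":   {"Section 1": 15, "Section 2": 35},
}
sections = ["Section 1", "Section 2"]

# Right-hand margin: totals for each gender (marginal distribution of gender).
row_totals = {g: sum(cells.values()) for g, cells in table.items()}

# Bottom margin: totals for each section (marginal distribution of section).
col_totals = {s: sum(table[g][s] for g in table) for s in sections}

grand_total = sum(row_totals.values())

print("Gender totals: ", row_totals)
print("Section totals:", col_totals)
print("Grand total:   ", grand_total)
```

Both margins sum to the same grand total, which is a quick check that the table was tallied correctly.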
Conditional Distributions • A conditional distribution shows the distribution of one variable for just the individuals who satisfy some condition on another variable. • The following is the conditional distribution of ticket Class, conditional on having survived:
Conditional Distributions (cont.) • The following is the conditional distribution of ticket Class, conditional on having perished:
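A conditional distribution can be sketched by restricting attention to one row of the table and dividing by that row's total. The counts below are made up for illustration and are not the actual Titanic figures.

```python
# Hypothetical counts by survival status and ticket class
# (made-up numbers, NOT the real Titanic data).
counts = {
    "Survived": {"First": 40, "Second": 30, "Third": 30},
    "Perished": {"First": 10, "Second": 30, "Third": 60},
}

def conditional_distribution(row):
    """Turn one row of counts into proportions of that row's total."""
    total = sum(row.values())
    return {category: n / total for category, n in row.items()}

given_survived = conditional_distribution(counts["Survived"])
given_perished = conditional_distribution(counts["Perished"])

for ticket_class in ["First", "Second", "Third"]:
    print(f"{ticket_class:<7} survived: {given_survived[ticket_class]:.0%}  "
          f"perished: {given_perished[ticket_class]:.0%}")
```

Each conditional distribution sums to 100% on its own, because the denominator is the number of individuals who satisfy the condition, not the grand total.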
What Can Go Wrong? (cont.) • Don’t confuse similar-sounding percentages—pay particular attention to the wording of the context. • The percentage of students that are female & in ECO 138 Section 1 • (cell distribution) • The percentage of females that are in ECO 138 Section 1 • (conditioned upon females) • The percentage of ECO 138 Section 1 students that are females • (conditioned upon ECO 138 Section 1)
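The three similar-sounding percentages share the same numerator but use different denominators. A small worked example in Python, using hypothetical counts invented for illustration:

```python
# Hypothetical counts (made-up data for illustration).
female_in_section_1 = 30   # students who are female AND in Section 1
all_females = 50           # all female students
all_section_1 = 45         # all Section 1 students
all_students = 100         # everyone in the table

# Percentage of students that are female and in Section 1 (cell percentage).
pct_of_all = female_in_section_1 / all_students

# Percentage of females that are in Section 1 (conditioned on females).
pct_of_females = female_in_section_1 / all_females

# Percentage of Section 1 students that are female (conditioned on Section 1).
pct_of_section_1 = female_in_section_1 / all_section_1

print(f"Of all students:    {pct_of_all:.1%}")
print(f"Of females:         {pct_of_females:.1%}")
print(f"Of Section 1:       {pct_of_section_1:.1%}")
```

Reading the wording carefully tells you which total belongs in the denominator.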
Conditional Distributions (cont.) • The conditional distributions tell us that there is a difference in class for those who survived and those who perished. • This is better shown with pie charts of the two distributions:
Segmented Bar Charts • A segmented bar chart displays the same information as a pie chart, but in the form of bars instead of circles. • Here is the segmented bar chart for ticket Class by Survival status:
Conditional Distributions (cont.) • We see that the distribution of Class/Section for males is different from that for females. • This leads us to believe that Class/Section and Gender are associated, that they are not independent. • The variables would be considered independent when the distribution of one variable in a contingency table is the same for all categories of the other variable.
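The independence idea can be sketched as a comparison of row-by-row conditional distributions. Both tables below are hypothetical, and the helper names `row_distributions` and `looks_independent` are invented for this illustration. Real samples rarely match exactly, so small differences can arise by chance; this check only illustrates the definition.

```python
def row_distributions(table):
    """Conditional distribution of the column variable within each row."""
    return {
        row: {col: n / sum(cells.values()) for col, n in cells.items()}
        for row, cells in table.items()
    }

def looks_independent(table, tol=1e-9):
    """True if every row has the same conditional distribution (within tol)."""
    dists = list(row_distributions(table).values())
    first = dists[0]
    return all(abs(d[col] - first[col]) <= tol
               for d in dists[1:] for col in first)

# Hypothetical tables (made-up counts for illustration).
associated = {
    "Female": {"Section 1": 30, "Section 2": 20},   # 60% in Section 1
    "Male":   {"Section 1": 15, "Section 2": 35},   # 30% in Section 1
}
independent = {
    "Female": {"Section 1": 24, "Section 2": 36},   # 40% in Section 1
    "Male":   {"Section 1": 16, "Section 2": 24},   # 40% in Section 1
}

print(looks_independent(associated))    # rows differ
print(looks_independent(independent))   # rows match
```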