
Arguing with Data: Introduction to Descriptive Data Analysis






Presentation Transcript


  1. Arguing with Data: Introduction to Descriptive Data Analysis Professor Sarah Reber Lecture 1

  2. Why I’m teaching this class • Causal analysis is important • Difficult to learn • Difficult to do • Selection, selection, selection! • Useful to know what is true in the world, even if you don’t always know what caused what • I’m extremely annoyed by bad graphs and misuse of statistics/data • Thinking is good, thinking supported by data is better

  3. What is Descriptive Data Analysis? • How has air pollution declined over time? • What are the characteristics of the uninsured? • What share of people own a home? • Do we spend more on education than we used to? What do we spend it on? • Do men earn more than women? • Where do the poor live? • Do whites score higher on tests than blacks? • Do poor teenagers apply to good colleges? • How and why has spending on the SSDI program grown? • Are there more boys than girls born in China? • Are there income gaps in preschool-going?

  4. Using Data to Support Decisions • “Fact” • True fact • Relevant fact • Argument • Theory • Data • Explanation • Uncertainty

  5. Facts, True Facts, Relevant Facts

  6. How does this relate to statistics? • What share of people own a home? • I estimate that based on a sample of data • Statistics tells me how uncertain my estimate is due to sampling error • Lets me build a confidence interval • Statistics doesn’t tell me • Why I care • What it means • What (if anything) I should do
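To make "how uncertain my estimate is due to sampling error" concrete, here is a minimal Python sketch of a 95% confidence interval for a share. It assumes a simple random sample and uses the normal (Wald) approximation; the function name `proportion_ci` and the sample counts in the usage line are hypothetical, chosen only for illustration.

```python
import math

def proportion_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (estimate, lower, upper) for a share using the normal (Wald) approximation.

    Assumes a simple random sample; z = 1.96 gives an approximate 95% interval.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error from sampling alone
    return p_hat, p_hat - z * se, p_hat + z * se

# Hypothetical sample: 640 homeowners among 1,000 sampled households.
share, lo, hi = proportion_ci(640, 1000)
print(f"share owning a home: {share:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

The interval reflects sampling error only; it says nothing about why the share matters or what to do about it. For small samples or shares near 0 or 1, the Wald interval behaves poorly and a Wilson interval is a common alternative.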

  7. What about program evaluation? • Program evaluation is fundamentally causal • What is the effect of X (a program) on Y (some outcome I care about)? • Want to know what will happen if I take Action A instead of Action B • Key issue: finding a plausible counterfactual/control • Critical to making good decisions • If we know nothing about causal effects of anything, we are truly lost…

  8. Example: Preschool • Descriptive: What share of 4-year-olds attend preschool? How does that depend on income? • Statistical: What is the confidence interval around the answer? • Causal: What is the effect of attending preschool on kindergarten performance? On wages later in life? • Statistical: What is the confidence interval around that estimate? • Q: How much will it cost to expand public preschool programs?

  9. Toward a Causal Understanding • Theory/Reasoning/Argument + Descriptive Data Analysis → Causal Understanding / Problem Identification • If my model of the world is correct, what would be true in data? • Is my theory consistent with the facts? • Maybe I don’t know the cause, but can reject some theories

  10. Some Theories • Unemployment Insurance makes people less likely to find a job • Increases in the black/white wage ratio are mostly due to relative improvement in blacks’ human capital (versus reductions in discrimination) • Indian children are shorter than African children because of genetics • Thimerosal causes autism • Rising special education costs are due primarily to rising autism diagnosis/prevalence • The “Missing Girls” of China are explained by biological factors (versus maltreatment and discrimination)

  11. Autism Prevalence (Wikipedia) Bar chart of the number of children aged 6–17 served under the Individuals with Disabilities Education Act (IDEA) with a diagnosis of autism, per 1,000 U.S. resident children aged 6–17, from 1996 through 2007. Counts of children diagnosed with autism for each year were taken from Table 1-9 of IDEA Part B Child Count (2005) and from Table 1-11 of IDEA Part B Child Count (2007). These counts were divided by U.S. Census estimates of the resident population aged 6–17 (the 1990–1999 and 2000–2007 series); for all years, the September population estimates were used.
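The caption describes a simple normalization: a yearly count from the IDEA Part B child count tables, divided by the Census population estimate for the same age group, scaled to 1,000. A minimal Python sketch of that arithmetic follows; the function names are hypothetical, and the real inputs would be the table values the caption cites (none are reproduced here).

```python
def rate_per_1000(count: int, population: int) -> float:
    """Children counted (e.g. served with an autism diagnosis) per 1,000 residents of the same age group."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000 * count / population

def yearly_rates(counts: dict[int, int], populations: dict[int, int]) -> dict[int, float]:
    """Align counts and population estimates by year and return rates per 1,000."""
    missing = set(counts) - set(populations)
    if missing:
        raise KeyError(f"no population estimate for years: {sorted(missing)}")
    return {year: rate_per_1000(counts[year], populations[year]) for year in sorted(counts)}
```

Because the numerator counts children served under IDEA with an autism classification, a rising rate can reflect changes in identification and classification as well as in underlying prevalence.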

  12. http://www.autism-watch.org/general/edu.shtml

  13. Outline of Course • Technique • Visualization = Graphing • Money and inflation • Targeting • Life expectancy • Regression Decomposition • Applications • Disability Insurance Trends and Policy • Inequality Trends and Policy • Others…

  14. Assignments and Grading • Chart Talk 15% • Weekly Assignments 40% • Final Presentation 10% • Final Memo 20% • Class Participation 15%

  15. Next Time • Stata workshop in class • Bring your computer with Stata • Make sure you have a basic understanding of how to use Stata • See instructions and assignment online

  16. Example: Missing Girls • My son’s preschool class seemed to have way more boys than girls last year • Again, his kindergarten class has a lot of boys • A lot of moms of young kids are saying there is a shortage of girls in preschool and kindergarten • Some preschools are saying it is easier to get in if you are a girl (because they tend to balance the gender ratio) • Where are the missing girls of West LA?

  17. First: Why care? • Many schools attempt to balance the gender ratio in each cohort and class • Why? • Missing girls can point to maltreatment of and discrimination against girls • Speculation is driving me crazy! • Why speculate when there are data in the world?!? • Will come back to this

  18. From Speculation to True Fact • Did my son’s class just SEEM to have more boys, or is it actually true? • Do his friends’ classes just SEEM to have more boys, or is it actually true? • Beware confirmation bias • Do I ignore it when people report a preponderance of girls? • Collect Data! • How?

  19. Some Data • I counted the boys and girls on the roster • 15 boys • 10 girls • 40% girls • Asked the preschool director for the boy/girl counts from last year • 11 boys • 5 girls • 31% girls • Now what?

  20. Now what? • OK, so it is not that unusual that this could happen for my child by chance • What do I need to distinguish these theories? • These classes randomly had more boys • The classes are drawing from a population that has more boys • Note: still not saying WHY
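One way to check whether 10 girls out of 25, or 5 girls out of 16, is unusual by chance alone is an exact binomial calculation: if each child is independently female with probability 0.49 (the natural female share used later in the lecture), how likely is a class with this few girls? A minimal Python sketch using only the standard library; treating children as independent draws is an assumption, and the function name `prob_at_most` is hypothetical.

```python
from math import comb

def prob_at_most(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p): the chance of k or fewer girls among n children."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

NATURAL_FEMALE_SHARE = 0.49  # the benchmark share the lecture compares classrooms against

# Classroom counts from the "Some Data" slide.
for label, girls, total in [("current class roster", 10, 25), ("last year's class", 5, 16)]:
    p_tail = prob_at_most(girls, total, NATURAL_FEMALE_SHARE)
    print(f"{label}: {girls}/{total} girls, P(this few or fewer) = {p_tail:.3f}")
```

The two tail probabilities come out at roughly 0.24 and 0.12, well above a conventional 5% cutoff, so either class could plausibly be lopsided by chance. Telling "these classes randomly had more boys" apart from "the population has more boys" takes more classes.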

  21. More data (figure; series: Jack’s Class, Public, Private)

  22. Data collection techniques I'm writing to ask for a hopefully easy favor. Please just ignore me if this is annoying or seems too silly :). I am teaching a new class to public policy students this year ("Arguing with Data: Introduction to Descriptive Data Analysis"). I need some data for an example I plan to do in class addressing the question of whether the sex ratio in parts of West LA has become skewed in some cohorts. Discussions with some of you and others about the seeming excess of boys not only in Temple Isaiah but in other West Side little kid venues inspired me to pursue this example -- and I hope you are as excited as I am to know what the data say! If you have a moment to reply to this email with the following information, I would be most grateful. Your former Golden Sun's current school: Your child's current teacher: Number of boys in the class: Number of girls in the class:

  23. … Cont • None of the classroom compositions are stat sig different from 49% but… • Notice that all the public school averages were below the 49% line • Pool all the data to see if the female share is off for the whole sample • Separately for public and private
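Pooling adds up girls and children across classrooms before computing the share, so the confidence interval narrows as the combined sample grows. A minimal Python sketch, assuming classrooms are independent samples from one population and using the normal approximation; the function names are hypothetical, and the real inputs would be the survey replies (not reproduced here). Run it once on the public-school classes and once on the private-school classes.

```python
import math

def pooled_female_share(classes: list[tuple[int, int]], z: float = 1.96) -> tuple[float, float, float, int]:
    """Pool (boys, girls) counts across classrooms.

    Returns (female share, lower, upper, total children); z = 1.96 gives an
    approximate 95% normal-approximation interval.
    """
    boys = sum(b for b, _ in classes)
    girls = sum(g for _, g in classes)
    n = boys + girls
    if n == 0:
        raise ValueError("no children in the pooled sample")
    share = girls / n
    se = math.sqrt(share * (1 - share) / n)
    return share, share - z * se, share + z * se, n

def differs_from(lower: float, upper: float, benchmark: float = 0.49) -> bool:
    """True when the benchmark female share lies outside the confidence interval."""
    return not (lower <= benchmark <= upper)

# Illustration only: the two classes from the "Some Data" slide, just to show the call.
share, lo, hi, n = pooled_female_share([(15, 10), (11, 5)])
print(f"{n} children, {share:.1%} girls, 95% CI {lo:.1%} to {hi:.1%}, "
      f"differs from 49%: {differs_from(lo, hi)}")
```

If schools balance classes by sex, as many do according to the "Why care?" slide, classroom counts are not independent draws, so this interval is only a rough guide.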

  24. What about Preschoolers • Got data from largest preschool in Los Angeles County • Located on the West Side • We’ll call it “TI”

  25. Jack Rose

  26. Jack Rose

  27. Want a bigger, more general sample! • Public school enrollment by grade and gender from the California Department of Education • Define my sample • Geographically • Type of school • What am I trying to understand? • Why do I care? • Will use West LA zip codes • Include all public schools in those zips that have kindergarten enrollment > 0

  28. Use 90077, 90210, 90024, 90212, 90035, 90034, 90064, 90025
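The sample definition on these two slides (schools in the listed West LA zip codes with kindergarten enrollment above zero) can be written as an explicit filter, which makes the choice easy to audit and change. A minimal Python sketch; the `SchoolRow` record and its field names are hypothetical stand-ins for whatever columns the California Department of Education enrollment file actually uses.

```python
from dataclasses import dataclass

# Zip codes listed on the slide, kept as strings so they are compared exactly as written.
WEST_LA_ZIPS = {"90077", "90210", "90024", "90212", "90035", "90034", "90064", "90025"}

@dataclass
class SchoolRow:
    # Hypothetical fields; map them to the real CDE column names.
    school: str
    zip_code: str
    k_boys: int
    k_girls: int

def select_sample(rows: list[SchoolRow]) -> list[SchoolRow]:
    """Keep schools in the West LA zips that enroll any kindergartners (the file covers public schools)."""
    return [r for r in rows if r.zip_code in WEST_LA_ZIPS and r.k_boys + r.k_girls > 0]

def kindergarten_female_share(rows: list[SchoolRow]) -> float:
    """Pooled share of kindergartners who are girls across the selected schools."""
    girls = sum(r.k_girls for r in rows)
    total = sum(r.k_boys + r.k_girls for r in rows)
    return girls / total
```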

  29. What about … • Younger cohorts • My son is in Kindergarten THIS year • The Population • Differential cross-border public enrollment • Differential private school enrollment • Other Data? • Private school enrollment not by gender

  30. Age and Sex in the Census • Where can I get this for everyone? • Residents by age and gender • Tabulated by zip code
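With residents tabulated by zip code, age, and sex, the check is to add up each single-year age across the chosen zip codes and compute the female share of each cohort. A minimal Python sketch; the input layout (a mapping from (zip code, age) to (males, females)) is a hypothetical simplification of whatever table the Census download provides.

```python
from collections import defaultdict

def female_share_by_age(
    counts: dict[tuple[str, int], tuple[int, int]], zips: set[str]
) -> dict[int, float]:
    """Sum males and females by age over the selected zip codes; return the female share per age."""
    males: dict[int, int] = defaultdict(int)
    females: dict[int, int] = defaultdict(int)
    for (zip_code, age), (m, f) in counts.items():
        if zip_code in zips:
            males[age] += m
            females[age] += f
    # Skip ages with no residents at all to avoid dividing by zero.
    return {
        age: females[age] / (males[age] + females[age])
        for age in sorted(males)
        if males[age] + females[age] > 0
    }
```

Ages with small counts will bounce around by chance, so each share is worth reading alongside its sample size.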

  31. Clicked Here

  32. YIKES!!

  33. Yikes!

  34. Age 18

  35. Jack

  36. Conclusion • My son’s preschool and K classes have been skewed male • So are the K classes of his friends who replied to my query for data (in public schools) • Whole preschool is skewed somewhat male • Natural female share just inside 95% CI • West LA elementaries not skewed last year • Cohorts not skewed in 2010 Census • Can check back next year

  37. Process • “Fact” = Speculation/theory/question • Is it a “True fact”? • What is the ideal data? • What do I have? • How certain am I? • Departures of actual from ideal data • Sample sizes • What does it mean? / Why is it happening? • Hypothesis → Data → Interpretation • What should we do?

  38. 100 Million Missing Women • Sen (1990) argued that 100 million women were “missing” worldwide • Mostly India and China • Assuming “natural” sex ratio → how many more girls should there be? • Some debate, some revision, but most agree substantial numbers of women and girls are missing
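The "missing women" calculation compares the observed number of females with the number implied by a benchmark ("natural") ratio of females to males. A minimal Python sketch of that arithmetic; the benchmark ratio is an input the analyst must choose and defend, and the function names are hypothetical.

```python
def missing_females(males: float, females: float, natural_female_per_male: float) -> float:
    """Females expected under the benchmark ratio minus females observed.

    A positive result is the estimated number of "missing" females.
    """
    expected_females = males * natural_female_per_male
    return expected_females - females

def female_per_male_from_share(female_share: float) -> float:
    """Convert a natural female share (e.g. 0.49) into females per male."""
    if not 0 < female_share < 1:
        raise ValueError("female_share must be strictly between 0 and 1")
    return female_share / (1 - female_share)
```

Because the expected count scales with the number of males, a small change in the benchmark ratio moves the estimate a lot in a large population, so the answer is sensitive to that choice.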

  39. Why? • Sen: Discrimination, maltreatment, sex-selective abortion, sex-selective infanticide • Biological explanations? • Hepatitis B • (Maternal/Paternal age) • Why does it matter? • Indicator of status of women and girls • Implications of excess men

  40. Evidence • Direct evidence: Effect of Hep B appears tiny • Direct evidence of differential treatment • Vaccinations • Breastfeeding • Schooling • Family size • What would be true if biological explanations are important? • Sex ratio should not depend on birth order or sex composition of prior births
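The test on this slide is descriptive: tabulate the female share of births separately by birth order and by the sex of earlier children, and see whether it moves. A minimal Python sketch; the `Birth` record, its fields, and the grouping functions are hypothetical, and a real analysis would also report sample sizes or confidence intervals for each group.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Hashable

@dataclass
class Birth:
    # Hypothetical record: one row per birth, with the mother's earlier children summarized.
    birth_order: int  # 1 for a first child, 2 for a second, ...
    prior_girls: int
    prior_boys: int
    is_girl: bool

def female_share_by(births: list[Birth], key: Callable[[Birth], Hashable]) -> dict[Hashable, float]:
    """Female share of births within each group defined by key(birth)."""
    girls: dict[Hashable, int] = defaultdict(int)
    total: dict[Hashable, int] = defaultdict(int)
    for b in births:
        group = key(b)
        total[group] += 1
        girls[group] += int(b.is_girl)
    return {group: girls[group] / total[group] for group in total}

def by_birth_order(b: Birth) -> int:
    return b.birth_order

def by_prior_composition(b: Birth) -> tuple[int, str]:
    """Birth order plus whether every earlier child was a girl."""
    if b.prior_girls + b.prior_boys == 0:
        return (b.birth_order, "first child")
    return (b.birth_order, "all girls before" if b.prior_boys == 0 else "at least one boy before")
```

If biology alone set the sex ratio, these shares should be roughly flat across groups; a share that varies with birth order or with the sex of earlier children is evidence consistent with sex selection, not a causal estimate.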

  41. Why would sex ratio depend on birth order and sex composition of prior births? • It doesn’t in societies with less discriminatory cultures/histories • Not “Treatment Effect” analysis → evidence consistent with…
