Data 101: Numbers, Graphs, and More Numbers. Emily Putnam-Hornstein, MSW Center for Social Services Research University of California at Berkeley March 11, 2008 The Performance Indicators Project at CSSR is supported by the California Department of Social Services and the Stuart Foundation.
Agenda • Basic Terminology • Common Data Pitfalls • Graphics • Small Groups…
Data Basics… • Descriptive Data • Demographic characteristics of a population, place, office, etc. • Comparisons • Performance trends over time (one time period to another) • Differences/similarities between groups, counties, placement settings, interventions, etc. • Analyses • Exploring the relationship between two events (e.g., reunifications and re-entries to care) • Looking at the contributions of various factors to some outcome • Y=a+bX
Computing a Percent Answers.com Dictionary: Rate • A measure of a part with respect to a whole; a proportion: the mortality rate; a foster care entry rate. What Percentage of Children who were reunified in 2005 reunified within 12 months of entering care? Raw Numbers (counts) # Reunified w/in 12m = 290 # Reunified (total) = 440
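The percent asked for on this slide can be worked out directly from the two raw counts. A minimal sketch in Python; the helper name `percent` is illustrative and not part of the original slides.

```python
# Raw counts taken from the slide.
reunified_within_12m = 290
reunified_total = 440

def percent(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100 * part / whole

pct = percent(reunified_within_12m, reunified_total)
print(f"{pct:.1f}% of children reunified in 2005 reunified within 12 months")
```

With these counts the result is about 65.9%.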
Computing a Rate per 1,000 Answers.com Dictionary: Rate • A measure of a part with respect to a whole; a proportion: the mortality rate; a foster care entry rate. What was the foster care entry rate in 2005? (i.e., how many children entered care out of all possible children?) Raw Numbers (counts) # Entered Care = 1,333 # Child Population = 363,376 Scales for a meaningful interpretation…
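The entry rate follows the same part-over-whole logic, scaled to 1,000 children so that a very small proportion becomes readable. A minimal sketch using the slide's counts; the function name `rate_per` is an assumption made for illustration.

```python
# Raw counts taken from the slide.
entered_care = 1_333
child_population = 363_376

def rate_per(events, population, scale=1_000):
    """Return the number of events per `scale` members of the population."""
    return scale * events / population

entry_rate = rate_per(entered_care, child_population)
print(f"{entry_rate:.1f} entries per 1,000 children")
```

With these counts the rate is roughly 3.7 entries per 1,000 children, which is easier to read than the equivalent proportion of about 0.0037.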
Measures of Central Tendency • Mean: the average value for a range of data • Median: the value of the middle item when the data are arranged from smallest to largest • Mode: the value that occurs most frequently within the data Data: 12 4 15 63 7 9 4 17 Sorted: 4 4 7 9 12 15 17 63 [Slide annotations: = 9.7; 7; = 9]
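The three measures can be checked against the slide's eight values with Python's standard `statistics` module. This is a sketch, not the presenter's own calculation. The 9.7 and 9 shown on the slide are consistent with the mean and median once the extreme value 63 is set aside; that reading is an assumption, since the slide layout did not survive.

```python
from statistics import mean, median, mode

# Values taken from the slide.
data = [12, 4, 15, 63, 7, 9, 4, 17]

print("mean:  ", mean(data))     # sum of the values divided by their count
print("median:", median(data))   # average of the two middle values (9 and 12)
print("mode:  ", mode(data))     # most frequent value

# Assumption: the slide's figures were computed without the extreme value 63.
without_63 = [x for x in data if x != 63]
print("mean without 63:  ", round(mean(without_63), 1))
print("median without 63:", median(without_63))
```

Across all eight values the mean is 16.375 and the median is 10.5, so the single large value pulls the mean well above the median.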
Measures of Variability Minimum: the smallest value within the data Maximum: the largest value within the data Range: the overall span of the data 4 4 7 9 12 15 17 63
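The same sorted values illustrate the three variability measures. A short sketch; the variable names are illustrative.

```python
# Sorted values taken from the slide.
data = [4, 4, 7, 9, 12, 15, 17, 63]

minimum = min(data)
maximum = max(data)
data_range = maximum - minimum   # overall span of the data

print(f"min={minimum}, max={maximum}, range={data_range}")

# The range depends entirely on the two extremes: dropping 63 shrinks it sharply.
trimmed = [x for x in data if x != 63]
trimmed_range = max(trimmed) - min(trimmed)
print(f"range without 63: {trimmed_range}")
```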
Disaggregation • One of the most powerful ways to work with data… • Disaggregation involves dismantling or separating out groups within a population to better understand the dynamics • Useful for identifying critical issues that were previously undetected Aggregate Permanency Outcomes Race/Ethnicity County Age Placement Type
2000 July-December First Entries, California: Percent Exited to Permanency 72 Months From Entry 85%
2000 First Entries, California: Percent Exited to Permanency 72 Months From Entry 79% 88%
2000 First Entries, California: Percent Exited to Permanency 72 Months From Entry by Relative vs. Non-Relative Placement =84% =94% =84% =75%
3 Key Data Samples Data
How long do children stay in foster care? January 1, 2005 July 1, 2005 January 1, 2006 Child 1 Child 2 Child 3 Child 4 Child 5 Child 6 Child 7 Child 8 Child 9 Child 10
California Example: Age of Children in Foster Care (2003 first entries, 2003 exits, July 1 2004 caseload) Entries %
California Example: Age of Children in Foster Care (2003 first entries, 2003 exits, July 1 2004 caseload) Entries Exits %
California Example: Age of Children in Foster Care (2003 first entries, 2003 exits, July 1 2004 caseload) Entries Exits Point in Time %
Continuous vs. Discrete • The average foster child has 2.6 placements while in foster care • This number makes little sense because the underlying dimension is discrete (i.e., categorical, discontinuous) • There are 260 placements for every 100 foster children [Number line: placements 1 2 3 4 5 6, with 2.6 marked] Continuous Data: Age, Days in Care, Percentages / Rates Discrete Data: Race/Ethnicity, Placement Type, Referral Reason
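The restatement from "2.6 placements per child" to "260 placements per 100 children" is a change of scale, not of data. A minimal sketch; the five placement counts are hypothetical, chosen only so that their mean is 2.6.

```python
# Hypothetical placement counts for five children (not from the slides).
placements_per_child = [1, 2, 2, 3, 5]

total_placements = sum(placements_per_child)
children = len(placements_per_child)

mean_placements = total_placements / children   # fractional, although no child has 2.6 placements
per_100_children = 100 * total_placements / children

print(f"{mean_placements} placements per child")
print(f"{per_100_children:.0f} placements for every 100 children")
```

No child in the list has 2.6 placements, which is why the per-100 wording reads more naturally for a discrete count.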
Correlation • Two “events” that covary with one another… Negative Correlation = Positive Correlation = % Births to Teen Moms % Reentries Event 1 Event 2 or Event 1 Event 2 % Reunified within 6 months
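One common way to put a number on how two events covary is the Pearson correlation coefficient, which runs from -1 (perfect negative) to +1 (perfect positive). The slides do not name a specific statistic, so this choice is an assumption, and the three data series below are hypothetical values invented for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical series (not from the slides).
event_1 = [10, 20, 30, 40, 50]
rises_with_event_1 = [12, 18, 33, 41, 52]
falls_with_event_1 = [55, 43, 31, 22, 9]

r_positive = pearson_r(event_1, rises_with_event_1)
r_negative = pearson_r(event_1, falls_with_event_1)
print(f"positive correlation: r = {r_positive:.2f}")
print(f"negative correlation: r = {r_negative:.2f}")
```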
Percent Change Time Period 1 Time Period 2 10 children 11 children
10% 12% Percent Change Time Period 1 Time Period 2 % %
Exercise: Percent Change Calculation • Baseline Referral Rate (time period 1): 50.7; Comparison Referral Rate (time period 2): 48.3; Percent Change: -4.7% • Baseline (time period 1): 12.0; Comparison (time period 2): 10.8; Percent Change: -10% Minor Differences due to Rounding…
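Percent change is the difference between the comparison and baseline values, divided by the baseline. The sketch below applies that formula to the figures on the last three slides; the function name `percent_change` is illustrative.

```python
def percent_change(baseline, comparison):
    """Change from baseline to comparison, as a percent of the baseline."""
    return 100 * (comparison - baseline) / baseline

# 10 children -> 11 children
print(f"{percent_change(10, 11):+.1f}%")

# 10% -> 12%: a rise of 2 percentage points, but a 20 percent change
print(f"{percent_change(10, 12):+.1f}%")

# Exercise values from the slide
print(f"{percent_change(50.7, 48.3):+.1f}%")
print(f"{percent_change(12.0, 10.8):+.1f}%")
```

When the values being compared are themselves percentages, it helps to say whether a change is in percentage points (12 - 10 = 2) or in percent (2 / 10 = 20%).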
January 2004-January 2008 California CWS Outcomes System: Federal Measures, Percent IMPROVEMENT
Cross-Sectional vs. Longitudinal Longitudinal Cross-Sectional (repeated) * Figure 5.23 retrieved from: http://www.mrs.umn/edu/~ratliffj/psy1051/cross.htm
There are three kinds of lies: Lies, Damned Lies, and (Misused) Statistics
Seven Ways to Misuse Data (without actually lying!): • Using Raw Numbers instead of Ratios • Rank Data • Compare Apples and Oranges • Use ‘snapshots’ of Small Samples • Rely on Unrepresentative Findings • Logically ‘flip’ Statistics • Falsely Assume an Association to be Causal
1) Numbers that conceal more than they reveal… Challenger:“Violent crime in Anytown, CA has increased over the last year. 100 more crimes were recorded.” Incumbent:“Violent crime in Anytown, CA has decreased by 2% over the last year.” Who is telling the truth? They both are.
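Both candidates can be right if the population grew faster than the crime count. The sketch below shows one way that could happen; every number in it is hypothetical and chosen only to reproduce "100 more crimes" alongside "a 2% decrease" in the rate.

```python
# Hypothetical figures for Anytown (not from the slides).
crimes_last_year, population_last_year = 1_000, 100_000
crimes_this_year, population_this_year = 1_100, 112_245

def rate_per_1000(events, population):
    return 1_000 * events / population

raw_change = crimes_this_year - crimes_last_year
rate_last = rate_per_1000(crimes_last_year, population_last_year)
rate_this = rate_per_1000(crimes_this_year, population_this_year)
rate_change_pct = 100 * (rate_this - rate_last) / rate_last

print(f"Challenger: {raw_change:+d} crimes recorded")
print(f"Incumbent:  crime rate changed by {rate_change_pct:+.1f}%")
```

The raw count answers "how many crimes?", while the rate answers "how likely is a resident to be affected?", and the two can move in opposite directions.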
“There are approximately 82,000 children in the child welfare system in California – 20% of foster children in the nation, and the largest foster care population of all 50 states.” National Center for Youth Law, “Broken Promises”, 2006
“There are approximately 82,000 children in the child welfare system in California – 20% of foster children in the nation, and the largest foster care population of all 50 states.” NCYL, 2006 • Factually true? • Yes • Informative? • Not very. • What if California has one of the largest child populations of all states? • What if California has one of the smallest child populations of all states? • Misleading? • Maybe… • What is the point being made? • Telling us that California has the largest foster care population does not shed any light on how the state is performing!
2) Rank Data Two streets in Anytown, CA…. “Jane Doe is the poorest person living on Moneybags Avenue.” $$ Ave “Joe Shmoe is the wealthiest person living on Poverty Blvd.” It’s all relative… And SOMEONE will always be ranked last (and first) Poverty Blvd
“San Francisco ranks 55 out of 58 counties when it comes to state and national performance measures…” SF Chronicle, “No refuge. For Foster youth, it’s a state of chance”, November 15, 2005
“San Francisco ranks 55 out of 58 counties when it comes to state and national performance measures…” SF Chronicle San Francisco: AB636 UCB State Measures (Used in NCYL Ranking) % IMPROVEMENT Jan ‘04 compared to June ‘06 (+) indicates a measure where a % increase equals improvement. (-) indicates a measure where a % decrease equals improvement. indicates a measure where performance declined. • Rankings mask improvement over time. • However, even improvement over time and relatively high rankings can be misleading.
3) Compare Apples and Oranges Two doctors in Anytown, CA… Doctor #1 Doctor #2 What if the populations served by each doctor were very different? Doctor of the Year? 2/1000 20/1000
“Foster Children in Fresno County are three times more likely to remain in foster care for more than a year than in Sacramento.” SF Chronicle, “Accidents of Geography”, March 8, 2006
“Foster Children in Fresno County are three times more likely to remain in foster care for more than a year than in Sacramento.” 1. Different families and children served? 2. Different related outcomes? • First entry rates in Fresno are consistently lower • Re-entries in Fresno are also lower… 3. Other considerations… • Resources available, resource allocation choices • Performance trends over time
4) Data snapshots… Crime in Anytown, CA… Number of Crimes Period 1: 76 Period 2: 51 Period 3: 91 Period 4: 76 No change. Average = 73.5 Crime jumped by 49%!! Crime dropped by 16%
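Each of the three headlines on this slide can be reproduced from the same four counts by choosing which two periods to compare. The pairings below are inferred from the arithmetic, not stated on the slide, so treat them as an assumption.

```python
from statistics import mean

# Crime counts taken from the slide.
crimes = {1: 76, 2: 51, 3: 91, 4: 76}

def percent_change(baseline, comparison):
    return 100 * (comparison - baseline) / baseline

no_change = percent_change(crimes[1], crimes[4])   # Period 1 vs. Period 4
jump = percent_change(crimes[2], crimes[4])        # Period 2 vs. Period 4
drop = percent_change(crimes[3], crimes[4])        # Period 3 vs. Period 4
average = mean(list(crimes.values()))

print(f"Period 1 -> 4: {no_change:+.0f}%")
print(f"Period 2 -> 4: {jump:+.0f}%")
print(f"Period 3 -> 4: {drop:+.0f}%")
print(f"Average over all four periods: {average}")
```

Looking at the full series and its average, rather than a single pair of snapshots, avoids picking the story in advance.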
“A foster child living in Napa County is in greater danger of being abused in foster care than anywhere else in the Bay area...” SF Chronicle, “No refuge. For foster youth, it’s a state of chance”, November 15, 2005
“A foster child living in Napa County is in greater danger of being abused in foster care than anywhere else in the Bay Area…” Abuse in Care Rate • Period 1: 1.80% (= 2/111) • Period 2: 1.64% (= 2/122) • Period 3: 0.84% (= 1/119) • Period 4: 0.00% (= 0) “100% improvement!” “0 Children Abused!” Responsible use of the data prevents us from making any of these claims (positive or negative). The sample is too small; the time frame too limited.
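The rates on this slide rest on one or two children per period, so a single case moves them a great deal. The sketch below recomputes the first three rates from the slide's counts and then shows the effect of one additional case; that extra case is a hypothetical added for illustration.

```python
# (children abused, children in care) for Periods 1-3, taken from the slide.
periods = [(2, 111), (2, 122), (1, 119)]

def rate_pct(cases, in_care):
    return 100 * cases / in_care

rates = [rate_pct(cases, in_care) for cases, in_care in periods]
for number, rate in enumerate(rates, start=1):
    print(f"Period {number}: {rate:.2f}%")

# Hypothetical: one more substantiated case in Period 3.
cases, in_care = periods[2]
with_one_more = rate_pct(cases + 1, in_care)
print(f"Period 3 with one more case: {with_one_more:.2f}%")
```

One additional child would double the Period 3 rate, which is why differences of this size cannot support a ranking.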
5) Unrepresentative findings… Survey of people in Anytown, CA… 90% of respondents stated that they support using tax dollars to build a new football stadium. The implication of the above finding is that there is overwhelming support for the stadium… But what if you were then told that respondents had been sampled from a list of season football ticket holders?
“Some reports indicate that maltreatment of children in foster care is a serious problem, and in one recent large-scale study, about one-third of respondents reported maltreatment at the hands of their caregivers.” “My Word”, Oakland Tribune, May 25, 2006
“…in one recent large-scale study, about one-third of respondents reported maltreatment at the hands of their caregivers.” Oakland Tribune Factually true? • Yes. Misleading? • Yes. • This was a survey of emancipated foster youth • Emancipated youth represent a distinct subset of the foster care population • This “accurate” statistic misleads the reader to conclude that one-third of foster children have been maltreated in care…
6) Logical “Flipping”… Headline in The Anytown Chronicle: 60% of violent crimes are committed by men who did not graduate from high school. “Flip” 60% of male high school drop-outs commit violent crimes?
“One study in Washington State found that 75 percent of a sample of neglect cases involved families with incomes under $10,000.” Bath and Haapala, 1993 as cited in “Shattered bonds: The color of child welfare” by Dorothy Roberts
“One study in Washington State found that 75 percent of a sample of neglect cases involved families with incomes under $10,000.” • In reading statistics such as the above, there is a tendency to want to directionally “Flip” the interpretation • But the original and flipped statements have very different meanings! 75% of neglect cases involved families with incomes under $10,000 DOES NOT MEAN 75% of families with incomes under $10,000 have open neglect cases. Put more simply, just because most neglected children are poor does not mean that most poor children are neglected. [Diagram labels: Families with incomes under $10,000; Families with open neglect cases]
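The original and flipped statements use the same numerator but different denominators. A minimal sketch with hypothetical counts: only the 75% figure comes from the quoted study, while the case and family totals are invented to make the contrast visible.

```python
# Hypothetical counts (only the 75% share matches the quoted study).
neglect_cases = 1_000
neglect_cases_low_income = 750
low_income_families = 100_000

def percent(part, whole):
    return 100 * part / whole

# Original statement: of neglect cases, what share involve low-income families?
low_income_given_neglect = percent(neglect_cases_low_income, neglect_cases)

# Flipped statement: of low-income families, what share have a neglect case?
neglect_given_low_income = percent(neglect_cases_low_income, low_income_families)

print(f"Neglect cases involving low-income families: {low_income_given_neglect}%")
print(f"Low-income families with a neglect case:     {neglect_given_low_income}%")
```

Under these assumed totals the first figure is 75% and the second is under 1%, because the group of low-income families is far larger than the group of neglect cases.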
7) False Causality… A study of Anytown residents makes the following claim: Adults with short hair are, on average, more than 3 inches taller than those with long hair. Finding an association between two factors does not mean that one causes the other… X Hair Length Height Gender
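A third factor can produce an association on its own. In the sketch below, height depends only on gender, yet short-haired adults still come out more than 3 inches taller on average because hair length also differs by gender. All records are hypothetical and deliberately simplified.

```python
from statistics import mean

# Hypothetical residents: (gender, hair length, height in inches).
# Within each gender, height is identical regardless of hair length.
people = (
    [("M", "short", 70.0)] * 9 + [("M", "long", 70.0)] * 1
    + [("F", "short", 64.5)] * 2 + [("F", "long", 64.5)] * 8
)

def mean_height(rows, hair, gender=None):
    """Mean height for a hair length, optionally restricted to one gender."""
    heights = [h for g, hr, h in rows
               if hr == hair and (gender is None or g == gender)]
    return mean(heights)

overall_gap = mean_height(people, "short") - mean_height(people, "long")
gap_men = mean_height(people, "short", "M") - mean_height(people, "long", "M")
gap_women = mean_height(people, "short", "F") - mean_height(people, "long", "F")

print(f"Short vs. long hair, everyone: {overall_gap:.1f} inches")
print(f"Short vs. long hair, men only: {gap_men:.1f} inches")
print(f"Short vs. long hair, women only: {gap_women:.1f} inches")
```

Comparing within each gender removes the gap entirely in this constructed example, which is the kind of check that separates an association from a cause.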
“A number of child characteristics have previously been shown to be associated with risk of maltreatment. Prematurity or low birth weight is frequently reported…” As reported in Sidebotham and Heron’s 2006 article
“A number of child characteristics have previously been shown to be associated with risk of maltreatment. Prematurity or low birth weight is frequently reported…” • Should one conclude that prematurity is a causal factor in maltreatment? prematurity maltreatment a third factor (Drug use?)
Graphs / Charts • Keep it simple… • Use consistent color themes when possible • Think about the type of data being presented (discrete vs. continuous) • Label Clearly • Tell a story • Look at presentations on the UC site!
Stacked Bar Chart Ethnicity and Path through the Child Welfare System: California 2006
Pie Chart Ethnicity of Children in Foster Care: California 2006