Statistics Project By: Rich Miktus, Christopher Geigel, Brandon Butch
2004 Data - Raw New Jersey Counties • Abuse and Neglect Referrals of Children • Special Education Enrollment • Number of Child Arrests • Average Income of Families with Children • Child Poverty • Child Population • Total Population • School Enrollment
Variables: Per Capita • Abuse • Poverty • Special Education • Income • School Enrollment • Arrests • Population Density
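The per-capita variables can be derived from the raw county counts by dividing each count by a population figure. A minimal sketch, assuming child population is the denominator for child-related counts (the slides do not say which denominator was used). The county labels, the numbers, and the `area_sq_mi` field are made up for illustration and are not part of the 2004 data.

```python
# Hypothetical raw records; labels and values are illustrative only.
raw = [
    {"county": "A", "arrests": 1200, "child_pop": 60000,
     "total_pop": 250000, "area_sq_mi": 500.0},
    {"county": "B", "arrests": 300, "child_pop": 30000,
     "total_pop": 120000, "area_sq_mi": 300.0},
]


def per_capita(count, population):
    """Return count per person; population must be positive."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count / population


rates = []
for row in raw:
    rates.append({
        "county": row["county"],
        # Assumption: child counts are scaled by child population.
        "arrests_pc": per_capita(row["arrests"], row["child_pop"]),
        # Density is people per unit area, not a per-capita rate.
        "density": row["total_pop"] / row["area_sq_mi"],
    })
```

Scaling by population keeps large counties from dominating every comparison simply because they have more children.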
Introduction • One Variable Analysis • Histograms • Scatterplots • Q-Q Plots • Two Variable Analysis • Linear models • Regression analysis • Simple Models • Arrests • School Enrollment • Residual Diagnostics
One Variable Analysis • Histograms & Scatterplots • Frequency of occurrences • Skew of data • Q-Q Plot • Normal distribution • Usefulness of variables • Real-life relationships • Data flaws
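A Q-Q plot pairs each sorted observation with the quantile a normal distribution would give at the same rank. Points that bend away from the diagonal indicate skew. The sketch below only computes the points (no plotting) and uses a made-up, right-skewed sample. The plotting position `(i + 0.5) / n` is one common convention, chosen here as an assumption.

```python
from statistics import NormalDist, mean, stdev


def qq_points(values):
    """Pair each sorted observation with the normal quantile at its rank."""
    xs = sorted(values)
    n = len(xs)
    # Reference normal with the sample's own mean and standard deviation.
    ref = NormalDist(mean(xs), stdev(xs))
    # Returns (theoretical quantile, observed value) pairs.
    return [(ref.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(xs)]


sample = [2.1, 2.4, 2.5, 2.7, 3.0, 3.1, 3.3, 9.8]  # made-up values
points = qq_points(sample)
```

In this sample the largest observation sits well above its theoretical quantile, which is the pattern a right-skewed variable produces.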
Two Variable Analysis • Correlation Table – used to check initial predictions • Linear regression line • Residuals • How much do our explanatory variables matter?
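A correlation table of this kind can be built by computing the Pearson correlation for every pair of variables. A minimal sketch in plain Python. The column names follow the slides, but the values are invented for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_table(columns):
    """Return a nested dict: table[a][b] is the correlation of a with b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}


# Made-up per-capita values for four hypothetical counties.
columns = {
    "arrests": [0.020, 0.035, 0.015, 0.040],
    "abuse": [0.010, 0.018, 0.008, 0.021],
    "income": [90.0, 60.0, 95.0, 55.0],
}
table = correlation_table(columns)
```

Reading across a row shows which explanatory variables move with the response and which move with each other.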
Two Variable Analysis • More refined analysis to test: • Arrests ~ abuse, special education enrollment, poverty, school enrollment • School enrollment ~ income, poverty, abuse, population density
Arrests vs. Abuse • Good linear fit – strong correlation • Residuals relatively small • Large F Statistic, small P Value
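For a one-predictor fit, the slope, intercept, and F statistic all follow from sums of squares. The sketch below uses made-up data, not the county values. It leaves out the p-value, which requires the F distribution with 1 and n - 2 degrees of freedom.

```python
def simple_ols(x, y):
    """Least-squares line y = intercept + slope * x, plus the F statistic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    fitted = [intercept + slope * a for a in x]
    sse = sum((b - f) ** 2 for b, f in zip(y, fitted))  # residual sum of squares
    ssr = sum((f - my) ** 2 for f in fitted)            # regression sum of squares
    # One predictor: 1 numerator and n - 2 denominator degrees of freedom.
    f_stat = ssr / (sse / (n - 2))
    return slope, intercept, f_stat


x = [1, 2, 3, 4, 5]               # made-up explanatory values
y = [2.1, 3.9, 6.2, 7.8, 10.1]    # made-up response values
slope, intercept, f_stat = simple_ols(x, y)
```

Small residuals relative to the spread of the fitted values produce a large F statistic, which is the pattern described for arrests against abuse.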
School vs. Income • Relationship is very weak • No strong, overall trend • Possible weak, positive correlation
Two Variable Analysis: Conclusions • Arrests strongly correlated with abuse, moderately correlated with special education enrollment and poverty, and not correlated with school enrollment • School enrollment strongly correlated with population density, and not related to income, poverty, or abuse
Simple Models: School Enrollment • Possible variables • Abuse • Income • Poverty • Population density
Problems with School Model • Income and Poverty Correlation: -0.911 • Variance Inflation Factors: Income 8.489, Poverty 9.278 • Best Regression By AIC: School~Density • Not enough applicable data: underfitted, flawed variables
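A variance inflation factor is 1 / (1 - R²), where R² comes from regressing one predictor on the others. With only two predictors, R² is just the squared pairwise correlation, so the reported income and poverty correlation of -0.911 gives a lower bound that is easy to check. A minimal sketch with a hypothetical function name:

```python
def vif_from_r_squared(r_squared):
    """Variance inflation factor for a predictor whose auxiliary
    regression on the other predictors has the given R-squared."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in [0, 1)")
    return 1.0 / (1.0 - r_squared)


r_income_poverty = -0.911  # correlation reported on the slide
# Two-predictor case only: R-squared equals the squared correlation.
vif_pair = vif_from_r_squared(r_income_poverty ** 2)
```

This gives about 5.88. The reported values of 8.489 and 9.278 are higher, which would be consistent with the VIFs having been computed with the remaining predictors in the model as well, though the slides do not say so.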
Simple Models: Arrests • Possible variables • Abuse • Special Education • Poverty • Population Density • School enrollment
Problems with Arrests Model • Problem • High correlation and VIFs among explanatory variables • Multicollinearity • Fix • Removed Income (too similar to poverty) • Proceeded to refine the model and it worked itself out
Arrests Model: Choosing a Model • The test for best fit • AIC goodness test • Arrests~Abuse + Special Ed + Poverty + Density + School • Arrests~Abuse + Special Ed + School
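AIC trades goodness of fit against the number of parameters, and the lower value wins. For least-squares models it can be written, up to an additive constant, as n·ln(RSS/n) + 2k. The sketch below compares a five-predictor and a three-predictor model. The residual sums of squares are hypothetical, and n = 21 assumes one observation per New Jersey county.

```python
from math import log


def aic_gaussian(rss, n, k):
    """AIC for a least-squares fit, up to an additive constant.

    rss: residual sum of squares
    n:   number of observations
    k:   number of fitted coefficients, including the intercept
    """
    return n * log(rss / n) + 2 * k


n = 21  # assumption: one row per county
# Hypothetical RSS values; the slides do not report them.
full = aic_gaussian(rss=10.0, n=n, k=6)     # 5 predictors + intercept
reduced = aic_gaussian(rss=10.5, n=n, k=4)  # 3 predictors + intercept
best = "reduced" if reduced < full else "full"
```

Dropping two predictors raises the RSS only slightly here, so the penalty saved outweighs the fit lost and the smaller model is preferred. Statistical packages may count the error variance as an extra parameter, which shifts both values equally and does not change the comparison.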
Residual Diagnostics: Model Refinement • Residual plots suggested a possible transformation of School • Used GAM plots to choose the transform
Residual Diagnostics: Model Refinement • Used a cubic transform • Resulted in a higher adjusted R-squared value • New model did not have normal residuals • Rejected the model
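Adjusted R-squared penalizes R-squared for the number of predictors, so a transformed model only scores higher if the fit improves by more than the added complexity costs. A minimal sketch of the standard formula. The input values are illustrative, not the project's results.

```python
def adjusted_r_squared(r_squared, n, p):
    """Adjusted R-squared for n observations and p predictors
    (the intercept is not counted in p)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than parameters")
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - p - 1)


# Illustrative values only: R-squared of 0.80, 21 rows, 3 predictors.
adj = adjusted_r_squared(0.80, n=21, p=3)
```

A higher adjusted R-squared does not guarantee a valid model, which is why the cubic model could still be rejected for non-normal residuals.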
Residual Diagnostics: Model Refinement • Box-Cox Plot • Lowest near 0 • No transform required
Residual Diagnostics: Removing Outliers • LR Plot • One obvious non-influential outlier • Easily removed without damage to the model
Conclusions • Good linear fit between arrests and its explanatory variables; not so for school enrollment • Juvenile arrests can be modeled by: Arrests = 2.58 + 0.21(Sped) + 0.95(Abuse) – 0.08(School) • Not enough appropriate data to make a model for school enrollment • Improvements • Check correlation of variables earlier • Additional data acquisition
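The fitted arrests equation can be evaluated directly. A minimal sketch using the coefficients from the slide. The function name is hypothetical, and the inputs are assumed to be on the same scale as the variables used to fit the model, which the slides do not spell out.

```python
def predict_arrests(sped, abuse, school):
    """Evaluate Arrests = 2.58 + 0.21(Sped) + 0.95(Abuse) - 0.08(School).

    Assumption: inputs use the same units as the fitted variables.
    """
    return 2.58 + 0.21 * sped + 0.95 * abuse - 0.08 * school


baseline = predict_arrests(sped=0.0, abuse=0.0, school=0.0)
one_unit_each = predict_arrests(sped=1.0, abuse=1.0, school=1.0)
```

The abuse coefficient is the largest in magnitude, which matches the strong arrests and abuse correlation found in the two variable analysis.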