Brennon Bortz, Marcos Carzolio, Andrew Hoegh, Shashidhar Sundareisan. CrimeScore. What is CrimeScore? CrimeScore is the predicted number of violent crimes per month within a 1 km radius of a given location in Washington, D.C. Training Data.
Brennon Bortz, Marcos Carzolio, Andrew Hoegh, Shashidhar Sundareisan CrimeScore
What is CrimeScore? • CrimeScore is the predicted number of violent crimes per month within a 1 km radius of a given location in Washington, D.C.
Training Data • Uniform random sample points throughout Washington, D.C. • Data collected within a 1 km radius of samples • Barbershops, bus stops, gas stations, schools, registered property, liquor establishments, and more • Distance to nearest police station, distance to nearest public housing project, etc.
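The radius counts and nearest-distance features described above could be computed along the following lines. This is only a sketch: the haversine formula is one common choice for great-circle distance and the slides do not say which distance measure the authors used. The function names and all coordinates are hypothetical.

```javascript
// Mean Earth radius in kilometres, used by the haversine formula.
const EARTH_RADIUS_KM = 6371;

const toRadians = (degrees) => (degrees * Math.PI) / 180;

// Great-circle distance in km between two {lat, lng} points.
function haversineKm(a, b) {
  const dLat = toRadians(b.lat - a.lat);
  const dLng = toRadians(b.lng - a.lng);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(a.lat)) * Math.cos(toRadians(b.lat)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

// Number of places within radiusKm of the sample point.
function countWithinRadius(sample, places, radiusKm = 1) {
  return places.filter((place) => haversineKm(sample, place) <= radiusKm).length;
}

// Distance in km to the closest place, or Infinity if the list is empty.
function nearestDistanceKm(sample, places) {
  return places.reduce((best, place) => Math.min(best, haversineKm(sample, place)), Infinity);
}

// Hypothetical coordinates, for illustration only.
const sample = { lat: 38.9, lng: -77.03 };
const busStops = [
  { lat: 38.9, lng: -77.03 },   // the sample point itself
  { lat: 38.905, lng: -77.03 }, // roughly 0.56 km north
  { lat: 38.95, lng: -77.03 },  // roughly 5.6 km north
];

const features = {
  busStopsWithin1Km: countWithinRadius(sample, busStops),
  nearestBusStopKm: nearestDistanceKm(sample, busStops),
};
console.log(features);
```

One such feature pair would be produced per place category (barbershops, schools, police stations, and so on) for every sample point.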
Data Aggregation • Parsed crime data from DC Data Catalog • Classified crime data • Violent and non-violent crimes • Focused on violent crimes, consisting of homicide, robbery, assault with a deadly weapon, and sexual abuse
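The violent versus non-violent classification can be sketched as a simple lookup. The four categories come from the slide; the exact offense strings used in the DC Data Catalog are an assumption, as are the record shape and function names.

```javascript
// Violent categories named on the slide. The literal strings are assumed,
// not taken from the actual DC Data Catalog files.
const VIOLENT_OFFENSES = new Set([
  "HOMICIDE",
  "ROBBERY",
  "ASSAULT WITH A DEADLY WEAPON",
  "SEXUAL ABUSE",
]);

// True when the offense label matches one of the violent categories.
function isViolent(offense) {
  return VIOLENT_OFFENSES.has(offense.trim().toUpperCase());
}

// Split parsed records into violent and non-violent groups.
function splitByViolence(records) {
  const violent = [];
  const nonViolent = [];
  for (const record of records) {
    (isViolent(record.offense) ? violent : nonViolent).push(record);
  }
  return { violent, nonViolent };
}

// Hypothetical records, for illustration only.
const parsedRecords = [
  { offense: "Robbery" },
  { offense: "Burglary" },
  { offense: " homicide " },
];
const groups = splitByViolence(parsedRecords);
console.log(groups.violent.length, groups.nonViolent.length);
```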
Implementation Goals • Simple, elegant and familiar • User interface like Google Maps • Dynamic; easily accommodates multiple queries • Represent crime score as a color and a number with an associated interpretation • Pack in as much information as possible • Make queries fast and display results faster
Implementation • Run a search query or listen for a click on the map • Use the Google Maps API to get the positions of the search results on the map • Feed the results to an R script to calculate CrimeScore using Shiny • Use the CrimeScore to display color-coded markers on the map
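The last step, turning a CrimeScore into a marker color with an interpretation, might look like the sketch below. The thresholds, colors, and labels are placeholders chosen for illustration; the slides do not give the actual bands.

```javascript
// Hypothetical score bands. The cut-offs are illustrative assumptions only.
const SCORE_BANDS = [
  { max: 5, color: "green", label: "low" },
  { max: 15, color: "yellow", label: "moderate" },
  { max: Infinity, color: "red", label: "high" },
];

// Map a predicted score to the color and interpretation shown on the marker.
function describeScore(score) {
  if (Number.isNaN(score) || score < 0) {
    throw new RangeError("score must be a non-negative number");
  }
  const band = SCORE_BANDS.find((candidate) => score <= candidate.max);
  return { score, color: band.color, label: band.label };
}

console.log(describeScore(3));
console.log(describeScore(20));
```

Keeping the bands in a single table means the color and the text interpretation cannot drift apart.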
Rook • Wraps R environment • Bootstraps R’s internal web server • Maintains environment • Finicky!
Why JavaScript? • Omnipresent in HTML scripting • Prevalent support and acceptance • Ability to write asynchronous functions so that queries over the internet and to the database do not halt the web page • Google Maps JavaScript API v3 is heavily documented • Supports JSON data interchange format
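The point about asynchronous functions can be illustrated with a small sketch. Here `fetchCrimeScore` is a hypothetical stand-in for the real network call to the scoring back end, and the score it returns is a placeholder; only the control flow is the point.

```javascript
// Hypothetical stand-in for the network request to the scoring service.
// It resolves after a short delay with a placeholder score.
function fetchCrimeScore(location) {
  return new Promise((resolve) => {
    setTimeout(() => resolve({ ...location, score: 0 }), 10);
  });
}

// Start every request at once and wait for all of them together.
// The calling code is not blocked while the requests are pending.
async function scoreAll(locations, fetchScore = fetchCrimeScore) {
  return Promise.all(locations.map((location) => fetchScore(location)));
}

// Hypothetical query locations.
const pendingScores = scoreAll([
  { lat: 38.9, lng: -77.03 },
  { lat: 38.91, lng: -77.04 },
]);

// This line runs before any score has arrived.
console.log("requests sent; the page stays responsive");

pendingScores.then((results) => console.log(results.length, "scores received"));
```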
Google Maps API • Use URL requests to access geocoding, directions, elevation, place and time zone information. • Embed an interactive Google Map in the webpage using JavaScript by creating markers, infowindows etc. • The JavaScript Maps API V3 is a free service, available for any web site that is free to consumers
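A URL request to the geocoding web service might be built and read as sketched below. The endpoint and the `results[0].geometry.location` response shape follow Google's Geocoding API; the helper names are hypothetical, and the `key` parameter reflects current requirements, which may differ from what the service required when the project was built.

```javascript
const GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json";

// Build the request URL for a free-text address (helper name is hypothetical).
function buildGeocodeUrl(address, apiKey) {
  const params = new URLSearchParams({ address, key: apiKey });
  return `${GEOCODE_ENDPOINT}?${params.toString()}`;
}

// Extract the first result's coordinates from a geocoding JSON response,
// or return null when the service reports no usable result.
function firstLocation(response) {
  if (response.status !== "OK" || response.results.length === 0) return null;
  const { lat, lng } = response.results[0].geometry.location;
  return { lat, lng };
}

const geocodeUrl = buildGeocodeUrl("Dupont Circle", "YOUR_API_KEY");
console.log(geocodeUrl);

// Hand-written response fragment with made-up coordinates, for illustration.
const exampleResponse = {
  status: "OK",
  results: [{ geometry: { location: { lat: 38.91, lng: -77.04 } } }],
};
console.log(firstLocation(exampleResponse));
```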
Google Maps API • Map • MapOptions • Geocoder • Marker • InfoWindow • PlacesService • LatLng • Events
Google Maps API • Center the map on Washington, D.C. • Restrict queries to a 10 km radius • Retrieve latitude and longitude values for results • Place markers with appropriate colors depending upon crime score • Attach an InfoWindow to every marker to show satellite information • Allow the user to give a lat/lon manually by clicking on the map
Data Storage • The data for the project were stored in a centralized database using MySQL • The main use of the database was to store Latitude, Longitude and details of places, as well as crimes relevant to the mining process • Data collected from the crime data set and the DC data catalog
Challenges • Incomplete or missing data • Dealing with spatial data • Simultaneously dealing with polygons and points in the DC Data Catalog • Finding the distances to the nearest barbershop, school, church, police station, bus stop, etc. is time-consuming
Challenges • Limit on the number of queries in the Google Maps API • Using radarSearch over textSearch • Can’t specify the boundary of a search query other than a rectangle or a circle in the Google Maps API • Maximum of 200 results per query
Improving Implementation • Make results appear faster • Instead of calculating distances from every place to compute the crime score, divide the city into a grid with pre-calculated crime scores • A query then only has to find which grid cell the place belongs to and return that cell's crime score
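The grid idea can be sketched as a constant-time lookup. The bounding box below is only an approximate rectangle around Washington, D.C., the 4 x 4 resolution is arbitrary, and the grid values are placeholders rather than real scores; all names are hypothetical.

```javascript
// Approximate bounding box around Washington, D.C. (assumed values).
const BOUNDS = { south: 38.79, north: 39.0, west: -77.12, east: -76.91 };
const ROWS = 4; // arbitrary resolution for illustration
const COLS = 4;

// Find the grid cell containing a point, or null if it lies outside the box.
function cellIndex(point, bounds = BOUNDS, rows = ROWS, cols = COLS) {
  const { south, north, west, east } = bounds;
  const inside =
    point.lat >= south && point.lat <= north && point.lng >= west && point.lng <= east;
  if (!inside) return null;
  // Math.min keeps points on the north or east edge inside the last cell.
  const row = Math.min(rows - 1, Math.floor(((point.lat - south) / (north - south)) * rows));
  const col = Math.min(cols - 1, Math.floor(((point.lng - west) / (east - west)) * cols));
  return { row, col };
}

// Placeholder grid: each cell just stores its own index, not a real score.
const scoreGrid = Array.from({ length: ROWS }, (_, row) =>
  Array.from({ length: COLS }, (_, col) => row * COLS + col)
);

// Return the pre-calculated value for the cell containing the point.
function lookupScore(point, grid = scoreGrid) {
  const cell = cellIndex(point, BOUNDS, grid.length, grid[0].length);
  return cell === null ? null : grid[cell.row][cell.col];
}

console.log(lookupScore({ lat: 38.9, lng: -77.03 }));
```

A query then costs two divisions and an array access, regardless of how many places are in the database. The trade-off is that every location in a cell shares one score.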
Random Forest Regression • Each tree trains on a bootstrapped subset of the data • At each node of every tree, the algorithm randomly chooses a subset of predictors on which to build a regression model and create a split in feature space • Response in the regression model is the actual (observed) CrimeScore • Excellent predictions; difficult interpretations • Analysis done with the randomForest R package
Random Forest Regression • [Diagram: the original data is resampled into Bootstrap 1 to Bootstrap 4; each bootstrap sample trains one of Regression Tree 1 to Regression Tree 4; the four trees together form the random forest]
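The bootstrap-and-average structure of the method can be sketched in a few lines. This is not the randomForest package's implementation: each "tree" here is a single-split stump on one predictor, and the random choice of predictors at each node is omitted because the toy data has only one. Function names and data are hypothetical.

```javascript
const mean = (values) => values.reduce((sum, value) => sum + value, 0) / values.length;

// Draw as many rows as the input has, with replacement (a bootstrap sample).
function bootstrapSample(rows, rng = Math.random) {
  return rows.map(() => rows[Math.floor(rng() * rows.length)]);
}

// Fit a one-split regression tree (a stump) on rows of {x, y}.
// Returns a prediction function. Falls back to the overall mean
// when no split separates the sample into two non-empty sides.
function fitStump(rows) {
  const overall = mean(rows.map((row) => row.y));
  let best = { sse: Infinity, predict: () => overall };
  for (const { x: threshold } of rows) {
    const left = rows.filter((row) => row.x <= threshold).map((row) => row.y);
    const right = rows.filter((row) => row.x > threshold).map((row) => row.y);
    if (left.length === 0 || right.length === 0) continue;
    const leftMean = mean(left);
    const rightMean = mean(right);
    // Sum of squared errors if each side predicts its own mean.
    const sse =
      left.reduce((sum, y) => sum + (y - leftMean) ** 2, 0) +
      right.reduce((sum, y) => sum + (y - rightMean) ** 2, 0);
    if (sse < best.sse) {
      best = { sse, predict: (x) => (x <= threshold ? leftMean : rightMean) };
    }
  }
  return best.predict;
}

// Train one stump per bootstrap sample and average their predictions.
function fitForest(rows, treeCount = 4, rng = Math.random) {
  const trees = [];
  for (let i = 0; i < treeCount; i++) {
    trees.push(fitStump(bootstrapSample(rows, rng)));
  }
  return (x) => mean(trees.map((tree) => tree(x)));
}

// Toy data: low response for x <= 4, high response above.
const toyRows = [1, 2, 3, 4, 5, 6, 7, 8].map((x) => ({ x, y: x <= 4 ? 2 : 10 }));
const forest = fitForest(toyRows);
console.log(forest(1), forest(8));
```

Averaging over trees trained on different resamples is what reduces the variance of any single tree.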
Model Validation • The algorithm holds out 20% of the data to test against the model • Performance at each node measured by mean squared error and mean decrease in accuracy
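A hold-out split and the mean squared error can be sketched as follows. The 20% fraction comes from the slide; the shuffling approach, the function names, and the numbers are assumptions for illustration, not a description of how the R package partitions data internally.

```javascript
// Return a shuffled copy of rows (Fisher-Yates shuffle).
function shuffled(rows, rng = Math.random) {
  const copy = rows.slice();
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [copy[i], copy[j]] = [copy[j], copy[i]];
  }
  return copy;
}

// Reserve testFraction of the rows for testing; train on the rest.
function holdOutSplit(rows, testFraction = 0.2, rng = Math.random) {
  const mixed = shuffled(rows, rng);
  const testSize = Math.round(rows.length * testFraction);
  return { test: mixed.slice(0, testSize), train: mixed.slice(testSize) };
}

// Average squared difference between observed and predicted values.
function meanSquaredError(actual, predicted) {
  if (actual.length === 0 || actual.length !== predicted.length) {
    throw new RangeError("inputs must be non-empty and of equal length");
  }
  return actual.reduce((sum, y, i) => sum + (y - predicted[i]) ** 2, 0) / actual.length;
}

// Hypothetical sample identifiers and values.
const sampleIds = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
const split = holdOutSplit(sampleIds);
console.log(split.train.length, split.test.length);
console.log(meanSquaredError([2, 4], [1, 6]));
```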
CrimeScore Functionality • Travelers seeking a safe place to stay • City planners choosing locations for parks, etc. • Police mapping out patrol routes • Homebuyers selecting a new residence • Hotel and real estate advertising
Future Work • Implement CrimeScore in other cities • Develop interface within travel websites • Improve interactivity for city planners