Data Mining Customer & Employee-Related Subway Incidents: Phase II

David Budet, Mariel Castro, Jason Jaworski, Yevgeny Khait, Florangel Marte. Client: Richard Washington, NYC Transit Authority.

Presentation Transcript


  1. Data Mining Customer & Employee-Related Subway Incidents: Phase II. David Budet, Mariel Castro, Jason Jaworski, Yevgeny Khait, Florangel Marte. Client: Richard Washington, NYC Transit Authority

  2. Presentation Summary • Project Description • Review • Progression • City Crime vs. Subway Crime • Results: Customer Assaults • Results: Employee Assaults • Results: Robberies (Simple Theft) • Results: Train Delays • Weka ID3 Decision Trees • Future Research Avenues

  3. Project Description • Phase I concentrated on looking at incidents and identifying reasons for aggression, specifically what effects delays had on aggression incidents • Phase II is more specifically concentrated on subway assaults and possible correlations with the data’s attributes • Main focus of both phases: analysis of a dataset of incidents which occurred in the New York City Subway system over multiple years and mining of the data to establish relationships and trends

  4. Review The first half of the study focused on mining data with Microsoft SQL Server 2008 and the program Weka. Utilizing these tools and team methodologies, we determined which stations and train lines had the most: • Violent assaults against customers and employees • Delays • Simple thefts (unarmed robberies, pick-pocketing, etc.)

  5. Progression The second half of the study had a more regional focus. The team: • Acquired US Census data regarding crime and population in NYC • Normalized the Census crime data and subway crime data by population for Manhattan, Brooklyn, Queens and the Bronx • Analyzed subway crime as a microcosm of overall NYC crime for 2007 • Created an interactive JavaScript map pinpointing the stations with the most violent incidents and delays
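The population normalization described on this slide can be sketched as a per-capita rate. This is a minimal illustration, not the team's actual procedure: the function names, the per-100,000 scaling, and all figures are hypothetical placeholders rather than values from the study or the Census.

```python
# Minimal sketch of normalizing incident counts by population.
# All names and numbers are hypothetical placeholders, not study data.

def rate_per_100k(incidents: int, population: int) -> float:
    """Return the number of incidents per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return incidents * 100_000 / population


def normalize_by_population(incidents: dict, population: dict) -> dict:
    """Map each borough to its incident rate per 100,000 residents."""
    return {
        borough: rate_per_100k(count, population[borough])
        for borough, count in incidents.items()
    }


# Placeholder inputs, used only to show the shape of the data.
example_incidents = {"Borough A": 500, "Borough B": 300}
example_population = {"Borough A": 2_000_000, "Borough B": 1_000_000}

rates = normalize_by_population(example_incidents, example_population)
for borough, rate in sorted(rates.items()):
    print(f"{borough}: {rate:.1f} incidents per 100,000 residents")
```

In this made-up example the borough with fewer incidents has the higher rate, which is the kind of effect normalization is meant to expose.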

  6. City Crime vs. Subway Crime In comparing overall crime in New York City for 2007 to crime in the NYC Subway system: • We found that Manhattan, though only the third largest borough in terms of population, accounted for over half the crime in NYC • The Bronx had the smallest population but the second highest rate of crime per resident • Subway crime accounted for a smaller percentage of overall crime in Manhattan than in the other three boroughs researched

  7. City Crime vs. Subway Crime

  8. City Crime vs. Subway Crime When normalized for population, subway crime in Brooklyn and Queens accounts for a greater percentage of overall crime than in Manhattan and the Bronx, signaling that these two boroughs may have more dangerous, or more incident-prone, stations than Manhattan or the Bronx.

  9. Findings: Customer Assaults The stations with the most assaults (all types of assault) against customers from 2005 to 2007 were 59th Street, 14th Street and 125th Street.

  10. Findings: Customer Assaults Between 2005 and 2007, the highest number of assaults (all types) committed against customers took place on the A, 2 and 4 lines.
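Rankings like the ones on these two slides come down to counting incident records per station or per line and keeping the largest counts. The sketch below assumes each record is a dictionary with hypothetical `station` and `line` fields. The team's actual schema and SQL Server queries are not shown in the slides, and the sample records are invented.

```python
from collections import Counter

# Minimal sketch of ranking stations or lines by incident count.
# Field names and sample records are hypothetical placeholders.

def top_locations(records, key, n=3):
    """Return the n most frequent values of `key` with their counts."""
    counts = Counter(record[key] for record in records)
    return counts.most_common(n)


sample_records = [
    {"station": "Station X", "line": "Line 1"},
    {"station": "Station X", "line": "Line 2"},
    {"station": "Station X", "line": "Line 1"},
    {"station": "Station Y", "line": "Line 1"},
    {"station": "Station Y", "line": "Line 3"},
    {"station": "Station Z", "line": "Line 1"},
]

top_stations = top_locations(sample_records, "station", n=2)
top_lines = top_locations(sample_records, "line", n=1)
print(top_stations)
print(top_lines)
```

The same function works for either attribute, so one pass over the incident table can produce both the station ranking and the line ranking.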

  11. Findings: Employee Assaults Stations with more than 5 total assaults (all types of assault) against employees between 2005 and 2007:

  12. Findings: Employee Assaults Between 2005 and 2007, the highest number of assaults (all types) committed against employees took place on the 6, 2 and A lines.

  13. Findings: Robberies (Simple Theft)

  14. Findings: Robberies (Simple Theft)

  15. Findings: Train Delays Number of delays by month over 3 year period:

  16. Findings: Train Delays

  17. Findings: Train Delays

  18. Weka ID3 Decision Tree

  19. Weka ID3 Decision Tree
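The decision-tree slides survive only as titles, so the trees themselves are not reproduced here. As background, ID3 builds a tree by splitting at each node on the attribute with the highest information gain. The sketch below shows that selection step in plain Python. It is not Weka's implementation, and the attribute names (`time_of_day`, `day_type`) and rows are hypothetical, not taken from the subway dataset.

```python
import math
from collections import Counter, defaultdict

# Minimal sketch of the attribute-selection step used by ID3.
# Attribute names and rows are hypothetical placeholders.

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, attribute, target):
    """Entropy reduction obtained by splitting rows on `attribute`."""
    base = entropy([row[target] for row in rows])
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    remainder = sum(
        len(group) / len(rows) * entropy(group) for group in groups.values()
    )
    return base - remainder


def best_split(rows, attributes, target):
    """Return the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


toy_rows = [
    {"time_of_day": "night", "day_type": "weekday", "assault": "yes"},
    {"time_of_day": "night", "day_type": "weekend", "assault": "yes"},
    {"time_of_day": "day", "day_type": "weekday", "assault": "no"},
    {"time_of_day": "day", "day_type": "weekend", "assault": "no"},
]

chosen = best_split(toy_rows, ["time_of_day", "day_type"], "assault")
print(chosen)
```

In the toy rows, `time_of_day` separates the classes perfectly while `day_type` does not separate them at all, so the first is selected as the root split.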

  20. Future Research Avenues • The MTA and the project team can separately mine an identical data set and introduce an objective methodology for determining the best results and techniques from both efforts • Continue in-depth data mining • Identify and research other algorithms in Weka conducive to mining and correlating NYC Subway data (we propose that the next team utilize clustering analysis via the SimpleKMeans algorithm) • Investigate possible correlations between neighborhood income levels and stations where subway crime is prevalent • Continue to expand and build on the JavaScript map
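For readers unfamiliar with the clustering proposal, the following is a minimal sketch of the basic k-means procedure (Lloyd's algorithm) that SimpleKMeans is based on. It is plain Python, not Weka's implementation. The idea that each station would be described by a pair such as (assault count, delay count) is an assumption made only for illustration, and the points are invented.

```python
import math
import random

# Minimal k-means sketch (Lloyd's algorithm), not Weka's SimpleKMeans.
# Each point is a hypothetical (assault_count, delay_count) pair.

def kmeans(points, k, iterations=20, seed=0):
    """Cluster points into k groups; return (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct starting points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(values) / len(members) for values in zip(*members)
                )
    return centroids, assignments


toy_points = [(1, 1), (1, 2), (2, 1), (10, 10), (10, 11), (11, 10)]
centroids, assignments = kmeans(toy_points, k=2)
print(centroids)
print(assignments)
```

With real data the attributes would likely need scaling first, since Euclidean distance is dominated by whichever attribute has the largest range.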
