
What about the whole country?

Extending the Activity-Based Person-Trip Synthesizer to all 330 million Americans. Judy Sun ’14 & Luke Cheng ’14, ORF467 F13.


Presentation Transcript


  1. What about the whole country? Extending the Activity-Based Person-Trip Synthesizer to all 330 million Americans. Judy Sun ’14 & Luke Cheng ’14, ORF467 F13

  2. The Process
      • Generate Schools
      • Generate Employee Patronage File
      • Assign Patronage
      • Generate Patronage-Employee Ratios
      • A Look at the Data
      • Generate Census File (with Microsoft Access)
      • NN Files through 7 NJ Modules by Jake and Talal
      • Trip File Generator: Out-of-State commuters, students, workplace assignment, 18 Tour Type (Activity Patterns) assignment, Temporal Dimension

  3. Roadmap • Schools Data • Employee-Patronage Data • A Look at the Data • Census Data • Further Steps

  4. Schools Data

  5. Public Schools in the US

  6. Quick stats on Public Schools (2011)

  7. Public Schools: Enrollment

  8. Private Schools in the US

  9. Private Schools: Enrollment

  10. Private Schools: School Size

  11. Post-secondary schools (2009)

  12. Employee-Patronage Data

  13. The Process
      • 2012 InfoGroup US Businesses File (5.80 GB)
      • 30 CSV files with 500,000 entries each (~200MB each) – Shell Script
      • 30 CSV files with patronage generation, data cleaning, and mapping (~115MB each) – R Script
      • 1570 Segmented State Files (1KB to 20MB) – R Script
      • 51 Merged State Files (8MB to 390MB) – Python Script
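
The last step of this pipeline (segmented state files merged into one file per state) could look roughly like the sketch below. It is not the original Python script. The file-naming scheme (`<STATE>_<chunk>.csv`), the shared header row, and the function name `merge_state_segments` are all assumptions made for illustration.

```python
import csv
from collections import defaultdict
from pathlib import Path


def merge_state_segments(segment_dir: Path, out_dir: Path) -> dict[str, int]:
    """Concatenate per-state segment CSVs into one CSV per state.

    Assumption: segments are named '<STATE>_<chunk>.csv' (e.g. 'NJ_07.csv')
    and every segment starts with the same header row.
    Returns the number of data rows written for each state.
    """
    # Group segment files by the state prefix in their file name.
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(segment_dir.glob("*_*.csv")):
        groups[path.stem.split("_")[0]].append(path)

    out_dir.mkdir(parents=True, exist_ok=True)
    counts: dict[str, int] = {}
    for state, paths in groups.items():
        header_written = False
        rows = 0
        with open(out_dir / f"{state}.csv", "w", newline="") as out_file:
            writer = csv.writer(out_file)
            for path in paths:
                with open(path, newline="") as in_file:
                    reader = csv.reader(in_file)
                    header = next(reader, None)
                    if header is None:
                        continue  # skip empty segments
                    if not header_written:
                        writer.writerow(header)  # keep the header only once
                        header_written = True
                    for row in reader:
                        writer.writerow(row)
                        rows += 1
        counts[state] = rows
    return counts
```

Streaming row by row keeps memory use flat, which matters when merged state files reach hundreds of megabytes.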

  14. Patronage Generation
      • Previous Process – Manual Fine-Tuning
      • Inconsistent: Same NAICS Code, Different Patronage/Employee Ratio
      • Current Process – Employee Size Range, Sales Volume Range
      • Not Perfect Data
      • Matching businesses (Zip, County, NAICS, Lat/Long)
      • Same Employee Size Range
      • Assumption: Sales Volume same across time
      • Trying to acquire the 2005 Data for better correlations
      • Ratios from Averaging Previous EP file
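
As a rough illustration of "ratios from averaging the previous EP file", the sketch below averages the patrons-per-employee ratio within each (employee size range, sales volume range) bucket and applies it to a new business. The record fields (`emp_range`, `sales_range`, `employees`, `patrons`) and both function names are hypothetical. The actual R script may work differently.

```python
from collections import defaultdict
from statistics import mean


def bucket_ratios(previous_ep):
    """Mean patrons-per-employee for each (employee range, sales range) bucket.

    `previous_ep` is an iterable of dicts with the hypothetical keys
    'emp_range', 'sales_range', 'employees', and 'patrons'.
    """
    samples = defaultdict(list)
    for rec in previous_ep:
        if rec["employees"] > 0:  # avoid dividing by zero
            key = (rec["emp_range"], rec["sales_range"])
            samples[key].append(rec["patrons"] / rec["employees"])
    return {key: mean(values) for key, values in samples.items()}


def estimate_patrons(business, ratios, default_ratio=0.0):
    """Apply the bucket's mean ratio to a business's employee count."""
    key = (business["emp_range"], business["sales_range"])
    return business["employees"] * ratios.get(key, default_ratio)


# Two businesses in the same bucket with very different ratios (1.0 and 7.0)
previous = [
    {"emp_range": "5-9", "sales_range": "1-2.5M", "employees": 10, "patrons": 10},
    {"emp_range": "5-9", "sales_range": "1-2.5M", "employees": 10, "patrons": 70},
]
ratios = bucket_ratios(previous)
new_business = {"emp_range": "5-9", "sales_range": "1-2.5M", "employees": 8}
estimate = estimate_patrons(new_business, ratios)
```

In this toy input the bucket mean is 4.0, which matches neither business well. That is the kind of averaging effect that motivates adding NAICS codes as a further key.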

  15. Comparison: Distributions

  16. Conclusion: Need to use NAICS codes in addition to employee size and sales volume. A large number of ratio values in the 0-1 range are offset by values in the 7-20 range, so the averages pile up around 4-5. It is difficult to capture nuances with just employee size and sales volume. Next Steps: Manpower is needed to assign a ratio for each NAICS Code, Sales Volume, Employee Size combination.

  17. A Look at the Data

  18. NJ Counties (Change in NJ EP File): Uncensored; Un-Named Removed

  19. NJ Wide (Uncensored / Un-named Removed)
      • No. Businesses +39,350 • Tot Emp +4.8M • Emp Size +9.09 • Tot Patrons -5.3M • Avg Patrons -16.29
      • No. Businesses +73,500 • Tot Emp +4.8M • Emp Size +7.85 • Tot Patrons -4.9M • Avg Patrons -17.17

  20. Nation-Wide

  21. Census Data

  22. Inputs
      • 2010 Census Summary File 1
        • http://www2.census.gov/census_2010/04-Summary_File_1/
        • Does not convert to CSV/TXT; files are made for MS Access
        • Process tables (P12, P16, P29, H13, P43) with Talal’s VBA macro in MS Access (p. 78)
        • VBA code: whereabouts unknown, perhaps with Prof K
      • 2012 5-Year Census American Community Survey
        • http://www2.census.gov/acs2012_5yr/summaryfile/
        • Income data to assign incomes to households and residents

  23. Generation
      • Module 1 – Outputs resident file for each county in state
        • Rows: Individual people
        • Attributes/Columns: County Number (replace with State Number_County Number for national file), Household ID, Household Type, Lat/Long, ID Number, Age, Sex, Traveler Type, Income Bracket
      • Module 2 – Out of state/region/nation nodes
      • For commenting on the code, see pp. 17-19:
        • http://www.princeton.edu/~alaink/Orf467F12/MuftiTripSynthesizer_v.1.pdf
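
One way to picture the "State Number_County Number" replacement for the national file is a zero-padded composite key. The sketch below assumes the numbers are FIPS codes (2-digit state, 3-digit county), which the slides do not state. The `Resident` field names and types are hypothetical stand-ins for the columns listed above.

```python
from dataclasses import dataclass


def county_key(state_number: int, county_number: int) -> str:
    """Build a 'State Number_County Number' key, zero-padded FIPS-style."""
    return f"{state_number:02d}_{county_number:03d}"


@dataclass
class Resident:
    """One row of the resident file (field names and types are illustrative)."""
    county: str          # composite key from county_key()
    household_id: int
    household_type: int
    lat: float
    lon: float
    person_id: int
    age: int
    sex: str
    traveler_type: int
    income_bracket: int


# Example: a resident in state 34, county 21
jane = Resident(county_key(34, 21), 1001, 2, 40.35, -74.66, 1, 8, "F", 1, 5)
```

Zero-padding keeps keys the same width, so they sort and join consistently across states.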

  24. Further Steps

  25. What To Do Next?
      • Patronage Generation with NAICS, Sales Volume, Employee Size and Research – Low Difficulty
        • I already generated a file mapping all NAICS and employment counts along with payrolls for patronage assignment using 2010 Census Data (200K entries)
      • Census Data Generation and Rework NN Generation Modules – High Difficulty
      • Optional: Data Verification for Employee-Patronage Files

  26. Modules
      • Very hard-coded for NJ; not very well-commented
      • Initial National Implementation Ideas:
        • Treat the US as one entity with external nodes at airports to represent foreigners
          • Problem: Computationally intensive for 330M people
          • Solution: Do a semi-randomized sample
        • Regionalize the US and use out-of-region external nodes
          • Less labor-intensive; allows parallel processing
        • Do each state separately
          • Problem: Hard to generalize code, out-of-state nodes
          • Extremely labor-intensive

  27. The Code: Thought Process
      • Trips generated state-by-state
        • Use state-level demographic information on residents
        • Ignore state-level boundaries since we have employer and attraction information for the nation
      • Example:
        • John Smith lives in NYC and works in CT.
        • We will get his household from the NYC Census file and the probability distribution of workplace from the CT E-P file.
        • When we map NYC trips, we will see John Smith going to CT for work. When we map CT trips, we will see John Smith returning from work.
      • Trip destinations can be approximated using destination county centroids
        • Requires assigning a centroid to each county
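
Assigning a centroid to each county could be sketched as follows. The slides do not say how centroids would be obtained. This version simply averages the Lat/Long of the points known for each county (for example, business or household locations), which is an assumption. The function name and input layout are hypothetical.

```python
def county_centroids(points):
    """Approximate each county's centroid as the mean of its point coordinates.

    `points` is an iterable of (county_key, lat, lon) tuples.
    Returns {county_key: (mean_lat, mean_lon)}.

    Note: a plain average of longitudes breaks down for areas that cross
    the 180-degree meridian; this sketch ignores that case.
    """
    sums = {}
    for key, lat, lon in points:
        total = sums.setdefault(key, [0.0, 0.0, 0])
        total[0] += lat
        total[1] += lon
        total[2] += 1
    return {key: (t[0] / t[2], t[1] / t[2]) for key, t in sums.items()}


# Illustrative coordinates only
centroids = county_centroids([
    ("34_021", 40.0, -74.0),
    ("34_021", 41.0, -75.0),
    ("09_001", 41.2, -73.4),
])
```

A published table of county centroids would be a reasonable substitute if one is available. The averaging above only shows the shape of the lookup.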

  28. The Code: Thought Process
      • Workplace assignment (without replacement):
        • Census maps individuals to workplace
          • John Smith lives in NYC and works in CT
        • Use distribution to match workplace to E-P file (keep a count of employees to match the number given)
          • John Smith mapped to an employer in CT
        • If more than x (e.g. 250) miles, assume arrival at airport
      • School assignment (without replacement):
        • Use bounds and distribution to match students with schools (assume same county)
          • Jane (8) is mapped to an elementary school in her county
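
The workplace-assignment idea (sample an employer, decrement its remaining headcount, flag long trips as airport arrivals) might be sketched like this. The weighting by remaining open positions is an assumption about what "use distribution" means here. All function names are hypothetical, and 250 miles is just the example threshold from the slide.

```python
import math
import random


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two Lat/Long points."""
    radius = 3958.8  # mean Earth radius in miles
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def assign_workplaces(workers, open_slots, rng):
    """Assign each worker to an employer without replacement.

    `open_slots` maps employer -> remaining positions and is decremented
    in place, so no employer receives more workers than its employee count.
    Assumption: employers are drawn with probability proportional to their
    remaining open positions.
    """
    assignment = {}
    for worker in workers:
        employers = [e for e, n in open_slots.items() if n > 0]
        if not employers:
            break  # every position is filled
        weights = [open_slots[e] for e in employers]
        chosen = rng.choices(employers, weights=weights, k=1)[0]
        open_slots[chosen] -= 1
        assignment[worker] = chosen
    return assignment


def arrival_mode(distance_miles, airport_threshold=250.0):
    """Trips longer than the threshold are assumed to arrive at an airport."""
    return "airport" if distance_miles > airport_threshold else "direct"
```

The same capacity-decrement pattern would apply to school assignment, with enrollment in place of employee counts and candidates restricted to the student's county.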

  29. The Code: Thought Process
      • Tour Type assignment and Temporal Dimension
        • Can try to repurpose Talal’s code
        • Add in Time Zones in Temporal Dimension
        • Can do this with replacement (patrons)
        • Assumptions: Same behavior across states in terms of work time and leisure time and activity patterns
      • Out-of-Country Commuters / Non-Resident Workers
        • International nodes for the states along the Canadian and Mexican borders
        • Trip to the nearest border crossing
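
Adding time zones to the temporal dimension could be as simple as shifting each trip's local departure time onto one reference clock. The sketch below assumes times are stored as seconds after local midnight and uses fixed standard-time UTC offsets, ignoring daylight saving time. The zone labels and the function name are hypothetical.

```python
# Standard-time UTC offsets in hours (daylight saving time is ignored here).
UTC_OFFSET_HOURS = {
    "Eastern": -5,
    "Central": -6,
    "Mountain": -7,
    "Pacific": -8,
    "Alaska": -9,
    "Hawaii": -10,
}


def to_reference_seconds(local_seconds, zone, reference="Eastern"):
    """Convert seconds after local midnight to seconds on the reference clock.

    Values are not wrapped at 24 hours, so a late trip in a western zone
    can land past 86400 and still sort correctly on the shared timeline.
    """
    shift_hours = UTC_OFFSET_HOURS[reference] - UTC_OFFSET_HOURS[zone]
    return local_seconds + shift_hours * 3600


# An 8:00 departure in the Pacific zone is 11:00 on the Eastern clock.
pacific_departure = to_reference_seconds(8 * 3600, "Pacific")
```

Because the activity patterns are assumed identical across states, the local-time distributions can be reused as they are and shifted only when trips from different zones are combined.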
