What about the whole country? Extending the Activity-Based Person-Trip Synthesizer to all 330 million Americans Judy Sun ’14 & Luke Cheng ’14 ORF467 F13
The Process • Generate Schools • Generate Employee Patronage File • Assign Patronage • Generate Patronage-Employee Ratios • A Look at the Data • Generate Census File (with Microsoft Access) • NN Files through 7 NJ Modules by Jake and Talal • Trip File Generator: Out-of-State commuters, students, workplace assignment, 18 Tour Type (Activity Patterns) assignment, Temporal Dimension
Roadmap • Schools Data • Employee-Patronage Data • A Look at the Data • Census Data • Further Steps
The Process • 2012 InfoGroup US Businesses File (5.80 GB) • 30 CSV files with 500,000 entries (~200MB) – Shell Script • 30 CSV files with patronage generation and data cleaning and mapping (~115MB) – R Script • 1570 Segmented State Files (1KB to 20MB) – R Script • 51 Merged State Files (8MB to 390MB) – Python Script
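The last step of this pipeline (segmented state files merged into one file per state) could be sketched in Python as follows. This is a minimal sketch, not the script used in the project: the file naming scheme `<STATE>_<chunk>.csv`, the shared header row, and the function name `merge_state_segments` are all assumptions made for illustration.

```python
import csv
from collections import defaultdict
from pathlib import Path


def merge_state_segments(segment_dir, out_dir):
    """Concatenate per-state segment CSVs into one CSV per state.

    Assumes (hypothetically) that segments are named <STATE>_<chunk>.csv
    and that every segment starts with the same header row.
    Returns {state: number of data rows written}.
    """
    segment_dir, out_dir = Path(segment_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Group segment files by the state prefix in the file name.
    by_state = defaultdict(list)
    for path in sorted(segment_dir.glob("*_*.csv")):
        by_state[path.stem.split("_")[0]].append(path)

    row_counts = {}
    for state, paths in by_state.items():
        rows_written = 0
        with open(out_dir / f"{state}.csv", "w", newline="") as out_f:
            writer = None
            for path in paths:
                with open(path, newline="") as in_f:
                    reader = csv.reader(in_f)
                    header = next(reader, None)
                    if header is None:
                        continue  # empty segment, nothing to copy
                    if writer is None:
                        writer = csv.writer(out_f)
                        writer.writerow(header)  # header is written once
                    for row in reader:
                        writer.writerow(row)
                        rows_written += 1
        row_counts[state] = rows_written
    return row_counts
```

Rows are streamed one at a time, so no state file has to fit in memory, which matters for merged files in the hundreds of megabytes.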
Patronage Generation • Previous Process – Manual Fine-Tuning • Inconsistent: Same NAICS Code, Different Patronage/Employee Ratio • Current Process – Employee Size Range, Sales Volume Range • Imperfect Data • Matching businesses (Zip, County, NAICS, Lat/Long) • Same Employee Size Range • Assumption: Sales Volume is the same across time • Trying to acquire the 2005 Data for better correlations • Ratios from Averaging Previous EP file
Conclusion: Need to use NAICS Codes in addition to employee size and sales volume. A large number of 0-1 ratio values are offset by values in the 7-20 range, so the averages cluster around 4-5. It is difficult to capture nuances with just employee size and sales volume. Next Steps: Manpower needed to assign a ratio for each NAICS Code, Sales Volume, Employee Size combination
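The averaging effect described in the conclusion can be reproduced with a small Python sketch. The records below are invented for illustration only (they are not taken from the EP file), and the field names and the function `average_ratios` are hypothetical. The two NAICS codes are real codes (541110, offices of lawyers; 722511, full-service restaurants) used here only as examples of businesses with very different patronage.

```python
from collections import defaultdict
from statistics import mean


def average_ratios(records):
    """Average patrons-per-employee within each (size range, sales range) cell."""
    cells = defaultdict(list)
    for rec in records:
        if rec["employees"] > 0:  # skip records that would divide by zero
            key = (rec["emp_size_range"], rec["sales_range"])
            cells[key].append(rec["patrons"] / rec["employees"])
    return {key: mean(ratios) for key, ratios in cells.items()}


# Made-up businesses that share one size range and one sales range.
CELL = {"emp_size_range": "5-9", "sales_range": "$500K-1M"}
records = [
    {"naics": "541110", "employees": 8, "patrons": 4, **CELL},   # ratio 0.5
    {"naics": "541110", "employees": 5, "patrons": 5, **CELL},   # ratio 1.0
    {"naics": "541110", "employees": 6, "patrons": 3, **CELL},   # ratio 0.5
    {"naics": "722511", "employees": 6, "patrons": 48, **CELL},  # ratio 8.0
    {"naics": "722511", "employees": 8, "patrons": 96, **CELL},  # ratio 12.0
]

cell_average = average_ratios(records)[("5-9", "$500K-1M")]
print(round(cell_average, 2))
```

With these made-up numbers the cell average is 4.4, a value that describes neither the offices nor the restaurants. Adding the NAICS code to the grouping key would keep the two kinds of business in separate cells.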
NJ Counties (Change in NJ EP File) Uncensored Un-Named Removed
NJ Wide (Uncensored; Un-named Removed) • No. Businesses +39,350 • Tot Emp +4.8M • Emp Size +9.09 • Tot Patrons -5.3M • Avg Patrons -16.29 • No. Businesses +73,500 • Tot Emp +4.8M • Emp Size +7.85 • Tot Patrons -4.9M • Avg Patrons -17.17
Inputs • 2010 Census Summary File 1 • http://www2.census.gov/census_2010/04-Summary_File_1/ • Does not convert to CSV/TXT; Files made for MS Access • Process Tables (P12, P16, P29, H13, P43) with Talal’s VBA macro in MS Access (p.78) • VBA Code – whereabouts unknown, perhaps with Prof K • 2012 5-Year Census American Community Survey • http://www2.census.gov/acs2012_5yr/summaryfile/ • Income Data to assign incomes to households and residents
Generation • Module 1 – Outputs resident file for each county in state • Rows: Individual People • Attributes/Columns: County Number (replace with State Number_County Number for national file), Household ID, Household Type, Lat/Long, ID Number, Age, Sex, Traveler Type, Income Bracket • Module 2 – Out of state/region/nation nodes • For comments on the code, see pp. 17-19 • http://www.princeton.edu/~alaink/Orf467F12/MuftiTripSynthesizer_v.1.pdf
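One row of the Module 1 resident file could be represented in Python roughly as below. This is a sketch under assumptions: the slide lists the columns but not their types or encodings, so the field types, the zero-padded FIPS-style formatting, and the names `Resident` and `national_county_id` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resident:
    """One row of the resident file (field types are assumptions)."""
    county_id: str        # "State Number_County Number" in the national file
    household_id: int
    household_type: int
    lat: float
    lon: float
    person_id: int
    age: int
    sex: str
    traveler_type: int
    income_bracket: int


def national_county_id(state_number, county_number):
    """Build the combined key; zero-padding to FIPS widths is an assumption."""
    return f"{state_number:02d}_{county_number:03d}"


# Example: state 34 (New Jersey), county 21 (Mercer) in FIPS numbering.
example = Resident(
    county_id=national_county_id(34, 21),
    household_id=1, household_type=2,
    lat=40.35, lon=-74.66,
    person_id=1, age=8, sex="F",
    traveler_type=1, income_bracket=5,
)
```

Prefixing the state number keeps county numbers unique across states, since county numbers restart within each state.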
What To Do Next? • Patronage Generation with NAICS, Sales Volume, Employee Size and Research – Low Difficulty • I already generated a file mapping all NAICS and employment counts along with payrolls for patronage assignment using 2010 Census Data (200K entries) • Census Data Generation and Rework NN Generation Modules – High Difficulty • Optional: Data Verification for Employee-Patronage Files
Modules • Heavily hard-coded for NJ; not well commented • Initial National Implementation Ideas: • Treat US as one entity with external nodes at airports to represent foreigners • Problem: Computationally intensive for 330M people • Solution: Do a semi-randomized sample • Regionalize the US and use out-of-region external nodes • Less labor-intensive; allows parallel processing • Do each state separately • Problem: Hard to generalize code, out-of-state nodes • Extremely labor-intensive
The Code: Thought Process • Trips generated state-by-state • Use state-level demographic information on residents • Ignore state-level boundaries since we have employer and attraction information for the nation. • Example: • John Smith lives in NYC and works in CT. • We will get his household from NYC Census file and the probability distribution of workplace in CT E-P file. • When we map NYC Trips, we will see John Smith going to CT for work. When we map CT Trips, we will see John Smith returning from work. • Trip destinations can be approximated using destination county centroids • Requires assigning centroid to each county
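One simple way to assign a centroid to each county is to average the coordinates of the points already known to lie in it (for example, business or household locations). The slide does not say how centroids would be obtained, so this Python sketch is only one possible approach; the function name `county_centroids` and the input layout are assumptions.

```python
from collections import defaultdict


def county_centroids(points):
    """Approximate each county's centroid as the mean of its points.

    points: iterable of (county_id, lat, lon) tuples.
    Returns {county_id: (mean_lat, mean_lon)}.
    A plain arithmetic mean is a reasonable approximation at county scale,
    but it is not an area-weighted geographic centroid.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # lat sum, lon sum, count
    for county_id, lat, lon in points:
        acc = sums[county_id]
        acc[0] += lat
        acc[1] += lon
        acc[2] += 1
    return {c: (s[0] / s[2], s[1] / s[2]) for c, s in sums.items()}


# Made-up coordinates for two counties.
sample_points = [
    ("34_021", 40.0, -74.0),
    ("34_021", 41.0, -75.0),
    ("09_001", 41.5, -73.0),
]
centroids = county_centroids(sample_points)
```

A centroid computed this way is weighted toward where businesses or people actually are, which may suit trip destinations better than a purely geometric center.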
The Code: Thought Process • Workplace assignment (without replacement): • Census maps individuals to workplace • John Smith lives in NYC and works in CT • Use distribution to match workplace to E-P file (keep a count of employees to match the number given) • John Smith mapped to an employer in CT • If more than x (e.g. 250) miles, assume arrival at airport • School Assignment (without replacement): • Use bounds and distribution to match students with schools (assume same county) • Jane (8) is mapped to elementary school in her county
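The workplace step above (draw an employer without replacement, keep a count of filled positions, flag long trips as airport arrivals) can be sketched in Python as follows. This is an illustration under assumptions, not the project's code: the record layouts, the function names, and weighting each draw by an employer's remaining open positions are all hypothetical choices.

```python
import math
import random


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/long points."""
    r = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def assign_workplaces(workers, employers, rng, airport_threshold_miles=250.0):
    """Assign each worker to an employer without replacement.

    workers:   list of dicts with "id", "lat", "lon"
    employers: list of dicts with "id", "lat", "lon", "employees" (capacity)
    Each draw is weighted by an employer's remaining open positions, which is
    equivalent to drawing individual job slots uniformly without replacement.
    """
    open_slots = {e["id"]: e["employees"] for e in employers}
    by_id = {e["id"]: e for e in employers}
    assignments = {}
    for w in workers:
        candidates = [eid for eid, n in open_slots.items() if n > 0]
        if not candidates:
            break  # every position is filled; remaining workers stay unassigned
        weights = [open_slots[eid] for eid in candidates]
        eid = rng.choices(candidates, weights=weights, k=1)[0]
        open_slots[eid] -= 1  # keep the employee count in step with the E-P file
        e = by_id[eid]
        dist = haversine_miles(w["lat"], w["lon"], e["lat"], e["lon"])
        assignments[w["id"]] = {
            "employer": eid,
            "via_airport": dist > airport_threshold_miles,
        }
    return assignments


# Made-up example: three workers near New York City, two employers.
workers = [{"id": i, "lat": 40.71, "lon": -74.01} for i in range(3)]
employers = [
    {"id": "A", "lat": 41.05, "lon": -73.54, "employees": 2},   # nearby
    {"id": "B", "lat": 34.05, "lon": -118.24, "employees": 1},  # far away
]
result = assign_workplaces(workers, employers, random.Random(0))
```

Because the three workers fill exactly three positions, employer A always ends up with two workers and B with one, whatever the random seed; only the worker sent to B is flagged as arriving via airport. School assignment could reuse the same pattern, with school enrollment as the capacity and candidates restricted to the student's county.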
The Code: Thought Process • Tour Type assignment and Temporal Dimension • Can try to repurpose Talal’s code • Add in Time Zones in Temporal Dimension • Can do this with replacement (patrons) • Assumptions: Same behavior across states in terms of work time and leisure time and activity patterns • Out-of-Country Commuters / Non-Resident Workers • International nodes for the states along the Canadian and Mexican borders • Trip to the nearest border crossing
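Adding time zones to the temporal dimension amounts to shifting each local departure time onto a common reference clock. The Python sketch below assumes, hypothetically, that trip times are stored as seconds after local midnight, that each state sits in a single time zone, and that daylight saving time can be ignored. The table and function names are made up for illustration.

```python
# Standard-time UTC offsets in hours for a few states (single-zone simplification).
STANDARD_UTC_OFFSET_HOURS = {
    "NJ": -5,  # Eastern
    "IL": -6,  # Central
    "CO": -7,  # Mountain
    "CA": -8,  # Pacific
}


def to_reference_seconds(local_seconds, state, reference_offset_hours=-5):
    """Convert seconds after local midnight to seconds on a reference clock.

    The default reference clock is Eastern standard time (UTC-5).
    """
    offset = STANDARD_UTC_OFFSET_HOURS[state]
    return local_seconds + (reference_offset_hours - offset) * 3600


# An 8:00 AM departure in California is 11:00 AM on the Eastern reference clock.
ca_departure = to_reference_seconds(8 * 3600, "CA")
```

With every trip on one clock, the same activity pattern (for example, an 8:00 AM local work departure) can be applied in each state while the national trip file stays consistent in time.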