440 likes | 572 Views
Synthesizing Agents and Relationships for Land Use / Transportation Modelling. Lecture Outline. Introduction Previous Work Data New Methods Results. Introduction. How would land use, transportation patterns and emissions react to... High congestion charge? Greenbelt policy?
E N D
Synthesizing Agents and Relationships for Land Use / TransportationModelling
Lecture Outline • Introduction • Previous Work • Data • New Methods • Results
Introduction • How would land use, transportation patterns and emissions react to... • High congestion charge? • Greenbelt policy? • “Do nothing” while population grows • Major transportation projects • Major extrapolations from current behaviour • Too hard to predict conventionally
Introduction Traditional 4-stage
Introduction Integrated Land Use/Transportation Environment (ILUTE) model
Introduction • We can’t build such a complicated model using conventional methods • Instead, preferred approach is microsimulation model • What is microsimulation?
Introduction Conventional Model Simulation Model
Introduction • Microsimulation = Simulation + Agents • Models the state of agents • Combined behaviour of agents yields system state • 1. Begin with initial population in start year • 2. Update population, year by year • age persons, change family structures • change jobs, move homes • use this to predict annual travel patterns • 3. Obtain travel patterns in forecast year
Introduction • Need an initial population in the start year • List of agents and their attributes - e.g., • Number of persons, and their ages • Number of vehicles • Type of dwelling • etc. • But - complete list is unknown • “Population Synthesis” used instead • Use known data to create initial agents • Result has known statistical properties • Best estimate from limited data
Introduction • My results: • Improved method for population synthesis • Allows more attributes for each agent • New method for relationship synthesis • Allows correct set of agents and correct set of relationships • Created a synthetic population for ILUTE • Persons, families, households and dwellings • Complete 1986 population for GTHA
Previous Work • Two representations of set of agents • Listof agents and their attributes (as categories) • Contingency table • One cell for each combination of attributes • Cell contains count of number of agents
Previous Work • Data Limitations • Patchwork of partial data • Mostly, we have one-way margins • Break down of a single attribute into a few categories • Example: look at how we can use one-way margins
Previous Work • Iterative Proportional Fitting
Previous Work • Iterative Proportional Fitting
Previous Work • Iterative Proportional Fitting • e.g., “Biproportional Updating” of O/D tables • Exactly satisfies target margins • Also minimizes discrimination information relative to source population • Information theory: maximum entropy • Resulting PDF satisfies the constraints without assuming any information we do not possess
Previous Work • Many options for margins in 3D
Previous Work • Beckman, Baggerley & McKay (1996) • State-of-the-art application of IPF for census • Geography attribute gets special treatment • Due to nature of data in PUMS and census tables • Two approaches: zone-by-zone, or all zones at once • Treats final table as a PMF • Monte Carlo draws used to integerize • Hurts fit to target margins • Limited number of attributes
Previous Work • Williamson, Birkin and Rees (1998) • Not IPF: “Combinatorial Optimisation” • List-based, instead of tables • Pros: • good fit to target margins • may handle more attributes • Cons: • no guarantees about relationship with source sample • not entropy maximizing • slow
Data • Summary Tables • Usually one attribute, by zone (2D margin) • Contingency table • Large sample: 20% or 100% • Sometimes 2-3 attributes by zone • Used as Target Margins • Public Use Microdata Sample (PUMS) • List; almost all attributes, except zones • Small sample (1-2%) • Canada: defined for each large Census Metropolitan Area (CMA) • Used as Source Sample
Data • Canadian Census includes three PUMS • Persons • Census families • Households & Dwellings • Also summary tables related to each
New Methods: Sparsity • Beckman et al.’s approach doesn’t work well with many attributes • Computation becomes hard • Huge memory requirement • Slow • Thirteen attributes on family agent: • Beckman Zone-by-Zone needs 1.4 GB memory • Beckman Multizone needs 1,036 GB memory
New Methods: Sparsity • Number of cells in multiway table grows exponentially with number of attributes (dimensions)
New Methods: Sparsity • Large number of bins • Most bins are zero • Number of bins is larger than sample!
New Methods: Sparsity • Is it meaningful to use many attributes? • Tentatively, yes • Not a meaningful 13-way distribution • But, a link between many statistically valid low-order distributions (e.g., 3-way) • If acceptable, can we do better than standard IPF? • Yes - use a sparse data structure instead of a complete array to represent table • Store only non-zero cells in table
New Methods: Sparsity • Same representation as Williamson’s “Combinatorial Optimisation” • But, uses IPF algorithm • Maximum entropy guarantee; fast • Can implement either zone-by-zone or multizone IPF using sparse data structure
New Methods: Relationships • Land use/transportation models have more types of agents • Agents: Persons, families, households, business establishments • Objects: Vehicles, dwellings
New Methods: Relationships • Need to synthesize correct relationships • Examples: • Which persons are married? • Opposite sex, similar ages - usually • Which household owns/rents a given dwelling? • Number of rooms and number of persons should be correlated • Earlier methods could guarantee correct PDF for one agent type, but not all simultaneously
New Methods: Relationships • Family PUMS contains information about persons in family • husband/wife ages; child ages • Can synthesize “family” agent • Include some “person” attributes in family
New Methods: Relationships • Then, conditionally synthesize persons on family attributes • IPF result is a joint probability mass functionP(AGE, EDU, INCOME, OCCUP, SEX, ...) • Can convert to a conditional PMFP(EDU, INCOME, OCCUP, ... | AGE, SEX) • Synthesize, repeating for husband, wife, children
New Methods: Relationships • Guarantees good fit for both agent types • Correct Family PDF • Correct Person PDF • Simple, data-driven • No rules • No special data sources, models • Provided that attributes can be aligned between agents
Results • Programmed in R • A statistical programming platform • Dynamic language, fast prototyping • Good support for categorical data, contingency tables • Toronto CMA: 1.1 million households, 1.0 million families, 3.3 million persons • Run time: 2 hours, 7 minutes on older 1.5GHz computer • Repeated for Hamilton and Oshawa CMAs
Results • Experiment • Is there value in using really rich input data? • Or does PUMS + 1D tables give enough? • Calculated fit against all available data • SRMSE and G2 information theoretic statistics
Results • Improvement of result with additional data evident • However, no statistical tests possible • Monte Carlo stage causes some error • My conditional synthesis introduces small amount of additional error • Little difference between zone-by-zone and multizone methods