Synthesizing Agents and Relationships for Land Use / Transportation Modelling

Lecture Outline • Introduction • Previous Work • Data • New Methods • Results

Introduction • How would land use, transportation patterns and emissions react to... • High congestion charge? • Greenbelt policy? • “Do nothing” while population grows • Major transportation projects • Major extrapolations from current behaviour • Too hard to predict conventionally

Introduction Traditional 4-stage

Introduction Integrated Land Use/Transportation Environment (ILUTE) model

Introduction • We can’t build such a complicated model using conventional methods • Instead, preferred approach is microsimulation model • What is microsimulation?

Introduction Conventional Model Simulation Model

Introduction • Microsimulation = Simulation + Agents • Models the state of agents • Combined behaviour of agents yields system state • 1. Begin with initial population in start year • 2. Update population, year by year • age persons, change family structures • change jobs, move homes • use this to predict annual travel patterns • 3. Obtain travel patterns in forecast year
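
The three numbered steps on this slide can be sketched as a small loop. This is a toy illustration only: the `Person` class, the `step_year` transition rule and the 10% job-change probability are hypothetical placeholders, not ILUTE's behavioural models.

```python
import random
from dataclasses import dataclass


@dataclass
class Person:
    age: int
    employed: bool


def step_year(population, rng, job_change_prob=0.1):
    """Advance every agent by one simulated year (toy transition rules)."""
    for person in population:
        person.age += 1
        # Hypothetical rule: employment status flips with a fixed probability.
        if rng.random() < job_change_prob:
            person.employed = not person.employed


def system_state(population):
    """The system state is just an aggregate over the agents' states."""
    return {
        "persons": len(population),
        "employed": sum(p.employed for p in population),
        "mean_age": sum(p.age for p in population) / len(population),
    }


def simulate(population, start_year, forecast_year, seed=0):
    rng = random.Random(seed)
    # Step 2: update the population year by year.
    for _ in range(start_year, forecast_year):
        step_year(population, rng)
    # Step 3: read off the system state in the forecast year.
    return system_state(population)


# Step 1: an initial population in the start year.
pop = [Person(age=30, employed=True), Person(age=45, employed=False)]
state = simulate(pop, 1986, 1996)
print(state)
```

The point of the sketch is the structure: the model never stores an aggregate state directly, it only stores agents and derives aggregates from them.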

Introduction • Need an initial population in the start year • List of agents and their attributes - e.g., • Number of persons, and their ages • Number of vehicles • Type of dwelling • etc. • But - complete list is unknown • “Population Synthesis” used instead • Use known data to create initial agents • Result has known statistical properties • Best estimate from limited data

Introduction • My results: • Improved method for population synthesis • Allows more attributes for each agent • New method for relationship synthesis • Allows correct set of agents and correct set of relationships • Created a synthetic population for ILUTE • Persons, families, households and dwellings • Complete 1986 population for GTHA

Previous Work • Two representations of set of agents • List of agents and their attributes (as categories) • Contingency table • One cell for each combination of attributes • Cell contains count of number of agents
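
A short sketch of how the two representations relate, using made-up agents with two attributes (age group and sex). The attribute names and categories are illustrative assumptions.

```python
from collections import Counter

# List representation: one record per agent, attributes stored as categories.
agents = [
    ("0-17", "F"),
    ("18-64", "M"),
    ("18-64", "F"),
    ("18-64", "M"),
    ("65+", "F"),
]

# Contingency-table representation: one cell per attribute combination,
# holding the count of agents in that cell. Counter returns 0 for empty cells.
table = Counter(agents)

# Converting back: repeat each cell's attribute combination `count` times.
expanded = [cell for cell, count in table.items() for _ in range(count)]

print(table[("18-64", "M")])
print(sorted(expanded) == sorted(agents))
```

The two forms carry the same information when agents are distinguished only by their categorical attributes; they differ in how storage scales with the number of attributes.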

Previous Work • Data Limitations • Patchwork of partial data • Mostly, we have one-way margins • Breakdown of a single attribute into a few categories • Example: look at how we can use one-way margins

Previous Work

Previous Work • Iterative Proportional Fitting

Previous Work • Iterative Proportional Fitting • e.g., “Biproportional Updating” of O/D tables • Exactly satisfies target margins • Also minimizes discrimination information relative to source population • Information theory: maximum entropy • Resulting PDF satisfies the constraints without assuming any information we do not possess
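
A minimal two-dimensional IPF sketch in plain Python, assuming a strictly positive source table and target margins with equal totals. The function name `ipf_2d` and the numbers are illustrative, not taken from the talk.

```python
def ipf_2d(seed, row_targets, col_targets, iterations=100, tol=1e-10):
    """Scale rows and columns of `seed` in turn until both margins match."""
    table = [row[:] for row in seed]
    for _ in range(iterations):
        # Row step: scale each row to its target total.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                factor = target / total
                table[i] = [value * factor for value in table[i]]
        # Column step: scale each column to its target total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                factor = target / total
                for row in table:
                    row[j] *= factor
        # Columns are exact after the column step; stop once rows agree too.
        if all(abs(sum(table[i]) - t) < tol for i, t in enumerate(row_targets)):
            break
    return table


seed = [[1.0, 2.0], [3.0, 4.0]]  # source sample cross-tabulation
fitted = ipf_2d(seed, row_targets=[40.0, 60.0], col_targets=[30.0, 70.0])

row_sums = [sum(row) for row in fitted]
col_sums = [sum(row[j] for row in fitted) for j in range(2)]
odds_ratio = (fitted[0][0] * fitted[1][1]) / (fitted[0][1] * fitted[1][0])
print(row_sums, col_sums, odds_ratio)
```

Because each step only multiplies whole rows or columns by a constant, the cross-product (odds) ratio of the source table is preserved, which is one way to see that the fit adds no association structure beyond what the source sample and the margins supply.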

Previous Work • Many options for margins in 3D

Previous Work • Beckman, Baggerly & McKay (1996) • State-of-the-art application of IPF for census • Geography attribute gets special treatment • Due to nature of data in PUMS and census tables • Two approaches: zone-by-zone, or all zones at once • Treats final table as a PMF • Monte Carlo draws used to integerize • Hurts fit to target margins • Limited number of attributes
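
The Monte Carlo integerization step can be sketched as follows: treat the fitted table as a PMF and draw whole agents from it. The cell labels and probabilities are made up, and `integerize` is a hypothetical helper name.

```python
import random


def integerize(pmf, n_agents, seed=0):
    """Draw `n_agents` cells from `pmf` and return integer counts per cell."""
    rng = random.Random(seed)
    cells = list(pmf)
    weights = [pmf[cell] for cell in cells]
    draws = rng.choices(cells, weights=weights, k=n_agents)
    counts = {cell: 0 for cell in cells}
    for cell in draws:
        counts[cell] += 1
    return counts


# Fractional IPF output, normalized to a PMF over (age group, tenure) cells.
pmf = {
    ("young", "renter"): 0.30,
    ("young", "owner"): 0.30,
    ("old", "owner"): 0.40,
}
counts = integerize(pmf, n_agents=100)
print(counts)
```

The counts are integers and sum to the requested population, but each cell only matches its expected value on average. That sampling noise is why the slide notes that this step hurts the fit to the target margins.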

Previous Work • Williamson, Birkin and Rees (1998) • Not IPF: “Combinatorial Optimisation” • List-based, instead of tables • Pros: • good fit to target margins • may handle more attributes • Cons: • no guarantees about relationship with source sample • not entropy maximizing • slow

Data • Summary Tables • Usually one attribute, by zone (2D margin) • Contingency table • Large sample: 20% or 100% • Sometimes 2-3 attributes by zone • Used as Target Margins • Public Use Microdata Sample (PUMS) • List; almost all attributes, except zones • Small sample (1-2%) • Canada: defined for each large Census Metropolitan Area (CMA) • Used as Source Sample

Data

Data • Canadian Census includes three PUMS • Persons • Census families • Households & Dwellings • Also summary tables related to each

New Methods: Sparsity • Beckman et al.’s approach doesn’t work well with many attributes • Computation becomes hard • Huge memory requirement • Slow • Thirteen attributes on family agent: • Beckman Zone-by-Zone needs 1.4 GB memory • Beckman Multizone needs 1,036 GB memory

New Methods: Sparsity • Number of cells in multiway table grows exponentially with number of attributes (dimensions)
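
A quick arithmetic sketch of that growth. The choice of four categories per attribute and eight bytes per cell is a hypothetical simplification, so the figures are not meant to reproduce the memory requirements quoted for the Beckman et al. methods.

```python
import math


def n_cells(categories_per_attribute):
    """Cells in a complete multiway table: product of the category counts."""
    return math.prod(categories_per_attribute)


for n_attributes in (3, 8, 13):
    cells = n_cells([4] * n_attributes)
    gigabytes = cells * 8 / 1e9  # assuming one 8-byte float per cell
    print(f"{n_attributes:2d} attributes: {cells:>12,d} cells, {gigabytes:.4f} GB")
```

Adding a zone dimension multiplies the cell count again by the number of zones, which is why an all-zones-at-once table is so much larger than a zone-by-zone one.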

New Methods: Sparsity

New Methods: Sparsity • Large number of bins • Most bins are zero • Number of bins is larger than sample!

New Methods: Sparsity • Is it meaningful to use many attributes? • Tentatively, yes • Not a meaningful 13-way distribution • But, a link between many statistically valid low-order distributions (e.g., 3-way) • If acceptable, can we do better than standard IPF? • Yes - use a sparse data structure instead of a complete array to represent table • Store only non-zero cells in table

New Methods: Sparsity • Same representation as Williamson’s “Combinatorial Optimisation” • But, uses IPF algorithm • Maximum entropy guarantee; fast • Can implement either zone-by-zone or multizone IPF using sparse data structure
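
One way to sketch IPF over a sparse, list-based structure: keep one weight per non-zero record and rescale the weights margin by margin. This is an illustrative reading of the idea, not the implementation from the talk; `sparse_ipf` and the data are assumptions, and every category seen in the records is assumed to have a target.

```python
from collections import defaultdict


def sparse_ipf(records, weights, margins, iterations=50):
    """IPF over a list of non-zero cells.

    records: list of attribute tuples (only combinations present in the sample)
    weights: initial weight per record
    margins: {attribute_index: {category: target_total}}
    """
    w = list(weights)
    for _ in range(iterations):
        for attr, targets in margins.items():
            # Current weighted total for each category of this attribute.
            totals = defaultdict(float)
            for record, weight in zip(records, w):
                totals[record[attr]] += weight
            # Scale every record so its category total hits the target.
            for k, record in enumerate(records):
                w[k] *= targets[record[attr]] / totals[record[attr]]
    return w


records = [("young", "renter"), ("young", "owner"), ("old", "owner")]
margins = {
    0: {"young": 60.0, "old": 40.0},
    1: {"renter": 30.0, "owner": 70.0},
}
fitted_weights = sparse_ipf(records, [1.0, 1.0, 1.0], margins)
print(fitted_weights)
```

The cell ("old", "renter") never appears in the source sample, so it is never stored and stays at zero, exactly as it would in a dense table. Memory scales with the number of records rather than with the product of category counts.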

New Methods: Relationships • Land use/transportation models have more types of agents • Agents: Persons, families, households, business establishments • Objects: Vehicles, dwellings

New Methods: Relationships • Need to synthesize correct relationships • Examples: • Which persons are married? • Opposite sex, similar ages - usually • Which household owns/rents a given dwelling? • Number of rooms and number of persons should be correlated • Earlier methods could guarantee correct PDF for one agent type, but not all simultaneously

New Methods: Relationships • Family PUMS contains information about persons in family • husband/wife ages; child ages • Can synthesize “family” agent • Include some “person” attributes in family

New Methods: Relationships • Then, conditionally synthesize persons on family attributes • IPF result is a joint probability mass function P(AGE, EDU, INCOME, OCCUP, SEX, ...) • Can convert to a conditional PMF P(EDU, INCOME, OCCUP, ... | AGE, SEX) • Synthesize, repeating for husband, wife, children
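
A sketch of the conditioning step with a made-up three-attribute joint PMF over (AGE, SEX, EDU). The helper names and probabilities are hypothetical; the real tables have many more attributes.

```python
import random


def conditional_pmf(joint, given):
    """Return P(cell | attributes in `given`), with `given` as {index: value}."""
    matching = {
        cell: p
        for cell, p in joint.items()
        if all(cell[i] == value for i, value in given.items())
    }
    total = sum(matching.values())
    if total == 0:
        raise ValueError("no probability mass for the given attributes")
    return {cell: p / total for cell, p in matching.items()}


def draw_person(joint, given, rng):
    """Draw one person whose fixed attributes come from the family record."""
    cond = conditional_pmf(joint, given)
    cells = list(cond)
    return rng.choices(cells, weights=[cond[c] for c in cells], k=1)[0]


# Joint person PMF over (AGE, SEX, EDU), e.g. as produced by IPF.
joint = {
    ("35-44", "M", "degree"): 0.10,
    ("35-44", "M", "no degree"): 0.30,
    ("35-44", "F", "degree"): 0.20,
    ("35-44", "F", "no degree"): 0.40,
}

# The family agent already fixes the husband's age and sex.
husband_given = {0: "35-44", 1: "M"}
cond = conditional_pmf(joint, husband_given)
husband = draw_person(joint, husband_given, random.Random(0))
print(cond, husband)
```

Repeating the draw for the wife and each child, with their ages and sexes taken from the family agent, yields persons that are consistent with the family they belong to.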

New Methods: Relationships • Guarantees good fit for both agent types • Correct Family PDF • Correct Person PDF • Simple, data-driven • No rules • No special data sources, models • Provided that attributes can be aligned between agents

Results

Results • Programmed in R • A statistical programming platform • Dynamic language, fast prototyping • Good support for categorical data, contingency tables • Toronto CMA: 1.1 million households, 1.0 million families, 3.3 million persons • Run time: 2 hours, 7 minutes on older 1.5GHz computer • Repeated for Hamilton and Oshawa CMAs

Results

Results • Experiment • Is there value in using really rich input data? • Or does PUMS + 1D tables give enough? • Calculated fit against all available data • SRMSE and G² information-theoretic statistics
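
The slide does not spell out the two statistics, so the sketch below assumes their common definitions: SRMSE as the root mean squared cell error divided by the mean observed cell count, and G² as twice the sum of observed × ln(observed / fitted). The exact forms used in the talk may differ.

```python
import math


def srmse(observed, fitted):
    """Standardized root mean squared error over table cells."""
    n = len(observed)
    rmse = math.sqrt(sum((f - o) ** 2 for o, f in zip(observed, fitted)) / n)
    return rmse / (sum(observed) / n)


def g2(observed, fitted):
    """Likelihood-ratio statistic G^2 = 2 * sum(o * ln(o / f)).

    Cells with o == 0 contribute nothing; fitted values are assumed to be
    positive wherever the observed count is positive.
    """
    return 2.0 * sum(o * math.log(o / f) for o, f in zip(observed, fitted) if o > 0)


observed = [10.0, 20.0, 30.0]  # e.g. cells of a validation table
fitted = [12.0, 18.0, 30.0]    # same cells tabulated from the synthetic population
print(srmse(observed, fitted), g2(observed, fitted))
```

Both statistics are zero for a perfect fit and grow as the synthetic population departs from the validation table, so lower is better in either case.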

Results

Results • Improvement of result with additional data evident • However, no statistical tests possible • Monte Carlo stage causes some error • My conditional synthesis introduces small amount of additional error • Little difference between zone-by-zone and multizone methods

Questions?
