
ISCTSC Workshop A7 Best Practices in Data Fusion


gamba

Presentation Transcript


  1. ISCTSC Workshop A7 Best Practices in Data Fusion

  2. Objectives • Identify the state of the art and the state of practice • Identify key research challenges and opportunities • Identify tangible ways to accelerate methodological innovation and adoption in practice

  3. What exactly is data fusion? • Using more than one data source to estimate a parameter of interest

  4. What exactly is data fusion? • Using more than one data source to estimate a parameter of interest
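A minimal sketch of this definition, assuming the simplest possible case: two independent, unbiased estimates of the same parameter with known error variances, combined by inverse-variance weighting. The function name, the trip-rate figures and the variances are hypothetical illustrations, not material from the workshop.

```python
def fuse_inverse_variance(estimates, variances):
    """Combine independent, unbiased estimates of one parameter.

    Each estimate is weighted by the inverse of its error variance, so the
    more precise source counts for more. Returns (fused_estimate, fused_variance).
    """
    if not estimates or len(estimates) != len(variances):
        raise ValueError("need one variance per estimate")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    fused = sum(w * x for w, x in zip(weights, estimates)) / total_weight
    return fused, 1.0 / total_weight


# Hypothetical inputs: a mean daily trip rate estimated from a household
# survey (2.4, variance 0.04) and from passive trace data (2.0, variance 0.16).
fused_rate, fused_var = fuse_inverse_variance([2.4, 2.0], [0.04, 0.16])
```

Under these assumptions the fused variance is never larger than the smallest input variance, which is one way to state the precision benefit of fusion. Correlated or biased sources would need a richer error model than this sketch uses.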

  5. State of practice (SOP) & state of the art (SOA) (1)
     • Data fusion has a long history in transport, but the work is very fragmented
     • Examples
       • Synthetic population generation
       • OD matrix updating
       • Data enrichment in discrete choice model estimation
       • Network state estimation
       • Activity pattern feature extraction from trace data
       • Use of multiple survey modes
       • Activity and time use survey consolidation
       • Population exposure modelling
       • Public transport (e.g. UK bus) OD matrix estimation

  6. Summary: SOP & SOA (2)
     • Problem types:
       • Direct observation by multiple methods
         • Requires error model
         • Does not in general require system process model
       • Direct and indirect observation
         • Requires error model
         • Requires additionally a system process model to link indirect observations to parameters of interest
     • Methods:
       • ‘Record linking’ methods (e.g., statistical matching, data mining, imputation, fuzzy logic)
       • Model-based inference (e.g., FIML, filtering, Bayesian inference)
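The distinction between the two problem types can be sketched with a scalar linear-Gaussian example. The assumptions here are illustrative only: the parameter of interest is an OD flow `x`, the direct observation is a survey estimate `y_direct = x + e1`, and the indirect observation is a link count `y_indirect = h * x + e2`, where the hypothetical factor `h` (the share of the OD flow using the counted link) plays the role of the system process model. All names and numbers are invented for this sketch.

```python
def fuse_direct_indirect(y_direct, var_direct, y_indirect, var_indirect, h):
    """Fuse a direct and an indirect observation of a scalar parameter x.

    Error model:   y_direct = x + e1,       Var(e1) = var_direct
                   y_indirect = h * x + e2, Var(e2) = var_indirect
    Process model: h maps the parameter onto the indirectly observed quantity.
    Returns the generalised least squares estimate of x and its variance.
    """
    if var_direct <= 0 or var_indirect <= 0:
        raise ValueError("variances must be positive")
    # Information (precision) contributed by each observation about x.
    precision = 1.0 / var_direct + (h * h) / var_indirect
    weighted_sum = y_direct / var_direct + h * y_indirect / var_indirect
    return weighted_sum / precision, 1.0 / precision


# Hypothetical inputs: survey says 100 trips (variance 400); a link carrying
# half of this OD flow (h = 0.5) is counted at 45 vehicles (variance 25).
x_hat, x_var = fuse_direct_indirect(100.0, 400.0, 45.0, 25.0, 0.5)
```

Without `h` the link count cannot be related to the OD flow at all, which is why indirect observation needs a process model in addition to the error model. Setting `h = 1` reduces this sketch to the direct-observation case.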

  7. Research needs (1)
     • Enabling research
       • Better metadata (survey/data collection process + context) to support informed fusion (especially important in the era of Web 2.0)
       • More professional and disciplined protocols for reporting data treatments in published work
       • Better techniques of disclosure management
       • Understanding how to make the business case for data fusion
         • Benefits: sample size, precision
         • Barriers: perception of ‘made up data’, threat to incumbent data providers

  8. Research needs (2)
     • Methodological research
       • Detecting genuinely conflicting information (not fusable), a form of specification test
       • Better means of validating fused data
       • Better methods for modelling the propagation of data and model uncertainty during data fusion, to enhance confidence in fused data
       • Are deterministic/‘mean imputation’ approaches adequate? How seriously do they distort the covariance structure?
       • Better re-sampling/Bayesian methods in high dimensions
       • Integrate methods from SAE
       • Opportunities to reduce respondent burden by split designs and ex-post fusion (à la SP surveys and analysis): question substitutability
       • For record matching, what are the key connecting variables?
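The question about mean imputation can be illustrated with a small deterministic sketch. The data are invented: `y` is exactly twice `x`, every second `y` value is treated as missing, and the gaps are filled with the mean of the observed values. The helper names are hypothetical.

```python
from statistics import mean


def sample_covariance(xs, ys):
    """Unbiased sample covariance; sample_covariance(v, v) is the variance."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def mean_impute(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


x = [1, 2, 3, 4, 5, 6, 7, 8]
y_true = [2.0 * v for v in x]  # perfectly correlated with x
# Treat every second y value as unobserved, then fill the gaps.
y_missing = [y if i % 2 == 0 else None for i, y in enumerate(y_true)]
y_imputed = mean_impute(y_missing)

var_true = sample_covariance(y_true, y_true)
var_imputed = sample_covariance(y_imputed, y_imputed)
cov_true = sample_covariance(x, y_true)
cov_imputed = sample_covariance(x, y_imputed)
```

In this toy case both the variance of `y` and its covariance with `x` shrink after imputation, because every imputed record sits at the mean and carries no information about `x`. How severe the distortion is in real fused datasets is exactly the open question the slide raises.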

  9. Research needs (3)
     • Research infrastructure
       • Establish a more consistent and complete taxonomy of data fusion problems, methods, outcomes
       • Establish reference datasets and reference ‘cases’
