ISCTSC Workshop A7 Best Practices in Data Fusion

Objectives
• Identify the state of the art and the state of practice
• Identify key research challenges and opportunities
• Identify tangible ways to accelerate methodological innovation and adoption in practice

What exactly is data fusion?
• Using more than one data source to estimate a parameter of interest

SOP & SOA (1)
• There is a long history of data fusion in transport, but it is very fragmented
• Examples
  • Synthetic population generation
  • OD matrix updating
  • Data enrichment in discrete choice model estimation
  • Network state estimation
  • Activity pattern feature extraction from trace data
  • Use of multiple survey modes
  • Activity and time use survey consolidation
  • Population exposure modelling
  • Public transport (e.g. UK bus) OD matrix estimation
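Two of the examples above, synthetic population generation and OD matrix updating, are often approached with iterative proportional fitting (IPF). The slides do not name a method, so the following is only a minimal sketch under that assumption. The function name `ipf`, the seed table and the marginal totals are all hypothetical illustration values: the seed stands for a small sample cross-tabulation, the targets for marginals from a second source such as a census.

```python
def ipf(seed, row_targets, col_targets, iterations=100, tol=1e-9):
    """Scale a seed table until its margins match the target totals.

    Assumes sum(row_targets) == sum(col_targets) and a positive seed.
    """
    table = [row[:] for row in seed]  # work on a copy of the seed
    for _ in range(iterations):
        # Step 1: scale each row to its target total.
        for i, row in enumerate(table):
            row_sum = sum(row)
            if row_sum > 0:
                factor = row_targets[i] / row_sum
                table[i] = [value * factor for value in row]
        # Step 2: scale each column to its target total.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in table)
            if col_sum > 0:
                factor = target / col_sum
                for row in table:
                    row[j] *= factor
        # Stop once the row totals also hold after the column step.
        if all(abs(sum(row) - t) < tol for row, t in zip(table, row_targets)):
            break
    return table


# Hypothetical seed (source 1) and marginal totals (source 2).
seed = [[10.0, 20.0], [30.0, 40.0]]
fitted = ipf(seed, row_targets=[40.0, 60.0], col_targets=[50.0, 50.0])
```

The fitted table keeps the interaction pattern of the seed while reproducing the external margins, which is the sense in which the two sources are fused.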

Summary: SOP & SOA (2)
• Problem types:
  • Direct observation by multiple methods
    • Requires error model
    • Does not in general require system process model
  • Direct and indirect observation
    • Requires error model
    • Requires additionally a system process model to link indirect observations to parameters of interest
• Methods:
  • ‘Record linking’ methods (e.g., statistical matching, data mining, imputation, fuzzy logic)
  • Model-based inference (e.g., FIML, filtering, Bayesian inference)
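For the first problem type (direct observation by multiple methods), a simple error model is already enough to combine the measurements. The sketch below assumes each source gives an unbiased estimate with a known, independent error variance and combines them by inverse-variance weighting. That choice, the function name `fuse_inverse_variance` and the numbers are illustrative assumptions, not something the slides prescribe.

```python
def fuse_inverse_variance(estimates, variances):
    """Combine independent, unbiased estimates of the same quantity.

    Each estimate is weighted by the inverse of its error variance,
    so more precise sources count for more.
    """
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    fused = sum(w * x for w, x in zip(weights, estimates)) / total_weight
    fused_variance = 1.0 / total_weight  # never larger than the smallest input variance
    return fused, fused_variance


# Hypothetical example: two counts of the same flow from different methods.
fused, fused_var = fuse_inverse_variance([100.0, 110.0], [4.0, 16.0])
```

No system process model appears here because both sources observe the parameter directly. With indirect observations, a process model would be needed to map each observation to the parameter before any such weighting.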

Research needs (1)
• Enabling research
  • Better metadata (survey/data collection process + context) to support informed fusion (especially important in the era of web 2.0)
  • More professional and disciplined protocols for reporting data treatments in published work
  • Better techniques of disclosure management
  • Understanding how to make the business case for data fusion
    • Benefits: sample size, precision
    • Barriers: perception of ‘made up data’, threat to incumbent data providers

Research needs (2)
• Methodological research
  • Detecting genuinely conflicting information (not fusable), a form of specification test
  • Better means of validating fused data
  • Better methods for modelling the propagation of data and model uncertainty during data fusion, to enhance confidence in fused data
  • Are deterministic/‘mean imputation’ approaches adequate? How seriously do they distort the covariance structure?
  • Better re-sampling/Bayesian methods in high dimensions
  • Integrate methods from small area estimation (SAE)
  • Opportunities to reduce respondent burden by split designs and ex-post fusion (à la stated preference (SP) surveys and analysis): question substitutability
  • For record matching, what are the key connecting variables?
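The question about mean imputation can be made concrete on a toy dataset. The sketch below uses hypothetical values, with half of the `y` records treated as missing. It fills the gaps with the observed mean and compares sample variance and covariance before and after. In this simple setting both shrink by the factor (n_obs - 1) / (n - 1), because every imputed record contributes nothing to the sums of squares and cross-products.

```python
def mean(values):
    return sum(values) / len(values)


def sample_cov(xs, ys):
    """Unbiased sample covariance (variance when xs is ys)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


# Hypothetical data: y is missing (None) for every second record.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, None, 6.2, None, 9.8, None, 13.9, None]

# Complete cases only.
obs_x = [xi for xi, yi in zip(x, y) if yi is not None]
obs_y = [yi for yi in y if yi is not None]

# Deterministic mean imputation: every gap gets the observed mean of y.
y_mean = mean(obs_y)
y_imputed = [yi if yi is not None else y_mean for yi in y]

cov_observed = sample_cov(obs_x, obs_y)
cov_imputed = sample_cov(x, y_imputed)
var_observed = sample_cov(obs_y, obs_y)
var_imputed = sample_cov(y_imputed, y_imputed)

# Expected shrinkage factor for both variance and covariance.
shrink = (len(obs_y) - 1) / (len(y) - 1)
print(cov_observed, cov_imputed, var_observed, var_imputed, shrink)
```

The shrinkage grows with the share of imputed records, which is one reason stochastic or multiple imputation is often preferred when the covariance structure matters.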

Research needs (3)
• Research infrastructure
  • Establish a more consistent and complete taxonomy of data fusion problems, methods, outcomes
  • Establish reference datasets and reference ‘cases’
