I1.1 Fundamentals for Context-aware Real-time Data Fusion • Lead: Roth (UIUC), Abdelzaher (UIUC), Huang (UIUC), Lei (IBM) • Presented by: Tarek Abdelzaher
Task Goal and Overview • Goal: • Foundations for utilizing context and prior knowledge in fusion • Foundations for analysis of fusion latency [Figure: sensor, text, image, and human sources over an information network and a communication network, annotated with the three technical parts: prior knowledge (accounting for prior knowledge with constrained conditional models), uncovering links in heterogeneous content, and latency analysis (resource bottleneck).]
Data Fusion Threads • Thread 1: Enable exploitation of prior knowledge and information network links in the design of algorithms for data fusion (Dan Roth, UIUC) • Thread 2: Enhance the ability to uncover links between heterogeneous content items, such as text and video (Huang, UIUC) • Thread 3: Advance latency analysis of distributed data fusion algorithms (Abdelzaher, UIUC) • Thread 4: Validate the results on viable platforms and crowd-sourcing applications (Lei, IBM Research)
Outline • Prior knowledge: accounting for prior knowledge with constrained conditional models • Links: uncovering links in heterogeneous content (sensor, text, image, and human sources) • Latency: latency analysis of the resource bottleneck across the information and communication networks
Outline: Thread 1, Accounting for Prior Knowledge with Constrained Conditional Models
Thread 1: A Framework for Integrating Prior Knowledge • Fundamentals of Context-aware Real-time Data Fusion • Advances in Learning & Inference of Constrained Conditional Models • CCM: A computational framework for learning and inference with interdependent variables in constrained settings • Formulating Information Fusion as CCMs • Preliminary theoretical and experimental work on Information Fusion • Key Publications: • R. Samdani and D. Roth, Efficient Learning for Constrained Structured Prediction, submitted. • M. Chang, M. Connor and D. Roth, The Necessity of Combining Adaptation Methods, EMNLP’10. • M. Chang, V. Srikumar, D. Goldwasser and D. Roth, Structured Output Learning with Indirect Supervision, ICML’10. • M. Chang, D. Goldwasser, D. Roth and V. Srikumar, Discriminative Learning over Constrained Latent Representations, NAACL’10. • G. Kundu, D. Roth and R. Samdani, Constrained Conditional Models for Information Fusion, submitted.
Fusion as a Decision Problem • Predict the values of multiple interdependent labels (in contexts as diverse as information extraction, information trustworthiness, and information fusion) • Modeling complex dependencies makes learning and inference (decision making) intractable • This pushes designers toward over-simplification and unjustified independence assumptions • Constrained conditional models (CCMs) pair relatively simple learned models with expressive prior knowledge, in the form of declarative constraints, to support global decisions • Learn models for sub-problems; then combine the models' outputs with prior knowledge/constraints to make globally coherent decisions (learn models; acquire knowledge/constraints; make decisions) • Recent progress: LoCL (Locally Consistent Learning), an efficient scheme that is consistent with global learning under certain conditions • Theoretical contribution, with experimental confirmation on information extraction tasks
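The CCM decision rule described above can be sketched as a brute-force search: score every joint assignment by summing the scores of independent local models, subtract a penalty for each violated constraint, and keep the best. This is a minimal sketch, not the authors' implementation; the function name `ccm_inference`, the toy labels and scores, and the simple 0/1 violation indicator (standing in for a general distance-to-constraint term) are all assumptions. Exhaustive enumeration only works for tiny label spaces; CCM inference is commonly cast as an integer linear program instead.

```python
from itertools import product

def ccm_inference(local_scores, constraints, penalties):
    """Brute-force CCM-style inference (illustrative sketch).

    local_scores: one dict per output variable, mapping label -> score from a
                  hypothetical, independently trained local model.
    constraints:  predicates over a full assignment; True means satisfied.
    penalties:    cost subtracted per violated constraint; float("inf") makes
                  the constraint hard.
    Returns (best_assignment, best_score); best_assignment is None if every
    assignment violates a hard constraint.
    """
    best, best_score = None, float("-inf")
    for assignment in product(*(list(s) for s in local_scores)):
        # Sum of the local models' scores for this joint assignment.
        score = sum(s[y] for s, y in zip(local_scores, assignment))
        # Subtract the penalty of every violated constraint.
        for satisfied, rho in zip(constraints, penalties):
            if not satisfied(assignment):
                score -= rho
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score

# Toy example (hypothetical scores): two reports that should describe the
# same event type, but whose local models disagree.
local = [{"fire": 2.0, "flood": 1.0}, {"fire": 0.5, "flood": 3.0}]
agree = [lambda y: y[0] == y[1]]

print(ccm_inference(local, [], []))                  # no constraints
print(ccm_inference(local, agree, [float("inf")]))   # hard constraint
print(ccm_inference(local, agree, [0.5]))            # weak soft constraint
```

Without constraints the local argmax ("fire", "flood") wins; the hard agreement constraint yields the coherent ("flood", "flood"); a penalty of only 0.5 is too weak to override the local evidence, which shows why constraint weights matter.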
Illustrative Example • Learning an optimal path from A to C through B • The problem decomposes into sub-problem AB, with local optimum ABopt, and sub-problem BC, with local optimum BCopt • Constraint: no left turns at B • LoCL: using local models plus constraints, find the global optimum [Figure: path diagram in which the global optimum from A to C is drawn separately from ABopt and BCopt.]
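To make the path example concrete, here is a toy version with hypothetical route costs and headings. The names `LEFT_OF`, `is_left_turn`, and `best_path`, and all numbers, are made up for illustration. The search simply enumerates pairs of local candidates and discards combinations that violate the constraint; it illustrates the inference side of combining local models with constraints, not the LoCL learning procedure itself.

```python
# Heading you face after turning left from each compass heading.
LEFT_OF = {"N": "W", "W": "S", "S": "E", "E": "N"}

def is_left_turn(arrive, depart):
    """True if leaving B heading `depart` after arriving heading `arrive` is a left turn."""
    return LEFT_OF[arrive] == depart

def best_path(ab_routes, bc_routes):
    """Cheapest A->B->C combination that respects 'no left turns at B'.

    ab_routes: name -> (cost, heading when arriving at B)
    bc_routes: name -> (cost, heading when leaving B)
    """
    best = None
    for ab, (c1, arrive) in ab_routes.items():
        for bc, (c2, depart) in bc_routes.items():
            if is_left_turn(arrive, depart):
                continue  # violates the constraint at B
            if best is None or c1 + c2 < best[2]:
                best = (ab, bc, c1 + c2)
    return best

# Hypothetical candidates for each sub-problem.
AB = {"AB1": (3, "N"), "AB2": (4, "S")}
BC = {"BC1": (2, "W"), "BC2": (4, "N")}

# Local optima chosen independently: AB1 and BC1 ...
ab_opt = min(AB, key=lambda r: AB[r][0])
bc_opt = min(BC, key=lambda r: BC[r][0])
# ... but AB1 arrives heading north and BC1 leaves heading west: a left turn.
print(ab_opt, bc_opt, is_left_turn(AB[ab_opt][1], BC[bc_opt][1]))
print(best_path(AB, BC))  # constrained global optimum
```

Here the constrained optimum keeps BCopt but replaces ABopt, so stitching the two local optima together would have produced an infeasible path.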
Application: Disaster Scenario • Predict the output states of different locations over consecutive time steps: predicted states at various locations, {y1, y2, …, yn} • The output space is spatially and temporally structured; expressing this structure using constraints can help make coherent predictions and boost accuracy [Figure: information sources (text messages, images, and data from sensors) deliver selected information through a router, subject to resource constraints, to a command center, with a feedback path.]
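As one illustration of temporal structure, suppose (hypothetically) that each location is labeled either "intact" or "damaged" at each time step, and that a damaged location never reverts to intact. The sketch below, with a made-up function name and scores, picks the highest-scoring label sequence that obeys this constraint by trying every switch point. It covers only the temporal dimension; spatial constraints between neighboring locations would need a joint search across locations.

```python
def decode_monotone(scores):
    """Best label sequence for one location under a hypothetical
    'damage is permanent' constraint.

    scores: one dict per time step, {"intact": s, "damaged": s}, e.g. from a
            per-time-step classifier (hypothetical).
    Feasible sequences are "intact" for k steps followed by "damaged" for
    the remaining T - k steps, so we try every switch point k = 0..T.
    """
    T = len(scores)
    best_labels, best_total = None, float("-inf")
    for k in range(T + 1):
        labels = ["intact"] * k + ["damaged"] * (T - k)
        total = sum(step[y] for step, y in zip(scores, labels))
        if total > best_total:
            best_labels, best_total = labels, total
    return best_labels, best_total

# Hypothetical scores for three time steps (integers for readability).
scores = [
    {"intact": 9, "damaged": 1},
    {"intact": 2, "damaged": 8},
    {"intact": 6, "damaged": 4},
]
# Per-step argmax gives intact, damaged, intact: an incoherent reversal.
independent = [max(step, key=step.get) for step in scores]
print(independent)
print(decode_monotone(scores))
```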
Outline: Thread 2, Uncovering Links in Heterogeneous Content
Thread 2: Constructing a Cross-Domain Translator (UIUC, IBM) • How can we bridge the cross-domain gap between source instances (text) and target instances (images)?
Constructing a Cross-Domain Translator • Source instances (text) are mapped by W(s), and target instances (images) by W(t), into a common latent space • The inner product in the latent space serves as the translator
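The translator idea can be sketched in a few lines: project a text feature vector with W(s) and an image feature vector with W(t) into the shared latent space and take their inner product, which equals x_s^T W(s)^T W(t) x_t. This is a minimal sketch assuming W(s) and W(t) are linear maps that have already been learned; the function names, dimensions, and numbers are hypothetical, and the published methods involve a learning procedure not shown here.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def translator_score(W_s, x_s, W_t, x_t):
    """<W_s x_s, W_t x_t>: similarity of a text item and an image item
    measured in the common latent space."""
    z_s = matvec(W_s, x_s)  # text -> latent space
    z_t = matvec(W_t, x_t)  # image -> latent space
    return sum(a * b for a, b in zip(z_s, z_t))

# Hypothetical 2-D latent space; text features are 3-D, image features 2-D.
W_s = [[1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0]]
W_t = [[2.0, 0.0],
       [0.0, 1.0]]

text = [1.0, 0.0, 2.0]
images = {"img_a": [0.5, 1.0], "img_b": [0.0, 0.5]}

# Rank candidate images for the text item by translator score.
ranked = sorted(images,
                key=lambda k: translator_score(W_s, text, W_t, images[k]),
                reverse=True)
print(ranked)
```

Because the score is bilinear, the two domains never need a shared raw feature space; only the learned projections must agree on the latent dimensions.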
Technical Contributions • Cross-Domain Knowledge Propagation • Propagating knowledge from surrounding text to visual data • Published in WWW’11, in collaboration with Dr. Charu Aggarwal, IBM • Cross-Category Knowledge Sharing • Exploring concept correlations to enhance inference accuracy • To appear in CVPR’11, in collaboration with Dr. Charu Aggarwal, IBM • Modeling Context-Aware Image Similarity • Applications to disaster assessment (in collaboration with Prof. Tarek Abdelzaher) • Submitted to KDD’11
Outline: Thread 3, Latency Analysis (Resource Bottleneck)
Thread 3: Latency Analysis (in collaboration with Aylin Yener, CNARC) • Goal: • Answer the question: how much work can be done “on time” (given different data fusion workflows and different end-to-end deadlines)? • Derive the real-time capacity region (the load region where deadlines are met) • Model: • Different data flows share distributed computational and communication resources • Each flow is represented by its own workflow graph • Different flows have different end-to-end deadlines (worst-case allowable end-to-end latency) • Results: • An algebra for reducing distributed workflows to equivalent canonical “centralized systems” • A real-time capacity region for the canonical system
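A Reduction Theory for Distributed Systems: Equivalent Uniprocessor • In collaboration with CNARC (OICC) • Based on reduction of distributed systems to an “equivalent uniprocessor” • Example: flows F1 and F2 each pass through Stage 1 and Stage 2, where Ci,j is the execution time of flow Fi at stage j • F1: C1,1 = 2, C1,2 = 1.1, C1,max = 2 • F2: C2,1 = 1, C2,2 = 1.8, C2,max = 1.8 • Per-stage maxima: Cmax,1 = 2, Cmax,2 = 1.8 • Equivalent uniprocessor: C1eq = 2, C2eq = 1.8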
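The numbers in this example can be reproduced as follows. The sketch assumes, consistently with the slide's values, that each flow's cost on the equivalent uniprocessor is its largest per-stage execution time (Ci,eq = Ci,max) and that Cmax,j is the largest execution time at stage j across flows; the full reduction theory may combine these quantities in ways not shown here. The function names are hypothetical.

```python
# Per-stage execution times from the example: C[flow] = [stage 1, stage 2].
C = {"F1": [2.0, 1.1], "F2": [1.0, 1.8]}

def per_flow_max(C):
    """Ci,max: the largest stage execution time of each flow."""
    return {flow: max(times) for flow, times in C.items()}

def per_stage_max(C):
    """Cmax,j: the largest execution time at each stage, across flows."""
    return [max(stage) for stage in zip(*C.values())]

# Assumed reduction: each flow's job costs Ci,max on the equivalent uniprocessor.
C_eq = per_flow_max(C)
print(C_eq)              # equivalent uniprocessor costs per flow
print(per_stage_max(C))  # per-stage maxima
```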
Reduction of Busy Pipelines • Example flows: F1 has C1,1 = 2, C1,2 = 1.1, C1,max = 2; F2 has C2,1 = 1, C2,2 = 1.8, C2,max = 1.8; Cmax,1 = 2, Cmax,2 = 1.8 [Figure: (a) Original pipeline execution: 10 pipeline jobs of F1 and 10 pipeline jobs of F2 flowing through Stage 1 and Stage 2 over time; (b) Uniprocessor approximation: 10 uniprocessor jobs of C1,max each and 10 uniprocessor jobs of C2,max each.]
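The busy-pipeline picture can be checked with a small simulation. It assumes all 20 jobs are released at time 0, the 10 F1 jobs go first, jobs pass through the stages in FIFO order, and there is no preemption; `pipeline_finish_times` is a hypothetical helper, not part of the original work. The uniprocessor approximation simply runs each job for its Ci,max back to back.

```python
def pipeline_finish_times(jobs):
    """Simulate a non-preemptive FIFO pipeline.

    jobs: list of per-stage execution times, all jobs released at time 0.
    Each stage serves one job at a time; a job enters stage j once it has
    left stage j-1 and stage j is free. Returns each job's completion time.
    """
    stage_free = [0.0] * len(jobs[0])  # when each stage becomes free
    finish = []
    for times in jobs:
        t = 0.0
        for j, c in enumerate(times):
            t = max(t, stage_free[j]) + c
            stage_free[j] = t
        finish.append(t)
    return finish

# 10 jobs of F1 followed by 10 jobs of F2, using the example's stage times.
jobs = [[2.0, 1.1]] * 10 + [[1.0, 1.8]] * 10

pipeline_end = pipeline_finish_times(jobs)[-1]
uniprocessor_end = sum(max(times) for times in jobs)  # 10*C1,max + 10*C2,max
print(round(pipeline_end, 6), round(uniprocessor_end, 6))
```

For these numbers the pipeline finishes at about 39.1 time units versus 38.0 for the uniprocessor approximation: close, though the approximation slightly underestimates here, so on its own it is not an upper bound.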
New: Reduction of Data Fusion Trees [Figure: (a) a distributed data fusion system of three workflows, F1, F2, and F3, each a tree of processing nodes annotated with numeric costs; (b) the equivalent uniprocessor.]
The Real-time Capacity Region • The real-time capacity theorem: in a system with a set, S, of processing workflows, where each workflow Fi in S incurs an effective utilization ui^eff on an equivalent uniprocessor and has a job rate Ri and a per-job end-to-end maximum latency constraint Di, all jobs meet their end-to-end deadlines if a condition on these quantities holds. [Equation and its definitions not recoverable from the slide.] [Figure: the guaranteed (safe) real-time capacity region.]
Performance Evaluation • The theoretically predicted real-time capacity bound is very close to the empirically observed onset of deadline misses
Thread 4: Validation (IBM, UIUC) • Develop a general platform, reusable across different mobile crowd-sensing (MCS) applications, for experimenting with data fusion applications [Figure: MCS platform architecture with components: applications (app1, app2, app3; smart grid, smart building, smart healthcare, smart supply chain), application gateway, data center, domain analytics library, MCS data broker, MCS data collector, wide area network, MCS gateway, access appliances, mobile sensing devices with an MCS device agent, and social architecture.]
Road Ahead • Analysis of trade-offs between timeliness and fusion quality • Investigation of the dependency of fusion quality and timeliness on distributed resource allocation. • Integration of prior knowledge, constraints, and resource distribution issues into future data fusion algorithms. • Improving quality/cost trade-offs via link discovery (between text and video) • Information-network-aware real-time capacity of data fusion. • Validation, documentation and publications.
Collaborations [Figure: Fusion Task I1.1 and its collaborating tasks: In-network Storage (I2.1/C2.1), QoI Task (I1.2), Community Modeling (S2.2), Capacity Task (I1.2), Decisions under Stress (S3.1), Provenance Task (T1.3), and CNARC QoI Task (I1.2). Links are labeled: better storage policies; new fusion algorithms; accurate, timely characterization of QoI/cost trade-offs; better fusion from human sources; improved effective operational capacity; improved diagnostic capabilities in fusion systems; improved network QoI optimization for fusion systems.]
Papers Thread 1 (Q1): • (UIUC): Gourab Kundu, Rajhans Samdani, Dan Roth, “Constrained Conditional Models for Information Fusion,” submitted to Fusion 2011. • (UIUC): Dan Roth et al., “Efficient Learning for Constrained Structured Prediction,” submitted to ICML 2011. Thread 2 (Q2): • (INARC+CNARC): Forrest Iandola, Fatemeh Saremi, Tarek Abdelzaher, Praveen Jayachandran, Aylin Yener, “Real-time Capacity of Networked Data Fusion,” submitted to Fusion 2011.
More Papers Thread 3 (I1.1-I1.2 Collaboration/Multi-institution): • (UIUC+IBM) G. Qi, C. Aggarwal, T. Huang, “Towards Semantic Knowledge Propagation between text and web images,” WWW Conference, 2011. • (UIUC+IBM) Guo-Jun Qi, Charu Aggarwal, Yong Rui, Qi Tian, Shiyu Chang and Thomas Huang, “Towards Cross-Category Knowledge Propagation for Learning Cross-domain Concepts,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011), Colorado Springs, Colorado, June 21-23, 2011. • (IBM+UIUC) C. Aggarwal, Y. Zhao, P. Yu, “On Wavelet Decomposition of Uncertain Text Streams,” CIKM Conference, 2011. • (UIUC+IBM) G. Qi, C. Aggarwal, T. Huang, “Transfer learning with distance functions between text and web images,” submitted to the ACM KDD Conference, 2011. • (UIUC+IBM) G. Qi, C. Aggarwal, H. Ji, T. Huang, “Exploring Content and Context-based Links in Social Media: A Latent Space Method,” submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). Thread 4 (Q3/Q4): • Raghu Ganti, Fan Ye, Hui Lei, “Mobile Crowdsensing: Current State and Future Challenges,” in submission to IEEE Communications Magazine.
Military Relevance • Enhanced warfighter ability to interpret reports, sensory data, and soft information sources in order to make the right decisions • Enhanced exploitation of semantic links between information items to improve data fusion accuracy • Improved ability to utilize context and background knowledge in interpreting data • Significantly improved situation assessment in the presence of heterogeneous content • Improved latency analysis algorithms for data fusion systems to ensure timeliness of fusion results