Spatio-temporal Load Curve Data Cleansing and Imputation via Sparsity and Low Rank

Gonzalo Mateos and Georgios B. Giannakis
Dept. of ECE and Digital Technology Center, University of Minnesota
Workshop on Architectures and Models for the Smart Grid, November 5, 2012

Context
• Robust imputation of network data
• Applications: wind farm monitoring, smart metering, network health cartography
• Goal: given a few rows per agent, perform distributed cleansing and imputation by leveraging the low rank of the nominal data matrix and the sparsity of the outliers

Load curve data cleansing
• Load curve: electric power consumption recorded periodically
• Reliable data: key to realizing the smart grid vision [Hauser’09]
• Missing data: faulty meters, communication errors, few PMUs
• Outliers: unscheduled maintenance, strikes, sport events [Chen et al’10]
[Figure: Uruguay’s aggregate power consumption (MW)]

Spatio-temporal load profiles
• Power is measured at each bus and at each time instant
• Spatio-temporal model: low-rank nominal load profiles plus sparse outliers across buses and time

Principal Component Pursuit
• Data model: Y comprises a low-rank nominal component and a sparse outlier component
• Missing data: only a subset of the entries of Y is observed, as described by a sampling operator
• Goal: given Y, recover the low-rank and the sparse components
• Principal component pursuit [Chandrasekaran et al’11], [Candes et al’11]
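The equations on this slide did not survive in the transcript. As a reading aid, the block below writes out one common noise-tolerant form of principal component pursuit with missing entries, as found in the cited literature. Apart from Y and X, every symbol is an assumption introduced here (O for the sparse outliers, E for noise, Ω for the set of observed entries, λ_* and λ_1 for tuning parameters), and the exact criterion used in the talk may differ.

```latex
% Assumed notation (not in the transcript): O = sparse outliers, E = noise,
% \Omega = set of observed entries, \lambda_*, \lambda_1 = tuning parameters.
\begin{align}
Y &= X + O + E \\
[\mathcal{P}_\Omega(Y)]_{n,t} &=
  \begin{cases} y_{n,t}, & (n,t) \in \Omega \\ 0, & \text{otherwise} \end{cases} \\
(\hat{X}, \hat{O}) &= \arg\min_{X,\,O}\;
  \tfrac{1}{2}\,\|\mathcal{P}_\Omega(Y - X - O)\|_F^2
  + \lambda_* \|X\|_* + \lambda_1 \|O\|_1
\end{align}
```

The nuclear norm promotes a low-rank X, the entrywise ℓ1 norm promotes a sparse O, and the sampling operator restricts the fit to the observed entries.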

Distributed processing paradigms
• Paradigms: fusion center (FC), incremental, in-network
• Limitations of FC-based architectures
  • Lack of robustness (isolated point of failure, non-ideal links)
  • High Tx power (as geographical area grows)
  • Less suitable for tracking applications
• Limitations of incremental processing
  • Non-robust to node failures
  • (Re-)routing? Hamiltonian routes NP-hard to establish

Problem statement
• Network of smart meters: undirected, connected graph
• Goal: given a few rows of the data matrix per node and single-hop exchanges, find the solution of (P1)
• Challenges
  • Nuclear norm is not separable
  • Global optimization variable

Separable regularization
• New formulation equivalent to (P1): (P2)
• Nonconvex; reduces complexity
• Proposition 1. If [...] is a stationary point of (P2) and [...], then [...] is a global optimum of (P1).
• Key property; e.g., [Recht et al’11] Lxρ ≥ rank[X]
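The key property cited from [Recht et al’11] is usually stated as follows: the nuclear norm of X equals the minimum of ½(‖L‖_F² + ‖Q‖_F²) over all factorizations X = LQ' whose inner dimension ρ is at least rank[X]. The sketch below checks this numerically. The function names and the N × ρ and T × ρ factor shapes are assumptions made for illustration, not notation recovered from the slides.

```python
import numpy as np

def nuclear_norm(X):
    """Sum of the singular values of X."""
    return np.linalg.svd(X, compute_uv=False).sum()

def balanced_factors(X, rho):
    """Return L (N x rho) and Q (T x rho) with X = L @ Q.T, built from the SVD.

    Assumes rho >= rank(X); trailing columns are zero when rho > rank(X).
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = min(rho, len(s))
    L = np.zeros((X.shape[0], rho))
    Q = np.zeros((X.shape[1], rho))
    # Split each singular value evenly between the two factors.
    L[:, :k] = U[:, :k] * np.sqrt(s[:k])
    Q[:, :k] = Vt[:k].T * np.sqrt(s[:k])
    return L, Q

def factor_penalty(L, Q):
    """Separable surrogate: half the sum of squared Frobenius norms."""
    return 0.5 * (np.sum(L**2) + np.sum(Q**2))
```

With the balanced factors the surrogate equals the nuclear norm; rescaling them (for instance 2L and Q/2, which leaves the product unchanged) can only increase it. Because the surrogate is a sum over the rows of L, it splits across agents that each hold a few rows.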

Distributed estimator
• Alternating-direction method of multipliers (ADMM) solver
• Method [Glowinski-Marrocco’75], [Gabay-Mercier’76]
• Learning over networks [Schizas et al’07]
• Primal variables per agent n: [...]
• Message passing: consensus with neighboring nodes (P3)
• Network connectivity: (P2) ⇔ (P3)
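To illustrate the message-passing pattern only, here is a minimal sketch of consensus ADMM on a toy problem: each agent holds one scalar and the network agrees on the average using only neighbor exchanges. This is not the estimator of the slides. The quadratic local cost, the function name, and the step size c are assumptions chosen to keep the example short.

```python
import numpy as np

def consensus_admm_average(local_values, neighbors, c=1.0, n_iter=500):
    """Decentralized ADMM for min_x sum_j 0.5 * (x - a_j)^2.

    local_values: a_j held by agent j.
    neighbors:    neighbors[j] lists the single-hop neighbors of agent j
                  (undirected, connected graph assumed).
    """
    a = np.asarray(local_values, dtype=float)
    n = len(a)
    x = np.zeros(n)  # local copies of the shared variable
    p = np.zeros(n)  # local Lagrange multipliers
    degree = np.array([len(neighbors[j]) for j in range(n)])
    for _ in range(n_iter):
        # Each agent only needs the current estimates of its neighbors.
        nbr_sum = np.array([x[neighbors[j]].sum() for j in range(n)])
        # Multiplier update: accumulate disagreement with neighbors.
        p = p + c * (degree * x - nbr_sum)
        # Primal update: closed-form minimizer of the local quadratic.
        x = (a - p + c * (degree * x + nbr_sum)) / (1.0 + 2.0 * c * degree)
    return x
```

For example, on a five-node ring with `neighbors = [[1, 4], [0, 2], [1, 3], [2, 4], [3, 0]]`, every entry of the returned vector approaches the mean of `local_values`. No agent ever sees data beyond its own value and its neighbors' current estimates.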

Distributed iterations

Attractive features
• Highly parallelizable with simple recursions
• Unconstrained QPs per agent
• No SVD per iteration [O(Tρ³) complexity]
• Low overhead for message exchanges: the exchanged variables are small
• Comm. cost independent of network size
• Recap: (P1) centralized, convex; (P2) separable regularization, nonconvex; (P3) consensus, nonconvex
• Stationary (P3) → stationary (P2) → global (P1)
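The sketch below shows why the separable formulation needs no SVD: with one factor fixed, each row of the other factor solves a small ridge regression (an unconstrained QP with a ρ × ρ system), and the outliers follow from soft-thresholding. It is a centralized alternating-minimization loop, not the distributed ADMM iterations of the talk. The cost in the docstring, the function names, and the parameters `lam_star` and `lam_1` are assumptions.

```python
import numpy as np

def soft_threshold(z, tau):
    """Entrywise minimizer of 0.5 * (z - o)^2 + tau * |o|."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def cost(Y, mask, L, Q, O, lam_star, lam_1):
    """Assumed separable cost: LS fit on observed entries plus regularizers."""
    resid = mask * (Y - L @ Q.T - O)
    return (0.5 * np.sum(resid**2)
            + 0.5 * lam_star * (np.sum(L**2) + np.sum(Q**2))
            + lam_1 * np.sum(np.abs(O)))

def ridge_rows(targets, mask, factors, lam):
    """Row i solves a ridge regression over its observed entries only."""
    rho = factors.shape[1]
    out = np.zeros((targets.shape[0], rho))
    for i in range(targets.shape[0]):
        F = factors[mask[i]]  # rows of the fixed factor seen by row i
        out[i] = np.linalg.solve(F.T @ F + lam * np.eye(rho),
                                 F.T @ targets[i, mask[i]])
    return out

def alternating_pcp(Y, mask, rho, lam_star, lam_1, n_iter=50, seed=0):
    """Alternate exact updates of L, Q and O; returns them with the cost history.

    Y: N x T data, mask: boolean N x T (True = observed), lam_star > 0.
    """
    rng = np.random.default_rng(seed)
    N, T = Y.shape
    L = rng.standard_normal((N, rho))
    Q = rng.standard_normal((T, rho))
    O = np.zeros((N, T))
    history = [cost(Y, mask, L, Q, O, lam_star, lam_1)]
    for _ in range(n_iter):
        L = ridge_rows(Y - O, mask, Q, lam_star)
        Q = ridge_rows((Y - O).T, mask.T, L, lam_star)
        # Outliers are estimated on observed entries only.
        O = mask * soft_threshold(Y - L @ Q.T, lam_1)
        history.append(cost(Y, mask, L, Q, O, lam_star, lam_1))
    return L, Q, O, history
```

Each update minimizes the cost exactly over one block, so the recorded cost never increases. The product `L @ Q.T` then serves as the cleansed and imputed estimate, and the nonzero entries of `O` flag where and when outliers occur.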

Optimality
• Proposition 2. If [...] converges to [...] and [...], then:
  • i) [...]
  • ii) [...] is the global optimum of (P1).
• ADMM can converge even for nonconvex problems, e.g., [Boyd et al’11]
• Simple distributed algorithm for principal component pursuit
• Centralized performance guarantees carry over

Synthetic data
• Random network, N = {15, 20, 25}, T = 600
• Data: [...]

NorthWrite data
• Power consumption of schools, government building, grocery store (’05-’10)
• Outliers: “Building operational transition shoulder periods”
• Prediction error: 6% for 30% missing data (8% for 50%)
[Figure panels: Cleansing, Imputation]
Data: courtesy of NorthWrite Energy Group, provided by Prof. V. Cherkassky (UofM)

Concluding summary
• Load curve data cleansing and imputation
• Leveraging sparsity and low rank
• Principal component pursuit for smart grid monitoring
• Distributed algorithm with guaranteed performance
• Estimate cleansed nominal load profiles
• Identify when and where ‘bad data’ occur
• Ongoing research:
  • Convergence of ADMM for bi-convex costs
  • Real-time (adaptive) algorithms
Thank You!
