Spatio-temporal Load Curve Data Cleansing and Imputation via Sparsity and Low Rank
Gonzalo Mateos and Georgios B. Giannakis
Dept. of ECE and Digital Technology Center, University of Minnesota
Workshop on Architectures and Models for the Smart Grid, November 5, 2012
Context
• Application examples: wind farm monitoring, smart metering, network health cartography
• Goal: Given a few rows of the data matrix per agent, perform distributed cleansing and imputation by leveraging the low rank of the nominal data matrix and the sparsity of the outliers
• Robust imputation of network data
Load curve data cleansing
• Load curve: electric power consumption recorded periodically
• Reliable data: key to realizing the smart grid vision [Hauser'09]
• Missing data: faulty meters, communication errors, few PMUs
• Outliers: unscheduled maintenance, strikes, sporting events [Chen et al'10]
Figure: Uruguay's aggregate power consumption (MW)
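Spatio-temporal load profiles
• Power $y_{n,t}$ measured at bus $n$, at time $t$
• Spatio-temporal model: $y_{n,t} = x_{n,t} + a_{n,t} + v_{n,t}$, i.e., $Y = X + A + V$ with $X, A, V \in \mathbb{R}^{N \times T}$
Figure panels: Low-rank nominal load profiles $X$; Sparse outliers $A$ across buses and time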
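To make the model $Y = \mathcal{P}_\Omega(X + A + V)$ concrete, the sketch below draws synthetic data with a low-rank nominal matrix, a few large outliers, small noise, and randomly missing entries. It is a minimal illustration assuming NumPy; the function name `synth_load_data` and every default value (rank 3, 2% outliers, 30% missing, noise level) are hypothetical choices, not the settings used in the talk.

```python
import numpy as np


def synth_load_data(N=20, T=600, rank=3, outlier_frac=0.02, miss_frac=0.3,
                    noise_std=0.01, seed=0):
    """Draw Y = P_Omega(X + A + V): low-rank X, sparse A, small dense noise V.

    Every default value here is a hypothetical choice for illustration only.
    """
    rng = np.random.default_rng(seed)
    # Low-rank nominal load profiles: N buses mix `rank` shared temporal patterns.
    X = rng.standard_normal((N, rank)) @ rng.standard_normal((rank, T))
    # Sparse outliers: a small random subset of (bus, time) entries gets a large spike.
    A = np.zeros((N, T))
    spikes = rng.random((N, T)) < outlier_frac
    A[spikes] = rng.uniform(-10.0, 10.0, size=int(spikes.sum()))
    # Small dense measurement noise.
    V = noise_std * rng.standard_normal((N, T))
    # Omega marks observed entries; the rest are missing (faulty meters, lost packets).
    Omega = rng.random((N, T)) >= miss_frac
    # Sampling operator P_Omega: keep observed entries, zero out missing ones.
    Y = np.where(Omega, X + A + V, 0.0)
    return Y, Omega, X, A
```

Storing $\Omega$ as a boolean mask makes $\mathcal{P}_\Omega(\cdot)$ a single `np.where` call, which the remaining sketches reuse.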
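Principal Component Pursuit
• Data model: $Y = \mathcal{P}_\Omega(X + A + V)$, where $X$ has low rank and $A$ is sparse
• Missing data: set $\Omega \subseteq \{1,\dots,N\} \times \{1,\dots,T\}$ of observed (bus, time) pairs
• Sampling operator: $[\mathcal{P}_\Omega(X)]_{n,t} = x_{n,t}$ if $(n,t) \in \Omega$, and $0$ otherwise
• Goal: Given $Y$, recover $X$ and $A$
• Principal component pursuit [Chandrasekaran et al'11], [Candes et al'11]:
$$\min_{X,A}\ \tfrac{1}{2}\|\mathcal{P}_\Omega(Y - X - A)\|_F^2 + \lambda_*\|X\|_* + \lambda_1\|A\|_1$$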
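As a centralized reference point, the PCP cost $\tfrac{1}{2}\|\mathcal{P}_\Omega(Y - X - A)\|_F^2 + \lambda_*\|X\|_* + \lambda_1\|A\|_1$ can be minimized by proximal gradient: a gradient step on the quadratic term, then singular value thresholding for $X$ and soft thresholding for $A$. This solver is a swapped-in baseline for illustration, not the method of the talk; it assumes NumPy, and the names `svt`, `soft`, `pcp_cost`, and `pcp_prox_grad` are hypothetical.

```python
import numpy as np


def svt(Z, tau):
    """Singular value thresholding: proximal operator of tau * ||.||_* (nuclear norm)."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def soft(Z, tau):
    """Entrywise soft thresholding: proximal operator of tau * ||.||_1."""
    return np.sign(Z) * np.maximum(np.abs(Z) - tau, 0.0)


def pcp_cost(Y, Omega, X, A, lam_nuc, lam_1):
    """(1/2)||P_Omega(Y - X - A)||_F^2 + lam_nuc ||X||_* + lam_1 ||A||_1."""
    R = np.where(Omega, Y - X - A, 0.0)  # P_Omega(Y - X - A)
    return (0.5 * np.sum(R ** 2) + lam_nuc * np.linalg.norm(X, "nuc")
            + lam_1 * np.abs(A).sum())


def pcp_prox_grad(Y, Omega, lam_nuc, lam_1, iters=200):
    """Centralized proximal-gradient baseline for the PCP problem.

    The smooth term has a 2-Lipschitz gradient in (X, A) jointly,
    so a step of 1/2 guarantees the cost never increases.
    """
    X = np.zeros_like(Y)
    A = np.zeros_like(Y)
    step = 0.5
    costs = [pcp_cost(Y, Omega, X, A, lam_nuc, lam_1)]
    for _ in range(iters):
        G = -np.where(Omega, Y - X - A, 0.0)  # same gradient w.r.t. X and w.r.t. A
        # One full SVD per iteration: the cost a factored formulation can avoid.
        X, A = (svt(X - step * G, step * lam_nuc),
                soft(A - step * G, step * lam_1))
        costs.append(pcp_cost(Y, Omega, X, A, lam_nuc, lam_1))
    return X, A, costs
```

Every iteration needs an SVD of the full $N \times T$ matrix and access to all of $Y$, which is why this baseline suits a fusion center rather than a network of meters.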
Distributed processing paradigms
Figure panels: Fusion Center (FC); Incremental; In-network
• Limitations of FC-based architectures
  • Lack of robustness (isolated point of failure, non-ideal links)
  • High Tx power (as the geographical area grows)
  • Less suitable for tracking applications
• Limitations of incremental processing
  • Non-robust to node failures
  • (Re-)routing? Hamiltonian routes are NP-hard to establish
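Problem statement
• Network of smart meters: undirected, connected graph $\mathcal{G}$ with $N$ nodes
• Goal: Given the rows $Y_n$ of $Y$ held by node $n$, and only single-hop exchanges, find
$$\{\hat X, \hat A\} \in \arg\min_{X,A}\ \tfrac{1}{2}\|\mathcal{P}_\Omega(Y - X - A)\|_F^2 + \lambda_*\|X\|_* + \lambda_1\|A\|_1 \qquad \text{(P1)}$$
• Challenges
  • Nuclear norm is not separable across nodes
  • Global optimization variable $X$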
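With single-hop exchanges only, each meter $n$ talks to its neighbor set $\mathcal{J}_n$, and network-wide agreement is possible only if the graph is connected. The sketch below builds neighbor lists for a hypothetical random geometric deployment and checks connectivity by breadth-first search. It assumes NumPy plus the standard library; the unit-square layout, the `radius` link rule, and the function names are illustrative assumptions.

```python
import numpy as np
from collections import deque


def random_geometric_graph(N, radius, seed=0):
    """Hypothetical meter layout: N nodes uniform in the unit square,
    linked when within `radius` of each other. Returns neighbor lists J_n."""
    rng = np.random.default_rng(seed)
    pos = rng.random((N, 2))
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    return [[m for m in range(N) if m != n and dist[n, m] < radius]
            for n in range(N)]


def is_connected(neighbors):
    """Breadth-first search from node 0; connected iff every node is reached."""
    if not neighbors:
        return True
    seen = {0}
    queue = deque([0])
    while queue:
        n = queue.popleft()
        for m in neighbors[n]:
            if m not in seen:
                seen.add(m)
                queue.append(m)
    return len(seen) == len(neighbors)
```

Because distances are symmetric, the neighbor lists describe an undirected graph, matching the smart-meter network model.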
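Separable regularization
• New formulation equivalent to (P1):
$$\min_{L,Q,A}\ \tfrac{1}{2}\|\mathcal{P}_\Omega(Y - LQ' - A)\|_F^2 + \tfrac{\lambda_*}{2}\left(\|L\|_F^2 + \|Q\|_F^2\right) + \lambda_1\|A\|_1 \qquad \text{(P2)}$$
• Nonconvex; reduces complexity
• Proposition 1. If $\{\bar L, \bar Q, \bar A\}$ is a stationary point of (P2) and $\|\mathcal{P}_\Omega(Y - \bar L\bar Q' - \bar A)\|_2 \le \lambda_*$, then $\{\hat X = \bar L\bar Q', \hat A = \bar A\}$ is a global optimum of (P1).
• Key property, e.g., [Recht et al'11]: for $L \in \mathbb{R}^{N \times \rho}$, $Q \in \mathbb{R}^{T \times \rho}$ with $\rho \ge \text{rank}[X]$,
$$\|X\|_* = \min_{\{L,Q\}:\ X = LQ'} \tfrac{1}{2}\left(\|L\|_F^2 + \|Q\|_F^2\right)$$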
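The key property $\|X\|_* = \min_{X = LQ'} \tfrac{1}{2}(\|L\|_F^2 + \|Q\|_F^2)$ can be checked numerically: the balanced factors $L = U\Sigma^{1/2}$, $Q = V\Sigma^{1/2}$ from the SVD $X = U\Sigma V'$ attain it, and any other factorization of the same $X$ costs at least as much. A minimal sketch assuming NumPy; `balanced_factors` and `factor_penalty` are hypothetical names.

```python
import numpy as np


def balanced_factors(X, rho):
    """Return L (N x rho), Q (T x rho) with X = L @ Q.T, using the top rho
    singular triplets split evenly: L = U sqrt(S), Q = V sqrt(S).
    Exact whenever rho >= rank(X)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    root = np.sqrt(s[:rho])
    return U[:, :rho] * root, Vt[:rho].T * root


def factor_penalty(L, Q):
    """Separable surrogate (1/2)(||L||_F^2 + ||Q||_F^2)."""
    return 0.5 * (np.sum(L ** 2) + np.sum(Q ** 2))
```

Here $\|L\|_F^2 = \|Q\|_F^2 = \sum_i \sigma_i$, so the penalty equals the nuclear norm, while a transformed pair such as $(LM, QM^{-T})$ still factors $X$ but is never cheaper. The surrogate is a sum of row-wise squared norms, which is what lets the problem split across agents.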
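Distributed estimator
• Alternating-direction method of multipliers (ADMM) solver
  • Method [Glowinski-Marrocco'75], [Gabay-Mercier'76]
  • Learning over networks [Schizas et al'07]
• Primal variables per agent $n$: $\{L_n, Q_n, A_n\}$, where $L_n$ and $A_n$ hold node $n$'s rows and $Q_n$ is a local copy of $Q$
• Consensus with neighboring nodes:
$$\min_{\{L_n, Q_n, A_n\}}\ \sum_{n=1}^{N}\left[\tfrac{1}{2}\|\mathcal{P}_{\Omega_n}(Y_n - L_nQ_n' - A_n)\|_F^2 + \tfrac{\lambda_*}{2}\left(\|L_n\|_F^2 + \tfrac{1}{N}\|Q_n\|_F^2\right) + \lambda_1\|A_n\|_1\right] \ \text{s.t.}\ Q_n = Q_m,\ m \in \mathcal{J}_n \qquad \text{(P3)}$$
• Message passing: each node exchanges $Q_n$ with its single-hop neighbors $\mathcal{J}_n$
• Network connectivity ⇒ (P2) ⇔ (P3)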
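The equivalence between the centralized and consensus forms can be seen in the costs: when every local copy equals a common $Q$, the per-agent terms add up exactly to the (P2) cost $\tfrac{1}{2}\|\mathcal{P}_\Omega(Y - LQ' - A)\|_F^2 + \tfrac{\lambda_*}{2}(\|L\|_F^2 + \|Q\|_F^2) + \lambda_1\|A\|_1$. The sketch assumes NumPy, a row-block split of $Y$ across agents, and an even $1/N$ split of the $\|Q\|_F^2$ penalty; `p2_cost` and `local_cost` are hypothetical names.

```python
import numpy as np


def p2_cost(Y, Omega, L, Q, A, lam_nuc, lam_1):
    """Centralized cost with the separable (factored) regularizer."""
    R = np.where(Omega, Y - L @ Q.T - A, 0.0)
    return (0.5 * np.sum(R ** 2)
            + 0.5 * lam_nuc * (np.sum(L ** 2) + np.sum(Q ** 2))
            + lam_1 * np.abs(A).sum())


def local_cost(Yn, Omega_n, Ln, Qn, An, lam_nuc, lam_1, N):
    """Agent n's summand: its own rows Yn, Ln, An plus a local copy Qn,
    whose ridge penalty is assumed to be split evenly (1/N) across agents."""
    Rn = np.where(Omega_n, Yn - Ln @ Qn.T - An, 0.0)
    return (0.5 * np.sum(Rn ** 2)
            + 0.5 * lam_nuc * (np.sum(Ln ** 2) + np.sum(Qn ** 2) / N)
            + lam_1 * np.abs(An).sum())
```

On a connected graph, the neighbor constraints $Q_n = Q_m$ force all copies to agree, so minimizing the summed local costs is the same as minimizing the centralized cost.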
Distributed iterations
Attractive features
• Highly parallelizable with simple recursions
  • Unconstrained QPs per agent
  • No SVD per iteration [$O(T\rho^3)$ complexity]
• Low overhead for message exchanges
  • $Q_n[k]$ is $T \times \rho$ and $\rho$ is small
  • Communication cost independent of network size
Recap:
• (P1): centralized, convex
• (P2): separable regularization, nonconvex
• (P3): consensus, nonconvex
• Stationary point of (P3) → stationary point of (P2) → global optimum of (P1)
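Why no SVD is needed can be illustrated on the factored cost $\tfrac{1}{2}\|\mathcal{P}_\Omega(Y - LQ' - A)\|_F^2 + \tfrac{\lambda_*}{2}(\|L\|_F^2 + \|Q\|_F^2) + \lambda_1\|A\|_1$: with two blocks fixed, the third has a closed form. $A$ is a soft threshold of the residual, and each row of $L$ or $Q$ solves a $\rho \times \rho$ ridge system. The sketch below is centralized block coordinate descent, a simpler stand-in for the talk's ADMM solver, assuming NumPy; all names are hypothetical.

```python
import numpy as np


def soft(Z, tau):
    """Entrywise soft thresholding: proximal operator of tau * ||.||_1."""
    return np.sign(Z) * np.maximum(np.abs(Z) - tau, 0.0)


def ridge_rows(B, R, Omega, lam):
    """Row i solves min_x (1/2)||R[i, obs] - B[obs] x||^2 + (lam/2)||x||^2,
    i.e. (B_obs^T B_obs + lam I) x = B_obs^T r_obs: a small unconstrained QP."""
    rho = B.shape[1]
    out = np.zeros((R.shape[0], rho))
    for i in range(R.shape[0]):
        obs = Omega[i]
        Bi = B[obs]
        out[i] = np.linalg.solve(Bi.T @ Bi + lam * np.eye(rho), Bi.T @ R[i, obs])
    return out


def p2_cost(Y, Omega, L, Q, A, lam_nuc, lam_1):
    R = np.where(Omega, Y - L @ Q.T - A, 0.0)
    return (0.5 * np.sum(R ** 2)
            + 0.5 * lam_nuc * (np.sum(L ** 2) + np.sum(Q ** 2))
            + lam_1 * np.abs(A).sum())


def p2_bcd(Y, Omega, rho, lam_nuc, lam_1, iters=50, seed=0):
    """Exact block minimization over A, then L, then Q; the cost never increases."""
    rng = np.random.default_rng(seed)
    N, T = Y.shape
    L = rng.standard_normal((N, rho))
    Q = rng.standard_normal((T, rho))
    A = np.zeros((N, T))
    costs = [p2_cost(Y, Omega, L, Q, A, lam_nuc, lam_1)]
    for _ in range(iters):
        # A-update: soft-threshold observed residuals; unobserved entries stay 0.
        A = np.where(Omega, soft(Y - L @ Q.T, lam_1), 0.0)
        # L-update: one rho x rho solve per row, using only that row's data.
        L = ridge_rows(Q, Y - A, Omega, lam_nuc)
        # Q-update: T solves of size rho x rho; this step couples all rows.
        Q = ridge_rows(L, (Y - A).T, Omega.T, lam_nuc)
        costs.append(p2_cost(Y, Omega, L, Q, A, lam_nuc, lam_1))
    return L, Q, A, costs
```

Rows of $A$ and $L$ touch only one node's data, so they could be updated locally; only $Q$ needs information from all rows, which the consensus form handles through local copies $Q_n$. Updating a $T \times \rho$ copy with $T$ ridge solves of size $\rho \times \rho$ costs on the order of $T\rho^3$ operations, which appears consistent with the complexity quoted on the slide.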
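Optimality
• Proposition 2. If $\{Q_n[k], L_n[k], A_n[k]\}_{n=1}^{N}$ converges to $\{\bar Q_n, \bar L_n, \bar A_n\}_{n=1}^{N}$ and $\|\mathcal{P}_\Omega(Y - \bar L\bar Q_1' - \bar A)\|_2 \le \lambda_*$, where $\bar L$ and $\bar A$ stack the agents' $\bar L_n$ and $\bar A_n$, then:
  • i) $\bar Q_1 = \bar Q_2 = \cdots = \bar Q_N$
  • ii) $\{\hat X = \bar L\bar Q_1', \hat A = \bar A\}$ is the global optimum of (P1)
• ADMM can converge even for nonconvex problems, e.g., [Boyd et al'11]
• Simple distributed algorithm for principal component pursuit
• Centralized performance guarantees carry over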
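The optimality conditions in the propositions end with a checkable certificate: the spectral norm (largest singular value) of the masked residual $\mathcal{P}_\Omega(Y - \bar L\bar Q' - \bar A)$ must not exceed $\lambda_*$. A minimal sketch assuming NumPy, with a hypothetical function name; it checks only this norm condition, not stationarity or convergence of the iterates.

```python
import numpy as np


def certifies_global_optimum(Y, Omega, L, Q, A, lam_nuc):
    """True if ||P_Omega(Y - L Q^T - A)||_2 <= lam_nuc (spectral norm).

    Meaningful only at a stationary/limit point of the factored problem;
    this function does not verify stationarity itself.
    """
    R = np.where(Omega, Y - L @ Q.T - A, 0.0)
    return bool(np.linalg.norm(R, 2) <= lam_nuc)  # ord=2 on a matrix: largest singular value
```

In practice an agent network could run this check once after the iterations settle, to confirm that the nonconvex solution matches the convex PCP optimum.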
Synthetic data
• Random network, $N \in \{15, 20, 25\}$, $T = 600$
• Data
NorthWrite data
• Power consumption of schools, a government building, and a grocery store ('05-'10)
Figure panels: Cleansing; Imputation
• Outliers: "Building operational transition shoulder periods"
• Prediction error: 6% for 30% missing data (8% for 50%)
Data: courtesy of NorthWrite Energy Group, provided by Prof. V. Cherkassky (UofM)
Concluding summary
• Load curve data cleansing and imputation
• Leveraging sparsity and low rank
• Principal component pursuit for smart grid monitoring
• Distributed algorithm with guaranteed performance
• Estimate cleansed nominal load profiles
• Identify when and where 'bad data' occur
• Ongoing research:
  • Convergence of ADMM for bi-convex costs
  • Real-time (adaptive) algorithms
Thank You!