Sparsity Control for Robustness and Social Data Analysis
Gonzalo Mateos, ECE Department, University of Minnesota
Acknowledgments: Profs. Georgios B. Giannakis, M. Kaveh, G. Sapiro, N. Sidiropoulos, and N. Waller; MURI (AFOSR FA9550-10-1-0567) grant
Minneapolis, MN, December 9, 2011
Learning from “Big Data”
• `Data are widely available, what is scarce is the ability to extract wisdom from them' (Hal Varian, Google's chief economist)
• Big data is: big, fast, productive, ubiquitous, revealing, messy, smart
K. Cukier, ``Harnessing the data deluge,'' Nov. 2011.
Social-Computational Systems (SoCS)
• Complex systems of people and computers
• The vision: understand and engineer SoCS
  • Preference measurement (PM), analysis, management
• The means: leverage the dual role of sparsity
  • Complexity control through variable selection
  • Robustness to outliers
Conjoint analysis
• Used in marketing, healthcare, psychology [Green-Srinivasan'78]
  • Optimal design and positioning of new products
• Strategy: describe products by a set of attributes, or `parts'
• Goal: learn the consumer's utility function from preference data
  • Linear utilities: `How much is each part worth?'
• Success story [Wind et al'89]
  • Attributes: room size, TV options, restaurant, transportation
Modeling preliminaries
• Respondents (e.g., consumers) rate profiles x_n, n = 1, ..., N; each profile comprises p attributes
• Linear utility u(x) = x^T w: estimate the vector w of partworths
• Conjoint data collection formats
  (M1) Metric ratings: y_n = x_n^T w + e_n
  (M2) Choice-based conjoint data: respondents choose among sets of profiles
• Online SoCS-based preference data grows exponentially
• Inconsistent/corrupted/irrelevant data → outliers
Robustifying PM
• Least-trimmed squares [Rousseeuw'87]
  (LTS) w_LTS := arg min_w Σ_{n=1}^s r²_[n](w)
  • r²_[n](w) is the n-th order statistic among the squared residuals r²_1(w), ..., r²_N(w), with r_n(w) := y_n - x_n^T w
  • N - s residuals discarded
Q: How should we go about minimizing the nonconvex (LTS)?
A: Try all subsets of size s, solve, and pick the best (sketch below)
• Simple but intractable beyond small problems
• Near-optimal solvers [Rousseeuw'06], RANSAC [Fischler-Bolles'81]
G. Mateos, V. Kekatos, and G. B. Giannakis, ``Exploiting sparsity in model residuals for robust conjoint analysis,'' Marketing Sci., Dec. 2011 (submitted).
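To make the intractability concrete, here is a minimal brute-force LTS sketch in Python (function and variable names are my own): it enumerates every size-s subset of the data, fits ordinary least squares on each, and keeps the fit whose s smallest squared residuals have the least sum. The combinatorial loop is exactly why this is hopeless beyond toy problems.

```python
import itertools
import numpy as np

def lts_brute_force(X, y, s):
    """Exhaustive LTS: try every size-s subset, keep the best trimmed fit."""
    n = len(y)
    best_w, best_cost = None, np.inf
    for subset in itertools.combinations(range(n), s):
        idx = list(subset)
        w, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        trimmed = np.sort((y - X @ w) ** 2)[:s]  # s smallest squared residuals
        if trimmed.sum() < best_cost:
            best_w, best_cost = w, trimmed.sum()
    return best_w, best_cost
```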
Modeling outliers
• Outlier variables o_n: o_n ≠ 0 if datum n is an outlier, o_n = 0 otherwise
• Nominal ratings obey (M1); outliers something else: y_n = x_n^T w + o_n + e_n
  • Related: ε-contamination [Fuchs'99], Bayesian model [Jin-Rao'10]
• Both w and o := [o_1, ..., o_N]^T unknown; o typically sparse!
• Natural (but intractable) nonconvex estimator: penalize the number of nonzero entries of o
LTS as sparse regression
• Lagrangian form
  (P0) min_{w,o} ||y - Xw - o||²_2 + λ_0 ||o||_0
• Tuning parameter λ_0 controls sparsity in the number of outliers
Proposition 1: If {ŵ, ô} solves (P0) with λ_0 chosen s.t. ||ô||_0 = N - s, then ŵ = ŵ_LTS in (LTS).
• Formally justifies the preference model and its estimator (P0)
• Ties sparse regression with robust estimation
Just relax!
• (P0) is NP-hard → relax the ℓ_0-norm to the ℓ_1-norm, e.g., [Tropp'06]
  (P1) min_{w,o} ||y - Xw - o||²_2 + λ_1 ||o||_1
• (P1) is convex, and thus efficiently solved
• Role of the sparsity-controlling parameter λ_1 is central
Q: Does (P1) yield robust estimates ŵ?
A: Yes! Huber's M-estimator arises as a special case
Lassoing outliers
• Suffices to solve a Lasso problem [Tibshirani'94]
Proposition 2: Minimizers {ŵ, ô} of (P1) satisfy ô_n = sign(r̂_n)(|r̂_n| - λ_1/2)_+, where r̂_n := y_n - x_n^T ŵ.
• Data-driven methods to select λ_1
• Lasso solvers return the entire robustification path (RP): outlier coefficients ô_n versus decreasing λ_1 (sketch below)
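A minimal sketch of how Proposition 2 can drive a solver for (P1), assuming the model y = Xw + o + noise (function and variable names are my own): alternate a closed-form soft-thresholding update of the outlier vector with an ordinary least-squares refit on the outlier-compensated data. This is block coordinate descent on a convex problem, and in practice it converges quickly.

```python
import numpy as np

def soft(r, t):
    """Entry-wise soft-thresholding operator."""
    return np.sign(r) * np.maximum(np.abs(r) - t, 0.0)

def robust_regression_p1(X, y, lam1, n_iter=100):
    w = np.linalg.lstsq(X, y, rcond=None)[0]          # warm start: plain LS
    for _ in range(n_iter):
        o = soft(y - X @ w, lam1 / 2.0)               # outlier update (Prop. 2 form)
        w = np.linalg.lstsq(X, y - o, rcond=None)[0]  # partworths refit
    return w, o
```

Sweeping lam1 from large to small and recording the support of o traces the robustification path the slide mentions.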
Nonconvex regularization
• Nonconvex penalty terms approximate the ℓ_0-norm in (P0) better than ℓ_1
  • Options: SCAD [Fan-Li'01], or sum-of-logs [Candes et al'08]
• Iterative linearization-minimization of the penalty around the current iterate (sketch below)
  • Initialize with the (P1) solution; use weights inversely proportional to |ô_n|
• Bias reduction (cf. adaptive Lasso [Zou'06])
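A hedged sketch of the linearization-minimization idea with a sum-of-logs penalty, in the spirit of [Candes et al'08] (the weight form 1/(|o_n| + delta) and all parameter values are illustrative assumptions, not the talk's exact recipe): each outer pass re-solves a weighted version of (P1), so large outliers are penalized less and their shrinkage bias is reduced.

```python
import numpy as np

def reweighted_robust_regression(X, y, lam1, delta=1e-2, n_passes=5, inner=50):
    n = len(y)
    weights = np.ones(n)                        # first pass reduces to (P1)
    w = np.linalg.lstsq(X, y, rcond=None)[0]
    o = np.zeros(n)
    for _ in range(n_passes):
        for _ in range(inner):                  # weighted-(P1) inner solver
            r = y - X @ w
            o = np.sign(r) * np.maximum(np.abs(r) - lam1 * weights / 2.0, 0.0)
            w = np.linalg.lstsq(X, y - o, rcond=None)[0]
        weights = 1.0 / (np.abs(o) + delta)     # linearize sum-of-logs at current o
    return w, o
```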
Comparison with RANSAC
• Simulated nominal data contaminated with i.i.d. outliers (figure: estimation performance of the sparsity-controlling estimator versus RANSAC)
Nonparametric regression
• Interactions among attributes? Not captured by linear utilities
• Preferences driven by complex mechanisms that are hard to model
• If one trusts data more than any parametric model, go nonparametric: y_n = f(x_n) + e_n
  • f lives in a space of ``smooth'' functions → ill-posed problem
• Workaround: regularization [Tikhonov'77], [Wahba'90]
  • RKHS H with kernel K(·,·) and norm ||·||_H; estimate f by penalized least squares (sketch below)
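For concreteness, a minimal (nonrobust) kernel ridge regression sketch of the regularized RKHS fit min_f Σ_n (y_n - f(x_n))² + μ||f||²_H: by the representer theorem the minimizer is f(x) = Σ_n a_n K(x, x_n) with a = (K + μI)⁻¹ y. The Gaussian kernel and parameter values are my illustrative choices, and the outlier-sparsity terms of the talk are omitted here.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kernel_ridge(X, y, mu=0.1, sigma=1.0):
    K = gaussian_kernel(X, X, sigma)
    a = np.linalg.solve(K + mu * np.eye(len(y)), y)   # representer coefficients
    return lambda X_new: gaussian_kernel(X_new, X, sigma) @ a
```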
Function approximation
• (Figure: true function versus nonrobust, robust, and refined predictions)
• Effectiveness in rejecting outliers is apparent
G. Mateos and G. B. Giannakis, ``Robust nonparametric regression via sparsity control with application to load curve data cleansing,'' IEEE Trans. Signal Process., 2012.
Load curve data cleansing
• (Figure: Uruguay's power consumption (MW))
• Load curve: electric power consumption recorded periodically
• Reliable data: key to realizing the smart grid vision [Hauser'09]
  • Outlier sources: faulty meters, communication errors, unscheduled maintenance, strikes, sport events
• B-splines for load curve prediction and denoising [Chen et al'10]
NorthWrite data
• Energy consumption of a government building ('05-'10)
• Robust smoothing spline estimator (time in hours)
• Outliers: ``building operational transition shoulder periods''
• No manual labeling of outliers required [Chen et al'10]
Data: courtesy of NorthWrite Energy Group, provided by Prof. V. Cherkassky
Principal Component Analysis
• (Images: DNA microarray; traffic surveillance)
• Motivation: (statistical) learning from high-dimensional data
• Principal component analysis (PCA) [Pearson'1901]
  • Extraction of low-dimensional data structure
  • Data compression and reconstruction
• PCA is non-robust to outliers [Jolliffe'86]
• Our goal: robustify PCA by controlling outlier sparsity
Our work in context
• Contemporary applications tied to SoCS
  • Anomaly detection in IP networks [Huang et al'07], [Kim et al'09]
  • Video surveillance, e.g., [Oliver et al'99]
  • Matrix completion for collaborative filtering, e.g., [Candes et al'09]
• Robust PCA
  • Robust covariance matrix estimators [Campbell'80], [Huber'81]
  • Computer vision [Xu-Yuille'95], [De la Torre-Black'03]
  • Low-rank matrix recovery from sparse errors, e.g., [Wright et al'09]
PCA formulations
• Training data: {y_n}_{n=1}^N, y_n ∈ R^p
• Minimum reconstruction error: min Σ_n ||y_n - ŷ_n||²_2
  • Compression operator: x_n = U^T y_n, with q < p components
  • Reconstruction operator: ŷ_n = U x_n
• Maximum variance of the projected data (equivalent view)
• Component analysis model: y_n = m + U x_n + e_n
• Solution: U spans the q dominant eigenvectors of the sample covariance, obtained via SVD (sketch below)
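A minimal sketch of the closed-form solution (standard PCA; naming is my own): center the data, take the SVD, and keep the top-q right singular vectors as the principal subspace.

```python
import numpy as np

def pca(Y, q):
    """Y: N x p data matrix (rows = observations). Returns (U, mean)."""
    m = Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(Y - m, full_matrices=False)
    U = Vt[:q].T                     # columns span the principal subspace
    return U, m

# Compression and reconstruction, matching the two operators on the slide:
# x = U.T @ (y - m)  and  y_hat = m + U @ x
```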
Robustifying PCA
• Outlier-aware model: y_n = m + U x_n + o_n + e_n
  (P2) min_{m,U,{x_n},O} Σ_n ||y_n - m - U x_n - o_n||²_2 + λ Σ_n ||o_n||_2
• ℓ_0-norm counterpart tied to (LTS PCA)
• (P2) subsumes an optimal (vector) Huber estimator
• ℓ_1-norm regularization for entry-wise outliers
• Interpretation: blind preference model with latent profiles
G. Mateos and G. B. Giannakis, ``Robust PCA as bilinear decomposition with outlier sparsity regularization,'' IEEE Trans. Signal Process., Nov. 2011 (submitted).
Alternating minimization (Algorithm 1) for (P2), sketched below:
• {m, U, x_n} update: SVD of the outlier-compensated data Y - O
• O update: row-wise vector soft-thresholding
Proposition 3: Algorithm 1's iterates converge to a stationary point of (P2).
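A hedged sketch of the two alternating steps, assuming data rows y_n and the row-wise ℓ_2 penalty of (P2) (names and the fixed iteration count are my own): the low-rank fit comes from an SVD of Y - O, and each row of O is updated by the vector soft-threshold o_n = r_n (1 - λ/(2||r_n||))_+.

```python
import numpy as np

def robust_pca_am(Y, q, lam, n_iter=100):
    O = np.zeros_like(Y)
    for _ in range(n_iter):
        # (i) subspace update: PCA of the outlier-compensated data
        m = (Y - O).mean(axis=0)
        U, s, Vt = np.linalg.svd(Y - O - m, full_matrices=False)
        L = m + (U[:, :q] * s[:q]) @ Vt[:q]           # rank-q reconstruction
        # (ii) outlier update: row-wise vector soft-thresholding
        R = Y - L
        norms = np.maximum(np.linalg.norm(R, axis=1, keepdims=True), 1e-12)
        O = R * np.maximum(1.0 - lam / (2.0 * norms), 0.0)
    return L, O
```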
Video surveillance
• (Figure panels: original frames, PCA, robust PCA, `outliers')
Data: http://www.cs.cmu.edu/~ftorre/
Big Five personality factors
• Five dimensions of personality traits [Goldberg'93], [Costa-McRae'92]
  • Discovered through factor analysis
  • WEIRD (Western, educated, industrialized, rich, democratic) subjects
• Big Five Inventory (BFI)
  • Measures the Big Five via a short questionnaire (44 items)
  • Rate 1-5, e.g., `I see myself as someone who... is talkative,' `... is full of energy'
Handbook of personality: Theory and research, O. P. John, R. W. Robins, and L. A. Pervin, Eds. New York, NY: Guilford Press, 2008.
BFI data
• Eugene-Springfield community sample [Goldberg'08]
  • Subjects' responses to the 44 BFI items, q = 5 factors
• Robust PCA identifies 8 outlying subjects
• Validated via `inconsistency' scores, e.g., VRIN [Tellegen'88]
Data: courtesy of Prof. L. Goldberg, provided by Prof. N. Waller
Online robust PCA
• Motivation: real-time data and memory limitations
• At time t, do not re-estimate past outlier vectors
• Exponentially-weighted robust PCA: down-weight past data (sketch below)
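One plausible instantiation of the online idea, offered only as an illustration (the update form, step size, and forgetting factor beta are my assumptions, not the algorithm presented in the talk): for each new datum, soft-threshold its residual to flag an outlier, then take a small gradient step on the subspace using the outlier-compensated datum.

```python
import numpy as np

def online_robust_pca(stream, p, q, lam, beta=0.99, step=0.1):
    rng = np.random.default_rng(0)
    U = np.linalg.qr(rng.standard_normal((p, q)))[0]   # random orthonormal init
    for y in stream:                                   # y: length-p vector
        x = U.T @ y                                    # projection coefficients
        r = y - U @ x                                  # residual
        nrm = max(np.linalg.norm(r), 1e-12)
        o = r * max(1.0 - lam / (2.0 * nrm), 0.0)      # vector soft-threshold
        U += step * np.outer(y - o - U @ x, x)         # gradient step on the fit
        U = np.linalg.qr(U)[0]                         # re-orthonormalize
        step *= beta                                   # exponential down-weighting
        yield x, o
```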
Online PCA in action
• (Figure: online estimates tracking simulated nominal data in the presence of injected outliers)
Robust kernel PCA
• (Figure: map from input space to feature space)
• Kernel (K)PCA [Scholkopf'97]
• Challenge: the feature space may be very high- (even infinite-) dimensional
  • Kernel trick: work with inner products K(y_i, y_j) only; never form feature vectors explicitly (sketch below)
• Related to spectral clustering
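A minimal (nonrobust) kernel PCA sketch to make the kernel trick concrete: double-center the Gram matrix, eigendecompose it, and read projections of the training points off the top eigenvectors. The robustified variant in the talk additionally handles outliers; this standard version is for orientation only.

```python
import numpy as np

def kernel_pca(K, q):
    """K: N x N Gram matrix with K[i, j] = kernel(y_i, y_j). Returns N x q projections."""
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J                                # center in feature space
    vals, vecs = np.linalg.eigh(Kc)               # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:q]              # top-q eigenpairs
    A = vecs[:, idx] / np.sqrt(np.maximum(vals[idx], 1e-12))
    return Kc @ A                                 # projections of training data
```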
Unveiling communities
• Network: NCAA football teams (nodes), Fall '00 games (edges)
• Clustering via robust KPCA; adjusted Rand index ARI = 0.8967
• Identified exactly: Big 10, Big 12, ACC, SEC, Big East
• Outliers: independent teams
Data: http://www-personal.umich.edu/~mejn/netdata/
Spectrum cartography
• Idea: sensors collaborate to form a spatial map of the spectrum
• Goal: find a map that gives the spectrum at any position
• Approach: basis expansion model for the map, nonparametric basis pursuit
• (Figure: original versus estimated spectrum map)
J. A. Bazerque, G. Mateos, and G. B. Giannakis, ``Group-Lasso on splines for spectrum cartography,'' IEEE Trans. Signal Process., Oct. 2011.
Distributed adaptive algorithms
• (Figure: wireless sensor network; improved learning through cooperation)
Issues and significance:
• Fast-varying (non-)stationary processes
• Unavailability of statistical information
• Online incorporation of sensor data
• Noisy communication links
Technical approaches:
• Consensus-based in-network operation in ad hoc WSNs
• Distributed optimization using alternating-direction methods
• Online learning of statistics using stochastic approximation
• Performance analysis via stochastic averaging
G. Mateos, I. D. Schizas, and G. B. Giannakis, ``Distributed recursive least-squares for consensus-based in-network adaptive estimation,'' IEEE Trans. Signal Process., Nov. 2009.
Unveiling network anomalies
• Approach: flag anomalies across flows and time via sparsity and low rank
• (Figure: enhanced detection capabilities for anomalies across flows and time)
• Payoff: ensure high performance, QoS, and security in IP networks
M. Mardani, G. Mateos, and G. B. Giannakis, ``Unveiling network anomalies across flows and time via sparsity and low rank,'' IEEE Trans. Inf. Theory, Dec. 2011 (submitted).
Concluding summary
• Thesis: control sparsity in model residuals for robust learning, at the junction of signal processing, outlier-resilient estimation, and the Lasso
• Research issues addressed
  • Sparsity control for robust metric and choice-based PM
  • Kernel-based nonparametric utility estimation
  • Robust (kernel) principal component analysis
  • Scalable distributed real-time implementations
• Application domains
  • Preference measurement and conjoint analysis
  • Psychometrics, personality assessment
  • Video surveillance
  • Social and power networks
• Experimental validation with GPIPP personality ratings (~6M)
  • Gosling-Potter Internet Personality Project (GPIPP): http://www.outofservice.com