
Mining Unusual Patterns in Data Streams in Multi-Dimensional Space

This presentation explores the challenges and techniques for mining unusual patterns in data streams in multi-dimensional space, including multi-dimensional regression analysis, stream cubing, and other mining methods. It covers the characteristics of data streams, their applications, and stream data reduction as a key step.

Presentation Transcript


  1. Mining Unusual Patterns in Data Streams in Multi-Dimensional Space • Jiawei Han • Department of Computer Science, University of Illinois at Urbana-Champaign • www.cs.uiuc.edu/~hanj

  2. Outline • Characteristics of data streams • Mining unusual patterns in data streams • Multi-dimensional regression analysis of data streams • Stream cubing and stream OLAP methods • Mining other kinds of patterns in data streams • Research problems Mining Unusual Patterns in Data Streams

  3. Data Streams • Data streams: continuous, ordered, changing, fast, huge amount • Traditional DBMS: data stored in finite, persistent data sets • Characteristics • Huge volumes of continuous data, possibly infinite • Fast changing, requiring fast, real-time response • The data stream model nicely captures today's data processing needs • Random access is expensive: single linear scan algorithms (can only have one look) • Store only a summary of the data seen thus far • Most stream data are at a rather low level or multi-dimensional in nature, and need multi-level and multi-dimensional processing
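
The "one look, keep only a summary" constraint can be illustrated with a small sketch. It uses Welford's online update, which is a standard technique chosen here for illustration and not something the slides prescribe; the class name and the sample readings are hypothetical.

```python
class RunningSummary:
    """Constant-size summary of a numeric stream, updated in a single pass."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, value):
        # Welford's online update: no past values are kept.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        # Population variance of everything seen so far.
        return self._m2 / self.count if self.count else 0.0


summary = RunningSummary()
for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:  # hypothetical stream
    summary.update(reading)
```

The summary occupies three numbers no matter how many readings arrive, which is the property the slide asks of stream algorithms.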

  4. Stream Data Applications • Telecommunication calling records • Business: credit card transaction flows • Network monitoring and traffic engineering • Financial market: stock exchange • Engineering & industrial processes: power supply & manufacturing • Sensor, monitoring & surveillance: video streams • Security monitoring • Web logs and Web page click streams • Massive data sets (even when saved, random access is too expensive)

  5. Challenges of Stream Data Mining • Multiple, continuous, rapid, time-varying, ordered streams • Main memory computation • Mining queries are either continuous or ad hoc • Mining queries are often complex • Involving multiple streams, large amounts of data, and history • Finding patterns, models, anomalies, differences, … • Mining dynamics (changes, trends and evolutions) of data streams • Multi-level/multi-dimensional processing and data mining • Most stream data are at a rather low level or multi-dimensional in nature

  6. Stream Data Mining Tasks • Multi-dimensional (on-line) analysis of streams • Clustering data streams • Classification of data streams • Mining frequent patterns in data streams • Mining sequential patterns in data streams • Mining partial periodicity in data streams • Mining notable gradients in data streams • Mining outliers and unusual patterns in data streams • …

  7. Multi-Dimensional Stream Analysis: Examples • Analysis of Web click streams • Raw data at low levels: seconds, web page addresses, user IP addresses, … • Analysts want: changes, trends, unusual patterns, at reasonable levels of detail • E.g., “Average clicking traffic in North America on sports in the last 15 minutes is 40% higher than that in the last 24 hours.” • Analysis of power consumption streams • Raw data: power consumption flow for every household, every minute • Patterns one may find: average hourly power consumption of manufacturing companies in Chicago in the last 2 hours today is 30% higher than on the same day a week ago
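
A query like the click-stream example compares a short recent window against a long baseline window. The following is a minimal sketch of that comparison, assuming per-minute click counts for one already-aggregated cell; `WindowAverage`, `relative_change`, and the traffic numbers are all hypothetical.

```python
from collections import deque


class WindowAverage:
    """Average of per-minute counts over the most recent `span` minutes."""

    def __init__(self, span):
        self.span = span
        self.items = deque()  # (minute, count) pairs still inside the window
        self.total = 0

    def add(self, minute, count):
        self.items.append((minute, count))
        self.total += count
        # Evict entries that have fallen out of the window.
        while self.items and self.items[0][0] <= minute - self.span:
            _, old = self.items.popleft()
            self.total -= old

    def average(self):
        return self.total / len(self.items) if self.items else 0.0


def relative_change(short, long):
    """How much higher the short-window average is than the long-window one."""
    return short.average() / long.average() - 1.0


recent = WindowAverage(span=15)          # last 15 minutes
baseline = WindowAverage(span=24 * 60)   # last 24 hours
for minute in range(24 * 60):
    # Hypothetical traffic: flat at 100 clicks/minute, then a late surge.
    clicks = 140 if minute >= 24 * 60 - 15 else 100
    recent.add(minute, clicks)
    baseline.add(minute, clicks)

change = relative_change(recent, baseline)
```

Note that this sketch keeps every minute of the 24-hour window in memory. Reducing that cost is what the tilted time frame and the compressed regression cells discussed later in the deck are meant to address.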

  8. A Key Step: Stream Data Reduction • Challenges of OLAPing stream data • Raw data cannot be stored • Simple aggregates are not powerful enough • History shape and patterns at different levels are desirable: multi-dimensional regression analysis • Proposal • A scalable multi-dimensional stream “data cube” that can aggregate regression models of stream data efficiently without accessing the raw data • Stream data compression • Compress the stream data to support memory- and time-efficient multi-dimensional regression analysis

  9. Regression Cube for Time-Series • Initially, one time-series per base cell • Too costly to store all these time-series • Too costly to compute regression in multi-dimensional space • Regression cube • Base cube: only store regression parameters of base cells (e.g., 2 points vs. 1000 points) • All the upper level cuboids can be computed precisely for linear regression on both standard dimensions and time dimensions • For quadratic regression, we need 5 points • In general, we need: [formula not preserved in the transcript], where k = 2 for quadratic

  10. Basics of General Linear Regression • n tuples in one cell: $(x_i, y_i)$, $i = 1..n$, where $y_i$ is the measure attribute to be analyzed • For sample i, a vector of k user-defined predictors: $u_i = (u_{i0}, u_{i1}, \ldots, u_{i,k-1})^T$ • The linear regression model: $E(y_i \mid u_i) = \eta^T u_i$, where $\eta$ is a k × 1 vector of regression parameters

  11. Linearly Compressed Representation (LCR) • Stream data compression for multi-dimensional regression analysis • Define, for i, j = 0,…,k-1: [formula not preserved in the transcript] • The linearly compressed representation (LCR) of one cell: [formula not preserved in the transcript] • Size of LCR of one cell: quadratic in k, independent of the number of tuples n in one cell
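
The exact definitions on this slide did not survive, so the following is only a sketch of one compressed form with the stated property (size quadratic in k, independent of n). It assumes the cell keeps the least-squares parameters together with the k × k matrix of predictor cross-products; the function names, the predictors, and the sample cell are hypothetical.

```python
def solve(a, b):
    """Solve the k x k linear system a x = b by Gauss-Jordan elimination."""
    k = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(k):
        # Partial pivoting: bring the largest entry of this column up.
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


def compress_cell(tuples, predictors):
    """Reduce n (x, y) tuples to k fitted parameters plus a k x k matrix."""
    k = len(predictors)
    theta = [[0.0] * k for _ in range(k)]  # theta[i][j] = sum of u_i * u_j
    uty = [0.0] * k                        # sum of u_i * y
    for x, y in tuples:                    # single pass over the cell
        u = [f(x) for f in predictors]
        for i in range(k):
            uty[i] += u[i] * y
            for j in range(k):
                theta[i][j] += u[i] * u[j]
    eta = solve(theta, uty)                # normal equations: theta * eta = uty
    return eta, theta


predictors = [lambda t: 1.0, lambda t: float(t)]  # intercept and time
cell = [(t, 3.0 + 0.5 * t) for t in range(1000)]  # 1000 points on a line
eta, theta = compress_cell(cell, predictors)
```

Whether the cell holds 1000 tuples or a million, the result is k parameters and a k × k matrix, so the raw tuples can be discarded after the pass.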

  12. Stock Price Example: Aggregation in Standard Dimensions • Simple linear regression on time series data • Cells of two companies: [figure not preserved in the transcript] • After aggregation: [figure not preserved in the transcript]
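
Since the slide's figures are missing, here is a small sketch of why roll-up on a standard dimension needs no raw data. When two cells share the same time ticks and the aggregate is the sum of their series, the least-squares intercept and slope simply add. The two price series below are made up for illustration.

```python
def fit_line(series):
    """Least-squares (intercept, slope) for y over time ticks 0..n-1."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope


# Hypothetical series for two companies over the same five time ticks.
company_a = [10.0, 10.5, 11.2, 11.8, 12.1]
company_b = [20.0, 19.6, 19.9, 19.1, 18.7]

base_a = fit_line(company_a)
base_b = fit_line(company_b)

# Roll-up computed from the stored parameters only.
rolled_up = (base_a[0] + base_b[0], base_a[1] + base_b[1])

# Check against a fit on the summed raw series.
direct = fit_line([a + b for a, b in zip(company_a, company_b)])
```

`rolled_up` and `direct` agree up to floating-point rounding, because the least-squares solution is linear in the measured values.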

  13. Stock Price Example: Aggregation in Time Dimension • Cells of two adjacent time intervals: [figure not preserved in the transcript] • After aggregation: [figure not preserved in the transcript]
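
Merging two adjacent time intervals can also be done from compressed cells alone. The sketch below assumes each cell stores its fitted parameters `eta` and the 2 × 2 cross-product matrix `theta` for the predictors (1, t); this representation and all names are assumptions for illustration, not the slide's lost formulas.

```python
def solve2(m, b):
    """Solve a 2 x 2 linear system with Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return ((b[0] * m[1][1] - m[0][1] * b[1]) / det,
            (m[0][0] * b[1] - m[1][0] * b[0]) / det)


def compress(points):
    """Return (eta, theta) for the fit y ~ eta[0] + eta[1] * t."""
    n = len(points)
    st = sum(t for t, _ in points)
    stt = sum(t * t for t, _ in points)
    sy = sum(y for _, y in points)
    sty = sum(t * y for t, y in points)
    theta = ((n, st), (st, stt))
    return solve2(theta, (sy, sty)), theta


def merge_time(cell1, cell2):
    """Combine two cells covering disjoint time intervals."""
    (eta1, th1), (eta2, th2) = cell1, cell2
    # Cross-product matrices of disjoint tuple sets add up.
    theta = tuple(tuple(a + b for a, b in zip(r1, r2))
                  for r1, r2 in zip(th1, th2))
    # theta_i * eta_i recovers each cell's right-hand side of the
    # normal equations, so the raw tuples are not needed.
    rhs = tuple(sum(th1[i][j] * eta1[j] + th2[i][j] * eta2[j]
                    for j in range(2))
                for i in range(2))
    return solve2(theta, rhs), theta


# Hypothetical prices over two adjacent intervals.
first = [(t, y) for t, y in enumerate([10.0, 10.4, 10.9, 11.1, 11.8])]
second = [(t + 5, y) for t, y in enumerate([12.5, 12.2, 13.0, 13.4, 13.3])]

merged_eta, merged_theta = merge_time(compress(first), compress(second))
direct_eta, direct_theta = compress(first + second)
```

The merged parameters match a fit over the concatenated raw data up to rounding, which is the sense in which upper-level cells can be "computed precisely".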

  14. A Stream Cube Architecture • A tilted time frame • Different time granularities • second, minute, quarter, hour, day, week, … • Critical layers • Minimum interest layer (m-layer) • Observation layer (o-layer) • User: watches at o-layer and occasionally needs to drill down to m-layer • Partial materialization of stream cubes • Full materialization: too space- and time-consuming • No materialization: slow response at query time • Partial materialization: what do we mean by “partial”?

  15. A Tilted Time-Frame Model • [Figure: three time axes, each ending at "Time Now"] • Up to 7 days: 7 days, 24 hrs, 4 qtrs, 15 minutes, 25 sec. • Up to a year: 12 months, 31 days, 24 hours, 4 qtrs • Logarithmic (exponential) scale: 16t, 8t, 4t, 4t, 2t, 1t
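
The "up to a year" frame can be sketched as a stack of bounded queues, where every full group of fine units is summed into one coarser unit. This is an illustrative design, not the author's implementation; it treats every month as 31 days, and the class name and input values are hypothetical.

```python
from collections import deque


class TiltedTimeFrame:
    """Keeps recent history at fine granularity and older history coarsely."""

    def __init__(self, levels):
        # levels: (name, capacity) pairs, finest granularity first.
        self.names = [name for name, _ in levels]
        self.slots = [deque(maxlen=cap) for _, cap in levels]
        self.filled = [0] * len(levels)  # units added since last promotion

    def add(self, value, level=0):
        self.slots[level].append(value)  # oldest slot drops out when full
        self.filled[level] += 1
        if self.filled[level] == self.slots[level].maxlen:
            self.filled[level] = 0
            if level + 1 < len(self.slots):
                # A full group of fine units forms one coarser unit.
                self.add(sum(self.slots[level]), level + 1)

    def size(self):
        return sum(len(s) for s in self.slots)


frame = TiltedTimeFrame([("quarter", 4), ("hour", 24), ("day", 31), ("month", 12)])
for _ in range(4 * 24 * 2):  # two days of quarter-hour counts, 1 click each
    frame.add(1)
```

With these capacities the frame never holds more than 4 + 24 + 31 + 12 = 71 slots, whereas a year kept at quarter-hour granularity would need 366 × 24 × 4 = 35,136.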

  16. Two Critical Layers in the Stream Cube • o-layer (observation): (*, theme, quarter) • m-layer (minimal interest): (user-group, URL-group, minute) • (primitive) stream data layer: (individual-user, URL, second)
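
Rolling up from the m-layer to the o-layer amounts to mapping each dimension value to its ancestor and summing the cells that then coincide. The sketch below assumes a simple click-count measure; the URL-group-to-theme mapping, the group names, and the counts are hypothetical.

```python
from collections import defaultdict

# Hypothetical concept hierarchy: URL-group -> theme.
URL_GROUP_TO_THEME = {"nba": "sports", "nfl": "sports", "senate": "politics"}


def roll_up_to_o_layer(m_cells):
    """Aggregate (user-group, URL-group, minute) cells to (*, theme, quarter)."""
    o_cells = defaultdict(int)
    for (user_group, url_group, minute), clicks in m_cells.items():
        key = ("*", URL_GROUP_TO_THEME[url_group], minute // 15)
        o_cells[key] += clicks
    return dict(o_cells)


m_cells = {
    ("students", "nba", 0): 5,
    ("faculty", "nba", 7): 3,
    ("students", "nfl", 14): 2,
    ("students", "senate", 3): 4,
    ("faculty", "nba", 15): 6,
}
o_cells = roll_up_to_o_layer(m_cells)
```

Here the user-group dimension is generalized all the way to `*`, URL-groups map to themes, and minutes fold into 15-minute quarters.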

  17. On-Line Materialization vs. On-Line Computation • On-line materialization • Materialization takes precious resources and time • Only incremental materialization (with a sliding window) • Only materialize “cuboids” of the critical layers? • Some intermediate cells should also be materialized • Popular path approach vs. exception cell approach • Materialize intermediate cells along the popular paths • Exception cells: how to set up exception thresholds? • Note that exceptions do not behave monotonically • Computation problem • How to compute and store stream cubes efficiently? • How to discover unusual cells between the critical layers?
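
The remark about non-monotonic exceptions can be made concrete with regression slopes. The sketch assumes an exception is a cell whose absolute slope reaches a threshold, and that child cells share the same time ticks so their slopes add on roll-up; the threshold and the slope values are hypothetical.

```python
THRESHOLD = 4.0  # hypothetical exception threshold on |slope|


def is_exception(slope):
    return abs(slope) >= THRESHOLD


def parent_slope(child_slopes):
    # Rolling up a standard dimension sums the children's series,
    # so their least-squares slopes add as well.
    return sum(child_slopes)


opposing = [5.0, -4.5]  # both children are exceptional
mild = [2.5, 2.5]       # neither child is exceptional

opposing_parent = parent_slope(opposing)
mild_parent = parent_slope(mild)
```

Two exceptional children can cancel into an unremarkable parent, and two unremarkable children can add up to an exceptional parent. Neither direction allows pruning by inheritance, which is why exception thresholds are hard to exploit during cube computation.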

  18. Stream Cube Structure: from m-layer to o-layer • [Figure: lattice of cells; labels recoverable from the transcript: (A1, *, C1), (A1, *, C2), (A2, *, C2), (A1, B1, C2), (A1, B2, C1), (A1, B2, C2), (A2, B1, C1), (A2, B1, C2), (A2, B2, C1), (A2, B2, C2)]

  19. Stream Cube Computation • Cube structure from m-layer to o-layer • Three approaches • All cuboids approach • Materializing all cells (too much in both space and time) • Exceptional cells approach • Materializing only exceptional cells (saves space but not computation time, and the definition of exception is not flexible) • Popular path approach • Computing and materializing cells only along a popular path • Using H-tree structure to store computed cells (which form the stream cube, a selectively materialized cube)
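
The popular path approach can be sketched as a chain of roll-ups in which each cuboid is computed from the previous one, not from the base data. For brevity this sketch generalizes a dimension straight to `*` instead of stepping through intermediate hierarchy levels; the function names, cells, and path are hypothetical.

```python
from collections import defaultdict


def generalize(cuboid, dim):
    """Roll one dimension up to '*' by summing cells that then coincide."""
    out = defaultdict(int)
    for cell, measure in cuboid.items():
        key = cell[:dim] + ("*",) + cell[dim + 1:]
        out[key] += measure
    return dict(out)


def materialize_popular_path(m_layer, path):
    """Return the cuboids along the path, each built from its predecessor."""
    cuboids = [m_layer]
    for dim in path:
        cuboids.append(generalize(cuboids[-1], dim))
    return cuboids


m_layer = {
    ("A1", "B1", "C1"): 4,
    ("A1", "B2", "C1"): 1,
    ("A1", "B1", "C2"): 2,
    ("A2", "B2", "C2"): 3,
}
# Hypothetical popular path: generalize dimension B first, then A.
cuboids = materialize_popular_path(m_layer, path=[1, 0])
```

Only the cuboids on the path are stored. A query on a cuboid off the path would have to be answered on the fly from the closest materialized cuboid below it.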

  20. An H-Tree Cubing Structure • [Figure: H-tree] • root • Observation layer nodes: sports, politics, entertainment • Next level nodes: uiuc, uic, uic, uiuc • Minimal int. layer nodes: jeff, Jim, jeff, mary • Nodes carry quant-info (Q.I.): Sum: xxxx, Cnt: yyyy, Regression

  21. Partial Materialization Using H-Tree • H-tree: • Introduced for computing data cubes and iceberg cubes • J. Han, J. Pei, G. Dong, and K. Wang, “Efficient Computation of Iceberg Cubes with Complex Measures”, SIGMOD'01 • Compressed database, fast cubing, and space preserving in cube computation • Using H-tree for partial stream cubing • Space preserving: • Intermediate aggregates can be computed incrementally and saved in tree nodes • Facilitate computing other cells and multi-dimensional analysis • H-tree with computed cells can be viewed as “stream cube”
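
The idea of saving intermediate aggregates in tree nodes can be sketched with a plain prefix tree. This is a simplification: the H-tree of the cited SIGMOD'01 paper also has header tables and side-links, which are omitted here, and only sum and count are stored as quant-info. The node labels echo the deck's H-tree figure, but their pairing and the click counts are hypothetical.

```python
class HTreeNode:
    """Prefix-tree node holding aggregates for all tuples below it."""

    def __init__(self):
        self.children = {}
        self.total = 0.0  # "Sum" in the quant-info
        self.count = 0    # "Cnt" in the quant-info

    def insert(self, path, value):
        # Update the aggregates on every node along the path.
        self.total += value
        self.count += 1
        if path:
            child = self.children.setdefault(path[0], HTreeNode())
            child.insert(path[1:], value)

    def lookup(self, path):
        node = self
        for label in path:
            node = node.children[label]
        return node


root = HTreeNode()
stream = [
    (("sports", "uiuc", "jeff"), 3),
    (("sports", "uiuc", "jim"), 2),
    (("sports", "uic", "jeff"), 4),
    (("politics", "uiuc", "mary"), 1),
]
for path, clicks in stream:
    root.insert(path, clicks)  # incremental, one tuple at a time

sports = root.lookup(("sports",))
```

Each arriving tuple touches one node per level, so the aggregates for every prefix stay current without rescanning earlier data.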

  22. Time and Space vs. Number of Tuples at the m-Layer (Dataset D3L3C10T400K) • [Figure: a) Time vs. m-layer size; b) Space vs. m-layer size]

  23. Time and Space vs. the Number of Levels • [Figure: a) Time vs. # levels; b) Space vs. # levels]

  24. Mining Other Unusual Patterns in Stream Data • Clustering and outlier analysis for stream mining • Clustering data streams (Guha, Motwani et al. 2000-2002) • History-sensitive, high-quality incremental clustering • Classification of stream data • Evolution of decision trees: Domingos et al. (2000, 2001) • Incremental integration of new streams in decision-tree induction • Frequent pattern analysis • Approximate frequent patterns (Manku & Motwani VLDB’02) • Evolution and dramatic changes of frequent patterns

  25. Conclusions • Stream data mining: A rich and largely unexplored field • Current research focus in database community: • DSMS system architecture, continuous query processing, supporting mechanisms • Stream data mining and stream OLAP analysis • Powerful tools for finding general and unusual patterns • Effectiveness, efficiency and scalability: lots of open problems • Our philosophy: • A multi-dimensional stream analysis framework • Time is a special dimension: tilted time frame • What to compute and what to save? Critical layers • Very partial materialization/precomputation: popular path approach • Mining dynamics of stream data

  26. References • B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom, “Models and issues in data stream systems”, PODS'02 (tutorial). • S. Babu and J. Widom, “Continuous queries over data streams”, SIGMOD Record, 30:109-120, 2001. • Y. Chen, G. Dong, J. Han, J. Pei, B. W. Wah, and J. Wang, “Online analytical processing stream data: Is it feasible?”, DMKD'02. • Y. Chen, G. Dong, J. Han, B. W. Wah, and J. Wang, “Multi-dimensional regression analysis of time-series data streams”, VLDB'02. • P. Domingos and G. Hulten, “Mining high-speed data streams”, KDD'00. • M. Garofalakis, J. Gehrke, and R. Rastogi, “Querying and mining data streams: You only get one look”, SIGMOD'02 (tutorial). • J. Gehrke, F. Korn, and D. Srivastava, “On computing correlated aggregates over continuous data streams”, SIGMOD'01. • S. Guha, N. Mishra, R. Motwani, and L. O'Callaghan, “Clustering data streams”, FOCS'00. • G. Hulten, L. Spencer, and P. Domingos, “Mining time-changing data streams”, KDD'01.

  27. Thank you! • www.cs.uiuc.edu/~hanj
