
Continuous Data Stream Processing



Presentation Transcript


  1. Continuous Data Stream Processing
     • Music Virtual Channel – extensions
     • Data Stream Monitoring – tree pattern mining
     • Continuous Query Processing – sequence queries
     Post-Excellence Project, Subproject 6
     Date: 2005/10/21

  2. [Figure: architecture of the Music Virtual Channel and its extensions. Components shown: peer search engine, profile database, music channel simulator, XML filtering engine, MusicXML database, clustering engine, cluster coordinator, interface, channel monitor, cluster monitor, profile monitor, and favorite channel; V.C. players 1 … N connected over the Internet; music metadata and music collections.]

  3. An Extension to the Virtual Channel
     • After a player starts a range (or kNN) search:
         • it updates its profile periodically;
         • the search results are continuously maintained.
     [Figure: one V.C. player acts as the query; the other V.C. players act as peers.]
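
A continuous kNN query of this kind can be sketched as follows. This is a hypothetical illustration, not the project's design: profiles are assumed to be fixed-length numeric feature vectors compared by Euclidean distance, and the class `ContinuousKNN` and its methods are invented for the example. It rescans every peer whenever any profile changes, which is the baseline that an index would improve on.

```python
import math
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]


class ContinuousKNN:
    """Keeps the k nearest peers of a querying player up to date.

    Hypothetical sketch: results are recomputed by a linear scan after
    every profile update; no index is used.
    """

    def __init__(self, query: Vector, k: int) -> None:
        self.query = query
        self.k = k
        self.profiles: Dict[str, Vector] = {}
        self.result: List[str] = []

    def _refresh(self) -> None:
        # Rank all known peers by Euclidean distance (ties broken by peer id).
        ranked = sorted(
            self.profiles,
            key=lambda p: (math.dist(self.query, self.profiles[p]), p),
        )
        self.result = ranked[: self.k]

    def update_peer(self, peer: str, profile: Vector) -> List[str]:
        """A peer reports a new profile vector; return the refreshed result."""
        self.profiles[peer] = profile
        self._refresh()
        return self.result

    def update_query(self, profile: Vector) -> List[str]:
        """The querying player's own profile moved; return the refreshed result."""
        self.query = profile
        self._refresh()
        return self.result
```

A range query would replace the `[: self.k]` cut with a distance threshold. The full rescan on every update is exactly what makes frequent profile updates expensive.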

  4. An Extension to the Virtual Channel
     • Compared with the clustering engine:
         • a flexible definition of “clusters”
         • updates are more natural than insertions/deletions
         • no parameter setting or re-clustering is needed
         • indexing can reduce the cost of frequent updates
     • Compared with the moving-objects problem:
         • objects move in a high-dimensional feature space
         • in most cases every object is also a query
         • object movement can be predicted

  5. An Extension to the Favorite Channel
     • When a music piece is played on a channel:
         • the corresponding MusicXML file can be obtained;
         • a query can be a fragment of MusicXML or an XQuery expression.
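
As an illustration of filtering the piece currently playing, the sketch below registers standing queries and reports which ones match the piece's MusicXML. It rests on stated assumptions: the MusicXML fragment is simplified and hypothetical, the query names are invented, and the path expressions use the small XPath subset supported by Python's `xml.etree.ElementTree` as a stand-in for XQuery, which the standard library does not implement.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical, heavily simplified MusicXML-like fragment for the piece now playing.
NOW_PLAYING = """
<score-partwise>
  <part id="P1">
    <measure number="1">
      <note><pitch><step>C</step><octave>4</octave></pitch><duration>1</duration></note>
      <note><pitch><step>E</step><octave>4</octave></pitch><duration>1</duration></note>
      <note><rest/><duration>2</duration></note>
    </measure>
  </part>
</score-partwise>
"""

# Hypothetical standing queries registered by listeners (ElementTree path syntax).
QUERIES: Dict[str, str] = {
    "has_c": ".//note/pitch[step='C']",
    "has_rest": ".//note/rest",
    "has_g": ".//note/pitch[step='G']",
}


def filter_document(xml_text: str, queries: Dict[str, str]) -> List[str]:
    """Return the sorted names of the standing queries that match the document."""
    root = ET.fromstring(xml_text.strip())
    return sorted(name for name, path in queries.items() if root.find(path) is not None)
```

Here `filter_document(NOW_PLAYING, QUERIES)` selects `has_c` and `has_rest`. A filtering engine runs this the other way round from a database lookup: the queries are stored and each arriving document is matched against all of them.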

  6. An Extension to the Favorite Channel
     • Compared with query segments:
         • a query carries more musical semantics
         • querying does not interfere with music playback
         • matching is performed on complex tree structures
         • common subqueries are still useful

  7. Research Issues
     • Peer search engine
         • an indexing method to support continuous query processing over high-dimensional moving objects
         • a prediction-based bounding mechanism to reduce the frequency of profile updates
     • XML filtering engine
         • an online method for tree pattern mining over a data stream
         • an indexing mechanism to support XML filtering
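
The prediction-based bounding mechanism is not specified further. One familiar technique it could resemble is dead reckoning: the peer and the search engine share a linear prediction of the profile vector, and the peer reports only when its actual profile leaves a ball of radius `bound` around the prediction. The sketch below assumes that reading; the class `PredictedProfile` and its parameters are hypothetical.

```python
import math
from typing import Tuple

Vector = Tuple[float, ...]


class PredictedProfile:
    """Dead-reckoning style reporting for one peer's profile vector.

    Hypothetical sketch: the last reported profile and an estimated velocity
    define a linear prediction; an update is sent only when the true profile
    is farther than `bound` from the prediction. Times must strictly increase.
    """

    def __init__(self, start: Vector, bound: float) -> None:
        self.bound = bound
        self.anchor = start                           # last reported profile
        self.velocity = tuple(0.0 for _ in start)     # change per time step
        self.anchor_time = 0
        self.messages = 0

    def predict(self, t: int) -> Vector:
        """Profile position both sides assume at time t."""
        dt = t - self.anchor_time
        return tuple(a + v * dt for a, v in zip(self.anchor, self.velocity))

    def observe(self, t: int, actual: Vector) -> bool:
        """Return True if the peer must send an update at time t."""
        if math.dist(self.predict(t), actual) <= self.bound:
            return False
        # Re-anchor: estimate velocity from the previous report to now.
        dt = t - self.anchor_time
        self.velocity = tuple((b - a) / dt for a, b in zip(self.anchor, actual))
        self.anchor, self.anchor_time = actual, t
        self.messages += 1
        return True
```

The search engine can evaluate `predict(t)` for every peer between reports, so a continuous query sees each profile to within `bound` while receiving fewer messages when profiles drift roughly linearly.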

  8. Discovering Frequent Tree Patterns over Data Streams
     (Submitted for publication)

  9. Problem Definition
     • As query trees stream in, find the subtrees that occur more than θ·N times, where N is the number of trees received so far and 0 ≤ θ ≤ 1.
     [Figure: query trees T1, T2, T3 stream into STMer, which outputs the frequent tree patterns.]
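
Read formally, with the added assumption that a query tree contributes at most one occurrence however many times the pattern appears inside it (the slide does not say), the condition is as follows. Here occ_N(P) is the occurrence count of pattern P after N trees, and P ⪯ T_i means that P is an induced subtree of the i-th query tree T_i.

```latex
\[
\operatorname{occ}_N(P) = \bigl|\{\, i \in \{1,\dots,N\} : P \preceq T_i \,\}\bigr|,
\qquad
P \text{ is frequent} \iff \operatorname{occ}_N(P) > \theta \cdot N,
\qquad 0 \le \theta \le 1 .
\]
```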

  10. Problem Definition (Cont.)
      • Labeled ordered tree
      • Induced subtree
      [Figure: a tree pattern shown as an induced subtree of a query tree; nodes labeled A to E.]
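
To make the two definitions concrete: in an induced subtree, every pattern edge must be a parent-child edge of the tree, and because the trees are ordered, siblings must keep their left-to-right order (other siblings may be skipped). The following is a straightforward recursive check, not STMer's method; the `Node` class and the function names are hypothetical, and `support` counts each tree at most once, which is an assumption about how occurrences are counted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def embeds_at(p: Node, t: Node) -> bool:
    """Can pattern node p be mapped onto tree node t?

    Labels must agree, and p's children must map, in left-to-right order,
    onto a subsequence of t's children, so parent-child edges are preserved.
    """
    if p.label != t.label:
        return False
    j = 0
    for pc in p.children:
        # Take the leftmost remaining child of t that can host pc; the
        # leftmost choice leaves the most room for pc's right siblings.
        while j < len(t.children) and not embeds_at(pc, t.children[j]):
            j += 1
        if j == len(t.children):
            return False
        j += 1  # this child of t is now used
    return True


def is_induced_subtree(p: Node, t: Node) -> bool:
    """True if p occurs as an induced ordered subtree anywhere in t."""
    return embeds_at(p, t) or any(is_induced_subtree(p, c) for c in t.children)


def support(p: Node, trees: List[Node]) -> int:
    """Number of trees that contain p (each tree counted at most once)."""
    return sum(is_induced_subtree(p, t) for t in trees)
```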

  11. An Example
      • Given θ = 0.6, a pattern is frequent when its occurrence count exceeds:
          • 0.6 × 1 after the first tree,
          • 0.6 × 2 after the second tree,
          • 0.6 × 3 after the third tree.
      [Figure: three query trees over labels A to F arrive at STMer one at a time; the frequent tree patterns after each arrival are listed.]
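
Writing occ_N(P) for the occurrence count of a pattern P after N trees, which is an integer, the strict thresholds on this slide work out to the following minimum counts:

```latex
\[
\begin{aligned}
N = 1:&\quad \operatorname{occ}_1(P) > 0.6 \cdot 1 = 0.6 &&\Longrightarrow \operatorname{occ}_1(P) \ge 1,\\
N = 2:&\quad \operatorname{occ}_2(P) > 0.6 \cdot 2 = 1.2 &&\Longrightarrow \operatorname{occ}_2(P) \ge 2,\\
N = 3:&\quad \operatorname{occ}_3(P) > 0.6 \cdot 3 = 1.8 &&\Longrightarrow \operatorname{occ}_3(P) \ge 2.
\end{aligned}
\]
```

So after the third tree a pattern needs to occur in two of the three trees, not in all three.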

  12. Main Difficulties
      • The properties of data streams:
          • one pass → traditional tree mining methods fail
          • fast input rate → efficiency is critical
          • incremental → an incremental algorithm is required
          • unbounded → approximate counting is needed
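
A standard way to get approximate counts in bounded memory is Lossy Counting (Manku and Motwani, VLDB 2002), whose entries are (item, f, Δ) triples. The triples on slide 18 look similar, but the slides do not name the scheme, so treat the sketch below as an assumed stand-in. It applies the algorithm per query tree, treating each tree as a transaction holding the set of subtree strings it contains; the class and method names are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Tuple


class LossyCounter:
    """Lossy Counting (Manku and Motwani, VLDB 2002), applied per transaction.

    Assumption for this sketch: each transaction is one query tree, given as
    the set of subtree strings it contains, so a string is counted at most
    once per tree. Entries map item -> (f, delta).
    """

    def __init__(self, epsilon: float) -> None:
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)  # transactions per bucket
        self.n = 0                           # transactions seen so far
        self.entries: Dict[str, Tuple[int, int]] = {}

    def add(self, items: Iterable[str]) -> None:
        self.n += 1
        bucket = math.ceil(self.n / self.width)  # id of the current bucket
        for item in set(items):
            if item in self.entries:
                f, delta = self.entries[item]
                self.entries[item] = (f + 1, delta)
            else:
                # delta bounds how often the item may have occurred (and been
                # pruned) in earlier buckets.
                self.entries[item] = (1, bucket - 1)
        if self.n % self.width == 0:
            # Bucket boundary: drop entries whose true count is at most the bucket id.
            self.entries = {k: (f, d) for k, (f, d) in self.entries.items() if f + d > bucket}

    def frequent(self, support: float) -> List[str]:
        """Items with f >= (support - epsilon) * n: no false negatives, and
        each stored count is at most epsilon * n below the true count."""
        return sorted(k for k, (f, _) in self.entries.items()
                      if f >= (support - self.epsilon) * self.n)
```

The error parameter ε is normally chosen much smaller than the minimum support θ; memory then depends on ε rather than on the length of the stream.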

  13. An Overview of Our Method
      • Subtree generation
      • Subtree maintenance
      [Figure: an incoming query tree T1 enters STMer, which keeps a candidate pool and answers requests on demand.]

  14. String Representation
      • DFS order on T → (label, level) node sequence S
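
A minimal sketch of this encoding, assuming the root is at level 1 as in strings such as A1B2C2 on the following slides; the `Node` class and the function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def to_sequence(root: Node) -> List[Tuple[str, int]]:
    """Pre-order (DFS) traversal emitting (label, level) pairs; the root is level 1."""
    seq: List[Tuple[str, int]] = []
    stack = [(root, 1)]
    while stack:
        node, level = stack.pop()
        seq.append((node.label, level))
        # Push children right-to-left so the leftmost child is visited first.
        for child in reversed(node.children):
            stack.append((child, level + 1))
    return seq


def to_string(seq: List[Tuple[str, int]]) -> str:
    """Concatenate the pairs, e.g. [('A', 1), ('B', 2)] -> 'A1B2'."""
    return "".join(f"{label}{level}" for label, level in seq)
```

Because the levels are kept, the pre-order sequence determines the ordered tree uniquely: a root A with children B and C gives A1B2C2. The concatenated string is unambiguous only while labels contain no digits; otherwise keep the pairs.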

  15. Subtree Generation
      [Figure: the data stream delivers (A,1) at t1 and (B,2) at t2; the buffer grows from A1 to A1B2, and the subtrees A1, B1 and A1B2 are generated.]

  16. Subtree Generation (Cont.)
      [Figure: at t3 the node (C,2) arrives and the buffer becomes A1B2C2; the new subtrees C1, A1C2 and A1B2C2 are generated.]
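
Slides 15 and 16 show new subtrees appearing as each (label, level) node arrives: B1 and A1B2 for (B,2), then C1, A1C2 and A1B2C2 for (C,2). One rule that reproduces exactly these examples is that the new subtrees ending at node v are v alone plus every earlier subtree containing v's parent, extended by v. The sketch below assumes that rule and enumerates exhaustively; how STMer actually uses its buffer and the structure labeled TD cannot be recovered from the slides, and the function names are hypothetical.

```python
from typing import List, Tuple

# One (label, level) pair per node, in DFS pre-order; the root is at level 1.
Pair = Tuple[str, int]


def encode(seq: List[Pair], nodes: Tuple[int, ...]) -> str:
    """String form of the subtree made of `nodes` (positions in `seq`),
    with levels re-based so that the subtree's root is at level 1."""
    base = seq[nodes[0]][1]  # the first node in pre-order is the subtree root
    return "".join(f"{seq[i][0]}{seq[i][1] - base + 1}" for i in nodes)


def generate(seq: List[Pair]) -> List[List[str]]:
    """For each arriving node v, return the new subtrees whose last node is v.

    Hypothetical rule: such a subtree is either v alone, or a previously
    generated subtree that contains v's parent, extended by v.
    Exhaustive, so the number of subtrees can grow exponentially.
    """
    path: List[int] = []                   # positions on the current root-to-v path
    generated: List[Tuple[int, ...]] = []  # every subtree produced so far
    out: List[List[str]] = []
    for v, (_, level) in enumerate(seq):
        del path[level - 1:]               # keep only the ancestors of v
        parent = path[-1] if path else None
        path.append(v)
        new = [(v,)]
        if parent is not None:
            new += [s + (v,) for s in generated if parent in s]
        generated.extend(new)
        out.append([encode(seq, s) for s in new])
    return out
```

For the stream (A,1), (B,2), (C,2) this yields A1, then B1 and A1B2, then C1, A1C2 and A1B2C2, matching the two slides. Every connected subtree is produced exactly once, when its last node in pre-order arrives.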

  17. Subtree Generation (Cont.)
      [Figure: buffer A1B2; generated subtree strings over the labels C, D and E (such as C1, D1 and E1) organized in a structure labeled APT rooted at Φ.]

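  18. Subtree Maintenance
      [Figure: candidate entries such as (A1, 5, 0), (B2, 4, 1), (C3, 2, 1) and (E2, 1, 3) held in structures labeled APT and GPT rooted at Φ; when a query tree with buffer A1B2E2 arrives, the matching entries are incremented (+1); number of query trees received = 321.]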
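
The conclusion mentions a compact structure for the candidate pool, and slides 17 and 18 show structures labeled APT and GPT rooted at Φ. Their exact design is not recoverable; as an assumption, the sketch below stores candidate strings in a prefix tree over (label, level) tokens, so that candidates sharing a prefix, such as A1, A1B2 and A1B2C2, share storage. `CandidatePool` and its methods are hypothetical, and a real pool would also keep the Δ value of each entry.

```python
from typing import Dict, List, Optional, Tuple

Token = Tuple[str, int]  # one (label, level) pair of a subtree string


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[Token, "TrieNode"] = {}
        self.count: Optional[int] = None  # None: shared prefix only, not a stored candidate


class CandidatePool:
    """Hypothetical candidate pool: subtree strings in a prefix tree whose
    empty root is assumed to correspond to the Φ node on the slides."""

    def __init__(self) -> None:
        self.root = TrieNode()
        self.nodes = 0  # trie nodes allocated, excluding the root

    def _walk(self, tokens: List[Token], create: bool) -> Optional[TrieNode]:
        node = self.root
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                if not create:
                    return None
                child = TrieNode()
                node.children[tok] = child
                self.nodes += 1
            node = child
        return node

    def increment(self, tokens: List[Token]) -> int:
        """Add one occurrence of the candidate; return its new count."""
        node = self._walk(tokens, create=True)
        assert node is not None
        node.count = (node.count or 0) + 1
        return node.count

    def count(self, tokens: List[Token]) -> int:
        """Current count of the candidate, or 0 if it is not stored."""
        node = self._walk(tokens, create=False)
        return 0 if node is None or node.count is None else node.count
```

Storing A1, A1B2, A1B2C2 and A1C2 this way needs 4 trie nodes instead of 8 separate tokens, because the strings share the prefix A1 and A1B2.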

  19. Experiments on Sensitivity
      [Figure: results for varying minimum support and error parameter.]

  20. Experiments on Comparison
      • StreamT (ICDM’02)

  21. Conclusion
      • Contribution
          • A novel technique is proposed for efficient subtree generation.
          • A compact structure is employed to reduce the memory requirement of the candidate pool.
      • Current work
          • Mining closed frequent subtrees over data streams
