This project introduces extensions to a music virtual channel for continuous data stream monitoring and query processing, including tree pattern mining and sequence queries. Key features include a flexible notion of clusters, natural handling of profile updates, prediction of object movement in a high-dimensional feature space, and retrieval of MusicXML files. The research addresses issues in the peer search engine, the XML filtering engine, and the discovery of frequent tree patterns in data streams. It proposes an index for high-dimensional moving objects, a mechanism to reduce the frequency of profile updates, and an efficient method for subtree generation.
Continuous Data Stream Processing
• Music Virtual Channel – extensions
• Data Stream Monitoring – tree pattern mining
• Continuous Query Processing – sequence queries
Post-Excellence Project, Subproject 6
Date: 2005/10/21
Music Virtual Channel Extensions
[Figure: system architecture. Components: peer search engine, profile database, music channel simulator, XML filtering engine, MusicXML database, clustering engine, cluster coordinator, and an interface with a channel monitor, cluster monitor, and profile monitor. V.C. players 1 … N connect over the Internet, with a favorite channel and a filtering engine on the player side; the data sources are music metadata and music collections.]
An Extension on Virtual Channel
• After a player starts a range (or kNN) search,
  • it updates its profile periodically, and
  • the search results are continuously maintained.
[Figure: one V.C. player acting as the query and other V.C. players acting as peers]
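A minimal sketch of such a continuously maintained kNN search, assuming profiles are points in a Euclidean feature space and that the result is recomputed by a linear scan whenever any profile changes (the slides propose an index instead; a range search would differ only in the selection step). The class `ContinuousKnn` and its methods are hypothetical names, not part of the described system.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two profile vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class ContinuousKnn:
    """Hypothetical continuous kNN query over peers whose profiles keep moving.

    Every profile update triggers a brute-force re-evaluation; an index
    would replace this linear scan in a real peer search engine.
    """

    def __init__(self, query_id: str, k: int) -> None:
        self.query_id = query_id
        self.k = k
        self.profiles: Dict[str, Vector] = {}
        self.result: List[str] = []

    def update_profile(self, peer_id: str, profile: Vector) -> List[str]:
        """Record a (periodic) profile update and refresh the maintained result."""
        self.profiles[peer_id] = profile
        self._refresh()
        return self.result

    def _refresh(self) -> None:
        q = self.profiles.get(self.query_id)
        if q is None:                      # the querying player has no profile yet
            self.result = []
            return
        others = (
            (distance(q, p), pid)
            for pid, p in self.profiles.items()
            if pid != self.query_id
        )
        # keep the k closest peers; ties are broken by peer id
        self.result = [pid for _, pid in heapq.nsmallest(self.k, others)]


if __name__ == "__main__":
    knn = ContinuousKnn("me", k=2)
    knn.update_profile("me", (0.0, 0.0))
    knn.update_profile("p1", (1.0, 0.0))
    knn.update_profile("p2", (5.0, 0.0))
    print(knn.update_profile("p3", (2.0, 0.0)))   # p3 moves close to the query
    print(knn.update_profile("p3", (9.0, 0.0)))   # p3 drifts away again
```

Because both the querying player and its peers move, an update from either side can change the answer, which is why `_refresh` runs on every call rather than only on peer updates.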
An Extension on Virtual Channel
• Compared with the clustering engine:
  • A flexible definition of “clusters”
  • Updates are more natural than insertions/deletions
  • No need for parameter setting or re-clustering
  • Indexing can reduce the cost of frequent updates
• Compared with the moving-objects problem:
  • Movements occur in a high-dimensional feature space
  • In most cases every object is also a query
  • Object movement can be predicted
An Extension on Favorite Channel
• When a music piece is played on a channel,
  • the corresponding MusicXML file can be obtained, and
  • a query can be a portion of MusicXML or an XQuery expression.
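To make the filtering idea concrete: each subscriber registers a structural query, and every MusicXML document that arrives is checked against all of them. Python's standard library has no XQuery engine, so this sketch substitutes the limited XPath subset supported by `xml.etree.ElementTree`; the toy score fragment, the query ids, and the `filter_piece` helper are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# A tiny, hand-written MusicXML-like fragment (illustrative, not a complete score).
PIECE = """
<score-partwise>
  <part id="P1">
    <measure number="1">
      <note><pitch><step>C</step><octave>4</octave></pitch><duration>1</duration></note>
      <note><pitch><step>E</step><octave>4</octave></pitch><duration>1</duration></note>
    </measure>
  </part>
</score-partwise>
"""


def filter_piece(xml_text: str, queries: Dict[str, str]) -> List[str]:
    """Return the ids of the registered path queries that match the piece."""
    root = ET.fromstring(xml_text.strip())
    return [qid for qid, path in queries.items() if root.find(path) is not None]


if __name__ == "__main__":
    subscriptions = {
        "has-E": ".//note/pitch[step='E']",    # some note is an E
        "has-G": ".//note/pitch[step='G']",    # some note is a G
        "in-octave-4": ".//pitch[octave='4']", # some pitch lies in octave 4
    }
    print(filter_piece(PIECE, subscriptions))
```

Checking every query independently, as here, is the naive baseline; sharing work across queries with common sub-patterns is what an XML filtering index is for.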
An Extension on Favorite Channel
• Compared with query segments:
  • A query carries more musical semantics
  • Querying does not interfere with music playback
  • Matching is performed on complex tree structures
  • Common subqueries are still useful
Research Issues
• Peer Search Engine
  • An indexing method to support continuous query processing over high-dimensional moving objects
  • A prediction-based bounding mechanism to reduce the frequency of profile updates
• XML Filtering Engine
  • An online method for tree pattern mining over a data stream
  • An indexing mechanism to support XML filtering
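The slides do not describe the prediction-based bounding mechanism. One common way to realize the idea is dead reckoning: the peer and the search engine share a linear motion model, and the peer sends an update only when its true profile drifts more than a bound δ from the prediction. The sketch below assumes that model; `DeadReckoningPeer`, `delta`, and the velocity input are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

Vector = Tuple[float, ...]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two profile vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class DeadReckoningPeer:
    """Peer that reports its profile only when the shared prediction fails.

    Assumption: both sides extrapolate linearly from the last reported
    position and velocity, so the server's copy is always within delta.
    """

    def __init__(self, delta: float) -> None:
        self.delta = delta                      # tolerated prediction error
        self.anchor: Optional[Vector] = None    # last reported position
        self.velocity: Vector = ()              # last reported velocity
        self.anchor_time = 0.0

    def predict(self, t: float) -> Vector:
        """Position the server assumes at time t (linear extrapolation)."""
        assert self.anchor is not None
        dt = t - self.anchor_time
        return tuple(p + v * dt for p, v in zip(self.anchor, self.velocity))

    def observe(self, t: float, position: Vector, velocity: Vector) -> bool:
        """Return True if an update must be sent to the server at time t."""
        if self.anchor is None or distance(position, self.predict(t)) > self.delta:
            self.anchor, self.velocity, self.anchor_time = position, velocity, t
            return True
        return False


if __name__ == "__main__":
    peer = DeadReckoningPeer(delta=0.5)
    for t, x in [(0, 0.0), (1, 1.1), (2, 2.2), (3, 3.9)]:
        print(t, peer.observe(t, (x, 0.0), (1.0, 0.0)))
```

In this run only the first and last observations trigger an update; the two in between stay within δ of the prediction, so no message is sent.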
Discovering Frequent Tree Patterns over Data Streams
(Submitted for publication)
Problem Definition
• As query trees stream in, find the subtrees that occur more than θ·N times, where N is the number of trees received so far and 0 ≤ θ ≤ 1.
[Figure: query trees T1, T2, T3 stream into STMer, which outputs the frequent tree patterns]
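As a reference point, the definition can be checked exactly by keeping a counter for every pattern ever seen. This sketch assumes each query tree has already been turned into its set of subtree patterns (hand-written strings here) and that a pattern counts at most once per tree, so its count never exceeds N; `ExactFrequentPatterns` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, List


class ExactFrequentPatterns:
    """Exact (unbounded-memory) baseline for the problem definition."""

    def __init__(self, theta: float) -> None:
        if not 0.0 <= theta <= 1.0:
            raise ValueError("theta must lie in [0, 1]")
        self.theta = theta
        self.n = 0                       # N: number of trees received so far
        self.counts: Counter = Counter()

    def add_tree(self, patterns: Iterable[str]) -> None:
        """Register one incoming query tree, given as its set of subtree patterns."""
        self.n += 1
        self.counts.update(set(patterns))  # at most one count per tree

    def frequent(self) -> List[str]:
        """Patterns whose count is strictly greater than theta * N."""
        threshold = self.theta * self.n
        return sorted(p for p, c in self.counts.items() if c > threshold)


if __name__ == "__main__":
    miner = ExactFrequentPatterns(theta=0.5)
    for tree in [{"A1", "B1", "A1B2"}, {"A1", "C1", "A1C2"}, {"A1", "B1", "A1B2"}]:
        miner.add_tree(tree)
        print(miner.n, miner.frequent())
```

The counter grows with every new pattern and is never pruned, which is exactly what an unbounded stream cannot afford.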
Problem Definition (Cont.)
• Labeled ordered trees
• Induced subtrees
[Figure: an example tree pattern and query tree, nodes labeled A–E]
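The slides name the tree model but not how a pattern is matched against a query tree. Under the usual reading of "induced" and "ordered", a pattern occurs in a tree if its parent-child edges map to parent-child edges and its siblings keep their left-to-right order (other siblings may be skipped). The recursive check below is an illustration under that assumption; `Node`, `matches_at`, and `is_induced_subtree` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def matches_at(p: Node, t: Node) -> bool:
    """True if pattern p maps onto t with p's root at t (induced, ordered).

    p's children must match an order-preserving subsequence of t's children.
    Greedily taking the earliest matching child is safe: it never leaves
    fewer candidates for the remaining pattern children.
    """
    if p.label != t.label:
        return False
    i = 0
    for child in t.children:
        if i < len(p.children) and matches_at(p.children[i], child):
            i += 1
    return i == len(p.children)


def is_induced_subtree(p: Node, t: Node) -> bool:
    """True if p occurs as an induced ordered subtree anywhere inside t."""
    return matches_at(p, t) or any(is_induced_subtree(p, c) for c in t.children)


if __name__ == "__main__":
    query = Node("A", [Node("B", [Node("D")]), Node("C"), Node("E")])
    print(is_induced_subtree(Node("A", [Node("B"), Node("C")]), query))  # order kept
    print(is_induced_subtree(Node("A", [Node("C"), Node("B")]), query))  # order swapped
    print(is_induced_subtree(Node("A", [Node("D")]), query))             # D is a grandchild
```

The second pattern fails because sibling order matters in an ordered tree, and the third fails because an induced subtree may not shortcut a grandchild to a child.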
An Example
• Given θ = 0.6, the frequent tree patterns after each query tree are those with:
  • occurrence > 0.6 × 1 (after T1)
  • occurrence > 0.6 × 2 (after T2)
  • occurrence > 0.6 × 3 (after T3)
[Figure: query trees T1–T3 over nodes labeled A–F and the frequent patterns reported by STMer after each tree]
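Spelled out, writing count(p) for a pattern's occurrence count, the thresholds in the example work out as follows; since counts are integers, a strict bound of 1.2 or 1.8 means at least two occurrences:

```latex
\begin{aligned}
N = 1:&\quad \operatorname{count}(p) > 0.6 \cdot 1 = 0.6 \;\Longrightarrow\; \operatorname{count}(p) \ge 1\\
N = 2:&\quad \operatorname{count}(p) > 0.6 \cdot 2 = 1.2 \;\Longrightarrow\; \operatorname{count}(p) \ge 2\\
N = 3:&\quad \operatorname{count}(p) > 0.6 \cdot 3 = 1.8 \;\Longrightarrow\; \operatorname{count}(p) \ge 2
\end{aligned}
```

So a pattern seen in only one of the first two trees drops out after T2, and two occurrences are still enough after T3.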
Main Difficulties
• The properties of data streams:
  • One pass: traditional tree mining methods fail
  • Fast input rate: efficiency is critical
  • Incremental: an incremental algorithm is required
  • Unbounded: approximate counting is needed
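An Overview of Our Method (STMer)
• Subtree generation
• Subtree maintenance
[Figure: incoming query trees (T1, …) feed subtree generation; subtree maintenance keeps a candidate pool and serves requests on demand]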
String Representation
• Traverse T in DFS order and record each node as a (label, level) pair; the result is the node sequence S
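A short sketch of this encoding, assuming the root is at level 1 and that the compact strings on the following slides (A1, A1B2, A1B2C2) are the pairs written back to back. The nested-tuple tree format and the names `dfs_sequence` and `encode` are illustrative assumptions.

```python
from typing import List, Tuple

# A tree is written as (label, [children]) for brevity.
Tree = Tuple[str, list]


def dfs_sequence(tree: Tree, level: int = 1) -> List[Tuple[str, int]]:
    """Preorder DFS emitting one (label, level) pair per node; the root has level 1."""
    label, children = tree
    seq = [(label, level)]
    for child in children:
        seq.extend(dfs_sequence(child, level + 1))
    return seq


def encode(seq: List[Tuple[str, int]]) -> str:
    """Concatenate the pairs into the compact form, e.g. A1B2C2."""
    return "".join(f"{label}{level}" for label, level in seq)


if __name__ == "__main__":
    t = ("A", [("B", []), ("C", [])])
    print(dfs_sequence(t))
    print(encode(dfs_sequence(t)))
```

The concatenated string is only unambiguous for single-character labels and levels below 10; a real implementation would keep the pairs as a list.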
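Subtree Generation
[Figure: the data stream delivers the nodes (A,1) at t1 and (B,2) at t2, alongside a structure labeled TD. After t1 the buffer holds A1 and the subtree A1 is generated; after t2 the buffer holds A1B2 and the subtrees B1 and A1B2 are generated.]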
Subtree Generation (Cont.)
[Figure: at t3 the node (C,2) arrives and the buffer becomes A1B2C2; the newly generated subtrees are C1, A1C2, and A1B2C2, alongside the earlier A1, B1, and A1B2.]
Subtree Generation (Cont.)
[Figure: buffer A1B2, a structure labeled TD, and a structure labeled APT rooted at Φ, holding further nodes C, D, E at levels 2–4 (e.g. C2 D3 E4) and F2]
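The walk-through shows a query tree A(B, C) arriving node by node in DFS order: the buffer grows A1 → A1B2 → A1B2C2, and each step emits new subtrees (B1 and A1B2 at t2; C1, A1C2, and A1B2C2 at t3). The sketch below reproduces exactly that output with a simple rule, which is our reading of the figures rather than a description of STMer's TD/APT structures: when node v arrives, emit v alone plus every previously generated subtree that contains v's parent, extended with v. Each induced subtree is emitted exactly once, at the arrival of its last node in DFS order. The function names are hypothetical.

```python
from typing import FrozenSet, List, Sequence, Tuple


def generate_subtrees(seq: Sequence[Tuple[str, int]]) -> List[List[str]]:
    """For a (label, level) DFS sequence, list the new induced subtrees per arriving node."""
    parent: List[int] = []
    stack: List[int] = []                  # positions on the current root-to-node path
    generated: List[FrozenSet[int]] = []   # every subtree so far, as a set of positions
    per_step: List[List[str]] = []

    for pos, (_, level) in enumerate(seq):
        del stack[level - 1:]              # pop back so stack[-1] is this node's parent
        parent.append(stack[-1] if stack else -1)
        stack.append(pos)

        new = [frozenset({pos})]           # the arriving node on its own
        new += [s | {pos} for s in generated if parent[pos] in s]
        generated.extend(new)
        per_step.append([render(seq, s) for s in new])
    return per_step


def render(seq: Sequence[Tuple[str, int]], nodes: FrozenSet[int]) -> str:
    """Write a subtree in DFS order with levels relative to its own root (e.g. A1C2)."""
    ordered = sorted(nodes)                # positions are already in DFS order
    base = seq[ordered[0]][1] - 1
    return "".join(f"{seq[i][0]}{seq[i][1] - base}" for i in ordered)


if __name__ == "__main__":
    for step in generate_subtrees([("A", 1), ("B", 2), ("C", 2)]):
        print(step)
```

Keeping every generated subtree makes the number of candidates grow exponentially with tree size, which is presumably why the slides stress efficient generation and a compact candidate structure.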
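Subtree Maintenance
[Figure: buffer A1B2E2 after 321 query trees have been received; candidate entries such as (A1, 5, 0), (B2, 4, 1), (C3, 2, 1), and (E2, 1, 3) are stored in structures labeled APT and GPT (rooted at Φ), and the counts of matched entries are incremented by 1.]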
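The candidate entries on this slide, such as (A1, 5, 0) and (B2, 4, 1), resemble the (count, maximum error) pairs of Lossy Counting (Manku and Motwani), and the experiments vary an error parameter, which suggests an ε-approximate counting scheme; the slides do not confirm this. The sketch below is plain transaction-level Lossy Counting over subtree patterns, not STMer's compact APT/GPT structure; `LossyPatternCounter`, `epsilon`, and `theta` are illustrative names.

```python
import math
from typing import Dict, Iterable, List, Tuple


class LossyPatternCounter:
    """Lossy Counting over a stream of trees, each given as a set of patterns.

    Each entry stores (count, delta); delta bounds how many occurrences may
    have been missed before the entry was (re)inserted, so every stored
    count undercounts the true count by at most epsilon * N.
    """

    def __init__(self, epsilon: float) -> None:
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)   # trees per bucket
        self.n = 0                            # N: trees received so far
        self.entries: Dict[str, Tuple[int, int]] = {}

    def add_tree(self, patterns: Iterable[str]) -> None:
        self.n += 1
        bucket = math.ceil(self.n / self.width)
        for p in set(patterns):
            count, delta = self.entries.get(p, (0, bucket - 1))
            self.entries[p] = (count + 1, delta)
        if self.n % self.width == 0:          # bucket boundary: prune rare entries
            self.entries = {
                p: (c, d) for p, (c, d) in self.entries.items() if c + d > bucket
            }

    def frequent(self, theta: float) -> List[str]:
        """Patterns with count >= (theta - epsilon) * N; no frequent pattern is missed."""
        cutoff = (theta - self.epsilon) * self.n
        return sorted(p for p, (c, _) in self.entries.items() if c >= cutoff)


if __name__ == "__main__":
    counter = LossyPatternCounter(epsilon=0.25)
    for tree in [{"A1", "B1"}, {"A1", "C1"}, {"A1"}, {"A1", "B1"}, {"C1"}]:
        counter.add_tree(tree)
    print(counter.entries)
    print(counter.frequent(theta=0.5))
```

The trade-off is the usual one: a smaller ε keeps more entries (less pruning) but reports fewer false positives, since the output cutoff (θ − ε)·N moves closer to θ·N.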
Experiments on Sensitivity
[Figure: results when varying the minimum support and the error parameter]
Experiments on Comparison
• Compared with StreamT (ICDM’02)
Conclusion
• Contribution
  • A novel technique for efficient subtree generation
  • A compact structure that reduces the memory requirement of the candidate pool
• Current work
  • Mining closed frequent subtrees over data streams