130 likes | 146 Views
Explore sequence patterns in data streams for auction optimization, fraud detection, and market trends analysis. Learn about SQL-TS query language and powerful optimization techniques.
E N D
Patterns in Sequences and Data Streams Carlo Zaniolo Computer Science Department UCLA
E-Commerce Applications • ZAIAS Corp: decision support and e-Services for web-based auctions—technology to: • monitor multiple ongoing auctions, • determine the right price for auctioned goods, & • secure well-priced items by timely bids. • Sophisticated analysis of sequence patterns and instant response is needed for first two points. • Similar requirements in many other applications
Sequence Analysis: many applications • Mining web access logs for: • Customer characterization/segmentation • Target advertising—banner optimization • Frequent sequences in market baskets • Analysis of stock market trends • Double bottoms and similar patterns • Fraud detection examples examples • Stolen credit cards, theft of cellular phones and user ids, • Intrusion Detection: • Example: A peripatetic intruder that attempts successive logins from proximity workstations---spatio-temporal criteria used to detect such an attack
State of The Art • ADT (e.g.. Informix Datablades): Not flexible enough, no Optimization • In particular not suitable for infinite data streams • SEQ: Enhanced ADTs (e g. sets and sequences) with their own query language • SRQL: Adding sequence algebra operators to relational model
SQL-TS • A query language for finding complex patterns in sequences • Completely based on SQL---Minimal extensions, only the from clause affected • A powerful query optimization technique based on extensions of the Knuth, Morris & Pratt (KMP) string-search algorithm
Example in Mining Weblogs Consider a table or a stream of tuples: Sessions(SessNo, ClickTime, PageNo, PageType) That keeps track of pages visited in a session (sequence of requests from the same user) Page types include: content (‘c’) description of product (‘d’) purchase (‘p’) Ads a web merchant dream of: c, d, p
3 clicks for a purchase • SQL-TS queries to find the ideal 3-click scenario SELECT B.PageNo, C.ClickTime FROM Sessions PARTITION BY SessNo ORDER BY ClickTime AS (A, B, C) WHERE A.PageType=‘c’ AND B.PageType=‘d’ AND C.PageType=‘p’ PARTITION BY and SEQUENCE BY were used in the original paper
Credit Card Spending Consider a Table log that keeps track of credit card transactions: Spending(Date, AccountNo, Amount) A surge in average spending might be sign of credit card theft.
Credit Card Fraud Detection in SQL-TS Track 30-day average spending and when it increases considerably for two consecutive days: Select Z.AccountNo, Z.Date FROM Spending PARTITION BY AccountNo SEQUENCE BY Date AS (X+, Y, Z) WHERE COUNT(X+)=30 AND Y.Amount> 5*AVG(X+.Amount) AND Z.Amount > 5*AVG(X+.Amount) • +X denotes 1 or more occurrences of X • Aggregates can be computed on the stars
Example in Online Auction • stream containing ongoing bids: Bids(auctn_id, Amount, Time) • Table describing auctions: auctions(auctn_id, item_id, min_bid, deadline,…) Find bids that are converging to a fixed price during the last 15 minutes of the auction.
Example in Online Auction in SQL-TS Find three successive bids that raise the last bid by less than 2% during the last 15 minutes of auction: SELECT T.auction_id, T.time, T.amount FROM Auctions AS A, Bids PARTITION BY auctn_id ORDER BY time AS (X, Y, Z, T) WHERE A.auctn_id=X.auctn_id AND X.time + 15 Minute > A.deadline AND Y.amount - X.amount < 0.02*X.amount AND Z.amount - Y.amount < 0.02*Y.amount AND T.amount - Z.amount < 0.02*Z.amount
Conclusion • SQL-TS is a simple but powerful SQL extension to searche for pattern in sequences and time series. • QL-TS is supported by powerful query optimization techniques based on a generalization of the Knuth, Morris and Pratt text search algorithm.
References • Reza Sadri, Carlo Zaniolo, Amir Zarkesh, Jafar Adibi: Expressing and optimizing sequence queries in database systems. ACM Transactions on Database Systems (TODS)Volume 29 , Issue 2 (June 2004) • Reza Sadri, Carlo Zaniolo, Amir M. Zarkesh, Jafar Adibi: Optimization of Sequence Queries in Database Systems. PODS 2001. • Reza Sadri, Carlo Zaniolo, Amir M. Zarkesh, Jafar Adibi: A Sequential Pattern Query Language for Supporting Instant Data Minining for e-Services, VLDB 2001. • R. Sadri, Optimization of Sequence Queries in Database Systems Ph.D. Thesis, UCLA, 2001. • P. Seshadri, M. Livny, and R. Ramakrishnan. SEQ: A model for sequence databases. In ICDE,, 1995. • P. Seshadri, M. Livny, and R. Ramakrishnan. Sequence query processing. ACM SIGMOD Conference on Management of Data,, May 1994