130 likes | 245 Views
Database-Centric Data Mining and Inductive DBMS. Carlo Zaniolo Computer Science Department UCLA. E-Commerce Applications. ZAIAS Corp: decision support and e-Services for web-based auctions—technology to: monitor multiple ongoing auctions, determine the right price for auctioned goods, &
E N D
Database-Centric Data Miningand Inductive DBMS Carlo Zaniolo Computer Science Department UCLA
E-Commerce Applications • ZAIAS Corp: decision support and e-Services for web-based auctions—technology to: • monitor multiple ongoing auctions, • determine the right price for auctioned goods, & • secure well-priced items by timely bids. • Sophisticated analysis of sequence patterns and instant response is needed for first two points. • Similar requirements in many other applications
Sequence Analysis: many applications • Mining web access logs for: • Customer characterization/segmentation • Target advertising—banner optimization • Frequent sequences in market baskets • Analysis of stock market trends • Double bottoms and similar patterns • Fraud detection examples examples • Stolen credit cards, theft of cellular phones and user ids, • Intrusion Detection: • Example: A peripatetic intruder that attempts successive logins from proximity workstations---spatio-temporal criteria used to detect such an attack
State of The Art • ADT (e.g.. Informix Datablades): Not flexible enough, no Optimization • In particular not suitable for infinite data streams • SEQ: Enhanced ADTs (e g. sets and sequences) with their own query language • SRQL: Adding sequence algebra operators to relational model
SQL-TS • A query language for finding complex patterns in sequences • Completely based on SQL---Minimal extensions, only the from clause affected • A powerful query optimization technique based on extensions of the Knuth, Morris & Pratt (KMP) string-search algorithm
Example in Mining Weblogs Consider a table or a stream of tuples: Sessions(SessNo, ClickTime, PageNo, PageType) That keeps track of pages visited in a session (sequence of requests from the same user) Page types include: content (‘c’) description of product (‘d’) purchase (‘p’) Ads a web merchant dream of: c, d, p
3 clicks for a purchase • SQL-TS queries to find the ideal 3-click scenario SELECT B.PageNo, C.ClickTime FROM Sessions CLUSTER BY SessNo SEQUENCE BY ClickTime AS (A, B, C) WHERE A.PageType=‘c’ AND B.PageType=‘d’ AND C.PageType=‘p’
Credit Card Spending Consider a Table log that keeps track of credit card transactions: Spending(Date, AccountNo, Amount) A surge in average spending might be sign of credit card theft.
Credit Card Fraud Detection in SQL-TS Track 30-day average spending and when it increases considerably for two consecutive days: Select Z.AccountNo, Z.Date FROM Spending CLUSTER BY AccountNo SEQUENCE BY Date AS (*X, Y, Z) WHERE COUNT(*X)=30 AND Y.Amount > 5 * AVG(*X.Amount) AND Z.Amount > 5 * AVG(*X.Amount) • *X denotes 1 or more occurrences of X • Aggregates can be computed on the stars
Example in Online Auction • stream containg ongoing bids: Bids(auctn_id, Amount, Time) • Table describing auctions: auctions(auctn_id, item_id, min_bid, deadline,…) Find bids that are converging to a fixed price during the last 15 minutes of the auction.
Example in Online Auction in SQL-TS Find three successive bids that raise the last bid by less than 2% during the last 15 minutes of auction: SELECT T.auction_id, T.time, T.amount FROM Autctions AS A, Bids CLUSTER BY auctn_id SEQUENCE BY time AS (X, Y, Z, T) WHERE A.auctn_id=X.auctn_id AND X.time + 15 Minute > A.deadline AND Y.amount - X.amount < 0.02 * X.amount AND Z.amount - Y.amount < 0.02 * Y.amount AND T.amount - Z.amount < 0.02 * Z.amount
Conclusion • SQL-TS is a simple but powerful SQL extension to searche for pattern in sequences and time series. • QL-TS is supported by powerful query optimization techniques based on a generalization of the Knuth, Morris and Pratt text search algorithm.
References • Reza Sadri, Carlo Zaniolo, Amir Zarkesh, Jafar Adibi: Expressing and optimizing sequence queries in database systems. ACM Transactions on Database Systems (TODS)Volume 29 , Issue 2 (June 2004) • Reza Sadri, Carlo Zaniolo, Amir M. Zarkesh, Jafar Adibi: Optimization of Sequence Queries in Database Systems. PODS 2001. • Reza Sadri, Carlo Zaniolo, Amir M. Zarkesh, Jafar Adibi: A Sequential Pattern Query Language for Supporting Instant Data Minining for e-Services, VLDB 2001. • R. Sadri, Optimization of Sequence Queries in Database Systems Ph.D. Thesis, UCLA, 2001. • P. Seshadri, M. Livny, and R. Ramakrishnan. SEQ: A model for sequence databases. In ICDE,, 1995. • P. Seshadri, M. Livny, and R. Ramakrishnan. Sequence query processing. ACM SIGMOD Conference on Management of Data,, May 1994