1 / 14

A Sequential Pattern Query Language for Supporting Instant Data Mining for e-Services

A Sequential Pattern Query Language for Supporting Instant Data Mining for e-Services. Reza Sadri Carlo Zaniolo sadri@procom.com zaniolo@cs.ucla.edu Amir Zarkesh Jafar Adibi azarkesh@u4cast.com jabibi@u4cast.com.

maryjordan
Download Presentation

A Sequential Pattern Query Language for Supporting Instant Data Mining for e-Services

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. A Sequential Pattern Query Language for Supporting Instant Data Mining for e-Services Reza Sadri Carlo Zaniolo sadri@procom.com zaniolo@cs.ucla.edu Amir Zarkesh Jafar Adibi azarkesh@u4cast.com jabibi@u4cast.com

  2. Sequential Patterns in E-Commerce Applications • Applications: • Targeted Advertising • Instant analysis of stock market trends • Mining web access logs • Fraud detection • Requirements: • Expressive query language for finding complex patterns in database sequences • Efficient and scalable implementation: Query Optimization

  3. State of The Art • ADT (e.g.. Informix Datablades): Not flexible enough, no Optimization • SEQ: Enhanced ADTs (e. g. sets and sequences) with their own query language • SRQL: Adding sequence algebra operators to relational model

  4. SQL-TS • A query language for finding complex patterns in sequences • Minimal extension of SQL—only the from clause affected • A new Query optimization technique based on extensions of the Knuth, Morris & Pratt (KMP) string-search algorithm

  5. Example in Mining Weblogs Consider a table: Sessions(SessNo, ClickTime, PageNo, PageType) That keeps track of pages visited in a session (sequence of requests from the same user) Possible page type: content product description purchase

  6. Example in Mining Weblogs • SQL-TS queries to find the ideal scenarios: SELECT B.PageNo, C.ClickTime FROM Sessions CLUSTER BY SessNo SEQUENCE BY ClickTime AS (A, B, C) WHERE A.PageType=‘content’ AND B.PageType=‘product’ AND C.PageType=‘purchase’

  7. Example in Fraud Detection Consider a Table log that keeps track of credit card transactions: Spending(Date, AccountNo, Amount) A surge in average spending might be sign of credit card theft.

  8. Example in Fraud Detection Track 30-day average spending and when it increases considerably for two consecutive days: Select Z.AccountNo, Z.Date FROM Spending CLUSTER BY AccountNo SEQUENCE BY Date AS (*X, Y, Z) WHERE COUNT(*X)=30 AND Y.Amount > 5 * AVG(*X.Amount) AND Z.Amount > 5 * AVG(*X.Amount) Notice the Use of star and aggregates.

  9. Optimized string search:KMP Consider text array text and pattern array p: i 1 2 3 4 5 6 7 8 9 10 11 text[i] a b a b a b c a b c a j 1 2 3 4 5 6 pattern[j] a b a b c a ­ • After failing, use the information acquired so to: • - backtrack to shift(j), rather than i+1, and • - only check pattern values after next(j) But in SQL-TS we have general predicates & star patterns

  10. i – j + 1 i – j + shift(j) + 1 i - j + shift(j) + next(j) i Input shift(j) + 1 shift(j) + next(j) 1 Pattern j next(j) j - shift(j) Shifted Pattern 1 shift(j) shift and next • Success for first j-1 elements of pattern. Failure for jth element (when input is at i) • Any shift less than shift(j) is guaranteed to lead to failure, • Match elements in the pattern starting at next(j)

  11. Optimal Pattern Search (OPS) Search path for naive algorithm vs. optimized algorithm:

  12. *U (less than 2% change) *T *V *Y *R *Z (less than 2% change) *W (less than 2% change) Optimizing Star Patterns Relaxed Double Bottom: • Only considering increases and decreases that are more than 2%

  13. Relaxed Double Bottom: Ninety fold improvement

  14. Conclusions • Disjunctive queries, partial ordered domains, aggregates also treated in this approach • Old applications—more power & flexibility than Datablades ADTs of commercial DBMSs • Ongoing implementation, by building on the user-defined aggregates supported in AXL.

More Related