1 / 21

Optimization of Sequence Queries in Database Systems

This paper explores the optimization of sequence queries in database systems, providing an efficient and scalable implementation using query optimization techniques. It introduces a new query language for finding complex patterns in sequences, with minimal extension of SQL. The paper also discusses various string search optimization algorithms, including KMP, Boyer and Moore, and Karp and Rabin. The study demonstrates the application of these techniques in time series analysis, such as querying purchase patterns for marketing, stock market analysis, and studying meteorological data.

kcarty
Download Presentation

Optimization of Sequence Queries in Database Systems

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Optimization of Sequence Queries in Database Systems Reza Sadri Carlo Zaniolo reza@cs.ucla.edu zaniolo@cs.ucla.edu sadri@procom.com Amir Zarkesh Jafar Adibi azarkesh@u4cast.com jabibi@u4cast.com

  2. Time series Analysis • Many Applications: • Querying purchase patterns for marketing • Stock market analysis • Studying meteorological data • What’s needed: • Expressive query language for finding complex patterns in database sequences • Efficient and scalable implementation: Query Optimization

  3. SQL-TS • A query language for finding complex patterns in sequences • Minimal extension of SQL—only the from clause affected • A new Query optimization technique based on extensions of the Knuth, Morris & Pratt (KMP) string-search algorithm

  4. Text Search Optimization • Boyer and Moore • Precomputed shift functions for each character and sub-pattern • Dependent to the alphabet size • Works best for non-repeating patterns • O(mn) worst case time • Knuth, Morris and Pratt (KMP) • Independent of the alphabet size • Most efficient in general: O(m+n) time • Karp and Rabin • Prefix Hashing • Dependent to the alphabet size • O(mn) worst case time

  5. Optimized string search:KMP Consider text array text and pattern array p: i 1 2 3 4 5 6 7 8 9 10 11 text[i] a b a b a b c a b c a j 1 2 3 4 5 6 pattern[j] a b a b c a ­ • After failing, use the information acquired so to: • - backtrack to shift(j), rather than i+1, and • - only check pattern values after next(j) But in SQL-TS we have general predicates & star patterns

  6. i – j + 1 i – j + shift(j) + 1 i - j + shift(j) + next(j) i Input shift(j) + 1 shift(j) + next(j) 1 Pattern j next(j) j - shift(j) Shifted Pattern 1 shift(j) shift and next • Success for first j-1 elements of pattern. Failure for jth element (when input is at i) • Any shift less than shift(j) is guaranteed to lead to failure, • Match elements in the pattern starting at next(j)

  7. Equality Predicates: KMP suffices Find companies whose closing stock price in three consecutive days was 10, 11, and 15. SELECT X.name FROM quote CLUSTER BY name SEQUENCE BY date AS (X, Y, Z) WHERE X.price =10 AND Y.price=11 AND Z.price=15 But in SQL-TS we have general predicates

  8. Optimal Pattern Search (OPS) Search path for naive algorithm vs. optimized algorithm:

  9. Beyond KMP: General Predicates For IBM stock prices, find all instances where the pattern of two successive drops followed by two successive increases, and the drops take the price to a value between 40 and 50, and the first increase doesn't move the price beyond 52. SELECT X.date AS start_date, X.price U.date AS end_date, U.price FROM quote CLUSTER BY name SEQUENCE BY date AS (X, Y, Z, T, U) WHERE X.name='IBM' AND Y.price < X.price AND Z.price < Y.price AND 40 < Z.price < 50 AND Z.price < T.price AND T.price < 52 AND T.price < U.price

  10. *U (less than 2% change) *T *V *Y *R *Z (less than 2% change) *W (less than 2% change) Beyond KMP: Star Patterns Relaxed Double Bottom: • Only considering increases and decreases that are more than 2%

  11. Relaxed Double Bottom: Ninety fold improvement

  12. Relaxed Double Bottom in June 1990

  13. Conclusion • Significant speedups—from 6 to 900 times faster • Queries, partial ordered domains, aggregates also treated in this approach • Many other optimization opportunities: e.g., parallel search for multiple patterns

  14. i – j + 1 i – j + shift(j) + 1 i - j + shift(j) + next(j) i Input shift(j) + 1 shift(j) + next(j) 1 Pattern j next(j) j - shift(j) Shifted Pattern 1 shift(j) shift and next • Success for first j-1 elements of pattern. Failure for jth element (when input is at i) • Any shift less than shift(j) is guaranteed to lead to failure, • Match elements in the pattern starting at next(j)

  15. General Predicates--Cont p1(t) = (t.price < t.previous.price) p2(t) = (t.price < t.previous.price)  (40<t.price<50) p3(t) = (t.price > t.previous.price)  (t.price<52) p4(t) = (t.price > t.previous.price) And we need to find the implication between this pattern elements

  16. Matrices q and j: Input tested on pj is now tested against pk pj succeeded: pj failed: Combing values of these lower triangular matrices ( j ³ k), We derive the values of next(j) and shift (j)

  17. Example

  18. STAR Patterns SELECT X.NEXT.date, X.NEXT.price, S.previous.date, S.previous.price FROM quote CLUSTER BY name, SEQUENCE BY date AS (*X, Y, *Z, *T, U, *V, S) WHERE X.name='IBM‘ AND X.price > X.previous.price AND 30 < Y.price AND Y.price < 40 AND Z.price < Z.previous.price AND T.price > T.previous.price AND 35 < U.price AND U.price < 40 AND V.price < V.previous.price AND S.price < 30

  19. Handling Star Patterns Same input, Transitions on Original Pattern vs. Transitions on Pattern after the index set back j-k 21  31  32   41  42  43   Example: Elements j and k are star predicates and jk is U: U  j,k+1  j+1,k j+1,k+1

  20. Possible Transitions • Elements j and k are star predicates and jk is U: U  j,k+1  j+1,k j+1,k+1 • Elements j and k are star predicates and jk is 1: 1 j,k+1  j+1,k j+1,k+1 • Elements j and k are not star predicates: j,k j,k+1 j+1,k j+1,k+1

  21. Implication Graph

More Related