1 / 16

Access methods for time-evolving data

Presenters : Virag Kothari ,Vandana Ayyalasomayajula Date: 04/21/2010. Access methods for time-evolving data. Outline. Introduction to temporal databases Goal of the paper Access method costs Queries Index pagination & Data Clustering

cate
Download Presentation

Access methods for time-evolving data

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Presenters : Virag Kothari ,Vandana Ayyalasomayajula Date: 04/21/2010 Access methods for time-evolving data

  2. Outline • Introduction to temporal databases • Goal of the paper • Access method costs • Queries • Index pagination & Data Clustering • Efficient method design for transaction data • References

  3. Introduction • Based on time dimension • Transaction time database • Valid time database • Bitemporal database

  4. Goal • Attempt to identify the implications for access method design from support of each time dimension • In this presentation, transaction time databases are considered.

  5. Access Method Costs • Performance of an access method depends on • storage space to physically store the data records and the structures of the access method • update processing time (the time to update the method’s data structures as a result of a change) • the query time for each of the basic queries ( discussed in the next slide !)

  6. Queries • Given a contiguous interval T, find all objects alive during this interval. • Given a key range and a contiguous time interval T, find the objects with keys in the given range that are alive during interval T. • Given a key range, find the history of the objects in this range.

  7. Queries - Special cases ! • “transaction pure-timeslice” • A special case of class (I) occurs when interval T is reduced to a single transaction time instant t. • “transaction range-timeslice” • representative case of class (II) where the time interval is reduced to a single transaction time instant. • “transaction pure-key query” • representative case of class (III), key range is reduced to a single key

  8. Cost parameters • In the case of transaction , Bitemporal databases, • n - > summation of insertions, deletions, and modification updates. • For Valid time databases, • L - > the number of interval objects currently stored in the method, i.e., the size of the collection • a -> to denote the answer size of a query in general.

  9. Index pagination & Data Clustering • Cost depends on IO cost ! • Performance of an index depends on how well it is ‘Paginated’ • Example: B+ trees. • Data Clustering improves performance by storing logically near data , physically close on the disk. • pure-timeslice query takes O(logBn + a/B ) page accesses. This method is more I/O efficient than another method that solves the same query in O(logBn + a) page accesses.

  10. Efficient Method Design for Transaction • Transaction Pure-Timeslice Query • ‘copy’ approach • Stores a copy of the transaction database state s(t) (timeslice) for each transaction time that at least one change occurred • Copies are indexed by time t. • ‘log’ approach • Stores only the changes that occur in the database timestamped by the time instant on which they occurred. • Copies indexed by time t.

  11. Comparison –Transaction pure timeslice

  12. Transaction Pure Key • “copy” and “log” solutions could be used for the pure-key query. But they are both very inefficient !!. • A better solution is to store the history of each key separately, i.e., cluster data by key only. • Access to a key’s (transaction time) history can be implemented by a hashing function or B tree. • The list of versions of each key can be further organized in a separate array indexed by transaction time to answer a pure-key query with time predicate .

  13. Costs – Transaction pure key • Cost to index into hash table or B tree + cost of searching in the array. • Array length can be n/B , so cost would O( log B n) .

  14. Transaction Range-Timeslice • To answer a range query efficiently, it is best to cluster by transaction time and key within pages. • Very similar to spatial indexing concept. • Two dimensions , time & key need to be considered. • Data bounding technique • Another possibility data mapping, maps a record to three (or more) coordinates –transaction start_time, end-_time, and key(s)—and then uses a multiattribute point index.

  15. References • Betty Salzberg, Vassilis J. Tsotras: Comparison of Access Methods forTime-Evolving Data. ACM Comput. Surv. (CSUR) 31(2):158-221 (1999)

  16. Thank you!!

More Related