1 / 24

Managing and Querying Transaction-time Databases under Schema Evolution

sharvani
Download Presentation

Managing and Querying Transaction-time Databases under Schema Evolution

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


    1. Managing and Querying Transaction-time Databases under Schema Evolution Hyun J. Moon (UCLA) Carlo Curino (Politecnico di Milano) Alin Deutsch (UC San Diego) Chien-Yi Hou (UC San Diego) Carlo Zaniolo (UCLA)

    2. PRIMA Hyun Moon 2

    3. PRIMA Hyun Moon 3

    4. PRIMA Hyun Moon 4

    5. PRIMA Hyun Moon 5 Outline Background: Transaction-time DBs with Fixed Schemas Challenges with Schema Evolution How to Archive? How to Query? Optimization

    6. PRIMA Hyun Moon 6 Schema Evolution Schemas change. Why? Data reorganization Better logical design, performance Changes in the modeled reality New data collection, M&A and restructuring, government regulations Is it real? Wikipedia DB: 171 schema versions over 4.5 years [ICEIS08] More case studies [Marche93] [Sjoberg93]

    7. PRIMA Hyun Moon 7 Schema Modification Operators (SMOs) Language for schema change Procedural fashion Do this, do that, … Easier for regular DBAs Similar to primitives in [Bernstein06]

    8. PRIMA Hyun Moon 8

    9. PRIMA Hyun Moon 9 Previous work: relational approaches Single-pool and Multi-pool [DeCastro97] Problem: temporally ungrouped, i.e. redundant, expensive coalescing Our approach: MV-Document Extending V-Document for evolving schemas Temporally grouped!

    10. PRIMA Hyun Moon 10 MV-Document: Example

    11. PRIMA Hyun Moon 11 Problem 2: How to Query w/ Schema Changes? Manual Querying Write one query per version Doesn’t scale: 100’s of versions Schema Versioning by Data Translation Translate data into the queried version. Survey [Roddick95] Inefficient! Implementation by Query Rewriting Use the above method as the semantics definition Rewrite the input query into source versions Efficient!

    12. PRIMA Hyun Moon 12 Query Rewriting MARS for a query rewriting engine [VLDB03] Input: XQuery, XICs Output: XQuery Chase input query to find an equivalent query modulo XICs PRIMA translate SMO into XICs XML Integrity Constraints ICs for XML ~ First-order logic with XPath Simple case: MERGE TABLE S, T into R (from v1 to v2)

    13. PRIMA Hyun Moon 13 Query Rewriting: A Simple Example engineerpersonnel(empno, name, hiredate, title, deptname) otherpersonnel(empno, name, hiredate, title, deptname) MERGE TABLE engineerpersonnel, otherpersonnel into empacct empacct(empno, name, hiredate, title, deptname)

    14. PRIMA Hyun Moon 14 Rewriting Optimization Problem: Rewriting with many constraints is expensive Goal: minimize constraints used in rewriting SMO Prune use SMOs that affect the tables/columns used in the input query SMO Compress group SMOs of a same type and produce a single mapping

    15. PRIMA Hyun Moon 15 Query Optimization Problem: query rewriting is for non-temporal queries Goal: exploit queries’ temporal semantics to optimize Minimal Source-version Detection E.g. Find Joe’s salary at 2007-01-01 We don’t need to query other schema versions Temporal Join Detection Temporal join: a join with validity overlap We don’t need to perform temporal joins across versions

    16. PRIMA Hyun Moon 16 Experiment Results (1/2) Query rewriting scales with many schema versions? Wikipedia’s 171 schema versions between Apr 2003 and Nov 2007 Top 20 queries from Wikipedia’s real query workload

    17. PRIMA Hyun Moon 17 Experiment Results (2/2)

    18. PRIMA Hyun Moon 18 Challenges of Schema Evolution & Panta Rhei Framework Show that we’re working and making progress on SE, in several aspects SMO is common, which unifies our approach – mention the SMOs in the previous slidesShow that we’re working and making progress on SE, in several aspects SMO is common, which unifies our approach – mention the SMOs in the previous slides

    19. PRIMA Hyun Moon 19 Conclusions Transaction-time DB is a great thing But, it doesn’t support evolving schemas We propose an effective and efficient solution Temporal data model for schema evolution, Efficient query answering, by query rewriting Optimization for rewriting and queries

    20. PRIMA Hyun Moon 20 Thank you!

    21. Extra Slides Smile! Enjoy! Introduce myself Introduce collaborators – 2 faculties, 3 students, 3 schools – collaboration/expertise brought together Smile! Enjoy! Introduce myself Introduce collaborators – 2 faculties, 3 students, 3 schools – collaboration/expertise brought together

    22. PRIMA Hyun Moon 22 Presentation in a Nutshell Problem Transaction-time DBs have strong applications Study cases reveal that schema evolution is a real problem How to support evolving schemas in transaction-time DB? Challenges and our contributions Data model: how to archive, with evolving schemas? Previous approach: relational-based approaches, which are temporally ungrouped Our contribution: XML-based temporally grouped data model Query support: how to query evolving data with evolving schemas? Previous approach: schema versioning, based on data translation Our contribution: schema versioning based on query rewriting Solution scalability: how to rewrite complex temporal queries over hundreds of schema versions Our contribution: mapping pruning, rewriting using minimal source versions and more

    23. PRIMA Hyun Moon 23 PRIMA in Schema Versioning Taxonomy Schema versioning supported Schema modification – allow schema changes Schema evolution – above + data preserved after schema changes Schema versioning – above + any schema version can be used for querying/updating Single-pool supported Single-pool: one big table for all schema versions, using completed schema, single line of history Multi-pool: one table for each schema, multiple lines of history Synchronous management of versioned data and schemata Synchronous – temporal data conform to schema of same time Asynchronous – temporal data may not conform to schema of same time Partial schema versioning Partial schema versioning – supports querying, retrospectively and prospectively. Update allowed only on a designated (usually current) schema version Full schema versioning – supports both querying and update retrospectively and prospectively

    24. PRIMA Hyun Moon 24

More Related