240 likes | 427 Views
E N D
1. Managing and Querying Transaction-time Databases under Schema Evolution Hyun J. Moon (UCLA)
Carlo Curino (Politecnico di Milano)
Alin Deutsch (UC San Diego)
Chien-Yi Hou (UC San Diego)
Carlo Zaniolo (UCLA)
2. PRIMA Hyun Moon 2
3. PRIMA Hyun Moon 3
4. PRIMA Hyun Moon 4
5. PRIMA Hyun Moon 5 Outline Background: Transaction-time DBs with Fixed Schemas
Challenges with Schema Evolution
How to Archive?
How to Query?
Optimization
6. PRIMA Hyun Moon 6 Schema Evolution Schemas change. Why?
Data reorganization
Better logical design, performance
Changes in the modeled reality
New data collection, M&A and restructuring, government regulations
Is it real?
Wikipedia DB: 171 schema versions over 4.5 years [ICEIS08]
More case studies [Marche93] [Sjoberg93]
7. PRIMA Hyun Moon 7 Schema Modification Operators (SMOs) Language for schema change
Procedural fashion
Do this, do that, … Easier for regular DBAs
Similar to primitives in [Bernstein06]
8. PRIMA Hyun Moon 8
9. PRIMA Hyun Moon 9 Previous work: relational approaches
Single-pool and Multi-pool [DeCastro97]
Problem: temporally ungrouped, i.e. redundant, expensive coalescing
Our approach: MV-Document
Extending V-Document for evolving schemas
Temporally grouped!
10. PRIMA Hyun Moon 10 MV-Document: Example
11. PRIMA Hyun Moon 11 Problem 2: How to Query w/ Schema Changes? Manual Querying
Write one query per version
Doesn’t scale: 100’s of versions
Schema Versioning by Data Translation
Translate data into the queried version. Survey [Roddick95]
Inefficient!
Implementation by Query Rewriting
Use the above method as the semantics definition
Rewrite the input query into source versions
Efficient!
12. PRIMA Hyun Moon 12 Query Rewriting MARS for a query rewriting engine [VLDB03]
Input: XQuery, XICs
Output: XQuery
Chase input query to find an equivalent query modulo XICs
PRIMA translate SMO into XICs
XML Integrity Constraints
ICs for XML
~ First-order logic with XPath
Simple case: MERGE TABLE S, T into R (from v1 to v2)
13. PRIMA Hyun Moon 13 Query Rewriting: A Simple Example engineerpersonnel(empno, name, hiredate, title, deptname)
otherpersonnel(empno, name, hiredate, title, deptname)
MERGE TABLE engineerpersonnel, otherpersonnel into empacct
empacct(empno, name, hiredate, title, deptname)
14. PRIMA Hyun Moon 14 Rewriting Optimization Problem: Rewriting with many constraints is expensive
Goal: minimize constraints used in rewriting
SMO Prune
use SMOs that affect the tables/columns used in the input query
SMO Compress
group SMOs of a same type and produce a single mapping
15. PRIMA Hyun Moon 15 Query Optimization Problem: query rewriting is for non-temporal queries
Goal: exploit queries’ temporal semantics to optimize
Minimal Source-version Detection
E.g. Find Joe’s salary at 2007-01-01
We don’t need to query other schema versions
Temporal Join Detection
Temporal join: a join with validity overlap
We don’t need to perform temporal joins across versions
16. PRIMA Hyun Moon 16 Experiment Results (1/2) Query rewriting scales with many schema versions?
Wikipedia’s 171 schema versions between Apr 2003 and Nov 2007
Top 20 queries from Wikipedia’s real query workload
17. PRIMA Hyun Moon 17 Experiment Results (2/2)
18. PRIMA Hyun Moon 18 Challenges of Schema Evolution & Panta Rhei Framework Show that we’re working and making progress on SE, in several aspects
SMO is common, which unifies our approach – mention the SMOs in the previous slidesShow that we’re working and making progress on SE, in several aspects
SMO is common, which unifies our approach – mention the SMOs in the previous slides
19. PRIMA Hyun Moon 19 Conclusions Transaction-time DB is a great thing
But, it doesn’t support evolving schemas
We propose an effective and efficient solution
Temporal data model for schema evolution,
Efficient query answering, by query rewriting
Optimization for rewriting and queries
20. PRIMA Hyun Moon 20 Thank you!
21. Extra Slides Smile!
Enjoy!
Introduce myself
Introduce collaborators – 2 faculties, 3 students, 3 schools – collaboration/expertise brought together
Smile!
Enjoy!
Introduce myself
Introduce collaborators – 2 faculties, 3 students, 3 schools – collaboration/expertise brought together
22. PRIMA Hyun Moon 22 Presentation in a Nutshell Problem
Transaction-time DBs have strong applications
Study cases reveal that schema evolution is a real problem
How to support evolving schemas in transaction-time DB?
Challenges and our contributions
Data model: how to archive, with evolving schemas?
Previous approach: relational-based approaches, which are temporally ungrouped
Our contribution: XML-based temporally grouped data model
Query support: how to query evolving data with evolving schemas?
Previous approach: schema versioning, based on data translation
Our contribution: schema versioning based on query rewriting
Solution scalability: how to rewrite complex temporal queries over hundreds of schema versions
Our contribution: mapping pruning, rewriting using minimal source versions and more
23. PRIMA Hyun Moon 23 PRIMA in Schema Versioning Taxonomy Schema versioning supported
Schema modification – allow schema changes
Schema evolution – above + data preserved after schema changes
Schema versioning – above + any schema version can be used for querying/updating
Single-pool supported
Single-pool: one big table for all schema versions, using completed schema, single line of history
Multi-pool: one table for each schema, multiple lines of history
Synchronous management of versioned data and schemata
Synchronous – temporal data conform to schema of same time
Asynchronous – temporal data may not conform to schema of same time
Partial schema versioning
Partial schema versioning – supports querying, retrospectively and prospectively. Update allowed only on a designated (usually current) schema version
Full schema versioning – supports both querying and update retrospectively and prospectively
24. PRIMA Hyun Moon 24