210 likes | 322 Views
Rule-based Management of Schema Changes at ETL sources. G. Papastefanatos 1 , P. Vassiliadis 2 , A. Simitsis 3 , T. Sellis 1,4 , Y. Vassiliou 1 (1) National Technical University of Athens, Athens, Hellas (Greece) { gpapas , yv }@ dblab . ece . ntua . gr
E N D
Rule-based Management of Schema Changes at ETL sources G. Papastefanatos1, P. Vassiliadis2, A. Simitsis3, T. Sellis1,4, Y. Vassiliou1 (1) National Technical University of Athens, Athens, Hellas (Greece) {gpapas, yv}@dblab.ece.ntua.gr (2) University of Ioannina, Ioannina, Hellas (Greece) pvassil@cs.uoi.gr (3) HP Labs, Palo Alto, California, USA alkis@hp.com (4) Institute for the Management of Information Systems (Greece) timos@imis.athena-innovation.gr
Outline • Motivation • Graph-based representation of ETL processes • Regulating ETL Evolution • Hecataeus Internals • Conclusions MEDWa ‘09, Riga, September 2009
Outline • Motivation • Graph-based representation of ETL processes • Regulating ETL Evolution • Hecataeus Internals • Conclusions MEDWa ‘09, Riga, September 2009
Data Warehouse Environment MEDWa ‘09, Riga, September 2009
Data Warehouse Schema Evolution • Data warehouses are evolving environments, e.g.: • A dimension is removed or renamed • The structure of a dimension table is updated • A fact table is completely decoupled from a dimension • The measures of a fact table change • An ETL source is modified, etc MEDWa ‘09, Riga, September 2009
Evolving ETL sources… • Schema Changes on the sources of ETL processes. Design constructs are • Added, Removed, Modified • ETL processes affected: • Syntactically – i.e., become invalid • Semantically – i.e., must conform to the new source database semantics • Adaptation of ETL flows • time-consuming task, • treated in most of the cases manually by the administrators/developers MEDWa ‘09, Riga, September 2009
We would like to know... • What part of the process is affected and how if e.g., an attribute is deleted? • Can we predict and handle the impact of changes? • To what extent can readjustment be automated? MEDWa ‘09, Riga, September 2009
Hecataeus Framework • Graphbasedrepresentation of ETL workflows • Evolution events are mapped to changes on the graph constructs • Annotation of graph with rules for adapting ETL processes to source schema evolution • Mechanism for performing what-if analysis for potential changes of ETL sources MEDWa ‘09, Riga, September 2009
Outline • Motivation • Graph-based representation of ETL processes • Regulating ETL Evolution • Hecataeus Internals • Conclusions MEDWa ‘09, Riga, September 2009
ETL Workflow representation MEDWa ‘09, Riga, September 2009
Q: SELECT EMP.Emp#, Sum(WORKS.Hours) as T_Hours FROM EMP, WORKS WHERE EMP.Emp# = WORKS.Emp# GROUP BY EMP.Emp# Join, GB Query representation MEDWa ‘09, Riga, September 2009
Outline • Motivation • Graph-based representation of ETL processes • Regulating ETL Evolution • Hecataeus Internals • Conclusions MEDWa ‘09, Riga, September 2009
Graph Annotation with rules According to prevailing policy, the proper action is taken graph evolution MEDWa ‘09, Riga, September 2009
Event Add attribute Phone to relation EMP Example Q: SELECT EMP.Emp#, EMP.Name FROM EMP Q: SELECT EMP.Emp#, EMP.Name, Phone FROM EMP MEDWa ‘09, Riga, September 2009
Outline • Motivation • Graph-based representation of ETL processes • Regulating ETL Evolution • Hecataeus Internals • Conclusions MEDWa ‘09, Riga, September 2009
System architecture XML , Graph Viewer Import / jpeg Export DDL files Graph Visualization Scenarios SQL scripts Evolution Manager DB Schema representation Metric Manager Workload representation Parser Evolution Semantics Create Validate DB Workload Schema DB Catalog MEDWa ‘09, Riga, September 2009
Evolution Manager Architecture MEDWa ‘09, Riga, September 2009
Outline • Motivation • Graph-based representation of ETL processes • Regulating ETL Evolution • Hecataeus Internals • Conclusions MEDWa ‘09, Riga, September 2009
Research in DB Evolution • DB Schema Evolution • OODB evolution • Schema versioning • DW Schema Evolution • Taxonomy of evolution events • Versioning • Materialized Views Evolution • View adaptation & synchronization • Evolution wrt Model Mappings MEDWa ‘09, Riga, September 2009
Summarizing • The problem of adaptation of ETL workflows to evolvable data sources • Graph –based representation of ETL activities • Graph enrichment with semantics for evolution events • Graph annotation with rules for handling a priori evolution events • Hecataeus: Framework for performing and evaluating evolution scenarios in DW environments MEDWa ‘09, Riga, September 2009
Thank you ... Hecataeus: A tool for visualizing and performing what-if analysis for evolution scenarios http://www.cs.uoi.gr/~pvassil/projects/hecataeus/ MEDWa ‘09, Riga, September 2009