Context Tailoring the DBMS • To support particular applications • Beyond alphanumerical data • Beyond retrieve + process • To support particular hardware • New storage devices • To incorporate novel techniques • New join implementations
Extensibility • Language extensions • Abstract data types (ADT) • User defined functions (UDF) • Data management extensions • New access methods • New storage methods • Query processing extensions • New join methods • New optimization techniques
Starburst Contributions • Revisited internal data structures • Query graph model • Query execution plan: low-level operators and STARs • Mechanisms for extensibility • Rules for query rewrite and plan optimization
Predator Contributions • Enhanced abstract data types • Encapsulation principle applied to storage, optimization and evaluation • Type centric DBMS design
Outline • Introduction • Starburst • Language extensions • Data management extensions • Query processing extensions • Predator • E-ADT processing • Summary
Starburst - Language Extensions • User defined functions (1) • Scalar functions • In: one or more field values from a single tuple • Out: a single value • Aggregate functions • In: one or more field values from several tuples • Out: a single value
Starburst - Language Extensions • User defined functions (2) • Set predicate functions • In: a simple predicate and a subquery (defines the range for the predicate) • Out: a boolean value • Table functions • In: one or several table expressions as well as field values • Out: a relation
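The four UDF categories differ only in what goes in and what comes out. The sketch below shows one hypothetical function of each kind, with a tuple modeled as a plain Python tuple. The names `discounted`, `median`, `majority` and `top_n` are illustrative assumptions, not Starburst's registration API.

```python
from typing import Callable, Iterable, List, Tuple

Row = Tuple  # hypothetical: a tuple is modeled as a plain Python tuple


def discounted(price: float, rate: float) -> float:
    """Scalar UDF: field values from one tuple in, one value out."""
    return price * (1.0 - rate)


def median(values: Iterable[float]) -> float:
    """Aggregate UDF: one field from many tuples in, one value out."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("median of an empty group")
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def majority(predicate: Callable[[Row], bool], subquery: Iterable[Row]) -> bool:
    """Set predicate UDF: true if the predicate holds for more than half
    of the rows produced by the subquery (the predicate's range)."""
    rows = list(subquery)
    return 2 * sum(1 for row in rows if predicate(row)) > len(rows)


def top_n(table: Iterable[Row], key_index: int, n: int) -> List[Row]:
    """Table UDF: a table expression plus field values in, a relation out."""
    return sorted(table, key=lambda row: row[key_index], reverse=True)[:n]
```

A set predicate such as `majority` generalizes the built-in `EXISTS`/`ALL` quantifiers. A table function can appear wherever a relation is expected.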
Starburst – Language Extensions • Abstract data types • Considered useful for: • Type checking • Structuring of users’ data • Add-on to the system design
Starburst – Data Management Extensions • Uniform record structure: • Header + offset directory + data area • Advantages: • Support for nested records • Treatment of null values and variable length fields • Disadvantages: • Overhead per record due to the offset directory • Core system services • Logging, recovery manager, predicate evaluator, event queues, lock manager, interface to OS services, debugging, tracing, error reporting.
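To see why an offset directory handles nulls, variable-length fields and nesting, consider the minimal sketch below. The byte layout is a hypothetical one chosen for illustration, not Starburst's actual format. It has a fixed header (field count plus a null bitmap), one 2-byte offset per field boundary, and then the concatenated field bytes.

```python
import struct
from typing import List, Optional

# Hypothetical layout, little-endian:
#   header           : field count (uint16) + null bitmap (uint32, so <= 32 fields)
#   offset directory : count + 1 uint16 offsets into the data area
#   data area        : field bytes, concatenated
HEADER = struct.Struct("<HI")


def encode_record(fields: List[Optional[bytes]]) -> bytes:
    if len(fields) > 32:
        raise ValueError("this sketch supports at most 32 fields")
    null_bits = 0
    offsets = [0]
    data = bytearray()
    for i, value in enumerate(fields):
        if value is None:
            null_bits |= 1 << i  # null: flag it and store no bytes
        else:
            data += value  # variable length: any number of bytes
        offsets.append(len(data))
    directory = struct.pack(f"<{len(offsets)}H", *offsets)  # fails if data > 65535 bytes
    return HEADER.pack(len(fields), null_bits) + directory + bytes(data)


def decode_field(record: bytes, i: int) -> Optional[bytes]:
    count, null_bits = HEADER.unpack_from(record, 0)
    if not 0 <= i < count:
        raise IndexError(i)
    if null_bits & (1 << i):
        return None
    directory_start = HEADER.size
    start, end = struct.unpack_from("<2H", record, directory_start + 2 * i)
    data_start = directory_start + 2 * (count + 1)
    return record[data_start + start:data_start + end]
```

A nested record is simply an encoded record stored as a field's bytes. The cost noted on the slide is also visible here: even a one-field record carries 6 header bytes and 4 directory bytes.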
Starburst – Data Management Extensions • Storage methods [associated with a relation] • Run-time methods for accessing relations: scan, fetch, insert, update, delete, destroy • Implementation: the run-time methods are registered in vector lists • Compile-time cost estimates • Attachments [associated with a relation] • Access methods, integrity constraints and trigger extensions
Starburst – Data Management Extensions • Advantages • New storage methods and attachments can be added without modifying existing code • Limitations • Attachments are only called after storage methods • Attachments are called in a fixed order
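The dispatch scheme on the two slides above can be sketched in a few lines. This is a simplified model: storage methods are tables of routines registered in a list (the "vector list") and selected by index, while attachments are callbacks run after the storage routine. The heap storage method and hash-index attachment are hypothetical examples, and only `insert` and `scan` are modeled.

```python
from typing import Any, Callable, Dict, List

# "Vector list": storage-method id -> table of run-time routines.
STORAGE_METHODS: List[Dict[str, Callable[..., Any]]] = []


def register_storage_method(routines: Dict[str, Callable[..., Any]]) -> int:
    STORAGE_METHODS.append(routines)
    return len(STORAGE_METHODS) - 1


def heap_insert(state: list, row: tuple) -> int:
    state.append(row)
    return len(state) - 1  # record id = position in the list


def heap_scan(state: list):
    return iter(list(state))


HEAP = register_storage_method({"insert": heap_insert, "scan": heap_scan})


class Relation:
    def __init__(self, storage_method: int):
        self.storage_method = storage_method
        self.state: list = []
        self.attachments: List[Callable[[tuple, int], None]] = []

    def insert(self, row: tuple) -> int:
        routines = STORAGE_METHODS[self.storage_method]
        rid = routines["insert"](self.state, row)
        # Attachments run only after the storage method, in registration order.
        for attachment in self.attachments:
            attachment(row, rid)
        return rid

    def scan(self):
        return STORAGE_METHODS[self.storage_method]["scan"](self.state)


def hash_index_on_first_column(index: Dict[Any, List[int]]) -> Callable[[tuple, int], None]:
    """Hypothetical access-method attachment that maintains a hash index."""
    def on_insert(row: tuple, rid: int) -> None:
        index.setdefault(row[0], []).append(rid)
    return on_insert
```

Adding a storage method is one `register_storage_method` call, and `Relation` itself never changes. The fixed "storage first, then attachments in order" loop is exactly the limitation the slide lists.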
Starburst – Query Processing Extensions Internal representation of queries • Query graph model • Beyond parse trees for the low-level plan operators • Used for query rewrite • Query execution plan • Operator-based representation • Strategy alternative rules (STARs) to represent execution plans • Used for query plan generation
Query Graph Model • Boxes • Stored relations • Derived relations • Vertices • Setformer iterators: produce tuples for a derived relation • Quantifier iterators: restrict tuples for a derived relation • Edges • Range edges connecting a vertex and a box: access to a stored or a derived relation • Qualifier edges connecting one or more vertices: conjunction of predicates
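The QGM vocabulary above maps naturally onto a few data classes. The sketch below is a simplified, hypothetical model rather than Starburst's internal structures. Predicates are kept as opaque text, and the helper `stored_inputs` follows range edges down to base tables.

```python
from dataclasses import dataclass, field
from typing import List, Set, Union


@dataclass
class StoredBox:
    """Box for a stored (base) relation."""
    name: str


@dataclass
class Vertex:
    """Iterator inside a derived box; kind is 'setformer' or 'quantifier'."""
    name: str
    kind: str
    ranges_over: "Box"  # range edge to a stored or derived box


@dataclass
class Predicate:
    """Qualifier edge: one conjunct connecting one or more vertices."""
    text: str
    vertices: List[Vertex]


@dataclass
class DerivedBox:
    """Box for a derived relation (a query block or a view)."""
    name: str
    vertices: List[Vertex] = field(default_factory=list)
    predicates: List[Predicate] = field(default_factory=list)


Box = Union[StoredBox, DerivedBox]


def stored_inputs(box: Box) -> Set[str]:
    """Follow range edges down to the stored relations a box depends on."""
    if isinstance(box, StoredBox):
        return {box.name}
    names: Set[str] = set()
    for vertex in box.vertices:
        names |= stored_inputs(vertex.ranges_over)
    return names
```

For example, a view box over `emp` and a query box that ranges over that view and `dept` both resolve to the stored relations `emp` and `dept`.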
Query Rewrite • Objectives: • Equivalent representation for alternative phrasings of a query • Only the DBMS can rewrite queries involving views • Example rules: • Views may be merged • Redundant joins may be eliminated • Selections may be pushed down
Query Rewrite Rules • A rule transforms a QGM into another QGM • Condition/action rules: IF condition THEN action • Rule engine • Forward chaining • Various control strategies for rule application • Search strategy • Top-down (depth-first or breadth-first) or bottom-up
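Here is a minimal forward-chaining rewrite engine with two condition/action rules taken from the examples on the previous slide: merging stacked selections, as happens when a view is merged, and pushing selections down. For brevity it rewrites a small operator tree instead of a full QGM, which is a simplifying assumption. Each conjunct records the tables it references. The control strategy shown is top-down and depth-first, firing one rule per pass until no rule applies.

```python
from typing import FrozenSet, Tuple

Conjunct = Tuple[FrozenSet[str], str]  # (tables referenced, predicate text)
# Nodes: ("scan", name) | ("select", frozenset of Conjunct, child) | ("join", left, right)


def tables(node) -> FrozenSet[str]:
    if node[0] == "scan":
        return frozenset([node[1]])
    if node[0] == "select":
        return tables(node[2])
    return tables(node[1]) | tables(node[2])


def merge_selects(node):
    """IF a selection sits directly on a selection THEN merge their predicates."""
    if node[0] == "select" and node[2][0] == "select":
        return ("select", node[1] | node[2][1], node[2][2])
    return None


def push_selection(node):
    """IF a selection sits on a join and a conjunct uses one side only THEN push it down."""
    if node[0] != "select" or node[2][0] != "join":
        return None
    _, preds, (_, left, right) = node
    lt, rt = tables(left), tables(right)
    to_left = frozenset(c for c in preds if c[0] <= lt)
    to_right = frozenset(c for c in preds if c[0] <= rt and c not in to_left)
    if not to_left and not to_right:
        return None
    if to_left:
        left = ("select", to_left, left)
    if to_right:
        right = ("select", to_right, right)
    rest = preds - to_left - to_right
    join = ("join", left, right)
    return ("select", rest, join) if rest else join


RULES = [merge_selects, push_selection]


def apply_once(node):
    """Fire the first applicable rule, trying this node before its children."""
    for rule in RULES:
        new = rule(node)
        if new is not None:
            return new, True
    if node[0] == "select":
        child, changed = apply_once(node[2])
        return ("select", node[1], child), changed
    if node[0] == "join":
        left, changed = apply_once(node[1])
        if changed:
            return ("join", left, node[2]), True
        right, changed = apply_once(node[2])
        return ("join", node[1], right), changed
    return node, False


def rewrite(node):
    changed = True
    while changed:  # forward chaining until a fixpoint
        node, changed = apply_once(node)
    return node
```

The join predicate, which references both tables, stays above the join, while single-table conjuncts sink toward the scans.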
How to Choose Between Alternative Rules? • Cost-based decision • Problem: cost estimates are only known at the query execution plan level • Approach: several alternatives are kept in the QGM using a CHOOSE operation
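The CHOOSE idea is to defer the decision. The rewrite phase keeps equivalent alternatives side by side, and the choice is made once a cost function is available at plan level. The sketch below assumes a tuple-based plan representation and a made-up per-operator cost table (`OP_COST`), both purely illustrative.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Choose:
    """Placeholder holding equivalent alternatives whose cost is not yet known."""
    alternatives: Tuple["Plan", ...]


Plan = Union[str, tuple, Choose]  # str = table name, tuple = (operator, *children)

# Hypothetical cost table, standing in for real plan-level estimates.
OP_COST = {"scan": 1000, "index_scan": 20, "filter": 10, "project": 1}


def toy_cost(plan) -> int:
    if isinstance(plan, str):
        return 0
    return OP_COST[plan[0]] + sum(toy_cost(child) for child in plan[1:])


def resolve(plan, cost=toy_cost):
    """Replace every Choose by its cheapest (already resolved) alternative."""
    if isinstance(plan, Choose):
        return min((resolve(alt, cost) for alt in plan.alternatives), key=cost)
    if isinstance(plan, tuple):
        return (plan[0],) + tuple(resolve(child, cost) for child in plan[1:])
    return plan
```

With these made-up costs, `("project", Choose((("filter", ("scan", "emp")), ("index_scan", "emp"))))` resolves to the index-scan alternative (cost 21 against 1011).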
Query Execution Plan The execution plan is represented using production rules: • Terminals: low-level plan operators • In: 0 or more streams of tuples • Out: 0 or more streams of tuples • Each stream of tuples is tagged with properties • Relational: schema information • Operational: order, location • Estimated: • Nonterminals: STARs • Name • Alternative definitions in terms of low-level plan operators or other STARs
Query Execution Plan • A query execution plan is a tree of low-level plan operators • STAR production rules are used for generating query execution plans • General-purpose STAR evaluator • Search strategy to choose the next STAR to apply • Vector list of STARs
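STARs behave like grammar productions. A nonterminal has alternative definitions, each built from low-level plan operators (terminals) or references to other STARs, and an alternative may carry a condition of applicability. The sketch below is a generic evaluator under these assumptions: the STAR names, operator names and the `INDEXED` catalog are hypothetical. It expands exhaustively rather than using a search strategy, and it omits property tagging.

```python
from itertools import product

INDEXED = {"emp"}  # hypothetical catalog: tables that have an index

# A STAR reference inside a definition: ("STAR", name, *args).
# Each alternative returns a plan template, or None if its condition fails.
STARS = {
    "AccessRoot": [
        lambda t: ("SCAN", t),
        lambda t: ("ISCAN", t) if t in INDEXED else None,
    ],
    "JoinRoot": [
        lambda a, b: ("NLJOIN", ("STAR", "AccessRoot", a), ("STAR", "AccessRoot", b)),
        lambda a, b: ("MGJOIN", ("SORT", ("STAR", "AccessRoot", a)),
                      ("SORT", ("STAR", "AccessRoot", b))),
    ],
}


def expand(term):
    """Return every operator tree derivable from term."""
    if not isinstance(term, tuple):
        return [term]  # a table name or other plain argument
    if term[0] == "STAR":
        _, name, *args = term
        plans = []
        for definition in STARS[name]:
            body = definition(*args)
            if body is not None:
                plans.extend(expand(body))
        return plans
    op, *children = term
    # Terminal operator: combine every expansion of every child.
    return [(op, *combo) for combo in product(*(expand(c) for c in children))]
```

Because `expand(("STAR", "JoinRoot", "emp", "dept"))` yields four plans (two access paths for `emp` times two join methods), new join methods can be added by appending an alternative to the `JoinRoot` entry without touching the evaluator.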
Starburst Contributions • Revisited internal data structures • Query graph model • Query execution plan: low-level operators and STARs • Mechanisms for extensibility • Rules for query rewrite and plan optimization
Outline • Introduction • Starburst • Language extensions • Data management extensions • Query processing extensions • Predator • E-ADT processing • Summary
Basic Techniques for ADTs • Vector list of ADTs • Each ADT implements: • A common internal interface for access to ADT values • Functions for storage and indexed retrieval • Methods associated with the ADT • ADT methods can be composed • The DBMS understands only minimal semantics of each method: the “black box” ADT approach
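The "black box" approach can be sketched as a registry of ADTs. Each entry exposes a common value interface (to and from the stored form) plus a dictionary of methods, and the DBMS simply calls the methods in sequence. The `ADT` class and the `point` type below are hypothetical illustrations.

```python
import math
import struct
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Sequence, Tuple


@dataclass
class ADT:
    name: str
    to_bytes: Callable[[Any], bytes]    # common internal interface: value -> stored form
    from_bytes: Callable[[bytes], Any]  # stored form -> value
    methods: Dict[str, Callable[..., Any]] = field(default_factory=dict)


ADT_VECTOR: List[ADT] = []


def register_adt(adt: ADT) -> int:
    ADT_VECTOR.append(adt)
    return len(ADT_VECTOR) - 1


def call_chain(type_id: int, stored: bytes, calls: Sequence[Tuple[str, tuple]]) -> Any:
    """Compose methods left to right. Each call is opaque to the DBMS:
    it cannot look inside, reorder, or cost the methods."""
    adt = ADT_VECTOR[type_id]
    value = adt.from_bytes(stored)
    for method, args in calls:
        value = adt.methods[method](value, *args)
    return value


POINT = register_adt(ADT(
    name="point",
    to_bytes=lambda p: struct.pack("<2d", *p),
    from_bytes=lambda b: struct.unpack("<2d", b),
    methods={
        "translate": lambda p, dx, dy: (p[0] + dx, p[1] + dy),
        "norm": lambda p: math.hypot(p[0], p[1]),
    },
))
```

Composition works, but `call_chain` has no way to tell that a method chain could be evaluated more cheaply. This gap motivates E-ADTs.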
Motivation for E-ADTs • Basic observation: • ADT methods can be expensive! • Need to identify optimizations on ADT methods • Need to define a framework for applying these optimizations systematically
Possible Optimizations • Algorithmic: • Using different algorithms for each method depending on data characteristics • Transformational: • Changing the order of methods • Constraint: • Pushing physical constraints through a method • Pipelining: • Avoiding materialization of intermediate results
Architectural Framework Each E-ADT supports some of the following enhancements: • Optimization: transforms a method expression into a query execution plan expression • Evaluation: routines to execute the query execution plan expression • Catalog management: routines to store schema information and maintain statistics • Storage management: physical representation of values of its type
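The sketch below shows one type that implements all four enhancements, and it also illustrates the "algorithmic" optimization from the previous slide. The type is a hypothetical list-of-integers E-ADT. It keeps a catalog statistic (whether every stored value is sorted), and its optimizer uses that statistic to choose between binary search and a linear scan for a `contains` method. The class, method names and plan format are assumptions, not Predator's interfaces.

```python
import bisect
import struct
from typing import List, Tuple


class IntListEADT:
    """Hypothetical E-ADT for lists of 32-bit integers."""

    def __init__(self):
        # Catalog management: statistics maintained by the type itself.
        self.stats = {"values": 0, "all_sorted": True}

    # Storage management: physical representation of values.
    def store(self, value: List[int]) -> bytes:
        self.stats["values"] += 1
        self.stats["all_sorted"] = self.stats["all_sorted"] and value == sorted(value)
        return struct.pack(f"<{len(value)}i", *value)

    def load(self, data: bytes) -> List[int]:
        return list(struct.unpack(f"<{len(data) // 4}i", data))

    # Optimization: method expression -> plan expression.
    def optimize(self, method: str, arg: int) -> Tuple[str, int]:
        if method != "contains":
            raise ValueError(f"unknown method {method!r}")
        algo = "binary_search" if self.stats["all_sorted"] else "linear_scan"
        return (algo, arg)

    # Evaluation: execute the plan expression on a stored value.
    def evaluate(self, plan: Tuple[str, int], data: bytes) -> bool:
        algo, arg = plan
        values = self.load(data)
        if algo == "binary_search":
            i = bisect.bisect_left(values, arg)
            return i < len(values) and values[i] == arg
        return arg in values
```

Because the statistic lives with the type, the generic optimizer needs no knowledge of lists. Once an unsorted value is stored, later `optimize` calls fall back to the linear scan.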
E-ADT Rewrite Rules • Some of the optimizations for ADT methods can be applied on a logical representation of queries using rewrite rules
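As an example of a logical rewrite rule on method expressions, take a hypothetical sequence-like type with two methods. `scale(e, k)` multiplies every element by k, and `slice(e, a, b)` keeps elements a to b-1. Since elementwise scaling commutes with slicing, the rule `slice(scale(e, k), a, b) → scale(slice(e, a, b), k)` is safe. It changes the method order and pushes the size constraint down, so only the kept elements are multiplied. The types, rule and cost counter below are illustrative, not taken from Predator.

```python
from typing import Any, Dict, List

# Expressions: ("var", name) | ("scale", expr, k) | ("slice", expr, start, stop)


def rewrite(expr):
    """Apply slice(scale(e, k), a, b) -> scale(slice(e, a, b), k) everywhere."""
    kind = expr[0]
    if kind == "slice":
        inner = rewrite(expr[1])
        if inner[0] == "scale":
            _, base, k = inner
            # Push the slice further down in case base is another scale.
            return ("scale", rewrite(("slice", base, expr[2], expr[3])), k)
        return ("slice", inner, expr[2], expr[3])
    if kind == "scale":
        return ("scale", rewrite(expr[1]), expr[2])
    return expr


def evaluate(expr, env: Dict[str, List[float]], work: Dict[str, int]) -> List[Any]:
    """Evaluate an expression, counting multiplications in work['mults']."""
    kind = expr[0]
    if kind == "var":
        return env[expr[1]]
    values = evaluate(expr[1], env, work)
    if kind == "scale":
        work["mults"] += len(values)
        return [v * expr[2] for v in values]
    return values[expr[2]:expr[3]]
```

On a 1,000-element input, `slice(scale(x, 2), 0, 3)` costs 1,000 multiplications, while its rewritten form costs 3 and returns the same result.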
Predator Contributions • Enhanced abstract data types • Encapsulation principle applied to storage, optimization and evaluation • Type centric DBMS design