The State of the Art in Distributed Query Processing

The State of the Art in Distributed Query Processing, by Donald Kossmann • Presented by Chris Gianfrancesco

Introduction • Distributed database technology is becoming an increasingly attractive enhancement to many database systems, for several reasons: • Cost and scalability • Software integration • Legacy systems • New applications • Market forces

Introduction • Topics covered in this paper • Basics of distributed query processing • Client-server distributed DB models • Heterogeneous distributed DB models • Data placement techniques • Other distributed architectures

Client-Server Database Systems • Relationships between distributed nodes take a client-server form • Client: makes requests of the servers, usually the source of queries • Server: responds to client requests, usually the source of data • System architectures: peer-to-peer, strict client-server, middleware/multitier

Architectures: Peer-to-Peer • All nodes are equivalent • Each can be either a client or server on demand (can store data and/or make requests) • Ex: SHORE system

Architectures: Strict Client-Server • Client or server status is pre-defined and can never change • Clients supply queries, servers supply data • Most common architecture in commercial DBMSs

Architectures: Middleware/Multitier • Multiple levels of client-server interaction • Nodes act as clients to those below them and servers to those above • SAP R/3, web servers with DB backends

Architectures: Evaluation • Peer-to-Peer: simplest setup, equal load sharing • Strict Client-Server: specialization, administration for servers only • Middleware/Multitier: functionality integration, scalability

Client-Server Query Processing • Queries initiated at clients, data stored at servers • Where do we execute the query? • Query shipping: move the query down to the data • Data shipping: move the data up to the query • Hybrid shipping: combination of both

Query Shipping • SQL query code is sent down to the server • Server parses and evaluates query, returns result • Used in DB2, Oracle, MS SQL Server

Data Shipping • Client parses query and requests data from server • Server provides data, then client executes query • Data can be cached at client (main memory or disk)

Hybrid Shipping • Mix-and-match data shipping and query shipping • Query parts can be executed at any level according to query plan • Data is cached when beneficial

Evaluation • Query Shipping: reliant on server performance, scales poorly with increasing client load • Data Shipping: good scalability, high communication costs • Hybrid Shipping: potential to outperform the other options, more complex optimization

Hybrid Shipping Observations • Some observations about optimal performance using hybrid shipping • Prefer not to use the client cache if network transfer cost < client access cost • Ship cached data down to the server if it is in client main memory and execution is at the server • For multiple small updates, maintain them at the client and post to the server only when necessary
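The first observation above amounts to a per-read cost comparison. A minimal sketch, assuming both costs are already available as comparable numbers (the function name and the cost units are hypothetical, not from the paper):

```python
def data_source(cached: bool, network_cost: float, client_access_cost: float) -> str:
    """Pick where to read a piece of data from.

    Assumption: both costs are per-read estimates in the same unit.
    The cached copy is used only when reading it is no more expensive
    than fetching the data over the network.
    """
    if cached and client_access_cost <= network_cost:
        return "client_cache"
    return "server"
```

For example, with a fast network and a slow client disk (`network_cost=1.0`, `client_access_cost=5.0`) the sketch bypasses the cache even though the data is cached.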

Query Optimization • Query plans must also specify where the query pieces are executed • Data shipping: all execution done at client • Query shipping: all execution done at server • Hybrid: choice can be made for each operator • Displaying results to the user always happens at the client
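The site rules for the two pure policies can be sketched as a small plan annotator. This is an illustration only: the `Op` class, the operator names, and the policy strings are hypothetical. Hybrid shipping is left out because it requires a per-operator cost-based choice.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Op:
    name: str                                   # e.g. "scan", "join", "display"
    children: List["Op"] = field(default_factory=list)
    site: str = ""                              # filled in by annotate()


def annotate(op: Op, policy: str) -> Op:
    """Annotate every operator in the plan with 'client' or 'server'."""
    if policy not in ("data", "query"):
        raise ValueError("policy must be 'data' or 'query'")
    if op.name == "display" or policy == "data":
        op.site = "client"      # display is always at the client
    else:
        op.site = "server"      # query shipping: everything else at the server
    for child in op.children:
        annotate(child, policy)
    return op


def sites(op: Op) -> List[str]:
    """List the annotations in pre-order, for inspection."""
    result = [op.site]
    for child in op.children:
        result.extend(sites(child))
    return result
```

Annotating a plan `display(join(scan, scan))` with `"query"` puts only the display operator at the client; with `"data"` every operator lands at the client.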

Distributed Query Plans • Each operator is annotated with a logical site of execution, not a physical machine, so plans are shareable • client means an operator is executed at the client where the query is issued • server means: • for scan operators, execute at a location that has the necessary data • for updates, execute at all locations with the relevant data
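Binding logical annotations to physical sites at execution time might look like the sketch below. The `replicas` catalog (table name to list of sites holding a copy) and the function name are assumptions made for illustration. Picking the first replica for a scan is an arbitrary placeholder for a real choice.

```python
from typing import Dict, List


def bind_operator(kind: str, logical_site: str, table: str,
                  client: str, replicas: Dict[str, List[str]]) -> List[str]:
    """Return the physical sites at which one operator runs."""
    if logical_site == "client":
        return [client]                  # the client that issued the query
    if logical_site != "server":
        raise ValueError(f"unknown logical site: {logical_site}")
    holders = replicas[table]
    if kind == "scan":
        return [holders[0]]              # any one site that has the data
    if kind == "update":
        return list(holders)             # every site with a copy
    raise ValueError(f"no binding rule in this sketch for: {kind}")
```

Because the plan itself only says `client` or `server`, the same plan can be reused by any client and bound against whatever the catalog says at run time.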

Query Optimization: Where? • Should optimization occur at the client or the server? • At client: less load on servers, better scalability • At server: more information about system statistics, especially server loads • Potential solution: primary parsing and query rewriting at client, further optimization at server

Query Optimization: Statistics • Even when optimization is done at a server, that server does not usually have full knowledge of the system • System can either: • Guess the status of other servers – less accuracy, less cost • Ask other servers their status – fully accurate, additional communication costs

Query Optimization: When? • Tradeoff of accuracy vs. cost • Traditional-style: optimize once and store the plan; no support for changing DB conditions, but no optimization cost incurred at query execution • Plan sets: optimize for possible scenarios; generate a few query plans for different conditions and choose among them based on runtime statistics • On-the-fly: observe intermediate results and re-optimize the query if they differ from expectations
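The plan-set approach can be sketched as a list of precompiled plans, each guarded by a condition on runtime statistics. The statistic name `server_load`, the threshold, and the plan labels in the usage note are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Stats = Dict[str, float]
PlanSet = List[Tuple[Callable[[Stats], bool], str]]


def choose_plan(plan_set: PlanSet, stats: Stats, default: str) -> str:
    """Return the first precompiled plan whose condition holds at run time."""
    for applies, plan in plan_set:
        if applies(stats):
            return plan
    return default                       # fall back if no scenario matches
```

A plan set such as `[(lambda s: s["server_load"] > 0.8, "join_at_client"), (lambda s: s["server_load"] <= 0.8, "join_at_server")]` is built once at compile time. Only the cheap `choose_plan` lookup runs per execution.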

Query Optimization: Two-Step • Compile-time: generate join order, etc. • Runtime: perform site selection • Reasonable cost at each end • Responds well to changing server loads • Fully utilizes client data caching
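The split between the two steps can be sketched as two separate functions. Both heuristics here are stand-ins chosen for brevity, not the paper's algorithms: the compile-time step orders tables smallest-first, and the runtime step places every join at the currently least-loaded site.

```python
from typing import Dict, List, Tuple


def compile_join_order(cardinalities: Dict[str, int]) -> List[str]:
    """Step 1 (compile time): fix a join order without any site information."""
    return sorted(cardinalities, key=lambda table: cardinalities[table])


def select_sites(join_order: List[str],
                 loads: Dict[str, float]) -> List[Tuple[str, str]]:
    """Step 2 (run time): place each join using the current site loads."""
    site = min(loads, key=lambda s: loads[s])
    return [(f"join_{i}", site) for i in range(1, len(join_order))]
```

Step 1 can be cached with the compiled query, while step 2 is cheap enough to repeat on every execution, which is how the approach reacts to changing loads.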

Two-Step Optimization: Downside • Step 1 generates a plan that is optimal only in the traditional, site-unaware sense • Step 2 performs site selection on that fixed plan • The true optimal plan can be missed, because the first optimization step was done with no knowledge of the distributed system

Query Execution Techniques • Standard fare: row blocking, multithreading when possible • Issue: transactions that mix updates and retrieval queries under hybrid shipping • Propagating updates to the server is deferred for efficiency's sake, so a query executed at the server may not see them • Other option: perform the query before propagating the updates and temporarily pad the results with the pending updates
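Row blocking means shipping tuples in batches rather than one message per row. A minimal sketch of the batching step follows. The function name and the idea of sending each yielded block as one network message are assumptions made for illustration.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def row_blocks(rows: Iterable[T], block_size: int) -> Iterator[List[T]]:
    """Group a stream of rows into blocks of at most block_size rows."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    block: List[T] = []
    for row in rows:
        block.append(row)
        if len(block) == block_size:
            yield block                  # one block = one network message
            block = []
    if block:
        yield block                      # final, possibly partial block
```

Larger blocks amortize per-message overhead, at the price of the consumer waiting longer for its first rows.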

Questions? • Comments?
