CS411 Design of Database Management Systems

Presentation Transcript


  1. CS411 Design of Database Management Systems Lecture 04: Parallel Databases Kevin C. Chang

  2. Database Machines • Early 80's: specialized hardware to run databases faster • extra processors • smart storage: make storage smarter, faster, do more • much like a graphics accelerator for graphics applications • e.g.: • processor per disk track: eliminate seek • processor per head: eliminate bulk I/O for selections • Failed, but not a total washout • did not help much with expensive operations (sort, joins) • no economies of scale • Interest recently picked up • DB as a hardware target & benchmark application • architecture/DB research

  3. Gray's "Bill's Law" for Software Economics • Bill Joy's law (Sun): don't write software for fewer than 100,000 platforms; at $10M engineering expense, a $1,000 price • Bill Gates's law: don't write software for fewer than 1,000,000 platforms; at $10M engineering expense, a $100 price • Examples: $UNIX > $NT • UNIX-Oracle vs. SQL-Server

  4. Parallel vs. Distributed DB • Parallel DB: fully integrated system, logically a single machine • no notion of site autonomy • centralized schema • all queries started at a well-defined "host"

  5. Parallel DB: Performance Metrics • Speedup: constant problem, growing system • speedup = small-system elapsed time / big-system elapsed time • linear speedup if an N-system yields N-fold speedup • Scaleup: ability to grow both the system and the problem • scaleup = 1-system elapsed time on 1 problem / N-system elapsed time on N problems • linear if scaleup = 1 • some problems have a super-linear increase in cost • e.g., the n log n cost of sorting
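
Restating the two metrics as formulas (just the ratio form of the definitions above, with T for elapsed time):

```latex
\mathrm{Speedup}(N) = \frac{T_{\text{small system}}}{T_{\text{big system}}}
  \quad \text{(same problem; linear if } \mathrm{Speedup}(N) = N\text{)}
\qquad
\mathrm{Scaleup}(N) = \frac{T_{\text{1 system, 1 problem}}}{T_{\text{N system, N problems}}}
  \quad \text{(linear if } \mathrm{Scaleup}(N) = 1\text{)}
```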

  6. Task Scaleup • Transactional scaleup • N times more transactions • ideal for a parallel machine: each job runs independently • Batch scaleup • scale up as a single large task • queries on an N-times larger DB • scientific simulation on an N-times finer grid

  7. Natural Parallelism: Relations In and Out • [Diagram: boxes labeled "any sequential program" chained in a pipeline and cloned across partitions] • Pipeline: piping the output of one op into the next • Partition: N op-clones, each processes 1/N of the input • Observation: essentially sequential programming
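
A minimal sketch of the two forms in Python (the relation, operators, and predicate are made up for illustration): a pipeline pipes one operator's output stream into the next, while partition parallelism runs N clones of the same operator, each over 1/N of the input.

```python
# Pipeline parallelism: each operator consumes the previous operator's output stream.
def scan(tuples):
    for t in tuples:
        yield t

def select(stream, predicate):
    for t in stream:
        if predicate(t):
            yield t

def project(stream, columns):
    for t in stream:
        yield {c: t[c] for c in columns}

rows = [{"id": i, "val": i * 10} for i in range(8)]
pipeline = project(select(scan(rows), lambda t: t["val"] >= 30), ["id"])
print(list(pipeline))

# Partition parallelism: N clones of one operator, each handling 1/N of the input.
def partition(tuples, n):
    return [tuples[i::n] for i in range(n)]

clones = [select(scan(part), lambda t: t["val"] >= 30) for part in partition(rows, 3)]
print([list(c) for c in clones])
```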

  8. Data River: Split + Merge Streams • Producers add records to the river • Consumers consume records from the river • Purely sequential programming • river does flow control and buffering • [Diagram: N producers → river (N × M data streams) → M consumers]
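
A toy version of the river idea, with Python threads and a bounded queue standing in for the split/merge streams (an illustrative sketch, not Gamma's actual mechanism): producers and consumers are purely sequential loops, and the queue supplies the buffering and flow control.

```python
import queue
import threading

river = queue.Queue(maxsize=16)          # bounded queue -> back-pressure / flow control
DONE = object()                          # sentinel to shut consumers down

def producer(pid, n_records):
    for i in range(n_records):
        river.put((pid, i))              # add records to the river

def consumer(results):
    while True:
        rec = river.get()                # consume records from the river
        if rec is DONE:
            break
        results.append(rec)

N, M = 3, 2
results = [[] for _ in range(M)]
producers = [threading.Thread(target=producer, args=(p, 5)) for p in range(N)]
consumers = [threading.Thread(target=consumer, args=(results[c],)) for c in range(M)]
for t in producers + consumers:
    t.start()
for t in producers:
    t.join()
for _ in range(M):                       # one sentinel per consumer
    river.put(DONE)
for t in consumers:
    t.join()
print([len(r) for r in results])         # 15 records split across the M consumers
```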

  9. Speedup & Scaleup Barriers • Startup: • time to start a parallel operation • e.g.: creating processes, opening files, optimization • Interference: • slowdown for access shared resources • e.g.: locks, hotspots, logs • Skew: • service time = slowest parallel step of the job • average size of step decreases, but variance can dominate (e.g., update selected tuples)

  10. Speedup

  11. Parallel Architectures • Shared memory • Shared disks • Shared nothing • ?? Pros and cons? • software development (programming)? • hardware development (system scalability)?

  12. Architecture: Shared What? • Shared Memory (Sequent, SGI, Sun): easy to program, difficult to build, difficult to scale up • Shared Disk (VMScluster, Sysplex) • Shared Nothing (Tandem, Teradata, SP2): hard to program, easy to build, easy to scale up • Winner will be a hybrid of shared memory & shared nothing? • e.g.: distributed shared memory (DASH, Encore)

  13. What was Gamma’s Choice? Why?

  14. (Horizontal) Data Partitioning • Round robin • Hashed • Range partitioned • ?? Relevant to shared-what? • ?? Pros and cons? • directed queries? load spreading?

  15. Data Partitioning • Round robin • query: no direction • load: distributes data uniformly • Hashed • query: can direct equality, e.g., equijoins • load: more or less randomized • Range partitioned • query: range queries, equijoins, group-by • load: depends on the query's range of interest • Indexes: • indexes created at all sites • primary index records where a tuple resides
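
The three schemes can be sketched as plain Python functions (the relation, site count, key, and range boundaries are illustrative):

```python
# Horizontal partitioning sketches: each function maps tuples to a list per site.
def round_robin(tuples, n_sites):
    sites = [[] for _ in range(n_sites)]
    for i, t in enumerate(tuples):
        sites[i % n_sites].append(t)          # spread load uniformly, no query direction
    return sites

def hash_partition(tuples, n_sites, key):
    sites = [[] for _ in range(n_sites)]
    for t in tuples:
        sites[hash(key(t)) % n_sites].append(t)   # equality predicates can be directed
    return sites

def range_partition(tuples, boundaries, key):
    # boundaries = [b0, b1, ...]; site i gets tuples with key < b_i, last site gets the rest
    sites = [[] for _ in range(len(boundaries) + 1)]
    for t in tuples:
        for i, b in enumerate(boundaries):
            if key(t) < b:
                sites[i].append(t)
                break
        else:
            sites[-1].append(t)
    return sites

emps = [{"eid": i, "sal": 1000 * i} for i in range(10)]
print(round_robin(emps, 3))
print(hash_partition(emps, 3, key=lambda t: t["eid"]))
print(range_partition(emps, [3000, 7000], key=lambda t: t["sal"]))
```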

  16. Query Execution • Query manager • parses and optimizes the query, generates the operator tree • sends it to a site (if a single-site query) or to the dispatcher • Dispatcher • gives the query to a scheduler (simple load control) • Scheduler • passes pieces to operator processes at the sites • Site query processors with operator processes • results sent back through the scheduler to the query manager

  17. Control Messages • 3 times as many control messages as operators in the query tree • Scheduler → Processor: initiate • Processor → Scheduler: ID or port to talk to • e.g., for later data movement • Operator → Scheduler: done

  18. Selections • Selection(R) = Union (Selection R1, …, Selection Rn) • Initiate selection operator at each relevant site • if predicate on partitioning attributes (range or hash): • send to those “overlapping” sites • otherwise, send to all sites
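
A small sketch of the "overlapping sites" idea, assuming a range-partitioned relation with made-up per-site boundaries: a predicate on the partitioning attribute is sent only to sites whose ranges intersect it; any other predicate goes to every site.

```python
# Site i holds tuples whose partitioning-attribute value falls in [lo_i, hi_i).
site_ranges = [(0, 100), (100, 200), (200, 300)]      # illustrative boundaries

def sites_for_range_predicate(lo, hi):
    """Sites whose range overlaps the predicate lo <= attr < hi."""
    return [i for i, (s_lo, s_hi) in enumerate(site_ranges)
            if lo < s_hi and hi > s_lo]

def sites_for_other_predicate():
    """Predicate not on the partitioning attribute: must ask every site."""
    return list(range(len(site_ranges)))

print(sites_for_range_predicate(150, 250))   # -> [1, 2]
print(sites_for_other_predicate())           # -> [0, 1, 2]
```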

  19. Joins: Hashing with "Left"-Deep Trees • R hash-join S: hash R into a hash table in the buffer (R is the building relation); for each tuple s of S (the probing relation), probe the hash table over R; output all (r, s) such that r.c = s.c • Hash-based algorithms for join and grouping • hybrid hash join as we will discuss • Use "left"-deep (or building-deep) trees (memory concerns) • building relations are used to build further • keep at most two join operations in the tree active • ?? Q: Why not sort-merge join?
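
In runnable form, the build/probe loop above looks roughly like this (join attribute c as on the slide, everything else illustrative), assuming the building relation fits in memory; the hybrid variant on the next slide handles the overflow case.

```python
from collections import defaultdict

def hash_join(R, S):
    table = defaultdict(list)
    for r in R:                          # build phase: hash the building relation R
        table[r["c"]].append(r)
    out = []
    for s in S:                          # probe phase: stream the probing relation S
        for r in table.get(s["c"], []):
            out.append((r, s))           # output all (r, s) with r.c = s.c
    return out

R = [{"c": 1, "a": "r1"}, {"c": 2, "a": "r2"}]
S = [{"c": 2, "b": "s1"}, {"c": 3, "b": "s2"}, {"c": 2, "b": "s3"}]
print(hash_join(R, S))
```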

  20. Hybrid Hash Join: Centralized • Step 1: hash build: hash R into an in-memory hash table (holding all r with the in-memory hash values); R tuples hashing elsewhere become passed-over R tuples • Step 2: hash probe: probe the table with S; S tuples hashing to passed-over partitions become passed-over S tuples • Step 3: continue on R-PassedOver join S-PassedOver
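
A simplified single-node sketch of the three steps, where Python lists stand in for memory and spill files and the partition count is arbitrary (an illustration of the idea, not Gamma's code): partition 0 of R stays in memory, the other partitions of R and S are passed over, then joined partition by partition.

```python
from collections import defaultdict

def hybrid_hash_join(R, S, key, n_partitions=3):
    # Step 1: hash build. Partition 0 of R is kept as an in-memory hash table;
    # tuples of R falling in other partitions are "passed over" (spilled).
    in_memory = defaultdict(list)
    passed_R = [[] for _ in range(n_partitions)]
    for r in R:
        p = hash(key(r)) % n_partitions
        if p == 0:
            in_memory[key(r)].append(r)
        else:
            passed_R[p].append(r)

    # Step 2: hash probe. S tuples for partition 0 join immediately;
    # the rest become passed-over S tuples.
    out = []
    passed_S = [[] for _ in range(n_partitions)]
    for s in S:
        p = hash(key(s)) % n_partitions
        if p == 0:
            out.extend((r, s) for r in in_memory.get(key(s), []))
        else:
            passed_S[p].append(s)

    # Step 3: continue on the passed-over tuples, one partition at a time.
    for p in range(1, n_partitions):
        table = defaultdict(list)
        for r in passed_R[p]:
            table[key(r)].append(r)
        for s in passed_S[p]:
            out.extend((r, s) for r in table.get(key(s), []))
    return out

R = [{"c": i} for i in range(6)]
S = [{"c": i % 4} for i in range(8)]
print(len(hybrid_hash_join(R, S, key=lambda t: t["c"])))   # -> 8 matching pairs
```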

  21. Parallel Hybrid Hash-Join • [Diagram: building relation R, stored as R1 … RN across K disk sites, is split into buckets Rij and routed to M joining processors (used later)] • partitioning split table • so that each Ri fits into the M processors • joining split table

  22. Aggregate Operations • Aggregate functions: • Count, Sum, Avg, Max, Min, MaxN, MinN, Median • select Sum(sales) from Sales group by timeID • Each site computes its piece in parallel • Final results combined at a single site • Example: Average(R) • ? what should each Ri return? • how to combine? • ?? Can we always do it "piecewise"?
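
For Average(R), each site can return the pair (local sum, local count) and the host combines them; returning local averages alone would be wrong when sites hold different numbers of tuples. A tiny sketch with made-up data:

```python
# Piecewise Average(R): each site returns a partial (sum, count); the host combines.
def local_piece(R_i):
    return (sum(R_i), len(R_i))

def combine(pieces):
    total = sum(s for s, _ in pieces)
    count = sum(c for _, c in pieces)
    return total / count

sites = [[10, 20], [30], [40, 50, 60]]                 # R partitioned across 3 sites
print(combine([local_piece(R_i) for R_i in sites]))    # -> 35.0
```

Count, Sum, Max, and Min decompose the same way; an aggregate such as Median does not reduce to one small piece per site, which is the point of the "piecewise" question above.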

  23. Updates • Updates: replace, delete, append tuples • As usual • May require moving a tuple • ? when?

  24. Sorting • Used for • data loading and reorganization • building indexes • Parallel sorting: • bad benchmark for shared-nothing • a lot of data movement • [Diagram: input data → sorted runs → merge sort → sorted data]
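
One common way a parallel sort proceeds (a sketch under assumed range boundaries, not necessarily Gamma's exact algorithm): range-partition the data on the sort key, sort each partition locally, and concatenate the partitions in range order; heapq.merge illustrates the run-merging step shown in the diagram.

```python
import heapq

def parallel_sort(data, boundaries):
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for x in data:                        # lots of data movement: every tuple may ship
        for i, b in enumerate(boundaries):
            if x < b:
                partitions[i].append(x)
                break
        else:
            partitions[-1].append(x)
    runs = [sorted(p) for p in partitions]   # local sorts (done in parallel on real sites)
    return [x for run in runs for x in run]  # ranges are disjoint, so concatenation suffices

print(parallel_sort([42, 7, 99, 13, 56, 3], boundaries=[20, 60]))   # -> [3, 7, 13, 42, 56, 99]
print(list(heapq.merge([1, 4, 9], [2, 3, 8], [5, 6, 7])))           # merging sorted runs
```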

  25. Performance Results • Linear speedup, constant scaleup! • not perfect, but close • almost to be expected: • little startup overhead • no interference among disjoint data/operations • major operations (equijoin) insensitive to data skew

  26. Research Issues Missing • Query optimization for parallel execution • Load balancing • Skew handling

  27. What's Next • Object-relational systems: Postgres, a "Next Generation" DBMS
