Gamma DBMS (Part 2): Failure Management and Query Processing. Shahram Ghandeharizadeh, Computer Science Department, University of Southern California.
Failure Management Techniques • Teradata’s Interleaved Declustering • A partitioned table has a primary and a backup copy. • The primary copy is constructed using one of the partitioning techniques. • The secondary copy is constructed by: • Dividing the nodes into clusters (cluster size 4 in this example), and • Partitioning each primary fragment across the remaining nodes of its cluster; e.g., R0 on node 0 is partitioned across nodes 1, 2, and 3, realizing the backup fragments r0.0, r0.1, and r0.2 (see the placement sketch below).
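This placement can be made concrete in a few lines of code. The sketch below (Python, illustration only; the function name and node numbering are assumptions, not Teradata's implementation) assigns the backup fragments of node i's primary copy to the nodes that follow i within its cluster, wrapping around, which reproduces the fragment-to-node mapping used in these slides.

```python
CLUSTER_SIZE = 4  # cluster size used in the slides' example

def backup_placement(node):
    """Map each backup fragment of R<node> to a node of <node>'s cluster.

    Fragments are placed on the nodes after <node>, wrapping around the
    cluster, so reconstruction work spreads over all surviving nodes.
    """
    base = (node // CLUSTER_SIZE) * CLUSTER_SIZE   # first node of the cluster
    offset = node - base                           # position within the cluster
    return {
        f"r{node}.{j}": base + (offset + 1 + j) % CLUSTER_SIZE
        for j in range(CLUSTER_SIZE - 1)
    }

print(backup_placement(0))  # {'r0.0': 1, 'r0.1': 2, 'r0.2': 3}
print(backup_placement(1))  # {'r1.0': 2, 'r1.1': 3, 'r1.2': 0}
```

Note that the mapping for node 1 matches the failure discussion that follows: r1.0, r1.1, and r1.2 live on nodes 2, 3, and 0.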
Teradata’s Interleaved Declustering • When a node (say node 1) fails, its backup fragments process the requests directed towards the primary copy of R1. • The three backup fragments r1.0, r1.1, and r1.2 reside on nodes 2, 3, and 0, respectively. • Note that the load of R1 is distributed across the remaining nodes of the cluster.
Teradata’s Interleaved Declustering • Repairing the failure (MTTR) involves: • Replacing the failed node with a new one. • Reconstructing the primary copy of the fragment assigned to the failed node, R1, by reading r1.2, r1.0, and r1.1 from nodes 0, 2, and 3. • Reconstructing the backup fragments assigned to the failed node: r0.0, r2.2, and r3.1.
Teradata’s Interleaved Declustering • When does data become unavailable? • When a second node in a cluster fails before the first failed node has been repaired. • (The full analysis is a bit more involved than this discussion.)
Teradata’s Interleaved Declustering • What is the advantage of making the cluster size equal to 8? • Better distribution of the workload across the nodes in the presence of a failure. • What is the disadvantage of making the cluster size equal to 8? • A higher likelihood of data becoming unavailable. • There is a tradeoff between load balancing (in the presence of a failure) and data availability.
Gamma’s Chained Declustering • Nodes are divided into disjoint groups called relation clusters. • A relation is assigned to one relation cluster and its records are declustered across the nodes of that relation cluster using a partitioning strategy (Range, Hash). • Given a primary fragment Ri, its backup copy is assigned to node (i+1) mod M (M is the number of nodes in the relation cluster).
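A minimal sketch of this placement rule (Python, illustration only; the hash-partitioning helper and node numbering are assumptions, not Gamma's code):

```python
M = 8  # example relation-cluster size

def home_nodes(record_key):
    """Return (primary_node, backup_node) for a hash-partitioned record.

    The primary fragment Ri lives on node i; its chained backup ri lives
    on the next node of the relation cluster, (i + 1) mod M.
    """
    primary = hash(record_key) % M    # node holding the record's primary fragment
    backup = (primary + 1) % M        # chained backup on the neighboring node
    return primary, backup

print(home_nodes("emp-42"))  # e.g., (5, 6): the backup is always primary+1 mod M
```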
Gamma’s Chained Declustering • During normal mode of operation: • Read requests are directed to the fragments of primary copy, • Write requests update both primary and backup copies.
Gamma’s Chained Declustering • In the presence of a failure: • Both primary and backup fragments are used for read operations, • Objective: balance the load and avoid bottlenecks! • Write requests update both primary and backup copies. • Note: • The load of R1 (on node 1) is pushed to node 2 in its entirety. • In turn, a fraction of the read requests directed to each remaining node is pushed to its neighbor, spreading the extra load caused by node 1’s failure evenly (roughly a 1/8 load increase per node in this example; see the sketch below).
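The re-balancing arithmetic can be made concrete. The sketch below is illustrative only (the function name and the unit-load model are assumptions): with M nodes and one failure, every surviving node can be given the same total read load of M/(M-1) by letting it keep a fraction of its own primary fragment's reads and shifting the remainder to the backup copy on the next node of the chain.

```python
from fractions import Fraction

def rebalanced_shares(M, failed):
    """Fraction of its own primary each survivor keeps after one failure.

    The k-th node after the failed one keeps k/(M-1) of its primary's
    reads; the rest are served by the backup copy on the next node, so
    every survivor ends up with the same total load, M/(M-1).
    """
    return {(failed + k) % M: Fraction(k, M - 1) for k in range(1, M)}

for node, keep in rebalanced_shares(8, 1).items():
    print(f"node {node}: serves {keep} of its primary, "
          f"offloads {1 - keep} to its neighbor's backup copy")
```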
Gamma’s Chained Declustering • Repairing the failure (MTTR) involves: • Replacing node 1 with a new node, • Reconstructing R1 (from r1 on node 2) on node 1, • Reconstructing the backup copy of R0 (i.e., r0) on node 1. • Note: • Once node 1 becomes operational, primary copies are once again used to process read requests.
Gamma’s Chained Declustering • Not every pair of node failures in a relation cluster results in data unavailability: • Data becomes unavailable only when two adjacent nodes of a relation cluster fail.
Gamma’s Chained Declustering • Re-assignment of active fragments incurs neither disk I/O nor data movement.
Join • Hash-join • A data-flow execution paradigm
Example: Join of Emp and Dept • Emp join Dept (using dno).
Hash-Join: 1 Node • A join of tables A and B using attribute j (A.j = B.j) consists of two phases: • Build phase: build a main-memory hash table on table A using the join attribute j as the key; e.g., build a hash table on the Dept table (the Toy department) using dno as the key. • Probe phase: scan table B one record at a time and use its attribute j to probe the hash table constructed on table A; e.g., probe the hash table using the rows of the Emp table (see the sketch below).
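A minimal single-node sketch of the two phases (Python, illustration only; the tables and rows are made up, and Gamma's actual operator works on disk-resident fragments rather than Python lists):

```python
dept = [(1, "Toy"), (2, "Shoe")]             # inner table: (dno, dname)
emp = [("Kim", 1), ("Lee", 2), ("Ann", 1)]   # outer table: (name, dno)

# Build phase: hash the inner table (Dept) on the join attribute dno.
hash_table = {}
for dno, dname in dept:
    hash_table.setdefault(dno, []).append(dname)   # one bucket per join key

# Probe phase: scan the outer table (Emp) and look up each row's dno.
for name, dno in emp:
    for dname in hash_table.get(dno, []):
        print((name, dno, dname))                  # emit a joined row per match
```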
Hash-Join: Build • Read the rows of the Dept table one at a time and insert each into a main-memory hash table (hash function: dno % 7).
Hash-Join: Probe • Read the rows of the Emp table, probe the hash table, and produce a result row whenever a match is found.
Hash-Join: Probe • The join terminates once all rows of the Emp table have been processed!
Hash-Join • Key challenge: the table used to build the hash table does not fit in main memory! • Solution: a divide-and-conquer approach (see the sketch below): • Use the inner table (Dept) to construct n memory buckets, where each bucket is a hash table. • Every time memory is exhausted, spill a fixed number of buckets to disk. • The build phase terminates with a set of in-memory buckets and a set of disk-resident buckets. • Read the outer relation (Emp) and probe the in-memory buckets for joining records; records that map onto the disk-resident buckets are streamed to disk. • Discard the in-memory buckets to free memory space. • While disk-resident buckets of the inner relation exist: • Read as many (say i) of the disk-resident buckets of the inner relation into memory as possible. • Read the corresponding buckets of the outer relation (Emp) to probe the in-memory buckets for joining records. • Discard the in-memory buckets to free memory space. • Delete the i processed buckets of the inner and outer relations.
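A simplified sketch of this approach (Python, illustration only: "disk" is simulated with in-memory lists, exactly one bucket is kept memory-resident, and the bucket count and table contents are made up):

```python
N_BUCKETS = 2
bucket_of = lambda dno: dno % N_BUCKETS           # bucket a join key maps to

dept = [(1, "Toy"), (2, "Shoe"), (3, "HR")]       # inner table: (dno, dname)
emp = [("Kim", 1), ("Lee", 2), ("Ann", 3)]        # outer table: (name, dno)

def hybrid_hash_join(inner, outer):
    # Build: bucket 0 of the inner table stays in memory as a hash table;
    # the remaining buckets are spilled to (simulated) disk.
    in_memory = {}
    spilled_inner = [[] for _ in range(N_BUCKETS)]
    for dno, dname in inner:
        if bucket_of(dno) == 0:
            in_memory.setdefault(dno, []).append(dname)
        else:
            spilled_inner[bucket_of(dno)].append((dno, dname))

    # Probe: outer rows that map to the memory-resident bucket join now;
    # the rest are streamed to the matching disk-resident outer bucket.
    spilled_outer = [[] for _ in range(N_BUCKETS)]
    for name, dno in outer:
        if bucket_of(dno) == 0:
            for dname in in_memory.get(dno, []):
                yield (name, dno, dname)
        else:
            spilled_outer[bucket_of(dno)].append((name, dno))

    # While loop: bring each disk-resident inner bucket back into memory
    # and probe it with the corresponding outer bucket.
    for b in range(1, N_BUCKETS):
        table = {}
        for dno, dname in spilled_inner[b]:
            table.setdefault(dno, []).append(dname)
        for name, dno in spilled_outer[b]:
            for dname in table.get(dno, []):
                yield (name, dno, dname)

print(list(hybrid_hash_join(dept, emp)))
```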
Hash-Join: Build • Two buckets of the Dept table: one in memory and the second disk-resident.
Hash-Join: Probe • Read the Emp table and probe the hash table for joining records with dno = 1; rows with dno = 2 are streamed to disk.
Hash-Join: Probe • The rows of the Emp table with dno = 1 have probed the hash table and produced 3 joining records.
Hash-Join: While loop • Read the disk-resident bucket of Dept into memory. • Probe it with the disk-resident bucket of the Emp table to produce the remaining two joining records.
Parallelism and Hash-Join • Each node may perform its hash-join independently when: • The join attribute is the declustering attribute of the tables participating in the join operation, and • The participating tables are declustered across the same number of nodes using the same declustering strategy. • Even then, the system may re-partition the tables (see the next bullet) when the join requires more aggregate memory than is available on the nodes across which the tables are declustered. • Otherwise, the data must be re-partitioned to perform the join operation correctly (see the sketch below). • Show an example!
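A minimal sketch of the re-partitioning step (Python, illustration only; per-node "inboxes" stand in for Gamma's network sends, and the node count and rows are made up). Every node routes each of its records by hashing the join attribute, so rows of Emp and Dept with the same dno meet at the same node:

```python
NUM_NODES = 4
route = lambda dno: dno % NUM_NODES      # split function on the join attribute

# Per-node inboxes for the re-partitioned fragments of each table.
dept_at = [[] for _ in range(NUM_NODES)]
emp_at = [[] for _ in range(NUM_NODES)]

for dno, dname in [(1, "Toy"), (2, "Shoe"), (5, "HR")]:
    dept_at[route(dno)].append((dno, dname))
for name, dno in [("Kim", 1), ("Lee", 5)]:
    emp_at[route(dno)].append((name, dno))

# Each node can now run the single-node hash join on its local fragments.
for node in range(NUM_NODES):
    print(f"node {node}: dept={dept_at[node]} emp={emp_at[node]}")
```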
Parallelism and Hash-Join (Cont…) • R join S where R is the inner table.
Data Flow Execution Paradigm • Retrieve all those employees working for the toy department: SELECT * FROM Dept d, Emp e WHERE d.dno = e.dno AND d.dname = 'Toy'
Data Flow Execution Paradigm • Producer/Consumer relationship where consumers are activated in advance of the producers.
Data Flow Execution Paradigm • A “Split Table” contains the routing information for the records. • The consumers must be set up first in order to activate the producers (see the sketch below).
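A minimal sketch of a split table (Python, illustration only; queue-backed consumers stand in for operator processes on different nodes, and all names are made up). The consumers' input queues exist before the producer runs, mirroring the setup order above:

```python
from queue import Queue

NUM_CONSUMERS = 3

# Consumers (e.g., join operators on different nodes) are set up first,
# so their input queues exist before any producer starts sending.
inboxes = [Queue() for _ in range(NUM_CONSUMERS)]

# The split table: routing-attribute value -> destination consumer.
split_table = {dno: dno % NUM_CONSUMERS for dno in range(10)}

def produce(records):
    """A producer (e.g., a scan) routes each record via the split table."""
    for name, dno in records:
        inboxes[split_table[dno]].put((name, dno))

produce([("Kim", 1), ("Lee", 2), ("Ann", 4)])
for i, q in enumerate(inboxes):
    print(f"consumer {i} received: {list(q.queue)}")
```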