PMIT-6102 Advanced Database Systems. By Jesmin Akhter, Assistant Professor, IIT, Jahangirnagar University. Lecture 14: Parallel Database Systems.
Outline
• Parallel Database Systems
• Fundamental
• Functional Architecture
• Parallel DBMS Architectures
  • shared-memory
  • shared-disk
  • shared-nothing
Parallel Database Systems
• A parallel computer, or multiprocessor, is a special kind of distributed system made of a number of nodes (processors, memories and disks) connected by a very fast network within one or more cabinets in the same room.
• Data distribution can be exploited to increase
  • performance (through parallelism) and
  • availability (through replication).
• Parallel database systems can support very large databases with very high loads.
• The implementation of parallel database systems naturally relies on distributed database techniques.
Advantages
• A parallel database system should provide the following advantages.
• High performance
  • Parallelism can increase throughput, using inter-query parallelism.
    • Inter-query parallelism is a form of parallelism in the evaluation of database queries in which several different queries execute concurrently on multiple processors to improve the overall throughput of the system.
  • Parallelism can decrease transaction response times, using intra-query parallelism.
    • Intra-query parallelism is a form of parallelism in the evaluation of database queries in which a single query is decomposed into smaller tasks that execute concurrently on multiple processors.
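The two forms of parallelism described above can be contrasted in a minimal sketch. Everything here is an assumption made for illustration: the `ORDERS` table, the two toy queries, and the use of a thread pool to stand in for multiple processors (in CPython, threads show the structure of the computation rather than a real CPU speedup).

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "table": a list of (order_id, amount) rows.
ORDERS = [(i, i % 7) for i in range(1, 1001)]


def total_amount(rows):
    """Query 1: scan the rows and sum the amounts."""
    return sum(amount for _, amount in rows)


def count_large(rows, threshold=5):
    """Query 2 (unrelated to query 1): count rows with amount >= threshold."""
    return sum(1 for _, amount in rows if amount >= threshold)


def run_inter_query(workers=2):
    """Inter-query parallelism: two different queries execute concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        first = pool.submit(total_amount, ORDERS)
        second = pool.submit(count_large, ORDERS)
        return first.result(), second.result()


def run_intra_query(workers=4):
    """Intra-query parallelism: one query is split into per-partition subtasks."""
    # Round-robin split of the table into one partition per worker.
    partitions = [ORDERS[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(total_amount, partitions))
    # Combine the partial results into the final answer.
    return sum(partial_sums)
```

Inter-query parallelism leaves each query untouched and raises throughput; intra-query parallelism changes how a single query is evaluated and needs a final combine step.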
Advantages
• High availability
  • Because a parallel database system consists of many redundant components, it can increase data availability and fault tolerance.
  • Replicating data at several nodes is useful to support failover, a fault-tolerance technique that enables automatic redirection of transactions from a failed node to another node that stores a copy of the data. This provides uninterrupted service to users.
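Failover as described above can be sketched as follows. The `Node` class, the `NodeDown` exception and the `read_with_failover` helper are hypothetical names introduced only for this illustration; a real system would also have to detect failures and keep replicas synchronized, which this sketch ignores.

```python
class NodeDown(Exception):
    """Raised when a request reaches a failed node (hypothetical)."""


class Node:
    """A node holding a full copy (replica) of some data."""

    def __init__(self, name, data):
        self.name = name
        self.data = dict(data)
        self.alive = True

    def read(self, key):
        if not self.alive:
            raise NodeDown(self.name)
        return self.data[key]


def read_with_failover(replicas, key):
    """Try each replica in order; redirect to the next one if a node has failed."""
    for node in replicas:
        try:
            return node.name, node.read(key)
        except NodeDown:
            continue  # automatic redirection to another copy of the data
    raise NodeDown("all replicas unavailable")
```

For example, with two replicas of the same data and the first one marked as failed, the read is served by the second node and the caller never sees the failure.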
Advantages
• Extensibility
  • Extensibility is the ability to expand the system smoothly by adding processing and storage power to the system.
  • Ideally, the parallel database system should demonstrate linear speedup and linear scale-up.
    • Linear speedup refers to a linear increase in performance for a constant database size while the number of nodes (i.e., processing and storage power) is increased linearly.
    • Linear scale-up refers to sustained performance for a linear increase in both database size and number of nodes.
Advantages
• Extensibility
Fig. 14.1 Extensibility Metrics
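The two extensibility metrics can be made concrete with a small sketch. It assumes that performance is measured as elapsed time for a fixed workload, so speedup is the ratio of elapsed times; the function names and the 5% tolerance are assumptions chosen for this example, not part of the lecture.

```python
def speedup(elapsed_small_system, elapsed_large_system):
    """Same database size on both systems: how much faster is the larger one?"""
    return elapsed_small_system / elapsed_large_system


def is_linear_speedup(elapsed_small, elapsed_large, node_ratio, tolerance=0.05):
    """Linear speedup: N times more nodes gives about N times the performance."""
    achieved = speedup(elapsed_small, elapsed_large)
    return abs(achieved - node_ratio) <= tolerance * node_ratio


def scaleup(elapsed_small_db_small_system, elapsed_large_db_large_system):
    """Database size and node count grow by the same factor; 1.0 means sustained performance."""
    return elapsed_small_db_small_system / elapsed_large_db_large_system


def is_linear_scaleup(elapsed_small, elapsed_large, tolerance=0.05):
    """Linear scale-up: elapsed time stays about the same as both dimensions grow."""
    return abs(scaleup(elapsed_small, elapsed_large) - 1.0) <= tolerance
```

Under these assumptions, a query that takes 100 s on 4 nodes and 25 s on 16 nodes shows a speedup of 4 for 4 times the nodes, which is linear.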
Functional Architecture
• The functions supported by a parallel database system can be divided into three subsystems, much as in a typical DBMS.
  • Session Manager
  • Transaction Manager
  • Data Manager
Functional Architecture
• Session Manager
  • It plays the role of a transaction monitor, providing support for client interactions with the server.
  • In particular, it performs the connections and disconnections between the client processes and the two other subsystems.
  • Therefore, it initiates and closes user sessions (which may contain multiple transactions).
  • In the case of OLTP sessions, the session manager is able to trigger the execution of pre-loaded transaction code within data manager modules.
Functional Architecture
• Transaction Manager
  • It receives client transactions related to query compilation and execution.
  • It can access the database directory that holds all meta-information about data and programs.
  • Depending on the transaction, it activates the various compilation phases, triggers query execution, and returns the results as well as error codes to the client application.
  • Because it supervises transaction execution and commit, it may trigger the recovery procedure in case of transaction failure.
  • To speed up query execution, it may optimize and parallelize the query at compile time.
Functional Architecture
• Data Manager
  • It provides all the low-level functions needed to run compiled queries in parallel, i.e., database operator execution, parallel transaction support, cache management, etc.
  • If the transaction manager is able to compile dataflow control, then synchronization and communication among data manager modules are possible. Otherwise, transaction control and synchronization must be done by a transaction manager module.
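The division of work among the three subsystems can be sketched very roughly as follows. This is a toy model under strong assumptions: "compilation" is reduced to building a filter predicate, each data manager module holds a plain list of integers as its data, and all class and method names are hypothetical.

```python
class DataManager:
    """Runs a compiled operator on the data this module is responsible for."""

    def __init__(self, fragment):
        self.fragment = list(fragment)

    def execute(self, predicate):
        return [row for row in self.fragment if predicate(row)]


class TransactionManager:
    """Compiles a client request and triggers its execution on the data managers."""

    def __init__(self, data_managers):
        self.data_managers = data_managers

    def run(self, min_value):
        # Stand-in for query compilation: turn the request into an operator.
        def predicate(row):
            return row >= min_value

        results = []
        for module in self.data_managers:  # each module could run in parallel
            results.extend(module.execute(predicate))
        return sorted(results)


class SessionManager:
    """Opens and closes user sessions and forwards requests to the transaction manager."""

    def __init__(self, transaction_manager):
        self.transaction_manager = transaction_manager
        self.open_sessions = set()

    def connect(self, client_id):
        self.open_sessions.add(client_id)

    def disconnect(self, client_id):
        self.open_sessions.discard(client_id)

    def submit(self, client_id, min_value):
        if client_id not in self.open_sessions:
            raise RuntimeError("client has no open session")
        return self.transaction_manager.run(min_value)
```

In this sketch the transaction manager does all coordination itself, which corresponds to the case where dataflow control is not compiled into the data manager modules.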
Parallel DBMS Architectures
• There are three basic parallel computer architectures, depending on how main memory or disk is shared:
  • shared-memory,
  • shared-disk and
  • shared-nothing.
• Hybrid architectures such as NUMA or cluster try to combine the benefits of the basic architectures.
Parallel DBMS Architectures
• Shared-Memory
  • In the shared-memory approach, any processor has access to any memory module or disk unit through a fast interconnect (e.g., a high-speed bus or a cross-bar switch).
  • All the processors are under the control of a single operating system.
  • All shared-memory parallel database products today can exploit inter-query parallelism to provide high transaction throughput and intra-query parallelism to reduce the response time of decision-support queries.
Parallel DBMS Architectures
• Shared-Memory
Fig. 14.3 Shared-Memory Architecture
Parallel DBMS Architectures
• Shared-Memory
  • Shared-memory has two strong advantages:
    • simplicity
      • Since meta-information (directory) and control information (e.g., lock tables) can be shared by all processors, writing database software is not very different from writing it for single-processor computers.
      • Intra-query parallelism requires some parallelization but remains rather simple.
    • load balancing
      • Load balancing is easy because it can be done at run time, using the shared memory, by allocating each new task to the least busy processor.
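The run-time load-balancing rule above (give each new task to the least busy processor) can be sketched with a priority queue. The assumption here is that each task has a known numeric cost and that "busy" means the sum of the costs already assigned; `assign_tasks` is a hypothetical helper, not a DBMS API.

```python
import heapq


def assign_tasks(task_costs, num_processors):
    """Allocate each new task to the processor with the smallest current load."""
    # Min-heap of (current_load, processor_id); the least busy processor is on top.
    heap = [(0, proc) for proc in range(num_processors)]
    heapq.heapify(heap)
    assignment = {proc: [] for proc in range(num_processors)}

    for task_id, cost in enumerate(task_costs):
        load, proc = heapq.heappop(heap)  # least busy processor right now
        assignment[proc].append(task_id)
        heapq.heappush(heap, (load + cost, proc))

    loads = {proc: load for load, proc in heap}
    return assignment, loads
```

The shared heap plays the role of the control information that all processors can see in shared memory; in a shared-nothing system the same decision would require messages between nodes.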
Parallel DBMS Architectures
• Shared-Memory
  • Shared-memory has three problems:
    • high cost
      • High cost is incurred by the interconnect, which requires fairly complex hardware because of the need to link each processor to each memory module or disk.
    • limited extensibility
      • With faster processors (even with larger caches), conflicting accesses to the shared memory increase rapidly and degrade performance.
      • Therefore, extensibility is limited to a few tens of processors, typically up to 16 for the best cost/performance using 4-processor boards.
    • low availability
      • Finally, since the memory space is shared by all processors, a memory fault may affect most processors, thereby hurting availability. The solution is to use duplex memory with a redundant interconnect.
Parallel DBMS Architectures
• Shared-Disk
  • In the shared-disk approach, any processor has access to any disk unit through the interconnect, but exclusive (non-shared) access to its main memory.
  • Each processor-memory node is under the control of its own copy of the operating system. Each processor can then access database pages on the shared disk and cache them in its own memory.
  • Since different processors can access the same page in conflicting update modes, global cache consistency is needed.
  • The first parallel DBMS that used shared-disk was Oracle, with an efficient implementation of a distributed lock manager for cache consistency.
  • Other major DBMS vendors, such as IBM, also provide shared-disk implementations.
Parallel DBMS Architectures
• Shared-disk has a number of advantages:
  • lower cost
    • The cost of the interconnect is significantly less than with shared-memory since standard bus technology may be used.
  • high extensibility
    • Given that each processor has enough main memory, interference on the shared disk can be minimized. Thus, extensibility can be better, typically up to a hundred processors.
  • load balancing
  • easy migration from centralized systems
  • availability
    • Since memory faults can be isolated from other nodes, availability can be higher.
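The need for global cache consistency in shared-disk can be illustrated with a deliberately simplified invalidation scheme. This is not how Oracle's distributed lock manager works; it is a hypothetical sketch in which a central `LockManager` remembers which nodes cache each page and invalidates the other copies before a write, and writes go straight through to the shared disk.

```python
class SharedDisk:
    """The disk that every node can reach through the interconnect."""

    def __init__(self):
        self.pages = {}


class LockManager:
    """Tracks which nodes hold a cached copy of each page (simplified)."""

    def __init__(self):
        self.holders = {}  # page_id -> set of nodes caching that page

    def register_read(self, node, page_id):
        self.holders.setdefault(page_id, set()).add(node)

    def acquire_write(self, node, page_id):
        # Invalidate every other cached copy before the update proceeds.
        for other in self.holders.get(page_id, set()) - {node}:
            other.invalidate(page_id)
        self.holders[page_id] = {node}


class ProcessorNode:
    """A processor with private memory (its cache) and access to the shared disk."""

    def __init__(self, name, disk, lock_manager):
        self.name = name
        self.disk = disk
        self.lock_manager = lock_manager
        self.cache = {}

    def read(self, page_id):
        if page_id not in self.cache:
            self.cache[page_id] = self.disk.pages[page_id]
            self.lock_manager.register_read(self, page_id)
        return self.cache[page_id]

    def write(self, page_id, value):
        self.lock_manager.acquire_write(self, page_id)
        self.disk.pages[page_id] = value  # write-through to the shared disk
        self.cache[page_id] = value

    def invalidate(self, page_id):
        self.cache.pop(page_id, None)
```

Without the invalidation step, a node that had cached the page before the update would keep returning the stale value from its private memory.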
Shared-Nothing
• In the shared-nothing approach, each processor has exclusive access to its main memory and disk unit(s).
• As with shared-disk, each processor-memory-disk node is under the control of its own copy of the operating system.
• Each node can be viewed as a local site (with its own database and software) in a distributed database system.
• Therefore, most solutions designed for distributed databases, such as database fragmentation, distributed transaction management and distributed query processing, may be reused.
• Using a fast interconnect, it is possible to accommodate large numbers of nodes. This architecture is often called Massively Parallel Processor (MPP).
Shared-Nothing
• The first major parallel DBMS product was Teradata’s Database Computer, which could accommodate a thousand processors in its early version.
• Other major DBMS vendors, such as IBM and Microsoft, also provide shared-nothing implementations.
Shared-Nothing
• As demonstrated by the existing products, shared-nothing has three main virtues:
  • lower cost
    • The cost advantage is better than that of shared-disk, which requires a special interconnect for the disks.
  • high extensibility
    • By implementing a distributed database design that favors the smooth incremental growth of the system by the addition of new nodes, extensibility can be better (in the thousands of nodes).
  • high availability
    • By replicating data on multiple nodes, high availability can also be achieved.
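Database fragmentation across shared-nothing nodes can be sketched as follows. The lecture does not name a fragmentation scheme, so this example assumes simple hash partitioning on an integer key (`key % num_nodes`); the helper names are hypothetical.

```python
def node_for_key(key, num_nodes):
    """Hash partitioning (assumed scheme): each key belongs to exactly one node."""
    return key % num_nodes


def partition_rows(rows, num_nodes):
    """Split (key, value) rows into one private fragment per node."""
    fragments = [[] for _ in range(num_nodes)]
    for key, value in rows:
        fragments[node_for_key(key, num_nodes)].append((key, value))
    return fragments


def lookup(fragments, key):
    """A point query touches only the node that owns the key."""
    owner = fragments[node_for_key(key, len(fragments))]
    for row_key, value in owner:
        if row_key == key:
            return value
    return None


def total(fragments):
    """A full scan runs locally on every node; only the partial sums are combined."""
    return sum(sum(value for _, value in fragment) for fragment in fragments)
```

Because each node works only on its own fragment, adding nodes adds both storage and processing power. Note that with this particular scheme, changing the number of nodes moves most keys to a different node, which is one reason real designs choose fragmentation strategies that favor incremental growth.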