Parallel DBMS www.assignmentpoint.com
Parallel Computer and Parallel Database System
• A parallel computer, or multiprocessor, is itself a distributed system made of a number of nodes (processors and memories) connected by a fast network within a cabinet.
• Parallel database systems combine database management and parallel processing in order to deliver high-performance and high-availability database servers at a much lower price than equivalent mainframe computers.
Functional Aspects provided by parallel database systems
Ideally, a parallel database system should have the following functional aspects.
• High-performance: This can be obtained through several complementary solutions: database-oriented operating system support, parallelism, optimization, and load balancing.
• High-availability: Because a parallel database system consists of many similar components, it can exploit data replication to increase database availability.
Functional Aspects provided by parallel database systems
• Extensibility: The ability to expand the system smoothly by adding processing and storage power. Ideally, the parallel database system should provide two extensibility advantages: linear speedup and linear scaleup.
• Linear speedup refers to a linear increase in performance for a constant database size as processing and storage power increase linearly.
• Linear scaleup refers to sustained performance for a linear increase in both database size and processing and storage power.
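To make the two extensibility metrics concrete, here is a minimal Python sketch. The function names and the timing numbers are hypothetical and chosen only for illustration. The ratios follow the usual definitions: speedup compares elapsed times on the same database before and after adding resources, while scaleup compares a small problem on a small system with a proportionally larger problem on a proportionally larger system.

```python
def speedup(time_small_system: float, time_large_system: float) -> float:
    """Same database, more resources: ratio of elapsed times."""
    return time_small_system / time_large_system


def scaleup(time_small_problem: float, time_large_problem: float) -> float:
    """Database and resources grown by the same factor: ratio of elapsed times."""
    return time_small_problem / time_large_problem


# Hypothetical measurements in seconds; not taken from any real system.
# Speedup case: fixed database, 1 node versus 4 nodes.
s = speedup(time_small_system=120.0, time_large_system=30.0)
# Scaleup case: 1 node on 1x data versus 4 nodes on 4x data.
c = scaleup(time_small_problem=120.0, time_large_problem=120.0)

print(f"speedup with 4x resources: {s:.1f} (linear would be 4.0)")
print(f"scaleup with 4x data and resources: {c:.1f} (linear would be 1.0)")
```

With these definitions, linear speedup on N times the resources corresponds to a ratio of N, and linear scaleup corresponds to a ratio that stays at 1.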
[Figure: linear speedup and linear scaleup]
Parallel Architectures
There are three basic parallel computer architectures, depending on how main memory or disk is shared:
• Shared-memory
• Shared-disk
• Shared-nothing
Shared-Memory
• In the shared-memory approach, any processor has access to any memory module or disk unit through a fast interconnect (e.g. a high-speed bus). All the processors are under the control of a single operating system.
Parallel Architectures: Shared-Memory
• Advantages: simplicity and load balancing.
• Problems: high cost, limited extensibility, and low availability.
• Examples: XPRS, DBS3, and Volcano.
Parallel Architectures: Shared-Disk
• In the shared-disk approach, any processor has access to any disk unit through the interconnect but exclusive access to its own main memory. Each processor-memory node is under the control of its own copy of the operating system.
Parallel Architectures: Shared-Disk
• Advantages: lower cost, high extensibility, load balancing, availability, and easy migration from centralized systems.
• Problems: higher complexity and potential performance problems.
• Examples: IBM’s IMS/VS and DEC’s VAX DBMS.
Parallel Architectures: Shared-Nothing
• In the shared-nothing approach, each processor has exclusive access to its main memory and disk unit(s). As in shared-disk, each processor-memory-disk node is under the control of its own copy of the operating system. Each node can then be viewed as a local site in a distributed database system.
Parallel Architectures: Shared-Nothing
• Advantages: Shared-nothing has three main virtues: lower cost, high extensibility, and high availability.
• Problems: Shared-nothing is much more complex to manage than either shared-memory or shared-disk.
• Examples: BUBBA, EDS, GAMMA, GRACE, and PRISMA.
Hierarchical Architecture
• Hierarchical architecture (also called cluster architecture) is a combination of shared-nothing (SN) and shared-memory (SM). The idea is to build a shared-nothing machine whose nodes are shared-memory.
• Advantages:
  • It combines the flexibility and performance of SM with the high extensibility of SN.
  • Within each shared-memory node, communication is done efficiently through the shared memory, which increases performance.
  • Load balancing is eased by the SM component.
• Configuration alternatives (a design trade-off rather than a disadvantage in itself):
  • A limited number of large nodes, e.g., 4 x 16 processor nodes.
  • A high number of small nodes, e.g., 16 x 4 processor nodes.
Hierarchical Architecture
• Example: NCR’s Teradata.
Components of parallel DBMS architecture
A parallel DBMS has three major components or subsystems.
• Session Manager: It handles the connections and disconnections between the client processes and the two other subsystems.
• Transaction Manager: It receives client transactions related to query compilation and execution. It can access the database directory, which holds all meta-information about data and programs. Depending on the transaction, it activates the various compilation phases, triggers query execution, and returns the results as well as error codes to the client application.
Components of parallel DBMS architecture
• Data Manager: It provides all the low-level functions needed to run compiled queries in parallel.
Data Partitioning Techniques
There are three basic strategies for data partitioning:
• Round-robin
• Hash
• Range partitioning
Data Partitioning Techniques
• Round-robin partitioning is the simplest strategy. It ensures uniform data distribution. With n partitions, the ith tuple in insertion order is assigned to partition (i mod n).
• Hash partitioning applies a hash function to some attribute that yields the partition number. This strategy allows exact-match queries on the selection attribute to be processed by exactly one node and all other queries to be processed by all the nodes in parallel.
• Range partitioning distributes tuples based on the value intervals of some attribute. It is well-suited for range queries. However, range partitioning can result in high variation in partition size.
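The three strategies can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the sample `employees` data, the choice of CRC32 as the hash function, and the range boundaries are all assumptions made for the example, not part of any particular DBMS.

```python
from bisect import bisect_right
from zlib import crc32


def round_robin_partition(tuples, n):
    """Assign the i-th tuple (in insertion order) to partition i mod n."""
    parts = [[] for _ in range(n)]
    for i, t in enumerate(tuples):
        parts[i % n].append(t)
    return parts


def hash_node(key, n):
    """Map an attribute value to a partition number with a deterministic hash."""
    return crc32(str(key).encode("utf-8")) % n


def hash_partition(tuples, n, attr):
    """Place each tuple on the partition chosen by hashing its attr value."""
    parts = [[] for _ in range(n)]
    for t in tuples:
        parts[hash_node(t[attr], n)].append(t)
    return parts


def range_partition(tuples, boundaries, attr):
    """boundaries are sorted exclusive upper bounds; larger values go to the last partition."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for t in tuples:
        parts[bisect_right(boundaries, t[attr])].append(t)
    return parts


# Hypothetical relation: eight employees with an id and a salary.
salaries = [10, 25, 30, 45, 59, 60, 75, 90]
employees = [{"id": i, "salary": s} for i, s in enumerate(salaries)]

rr = round_robin_partition(employees, 3)
hp = hash_partition(employees, 3, "id")
rp = range_partition(employees, [30, 60], "salary")

# Exact-match query on the hash attribute: only one partition has to be searched.
target = hash_node(5, 3)
matches = [t for t in hp[target] if t["id"] == 5]

print("round-robin sizes:", [len(p) for p in rr])
print("hash sizes:", [len(p) for p in hp])
print("range sizes:", [len(p) for p in rp])
print("exact match found on partition", target, "->", matches)
```

Note how the exact-match lookup recomputes the hash to find the single partition to probe, whereas a query on `salary` against the hash-partitioned data would have to scan every partition.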
Parallel Query Optimization
• Parallel query optimization refers to the process of producing an execution plan for a given query that minimizes an objective cost function. The selected plan is the best one within a set of candidate plans examined by the optimizer.
• Parallel query optimization exhibits similarities with distributed query processing. However, it takes advantage of both intra-operator parallelism and inter-operator parallelism. A parallel query optimizer can be seen as three components: a search space, a cost model, and a search strategy.
• The search space is the set of alternative execution plans that represent the input query. The cost model predicts the cost of a given execution plan. The search strategy explores the search space and selects the best plan.
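The division into search space, cost model, and search strategy can be illustrated with a toy optimizer in Python. Everything concrete here is an assumption for the sake of the example: the relation names and cardinalities, the single join selectivity, the restriction to left-deep join orders, the cost formula (sum of intermediate result sizes divided evenly across nodes), and the exhaustive search. Real optimizers use far richer cost models and prune the search space.

```python
from itertools import permutations

# Hypothetical relation cardinalities and one made-up join selectivity.
CARDINALITIES = {"R": 1000, "S": 100, "T": 10}
SELECTIVITY = 0.01
NODES = 4  # assumed degree of intra-operator parallelism


def search_space(relations):
    """Search space: every left-deep join order over the relations."""
    return permutations(relations)


def cost(plan):
    """Cost model: sum of estimated intermediate sizes, split evenly across NODES."""
    size = CARDINALITIES[plan[0]]
    total = 0.0
    for rel in plan[1:]:
        size = size * CARDINALITIES[rel] * SELECTIVITY
        total += size / NODES
    return total


def best_plan(relations):
    """Search strategy: exhaustive enumeration, keeping the cheapest plan."""
    return min(search_space(relations), key=cost)


plan = best_plan(list(CARDINALITIES))
print("chosen join order:", plan, "estimated cost:", cost(plan))
```

Under this toy cost model the cheapest plans are the ones that join the two smallest relations first, because that keeps the first intermediate result small.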
Database Integration Process
• Database integration is the process by which information from participating databases can be conceptually integrated to form a single cohesive definition of a multi-database. In other words, it is the process of designing the Global Conceptual Schema (GCS). Database integration occurs in two general steps: schema translation and schema integration.
• In the first step, the component database schemas are translated to a common intermediate canonical representation (InS1, InS2, ..., InSn). The translation step is necessary only if the component databases are heterogeneous and local schemas are defined using different data models.
Database Integration Process
• In the second step, the intermediate schemas are used to generate a GCS. In some methodologies, local external schemas are considered for integration rather than full database schemas.
Database Integration Process: Schema Translation
• Schema translation is the task of mapping from one schema to another. This requires the specification of a target data model for the global conceptual schema definition.
• Schema translation may not be necessary as a separate step in a heterogeneous database if it can be accomplished during the integration stage. Combining the translation and integration steps provides the integrator with all the information about the entire global database at one time.
Database Integration Process: Schema Integration
• Schema integration is the process of identifying the components of a database that are related to one another, selecting the best representation for the global conceptual schema, and finally integrating the components of each intermediate schema.
• Integration methodologies can be classified as binary or n-ary mechanisms.
• Binary integration methodologies manipulate two schemas at a time. They can proceed in a stepwise (ladder) fashion, where intermediate schemas are created for integration with subsequent schemas, or in a purely binary fashion, where each schema is integrated with one other, creating an intermediate schema for integration with other intermediate schemas.
• N-ary integration mechanisms integrate more than two schemas at each iteration.
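The difference between stepwise binary (ladder) integration and n-ary integration can be sketched in Python. This is a deliberately simplified illustration: schemas are modeled as dictionaries from entity name to a set of attribute names, and two components are assumed to be related exactly when their entity names match. The schema contents and all function names are hypothetical; real schema integration must also resolve naming and structural conflicts.

```python
from functools import reduce


def integrate_pair(a, b):
    """Merge two schemas; entities with the same name are treated as related."""
    merged = {entity: set(attrs) for entity, attrs in a.items()}
    for entity, attrs in b.items():
        merged.setdefault(entity, set()).update(attrs)
    return merged


def integrate_ladder(schemas):
    """Binary, stepwise: fold each schema into the running intermediate schema."""
    return reduce(integrate_pair, schemas)


def integrate_nary(schemas):
    """N-ary, one shot: consider all schemas in a single iteration."""
    merged = {}
    for schema in schemas:
        for entity, attrs in schema.items():
            merged.setdefault(entity, set()).update(attrs)
    return merged


# Hypothetical intermediate schemas InS1, InS2, InS3 in a common representation.
ins1 = {"Employee": {"id", "name"}}
ins2 = {"Employee": {"id", "salary"}, "Project": {"pno", "title"}}
ins3 = {"Project": {"pno", "budget"}}

gcs_ladder = integrate_ladder([ins1, ins2, ins3])
gcs_nary = integrate_nary([ins1, ins2, ins3])

print(sorted(gcs_ladder["Employee"]), sorted(gcs_ladder["Project"]))
print("same result:", gcs_ladder == gcs_nary)
```

In this simplified setting both approaches produce the same global schema. The practical difference lies in how many schemas the integrator has to reason about at each step.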