This project explores the reference architecture of a Distributed Database Management System (DDBMS), including the design of global, fragmentation, and allocation schemas. It also covers the component architecture of a DDBMS, including the roles of Local DBMS, Data Communications, and Global System Catalog.
Distributed Database Design Manal Ahmad Project Coordinator
Reference Architecture of DDBMS • A set of global external schemas; • A global conceptual schema; • A fragmentation schema and allocation schema; • A set of schemas for each local DBMS conforming to the ANSI-SPARC three-level architecture.
Reference Architecture of DDBMS • Global conceptual schema The global conceptual schema is a logical description of the whole database, as if it were not distributed. This level corresponds to the conceptual level of the ANSI-SPARC architecture and contains definitions of entities, relationships, constraints, security, and integrity information.
Reference Architecture of DDBMS • Fragmentation and allocation schemas The fragmentation schema is a description of how the data is to be logically partitioned. The allocation schema is a description of where the data is to be located, taking account of any replication.
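One way to picture the two schemas is as a pair of lookup tables. The following is a minimal Python sketch, not a real catalog format: the fragment names P1 and P2, the site names, and the helper functions are all assumptions made for illustration.

```python
# Hypothetical illustration: fragment names, site names and helpers are assumptions.

# Fragmentation schema: how each relation is logically partitioned.
fragmentation_schema = {
    "PropertyForRent": {
        "P1": "type = 'house'",
        "P2": "type = 'flat'",
    }
}

# Allocation schema: where each fragment is stored.
# A fragment listed at more than one site is replicated.
allocation_schema = {
    "P1": ["site_london"],
    "P2": ["site_glasgow", "site_london"],
}


def sites_for(relation):
    """Return every site that holds at least one fragment of the relation."""
    sites = set()
    for fragment in fragmentation_schema[relation]:
        sites.update(allocation_schema[fragment])
    return sorted(sites)


def replicated_fragments():
    """Return the fragments that are stored at more than one site."""
    return sorted(f for f, sites in allocation_schema.items() if len(sites) > 1)
```

Here the fragmentation schema answers "how is the relation split?" and the allocation schema answers "where does each piece live?", which is the separation the slide describes.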
Reference Architecture of DDBMS • Local schemas Each local DBMS has its own set of schemas. The local conceptual and local internal schemas correspond to the equivalent levels of the ANSI-SPARC architecture. The local mapping schema maps fragments in the allocation schema into external objects in the local database.
Component Architecture for a DDBMS • Local DBMS (LDBMS) component; • Data communications (DC) component; • Global system catalog (GSC); • Distributed DBMS (DDBMS) component;
Component Architecture for a DDBMS • Local DBMS (LDBMS) component The LDBMS component is a standard DBMS, responsible for controlling the local data at each site that has a database. It has its own local system catalog that stores information about the data held at that site. • Data communications (DC) component The DC component is the software that enables all sites to communicate with each other. The DC component contains information about the sites and the links.
Component Architecture for a DDBMS • Global system catalog (GSC) The GSC has the same functionality as the system catalog of a centralized system. The GSC holds information specific to the distributed nature of the system, such as the fragmentation, replication, and allocation schemas. • Distributed DBMS (DDBMS) component The DDBMS component is the controlling unit of the entire system.
Distributed Relational Database Design • Fragmentation • Allocation • Replication
Distributed Relational Database Design • Fragmentation A relation may be divided into a number of subrelations, called fragments, which are then distributed. There are two main types of fragmentation: horizontal and vertical. Horizontal fragments are subsets of tuples and vertical fragments are subsets of attributes.
Distributed Relational Database Design • Allocation Each fragment is stored at the site with “optimal” distribution. • Replication The DDBMS may maintain a copy of a fragment at several different sites.
Fragmentation • The definition and allocation of fragments must be carried out strategically to achieve: • Locality of Reference • Improved Reliability and Availability • Improved Performance • Balanced Storage Capacities and Costs • Minimal Communication Costs.
Fragmentation • Locality of Reference Where possible, data should be stored close to where it is used. If a fragment is used at several sites, it may be advantageous to store copies of the fragment at these sites. • Improved Reliability and Availability Reliability and availability are improved by replication: there is another copy of the fragment available at another site in the event of one site failing.
Fragmentation • Improved Performance Bad allocation may result in bottlenecks occurring; that is, a site may become inundated with requests from other sites, perhaps causing a significant degradation in performance. Alternatively, bad allocation may result in underutilization of resources. • Balanced Storage Capacities and Costs Consideration should be given to the availability and cost of storage at each site, so that cheap mass storage can be used where possible. This must be balanced against locality of reference.
Fragmentation • Minimal Communication Cost Consideration should be given to the cost of remote requests. Retrieval costs are minimized when locality of reference is maximized or when each site has its own copy of the data. However, when replicated data is updated, the update has to be performed at all sites holding a duplicate copy, thereby increasing communication costs.
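The trade-off between cheap local reads and expensive replicated updates can be sketched with a toy cost model. The model below is an assumption made for illustration, not something taken from the slides: a read costs one message unless the issuing site holds a copy, and an update sends one message to every copy held at another site. The site names and request counts are invented.

```python
def communication_cost(reads, updates, copies, unit_cost=1.0):
    """Toy cost model (an assumption, not from the slides).

    reads / updates: dict mapping site -> number of requests issued at that site.
    copies: set of sites that hold a copy of the fragment.
    """
    # A read is free when the issuing site holds a copy, otherwise one message.
    read_cost = sum(n for site, n in reads.items() if site not in copies)
    # An update must reach every copy stored at a site other than the issuer.
    update_cost = sum(n * len(copies - {site}) for site, n in updates.items())
    return unit_cost * (read_cost + update_cost)


# Invented workload: sites A and B are busy, site C rarely reads.
reads = {"A": 100, "B": 80, "C": 5}
updates = {"A": 10, "B": 10, "C": 0}

single_copy = communication_cost(reads, updates, {"A"})
two_copies = communication_cost(reads, updates, {"A", "B"})
full_copies = communication_cost(reads, updates, {"A", "B", "C"})
```

Under these assumed numbers, placing copies only at the two busy sites is cheaper than either a single copy or a copy everywhere, which matches the point that replication lowers retrieval cost but raises update cost.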
Data Allocation • Four alternative strategies regarding placement of data: • Centralized • Partitioned (or Fragmented) • Complete Replication • Selective Replication
Data Allocation • Centralized Consists of a single database and DBMS stored at one site, with users distributed across the network. • Partitioned The database is partitioned into disjoint fragments, with each fragment assigned to one site.
Data Allocation • Complete Replication Consists of maintaining a complete copy of the database at each site. • Selective Replication A combination of partitioning, replication, and centralization.
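The four strategies can be told apart by looking at which sites hold which fragments. The following is a minimal sketch under the assumption that an allocation is given as a mapping from fragment name to the set of sites holding it; the function name and the fragment and site labels are hypothetical.

```python
def classify_allocation(allocation, all_sites):
    """Name the allocation strategy that a fragment -> sites mapping matches.

    allocation: dict mapping fragment name -> set of sites holding a copy.
    all_sites: set of every site in the network.
    """
    placements = list(allocation.values())
    used_sites = set().union(*placements)
    every_fragment_single = all(len(sites) == 1 for sites in placements)

    if every_fragment_single and len(used_sites) == 1:
        return "centralized"            # everything lives at one site
    if all(sites == set(all_sites) for sites in placements):
        return "complete replication"   # every site holds everything
    if every_fragment_single:
        return "partitioned"            # disjoint fragments, one site each
    return "selective replication"      # a mix of the above


sites = {"A", "B", "C"}
centralized = classify_allocation({"F1": {"A"}, "F2": {"A"}}, sites)
partitioned = classify_allocation({"F1": {"A"}, "F2": {"B"}}, sites)
complete = classify_allocation({"F1": sites, "F2": sites}, sites)
selective = classify_allocation({"F1": {"A", "B"}, "F2": {"C"}}, sites)
```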
Fragmentation • Why Fragmenting? • Usage • Efficiency • Parallelism • Security
Why Fragmenting? • Usage Applications work with views rather than entire relations. Therefore, for data distribution, it seems appropriate to work with subsets of relations as the unit of distribution. • Efficiency Data is stored close to where it is most frequently used. In addition, data that is not needed by local applications is not stored. • Parallelism With fragments as the unit of distribution, a transaction can be divided into several subqueries that operate on fragments. This should increase the degree of concurrency, or parallelism, in the system, thereby allowing transactions that can do so safely to execute in parallel.
Why Fragmenting? • Security Data not required by local applications is not stored and consequently not available to unauthorized users.
Why Fragmenting? • Disadvantages of fragmentation • Performance The performance of global applications that require data from several fragments located at different sites may be slower. • Integrity Integrity control may be more difficult if data and functional dependencies are fragmented and located at different sites.
Correctness of fragmentation • Completeness • Reconstruction • Disjointness
Correctness of fragmentation • Completeness If a relation instance R is decomposed into fragments R1, R2, . . ., Rn, each data item that can be found in R must appear in at least one fragment. This rule is necessary to ensure that there is no loss of data during fragmentation.
Correctness of fragmentation • Reconstruction It must be possible to define a relational operation that will reconstruct the relation R from the fragments. This rule ensures that functional dependencies are preserved. • Disjointness If a data item di appears in fragment Ri, then it should not appear in any other fragment. Vertical fragmentation is the exception to this rule, where primary key attributes must be repeated to allow reconstruction. This rule ensures minimal data redundancy.
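For horizontal fragments, the three rules can be checked mechanically. The sketch below assumes each tuple is a hashable Python tuple, a fragment is a list of such tuples, and reconstruction is done by union; the function names and the sample rows are invented for illustration.

```python
def is_complete(relation, fragments):
    """Completeness: every tuple of the relation appears in some fragment."""
    return set(relation) <= set().union(*fragments)


def is_reconstructible(relation, fragments):
    """Reconstruction (horizontal case): the union of the fragments is the relation."""
    return set().union(*fragments) == set(relation)


def is_disjoint(fragments):
    """Disjointness: no tuple appears in more than one fragment."""
    seen = set()
    for fragment in fragments:
        if seen & set(fragment):
            return False
        seen |= set(fragment)
    return True


# Invented sample relation: (propertyNo, type)
relation = [("PA14", "house"), ("PL94", "flat"), ("PG4", "flat")]
houses = [row for row in relation if row[1] == "house"]
flats = [row for row in relation if row[1] == "flat"]

good = [houses, flats]
lossy = [houses]              # the flats are lost
overlapping = [relation, flats]  # the flats are stored twice
```

With these inputs, `good` passes all three checks, `lossy` fails completeness, and `overlapping` fails disjointness while still being complete.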
Types of fragmentation • Horizontal • Vertical • Mixed • Derived
Types of fragmentation • Horizontal fragmentation Consists of a subset of the tuples of a relation.
Horizontal Fragmentation • Horizontal fragmentation groups together the tuples in a relation that are collectively used by the important transactions. A horizontal fragment is produced by specifying a predicate that performs a restriction on the tuples in the relation. It is defined using the Selection operation (σ) of the relational algebra. • P1: σtype = "house"(PropertyForRent) • P2: σtype = "flat"(PropertyForRent)
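The two fragments P1 and P2 can be sketched in Python by treating Selection as a filter over rows. The sample rows, the extra attributes, and the `select` helper are assumptions made for illustration; only the relation name and the predicates on `type` come from the slide.

```python
# Invented sample rows; only the "type" attribute is taken from the slide.
property_for_rent = [
    {"propertyNo": "PA14", "type": "house", "city": "Aberdeen"},
    {"propertyNo": "PL94", "type": "flat", "city": "London"},
    {"propertyNo": "PG4", "type": "flat", "city": "Glasgow"},
    {"propertyNo": "PG21", "type": "house", "city": "Glasgow"},
]


def select(relation, predicate):
    """Relational Selection: keep the rows for which the predicate holds."""
    return [row for row in relation if predicate(row)]


p1 = select(property_for_rent, lambda row: row["type"] == "house")
p2 = select(property_for_rent, lambda row: row["type"] == "flat")

# Horizontal fragments are recombined with a union of their rows.
reconstructed = p1 + p2
```

Because every row has exactly one `type` value, the two predicates split the relation into disjoint fragments whose union gives back all the original rows.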
Types of fragmentation • Vertical fragmentation Consists of a subset of the attributes of a relation.
Vertical Fragmentation • Vertical fragmentation groups together the attributes in a relation that are used jointly by the important transactions. A vertical fragment is defined using the Projection operation (π) of the relational algebra. • S1: πstaffno, position, sex, DOB, salary(Staff) • S2: πstaffno, fname, lname, branchno(Staff)
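The fragments S1 and S2 can be sketched in Python by treating Projection as picking columns, and reconstruction as a join on the repeated key `staffno`. The sample rows and the helper names `project` and `join_on` are assumptions made for illustration; the attribute lists follow the slide.

```python
# Invented sample rows for the Staff relation.
staff = [
    {"staffno": "SL21", "fname": "John", "lname": "White", "position": "Manager",
     "sex": "M", "DOB": "1945-10-01", "salary": 30000, "branchno": "B005"},
    {"staffno": "SG37", "fname": "Ann", "lname": "Beech", "position": "Assistant",
     "sex": "F", "DOB": "1960-11-10", "salary": 12000, "branchno": "B003"},
]


def project(relation, attributes):
    """Relational Projection: keep only the named attributes of each row."""
    return [{a: row[a] for a in attributes} for row in relation]


def join_on(left, right, key):
    """Join two fragments on a shared key attribute (assumed unique per row)."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


s1 = project(staff, ["staffno", "position", "sex", "DOB", "salary"])
s2 = project(staff, ["staffno", "fname", "lname", "branchno"])

# The primary key staffno is repeated in both fragments so they can be rejoined.
reconstructed = join_on(s1, s2, "staffno")
```

The join works only because `staffno` appears in both fragments, which is why vertical fragmentation is the stated exception to the disjointness rule.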
Types of fragmentation • Mixed Fragmentation Consists of a horizontal fragment that is subsequently vertically fragmented, or a vertical fragment that is then horizontally fragmented.
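A minimal sketch of the second case, a vertical fragment that is then horizontally fragmented, is shown below. The sample rows, the choice of `branchno` as the horizontal predicate, and the helper names are assumptions made for illustration.

```python
# Invented sample rows for a cut-down Staff relation.
staff = [
    {"staffno": "SL21", "fname": "John", "salary": 30000, "branchno": "B005"},
    {"staffno": "SG37", "fname": "Ann", "salary": 12000, "branchno": "B003"},
    {"staffno": "SG14", "fname": "David", "salary": 18000, "branchno": "B003"},
]


def project(relation, attributes):
    """Vertical step: keep only the named attributes."""
    return [{a: row[a] for a in attributes} for row in relation]


def select(relation, attribute, value):
    """Horizontal step: keep only rows where attribute equals value."""
    return [row for row in relation if row[attribute] == value]


# Step 1: vertical fragment holding the name and branch attributes.
s2 = project(staff, ["staffno", "fname", "branchno"])

# Step 2: split that vertical fragment horizontally by branch.
s2_b003 = select(s2, "branchno", "B003")
s2_b005 = select(s2, "branchno", "B005")

# Undoing step 2 is a union; undoing step 1 would be a join on staffno.
rebuilt_s2 = s2_b003 + s2_b005
```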