Learn about the structure, advantages, and components of distributed database systems. Explore topics such as data fragmentation, data replication, and the need for distributed databases. Discover the benefits and challenges associated with managing distributed databases.
UNIT-V DISTRIBUTED DATABASES
Objectives • Definition of terms • Need for Distributed Database Systems • Structure of Distributed Database • Advantages and Disadvantages of DDBMS • Components of DDBMS • Features of DDBMS • Data fragmentation • Data replication
Definition • Distributed Database (DDB): a collection of multiple, logically related databases that are physically distributed over a network. • Distributed Database Management System (DDBMS): the software that manages the DDB and provides an access mechanism that makes this distribution transparent to the users.
Limitations of Centralized Databases • Centralized databases are highly dependent on network connectivity. • The slower the network connection, the longer the database access time. • Bottlenecks can occur as a result of high traffic. • Concurrent access by several users to the same data is limited, because there is only one copy and it is maintained at a single location. • This can lead to major decreases in the overall efficiency of the system. • If there is no fault-tolerant setup and a hardware failure occurs, all the data in the database can be lost. • Since there is little or no data redundancy, data that is unexpectedly lost is very hard to recover; in most cases recovery has to be done manually.
Need for Distributed Database Systems • Business unit autonomy and distribution • Data sharing • Data communication costs and reliability • Multiple application vendors • Database recovery • Transaction and analytic processing
Components 1. Computer workstations 2. Network hardware and software components 3. Links: the communication connections that provide the necessary data transfer capabilities. 4. Transaction Manager (TM): sends each transaction to the sites where the necessary data is available. 5. Scheduler: receives the transaction from the transaction manager and schedules it for the data manager. 6. Data Manager: sends the request to the database and is responsible for sending back the response. The data is sent back through the same link.
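The flow between the transaction manager, scheduler, and data manager can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names mirror the component names above, but the `catalog` lookup table, the site names, and the sample keys are hypothetical, and a real DDBMS would add locking, logging, and network communication.

```python
from dataclasses import dataclass, field


@dataclass
class DataManager:
    """Holds the local data of one site and answers read requests."""
    site: str
    rows: dict = field(default_factory=dict)

    def read(self, key):
        return self.rows.get(key)


class Scheduler:
    """Receives requests from the transaction manager and passes them,
    in arrival order, to the data manager of its site."""

    def __init__(self, data_manager):
        self.data_manager = data_manager
        self.queue = []

    def submit(self, key):
        self.queue.append(key)

    def run(self):
        results = [(key, self.data_manager.read(key)) for key in self.queue]
        self.queue.clear()
        return results


class TransactionManager:
    """Routes each requested key to the site that stores it."""

    def __init__(self, schedulers, catalog):
        self.schedulers = schedulers  # site name -> Scheduler
        self.catalog = catalog        # key -> site name (hypothetical catalog)

    def execute(self, keys):
        for key in keys:
            self.schedulers[self.catalog[key]].submit(key)
        results = {}
        for scheduler in self.schedulers.values():
            results.update(scheduler.run())
        return results


# Hypothetical two-site setup.
delhi = DataManager("delhi", {"cust-1": "Asha"})
pune = DataManager("pune", {"cust-2": "Ravi"})
tm = TransactionManager(
    {"delhi": Scheduler(delhi), "pune": Scheduler(pune)},
    {"cust-1": "delhi", "cust-2": "pune"},
)
result = tm.execute(["cust-1", "cust-2"])
print(result)
```

The caller asks only the transaction manager for data and never names a site, which is the distribution transparency described in the definition.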
Advantages • 1. Management of data with different levels of transparency: network transparency, replication transparency, fragmentation transparency • 2. Increased reliability and availability • 3. Easier expansion • 4. Improved performance • 5. No single bottleneck • 6. Lower communication cost
Disadvantages • 1. Complexity of management and control • 2. Data from different sites must be stitched together • 3. The DBA must be able to co-ordinate activities across sites • 4. Technological difficulties such as recovery, access path selection, and transaction management • 5. Increased training cost • 6. Increased storage and infrastructure requirements
Features of a Distributed DBMS • A distributed DBMS manages a collection of logically related shared data. • The data is split into a number of fragments or partitions. • Fragments may be replicated. • Fragments and replicas are allocated to different sites. • The sites are linked by a communications network. • The data at each site is under the control of a DBMS. • The DBMS at each site is autonomous, that is, it can handle local applications independently. • Each DBMS participates in at least one global application.
Data Replication • Data replication is the process of storing separate copies of the database at two or more sites. It is a popular fault tolerance technique of distributed databases.
Advantages of Data Replication • Reliability − In case of failure of any site, the database system continues to work since a copy is available at another site. • Reduction in Network Load − Since local copies of data are available, query processing can be done with reduced network usage, particularly during prime hours. Data updating can be done at non-prime hours. • Quicker Response − Availability of local copies of data ensures quick query processing and consequently quick response time. • Simpler Transactions − Transactions require fewer joins of tables located at different sites and minimal coordination across the network. Thus, they become simpler in nature.
Disadvantages of Data Replication • Increased Storage Requirements − Maintaining multiple copies of data is associated with increased storage costs. The storage space required is a multiple of the storage required for a centralized system. • Increased Cost and Complexity of Data Updating − Each time a data item is updated, the update needs to be reflected in all the copies of the data at the different sites. This requires complex synchronization techniques and protocols. • Undesirable Application-Database Coupling − If complex update mechanisms are not used, removing data inconsistency requires complex co-ordination at application level. This results in undesirable application-database coupling.
Some commonly used replication techniques are: • Snapshot replication • Near-real-time replication • Pull replication
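The trade-off between these advantages and disadvantages can be seen in a small sketch. It assumes a simple "write to every copy, read from one copy" scheme, which is only one possible approach and is not one of the techniques listed above; the class `ReplicatedStore`, its method names, and the site names are hypothetical, and recovery of a failed site is ignored.

```python
class ReplicatedStore:
    """Toy model: every site keeps a full copy of the data."""

    def __init__(self, site_names):
        self.copies = {name: {} for name in site_names}
        self.up = set(site_names)

    def write(self, key, value):
        # Every available copy must be updated, so the cost of a write
        # grows with the number of sites.
        for name in self.up:
            self.copies[name][key] = value
        return len(self.up)

    def fail(self, name):
        # Simulate the failure of one site.
        self.up.discard(name)

    def read(self, key, preferred):
        # Read from the preferred (local) site when it is available,
        # otherwise fall back to any surviving copy.
        if preferred in self.up:
            return self.copies[preferred][key]
        for name in sorted(self.up):
            return self.copies[name][key]
        raise RuntimeError("no site available")


store = ReplicatedStore(["A", "B", "C"])
copies_written = store.write("balance", 100)   # updates three copies
store.fail("A")
value = store.read("balance", preferred="A")   # served by a surviving site
print(copies_written, value)
```

One logical write touches three copies (the update cost), while the read still succeeds after site A fails (the reliability gain).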
Fragmentation • Fragmentation is the task of dividing a table into a set of smaller tables. The subsets of the table are called fragments. Fragmentation can be of three types: horizontal, vertical, and hybrid (a combination of horizontal and vertical). Horizontal fragmentation can further be classified into two techniques: primary horizontal fragmentation and derived horizontal fragmentation. Fragmentation should be done in such a way that the original table can be reconstructed from the fragments whenever required. This requirement is called "reconstructiveness."
Advantages of Fragmentation • Since data is stored close to the site of usage, the efficiency of the database system is increased. • Local query optimization techniques are sufficient for most queries since data is locally available. • Since irrelevant data is not available at the sites, the security and privacy of the database system can be maintained.
Disadvantages of Fragmentation • When data from different fragments is required, access times may be very high. • In case of recursive fragmentation, the job of reconstruction will need expensive techniques. • Lack of back-up copies of data at different sites may render the database ineffective in case of failure of a site.
Types of Fragmentation • Vertical fragmentation • Horizontal fragmentation • Hybrid fragmentation
Vertical Data Fragmentation: A vertical subset of a relation. That is, the relation (table) is fragmented by its columns.
Horizontal Data Fragmentation: As the name suggests, the records are fragmented horizontally. That is, horizontal subsets of the table's rows are created and stored in different databases of the DDB.
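A minimal Python sketch of both kinds of fragmentation, including the reconstruction step that "reconstructiveness" requires. The `employees` table, its columns, and the helper function names are made up for illustration. The sketch assumes that every vertical fragment keeps the key column `emp_id`, since without a shared key the fragments could not be joined back together.

```python
# Hypothetical table, stored as a list of rows.
employees = [
    {"emp_id": 1, "name": "Asha", "city": "Delhi", "salary": 50000},
    {"emp_id": 2, "name": "Ravi", "city": "Pune", "salary": 45000},
    {"emp_id": 3, "name": "Meena", "city": "Delhi", "salary": 60000},
]


def horizontal_fragment(rows, predicate):
    """Keep whole rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def vertical_fragment(rows, key, columns):
    """Keep the key column plus the chosen columns of every row."""
    return [{c: row[c] for c in (key, *columns)} for row in rows]


def union(*fragments):
    """Rebuild a table from horizontal fragments."""
    rows = [row for fragment in fragments for row in fragment]
    return sorted(rows, key=lambda r: r["emp_id"])


def join_on_key(left, right, key):
    """Rebuild a table from two vertical fragments that share a key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


# Horizontal: rows are split by city.
delhi = horizontal_fragment(employees, lambda r: r["city"] == "Delhi")
others = horizontal_fragment(employees, lambda r: r["city"] != "Delhi")

# Vertical: columns are split, and both fragments keep emp_id.
personal = vertical_fragment(employees, "emp_id", ["name", "city"])
payroll = vertical_fragment(employees, "emp_id", ["salary"])

rebuilt_h = union(delhi, others)
rebuilt_v = join_on_key(personal, payroll, "emp_id")
print(rebuilt_h == employees, rebuilt_v == employees)
```

Horizontal fragments are recombined with a union and vertical fragments with a join on the key, so in both cases the original table can be recovered.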
Hybrid Data Fragmentation: A combination of horizontal and vertical fragmentation. Horizontal fragmentation selects the subset of rows to be distributed over the DDB, and vertical fragmentation selects the subset of columns of the table.
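As a rough sketch, a hybrid fragment can be produced by applying a row filter and a column selection together. The table, the function name `hybrid_fragment`, and the choice of `emp_id` as the key are all assumptions made for this example.

```python
# Hypothetical table, stored as a list of rows.
employees = [
    {"emp_id": 1, "name": "Asha", "city": "Delhi", "salary": 50000},
    {"emp_id": 2, "name": "Ravi", "city": "Pune", "salary": 45000},
    {"emp_id": 3, "name": "Meena", "city": "Delhi", "salary": 60000},
]


def hybrid_fragment(rows, predicate, key, columns):
    """Horizontal step: keep rows matching the predicate.
    Vertical step: keep only the key and the chosen columns."""
    return [
        {c: row[c] for c in (key, *columns)}
        for row in rows
        if predicate(row)
    ]


# Payroll columns of the Delhi employees only.
delhi_payroll = hybrid_fragment(
    employees, lambda r: r["city"] == "Delhi", "emp_id", ["salary"]
)
print(delhi_payroll)
```

Here the Delhi site would store only the payroll columns of its own employees, so it holds neither the rows nor the columns that it does not need.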