Chapter 10 Distributed Database Management System • Database Systems: Design, Implementation, and Management
DDBMS • Distributed Database Management System • Governs the storage and processing of logically related data over interconnected computer systems in which both data and processing functions are distributed among several sites.
The Evolution of Distributed DBMS • Centralized DBMS in the 1970s • Support for structured information needs. • Regularly issued formal reports in standard formats. • Prepared by specialists using 3GLs in response to precisely channeled requests. • Centrally stored corporate data. • Data access through dumb terminals. • Incapable of providing quick, unstructured, and ad hoc information for decision makers in a dynamic business environment.
The Evolution of Distributed DBMS • Social and Technical Changes in the 1980s • Business operations became more decentralized geographically. • Competition increased at the global level. • Customer demands and market needs favored a decentralized management style. • Rapid technological change created low-cost microcomputers. LANs became the basis for computerized solutions. • The large number of applications based on DBMSs and the need to protect investments in centralized DBMS software made the notion of data sharing attractive.
The Evolution of Distributed DBMS • Two Database Requirements in a Dynamic Business Environment: • Quick ad hoc data access became crucial in the quick-response decision-making environment. • The decentralization of management structure, based on the decentralization of business units, made decentralized multiple-access and multiple-location databases a necessity. • Ad hoc query: an impromptu, spur-of-the-moment query.
The Evolution of Distributed DBMS • Developments in the 1990s affecting DBMS • The growing acceptance of the Internet and the World Wide Web as the platform for data access and distribution. • The increased focus on data analysis that led to data mining and data warehousing. • Data mining • Proactive (in contrast to reactive DSS tools) • Instead of having the end user define the problem, select the data, and select the tools to analyze such data, the data-mining tool automatically searches the data for anomalies and possible relationships, thereby identifying problems that have not yet been identified by the end user. • Data warehousing • An integrated, subject-oriented, time-variant, nonvolatile database that provides support for decision making.
The Evolution of Distributed DBMS • DDBMS Advantages • Data are located near the “greatest demand” site. • Faster data access • Faster data processing • Growth facilitation: new sites can be added to the network without affecting the operations of other sites. • Improved communications: a local accounts receivable operation uses sales department data directly, without having to depend on delayed reports from the central office.
The Evolution of Distributed DBMS • DDBMS Advantages • Reduced operating costs: low-cost PCs versus a mainframe. • User-friendly interface: easy-to-use GUI. • Less danger of a single-point failure • Processor independence: requests do not depend on a specific processor; any available processor can handle the user’s request.
The Evolution of Distributed DBMS • DDBMS Disadvantages • Complexity of management and control • Security: the possibility of security lapses increases when data are located at multiple sites. • Lack of standards: no standard communication protocols for DDBMS. • Increased storage requirements: data replication.
Distributed Processing and Distributed Database • Distributed processing shares the database’s logical processing among two or more physically independent sites that are connected through a network. (Figure 10.1) • A distributed database stores a logically related database over two or more physically independent sites connected via a computer network. (Figure 10.2)
Distributed Processing Environment Figure 10.1
Distributed Database Environment Figure 10.2
Distributed Processing and Distributed Database • Distributed processing does not require a distributed database. • A distributed database requires distributed processing. • Distributed processing may be based on a single database located on a single computer. In order to manage distributed data, copies or parts of the database processing functions must be distributed to all data storage sites. • Both distributed processing and distributed databases require a network to connect all components.
What Is A Distributed DBMS? • A distributed database management system (DDBMS) governs the storage and processing of logically related data over interconnected computer systems in which both data and processing functions are distributed among several sites.
What Is A Distributed DBMS? • Functions of a DDBMS • Application interface • Validation (analyze data requests) • Transformation (determine which request components are distributed and which are local) • Query optimization to find the best access strategy • Mapping to determine the data location • I/O interface to read or write data • Formatting to prepare the data for presentation • Security to provide data privacy • Backup and recovery • Database administration • Concurrency control • Transaction management
Centralized Database Management System Figure 10.3
Fully Distributed Database Management System Figure 10.4
Single Logical Database (Fig. 10.4) • Both users see only one logical database and do not need to know the names of the fragments. • The end user need not even know • that the database is divided into separate fragments • where the fragments are located
DDBMS Components • Computer workstations that form the network system. • Network hardware and software components that reside in each workstation. • Communications media that carry the data from one workstation to another. • Transaction processor (TP): receives and processes the application’s data requests. • Data processor (DP): stores and retrieves data located at the site. Also known as the data manager (DM).
Distributed Database System Components Figure 10.5
DDBMS Components • The DDBMS protocol determines how the DDBMS will: • Interface with the network to transport data and commands between DPs and TPs. • Synchronize all data received from DPs (TP side) and route retrieved data to the appropriate TPs (DP side). • Ensure common database functions in a distributed system: security, concurrency control, backup, and recovery.
Levels of Data & Process Distribution • Database systems can be classified by process distribution (single-site processing, SP, or multiple-site processing, MP) and by data distribution (single-site data, SD, or multiple-site data, MD).
Levels of Data & Process Distribution • SPSD (Single-Site Processing, Single-Site Data): host DBMS • All processing is done on a single CPU or host computer. • All data are stored on the host computer’s local disk. • The DBMS is located on the host computer. • The DBMS is accessed by dumb terminals. • Typical of most mainframe and minicomputer DBMSs. • Typical of the first generation of single-user microcomputer databases.
Nondistributed (Centralized) DBMS SPSD (Single-Site Processing, Single-Site Data) Figure 10.6
Levels of Data & Process Distribution • MPSD (Multiple-Site Processing, Single-Site Data): LAN DBMS • Typically, MPSD requires a network file server on which conventional applications are accessed through a LAN. • A variation of the MPSD approach is known as a client/server architecture. (Chapter 12)
Levels of Data & Process Distribution • MPMD (Multiple-Site Processing, Multiple-Site Data) • Fully distributed DBMS with support for multiple DPs and TPs at multiple sites. • Homogeneous DDBMSs integrate only one type of centralized DBMS over the network. • Heterogeneous DDBMSs integrate different types of centralized DBMSs over a network. (See Figure 10.8)
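The three levels above differ only in how many sites do the processing and how many hold data. A minimal Python sketch of that classification follows; the function name `classify` and its parameters are illustrative assumptions, not part of the chapter. It rejects multiple-site data with single-site processing, since a distributed database requires distributed processing.

```python
def classify(processing_sites: int, data_sites: int) -> str:
    """Return the distribution level (SPSD, MPSD, or MPMD) for the given site counts."""
    if processing_sites < 1 or data_sites < 1:
        raise ValueError("at least one processing site and one data site are required")
    if data_sites == 1:
        # Single-site data: the level depends only on where processing happens.
        return "SPSD" if processing_sites == 1 else "MPSD"
    if processing_sites == 1:
        # A distributed database requires distributed processing.
        raise ValueError("multiple-site data requires multiple-site processing")
    return "MPMD"


print(classify(1, 1))  # host DBMS accessed through dumb terminals
print(classify(5, 1))  # LAN workstations sharing one file server
print(classify(3, 3))  # fully distributed DBMS
```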
Distributed DB Transparency • DDBMS transparency features have the common property of allowing the end user to feel like the database’s only user. • Distribution transparency • Transaction transparency • Failure transparency • Performance transparency • Heterogeneity transparency
Distribution Transparency • Distribution transparency allows us to manage a physically dispersed database as though it were a centralized database. • Three Levels of Distribution Transparency • Fragmentation transparency • Location transparency • Local mapping transparency
Distribution Transparency • Example (Figure 10.9): Employee data (EMPLOYEE) are distributed over three locations: New York, Atlanta, and Miami. Depending on the level of distribution transparency support, three different cases of queries are possible:
Distribution Transparency • Case 1: DB Supports Fragmentation Transparency (the query names neither fragments nor locations)
SELECT * FROM EMPLOYEE WHERE EMP_DOB < '01-JAN-1940';
Distribution Transparency • Case 2: DB Supports Location Transparency (the query names the fragments but not their locations)
SELECT * FROM E1 WHERE EMP_DOB < '01-JAN-1940'
UNION
SELECT * FROM E2 WHERE EMP_DOB < '01-JAN-1940'
UNION
SELECT * FROM E3 WHERE EMP_DOB < '01-JAN-1940';
Distribution Transparency • Case 3: DB Supports Local Mapping Transparency (the query names both the fragments and their locations)
SELECT * FROM E1 NODE NY WHERE EMP_DOB < '01-JAN-1940'
UNION
SELECT * FROM E2 NODE ATL WHERE EMP_DOB < '01-JAN-1940'
UNION
SELECT * FROM E3 NODE MIA WHERE EMP_DOB < '01-JAN-1940';
Distribution Transparency • Distribution transparency is supported by a distributed data dictionary (DDD) or distributed data catalog (DDC). • The DDC contains the description of the entire database as seen by the database administrator. • The database description, known as the distributed global schema, is the common database schema used by local TPs to translate user requests into subqueries.
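To illustrate how a TP might use the catalog to translate a user request into subqueries, here is a minimal Python sketch. The `CATALOG` dictionary and the `subqueries` function are hypothetical names introduced for illustration; only the fragment names (E1, E2, E3) and node names (NY, ATL, MIA) come from the EMPLOYEE example. A real DDC holds far more than a fragment-to-node mapping.

```python
# Hypothetical catalog: global table name -> list of (fragment, node) pairs.
CATALOG = {
    "EMPLOYEE": [("E1", "NY"), ("E2", "ATL"), ("E3", "MIA")],
}


def subqueries(table: str, predicate: str) -> list[tuple[str, str]]:
    """Expand a query on a global table into (node, SQL) pairs, one per fragment."""
    return [
        (node, f"SELECT * FROM {fragment} WHERE {predicate}")
        for fragment, node in CATALOG[table]
    ]


# The user queries EMPLOYEE; the catalog supplies fragment names and locations.
for node, sql in subqueries("EMPLOYEE", "EMP_DOB < '01-JAN-1940'"):
    print(node, "->", sql)
```

With fragmentation transparency the user writes only the EMPLOYEE query, and a lookup like this one supplies the fragment names and locations that the lower transparency levels force the user to spell out.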
Transaction Transparency • Transaction transparency ensures that database transactions will maintain the database’s integrity and consistency. • The transaction will be completed only if all database sites involved in the transaction complete their part of the transaction. • Related Concepts: • Remote Requests • Remote Transactions • Distributed Transactions • Distributed Requests
Transaction Transparency • Distributed Requests and Distributed Transactions • A remote request allows us to access data to be processed by a single remote database processor. (Figure 10.10)
A Remote Request Figure 10.10
Transaction Transparency • Distributed Requests and Distributed Transactions • A remote transaction, composed of several requests, may access data at only a single site. (Figure 10.11)
A Remote Transaction Figure 10.11
Transaction Transparency • Distributed Requests and Distributed Transactions • A distributed transaction allows a transaction to reference several different (local or remote) DP sites. (Figure 10.12)
Transaction Transparency • Distributed Requests and Distributed Transactions • A distributed request lets us reference data from several remote DP sites. (Figure 10.13) • It also allows a single request to reference a physically partitioned table. (Figure 10.14)
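The four access types can be told apart by two questions: how many requests make up the transaction, and how many DP sites each request touches. The Python sketch below encodes that reading. The function name `classify_access` and the representation of a transaction as a list of site-name sets are assumptions made for illustration, and the sketch does not distinguish local from remote sites.

```python
def classify_access(requests: list[set[str]]) -> str:
    """Classify a transaction given the set of DP sites each request references."""
    if not requests or any(not sites for sites in requests):
        raise ValueError("every request must reference at least one site")
    if any(len(sites) > 1 for sites in requests):
        # A single request spans several sites.
        return "distributed request"
    all_sites = set().union(*requests)
    if len(all_sites) > 1:
        # Each request stays on one site, but the transaction spans several.
        return "distributed transaction"
    # Everything happens at one site.
    return "remote request" if len(requests) == 1 else "remote transaction"


print(classify_access([{"B"}]))                # remote request
print(classify_access([{"B"}, {"B"}]))         # remote transaction
print(classify_access([{"B"}, {"C"}]))         # distributed transaction
print(classify_access([{"B", "C"}]))           # distributed request
```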
Distributed Concurrency Control • Concurrency control becomes especially important because multisite, multiple-process operations are much more likely to create data inconsistencies and deadlocked transactions. • Premature commit (Fig. 10.15) • Each transaction operation was committed by its local DP (site A, site B), but one of the DPs (site C) could not commit the transaction’s results. • This yields an inconsistent database, because committed data cannot be uncommitted. • Solution: two-phase commit protocol
Transaction Transparency • Two-Phase Commit Protocol • The two-phase commit protocol guarantees that, if a portion of a transaction operation cannot be committed, all changes made at the other sites participating in the transaction will be undone to maintain a consistent database state. • Each DP maintains its own transaction log. The two-phase protocol requires that each individual DP’s transaction log entry be written before the database fragment is actually updated. • The two-phase commit protocol requires a DO-UNDO-REDO protocol and a write-ahead protocol.
Transaction Transparency • Two-Phase Commit Protocol • The DO-UNDO-REDO protocol is used by the DP to roll back and/or roll forward transactions with the help of the system’s transaction log entries. • DO performs the operation and records the “before” and “after” values in the transaction log. • UNDO reverses an operation, using the log entries written by the DO portion of the sequence. • REDO redoes an operation, using the log entries written by the DO portion of the sequence. • To ensure that DO-UNDO-REDO operations can survive a system crash while they are being executed, a write-ahead protocol is used. • The write-ahead protocol forces the log entry to be written to permanent storage before the actual operation takes place.
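The DO-UNDO-REDO sequence and the write-ahead rule can be sketched in a few lines of Python. This is a toy model under stated assumptions: the `DataProcessor` class and its method names are hypothetical, the "database fragment" is a dictionary, and the "log on permanent storage" is an in-memory list, so nothing here actually survives a crash.

```python
_MISSING = object()  # marks "the key had no value before the operation"


class DataProcessor:
    """Toy DP: a dictionary as the database fragment plus an append-only log."""

    def __init__(self):
        self.data = {}  # stands in for the database fragment
        self.log = []   # stands in for the transaction log

    def do(self, key, new_value):
        # Write-ahead: record the before and after values in the log first...
        entry = {"key": key, "before": self.data.get(key, _MISSING), "after": new_value}
        self.log.append(entry)
        # ...and only then perform the actual operation.
        self.data[key] = new_value
        return entry

    def undo(self, entry):
        # Reverse the operation using the "before" value written by do().
        if entry["before"] is _MISSING:
            self.data.pop(entry["key"], None)
        else:
            self.data[entry["key"]] = entry["before"]

    def redo(self, entry):
        # Repeat the operation using the "after" value written by do().
        self.data[entry["key"]] = entry["after"]


dp = DataProcessor()
first = dp.do("balance", 100)
second = dp.do("balance", 80)
dp.undo(second)   # roll back: balance returns to 100
print(dp.data)
dp.redo(second)   # roll forward: balance is 80 again
print(dp.data)
```

Because the log entry is appended before the data change, a log entry always exists for any change that reached the data, which is what lets UNDO and REDO work from the log alone.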
Transaction Transparency • The two-phase commit protocol defines the operations between two types of nodes: the coordinator and one or more subordinates, or cohorts. The protocol is implemented in two phases: • Phase 1: Preparation • The coordinator sends a PREPARE TO COMMIT message to all subordinates. • The subordinates receive the message, write the transaction log using the write-ahead protocol, and send an acknowledgement message to the coordinator. • The coordinator makes sure that all nodes are ready to commit, or it aborts the transaction.
Transaction Transparency • Phase 2: The Final Commit • The coordinator broadcasts a COMMIT message to all subordinates and waits for the replies. • Each subordinate receives the COMMIT message, then updates the database using the DO protocol. • The subordinates reply with a COMMITTED or NOT COMMITTED message to the coordinator. • If one or more subordinates did not commit, the coordinator sends an ABORT message, thereby forcing them to UNDO all changes.
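The two phases can be sketched as a small in-process simulation in Python. Everything here is an illustrative assumption: the `Subordinate` class, the `can_prepare` switch used to simulate a failing site, and the `two_phase_commit` function are hypothetical names, and messages are modeled as plain method calls. The sketch only covers a failure during Phase 1; the NOT COMMITTED reply and the UNDO step of Phase 2 are left out.

```python
class Subordinate:
    """Toy subordinate site with its own log and data."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare  # hypothetical switch to simulate a failing site
        self.log = []
        self.data = {}
        self.pending = None

    def prepare(self, updates):
        # Phase 1: write the log entry first (write-ahead), then acknowledge.
        if not self.can_prepare:
            return False
        self.log.append(("PREPARED", dict(updates)))
        self.pending = dict(updates)
        return True

    def commit(self):
        # Phase 2: apply the prepared updates to the database.
        self.data.update(self.pending)
        self.log.append(("COMMITTED", self.pending))
        self.pending = None

    def abort(self):
        # Discard the prepared updates; the data was never touched.
        self.log.append(("ABORTED", self.pending))
        self.pending = None


def two_phase_commit(work):
    """Coordinator. work is a list of (subordinate, updates) pairs, one per site."""
    prepared = []
    # Phase 1: send PREPARE TO COMMIT to every subordinate.
    for sub, updates in work:
        if sub.prepare(updates):
            prepared.append(sub)
        else:
            # One site is not ready: abort everywhere, nothing is committed.
            for ready in prepared:
                ready.abort()
            return False
    # Phase 2: every site acknowledged, so broadcast COMMIT.
    for sub, _ in work:
        sub.commit()
    return True


a, b, c = Subordinate("A"), Subordinate("B"), Subordinate("C", can_prepare=False)
ok = two_phase_commit([(a, {"x": 1}), (b, {"y": 2}), (c, {"z": 3})])
print(ok, a.data, b.data, c.data)
```

In the run above, site C cannot prepare, so sites A and B never update their data. That is the difference from the premature-commit scenario, where A and B had already committed before C's failure became known.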