Chapter 12 Distributed Database Management Systems Database Systems: Design, Implementation, and Management, Seventh Edition, Rob and Coronel

In this chapter, you will learn:
• What a distributed database management system (DDBMS) is and what its components are
• How database implementation is affected by different levels of data and process distribution
• How transactions are managed in a distributed database environment
• How database design is affected by the distributed database environment
Database Systems: Design, Implementation, & Management, 7th Edition, Rob & Coronel

The Evolution of Distributed Database Management Systems
• Distributed database management system (DDBMS)
  • Governs storage and processing of logically related data over interconnected computer systems in which both data and processing functions are distributed among several sites
• A centralized database required that corporate data be stored at a single central site
• A dynamic business environment and the centralized database’s shortcomings spawned demand for applications based on data access from different sources at multiple locations (PDAs, for example)

Centralized database management system
[Figure: the application issues a data request to the DBMS. Diagram labels: Request, DBMS, Read, Data, Reply.]

Centralized database management problems
• Performance degradation due to a growing number of remote locations
• High costs (mainframe)
• Reliability problems (single point of failure)
• Scalability problems (single location)
• Organizational rigidity (no flexibility or agility)

Distributed Processing and Distributed Databases
• Distributed processing
  • Database’s logical processing is shared among two or more physically independent sites connected through a network
  • For example, the data input/output (I/O), data selection, and data validation might be performed on one computer, and a report based on that data might be created on another computer (see Figure 12.2)
  • Distributed processing does not require a distributed database
• Distributed database
  • Stores a logically related database over two or more physically independent sites
  • Database is composed of database fragments
  • A distributed database requires distributed processing (each database fragment is managed by its own local database process)

DDBMS Advantages
• Data are located near the “greatest demand” site
• Faster data access
• Faster data processing
• Growth facilitation: new sites can be added to the network without affecting the operations of other sites
• Improved communications: local sites are smaller and located closer to customers
• Reduced operating costs: adding a workstation costs less than upgrading a mainframe
• User-friendly interface: easier training
• Less danger of a single-point failure
• Processor independence: the end user is able to access any available copy of the data, and an end user’s request is processed by any processor at the data location

DDBMS Disadvantages
• Complexity of management and control
• Security
• Lack of standards (no guaranteed compatibility)
• Increased storage requirements: multiple copies of data are required at different sites
• Increased training cost

Characteristics of Distributed Management Systems
• Application interface: to interact with the end user, application programs, and other DBMSs
• Validation: to analyze data requests for syntax correctness
• Transformation: to decompose complex requests into atomic data request components
• Query optimization: to find the best access strategy
• Mapping: to determine the data location of local and remote fragments
• I/O interface: to read or write data from or to permanent local storage
• Formatting: to prepare the data for presentation to the end user or to an application program
• Security: to provide data privacy at both local and remote databases
• Backup and recovery: to ensure the availability and recoverability of the database in case of a failure
• Database administration
• Concurrency control: to manage simultaneous data access and to ensure data consistency
• Transaction management: to ensure that the data moves from one consistent state to another

Characteristics of Distributed Management Systems (continued)
• Must perform all the functions of a centralized DBMS
• Must handle all necessary functions imposed by the distribution of data and processing
• Must perform these additional functions transparently to the end user
  • The end user does not need to know the names of the fragments or where they are located
  • The end user does not need to know that the database is fragmented

Characteristics of Distributed Management Systems (continued)
• Both users “see” only one logical database and do not need to know the names of the fragments

DDBMS Components
Must include (at least) the following components:
• Computer workstations
• Network hardware (gateways, routers, network bridges, switches, hubs) and software
• Communications media (cables, microwave, fiber optics, satellite)
• Transaction processor (TP), also known as application processor or transaction manager
  • Software component found in each computer that requests data; it receives and processes the application’s data requests (remote and local)
• Data processor (DP) or data manager
  • Software component residing on each computer that stores and retrieves data located at the site
  • May be a centralized DBMS

Levels of Data and Process Distribution
• Current systems are classified by how they support process distribution and data distribution

Single-Site Processing, Single-Site Data (SPSD)
• All processing is done on a single CPU or host computer (mainframe, midrange, or PC)
• All data are stored on the host computer’s local disk
• Processing cannot be done on the end user’s side of the system
• Several processes can run concurrently on the host computer, accessing a single DP
• Typical of most mainframe and midrange computer DBMSs
• DBMS is located on the host computer, which is accessed by dumb terminals connected to it

Single-Site Processing, Single-Site Data (SPSD) (continued)
• The TP and the DP are embedded within the DBMS

Multiple-Site Processing, Single-Site Data (MPSD)
• Multiple processes run on different computers sharing a single data repository
• The end user sees the file server as just another hard disk, because only the data storage input/output (I/O) is handled by the file server’s computer
• All record- and file-locking activities are done at the end-user location
• All data selection, search, and update functions take place at the workstation, thus requiring that entire files travel through the network for processing at the workstation
• MPSD scenario requires a network file server running conventional applications that are accessed through a LAN
• Many multi-user accounting applications, running under a personal computer network, fit such a description

Multiple-Site Processing, Single-Site Data (MPSD) (continued)
SELECT * FROM CUSTOMER WHERE CUS_BALANCE > 1000;
• With a file server, all 10,000 CUSTOMER rows must travel through the network to be evaluated at site A, even if only 50 of them have balances greater than $1,000
• Client/server architecture is similar to that of the network file server except that all database processing is done at the server site, thus reducing network traffic
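
The difference in network traffic can be sketched with Python's standard `sqlite3` module. This is only an illustration under assumptions: one in-memory database stands in for the single data site, "rows shipped" is simply the number of rows a query returns, and the table holds just the two columns named in the slides.

```python
import sqlite3

def build_customer_table(n_rows=10_000, n_high=50):
    """Create an in-memory CUSTOMER table; only n_high rows exceed $1,000."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE CUSTOMER (CUS_NUM INTEGER PRIMARY KEY, CUS_BALANCE REAL)"
    )
    rows = [(i, 1500.0 if i < n_high else 200.0) for i in range(n_rows)]
    conn.executemany("INSERT INTO CUSTOMER VALUES (?, ?)", rows)
    return conn

def file_server_query(conn):
    """File-server (MPSD) style: ship every row, filter at the workstation."""
    shipped = conn.execute("SELECT * FROM CUSTOMER").fetchall()
    matches = [row for row in shipped if row[1] > 1000]
    return len(shipped), matches

def client_server_query(conn):
    """Client/server style: the server applies the WHERE clause before shipping."""
    shipped = conn.execute(
        "SELECT * FROM CUSTOMER WHERE CUS_BALANCE > 1000"
    ).fetchall()
    return len(shipped), shipped

conn = build_customer_table()
fs_rows, fs_matches = file_server_query(conn)
cs_rows, cs_matches = client_server_query(conn)
print("rows shipped, file server:", fs_rows)
print("rows shipped, client/server:", cs_rows)
```

Both approaches return the same 50 customers; only the number of rows that would cross the network differs.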

Multiple-Site Processing, Multiple-Site Data (MPMD)
• Fully distributed database management system with support for multiple data processors and transaction processors at multiple sites
• Classified as either homogeneous or heterogeneous
• Homogeneous DDBMSs
  • Integrate only one type of centralized DBMS over a network

Multiple-Site Processing, Multiple-Site Data (MPMD) (continued)
• Heterogeneous DDBMSs
  • Integrate different types of centralized DBMSs over a network
• Fully heterogeneous DDBMS
  • Supports different DBMSs that may even support different data models (relational, hierarchical, or network) running under different computer systems, such as mainframes and microcomputers

Distributed Database Transparency Features
• Allow the end user to feel like the database’s only user
• Features include:
  • Distribution transparency
  • Transaction transparency
  • Failure transparency
  • Performance transparency
  • Heterogeneity transparency

Distribution Transparency
• Allows management of a physically dispersed database as though it were a centralized database

Transaction Transparency
• Ensures database transactions will maintain the distributed database’s integrity and consistency
• Ensures a transaction is completed only when all database sites involved complete their part
• Distributed database systems require complex mechanisms to manage transactions and to ensure consistency and integrity
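
The "all sites complete their part, or none do" rule can be sketched as follows. This is a simplified illustration, not the mechanism a real DDBMS uses: two in-memory SQLite databases stand in for two sites, and the ACCOUNT table, the `transfer` function, and the amounts are all hypothetical. A production system also needs a commit protocol that survives a failure between the two COMMIT calls, which this sketch ignores.

```python
import sqlite3

def make_site(opening_balance):
    """One site: an in-memory database holding a single ACCOUNT row."""
    conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit BEGIN/COMMIT
    conn.execute(
        "CREATE TABLE ACCOUNT (ID INTEGER PRIMARY KEY, "
        "BALANCE REAL CHECK (BALANCE >= 0))"
    )
    conn.execute("INSERT INTO ACCOUNT VALUES (1, ?)", (opening_balance,))
    return conn

def balance(conn):
    return conn.execute("SELECT BALANCE FROM ACCOUNT WHERE ID = 1").fetchone()[0]

def transfer(source, target, amount):
    """Apply the update at both sites, or at neither."""
    sites = (target, source)
    for site in sites:
        site.execute("BEGIN")
    try:
        target.execute(
            "UPDATE ACCOUNT SET BALANCE = BALANCE + ? WHERE ID = 1", (amount,)
        )
        # Violates the CHECK constraint if the source would go negative.
        source.execute(
            "UPDATE ACCOUNT SET BALANCE = BALANCE - ? WHERE ID = 1", (amount,)
        )
    except sqlite3.Error:
        for site in sites:
            site.execute("ROLLBACK")  # undo the part that already succeeded
        return False
    for site in sites:
        site.execute("COMMIT")
    return True

site_a, site_b = make_site(100.0), make_site(0.0)
ok = transfer(site_a, site_b, 30.0)       # both sites can do their part
failed = transfer(site_a, site_b, 500.0)  # site_a cannot, so site_b is undone too
print(ok, failed, balance(site_a), balance(site_b))
```

In the second call the credit at the target site succeeds first, yet it is rolled back because the debit at the source site fails, so neither database changes.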

Distributed Requests and Distributed Transactions
• Remote request: a single SQL statement accesses data from a single remote database
• Remote transaction: accesses data at a single remote site
• Distributed transaction: requests data from several different remote sites on the network
• Distributed request: a single SQL statement references data at several DP sites
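
One way to read the four definitions above is as a decision rule over "which sites does each SQL statement touch". The function below is a hedged sketch of that reading; the input format (a list of site-name sets, one per statement), the `classify` name, and the site labels are assumptions made for illustration. It also assumes that a remote transaction consists of more than one statement, which the slide does not state.

```python
def classify(statements):
    """Classify a unit of work.

    statements: list of sets; each set names the remote DP sites that
    one SQL statement references.
    """
    if not statements or any(not sites for sites in statements):
        raise ValueError("each statement must reference at least one site")
    # A single statement spanning several sites needs distributed-request support.
    if any(len(sites) > 1 for sites in statements):
        return "distributed request"
    all_sites = set().union(*statements)
    # Each statement stays on one site, but the statements span several sites.
    if len(all_sites) > 1:
        return "distributed transaction"
    # Everything happens at one remote site.
    return "remote request" if len(statements) == 1 else "remote transaction"

print(classify([{"B"}]))                # one statement, one site
print(classify([{"B"}, {"B"}]))         # several statements, one site
print(classify([{"B"}, {"C"}]))         # several statements, several sites
print(classify([{"B", "C"}]))           # one statement, several sites
```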

Performance Transparency
• Objective of the query optimization routine is to minimize the total cost associated with the execution of a request
• Costs associated with a request are a function of:
  • Access time (I/O) cost
  • Communication cost
  • CPU time cost
• Must provide:
  • Distribution transparency: allows management of a physically dispersed database as though it were a centralized database
  • Replica transparency: DDBMS’s ability to hide the existence of multiple copies of data from the user
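
The cost comparison behind query optimization can be sketched as picking the plan with the smallest total cost. The slide only says that cost is a function of the three components; treating it as a plain sum is an assumption here, and the plan names and numbers are made up for illustration (arbitrary units).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Plan:
    name: str
    io_cost: float    # access time (I/O) cost
    comm_cost: float  # communication cost
    cpu_cost: float   # CPU time cost

    @property
    def total(self):
        # Assumption: total cost is the unweighted sum of the three components.
        return self.io_cost + self.comm_cost + self.cpu_cost

def cheapest(plans):
    """Return the plan with the lowest total cost."""
    return min(plans, key=lambda plan: plan.total)

plans = [
    Plan("ship whole table to the requesting site", 10.0, 100.0, 5.0),
    Plan("filter at the remote site, ship matches", 10.0, 2.0, 5.0),
]
best = cheapest(plans)
print(best.name, best.total)
```

A real optimizer would weight the components and estimate them from catalog statistics; the point here is only that communication cost can dominate the choice.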

Distributed Database Design
• Design concepts for a centralized database:
  • The relational database model
  • Entity relationship modeling
  • Normalization of database tables
• Three new issues for a distributed database:
  • Data fragmentation: how to partition the database into fragments
  • Data replication: which fragments to replicate
  • Data allocation: where to locate those fragments and replicas

Data Fragmentation
• Breaks a single object (database or table) into two or more segments or fragments
• Each fragment can be stored at any site over a computer network
• Information about data fragmentation is stored in the distributed data catalog (DDC), from which it is accessed by the TP to process user requests
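
How a TP might consult the DDC can be sketched with a plain dictionary. Everything concrete here is hypothetical: the catalog layout, the fragment names (`CUST_TN` and so on), the site names, and the `fragments_for` helper are illustrative assumptions, not the structure of any real catalog.

```python
# Hypothetical distributed data catalog: table name -> fragment descriptions.
DDC = {
    "CUSTOMER": [
        {"fragment": "CUST_TN", "site": "site_1", "cus_state": "TN"},
        {"fragment": "CUST_FL", "site": "site_2", "cus_state": "FL"},
        {"fragment": "CUST_GA", "site": "site_3", "cus_state": "GA"},
    ]
}

def fragments_for(table, cus_state=None):
    """Return the catalog entries the TP must visit to answer a request.

    With no state filter every fragment is needed; with a filter only the
    fragment holding that state's rows is needed.
    """
    entries = DDC[table]
    if cus_state is None:
        return list(entries)
    return [entry for entry in entries if entry["cus_state"] == cus_state]

everything = fragments_for("CUSTOMER")
florida_only = fragments_for("CUSTOMER", cus_state="FL")
print([entry["site"] for entry in everything])
print([entry["site"] for entry in florida_only])
```

The end user names only CUSTOMER; the lookup is what lets the fragment names and locations stay hidden.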

Data Fragmentation (continued)
Strategies (based at the table level):
• Horizontal fragmentation
  • Division of a relation into subsets (fragments) of tuples (rows)
  • Each fragment represents the equivalent of a SELECT statement, with the WHERE clause on a single attribute
• Vertical fragmentation
  • Division of a relation into attribute (column) subsets
  • This is the equivalent of the relational PROJECT operation (in SQL, a SELECT that lists only the chosen columns)
• Mixed fragmentation
  • Combination of horizontal and vertical strategies
  • A table may be divided into several horizontal subsets (rows), each one having a subset of the attributes (columns)

Data Fragmentation (continued)
• Company’s corporate management requires information about its customers in all three states, but company locations in each state (TN, FL, and GA) require data regarding local customers only

Data Fragmentation (continued)
• Each horizontal fragment may have a different number of rows, but each fragment must have the same attributes
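
Horizontal fragmentation by state can be sketched with `sqlite3`. The column names other than CUS_NUM and CUS_BALANCE, the fragment table names, and the sample rows are assumptions for illustration, and the three fragments live in one in-memory database rather than at three real sites.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMER (CUS_NUM INTEGER PRIMARY KEY, CUS_NAME TEXT, "
    "CUS_STATE TEXT, CUS_BALANCE REAL)"
)
conn.executemany(
    "INSERT INTO CUSTOMER VALUES (?, ?, ?, ?)",
    [
        (10, "Customer A", "TN", 1200.0),
        (11, "Customer B", "FL", 0.0),
        (12, "Customer C", "TN", 350.0),
        (13, "Customer D", "GA", 90.0),
        (14, "Customer E", "FL", 2100.0),
        (15, "Customer F", "TN", 15.0),
    ],
)

# One fragment per state: a SELECT with a WHERE clause on a single attribute.
STATES = ("TN", "FL", "GA")
for state in STATES:
    conn.execute(
        f"CREATE TABLE CUST_{state} AS "
        f"SELECT * FROM CUSTOMER WHERE CUS_STATE = '{state}'"
    )

def columns(table):
    """Column names of a table, read from SQLite's table_info pragma."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

def row_count(table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

# The union of the fragments reconstructs the original table.
rebuilt = conn.execute(
    "SELECT * FROM CUST_TN UNION ALL SELECT * FROM CUST_FL "
    "UNION ALL SELECT * FROM CUST_GA ORDER BY CUS_NUM"
).fetchall()
original = conn.execute("SELECT * FROM CUSTOMER ORDER BY CUS_NUM").fetchall()

for state in STATES:
    print(f"CUST_{state}", row_count(f"CUST_{state}"), columns(f"CUST_{state}"))
```

Row counts differ per fragment while the column list stays identical, which is exactly the property stated in the slide.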

Data Fragmentation (continued)
• Suppose the company is divided into two departments: the service department and the collections department. Each department is located in a separate building, and each has an interest in only a few of the CUSTOMER table’s attributes.

Data Fragmentation (continued)
• Each vertical fragment must have the same number of rows and must include the key column (CUS_NUM); the other attributes differ from fragment to fragment
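
Vertical fragmentation for the two departments can be sketched the same way. The attribute split shown here (name and address for service, limit and balance for collections), the fragment names, and the sample rows are assumptions; only CUSTOMER, CUS_NUM, and CUS_BALANCE come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMER (CUS_NUM INTEGER PRIMARY KEY, CUS_NAME TEXT, "
    "CUS_ADDRESS TEXT, CUS_LIMIT REAL, CUS_BALANCE REAL)"
)
conn.executemany(
    "INSERT INTO CUSTOMER VALUES (?, ?, ?, ?, ?)",
    [
        (10, "Customer A", "12 Main St", 3500.0, 1200.0),
        (11, "Customer B", "7 Lake Rd", 6000.0, 0.0),
        (12, "Customer C", "3 Hill Ave", 4000.0, 350.0),
    ],
)

# Each fragment is a projection that repeats the key column CUS_NUM.
conn.execute(
    "CREATE TABLE CUST_SERVICE AS "
    "SELECT CUS_NUM, CUS_NAME, CUS_ADDRESS FROM CUSTOMER"
)
conn.execute(
    "CREATE TABLE CUST_COLLECT AS "
    "SELECT CUS_NUM, CUS_LIMIT, CUS_BALANCE FROM CUSTOMER"
)

def row_count(table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

# Joining the fragments on the key reconstructs the original rows.
rebuilt = conn.execute(
    "SELECT s.CUS_NUM, s.CUS_NAME, s.CUS_ADDRESS, c.CUS_LIMIT, c.CUS_BALANCE "
    "FROM CUST_SERVICE AS s JOIN CUST_COLLECT AS c ON s.CUS_NUM = c.CUS_NUM "
    "ORDER BY s.CUS_NUM"
).fetchall()
original = conn.execute("SELECT * FROM CUSTOMER ORDER BY CUS_NUM").fetchall()
print(row_count("CUST_SERVICE"), row_count("CUST_COLLECT"), rebuilt == original)
```

Repeating CUS_NUM in every fragment is what makes the join, and therefore the reconstruction, possible.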

Data Fragmentation (continued)
• Company’s structure requires that the CUSTOMER data be fragmented horizontally to accommodate the various company locations; within the locations, the data must be fragmented vertically to accommodate the two departments (service and collection)
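
Mixed fragmentation combines the two steps: a WHERE clause picks the rows for a location and a column list picks the attributes for a department. As before, this is a sketch under assumptions; the column split, the `CUST_<state>_<department>` naming, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMER (CUS_NUM INTEGER PRIMARY KEY, CUS_NAME TEXT, "
    "CUS_STATE TEXT, CUS_LIMIT REAL, CUS_BALANCE REAL)"
)
conn.executemany(
    "INSERT INTO CUSTOMER VALUES (?, ?, ?, ?, ?)",
    [
        (10, "Customer A", "TN", 3500.0, 1200.0),
        (11, "Customer B", "FL", 6000.0, 0.0),
        (12, "Customer C", "TN", 4000.0, 350.0),
        (13, "Customer D", "GA", 5000.0, 90.0),
    ],
)

# Hypothetical attribute split; both departments keep the key CUS_NUM.
DEPARTMENT_COLUMNS = {
    "SERVICE": "CUS_NUM, CUS_NAME",
    "COLLECT": "CUS_NUM, CUS_LIMIT, CUS_BALANCE",
}

fragment_sizes = {}
for state in ("TN", "FL", "GA"):                     # horizontal split
    for dept, cols in DEPARTMENT_COLUMNS.items():    # vertical split
        name = f"CUST_{state}_{dept}"
        conn.execute(
            f"CREATE TABLE {name} AS "
            f"SELECT {cols} FROM CUSTOMER WHERE CUS_STATE = '{state}'"
        )
        fragment_sizes[name] = conn.execute(
            f"SELECT COUNT(*) FROM {name}"
        ).fetchone()[0]

for name, size in fragment_sizes.items():
    print(name, size)
```

Three locations times two departments gives six fragments; within one location the two department fragments always hold the same number of rows.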

Data Replication
• Storage of data copies at multiple sites served by a computer network
• Fragment copies can be stored at several sites to serve specific information requirements
• Can enhance data availability and response time
• Can help to reduce communication and total query costs

Data Replication (continued)
Replication scenarios:
• Fully replicated database
  • Stores multiple copies of each database fragment at multiple sites
  • Can be impractical due to the amount of overhead
• Partially replicated database
  • Stores multiple copies of some database fragments at multiple sites
  • Most DDBMSs are able to handle the partially replicated database well
• Unreplicated database
  • Stores each database fragment at a single site
  • No duplicate database fragments
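
The three scenarios can be told apart by counting how many sites hold each fragment. The function below is a small sketch of that rule; the `placement` mapping, the fragment names, and the site names are hypothetical.

```python
def replication_scenario(placement):
    """Name the replication scenario for a fragment placement.

    placement: dict mapping fragment name -> set of sites storing a copy.
    """
    if not placement or any(not sites for sites in placement.values()):
        raise ValueError("every fragment must be stored at one or more sites")
    copies = [len(sites) for sites in placement.values()]
    if all(count == 1 for count in copies):
        return "unreplicated"          # each fragment at a single site
    if all(count > 1 for count in copies):
        return "fully replicated"      # every fragment has multiple copies
    return "partially replicated"      # only some fragments have copies

unreplicated = {"A1": {"S1"}, "A2": {"S2"}}
partial = {"A1": {"S1", "S2"}, "A2": {"S2"}}
full = {"A1": {"S1", "S2"}, "A2": {"S2", "S3"}}
for placement in (unreplicated, partial, full):
    print(replication_scenario(placement))
```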

Data Allocation
• Deciding where to locate data: which data to locate where
• Data distribution over a computer network is achieved through data partition, data replication, or a combination of both
• Allocation strategies
  • Centralized data allocation: the entire database is stored at one site
  • Partitioned data allocation: the database is divided into several disjoint parts (fragments) and stored at several sites
  • Replicated data allocation: copies of one or more database fragments are stored at several sites

Client/Server vs. DDBMS
• Client/server architecture refers to the way in which computers interact to form a system
• Features a user of resources, or client, and a provider of resources, or server
• Can be used to implement a DBMS in which the client is the TP and the server is the DP
• The client (TP) interacts with the end user and sends a request to the server (DP). The server receives, schedules, and executes the request, selecting only those records that are needed by the client. The server then sends the data to the client only when the client requests the data.

Client/Server vs. DDBMS (continued)
Client/server advantages:
• Less expensive than alternate minicomputer or mainframe solutions
• Allows the end user to use the microcomputer’s GUI, thereby improving functionality and simplicity
• More people in the job market have PC skills than mainframe skills
• PC is well established in the workplace
• Numerous data analysis and query tools exist in the PC market to facilitate interaction with DBMSs
• Considerable cost advantage to offloading applications development from the mainframe to powerful PCs

Client/Server vs. DDBMS (continued)
Client/server disadvantages:
• Creates a more complex environment
• Different platforms (LANs, operating systems, and so on) are often difficult to manage
• An increase in the number of users and processing sites often paves the way for security problems
• Possible to spread data access to a much wider circle of users
• Increases demand for people with broad knowledge of computers and software
• Increases the burden of training and the cost of maintaining the environment