Distributed DBMS Architecture Tahir Rashid
Architecture
Defines the structure of the system:
• components identified
• functions of each component defined
• interrelationships and interactions between components defined
DDBMS Architecture
Architecture
• Goal:
  – present the issues that need to be addressed at design
  – present a framework within which the design and implementation issues can be discussed
• Example of such a framework: the ISO/OSI 7-layered reference model for computer networks
Standardization: Reference Model
• A conceptual framework whose purpose is to divide standardization work into manageable pieces and to show at a general level how these pieces are related to one another.
• A reference model can be described according to three different approaches:
  – Based on components
  – Based on functions
  – Based on data
DBMS STANDARDIZATION
• Based on components. The components of the system are defined together with the interrelationships between components. A DBMS consists of a number of components, each of which provides some functionality.
• Based on functions. The different classes of users are identified and the functions that the system will perform for each class are defined. The system specifications within this category typically specify a hierarchical structure for the user classes. The ISO/OSI architecture falls in this category.
• Based on data. The different types of data are identified, and an architectural framework is specified which defines the functional units that will realize or use data according to these different views. This approach (also referred to as the datalogical approach) is claimed to be the preferable choice for standardization activities.
DBMS STANDARDIZATION: ANSI / SPARC ARCHITECTURE
The ANSI / SPARC architecture is claimed to be based on the data organization. It recognizes three views of data: the external view, which is that of the user, who might be a programmer; the internal view, that of the system or machine; and the conceptual view, that of the enterprise. For each of these views, an appropriate schema definition is required.
DBMS STANDARDIZATION: ANSI / SPARC ARCHITECTURE
[Figure: ANSI / SPARC architecture]
DBMS STANDARDIZATION: ANSI / SPARC ARCHITECTURE
• At the lowest level of the architecture is the internal view, which deals with the physical definition and organization of data.
• At the other extreme is the external view, which is concerned with how users view the database.
• Between these two ends is the conceptual schema, which is an abstract definition of the database. It is the "real world" view of the enterprise being modeled in the database.
Conceptual Schema Definition
RELATION EMP [
    KEY = {ENO}
    ATTRIBUTES = {
        ENO   : CHARACTER(9)
        ENAME : CHARACTER(15)
        TITLE : CHARACTER(10)
    }
]
RELATION PAY [
    KEY = {TITLE}
    ATTRIBUTES = {
        TITLE : CHARACTER(10)
        SAL   : NUMERIC(6)
    }
]
RELATION PROJ [
    KEY = {PNO}
    ATTRIBUTES = {
        PNO    : CHARACTER(7)
        PNAME  : CHARACTER(20)
        BUDGET : NUMERIC(7)
    }
]
RELATION ASG [
    KEY = {ENO,PNO}
    ATTRIBUTES = {
        ENO  : CHARACTER(9)
        PNO  : CHARACTER(7)
        RESP : CHARACTER(10)
        DUR  : NUMERIC(3)
    }
]
Internal Schema Definition
RELATION EMP [
    KEY = {ENO}
    ATTRIBUTES = {
        ENO   : CHARACTER(9)
        ENAME : CHARACTER(15)
        TITLE : CHARACTER(10)
    }
]
INTERNAL_REL EMPL [
    INDEX ON E# CALL EMINX
    FIELD = {
        HEADER : BYTE(1)
        E#     : BYTE(9)
        ENAME  : BYTE(15)
        TIT    : BYTE(10)
    }
]
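The mapping from the conceptual EMP relation to the internal EMPL record can be illustrated with a minimal Python sketch. It assumes fixed-width, space-padded ASCII fields and a one-byte header whose meaning the slide does not specify. The names `EMPL_LAYOUT`, `to_internal` and `from_internal` are hypothetical and not part of any DBMS.

```python
import struct

# Field widths taken from INTERNAL_REL EMPL: HEADER(1), E#(9), ENAME(15), TIT(10).
# Assumption: every field is fixed-width ASCII, padded on the right with spaces.
EMPL_LAYOUT = struct.Struct("1s9s15s10s")


def to_internal(eno, ename, title, header=b"\x00"):
    """Map a conceptual EMP tuple (ENO, ENAME, TITLE) to an internal EMPL record."""
    return EMPL_LAYOUT.pack(
        header,                            # HEADER: content is an assumption
        eno.encode("ascii").ljust(9),      # E#
        ename.encode("ascii").ljust(15),   # ENAME
        title.encode("ascii").ljust(10),   # TIT
    )


def from_internal(record):
    """Map an internal EMPL record back to the conceptual (ENO, ENAME, TITLE)."""
    _header, eno, ename, tit = EMPL_LAYOUT.unpack(record)
    return tuple(field.decode("ascii").rstrip() for field in (eno, ename, tit))


# Invented sample tuple, for illustration only.
record = to_internal("E1", "J. Doe", "Engineer")
```

The record is 35 bytes long (1 + 9 + 15 + 10). Note that `struct` silently truncates values longer than the field width, so a real mapping would need to validate lengths first.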
External View Definition – Example 1
Create a BUDGET view from the PROJ relation:
    CREATE VIEW BUDGET(PNAME, BUD)
    AS SELECT PNAME, BUDGET
       FROM PROJ
External View Definition – Example 2
Create a PAYROLL view from relations EMP and PAY:
    CREATE VIEW PAYROLL(ENO, ENAME, SAL)
    AS SELECT EMP.ENO, EMP.ENAME, PAY.SAL
       FROM EMP, PAY
       WHERE EMP.TITLE = PAY.TITLE
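To see the external view in action, the following sketch loads the EMP and PAY relations into an in-memory SQLite database through Python's standard `sqlite3` module and queries the PAYROLL view. SQLite is used here only as a convenient stand-in, the column types are approximations of the schema definition, and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMP (ENO CHAR(9) PRIMARY KEY, ENAME CHAR(15), TITLE CHAR(10));
    CREATE TABLE PAY (TITLE CHAR(10) PRIMARY KEY, SAL NUMERIC(6));
    CREATE VIEW PAYROLL (ENO, ENAME, SAL) AS
        SELECT EMP.ENO, EMP.ENAME, PAY.SAL
        FROM EMP, PAY
        WHERE EMP.TITLE = PAY.TITLE;
""")

# Invented sample data.
conn.executemany(
    "INSERT INTO EMP VALUES (?, ?, ?)",
    [("E1", "J. Doe", "Engineer"), ("E2", "M. Smith", "Analyst")],
)
conn.executemany(
    "INSERT INTO PAY VALUES (?, ?)",
    [("Engineer", 40000), ("Analyst", 34000)],
)

# The user of the external view sees only ENO, ENAME and SAL;
# the join through TITLE stays hidden behind the view.
cursor = conn.execute("SELECT * FROM PAYROLL ORDER BY ENO")
view_columns = [column[0] for column in cursor.description]
rows = cursor.fetchall()
```

The view exposes three columns and hides both the TITLE attribute and the fact that the salary comes from a second relation, which is the point of an external schema.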
DBMS STANDARDIZATION: ANSI / SPARC ARCHITECTURE
[Figure: ANSI / SPARC architecture schematic]
DBMS STANDARDIZATION: ANSI / SPARC ARCHITECTURE
• The square boxes represent processing functions, whereas the hexagons are administrative roles.
• The arrows indicate data, command, program, and description flow, whereas the "I"-shaped bars on them represent interfaces.
• The major component that permits mapping between different data organizational views is the data dictionary / directory (depicted as a triangle), which is a meta-database.
• The database administrator is responsible for defining the internal schema.
• The enterprise administrator's role is to prepare the conceptual schema definition.
• The application administrator is responsible for preparing the external schema for applications.
DBMS STANDARDIZATION: ANSI / SPARC ARCHITECTURE
• Two more users:
  – Application programmer
  – System programmer
• Two user classes:
  – Casual user: retrieves data from the database and possibly updates it; added in the external schema
  – Novice user: typically has no knowledge of the database (example: a banking machine user)
ARCHITECTURAL MODELS FOR DISTRIBUTED DBMSs
The systems are characterized with respect to:
(1) the autonomy of the local systems,
(2) their distribution,
(3) their heterogeneity.
Architectural models for Distributed DBMSs
[Figure]
Autonomy
• Distribution of control (and not data): the degree of independence
  – The local operations of the individual DBMSs are not affected by their participation in the multidatabase system
  – The manner in which individual DBMSs process queries and optimize them should not be affected by the execution of global queries
  – System consistency should not be compromised when individual DBMSs join or leave the multidatabase system
Autonomy
• Alternatively, the dimensions of autonomy can be specified as:
  – Design autonomy: ability of a component DBMS to decide on issues related to its own design.
  – Communication autonomy: ability of a component DBMS to decide whether and how to communicate with other DBMSs.
  – Execution autonomy: ability of a component DBMS to execute local operations in any manner it wants to.
Autonomy
• Possibilities:
  – Tight integration: a single image of the entire database is available to any user who wants to share the information, which may reside in multiple databases.
  – Semiautonomous system: consists of DBMSs that can operate independently, but have decided to participate in a federation to make their local data sharable.
  – Total isolation: the individual systems are stand-alone DBMSs, which know neither of the existence of other DBMSs nor how to communicate with them.
ARCHITECTURAL MODELS FOR DISTRIBUTED DBMSs - DISTRIBUTION
Distribution refers to the distribution of data. We are considering the physical distribution of data over multiple sites; the user sees the data as one logical pool. Two alternatives:
• Client / server distribution. The client / server distribution concentrates data management duties at servers, while the clients focus on providing the application environment, including the user interface. The communication duties are shared between the client machines and servers. Client / server DBMSs represent the first attempt at distributing functionality.
• Peer-to-peer distribution (full distribution). There is no distinction between client machines and servers. Each machine has full DBMS functionality and can communicate with other machines to execute queries and transactions.
ARCHITECTURAL MODELS FOR DISTRIBUTED DBMSs - HETEROGENEITY
Heterogeneity may occur in various forms in distributed systems, ranging from hardware heterogeneity and differences in networking protocols to variations in data managers. Representing data with different modeling tools creates heterogeneity because of the inherent expressive powers and limitations of individual data models. Heterogeneity in query languages not only involves the use of completely different data access paradigms in different data models, but also covers differences in languages even when the individual systems use the same data model.
• Various levels (hardware, communications, operating system)
• The DBMS level is the important one: data model, query language, transaction management algorithms
ARCHITECTURAL MODELS FOR DISTRIBUTED DBMSs - ALTERNATIVES
The dimensions are identified as A (autonomy), D (distribution) and H (heterogeneity). The alternatives along each dimension are identified by the numbers 0, 1 or 2.
• Autonomy: A0 - tight integration; A1 - semiautonomous systems; A2 - total isolation
• Distribution: D0 - no distribution; D1 - client / server systems; D2 - peer-to-peer systems
• Heterogeneity: H0 - homogeneous systems; H1 - heterogeneous systems
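The design space spanned by these three dimensions is small enough to enumerate. The sketch below is only an illustration of the notation; the dictionary and function names are hypothetical.

```python
from itertools import product

# Alternatives along each dimension, as listed on the slide.
AUTONOMY = {0: "tight integration", 1: "semiautonomous systems", 2: "total isolation"}
DISTRIBUTION = {0: "no distribution", 1: "client / server systems", 2: "peer-to-peer systems"}
HETEROGENEITY = {0: "homogeneous systems", 1: "heterogeneous systems"}


def design_space():
    """Return every (A, D, H) combination as a tuple of integers."""
    return list(product(AUTONOMY, DISTRIBUTION, HETEROGENEITY))


def describe(a, d, h):
    """Spell out one point of the design space in the slide's notation."""
    return f"(A{a}, D{d}, H{h}): {AUTONOMY[a]}, {DISTRIBUTION[d]}, {HETEROGENEITY[h]}"


points = design_space()
```

Three autonomy levels, three distribution levels and two heterogeneity levels give 3 x 3 x 2 = 18 combinations, which is why the cases are discussed selectively rather than one by one.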
ARCHITECTURAL MODELS FOR DISTRIBUTED DBMSs
[Figure: Alternatives in Distributed Database Systems. Axes: Distribution, Autonomy, Heterogeneity. Labeled points: client/server, peer-to-peer distributed DBMS, federated DBMS, multi-DBMS, distributed multi-DBMS.]
ARCHITECTURAL MODELS FOR DISTRIBUTED DBMSs - ALTERNATIVES
• In Figure 4.3, two alternative architectures are the focus of this book:
  – (A0, D2, H0)
  – (A2, D2, H1)
• Not all the architectures that are identified by this design space are meaningful.
ARCHITECTURAL MODELS FOR DISTRIBUTED DBMSs - ALTERNATIVES
• (A0, D0, H0) If there is no distribution or heterogeneity, the system is a set of multiple DBMSs that are logically integrated. Such systems can be given the generic name composite systems. There are not many examples of such systems, but they may be suitable for shared-everything multiprocessor systems.
• (A0, D0, H1) If heterogeneity is introduced, one has multiple data managers that are heterogeneous but provide an integrated view to the user.
• (A0, D1, H0) The more interesting case is where the database is distributed even though an integrated view of the data is provided to users (client / server distribution). This case was mentioned earlier and will be discussed further.
• (A0, D2, H0) The same type of transparency is provided to the user in a fully distributed environment. There is no distinction between clients and servers, each site providing identical functionality.
• (A1, D0, H0) These are semiautonomous systems, which are commonly termed federated DBMSs. The component systems in a federated environment have significant autonomy in their execution, but their participation in the federation indicates that they are willing to cooperate with others in executing user requests that access multiple databases. An example may be multiple installations of a DBMS.
• (A1, D0, H1) These are systems that introduce heterogeneity as well as autonomy, what we might call a heterogeneous federated DBMS.
• (A1, D1, H1) Systems of this type introduce distribution by placing component systems on different machines. They may be referred to as distributed, heterogeneous federated DBMSs.
• (A2, D0, H0) Now we have full autonomy. These are multidatabase systems (MDBS). The components have no concept of cooperation. Without heterogeneity and distribution, an MDBS is an interconnected collection of autonomous databases.
• (A2, D0, H1) This case is realistic, maybe even more so than (A1, D0, H1), in that we always want to build applications which access data from multiple storage systems with different characteristics.
• (A2, D1, H1) and (A2, D2, H1) These two cases are considered together because of the similarity of the problem. They both represent the case where the component databases that make up the MDBS are distributed over a number of sites. We call this the distributed MDBS.
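Datalogical Distributed DBMS Architecture
[Figure: schema layers, top to bottom]
• External schemas: ES1, ES2, ..., ESn
• Global conceptual schema: GCS
• Local conceptual schemas: LCS1, LCS2, ..., LCSn
• Local internal schemas: LIS1, LIS2, ..., LISn
Legend: ES: External Schema; GCS: Global Conceptual Schema; LCS: Local Conceptual Schema; LIS: Local Internal Schema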
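Datalogical Multi-DBMS Architecture
[Figure: schema labels shown in the diagram]
• Global external schemas: GES1, GES2, ..., GESn
• Global conceptual schema: GCS
• Local external schemas: LES11, ..., LES1n through LESn1, ..., LESnm
• Local conceptual schemas: LCS1, LCS2, ..., LCSn
• Local internal schemas: LIS1, LIS2, ..., LISn
Legend: GES: Global External Schema; LES: Local External Schema; LCS: Local Conceptual Schema; LIS: Local Internal Schema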
Distributed DBMS
• A distributed database requires a distributed DBMS
• Functions of a distributed DBMS:
  – Locate data with a distributed data dictionary
  – Determine the location from which to retrieve data and process query components
  – DBMS translation between nodes with different local DBMSs (using middleware)
  – Data consistency (via multiphase commit protocols)
  – Global primary key control
  – Scalability
  – Security, concurrency, query optimization, failure recovery
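The slide does not say which multiphase commit protocol is meant. As one well-known instance, two-phase commit can be sketched as follows. This is a minimal illustration under strong assumptions: no logging, no timeouts, no site or network failures. The `Participant` class and `two_phase_commit` function are hypothetical names.

```python
class Participant:
    """Hypothetical stand-in for the local DBMS at one site."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote on whether the local part of the transaction can commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator logic: commit everywhere only if every site votes yes."""
    # Phase 1 (voting): collect a vote from every site.
    votes = [p.prepare() for p in participants]
    # Phase 2 (decision): broadcast the same outcome to all sites.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"


all_ready = [Participant("site_a"), Participant("site_b")]
one_refuses = [Participant("site_a"), Participant("site_b", can_commit=False)]
outcome_ok = two_phase_commit(all_ready)
outcome_bad = two_phase_commit(one_refuses)
```

The property that matters for data consistency is that all sites end in the same state: either every participant commits or every participant aborts.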
Figure 13-10 – Distributed DBMS architecture
Local Transaction Steps
1. Application makes request to distributed DBMS
2. Distributed DBMS checks distributed data repository for location of data and finds that it is local
3. Distributed DBMS sends request to local DBMS
4. Local DBMS processes request
5. Local DBMS sends results to application
Figure 13-10: Distributed DBMS architecture (cont.), showing local transaction steps 1 to 5. Local transaction: all data stored locally
Global Transaction Steps
1. Application makes request to distributed DBMS
2. Distributed DBMS checks distributed data repository for location of data and finds that it is remote
3. Distributed DBMS routes request to remote site
4. Distributed DBMS at remote site translates request for its local DBMS if necessary, and sends request to local DBMS
5. Local DBMS at remote site processes request
6. Local DBMS at remote site sends results to distributed DBMS at remote site
7. Remote distributed DBMS sends results back to originating site
8. Distributed DBMS at originating site sends results to application
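The local and global transaction steps can be sketched in a few lines of Python. This is an illustrative model only: the classes are hypothetical, the "network" is a plain dictionary, the sites are assumed to run the same local DBMS (so the translation step is a no-op), and the sample data is invented.

```python
class LocalDBMS:
    """Hypothetical local DBMS holding the data stored at one site."""

    def __init__(self, data):
        self.data = data

    def process(self, key):
        return self.data[key]


class DistributedDBMS:
    """Hypothetical distributed DBMS layer running at one site."""

    def __init__(self, site, local_dbms, repository, network):
        self.site = site
        self.local = local_dbms
        self.repository = repository  # distributed data repository: data item -> site
        self.network = network        # site name -> DistributedDBMS at that site
        network[site] = self

    def request(self, key, trace):
        """Entry point for an application request made at this site."""
        location = self.repository[key]          # look up where the data lives
        if location == self.site:
            trace.append(f"{self.site}: local")
            return self.local.process(key)       # local transaction
        trace.append(f"{self.site}: route to {location}")
        remote = self.network[location]          # global transaction
        return remote.handle_remote(key, trace)  # results flow back to the caller

    def handle_remote(self, key, trace):
        """Runs at the remote site; translation is skipped (same local DBMS assumed)."""
        trace.append(f"{self.site}: processed for remote request")
        return self.local.process(key)


repository = {"emp": "site_a", "proj": "site_b"}
network = {}
site_a = DistributedDBMS("site_a", LocalDBMS({"emp": ["E1"]}), repository, network)
site_b = DistributedDBMS("site_b", LocalDBMS({"proj": ["P1"]}), repository, network)

trace = []
local_result = site_a.request("emp", trace)    # data is at the requesting site
global_result = site_a.request("proj", trace)  # data is at a remote site
```

The application calls the same `request` method in both cases; whether the transaction is local or global is decided entirely by the repository lookup, which is what makes the data location transparent to the application.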
Figure 13-10: Distributed DBMS architecture (cont.), showing global transaction steps 1 to 8. Global transaction: some data is at remote site(s)
DISTRIBUTED DBMS ARCHITECTURE
• Client / server systems: (Ax, D1, Hy)
• Distributed databases: (A0, D2, H0)
• Multidatabase systems: (A2, Dx, Hy)
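The x and y subscripts act as wildcards, so each family covers several points of the (A, D, H) design space. A minimal sketch of that matching, with hypothetical names and `None` standing for a wildcard:

```python
# Patterns from the slide; None plays the role of the wildcard x or y.
FAMILIES = {
    "client / server system": (None, 1, None),  # (Ax, D1, Hy)
    "distributed database": (0, 2, 0),          # (A0, D2, H0)
    "multidatabase system": (2, None, None),    # (A2, Dx, Hy)
}


def matches(pattern, point):
    """True if every non-wildcard position of the pattern equals the point's value."""
    return all(p is None or p == v for p, v in zip(pattern, point))


def families(a, d, h):
    """Return the names of all families whose pattern covers (A, D, H)."""
    return [name for name, pattern in FAMILIES.items() if matches(pattern, (a, d, h))]
```

For example, `families(0, 2, 0)` returns only the distributed database family, while `families(2, 1, 1)` matches both the client / server and the multidatabase patterns, which shows that the three families as written are not mutually exclusive.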
Client/Server Systems
• Networked computing model
• Processes distributed between clients and servers
• Client: workstation (usually a PC) that requests and uses a service
• Server: computer (PC/mini/mainframe) that provides a service
• For a DBMS, the server is a database server
Application Logic in C/S Systems
• Presentation Logic (GUI interface)
  – Input: keyboard/mouse
  – Output: monitor/printer
• Processing Logic (procedures, functions, programs)
  – I/O processing
  – Business rules
  – Data management
• Storage Logic (DBMS activities)
  – Data storage/retrieval
Client/Server Architectures
• File Server Architecture
• Database Server Architecture
• Three-tier Architecture
File Server Architecture
• All processing is done at the PC that requested the data
• Entire files are transferred from the server to the client for processing
• Problems:
  – Huge amount of data transfer on the network
  – Each client must contain a full DBMS
  – Heavy resource demand on clients
  – Client DBMSs must recognize shared locks, integrity checks, etc.
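The data-transfer problem can be made concrete with a small sketch. It models the file on the server as a Python list and counts how many rows cross the "network". The database-server function is included only for contrast and assumes the server can evaluate the filter itself; all names and the sample data are invented.

```python
# Invented sample data standing in for a file stored on the server.
SERVER_FILE = [{"pno": f"P{i}", "budget": 1000 * i} for i in range(1, 101)]


def file_server_query(predicate):
    """File server: the whole file crosses the network; the client filters it."""
    transferred = list(SERVER_FILE)  # entire file is sent to the client
    result = [row for row in transferred if predicate(row)]
    return result, len(transferred)


def database_server_query(predicate):
    """Contrast case: the server filters, so only matching rows are sent."""
    result = [row for row in SERVER_FILE if predicate(row)]
    return result, len(result)


def big_budget(row):
    return row["budget"] > 95000


fs_rows, fs_sent = file_server_query(big_budget)
db_rows, db_sent = database_server_query(big_budget)
```

Both calls return the same five rows, but the file server ships all 100 rows to obtain them, while the contrast case ships only the five that match.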
File Server Architecture
[Figure: file server architecture]