Distributed DBMSs: Concept and Design. Jing Luo, CS 157B, Dr. Lee, Fall 2003
DBMSs • Centralized DBMS: It allows users to access only a single logical database located at one site under its control. • Distributed DBMS: It allows users to access not only the data at their own site but also data stored at remote sites.
Definitions • Distributed database: A logically interrelated collection of shared data (and a description of this data) physically distributed over a computer network. • Distributed DBMS: The software system that permits the management of the distributed database and makes the distribution transparent to users.
Users access the distributed database via applications • Local applications: applications that do not require data from other sites. • Global applications: applications that do require data from other sites.
Characteristics of DDBMS • A collection of logically related shared data; • The data is split into a number of fragments; • Fragments may be replicated; • Fragments/replicas are allocated to sites; • The sites are linked by a communications network; • The data at each site is under the control of a DBMS; • The DBMS at each site can handle local applications autonomously; • Each DBMS participates in at least one global application.
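The fragment, replica, and allocation ideas above can be sketched in a few lines of Python. This is only an illustrative model: the `Site` class, the fragment names F1 to F3, and the `classify` helper are assumptions made for the example, not part of any DDBMS product.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A site and the fragments (or replicas) allocated to it."""
    name: str
    fragments: set = field(default_factory=set)


def classify(app_site: Site, needed: set) -> str:
    """Return 'local' if every needed fragment is stored at the
    application's own site, otherwise 'global'."""
    return "local" if needed <= app_site.fragments else "global"


# Hypothetical allocation: fragment F2 is replicated at both sites.
site1 = Site("Site 1", {"F1", "F2"})
site2 = Site("Site 2", {"F2", "F3"})

print(classify(site1, {"F1"}))        # only needs data held at Site 1
print(classify(site1, {"F1", "F3"}))  # F3 is stored only at Site 2
```

An application that touches only fragments held at its own site is local; as soon as it needs a fragment stored elsewhere it becomes a global application.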
A DDBMS is required to have at least one global application. It is not necessary for every site in the system to have its own local database. [Figure: DDBMS. Sites 1 to 4 are linked by a computer network; three DBs are shown.]
Distributed processing: A centralized database that can be accessed over a computer network.
Distributed Processing (cont’d) [Figure: distributed processing. Sites 1 to 4 are linked by a computer network; a single DB is shown.]
Distributed DBMS vs. Distributed Processing • Distributed DBMS: The system consists of data that is physically distributed across a number of sites in the network. • Distributed processing: Data is centralized, even though other users may be accessing the data over the network.
Parallel DBMSs: A DBMS running across multiple processors and disks that is designed to execute operations in parallel, whenever possible, in order to improve performance.
Three Main Architectures for Parallel DBMSs To provide multiple processors with common access to a single database, a parallel DBMS must provide for shared resource management. • Shared memory • Shared disk • Shared nothing
Shared memory is a tightly coupled architecture in which multiple processors within a single system share system memory. • Also known as symmetric multiprocessing (SMP), this approach has become popular on platforms ranging from personal workstations that support a few microprocessors in parallel, to RISC (Reduced Instruction Set Computer) based machines, all the way up to the largest mainframes. • The architecture provides high-speed data access for a limited number of processors, but it does not scale beyond about 64 processors, at which point the interconnection network becomes a bottleneck.
Shared Memory (cont’d) [Figure: shared memory. Four CPUs connected through an interconnection network to a common memory and three DBs.]
Shared disk is a loosely-coupled architecture optimized for applications that are inherently centralized and require high availability and performance. • Each processor can access all disks directly, but each has its own private memory. • Shared disk architecture eliminates the shared memory performance bottleneck without introducing the overhead associated with physically partitioned data.
Shared Disk (cont’d) [Figure: shared disk. Four CPUs, each with its own memory, connected through an interconnection network to three DBs.]
Shared nothing, also known as massively parallel processing (MPP), is a multiple-processor architecture in which each processor is part of a complete system, with its own memory and disk storage. • The database is partitioned among all the disks on each system associated with the database, and data is transparently available to users on all systems. • This architecture can easily support a large number of processors.
Shared nothing (cont’d) [Figure: shared nothing (SN). Four CPUs, each with its own memory and its own DB, connected through an interconnection network.]
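A minimal sketch of how a shared-nothing system might partition data across nodes while keeping the partitioning transparent to users. Hash partitioning is an assumption here (the slides do not name a partitioning scheme), and `NUM_NODES`, `node_for`, `put`, and `get` are hypothetical names.

```python
import zlib

NUM_NODES = 4  # hypothetical number of processors, each with its own disk


def node_for(key: str, num_nodes: int = NUM_NODES) -> int:
    """Pick the node that owns a row, using a stable hash of its key."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


# Each node has private storage, modelled here as one dict per node.
nodes = [dict() for _ in range(NUM_NODES)]


def put(key: str, row: dict) -> None:
    """Store a row on the node that owns its key."""
    nodes[node_for(key)][key] = row


def get(key: str):
    """Look a row up; the caller never names a node."""
    return nodes[node_for(key)].get(key)


for emp_id in ("E100", "E101", "E102", "E103", "E104"):
    put(emp_id, {"id": emp_id})

print(get("E102"))
print([len(n) for n in nodes])  # how many rows landed on each node
```

Because no memory or disk is shared, adding a node only changes the routing function, which is one reason this architecture can support many processors.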
Homogeneous & Heterogeneous DDBMSs • Homogeneous system: All sites use the same DBMS product. • Heterogeneous system: Sites may run different DBMS products, which need not be based on the same underlying data model, and so the system may be composed of relational, network, hierarchical, and object-oriented DBMSs.
Heterogeneous system problems In a heterogeneous system, translations are required to allow communication between different DBMSs. The system has the task of locating the data and performing any necessary translation. Data may be required from another site that has: • Different hardware • Different DBMS products • Different hardware and different DBMS products If the hardware is different but the DBMS products are the same, the translation involves changing codes and word lengths. If the DBMS products are different, the translation involves mapping data structures in one data model to the equivalent data structures in another data model.
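The hardware-level translation (change of codes and word length) might look like the following sketch. It assumes, purely for illustration, that the remote site stores text in EBCDIC (code page 037) and integers as 16-bit big-endian words, while the local site uses UTF-8 and 32-bit little-endian words; `translate_record` is a hypothetical helper.

```python
import struct


def translate_record(remote_name: bytes, remote_qty: bytes):
    """Convert one remote record into the local representation."""
    name = remote_name.decode("cp037")        # change of character code
    (qty,) = struct.unpack(">h", remote_qty)  # read a 16-bit big-endian word
    # Re-encode the text and widen the integer to a 32-bit little-endian word.
    return name.encode("utf-8"), struct.pack("<i", qty)


# Build a sample remote record (assumed format, see above).
remote_name = "SMITH".encode("cp037")
remote_qty = struct.pack(">h", 258)

local_name, local_qty = translate_record(remote_name, remote_qty)
print(remote_name, "->", local_name)
print(remote_qty, "->", local_qty)
```

The values are unchanged; only their byte-level representation differs between the two sites.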
Heterogeneous system problems (cont’d) An additional complexity is the provision of a common conceptual schema. The integration of data models can be very difficult owing to semantic heterogeneity. For example, attributes with the same name in two schemas may represent different things. Equally, attributes with different names may model the same thing.
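Both kinds of semantic heterogeneity can be illustrated with a small mapping table. Everything below is hypothetical: two sites "A" and "B", an attribute `salary` that is yearly at A but monthly at B (same name, different meaning), and `emp_no` at A versus `staff_id` at B (different names, same meaning).

```python
# For each site: local attribute -> (global attribute, conversion function).
MAPPINGS = {
    "A": {
        "emp_no": ("employee_id", lambda v: v),
        "salary": ("annual_salary", lambda v: v),       # already yearly
    },
    "B": {
        "staff_id": ("employee_id", lambda v: v),
        "salary": ("annual_salary", lambda v: v * 12),  # monthly -> yearly
    },
}


def to_global(site: str, row: dict) -> dict:
    """Rename and convert a local row into the common conceptual schema."""
    out = {}
    for attr, value in row.items():
        global_name, convert = MAPPINGS[site][attr]
        out[global_name] = convert(value)
    return out


row_a = to_global("A", {"emp_no": 7, "salary": 60000})
row_b = to_global("B", {"staff_id": 7, "salary": 5000})
print(row_a)
print(row_b)
```

The hard part in practice is discovering these mappings, since nothing in the attribute names reveals that the two `salary` columns mean different things.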
Solution: Gateways, which convert the language and model of each different DBMS into the language and model of the relational system. Limitations: • A gateway may not support transaction management. The gateway between two systems may be only a query translator. For example, a system may not coordinate concurrency control and recovery of transactions that involve updates to the pair of databases. • The gateway approach is concerned only with the problem of translating a query expressed in one language into an equivalent expression in another language. As such, it generally does not address the issues of homogenizing the structural and representational differences between different schemas.
A multidatabase system (MDBS) is a distributed DBMS in which each site maintains complete autonomy. An MDBS resides transparently on top of existing database and file systems, and presents a single database to its users. It maintains a global schema against which users issue queries and updates; an MDBS maintains only the global schema and the local DBMSs themselves maintain all user data.
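A rough sketch of the MDBS idea, using two in-memory SQLite databases as stand-ins for autonomous local DBMSs. The `staff` table, the sample rows, and the `GlobalSchema` class are assumptions made for the example; a real MDBS would also have to handle updates and differing local schemas.

```python
import sqlite3


def make_site(rows):
    """Create one autonomous local database holding its own user data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staff (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO staff VALUES (?, ?)", rows)
    return conn


site_a = make_site([(1, "Ann"), (2, "Bob")])
site_b = make_site([(3, "Cho")])


class GlobalSchema:
    """Holds no user data, only the knowledge of which sites expose 'staff'."""

    def __init__(self, sites):
        self.sites = sites

    def all_staff(self):
        """Answer a global query by asking every local DBMS and merging."""
        rows = []
        for conn in self.sites:
            rows.extend(conn.execute("SELECT id, name FROM staff").fetchall())
        return sorted(rows)


mdbs = GlobalSchema([site_a, site_b])
print(mdbs.all_staff())
```

Users query `mdbs` as if it were a single database, while each site keeps full control of its own rows.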
Concepts of Networking Network: An interconnected collection of autonomous computers that are capable of exchanging information. For our purposes, the DDBMS is built on top of a network in such a way that the network is hidden from the user.
Classification of network • LAN: a local area network is intended for connecting computers at the same site. • WAN: a wide area network is used when computers or LANs need to be connected over long distances. • A special case of the WAN is a metropolitan area network (MAN), which generally covers a city or suburb.
Summary of WAN and LAN characteristics
• Distance: WAN up to thousands of kilometers; LAN up to a few kilometers
• Computers linked: WAN links autonomous computers; LAN links computers that cooperate in distributed applications
• Management: WAN managed by an independent organization (using telephone or satellite links); LAN managed by users (using privately owned cables)
• Data rate: WAN up to 33.6 kbit/s (dial-up via modem) or 45 Mbit/s (T3 circuit); LAN up to 2500 Mbit/s (ATM)
• Protocol: WAN complex; LAN simpler
• Routing: WAN point-to-point; LAN broadcast
• Topology: WAN irregular; LAN bus or ring
• Error rate: WAN about 1:10^5; LAN about 1:10^9
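To put the data rates in perspective, the short calculation below estimates how long a 1 MB transfer would take at each peak rate quoted above. The 1 MB size is an arbitrary assumption, and the figures ignore protocol overhead, latency, and retransmissions, so they are best-case estimates only.

```python
# Peak data rates from the summary, in bits per second.
RATES_BIT_PER_S = {
    "WAN dial-up modem": 33.6e3,
    "WAN T3 circuit": 45e6,
    "LAN ATM": 2500e6,
}


def transfer_seconds(size_bytes: int, rate_bit_per_s: float) -> float:
    """Ideal transfer time: payload bits divided by the link rate."""
    return size_bytes * 8 / rate_bit_per_s


SIZE = 1_000_000  # 1 MB, a hypothetical query result

for name, rate in RATES_BIT_PER_S.items():
    print(f"{name}: {transfer_seconds(SIZE, rate):.4f} s")
```

That works out to roughly 238 s over the modem, about 0.18 s over T3, and about 0.003 s over ATM, which is why the cost of moving data between sites matters so much more over a WAN.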
Network protocols: a set of rules that determines how messages between computers are sent, interpreted, and processed. • TCP/IP (Transmission Control Protocol/Internet Protocol) • SPX/IPX (Sequenced Packet Exchange/Internetwork Packet Exchange) • NetBIOS (Network Basic Input/Output System) • APPC (Advanced Program-to-Program Communications)
Network protocols (cont’d) • DECnet • AppleTalk • WAP (Wireless Application Protocol)
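As a toy illustration of "a set of rules" for sending and interpreting messages, the sketch below defines one rule: every message is preceded by a 4-byte big-endian length. This framing is an invented example, not any of the protocols listed above; `send_msg`, `recv_exact`, and `recv_msg` are hypothetical helpers built on Python's standard `socket` module.

```python
import socket
import struct


def send_msg(sock: socket.socket, payload: bytes) -> None:
    """Rule for sending: 4-byte big-endian length, then the payload."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, since recv may return fewer than requested."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def recv_msg(sock: socket.socket) -> bytes:
    """Rule for interpreting: read the length first, then that many bytes."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


# Two connected endpoints in one process stand in for two sites.
a, b = socket.socketpair()
send_msg(a, b"SELECT 1")
reply = recv_msg(b)
a.close()
b.close()
print(reply)
```

Both sides must agree on the same rule; a receiver that expected a different header size would misread every message.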