Distributed and Parallel Databases
Distributed Databases • Distributed systems goal: to offer local DB autonomy at geographically distributed locations • Multiple CPUs: each runs its own DBMS, but the data is distributed • Loosely coupled • homogeneous (same DBMS at every site) or heterogeneous (different DBMSs; need ODBC, standard SQL)
Advantages of DDBs • distributed nature of some DB applications (bank branches) • increased reliability and availability if a site fails; data can also be replicated at more than 1 site • data sharing but also local control • improved performance: smaller DBs exist at each site • easier expansion
Client-Server • Client-Server: (b) in figure • Client sends request for service (strict, fixed roles) • 3-tier architecture: presentation tier, logic tier, data tier
Distributed DBSs (DDBS) • Distributed DB: (c) in figure • WAN • Multiple CPUs: each runs its own DBMS, but the data is distributed • lower communication rates • Heterogeneous machines • Homogeneous DDBS: same DBMS at every site • Heterogeneous DDBS: different DBMSs; need ODBC, standard SQL
Heterogeneous distributed DBSs (HDDBs) • Data is distributed and each site has its own DBMS (ORACLE at one site, DB2 at another, etc.) • need ODBC, standard SQL • usually a transaction manager is responsible for cooperation among sites • must coordinate distributed transactions • need data conversion and a way to access data at other sites
P2P • Every site can act as a server that stores part of the DB and as a client that requests service
Federated DB (FDBS) • A federated DB is a multidatabase that is autonomous: (a) in figure • collection of cooperating DBSs that are heterogeneous • preexisting DBs form a new database • Each DB specifies an import/export schema (view) • keeps a partial view of the total schema • Each DB has its own local users, local transparency, and DBA • appears centralized to local autonomous users • appears distributed to global users
DDBS • Issues in DDBS in slides that follow
Replication • Full vs. partial replication • Which copy to access • Improves performance for global queries, but updates are a problem • Must ensure consistency of the replicated copies of data
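The slide does not name a protocol for choosing a copy or keeping copies consistent. As one assumed illustration, the sketch below uses read-one/write-all: a read uses any single copy, a write updates every copy. The class `ReplicatedItem` and the site names are hypothetical.

```python
class ReplicatedItem:
    """One logical data item stored as copies at several sites (hypothetical model)."""

    def __init__(self, sites, value):
        # Every listed site holds its own copy of the item.
        self.copies = {site: value for site in sites}

    def read(self, local_site):
        # Read-one: any copy is valid, so prefer the local copy when it exists.
        site = local_site if local_site in self.copies else next(iter(self.copies))
        return self.copies[site]

    def write(self, value):
        # Write-all: every copy is updated, so an update costs more
        # as the number of replicas grows.
        for site in self.copies:
            self.copies[site] = value
        return len(self.copies)  # number of copies touched


balance = ReplicatedItem(["site1", "site2", "site3"], 100)
touched = balance.write(250)
```

Reads stay cheap because one copy is enough, while a single update touches all three copies. That is the trade-off the slide describes.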
Data fragments • Can store a whole relation at one site, or split it into data fragments: logical units of the DB assigned for storage at various sites • horizontal fragmentation: subset of the tuples in the relation (select) • vertical fragmentation: keeps only certain attributes of the relation (project); each fragment needs the PK
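A minimal sketch of the two kinds of fragmentation, assuming a toy EMPLOYEE relation held as a list of dictionaries. The relation, its attribute names, and both helper functions are illustrative and not from the slides.

```python
# Toy EMPLOYEE relation; ssn is assumed to be the primary key.
EMPLOYEE = [
    {"ssn": 1, "name": "Ann", "salary": 4000, "dno": 4},
    {"ssn": 2, "name": "Bob", "salary": 6000, "dno": 4},
    {"ssn": 3, "name": "Cy", "salary": 5500, "dno": 5},
]


def horizontal_fragment(relation, predicate):
    """Select: keep whole tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def vertical_fragment(relation, attributes, pk="ssn"):
    """Project: keep only some attributes, always including the PK."""
    keep = [pk] + [a for a in attributes if a != pk]
    return [{a: row[a] for a in keep} for row in relation]


dept4 = horizontal_fragment(EMPLOYEE, lambda r: r["dno"] == 4)
pay = vertical_fragment(EMPLOYEE, ["salary"])
```

Keeping the PK in every vertical fragment is what lets the original tuples be rebuilt later by joining the fragments on that key.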
Fragments cont’d • Horizontal fragments: • disjoint: each tuple is a member of only 1 fragment (example condition: salary < 5000 and dno = 4) • complete: set of fragments whose conditions include every tuple • Complete vertical fragmentation: L1 ∪ L2 ∪ ... ∪ Ln = attributes of R, and Li ∩ Lj = PK(R) for i ≠ j
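The conditions above can be checked mechanically. The sketch below assumes fragments are lists of dictionaries taken from the relation and that attribute lists are plain Python lists. The function names and the sample data are hypothetical.

```python
def is_disjoint(fragments, pk):
    """True if no tuple (identified by its PK) appears in more than one fragment."""
    keys = [row[pk] for frag in fragments for row in frag]
    return len(keys) == len(set(keys))


def is_complete(relation, fragments, pk):
    """True if the fragments together cover every tuple of the relation."""
    covered = {row[pk] for frag in fragments for row in frag}
    return covered == {row[pk] for row in relation}


def is_complete_vertical(all_attrs, attr_lists, pk):
    """Check L1 U ... U Ln = attributes of R and Li n Lj = {PK} for i != j."""
    union_ok = set().union(*attr_lists) == set(all_attrs)
    pairs_ok = all(
        set(a) & set(b) == {pk}
        for i, a in enumerate(attr_lists)
        for b in attr_lists[i + 1:]
    )
    return union_ok and pairs_ok


R = [
    {"ssn": 1, "salary": 4000, "dno": 4},
    {"ssn": 2, "salary": 6000, "dno": 4},
    {"ssn": 3, "salary": 5500, "dno": 5},
]
by_dept = [[r for r in R if r["dno"] == 4], [r for r in R if r["dno"] == 5]]
overlapping = [[r for r in R if r["salary"] > 5000], [r for r in R if r["dno"] == 4]]
```

Here `by_dept` is both disjoint and complete. `overlapping` is complete but not disjoint, because the tuple with ssn 2 satisfies both conditions.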
Example replication/fragmentation • Example of fragments for company DB: • site 1 (company headquarters) gets the entire DB • sites 2, 3 get horizontal fragments based on dept. no.
Increased complexity • Additional functions needed: • global vs. local queries • keep track of data and replication • execution strategies if data is at more than 1 site • which copy to access • maintain consistency of copies
To process a query • Must use a data dictionary that includes info on data distribution among servers • Ensure atomicity • Parse the user query • Decompose it into independent site queries • Send each site query to the appropriate server site • Each site processes its local query and sends the result to the result site • Result site combines the results of the subqueries
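The steps above can be sketched as follows, assuming each site holds a horizontal fragment of an ACCOUNT table and the data dictionary is a simple table-to-sites map. `SITES`, `DATA_DICTIONARY`, and both functions are hypothetical, and atomicity is not modeled.

```python
# Hypothetical sites: each one stores a horizontal fragment of ACCOUNT.
SITES = {
    "site1": {"account": [{"name": "Ann", "age": 70}, {"name": "Bob", "age": 40}]},
    "site2": {"account": [{"name": "Cy", "age": 81}]},
}

# Data dictionary: which sites hold data for each table.
DATA_DICTIONARY = {"account": ["site1", "site2"]}


def run_site_query(site, table, predicate):
    """Local processing at one site: filter only that site's fragment."""
    return [row for row in SITES[site][table] if predicate(row)]


def process_query(table, predicate):
    """Decompose into one site query per fragment, then combine at the result site."""
    sites = DATA_DICTIONARY[table]  # look up the data distribution
    partial = [run_site_query(s, table, predicate) for s in sites]
    return [row for part in partial for row in part]  # combine the subquery results


seniors = process_query("account", lambda r: r["age"] > 65)
```

In this sketch the site queries run one after another. In a real system they would be sent over the network and could run concurrently.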
Architectures • Distributed systems goal: to offer local DB autonomy at geographically distributed locations, versus • Parallel systems goal: to construct a faster centralized computer • Improve performance through parallelization • Distribution of data governed by performance • Processing and I/O done simultaneously
Parallel DBSs • Shared-memory multiprocessor: • get N times as much work with N CPUs • MIMD, SIMD: equal access to the same data, massively parallel • Parallel shared nothing: • data is split among the sites; each site has its own CPU + memory • work for transactions is divided among the sites • sites communicate over high-speed networks (LANs) • homogeneous machines
Query Parallelism • Decompose the query into parts that can be executed in parallel at several sites • Intra-query parallelism • If shared nothing & horizontally fragmented: SELECT name, phone FROM account WHERE age > 65 • Decompose into K different queries • Result site accepts all partial results and puts them together (order by, count) • What if the query is a join and the table is fragmented?
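A minimal sketch of the K-way decomposition, assuming K = 3 horizontal fragments and using in-memory SQLite databases and threads as stand-ins for shared-nothing sites. The fragment data, `run_at_site`, and `parallel_query` are illustrative. The join case raised in the last bullet is not covered.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# K = 3 horizontal fragments of ACCOUNT, one per site (toy data).
FRAGMENTS = [
    [("Ann", "555-0101", 70), ("Bob", "555-0102", 40)],
    [("Cy", "555-0103", 81)],
    [("Dee", "555-0104", 66), ("Eli", "555-0105", 30)],
]
SITE_QUERY = "SELECT name, phone FROM account WHERE age > 65"


def run_at_site(rows):
    """Stand-in for one site: load its fragment and run the same query locally."""
    conn = sqlite3.connect(":memory:")  # opened and used inside the worker thread
    try:
        conn.execute("CREATE TABLE account (name TEXT, phone TEXT, age INTEGER)")
        conn.executemany("INSERT INTO account VALUES (?, ?, ?)", rows)
        return conn.execute(SITE_QUERY).fetchall()
    finally:
        conn.close()


def parallel_query(fragments):
    """Run the K site queries in parallel, then merge at the result site."""
    with ThreadPoolExecutor(max_workers=len(fragments)) as pool:
        partials = list(pool.map(run_at_site, fragments))
    merged = [row for part in partials for row in part]
    merged.sort()  # a global ORDER BY has to be redone on the merged rows
    return merged, len(merged)  # the global COUNT equals the sum of per-site counts


rows, count = parallel_query(FRAGMENTS)
```

The selection runs unchanged on every fragment. Only the ordering and counting are left for the result site, since no single site sees all the qualifying rows.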
Other issues • Distributed concurrency control using locking • New models • Cloud computing