440 likes | 465 Views
Distributed Database Management Systems. Database Management System (DBMS). Collection of interrelated data and set of programs to access the data Convenient and efficient processing of data Database Application Software. Abstraction. View level : different perspectives
E N D
Database Management System (DBMS) • Collection of • interrelated data and • set of programs to access the data • Convenient and efficient processing of data • Database Application Software CSCE 824
Abstraction • View level: different perspectives • application programs hide irrelevant data • Logical level: data models • Logical representation of data • Different approaches: relational, hierarchical, network, object oriented, semi-structured, etc. • Data independence principle • Physical level: how data is stored CSCE 824
Motivation for DBMS • Integrate related data • Provide centralized and controlled access to data • Guarantee database properties • Reduce database development efforts CSCE 824
Data Models • A collection of tools for describing • Data • Relationships among data items • Semantics of stored data • Database constraints • Entity-Relational Model • UML • Etc. CSCE 824
Database Management Systems • Smaller and smaller systems • Past: large and expensive DBMS • Present: DBMS in most personal computers • More and more data stored • Past: few MB • Present: terabyte (1012 bytes), petabyte (1015 bytes) • Functionality: from physical to view level • Optimization CSCE 824
Data Definition Language (DDL) • Defines the database schema and constraints • DDL compiler data dictionary • Metadata – data about data CSCE 824
Modeling data semantics CSCE 824
Example E/R Diagram Breed Name Name License # Dog Boards Kennel Age Address Owns Phone Weight Pays Owner Name Phone CSCE 824
Converting ER Model into Relations Dog CSCE 824
Data Manipulation Language (DML) • Accessing and manipulating the data • Query Languages • Procedural – user specifies what data is required and how to get those data • Nonprocedural – user specifies what data is required without specifying how to get those data CSCE 824
Query Languages • Relational Algebra • Set operations • SQL • Bag operations CSCE 824
Structured Query LanguageSQL • Typical SQL query form:SELECT A1, A2, ..., AnFROMr1, r2, ..., rmWHERE C • Ais represent attributes to be returned • ris represent relations • C is a condition CSCE 824
Computer Network • Distributed processing: • Number of autonomous processing elements that are interconnected by computer network • Cooperate to perform their assigned tasks CSCE 824
Why to distribute? • Intuition • Reliability • Performance CSCE 824
What to distribute? • Processing logic/element • Functions • Data • Control of execution CSCE 824
Distributed Database Systems • Distributed database: • Collection of multiple, logically interrelated databases that are distributed over a computer network • Distributed DBMS: software system that • Permits the management of the distributed database and • Makes the distribution transparent to the user CSCE 824
DDBMS Services • Transparent data management • Distributed, replicated data • Transparency: network, replica, fragmentation • Reliable access to data • Distributed transactions • Failure atomicity • Improved performance • Flexible expansion CSCE 824
Difficulties • Everything that is present in traditional DBs • Fragmentation and replica control • Data retrieval • Data update • Dealing with failures • Synchronization CSCE 824
Distributed Database Design CSCE 824
Design Issues • Placing of data and programs (DBMS and application) • Network issues CSCE 824
Level of Sharing • No sharing • Data sharing • Data and program sharing Heterogeneous environment! CSCE 824
Why Replication? • System availability • Performance • Scalability • Application requirements CSCE 824 - Spring 2011
Risk of Replication • Worse performance: updates must be applied to all replicas and synchronized • Worse availability: some algorithms require multiple replicas to be operational for any of them to be used CSCE 824 - Spring 2011
Replicated Databases • Multiple copies of the same data items (databases) • Consistency: • Local consistency • Mutual consistency CSCE 824 - Spring 2011
Transaction Correctness • 2-Phase Locking – serializability • 2-Phase Commit – reliability • Replica control – mutual consistency • Database design: local vs. global transactions • Database consistency: strong consistency vs. weak consistency • Location of updates: master vs. distributed • Update propagation: eager vs. lazy • Degree of transparency: limited vs. full CSCE 824 - Spring 2011
Mutual Consistency vs. Transaction Consistency • Transaction consistency: global serializability • Mutual consistency: replicas having the same values • Strong: all replicas have the same value at the end of the execution of an update transaction • Quorum: a quorum of replicas have the same value • Weak: eventually the values of all replicas become identical CSCE 824 - Spring 2011
Replica Control • Hides replication from transaction • Knows location of all replicas • Translates transaction’s request to access an item into request to access particular replica(s) • Maintains some form of mutual consistency CSCE 824 - Spring 2011 29
One-Copy Serializability (1SR) • Extension of the serializability theory • Effects of transactions on replicated data items should be the same as if they had been performed one at-a-time on a single set of date items CSCE 824 - Spring 2011
x1 Transaction x2 x3 Example Replication Issues • May reduce performance (complex operations) • Too expensive • Can’t control when replicas are updated CSCE 824 - Spring 2011 31 7/22/99
Replica Control • Pessimistic replica control: at most one group can make an update – mutual consistency at all times • Optimistic replica control: system must be available at all times. Correct if there is any violation of mutual consistency CSCE 824 - Spring 2011
Read One / Write All Replica Control • Pessimistic approach • Read the nearest replica • Write all replicas • Synchronous : before transaction commits • Asynchronous case: eventually • Advantage: • Mutual consistency • Performance benefits: reads transactions • Disadvantage: availability is not always guaranteed • E.g., Primary site approach CSCE 824 - Spring 2011 33
Primary Site – static • Primary site: most recent copy • What happens if the network is partitioned? 2 DB0 1 Primary DB3 DB1 DB2 DB6 DB5 DB4 CSCE 824 - Spring 2011
Majority Approach • The group that contains the majority of the sites can process an update DB0 1 DB3 DB1 DB2 DB6 DB5 DB4 CSCE 824 - Spring 2011
Majority Approach • The group that contains the majority of the sites can process an update 2 DB0 (N+1)/2 1 DB3 DB1 DB2 DB6 DB5 DB4 Farkas CSCE 824 - Spring 2011 CSCE 824 - Spring 2011 36
Majority Approach • Advantages: more flexible than primary site • Disadvantages: zero availability may still happen • Who has the most recent copy? • Version number: • Each site assigns a version number to the copy (initially VN=0) • After an update, the VN is incremented by 1 CSCE 824 - Spring 2011
Quorum Consensus • Each sites are not equal • Special case of majority approach W=5 DB0 W=3 W=2 DB3 DB1 W=1 W=1 DB2 DB6 DB5 DB4 W=1 W=15 CSCE 824 - Spring 2011
Other Approaches • Dynamic Linear: order sites linearly to calculate majority • Token-based primary site (moving token): change the location of the primary site CSCE 824 - Spring 2011
Pessimistic Replica Control • Advantages: • Mutual consistency at all times • Know the latest version ( between two consecutive updates, there is a site in common) • Disadvantage: • May result in zero availability CSCE 824 - Spring 2011
Optimistic Replica Control • Goal: availability at all time • Issues: consistency may not be guaranteed • Need an algorithm to detect whether an inconsistency occurred • Take actions to fix any inconsistencies CSCE 824 - Spring 2011
Example Optimistic Alg. • Two partitions P1, P2 • Assumption: separately, P1 and P2 produces serializable histories • Need: after P1 and P2 joins again: Detect which transactions violate global serializability CSCE 824 - Spring 2011
Summary • Correctness: If the transactions are ACID, local execution in serializable, distributed transactions are reliable, and update replication is synchronous then distributed transactions are globally atomic & serializable • Performance: • Applications: transactions are not always serializable (e.g., WS-transactions) • Replication: update propagation is not always asynchronous • Compensating transactions CSCE 824 - Spring 2011
Next Class Hadoop Architecture A read-only replication CSCE 824 - Spring 2011