440 likes | 467 Views
Explore principles, levels, and models of DBMS, focusing on distributed database design and management for optimized performance and data consistency.
E N D
Database Management System (DBMS) • Collection of • interrelated data and • set of programs to access the data • Convenient and efficient processing of data • Database Application Software CSCE 824
Abstraction • View level: different perspectives • application programs hide irrelevant data • Logical level: data models • Logical representation of data • Different approaches: relational, hierarchical, network, object oriented, semi-structured, etc. • Data independence principle • Physical level: how data is stored CSCE 824
Motivation for DBMS • Integrate related data • Provide centralized and controlled access to data • Guarantee database properties • Reduce database development efforts CSCE 824
Data Models • A collection of tools for describing • Data • Relationships among data items • Semantics of stored data • Database constraints • Entity-Relational Model • UML • Etc. CSCE 824
Database Management Systems • Smaller and smaller systems • Past: large and expensive DBMS • Present: DBMS in most personal computers • More and more data stored • Past: few MB • Present: terabyte (1012 bytes), petabyte (1015 bytes) • Functionality: from physical to view level • Optimization CSCE 824
Data Definition Language (DDL) • Defines the database schema and constraints • DDL compiler data dictionary • Metadata – data about data CSCE 824
Modeling data semantics CSCE 824
Example E/R Diagram Breed Name Name License # Dog Boards Kennel Age Address Owns Phone Weight Pays Owner Name Phone CSCE 824
Converting ER Model into Relations Dog CSCE 824
Data Manipulation Language (DML) • Accessing and manipulating the data • Query Languages • Procedural – user specifies what data is required and how to get those data • Nonprocedural – user specifies what data is required without specifying how to get those data CSCE 824
Query Languages • Relational Algebra • Set operations • SQL • Bag operations CSCE 824
Structured Query LanguageSQL • Typical SQL query form:SELECT A1, A2, ..., AnFROMr1, r2, ..., rmWHERE C • Ais represent attributes to be returned • ris represent relations • C is a condition CSCE 824
Computer Network • Distributed processing: • Number of autonomous processing elements that are interconnected by computer network • Cooperate to perform their assigned tasks CSCE 824
Why to distribute? • Intuition • Reliability • Performance CSCE 824
What to distribute? • Processing logic/element • Functions • Data • Control of execution CSCE 824
Distributed Database Systems • Distributed database: • Collection of multiple, logically interrelated databases that are distributed over a computer network • Distributed DBMS: software system that • Permits the management of the distributed database and • Makes the distribution transparent to the user CSCE 824
DDBMS Services • Transparent data management • Distributed, replicated data • Transparency: network, replica, fragmentation • Reliable access to data • Distributed transactions • Failure atomicity • Improved performance • Flexible expansion CSCE 824
Difficulties • Everything that is present in traditional DBs • Fragmentation and replica control • Data retrieval • Data update • Dealing with failures • Synchronization CSCE 824
Distributed Database Design CSCE 824
Design Issues • Placing of data and programs (DBMS and application) • Network issues CSCE 824
Level of Sharing • No sharing • Data sharing • Data and program sharing Heterogeneous environment! CSCE 824
Why Replication? • System availability • Performance • Scalability • Application requirements CSCE 824 - Spring 2011
Risk of Replication • Worse performance: updates must be applied to all replicas and synchronized • Worse availability: some algorithms require multiple replicas to be operational for any of them to be used CSCE 824 - Spring 2011
Replicated Databases • Multiple copies of the same data items (databases) • Consistency: • Local consistency • Mutual consistency CSCE 824 - Spring 2011
Transaction Correctness • 2-Phase Locking – serializability • 2-Phase Commit – reliability • Replica control – mutual consistency • Database design: local vs. global transactions • Database consistency: strong consistency vs. weak consistency • Location of updates: master vs. distributed • Update propagation: eager vs. lazy • Degree of transparency: limited vs. full CSCE 824 - Spring 2011
Mutual Consistency vs. Transaction Consistency • Transaction consistency: global serializability • Mutual consistency: replicas having the same values • Strong: all replicas have the same value at the end of the execution of an update transaction • Quorum: a quorum of replicas have the same value • Weak: eventually the values of all replicas become identical CSCE 824 - Spring 2011
Replica Control • Hides replication from transaction • Knows location of all replicas • Translates transaction’s request to access an item into request to access particular replica(s) • Maintains some form of mutual consistency CSCE 824 - Spring 2011 29
One-Copy Serializability (1SR) • Extension of the serializability theory • Effects of transactions on replicated data items should be the same as if they had been performed one at-a-time on a single set of date items CSCE 824 - Spring 2011
x1 Transaction x2 x3 Example Replication Issues • May reduce performance (complex operations) • Too expensive • Can’t control when replicas are updated CSCE 824 - Spring 2011 31 7/22/99
Replica Control • Pessimistic replica control: at most one group can make an update – mutual consistency at all times • Optimistic replica control: system must be available at all times. Correct if there is any violation of mutual consistency CSCE 824 - Spring 2011
Read One / Write All Replica Control • Pessimistic approach • Read the nearest replica • Write all replicas • Synchronous : before transaction commits • Asynchronous case: eventually • Advantage: • Mutual consistency • Performance benefits: reads transactions • Disadvantage: availability is not always guaranteed • E.g., Primary site approach CSCE 824 - Spring 2011 33
Primary Site – static • Primary site: most recent copy • What happens if the network is partitioned? 2 DB0 1 Primary DB3 DB1 DB2 DB6 DB5 DB4 CSCE 824 - Spring 2011
Majority Approach • The group that contains the majority of the sites can process an update DB0 1 DB3 DB1 DB2 DB6 DB5 DB4 CSCE 824 - Spring 2011
Majority Approach • The group that contains the majority of the sites can process an update 2 DB0 (N+1)/2 1 DB3 DB1 DB2 DB6 DB5 DB4 Farkas CSCE 824 - Spring 2011 CSCE 824 - Spring 2011 36
Majority Approach • Advantages: more flexible than primary site • Disadvantages: zero availability may still happen • Who has the most recent copy? • Version number: • Each site assigns a version number to the copy (initially VN=0) • After an update, the VN is incremented by 1 CSCE 824 - Spring 2011
Quorum Consensus • Each sites are not equal • Special case of majority approach W=5 DB0 W=3 W=2 DB3 DB1 W=1 W=1 DB2 DB6 DB5 DB4 W=1 W=15 CSCE 824 - Spring 2011
Other Approaches • Dynamic Linear: order sites linearly to calculate majority • Token-based primary site (moving token): change the location of the primary site CSCE 824 - Spring 2011
Pessimistic Replica Control • Advantages: • Mutual consistency at all times • Know the latest version ( between two consecutive updates, there is a site in common) • Disadvantage: • May result in zero availability CSCE 824 - Spring 2011
Optimistic Replica Control • Goal: availability at all time • Issues: consistency may not be guaranteed • Need an algorithm to detect whether an inconsistency occurred • Take actions to fix any inconsistencies CSCE 824 - Spring 2011
Example Optimistic Alg. • Two partitions P1, P2 • Assumption: separately, P1 and P2 produces serializable histories • Need: after P1 and P2 joins again: Detect which transactions violate global serializability CSCE 824 - Spring 2011
Summary • Correctness: If the transactions are ACID, local execution in serializable, distributed transactions are reliable, and update replication is synchronous then distributed transactions are globally atomic & serializable • Performance: • Applications: transactions are not always serializable (e.g., WS-transactions) • Replication: update propagation is not always asynchronous • Compensating transactions CSCE 824 - Spring 2011
Next Class Hadoop Architecture A read-only replication CSCE 824 - Spring 2011