2.35k likes | 3.35k Views
COP5725 Advanced Database Systems. DB Fundamentals. Spring 2014. What are Database Management Systems. DBMS is a system for providing EFFICIENT , CONVENIENT , and SAFE MULTI-USER storage of and access to MASSIVE amounts of PERSISTENT data. Example: Banking System. Data
E N D
COP5725Advanced Database Systems DB Fundamentals Spring 2014
What are Database Management Systems DBMS is a system for providing EFFICIENT, CONVENIENT, and SAFEMULTI-USER storage of and access to MASSIVE amounts of PERSISTENT data
Example: Banking System • Data • Information on accounts, customers, balances, current interest rates, transaction histories, etc. • MASSIVE • Many gigabytes at a minimum for big banks, more if keep history of all transactions, even more if keep images of checks -> Far too big to fit in main memory • PERSISTENT • Data outlives programs that operate on it
Example: Banking System • SAFE: • from system failures • from malicious users • CONVENIENT: • simple commands to debit account, get balance, write statement, transfer funds, etc. • also unpredicted queries should be easy • EFFICIENT: • don't search all files in order to get balance of one account, get all accounts with low balances, get large transactions, etc. • massive data! -> DBMS's carefully tuned for performance
Multi-user Access • Many people/programs accessing same database, or even same data, simultaneously -> Need careful controls • Alex @ ATM1: withdraw $100 from account #007 get balance from database; if balance >= 100 then balance := balance - 100; dispense cash; put new balance into database; • Bob @ ATM2: withdraw $50 from account #007 get balance from database; if balance >= 50 then balance := balance - 50; dispense cash; put new balance into database; • Initial balance = 120. Final balance = ??
Why File Systems Won’t Work • Storing data: file system is limited • size limit by disk or address space • when system crashes we may loose data • Password/file-based authorization insufficient • Query/update: • need to write a new C++/Java program for every new query • need to worry about performance • Concurrency: limited protection • need to worry about interfering with other users • need to offer different views to different users (e.g. registrar, students, professors) • Schema change: • entails changing file formats • need to rewrite virtually all applications That’s why the notion of DBMS was motivated!
DBMS Architecture User/Web Forms/Applications/DBA query transaction DDL commands Query Parser Transaction Manager DDL Processor Query Rewriter Concurrency Control Logging & Recovery Query Optimizer Query Executor Records Indexes Lock Tables Buffer: data, indexes, log, etc Buffer Manager Main Memory Storage Manager Storage data, metadata, indexes, log, etc CS411
Data Structuring: Model, Schema, Data • Data model • conceptual structuring of data stored in database • ex: data is set of records, each with student-ID, name, address, courses, photo • ex: data is graph where nodes represent cities, edges represent airline routes • Schema versus data • schema: describes how data is to be structured, defined at set-up time, rarely changes (also called "metadata") • data: actual "instance" of database, changes rapidly • vs. types and variables in programming languages
Schema vs. Data • Schema: name, name of each field, the type of each field • Students (Sid:string, Name:string, Age: integer, GPA: real) • A template for describing a student • Data: an example instance of the relation
Data Structuring: Model, Schema, Data • Data definition language (DDL) • commands for setting up schema of database • Data Manipulation Language (DML) • Commands to manipulate data in database: • RETRIEVE, INSERT, DELETE, MODIFY • Also called "query language"
People • DBMS user: queries/modifies data • DBMS application designer • set up schema, loads data, … • DBMS administrator • user management, performance tuning, … • DBMS implementer: builds systems
Key Steps in Building DB Applications • Step 0: pick an application domain • Step 1: conceptual design • Discuss with your team mate what to model in the application domain • Need a modeling language to express what you want • ER model is the most popular such language • Output: an ER diagram of the application domain • Step 2: pick a type of DBMS’s • Relational DBMS is most popular and is our focus
Key Steps in Building DB Applications • Step 3: translate ER design to a relational schema • Use a set of rules to translate from ER to relational schema • Use a set of schema refinement rules to transform the above relational schema into a goodrelational schema • 1NF, 2NF, 3NF, BCNF, 4NF,…… • At this point • You have a good relational schema on paper
Key Steps in Building DB Applications • Step 4: Implement your relational DBMS using a "database programming language" called SQL • SELECT-FROM-WHERE-GROUPBY-HAVING • Step 5: Ordinary users cannot interact with the database directly and the database also cannot do everything you want, hence • Write your application program in C++, Java, PHP, etc. to handle the interaction and take care of things that the database cannot do
Constraints • Constraint: an assertion about the database that must be true at all times • Part of the database schema • Very important in database design • Finding constraints is part of the modeling process • Keys: social security number uniquely identifies a person • Single-value constraints: a person can have only one father • Referential integrity constraints: if you work for a company, it must exist in the database • Domain constraints: peoples’ ages are between 0 and 150 • General constraints: all others (at most 50 students enroll in a class)
More about Keys • Every entity must have a key • why? • A key can consist of more than one attribute • There can be more than one key for an entity set • Among all candidate keys, one key will be designated as primary key
ER Model vs. Relational Model • Both are used to model data • ER model has many concepts • Entities, relationships, attributes, etc. • Well-suited for capturing the app. requirements • Not well-suited for computer implementation • Relational model • Has just a single concept: relation • World is represented with a collection of tables • Well-suited for efficient manipulations on computers
Relation: An Example Name of Table (Relation) Column (Field, Attribute) Products Domain (Atomic type) Row (Record, Tuple)
Relations • Schema vs. instance = columns vs. rows • Schema of a relation • Relation name • Attribute names • Attribute types (domains) • Schema of a database • A set of relation schemas • Questions • When do you determine a schema (instance)? • How often do you change your mind?
Relations • The database maintains a current database state • Updates to the data happen very frequently • add a tuple • delete a tuple • modify an attribute in a tuple • Updates to the schema are relatively rare, and rather painful. Why?
Defining a Database Schema • A database schema comprises declarations for the relations (“tables”) of the database • Simplest form of creation is: CREATE TABLE <name> ( <list of elements> ); • And you may remove a relation from the database schema by: DROP TABLE <name>;
Elements of Table Declarations • The principal element is a pair consisting of an attribute and a type • The most common types are: • INT or INTEGER (synonyms) • REAL or FLOAT (synonyms) • CHAR(n ) = fixed-length string of n characters • VARCHAR(n ) = variable-length string of up to ncharacters
Example: Create Table CREATE TABLE Sells ( bar CHAR(20), beer VARCHAR(20), price REAL );
Declaring Keys • An attribute or list of attributes may be declared PRIMARY KEY or UNIQUE • Each says the attribute(s) so declared functionally determines all the attributes of the relation schema • Single attribute keys CREATE TABLE Beers ( name CHAR(20) UNIQUE, manf CHAR(20) );
Multi-attribute Keys CREATE TABLE Sells ( bar CHAR(20), beer VARCHAR(20), price REAL, PRIMARY KEY (bar, beer) );
Foreign Keys • A Foreign Key is a field whose values are keys in another relation • Must correspond to primary key of the second relation • Like a `logical pointer’ Enrolled Students CREATE TABLE Enrolled ( sidCHAR(20), cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid,cid), FOREIGN KEY (sid) REFERENCESStudents, FOREIGN KEY (cid) REFERENCESCourses )
Relational Algebra • Querying the database: specify what we want from our database • Find all the people who earn more than $1,000,000 and pay taxes in Tallahassee • Could write in C++/Java, but a bad idea • Instead use high-level query languages: • Theoretical: Relational Algebra, Datalog • Practical: SQL • Relational algebra: a basic set of operations on relations that provide the basic principles
What is an “Algebra”? • Mathematical system consisting of: • Operands--- variables or values from which new values can be constructed • Operators--- symbols denoting procedures that construct new values from given values • Examples • Arithmetic algebra, linear algebra, Boolean algebra …… • What are operands? • What are operators?
What is Relational Algebra? • An algebra • Whose operands are relations or variables that represent relations • Whose operators are designed to do common things that we need to do with relations in a database • relations as input, new relation as output • Can be used as a query languagefor relations
Relational Operators at a Glance • Five basic RA operations: • Basic Set Operations • union, difference (no intersection, no complement) • Selection: s • Projection: p • Cartesian Product: X • When our relations have attribute names: • Renaming: r • Derived operations: • Intersection, complement • Joins (natural join, equi-join, theta join, semi-join)
Set Operations • Union: all tuples in R1 or R2, denoted as R1 U R2 • R1, R2 must have the same schema • R1 U R2 has the same schema as R1, R2 • Example: • Active-Employees U Retired-Employees • Difference: all tuples in R1 and not in R2, denoted as R1 – R2 • R1, R2 must have the same schema • R1 - R2 has the same schema as R1, R2 • Example • All-Employees - Retired-Employees
Selection • Returns all tuples which satisfy a condition, denoted assc(R) • c is a condition: =, <, >, AND, OR, NOT • Output schema: same as input schema • Find all employees with salary more than $40,000: • sSalary > 40000(Employee)
Projection • Unary operation: returns certain columns, denoted as P A1,…,An(R) • Eliminates duplicate tuples ! • Input schema R(B1, …, Bm) • Condition: {A1, …, An} {B1, …, Bm} • Output schema S(A1, …, An) • Example: project social-security number and names: • PSSN, Name (Employee)
Selection vs. Projection • Think of relation as a table • How are they similar? • How are they different? • Why do you need both?
Cartesian Product • Each tuple in R1 with each tuple in R2, denoted as R1 x R2 • Input schemas R1(A1,…,An), R2(B1,…,Bm) • Output schema is S(A1, …, An, B1, …, Bm) • Very rare in practice; but joinsare very common • Example: Employee x Dependent
Example Employee Dependent Employee x Dependent
Renaming • Does not change the relational instance, denoted as Notation: rS(B1,…,Bn) (R) • Changes the relational schema only • Input schema: R(A1, …, An) • Output schema: S(B1, …, Bn) • Example: • Soc-sec-num, firstname(Employee)
Set Operations: Intersection • Intersection: all tuples both in R1 and in R2, denoted as R1 R2 • R1, R2 must have the same schema • R1 R2 has the same schema as R1, R2 • Example • UnionizedEmployeesRetiredEmployees • Intersection is derived: • R1 R2 = R1 – (R1 – R2) why ?
Theta Join • A join that involves a predicate q, denoted as R1 qR2 • Input schemas: R1(A1,…,An), R2(B1,…,Bm) • Output schema: S(A1,…,An,B1,…,Bm) • Derived operator: R1 qR2 = sq (R1 x R2) • Take the product R1 x R2 • Then apply SELECTC to the result • As for SELECT, C can be any Boolean-valued condition
Theta Join: Example Sells Bar BarInfo := Sells Sells.Bar=Bar.Name Bar
Natural Join • Notation: R1 R2 • Input Schema: R1(A1, …, An), R2(B1, …, Bm) • Output Schema: S(C1,…,Cp) • Where{C1, …, Cp} = {A1, …, An} U{B1, …, Bm} • Meaning: combine all pairs of tuples in R1 and R2 that agree on the attributes: • {A1,…,An} {B1,…, Bm}(called the join attributes)
Natural Join: Examples Employee Dependent • Employee Dependent = • PSSN, Name, Dependent-Name(sEmployee.SSN=Dependent.SSN(Employee x Dependent)
Natural Join: Examples R S R S
Natural Join • Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R S ? • Given R(A, B, C), S(D, E), what is R S? • Given R(A, B), S(A, B), what is R S?
Equi-join • Special case of theta join: condition c contains only conjunction of equalities • Result schema is the same as that of Cartesian product • May have fewer tuples than Cartesian product • Most frequently used in practice: R1 A=BR2 • Natural join is a particular case of equi-join • A lot of research on how to do it efficiently
Building Complex Expressions • Algebras allow us to express sequences of operations in a natural way • Example • In arithmetic algebra: (x + 4)*(y - 3) • Relational algebra allows the same • Three notations, just as in arithmetic: • Sequences of assignment statements • Expressions with several operators • Expression trees
Sequences of Assignments • Create temporary relation names • Renaming can be implied by giving relations a list of attributes • Example: R3 := R1 JOINC R2 can be written: R4 := R1 x R2 R3 := SELECTC (R4)
Expressions with Several Operators • Example: the theta-join R3 := R1 JOINC R2 can be written: R3 := SELECTC (R1 x R2) • Precedence of relational operators: • Unary operators --- select, project, rename --- have highest precedence, bind first • Then come products and joins • Then intersection • Finally, union and set difference bind last • But you can always insert parentheses to force the order you desire
Expression Trees • Leaves are operands • either variables standing for relations or particular constant relations • Interior nodes are operators, applied to their child or children
UNION RENAMER(name) PROJECTname PROJECTbar SELECTaddr = “Tennessee St.” SELECT price<3 AND beer=“Bud” Expression Tree: Examples Given Bars(name, addr), Sells(bar, beer, price), find the names of all the bars that are either on Tennessee St. or sell Bud for less than $3 Bars Sells