280 likes | 353 Views
The Relational Data Model and SQL 10:30AM—12noon Monday, July 18 th , 2005 CSIG05. Chaitan Baru. OUTLINE. Foundations of the relational data model Data models and database systems Relations, attributes, keys Introduction to SQL “Hands-on” with SQL Vishwanath Nandigam. S1. S2. Student.
E N D
The Relational Data Model and SQL10:30AM—12noonMonday, July 18th, 2005CSIG05 Chaitan Baru
OUTLINE • Foundations of the relational data model • Data models and database systems • Relations, attributes, keys • Introduction to SQL • “Hands-on” with SQL • Vishwanath Nandigam
S1 S2 Student C1 C2 C2 C3 C4 Course Instructor I1 I2 I2 I3 I1 Historical note: IBM’s role in database systems • IBM’s early database systems were based on the hierarchical data model – IMS (Information Management System). • IMS serves more than 95 percent of Fortune 1000 companies • Manages 15 petabytes of production data • Supports more than two hundred million users per day
IBM and Relational Database Systems • Relational model was introduced to provide separation of application logic from the data representations • DB2 IBM’s second database product! • Foundations of relational model were invented at IBM by E.F.Codd, IBM • A Relational Model of Data for Large Shared Data Banks, E. F. Codd, June 1970 • First prototype, System R, was developed at IBM Almaden in mid-70’s • Introduced SQL • Provided as SQL/DS on IBM mainframe systems • Oracle was based on early System R work
S# student info C# course info S# C# Why the Relational Model? • Deals with the application logic / data separation in business data processing, unlike the earlier “network” and “hierarchical” data model • Plus, an algebra for manipulating relations the key innovation takes teaches Student Course Instructor N N 1 M
What is a Database System? • Database (system) = • Database Instance (e.g. set of tables of rows) • Database Management System (DBMS) • Origins in the commercial world: • to organize, query, and manipulate data more effectively, efficiently, and independently • Scientific databases • often special features: • spatial, temporal, spatiotemporal, GIS, units, uncertainty, raw & derived data, …
Why not just use files as “databases”? • Works for some applications… • But: • scanning & ‘grep’ing large files can be veryinefficient • no language support for selecting desired data, joining them, etc. • cannot express the kinds of questions/queries you’d like to ask • ‘grep’ is no substitute for a query language • redundant and/or inconsistent storage of data • no transaction management and concurrency control among multiple users • no security • no recovery • no data independence (application data) • no data modeling support • …
Features of a Database System • A data model(relational, object-oriented, XML) prescribes how data can be organized: • as relations (tables) of tuples (rows) • as classes of (linked) objects • as XML trees • A (database) schema (stored in the “data dictionary”) defines the structure of a specific database instance: • Relational schema • OO schema • XML Schema (or XML DTD) • A query language • Allows ad hoc, declarative (non-procedural) queries on schema
Features of a Database System • Data is treated uniformly and separately from the application • Efficient data access • Queries and views are expressed over the schema • Integrity constraints (checking and enforcement) • Transactions combine sets of operations into logical units (all-or-nothing) • Synchronization of concurrent user transactions • Recovery (after system crash) • not to be confused w/ backup • instead: guarantee consistency by “roll-back” of partially executed transactions (how? Hint: logging) • …
DB features:E.g. Concurrency Control • Concurrent execution of simultaneous requests • long before web servers where around... • transaction management guarantees consistency despite concurrent/interleaved execution • Transaction (= sequence of read/write operations) • Atomicity: a transaction is executed completely or not at all • Consistency: a transaction creates a new consistent DB state, i.e., in which all integrity constraints are maintained • Isolation: to the user, a transaction seems to run in isolation • Durability: the effect of a successful (“committed”) transaction remains even after system failure
Levels of Abstraction: Architecture Overview User Conceptual … Level View 1 View 2 View n Export schemas logical data independence ER-Model (Entity-Relationship) OO Models (Classes…) part of DB design conceptual design … often lost in the process… Logical (“conceptual”) level Tables physical data independence Physical level Index structures DB instances
Name Name since Manager Salary Database Design: Entity-Relationship (ER) Model • Entities: • Relationships: • Attributes: • ER Model: • initial, high-level DB design (conceptual model) • easy to map to a relational schema (database tables) • comes with more constraints (cardinalities, aggregation) and extensions: EER (is-a => class hierarchies) • related: UML (Unified Modeling Language) class diagrams Employee Department works-for
The Relational Model Employee • Relation/Table Name: • employee, dept • Attributes = Column Names: • Emp,Salary,DeptNo, Name, Mgr • Relational Schema: • employee(Emp:string, Salary:integer, DeptNo:integer), ... • Tuple = Row of the table: • (“tom”, “60000”, “1”) • Relation = Set of tuples: • {(...), (...), ...} EmpSalaryDNo tom 60k 1 tim 57k 1 sally 45k 3 carol 30k 1 carol 35k 2 …. FK: foreign key, pointing to another key Department DNo Name Mgr 1 Toys carol 2 Comp. carol 3 Shoes sam
Creating a Relational Database in SQL CREATE TABLE employee ( ssn CHAR(11), name VARCHAR(30), deptNo INTEGER, PRIMARY KEY (ssn), FOREIGN KEY (deptNo) REFERENCES department ) CREATE TABLE department ( deptNo INTEGER, name VARCHAR(20), manager CHAR(11), PRIMARY KEY (deptNo), FOREIGN KEY (manager) REFERENCES employee(ssn) )
What is a Query? • Intuitively: • An “executable question” in terms of a database schema • Evaluating a query Q against a database instance D yields a set of answer objects: • Relational tuples or XML elements • Example: • Who are the employees in the ‘Toys’ dept.? • Who is (are) the manager(s) of ‘Tom’? • Show all pairs (Employee, Mgr) • Technically: • A mapping from an input schema (the given table schemas) to a result schema (the new columns you are interested in) defined in some query language
Why (Declarative) Query Languages? • Things we talk and think about in PLs and QLs • Assembly languages: • registers, memory locations, jumps, ... • C and the likes: • if-then-else, for, while, memory (de-)allocation, pointers, ... • Object-oriented languages: • C++: C plus objects, methods, classes, ... • Java: objects, methods, classes, references, ... • Smalltalk: objects, objects, objects, ... • OQL: object-query language
Why (Declarative) Query Languages? • Things we talk and think about in PLs and QLs • Functional languages (Haskell, ML): • (higher-order) functions, fold(l|r), recursion, patterns, ... => Relational languages (SQL, Datalog) • relations (tables), tuples (rows); conceptual level: ER • relational operations: , , , , ..., ,,,,,..., , , |X| => Semistructured/XML (Tree) & Graph Query Languages • trees, graphs, nodes, edges, children nodes, siblings, … • XPath, XQuery, … • Also: Focus on what, and not how!
Employee Department result EmpSalaryDeptNo anne 62k 2 john 60k 1 DeptNoMgr 1 anne 2 anne EmpMgr john anne anne anne Example: Querying a Relational Database input tables join SELECT e.Emp, d.Mgr FROM Employee e, Department d WHERE e.DeptNo =d.DeptNo we don’t say how to evaluate this expression SQL query (or view def.) answer (or view)
Example Query: SQL vs DATALOG • “List all employees and their managers” • In SQL: SELECT e.name, d.manager FROM Employee e, Department d WHERE e.deptNo = d.deptNo • In DATALOG: q(E, M) :- employee(E, S, D), department(D, N, M). a “join” operation
Important Relational Operations • select(R, Condition) • filter rows of a table wrt. a condition • project(R, Attr) • remove unwanted columns; keep rest • join(R1, A2, R2, A2, Condition) • find “matches” in a “related” table • e.g. match R1.foreign key = R2.primary key • cartesian product(R1, R2) • union (“OR”), intersection (“AND”) • set-difference (“NOT IN”)
Queries, Views, Integrity Constraints • … can all be seen as “special queries” • Query q(…) :- … ad-hoc queries • View v(…) :- … exported views; • Integrity Constraints • ic (…) :- …. MgrSal < EmpSal … • say what shouldn’t happen • if it does: alert the user (or refuse an update, …)
Query Evaluation vs Reasoning • Query evaluation • Given a database instance D and a query Q, run Q(D) • What databases do all the time • Reasoning (aka “Semantic Query Optimization”) • Given a query Q and a constraint C, “optimize” Q&C (e.g., given C, Q might be unsatisfiable) • Given Q1 and Q2 decide whether Q1 Q2 • Given Q1,Q2, C decide whether Q1 Q2 | C • Note: we are NOT given a database instance D here; just the schema and the query/IC expressions
Natural Join: same attribute name add condition that values must match Summary QLs for Relational Databases