210 likes | 222 Views
Data Models and Query Languages. CSE 590DB, Winter 1999 Theory of Databases Zack Ives January 10, 1999. NewMovies Name Rating Director The Phantom Menace PG George Lucas Patch Adams PG-13 Tom Shadyac Stepmom PG-13 Chris Columbus. The Relational Model.
E N D
Data Models and Query Languages CSE 590DB, Winter 1999 Theory of Databases Zack Ives January 10, 1999
NewMovies Name Rating Director The Phantom Menace PG George Lucas Patch Adams PG-13 Tom Shadyac Stepmom PG-13 Chris Columbus The Relational Model • Database consists of relations, where columns are attributes and rows are tuples • Declarative queries using an algebraic or logic language (or SQL, which we shall ignore) attribute tuple NewMovies(Name, Rating, Director) arity = 3, cardinality = 3
Schema • Each attribute in a relation has a type: NewMovies(Name: String, Rating: RatingDomain, Director: String), where RatingDomain = {G, PG, PG-13, R, NC-17} (In many cases, we will omit the type, and assume the type most appropriate to the expression) • Relation schema: relation name, attributes, and types • Relation instance: a set of tuples for a given schema • Database schema: set of relation schemas • Database instance: relation instance for every relation schema
More on Tuples • Formally, a tuple is a mapping from attribute names to (correctly typed) values: • Director -> “George Lucas” • Name -> “Stepmom” • May specify a tuple using values & schema notation: NewMovies(“The Phantom Menace”, “PG”, “George Lucas”) • Can have constraints on attribute values: • Integrity constraints • Keys • Foreign keys • Functional dependencies
Relational Algebra All operators take one or more sets of tuples as input, and produce a new set as output • Basic set operators (, , —, but normally no complement) • Selection () • Projection () • Cartesian Product () • Join () • Division ()
Set Operations • Sets must be compatible! • Same number of attributes • Same types/domains • R1 R2 = {all tuples in either R1 or R2} • R1 R2 = {all tuples in both R1 and R2} • Complement??? • Negation: R1 — R2 = {all tuples in R1 not in R2} • Query without unions is conjunctive
NewMovies Name Rating Director The Phantom Menace PG George Lucas Patch Adams PG-13 Tom Shadyac Stepmom PG-13 Chris Columbus Name Rating Director Patch Adams PG-13 Tom Shadyac Stepmom PG-13 Chris Columbus Selection Chooses a subset of tuples satisfying a predicate Rating = “PG-13” (NewMovies)
NewMovies Name Rating Director The Phantom Menace PG George Lucas Patch Adams PG-13 Tom Shadyac Stepmom PG-13 Chris Columbus Name, Director (NewMovies) Projection Chooses a subset of attributes, removes duplicates Name Director The Phantom Menace George Lucas Patch Adams Tom Shadyac Stepmom Chris Columbus
NewMovie Name Director Phantom Menace George Lucas Patch Adams Tom Shadyac Stepmom Chris Columbus MetroShows Name Time Patch Adams 7:00 Patch Adams 9:40 Stepmom 6:50 Stepmom 9:00 NewMovie MetroShows Name Director Name Time Phantom Menace George Lucas Patch Adams 7:00 Phantom Menace George Lucas Patch Adams 9:40 Phantom Menace George Lucas Stepmom 6:50 Phantom Menace George Lucas Stepmom 9:00 Patch Adams Tom Shadyac Patch Adams 7:00 … … … ... Cartesian Product • Combine two relations - all pairings of tuples
NewMovie Name Director Phantom Menace George Lucas Patch Adams Tom Shadyac Stepmom Chris Columbus MetroShows Name Time Patch Adams 7:00 Patch Adams 9:40 Stepmom 6:50 Stepmom 9:00 NewMovie NewMovie.Name = MetroShows.Name MetroShows Name Director Time Patch Adams Tom Shadyac 7:00 Patch Adams Tom Shadyac 9:40 Stepmom Chris Columbus 6:50 Stepmom Chris Columbus 9:00 Join • Combine tuples from two relations by predicate. If predicate is equality, remove duplicate attribute. Equivalent to: Name, Director, Time ( NewMovie.Name = MetroShows.Name (NewMovie MetroShows))
Division • Only returns results from dividend which match attributes of all of tuples in divisor (“for all”) Customers name branch Johnson Downtown Smith Brighton Lindsay Redwood Green Brighton Green Downtown Branches branch Brighton Downtown Customers Branches name Green
Logic-Based Query Languages • Tuple-relational calculus • Based on first-order predicate calculus, but imposes restrictions (e.g., no function symbols) • Equivalent in power to relational algebra • Datalog • More powerful than tuple-relational calculus Supports recursion and thus transitive closure • Equivalent to set of Horn clauses
Predicates & Atoms • Relations represented by predicates • Tuples represented by atoms: Flight(“Alon”, “British Airways”, 1234, “Seattle”, “Israel”, “1/9/1999”, “10:00”) • Arithmetic atoms: X < 100, X + Y + 5 > Z * 2 • In certain cases, negated atoms: not Flight(“Zack”, “British Airways”, 1234, “Seattle”, “Israel”, “1/9/1999”, “10:00”)
Datalog Rules & Programs • “Pure” datalog rules: head :- atom1, atom2, atom3 where all atoms are non-negated and relational • Datalog program is set of datalog rules • Conjunctive query has single rule: HanksDirector(D) :- ActorIn(“Hanks”, M) & DirectedBy(D, M) • Disjunctive query: HanksDirector(D) :- ActorIn(“Hanks”, M) & DirectedBy(D, M) HanksDirector(D) :- ProducedBy(“Hanks”, M) & DirectedBy(D, M)
Intension & Extension • Distinction between EDB & IDB: • Extensional DB • what’s stored in the database • can only be in the body of a Datalog rule • Intensional DB • result of a datalog rule • can be in head or body of a Datalog rule
Evaluating Rules • Evaluating rules: • Start with EDB, iteratively derive facts for IDB’s • Repeat until cannot derive any new facts: • Consider every possible assignment from database to variables in body • If each atom in body is made true by assignment, add tuple for the head into the head’s relation • Conjunctive queries are inexpensive • Disjunctive are more expensive (even though they’re Horn clauses)
Transitive Closure • Suppose we to know all possible destination airports reachable from SeaTac, given an EDB Flight(From, To): SFO SEA LAX LGA DEN JFK We need to express the query: Find all airports reachable from SEA
Recursion in Datalog • Program: Dest(X, Y) :- Flight(X, Y) Dest(X, Y) :- Dest(X, Z), Flight(Z, Y) • Evaluate unknown # of times until fixpoint • Flight: {(SEA,SFO), (SEA,LAX), (SEA,JFK), (LAX,LGA), (JFK,LGA),(LGA,DEN)}Dest0: {} • Dest1: Dest0 {(SEA,SFO), (SEA,LAX), (SEA,JFK)} • Dest2: Dest1 {(SEA,LGA),(LAX,DEN),(JFK,DEN)} • Dest3: Dest2 {(SEA,DEN)} • No more changes, so stop (minimum fixpoint)
Built-in Predicates • Rules may include atoms with built-in predicates: Expensive(X) :- Product(X, Y, P) & P > 100 • But need to restrict use of built-in atoms: P(X) :- R(X) & X < Y • When is X < Y? • We require that every variable from a built-in atom appear in a relational atom
Negated Subgoals • Restrict forms: P(X, Y) :- Between(X, Y, Z) & NOT Direct(X,Z) • Bad: Q(X, Y) :- From(X) & NOT To(Y) • Bad but fixable: P(X) :- From(X) & NOT Connects(X, Y) • Rewrite as: Connects’(X) :- Connects(X,Y) P(X) :- End(X) & NOT Connects’(X)
Stratified/Stratisfiable Rules • Predicate P depends on predicate Q if Q appears negated in a rule defining P • If cycle in dependency graph, program is not stratifiable: p(X) :- r(X) & NOT q(X) q(X) :- r(X) & NOT p(X) • Suppose r has tuple {1} Intuitively, stratification provides a way of executing program P as sequence of subprograms P1 … Pn defining IDB relations w/o “forward references”. Results independent of stratification.