120 likes | 568 Views
Datalog. Presented by: Michael Lam CS 157A section 1 Dr. Sin Min Lee. Outline. Basic Structure/Terminolgoy (5.2.1) Syntax of Datalog Rules (5.2.2) Safety (5.2.4) Recursion in Datalog (5.2.6) Relational Operations in Datalog (5.2.5) Examples. What is Datalog?.
E N D
Datalog Presented by: Michael Lam CS 157A section 1 Dr. Sin Min Lee
Outline • Basic Structure/Terminolgoy (5.2.1) • Syntax of Datalog Rules (5.2.2) • Safety (5.2.4) • Recursion in Datalog (5.2.6) • Relational Operations in Datalog (5.2.5) • Examples
What is Datalog? Datalog is a nonprocedural query language based on the logic-programming language Prolog. A user describes the information desired without giving a specific procedure for obtaining that information. Datalog simplifies writing simple queries and makes query optimization easier.
Terminology Variable A variable is like a variable in a mathematical equation, it represents some constant that has yet to be determined. By convention variables always start with an upper case letter, e.g., City, Firstname,Code, A, B, C, etc Constant A constant is an attribute value. By convention all constants begin with lower-case letters such as: sammy, joe, xxy2003, etc fact A fact is the same things a tuple in a relation. A fact has the following structure. (called Ground facts, because there are no variables) <predicate name>(<list of constants>) Example: Facts about student enrollment: The following set of facts shows some students and the subjects in which they are enrolled. enrolledIn(joe, cp2001) enrolledIn(sue, cp2001) enrolledIn(joe, cp2003). In relational database terms, it would be the same as the following relation. enrolledIn Rules A rule is a way to derive new facts. It is part of a query.
enrolledIn The major difference in the representation is that in the Datalog facts the attributes are not named. So there is nothing to tell us that the first attribute is a Name and the second is a Code.
Syntax of Datalog Rules A rule has the following form <head> :-<body> Head is a single predicate and the body is a list of predicates. Each predicate has the form positive literal: <relation name>(<list of constants or variables>) negative literal: not <relation name>(<list of constants or variables>) What does :- mean? Assume we have a rule: Q :- P Then :- means, if P is true then Q is true Example: A rule about students The following rule identifies courses students are enrolled in.
result(Name, Code, Phone) :- student(Name, Phone), enrolledIn(Name, Code). student enrolledIn Test every combination of student and enrooledIn that unifies student(joe,408-489-5454), enrolledIn(joe, CP2004) student(joe,408-489-5454), enrolledIn(sue, CP2001) student(joe,408-489-5454), enrolledIn(joe, CP2003) Result: result(joe, CP2004, 408-489-5454) result(joe, CP2003, 408-489-5454)
Safety It may be possible to write rules that generate an an infinite number of answers. For example: employee(X,Y) :- X>Y Since the relation defining > is infinite, this rule would generate an infinite number of facts for the relation employee. Also the use of negation can cause similar problems. For example: not-in-loan(L, B, A) :- not loan(L, B, A) We what to retrieve all tuples that are not in the loan relation but if account numbers, branch-name and balance is infinite then the relation not-in-loan would also be infinite. To avoid such problems Datalog rules are required to satisfy the following safety conditions: 1. Every variable that appears in the head of the rule also appears in a nonarithmetic positive literal in the body of the rule 2. Every variable appearing in a negative literal in the body of the rule also appears in some positive literal in the body of the rule.
Recursion in Datalog Suppose now that we want to find out which employees are supervised, directly or indirectly by a given manager, "Jones". (1) People whose manager is "Jones" (2) People whose manager is supervised by "Jones" Ex. Tim manager is Jones Kelly manager is Tim employee-jones(X) :- manager(X, "Jones") employee-jones(X) :- manager(X,Y), employee-jones(Y) Procedure I = set of facts in the database repeat Old_I = I I = I U infer( R, I) until I = Old_I
employee-jones(X) :- manager(X, "Jones") employee-jones(X) :- manager(X,Y), employee-jones(Y) manager relation
Relational Operations in Datalog student(Name,StudentID) courses(Cname, StudentID) Relational Algebra: names = Πname(student) Datalog Rule: names(Name) :- student(Name,_) Relational Algebra: joeInfo = ΠstudentID(σname="joe"(student)) Datalog Rule: joeInfo(StudentID) :- student(joe,StudentID) Relational Algebra: ΠcoursName(σname="joe"(student |><| courses)) Datalog Rule: joeInfo(Name,Cname) :- student(joe, StudentID), courses(Cname, StudentID)
Examples Relational Algebra: joeInfo = ΠstudentID(σname="joe", studentID > 25(student)) Datalog Rule: joeInfo(StudentID) :- student(joe,StudentID), StudentID > 25 Relational Algebra: ΠcName(courses) |><| Πname (student) Datalog Rule: joeInfo(Name,Cname) :- student(Name, StudentID), courses(Cname, StudentID) Relational Algebra: ΠstudentID(σname "joe", studentID > 25(student)) Datelog Rule: joeInfo(StudentID) :- student(Name,StudentID), StudentID > 25, Name <> joe