530 likes | 661 Views
Structure of this quarter. Database Fundamentals. Relational Model. Normalization. ConceptualModeling. Query Languages. AdvancedSQL. Transaction Management. Java DB Applications
E N D
1. MIS415 Module 2Query Languages –SQL Arijit Sengupta
2. Structure of this quarter
3. Today’s Buzzwords Query Languages
Formal Query Languages
Procedural and Declarative Languages
Relational Algebra
Relational Calculus
SQL
Aggregate Functions
Nested Queries
4. Objectives At the end of the lecture, you should
Get a formal as well as practical perspective on query languages
Have a background on query language basics (how they came about)
Be able to write simple SQL queries from the specification
Be able to look at SQL queries and understand what it is supposed to do
Be able to write complex SQL queries involving nesting
Execute queries on a database system
5. Set Theory Basics A set: a collection of distinct items with no particular order
Set description:
{ b | b is a Database Book}
{c | c is a city with a population of over a million}
{x | 1 < x < 10 and x is a natural number}
Most basic set operation:
Membership: x ? S (read as x belongs to S if x is in the set S)
6. Other Set Operations Addition, deletion (note that adding an existing item in the set does not change it)
Set mathematics:
Union R ? S = { x | x ? R or x ? S}
Intersection R ? S = { x | x ?R and x ? S}
Set Difference R – S = { x | x ? R and x ? S}
Cross-product R x S = { <x,y> | x ? R and y ? S}
You can combine set operations much like arithmetic operations: R – (S ? T)
Usually no well-defined precedence
7. Relational Query Languages Query languages: Allow manipulation and retrieval of data from a database.
Relational model supports simple, powerful QLs:
Strong formal foundation based on logic.
Allows for much optimization.
Query Languages != programming languages!
QLs not expected to be “Turing complete”.
QLs not intended to be used for complex calculations.
QLs support easy, efficient access to large data sets.
8. Formal Relational Query Languages Two mathematical Query Languages form the basis for “real” languages (e.g. SQL), and for implementation:
Relational Algebra: More operational, very useful for representing execution plans.
Relational Calculus: Lets users describe what they want, rather than how to compute it. (Non-operational, declarative.)
9. Structured Query Language Need for SQL
Operations on Data Types
Definition Manipulation
Operations on Sets
Declarative (calculus) vs. Procedural (algebra)
Evolution of SQL
SEQUEL ..SQL_92 .. SQL_93
SQL Dialects
Does SQL treat Relations as ‘Sets’?
10. Preliminaries A query is applied to relation instances, and the result of a query is also a relation instance.
Schemas of input relations for a query are fixed (but query will run regardless of instance!)
The schema for the result of a given query is also fixed! Determined by definition of query language constructs.
Positional vs. named-field notation:
Positional notation easier for formal definitions, named-field notation more readable.
Both used in SQL
11. Example Instances Students, Registers, Courses relations for our examples.
12. Basic operations:
Selection ( ) Selects a subset of rows from relation.
Projection ( ) Deletes unwanted columns from relation.
Cross-product ( ) Allows us to combine two relations.
Set-difference ( ) Tuples in reln. 1, but not in reln. 2.
Union ( ) Tuples in reln. 1 and in reln. 2.
Additional operations:
Intersection, join, division, renaming: Not essential, but (very!) useful.
Since each operation returns a relation, operations can be composed! (Algebra is “closed”.) Relational Algebra
13. Projection Deletes attributes that are not in projection list.
Schema of result contains exactly the fields in the projection list, with the same names that they had in the (only) input relation.
Projection operator has to eliminate duplicates! (Why??)
Note: real systems typically don’t do duplicate elimination unless the user explicitly asks for it. (Why not?)
14. Vertical Slices Projection
Specifying Elements
15. Does SQL treat Relations as ‘Sets’? What are the different salaries we pay to our employees?
select salary
from EMPLOYEE;
OR is the following better?
select DISTINCT salary
from EMPLOYEE;
16. Selection Selects rows that satisfy selection condition.
No duplicates in result! (Why?)
Schema of result identical to schema of (only) input relation.
Result relation can be the input for another relational algebra operation! (Operator composition.)
17. Horizontal Slices Restriction
Specifying Conditions
18. Specifying Conditions
19. Pattern Matching
20. Missing or Incomplete Information
21. Horizontal and Vertical Query:
List all student ID, names and addresses who have
GPA > 3.0 and date of birth before Jan 1, 1980.
select StudentID, Name, Address
from STUDENT
where GPA > 3.0 and DOB < ‘1-Jan-80’
order by Name DESC;
Algebra: ? StudentID,name, address (? GPA > 3.0 and DOB < ‘1-Jan-80’ (STUDENT))
Calculus: {t.StudentID, t.name, t.address | t ? Student ?t.GPA > 3.0 ?
t.DOB < ‘1-Jan-80’}
Order by sorts result in descending (DESC) order.
Note: The default order is ascending (ASC) as in:
order by Name;
22. Union, Intersection, Set-Difference All of these operations take two input relations, which must be union-compatible:
Same number of fields.
`Corresponding’ fields have the same type.
What is the schema of result?
23. Union
24. Union Compatibility
25. Union, Intersect, Minus
26. Cross-Product Each row of S1 is paired with each row of R1.
Result schema has one field per field of S1 and R1, with field names `inherited’ if possible.
Conflict: Both S1 and R1 have a field called sid.
27. Joins Condition Join:
Result schema same as that of cross-product.
Fewer tuples than cross-product, might be able to compute more efficiently
Sometimes called a theta-join.
28. Joins Equi-Join: A special case of condition join where the condition c contains only equalities.
Result schema similar to cross-product, but only one copy of fields for which equality is specified.
Natural Join: Equijoin on all common fields.
29. Find names of students who have taken course #103
30. Connecting/Linking Relations
31. Join
32. Join Forms General Join Forms
Equijoin
Operator Dependent
Natural Join
Outer Join
Left
Right
Full
33. Find names of students who have taken a CIS course Information about departments only available in Courses; so need an extra join:
34. Find students who have taken an MIS or a CS course Can identify all MIS or CS courses, then find students who have taken one of these courses:
35. Find students who have taken a CIS and an ECI Course Previous approach won’t work! Must identify students who have taken CIS courses, students who have taken ECI courses, then find the intersection (note that sid is a key for Students):
36. Relational Calculus Comes in two flavours: Tuple relational calculus (TRC) and Domain relational calculus (DRC).
Calculus has variables, constants, comparison ops, logical connectives and quantifiers.
TRC: Variables range over (i.e., get bound to) tuples.
DRC: Variables range over domain elements (= field values).
Both TRC and DRC are simple subsets of first-order logic.
Expressions in the calculus are called formulas. An answer tuple is essentially an assignment of constants to variables that make the formula evaluate to true.
37. Find students with GPA > 3.7 who have taken a CIS Course
38. Find students who have taken all CIS courses
39. Monotonic and Non-Monotonic Queries Monotonic queries: queries for which the size of the results either increase or stay the same as the size of the inputs increase. The result size never decreases
Non-monotonic queries: queries for which it is possible that the size of the result will DECREASE when the size of the input increases
Examples of each?
Which of the algebra operations is non-monotonic?
What does this signify?
40. Summaries and Aggregates
41. Aggregate Functions
42. Grouping Results Obtained
43. Selections on Groups
44. Grouping Results after Join
45. Nesting Queries SELECT attribute(s)
FROM relation(S)
WHERE attr [not] {in | comparison operator | exists }
( query statement(s) );
List names of students who are taking “BA201”
select Name
from Student
where StudentID in
( select StudentID
from REGISTRATION
where course#=‘BA201’);
46. Sub Queries
47. Sub Queries
48. Relational Views Relations derived from other relations.
Views have no stored tuples.
Are useful to provide multiple user views.
49. View Creation Create View view-name [ ( attr [ , attr ] ...) ]
AS subquery
[ with check option ] ;
DROP VIEW view-name;
Create a view containing the student ID, Name, Age and GPA for those who are qualified to take 300 level courses, i.e., GPA >=2.0.
50. View Options With Check Option enforces the query condition for insertion or update
To enforce the GPA >=2.0 condition on all new student tuples inserted into the view
A view may be derived from multiple base relations
Create a view that includes student IDs, student names and their instructors’ names for all CIS 300 students.
51. View Retrieval Queries on views are the same as that on base relations.
Queries on views are expanded into queries on their base relations.
select Name, Instructor-Name
from CIS300-Student
where Name = Instructor-Name;
52. View: Update Update on a view actually changes its base relation(s)!
update Qualified-Student
set GPA = GPA-0.1
where StudentID = ‘s3’;
insert into Qualified-Student
values ( ‘s9’, ‘Lisa’, 4.0 )
insert into Qualified-Student
values ( ‘s10’, ‘Peter’, 1.7 )
Why are some views not updateable?
What type of views are updateable?
53. Non-monotonic queries – again! Need to use either MINUS or NOT EXISTS!
Find courses where no student has gpa over 3.5
Find students who have taken all courses that Joe has taken
How would you solve these?
54. Summary SQL is a low-complexity, declarative query language
The good thing about being declarative is that internally the query can be changed automatically for optimization
Good thing about being low-complexity?
No SQL query ever goes into an infinite loop
No SQL query will ever take indefinite amount of space to get the solution
Can be used for highly complex problems!