270 likes | 356 Views
Relational Algebra “ The beginning… ” What is a query? Why use a theoretical basis for designing ‘ queries ’ ? What is a relation? What is relational algebra (RA)? Start with some unary operators in RA Then the basic binary operators. A query. What do we mean by a query to a database?.
E N D
Relational Algebra “The beginning…” What is a query? Why use a theoretical basis for designing ‘queries’? What is a relation? What is relational algebra (RA)? Start with some unary operators in RA Then the basic binary operators
A query • What do we mean by a query to a database?
A query • What do we mean by a query to a database? • Given one or more relations, apply operations that ultimately return a new relation. • Usually we assume the new relation will have different information than the original input relations and will combine the original relations in meaningful ways. • We can write a query accurately in English • We can accurately name the resulting relation in English • We can draw queries as a way of planning them out
SQL is the language used for talking to a relational database management system • Allows • changing the schema (“data definition”), • changing the data (“data modification”), and • querying • But the querying part of SQL isn't very good • Confusing syntax that is hard to read • Major differences from one DBMS to the next • We teach you relational algebra (RA) for querying • We use a graphical representation of RA • It's easy to convert RA into SQL
A relational database is a set of relations • A relation is a set of rows and a set of columns • A relation has a name • A relation has an identifier which is a set of columns • We consistently underline columns in the identifier • Rules: • A row has a value for each column in the identifier • (A row may or may not have values for columns that are not in the identifier) • No two rows have the same values in all the identifying columns
Is this a relation? • Oscar
Is this a relation? • Oscar
Is this a relation? • Oscar
More about relations • No two rows of data are duplicated • implied by our rule about identifying columns • No two columns have the same name • No two relations in a given database have the same name
Relational Algebra • Inputs: • one or two relations • Operators: • work on those relations (or more explicitly columns or rows) • Output: • one relation
RA Operators • We'll look at unary operators first These take a relation and return a relation • project • filter • reduce • Group
Project • Project removes and adds non-identifying columns • Adds computed columns – arithmetic, string manipulation, etc. • result: • as wide as number of columns named, • which must include the identifying columns • exactly as tall as original relation • identifier is same • SQL: • Select columns from relation;
Project example oscar Project Id: ??? Carry: Movie Compute: age = currentYear - BirthYear ?????
Project example oscar Project Id: First_name, Last_name, Award Carry: Movie Compute: age = currentYear - BirthYear First_name, Last_name, Award, Movie, Actor_age of Oscar
Filter • Filter removes rows based on a condition • result: • as wide as the original relation • shorter or exactly as tall as the original relation • identifier is same • SQL: • Select * from relation where condition;
Filter example: • Find each oscar award for the movie “The Blind Side”
What algorithm do you think Filter uses? • Trick question: depends on the DBMS and the configuration!
We often combine project and filter • An activity you could try on your own • Go to sqlzoo.net • Try tutorial 0 • Reverse engineer the example queries into relational algebra
Reduce • Reduce changes the identifier, • result: • narrower or the same width as the original relation • possibly shorter than the original relation • unsafe! • SQL: • Select distinctnew_identifier from original relation • Let’s try it with the tiny database: • Practice Query #3
Group • Group changes the identifier, combining duplicate rows using an aggregate function • result: • narrower, the same width, or wider than the original relation • possibly shorter than the original relation • unsafe! • SQL (2 basic types): Select distinct new_identifier, aggregate_function from relation group by new_identifier Select aggregate_function from relation
Tiny database SQL (2 basic types): Select distinct new_identifier, aggregate_function from relation group by new_identifier Select aggregate_function from relation Practice queries 12, 16