1k likes | 1.1k Views
CPSC 534A – Background. Rachel Pottinger January 13 and 18, 2005. Administrative notes. Please note you’re supposed to sign up for one paper presentation and one discussion… for different papers Please sign up for the mailing list WebCT has been populated – make sure you can access it
E N D
CPSC 534A – Background Rachel Pottinger January 13 and 18, 2005
Administrative notes • Please note you’re supposed to sign up for one paper presentation and one discussion… for different papers • Please sign up for the mailing list • WebCT has been populated – make sure you can access it • HW 1 is on the web, due beginning of class a week from today
Overview of the next two classes • Relational databases • Entity Relationship (ER) diagrams • Object Oriented Databases (OODBs) • XML • Other data types • Database internals (Briefly) • An extremely brief introduction to category theory Metadata management examples are interspersed
Relational Database Basics • What’s in a relational database? • Relational Algebra • SQL • Datalog
Relational Data Representation Attribute names or columns PName Price Category Manufacturer gizmo $19.99 gadgets GizmoWorks Power gizmo $29.99 gadgets GizmoWorks SingleTouch $149.99 photography Canon MultiTouch $203.99 household Hitachi Relation or table Tuples or rows
Relational schema representation • Every attribute has an atomic type (e.g., Char, integer) • Relation Schema: Column headings:relation name + attribute names + attribute types Product(VarChar PName, real Price, VarChar Category, VarChar Manfacturer)often types are left off:Product(PName, Price, Category, Manfacturer) • Relation instance: The values in a table. • Database Schema: a set of relation schemas in the database. • Database instance: a relation instance for every relation in the schema.
Querying – Relational Algebra • Select()- chose tuples from a relation • Project ()- chose attributes from relation • Join (⋈) - allows combining of 2 relations • Set-difference ( ) Tuples in relation 1, but not in relation 2. • Union( ) • Cartesian Product(×) Each tuple of R1 with each tuple in R2
Find products where the manufacturer is GizmoWorks Product Selection: σManufacturer = GizmoWorksProduct
Find the Name, Price, and Manufacturers of products whose price is greater than 100 Product Selection + Projection:πName, Price, Manufacturer (σPrice > 100Product)
Find the product names and price of products that cost less than $200 and have manufacturers where there is a Company that has a CName that matches the manufacturer, and its country is Japan Product Company πPName, Price((σPrice < 200Product)⋈ Manufacturer = Cname (σCountry= ‘Japan’Company))
When are two relations related? • You guess they are • I tell you so • Constraints say so • A key is a set of attributes whose values are unique; we underline a key Product(PName, Price, Category, Manfacturer) • Foreign keys are a method for schema designers to tell you so • A foreign key states that an attribute is a reference to the key of another relationex: Product.Manufacturer is foreign key of Company • Gives information and enforces constraint
SQL • Data Manipulation Language (DML) • Query one or more tables • Insert/delete/modify tuples in tables • Data Definition Language (DDL) • Create/alter/delete tables and their attributes • Transact-SQL • Idea: package a sequence of SQL statements server
Querying – SQL Standard language for querying and manipulating data Structured Query Language • Many standards out there: • ANSI SQL • SQL92 (a.k.a. SQL2) • SQL99 (a.k.a. SQL3) • Vendors support various subsets of these • What we discuss is common to all of them
SQL basics • Basic form: (many many more bells and whistles in addition) Select attributes From relations (possibly multiple, joined) Where conditions (selections)
SQL – Selections SELECT * FROM Company WHERE country=“Canada” AND stockPrice > 50 Some things allowed in the WHERE clause: attribute names of the relation(s) used in the FROM. comparison operators: =, <>, <, >, <=, >= apply arithmetic operations: stockPrice*2 operations on strings (e.g., “||” for concatenation). Lexicographic order on strings. Pattern matching: s LIKE p Special stuff for comparing dates and times.
SQL – Projections Select only a subset of the attributes SELECT name, stock price FROM Company WHERE country=“Canada” AND stockPrice > 50 Rename the attributes in the resulting table SELECT name AS company, stockPrice AS price FROM Company WHERE country=“Canada” AND stockPrice > 50
SQL – Joins SELECT name, store FROM Person, Purchase WHEREname=buyer AND city=“Vancouver” AND product=“gizmo” Product ( name, price, category, maker) Purchase (buyer, seller, store, product) Company (name, stock price, country) Person( name, phone number, city)
Selection: σManufacturer = GizmoWorks(Product) Product What’s the SQL?
Selection + Projection:πName, Price, Manufacturer (σPrice > 100Product) Product What’s the SQL?
πPName, Price((σPrice <= 200Product)⋈ Manufacturer = Cname (σCountry = ‘Japan’Company)) Product Company What’s the SQL?
More SQL – Outer Joins • What happens if there’s no value available? Product Company Select pname, Country From Product, Company Where Manufacturer = Cname Select pname, Country From Product outer join Company on Manufacturer = Cname
Querying – Datalog • Enables expressing recursive queries • More convenient for analysis • Some people find it easier to understand • Without recursion but with negation it is equivalent in power to relational algebra and SQL • Limited version of Prolog (no functions)
Datalog Rules and Queries A datalog rule has the following form: head :- atom1, atom2, …, atom,… You can read this as then :- if ... ExpensiveProduct(N) :- Product(N,M,P) & P > $100 CanadianProduct(N) :- Product(N,M,P) & Company(M, “Canada”, SP) IntlProd(N) :- Product(N,M,P) & NOT Company(M, “Canada”, SP) Arithmetic comparison or interpreted predicate Distinguished variable Subgoal or EDB Existential variables constant Negated subgoal - also denoted by ¬ Head or IDB
Conjunctive Queries • A subset of Datalog • Only relations appear in the right hand side of rules • No negation • Functionally equivalent to Select, Project, Join queries • Very popular in modeling relationships between databases
Selection: σManufacturer = GizmoWorks(Product) Product What’s the Datalog?
Selection + Projection:πName, Price, Manufacturer (σPrice > 100Product) Product What’s the Datalog?
πPname,Price((σPrice <= 200Product)⋈ Manufacturer = Cname (σCountry = ‘Japan’Company)) Product Company What’s the Datalog?
Bonus Relational Goodness: Views Views are relations, except that they are not physically stored. (Materialized views are stored) They are used mostly in order to simplify complex queries and to define conceptually different views of the database to different classes of users. Used also to model relationships between databases View: purchases of telephony products: CREATE VIEW telephony-purchases AS SELECT product, buyer, seller, store FROM Purchase, Product WHERE Purchase.product = Product.name AND Product.category = “telephony”
Summarizing Relational DBs • Relational perspective: Data is stored in relations. Relations have attributes. Data instances are tuples. • SQL perspective: Data is stored in tables. Tables have columns. Data instances are rows. • Query languages • Relational algebra – mathematical base for understanding query languages • SQL – very widely used • Datalog – based on Prolog, very popular with theoreticians • Views allow complex queries to be written simply
Data Integration:Planning a Beach Vacation Beach Good Weather Cheap Flight Fodors weather.com wunderground AAA Expedia Orbitz
Data Integration System Architecture User Query Mediated Schema Virtual database “Airport” Local Schema 1 Local Schema N Local Database 1 Local Database N Expedia Orbitz
Data Translation Data exists in two different schemas. You have data in one, and you want to put data into the other • How are the schemas related to one another? • How do you change the data from one to another?
Data Warehousing Data Warehouses store vast quantities of data for fast query processing, but only batch updating. • Import schemas of data sources • Identify overlapping attributes, etc. • Build data cleaning scripts • Build data transformation scripts • Enable data lineage tracing
Schema Evolution and Data Migration Schemas change over time; data must change with it. • How do we deal with schema changes? • How can we make it easy for the data to migrate • How do we handle applications built on the old schema that store in the new database?
Outline • Relational databases • Entity Relationship (ER) diagrams • Object Oriented Databases (OODBs) • XML • Other data types • Database internals (Briefly) • An extremely brief introduction to category theory
Entity / Relationship Diagrams Entities Attributes Relationships between entities Product address buys
Keys in E/R Diagrams • Every entity set must have a key name category price Product
name category name price makes Company Product stockprice buys employs Person name sin address
Multiplicity of E/R Relations 1 2 3 1 2 3 1 2 3 a b c d a b c d a b c d • one-one: • many-one • many-many
name category name price makes Company Product stockprice What doesthis say ? buys employs Person name sin address
Roles in Relationships What if we need an entity set twice in one relationship? Product Purchase Store buyer salesperson Person
Attributes on Relationships date Product Purchase Store Person
Subclasses in E/R Diagrams name category price Product isa isa Software Product Educational Product platforms Age Group
Keys in E/R Diagrams name category Underline: price Product No formal way to specify multiple keys in E/R diagrams Person name SIN address
From E/R Diagramsto Relational Schema • Entity set relation • Relationship relation
Entity Set to Relation name category price Product Product(name, category, price) name category price gizmo gadgets $19.99
Relationships to Relations price name category Start Year name makes Company Product Stock price Makes(product-name, product-category, company-name, year)Product-name Product-Category Company-name Starting-year gizmo gadgets gizmoWorks 1963 (watch out for attribute name conflicts)
Relationships to Relations price name category Start Year name makes Company Product Stock price No need for Makes. Modify Product: name category price StartYear companyName gizmo gadgets 19.99 1963 gizmoWorks
Multi-way Relationships to Relations address name Product Purchase Store price name Person Purchase( , , ) sin name