1.92k likes | 2.21k Views
Relational Databases and SQL. Session 1 An Introduction. Outline: Whole Course. The Relational Model. Introduction to SQL. Relational Database Systems. Example Database Systems. Database Design and Programming. Database Programming Examples. Outline: Relational Model and SQL.
E N D
Relational Databasesand SQL Session 1 An Introduction
Outline: Whole Course • The Relational Model. • Introduction to SQL. • Relational Database Systems. • Example Database Systems. • Database Design and Programming. • Database Programming Examples. Chris Smith, BRC, April 2004
Outline: Relational Model and SQL • The Relational Model • History • The Relational Model Summarized • Tables and Keys • Relational Algebra • SQL • History • Data Manipulation Language • Data Definition Language • Relational Databases. • What are they? • Why use one? Chris Smith, BRC, April 2004
The Relational Model: History • Derivability, Redundancy and Consistency of Relations Stored in Large Data Banks.E.F.Codd, IBM Research Report RJ599 (August 1969) • A Relational Model of Data for Large Shared Data Banks.E.F.Codd, CACM 13 No. 6 (June 1970) • Research and systems developed in the 1970’s. (e.g. Ingres, Oracle) Chris Smith, BRC, April 2004
The Relational Model • Summary of Codd’s work:Data should be represented as relations (tables). Chris Smith, BRC, April 2004
Properties of Tables • A table has a unique name (in some scope). • Each cell of the table can contain an “atomic” value only. • First normal form (“no repeating groups”) • Each column has a unique name (within the table). • Values in a column all come from the same domain. • Each row in the table is distinct. • Part of the model but not actually enforced! Chris Smith, BRC, April 2004
Relational Model: Jargon We will generally use Alternative 1. Chris Smith, BRC, April 2004
Defining a Table • A table is defined by giving a set of attribute and domain name pairs. • This is called a Table Schema (or Relation Schema). • A Relational Database Schema is a named set of relation schemas. • We’ll just say “schema”, or “database schema” if needed. Chris Smith, BRC, April 2004
Keys • For practical purposes we want to be able to identify rows in our tables. • We use keys for this. • A key is just a set of columns in the table. • Quite frequently just one column is enough, and quite often it is obvious what it should be. • There are rules of thumb regarding choosing keys which we will see later. Chris Smith, BRC, April 2004
Keys: Jargon Chris Smith, BRC, April 2004
NULL Values • A special value “NULL” is provided to allow for cells in a table that have an unspecified value. • NULL is not the same as zero or the empty string, but represents complete absence of a value. • Incorporation of NULL in the relational is contentious – but it’s here to stay. • No part of a primary key may be NULL. Chris Smith, BRC, April 2004
Example Schema DeZign for databases, v2.5.2 http://www.datanamic.com Chris Smith, BRC, April 2004
Hierarchical Data • The restriction to one atomic piece of data per cell precludes adding hierarchical data directly to a table. • Use a separate table and a foreign key instead. • All “spots” are gathered into one table and connected to their owner by the foreign key. • Using multiple tables helps reduce redundancy e.g. gene annotation text is not duplicated for every spot with that gene. Chris Smith, BRC, April 2004
Relational Algebra • We have seen how to define tables (relations). We want to be able to manipulate them too. • “The relational algebra is a theoretical language with operations that work on one or more relations to define another relation without changing the original relation(s).”(“Database Systems” Connolly and Begg.) Chris Smith, BRC, April 2004
Relational Algebra:Unary Operations • Selection • Take a subset of rows from a table (on some criterion). • Projection • Take a subset of columns from a table. Chris Smith, BRC, April 2004
Relational Algebra:Binary Operations 1 • Union • Return all rows from two tables. • The two tables must have columns with the same domains (union compatibility). • Intersection • Return all matching rows from two tables. • Difference • Return all rows from one table not in another. • The two tables must be union compatible. Chris Smith, BRC, April 2004
Relational Algebra:Binary Operations 2 • Cartesian Product • Concatenate every row from one table with every row from another. • Join • Not really a separate operation: can be defined in terms of cartesian product and selection. • Is very important. Chris Smith, BRC, April 2004
Relational Database Management System (RDBMS) • Implements the relational model and relational algebra (under the covers). • Provides a language for managing relations. • Provides a language for accessing and updating data. • Provides other services: • Security • Indexing for efficiency. • Backup services (maybe). • Distribution services (maybe). Chris Smith, BRC, April 2004
RDBMS Implementation • An RDBMS is usually implemented as a server program. • Client programs communicate with the server (typically using TCP/IP). • In Unix-based systems the server will run as a daemon. • In Windows it will run as a service. Chris Smith, BRC, April 2004
SQL History • Structured Query Language. • Officially pronounced S-Q-L, but many people say “sequel”. • Has its roots in the mid-1970’s. • Standardized in 1986 (ANSI), 1987 (ISO) • Further standards in 1992 (ISO SQL2 or SQL-92), 1999 (ISO SQL3). Chris Smith, BRC, April 2004
SQL Today • SQL is the only database language to have gained broad acceptance. • Nearly every database system supports it. • The ISO SQL standard uses the “Table, Row, Column” terminology rather than “Relation, Tuple, Attribute”. • Some debate about how closely SQL adheres to the relational model. • Many different dialects from different vendors. Chris Smith, BRC, April 2004
SQL • SQL is divided into two parts: • Data Manipulation Language • Data Definition Language • Originally designed to be used from another language and not intended to be a complete programming language in its own right. • Non-procedural. Define what you want, not how to get it. • Supposed to be “English Like”! Chris Smith, BRC, April 2004
SQL: Syntax • Can be a little arcane. • String literals are surrounded by single quotes. Numeric literals are not enclosed in quotes. • SELECT price FROM item_table WHERE description = ‘Mug’ Chris Smith, BRC, April 2004
SQL: Data Manipulation Language • Statements • SELECT • INSERT • UPDATE • DELETE Chris Smith, BRC, April 2004
SQL: SELECT • SELECT is the real workhorse of SQL • It can perform the selection, projection and join operations of the relational algebra. • And gets quite complicated. • “Selects” rows from a table. • A database “query”. • SELECT [DISTINCT] {*|[column_expression [ AS name]] [,…]}FROM table_name [alias] [,…] [WHERE condition][GROUP BY column_list] [HAVING condition][ORDER BY column_list [ASC|DESC] ] • “Condition” is an expression composed of column names (as variables) and comparison operators. • The values of the variables range over all entries in the table. Chris Smith, BRC, April 2004
SQL Operators • =, <> • IS NULL, IS NOT NULL • IN (value_list) • LIKE • For string comparison with % and _ wildcards. • Standard SQL LIKE is case sensitive. • PostgreSQL has ILIKE for case insensitivity. • MySQL’s LIKE is case insensitive (but you can use the BINARY keyword to force case insensitivity). • Regular expressions pattern matching (not standard) • PostgreSQL: ~ • MySQL: REGEXP • Arithmetic operators (+,-,*,/,%) • Boolean operators (AND, OR, NOT) • And more… Chris Smith, BRC, April 2004
SQL: SELECT • SELECT specifies which columns to return (columns can be renamed with “AS”). • FROM specifies the table(s) being considered. • WHERE restricts the rows being considered using some criterion. • WHERE works strictly on a row-by-row basis. • GROUP BY essentially executes the query for each value specified in the group clause. Returns one row for each such value. • HAVING allows you to restrict the groups being considered. • ORDER BY sorts the results. Chris Smith, BRC, April 2004
SQL: SELECT • SELECT 1 • SELECT 1+SQRT(2) • SELECT USER • SELECT * FROM ath1_results • SELECT * FROM ath1_results WHERE experiment = ‘G’ • SELECT COUNT(*) FROM ath1_results • SELECT * FROM ath1_results WHERE value > 50 • SELECT clone_norm, COUNT(DISTINCT function) FROM quant_genes_temp GROUP BY clone_norm HAVING COUNT(DISTINCT function) > 1 Chris Smith, BRC, April 2004
SQL: Aggregate Functions • Can only appear in the SELECT clause, or a HAVING clause (not in a WHERE clause: WHERE applies to single rows). • SUM, AVG, COUNT, MIN, MAX • Different systems provide others e.g. PostgreSQL has STDDEV and VARIANCE, MySQL has STDDEV. • Can use DISTINCT inside the parentheses: COUNT(DISTINCT name) • Can use COUNT(*) to count number of rows. • Apart from COUNT(*), NULLs are ignored. Chris Smith, BRC, April 2004
SQL: JOIN • Cartesian product and selection from relational algebra. • Joining large tables can be very, very slow (because of the product step): make sure you limit the results as much as possible. • Different types of join determine behavior on mismatches. • LEFT JOIN includes rows with values on the left, but no matching value on the right etc. • Joins recreate the spreadsheet view from a hierarchical view of the data. Chris Smith, BRC, April 2004
SQL: JOIN Examples Chris Smith, BRC, April 2004
SQL: JOIN Examples • SELECT COUNT(*) FROM trait_measurement m • SELECT COUNT(*) FROM trait_measurement m, technician c WHERE m.technician_id = c.technician_id • SELECT COUNT(*) FROM trait_measurement m LEFT JOIN technician c ON m.technician_id = c.technician_id • SELECT COUNT(*) FROM trait_measurement m FULL JOIN technician c ON m.technician_id = c.technician_id Chris Smith, BRC, April 2004
SQL: JOIN Types and Syntax • JOIN Types • INNER JOIN • Only exact matches. • CROSS JOIN • Every pair of rows. • OUTER JOIN • LEFT or RIGHT. • FULL JOIN • Mismatches on both sides. • JOIN conditions • ON condition • USING (columnName,…) • NATURAL • Short for “USING all columns with matching names” Chris Smith, BRC, April 2004
SQL: UNION, EXCEPT, INTERSECT • (SELECT …) UNION [ALL] (SELECT …) • (SELECT …) EXCEPT [ALL] (SELECT …) • (SELECT …) INTERSECT [ALL] (SELECT …) • INTERSECT is supported by PostgreSQL, but not MySQL (no big deal). • Results of SELECTs must match. • Returns table consisting of distinct results from both SELECTs, unless ALL is specified. Chris Smith, BRC, April 2004
SQL: LIMIT and OFFSET • Sometimes we want to limit the number of results returned by a query. • Especially useful on web sites for dividing many result rows between pages. • SELECT … LIMIT n OFFSET m • Not always supported: but both MySQL and PostgreSQL have it. Chris Smith, BRC, April 2004
Other Functions • SQL allows other functions in SELECT statements. • Highly dependent on the particular RDBMS being used. • Some standard ones: • CURRENT_DATE • CURRENT_TIME • SUBSTRING • || (string concatenation) • LOWER, UPPER Chris Smith, BRC, April 2004
SQL: INSERT • INSERT INTO table_name [(column_list)] VALUES (value_list) • INSERT INTO table_name [(column_list)] SELECT … • Column_list is optional, but if not provided you must give values for all columns. • Defaults can be specified when the table is created. • Second form allows moving data from table to table. Chris Smith, BRC, April 2004
SQL: UPDATE • UPDATE table_name SET col1 = val1[, col2 = val2 …] [WHERE condition] • In general cannot UPDATE based on data in other tables. • Both MySQL and PostgreSQL provide an extension to allow this. Chris Smith, BRC, April 2004
SQL: DELETE • DELETE FROM table_name [WHERE condition] • (Too) easy to delete everything from a table. • In general cannot DELETE based on data in other tables. • MySQL provides an extension to allow this. • PostgreSQL does not. Chris Smith, BRC, April 2004
SQL: Data Definition Language • Statements: • CREATE • TABLE, VIEW, INDEX • ALTER • TABLE • DROP • TABLE, VIEW, INDEX Chris Smith, BRC, April 2004
SQL: CREATE TABLE • CREATE TABLE ({column_name data_type [DEFAULT value] [, …]}[, PRIMARY KEY (column_list)]) • Simplified! • Integrity mechanisms not shown. Chris Smith, BRC, April 2004
SQL Numeric Data Types • Exact numeric types • NUMERIC [(precision[,scale])] • DECIMAL [(precision[,scale])] • DECIMAL(5,2) means 999.99 • INTEGER (INT) • SMALLINT • BIGINT • Approximate numeric types • FLOAT [(precision)] • REAL • DOUBLE PRECISION Chris Smith, BRC, April 2004
SQL Character Types • CHAR(length) • Short form of CHARACTER • VARCHAR(length) • Short form of CHARACTER VARYING • PostgreSQL allows TEXT type for “long” character data fields. • MySQL has TINYTEXT, TEXT, MEDIUMTEXT and LONGTEXT types! Chris Smith, BRC, April 2004
SQL Date and Time Types • DATE • TIME [WITH TIME ZONE] • TIMESTAMP [WITH TIME ZONE] • INTERVAL • Not available in MySQL. Chris Smith, BRC, April 2004
SQL: CREATE VIEW • A view is a “virtual table”. • Not available in MySQL 4. • Supposed to be coming in version 5. • Created as needed from a SELECT statement given when the view is defined. • CREATE VIEW AS SELECT … • Simplified! • Often used to restrict access to a table (by hiding some columns or rows). • Also used to “hide” complex queries in the database (rather than repeating them in code). Chris Smith, BRC, April 2004
SQL: CREATE INDEX • Used to enhance performance of SELECT’s (may slow down INSERT’s since index must be updated). • Index columns used for frequently for lookup. • Primary key columns are usually automatically indexed. • CREATE INDEX index_name ON table_name (col1 [, …]) Chris Smith, BRC, April 2004
SQL: DROP • DROP is used to remove tables, views and indices from the system. • DROP TABLE table_name • DROP INDEX index_name • DROP VIEW view_name • For a table: all data in the table will be lost. Chris Smith, BRC, April 2004
Creating a Database • Creation of an entire database tends to depend on the RDBMS being used. • Usually allow multiple named databases to be accessed through a single instance of a database server. Chris Smith, BRC, April 2004
When to Use an RDBMS? • Good for large amounts of data. • Indexing capabilities. • Frequent updates: • Insertions of new values • Many different views of the data wanted. • Associations between different entities (foreign keys). • Data integrity. • Constraints. • Transactions. • ACID = Atomicity, Consistency, Isolation, Durability. • Integration with other systems e.g. web pages. • Sharing data between users. Chris Smith, BRC, April 2004
Plain Old Text Files • Can be perfect (even for largish amounts of data). • Easier to hand over to someone else. • Don’t have to say “first install database X”. • Not great for updates to existing values. • No integrity checks (can be made in code). Chris Smith, BRC, April 2004