1.29k likes | 1.3k Views
This training provides a quick start to using relational databases, covering concepts such as data normalization, simple selects, DML, DCL, and advanced topics. By the end, participants will be able to manipulate and work with data stored in databases using SQL.
E N D
Wéber André RELATIONAL MODEL AND SQL LANGUAGE
Objective of the training • After this training, the trainee • understands the relational theory and the SQL language. • can manipulate and work with data stored in databases using the SQL language. • has the required knowledge and background to study and understand any type of database.
Methods and tools • Training session • Teaching from PPT. • Examples and stories • Open discussion • Exercises • Individual exercises are proposed.
Methods and tools • The objectives of the training is to give you a quick start using relational databases
Agenda • Day 1 : Introduction, Data Normalization, Simple selects, exercises. • Day 2 : DML (select/insert/update), exercises. • Day 2 : DCL, advanced topics, exercises.
General scheme of our training • What is a database and what is a DBMS. • Different database models: • Hierarchical • Network • Relational • Object • The relational databases • Fundamental concepts. • Different activities • Entity & relations models • SQL
What is a database and what is a DBMS. • A database can be viewed as • On or more files • A set of structured data • The database is managed by a DBMS. In most cases it‘s a server (exception: Ms-Access). • A DBMS can manage one or more databases. • It is possible to create links between databases. • The responsabilities of a DBMS are: • To manage the data and the definitions of the data structures (meta data). • To manage the security. • To manage the data integrity. • To give access to client applications. • To manage isolation between transactions. • ...
Databases models • Different models exist : • Hierarchical : nodes of data (set of fields/value) with parent-child relations forming a hierachical tree : a child may have only one parent • Network : nodes of data with relations between any nodes : it forms a network : a child may have many parents. In hierachical and network databases to retrieve a node you have to navigate through relations. • Relational : tables of data. A table is a set of records having the same structure and the same meaning (entities of the same type). A record is composed of fields; each field has a value. A one to one or one to many relation is modelized as a data field. • Object/Relational Model : extension to the relational model to work with objects. • ... • In this training session, we will only cover in the relational database model in details.
Relational model • Values are atomic (we can‘t access only to a part of a value) • The sequence of rows is insignificant • In the different rows of the same table, the column values have the same type (the type is defined for the column) • The sequence of columns is insignificant • On each table you have to define a primary key (on a single column or on many). The primary key identifies the record. Values in primary key should be unique. Values in column participating to a primary key can‘t be null. • On each table you may define indexes (on a single column or on many). Searches with creteria related to indexes are more efficient.
Where are we in our general scheme? • What is a database and what is a DBMS. • Different database models: • Hierarchical • Network • Relational • The relational databases (RDB) • Fundamental concepts. • Different actors & activities using RDB • Entity & relations models • SQL
Relational data bases – fundamental concepts • Data stored in tables. • Tables formed of rows and columns. • Primary keys. • Surrogate. • Foreign keys. • Integrity constraints.
Database objects • Schemas : collection of tables, indexes and views • Tables : storage structure • Views : stored select SQL • Indexes : definition of indexation
Primary key, surrogate key, foreign key. • A primary key is the field that will identify each record in a table. • A surrogate key is an artificial column added to a relation to serve as a primary key: • Often supplied by the DBMS (usage of a sequence) • Short, numeric and never changes – an ideal primary key! • Has technical values that are meaningless to users • Normally hidden in forms and reports You will use a surrogate key either for optimization reasons (less data, quick access) or when you have no “natural” key. A foreign key in a table is an identifier of a record of another table (in the table X, the foreign key on the table Y contains values of the primary key of the table Y). In other words, a foreign key is the primary key of one relation that is placed in another relation to form a link between the relations
Referential integrity constraint • A referential integrity constraint is a statement that limits the values of the foreign key to those already existing as primary key values in the corresponding relation
Activities related to the usage of relational databases. • Design : data modeling, data definition (meta-data) with SQL or with a tool • Usage : data manipulation (SQL insert, update, delete) and retrieval (SQL select). • Administration : security management, performance management, data integrity. • Such activities are not always allocated to different people. Design Daily usage Administration t
Where are we in our general scheme? • What is a database and what is a DBMS. • Different database models: • Hierarchical • Network • Relational • The relational databases (RDB) • Fundamental concepts. • Different actors & activities using RDB • Entity & relations models + Normalization • SQL
Data modeling - Entity-relationship model • The entity & relationship model is a way to represent structures and links that can be stored in a relational database. It is a tool for conceptualisation; it is used mainly during the analysis stage. • An entity set is a group of entities having the same type (= a table). The definition of this type is part of the definition of the entity set. • An entity is a flat structure (set of fields) holding the characteristics of a concrete thing (for examples a person, a training session) or of an abstract thing (for examples a knowledge, an objective). (= a table row) • A relation defines the possibility to create links between two or more entities. • An entity-relationship model can be represented by textual expressions or by graphs. • To transform an entity-relationship model into a database model, entities are mapped to tables, relations are mapped as supplementary fields of entities tables or as separated tables.
Entity-relation graphs • Main represented elements are: • Entities sets • Attributes • Relations between entity sets • Cardinality of such relations.
Relations cardinality • Cardinality means “count,” and is expressed as a number. • Maximum cardinality is the maximum number of entity instances that can participate in a relationship. • Minimum cardinality is the minimum number of entity instances that must participate in a relationship. • There are three types of maximum cardinality: • One-to-One [1:1] • One-to-Many [1:N] • Many-to-Many [N:M]
Relations – min cardinality • Minimums are generally stated as either zero or one: • IF zero [0] THEN participation in the relationship by the entity is optional, and no entity instance must participate in the relationship. • IF one [1] THEN participation in the relationship by the entity is mandatory, and at least one entity instance must participate in the relationship.
Data modeling - Logical vs Physical Model • The logical model is created as support of the design phase. Normally, the designer will work on the logical model to obtain a normalized form (the normalization process is explained here after). • The physical model directly represents what will be created (tables and relations) in the database. • The main difference are: • In the physical model all “many to many” relations are represented via an intermediary table. • In the logical model, foreign keys are not shown. • Physical considerations may cause the physical data model to be quite different from the logical data model (de-normalization).
Example of a logical model Person House owns • name • age • sex • cadastralNumber • street • number • city • country • number of rooms
Example of a physical model PersonHouseRel Person House • personName • cadastralId • country • name • age • sex • cadastralId • street • number • city • country • number of rooms
Normalization forms • See http://www.datamodel.org/NormalizationRules.html • Normalization = respect of guide lines. Enforce the data consistence (no redundant data). • 1 NF: each field has a single value, we can’t have a variable number of fields in two records of the same table. • 2 NF: 1NF + a non-key field must give an information (a fact) related to the key, as the whole key, and nothing but the key. • 3 NF: 2NF + a non-key field can not be an information related to another non-key field • 4 NF: 3NF + a table should not contain two multi-valued facts about an entity. • …
PersonName Age Children Gaston 65 Benoit,Françoise André 47 Alice, Pascal, Eloïse 1NF • Each field has a single value, we can’t have a variable number of fields in two records of the same table. • Bad example:
PersonName Age Child1 Child2 Child3 Child4 Child5 Gaston 65 Benoît Françoise Null Null Null André 47 Alice Pascal Eloïse Null Null 1NF • Each field has a single value, we can’t have a variable number of fields in two records of the same table. • Bad example:
FatherName Child Gaston Benoît PersonName age Gaston Françoise Gaston 65 André Alice André 47 André Pascal André Eloïse 1NF • Each field has a single value, we can’t have a variable number of fields in two records of the same table. • A possible solution: table2 table1
Company Country NbEmpl Currency CW BE 450 Euro CW GB 50 GBP CW LU 60 Euro 2NF • 2 NF: A non-key field must give an information (a fact) related to the key, as the whole key, and nothing but the key. • Bad example: Headquarters
Country Currency Company Country NbEmpl BE Euro CW BE 450 GB GBP CW GB 50 LU Euro CW LU 60 2NF • 2 NF: A non-key field must give an information (a fact) related to the key, as the whole key, and nothing but the key. • A possible solution: Countries Headquarters
Country Currency Roundup BE Euro 0.01 GB GBP 0.01 LU Euro 0.01 3NF • A non-key field can not be an information related to another non-key field • Bad example: Countries
Country Currency Currency Roundup BE Euro Euro 0.01 GB GBP GBP 0.01 LU Euro 3NF • A non-key field can not be an information related to another non-key field • A possible solution: Countries Currencies
EmployeeId skill Language awe C FR awe Java EN awe SQL Null xyz Accounting EN 4NF • A table should not contain two multi-valued facts about an entity. • Bad example:
EmployeeId skill EmployeeId Language awe C awe FR awe Java awe EN awe SQL awe Null xyz Accounting xyz EN 4NF • A table should not contain two multi-valued facts about an entity. • A possible solution:
Where are we in our general scheme? • What is a database and what is a DBMS. • Different database models: • Hierarchical • Network • Relational • The relational databases (RDB) • Fundamental concepts. • Different actors & activities using RDB • Entity & relations models + normalization • SQL
The scheme of the SQL part • SQL • Introduction, history, concepts • DML • DDL • DCL • Advanced topics:
SQL - Introduction • Structural Query Language is a standardized language for Databases. Basically, It’s a declarative language (not procedural). • SQL is born in 1974 (called SEQUEL). SQL has been standardized in different versions (see next slide) by ANSI and ISO. • The implementations of SQL for the different RDBMS (Oracle, Sybase, SQL Server, Ms-Access, mySQL) are not always standard. • As many languages, SQL is defined by a dictionary and a grammar. See http://savage.net.au/SQL/ for a “Backus-Naur Form” (BNF) description of this grammar. • In this training session, we will use the SQL92 syntax (SQL2). • Newer versions exist and incorporates some object-oriented concepts, but they are not widely used in commercial RDBMS products.
SQL – non standard features • Oracle 9i SQL versus Standard SQL: http://infolab.stanford.edu/~ullman/fcdb/oracle/or-nonstandard.html • Oracle 8i and Standard SQL: • http://download-uk.oracle.com/docs/cd/A87860_01/doc/server.817/a85397/ap_stand.htm • Comparison of different SQL implementations : http://troels.arvin.dk/db/rdbms/ • Oracle 10g SQL reference: • http://www.acs.ilstu.edu/docs/oracle/server.101/b10759/toc.htm
SQL history • Extract of http://en.wikipedia.org/wiki/SQL
SQL • SQL is composed of different sub-languages: • DDL : data definition language • DML : data manipulation language and transaction control • DCL : data control language, definition of rights • To learn those languages, we first have to cover some items: • naming • data types • literal values (constants) • functions
SQL Naming • You will mainly define names for • Table & views • Table fields & aliases • There are some naming limits (it may differ from a DB to another): • Usable chars : [A-Z] for the first char, [A-Z0-9_] for next chars • Length of names : 30 chars for Oracle and DB2 fields names • Character case in not significant. • You can’t use reserved names (tokens of the SQL language). • You should respect supplementary constraints in naming: • Public conventions: for example, do not prefix names of fields with the name of the table. • Specific conventions.
SQL Naming limitations. • For DB2, see http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/admin/r0001029.htm • For Oracle, • Field and table names can not contain spaces or symbols (except “_”). • Field and table names are limited to 30 characters. • Oracle doesn’t distinguish the character upper/lower cases. • Name pattern : “[A-Z][A-Z0-9_]{0,29}”
SQL – Data types • For Oracle, see http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/sql_elements001.htm#i54330 • Main data types: • CHAR(len): fixed-length string • VARCHAR2: variable-lengh string • NUMBER(precision,scale): decimal or float number. Precision = total number of digits. Scale = number of digits after the decimal. • DATE : calendar date (in Oracle, this type will include the time). • TIMESTAMP: Date and time • INTERVAL : interval of time • BLOB: Binary large object
SQL – constants (literals) • For Oracle, see http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/sql_elements003.htm#i11223 • Numeric constants: written as usually; for example : 123 or -1245.63 • String constants: enclosed in simple quotes; for example ‘hello world’. If you have to include a simple quote in your string, you have to write two consecutive simple quotes; for example : ‘I can’’t do it!’ • Date constants: DATE ‘yyyy-mm-dd’; example : DATE ‘2001-12-13’. There is no syntax for a date literal with a time, you may use the “TO-DATE” function to convert a string; for example: TO_DATE('98-DEC-25 17:30','YY-MON-DD HH24:MI') • Timestamp constants: TIMESTAMP ‘yyyy-mm-dd HH:MM:SS’ • ; example TIMESTAMP ‘2007-01-01 12:34:56' • Interval literal: INTERVAL ‘n[-n]’ [(YEAR|MONTH)] [‘(‘n’)’] [TO MONTH]. Example: INTERVAL '123-2' YEAR(3) TO MONTH
Pseudo columns • Oracle pseudo columns • See http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/pseudocolumns.htm#i1014965 • ROWID: identifier of the current row, not supported by DB2 • ROWNUM: sequential number of the current row, not supported by DB2 • Sequence.CURRVAL: current value of a sequence • Sequence.NEXTVAL: next value of a sequence • SYSDATE: current date • Hierarchical Query Pseudocolumns • Difference between Oracle and DB2 SQL, see http://www.fornera.com/helpdesk/db2SurvivalGuide.html
SQL built-in functions • SQL defines standard functions. Each RDBMS may offer additional functions (and sometimes doesn’t include all the standard functions). • We may distinguish different types of functions: • Get context info: current_time, current_date, current_timestamp, current_user, session_user, system_user. (not covered by Oracle) • Aggregate functions: max, min, count, avg, sum • Functions to manipulate strings: char_length (length in Oracle), position(starting_string IN search_string) (instr in Oracle), substring(substr for Oracle), lower, upper, concatenate (concat for Oracle), trim, … • Date & time functions: year, month, day, hour, minute, second, … • Mathematics functions: sin, cos, ceil, floor, log, ln,… • Conversion of types: cat (expression as type)