450 likes | 578 Views
Opsummering. Basic Definitions. Database : A collection of related data. Data : Known facts that can be recorded and have an implicit meaning. E.g. “John B. Smith” a name123456789 a number ---two pieces of data
E N D
Basic Definitions • Database: A collection of related data. • Data: Known facts that can be recorded and have an implicit meaning. • E.g. “John B. Smith” a name123456789 a number ---two pieces of data • If they are used in a query like “who is the head of the department and what is his ssn” the data will turn into information (give an implicit meaning) • Mini-world: Some part of the real world about which data is stored in a database. For example, student grades and transcripts at a university.
User/programmers Database System Application programs/queries DBMS Software Software to processqueries/programs Software to access storede data Stored database Stored database definition Meta data
Additional Implications of Using the Database Approach • Flexibility to change data structures: database structure may evolve as new requirements are defined. • Availability of up-to-date information – very important for on-line transaction systems such as airline, hotel, car reservations.
Data Models • Data Model: A set of concepts to describe the structure of a database,and certain constraints that the database should obey. • Data Model Operations: Operations for specifying database retrievals and updates by referring to the concepts of the data model. Operations on the data model may include basic operations and user-defined operations.
Schemas versus Instances • Database Schema: The description of a database. Includes descriptions of the database structure and the constraints that should hold on the database. • Schema Diagram: A diagrammatic display of (some aspects of) a database schema (fig 2.1). • Schema Construct: A component of the schema or an object within the schema, e.g., STUDENT, COURSE. • Database Instance: The actual data stored in a database at a particular moment in time. Also called database state (or occurrence).
End users Externalview Externalview ....... ConceptualSchema Internal Schema Data independence Stored database
DBMS Languages • Data Definition Language (DDL): Used by the DBA and database designers to specify the conceptual schema of a database. In many DBMSs, the DDL is also used to define internal and external schemas (views). In some DBMSs, separate storage definition language (SDL) and view definition language (VDL) are used to define internal and external schemas.
DBMS Languages • Data Manipulation Language (DML): Used to specify database retrievals and updates. • DML commands (data sublanguage) can be embedded in a general-purpose programming language (host language), such as Java or C# • Alternatively, stand-alone DML commands can be applied directly (query language).
DBMS Languages • High Level or Non-procedural Languages: e.g., SQL, are set-oriented and specify what data to retrieve than how to retrieve. Also called declarative languages. • Low Level or Procedural Languages: record-at-a-time;they specify how to retrieve data and include constructs such as looping.
Conceptual Data Models • A conceptual model of the data on which the IT systems of an organisation are based • Independent of implementation • Stable over time • Conceptual data structure doesn't change as much as functionality • Conceptual models are to be transformed to a database model as the relational model
Relations Cardinality ratio Participation (total / partial) Relations may have attributes Weak Entity Types • Identifying owner • Identifying relation • Partial key • A weak entity always has total participation in the identifying relation. ER Model: Concepts • Entities • Attributes • Atomic • Composite • Multi valued • Attribute values • Entity types • Keys • Domains
Den relationelle model • Den relationelle model er en datamodel med specielt sigte på relationsdatabaser • Den relationelle model er en logisk datamodel, der beskriver hvordan data struktureres i relationsdatabaser
Den relationelle model • Den relationelle model beskrives ved hjælp af en række veldefinerede begreber: • domæner • relationelle skemaer • relationer • attributter • tupler • primærnøgler, fremmednøgler • begrænsninger (constraints)
Nøglebegrebet • En nøgle er en attributkombination, som entydigt identificerer en forekomst i en tabel. • En nøgle er minimal, dvs.. fjernes een attribut, er den ikke længere entydig. • Alle attributter fra tabellen vil tilsammen altid være en (evt.. ikke-minimal) nøgle, kaldet en supernøgle. • Der kan være flere forskellige kandidatnøgler i en tabel • Der vælges altid en primærnøgle fra mængden af kandidatnøgler
Tabelsammenhænge • repræsenteres ved fremmednøgler • en fremmednøgle er een eller flere attributter i en tabel, som svarer til primærnøglen i en anden tabel • en fremmednøgle peger på en forekomst i en anden tabel og fortæller, at her ligger resten af oplysningerne • fremmednøglen og primærnøgleattributterne i den tabel, der refereres til, skal have samme domæne.
Integritetsregler • Integritet: at være sammenhængende • Domæneregel: Værdien af en attribut skal være en atomisk værdi fra dom(A) • Entitetsintegritet: En primærnøgle må ikke indeholde NULL-værdier • Referenceintegritet: En fremmednøgle skal enten være NULL eller referere til en forekomst med en tilsvarende primærnøgleværdi • Semantisk integritet: Forskellige regler, der i modsætning til de andre former for integritet, afhænger af den bestemte database.
DBMS-understøttelse • DBMS’et bør understøtte: • Domæneintegritet • Entitetsintegritet • Referenceintegritet • Semantisk integritet • Udbredte relationelle DBMS understøtter kun 1 og 4 i begrænset omfang.
Det er det, man forstår ved en algebra! Datamanipulation i den relationelle model - relationsalgebraen • Arbejder på hele tabeller dvs. alle operationer tager tabeller som input og returnerer nye tabeller • Hermed kan operationer sammensættes til udtryk (som almindelige regneudtryk) • Operationer: • rækkeudvælgelse (RESTRICT/SELECT) • søjleudvælgelse (PROJECT) • sammensætning af tabeller (JOIN) • mængdeoperationer (UNION, INTERSECTION, MINUS, PRODUCT) • avancerede operationer (OUTER (LEFT/RIGTH) JOIN)
Table Design Transformation from E/R-model to Relational Model Eigth Steps Algorithm Does not always yield an optimal design, but provides a good starting point for the final design of tables
Step 2: For each weak entity type create a table • All attributes from the weak entity are included. • The primary key from the owner is included as foreign key. • The primary key is composed by the owner’s primary key and the partial key. • Step 3: For each (binary) 1:1-relation type include primary key of one participant as foreign key in the other • Any attribute on the relation type is included with the key. • If possible, include on a side with total participation. Step 1: For each regular entity create a table For composite attributes only the components are included. Multi-value attributes are not included (they are considered in step 6). Choose a primary key.
Step 4: For each (binary) 1:n relation type include primary key of 1-side as foreign key on n-side Step 5: For each (binary) n:m relation type create table with participating entity types primary keys as foreign keys • Any attribute on the relation is included in the new table. • Primary key is composed of the foreign keys. • This may also be applied to binary 1:1- and 1:n relations – in particularly if there are relatively few instants of the relation type. • Step 6: For each multi value attribute create table with primary key of entity type as foreign key and the multi value attribute • The primary key of the new table is composed of the foreign key and the multi value attribute. Any attribute is included with the key on the n-side.
Step 7: For each n-ary (n>2) relation type create a table with the primary keys of all participating entity types as foreign keys Any attribute on the relation type is included. The primary key is composed of the included foreign keys.
Step 8: B. Pull-down (only in case of disjoint, total specialisation): • Create a table for each subclass • Include (“pull down”) all attributes from the super class in each table • Use the primary key from the super class as primary key in the new tables
Step 8: C. Pull-up-1: (only in case of disjoint specialisation): • Create one table for the superclass • Include (pull up) all attributes from the subclasses • Add a type attribute
Step 8: D.Pull-up-2: (in case of overlapping specialisation): • Create one table for the superclass • Include (pull up) all attributes from the subclasses • Add a type flag for each subclass
Normalisation • Normal forms are the formal way to state design guidelines. • Normalisation is the process. • 6 normal forms (NF) are defined: • 1st, 2nd, 3rd, and Boyce-Codd (BCNF). • 4th and 5th NF • BCNF is the one of most practical interest.
Guideline for Normalisation All attributes are to depend on the key, the whole key, and nothing but the key. So help me Codd.
SQL DDL create definition af table, view alter tilføjefelter, ændrefelter tilføje constraint drop grant / revoke DML insert update delete select
/* automatik autoincrement pa primaernoeglen */ • /* • createtable test • (id int IDENTITY(1,1) primary key, • navn varchar(20)); • */
Comparing NULL’s to Values • The logic of conditions in SQL is really 3-valued logic: TRUE, FALSE, UNKNOWN. • When any value is compared with NULL, the truth value is UNKNOWN. • But a query only produces a tuple in the answer if its truth value for the WHERE clause is TRUE (not FALSE or UNKNOWN).
Use is or is not selectFname, lname from employee wheresuperssn is null Instead of = or != Sincesqlconsiderseachnullvalue as beingdistinct from everyothernullvalue
AGGREGATE FUNCTIONS • Include COUNT, SUM, MAX, MIN, and AVG • Query 15: Find the maximum salary, the minimum salary, and the average salary among all employees.Q15: SELECT MAX(SALARY), MIN(SALARY), AVG(SALARY) FROM EMPLOYEE • Some SQL implementations may not allow more than one function in the SELECT-clause
GROUPING (cont.) • Query 20: For each department, retrieve the department number, the number of employees in the department, and their average salary.Q20: SELECT DNO, COUNT (*), AVG (SALARY) FROM EMPLOYEE GROUP BY DNO • In Q20, the EMPLOYEE tuples are divided into groups--each group having the same value for the grouping attribute DNO • The COUNT and AVG functions are applied to each such group of tuples separately • The SELECT-clause includes only the grouping attribute and the functions to be applied on each group of tuples • A join condition can be used in conjunction with grouping
THE HAVING-CLAUSE • Sometimes we want to retrieve the values of these functions for only those groups that satisfy certain conditions • The HAVING-clause is used for specifying a selection condition on groups (rather than on individual tuples)
THE HAVING-CLAUSE (cont.) • Query 22: For each project on which more than two employees work , retrieve the project number, project name, and the number of employees who work on that project.Q22: SELECT PNUMBER, PNAME, COUNT (*) FROM PROJECT, WORKS_ON WHERE PNUMBER=PNO GROUP BY PNUMBER, PNAME HAVING COUNT (*) > 2
Innerjoin • select pnumber, dnum, fname, lname, address • from ((project join department on dnum = dnumber) • join Employee on mgrssn = ssn) • whereplocation = 'Stafford'
Outerjoin • select fname, lname, dname as lederAf • from (employee left join department on • ssn = mgrssn) John Smith NULL FrankLinWong Research Joyce English NULL RameshNarayalan NULL James Borg Headquarters Jennifer Wallace Administration Ahmad Jabbar NULL AliciaZelaya NULL
Select • select < attribute and function list> • from < tablelist> • [where < condition>] • [group by <groupingattributelist>] • [Having <groupcondition>] • [Order by < attributelist>]
SQL Views: An Example CREATE view ViewWorksOn(fname, lname, pname, hours) AS (SELECT FNAME, LNAME, PNAME, HOURS FROM EMPLOYEE, PROJECT, WORKS_ON WHERE SSN=ESSN AND PNO=PNUMBER GROUP BY fname, lname, PNAME,hours)