430 likes | 582 Views
CSCI 3140 Module 1 – Background. (Based on Chapters 1 – 13 of Database Systems by Connolly and Begg) Theodore Chiasson Dalhousie University. Definitions. Database A collection of related data Database Management System (DBMS) Software that manages and controls access to the database.
E N D
CSCI 3140Module 1 – Background (Based on Chapters 1 – 13 of Database Systems by Connolly and Begg) Theodore Chiasson Dalhousie University
Definitions • Database • A collection of related data • Database Management System (DBMS) • Software that manages and controls access to the database
File-based systems • Early attempt to computerize manual filing systems • Limitations of file-based systems include: • Separation and isolation of data • Application programs often access many files • Duplication of data • Data needed in several departments will be entered into each department’s system • Data dependence • Application programs need to be updated if file formats are changed • Incompatibility of files • Different application development environments produce different file formats • Fixed queries/proliferation of application programs • Application programmer needs to intervene for ad hoc queries
Two underlying problems with file-based systems: • The definition of data is embedded within the application programs rather than being stored separately and independently • There is no control over access and manipulation of data beyond that imposed by the application programs
Definitions (revisited) • Database • A shared collection of logically related data, and a description of this data, designed to meet the information needs of an organization • Database Management System (DBMS) • A software systems that allows user to define, create, maintain, and control access to the database.
Important database terms • System catalog, data dictionary, or meta-data • The ‘data about the data’ • The part of the database that makes it self-describing • Entity • A distinct object (person, place, thing, concept or event) that is represented in the database • Attribute • A property describing some aspect of an entity • Relationship • An association between entities
Facilities provided by a DBMS: • Data Definition Language (DDL) • Used to specify the data types and structures and the constraints on the data to be stored in the database • Data Manipulation Language (DML) • Used to insert, update, delete, and retrieve data from the database • Structured Query Language (SQL) • De facto standard language for relational DBMSs. • Controlled access to the database • Security system • Integrity system • Concurrency control system • Recovery control system • User-accessible catalog
ANSI-SPARC three-level architecture • External level • The users’ view of the database • Describes the part of the database that is relevant to each user • External schemas, or subschemas • Conceptual level • The community view of the database • Describes what data is stored in the database and the relationships among the data • Conceptual schema • Internal level • The physical representation of the database on the computer • Describes how the data is stored in the database • Internal schema
ANSI-SPARC three-level architecture User 1 User 2 User n External level View 1 View 2 View n Conceptual level Conceptual Schema Internal Schema Internal level Physical data organization
The Relational Model • All data is logically structured in relations (tables) • Each relation has a name and is made up of attributes (columns) of data • Each tuple (row) contains one value per attribute Relation: a table with columns and rows (also called a file) Attribute: a named column of a relation (also called a field) Tuple: a row of a relation (also called a record) Domain: the set of allowable values for one or more attributes Degree: the number of attributes in a relation (unary, binary, ternary, n-ary) Cardinality: the number of tuples in a relation Relational database: a collection of normalized relations with distinct relation names
Mathematical review • Let A and B be sets, where A = {2,4} and B = {1,3,5). • The CARTESIAN PRODUCT of these two sets, written A X B, is the set of all ordered pairs such that the first element is an element of A and the second element is an element of B, which gives us the set {(2,1),(2,3),(2,5),(4,1),(4,3),(4,5)} • Any subset of this set is termed a relation • For example, R = {(2,1),(4,1)} is a relation. In this case, R can be specified as all those ordered pairs with the second element equal to 1, or R = {(x,y) | x E A, y E B, and y = 1} We could also define a relation S as S = {(x,y) | x E A, y E B, and x = 2y} which, given set A and B above, would yield S = {(2,1)}
Database Relations • Relation Schema • a named relation defined by a set of attribute and domain name pairs • Let A1, A2, … An be attributes with domains D1, D2, ... Dn. Then the set {A1:D1, A2: D2, … An: Dn} is a relation schema. Thus, a relation R is a set of n-tuples: (A1:d1, A2,d2, … An:dn) such that d1E D1, d2E D2, … dnE Dn Tuples are normally written without the column names, yielding (d1,d2, … dn) A relation instance is a set of tuples, such as {(6050, Theo, 317)} {(ID:6050, NAME: Theo, Office: 317)} • Relational database schema • a set of relation schemas, each with a distinct name • Also called a relational schema
Relational keys • Superkey • An attribute, or set of attributes, that uniquely identifies a tuple within a relation • Candidate key • A superkey such that no proper subset is a superkey within the relation • Primary key • The candidate key that is selected to uniquely identify tuples with the relation • Foreign key • An attribute, or set of attributes, within one relation that matches the candidate key of some (possibly the same) relation
Relational Integrity • Domain constraints • Restrict the allowable set of values for the attributes of a relation • Null • Represents a value for an attribute that is currently unknown or is not applicable for this tuple • Entity integrity • In a base relation, no attribute of a primary key may be null • Referential integrity • If a foreign key exists in a relation, either the foreign key value must match a candidate key value of some tuple in its home relation or the foreign key value must be wholly null • Enterprise constraints • Additional rules specified by the users or database administrators of a database
Views • Base Relation • A named relation corresponding to an entity in the conceptual schema, whose tuples are physically stored in the database • View • The dynamic results of one or more relational operations operating on the base relations to produce another relation. • A view is a virtual relation that does not necessarily exist in the database but can be produced upon request by a particular user, at the time of the request • Not all views are updatable.
Relation Algebra • A theoretical language with operations that work on one or more relations to define another relation without changing the original relation(s) • Five fundamental operations: • Selection • Projection • Cartesian product • Union • Set difference • Three additional operations (can be derived from the five) • Join • Intersection • Division
Unary operations • Selection operation • Input is a relation and a predicate. The predicate is constructed from boolean expressions involving attributes of the relation using logical operators ^ (AND), v (OR), and ~ (NOT). Output is a new relation that consists of only those tuples in the original relation for which the predicate evaluates to TRUE. • Example: σsalary > 100000(Staff) • Result: A new relation with the same attributes as the Staff relation, and with tuples corresponding to staff with a salary greater than 100,000. • The degree of the new relation is the same as the degree of the original relation. • Projection operation • Input is a relation and a set of attributes of the relation. Output is a new relation with only those attributes listed in the input set. The degree of the new relation is the number of attributes in the input set, and the cardinality of the new relation is the same as the cardinality of the original relation. • Π(staffNo, salary)(Staff)
Set Operations • Union (R U S) • Input is two union-compatible relations R and S, output is a new relation containing all of the tuples of R and S • Union-compatibility is defined as having the same number of attributes with each pair of corresponding attributes having the same domain • Set Difference (R – S) • Yields a new relation consisting of tuples that are in relation R that are not in relation S. R and S must be union-compatible. • Intersection (R S) • Yields a new relation consisting of tuples that are in both relation R and relation S. Equivalent to R – (R – S) • Cartesian Product (R x S) • Yields a new relation that is the concatenation of every tuple of R with every tuple of S
Join Operations • Theta join ( R FS) • Yields a relation with all tuples of the Cartesian Product of R and S that satisfy the predicate F • Equijoin ( R FS) • A Theta join in which the only predicate is equal (=) • Natural join ( R S) • Equijoin over all common attributes of R and S. One occurrence of each common attribute is eliminated from the result. • Outer join ( R S) • Tuples from R which do not have matching values in the common attributes of S are also included in the join, with missing values in the second relation set to null. • Semijoin ( R FS) • Contains the tuples of R that participate in a join of R with S
Division operation (R – S) • The division operation defines a relation over the attributes C that consists of the set of tuples from R that match the combination of every tuple is S, where C is the set of attributes that are in R but are not in S.
Relational Calculus • Based on a branch of symbolic logic called predicate calculus • A predicate is a truth-valued function with arguments • A proposition is a predicate with values filled in for the arguments • If P is a predicate, the set of all x such that P is true for x is expressed as {x | P(x)}
Tuple Relational Calculus • Tuple relational calculus • Find tuples for which a predicate is true • Tuple variables `range over’ a named relation • Specify a range of a tuple variabe S as the Staff relation using the notation Staff(S) • To express the query `find all tuples of S such that F(S) is true’, we write {S | F(S)} F is a formula, or well-formed formula (wff) • Quantifiers tell how many instances the predicate applies to • Universal quantifier `for all’ • Existential quantifier `there exists’ • Variables qualified with a quantifier are bound variables, those not qualified are free variables • All free variables are to the left of the vertical bar in a wff
General form of an expression in Tuple Relational Calculus { S1.a1, S2.a2, …, Sn.an | F(S1, S2, …, Sm)} m≥n Where S1, S2, …, Sm are tuple variables, each aiis an attribute of the relation over which Si ranges, and F is a formula A wff is made up of atoms, where atoms can be: - R(Si) where Si is a tuple variable and R is a relation - Si.a1θ Sj.a2 where Si and Sj are tuple variables, a1 is an attribute of the relation over which Si ranges, a2 is an attributes of the relation over which Sj ranges, and θ is a comparison operator (<,≤,>,≥,=,≠); the attributes a1 and a2 must have domains whose members can be compared by theta - Si.a1θ c where where Si is a tuple variable, a1 is an attribute of the relation over which Si ranges, c is a constant from the domain of attribute a1, and θ is one of the comparison operators.
Domain Relational Calculus • Domain relational calculus • Variables take their values from the domains of attributes rather than the tuples of relations • Expressions have the general form: {d1, d2, …, dn | F(d1, d2, …, dm)} m >= n Where d1, d2, …, dn, …, dm represent domain variables and F(d1, d2, …, dm) represents a formula composed of atoms, where each atom has one of the following forms: - R(d1, d2, …, dn) where R is a relation of degree n and each di is a domain variable - diθ dj ,where di and dj are domain variables and θ is a comparison operator - diθ c,where di is a domain variable, c is a constant from the domain of di, and θ is a comparison operator We recursively build up formulae from atoms where • an atom is a formula, • the conjunction or disjunction of two formulae is a formula, the negation of a formula is a formula • Existential and universal quantifiers can be applied
Entity-Relationship Modeling • Entity type • A group of objects with the same properties, which are identified by the enterprise as having an independent existence • Entity occurance • A uniquely identifiable object of an entity type • Each entity type in an E-R diagram is represented as a rectangle labeled with the name of the entity Staff PropertyForRent
Entity-Relationship Modeling • Relationship type • A set of meaningful associations among entity types • Relationship occurance • A uniquely identifiable association, which includes one occurrence from each participating entity type • Each relationship type in an E-R diagram is represented as a line connecting the associated entity types, labeled with the name of the relationship Has Staff Branch
Entity-Relationship Modeling • Degree of a relationship type • The number of participating entity types in a relationship • Binary: A branch has staff • Ternary: Staff registers a client at a branch • Quaternary: A solicitor arranges a bid for a buyer through a bank Has Staff Branch Registers Staff Branch Client Solicitor Arranges Buyer Bank Bid
Entity-Relationship Modeling • Recursive relationship • A relationship type where the same entity type participates more than once in different roles • Example: Staff (supervisor) supervises staff (Supervisee) • Example of entities associated through two distinct relationships • Manager manages branch office • Branch office has member of staff Supervises Supervisor Staff Supervisee Manager Branch Manages Staff Branch Has Member of staff Branch
Entity-Relationship Modeling • Attribute • A property of an entity or relationship type • Attribute domain • The set of allowable values for one or more attributes • Simple Attribute • An attribute composed of a single component with an independent existence • Composite Attribute • An attribute composed of multiple components, each with an independent existence • Examples • Address (could be broken into street, city, postal code) • Name (could be broken into FirstName, MiddleInitial, LastName) • Single-valued Attribute • An attribute that holds a single value for each occurrence of an entity type • Multi-valued Attribute • An attribute that holds multiple values for each occurance of an entity type • Example: Each branch may have several phone numbers • Derived Attribute • An attribute that represents a value that is derivable from the value of a related attribute or set of attributes, not necessarily in the same entity type
Entity-Relationship Modeling • Candidate Key • The minimal set of attributes that uniquely identifies each occurance of an entity type • Primary Key • The candidate key that is selected to uniquely identify each occurrence of an entity type • Composite Key • A candidate key that consists of two or more attributes
Entity-Relationship Modeling • Diagrammatic representation of attributes in ER diagram Manages Primary key Staff staffNo {PK} name position salary /totalStaff Branch branchNo {PK} address street city postalCode telNo[1..3] Has Area for attribute list Composite attribute Derived attribute Multi-valued attribute
Entity-Relationship Modeling • Strong entity type • An entity type that is not existence-dependent on some other entity type • Weak entity type • An entity type that is existence-dependent on some other entity type • Attributes on relationships • Attributes can be attached to relationships as well as entities • Diagrammatic representation: Advertises Newspaper newspaperName PropertyForRent propertyNo dateAdvert cost
Entity-Relationship Modeling • Multiplicity • The number of possible occurrences of an entity type that may relate to a single occurrence of an associated entity type through a particular relationship • One-to-one (1:1) • One-to-many (1:*) • Many-to-many (*:*) • Cardinality • Describes the maximum number of possible relationship occurrences for an entity participating in a given relationship • Participation • Determines whether all or only some entity occurrences participate in a relationship
Entity-Relationship Modeling • Specialization/Generalization • Superclass • An entity type that includes one or more distinct subgroupings of its occurrences • Subclass • A distinct subgrouping of occurrences of an entity type • (name of subclass) IS-A (name of superclass) • Manager IS-A Staff • Secretary IS-A Staff • Specialization • The process of maximizing the differences between members of an entity by identifying their distinguishing characteristics • Generalization • The process of minimizing the differences between entites by identifying their common characteristics
Entity-Relationship Modeling • Diagrammatic representation of specialization/generalization Has Branch branchNo {PK} address street city postalCode Staff staffNo {PK} name position salary 1..* 1..1 1..1 {Optional, And} Manages 1..1 Manager mgrStartDate bonus salesArea carAllowance SalesPersonnel Secretary typingSpeed
Normalization • Unnormalized form • A table that contains one or more repeating groups
Normalization • First normal form (1NF) • A relation in which the intersection of each row and column contains one and only one value
Normalization • Functional dependency • Describes the relationship between attributes in a relation • If A and B are attributes of a relation R, B is functionally dependent on A (denoted A -> B) if each value of A is associated with exactly one value of B. • (A and B may each consist of one or more attributes) • Determinant • The attribute or set of attributes on the left-hand-side of the arrow of a functional dependency • Full functional dependency • Indicates that if A and B are attributes in a relation, B is fully functionally dependent on A if B is functionally dependent on A, but not on any proper subset of A • Second normal form (2NF) • A relation that is in first normal form and every non-primary-key attribute is fully functionally dependent on every candidate key
Normalization • Transitive dependency • A condition where A, B, and C are attributes of a relation such that if A->B and B->C , then C is transitively dependent on A via B (provided that A is not fully functionally dependent on B or C) • Third normal form (3NF) • A relation that is in first and second normal form, and in which no non-primary-key attribute is transitively dependent on any candidate key
Normalization • Boyce-Codd normal form (BCFN) • A relation that is in BCNF if and only if every determinant is a candidate key • Example: ClientInterview(clientNo, interviewDate, interviewTime, staffNo, roomNo) clientNo, interviewDate -> interviewTime, staffNo, roomNo (Primary key) staffNo, interviewDate, interviewTime -> clientNo (Candidate key) roomNo, interviewDate, interviewTime -> staffNo, clientNo (Candidate key) staffNo, interviewDate -> roomNo Split into two relations that are in BCNF: Interview(clientNo, interviewDate, interviewTime, staffNo) StaffRoom(staffNo, interviewDate, roomNo)
Normalization • Multivalued dependency • Represents a dependency between attributes in a relation, such that for each value of A there is a set of values for B and a set of values for C. However, the set of values for B and C are independent of each other. • Trivial multivalued dependency • If A->> B is a multivalued dependency in relation R, it is considered trivial if either (a) B is a subset of A or (b) A U B = R • Fourth normal form (4NF) • A relation that is in BCNF and contains no nontrivial multivalued dependencies Example: BranchStaffOwner(branchNo, sName, oName) BCNF branchNo ->> sName 4NF branchNo ->> oName 4NF
Normalization • Lossless-join dependency • A property of decomposition, which ensures that no spurious tuples are generated when relations are reunited through a natural join operation • Fifth normal form (5NF) • A relation has no join dependencies Example: Chrysler, Dodge, Jeep dealership carries cars from all three manufacturers. If the dealership decides to offer trucks for sale, trucks from all manufacturers that make trucks must be offerred. Manufacturer(manufacturerNo, vehicleType) Dealer(dealerNo, dealerName) VehicleTypes(vehicleTypeNo, vehicleType, vehicleTypeDescription) VehicleTypesOffered(manufacturerNo, dealerNo, vehicleTypeNo)
Normalization • Unnormalized form • A table that contains one or more repeating groups • First normal form (1NF) • A relation in which the intersection of each row and column contains one and only one value • Second normal form (2NF) • A relation that is in first normal form and every non-primary-key attribute is fully functionally dependent on every candidate key • Third normal form (3NF) • A relation that is in first and second normal form, and in which no non-primary-key attribute is transitively dependent any candidate key • Boyce-Codd normal form (BCNF) • A relation is in BCNF, if and only if, every determinant is a candidate key • Fourth normal form (4NF) • A relation is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies • Fifth normal form (5NF) • A relation has no join dependencies