2.11k likes | 2.52k Views
CMSC 424 Database Design Section 401 Dr. David Kuijt. Contact Info. Professor: David Kuijt Office: AVW 3205 Phone: 5-0534 Email: kuijt@cs.umd.edu Office Hours: T/Th 3:15-4:00 (or by appointment) TA: Debbie Heisler Office: AVW 3270 Phone: 405-7027 Email: heisler@cs.umd.edu
E N D
CMSC 424 Database Design Section 401 Dr. David Kuijt
Contact Info Professor: David Kuijt • Office: AVW 3205 • Phone: 5-0534 • Email: kuijt@cs.umd.edu • Office Hours: T/Th 3:15-4:00 (or by appointment) TA: Debbie Heisler • Office: AVW 3270 • Phone: 405-7027 • Email: heisler@cs.umd.edu • Office Hours: TBA
Basic Information • Required text: Korth & Silberschatz Database System Concepts, Fourth Edition, McGraw Hill 2001. • Warnings: • Late homework or projects are not acceptable. Hand in what you have finished. Exceptions will be made only for emergencies or medical reasons with a doctor's note. • No makeup exams. Exceptions as above. • Cheating will result in an immediate grade of XF ("failure through academic dishonesty" -- this goes on your permanent transcript), and may result in suspension or expulsion from University. This is your only warning. Don't do it.
Motivation We live in a database world. The simplest acts are tied to databases. The last time I called out for delivery pizza, it involved at least four enormous databases. What were they? • Pizza Hut knew what I had ordered before -- they asked if I wanted the same pizza as last time. They probably store lots more information than that -- perhaps all my old order information. They could use this information to make corporate decisions (quantities of materials to order; forecasting pizza trends) as well as a local aid.
Motivation (2) • Every delivery food place around here uses caller ID. That's a relatively simple database, just giving names and incoming telephone number, but it helps them avoid some types of fraudulent orders and errors when writing down names, addresses, and stuff. • I used a telephone. When you use a telephone, all the details about the call are stored in a database. Call length, what number you called, time of call, billing information, and so on. Cell phone databases are even more complex.
Motivation (3) • I paid by credit card. Huge databases are involved. • Every time somebody runs your credit card through a swipe reader or types in the number, they're checking information in a database. • Is this a valid credit account? • Does it have enough money to cover the bill? • Is the credit card stolen? • Debit the merchant account, credit the consumer account • Whether the transaction is accepted or rejected, all the details are recorded in a database somewhere.
More Motivation • Every time you go to an ATM, use a credit card, buy something with a UPC bar code at a supermarket or department store, go to a movie, concert, or Caps game, register for classes, or get a parking ticket in Lot 4, you are working with databases. • Everything in your wallet that isn't a photograph is an entry in a database somewhere • medical plan cards • credit cards • student ID • driver's license • membership cards in clubs or interest groups • everything. Even the currency!
Final Motivation • Databases are all around us. • Knowledge is power. • Databases give us power (the ability to do things we couldn’t otherwise do) • they give other people power over us, and knowledge about what we are doing. • This class is about Databases. So what is a Database?
What is a Database? At the simplest level, a database architecture has two components. (1) Data. • Usually a whole lot of it • Representing multiple types of different objects • Each type may be related to itself and to other types in multiple ways (2) A set of methods to access and manipulate the data.
Data • For any reasonable-size database the data may be quite complex. • It is an attempt to record or model all the aspects of the real world that are important to one specific purpose -- telephone calls, for example, or credit card accounts. • Lots of different objects need to be stored as data, and they need to be stored in such a way as to reflect the ways that the objects can interact with each other.
Database Example • For example, consider a local telephone system database. Types of data stored include: • Account information: • customers: individuals, groups, companies that have leased numbers; • billing addresses, payment history, calling plans and billing contracts; • Hardware information: • network structure (call routing), • hardware age, reliability, and maintenance information, • system load tracking, • network billing pattern (what numbers are long distance from what other numbers, and what ones are local)
Database Example (2) • Local telephone system database continued. Additional types of data would include: • Call information: • start and end time, • telephone number that initiated the call; • telephone number(s) that received the call • All that information could be stored in files with much less fuss and bother. Why use a database? Why not just store the information in flat files?
Why Not Flat Files? • Why use a database? Why not just use a flat file? • Databases have a number of advantages over flat files. • Data Access. The set of programs that provide access to a database allow much more complex and flexible queries to the database with greater efficiency and convenience. • Reduced duplication and better control over data consistency. Data redundancy is bad. Which item to change in an update? How do you know that you've found all the copies? Data inconsistency (disagreement between various copies of the same data) is a serious problem. • Integrity Constraints can be enforced inside a database -- telephone numbers all 10 digits; phone numbers in Maryland all have the first three numbers being 410 or 301.
Why Not Flat Files? (2) • Uniform access and control of data using a standard language • Data Independence. We want the data to be independent of the representation chosen for it within the system. Tying the data to a given representation is what caused the Y2K fuss -- only two digits were used for a "year" field. • Concurrency control. Multiple users on a single database is a big advantage. • Recovery. • Security. Different users of the database may need different levels of access to information. • Centralized Control • Platform independence (portability). Since the internal file structure and access program details are hidden from the user, it is much easier to use the database on multiple platforms.
Data Abstraction • Most users don't need to understand all the details of the implementation and data design of a complex database. To make a database convenient to use, the system provides users with an abstract view of the data, limiting the information available to them. There are usually three levels of data abstraction. • Physical Level • Conceptual Level • View Level
Data Abstraction (2) • Physical level. The actual implementation details of low-level data structures are described at this level. • Conceptual level. This level describes all the different data types that exist by defining a relatively small number of simple structures, including all the relationships that these data types have with each other. Implementation of these objects might be complex, but it is hidden from the user at this level. Database administrators are usually the only ones who have access at this level.
Data Abstraction (3) • View level. There may be multiple different views, each of which represents a simpler subset of the functions and data available at the conceptual level. Different user types may require different parts of the database (for example, a bank account database might be accessed by cashiers, account holders, credit card companies, and the bank's payroll manager. Each of them can only access a small part of the full database of bank account information). Creating a number of restricted views makes the database more useful for the individual user types, giving each type access according to the needs of that type.
Data Abstraction (4) • Definition: a Schema is a specification of a particular database using a particular data model. The three levels of data abstraction are often referred to as: • External Schema(s) (for the view level(s)). • Conceptual Schema (for the conceptual level) • Internal Schema (for the physical level)
Database as Model • A model represents a perception of a real system • Models help us manage or understand the real world system they represent. • When modeling a system we select aspects and characteristics we want to represent; we abstract them to form a simple(r) system • examples: a map, an airplane flight simulator, computer weather analysis program • A database is a model of reality
Data Models underlying the Database • The data model is a collection of conceptual tools for describing data and its attributes • data objects • interrelationships of the data • data semantics and consistency constraints • There are two well-established data models used in database design • Entity-Relationship (E-R) model • Relational model • older methods included the Network and Hierarchical data models • Each was tied closely to the underlying implementation, which made it more difficult to model data and to modify or update the database. As a result they aren’t much used any more
Entity-Relationship Model • Diagram based model • Two primitives • Entities -- each represents a unique real-world object • Relationships -- each represents an association among several entities • Each are associated in sets of the same type (for example, one entity set might be customer, representing the set of all entities that represent customers at a given bank) • Third important notion: Attributes • Entities are associated with a set of attributes
Entities • Entity: a distinguishable object we want to model • e.g., room CSI 3120, Celine Dion, Elizabeth I of England • Entities have attributes (single-valued properties) • e.g., a person has a name, SSN#, gender, … • if an attribute has more than a single value, we should model it as an Entity • Entity Set: a set of entities of the same type • e.g., CLASSROOMs, SINGERs, HISTORICAL MONARCHs • Entity Sets may overlap • CSI 3120 is a member of CLASSROOMs and also a member of CSI BUILDING ROOMs.
Relationships • Relationship is an association among entities • David Kuijt teaches-in CSI 3120 • Relationship Set is a collection of relationships of the same type • FACULTY teach-in CLASSROOMs • Relationships may also have attributes • e.g., the relationship teach-in has an attribute “weekday” and another attribute “time” to store the day and time in which a given Entity of the set FACULTY teaches in a given Entity of the type CLASSROOM
Example Database Design (1) • Application: library database. Authors have written books about various subjects; different libraries in the system may carry these books. • Entities (with attributes in parentheses): • Authors (SS#, name, tel, birthdate) • Books (ISDN, title) • Subjects (sname) • Libraries (lname) • Relations [associating entities in square brackets]: • Wrote-on [Authors, Subjects] • Carry [Libraries, Subjects] • Index [Subjects, Books]
Poor Initial Design • Our first design is a poor model of the real-world system we are examining. Problems in our first design: • no relationship associating authors and books • no relationship associating libraries and books • common queries will be complex and difficult: • Q: what libraries carry books by a given author? • Q: what books has a given author written? • Q: who is the author of a given book? • Q: how many copies of a given book exist at each library? • Q: what edition of a book does the library have?
Example Database Design (2) • Application: library database as before • Entities (with attributes in parentheses): • Authors (SS#, name, tel, birthdate) • Books (ISDN, title) • Subjects (sname) • Libraries (lname) • Relations [associating entities in square brackets] (attributes in parentheses): • Wrote [Authors, Books] • In-stock [Libraries, Books] (quantity, edition) • Index [Subjects, Books]
Keys • Fundamental concept for databases • Must be able to uniquely identify things within a database (in the E-R model, Entities and Relationships) • Avoid duplication of results in a search; identify data redundancy in other operations • Halt search on positive results • Quick lookup in underlying data structures used at the Physical Level of abstraction • Examples of possible keys • Student ID number (SS#) is used as a key for most UMD databases having to do with students
Entity Keys • Superkey: set of attributes whose values uniquely identify the entity • candidate key: a minimal superkey (a minimal subset of a superkey whose values still uniquely identify the entity) • primary key: if there is more than one possible candidate key, one is chosen as the primary one used for most entity-identification purposes • weak entity:has no primary key; instead it depends upon another strong entity’s primary key to exist • e.g., CHILDren of EMPLOYEEs are weak; the primary key of EMPLOYEE in addition to the attributes of the CHILD are used for identification • weak entities are “existent dependent” on a strong entity -- when the strong entity gets deleted, so does the weak one
Relationship Keys • Depend upon the entity mapping of the relationship • one-one: the primary key of any of the entities can be used to uniquely distinguish a given relationship between two unique entities. • one-many: the primary key of the “many” entity, plus possibly a subset of the attributes of the relationship, will uniquely identify a given relationship • e.g., MOTHER gave-birth-to CHILD; to identify a specific gave-birth-to relationship requires the primary key of MOTHER and possibly the (date) and (time) attributes of gave-birth-to • many-many: the union of the primary keys of the entities associated, plus possibly a subset of the attributes of the relationship, will uniquely identify a given relationship • e.g., PERSON married PERSON; SS# of both and possibly date
Special Cases • Relationships may associate different entities of the same type • Ternary versions of the above • M-N relationships: many-one mappings are often more useful in practice than many-many mappings. • DUMMY Entities can be used to convert an M-N mapping relationship to a pair of relationships, one M-1 and one N-1.
Specialization-Generalization(ISA Hierarchy) • This is a way to represent entity complexity • specialization: top-down refinement of entities with distinct attributes • Entity type BANK ACCOUNT might be subdivided into related but different types CHECKING ACCT and SAVINGS ACCT • generalization: bottom-up abstraction of common attributes • Course types DATABASE, SYSTEM, and NETWORK all have common attribute (project). From them we can abstract a new course type PRACTICAL COURSE • other common course attributes are included (e.g., course number)
ISA Hierarchy Example: Top-down Refinement • Account entity with attributes balance and number • additional complexity: we want to represent two subtypes of account • Savings Account with attribute Interest Rate • Checking Account with attribute Overdraft Limit
ISA Hierarchy Example: Bottom-up Abstraction • Three related entities with similar attribute project • we abstract a new type of super entity Practical Course and link the three entities as subtypes • other shared attributes (e.g., course number) are also promoted to the upper level entity
Aggregation(Part-of Hierarchy) • This is a way to represent relationship complexity • relationships among relationships are not supported by the E-R model • often we want to model lower-level relationships differently • Groups of entities and relationships can be abstracted into higher level entities
Part-of Hierarchy Example • Entities driver, car, tires, doors, engine, seats, piston, valves • Relationship drives is insufficient to model the complexity of this system • Part-of relationships allow abstraction into higher level entities (piston and valves as parts of engine; engine, tires, doors, seats aggregated into car)
Mapping an E-R Schema to Tables • Motivation - translating E-R database designs into Relational designs • Both models are abstract, logical representations of a real-world enterprise • Both models employ similar design principles • Converting an E-R diagram to tables is the way we translate an E-R schema to a Relational schema. • Later on we’ll examine how to convert a Relational schema to an E-R schema
Mapping an E-R Schema to Tables (2) • Strong Entity E with primary key PK and attributes A, B, … ==> E(PK, A, B, …) • Weak Entity F with (non-primary) key WK and attributes C, D, … depending upon E above for primary key ==> F(PK, WK, C, D, …) • Relationship R with attributes L, M, … and associating Entities E (with primary key PK), E2 (PK2), E3 (PK3), … ==> R(PK, PK2, PK3, …, L, M, …) • Relationships between weak entities and the strong one on which they are dependent usually do not require representation because it is usually a many-one relationship with no attributes on the relationship (they are on the weak entity) and so the resulting table R(PK, WK) is a subset of the weak entity itself.
Table Details • The whole table represents a single Entity Set or Relationship Set. • Each entry (row) in the table corresponds to a single instance (member in that set) • For an Entity Set each column in the table represents an attribute in the E-R diagram • For a Relationship Set each column in the table represents either an attribute of the Relationship or one of the parts of the primary key of the Entity Sets it associates
Mapping an E-R Schema to Tables (3) • ISA relationships: choose either to • Represent the super class entity, then represent each subclass with the primary key of the super class and its own attribute set. This is very similar to the way weak entities are treated. • Or, map the subclasses to separate relations and ignore the whole super class. This is good when the subclasses partition the whole superclasses between them (the subclasses are disjoint and the union of the subclasses covers the whole super class). • Aggregate (part-of) relationship • Translation is straightforward -- just treat the aggregate as an entity and use the methods defined above. • With last week’s lecture, this covers the material of chapter 2.
Relational Database Model • Most popular logical data model • Relations (also called tables) represent both Entity Sets and Relationship Sets. • Attributes form the columns of the table (column and attribute are synonymous) • Each row represents a single entity or relationship (called a row or tuple) • Each instance of an attribute takes values from a specific set called the domain of the column (the domain defines the type)
Relational Database Model (cont) • A relation schema is made up of the name and attributes of a relation with their underlying domains • A database schema is a set of all relation schemas. • The notions of keys, primary keys, superkeys are all as previously described
Query Languages • a language in which a user requests information from the database • a higher level language than standard programming languages • query languages may be procedural or non-procedural • procedural languages specify a series of operations on the database to generate the desired result • non-procedural languages do not specify how the information is generated • most commercial relational database systems offer a query language that includes procedural and non-procedural elements
Relational Algebra • procedural query language • set of operators that map one or more relations into another relation • closed algebraic system • best feature - operations on operations • form relational algebraic expressions • two types of operations: set-theoretic and database specific
Relational Algebra Operations • database specific: • (horizontal) selection () • (vertical) projection () • join • outer join • semijoin • division • set operators • union • difference • intersection • cartesian (cross) product
EMP ename salary dept Gary 30K toy Shirley 35K candy Christos 37K shoe Robin 22K toy Uma 30K shoe Tim 12K (null) DEPT dept floor mgr candy 1 Irene toy 2 Jim men 2 John shoe 1 George Example Relations
Database Specific Operators • (horizontal) selection () • picks a subset of the rows • (vertical) projection () • picks a subset of the columns • join • creates a new relation (table) out of two • equijoin (based upon equality of attributes) • natural join (equijoin plus projection to eliminate duplicated columns)
Set Operators • union • both relations must be union-compatible -- same degree and same domains • set difference • both relations must be union-compatible as above • intersection • same deal • cartesian (cross) product • note similarity to join operation; join can be defined as a cross product followed by a selection criteria