300 likes | 426 Views
IS C332: Database Systems and Applications. Today’s Class. Data Models Relational Model. Importance of Data Models. Data models Representations, usually graphical, of complex real-world data structures
E N D
IS C332: Database Systems and Applications Today’s Class Data Models Relational Model
Importance of Data Models • Data models • Representations, usually graphical, of complex real-world data structures • Facilitate interaction among the designer, the applications programmer and the end user • End-users have different views and needs for data • Data model organizes data for various users • Good database design uses an appropriate data model as its foundation
Data Model : Basic Building Blocks • Entity is anything about which data are to be collected and stored • Attribute is a characteristic of an entity • Relationship describes an association among (two or more) entities • One-to-many (1:M) relationship • Many-to-many (M:N or M:M) relationship • One-to-one (1:1) relationship
Business Rules • Brief, precise, and unambiguous description of a policy, procedure, or principle within a specific organization’s environment • Apply to any organization that stores and uses data to generate information • Description of operations that help to create and enforce actions within that organization’s environment
Business Rules • Must be rendered in writing • Must be kept up to date • Sometimes are external to the organization • Must be easy to understand and widely disseminated • Describe characteristics of the data as viewed by the company
Sources of Business Rules • Company managers • Policy makers • Department managers • Written documentation • Procedures • Standards • Operations manuals • Direct interviews with end users
History of Data Models • Relational Model: proposed in 1970 by E.F. Codd (IBM), first commercial system in 1981-82. Now in several commercial products (DB2, ORACLE, SQL Server, SYBASE, INFORMIX). • Network Model: the first one to be implemented by Honeywell in 1964-65 (IDS System). Adopted heavily due to the support by CODASYL (CODASYL - DBTG report of 1971). Later implemented in a large variety of systems - IDMS (Cullinet - now CA), DMS 1100 (Unisys), IMAGE (H.P.), VAX -DBMS (Digital Equipment Corp.). • Hierarchical Data Model: implemented in a joint effort by IBM and North American Rockwell around 1965. Resulted in the IMS family of systems. The most popular model of its time. Other system based on this model: System 2k (SAS inc.)
History of Data Models • Object-oriented Data Model(s): several models have been proposed for implementing in a database system. One set comprises models of persistent O-O Programming Languages such as C++ (e.g., in OBJECTSTORE or VERSANT), and Smalltalk (e.g., in GEMSTONE). Additionally, systems like O2, ORION (at MCC - then ITASCA), IRIS (at H.P.- used in Open OODB). • Object-Relational Models: Most Recent Trend. Started with Informix Universal Server. Exemplified in the latest versions of Oracle-10i, DB2, and SQL Server etc. systems.
Hierarchical Database Model • Logically represented by an upside down tree • Each parent can have many children • Each child has only one parent
Hierarchical Database Model • Advantages • Conceptual simplicity • Database security and integrity • Data independence • Efficiency • Disadvantages • Complex implementation • Difficult to manage and lack of standards • Lacks structural independence • Applications programming and use complexity • Implementation limitations
Hierarchical Data Model • ADVANTAGES: • Hierarchical Model is simple to construct and operate on • Corresponds to a number of natural hierarchically organized domains - e.g., assemblies in manufacturing, personnel organization in companies • Language is simple; uses constructs like GET, GET UNIQUE, GET NEXT, GET NEXT WITHIN PARENT etc. • DISADVANTAGES: • Navigational and procedural nature of processing • Database is visualized as a linear arrangement of records • Little scope for "query optimization"
Network Database Model • Each record can have multiple parents • Composed of sets • Each set has owner record and member record • Member may have several owners
Network Database Model • Advantages • Conceptual simplicity • Handles more relationship types • Data access flexibility • Promotes database integrity • Data independence • Conformance to standards • Disadvantages • System complexity • Lack of structural independence
Network Data Model • ADVANTAGES: • Network Model is able to model complex relationships and represents semantics of add/delete on the relationships. • Can handle most situations for modeling using record types and relationship types. • Language is navigational; uses constructs like FIND, FIND member, FIND owner, FIND NEXT within set, GET etc. Programmers can do optimal navigation through the database. • DISADVANTAGES: • Navigational and procedural nature of processing • Database contains a complex array of pointers that thread through a set of records. • Little scope for automated "query optimization”
Introduction • Proposed by Edgar. F. Codd(1923-2003) in the early seventies. [ Turing Award –1981 ] • Most of the modern DBMS are relational. • Simple and elegant model with a mathematical basis. • Led to the development of a theory of data dependencies and database design. • Relational algebra operations • crucial role in query optimization and execution. • Laid the foundation for the development of • Tuple relational calculus and then • Database standard SQL
Basic Concepts • Entities and relationships are stored in tables • Relationships are captured by including key of one table into another • Languages for manipulating the tables • All popular DBMSs today are based on relational data model (or an extension of it, e.g., object-relational data model)
Why is it so good? • Simplicity, everybody knows how to manipulate tables • Tables are simple enough so that solutions to complicated problems such as concurrency controland query optimization can be obtained • It has a theoretical basis for the studying of database design problems • Tables are logical concepts; physically tables can be stored in different ways support data independence
Terminology • Relation table; denoted by R(A1, A2, ..., An) where R is a relation name and (A1, A2, ..., An) is the relation schema of R • Attribute column; denoted by Ai • Tuple row • Attribute value value stored in a table cell • Domain legal type and range of values of an attributedenoted by dom(Ai) • Attribute: Age Domain: [0-100] • Attribute: EmpName Domain: 50 alphabetic chars • Attribute: Salary Domain: non-negative integer • Ideally, a domain can be defined in terms of another domain; e.g., the domain of EmpName is PersonName. This is NOT allowed in most basic DBMSs. • However, most recent DBMSs allows this (object-relational) extension such as Oracle 10g.
Relational Database: Definitions • Relational database:a set of relations • Relation: made up of 2 parts: • Instance : a table, with rows and columns. #Rows = cardinality, #fields = degree / arity. • Schema :specifiesname of relation, plus name and type of each column. • e.g. Students(sid: string, name: string, login: string,age: integer, gpa: real). • Can think of a relation as a setof rows or tuples (i.e., all rows are distinct).
An Example Relation Relation Name/Table Name Attributes/Columns (collectively as a schema) Tuples/Rows • Cardinality = 5, degree = 4, all rows distinct
Another Relation Example enrollment (studentName, rollNumber, courseNo, sectionNo) enrollment
Some Formal Definitions • A relation is denoted by: R(A1, A2, ..., An) • STUDENT(Name, Student-id, Age, CGA) • Degree of a relation: the number of attributes n in the relation. • Tuple t of R(A1, A2, ..., An): An ordered set of values <v1,v2,...,vn> where each vi is an element of dom(Ai). • Relation instance r(R): A set of tuples in R r(R) = {t1, t2, ..., tm}, or alternatively r(R) dom(A1) dom(A2) ... dom(An)
Relation and Cartesian Product • A relation is any subset of the Cartesian product of domains of values • Example:Let Dom(Name) = { Lee, Cheung } Dom(Grade) = { A, B, C } Then the Cartesian product of the domains is Dom(Name) Dom(Grade) = { Lee, A, Lee, B , Lee, C, Cheung, A, Cheung, B, Cheung C } • A relation StudentGrade (Name, Grade) can be defined as any subsetof the Cartesian product Dom(Name) Dom(Grade) r(StudentGrade) = { Lee, A, Cheung C } Dom(Name) Dom(Grade)
Characteristics of Relations • Tuples in a relation are not considered to be ordered, even though they appear to be in a tabular form. (Recall that a relation is a set of tuples.) • Ordering of attributes in a relation schema R are significant. • Values in a tuple: All values are considered atomic. (Recall that a domain is a set of atomic values.) A special null value is used to represent values that are unknown or inapplicable to certain tuples.
Keys • Let K R (I.e., K is a set of attributes which is a subset of the schema of R) • K is a superkey of R if K can identify a unique tuple in a given relationr(R) Customer(CusNo, Name, Address, …)where customers have unique customer numbers and unique names.Possible superkeys: CusNo {CusNo, Name} {CusNo, Name, Address} plus many others • K is a candidate key if K is minimal There are two candidate keys: CusNo and Name • Every relation is guarantee to (must) have at least one key. Why?
Key(Candidate key) • A key can not be determined from any particular instance data • it is an intrinsic property of a scheme • it can only be determined from the meaning of attributes • A relation can have more than one key. • Superkey: A set of attributes that contains any key as a subset. • A key can also be defined as a minimal superkey • Primary Key: One of the candidate keys chosen for indexing purposes ( More details later…)
Relational Model • Sets • collections of items of the same type • no order • no duplicates • Mappings domain range 1:many many:1 1:1 many:many
Exercise • What are the mapping cardinalities of the following 4 relationships? B C D A