550 likes | 778 Views
Introduction to Database. Data versus Information. When people distinguish between data and information, Data is simply a set of individual numbers or facts. E.g. John Smith got a 87 on Test 2 in CSC 240.
E N D
Data versus Information • When people distinguish between data and information, • Data is simply a set of individual numbers or facts. • E.g. John Smith got a 87 on Test 2 in CSC 240. • Information arises from integration – putting the individual facts together to see the larger picture or by asking a question not answerable by just one piece of data. • John Smith scored more than 10 points over the class average on all of the tests in CSC 240. • Jane Jones got the highest score on Test 2 is CSC 240.
Database • The purpose of a database is to bring together a set of data so that it is centralized, organized, easy to extract information from, and so forth. • The way the facts are organized and how they relate to each other is known as the database design. • The software application that facilitates the storing of the data and the accessing of the information is known as a database management system.
What do we want? • Desired Features of our database • Storage: • We want to store data efficiently, have it centralized (or at least seemingly centralized). • Retrieval: • We want to have the data at our fingertips when we want it. • Querying: • We want to ask various questions about the data (and get answers in a timely manner). • (These desires are to some extent in conflict.)
Database Modeling • In database design, one tries to organize the data in a way that reflects the situation from which the data comes. • One approach is known as entity-relationship modeling or ER modeling. • The design is often represented pictorially in an ER Diagram.
Design Implementation • After a database is designed, one uses a database management system to implement (actually make) the database. • In database implementation, one encounters terms like table, record, field, keys, and so forth.
Entities • One identifies units of information that clearly belong together. These are known as the entities. • An entity is somewhat similar to an object in programming, it collects data that belongs together in some immediate way. • Entities also separate the data into distinct units. • Database entities often reflect real objects/entities (persons, buildings, courses, etc.)
Fields • The lower-level pieces of data gathered together to form an entity are known as fields or attributes or properties. • The Person entity might consist of fields like FirstName, LastName, JobType, SocSecNum, etc. • Fields are analogous to properties of an object. • Fields have a type (Text, Number, Yes/No, Memo, Date/Time, etc.) which indicate how the information is to be stored and interpreted.
Relationship • The various entities may be distinct, but they are not completely disconnected. • E.g. a Customer places an order • An association between two entities is known as a relationship.
ER Diagram • One can visualize the entities and their relationship using an Entity-Relationship (ER) diagram. • The entities are represented by rectangles. • The relationships are represented by arrows between the rectangles. • The arrow may include a verb to capture the nature of the relationship (as well as other notations).
ER Diagram Example Places Is part of
DBMS • Before databases a programmer had to worry about • where files are to be found, • the order in which the fields occur, • the length of the fields and/or the delimiter used, • Such features of the file-based approach are called program-data dependence. • These are now taken care of by the database management system (DBMS).
Database vs. DBMS • The generalized routines for reading, parsing, searching, sorting etc. are in the DBMS. • But information specific to a particular case (number of fields, their type, size and so on) is still required. This data is placed together with the “actual” data in the database.
Meta-data • This data about the data is known as meta-data. • Meta: a prefix meaning: after, along with or beyond • The meta-data describes the actual data, and so databases are sometimes called self-describing. • Related terms include: data dictionary, system catalog and schema.
Meta-data: Open a database file (Access in this example) in Notepad Some actual data One can see there’s more to this file than just customer data.
Database/DBMS Distinction Database Raw-data and meta-data DBMS User Application Users and applications interact with a database only through the DBMS.
Control of data redundancy Data consistency More info from same data Sharing of data Improved data integrity Improved security Enforcement of standards Economy of scale Balancing of conflicting requirements Improved accessibility and responsiveness Improved maintenance through data independence Increased concurrency Improved backup and recovery services Pros of Database Approach
Client-Server • The client-server model is a way for transactions to take place. • The transaction is viewed as a service. • The client requests the service. • The server provides the service. • For example, to query a networked database • A client would request the network server(s) to connect it to the database server • The database server queries the database • The result is passed from database server to network server to client. • The client-server terminology can be applied to both software and hardware.
SQL • SQL (Structured Query Language), pronounced S-Q-L or See-Quel, has become the standard language for relational databases. • SQL is part third generation and part fourth generation.
What’s it made of? • SQL has 3 components: • Data Definition Language (DDL) • The part that allows you to establish the structure of the database • Data Manipulation Language (DML) • The part that allows you to enter data, update data and ask questions of the data (queries) • Data Control Language (DCL) • The part that allows you to add security features (e.g. user authentication), concurrency (multi-user) features, recovery features, etc.
Express Relationships as Verbs • Relationships are generally expressed as verbs. For example, • Athlete comes from Country • Athlete competes in Event • The relationship can be represented by a line between the two rectangles representing the participating entities. The verb is written on the line. In the Chen model, the verb is placed in a diamond.
Basic Structure: Relationships • Relationships are said to have a multiplicity. • Relationships are categorized by how many things are related to how many things. • 1:M (one-to-many) • M:N (many-to-many) • 1:1 (one-to-one)
Relationship Examples • One-to-many • A country will be represented by many athletes, but each athlete represents only one country. • Many-to-many • An athlete may compete in many events, and an event has many athletes competing in it. • One-to-one • Each country has one athlete serve as flag bearer in the opening ceremony.
Degree of a Relationship • Relationships are said to have a degree (the number of entity types involved). • Binary: involves two entities • Ternary: involves three entities • Quaternary: involves four entities • Even if not using the Chen model, ternary and higher degree relationships are represented using a diamond.
Ternary Relationship Example StockHolder StockBroker Buy/Sell A StockHolder buys or sells a stock through a StockBroker. No arrows in relationships with degree higher than 2. Stock
Quaternary Relationship Example Producer Actor Director Make a movie Writer
Recursive Relationship • If an entity (type) has a relationship with itself, that relationship is called recursive. • The entity occurrences in the relationship may be distinct. • Since the subject and object are of the same entity type, the “roles” the occurrences are playing in the relationship may be added to the diagram.
Recursive Relationship • A typical example here is if one employee serves as the supervisor of another employee. • While the relationship involves different entity occurrences (i.e. two different employees), it involves only one entity type. Thus as far as entity type goes, the Employee entity has a relationship with itself.
Role names are used to clarify situations with multiple relationships Teacher Student Teaches Faculty Student Advises Advisor Advisee
Attributes • Attributes are the properties of an entity. • E.g. the attributes of a Customer are FirstName, LastName, etc. • Relationships can also have properties. • E.g. a stock is bought or sold on a particular date (at a particular price). • (One may consider introducing a new entity called a transaction.)
Attribute Domain • An attribute’s domain is the set of values that a property is allowed to take on. • For example, • The quantity of items ordered is 0 • The price paid is 0 • Gender would be ‘M’ or ‘F’ (or perhaps NULL) • Phone numbers consist of numbers only. One can also specify the number of digits or a range thereof.
Self-documenting attribute names • When it comes time to implement the database and one is turning attributes into the corresponding fields, resist the temptation to use abbreviated field names. • If you use descriptive field names, your implementation will be self-documenting – in that many people will know what you mean simply by the name you have used. • Some designers suggest that the first part of a field name refer to the table/entity it belongs to. This can be especially useful with common fields like IDs and names.
Attributes: Simple or Composite • A property that takes on a value that cannot be broken down into pieces is called simple, (aka “atomic”) • E.g. quantity, price, gender • A property that can be broken into constituent properties is called composite. • E.g. address street, city, state, zipcode • name firstName, lastName
Attributes: Single-valued or Multi-valued • Single-valued: a property that takes on only one value at a time for a given entity occurrence. • E.g. dateOfBirth, you only have one • Multi-valued: a property that can take on more than one value at the same time for a given entity occurrence • E.g. phoneNumber (home and cell number) • E.g. beneficiary
Attributes: Derived • If a property can be determined from other properties, it is said to be derived. • E.g. age or AgeCategory (20-29, 30-39, etc.) can be derived from dateOfBirth • E.g. taxBracket can be derived from grossIncome and deductions • E.g. city might be derivable from zipcode
Keys • A candidate key is a minimal set of properties that uniquely determine each entity occurrence (record, row, tuple). • A candidate key may be composite, i.e. consisting of more than one property. • Minimal above means that if one property is removed from the set, the set no longer uniquely determines an occurrence.
Keys (Cont.) • There can be more than one candidate key. • In the school’s database (banner), a person can be identified by his or her • socialSecNumber • idNumber • pidm • The primary key is the candidate key that is selected to identify the entity occurrence internally (within the database). • A candidate not chosen to be the primary is sometimes called an alternate key.
Relational Model: Ingredients • The main components of the Relational Model are tables (a two-dimensional array). • Tables are a realization of the mathematical concept of a relation. • Tables are reminiscent of the files used in a file-based approach. • Table Relation File • The table is logical and the data does not necessarily take this form physically. • A table has a name.
Relational Model: Ingredients (Cont.) • A table collects together associated data. • A table is thought of in terms of rows and columns. • The data in a single column is all of the same type, i.e. all the same property. • E.g. all of the people’s last names. • The column (a.k.a. field) has a name and a type (e.g. text, number, etc.). • A table is distinct from a similar looking mathematical object, the matrix, in that the order of the columns does not matter. • Column Field Attribute Property
Relational Model: Ingredients (Cont.) • The row (a.k.a. a record) collects together the various properties that belong to a particular object. • E.g. a person’s first name, last name, date of birth, etc. • Again a table is distinct from a matrix, in that the order of the rows does not matter. • Row Record Tuple
More Relational Model Vocabulary • In addition to having a type, a field has a domain, the set of values that the particular property is allowed to have. • E.g. a number must fall between 0 and 100. • E.g. some text (string) must have two letters followed by four numbers. • E.g. a person’s gender must be M or F. • Ensuring that a value falls within the domain is called applying the domain constraint.
Input masks and Validation Rules are ways to impose domain constraints in Access
Since data was entered before the validation rule was introduced. The old data must be tested for consistency.
More Relational Model Vocabulary (Cont.) • The number of fields (tuples) in a table is known as its degree. • Unary relations (1-tuples) • Binary relations (2-tuples) • Ternary relations (3-tuples) • N-ary relations (N-tuples) • The number of records in a table is called its cardinality. • The degree is a property of the schema, while the cardinality is a property of the instance.
Degree and Cardinality degree cardinality
Keys • A fields or set of fields that can be used to uniquely identify all of the rows in a table is known as a key. • A key should not have any extraneous fields. • E.g. if SocSecNum uniquely identifies a person, then you don’t need SocSecNum and LastName. • A table may have more than one field or set of fields that serve this purpose, they are called collectively the candidate keys. • One key is chosen from the candidate keys to be the primary key.