410 likes | 567 Views
Databases. Lecture 9. What is a database?. An organised collection of related information or “data” Reflects some part of the real world Changes in the real world lead to changes in the database Data: Known facts that can be recorded and have some meaning. What is a database?.
E N D
Databases Lecture 9
What is a database? • An organised collection of related information or “data” • Reflects some part of the real world • Changes in the real world lead to changes in the database • Data: Known facts that can be recorded and have some meaning
What is a database? • Typically a database will consist of one or more tables (or entities) • A Table is a collection of records about a certain subject • Often people will use the term entity and table interchangeably but there is a subtle difference. An entity is an object that is represented in a database, usually with a table. Table are also needed to record data about the relationship between two entities • Each table has a number of fields (or attributes). These are usually the column names • A Field is a single piece of information about a subject • Each table has a number of records, each represented by a table row • A record is a complete set of fields relating to the same subject
What is a database system? • The data along with the Database Management Software. • Database Management Software (DBMS): A collection of software to support the storage, retrieval and modification of large volumes of data. Support is also provided for multiple users along with administration tools • E.g. Oracle, SQL Server, and Access
Why use a database? • Central repository for data • Data can be shared • Minimizes duplicate & redundant data • Minimizes mistakes and conflicting sets of data • E.g. take the example of a university with a number of different departments. Each dept. can keep it’s own records or can share in a database. • Consider the case where a student transfers from one course to another. According to one dept the student has left college and according to another he is attending college
Why use a database? • Databases describe more accurately the relationship between various sorts of data • These relationships can be viewed as a type of a constraint by the DBMS • E.g. A student is taking a course in College. Due to a clerical error, data is entered saying that the student is taking a course that does not exist. This may go unnoticed, but a DBMS will not allow this to happen if you design your database not to allow this
Why use a database? • Insulation between programs and data • Different programs can access the database via the DBMS • E.g. A company may wish to show some data from the database on the internet. Software running on the web server can make requests to the DBMS which returns the requested data • Changes in the structure of the database don’t effect the software that uses the data • If this was the case everytime a database was modified new software would have to be bought
Why use a database? • Data Abstraction • A user gets a conceptual representation of the data that is stores. I.e. A data model • Insulated from how the data is stores • The data model uses concepts such as entities, their attributes and their interrelationships which are easier for a user to understand
Why use a database? • Support for multiple views of the data • Different perspectives for different users • A view can be a subset of the data or contain derived data. E.g. somebody’s age can be derived from their birthday and the current date • E.g. Different depts. In a college will require different views of the database, giving them access to only relevant information
Functions of a DBMS • Data Storage, Retrieval and Updating • Data Dictionary • This describes the structure and content of the DB. E.g. The names of tables, names of fields, characteristics of fields, relationships between entities, what the user is allowed to do. • Transaction support • Feedback to the user. Did a transaction fail? Why? • Concurrency Control • Make sure that users aren’t accessing or changing data that is being changed by another user at the same time. Must be done in a controlled manner to avoid mistakes
Functions of a DBMS • Recovery Services • Database can be recovered to some past correct state in the event of failure. • This can done using a system log which contains information an all the previous transactions so that they can be reversed • The system log may also indicate the cause of failure • Backups can also be made regularly
Functions of a DBMS • Authorisation services • Who can access what data and how. • Support for data communication • Many DBs are accessed remotely and the DBMS must support this • Integrity services • Enforcing constraints to ensure that data remains correct. An example would be the data type. Other constraints are derived from the semantics of the data • Providing multiple interfaces • Query language interface for casual users, programming language interfaces for programmers, menu driven interfaces for beginners
Database efficiency • A (relational) database is considered efficient when there is: • Absence of redundancy: Same things are not recorder multiple times • Minimal use of NULL values: When a value is unset it is assigned a NULL value • No potential for lost information: Some things may ONLY be recorded in the DB if they are associated with other entities
Example of inefficiency • Redundancy in course data • Null values for tutor • If J Pink left we would lose all information about the Statistics course • We should have a separate table for course data and another for student data
Guideline for creating tables • When designing a database, the following guidelines may be helpful: • Primary key: each record should have a field which is unique to that record, e.g., Student Id, Account Number, PRSI Number, Membership Number, etc. • A primary key is a field or combination of fields that uniquely identify a record in a table. A primary key may not be NULL • Every attribute in a table should be dependant on the primary key. If it isn’t it should be in another table
Many to Many relationships • Imagine we have two tables: Driver and Vehicle • Driver contains details on a particular driver and Vehicle contains information on a particular type of vehicle • A Driver may drive more than one vehicle and obviously a vehicle may be driven by more than one driver • Where do we store this information??
Many to Many relationships • If we store this information in the Driver table we will have a multi valued attribute or not have any primary key and likewise with the Vehicle table • In order to avoid this we create a new table which will contain the primary key of each table, DriverID and VehicleID. • The combination of these two will be unique so we will have a compound primary key • The fields DriverID and VehicleID are related to the primary keys in their respective tables and are called foreign keys
Guideline for creating tables • When designing a database, the following guidelines may be helpful: • No field should be multi-valued. E.g. if an employee has two positions they may not be included in the same field. Instead, another table needs to be created • The meaning/purpose of the table should be clear • Eliminate redundancy • Avoid null (empty) fields if possible
Structured Query Language (SQL (pronounced sequel)) • A language for interfacing with a database • Designed by IBM • It is both a data definition language (DDL) and a data manipulation language (DML) • DDL: Use to create the database tables, set attribute types, set relationships and constraints • DML: Used to update, modify, add, remove and retrieve table data • Nearly all relational DBs support SQL
SQL DDL • The DDL has 4 main commands • CREATE DATABASE • CREATE TABLE • ALTER TABLE • DROP TABLE • To create a database called “company” the command is: CREATE DATABASE company
SQL DDL • To create a table we must specify all its attributes and constraints • CREATE TABLE Courses (CourseNumber NUMBER(6) PRIMARY KEY, CourseName, VARCHAR(30)) • CREATE TABLE Student (StudentID NUMBER(6) PRIMARY KEY, StudentName VARCHAR(30), CourseNumber REFERENCES Courses)
SQL DDL • To delete a table use the DROP command • DROP TABLE Student • To alter a table use the ALTER command • ALTER TABLE Student ADD DateOfBirth DATE • ALTER TABLE Student DELETE DateOfBirth
SQL DML • There are three command for modifying the database • INSERT • DELETE • UPDATE • INSERT INTO Courses (CourseNumber, CourseName) VALUES (5, ‘Sociology’)
SQL DML • To delete or update information we need to specify what rows • DELETE FROM Courses WHERE CourseNumber = 5 • UPDATE Courses SET CourseName = ‘Applied Sociology’ WHERE CourseNumber = 5
SQL DML: Queries • SQL queries have a standard basic format: SELECT <field1,field2,…..,fieldN> FROM <table1,table2,..…,tableN> WHERE <condition> • The WHERE is optional but is generally used
SQL DML: Queries • SELECT StudentName FROM Student • This returns every students name • SELECT StudentName FROM Student WHERE CourseNumber=32 • This returns every students name who is taking course number 32 • SELECT StudentName FROM Student WHERE CourseNumber=32 OR CourseName=11 • This returns every students name who is taking either course 32 or 11
A More Complex Example • Sometime we need to access more than one table to extract the requisite information • Take the Driver/Vehicle example from earlier • We want to find the names of the cars Michael Schumacher drives • We need to cross-reference different table to do this • This is known as a join
Ambiguous Attribute Names • A problem we are going to face when doing this is ambiguous field names • We can solve this by using the format: TableName.FieldName • SELECT DriverName, VehicleName FROM Driver, Vehicle, DrivenBy WHERE Driver.DriverID = DrivenBy.DriverID AND Vehicle.VehicleID = DrivenBy.VehicleID AND DriverName = ‘Michael Schumacher’
Other functions • MAX: returns the maximum of a list of values • MIN: returns the minimum • SUM: Returns the SUM of the results • COUNT: counts the list of values • AVG: gives the average of a list of values • SELECT COUNT( * ) FROM Driver • This returns the number of records in the Driver Table
Presenting your output • Sometimes you may want to assign a different name to the output of a query • Take the previous example. The output is a column named COUNT( * ) with a single value in it • It is better to call it something meaningful • SELECT COUNT( * ) AS Total_Drivers FROM Driver
Presenting your output • It would also be nice to order your output so that it can be easily read • We can do this using the ORDER BY statement after a query • SELECT DriverID, DriverName, DateOfBirth FROM Driver ORDER BY DriverName, DriverID
Mathematical Operators • SQL has the standard operators • + , - : addition and subtraction • * / : multiplication and division • <, > : less than and greater than • <=, >=: less than or equals and greater than or equals • SQL also supports the use of parentheses • NOTE: The * operator is also a wildcard and SQL distinguishes between the two by the context
Special Characters &Wild Cards • * : Instead of selecting all the attributes in a table you can use the star symbol • %: When doing a comparison between two character strings you can use this to specify that other variable characters may come before or follow the string • _: Instead of using % you can use the underscore to indicate a single character which can be anything
LIKE • You can use the % and _ in conjunction with the LIKE keyword in Queries • SELECT DriverName FROM Driver WHERE DriverName LIKE ‘%Schu%’ • This returns the driver names that contain ‘Schu’ • SELECT DriverName FROM Driver WHERE DriverName LIKE ‘Mich_ _ _ Schu%’ • What does this return???