300 likes | 409 Views
Database Systems. Marcus Kaiser School of Computing Science Newcastle University. Aims. provide an understanding of the role of Databases provide an understanding of the functionality of Databases introduce the skills of Database design. Student Objectives.
E N D
Database Systems Marcus Kaiser School of Computing Science Newcastle University
Aims • provide an understanding of the role of Databases • provide an understanding of the functionality of Databases • introduce the skills of Database design
Student Objectives • an awareness and understanding of the potential problems of storing data outside a Database system • a knowledge and understanding of what a database system can do • an awareness of when it is appropriate to use a Database • an introduction to how Database systems store information and process queries • an introduction to the skills of designing databases
Core Content • Storing Data Outside a Database Management System • Methods • Issues • Advantages of Database Management Systems • Overview of Database Management Systems • Introduction to Database Design • How Databases Store Information • How Databases Access Information • Advanced Issues • Security
Importance: Database Systems are the main technology used to manage business information Databases are used extensively in Bioinformatics Computer scientists use Databases in most projects
By the end of this Theme you should be able to: • identify opportunities for the use of a Database in business/Bioinformatics/Computing • design Databases • use a Database • explain what Databases offer • explain how Databases store and access data
What is a Database ? A collection of related Data about a subject that provides a base for procedures such as retrieving information and making decisions
Why are they important • All organisations are highly dependent on the effective use of information • day-to-day operations • historic analysis • prediction
Storing Data outside a Database System • Storing Data in Files (similar problems with spreadsheets) • Data often stored in as a delimited file (‘,’) • e.g. a Companies Personnel Address Database held in a file has one entry for each employee: Name , Initial ,Street, Town, Post Code, Telephone Number Smith,A,73 Dover St., Newcastle, NE2 3SR,01912234554 Jones,C.A.,25 Spring Rd., Sunderland, S12 7DD,01914566547 • We could then write a program to access the file to pick out fields • e.g. telnum Smith would print01912234554 • e.g. address Jones would print25 Spring Rd., Sunderland, S12 7DD • but this has limitations....
Problems with this Approach : Data Redundancy • Data Redundancy • If there is no central database then the same information may be held in several different places • e.g. • The personnel department has the address list shown above • but, the payroll group needs some different information and keeps its own records on each employee Name , Initial , Street, Town, Post Code, National Insurance Number, Salary Smith,A,73 Dover St., Newcastle, NE2 3SR, WM12234, 15000 Jones,C.A.,25 Spring Rd., Sunderland, S12 7DD,WM32345,21450 • There is redundancy in the name and address information • stored in multiple places • wastes storage space • updates take twice the work • if they happen .....
Problems: Data Integrity • When common data is kept in separate files discrepancies often develop • e.g. Name and Address information • One may be updated but not the other
Interdependence of Programs and Data Files • The program which accesses the data has to know the format of the data • Change the format, and you have to change the access program • e.g. consider previous address database Name , Initial, Street, Town, Post Code, National Insurance Number, Salary Smith,A,73 Dover St., Newcastle, NE2 3SR, WM12234, 15000 Jones,C.A.,25 Spring Rd., Sunderland, S12 7DD,WM32345,21450 • a program has been written to generate address labels from this file • but.. someone has complained about being addressed as A. Smith rather than as Dr. A. Smith ... so a title field is added to the file: Name , Initial, Title, Street, Town, Post Code, National Insurance Number, Salary Smith,A,Dr,73 Dover St., Newcastle, NE2 3SR, WM12234, 15000 Jones,C.A.,Mrs,25 Spring Rd,Sunderland,S12 7DD,WM32345,21450 • so now all programs which access this file have to change.
Data Inconsistency when a Computer Fails • Consider the following Sequence of events: • Sue @ Newcastle University ATM • 11.00 Check Balance • 11.01 ATM says there’s £200 • 11.02 Ask for £200 • 11.03 ATM Finds £200 in account • 11.04 ATM Gives £200 • 11.05 ATM Stores £0 in Balance • What if the computer crashes at 11.04:30 ? • the balance is wrong
Data Loss when a Computer Fails • What if the Disk holding the data file breaks ? • Do we just loose all that information?
Sue @ Newcastle University ATM 11.00 Check Balance 11.01 ATM says there’s £200 11.02 Ask for £200 11.03 ATM Finds £200 in account 11.04 ATM Gives £200 11.05 ATM Stores £0 in Balance Jim @ Metro Centre ATM 11.00 Check Balance 11.01 ATM says there’s £200 11.02 Ask for £200 11.03 ATM Finds £200 in account 11.04 ATM Gives £200 11.05 ATM Stores £0 in Balance Simultaneous Access to the Data • Sometimes problems can occur when a file is being updated if there is more than one user. • e.g. Sue and Jim have a joint bank account. • they go shopping separately and both run out of money at the same time • they both head for the nearest ATM.......
Security • Some data needs to be protected • only certain users are allowed to access it • e.g. • Anyone in Personnel can see someone's address (to send them a letter) • Only the Head of Personnel can see their salary • Only the Head of Personnel can change a salary
Remote Access • Due to the problems of keeping multiple copies of data... • ...many organisations wish to centralize information • If the organisation is distributed so everyone can’t use the same computer, how do they all get access to the data ?
Requirements • Remove Redundancy – or at least control it • Data Integrity • Separation of data and program • Prevent inconsistency when systems fail • Allow multiple (simultaneous) interactions • High level of security • Remote access • Access by humans and computers
Database Management Systems: Structure DBMS Software Database Query Language User Query Pre-Defined Applications: Report Generator Query by Forms Application Program
Designing a Database • Before creating a Database we need to design it • Database design cannot be done purely mechanically but there are techniques which help • The steps are: 1. Decide on what subjects are covered by the database 2. Decide how they are related to each other 3. Decide on what characteristics they have 4. Derive the database tables from the design 5. Create the database tables using database software
Decide what Subjects are Covered by the Database Rules: • Look at what the subject is about, rather than the conclusions you want to find • Think about the subject independently of any particular database software (or computing at all) • Don’t design database on computer • Focus on the data you are likely to collect, rather than how a computer will organise the data
Decide on what subjects are covered by the DB Subjects are called Entities: • things that hold interest for you in your database • represent a class of things • precise definition • different for every database you create
Example: a Crime database • The police want to track cases • When a robbery has happened who is involved? • What was taken? • Did this go to court? • What was the verdict? • What are the Entities (subjects) here? • Police Officer • Stolen Item • Case • Result
Example: a Crime database • What are the entities?
Decide how the entities are related to one another • Relationships • a significant association between the entities • represented as a diamond • Each relationship has: • a name • a degree (more on this later) • Names • Pick a simple word that encapsulates the relationship • Concentrate on the main entities first: the rest can come later when you understand the problem better
Deciding on the Attributes of the Entities • Attributes are details about the state of an entity • They are things we want to know about an entity • Ensure each has a unique name within the Entity • They are usually drawn as spokes on Entity-Relationship diagrams:
Summary Database requirements • Remove Redundancy – or at least control it • Data Integrity • Separation of data and program • Prevent inconsistency when systems fail • Allow multiple (simultaneous) interactions • High level of security Database structure (Entity-Relationship model) • Entity • Attribute • Relationship