430 likes | 582 Views
An Introduction to Databases. Dr Stephen Swift The Intelligent Data Analysis Group Brunel University. An Introduction to Databases. Databases The Parts of a Database A Brief Description of SQL Examples Using Microsoft Access. What is a Database? (1).
E N D
An Introduction to Databases Dr Stephen Swift The Intelligent Data Analysis Group Brunel University
An Introduction to Databases • Databases • The Parts of a Database • A Brief Description of SQL • Examples Using Microsoft Access
What is a Database? (1) • A Database System is a Computerised Record Keeping System • Rather Like an Electronic Filing Cabinet • The Data can be Added to, Deleted, Modified etc… • The Data Contained is of the Same Type • Would Not Have a Database Containing Patient Records and the Sales Records of a Pet Shop, For Example
What is a Database? (2) • In Large Organisations, a Database System is Usually a Subsystem of a Larger Information System • An Information System Supports the Information Handling Requirements of an Organisation • Smaller Organisations Might Just Have a Single Database • A Database Management System (DBMS) is a Software System that Enables Users to Define, Create, Maintain and Control Access to a Database
Why Are Databases Needed? • A Huge Amount of Data is Being Collected Every Second of the Day • The Data: • Is Often Complex • Large in Size • Requires Sophisticated Manipulation • Databases and DBMS are Essential to Successfully Manage Such Data
An MDB File Tables Queries Forms & Reports Macros & Modules An MS Access Database
Microsoft Access • A Stand Alone Database System • All Aspects of the Database are Contained in a Single MDB File • Slow When Handling Huge Volumes of Data • Can be Used to Create Database Applications
Foreign Column name Primary key key Ward Patient Patient Patient Date of Sex No. No. Surname Forename birth 23/7/53 10 923 Moneybags Maurice Male 3/4/41 11 109 Foot Ivor Male Rows 13/11/61 10 854 Hare Susan Female 4/2/31 7 231 Knee Boris Male 10/2/70 10 459 Legg Brian Male Male 1–12 Domains Female Sex Patient No. Surname Forename Date of birth Ward No. The Patient table Ward Ward Type Number No. Name of beds 8 3 Nightingale Medical 12 11 Fleming Medical 21 10 Barnard Surgical The Ward table Tables
Table Properties (1) • Rows (or Records) • Shows Occurrences of Patient • Each Row Must be Uniquely Identifiable • The Order of the Rows MUST NOT Be Significant
Table Properties (2) Columns (or Fields) • Each Column has a Type, e.g. Number, Text, Boolean, Multimedia, etc… • The Order of the Columns MUST NOT be Significant • Only One Value Should be Associated With Each Column/Row Intersection in the Table
Table Properties (3) • Domain • A Pool of Possible Values From Which the Actual Values Appearing in the Columns of the Table are Drawn • e.g. The Domain of Patient Numbers Includes all of the Possible Patient Numbers, Not Just the Ones Currently in Hospital • Very Important for Comparing Values from Different Tables
The Primary Key • A Special Type of Field • Not All Tables Have a Primary Key • Usually a Number or String, e.g. Patient Number • Used to Relate Data Between Tables
Worked Examples • Check That Microsoft Access Loads • Check That You Can See Four Files: • “Functions.xls” • “Gene ID.xls” • “spellman_yeast_alpha.xls” • “annette2004.ppt”
Worked Example (1) • We Will: • Create a Microsoft Access Database • Import Some Data • Make Sure the Fields are the Correct Type • Create Three Tables • Look at the Tables (Datasheet View)
Queries (1) • A Query Selects or Modifies a Subset of One or More Tables • E.g. All Female Patients Under 18 Years Old • A Query is Often Expressed in a Special Language Called SQL
SQL • “Structured Query Language” • Originally a Proprietary Language from IBM • Now an International Standard High Level Language Supported by Most Database Products • Used to Modify Data Within a Database
Data Manipulation • Data is Manipulated by Rows and Columns • A Subset of Data is Selected and then Modified • The Selection is Made by the User, Usually Some Set of Requirements • E.g. Select All Female Patients Under 18 Years Old and Delete All Their Records
Queries (2) A SELECT Query Selects a Subset of One or More Tables SELECT <Fields> FROM <Table> WHERE <Condition>; SELECT Alpha.* FROM Alpha WHERE Alpha.alpha63="NULL";
Queries (3) A Make Table Query Creates a Subset of One or More Tables and Puts the Results Into a New Table. The Destination Table is Replaced SELECT <Fields> INTO <Destination Table> FROM <Source Table> WHERE <Condition>; SELECT Alpha.* INTO Temp FROM Alpha WHERE Alpha.ORF Like "YP*";
Queries (4) An Update Query Changes the Values of One or More Fields in One or More Tables UPDATE <Table> SET <Fields to Values> WHERE <Condition>; UPDATE Alpha SET Alpha.alpha63 = "0“ WHERE Alpha.alpha63="NULL";
Queries (5) An Append Query Selects a Subset of One Tables and Adds it into Another Table INSERT INTO <Destination Table> SELECT <Fields> FROM <Source Table> WHERE <Condition>; INSERT INTO Temp SELECT Alpha.* FROM Alpha WHERE Alpha.alpha63="NULL";
Queries (6) A Delete Query Removes a Subset of One or More Tables From the Database DELETE <Rows> FROM <Table> WHERE <Condition>; DELETE Alpha.*FROM Alpha WHERE Alpha.alpha63="NULL";
Queries (7) A Crosstab Query is Very Complex and Will Therefore Not be Covered!
Worked Example (2) • We Have Some Import Errors • We Must Locate What Fields are in Error • We Must Then Use an UPDATE Query to Modify the Erroneous Data
Forms Forms are Used to View/Add/Manipulate Data
Data Entry (1) • The User Should Only be Able to Enter the Domain of a Field on a Form • E.g. If There are Only 10 Wards in a Hospital, They Should Only be Able to Enter 1-10 in the Wards Field • In the Example Above, Allowing Any Number Would Increase the Chance of Data Errors
Data Entry (2) • Pick Lists and Check Boxes Can Help to Maintain Data Integrity • Validation Rules on Form Fields Can Prevent the User From Entering Invalid Data • Minimise Free Text Entry to Fields • The Application Should Help the User in Completing Forms Correctly
Reports Reports are Used to Display Data
Macros and Modules • Macros are a User-Defined List of Database Actions to be Carried Out • Usually Commonly Performed Tasks • A Module Contains Functions and Subroutines that Carry Out More Complex Tasks • Modules are Constructed Using a Form of Visual Basic
Joins • A Join Combines Two Tables into One Virtual Table • Tables are Joined Together Based on a Common Value in a Field • The Field That the Two Tables are Joined on Must be the Same Type
Worked Example (3) • We Are Going to Join Our Tables Together • Using “Tools-Relationships” • Add the Three Tables We Imported • Join “Alpha-ORF” and “Gene ID-ORF” • Join “Gene ID-SGD” and “Function-SGD”
Worked Example (4) • Now Look at the Effect on: • Building a SELECT Query on All of the Tables • The Datasheet View For One of the Tables • Without Joins it Would be Very Difficult to Relate and/or Compare Data From Different Tables • Why is This Important?
Normalising a Table Normalisation is: “The Organisation of a System's Attributes into a Set of Compact and Meaningful Tables”
Normalising a Table Well Normalised Tables Avoid: • Unnecessary Duplication of Data • i.e. No Redundant Data • Problems With Modifying, Inserting and Deleting Data • N.B. Sometimes Referred to as “Update Anomalies”
Stages of Normalisation (1) • Normalisation Takes Place in Stages • Each Stage is Known as a Normal Form • Each Stage is a Development From the Previous Stage
Stages of Normalisation (2) Un-Normalised Form First Normal Form Second Normal Form Third Normal Form
Un-Normalised Form • Column Headings (Field Names) Should be Meaningful • Choice of Primary Key • Must be Unique for the Particular Data Source • May Require Two or More Fields • Use the Smallest Number of Fields Possible • Avoid Textual Keys (Degrades Speed)
1st, 2nd and 3rd Normal Form • 1st : Separate any Repeating Groups of Fields to Other/New Tables • 2nd : Separate Fields that Only Depend Upon Part of the Key to Other/New Tables • 3rd : Separate any Fields That are Not Directly and Fully Dependent on the Key to Other/New Tables
DRUG CARD Patient No. Surname Forename 923 Moneybags Maurice Ward No. Ward Name 10 Barnard Drugs Prescribed Length of Date Drug Code Drug Name Dosage Treatment 2 pills 3 x day 20/5/88 CO2355P Cortisone 14 days after meals Injection every 4 hours 20/5/88 MO3416T Morphine 5 MO3416T Morphine Injection 25/5/88 3 every 8 hours 26/5/88 PE8694N Penicillin 1 pill 3 x day 7 for additional drugs continue on another card Sample Source of Data
SYSTEM: Hospital DATE / / AUTHOR Source ID No.: Name of Source: Drug Card UNF 1NF 2NF 3NF Patient Number Patient Number Patient Number Patient Number Patient Surname Patient Surname Patient Surname Patient Surname Patient Forename Patient Forename Patient Forename Patient Forename Ward Number Ward Number Ward Number Ward Number * Ward Name Ward Name Ward Name Prescription Date Patient Number Patient Number Ward Number Drug Code Prescription Date Prescription Date Drug Name Ward Name Drug Code Drug Code Dosage Drug Name Dosage Patient Number Length of Treatment Dosage Length of Treatment Prescription Date Length of Treatment Drug Code Drug Code Dosage Drug Name Length of Treatment Drug Code Drug Name After Normalisation
Drug Drug Code Drug Name CO2355P Cortisone Morphine MO3416T Penicillin PE8694N AS473A Aspirin VA231M Valium Tables as a Logical Data Structure Ward Patient Wd No Ward Name Pat No Forename Wd No Surname Barnard 10 923 Moneybags Maurice 10 11 Fleming 109 Foot Ivor 11 Prescription Trt Lgth Drug Code Prescr Date Pat No Dosage 2 pills 3 x day after meals 923 20/5/88 CO2355P 14 Injection 923 20/5/88 MO3416T 5 every 4 hours Injection 923 25/5/88 MO3416T 3 every 8 hours 923 26/5/88 PE8694N 1 pill 3 x day 7 2 pills 3 x day 109 15/5/88 AS473A 7 after meals 109 20/5/88 VA231M 2 per day 5
Worked Example (4) • Create a SELECT Query that Just Displays the Functional Groups • Check that it Contains What We are After • Change the SELECT Query to a MAKE TABLE Query
References • Further Reading and Source for this Presentation: • Database Systems: “A Practical Approach to Design, Implementation and Management”, 3rd Edition, T. Connolly and C. Begg, Addison Wesley, 2001 • “An Introduction to Database Systems”, 8th Edition, C. J. Date, Addison Wesley, 2004