690 likes | 814 Views
Welcome to CSE3180 ‘Principles of Database Systems’ Semester 1, 2005. Introduction to CSE3180. This unit covers many aspects associated with database, and in this series of lectures, those related to Relational Data Base (but not exclusively) I’m : Rod Simpson
E N D
Welcome to CSE3180 ‘Principles of Database Systems’ Semester 1, 2005
Introduction to CSE3180 This unit covers many aspects associated with database, and in this series of lectures, those related to Relational Data Base (but not exclusively) I’m : Rod Simpson My Office is : Room C 4.46 (for Semester 1) My Phone contact number is : (03)990 32352 My email is rod.simpson@csse.monash.edu.au
Which stands for School of Computer Science and Software Engineering Faculty of Information Technology Materials for this unit are located at www.csse.monash.edu.au/courseware/cse3180
University Notice Friday 11th March is the last day to add a Semester 1 or Full Year on-campus unit without the risk of a $73.00 late enrolment penalty
Principles of Database Systems The examination will be closed book. It will consist of 3 sections: A: A number of multi choice questions B: A number of short answer questions C: Four essay type questions (choice of 6)
Principles of Database Systems You should have a copy of the unit outline, which includes • The recommended text (Hoffer, Prescott and McFadden, Edition 7 or 8, whichever is available in the Bookshop) • The Project • The Certificate of Work Performed if working in a group • The Tutorial Guide
Timetable • The lecture will be held in B2.15 from 10.00am to midday on Thursdays • Tutorials will be held on Thursday in Laboratory K 1.07 from midday to 2.00pm The Laboratories are booked for 2 hours so that if you wish you can continue with the unit exercises or the project There are 16 terminals in each room
Exclusions The exclusions to this unit are • Bachelor of Computing • Bachelor of Computer Science • Masters degrees • Graduate Diplomas in Computing ...
Introduction Part 3 The notes for this unit can be found on the Monash Web Page at this address: http://www.csse.monash.edu.au/courseware/cse3180 The Unit Code is CSE3180 The notes are the PowerPoint format and can be viewed on the File Server software.
Lecture Objectives This lecture will cover: • some thoughts on data storage and retrieval constraints • what is the form of ‘data’ • some definitions of ‘data’, ‘information’, ‘audit trail’ • what is a data base - who would want one • some of the functions of the database management software • different models (commercial) • the relational model • advantages and disadvantages of database • some practical aspects of your project
Introduction Course Outline As you will see from the notes, the recommended text is Hoffer, Prescott and McFadden, ‘Database Management’. The examples and the exercises at the end of most chapters are well worth a read. There is a project, and it is expected that you will work in groups - your tutor will arrange this in the first tutorial. The assignment support software is either Oracle or MS Access. If you wish to use some other DBMS, (such as SQLServer or MySQL) you will need to come to some amicable arrangement with your tutor
Introduction Features of database in the lectures such as • Recovery • Security • Consistency • Concurrency • Database Management architecture • Background processes will be based on Oracle’s version 9i
Database In designing a database there is the need to understand what requirements are made of data analyses, who needs these analyses, when and in what form. In all this there is an expectation that the results of ‘queries’, which is the same as saying data supported by the database is accurate, timely and complete. In these lectures, you will see how these requirements can be built into a database - as you will do with your database model. And that is what it is - a model which accurately reflects data as it occurs and is processed in the ‘real world’.
Some Thoughts on Data Storage • A major benefit of Computing is the ability to STORE and RETRIEVE large amounts of data • However, there are a number of processes and other considerations which need to be worked together to maximise this benefit • Some very early items are • What data ? • What are its sources ? • What are the volumes / frequency ? • How long is to be stored and why this period ? • In what FORM is it to be stored ?
Some Thoughts on Data Retrieval • Who is going to ‘access’ (retrieve ?) this data • How often ? • From where ? • Why is data to be accessed - for what purpose ?
Some Thoughts on Data Retrieval • How is it to be accessed ? Voice inquiry, remote, by formal request, normal processing schedule, randomly, whenever the ‘need’ arises ? • Is the data to be freely available ? • Are there some limitations on access ? • How are these access limitations managed ? • What value is inherent in the data ?
Some General Thoughts • What time base or volume spread is to be represented by the data ? • What levels of accuracy are to be expected ? • Is data to be available 7 days a week, 24 hours/day ? • What response time is expected ? Minimum ? / Tolerable ? And just what does that mean in real value terms ? • How is new or altered data to be directed to existing data ?
Some General Thoughts • How is input access to be controlled ? • When and why is data deleted - who authorises such deletions ? • What does the ‘data’ consist of - characters, objects, audio visual, TV, audio, animation ? • What is the optimum method of storage (organisation) ?
Some Advanced Commercial Applications You have probably heard of these ‘advanced’ applications • Consumer (or Customer) Resource Management • Supply Chain Management But have you heard of • Business Intelligence • Business Process Intelligence • Business Process Management • Business Activity Monitoring These are some of the new, current applications
Some Definitions A General Definition: DATA - raw (unprocessed or partly processed) facts which represent the state of entities (things) which have occurred INFORMATION - data which has been processed into a form USEFUL TO THE USER What is Information to one user may be Data to another user.
Where are the Mars landers ? Which of these is Mars ? Possible influence of ABC Channel 2 ?
Audit Trail General Definition: ‘The presence of data processing media and procedures which allow any and / or all transaction(s) to be traced through ALL STAGES of processing’ This infers that the following devices / techniques are in place: 1. A logging device which ‘traps’ all transactions 2. Some way of tagging each transaction so that it can be identified 3. Some way of retrieving the required transaction(s) 4. Some way of archiving - what is the required period ? 5. Control procedures and processes to ensure integrity
Database A Database is a shared collection of Inter-Related data designed to meet the needs of multiple types of users and applications. This implies that multiple USER VIEWS can be defined Data stored is independent of the programs which use it Data is structured to provide a basis for future applications DATABASE = Stored Collection of Related Data May be physically distributed
Database Management Software A DBMS is SOFTWARE which provides access to the database in an integrated and controlled manner A DBMS must contain : 1. Data Definition and Structure capabilities 2. Data Manipulation capabilities
Data Definition and Manipulation Data Definition Language (DDL) used to describe data at the database level Schema level - complete database description Sub-Schema level - user views (restricted) Data Manipulation Language (DML) Provides for these Create Insert capabilities Update Retrieve (extract) Delete Drop Modify Calculation Report
The Many Faces of Database Databases can be: 1. Transaction Intensive - ATM’s Checkouts 2. Decision Support - Browsing for trends 3. Mixed-Load - Combination of both 4. Small databases - Few thousand records 5. Very Large Database - Many millions or trillions (VLDB) of records (Banks) 6. Non Traditional - Weather bureau, flight plans Computer Aided Design data 7. Mobile - Able to ‘move around’
DBMS Requirements Querying Capabilities Data Displays (Presentation) Data entry Data Validation Data Deletion Committing Procedures (of changes) AND Data Integrity, Security, Consistency and Concurrency Capabilities
The Many Faces of Database • They can be: • Data Warehouses (which you will meet in Lecture 8) • Data Marts (or martlets) • How is a database size measured ? • There are a number of ‘measurements’ • Raw data size • Total database size • Total usable disk space size (which includes media protection such as mirroring)
Hardware Database Raw Data Total Disk HP9000 Oracle 100GB 643GB Digital 8400 Oracle 100GB 361GB IBM SP2 DB2/6000 100GB 377GB NCR5100 Teradata 100GB 880GB NCR5100 Teradata 1,000GB 3,280GB The Many Faces of Database
How ‘Big’ can a Database be ? • Try this : The Land Registry in England has a DB2 database whose ‘size’ is 18.3 TeraBytes • This includes : User data Summaries Aggregates Indexes It does not include Free Space and Redundancy areas
Important Database Features • Data Integrity • Data Independence • Referential Integrity - Relational Database Model • Concurrency Control - Multiple Users • Consistency - multi users - distributed database - replicated database - partitioned database - mobile database • Recovery from failure (Transaction and Media) • Security
Data Base Models - Hierarchical owner / parent child / parent owner member child child / parent
Data Base Models - Network set of data owner member owner member set of data Note: Only linked sets can be accessed
Data Base Models - Network set of data owner member set of data Note: Only linked sets can be accessed
Data Base Models - Relational table table table table table A B C D E Any table(s) can be joined to any other table(s), provided there is a means of effecting the join Primary key / Foreign key concept. Data redundancy No fixed linkages
Dr. E. F. Codd Edgar F. Codd • Born 1932, UK. • Died April 18, 2003. • BA/MA (Maths) Oxford University • PhD University of Michigan. • The Relational Oath:“I promise to use the key, the whole key and nothing but the key, so help me Codd”
A Primary Key - What’s that ? • Hoffer, Prescott and McFadden define a Primary Key as : An attribute (or combination of attributes) which uniquely identifies each row in a relation. (table) • Richard T. Watson has this to say: The primary key definition block specifies a set of column values comprising the primary key. Once a Primary Key is defined, the system enforces its uniqueness by checking that the Primary Key of any new row does not already exist in the table.
And - A Foreign Key ?? • Hoffer, Prescott and McFadden’s definition: An attribute (or attributes) in a relation (table) of a database which serves as the Primary Key of another relation (table) in the same database. • Richard T. Watson says: An attribute (or attributes) that is a Primary Key in the same table, or another table. It is the method of recording relations in a relational database. And, both the Primary and Foreign Key(s) should be drawn from the same Domain.
2 relations (tables) EMPNUM NAME Date of Birth DEPTNUM 3 JONES 16-05-1956 605Referencing 7 SMITH 23-09-1965 432Table 11 ADAMS 11-08-1972 201 15 NGUYEN 23-10-1964 314 18 PHAN 16-11-1976 201 23 SMITH 19-09-1974 314 Relation (Table) Name : EMP Relation Schema: EMP(empnum,name,date of birth,deptnum) DEPTNUM DEPTNAME 201 Production 314 Finance 432 Information Systems 605 Administration Referenced Table Relation (Table) Name : DEPT Relation Schema: DEPT(deptnum, deptname)
Relational Database Data is represented in ROW and COLUMN form (matrix) (tuple) (attribute) Collections of related data ---> TABLES (relations) 1 or more tables ----> DATA BASE ATTRIBUTES are generally static ROWS are DYNAMIC and Time-Varying The number of Attributes = DEGREE of a table The number of Rows = CARDINALITY of a table
Some RDB Considerations • Data is held in tables • No order of data in tables - row or attribute • Concept of Foreign Key - Primary Key relationship • Data Typing - including nulls • Query Access - insert, update, delete, retrieval • Indexing on candidate (and Primary) keys • Integrity Constraints Attribute value ranges Referential Integrity (Foreign Key - Primary Key) Entity Integrity User Defined Integrity • Set retention constraints
Some RDB Considerations • Domain constraints • User defined ‘Rules’ e.g. quantities and values must not be negative; pricing rate must not be zero • Recovery procedures • No explicit linkages between tables • Linking or embedding database operations in a procedural language (Cobol, C ..) • Databases may be distributed across similar or different DBMS’s • Security features
Data Description Language Used to describe data at the Database level Structure: Attributes Schema : Complete description of the database using DDL SubSchema : Describes data in the database as it is ‘known’ to individual programs(processes) or users The segment of logical data record(s) required is commonly known as a VIEW
Data Manipulation Language Language (commands and syntax) used to cause transfers of data from the Database and the Operating Environment and vice versa Variety of Languages - Cobol, C, Java, C++ and SQL as in Access, DB2, dBASEV, Informix, Oracle, VisualDataBase, SQLServer, MySQL Windows versions provide Icons and Menu options which are translated by the DBMS software to Database manipulation commands Typical commands: get, put, replace, seek, update,delete, insert, drop, find, modify
A Typical Database Model Users - keyboard direct DBMS Database I n t e r f a c e P r o g r a m Menu Options Users Database Query Access Language Programs written in Cobol, C, C++, Pascal Java etc.
Advantages of Database • Reduced Data Redundancy • Data Integrity • Data Independence • Data Security • Data Consistency • Easier use of Data via DBMS Tools (Query languages, 4GL's)
Disadvantages of Database • Complexity • Expense • Vulnerability • Size of - disk storage, processor memory • Training Costs • Compatibility • Technology Lock In
More on DBMS Capabilities Required: • the DBMS must provide a natural interface of user data • the interface must be independent of any physical storage structures • different users with different views must be able to access the same database • database changes must be possible without affecting programs which do not use the changes (Physical and Logical Independence)