Introduction to Databases Week 1, Day 1 (based on Ch 1 of Connolly and Begg) CMPT 355 Sept-Dec 2010 - w1d1

Introduction to Databases - Outline
• Before Databases
  • Some history not in the text
• File Based Approach
  • Illustrated with real world problems
• Database Approach
  • With simplified advantages & disadvantages

Before Databases - Outline
• Some Basic Concepts
  • Record
  • File
  • Field
• Accessing Data
  • Sequential Access
  • Direct Access
  • Record Keys
  • Indexed Sequential Access
  • Random Access
• Some Problem Scenarios

Before Databases - Some Basic Concepts
Record:
• The name “record” is based on traditional “recorded” documents.
• Earliest “records” were 80 column punch cards
  • “The card is often called a unit record, because data is restricted to the 80 columns, and the card is read or punched as a unit of information.” [1]
• Other definitions
  • “A stored record is an identifiable collection of data elements.” [2]
  • “A record is some collection of attributes that describe some entity or event.” [3]
[1] Introduction to IBM Data Processing Systems, 1964, IBM, F22-6517-2
[2] Introduction to Data Management, 1970, IBM, SC20-8096-0
[3] J. Carter, Developing e-Commerce Systems, 2002, Prentice-Hall

Before Databases - Some Basic Concepts
Record (cont):
• A record is the basic unit of stored data that a user recognizes.
  • E.g. customer record, sales slip record.
• Problems
  • Different users may
    • have different records for the same data
    • use different versions of the same record
• Currently dealt with as records / rows / views.

Before Databases - Some Basic Concepts
File:
• The name “file” is based on traditional file folders and filing cabinets
• “A named collection of occurrences of logical records which may be of more than one logical record type; a set of application record values, pertaining to one or more record formats.” [1]
• Other definitions
  • “Stored records are grouped on storage volumes as data sets.” [2]
  • “A collection of similar records that may be used individually or together” [3]
[1] Data Base Concepts, 1971, IBM, ZR20-4219-0
[2] Introduction to Data Management, 1970, IBM, SC20-8096-0
[3] J. Carter, Developing e-Commerce Systems, 2002, Prentice-Hall

Before Databases - Some Basic Concepts
File (cont):
• The basic unit of stored data that an operating system recognizes
  • E.g. customer file, sales slip file
• Currently dealt with as tables.

Before Databases - Some Basic Concepts
Field:
• The name “field” is based on traditional “fields” that need to be filled in on forms
• “A field is the smallest meaningful unit of information of interest.” [1]
• Other related definitions
  • “The smallest unit of logical data of concern to a programmer.” [2]
[1] Introduction to Data Management, 1970, IBM, SC20-8096-0
[2] Data Base Concepts, 1971, IBM, ZR20-4219-0

Before Databases - Some Basic Concepts
Field (cont):
• The basic unit of stored data that a program recognizes
  • E.g. customer name, sales slip id number
• Problems
  • Does a name {first + last} require 1 or 2 fields?
  • How many fields do you use for an address?
  • How many fields are needed on a sales slip to record all items purchased?
• Currently dealt with as data attributes.

Before Databases - Accessing Data
Sequential Access
• The method of using tape storage. (Consider accessing a song on a cassette tape.)
• Easiest to use if sorted based on some field of information (usually a record key)
• “Each file is made up of records, each containing information required to describe completely a single item. The sequence may be by item number, name, account number, or man number, but all files in a single application must be in the same sequence.” [1]
• Updates + Old File = New File
[1] Introduction to IBM Data Processing Systems, 1964, IBM, F22-6517-2
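
The "Updates + Old File = New File" step can be sketched as a single pass over two files that are sorted on the same record key. This is only an illustration: the function name, the (key, data) tuple layout, and the sample records are hypothetical, and it assumes at most one update per key.

```python
def sequential_update(old_master, updates):
    """Merge two key-sorted lists of (key, data) records into a new master list."""
    new_master = []
    i = j = 0
    while i < len(old_master) and j < len(updates):
        old_key, upd_key = old_master[i][0], updates[j][0]
        if old_key < upd_key:
            # No update for this record: copy it to the new file unchanged.
            new_master.append(old_master[i])
            i += 1
        elif old_key > upd_key:
            # Update has a key not in the old file: it is a new record.
            new_master.append(updates[j])
            j += 1
        else:
            # Same key: the update replaces the old record.
            new_master.append(updates[j])
            i += 1
            j += 1
    # Copy whatever remains on either file.
    new_master.extend(old_master[i:])
    new_master.extend(updates[j:])
    return new_master


# Hypothetical sample data, already sorted by key.
old_file = [(101, "Adams"), (103, "Chan"), (107, "Singh")]
updates = [(103, "Chan-Lee"), (105, "Novak")]
new_file = sequential_update(old_file, updates)
```

Neither input is modified; the old file is read once from start to end, which is why both files must be in the same sequence.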

Before Databases - Accessing Data
Record keys
• According to IBM [1]
  • “The data element chosen to order the (sequential) data set is called the key.
  • “The sequence of data may be changed by selecting a different data element to be the key and sorting the stored records according to the values of the new key.
  • “In some cases, using one data element as a key is not sufficient to identify a given stored record. In this case, one or more additional data elements would be concatenated to form the key.”
[1] Introduction to Data Management, 1970, IBM, SC20-8096-0
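
A small sketch of the two ideas in the quotation: re-sequencing a data set by choosing a different key, and concatenating data elements when one alone does not identify a record. The field names and sample records are hypothetical.

```python
from operator import itemgetter

# Hypothetical records; note that account number alone is not unique.
records = [
    {"account": 20, "branch": "B", "name": "Wong"},
    {"account": 10, "branch": "B", "name": "Adams"},
    {"account": 10, "branch": "A", "name": "Patel"},
]

# Choosing "name" as the key changes the sequence of the data set.
by_name = sorted(records, key=itemgetter("name"))

# Concatenated key: branch followed by account identifies each record.
by_branch_account = sorted(records, key=itemgetter("branch", "account"))

names_in_order = [r["name"] for r in by_name]
keys_in_order = [(r["branch"], r["account"]) for r in by_branch_account]
```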

Before Databases - Accessing Data
Direct Access
• The first access method designed to make use of the ability to quickly go to any location on a disk.
• Records are stored in fixed locations based on the values of key fields that can be directly mapped to a physical location on disk.
• There must be space for records with each possible record key value.
• Usually record key values are allocated sequentially to ensure that all storage locations are used (at least initially).
• Records are updated in their original location.
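
A minimal sketch of direct access, assuming fixed-length records and integer keys from 1 to MAX_KEY. An in-memory byte buffer stands in for the disk, and the record layout (a 4-byte key plus a 20-byte name) is invented for illustration.

```python
import io
import struct

# Hypothetical fixed-length record: 4-byte integer key + 20-byte name.
RECORD = struct.Struct("<i20s")
MAX_KEY = 100

# Space is reserved up front for every possible key value.
disk = io.BytesIO(bytes(RECORD.size * MAX_KEY))


def offset(key):
    """Map a record key directly to a byte position."""
    return (key - 1) * RECORD.size


def write_record(key, name):
    """Store (or overwrite) the record in its fixed location."""
    disk.seek(offset(key))
    disk.write(RECORD.pack(key, name.encode("ascii")))


def read_record(key):
    """Return the stored name, or None if the slot has never been written."""
    disk.seek(offset(key))
    stored_key, raw_name = RECORD.unpack(disk.read(RECORD.size))
    if stored_key == 0:
        return None
    return raw_name.rstrip(b"\0").decode("ascii")


write_record(42, "Adams")
write_record(42, "Adams-Lee")  # updated in its original location
```

The buffer never grows: unused key values still occupy space, which is the storage cost noted above.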

Before Databases - Accessing Data
Indexed Sequential Access
• Optimizes both access speed and storage space utilization, a major improvement over direct access.
• Records stored in first-come, first-served (FCFS) order are quickly accessed by using an index of pointers from record keys to the locations of the records.
• The index needs to be re-sorted each time it is updated.
• Records are updated in their original location.
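
The index of pointers can be sketched with a list kept in key order. This is an assumption-laden illustration, not a description of any particular product: records live in a list in arrival order, the index holds (key, position) pairs, and all names are hypothetical.

```python
import bisect

records = []  # data area: records in arrival (FCFS) order
index = []    # (key, position) pairs, kept sorted by key


def add_record(key, data):
    """Append the record, then insert its pointer so the index stays sorted."""
    records.append((key, data))
    bisect.insort(index, (key, len(records) - 1))


def find(key):
    """Binary-search the index, then follow the pointer to the record."""
    i = bisect.bisect_left(index, (key,))
    if i < len(index) and index[i][0] == key:
        position = index[i][1]
        return records[position][1]
    return None


add_record(30, "Singh")
add_record(10, "Adams")
add_record(20, "Chan")

# Walking the index gives the records in key sequence.
in_key_order = [records[pos][1] for _, pos in index]
```

Storage is used only for records that exist, and walking the index still gives sequential processing in key order.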

Before Databases - Accessing Data
Random Access
• Optimizes both access speed and storage space utilization, a major improvement over direct access.
• Records are stored at particular locations based on hashing the value of the record key.
• If multiple records hash to the same location, they must be handled as small chains of records.
• Records are updated in their original location.
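
A minimal sketch of hashed storage with chaining. The bucket count and the modulo hash function are arbitrary choices for illustration, and the function names are hypothetical.

```python
NUM_BUCKETS = 7  # arbitrary size chosen for the example
buckets = [[] for _ in range(NUM_BUCKETS)]


def bucket_for(key):
    """Hash an integer record key to a storage location."""
    return key % NUM_BUCKETS


def put(key, data):
    """Store the record in its bucket, replacing it in place if it exists."""
    chain = buckets[bucket_for(key)]
    for i, (existing_key, _) in enumerate(chain):
        if existing_key == key:
            chain[i] = (key, data)
            return
    chain.append((key, data))


def get(key):
    """Search only the short chain at the hashed location."""
    for existing_key, data in buckets[bucket_for(key)]:
        if existing_key == key:
            return data
    return None


put(3, "Adams")
put(10, "Chan")      # 10 % 7 == 3, so this collides with key 3
put(3, "Adams-Lee")  # update in the original location
```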

Before Databases - Some Problem Scenarios
• Me as a grad student moving from place to place
  • data redundancy
• Me trying to get the Registrar's people to work with the residence halls
  • data availability
• Me looking for a book in the library
  • data sharability
• Me answering a survey about my favorite beer
  • data evolvability

File Based Approach - Outline
• Definition
• Development
• Disadvantages

File Based Approach - Definition
A file based system is
• A collection of application programs that perform services for the end-users such as the production of reports. Each program defines and manages its own data. (Text p. 7, Section 1.2.1)

File Based Approach - Development
• Typically developed
  • bottom-up
  • to meet the needs of a small group of users
  • often on local departmental systems
• Evolution may be limited by initial design

File Based Approach - Disadvantages
• Separation and isolation of data
  • Hard to link data in several files - limiting data sharability
• Duplication of data
  • Waste and inconsistency - due to data redundancy
• Data dependence
  • Program and data structures are highly interdependent - limiting data evolvability
• Incompatible file formats
  • Between programs and programming languages - further limiting data sharability
• No standard for queries
  • You have to develop your own queries - to get data availability
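
The duplication problem can be illustrated with two programs that each manage their own copy of the same data. Everything here (the two "files", the student id, the addresses) is invented for the sketch.

```python
# Each application defines and manages its own file (hypothetical data).
registrar_file = {"s1001": {"name": "A. Student", "address": "12 Elm St"}}
residence_file = {"s1001": {"name": "A. Student", "address": "12 Elm St"}}


def registrar_change_address(student_id, new_address):
    """The registrar's program only knows about the registrar's file."""
    registrar_file[student_id]["address"] = new_address


registrar_change_address("s1001", "48 Oak Ave")

# The residence hall's copy is now stale: same student, two addresses.
inconsistent = (
    registrar_file["s1001"]["address"] != residence_file["s1001"]["address"]
)
```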

Database Approach - Outline
• Definitions
• Advantages
• Disadvantages

Database Approach - Definitions
A database is
• A shared collection of logically related data, and a description of this data, designed to meet the information needs of an organization. (Text p. 14, Section 1.3.1)
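
The phrase "data, and a description of this data" can be seen in a small sketch using Python's built-in sqlite3 module, chosen here only because it needs no setup. The table and column names are hypothetical.

```python
import sqlite3

# An in-memory database, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Adams')")

# The description of the data is stored in the database itself
# (SQLite keeps it in the sqlite_master catalog table).
schema = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'customer'"
).fetchone()[0]

# Any program sharing the database uses the same query language.
names = [row[0] for row in conn.execute("SELECT name FROM customer")]
```

Unlike the file based approach, the record layout is held by the database rather than inside each application program.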

Database Approach - Advantages
• Data integrity
  • Ensuring the correctness, protection, and security of the data
• Data sharability
  • Ensuring the ability to share data between applications and between users on a need-to-know basis
• Data availability
  • Ensuring the ability to access the data when and where it is needed
• Database evolvability
  • Ensuring that the database can be modified to meet changing needs
• Avoiding redundancy
  • That occurs where multiple (often incompatible and inconsistently updated) copies of data are collected and used independently of one another

Database Approach - Disadvantages
• Complexity
  • Requires highly trained staff
  • Requires organizational infrastructure to handle costs, evolutionary planning, hardware and software support
• Cost of (and dependence on) DBMS
  • Large high-performance DBMSs have very high costs
  • Large high-performance DBMSs are closed source
• Cost of conversion
  • Interfacing with or ignoring legacy systems
• Performance
  • Additions for new applications may slow down existing applications
• High impact of failure
  • Moving from department-threatening to organization-threatening levels of risk