360 likes | 486 Views
Module 5.1.4. File and Database Concepts. Definitions. Database A large collection of data items and links between them, structured in a way that allows it to be accessed by a number of different applications programs. It is also used to describe any collection of data. Records
E N D
Module 5.1.4 File and Database Concepts
Definitions • Database • A large collection of data items and links between them, structured in a way that allows it to be accessed by a number of different applications programs. • It is also used to describe any collection of data. • Records • The basic unit of data stored in a data file. • It is a collection of items which may be different data types, all relating to the individual or object the record describes. • It is treated as a unit for processing. • Fields • Is part of a record designed to hold a single data item of a specified type. • Holds data items of the same data type • Primary Key • It is a unique value which is used to identify a record.
Definitions (Cont.) • A database is a collection of related data items: • An address book • A library loan system • A video library membership system • A register at a school • A database is made up of files (one or more) • A file is made up of records • Records are made of fields • Fields are made of alpha numeric characters
Tables • Tables contain information about entities. • A table is, in essence, a file • It contains Attributes (columns in a table) and Tuples (a tuple is a row of data) • The table is also known as a Relation
Entity • An entity is the overall name for the contents of a table. • For example, if you had a table that contained names and addresses of work colleagues, then the entire table is the entity and could be called: WORK_COLLEAGUES. When you write out the name of an entity, it should be in capitals. • The table below contains the forename, surname and date of birth of work colleagues. • Therefore as an entity, the name WORK_COLLEAGUES is appropriate. It is a name which gives an idea as to the contents of the table.
Attribute • An attribute is a column in a table • It is a field. • Each attribute can have its own properties: • Validation: • Length, type, presence, etc • Each attribute needs a unique name • Attribute can be a: • Primary Key • Foreign Key
Key Fields • A data file holding details of a club membership containing fields for: • Name, Initials, Date_Of_Birth and Membership_Number • The Membership_Number could be the Primary Key field because it is unique and identifies who the record is about. • Name could be a secondary key: • (a field which allows the record to be accessed in a different order to the primary key) • Date_Of_Birth could be a secondary key because the resulting list would be in order of age, which is useful.
Key Fields: Composite Key • Where there is not a single field suitable as a primary key, two or more fields can be combined to make a primary key • This is known as a composite key: • Take the following example – a project manager runs several projects with many employees and needs to keep a track of the hours spend by each employee on each project. The attributes are: • EmployeeID, ProjectID, Hours_Worked • No single key is suitable as a primary key • However, EmployeeID and ProjectID are suitable as a composite primary key
Foreign Keys • A foreign key is a field in a table • It is not a key field in the table but is a key field in another table The WorkID field in the DEPARTMENTS table is a foreign key because it is a primarykey in another table, the WORK_COLLEAGUES table
Duplicate Data • If data is repeated then it is said to have been duplicated • Entire records can be repeated: • Or parts of records can be repeated:
Duplicate Data • Increases storage space required • Makes errors more likely to occur • Need to change the same data in multiple places • If one is not changed then people looking at the data will not know which one is correct
Referential Integrity • This is ensuring that each attribute in a table that has a foreign key has a linked attribute in the home table of the foreign key • If it does not, the foreign key is said to be an orphan and referential integrity broken Referential integrity has been broken The foreign Key for IT – 567 - has no corresponding value in the WORK_COLLEAGUES table (the home of The WorkID foreign key)
Database Systems • Flat File • Relational • Hierarchical • Discuss benefits and drawbacks of each
Flat File • The data is held in separate files • There is no way to link different files together • The files are two dimensional • For example, an address book: • You may have an address book of Relations, Friends, Work Colleagues, etc. Each address book is a separate file. It may be that you work with a Relation. • In this case do you put them in Relation or Work Colleagues, or both. With a flat file there is no way to link the files so the entry should go in both.
Benefits and Drawbacks of Flat Files • Benefits: • Simple to set up • Easy to understand • Drawbacks: • Duplicated data • Increased storage space required • Cannot search across files • Referential integrity not available
Relational • The data is stored in independent, self contained tables • Foreign keys link the data in tables together • Tables can be linked in a variety of ways and using a number of foreign keys
Drawbacks and Benefits of a Relational Database • Benefits: • Easy to change structure of the database • Easy to add new tables • Referential integrity maintained • Data duplication eliminated • Drawbacks: • Needs complex software • Can be complicated to administer and set up
Hierarchical • A database where the data is held in a tree structure. • Each segment of the tree links to lower level, or child segments. (also called leaves) • Following the links down the tree will find the data. • There is a one to many link between parent (higher branch) and child (lower branch)
Benefits and Drawbacks of a Hierarchical Database • Benefits: • Fast access to data • Fast creation of reports based on child segments • Drawbacks: • Very difficult to create reports crossing child segments • Reorganisation of data is difficult • Outputs and reports must be known before the system is designed • Very difficult to change
Levels of Access • Not all Data should be visible/changeable by all people • Access rights for: • Read • Write • Edit • Level is usually assigned during a logon to the system
User ID and Password • User ID • Controls access to database • which data can be viewed • what format the data is viewed in • which data can be edited • Used to identify user as part of a group • Group can be allocated rights to forms, tables, queries and reports • Individual picks up group rights as a member of the group • Used to monitor user actions • Password • Used to verify User ID • Adds layer of protection
Data Types • Integer • Numbers without a decimal point – e.g Number of pages • Real • Numbers with a decimal point, e.g Currency • Boolean • One of two values, e.g True/False, Male or Female Yes/No • Autonumber • An integer which auto increments for a new record, e.g Key field • Date/Time • Used to store date/time • Choose appropriate data type for a given set and identify advantages and disadvantages of use of each.
Fixed Length Records • A record when the number of bits in a record is decided in advance at the design stage. • The length is constant and cannot be changed later • For example: • Name as a field could be given a fixed length of 20 characters • Advantages: • The length of the record will always be known • Disadvantages: • There may be spare characters at the end which takes up disk space • The name may be cut short (truncated)
Methods of Access • Serial • Sequential • Indexed Sequential • Random Access • Need to show advantages, disadvantages, characteristics and applications suitable for each
Serial • When the items are read/written one at a time from the physical start of the file • Slow access to individual items of data • Must read all records to know if the one you want is missing • Useful for storing transactions as they occur • Log files • Shop transactions
Sequential • This is where the items are read, one at a time, from the logical start of the file, in key order. • Similar to serial access, but sorted • Slow access to individual items of data • However, if you come to a record greater than the one you are searching for, you can stop looking • Useful when all records need to be accessed • updating payroll at the end of every month where every record needs to be accessed • Sending out statements or bills to all customers
Indexed Sequential • Indexed records of a sequential file • This allows both indexed and sequential access techniques to be used • Direct access to the file is required • For example: • To locate “Smith” and index is searched to find the location of the first item beginning with “S”. The pointer moves to this location and then the file is searched sequentially. • Useful for • monthly access to all records (payroll) • selective access to few address changes
Indexed Sequential – Multiple Indexes • Sometimes a file is too large to be searched quickly by using one index • In this case a set of indexes is used • For example: • To locate “Smith” an index is searched to find the location of an index for items beginning with “S” • The pointer moves to this location and the new index is searched to find the location of the first item beginning with “Sm” • The file is searched sequentially • There may be several indexes, depending on the file size
Random Access • Any item can be retrieved almost immediately, provided its position in the file is known. (there is a delay in calculating the address and then moving the the correct area). • Sometimes called direct access • The address of each record is the sector and track where the record is stored • Must use direct access media • Requires overheads in maintaining the file position
Hashing • The process of calculating a numeric value from one or more data items • This is an important technique in storing values in a data structure known as a hash table • The calculated value can be used to mark the position in the table where the data item should be found, enabling it to be accessed directly, rather than via a sequential search • Collisions • This occurs when two data items give the same value under hashing and should therefore be stored in the same place in the table
Example of Hashing and Collision • Hashing using Modular Arithmetic • Divide the key by the number of storage locations and take the remainder as the address: • A table with 100 locations – a four digit key: 1537 will be stored at location 37 (1537/100 = 15, remainder 37) • Hashing using Folding • The key is divided into two or more parts and the parts added together: • The number 847377 could be split into 847 and 377. If you add them together you get: 1224. If you have 100 storage locations, you would take the last two digits: 24
Collisions (Overflow) • The collisions can be put into the next available free block • If the record is not in the intended block, the file is searched sequentially until the record is found or an empty block is encountered in which case the record does not exist. • Looping Integers • If the block is full, add 1 to the key, if that block is full add 2 to the key, if that is full add 3 to the key and so on. If the end of the file is reached, start again from the beginning: • For example, if the record should be in block 32, try 33, then 35, then 38, then 42 and so on There are many methods for creating hash keys and dealing with overflows. These are only sample methods.