530 likes | 680 Views
Bottom-up Data Analysis. An Example. Rigorous, bottom-up data analysis. Originally: redesign a paper-based information system Nowadays: given screen and report layouts, derive a set of relations Creativity and judgement to identify dependencies A procedure to derive the relations.
E N D
Bottom-up Data Analysis An Example Bottom Up Analysis
Rigorous, bottom-up data analysis • Originally: redesign a paper-based information system • Nowadays: given screen and report layouts, derive a set of relations • Creativity and judgement to identify dependencies • A procedure to derive the relations Bottom Up Analysis
Systems analyst obtains a copy of every document screen display report record card ... … or any other tangible information source relevant to the information system that is being designed Process the sources according to a well defined procedure …. to produce a suitable collection of relations Outline Bottom Up Analysis
The Procedure (to be followed for every information source) • represent the source as an un-normalised structure • normalise that structure • splitting as necessary • check that all relations are in normal form • repeating as necessary • optimise the whole set of normalised relations Bottom Up Analysis
Normalisation • Remember ... • normalisation means a separation of concerns • often eliminates redundancy in the relations • there are various normal forms • choose an acceptable one • 3NF • (BCNF) Bottom Up Analysis
Analysing a Document Student_Registration • Suppose we are working with a registration card for Llandwp College Bottom Up Analysis
An Un-normalised Structure • As an initial step, the document is represented as an un-normalised structure: Student Enrolment Bottom Up Analysis
Functional Dependencies and Enterprise Rules • Next, functional dependencies between the data fields are recorded • Functional dependencies are determined by enterprise rules for the data in question Bottom Up Analysis
Functional Dependencies Student no. Student's Name, Course title, Year of Course, Tutor's Staff Code, Tutor's Name, Tutor's Extension Tutor's Staff Code Tutor's Name, Tutor's Extension Student no., Unit Code Unit Title, Mark Unit Code Unit Title Bottom Up Analysis
Functional Dependencies Student no. Student's Name, Course title, Year of Course, Tutor's Staff Code, Tutor's Name, Tutor's Extension Tutor's Staff Code Tutor's Name, Tutor's Extension Student no., Unit Code Unit Title, Mark Unit Code Unit Title Bottom Up Analysis
First Normal Form • The un-normalised structure is brought to first normal form by transferring repeating groups of items to separate relation schemes • Each new scheme includes the key of the original un-normalised structure as part of its own key Bottom Up Analysis
First Normal Form Student Enrolment Bottom Up Analysis
Student Enrolment First Normal Form • STUDENT(Student no., Student’s Name, Year of Course, Tutor's Staff Code, Tutor’s Name, Tutor’s Extension) • ENROLMENT(Student no*, Unit Code, Unit Title, Mark) • FKStudent noreferencesSTUDENT Bottom Up Analysis
Second Normal Form • The relation schemes are brought to second normal form by eliminating partial dependencies on keys • Unit Title is only partially dependent on the key (Student no., Unit Code), since Unit Code alone is sufficient to determine its value for any unit Bottom Up Analysis
Second Normal Form Student Unit Enrolment Bottom Up Analysis
Enrolment Unit Student Second Normal Form • STUDENT(Student no., Student’s Name.Course Title, Year of Course, Tutor's Staff Code, Tutor’s Name, Tutor’s Extension) • UNIT(Unit Code, Unit Title) • ENROL(Student no.*, Unit Code*, Mark) • FKStudent noreferencesSTUDENT • FKUnit CodereferencesUNIT Bottom Up Analysis
Bringing Relation Schemes to Third Normal Form • The relation schemes are brought to third normal form by eliminating transitive dependencies on keys • Tutor's Name and Tutor's Extension are transitively dependent on the key, Student no Bottom Up Analysis
Third Normal Form Student Enrolment Unit Staff Bottom Up Analysis
Student Staff Unit Enrolment Third Normal Form • STUDENT(Student no.{PK}, Student’s Name.Course Title, Year of Course, Tutor's Staff Code*) FKTutor’s Staff CodereferencesTUTOR • UNIT(Unit Code{PK}, Unit Title) • TUTOR(Tutor’s Staff Code{PK}, Tutor’s Name, Tutor’s Extension) • ENROL(Student no.*, Unit Code* {PK}, Mark) • FKStudent no.referencesSTUDENT, FKUnit CodereferencesUNIT Bottom Up Analysis
Represent the source document (as an un-normalised structure) Identify functional dependencies between data items Construct first normal form schemes by eliminating repeating groups of data items Bring the schemes to second normal form by eliminating partial dependencies on key fields Bring the schemes to third normal form by eliminating transitive dependencies on key fields Summary of the normalisation procedure Bottom Up Analysis
More documents from the Llandwp College • Also included are copies of • a list of Course Units • a list of Staff Teaching Duties • a Timetable • As a systems analyst, it is your job to analyse these documents • like the Registration Card was analysed above Bottom Up Analysis
The List of Course Units Example data (document) C22 Problem Solving C31 Basic Compiler Design C66 Computing Fundamentals C70 Java coding C89 Practical Business Computing … Bottom Up Analysis
Representing the Course Units list as an "un-normalised" structure gives: Enterprise rule: Each unit has a unique unit code which determines its title The only functional dependency is: Unit Code Unit Title The List of Course Units Bottom Up Analysis
Analysing the List of Course Units • In this case, the “un-normalised” structure is already in third normal form • It contains • no repeating groups of data items, • no partial dependencies on key fields, and • no transitive dependencies on keys. • It needs no further analysis, and the resulting relation scheme is, as before: Bottom Up Analysis
Unit Analysing the List of Course Units • UNIT( Unit Code, Unit Title) Bottom Up Analysis
The List of Teaching Duties Example data csano Tony Otter C31, C70 csasd Amanda Dolittle C89 cswas Bill Smith C22, C66 Bottom Up Analysis
Un-normalised Structure: Enterprise Rule: Each member of staff has a unique staff code. Functional dependency: Staff Code Name Enterprise Rule: Each member of staff may teach many Units: {Staff Code, Unit Code} { } Staff The List of Teaching Duties Bottom Up Analysis
These schemes are also in third normal form (and BCNF) . First Normal Form Eliminate repeating groups of data items by moving the items in question to a new relation scheme, and including the key of the original scheme in that of the new scheme Staff Teach Analysing the list of Teaching Duties Bottom Up Analysis
Staff Teach Analysing the list of Teaching Duties • STAFF(Staff Code, Name ) • TEACH(Staff Code*, Unit Code) • FK Staff Code references STAFF Bottom Up Analysis
Analysing the Timetable Example Data the timetable is written as a list Monday 10:00 C22 321 12:00 C66 318 Tuesday 11:00 C89 317 11:00 C70 320 ... Bottom Up Analysis
Analysing the Timetable • Un-normalised Structure: Bottom Up Analysis
Analysing the Timetable • Enterprise Rules: • At a given time and day of the week, a unit (module) can only take place in one room. • Functional Dependency: Day of Week, Time, Unit Code Room No. • At a given time and day of the week, a room can only hold one unit (module). • Functional Dependency: Day of Week, Time, Room No. Unit Code (Captures that different units cannot co-exist at same time in same room) Bottom Up Analysis
First Normal Form: These schemes are also in third normal form. CKs: {Day, Time, Unit} {Day, Time, Room} Analysing the Timetable Bottom Up Analysis
WEEK_DAY(Day of Week) TIME_SLOT(Day of Week*, Time) FK Day of Week references WEEK_DAY T_TABLE(Day of Week, Time*, Unit Code, Room No) ALTERNATE KEY:(Day of Week, Time,Room No.) FK {Day of Week, Time} references TIME_SLOT CK: {Day, Time, Room} Analysing the Timetable Bottom Up Analysis
The Complete Set of Normalised Relation Schemes • From the Student Registration Card: • 4 relations • From the Course Units List: • 1 relation • From the Teaching Duties List: • 2 relations • From the Timetable: • 3 relations Bottom Up Analysis
The Normalised Relation Schema Bottom Up Analysis
Check attribute names for consistency, renaming attributes where necessary Merge relations that have the same key if they represent the same real world entity Do not merge relations that have the same key if one represents a subset domain of the other Eliminate key-only relations unless they represent data that you want to store in tables Merging the Normalised Relation Schemes Bottom Up Analysis
The Optimised Normalised Relation Schema Key only? Sameconcept? Duplication? Rename? Bottom Up Analysis
The Optimised Normalised Relation Schema Added Slide Bottom Up Analysis
STAFF(Staff Code, Name, Extension) STUDENT(Student no., Course Title, Year of Course, Tutor's Staff Code*) FK Tutor's Staff Code references STAFF UNIT(Unit Code, Unit Title) ENROL(Student no.*, Unit Code*, Mark) FK Student no. references STUDENT FK Unit Code references UNIT TEACH(Staff Code*, Unit Code*) FK Staff Code references STAFF FK Unit Code references UNIT TIME_SLOT(Day of Week, Time) T_TABLE(Day of Week*, Time*, Unit Code*, Room No.) FK Time (Day of Week, Time) references TIME_SLOT FK Unit Code Code references UNIT Summary of Relation Schema Added Slide
Alternative Optimised Normalised Relation Schema Bottom Up Analysis
STAFF(Staff Code, Name, Extension) STUDENT(Student no., Course Title, Year of Course, Tutor's Staff Code*) FK Tutor's Staff Code references STAFF UNIT(Unit Code, Unit Title) ENROL(Student no.*, Unit Code*, Mark) FK Student no. references STUDENT FK Unit Code references UNIT TEACH(Staff Code*, Unit Code*) FK Staff Code references STAFF FK Unit Code references UNIT TIME(Time Slot, Day of Week, Time) ALTERNATE KEY: (Day of Week, Time) T_TABLE(Time Slot*, Unit Code*, Room No.) ALTERNATE KEY:(Time Slot, Room No.) FK Time Slot references TIME FK Unit Code references UNIT Summary of Relation Schema
Summary • Rigorous, bottom-up data analysis entails • collecting source documents • drawing up un-normalised structures • normalising these - 3NF, BCNF, 4NF ... • optimising the resulting set of schemes • The resulting set defines the structure of the database that is to be constructed Bottom Up Analysis
Review - concise First bring relation to 1NF: StuDetails( StuNo, StuName, Ctitle, Cyear, TSCode, Tname, TExt, UCode, UTitle, Mark) Document the FDs: StuNo {StuName, CTitle, CYear, TSCode, TName, TExt} TSCode {TName, TExt} {StuNo,UCode} {UTitle, Mark} UCode Utitle Bottom Up Analysis
Review - concise {StuNo,UCode} these attributes determine all others, and are minimal. So, the 1NF relation StuDetails( StuNo, StuName, Ctitle, Cyear, TSCode, Tname, TExt, UCode, UTitle, Mark) has key {StuNo, Ucode} Bottom Up Analysis
Bring to 2NF The attributes of the key {StuNo,UCode} satisfy FDs StuNo {StuName, CTitle, CYear, TSCode, TName, TExt} UCode Utitle These present partial dependencies on the key {StuNo,UCode} . Eliminate these for 2NF: Bottom Up Analysis
Bring to 2NF StuDetails( StuNo, StuName, CTitle, CYear, TSCode, Tname, TExt,) UnitDetails(UCode, UTitle) StuReg( StuNo*, Ucode*, Mark) Bottom Up Analysis
Bring to 3NF In StuDetails, we still have: StuNo {StuName, CTitle, CYear,TSCode, TName, TExt} TSCode {TName, TExt} So,StuNo TSCode {TName, TExt} Transitive dependency! Bottom Up Analysis
Bring to 3NF Eliminate the transitive dependency: StuDetails( StuNo, StuName, CTitle, CYear, TSCode*} TutorDetails(TScode, Tname, TExt,) Bottom Up Analysis
Complete Normalisation of Registration Table StuDetails( StuNo(PK), StuName, CTitle, CYear, TSCode*} TutorDetails(TScode (PK), Tname, TExt,) UnitDetails(UCode (PK), UTitle) Enrol( StuNo*, Ucode*, Mark) FK StudNo references StudDetails FK Ucode references UnitDetails Bottom Up Analysis