1 / 53

Bottom-up Data Analysis

Bottom-up Data Analysis. An Example. Rigorous, bottom-up data analysis. Originally: redesign a paper-based information system Nowadays: given screen and report layouts, derive a set of relations Creativity and judgement to identify dependencies A procedure to derive the relations.

denzel
Download Presentation

Bottom-up Data Analysis

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Bottom-up Data Analysis An Example Bottom Up Analysis

  2. Rigorous, bottom-up data analysis • Originally: redesign a paper-based information system • Nowadays: given screen and report layouts, derive a set of relations • Creativity and judgement to identify dependencies • A procedure to derive the relations Bottom Up Analysis

  3. Systems analyst obtains a copy of every document screen display report record card ... … or any other tangible information source relevant to the information system that is being designed Process the sources according to a well defined procedure …. to produce a suitable collection of relations Outline Bottom Up Analysis

  4. The Procedure (to be followed for every information source) • represent the source as an un-normalised structure • normalise that structure • splitting as necessary • check that all relations are in normal form • repeating as necessary • optimise the whole set of normalised relations Bottom Up Analysis

  5. Normalisation • Remember ... • normalisation means a separation of concerns • often eliminates redundancy in the relations • there are various normal forms • choose an acceptable one • 3NF • (BCNF) Bottom Up Analysis

  6. Analysing a Document Student_Registration • Suppose we are working with a registration card for Llandwp College Bottom Up Analysis

  7. An Un-normalised Structure • As an initial step, the document is represented as an un-normalised structure: Student Enrolment Bottom Up Analysis

  8. Functional Dependencies and Enterprise Rules • Next, functional dependencies between the data fields are recorded • Functional dependencies are determined by enterprise rules for the data in question Bottom Up Analysis

  9. Functional Dependencies Student no. Student's Name, Course title, Year of Course, Tutor's Staff Code, Tutor's Name, Tutor's Extension Tutor's Staff Code Tutor's Name, Tutor's Extension Student no., Unit Code Unit Title, Mark Unit Code Unit Title Bottom Up Analysis

  10. Functional Dependencies Student no. Student's Name, Course title, Year of Course, Tutor's Staff Code, Tutor's Name, Tutor's Extension Tutor's Staff Code Tutor's Name, Tutor's Extension Student no., Unit Code Unit Title, Mark Unit Code Unit Title Bottom Up Analysis

  11. First Normal Form • The un-normalised structure is brought to first normal form by transferring repeating groups of items to separate relation schemes • Each new scheme includes the key of the original un-normalised structure as part of its own key Bottom Up Analysis

  12. First Normal Form Student Enrolment Bottom Up Analysis

  13. Student Enrolment First Normal Form • STUDENT(Student no., Student’s Name, Year of Course, Tutor's Staff Code, Tutor’s Name, Tutor’s Extension) • ENROLMENT(Student no*, Unit Code, Unit Title, Mark) • FKStudent noreferencesSTUDENT Bottom Up Analysis

  14. Second Normal Form • The relation schemes are brought to second normal form by eliminating partial dependencies on keys • Unit Title is only partially dependent on the key (Student no., Unit Code), since Unit Code alone is sufficient to determine its value for any unit Bottom Up Analysis

  15. Second Normal Form Student Unit Enrolment Bottom Up Analysis

  16. Enrolment Unit Student Second Normal Form • STUDENT(Student no., Student’s Name.Course Title, Year of Course, Tutor's Staff Code, Tutor’s Name, Tutor’s Extension) • UNIT(Unit Code, Unit Title) • ENROL(Student no.*, Unit Code*, Mark) • FKStudent noreferencesSTUDENT • FKUnit CodereferencesUNIT Bottom Up Analysis

  17. Bringing Relation Schemes to Third Normal Form • The relation schemes are brought to third normal form by eliminating transitive dependencies on keys • Tutor's Name and Tutor's Extension are transitively dependent on the key, Student no Bottom Up Analysis

  18. Third Normal Form Student Enrolment Unit Staff Bottom Up Analysis

  19. Student Staff Unit Enrolment Third Normal Form • STUDENT(Student no.{PK}, Student’s Name.Course Title, Year of Course, Tutor's Staff Code*) FKTutor’s Staff CodereferencesTUTOR • UNIT(Unit Code{PK}, Unit Title) • TUTOR(Tutor’s Staff Code{PK}, Tutor’s Name, Tutor’s Extension) • ENROL(Student no.*, Unit Code* {PK}, Mark) • FKStudent no.referencesSTUDENT, FKUnit CodereferencesUNIT Bottom Up Analysis

  20. Represent the source document (as an un-normalised structure) Identify functional dependencies between data items Construct first normal form schemes by eliminating repeating groups of data items Bring the schemes to second normal form by eliminating partial dependencies on key fields Bring the schemes to third normal form by eliminating transitive dependencies on key fields Summary of the normalisation procedure Bottom Up Analysis

  21. More documents from the Llandwp College • Also included are copies of • a list of Course Units • a list of Staff Teaching Duties • a Timetable • As a systems analyst, it is your job to analyse these documents • like the Registration Card was analysed above Bottom Up Analysis

  22. The List of Course Units Example data (document) C22 Problem Solving C31 Basic Compiler Design C66 Computing Fundamentals C70 Java coding C89 Practical Business Computing … Bottom Up Analysis

  23. Representing the Course Units list as an "un-normalised" structure gives: Enterprise rule: Each unit has a unique unit code which determines its title The only functional dependency is: Unit Code Unit Title The List of Course Units Bottom Up Analysis

  24. Analysing the List of Course Units • In this case, the “un-normalised” structure is already in third normal form • It contains • no repeating groups of data items, • no partial dependencies on key fields, and • no transitive dependencies on keys. • It needs no further analysis, and the resulting relation scheme is, as before: Bottom Up Analysis

  25. Unit Analysing the List of Course Units • UNIT( Unit Code, Unit Title) Bottom Up Analysis

  26. The List of Teaching Duties Example data csano Tony Otter C31, C70 csasd Amanda Dolittle C89 cswas Bill Smith C22, C66 Bottom Up Analysis

  27. Un-normalised Structure: Enterprise Rule: Each member of staff has a unique staff code. Functional dependency: Staff Code Name Enterprise Rule: Each member of staff may teach many Units: {Staff Code, Unit Code} { } Staff The List of Teaching Duties Bottom Up Analysis

  28. These schemes are also in third normal form (and BCNF) . First Normal Form Eliminate repeating groups of data items by moving the items in question to a new relation scheme, and including the key of the original scheme in that of the new scheme Staff Teach Analysing the list of Teaching Duties Bottom Up Analysis

  29. Staff Teach Analysing the list of Teaching Duties • STAFF(Staff Code, Name ) • TEACH(Staff Code*, Unit Code) • FK Staff Code references STAFF Bottom Up Analysis

  30. Analysing the Timetable Example Data the timetable is written as a list Monday 10:00 C22 321 12:00 C66 318 Tuesday 11:00 C89 317 11:00 C70 320 ... Bottom Up Analysis

  31. Analysing the Timetable • Un-normalised Structure: Bottom Up Analysis

  32. Analysing the Timetable • Enterprise Rules: • At a given time and day of the week, a unit (module) can only take place in one room. • Functional Dependency: Day of Week, Time, Unit Code Room No. • At a given time and day of the week, a room can only hold one unit (module). • Functional Dependency: Day of Week, Time, Room No. Unit Code (Captures that different units cannot co-exist at same time in same room) Bottom Up Analysis

  33. First Normal Form: These schemes are also in third normal form. CKs: {Day, Time, Unit} {Day, Time, Room} Analysing the Timetable Bottom Up Analysis

  34. WEEK_DAY(Day of Week) TIME_SLOT(Day of Week*, Time) FK Day of Week references WEEK_DAY T_TABLE(Day of Week, Time*, Unit Code, Room No) ALTERNATE KEY:(Day of Week, Time,Room No.) FK {Day of Week, Time} references TIME_SLOT CK: {Day, Time, Room} Analysing the Timetable Bottom Up Analysis

  35. The Complete Set of Normalised Relation Schemes • From the Student Registration Card: • 4 relations • From the Course Units List: • 1 relation • From the Teaching Duties List: • 2 relations • From the Timetable: • 3 relations Bottom Up Analysis

  36. The Normalised Relation Schema Bottom Up Analysis

  37. Check attribute names for consistency, renaming attributes where necessary Merge relations that have the same key if they represent the same real world entity Do not merge relations that have the same key if one represents a subset domain of the other Eliminate key-only relations unless they represent data that you want to store in tables Merging the Normalised Relation Schemes Bottom Up Analysis

  38. The Optimised Normalised Relation Schema Key only? Sameconcept? Duplication? Rename? Bottom Up Analysis

  39. The Optimised Normalised Relation Schema Added Slide Bottom Up Analysis

  40. STAFF(Staff Code, Name, Extension) STUDENT(Student no., Course Title, Year of Course, Tutor's Staff Code*) FK Tutor's Staff Code references STAFF UNIT(Unit Code, Unit Title) ENROL(Student no.*, Unit Code*, Mark) FK Student no. references STUDENT FK Unit Code references UNIT TEACH(Staff Code*, Unit Code*) FK Staff Code references STAFF FK Unit Code references UNIT TIME_SLOT(Day of Week, Time) T_TABLE(Day of Week*, Time*, Unit Code*, Room No.) FK Time (Day of Week, Time) references TIME_SLOT FK Unit Code Code references UNIT Summary of Relation Schema Added Slide

  41. Alternative Optimised Normalised Relation Schema Bottom Up Analysis

  42. STAFF(Staff Code, Name, Extension) STUDENT(Student no., Course Title, Year of Course, Tutor's Staff Code*) FK Tutor's Staff Code references STAFF UNIT(Unit Code, Unit Title) ENROL(Student no.*, Unit Code*, Mark) FK Student no. references STUDENT FK Unit Code references UNIT TEACH(Staff Code*, Unit Code*) FK Staff Code references STAFF FK Unit Code references UNIT TIME(Time Slot, Day of Week, Time) ALTERNATE KEY: (Day of Week, Time) T_TABLE(Time Slot*, Unit Code*, Room No.) ALTERNATE KEY:(Time Slot, Room No.) FK Time Slot references TIME FK Unit Code references UNIT Summary of Relation Schema

  43. Summary • Rigorous, bottom-up data analysis entails • collecting source documents • drawing up un-normalised structures • normalising these - 3NF, BCNF, 4NF ... • optimising the resulting set of schemes • The resulting set defines the structure of the database that is to be constructed Bottom Up Analysis

  44. Review - concise First bring relation to 1NF: StuDetails( StuNo, StuName, Ctitle, Cyear, TSCode, Tname, TExt, UCode, UTitle, Mark) Document the FDs: StuNo {StuName, CTitle, CYear, TSCode, TName, TExt} TSCode {TName, TExt} {StuNo,UCode} {UTitle, Mark} UCode Utitle Bottom Up Analysis

  45. Review - concise {StuNo,UCode} these attributes determine all others, and are minimal. So, the 1NF relation StuDetails( StuNo, StuName, Ctitle, Cyear, TSCode, Tname, TExt, UCode, UTitle, Mark) has key {StuNo, Ucode} Bottom Up Analysis

  46. Bring to 2NF The attributes of the key {StuNo,UCode} satisfy FDs StuNo {StuName, CTitle, CYear, TSCode, TName, TExt} UCode Utitle These present partial dependencies on the key {StuNo,UCode} . Eliminate these for 2NF: Bottom Up Analysis

  47. Bring to 2NF StuDetails( StuNo, StuName, CTitle, CYear, TSCode, Tname, TExt,) UnitDetails(UCode, UTitle) StuReg( StuNo*, Ucode*, Mark) Bottom Up Analysis

  48. Bring to 3NF In StuDetails, we still have: StuNo {StuName, CTitle, CYear,TSCode, TName, TExt} TSCode {TName, TExt} So,StuNo TSCode {TName, TExt} Transitive dependency! Bottom Up Analysis

  49. Bring to 3NF Eliminate the transitive dependency: StuDetails( StuNo, StuName, CTitle, CYear, TSCode*} TutorDetails(TScode, Tname, TExt,) Bottom Up Analysis

  50. Complete Normalisation of Registration Table StuDetails( StuNo(PK), StuName, CTitle, CYear, TSCode*} TutorDetails(TScode (PK), Tname, TExt,) UnitDetails(UCode (PK), UTitle) Enrol( StuNo*, Ucode*, Mark) FK StudNo references StudDetails FK Ucode references UnitDetails Bottom Up Analysis

More Related