1 / 44

Principles of Database Design

Principles of Database Design. James J. Cimino NIH Clinical Center. Outline. Definition Motivation History and evolution Design principles Design methods Exercises Take-Home Messages. Database: Definition. A collection of data that: is organized usually computer-based

reece
Download Presentation

Principles of Database Design

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Principles of Database Design James J. Cimino NIH Clinical Center

  2. Outline • Definition • Motivation • History and evolution • Design principles • Design methods • Exercises • Take-Home Messages

  3. Database: Definition • A collection of data that: • is organized • usually computer-based • represents repetitive information implicitly • supports retrieval

  4. Content Name Date Diagnosis Medication Problem Procedure Visit Structure Field Record Table Database Information

  5. Paper Database as Expert System

  6. Motivation • Power and flexibility depend on data model • Database is the realization of data model • Evaluation of commercial products • Communicating with vendors and IT staff • Building your own databases

  7. 1234567 ADDR 123 MAIN STREET 0002 1234567 DOB 02/01/1948 0003 1-7 8-22 23-30 31-36 1234567 NAME SANDIEGO, CARMEN0001 1234567SANDIEGO,CARMEN123MAIN020148 In the beginning, there were punch cards…

  8. Sequential Files • Stored on magnetic tape • Longer (or shorter) than 80 characters • 8-bit characters (256 characters) • Variable-length records Len=21 Data Len=16 Data 0211234567SandiegoCarmen0161234568CiminoJim • Random access possible ID Loc ID Loc 12345678901234567890123 SandiegoCarmenCiminoJim 1234567000112345680015 • Slowwwwww…..

  9. Random Access Files • Disk storage with moving heads • Larger capacity (MB!) • Addressable records and fields using pointers • Indexes created as lists of pointers • Separation of physical and logical models • Can be difficult to recover if index corrupted

  10. } } } } } } Random Access Files 0000: Name1000Addr2000DOB 6000 1000: Sandiego, Carmen 2000: 123 Main Street 3000: 4000: 5000: 6000: Feb 1, 1948

  11. Indexed Files 0000: 0000 0004: 4000 0008: 2000 0000: Cimino, James 1000: 2000: Norton, Cathy 3000: 4000: Lindberg, Don

  12. Hierarchical Databases 1234567 Sandiego, Carmen 123 Main Street Labs Chem7 Chem7 K 3.9 Na142 K 4.3 Na136

  13. Hierarchical Databases • Easy to use • Efficient storage • “Tree walking” is fast • Queries across trees are slow • Flexible • Too flexible: chaos is allowed • Too easy to modify • Difficult to document complex structures

  14. Hierarchical Databases ^EMR(1234567)=“Sandiego, Carmen” ^EMR(1234567, “Address”)=“123 Main Street” ^EMR(1234567, “Chem7”, “2/2/02”, “Na”)=136 ^EMR(1234567, “Chem7”, “2/2/02”, “K”)=4.3 ^EMR(1234567, “Chem7”, “2/3/02”, “Na”)=142 ^EMR(1234567, “Chem7”, “2/3/02”, “K”)=3.9

  15. Hierarchical Chaos 1234567 Admissions Admission 1 Admit Date: 2/2/02 Primary DX: CHF Other DX AODM A Fib Flag: S Flag: P

  16. Network Databases 1234567 Gyn Clinic 2 Main St. Sandiego 305-2500 Secretary Gyn Clinic 8AM-5PM Ms Smith 305-1000 Service Pap Gyn Visit Dr. Jones Beeper 34

  17. Extensible Markup Language (XML) Databases • SGML is a metalanguage • SGML is used to write Document Type Definitions (DTDs) that define languages • HTML is a language with an SGML DTD • Tags are for formatting/presentation syntax • XML is a proper subset of SGML • XML defines tags that convey semantics • We could write “Health Markup Language” (“HML”) in XML (if we could agree on the semantics and tags) • Tags may or may not be stored with data

  18. <document> </document> • <document.id>CXR001</document.id> • <doc. date>19991101</doc. date> • <document.type> • </document.type> • <document.body> • <document.body> <identifier>P5-00010</identifier> <text>Chest X-Ray</text> <findings>No infiltrate, cardiac shadow not enlarged...</findings> <impression>Normal X-ray</impression>

  19. <patient> </patient> • <patient.id> • </patient.id> • <patient.name> • </patient.name> • <patient.dob>19230113</patient.dob> • <patient.sex value="male"/> • <inpatient/> • <id.value>1234789</id.value> • <family.name>Sandiego</family.name> • <given.name>Carmen</given.name> • <suffix>M.D.</suffix>

  20. Extensible Markup Language (XML) Databases • Strengths • Flexibility to represent wide range of data • Data carries its field assignment • Sparse data handled compactly • Tags can have platform-specific display • Weaknesses • Immature database tools • Verbose • I/O intensive • A trade-off of decreased efficiency for increased flexibility; ? scalability

  21. Relational Databases - Features • Tables with columns and rows • Logical vs. physical representation • Multiple indexes • Inter-table relationships • Virtual sequential files (with simultaneous update)

  22. Patient Lab_test Pt-UI Lname Fname Pt-UI Tname Date 12345 Smith Elmer 12346 Jones Barbara 12347 Clark Arthur 12348 Jones Casey 12349 Sample Steve 12345 Na 5/30/96 42353 CBC 5/30/96 47756 ESR 5/30/96 12348 HBsAg 5/30/96 34523 Amylase 5/30/96 Relational Databases From table “Patient”, get “Pt_UI” where “Lname”=“Jones” and “Fname”=“Casey” and then Get “Tname” and “Date” from table “Lab_test” for the same “Pt_UI”

  23. Normalization • Efficient database organization • Eliminate redundant data • Ensure data dependencies make sense • E.F. Codd, 1970: five normal forms • First Normal Form • Eliminate duplicative columns • Create separate tables for each group of related data • Identify each row with a unique column or set of columns (the primary key).

  24. Normalization (continued) • Second normal form • Remove subsets of data that apply to multiple rows of a table and place them in a separate table • Create relationships between these new tables and their predecessors through the use of foreign keys • Third normal form: • Remove columns that are not dependent upon the primary key

  25. Relational Databases - Advantages • Comprehensible • Multiple “views” possible • Easy to modify • New elements don’t “break” programs • Database management systems (DBMS) • Referential integrity • “Reorg” for efficiency • Access control • Locking for multiple simultaneous use

  26. Relational Databases - Disadvantages • Storage overhead • I/O-intense • Cost

  27. Systems Design Fundamentals • The data model is the most critical aspect • Data model should reflect real world objects and relationships to ensure durability • A correct data model outlasts applications, including many not anticipated at system start-up

  28. Patient ID Name Physician Phone No. System Design: Basic Concepts • The world contains “things” • Develop abstractions called “objects” • Group objects by criteria which represent the abstract object as an empty table

  29. Types of Objects (Tables) • Tangible things (book, person) • Roles (doctor, patient, supervisor) • Events (ordering of a lab test) • Interactions - bind two or more other objects via a transaction (“purchase” relates buyer to seller)

  30. LICENSE Doctor License Exp. Date Specialty Casey 123 ABC Jan 2004 Surgery Kildare 691XKY Mar 2005 Medicine Holiday 12-A-962 Sculpture No holes in the table No strange values Objects • All of the real-world things in the set (the “instances”) have the same characteristics • All instances conform to the same rules

  31. Patient ID Name Physician Phone No. 3131313 John Smith Casey 867-5309 1234567 Ben Casey Killdare PA6-9000 2121212 Mary Jones Holiday 555-1234 Basic Concepts (continued) • Empty tables can be filled in to represent the real world things from which the object was abstracted

  32. Patient ID Name Physician Phone No. Physician Address Phone Basic Concepts (continued) • Relationships between objects are “attributes” of those objects Relationship: “Has-Doc” Patient Has-Doc Physician

  33. Empty Table form: Patient_Admissions Pt_ID Date_Adm Time_Adm Unit Room Graphical Form: Patient_Admissions * Pt_ID * Date_Adm -Time_Adm -Unit -Room Table Notation Textual Form: Patient_Admissions (Pt_ID, Date_Adm, Time_Adm, Unit, Room)

  34. Formalisms for Tables • Rule 1: One instance of an object has: • exactly one value for each attribute • only one element per row-column intersection • no repeating groups • no true “holes” in table • Rule 2: Attributes contain no internal structure

  35. Not ok: Ok: Formalisms for Tables

  36. Formalisms for Tables • Rule 1: One instance of an object has: • exactly one value for each attribute • only one element per row-column intersection • no repeating groups • no true “holes” in table • Rule 2: Attributes contain no internal structure • Rule 3: Every attribute should represent a characteristic of the entire object, not a characteristic of a limited part of the object

  37. Patient Table *Person Identifier -Person Name -Date of Birth -Date of Admission Not OK: Attribute of encounter, not patient Patient Table *Person Identifier -Person Name -Date of Birth Admission Table *Admission ID -Person Identifier -Date of Admission OK: Formalisms for Tables

  38. Formalisms for Tables • Rule 1: One instance of an object has: • exactly one value for each attribute • only one element per row-column intersection • no repeating groups • no true “holes” in table • Rule 2: Attributes contain no internal structure • Rule 3: Every attribute should represent a characteristic of the entire object, not a characteristic of a limited part of the object

  39. Relationships • Relationship: an abstraction of an association between real world things • Patient OCCUPIES Bed • Library CONTAINS Books • Specimen IS ASSAYED by Lab Method • Inverse relationships: • Bed is OCCUPIED BY Patient • Book IS LENT BY Library

  40. One-to-One: Bed Patient Patient One-to-Many Disease Patient Many-to-Many Doctors Relationship Types

  41. DRUG *generic name - other attributes DRUG MANUFACTURER * manufacturer name - other attributes LICENSE * manufacturer name * generic name - date licensed Modeling Many-to-Many Relationships

  42. Exercise: Devise a Relational Model for MEDLINE citations PMID- 2405204 TI- Medical informatics. An emerging academic discipline and institutional priority. AB- Information management constitutes a major activity of the health care…\ AD- Department of Radiology, Brigham and Women's Hospital, Boston, MA 02115 AU- Greenes RA AU- Shortliffe EH LA- eng PT- Journal Article PT- Review JT- JAMA : the journal of the American Medical Association SO- JAMA. 1990 Feb 23;263(8):1114-20. MH- Career Choice MH- Hospital Information Systems MH- Information Systems MH- Medical Informatics/education/organization & administration/*trends MH- Medical Informatics Applications MH- National Library of Medicine (U.S.) MH- Research MH- Training Support MH- United States

  43. Case Presentation The patient is a 50 year old, Native American female who present to the emergency room (ER) with the chief complaint of lip numbness, nausea and chest pain. The patient was generally well until about one half hour prior to arrival in the ER, while eating dinner at as seafood restaurant in Rock Harbor, MA. She was finishing a dinner of New England clam chowder, lobster, steamed clams, and corn on the cob when she noted onset of symptoms. Others in her party ate fish and chips, although two other people ate the clam chowder; none at the steamers. She gives a history of hypertension and states that she was getting a "capsule, half green, half blue-green" from her private doctor. She also reports that she was treated in the past for tuberculosis while she was pregnant, but doesn't remember what she was treated with or for how long. She reports that she was at another hospital on the other side of town, where she had a liver biopsy. She reports that he thinks the diagnosis was "hemachromatosis". The patient reports an allergy to Bufferin. Physical examination revealed a well-developed, well-nourished diaphoretic female in moderate respiratory distress. Vital signs showed a pulse of 110, a respiratory rate of 8, an oral temperature of 100.3, and a blood pressure of 150/100. Examination revealed rales over both lower lung fields. Abdominal exam revealed a tender, palpable liver edge. Neurologic exam reveals dysarthria, diffuse muscle weakness, and hyperreflexia. Chem7 (serum): Glucose 100 (70-105) Chem7 (plasma): Glucose 150 (75-110) CBC: Hgb 15 (12.0-15.8), Hct 45 (42.4-48.0), WBC 11,000 (3,540-9,060), Platelets 145K (165-415K) A fingerstick blood sugar was 80 Urinalysis showed protein of 1+ and glucose of 0 A blood culture was positive for methicillin-resistant Staphylococcus aureus (MRSA) ECG - Sinus Rhythm, 74BPM, Axis -30 degrees, ST segment 2mm elevated and T-waves down in leads I, L, V5 and V6 Chest X-ray Left upper lobe infiltrate, left ventricular hypertrophy The patient's nurse reported that the patient seemed more worried about who would care for her elderly father if anything happened to her. A medical student reviewing the case wonders whether paralytic shellfish poisoning could cause a myocardial infarction; she decides to do a literature search. The patient was treated with activated charcoal and stomach lavage , followed by enteric-coated aspirin. Due to worsening respiratory insufficiency, she was intubated and placed on mechanical ventilation.

  44. Take-Home Messages • Data model is the most critical aspect of system design and function • Data models should reflect real world objects and their relationships to ensure durability • A correct data model outlasts applications, including many not anticipated at system start-up

More Related