310 likes | 740 Views
Database Design Techniques for Clinical Research. Melissa K. Carroll, M.S. October 20, 2003. Overview. Relational database design Implementing a relational database in Microsoft Access Designing a database for a typical study by our group Resources Questions. Relational Database Design.
E N D
Database Design Techniques for Clinical Research Melissa K. Carroll, M.S. October 20, 2003
Overview • Relational database design • Implementing a relational database in Microsoft Access • Designing a database for a typical study by our group • Resources • Questions
What is a Database? • Collection of data organized for efficient operations • Everyone uses them multiple times a day, often without realizing it • Examples • Airline reservations • Online shopping • Underlying design principles are largely universal
Problems with “Flat” Files • Data Redundancy • Leads to more work and inconsistencies • Wreaks havoc on performing basic manipulations such as searching and sorting • File Management • Multiple files • Concurrent users • Security • Intermediate results • Ad hoc programming (reinventing the wheel)
Data Modeling: Entity-Relationship Model • Models data as entities, with attributes, and relationships between entities • Entity: person, place, or thing • Instance: example of an entity • Attribute: feature of an entity • Relationship: describes association between (usually two) entities
Designing an E-R Diagram • Issues to Consider • What questions will the data be used to answer? • What are the entities and how do they relate to each other? • What attributes uniquely identify entities? • What attributes need to be sub-divided? • Goal: Eliminate Redundancy • Process is called “normalizing” data
Relational Model • Lower-level model used for actual database implementation • Translating from E-R model • Entities become tables • Attributes become fields • Many-to-many relationships become tables • Unique identifiers from involved tables as fields • Unique identifiers from “one” sides are added as fields to corresponding “many” sides
Relational Database Management Systems (RDBMSs) • Database Management System (DBMS): software with purpose of helping user design and use a database • Relational Database Management System (RDBMS): DBMS for databases based on relational model • Most major commercial products (e.g. MS Access, Oracle, MySQL, SQL Server)
SQL • Need language to tell the DBMS • The design of the database • Actual data to be entered • What data to retrieve and in what format • SQL = standardized language used by almost all major DBMSs • Standard language provides interoperability and portability
SQL Examples • CREATE TABLE artist (artistID INT AUTO_INCREMENT, artistName VARCHAR(75)) • INSERT INTO artist (artistName) VALUES (“The Beatles”) • UPDATE album SET label = “EMI” WHERE albumTitle = “Abbey Road”
SQL Examples Continued • SELECT songTitle, quality FROM song, recording WHERE song.songID = recording.songID • SELECT songTitle, quality FROM song INNER JOIN recording ON song.songID = recording.songID • SELECT albumTitle, albumAge AS releaseYear - Date() FROM album
SQL Examples Continued • SELECT Count(artistID) from artist • SELECT MAX(recording.quality) FROM artist, recorded, recording WHERE artist.artistName = recorded.artistName and recorded.recordingID = recording.recordingID and artist.artistName = “The Beatles”
Typical Simple Study • Baseline and fixed number of follow-ups • Subject reaches each time point only once • Different time points have different scale protocols • Considerable overlap in scales between time points • Isolated from other studies
Four Database Design Approaches • Approach One: entire assessment administration as entity, e.g. all of baseline or all of 12 week • One table per time point, items as attributes • Approach Two: scale administrations within each assessment as entity, e.g. 12 Week Hamilton • One table per scale per time point, items as attributes • Approach Three: scale administration as entity • One table per scale, items as attributes • Approach Four: item as entity • One table (theoretically)
Evaluation of Approach One • May seem appropriate because common format for analysis is one record per subject • Problems • Limited number of fields allowed in some DBMSs • Will have many missing values • General redundancy issues (shares with Approach Two, to follow)
Pros and Cons of Approach Two Versus Approach Three • Pros • “Horizontal” format • Flexibility for handling inter-time point scale disparities • Cons (for simple studies) • Data model complexity • Table creation and modification time multiplied • Space consumption • More data locations (entry and retrieval complexity) • Re-assigning to different time points
Reassigning Scale Time Points Using Approaches Two and Three
Approach Two Cons for More Complex Studies • Poor at handling an indefinite number of follow-up time points • Modified Approach Three is better at handling studies in which subjects are assessed at the same time point multiple times • May happen due to progressing through the study multiple times • May also happen due to e.g. being screened multiple times
Evaluation of Approach Four • Pros • Could potentially handle changes more elegantly • Perhaps more “normalized” theoretically • Cons • Considerably harder to design entry interface • Harder to obtain data in formats usually required • Doesn’t fix non-database problems with data collection changes
Databases and Datasets • Database: Collection of data organized for efficient entry, updating, storage, and retrieval • Dataset: Subset of data retrieved from database in a format optimized for a specific reporting or analysis purpose • Well-designed databases should facilitate creation of datasets in any desired format • Datasets should be formatted for a particular purpose and used only for that purpose
Normalizing Data Continued: Comparison of Medication Queries
Multi-Study Issues: To Separate or Not To Separate • If same data will count for multiple studies • Keeping design and data in sync • E.g. updating all copies when data changed • E.g. ensuring scale changes are reflected in all tables and forms • If handling multiple, possibly “isolated” studies • Keeping design in sync • Can still use views so actual storage is transparent to user
Summary • Careful planning must go into designing a database • First step in design is to model the data • E-R relational model is effective • DBMSs, such as Access, offer tools for creating, using, and maintaining databases • When designing clinical research databases, as with any databases, priority should be normalization, hence elimination of redundancy • Properly designed databases will supply data in any format desired
Resources • Access Help (Help in top menu, Contents and Index, Contents tab) • Access Database Wizard (in main menu upon opening) • Oreilly Access Database Design & Programming, 3rd Edition • For database design theory: online chapter at http://www.oreilly.com/catalog/accessdata3/chapter/ch04.html • Access (97/2000/etc.) Bible • Available here; not 100% accurate • Database System Concepts Fourth Edition (Silberschatz, Korth, Sudarshan)