450 likes | 560 Views
EPI 218 Database Management for Clinical Research Tables, Relationships, Normalization, Data Types, and Data Dictionaries. Michael A. Kohn, MD, MPP 1 August 2013. Clinical Research*. Choose the study design, and define the study population, predictor variables, and outcome variables;
E N D
EPI 218Database Management for Clinical ResearchTables, Relationships, Normalization, Data Types, and Data Dictionaries Michael A. Kohn, MD, MPP 1 August 2013
Clinical Research* • Choose the study design, and define the study population, predictor variables, and outcome variables; • measure these variables and anticipate problems with measurement; • analyze the results In this course, we discuss the “nitty gritty” of collecting, storing, updating, and monitoring the study measurements. *Private companies that make data management systems for clinical research understand “clinical research” to include only RCTs preparatory to FDA drug or device approval, not observational studies.
Outline • Housekeeping • Data Tables • Rows = Records; Columns = Fields • Normalization of Data Tables • Start Lab 1
Housekeeping Epi 218
Course website: http://www.epibiostat.ucsf.edu/courses/schedule/data_management.html • Lectures and Labs will be in China Basin Landing 6702 with overflow into 6704, 8:30 – 10:30 • “Learn MS Access 2000” video http://mkanders.com/learn_access_video.htm Username: ucsfdbclass Password: access2000 (We can also loan you the video on CD.)
Platforms • Access (Labs 1, 2 and 3) • REDCap (Lab 4) • QuesGen (Lab 5) • OnCore (Lab 6) May use other data management platforms for final project: -- SurveyMonkey -- Filemaker Pro -- Oncore -- OpenClinica -- Other
Microsoft Access • Integrated desktop database management platform • Uses SQL (Structured Query Language) • Has an outstanding graphical query design tool • Incorporates an excellent report writer • Based on the principles of the Relational Model • Relationships diagram has integrated referential integrity • Very flexible, infinitely customizable • NOT browser based, desktop application • Using advanced features usually requires hiring a developer
Microsoft Access Changed user interface between Access 2003 and Access 2007. If you are running Access 2003 or an earlier version, use the lab instructions for Access 2003. If you are running Access 2007 or 2010, use the lab instructions for Access 2007. DEB Terminal Server 185-RDS1.epi-ucsf.org has Access 2010. The others have Access 2003.
DEB Terminal Server • Provides a remote Windows desktop with Microsoft Office Professional • Remote Desktop client software freely available for the Mac and already part of Windows • http://www.microsoft.com/en-us/download/details.aspx?id=18140 • Obtain DEB Terminal Server username and password from HelpDesk@psg.ucsf.edu • Instructions available on course syllabus page
REDCap • Web-based research data collection system developed at Vanderbilt • Available free through UCSF Academic Research Systems • http://tinyurl.com/yh5m6ka • You are both the Principal Investigator and User 1. • Model= “Do-it-yourself”
QuesGen • Web-enabled research data collection and management platform developed (with UCSF input) by a private company based in Burlingame • More full-featured and customizable than REDCap, but primarily “pay-us-to-do-it” rather than “do-it-yourself” • User accounts for Epi 218 students
Learning Objectives • develop a multi-table, relational database for a research study using Microsoft Access • query a database for monitoring and analyzing research data • learn about REDCap: basic functions, advantages and limitations • understand the advantages and costs of other web-based platforms such as QuesGen • hear about data management for large-scale clinical trials in industry
Requirements • Turn in all 5 labs on time • Labs are due by midnight the following Thursday (Lab 1 due 8/8 at midnight) • Complete Final Project • Due 9/18/2013
Final Project: Part ASend in or Demonstrate Your Study DatabaseDue 9/20/2012 Send in a copy of your research study database*. We prefer a database that you are currently using or will use for a research study. However, a demonstration or pilot database is acceptable. *If you are unable to package your database in a file to email, you can send us a link or work out another way to review your database.
Final Project: Part ASend in or Demonstrate Your Study DatabaseDue 9/18/2013 If you are doing secondary analysis of data collected by someone else, • obtain the data collection forms* used in the original data collection, • set up a new database that you would use for a follow-up study. *Often easily obtained by doing a Google search or emailing the author of the original study.
Final Project: Part BSubmit Your Data Management PlanDue 9/18/2013 • General description of database • Data collection and entry • Error checking and data validation • Analysis (e.g., export to Stata) • Security/confidentiality • Back up
Final ProjectDue 9/18/2013 Start thinking about this now. Build your own study database as you work through the labs. Use extra time in lab to work on your study database. Set up appointments with course faculty early.
TICR Professional Conduct StatementClarifications for this class • I will maintain the highest standards of academic honesty • I will neither give nor receive aid in examinations or assignments unless such cooperation is expressly permitted by the instructor • I will conduct research in an unbiased manner, reports results truthfully, and credit ideas developed and work done by others • I will not use answer keys from prior years • I will write answers in my own words, and, when collaboration is permitted, acknowledge collaborators when answers are jointly formulated For Epi 218 – Just don’t turn in somebody else’s work as your own.
Data Tables Rows = Records = Entities Columns = Fields = Attributes
DCR Chapter 16 Exercise 2 The PHTSE (Pre-Hospital Treatment of Status Epilepticus) Study was a randomized blinded trial of lorazepam, diazepam, or placebo in the treatment of pre-hospital status epilepticus. The primary endpoint was termination of convulsions by hospital arrival. To enroll patients, paramedics contacted base hospital physicians by radio. The following are base-hospital physician data collection forms for 2 enrolled patients: Lowenstein DH, Alldredge BK, Allen F, Neuhaus J, Corry M, Gottwald M, et al. The prehospital treatment of status epilepticus (PHTSE) study: design and methodology. Control Clin Trials 2001;22(3):290-309. Alldredge BK, Gelb AM, Isaacs SM, Corry MD, Allen F, Ulrich S, et al. A comparison of lorazepam, diazepam, and placebo for the treatment of out-of-hospital status epilepticus. N Engl J Med 2001;345(9):631-7.
Display the data from these 2 data collection forms in a 2-row data table.
JIFee Jaundice and Infant Feeding Study Methods: Design-Nested double cohort study. Setting-Kaiser Subjects-Infants with neonatal jaundice and randomly selected non-jaundiced infants Predictor Variable-Presence or absence of jaundice Outcome Variable- Neuropsychological score (ranging from 55 to 145) at age 5 Analysis- ? Newman, T. B., P. Liljestrand, et al. (2006). "Outcomes among newborns with total serum bilirubin levels of 25 mg per deciliter or more." N Engl J Med354(18): 1889-900.
Infant Jaundice Study Data • Approximately 400 children • 5 examiners (doctors) • Approximately 700 neuropsychological examinations, measuring weight, height, and “NPScore” (IQ) • Some children to be examined more than once • No examiner to see the same child twice • If child died before age 5, store age and circumstances of death
Demonstration: Creating a Data Table Label columns and enter rows of data in datasheet view Where is predictor on data collection form?
Demonstration: Data Dictionary • Table design view: • field (=column) names, • data types, • definitions, • validation rules • (More on data types, free-text vs. coded responses, later)
Acceptable table showing one set of exam results per participant. (BabyExamForFigure3)
Demonstration Disallowed values Duplicate primary keys This automatic error checking and data validation IS why you need to enter your data into a computer; it is NOT why you need a relational DBMS. Many single-table products (Filemaker Pro, SAS FSP, even Excel) can do error checking and data validation.
Demonstration: Same Table in Excel, Stata • Excel • Stata • Etc Rows = Records = Entities Columns = Fields = Attributes Access and Stata have a special row at the top for column headings (=field names); Excel just uses the first row.
Table of Study Subjects Table of Study Subjects Row = Individual Infant Columns = ID#, Name, DOB, Sex, Jaundice If some infants have more than one exam, what do you do?
Undesirable table showing multiple exam results per study participant. (BabyExamForFigure4)
Demo • Find highest IQ Score • Find all exams done in April
Common Error • If you find yourself creating multiple columns for the same measurement, e.g., Date1, Score1, Date2, Score2, Date3, Score3, … • Or if your table is more than about 30 columns wide, • It is time to restructure your table.
Undesirable table with participant-specific data duplicated for each exam. (Note problem with Helen’s DOB.) (ExamBabyForFigure5)
Demo • Find highest IQ Score • Find all exams in a particular month • What is Helen’s birth date? • What happened to Alejandro, Ryan, Zachary, and Jackson?
Normalization If some infants have multiple exams, “normalize” the records into two tables, one for subjects and one for examinations.
Data normalized into two tables: one (“Baby”) with rows comprising subject-specific information; the other (“Exam”) with rows comprising exam-specific information. Note that Helen can only have one birth date. Subjects with no exams, e.g. Alejandro, still appear in the database. “SubjectID” functions as the primary key in the “Baby” table and as the foreign key in the “Exam” table.
Figure 7. Relationships diagram showing the one-to-many relationship between the table of subjects (“Baby”) and the table of measurements (“Exam”).
Demonstration Inability to create integrity violations with normalized tables. This IS why you need a multi-table relational DBMS.
Outline • Housekeeping • Data Tables • Rows = Records; Columns = Fields • Normalization of Data Tables • Start Lab 1