570 likes | 709 Views
Review of Assignment 3, Platform Choices, Security. Michael A. Kohn, MD, MPP 31 January 2012. Outline. Loose Ends : Move Epi 218 to Summer?, Yes/No Fields Assignment 3 Review Platform Choices HIPAA Privacy Rule and CFR 21 Part 11 Assignment 4. Housekeeping.
E N D
Review of Assignment 3, Platform Choices, Security Michael A. Kohn, MD, MPP 31 January 2012
Outline • Loose Ends : Move Epi 218 to Summer?, Yes/No Fields • Assignment 3 Review • Platform Choices • HIPAA Privacy Rule and CFR 21 Part 11 • Assignment 4
Housekeeping • Database demos with advice for Assignment 4/Final Project: Tuesday 2/7 • Assignment 4/Final Project is due 2/21
Move Epi 218 to Summer? REDCap Survey to be emailed to current ATCR and MCR students
Move Epi 218 to Summer? The TICR Program is considering moving its course on Database Management (Epi 218; Database Management for Clinical Research; Dr. Michael Kohn, Course Director) from its current time in the Winter Quarter to the Summer Quarter (August thru mid-September). The course would thus be required during students' initial quarter in their respective programs (either ATCR or Master's). The rationale for this proposed change is to give students the earliest possible exposure to best practices in data collection and management.
Move Epi 218 to Summer? How do you feel about this proposed change (moving the Epi 218 database course to Summer Quarter)? • I prefer that Epi 218 stay in Winter Quarter • I prefer that Epi 218 move to Summer Quarter
Move Epi 218 to Summer? Potential problems? • PSG Remote Desktop Accounts • REDCap Usernames • MyResearch Accounts • Stata
Loose End: Yes/No fields • Binary fields are not very useful, because you can’t distinguish “No” from blank (not valued). • I create a combo box like we used for Race in Lab 1 with 0 for “No” and 1 for “Yes”. This allows blank.
Assignment 3 Lab 3: Exporting and Analyzing Data 1/24/2012 Determine if neonatal jaundice was associated with the 5-year IQ scores and create a table, figure, or paragraph appropriate for the “Results” section of a manuscript summarizing the association. Extra Credit: Write a sentence or two for the “Methods” or “Results” section on inter-rater reliability. (Use Bland and Altman, BMJ 1996; 313:744)
Answer Of the infants with neonatal jaundice, 149 had IQ tests at age 5, and of the infants without neonatal jaundice, 248 had IQ tests. The mean (+SD) IQ score was significantly higher in the jaundice group, 111.5 +21.1, than in the no-jaundice group 101.4+20.5 -- difference 10.1 (95% CI 5.9 – 14.4).
Would you submit this for publication? ----------------------------------------------------------------------------- Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval] ---------+-------------------------------------------------------------------- No | 248 101.3925 1.303441 20.52661 98.8252 103.9597 Yes | 149 111.5358 1.732576 21.14879 108.112 114.9596 ---------+-------------------------------------------------------------------- combined | 397 105.1994 1.06956 21.31083 103.0967 107.3021 ---------+-------------------------------------------------------------------- diff | -10.14332 2.152007 -14.37414 -5.912502 ------------------------------------------------------------------------------ Degrees of freedom: 395 Ho: mean(No) - mean(Yes) = diff = 0 Ha: diff < 0 Ha: diff ~= 0 Ha: diff > 0 t = -4.7134 t = -4.7134 t = -4.7134 P < t = 0.0000 P > |t| = 0.0000 P > t = 1.0000
Essential Elements • Sample size (149 jaundiced, 248 non-jaundiced) • Indication of effect size (report both means, or the difference between them) • Get direction of effect right. (Jaundiced group did better!) • Indication of variability (Sample SDs, SEs of means, CIs of means, or CI of difference between means.)
Browner on Figures Figures should have a minimum of four data points. A figure that shows that the rate of colon cancer is higher in men than in women, or that diabetes is more common in Hispanics than in whites or blacks, [or that jaundiced babies had higher IQs at age 5 years than non-jaundiced babies,] is not worth the ink required to print it. Use text instead. Browner, WS. Publishing and Presenting Clinical Research; 1999; Williams and Wilkins. Pg. 90
Figure 1: In 149 infants with neonatal jaundice, the average IQ scores were higher compared to the 248 non-jaundiced infants when evaluated at age 5 (p<0.0001).
Box Plot • Median Line • Box extends from 25th to 75th percentile • Whiskers to upper and lower adjacent values • Adjacent value = 75th /25th percentile ±1.5 x IQR (interquartile range) • Values outside the adjacent values are graphed individually • Would be nice if area of box were proportional to sample size (N). In some box plots the width of the box is proportional to log N, but not in Stata.
Extra Credit Extra Credit • Report within-subject SD (4.0) as a measure of reliability. • Calculate repeatability (11.0) • Bland-Altman plot with mean difference and 95% limits of agreement* * Anna did this
Methods/Results Methods: We assessed inter-rater reliability of the IQ test by having different examiners re-test some of the children. We calculated the within-subject standard deviation and repeatability. (Bland and Altman, BMJ 1996; 313:744) Results: Different examiners re-tested 198 children. The within-subject standard deviation was 4.0, so the “repeatability” was 11.0, meaning that two examiners of the same subject would score within 11 points of each other 95 percent of the time. (Bland and Altman, BMJ 1996; 313:744)
N = 142 (children examined by both Satcher and Richmond) Mean Difference = 0.49 (95% CI -0.41 – 1.38) 95% Limits of Agreement: -10.1 – 11.0
N = 142 (children examined by both Satcher and Richmond) Mean difference = -.49 Limits of agreement (-11.0 - 10.1)
Bland-Altman in Stata ssc install batplot batplot richmondscore satcherscore, notrend title(Agreement between Richmond and Satcher) ytitle(Difference (Richmond - Satcher)) xtitle(Average of Richmond and Satcher)
Three Types of Research Database • Combination of paper files, Excel spreadsheets, and direct keyboard entry into the statistical analysis package. (Sinusitis DB) • Desktop multi-table relational database. --Access --Filemaker Pro • Web-Enabled Research Platform. --QuesGen (private vendor) --REDCap (academic consortium) --SurveyMonkey (private)
Advantages of Being Web-Based • Available anywhere with an internet connection • No software requirement beyond a browser • Easy to share data • No PHI on laptops or USB drives
Disadvantages of Being Web-based • Limited look-and-feel options on forms (In contrast, Access forms are highly customizable.) • Limited data structures • Requires an internet connection
Demo: Helen’s Record in … • Access • QuesGen* • REDCap * Use training.studydata.net, Jifuser02. Password: 199efe22
Demo: Helen’s Record in … • Access • Master list of subjects • Tabular display of data • Very flexible/customizable • QuesGen • Master list of subjects • Tabular display of data • Less flexible/customizable than Access • REDCap • No master list • No tabular display of data • Not flexible or customizable
Demo: NIH Report in … • Access* • QuesGen • REDCap** • *Query and Report: CountsByRaceAndSex • **ExportInfantJaundiceDataForLecture.dta • tabulate race sex • tabulate race sex if redcap_event_name == "Exam1", missing
Demo: NIH Report in … • Access • Query grid builds up SQL statement • Report based on query • Good report writer • QuesGen • Have to write SQL statement • Decent canned reports • Have to pay for custom reports • REDCap • Single table database • “Report Builder” is really a filtering tool • Querying and reporting requires export to Access or Stata
Demo: Calculate Subject’s Average IQ Score • Access* • QuesGen** • REDCap*** *Query: JaundiceScoreForExport **AvgScore ***Can’t do it
Demo: Calculate Subject’s Average IQ Score • Access • Query grid builds up SQL statement • Copy/Paste from query into Stata • QuesGen • Get QuesGen personnel to write the SQL statement • Extract to csv to Stata • REDCap • Can’t calculate averages across records • Must export to another tool • Nice Stata upload script
Summary • Access (Desktop Integrated DBMS) • Screen forms customizable/flexible • SQL-based with query grid • Multiple tables and relationships • Good report writer • QuesGen (Web-based) • Screen form less flexible than Access • SQL-based but no query grid • Supports multiple tables and relationships • Reliant on QuesGen personnel and support • REDCap (Web-based) • Easy-to-develop web forms with limited flexibility • No SQL • Single table • Must export data for querying, monitoring, etc.
Advantages of REDCap • Multiple user roles • PHI fields explicitly identified • Provided by UCSF • Templates for clinical research • Survey/Questionnaires with skip logic • Extensive auditing • Free!
Outline • DONE Assignment 3 Review • DONE Platform Choices • HIPAA Privacy Rule, CFR 21 Part 11 • Assignment 4
HIPAA Privacy Rule • Patient identifying information must be secure and available only to authorized personnel with auditing of all accesses • Patient identifying data include dates such as date of visit, date of surgery, etc.
Name Address (all geographic subdivisions smaller than state) All elements (except years) of dates related to an individual (birth date, admission date, date of death and exact age if over 89) Telephone numbers FAX number E-mail address Social Security number Medical record number Health plan beneficiary number Account number Certificate/license number Any vehicle or other device serial number Device identifiers or serial numbers Web URL Internet Protocol (IP) address numbers Finger or voice prints Photographic images
Wednesday, January 27, 2010 (SF Chronicle) UCSF patient records possibly compromised Victoria Colliver, Chronicle Staff Writer (01-27) 16:01 PST SAN FRANCISCO -- Medical records for about 4,400 UCSF patients are at risk after thieves stole a laptop from a medical school employee in November, UCSF officials said today. http://www.sfgate.com/cgi-bin/article.cgi?file=/c/a/2010/01/27/BA1U1BOI6U.DTL This problem can be avoided just by using a Remote Desktop as we do in this class.
CFR 21 Part 11 • Required for submission of electronic data to the FDA when applying for drug or device approval • Audit trail of all data entries, updates, and deletions.
What Have You Learned? • The meaning and importance of the terms “normalization”, “primary key”, and “foreign key”. • The difference between a flat-file database, and a normalized, multi-table relational database. • A little bit of Microsoft Access, REDCap, or QuesGen • Querying data • Exporting data for analysis in a statistical package • Field types • “Front End” (forms) vs. “Back End” (tables)
Before seeking help with data management Search the internet and ask other researchers for already developed data collection forms. Draft your data collection form. Test your data collection form with dummy subjects and, even better, with real (de-identified) study subjects. Enter your test data into a data table with rows corresponding to subjects and columns corresponding to data elements. (Use Excel, Access, Stata, or even Word.) Create or at least think about a data dictionary. Decide who will collect the data, and when/how the data will be collected.
Data Management Plan • http://hub.ucsf.edu/data-management
Data Management Plan • General description of database • Data collection and entry • Error checking and data validation • Analysis (e.g., export to Stata) • Security/confidentiality • Back up
General Description of Database • DBMS, e.g. MS Access XP • # of dynamic tables • # of static “lookup” tables • # of forms • # of reports An appendix could include the relationships diagram, the table names and descriptions, and the field names and descriptions (data dictionary). Print relationships diagram using either “Print Relationships” or taking a screen shot.
Data Collection and Entry • Import baseline data from existing systems • Import lab results, scan results (e.g. DEXA), holter monitor data, and other digital data. • For each form, who will collect the data? • Collect onto paper forms and then transcribe? Enter directly using screen forms? Scannable forms?
Error Checking and Validation • Standard field validation (e.g., for dates) and coding (No = 0, Yes = 1, Unknown = -9) • On entry of outcome classification (e.g. ruptured ectopic), prompted to specify criterion • Error: inadequate monitoring and interim analysis
Analysis • How will you get the data out of the database?
Security/Confidentiality • Keep identifying data (name, SSN, MRN) in a separate table. • Link rest of DB to this table via a Subject ID that has no meaning external to the DB. • Restrict access to identifying data. • Password protect at both OS and application levels. • Audit entries and updates.