780 likes | 1.21k Views
Clinical-Data Acceptance Testing Procedure. Sunil Gupta Quintiles. What are examples of bad data?. Duplicate records exist Missing values in required variables Start dates are after stop dates Invalid value for variable Values are inconsistent across other variables or other data sets
E N D
Clinical-Data Acceptance Testing Procedure Sunil Gupta Quintiles
What are examples of bad data? • Duplicate records exist • Missing values in required variables • Start dates are after stop dates • Invalid value for variable • Values are inconsistent across other variables or other data sets • Etc.. Across all industries, bad data costs companies more than $ 600 billion per year. For clinical trials, up to 5 % of raw data values may initially be erroneous.
Sources of Data Quality Problems across all Industries 2001 survey by the Data Warehousing Institute Check Data: Each Data Transfer, Data Conversion or Data Updates
What are Clinical Data Issues? • Incorrect or Incomplete clinical data • Critical variables may need to be: • Non-missing • Consist only of valid values • Be within a range • Or be consistent with other variables. • Process using Edit Check Macros • Unit Testing • Universal principles 21 CFR Part 11 Prevent confusion and frustration. Prevent incorrect clinical study conclusion: Safety, Efficacy
Benefits of High Quality Data across all Industries 2001 survey by the Data Warehousing Institute About $20 to $25 per case report form page or up to 15% of clinical research budget may be spent to ensure data quality.
Clinical Data Process Flow Raw Data Demog: Valid/Invalid Data Vitals: Valid/Invalid Data Labs: Valid/Invalid Data AE: Valid/Invalid Data As a result, in general: - Keep invalid data in data sets - Exclude invalid data in reports Edit Check Process 1. Identify Invalid Data based on DMP 2. Isolate Data Issue 3. Communicate Finding to CDM Outcome 1. MONTHLY: Monitor Improvements in Invalid Data with e-mail notification to all team members 2. FINAL: Use Valid Data in Analysis data sets, Tables, Lists and Graphs
What is Data Acceptance Testing • Create Data Management Plan • Generally a monthly process to refresh data • Identify, isolate and report clinical data issues • Make critical decision before database lock to accept or reject database Similar to User Acceptance Testing for software application. Data
Solution involves four steps beforeDatabase Lock • 1. Specifying Requirements in Data Management Plan (DMP) • 2. Developing and Testing Edit Check Macros • 3. Communicating Results with Clinical Data Management (CDM) (Partnership) • 4. Monitoring the Metrics of Data Issues Using Edit Check Macros standardized our approach to validate the quality of raw clinical data.
Specifying Requirements in Data Management Plan (DMP) • All unique key variables are required. • Patient ID variable is non-missing and unique. • Confirm minimum and maximum values. • Vitals data set: valid temperature and blood pressure values within lower and upper range values. • Display all unique values of selected variables. • Demog data set: valid treatment (active, placebo). continued … At a minimum, these types of data checks should be performed.
Specifying Requirements in Data Management Plan (DMP) 4. Confirm the logic between two variables. • Adverse Events data set: adverse event description, preferred term and system organ class need to be consistent. 5. Confirm the consistency between two clinical dates. • Adverse Events data set: Adverse start dates before or same day as adverse stop dates. continued … At a minimum, these types of data checks should be performed.
Specifying Requirements in Data Management Plan (DMP) • Are there any protocol violations that should be excluded from analysis? • Are the treatment groups randomly distributed based on safety subset population? • Have lab values been correctly converted from reported units to standard international units? • Are patient follow-up visit windows in compliance with the protocol? Ideally, these types of standard data checks should also be performed.
Developing and Testing Edit Check Macros • System Requirements • Unselect data checks • Easily modify data checks, Add new data checks • Display ‘No records found’ for no data issues • Display feedback from CDM on data issues • Limited Programming Resources • SAS’s ODS, Minimum SAS macro programming • Simple, task-oriented macros approach • Apply standard options to selected SAS Procedures A traditionally lengthy SAS program of over 1,000 lines is easier to read with only 75 lines containing 75 edit check macro calls. continued …
Edit Check Macros: Functional Requirements • Macros use basic macro programming techniques that are easy to understand • Quick development of new macros • Quick enhancements to existing macros • Macros provide informative feedback in titles • Input data set name • Variables checked • Any subset condition applied Data set name, variables checked and subset condition are important information for CDM to first confirm the data issue before taking any action. continued …
Edit Check Macros: Functional Requirements • Macros provide reference information in footnotes • Program name • Output file name • Date executed • Macros display data issues • Patient and visit identification • Data values of variable checked • Supporting variables (if any) • One data issue/page • Findings saved to one RTF file
One Edit Check Macro for each type of data issue (partial list) * May require extra programming step since most all edit check macros require single data set.
Scope of Edit Check Macros Involved Multiple Procedures %crt_comp (Proc Sort, Data Step, Proc Compare, etc.) %u_ecdup (5 SAS Procedures, 2 Data Steps ) Simple Single Procedure %u_ecprnt (Proc Print) %u_ecfreq (Proc Freq) %u_ecmens (Proc Means)
Basic Edit Check Macro call with standard macro parameters %let study = proto1; < libref of data sets – raw or analysis > %edit_check_macro( edsn = &study.XXX < data set name with optional data set option or if used with a SAS procedure then a SAS procedure option >, ecvr = < one or more edit check variables >, byvars = < patient for by-group processing >, ecvrl = < one or more extra supporting variables >, esub = < subset condition using “where statement” >, fnote = < user defined message to understand data check > );
SAS code for %u_ecfreq Edit Check Macro %macro u_ecfreq( edsn = &study.ptinfo, /* data set name */ ecvr = %str(_all_), /* variable or list of variables, can also pass cross tables with options */ esub = , /* subset where condition, can also be a format statement */ fnote = /* optional footnotes */ ); continued …
SAS code for %u_ecfreq Edit Check Macro title1 "Study: &study - Edit Check in &edsn dataset"; title2 "PROC FREQ of %upcase(&ecvr) variable with the condition: &esub"; %if %length(&fnote) > 0 %then %do; %if %index(&fnote,\) = 0 %then footnote3 &fnote.;; %if %index(&fnote,\) > 0 %then %do; %let fnote3 = %scan(&fnote,1,\); footnote3 &fnote3; %let fnote4 = %scan(&fnote,2,\); footnote4 &fnote4; %end; %end; continued …
SAS code for %u_ecfreq Edit Check Macro proc freq data = &edsn; tables &ecvr / list missing nocum; &esub; run; footnote3; %mend u_ecfreq; Temporary or permanent, raw or derived data set, with data set options Single or multiple crossed variables with standard statement options Independent single or multiple SAS statements, generally a WHERE statement to apply subset condition Edit Check macro parameters are directly applied to PROC FREQ.
%u_ecfreq Edit Check Macro call %u_ecfreq( edsn = &study..ptinfo, ecvr = sex race, esub =, fnote = %str(Note to CDM: Confirm valid gender \Note to CDM: Confirm valid race)); Aside from the footnote, the %u_ecfreq edit check macro call takes only 1 SAS line instead of 16 SAS lines.
Communicating Results with Clinical Data Management (CDM) - Output file Study: proto1 - Edit Check in proto1.ptinfo dataset PROC FREQ of SEX RACE variable with the condition: The FREQ Procedure sex Frequency Percent ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Female 4 40.00 Mal 2 20.00 Male 4 40.00 Key Info Only continued … Macro parameters are directly applied to titles.
Communicating Results with Clinical Data Management (CDM) - Output file race Frequency Percent ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Asian 1 10.00 Black 1 10.00 White 7 70.00 X 1 10.00 Program: t_data_acceptance_checks.sas Output: edit_check_results.rtf 12MAY2007 Note to CDM: Confirm valid gender Note to CDM: Confirm valid race
%crt_comp Edit Check Macro call %crt_comp( ec1dir =, /* prefix study directory data set 1 */ ec1dsn =, /* data set name 1 */ ec1var =, /* variable names from data set 1 */ ec2dir =, /* prefix study directory data set 2 */ ec2dsn =, /* data set name 2 */ eco2dsn =, /* output data set name 2 */ ecr2dsn = 0, /* remove duplicate records 0 = no, 1 = yes */ ec2var =, /* variable names from data set 2 */ ecvrlt =, /* common variable names to compare or _all_ */ ecid =, /* matching variable name for data set 1 */ ecid2 =, /* matching variable name for data set 2 */ ec1wst =, /* optional where condition on data set 1 */ ec2wst =, /* optional where condition on data set 2 */ ec1opt =, /* optional option condition on data set 1 */ ec2opt =, /* optional option condition on data set 2 */ cmpopt = %str(method=exact) /* proc compare method option */ );
%crt_comp Edit Check Macro call %crt_comp( ec1dir = a, ec1dsn = ptinfo, ecid = patient, ec2dir = a, ec2dsn = mhist, ecid2 = patient); Simple, one SAS line macro call provides comprehensive PROC COMPARE results to automatically identify and compare common variables between ptinfo and mhist data sets.
%crt_comp Edit Check Macro – Output file Study: proto1 - Edit Check in proto1.ptinfo dataset Compare aproto1: BASE Dataset ptinfo with aproto1: COMPARE Dataset mhist Compare common variables Column Name ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ patient age sex race ... Automatic %crt_comp macro assumes that common variables have same variable names. No hard coding of variable names is required.
%u_ecdup Edit Check Macro call * Check for duplicate records by all variables; %u_ecdup( edsn = &study..ptinfo, /* Data set name */ ecvr = _all_, /* Error check variables, default is all*/ ecvrl =, /* Extra variables to be kept in the output dataset */ fnote =, /* Optional footnote */ eoutdsn = /* Output dataset */ );
Communicating Results with Clinical Data Management (CDM) - Output file Study: proto1 - Edit Check in proto1.ptinfo dataset Indicate the result of checking for duplicated records Error Code No duplicated records found for _ALL_ Program: t_data_acceptance_checks.sas Output: edit_check_results.rtf 12MAY2007
Communicating Results with Clinical Data Management (CDM) - Output file * Check for duplicate records by key variables; %u_ecdup(edsn = &study..ptinfo, ecvr = patient); Study: proto1 - Edit Check in proto1.ptinfo dataset (Results on next page) Indicate the result of checking for duplicated records Error Code No duplicated records found for PATIENT Program: t_data_acceptance_checks.sas Output: edit_check_results.rtf 12MAY2007
%u_eccust Edit Check Macro call %u_eccust( edsn = c990736.vlabs, /* Dataset name */ ecvr = labdt, /* Variable for edit check*/ ecst = %str(if labnme = 'wbc' and not ((.1 < labv < 100))), /* Actual IF condition statement with NOT built in – expect zero records with clean data */ titles =, /* Extra Title */ fnote =, /* Optional footnote */ ); %u_eccust macro allows for almost any customized data check using a valid single ‘IF’ statement which is applied in a DATA step.
%u_eccust Edit Check Macro call %u_eccust( edsn = h&study..hematol, ecvr = hehgb hehgbf hehgbl hehgbh, ecst = if nmiss(hehgb, hehgbl, hehgbh) = 0 and ( (hehgb < hehgbl and hehgbf ne 'L' ) or (hehgb > hehgbh and hehgbf ne 'H' ) or (hehgbl <= hehgb <= hehgbh and hehgbf ne 'N' ) ) );
Flexibility with Edit Check Macros Parameters to Edit Check macros can be macro variables. - Can group macros by data set, variable, etc. - Can apply conditional execution based on non-zero obs %macro ec_run(dsetn =, whrcon =); %u_ecfreq(edsn= &dsetn,ecvr= race,esub= &whrcon); %u_ecfreq(edsn= &dsetn,ecvr= age,esub= &whrcon); %negval(ngdst= &dsetn,ngvr= age,ngwhr= &whrcon); %mend ec_run; %ec_run(dsetn= ptinfo, whrcon= %str(itt = 1));
Making multiple Edit Check Macro calls %u_ecfreq(edsn= &study..hematol, ecvr= hewbc, esub= where hewbc <= .); %u_ecfreq(edsn=&study..hematol, ecvr= heneut, esub= where heneut <= .); %u_ecfreq(edsn=&study..hematol, ecvr= hesegs, esub= where hesegs <= .); %u_ecfreq(edsn=&study..hematol, ecvr= heband, esub= where heband <= .); %u_ecfreq(edsn=&study..hematol, ecvr= hegran, esub= where hegran <= .); %u_ecfreq(edsn=&study..hematol, ecvr= hebaso, esub= where hebaso <= .); One SAS line macro call for each lab test instead of 14 SAS lines. Focus is generating output instead of writing SAS code.
Making multiple Edit Check Macro calls %*u_ecfreq(edsn= &study..hematol, ecvr= hewbc, esub= where hewbc <= .); %*u_ecfreq(edsn= &study..hematol, ecvr= heneut, esub= where heneut <= .); %*u_ecfreq(edsn= &study..hematol, ecvr= hesegs, esub= where hesegs <= .); %u_ecfreq(edsn= &study..hematol, ecvr= heband, esub= where heband <= .); %u_ecfreq(edsn= &study..hematol, ecvr= hegran, esub= where hegran <= .); %u_ecfreq(edsn= &study..hematol, ecvr= hebaso, esub= where hebaso <= .);
Communicating Results with Clinical Data Management (CDM) • Provide all important information: • Data set name, Patient Number, Visit Date, Variables checked, Supporting Variables, Subset condition • One page for each data issue • Confirm data issue exists in raw data sets • Identify discrepancy between biostats expectations and CDM’s service • CDM updates database design or new variable calculation as needed • CDM updates/adds CDM data checks
SOPs on Clinical Data Acceptance Testing • Data Standards: SOPs on accessing and writing to data sets • Input – Reading and archiving of data from various sources • Mapping – Documented system to convert data values • Output – Structuring and archiving of output data • Data Quality: SOPs on identify and resolving data issues • Standard Edit Checks (ex. Duplicate records) • Edit Checks based on Data Management Plan (DMP) • Cleaning: Screening of all data sets, all variables and all records (Some fails may be acceptable) • CDAT: Thorough checking of all critical (primary endpoint) variables (No fail is acceptable) • Query Resolution – Feedback loop to monitor data improvement
Metrics on Clinical Data Acceptance Testing • Unit: # of edit checks tested (based on DMP) • Summary level measurement (Scope of issue – at least one failed patient out of x edit checks) = # of failed edit checks/Total # of edit checks • Detail level measurement (Impact of issue – systematic or localized problem?) • Overall = # of failed records/(Total # of records x Total # of edit checks) • By edit check = # of failed records/Total # of records Goal: To capture and monitor the correction of unexpected data.
1. Ptno is a required variable – non-missing. %u_ecprnt(edsn= work.ptinfo (obs=5), ecvr= age sex race); Study: Test - Edit Check in work.ptinfo dataset PROC PRINT of age sex race variables, all observations Obs ptno age sex race 1 . 35 Mal X 2 102 0 Mal White 3 105 -2 Female White 4 107 30 Male Black 5 110 31 Male White
2. Age range acceptable (1 - 100)? %u_ecmens(edsn = work.ptinfo, ecvr = age); Study: Test - Edit Check in work.ptinfo dataset PROC MEANS of AGE variable with the condition: The MEANS Procedure Analysis Variable : age N Mean Std Dev ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ 10 28.6000000 17.2832610 ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ N Minimum Maximum ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ 10 -2.0000000 54.0000000 ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
2. Age range acceptable (1 - 100)? proc format; value agegrp 1 - 100 = 'Acceptable'; run; %u_ecfreq(edsn= work.ptinfo, ecvr= age, esub= %str(format age agegrp.)); Study: Test - Edit Check in work.ptinfo dataset PROC FREQ of AGE variable with the condition: format age agegrp. The FREQ Procedure age Frequency Percent ------------------------------------------ -2 1 10.00 0 1 10.00 Acceptable 8 80.00
3. Sex values acceptable (Male, Female)? %u_ecfreq(edsn= work.ptinfo, ecvr= sex); Study: Test - Edit Check in work.ptinfo dataset PROC FREQ of SEX variable with the condition: The FREQ Procedure sex Frequency Percent ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Female 4 40.00 Mal 2 20.00 Male 4 40.00
4. Race values acceptable (Asian, Black, White)? %u_ecfreq(edsn= work.ptinfo, ecvr= race); Study: Test - Edit Check in work.ptinfo dataset PROC FREQ of RACE variable with the condition: The FREQ Procedure race Frequency Percent ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Asian 1 10.00 Black 1 10.00 White 7 70.00 X 1 10.00
5. Do duplicate records exist? %u_ecdup(edsn= work.ptinfo, ecvr= ptno); Error Code No duplicated records found for PTNO
Metrics on Clinical Data Acceptance Testing • Unit: 5 edit checks tested • Summary level measurement (Scope) 80% = 4 failed edit checks/5 edit checks • Detail level measurement (Impact) • Overall 12% = 6 failed records/(10 records x 5 edit checks) • By edit check: 1. Patno (10%), 2. Age (20%), 3. Sex (20%), 4. Race (10%), 5. Duplicates (0%) Maybe fail Data Acceptance Test since Patno is a unique critical variable?
Possible Reasons for Poor Clinical Data Summary: High Low Detail: Low High
Summary: What the Edit Check MacrosAre Not Designed to do • Totally replace traditional SAS programming in validation programs; they are designed to supplement them • Any data manipulation using the DATA STEP • Display user friendly messages to help diagnose warnings or error messages • Provide flexibility to use all SAS procedure options
Summary: What the Edit Check MacrosWere Designed to do • Enable quick startup with minimum skills so that any programmer could assist as needed • Improve productivity of standard checks by 80% so that more time can be spent on complex checks • As data is downloaded, reduce confusion within the team early since data issues are known/accounted for • Improve the quality and reliability of source clinical data so that final data can be accepted at database lock time