at SUGI 2006. A few new things I learned…. in San Francisco. I will talk a bit about…. CDISC – What is it? Will any of us ever use it? Is it really that terrible??? Some Simple PROC REPORT that I have found useful Some new graphics news in SAS
at SUGI 2006 A few new things I learned… in San Francisco
I will talk a bit about… • CDISC – What is it? Will any of us ever use it? Is it really that terrible??? • Some Simple PROC REPORT that I have found useful • Some new graphics news in SAS • Some other random SAS things I saw that could be useful
But first… Thanks to Alex, Emma and the kid
CDISC – a brief introduction… • CDISC stands for the Clinical Data Interchange Standards Consortium • Mission statement: The mission of CDISC is to develop and support global, platform-independent data standards that enable information system interoperability to improve medical research and related areas of healthcare.
CDISC – a brief introduction… • Essentially, CDISC aims to standardize clinical trial datasets so that they can be easily interpreted and processed when passed to regulatory agencies • The CDISC “movement” has developed very strict standards for databases, affecting variable/table naming conventions, formats, what gets databased and where, what is included in metadata, etc. • One end goal is to have datasets so standardized that regulatory bodies can apply data viewing tools universally to all submission datasets without additional programming
CDISC – Who are these people? • Started in Fall 1997 as a ‘grass roots’ volunteer group – 25 people attended the first meeting • Currently an independent ‘non-profit’ group with >150 corporate memberships and >50 corporate sponsors • Models for datasets are developed by teams of members (i.e. data users!) and are approved using a consensus-based approach – has drawbacks! • Local groups exist in some major centres (NJ, SF, Philadelphia in US) – there are some articles on groups in India and Japan meeting about CDISC too
Will I ever see these datasets? • Most global pharmaceutical companies are adopting CDISC as their standard for submission databases • US FDA is also working to adopt such standards for submissions, and is compiling them into the JANUS database (a cross-study database) • Several workshops and talks at SUGI related to programming implications of CDISC • Likely will not be adopted soon in academic settings, outside of collaborative projects • There are other standards development organizations (SDOs) in use (e.g. HL7)
Where do I learn about the standards? • CDISC web-site: www.cdisc.org has an implementation guide giving database specifications • Learning curve is BIG – it is hard at times to visualize the datasets described in the guide • Language is also difficult, as they have chosen their own terminology to describe the trial* • Courses are available from CDISC, but are $$$ ($500-2000 USD) *example coming…
Some sample language… • For CDISC the trial is defined in terms of Elements, Arms, Visits and Epochs • The datasets and metadata standards at times request that you describe the trial or an observation with these terms • Roughly speaking: Elements are the building blocks of Arms, Visits the times patients are observed, and Epochs are similar to Elements, but useful for blinded studies… …we need an example!
Epochs, Elements, Visits and Arms… [Figure: trial design diagram laying out Arm 1, Arm 2 and Arm 3 across the Epochs, with Visits marked] • 3 Arms • 7 Epochs • 6 Elements (Screen, Placebo, 5 mg, Rest,…) • 11 Visits
What does a standardized database look like? • The database is broken down into several datasets called domains • Some examples of domains would be AE (adverse events), VS (vital signs), EG (ECG data), MH (Medical History), IE (incl/excl criteria),… • In each domain, all variables have been predefined by the standards and must be present! • If something on the CRF cannot be captured by the existing dataset specs, variables CANNOT be added to those sets → SUPPQUAL it!
Some features you will notice in the sets… • Prefer vertical format, rather than horizontal – hence lots of “stacking” • Dates are in ISO 8601 (yyyy-mm-ddThh:mm:ss) • Durations also in ISO 8601 – brutal! For example, 2 days = P2D, and 8 yrs, 2 months, 3 days & 3 hrs = P8Y2M3DT3H • Variable names all begin with the two character domain name (few exceptions) but across datasets are otherwise alike (e.g. AETERM and MHTERM)
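As a rough illustration of the two ISO 8601 notations on this slide, here is a small DATA step sketch. The variable names are made up for the example, and the E8601DT format is an assumption about your SAS release (I believe it arrives with 9.2, while 9.1.3 offers IS8601DT for the same purpose). The duration is simply assembled as text, which is not necessarily how a CDISC conversion tool would do it.

```sas
/* Sketch only: variable names and values are hypothetical */
data _null_;
  /* A datetime written as yyyy-mm-ddThh:mm:ss */
  dt     = '15JUN2006:09:30:00'dt;
  iso_dt = put(dt, e8601dt19.);   /* assumes E8601DT is available (SAS 9.2+) */
  put iso_dt=;

  /* A duration assembled piece by piece: PnYnMnDTnH */
  years = 8; months = 2; days = 3; hours = 3;
  iso_dur = cats('P', years, 'Y', months, 'M', days, 'D', 'T', hours, 'H');
  put iso_dur=;
run;
```

The "T" separates the date part of the duration from the time part, which is why it appears even when only hours follow it.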
Let’s look at the Vital Signs set… • Take note of the STUDYID, USUBJID and DOMAIN variables – all others are VS-- • Has the “stacked” vertical format where weight, systolic BP, diastolic BP are in rows, not columns • Many, many variables defined for time-point identification and decodes for abbreviated terms Here is the VS dataset
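To make the “stacked” idea concrete, here is a minimal sketch that turns a horizontal vitals dataset (one row per subject) into a vertical, VS-like one (one row per subject per test). The input data, the dataset names and the study identifier are invented for illustration, and only a handful of VS variables are shown; a real VS domain requires many more.

```sas
/* Hypothetical "horizontal" vitals data: one row per subject */
data vitals_wide;
  input usubjid $ weight sysbp diabp;
  datalines;
S001 72.5 120 80
S002 88.0 135 85
;
run;

/* Stack to one row per subject per test (heavily simplified VS layout) */
data vs_sketch;
  set vitals_wide;
  length studyid $8 domain $2 vstestcd $8;
  studyid = 'DEMO01';   /* made-up study identifier */
  domain  = 'VS';
  vstestcd = 'WEIGHT'; vsstresn = weight; output;
  vstestcd = 'SYSBP';  vsstresn = sysbp;  output;
  vstestcd = 'DIABP';  vsstresn = diabp;  output;
  keep studyid domain usubjid vstestcd vsstresn;
run;

proc print data=vs_sketch noobs;
run;
```

Each input row produces three output rows, so two subjects become six records, with the test code carried in VSTESTCD rather than in the column name.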
And a quick look at other sets… • The EX set -- exposure to study drug -- contains all of the dosing info about the drug and the dates started/stopped • The DM set -- contains the demographic data – oddly, the “RACE – OTHER” text that is often collected goes in another table • The LB set -- contains the lab assessments, also stacked vertically.
And now for some metadata… • These datasets would contain all of the design info for the trial as well as descriptions of the epochs, elements, visits and arms Here is a sample of some of the sheets that might make up the metadata file
Some simple -- but maybe useful -- PROC REPORT… • These simple examples are all related to printing simple listings of data – things you would normally do in PROC PRINT • These 3 examples are for .lst file output control, and will not work with ODS RTF We’ll start with an example to fix something I HATE!!! When a simple printout I want to make looks like…
PROC PRINT output… /June_2006/proc_report_ex1.sas Output from PROC PRINT Obs pat_id treat 1 1-1 2 2 1-2 1 3 1-3 2 4 1-4 1 5 1-5 2 6 1-6 1 7 1-7 2 8 1-8 1 9 1-9 2 10 1-10 1 11 1-11 2 12 1-12 1 13 1-13 2 14 1-14 1 15 1-15 2 16 2-16 1 17 2-17 2 18 2-18 1 19 2-19 2 20 2-20 1 21 2-21 2 22 2-22 1 23 2-23 2 24 2-24 1 25 2-25 2 26 2-26 1 27 2-27 2 28 2-28 1 29 2-29 2 30 2-30 1 31 2-31 2 32 2-32 1 33 2-33 2 34 2-34 1 35 3-35 2 36 3-36 1 37 3-37 2 38 3-38 1 39 3-39 2 40 3-40 1 41 3-41 2 42 3-42 1
…it keeps going and going… /June_2006/proc_report_ex1.sas Output from PROC PRINT Obs pat_id treat 43 3-43 2 44 3-44 1 45 3-45 2 46 3-46 1 47 3-47 2 48 3-48 1 49 3-49 2 50 3-50 1 51 3-51 2 52 3-52 1 53 3-53 2 54 3-54 1 55 3-55 2 56 3-56 1 57 3-57 2 58 3-58 1 59 3-59 2 60 3-60 1 61 3-61 2 62 3-62 1 63 3-63 2 64 3-64 1 65 3-65 2 66 3-66 1 67 3-67 2 68 3-68 1 69 3-69 2 70 3-70 1 71 3-71 2 72 3-72 1 73 3-73 2 74 3-74 1 75 3-75 2 76 3-76 1 77 3-77 2 78 3-78 1 79 3-79 2
…splitting on several pages. /June_2006/proc_report_ex1.sas Output from PROC PRINT Obs pat_id treat 80 3-80 1 81 3-81 2 82 3-82 1 83 3-83 2 84 3-84 1 85 3-85 2 86 3-86 1 87 3-87 2 88 3-88 1 89 3-89 2 90 3-90 1 91 3-91 2 92 3-92 1 93 3-93 2 94 3-94 1 95 3-95 2 96 3-96 1 97 3-97 2 98 3-98 1 99 3-99 2 100 3-100 1
Panel printing #1 – output… /June_2006/proc_report_ex1.sas Very Simple Printing in Multiple Columns Using PROC REPORT pat_id treat pat_id treat pat_id treat 1-1 2 3-36 1 3-71 2 1-2 1 3-37 2 3-72 1 1-3 2 3-38 1 3-73 2 1-4 1 3-39 2 3-74 1 1-5 2 3-40 1 3-75 2 1-6 1 3-41 2 3-76 1 1-7 2 3-42 1 3-77 2 1-8 1 3-43 2 3-78 1 1-9 2 3-44 1 3-79 2 1-10 1 3-45 2 3-80 1 1-11 2 3-46 1 3-81 2 1-12 1 3-47 2 3-82 1 1-13 2 3-48 1 3-83 2 1-14 1 3-49 2 3-84 1 1-15 2 3-50 1 3-85 2 2-16 1 3-51 2 3-86 1 2-17 2 3-52 1 3-87 2 2-18 1 3-53 2 3-88 1 2-19 2 3-54 1 3-89 2 2-20 1 3-55 2 3-90 1 2-21 2 3-56 1 3-91 2 2-22 1 3-57 2 3-92 1 2-23 2 3-58 1 3-93 2 2-24 1 3-59 2 3-94 1 2-25 2 3-60 1 3-95 2 2-26 1 3-61 2 3-96 1 2-27 2 3-62 1 3-97 2 2-28 1 3-63 2 3-98 1 2-29 2 3-64 1 3-99 2 2-30 1 3-65 2 3-100 1 2-31 2 3-66 1 2-32 1 3-67 2 2-33 2 3-68 1 2-34 1 3-69 2 3-35 2 3-70 1
Panel printing #1 – code…
proc report panels=3 ps=39 ls=110;
  title2 "Printing in Multiple Columns...";
  columns pat_id treat;
  define pat_id / display;
run;
• Key option is the “panels” option in the PROC REPORT statement
• List variables to print in “columns” statement
• Must “define” at least one of the columns
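To try the panel code, some data is needed first. The sketch below builds a stand-in dataset that resembles the listing shown earlier (100 patients, ids of the form site-patient, alternating treatment codes). The dataset name and the site cut-offs are my own guesses from the printed output, not the author's actual data. Because the PROC REPORT call names no data= option, SAS will use the most recently created dataset, so running this step immediately beforehand is enough.

```sas
/* Hypothetical test data mimicking the listing: not the original dataset */
data patients;
  length pat_id $8;
  do patnum = 1 to 100;
    /* Site boundaries guessed from the printed ids */
    if patnum <= 15 then centre = 1;
    else if patnum <= 34 then centre = 2;
    else centre = 3;
    pat_id = catx('-', centre, patnum);   /* e.g. 1-1, 2-16, 3-35 */
    treat  = 1 + mod(patnum, 2);          /* odd patients get 2, even get 1 */
    output;
  end;
run;
```

With ps=39, each panel holds roughly 35 data rows after titles and headers, which is why 100 patients fit on a single page in three panels.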
Panel printing #2 – output… /June_2006/proc_report_ex1.sas Simple Printing in Multiple Columns Using PROC REPORT pat_id treat pat_id treat pat_id treat 1-1 2 | 3-36 1 | 3-71 2 | 1-2 1 | 3-37 2 | 3-72 1 | 1-3 2 | 3-38 1 | 3-73 2 | 1-4 1 | 3-39 2 | 3-74 1 | 1-5 2 | 3-40 1 | 3-75 2 | 1-6 1 | 3-41 2 | 3-76 1 | 1-7 2 | 3-42 1 | 3-77 2 | 1-8 1 | 3-43 2 | 3-78 1 | 1-9 2 | 3-44 1 | 3-79 2 | 1-10 1 | 3-45 2 | 3-80 1 | 1-11 2 | 3-46 1 | 3-81 2 | 1-12 1 | 3-47 2 | 3-82 1 | 1-13 2 | 3-48 1 | 3-83 2 | 1-14 1 | 3-49 2 | 3-84 1 | 1-15 2 | 3-50 1 | 3-85 2 | 2-16 1 | 3-51 2 | 3-86 1 | 2-17 2 | 3-52 1 | 3-87 2 | 2-18 1 | 3-53 2 | 3-88 1 | 2-19 2 | 3-54 1 | 3-89 2 | 2-20 1 | 3-55 2 | 3-90 1 | 2-21 2 | 3-56 1 | 3-91 2 | 2-22 1 | 3-57 2 | 3-92 1 | 2-23 2 | 3-58 1 | 3-93 2 | 2-24 1 | 3-59 2 | 3-94 1 | 2-25 2 | 3-60 1 | 3-95 2 | 2-26 1 | 3-61 2 | 3-96 1 | 2-27 2 | 3-62 1 | 3-97 2 | 2-28 1 | 3-63 2 | 3-98 1 | 2-29 2 | 3-64 1 | 3-99 2 | 2-30 1 | 3-65 2 | 3-100 1 | 2-31 2 | 3-66 1 | 2-32 1 | 3-67 2 | 2-33 2 | 3-68 1 | 2-34 1 | 3-69 2 | 3-35 2 | 3-70 1 |
Panel printing #2 – code…
proc report panels=3 ps=39 ls=110;
  title2 "Printing in Multiple Columns…";
  columns pat_id treat lines;
  define pat_id / display;
  define lines / computed ' ';
  compute lines / char length=5;
    lines=" | ";
  endcomp;
run;
• Here we need to use a “compute” statement and column type “computed” to produce a new variable inside PROC REPORT
Panel printing #3 – output… /June_2006/proc_report_ex1.sas Printing Simple Patient Profiles/Medication Labels/Addresses in Multiple Columns Using PROC REPORT SITE: 1 SITE: 1 SITE: 1 SITE: 2 PATIENT: 1 PATIENT: 6 PATIENT: 11 PATIENT: 16 Rx Group: Old Rx Rx Group: New Rx Rx Group: Old Rx Rx Group: New Rx Outcome: No Event Outcome: No Event Outcome: No Event Outcome: Event SITE: 1 SITE: 1 SITE: 1 SITE: 2 PATIENT: 2 PATIENT: 7 PATIENT: 12 PATIENT: 17 Rx Group: New Rx Rx Group: Old Rx Rx Group: New Rx Rx Group: Old Rx Outcome: No Event Outcome: No Event Outcome: No Event Outcome: No Event SITE: 1 SITE: 1 SITE: 1 SITE: 2 PATIENT: 3 PATIENT: 8 PATIENT: 13 PATIENT: 18 Rx Group: Old Rx Rx Group: New Rx Rx Group: Old Rx Rx Group: New Rx Outcome: No Event Outcome: No Event Outcome: No Event Outcome: No Event SITE: 1 SITE: 1 SITE: 1 SITE: 2 PATIENT: 4 PATIENT: 9 PATIENT: 14 PATIENT: 19 Rx Group: New Rx Rx Group: Old Rx Rx Group: New Rx Rx Group: Old Rx Outcome: No Event Outcome: No Event Outcome: Event Outcome: No Event SITE: 1 SITE: 1 SITE: 1 SITE: 2 PATIENT: 5 PATIENT: 10 PATIENT: 15 PATIENT: 20 Rx Group: Old Rx Rx Group: New Rx Rx Group: Old Rx Rx Group: New Rx Outcome: No Event Outcome: No Event Outcome: Event Outcome: Event
Panel printing #3 – key commands… • Need to make a new variable “block” out of the variables of interest • Define this variable inside PROC REPORT using a “computed” variable • Use the “flow” option on the new variable and define special characters to denote breaks in the line printing • Set other variables to NOPRINT
Panel printing #3 – code…
proc report panels=99 ps=39 ls=110 split='\';
  title2 "Printing Simple Patient Profiles/Medication...";
  columns centre patnum treat outcome block;
  define centre / display noprint;
  define patnum / display noprint;
  define treat / display noprint;
  define outcome / display noprint;
  define block / computed flow ' ' width=20;
  compute block / char length=100;
    block=' SITE: '||compress(centre)||
          '\ PATIENT: '||compress(patnum)||
          '\ Rx Group: '||put(treat,trt.)||
          '\ Outcome: '||put(outcome,outof.)||
          '\\\';
  endcomp;
run;
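The code above relies on two user-defined formats, trt. and outof., that are not shown on the slide. A plausible sketch of them follows. The labels come from the printed output, but the numeric codings (which value means “Old Rx”, which means “Event”) are assumptions on my part and should be adjusted to match the real data.

```sas
/* Assumed codings: the slide shows the labels but not the underlying values */
proc format;
  value trt
    1 = 'Old Rx'
    2 = 'New Rx';
  value outof
    0 = 'No Event'
    1 = 'Event';
run;
```

These formats need to be defined before the PROC REPORT step runs, otherwise the put() calls inside the compute block will fail.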
ODS extension to graphics… • Available as a test version in 9.1.3; expected to become production in version 9.2 (~Spring 2007) • Essentially SAS will automatically produce graphics for standard PROCs like it does now for analysis summaries • You can then select which plots you want using an “ODS GRAPHICS” statement similar to the usual ODS command
A simple example… • A Kaplan-Meier Survival plot can be generated with the following simple code • Graph will be sent to HTML file with other requested output
ods html;
ods graphics on;
proc lifetest plots=survival(hwb test=logrank);
  time T * Status(0);
  strata Group;
run;
ods graphics off;
ods html close;
You just need to write your code like this…
ods html;
ods graphics on;
proc lifetest data=BMT plots=survival(atrisk=0 to 2000 by 250 test=logrank);
  time T * Status(0);
  strata Group;
run;
ods graphics off;
ods html close;
• We have used the “atrisk” and “test” options to add sample sizes and p-values to the plot
ODS Graphics… • Currently can try this on about 20 SAS PROCs • Can try some of the features in LOGISTIC (effect size plots), REG (scatter plots, fitted values, etc), LIFETEST (KM plots), TIMESERIES (time series plots) and diagnostics for many others • Also can apply styles/templates to plots
My Trial with PROC REG… • I used the ODS Graphics statement on a recent PROC REG run to test this • Had trouble running in batch mode – I think this is correctable though • Output goes to HTML file: looks like this
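For anyone wanting to repeat this kind of trial, a minimal sketch follows. It is not the run described on the slide: it uses the SASHELP.CLASS sample dataset that ships with SAS, and the HTML file name is made up. Naming the output file explicitly may also help when running in batch mode, though I have not confirmed that this was the cause of the trouble mentioned above.

```sas
/* Sketch only: sample data and file name are stand-ins */
ods html file="reg_plots.html";
ods graphics on;

proc reg data=sashelp.class;
  model weight = height;   /* simple linear regression of weight on height */
run;
quit;

ods graphics off;
ods html close;
```

The diagnostic and fit plots are written alongside the usual regression tables in the HTML output.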
PROC SQL… • Several great presentations on PROC SQL at SUGI this year – growing in popularity! • A great talk/paper by Defoor takes you through SQL code and the corresponding data step programming • Covers basic queries, joins, CASE expressions, HAVING clauses and some ideas for fast merges on big datasets
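To give a flavour of the SQL versus data step comparison, here is a small sketch of the same inner join written both ways. It is my own illustration rather than an excerpt from the Defoor paper, and the datasets and variable names are invented.

```sas
/* Two small hypothetical datasets sharing the key pat_id */
data demo;
  input pat_id $ age;
  datalines;
1-1 54
1-2 61
1-3 47
;
run;

data rx;
  input pat_id $ treat;
  datalines;
1-1 2
1-2 1
1-4 2
;
run;

/* PROC SQL: inner join, no pre-sorting required */
proc sql;
  create table both_sql as
  select d.pat_id, d.age, r.treat
  from demo as d
       inner join rx as r
       on d.pat_id = r.pat_id
  order by d.pat_id;
quit;

/* Data step equivalent: sort both inputs, then merge and keep matches */
proc sort data=demo; by pat_id; run;
proc sort data=rx;   by pat_id; run;

data both_merge;
  merge demo(in=in_d) rx(in=in_r);
  by pat_id;
  if in_d and in_r;   /* keep only patients present in both datasets */
run;
```

Both approaches keep only patients 1-1 and 1-2 here. The SQL version avoids the explicit sorts, while the data step version makes the match logic visible through the in= flags.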
SUGI Papers used here… SUGI papers available from: http://support.sas.com/usergroups/sugi/sugi31/index.html
• Paper 192-31: Creating Statistical Graphics in SAS 9.2: What Every Statistical User Should Know (Rodriguez and Balan)
• Paper 235-31: So You're Still Not Using PROC REPORT. Why Not? (Pass and Ewing)
• Paper 250-31: Proc SQL – A Primer for SAS Programmers (Defoor)
• Russ Lavery’s “A CDISC Sandbox” manual