XML in a SAS World Mike Molter d-Wise Technologies
<MyFamily>
  <Self eyecolor="brown" sex="M" htft="6" htin="1">Mike</Self>
  <Spouse eyecolor="hazel" sex="F" htft="5" htin="8" seqno="1">Teresa</Spouse>
  <Children>
    <Child eyecolor="hazel" sex="F" htft="5" htin="9" seqno="1" momseq="1">Lauren</Child>
    <Child eyecolor="brown" sex="M" htft="6" htin="0" seqno="2" momseq="1">Ryan</Child>
  </Children>
  <Pet type="dog" sex="F" color1="black" breed1="lab" breed2="unknown" seqno="1">Sydney</Pet>
</MyFamily>
Background • Author: Mike Molter • Company: d-Wise • Committees: CDISC XML Technologies, PhUSE Working Group Best Practices • Reason for presentation: the increasing prevalence of XML in our industry
Agenda • What is XML? • Comparison to HTML • Purpose and use • Examples of XML standards (schemas) • Tools for working with XML (SAS and non-SAS) • XML in the pharmaceutical industry
HTML • Hypertext Markup Language • Language of the web • Provides instructions to web browsers for displaying content • Pre-defined elements
<html>
  <table>
    <tr> <th>Team</th> <th>Conference</th> <th>Division</th> </tr>
    <tr> <td>Red Wings</td> <td>Eastern</td> <td>Atlantic</td> </tr>
  </table>
</html>
What is XML? • eXtensible Markup Language • A data container, used for the structure, storage, and transport of data (w3schools.com) • Like any other computer language… • textual gibberish • a set of rules (structural, syntax) • a vocabulary • elements • attributes • tags • schemas • Unlike other computer languages… • no pre-defined elements (no keywords) • no processor
What is XML?
<nhl>
  <team name="Red Wings">
    <conference>Eastern</conference>
    <division>Atlantic</division>
    <location>Detroit</location>
  </team>
  <team name="Flames">
    <conference>Western</conference>
    <division>Pacific</division>
    <location>Calgary</location>
  </team>
  <team name="Devils">
    <conference>Eastern</conference>
    <division>Metropolitan</division>
    <location>New Jersey</location>
  </team>
</nhl>
XML Schema • XML schema (or language, or vocabulary): a specific set of elements and attributes, along with a set of rules that govern their use • An XML schema can combine new elements with elements from other XML schemas (hence "extensible") • A schema file lays out the rules of an XML language. • An XML schema language is a computer language in which schema files are written. • Examples: DTD, XSD • An XML validator is a piece of software that uses the schema file to validate an XML file.
XML Language Examples • NHL (OK, I made this one up) • XSL (eXtensible Stylesheet Language, .xsl) • Transforms XML into something else • XML Schema Definition (.xsd) • Validates an XML document • XML Spreadsheet 2003 (.xml) • Read and displayed by Excel • ODM, Define-XML, Dataset-XML, Analysis Results Metadata, OpenCDISC • Clinical trial data and metadata
Exporting XML Teams.sas7bdat
Exporting XML with a DATA step
filename xmlout4 'C:\teams_datastep.xml' ;
data _null_ ;
  file xmlout4 ;
  set teams end=thatsit ;
  if _n_ eq 1 then put '<nhl>' ;
  /* List output writes a blank after each variable value; +(-1) backs up */
  /* over it so the closing quote or tag follows the value directly       */
  put '<team name="' name +(-1) '">' ;
  put '<conference>' conference +(-1) '</conference>' ;
  put '<division>' division +(-1) '</division>' ;
  put '<location>' location +(-1) '</location>' ;
  put '</team>' ;
  if thatsit then put '</nhl>' ;
run;
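The DATA step above writes values verbatim, so a value containing `&`, `<`, `>` or a double quote would produce a malformed document. A minimal sketch, assuming a WORK.TEAMS data set with the same four character variables (built here from hypothetical DATALINES, where the team "Sticks & Stones" and its location are made up purely to contain an ampersand) and a temporary output file, escapes each value with the HTMLENCODE function before writing it:

```sas
/* Hypothetical input; the third row is invented to include an ampersand */
data teams;
  infile datalines dsd truncover;
  length name conference division location $20;
  input name conference division location;
  datalines;
Red Wings,Eastern,Atlantic,Detroit
Flames,Western,Pacific,Calgary
Sticks & Stones,Eastern,Metropolitan,Anytown
;
run;

filename xmlout temp;   /* temporary file instead of a hard-coded path */

data _null_;
  file xmlout;
  set teams end=last;
  length line $200;
  array elems{*} conference division location;
  if _n_ = 1 then put '<?xml version="1.0" encoding="UTF-8"?>' / '<nhl>';
  /* Inside an attribute value, escape double quotes as well as &, < and > */
  line = cats('<team name="', htmlencode(strip(name), 'amp lt gt quot'), '">');
  put line;
  /* One child element per variable; by default HTMLENCODE escapes &, < and > */
  do i = 1 to dim(elems);
    line = cats('<', vname(elems{i}), '>', htmlencode(strip(elems{i})),
                '</', vname(elems{i}), '>');
    put line;
  end;
  put '</team>';
  if last then put '</nhl>';
run;
```

Building each line with CATS and writing it as a single variable also sidesteps the blank that list output inserts after a variable value.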
Exporting XML with the LIBNAME statement
libname xmlout xml 'C:\teams_generic.xml' ;
data xmlout.xteams ;
  set teams ;
run;
Exporting XML with the LIBNAME statement
libname xmlout xml 'C:\teams_oracle.xml' xmltype=oracle ;
data xmlout.xteams ;
  set teams ;
run;
Exporting XML with the LIBNAME statement or ODS using tagsets
libname xmlout xml 'C:\teams_tagset_libname.xml' tagset=<tagset-name> ;
data xmlout.xteams ;
  set teams ;
run;
ods markup tagset=<tagset-name> file='C:\teams_tagset_ods.xml' ;
proc print noobs data=teams ;
run;
ods markup close ;
Exporting XML with ODS using SAS's ExcelXP tagset
ods markup tagset=excelxp file='C:\teams_excel.xml' ;
proc print noobs data=teams ;
run;
ods markup close ;
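The ExcelXP tagset also accepts an OPTIONS( ) list that controls the layout of the generated spreadsheet. A minimal sketch, assuming the ExcelXP version installed with your SAS release supports the SHEET_NAME, AUTOFILTER and FROZEN_HEADERS options (`ods tagsets.excelxp options(doc='help');` lists what your version supports), with a small hypothetical TEAMS data set and a temporary output file:

```sas
/* Hypothetical input data */
data teams;
  infile datalines dsd truncover;
  length name conference division location $20;
  input name conference division location;
  datalines;
Red Wings,Eastern,Atlantic,Detroit
Flames,Western,Pacific,Calgary
Devils,Eastern,Metropolitan,New Jersey
;
run;

filename xlout temp;   /* temporary file instead of a hard-coded path */

/* sheet_name names the worksheet; autofilter adds filter buttons to  */
/* the header row; frozen_headers keeps the header row visible         */
ods tagsets.excelxp file=xlout
    options(sheet_name='Teams' autofilter='all' frozen_headers='yes');
proc print noobs data=teams;
run;
ods tagsets.excelxp close;
```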
Importing XML
Export:
libname xmlout xml 'C:\teams_generic.xml' ;
data xmlout.xteams ;
  set teams ;
run;
Import:
data sasteams ;
  set xmlout.xteams ;
run;
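To confirm that a generic export and re-import returns the same data, PROC COMPARE can check the two data sets. A minimal sketch, assuming a small hypothetical TEAMS data set and a temporary file: when a fileref has the same name as the libref, the XML engine uses that fileref's file, so no physical path is needed.

```sas
/* Hypothetical input data */
data teams;
  infile datalines dsd truncover;
  length name conference division location $20;
  input name conference division location;
  datalines;
Red Wings,Eastern,Atlantic,Detroit
Flames,Western,Pacific,Calgary
Devils,Eastern,Metropolitan,New Jersey
;
run;

filename xmlout temp;   /* fileref and libref share the name XMLOUT */
libname xmlout xml;

data xmlout.xteams;     /* export */
  set teams;
run;

data sasteams;          /* import */
  set xmlout.xteams;
run;

proc compare base=teams compare=sasteams;
run;
%let cmp_rc = &sysinfo;   /* capture now: the next step resets SYSINFO */
```

Bit 4096 of the PROC COMPARE return code is set when at least one value differs, so `%sysfunc(band(&cmp_rc, 4096))` should be 0 after a clean round trip. Other bits (for example, 16 for differing variable lengths) may be set without any value having changed.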
NHL.XML
libname xmlin xml 'C:\teams_nhl.xml' ;
data sasteam ;
  set xmlin.team ;
run;
<nhl>
  <team name="Red Wings">
    <conference>Eastern</conference>
    <division>Atlantic</division>
    <location>Detroit</location>
  </team>
  <team name="Flames">
    <conference>Western</conference>
    <division>Pacific</division>
    <location>Calgary</location>
  </team>
  <team name="Devils">
    <conference>Eastern</conference>
    <division>Metropolitan</division>
    <location>New Jersey</location>
  </team>
</nhl>
Result: SASTEAM.SAS7BDAT
XML in Pharma • Operational Data Model (ODM) • Collected clinical trial data, metadata, administrative data, reference data, audit information • Define-XML • Metadata for submitted data in ODM structure • Value-level metadata is in the define extension • Dataset-XML • Submission data in ODM structure
XML in Pharma • Analysis Results Metadata • Metadata that describes the methods used for arriving at the results • OpenCDISC • Extension of Define-XML • Describes validation checks applicable to each domain
ODM Conventions • item • common element prefix • represents a variable • def • common element suffix • represents a definition • ref • common element suffix • represents a reference to a def • oid • common attribute suffix • object identifier • represents a link to another part of the document
ODM • Clinical Data • ItemGroup (dataset-level) metadata
ODM • Clinical Data • ItemGroup (dataset-level) metadata • Item (variable-level) metadata
ODM • Item (variable-level) metadata • Codelist metadata (allowable values)
Importing XML with an XML map • An XMLMap is an XML schema • It provides instructions to the XML LIBNAME engine for reading XML: • the name and label of the data set • which XML elements define observations • how to define variables (attributes and values) • It uses XPath syntax to navigate the XML document and identify its components
filename mymap 'C:\mymap.map' ;
libname xmlin xml 'C:\nhl.xml' xmlmap=mymap ;
data sasteams ;
  /* the member name is the TABLE name defined in the map (SASTeams) */
  set xmlin.sasteams ;
run;
Importing XML with an XML map
<?xml version="1.0" encoding="UTF-8"?>
<SXLEMAP version="1.2">
  <TABLE name="SASTeams">  <!-- name of the data set to be created -->
    <TABLE-PATH syntax="XPath">/nhl/team</TABLE-PATH>  <!-- observation boundary -->
    <!-- variable definitions -->
    <COLUMN name="conference">
      <PATH syntax="XPath">/nhl/team/conference</PATH>
      <TYPE>character</TYPE>
      <DATATYPE>string</DATATYPE>
      <LENGTH>20</LENGTH>
    </COLUMN>
    <COLUMN name="name">
      <PATH syntax="XPath">/nhl/team/@name</PATH>
      <TYPE>character</TYPE>
      <DATATYPE>string</DATATYPE>
      <LENGTH>20</LENGTH>
    </COLUMN>
  </TABLE>
</SXLEMAP>
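The map and LIBNAME statement can be tried end to end without any files on disk. A minimal sketch, assuming temporary filerefs (NHL for the document, MYMAP for the map) and a two-team excerpt of the NHL document: it writes both files with DATA steps, adds DIVISION and LOCATION columns to the map shown above, and reads the result through the XML engine. As in the round-trip approach, the libref NHL picks up the fileref of the same name.

```sas
filename nhl temp;     /* the XML document to read       */
filename mymap temp;   /* the XMLMap that describes it   */

/* Write a two-team excerpt of the nhl document */
data _null_;
  file nhl;
  put '<?xml version="1.0" encoding="UTF-8"?>';
  put '<nhl>';
  put '<team name="Red Wings"><conference>Eastern</conference>'
      '<division>Atlantic</division><location>Detroit</location></team>';
  put '<team name="Flames"><conference>Western</conference>'
      '<division>Pacific</division><location>Calgary</location></team>';
  put '</nhl>';
run;

/* Write the map: one observation per /nhl/team, the name attribute */
/* plus one column per child element                                 */
data _null_;
  file mymap;
  length col $12 line $300;
  put '<?xml version="1.0" encoding="UTF-8"?>';
  put '<SXLEMAP version="1.2">';
  put '<TABLE name="SASTeams">';
  put '<TABLE-PATH syntax="XPath">/nhl/team</TABLE-PATH>';
  put '<COLUMN name="name"><PATH syntax="XPath">/nhl/team/@name</PATH>'
      '<TYPE>character</TYPE><DATATYPE>string</DATATYPE>'
      '<LENGTH>20</LENGTH></COLUMN>';
  do col = 'conference', 'division', 'location';
    line = cats('<COLUMN name="', col, '"><PATH syntax="XPath">/nhl/team/', col,
                '</PATH><TYPE>character</TYPE><DATATYPE>string</DATATYPE>',
                '<LENGTH>20</LENGTH></COLUMN>');
    put line;
  end;
  put '</TABLE>';
  put '</SXLEMAP>';
run;

/* Libref matches the document's fileref; XMLMAP= points at the map */
libname nhl xml xmlmap=mymap;

data sasteams;
  set nhl.sasteams;   /* member name comes from <TABLE name="SASTeams"> */
run;
```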
Extensible Stylesheet Language (XSL) • XSLT - XSL Transformations - transforms XML into something else • XSL is an XML schema • An XSL processor reads through an XML document and generates text according to instructions in the stylesheet • XSL processors: • SAS (PROC XSL) • Internet Explorer
Extensible Stylesheet Language (XSL) • SAS's PROC XSL creates an output file, given an input file and a stylesheet
filename inxml 'C:\mysubmission\define.xml' ;
filename outhtml 'C:\mysubmission\define.html' ;
filename xslss 'C:\mysubmission\define.xsl' ;
proc xsl in=inxml out=outhtml xsl=xslss ;
run;
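A stylesheet does not have to produce HTML. A minimal sketch, assuming temporary filerefs in place of the submission paths and a small XSLT 1.0 stylesheet written for this example (not part of any define.xml package), turns a two-team excerpt of the NHL document into comma-separated text, one "name,location" line per team:

```sas
filename inxml  temp;   /* input XML    */
filename xslss  temp;   /* stylesheet   */
filename outtxt temp;   /* text output  */

data _null_;
  file inxml;
  put '<?xml version="1.0" encoding="UTF-8"?>';
  put '<nhl>';
  put '<team name="Red Wings"><conference>Eastern</conference>'
      '<location>Detroit</location></team>';
  put '<team name="Flames"><conference>Western</conference>'
      '<location>Calgary</location></team>';
  put '</nhl>';
run;

/* XSLT 1.0: text output, one line per team element */
data _null_;
  file xslss;
  put '<xsl:stylesheet version="1.0"'
      ' xmlns:xsl="http://www.w3.org/1999/XSL/Transform">';
  put '<xsl:output method="text"/>';
  put '<xsl:template match="/">';
  put '<xsl:for-each select="nhl/team">';
  put '<xsl:value-of select="@name"/>,<xsl:value-of select="location"/>';
  put '<xsl:text>&#10;</xsl:text>';   /* newline after each team */
  put '</xsl:for-each>';
  put '</xsl:template>';
  put '</xsl:stylesheet>';
run;

proc xsl in=inxml xsl=xslss out=outtxt;
run;

/* Echo the transformed text to the log */
data _null_;
  infile outtxt truncover;
  input text $char100.;
  putlog text;
run;
```

Swapping the stylesheet is all it takes to change the output format; the PROC XSL step stays the same.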
Extensible Stylesheet Language (XSL) • Internet Explorer renders XML as HTML • Define.xml in a text editor: the processing instruction <?xml-stylesheet type="text/xsl" href="define.xsl"?> names the stylesheet the browser applies • Define.xml in Internet Explorer: HTML generated by the XSL, e.g. <caption>Tabulation Datasets for Study CDISC01 (SDTM-IG 3.1.2)</caption>
Extensible Stylesheet Language (XSL) • Generated HTML:
<caption>Tabulation Datasets for Study CDISC01 (SDTM-IG 3.1.2)</caption>
• XSL that generates it:
<caption>
  <xsl:value-of select="$g_ItemGroupDefPurpose"/> Datasets for Study
  <xsl:value-of select="/odm:ODM/odm:Study/odm:GlobalVariables/odm:StudyName"/>
  (<xsl:value-of select="$g_StandardName"/>
  <xsl:text> </xsl:text>
  <xsl:value-of select="$g_StandardVersion"/>)
</caption>
Clinical Standards Toolkit (CST) • A Base SAS framework for executing clinical data tasks such as verifying data compliance against standards and importing/exporting ODM and Define.xml • Contains all necessary files (SAS macros and driver programs, maps, property files, XSL stylesheets) • Comes with a learning curve
Clinical Standards Toolkit (CST) …or PROC XSL
References • Using the SAS Clinical Standards Toolkit 1.5 to Import CDISC ODM Files, Lex Jansen, PharmaSUG 2013 • Using the SAS Clinical Standards Toolkit for Define.xml Creation, Lex Jansen, PharmaSUG 2011 • Accessing the Metadata from the Define.xml Using XSLT Transformation, Lex Jansen, PhUSE 2010
References • A SAS Programmer's Guide to Generating Define.xml, Mike Molter, SAS Global Forum 2009
/* mydefine: a user-defined tagset; meta_dataset1-3 are placeholder metadata data sets */
ods markup tagset=mydefine file='define.xml' ;
proc print noobs data=meta_dataset1 ; run;
proc print noobs data=meta_dataset2 ; run;
proc print noobs data=meta_dataset3 ; run;
/* etc. */
ods markup close ;
Other Resources • LinkedIn Groups • CDISC XML Technologies • CDISC Define-XML • CDISC Dataset-XML • CDISC-SDTM Experts • wiki.cdisc.org • http://www.cdisc.org
In Summary… • Options for Exporting XML • XML LIBNAME engine (XMLTYPE=, TAGSET= options) • ODS (SAS XML destinations or user-defined tagsets) • DATA step • XSL stylesheets • CST (clinical) • Options for Importing XML • XML LIBNAME engine (XMLTYPE=, TAGSET= options) • XML maps • XSL stylesheets • CST (clinical)
In Summary… So what do I need to know???