Experience and process for collaborating with an outsource company to create the define file
Ganesh Sankaran, TAKE Solutions
Agenda
• Typical workflow when sponsors create SDTM / ADaM in-house and collaborate with vendors for the Define files
• Define.xml sections
• Define.xml process: how do we extract the information from the data and documents provided?
• Validating Define.xml and the typical checks
• Common issues
• Conclusion: how soon should the sponsor start?
Typical workflow when collaborating with a vendor to create Define files
• Sponsor provides the documents and draft data.
• Vendor runs the compliance / structure checks on the data.
• Vendor generates the draft Define.xml and runs the compliance checks.
• Vendor summarizes the issues / findings and delivers the draft Define for review.
• Sponsor reviews the findings and updates the specification / datasets / annotations.
• Sponsor sends the updated annotations / specification / XPTs back to the vendor for a final delivery (Pass II).
• Vendor reruns the compliance checks and regenerates the final version of the Define (Pass II).
Inputs that are provided
• Annotated Case Report Form
• Mapping specification documents
• SAS datasets / XPTs
• Sponsor controlled terminology documents, if applicable
• Protocol, if Trial Design domains are to be produced
• Data Guide / supplemental document
Define.XML Sections
• TOC – Metadata of datasets
• blankcrf (annotated)
• Variable Level Metadata
• Value Level Metadata
• Controlled Terminology
• Computational Algorithms
• Supplemental Data Definition Document
Define.XML Sections (not visible through the style sheet)
• xmlns - identifies the default namespace for this document.
• ODMVersion - identifies the ODM version that underlies the schema for the Define-XML.
• FileOID - unique identifier for this file.
• CreationDateTime - when this specific version of the define.xml file was created.
• StudyName, StudyDescription, ProtocolName - study-level information.
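
As a rough illustration of where these hidden attributes live, the sketch below builds a bare ODM root element with Python's standard library. The ODM 1.3 namespace, the default `ODMVersion` value and the `Study` OID pattern are assumptions made for the example; a real Define file carries more attributes and the Define-specific namespaces as well.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Assumption: the Define file is based on ODM 1.3; older Define files use the ODM 1.2 namespace.
ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"


def build_odm_root(file_oid, study_name, study_description, protocol_name,
                   odm_version="1.3.2"):
    """Return an ODM root element carrying the file-level and study-level information."""
    ET.register_namespace("", ODM_NS)  # serialize ODM as the default namespace (xmlns)
    root = ET.Element(f"{{{ODM_NS}}}ODM", {
        "FileOID": file_oid,
        "ODMVersion": odm_version,
        "CreationDateTime": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    })
    # Hypothetical OID pattern for the Study element; use your own convention.
    study = ET.SubElement(root, f"{{{ODM_NS}}}Study", {"OID": f"{file_oid}.STUDY"})
    global_vars = ET.SubElement(study, f"{{{ODM_NS}}}GlobalVariables")
    for tag, text in (("StudyName", study_name),
                      ("StudyDescription", study_description),
                      ("ProtocolName", protocol_name)):
        ET.SubElement(global_vars, f"{{{ODM_NS}}}{tag}").text = text
    return root
```

Calling `ET.tostring(build_odm_root(...), encoding="unicode")` shows the `xmlns`, `FileOID`, `ODMVersion` and `CreationDateTime` attributes on the root, with the study-level elements nested under `GlobalVariables`.
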
Define.XML components and how we generate them
• Metadata generation:
  • DOMAIN level
  • VARIABLE level
  • VALUE level
• ORIGIN, CODELIST, comments and computational algorithm
• blankcrf, Data Guide / supplemental documents
• Generate Define.xml
• Validate Define files
Input Sheet for Define.XML Generation
• DOMAIN level input: a SAS-based macro utility creates the inputs for this sheet from the datasets provided.
• VARIABLE metadata: the variable-level metadata input sheet is populated by reading the metadata of the SAS datasets provided.
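
The slides describe a SAS macro for this step. Purely as an illustration of the idea, the Python sketch below derives one variable-level record per column from rows already loaded in memory. The record layout, the `Num`/`Char` type names and the fixed numeric length of 8 are assumptions for the example, and labels are passed in separately because they cannot be inferred from the values.

```python
def variable_metadata(domain, rows, labels):
    """Derive variable-level metadata records from in-memory dataset rows.

    rows   : list of dicts, one per record, all sharing the same keys
    labels : dict mapping variable name to label (hypothetical input)
    """
    records = []
    if not rows:
        return records
    for order, name in enumerate(rows[0].keys(), start=1):
        values = [row[name] for row in rows if row[name] is not None]
        is_numeric = bool(values) and all(isinstance(v, (int, float)) for v in values)
        if is_numeric:
            var_type, length = "Num", 8  # assumption: numeric variables stored in 8 bytes
        else:
            # Character length is the longest observed value (1 if the column is empty).
            var_type = "Char"
            length = max((len(str(v)) for v in values), default=1)
        records.append({
            "domain": domain,
            "order": order,
            "name": name,
            "label": labels.get(name, ""),
            "type": var_type,
            "length": length,
        })
    return records
```

Each record corresponds to one row of the variable-level input sheet; ORIGIN, codelist and method references are filled in later from the annotations and the mapping specification.
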
Input Sheet for Define.XML Generation
• ORIGIN information is extracted from the annotations and mapping specification provided.
• OIDs are assigned here for the variables that need a CODELIST, COMPUTATIONAL ALGORITHM or VALUELIST.
• Based on the OIDs assigned in the VARIABLE LEVEL sheet, the VALUE LEVEL and CODELIST input sheets are generated by reading the data and the associated codelist files.
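
One way to picture the OID assignment is the sketch below. The `IT.`, `CL.`, `MT.` and `VL.` prefixes and the input keys (`codelist`, `has_method`, `has_value_level`) are illustrative naming assumptions, not a convention taken from the slides.

```python
def assign_oids(variables):
    """Attach OIDs to variable-level records.

    Each input dict has 'domain' and 'name', and optionally:
      'codelist'        : name of the codelist the variable uses
      'has_method'      : True if a computational algorithm applies
      'has_value_level' : True if value-level metadata is needed
    """
    assigned = []
    for var in variables:
        domain, name = var["domain"], var["name"]
        codelist = var.get("codelist")
        assigned.append({
            "domain": domain,
            "name": name,
            "item_oid": f"IT.{domain}.{name}",
            # Keyed by codelist name so several variables can share one codelist.
            "codelist_oid": f"CL.{codelist}" if codelist else None,
            "method_oid": f"MT.{domain}.{name}" if var.get("has_method") else None,
            "valuelist_oid": f"VL.{domain}.{name}" if var.get("has_value_level") else None,
        })
    return assigned
```

The non-empty `codelist_oid` and `valuelist_oid` entries then tell the generator which codelist and value-level sheets still have to be built from the data.
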
Input Sheet for Define.XML Generation
• Value level input
• Codelist / computation methods input
External Documents – blankcrf & Data Guide
• The annotated Case Report Form and supplemental documents such as the Data Guide are linked to the define.xml.
• ORIGIN page numbers presented in the variable-level metadata must be hyperlinked to the corresponding CRF pages attached to the Define file.
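
A minimal sketch of the page-linking idea follows. It assumes the ORIGIN text looks like `CRF Page 12, 14`, that the annotated CRF is delivered as `blankcrf.pdf`, and that links use the `#page=N` PDF open parameter; how the links are actually encoded depends on the Define-XML version and style sheet in use.

```python
import re


def crf_page_links(origin, crf_file="blankcrf.pdf"):
    """Turn an ORIGIN string such as 'CRF Page 12, 14' into (page, href) pairs.

    Non-CRF origins (for example 'Derived') yield an empty list.
    Page ranges such as '12-14' are not expanded in this sketch.
    """
    if not origin.strip().upper().startswith("CRF"):
        return []
    pages = [int(number) for number in re.findall(r"\d+", origin)]
    return [(page, f"{crf_file}#page={page}") for page in pages]
```

For example, `crf_page_links("CRF Page 7")` gives a single link to page 7 of `blankcrf.pdf`, while a derived variable gets no CRF link at all.
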
Input Sheet for Define.XML Generation
• DEFINE.XML can be generated once the domain-level, variable-level, value-level and codelist sheets are created, the external documents are linked, the ORIGIN, COMPUTATIONAL ALGORITHM and external dictionary information is updated, and the inputs are reviewed.
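
To give a feel for the generation step, the sketch below turns one domain's variable-level records into the ODM `ItemGroupDef`, `ItemRef` and `ItemDef` elements. Namespaces, labels, origins and codelist references are left out for brevity, and the OID patterns and input keys are assumptions for the example.

```python
import xml.etree.ElementTree as ET


def build_item_group(domain, variables, repeating=True):
    """Build an ItemGroupDef (with ItemRefs) and matching ItemDefs for one domain.

    variables: list of dicts with 'name', 'data_type', 'length' and optional 'mandatory'.
    Returns (item_group_element, list_of_item_def_elements).
    """
    group = ET.Element("ItemGroupDef", {
        "OID": f"IG.{domain}",  # hypothetical OID pattern
        "Name": domain,
        "Repeating": "Yes" if repeating else "No",
    })
    item_defs = []
    for order, var in enumerate(variables, start=1):
        item_oid = f"IT.{domain}.{var['name']}"
        # ItemRef ties the variable to the dataset and fixes its order.
        ET.SubElement(group, "ItemRef", {
            "ItemOID": item_oid,
            "OrderNumber": str(order),
            "Mandatory": "Yes" if var.get("mandatory") else "No",
        })
        # ItemDef carries the variable's own attributes.
        item_defs.append(ET.Element("ItemDef", {
            "OID": item_oid,
            "Name": var["name"],
            "DataType": var["data_type"],
            "Length": str(var["length"]),
        }))
    return group, item_defs
```

Keeping the `ItemRef` and `ItemDef` OIDs derived from the same expression avoids the dangling references that validators commonly report.
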
Validation Checks
• Structural checks:
Validate Define.XML
• A valid Define.xml should be well formed and conform to the XML schemas.
• It should reference the correct versions of the CDISC standards.
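
The first of these checks can be sketched with Python's standard library alone. The function below only tests well-formedness and the presence of a few root attributes; the list of required attributes is an assumption for the example, and conformance to the XML schemas needs a schema-aware validator, which is not shown here.

```python
import xml.etree.ElementTree as ET

# Assumption: root attributes this sketch insists on.
REQUIRED_ROOT_ATTRIBUTES = ("FileOID", "ODMVersion", "CreationDateTime")


def check_well_formed(xml_text):
    """Return (ok, message) for a basic structural check of a Define file."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return False, f"not well formed: {exc}"
    local_name = root.tag.rsplit("}", 1)[-1]  # drop the '{namespace}' prefix if present
    if local_name != "ODM":
        return False, f"unexpected root element: {local_name}"
    missing = [name for name in REQUIRED_ROOT_ATTRIBUTES if name not in root.attrib]
    if missing:
        return False, "missing root attributes: " + ", ".join(missing)
    return True, "ok"
```

A file that fails this check will not pass schema validation either, so it makes a cheap first gate before running the full compliance checks.
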
Common Issues
• ORIGIN is 'CRF' but the variable is not annotated, or ORIGIN is 'Derived' but the variable is annotated in the CRF.
• Key variables are not properly defined.
• Custom domains should follow the domain assumptions. Sometimes custom domains are derived without a TOPIC variable.
• Subjects are collected as part of external data (LB / EG) but not populated in the DM domain. All subjects must be present in the DM domain.
• The one-to-one relationship is missing across some of the paired variables, such as TEST / TESTCD, PARAM / PARAMCD, VISIT / VISITNUM, AVISIT / AVISITN, TPT / TPTNUM and TPT / TPTREF.
• Common variables across different domains have different ORIGIN derivations. If the derivation is the same across domains, "Copied from ADSL.XX" can be used.
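
Two of these issues lend themselves to simple automated checks. The sketch below works on rows already loaded as dictionaries; the function names are hypothetical, and the variable names are passed in so the same check can be reused for any pair.

```python
from collections import defaultdict


def one_to_one_violations(rows, left, right):
    """List values of `left` or `right` that map to more than one partner value."""
    left_to_right = defaultdict(set)
    right_to_left = defaultdict(set)
    for row in rows:
        left_to_right[row[left]].add(row[right])
        right_to_left[row[right]].add(row[left])
    problems = []
    for value, partners in left_to_right.items():
        if len(partners) > 1:
            problems.append((left, value, sorted(partners)))
    for value, partners in right_to_left.items():
        if len(partners) > 1:
            problems.append((right, value, sorted(partners)))
    return problems


def subjects_missing_from_dm(dm_rows, other_rows):
    """Return USUBJID values present in another domain but absent from DM."""
    dm_subjects = {row["USUBJID"] for row in dm_rows}
    return sorted({row["USUBJID"] for row in other_rows} - dm_subjects)
```

For instance, `one_to_one_violations(vs_rows, "VSTESTCD", "VSTEST")` reports any test code that appears with more than one test name, and any test name that appears with more than one code.
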
Common Issues (contd)
• Generally, XPT files up to 1 GB in size are fine. If an XPT file exceeds 1 GB, it must be split into smaller datasets, none exceeding 1 GB (see the Study Data Specifications).
• Split files should have the same metadata structure so that the split datasets can be concatenated / merged. Both the smaller split files and the larger (non-split) file should be included.
• Split datasets and the method applied should be documented in the Data Guide.
• If a linear approach is not followed, make sure the ADaM and SDTM sources are consistent.
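
As a back-of-the-envelope aid, the sketch below estimates how many parts an oversized dataset needs and which record ranges each part would hold. It assumes 1 GB means 1024³ bytes, that records are of roughly equal size, and that header overhead can be ignored; in practice a split is often made on a meaningful category rather than on record counts.

```python
import math

ONE_GB = 1024 ** 3  # assumption: binary gigabyte


def plan_split(file_size_bytes, n_records, limit_bytes=ONE_GB):
    """Return (start, stop) record ranges so that each part stays within the size limit.

    A file at or under the limit comes back as a single range covering all records.
    """
    if file_size_bytes <= limit_bytes or n_records == 0:
        return [(0, n_records)]
    parts = math.ceil(file_size_bytes / limit_bytes)
    per_part = math.ceil(n_records / parts)
    return [(start, min(start + per_part, n_records))
            for start in range(0, n_records, per_part)]
```

Because the ranges are contiguous and non-overlapping, concatenating the parts in order reproduces the original dataset, which is the property the split files need to keep.
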
Common Issues (contd)
• ADaM datasets derived in a parallel stream might require extra effort to ensure traceability and data lineage.
Conclusion
• Finalize the scope of the work being outsourced / to be performed by the vendor.
• Explain the process being followed and agree on a common format for exchanging documents, which could expedite generation of the Define files.
• When working across a family of similar studies within the same indication, look for better efficiency after a couple of iterations / studies.
• Identify the vendor(s) at least three months before you expect the first Define.XML to be published. If possible, do a pilot or demo Define.