Explore the role of dynamism and automation in clinical data integration, with a focus on storage structures, transformation rules, and target structures driven by study analytical needs. Learn about two integration approaches, warehouse and hub, and the impact of metadata-driven data processing.
Metadata Driven Clinical Data Integration – Integral to Clinical Analytics
April 11, 2016
Kalyan Gopalakrishnan, Priya Shetty, Intelent Inc.
Sudeep Pattnaik, Founder, Thoughtsphere
Role of Dynamism and Automation in Integration

Dynamism
• Storage structures at appropriate levels of hierarchy and stages of the data lifecycle need to be dynamic.
• Such dynamism needs to be planned, either by leveraging existing metadata or by manufacturing metadata.
• Alternatively, or in addition, a robust user interface or other means of configuration can address gaps.
• The key is to minimize code change.

Drivers for Dynamism and Automation
• Source structure, transformation rules, and target structure depend on the study's analytical needs; most can vary across studies.
• This warrants a set of dynamic transformation rules to accommodate heterogeneous needs.
• In addition, the structure of the source, the physical storage, the maturity of the data transfer mechanism, and the relevant data dictionaries can vary widely as well.
• It is important to minimize, and where possible avoid, code changes and transformation pre-processing services in the data ingestion layer.
• These are costly and time consuming, and they discourage adoption within the enterprise.

Automation
• High availability of data at points of analysis.
• From disparate sources: raw source data; data integrated across CTMS, IxRS, EDC systems, and labs; reconciled, cleansed or uncleansed, and aggregated data.
• Based on use cases: interim analysis, submission, operational metrics, central monitoring, medical monitoring, etc.
• The key is to automate data delivery in an appropriately usable format with minimal manual intervention, as in the sketch below.
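To make the idea of configuration over code change concrete, here is a minimal Python sketch (not from the original deck; all rule names, fields, and sample values are hypothetical). Transformation rules live as metadata rows, and a single generic engine applies them, so a new study adds rows rather than code.

```python
# Minimal sketch: transformation rules live in metadata, not in study-specific code.
# All names (rule fields, sample data) are hypothetical illustrations.

from typing import Any, Callable

# Transformational metadata: each rule maps a source column to a target column
# and names the operation to apply. New studies add rows, not code.
RULES = [
    {"source": "SUBJID", "target": "USUBJID", "op": "prefix", "arg": "STUDY01-"},
    {"source": "VSDTC",  "target": "VSDTC",   "op": "iso_date"},
    {"source": "AETERM", "target": "AEDECOD", "op": "dictionary"},
]

# Library of reusable operations; governed centrally, reused across studies.
OPS: dict[str, Callable[..., Any]] = {
    "prefix":     lambda value, arg=None: f"{arg}{value}",
    "iso_date":   lambda value, arg=None: value.replace("/", "-"),  # placeholder ISO 8601 normalization
    "dictionary": lambda value, arg=None: value.upper(),            # placeholder dictionary lookup
}

def transform(record: dict[str, Any]) -> dict[str, Any]:
    """Apply every metadata rule to one source record."""
    target: dict[str, Any] = {}
    for rule in RULES:
        if rule["source"] in record:
            op = OPS[rule["op"]]
            target[rule["target"]] = op(record[rule["source"]], rule.get("arg"))
    return target

print(transform({"SUBJID": "1001", "VSDTC": "2016/04/11", "AETERM": "headache"}))
```

Accommodating a new study then means inserting rows into the rule metadata rather than editing programs, which is the "minimize code change" goal the slide describes.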
Integration – Two Approaches

Storage and Modeling
• Warehouse approach: pre-modeling required; structure oriented; a generic content model (schema) is required based on the storage technology, e.g. form/domain-level storage.
• Hub approach: no pre-modeling, loosely coupled; storage granularity is preserved as per the source system; data is tagged at the appropriate level after reconciliation.

Source Data Integration
• Warehouse approach: requires source system adapters and pre-formatting to the warehouse structure (ETL approach); enabling dynamism and automation for transformations requires a repository of governed metadata (structural and transformational) and an interface that allows study-level mappings leveraging an existing library of rules; multiple adapters must be developed, especially for external sources (labs/partner data).
• Hub approach: system-agnostic integration; data is ingested at source-level granularity without pre-processing (ELT approach, sketched below); requires source feeds to adhere to input descriptions, or requires setup/configuration; a robust study-level mapping user interface utilizes a mapping library with machine learning to promote mapping reuse across studies; post-processing pipeline architecture.

Data Processing
• Warehouse approach: heavy reliance on data pre-processing before loading into the warehouse, which is time consuming and costly.
• Hub approach: transformations are accomplished on an as-needed basis in a post-processing layer, based on business needs; for example, operational review processes need subject-level data granularity, while biostatistical programming processes need SDTM +/- domain-level tabulated data.
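The hub-style ELT idea can be illustrated with a short Python sketch (the store and field names are hypothetical, not any vendor's schema): records are loaded exactly as received, source granularity is preserved, and tagging happens after reconciliation rather than through up-front modeling.

```python
# Minimal sketch of hub-style ELT: load source records as-is into a document-style
# store, preserve source granularity, and tag them after reconciliation.
# Names and structures are hypothetical illustrations only.

import json
from datetime import datetime, timezone

hub_store: list[dict] = []   # stand-in for a document-oriented database collection

def ingest(source_system: str, records: list[dict]) -> None:
    """Load records exactly as received (no pre-modeling), wrapped in an envelope."""
    for record in records:
        hub_store.append({
            "source_system": source_system,
            "loaded_at": datetime.now(timezone.utc).isoformat(),
            "payload": record,          # source structure is preserved verbatim
            "tags": [],                 # filled in later, after reconciliation
        })

def tag(predicate, label: str) -> None:
    """Post-load tagging step, applied once records are reconciled."""
    for doc in hub_store:
        if predicate(doc["payload"]):
            doc["tags"].append(label)

ingest("EDC", [{"SUBJID": "1001", "FORM": "VS", "VSORRES": "120/80"}])
ingest("LAB", [{"patient": "1001", "test": "ALT", "result": 32}])
tag(lambda p: p.get("SUBJID") == "1001" or p.get("patient") == "1001", "subject:1001")
print(json.dumps(hub_store, indent=2))
```

The contrast with the warehouse approach is that nothing here required the EDC and lab feeds to be reshaped to a common schema before loading; that work is deferred to the post-processing layer.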
Metadata Driven Data Processing

Business Issue
• How do we provide quicker access to source and analysis-ready data?
• How do we adapt rapidly to changes in regulatory standards and apply those changes to business and operational processes?
• How do we bring more efficiency to the source-to-target mapping and transformation processes?

Solution Overview
• A data ingestion framework ingests data from diverse sources (clinical data, operational data, reference data).
• Structural metadata (source/target/reference) and transformational metadata (rules/derivations) are populated in a metadata repository (illustrated below).
• A dynamic process applies transformation rules to the source data to generate target datasets.

Solution Impact
• High availability of data (source, integrated, standardized)
• Reusability of standard algorithms
• Dynamic, automated process
• Accelerated path for submissions
• Enhanced support for product defense, data sharing, and data mining
• Traceability
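As a rough illustration of the two metadata families named in the solution overview, the following Python sketch (classes and fields are assumptions, not the actual repository design) separates structural metadata from transformational metadata and shows how a dynamic process could iterate the repository instead of hard-coding each step.

```python
# Minimal sketch of structural vs. transformational metadata and a driver loop.
# Classes, fields, and sample entries are hypothetical illustrations.

from dataclasses import dataclass, field

@dataclass
class StructuralMetadata:
    """Describes a source, target, or reference structure."""
    dataset: str            # e.g. "RAW_VITALS" or "VS"
    role: str               # "source" | "target" | "reference"
    columns: list[str] = field(default_factory=list)

@dataclass
class TransformationalMetadata:
    """A governed rule or derivation, reusable across studies."""
    rule_id: str
    source_dataset: str
    target_dataset: str
    derivation: str         # human-readable spec; an engine binds it to code

REPOSITORY = {
    "structural": [
        StructuralMetadata("RAW_VITALS", "source", ["SUBJID", "VSDTC", "VSORRES"]),
        StructuralMetadata("VS", "target", ["USUBJID", "VSDTC", "VSORRES"]),
    ],
    "transformational": [
        TransformationalMetadata("R001", "RAW_VITALS", "VS",
                                 "USUBJID = STUDYID || '-' || SUBJID"),
    ],
}

# A dynamic process iterates the repository instead of hard-coding steps,
# which is what keeps each study run-configurable rather than re-programmed.
for rule in REPOSITORY["transformational"]:
    print(f"apply {rule.rule_id}: {rule.source_dataset} -> {rule.target_dataset}")
```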
Approach 1 – Metadata Driven Dynamic SAS Engine
• A dynamic SAS process leverages SAS macros corresponding to the transformational metadata.
• Structural and transformational metadata extracted from the metadata repository drive a dynamic program that generates hybrid SDTM target datasets.
• Source-to-target transformations: updates in the metadata repository are applied in the next run; examples include the MedDRA merge and ISO date formats.
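The engine described here is SAS macro based; the Python sketch below is only an analogue of the same pattern (step names and functions are hypothetical): transformational metadata names a step, the driver resolves it at run time, so a repository update takes effect on the next run without editing the driver program.

```python
# Python analogue of a metadata-driven engine: metadata names the steps,
# the driver dispatches them. Step names and placeholder logic are hypothetical.

def merge_meddra(record: dict) -> dict:
    # Placeholder for a MedDRA dictionary merge.
    return {**record, "AEDECOD": record.get("AETERM", "").upper()}

def iso_dates(record: dict) -> dict:
    # Placeholder ISO 8601 date normalization.
    return {**record, "AESTDTC": record.get("AESTDTC", "").replace("/", "-")}

STEP_LIBRARY = {"MEDDRA_MERGE": merge_meddra, "ISO_FORMATS": iso_dates}

# Transformational metadata pulled from the repository (hard-coded here):
PIPELINE = ["MEDDRA_MERGE", "ISO_FORMATS"]

def run(records: list[dict]) -> list[dict]:
    out = records
    for step_name in PIPELINE:          # adding a metadata row adds a step
        out = [STEP_LIBRARY[step_name](r) for r in out]
    return out

print(run([{"AETERM": "nausea", "AESTDTC": "2016/04/11"}]))
```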
Approach 2 – ClinDAP: Thoughtsphere's Metadata Driven, Source System Agnostic Clinical Data Aggregation Framework

ClinDAP – Next Generation Data Aggregation Platform
• Source system agnostic data aggregation framework.
• Proprietary algorithms to aggregate disparate data sources (EDC, CTMS, IVRS, Labs, ePro, etc.).
• A document-oriented database readily assembles any structured or unstructured data.
• Robust mapping engine with an extensible rule library reusable across studies (hybrid SDTM).
• Interactive, visualization-based data discovery.
• Robust mapping framework: reusable mapping library, leverages existing SAS libraries, specifies complex study-level transformations, extensible targets (hybrid SDTM, ADaM); a small illustration of mapping reuse follows below.
• The ability to operationalize analytics becomes possible when automation and dynamism are enabled to integrate data and generate standardized datasets.
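As a small, hypothetical illustration of mapping reuse across studies (not ClinDAP's actual API), the sketch below keeps library-level mappings in one place and lets a study declare only its deviations.

```python
# Hypothetical illustration of a reusable mapping library with study-level overrides.
# Mapping keys, expressions, and study names are made up for the example.

STANDARD_LIBRARY = {
    # library-level mappings, shared across studies
    ("DM", "USUBJID"): "STUDYID || '-' || SUBJID",
    ("VS", "VSSTRESN"): "input(VSORRES, best.)",
}

STUDY_OVERRIDES = {
    "STUDY01": {
        # only the study-specific deviations are declared here
        ("VS", "VSSTRESN"): "input(VSORRES, best.) * 1.0",
    },
}

def mappings_for(study_id: str) -> dict:
    """Library defaults merged with study-level overrides."""
    return {**STANDARD_LIBRARY, **STUDY_OVERRIDES.get(study_id, {})}

for (domain, variable), rule in mappings_for("STUDY01").items():
    print(f"{domain}.{variable} <- {rule}")
```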