Explore a multidisciplinary collaboration to empower data creators and users in the pharmaceutical industry, using advanced statistical design, modeling, and visualization tools to improve decision-making processes.
Translate, Transfer, Transform: Academia, Industry and Enabling Data Analytic Tools
Martin Owen, Verity Fisher, Emily Matthews, David Woods
Why do we need to do something differently?
“Because of the complexity and volume of dissolution data, it is not easy to keep all of the key results in your mind to make an informed decision. Historically only a few people have had enough working memory in their brain to do this… thus having a dashboard which visualizes the data during the meeting allows us to go beyond the limits of our brain capacity and make the right decisions.”
— Head of Drug Delivery, GlaxoSmithKline
What is the generalised problem we want to solve?
“How can we empower data creators and data users to improve the way project teams communicate risk, evaluate options through interactive models and make informed, evidence-based decisions?”
Case study: to support the development of a formulated product in the pharmaceutical industry
• Impact of formulation design on:
  • Stability
  • Dissolution
How are we going to do this?
The multidisciplinary collaboration
• Aim: to generate enhanced statistical design, modelling and visualisation capability, spanning:
  • fundamental science
  • development of innovative statistical methodology
  • implementation and application of the solutions
  • translation and communication of the results
Desired outputs
• the use of maximally efficient quantitative and statistical methods
• better understanding and quantification of uncertainty and risk to product quality
Informatics
• SEEK, INCUBATE, INDUSTRIALISE the creation of task-specific applications
Case studies
• Incubation (JMP): accelerated stability
• Seek (R): modelling dissolution profiles
(1) Standardised, automated data import
• Fig 1a: Launch dashboard
• Fig 1b: Switch project (Fig 1c), access data tables, re-run dashboard, retrieve existing models
• Fig 1c: Create new project, select existing project
• Fig 1d: Confirm table selection, access exploratory data analysis options
(2) Exploratory data analysis / cleanse data
• Multifaceted views of the same data set give different insights
• Does the data make sense?
• Data manipulation (stack/split) occurs behind the scenes
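As an illustration of the stack/split step (the dashboard performs this in JMP behind the scenes), here is a minimal R sketch using tidyr; the table layout and column names are hypothetical.

```r
library(tidyr)

# Hypothetical wide-format table: one row per time point, one column per vessel
wide <- data.frame(time_min = c(5, 10, 15, 30),
                   vessel_1 = c(12, 35, 58, 90),
                   vessel_2 = c(10, 33, 55, 88))

# "Stack": reshape to long format (one row per vessel per time point)
long <- pivot_longer(wide, starts_with("vessel_"),
                     names_to = "vessel", values_to = "pct_dissolved")

# "Split": reshape back to wide format when a particular view needs it
wide_again <- pivot_wider(long, names_from = vessel, values_from = pct_dissolved)
```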
Application: compare the concepts and contrast the specifics

Accelerated stability
• Standardised, automated data workflow
• Exploratory data analysis: views A, B, C, …
• Automated modelling and manual assessment of model quality
  • Observed long-term data
  • Candidate models: Relative Humidity Linear Time Kinetic, Absolute Humidity Linear Time Kinetic, Relative Humidity Accelerating Kinetic, Absolute Humidity Accelerating Kinetic, Relative Humidity Decelerating Kinetic, Absolute Humidity Decelerating Kinetic, Relative Humidity Power Model, Absolute Humidity Power Model
• Heuristic-based model selection
• Diagnostic evaluation

Dissolution
• Standardised, automated data workflow
• Exploratory data analysis: views X, Y, Z, …
• Automated modelling and manual assessment of model quality
  • Observed dissolution data and observed test reference data
  • Candidate models: Weibull, Gompertz, Asymptotic, Origin, Logistic
• Heuristic-based model selection
• Diagnostic evaluation
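To make the shared "automated modelling + heuristic-based model selection" step concrete, the following minimal R sketch fits two of the candidate dissolution models to one invented curve and compares them by AIC. It is an illustration only, not the production heuristic; the data and starting values are made up.

```r
# Hypothetical single dissolution profile (% dissolved over time)
time <- c(5, 10, 15, 20, 30, 45, 60)
diss <- c(15, 38, 57, 70, 84, 92, 96)

# Two of the candidate models, fitted by nonlinear least squares
weibull_fit  <- nls(diss ~ a * (1 - exp(-(time / b)^k)),
                    start = list(a = 100, b = 15, k = 1.2))
gompertz_fit <- nls(diss ~ a * exp(-b * exp(-k * time)),
                    start = list(a = 100, b = 3, k = 0.1))

# Simple information-criterion comparison as an illustrative selection rule
fits <- list(Weibull = weibull_fit, Gompertz = gompertz_fit)
sapply(fits, AIC)
best <- fits[[which.min(sapply(fits, AIC))]]
summary(best)   # manual assessment of the selected model
```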
Modelling Dissolution Profiles
Emily Matthews and Dave Woods
Southampton Statistical Sciences Research Institute
{E.S.Matthews, D.Woods}@soton.ac.uk
Introduction
• Hierarchical modelling
  • Stage 1
  • Stage 2
• Model assessment
• Visualisation using RStudio
Why Automate the Modelling?
• Capsule experiment: a non-regular fractional factorial design with 6 factors in 16 runs.
• Aim: model dissolution profiles to identify treatments that pass the tests, i.e. whose profiles are ‘suitably’ close to the reference.
• Four dissolution tests in four different media.
• Capsules dissolved in three to twelve vessels per test.
• 243 dissolution curves in total.
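For readers unfamiliar with such designs, the sketch below generates a 16-run, 6-factor two-level fraction in R with the FrF2 package. FrF2 constructs a regular fraction, used here only as a stand-in: the actual capsule experiment used a non-regular design whose construction is not reproduced here, and the factor names are hypothetical.

```r
library(FrF2)

# A regular 2^(6-2) fraction: 6 two-level factors in 16 runs (illustration only;
# the case-study design was non-regular)
design <- FrF2(nruns = 16, nfactors = 6,
               factor.names = c("F1", "F2", "F3", "F4", "F5", "F6"))
summary(design)   # aliasing structure and run order
```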
Two-Stage Hierarchical Model
We have used a two-stage hierarchical model for the dissolution curves:
• Stage 1: fit a model to each dissolution curve,
  $y_{ij} = f(t_{ij}; \theta_i) + \varepsilon_{ij}, \quad \varepsilon_{ij} \sim N(0, \sigma^2)$,
  where $y_{ij}$ is the observed dissolution for curve $i$ at time $t_{ij}$ and $\theta_i$ is a vector of $p$ curve-specific parameters. Model fit is assessed using $R^2$, AIC and BIC.
• Stage 2: fit a linear regression of the $p$ Stage 1 parameter estimates $\hat{\theta}_i$ on the treatment factors.
To predict the dissolution curve for a new treatment, we evaluate the Stage 1 model at parameters predicted from the Stage 2 model.
Example – Gompertz Model
• Stage 1: the Gompertz curve (in a standard parameterisation)
  $f(t_{ij}; \theta_i) = \theta_{i1} \exp\{-\theta_{i2} \exp(-\theta_{i3} t_{ij})\}$
  is fitted to each dissolution curve, with asymptote $\theta_{i1}$, displacement $\theta_{i2}$ and rate $\theta_{i3}$.
• Stage 2: each estimated parameter $\hat{\theta}_{ik}$, $k = 1, 2, 3$, is regressed linearly on the treatment factors.
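A minimal sketch of this Gompertz example in R, assuming a naive "fit each curve, then regress the estimates" implementation. This is not the authors' code (Stage 2 is estimated by EM or MCMC on later slides), and all data, factor names and starting values below are simulated purely for illustration.

```r
set.seed(1)
times <- c(5, 10, 15, 20, 30, 45, 60)
treatments <- expand.grid(F1 = c(-1, 1), F2 = c(-1, 1), F3 = c(-1, 1))

# Simulate one noisy Gompertz profile per treatment (purely illustrative)
profiles <- lapply(seq_len(nrow(treatments)), function(i) {
  a_i <- 95 + 3 * treatments$F1[i]
  k_i <- 0.10 + 0.02 * treatments$F2[i]
  data.frame(time_min = times,
             pct_dissolved = a_i * exp(-3 * exp(-k_i * times)) +
                             rnorm(length(times), 0, 1))
})

# Stage 1: fit a Gompertz curve to each dissolution profile
fit_gompertz <- function(d)
  coef(nls(pct_dissolved ~ a * exp(-b * exp(-k * time_min)),
           data = d, start = list(a = 100, b = 3, k = 0.1)))
stage1_est <- as.data.frame(t(sapply(profiles, fit_gompertz)))

# Stage 2: a linear model for each Stage 1 parameter estimate
stage2_dat <- cbind(stage1_est, treatments)
stage2 <- lapply(c("a", "b", "k"), function(p)
  lm(reformulate(c("F1", "F2", "F3"), response = p), data = stage2_dat))
names(stage2) <- c("a", "b", "k")

# Predict the dissolution curve for a new treatment from the Stage 2 fits
new_trt <- data.frame(F1 = 1, F2 = -1, F3 = 1)
par_hat <- sapply(stage2, function(m) predict(m, newdata = new_trt)[[1]])
pred_curve <- par_hat["a"] * exp(-par_hat["b"] * exp(-par_hat["k"] * times))
```

In practice the two stages would be estimated jointly (e.g. by the EM or MCMC approaches cited later), which propagates the Stage 1 uncertainty into the Stage 2 fit rather than treating the curve-level estimates as known.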
Models for Stage 1
• Test 1: linear model, fitted using lm in R.
• Test 2: non-parametric model using principal components analysis (Jolliffe, 2002), computed with svd in R.
• Test 4: Weibull model (standard parameterisation), $f(t_{ij}; \theta_i) = \theta_{i1}\bigl[1 - \exp\{-(t_{ij}/\theta_{i2})^{\theta_{i3}}\}\bigr]$.
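Generic R sketches of the three Stage 1 fits named above, each on invented data; they illustrate the lm, svd and nls calls rather than the exact models used in the case study.

```r
set.seed(3)
time <- c(5, 10, 15, 20, 30, 45, 60)

# Test 1: linear model for a single (approximately linear) profile
y_lin <- 1.4 * time + rnorm(length(time), 0, 2)
lm_fit <- lm(y_lin ~ time)

# Test 2: non-parametric summary of many curves via principal components (svd)
Y  <- matrix(rnorm(12 * 7), nrow = 12)     # 12 curves x 7 time points (toy data)
Yc <- scale(Y, center = TRUE, scale = FALSE)
pcs <- svd(Yc)                             # pcs$v: loadings over time; pcs$u %*% diag(pcs$d): scores

# Test 4: Weibull dissolution model fitted by nonlinear least squares
y_wei <- 100 * (1 - exp(-(time / 18)^1.3)) + rnorm(length(time), 0, 1)
wei_fit <- nls(y_wei ~ a * (1 - exp(-(time / b)^k)),
               start = list(a = 100, b = 15, k = 1))
```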
Modelling for Stage 2
Two methods were considered for estimating the Stage 2 parameters:
• the Expectation-Maximisation (EM) algorithm (Davidian and Giltinan, 1995);
• sampling via the Metropolis-Hastings within Gibbs algorithm of Matthews and Woods (2015), which also supports:
  • variable selection
  • model-averaged predictions.
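Neither cited method is reproduced here. As a much simpler, self-contained stand-in, the sketch below illustrates what variable selection and model-averaged prediction mean for one Stage 2 parameter, using all-subsets regression with Akaike weights; the data and factor names are simulated.

```r
set.seed(2)
# Simulated Stage 1 estimates of one parameter, with hypothetical factors F1-F3
dat <- data.frame(F1 = rep(c(-1, 1), each = 8),
                  F2 = rep(c(-1, 1), times = 8),
                  F3 = rep(rep(c(-1, 1), each = 4), times = 2))
dat$theta_hat <- 5 + 2 * dat$F1 - 1.5 * dat$F3 + rnorm(16, 0, 0.5)

# Enumerate all sub-models of theta_hat ~ F1 + F2 + F3 (including intercept-only)
terms <- c("F1", "F2", "F3")
subsets <- unlist(lapply(1:3, combn, x = terms, simplify = FALSE), recursive = FALSE)
subsets <- c(list(character(0)), subsets)
models <- lapply(subsets, function(s)
  lm(reformulate(if (length(s)) s else "1", response = "theta_hat"), data = dat))

# Akaike weights and a model-averaged prediction for a new treatment
aic <- sapply(models, AIC)
w <- exp(-(aic - min(aic)) / 2); w <- w / sum(w)
new_trt <- data.frame(F1 = 1, F2 = -1, F3 = -1)
pred <- sum(w * sapply(models, function(m) predict(m, newdata = new_trt)[[1]]))
```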
Bibliography
Davidian, M. and Giltinan, D.M. (1995). Nonlinear Models for Repeated Measurement Data. Monographs on Statistics and Applied Probability, No. 62. Florida: Chapman and Hall.
Jolliffe, I.T. (2002). Principal Component Analysis, 2nd edition. New York: Springer.
Matthews, E.S. and Woods, D.C. (2015). A Bayesian analysis of split-plot designs with spike-and-slab priors. In preparation.
Seek and Incubate Formulation Workflow (schematic)
Data → “new” model building methodology and selection (R) → import of the model description → data visualisation and “existing” model building and selection (JMP) → dashboard application output for general users (JMP) → mainstream scientific decision makers
Acknowledgements include:
Case study 1
• ASAP Incubation Team: Don Clancy, Jonathan Dean, Neil Hodnett, Rachel Orr, Martin Owen, John Peterson
• David Burnham (PegaAnalytics)
Case study 2
• Capsule Challenge Team: the GSK project team members; Emily Matthews, Dave Woods, Verity Fisher