Issues in the Practical Application of Data Mining Techniques to Pharmacovigilance

A. Lawrence Gould, Merck Research Laboratories, May 18, 2005

Spontaneous AE Reports
• Clinical trial safety information is incomplete
• Few patients, so rare events are likely to be missed
• Not necessarily ‘real world’
• Need information from post-marketing surveillance & spontaneous reports: pharmacovigilance
• Carried out by skilled clinicians & epidemiologists
• Long history of research on the issue, e.g., Finney (1974, 1982), Royall (1971), Inman (1970), Napke (1970)

Signal Generation: The Traditional Method
[Flowchart. Elements: single suspicious case or cluster; identify potential signals; potential signals; consultation (consult database, literature, marketing, programmer); statistical output; patient exposure; comparative data; background incidence; integrate information; refined signal(s); action]

Some Limitations of the Traditional Approach
• Incomplete reports of events, not reactions
• Unclear how to compute effect magnitude
• Many events reported, many drugs reported
• Bias & noise in the system
• Difficult to estimate incidence because the number of patients at risk and patient-years of exposure are seldom reliable
• Inappropriate to consider incidence using only spontaneous reports

The Pharmacovigilance Process
[Diagram. Elements: traditional methods; data mining; detect signals; generate hypotheses; insight from outliers; refute/verify; Type A (mechanism-based); Type B (idiosyncratic); estimate incidence; public health impact, benefit/risk; act; inform; change label; restrict use/withdraw]

Major Uses of Data Mining
• Identify subtle associations that might exist in large databases
• Early identification of potential toxicities
• Identify complex relationships not apparent from simple summarization
• Screening tool to identify potential associations that should undergo clinical/epidemiological follow-up

More to Pharmacovigilance than Data Mining
• Data mining is a refinement for discovering subtleties
• Still need initial case review to respond to reports involving severe, potentially life-threatening events, e.g., Stevens-Johnson syndrome, agranulocytosis, anaphylactic shock
• Clinical/biological/epidemiological verification of apparent associations is essential
• Need to think about the most effective use of data mining in routine pharmacovigilance practice

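Statistical Methodology (1)
• Not the key issue
• Most methods use variations of 2 × 2 table statistics
• Basic idea: flag a drug/event combination when $R = a/E(a)$ is “large”
• Some possibilities:
  • Reporting Ratio: $E(a) = n_{TD}\, n_{TA}/n$
  • Proportional Reporting Ratio: $E(a) = n_{TD}\, c/n_{OD}$
  • Odds Ratio: $E(a) = b\, c/d$
• Notation: a = reports with target drug and target event, b = target drug with other events, c = other drugs with target event, d = other drugs with other events; $n_{TD} = a + b$, $n_{TA} = a + c$, $n_{OD} = c + d$, $n = a + b + c + d$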
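The three expected-count formulas can be compared with a short Python sketch. It assumes the usual cell labels for the drug-by-event table: a = target drug with target event, b = target drug with other events, c = other drugs with target event, d = other drugs with other events. The `DrugEventTable` class and the function names are hypothetical, and the counts are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrugEventTable:
    """Hypothetical container for the 2 x 2 report counts of one drug/event pair."""
    a: int  # target drug, target event
    b: int  # target drug, other events
    c: int  # other drugs, target event
    d: int  # other drugs, other events

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d


def expected_counts(t: DrugEventTable) -> dict[str, float]:
    """E(a) under each of the three comparison rules."""
    n_td = t.a + t.b  # reports mentioning the target drug
    n_ta = t.a + t.c  # reports mentioning the target adverse event
    n_od = t.c + t.d  # reports mentioning other drugs
    return {
        "RR": n_td * n_ta / t.n,
        "PRR": n_td * t.c / n_od,
        "OR": t.b * t.c / t.d,
    }


def disproportionality_ratios(t: DrugEventTable) -> dict[str, float]:
    """R = a / E(a) for each rule; large values are candidates for review."""
    return {name: t.a / e for name, e in expected_counts(t).items()}


ratios = disproportionality_ratios(DrugEventTable(a=20, b=80, c=100, d=800))
```

With a = 20, b = 80, c = 100, d = 800 the sketch gives RR ≈ 1.67, PRR = 1.8 and OR = 2.0, an instance of the ordering OR > PRR > RR that holds when a exceeds its expected count. Zero cells are not handled here and would raise a division error.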

Statistical Methodology (2)
• Variability can be estimated in various ways, e.g., the usual chi-square statistic or Bayesian & empirical Bayesian models
• All methods give similar results once a drug/event combination has more than a few reports (e.g., 10)
• No non-clinical “gold standard” → can’t assess the diagnostic utility of any method in the usual sense
• OR > PRR > RR when $a > E(a)$, but this doesn’t mean OR identifies real associations better than RR
• RR is probably the most stable
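A minimal Python sketch of three ways to attach variability to these ratios: the usual Pearson chi-square for the 2 × 2 table (cells a, b, c, d as target drug/target event, target drug/other events, other drugs/target event, other drugs/other events), an approximate log-scale confidence interval for the PRR, and a simple gamma-Poisson shrinkage of a/E(a). The shrinkage uses one gamma prior with fixed, assumed hyperparameters; it is not DuMouchel's gamma-Poisson shrinker, which fits a two-component gamma mixture prior to the whole database by empirical Bayes. All function names are hypothetical.

```python
import math


def pearson_chi_square(a: int, b: int, c: int, d: int) -> float:
    """Usual Pearson chi-square (no continuity correction) for the 2 x 2 table."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def prr_confidence_interval(
    a: int, b: int, c: int, d: int, z: float = 1.96
) -> tuple[float, float]:
    """Approximate 95% interval for the PRR, computed on the log scale."""
    prr = (a / (a + b)) / (c / (c + d))
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return prr * math.exp(-z * se_log), prr * math.exp(z * se_log)


def gamma_poisson_shrunk_ratio(
    a: int, expected: float, alpha: float = 1.0, beta: float = 1.0
) -> float:
    """Posterior mean of lambda when a ~ Poisson(lambda * expected) and
    lambda ~ Gamma(shape=alpha, rate=beta); the prior here is fixed, not fitted."""
    return (a + alpha) / (expected + beta)
```

With one report against E(a) = 0.1, the raw ratio of 10 shrinks to about 1.8; with 100 reports against E(a) = 10, the shrunk value is about 9.2, close to the raw ratio. Shrinkage mainly changes the picture when counts are small, which is consistent with the point that methods agree once more than a few reports are available.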

Spontaneous Report Database Limitations
• Significant under-reporting (especially for OTC products), depending on the seriousness or novelty of the event, newness of the drug, and intensity of monitoring
• Different regulatory reporting requirements
• Reflects only reporting practice, not incidence
• Synonyms for drugs & events → loss of sensitivity
• Much duplication of reports
• Exposure rate unknown
• For any given report, there is no certainty that a suspected drug caused the reaction
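The sensitivity loss from synonyms and the inflation from duplicate reports can be illustrated with a toy Python sketch. The synonym maps, the fields used to recognise duplicates (`DEDUP_FIELDS`), and the four reports are all hypothetical; real systems code drugs and events against controlled dictionaries (for events, typically MedDRA) and use more careful duplicate-detection rules.

```python
from collections import Counter

# Hypothetical synonym maps; a real system would code terms against
# controlled vocabularies rather than a hand-written dict.
DRUG_SYNONYMS = {"paracetamol": "acetaminophen"}
EVENT_SYNONYMS = {"liver failure": "hepatic failure"}

# Hypothetical fields used to decide that two reports describe the same case.
DEDUP_FIELDS = ("age", "sex", "drug", "event", "onset")


def normalize(report: dict) -> dict:
    """Lower-case drug/event names and map synonyms to a preferred term."""
    drug = report["drug"].strip().lower()
    event = report["event"].strip().lower()
    return {
        **report,
        "drug": DRUG_SYNONYMS.get(drug, drug),
        "event": EVENT_SYNONYMS.get(event, event),
    }


def deduplicate(reports: list[dict]) -> list[dict]:
    """Keep the first report for each combination of DEDUP_FIELDS."""
    seen = set()
    unique = []
    for report in reports:
        key = tuple(report[field] for field in DEDUP_FIELDS)
        if key not in seen:
            seen.add(key)
            unique.append(report)
    return unique


def pair_counts(reports: list[dict]) -> Counter:
    """Count reports for each (drug, event) pair."""
    return Counter((r["drug"], r["event"]) for r in reports)


reports = [
    {"age": 54, "sex": "F", "drug": "Paracetamol", "event": "Liver failure", "onset": "2004-03-01"},
    {"age": 54, "sex": "F", "drug": "acetaminophen", "event": "hepatic failure", "onset": "2004-03-01"},
    {"age": 61, "sex": "M", "drug": "acetaminophen", "event": "hepatic failure", "onset": "2004-05-12"},
    {"age": 47, "sex": "F", "drug": "paracetamol", "event": "hepatic failure", "onset": "2004-07-20"},
]

pair = ("acetaminophen", "hepatic failure")
raw = pair_counts(reports)[pair]                      # 2: synonyms split the count
normalized = [normalize(r) for r in reports]
inflated = pair_counts(normalized)[pair]              # 4: the duplicate is counted twice
cleaned = pair_counts(deduplicate(normalized))[pair]  # 3
```

Deduplication is applied after normalization here because the first two reports only look identical once their terms are mapped to the same preferred names.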

A Major Limitation (Often Ignored)
• Accumulated reports cannot be used to calculate incidence or to estimate drug risk, and comparisons between drugs cannot be made from these data
• Unfortunately, this is still done; disclaimers do not offset the effect of the misrepresentation
• It is easy to show differences with data mining techniques, but impossible to make valid inferences about causality from them, and such differences may mislead

Implementation Issues
• Portfolio bias in company databases can lead to inaccurate estimates of relative reporting rates
• Does the public health benefit justify the cost of following up signals detected by routine data mining methods?
• Variation in tools and databases among regulators could lead to significant cost without public health benefit
• Do frequency-based signal detection methods that are useful to regulators have business value in industry settings?
• Need examples of situations where the computerized approach failed to identify important issues, and where signals were “created” by publicity or reporting artifacts

Mining is Easy, Refining Low-grade Ore is Hard
• What is the data mining activity intended to accomplish? What clinical, epidemiological, or regulatory questions need to be answered?
• Need to address the impact of various factors, e.g., evolution of an apparent association over time, and association with key demographic factors such as age, sex, and disease classification
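One way to account for a demographic factor such as age is to compute the expected count E(a) within each stratum and sum, so that the drug is compared with other drugs in the same stratum. The Python sketch below applies the reporting-ratio formula $E(a) = n_{TD}\, n_{TA}/n$ stratum by stratum, with (a, b, c, d) as target drug/target event, target drug/other events, other drugs/target event, other drugs/other events. The stratum counts are invented to show how a crude ratio can be inflated when a drug is used mostly in a group with a higher background rate of the event; function names are hypothetical.

```python
from collections.abc import Iterable

Table = tuple[int, int, int, int]  # (a, b, c, d) for one stratum


def reporting_ratio(strata: Iterable[Table]) -> float:
    """Observed a over expected E(a), with E(a) = n_TD * n_TA / n summed over strata."""
    observed = 0
    expected = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        observed += a
        expected += (a + b) * (a + c) / n
    return observed / expected


def collapse(strata: list[Table]) -> Table:
    """Add the strata cell by cell to get the crude (unstratified) table."""
    return tuple(sum(cells) for cells in zip(*strata))


by_age = [
    (1, 99, 19, 1881),   # younger patients: event rare, drug in 5% of reports
    (18, 82, 162, 738),  # older patients: event common, drug in 10% of reports
]

crude = reporting_ratio([collapse(by_age)])
stratified = reporting_ratio(by_age)
```

In the invented data the crude ratio is 1.425, but after stratifying by age group the observed count equals its expected count (ratio 1.0). Re-running the same calculation on the cumulative reports available at the end of each period is one simple way to follow how an apparent association evolves over time.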

More Issues
• Composition of the database may be important: important associations of a new drug could be cloaked by events associated with an older drug that has a similar mechanism of action
• Individual company databases tend to have comprehensive information about company products, but not about the general spectrum of drugs/vaccines
• Databases contain reports mentioning drugs, not demonstrations of causality
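The cloaking effect can be seen with simple PRR arithmetic: if an older drug with a similar mechanism contributes many reports of the same event to the comparator group, the background proportion c/(c + d) rises and the new drug's ratio falls. The counts in this Python sketch are hypothetical.

```python
def prr(a: int, b: int, c: int, d: int) -> float:
    """Proportional reporting ratio: (a / (a + b)) / (c / (c + d))."""
    return (a / (a + b)) / (c / (c + d))


# Hypothetical counts: (reports with the target event, reports with other events).
new_drug = (10, 90)
other_drugs = (100, 9900)
old_drug = (250, 250)  # older drug, similar mechanism, strongly tied to the event

# Comparator group that excludes the older drug.
without_old_drug = prr(*new_drug, *other_drugs)

# Comparator group that includes the older drug's reports.
with_old_drug = prr(
    *new_drug,
    other_drugs[0] + old_drug[0],
    other_drugs[1] + old_drug[1],
)
```

Here the new drug's PRR falls from 10 to 3 when the older drug's reports are part of the comparator; recomputing with the older drug excluded from the background is one possible check for this kind of masking.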

Discussion
• Most apparent associations represent known problems
• Some reflect the underlying disease or patient population
• About 25% may represent signals of previously unknown associations
• Statistical involvement in implementation & interpretation is important
• The actual false positive rate is unknown, as are the legal and resource implications

What Next?
• PhRMA/FDA working group is considering ways to address these issues; a white paper will be published
• It may be worthwhile to construct & maintain a cleaned-up canonical database from AERS, providing a common resource for checking data mining findings based on individual companies' proprietary databases
