Late fusion methods and performance metrics for the effective prioritization of drug candidates

Author: Gábor Csizmadia
Supervisor: Péter Antal

Abstract
There are many ways to assess the similarity between compounds, for example:
• Structure-based
• Chemical property-based
• Biological effect-based
• Literature-based
Combining several of these methods should give more accurate results than any single one; this is the idea behind data fusion.
Implemented software: rank and score fusion methods, performance metrics

Overview
• Drug prioritization
• Data fusion approaches
• Rank/score fusion
• Performance metrics
• Implemented software
• Future plans

Drug prioritization
• Start from a list of compounds known to be active for a specific condition
• Assess the similarity of other compounds to these known actives
• Predict which compounds in the unknown set are active

Data fusion approaches
• Early: data vectors are concatenated
• Intermediate: similarity matrices are combined
• Late: rankings or scorings are combined (rank and score fusion)

Rank and score fusion
Learning to rank
Rank fusion methods:
• Borda fusion
• Rank vote
• Pareto ranking
• Parallel selection
Score fusion methods:
• Sum score

Borda fusion
• Each ranking assigns points to the compounds it ranks, based on their rank
• The points are then summed to get the score of each compound
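A minimal Java sketch of Borda fusion may make the point scheme concrete. The slides do not say how many points each rank earns, so this sketch assumes that a ranking of n compounds gives n points to its top compound and 1 point to its last one, and that a compound missing from a ranking gets nothing from it. The class name BordaFusionSketch and its method names are hypothetical and are not taken from the implemented software.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BordaFusionSketch {
    /** Sums Borda points per compound; each ranking lists compound IDs best-first. */
    static Map<String, Integer> scores(List<List<String>> rankings) {
        Map<String, Integer> points = new LinkedHashMap<>();
        for (List<String> ranking : rankings) {
            int n = ranking.size();
            for (int i = 0; i < n; i++) {
                // Assumed point scheme: n points for the top compound, 1 for the last.
                points.merge(ranking.get(i), n - i, Integer::sum);
            }
        }
        return points;
    }

    /** Returns compound IDs by descending Borda score; ties keep first-seen order. */
    static List<String> fuse(List<List<String>> rankings) {
        Map<String, Integer> points = scores(rankings);
        List<String> fused = new ArrayList<>(points.keySet());
        fused.sort(Comparator.comparing((String c) -> points.get(c)).reversed());
        return fused;
    }
}
```

With the three rankings A, B, C; B, A, C; and A, C, B, compound A collects 3 + 2 + 3 = 8 points, B collects 6 and C collects 4, so the fused order is A, B, C.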

Rank vote
• Each ranking votes for its top n compounds
• The fused ranking is based on how many votes each compound received
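Rank vote can be sketched in a few lines of Java. The sketch assumes that each ranking is a best-first list of compound IDs without duplicates, and that compounds receiving no votes are simply left out of the result; the slides do not specify either point. RankVoteSketch and the parameter topN are illustrative names only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RankVoteSketch {
    /** Counts, for each compound, how many rankings place it in their top topN. */
    static Map<String, Integer> votes(List<List<String>> rankings, int topN) {
        Map<String, Integer> votes = new LinkedHashMap<>();
        for (List<String> ranking : rankings) {
            int limit = Math.min(topN, ranking.size());
            for (String compound : ranking.subList(0, limit)) {
                votes.merge(compound, 1, Integer::sum);
            }
        }
        return votes;
    }

    /** Returns the voted-for compounds by descending vote count; ties keep first-seen order. */
    static List<String> fuse(List<List<String>> rankings, int topN) {
        Map<String, Integer> votes = votes(rankings, topN);
        List<String> fused = new ArrayList<>(votes.keySet());
        fused.sort(Comparator.comparing((String c) -> votes.get(c)).reversed());
        return fused;
    }
}
```

Because many compounds end up with the same vote count, a real implementation would probably need an explicit tie-breaking rule; here ties just keep the order in which compounds were first seen.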

Pareto ranking
Each compound is ranked by the number of compounds that are ranked better than it in every ranking (fewer is better)
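The counting step of Pareto ranking can be illustrated as follows. This sketch assumes that every ranking contains exactly the same compounds, listed best-first, and it uses a straightforward nested loop with indexOf lookups rather than anything efficient. ParetoRankingSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ParetoRankingSketch {
    /** For each compound, counts the compounds that are ranked better in every ranking. */
    static Map<String, Integer> dominatorCounts(List<List<String>> rankings) {
        // Assumption: all rankings contain the same compounds as the first one.
        List<String> compounds = rankings.get(0);
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String c : compounds) {
            int dominators = 0;
            for (String d : compounds) {
                if (!d.equals(c) && betterEverywhere(d, c, rankings)) {
                    dominators++;
                }
            }
            counts.put(c, dominators);
        }
        return counts;
    }

    /** True if d appears strictly before c in every ranking. */
    private static boolean betterEverywhere(String d, String c, List<List<String>> rankings) {
        for (List<String> ranking : rankings) {
            if (ranking.indexOf(d) >= ranking.indexOf(c)) {
                return false;
            }
        }
        return true;
    }

    /** Returns compounds by ascending dominator count; ties keep first-ranking order. */
    static List<String> fuse(List<List<String>> rankings) {
        Map<String, Integer> counts = dominatorCounts(rankings);
        List<String> fused = new ArrayList<>(counts.keySet());
        fused.sort(Comparator.comparing((String c) -> counts.get(c)));
        return fused;
    }
}
```

For the rankings A, B, C, D and B, A, D, C, neither A nor B is beaten in both rankings, so both get a count of 0, while C and D are each beaten by A and B in both rankings and get a count of 2.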

Parallel selection
• Compounds are selected from each ranking in turn
• If the compound that would be selected has already been selected, the next compound from that ranking is selected instead
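The round-robin procedure above could look like this in Java. The sketch assumes best-first lists of compound IDs and stops once every ranking is exhausted; rankings may have different lengths. ParallelSelectionSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ParallelSelectionSketch {
    /** Takes one not-yet-selected compound from each ranking in turn until all are used up. */
    static List<String> fuse(List<List<String>> rankings) {
        Set<String> selected = new LinkedHashSet<>();
        int[] next = new int[rankings.size()]; // next unread position in each ranking
        boolean progressed = true;
        while (progressed) {
            progressed = false;
            for (int r = 0; r < rankings.size(); r++) {
                List<String> ranking = rankings.get(r);
                // Skip compounds that have already been selected.
                while (next[r] < ranking.size() && selected.contains(ranking.get(next[r]))) {
                    next[r]++;
                }
                if (next[r] < ranking.size()) {
                    selected.add(ranking.get(next[r]++));
                    progressed = true;
                }
            }
        }
        return new ArrayList<>(selected);
    }
}
```

For the rankings A, B, C, D and A, C, E, B, the first round selects A and then C (the second ranking skips A), the second round selects B and E, and the third round selects D, giving the fused order A, C, B, E, D.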

Sum score
The normalized scores from each ranking are summed to get the fused score of a compound
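A sum score fusion might be sketched as below. The slides do not say which normalization is used, so this example assumes min-max scaling to the range [0, 1], with higher scores being better, non-empty score maps, and compounds missing from a scoring contributing nothing for it. SumScoreSketch is a hypothetical name.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SumScoreSketch {
    /** Sums min-max normalized scores per compound; each map is compound ID -> raw score. */
    static Map<String, Double> fuse(List<Map<String, Double>> scorings) {
        Map<String, Double> fused = new LinkedHashMap<>();
        for (Map<String, Double> scoring : scorings) {
            double min = Collections.min(scoring.values());
            double max = Collections.max(scoring.values());
            double range = max - min;
            for (Map.Entry<String, Double> e : scoring.entrySet()) {
                // Assumed min-max normalization; a constant scoring contributes 0 to everyone.
                double normalized = range == 0.0 ? 0.0 : (e.getValue() - min) / range;
                fused.merge(e.getKey(), normalized, Double::sum);
            }
        }
        return fused;
    }
}
```

Other normalizations, such as z-scores or rank-based scaling, would slot into the same loop; the choice matters because min-max scaling is sensitive to outliers in any single source.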

Performance metrics 1.
The performance of a ranking (how early it ranks the actives) can be measured in various ways.
Area under the curve (AUC) values for the following curves:
• AC (Accumulation Curve): plots the true positive rate as a function of the fraction of data classified as positive
• ROC (Receiver Operating Characteristic): plots the true positive rate as a function of the false positive rate
• CAC (Concentrated AC): an AC whose horizontal axis is magnified near the origin to emphasize early retrieval
• CROC (Concentrated ROC): a ROC curve with the same magnification applied
[Figure: ROC curve. Source: Wikipedia]
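As an illustration of the two basic curves, the following Java sketch computes the area under the ROC curve and under the accumulation curve for a single ranking. It assumes a best-first list of compound IDs without ties, at least one active and one inactive in the list, and trapezoidal integration for the AC; none of these details come from the slides. RankingMetricsSketch and its method names are hypothetical.

```java
import java.util.List;
import java.util.Set;

final class RankingMetricsSketch {
    /** ROC AUC: fraction of (active, inactive) pairs in which the active is ranked first. */
    static double rocAuc(List<String> ranking, Set<String> actives) {
        long activesSeen = 0;
        long inactivesSeen = 0;
        long correctPairs = 0;
        for (String compound : ranking) {
            if (actives.contains(compound)) {
                activesSeen++;
            } else {
                inactivesSeen++;
                // Every active seen so far is ranked above this inactive.
                correctPairs += activesSeen;
            }
        }
        return (double) correctPairs / (activesSeen * inactivesSeen);
    }

    /** AC AUC: area under true positive rate vs. fraction of the list selected (trapezoids). */
    static double acAuc(List<String> ranking, Set<String> actives) {
        int total = ranking.size();
        long activeCount = ranking.stream().filter(actives::contains).count();
        double area = 0.0;
        long found = 0;
        for (String compound : ranking) {
            double before = (double) found / activeCount;
            if (actives.contains(compound)) {
                found++;
            }
            double after = (double) found / activeCount;
            area += (before + after) / 2.0 / total; // one trapezoid of width 1/total
        }
        return area;
    }
}
```

For the ranking A, B, C, D with actives A and C, three of the four active/inactive pairs are ordered correctly, so the ROC AUC is 0.75, and the AC AUC works out to 0.625. The concentrated variants would apply a magnifying transformation to the horizontal axis before integrating; that step is not shown here.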


Implemented software
• Java language
• Command line
• 2 modules: fuser (12 classes), performance tester (13 classes + 2 interfaces)
• Dedicated class for scored rankings: Ranking
• Common interface for all fusion methods: Fuser
• Common interface for all metrics: Metric
• java fusiontester.Main [type] [r1path] [r1ms] [r2path] [r2ms] ...
• java performancetester.Main [type] [rankingpath] [activespath]
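The slides name the Ranking class and the Fuser and Metric interfaces but do not show their signatures. The sketch below is only a guess at what such a design could look like; the types ScoredRanking, FusionMethod and PerformanceMetric and all of their methods are hypothetical stand-ins, deliberately given different names from the real ones.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical scored ranking: compound ID -> score, higher is better. */
final class ScoredRanking {
    private final Map<String, Double> scores;

    ScoredRanking(Map<String, Double> scores) {
        this.scores = new LinkedHashMap<>(scores);
    }

    /** Score of a compound, or 0.0 if the compound is not in this ranking (assumed default). */
    double score(String compound) {
        return scores.getOrDefault(compound, 0.0);
    }

    /** Compound IDs ordered best-first by descending score. */
    List<String> ordered() {
        List<String> ids = new ArrayList<>(scores.keySet());
        ids.sort((a, b) -> Double.compare(scores.get(b), scores.get(a)));
        return ids;
    }
}

/** Hypothetical common shape for fusion methods: many rankings in, one ranking out. */
interface FusionMethod {
    ScoredRanking fuse(List<ScoredRanking> inputs);
}

/** Hypothetical common shape for metrics: one ranking and the known actives in, a number out. */
interface PerformanceMetric {
    double evaluate(ScoredRanking ranking, Set<String> actives);
}
```

A shared interface of this kind lets the command-line entry point pick a fusion method or metric by its [type] argument and call it without knowing which concrete class it is.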

Future plans
• Better handling of incomplete data
• Testing the effects of noise
• Considering the statistical significance of sources
• ...
• (TDK)

References
• Bolgár Bence Márton. Kernel fúziós módszerek alkalmazása a genomikai kísérlettervezésben és adatelemzésben [Application of kernel fusion methods in genomic experiment design and data analysis]. 2012.
• Fredrik Svensson, Anders Karlén, and Christian Sköld. Virtual Screening Data Fusion Using Both Structure- and Ligand-based Methods. J. Chem. Inf. Model. 2012, 52, 225-232.
• S. Joshua Swamidass, Chloé-Agathe Azencott, Kenny Daily, and Pierre Baldi. A CROC stronger than ROC: measuring, visualizing and optimizing early retrieval. Advance Access publication April 7, 2010.
• Jean-François Truchon and Christopher I. Bayly. Evaluating Virtual Screening Methods: Good and Bad Metrics for the "Early Recognition" Problem. J. Chem. Inf. Model. 2007, 47, 488-508.
