Data Mining in the Pharmaceutical Industry By Jerry Swartz
Introduction • Since I am a remote student, if there are questions, feel free to e-mail jswartz@ligand.com
Pharmaceutical Development • Four Stages of Drug Development • Research finds new drugs • Development tests and predicts drug behavior • Clinical trials test the drug in humans • Commercialization takes drug and sells it to likely consumers (doctors and patients) • I’ll show an example for the Research, Development, and Clinical Trials stages
Research Stage • Huge user of data mining tools and techniques • Scientists run experiments to determine activity of potential drugs • High-speed screening tests tens, hundreds, or thousands of drugs very quickly; this generates microarray data
Research Stage • “Bioinformatics” is a general term for the information processing activities on data generated in Research Stage, especially microarray data • General goal is to find activity on relevant genes or to find drug compounds that have desirable characteristics (whatever those may be)
Research Stage • Data mining techniques used • Clustering • Classification • Neural networks
Research Stage Example 1 • Goal: Determine compounds with similar activity • Why: Compounds with similar activity may behave similarly • When: • Have known compound and are looking for something better • Don’t have known compound but have desired activity and want to find compound that exhibits this activity
Research Stage Example 1 • Sample data
Research Stage Example 1 • Cluster compounds that have similar activity • We like behavior of H2O and want to see what compounds have similar activity • Example derived from Application of Nearest-Neighbor and Cluster Analyses in Pharmaceutical Lead Discovery • Clustering takes place based on similar activity using Euclidean “distance.”
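As a rough illustration of the Euclidean "distance" idea, the sketch below ranks compounds by how close their activity profiles are to H2O. The two-value profiles and the helper name nearest_neighbors are invented for this sketch and are not the slide's sample data. It needs Python 3.8 or later for math.dist.

```python
import math

# Hypothetical activity profiles, one value per assay (invented numbers,
# not the sample data from the slides).
profiles = {
    "H2O":  (0.10, 0.20),
    "H2O2": (0.40, 0.60),
    "CO2":  (0.50, 0.70),
}

def nearest_neighbors(query, profiles):
    """Rank the other compounds by Euclidean distance to the query compound."""
    others = (name for name in profiles if name != query)
    return sorted(others, key=lambda name: math.dist(profiles[query], profiles[name]))

ranked = nearest_neighbors("H2O", profiles)
print(ranked)  # compounds ordered from most to least similar to H2O
```

With these made-up numbers, H2O2 comes out as the closest neighbor of H2O. Real activity profiles would have many more dimensions, but the distance calculation is the same.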
Research Stage Example 1 • For simplicity, distance in example is simply difference between Beta and Delta values, not Euclidean • Distances:
Research Stage Example 1 • Dendrogram [figure: leaves H2O2, CO2, H2O; distance axis marked at 0.49, 0.16, 0.00]
Research Stage Example 1 • Conclusion: • H2O2 and CO2 are most alike, but • H2O2 behaves more like H2O than CO2 does
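The clustering behind this conclusion can be sketched as single-linkage agglomerative clustering on a simplified one-dimensional distance. The activity scores below are hypothetical, chosen only so that the merge heights agree with the 0.16 and 0.49 marks on the dendrogram. The linkage rule is also an assumption, since the slides do not say which one was used.

```python
from itertools import combinations

# Hypothetical one-dimensional activity scores (assumed, not the slides' data).
activity = {"H2O": 0.00, "H2O2": 0.49, "CO2": 0.65}

def distance(a, b):
    """Simplified distance: absolute difference of two activity scores."""
    return abs(activity[a] - activity[b])

def linkage(c1, c2):
    """Single linkage: distance between the closest members of two clusters."""
    return min(distance(x, y) for x in c1 for y in c2)

def single_linkage(names):
    """Merge the two closest clusters until one remains; return the merges."""
    clusters = [frozenset([n]) for n in names]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((sorted(a), sorted(b), round(linkage(a, b), 2)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges

merges = single_linkage(list(activity))
for left, right, height in merges:
    print(left, "+", right, "at distance", height)
```

With these assumed scores, H2O2 and CO2 merge first at 0.16, and H2O joins them at 0.49 through H2O2, its nearer neighbor. This is the same ordering the conclusion describes.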
Research Stage Example 1 • Variations • Example clustering performed on activity • Clustering could have been performed on structure (i.e. find chemically similar compounds) • Clustering could have been performed on both structure and activity (called SAR – Structure Activity Relationship, see next slide)
Research Stage Example 1 • [figure: Structure and Activity]
Development Stage • Company thinks drug might have some benefit • Undergoes testing in animals, human tissue to observe effect; maybe limited human tests • Determine how much drug to consume for desired effect • How dangerous is drug?
Development Stage • Data mining techniques used • Classification • Neural networks
Development Stage Example 2 • Goal: Predict if treatment will aid patients • Why: If drug will not aid patients, what purpose does drug serve? • When: • Have data supporting use of drug • Have training data that shows effects of drug (positive or negative) • Want to be able to predict which patients will benefit
Development Stage Example 2 • Will treatment help sickle cell anemia patients? • We have information like gender, body weight, disease state, etc. • Feed these into neural network and predict whether patient will benefit from drug. • Example derived from Prediction of Sickle Cell Anemia Patient’s Response to Hydroxyurea Treatment Using ARTMAP Network
Development Stage Example 2 • Uses ARTMAP network, a type of neural network based on Adaptive Resonance Theory • Instead of activation function, uses choice function which compares the input with a stored “template” • Basically matches input to “template” and generates output • If input is similar enough to “template” it generates the corresponding output
Development Stage Example 2 • Imagine training data has one of two classifications (Yes and No) • Network is trained for the Yes classifications and a snapshot is taken of the neural network. • Network then trained for the No classifications and another snapshot is taken. • Output is Yes or No, depending on whether the inputs are more similar to the “Yes” or the “No” training data.
Development Stage Example 2 • ARTMAP [figure: inputs Weight, Height, Gender, Blood Pressure; output Patient Benefits?] • Imagine array of weights, one for each “template” • Template closest to input chosen • Path of “least resistance” chosen for output
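The template-matching idea can be sketched with a fuzzy-ART-style choice function and a vigilance test. This is a simplified illustration, not a full ARTMAP network: it has no learning, no complement coding, and no match tracking. The templates, the parameter values ALPHA and RHO, and the scaling of each input to the range 0 to 1 are all assumptions made for the sketch.

```python
# Simplified fuzzy-ART-style template matching (illustrative only).
ALPHA = 0.001  # small choice parameter (assumed value)
RHO = 0.75     # vigilance: how similar counts as "similar enough" (assumed value)

def fuzzy_and(a, b):
    """Component-wise minimum of two vectors."""
    return [min(x, y) for x, y in zip(a, b)]

def choice(inp, template):
    """Choice function: how strongly a template responds to the input."""
    return sum(fuzzy_and(inp, template)) / (ALPHA + sum(template))

def match(inp, template):
    """Fraction of the input that is covered by the template."""
    return sum(fuzzy_and(inp, template)) / sum(inp)

def predict(inp, templates):
    """Return the label of the best-responding template that passes vigilance."""
    ranked = sorted(templates, key=lambda t: choice(inp, t[0]), reverse=True)
    for weights, label in ranked:
        if match(inp, weights) >= RHO:
            return label
    return None  # no stored template is similar enough

# Hypothetical templates over (weight, height, gender, blood pressure), scaled to 0..1.
templates = [
    ([0.8, 0.7, 1.0, 0.6], "Yes"),
    ([0.3, 0.4, 0.0, 0.9], "No"),
]

print(predict([0.75, 0.7, 1.0, 0.55], templates))
print(predict([0.2, 0.3, 0.0, 0.8], templates))
```

The first hypothetical patient lies closest to the "Yes" template and the second to the "No" template, so the two calls return those labels. A trained ARTMAP would build and adjust its templates from the training data instead of having them written by hand.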
Clinical Trials Stage • Company tests drugs in actual patients on larger scale • Must keep track of data about patient progress • Government wants to protect health of citizens, many rules govern clinical trials • In USA, Food and Drug Administration oversees trials.
Clinical Trials Stage • Data mining techniques used • Neural networks
Clinical Trials Stage • Data is collected by pharmaceutical company but undergoes statistical analysis to determine success of trial • Data reported to FDA is inspected closely • Negative reactions are called “adverse events”; too many might indicate drug is too dangerous • Adverse event might be medicine causing drowsiness • Data mining performed by FDA, not as much by pharmaceutical companies
Clinical Trials Stage Example 3 • Goal: Detect when too many adverse events occur or detect link between drug and adverse event • Why: Too many adverse events linked to a drug might indicate drug is too dangerous or health of patient is at risk • When: • As adverse events are reported to FDA • Or when link is suspected
Clinical Trials Stage Example 3 • Is a drug causing “too many” adverse events? • We have number of reports of adverse events pertaining to drugs. • Feed these into neural network and let network lead us to what is “too many.” • Example derived from Data mining in the US Vaccine Adverse Event Reporting System (VAERS): early detection of intussusception and other events after rotavirus vaccination
Clinical Trials Stage Example 3 • Sample data – cells contain number of reports linking drug and adverse event
Clinical Trials Stage Example 3 • Uses Bayesian neural network • Prior probability is probability that any report contains reference to adverse event • Posterior probability is probability that report has link between drug and adverse event • Determines “strength” of link between adverse event and drug (called Information Component or IC) • More complicated than appears: patient may consume multiple drugs – which one caused adverse event?
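A rough sketch of the Information Component follows, using the form commonly given for it: IC = log2( P(drug, event) / (P(drug) × P(event)) ). The report counts are invented, and the probabilities are plain frequencies. An actual Bayesian network would smooth these estimates with priors, which this sketch leaves out.

```python
import math

# Hypothetical report counts: counts[drug][event] = number of reports (invented).
counts = {
    "DrugA": {"drowsiness": 30, "rash": 10},
    "DrugB": {"drowsiness": 10, "rash": 50},
}

def information_component(counts, drug, event):
    """IC = log2( P(drug, event) / (P(drug) * P(event)) ) from raw frequencies."""
    total = sum(n for row in counts.values() for n in row.values())
    p_joint = counts[drug][event] / total
    p_drug = sum(counts[drug].values()) / total
    p_event = sum(row.get(event, 0) for row in counts.values()) / total
    return math.log2(p_joint / (p_drug * p_event))

ic_drowsy = information_component(counts, "DrugA", "drowsiness")
ic_rash = information_component(counts, "DrugA", "rash")
print(ic_drowsy, ic_rash)
```

A positive IC means the drug and the adverse event are reported together more often than they would be if they were independent. A negative IC means they appear together less often. In this made-up table, DrugA has a positive IC for drowsiness and a negative IC for rash.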
Clinical Trials Stage Example 3 • Bayesian Neural Network [figure: Drug and Adverse Event nodes joined by the strength of link between adverse event and drug]
Clinical Trials Stage Example 3 • Could be solved using Bayes’ Theorem and correlation techniques • Number of possible drug/adverse event combinations is very, very large • Training data is from FDA, WHO databases • Neural network hides statistical complexity • Unfortunately details of NN like activation function and hidden nodes are unknown
Data Mining Benefits • Research Stage – instead of trial and error, data mining can help find drugs that have desirable activity • Development Stage – data mining can help predict who will benefit from drug • Clinical Trials Stage – data mining protects patients and helps regulate drug testing • Commercialization Stage – data mining can optimize use of sales resources like manpower, advertising