Analysis of Microarray Data using Monte Carlo Neural Networks Jeff Knisley, Lloyd Lee Glenn, Karl Joplin, and Patricia Carey The Institute for Quantitative Biology, East Tennessee State University
Outline of Talk • Microarray Data • Neural Networks • A Simple Perceptron Example • Neural Networks for Data Mining • A Monte Carlo Approach • Incorporating Known Genes • Models of the Neuron and Neural Networks
Microarray Data • Goal: Identify genes which are up- or down-regulated when an organism is in a certain state • Examples: • What genes cause certain insects to enter diapause (similar to hibernation)? • In Cystic Fibrosis, what non-CFTR genes are up- or down-regulated?
cDNA Microarrays • Obtain mRNA from a population/tissue in the given state (sample) and a population/tissue not in the given state (reference) • Synthesize cDNAs from the mRNAs in the cell • cDNA is long (500 – 2,000 bases) • But not necessarily the entire gene • Reference labeled green, sample labeled red • Hybridize onto “spots” • Each spot is often (but not necessarily) a gene • cDNAs bind to each spot in proportion to concentrations
cDNA Microarray Data • Ri, Gi = intensities of the ith spot • Absolute intensities often cannot be compared • The same reference may be used for all samples • There are many sources of bias • Significant spot-to-spot intensity variations may have nothing to do with the biology • Normalize so that Ri = Gi on average • Most genes are unchanged, all else equal • But rarely is “all else equal”
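A minimal sketch of the normalization step, assuming NumPy arrays of background-corrected intensities and a single global scaling factor (real pipelines often use intensity-dependent methods such as loess instead); the function and variable names are illustrative:

```python
import numpy as np

def normalize_global(R, G):
    """Rescale the red channel so that, on average, R equals G.

    R, G: 1-D arrays of background-corrected spot intensities
    (hypothetical inputs; this uses one global factor for simplicity).
    """
    R = np.asarray(R, dtype=float)
    G = np.asarray(G, dtype=float)
    # Factor chosen so the median log-ratio becomes zero, reflecting the
    # assumption that most genes are unchanged.
    k = np.median(np.log2(R) - np.log2(G))
    return R / 2.0**k, G

# Example with made-up intensities
R = np.array([1200.0, 800.0, 15000.0, 300.0])
G = np.array([1000.0, 900.0, 9000.0, 350.0])
Rn, Gn = normalize_global(R, G)
print(np.median(np.log2(Rn / Gn)))  # ~0 after normalization
```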
Microarray Data • Several samples (and references) • A time series of microarrays • A comparison of several different samples • Data is in the form of a table • The jth microarray's intensities are Rj,i, Gj,i • Background intensity has often been subtracted • Question: How can we use Rj,i, Gj,i for n samples to predict which genes are up- or down-regulated for a given condition?
Microarray Data • We do not rely solely on Mj,i = log2( Rj,i / Gj,i ) • Large |Mj,i| (in comparison to the other |Mj,i|) indicates obvious up- or down-regulation • But it must be large across all n microarrays • Otherwise it is hard to draw conclusions from Mj,i alone • Managing this across many arrays is often difficult
Microarray Data (MA-type plot): Mi = Log2(Ri/Gi) plotted against Log2(Ri·Gi)
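A small sketch of how the plotted quantities can be computed with NumPy; names are illustrative, and note that the conventional MA plot scales the intensity axis by one half, whereas the slide's axis appears to be log2(Ri·Gi):

```python
import numpy as np

def ma_values(R, G):
    """Per-spot log ratio and log intensity for one microarray.

    M_i = log2(R_i / G_i)   -- log fold change of spot i
    A_i = log2(R_i * G_i)   -- intensity axis as on the slide
    (the conventional MA plot uses A_i = 0.5 * log2(R_i * G_i))
    """
    R = np.asarray(R, dtype=float)
    G = np.asarray(G, dtype=float)
    return np.log2(R / G), np.log2(R * G)

M, A = ma_values([1200.0, 800.0, 15000.0], [1000.0, 900.0, 9000.0])
print(M, A)
```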
Microarray Analysis • Is a classification problem • Clustering: Classify genes into a few identifiable groups • Principal Component Analysis: Choose directions (principal components, i.e., axes) that reveal the greatest variation in the data and then find clusters • Neural Nets and Support Vector Machines • Trained with positive and negative examples • Classify unknowns as positive or negative
Artificial Neural Network (ANN) • Made of artificial neurons, each of which • Sums inputs from other neurons • Compares sum to threshold • Sends signal to other neurons if above threshold • Synapses have weights • Model relative ion collections • Model efficacy (strength) of synapse
Artificial Neuron (diagram): weighted inputs are summed and passed through a nonlinear firing function
Possible Firing Functions • Discrete: a step function, e.g. f(x) = 1 if x ≥ θ and f(x) = 0 otherwise • Continuous: a sigmoidal function, e.g. σ(x) = 1 / (1 + e^(−x))
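A quick sketch of both firing functions in NumPy; the particular threshold and sigmoid are common choices, not prescribed by the slides:

```python
import numpy as np

def discrete_fire(x, threshold=0.0):
    """Discrete (step) firing function: 1 if the summed input reaches
    the threshold, 0 otherwise."""
    return np.where(x >= threshold, 1.0, 0.0)

def sigmoid(x):
    """Continuous sigmoidal firing function sigma(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5, 5, 5)
print(discrete_fire(x), sigmoid(x))
```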
3-Layer Neural Network (diagram): input layer, hidden layer (usually much larger), and output layer; the output layer may consist of a single neuron
ANNs as Classifiers • Each neuron acts as a “linear classifier” • Competition among neurons via the nonlinear firing function = “local linear classification” • Method for genes: • Train the network until it can classify between references and samples • Eliminating weights sufficiently close to 0 does not change the local classification scheme
Multilayer Network (diagram)
How do we select the w's? • Define an energy function, e.g. the total squared error E = ½ Σk ‖y(xk) − tk‖², where y(xk) is the network output and the t vectors are the information to be “learned” • Neural networks minimize energy • The “information” in the network is equivalent to the minima of the total squared energy function
Back Propagation • Minimize the energy function • Choose the wj and aj so that E is as small as possible • In practice, this is hard • Back propagation with a continuous sigmoidal: • Feed forward and calculate E • Modify the weights using a δ (delta) rule • Repeat until E is sufficiently close to 0
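A minimal back-propagation sketch for a one-hidden-layer network with a sigmoidal output, minimizing the squared-error energy E from the previous slide by a delta rule; it assumes NumPy, adds bias terms for robustness, and all names are illustrative rather than the authors' implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(X, W, a):
    """Feed forward through one hidden layer (bias columns appended)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    h = sigmoid(Xb @ W)
    hb = np.hstack([h, np.ones((h.shape[0], 1))])
    return Xb, h, hb, sigmoid(hb @ a)

def train_backprop(X, t, n_hidden=4, lr=0.5, n_epochs=10000, seed=0):
    """Gradient descent on E = 1/2 * sum (y - t)^2 via the delta rule."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.5, size=(X.shape[1] + 1, n_hidden))  # input -> hidden
    a = rng.normal(scale=0.5, size=n_hidden + 1)                # hidden -> output
    for _ in range(n_epochs):
        Xb, h, hb, y = forward(X, W, a)
        delta_out = (y - t) * y * (1 - y)                       # output delta
        delta_hid = np.outer(delta_out, a[:-1]) * h * (1 - h)   # hidden deltas
        a -= lr * hb.T @ delta_out
        W -= lr * Xb.T @ delta_hid
    return W, a

# Toy usage: on small separable data the error usually gets close to 0.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
t = np.array([0., 1., 1., 1.])   # OR function
W, a = train_backprop(X, t)
print(forward(X, W, a)[-1])
```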
ANN as Classifier 1. Remove a percentage of the genes whose synaptic weights are close to 0 2. Create an ANN classifier on the reduced arrays 3. Repeat 1 and 2 until only the genes that most influence the classifier problem remain • The remaining genes are the most important in classifying references versus samples
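A sketch of the prune-and-retrain loop, assuming NumPy; a single sigmoidal neuron stands in for the full ANN, and the drop fraction, stopping size, and function names are illustrative:

```python
import numpy as np

def gene_scores(X, y, lr=0.1, n_epochs=2000, seed=0):
    """Train a single sigmoidal neuron on the current gene subset and
    return |weight| per gene as an importance score (a stand-in for the
    full ANN of the slides)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=X.shape[1])
    b = 0.0
    for _ in range(n_epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # neuron output
        w -= lr * X.T @ (p - y) / len(y)         # logistic-regression style step
        b -= lr * np.mean(p - y)
    return np.abs(w)

def recursive_elimination(X, y, genes, drop_frac=0.2, n_keep=10):
    """Repeat: train, then drop the drop_frac of genes whose weights are
    closest to 0, until only n_keep genes remain."""
    genes = list(genes)
    while len(genes) > n_keep:
        scores = gene_scores(X, y)
        n_drop = min(max(1, int(drop_frac * len(genes))), len(genes) - n_keep)
        keep = np.argsort(scores)[n_drop:]        # indices of the survivors
        X = X[:, keep]
        genes = [genes[i] for i in keep]
    return genes
```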
Simple Perceptron Model (diagram): inputs Gene 1, Gene 2, …, Gene m, with weights w1, w2, …, wm, feed a single output neuron. The wi can be interpreted as measures of how important the ith gene is to determining the output.
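A sketch of the classic perceptron learning rule, with the trained weights read off as gene-importance scores; the toy data and all names are hypothetical:

```python
import numpy as np

def train_perceptron(X, y, lr=0.1, n_epochs=100):
    """Classic perceptron rule: output = 1 if w.x + b > 0 else 0.
    After training, |w_i| can be read as a rough importance of gene i
    (only meaningful when the data are linearly separable)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_epochs):
        for xi, yi in zip(X, y):
            pred = 1.0 if xi @ w + b > 0 else 0.0
            w += lr * (yi - pred) * xi     # no change when the prediction is right
            b += lr * (yi - pred)
    return w, b

# Hypothetical toy data: rows = arrays (samples vs. references),
# columns = genes; y = 1 for sample, 0 for reference.
X = np.array([[2.1, 0.1, -0.2], [1.8, -0.1, 0.3],
              [-2.0, 0.2, 0.1], [-1.7, 0.0, -0.3]])
y = np.array([1, 1, 0, 0])
w, b = train_perceptron(X, y)
print(w)   # gene 1 should get the largest |w|
```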
Simple Perceptron Model • Features • The wi can be used in place of the Mj,i • Detects genes across n samples and references • Ref: Artificial Neural Networks for Reducing the Dimensionality of Gene Expression Data, A. Narayanan, et al., 2004 • Drawbacks • The perceptron is a linear classifier (i.e., it only classifies linearly separable data) • It is not clear how to incorporate known genes
Linearly Separable Data (diagram): separation using hyperplanes
Functional Viewpoint • An ANN is a mapping f: Rⁿ → R • Can we train a perceptron so that f(x1,…,xn) = 1 if the vector x is from a sample and f(x1,…,xn) = 0 if x is from a reference? • Answer: yes if the data can be linearly separated, but no otherwise • So can we design such a mapping with a more general ANN?
Hilbert’s Thirteenth Problem • Original: “Are there continuous functions of 3 variables that are not representable by a superposition of compositions of functions of 2 variables?” • Modern: Can any continuous function of n variables on a bounded domain of n-space be written as sums of compositions of functions of 1 variable?
Kolmogorov’s Theorem, modified version: any continuous function f of n variables can be written as f(x1,…,xn) = Σq=0..2n h( Σp=1..n λp φq(xp) ), where only h depends on f
Cybenko (1989): Let σ be any continuous sigmoidal function, and let x = (x1,…,xn). If f is absolutely integrable over the n-dimensional unit cube, then for all ε > 0 there exist a (possibly very large) integer N and vectors w1,…,wN such that ∫ | f(x) − Σj=1..N aj σ(wj·x + θj) | dx < ε over the unit cube, where a1,…,aN and θ1,…,θN are fixed parameters.
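A small numerical illustration of the form Σ aj σ(wj·x + θj): the wj and θj are drawn at random and only the aj are fitted by least squares, which is not Cybenko's construction but shows that such sums can approximate a target function well (assumes NumPy; all choices are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Approximate f(x) = sin(2*pi*x) on [0, 1] by sum_j a_j * sigmoid(w_j*x + theta_j).
rng = np.random.default_rng(1)
N = 50                                  # number of sigmoidal terms
w = rng.normal(scale=10.0, size=N)
theta = rng.normal(scale=10.0, size=N)

x = np.linspace(0.0, 1.0, 200)
f = np.sin(2 * np.pi * x)
Phi = sigmoid(np.outer(x, w) + theta)   # 200 x N design matrix
a, *_ = np.linalg.lstsq(Phi, f, rcond=None)

print(np.max(np.abs(Phi @ a - f)))      # sup-norm error, typically small
```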
Recall: Multilayer Network (diagram)
ANNs as Classifiers • Answer (Cybenko): for any ε > 0, the function with f(x1,…,xn) = 1 if the vector x is from a sample and f(x1,…,xn) = 0 if x is from a reference can be approximated to within ε by a multilayer neural network. • But the weights no longer have a one-to-one correspondence to genes.
ANNs and Monte Carlo Methods • Monte Carlo methods have been a big success story with ANNs • They give error estimates along with network predictions • ANNs are very fast in the forward direction • Example: ANN+MC methods can implement and outperform Kalman filters (recursive linear filters used in navigation and elsewhere) (De Freitas, J. F. G., et al., 2000)
Recall: Multilayer Network (diagram): N genes feed an N-node hidden layer. The aj correspond to genes, but do not directly depend on a single gene.
Naïve Monte Carlo ANN Method 1. Randomly choose a subset S of genes 2. Train using back propagation 3. Prune based on the values of the wj (or the aj, or both) 4. Repeat 2-3 until a small subset of S remains 5. Increase the “count” of the genes in that small subset 6. Repeat 1-5 until each gene has a 95% probability of having appeared at least some minimum number of times in a subset 7. The most frequent genes are the predicted up- or down-regulated genes
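A sketch of the naive Monte Carlo loop, assuming NumPy; the ANN train-and-prune steps (2-4) are replaced here by a simple mean-difference score as a placeholder, and all parameter values and names are illustrative:

```python
import numpy as np

def mc_gene_selection(X, y, gene_names, subset_size=50, n_keep=5,
                      n_rounds=200, known_genes=(), seed=0):
    """Sketch of the Monte Carlo loop.

    X: (n_arrays, n_genes) array, y: 0/1 array (1 = sample, 0 = reference),
    gene_names: list of gene labels.  Steps 2-4 (train + prune an ANN) are
    replaced here by a simple |mean difference| score as a placeholder.
    """
    rng = np.random.default_rng(seed)
    counts = {g: 0 for g in gene_names}
    known = [gene_names.index(g) for g in known_genes]     # included w.p. 1
    candidates = [i for i in range(len(gene_names)) if i not in known]
    for _ in range(n_rounds):
        # 1. random subset of genes (known genes always included)
        others = rng.choice(candidates, size=subset_size - len(known),
                            replace=False)
        subset = known + list(others)
        # 2-4. placeholder for "train, prune, repeat until a small subset remains"
        score = np.abs(X[y == 1][:, subset].mean(axis=0)
                       - X[y == 0][:, subset].mean(axis=0))
        survivors = [subset[i] for i in np.argsort(score)[-n_keep:]]
        # 5. increase the count of the surviving genes
        for i in survivors:
            counts[gene_names[i]] += 1
    # 7. the most frequently surviving genes are the predicted ones
    return sorted(counts, key=counts.get, reverse=True)
```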
Additional Considerations • If a gene is up-regulated or down-regulated for a certain condition, then put it into a subset in step 1 with probability 1. • This is a simple-minded Bayesian method. Bayesian analysis can make it much better. • Algorithm distributes naturally across a multi-processor cluster or machine • Choose the subsets first • Distribute subsets to different machines • Tabulate the results from all the machines
What Next… • Cybenko is not the “final answer” • Real neurons are much more complicated • ANNs abstract only a few of their features • We are only at the beginning of learning how to separate noise and bias from the classification problem • Many are now looking at neurons themselves for answers
Components of a Neuron (diagram): dendrites, soma, nucleus, axon, myelin sheaths, synaptic terminals
Signals Propagate to the Soma (diagram): signals decay at the soma if below a certain threshold
Signals May Arrive Close Together (diagram): if the threshold is exceeded, the neuron “fires,” sending a signal along its axon
Signal Propagation along the Axon • The signal is electrical • Membrane depolarization from a resting potential of -70 mV • Myelin acts as an insulator • Propagation is electro-chemical • Sodium channels open at breaks in the myelin • Rapid depolarization at these breaks • The signal travels faster than if it were only electrical • Neurons send “spike trains” from one to another
Hodgkin-Huxley Model • 1963 Nobel Prize in Medicine • Cable equation plus ionic currents (Isyn) • Can only be solved numerically • Produces action potentials • Ionic channels: • n = potassium activation variable • m = sodium activation variable • h = sodium inactivation variable
Hodgkin-Huxley Equations:
C ∂V/∂t = (a / (2R)) ∂²V/∂x² − ḡNa m³h (V − VNa) − ḡK n⁴ (V − VK) − ḡL (V − VL) + Isyn
dn/dt = αn(V)(1 − n) − βn(V) n,  dm/dt = αm(V)(1 − m) − βm(V) m,  dh/dt = αh(V)(1 − h) − βh(V) h
where any V with a subscript is constant, any g with a bar is constant, and each of the α's and β's is of similar form, e.g. αn(V) = 0.01 (V + 55) / (1 − e^(−(V+55)/10))
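A forward-Euler integration of the space-clamped (point) Hodgkin-Huxley model with standard textbook parameters; this drops the cable term and is only meant to show the equations producing action potentials, not the tapered-cable formulation used later in the talk:

```python
import numpy as np

# Point-neuron Hodgkin-Huxley model (space-clamped: no cable term),
# integrated with forward Euler.  Standard textbook parameters.
C_m, g_Na, g_K, g_L = 1.0, 120.0, 36.0, 0.3          # uF/cm^2, mS/cm^2
E_Na, E_K, E_L = 50.0, -77.0, -54.4                   # mV

def alpha_n(V): return 0.01 * (V + 55.0) / (1.0 - np.exp(-(V + 55.0) / 10.0))
def beta_n(V):  return 0.125 * np.exp(-(V + 65.0) / 80.0)
def alpha_m(V): return 0.1 * (V + 40.0) / (1.0 - np.exp(-(V + 40.0) / 10.0))
def beta_m(V):  return 4.0 * np.exp(-(V + 65.0) / 18.0)
def alpha_h(V): return 0.07 * np.exp(-(V + 65.0) / 20.0)
def beta_h(V):  return 1.0 / (1.0 + np.exp(-(V + 35.0) / 10.0))

def simulate(I_syn=10.0, T=50.0, dt=0.01):
    """Return the membrane potential trace for a constant injected current."""
    V, n, m, h = -65.0, 0.32, 0.05, 0.6               # approximate rest values
    trace = []
    for _ in range(int(T / dt)):
        I_ion = (g_K * n**4 * (V - E_K) + g_Na * m**3 * h * (V - E_Na)
                 + g_L * (V - E_L))
        V += dt * (I_syn - I_ion) / C_m
        n += dt * (alpha_n(V) * (1.0 - n) - beta_n(V) * n)
        m += dt * (alpha_m(V) * (1.0 - m) - beta_m(V) * m)
        h += dt * (alpha_h(V) * (1.0 - h) - beta_h(V) * h)
        trace.append(V)
    return np.array(trace)

V = simulate()
print(V.max())   # spikes: the peak should exceed 0 mV
```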
Hodgkin-Huxley nearly intractable • So researchers began developing artificial models to better understand what neurons are all about
A New Approach • Poznanski (2001): synaptic effects are isolated into hot spots (diagram: synapses and soma)
Tapered Equivalent Cylinder • Rall’s theorem (modified for taper) allows us to collapse to an equivalent cylinder (diagram: soma plus cylinder)
Tapered Equivalent Cylinder • Assume “hot spots” at x0, x1, …, xm (diagram: cylinder from the soma at 0 to l, with hot spots at x0, x1, …, xm)
Ion Channel Hot Spots • Ij is the ionic current at the jth hot spot • The Green’s function G(x, xj, t) is the solution to the hot-spot equation with Ij as a point source and the others = 0 (plus boundary conditions)
Convolution Theorem • The solution of the original problem is of the form V(x, t) = Σj ∫0..t G(x, xj, t − s) Ij(s) ds • The voltage at the soma is V(0, t) = Σj ∫0..t G(0, xj, t − s) Ij(s) ds
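A discretized sketch of the convolution formula for the soma voltage, assuming the Green's function samples are already available (here replaced by hypothetical decaying exponentials); all names and inputs are illustrative:

```python
import numpy as np

def soma_voltage(G0, currents, dt):
    """Discretized convolution formula V(0, t) = sum_j (G(0, x_j, .) * I_j)(t).

    G0[j]       : samples of the Green's function G(0, x_j, k*dt)
    currents[j] : samples of the hot-spot current I_j(k*dt)
    Both are (n_hotspots, n_steps) arrays; G0 is assumed given (it would
    come from solving the tapered cable equation for a point source).
    """
    n_steps = currents.shape[1]
    V = np.zeros(n_steps)
    for Gj, Ij in zip(G0, currents):
        V += np.convolve(Gj, Ij)[:n_steps] * dt   # Riemann-sum convolution
    return V

# Illustrative inputs: exponentially decaying kernels and brief current pulses.
dt, n = 0.01, 1000
t = np.arange(n) * dt
G0 = np.exp(-np.outer([1.0, 2.0], t))            # two hypothetical hot spots
I = np.vstack([(t < 0.5) * 1.0, (t < 0.2) * 2.0])
print(soma_voltage(G0, I, dt)[:5])
```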
Ion Channel Currents • At a hot spot, the “voltage” V satisfies an ODE of Hodgkin-Huxley type built from the α's and β's • Assume that the α's and β's are polynomials of large degree • Introduce a new family of functions Un • “Embed” the original equation into a system of ODEs for the Un
Linear Embedding: Simple Example • To embed a nonlinear equation, e.g. u′ = u², let un = u^n (n = 1, 2, 3, …). Then un′ = n u^(n−1) u′ = n u^(n+1) = n un+1 for every n
Linear Embedding: Simple Example The result is The result is an infinite dimensional linear system which is often as unmanageable as the original nonlinear equation. However, linear embeddings do often produce good numerical approximations. Moreover, linear embedding implies that each Ij is given by a linear transformation of the vector of U’s
The Hot-Spot Model, qualitatively: the sum of convolutions of weighted sums of functions of one variable, which parallels Kolmogorov’s Theorem (given that convolutions are related to composition)