Analysis of Microarray Data using Monte Carlo Neural Networks Jeff Knisley, Lloyd Lee Glenn, Karl Joplin, and Patricia Carey The Institute for Quantitative Biology, East Tennessee State University
Outline of Talk • Microarray Data • Neural Networks • A Simple Perceptron Example • Neural Networks for Data Mining • A Monte Carlo Approach • Incorporating Known Genes • Models of the Neuron and Neural Networks
Microarray Data • Goal: Identify genes that are up- or down-regulated when an organism is in a certain state • Examples: • What genes cause certain insects to enter diapause (similar to hibernation)? • In Cystic Fibrosis, what non-CFTR genes are up- or down-regulated?
cDNA Microarrays • Obtain mRNA from a population/tissue in the given state (sample) and a population/tissue not in the given state (reference) • Synthesize cDNAs from the mRNAs in the cells • cDNA is long (500 to 2,000 bases), but not necessarily the entire gene • Reference is labeled green, sample is labeled red • Hybridize onto "spots"; each spot is often (but not necessarily) a gene • cDNAs bind to each spot in proportion to their concentrations
cDNA Microarray Data • Ri, Gi = intensities of the ith spot • Absolute intensities often cannot be compared • The same reference may be used for all samples • There are many sources of bias • Significant spot-to-spot intensity variations may have nothing to do with the biology • Normalize so that Ri = Gi on average • Most genes are unchanged, all else equal • But rarely is "all else equal"
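A minimal sketch of this normalization step, assuming background-subtracted intensity arrays R and G (the array names and the single global scaling factor are illustrative assumptions, not the authors' procedure):

    import numpy as np

    def normalize_global(R, G):
        """Rescale the green channel so that, on average, Ri = Gi."""
        R = np.asarray(R, dtype=float)
        G = np.asarray(G, dtype=float)
        k = R.sum() / G.sum()      # single global scaling factor
        return R, k * G            # after rescaling, sum(R) == sum(k * G)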
Microarray Data • Several samples (and references) • A time series of microarrays • A comparison of several different samples • The data is in the form of a table • The jth microarray's intensities are Rj,i, Gj,i • Background intensity has often already been subtracted • Question: How can we use Rj,i and Gj,i for n samples to predict which genes are up- or down-regulated for a given condition?
Microarray Data • We do not use Mj,i = log2( Rj,i / Gj,i ) • A large | Mj,i | (in comparison to the other | Mj,i |) indicates obvious up- or down-regulation • But it must be large across all n microarrays • Otherwise, it is hard to draw conclusions from Mj,i • This is often difficult to manage
Microarray Data (figure): Mi = log2(Ri/Gi) plotted against log2(Ri Gi)
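For reference, a short sketch computing the two quantities plotted above (array names are illustrative; intensities are assumed normalized and background-subtracted):

    import numpy as np

    def ma_values(R, G):
        """Return M = log2(R/G) and A = log2(R*G) for each spot."""
        R = np.asarray(R, dtype=float)
        G = np.asarray(G, dtype=float)
        M = np.log2(R / G)   # log ratio: M > 0 up-regulated, M < 0 down-regulated
        A = np.log2(R * G)   # overall intensity, the horizontal axis above
        return M, A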
Microarray Analysis • Is a classification problem • Clustering: Classify genes into a few identifiable groups • Principal Component Analysis: Choose directions (axes, i.e., principal components) that reveal the greatest variation in the data and then find clusters • Neural Nets and Support Vector Machines • Trained with positive and negative examples • Classify unknowns as positive or negative
Artificial Neural Network (ANN) • Made of artificial neurons, each of which • Sums inputs from other neurons • Compares sum to threshold • Sends signal to other neurons if above threshold • Synapses have weights • Model relative ion collections • Model efficacy (strength) of synapse
Artificial Neuron (figure): weighted inputs are summed and passed through a nonlinear firing function
Possible Firing Functions • Discrete: a threshold (step) function, e.g., σ(x) = 1 if x ≥ θ and σ(x) = 0 otherwise • Continuous: a sigmoidal function, e.g., σ(x) = 1 / (1 + e^(-x))
3 Layer Neural Network (figure): an input layer, a hidden layer (usually much larger), and an output layer; the output layer may consist of a single neuron
ANN as Classifiers • Each neuron acts as a “linear classifier” • Competition among neurons via nonlinear firing function = “local linear classifying” • Method for Genes: • Train Network until it can classify between references and samples • Eliminating weights sufficiently close to 0 does not change local classification scheme
Multilayer Network (figure)
How do we select the w's? • Define an energy function: the total squared error E = ½ Σk ‖ y(xk) - tk ‖², where y(xk) is the network output for the kth input • The t vectors are the information to be "learned" • Neural networks minimize energy • The "information" in the network is equivalent to the minima of the total squared energy function
Back Propagation • Minimize the energy function: choose the wj and aj so that E is minimized • In practice, this is hard • Back propagation with a continuous sigmoidal: • Feed forward and calculate E • Modify the weights using a δ (delta) rule • Repeat until E is sufficiently close to 0
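A minimal sketch of this loop for a one-hidden-layer network with sigmoidal units (gradient-descent delta rule; all names, sizes, and learning-rate values are illustrative assumptions, not the authors' code):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train_backprop(X, t, hidden=10, eta=0.1, epochs=1000):
        """Minimize E = 1/2 * sum (y - t)^2 for a 1-hidden-layer network."""
        rng = np.random.default_rng(0)
        W = rng.normal(scale=0.1, size=(hidden, X.shape[1]))  # input-to-hidden weights wj
        a = rng.normal(scale=0.1, size=hidden)                # hidden-to-output weights aj
        for _ in range(epochs):
            H = sigmoid(X @ W.T)                              # feed forward
            y = sigmoid(H @ a)
            d_out = (y - t) * y * (1 - y)                     # delta rule at the output
            a -= eta * H.T @ d_out
            d_hid = np.outer(d_out, a) * H * (1 - H)          # propagate the error back
            W -= eta * d_hid.T @ X
        E = 0.5 * np.sum((sigmoid(sigmoid(X @ W.T) @ a) - t) ** 2)
        return W, a, E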
ANN as Classifier • 1. Remove a percentage of the genes whose synaptic weights are sufficiently close to 0 • 2. Create an ANN classifier on the reduced arrays • 3. Repeat 1 and 2 until only the genes that most influence the classifier problem remain • The remaining genes are the most important for classifying references versus samples
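A sketch of this prune-and-retrain loop (illustrative only; it reuses the hypothetical train_backprop routine above and scores each gene by the magnitude of its input weights):

    import numpy as np

    def prune_genes(X, t, genes, keep_fraction=0.9, min_genes=20):
        """Train, drop the genes whose input weights are closest to 0, repeat."""
        genes = list(genes)
        while len(genes) > min_genes:
            W, a, _ = train_backprop(X, t)           # train on the current genes
            score = np.abs(W).sum(axis=0)            # column j of W belongs to gene j
            keep = np.sort(np.argsort(score)[-int(keep_fraction * len(genes)):])
            X = X[:, keep]                           # prune genes with near-zero weights
            genes = [genes[i] for i in keep]
        return genes                                 # survivors separate references vs samples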
Simple Perceptron Model (figure): inputs Gene 1, Gene 2, …, Gene m with weights w1, w2, …, wm feeding a single output. The wi can be interpreted as measures of how important the ith gene is to determining the output.
Simple Perceptron Model • Features • The wi can be used in place of the Mj,i • Detects genes across all n samples and references • Ref: Artificial Neural Networks for Reducing the Dimensionality of Gene Expression Data, A. Narayanan, et al., 2004 • Drawbacks • The perceptron is a linear classifier (i.e., it only classifies linearly separable data) • How to incorporate known genes?
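A sketch of this single-unit approach in the spirit of the reference above (not their code; scoring genes by |wi| is the idea on this slide, everything else is an illustrative assumption):

    import numpy as np

    def perceptron_gene_scores(X, t, eta=0.01, epochs=500):
        """Train one sigmoidal unit; |w_i| ranks how much gene i matters.

        X: rows = arrays (samples and references), columns = genes
        t: 1 for a sample, 0 for a reference
        """
        rng = np.random.default_rng(0)
        w = rng.normal(scale=0.01, size=X.shape[1])
        b = 0.0
        for _ in range(epochs):
            y = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # forward pass
            grad = (y - t) * y * (1 - y)             # delta rule for a single unit
            w -= eta * X.T @ grad
            b -= eta * grad.sum()
        return np.abs(w)                             # large |w_i|: gene i drives the output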
Linearly Separable Data Separation using Hyperplanes
Functional Viewpoint • An ANN is a mapping f: R^n → R • Can we train a perceptron so that f(x1,…,xn) = 1 if the x vector is from a sample and f(x1,…,xn) = 0 if x is from a reference? • Answer: Yes if the data can be linearly separated, but no otherwise • So can we design such a mapping with a more general ANN?
Hilbert's Thirteenth Problem • Original: "Are there continuous functions of 3 variables that are not representable by a superposition of compositions of functions of 2 variables?" • Modern: Can any continuous function of n variables on a bounded domain of n-space be written as sums of compositions of functions of 1 variable?
Kolmogorov's Theorem • Modified version: Any continuous function f of n variables can be written as f(x1,…,xn) = Σq=0..2n h( Σp=1..n φq,p(xp) ), where only the outer function h depends on f
Cybenko (1989) • Let σ be any continuous sigmoidal function, and let x = (x1,…,xn). If f is absolutely integrable over the n-dimensional unit cube, then for every ε > 0 there exist a (possibly very large) integer N and vectors w1,…,wN such that ∫ | f(x) - Σj=1..N aj σ( wj · x + θj ) | dx < ε, where a1,…,aN and θ1,…,θN are fixed parameters.
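A toy numerical illustration of the idea (purely illustrative, not Cybenko's construction): fit a sum of N sigmoids with randomly chosen inner weights to a one-variable function by least squares.

    import numpy as np

    def approx_with_sigmoids(f, N=50, grid=200):
        """Fit f on [0, 1] by sum_j a_j * sigmoid(w_j * x + theta_j)."""
        rng = np.random.default_rng(1)
        x = np.linspace(0.0, 1.0, grid)
        w = rng.normal(scale=10.0, size=N)                      # random inner weights
        theta = rng.uniform(-10.0, 10.0, size=N)                # random shifts
        Phi = 1.0 / (1.0 + np.exp(-(np.outer(x, w) + theta)))   # grid-by-N design matrix
        a, *_ = np.linalg.lstsq(Phi, f(x), rcond=None)          # choose the a_j
        return a, np.max(np.abs(Phi @ a - f(x)))                # coefficients, max error

    # e.g. approx_with_sigmoids(np.sin)[1] is small on this grid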
Recall: Multilayer Network (figure)
ANN as Classifier • Answer (Cybenko): for any ε > 0, the function f(x1,…,xn) = 1 if the x vector is from a sample and f(x1,…,xn) = 0 if x is from a reference can be approximated to within ε by a multilayer neural network. • But the weights no longer have a one-to-one correspondence with the genes.
ANN and Monte Carlo Methods • Monte Carlo methods have been a big success story with ANNs • They provide error estimates along with the network predictions • ANNs are very fast in the forward direction • Example: ANN + MC implement and outperform Kalman filters (recursive linear filters used in navigation and elsewhere) (De Freitas, J. F. G., et al., 2000)
Recall: Multilayer Network (figure): N genes feed an N-node hidden layer. The aj correspond to genes, but do not directly depend on a single gene.
Naïve Monte Carlo ANN Method • 1. Randomly choose a subset S of the genes • 2. Train using back propagation • 3. Prune based on the values of the wj (or the aj, or both) • 4. Repeat 2-3 until only a small subset of S remains • 5. Increase the "count" of the genes in that small subset • 6. Repeat 1-5 until each gene has a 95% probability of appearing at least some minimum number of times in a subset • The most frequently counted genes are the predicted up- or down-regulated genes
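A sketch of this loop (illustrative names throughout; prune_genes is the hypothetical prune-and-retrain routine sketched earlier, and the known_genes hook anticipates the next slide):

    import numpy as np
    from collections import Counter

    def monte_carlo_gene_counts(X, t, gene_names, subset_size=200,
                                n_rounds=500, known_genes=()):
        """Count how often each gene survives pruning over random subsets."""
        rng = np.random.default_rng(0)
        counts = Counter()
        all_idx = np.arange(X.shape[1])
        known = [i for i, g in enumerate(gene_names) if g in set(known_genes)]
        for _ in range(n_rounds):
            rest = rng.choice(np.setdiff1d(all_idx, known),              # step 1
                              size=subset_size - len(known), replace=False)
            subset = np.concatenate([known, rest]).astype(int)
            survivors = prune_genes(X[:, subset], t,                     # steps 2-4
                                    [gene_names[i] for i in subset])
            counts.update(survivors)                                     # step 5
        return counts.most_common()          # most frequent genes are the prediction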
Additional Considerations • If a gene is known to be up-regulated or down-regulated for the condition, put it into the subset in step 1 with probability 1 • This is a simple-minded Bayesian method; Bayesian analysis can make it much better • The algorithm distributes naturally across a multi-processor cluster or machine • Choose the subsets first • Distribute the subsets to different machines • Tabulate the results from all the machines
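A minimal sketch of the "choose the subsets first, distribute, tabulate" idea using Python's multiprocessing module (illustrative only; on a real cluster a job scheduler or MPI would play the same role, and prune_genes is the hypothetical routine from the earlier sketch):

    from collections import Counter
    from multiprocessing import Pool

    def run_subset(args):
        Xsub, t, names = args
        return prune_genes(Xsub, t, names)        # one Monte Carlo trial per worker

    def tabulate_parallel(subsets, processes=8):
        """subsets: a list of (Xsub, t, names) tuples chosen up front."""
        counts = Counter()
        with Pool(processes) as pool:
            for survivors in pool.map(run_subset, subsets):
                counts.update(survivors)          # tabulate results from all workers
        return counts.most_common()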
What Next… • Cybenko is not the "final answer" • Real neurons are much more complicated • ANNs abstract only a few of their features • We are only at the beginning of understanding how to separate noise and bias from the classification problem • Many are now looking to the neurons themselves for answers
Components of a Neuron (figure): dendrites, soma (containing the nucleus), axon with myelin sheaths, and synaptic terminals
Signals Propagate to the Soma • Signals decay at the soma if they are below a certain threshold
Signals May Arrive Close Together • If the threshold is exceeded, the neuron "fires," sending a signal along its axon.
Signal Propagation along the Axon • The signal is electrical • Membrane depolarization from a resting potential of -70 mV • Myelin acts as an insulator • Propagation is electro-chemical • Sodium channels open at breaks in the myelin • Rapid depolarization occurs at these breaks • The signal travels faster than if it were only electrical • Neurons send "spike trains" to one another
Hodgkin-Huxley Model • 1963 Nobel Prize in Medicine • Cable equation plus ionic currents (Isyn) • Can only be solved numerically • Produces action potentials • Ionic channels: • n = potassium activation variable • m = sodium activation variable • h = sodium inactivation variable
Hodgkin-Huxley Equations • C dV/dt = -ḡNa m^3 h (V - VNa) - ḡK n^4 (V - VK) - ḡL (V - VL) + Isyn (plus the cable term in the spatial model) • dn/dt = αn(V)(1 - n) - βn(V) n, and similarly for m and h • where any V with a subscript is constant, any g with a bar is constant, and each of the α's and β's is of a similar form, for example αn(V) = 0.01 (V + 55) / (1 - e^(-(V + 55)/10))
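Since the system can only be solved numerically, here is a compact forward-Euler sketch of the standard space-clamped Hodgkin-Huxley equations with textbook parameter values (illustrative only, not tied to the talk's figures):

    import numpy as np

    def alpha_n(V): return 0.01 * (V + 55) / (1 - np.exp(-(V + 55) / 10))
    def beta_n(V):  return 0.125 * np.exp(-(V + 65) / 80)
    def alpha_m(V): return 0.1 * (V + 40) / (1 - np.exp(-(V + 40) / 10))
    def beta_m(V):  return 4.0 * np.exp(-(V + 65) / 18)
    def alpha_h(V): return 0.07 * np.exp(-(V + 65) / 20)
    def beta_h(V):  return 1.0 / (1 + np.exp(-(V + 35) / 10))

    def hodgkin_huxley(I_syn=10.0, T=50.0, dt=0.01):
        """Forward-Euler integration; returns the membrane potential trace (mV)."""
        C, gNa, gK, gL = 1.0, 120.0, 36.0, 0.3      # uF/cm^2 and mS/cm^2
        VNa, VK, VL = 50.0, -77.0, -54.4            # reversal potentials (mV)
        V, n, m, h = -65.0, 0.32, 0.05, 0.6         # approximate resting state
        trace = []
        for _ in range(int(T / dt)):
            I_ion = gNa * m**3 * h * (V - VNa) + gK * n**4 * (V - VK) + gL * (V - VL)
            V += dt * (I_syn - I_ion) / C
            n += dt * (alpha_n(V) * (1 - n) - beta_n(V) * n)
            m += dt * (alpha_m(V) * (1 - m) - beta_m(V) * m)
            h += dt * (alpha_h(V) * (1 - h) - beta_h(V) * h)
            trace.append(V)
        return np.array(trace)   # repeated spikes (action potentials) for this I_syn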
Hodgkin-Huxley nearly intractable • So researchers began developing artificial models to better understand what neurons are all about
A New Approach • Poznanski (2001): Synaptic effects are isolated into "hot spots" (figure: synapses and the soma)
Tapered Equivalent Cylinder • Rall's theorem (modified for taper) allows us to collapse the dendritic tree to an equivalent cylinder attached to the soma (figure)
Tapered Equivalent Cylinder • Assume "hot spots" at x0, x1, …, xm along the cylinder, with the soma at x = 0 and the far end at x = l (figure)
Ion Channel Hot Spots • Ij is the ionic current at the jth hot spot • The Green's function G(x, xj, t) is the solution to the hot-spot equation with Ij as a point source and the other currents set to 0 (plus boundary conditions)
Convolution Theorem • The solution to the original equation is of the form V(x, t) = Σj ∫0^t G(x, xj, t - s) Ij(s) ds • The voltage at the soma is V(0, t) = Σj ∫0^t G(0, xj, t - s) Ij(s) ds
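A sketch of evaluating the somatic voltage as a discrete sum of convolutions (assumes the Green's function at the soma and the hot-spot currents have already been sampled on a common time grid; names are illustrative):

    import numpy as np

    def soma_voltage(G0, I, dt):
        """V(0, t) = sum_j integral_0^t G(0, x_j, t - s) I_j(s) ds, discretized.

        G0[j, k] = G(0, x_j, k * dt) and I[j, k] = I_j(k * dt), both of shape (m, T).
        """
        m, T = I.shape
        V = np.zeros(T)
        for j in range(m):
            V += np.convolve(G0[j], I[j])[:T] * dt   # Riemann sum of the convolution
        return V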
Ion Channel Currents • At a hot spot, the "voltage" V satisfies an ODE of the form dV/dt = α(V)(1 - V) - β(V) V • Assume that the α's and β's are polynomials of large degree • Introduce a new family of functions Un • "Embed" the original equation into a (linear) system of ODEs for the Un
Linear Embedding: Simple Example • To embed a nonlinear equation such as u′ = u^2, let un = u^n • Then un′ = n u^(n-1) u′ = n u^(n+1) = n un+1, which is linear in the un
Linear Embedding: Simple Example • The result is the infinite set of linear equations un′ = n un+1 for n = 1, 2, 3, … • The result is an infinite-dimensional linear system, which is often as unmanageable as the original nonlinear equation. However, linear embeddings do often produce good numerical approximations. Moreover, linear embedding implies that each Ij is given by a linear transformation of the vector of U's
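A small numerical sketch of a truncated linear embedding, assuming the simple example above is the classic u′ = u^2 case (the truncation order, step size, and initial value are illustrative):

    import numpy as np

    def truncated_embedding(u0=0.5, N=20, T=1.0, dt=1e-3):
        """Integrate un' = n * un+1, truncated at order N, and return u1(T)."""
        u = np.array([u0 ** n for n in range(1, N + 1)])   # un(0) = u0^n
        for _ in range(int(T / dt)):
            du = np.zeros(N)
            du[:-1] = np.arange(1, N) * u[1:]              # du_n = n * u_{n+1}
            du[-1] = 0.0                                   # truncation: drop u_{N+1}
            u = u + dt * du
        return u[0]

    # the exact solution of u' = u^2 is u(t) = u0 / (1 - u0 * t);
    # with u0 = 0.5 and t = 1 this equals 1.0, and truncated_embedding() is close to it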
The Hot-Spot Model "Qualitatively" Gives Kolmogorov's Theorem • The soma voltage is a sum of convolutions of weighted sums of functions of one variable (given that convolutions are related to composition)