Non-linear Principal Manifolds: a Useful Tool in Bioinformatics and Medical Applications. Andrei Zinovyev, Institut des Hautes Études Scientifiques, France. Plan of the talk: Object of study; Definition of principal manifold (PM); Constructing PMs: elastic maps
Non-linear Principal Manifolds: a Useful Tool in Bioinformatics and Medical Applications. Andrei Zinovyev, Institut des Hautes Études Scientifiques, France
Plan of the talk • Object of study • Definition of principal manifold (PM) • Constructing PMs: elastic maps • Examples of biomedical applications
Principal manifolds: elastic maps framework. [Diagram: principal manifolds among related non-linear data-mining methods: PCA, factor analysis, K-means, clustering, SOM, LLE, ISOMAP, multidimensional scaling, visualization, supervised classification, SVM, regression and approximation]
Finite set of objects in $\mathbb{R}^N$: $X_i$, $i = 1..m$
K-means clustering: mean point
Principal Component Analysis: 1st principal axis (maximal dispersion), 2nd principal axis
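The "maximal dispersion" property of the first principal axis can be illustrated with a short sketch. It is not taken from the talk: the function name `first_principal_axis` and the use of power iteration on the covariance matrix are assumptions chosen for brevity, and only the Python standard library is used.

```python
import math

def first_principal_axis(points, iterations=200):
    """Return (mean point, unit vector of maximal variance).

    Hypothetical helper: uses power iteration on the covariance matrix.
    """
    m, n = len(points), len(points[0])
    # Mean point of the data set.
    mean = [sum(p[k] for p in points) / m for k in range(n)]
    centered = [[p[k] - mean[k] for k in range(n)] for p in points]
    # Covariance matrix cov[a][b] = (1/m) * sum_i x_ia * x_ib.
    cov = [[sum(x[a] * x[b] for x in centered) / m for b in range(n)]
           for a in range(n)]
    # Arbitrary unit start vector; it must not be orthogonal to the answer.
    axis = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * axis[b] for b in range(n)) for a in range(n)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:  # degenerate data: no variance along the current axis
            break
        axis = [v / norm for v in nxt]
    return mean, axis

# Points lying exactly on the line y = 2x.
mean, axis = first_principal_axis([(0, 0), (1, 2), (2, 4), (3, 6)])
```

For data on the line y = 2x, the returned axis points along (1, 2), and the mean point is the centre of the data.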
What do we want? • Non-linear surface (1D, 2D, 3D …) • Smooth and not twisted • The data model is unknown • Speed (time linear with Nm) • Uniqueness • Fast way to project datapoints
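Metaphor of elasticity: data points and graph nodes; energy terms U(Y), U(E), U(R)
Constructing elastic nets: node y; edge E with end nodes E(0), E(1); rib R with nodes R(0), R(1), R(2)
Definition of elastic energy: data point Xj; node y; edge E with end nodes E(0), E(1); rib R with nodes R(0), R(1), R(2)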
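As a rough illustration of the bookkeeping behind an elastic net, the sketch below lists the edges and ribs of a one-dimensional chain of nodes. The function name `build_chain` is hypothetical, and treating R(0) as the middle node of a rib is an assumed convention, not something the slide states.

```python
def build_chain(num_nodes):
    """Edges and ribs of a 1D chain of nodes, given as tuples of node indices."""
    # Edge E: a pair of adjacent nodes (E(0), E(1)).
    edges = [(i, i + 1) for i in range(num_nodes - 1)]
    # Rib R: a triple (R(0), R(1), R(2)); R(0) is taken to be the middle
    # node and R(1), R(2) its two neighbours (assumed convention).
    ribs = [(i, i - 1, i + 1) for i in range(1, num_nodes - 1)]
    return edges, ribs

edges, ribs = build_chain(4)
```

A chain of 4 nodes has 3 edges and 2 ribs. A 2D grid would add edges and ribs along both grid directions in the same way.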
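The formula on this slide did not survive extraction. For orientation, the block below gives the form of the elastic energy usually found in the elastic-map literature. It may differ in detail from what the slide showed. The symbols $p$, $s$, $r$ (numbers of nodes, edges and ribs), the taxon $K_k$ of data points closest to node $y_k$, and the coefficients $\lambda_i$, $\mu_j$ are assumptions introduced here, as is the convention that $R^{(j)}(0)$ is the middle node of a rib.

```latex
\begin{aligned}
U &= U^{(Y)} + U^{(E)} + U^{(R)}, \\
U^{(Y)} &= \frac{1}{m} \sum_{k=1}^{p} \sum_{X_i \in K_k} \left\| X_i - y_k \right\|^2, \\
U^{(E)} &= \sum_{i=1}^{s} \lambda_i \left\| E^{(i)}(1) - E^{(i)}(0) \right\|^2, \\
U^{(R)} &= \sum_{j=1}^{r} \mu_j \left\| R^{(j)}(1) + R^{(j)}(2) - 2 R^{(j)}(0) \right\|^2 .
\end{aligned}
```

In this reading, $U^{(Y)}$ pulls the nodes towards the data, $U^{(E)}$ penalizes stretching of edges, and $U^{(R)}$ penalizes bending of ribs.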
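Global minimum and softening: elasticity coefficients $\lambda_0, \mu_0 \approx 10^{3}$; $\lambda_0, \mu_0 \approx 10^{2}$; $\lambda_0, \mu_0 \approx 10^{1}$; $\lambda_0, \mu_0 \approx 10^{-1}$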
Adaptive algorithms: refining net, growing net, adaptive net. Idea of scaling.
Projection onto the manifold: closest node of the net, or closest point of the manifold
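The two projection options can be compared in a small sketch for a one-dimensional net, where the manifold is approximated by the polyline through consecutive nodes. The function names and the polyline approximation are assumptions made for illustration. The talk does not specify how the manifold is interpolated between nodes.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def closest_node(point, nodes):
    """Index of the net node nearest to `point` (piecewise-constant projection)."""
    return min(range(len(nodes)), key=lambda k: sq_dist(point, nodes[k]))

def closest_point_on_polyline(point, nodes):
    """Nearest point on the polyline through `nodes` (piecewise-linear projection)."""
    best, best_d = None, float("inf")
    for a, b in zip(nodes, nodes[1:]):
        ab = [bi - ai for ai, bi in zip(a, b)]
        ap = [pi - ai for ai, pi in zip(a, point)]
        denom = sum(v * v for v in ab)
        # Parameter of the orthogonal projection, clamped to the segment.
        t = 0.0 if denom == 0.0 else max(
            0.0, min(1.0, sum(u * v for u, v in zip(ap, ab)) / denom))
        q = [ai + t * v for ai, v in zip(a, ab)]
        d = sq_dist(point, q)
        if d < best_d:
            best, best_d = q, d
    return best

nodes = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0)]
k = closest_node((0.9, 0.5), nodes)
q = closest_point_on_polyline((0.9, 0.5), nodes)
```

Projecting to the closest node is cheaper but quantizes the result to the net. Projecting to the closest point of the interpolated manifold gives continuous coordinates.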
Colorings: visualize any function, e.g. the value of a coordinate
Regression and principal manifolds: regression F(x) against x, compared with the principal component
Application: economic data. Colorings: density, gross output, profit, growth rate
Medical table: 1700 patients with myocardial infarction. Patients map: density, lethal cases
Medical table: 1700 patients with myocardial infarction, 128 indicators. Colorings: stenocardia functional class, number of infarctions in anamnesis, age
Codon usage in all genes of one genome: Escherichia coli, Bacillus subtilis. Gene groups: majority of genes, “foreign” genes, “hydrophobic” genes, highly expressed genes
Golub’s leukemia dataset: 3051 genes, 38 samples (ALL/B-cell, ALL/T-cell, AML). Map of genes: vote for ALL, vote for AML; used by T.Golub, used by W.Lie; ALL sample, AML sample
Golub’s leukemia dataset, map of samples: AML, ALL/B-cell, ALL/T-cell. Colorings: density, Retinoblastoma binding protein P48, Cystatin C, CA2 Carbonic anhydrase II, X-linked Helicase II
Thank you for your attention! • Questions?