Applications of Matrix-based Learning Algorithms
Instructor: Kai Zhang, Fall 2019
Data Intensive Research
Four paradigms of science: empirical, theoretical, simulation, and data intensive.
Symmetric Matrices
• Kernel (Gram) matrix
• Graph adjacency matrix
Kernel methods:
• Support vector machines
• Kernel PCA, LDA, CCA
• Gaussian process regression
Graph-based algorithms:
• Manifold learning, dimension reduction
• Clustering, semi-supervised learning
• Random walk, graph propagation
Manipulating an n × n matrix is expensive in both time and space! (See the sketch below.)
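To make the cost concrete, here is a minimal sketch (not from the slides) that builds a dense RBF kernel (Gram) matrix with NumPy; the toy data and the bandwidth gamma are illustrative assumptions. The full matrix takes O(n²) memory and O(n²·d) time to form, which is exactly what motivates the low-rank approximations on the next slide.

```python
import numpy as np

def rbf_kernel_matrix(X, gamma=0.5):
    """Dense RBF kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    # Squared distances via the expansion ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq_norms = (X ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip tiny negatives from round-off
    return np.exp(-gamma * sq_dists)

X = np.random.randn(1000, 10)   # 1000 points in 10 dimensions (toy data)
K = rbf_kernel_matrix(X)
print(K.shape)                  # (1000, 1000): storage grows as O(n^2)
```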
Universal Applicability
Low-rank approximation / matrix decomposition comes into play!
• Reduce problem size and complexity
• Structure mining
• Fundamental building block in modern computing
Application areas:
• Information technology (information retrieval, recommender systems)
• Computer vision (image processing, face recognition, video surveillance)
• Computer graphics (3D reconstruction, relighting, rendering)
• Bioinformatics (clustering, network analysis, gene expression analysis)
• Deep networks (model compression, energy saving)
• Optimization (linear systems, Newton method)
[Figure: survey data gathered since 1998; 12th release: 4 million spectra]
More Applications • Wide applications of matrix (singular-value) decomposition
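As a concrete illustration (my sketch, not from the slides), a truncated SVD yields the best rank-k approximation of a matrix in the Frobenius norm (Eckart–Young); the random matrix and the rank k = 10 below are illustrative.

```python
import numpy as np

# Toy "data matrix": 500 samples x 200 features (illustrative)
A = np.random.randn(500, 200)

# Full SVD, then keep only the k leading singular triplets
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 10
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # best rank-k approximation

# Relative approximation error in the Frobenius norm
err = np.linalg.norm(A - A_k) / np.linalg.norm(A)
print(f"rank-{k} relative error: {err:.3f}")
```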
Graph Kernel
• Motivation
[Figure: example molecule graphs, some with known Toxic / Non-toxic labels, others unknown]
Task: predict whether molecules are toxic, given a set of known examples.
Manifold Learning
• Manifold learning (non-linear dimensionality reduction) embeds data that originally lies in a high-dimensional space into a lower-dimensional space, while preserving its characteristic properties.
• A manifold is a topological space that locally resembles Euclidean space near each point.
Mapmaking Problem
[Figure: the same two points A and B shown on the Earth (a sphere) and on a planar map]
Image Examples
Objective: find a small number of features that represent a large number of observed dimensions.
For each image there are 64 × 64 = 4096 pixels (observed dimensions).
Assumption: high-dimensional data often lies on or near a much lower-dimensional, curved manifold. (A minimal sketch follows below.)
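A minimal sketch of this idea, assuming scikit-learn is available: Isomap unrolls a synthetic "swiss roll", a 2-D sheet curved through 3-D space; the dataset and the parameter choices are illustrative, not from the slides.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# 3-D points that actually lie on a curved 2-D manifold (a rolled-up sheet)
X, color = make_swiss_roll(n_samples=1500, noise=0.05, random_state=0)

# Isomap preserves geodesic (along-the-manifold) distances while flattening to 2-D
embedding = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
print(embedding.shape)  # (1500, 2): the unrolled coordinates
```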
Visualization of 6,000 digits from the MNIST dataset produced by t-SNE.
The COIL20 dataset Each object is rotated about a vertical axis to produce a closed one-dimensional manifold of images.
Time Series Embedding Example
• BOLD (blood-oxygen-level dependent) signal
[Figure panels: signal of a single brain region; signals of all 90 brain regions]
Impact of Perplexity
Perplexity controls the size of the neighborhood t-SNE uses when computing the probability distribution over neighbors. (See the sketch below.)
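As a hedged sketch (not from the slides), this is how perplexity is exposed in scikit-learn's t-SNE; the digits dataset and the perplexity values are illustrative.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, _ = load_digits(return_X_y=True)   # 1797 8x8 digit images, 64 dimensions

# Small perplexity -> tight local neighborhoods; large -> broader, more global structure
for perplexity in (5, 30, 50):
    Y = TSNE(n_components=2, perplexity=perplexity, random_state=0).fit_transform(X)
    print(perplexity, Y.shape)
```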
Computer Graphics Applications
• Rendering on meshed surfaces
Spring Networks
View edges as rubber bands or ideal linear springs.
Nail down some vertices, let the rest settle.
When a spring is stretched to length $\ell$, its potential energy is $\tfrac{1}{2}k\ell^2$.
Spring Networks
Nail down some vertices, let the rest settle.
Physics: the resting position minimizes the total potential energy subject to the boundary constraints (the nails). (See the energy expression below.)
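Spelled out (my reconstruction, assuming unit spring constants and writing $x_v$ for the position of vertex $v$), the settled drawing minimizes the sum of the edge energies, and at the minimum each free vertex sits at the average of its neighbors:

```latex
E(x) = \frac{1}{2} \sum_{(u,v) \in E} \lVert x_u - x_v \rVert^2,
\qquad
\frac{\partial E}{\partial x_v} = 0
\;\Longrightarrow\;
x_v = \frac{1}{\deg(v)} \sum_{(u,v) \in E} x_u
\quad \text{for every free vertex } v.
```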
Drawing by Spring Networks (Tutte'63)
If the graph is planar (3-connected, with its outer face nailed to a convex polygon), then the spring drawing has no crossing edges!
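A minimal sketch of Tutte's construction (mine, not from the slides): nail the outer face of a small planar graph to a convex polygon, then solve the linear system that places each interior vertex at the average of its neighbors. The cube graph and the unit-square nailing below are illustrative.

```python
import numpy as np

# Cube graph (3-connected, planar): outer face 0-1-2-3, inner face 4-5-6-7
edges = [(0, 1), (1, 2), (2, 3), (3, 0),   # outer square (nailed)
         (4, 5), (5, 6), (6, 7), (7, 4),   # inner square (free)
         (0, 4), (1, 5), (2, 6), (3, 7)]   # spokes
nailed = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.0, 1.0), 3: (0.0, 1.0)}
free = [4, 5, 6, 7]

adj = {v: [] for v in range(8)}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

# Each free vertex must settle at the average of its neighbors:
# deg(v) * x_v - sum(free neighbors) = sum(nailed neighbor positions)
idx = {v: i for i, v in enumerate(free)}
A = np.zeros((len(free), len(free)))
b = np.zeros((len(free), 2))
for v in free:
    A[idx[v], idx[v]] = len(adj[v])
    for u in adj[v]:
        if u in idx:
            A[idx[v], idx[u]] -= 1.0
        else:
            b[idx[v]] += nailed[u]

positions = np.linalg.solve(A, b)   # crossing-free planar coordinates
for v, p in zip(free, positions):
    print(v, p)
```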
Unsupervised Feature Learning with a Neural Network
• The network is trained to output its own input (to learn the identity function).
• This is trivial unless we:
• constrain the number of units in Layer 2 (learn a compressed representation), or
• constrain Layer 2 to be sparse.
[Figure: autoencoder with inputs x1..x6 (Layer 1), hidden units a1..a3 plus bias +1 (Layer 2, encoding), and outputs x1..x6 (Layer 3, decoding)]
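A minimal sketch of such an autoencoder, assuming PyTorch is available; the 6 → 3 → 6 layer sizes mirror the figure, while the sigmoid activation, toy data, and training loop are illustrative assumptions.

```python
import torch
from torch import nn

# 6 inputs compressed to 3 hidden units, then decoded back to 6 (as in the figure)
model = nn.Sequential(
    nn.Linear(6, 3), nn.Sigmoid(),   # Layer 1 -> Layer 2: encoding
    nn.Linear(3, 6),                 # Layer 2 -> Layer 3: decoding
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

X = torch.randn(256, 6)              # toy data (illustrative)
for step in range(500):
    optimizer.zero_grad()
    loss = loss_fn(model(X), X)      # the target is the input itself
    loss.backward()
    optimizer.step()
print(loss.item())                   # reconstruction error after training
```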