Explore the world of high-dimensional vectors in image processing and neural networks, including the significance of orthogonality for basis representation. Learn about image compression, feature identification, and classification using linear algebra and signal processing concepts. Delve into the application of neural networks for image classification and the challenges in choosing architectures and features for optimal performance.
Image processing and neural networks Gregery T. Buzzard
Linear algebra • What are some of your favorite vectors of dimension at least one million?
Linear algebra • Here are some of my favorite vectors:
Linear algebra • Some more favorite vectors (infinite dimensional): http://mathworld.wolfram.com/FourierSeries.html
Linear algebra • What is a vector? • How do we represent a vector? • Basic answer: • Vector = ordered list of n real numbers • Basis = n vectors so that any vector can be written uniquely as a linear combination of these vectors http://fourier.eng.hmc.edu/e102/lectures/orthogonaltransform/node3.html
High-dimensional vectors • Note: images and signals are high-dimensional vectors. They can be added and scaled. • Q: What’s a “good” basis for an image or signal? • Q: How can we represent an image in a given basis? • Q: Can we get by with fewer vectors than a full basis, using some kind of approximation?
Orthogonality • Inner product: • For vectors a, b in Rn: <a, b> = a1 b1 + a2 b2 + … + an bn • For real-valued functions f, g on a common interval: <f, g> = ∫ f(x) g(x) dx • Orthogonal (perpendicular): inner product is 0. https://en.wikipedia.org/wiki/Dot_product
Orthogonality • Orthogonality gives a firm (and fast) foundation! • With a basis e1, …, en of pairwise orthogonal unit vectors, we can expand any vector a in that basis using dot products: a = <a, e1> e1 + <a, e2> e2 + … + <a, en> en • The standard basis vectors each have a single nonzero entry, a 1, placed at each coordinate in turn. https://www.yogawinetravel.com/how-to-get-the-most-out-of-your-visit-to-the-piazza-del-duomo-in-pisa-italy/
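The expansion on this slide can be checked numerically. The sketch below is illustrative only: the dimension, the random seed, and the use of a QR factorization to build an orthonormal basis are assumptions made for the example, not part of the original slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5  # small dimension chosen for illustration

# Build an orthonormal basis: the columns of Q from a QR factorization
# of a random matrix are pairwise orthogonal unit vectors.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
basis = [Q[:, i] for i in range(n)]

a = rng.standard_normal(n)

# Coefficient of a along each basis vector is just a dot product.
coeffs = np.array([np.dot(a, e) for e in basis])

# Rebuild a as the sum of <a, e_i> e_i.
reconstructed = sum(c * e for c, e in zip(coeffs, basis))
```

No linear system has to be solved to find the coefficients, which is the "fast" part of the foundation: each coefficient costs one dot product.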
Orthogonality • Orthogonal expansion for images and signals: • Fourier series http://www.bragitoff.com/2016/03/fourier-series-and-scilab/
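As a rough illustration of an orthogonal expansion for a signal, the sketch below approximates a square wave by its Fourier sine series, computing each coefficient as an inner product. The square wave, the grid size, and the helper names `inner` and `partial_sum` are assumptions chosen for the example.

```python
import numpy as np

# Sample one period of a square wave on [-pi, pi).
x = np.linspace(-np.pi, np.pi, 20000, endpoint=False)
dx = x[1] - x[0]
f = np.sign(np.sin(x))

def inner(u, v):
    """Approximate the inner product of two functions, the integral of u*v."""
    return np.sum(u * v) * dx

def partial_sum(num_terms):
    """Sum of <f, e_k> e_k over the first num_terms sine basis functions."""
    approx = np.zeros_like(x)
    for k in range(1, num_terms + 1):
        e_k = np.sin(k * x) / np.sqrt(np.pi)  # unit norm on [-pi, pi]
        approx += inner(f, e_k) * e_k
    return approx

def l2_error(num_terms):
    residual = f - partial_sum(num_terms)
    return np.sqrt(inner(residual, residual))

err_5 = l2_error(5)
err_50 = l2_error(50)
```

The square wave is odd, so only sine terms are needed here; a general signal would also need the cosine terms and the constant. Adding more terms lowers the mean-square error, which is the sense in which a truncated basis gives an approximation.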
Image compression https://www.slideshare.net/gpeyre/signal-processing-course-presentation-of-the-course
Beyond orthogonality • Image compression: • Take inner product with orthogonal basis elements • Keep only the most informative elements • Can be used for denoising • Image filtering: • Take inner product with a kernel that can slide around the image • Similar to basis expansion, but different goal • Used to transform images, identify features (edges, etc). https://www.slideshare.net/gpeyre/signal-processing-course-presentation-of-the-course
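The compression recipe above (take inner products with an orthogonal basis, keep only the most informative elements) can be sketched with a 2D Fourier basis. The synthetic image, the 5% retention rate, and the choice of the FFT as the orthogonal transform are all assumptions for illustration; practical codecs use other transforms and add quantization and entropy coding.

```python
import numpy as np

# Synthetic test image: smooth waves plus a bright square.
n = 64
yy, xx = np.mgrid[0:n, 0:n]
image = np.sin(2 * np.pi * xx / n) + 0.5 * np.cos(4 * np.pi * yy / n)
image[20:40, 20:40] += 1.0

# Inner products with an orthonormal (unitary) Fourier basis.
coeffs = np.fft.fft2(image, norm="ortho")

# Keep only the largest 5% of coefficients by magnitude; zero the rest.
keep = int(0.05 * coeffs.size)
threshold = np.sort(np.abs(coeffs).ravel())[-keep]
compressed = np.where(np.abs(coeffs) >= threshold, coeffs, 0)

# Reconstruct from the retained coefficients.
approx = np.fft.ifft2(compressed, norm="ortho").real
rel_error = np.linalg.norm(image - approx) / np.linalg.norm(image)
```

Because the basis is orthonormal, the squared reconstruction error equals the energy in the discarded coefficients, so dropping the smallest ones is the best choice for a fixed budget. The same thresholding step also suppresses small noisy coefficients, which is why it can be used for denoising.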
More general maps • Limitation: Basis expansion and filtering are linear: • F(av + bu) = aF(v) + bF(u) • How do we map from images to a discrete classification like {cat, not cat}? https://en.wiktionary.org/wiki/cat https://www.kickstarter.com/projects/vat19/the-not-a-cat-cat-the-worlds-first-cat-that-isnt
Internal representation • Nonlinear low-dimensional representation from high-dimensional data • Input: 28x28 grayscale digits • Compare random images: • Low-dimensional representations of digits: • Graph: repelling particles + springs • t-SNE: stochastic neighbor embedding • PCA http://colah.github.io/posts/2014-10-Visualizing-MNIST/
Image classification • Convert image to low dimensional representation, then to classification. • Given n distinct classes, use the softmax function to give probabilities: • X = (x1, …, xn) is a vector of real numbers • S(X) = (exp(x1), …, exp(xn)) / (exp(x1) + … + exp(xn)) is a vector of positive numbers that add to 1: interpret as “probability” • Map images to probabilities for each class using softmax function S(X).
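A minimal sketch of the softmax function S(X) defined above. Subtracting the maximum before exponentiating is an added numerical-stability step that is not on the slide; it leaves the result unchanged because the common factor cancels in the ratio. The example scores are arbitrary.

```python
import numpy as np

def softmax(x):
    """Map a vector of real scores to positive numbers that sum to 1."""
    x = np.asarray(x, dtype=float)
    shifted = x - np.max(x)  # avoids overflow in exp; result is unchanged
    e = np.exp(shifted)
    return e / e.sum()

scores = np.array([2.0, 1.0, -1.0])  # arbitrary example scores
probs = softmax(scores)
```

The largest score receives the largest probability, and every class gets a strictly positive share.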
First attempt at an image classifier • Input an image = X • Apply a linear map. E.g., inner product with various kernels to identify features. X -> AX • Apply the softmax function to estimate correct classification https://science.howstuffworks.com/transport/flight/classic/ten-bungled-flight-attempt.htm
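The three steps of this first classifier can be sketched as follows. The random image, the random matrix A, and the choice of 10 classes are placeholders (assumptions) used only to show the shapes involved; in a real classifier A would be learned from data.

```python
import numpy as np

def softmax(scores):
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

rng = np.random.default_rng(0)
num_classes, height, width = 10, 28, 28  # assumed: ten digit classes

# Step 1: the input image, flattened into a vector X.
image = rng.random((height, width))
X = image.ravel()

# Step 2: a linear map. Each row of A is a kernel the size of the image,
# and A @ X stacks the inner products of X with those kernels.
A = 0.01 * rng.standard_normal((num_classes, X.size))

# Step 3: softmax turns the scores into class probabilities.
probabilities = softmax(A @ X)
predicted_class = int(np.argmax(probabilities))
```

Note that A must have exactly one row per class, so the dimension of AX equals the number of classes.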
Problems • Dimension of AX must equal number of classes. • How do we know what features to use? • Do we really want to weight the absence of a feature as much as the presence of a feature? • Do we really want to use linear maps up to the final step?
More general classifier • Use activation units such as the ReLU (rectified linear unit) or similar functions to identify the presence of features. • Stack multiple layers using: • Convolutional layer (inner product) • ReLU • Max pool • Fully connected https://en.wikipedia.org/wiki/Rectifier_(neural_networks)
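A small sketch of three of the layer types listed above, applied to a toy image. The function names, the 6x6 image, and the one-row edge kernel are assumptions for illustration. The "convolution" here is the sliding inner product used in most neural network libraries (strictly, a cross-correlation, since the kernel is not flipped).

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image; take an inner product at each spot."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Keep positive responses (feature present), zero out the rest."""
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Take the max over non-overlapping size x size blocks."""
    h = (x.shape[0] // size) * size
    w = (x.shape[1] // size) * size
    blocks = x[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

# Toy image: dark on the left, bright on the right.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Kernel that responds to a dark-to-bright step from left to right.
kernel = np.array([[-1.0, 1.0]])

features = max_pool(relu(conv2d_valid(image, kernel)))

# Mirrored image: the step goes bright-to-dark, so the raw response is
# negative and ReLU removes it.
flipped_response = relu(conv2d_valid(image[:, ::-1], kernel))
```

The ReLU step addresses the earlier question about weighting absence as much as presence: a negative response is simply set to zero rather than counted against the feature.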
Early neural network • Convolutions to get and combine multiple features • Nonlinearity and subsampling to enhance features • Full connection to get output https://towardsdatascience.com/neural-network-architectures-156e5bad51ba
Putting the layers together https://www.extremetech.com/extreme/215170-artificial-neural-networks-are-changing-the-world-what-are-they
Questions • How do we choose the architecture? • (and do we need multiple layers)? • How do we choose the features? • How well does it work?
Do we need multiple layers? • Universal approximation theorem: • A feed-forward network with a single hidden layer containing a finite number of neurons (i.e., a multilayer perceptron) can approximate continuous functions on compact subsets of Rn to any desired accuracy, under mild assumptions on the activation function. • Beautiful mathematical result, but not practical for learning and efficient representation. https://en.wikipedia.org/wiki/Universal_approximation_theorem http://aeronauticpictures.com/buy/download/t/early-flight_stock-footage/
Choosing the architecture • Changing all the time: very active research • Increasingly modular • Dozens to hundreds of layers • (Figure: example architectures from 2012, 2013, and 2016) https://towardsdatascience.com/neural-network-architectures-156e5bad51ba
Choosing features: training • Instead of choosing features, we’ll learn them. • Need training data: input-output pairs • Need an error function – how good is our output? • Need an update mechanism: backpropagation
Backpropagation • Input, X, produces output, Y. Output depends on parameters, 𝜃 (kernels/filters/weights between nodes). • Training data is a set of known pairs (Xi, Yi). • The error function, E(X, 𝜃), gives the error between calculated output and desired output. • Use gradient descent to change the parameter values in the direction that reduces the error.
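A minimal sketch of gradient descent on an error function, using a linear model with squared error in place of a neural network so that the gradient can be written by hand. The synthetic data, the learning rate, and the iteration count are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training pairs (X_i, Y_i) generated from known parameters.
X = rng.standard_normal((100, 3))
true_theta = np.array([1.5, -2.0, 0.5])
Y = X @ true_theta

def error(theta):
    """Mean squared error between calculated and desired output."""
    return np.mean((X @ theta - Y) ** 2)

def gradient(theta):
    """Gradient of the error with respect to the parameters."""
    return 2.0 * X.T @ (X @ theta - Y) / len(Y)

theta = np.zeros(3)
learning_rate = 0.1  # assumed step size
initial_error = error(theta)

for _ in range(500):
    # Step against the gradient to reduce the error.
    theta = theta - learning_rate * gradient(theta)

final_error = error(theta)
```

In a neural network the update rule is the same; backpropagation is the method for computing the gradient when the output is a composition of many layers.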
Backpropagation • Input-output calculation flows forwards, gradient flows backwards • Example: f = qz, q = x + y (D_u denotes the partial derivative with respect to u) • D_f f = 1; D_q f = z, D_z f = q • D_x q = 1, D_y q = 1 • Chain rule: D_x f = (D_q f)(D_x q) = z, D_y f = (D_q f)(D_y q) = z, D_z f = q = x + y http://cs231n.github.io/optimization-2/
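The example f = qz with q = x + y can be traced in code: a forward pass computes the values, and a backward pass applies the chain rule from the output back to the inputs. The input values below are illustrative assumptions, and the finite-difference check is added only as a sanity test.

```python
def forward(x, y, z):
    q = x + y
    f = q * z
    return q, f

# Illustrative input values (assumed for this example).
x, y, z = -2.0, 5.0, -4.0

# Forward pass: inputs to output.
q, f = forward(x, y, z)

# Backward pass: start from df/df = 1 and move toward the inputs.
df_df = 1.0
df_dq = z * df_df      # f = q * z, so df/dq = z
df_dz = q * df_df      # f = q * z, so df/dz = q
df_dx = df_dq * 1.0    # q = x + y, so dq/dx = 1
df_dy = df_dq * 1.0    # q = x + y, so dq/dy = 1

# Sanity check of df/dx with a central finite difference.
h = 1e-6
numeric_df_dx = (forward(x + h, y, z)[1] - forward(x - h, y, z)[1]) / (2 * h)
```

Each node only needs its local derivative and the gradient arriving from the node after it, which is what makes the backward pass cheap even in deep networks.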
Generative vs. Discriminative • Discriminative models learn the boundary between classes • Take an input image and produce a classification. • Generative models model the distribution of individual classes • Take a class label and some parametrization of that class and produce an example image.
Autoencoders • An autoencoder learns a reduced-dimension representation (encoding) of its input. • The encoder is paired with a decoder that recovers the input from the encoding. • Applying random input to the decoder generates new samples that approximate the distribution of the inputs.
GANs: Generative Adversarial Networks • Paired networks: one discriminates real from fake, another generates examples to pass as real. • Double feedback loop: • Generator takes random input and returns an image. • Discriminator takes in both real and fake images and returns probability of real. • Both get feedback about the other. https://deeplearning4j.org/generative-adversarial-network
Recurrent networks • Image processing is mostly feedforward. • Time-dependent input uses recurrence: feedback and feedforward. • The hidden state s acts like a memory. • E.g., translation, generating text, speech recognition, generating image descriptions (with CNNs). http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
Reinforcement learning • Learn to take appropriate action based on external environment, internal state, and reward for action. https://en.wikipedia.org/wiki/Reinforcement_learning
What can they do? • Image classification, segmentation, denoising, etc. • Facial recognition and object recognition • Speech recognition and generation • Real time spoken translation • Cancer detection and other healthcare • Robot learning by demonstration • Recommender systems • Image generation: • https://www.geek.com/tech/nvidia-ai-generates-fake-faces-based-on-real-celebs-1721216/
How well do they work? https://www.slideshare.net/DavidBalduzzi/game-theory-for-neural-networks
How do they fail? Adversarial attacks https://medium.com/deep-dimension/deep-learning-papers-review-universal-adversarial-patch-a5ad222a62d2
Techniques for building/training • Structure: • Layer components: convolution, ReLU, max pool, full linear, gated recurrent, etc. • Hyperparameters: number of layers, kernel width, padding • Training: • Performance metric and baseline/goal • Data: how much and how to use? • Optimizer: SGD + Nesterov momentum, Adam (adaptive moment estimation). • Minibatches, dropout, batch normalization, regularization. • Hyperparameters: learning rate, dropout rate, regularization weight • Start from a successful structure for a similar problem.
Mathematical questions • Why do they work well? • How do they approximate a function on a high-dimensional space? • How do they generalize to new examples? • Is there a mathematical explanation for how to choose an architecture? • How can we train them more quickly with fewer examples? • What layers might be better than existing layers? • How can we make them more robust? • How can we combine multiple sources of expertise?