CSC 4510 – Machine Learning

CSC 4510 – Machine Learning 7: Introduction to Neural Networks
Dr. Mary-Angela Papalaskari
Department of Computing Sciences, Villanova University
Course website: www.csc.villanova.edu/~map/4510/
Some of the slides in this presentation are adapted from:
• Prof. Frank Klassner’s ML class at Villanova
• the University of Manchester ML course http://www.cs.manchester.ac.uk/ugt/COMP24111/
• The Stanford online ML course http://www.ml-class.org/
CSC 4510 - M.A. Papalaskari - Villanova University

Machine learning problems
• Supervised learning
  • Classification
  • Regression
• Unsupervised learning
Others: Reinforcement learning, recommender systems.
Also talk about: Practical advice for applying learning algorithms.

Motivation – part 1
• Learning a non-linear function

What is this?
You see this: [figure: photograph]
But the camera sees this: [figure: matrix of pixel intensities]

Computer Vision: Car detection
[Figure: example images labeled “Cars” and “Not a car”]
Testing: What is this?

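[Figure: a raw image with two pixel locations, pixel 1 and pixel 2, is passed to a learning algorithm. Plot of pixel 1 versus pixel 2, with Cars and “Non”-Cars as the two classes.]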

Why not apply logistic regression?
[Figure: raw image, learning algorithm, and the plot of pixel 1 versus pixel 2 for Cars and “Non”-Cars]

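Reminder: logistic regression can learn non-linear decision boundaries
Add some higher-order terms?
[Figure: non-linear decision boundary in the (x1, x2) plane, axis ticks at 1 and -1, with a prediction rule of the form: Predict “…” if …]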
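As a sketch of how higher-order terms bend the boundary, assume a hypothesis h(x) = g(θ0 + θ1·x1 + θ2·x2 + θ3·x1² + θ4·x2²), where g is the sigmoid. The parameter values below are hand-picked for illustration and are an assumption, not values taken from the slide; they happen to give a circular boundary of radius 1. The function names are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Hand-picked parameters (assumption): theta0..theta4 for the terms
# 1, x1, x2, x1^2, x2^2. They are not learned from data.
THETA = (-1.0, 0.0, 0.0, 1.0, 1.0)

def hypothesis(x1, x2, theta=THETA):
    """Logistic regression output with quadratic terms added."""
    z = (theta[0] + theta[1] * x1 + theta[2] * x2
         + theta[3] * x1 ** 2 + theta[4] * x2 ** 2)
    return sigmoid(z)

def predict(x1, x2, theta=THETA):
    """Predict 1 when h(x) >= 0.5, which here means x1^2 + x2^2 >= 1."""
    return 1 if hypothesis(x1, x2, theta) >= 0.5 else 0
```

With these values, points inside the unit circle are predicted 0 and points on or outside it are predicted 1, a boundary that no straight line in (x1, x2) can reproduce.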

50 x 50 pixel images → 2500 pixels (7500 if RGB)
Features: pixel 1 intensity, pixel 2 intensity, …, pixel 2500 intensity
Quadratic features (products x_i × x_j of pairs of pixel intensities): ≈ 3 million features
[Figure: raw image, learning algorithm, and the plot of pixel 1 versus pixel 2 for Cars and “Non”-Cars]
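The feature count can be checked with a short calculation. This sketch assumes "quadratic features" means every product x_i × x_j with i ≤ j, squares included; the function name is hypothetical.

```python
from math import comb

def num_quadratic_features(n_pixels):
    """Count the products x_i * x_j with i <= j for n_pixels inputs.

    comb(n, 2) counts the pairs with i < j; adding n counts the squares.
    """
    return comb(n_pixels, 2) + n_pixels

grayscale = num_quadratic_features(50 * 50)      # 2500 inputs
rgb = num_quadratic_features(50 * 50 * 3)        # 7500 inputs
print(grayscale, rgb)
```

For 2500 grayscale pixels this gives 3,126,250 features, matching the slide's "≈ 3 million"; for 7500 RGB values it gives about 28 million, which is why adding higher-order terms does not scale to raw images.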

Motivation – part 2
• Take inspiration from the brain

Neural Networks
• Origins: Algorithms that try to mimic the brain.
• Were very widely used in the 80s and early 90s; popularity diminished in the late 90s.
• Recent resurgence: State-of-the-art technique for many applications

The “one learning algorithm” hypothesis
Auditory cortex learns to see [Roe et al., 1992]
[Figure: Auditory Cortex]

The “one learning algorithm” hypothesis
Somatosensory cortex learns to see [Metin & Frost, 1989]
[Figure: Somatosensory Cortex]

Sensor representations in the brain
• Human echolocation (sonar)
• Seeing with your tongue
• Haptic belt: Direction sense
• Implanting a 3rd eye
[BrainPort; Welsh & Blasch, 1997; Nagel et al., 2005; Constantine-Paton & Law, 2009]

Neuron in the brain
• Input signals are sent from other neurons.
• If enough signals accumulate, the neuron fires a signal.
• Connection strengths determine how the signals are accumulated.

Neurons in the brain [Credit: US National Institutes of Health, National Institute on Aging]

Comparing Carbon & Silicon
• Human Brain
  • Computational Units: O(10^11) neurons
  • Storage Units: O(10^11) neurons, O(10^14) synapses
  • Cycle Time: O(10^-3) sec, Bandwidth: O(10^14) bits/sec
  • Neuron Updates/sec: O(10^14)
• Computer
  • Computational Units: 1 CPU, 10^7 gates
  • Storage Units: O(10^11) bits RAM, O(10^12) bits disk
  • Cycle Time: O(10^-8) sec, Bandwidth: O(10^8) bits/sec
  • Neuron Updates/sec: O(10^8)

The neuron model
McCulloch & Pitts 1943
• Compute weighted sum of inputs and “fire” if above threshold value

• Input signals ‘x’ and coefficients ‘w’ are multiplied
• Weights correspond to connection strengths
• Signals are added up – if they are enough, FIRE!
[Figure labels: incoming signal, connection strength, activation level, output signal]
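The multiply, add, and fire steps can be sketched in a few lines. The function name and the choice to fire when the sum reaches the threshold (>= rather than >) are assumptions made for illustration.

```python
def neuron_output(inputs, weights, threshold):
    """Threshold unit: weighted sum of inputs, then fire (1) or stay silent (0).

    inputs    -- incoming signals x
    weights   -- connection strengths w, one per input
    threshold -- activation level needed to fire
    Assumption: the unit fires when the sum is >= threshold.
    """
    activation = sum(x * w for x, w in zip(inputs, weights))
    return 1 if activation >= threshold else 0

# Two of three inputs active, each with strength 0.5, threshold 1.0.
print(neuron_output([1, 0, 1], [0.5, 0.5, 0.5], 1.0))
```

Here the activation level is 0.5 + 0.5 = 1.0, which reaches the threshold, so the unit fires; with only one active input it would not.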

Activation Functions
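A minimal sketch of four activation functions commonly paired with this neuron model: step, sign, sigmoid, and linear. Which functions the slide actually showed is an assumption; the names below are illustrative. Each one maps the neuron's net input z (weighted sum minus threshold) to an output.

```python
import math

def step(z):
    """Hard limiter with outputs 0 or 1 (assumption: fires at z == 0)."""
    return 1 if z >= 0 else 0

def sign(z):
    """Hard limiter with outputs -1 or +1."""
    return 1 if z >= 0 else -1

def sigmoid(z):
    """Smooth squashing function with outputs in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)          # avoids overflow for very negative z
    return e / (1.0 + e)

def linear(z):
    """Identity: the output equals the net input."""
    return z

for z in (-2.0, 0.0, 2.0):
    print(z, step(z), sign(z), round(sigmoid(z), 3), linear(z))
```

The step and sign functions give the all-or-nothing "fire" behavior; the sigmoid is a smooth alternative to that hard threshold.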

A neuron can compute….
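For instance, a single threshold neuron can compute the logical functions AND, OR, and NOT. The weights and thresholds below are hand-picked assumptions for illustration (many other values work), and the function names are hypothetical.

```python
def fires(inputs, weights, threshold):
    """Return 1 if the weighted sum of inputs reaches the threshold, else 0."""
    return 1 if sum(x * w for x, w in zip(inputs, weights)) >= threshold else 0

def logical_and(a, b):
    # Both inputs must be on: 1 + 1 = 2 >= 1.5, but 1 + 0 = 1 < 1.5.
    return fires((a, b), (1, 1), 1.5)

def logical_or(a, b):
    # One active input is enough: 1 >= 0.5.
    return fires((a, b), (1, 1), 0.5)

def logical_not(a):
    # A negative weight inhibits: 0 >= -0.5 fires, -1 >= -0.5 does not.
    return fires((a,), (-1,), -0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, logical_and(a, b), logical_or(a, b))
```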

The perceptron
Rosenblatt 1958: Training algorithm
Neural Net
• A single neuron
• Adjustable synaptic weights

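Perceptron learning
• Initialize weights and thresholds to random numbers between -0.5 and 0.5
• Activate perceptron
• Update weights: w_i(p+1) = w_i(p) + α · x_i(p) · err(p), where p is the iteration number, α is the learning rate, and err(p) is the desired output minus the actual output at iteration p
• Iterate until convergence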
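One iteration of the update rule might look like the sketch below. Two details are assumptions, since the slide gives the rule for weights only: the threshold is treated as a weight on a constant input of -1 and updated the same way, and the perceptron fires when the net input is >= 0. The function name is hypothetical.

```python
def perceptron_step(weights, threshold, x, desired, alpha=0.1):
    """Apply one activate-and-update iteration; return the new parameters.

    err = desired - actual, so err is +1, 0, or -1 for binary outputs.
    """
    net = sum(w * xi for w, xi in zip(weights, x)) - threshold
    actual = 1 if net >= 0 else 0                 # activate perceptron
    err = desired - actual
    new_weights = [w + alpha * xi * err for w, xi in zip(weights, x)]
    # Assumption: threshold acts as a weight on a constant input of -1.
    new_threshold = threshold + alpha * (-1) * err
    return new_weights, new_threshold, err

# The perceptron wrongly fires on input (1, 1) when the target is 0,
# so both weights are pushed down and the threshold is pushed up.
print(perceptron_step([0.0, 0.0], 0.0, (1, 1), desired=0))
```

When the output is already correct, err is 0 and nothing changes; weights move only on mistakes, and only for inputs that were active (x_i ≠ 0).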

Example: Perceptron learning logical AND
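A minimal sketch of the whole procedure on logical AND, under stated assumptions: learning rate α = 0.1, a fixed random seed, the threshold updated as a weight on a constant input of -1, and "convergence" taken to mean one full pass over the four examples with no errors. All names are hypothetical, and the learned values depend on the seed.

```python
import random

AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

def predict(weights, threshold, x):
    """Fire (1) if the weighted sum minus the threshold is >= 0."""
    net = sum(w * xi for w, xi in zip(weights, x)) - threshold
    return 1 if net >= 0 else 0

def train_perceptron(data, alpha=0.1, max_epochs=1000, seed=0):
    """Return (weights, threshold, epochs_used) after an error-free epoch."""
    rng = random.Random(seed)
    n_inputs = len(data[0][0])
    # Initialize weights and threshold to random numbers in [-0.5, 0.5].
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
    threshold = rng.uniform(-0.5, 0.5)
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x, desired in data:
            err = desired - predict(weights, threshold, x)
            if err != 0:
                mistakes += 1
                weights = [w + alpha * xi * err for w, xi in zip(weights, x)]
                threshold -= alpha * err   # assumption: constant input of -1
        if mistakes == 0:
            return weights, threshold, epoch
    raise RuntimeError("no error-free epoch within max_epochs")

weights, threshold, epochs = train_perceptron(AND_DATA)
for x, desired in AND_DATA:
    print(x, desired, predict(weights, threshold, x))
```

Because AND is linearly separable, the loop is guaranteed to reach an error-free epoch; the exact weights it stops at are one of many valid solutions.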

Representation Limits for perceptron • Linear Separability
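To illustrate the limit, the sketch below searches a grid of weights and thresholds for a single threshold unit that reproduces a given truth table. It finds one for AND but none for XOR. The grid search is an illustrative assumption, not a proof and not a training method, and the names are hypothetical.

```python
from itertools import product

AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

# Candidate values -2.0, -1.75, ..., 2.0 (exact in floating point).
GRID = [i / 4 for i in range(-8, 9)]

def find_threshold_unit(data, grid=GRID):
    """Return some (w1, w2, threshold) matching every example, or None."""
    for w1, w2, t in product(grid, repeat=3):
        if all((1 if w1 * x1 + w2 * x2 >= t else 0) == y
               for (x1, x2), y in data):
            return (w1, w2, t)
    return None

print("AND:", find_threshold_unit(AND_DATA))
print("XOR:", find_threshold_unit(XOR_DATA))
```

The XOR failure is not an artifact of the grid. A unit computing XOR would need threshold t > 0 (input (0, 0) must not fire), w1 ≥ t and w2 ≥ t (each single input must fire), and w1 + w2 < t (both inputs together must not fire), but w1 + w2 ≥ 2t > t, a contradiction. No single line separates the XOR classes, so no single perceptron can represent it.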
