CSC 4510 – Machine Learning
7: Introduction to Neural Networks
Dr. Mary-Angela Papalaskari
Department of Computing Sciences, Villanova University
Course website: www.csc.villanova.edu/~map/4510/
• Some of the slides in this presentation are adapted from:
• Prof. Frank Klassner’s ML class at Villanova
• the University of Manchester ML course http://www.cs.manchester.ac.uk/ugt/COMP24111/
• the Stanford online ML course http://www.ml-class.org/
Machine learning problems
• Supervised learning
• Classification
• Regression
• Unsupervised learning
• Others: reinforcement learning, recommender systems
We will also talk about practical advice for applying learning algorithms.
Motivation – part 1
• Learning a non-linear function
Computer Vision: Car detection
You see this: [photo of a car]
But the camera sees this: [matrix of pixel intensity values]
Training examples: images labeled “Cars” and “Not a car”
Testing: given a new image, what is this?
[Figure: a learning algorithm is trained on raw images; each image is reduced to two features, pixel 1 intensity and pixel 2 intensity, and plotted in that space, where “Cars” and “Non-Cars” form two groups of points]
Why not apply logistic regression?
Reminder: logistic regression can produce non-linear decision boundaries
Add some higher-order terms:
h(x) = g(θ0 + θ1·x1 + θ2·x2 + θ3·x1² + θ4·x2²)
With θ = (-1, 0, 0, 1, 1): predict “y = 1” if -1 + x1² + x2² ≥ 0, i.e. outside the circle x1² + x2² = 1
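A minimal sketch of this idea in Python (the θ values are the illustrative ones above; the helper names are our own):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x1, x2):
    # Feature vector with higher-order terms: [1, x1, x2, x1^2, x2^2]
    features = np.array([1.0, x1, x2, x1**2, x2**2])
    theta = np.array([-1.0, 0.0, 0.0, 1.0, 1.0])  # boundary: unit circle
    return 1 if sigmoid(features @ theta) >= 0.5 else 0

print(predict(0.1, 0.2))  # inside the circle  -> predicts 0
print(predict(1.5, 1.0))  # outside the circle -> predicts 1
```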
50 × 50 pixel images → 2500 pixels (7500 if RGB)
Feature vector: x = (pixel 1 intensity, pixel 2 intensity, …, pixel 2500 intensity)
Quadratic features (xi × xj): ≈ 3 million features
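The ≈ 3 million figure is just the number of distinct pairwise products of the n = 2500 pixel intensities; a quick check:

```python
n = 2500  # pixels in a 50 x 50 grayscale image
# Quadratic terms x_i * x_j with i <= j: n * (n + 1) / 2 distinct products
print(n * (n + 1) // 2)  # 3126250, i.e. about 3 million features
```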
Motivation – part 2
• Take inspiration from the brain
Neural Networks
• Origins: algorithms that try to mimic the brain
• Very widely used in the 1980s and early 1990s; popularity diminished in the late 1990s
• Recent resurgence: state-of-the-art technique for many applications
The “one learning algorithm” hypothesis
• Auditory cortex learns to see [Roe et al., 1992]
The “one learning algorithm” hypothesis
• Somatosensory cortex learns to see [Metin & Frost, 1989]
Sensor representations in the brain
• Human echolocation (sonar)
• Seeing with your tongue
• Haptic belt: direction sense
• Implanting a 3rd eye
[BrainPort; Welsh & Blasch, 1997; Nagel et al., 2005; Constantine-Paton & Law, 2009]
Neuron in the brain
• Input signals are sent from other neurons
• If sufficient signals accumulate, the neuron fires a signal
• Connection strengths determine how the signals are accumulated
Neurons in the brain [Credit: US National Institutes of Health, National Institute on Aging]
Comparing Carbon & Silicon
• Human Brain
• Computational units: O(10^11) neurons
• Storage units: O(10^11) neurons, O(10^14) synapses
• Cycle time: O(10^-3) sec; bandwidth: O(10^14) bits/sec
• Neuron updates/sec: O(10^14)
• Computer
• Computational units: 1 CPU, 10^7 gates
• Storage units: O(10^11) bits RAM, O(10^12) bits disk
• Cycle time: O(10^-8) sec; bandwidth: O(10^8) bits/sec
• Neuron updates/sec: O(10^8)
The neuron model (McCulloch & Pitts, 1943)
• Compute a weighted sum of the inputs and “fire” if it is above a threshold value
• Input signals x and coefficients (weights) w are multiplied
• Weights correspond to connection strengths
• Signals are added up; if the sum is large enough, FIRE!
[Figure labels: incoming signal, connection strength, activation level, output signal]
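A minimal sketch of such a threshold unit (illustrative Python, not code from the course):

```python
import numpy as np

def threshold_neuron(x, w, threshold):
    """Fire (output 1) if the weighted sum of inputs reaches the threshold."""
    activation = np.dot(w, x)  # weighted sum of input signals
    return 1 if activation >= threshold else 0

# Example: a two-input neuron wired to compute logical AND
w = np.array([1.0, 1.0])  # equal connection strengths
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, threshold_neuron(np.array(x), w, threshold=1.5))
```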
The perceptron (Rosenblatt, 1958): a trainable neural net
• A single neuron
• Adjustable synaptic weights
• A training algorithm
Perceptron learning
• Initialize weights and threshold to random numbers between -0.5 and 0.5
• Activate the perceptron on a training example
• Update the weights: wi(p+1) = wi(p) + α · xi(p) · e(p), where p indexes the iteration, α is the learning rate, and e(p) is the error (desired output minus actual output)
• Iterate until convergence (see the sketch below)
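One way this loop can look in Python (an illustrative sketch with a step activation and a learnable threshold; details beyond the update rule above are our assumptions):

```python
import numpy as np

def train_perceptron(X, y, alpha=0.1, epochs=100, seed=0):
    """Perceptron learning rule: w_i <- w_i + alpha * x_i * error."""
    rng = np.random.default_rng(seed)
    # Initialize weights and threshold to random values in [-0.5, 0.5]
    w = rng.uniform(-0.5, 0.5, size=X.shape[1])
    theta = rng.uniform(-0.5, 0.5)
    for _ in range(epochs):
        converged = True
        for x, target in zip(X, y):
            output = 1 if np.dot(w, x) >= theta else 0  # activate perceptron
            error = target - output
            if error != 0:
                w += alpha * error * x   # weight update
                theta -= alpha * error   # threshold update (acts as a bias)
                converged = False
        if converged:  # a full pass with no errors: done
            break
    return w, theta

# Example: learning logical AND (linearly separable, so training converges)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])
w, theta = train_perceptron(X, y)
print([1 if np.dot(w, x) >= theta else 0 for x in X])  # -> [0, 0, 0, 1]
```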
Representational limits of the perceptron
• Linear separability: a single perceptron can only represent functions whose positive and negative examples can be separated by a line (more generally, a hyperplane); it can learn AND and OR, but not XOR
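A quick way to see why XOR is out of reach (a standard argument, not on the slide): suppose a perceptron with weights w1, w2 and threshold θ computed XOR. Then

output(0, 0) = 0 ⇒ 0 < θ
output(0, 1) = 1 ⇒ w2 ≥ θ
output(1, 0) = 1 ⇒ w1 ≥ θ
output(1, 1) = 0 ⇒ w1 + w2 < θ

Adding the middle two inequalities gives w1 + w2 ≥ 2θ, which together with w1 + w2 < θ forces θ < 0, contradicting 0 < θ. So no such weights and threshold exist.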