READINGS IN DEEP LEARNING 4 Sep 2013
ADMINISTRIVIA • New course numbers (11-785/786) are assigned • Should be up on the Hub shortly • Lab assignment 1 is up • Due date: 2 weeks from today • Google group: is everyone on? • Website issues.. • WordPress not yet an option (CMU CS setup) • Piazza?
Poll for next 2 classes • Monday, Sep 9 • The perceptron: A probabilistic model for information storage and organization in the brain • Rosenblatt • Not really about the logistic perceptron, more about the probabilistic interpretation of learning in connectionist networks • The Organization of Behavior • Donald Hebb • About the Hebbian learning rule
Poll for next 2 classes • Wed, Sep 11 • Optimal unsupervised learning in a single-layer linear feedforward neural network • Terence Sanger • Generalized Hebbian learning rule • The Widrow-Hoff learning rule • Widrow and Hoff • Will be presented by Pallavi Baljekar
Notices • Success of the course depends on good presentations • Please send in your slides 1-2 days before your presentation • So that we can ensure they are OK • You are encouraged to discuss your papers with us/your classmates while preparing for them • Use the Google group for discussion
A new project • Distributed large-scale training of NNs.. • Looking for volunteers
The Problem: Distributed data • Training enormous networks • Billions of units • From large amounts of data • Billions or trillions of instances • Data may be localized.. • Or distributed
The Problem: Distributed computing • A single computer will not suffice • Need many processors • Tens, hundreds, or thousands of computers • Of possibly varying types and capacities
Challenge • Getting the data to the computers • Tons of data to many computers • Bandwidth problems • Timing issues • Synchronizing the learning
Logistical Challenges • How to transfer vast amounts of data to processors • Which processor gets how much data.. • Not all processors are equally fast • Not all data take equal amounts of time to process • .. and which data • Data locality
Learning Challenges • How to transfer parameters to processors • Networks are large, billions or trillions of parameters • Each processor must have the latest copy of parameters • How to receive updates from processors • Each processor learns on local data • Updates from all processors must be pooled
Learning Challenges • Synchronizing processor updates • Some processors slower than others • Inefficient to wait for slower ones • In order to update parameters at all processors • Requires asynchronous updates • Each processor updates when done • Problem: Different processors now have different sets of parameters • Other processors may have updated parameters already • Requires algorithmic changes • How to update asynchronously • Which updates to trust
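The asynchronous-update problem in the slide above is easy to sketch concretely. Below is a minimal single-process illustration (not from the course materials) of one common pattern: workers compute gradients against a possibly stale copy of the parameters, and a parameter server shrinks the step size for stale updates. The toy objective, learning rate, and staleness discount are all invented for the example.

```python
import numpy as np

class ParameterServer:
    """Toy asynchronous-SGD server (illustrative only, not the project's design)."""
    def __init__(self, dim, lr=0.1):
        self.theta = np.zeros(dim)
        self.version = 0
        self.lr = lr

    def pull(self):
        # a worker fetches the current parameters and their version number
        return self.theta.copy(), self.version

    def push(self, grad, version_seen):
        # scale the step down when the gradient was computed on stale parameters
        staleness = self.version - version_seen
        self.theta -= self.lr / (1 + staleness) * grad
        self.version += 1

# Toy objective: drive theta toward a fixed target. Three "workers" pull the
# same parameters, then push one after another, so later pushes are stale.
server = ParameterServer(dim=3)
target = np.array([1.0, -2.0, 0.5])
for _ in range(50):
    pulls = [server.pull() for _ in range(3)]
    for theta, ver in pulls:
        server.push(2 * (theta - target), ver)
print(np.round(server.theta, 3))   # close to target despite the stale pushes
```

In a real deployment the pulls and pushes would arrive over the network in arbitrary order; the three back-to-back pushes per round here are only a stand-in for that.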
Current Solutions • Faster processors • GPUs • GPU programming required • Large simple clusters • Simple distributed programming • Large heterogeneous clusters • Techniques for asynchronous learning
Current Solutions • Still assume data distribution is not a major problem • Assume relatively fast connectivity • Gigabit Ethernet • Fundamentally cluster-computing based • Local area network
New project • Distributed learning • Wide area network • Computers distributed across the world
New project • Supervisor/Worker architecture • One or more supervisors • May be a hierarchy • A large number of workers • Supervisors in charge of resource and task allocation, gathering and redistributing updates, synchronization
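As a point of reference for the supervisor/worker roles listed above, here is a hedged, fully synchronous sketch on synthetic data: the supervisor hands each worker a data shard, gathers the locally computed gradients, pools them, and redistributes the updated parameters. Everything here (the linear model, shard sizes, learning rate) is illustrative, not part of the project specification.

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])

def make_shard(n):
    # synthetic local data held by one worker
    X = rng.standard_normal((n, 2))
    return X, X @ true_w + 0.01 * rng.standard_normal(n)

shards = [make_shard(100) for _ in range(4)]          # data/task allocation
w = np.zeros(2)
for _ in range(200):
    # each worker computes a gradient of its local squared error
    grads = [2 * X.T @ (X @ w - y) / len(y) for X, y in shards]
    # the supervisor pools the updates and broadcasts the new parameters
    w -= 0.05 * np.mean(grads, axis=0)
print(np.round(w, 2))   # recovers roughly [ 2. -1.]
```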
New project • Challenges • Data allocation • Optimal policy for data distribution • Minimal latency • Maximum locality
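One very simple way to score the two objectives above is sketched below; the shard sizes, bandwidths, and worker names are invented, and a real policy would also balance load rather than always picking the fastest link.

```python
# Hypothetical scoring of the allocation objectives: keep a shard where it
# already lives (maximum locality), otherwise place it on the worker with the
# shortest transfer time (minimal latency). All numbers are made up.
shard_gb = {"shard0": 4.0, "shard1": 2.0, "shard2": 8.0}
bandwidth_gbps = {"w1": 1.0, "w2": 0.2}      # each worker's link to the data store
already_local = {"shard2": "w2"}             # shard2 already sits on w2

def place(shard):
    if shard in already_local:
        return already_local[shard], 0.0     # no transfer needed
    # transfer time in seconds = GB * 8 bits/byte / (Gbit/s)
    times = {w: shard_gb[shard] * 8 / b for w, b in bandwidth_gbps.items()}
    worker = min(times, key=times.get)
    return worker, times[worker]

for s in shard_gb:
    w, t = place(s)
    print(f"{s} -> {w} (transfer ~{t:.0f} s)")
```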
New project • Challenges • Computation allocation • Optimal policy for learning • Compute load proportional to compute capacity • Reallocation of data/task as required
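A minimal sketch of compute-proportional allocation, assuming each worker's throughput (examples per second) has already been measured; the worker names and figures are made up.

```python
# Illustration only: split a training set across workers in proportion to
# measured throughput, so each worker finishes its share in roughly the same
# wall-clock time.
throughput = {"gpu_box": 5000, "fast_cpu": 1200, "laptop": 300}   # examples/sec
n_examples = 1_000_000

total_rate = sum(throughput.values())
allocation = {w: round(n_examples * rate / total_rate) for w, rate in throughput.items()}
print(allocation)   # every worker needs about n_examples / total_rate seconds
```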
New project • Challenges • Parameter allocation • Do we have to distribute all parameters? • Can learning be local?
New project • Challenges • Trustable updates • Different processors/LANs have different speeds • How do we trust their updates? • Do we incorporate or reject?
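One possible (entirely hypothetical) acceptance policy for the incorporate-or-reject question: reject updates computed on parameters that are too many versions old, and clip updates with anomalous norms before applying the rest. The thresholds below are arbitrary.

```python
import numpy as np

MAX_STALE = 10     # reject updates computed on parameters this many versions old
CLIP_NORM = 5.0    # clip updates whose norm is suspiciously large

def accept_update(grad, version_seen, current_version):
    staleness = current_version - version_seen
    if staleness > MAX_STALE:            # too old to trust: reject outright
        return None
    norm = np.linalg.norm(grad)
    if norm > CLIP_NORM:                 # suspiciously large: scale it down
        grad = grad * (CLIP_NORM / norm)
    return grad

print(accept_update(np.array([10.0, 0.0]), version_seen=95, current_version=100))  # clipped to [5. 0.]
print(accept_update(np.array([1.0, 0.0]), version_seen=50, current_version=100))   # None (rejected)
```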
New project • Optimal resynchronization: how much do we transmit? • Should not have to retransmit everything • Entropy coding? • Bit-level optimization?
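Entropy coding and bit-level optimization are left as open questions in the slide; as a much simpler illustration of "not retransmitting everything", the sketch below sends only the (index, value) pairs whose change exceeds a tolerance since the last synchronization. The tolerance is arbitrary.

```python
import numpy as np

def sparse_delta(old, new, tol=1e-3):
    # transmit only the entries that moved by more than tol since the last sync
    idx = np.flatnonzero(np.abs(new - old) > tol)
    return idx, new[idx]                 # this is all that goes over the wire

def apply_delta(params, idx, values):
    out = params.copy()
    out[idx] = values
    return out

old = np.zeros(1_000_000)
new = old.copy()
new[[7, 42, 99_999]] = [0.5, -0.2, 1.0]          # only a few parameters moved
idx, vals = sparse_delta(old, new)
print(len(idx), "of", len(old), "parameters transmitted")   # 3 of 1000000
assert np.array_equal(apply_delta(old, idx, vals), new)
```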
Possibilities • Massively parallel learning • Never-ending learning • Multimodal learning • GAIA..
Asking for Volunteers • Will be an open source project • Write to Anders
Today • Bain's theory: Lars Mahler • Linguist, mathematician, philosopher • One of the earliest people to propose a connectionist architecture • Anticipated many modern ideas • McCulloch and Pitts: Kartik Goyal • Early model of the neuron: threshold gates • Earliest model to consider excitation and inhibition