Learn about the reverse engineering process in developing brain-like software architecture and the challenges faced in disentangling complex sensory input. Explore how modules can work together to solve perceptual problems.
Brain-like software architecture
Confessions of an ex-neuroscientist
Bill Softky
Which comes first: the problem or the solution?
• Reverse engineering starts with the hardware and works backward
• It usually succeeds only if the problem is already understood
• “Forward” software engineering starts with the problem and saves the hardware for last
“Reverse” engineering
From an engineering perspective, this is nuts!
Initial goals here
• Input: we need a generic description of sensory input (at least audio and visual)
• Processing: speculate on a generic, modular processing “API” which can untangle those correlations
• No neurons, synapses, spikes… yet.
[Figure: simple “truth” → tangled inputs]
Hypothesis: each entangling transformation is fairly simple
[Figure: stepwise decorrelation — tangled inputs → untangled “truth”]
Hypothesis: a sequence of similar compressions will yield a useful representation
First toy problem: cocktail party with echoes
• Multiple independent speakers
• Multiple “ears” (mics)
• Multiple echoes/amplitudes for each speaker/mic combination
• Echo patterns constrained (3-D) and unchanging
Goal: remove the echoes and separate the speakers (our brains can do this). A toy simulation of the forward, entangling half of this setup follows the next figure.
Echo kernels = location info
[Figure: two 10 kHz “pure signals” S1, S2 pass through static echo kernels (transfer functions, “maps,” one per speaker/mic pair, fixed by the static (x, y, z) geometry) and sum into three 10 kHz “entangled signals”: M_i(t) = Σ_j (K_ij ∗ S_j)(t)]
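A minimal numpy sketch of this forward (entangling) model: the 10 kHz rate and the 2-speaker/3-mic counts come from the figure, while the white-noise "speech" and the random delays and gains are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 10_000                       # 10 kHz, as in the figure
n = fs                            # one second of signal
n_speakers, n_mics = 2, 3         # 2 pure signals -> 3 entangled mics

# "Pure" speaker signals (white noise as a stand-in for speech).
S = rng.standard_normal((n_speakers, n))

# Static echo kernels: a few delayed, attenuated copies per speaker/mic
# pair, fixed by the unchanging 3-D room geometry.
L = 400                           # kernel length in samples (assumed)
K = np.zeros((n_mics, n_speakers, L))
for i in range(n_mics):
    for j in range(n_speakers):
        delays = rng.integers(0, L, size=3)
        gains = rng.uniform(0.2, 1.0, size=3)
        np.add.at(K[i, j], delays, gains)

# Entangled mic signals: M_i = sum_j K_ij * S_j (convolution).
M = np.zeros((n_mics, n + L - 1))
for i in range(n_mics):
    for j in range(n_speakers):
        M[i] += np.convolve(K[i, j], S[j])
```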
Second toy problem: video
• Moving “objects” (simple shapes)
• Constant velocity
• The spatiotemporal pixel pattern is just echoes of the signal at t=0 at the center — a toy version in code follows the figure below.
Echo kernels = location/shape/velocity
[Figure: a 1 kHz “time at center” pure signal drives 100 × 1 kHz spatiotemporal pixel responses (“entangled signals”) through echo kernels at pixel offsets (0,0), (0,1), …, (4,4); the kernels are parameterized by a semi-static {v, f, D}]
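The same forward model for the video toy, under my own simplifying assumptions (a point-like object with Gaussian falloff off its trajectory); only the 1 kHz rate, the 100 pixel channels, and the constant velocity come from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 1_000                         # 1 kHz frame rate, per the figure
n = fs
side = 10                          # 10 x 10 = 100 pixel channels

# "Pure" signal: the object's appearance as a function of time at center.
s = rng.standard_normal(n)

v = np.array([3.0, 1.0])           # constant velocity, pixels/second
vhat = v / np.linalg.norm(v)

# Each pixel is a delayed, attenuated "echo" of the center signal: the
# object arrives later the farther a pixel sits along the trajectory,
# and more faintly the farther it sits off the trajectory.
P = np.zeros((side * side, n))
for x in range(side):
    for y in range(side):
        r = np.array([x, y], dtype=float) - side / 2
        along = r @ vhat                        # distance along the path
        perp = r - along * vhat                 # offset from the path
        delay = int(round(along / np.linalg.norm(v) * fs))
        gain = np.exp(-(perp @ perp))
        k = x * side + y
        if 0 <= delay < n:
            P[k, delay:] = gain * s[:n - delay]
        elif -n < delay < 0:
            P[k, :n + delay] = gain * s[-delay:]
```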
Generic entanglement
[Figure: very few independent pure signals → echo kernels → many output channels]
• Many entangled, correlated, high-BW signals as inputs
• Echo kernels in a low-dim subspace give persistent structure
• Very few independent pure signals to track
(The generic form is written out below.)
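Both toy problems instantiate one generic form; in my notation (not the slides’), with $I$ observed channels and $J \ll I$ pure signals:

```latex
M_i(t) \;=\; \sum_{j=1}^{J} \bigl(K_{\theta_{ij}} * S_j\bigr)(t),
\qquad \theta_{ij} \in \mathbb{R}^d,\ d \text{ small}
% e.g. \theta = (x, y, z) for the cocktail party, \{v, f, D\} for video;
% the \theta are near-static, which is what gives the persistent structure.
```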
Recap: echo-entanglement as a generic perceptual problem
• Very similar to early vision
• Just like audio echo-removal
• Structured “echoes” carry near-static info
• Associative memory and vector quantization are special cases
How to disentangle?
• We want to reveal the original signals and structures
• The problem is hard (unsolved!)
So…
• Skip the mere algorithms
• Skip the neurons and biology
• Focus on a module’s inputs & outputs
• Try to make modules work together
What would one disentangling module do?
• Note the separate timescales:
 • Many channels of high-BW input
 • 1-3 independent channels of medium-BW output (time-blurred)
 • Many channels of near-static output & input
• Learn the correlations (echoes) in the input
• Find a low-dim subspace for the echoes (e.g. {x, y, z} or {v, f, D})
• Reconstruct the inputs all at once (in batches)
• Minimize reconstruction error
(Assume at most one pure signal at a time during learning.)
A hypothetical code rendering of this contract follows the figure below.
Basic disentangling module
[Figure: e.g. for cocktail-party decorrelation. Float inputs (“mics,” fine timescale, T = −500…+100 around “now”) feed a decorrelation & vector-quantization stage; a reconstruction & prediction stage closes the loop; float outputs are the pure signal plus near-static (x, y, z), on a coarse timescale (T = −500…+100 around “now”)]
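A hypothetical Python rendering of that interface contract. The names and shapes are mine, and the bodies are deliberately unimplemented, since the slides explicitly skip the algorithm.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ModuleOutput:
    pure_signal: np.ndarray    # 1-3 channels, medium-BW (time-blurred)
    kernel_params: np.ndarray  # near-static, e.g. (x, y, z) or {v, f, D}

class DisentanglingModule:
    """Inputs & outputs only -- no commitment to neurons or algorithms."""

    def observe(self, batch: np.ndarray) -> ModuleOutput:
        """Many high-BW input channels in, few pure signals out.
        batch: (n_channels, n_samples) covering T = -500..+100."""
        raise NotImplementedError  # the hard, unsolved part

    def reconstruct(self, out: ModuleOutput) -> np.ndarray:
        """Re-entangle the estimate; the residual against the real batch
        is what learning minimizes."""
        raise NotImplementedError
```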
Add multiple, independent outputs
• Multiple speakers/objects → multiple outputs
• Each output represents one object (max 3)
• Output streams and mappings are independent
• An even harder disentangling task
• (complications too!…)
(A sketch of the multi-output signature follows the figure.)
Module with multiple outputs
[Figure: one module with three independent output streams — (x1, y1, z1) + Speaker 1, (x2, y2, z2) + Speaker 2, (x3, y3, z3) + Speaker 3]
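In the same hypothetical notation, the multi-output version just returns a list, capped at three independent estimates; how many entries it should return is itself one of the open problems listed later.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ObjectEstimate:
    pure_signal: np.ndarray    # one object's medium-BW time series
    kernel_params: np.ndarray  # its own near-static map, e.g. (x_i, y_i, z_i)

MAX_OBJECTS = 3  # "max 3" per the slide

def observe_multi(batch: np.ndarray) -> list[ObjectEstimate]:
    """One independent (stream, mapping) pair per tracked object."""
    raise NotImplementedError
```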
Add confidence estimates (sigmas)
• Disentangling is already a statistical-estimation task
• Confidence estimates come for free during reconstruction
• Propagate the inputs’ sigmas forward
• Create output sigmas from the input sigmas and the reconstruction error
(One hypothetical combination rule is sketched after the figure.)
Module with sigmas
[Figure: the module of the previous slides with a sigma (σ) attached to each output stream]
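One hypothetical combination rule, assuming independent Gaussian errors so that variances add; the slide only says outputs should depend on the input sigmas and the reconstruction error.

```python
import numpy as np

def output_sigma(input_sigmas: np.ndarray, residual: np.ndarray) -> float:
    """Blend propagated input uncertainty with reconstruction error.
    input_sigmas: per-channel sigmas handed in with the input batch.
    residual: input batch minus the module's reconstruction of it."""
    propagated = float(np.mean(input_sigmas ** 2))   # forward-propagated
    recon = float(np.mean(residual ** 2))            # how badly we fit
    return float(np.sqrt(propagated + recon))
```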
Add layers
• Pure-signal outputs become inputs to the next layer
• Many modules below feed each module above
• Maybe modules below can feed more than one module above
• The whole upper layer uses a longer, coarser timescale
• Stackable indefinitely
• Top layers have a huge input range, long memory, and broad abstractions
(A toy stacking sketch follows the figure.)
Modules in layers
[Figure: stacked modules; the lower layer works on a fine window (T = −500…+100), the upper layer on a longer, coarser one (T = −1000…+200)]
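A toy stacking sketch, treating each module as a callable and using a hypothetical contiguous-slice routing for the many-below-feed-one-above wiring; the coarsening of timescales between layers is noted but not modeled.

```python
import numpy as np
from typing import Callable, Sequence

# A module maps a (channels, time) batch to a few pure-signal channels.
Module = Callable[[np.ndarray], np.ndarray]

def run_layers(layers: Sequence[Sequence[Module]], x: np.ndarray) -> np.ndarray:
    """Each layer's pure signals become the next layer's inputs; upper
    layers would also work on longer, coarser windows (not modeled here)."""
    for layer in layers:
        per = max(1, x.shape[0] // len(layer))
        outs = [m(x[k * per:(k + 1) * per]) for k, m in enumerate(layer)]
        x = np.concatenate(outs, axis=0)
    return x

# Placeholder "module": average its channels down to one pure signal.
toy = lambda chunk: chunk.mean(axis=0, keepdims=True)
y = run_layers([[toy] * 4, [toy] * 2, [toy]], np.random.randn(16, 600))
```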
Add feedback
• Upper-layer reconstructions provide estimates to lower modules (might help, can’t hurt)
• Near-static channels provide cheap “prediction” of input interrelations
• Update all estimates frequently
• Predicted pure signals could help reconstruction below
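One way to make top-down estimates "help but not hurt" is precision-weighted fusion — my assumption, not the slides': weight each estimate by its confidence, so an uncertain prediction from above is effectively ignored.

```python
import numpy as np

def fuse(bottom_up: np.ndarray, top_down: np.ndarray,
         sigma_bu: float, sigma_td: float) -> np.ndarray:
    """Precision-weighted blend of a module's own estimate with the
    reconstruction handed down from the layer above (hypothetical rule)."""
    w = sigma_td**2 / (sigma_bu**2 + sigma_td**2)  # trust the smaller sigma
    return w * bottom_up + (1.0 - w) * top_down
```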
Open problems
• How to do the decompression?
 • Iterative? Monte Carlo? Low-dim subspace?
• Multiple objects/pure signals:
 • Deciding how many objects a module should report
 • The “binding” problem across modules: which goes with which?
 • Layers 2-N need “clones,” one clone per extra object
Summary: generic sensory model
• Assume inputs result from cascading a simple entangling transformation
• The entangling transformation is the cocktail party with echoes
Summary: stackable disentangling modules
• Assume one layer of disentangling can be learned and done somehow
• Separate the time-series from the static echo-kernel structure
• Disentangle the time-series in batches
• Use reconstructions for error-checking and feedback
• Propose an “API” by which such modules can interact to solve multi-scale, multi-sensory problems