Information-Theoretic Listening Paris Smaragdis Machine Listening Group MIT Media Lab
Outline • Defining a global goal for computational audition • Example 1: Developing a representation • Example 2: Developing grouping functions • Conclusions
Auditory Goals • Goals of computational audition are all over the place; should they be? • Lack of formal rigor in most theories • Computational listening often reduces to fitting psychoacoustic experiment data
Auditory Development • What really made audition? • How did our hearing evolve? • How did our environment shape our hearing? • Can we evolve, rather than instruct, a machine to listen?
Goals of our Sensory System • Distinguish independent events • Object formation • Gestalt grouping • Minimize thinking and effort • Perceive as few objects as possible • Think as little as possible
Entropy Minimization as a Sensory Goal • Long history linking entropy and perception • Barlow, Attneave, Atick, Redlich, etc ... • Entropy can measure statistical dependencies • Entropy can measure economy • in both ‘thought’ (algorithmic entropy) • and ‘information’ (Shannon entropy)
What is Entropy? • Shannon Entropy: H(X) = −Σi P(xi) log P(xi) • A measure of: • Order • Predictability • Information • Correlations • Simplicity • Stability • Redundancy • ... • High entropy = Little order • Low entropy = Lots of order
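As an illustration (not from the talk), the Shannon entropy of a discrete distribution can be computed in a few lines; the distributions below are made up to show the ordered/disordered contrast:

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy H = -sum(p * log2 p) in bits.
    Zero-probability bins contribute nothing, so they are dropped."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# A peaked (ordered, predictable) distribution has low entropy...
peaked = [0.97, 0.01, 0.01, 0.01]
# ...while a uniform (disordered) one over 4 outcomes has the maximum, 2 bits.
uniform = [0.25, 0.25, 0.25, 0.25]

print(shannon_entropy(peaked))   # low
print(shannon_entropy(uniform))  # 2.0
```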
Representation in Audition • Frequency decompositions • Cochlear hint • Easier to look at data! • Sinusoidal bases • Signal processing framework
Evolving a Representation • Develop a basis decomposition • Bases should be statistically independent • Satisfies the minimal-entropy idea • Decomposition should be data driven • Accounts for different domains
Method • Use short segments of natural sounds to derive bases • Analyze these segments with ICA (Independent Component Analysis)
Results • We obtain sinusoidal bases! • Transform is driven by the environment • Uniform procedure for different domains
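The method and result above might be sketched as follows. This is a hedged reconstruction: the talk applies ICA to segments of natural recordings, while this toy substitutes PCA on a synthetic signal (PCA needs no external library, also yields data-driven decorrelated bases, and for quasi-stationary sound its bases likewise come out oscillatory):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a natural sound: a few sinusoids plus noise.
t = np.arange(40_000)
sound = (np.sin(2 * np.pi * 0.05 * t)
         + 0.5 * np.sin(2 * np.pi * 0.13 * t)
         + 0.1 * rng.standard_normal(t.size))

# Slice the sound into short windows (the "bits of natural sounds").
win = 64
segments = sound[: t.size - t.size % win].reshape(-1, win)
segments = segments - segments.mean(axis=0)

# PCA stand-in for ICA: eigenvectors of the segment covariance are the bases.
# For stationary input the covariance is ~Toeplitz, so the leading
# eigenvectors are approximately sinusoidal -- echoing the talk's result.
cov = segments.T @ segments / len(segments)
eigvals, eigvecs = np.linalg.eigh(cov)
bases = eigvecs[:, ::-1].T   # rows = learned bases, strongest first

print(bases.shape)           # one basis per window sample: (64, 64)
```

The key point of the sketch is that the transform is driven entirely by the data: nothing sinusoidal is built in, yet oscillatory bases emerge.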
Auditory Grouping • Heuristics (good continuation, common AM, common FM) • Hard to implement on computers • Require even more heuristics to resolve ambiguity • Weak definitions • Bootstrapped to individual domains • (figure: vision Gestalt vs. auditory Gestalt examples)
Method • Goal: Find the grouping that minimizes scene entropy • Pipeline: Parameterized Auditory Scene s(t,n) → Density Estimation Ps(i) → Shannon Entropy Calculation
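A minimal sketch of that pipeline, assuming a histogram density estimator (the talk names the stages but not the estimator, so the histogram is my choice). A structured scene should cost less than an unstructured one:

```python
import numpy as np

def scene_entropy(scene, bins=32):
    """Pipeline stages: scene samples -> density estimate (histogram,
    a simple stand-in) -> Shannon entropy of that density, in bits."""
    hist, _ = np.histogram(scene, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

t = np.linspace(0, 1, 8000)
coherent = np.sin(2 * np.pi * 100 * t)       # one steady partial
rng = np.random.default_rng(1)
incoherent = rng.uniform(-1, 1, t.size)      # unstructured scene, same range

# The ordered scene yields a lower-entropy description than the noise.
print(scene_entropy(coherent), scene_entropy(incoherent))
```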
Common Modulation - Frequency • Entropy Measurement: (figure; n = 0.5) • Scene Description: (figure: two frequency tracks vs. time)
Common Modulation - Amplitude • Entropy Measurement: (figure; n = 0.5) • Scene Description: (figure: Sine 1 and Sine 2 amplitudes vs. time)
Common Modulation - Onset/Offset • Entropy Measurement: (figure; n = 0.5) • Scene Description: (figure: Sine 1 and Sine 2 amplitudes vs. time)
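One plausible reading of the entropy measurement behind the common-modulation examples (my reconstruction; the slides' exact formula was in the lost figures): estimate the joint density of two partials' amplitude envelopes and take its Shannon entropy. Shared modulation concentrates the joint density near a curve, lowering its entropy:

```python
import numpy as np

def joint_entropy(x, y, bins=16):
    """Shannon entropy (bits) of the joint histogram of two envelopes."""
    h, _, _ = np.histogram2d(x, y, bins=bins)
    p = h.ravel() / h.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 5000)
env = 0.5 + 0.5 * np.sin(2 * np.pi * 3 * t)   # shared amplitude envelope

common_a = env + 0.01 * rng.standard_normal(t.size)
common_b = env + 0.01 * rng.standard_normal(t.size)
unrelated = rng.uniform(0, 1, t.size)         # random envelope, for contrast

# Commonly modulated partials score a lower joint entropy than unrelated ones,
# so entropy minimization favors grouping them together.
print(joint_entropy(common_a, common_b), joint_entropy(common_a, unrelated))
```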
Similarity/Proximity - Harmonicity I • Entropy Measurement: (figure) • Scene Description: (figure: frequency tracks vs. time)
Similarity/Proximity - Harmonicity II • Entropy Measurement: (figure) • Scene Description: (figure: frequency tracks vs. time)
Simple Scene Analysis Example • Simple scene: • 5 Sinusoids • 2 Groups • Simulated Annealing Algorithm • Input: Raw sinusoids • Goal: Entropy minimization • Output: Expected grouping
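A toy version of this experiment, under stated assumptions: the talk does not specify the scene parameterization s(t,n) or the cost, so here the scene-entropy cost sums joint entropies of amplitude envelopes over within-group pairs, and a short simulated-annealing run searches the binary groupings:

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.linspace(0, 1, 4000)

# Five sinusoidal partials in two groups; each group shares an amplitude envelope.
env_a = 0.5 + 0.5 * np.sin(2 * np.pi * 2 * t)
env_b = 0.5 + 0.5 * np.sin(2 * np.pi * 5 * t + 0.7)
envelopes = [e + 0.01 * rng.standard_normal(t.size)
             for e in (env_a, env_a, env_a, env_b, env_b)]

def joint_entropy(x, y, bins=16):
    """Shannon entropy (bits) of the joint histogram of two envelopes."""
    h, _, _ = np.histogram2d(x, y, bins=bins)
    p = h.ravel() / h.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def cost(labels):
    """Scene-entropy proxy: sum joint entropies over within-group pairs.
    Common modulation concentrates the joint density, so the grouping that
    respects it should score lowest."""
    return sum(joint_entropy(envelopes[i], envelopes[j])
               for i in range(5) for j in range(i + 1, 5)
               if labels[i] == labels[j])

# Simulated annealing over binary group labels: input is the raw sinusoids'
# envelopes, the goal is entropy minimization.
labels = rng.integers(0, 2, size=5)
cur = cost(labels)
best, best_cost = labels.copy(), cur
temperature = 2.0
for _ in range(400):
    i = rng.integers(0, 5)
    labels[i] ^= 1                    # propose moving one partial
    new = cost(labels)
    if new < cur or rng.random() < np.exp((cur - new) / temperature):
        cur = new
        if cur < best_cost:
            best, best_cost = labels.copy(), cur
    else:
        labels[i] ^= 1                # reject: undo the flip
    temperature *= 0.99

groups = [sorted(i for i in range(5) if best[i] == g) for g in (0, 1)]
print(sorted(groups))                 # the entropy-minimizing grouping
```

With these envelopes the minimizer separates the three commonly modulated partials from the other two, matching the "expected grouping" of the slide.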
Important Notes • No definition of time • Developed a concept of frequency • No parameter estimation requirement • Operations on data, not parameters • No parameter setting!
Conclusions • Elegant and consistent formulation • No constraints on the data representation • Uniform across different domains (cross-modal!) • No parameter estimation • No parameter tuning! • Biological plausibility • Barlow et al ... • Insight into the development of perception
Future Work • Good Cost Function? • Joint entropy vs entropy of sums • Shannon entropy vs Kolmogorov complexity • Joint-statistics (cumulants, moments) • Incorporate time • Sounds have time dependencies I’m ignoring • Generalize to include perceptual functions
Teasers • Dissonance and Entropy • Pitch Detection • Instrument Recognition