1 ♫ Harmony in Motion ♫ Zohar Barzelay, Yoav Y. Schechner. Dept. Elect. Eng., Technion – Israel Institute of Technology. Ack: Einav Namer, Yael Waissman, ISF
2 ♫ “Harmony in Motion” ♫ Barzelay, Schechner. Violin-guitar: raw
3 ♫ “Harmony in Motion” ♫ Barzelay, Schechner. Violin: Detected and Recovered
4 ♫ “Harmony in Motion” ♫ Barzelay, Schechner. Guitar: Detected and Recovered
5 Video features: track all, then find the best. Barzelay & Schechner, Harmony in Motion
6 Finding an Audio-Visual Object (AVO)
7 Spatial matching: many “coincidences”. • Corresponding images? • Always: unmatched features • Good image match: many “coincidences” • Features: spatial edges
8 Audio-visual matching: • Feature-based • Feature = significant change in time: temporal edge • Maximize coincidences • No need to match everything
Spatial matching: • Feature-based • Feature = significant change in space: edge, corner • Maximize coincidences • No need to match everything
9 Feature-based Cross-Modal Matching
10 Feature-based Cross-Modal Matching [Plot: acceleration vs. time (frames)]
11 Feature-based Cross-Modal Matching [Plots: amplitude vs. t; binary (0/1) ‘Visual Onsets’ and ‘Audio Onsets’ vs. t]
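The slides plot a tracked feature's acceleration and a binary ‘Visual Onsets’ signal. A minimal sketch of one way to get from a trajectory to such onsets, assuming the track is a list of (x, y) positions per frame and that an onset is a local peak of acceleration magnitude above a threshold. The function names and the `threshold` parameter are illustrative, not taken from the slides.

```python
import math

def acceleration_magnitude(track):
    """Second finite difference of an (x, y) trajectory; 0.0 at the two end frames."""
    acc = [0.0] * len(track)
    for i in range(1, len(track) - 1):
        ax = track[i + 1][0] - 2 * track[i][0] + track[i - 1][0]
        ay = track[i + 1][1] - 2 * track[i][1] + track[i - 1][1]
        acc[i] = math.hypot(ax, ay)
    return acc

def visual_onsets(track, threshold):
    """Frames where the acceleration magnitude is a local peak above `threshold` (assumed rule)."""
    acc = acceleration_magnitude(track)
    return [i for i in range(1, len(acc) - 1)
            if acc[i] > threshold and acc[i] >= acc[i - 1] and acc[i] > acc[i + 1]]

# Toy track: at rest for 5 frames, constant velocity for 5 frames, at rest again.
track = [(0.0, 0.0)] * 5 + [(float(k), 0.0) for k in range(1, 6)] + [(5.0, 0.0)] * 5
onsets = visual_onsets(track, threshold=0.5)
```

In the toy track the only frames with nonzero acceleration are the one where motion starts and the one where it stops, so those are the frames reported.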
12 Audio-Visual Coincidences
13 Audio Pre-processing [Figure: waveform (amplitude vs. t) → Fourier transform F (energy vs. frequency) → spectrogram (frequency vs. t)]
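The pre-processing step turns the waveform into a spectrogram by Fourier-transforming short frames. A minimal standard-library sketch, assuming a Hann window and a naive DFT per frame. The frame length, hop and window are illustrative choices, since the slides do not state them.

```python
import cmath
import math

def spectrogram(signal, frame_len, hop):
    """Short-time energy spectrum: result[t][f] is the energy of bin f in frame t."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):  # non-negative frequencies only
            s = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len))
            row.append(abs(s) ** 2)
        frames.append(row)
    return frames

# Toy input: a pure tone that sits exactly on bin 4 of a 32-sample frame.
tone = [math.sin(2 * math.pi * 4 * n / 32) for n in range(128)]
spec = spectrogram(tone, frame_len=32, hop=16)
peak_bins = [max(range(len(row)), key=row.__getitem__) for row in spec]
```

A real implementation would use an FFT; the naive DFT keeps the sketch dependency-free.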
14 Audio Onsets: beginning of new sounds, i.e. a significant change in audio [Figure: temporal derivative of the spectrogram (frequency vs. t); detected audio onsets vs. t]
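Audio onsets are described as significant changes found through the temporal derivative of the spectrogram. A sketch under stated assumptions: the positive part of the frame-to-frame difference is summed over frequency, and local peaks above a threshold are kept. The summation, the peak rule and `threshold` are assumptions for illustration.

```python
def onset_strength(spec):
    """Per frame: sum over frequency of the positive temporal derivative of spec[t][f]."""
    strength = [0.0]  # no previous frame for t = 0
    for prev, cur in zip(spec, spec[1:]):
        strength.append(sum(max(c - p, 0.0) for p, c in zip(prev, cur)))
    return strength

def audio_onsets(spec, threshold):
    """Frames whose onset strength is a local peak above `threshold` (assumed rule)."""
    s = onset_strength(spec)
    return [t for t in range(1, len(s))
            if s[t] > threshold and s[t] >= s[t - 1]
            and (t + 1 == len(s) or s[t] > s[t + 1])]

# Toy spectrogram (3 bins): a note starts in bin 1 at frame 2, another in bin 2 at frame 5,
# and the first note ends at frame 7. Only energy increases count as onsets.
toy_spec = [
    [0.0, 0.0, 0.0], [0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 1.0, 0.0],
    [0.0, 1.0, 1.0], [0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0],
]
onsets = audio_onsets(toy_spec, threshold=0.5)
```

Keeping only the positive part means a note ending does not register as an onset.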
15 Handling pitch-drift
16 Handling pitch-drift [Figure: spectrogram with directional derivative vs. spectrogram with non-directional derivative]
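The slides contrast a directional with a non-directional derivative but do not define either. One plausible reading, sketched here as an assumption: when a sustained tone drifts to a neighbouring frequency bin, a plain per-bin difference reports a false onset, whereas comparing each bin against the strongest of its neighbours in the previous frame does not. The `drift` parameter (allowed drift in bins) is hypothetical.

```python
def nondirectional_derivative(spec):
    """Positive per-bin difference between consecutive frames, summed over frequency."""
    return [sum(max(c - p, 0.0) for p, c in zip(prev, cur))
            for prev, cur in zip(spec, spec[1:])]

def directional_derivative(spec, drift=1):
    """Like the above, but each bin is compared with the maximum of the previous
    frame within +/- `drift` bins, so slow pitch drift is not counted as new energy."""
    out = []
    for prev, cur in zip(spec, spec[1:]):
        total = 0.0
        for f, c in enumerate(cur):
            lo, hi = max(f - drift, 0), min(f + drift, len(prev) - 1)
            total += max(c - max(prev[lo:hi + 1]), 0.0)
        out.append(total)
    return out

# Toy spectrogram: a tone drifts from bin 2 to bin 3 (frame 0 -> 1),
# then a genuinely new sound appears in bin 5 (frame 2).
drifting = [
    [0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
]
plain = nondirectional_derivative(drifting)
directed = directional_derivative(drifting)
```

In the toy case the plain derivative flags both transitions, while the drift-tolerant version flags only the new sound.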
17 Visual Matching [Plots: binary (0/1) onset signals vs. t]
18 Visual Matching [Plots: amplitude vs. t; binary (0/1) onset signals vs. t; the values −4 and −5 appear on the slide without recoverable labels]
19 Ranking Criterion: coincidences vs. inconsistencies [Plots: binary (0/1) onset signals vs. t]
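The ranking criterion weighs coincidences against inconsistencies. A minimal sketch, assuming onsets are given as times in seconds, that a visual onset coincides with audio when some audio onset lies within a tolerance `tol`, and that the score is simply coincidences minus inconsistencies. The exact weighting is not given on the slide, and the function names are illustrative. Unmatched audio onsets are not penalized, in line with “no need to match everything”.

```python
def count_coincidences(visual, audio, tol):
    """Number of visual onsets that have an audio onset within `tol` seconds."""
    return sum(1 for v in visual if any(abs(v - a) <= tol for a in audio))

def rank_score(visual, audio, tol):
    """Assumed criterion: coincidences minus inconsistencies (unmatched visual onsets)."""
    hits = count_coincidences(visual, audio, tol)
    return hits - (len(visual) - hits)

audio = [1.0, 2.0, 3.0, 4.5]
features = {
    "bow": [1.05, 2.1, 3.0],        # moves with the sound
    "background": [0.5, 2.6, 3.9],  # unrelated motion
}
scores = {name: rank_score(v, audio, tol=0.25) for name, v in features.items()}
best = max(scores, key=scores.get)
```

The feature with the highest score is the candidate audio-visual object.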
20 Residual Audio Onsets [Plots: binary (0/1) onset signals vs. t, marking coincidences and the residual onsets]
21 Sequential Object Detection [Plots: amplitude vs. t; residual onsets and binary (0/1) onset signals vs. t]
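Sequential detection can be read as a greedy loop: pick the best-matching visual feature, remove the audio onsets it explains, and repeat on the residual onsets. The sketch below assumes that reading, a coincidences-minus-inconsistencies score, and a hypothetical `min_score` stopping rule; none of these details are spelled out on the slides.

```python
def rank_score(visual, audio, tol):
    """Coincidences minus inconsistencies for one visual feature (assumed criterion)."""
    hits = sum(1 for v in visual if any(abs(v - a) <= tol for a in audio))
    return hits - (len(visual) - hits)

def detect_objects(features, audio, tol, min_score=1):
    """Greedy sequential detection; returns (detected feature names, residual audio onsets)."""
    residual = list(audio)
    remaining = dict(features)
    found = []
    while remaining and residual:
        name = max(remaining, key=lambda k: rank_score(remaining[k], residual, tol))
        if rank_score(remaining[name], residual, tol) < min_score:
            break  # nothing left that matches the residual onsets well enough
        visual = remaining.pop(name)
        found.append(name)
        # Keep only the audio onsets this feature does not explain.
        residual = [a for a in residual if not any(abs(v - a) <= tol for v in visual)]
    return found, residual

features = {
    "violin": [1.0, 2.0, 3.0],
    "guitar": [1.5, 2.5, 3.5, 4.5],
    "curtain": [0.2, 5.3],
}
audio = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.5]
found, residual = detect_objects(features, audio, tol=0.1)
```

Each pass works only on onsets that earlier objects left unexplained, so simultaneous sound sources are peeled off one at a time.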
22 ♫ “Harmony in Motion” ♫ Barzelay, Schechner. Speech: raw
23 ♫ “Harmony in Motion” ♫ Barzelay, Schechner. Speech A-B-C: Detected & Recovered
24 ♫ “Harmony in Motion” ♫ Barzelay, Schechner. Speech 1-2-3: Detected & Recovered
25 Audio Isolation
26 Audio Pre-processing [Figure: waveform (amplitude vs. t) → Fourier transform F (energy vs. frequency) → spectrogram (frequency vs. t)]
27 Audio Isolation [Figure: spectrogram (frequency vs. t) with the corresponding onsets marked along t]
27 Audio Isolation: Harmonic Sounds [Figure: spectrogram (frequency vs. t) with the corresponding onsets marked along t]
28 Fourier representation [Figure: waveform (amplitude vs. t) → Fourier transform F → phase and energy vs. frequency → spectrogram (frequency vs. t)]
29 Filtered audio [Figure: filtered spectrogram (energy vs. frequency) combined with the old phase → inverse Fourier transform F⁻¹ → filtered waveform (amplitude vs. t)]
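The filtered-audio slide combines a modified spectrogram with the old phase and inverts the Fourier transform. A minimal sketch of that idea, assuming non-overlapping rectangular frames, a naive DFT, and a caller-supplied binary mask `keep_bin(frame, bin)`. How the mask is derived from onsets and harmonics is not covered here, and all names are illustrative.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * m / n) for k in range(n)).real / n
            for m in range(n)]

def isolate(signal, frame_len, keep_bin):
    """Zero the amplitude of rejected time-frequency bins, keep the original phase, invert."""
    out = []
    for t, start in enumerate(range(0, len(signal) - frame_len + 1, frame_len)):
        X = dft(signal[start:start + frame_len])
        Y = []
        for k, c in enumerate(X):
            f = min(k, frame_len - k)  # fold negative frequencies onto their positive bin
            amplitude = abs(c) if keep_bin(t, f) else 0.0
            Y.append(cmath.rect(amplitude, cmath.phase(c)))  # new amplitude, old phase
        out.extend(idft(Y))
    return out

# Toy mixture: two tones on bins 2 and 5 of a 16-sample frame; keep only bin 2.
tone_a = [math.sin(2 * math.pi * 2 * n / 16) for n in range(32)]
tone_b = [0.5 * math.sin(2 * math.pi * 5 * n / 16) for n in range(32)]
mix = [a + b for a, b in zip(tone_a, tone_b)]
recovered = isolate(mix, frame_len=16, keep_bin=lambda t, f: f == 2)
max_error = max(abs(r - a) for r, a in zip(recovered, tone_a))
```

Because the two toy tones occupy different bins, masking recovers the first tone almost exactly; sounds that overlap in time and frequency would not separate this cleanly.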
30 Limitations: Temporal Tolerance (¼ sec) [Plots: binary (0/1) onset signals vs. t]
31 Limitations: Audio Sparsity. Overlapping audio onsets: • Sounds may overlap in time • Onsets should not [Figure: time-frequency overlap (frequency vs. t)]
32 Detection Parameters. Visual edges, feature detection: • edge scale • significance level • pruning [Plots: acceleration vs. time; binary (0/1) onsets vs. t]
33 ♫ “Harmony in Motion” ♫ Barzelay, Schechner. Dual Violin
34 ♫ “Harmony in Motion” ♫ Barzelay, Schechner
35 ♫ “Harmony in Motion” ♫ Barzelay, Schechner
36 Feature-based Cross-Modal Association • Features: Temporal Audio/Visual Edges. • Simultaneous Objects + Sounds. • A General Concept.