User Benefits of Non-Linear Time Compression Liwei He and Anoop Gupta Microsoft Research
Introduction • Time compression: key to browsing AV content • We focus on informational content • Audio time compression algorithms • Linear: speed up audio uniformly • Non-linear: exploit fine-grain structure of human speech (e.g. pauses, phonemes) • How much more do users gain from more complex algorithms?
Methodology • Conduct user listening test • One Linear TC algorithm • Two Non-linear TC algorithms • Simple: Pause-removal followed by Linear TC • Sophisticated: Adaptive TC • Compare objective and subjective measurements
Linear Time Compression • Classic algorithms • Overlap Add (OLA) and Synchronized OLA (SOLA) • We use SOLA
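To illustrate the idea behind SOLA, here is a minimal pure-Python sketch. It is not the implementation used in the study: the function name `sola`, the frame length, hop size, search range, and the linear cross-fade are all assumptions chosen for readability. Frames are read from the input at a fixed analysis hop, then overlap-added into the output near a smaller synthesis hop, with each frame shifted slightly to the position where it best correlates with the audio already written.

```python
import math


def sola(samples, speed, frame_len=400, synth_hop=200, search=50):
    """Time-compress a list of float samples by `speed` (2.0 = twice as fast).

    All parameter values are illustrative assumptions, not taken from the talk.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    if 2 * search >= frame_len - synth_hop:
        raise ValueError("search range too large for the chosen overlap")

    ana_hop = max(1, round(synth_hop * speed))  # input spacing between frames
    out = list(samples[:frame_len])             # first frame is copied as-is
    m = 1
    while m * ana_hop + frame_len <= len(samples):
        frame = samples[m * ana_hop : m * ana_hop + frame_len]
        nominal = m * synth_hop                 # unshifted output position

        # Search for the output position where the new frame lines up best
        # with what has already been synthesized (normalized correlation).
        best_start, best_score = nominal, -math.inf
        for k in range(-search, search + 1):
            start = nominal + k
            overlap = min(len(out) - start, frame_len)
            if start < 0 or overlap <= 0:
                continue
            a = out[start : start + overlap]
            b = frame[:overlap]
            dot = sum(p * q for p, q in zip(a, b))
            norm = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
            score = dot / norm if norm > 0 else 0.0
            if score > best_score:
                best_start, best_score = start, score

        # Cross-fade over the overlapping region, then append the remainder.
        overlap = min(len(out) - best_start, frame_len)
        for i in range(overlap):
            w = (i + 1) / (overlap + 1)
            out[best_start + i] = (1 - w) * out[best_start + i] + w * frame[i]
        out.extend(frame[overlap:])
        m += 1
    return out
```

Setting `search=0` removes the alignment step and reduces the sketch to plain OLA, which is the main difference between the two classic algorithms named on this slide.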
Non-Linear Time Compression • Algorithm 1: Pause removal plus TC • Detect pauses with energy and zero-crossing rate analysis • Leave pauses of up to 150 ms untouched • Shorten pauses longer than 150 ms to 150 ms • Apply the SOLA algorithm • Pause removal (PR) shortens speech by 10-25%
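The pause-removal stage can be sketched roughly as follows. This is an assumption-laden illustration, not the detector used in the study: the function names, the 10 ms frame size, and both thresholds are hypothetical, and the rule "low energy and low zero-crossing rate means pause" is one common way to combine the two features. Only the 150 ms cap comes from the slide.

```python
def frame_features(frame):
    """Return (mean energy, zero-crossing rate) for one frame of samples."""
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return energy, crossings / max(1, len(frame) - 1)


def remove_pauses(samples, sample_rate, max_pause_ms=150, frame_ms=10,
                  energy_thresh=1e-4, zcr_thresh=0.1):
    """Shorten every pause longer than `max_pause_ms` to `max_pause_ms`.

    The thresholds and frame size are hypothetical values for illustration.
    """
    frame_len = max(2, sample_rate * frame_ms // 1000)
    max_pause_frames = max_pause_ms // frame_ms
    out = []
    pause_run = 0  # number of consecutive pause frames seen so far
    for start in range(0, len(samples), frame_len):
        frame = samples[start : start + frame_len]
        energy, zcr = frame_features(frame)
        # Low energy alone could be an unvoiced consonant, which tends to
        # have a high zero-crossing rate, so both features must be low.
        is_pause = energy < energy_thresh and zcr < zcr_thresh
        pause_run = pause_run + 1 if is_pause else 0
        # Keep speech frames and the first 150 ms of each pause.
        if pause_run <= max_pause_frames:
            out.extend(frame)
    return out
```

The output of `remove_pauses` would then be passed to a linear time compressor such as SOLA to reach the desired overall speed.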
Non-Linear Time Compression (cont.) • Algorithm 2: Adaptive TC • Mimics people when talking fast • Pauses and silences are compressed the most • Stressed vowels are compressed the least • Consonants are compressed more than vowels • Consonants are compressed based on neighboring vowels
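The ordering described on this slide can be illustrated with a toy rate allocator. This is not the Mach1 algorithm: the segment labels, the `WEIGHTS` values, and the bisection approach are all hypothetical, and the dependence of consonant compression on neighboring vowels is not modeled. The sketch only shows how different sound classes could receive different local speeds while the clip as a whole still hits a target speed.

```python
# Hypothetical compressibility weights: a larger weight means the class is
# compressed more. Only their ordering reflects the slide.
WEIGHTS = {"pause": 4.0, "consonant": 2.0, "vowel": 1.0, "stressed_vowel": 0.5}


def local_speeds(segments, target_speed, weights=WEIGHTS):
    """Return a per-class speed factor so the clip plays at `target_speed`.

    `segments` is a list of (label, duration_in_seconds) pairs.
    Each class gets speed 1 + weight * alpha; alpha is found by bisection.
    """
    if target_speed < 1:
        raise ValueError("this sketch only handles speed-up")
    total = sum(d for _, d in segments)
    goal = total / target_speed

    def duration(alpha):
        return sum(d / (1 + weights[label] * alpha) for label, d in segments)

    lo, hi = 0.0, 1.0
    while duration(hi) > goal:      # grow the bracket until it contains alpha
        hi *= 2
    for _ in range(60):             # bisection on a monotone function
        mid = (lo + hi) / 2
        if duration(mid) > goal:
            lo = mid
        else:
            hi = mid
    return {label: 1 + w * hi for label, w in weights.items()}


segments = [("pause", 0.4), ("consonant", 0.1), ("stressed_vowel", 0.2),
            ("vowel", 0.1), ("consonant", 0.1), ("pause", 0.3)]
speeds = local_speeds(segments, 2.0)
```

With these inputs, pauses receive the highest local speed and stressed vowels the lowest, while the compressed segment durations sum to half the original length.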
System Implications • Computational complexity • Adaptive TC 10x more costly than Linear TC • Complexity in client-server implementation • Buffer management required for non-linear TC • Audio-video synchronization quality
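One reason synchronization is harder with non-linear TC is that the mapping from playback time to original media time is no longer a single constant factor. A possible way to handle this, sketched below under assumptions not stated in the talk, is for the audio compressor to emit a piecewise-linear time map that the video side consults. The function names and the `(duration, speed)` segment representation are hypothetical.

```python
from bisect import bisect_right


def build_time_map(segments):
    """Build breakpoints from (original_duration_s, local_speed) segments.

    Returns two parallel lists: playback times and original media times.
    """
    out_times, src_times = [0.0], [0.0]
    for duration, speed in segments:
        src_times.append(src_times[-1] + duration)
        out_times.append(out_times[-1] + duration / speed)
    return out_times, src_times


def source_time(out_t, out_times, src_times):
    """Map a playback time to the original media time by interpolation.

    Expects a map built from at least one segment.
    """
    i = bisect_right(out_times, out_t) - 1
    i = max(0, min(i, len(out_times) - 2))   # clamp to a valid segment
    span = out_times[i + 1] - out_times[i]
    frac = (out_t - out_times[i]) / span if span > 0 else 0.0
    frac = min(max(frac, 0.0), 1.0)
    return src_times[i] + frac * (src_times[i + 1] - src_times[i])
```

With linear TC the same lookup collapses to multiplying the playback time by the speed factor, which is why no such map or extra buffering is needed in that case.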
User Study Goals • Highest intelligible speed • Comprehension • Subjective preference • Sustainable speed
Experiment Method • 24 subjects • 4 tasks for each subject • 3 time compression algorithms • Linear TC using SOLA (Linear) • Pause removal plus Linear TC (PR-Lin) • Adaptive TC (Adapt) • Each test takes approximately 30 minutes
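For the PR-Lin condition, the two stages multiply. Assuming the quoted speeds (e.g. 1.5x) refer to the overall speed-up, which the slides do not state explicitly, the linear stage only has to supply what pause removal has not already achieved. If pause removal shortens a clip by a fraction r, the linear factor f needed for overall speed s satisfies s = f / (1 - r). The helper name below is hypothetical.

```python
def linear_factor_after_pause_removal(overall_speed, pause_fraction):
    """Linear TC factor needed after pause removal to reach `overall_speed`.

    `pause_fraction` is the fraction of the clip removed by pause removal
    (0.10 to 0.25 according to the earlier slide).
    """
    if not 0 <= pause_fraction < 1:
        raise ValueError("pause_fraction must be in [0, 1)")
    return overall_speed * (1 - pause_fraction)


for r in (0.10, 0.25):
    f = linear_factor_after_pause_removal(1.5, r)
    print(f"pause removal {r:.0%}: linear stage runs at about {f:.2f}x")
```

Under this assumption, a 1.5x PR-Lin clip applies only about 1.13x to 1.35x of linear compression to the speech itself, depending on how much pause time the clip contained.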
Highest Intelligible Speed Task • 3 clips from technical talks • Find the highest speed at which most of the words are understandable
Comprehension Task • 3 clips at 1.5x and 3 clips at 2.5x • Clips from TOEFL listening test • Answer 4 multiple choice questions
Subjective Preference Task • 3 pairs of clips at 1.5x • 3 pairs of clips at 2.5x • Each pair contains the same clip compressed with 2 of the 3 TC algorithms • Indicate preference on 3-point scale
Sustainable Speed Task • 3 clips, each 8 minutes long • Clips from a CD audio book • Find the maximum comfortable speed • Write a 4-5 sentence summary at the end
Highest Intelligible Speed Task • PR-Lin is significantly better than Adapt (p<.01)
Comprehension Task • Adapt is better than PR-Lin (p=.083) at 2.5x
Preference Task at 1.5x • Slight preference for PR-Lin (p=.093)
Preference Task at 2.5x • PR-Lin and Adapt do significantly better than Linear
Previous Work • Mach1 (Covell et al., ICASSP 98) • Comprehension and preference tasks • Comparing Linear and Mach1 (Adapt) at 2.6-4.2x • Comprehension scores 17% better with Mach1 • 95% prefer Mach1 to Linear • No data below 2.0x • Other work (Harrigan, Omoigui, Li, Foulke) • 1.2-1.7x is the sustainable listening speed
Conclusions • The trade-off among TC algorithms is task-related • Listening: Linear TC is sufficient • Fast forwarding: Non-linear TC is more suitable • Adaptive TC is close to the way people talk fast • The limit lies in human listening and comprehension