Progressive Perceptual Audio Rendering of Complex Scenes

Thomas Moeck (1,2) - Nicolas Bonneel (1) - Nicolas Tsingos (1) - George Drettakis (1) - Isabelle Viaud-Delmon (3) - David Alloza (4)
1- REVES/INRIA Sophia-Antipolis 2- Computer Graphics Group, University of Erlangen-Nuremberg 3- CNRS-UPMC UMR 7593 4- EdenGames

Objectives • Efficient audio rendering of very complex scenes with moving sources • Without audible impairment of the quality • Verify results by user tests

Previous Work • Rendering complex auditory scenes • Clustering [Tsingos et al. 2004]: replace many sources with a representative • Can still only handle ~200 sound sources (cost of the clustering itself) • Scalable audio processing • Importance-guided processing of a few frequency/time bins [Fouad et al. 1997, Wand & Straßer 2004, Gallo et al. 2005, Tsingos 2005] • Audio processing (e.g., HRTF, spatialization) is expensive • Crossmodal effects • Neuroscience literature: “Ventriloquism affects 3D audio perception” • The ventriloquism spatial window can vary from a few degrees up to 15 degrees • Few papers on ecological experiments

Methodology • Recursive approach to clustering • Reduce cost of clustering • Scalable perceptual premixing • Faster premixing without audible loss of quality • Taking perceptual and cross-modal information into account • Improve audio clustering algorithm • User experiments to detect improvement possibilities • Improving quality with results of tests • Validation of resulting algorithms

Overview of the algorithms • Masking of inaudible sources (with energy) • Clustering of remaining sources (recursive) • Progressive premixing within each cluster • Spatial audio processing (HRTF)
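The first stage, masking of inaudible sources, can be illustrated with a minimal sketch. It assumes a greedy rule that is not stated on the slide: sources are added loudest first, and the rest are dropped once their combined energy falls a fixed margin below the running mix. The function name `mask_inaudible` and the `margin_db` parameter are hypothetical.

```python
def mask_inaudible(energies, margin_db=20.0):
    """Return indices of sources kept as audible (greedy energy-masking sketch).

    Assumption: a source set is treated as masked when its summed energy is
    at least margin_db below the energy of the sources already kept.
    """
    # Visit sources from loudest to quietest.
    order = sorted(range(len(energies)), key=lambda i: energies[i], reverse=True)
    remaining = sum(energies)  # energy of sources not yet kept
    mix = 0.0                  # energy of sources kept so far
    kept = []
    for i in order:
        # Stop once everything left sits margin_db below the current mix.
        if mix > 0.0 and remaining <= mix * 10 ** (-margin_db / 10.0):
            break
        kept.append(i)
        mix += energies[i]
        remaining -= energies[i]
    return kept


if __name__ == "__main__":
    print(mask_inaudible([1.0, 0.5, 1e-6, 1e-7]))
```

Only the sources returned here would be passed on to clustering, premixing and HRTF processing.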

Our Work • Optimized recursive approach to clustering • Clustering performance evaluation • Improved scalable perceptual premixing • Quality evaluation study • Study of cross-modal effects by user experiments • Using the results of the cross-modal studies to develop an audio-visual clustering algorithm

Optimized Recursive Clustering • Recursive splitting of clusters • Fixed-budget approach • Using a fixed number of clusters • Variable-budget approach • Splitting clusters until break condition is reached • Break condition: Average angle error • Optimal number of clusters • Variant used by EdenGames • 8 cluster budget • Local clustering when necessary
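The fixed-budget and variable-budget variants can be sketched together as one loop that keeps splitting the cluster with the largest error. This is a sketch under assumptions, not the authors' implementation: sources are unit direction vectors from the listener, a cluster's representative is the energy-weighted mean direction, and the split rule (seed a new cluster at the source farthest from the representative) is a simple stand-in. All function and parameter names are hypothetical.

```python
import math


def angle(a, b):
    """Angle in radians between two unit vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return math.acos(max(-1.0, min(1.0, dot)))


def representative(dirs, energies, members):
    """Energy-weighted mean direction of a cluster (assumed representative)."""
    s = [sum(energies[i] * dirs[i][k] for i in members) for k in range(3)]
    n = math.sqrt(sum(c * c for c in s))
    return dirs[members[0]] if n == 0.0 else tuple(c / n for c in s)


def cluster_error(dirs, energies, members):
    """Energy-weighted sum of angular errors to the representative."""
    rep = representative(dirs, energies, members)
    return sum(energies[i] * angle(dirs[i], rep) for i in members)


def split(dirs, energies, members):
    """Split a cluster in two, seeding the new part at the farthest source."""
    rep = representative(dirs, energies, members)
    far = max(members, key=lambda i: angle(dirs[i], rep))
    near_rep, near_far = [], []
    for i in members:
        closer_to_far = angle(dirs[i], dirs[far]) < angle(dirs[i], rep)
        (near_far if closer_to_far else near_rep).append(i)
    return near_rep, near_far


def recursive_clustering(dirs, energies, max_clusters=8, max_avg_angle=None):
    """Fixed budget: max_avg_angle=None. Variable budget: stop early once the
    energy-weighted average angular error drops below max_avg_angle (radians)."""
    if not dirs:
        return []
    clusters = [list(range(len(dirs)))]
    total_energy = sum(energies) or 1.0
    while len(clusters) < max_clusters:
        errors = [cluster_error(dirs, energies, c) for c in clusters]
        if max_avg_angle is not None and sum(errors) / total_energy <= max_avg_angle:
            break  # break condition: average angle error is small enough
        worst = max(range(len(clusters)), key=errors.__getitem__)
        if errors[worst] == 0.0:
            break
        a, b = split(dirs, energies, clusters[worst])
        if not a or not b:
            break
        clusters[worst:worst + 1] = [a, b]
    return clusters
```

With `max_avg_angle=None` the loop always uses the full budget (for example the 8 clusters mentioned above), while a threshold lets simple scenes stop with fewer clusters.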

Eden Games’ implementation: Test Drive Unlimited

Clustering Performance Evaluation • Performance of the recursive algorithms is clearly better

Improved progressive scalable perceptual premixing (1) • After clustering: premixing in each cluster • Why? Effects can be applied afterwards at lower cost because there are fewer signals • Premix only the necessary data • Assign frequency bins to sound sources (iterative importance sampling) using a pinnacle value

Improved progressive scalable perceptual premixing (2) [Figure: clustering followed by premixing]

Improved progressive scalable perceptual premixing (3) • Iterative importance sampling • Calculation of an importance value from energy, loudness or an audio saliency map • Assignment of frequency bins proportional to importance, until the pinnacle value is reached • Reassignment of the remaining frequency bins to sounds relative to their importance values
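The allocation loop described on this slide can be sketched as follows. The sketch assumes that the pinnacle value acts as a per-source cap on the number of frequency bins, and that a fixed integer budget of bins is shared within one cluster for one frame; the name `allocate_bins` and its parameters are hypothetical.

```python
def allocate_bins(importance, budget, pinnacle):
    """Distribute `budget` frequency bins over sources by importance.

    Assumption: `pinnacle` is the maximum number of bins any one source may
    receive; bins a capped source cannot take are reassigned to the others.
    """
    alloc = [0] * len(importance)
    active = [i for i, w in enumerate(importance) if w > 0]
    # Never try to hand out more bins than the caps allow in total.
    remaining = min(budget, pinnacle * len(active))
    while remaining > 0 and active:
        total = sum(importance[i] for i in active)
        given = 0
        for i in active:
            share = int(remaining * importance[i] / total)  # proportional share
            take = min(share, pinnacle - alloc[i])          # respect the cap
            alloc[i] += take
            given += take
        if given == 0:
            # Every share rounded down to zero: hand out the last few bins
            # one at a time, most important source first.
            for i in sorted(active, key=lambda i: importance[i], reverse=True):
                if remaining == 0:
                    break
                alloc[i] += 1
                remaining -= 1
            break
        remaining -= given
        # Sources that reached the pinnacle value drop out of later rounds.
        active = [i for i in active if alloc[i] < pinnacle]
    return alloc


if __name__ == "__main__":
    print(allocate_bins([8, 1, 1], budget=10, pinnacle=5))
```

In this sketch a dominant source cannot absorb the whole budget, so quieter sources in the same cluster still receive some bins.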

Varying budget

Quality Evaluation Study (1) • MUSHRA (“Multiple Stimuli with Hidden Reference and Anchors”) test of perceptual premixing • 7 subjects, aged 23 to 40 • Ambient, music and speech • Various budgets (2% to 25%) • With and without pinnacle value • Using loudness or saliency as importance value

Quality Evaluation Study (2) • Results: • Approach is capable of generating high quality using 25% of the original data • Acceptable results with 10% (2% in case of speech) • Significant Effects: • Budget • Importance value • Pinnacle value

Study of Cross-Modal Influences – Questions • Do we need more or fewer clusters in the viewing frustum? • We move the spatial position of sound sources to the representative of their cluster • How tolerant are we to this error? • Do visuals influence the perceived quality?

Study of Cross-Modal Influences – Setup (1)

Study of Cross-Modal Influences – Setup (2)

Study of Cross-Modal Effects – Setup (3)

Uniform distribution [1/4]

[2/3] condition

[3/2] condition

[4/1] condition

Study of Cross-Modal Influences – Results • Statistical analysis of the results shows: • We need more clusters in the viewing frustum • No significant difference between the visuals and no-visuals conditions, but a cross-modal effect is possible

Modifying the algorithm • Introducing a weighting term in the clustering: increases the number of clusters in the viewing frustum
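One way such a weighting term could work is sketched below. It assumes the clustering splits the cluster with the largest error first, so scaling the error of sources inside the view raises the chance that in-view clusters are split. The view frustum is approximated by a cone around the view direction, and the weight value, the cone angle and all names are hypothetical, not taken from the slides.

```python
import math


def in_view_cone(direction, view_dir, half_angle_deg=45.0):
    """Rough frustum test: is a unit direction within a cone around view_dir?"""
    dot = sum(d * v for d, v in zip(direction, view_dir))
    return dot >= math.cos(math.radians(half_angle_deg))


def weighted_cluster_error(members, view_dir, frustum_weight=2.0,
                           half_angle_deg=45.0):
    """Sum of energy * angular error, scaled up for sources inside the view.

    members: list of (unit_direction, energy, angular_error_radians).
    """
    total = 0.0
    for direction, energy, err in members:
        inside = in_view_cone(direction, view_dir, half_angle_deg)
        weight = frustum_weight if inside else 1.0
        total += weight * energy * err
    return total


if __name__ == "__main__":
    view = (0.0, 0.0, -1.0)
    front = [((0.0, 0.0, -1.0), 1.0, 0.2)]
    back = [((0.0, 0.0, 1.0), 1.0, 0.2)]
    print(weighted_cluster_error(front, view), weighted_cluster_error(back, view))
```

With equal energy and angular error, the cluster in front of the viewer reports the larger error and would be refined before the one behind.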

Cross-Modal illustration

Video: Putting it all together

Conclusions • Up to nearly 3000 sound sources possible in good quality • Main limitation is graphics (!) • Better quality because of more clusters in the viewing frustum • Future work • Experiment with auditory saliency measurements • Handle procedurally synthesized sounds?

Questions?
