MPEG-4 Structured Audio

MPEG-4 Structured Audio. Eric D. Scheirer (eds@media.mit.edu), Machine Listening Group, MIT Media Laboratory; Editor, ISO 14496-3 (MPEG-4 Audio). Project Bar-B-Q 1999, Guadalupe River Ranch, 15 Oct 1999.

MPEG-4 Structured Audio, A New Standard for Interactive Sound, in the Creation of Which Tom White Did Not Run the Whole Show, but Only Played a Small (Though Valuable) Part

What’s this all about? • MPEG-4 is not just about compression • MPEG-4 shows one way for the IA world to move beyond wavetable synthesis

Overview • What is MPEG? • What is MPEG-4 Structured Audio? • Why was it created? • How does it work? • How can it be used in IA applications? • What is its current status? • A brief note on MPEG-4 AudioBIFS

Intellectual property in MPEG-4 • Structured Audio and AudioBIFS are free • All patentable IP has been released to public domain • No licensing or other costs to build tools & players • (Standard itself costs $300 for printing/bureaucracy) • SA and AudioBIFS are open standards • Companies competing through cooperation • Interoperability makes the whole pie bigger • MPEG processes for improving/correcting standard • MIT has no veto over the future of the standard

What is MPEG? • MPEG is ISO/IEC JTC1 SC29 WG11 • A working group of the International Organization for Standardization (ISO) and the IEC • The “Moving Picture Experts Group” • MPEG-1: 1993 (ISO 11172) • Digital audio/video coding (MP3) • MPEG-2: 1994-7 (ISO 13818) • Digital coding for broadcast • MPEG-4: 1998 (ISO 14496) • Object-based, synthetic/natural, interactive coding

MPEG Marketplace Model [Diagram, shown in three builds. Labels: MPEG Committee; MPEG Standard; Server-side tools makers; Client-side tools makers; Authoring tools; Playback tools; MPEG Content; Content developers; Content consumers. The second build adds the annotation “This talk”; the third adds “The business opportunities”.]

MPEG-4 Audio • High-quality sound • Based on MPEG-AAC algorithm: twice as good as MP3 • Low-bitrate sound • For WWW and cellular: speech/music as low as 4 kbps • Synthetic sound • Interface to Text-to-Speech synthesizers • High-quality audio synthesis with Structured Audio • AudioBIFS • Mix and postproduce multi-track sound streams

MPEG-4 Structured Audio • Transmit structured description of sound • Use real-time synthesis to play sound • “PostScript for audio” • Based on new (to MPEG) technology • SAOL: New music synthesis language • SASL: New music control format • A lot of related technology in academia • Csound, Music-11, SynthScript, Nyquist, CLM, ...

Standardization goals • Stated goals • Provide synthetic sound in MPEG-4 • Bring algorithmic synthesis to wider community • Standardize academic state-of-the-art; don’t innovate • Secret goals • Get new companies to work on synthesis • Implementation required for full MPEG-4 system • Set a higher bar for PC sound architecture • Drive forward the world of sound on PCs!

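MPEG-4 SA decoding process [Diagram: the bitstream header goes to the SAOL decoder, which configures the reconfigurable synthesis engine; the bitstream goes to the SASL/MIDI decoder, which supplies control parameters to the engine; the engine outputs samples as multichannel high-quality audio.]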
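The decoding flow on this slide can be illustrated with a toy model. The sketch below is only an analogy in Python, not the reference decoder: the "header" is assumed to be a plain dictionary of instrument functions (standing in for decoded SAOL), the "bitstream" is a list of hypothetical `ScoreEvent` records (standing in for SASL/MIDI events), and `SynthesisEngine`, `configure`, `render`, and `tone` are names invented for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class ScoreEvent:
    """Hypothetical stand-in for one SASL/MIDI control event."""
    start: float      # onset time in seconds
    instr: str        # name of the instrument to trigger
    dur: float        # note duration in seconds
    params: tuple     # extra parameters passed to the instrument


class SynthesisEngine:
    """Toy 'reconfigurable synthesis engine': instruments arrive at setup time."""

    def __init__(self, sample_rate=8000):
        self.sample_rate = sample_rate
        self.instruments = {}

    def configure(self, header):
        # Analogy for decoding the bitstream header: install the instruments.
        self.instruments = dict(header)

    def render(self, events, total_dur):
        # Analogy for decoding the bitstream: events trigger installed instruments.
        n = int(total_dur * self.sample_rate)
        out = [0.0] * n
        for ev in events:
            instr = self.instruments[ev.instr]
            first = int(ev.start * self.sample_rate)
            count = int(ev.dur * self.sample_rate)
            for i in range(count):
                if first + i < n:
                    out[first + i] += instr(i / self.sample_rate, *ev.params)
        return out


def tone(t, freq, amp):
    """A trivial instrument: a sine wave at the given frequency and amplitude."""
    return amp * math.sin(2.0 * math.pi * freq * t)


engine = SynthesisEngine(sample_rate=8000)
engine.configure({"tone": tone})
score = [
    ScoreEvent(0.0, "tone", 0.5, (440.0, 0.3)),
    ScoreEvent(0.25, "tone", 0.5, (660.0, 0.3)),
]
samples = engine.render(score, total_dur=1.0)
```

The point of the split is that the engine itself contains no fixed synthesis method; everything it can play is delivered in the configuration step, and the event stream only controls it.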

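What SAOL looks like • SAOL: Structured Audio Orchestra Language • A C-like language • Based on the Music-N model • Variables hold audio signals • Unit generators do basic functions • Instruments controlled by score or MIDI
instr beep(mp, vol) {
  asig wave;                          // audio-rate signal
  ksig env;                           // control-rate signal
  table sig(harm, 2048, 1, 1);        // 2048-point wavetable from the harm generator
  wave = oscil(sig, cpsmidi(mp));     // oscillator at the pitch of MIDI note mp
  env = kline(0, dur*0.05, vol, dur*0.6, vol, dur*0.35, 0);  // rise, hold, decay
  output(wave * env);                 // enveloped oscillator is the instrument output
}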
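For readers who do not know SAOL, the `beep` instrument can be approximated in Python. This is a rough sketch under stated assumptions, not a SAOL implementation: it assumes the `harm` table holds the first two harmonics at equal amplitude, it evaluates the envelope at every sample instead of at a separate control rate, and the helper names (`kline_env`, `beep`) and the 8000 Hz sample rate are chosen for illustration only.

```python
import math


def cpsmidi(note):
    """Convert a MIDI note number to frequency in Hz (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def kline_env(t, dur, vol):
    """Piecewise-linear envelope: rise over 5% of dur, hold for 60%, decay over 35%."""
    attack = 0.05 * dur
    hold = 0.60 * dur
    release = 0.35 * dur
    if t < attack:
        return vol * t / attack
    if t < attack + hold:
        return vol
    return max(0.0, vol * (1.0 - (t - attack - hold) / release))


def beep(note, vol, dur, sample_rate=8000):
    """Render one note: a two-harmonic oscillator multiplied by the envelope."""
    freq = cpsmidi(note)
    out = []
    for i in range(int(dur * sample_rate)):
        t = i / sample_rate
        # Assumption: fundamental plus second harmonic, both at amplitude 1.
        wave = math.sin(2.0 * math.pi * freq * t) + math.sin(4.0 * math.pi * freq * t)
        out.append(wave * kline_env(t, dur, vol))
    return out


samples = beep(note=69, vol=0.5, dur=1.0)
```

The SAOL version is shorter because the oscillator, the table generator, and the envelope are built-in unit generators, which is the Music-N idea the slide refers to.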

SAOL capabilities • Many nice features built in • Wavetable manipulation • FFT/IFFT • Multitap delay lines • Arrays of signals • FIR & IIR filters • Effects routing • Granular synthesis • 3-D audio interface • Dynamic layering and triggering • SAOL is extensible-from-within • (Allows encapsulation and structured programming) • Any kind of synthesis can be used in SAOL

Example • “Xanadu” (Joseph Kung) • 60 seconds long, 44.1 kHz stereo (10.5 MB as WAVE) • 2.2 KB in header • 4.2 KB in bitstream (= 0.07 KB/s, about 0.56 kbps) • No samples anywhere, only algorithmic synthesis • More than 1200:1 “compression”, no loss of quality • Could be controlled/restructured interactively
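The figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes 16-bit samples at 44.1 kHz and decimal kilobytes (1 KB = 1000 bytes); neither assumption is stated on the slide.

```python
SECONDS = 60
SAMPLE_RATE = 44100        # assumed: 44.1 kHz
CHANNELS = 2               # stereo
BYTES_PER_SAMPLE = 2       # assumed: 16-bit PCM

# Size of the uncompressed audio data in a WAVE file.
wave_bytes = SECONDS * SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE

header_bytes = 2200        # 2.2 KB header, decimal kilobytes assumed
stream_bytes = 4200        # 4.2 KB bitstream, decimal kilobytes assumed

# Ratio of raw audio size to the total structured-audio payload.
ratio = wave_bytes / (header_bytes + stream_bytes)

# Average rate of the streamed part, in kilobits per second.
stream_kbps = stream_bytes * 8 / SECONDS / 1000

print(f"WAVE size: {wave_bytes / 1e6:.2f} MB")
print(f"ratio: {ratio:.0f}:1")
print(f"bitstream rate: {stream_kbps:.2f} kbps")
```

Under these assumptions the WAVE data is about 10.58 MB and the ratio comes out near 1650:1, consistent with "more than 1200:1". The 4.2 KB bitstream spread over 60 seconds is 0.07 KB per second, which is roughly 0.56 kilobits per second.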

MPEG-MMA relationship • MIDI can control MPEG-4 SA synth • SASL = more flexible, more tightly coupled • DLS-2 synthesis embedded in SA synth • Do wavetable in series or parallel with other techniques • “Wavetable-only” profile of MPEG-4 • MIDI + DLS-2 + compressed audio + video (no SAOL) • Logical path of progression from today to tomorrow • Lots of help from MMA - appreciated! • MPEG is ready to help in the other direction (MIDI-DLA?)

Application ideas • MPEG-4 is not an application! • It’s a tool - enables functionality and interoperability • Implementations could be hardware, software, both • Authoring tools also very important • Use MPEG-4 SA like Staccato Synthcore • Use MPEG-4 SA like Beatnik • Use MPEG-4 SA like Koan • Use MPEG-4 SA for new music applications

Application example: Gaming [Diagram. Startup: MPEG-4 algorithm and sample editors; MPEG-4 algorithm marketplace; MPEG-4 synthesis/effects algorithms; Host program (game); MPEG-4 enabled sound card. Runtime: MPEG-4 & MIDI controls; output is multichannel, 3-D, post-processed sound.] • Not just music -- parametric sound effects as well • All audio programming and asset development in SAOL; no host-language audio programming needed • Host APIs (e.g. DirectMusic) can generate controls; the embedded MPEG-4 side can do this too, if useful

Current status • Standard and reference software finished • Many implementation projects starting • Creative Tech Center: Compression & Interactive Audio • Studer + EPFL: “ThreeDSpace” project • Hobbyist projects (Java API, ActiveX plugin) • Others: Be Inc., Sseyo, Kings College, UC Berkeley, Catholic U. Leuven, Q-Team DE, Nokia, ... • 3 complete implementations already! • A few authoring tools projects • Active mailing list for developers

A brief note on AudioBIFS • BIFS is scene-description part of MPEG-4 • “Binary Format for Scenes” • Based on VRML, but with many new features • AudioBIFS is the audio mixing part • Stream audio in multitrack format • Deliver mixdown instructions in AudioBIFS • Mixing, spatialization, effects in SAOL, multichannel • Terminal-adaptive capability • Candidate for “PC DSP architecture”?

AudioBIFS - scene graph model [Diagram, bottom to top: streaming compressed audio & synthesis controls feed a NaturalDecoder and a SyntheticDecoder, which decode into raw audio samples; AudioSource nodes inject the sound into the scene graph; AudioBIFS manipulation creates a sound object (mixing, filtering, reverb, etc.); a Sound node attaches the sound to the main scene (spatially positioned if desired).]
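The layering in this scene graph can be sketched as a small tree of objects. The Python below is a loose illustration, not the AudioBIFS node set or its binary format: `AudioSource` and `Sound` borrow their names from the slide, while `Mix`, the `render` method, the gain list, and the two constant-valued "decoders" are assumptions made up for the example.

```python
class AudioSource:
    """Leaf node: injects decoded samples into the scene graph."""

    def __init__(self, decoder):
        self.decoder = decoder          # callable returning n raw samples

    def render(self, n):
        return self.decoder(n)


class Mix:
    """Hypothetical manipulation node: weighted sum of its children."""

    def __init__(self, children, gains):
        self.children = children
        self.gains = gains

    def render(self, n):
        out = [0.0] * n
        for child, gain in zip(self.children, self.gains):
            for i, sample in enumerate(child.render(n)):
                out[i] += gain * sample
        return out


class Sound:
    """Top node: attaches the finished sound object to the main scene."""

    def __init__(self, source, position=None):
        self.source = source
        self.position = position        # optional (x, y, z); unused in this sketch

    def render(self, n):
        return self.source.render(n)


# Stand-ins for a natural (compressed audio) and a synthetic (SA) decoder.
def natural_decoder(n):
    return [1.0] * n


def synthetic_decoder(n):
    return [0.5] * n


scene_sound = Sound(
    Mix([AudioSource(natural_decoder), AudioSource(synthetic_decoder)], gains=[0.5, 1.0]),
    position=(0.0, 0.0, -1.0),
)
block = scene_sound.render(4)
```

The design point is that natural and synthetic streams look identical once they are raw samples, so the same mixing and effects nodes can sit above either kind of decoder.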

Summary • MPEG-4 Structured Audio • The international standard for algorithmic sound synthesis • MPEG-4 AudioBIFS • The international standard for audio postproduction • New market opportunities for • Hardware/software MPEG-4 players (embedded or not) • Authoring tools (editors, sequencers) • Advanced interactive audio content

What was this all about? • MPEG-4 is not just about compression • MPEG-4 shows one way for the IA world to move beyond wavetable synthesis

For more information • MPEG home page • http://www.cselt.it/mpeg • Requirements, future of MPEG • MPEG-4 SA home page • http://sound.media.mit.edu/mpeg4 • Draft standard, code, mailing lists, matchmaking • Contact • eds@media.mit.edu • Slides, technical papers, discussion available
