Creating Sound Effects and Sound Textures from Examples. An exercise in audio synthesis. Jim Parker, MinkHollow Media. ViPER. The Booze Cruise. Advergames /Educational Experiences. parker@minkhollow.ca. 403 932 5101 Office 403 962 1327 Cell.
What am I Playing? A friend of mine always begins his game talks with this question. I am currently playing PlanetSide 2, Sony's online multiplayer FPS.
Sound • Takes a lot of space • Download time • Can be expensive to get right • There can be a lot of different clips • Repetition: we tend to use the same clip for every instance, and the repetition kills. One pistol shot, one engine sound, etc. We have a better memory for such things than you might think. Sound familiar? (Audio examples: wolf, thunder)
The Problem. Given the space needed by audio and the space limitations of the available media (or bandwidth), how can we introduce more variety in the sounds we use? Can we reduce the cost associated with creating new effects? The answer is ‘YES’ … by re-using what you already have, and by creating new versions of sounds on the fly.
The Problem. • Sound texture: ambient sound, or sound that has a character but no particular event, no sharp changes in volume or frequency, and a consistent temporal appearance. • Sound effect: a sound representing an event, such as a collision or weapon. Has a changing temporal character with a beginning, middle, and end. • Music: possesses specific melodic and harmonic structure and a definite rhythm. Creating new sounds would be valuable for the creation of games, animations, web sites, and virtual/enhanced reality spaces.
How? We break up an existing sound into small pieces and re-order them. Sounds too simple. We call this method ‘particle audio’. It is related to granular synthesis, but it does not use synthetic granules, so it has a matching and fading problem. Here’s an example of granular synthesis, where small pieces (1-50 ms) are layered and played at various speeds and phases, with volume and frequency adjustments.
Generating Audio By Example We have in fact devised 3 distinct techniques for synthesizing sounds from samples. Based on previous work in computer graphics on texture synthesis, it is possible to create new sounds from examples, so that looping is not needed. All are descended from the same basic method.
Basic method • Read in and decode a sound file (e.g. .wav). • Break it into fixed-size blocks, each representing a small time interval. • Choose one to begin the new sound. • Find one (at random) whose starting portion matches the end of the first block, more or less. • Fade the initial block into the new one. • Repeat. We do this with software.
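The steps of the basic method can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the software referred to on the slide: the function names, the squared-difference edge match, the linear crossfade, and the `tolerance` parameter (one possible reading of "more or less") are all hypothetical, and the overlap is assumed to be at most half a block.

```python
import random

def split_blocks(samples, block_size):
    """Cut a list of samples into fixed-size blocks, dropping any short tail."""
    n = len(samples) // block_size
    return [samples[i * block_size:(i + 1) * block_size] for i in range(n)]

def edge_distance(a, b, overlap):
    """Sum of squared differences between the end of a and the start of b."""
    return sum((x - y) ** 2 for x, y in zip(a[-overlap:], b[:overlap]))

def crossfade(a, b, overlap):
    """Linearly fade the last `overlap` samples of a into the start of b."""
    out = a[:-overlap]
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)
        out.append((1 - w) * a[len(a) - overlap + i] + w * b[i])
    out.extend(b[overlap:])
    return out

def synthesize(samples, block_size, overlap, n_blocks, tolerance=0.1, rng=None):
    """Chain n_blocks blocks, each chosen at random among the good edge matches."""
    rng = rng or random.Random()
    blocks = split_blocks(samples, block_size)
    current = rng.choice(blocks)          # choose one block to begin the new sound
    output = list(current)
    for _ in range(n_blocks - 1):
        scored = [(edge_distance(current, b, overlap), i)
                  for i, b in enumerate(blocks)]
        best = min(score for score, _ in scored)
        # Assumed meaning of "more or less": anything close to the best match.
        candidates = [i for score, i in scored
                      if score <= best * (1 + tolerance) + 1e-12]
        current = blocks[rng.choice(candidates)]
        output = crossfade(output, current, overlap)
    return output
```

Each added block contributes `block_size - overlap` new samples, so the output can be made as long as needed without looping the source.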
Second Method – Gaussian Pyramids This algorithm uses a tree to represent the audio texture, in which each level represents a different level of detail. • The input signal is the bottom level. The next level up is found by convolving the input with a Gaussian kernel, essentially performing a low-pass filter. • The new level contains less data than the original, and is stored using half as many samples. • The process is repeated until the desired tree height is achieved, and the result will be referred to as the input pyramid.
Second Method – Gaussian Pyramids Was originally used in computer graphics for visual textures.
Second Method – Gaussian Pyramids • When the audio sample is loaded, a ‘Gaussian pyramid’ is built by convolving the signal with a Gaussian kernel. This is effectively a low-pass filter. • Each subsequent level is stored using only half as many data points as the previous one, and contains only the lower frequencies. • A five-sample Gaussian kernel is used for the convolution. (Gaussian = normal curve, or ‘bell curve’.)
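A minimal sketch of the pyramid construction follows. The slides do not give the kernel weights, so the five-tap binomial kernel below (a common approximation of a Gaussian) is an assumption, as are the function names and the choice to clamp indices at the signal edges.

```python
# Assumed five-sample kernel: binomial weights approximating a Gaussian.
KERNEL = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]

def smooth(signal):
    """Convolve the signal with KERNEL, clamping indices at both ends."""
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(KERNEL):
            j = min(max(i + k - 2, 0), n - 1)   # kernel is centered on sample i
            acc += w * signal[j]
        out.append(acc)
    return out

def build_pyramid(signal, height):
    """Level 0 is the input; each higher level is low-pass filtered
    and stored with half as many samples as the level below it."""
    levels = [list(signal)]
    for _ in range(height - 1):
        levels.append(smooth(levels[-1])[::2])
    return levels
```

For a 64-sample input and a height of 3, the levels hold 64, 32, and 16 samples.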
Analysis [Figure: pyramid with levels 3, 2, 1]
Synthesis [Figure: synthesis pyramid] • Synthesis begins at the top level of the pyramid, which is initialized to a random set of blocks. • Synthesis proceeds one level at a time, from the top down, with the rationale being that the lower frequencies are constructed first and the higher frequencies (detail) are worked out later. • We go from left to right, top to bottom, so that the next block to be selected would be the one marked with the blue *. • An L-shaped neighborhood of previously synthesized blocks is considered. An example of this neighborhood is outlined in green. The actual neighborhood size can be made arbitrarily large, and will influence the output texture.
Synthesis In selecting which block to use next in the synthesis, we consider all candidate blocks in the corresponding level of the input pyramid. In the diagram, the block marked with a blue ‘?’ is being considered as a candidate for the next block in synthesis. [Figure: neighborhood from synthesis pyramid; original input pyramid]
Synthesis The neighborhood blocks of the candidate are compared against the neighborhood from the synthesis pyramid using a distance measure: what’s the distance, i.e. how similar are they? The best candidate (the one whose neighborhood differs the least from that of the synthesis pyramid) is selected as the next block. [Figure: neighborhood from synthesis pyramid; candidate marked ‘?’]
Similarity • The best candidate (the one whose neighborhood differs the least from that of the synthesis pyramid) is selected as the next block. • So we need ways to tell how similar two sound blocks are to each other. 1. N-Vector Euclidean Distance • Considers each block of samples to be an n-vector. • The distance between two blocks is simply the Euclidean distance between the two n-vectors. • Not very accurate, as this measure is too sensitive to things like phase and inversion.
Similarity 2. RMS Energy • This method attempts to characterize the amplitude or volume of the audio signal. • A single number, the root mean square (RMS), is calculated to represent the energy of the signal. • The distance between two blocks is the difference between the RMS energies. • This method does not take into consideration any differences in the frequency composition of the audio data.
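Sorry – The Math
Distance between blocks $p = (p_0, p_1, \dots, p_{n-1})$ and $q = (q_0, q_1, \dots, q_{n-1})$: $d(p, q) = \sqrt{\sum_{i=0}^{n-1} (p_i - q_i)^2}$
RMS of a block: $\mathrm{RMS}(p) = \sqrt{\frac{1}{n} \sum_{i=0}^{n-1} p_i^2}$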
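The two similarity measures can be written directly from their definitions. The sketch below uses hypothetical function names and plain Python lists as blocks; it also shows why the Euclidean measure is sensitive to inversion while the RMS measure ignores it.

```python
import math

def euclidean_distance(p, q):
    """Treat two equal-length blocks as n-vectors and measure their distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def rms(block):
    """Root mean square of a block: a single number for its energy."""
    return math.sqrt(sum(x * x for x in block) / len(block))

def rms_distance(p, q):
    """Difference between the RMS energies of two blocks."""
    return abs(rms(p) - rms(q))

# An inverted copy of a block sounds the same, yet it is far away in the
# Euclidean sense and identical in the RMS sense.
block = [1.0, -1.0, 1.0, -1.0]
inverted = [-x for x in block]
```

Here `euclidean_distance(block, inverted)` is large while `rms_distance(block, inverted)` is zero, which matches the weaknesses listed for each measure.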
Computation • Since there is a predetermined number of blocks in the source texture, the distance between each pair of blocks can be computed prior to synthesis to speed up the synthesis step. • The distances can be pre-computed for each level in the pyramid according to the distance metric shown. • These distances can be stored in lookup tables such as the one shown.
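A sketch of the pre-computation, assuming each pyramid level has already been cut into equal-length blocks. The table layout (one symmetric square list per level) and the function names are illustrative assumptions; any distance function taking two blocks can be passed in.

```python
import math

def euclidean_distance(p, q):
    """Distance between two equal-length blocks treated as n-vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def build_distance_table(blocks, distance=euclidean_distance):
    """Compute the distance between every pair of blocks exactly once."""
    n = len(blocks)
    table = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = distance(blocks[i], blocks[j])
            table[i][j] = d
            table[j][i] = d      # the measure is symmetric, so mirror it
    return table

def build_pyramid_tables(level_blocks, distance=euclidean_distance):
    """One lookup table per pyramid level; level_blocks[k] lists level k's blocks."""
    return [build_distance_table(blocks, distance) for blocks in level_blocks]
```

During synthesis, comparing block `i` with block `j` at level `k` then becomes the lookup `tables[k][i][j]` instead of a pass over the samples.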
Sample Results [Five audio comparisons, one per slide: original sound vs. synthetic/Gaussian pyramid]
Non-Texture Sounds These methods work for some non-texture sounds too! • Coins falling into a machine • Bells • Animals • Bubbles
Sound Effects These are sounds that represent specific events (gun shot, car horn, footsteps, etc) Unlike textures, effects/events have a distinct beginning, middle, and end. The entire sound is contained in and partly defined by an envelope.
Envelope Extraction • Can use multiple example files. • The first step is to stretch or compress the audio files so that they are the same length. • Create an amalgam image that is the average of all sound files being used.
Envelope Extraction The envelope is the boundary of the signal region in the resulting amalgam image.
Envelope Extraction This envelope was created from a set of missile-launch sound effect inputs. The amalgam is broken into 11 regions and the root-mean-square (RMS) amplitude of each region is calculated. Line segments are then drawn between the enclosing points.
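The envelope extraction steps can be sketched as follows. Several details are assumptions, since the slides do not specify them: length matching is done by linear interpolation, the amalgam averages the rectified (absolute-value) signals, each RMS value is placed at the center of its region, and the envelope is held flat before the first and after the last point. All function names are hypothetical.

```python
import math

def resample(samples, length):
    """Stretch or compress a sound to `length` samples by linear interpolation."""
    if length == 1:
        return [samples[0]]
    scale = (len(samples) - 1) / (length - 1)
    out = []
    for i in range(length):
        pos = i * scale
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append((1 - frac) * samples[lo] + frac * samples[hi])
    return out

def amalgam(sounds, length):
    """Average the rectified inputs after bringing them to a common length."""
    resized = [resample(s, length) for s in sounds]
    return [sum(abs(r[i]) for r in resized) / len(resized) for i in range(length)]

def envelope_points(signal, regions=11):
    """Return one (center_index, rms_amplitude) point per region."""
    n = len(signal)                      # assumes n >= regions
    points = []
    for r in range(regions):
        start, end = r * n // regions, (r + 1) * n // regions
        chunk = signal[start:end]
        value = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        points.append(((start + end - 1) / 2, value))
    return points

def envelope(points, length):
    """Join the points with line segments to get one gain value per sample."""
    out = []
    for i in range(length):
        if i <= points[0][0]:
            out.append(points[0][1])
        elif i >= points[-1][0]:
            out.append(points[-1][1])
        else:
            for (x0, y0), (x1, y1) in zip(points, points[1:]):
                if x0 <= i <= x1:
                    out.append(y0 + (i - x0) / (x1 - x0) * (y1 - y0))
                    break
    return out
```

A synthesized effect could then be shaped by multiplying each output sample by the matching envelope value, although the slides do not say exactly how the envelope is applied.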
Locality In the creation of synthetic sound effects, the quilting method was used. When searching for the next sample to use, the search was restricted to an area (time interval) near the current position in the output stream. We decided to restrict the size of the window to 15% of the size of the input. [Figure: regions 1 to 5]
Locality (details) • Absolute positioning: we search for matches in each input within +/- 7.5% of the current index N. If an input sample has a total length of less than N samples, it is not considered in the synthesis of the selected sound. • Relative positioning: a second variant uses a progress ratio. If the output stream is supposed to be Lout samples long, and we have written out N, we search the region within +/- 7.5% of (N/Lout)*Lin, where Lin is the length of the input.
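The two positioning variants reduce to computing a search window in each input. The sketch below makes a few assumptions: the progress ratio is read as the fraction of the output already written (N divided by Lout), the 15% window is measured against the input length, and windows are clipped to the input bounds. Function and parameter names are hypothetical.

```python
def absolute_window(n, l_in, window_percent=15):
    """Window of input indices around the current output index n.

    Returns None when the input is shorter than n samples, meaning this
    input is skipped at the current position.
    """
    if l_in < n:
        return None
    half = l_in * window_percent // 200      # 7.5% of the input on each side
    return max(0, n - half), min(l_in, n + half)

def relative_window(n, l_out, l_in, window_percent=15):
    """Window around the input position matching the output's progress."""
    center = n * l_in // l_out               # progress ratio (n / l_out) scaled to the input
    half = l_in * window_percent // 200
    return max(0, center - half), min(l_in, center + half)
```

With a 1000-sample input and an output index of 500, absolute positioning searches indices 425 to 575. If the output is to be 2000 samples long, relative positioning searches 175 to 325 instead, since the output is only a quarter done.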
Effects - Results [Audio comparisons, original vs. synthetic] Original: Gun, Missile, Rev. Synthetic: Guns, Missile, Revs.
Conclusions and Summary • New sounds can be generated in slack CPU periods (ha ha). • After one sound is used, signal the creation of a new one. • Never generate new sounds from generated sounds (errors propagate).
Particle Audio … has other uses • Speed up/slow down music • Transmit sounds with high compression
Questions, Comments? Thanks for listening. Dr. Jim Parker, parker@minkhollow.ca