
Part 2

Part 2. Music Processing with MPEG-7 Low Level Audio Descriptors. Dr. Michael Casey, Centre for Computational Creativity, Department of Computing, City University, London.


Presentation Transcript


  1. Part 2: Music Processing with MPEG-7 Low Level Audio Descriptors. Dr. Michael Casey, Centre for Computational Creativity, Department of Computing, City University, London

  2. MPEG-7 Software Tools
     • ISO 15938-6 (Reference Software, C++): http://www.lis.ei.tum.de/research/bv/topics/mmdb/e_mpeg7.html
     • Audio Only Reference Software (Matlab): http://ccc.soi.city.ac.uk/mpeg7 (City University Mirror)

  3. Audio Descriptions Header

  4. Audio Descriptions Segments

  5. Audio Descriptions Descriptor

  6. Containment Hierarchy for Audio Descriptors [Diagram; type labels: AudioSegmentType, AudioDSType, AudioDType, AudioLLDScalarType, AudioLLDVectorType, SeriesOfScalarType, SeriesOfVectorType, ScalableSeriesType]

  7. Audio LLD DataTypes

  8. Some Useful Descriptors for Music Processing • AudioSpectrumEnvelopeD • AudioSpectrumBasisD • AudioSpectrumProjectionD • SoundModelDS • SoundModelStatePathD • SoundModelStateHistogramD

  9. Other Useful Descriptors for Music Processing • AudioSpectrumFlatnessD • AudioHarmonicityD • AudioSpectrumCentroidD

  10. AudioSpectrumEnvelopeD • Log frequency scale spectral power coefficients • Total power preserved across logarithmic bands
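
The two bullets above can be illustrated with a small Python sketch (plain standard library, not the MPEG-7 reference software). The function names `log_band_edges` and `pool_power` are hypothetical. The sketch assumes that one extra band collects everything below the low edge and another everything at or above the high edge, and it assigns each linear frequency bin wholly to one band; a real extractor may instead split bins that straddle a band edge.

```python
import math

def log_band_edges(lo_edge, hi_edge, octave_resolution):
    """Band edges spaced octave_resolution octaves apart, from lo_edge to hi_edge."""
    n_bands = round(math.log2(hi_edge / lo_edge) / octave_resolution)
    return [lo_edge * 2.0 ** (k * octave_resolution) for k in range(n_bands + 1)]

def pool_power(bin_freqs, bin_power, edges):
    """Sum linear-bin power into log bands (assumption: whole-bin assignment).

    Index 0 collects power below the first edge and the last index collects
    power at or above the final edge, so no power is discarded.
    """
    bands = [0.0] * (len(edges) + 1)
    for f, p in zip(bin_freqs, bin_power):
        idx = sum(1 for e in edges if f >= e)  # number of edges at or below f
        bands[idx] += p
    return bands

# Settings taken from the deck's example: 62.5 Hz to 8 kHz, 1/4-octave bands.
edges = log_band_edges(62.5, 8000.0, 0.25)

# Toy power spectrum (hypothetical data): 257 bins covering 0 Hz to 8 kHz.
bin_freqs = [k * 16000.0 / 512 for k in range(257)]
bin_power = [1.0 / (1 + k) for k in range(257)]
bands = pool_power(bin_freqs, bin_power, edges)

print(len(edges) - 1, "log bands between the edges")
print("power preserved:", math.isclose(sum(bands), sum(bin_power)))
```

Because every bin lands in exactly one band, the sum over the bands equals the sum over the linear bins, which is the sense in which total power is preserved here.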

  11. AudioSpectrumEnvelopeD
      [AudioSpectrumEnvelope, attributegrp, map, XMLFile] = AudioSpectrumEnvelopeType(audioFile, hopSize, attributegrp, writeXML, XMLFile, map)
      This function determines an AudioSpectrumEnvelope and also returns the map from linear to log bands.
      % EXAMPLE 1: AudioSpectrumEnvelopeD extraction
      ag.octaveResolution='1/4';
      ag.loEdge=62.5;
      ag.hiEdge=8000;
      hopSize='PT10N1000F';
      fname='e:\Beatles\1\000100.wav';
      [ASE,ag]=AudioSpectrumEnvelopeD(fname,hopSize,ag,1,'ase.xml');
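
The hopSize string 'PT10N1000F' appears to follow the MPEG-7 media-duration notation, in which `N` counts fractions of a second and `F` gives the number of fractions per second, so it would denote 10/1000 s = 10 ms (consistent with the 0.01 s frames labelled on later slides). The Python sketch below parses only that subset of the notation (optional whole seconds plus an N/F pair); `parse_hop_size` is a hypothetical helper, not part of the toolbox.

```python
import re

def parse_hop_size(text):
    """Convert strings such as 'PT10N1000F' or 'PT1S' to seconds.

    Assumption: only the forms PT<n>S, PT<n>N<f>F and PT<n>S<n>N<f>F are handled.
    """
    match = re.fullmatch(r"PT(?:(\d+)S)?(?:(\d+)N(\d+)F)?", text)
    if match is None:
        raise ValueError("unsupported duration string: " + repr(text))
    seconds, count, per_second = match.groups()
    total = float(seconds or 0)
    if count is not None:
        total += int(count) / int(per_second)
    return total

hop_seconds = parse_hop_size("PT10N1000F")
print(hop_seconds)
```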

  12. AudioSpectrumEnvelopeD . . .

  13. AudioSpectrumEnvelopeD

  14. AudioSpectrumBasisD [Diagram labels: AudioSpectrumBasisD; SVD / ICA Basis Rotation; AudioSpectrumProjectionD]

  15. AudioSpectrumBasisD
      AudioSpectrumBasisType - independent components of a spectrum matrix
      [V,env]=AudioSpectrumBasis(X, k, DDL_FLAG)
      Inputs:
        X - spectrum data matrix (t x n, t=time points, n=spectral channels)
        k - number of components to extract
        DDL_FLAG - 1=write XML output. [0]
      Outputs:
        V - n x k matrix of basis functions
        env - L2-norm envelope of log Spectrogram data (required for MPEG7)
      % EXAMPLE 2: AudioSpectrumBasisD
      [ASB,env]=AudioSpectrumBasisD(ASE,10,'asb.xml');
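
To make the roles of `env` and `V` more concrete, here is a minimal Python sketch under stated assumptions; it is not the toolbox code. It assumes each frame (row) of the spectrum matrix is divided by its L2 norm before the basis is extracted, and it finds only the leading basis vector, using power iteration as a stand-in for a full SVD. The ICA rotation is omitted, frames are assumed to be non-zero, and the function names are hypothetical.

```python
import math

def normalize_rows(X):
    """Return unit-norm rows of X together with the per-row L2-norm envelope."""
    env = [math.sqrt(sum(v * v for v in row)) for row in X]
    Xn = [[v / e for v in row] for row, e in zip(X, env)]
    return Xn, env

def leading_basis_vector(Xn, iterations=50):
    """Leading right singular vector of Xn via power iteration on Xn^T Xn."""
    n = len(Xn[0])
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        proj = [sum(a * b for a, b in zip(row, v)) for row in Xn]            # Xn v
        w = [sum(p * row[j] for p, row in zip(proj, Xn)) for j in range(n)]  # Xn^T (Xn v)
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v

# Toy data (hypothetical): three frames that share one spectral shape.
X = [[3.0, 4.0], [6.0, 8.0], [1.5, 2.0]]
Xn, env = normalize_rows(X)
v = leading_basis_vector(Xn)
print(env)  # per-frame L2 norms
print(v)    # close to the shared shape [0.6, 0.8]
```

In this toy case every frame has the same shape, so one basis vector captures the data and the envelope carries all of the frame-to-frame variation.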

  16. AudioSpectrumBasisD

  17. AudioSpectrumBasisD

  18. AudioSpectrumBasisD: Block Form

  19. AudioSpectrumProjectionD [Diagram labels: AudioSpectrumBasisD; SVD / ICA Basis Rotation; AudioSpectrumProjectionD]

  20. AudioSpectrumProjectionD
      [P,maxenv] = AudioSpectrumProjectionD(X, V, XML)
      Inputs:
        X - t x n matrix containing AudioSpectrumEnvelopeD values: t=time points, n=frequency bins
        V - n x k matrix containing AudioSpectrumBasisD values: n=frequency bins, k=basis functions
        XML (DDL_FLAG) - XML file name [optional]
      Output:
        P - t x (1 + k) matrix where each row contains 1 x L2-norm envelope coefficient and k x spectral projection coefficients
      % EXAMPLE 3: AudioSpectrumProjectionD extraction
      [ASP,maxEnv]=AudioSpectrumProjectionD(ASE,ASB,'asp.xml');
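
The layout of the output matrix P (one envelope coefficient followed by k projection coefficients per row) can be sketched in Python as follows. This is an illustration under assumptions, not the toolbox implementation: it assumes each frame is scaled to unit L2 norm before being multiplied by the basis, that frames are non-zero, and it ignores the `maxenv` output. `spectrum_projection` is a hypothetical name.

```python
import math

def spectrum_projection(X, V):
    """Return a t x (1 + k) matrix: [L2-norm envelope, k projection coefficients]."""
    k = len(V[0])
    P = []
    for row in X:
        env = math.sqrt(sum(v * v for v in row))
        unit = [v / env for v in row]
        coeffs = [sum(unit[i] * V[i][j] for i in range(len(unit))) for j in range(k)]
        P.append([env] + coeffs)
    return P

# Toy data (hypothetical): two frames, two frequency bins, one basis function.
X = [[3.0, 4.0], [4.0, -3.0]]
V = [[0.6], [0.8]]
P = spectrum_projection(X, V)
for row in P:
    print(row)
```

The first frame is aligned with the basis function, so its projection coefficient is close to 1; the second frame is orthogonal to it, so its coefficient is close to 0. Both frames have an envelope of 5.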

  21. AudioSpectrumProjectionD

  22. Basis Reduction [Figure; panel labels: Independent Spectrum Basis Features; Time Function; Spectral Feature; High Channel Spectrogram; Reconstruction with 1 Component, 4 Components, 10 Components]

  23. Outer Product Spectrum Reconstruction Individual Basis Component

  24. 4 Component Reconstruction

  25. 10 Component Reconstruction

  26. Music Unmixing
      • Linear basis projection using SVD and ICA
      • spectrum subspace separation
      • fast computation of subspace ICA
      • full-rate filterbank masking
      • Blocked ICA functions
      • subspace reconstruction Y_j = X V_j V_j^+
      • cluster subspaces to identify “tracks”
      • sum masked filterbank output to create audio
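
The subspace reconstruction step on this slide is read here as Y_j = X V_j V_j^+, taking ^+ to mean the pseudo-inverse. The Python sketch below illustrates it for the special case of a basis with orthonormal columns, where the pseudo-inverse is simply the transpose. The helper names are hypothetical, and the clustering and filterbank-masking steps are not covered.

```python
def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def subspace_reconstruction(X, V):
    """Y = X V V^T; valid as X V V^+ only when the columns of V are orthonormal."""
    return matmul(matmul(X, V), transpose(V))

# Toy data (hypothetical): the first frame lies in the subspace, the second does not.
X = [[3.0, 4.0], [4.0, -3.0]]
V = [[0.6], [0.8]]
Y = subspace_reconstruction(X, V)
for row in Y:
    print(row)
```

The first frame is reproduced (up to rounding) and the second is reduced to approximately zero, which is the sense in which a subspace keeps only the part of the spectrum explained by its basis functions.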

  27. Music Unmixing Example 1 [Figure: Drum Mixture; scale in dB]

  28. Music Unmixing Example 1

  29. Music Unmixing Example 1

  30. Music Unmixing Example 1

  31. Music Unmixing Example 2 (Pink Floyd: stereo -> 9 subspace tracks)

  32. SoundModelDS

  33. SoundModelDS and related descriptors [Diagram; labels: AudioSpectrumEnvelopeD; AudioSpectrumBasisD; AudioSpectrumProjectionD; ContinuousHiddenMarkovModelDS with states 1 2 3 4 and transitions T(i,j); SoundModelStatePathD with example path 1 3 3 2 2 3 4 4 4 4 ...]

  34. SoundModelDS - Bayesian inference of HMM parameters from training data
      Y = SoundModelDS(TrainingDataListFile, nS, nB [,OPTIONAL ARGUMENTS...])
      INPUTS:
        TrainingDataList - filename of training data list: WAV file names (one per line)
        nS - number of states in hidden Markov model [10]
        nB - number of basis components to extract [10]
      The following variables are optional, and are specified using 'parameter', value pairs on the command line:
        'hopSize'              'PT10N1000F'  - AudioSpectrumEnvelopeD hopSize
        'loEdge'               62.5          - AudioSpectrumEnvelopeD low Hz
        'hiEdge'               16000         - AudioSpectrumEnvelopeD high Hz
        'octaveResolution'     '1/8'         - AudioSpectrumEnvelopeD resolution
        'sequenceHopSize'      ''            - HMM data window hop [whole file]
        'sequenceFrameLength'  ''            - HMM data window length [whole file]
        'outputFile'           ''            - Filename for Model output [stem+mp7.xml]
        'soundName'            ''            - Model identifier name
      OUTPUTS:
        outputFile.dat = matlab struct Y.{T,S,M,C,X,maxenv,V,p}
          T - state transition matrix
          S - initial state probability vector
          M - stacked means matrix (1 vector per row)
          C - stacked inverse covariances
          V - AudioSpectrumBasis vectors
          maxenv - scaling parameter for model decoding
          p - training cycle likelihoods
        outputFile.mp7 = XML file containing MPEG-7 SoundModel description scheme

  35. SoundModelDS (function reference repeated from slide 34) Process Small Chunks = Local Dynamics Model

  36. SoundModelDS

  37. SoundModelDS

  38. SoundModelStatePathD: a simplified representation of spectral dynamics [Figure: State Path]

  39. SoundModelStatePathD
      [Path,loglike]=SoundModelStatePathD(soundfilename, arg2 [,OPTIONAL ARGS])
      Compute HMM State Path and log likelihood of sequence data
      Inputs:
        soundfilename - filename of input sound (.wav or .au)
        arg2 - SoundModelDS structure or filename of binary SoundModelDS instance (.mat)
      The following variables are optional, and are specified using 'parameter', value pairs on the command line:
        'hopSize'              'PT10N1000F'
        'loEdge'               62.5
        'hiEdge'               16000
        'octaveResolution'     '1/8'
        'sequenceHopSize'      ''
        'sequenceFrameLength'  ''
      % EXAMPLE 5: SoundModelStatePathD extraction
      [Path,ll]=SoundModelStatePathD(fname,Y,'octaveResolution','1/4','hiEdge',8000);
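
The slide does not say how the state path is computed. A common way to obtain the most likely HMM state sequence is Viterbi decoding, sketched below in Python on the assumption that this is what is meant. The function takes per-frame log observation likelihoods as input rather than evaluating Gaussian states, uses 0-based state indices (the MATLAB output on the slides is 1-based), and returns the log probability of the best path, which may differ from the log likelihood the toolbox reports.

```python
import math

def viterbi(log_init, log_trans, log_obs):
    """Most likely state sequence and its log probability.

    log_init[i]     - log initial probability of state i
    log_trans[i][j] - log probability of moving from state i to state j
    log_obs[t][j]   - log likelihood of frame t under state j
    """
    n = len(log_init)
    score = [log_init[j] + log_obs[0][j] for j in range(n)]
    back = []
    for t in range(1, len(log_obs)):
        prev = score
        pointers, score = [], []
        for j in range(n):
            best = max(range(n), key=lambda i: prev[i] + log_trans[i][j])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][j] + log_obs[t][j])
        back.append(pointers)
    state = max(range(n), key=lambda j: score[j])
    loglike = score[state]
    path = [state]
    for pointers in reversed(back):  # trace the best path backwards
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, loglike

# Toy two-state model (hypothetical numbers): states tend to persist.
log_init = [math.log(0.6), math.log(0.4)]
log_trans = [[math.log(0.9), math.log(0.1)],
             [math.log(0.1), math.log(0.9)]]
obs = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]
log_obs = [[math.log(p) for p in frame] for frame in obs]

path, loglike = viterbi(log_init, log_trans, log_obs)
print(path, loglike)
```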

  40. SoundModelStatePathD

  41. SoundModelStatePathD [Figure: BEATLES: A Hard Day’s Night; two panels, state index vs. seconds and state index vs. 0.01s frames]

  42. SoundModelStateHistogramD
      SoundModelStateHistogramD(Path, Nstates, [segSkip], [segLen])
      Extract normalized segmental state-path histograms
      Inputs:
        Path - SoundModelStatePathD output
        Nstates - Number of states in SoundModel
        [segSkip] - hop size in samples
        [segLen] - histogram length in samples
      Outputs:
        H - t x n matrix containing segmented state occupancy histograms, t=time points, n=states
      % EXAMPLE 6: SoundModelStateHistogramD extraction
      H=SoundModelStateHistogramD(Path,10,100,1000);
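
A segmental state histogram can be sketched in Python as follows. This is an illustration, not the toolbox function: `state_histograms` is a hypothetical name, states are 0-based, and the choice of L2 (unit-length) normalisation is an assumption based on the later S-Matrix slide, which states that the segmented histograms are unit norm.

```python
import math

def state_histograms(path, n_states, seg_skip, seg_len):
    """Return one unit-norm state-occupancy histogram per segment of the path."""
    rows = []
    for start in range(0, len(path) - seg_len + 1, seg_skip):
        counts = [0.0] * n_states
        for state in path[start:start + seg_len]:
            counts[state] += 1.0
        norm = math.sqrt(sum(c * c for c in counts))
        rows.append([c / norm for c in counts])
    return rows

# Toy state path (hypothetical): two states, segments of 4 frames, hop of 2 frames.
path = [0, 0, 1, 1, 1, 1]
H = state_histograms(path, n_states=2, seg_skip=2, seg_len=4)
for row in H:
    print(row)
```

The first segment spends equal time in both states and the second stays in state 1, so the two rows point in different directions even though both have unit length.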

  43. SoundModelStateHistogramD [Figure: two panels, state index vs. 0.01s frames and state index vs. seconds]

  44. S-Matrix
      • Similarity Function
      • Segmented Histograms are Unit Norm
      • Outer Product Computes Similarity Matrix
      >> size(H)
      ans = 137 10
      >> S = H * H'; % Similarity Matrix
      >> imagesc(S);
      >> D = real(acos(S)); % Dissimilarity Matrix
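
The same computation can be written out in plain Python to show what the matrix product does: with unit-norm rows, each entry of S is the cosine of the angle between two histograms, and the arc cosine turns it into an angular distance. The clamp to [-1, 1] plays the role of `real(...)` in the MATLAB line, guarding against rounding that pushes a value slightly above 1. Function names and data are hypothetical.

```python
import math

def similarity_matrix(H):
    """S[i][j] = dot product of rows i and j (cosine similarity for unit-norm rows)."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in H] for row_i in H]

def dissimilarity_matrix(S):
    """D[i][j] = angle in radians between histograms i and j."""
    return [[math.acos(max(-1.0, min(1.0, s))) for s in row] for row in S]

# Toy unit-norm histograms (hypothetical data): three segments, two states.
H = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
S = similarity_matrix(H)
D = dissimilarity_matrix(S)
for row in D:
    print(row)
```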

  45. S-Matrix

  46. Sound Replacement and Audio Mosaics
      • Find segments similar to target segment
      • Similarity scores computed between histograms
      • Cluster with k-means or pair-wise clustering
      • Replace with similar (but different) material
      • Segmentation boundaries (beat alignment)
      • EXAMPLES
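
The first two bullets can be sketched as a nearest-neighbour search over the unit-norm segment histograms. The Python below is only an illustration under that reading; `most_similar_segments` is a hypothetical helper, the target itself is excluded so that the replacement is "similar but different", and clustering and beat alignment are not covered.

```python
def most_similar_segments(H, target, top=3):
    """Indices of the segments most similar to H[target], best first."""
    scores = []
    for i, row in enumerate(H):
        if i == target:
            continue  # never propose the target segment as its own replacement
        similarity = sum(a * b for a, b in zip(H[target], row))
        scores.append((similarity, i))
    scores.sort(key=lambda item: (-item[0], item[1]))
    return [i for _, i in scores[:top]]

# Toy unit-norm histograms (hypothetical data): four segments, two states.
H = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.6], [0.6, 0.8]]
candidates = most_similar_segments(H, target=0, top=2)
print(candidates)
```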

  47. Acknowledgements
      • International Standards Organisation
      • ISO/IEC JTC 1 SC29 WG11 (MPEG)
      • Mitsubishi Electric Research Labs
      • Massachusetts Institute of Technology
      • Music Mind Machine Group (formerly Machine Listening Group)
      • Paris Smaragdis, Youngmoo Kim, Brian Whitman
      • Iroro Orife, John Hershey, Alex Westner, Kevin Wilson
      • City University
      • Department of Computing
      • Centre for Computational Creativity
