“Reinventing the Wheel”: A Novel Approach to Music Player Interfaces. Tim Pohle, Peter Knees, Markus Schedl, Elias Pampalk, and Gerhard Widmer. IEEE Transactions on Multimedia, Vol. 9, No. 3, April 2007. Presented by Yi-Tang Wang.
Outline • Introduction • Audio-Based Similarity • Web-Based Similarity • Problem Modeling • Evaluation and Results • Conclusion & future work
Introduction • A novel music player interface using a wheel • Generating a circular playlist from personal repositories • Keeps on playing similar tracks • Not only audio-based similarity is used, but also text-based similarity
Audio-Based Similarity • MFCCs (Mel-frequency cepstral coefficients) • Discarding the higher-order MFCCs • Beneficial for comparing different frames, but possibly at the cost of discarding musically meaningful information
Audio-Based Similarity • The wave files were downsampled to 22 kHz • 19 MFCCs per frame • The temporal order of the frames is ignored • The distribution of the MFCCs is modeled with a Gaussian mixture model (GMM)
Audio-Based Similarity • Similarity between pieces of music • Compute the distance between two GMMs • Likelihood • Compute the probability that the MFCCs of song A were generated by the model of song B • Drawback: all MFCCs need to be stored
Audio-Based Similarity • Sampling • Only store the GMM parameters, instead of storing MFCCs • Sample from one GMM • compute the likelihood given another GMM • Corresponds roughly to re-creating a song
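The sampling idea on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes diagonal covariances, represents a GMM as a list of hypothetical `(weight, means, variances)` tuples, and symmetrizes the score in the usual way for this kind of timbre distance (the slide itself only states "sample from one GMM, compute the likelihood given another").

```python
import math
import random


def log_gauss_diag(x, means, variances):
    """Log density of x under a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, means, variances)
    )


def log_likelihood(x, gmm):
    """Log density of x under the mixture, using log-sum-exp for stability."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def sample(gmm, n, rng):
    """Draw n points: pick a component by weight, then sample each dimension."""
    weights = [w for w, _, _ in gmm]
    points = []
    for _ in range(n):
        _, means, variances = rng.choices(gmm, weights=weights)[0]
        points.append([rng.gauss(m, math.sqrt(v)) for m, v in zip(means, variances)])
    return points


def mean_log_likelihood(samples, gmm):
    return sum(log_likelihood(x, gmm) for x in samples) / len(samples)


def sampling_distance(gmm_a, gmm_b, n=500, seed=0):
    """Symmetrized sampling distance (assumed form); only GMM parameters are needed."""
    rng = random.Random(seed)
    s_a = sample(gmm_a, n, rng)
    s_b = sample(gmm_b, n, rng)
    return (
        mean_log_likelihood(s_a, gmm_a)
        + mean_log_likelihood(s_b, gmm_b)
        - mean_log_likelihood(s_a, gmm_b)
        - mean_log_likelihood(s_b, gmm_a)
    )
```

Because only the mixture parameters are stored per song, the MFCC frames themselves can be discarded after model fitting; the price is the sampling noise in the estimate.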
Web-Based Similarity • Cultural, social, historical, and contextual aspects should be taken into account • Source: information from the WWW • Query Google with the artist’s name + “music” • The 50 top-ranked pages are retrieved • Remove all terms that occur on fewer than c pages • c is chosen such that about 10,000 terms remain
Web-Based Similarity • Term frequency $tf_{ta}$ (a: artist, t: term) • Number of occurrences of t in the documents related to a • Document frequency $df_t$ • Number of pages in which t occurs • Term weight per artist • Term frequency × inverse document frequency
Web-Based Similarity • Each artist is described by a vector of term weights • Cosine normalization is applied to the vector • Euclidean distance is a simple similarity measure • In this paper, a SOM is used to measure similarity
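The artist vectors described on the last two slides can be sketched as below. The exact weighting formula is not given in the slides, so the form `tf * log(N / df)` is an assumption, as are the function names and the dictionary-based input layout.

```python
import math


def tfidf_vectors(term_counts, doc_freq, n_docs):
    """Build cosine-normalized tf-idf vectors.

    term_counts: {artist: {term: tf}}, doc_freq: {term: df}, n_docs: total pages.
    The weight tf * log(n_docs / df) is an assumed tf-idf variant.
    """
    terms = sorted(doc_freq)
    vectors = {}
    for artist, counts in term_counts.items():
        weights = [counts.get(t, 0) * math.log(n_docs / doc_freq[t]) for t in terms]
        norm = math.sqrt(sum(w * w for w in weights))
        # Cosine normalization: scale each vector to unit length.
        vectors[artist] = [w / norm for w in weights] if norm > 0 else weights
    return terms, vectors


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

After cosine normalization, artists with no terms in common sit at Euclidean distance √2 from each other, which gives the distances a fixed upper bound for non-negative weights.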
Web-Based Similarity - SOM • SOM: self-organizing map • A subtype of artificial neural network • Trained using unsupervised learning • Produces a low-dimensional representation of the training samples while preserving the topological properties of the input space • This paper uses a rectangular 2-D grid for text-based similarity between songs
Web-Based Similarity - SOM • A SOM consists of units • A model vector in the high-dimensional input data space is assigned to each unit • Model vectors that belong to units close to each other on the 2-D grid are also close to each other in the data space • The model vectors are chosen by training
Web-Based Similarity - SOM • Batch-SOM algorithm • Initialization • Randomly initialize the model vectors • 1st step • For each data item $x_i$, the Euclidean distance between $x_i$ and each model vector is calculated • Each data item $x_i$ is assigned to the unit $c_i$ that represents it best
Web-Based Similarity - SOM • 2nd step • The neighborhood relationship between two units j and k is usually defined by a Gaussian-like function • $h_{jk} = \exp(-d_{jk}^2 / r_t^2)$ • $d_{jk}$ = distance between units j and k on the map, $r_t$ = neighborhood radius at iteration t • $r_t$ decreases with each iteration (the adaptation strength decreases gradually)
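The two steps of the batch-SOM algorithm can be sketched as below. The update rule used here is the standard batch-SOM one (each model vector becomes the neighborhood-weighted mean of the data); the linear radius schedule, the initialization from random data items, and all function names are assumptions of this sketch rather than details taken from the paper.

```python
import math
import random


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def best_unit(x, models):
    """Index of the unit whose model vector is closest to x (1st step)."""
    return min(range(len(models)), key=lambda j: squared_distance(x, models[j]))


def batch_som(data, rows, cols, iterations=20, seed=0):
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    dim = len(data[0])
    # Assumed initialization: each model vector starts at a random data item.
    models = [list(rng.choice(data)) for _ in grid]
    start_radius = max(rows, cols) / 2.0
    for t in range(iterations):
        # Assumed schedule: radius shrinks linearly, bounded below by 0.5.
        radius = max(start_radius * (1.0 - t / iterations), 0.5)
        assigned = [best_unit(x, models) for x in data]
        updated = []
        for j, (rj, cj) in enumerate(grid):
            weighted_sum = [0.0] * dim
            weight_total = 0.0
            for x, k in zip(data, assigned):
                rk, ck = grid[k]
                d2 = (rj - rk) ** 2 + (cj - ck) ** 2
                h = math.exp(-d2 / radius ** 2)  # h_jk from the slide
                weight_total += h
                for i in range(dim):
                    weighted_sum[i] += h * x[i]
            if weight_total > 0.0:
                # 2nd step: neighborhood-weighted mean of all data items.
                updated.append([s / weight_total for s in weighted_sum])
            else:
                updated.append(models[j])
        models = updated
    return grid, models
```

With a large radius every unit is pulled toward the global mean; as the radius shrinks, each unit is dominated by the items assigned to it and to its direct neighbors, which is what produces the topology-preserving layout.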
Web-Based Similarity - SOM • Two artists are considered similar if they are mapped to the same or adjacent units • Newer experiments have actually shown that a 6 × 6 grid might be better for this collection
Combining the Two Approaches • Add a constant value to the audio-based distance matrix for all songs by dissimilar artists • The constant is half of the maximum audio-based distance • This adds a penalty to transitions between songs by dissimilar artists
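A minimal sketch of this combination step follows. The data layout (a square list-of-lists distance matrix, one artist label per track, one SOM grid position per artist) and the function names are hypothetical; "adjacent" is interpreted here as the 8-neighborhood on the grid, which is an assumption.

```python
def units_adjacent(u, v):
    """True if two grid positions are identical or touch (8-neighborhood, assumed)."""
    return abs(u[0] - v[0]) <= 1 and abs(u[1] - v[1]) <= 1


def combine_distances(audio_dist, artist_of_track, unit_of_artist):
    """Add half the maximum audio distance to pairs of tracks by dissimilar artists."""
    n = len(audio_dist)
    penalty = 0.5 * max(max(row) for row in audio_dist)
    combined = [row[:] for row in audio_dist]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            unit_i = unit_of_artist[artist_of_track[i]]
            unit_j = unit_of_artist[artist_of_track[j]]
            if not units_adjacent(unit_i, unit_j):
                combined[i][j] += penalty
    return combined
```

Since the penalty is a constant, the ordering of audio distances within a group of similar artists is unchanged; only transitions across dissimilar artists become more expensive for the tour-finding step.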
Previous work • P. Knees, M. Schedl, T. Pohle, and G. Widmer, “An Innovative Three-Dimensional User Interface for Exploring Music Collections Enriched with Meta-Information from the Web,” ACM MM’06 • Audio-based similarity: Fluctuation Patterns • SOM used only on audio-based data • SOM labeled with information from the WWW • A 3-D browsing system
Problem Modeling • Map the playlist generation problem to the Traveling Salesman Problem (TSP) • The cities correspond to the tracks in the collection • The distances are determined by the similarities between the tracks • Finding an optimal route = producing a circular playlist
TSP Problem • Greedy Algorithm • All edges are examined in order of increasing length and added to the route if the result can still be completed to a valid tour • Minimum Spanning Tree • Find a minimum spanning tree and traverse it depth-first • Connect the nodes in the order they are first visited • LKH • Lin-Kernighan algorithm proposed in 1971 • Starts with a randomly generated tour • Deletes edges from the route and recombines the remaining tour fragments
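Of the three heuristics above, the minimum-spanning-tree variant is the shortest to write down. The sketch below uses Prim's algorithm and an iterative depth-first traversal; the function names and the choice of track 0 as the root are assumptions, and `dist` is a symmetric list-of-lists distance matrix.

```python
def minspan_tour(dist):
    """Order the tracks by a depth-first preorder walk of a minimum spanning tree."""
    n = len(dist)
    in_tree = [False] * n
    parent = [None] * n
    best = [float("inf")] * n
    best[0] = 0.0  # assumed root: track 0
    children = [[] for _ in range(n)]
    # Prim's algorithm: repeatedly attach the cheapest track not yet in the tree.
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] is not None:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    # Depth-first traversal; each track is listed when it is first visited.
    tour = []
    stack = [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour


def tour_length(tour, dist):
    """Length of the circular playlist, including the step back to the start."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))
```

For a metric distance this construction yields a tour at most twice as long as the optimum, but the similarity-derived distances here need not satisfy the triangle inequality, so that guarantee should not be assumed.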
TSP Problem • One-Dimensional SOM • Train a 1-D cyclic SOM • The cyclic order of the units yields a circular playlist • As many units as tracks? • Recursive approach • Subtours are combined in a greedy manner
Evaluation & Results • Collection 1 • 2545 tracks, 13 genres • A Cappella (4.4%), Acid Jazz (2.7%), Blues (2.5%), Bossa Nova (2.8%), Celtic (5.2%), Electronica (21.1%), Folk Rock (9.4%), Italian (5.6%), Jazz (5.3%), Metal (16.1%), Punk Rock (10.2%), Rap (12.9%), and Reggae (1.8%) • 103 artists • Tracks per artist: minimum 8, maximum 61
Evaluation & Results • Collection 2 • 3456 tracks, 7 genres • Classical (14.7%), Dance (15.0%), Hip-Hop (14.5%), Jazz (13.6%), Metal (14.9%), Pop (11.6%), and Punk (15.6%) • 339 artists • Tracks per artist: minimum 1, maximum 317
Fluctuations Between Genres • Genres shown (collection 1): A Cappella, Acid Jazz, Blues, Bossa Nova, Celtic, Electronica, Folk Rock, Italian, Jazz, Metal, Punk Rock, Rap, and Reggae
Shannon Entropy • Estimates how locally coherent a playlist is • Count how many of n consecutive tracks belong to each genre • n = 2…12 • A typical album contains about 12 tracks • Average over the whole playlist • SOM on web-enhanced data yields better results than LKH on audio-only data
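The entropy measure can be sketched as below: slide a window of n consecutive tracks over the playlist, compute the Shannon entropy of the genre counts in each window, and average. Wrapping the window around the end of the list (because the playlist is circular) and the use of base-2 logarithms are assumptions of this sketch, as are the function names.

```python
import math
from collections import Counter


def window_entropy(genres):
    """Shannon entropy (in bits) of the genre distribution within one window."""
    counts = Counter(genres)
    total = len(genres)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mean_local_entropy(playlist_genres, n):
    """Average entropy over all windows of n consecutive tracks (circular, assumed)."""
    size = len(playlist_genres)
    total = 0.0
    for start in range(size):
        window = [playlist_genres[(start + k) % size] for k in range(n)]
        total += window_entropy(window)
    return total / size
```

Lower values mean that neighboring tracks tend to share a genre; a playlist that alternates between two genres reaches the maximum of one bit for n = 2.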
Long-Term Consistency • SOM algorithm on combined data
Long-Term Consistency • MinSpan algorithm on audio similarity data
Long-Term Consistency • Greedy algorithm on audio similarity data
User Study • 10 test persons using collection 2 • Create a large playlist • Extract 10 seed tracks • Randomly choose a start point • Select tracks at intervals of 3 degrees • Generate two playlists for each seed track • One by adding the next nine tracks from the large playlist • One by randomly choosing tracks from the same genre
User Study • Users rate each playlist from 1 to 5 • Sum up the rating scores • Calculate the difference $tsp_{i,j} - gen_{i,j}$ • i: playlist number, j: user
User Interface • The user interface is very intuitive and extremely easy to handle • Apple’s iPod • Users’ opinions • A scanning function to skip 10 seconds on each press • Genres containing only a few tracks are quite difficult to locate • Not usable for finding a specific track
Summary of Evaluation Results • All TSP algorithms provided better results with respect to our playlist evaluation criteria when using the web-based extension • The combined similarity measure reduces the number of unexpected placements of tracks in the playlist
Summary of Evaluation Results • LKH and greedy algorithm • Best small-scale genre entropy values • Large-scale genre distributions are quite fragmented • SOM-based algorithm • Highest entropy values • Least fragmented long-term genre distributions • MinSpan algorithm • Mid-range entropy values
Conclusion & future work • A new approach to conveniently access the music stored in mobile sound players • The whole collection is ordered in a circular playlist and is thus accessible with only one input wheel • Two different similarity measures: one relying on timbre information, the other on a combination of timbre and community metadata gathered from artist-related web pages
Conclusion & future work • Problems to solve • It is not possible to precisely select a desired piece • Only tracks that are representative of a region are selectable • Possible remedies: zooming or hierarchical structuring techniques • The user does not know in advance which region on the wheel contains which style of music
Conclusion & future work • M. Schedl, T. Pohle, P. Knees, and G. Widmer, “Assigning and visualizing music genres by web-based co-occurrence analysis,” in Proc. 7th Int. Conf. Music Information Retrieval (ISMIR’06), Victoria, Canada, Oct. 2006.