
Group members: 江啟賓 張展華 陳威呈 胡家豪

Understandable Models of Music Collections Based On Exhaustive Feature Generation With Temporal Statistics. Fabian Moerchen, Ingo Mierswa, Alfred Ultsch. In KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining (2006), pp. 882-891.


Presentation Transcript


  1. Understandable Models of Music Collections Based On Exhaustive Feature Generation With Temporal Statistics. Fabian Moerchen, Ingo Mierswa, Alfred Ultsch. In KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining (2006), pp. 882-891. Group members: 江啟賓 張展華 陳威呈 胡家豪

  2. Outline • Introduction • Related Work • Audio Feature Generation • Semantic Audio Features • Evaluation • Conclusions • Discussions • Applications

  3. Introduction(1/2) • Confronted with music data, data mining encounters a new scalability challenge: music databases store millions of records, and each item contains up to several million values. → Extracting features from the audio signal strongly compresses the data set at hand. • Artist and genre classification or retrieval of similar music can then be performed with machine learning methods that use these features. • Many researchers use features motivated by heuristics on music structure and by psychoacoustic analysis of the frequency and modulation of sound, but not all features are relevant for a particular task. • The results of applying signal processing and statistical methods cannot easily be explained to the common user of music applications. In short, they are really hard to understand.

  4. Introduction(2/2) Contribution: • The authors use logistic regression in order to obtain concise and interpretable features summarizing a subset of the complicated features generated directly from polyphonic audio.

  5. Related Work • Stacking → building the new features → decision model → prediction • Mel Frequency Cepstral Coefficients (MFCC) • Support Vector Machines (SVM) • Linear Discriminant Analysis or linear predictive coefficients (LPC)

  6. Audio Feature Generation(1/5) • The raw audio data of polyphonic music is not suited for direct analysis with data mining algorithms. • Various sound impressions • Extracting audio features on short time windows • Short-term features • Long-term features

  7. Audio Feature Generation(2/5) • The authors used four disjoint data sets for the evaluation of their method. • Sampling frequency of 22 kHz • Lead-in and lead-out effects

  8. Audio Feature Generation(3/5) Short-term features: Including some variants obtained by preprocessing the features, e.g., the algorithm of the Chroma features, a total of 140 short-term features was generated.

  9. Audio Feature Generation(4/5) • Long-term features: • The authors used the first four moments and robust variants obtained by removing the largest and smallest 2.5% of the data prior to estimation. • These ten statistics are also applied to the first and second order differences and to the first and second order absolute differences, generating 40 additional features.
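
The temporal statistics on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names are hypothetical, the "second order absolute difference" is taken to be the absolute value of the second difference, and only the eight moment-based statistics named on the slide are computed (the slide does not list the remaining two of the ten).

```python
import math

def moments(xs):
    """Return (mean, variance, skewness, kurtosis) of a sequence."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        return (mean, 0.0, 0.0, 0.0)
    std = math.sqrt(var)
    skew = sum(((x - mean) / std) ** 3 for x in xs) / n
    kurt = sum(((x - mean) / std) ** 4 for x in xs) / n
    return (mean, var, skew, kurt)

def trimmed(xs, frac=0.025):
    """Drop the smallest and largest `frac` share of the values."""
    s = sorted(xs)
    k = int(len(s) * frac)
    return s[k:len(s) - k] if k > 0 else s

def diff(xs, absolute=False):
    """First-order difference of a series, optionally in absolute value."""
    d = [b - a for a, b in zip(xs, xs[1:])]
    return [abs(v) for v in d] if absolute else d

def temporal_statistics(series):
    """Plain and trimmed moments for a series and its difference series."""
    d1 = diff(series)
    d2 = diff(d1)
    variants = {
        "raw": series,
        "diff1": d1,
        "diff2": d2,
        "abs_diff1": [abs(v) for v in d1],
        "abs_diff2": [abs(v) for v in d2],  # assumption: |second difference|
    }
    # 4 plain moments + 4 trimmed moments per variant
    return {name: moments(v) + moments(trimmed(v)) for name, v in variants.items()}
```

Calling `temporal_statistics` on the frame-by-frame values of one short-term feature yields one block of long-term features for that feature; repeating this for every short-term feature gives the cross-product described on the next slide.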

  10. Audio Feature Generation(5/5) • The cross-product of short-term and long-term feature functions amounts to 140 × 284 = 39,760 long-term audio features. • Obviously, this can take a lot of computation time and memory.

  11. Semantic Audio Feature(1/6) • Almost 40,000 features are too many to be understood. • The goal of this section is to simplify the features and eliminate the irrelevant ones. • The authors' idea is to adapt Stacking.

  12. Semantic Audio Feature(2/6) • “In contrast to Stacking we do not learn the same concept on different subsamples but different concepts on the same sample.”

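  13. Semantic Audio Feature(3/6) • D_k (k = 1, ..., K) are the data sets describing these different concepts. • Any two data sets are either disjoint or equal: D_k ∩ D_l ≠ ∅ ⇒ D_k = D_l, and D = ⋃_k D_k. (Diagram: data sets D1, D2, D3, D4)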

  14. Semantic Audio Feature(4/6) • The authors applied a robust z-transformation to each long-term feature and trained a logistic regression learner for each of the K classification tasks. • Since the values are already normalized, it is not necessary to apply post-processing scaling schemes after learning a classification function.
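
A minimal sketch of this step follows. The slide does not define the robust z-transformation, so the median and the scaled median absolute deviation are used here as an assumption; the plain gradient-descent trainer and all function names are likewise illustrative, not the authors' implementation.

```python
import math
from statistics import median

def robust_z(values):
    """Center by the median and scale by 1.4826 * MAD (assumed variant)."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    scale = 1.4826 * mad if mad > 0 else 1.0
    return [(v - med) / scale for v in values]

def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def predict_proba(w, b, x):
    """Probability that x belongs to the positive class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Binary logistic regression trained by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b
```

For K concepts, one such model would be trained per concept on the same normalized feature columns, each with its own 0/1 labels.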

  15. Semantic Audio Feature(5/6) Using Laplace priors for the influence of each feature leads to a built-in feature selection that reduces runtime and avoids over-fitting of the final model.
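
A Laplace prior on the weights corresponds to an L1 penalty in the maximum a posteriori estimate, which is why many weights end up exactly zero. The sketch below illustrates that effect with proximal gradient descent (soft-thresholding). The solver, the penalty strength `lam`, and the toy data in the usage note are assumptions chosen for illustration; the slides do not say which solver the authors used.

```python
import math

def soft_threshold(v, t):
    """Shrink v toward zero by t; values within [-t, t] become exactly 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def fit_l1_logistic(X, y, lam=0.1, lr=0.5, epochs=300):
    """Logistic regression with an L1 penalty on the weights (not the bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        # gradient step on the log-loss, then the L1 proximal step
        w = [soft_threshold(wj - lr * g / n, lr * lam) for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b
```

On toy data where the first column separates the classes and the second is uninformative, for example `X = [[-1.0, 0.3], [-0.8, -0.3], [-1.2, 0.0], [1.0, 0.3], [0.9, -0.3], [1.1, 0.0]]` with `y = [0, 0, 0, 1, 1, 1]`, the weight of the second column stays at zero, so the feature is effectively deselected.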

  16. Semantic Audio Feature(6/6) • Therefore, using these likelihood predictions as the new feature set reduces the number of features from almost 40,000 to K (K < 10).
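
The reduction to K values can be pictured as follows: each of the K trained models maps the full long-term feature vector to one probability, and the K probabilities form the new representation. The concept names and weights in the usage note are hypothetical placeholders, not values from the paper.

```python
import math

def semantic_features(x, models):
    """Map a feature vector to one probability per concept.

    `models` maps a concept name to (weights, bias) of a trained
    logistic regression model; weights equal to zero mark features
    that the model did not select.
    """
    out = {}
    for name, (w, b) in models.items():
        z = sum(wj * xj for wj, xj in zip(w, x)) + b
        out[name] = 1.0 / (1.0 + math.exp(-z))
    return out
```

For example, with `models = {"metal": ([2.0, 0.0, 0.0, -1.0], 0.0), "classical": ([-1.5, 0.0, 1.0, 0.0], 0.5)}`, a four-dimensional input is replaced by two values between 0 and 1, one per concept.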

  17. Evaluation(1/7) • Analysis of semantic audio features • Genre classification • Interpretability

  18. Evaluation(2/7) Analysis of semantic audio features: • The logistic regression learning of the genre ground truth worked very well within the RADIO and GTZAN data sets. • For both the training part and the disjoint test part of the data, the separation of Metal from the remaining music is clearly visible.

  19. Recall vs. Precision • Recall = |Ra| / |R|: the fraction of the relevant items that have been retrieved • Precision = |Ra| / |A|: the fraction of the retrieved items that are relevant
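
As a small worked example of these two ratios, the sketch below treats A as the set of retrieved items, R as the set of relevant items, and Ra as their intersection. The function name and the choice to return 0.0 for an empty set are assumptions made for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of item ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # Ra: relevant items that were retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall
```

If four songs are retrieved and two of them are among three relevant songs, precision is 2/4 and recall is 2/3.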

  20. Evaluation(3/7) • The precision and recall values measured on the test set are listed. • The features columns show the number of features picked out of the almost 40,000 candidate features.

  21. Evaluation(4/7) • The left table lists the long-term features picked for 5 or 6 of the 7 models. • In the right table, the authors investigated which features had the largest absolute weights in the logistic regression models, indicating their relative importance in the decision for a genre.

  22. Evaluation(5/7) Genre classification: SVM, KNN, C4.5

  23. Evaluation(6/7)

  24. Evaluation(7/7) Interpretability:

  25. Discussions • If a user provides a categorization of some music they know well, the method could generate personalized features that describe, for example, how much a piece sounds like other music that makes the user happy. • One advantage of logistic regression is that its numerical outputs need no preprocessing for methods relying on distance calculations, such as k-nearest neighbor classification, k-Means clustering, or visualization with Emergent Self-Organizing Maps (ESOM). • The number of candidate features is limited only by the computational resources. Using more long-term features, the accuracy of the models can still be increased. → However, the computation is quite time-consuming.

  26. Discussions • In some figures, the meaning of the x-axis and y-axis is unclear. • Some reference URLs are no longer available, for example: http://marsyas.sf.net. • Long-term feature • C4.5 decision tree

  27. Conclusions • Exhaustive feature generation is used to capture many different aspects of the raw audio data that cannot be used directly. • This can be seen as a meta learning technique loosely related to stacking. • The resulting low-dimensional vector-based representations can be used efficiently for music mining tasks such as genre classification, recommendation, or visualization of music collections.

  28. Applications • Text mining with large feature sets corresponding to words occurring in documents, or video mining, where many features could be derived by combining short-term and long-term descriptions as the authors did for music. • News and other applications, for example: Shazam/SoundHound/Track ID, stock market.

  29. Thanks for listening
