Towards Automatic Topic Detection of Folksongs

Shyamala Doraisamy, Faculty of Computer Science and Information Technology, University Putra Malaysia, Malaysia
Stefan Rüger, Knowledge Media Institute, The Open University, United Kingdom

THEMATIC CATEGORIES

• Folksong collections in general are indexed by the collectors' recorded data, such as title, place collected, performer, etc.
• Folksong collection is based on an oral tradition, so several lyric versions of the same song may be available
• Thematic categorisation of folksongs is commonly performed by collectors or bibliographers
• This task requires a subjective analysis of the lyrics
• Automated topic modelling would be useful to support folksong thematic categorisation tasks
• Formal folk song collections and bibliographies
  • Examples of collections with thematic categorisations
    • Cecil Sharp's Collection of English Folk Songs [2]
    • David Atkinson, English Folksong Bibliography: An Introductory Bibliography Based on the Holdings of the Vaughan Williams Memorial Library, 3rd (electronic) edition, 2006
• Informal collections from the Internet
  • E.g. http://www.folkinfo.org, with an alphabetically organised folksong collection

[Table: Topics from Cecil Sharp's Collection of English Folk Songs [2]]

[Figure: Formal databases, e.g. an index search with the Roud Folksong Index from the Vaughan Williams Memorial Library (VWML) at www.library.efdss.org. The screenshot shows records 1 to 3 of 187800, with fields including song title, tune and first line.]

[Figure: Informal databases, e.g. indexed alphabetically from www.folkinfo.org, providing notation, lyrics, notes and descriptions of songs, and the song index number (e.g. Roud index) if available. Panels: Discussion, Notes, Notation, Lyrics. Lyrics excerpt: There was a Lady ….., Lay the Bent to the……, And she had lovely ….., Fa, la la la, fa, la….. There was a Knight of Noble….., Which also lived in the ……]

AUTOMATIC TOPIC MODELLING

• Modelling text corpora and discrete data collections
  • to find short descriptions of the members of a collection that enable efficient processing of a large collection
• Topic modelling has been applied to song lyrics text corpora
• Few, if any, related studies exist on English folksong lyrics from the English tradition
• Folksong lyrics vs contemporary music lyrics
  • Classification: genres vs themes
  • Vocabulary: Modern vs Old English
• To utilise Latent Dirichlet Allocation (LDA), a generative probabilistic model proposed by Blei et al. [1], for topic modelling

[Figure: diagram with the labels Folksong Lyrics Collection, Latent Topic Analysis, Topic models, Labeled models]

PRELIMINARY RESULTS

• Experimentation
  • 940 folksongs were obtained from www.folkinfo.org in abc music notation format
  • The files were pre-processed to remove notation tags, hyphens and punctuation marks
  • Topic analysis was performed using the GibbsLDA++ package [4]
  • The number of topics was set to 5, 10, 15, 20 and 25
• Results
  • The output topics were analysed for mapping against the topics identified from Cecil Sharp's Collection of English Folk Songs [2]
  • With 10 topics, an approximate mapping could be performed, as shown in the Results table
  • With more than 10 topics, too many junk and insignificant topics were identified

OBSERVATIONS

• Preliminary results show the feasibility of topic modelling of folksongs using LDA
• Further investigation is needed to reduce the number of insignificant topics identified
• Future work
  • Topic Significance Ranking techniques to be tested to eliminate insignificant topics
  • Subject matter experts for performance validation
  • Larger data collections comprising folksongs in English from America, Australia, etc.
• Topic Significance Ranking
  • To evaluate topic significance using the approach proposed by AlSumait et al. [3]
  • The distance between a topic distribution and three definitions of "junk distribution" is computed to determine topic significance

REFERENCES

[1] Blei, D.M., Ng, A.Y., Jordan, M.I., Latent Dirichlet Allocation. Journal of Machine Learning Research 3, 993-1022 (2003).
[2] Cecil Sharp's Collection of English Folk Songs, edited by Maud Karpeles, Vol. 1 & 2, Oxford University Press, 1974.
[3] AlSumait, L., Barbará, D., Gentle, J., Domeniconi, C., Topic Significance Ranking of LDA Generative Models. In: W. Buntine et al. (Eds.): ECML PKDD 2009, Part I, LNAI 5781, pp. 67-82.
[4] http://gibbslda.sourceforge.net
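
Illustrative sketch of the pre-processing step listed under PRELIMINARY RESULTS (removing notation tags, hyphens and punctuation marks from abc files). This is not the authors' code. It assumes that lyrics sit on abc "w:" or "W:" lines and that the output should follow the plain-text layout GibbsLDA++ [4] is understood to read (document count on the first line, then one space-separated document per line). The function names extract_lyrics and to_gibbslda_input are hypothetical.

```python
import string

# Translation table that deletes every ASCII punctuation character,
# including the hyphens abc uses to split words into syllables.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def extract_lyrics(abc_text):
    """Return the lyric words of one abc tune as lowercase tokens.

    Assumption: lyrics are on lines starting with "w:" (aligned lyrics)
    or "W:" (words printed after the tune); all other lines are treated
    as notation tags or music and are dropped.
    """
    tokens = []
    for line in abc_text.splitlines():
        line = line.strip()
        if not line.lower().startswith("w:"):
            continue
        # "~" joins two words under one note and "|" marks a bar line,
        # so both become spaces before punctuation is removed.
        text = line[2:].replace("~", " ").replace("|", " ")
        text = text.translate(_PUNCT_TABLE).lower()
        # Keep alphabetic tokens only (drops verse numbers such as "1").
        tokens.extend(t for t in text.split() if t.isalpha())
    return tokens


def to_gibbslda_input(docs):
    """Serialise tokenised songs, skipping songs with no lyric tokens.

    Assumed layout: first line is the number of documents, followed by
    one document per line with words separated by single spaces.
    """
    non_empty = [d for d in docs if d]
    lines = [str(len(non_empty))] + [" ".join(d) for d in non_empty]
    return "\n".join(lines) + "\n"


sample = (
    "X:1\n"
    "T:Lay the Bent\n"
    "M:4/4\n"
    "K:G\n"
    "G2 A2 | B4 |\n"
    "w: There was a La-dy, lay the bent to the bon-ny broom.\n"
)
sample_tokens = extract_lyrics(sample)
```

A file written from to_gibbslda_input could then be passed to the topic modelling tool once for each of the topic counts tried on the poster (5, 10, 15, 20 and 25).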
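
Illustrative sketch of the distance idea behind Topic Significance Ranking mentioned under OBSERVATIONS. The poster does not specify the distance measure or the junk definitions used, so this sketch makes two assumptions: the junk distribution is a uniform distribution over the vocabulary, and the distance is the Kullback-Leibler divergence. The approach in [3] uses three junk definitions, and only this one stand-in is shown. All function names are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for two discrete distributions.

    Terms with p_i == 0 contribute nothing; q is assumed to be strictly
    positive wherever p is positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distance_from_uniform(topic_word_probs):
    """Distance between a topic's word distribution and a uniform one.

    A topic that spreads its probability evenly over the vocabulary says
    little about any theme, so a small distance suggests a junk topic.
    """
    n = len(topic_word_probs)
    uniform = [1.0 / n] * n
    return kl_divergence(topic_word_probs, uniform)


def rank_topics(topics):
    """Sort topics from most to least distant from the uniform distribution."""
    scores = {name: distance_from_uniform(p) for name, p in topics.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy topic-word distributions over a four-word vocabulary (made-up numbers,
# not results from the poster).
toy_topics = {
    "peaked": [0.97, 0.01, 0.01, 0.01],
    "flat": [0.25, 0.25, 0.25, 0.25],
}
ranking = rank_topics(toy_topics)
```

In this toy ranking the flat topic has distance zero and lands last, which is the behaviour one would want from a filter for insignificant topics.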