LyricSearch: A Text-Based Search Engine for Lyrics R96921033 徐兆良 R96942052 林佑璟
Outline • Introduction • System description • Data collection • Query Expansion – PLSA, WordNet • Evaluation • Conclusion • Future Work
Introduction • Traditional lyrics search methods: • Artist • Title of song • Lyrics • If a user wants a playlist for a wedding… • Search by theme • What if the theme terms do not appear in the lyrics? • The main issue is how to connect a song's theme with its lyrics
Framework • Indexing: Lyrics dataset → Text pre-processing (stemming and stop-word removal) → Model creation (vocabulary, term inverted file) • Retrieval: Query (theme) → Text pre-processing → Query expansion (PLSA or WordNet) → Text retrieval (Okapi BM25) → Ranked song list • Evaluation: Ranked song list + ground truth → Result evaluation → Evaluation scores
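The retrieval step in the framework can be illustrated with a minimal sketch of an inverted file plus Okapi BM25 scoring. The slides do not give the parameter values or the exact IDF variant, so `k1=1.2`, `b=0.75`, and the smoothed IDF below are assumptions, and the toy songs and function names are hypothetical.

```python
import math
from collections import Counter

def build_index(docs):
    """Build a term -> {song_id: term frequency} inverted file.

    docs maps song_id -> list of already pre-processed tokens
    (stemmed, stop words removed).
    """
    inverted = {}
    lengths = {}
    for song_id, tokens in docs.items():
        lengths[song_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            inverted.setdefault(term, {})[song_id] = tf
    return inverted, lengths

def bm25_rank(query_terms, inverted, lengths, k1=1.2, b=0.75):
    """Return (song_id, score) pairs sorted by descending BM25 score."""
    n_docs = len(lengths)
    avg_len = sum(lengths.values()) / n_docs
    scores = Counter()
    for term in query_terms:
        postings = inverted.get(term, {})
        if not postings:
            continue  # term is not in the vocabulary
        df = len(postings)
        # Smoothed IDF (assumed variant); it is never negative.
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for song_id, tf in postings.items():
            norm = k1 * (1 - b + b * lengths[song_id] / avg_len)
            scores[song_id] += idf * tf * (k1 + 1) / (tf + norm)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical toy collection, not taken from the real dataset.
songs = {
    "song_a": ["love", "wed", "ring", "love"],
    "song_b": ["rain", "road", "night"],
    "song_c": ["wed", "bell", "church", "sun"],
}
inverted, lengths = build_index(songs)
ranking = bm25_rank(["wed", "love"], inverted, lengths)
```

Songs that share no term with the (expanded) query never enter the ranking, which is why query expansion matters when theme words are absent from the lyrics.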
Data Collection • Data collection • Annotations from AMG • Lyrics from Web Lyrics Search Engine • Data statistics • 500 albums • 3267 songs • 67 themes
WordNet • A large lexical database of English • Synonyms • Hypernyms • Antonyms • Query expansion • Find the synsets of the query terms • Example: regret → repent, rue, ruefulness, sorrow
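A minimal sketch of synset-based query expansion follows. To stay self-contained it uses a small hand-written table (`TOY_SYNSETS`, hypothetical) in place of a real WordNet lookup; only the "regret" entry comes from the slide. In practice the table could be filled from WordNet itself, for example through NLTK's `wordnet.synsets`, but that wiring is an assumption and is not shown here.

```python
# Hypothetical stand-in for a WordNet lookup: term -> words from its synsets.
TOY_SYNSETS = {
    "regret": ["repent", "rue", "ruefulness", "sorrow"],
}

def expand_query(terms, synsets, max_per_term=5):
    """Append up to max_per_term synset words after each query term.

    Order is preserved and duplicates are dropped, so the original
    terms always come before the words they introduce.
    """
    expanded = []
    seen = set()
    for term in terms:
        for candidate in [term] + synsets.get(term, [])[:max_per_term]:
            if candidate not in seen:
                seen.add(candidate)
                expanded.append(candidate)
    return expanded

expanded = expand_query(["regret", "wedding"], TOY_SYNSETS)
```

Terms without an entry, such as "wedding" here, pass through unchanged.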
PLSA - Query Expansion • Find the top K most similar terms (KNN) • Fast search: KD-tree • [Figure: example terms sun, sky, fly; label P(z|w_i)]
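The KNN search over a KD-tree can be sketched as below, assuming each term w is represented by its topic vector P(z|w) and that similarity is Euclidean distance between those vectors. The slide does not state the metric, so that choice is an assumption, and the three-topic vectors are made-up illustration values, not trained PLSA output.

```python
import heapq

def build_kdtree(points, depth=0):
    """points: list of (vector, label). Returns nested tuples or None."""
    if not points:
        return None
    axis = depth % len(points[0][0])
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn(node, target, k, heap=None):
    """Collect the k nearest points in a heap of (-squared distance, label)."""
    if heap is None:
        heap = []
    if node is None:
        return heap
    (vec, label), axis, left, right = node
    d = sq_dist(vec, target)
    if len(heap) < k:
        heapq.heappush(heap, (-d, label))
    elif d < -heap[0][0]:
        heapq.heapreplace(heap, (-d, label))
    diff = target[axis] - vec[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    knn(near, target, k, heap)
    # Visit the far side only if the splitting plane is closer
    # than the current k-th best candidate.
    if len(heap) < k or diff ** 2 < -heap[0][0]:
        knn(far, target, k, heap)
    return heap

# Made-up P(z|w) vectors over three latent topics (illustration only).
topic_given_word = {
    "sun": (0.80, 0.15, 0.05),
    "sky": (0.75, 0.20, 0.05),
    "fly": (0.70, 0.20, 0.10),
    "cry": (0.10, 0.85, 0.05),
    "tear": (0.05, 0.90, 0.05),
    "road": (0.10, 0.10, 0.80),
}
tree = build_kdtree([(vec, w) for w, vec in topic_given_word.items()])

def similar_terms(word, k=2):
    """Return the k terms closest to `word`, excluding the word itself."""
    heap = knn(tree, topic_given_word[word], k + 1)
    ranked = sorted((-neg_d, label) for neg_d, label in heap)
    return [label for _, label in ranked if label != word][:k]

neighbours = similar_terms("sun")
```

The returned neighbours would then be appended to the query before retrieval.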
Evaluation – Query Expansion (1/2) • AP of each query (total 67 themes)
Evaluation – Query Expansion (2/2) • MAP (PLSA expansion vs. random expansion)
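For reference, AP and MAP can be computed as in the sketch below. It assumes the common definition in which AP is normalised by the number of relevant songs for the theme; the slides do not say which variant was used, and the song ids are hypothetical.

```python
def average_precision(ranked, relevant):
    """AP of one ranked list against the set of ground-truth songs."""
    hits = 0
    precision_sum = 0.0
    for rank, song in enumerate(ranked, start=1):
        if song in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this hit
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """runs: list of (ranked list, relevant set), one pair per theme query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Two hypothetical theme queries.
runs = [
    (["s1", "s2", "s3", "s4"], {"s1", "s3"}),  # AP = (1/1 + 2/3) / 2
    (["s2", "s1"], {"s1"}),                    # AP = 1/2
]
ap_scores = [average_precision(r, rel) for r, rel in runs]
map_score = mean_average_precision(runs)
```

In the setting of the slides, one such pair would exist for each of the 67 themes, and MAP would be compared between PLSA expansion and random expansion.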
PLSA result • The top 10 words in the latent topics of PLSA • [Figure: panels Lyrics and General Articles; label P(w|z_i)]
Conclusion • Most terms used in theme annotations do not appear in the corresponding lyrics • Query expansion is necessary • Query expansion with PLSA can improve the performance of lyrics search • Lyrics are often short and repetitive, so each song's lyrics contain few meaningful terms • The latent concepts learned by PLSA are not clearly distinct from one another • The performance of PLSA is not yet good enough
Future Work • Use different expansion methods and compare the evaluation results with PLSA • WordNet • WordNet + PLSA • Others? • Lyrics expansion • WordNet