Speaker: 黃宥、陳縕儂

Automatic Key Term Extraction from Spoken Course Lectures Using Branching Entropy and Prosodic/Semantic Features
National Taiwan University, Taiwan

Outline
• Introduction
• Proposed Approach
  • Branching Entropy
  • Feature Extraction
  • Learning Method
• Experiments & Evaluation
• Conclusion
Key Term Extraction, NTU

Introduction

Definition
• Key term
  • Higher term frequency
  • Core content
• Two types
  • Keyword
  • Key phrase
• Advantage
  • Indexing and retrieval
  • The relations between key terms and segments of documents

Introduction
Figure: example key terms from a course lecture: language model, n-gram, bigram, HMM, acoustic model, hidden Markov model, phone
Target: extract key terms from course lectures

Proposed Approach

Automatic Key Term Extraction
Figure: system flow. Archive of spoken documents (original spoken documents, speech signal) → ASR → ASR transcriptions → Branching Entropy → Feature Extraction → Learning Methods (K-means Exemplar, AdaBoost, Neural Network) → key terms (e.g. entropy, acoustic model)
• Phrase Identification: first use branching entropy to identify phrases
• Key Term Extraction: then learn to extract key terms from a set of features

How to decide the boundary of a phrase? (Branching Entropy)
Figure: the phrase “hidden Markov model” followed by different words (is, of, in, can, represent, ...)
• “hidden” is almost always followed by the same word
• “hidden Markov” is almost always followed by the same word
• “hidden Markov model” is followed by many different words, so a boundary lies after “model”
Define branching entropy to decide possible boundaries

How to decide the boundary of a phrase? (Branching Entropy)
• Definition of right branching entropy, for a word sequence X and its children x_i (X extended by one following word)
  • Probability of each child x_i given X
  • Right branching entropy of X
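A minimal sketch of the two quantities named on this slide, assuming the usual definitions (the slide's own equations are not in this text): P(x_i | X) = f(x_i) / f(X) and H_r(X) = -Σ_i P(x_i | X) log2 P(x_i | X), where f counts occurrences in the corpus. The function name, the toy corpus, and the base-2 logarithm are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter

def right_branching_entropy(sentences, prefix):
    """Entropy of the word following `prefix` (the sequence X).

    Assumption: f(X) is taken as the number of occurrences of X that
    have a following word, so the child probabilities sum to 1.
    """
    prefix = list(prefix)
    n = len(prefix)
    followers = Counter()  # f(x_i) for each child x_i of X
    for words in sentences:
        for start in range(len(words) - n):
            if words[start:start + n] == prefix:
                followers[words[start + n]] += 1
    total = sum(followers.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in followers.values())

# Hypothetical toy corpus echoing the slide's example phrase.
corpus = [
    "the hidden Markov model is trained".split(),
    "a hidden Markov model can decode".split(),
    "each hidden Markov model of a phone".split(),
    "the hidden Markov model in use".split(),
]
h_hidden = right_branching_entropy(corpus, ["hidden"])
h_hm = right_branching_entropy(corpus, ["hidden", "Markov"])
h_hmm = right_branching_entropy(corpus, ["hidden", "Markov", "model"])
```

In this toy corpus the entropy stays at zero inside the phrase and rises only after “model”, which is the behavior the slide describes.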

How to decide the boundary of a phrase? (Branching Entropy)
• Decision of right boundary
  • Find the right boundary located between X and x_i where [equation missing]

How to decide the boundary of a phrase? (Branching Entropy)
• Decision of left boundary
  • Find the left boundary located between X and x_i where [equation missing], with the sequence read in reverse order (X: model Markov hidden)
Using a PAT tree to implement
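The boundary search can be sketched as follows. The exact decision rule on the slides is not preserved in this text, so the sketch substitutes a simple hypothetical criterion: a boundary is declared as soon as the branching entropy exceeds a fixed `threshold`. The left boundary reuses the same routine on reversed word order, as the "X: model Markov hidden" example suggests. All names and the threshold value are assumptions.

```python
import math
from collections import Counter

def branching_entropy(sentences, prefix):
    """Entropy of the word that follows `prefix` in `sentences`."""
    prefix = list(prefix)
    n = len(prefix)
    followers = Counter()
    for words in sentences:
        for start in range(len(words) - n):
            if words[start:start + n] == prefix:
                followers[words[start + n]] += 1
    total = sum(followers.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in followers.values())

def right_boundary(sentences, words, start, threshold=1.0):
    """Grow words[start:end] until its entropy exceeds the threshold.

    Returns the exclusive end index of the phrase, or None.
    The threshold rule is a stand-in, not the slide's condition.
    """
    for end in range(start + 1, len(words) + 1):
        if branching_entropy(sentences, words[start:end]) > threshold:
            return end
    return None

def left_boundary(sentences, words, end, threshold=1.0):
    """Same search on reversed text; returns the start index or None."""
    reversed_sentences = [s[::-1] for s in sentences]
    reversed_words = words[:end][::-1]
    length = right_boundary(reversed_sentences, reversed_words, 0, threshold)
    return None if length is None else end - length

corpus = [
    "the hidden Markov model is trained".split(),
    "a hidden Markov model can decode".split(),
    "each hidden Markov model of a phone".split(),
    "the hidden Markov model in use".split(),
]
words = corpus[1]
end = right_boundary(corpus, words, 1)     # search right from "hidden"
start = left_boundary(corpus, words, end)  # search left from the right boundary
phrase = words[start:end]
```

On this toy corpus the two searches bracket the phrase "hidden Markov model".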

How to decide the boundary of a phrase? (Branching Entropy)
• Implementation in the PAT tree
  • Probability of children x_i for X
  • Right branching entropy for X
Figure: PAT tree with nodes numbered 1 to 6 and labels hidden, Markov, state, variable, chain, model, distribution. X: hidden Markov; x1: hidden Markov model; x2: hidden Markov chain
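To illustrate how a tree of counts yields both quantities, here is a sketch that uses a plain word-level trie as a simplified stand-in for the PAT tree (a real PAT tree compresses paths; that detail is omitted). Each node stores how often its word sequence occurs, so P(x_i | X) is a child's count over the sum of the children's counts. Class and function names, `max_len`, and the toy data are assumptions.

```python
import math

class TrieNode:
    def __init__(self):
        self.count = 0      # occurrences of the sequence ending at this node
        self.children = {}  # next word -> TrieNode

def build_trie(sentences, max_len=4):
    """Insert every word sequence of up to max_len words."""
    root = TrieNode()
    for words in sentences:
        for start in range(len(words)):
            node = root
            for word in words[start:start + max_len]:
                node = node.children.setdefault(word, TrieNode())
                node.count += 1
    return root

def find(root, words):
    """Return the node for `words`, or None if the sequence is unseen."""
    node = root
    for word in words:
        node = node.children.get(word)
        if node is None:
            return None
    return node

def node_entropy(node):
    """Right branching entropy computed from the children's counts."""
    total = sum(child.count for child in node.children.values())
    if total == 0:
        return 0.0
    return -sum(child.count / total * math.log2(child.count / total)
                for child in node.children.values())

sentences = [
    "hidden Markov model".split(),
    "hidden Markov model".split(),
    "hidden Markov model".split(),
    "hidden Markov chain".split(),
]
root = build_trie(sentences)
X = find(root, ["hidden", "Markov"])
p_x1 = X.children["model"].count / X.count  # P(x1 | X)
p_x2 = X.children["chain"].count / X.count  # P(x2 | X)
h_X = node_entropy(X)
```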

Automatic Key Term Extraction (system flow, Feature Extraction step)
Extract some features for each candidate term

Feature Extraction
• Prosodic features
  • For each candidate term, at its first occurrence
  • Duration: speakers tend to use longer duration to emphasize key terms
    • The duration of each phone (e.g. “a”) is normalized by the average duration of that phone
    • Four values are used for the duration of the term
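A small sketch of the duration feature. The slide says four values are used but does not name them here; the sketch assumes they are the maximum, minimum, mean, and range of the normalized phone durations. The function name, input layout, and numbers are illustrative assumptions.

```python
def duration_features(phones, avg_duration):
    """Four duration values for one occurrence of a candidate term.

    phones: list of (phone, duration) pairs for the term.
    avg_duration: phone -> average duration of that phone in the corpus.
    The choice of max/min/mean/range is an assumption.
    """
    normalized = [duration / avg_duration[phone] for phone, duration in phones]
    return {
        "max": max(normalized),
        "min": min(normalized),
        "mean": sum(normalized) / len(normalized),
        "range": max(normalized) - min(normalized),
    }

# Hypothetical durations in milliseconds.
features = duration_features(
    phones=[("a", 120), ("n", 50)],
    avg_duration={"a": 80, "n": 50},
)
```

Normalizing per phone keeps an intrinsically long phone from looking emphasized merely because of its identity.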

Feature Extraction
• Prosodic features
  • For each candidate term, at its first occurrence
  • Pitch: higher pitch may represent significant information
  • Energy: higher energy emphasizes important information

Feature Extraction
• Lexical features
  • Using some well-known lexical features for each candidate term

Feature Extraction
Key terms tend to focus on limited topics
• Semantic features
  • Probabilistic Latent Semantic Analysis (PLSA), with t_j: terms, D_i: documents, T_k: latent topics
  • Latent Topic Probability
    • Describes a probability distribution over latent topics for each term
    • Figure: topic distributions of a non-key term and a key term
    • How to use it?

Feature Extraction
Key terms tend to focus on limited topics
• Semantic features
  • Probabilistic Latent Semantic Analysis (PLSA)
  • Latent Topic Significance (LTS)
    • Within-topic to out-of-topic ratio
    • Figure: within-topic and out-of-topic frequencies of a non-key term and a key term
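The within-topic to out-of-topic ratio can be sketched as below, assuming the commonly used form LTS_{t_j}(T_k) = Σ_i n(t_j, D_i) P(T_k | D_i) / Σ_i n(t_j, D_i) [1 - P(T_k | D_i)], where n(t_j, D_i) is the count of term t_j in document D_i. The slide's own formula is not in this text, so this form, the function name, and the numbers are assumptions.

```python
def latent_topic_significance(term_counts, topic_given_doc):
    """LTS of one term for every latent topic.

    term_counts[i]        = n(t_j, D_i)
    topic_given_doc[i][k] = P(T_k | D_i), e.g. from a trained PLSA model
    """
    num_topics = len(topic_given_doc[0])
    scores = []
    for k in range(num_topics):
        within = sum(n * p[k] for n, p in zip(term_counts, topic_given_doc))
        outside = sum(n * (1.0 - p[k]) for n, p in zip(term_counts, topic_given_doc))
        scores.append(within / outside if outside > 0 else float("inf"))
    return scores

# Hypothetical term that occurs mostly in a document dominated by topic 0.
scores = latent_topic_significance(
    term_counts=[4, 0, 1],
    topic_given_doc=[[0.75, 0.25], [0.5, 0.5], [0.25, 0.75]],
)
```

A ratio above 1 means the term occurs more within the topic than outside it.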

Feature Extraction
Key terms tend to focus on limited topics
• Semantic features
  • Probabilistic Latent Semantic Analysis (PLSA)
  • Latent Topic Entropy (LTE)
    • Non-key term: higher LTE
    • Key term: lower LTE
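A sketch of latent topic entropy, assuming LTE(t_j) = -Σ_k P(T_k | t_j) log P(T_k | t_j) and that P(T_k | t_j) is obtained from PLSA parameters by Bayes' rule. The base-2 logarithm, the function names, and the probabilities are illustrative assumptions.

```python
import math

def topic_posterior(term_given_topic, topic_prior):
    """P(T_k | t_j) from P(t_j | T_k) and P(T_k) by Bayes' rule."""
    joint = [p_term * p_topic for p_term, p_topic in zip(term_given_topic, topic_prior)]
    total = sum(joint)
    return [value / total for value in joint]

def latent_topic_entropy(posterior):
    """Entropy of a term's topic distribution; zero terms are skipped."""
    return -sum(p * math.log2(p) for p in posterior if p > 0)

prior = [0.25, 0.25, 0.25, 0.25]
# Hypothetical key term: concentrated on one topic.
key_term = topic_posterior([0.08, 0.001, 0.001, 0.001], prior)
# Hypothetical non-key term: spread evenly over topics.
non_key_term = topic_posterior([0.02, 0.02, 0.02, 0.02], prior)
lte_key = latent_topic_entropy(key_term)
lte_non_key = latent_topic_entropy(non_key_term)
```

The evenly spread term reaches the maximum entropy for four topics, while the concentrated term scores much lower, matching the slide's key versus non-key contrast.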

Automatic Key Term Extraction (system flow, Learning Methods step)
Using learning approaches to extract key terms

Learning Methods
• Unsupervised learning
  • K-means Exemplar
    • Transform each term into a vector in LTS (Latent Topic Significance) space
    • Run K-means
    • Take the centroid of each cluster as the key term
Terms in the same cluster focus on a single topic and are related to that cluster's key term, so the key term can represent the topic
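The three steps above can be sketched in plain Python. Two choices here are assumptions rather than details from the slides: the exemplar is taken to be the term nearest each centroid, and the centroids are initialized with a deterministic farthest-point rule so the example is reproducible. The vectors are hypothetical LTS values.

```python
import math

def kmeans_exemplars(vectors, k, iterations=20):
    """Cluster term vectors with K-means; return one exemplar per cluster.

    vectors: term -> list of floats (e.g. LTS per latent topic).
    """
    terms = sorted(vectors)
    # Farthest-point initialization (assumption, for determinism).
    centroids = [list(vectors[terms[0]])]
    while len(centroids) < k:
        farthest = max(terms, key=lambda t: min(math.dist(vectors[t], c) for c in centroids))
        centroids.append(list(vectors[farthest]))
    assignment = {}
    for _ in range(iterations):
        # Assignment step: each term joins its nearest centroid.
        new_assignment = {
            t: min(range(k), key=lambda c: math.dist(vectors[t], centroids[c]))
            for t in terms
        }
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [vectors[t] for t in terms if assignment[t] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    exemplars = []
    for c in range(k):
        members = [t for t in terms if assignment[t] == c]
        if members:
            exemplars.append(min(members, key=lambda t: math.dist(vectors[t], centroids[c])))
    return sorted(exemplars)

lts_vectors = {
    "hmm": [9.0, 0.2], "viterbi": [8.0, 0.3], "phone": [7.0, 0.1],
    "bigram": [0.2, 6.0], "language model": [0.3, 5.0], "n-gram": [0.1, 4.0],
}
key_terms = kmeans_exemplars(lts_vectors, k=2)
```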

Learning Methods
• Supervised learning
  • Adaptive Boosting (AdaBoost)
  • Neural Network
Both automatically adjust the weights of the features to produce a classifier
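As an illustration of how a supervised learner weights features, here is a compact AdaBoost sketch with one-feature threshold rules (decision stumps) as weak learners. The choice of stumps, the number of rounds, the feature columns, and the data are all assumptions; the slides do not specify the weak learner or the training setup.

```python
import math

def train_adaboost(X, y, rounds=5):
    """X: feature vectors; y: labels in {-1, +1} (+1 = key term).

    Returns a list of weighted stumps (alpha, feature, threshold, sign).
    """
    n = len(X)
    weights = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        best = None
        # Pick the stump with the lowest weighted error.
        for f in range(len(X[0])):
            for threshold in sorted({x[f] for x in X}):
                for sign in (1, -1):
                    preds = [sign if x[f] >= threshold else -sign for x in X]
                    err = sum(w for w, p, t in zip(weights, preds, y) if p != t)
                    if best is None or err < best[0]:
                        best = (err, f, threshold, sign, preds)
        err, f, threshold, sign, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, threshold, sign))
        # Increase the weight of misclassified examples, then renormalize.
        weights = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def predict(model, x):
    score = sum(alpha * (sign if x[f] >= threshold else -sign)
                for alpha, f, threshold, sign in model)
    return 1 if score >= 0 else -1

# Hypothetical features per candidate term: [normalized duration, LTE].
X = [[1.5, 0.2], [1.4, 0.3], [0.9, 1.8], [1.0, 2.0]]
y = [1, 1, -1, -1]
model = train_adaboost(X, y)
label_key = predict(model, [1.6, 0.1])
label_non_key = predict(model, [0.8, 1.9])
```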

Experiments & Evaluation

Experiments
• Corpus
  • NTU lecture corpus
  • Mandarin Chinese with embedded English words
  • Single speaker
  • 45.2 hours
  • Example: 我們的solution是viterbi algorithm (Our solution is viterbi algorithm)

Experiments
• ASR Accuracy
  • AM: bilingual acoustic model (CH, EN), a speaker-independent (SI) model adapted with some data from the target speaker
  • LM: adaptive trigram language model, interpolating a background model from out-of-domain corpora with the in-domain corpus

Experiments
• Reference Key Terms
  • Annotations from 61 students who have taken the course
  • If the k-th annotator labeled N_k key terms, each of those terms received a score of 1/N_k from that annotator, and all other terms received 0
  • Rank the terms by the sum of the scores given by all annotators
  • Choose the top N terms from the list (N is the average N_k)
  • N = 154 key terms
    • 59 key phrases and 95 keywords
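The scoring and ranking procedure can be sketched as follows, assuming each term labeled by annotator k receives 1/N_k from that annotator. The function name, the rounding of the average N_k, the alphabetical tie-break, and the toy annotations are assumptions.

```python
from collections import defaultdict

def reference_key_terms(annotations):
    """annotations: one set of labeled key terms per annotator."""
    scores = defaultdict(float)
    for labeled in annotations:
        for term in labeled:
            scores[term] += 1.0 / len(labeled)  # 1/N_k per labeled term
    # N is the average number of terms labeled per annotator.
    n = round(sum(len(labeled) for labeled in annotations) / len(annotations))
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:n]

annotations = [
    {"hmm", "viterbi"},
    {"hmm", "bigram", "entropy", "phone"},
    {"hmm", "viterbi", "bigram"},
]
reference = reference_key_terms(annotations)
```

Dividing by N_k keeps an annotator who labels many terms from dominating the ranking.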

Experiments
• Evaluation
  • Unsupervised learning
    • Set the number of key terms to be N
  • Supervised learning
    • 3-fold cross validation
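Results in this talk are reported as F-measure. A minimal sketch of how precision, recall, and F1 could be computed for a set of extracted terms against the reference key terms; the function name and the toy sets are assumptions.

```python
def precision_recall_f1(extracted, reference):
    """Compare extracted key terms with the reference key terms."""
    extracted, reference = set(extracted), set(reference)
    correct = len(extracted & reference)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

precision, recall, f1 = precision_recall_f1(
    extracted={"hmm", "viterbi", "bigram", "the"},
    reference={"hmm", "viterbi", "entropy"},
)
```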

Experiments
• Feature Effectiveness
  • Neural network for keywords from ASR transcriptions
  • Feature sets: Pr (Prosodic), Lx (Lexical), Sm (Semantic)
  • Figure: bar chart of F-measure; values shown: 56.55, 48.15, 42.86, 35.63, 20.78
  • Each set of these features alone gives an F1 from 20% to 42%
  • Prosodic features and lexical features are additive
  • All three sets of features are useful

Experiments
• Overall Performance
  • Methods: AB (AdaBoost), NN (Neural Network)
  • Baseline: conventional TFIDF scores without branching entropy, with stop word removal and PoS filtering
  • Figure: bar chart of F-measure; first series: 67.31, 62.39, 55.84, 51.95, 23.38; series added for comparison: 62.70, 57.68, 52.60, 43.51, 20.78
  • Branching entropy performs well
  • K-means Exemplar outperforms TFIDF
  • Supervised approaches are better than unsupervised approaches
  • Supervised learning using a neural network gives the best results
  • Performance on ASR transcriptions is slightly worse than on manual transcriptions but reasonable

Conclusion

Conclusion
• We propose a new approach to extract key terms
• The performance can be improved by
  • Identifying phrases by branching entropy
  • Using prosodic, lexical, and semantic features together
• The results are encouraging
