
FINE-GRAINED HIDDEN MARKOV MODELING FOR BROADCAST-NEWS STORY SEGMENTATION


Presentation Transcript


  1. FINE-GRAINED HIDDEN MARKOV MODELING FOR BROADCAST-NEWS STORY SEGMENTATION Warren Greiff, Alex Morgan, Randall Fish, Marc Richards, Amlan Kundu MITRE Corporation 2001 Technical Papers Presented by Chu Huei-Ming 2004/12/17

  2. Outline • Introduction • Features • Coherence • X-duration • Triggers • Parameter Estimation • Segmentation

3. Introduction • Fine-grained modeling • Exploits the differences in feature patterns observed at different points in the development of a news story • Models the story-length distribution

4. Generative Model • The generation of news stories is modeled as a 251-state HMM • States 1 to 250 correspond to each of the first 250 words of a story • State 251 is included to model the production of all words at the end of stories exceeding 250 words in length
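To make the state space concrete, here is a minimal sketch of how such a 251-state, left-to-right topology could be wired; the function name and layout are illustrative assumptions, not code from the paper:

```python
NUM_STATES = 251  # states 1..250 = word positions within a story, state 251 = all later words

def next_states(state):
    """Possible successors of a state in the story HMM (a sketch).
    From state i the model either emits the next word of the same story
    (state i + 1) or starts a new story (state 1); state 251 can also loop
    on itself to cover stories longer than 250 words."""
    if state < NUM_STATES:
        return [state + 1, 1]
    return [NUM_STATES, 1]
```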

5. Features • Coherence • COHER-1: based on a buffer of the 50 words immediately prior to the current word • COHER-2, COHER-3, and COHER-4 are similar features • If the current word does not appear in the buffer, the value is 0 • If it does appear in the buffer, the value is −log(S_w/S) • Words that did not appear in the training data are treated as having appeared once (add-one smoothing)
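A small sketch of how a COHER-1-style value could be computed, assuming S_w is the word's count in the training data (with add-one smoothing, as noted on the slide) and S is the total number of training tokens; the function and parameter names are illustrative:

```python
import math
from collections import deque

def coherence_feature(words, train_counts, total_train_tokens, buffer_size=50):
    """COHER-1-style coherence values (a sketch): 0 if the current word is absent
    from the buffer of the previous `buffer_size` words, otherwise -log(S_w / S),
    with add-one smoothing for words unseen in training."""
    buf = deque(maxlen=buffer_size)
    values = []
    for w in words:
        if w in buf:
            s_w = train_counts.get(w, 0) + 1      # add-one smoothing
            values.append(-math.log(s_w / total_train_tokens))
        else:
            values.append(0.0)
        buf.append(w)
    return values
```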

6. Features • X-duration • This feature is based on indications given by the speech recognizer that it was unable to transcribe a portion of the audio signal • The existence of an untranscribable section prior to the word gives a non-zero X-DURATION value based on the extent of the section

  7. Features • Triggers • Correspond to small regions at the beginning and end of stories • Some words are far more likely to occur in these positions than in other parts of a news segment

8. Features • For a word w, the value of the trigger feature is an estimate of how likely it is for w to appear in the region of interest R • The estimate is based on: the number of times w appeared in R in the training data; the total number of occurrences of w; and the fraction of all tokens of w that occurred in the region

9. Features • Advantage • The estimate has a prior probability f_R, equal to the expectation for a randomly selected word • For words observed only a few times in the training data, the estimate is not pushed far from this prior • For words frequently encountered in R, it is pushed strongly towards the empirical probability of the word appearing in the region (a sketch of one such smoothed estimate follows)
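The slides do not give the exact formula, but the behaviour they describe matches a simple shrinkage estimate; the sketch below is one such estimate, with `strength` a hypothetical smoothing parameter rather than a value from the paper:

```python
def trigger_feature(count_w_in_region, count_w_total, region_fraction, strength=10.0):
    """Smoothed estimate of how likely word w is to appear in region R (a sketch).
    With few observations of w the value stays near the prior f_R
    (`region_fraction`); as count_w_total grows it moves toward the empirical
    fraction count_w_in_region / count_w_total."""
    return (count_w_in_region + strength * region_fraction) / (count_w_total + strength)
```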

10. Features • Discussion • Modeling end-of-story words • A trigger related to the last word in a story would be delayed by a one-word buffer

11. Parameter Estimation • Non-parametric kernel estimation techniques were applied • Using the LOCFIT library of the open-source R statistical analysis package, which is based on the S-PLUS system

12. Parameter Estimation • Transition probabilities • The underlying probability distribution over story length is assumed to be smooth • Conditional transition probabilities can then be estimated directly from the estimated probability density of story length (see the sketch below)
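One way to read this: the probability of returning to state 1 from state i is the hazard of the story-length distribution at length i. The sketch below estimates a smoothed length density and turns it into per-state end-of-story probabilities; scipy's gaussian_kde is used here only as a stand-in for the LOCFIT estimator named above:

```python
import numpy as np
from scipy.stats import gaussian_kde

def end_of_story_probs(story_lengths, max_state=250):
    """Per-state transition probabilities back to state 1 (a sketch).
    p_end[i - 1] approximates P(story ends at word i | story reached word i);
    1 - p_end[i - 1] is then the probability of moving on to state i + 1."""
    kde = gaussian_kde(story_lengths)                 # stand-in for LOCFIT smoothing
    density = kde(np.arange(1, max_state + 1))
    density /= density.sum()                          # smoothed P(L = i)
    survival = density[::-1].cumsum()[::-1]           # P(L >= i)
    return density / survival
```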

13. Parameter Estimation • Conditional observation probabilities • The conditional probability distribution over states for a given binned feature value: P(State = s | Feature = fv)
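If the HMM needs emission probabilities P(Feature = fv | State = s), they can be recovered from the estimated P(State = s | Feature = fv) by Bayes' rule; the sketch below shows that inversion, an assumption about how the estimate would be used rather than something stated on the slide:

```python
import numpy as np

def emission_probs(p_state_given_feature, p_feature, p_state):
    """Bayes inversion (a sketch): P(fv | s) = P(s | fv) * P(fv) / P(s).
    p_state_given_feature has shape (num_feature_bins, num_states),
    p_feature has shape (num_feature_bins,), p_state has shape (num_states,)."""
    return p_state_given_feature * p_feature[:, None] / p_state[None, :]
```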

  14. Segmentation • The Viterbi algorithm is employed to determine the sequence of states most likely to have produced the observation sequence • A boundary is then associated with each word produced from state 1 for the maximum likelihood state sequence
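A compact sketch of Viterbi decoding for this setup, with state 1 stored at index 0; word positions decoded as state 1 are reported as story boundaries. The array layout and function name are illustrative assumptions:

```python
import numpy as np

def viterbi_segment(log_trans, log_emit):
    """Most likely state sequence and the story boundaries it implies (a sketch).
    log_trans[i, j] = log P(next state j | state i);
    log_emit[t, j]  = log P(features of word t | state j)."""
    T, S = log_emit.shape
    delta = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    delta[0, 0] = log_emit[0, 0]                      # the first word starts a story (state 1)
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans    # (S, S): previous state x next state
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_emit[t]
    states = np.empty(T, dtype=int)
    states[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        states[t] = back[t + 1, states[t + 1]]
    boundaries = [t for t, s in enumerate(states) if s == 0]   # index 0 == state 1
    return states + 1, boundaries
```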

15. Result • Training: all but 15 of the ABC World News Tonight programs from the TDT-2 corpus • Test: the remaining 15 programs • False-alarm probability: 0.11 • Corresponding miss probability: 0.14

16. Result • The x-axis corresponds to time • The y-axis corresponds to the state of the HMM • Returns to state 1 correspond to boundaries between stories
