
Presentation Transcript


  1. Document Summarization using Conditional Random Fields. Dou Shen, Jian-Tao Sun, Hua Li, Qiang Yang, Zheng Chen. IJCAI 2007. Hao-Chin Chang, Department of Computer Science & Information Engineering, National Taiwan Normal University. 2011/09/05

  2. Outline • Introduction • CRF-based Summarization • Experiments and Results • Conclusion and Future Work

  3. Introduction(1/2) • Text document summarization has attracted much attention since the original work by Luhn (1958) • It also benefits text mining tasks such as document classification [Shen 2004] • It helps readers catch the main points of a long document with less effort • Summarization tasks can be grouped into different categories • Input: single-document summary vs. multi-document summary • Purpose: generic summary vs. query-oriented summary [Goldstein 1999] • Output [Mani 1999]: extractive summary vs. abstractive summary

  4. Introduction(2/2) • Extractive document summarization • Supervised algorithms: treat extraction as a two-class classification problem and classify each sentence individually, without leveraging the relationship among sentences (a minimal sketch of this baseline follows this slide) • Unsupervised algorithms: use heuristic rules to select the most informative sentences into a summary directly, and these rules are hard to generalize • Conditional Random Fields (CRF) avoid both disadvantages • Extraction is cast as a sequence labeling problem instead of a simple classification problem • The discriminative model also avoids the failure of generative models, which in many situations fail to predict the sequence labels given the observation sequences because they inappropriately use a generative joint model P(D|S) to solve a discriminative conditional problem when observations are given
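
A minimal sketch of the per-sentence classification baseline, in the spirit of the LR baseline evaluated later; the TF-IDF features, toy sentences, and labels here are illustrative assumptions, not the paper's actual setup.

```python
# Per-sentence classification baseline: each sentence is scored independently,
# so no information about neighboring sentences' labels is used -- the
# weakness that motivates the CRF formulation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

sentences = ["Profits rose sharply in the third quarter.",
             "The company released its quarterly report today.",
             "The weather that day was pleasant."]
labels = [1, 1, 0]  # 1 = sentence belongs in the summary (toy labels)

X = TfidfVectorizer().fit_transform(sentences)
clf = LogisticRegression().fit(X, labels)
print(clf.predict_proba(X)[:, 1])  # per-sentence summary-worthiness scores
```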

  5. CRF-based Summarization(1/3) • Observation sequence (sentence sequence) X = (x_1, x_2, ..., x_N) • Corresponding state sequence Y = (y_1, y_2, ..., y_N), where y_t = 1 if sentence x_t belongs to the summary and y_t = 0 otherwise • The probability of Y conditioned on X defined in CRF: P(Y|X) = (1/Z_X) exp(Σ_t Σ_k λ_k f_k(y_{t-1}, y_t, X, t)), where Z_X is the normalization constant (a toy computation of this probability follows this slide) • Feature functions f_k(y_{t-1}, y_t, X, t) • Weights λ_k
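
A toy, brute-force evaluation of this conditional probability, assuming binary labels and two hand-written feature functions; real implementations never enumerate all 2^N label sequences, so this only makes the formula concrete.

```python
# Brute-force P(Y|X) for a 3-sentence toy document under the linear-chain
# CRF definition above. The feature functions and weights are made up.
import itertools
import numpy as np

def f0(y_prev, y_t, x, t):  # transition feature: consecutive summary sentences
    return float(y_prev == 1 and y_t == 1)

def f1(y_prev, y_t, x, t):  # state feature: a summary sentence at the start
    return float(y_t == 1 and t == 0)

features, weights = [f0, f1], np.array([0.5, 1.2])
x = ["sentence 0", "sentence 1", "sentence 2"]  # observation sequence

def score(y):
    # sum_t sum_k lambda_k * f_k(y_{t-1}, y_t, X, t), with y_{-1} fixed to 0
    return sum(w * f(y[t - 1] if t > 0 else 0, y[t], x, t)
               for t in range(len(y)) for f, w in zip(features, weights))

# Z_X sums exp(score) over all 2^N possible label sequences
Z = sum(np.exp(score(y)) for y in itertools.product([0, 1], repeat=len(x)))
print(np.exp(score((1, 1, 0))) / Z)  # P(Y = (1, 1, 0) | X)
```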

  6. CRF-based Summarization(2/3) • λ = {λ_k} is the set of weights in a CRF model • λ is usually estimated by a maximum likelihood procedure on the training data • To avoid overfitting, some regularization methods add a Gaussian prior over the weights, which penalizes the log-likelihood by Σ_k λ_k^2 / (2σ^2) (a training sketch follows this slide)
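
A sketch of maximum-likelihood training with a Gaussian prior, assuming the third-party sklearn-crfsuite package (not used in the paper); its c2 coefficient plays the role of the penalty strength above, and the feature dicts and labels are toy values.

```python
# Training a linear-chain CRF by L-BFGS with an L2 penalty, which corresponds
# to the Gaussian prior on the weights described above.
import sklearn_crfsuite

X_train = [[{"position": 0, "thematic_words": 3},
            {"position": 1, "thematic_words": 0}]]  # one document, two sentences
y_train = [["1", "0"]]                              # "1" = summary sentence

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c2=1.0, max_iterations=100)
crf.fit(X_train, y_train)
print(crf.predict_marginals(X_train)[0])            # per-sentence P(y_t | X)
```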

  7. CRF-based Summarization(3/3) • Given the CRF probability and its parameters, the most probable labeling sequence can be obtained as Y* = argmax_Y P(Y|X), computed with the Viterbi algorithm • We can order the sentences based on the marginal P(y_t = 1 | X) and select the top ones into the summary • Forward value α_t(y) • Backward value β_t(y) • Together these give P(y_t = 1 | X) = α_t(1) β_t(1) / Z_X (a forward-backward sketch follows this slide)
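
A minimal forward-backward sketch for the marginals used to rank sentences, assuming binary labels and a precomputed log-potential tensor psi[t, y_prev, y] = Σ_k λ_k f_k(y_prev, y, X, t); the random potentials stand in for real feature scores.

```python
# Forward-backward in log space for a linear-chain CRF with binary labels.
import numpy as np

N, L = 4, 2                                # 4 sentences, labels {0, 1}
psi = np.random.default_rng(0).normal(size=(N, L, L))  # toy log potentials

alpha = np.zeros((N, L))                   # forward values
alpha[0] = psi[0, 0]                       # y_{-1} fixed to 0 at the start
for t in range(1, N):
    # alpha[t, y] = logsumexp_{y'} (alpha[t-1, y'] + psi[t, y', y])
    alpha[t] = np.logaddexp.reduce(alpha[t - 1][:, None] + psi[t], axis=0)

beta = np.zeros((N, L))                    # backward values
for t in range(N - 2, -1, -1):
    beta[t] = np.logaddexp.reduce(psi[t + 1] + beta[t + 1][None, :], axis=1)

log_Z = np.logaddexp.reduce(alpha[-1])     # normalization constant Z_X
marginal = np.exp(alpha[:, 1] + beta[:, 1] - log_Z)   # P(y_t = 1 | X)
print(np.argsort(marginal)[::-1])          # sentence ranking for the summary
```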

  8. Experiment(1/5) • Basic features • Position • Thematic words: the most frequent words • Upper-case words: words the authors want to emphasize • Similarity to neighboring sentences • Complex features • LSA score • HITS score: the document must be treated as a graph • (A feature-extraction sketch follows this slide)
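
A hedged sketch of the basic features on this slide; the exact definitions in the paper differ in detail (e.g., how thematic words are selected and how similarity is measured), so the thresholds and token-level heuristics here are illustrative.

```python
# Basic per-sentence features: position, thematic-word count, upper-case-word
# count, and word overlap with the previous sentence.
from collections import Counter

def basic_features(sentences):
    words = [w.lower() for s in sentences for w in s.split()]
    thematic = {w for w, _ in Counter(words).most_common(10)}  # frequent words
    feats = []
    for i, sent in enumerate(sentences):
        tokens = sent.split()
        prev = set(sentences[i - 1].lower().split()) if i > 0 else set()
        feats.append({
            "position": i,
            "thematic_words": sum(w.lower() in thematic for w in tokens),
            "upper_case": sum(w.isupper() for w in tokens),
            "sim_prev": len(prev & set(sent.lower().split()))
                        / max(len(tokens), 1),
        })
    return feats

print(basic_features(["DUC provides document and summary pairs.",
                      "The pairs are used for evaluation."]))
```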

  9. Experiment(2/5) • 147 document-summary pairs from the Document Understanding Conference (DUC) 2001 • Supervised methods • Naive Bayes (NB) • Logistic Regression (LR) • Support Vector Machine (SVM) • Hidden Markov Model (HMM) • Conditional Random Fields (CRF) • Unsupervised methods • Selecting sentences randomly from the document, denoted RANDOM • Selecting the lead sentence of each paragraph, denoted LEAD • LSA • Graph-based ranking algorithms such as HITS

  10. Experiment(3/5) • RANDOM is the worst • CRF is the best • HMM and LR improve the performance compared with NB, thanks to the advantage of leveraging sequential information • CRF makes a further improvement of 8.4% and 11.1% over both HMM and LR in terms of ROUGE-2 and F1 (a minimal ROUGE-2 computation follows this slide) • CRF outperforms HITS by 5.3% and 5.7% in terms of ROUGE-2 and F1
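
To make the reported metric concrete, here is a minimal ROUGE-2 recall computation; the official ROUGE toolkit used in such evaluations adds stemming, stopword handling, and other options, so this is only a simplified sketch.

```python
# ROUGE-2 recall: the fraction of the reference summary's bigrams that also
# appear in the candidate summary.
from collections import Counter

def bigrams(text):
    toks = text.lower().split()
    return Counter(zip(toks, toks[1:]))

def rouge_2_recall(candidate, reference):
    cand, ref = bigrams(candidate), bigrams(reference)
    overlap = sum(min(c, ref[b]) for b, c in cand.items())
    return overlap / max(sum(ref.values()), 1)

print(rouge_2_recall("profits rose sharply this quarter",
                     "profits rose sharply in the third quarter"))  # 2/6
```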

  11. Experiment(4/5) • CRF is still the best method, improving the values of ROUGE-2 and F1 achieved by the best baselines by more than 7.1% and 8.8% • Compared with the best unsupervised method, HITS, the CRF based on both kinds of features improves the performance by 12.1% and 13.9% in terms of ROUGE-2 and F1 • We compared CRF to the linear combination method used to combine the results of LSA, HITS, and CRF based only on the basic features; the best result we can obtain on DUC01 is 0.458 and 0.392 in terms of ROUGE-2 and F1

  12. Experiment(5/5) • 10-fold cross-validation procedure, where one fold is used for training and the other nine folds for testing (see the sketch after this slide) • We can obtain more precise model parameters with more training data • The gap between the CRF-based methods and the other four supervised methods is clearly larger when the size of the training data is small • The generative observation models of HMM are not particularly relevant to the task of inferring the class labels • The bad performance of NB, LR, and SVM may be due to overfitting on a small amount of training data
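
A sketch of the inverted cross-validation protocol described above: with scikit-learn's KFold, the usual one-tenth "test" split becomes the training set, so every model sees only a small amount of labeled data; the training/evaluation call is a hypothetical placeholder.

```python
# Inverted 10-fold cross-validation: train on 1 fold, test on the other 9.
import numpy as np
from sklearn.model_selection import KFold

docs = np.arange(147)  # indices of the 147 DUC 2001 document/summary pairs
for nine_folds, one_fold in KFold(n_splits=10, shuffle=True,
                                  random_state=0).split(docs):
    train_idx, test_idx = one_fold, nine_folds  # swap the usual roles
    # train_and_evaluate(docs[train_idx], docs[test_idx])  # hypothetical helper
    print(len(train_idx), len(test_idx))
```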

  13. Conclusion • We presented a framework that can consider all available features, including the interactions between sentences • We plan to exploit more features, especially linguistic features not covered in this paper, such as rhetorical structures
