Summarization using Event Extraction Base System 01/12 KwangHee Park
Research goal • Summarize an article by categorizing its content by subject • Do not just extract key sentences; rearrange the sentences by the subject of each event • Make it easy to understand what happened to each subject
Research goal • Extract events and rearrange them by subject • The North: launched 170 artillery shells; used both direct-firing guns and howitzers; … • South Korean forces: fired back only 80 shells; … • South Korean marines: first evacuated to safe places; … • Summarization from raw text
Architecture • Pipeline: Raw text → Event recognizer → Subject assigner → Categorizer • Example raw text: On the other hand, it’s turning out to be another very bad financial week for Asia. The financial assistance from the World Bank and the International Monetary Fund are not helping. In the last twenty four hours, the value of the Indonesian stock market has fallen by twelve percent. The Indonesian currency has lost twenty six percent of its value. In Singapore, stocks hit a five year low. In the Philippines, a four year low. And in Hong Kong, a three percent drop. More problems in Hong Kong for a place, for an economy, that many experts thought was once invincible
Architecture • Pipeline: Raw text → Event recognizer → Subject assigner → Categorizer • Events grouped by subject: • Indonesian stock market: fallen by twelve percent • Indonesian currency: lost twenty six percent • Singapore stock: five year low • The Philippines stocks: four year low • Hong Kong stock: three percent drop
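The flow on this slide can be sketched in a few lines of Python. This is only an illustration: the slides name the stages but not their interfaces, so the function `summarize`, its two callable parameters, and the toy lookup tables are all assumptions, with hard-coded stand-ins in place of the real recognizer and assigner.

```python
from collections import defaultdict

def summarize(sentences, recognize_events, assign_subject):
    """Group recognized events under the subject assigned to each one."""
    grouped = defaultdict(list)
    for sentence in sentences:
        for event in recognize_events(sentence):
            subject = assign_subject(sentence, event)
            grouped[subject].append(event)
    return dict(grouped)

# Toy stand-ins (hypothetical): hard-coded lookups, not real components.
TOY_EVENTS = {
    "The Indonesian currency has lost twenty six percent of its value.":
        ["lost twenty six percent"],
    "In Singapore, stocks hit a five year low.": ["five year low"],
}
TOY_SUBJECTS = {
    "lost twenty six percent": "Indonesian currency",
    "five year low": "Singapore stock",
}

def toy_recognizer(sentence):
    return TOY_EVENTS.get(sentence, [])

def toy_assigner(sentence, event):
    return TOY_SUBJECTS[event]

summary = summarize(list(TOY_EVENTS), toy_recognizer, toy_assigner)
for subject, events in summary.items():
    print(subject, "->", events)
```

Passing the stages in as functions keeps the grouping step independent of how each component is implemented, so each one can be improved separately.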
Event Extraction • Event: an instance of a topic, identified at the document level, describing something that happens • Event extraction: extract events together with their arguments from the text • Example: The Nasdaq Financial Index lost about 1%, or 3.95, to 448.80. • Annotated: <s>The <ENAMEX TYPE="ORGANIZATION">Nasdaq Financial Index</ENAMEX> <EVENT eid="e229" class="OCCURRENCE" >lost</EVENT> about <NUMEX TYPE="PERCENT">1%</NUMEX>, or 3.95, <SIGNAL sid="s364" >to</SIGNAL> 448.80.</s>
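As a rough illustration of what the annotation carries, the example sentence can be read with Python's standard `xml.etree.ElementTree`. The helper name `extract_annotations` and the shape of its return value are assumptions made for this sketch, not part of the system described here.

```python
import xml.etree.ElementTree as ET

# The annotated example sentence from the slide, split across string literals.
ANNOTATED = (
    '<s>The <ENAMEX TYPE="ORGANIZATION">Nasdaq Financial Index</ENAMEX> '
    '<EVENT eid="e229" class="OCCURRENCE" >lost</EVENT> about '
    '<NUMEX TYPE="PERCENT">1%</NUMEX>, or 3.95, '
    '<SIGNAL sid="s364" >to</SIGNAL> 448.80.</s>'
)

def extract_annotations(xml_sentence):
    """Collect event, entity, and number annotations from one <s> element."""
    root = ET.fromstring(xml_sentence)
    return {
        "text": "".join(root.itertext()),
        "events": [(e.get("eid"), e.get("class"), e.text)
                   for e in root.iter("EVENT")],
        "entities": [(e.get("TYPE"), e.text) for e in root.iter("ENAMEX")],
        "numbers": [(e.get("TYPE"), e.text) for e in root.iter("NUMEX")],
    }

annotations = extract_annotations(ANNOTATED)
print(annotations["events"])
```

The event word and its candidate arguments (the organization and the percentage) come out as separate lists, which is the kind of input the later stages need.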
Event recognizer • Recognize whether a word is used as an event or not • Input: The Nasdaq Financial Index lost about 1%, or 3.95, to 448.80. • Output: The Nasdaq Financial Index <EVENT>lost</EVENT> about 1%, or 3.95, to 448.80. • In this example, only the word ‘lost’ is used as an event word.
Event recognizer • Rule-based recognition • Training features • POS tag only: any word tagged as a verb, except forms of "be" and "have" • Word dependency with POS tag: standard Stanford word dependencies, 55 binary grammatical relations • Bigram POS-tagged context
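A minimal sketch of the POS-tag-only rule, assuming Penn Treebank tags and tokens that have already been tagged. The tags in the example are assigned by hand for illustration, and the names `BE_HAVE_FORMS`, `recognize_events`, and `mark_events` are hypothetical. The dependency and bigram features are not covered.

```python
# Forms of "be" and "have" that the rule excludes (contractions not handled).
BE_HAVE_FORMS = {
    "be", "am", "is", "are", "was", "were", "been", "being",
    "have", "has", "had", "having",
}

def recognize_events(tagged_tokens):
    """Return indices of verb-tagged tokens that are not be/have forms."""
    return [
        i for i, (word, tag) in enumerate(tagged_tokens)
        if tag.startswith("VB") and word.lower() not in BE_HAVE_FORMS
    ]

def mark_events(tagged_tokens):
    """Wrap each recognized event word in <EVENT> tags."""
    event_indices = set(recognize_events(tagged_tokens))
    return " ".join(
        f"<EVENT>{word}</EVENT>" if i in event_indices else word
        for i, (word, _tag) in enumerate(tagged_tokens)
    )

# Hand-tagged example (assumption: tags a Penn Treebank tagger might give).
TAGGED = [
    ("The", "DT"), ("space", "NN"), ("agency", "NN"), ("has", "VBZ"),
    ("finally", "RB"), ("taken", "VBN"), ("a", "DT"), ("giant", "JJ"),
    ("leap", "NN"), ("forward", "RB"),
]
print(mark_events(TAGGED))
```

A rule of this kind cannot catch events expressed as nouns, such as "drop" in the architecture example, which is one reason richer features are worth adding.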
Experiment • Corpus: TimeBank 1.1 annotated corpus (176 documents, 2603 sentences, 7168 events) • Tools: Stanford parser, Stanford POS tagger • Evaluation: 3-fold cross-validation
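For readers unfamiliar with the evaluation setup, 3-fold cross-validation can be sketched as below. The slides do not say whether the split is by document or by sentence, or whether the data is shuffled first, so splitting unshuffled document ids is an assumption, as is the helper name `k_fold_splits`.

```python
def k_fold_splits(items, k=3):
    """Yield (train, test) pairs; each item appears in exactly one test fold."""
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

# Hypothetical document ids standing in for the 176 corpus documents.
doc_ids = list(range(176))
splits = list(k_fold_splits(doc_ids, k=3))
for train, test in splits:
    print(len(train), "training documents,", len(test), "test documents")
```

Each round trains on two folds and tests on the third, and the reported score is typically the average over the three rounds.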
Subject assigner • Select the subject of a given event word or phrase • Subject means the main agent of the given event • Step 1: build a set of candidate subjects • Step 2: form the relevant subject-event pairs
Subject assigner – Baseline feature • Step 1: extract the deepest NP chunks from the parse tree • Step 2: assign to each event word the NP chunk right before it • EX) Finally today, we learned that the space agency has finally taken a giant leap forward. • Figure labels: NP1, NP2, NP3, Event • Result: We – learned; The space agency – taken
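The two baseline steps can be sketched as follows, assuming step 1 has already produced a flat sequence of labeled chunks. The chunking below is written by hand rather than taken from a parser (including the choice not to treat "today" as an NP), and the function name `assign_subjects` and the labels "NP", "EVENT", and "O" are assumptions for illustration.

```python
def assign_subjects(chunks):
    """Pair each event with the closest NP chunk that precedes it."""
    pairs = []
    last_np = None
    for text, label in chunks:
        if label == "NP":
            last_np = text  # remember the most recent candidate subject
        elif label == "EVENT" and last_np is not None:
            pairs.append((last_np, text))
    return pairs

# Hand-chunked version of the example sentence (not parser output).
CHUNKS = [
    ("Finally today ,", "O"),
    ("we", "NP"),
    ("learned", "EVENT"),
    ("that", "O"),
    ("the space agency", "NP"),
    ("has finally", "O"),
    ("taken", "EVENT"),
    ("a giant leap", "NP"),
    ("forward", "O"),
]
for subject, event in assign_subjects(CHUNKS):
    print(subject, "-", event)
```

The NP that follows an event ("a giant leap") is never chosen, which matches the result on the slide but also shows why the rule struggles with passives and other constructions where the agent comes after the verb.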
Experiment result • Corpus: manually annotated corpus based on the TimeBank 1.1 corpus; 100 sentences containing 158 events • Result: 82 / 158 = 52% accuracy
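The accuracy figure can be reproduced with a short check, assuming accuracy here means the fraction of events whose assigned subject matches the manual annotation. The `accuracy` helper and the toy predictions are hypothetical; only the counts 82 and 158 come from the slide.

```python
def accuracy(predicted, gold):
    """Fraction of positions where the predicted subject equals the gold one."""
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)

# Toy illustration with made-up labels: 2 of 3 subjects are correct.
toy_score = accuracy(["we", "the agency", "stocks"],
                     ["we", "the space agency", "stocks"])

# The reported result: 82 correct subject assignments out of 158 events.
reported = 82 / 158
print(f"{reported:.1%}")
```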
Conclusion • So far, I have implemented the baseline system • The accuracy of each component needs to improve • Each component has a different problem to solve • Event recognizer, Subject assigner: need more suitable features • Categorizer: how to treat pronoun subjects