Progress report on SRL. Abdul-Lateef Yussif, 11-03-2011. Agenda. CoNLL 2004 Shared Tasks. Data Set. Format of Data Set. Example of annotated sentence. IOB2 Format. The IOB2 format represents chunks that neither overlap nor embed. Words outside a chunk receive the tag O.
Progress report on SRL Abdul-Lateef Yussif 11-03-2011
Agenda • CoNLL 2004 Shared Tasks • Data Set • Format of Data Set
IOB2 Format • The IOB2 format represents chunks that neither overlap nor embed. • Words outside a chunk receive the tag O. • For words inside a chunk of type $k, the first word receives the tag “B-$k” (Begin), and the remaining words receive the tag “I-$k” (Inside).
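The tagging rule above can be sketched in a few lines of Python. This is a minimal illustration, not code from the report: the function name `to_iob2`, the `(start, end, type)` chunk representation with an inclusive end index, and the example sentence are all assumptions made for the example.

```python
def to_iob2(n_tokens, chunks):
    """Convert non-overlapping chunks to IOB2 tags.

    chunks is a list of (start, end, kind) triples with an inclusive
    end index (a hypothetical representation chosen for this sketch).
    """
    tags = ["O"] * n_tokens              # words outside any chunk get O
    for start, end, kind in chunks:
        tags[start] = f"B-{kind}"        # first word of the chunk
        for i in range(start + 1, end + 1):
            tags[i] = f"I-{kind}"        # remaining words of the chunk
    return tags


tokens = ["He", "reckons", "the", "current", "account", "deficit", "."]
chunks = [(0, 0, "NP"), (1, 1, "VP"), (2, 5, "NP")]
iob2_tags = to_iob2(len(tokens), chunks)
for token, tag in zip(tokens, iob2_tags):
    print(token, tag)
```

Because every chunk starts with a B- tag, two adjacent chunks of the same type stay distinguishable, which is the main reason to prefer IOB2 over plain IOB.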
Find potential Arguments • An argument can be any sequence of consecutive words • Restrict potential arguments • BEGIN(word) = 1 if word begins an argument • END(word) = 1 if word ends an argument • Argument • (wi … wj), with i ≤ j, is a potential argument iff • BEGIN(wi) = 1 and END(wj) = 1
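As a rough sketch of the restriction above, the candidate spans can be enumerated from the two indicator sequences. The function name `potential_arguments` and the toy BEGIN/END values are assumptions for illustration; in practice the indicators would come from trained boundary classifiers.

```python
def potential_arguments(begin, end):
    """Return all spans (i, j), i <= j, with begin[i] == 1 and end[j] == 1.

    begin and end are 0/1 lists, one entry per word (hypothetical inputs).
    """
    return [
        (i, j)
        for i, b in enumerate(begin) if b == 1
        for j in range(i, len(end)) if end[j] == 1
    ]


begin = [1, 0, 1, 0]   # words 0 and 2 may begin an argument
end = [0, 1, 0, 1]     # words 1 and 3 may end an argument
spans = potential_arguments(begin, end)
print(spans)
```

Without the restriction a sentence of n words has n(n+1)/2 possible spans; filtering on the boundary indicators keeps only those whose first and last words were both flagged.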
Classifiers & Features • I intend to use a support vector machine (SVM) with the following features • Words • Predicate lemmas • POS • Token Position • Path • Headword • Length
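A minimal sketch of how some of the listed features might be collected for one candidate span is shown below. It covers only the features that need no syntactic structure (words, predicate lemma, POS, position, length); path and headword are omitted because they depend on parse or chunk information. The function name, the feature keys, and the reading of "token position" as the span's position relative to the predicate are all assumptions, not details from the report.

```python
def span_features(words, pos_tags, span, predicate_index, predicate_lemma):
    """Build a feature dictionary for a candidate argument span.

    span is (start, end) with an inclusive end index. Position is taken
    relative to the predicate, which is an assumed interpretation.
    """
    start, end = span
    if end < predicate_index:
        position = "before"
    elif start > predicate_index:
        position = "after"
    else:
        position = "overlap"
    return {
        "words": " ".join(words[start:end + 1]).lower(),
        "pos": " ".join(pos_tags[start:end + 1]),
        "predicate_lemma": predicate_lemma,
        "position": position,
        "length": end - start + 1,
    }


words = ["The", "company", "bought", "the", "shares"]
pos_tags = ["DT", "NN", "VBD", "DT", "NNS"]
features = span_features(words, pos_tags, (0, 1), 2, "buy")
print(features)
```

A dictionary of this shape would still need to be turned into a numeric vector (for example by one-hot encoding the string values) before it could be passed to an SVM.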
Data and Evaluation Metrics • CoNLL 2004 dataset • Part of the PropBank corpus • Drawn from the Wall Street Journal portion of the Penn Treebank • Training (Sections 15-18) • Development (Section 20) • Testing (Section 21)
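The slide names evaluation metrics without listing them. The CoNLL shared task scores systems by precision, recall, and F1 over labeled arguments, so a minimal sketch of that computation follows. The `(start, end, label)` tuple representation, the function name, and the toy gold and predicted sets are assumptions for illustration; the official evaluation script should be used for reported results.

```python
def precision_recall_f1(gold, predicted):
    """Score predicted arguments against gold arguments.

    Each argument is a (start, end, label) tuple; an argument counts as
    correct only if boundaries and label both match exactly.
    """
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = [(0, 1, "A0"), (3, 4, "A1")]
predicted = [(0, 1, "A0"), (3, 3, "A1")]   # second span has a wrong boundary
scores = precision_recall_f1(gold, predicted)
print(scores)
```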
Hypothesis • The target is to replicate and improve on the best system's performance
Questions Thank you