Annotation for Hindi PropBank
Outline • Introduction to the project • Basic linguistic concepts • Verb & Argument • Making information explicit • Null arguments • Tasks to be carried out • Tools for annotation • Timesheets, tips • Practice
Creation of Resources • For machines rather than humans • Imagine a dictionary/thesaurus for computers • A requirement for Natural Language Processing • Large annotated resources • Annotation implies the addition of linguistic information • Tailored to language-specific requirements • Needs to be as consistent as possible • Used for applications like Semantic Role Labelling, Parsing, Word Sense Disambiguation
Hindi-Urdu Treebank Project • One of the first efforts to make a large-scale resource for Hindi-Urdu • Similar resources exist for Chinese, Arabic and English • Three main components • Hindi-Urdu dependency treebank • Hindi-Urdu PropBank • Hindi-Urdu phrase structure treebank [derived]
PropBank • PropBank resource creation at CU Boulder • We annotate semantic information on top of syntactic information • PropBank involves annotation of predicate argument structure • Mainly concerned with verbs & their arguments • And the semantic nature of the arguments
What are verbs? • Verbs are predicating elements, e.g. daud (run), pii (drink), baras (rain), etc. • Encode (very broadly) actions and states • Also carry two kinds of grammatical information • Tense, aspect (present, future; perfect, continuous) • Gender, number, person (masc/fem; sing, pl; 1st, 2nd, 3rd)
What are arguments? • In a sentence, e.g. Ram ate an apple / Raam ne seb khaaya: • A verb, ‘eat’ or ‘khaa’: PREDICATE • A person eating, ‘Raam’: ARGUMENT • Thing eaten, ‘apple’ / ‘seb’: ARGUMENT • Without arguments, the meaning of the verb ‘ate’ is not realized completely • Together, they make up the predicate argument structure of the sentence
Arguments show what’s important • Raam ne jaldi se seb khaaya • Raam, seb are arguments • But ‘jaldi se’ (quickly) is not • It’s all about the verb • It projects its need for certain arguments • Sift what’s mandatory from what’s optional
Like Unix commands • Some commands require only one argument. • cd /home/student/ashwini • cp hmwk1.txt hmwk2.txt • If the command is typed with too many or too few arguments…
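The analogy can be made concrete with a minimal sketch. The table `EXPECTED_ARGS` and the function `check_arity` are hypothetical names introduced here for illustration, and the argument counts are deliberately simplified (real shells accept more variations than this).

```python
# Hypothetical, simplified table: how many arguments each command expects.
# Real `cd` and `cp` are more flexible; this only illustrates the analogy.
EXPECTED_ARGS = {"cd": 1, "cp": 2}


def check_arity(command_line):
    """Compare the number of arguments given with the number expected."""
    name, *args = command_line.split()
    expected = EXPECTED_ARGS[name]
    if len(args) < expected:
        return "too few arguments"
    if len(args) > expected:
        return "too many arguments"
    return "ok"


print(check_arity("cd /home/student/ashwini"))
print(check_arity("cp hmwk1.txt"))
```

A verb behaves like the command name here: it determines how many arguments are needed, and a sentence with one missing feels incomplete.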
Making information explicit • As speakers of Hindi or English, we already have knowledge of predicate argument structure • E.g. hari ___ pahuMcaa • Capturing this knowledge for the machine is essential • Ram ne seb khaaya aur paani piyaa • Who drank the water?
Identify arguments • In PropBank, we first identify arguments of a verb • When explicitly present, they are called ARG • Further, they are numbered as ARG0, ARG1, ARG2 etc. • Often, you have ARG as well as ARG-M (a modifier) • [Ram]ARG0 ne [jaldi se]ARG-M [seb]ARG1 khaaya
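As a rough illustration, the labelled example sentence can be written as a small data structure. This is only a sketch: the tuple layout, the `PRED` label for the verb, and the helper names `numbered_args` and `modifiers` are assumptions made here, not the format used by the annotation tool.

```python
# Each chunk of the sentence paired with its label (None = unlabelled).
# "PRED" is a hypothetical label for the verb, used only in this sketch.
annotated = [
    ("Ram", "ARG0"),
    ("ne", None),
    ("jaldi se", "ARG-M"),
    ("seb", "ARG1"),
    ("khaaya", "PRED"),
]


def numbered_args(chunks):
    """Collect the numbered arguments (ARG0, ARG1, ...) of the verb."""
    return {
        label: text
        for text, label in chunks
        if label and label.startswith("ARG") and label[3:].isdigit()
    }


def modifiers(chunks):
    """Collect the chunks labelled as ARG-M."""
    return [text for text, label in chunks if label == "ARG-M"]


print(numbered_args(annotated))
print(modifiers(annotated))
```

Keeping the numbered arguments apart from ARG-M mirrors the earlier distinction between what the verb requires and what is optional.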
Null arguments • What if arguments are not explicit? • E.g. Ram ne seb khaaya aur ___ paani piyaa • Ram is also the person drinking water • It can be dropped, because of the conjunction aur • For the machine, it must be retrieved from the sentence • We also mark these missing or null arguments
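The idea of retrieving a dropped argument can be sketched as follows. This is a toy illustration only: in the project, null arguments are inserted by annotators, not by a rule like this one. The dictionary layout, the function name `insert_null_subject`, and the choice of ARG0 for the drinker are all assumptions made for the example.

```python
def insert_null_subject(first_clause, second_clause):
    """Copy ARG0 from the first clause if the second clause lacks one.

    Returns the filled clause and the list of labels that were inserted,
    so that inserted (null) arguments stay distinguishable from explicit ones.
    """
    filled = dict(second_clause)
    inserted = []
    if "ARG0" not in filled and "ARG0" in first_clause:
        filled["ARG0"] = first_clause["ARG0"]
        inserted.append("ARG0")
    return filled, inserted


# "Ram ne seb khaaya aur ___ paani piyaa"
clause1 = {"PRED": "khaaya", "ARG0": "Ram", "ARG1": "seb"}
clause2 = {"PRED": "piyaa", "ARG1": "paani"}  # the drinker is not explicit

filled, inserted = insert_null_subject(clause1, clause2)
print(filled)
print(inserted)
```

Recording which labels were inserted keeps the original sentence recoverable, which matters because the null argument is information added by the annotator.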
Tasks to be carried out • Null argument insertion • Argument annotation
Tools to be used • Sanchay – GUI for annotators. We use it especially for null argument insertion • Use your verbs account to access Sanchay • Wiki for annotator resources
Timesheets & tips • Being honest when filling out timesheets is important • We can access the amount of time you spend on verbs • I will ask you to keep track of the number of annotations per hour as a cross-check • Turn in the timesheets at my CINC mailbox in physical form, with your signature
Practice • We need to learn about four kinds of empty categories • Plan to proceed • Recognizing syntactic constructions • Getting familiar with the tool • Practice with the corpus • Q & A based on null argument insertion