Annotation for Hindi PropBank
Outline • Introduction to the project • Basic linguistic concepts • Verb & Argument • Making information explicit • Null arguments • Tasks to be carried out • Timesheets, tips
Creation of Resources • For machines rather than humans • Imagine a dictionary or thesaurus for computers • A requirement for Natural Language Processing • Large annotated resources • Annotation means adding linguistic information • Tailored to language-specific requirements • Needs to be as consistent as possible • Used for applications like Semantic Role Labelling, Parsing, Word Sense Disambiguation
Hindi-Urdu Treebank Project • One of the first efforts to make a large-scale resource for Hindi-Urdu • Similar resources exist for Chinese, Arabic and English • Three main components • Hindi-Urdu dependency treebank • Hindi-Urdu PropBank • Hindi-Urdu phrase structure treebank [derived]
PropBank • PropBank resource creation at CU Boulder • We annotate semantic information on top of syntactic information • PropBank involves annotation of predicate argument structure • Mainly concerned with verbs & their arguments • And the semantic nature of the arguments
What are verbs? • Verbs are predicating elements, e.g. daud, pii, baras • Encode (very broadly) actions and states • actions: hit, run, throw; states: think, see, smell • Realizing these actions and states requires participants • Ram ne kaam kiyaa • ‘karnaa’ is realized by the doer & the thing done
What are arguments? • In a sentence, e.g. Ram ate an apple / Raam ne seb khaayaa: • The verb ‘eat’ / ‘khaa’: PREDICATE • The person eating, ‘Raam’: ARGUMENT • The thing eaten, ‘apple’ / ‘seb’: ARGUMENT • Without its arguments, the meaning of the verb ‘ate’ is not completely realized • Together, they make up the predicate argument structure of the sentence
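To picture what "predicate argument structure" looks like once it is made machine-readable, here is a minimal Python sketch storing one predicate and its arguments. The class name and the role labels ARG0 (the eater) and ARG1 (the thing eaten) are assumptions in the style of PropBank numbered arguments, not the project's actual annotation file format.

```python
from dataclasses import dataclass, field


@dataclass
class PredicateArgumentStructure:
    """One predicate plus the arguments it takes, keyed by a (hypothetical) role label."""
    predicate: str
    arguments: dict[str, str] = field(default_factory=dict)

    def describe(self) -> str:
        # Render as "predicate(ROLE=filler, ...)" in insertion order.
        parts = ", ".join(f"{role}={filler}" for role, filler in self.arguments.items())
        return f"{self.predicate}({parts})"


# "Raam ne seb khaayaa" / "Ram ate an apple"
eat = PredicateArgumentStructure(
    predicate="khaa",
    arguments={"ARG0": "Raam", "ARG1": "seb"},  # eater, thing eaten
)
print(eat.describe())  # khaa(ARG0=Raam, ARG1=seb)
```

Keeping the roles in a dictionary means each argument is stored once under its label, so a tool can ask "who is ARG0 of khaa?" directly instead of re-reading the sentence.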
Arguments show what’s important • Raam ne jaldi se seb khaayaa (‘Raam ate the apple quickly’) • Raam and seb are arguments • But ‘jaldi se’ (quickly) is not • It’s all about the verb • It projects its need for certain arguments • Sift what’s mandatory from what’s optional
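The sifting step can be sketched in Python, assuming each constituent has already been given a role label. The frame dictionary below is hypothetical: it simply lists the core roles khaa projects (eater, thing eaten), and anything labelled outside that set, such as a manner modifier written here as ARGM-MNR, is treated as optional.

```python
# Hypothetical frame: the core roles the verb 'khaa' (eat) projects.
# Real PropBank frame files define roles per verb sense; this dict is only illustrative.
FRAMES = {
    "khaa": {"ARG0": "eater", "ARG1": "thing eaten"},
}


def sift(predicate: str, labelled: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Split labelled constituents into arguments (core roles) and adjuncts (everything else)."""
    core_roles = FRAMES[predicate]
    arguments = {role: text for role, text in labelled.items() if role in core_roles}
    adjuncts = {role: text for role, text in labelled.items() if role not in core_roles}
    return arguments, adjuncts


# "Raam ne jaldi se seb khaayaa" (Raam ate the apple quickly)
labelled = {"ARG0": "Raam", "ARGM-MNR": "jaldi se", "ARG1": "seb"}
arguments, adjuncts = sift("khaa", labelled)
print(arguments)  # {'ARG0': 'Raam', 'ARG1': 'seb'}
print(adjuncts)   # {'ARGM-MNR': 'jaldi se'}
```

The decision lives in the verb's frame, not in the sentence: the same ‘jaldi se’ would be optional with any verb whose frame does not list a manner role.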
Like Unix commands • Some commands require one argument, others two • cd /home/student/ashwini • cp hmwk1.txt hmwk2.txt • If a command is typed with too many or too few arguments, you get an error
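The analogy can be made literal with Python's standard argparse module: a command that declares two positional arguments rejects calls with too few or too many, much as khaa "expects" an eater and a thing eaten. The command name toycp is hypothetical, and unlike the real cp (which also accepts several sources plus a directory) it fixes the count at exactly two.

```python
import argparse

# A toy 'cp'-like command (hypothetical name) that requires a source and a destination.
parser = argparse.ArgumentParser(prog="toycp")
parser.add_argument("src")
parser.add_argument("dst")


def try_parse(argv: list[str]) -> str:
    """Return 'ok' if argv has exactly the required arguments, else 'error'."""
    try:
        parser.parse_args(argv)
    except SystemExit:
        # argparse prints a usage message to stderr and exits with status 2 on bad input.
        return "error"
    return "ok"


print(try_parse(["hmwk1.txt", "hmwk2.txt"]))           # ok
print(try_parse(["hmwk1.txt"]))                        # error: too few
print(try_parse(["hmwk1.txt", "hmwk2.txt", "extra"]))  # error: too many
```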
Making information explicit • As speakers of Hindi or English, we already know the predicate argument structure of verbs • E.g. hari ___ pahuMcaa (‘Hari reached ___’): we know something is missing • Capturing this knowledge for the machine is essential • Ram ne seb khaayaa aur paani piyaa • Who drank the water? Ram, even though the second clause never mentions him
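One way a tool might make the hidden drinker explicit is sketched below. It assumes a single heuristic that holds for this coordination example only: a clause with no overt ARG0 shares the ARG0 of the nearest preceding clause. The placeholder *NULL*, the Clause class and fill_shared_subject are all hypothetical; real null-argument annotation follows the project guidelines for each syntactic construction.

```python
from dataclasses import dataclass, field


@dataclass
class Clause:
    predicate: str
    arguments: dict[str, str] = field(default_factory=dict)


NULL = "*NULL*"  # hypothetical placeholder for an unpronounced argument


def fill_shared_subject(clauses: list[Clause]) -> list[tuple[str, str, str]]:
    """Give each clause lacking ARG0 a null ARG0 linked to the nearest preceding ARG0.
    Returns (predicate, placeholder, antecedent) records for the inserted nulls."""
    records = []
    antecedent = None
    for clause in clauses:
        if "ARG0" in clause.arguments:
            antecedent = clause.arguments["ARG0"]
        elif antecedent is not None:
            clause.arguments["ARG0"] = NULL
            records.append((clause.predicate, NULL, antecedent))
    return records


# "Ram ne seb khaayaa aur paani piyaa": Ram ate an apple and (Ram) drank water
clauses = [
    Clause("khaa", {"ARG0": "Ram", "ARG1": "seb"}),
    Clause("pii", {"ARG1": "paani"}),  # no overt drinker
]
print(fill_shared_subject(clauses))  # [('pii', '*NULL*', 'Ram')]
print(clauses[1].arguments)          # {'ARG1': 'paani', 'ARG0': '*NULL*'}
```

The returned record is what answers "Who drank the water?" for a machine: the null ARG0 of pii points back to Ram.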
Tasks to be carried out • Argument identification & annotation • Null argument identification & annotation
Training • For argument id & annotation (mid-October): • Learn PropBank labels • Get familiar with the annotation tool (Jubilee) • Identify and label arguments correctly • For null argument id & annotation (mid-November): • Recognize syntactic constructions • Get familiar with the annotation strategy • Practice annotating both arguments & nulls
Training • First step: reading documentation • Followed by hands-on practice using Jubilee • Note this wiki page: http://resourcesforannotators.wikispaces.com/
Timesheets & tips • Bi-weekly timesheets • I will cross-check the number of hours logged • Turn in signed paper copies of your timesheets at my mailbox in Hellems • Hellems is located opposite the UMC; the Linguistics dept mailboxes are on the 2nd floor
Timesheets & tips • To get into the payroll system at ICS: • You need to meet Catherine Latzer at CINC • Centre for Innovation and Creativity, 1777 Exposition Drive, Room 171C, catherine.latzer@colorado.edu, Phone: 303/735-4282 • Please bring the following 3 items • Identification documents, e.g. a passport • Social Security Number • A voided check: tear out a check and write VOID across it diagonally