Towards Parsing Croatian Complex Sentences: Dependent Noun Clauses
Vanja Štefanec, Kristina Vučković, Zdravko Dovedan
University of Zagreb, Faculty of Humanities and Social Sciences
{vstefane, kvuckovi, zdovedan}@ffzg.hr
NooJ 2010, Komotini

Our goal
• to determine the boundaries of dependent clauses within the complex sentence
  • focusing the parser
  • performing disambiguation of chunks
  • improving the chunker
• to test the adequacy of this model as a pre-parsing method for complex sentences

Overview of the work
• a grammar that can recognize the dependent noun clause (object clause) in the complex sentence
  • both a simple object clause and a coordination of object clauses
• by defining the co-text in which an object clause can occur
  • NOT by describing its structure
• relying on
  • the output of the chunker
  • conjunctions, complementizers, punctuation, ...

Object clauses in Croatian
• very frequent
• relate to the predicate of their superordinate clause as its direct object
• three types (according to grammars)
  • relative (odnosne)
  • interrogative (zavisnoupitne)
  • declarative (izrične)

Relative object clauses
• introduced by relative pronouns and adjectives
Jeste li našli [što ste tražili]?
Have you found [what you’ve been looking for]?
Kupit ću [kakvog nađem].
*I will buy [of the kind I’ll find].

Interrogative object clauses
• general (općeupitne)
  • introduced by interrogative conjunctions ‘li’, ‘da li’ or by interrogative pronouns (‘tko’, ‘koji’, ‘čiji’, ‘što’, …)
Još ne shvaćaš [što se dogodilo].
You still don’t understand [what happened].
Zaboravio sam [koji je danas dan].
I forgot [which day it is].

Interrogative object clauses
• of place (mjesne)
  • introduced by interrogative adverbs of place
Recite [kamo ste se zaputili].
Tell us [where you are headed].
• of time (vremenske)
  • introduced by interrogative adverbs of time
Nisu rekli [kad će doći].
They didn’t say [when they’ll be coming].

Interrogative object clauses
• of manner (načinske)
  • introduced by the interrogative adverb ‘kako’
Još nismo saznali [kako se to dogodilo].
We still haven’t found out [how that happened].
• qualitative (kvalitativne)
  • introduced by the interrogative adjectives ‘kakav’, ‘kakva’, ‘kakvo’
Ne znam [kakav si ti to čovjek].
I don’t know [what kind of a person you are].

Interrogative object clauses
• of amount (količinske)
  • introduced by the interrogative adverb ‘koliko’
Znaš li [koliko si već popio]?
Do you know [how much you have already drunk]?
• of cause (uzročne)
  • introduced by interrogative adverbs of cause or prepositional expressions ‘zašto’, ‘zbog čega’, …
Ne razumijem [zašto si zakasnio].
I don’t understand [why you are late].

Declarative object clauses
• introduced by conjunctions
  • ‘da’ (most common)
  • ‘kako’ (less frequent; stylistic variant of ‘da’)
  • ‘gdje’ (extremely rare; stylistically very marked)
Obećao si [da ćeš doći].
You promised [that you would come].
Rekli su [kako ga nije briga].
They said [that he doesn’t care].

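The introducers listed on the preceding slides can be collected into a lookup table, which makes visible why the introducer alone does not decide the clause type. The following is a minimal Python sketch, not the grammar used in this work: the table holds only the words quoted on the slides (for relative clauses, the surface forms from the examples), and the names `INTRODUCERS` and `possible_types` are hypothetical.

```python
# Hypothetical lookup table built only from the words quoted on the slides.
INTRODUCERS = {
    "relative": {"što", "kakvog"},  # surface forms taken from the examples
    "interrogative: general": {"li", "da li", "tko", "koji", "čiji", "što"},
    "interrogative: place": {"kamo"},
    "interrogative: time": {"kad"},
    "interrogative: manner": {"kako"},
    "interrogative: qualitative": {"kakav", "kakva", "kakvo"},
    "interrogative: amount": {"koliko"},
    "interrogative: cause": {"zašto", "zbog čega"},
    "declarative": {"da", "kako", "gdje"},
}


def possible_types(introducer):
    """Return every clause type the given introducer is listed under."""
    word = introducer.lower()
    return sorted(t for t, words in INTRODUCERS.items() if word in words)


print(possible_types("kako"))  # listed as declarative and as interrogative of manner
print(possible_types("što"))   # listed as relative and as general interrogative
print(possible_types("da"))    # listed as declarative only
```

Words such as ‘kako’ and ‘što’ appear under more than one type, so a recognizer needs the surrounding co-text in addition to the introducer.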
Object clauses in Croatian
• have to be preceded by a transitive verb in the active voice
• their function cannot be predicted from their structure alone
(Vidio sam)PRED([da se igra])OBJ.
I saw that he’s playing. (object clause)
(Vidio sam)PRED(ga)OBJ([da se igra])ATTR.
I saw him playing. (adjective clause)
(Izišao je)PRED(van)ADV([da se igra])ADV.
He went out to play. (purpose clause)

Object clauses in Croatian
• can easily be confused with subject clauses
• subject clauses depend on either a nominal predicate or a verbal predicate in the passive voice
(Poznato je)PRED([da pušenje uzrokuje rak])SUBJ.
It is well known that smoking causes cancer.
(Kaže se)PRED([da je bolje spriječiti nego liječiti])SUBJ.
It is said that it is better to be safe than sorry.

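The five examples on these two slides share the same clause shape ([da ...]) and differ only in what precedes it. A minimal Python sketch of that decision, assuming the predicate's type, voice and valency are already known; the function `clause_function` and its parameters are hypothetical and simply restate the examples, they are not the rules of the grammar presented here.

```python
def clause_function(predicate_type="verbal", voice="active",
                    transitive=False, has_direct_object=False):
    """Guess the function of a 'da' clause from its co-text only.

    Simplified restatement of the slide examples (an assumption, not
    the authors' grammar).
    """
    if predicate_type == "nominal" or voice == "passive":
        return "subject"      # Poznato je [da ...] / Kaže se [da ...]
    if transitive and not has_direct_object:
        return "object"       # Vidio sam [da se igra]
    if transitive and has_direct_object:
        return "attribute"    # Vidio sam ga [da se igra]
    return "adverbial"        # Izišao je van [da se igra]


print(clause_function(transitive=True))                          # Vidio sam [da se igra]
print(clause_function(transitive=True, has_direct_object=True))  # Vidio sam ga [da se igra]
print(clause_function(transitive=False))                         # Izišao je van [da se igra]
print(clause_function(predicate_type="nominal"))                 # Poznato je [da ...]
print(clause_function(voice="passive"))                          # Kaže se [da ...]
```

The `has_direct_object` test is deliberately naive: a verb that takes two accusative arguments would already defeat it.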
The model
• can be divided into four parts
  1. the predicate
  2. what can appear between the predicate and the object clause
  3. the object clause
  4. what can appear after the object clause

1. the predicate
2. between the predicate and the clause
3. object clause - conjunctions
3. object clause - body
4. after the object clause

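The four-part model can be illustrated with a small co-text matcher over chunker output. This is a rough Python sketch under several assumptions, not the NooJ grammar itself: the `Chunk` class, its labels, the `transitive` valency flag, the set of chunk labels allowed between predicate and clause, and the choice of sentence-final punctuation as the right boundary are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    label: str                 # hypothetical labels: "VP", "NP", "ADV", "CONJ", "PUNCT"
    transitive: bool = False   # assumed valency flag, meaningful on VP chunks only


INTRODUCERS = {"da", "kako", "gdje"}   # declarative conjunctions from the slides
GAP_LABELS = {"NP", "ADV", "PUNCT"}    # assumption: what may follow the predicate
END_MARKS = {".", "?", "!"}            # assumption: what closes the clause


def find_object_clause(chunks):
    """Return (start, end) chunk indices of the first object clause, or None."""
    for i, chunk in enumerate(chunks):
        # Part 1: the predicate must be a transitive VP.
        if chunk.label != "VP" or not chunk.transitive:
            continue
        # Part 2: skip material allowed between predicate and clause.
        j = i + 1
        while (j < len(chunks) and chunks[j].label in GAP_LABELS
               and chunks[j].text not in END_MARKS):
            j += 1
        # Part 3: the clause must open with a known introducer.
        if j >= len(chunks) or chunks[j].text.lower() not in INTRODUCERS:
            continue
        # Part 4: the clause runs up to whatever may follow it.
        k = j
        while k < len(chunks) and chunks[k].text not in END_MARKS:
            k += 1
        return j, k
    return None


sentence = [Chunk("Obećao si", "VP", transitive=True), Chunk("da", "CONJ"),
            Chunk("ćeš doći", "VP"), Chunk(".", "PUNCT")]
start, end = find_object_clause(sentence)
print(" ".join(c.text for c in sentence[start:end]))  # da ćeš doći
```

The clause body between the introducer and the right boundary is never analysed, which mirrors the idea of describing the co-text rather than the clause structure.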
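Examples
Dodao je ([da približavanje Hrvatske EU ima dvije faze]).
He added [that Croatia’s rapprochement with the EU has two phases].
Pretpostavimo ([da imate visoke demokratske standarde], [da manjine imaju puna prava], [da su medijske slobode savršene])...
Let us suppose [that you have high democratic standards], [that minorities have full rights], [that media freedoms are perfect]...
Zato savjetuje svima koji namjeravaju podići kredite ([da malo pričekaju, ako to mogu]).
(He) therefore advises everyone who intends to take out a loan [to wait a little, if they can].
Odgovarajući na pitanje hoće li na dogovore iz Mokrica djelovati skorašnji slovenski lokalni izbori, Maštruko je rekao ([kako u to ne vjeruje] te [da bi u slučaju kad bi države svaki put čekale ([da prođu izbori]), pregovaranje bilo nemoguće]).
Answering the question whether the upcoming Slovenian local elections would affect the Mokrice agreements, Maštruko said [that he did not believe so] and [that, if states waited every time [for elections to pass], negotiating would be impossible].
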
Problems
• the chunker cannot identify the whole VP
• undisambiguated chunks
• subject clauses
• some verbs can take two arguments in the accusative case
  • ‘pitati’ (to ask), ‘učiti’ (to teach), ...
• adjective clauses, purpose clauses
• identifying the level of subordination
• often a problem beyond syntax
  • rules of orthography
  • proper use of punctuation marks (comma, dash)

Evaluation
• performed in ideal circumstances
  • the predicate is correctly identified (i.e. chunked)
  • information about verb valency is present
• the corpus consists of 174 sentences with 215 object clauses

Evaluation
• low precision
  • BUT correct identification in 91% of the cases
  • the average number of results per clause is 2.15
  • disambiguation!
• high recall
  • confirms the adequacy of the model
  • AND we have identified the critical cases, so improvements can be expected

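Precision and recall over clause boundaries can be computed by comparing predicted spans with gold spans. The Python sketch below shows the standard exact-match definitions on toy data; the span tuples `(sentence_id, start, end)` and the numbers in the example are invented for illustration and are not the evaluation data reported above.

```python
def precision_recall(gold, predicted):
    """Exact-match precision and recall over (sentence_id, start, end) spans."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy data (hypothetical): two gold clauses, five candidate results.
gold = [(0, 3, 8), (1, 2, 6)]
predicted = [(0, 3, 8), (0, 3, 6), (1, 2, 6), (1, 2, 9), (1, 4, 6)]

p, r = precision_recall(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.40 recall=1.00
```

When a grammar returns several candidate spans per clause, the correct span can be among them (high recall) while most candidates are wrong (low precision), which is why a disambiguation step matters.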