On the Ambiguity of Serbian Texts and Methods to disambiguate it
Cvetana Krstev, Duško Vitas, University of Belgrade
8th Intex/Nooj Workshop
What is the ambiguity?
• the assignment of different lemmas
• the assignment of different grammatical categories
The ambiguity in Serbian
In Serbian, many word forms are homographs without being homophones, because stress marks are not recorded in writing. Example: gore
• gőre, adv.: up
• gőrē, adv.: worse
• gòrē (P3s), gòre (A3s): goreti, V+Ek, to burn
• gòrē (P3s), gòre (A3s): gorjeti, V+Ijk, to burn
• gòre (fs2): gora, forest
The ambiguity in Serbian (2)
rodoslovna,rodoslovni.A2+PosQ:akms2g:akms4v:aefs1g:aefs5g:akns2g:aenp1g:aenp4g:aenp5g
rodoslovne,rodoslovni.A2+PosQ:aemp4g:aefs2g:aefp1g:aefp4g:aefp5g
rodoslovni,rodoslovni.A2+PosQ:adms1g:aems4q:aems5g:aemp1g:aemp5g
rodoslovnih,rodoslovni.A2+PosQ:aemp2g:aefp2g:aenp2g
rodoslovnim,rodoslovni.A2+PosQ:aems6g:aemp3g:aemp6g:aemp7g:aefp3g:aefp6g:aefp7g:aens6g:aenp3g:aenp6g:aenp7g
rodoslovnima,rodoslovni.A2+PosQ:aemp3g:aemp6g:aemp7g:aefp3g:aefp6g:aefp7g:aenp3g:aenp6g:aenp7g
rodoslovno,rodoslovni.A2+PosQ:aens1g:aens4g:aens5g
rodoslovnog,rodoslovni.A2+PosQ:adms2g:adms4v:adns2g
rodoslovnoga,rodoslovni.A2+PosQ:adms2g:adms4v:adns2g
rodoslovnoj,rodoslovni.A2+PosQ:aefs3g:aefs7g
rodoslovnom,rodoslovni.A2+PosQ:adms3g:adms7g:aefs6g:adns3g:adns7g
…
← 9 sets of grammatical categories
e: the form is the same for definite and indefinite
g: the form is the same for animate and inanimate
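The entries above can be read mechanically. The following is a minimal sketch, assuming the layout `form,lemma.CODE+marker:category:category` that the listed entries appear to follow, and assuming that an empty lemma (as in `da,.CONJ`) means the lemma equals the form. The names `Entry`, `parse_entry` and `ambiguity` are hypothetical helpers, not part of Intex or NooJ.

```python
from collections import defaultdict
from typing import NamedTuple


class Entry(NamedTuple):
    form: str
    lemma: str
    pos: str           # inflection/POS code, e.g. "A2"
    markers: tuple     # e.g. ("PosQ",)
    categories: tuple  # e.g. ("aefs3g", "aefs7g")


def parse_entry(line: str) -> Entry:
    """Split one dictionary line into its parts (assumed layout, see above)."""
    form, rest = line.strip().split(",", 1)
    lemma, info = rest.split(".", 1)
    head, *categories = info.split(":")
    pos, *markers = head.split("+")
    # Assumption: an empty lemma field stands for "same as the form".
    return Entry(form, lemma or form, pos, tuple(markers), tuple(categories))


def ambiguity(lines):
    """Per word form: (number of distinct lemmas, number of readings)."""
    lemmas = defaultdict(set)
    readings = defaultdict(int)
    for line in lines:
        entry = parse_entry(line)
        lemmas[entry.form].add(entry.lemma)
        # An entry without category codes still counts as one reading.
        readings[entry.form] += max(1, len(entry.categories))
    return {form: (len(lemmas[form]), readings[form]) for form in lemmas}


SAMPLE = [
    "rodoslovnoj,rodoslovni.A2+PosQ:aefs3g:aefs7g",
    "rodoslovnog,rodoslovni.A2+PosQ:adms2g:adms4v:adns2g",
]
print(ambiguity(SAMPLE))
```

Counting both lemmas and category sets mirrors the two kinds of ambiguity named on the earlier slide: different lemmas and different grammatical categories.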
Disambiguation process
• Reconstructing word forms
• Using filter dictionaries
• Using restricted dictionaries
• Using dictionaries of compounds
• Using disambiguation grammars
Reconstructing word forms – date adverbial phrases (2)
i izdavanxem YUBA kartica 20. februara 2002. godine.
celog sistema. Zato je josx pocyetkom 1996. godine jedan i
www.plivamed.net. U petom mjesecu 2001.godine smo oformlx
cxe biti odrzxan u novembru ove godine u Neumu, a za prvog
Reconstructing word forms – forms written with digits (2)
sxkovi iznosili oko 500 hilxada maraka. Znacyajna usxteda
poput SAP-ovog ili IBM-ovog, dobijate i organizaciju firme
cyelicyne industrije 1890-ih nije postojao. Ali, poznata je
sveta drma tezxinom od 81,7 milijardi dolara u 160 zemalxa,
odnosno ukupno bezmalo pola milijarde (464 miliona)! Predxe
Using filter dictionaries
mi,ja.PRO01+Prs:sx3i
mi,mi.PRO03+Prs:px1r
mi,miti.V35+Imperf+Tr+Iref+Ref:Ays:Azs
li,li.PAR
li,liti.V98+Imperf+Tr+It+Iref:Ays:Azs
Using filter dictionaries (2)
A very cautious filter dictionary with only 41 entries.
Using restricted dictionaries
• Dictionaries contain lemmas for both standard pronunciations, Ekavian and Ijekavian. Texts, however, are usually written in only one of them.
• Dictionaries contain lemmas for both the Serbian and the Croatian language (or variants of Serbo-Croatian).
Using restricted dictionaries (2)
crvene,crven.A17+Col:aemp4g:aefs2g:aefp1g:aefp4g:aefp5g
crvene,crveneti.V547+Imperf+It+Iref+Ref+Ek:Pzp:Ays:Azs
crvene,crveniti.V54+Imperf+Tr+Iref:Pzp
crvene,crvenxeti.V747+Imperf+It+Iref+Ref+Ijk:Pzp
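Restricting the dictionary to one pronunciation can be sketched as a filter over entry lines. This is a minimal illustration, assuming the `+Ek` and `+Ijk` markers seen in the entries above label Ekavian and Ijekavian lemmas, and that entries without either marker are valid in both. The function names are hypothetical.

```python
def markers(entry_line: str) -> set:
    """Return the '+' markers of an entry such as 'form,lemma.V54+Imperf+Tr:Pzp'."""
    info = entry_line.split(",", 1)[1].split(".", 1)[1]
    head = info.split(":", 1)[0]
    return set(head.split("+")[1:])


def restrict(entry_lines, text_variant: str):
    """Drop entries marked for the pronunciation the text does NOT use."""
    other = {"Ek": "Ijk", "Ijk": "Ek"}[text_variant]
    return [line for line in entry_lines if other not in markers(line)]


CRVENE = [
    "crvene,crven.A17+Col:aemp4g:aefs2g:aefp1g:aefp4g:aefp5g",
    "crvene,crveneti.V547+Imperf+It+Iref+Ref+Ek:Pzp:Ays:Azs",
    "crvene,crveniti.V54+Imperf+Tr+Iref:Pzp",
    "crvene,crvenxeti.V747+Imperf+It+Iref+Ref+Ijk:Pzp",
]

# For an Ekavian text the Ijekavian lemma "crvenxeti" is no longer a candidate.
for line in restrict(CRVENE, "Ek"):
    print(line)
```

For this form the restriction removes one of the four candidate lemmas. The remaining ambiguity has to be handled by the later steps.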
Using dictionary of compounds
bez obzira na,bez obzira na.PREP+C+Ncn+p4
bez,bez.PREP+p2
na,na.INT
na,na.PREP+p4+p7
obzira,obzir.N1:ms2q:mp2q
obzira,obzirati.V519+Imperf+It+Ref:Ays:Azs
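One way to picture the effect of a compound dictionary is a longest-match lookup that prefers a compound entry over the readings of its parts. The sketch below assumes this priority, which the slide suggests but does not spell out. The function `tag` and the `max_len` parameter are hypothetical.

```python
COMPOUNDS = {
    ("bez", "obzira", "na"): ["bez obzira na.PREP+C+Ncn+p4"],
}
SIMPLE = {
    "bez": ["bez.PREP+p2"],
    "na": ["na.INT", "na.PREP+p4+p7"],
    "obzira": ["obzir.N1:ms2q:mp2q", "obzirati.V519+Imperf+It+Ref:Ays:Azs"],
}


def tag(tokens, compounds=COMPOUNDS, simple=SIMPLE, max_len=4):
    """Greedy longest-match: a recognized compound hides its components."""
    out, i = [], 0
    while i < len(tokens):
        # Try the longest candidate span first, down to two tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            key = tuple(tokens[i:i + n])
            if key in compounds:
                out.append((" ".join(key), list(compounds[key])))
                i += n
                break
        else:
            # No compound starts here: fall back to the simple-word readings.
            out.append((tokens[i], list(simple.get(tokens[i], []))))
            i += 1
    return out


print(tag(["bez", "obzira", "na"]))
print(tag(["na", "obzira"]))
```

In the first call the three tokens collapse into one unambiguous preposition. In the second call no compound applies, so `na` and `obzira` keep two readings each.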
Using disambiguation grammars – positional constraint
A word form is an interjection if it is followed by an exclamation mark.
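The positional constraint can be sketched as a rule over a token sequence in which every token carries its candidate readings. This is an illustration only, not the NooJ grammar itself. The reading strings reuse the `lemma.TAG` layout of the dictionary entries, and `apply_interjection_rule` is a hypothetical name.

```python
def pos_of(reading: str) -> str:
    """'na.PREP+p4+p7' -> 'PREP'."""
    return reading.split(".", 1)[1].split(":", 1)[0].split("+", 1)[0]


def apply_interjection_rule(tokens):
    """tokens: list of (form, readings). Keep only INT before '!'."""
    out = []
    for i, (form, readings) in enumerate(tokens):
        followed_by_bang = i + 1 < len(tokens) and tokens[i + 1][0] == "!"
        interjections = [r for r in readings if pos_of(r) == "INT"]
        # The rule fires only if an interjection reading exists at all.
        if followed_by_bang and interjections:
            readings = interjections
        out.append((form, list(readings)))
    return out


NA = ["na.INT", "na.PREP+p4+p7"]
print(apply_interjection_rule([("na", NA), ("!", [])]))
print(apply_interjection_rule([("na", NA), ("sto", [])]))
```

When the next token is not an exclamation mark, the rule leaves all readings in place, so it can only reduce ambiguity where its condition holds.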
Using disambiguation grammars – positional constraint (2)
After a sentence or phrase boundary, “mi” and “ti” are personal pronouns in the nominative case (after other possibilities have been excluded).
Using disambiguation grammars – sequential constraint
“da” is a conjunction (and not a form of the verb dati, “to give”) if it is followed by an auxiliary verb in clitic form.
Using disambiguation grammars – sequential and positional constraints
sxargarepe evropska unija ne samo da je prihvatila nasxu i
da,.CONJ
da,.ADV
da,.INT
da,.PAR
da,dati.V103+Perf+Tr+Iref+Ref:Pzs:Ays:Azs
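The sequential constraint on “da” can be sketched as a look-ahead of one token. The clitic list below is an assumption for illustration: it contains only the present-tense clitic forms of biti (sam, si, je, smo, ste, su) and is not taken from the authors' grammar. `disambiguate_da` is a hypothetical name.

```python
# Illustrative subset of auxiliary clitics (present clitic forms of "biti").
AUX_CLITICS = {"sam", "si", "je", "smo", "ste", "su"}

DA_READINGS = [
    "da,.CONJ",
    "da,.ADV",
    "da,.INT",
    "da,.PAR",
    "da,dati.V103+Perf+Tr+Iref+Ref:Pzs:Ays:Azs",
]


def pos_of(entry: str) -> str:
    """'da,dati.V103+Perf:Pzs' -> 'V103'; 'da,.CONJ' -> 'CONJ'."""
    info = entry.split(",", 1)[1].split(".", 1)[1]
    return info.split(":", 1)[0].split("+", 1)[0]


def disambiguate_da(tokens, readings=DA_READINGS):
    """Map each position of 'da' to the readings that survive the rule."""
    result = {}
    for i, token in enumerate(tokens):
        if token != "da":
            continue
        following = tokens[i + 1] if i + 1 < len(tokens) else None
        if following in AUX_CLITICS:
            result[i] = [r for r in readings if pos_of(r) == "CONJ"]
        else:
            result[i] = list(readings)  # rule does not apply, keep everything
    return result


tokens = "evropska unija ne samo da je prihvatila nasxu".split()
print(disambiguate_da(tokens))
```

In the concordance line above, “da” is followed by the clitic “je”, so only the conjunction reading remains out of five.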
Using disambiguation grammars – agreement
An adjective, possessive pronoun or numeral has to agree in gender, number, and case with the noun that follows.
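Agreement can be sketched as an intersection of the gender, number and case values of the two words. The code below assumes a positional reading of the category codes that is inferred from the entries in this document, not stated in it: in an adjective code such as `aefs2g`, characters 3 to 5 give gender, number and case, and in a noun code such as `fs2` or `ms7q` the first three characters do. The example pairs the listed readings of “crvene” with the single noun reading of “gore” (fs2) given earlier. All function names are hypothetical.

```python
def adj_key(code: str) -> str:
    """'aefs2g' -> 'fs2' (assumed layout: degree, definiteness, g, n, case, animacy)."""
    return code[2:5]


def noun_key(code: str) -> str:
    """'ms7q' -> 'ms7'; 'fs2' -> 'fs2' (assumed layout: g, n, case, animacy)."""
    return code[:3]


def agree(adj_codes, noun_codes):
    """Keep only adjective and noun readings that share gender, number, case."""
    shared = {adj_key(a) for a in adj_codes} & {noun_key(n) for n in noun_codes}
    if not shared:
        # No agreeing pair: leave both words untouched rather than guess.
        return list(adj_codes), list(noun_codes)
    return (
        [a for a in adj_codes if adj_key(a) in shared],
        [n for n in noun_codes if noun_key(n) in shared],
    )


crvene = ["aemp4g", "aefs2g", "aefp1g", "aefp4g", "aefp5g"]
gore = ["fs2"]
print(agree(crvene, gore))
```

With only these readings, the five adjective categories reduce to the one that matches the noun.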
Using disambiguation grammars – agreement (2)
povecxati nxegov proboj u regionu. Rumunska proporcija
u,.PREP+p2
u,.PREP+p4
u,.PREP+p7
regionu,region.N1:ms3q
regionu,region.N1:ms7q
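For the example above, the same idea can be applied to a preposition and the noun it governs. This is a minimal sketch, assuming that the digit in `p2`, `p4`, `p7` is the case the preposition requires and that the digit in `ms3q`, `ms7q` is the case of the noun. The helper names are hypothetical.

```python
import re


def case_of(code: str):
    """Return the first digit in a code such as 'p7' or 'ms7q', else None."""
    match = re.search(r"\d", code)
    return match.group() if match else None


def agree_prep_noun(prep_codes, noun_codes):
    """Keep only the readings whose case values match on both sides."""
    shared = {case_of(p) for p in prep_codes} & {case_of(n) for n in noun_codes}
    shared.discard(None)
    if not shared:
        return list(prep_codes), list(noun_codes)
    return (
        [p for p in prep_codes if case_of(p) in shared],
        [n for n in noun_codes if case_of(n) in shared],
    )


u = ["p2", "p4", "p7"]
regionu = ["ms3q", "ms7q"]
print(agree_prep_noun(u, regionu))
```

Under these assumptions the pair “u regionu” keeps one reading per word: case 7 for both the preposition and the noun.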
Using disambiguation grammars – agreement of personal names
Special rules for the agreement of first name and surname.
Using disambiguation grammars – agreement (2)
raspalio je Mladxan Dinkicx sxakom o okrugli sto "Platne kartice -
Mladxan,Mladxan.N1002+Hum+NProp+First+SR:ms1v
Mladxan,mladxan.A7:akms1g:akms4q
Dinkicx,Dinkicx.N28+NProp+Hum+Last+SR:ms1v
The order of grammar application
← Apply first    Apply second →
Careful construction of grammars
Syntactic ambiguity:
Zalagacxu se da ti trosxkovi budu minimalni.
• I will do my best to minimize these expenses.
• I will do my best to minimize your expenses.
Although some cases are much more frequent...
Kličke je bio voljan da da automobil.
Klicke was willing to give the car.
Mislio sam da ti tvoja gospođa ne da da je viđaš.
I thought that your wife would not let you see her.