Hindi Analysis System
Sunil Kumar Dubey
Indian Institute of Technology Bombay
Format of Discussion
• Enconversion Overview
• Working of Enconverter
• Examples
• Ambiguity resolution
  – Morphological
  – Syntactic
  – Semantic

Enconversion Overview
• Enconverter Engine
• Hindi Analysis Rules
• Dictionary
Morphological Analysis
The study of how words are transformed (inflected), used to extract information such as tense, mood and gender.
• Noun Morphology
• Verb Morphology
• Adjective Morphology
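As a small illustration of adjective morphology, here is a minimal sketch that assumes a plain suffix-stripping approach over a stem list (the slides do not say how the system's morphology is implemented). It uses the standard Hindi adjective endings -ā (masculine singular), -e (masculine plural or oblique) and -ī (feminine). The feature names `MASC`, `SG`, `PL_OR_OBL`, `FEM` and the function `analyse_adjective` are hypothetical. The stem अच्छ appears to correspond to the dictionary headword [AcC] ("good").

```python
from __future__ import annotations

# Hypothetical suffix table for inflecting Hindi adjectives (not the system's rule data).
# Unicode escapes keep the Devanagari vowel signs unambiguous.
ADJ_SUFFIXES = {
    "\u093e": frozenset({"MASC", "SG"}),         # -ā, e.g. अच्छा (masc. sg. direct)
    "\u0947": frozenset({"MASC", "PL_OR_OBL"}),  # -e, e.g. अच्छे (masc. pl. or oblique)
    "\u0940": frozenset({"FEM"}),                # -ī, e.g. अच्छी (feminine)
}


def analyse_adjective(word: str, stems: set[str]) -> tuple[str, frozenset[str]] | None:
    """Split an inflected adjective into a known stem and its gender/number features."""
    for suffix, features in ADJ_SUFFIXES.items():
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and stem in stems:
            return stem, features
    return None  # not an inflected form of any known stem


ACCH = "\u0905\u091a\u094d\u091b"  # the stem अच्छ ("good")
print(analyse_adjective(ACCH + "\u0940", {ACCH}))  # feminine form अच्छी
```

A real analyser would also need verb and noun paradigms (tense, aspect, mood, case), which a flat suffix table like this does not capture.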
Engine Algorithm
1) Start scanning from the left.
2) Pick all morphemes from the dictionary.
3) Choose a rule according to the candidate word.
4) Apply the analysis rule and perform the action dictated by the type of rule.
5) The process ends when only the predicate remains; the output is in UNL format.
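The five steps can be sketched as a loop over a node list. This is a simplified illustration, not the Enconverter's actual implementation: `Node`, `Rule`, `enconvert` and the rule fields are hypothetical, rules only test attribute subsets, and the dependent node is simply dropped from the node list once a relation is created. The usage example reuses words and UWs from the simple-sentence slide.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    """One entry in the node list: a Universal Word plus its attributes."""
    uw: str
    attrs: frozenset[str]


@dataclass
class Rule:
    """Simplified, hypothetical analysis rule (not the real rule-file syntax)."""
    left: frozenset[str]   # attributes required on the left analysis-window node
    right: frozenset[str]  # attributes required on the right analysis-window node
    relation: str          # UNL relation created between the two nodes
    head: str              # "right" or "left": which node survives as the head
    priority: int          # higher-priority rules are tried first


def enconvert(words: list[str], dictionary: dict[str, Node],
              rules: list[Rule]) -> tuple[list[str], Node]:
    """Return the generated relations and the remaining predicate node."""
    nodes = [dictionary[w] for w in words]          # steps 1-2: left-to-right lookup
    ordered = sorted(rules, key=lambda r: -r.priority)
    relations: list[str] = []
    i = 0                                           # index of the left window node
    while len(nodes) > 1:                           # step 5: stop at one node
        if i >= len(nodes) - 1:
            raise ValueError("no rule applies; analysis failed")
        left, right = nodes[i], nodes[i + 1]
        rule = next((r for r in ordered             # step 3: choose a rule
                     if r.left <= left.attrs and r.right <= right.attrs), None)
        if rule is None:
            i += 1                                  # no match: shift window right
            continue
        if rule.head == "right":                    # step 4: apply the rule
            head, dep, dep_index = right, left, i
        else:
            head, dep, dep_index = left, right, i + 1
        relations.append(f"{rule.relation}({head.uw}, {dep.uw})")
        del nodes[dep_index]                        # dependent leaves the node list
        i = max(i - 1, 0)                           # re-examine the new neighbourhood
    return relations, nodes[0]


DICT = {
    "maaohna": Node("Mohan(icl>person)", frozenset({"N", "ANI"})),
    "fuTbaa^la": Node("football", frozenset({"N", "INANI"})),
    "Kola": Node("play(icl>do)", frozenset({"V"})),
}
RULES = [
    Rule(frozenset({"N", "ANI"}), frozenset({"V"}), "agt", "right", 20),
    Rule(frozenset({"N", "INANI"}), frozenset({"V"}), "obj", "right", 10),
]
rels, pred = enconvert(["maaohna", "fuTbaa^la", "Kola"], DICT, RULES)
print(rels, pred.uw)
```

Moving the window back one step after each composition lets a rule that failed earlier (here, Mohan next to football) be retried once the intervening node has been absorbed.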
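Working of Enconverter
[Figure: the Enconverter takes the Analysis Rules and the Dictionary as input and works on a Node List (nodes n(i-1) … n(i+3)), with condition windows (C) and analysis windows (A) over adjacent nodes; composed nodes build up a Node-net.]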
Working of Enconverter Contd…
• Condition window: checks the neighboring nodes on both sides of the analysis window to judge whether an analysis rule is applicable.
• Analysis window: the pair of adjacent nodes to which one of the analysis rules is applied.
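The applicability check can be sketched as follows, assuming the condition window is the single node on each side of the analysis window (n(i-1) and n(i+2)) and that missing neighbors at the sentence edges are represented by sentinel attributes. `rule_applicable`, `SHEAD` and `STAIL` are hypothetical names; the STAIL condition in the example rule on the rule slide suggests a sentence-tail marker, but that reading is an assumption.

```python
from __future__ import annotations

SHEAD, STAIL = "SHEAD", "STAIL"  # hypothetical sentinels for sentence head/tail


def rule_applicable(attrs: list[set[str]], i: int,
                    left: set[str], right: set[str],
                    left_cond: set[str] | None = None,
                    right_cond: set[str] | None = None) -> bool:
    """Check the analysis window (nodes i, i+1) and the condition window
    (nodes i-1 and i+2) against a rule's required attributes.

    attrs holds the attribute set of every node in the node list; a condition
    of None means that side of the condition window is unconstrained."""
    padded = [{SHEAD}] + attrs + [{STAIL}]  # sentinels stand in for missing neighbors
    j = i + 1                                # position of node i inside `padded`
    checks = [
        (left_cond, padded[j - 1]),
        (left, padded[j]),
        (right, padded[j + 1]),
        (right_cond, padded[j + 2]),
    ]
    return all(req is None or req <= node for req, node in checks)


# Noun + verb at the end of the sentence: the right condition (STAIL) holds.
print(rule_applicable([{"N", "ANI"}, {"V"}], 0, {"N", "ANI"}, {"V"}, right_cond={STAIL}))
# A further node after the verb: the same rule is rejected.
print(rule_applicable([{"N", "ANI"}, {"V"}, {"AUX"}], 0, {"N", "ANI"}, {"V"}, right_cond={STAIL}))
```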
Dictionary
Format: [Headword] {} "Universal word" (Attribute list) <Flags>;
[QaIro] {} "slow(icl>how)" (ADV,MAN) <H,0,0>;
[AcC] {} "good(aoj>thing)" (ADJ,AdjA,QUAL) <H,0,0>;
[Ka] {} "eat(icl>do)" (V,VINT,VA) <H,0,0>;
[jaapana] {} "Japan(icl>place)" (N,P,PLACE,INANI,3SG) <H,0,0>;
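Because every entry follows the same pattern, it can be parsed with one regular expression. This is a sketch assuming straight double quotes around the UW and no `]`, `)` or `>` inside the headword, attribute list or flags; `parse_entry` is a hypothetical helper. The slide calls the angle-bracket fields simply "flags"; in the UNL dictionary format they are usually read as language flag, frequency and priority, which is treated as an assumption here.

```python
from __future__ import annotations

import re

# [headword] {id} "universal word" (attr,...) <flag,...>;
ENTRY = re.compile(
    r'\[(?P<headword>[^\]]*)\]\s*\{(?P<id>[^}]*)\}\s*'
    r'"(?P<uw>[^"]*)"\s*\((?P<attrs>[^)]*)\)\s*<(?P<flags>[^>]*)>\s*;'
)


def parse_entry(line: str) -> dict[str, object]:
    """Parse one dictionary line into headword, UW, attribute list and flags."""
    m = ENTRY.search(line)
    if m is None:
        raise ValueError(f"not a dictionary entry: {line!r}")
    return {
        "headword": m["headword"].strip(),  # e.g. "[jaapana ]" -> "jaapana"
        "uw": m["uw"],
        "attrs": [a.strip() for a in m["attrs"].split(",") if a.strip()],
        "flags": [f.strip() for f in m["flags"].split(",")],
    }


entry = parse_entry('[jaapana ] {} "Japan(icl>place)" (N,P,PLACE,INANI,3SG) <H,0,0>;')
print(entry["uw"], entry["attrs"])
```

The UW itself may contain `>` (as in `icl>place`); this is safe because the UW is matched only up to its closing quote.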
What is a rule?
For example: Syaama jaata hO. (Shyam goes.)
> {N,ANI : : agt :}{V,^AGTRES :+AGTRES : :} (STAIL)P20
• Rule type: >
• Left analysis window: {N,ANI : : agt :}
• Right analysis window: {V,^AGTRES :+AGTRES : :}
• Condition window: (STAIL)
• Priority: P20
A semantic relation (here agt) can be generated by this rule.
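The labeled parts of the rule can be pulled apart mechanically. This sketch assumes each analysis window holds four colon-separated fields (condition, action, relation, and a fourth field that is empty in this rule), that `^` marks an attribute that must be absent, and that `+` marks an attribute to add. These readings are inferred from the single example rule, not from a specification; `parse_rule`, `parse_window`, `Window` and `matches` are hypothetical names.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# rule type, optional left condition, two analysis windows, optional right condition, priority
RULE = re.compile(
    r"^\s*(?P<type>[^\s{(]+)\s*"
    r"(?:\((?P<lcond>[^)]*)\)\s*)?"
    r"\{(?P<left>[^}]*)\}\s*\{(?P<right>[^}]*)\}\s*"
    r"(?:\((?P<rcond>[^)]*)\)\s*)?"
    r"P(?P<priority>\d+)\s*;?\s*$"
)


@dataclass
class Window:
    condition: list[str]  # required attributes; "^X" means X must be absent
    action: str           # e.g. "+AGTRES": attribute added to the node
    relation: str         # UNL relation label created, e.g. "agt"
    extra: str            # fourth field (empty in the example rule)


def parse_window(text: str) -> Window:
    fields = [f.strip() for f in text.split(":")]
    fields += [""] * (4 - len(fields))  # pad if trailing fields are omitted
    cond, action, relation, extra = fields[:4]
    return Window([c.strip() for c in cond.split(",") if c.strip()], action, relation, extra)


def parse_rule(line: str) -> dict[str, object]:
    m = RULE.match(line)
    if m is None:
        raise ValueError(f"not an analysis rule: {line!r}")
    return {
        "type": m["type"],
        "left": parse_window(m["left"]),
        "right": parse_window(m["right"]),
        "lcond": m["lcond"],
        "rcond": m["rcond"],
        "priority": int(m["priority"]),
    }


def matches(condition: list[str], attrs: set[str]) -> bool:
    """True if attrs satisfy every positive and negated (^) requirement."""
    return all((c[1:] not in attrs) if c.startswith("^") else (c in attrs) for c in condition)


rule = parse_rule("> {N,ANI : : agt :}{V,^AGTRES :+AGTRES : :} (STAIL)P20")
print(rule["type"], rule["left"], rule["right"], rule["rcond"], rule["priority"])
```

Read this way, the rule links an animate noun to a verb that does not yet carry AGTRES, creates agt, and adds AGTRES to the verb, so a second agent cannot be attached to the same verb.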
Simple sentence
maaohna maOdana maoM Syaama ko saaqa fuTbaa^la Kola rha hO.
(Mohan is playing football with Shyam in the field.)
plc(play(icl>do).@entry.@present.@progress, field(icl>ground))
agt(play(icl>do).@entry.@present.@progress, Mohan(icl>person))
cag(play(icl>do).@entry.@present.@progress, Shyam(icl>person))
obj(play(icl>do).@entry.@present.@progress, football)
[Figure: play (@entry) linked to field by plc, to Mohan by agt, to Shyam by cag, and to football by obj.]
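The relation lines can be read back into a graph. In this sketch, assuming one relation per line in the form `label(uw1, uw2)` with attributes attached as `.@name`, the arguments are split at the first comma outside parentheses (UWs such as `country(syn>nation,equ>team)` contain commas), and the `@entry` node is recorded. `build_graph` and its helpers are hypothetical, and scope IDs such as `:01` are ignored here.

```python
from __future__ import annotations

import re

# label, optional scope ID (e.g. obj:01), then the argument list in parentheses
REL = re.compile(r"^(?P<label>[a-z]+)(?::(?P<scope>\w+))?\((?P<args>.*)\)$")


def split_top_level(args: str) -> tuple[str, str]:
    """Split 'uw1, uw2' at the first comma not nested inside parentheses."""
    depth = 0
    for k, ch in enumerate(args):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return args[:k].strip(), args[k + 1:].strip()
    raise ValueError(f"no top-level comma in {args!r}")


def parse_node(text: str) -> tuple[str, list[str]]:
    """'play(icl>do).@entry.@present' -> ('play(icl>do)', ['entry', 'present'])."""
    uw, *attrs = text.split(".@")
    return uw, attrs


def build_graph(lines: list[str]) -> tuple[str | None, list[tuple[str, str, str]]]:
    """Return the @entry UW and (from, label, to) edges; scope IDs are ignored here."""
    entry, edges = None, []
    for line in lines:
        m = REL.match(line.strip())
        if m is None:
            raise ValueError(f"not a UNL relation: {line!r}")
        (uw1, attrs1), (uw2, attrs2) = map(parse_node, split_top_level(m["args"]))
        edges.append((uw1, m["label"], uw2))
        for uw, attrs in ((uw1, attrs1), (uw2, attrs2)):
            if "entry" in attrs:
                entry = uw
    return entry, edges


SIMPLE = [
    "plc(play(icl>do).@entry.@present.@progress, field(icl>ground))",
    "agt(play(icl>do).@entry.@present.@progress, Mohan(icl>person))",
    "cag(play(icl>do).@entry.@present.@progress, Shyam(icl>person))",
    "obj(play(icl>do).@entry.@present.@progress, football)",
]
entry, edges = build_graph(SIMPLE)
print(entry, edges)
```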
Clausal Sentence: Noun Clause
maOMnao doKa ik maaohna iktaba pZ, rha hO.
(I saw that Mohan is reading a book.)
agt(see(icl>event).@entry.@past, I(icl>person))
obj(see(icl>event).@entry.@past, :01)
obj:01(read(icl>do).@entry.@present.@progress, book)
agt:01(read(icl>do).@entry.@present.@progress, Mohan(icl>person))
[Figure: see (@entry) has agt I and obj :01; inside scope :01, read (@entry) has agt Mohan and obj book.]
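In the clausal output, the relations tagged `:01` form a compound node that the main clause refers to as `:01`, and each scope has its own `@entry`. A sketch of grouping relations by scope, under the same line-format assumptions as a simple relation parser; `group_by_scope` and `split_args` are hypothetical helpers.

```python
from __future__ import annotations

import re

REL = re.compile(r"^(?P<label>[a-z]+)(?::(?P<scope>\w+))?\((?P<args>.*)\)$")


def split_args(args: str) -> list[str]:
    """Split the two arguments at the first comma outside parentheses."""
    depth = 0
    for k, ch in enumerate(args):
        if ch in "()":
            depth += 1 if ch == "(" else -1
        elif ch == "," and depth == 0:
            return [args[:k].strip(), args[k + 1:].strip()]
    raise ValueError(f"no top-level comma in {args!r}")


def group_by_scope(lines: list[str]) -> dict[str | None, dict]:
    """Map each scope ID (None = top level) to its relations and its @entry UW."""
    scopes: dict[str | None, dict] = {}
    for line in lines:
        m = REL.match(line.strip())
        if m is None:
            raise ValueError(f"not a UNL relation: {line!r}")
        scope = scopes.setdefault(m["scope"], {"relations": [], "entry": None})
        a, b = split_args(m["args"])
        scope["relations"].append((m["label"], a, b))
        for arg in (a, b):
            uw, *attrs = arg.split(".@")
            if "entry" in attrs:
                scope["entry"] = uw  # each scope has its own entry node
    return scopes


CLAUSAL = [
    "agt(see(icl>event).@entry.@past, I(icl>person))",
    "obj(see(icl>event).@entry.@past, :01)",
    "obj:01(read(icl>do).@entry.@present.@progress, book)",
    "agt:01(read(icl>do).@entry.@present.@progress, Mohan(icl>person))",
]
for scope_id, info in group_by_scope(CLAUSAL).items():
    print(scope_id, info["entry"], [label for label, _, _ in info["relations"]])
```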
Long Sentence
[sa ]_oSya ko ilae, Aa[- TI yaU ek bahupxaIya gaaoYzI p`dana krtI hO jahaÐ sarkarI AaOr gaOr–sarkarI saMsqaaeÐ AapsaI ihtaoM ko xao~aoM mao samaJaaOtaoM pr baatcaIt krnao ko ilae imala sakoM AaOr eosao maanadNDaoM kao gaZ, sako jaao dUrsaMcaar saMsqaanaaoM ko inaiva-Qna pircaalana kao sauinaiScat kroM AaOr saBaI doSaaoM maoM [nakI phuÐca kao baZ,avaa do sakoM.
(For this purpose, the ITU provides a multilateral forum where governmental and non-governmental institutions can meet to discuss agreements in areas of mutual interest, and can forge standards that ensure the smooth operation of telecommunication resources and promote their access in all countries.)
obj(provide(icl>do).@entry.@present, forum(icl>seminar))
pur(provide(icl>do).@entry.@present, purpose(icl>intention))
aoj(provide(icl>do).@entry.@present, ITU(icl>International Telecommunication Union))
mod(purpose(icl>intention), this:00)
scn(forge.@past.@ability, forum(icl>seminar))
qua(forum(icl>seminar), one)
Long Sentence Contd…
aoj(multilateral, forum(icl>seminar))
obj(forge.@past.@ability, standard(icl>measure).@pl)
obj(forge.@past.@ability, meet(icl>event).@past.@ability)
aoj(meet(icl>event).@past.@ability, institute(icl>facilities))
pur(meet(icl>event).@past.@ability, discuss(icl>talk))
obj(discuss(icl>talk), agreement(icl>pact).@pl)
scn(agreement(icl>pact).@pl, field(icl>category).@pl)
mod(field(icl>category).@pl, benefit(icl>advantage).@pl)
mod(benefit(icl>advantage).@pl, mutual(icl>))
mod(institute(icl>facilities), government)
and(private, government)
aoj(ensure, standard(icl>measure).@pl)
mod(standard(icl>measure).@pl, such)
and(promote(icl>do).@past.@ability, ensure)
Long Sentence Contd…
obj(ensure, operation(icl>action))
mod(operation(icl>action), resource(icl>abstract thing).@pl)
aoj(smooth, operation(icl>action))
mod(resource(icl>abstract thing).@pl, telecommunication(icl>communication))
obj(promote(icl>do).@past.@ability, access(icl>))
scn(access(icl>), country(syn>nation,equ>team).@pl)
mod(access(icl>), these)
aoj(all(icl>quantity), country(syn>nation,equ>team).@pl)
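Inclusion of Tags
• To clarify the syntactic structure of a sentence: Syaama nao Kato hue baccao kao doKa. (Shyam saw the child eating; either Shyam or the child may be the one eating.)
• To clarify the role of a component of a sentence: Aapkao imaza[- iKlaanaI pD,ogaI. (Literally "to you, sweets will have to be fed"; "you" may be the one giving the sweets or the one receiving them.)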
Syntax Structure Tags
• <s> </s>: sentence start and sentence end
• <p> </p>: phrase start and phrase end
• <c> </c>: conjunction start and conjunction end
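A sketch of how paired syntax tags could be turned into nested groups before analysis, assuming tags are separated from words by whitespace, as in the phrase-tag example. `parse_tags` is a hypothetical helper; the stack check rejects mismatched or unclosed tags.

```python
from __future__ import annotations

OPEN = {"<s>": "s", "<p>": "p", "<c>": "c"}
CLOSE = {"</s>": "s", "</p>": "p", "</c>": "c"}


def parse_tags(text: str) -> list:
    """Turn whitespace-separated tokens and tags into nested (tag, children) groups."""
    root: tuple[str, list] = ("root", [])
    stack = [root]
    for tok in text.split():
        if tok in OPEN:
            group = (OPEN[tok], [])
            stack[-1][1].append(group)  # attach the new group to the enclosing one
            stack.append(group)
        elif tok in CLOSE:
            if stack[-1][0] != CLOSE[tok]:
                raise ValueError(f"unbalanced tag {tok}")
            stack.pop()
        else:
            stack[-1][1].append(tok)  # ordinary word
    if len(stack) != 1:
        raise ValueError(f"unclosed tag <{stack[-1][0]}>")
    return root[1]


print(parse_tags("Syaama nao <p> Kato hue baccao kao </p> doKa."))
```

Treating the grouped words as one unit is what keeps "Kato hue" (eating) attached to "baccao" (child) rather than to the main verb.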
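Phrase Tag
Syaama nao Kato hue baccao kao doKa.
[Figure: see (@entry) with agt Shyam and obj child, linked by coo to eat: Shyam, while eating, saw the child.]
Syaama nao <p> Kato hue baccao kao </p> doKa.
[Figure: see (@entry) with agt Shyam and obj child, where child is the agt of eat: Shyam saw the child who was eating.]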
Role Component Tags
• <&[part of speech]>: specifies the part of speech
• <#[UW][.attribute]>: specifies the UW and/or attribute
• <-[relation]>: specifies the relation
Relation Tag
Aapkao <-ben> imaza[- iKlaanaI pD,ogaI.
[Figure: give (@entry) with ben you and obj sweet: "you" receives the sweets.]
Aapkao <-agt> imaza[- iKlaanaI pD,ogaI.
[Figure: give (@entry) with agt you and obj sweet: "you" gives the sweets.]
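A sketch of reading role component tags out of the input, assuming (as in the relation-tag examples) that a tag annotates the word immediately before it and that tags contain no spaces. `attach_role_tags` and the keys `pos`, `uw`, `relation` are hypothetical; the regex lets a UW such as `give(icl>do)` contain `>`.

```python
from __future__ import annotations

import re

# <&POS>, <#UW.attribute>, <-relation>; ".+" lets UWs such as give(icl>do) contain '>'
ROLE_TAG = re.compile(r"^<(?P<kind>[&#-])(?P<value>.+)>$")
KINDS = {"&": "pos", "#": "uw", "-": "relation"}


def attach_role_tags(text: str) -> list[tuple[str, dict[str, str]]]:
    """Attach each role tag to the word that precedes it (assumed placement)."""
    words: list[tuple[str, dict[str, str]]] = []
    for tok in text.split():
        m = ROLE_TAG.match(tok)
        if m is None:
            words.append((tok, {}))  # ordinary word (syntax tags like </p> also land here)
        elif not words:
            raise ValueError(f"role tag {tok} has no preceding word")
        else:
            words[-1][1][KINDS[m["kind"]]] = m["value"]
    return words


print(attach_role_tags("Aapkao <-ben> imaza[- iKlaanaI pD,ogaI."))
```

An analyser could then restrict rule selection so that only rules producing the tagged relation (here ben rather than agt) may attach "Aapkao" to the verb.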
Conclusion
• Handles all the relation labels in the UNL specification.
• Can deal with simple, clausal and interrogative sentences.
• We have handled different corpora, e.g. the Agriculture corpus and the ITU corpus.
• There are around 6000 rules in the rule file.