400 likes | 595 Views
Partial Dependency Parsing for Irish. Elaine Uí Dhonnchadha & Josef Van Genabith. Aims of the Research. To be able to parse and/or chunk unrestricted Irish text To account for as much of the syntactic phenomena of Irish as possible in an efficient and principled way
E N D
Partial Dependency Parsing for Irish Elaine Uí Dhonnchadha & Josef Van Genabith
Aims of the Research • To be able to parse and/or chunk unrestricted Irish text • To account for as much of the syntactic phenomena of Irish as possible in an efficient and principled way • To use open-source software a far as possible
Outline of the Talk • Background • Stages of Development for Dependency Parser • Chunker • Future Work
Irish Language – some facts • Celtic Language • Goidelic (Irish, Manx, Scottish Gaelic) • Brittonic (Breton, Cornish, Welsh) • Verb – Subject – Object sentence word order • Chaith Seán an liathróid.Threw Seán the ball.V S O‘Seán threw the ball’ • Fixed word order
Irish Language • Inflectional language • gender: fem/masc • case: common/genitive/vocative • verbs inflected for number and person • chuala mé, I heard (analytic) • chualamar, we heard (synthetic) • Initial mutation of words • cailín ‘girl’, an chailín ‘the girl’ • arán ‘bread’, an t-arán ‘the bread’ • seachtain ‘week’, an tseachtain ‘the week’ • bord ‘table’, ar an mbord ‘on the table’
Irish Language • Prepositions inflected for person and number. • Labhair sé liom faoiSpoke he with-me about-it‘He spoke to me about it’ • Tabhair dom éGive to-me it‘Give it to me’ • Full paradigm for every preposition • liom ‘with-me’leat ‘with-you’leis ‘with-him/it ETC. ETC.
Irish Language • Verbal noun - used in progressives, perfects, infinitives, etc. • De-verbal nouns: bris(v) ‘break’, briseadh(vn) ‘breaking • De-agentive nouns: feirmeoir(n) ‘farmer’, feirmeoireacht(vn) ‘farming’ • Progressive • Tá mé agoscailtan doraisIs he at opening(vn) the door(gen)‘He is opening the door’ • After Perfect • Tá mé tar_éis an dorasaoscailtIs me after the door PRT opening(vn) ‘I am after opening the door’
Parsing Methodology • Dependency Analysis & Constituency Analysis • Dependency Analysis • Relationships between pairs of words • Grammatical Functions and Head-Modifier dependencies • Root and terminal nodes • Constituency Analysis • Phrase Structure Rules, e.g. S = NP VP • Hierarchical structure; root, phrase categories, leaf/terminal nodes
Dependency Analysis • Issues in the theoretical syntax of Irish on which there is no clear concensus … • The non-adjacency of verb and object in a VSO language, i.e. difficulties with VP • Some periphrastic aspectual constructions in Irish, e.g. progressive aspect has more nominal than verbal characteristics … • Dependency Analysis includes semantic as well as synactic information
Dependency Parsing • A dependency analysis looks at dependencies between pairs of words (which do not have to be adjacent) in a sentence • The tokens present in the input string are annotated without introducing any abstract categories (e.g. phrasal nodes) • i.e. dependency analysis consists of a root, and leaf nodes, without intermediate levels • Grammatical functions such as subject, object, predicate, as well as various types of prepositional phrase, e.g. adverbial, aspectual, predicative, etc. are annotated • Clauses and head-modifier dependencies are identified
DO S Dependency Parsing • Surface-oriented, bottom-up parsing • Dependency relations between pairs of tokens • Grammatical functions • Head-modifier relations • Tokens not necessarily adjacent. V Det N Det N Bhris an fear a rúitínBroke the man his ankle‘The man broke his ankle’
Previous NLP Work • Tokenization & Morphological Analysis • Finite-State Morphology: (Karttunen, Beesley, 1999; 2003) • Finite-State Morphological Analyser & Generator for Irish: (Uí Dhonnchadha, 2002) • POS Tagging and Parsing • Constraint Grammar (CG): Karlsson et al (1995), • Constraint Grammar Parser CG-2 (Tapanainen, 1996), • VISL CG3 (Bick et al, 2003 ...) http://visl.sdu.dk • Chunking: • Partial Parsing via Finite-State Cascades (Abney, 1996)
Stages of Development • Define the Syntactic Phenomena to include • Gather Test Data • Decide on Parsing Methodology • Decide a Tag-Set for dependency and grammatical relations • Develop Linguistic Rules for dependency analysis • Test the rules • Evaluate the results
Syntactic Phenomena • Sources of Information • Grammar books • Previous research on aspects of Irish Syntax • Simple declarative sentences (incl. neg. and interrogative) • Relative clauses • Copular constructions • Non-finite complements • Adjuncts
Test Data (Gold Standard) • Sample Sentences • Short invented grammatical sentences (225) based on grammar books etc. • Automatically POS tagged and manually checked and corrected • Dependency tagged and manually checked and corrected • Chunked and manually checked and corrected • Corpus Data • Corpus data – 250 real sentences • randomly selected from the 3000 sentence Gold Standard POS Tagged Corpus • Dependency tagged, chunked and manually checked and corrected
Tag Set • Grammatical Functions • @SUBJ, @OBJ, @FMV, @FAUX @CLB, etc. • Unlabelled depedencies • @>N, @N<, @P<, etc. • Start with the @ symbol, by convention, to distinguish them from morphosyntatic tags • “Fuair” faigh +Verb+VTI+PastInd+Len+@FMV • This tagset follows the style of tags described for English (Karlsson, 1995), and for Danish (Bick, 2003),[1] • However, there is not a prescribed list of tags for CG, which allows us to tailor the tagset to the language. [1] Other languages are also detailed on the VISL website: http://visl.sdu.dk/corpus_linguistics.html
Parsing Methodology: Constraint Grammar • Aims (Karlsson et al., 1995) • assign the appropriate morphological and syntactic information according to the context of each token or larger structure in the text; • assign an analysis to every string in the input, bearing in mind that unrestricted text will contain typographical errors, non-sentential fragments, dialectal and colloquial material; • if an ambiguity cannot be resolved, the alternative analyses are retained rather than forcing a (possibly incorrect) choice
Constraint Grammar Principles • Differences between CG and other parsing methodologies (Karlsson, 1995, p37). • Unlike a context-free grammar, a Constraint Grammar does not attempt to define the set of grammatical sentences in a language. • ‘... everything is licensed which is not explicitly ruled out’ • makes it more robust in handling unrestricted text • Does not aim to produce a minimal set of general rules – a CG grammar can contain many specific lexically-specific rules to handle special cases. • Doesn’t attempt to determine constituency structure.
CG Dependency Rules • MAP (@TAG) TARGET (POS) IF (CONDITIONS); • e.g. • MAP (@FMV) TARGET (Verb) IF (NOT 0 VSYNTH OR AUX) (NOT -1 RELPART) (NOT -2 RELPART); • SETS • LIST VSYNTH = (Verb 1P) (Verb 2P) (Verb 3P) (Verb Auto) ; • LIST AUX = ("bí") ("téigh") ("tosaigh") ("tosnaigh") ("féad") ("caith") ("féach"); • LIST RELPART = (Vb Rel) (Prep Rel) ;
Order of Implementation of Rules • Dependency Analysis is carried out in the following order: • Clause Boundaries • Verbs and/or Copulas • Preposition Heads • All Dependent Modifiers • Subject • Predicates of Copular Constructions • Object(s) • Adverbials • Other
Example (1) • Fuair faigh+Verb+VT+PastIndsé sé+Pron+Pers+3P+Sg+Masc+Sbjleabhar leabhar+Noun+Masc+Com+Sgins i+Prep+Art+Sgan an+Art+Sg+Defsiopa siopa+Noun+Masc+Com+Sg+DefArt • Fuair sé leabhar ins an siopaGot he book in the shopV Pro N Prep Det N@FMV @SUBJ @OBJ @PP_ADVL @>N @<P’He got a book in the shop’
Example (1) • Fuair sé leabhar ins an siopa Got he book in the shop V Pro N Prep Det N@FMV @SUBJ @OBJ @PP_ADVL @>N @P< ’He got a book in the shop’
Example (1) • root Fuair sé leabhar ins an siopa Got he book in the shop V Pro N Prep Det N@FMV @SUBJ @OBJ @PP_ADVL @>N @<P ’He got a book in the shop’
Example (2) • Chonaic Máire an fear a bhí ag itheSaw Máire the man that was at eatingV N Det N Rel V Prep VN@FMV @SUBJ @>N @SUBJ_REL @>V @FAUX @PP_ASP @<P‘Máire saw the man that was eating’ • ag ithePrep VN FORM@PP_ASP @<P FUNCTION‘eating’
POS Tagged Text CG Mapping Rules Dependency Analysis Test against Gold Std. Development/Test Cycle
Evaluation of Dependency Analysis • Sample Sentences: 225 short grammatical sentences Precision (Test Suite): • Gold Standard Dependency Analysis Corpus • 250 sentences randomly selected from the 3,000 sentence Gold Standard POS Tagged Corpus
Chunking • Using the Dependency Annotations and a Regular Expression Grammar (implemented using Xerox Finite-State Tools[1]) we can identify phrase-like structures, described by Abney (1991) as 'chunks'. • [1] For details see http://www.cis.upenn.edu/~cis639/docs/xfst.html
Implementation • Regular expressions and Xerox FST • Chunks • [NP .. ] , [V .. ] etc. • PP with embedded NP • [PP .. [NP .. ] ] • Conjunction with embedded conjoint • [CJ2 .. [?] ] • [NP úlla ] [CJ2 agus [NP oráistí NP] ] • ‘apples and oranges’ • Aspectual phrases • [ASP [PP-ASP .. [NP ..] ] ([OA ..]) ] • [ASP [PP-ASP ag [NP dúnadh ] ] [OA an dorais] ]] • ‘closing the door’
Example (3) • "<Tá>" "bí" Verb VI PresInd @FAUX Is"<sé>" "sé" Pron Pers 3P Sg Masc Sbj @SUBJ he"<ag>" "ag" Prep Simp @PP_ASP at"<rith>" "rith" Verbal Noun VTI @P< running • ‘He is running’ • [S[V Tá bí+Verb+VI+PresInd+@FAUX] [NP sé sé+Pron+Pers+3P+Sg+Masc+Sbj+@SUBJNP] [ASP [PP-ASPag ag+Prep+Simp+@PP_ASP[NPrith rith+Verbal+Noun+VTI+@P<NP] PP-ASP] ASP]S]
Regular Expession Chunker • ########################################################### • # Verb Chunk Dependency Tags • ########################################################### • define VTag [%@FAUX|%@FAUX%_REL|%@FMV|%@FMV%_REL]; • define VSTag [%@FAUX%_SUBJ|%@FAUX%_REL%_SUBJ| %@FMV%_SUBJ|%@FMV%_REL%_SUBJ]; • define PreVTag [%@%>V]; • # Verb Pre & Post Modifiers • define PreVStr [TokLemMTag PreVTag SP]; • # Verb Chunk • define VStr [TokLemMTag VTag SP]; • define VChunk [PreVStr* VStr]; • define VChunkBr [VChunk @-> "[V " ... " ] "]; • # Verb_Subject Chunk • define VSStr [TokLemMTag VSTag SP]; • define VSChunk [PreVStr* VSStr]; • define VSChunkBr [VSChunk @-> "[VS " ... " ] "];
Example (4) • Tá bí+Verb+VI+PresInd+@FAUX Is mé mé+Pron+Pers+1P+Sg+@SUBJ_ASP Iag ag+Prep+Simp+@PP_ASP atdéanamh déanamh+Verbal+Noun+VTI+@P< making cáca cáca+Noun+Masc+Gen+Sg+@OBJ_ASP cake. .+Punct+Fin+ . • ‘I am making a cake’ • [S[V Tá bí+Verb+VI+PresInd+@FAUXV][NP mé mé+Pron+Pers+1P+Sg+@SBJ_ASPNP][ASP[PP-ASP ag ag+Prep+Simp+@PP_ASP[NP déanamh déanamh+Verbal+Noun+@P< NP] PP-ASP][OA cáca cáca+Nn+Msc+Gen+Sg+@OBJ_ASPOA] • ASP] . .+Punct+Fin S]
Corpus Data • Ach sin an toradh is measa a fhéadfadh tarlú don pháirtí agus déarfaidís leat nár cóir an iomad airde a thabhairt do na pobalbhreitheanna nach raibh riamh fabhrach do na páirtithe beaga.'But that is the worst possible result for the party and they would say to you that it is not right to pay too much attention to the opinion polls that were never favourable to small parties.‘
Dependency Analysis • [S • [CONJ Ach ach+Conj+Subord+@CLB ] • [COP Sin sin+Cop+Pro+Dem+@COP_SUBJ ] • [NP an an+Art+Sg+Def+@>N toradh toradh+Noun+Msc+Com+Sg+DefArt+@PRED • is is+Part+Sup+@>ADJ • measa olc+Adj+Comp+@N< NP] • [VP a a+Part+Vb+Rel+Direct+@CLB • fhéadfadh féad+Verb+VTI+Cond+Len+@FAUX_REL ] • [INF tarlú tarlú+Verbal+Noun+VTI+@INF INF] • [PP don do+Prep+Art+Sg+@PP_ADVL • [NP pháirtí páirtí+Noun+Masc+Com+Sg+Len+@P< NP] PP] • [CB agus agus+Conj+Coord+@CLB ] • [VS déarfaidís abair+Verb+VTI+Cond+3P+Pl+@FMV+SUBJ] • [PP leat le+Pron+Prep+2P+Sg+@PP_ADVL PP] • [COP nár is+Cop+Past+Rel+Neg+@CLB ] • [PRED cóir cóir+Adj+Base+@PRED ] [INF an an+Art+Sg+Def+@>N iomad iomad+Subst+Noun+Sg@OBJ_INF airde aird+Noun+Fem+Gen+Sg+@N< [I a a+Prep+Simp+@PP_INF thabhairt tabhairt+Verbal+Noun+VTI+Len+@P<I] INF] [PP do do+Prep+Simp+@PP_ADVL [NP na na+Art+Pl+Def+@>N pobalbhreitheanna pobalbhreith+Noun+Fem+Com+Pl+@P< NP]PP] [V nach nach+Part+Vb+Neg+Rel+@CLB raibh bí+Verb+PastInd+Neg+Len+@FMV_REL ] [PRED riamh riamh+Adv+Its+@>ADJ ] fabhrach fabhrach+Adj+Base+@PRED ] [PP do do+Prep+Simp+@PP_ADVL [NP na na+Art+Pl+Def+@>N páirtithe páirtí+Noun+Masc+Com+Pl+DefArt+@P< beaga beag+Adj+Com+NotSlen+Pl+@N< NP] PP] . +Punct+Fin S]
Evaluation of Chunker • Evalb program used to evaluate bracketing of 250 sens. 150 Development Set Sentences 100 Test Set Sentences
Future Work • Partial Parsing to date as we have not addressed • Co-ordination • He packed his [clothes] and [shoes] • [He packed his clothes] and [left] • PP-attachment • [He] [stabbed] [the man with the knife] • [He] [stabbed] [the man] [with the knife] • PP-function • locative vs. stative • adjunct v.s. indirect object • adding additional info in the FS Lexicons, e.g. noun sub-classes, subcategorisation frames for verbs • Irish Text Processing Tools: http://www.scss.tcd.ie/Elaine.UiDhonnchadha/irish.utf8.htm