Finding your way through the woods with GrETEL. Liesbeth Augustinus, Vincent Vandeghinste, Ineke Schuurman, Frank Van Eynde. TABU-dag - June 14, 2013. GrETEL: Greedy Extraction of Trees for Empirical Linguistics. Query engine for treebanks.
Finding your way through the woods with GrETEL • Liesbeth Augustinus, Vincent Vandeghinste, Ineke Schuurman, Frank Van Eynde • TABU-dag - June 14, 2013
GrETEL • Greedy Extraction of Trees for Empirical Linguistics • Query engine for treebanks • Nederbooms project: exploitation of Dutch treebanks for research in linguistics • Goals • User-friendly tools • Access to large data files • Fast and accurate
GrETEL • Greedy Extraction of Trees for Empirical Linguistics • Query engine for treebanks • Treebank = syntactically annotated corpus, e.g. Penn Treebank (English), TüBa (German), LASSY, CGN (Dutch) • Parser, e.g. Alpino (Van Noord 2006)
ALPINO PARSER • Dit is een zin. (“This is a sentence.”) >> Alpino parser >> XML trees • Query language: XPath
XPATH //node[@cat="smain" and node[@rel="su" and @pt="vnw" and @lemma="dit"] and node[@rel="hd" and @pt="ww" and @lemma="zijn"] and node[@rel="predc" and @cat="np" and node[@rel="det" and @pt="lid" and @lemma="een"] and node[@rel="hd" and @pt="n" and @lemma="zin"]]]
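To make the query above concrete, here is a minimal sketch of what it tests for. The XML is a hand-written, simplified stand-in for an Alpino parse of "Dit is een zin." (the `alpino_ds` wrapper and the `word` attribute are assumptions of this toy example). Python's standard `xml.etree.ElementTree` only supports a subset of XPath and cannot evaluate the `and`-joined nested predicates directly, so the helper names `children` and `find_matches` are hypothetical and spell out the same constraints step by step.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified stand-in for an Alpino parse (toy data).
TOY_XML = """
<alpino_ds>
  <node cat="top" rel="top">
    <node cat="smain" rel="--">
      <node rel="su" pt="vnw" lemma="dit" word="Dit"/>
      <node rel="hd" pt="ww" lemma="zijn" word="is"/>
      <node rel="predc" cat="np">
        <node rel="det" pt="lid" lemma="een" word="een"/>
        <node rel="hd" pt="n" lemma="zin" word="zin"/>
      </node>
    </node>
  </node>
</alpino_ds>
"""


def children(node, **attrs):
    """Return the direct <node> children whose attributes all match."""
    path = "node" + "".join(
        f"[@{name}='{value}']" for name, value in attrs.items()
    )
    return node.findall(path)


def find_matches(root):
    """Collect every smain node that satisfies the constraints of the query."""
    hits = []
    for smain in root.iterfind(".//node[@cat='smain']"):
        has_subject = children(smain, rel="su", pt="vnw", lemma="dit")
        has_verb = children(smain, rel="hd", pt="ww", lemma="zijn")
        # At least one predc NP must contain both the determiner and the noun.
        has_predicate = any(
            children(np, rel="det", pt="lid", lemma="een")
            and children(np, rel="hd", pt="n", lemma="zin")
            for np in children(smain, rel="predc", cat="np")
        )
        if has_subject and has_verb and has_predicate:
            hits.append(smain)
    return hits


root = ET.fromstring(TOY_XML)
matches = find_matches(root)
print(len(matches))
```

A full XPath 1.0 engine could evaluate the expression as written; the manual version is only meant to show which parts of the tree each predicate constrains.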
GrETEL • Greedy Extraction of Trees for Empirical Linguistics • Query treebanks by example • First version => only for LASSY treebank • New release => GrETEL for CGN treebank => update based on user reviews
GrETEL and the user • The user: enter an example sentence; indicate the relevant items of the sentence; (adapt the XPath); select a treebank; inspect the results • GrETEL: parse the sentence (Alpino); automatically generate an XPath expression; present the results
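The "automatically generate XPath expression" step can be sketched as a recursive walk over the parse of the example sentence. This is an illustration only, not GrETEL's actual implementation: the toy XML, the `KEEP` attribute list, and the function names `to_predicate` and `to_xpath` are all assumptions, and the user's selection of relevant items is reduced here to a fixed list of attributes to keep.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written parse of "Dit is een zin." (toy data).
EXAMPLE = """
<node cat="smain">
  <node rel="su" pt="vnw" lemma="dit" word="Dit"/>
  <node rel="hd" pt="ww" lemma="zijn" word="is"/>
  <node rel="predc" cat="np">
    <node rel="det" pt="lid" lemma="een" word="een"/>
    <node rel="hd" pt="n" lemma="zin" word="zin"/>
  </node>
</node>
"""

# Attributes considered relevant for the query (assumption of this sketch).
KEEP = ("rel", "cat", "pt", "lemma")


def to_predicate(node):
    """Turn one node and its children into the body of an XPath predicate."""
    tests = [f'@{a}="{node.get(a)}"' for a in KEEP if node.get(a) is not None]
    tests += [f"node[{to_predicate(child)}]" for child in node.findall("node")]
    return " and ".join(tests)


def to_xpath(tree):
    """Build a query that matches the example subtree anywhere in a treebank."""
    return f"//node[{to_predicate(tree)}]"


query = to_xpath(ET.fromstring(EXAMPLE))
print(query)
```

Dropping `lemma` from `KEEP` would generalise the query from this exact sentence to any sentence with the same syntactic shape, which is the kind of adjustment the "indicate relevant items" step allows.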
OUTLINE • GrETEL in a nutshell • GrETEL demo • Case study • Search options • Conclusions and future work
CASE STUDY • Verbs with fixed preposition • E.g. Hij keek met een bang hartje naar de heks. ‘He was looking at the witch with a heavy heart.’ • VERB + (… +) PREP • LASSY XPath query: //node[@cat="smain" and node[@rel="hd" and @pos="verb" and @root="kijk"] and node[@rel="ld" and @cat="pp" and node[@rel="hd" and @pos="prep" and @root="naar"]]]
CASE STUDY • Verbs with fixed preposition • E.g. Hij keek naar de heks. ‘He was looking at the witch.’ • Discontinuous constructions! • E.g. Hij keek met een bang hartje naar de heks. ‘He was looking at the witch with a heavy heart.’ • VERB + (… +) PREP
Other treebank, other format … Hij keek met een bang hartje naar de heks • CGN /node[@cat="smain" and node[@rel="hd" and @pt="ww" and @lemma="kijken"] and node[@rel="ld" and @cat="pp" and node[@rel="hd" and @pt="vz" and @lemma="naar"]]] • LASSY //node[@cat="smain" and node[@rel="hd" and @pos="verb" and @root="kijk"] and node[@rel="ld" and @cat="pp" and node[@rel="hd" and @pos="prep" and @root="naar"]]]
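The two queries differ only in attribute names (`pt`/`lemma` versus `pos`/`root`) and tag values (`ww`/`vz` versus `verb`/`prep`), so the difference can be captured in a small lookup table. The sketch below is hypothetical: `TAGSETS` and `verb_prep_query` are illustrative names, the table only covers what this slide shows, the leading axis is always written as `//`, and the caller still has to supply the form each treebank expects (`kijken` for CGN, `kijk` for LASSY).

```python
# Per-treebank naming conventions, taken from the two queries on the slide.
TAGSETS = {
    "CGN": {"pos_attr": "pt", "lemma_attr": "lemma", "verb": "ww", "prep": "vz"},
    "LASSY": {"pos_attr": "pos", "lemma_attr": "root", "verb": "verb", "prep": "prep"},
}


def verb_prep_query(treebank, verb, prep):
    """Build the verb + fixed preposition query for the given treebank."""
    tags = TAGSETS[treebank]
    pos, lemma = tags["pos_attr"], tags["lemma_attr"]
    verb_tag, prep_tag = tags["verb"], tags["prep"]
    return (
        '//node[@cat="smain" and '
        f'node[@rel="hd" and @{pos}="{verb_tag}" and @{lemma}="{verb}"] and '
        'node[@rel="ld" and @cat="pp" and '
        f'node[@rel="hd" and @{pos}="{prep_tag}" and @{lemma}="{prep}"]]]'
    )


cgn_query = verb_prep_query("CGN", "kijken", "naar")
lassy_query = verb_prep_query("LASSY", "kijk", "naar")
print(cgn_query)
print(lassy_query)
```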
RESULTS Verb plus fixed preposition • E.g. Hij keek naar de heks. ‘He was looking at the witch.’ • VERB + (… +) PREP • 4004 matches in 3881 sentences
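The match count is higher than the sentence count, presumably because a single sentence can contain more than one matching node. A small sketch of that bookkeeping, using an invented hit list rather than real GrETEL output:

```python
from collections import Counter

# Hypothetical hit list (toy data, not GrETEL output): one sentence id per match.
hits = ["s1", "s2", "s2", "s3", "s3", "s3"]

matches_per_sentence = Counter(hits)
n_matches = len(hits)                    # every matching node counts
n_sentences = len(matches_per_sentence)  # each sentence counts once
repeated = {s: n for s, n in matches_per_sentence.items() if n > 1}

# The same arithmetic on the figures from the slide.
slide_surplus = 4004 - 3881

print(n_matches, n_sentences, repeated, slide_surplus)
```

On the slide's figures, 123 of the matches are additional hits in sentences that had already been counted.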
SEARCH OPTIONS Below annotation matrix
SEARCH OPTIONS Green versus red word order in Dutch • Green: past participle – auxiliary: De NAVO stelt dat ze er alles aan gedaan heeft • Red: auxiliary – past participle: De NAVO stelt dat ze er alles aan heeft gedaan • “NATO claims that it has done everything in its power” (deredactie.be)
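The green/red distinction comes down to the linear order of the participle and the auxiliary, so it can be read off word positions in a parse. The sketch below assumes a simplified Alpino-style clause in which the auxiliary is the head (`rel="hd"`), the participle heads a `ppart` verbal complement (`rel="vc"`), and each word carries a `begin` position. The XML fragments and the function name `word_order` are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Simplified verb clusters for "... dat ze er alles aan gedaan heeft / heeft gedaan".
GREEN = """
<node cat="ssub" rel="body">
  <node rel="hd" pt="ww" lemma="hebben" word="heeft" begin="6"/>
  <node rel="vc" cat="ppart">
    <node rel="hd" pt="ww" lemma="doen" word="gedaan" begin="5"/>
  </node>
</node>
"""
RED = GREEN.replace('begin="6"', 'begin="X"').replace('begin="5"', 'begin="6"')
RED = RED.replace('begin="X"', 'begin="5"')  # swap the two positions


def word_order(clause):
    """Return 'green' (participle first), 'red' (auxiliary first), or None."""
    aux = clause.find("node[@rel='hd'][@pt='ww']")
    participle = clause.find(
        "node[@rel='vc'][@cat='ppart']/node[@rel='hd'][@pt='ww']"
    )
    if aux is None or participle is None:
        return None
    if int(participle.get("begin")) < int(aux.get("begin")):
        return "green"
    return "red"


print(word_order(ET.fromstring(GREEN)))
print(word_order(ET.fromstring(RED)))
```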
CONCLUSIONS • GrETEL: search engine for Dutch treebanks • Input = natural language example • Output = sample of similar sentences • Syntactic concordancer • Available online (via Mozilla Firefox) • No installation required
FUTURE WORK • GrETEL 2.0 • Include SoNaR corpus (ca. 500M tokens) • More generic • AfriBooms • GrETEL for Afrikaans • Include other treebank formats
CASE STUDY • Collective noun constructions • E.g. Een aantal bomen zijn omgevallen. ‘A number of trees fell down.’ • DET + NOUN + PLURAL NOUN • Discontinuous constructions! • E.g. Een groot aantal oude bomen zijn omgevallen. ‘A large number of old trees fell down.’
Try it yourself at http://nederbooms.ccl.kuleuven.be/eng/gretel Thanks for your attention!
Waaraan vs Waar … aan Waar denk je aan ? //node[@cat="top" and node[@rel="--" and @cat="whq" and node[@rel="whd" and @pos="adv"] and node[@rel="body" and @cat="sv1" and node[@rel="pc" and @cat="pp" and node[@rel="hd" and @pos="prep"]]]] and node[@rel="--" and @pos="punct"]] (4 results) • Waar bemoei je je mee? • Wanneer gaat een koortsstuip over in epilepsie?
Waaraan denk je ? //node[@cat="top" and node[@rel="--" and @cat="whq" and node[@rel="whd" and @pos="pp"]] and node[@rel="--" and @pos="punct"]] (38 results) • Waarom werken we ? • Waartoe verbind ik mij als ouder door dit formulier in te vullen ? • Vanwaar die gulle hand van een Turkse overheid die in de schulden zwemt ?
Hij klom de boom in //node[@cat="top" and node[@rel="--" and @cat="smain" and node[@rel="hd" and @pos="verb"] and node[@rel="ld" and @cat="np" and node[@rel="det" and @pos="det"] and node[@rel="hd" and @pos="noun"]] and node[@rel="svp" and @pos="part"]] and node[@rel="--" and @pos="punct"]] (37 results) • Door haar winst komt Clijsters de top-20 binnen . • In feite ging minder dan de helft van Dorsets de rivier over . • Nederland gaat de bezettingstijd in .