Example-Based Treebank Querying
Liesbeth Augustinus, Vincent Vandeghinste, Frank Van Eynde
CLARIN Sofia, 2012-10-28
NEDERBOOMS
• Exploitation of Dutch treebanks for research in linguistics
• CLARIN-VL Project
• Centre for Computational Linguistics (CCL)
• Dutch Grammar and Language Use (NGTG)
• Goals:
  • User-friendly tools
  • Access to large data files
NEDERBOOMS How can we combine the data-oriented approach of treebank mining with the knowledge-oriented method of theoretical and descriptive linguistics?
QUERYING LASSY
• Existing search tools (stand-alone): dtsearch (Kloosterman 2007), Dact (de Kok 2010)
• Query language: XPath = standard query language for XML trees
QUERYING LASSY
Some examples:
• “Look for all NP nodes in which the head noun is modified by the adjective ‘politiek’, e.g. politieke discussies”
//node[@cat="np" and node[@rel="mod" and @pos="adj" and @root="politiek"] and node[@rel="hd" and @pos="noun"]]
• “Look for verb clusters in which a separable verb particle occurs between two verb forms, e.g. ‘Hij zegt dat ze spoedig zullen kennis maken’”
//node[node[@rel="hd" and @pos="verb" and @begin < ../node[@rel="vc"]/node[@rel="svp" and @pos="part"]/@begin] and node[@rel="vc" and node[@rel="svp" and @begin < ../node[@rel="hd" and @pos="verb"]/@begin]]]
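To make the first query concrete, here is a minimal Python sketch that applies the same three conditions to a toy tree. The XML is a hand-written, simplified stand-in for an Alpino parse (the second NP and the `word` values are invented for illustration), and because the standard-library `xml.etree.ElementTree` supports only a subset of XPath, the conditions are checked in plain Python instead of being passed as an XPath string. The helper names `has_child` and `politiek_nps` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Toy, hand-written stand-in for an Alpino parse (not real LASSY data).
TOY = """<node cat="top">
  <node cat="np" rel="obj1">
    <node rel="mod" pos="adj" root="politiek" word="politieke"/>
    <node rel="hd" pos="noun" root="discussie" word="discussies"/>
  </node>
  <node cat="np" rel="su">
    <node rel="det" pos="det" root="de" word="de"/>
    <node rel="hd" pos="noun" root="minister" word="minister"/>
  </node>
</node>"""


def has_child(node, **attrs):
    """True if some direct child carries all the given attribute values."""
    return any(all(child.get(k) == v for k, v in attrs.items()) for child in node)


def politiek_nps(root):
    """Emulate: //node[@cat="np" and node[mod, adj, politiek] and node[hd, noun]]."""
    return [
        np for np in root.iter("node")
        if np.get("cat") == "np"
        and has_child(np, rel="mod", pos="adj", root="politiek")
        and has_child(np, rel="hd", pos="noun")
    ]


hits = politiek_nps(ET.fromstring(TOY))
print([[child.get("word") for child in np] for np in hits])
```

Only the first NP satisfies all three conditions. The verb-cluster query additionally compares `@begin` positions across sibling subtrees, which this sketch does not cover.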
QUERYING LASSY
XPath:
• Not user-friendly
• Knowledge of Alpino grammar necessary
= problematic for non-technical linguists
Verifying theory through data with corpus or treebank examples:
• Time-consuming, requires some effort
How can we make interaction between computational linguistics and theoretical linguistics possible?
GrETEL
Greedy Extraction of Trees for Empirical Linguistics
• Search tool based on example sentences
• Input = natural language
• No explicit knowledge of a formal query language or of the Alpino grammar required
• Bridges the gap between descriptive and computational linguistics
• Available online: http://nederbooms.ccl.kuleuven.be (optimised for Mozilla Firefox)
GrETEL - input
Green versus red word order in Dutch
• green: past participle – auxiliary
  De NAVO stelt dat ze er alles aan gedaan heeft
• red: auxiliary – past participle
  De NAVO stelt dat ze er alles aan heeft gedaan
“NATO claims that it has done everything in its power” (deredactie.be)
GrETEL - input >> parsed with Alpino
GrETEL - annotation >> info added to Alpino parse
GrETEL – query info
input example >> Alpino parse >> query tree >> XPath query >> treebanks
GrETEL – query tree >> subtree extraction
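As a rough illustration of the subtree-extraction step, the sketch below prunes a parse tree down to the words the user marked as relevant and returns the lowest node that still dominates all of them. This is an assumption about how such a step could work, not GrETEL's actual implementation. The XML is a simplified, hand-written stand-in for the Alpino parse of the green-order example, and `prune` and `extract_query_tree` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written stand-in for the Alpino parse of
# "... dat ze (er) alles (aan) gedaan heeft" (not real parser output).
PARSE = """<node cat="cp" rel="vc">
  <node rel="cmp" pos="comp" word="dat" begin="3"/>
  <node cat="ssub" rel="body">
    <node rel="su" pos="pron" word="ze" begin="4"/>
    <node cat="ppart" rel="vc">
      <node rel="obj1" pos="noun" word="alles" begin="6"/>
      <node rel="hd" pos="verb" root="doe" word="gedaan" begin="8"/>
    </node>
    <node rel="hd" pos="verb" root="heb" word="heeft" begin="9"/>
  </node>
</node>"""


def prune(node, selected):
    """Copy `node`, keeping only branches that lead to a selected word."""
    kept = [p for p in (prune(child, selected) for child in node) if p is not None]
    is_word = node.get("word") is not None
    if is_word and node.get("begin") not in selected:
        return None  # unselected word
    if not is_word and not kept:
        return None  # phrase without any selected word below it
    clone = ET.Element(node.tag, dict(node.attrib))
    clone.extend(kept)
    return clone


def extract_query_tree(root, selected):
    """Return the lowest node that dominates every selected word."""
    tree = prune(root, selected)
    while tree is not None and len(tree) == 1:
        tree = tree[0]  # skip nodes that merely wrap a single branch
    return tree


# The user marks "gedaan" (position 8) and "heeft" (position 9) as relevant.
query_tree = extract_query_tree(ET.fromstring(PARSE), {"8", "9"})
print(ET.tostring(query_tree, encoding="unicode"))
```

With the two verbs selected, the result is the `ssub` node containing the `ppart` branch with "gedaan" and the head verb "heeft"; the complementizer, subject, and object are pruned away.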
GrETEL – query tree
query tree >> XPath generator >> XPath expression
//node[@cat="ssub" and node[@rel="vc" and @cat="ppart" and node[@rel="hd" and @pos="verb"]] and node[@rel="hd" and @pos="verb" and @root="heb"]]
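The mapping from query tree to XPath expression can be sketched as a short recursive function. This is a minimal illustration under stated assumptions, not GrETEL's own generator: the attribute order (`rel`, `cat`, `pos`, `root`), the choice to drop the top node's `rel`, and the names `to_xpath` and `QUERY_TREE` are all hypothetical. The query tree is hand-written so that it corresponds to the expression shown on the slide.

```python
import xml.etree.ElementTree as ET

# Attributes copied into the XPath, in a fixed order (an assumption of this sketch).
ATTRS = ("rel", "cat", "pos", "root")

# Hand-written query tree for the green word order (participle + auxiliary).
QUERY_TREE = """<node cat="ssub" rel="body">
  <node rel="vc" cat="ppart">
    <node rel="hd" pos="verb"/>
  </node>
  <node rel="hd" pos="verb" root="heb"/>
</node>"""


def to_xpath(node, is_top=True):
    """Turn a query tree into a nested XPath predicate, depth first."""
    conditions = [
        f'@{name}="{node.get(name)}"'
        for name in ATTRS
        # Assumption: the top node's relation is dropped so the pattern
        # matches wherever the clause is attached.
        if node.get(name) is not None and not (is_top and name == "rel")
    ]
    conditions += [to_xpath(child, is_top=False) for child in node]
    prefix = "//node" if is_top else "node"
    return f"{prefix}[{' and '.join(conditions)}]" if conditions else prefix


xpath = to_xpath(ET.fromstring(QUERY_TREE))
print(xpath)
```

Each node becomes one `node[...]` predicate whose conditions are its own attributes followed by the predicates of its children, which is why the nesting of the XPath mirrors the nesting of the query tree.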
GrETEL – treebanks LASSY Small in PostgreSQL database >> no local installation required
GrETEL – results >> (adapted) query >> quantitative information
GrETEL – results >> tree viewer >> list of results
Thanks for your attention! Questions?
liesbeth@ccl.kuleuven.be
http://nederbooms.ccl.kuleuven.be