
Capturing linguistic interaction in a grammar

Capturing linguistic interaction in a grammar: a method for empirically evaluating the grammar of a parsed corpus. Sean Wallis, Survey of English Usage, University College London (s.wallis@ucl.ac.uk).


Presentation Transcript


  1. Capturing linguistic interaction in a grammar: A method for empirically evaluating the grammar of a parsed corpus • Sean Wallis, Survey of English Usage, University College London • s.wallis@ucl.ac.uk

  2. Capturing linguistic interaction... • Parsed corpus linguistics • Empirical evaluation of grammar • Experiments • Attributive AJPs • Preverbal AVPs • Embedded postmodifying clauses • Conclusions • Comparing grammars or corpora • Potential applications

  3. Parsed corpus linguistics • Several million-word parsed corpora exist • Each sentence analysed in the form of a tree • different languages have been analysed • limited amount of spontaneous speech data • Commitment to a particular grammar required • different schemes have been applied • problems: computational completeness + manual consistency • Tools support linguistic research in corpora

  4. Parsed corpus linguistics • An example tree from ICE-GB (spoken) S1A-006 #23

  5. Parsed corpus linguistics • Three kinds of evidence may be obtained from a parsed corpus • Frequency evidence of a particular known rule, structure or linguistic event • Coverage evidence of new rules, etc. • Interaction evidence of the relationship between rules, structures and events • This evidence is necessarily framed within a particular grammatical scheme • So… how might we evaluate this grammar?

  6. Empirical evaluation of grammar • Many theories, frameworks and grammars • no agreed evaluation method exists • linguistics is divided into competing camps • status of parsed corpora ‘suspect’ • Possible method: retrievability of events • circularity: you get out what you put in • redundancy: ‘improvement’ by mere addition • atomic: based on single events, not pattern • specificity: based on particular phenomena • New method: retrievability of event sequences

  7. Experiment 1: attributive AJPs • Adjectives before a noun in English • Simple idea: plot the frequency of NPs with at least n = 0, 1, 2, 3… attributive AJPs

  8. Experiment 1: attributive AJPs • Adjectives before a noun in English • Simple idea: plot the frequency of NPs with at least n = 0, 1, 2, 3… attributive AJPs • [Figure: raw frequency and log frequency plotted against n. NB: the log-frequency line is not straight]

  9. Experiment 1: analysis of results • If the log-frequency line is straight • exponential fall in frequency (constant probability) • no interaction between decisions (cf. coin tossing) • Sequential probability analysis • calculate probability of adding each AJP • error bars (binomial) • probability falls • second < first • third < second • fourth < second • decisions interact
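
The sequential probability analysis on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the author's spreadsheet: the counts are invented, the function names are hypothetical, and the Wilson score interval is assumed here as one reasonable choice of binomial error bar (the slide only says "binomial").

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    spread = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return centre - spread, centre + spread

def sequential_probabilities(at_least):
    """at_least[n] is the number of NPs with at least n attributive AJPs.

    Returns (n, p, lower, upper) for n = 1, 2, ..., where p is the
    probability of adding an n-th AJP given that n - 1 are already present.
    """
    results = []
    for n in range(1, len(at_least)):
        p = at_least[n] / at_least[n - 1]
        lower, upper = wilson_interval(at_least[n], at_least[n - 1])
        results.append((n, p, lower, upper))
    return results

# Hypothetical counts, for illustration only (not ICE-GB figures).
constant = [1000, 500, 250, 125]   # coin-tossing case: p stays at 0.5
falling = [1000, 200, 20, 1]       # interacting decisions: p falls at each step

for label, counts in (("constant", constant), ("falling", falling)):
    for n, p, lower, upper in sequential_probabilities(counts):
        print(f"{label}: AJP {n}: p = {p:.3f} [{lower:.3f}, {upper:.3f}]")
```

In the constant case the frequencies fall exponentially, so the log-frequency line is straight and the probability is flat. In the falling case each probability is lower than the one before, which is the pattern the slide reads as interaction between decisions.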

  10. Experiment 1: analysis of results • If the log-frequency line is straight • exponential fall in frequency (constant probability) • no interaction between decisions (cf. coin tossing) • Sequential probability analysis • calculate probability of adding each AJP • error bars (binomial) • probability falls • second < first • third < second • fourth < second • decisions interact • [Figure: probability plot]

  11. Experiment 1: analysis of results • If the log-frequency line is straight • exponential fall in frequency (constant probability) • no interaction between decisions (cf. coin tossing) • Sequential probability analysis • calculate probability of adding each AJP • error bars (binomial) • probability falls • decisions interact • fit to a power law • y = m·x^k • find m and k • [Figure: probability plot with fitted curve y = 0.1931·x^{-1.2793}]
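
One common way to fit y = m·x^k is ordinary least squares on the log-log form, log y = log m + k·log x. The sketch below assumes that approach; the slide does not say how the fit was obtained, and `fit_power_law` is a hypothetical helper. The sample points are generated from the fitted equation quoted on the slide, not taken from the corpus data.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = m * x**k by least squares on log(y) = log(m) + k*log(x).

    Returns (m, k, r_squared), where r_squared is measured in log-log space.
    All xs and ys must be positive.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((a - mean_x) ** 2 for a in log_x)
    syy = sum((b - mean_y) ** 2 for b in log_y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    k = sxy / sxx                          # slope of the log-log line
    m = math.exp(mean_y - k * mean_x)      # intercept, transformed back
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return m, k, r_squared

# Synthetic points lying exactly on the slide's fitted curve.
xs = [1, 2, 3, 4]
ys = [0.1931 * x ** -1.2793 for x in xs]

m, k, r2 = fit_power_law(xs, ys)
print(f"m = {m:.4f}, k = {k:.4f}, R^2 = {r2:.4f}")
```

Because the points are synthetic, the fit recovers m and k almost exactly. With observed probabilities the R² value indicates how well a power law describes the fall.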

  12. Experiment 1: explanations? • Feedback loop: for each successive AJP, it is more difficult to add a further AJP • Explanation 1: semantic constraints • tend to say tall green ship • do not tend to say tall short ship or green tall ship • Explanation 2: communicative economy • once a speaker has said tall green ship, they tend to say only ship • Further investigation required • General principle: • significant change (usually, fall) in probability is evidence of an interaction along a grammatical axis

  13. Experiments 2, 3: variations • Restrict head: common and proper nouns • Common nouns: similar results • Proper nouns and adjectives are often treated as compounds (Northern England vs. lower Loire) • Ignore grammar: adjective + noun strings • Some misclassifications / miscounting (‘noise’) • she was [beautiful, people] said; tall very [green ship] • Similar results • slightly weaker (third < second ns at p=0.01) • Insufficient evidence for grammar • null hypothesis: simple lexical adjacency

  14. Experiment 4: preverbal AVPs • Consider adverb phrases before a verb • Results very different • Probability does not fall significantly between first and second AVP • Probability does fall between second and third AVP • Possible constraints • (weak) communicative • not (strong) semantic • Further investigation needed

  15. Experiment 4: preverbal AVPs • Consider adverb phrases before a verb • Results very different • Probability does not fall significantly between first and second AVP • Probability does fall between second and third AVP • Possible constraints • (weak) communicative • not (strong) semantic • Further investigation needed • Not power law: R² < 0.24 • [Figure: probability plot]

  16. Experiment 5: embedded clauses • Another way to specify nouns in English • add clause after noun to explicate it • the ship [that was tall and green] • the ship [in the port] • may be embedded • the ship [in the port [with the ancient lighthouse]] • or successively postmodified • the ship [in the port][with a very old mast] • Compare successive embedding and sequential postmodifying clauses • Axis = embedding depth / sequence length

  17. Experiment 5: method • Extract examples with FTFs • at least n levels of embedded postmodification:

  18. Experiment 5: method • Extract examples with FTFs • at least n levels of embedded postmodification: • [Figure: FTF patterns for n = 0, 1, 2 (etc.)]

  19. Experiment 5: method • Extract examples with FTFs • at least n levels of embedded postmodification: • [Figure: FTF patterns for n = 0, 1, 2 (etc.)] • problems: • multiple matching cases (use ICECUP IV to classify) • overlapping cases (subtract extra case) • co-ordination of clauses or NPs (use alternative patterns)

  20. Experiment 5: analysis of results • Probability of adding a further embedded clause falls with each level • second < first • sequential < embedding • Embedding only: • third < first • insufficient data for third < second • Conclusion: • Interaction along embedding and sequential axes

  21. Experiment 5: analysis of results • Probability of adding a further embedded clause falls with each level • second < first • sequential < embedding • Embedding only: • third < first • insufficient data for third < second • Conclusion: • Interaction along embedding and sequential axes • [Figure: probability plot for embedded and sequential cases]

  22. Experiment 5: analysis of results • Probability of adding a further embedded clause falls with each level • second < first • sequential < embedding • Fitting to f = m·x^k • k < 0 = fall (f = m/x^{|k|}) • |k| is high = steep • Conclusion: • Both match power law: R² > 0.99 • embedded: y = 0.0539·x^{-1.2206} • sequential: y = 0.0523·x^{-1.6516}
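
To see what "|k| is high = steep" means in practice, the two fitted curves quoted on this slide can be evaluated side by side. This is only an illustrative sketch: `power_law` and `FITS` are hypothetical names, and the values printed are points on the fitted curves, not observed corpus probabilities.

```python
def power_law(m, k, x):
    """Evaluate f = m * x**k; for k < 0 this equals m / x**abs(k)."""
    return m * x ** k

# Coefficients (m, k) as quoted on the slide.
FITS = {
    "embedded": (0.0539, -1.2206),
    "sequential": (0.0523, -1.6516),
}

for name, (m, k) in FITS.items():
    values = [power_law(m, k, x) for x in (1, 2, 3)]
    # Doubling x multiplies f by 2**k, so a larger |k| means a steeper fall.
    doubling_factor = 2 ** k
    formatted = ", ".join(f"{v:.4f}" for v in values)
    print(f"{name}: f(1..3) = {formatted}; doubling x scales f by {doubling_factor:.3f}")
```

The two curves start at nearly the same value at x = 1, but the sequential curve, with the larger |k|, drops away faster at each subsequent step.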

  23. Experiment 5: explanations? • Lexical adjacency? • No: 87% of 2-level cases have at least one VP, NP or clause between upper and lower heads • Misclassified cases of embedding? • No: very few (5%) semantically ambiguous cases • Language production constraints? • Possibly, could also be communicative economy • contrast spontaneous speech with other modes • Positive ‘proof’ of recursive tree grammar • Established from parsed corpus • cf. negative ‘proof’ (NLP parsing problems)

  24. Conclusions • A new method for evaluating interactions along grammatical axes • General purpose, robust, structural • More abstract than ‘linguistic choice’ experiments • Depends on a concept of grammatical distance along an axis, based on the chosen grammar • Method has philosophical implications • Grammar viewed as structure of linguistic choices • Linguistics as an evaluable observational science • Signature (trace) of language production decisions • A unification of theoretical and corpus linguistics?

  25. Comparing grammars or corpora • Can we reliably retrieve known interaction patterns with different grammars? • Do these patterns differ across corpora? • Benefits over individual event retrieval • non-circular: generalisation across local syntax • not subject to redundancy: arbitrary terms make trends more difficult to retrieve • not atomic: based on patterns of interaction • general: patterns may have multiple explanations • Supplements retrieval of events

  26. Potential applications • Corpus linguistics • Optimising existing grammar • e.g. co-ordination, compound nouns • Theoretical linguistics • Comparing different grammars, same language • Comparing different languages or periods • Psycholinguistics • Search for evidence of language production constraints in spontaneous speech corpora • speech and language therapy • language acquisition and development

  27. Links and further reading • Survey of English Usage • www.ucl.ac.uk/english-usage • Corpora and grammar • .../projects/ice-gb • Full paper • .../staff/sean/resources/analysing-grammatical-interaction.pdf • Sequential analysis spreadsheet (Excel) • .../staff/sean/resources/interaction-trends.xls
