Modelling Language Evolution
Lecture 4: Learning bias and linguistic structure
Simon Kirby
University of Edinburgh
Language Evolution & Computation Research Unit
Summary – the story so far • What is a model? Why do linguists need computational models? • Modelling learning. One approach: Neural nets • Nodes, activations, connection weights, hidden representations • Error driven learning • Learning syntax: recurrent nets, starting small, critical period • Evolving network structure: genetic algorithms
Learning bias • We have been talking about what learners are “good” or “bad” at – what they can and cannot learn. • We refer to the learner’s prior bias • (This can be given a simple mathematical definition – but let’s not worry about that…) • Prior bias is everything the learner brings to the problem that is independent of the data • Where does the bias come from? • It comes from biology. It is what is innate.
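For the curious, one standard way of making "prior bias" precise (a sketch, not developed in the lecture) is Bayesian: the bias is the probability the learner assigns to each hypothesis before seeing any data.

$$P(h \mid d) \;\propto\; P(d \mid h)\,P(h)$$

Here $P(h)$ is the prior bias over hypotheses (grammars) $h$, $P(d \mid h)$ is the likelihood of the observed data $d$, and the posterior $P(h \mid d)$ is what the learner concludes after seeing the data.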
Language universals and learning biases • Christiansen suggests that languages themselves adapt to learners. • So far we have looked at long-distance dependency and embedding… • Christiansen suggests less general targets for explanation: • Branching direction/head-order consistency • Subjacency • Typically, these are assumed to be innate (and therefore evolved by natural selection) • What if they arise naturally from sequential learning biases?
Head-ordering consistency • Languages are typically consistently head-first or head-last: e.g. English verbs and prepositions precede their complements (eat [apples], in [Edinburgh]), whereas Japanese verbs and postpositions follow theirs • (for the linguists…) This might be explained with a parameterised version of X-bar theory
Recursive consistency • Christiansen generalises head-ordering in terms of the interaction of recursive rules. • Consistent trees:
Recursive consistency • Christiansen generalises head-ordering in terms of the interaction of recursive rules. • Inconsistent trees:
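A minimal sketch of the idea in code, under a deliberately simplified assumption: "recursive consistency" just means that phrase types which can embed one another agree on head order. The phrase labels are illustrative, not Christiansen's grammar.

```python
from itertools import combinations

def count_inconsistencies(head_order):
    """head_order maps each recursive phrase type to 'first' or 'last';
    returns the number of pairs of phrase types that disagree."""
    return sum(1 for a, b in combinations(head_order.values(), 2) if a != b)

# Toy examples: one fully consistent rule set, one mixed-order rule set.
consistent = {"VP": "first", "PP": "first", "GenP": "first"}
mixed      = {"VP": "first", "PP": "last",  "GenP": "last"}

print(count_inconsistencies(consistent))  # 0: heads always on the same side
print(count_inconsistencies(mixed))       # 2: VP disagrees with PP and with GenP
```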
A simple typology • Typologists construct a space of logically possible languages and assign each a type • Christiansen’s binary typology: five binary ordering choices, giving 32 logically possible language types • English is 11100
Which languages can SRNs learn? • If languages adapt to learning biases (rather than the other way round), perhaps some language types will be easier to learn than others • Will the SRN’s learning biases predict the cross-linguistic distribution of types? • 8x8x8 SRN trained on next-category prediction • Categories: • Singular N, Plural N • Singular V, Plural V • Singular genitive, Plural genitive • Adposition • End-of-sentence marker
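For concreteness, here is a minimal numpy sketch of an Elman-style 8x8x8 SRN doing next-category prediction. The category labels come from the list above; the weight initialisation, learning rate and update rule are illustrative assumptions, not Christiansen's actual implementation.

```python
import numpy as np

CATEGORIES = ["N_sg", "N_pl", "V_sg", "V_pl", "Gen_sg", "Gen_pl", "Adpos", "EOS"]
N = len(CATEGORIES)                       # 8 units per layer
rng = np.random.default_rng(0)

W_ih = rng.normal(0.0, 0.5, (2 * N, N))   # (input + context) -> hidden
W_ho = rng.normal(0.0, 0.5, (N, N))       # hidden -> output

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def one_hot(i):
    v = np.zeros(N)
    v[i] = 1.0
    return v

def train_sequence(seq, lr=0.1):
    """One pass over a sequence of category indices, predicting the next
    category at each step.  Error-driven updates with copy-back context
    (in the spirit of Elman's SRN; no backpropagation through time)."""
    global W_ih, W_ho
    context = np.zeros(N)
    total_err = 0.0
    for cur, nxt in zip(seq, seq[1:]):
        x = np.concatenate([one_hot(cur), context])   # current word + context
        h = sigmoid(x @ W_ih)                          # hidden layer
        y = sigmoid(h @ W_ho)                          # predicted next category
        err = one_hot(nxt) - y
        total_err += float(np.mean(err ** 2))
        delta_o = err * y * (1.0 - y)                  # output deltas
        delta_h = (delta_o @ W_ho.T) * h * (1.0 - h)   # hidden deltas
        W_ho += lr * np.outer(h, delta_o)
        W_ih += lr * np.outer(x, delta_h)
        context = h                                    # copy hidden state back
    return total_err / max(len(seq) - 1, 1)
```

Calling it on a toy category sequence, e.g. train_sequence([0, 2, 7, 1, 3, 7]), returns the mean prediction error for that pass.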
Experimental setup • Nets trained on each of the 32 languages • 25 nets per language: 5 different initial weight settings × 5 different random training sets • Each training set contained 1000 words • Each net trained for 7 passes through its data • So: 800 simulations, with 7000 words of training each • Output measured as the mean error of the network’s predictions against the correct probability distribution for the next word
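The regime described on this slide could be organised roughly as follows, reusing train_sequence from the SRN sketch above; sample_corpus and reset_weights are assumed placeholders for the corpus generator and weight re-initialisation, which the slide does not spell out.

```python
# Hypothetical outline of the 32 x 25 training regime.
results = {}
for lang in range(32):                           # the 32 language types
    per_net_error = []
    for net_id in range(25):                     # 5 weight inits x 5 corpora
        reset_weights(seed=net_id)               # assumed: re-initialise W_ih, W_ho
        corpus = sample_corpus(lang, n_words=1000, seed=net_id)  # assumed helper
        for epoch in range(7):                   # 7 passes = 7000 training words
            err = train_sequence(corpus)
        per_net_error.append(err)                # error after the final pass
    results[lang] = sum(per_net_error) / len(per_net_error)
```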
Results 1: Net error v. recursive inconsistency • Net error correlates very well with number of inconsistencies (r=.83, p<.0001)
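A sketch of how such a correlation could be computed from the per-language results, reusing the hypothetical results dict and count_inconsistencies from the sketches above; head_order_of is another assumed helper mapping a language type to its head-order settings.

```python
from scipy.stats import pearsonr

inconsistencies = [count_inconsistencies(head_order_of(lang)) for lang in results]
mean_errors = [results[lang] for lang in results]

r, p = pearsonr(inconsistencies, mean_errors)   # Pearson correlation and p-value
print(f"r = {r:.2f}, p = {p:.4f}")
```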
Typological data • 625 languages have been characterised in terms of: • Verb-object order • Adposition order (i.e., prepositions or postpositions) • Genitive order • Grouped according to historical relatedness into 252 genera. (Why?) • This controls for imbalances in the sample that are due to historical epiphenomena.
Results 2: Net error v. cross-linguistic distribution • Net error correlates well with proportion of genera (r=.35, p<.05)
Conclusions, and potential problems • We have moved from: • Learners adapt to be good at language (via natural selection) • To: • Language adapts to us • Concerns: • What do Christiansen’s results say about Elman’s and Batali’s? • Are the neural nets modelling learning, or processing? • What about other universals (e.g., subjacency)? • Is equating learning difficulty with universal distribution valid? • Where do the languages come from, and what do the errors mean?