March 15, 2006 CS 182 Sections 101 - 102 slides created by Eva Mok (emok@icsi.berkeley.edu), modified by JGM
Announcements • a5 is due Friday night at 11:59pm • a6 is out tomorrow (2nd coding assignment), due the Monday after spring break • Midterm solution will be posted (soon)
Quick Recap • This Week • you just had the midterm • a bit more motor control • a bit on belief nets and feature structures • Coming up • Bailey’s model of learning hand-action verbs
Your Task: As far as the brain / thought / language is concerned, what is the single biggest mystery to you at this point?
Remember Recruitment Learning? • One-shot learning • The idea is that for things like words or grammar, kids learn at least something from a single input • Granted, they might not get it completely right on the first shot • But over time, their knowledge slowly converges to the right answer (i.e. they build a model that fits the data)
Model Merging • Goal: • learn a model given data • The model should: • explain the data well • be "simple" • be able to make generalizations
Naïve way to make a model • create a special case for each piece of data • this of course gets the training data completely right • but it cannot generalize at all when test data comes in • how to fix this: Model Merging • "compact" the special cases into more descriptive rules without losing too much performance
Basic idea of Model Merging • Start with the naïve model: one special case for each piece of data • While performance increases • Create a more general rule that explains some of the data • Discard the corresponding special cases
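A minimal sketch of this loop in Python (illustrative only; `propose_merge` and `score` are hypothetical helpers standing in for whatever generalization step and performance measure a particular model uses, not names from the assignment):

```python
# Illustrative sketch of the model-merging loop, not the a6 starter code.
# propose_merge(model): hypothetical helper returning (new_rule, replaced_cases)
#                       or None when no further generalization applies.
# score(model, data):   hypothetical performance measure (higher is better).

def model_merge(data, propose_merge, score):
    # naive starting model: one special case per piece of data
    model = [("special-case", d) for d in data]
    best = score(model, data)
    while True:
        proposal = propose_merge(model)
        if proposal is None:
            break
        new_rule, replaced = proposal
        candidate = [r for r in model if r not in replaced] + [new_rule]
        if score(candidate, data) <= best:
            break          # performance no longer increases, so stop merging
        model, best = candidate, score(candidate, data)
    return model
```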
2 examples of Model Merging • Bailey’s VerbLearn system • model that maps actions to verb labels • performance: complexity of model + ability to explain data (the MAP score) • Assignment 6 - Grammar Induction • model that maps sentences to grammar rules • performance: size of grammar + derivation length of sentences (the cost)
Grammar • Grammar: rules that govern which sentences are legal in a language • e.g. Regular Grammar, Context-Free Grammar • Production rules in a grammar have the form left-hand side → right-hand side • Terminal symbols: a, b, c, etc. • Non-terminal symbols: S, A, B, X, etc. • Different classes of grammar restrict where these symbols can go • We’ll see an example on the next slide
Right-Regular Grammar • Right-Regular Grammar is a further restricted class of Regular Grammar • Non-terminal symbols are always at the right end • e.g.:
S → a b c X
X → d e
X → f
• valid sentences would be "abcde" and "abcf"
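To make the example concrete, here is a small illustrative Python snippet (not part of a6) that stores those three rules as data and enumerates every string the grammar can generate:

```python
# Right-regular grammar from the slide: S -> a b c X, X -> d e, X -> f.
# Each non-terminal maps to the list of its possible right-hand sides.
RULES = {
    "S": [["a", "b", "c", "X"]],
    "X": [["d", "e"], ["f"]],
}

def expand(symbols):
    """Yield every terminal string derivable from a sequence of symbols."""
    if not symbols:
        yield ""
        return
    head, rest = symbols[0], symbols[1:]
    if head in RULES:                     # non-terminal: try each of its rules
        for rhs in RULES[head]:
            yield from expand(rhs + rest)
    else:                                 # terminal: keep it and expand the rest
        for tail in expand(rest):
            yield head + tail

print(sorted(expand(["S"])))              # ['abcde', 'abcf']
```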
Grammar Induction • As input data (e.g. “abcde”, “abcf”) comes in, we’d like to build up a grammar that explains the data • We can certainly have one rule for each sentence we see in the data (the naive approach: no generalization) • We’d rather “compact” the grammar • In a6, you have two ways of doing this “compaction” • prefix merge • suffix merge
prefix merge:
S → a b c d e
S → a b c f
becomes
S → a b c X
X → d e
X → f
suffix merge:
S → a b c d e
S → f c d e
becomes
S → a b X
S → f X
X → c d e
How do we find the model?
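In code, the two operations might look roughly like this (an illustrative Python sketch, assuming a rule is a `(lhs, rhs)` pair with `rhs` a list of symbols; the a6 starter code may represent grammars differently):

```python
# Illustrative sketches of the two merge operations, not the a6 API.
# A rule is (lhs, rhs), e.g. ("S", ["a", "b", "c", "d", "e"]).

def prefix_merge(r1, r2, n, new_nt):
    """Factor the shared length-n prefix of two rules out into new_nt."""
    (lhs, rhs1), (_, rhs2) = r1, r2
    assert rhs1[:n] == rhs2[:n]          # first n symbols must match
    return [
        (lhs, rhs1[:n] + [new_nt]),      # S -> shared prefix, X
        (new_nt, rhs1[n:]),              # X -> rest of r1
        (new_nt, rhs2[n:]),              # X -> rest of r2
    ]

def suffix_merge(r1, r2, n, new_nt):
    """Factor the shared length-n suffix of two rules out into new_nt."""
    (lhs1, rhs1), (lhs2, rhs2) = r1, r2
    assert rhs1[-n:] == rhs2[-n:]        # last n symbols must match
    return [
        (lhs1, rhs1[:-n] + [new_nt]),    # S -> head of r1, X
        (lhs2, rhs2[:-n] + [new_nt]),    # S -> head of r2, X
        (new_nt, rhs1[-n:]),             # X -> shared suffix
    ]

# the slide's first example:
print(prefix_merge(("S", list("abcde")), ("S", list("abcf")), 3, "X"))
# [('S', ['a', 'b', 'c', 'X']), ('X', ['d', 'e']), ('X', ['f'])]
```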
Contrived Example • Suppose you have these 3 grammar rules:
r1: S → eat them here or there
r2: S → eat them anywhere
r3: S → like them anywhere or here or there
• 5 merging options • prefix merge (r1, r2, 1) • prefix merge (r1, r2, 2) • suffix merge (r1, r3, 1) • suffix merge (r1, r3, 2) • suffix merge (r1, r3, 3)
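One way to see where those five options come from (again an illustrative Python sketch, using the same rule representation as above): count how many leading and trailing symbols each pair of rules shares.

```python
# Hypothetical helper listing the legal (kind, length) merges for a rule pair.
def merge_options(r1, r2):
    (_, rhs1), (_, rhs2) = r1, r2
    options, limit = [], min(len(rhs1), len(rhs2))
    n = 0
    while n < limit and rhs1[n] == rhs2[n]:            # shared prefix lengths
        n += 1
        options.append(("prefix", n))
    n = 0
    while n < limit and rhs1[-1 - n] == rhs2[-1 - n]:  # shared suffix lengths
        n += 1
        options.append(("suffix", n))
    return options

r1 = ("S", "eat them here or there".split())
r2 = ("S", "eat them anywhere".split())
r3 = ("S", "like them anywhere or here or there".split())
print(merge_options(r1, r2))   # [('prefix', 1), ('prefix', 2)]
print(merge_options(r1, r3))   # [('suffix', 1), ('suffix', 2), ('suffix', 3)]
```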
Computationally • Kids aren’t presented with all the data at once • Instead they’ll hear these sentences one by one: • eat them here or there • eat them anywhere • like them anywhere or here or there • As each sentence (i.e. data) comes in, you create one rule for it, e.g. S → eat them here or there • Then you look for ways to merge as more sentences come in
Example 1: just prefix merge • After the first two sentences are presented, we can already do a prefix merge of length 2:
r1: S → eat them here or there
r2: S → eat them anywhere
merge into
r3: S → eat them X1
r4: X1 → here or there
r5: X1 → anywhere
Example 2: just suffix merge • After the first three sentences are presented, we can do a suffix merge of length 3:
r1: S → eat them here or there
r2: S → eat them anywhere
r3: S → like them anywhere or here or there
r1 and r3 merge into
r4: S → eat them X2
r5: S → like them anywhere or X2
r6: X2 → here or there
Your Task in a6 • pull in sentences one by one • monitor your sentences • do either a prefix merge or a suffix merge as soon as it’s “good” to do so
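Put together, the incremental procedure might look roughly like this (a sketch under assumptions: `best_merge` and `cost` are hypothetical names, and the real a6 starter code defines its own interfaces and rule format):

```python
# Illustrative outline of the a6 loop, not the actual assignment code.
# best_merge(grammar): hypothetical helper returning the best merged grammar,
#                      or None if no prefix/suffix merge applies.
# cost(grammar, data): hypothetical cost function implementing c(G)
#                      (see the next slide for the formula).

def induce_grammar(sentences, best_merge, cost):
    grammar, seen = [], []
    for sentence in sentences:
        seen.append(sentence)
        # one new rule per incoming sentence: S -> <words of the sentence>
        grammar.append(("S", sentence.split()))
        # merge for as long as it lowers the cost
        while True:
            merged = best_merge(grammar)
            if merged is None or cost(merged, seen) >= cost(grammar, seen):
                break
            grammar = merged
    return grammar
```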
How do we know if a model is good? • want a small grammar • but want it to explain the data well • minimize the cost along the way: c(G) = α · s(G) + d(G, D), where s(G) = size of the grammar, d(G, D) = derivation length of the sentences, and α = a learning factor to play with
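With the same `(lhs, rhs)` rule representation as in the earlier sketches, the size term is a simple count of right-hand-side symbols; the derivation-length term would come from actual parses in a6, so the hypothetical `cost` below just takes the per-sentence derivation lengths as an argument:

```python
# c(G) = alpha * s(G) + d(G, D), following the slide's cost function.
# s(G): total number of symbols on the rules' right-hand sides.
# d(G, D): total number of rule applications needed to derive the data.

def grammar_size(grammar):
    return sum(len(rhs) for _, rhs in grammar)

def cost(grammar, derivation_lengths, alpha):
    return alpha * grammar_size(grammar) + sum(derivation_lengths)
```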
Back to Example 2 • Remember your data is: • eat them here or there • eat them anywhere • like them anywhere or here or there • Your original grammar:
r1: S → eat them here or there
r2: S → eat them anywhere
r3: S → like them anywhere or here or there
size of grammar s(G) = 15
derivation length of sentences d(G, D) = 1 + 1 + 1 = 3
c(G) = α · s(G) + d(G, D) = 15α + 3
Back to Example 2 • Remember your data is: • eat them here or there • eat them anywhere • like them anywhere or here or there • Your new grammar:
r2: S → eat them anywhere
r4: S → eat them X2
r5: S → like them anywhere or X2
r6: X2 → here or there
size of grammar s(G) = 14
derivation length of sentences d(G, D) = 2 + 1 + 2 = 5
c(G) = α · s(G) + d(G, D) = 14α + 5
Merging wins only when 14α + 5 < 15α + 3, i.e. when α > 2, so in fact you SHOULDN’T merge if α ≤ 2
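A quick numerical check of that threshold (plain Python arithmetic over the two costs 15α + 3 and 14α + 5 from this example):

```python
# compare the original grammar's cost (15*alpha + 3) against the
# merged grammar's cost (14*alpha + 5) for a few alpha values
for alpha in (1, 2, 3):
    original, merged = 15 * alpha + 3, 14 * alpha + 5
    print(alpha, original, merged, "merge" if merged < original else "don't merge")
# alpha = 1: 18 vs 19 -> don't merge
# alpha = 2: 33 vs 33 -> don't merge
# alpha = 3: 48 vs 47 -> merge
```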