
Presentation Transcript


1. Announcements
• Nearing the end (gasp…)
  • Two lectures on miscellaneous topics
  • One on evaluation of learning systems
• Topics
  • Topics / confusions in statistics
  • Artificial Neural Nets
• Any specific topics desired for Wednesday???
• Final Exam: Dec. 12, 7-8:15 PM (not cumulative)

2. Religious Wars among Statisticians: What's Real, the Distribution or the Data?
• Frequentist / objectivist / Fisherian / classical statistics
  • Probability = limiting relative frequency as the sample size increases
  • Bayes' Theorem is OK but not so central
  • Priors and inference must be "objective": rooted in counting
  • Ronald Fisher
• Bayesian
  • Rev. Thomas Bayes (also Laplace)
  • Inference should also reflect beliefs ("subjective priors")
  • Bayes' Theorem specifies how to use subjective priors
  • Probabilities as capturing uncertainties
• The two camps often agree on conclusions
  • Methods often differ
  • Bayesians will claim inferences that frequentists eschew

3. Are You a Frequentist?
• The world is a distribution
• Data are a random sample; our beliefs are irrelevant
• We can come to know the world via data
  • The distribution is primary
  • The particular data are incidental and not important
  • Different samples have the same expected information (assuming independent samples and the same sample size)
• Hypothesize; Observe; Evaluate
  • Changing the hypothesis after seeing the data taints the data
    • baseball
    • lottery
    • stock scam: wrong hypothesis

4. Are You a Bayesian?
• Evidence is primary
• Evidence can be objective (data) or subjective
• Evidence (even as data) can testify for / against different distributions at the same time
  • Will my plane crash?
  • What is the chance of rain tomorrow (yesterday)?
• Subjective uncertainty as a statistical distribution
  • Two-meter problem

5. Two Envelope Problem
• I have two envelopes
• They each contain money
• I offer you one
• You can't tell from the outside, but I put twice as much in one as the other…
• What's the analysis for this problem? (one sketch follows below)
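One common way to unpack the puzzle (a sketch of the standard analysis, not the only resolution): if your envelope holds $x$, the tempting argument says the other holds $2x$ or $x/2$ with probability $\tfrac12$ each, so switching looks profitable:

\[
E[\text{other}] \;=\; \tfrac{1}{2}(2x) + \tfrac{1}{2}\cdot\tfrac{x}{2} \;=\; \tfrac{5}{4}\,x \;>\; x .
\]

By symmetry, the same argument tells the holder of the other envelope to switch too, which is absurd. The flaw is assuming that, conditional on every observed amount $x$, you are equally likely to hold the smaller or the larger envelope; no proper prior on the amounts makes that true for all $x$ at once. It is exactly the sort of problem where frequentist and Bayesian framings pull in different directions.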

6. Simpson's Paradox
• Dr. Bayes: an implemented statistical inference system
• There is a dreaded disease with two treatments, A and B
• Dr. Bayes has seen some training data
• We observe this dialog:
  • Should Patient 1 take A or B?
    • Dr. B.: Is Patient 1 male or female? Male
    • Dr. B.: Patient 1 should take A
  • Should Patient 2 take A or B?
    • Dr. B.: Is Patient 2 male or female? Female
    • Dr. B.: Patient 2 should take A
  • Should Patient 3 take A or B?
    • Dr. B.: Is Patient 3 male or female? Unknown
    • Dr. B.: Patient 3 should take B
• Do we look for a bug in Dr. Bayes?

7. Dr. Bayes
• Three Boolean random variables: Gender (M/F), Treatment (A/B), Improvement (Y/N)
• 100 patients:
  • 50 M, 50 F
  • 50 on A, 50 on B
• We want the probability of improvement given what we know: gender and treatment
  • P(Y|M,A) = 0.625   P(Y|M,B) = 0.5
  • P(Y|F,A) = 0.9     P(Y|F,B) = 0.8
  • P(Y|A) = 0.68      P(Y|B) = 0.74
• Note the reversal: A wins within each gender, yet B wins in aggregate (counts reconstructed below)
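To see the paradox in the numbers, here is a small check in Python. The per-cell counts are not on the slide; they are reconstructed to match the slide's marginals (50/50 gender, 50/50 treatment) and conditional probabilities, which in fact pin them down uniquely.

```python
# Simpson's paradox check: counts reconstructed from the slide's
# probabilities (50 M / 50 F, 50 on A / 50 on B, 100 patients total).
# treated[g, t] = patients of gender g given treatment t;
# improved[g, t] = how many of those improved.
treated  = {('M', 'A'): 40, ('M', 'B'): 10, ('F', 'A'): 10, ('F', 'B'): 40}
improved = {('M', 'A'): 25, ('M', 'B'):  5, ('F', 'A'):  9, ('F', 'B'): 32}

# Per-gender improvement rates: A wins for both men and women.
for g in 'MF':
    for t in 'AB':
        print(f"P(Y|{g},{t}) = {improved[g, t] / treated[g, t]}")

# Aggregate rates: B looks better, because A was given mostly to men
# (the lower-base-rate group) and B mostly to women.
for t in 'AB':
    n = sum(treated[g, t] for g in 'MF')
    y = sum(improved[g, t] for g in 'MF')
    print(f"P(Y|{t}) = {y / n}")
```

Running it reproduces every number on the slide: 0.625, 0.5, 0.9, 0.8 per gender, but 0.68 vs. 0.74 in aggregate. Dr. Bayes has no bug; knowing the gender (either value!) flips the recommendation, because gender confounds the treatment assignment.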

8. Simpson's Paradox
• Real-life examples:
  • Quality of health care in hospitals
  • Gender discrimination in engineering college admissions

9. Perceptron to ANN
• Very limited expressiveness
  • Can't do XOR on two Booleans
• If only we could stack them…
• What functions could we represent? (a hand-wired XOR sketch follows below)
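As the slide suggests, stacking fixes XOR. A minimal hand-wired sketch: two threshold units (OR and NAND) feeding an AND unit. The weights and thresholds here are illustrative choices, not learned and not from the slides.

```python
# Hand-wired two-layer perceptron computing XOR on two Booleans.
# A single linear threshold unit cannot represent XOR, but a stack can:
# XOR(a, b) = AND(OR(a, b), NAND(a, b)).

def step(x):
    """Threshold activation: fires iff the weighted sum is >= 0."""
    return 1 if x >= 0 else 0

def xor(a, b):
    or_gate   = step(a + b - 0.5)            # fires if a or b is 1
    nand_gate = step(1.5 - a - b)            # fires unless both are 1
    return step(or_gate + nand_gate - 1.5)   # AND of the two hidden units

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor(a, b))         # prints the XOR truth table
```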

10. What can multi-layer perceptrons (ANNs) represent?
[diagram: a multi-layer network of threshold units]
• What if we change the topology?
• More levels = more expressiveness?

11. Can We Still Learn Efficiently? (Is there a generalized perceptron convergence theorem?)
[diagram: multi-layer network]
• Now any assignment of labels (any function) can be represented
• Is this a good thing?

12. No*
• Minsky and Papert suspected there was not, in Perceptrons (1969)
• This largely killed off research interest
• Minsky and Papert were right
*But for a slightly modified linear device the answer becomes "Yes", quite easily

13. Why "No"?
• Threshold or step function: discontinuous, non-differentiable
• Sigmoid function: differentiable (see below)
[diagram: the same network with sigmoid units in place of threshold units]
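A sketch of the contrast the slide is drawing:

\[
\mathrm{step}(x)=\begin{cases}1 & x\ge 0\\ 0 & x<0\end{cases}
\qquad\qquad
g(x)=\frac{1}{1+e^{-x}}
\]

The step function's derivative is zero everywhere it is defined, so no gradient signal can pass through an internal unit. The sigmoid is smooth, with

\[
g'(x) = \frac{e^{-x}}{(1+e^{-x})^{2}} = g(x)\,\bigl(1-g(x)\bigr) > 0 \quad\text{for all } x,
\]

which is what makes the gradient calculation on the next slide possible.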

14. Back-Propagation
• Hinton, Rumelhart, …
• Common sigmoid: g(x) = (1 + e^{-x})^{-1}
  • Then g' = g(1 - g)
  • This is the missing factor in our original gradient weight-update expression
• Now internal gradients exist (and are easily calculated)
• Standard gradient descent works quite well (a minimal sketch follows below)
  • Can get caught in local extrema
    • Boltzmann machine
    • Add a hidden node
    • Random restarts
• Need to limit the number of hidden nodes. Why?
  • Suppose we learn to 100% accuracy on the training data
• Interpreting hidden nodes / extracting rules from ANNs
• New resurgence in interest from statistical learning; "neural" largely gone
• Think of it as a nonlinear multidimensional optimization device
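A minimal back-propagation sketch in Python, using the slide's sigmoid and the g' = g(1 - g) factor. The 2-2-1 architecture, learning rate, epoch count, and squared-error loss are illustrative assumptions, not from the lecture; as the slide warns, a run can land in a local minimum, in which case a random restart (a different seed) helps.

```python
import numpy as np

# A 2-2-1 sigmoid network trained on XOR by plain gradient descent.
rng = np.random.default_rng(0)

def g(x):
    """Common sigmoid from the slide: g(x) = (1 + e^{-x})^{-1}."""
    return 1.0 / (1.0 + np.exp(-x))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

W1 = rng.normal(size=(2, 2)); b1 = np.zeros(2)    # input -> hidden
W2 = rng.normal(size=(2, 1)); b2 = np.zeros(1)    # hidden -> output
lr = 1.0                                          # illustrative step size

for epoch in range(10_000):
    # Forward pass.
    h = g(X @ W1 + b1)            # hidden activations
    out = g(h @ W2 + b2)          # network output

    # Backward pass for squared error: the g' = g(1 - g) factor is
    # exactly what lets the gradient flow through internal units.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Gradient-descent weight updates.
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(out.round(3).ravel())       # should approach [0, 1, 1, 0]
```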
