ARTIFICIAL INTELLIGENCE: THE MAIN IDEAS
OLLI COURSE SCI 102
Tuesdays, 11:00 a.m. – 12:30 p.m., Winter Quarter, 2013
Higher Education Center, Medford, Room 226
Nils J. Nilsson
nilsson@cs.stanford.edu
http://ai.stanford.edu/~nilsson/
Course Web Page: www.sci102.com/
For information about parking near the HEC, go to http://www.ci.medford.or.us/page.asp?navid=2117 (that page links to parking rules and maps).
Adding Reasoning to the Model of an Agent
[Agent-model diagram with components: Perception, Memory, Reasoner, Planner, Action Selection]
Reasoning Under Uncertainty
We (and AI agents) are uncertain about almost everything!
"In these matters the only certainty is that nothing is certain." -- Pliny the Elder
"When one admits that nothing is certain one must, I think, also admit that some things are much more nearly certain than others." -- Bertrand Russell
How Should We Reason if Things Are Uncertain?
∀x [Green(x) ⇒ Liftable(x)]
Green(A)
Conclusion: Liftable(A)
But what if we are not certain that A is green? Perhaps it's only "likely" that it's green. And perhaps we are not certain that all green blocks are liftable? We need to use "probabilistic reasoning."
Probabilities: A Tool for Dealing With Uncertainty
[Figure: probability-of-rain forecast for Pendleton, OR, as of Jan. 19, 2013]
Good Poker Players Are Familiar with Probabilities
Hand          Probability
Single Pair   0.422569
Two Pair      0.047539
Triple        0.021128
A Straight    0.00392465
Full House    0.001441
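As a quick sanity check on the first row of the table, the one-pair probability can be computed by counting hands. This short Python sketch (not part of the original slides) reproduces the 0.422569 figure:

```python
from math import comb

# Probability of exactly one pair in a 5-card poker hand,
# computed by counting hands (a sketch to check the table above).
one_pair = (
    comb(13, 1) * comb(4, 2)      # choose the paired rank and its two suits
    * comb(12, 3) * 4**3          # three other distinct ranks, one suit each
) / comb(52, 5)                   # all 5-card hands

print(one_pair)  # ~0.422569, matching the table's "Single Pair" entry
```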
So Are Physicians
Inputs: symptoms, family history, medical history, tests, . . .
Probabilistic reasoning, drawing on medical knowledge bases and clinical experience, produces:
* Likely diagnoses
* Rule-out of unlikely but serious diagnoses
Methods for Coming Up With Probability Values
* Mathematical: as in calculating poker probabilities
* Frequency: from large databases of records (such as mortality tables, etc.)
* "Subjective": as in guessing football odds, horse-racing odds, etc.
Markets Are Often Used to Establish Subjective Probabilities
Foresight Exchange Prediction Market:
"A power plant will sell energy produced by nuclear fusion by 31 December 2045. After its initial energy sale, it must operate (i.e., sell energy) regularly for a minimum of one year. 'Regularly' is defined as >50% of the time."
Last Price: $0.71; that is, the probability is 0.71
http://www.ideosphere.com/fx/index.html
More Prices http://www.ideosphere.com/fx/index.html
Defining Bets Cold fusion of hydrogen in nickel can produce over 10 watts/cc net power [by Jan. 1, 2015]. The phrase "cold fusion" has its vernacular meaning of any low energy nuclear reaction that produces heat. http://www.ideosphere.com/fx/index.html
Some Basics About Probabilities
* p(x) denotes the probability of x [sometimes written Pr(x)]. Example: p(heads) = 0.5
* p(x) is always between 0 and 1 (sometimes expressed as a percentage)
* p(x) is sometimes expressed as "odds" (4 to 1 odds in favor of x is the same as p(x) = 0.8)
* p(x) + p(y) = 1 if x and y are mutually exclusive and exhaustive. Example: p(heads) + p(tails) = 1
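A small illustration (not from the slides) of the odds-to-probability conversion mentioned in the list above:

```python
def odds_to_probability(in_favor: float, against: float) -> float:
    """Convert 'in_favor to against' odds into a probability."""
    return in_favor / (in_favor + against)

print(odds_to_probability(4, 1))  # 0.8, matching the "4 to 1 odds" example
print(odds_to_probability(1, 1))  # 0.5, even odds, like a fair coin
```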
Conditional Probabilities
The probability that Joe has a fever, given that he has the flu, is greater than the probability that he has a fever when we don't know that fact.
p(Joe has high fever | Joe has flu) > p(Joe has high fever)
The left-hand side is a conditional probability; the right-hand side is a prior probability.
Much Causal Knowledge Is Probabilistic
Flu causes a fever (usually):
p(Joe has high fever | Joe has flu)
Here "Joe has flu" is the cause and "Joe has high fever" is a possible symptom.
What Is the Likely Cause of a Symptom?
p(Joe has high fever | Joe has flu) is known from medical (causal) knowledge.
But what about the reverse: p(Joe has flu | Joe has high fever)?
Bayes' Rule to the Rescue
p(x|y) = p(y|x) p(x) / p(y)
p(cause|symptom) = p(symptom|cause) p(cause) / p(symptom)
Thomas Bayes (c. 1701 – April 1761) was an English mathematician and Presbyterian minister.
Deriving Bayes' Rule
p(x|y) p(y) = p(x & y)        (unconditioning)
p(x & y) = p(y & x)           (the order of x and y doesn't matter)
p(y & x) = p(y|x) p(x)        (unconditioning again)
From whence: p(x|y) = p(y|x) p(x) / p(y)
Using Bayes' Rule
p(Joe has flu | Joe has high fever) = p(Joe has high fever | Joe has flu) p(Joe has flu) / p(Joe has high fever)
The fever reading comes from a thermometer; the probabilities on the right come from medical and statistical "knowledge."
Bayesian Networks
Judea Pearl
In AI, Probabilistic Knowledge Is Represented in "Bayesian Networks"
A Very Small Network: a node "Joe has flu" with a "causality" link to a node "Joe has high fever." The link is annotated with a table of probabilities:

Joe has flu    p(Joe has high fever)
T              0.95
F              0.005

Going in the direction of the arrow is called "causality reasoning."
Use Bayes' Rule to Go Against the Arrow
Given the symptom "Joe has high fever," calculate the probability of "Joe has flu" using Bayes' rule; in the slide's example it comes out to 0.99.
Going against the arrow is called "evidential reasoning."
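A minimal sketch (not from the slides) of this evidential-reasoning step, using the conditional probability table above and an assumed prior p(flu); the slides do not state the prior they used, so the 0.35 below is purely illustrative (it happens to yield a posterior near 0.99):

```python
# Evidential reasoning in the two-node network above via Bayes' rule.
# The CPT values come from the slide; the prior p(flu) is an assumption.
p_fever_given_flu = 0.95       # from the table (flu = T)
p_fever_given_no_flu = 0.005   # from the table (flu = F)
p_flu = 0.35                   # hypothetical prior, not given in the slides

# Total probability of a high fever.
p_fever = p_fever_given_flu * p_flu + p_fever_given_no_flu * (1 - p_flu)

# Bayes' rule: p(flu | fever) = p(fever | flu) p(flu) / p(fever)
p_flu_given_fever = p_fever_given_flu * p_flu / p_fever
print(round(p_flu_given_fever, 3))  # ~0.99 with this assumed prior
```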
The Calculations Are More Complex for Bigger Networks
[Figure: a larger network in which each node's conditional probability table expresses the causality information]
A Larger Causality Network Check out interactive applet at: http://aispace.org/bayes/ and click on sample problems
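To give a feel for how those calculations work at a small scale, here is a hedged Python sketch of inference by enumeration on a made-up three-node chain; the node names and numbers are illustrative and are not the applet's actual car network:

```python
from itertools import product

# A made-up three-node chain: battery OK -> power OK -> car starts.
# All structure and numbers here are illustrative assumptions.
p_battery = {True: 0.95, False: 0.05}
p_power_given_battery = {True: {True: 0.98, False: 0.02},
                         False: {True: 0.05, False: 0.95}}
p_starts_given_power = {True: {True: 0.90, False: 0.10},
                        False: {True: 0.0, False: 1.0}}

def joint(battery, power, starts):
    """Joint probability of one full assignment, from the chain's CPTs."""
    return (p_battery[battery]
            * p_power_given_battery[battery][power]
            * p_starts_given_power[power][starts])

# Evidential query by enumeration: p(battery bad | car doesn't start).
num = sum(joint(False, pw, False) for pw in (True, False))
den = sum(joint(b, pw, False) for b, pw in product((True, False), repeat=2))
print(num / den)  # ~0.30 with these made-up numbers
```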
The Car Doesn't Start: Evidential Reasoning Needed
[Figure: the car-diagnosis network with the observation "car starts" set to p = 0; the updated node probabilities shown include a computed p = 0.283 and p = 0.023]
New Information Changes Things
[Figure: the same network with a second observation set to p = 0; the updated node probabilities now shown include p = 0.283 and p = 0.1]
Bayesian Networks Can Be Learned
Can this net be learned using the statistical data that it generates?
Suppose We Know There Are 37 Nodes
Can we learn what is linked to what?
Generate Thousands of Samples That Obey the Underlying Probabilities Given by the Unknown Network
[Figure: a table of samples generated from the unknown network]
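A hedged sketch of the statistical step involved: given many samples, conditional probability entries can be estimated by counting co-occurrences. The variable names and toy data here are made up for illustration and are not the 37-node network from the slides:

```python
# Hypothetical samples: each is a dict of node -> observed truth value.
samples = [
    {"A": True, "B": True}, {"A": True, "B": True},
    {"A": True, "B": False}, {"A": False, "B": False},
    {"A": False, "B": False}, {"A": False, "B": True},
]

def estimate_conditional(samples, child, parent, parent_value):
    """Estimate p(child = True | parent = parent_value) by counting."""
    matching = [s for s in samples if s[parent] == parent_value]
    hits = sum(1 for s in matching if s[child])
    return hits / len(matching)

# Estimated CPT entries for B given A (toy numbers from the toy samples).
print(estimate_conditional(samples, "B", "A", True))   # 2/3
print(estimate_conditional(samples, "B", "A", False))  # 1/3
```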
The Learned Network
Spirtes, P., and Meek, C. 1995. "Learning Bayesian networks with discrete variables from data." Proc. 1st Int. Conf. on Knowledge Discovery and Data Mining.
Comparison
[Figure: the learned network (generated from the data) alongside the target network (from which the data was generated)]
Availability of Large Amounts of Data Permits the Statistical Analysis Needed to Learn Bayesian Networks
Some Learned Networks Can Be Quite Large
Gene network inferred by analyzing human cell-cycle expression data
http://www.sciencedirect.com/science/article/pii/S1532046411000311#
Applications of Bayesian Networks
* Computational biology and bioinformatics (gene regulatory networks, protein structure, . . .)
* Medicine
* Document classification
* Information retrieval
* Image processing
* Decision support systems
* Engineering
* Law
* Speech recognition
http://en.wikipedia.org/wiki/Bayesian_network#Applications
Adding Language Ability to the Model of an Agent
[Agent-model diagram with components: Perception, Language Processing, Memory, Reasoner, Planner, Action Selection]
Natural Language Processing (NLP)
* Converting speech to text
* "Understanding" text
* Translation
* Speech and text generation
Converting Speech to Text
* Capturing the speech waveform
* Division of the waveform into "frames"
* Determining "features" of each frame
* Recognizing "phonemes"
* Converting phonemes to text
A Hard Problem!
"There are many ways to recognize speech."
"There are many ways to wreck a nice beach."
"I scream for ice cream."
Example Speech Waveform
[Figure: a speech waveform labeled with the sentence that was spoken; "phonemes" are acoustic elements]
Early Processing
Speech waveform → spectral analysis, etc. → features of the waveform: F1, F2, F3, . . .
Different Phonemes Cause Different Waveform Features (with some uncertainty)
A Bayesian network with a node u (the phoneme, which could be one of 40 or so) linked to a node v (the features, which could be one of several); the link carries p(v|u), the "acoustic model." Here u and v are variables.
Given a Particular Feature, F, Select the Most Probable Phoneme
The same Bayesian network, with the feature node fixed to the observed value F; the link carries p(F|u), the "acoustic model," and u is the phoneme (one of about 40).
Bayes' Rule: p(u|F) = p(F|u) p(u) / p(F)
Substitute each of the 40 or so phonemes into the above equation and note which gives the largest value. That's our guess for which phoneme was spoken.
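A minimal sketch (not from the slides) of that "substitute and take the largest" step. The phoneme set and probability numbers are made up; note that p(F) is the same for every candidate, so it can be dropped when comparing:

```python
# Hypothetical acoustic model p(F | u) for one observed feature F,
# and a prior p(u) over phonemes. All numbers are illustrative.
p_feature_given_phoneme = {"ae": 0.30, "eh": 0.20, "iy": 0.05}
p_phoneme = {"ae": 0.08, "eh": 0.06, "iy": 0.10}

# Bayes' rule: p(u | F) is proportional to p(F | u) p(u); p(F) is a
# constant across candidates, so the argmax doesn't need it.
def most_probable_phoneme():
    return max(p_feature_given_phoneme,
               key=lambda u: p_feature_given_phoneme[u] * p_phoneme[u])

print(most_probable_phoneme())  # "ae" with these made-up numbers
```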
Different Words Cause Different Phonemes (again, with some uncertainty)
A Bayesian network with a node x (the word, which could be one of thousands) linked to a node y (the phonemes, which could be one of 40 or so); the link carries p(y|x), the "word model."
Given a Particular Phoneme, PH, Select the Most Probable Word
The same network, with the phoneme node fixed to the observed value PH; the link carries p(PH|x), the "word model," and x is the word (one of thousands).
Bayes' Rule: p(x|PH) = p(PH|x) p(x) / p(PH)
Substitute each of the thousands of words into the above equation and note which gives the largest value. That's our guess for which word was spoken.
But the Process Is MUCH More Complicated!
A Language Model (what words are likely) sits at the word level (x1, x2, x3, . . .), and a Word Model connects each word to the phoneme level (y1, y2, y3, . . .). The Bayesian network connecting words with phonemes is called a "Hidden Markov Model" (HMM).
We Also Have an HMM Connecting Phonemes With Features
An Articulation Model (what phonemes are likely) sits at the phoneme level (v1, v2, v3, . . .), and an Acoustic Model connects each phoneme to the feature level (u1, u2, u3, . . .).
Use Both HMMs and Select the Most Probable Word Sequence
The full stack: waveform feature level (u1, u2, u3), phoneme level (v1, v2, v3), and word level (x1, x2, x3).
Learning: the various probabilities can be tuned for a particular speaker.
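To make "select the most probable word sequence" concrete, here is a hedged Viterbi-style sketch over a tiny made-up HMM; real recognizers work over phoneme lattices and large vocabularies, and every name and number below is illustrative:

```python
# A toy Viterbi decoder: hidden states are words, observations are phonemes.
# Transition, emission, and start probabilities are all made-up numbers.
states = ["ice", "cream", "scream"]
start_p = {"ice": 0.5, "cream": 0.2, "scream": 0.3}
trans_p = {"ice":    {"ice": 0.1, "cream": 0.7, "scream": 0.2},
           "cream":  {"ice": 0.4, "cream": 0.2, "scream": 0.4},
           "scream": {"ice": 0.5, "cream": 0.3, "scream": 0.2}}
emit_p = {"ice":    {"ay": 0.6, "s": 0.3, "k": 0.1},
          "cream":  {"ay": 0.1, "s": 0.2, "k": 0.7},
          "scream": {"ay": 0.1, "s": 0.6, "k": 0.3}}

def viterbi(observations):
    """Return the most probable hidden word sequence for the observations."""
    # best[s] = (probability of the best path ending in state s, that path)
    best = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        best = {s: max(((prob * trans_p[prev][s] * emit_p[s][obs], path + [s])
                        for prev, (prob, path) in best.items()),
                       key=lambda t: t[0])
                for s in states}
    return max(best.values(), key=lambda t: t[0])[1]

print(viterbi(["ay", "s", "k"]))  # ['ice', 'scream', 'cream'] with these numbers
```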