Watson Systems
By Team 7:
• Pallav Dhobley (09005012)
• Vihang Gosavi (09005016)
• Ashish Yadav (09005018)
Motivation:
• Deep Blue's triumph over Kasparov in 1997.
• IBM in search of a new challenge.
Jeopardy!
• 2004 – the search ends!
• One of the most popular quiz shows in the U.S.A.
• Broad/open domain.
• Complex language.
• High speed.
• High precision.
• Accurate confidence.
Easier than playing Chess? NO!
Chess:
• Finite moves and states.
• Mathematically well-defined search space.
• Symbols have precise mathematical meaning.
Natural Language:
• Implicit.
• Highly contextual.
• Ambiguous.
• Imprecise.
Easy Question: (LN(1,25,46,798*π))^3 / 34,600.47 = 0.155
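As a quick check of the arithmetic (1,25,46,798 is the Indian-style grouping of 12,546,798), a two-line computation:

```python
import math

# (ln(1,25,46,798 * pi))^3 / 34,600.47 ; 1,25,46,798 == 12,546,798 in Western grouping
value = math.log(12546798 * math.pi) ** 3 / 34600.47
print(round(value, 3))  # 0.155
```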
Hard Question:
• Where was our "Father of the Nation" born?
  - Contextual.
  - Imprecise.
• It is easy for us Indians to relate the term "Father of the Nation" to M.K. Gandhi.
• Not so for computers.
• Hence the need to learn from as-is content.
What is Watson?
• An advanced search engine? ×
• Some fancy database-retrieval system? ×
• The beginning of Skynet? ×
• The science behind an answer? √
Principles of DeepQA:
• Massive parallelism – each hypothesis and interpretation is analyzed independently, in parallel, to generate candidate answers.
• Many experts – facilitate the integration and contextual evaluation of a wide range of analytics produced by many algorithms running in parallel.
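A minimal sketch of these two principles, with made-up scorer functions standing in for Watson's hundreds of analytics: each candidate is evaluated by many independent "experts" in parallel, and no single expert commits to the final answer.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical "expert" scorers; the real DeepQA runs hundreds of such analytics.
def keyword_overlap_score(question, candidate): return 0.4
def answer_type_score(question, candidate):     return 0.7
def popularity_score(question, candidate):      return 0.2

EXPERTS = [keyword_overlap_score, answer_type_score, popularity_score]

def score_candidate(question, candidate):
    # Every expert contributes a feature; no single expert commits to an answer.
    return {expert.__name__: expert(question, candidate) for expert in EXPERTS}

def analyze_in_parallel(question, candidates):
    with ThreadPoolExecutor() as pool:
        return dict(zip(candidates,
                        pool.map(lambda c: score_candidate(question, c), candidates)))

print(analyze_in_parallel("Who wrote Treasure Island?", ["Stevenson", "Defoe"]))
```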
Principles of DeepQA (contd.):
• Pervasive confidence estimation – no single component commits to an answer; components produce features along with confidences that are combined later.
• Integrate shallow and deep knowledge – combine shallow and deep semantics for better precision, e.g. shallow semantics: keyword matching; deep semantics: logical relationships.
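An illustrative contrast between shallow and deep matching, with toy functions and a tuple-based relation format invented for this sketch:

```python
# Shallow semantics: bag-of-words overlap between clue and passage.
def keyword_match(clue, passage):
    c, p = set(clue.lower().split()), set(passage.lower().split())
    return len(c & p) / max(len(c), 1)

# Deeper semantics: does an extracted relation support sea(India, x, west)?
# Relations are plain 4-tuples here; the real system does full semantic analysis.
def relation_match(target, passage_relations):
    pred, arg1, _, direction = target
    return any(r[0] == pred and r[1] == arg1 and r[3] == direction
               for r in passage_relations)

print(keyword_match("this sea lies to the west of India",
                    "The Arabian Sea lies to the west of India"))
print(relation_match(("sea", "India", None, "west"),
                     [("sea", "India", "Arabian Sea", "west")]))
```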
Step 0: Content Acquisition
• Identify and gather the content used for answering questions and for supporting evidence.
• Involves analyzing example questions from the problem space, i.e. question–answer pairs from previous games.
• Encyclopedias, dictionaries, wiki pages, etc. make up the evidence sources.
• Extract, verify and merge the most informative nuggets as part of content acquisition.
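A toy illustration of content acquisition (the source lists and the "informative nugget" filter are invented; Watson's actual corpus engineering is far more involved):

```python
# Toy content acquisition: gather text from chosen source types and keep only
# "informative nuggets" (here: snippets above a word-count threshold).
SOURCES = {
    "encyclopedia": ["Treasure Island is an adventure novel by R. L. Stevenson."],
    "dictionary":   ["antagonist: the principal opponent of the protagonist."],
    "wiki":         ["Jeopardy!"],   # too short -> dropped by the filter
}

def acquire_corpus(sources, min_words=5):
    corpus = []
    for source_name, texts in sources.items():
        for text in texts:
            if len(text.split()) >= min_words:
                corpus.append({"source": source_name, "text": text})
    return corpus

print(len(acquire_corpus(SOURCES)))  # 2
```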
Step 1: Question Analysis
The initial analysis that determines how the question will be processed by the rest of the system.
• Question classification, e.g. puzzle/math.
• Focus and Lexical Answer Type (LAT), e.g. "On this day" → LAT: date/day.
• Relation detection, e.g. sea(India, x, west).
• Decomposition – divide and conquer.
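A minimal sketch of LAT detection using a few hand-written patterns; the pattern list is invented, and the real system relies on full parsing plus learned classifiers:

```python
import re

# Rough LAT/focus detection with a few hand-written patterns -- illustrative only.
LAT_PATTERNS = [
    (r"\bon this day\b", "date/day"),
    (r"\bthis (country|nation)\b", "country"),
    (r"\bthis (author|writer)\b", "author"),
]

def detect_lat(clue):
    for pattern, lat in LAT_PATTERNS:
        if re.search(pattern, clue.lower()):
            return lat
    return "unknown"

print(detect_lat("On this day in 1947, India gained independence."))  # date/day
```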
Step 2: Hypothesis Generation
• Primary search:
  - Keyword-based search.
  - The top 250 results are considered for candidate answer (CA) generation.
  - Empirically, the answer lies within the top 250 results about 85% of the time.
• CA generation: the above results are processed further to generate candidate answers.
• Soft filtering:
  - Reduces the set of candidate answers using superficial analysis (machine learning).
  - Cuts the number of CAs down to roughly 100.
  - Filtered answers are not fully discarded and may be reconsidered at the final stage.
• Each CA plugged back into the question is treated as a hypothesis that the system must prove correct with some threshold of confidence.
• If the correct answer is lost at this stage, the system has no hope of answering the question; hence the emphasis on noise tolerance. (A rough sketch of this step follows.)
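A rough sketch of this step, assuming hypothetical search() and extract_candidates() helpers and a toy, hand-weighted soft filter:

```python
# Rough sketch of Step 2: primary search, candidate answer (CA) extraction,
# then soft filtering down to ~100 CAs. search() and extract_candidates() are
# hypothetical stand-ins that return passages/strings.
def generate_hypotheses(question, search, extract_candidates, top_docs=250, keep=100):
    documents = search(question)[:top_docs]   # empirically ~85% of answers lie here
    candidates = set()
    for doc in documents:
        candidates.update(extract_candidates(doc))

    def soft_score(cand):
        # Superficial features only (frequency across results, overlap with clue);
        # in Watson the features and weights come from machine learning.
        frequency = sum(cand in doc for doc in documents)
        overlap = len(set(cand.lower().split()) & set(question.lower().split()))
        return frequency - 0.5 * overlap

    # Low scorers are set aside, not discarded; they may be reconsidered later.
    return sorted(candidates, key=soft_score, reverse=True)[:keep]
```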
Step 3: Hypothesis and Evidence Scoring
• Evidence retrieval:
  - Additional evidence is gathered to support each hypothesis formed in the previous step, e.g. passage search: retrieving passages by adding the CA to the primary search query.
• Scoring:
  - Deep content analysis.
  - Determines the degree of certainty that the retrieved evidence supports the CA.
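A toy version of passage search and evidence scoring, again assuming the hypothetical search() helper; the coverage measure is invented for illustration:

```python
# Toy evidence scoring: add the CA to the primary query, retrieve supporting
# passages, and measure how well each passage covers the question terms while
# actually mentioning the candidate.
def score_hypothesis(question, candidate, search):
    passages = search(question + " " + candidate)
    question_terms = set(question.lower().split())
    scores = []
    for passage in passages:
        passage_terms = set(passage.lower().split())
        coverage = len(question_terms & passage_terms) / max(len(question_terms), 1)
        scores.append(coverage if candidate.lower() in passage.lower() else 0.0)
    return sum(scores) / max(len(scores), 1)  # degree of support for this CA
```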
Step 4: Final Merging and Ranking
• Merging:
  - Merge all hypotheses that yield the same answer.
  - Using an ensemble of matching, normalization and coreference-resolution algorithms, Watson identifies equivalent and related hypotheses.
• Ranking and confidence estimation:
  - The final set of merged hypotheses is ranked by a model trained on a set of training questions with known answers.
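A compressed sketch of merging and ranking: equivalent answer strings are pooled and the pooled score is squashed into a confidence. The normalization and the fixed logistic are stand-ins; Watson learns this mapping from training questions with known answers.

```python
import math

# Toy merging and ranking: pool evidence for equivalent answer strings, then
# squash the pooled score into a confidence with a fixed logistic.
def normalize(answer):
    return answer.lower().strip().rstrip(".")   # stand-in for matching/coreference

def merge_and_rank(scored_hypotheses):
    merged = {}
    for answer, score in scored_hypotheses:
        merged[normalize(answer)] = merged.get(normalize(answer), 0.0) + score
    ranked = [(ans, 1 / (1 + math.exp(-s))) for ans, s in merged.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

print(merge_and_rank([("Long John Silver", 2.1),
                      ("long john silver.", 1.3),
                      ("Wolverine", 0.2)]))
```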
Example:
• Q: "Who is the antagonist of Stevenson's Treasure Island?"
• Step 1: Parse the clue and generate a logical structure describing the question:
  - antagonist(X)
  - antagonist_of(X, Stevenson's TI)
  - adj_possessive(Stevenson, TI)
Example (contd.):
• Step 2: Generate semantic assumptions:
  - island(TI), book(TI), movie(TI)
  - author(Stevenson), director(Stevenson)
• Step 3: Build different semantic queries based on phrases, keywords and the semantic assumptions.
• Step 4: Generate hundreds of candidate answers from the passages, documents and facts returned in Step 3. "Long John Silver" is likely to be one of them.
Example (contd.):
• Step 5: Gather evidence in support or refutation.
  (+ve) evidence:
  - Long John Silver is the main character in TI.
  - The antagonist in Treasure Island is Long John Silver.
  - Treasure Island, by Stevenson, was a great book.
  (-ve) evidence:
  - Stevenson = Richard Lewis Stevenson
  - antagonist = Wolverine
Example (contd.):
• Step 6:
  - Combine all the evidence and the associated scores.
  - Analyze the evidence to compute a confidence and return the most confident answer – Long John Silver in this case!
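Continuing the example, a toy combination of the evidence above into one confidence; the individual scores and the logistic combination are invented for illustration:

```python
import math

# Invented evidence scores for the Treasure Island example (positive supports
# the hypothesis "Long John Silver", negative refutes it).
evidence = {
    "Long John Silver is the main character in TI": +0.8,
    "The antagonist in Treasure Island is Long John Silver": +1.5,
    "Treasure Island, by Stevenson, was a great book": +0.1,
    "Stevenson = Richard Lewis Stevenson": -0.4,
    "antagonist = Wolverine": -0.9,
}

confidence = 1 / (1 + math.exp(-sum(evidence.values())))
print(f"Confidence in 'Long John Silver': {confidence:.2f}")  # ~0.75
```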
Watson's Brain (Software):
• Languages used: Java, C++, Prolog.
• Apache Hadoop framework for distributed computing.
• Apache UIMA framework:
  - Serves DeepQA's demand for massive parallelism.
  - Facilitated rapid component integration, testing and evaluation.
• SUSE Linux Enterprise Server 11.
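A generic annotator-pipeline sketch in the spirit of UIMA (this is not the actual Apache UIMA API): independent components each add their analysis to a shared structure, which is what makes rapid integration and parallel execution straightforward.

```python
# Generic annotator-pipeline sketch (NOT the real Apache UIMA API): each
# component reads a shared analysis structure and adds its own results, so
# components can be developed, tested, swapped and parallelized independently.
def tokenizer(analysis):
    analysis["tokens"] = analysis["text"].split()

def lat_detector(analysis):
    analysis["lat"] = "person" if analysis["text"].lower().startswith("who") else "unknown"

PIPELINE = [tokenizer, lat_detector]

def run_pipeline(text):
    analysis = {"text": text}
    for component in PIPELINE:
        component(analysis)
    return analysis

print(run_pipeline("Who is the antagonist of Stevenson's Treasure Island?"))
```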
Watson's Brain (Hardware):
• Answering one Jeopardy! question takes about two hours on an ordinary desktop computer.
• The real task: determining confidence before buzzing in.
• Hence the pressing need for faster hardware.
Watson's Brain (contd.):
• Ninety POWER 750 servers in total.
• 2,880 POWER7 processor cores in total.
• 16 terabytes of RAM in total.
• Each POWER 750 server uses 3.5 GHz eight-core POWER7 processors (four per server, so 32 cores per server), with 4 threads per core.
• Occupies the space of roughly eight refrigerators.
• Can process data at up to 500 GB/s.
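A quick consistency check on the core count, assuming four eight-core POWER7 chips per Power 750 server:

```python
# Core-count sanity check, assuming four eight-core POWER7 chips per Power 750.
servers, chips_per_server, cores_per_chip, threads_per_core = 90, 4, 8, 4
cores = servers * chips_per_server * cores_per_chip
print(cores, cores * threads_per_core)  # 2880 cores, 11520 hardware threads
```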
The Final Blow!
• Three rounds of Jeopardy! between Watson, Rutter and Jennings (February 2011).
• Watson comprehensively defeated its competitors with a net score of $77,147.
• Jennings managed $24,000.
• Rutter finished third with $21,600.
The Final Blow! (contd.)
"I for one welcome our new computer overlords." – Jennings
Conclusion:
• High-performance analytics.
• Non-cognitive.
• Smart learner.
• Not invincible.
Watson & Suits (business applications):
• Tech support.
• Knowledge management.
• Business intelligence.
• Improved information sharing.
Watson for Society – Health Care
Inputs:
• Symptoms
• Patient records
• Tests
• Medications
• Notes/hypotheses
• Texts, journals
→ Diagnosis models: finding the appropriate "disease", as suggested by the adjoining "symptoms" and "records".
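A deliberately simple sketch of the idea: rank candidate diagnoses by how much of the observed evidence each one explains; the disease profiles and the scoring are invented for illustration.

```python
# Toy diagnosis ranking: everything here (profiles, scoring) is invented.
DISEASE_PROFILES = {
    "influenza": {"fever", "cough", "fatigue"},
    "dengue":    {"fever", "rash", "joint pain"},
}

def rank_diagnoses(observed_evidence):
    scores = {disease: len(profile & observed_evidence) / len(profile)
              for disease, profile in DISEASE_PROFILES.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(rank_diagnoses({"fever", "cough", "fatigue", "headache"}))
```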
References:
• Watson Systems: http://www-03.ibm.com/innovation/us/watson/
• Wikipedia page: http://en.wikipedia.org/wiki/Watson_%28computer%2
• Research papers: http://researcher.ibm.com/researcher/view_page.php?id=2121
• Jeopardy! IBM Watson Day 1 (Feb 14, 2011): http://www.youtube.com/watch?v=seNkjYyG3gI&feature=related
• The Science Behind an Answer: http://www-03.ibm.com/innovation/us/watson/what-is-watson/science-behind-an-answer.html
• The AI Magazine: http://www.aaai.org/ojs/index.php/aimagazine/article/view/2303
• Philip Resnik. 1999. Semantic similarity in a taxonomy: An information-based measure and its application to problems of ambiguity in natural language. Journal of Artificial Intelligence Research.
• Tom M. Mitchell. 1997. Machine Learning. Computer Science Series. McGraw-Hill.