Uncertainty in Artificial Intelligence Research at USC: Research Presentation for Graduate Students September 10, 2004 Marco Valtorta SWRG 3A55 mgv@cse.sc.edu
Uncertainty in Artificial Intelligence • Artificial Intelligence (AI) • [Robotics] • Automated Reasoning • [Theorem Proving, Search, etc.] • Reasoning Under Uncertainty • [Fuzzy Logic, Possibility Theory, etc.] • Normative Systems • Bayesian Networks • Influence Diagrams (Decision Networks)
Research Interests • Algorithms for Probability Update in BNs • factor tree method, with Mark Bloemeke • Modeling of uncertain evidence • observation variables, with Young-Gyun Kim and Jirka Vomlel • Soft Evidential Update in BNs • and the big clique algorithm, with Young-Gyun Kim and Jirka Vomlel • Causal Bayesian networks • Learning • CB algorithm, with Moninder Singh and Bing Xia • the effect of data quality on learning, with Valerie Sessions
Algorithms and Modeling • Algorithms for probability update in BNs • factor tree method, with Mark Bloemeke • Modeling of uncertain evidence with observation variables, with Young-Gyun Kim and Jirka Vomlel • Soft evidential update in BNs and the big clique algorithm, with Young-Gyun Kim and Jirka Vomlel • Causal Bayesian networks, with Yimin Huang
Correlation vs. Causation • The genotype theory (Fisher, 1958) of smoking and lung cancer: smoking and lung cancer are both effects of a genetic predisposition • Three-node network over U (genotype), X (smoking), and Y (lung cancer) • X and Y are in lockstep • X precedes Y in time (smoking comes before cancer) • But X does not cause Y: if we set X, Y does not change; Y changes only with the value of U (the genotype) • [Figure: U → X, U → Y] • Causality: Models, Reasoning and Inference, Chapter 3
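The distinction between observing X and setting X can be sketched in a few lines. The structural equations below (X := U, Y := U, all binary) are an assumption chosen only to make X and Y move in lockstep; the slide does not give specific equations.

```python
# Hypothetical structural equations for the three-node network (an
# illustrative assumption, not taken from the slides): the genotype U
# drives both smoking X and lung cancer Y.

def observe(u):
    """Passive observation: X and Y both follow U."""
    x = u  # X := U
    y = u  # Y := U
    return x, y


def intervene(u, x_set):
    """Intervention do(X = x_set): X is forced, Y still follows U only."""
    x = x_set  # X no longer listens to U
    y = u      # Y := U, unchanged by the intervention
    return x, y


# Under observation, X and Y always agree (perfect correlation).
observational = [observe(u) for u in (0, 1)]

# Under intervention, the values of Y across genotypes are the same
# whichever value X is forced to take.
interventional = {
    x_set: [intervene(u, x_set)[1] for u in (0, 1)] for x_set in (0, 1)
}

print(observational)
print(interventional)
```

In this toy model X and Y are perfectly correlated under observation, yet forcing X leaves Y untouched, which is the sense in which X does not cause Y.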
An Example [Cochran through Pearl, 2000] Soil fumigants (X) are used to increase oat crop yields (Y) by controlling the eelworm population (Z). Last year's eelworm population (Z0) is an unknown quantity that is strongly correlated with this year's population. Through laboratory analysis of soil samples, we can determine the eelworm populations before and after the treatment (Z1 and Z2). Furthermore, we assume that the fumigants do not affect the growth of eelworms that survive the treatment. Instead, the eelworms' growth depends on the population of birds (B), which is correlated with last year's eelworm population and hence with the treatment itself. Z3 represents the eelworm population at the end of the season. We wish to assess the total effect of the fumigants on yields, but controlled randomized experiments are infeasible and Z0 is unknown. Given a correct model, can we obtain a consistent estimate of the target quantity (the total effect of the fumigants on yields) from observations alone?
Nonidentifiability • The identifiability of the effect of X on Y ensures that it is possible to infer the effect of the action do(X=x) on Y from passive observations and the causal graph G, which specifies which variables participate in the determination of each variable in the domain • To prove nonidentifiability, it is sufficient to present two sets of structural equations that induce identical distributions over the observed variables but have different causal effects • X and Y are observable; U is not. All of them are binary variables • Let P(X=0|U) = (0.5, 0.5) • P(Y=0|X,U) is given by the table on the right • We cannot observe U, so we do not know P(U) • When P(U=0) = 0.5, P(Y|X=0) = (0.45, 0.55) • When P(U=0) = 0.1, P(Y|X=0) = (0.73, 0.27) • So P(Y|do(X)) is non-identifiable • [Figure: graph over U, X, Y and table of P(Y=0|X,U), not reproduced] • Causality: Models, Reasoning and Inference, Chapter 3
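The two values of P(Y|X=0) quoted on this slide can be reproduced with a short calculation. The table of P(Y=0|X,U) is not available in this transcript, so the two entries used below, P(Y=0|X=0,U=0) = 0.1 and P(Y=0|X=0,U=1) = 0.8, are back-solved from the slide's two results and should be read as an assumption rather than as the original table.

```python
# P(X=0 | U=u) as stated on the slide: 0.5 for either value of U.
P_X0_GIVEN_U = {0: 0.5, 1: 0.5}

# ASSUMPTION: entries back-solved from the two results quoted on the slide;
# the original table is not reproduced in the transcript.
P_Y0_GIVEN_X0_U = {0: 0.1, 1: 0.8}


def p_y_given_x0(p_u0):
    """Return (P(Y=0 | X=0), P(Y=1 | X=0)) for a given prior P(U=0)."""
    p_u = {0: p_u0, 1: 1.0 - p_u0}
    # Bayes' rule: P(U=u | X=0) is proportional to P(X=0 | U=u) * P(U=u).
    p_x0 = sum(P_X0_GIVEN_U[u] * p_u[u] for u in (0, 1))
    p_u_given_x0 = {u: P_X0_GIVEN_U[u] * p_u[u] / p_x0 for u in (0, 1)}
    # Marginalize the unobserved U out of P(Y=0 | X=0, U).
    p_y0 = sum(P_Y0_GIVEN_X0_U[u] * p_u_given_x0[u] for u in (0, 1))
    return (p_y0, 1.0 - p_y0)


for prior in (0.5, 0.1):
    print(prior, tuple(round(p, 2) for p in p_y_given_x0(prior)))
```

With these assumed entries, the prior P(U=0) = 0.5 gives P(Y=0|X=0) = 0.45 and the prior P(U=0) = 0.1 gives 0.73, matching the figures on the slide.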
Smoking and the genotype theory • Consider the relation between smoking (X) and lung cancer (Y) • The tobacco industry has managed to forestall antismoking legislation by arguing that the observed correlation between smoking and lung cancer could be explained by some sort of carcinogenic genotype (U) that involves an inborn craving for nicotine • Suppose that Z is the amount of tar deposited in a person's lungs and that we believe in the causal model shown on the right • Can we now recover the effect of smoking on lung cancer, P(y | do(x)), from observational data only? • [Figure: causal model over U, X, Z, Y, not reproduced] • Causality: Models, Reasoning and Inference, Chapter 3
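One way to explore this question numerically is the front-door adjustment from Pearl's Chapter 3, P(y | do(x)) = Σ_z P(z | x) Σ_x' P(y | x', z) P(x'). The sketch below assumes the causal model is the front-door graph (U → X, U → Y, X → Z, Z → Y), since the figure is not reproduced here, and every probability in it is a made-up illustrative value. It checks that the estimate computed from the observed joint over (X, Z, Y) matches the effect computed from the full model including U.

```python
from itertools import product

# ASSUMPTION: hypothetical parameters for a binary front-door model
# (U -> X, U -> Y, X -> Z, Z -> Y). None of these numbers are from the slides.
P_U1 = 0.3
P_X1_GIVEN_U = {0: 0.2, 1: 0.8}
P_Z1_GIVEN_X = {0: 0.1, 1: 0.9}
P_Y1_GIVEN_ZU = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9}


def bern(p1, v):
    """Probability that a binary variable with P(1) = p1 takes value v."""
    return p1 if v == 1 else 1.0 - p1


def observed_joint():
    """Joint P(x, z, y) with the unobserved U summed out."""
    joint = {}
    for x, z, y in product((0, 1), repeat=3):
        joint[(x, z, y)] = sum(
            bern(P_U1, u)
            * bern(P_X1_GIVEN_U[u], x)
            * bern(P_Z1_GIVEN_X[x], z)
            * bern(P_Y1_GIVEN_ZU[(z, u)], y)
            for u in (0, 1)
        )
    return joint


def marg(joint, x=None, z=None, y=None):
    """Marginal probability of the specified values; None means summed out."""
    return sum(
        p
        for (xv, zv, yv), p in joint.items()
        if (x is None or xv == x)
        and (z is None or zv == z)
        and (y is None or yv == y)
    )


def front_door(joint, x, y):
    """Front-door estimate of P(y | do(x)) from observed quantities only."""
    total = 0.0
    for z in (0, 1):
        p_z_given_x = marg(joint, x=x, z=z) / marg(joint, x=x)
        inner = sum(
            marg(joint, x=xp, z=z, y=y) / marg(joint, x=xp, z=z) * marg(joint, x=xp)
            for xp in (0, 1)
        )
        total += p_z_given_x * inner
    return total


def true_effect(x, y):
    """P(y | do(x)) computed from the full model, using the hidden U."""
    return sum(
        bern(P_U1, u) * bern(P_Z1_GIVEN_X[x], z) * bern(P_Y1_GIVEN_ZU[(z, u)], y)
        for u in (0, 1)
        for z in (0, 1)
    )


joint = observed_joint()
for x in (0, 1):
    print(x, front_door(joint, x, 1), true_effect(x, 1))
```

Under these assumptions the two columns agree, which illustrates how a mediating variable such as tar deposits can make the effect recoverable without observing U.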
Learning • Parallel learning with background knowledge, with Bhaskara Moole • CB algorithm, with Moninder Singh and Bing Xia • Effect of data quality on learning, with Valerie Sessions
[Figure: sample(s) and key, not reproduced. Recoverable labels: Yes: 1, Read: 1, Received: 1, Heard: 1, Received: 1, No: 2]
Visual CB • CB [Singh and Valtorta, 1993; 1995] • in Visual C++ • Bing Xia, MS, 2002
Applications • Assessment of the risk of mental retardation in infants, with Subramani Mani and Suzanne McDermott • Agent-based intrusion detection with soft evidence, with Vaibhav Gowadia and Csilla Farkas • Support for intelligence analysis, with Michael Huhns, Hrishi Goradia, Jiangbo Dang, and Jingshan Huang • Modeling damage in critical resources, with Yimin Huang and Bill Full
The OmniSeer Project • Represent prior knowledge to support intelligence analysis • Explicate formerly tacit knowledge for use and collaboration • Support relevance analysis, evidence gathering, and novelty detection • …with Bayesian networks!
OmniSeer Functional Architecture
• The massive data might be filtered by preferences and interests specified in the UConn User Model
• The noun-phrase analyzer from UConn processes messages; a 3rd-party tagger processes news feeds
• BN fragments represent an analyst's prior knowledge about terrorist activities or other domains of interest specified in the UConn User Model
• Relevant facts extracted from the documents and messages fill in the details of the BN fragments of interest
• Instantiated BN fragments are composed into scenarios specific to the situation at hand
• Outdated fragments are removed periodically from the set of partially instantiated fragments
• The analyst explores which information should be acquired to reduce uncertainty and assesses the robustness of conclusions
• The analyst is notified of surprises and interesting situations, as specified in the UConn User Model
• Differences between an analyst's conclusion and the situation-specific scenario lead to explication of formerly tacit knowledge, represented as new BN fragments
• Example of tagged messages: <Date>2002-09-20</Date> <Person>John Doe</Person> <Place>London</Place> … <Date>2002-09-27</Date> <Person>John Doe</Person> …
• [Figure: architecture diagram, not reproduced. Labels: Events, Messages, Tasks, Massive Data, Evidence, Documents, Modified Text, Tagged messages, Bayesian networks, BN Fragments, Instantiated Fragments, Outdated fragments, Matcher, Composer, Forgetter, Bayesian Reasoning Service, Situation Specific Scenarios, Value of Information, Sensitivity Analyzer, Surprise Detector, Explanation, Analysis, Visualization, Alerts, Tacit Knowledge, Analyst]
Competence and Resources • Several faculty members in the CSE department have worked in normative probabilistic reasoning for many years • Some colleagues and students in the Statistics department are also interested • Tools for editing BNs and IDs, propagation, interface with relational databases, soft evidential update, learning, etc., have been acquired or developed and used in projects and courses (CSCE 582 and CSCE 822)
Some Local UAI Researchers (Notably Missing: Juan Vargas) • Billy Turkett, Ph.D. (Wake Forest) • Young-Gyun Kim, Ph.D. (S.C. State) • Wayne Smith, Ph.D. (Presbyterian College) • Clif Presser, Ph.D. (Gettysburg College) • Miguel Barrientos, Ph.D.
Additional Information • Bayesian networks journal club • meets every two weeks on Wednesdays: next meeting on September 15 at 1pm in 3A75 • http://www.cse.sc.edu/~mgv/BNSeminar/index.html • 3A55, 777-4641 • mgv@cse.sc.edu • www.cse.sc.edu/~mgv