MITM 613 Intelligent System
Chapter 3: Dealing with Uncertainty
Contents
• Sources of Uncertainty
• Bayesian Updating
• Certainty Theory
Note: Possibility theory will be covered in another set of slides.
Sources of Uncertainty
Earlier we assumed that the world is a clear-cut true/false world:
• Many systems used the closed-world assumption: any unknown hypothesis is assumed false.
• Everything is either true or false.
However, in the real world:
• many things are uncertain;
• assumptions have to be made, e.g., assume false if unknown.
Sources of Uncertainty
There are three distinct forms of uncertainty:
• Uncertainty in the rule
• Uncertainty in the evidence
• Use of vague language
Source 1 - Uncertainty in the rule
IF transducer output is low THEN water level is low
The rule above is uncertain. Why?
• A low level of water in the drum is not the only possible explanation for a low transducer output.
• Another cause could be that the float attached to the transducer is stuck.
• What we really mean by this rule is that if the transducer output is low, then the water level is probably low.
Source 2 - Uncertainty in the evidence
IF transducer output is low THEN water level is low
The evidence on which the rule is based may be uncertain, for two reasons:
• The evidence may come from a source that is not totally reliable (e.g., the transducer output relies upon a voltage measurement).
• The evidence itself may have been derived by a rule whose conclusion was probable rather than certain.
Source 3 - Use of vague language
IF transducer output is low THEN water level is low
The above rule is based around the notion of a "low" transducer output. How low is "low"? If the output is a voltage, then we must consider whether "low" corresponds to 1 mV, 1 V, or 1 kV.
Handling Uncertainty
• Uncertainty in the rule and uncertainty in the evidence:
  • Bayesian updating: based on probability theory; assumes statistical independence.
  • Certainty theory: no rigorous mathematical basis, but practical in overcoming some of the limitations of Bayesian updating.
• Use of vague language:
  • Possibility theory, or fuzzy logic: allows vague language to be used in a precise manner.
Bayesian Updating
• Ascribe a probability to every hypothesis or assertion.
• Probabilities are updated in the light of evidence for or against a hypothesis or assertion.
• Probability updating can be done either:
  • by using Bayes' theorem directly, OR
  • by calculation of likelihood ratios.
Bayesian Updating in the Boiler Control Example
/* Rule 2.4 */ IF release valve stuck THEN steam outlet blocked
/* Rule 2.6 */ IF steam escaping THEN steam outlet blocked
Consider the hypothesis steam outlet blocked. Under the closed-world assumption, the hypothesis is false whenever there is no evidence. In the Bayesian approach:
• When there is no evidence, a prior probability is first given to the hypothesis that the steam outlet is blocked.
• When evidence 1 (release valve stuck) is encountered, the probability is updated.
• When evidence 2 (steam escaping) is encountered, the probability is again updated, cumulatively.
Direct Application of Bayes' Theorem
For Bayes' theorem to be applied, the rules need to be modified slightly, for example:
/* Rule 2.6 */ IF steam escaping THEN steam outlet blocked
becomes
/* Rule 2.6a */ IF steam escaping THEN update P(steam outlet blockage)
• Suppose that the prior probability of steam outlet blockage is 0.01 (it rarely occurs).
• As the evidence steam escaping is observed, P(steam outlet blocked) is updated.
• steam outlet blockage is a hypothesis; steam escaping is supporting evidence.
Bayesian Updating
• Bayesian updating revises the probability of a hypothesis P(H) in the presence of evidence E.
• It is based upon Bayes' theorem, i.e., the conditional probability P(H|E) of a hypothesis H given some evidence E is expressed in terms of the conditional probability P(E|H) of the evidence E given H.
Proof of Bayes' Theorem
P(H|E) is the fraction of an expected population of events in which E is observed where H is also observed:

P(H|E) = P(H & E) / P(E)

Similarly,

P(E|H) = P(H & E) / P(H), thus P(H & E) = P(E|H) x P(H)

Replacing for P(H & E), we get Bayes' theorem:

P(H|E) = P(E|H) x P(H) / P(E)

For calculation, we use the equivalent form:

P(H|E) = P(E|H) x P(H) / [P(E|H) x P(H) + P(E|~H) x P(~H)]

where P(~H) = 1 - P(H).
Bayesian Updating
Using this equation we can update the probability of a hypothesis H in the light of new evidence E, given knowledge of:
• P(H): the current probability of the hypothesis. If this is the first update for this hypothesis, then P(H) is the prior probability.
• P(E|H): the conditional probability that the evidence is present, given that the hypothesis is true.
• P(E|~H): the conditional probability that the evidence is present, given that the hypothesis is not true.
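To make the direct update concrete, here is a minimal Python sketch of the calculation above. The prior of 0.01 comes from the earlier slide; the two conditional probabilities are assumed values for illustration only.

def bayes_update(p_h, p_e_given_h, p_e_given_not_h):
    # P(E) by the theorem of total probability
    p_e = p_e_given_h * p_h + p_e_given_not_h * (1.0 - p_h)
    # Bayes' theorem: P(H|E) = P(E|H) P(H) / P(E)
    return p_e_given_h * p_h / p_e

# Rule 2.6a: prior P(steam outlet blockage) = 0.01;
# P(E|H) = 0.9 and P(E|~H) = 0.05 are assumed, not from the text.
posterior = bayes_update(0.01, 0.9, 0.05)
print(round(posterior, 3))  # 0.154: the evidence raises the probability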
Bayesian Updating
/* Rule 2.6a */ IF steam escaping THEN update P(steam outlet blockage)
Thus, to build a system that makes direct use of Bayes' theorem:
• P(H), P(E|H), and P(E|~H) values for all the different hypotheses and evidence are needed in advance.
• P(E|H) and P(E|~H) can be estimated informally.
• In Rule 2.6a, we have some idea of how often steam is observed escaping when there is an outlet blockage, P(E|H), but we are less likely to know how often a steam escape is due to an outlet blockage, P(H|E).
Bayesian Updating
Bayes' theorem, in effect, performs abduction (i.e., determining causes) using deductive information (i.e., the likelihood of symptoms, effects, or evidence). The premise that deductive information is more readily available than abductive information is one of the justifications for using Bayesian updating.
Likelihood Ratios
Likelihood ratios provide an alternative means of representing Bayesian updating. The rule should be written as:

IF steam escaping THEN steam outlet blockage IS X times more likely

If the evidence steam escaping is observed, we can update the probability of steam outlet blockage, provided we have X expressed as odds rather than a probability. The odds O(H) is given by:

O(H) = P(H) / P(~H) = P(H) / (1 - P(H))

where ~H means "not H". P(H) can also be expressed in terms of O(H) as:

P(H) = O(H) / (1 + O(H))
Likelihood Ratios
Examples:
• If P(H) = 0.2, O(H) = 0.2/(1 - 0.2) = 0.2/0.8 = 0.25 (i.e., "4 to 1 against").
• If P(H) = 0.8, O(H) = 0.8/(1 - 0.8) = 0.8/0.2 = 4 (i.e., "4 to 1 on").
• If P(H) = 1, O(H) = 1/(1 - 1) = 1/0 = infinity.
• As P(H) -> 1, O(H) -> infinity.
Normally, limits are set on odds values:
• if O(H) > 10^6 then H is treated as true;
• if O(H) < 10^-6 then H is treated as false.
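A small Python sketch of the probability/odds conversions used above (the helper names are our own):

def odds(p):
    # O(H) = P(H) / (1 - P(H)); infinite when P(H) = 1
    return float('inf') if p == 1.0 else p / (1.0 - p)

def prob(o):
    # P(H) = O(H) / (1 + O(H))
    return o / (1.0 + o)

print(odds(0.2))  # 0.25, i.e. "4 to 1 against"
print(odds(0.8))  # 4.0, i.e. "4 to 1 on"
print(prob(4.0))  # 0.8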
Updating Likelihoods
Using Bayes' theorem for H:

P(H|E) = P(E|H) x P(H) / P(E)

Using Bayes' theorem for ~H:

P(~H|E) = P(E|~H) x P(~H) / P(E)

Dividing the first by the second:

P(H|E) / P(~H|E) = [P(E|H) / P(E|~H)] x [P(H) / P(~H)]

Using the definition of odds and substituting from the first two equations, we get:

O(H|E) = A x O(H)

where

A = P(E|H) / P(E|~H)
Updating Likelihoods
• O(H|E) is the updated odds of H, given the presence of evidence E.
• A is the affirms weight of evidence E.
• We can also use another likelihood ratio, the denies weight D of evidence E. The denies weight is obtained by considering the absence of evidence, i.e., ~E:

O(H|~E) = D x O(H)

where

D = P(~E|H) / P(~E|~H) = (1 - P(E|H)) / (1 - P(E|~H))
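The two weights and the odds update can be sketched directly from these definitions (a minimal illustration, not a full implementation):

def affirms_weight(p_e_given_h, p_e_given_not_h):
    # A = P(E|H) / P(E|~H)
    return p_e_given_h / p_e_given_not_h

def denies_weight(p_e_given_h, p_e_given_not_h):
    # D = P(~E|H) / P(~E|~H) = (1 - P(E|H)) / (1 - P(E|~H))
    return (1.0 - p_e_given_h) / (1.0 - p_e_given_not_h)

def update_odds(o_h, weight):
    # O(H|E) = A x O(H) if E is present; O(H|~E) = D x O(H) if E is absent
    return weight * o_h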
Using the Likelihood Ratios
The affirms and denies functions are as shown in the accompanying figure. Rather than displaying odds values, which have an infinite range, the corresponding probabilities have been shown. The weight (A or D) has been shown on a logarithmic scale over the range 0.01 to 100.
Using the Likelihood Ratios
/* Rule 2.6 */ IF steam escaping THEN steam outlet blocked
IF steam escaping THEN steam outlet blockage IS X times more likely
• The equation O(H|E) = A x O(H) is used to update our confidence in hypothesis H in the light of evidence E, given A and O(H) (the current odds of H).
• O(H) will be at its a priori value if it has not previously been updated by other pieces of evidence.
• In the case of Rule 2.6, H refers to the hypothesis steam outlet blockage and E refers to the evidence steam escaping.
• The absence of evidence may reduce the likelihood of a hypothesis (equivalent to the presence of opposing evidence). Note that the known absence of evidence is not the same as not knowing whether the evidence is present.
• The denies weight D can be used to reduce the probability (or odds) of the hypothesis.
Using the Likelihood Ratios
If a piece of evidence E has an affirms weight A > 1, then its denies weight must be less than 1, and vice versa:
• A > 1 implies D < 1;
• A < 1 implies D > 1.
If A < 1 and D > 1, then the absence of the evidence is supportive of a hypothesis. Rule 2.7 provides an example of this, where NOT(water level low) supports the hypothesis pressure high and water level low opposes the hypothesis.
/* Rule 2.7 */ IF temperature high AND NOT(water level low) THEN pressure high
Bayesian version of Rule 2.7:
/* Rule 3.1 */ IF temperature high (AFFIRMS 18.0; DENIES 0.11) AND water level low (AFFIRMS 0.10; DENIES 1.90) THEN pressure high
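As an illustration of Rule 3.1, the following sketch applies its AFFIRMS and DENIES weights using the odds/prob helpers above; the prior probability of pressure high (0.1) is an assumed figure, not from the text:

o = odds(0.1)             # assumed prior odds of "pressure high"
o = update_odds(o, 18.0)  # temperature high is present: apply its AFFIRMS weight
o = update_odds(o, 1.90)  # water level low is known absent: apply its DENIES weight
print(round(prob(o), 2))  # approx. 0.79: "pressure high" is now likely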
Using the Likelihood Ratios
As with the direct application of Bayes' theorem, likelihood ratios have the advantage that the definitions of A and D are in terms of the conditional probability of evidence given a hypothesis, P(E|H), which is more readily available than the conditional probability of a hypothesis given the evidence, P(H|E). Even if accurate values of P(E|H) are not available, Bayesian updating using likelihood ratios is still useful if A and D can be estimated heuristically.
Dealing with Uncertain Evidence
So far we have assumed that evidence is either definitely present (i.e., has a probability of 1) or definitely absent (i.e., has a probability of 0). If the probability of the evidence lies between these extremes, then the confidence in the conclusion must be scaled appropriately.
Dealing with Uncertain Evidence
There are two reasons why the evidence may be uncertain:
• the evidence could be generated by an uncertain rule (and therefore has a probability associated with it);
• the evidence may be in the form of data that are not totally reliable (such as the output from a sensor).
Uncertain Evidence
• In terms of probabilities, we wish to calculate P(H|E), where E is uncertain.
• We can assume E was asserted by another rule whose own evidence was B (certain, with probability 1).
• Given the evidence B, the probability of E is P(E|B). We thus need to calculate P(H|B).
• An expression for P(H|B) is:

P(H|B) = P(H|E) x P(E|B) + P(H|~E) x P(~E|B)
Uncertain Evidence
The expression above is:
• usable if Bayes' theorem is applied directly;
• not usable if likelihood ratios are being used.
Alternatively, we can modify the A and D weights to reflect the uncertainty in E by interpolating the weights linearly for 0 < P(E) < 1, as described on the next slide.
Uncertain Evidence
In this scaling process, the interpolated affirms and denies weights are given the symbols A' and D', respectively:
• when P(E) > 0.5, the affirms weight is used;
• when P(E) < 0.5, the denies weight is used.
Over the range of values for P(E), A' and D' vary between 1 (a neutral weighting) and A and D, respectively.
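One way to implement this interpolation in Python; the linear formulas are inferred from the endpoints stated above (a weight of 1 at P(E) = 0.5, reaching A at P(E) = 1 and D at P(E) = 0), so treat this as a sketch rather than the definitive scheme:

def interpolated_weight(a, d, p_e):
    if p_e >= 0.5:
        # A' runs linearly from 1 at P(E) = 0.5 to A at P(E) = 1
        return 1.0 + (a - 1.0) * (2.0 * p_e - 1.0)
    # D' runs linearly from 1 at P(E) = 0.5 to D at P(E) = 0
    return 1.0 + (d - 1.0) * (1.0 - 2.0 * p_e)

print(interpolated_weight(18.0, 0.11, 1.0))   # 18.0 (certain evidence)
print(interpolated_weight(18.0, 0.11, 0.5))   # 1.0 (neutral weighting)
print(interpolated_weight(18.0, 0.11, 0.75))  # 9.5 (halfway to A)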
Combining Evidence
How do we combine several pieces of evidence supporting the same hypothesis? If n pieces of evidence E1 ... En support a hypothesis H, then:

O(H | E1 & E2 & ... & En) = A x O(H)

where

A = P(E1 & E2 & ... & En | H) / P(E1 & E2 & ... & En | ~H)

Note: since we do not know in advance which evidence will be available to support the hypothesis H, we would need to write expressions for A covering:
• all possible pieces of evidence Ei;
• all combinations of pairs Ei & Ej;
• all triples Ei & Ej & Ek;
• all quadruples Ei & Ej & Ek & Em;
• and so on.
Combining Evidence
This is unrealistic when there are many pieces of evidence. Thus we normally assume that all pieces of evidence are statistically independent (even though this may not be accurate).
• If two pieces of evidence E1 and E2 are statistically independent, the probability of E1 given E2 is identical to the probability of E1 alone (E2 provides no information about E1), i.e.:

P(E1|E2) = P(E1) and P(E2|E1) = P(E2)

• Thus:

A = A1 x A2 x ... x An and D = D1 x D2 x ... x Dn

where

Ai = P(Ei|H) / P(Ei|~H) and Di = P(~Ei|H) / P(~Ei|~H)

for each piece of evidence Ei.
Combining Evidence
If, in a given run of the system, n pieces of evidence are found that support or oppose H, then the updating equations are simply:

O(H | E1 & E2 & ... & En) = A1 x A2 x ... x An x O(H)

and

O(H | ~E1 & ~E2 & ... & ~En) = D1 x D2 x ... x Dn x O(H)
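A sketch of the combined update under the independence assumption, reusing the odds and prob helpers from earlier; the prior and the weights below are illustrative values:

from math import prod

def combine_evidence(o_h, weights):
    # Multiply the current odds by A_i for each piece of evidence present
    # and by D_i for each piece known to be absent
    return o_h * prod(weights)

# e.g. two affirming observations and one opposing one
o_updated = combine_evidence(odds(0.01), [18.0, 2.5, 0.4])
print(round(prob(o_updated), 3))  # approx. 0.154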
Combining Evidence
Interdependence between pieces of evidence can be handled if the rule base is properly structured:
• Pieces of evidence that are dependent on each other should not be combined in a single rule.
• Instead, assertions, and the rules that generate them, should be arranged in a hierarchy: from low-level input data to high-level conclusions, with many levels of hypotheses in between.
• The amount of evidence that is considered in reaching a conclusion is not limited, but interactions between pieces of evidence are controlled.
Inference Networks
Inference networks are used to represent levels of assertions, from input data through intermediate deductions to final conclusions.
• Each node represents either a hypothesis or a piece of evidence, and has an associated probability (not shown).
• If all the evidence relevant to each conclusion is drawn together in a single rule per conclusion, the result is a shallow network (no intermediate levels between input data and conclusions). This is only reliable if there is little or no dependence between the input data.
Inference Networks
An inference network that includes several intermediate steps.
Note: the probabilities at each node are modified as the reasoning process proceeds, until they reach their final values.
Combining Bayesian Rules with Production Rules
In a practical rule-based system, we may wish to mix uncertain rules with production rules. For instance, we may wish to make use of the production rule:

IF release valve is stuck THEN release valve needs cleaning

even though the assertion release valve is stuck may have been established with a probability less than 1. In this case the hypothesis release valve needs cleaning can be asserted with the same probability as the evidence. This avoids the issue of providing a prior probability for the hypothesis or a weighting for the evidence.
Bayesian Rules + Production Rules
If a production rule contains multiple pieces of evidence that are independent of each other, their combined probability can be derived from standard probability theory. Consider, for example, a rule in which two pieces of independent evidence are conjoined (i.e., joined by AND):

IF evidence E1 AND evidence E2 THEN hypothesis H3

The probability of hypothesis H3 is given by:

P(H3) = P(E1) x P(E2)
Bayesian Rules + Production Rules
Production rules containing independent evidence that is disjoined (i.e., joined by OR) can be treated in a similar way. So, given the rule:

IF evidence E1 OR evidence E2 THEN hypothesis H3

the probability of hypothesis H3 is given by:

P(H3) = P(E1) + P(E2) - P(E1) x P(E2)
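Both combination rules follow from elementary probability theory for independent events; a minimal sketch:

def p_and(p_e1, p_e2):
    # conjunction of independent evidence: P(E1 AND E2) = P(E1) x P(E2)
    return p_e1 * p_e2

def p_or(p_e1, p_e2):
    # disjunction: P(E1 OR E2) = P(E1) + P(E2) - P(E1) x P(E2)
    return p_e1 + p_e2 - p_e1 * p_e2

print(p_and(0.8, 0.5))  # 0.4
print(p_or(0.8, 0.5))   # 0.9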
Working Example of Bayesian Updating
See text, pages 74-78.
Advantages and Disadvantages
Certainty Theory
• An adaptation of Bayesian updating.
• Overcomes some of the shortcomings of Bayesian updating.
• Has less mathematical rigor than Bayesian updating.
Making Uncertain Hypotheses
Instead of using probabilities, each assertion has a certainty value associated with it (between -1 and 1). For a given hypothesis H, its certainty value C(H) is given by:
• C(H) = 1.0 if H is known to be true;
• C(H) = 0.0 if H is unknown;
• C(H) = -1.0 if H is known to be false.
There is a correspondence between certainty values and probabilities, such that:
• C(H) = 1.0 corresponds to P(H) = 1.0;
• C(H) = 0.0 corresponds to P(H) being at its a priori value;
• C(H) = -1.0 corresponds to P(H) = 0.0.
Each rule also has a certainty associated with it, its certainty factor CF.
Making Uncertain Hypotheses
IF <evidence> THEN <hypothesis> WITH certainty factor CF
Certainty factors serve a similar role to the affirms and denies weightings in Bayesian systems: identical measures of certainty are attached to rules and hypotheses. The certainty factor of a rule is modified to reflect the level of certainty of the evidence, such that the modified certainty factor CF' is given by:

CF' = CF x C(E)

If the evidence is known to be present, i.e., C(E) = 1, then this equation yields CF' = CF.
Updating Certainty
The technique for updating the certainty of hypothesis H, in the light of evidence E, involves the application of the following composite function:

C(H|E) = C(H) + CF' x (1 - C(H))                     if C(H) >= 0 and CF' >= 0
C(H|E) = C(H) + CF' x (1 + C(H))                     if C(H) <= 0 and CF' <= 0
C(H|E) = (C(H) + CF') / (1 - min(|C(H)|, |CF'|))     otherwise

where:
• C(H|E) is the certainty of H updated in the light of evidence E;
• C(H) is the initial certainty of H, i.e., 0 unless it has been updated by the previous application of a rule;
• |x| is the magnitude of x, ignoring its sign.
The updating procedure consists of adding a positive or negative value to the current certainty of a hypothesis. This contrasts with Bayesian updating, where the odds of a hypothesis are multiplied by the appropriate likelihood ratio.
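A Python sketch of the composite updating function, including the CF' = CF x C(E) scaling from the previous slide; the three-branch form is the standard one from certainty theory, reconstructed here rather than copied from the slide:

def update_certainty(c_h, cf, c_e=1.0):
    cf_mod = cf * c_e  # CF' = CF x C(E)
    if c_h >= 0 and cf_mod >= 0:
        return c_h + cf_mod * (1.0 - c_h)
    if c_h <= 0 and cf_mod <= 0:
        return c_h + cf_mod * (1.0 + c_h)
    # conflicting signs: the contributions tend to cancel
    return (c_h + cf_mod) / (1.0 - min(abs(c_h), abs(cf_mod)))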
Updating Certainty
The function for certainty updating is similar to the Bayesian updating function shown earlier.
Updating Certainty
• In standard certainty theory, a rule can only be applied if C(E) > 0.
• Some systems restrict rule firing further by requiring that C(E) > 0.2, to save computational power and make explanations clearer.
• It is also possible to allow rules to fire regardless. The absence of supporting evidence, indicated by C(E) < 0, would then be taken into account, since CF' would have the opposite sign to CF.
Properties of the Updating Function
• It is continuous and has no singularities or steps.
• The updated certainty C(H|E) always lies within the bounds -1 and +1.
• If either C(H) or CF' is +1 (i.e., definitely true), then C(H|E) is also +1.
• If either C(H) or CF' is -1 (i.e., definitely false), then C(H|E) is also -1.
• When contradictory conclusions are combined, they tend to cancel each other out, i.e., if C(H) = -CF' then C(H|E) = 0.
• Several pieces of independent evidence can be combined by repeated application of the function, and the outcome is independent of the order in which the pieces of evidence are applied.
• If C(H) = 0, i.e., the certainty of H is at its a priori value, then C(H|E) = CF'.
• If the evidence is certain (i.e., C(E) = 1), then CF' = CF.
• Although not part of the standard implementation, the absence of evidence can be taken into account by allowing rules to fire when C(E) < 0.
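Two of these properties can be checked directly with the update_certainty sketch above, here order independence and cancellation (the certainty values are illustrative):

a = update_certainty(update_certainty(0.0, 0.6), 0.3)
b = update_certainty(update_certainty(0.0, 0.3), 0.6)
print(a, b)  # both 0.72: the order of the evidence does not matter

print(update_certainty(0.5, -0.5))  # 0.0: contradictory conclusions cancel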
Logical Combinations of Evidence
• In Bayesian updating systems, each piece of evidence that contributes toward a hypothesis is assumed to be independent and is given its own affirms and denies weights.
• In systems based upon certainty theory, the certainty factor is associated with the rule as a whole.
• A simple algorithm determines the value of the certainty factor that should be applied when more than one item of evidence is included in a single rule.
• The relationship between pieces of evidence is made explicit by the use of AND and OR.
• If separate pieces of evidence are intended to contribute toward a single hypothesis independently of each other, they must be placed in separate rules.
• The algorithm for combining items of evidence in a single rule is borrowed from possibility theory (Lotfi Zadeh).
Logical Combinations of Evidence
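The combination rules themselves are not reproduced in this extract; the standard min/max scheme from possibility theory, which the previous slide attributes to Zadeh, is sketched below:

def c_and(c_e1, c_e2):
    # certainty of a conjunction: C(E1 AND E2) = min(C(E1), C(E2))
    return min(c_e1, c_e2)

def c_or(c_e1, c_e2):
    # certainty of a disjunction: C(E1 OR E2) = max(C(E1), C(E2))
    return max(c_e1, c_e2)

print(c_and(0.8, 0.3))  # 0.3
print(c_or(0.8, 0.3))   # 0.8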