Value of Information (VOI) Theory Advisor: Dr Sushil K Prasad By: DM Rasanjalee Himali
Introduction • Value of information (VoI) in decision analysis is the amount a decision maker would be willing to pay for information prior to making a decision. • Ex: • Consider a decision situation with one decision, Vacation Activity, and one uncertainty, Weather Condition, which will be resolved only after the Vacation Activity decision has been made. • The value of information on Weather Condition captures the value of knowing Weather Condition before making the Vacation Activity decision. • It is quantified as the highest price the decision maker is willing to pay to know Weather Condition before making the Vacation Activity decision.
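The vacation example can be made concrete with a small calculation of the expected value of perfect information. The sketch below is only illustrative: the two activities, the payoffs, and the weather probabilities are hypothetical numbers chosen for the example, not values from the slides.

```python
# Hypothetical payoffs (in arbitrary utility units) of each activity under each weather outcome.
payoff = {
    "beach": {"sunny": 100, "rainy": 10},
    "museum": {"sunny": 40, "rainy": 60},
}
# Assumed prior probabilities for Weather Condition.
p_weather = {"sunny": 0.7, "rainy": 0.3}


def expected_payoff(activity):
    """Expected payoff of committing to one activity before the weather is known."""
    return sum(p_weather[w] * payoff[activity][w] for w in p_weather)


# Without information: choose the single activity with the best expected payoff.
value_without_info = max(expected_payoff(a) for a in payoff)

# With perfect information: choose the best activity for each weather outcome,
# then average over how likely each outcome is.
value_with_info = sum(
    p_weather[w] * max(payoff[a][w] for a in payoff) for w in p_weather
)

# The value of information is the gain from deciding after the weather is known.
evpi = value_with_info - value_without_info
print(value_without_info, value_with_info, evpi)
```

With these assumed numbers the decision maker should pay at most about 15 utility units to learn the weather in advance; any higher price would cost more than the information is worth.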
Uncertainty • The concept of uncertainty is closely connected with the concept of information. • Uncertainty involved in any problem-solving situation is a result of some information deficiency. • There are many forms of information deficiency: • The information may be, for example, incomplete, imprecise, fragmentary, unreliable, vague, or contradictory.
Uncertainty • The amount of uncertainty is reduced by obtaining relevant information as a result of some action. • Ex: performing a relevant experiment and observing the experimental outcome, searching for and discovering a relevant historical record, or requesting and receiving a relevant document from an archive. • The amount of information obtained by the action can then be measured by the amount of reduced uncertainty. • That is, the amount of information pertaining to a given problem-solving situation that is obtained by taking some action is measured by the difference between a priori uncertainty and a posteriori uncertainty.
Entropy • Entropy: • a measure of the uncertainty of a random variable. • Also called Shannon entropy, • quantifies the information contained in a message, usually in units such as bits. • Equivalently, the Shannon entropy is a measure of the average information content one is missing when one does not know the value of the random variable. • The concept was introduced by Claude E. Shannon in his 1948 paper "A Mathematical Theory of Communication"
Entropy • Let X be a discrete random variable with alphabet $\mathcal{X}$ and probability mass function $p(x) = \Pr\{X = x\}$, $x \in \mathcal{X}$. • The entropy H(X) of a discrete random variable X is defined by: $H(X) = -\sum_{x \in \mathcal{X}} p(x) \log_2 p(x)$ • The log is to the base 2 and entropy is expressed in bits.
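As a quick illustration of the definition, the following sketch computes the entropy of a few small distributions. The function name `entropy` and the example distributions are assumptions made for the illustration; terms with p(x) = 0 are skipped, following the usual convention that 0 log 0 = 0.

```python
import math


def entropy(pmf):
    """Shannon entropy in bits of a pmf given as a list of probabilities."""
    # Terms with p = 0 contribute nothing (convention: 0 * log 0 = 0).
    return sum(-p * math.log2(p) for p in pmf if p > 0)


fair_coin = [0.5, 0.5]      # maximally uncertain binary variable
biased_coin = [0.9, 0.1]    # mostly predictable
certain = [1.0, 0.0]        # no uncertainty at all

for name, pmf in [("fair", fair_coin), ("biased", biased_coin), ("certain", certain)]:
    print(name, entropy(pmf))
```

The fair coin carries exactly one bit, the biased coin less than half a bit, and the certain outcome zero bits, which matches the reading of entropy as average missing information.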
Relative Entropy and Mutual Information • The entropy of a random variable is a measure of: • the uncertainty of the random variable; • the amount of information required on average to describe the random variable. • Relative Entropy • is a measure of the distance between two distributions, although it is not a true metric because it is not symmetric. • The relative entropy D(p||q) is a measure of the inefficiency of assuming that the distribution is q when the true distribution is p. • The relative entropy or Kullback–Leibler distance between two probability mass functions p(x) and q(x) is defined as: $D(p \,\|\, q) = \sum_{x \in \mathcal{X}} p(x) \log_2 \frac{p(x)}{q(x)}$
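A minimal sketch of the relative entropy between two probability mass functions over the same alphabet is given below. The function name `kl_divergence` and the two example distributions are assumptions for illustration; the handling of zero probabilities follows the usual conventions (a term with p(x) = 0 contributes nothing, and the result is infinite when p puts mass where q has none).

```python
import math


def kl_divergence(p, q):
    """Relative entropy D(p||q) in bits for two pmfs given as equal-length lists."""
    total = 0.0
    for p_x, q_x in zip(p, q):
        if p_x == 0:
            continue  # 0 * log(0 / q) is taken to be 0
        if q_x == 0:
            return math.inf  # p assigns mass to an outcome q rules out
        total += p_x * math.log2(p_x / q_x)
    return total


p = [0.5, 0.5]  # hypothetical true distribution
q = [0.9, 0.1]  # hypothetical assumed distribution

print(kl_divergence(p, q), kl_divergence(q, p), kl_divergence(p, p))
```

The two directions give different values, and the divergence of a distribution from itself is zero, which is why D(p||q) measures inefficiency rather than a symmetric distance.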
Relative Entropy and Mutual Information • Mutual Information • is a measure of the amount of information that one random variable contains about another random variable. • It is the reduction in the uncertainty of one random variable due to the knowledge of the other. • Consider two random variables X and Y with a joint probability mass function p(x, y) and marginal probability mass functions p(x) and p(y). • The mutual information I(X; Y) is the relative entropy between the joint distribution and the product distribution p(x)p(y): $I(X;Y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x,y) \log_2 \frac{p(x,y)}{p(x)\,p(y)} = D\big(p(x,y) \,\|\, p(x)\,p(y)\big)$
Relative Entropy and Mutual Information • Relationship between Entropy and Mutual Information • We can rewrite the definition of mutual information I(X; Y) as: $I(X;Y) = \sum_{x,y} p(x,y) \log_2 \frac{p(x \mid y)}{p(x)} = H(X) - H(X \mid Y)$ • Thus, the mutual information I(X; Y) is the reduction in the uncertainty of X due to the knowledge of Y.
Relative Entropy and Mutual Information • Since H(X,Y) = H(X) + H(Y|X), we have $I(X;Y) = H(X) + H(Y) - H(X,Y)$ • Finally, we note that $I(X;X) = H(X) - H(X \mid X) = H(X)$ • Thus, the mutual information of a random variable with itself is the entropy of the random variable. • This is the reason that entropy is sometimes referred to as self-information.
Relative Entropy and Mutual Information • Relationship between entropy and mutual information
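The relationship between entropy and mutual information can be checked numerically. The sketch below uses a hypothetical joint distribution of two binary variables (the numbers are made up for the illustration) and computes I(X; Y) both from the relative-entropy definition and from the identity I(X; Y) = H(X) + H(Y) - H(X, Y).

```python
import math


def H(probs):
    """Shannon entropy in bits of a list of probabilities."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


# Hypothetical joint pmf p(x, y): rows index x, columns index y.
joint = [
    [0.4, 0.1],
    [0.1, 0.4],
]

# Marginals p(x) and p(y) obtained by summing rows and columns.
px = [sum(row) for row in joint]
py = [sum(col) for col in zip(*joint)]

# Definition: relative entropy between p(x, y) and the product p(x) p(y).
mi_definition = sum(
    p_xy * math.log2(p_xy / (px[i] * py[j]))
    for i, row in enumerate(joint)
    for j, p_xy in enumerate(row)
    if p_xy > 0
)

# Identity: I(X;Y) = H(X) + H(Y) - H(X,Y).
mi_identity = H(px) + H(py) - H([p for row in joint for p in row])

print(mi_definition, mi_identity)
```

Both routes give the same value up to floating-point rounding, and the value is positive because the two variables in this example are correlated.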
Value of Information (VOI) • The information theory developed by Shannon was designed to place a quantitative measure on the amount of information involved in any communication. • The early developers stressed that the information measure was dependent only on the probabilistic structure of the communication process. • Attempts to apply Shannon's information theory to problems beyond communications have, in the large, come to grief. • The failure of these attempts could have been predicted because no theory that involves just the probabilities of outcomes without considering their consequences could possibly be adequate in describing the importance of uncertainty to a decision maker. • It is necessary to be concerned not only with the probabilistic nature of the uncertainties that surround us, but also with the economic impact that these uncertainties will have on us.
Value of Information • Developing a fully operational theory for dealing with uncertainty requires that issues be addressed at each of the following four levels: • Level 1: We need to find an appropriate mathematical formalization of the conceived type of uncertainty. • Level 2: We need to develop a calculus by which this type of uncertainty can be properly manipulated. • Level 3: We need to find a meaningful way of measuring the amount of relevant uncertainty in any situation that is formalizable in the theory. • Level 4: We need to develop methodological aspects of the theory, including procedures for making the various uncertainty principles operational within the theory.
VOI in the field of Computer Science • VOI has been successfully applied in the past in a variety of fields such as robotics and sensor networks. • Ex: • Scalable information-driven sensor querying and routing for ad hoc heterogeneous sensor networks, by Maurice Chu, Horst Haussecker, and Feng Zhao • Application: localization and tracking • Objective: maximize information utility, minimize detection latency and bandwidth • Human-robot communication for collaborative decision making, by Tobias Kaupp, Alexei Makarenko, and Hugh Durrant-Whyte • Application: human-robot cooperative decision making • Objective: adjustable autonomy
Represent Belief in some probabilistic representation • Ex: Probability function, Bayesian Network, Influence diagrams
VOI and P2P Search • We model the search in an unstructured P2P network using Value of Information theory. • The main idea of the model is to improve the quality of search by selecting the peers to query based on the utility of the information they have to offer, while minimizing the cost of search.
A good search mechanism should aim to achieve several goals, including: • high-quality search results, • load balance, • minimum state maintained per node, • efficient object lookup in terms of speed and bandwidth consumption, • relevance of results, and • effectiveness of the search mechanism. • Informed search mechanisms perform better in achieving these goals than many blind search methods.
P2P search model • The P2P search model defines and updates a belief state regarding the location of the requested data object. • This belief state is incrementally updated by incorporating the next best peer that has not yet been incorporated into the current belief state. • This next best peer is the one that provides maximum information utility while minimizing the cost of search.
P2P search model • The current belief state needs to be held by some peer in the network. Let us call this peer the leader node l. • The leader node can be persistent, in which case the belief resides in the same leader node for a longer period of time. • Alternatively, it can be dynamic, in which case the belief travels through the network and whichever node currently holds the belief state is assigned the leader position.
P2P search model • Assuming the leader node holds the current belief state, the objective function can be defined as follows: $M_c(l, j, \Pr(x \mid \{z_i\}_{i \in S})) = \gamma\, M_u(\Pr(x \mid \{z_i\}_{i \in S}), j) - (1 - \gamma)\, M_a(l, j)$ • The composite objective function $M_c$ is a linear combination of the information utility function $M_u$ and the search cost function $M_a$. The parameter $\gamma$ is a constant in the range 0 to 1 that sets the relative weight of utility and cost. • The selected peer $\hat{j}$ is the peer, among the remaining peers not in S, that maximizes the composite objective function: $\hat{j} = \arg\max_{j \notin S} M_c(l, j, \Pr(x \mid \{z_i\}_{i \in S}))$
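The trade-off expressed by the composite objective can be sketched in a few lines. Everything concrete here is an assumption made for illustration: the weighting constant is called `gamma`, the utility and cost of each candidate peer are given as precomputed numbers rather than derived from a belief state, and the peer names and values are hypothetical.

```python
def composite_objective(utility, cost, gamma):
    """Weighted trade-off: gamma * utility - (1 - gamma) * cost, with 0 <= gamma <= 1."""
    return gamma * utility - (1.0 - gamma) * cost


def select_peer(candidates, gamma):
    """Return the candidate peer id that maximizes the composite objective.

    candidates maps a peer id to a (utility, cost) pair for querying that peer.
    """
    return max(candidates, key=lambda j: composite_objective(*candidates[j], gamma))


# Hypothetical (information utility, search cost) pairs for peers not yet queried.
candidates = {
    "p1": (0.9, 0.8),   # very informative but expensive to reach
    "p2": (0.6, 0.1),   # moderately informative and cheap
    "p3": (0.3, 0.05),  # barely informative, cheapest
}

for gamma in (0.0, 0.5, 1.0):
    print(gamma, select_peer(candidates, gamma))
```

Setting the weight to 1 ignores cost and picks the most informative peer, setting it to 0 ignores utility and picks the cheapest peer, and intermediate values favor peers that balance the two.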
P2P search model • Each node produces a time-dependent measurement of the location of the peer containing the target information. The time-dependent measurement $z_i$ of peer i is given as follows: $z_i = f(x, k_i(t), \lambda_i(t))$ • $x$: the unknown location of the target information, • $k_i$: the time-dependent knowledge of peer i about the queried data location, and • $\lambda_i$: the characteristics of peer i. • The function f depends on $x$, $k_i$ and $\lambda_i$.
The time-dependent knowledge $k_i$ of a peer i depends on the state maintained per node. • This knowledge includes factors such as: • the peer's search success history, • the peer's global knowledge of other peers, etc. • Peer characteristics $\lambda_i$ include factors such as: • the peer's storage capacity, • the peer's processing power, and • the peer's node type (regular node or super node). • $\lambda_i$ and $k_i$ are explicitly represented because these characteristics affect the peer's estimate or measurement of the target peer.
Efficiency Metrics • Objective: low latency, low bandwidth consumption, load balance • Peer average response time $r_i$ • Peer average bandwidth consumption per query $b_i$ • Peer node type $t_i$ • Peer storage space / processing power $p_i$ • Number of neighbors maintained per peer $n_i$ • Location of the peer $x_i$
Relevance Metrics • Objective: highly relevant results • Peer search success history $s_i$ • Peer global knowledge $g_i$
Cost Metrics • Objective: low latency, low bandwidth consumption, minimum routing state maintained per node • Number of overlay hops per query $o_i$ • Number of messages per query
The measurement $z_i$ of peer i can be defined by the following equation: • where $z_i$ is the measurement of peer i, and $r_i$, $t_i$, $w_i$, $s_i$, $o_i$ and $b_i$ stand for peer i's average response time per query, node type, processing power, search success history rate, overlay hops per query, and average bandwidth consumption per query, respectively.
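Belief representation • We define 'belief' to be the posterior probability distribution of x given the measurements $z_1, \ldots, z_n$: $\Pr(x \mid z_1, \ldots, z_n)$ • The estimate is taken to be the mean value of this distribution: $\bar{x} = \int x \, \Pr(x \mid z_1, \ldots, z_n)\, dx$ • The uncertainty of the estimate is given by the covariance of the estimate: $\Sigma = \int (x - \bar{x})(x - \bar{x})^{T}\, \Pr(x \mid z_1, \ldots, z_n)\, dx$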
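For intuition, the mean and covariance of a belief can be computed directly when the belief is approximated by a discrete distribution over a handful of candidate locations. This is only a sketch under that assumption: the candidate points, the probabilities, and the function names are hypothetical, and the sums stand in for the integrals of a continuous belief.

```python
def belief_mean(points, probs):
    """Mean of a discrete belief: probability-weighted average of the candidate points."""
    dim = len(points[0])
    return [sum(p * x[k] for x, p in zip(points, probs)) for k in range(dim)]


def belief_covariance(points, probs):
    """Covariance matrix of a discrete belief around its mean."""
    mean = belief_mean(points, probs)
    dim = len(mean)
    return [
        [
            sum(p * (x[a] - mean[a]) * (x[b] - mean[b]) for x, p in zip(points, probs))
            for b in range(dim)
        ]
        for a in range(dim)
    ]


# Hypothetical candidate locations of the target in a 2-D coordinate space.
points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
uniform_belief = [0.25, 0.25, 0.25, 0.25]  # nothing known yet
peaked_belief = [0.85, 0.05, 0.05, 0.05]   # measurements favor the first location

for belief in (uniform_belief, peaked_belief):
    cov = belief_covariance(points, belief)
    print(belief_mean(points, belief), cov[0][0] + cov[1][1])
```

The trace of the covariance shrinks as the belief concentrates on one location, which is the sense in which the covariance quantifies the uncertainty of the estimate.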
Peer Selection Process • Calculating the belief state requires the measurements $z_1, \ldots, z_n$ to be known beforehand. • However, in a distributed environment such as a P2P network, the measurement $z_i$ and the peer characteristics $\lambda_i$ reside only within peer i. • Thus we need to communicate this information across peers. • Communication among peers incurs cost. • Thus we need to intelligently choose the subset of peers that provides the best information utility at minimum cost.
Peer Selection Process • Peer selection is an incremental process. • The best subset of peers is selected one peer at a time from the set of peers not yet considered. • The current belief state must be incrementally updated based on the measurements of these previously unconsidered peers. • The useful information a peer can provide varies with the relevance of the peer's information content to finding x. • Some information may also be useful but redundant. • Therefore, incremental update of the belief state requires both selecting an optimal set of peers and choosing the optimal order in which to incorporate these peers into the current belief state. • Each step of incorporating a new peer into the belief state should reduce the uncertainty of the belief state.
Information Utility • The peer selection task is to choose a peer that has not yet been incorporated into the belief state and that provides the most useful information. • The information utility function can be formally defined as follows: $\psi : \mathcal{P}(\mathbb{R}^d) \to \mathbb{R}$ • $\psi$ maps each probability distribution on $\mathbb{R}^d$, the space in which x lies, to a real number that indicates how spread out or uncertain the distribution is. • A larger value of $\psi$ indicates a tighter, less uncertain distribution, so our goal is to obtain a large value.
Information Utility • Assume there are N peers in the network labeled 1 to N and their corresponding measurements are $\{z_i\}_{1 \le i \le N}$. • Let $U \subseteq \{1, \ldots, N\}$ be the set of peers in the network and let $S \subseteq \{1, \ldots, N\}$ be the set of peers whose measurements are already incorporated into the belief state. Clearly, $S \subset U$. • The current belief is represented as: $\Pr(x \mid \{z_i\}_{i \in S})$ • The next best peer, say peer j, is selected from the set $U - S$. • Incorporating a measurement $z_j$ from peer j maps the current probability distribution of x to a new probability distribution with reduced uncertainty. The new belief state is represented as: $\Pr(x \mid \{z_i\}_{i \in S} \cup \{z_j\})$
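Information Utility • The best peer to choose is: $\hat{j} = \arg\max_{j \in U - S} \psi\big(\Pr(x \mid \{z_i\}_{i \in S} \cup \{z_j\})\big)$ • That is, the selected peer is the one whose measurement yields the highest information utility, and therefore the least uncertain updated belief.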
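The incremental selection rule can be sketched for a discrete belief as follows. This is a simplified illustration, not the model's actual procedure: it assumes the target location x takes one of a few discrete values, that each peer is described by a known likelihood table Pr(z | x), that information utility is the negative expected Shannon entropy of the updated belief, and that search cost is ignored. The peer names, tables, and observations are all hypothetical.

```python
import math


def entropy(pmf):
    """Shannon entropy in bits."""
    return sum(-p * math.log2(p) for p in pmf if p > 0)


def posterior(belief, likelihood_row):
    """Bayes update of the belief with one measurement; likelihood_row[x] = Pr(z | x)."""
    unnormalized = [b * l for b, l in zip(belief, likelihood_row)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def expected_utility(belief, likelihood):
    """Negative expected entropy of the belief after hearing from a peer.

    likelihood[z][x] = Pr(z | x); the expectation is over the peer's possible measurements z.
    """
    value = 0.0
    for row in likelihood:
        p_z = sum(b * l for b, l in zip(belief, row))
        if p_z > 0:
            value -= p_z * entropy(posterior(belief, row))
    return value


def greedy_select(belief, peers, observations, steps):
    """Pick peers one at a time by expected utility and fold in their measurements."""
    chosen, remaining = [], set(peers)
    for _ in range(steps):
        best = max(sorted(remaining), key=lambda j: expected_utility(belief, peers[j]))
        belief = posterior(belief, peers[best][observations[best]])
        chosen.append(best)
        remaining.remove(best)
    return chosen, belief


prior = [1 / 3, 1 / 3, 1 / 3]  # three candidate locations, nothing known yet
peers = {
    "a": [[0.9, 0.1, 0.1], [0.1, 0.9, 0.9]],  # good at confirming location 0
    "b": [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]],  # uninformative
    "c": [[0.8, 0.8, 0.2], [0.2, 0.2, 0.8]],  # good at ruling out location 2
}
observations = {"a": 0, "b": 0, "c": 0}  # measurement each peer would report

chosen, final_belief = greedy_select(prior, peers, observations, steps=2)
print(chosen, final_belief)
```

In this toy setup the uninformative peer is never selected within the two allowed steps, and the entropy of the belief drops after each incorporated measurement. Adding a cost term would turn the selection key into the composite objective used in the search model.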
Information Utility Measurement • The information utility can be quantified in many ways. • These measures should exploit the inverse relationship between the uncertainty of the belief state and the information utility. • Candidate measures include: • Shannon entropy • Fisher information matrix • Etc.
A Bayesian network (or a belief network) is a probabilistic graphical model that represents a set of variables and their probabilistic independencies. • For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases. • Formally, Bayesian networks are directed acyclic graphs whose nodes represent variables, and whose missing edges encode conditional independencies between the variables. Nodes can represent any kind of variable, be it a measured parameter, a latent variable or a hypothesis. They are not restricted to representing random variables, which represents another "Bayesian" aspect of a Bayesian network. • Influence diagrams are generalizations of Bayesian networks that can represent and solve decision problems under uncertainty.
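The disease and symptom example can be illustrated with the smallest possible Bayesian network, a single edge from Disease to Symptom. The sketch below is an assumption-laden toy: the prior and the conditional probabilities are hypothetical numbers, and inference is done by applying Bayes' rule directly rather than with a general network inference algorithm.

```python
# Hypothetical network: Disease -> Symptom, both binary.
p_disease = 0.01                              # prior Pr(Disease = true)
p_symptom_given = {True: 0.9, False: 0.05}    # Pr(Symptom = true | Disease)


def posterior_disease(symptom_present):
    """Pr(Disease = true | Symptom) computed with Bayes' rule."""

    def likelihood(disease):
        p = p_symptom_given[disease]
        return p if symptom_present else 1.0 - p

    numerator = p_disease * likelihood(True)
    evidence = numerator + (1.0 - p_disease) * likelihood(False)
    return numerator / evidence


print(posterior_disease(True), posterior_disease(False))
```

Observing the symptom raises the probability of the disease well above its prior, while its absence lowers it. An influence diagram would extend such a network with decision and utility nodes, for example whether to order a test before treating.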