CS533 Information Retrieval
Dr. Michal Cutler
Lecture #11, March 1, 1999
This Class • Inference networks • conditional independence • belief nets • Inference nets in Information Retrieval • Evaluation
Conditional Independence • A variable V is conditionally independent of a set of variables V1, given another set of variables V2, if: P(V|V1,V2) = P(V|V2) • Intuition: once V2 is known, V1 gives us no additional information about V
Conditional Independence • Our belief that a person will have a Steady job, given evidence on Education and Job, is independent of whether or not the person owns a Home • P(Steady | Education, Job, Home) = P(Steady | Education, Job)
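To make this concrete, here is a minimal numeric check (the numbers and variable names are illustrative assumptions, not from the lecture): it builds a joint distribution in which Steady is independent of Home given Education, and verifies that also conditioning on Home changes nothing.

```python
# Minimal check of conditional independence (hypothetical numbers).
# The joint is built as P(E) * P(H) * P(S|E), so Steady is independent of Home given Education.
from itertools import product

P_E = {True: 0.6, False: 0.4}          # P(Education)
P_H = {True: 0.5, False: 0.5}          # P(Home)
P_S_given_E = {True: 0.8, False: 0.3}  # P(Steady=True | Education)

joint = {}
for e, h, s in product([True, False], repeat=3):
    p_s = P_S_given_E[e] if s else 1 - P_S_given_E[e]
    joint[(e, h, s)] = P_E[e] * P_H[h] * p_s

def p_steady(e, h=None):
    """P(Steady=True | Education=e [, Home=h])."""
    keys = [k for k in joint if k[0] == e and (h is None or k[1] == h)]
    return sum(joint[k] for k in keys if k[2]) / sum(joint[k] for k in keys)

print(p_steady(True, True), p_steady(True, False), p_steady(True))   # 0.8 0.8 0.8
```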
Belief networks • A belief network is a graph in which: 1. A set of random variables makes up the nodes of the network 2. A set of directed edges connects pairs of nodes; the intuitive meaning of an arrow from node X to node Y is that X has a direct influence on Y
Belief networks 3. Each node has a link matrix (conditional probability table) that quantifies the effect its parents have on it 4. The graph has no directed cycles
Example:Getting a loan • Education and a Job influence having a Steady job • Family influences having a Guarantor for a loan • Steady job and Guarantor influence getting a Loan
Getting a loan: the network and its link matrices
• Structure: E and J are parents of S, F is the parent of G, and S and G are parents of L
• Priors: P(E) = .6, P(J) = .8, P(F) = .7
• P(S | E, J): TT .9, TF .6, FT .7, FF .1
• P(G | F): T .7, F .2
• P(L | S, G): TT 1, TF 0, FT 0, FF 0
Computing the conditional probabilities • The conditional probabilities can be estimated from experience (past records of loan applications) • Or, formulas can be used to fill in the link matrices • For example, the link matrix for Loan is of type AND: the loan is granted only when both Steady job and Guarantor are true
Semantics of belief networks • To construct a net, think of it as representing the joint probability distribution • Every entry in the joint probability distribution can be computed from the information in the network: P(X1, ..., Xn) = ∏i P(Xi | Parents(Xi))
Probability of getting a loan • We can use the belief network to compute the probability .39 that an arbitrary applicant gets a loan • For a person with an education, a job and a family, the probability is higher: P(L=T | E=T, J=T, F=T) = P(L=T | S=T, G=T) * P(S=T | E=T, J=T) * P(G=T | F=T) = 1 * .9 * .7 = .63 (only the S=T, G=T term contributes, since the other entries of the Loan link matrix are 0)
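A small sketch of this computation by enumeration over the hidden nodes S and G, using the link-matrix values from the figure above (the function and variable names are mine, for illustration only):

```python
# Loan example: P(Loan | Education, Job, Family) by summing over Steady and Guarantor.
P_S_given_EJ = {(True, True): 0.9, (True, False): 0.6,
                (False, True): 0.7, (False, False): 0.1}   # P(Steady | Education, Job)
P_G_given_F  = {True: 0.7, False: 0.2}                      # P(Guarantor | Family)
P_L_given_SG = {(True, True): 1.0, (True, False): 0.0,      # AND link matrix: loan only if
                (False, True): 0.0, (False, False): 0.0}    # Steady AND Guarantor

def p_loan(e, j, f):
    """P(Loan=True | Education=e, Job=j, Family=f)."""
    total = 0.0
    for s in (True, False):
        p_s = P_S_given_EJ[(e, j)] if s else 1 - P_S_given_EJ[(e, j)]
        for g in (True, False):
            p_g = P_G_given_F[f] if g else 1 - P_G_given_F[f]
            total += P_L_given_SG[(s, g)] * p_s * p_g
    return total

print(p_loan(True, True, True))   # 1 * 0.9 * 0.7 = 0.63
```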
Inference Networks for IR • Turtle and Croft introduced the inference network model for information retrieval • This is a probability-based method • Ranks documents by probability of satisfying a user's information need.
The document / concept / query network • Document network • Concept network • Query network
The Document/Concept Network • Each document can have many representations • The relationships between concepts generate the concept network
The Query Network • The information need may be based on complex interactions between various sources of evidence and different representations of the user's need
[Figure: an inference network with document nodes d1, d2, d3, ..., di at the top, concept representation nodes r1, r2, r3, ..., rj below them, query nodes q1, ..., qk below those, and the information-need node I at the bottom]
Inference Networks • There are four kinds of nodes in this network.
Inference Networks • The di nodes represent particular documents and correspond to the event of observing that document. • The rj nodes are concept representation nodes. These correspond to the concepts that describe the contents of a document.
Inference Networks • The qk nodes are query nodes. • They correspond to the concepts used to represent the information need of the user. • The single leaf node I corresponds to the (unknown) information need.
Inference Networks • To evaluate a particular document, the single corresponding node is instantiated and the resulting probabilities are propagated through the network to derive a probability associated with the I node • To generate a ranking of the whole collection, this is repeated for each document node in the network
Inference Networks • Each di node is instantiated only once, and no other document node is active at the same time • The probabilities associated with child nodes are based on the probabilities of their parents and on a "link matrix" that specifies how to combine the evidence from the parents
Inference Networks • The "link matrices" between the di nodes and the rj nodes represent the evidence for the proposition that a concept occurs in a document • The link matrices between the rj nodes and the qk nodes specify how the representation concepts are combined to form the final probability
Inference Networks • For a static collection, the di and rj nodes are fixed and are constructed independently of any particular query • The qk and I portions of the network are constructed individually for each query
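In outline, the ranking loop just described can be sketched as follows; belief_at_I is a placeholder name of mine for whatever propagation the link matrices define, not a function from the Turtle and Croft model itself:

```python
# Sketch of ranking with an inference network: instantiate one document node at a
# time, propagate beliefs to the information-need node I, and sort by that belief.
def rank(documents, belief_at_I):
    scores = {d: belief_at_I(d) for d in documents}        # P(I | d_i observed)
    return sorted(scores, key=scores.get, reverse=True)    # best document first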
Example • D1: Searching the web • D2: Spiders for searching the web • D3: Tools for searching files • After indexing the terms are: file, search, spider, tool, web • Assume a query “Search tools”
Computing df/idf • df(file)=1, idf(file)=lg(3/1)+1=2.58 • df(search)=3, idf(search)=lg(3/3)+1=1 • df(spider)=1, idf(spider)=lg(3/1)+1=2.58 • df(tool)=1, idf(tool)=2.58 • df(web)=2, idf(web)=lg(3/2)+1=1.58
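These values can be reproduced with a few lines of code (a sketch; lg is log base 2, and the idf formula lg(N/df)+1 is the one used above):

```python
# idf = lg(N / df) + 1 for the three-document example (lg = log base 2).
from math import log2

docs = {"D1": ["search", "web"],              # terms after indexing, as listed on the slide
        "D2": ["spider", "search", "web"],
        "D3": ["tool", "search", "file"]}
N = len(docs)

vocab = {t for terms in docs.values() for t in terms}
df  = {t: sum(t in terms for terms in docs.values()) for t in vocab}
idf = {t: log2(N / df[t]) + 1 for t in vocab}

for t in sorted(vocab):
    print(t, df[t], round(idf[t], 2))
# file 1 2.58 | search 3 1.0 | spider 1 2.58 | tool 1 2.58 | web 2 1.58
```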
The inference net • Document nodes D1, D2, D3; concept nodes Search (S) and Tools (T); query node Q • Link matrix for Q: P(Q|S,T)=.9, P(Q|S,~T)=.7, P(Q|~S,T)=.4, P(Q|~S,~T)=.1
Link matrices • The link matrix for the query node is supplied by the user • If the query were Boolean, AND, OR, and NOT nodes with their corresponding link matrices could be used • The concept probability is: P(rj|Di) = 0.5 + 0.5*ntf*nidf, where ntf is the frequency of rj in Di normalized by the maximum tf in Di, and nidf is idf(rj) normalized by the maximum idf
Link matrix for “search” • P(search|D1)=0.5+0.5*ntf*nidf=0.5+0.5*1*0.38=0.69 (ntf = 1/1 = 1, nidf = idf(search)/max idf = 1/2.58 ≈ 0.38)
P(Q|D1) • P(search=true|D1)=0.69 • P(tools=true|D1)=0 • P(Q|D1) = P(Q|~S,~T)P(~S)P(~T) + P(Q|~S,T)P(~S)P(T) + P(Q|S,~T)P(S)P(~T) + P(Q|S,T)P(S)P(T) = .1*.31*1 + .4*.31*0 + .7*.69*1 + .9*.69*0 = 0.514
P(Q|D3) • P(search=true|D3)=0.69 • P(tools=true|D3)=1 • P(Q|D3) = P(Q|~S,~T)P(~S)P(~T) + P(Q|~S,T)P(~S)P(T) + P(Q|S,~T)P(S)P(~T) + P(Q|S,T)P(S)P(T) = .1*.31*0 + .4*.31*1 + .7*.69*0 + .9*.69*1 = 0.745
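Putting the pieces together, the sketch below scores all three documents with the concept probabilities and the query link matrix above. The slide only works out D1 and D3; giving D2 the same value 0.69 for search (and 0 for the absent term tools) is my assumption, obtained from the same formula.

```python
# Score documents with the query link matrix P(Q | Search, Tools) from the figure.
# Concept beliefs follow the slide: P(search|Di) = 0.69, absent terms get 0.
p_search = {"D1": 0.69, "D2": 0.69, "D3": 0.69}   # 0.5 + 0.5 * 1 * (1 / 2.58)
p_tools  = {"D1": 0.0,  "D2": 0.0,  "D3": 1.0}

P_Q = {(True, True): 0.9, (True, False): 0.7,
       (False, True): 0.4, (False, False): 0.1}

def score(d):
    s, t = p_search[d], p_tools[d]
    return sum(P_Q[(bs, bt)] * (s if bs else 1 - s) * (t if bt else 1 - t)
               for bs in (True, False) for bt in (True, False))

for d in ("D1", "D2", "D3"):
    print(d, round(score(d), 3))
# D1 0.514, D2 0.514, D3 0.745 -> D3 is ranked first
```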
Evaluation • Fallout • Relevance judgements • 11 point recall/precision • Average recall/precision
Problems with recall & precision • Recall is undefined when the collection contains no relevant documents • Precision is undefined when no documents are retrieved
Fallout • Fallout = (number of nonrelevant documents retrieved) / (total number of nonrelevant documents in the collection) • A good system should have high recall and low fallout
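For concreteness, a small sketch computing all three measures for one query (document ids and counts are hypothetical):

```python
# Recall, precision and fallout for a single query (hypothetical data).
def evaluate(retrieved, relevant, collection_size):
    rel_ret = len(retrieved & relevant)
    recall = rel_ret / len(relevant)                    # undefined if there are no relevant docs
    precision = rel_ret / len(retrieved)                # undefined if nothing is retrieved
    nonrelevant = collection_size - len(relevant)
    fallout = (len(retrieved) - rel_ret) / nonrelevant  # nonrelevant retrieved / all nonrelevant
    return recall, precision, fallout

print(evaluate(retrieved={1, 2, 3, 4}, relevant={2, 4, 9}, collection_size=200))
# approximately (0.667, 0.5, 0.0102)
```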
Relevance judgment • Exhaustive? • Assume 100 queries • 750,000 documents in collection • Requires 75 million relevance judgments
Relevance judgment • Sampling? • With 100 queries, an average of 200 (and a maximum of 900) relevant documents per query, and 750,000 documents in the collection, the sample needed for reliable estimates is still too large
Relevance judgment • Pooling is used in TREC • It is assumed that (nearly) all relevant documents appear in the pool • Pooling the top 200 documents from each of 33 runs gave an average of 2,398 documents to judge per topic
11 point recall/precision • 11 point (sometimes 20 point) average precision is used to compare systems • An alternative is to compute recall and precision after N1, N2, N3, …, documents have been retrieved • Example (figure below): 6 relevant documents in a collection of 200
[Figure: recall/precision plot for a ranking of the 200 documents. Relevant documents appear at ranks 1, 2, 4, 7, 10 and 13; the other ranks (3, 5, 6, 8, 9, 11, 12, ..., 200) are nonrelevant. Precision at the relevant ranks is 1.0, 1.0, 0.75, 0.57, 0.5 and 0.46, plotted against recall 0.17, 0.33, 0.5, 0.67, 0.83 and 1.0]
Interpolated values • The interpolated precision at a given recall level is the maximum precision at that and all higher recall levels
[Figure: the same recall/precision plot after interpolation; each precision value is replaced by the maximum precision at that or any higher recall level, giving a non-increasing step curve]
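A short sketch of the interpolation for the ranking in the figures above (relevant documents at ranks 1, 2, 4, 7, 10 and 13, with 6 relevant documents in the collection):

```python
# Interpolated precision: at recall level r, take the maximum precision at recall >= r.
relevant_ranks = [1, 2, 4, 7, 10, 13]   # ranks of the relevant documents (from the figure)
R = 6                                   # relevant documents in the collection

# (recall, precision) observed at each relevant document retrieved
points = [((i + 1) / R, (i + 1) / rank) for i, rank in enumerate(relevant_ranks)]

def interpolated(level):
    return max((p for r, p in points if r >= level), default=0.0)

for level in [i / 10 for i in range(11)]:   # the 11 standard recall levels
    print(level, round(interpolated(level), 2))
# 0.0-0.3 -> 1.0, 0.4-0.5 -> 0.75, 0.6 -> 0.57, 0.7-0.8 -> 0.5, 0.9-1.0 -> 0.46
```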
Averaging performance • Average recall/precision over a set of queries can be computed in a user-oriented or a system-oriented way • User oriented: obtain the recall/precision values for each query and then average over all queries
Averaging performance • System oriented: use the totals over all queries: relevant documents, relevant retrieved, and total retrieved • The user-oriented average is the one commonly used
User oriented recall-level average • Average the queries' interpolated precision values at each recall level
[Figure: interpolated recall/precision curves for Query 1 and Query 2 plotted on the same axes; the user-oriented average is taken at each recall level]
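A minimal sketch of that averaging, assuming the interpolated precision values are already available at the 11 recall levels; q1 reuses the values from the earlier example and q2 is a hypothetical second query:

```python
# User-oriented average at each recall level (average the per-query interpolated precision).
levels = [i / 10 for i in range(11)]
q1 = [1.0, 1.0, 1.0, 1.0, 0.75, 0.75, 0.57, 0.50, 0.50, 0.46, 0.46]    # earlier example
q2 = [1.0, 0.80, 0.80, 0.60, 0.60, 0.50, 0.40, 0.30, 0.30, 0.20, 0.20]  # hypothetical query

average = [(a + b) / 2 for a, b in zip(q1, q2)]
for level, p in zip(levels, average):
    print(level, round(p, 2))
```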