Learning Probabilistic Models of Link Structure Getoor, Friedman, Koller, Taskar
Example Application: WebKB • Classify each web page as course, student, professor, project, or none, using… • Words on the web page • Links from other web pages (and the class of those pages, recursively) • Words in the "anchor text" of links from other pages: <a href="url">anchor text</a> • Web pages obtained from Cornell, Texas, Washington, and Wisconsin
Example Application: CORA • Classify documents according to topic (7 levels) using… • words in the document • papers cited by the document • papers citing the document
Standard PRM • parents(Doc.class) = {MODE(Doc.citers.class), MODE(Doc.cited.class)} [Figure: unrolled network of Document objects, each with class and words attributes; each document's class depends on the MODE of the classes of its citers and the MODE of the classes of the papers it cites]
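The MODE aggregate above can be sketched in a few lines. This is only an illustration: the function names, the toy documents, and the tie-breaking rule are assumptions, not part of the slides.

```python
from collections import Counter

def mode(values):
    """Most common value, or None for an empty list.
    Ties go to the value seen first (an arbitrary choice for this sketch)."""
    if not values:
        return None
    return Counter(values).most_common(1)[0][0]

def class_parents(doc, classes, cites):
    """Return (MODE(doc.citers.class), MODE(doc.cited.class)).
    classes: doc -> class label; cites: set of (citing, cited) pairs."""
    citers = [classes[a] for (a, b) in sorted(cites) if b == doc]
    cited = [classes[b] for (a, b) in sorted(cites) if a == doc]
    return mode(citers), mode(cited)

# Hypothetical toy data
classes = {"d1": "theory", "d2": "theory", "d3": "graphics", "d4": "learning"}
cites = {("d1", "d4"), ("d2", "d4"), ("d3", "d4"), ("d4", "d1")}
parents_d4 = class_parents("d4", classes, cites)
```

Note that the two aggregated values are the only way the neighbours reach Doc.class: swapping d3 for another graphics paper, or dropping it entirely, leaves the parents of d4 unchanged.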
Problem: The Citation Structure is Fixed • The existence (or non-existence) of a link cannot serve as evidence • Individually-linked papers only influence the class through the MODE.
Possible Solution: Link Uncertainty • Model the existence of links as random variables • Create a Link instance for each pair of possibly-linked objects
Unrolled Network [Figure: Document objects, each with class and words attributes, joined by Cites link objects, each with an Exists attribute]
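A minimal sketch of how link existence becomes evidence, assuming (as a simplification) that each Exists variable depends only on the classes of the two documents it would connect. The probability table and document names are made up for illustration.

```python
import math
from itertools import permutations

def log_prob_links(classes, links, p_exists):
    """Log-probability of an observed link set under existence uncertainty.
    classes:  doc -> class label
    links:    set of (citing, cited) pairs whose Exists variable is true
    p_exists: (citing_class, cited_class) -> P(Exists = true)  [assumed CPD]
    """
    total = 0.0
    # One potential Cites instance per ordered pair of distinct documents.
    for a, b in permutations(classes, 2):
        p = p_exists[(classes[a], classes[b])]
        total += math.log(p if (a, b) in links else 1.0 - p)
    return total

# Hypothetical toy data
classes = {"d1": "theory", "d2": "theory", "d3": "graphics"}
p_exists = {
    ("theory", "theory"): 0.5, ("theory", "graphics"): 0.1,
    ("graphics", "theory"): 0.1, ("graphics", "graphics"): 0.5,
}
lp = log_prob_links(classes, {("d1", "d2")}, p_exists)
```

Absent links contribute a factor of 1 - p each, so the non-existence of a link is scored just like its existence. The loop also makes the cost visible: n documents give n(n-1) Exists variables.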
Getoor’s Diagram • Entity classes (Paper) • Relation classes (Cites) • Technically, every instance has an Exists variable; it is always true for Entity instances.
Semantics • P is the basic CPT • P* will be the equivalent unrolled CPT • Require that an object does not exist if any of the objects it points to do not exist
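One way to write the constraint above, as a sketch rather than the authors' exact definition: here x is a link object, x.E is its Exists variable, Pa(x.E) are its parents in the basic model, and y_1, …, y_k are the objects x points to (these symbols are assumptions introduced for illustration).

```latex
P^*\bigl(x.E = \text{true} \mid \mathrm{Pa}(x.E),\, y_1.E, \dots, y_k.E\bigr) =
\begin{cases}
P\bigl(x.E = \text{true} \mid \mathrm{Pa}(x.E)\bigr) & \text{if } y_i.E = \text{true for all } i, \\
0 & \text{otherwise.}
\end{cases}
```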
Experimental Results • Cora and WebKB
A Second Approach: Reference Uncertainty • Treat reference attributes as random variables • Each reference attribute takes as its value an object of the indicated class • Example: a Citation object has two reference attributes • Citing: value is a Paper • Cited: value is a Paper
Problems • How many citation objects exist? Consequently, how many reference random variables exist? • How do we represent P(Citation.cites | …)? Citation.cites could take on thousands of possible values. • Huge conditional probability table • Costly inference at run time
Solutions • Problem 1: How many citations? • Fix the number of Citation objects • This gives the “object skeleton”
Problem 2: Too many potential values for a reference attribute • Attach to each reference attribute a set of partition attributes • The reference attribute chooses a partition • A Paper is then chosen uniformly at random from the partition [Figure: Paper objects grouped into partitions by topic (Theory, Learning, Graphics); the Citing and Cited attributes of a Citation object each point into a partition]
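The two-stage choice can be sketched as follows, assuming a single partition attribute (Topic) and that every partition contains at least one paper. The paper names and selector probabilities are hypothetical.

```python
import random

def partition_papers(topics):
    """Group papers by the value of the partition attribute (here: topic)."""
    parts = {}
    for paper, topic in topics.items():
        parts.setdefault(topic, []).append(paper)
    return parts

def sample_reference(parts, selector_probs, rng):
    """Stage 1: pick a partition; stage 2: pick a paper uniformly within it."""
    labels = list(selector_probs)
    weights = [selector_probs[label] for label in labels]
    chosen = rng.choices(labels, weights=weights, k=1)[0]
    return rng.choice(parts[chosen])

def reference_prob(paper, topics, parts, selector_probs):
    """P(reference = paper) = P(partition of paper) / size of that partition."""
    topic = topics[paper]
    return selector_probs[topic] / len(parts[topic])

# Hypothetical toy data
topics = {"p1": "theory", "p2": "theory", "p3": "learning", "p4": "graphics"}
selector_probs = {"theory": 0.5, "learning": 0.3, "graphics": 0.2}
parts = partition_papers(topics)
picked = sample_reference(parts, selector_probs, random.Random(0))
```

The distribution over partitions needs one entry per partition rather than one per paper, which is the point of the construction: the table size no longer grows with the number of papers.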
Representing Constraints Between Citing and Cited Papers Parents(Cites.Cited) = {Cites.Citing.Topic}
Details • Each reference attribute has a selector attribute S that chooses the partition. [Figure: the Paper partitions (Theory, Learning, Graphics) and a Citation object whose selector attributes Sciting and Scited choose the partitions for Citing and Cited]
Class-level Dependency Graph • Five types of edges • Type I: edges within a single object • Type II: edges between objects • Type III: edges from every reference attribute along a reference path to the attribute that depends on that path • Type IV: edges from every partition attribute to the selector attributes that use those partition attributes to choose a partition • Type V: edges from selector attributes to their corresponding reference attributes
Movie Theater Example • Type I: Genre → Popularity • Type II: Shows.Movie.Genre → Shows.Profit; Shows.Theater.Type → SMovie • Type III: Movie → Profit; Theater → SMovie • Type IV: Genre → SMovie • Type V: STheater → Theater; SMovie → Movie
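The edges listed above can be written down as a small graph and checked for cycles. This is a sketch: the class prefixes on the node names (for example, placing Popularity on Movie) are assumptions made for illustration, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter, CycleError

# (source, target, edge type); class prefixes are assumed for this sketch.
EDGES = [
    ("Movie.Genre", "Movie.Popularity", "I"),
    ("Movie.Genre", "Shows.Profit", "II"),
    ("Theater.Type", "Shows.SMovie", "II"),
    ("Shows.Movie", "Shows.Profit", "III"),
    ("Shows.Theater", "Shows.SMovie", "III"),
    ("Movie.Genre", "Shows.SMovie", "IV"),
    ("Shows.STheater", "Shows.Theater", "V"),
    ("Shows.SMovie", "Shows.Movie", "V"),
]

def topological_order(edges):
    """Return the nodes in an order consistent with the edges.
    Raises graphlib.CycleError if the dependency graph has a cycle."""
    preds = {}
    for src, dst, _edge_type in edges:
        preds.setdefault(dst, set()).add(src)
        preds.setdefault(src, set())
    return list(TopologicalSorter(preds).static_order())

order = topological_order(EDGES)
```

Any valid order places STheater before Theater, Theater before SMovie, SMovie before Movie, and Movie before Profit: the theater is chosen first, and its type then influences which movie is shown.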
Unrolled Graph? • The Unrolled Graph can have a huge number of edges • Is learning and inference really feasible?
Homework Exercise • Construct the dependency graph for the citation example • Construct an unrolled network for a reference uncertainty example