Learning Probabilistic Models of Link Structure

Learning Probabilistic Models of Link Structure. Getoor, Friedman, Koller, Taskar. Example application: WebKB. Classify a web page as course, student, professor, project, or none using words on the web page and links from other web pages (and the class of those pages, recursively).

Presentation Transcript


  1. Learning Probabilistic Models of Link Structure • Getoor, Friedman, Koller, Taskar

  2. Example Application: WebKB • Classify a web page as course, student, professor, project, or none using… • Words on the web page • Links from other web pages (and the class of those pages, recursively) • Words in the "anchor text" from the other page: <a href="url">anchor text</a> • Web pages obtained from Cornell, Texas, Washington, and Wisconsin

  3. Example Application: CORA • Classify documents according to topic (7 levels) using… • words in the document • papers cited by the document • papers citing the document

  4. Standard PRM • parents(Doc.class) = {MODE(Doc.citers.class), MODE(Doc.cited.class)} • [Figure: Document objects, each with class and words attributes; the classes of citers and cited documents are aggregated through MODE]
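
The MODE aggregate on this slide can be sketched in a few lines. This is only an illustration, not the authors' implementation: the helper names `mode` and `class_parents`, the toy documents, and the alphabetical tie-breaking rule are all assumptions made for the example.

```python
from collections import Counter


def mode(values):
    """Return the most common value, or None for an empty collection."""
    counts = Counter(values)
    if not counts:
        return None
    best = max(counts.values())
    # Assumed tie-break: pick the alphabetically smallest label.
    return min(v for v, c in counts.items() if c == best)


def class_parents(doc, citers, cited, labels):
    """Parent values of doc.class: (MODE of citers' classes, MODE of cited classes)."""
    return (
        mode(labels[d] for d in citers.get(doc, [])),
        mode(labels[d] for d in cited.get(doc, [])),
    )


# Hypothetical toy data: p4 is cited by p1, p2, p3 and itself cites p1.
labels = {"p1": "Theory", "p2": "Learning", "p3": "Learning", "p4": "Graphics"}
citers = {"p4": ["p1", "p2", "p3"]}
cited = {"p4": ["p1"]}

parents_p4 = class_parents("p4", citers, cited, labels)
```

Note how the three citing papers collapse into a single value: once the mode is taken, the class variable no longer sees which individual paper contributed which label.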

  5. Problem: The Citation Structure is Fixed • The existence (or non-existence) of a link cannot serve as evidence • Individually-linked papers only influence the class through the MODE.

  6. Possible Solution: Link Uncertainty • Model the existence of links as random variables • Create a Link instance for each pair of possibly-linked objects
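
A rough sketch of what "a Link instance for each pair" means in practice. It assumes links are directed and that a document cannot link to itself; neither assumption is stated on the slide, and the function names are hypothetical.

```python
from itertools import permutations


def potential_links(docs):
    """All ordered pairs of distinct documents (assumed: directed links, no self-links)."""
    return list(permutations(docs, 2))


def exists_assignment(docs, observed):
    """Map every potential link to the value of its Exists variable."""
    return {pair: pair in observed for pair in potential_links(docs)}


# Hypothetical toy data: three documents, two observed citations.
docs = ["p1", "p2", "p3"]
observed = {("p1", "p2"), ("p3", "p2")}

exists = exists_assignment(docs, observed)
num_variables = len(exists)        # one Exists variable per potential link
num_true = sum(exists.values())    # links that are actually present
```

With n documents this creates n(n-1) Exists variables, so the absence of a link becomes evidence too, at the cost of a quadratic number of variables.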

  7. Unrolled Network • [Figure: Cites objects with Exists variables; Document objects with class and words attributes]

  8. Getoor’s Diagram • Entity classes (Paper) • Relation classes (Cites) • Technically, every instance has an Exists variable which is true for all Entity instances.

  9. Semantics • P is the basic CPT • P* will be the equivalent unrolled CPT • Require that an object cannot exist if any object it points to does not exist

  10. WebKB Network

  11. Experimental Results • Cora and WebKB

  12. WebKB with various features

  13. A Second Approach: Reference Uncertainty • Treat reference attributes as random variables • Each reference attribute takes as value an object of the indicated class • Citation • Citing: reference attribute, value is a Paper • Cited: reference attribute, value is a Paper

  14. Problems • How many citation objects exist? Consequently, how many reference random variables exist? • How do we represent P(Citation.cites | …)? Citation.cites could take on thousands of possible values. • Huge conditional probability table • Costly inference at run time
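
To make the table-size concern concrete, here is a back-of-the-envelope count. The numbers are purely illustrative assumptions: 3000 candidate papers and a single parent with 7 values are not figures from the presentation.

```python
def cpt_entries(child_cardinality, parent_cardinalities):
    """Number of entries in a full tabular CPT: one row per parent configuration."""
    rows = 1
    for k in parent_cardinalities:
        rows *= k
    return rows * child_cardinality


# Assumed sizes, for illustration only.
NUM_PAPERS = 3000        # possible values of the reference attribute
PARENT_VALUES = [7]      # one parent with 7 values

direct = cpt_entries(NUM_PAPERS, PARENT_VALUES)
```

Under these assumptions the table already has 21000 entries for one reference attribute, and the count multiplies with every additional parent.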

  15. Solutions • Problem 1: How many citations? • Fix the number of Citation objects • This gives the "object skeleton"

  16. Problem 2: Too many potential values for a reference attribute • Attach to each reference attribute a set of partition attributes • The reference attribute chooses a partition • A Paper is then chosen uniformly at random from the partition • [Figure: Papers grouped into Theory, Learning, and Graphics partitions; a Citation object with Citing and Cited reference attributes]
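
The two-step choice described on this slide (pick a partition, then pick a paper uniformly inside it) can be sketched as follows. The partition attribute is assumed to be the paper's topic, and the selector probabilities and paper names are made up for the example.

```python
import random
from collections import defaultdict


def build_partitions(papers, topic_of):
    """Group papers by the value of their (assumed) partition attribute, topic."""
    parts = defaultdict(list)
    for p in papers:
        parts[topic_of[p]].append(p)
    return dict(parts)


def sample_reference(partitions, selector_probs, rng):
    """Choose a partition from the selector distribution, then a paper uniformly."""
    topics = list(selector_probs)
    weights = [selector_probs[t] for t in topics]
    chosen = rng.choices(topics, weights=weights, k=1)[0]
    return chosen, rng.choice(partitions[chosen])


def reference_prob(paper, topic_of, partitions, selector_probs):
    """P(reference = paper) = P(partition) * 1 / |partition|."""
    t = topic_of[paper]
    return selector_probs[t] / len(partitions[t])


# Hypothetical toy data.
topic_of = {"p1": "Theory", "p2": "Theory", "p3": "Learning", "p4": "Graphics"}
selector_probs = {"Theory": 0.5, "Learning": 0.3, "Graphics": 0.2}
partitions = build_partitions(list(topic_of), topic_of)

topic, paper = sample_reference(partitions, selector_probs, random.Random(0))
total = sum(reference_prob(p, topic_of, partitions, selector_probs) for p in topic_of)
```

Only the distribution over partitions has to be stored, so the model needs one parameter per partition rather than one per paper.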

  17. Representing Constraints Between Citing and Cited Papers Parents(Cites.Cited) = {Cites.Citing.Topic}

  18. Details • Each reference attribute has a selector attribute S that chooses the partition. • [Figure: a Citation object with reference attributes Citing and Cited and selector attributes SCiting and SCited; Papers grouped into Theory, Learning, and Graphics partitions]

  19. Class-level Dependency Graph • Five types of edges • Type I: edges within a single object • Type II: edges between objects • Type III: edges from every reference attribute along any reference paths • Type IV: edges from every partition attribute to the selector attributes that use those partition attributes to choose a partition • Type V: edge from selector attributes to their corresponding reference attributes

  20. Movie Theater Example • Type I: Genre → Popularity • Type II: Shows.Movie.Genre → Shows.Profit; Shows.Theater.Type → SMovie • Type III: Movie → Profit; Theater → SMovie • Type IV: Genre → SMovie • Type V: STheater → Theater; SMovie → Movie
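
One way to work with the edges listed on this slide is to store them as a directed graph and check that it has no cycles. The sketch below flattens slot chains such as Shows.Movie.Genre to the bare attribute name (Genre), which is a simplification made for this example, and uses Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter

# Edges from the slide, with slot chains flattened to attribute names (an assumption).
edges = [
    ("Genre", "Popularity"),   # Type I
    ("Genre", "Profit"),       # Type II: Shows.Movie.Genre -> Shows.Profit
    ("Type", "SMovie"),        # Type II: Shows.Theater.Type -> SMovie
    ("Movie", "Profit"),       # Type III
    ("Theater", "SMovie"),     # Type III
    ("Genre", "SMovie"),       # Type IV
    ("STheater", "Theater"),   # Type V
    ("SMovie", "Movie"),       # Type V
]


def topological_order(edge_list):
    """Return the nodes in an order where every parent precedes its children.

    Raises graphlib.CycleError if the dependency graph is cyclic.
    """
    preds = {}
    for src, dst in edge_list:
        preds.setdefault(src, set())
        preds.setdefault(dst, set()).add(src)
    return list(TopologicalSorter(preds).static_order())


order = topological_order(edges)


def is_acyclic(edge_list):
    """True if the edge list admits a topological order."""
    try:
        topological_order(edge_list)
        return True
    except CycleError:
        return False
```

For these edges an ordering exists, for example one in which STheater comes before Theater, Theater before SMovie, and SMovie before Movie.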

  21. Unrolled Graph? • The Unrolled Graph can have a huge number of edges • Is learning and inference really feasible?

  22. Homework Exercise • Construct the dependency graph for the citation example • Construct an unrolled network for a reference uncertainty example
