Chapter 10 Link Analysis
Data Mining Techniques So Far… • Chapter 5 – Statistics • Chapter 6 – Decision Trees • Chapter 7 – Neural Networks • Chapter 8 – Nearest Neighbor Approaches: Memory-Based Reasoning and Collaborative Filtering • Chapter 9 – Market Basket Analysis and Association Rules
Introduction • Airline Route Maps are useful • Hyperlinks were revolutionary • Apple’s HyperCard (Bill Atkinson) • Claim that there are no more than 6 degrees of separation between any two people on the planet • Link Analysis is the data mining technique that addresses relationships and connections • Link Analysis is based on Graph Theory
Introduction • As you would expect, Link Analysis has its limitations as a DM technique also • However, quite effective in these and similar situations • Identifying authoritative sources of information on the WWW by analyzing page links • Understanding physician referral patterns • Analyzing telephone call patterns
Basic Graph Theory • Graphs are an abstraction used to represent relationships • Graphs consist of • Nodes (vertices) which are the things in the graph that have relationships • Edges are pairs of nodes connected by a relationship • Visualization is a key characteristic of a graph
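The node and edge definitions above can be sketched in a few lines of Python. This is only an illustrative representation: the city names and the `neighbors` helper are assumptions for the example, not something the slides define.

```python
# Hypothetical airline-style graph; the city names are illustrative only.
nodes = {"LA", "Denver", "Boston"}

# Each edge is an unordered pair of nodes connected by a relationship.
edges = {
    frozenset(("LA", "Denver")),
    frozenset(("Denver", "Boston")),
}

def neighbors(node):
    """Return every node that shares an edge with `node`."""
    return {other for edge in edges if node in edge
            for other in edge if other != node}

print(neighbors("Denver"))
```

Storing each edge as a `frozenset` keeps the pair unordered, which matches the plain (undirected) graph described on this slide.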
Basic Graph Theory • A path is an ordered sequence of nodes connected by edges • Flight Segments (legs) such as LA – Denver – Boston • A weighted graph is one in which the edges have weights associated with them • Example: Weights support the association between two products being purchased together
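As a rough sketch of paths and weighted edges, the snippet below sums the weights along the LA, Denver, Boston path. The weights are made-up numbers standing in for distances, and `edge_weight` and `path_weight` are hypothetical helper names.

```python
# Made-up weights (assumption): they stand in for distances between cities.
weights = {
    ("LA", "Denver"): 860,
    ("Denver", "Boston"): 1750,
}

def edge_weight(a, b):
    """Look up an undirected edge's weight, or None if no edge exists."""
    return weights.get((a, b), weights.get((b, a)))

def path_weight(path):
    """Sum edge weights along an ordered node sequence.

    Returns None when two consecutive nodes are not connected,
    i.e. when the sequence is not a path in this graph.
    """
    total = 0
    for a, b in zip(path, path[1:]):
        w = edge_weight(a, b)
        if w is None:
            return None
        total += w
    return total

print(path_weight(["LA", "Denver", "Boston"]))
print(path_weight(["LA", "Boston"]))
```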
Graph Theory Classic Problems • Finding a path in the graph that visits every edge exactly once (Seven Bridges of Königsberg – edges are bridges and nodes are land masses) • Finding the shortest path that visits every node in the graph exactly once (Traveling Salesman) • A completely connected graph with n nodes has n! (n factorial) orderings of its nodes, each a path that contains all nodes (5! = 5 * 4 * 3 * 2 * 1 = 120)
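Both classic problems can be checked on small inputs. The sketch below uses Euler's degree rule, which the slide does not state: a connected graph has a path using every edge exactly once only if it has zero or two nodes of odd degree. The labels A to D for the four land masses are assumptions made for the example.

```python
from collections import Counter
from itertools import permutations

# Seven Bridges: four land masses (labels A-D are hypothetical), seven bridges.
bridges = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]

def odd_degree_nodes(edge_list):
    """Return the nodes touched by an odd number of edges."""
    degree = Counter()
    for a, b in edge_list:
        degree[a] += 1
        degree[b] += 1
    return sorted(n for n, d in degree.items() if d % 2 == 1)

def has_euler_path(edge_list):
    """Euler's rule; assumes the graph is connected."""
    return len(odd_degree_nodes(edge_list)) in (0, 2)

print(has_euler_path(bridges))

# Traveling Salesman by brute force: every ordering of 5 nodes is a candidate.
orderings = list(permutations("ABCDE"))
print(len(orderings))
```

All four land masses have odd degree, so no such walk over the bridges exists. The factorial growth in `orderings` is why brute force stops being practical for the Traveling Salesman problem beyond a handful of nodes.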
Directed vs Undirected Graphs • Undirected graphs – edges between nodes go in both directions (A to B; B to A) • Directed graphs – edges between nodes only go in one direction (A to B is different than B to A) • Ex: WWW
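One way to see the difference is in an adjacency structure. In this sketch `add_edge` is a hypothetical helper: an undirected edge is recorded in both directions, a directed edge in only one.

```python
def add_edge(adj, a, b, directed):
    """Record an edge from a to b; mirror it when the graph is undirected."""
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set())
    if not directed:
        adj[b].add(a)

undirected = {}
add_edge(undirected, "A", "B", directed=False)

directed = {}
add_edge(directed, "A", "B", directed=True)

print(undirected)  # B is reachable from A and A from B
print(directed)    # only A -> B is recorded
```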
Google – Directed Graph Example • Web pages = nodes • Hyperlinks = edges • Spiders & Web crawlers updating • Kleinberg’s Algorithm • Hub – a page that links to many authorities • Authority – a page that is linked to by many hubs
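The hub and authority definitions feed each other, which suggests an iterative calculation. The following is a minimal sketch in the spirit of Kleinberg's hubs-and-authorities idea, not the exact procedure used by any search engine. The page names, the link data, and the fixed iteration count are all assumptions.

```python
import math

# Hypothetical tiny web: page -> pages it links to (a directed graph).
links = {
    "hub1": ["auth1", "auth2"],
    "hub2": ["auth1", "auth2"],
    "hub3": ["auth1"],
    "auth1": [],
    "auth2": [],
}

def hits(links, iterations=20):
    """Return (hub, authority) scores after a fixed number of updates."""
    pages = list(links)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking in.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        # Hub score: sum of authority scores of the pages linked to.
        hub = {p: sum(auth[dst] for dst in links[p]) for p in pages}
        # Normalize so the scores stay bounded between iterations.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth

hub, auth = hits(links)
print(sorted(auth, key=auth.get, reverse=True)[:2])
```

In this toy data `auth1` receives links from all three hubs, so it ends up with the highest authority score, and the hubs that point to both authorities score above the one that points to only one.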
Google – example continued • Authority versus mere popularity • Ranking a site by the number of unrelated sites that link to it yields popularity • Ranking a site by the number of subject-related hubs that point to it yields authority • Authority ranking helps overcome a common weakness of popularity ranking, where the real authority (e.g., a home page) ranks lower because few popular sites link to it
Examples of Link Analysis • Recent Int’l Data Mining Conference • http://www.siam.org/meetings/sdm04/ • Chapter10-Example1.pdf • Chapter10-Example2.pdf • Chapter10-Example3.pdf • Megaputer (PolyAnalyst vendor) page: • http://www.megaputer.com/products/pa/algorithms/la.php3