CSE 522 – Algorithmic and Economic Aspects of the Internet

Instructors: Nicole Immorlica, Mohammad Mahdian

Previously in this class
• Properties of social networks
• Probabilistic and game-theoretic models for social networks

This Lecture: Ranking Web Pages using Link Analysis

Components of a search engine
• Crawler
  • How to handle different types of URLs
  • How often to crawl each page
  • How to detect “duplicates”
• Indexer
  • Data structures (to minimize the number of disk accesses)
• Query handler
  • Find the set of pages that contain the query word.
  • Sort the results.

Sorting the search results
• HITS (Hypertext Induced Topic Selection)
  • J. Kleinberg, “Authoritative sources in a hyperlinked environment”, SODA 1998.
• PageRank
  • S. Brin and L. Page, “The anatomy of a large-scale hypertextual web search engine”, WWW 1998.
  • L. Page, S. Brin, R. Motwani, and T. Winograd, “The PageRank citation ranking: bringing order to the web”.

Difficulties
• Too many hits (“abundance”)
  • Number of indexed pages: 110,000 in 1994; 100,000,000 in 1997.
  • As a result, too many pages often contain the query.
• Sometimes pages are not sufficiently self-descriptive.
  • Brin & Page: as of Nov 1997, only one of the top four commercial search engines finds itself!
• Need to find “popular” pages.

Link analysis
• Instead of using text analysis, we analyze the structure of hyperlinks to extract information about the popularity of a page.
• Advantages:
  • No need for complicated text analysis
  • Less manipulable, and independent of one person’s point of view (think of it as a voting system).

Relevance vs. popularity
• Need to achieve a balance between relevance and popularity.
• Kleinberg’s approach: construct a focused subgraph based on relevance, and return the most popular page in this subgraph.
• Google’s approach: compute a measure of relevance (considering how many times and in what form [title/URL/font size/anchor] the query appears in the page), and multiply it by a popularity measure called PageRank.

Constructing a focused subgraph
• Desired properties:
  • Relatively small
  • Rich in relevant pages
  • Contains most of the strongest authorities on the subject.

Constructing a focused subgraph
• Given a query, start with the set R of the top ~200 text-based hits for it.
• Add to this set:
  • the set of pages that have a link from a page in R;
  • the set of pages that have a link to a page p in R, with an upper limit of ~50 pages per p ∈ R.
• Call the resulting set S.
• Find the most “authoritative” page in G[S], the subgraph induced by S.
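The expansion step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `focused_subgraph`, `induced_edges` and `out_links` are hypothetical, the link structure is assumed to fit in an in-memory dictionary, and when a page has more than ~50 in-linking pages the sketch simply keeps the first ones it encounters.

```python
from collections import defaultdict


def focused_subgraph(root, out_links, max_in_per_page=50):
    """Expand a root set R into the larger set S.

    root: list of page ids (assumed: the top text-based hits for the query).
    out_links: dict mapping a page id to the list of page ids it links to
               (a hypothetical stand-in for a real link index).
    """
    # Invert the link map so we can look up the pages pointing to a page.
    in_links = defaultdict(list)
    for src, targets in out_links.items():
        for dst in targets:
            in_links[dst].append(src)

    base = set(root)
    for p in root:
        # Every page that p links to.
        base.update(out_links.get(p, ()))
        # At most max_in_per_page pages that link to p.
        base.update(in_links[p][:max_in_per_page])
    return base


def induced_edges(nodes, out_links):
    """Edges of G[S]: links whose two endpoints both lie in S."""
    return {(u, v) for u in nodes for v in out_links.get(u, ()) if v in nodes}


if __name__ == "__main__":
    links = {"a": ["r"], "b": ["r"], "c": ["r"], "r": ["x", "y"], "x": ["z"]}
    S = focused_subgraph(["r"], links, max_in_per_page=2)
    print(sorted(S))
    print(sorted(induced_edges(S, links)))
```

The cap on in-linking pages keeps S small even when R contains a very popular page with a huge number of incoming links.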

Finding authorities
• Approach 1: vertices with the largest in-degrees
  • This approach is used to evaluate scientific citations (the “impact factor”).
  • Deficiencies:
    • A page might have a large in-degree from low-quality pages.
    • “Universally popular” pages often dominate the result.
    • Easy to manipulate.

Finding authorities
• Approach 2: define the set of authorities recursively.
  • Best authorities on a subject have a large in-degree from the best hubs on the subject.
  • Best hubs on a subject give links to the best authorities on the subject.
• Formulation as a principal eigenvector
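One way to make the recursive definition concrete is the iterative update from Kleinberg's paper: each page's authority score becomes the sum of the hub scores of the pages linking to it, each hub score becomes the sum of the authority scores of the pages it links to, and both vectors are renormalized. With A the adjacency matrix of G[S], this is power iteration, so the authority vector approaches a principal eigenvector of AᵀA and the hub vector a principal eigenvector of AAᵀ. The sketch below is a minimal pure-Python version; the function names, the dictionary representation, and the fixed iteration count are assumptions made for illustration.

```python
import math


def _normalize(scores):
    """Scale a score dict to unit Euclidean length (no-op if all zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return scores
    return {p: v / norm for p, v in scores.items()}


def hits(out_links, iterations=50):
    """Return (authority, hub) score dicts for a small link graph.

    out_links: dict mapping a page id to the list of page ids it links to.
    """
    pages = set(out_links)
    for targets in out_links.values():
        pages.update(targets)

    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum the hub scores of the pages linking in.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in out_links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        # Hub update: sum the new authority scores of the pages linked to.
        new_hub = {
            p: sum(new_auth[dst] for dst in out_links.get(p, ()))
            for p in pages
        }
        auth = _normalize(new_auth)
        hub = _normalize(new_hub)
    return auth, hub


if __name__ == "__main__":
    graph = {"h1": ["a1", "a2"], "h2": ["a1"], "h3": ["a1"]}
    auth, hub = hits(graph)
    print(max(auth, key=auth.get), max(hub, key=hub.get))
```

In this toy graph the page linked to by all three hubs ends up as the top authority, and the hub that links to both authorities ends up as the top hub.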

Discussion
• This algorithm can also be used to find the closest pages to a given page p.
  • Let R be the set of at most ~200 pages that point to p.
• Can also compute multiple sets of hubs and authorities.

PageRank
• Again, the idea is a recursive definition of importance:
  • An important page is a page that has many links from other important pages.
• Problems:
  • Not always well-defined.
  • Pages with no outgoing links (out-degree zero) form rank sinks.

PageRank
• Fix: consider a “random surfer” who at every step either clicks on a random link on the current page or, with some fixed probability, gets bored and starts again from a random page.
• PageRank takes this probability to be about 1/7, and uses a non-uniform distribution for starting again.
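The random-surfer model can be simulated directly by repeatedly pushing probability mass along links. The following is a minimal sketch, not the production algorithm: the function name and the `restart` parameter are hypothetical, the restart distribution is taken to be uniform for simplicity (the slide notes that PageRank uses a non-uniform one), and a page with no outgoing links is assumed to send the surfer to a uniformly random page, which is one common convention for dealing with rank sinks.

```python
def pagerank(out_links, restart=1 / 7, iterations=100):
    """Approximate the stationary distribution of the random surfer.

    out_links: dict mapping a page id to the list of page ids it links to.
    restart: probability of getting bored and jumping to a random page.
    """
    pages = set(out_links)
    for targets in out_links.values():
        pages.update(targets)
    n = len(pages)

    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Mass arriving through restarts, spread uniformly (an assumption).
        new_rank = {p: restart / n for p in pages}
        for p in pages:
            targets = out_links.get(p, ())
            if targets:
                # Follow one of p's links uniformly at random.
                share = (1 - restart) * rank[p] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
            else:
                # Page without out-links: jump to a uniformly random page.
                share = (1 - restart) * rank[p] / n
                for q in pages:
                    new_rank[q] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    web = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for page, score in sorted(pagerank(web).items()):
        print(page, round(score, 4))
```

Because the restart step gives every page a nonzero chance of being visited, the scores always sum to 1 and the iteration converges regardless of the link structure.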
