Information technology in business and society Session 9 – Search and Advertising Sean J. Taylor
Administrativia • Assignment 2 online, due Saturday 2/25 at 1am • Assignment 2 resources • Assignment 3 preview • Guest speaker on Tuesday 2/28: Chrys Wu discussing IT and Journalism • Substitute on Thursday 3/1: Professor Dylan Walker
Learning objectives • Learn how search engines rank pages • Learn how to design effectively for high rankings • Learn how online advertising works, especially search ads and keyword auctions • The future of search
Search Engines and Web Directories • Resources on the Web that help you find sites with the information and/or services you want. • Directory search engine - organizes listings of Web sites into hierarchical lists. • Search engine - uses software agent technologies (or “spiders”, or “bots”) to search the Web for key words and place them into indexes.
Web directories Example Advantages? Disadvantages?
Search engine examples Advantages? Disadvantages?
How search engines work • Search engines discover new pages by following links • They keep track of the words that appear in each page; when you enter a query, the search engine returns a ranked list of matching pages • Text content is important! But it is not enough! (Why?) • How do search engines rank pages? (Why does this matter?)
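The bookkeeping described on this slide can be sketched as an inverted index: a map from each word to the pages that contain it. This is a minimal illustration, not how any real search engine is built. The page names and texts are made up, and ranking by raw word count is only a stand-in for the link-based ranking covered in the next slides.

```python
from collections import Counter, defaultdict

# Hypothetical toy pages (made up for illustration).
PAGES = {
    "page1": "search engines rank pages using links",
    "page2": "online advertising and keyword auctions",
    "page3": "search ads appear next to search results",
}

def build_index(pages):
    """Map each word to a Counter of {page name: number of occurrences}."""
    index = defaultdict(Counter)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word][name] += 1
    return index

def search(index, query):
    """Return pages containing any query word, most occurrences first."""
    scores = Counter()
    for word in query.lower().split():
        scores.update(index.get(word, Counter()))
    return [page for page, _ in scores.most_common()]

index = build_index(PAGES)
results = search(index, "search ads")
print(results)
```

Counting words alone is easy to manipulate (a page can simply repeat a term), which is one reason text content is not enough.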
PageRank is really a “Random Surfer” Model • The link matrix has an entry of 1 if page i links to page j • Random Surfer Model, Transfer Matrix: the probability that a surfer follows a link from webpage i to webpage j = [prob. you were not “picked up”] × [prob. of following link i→j] • Let’s count the surfers that pass through each point • What about getting stuck in loops? Being “picked up” and dropped on a random page takes care of that
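The formulas on this slide were images and did not survive extraction. One standard way to write the random surfer model is given here as a reconstruction, not as the slide's own notation. The symbols $A$, $k_i$, $d$, $T$, $N$, and $p$ are assumptions, and every page is assumed to have at least one outgoing link.

```latex
% A_{ij}: link matrix; k_i: number of outgoing links of page i
A_{ij} =
\begin{cases}
1 & \text{if page } i \text{ links to page } j \\
0 & \text{otherwise}
\end{cases}
\qquad
k_i = \sum_{j} A_{ij}

% d: probability that the surfer is NOT "picked up" at a given step
T_{ij} = d \cdot \frac{A_{ij}}{k_i}

% p_j: share of surfers at page j; N: number of pages.
% The (1-d)/N term drops picked-up surfers on a random page,
% which is what prevents them from being trapped in loops.
p_j = \frac{1 - d}{N} + \sum_{i} p_i \, T_{ij}
```

Setting $d = 1$ removes the random jump and gives the simplified update used in the book example, which ignores the damping factor.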
Measuring Importance of Linking: the PageRank Algorithm • Idea: important pages are pointed to by other important pages • Method: • Each link from one page to another is counted as a “vote” for the destination page • The number of incoming links is important! • But it is not enough! • Each “vote” is different! PageRank places more importance on votes that come from pages with a large number of votes (and so on, and so on) • Compare, for example, the circled page in cases A and B [Figure: cases A and B]
Computing PageRank (ignoring damping factor for illustration) [Figure: four books A, B, C, and D, each with a “People who bought this also bought…” list; together the lists link to book A twice, book B once, book C three times, and book D once]
PageRank, start: every book has score .250
PageRank, step 1, votes sent: the book with three links sends .250/3 along each, the book with two links sends .250/2 along each, and the two books with a single link each send .250
PageRank, step 1, new scores: book A .375, book B .083, book C .458, book D .083
PageRank, step 2, votes sent: .375/3 along each of three links, .083/2 along each of two links, and .458 and .083 along the single links
PageRank, step 2, new scores: book A .500, book B .125, book C .250, book D .125
PageRank, converged: book A .400, book B .133, book C .333, book D .133; votes sent are .400/3 along each of three links, .133/2 along each of two links, and .333 and .133 along the single links
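The book example can be replayed in a few lines of Python. This is a sketch under an assumed link structure: the diagram did not survive extraction, so the links below are inferred from the numbers on the slides (book A lists three books, book C and one other book list one each, and the remaining book lists two). Which of B or D has two links is not recoverable; B is assumed here, and swapping them gives the same scores.

```python
# Assumed link structure, inferred from the slide numbers (not from the diagram).
LINKS = {
    "A": ["B", "C", "D"],
    "B": ["A", "C"],   # assumption: B, not D, is the book with two links
    "C": ["A"],
    "D": ["C"],
}

def pagerank_step(ranks, links):
    """One update: every page splits its score equally among its outgoing links."""
    new_ranks = {page: 0.0 for page in links}
    for page, targets in links.items():
        share = ranks[page] / len(targets)
        for target in targets:
            new_ranks[target] += share
    return new_ranks

def pagerank(links, iterations=100):
    """Repeat the update from a uniform start (no damping factor)."""
    ranks = {page: 1.0 / len(links) for page in links}
    for _ in range(iterations):
        ranks = pagerank_step(ranks, links)
    return ranks

uniform = {page: 0.25 for page in LINKS}
first = pagerank_step(uniform, LINKS)
final = pagerank(LINKS)
for page in sorted(final):
    print(page, round(first[page], 3), round(final[page], 3))
```

Without a damping factor this repeated update is only guaranteed to settle on well-behaved graphs such as this one, where every book can reach every other book. That is part of why the full algorithm includes the random jump.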
Gaming PageRank and Trust: the TrustRank Algorithm • Initial votes come only from trusted pages • Compare, for example, the circled page in cases A and B [Figure: cases A and B, showing trusted pages and links from untrusted sources]
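The idea that votes originate only from trusted pages can be sketched as a PageRank-style iteration whose starting scores and random jumps are restricted to a trusted seed set. This is a simplified illustration in the spirit of the slide, not necessarily the published TrustRank algorithm. The graph, the page names, and the damping value of 0.85 are all assumptions.

```python
def trust_rank(links, trusted, damping=0.85, iterations=50):
    """Propagate trust from seed pages along links (simplified sketch)."""
    pages = list(links)
    # Only trusted pages start with (and are replenished with) any score.
    seed = {p: (1.0 / len(trusted) if p in trusted else 0.0) for p in pages}
    scores = dict(seed)
    for _ in range(iterations):
        new_scores = {p: (1.0 - damping) * seed[p] for p in pages}
        for page, targets in links.items():
            if not targets:
                continue  # pages without links pass nothing on in this sketch
            share = damping * scores[page] / len(targets)
            for target in targets:
                new_scores[target] += share
        scores = new_scores
    return scores

# Hypothetical graph: page_a is endorsed by a trusted page,
# page_b only by two spam pages that also link to each other.
TRUST_LINKS = {
    "trusted": ["page_a"],
    "page_a": ["trusted"],
    "spam1": ["page_b", "spam2"],
    "spam2": ["page_b", "spam1"],
    "page_b": [],
}

scores = trust_rank(TRUST_LINKS, trusted={"trusted"})
for page, score in scores.items():
    print(page, round(score, 3))
```

In this toy graph page_b has more incoming links than page_a, yet it receives no trust because none of its votes trace back to a trusted page.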
Simulating Changes in PageRank [Figure: books A, B, C, and D with their “People who bought this also bought…” lists; scores shown: .400, .133, .133, .333]
Importance of Anchor Text • <a href=http://www.sims…> A terrific course on search engines</a> • <a href=http://www.sims…> INFOSYS 141</a> • The anchor text summarizes what the website is about.
Other ranking factors • Location, Location, Location...and Frequency • Query words in title, or in first few sentences • The more frequent the query words, the better • Click-through measurement • How often users click on your URL when they see it • How long they stay (measured using toolbars!)
Outline • Learn how search engines rank pages • Learn how to design effectively for high rankings • Learn how online advertising works, especially search ads and keyword auctions • The future of search
Achieving Higher Results Rankings • Position your keywords (title, headings, early on page) • Make text visible (no tiny fonts, no white-on-white) • Frames can kill • Have relevant content • Do not change topics • Just say no to search engine spamming • Submit your key pages • Verify your listing often
Manipulating Rankings • Motives • Commercial, political, religious, lobbies • Promotion funded by advertising budget • Operators • Contractors (Search Engine Optimizers) for lobbies, companies • Web masters • Hosting services • What are the techniques used by rankings manipulators?
Manipulation technologies • Cloaking • Serve fake content to search engine robot [Figure: cloaking flowchart, “Is this a search engine spider?” Y: fake doc, N: spam] • DNS cloaking: switch IP address, impersonate • Doorway pages • Pages optimized for a single keyword that redirect to the real target page • Keyword spam • Misleading meta-keywords, excessive repetition of a term, fake “anchor text” • Example: Meta-Keywords = “… London hotels, hotel, holiday inn, hilton, discount, booking, reservation, sex, mp3, britney spears, viagra, …” • Hidden text with colors, CSS tricks, etc. • Link spamming • Mutual admiration societies, hidden links, awards • Domain flooding: numerous domains that point or redirect to a target page • Robots • Fake click stream • Fake query stream • Risky to use any of these, as search engines are getting better at detecting and punishing them
Outline • Learn how search engines rank pages • Learn how to design effectively for high rankings • Learn how online advertising works, especially search ads and keyword auctions • The future of search
Paid Ranking: Promoting without Manipulation (Paid Placement) • Keyword bidding for targeted ads • Pay-per-click • Higher bids result in higher ranks for the ad • A higher percentage of clicks on the ad increases the rank as well (why?) • Google's AdWords is the biggest player • Google’s 2007 revenue was more than $16 billion, 2008 ~ $22 billion, mostly from such ads
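One way to see why clicks matter as well as bids: under pay-per-click, the search engine earns money only when an ad is clicked, so the expected revenue from showing an ad is roughly bid × click-through rate. The sketch below ranks ads by that product. It is a simplified model for illustration, not the actual AdWords formula, and the advertisers, bids, and click-through rates are made up.

```python
from dataclasses import dataclass

@dataclass
class Ad:
    advertiser: str
    bid: float  # maximum price per click, in dollars (hypothetical)
    ctr: float  # estimated fraction of impressions that get clicked (hypothetical)

def expected_revenue(ad):
    """Expected revenue per impression under pay-per-click."""
    return ad.bid * ad.ctr

def rank_ads(ads):
    """Order ads so the highest expected revenue gets the top slot."""
    return sorted(ads, key=expected_revenue, reverse=True)

ADS = [
    Ad("hotel_a", bid=2.00, ctr=0.01),
    Ad("hotel_b", bid=1.00, ctr=0.05),
    Ad("hotel_c", bid=1.50, ctr=0.02),
]

ranking = [ad.advertiser for ad in rank_ads(ADS)]
print(ranking)
```

Here hotel_a bids the most but is rarely clicked, so it ends up below both cheaper and more relevant ads.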
Example AdWords Placement AdWords Placement Most relevant sites
Fund Your Website: AdSense • Google also delivers ads to other websites • Sign up for Google AdSense, and Google delivers ads to your website (common source of income for “professional” bloggers) • How ads are delivered: • If the website is the best match for the targeted keywords • If users of the website click on results • Strategies for successful ads: • Place the ads on top • Blend with the rest of the website • Ads at the bottom are ignored consistently
Targeting Banner Ads • Context: movie reviews • User profile: NYU user, New York • Request for ad from ad server carries: IP address (country, domain, company), browser, operating system, surfing behavior from cookies, demographic data? • Targeted ad is delivered to user
Closed Loop Marketing (Source: Doubleclick, Inc.) [Figure: cycle diagram with the labels below] • User visits publisher sites • Ads delivered by DART for Advertisers • User clicks and visits advertiser’s site • Boomerang captures user action data • Data analysis • Boomerang compiles and reports response for future targeting • Databank
Future of Search • Information Extraction: Search on Structured Data • Social Search • Privacy Preserving Search
Information Extraction • Information extraction applications extract structured relations from unstructured text • Example system: NYU’s Proteus • Example input: “May 19 1995, Atlanta -- The Centers for Disease Control and Prevention, which is in the front line of the world's response to the deadly Ebola epidemic in Zaire, is finding itself hard pressed to cope with the crisis…” • Example output: Disease Outbreaks in The New York Times
Future of Search • Information Extraction: Search on Structured Data • Social Search • Privacy Preserving Search
Y! Answers • Launched in second half of 2005 • Incentive system based on points and voting for best answers • Questions grouped by category • Some statistics: • over 60 million users • over 120 million answers, available in 18 countries and in 6 languages
Long-term Prospects • Questions follow a power law: • A small share of questions will be asked by many people (20% of questions account for 80% of requests) • We only need one answer for each question • Quickly acquire high-quality answers for 80% of queries • …people will take care, in time, of the “long tail” of the remaining questions
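The 20%/80% claim can be checked against a simple power-law model. The sketch below assumes that the question ranked r in popularity is asked in proportion to 1/r (a Zipf distribution with exponent 1) and that there are 1,000 distinct questions. Both numbers are assumptions chosen for illustration, not data from Y! Answers.

```python
def top_share(num_questions, top_fraction, exponent=1.0):
    """Share of all requests covered by the most popular questions.

    Assumes the question at popularity rank r is asked in
    proportion to 1 / r**exponent.
    """
    weights = [1.0 / rank ** exponent for rank in range(1, num_questions + 1)]
    top_count = int(num_questions * top_fraction)
    return sum(weights[:top_count]) / sum(weights)

share = top_share(num_questions=1000, top_fraction=0.2)
print(f"Top 20% of questions cover {share:.0%} of requests")
```

Under these assumptions the top fifth of questions covers a large majority of requests, close to the 80% on the slide. The exact share depends on the exponent and on how many distinct questions exist.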
Future of Search • Information Extraction: Search on Structured Data • Social Search • Privacy Preserving Search
Next Class: Social Networks • Work on Assignment 2