
Information technology in business and society

Information technology in business and society. Session 9 – Search and Advertising. Sean J. Taylor. Administrativia: Assignment 2 online, due Saturday 2/25 at 1am; Assignment 2 resources; Assignment 3 preview; guest speaker on Tuesday 2/28: Chrys Wu discussing IT and Journalism.



Presentation Transcript


  1. Information technology in business and society Session 9 – Search and Advertising Sean J. Taylor

  2. Administrativia • Assignment 2 online, due Saturday 2/25 at 1am • Assignment 2 resources • Assignment 3 preview • Guest speaker on Tuesday 2/28: Chrys Wu discussing IT and Journalism • Substitute on Thursday 3/1: Professor Dylan Walker

  3. Learning objectives • Learn how search engines rank pages • Learn how to design effectively for high rankings • Learn how online advertising works, especially search ads and keyword auctions • The future of search

  4. Search Engines and Web Directories • Resources on the Web that help you find sites with the information and/or services you want. • Directory search engine - organizes listings of Web sites into hierarchical lists. • Search engine - uses software agent technologies (or “spiders”, or “bots”) to search the Web for key words and place them into indexes.

  5. Web directories • Example • Advantages? • Disadvantages?

  6. Search engine examples • Advantages? • Disadvantages?

  7. Search engines drive ecommerce!

  8. Where is consumers' attention?

  9. Eyetracking study of Google Results

  10. How search engines work • Search engines discover new pages by following links • They keep track of the words that appear in pages; when you enter a query, the search engine returns a ranked list • Text content is important! But it is not enough! (Why?) • How do search engines rank pages? (Why does this matter?)
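The "keep track of words" step can be sketched as a tiny inverted index. This is only an illustration, not how any real search engine is built: the page names and text are made up, and scoring by raw word counts is an assumption chosen to show why text content alone is not enough (a page can simply repeat a word to rise in the list).

```python
from collections import defaultdict

def build_index(pages):
    """Map each word to the pages containing it, with occurrence counts."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word][url] = index[word].get(url, 0) + 1
    return index

def search(index, query):
    """Rank pages by how often the query words appear (naive scoring)."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for url, count in index.get(word, {}).items():
            scores[url] += count
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Hypothetical pages, invented for this example.
pages = {
    "a.example": "cheap london hotels london",
    "b.example": "london weather",
    "c.example": "paris hotels",
}
index = build_index(pages)
results = search(index, "london hotels")
```

Here `a.example` ranks first only because it repeats "london", which is the weakness that link-based ranking tries to address.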

  11. PageRank is really a “Random Surfer” Model • The matrix: [equation not preserved] if page i links to page j • Random Surfer Model, Transfer Matrix: the probability that a surfer follows a link from webpage i to webpage j = [prob. you were not “picked up”] * [prob. of following link i->j] • Let’s count the surfers that pass through each point • What about getting stuck in loops? [term not preserved] takes care of that
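A minimal sketch of the random surfer idea, under stated assumptions: `damping` stands for the probability of not being "picked up", the value 0.85 is a conventional choice that does not come from the slides, a picked-up surfer is dropped on a uniformly random page, and every page is assumed to have at least one outgoing link. The three-page graph is hypothetical and contains a loop (B and C link only to each other).

```python
def pagerank(links, damping=0.85, iterations=100):
    """Iterate the random surfer model on a dict of page -> outgoing links."""
    pages = list(links)
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Surfers who were picked up land on a uniformly random page.
        new_ranks = {page: (1.0 - damping) / n for page in pages}
        # Surfers who were not picked up follow one outgoing link at random.
        for page, outgoing in links.items():
            share = damping * ranks[page] / len(outgoing)
            for target in outgoing:
                new_ranks[target] += share
        ranks = new_ranks
    return ranks

# Hypothetical graph: A links into a B <-> C loop and nothing links back to A.
loop_graph = {"A": ["B"], "B": ["C"], "C": ["B"]}
ranks = pagerank(loop_graph)
```

Without the pick-up step, all rank would drain into the B and C loop and A would end at zero; with it, A keeps the small share (1 - damping) / n.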

  12. Measuring Importance of Linking: PageRank Algorithm • Idea: important pages are pointed to by other important pages • Method: • Each link from one page to another is counted as a “vote” for the destination page • The number of incoming links is important! • But it is not enough! • Each “vote” is different! PageRank places more importance on votes that come from pages with a large number of votes (and so on, and so on) • Compare, for example, the circled page in cases A and B

  13. Computing PageRank (ignoring damping factor for illustration) • Example: four books (A, B, C, D), each with a “People who bought this also bought…” list that links to other books

  14. Computing PageRank (ignoring damping factor for illustration) • Same four-book example, continued

  15. PageRank (ignoring damping factor for illustration) • Every book starts with the same rank: .250, .250, .250, .250

  16. PageRank (ignoring damping factor for illustration) • Each book splits its rank evenly among its outgoing links: .250/3 on each of three links, .250/2 on each of two links, .250 on each single link

  17. PageRank (ignoring damping factor for illustration) • Ranks after the first step: .375, .083, .083, .458

  18. PageRank (ignoring damping factor for illustration) • The new ranks are split among the outgoing links again: .375/3, .083/2, .458, .083

  19. PageRank (ignoring damping factor for illustration) • Ranks after the second step: .500, .125, .125, .250

  20. PageRank (ignoring damping factor for illustration) • After repeated steps the ranks settle at .400, .133, .133, .333, with link shares .400/3, .133/2, .333, .133
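The four-book walkthrough can be reproduced with a short sketch. The link structure below is an assumption reconstructed from the numbers on the slides (one book with three outgoing links, one with two, two with one); in particular, which of books B and D has two links is a guess, since the transcript does not preserve it. As on the slides, the damping factor is ignored.

```python
# Assumed "also bought" links, reconstructed from the slide values.
links = {
    "A": ["B", "C", "D"],
    "B": ["A", "C"],
    "C": ["A"],
    "D": ["C"],
}

def step(ranks, links):
    """One PageRank step: each book splits its rank evenly among its links."""
    new_ranks = {book: 0.0 for book in links}
    for book, outgoing in links.items():
        share = ranks[book] / len(outgoing)
        for target in outgoing:
            new_ranks[target] += share
    return new_ranks

def iterate(links, steps):
    """Start every book at 1/N and apply the given number of steps."""
    ranks = {book: 1.0 / len(links) for book in links}
    for _ in range(steps):
        ranks = step(ranks, links)
    return ranks

after_one = iterate(links, 1)   # A .375, B .083, C .458, D .083
after_two = iterate(links, 2)   # A .500, B .125, C .250, D .125
settled = iterate(links, 100)   # A .400, B .133, C .333, D .133
```

Under this assumed graph the values match the slides step by step, and the settled ranks no longer change when another step is applied.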

  21. Gaming PageRank and Trust • Links from untrusted sources • TrustRank Algorithm • Initial votes come only from trusted pages • Compare, for example, the circled page in cases A and B (figure labels mark the trusted pages)

  22. Simulating Changes in PageRank • Same four-book example (A, B, C, D), with ranks .400, .133, .133, .333
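The "initial votes come only from trusted pages" idea can be sketched as a PageRank variant in which all starting weight, and all weight handed back at each step, goes to a trusted seed set. This is a simplified illustration rather than the exact TrustRank procedure: the graph, the page names, the seed set, and the damping value 0.85 are all assumptions, and trust held by a page with no outgoing links is simply dropped.

```python
def trust_rank(links, trusted, damping=0.85, iterations=50):
    """Propagate trust from a seed set of trusted pages along links."""
    pages = list(links)
    # Only trusted pages receive initial (and recurring) votes.
    seed = {p: (1.0 / len(trusted) if p in trusted else 0.0) for p in pages}
    trust = dict(seed)
    for _ in range(iterations):
        new_trust = {p: (1.0 - damping) * seed[p] for p in pages}
        for page, outgoing in links.items():
            if not outgoing:
                continue  # simplification: trust of dead-end pages is dropped
            share = damping * trust[page] / len(outgoing)
            for target in outgoing:
                new_trust[target] += share
        trust = new_trust
    return trust

# Hypothetical graph: a link farm boosts "spam_target", but no trusted
# page links into the farm.
links = {
    "trusted_site": ["good_page"],
    "good_page": ["trusted_site"],
    "spam1": ["spam_target"],
    "spam2": ["spam_target"],
    "spam_target": ["spam1"],
}
trust = trust_rank(links, trusted={"trusted_site"})
```

In this example the spam pages vote for each other, yet they end with zero trust because none of their votes originate from a trusted page.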

  23. Importance of anchor text • <a href=http://www.sims…> A terrific course on search engines</a> • <a href=http://www.sims…> INFOSYS 141</a> • The anchor text summarizes what the website is about.

  24. Other ranking factors • Location, Location, Location...and Frequency • Query words in title, or in first few sentences • The more frequent the query words, the better • Click-through measurement • How often users click on your URL when they see it • How long they stay (using toolbars!)
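A minimal sketch of how a crawler might collect anchor text, using Python's standard `html.parser` module. The sample HTML and the `http://example.com/course` URL are placeholders invented for this example (the URL on the slide is truncated), and real crawlers handle far messier markup.

```python
from html.parser import HTMLParser

class AnchorTextCollector(HTMLParser):
    """Collect (href, anchor text) pairs from <a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an <a> with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

def collect_anchor_text(html):
    parser = AnchorTextCollector()
    parser.feed(html)
    parser.close()
    return parser.links

# Placeholder markup modeled on the slide's two example links.
sample = (
    '<p>See <a href="http://example.com/course">A terrific course on search '
    'engines</a> and <a href="http://example.com/course">INFOSYS 141</a>.</p>'
)
anchors = collect_anchor_text(sample)
```

Both anchor texts describe the same target page, so a search engine could associate the phrases "search engines" and "INFOSYS 141" with that page even if the page itself never uses them.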

  25. Outline • Learn how search engines rank pages • Learn how to design effectively for high rankings • Learn how online advertising works, especially search ads and keyword auctions • The future of search

  26. Achieving Higher Results Rankings • Position your keywords (title, headings, early on page) • Make text visible (no tiny fonts, no white-on-white) • Frames can kill • Have relevant content • Do not change topics • Just say no to search engine spamming • Submit your key pages • Verify your listing often

  27. Manipulating Rankings • Motives • Commercial, political, religious, lobbies • Promotion funded by advertising budget • Operators • Contractors (Search Engine Optimizers) for lobbies, companies • Web masters • Hosting services • What are the techniques used by rankings manipulators?

  28. Manipulation technologies • Cloaking • Serve fake content to search engine robot (diagram: “Is this a Search Engine spider?” Y: Fake Doc, N: SPAM) • DNS cloaking: Switch IP address. Impersonate • Doorway pages • Pages optimized for a single keyword that re-direct to the real target page • Keyword Spam • Misleading meta-keywords, excessive repetition of a term, fake “anchor text” • Hidden text with colors, CSS tricks, etc. • Example: Meta-Keywords = “… London hotels, hotel, holiday inn, hilton, discount, booking, reservation, sex, mp3, britney spears, viagra, …” • Link spamming • Mutual admiration societies, hidden links, awards • Domain flooding: numerous domains that point or re-direct to a target page • Robots • Fake click stream • Fake query stream • Risky to use any of these, as search engines are getting better at detecting and punishing them

  29. Outline • Learn how search engines rank pages • Learn how to design effectively for high rankings • Learn how online advertising works, especially search ads and keyword auctions • The future of search

  30. Paid Ranking • Promoting without Manipulation: Paid placement • Keyword bidding for targeted ads • Pay-per-click • Higher bids result in higher ranks for the ad • A higher percentage of clicks on the ad increases the rank as well (why?) • Google's AdWords is the biggest player • Google’s 2007 revenue was more than $16 billion, 2008 ~ $22 billion, mostly from such ads
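One way to think about the "(why?)": under pay-per-click the search engine earns money only when an ad is clicked, so the expected revenue from showing an ad is roughly its bid times its click-through rate. The sketch below ranks ads by that product. It is a deliberately simplified model, not the actual AdWords formula; the advertisers, bids, and click-through rates are invented for illustration.

```python
def rank_ads(ads):
    """Order ads by expected revenue per impression: bid * click-through rate."""
    return sorted(ads, key=lambda ad: ad["bid"] * ad["ctr"], reverse=True)

# Hypothetical advertisers: bid is dollars per click, ctr is clicks per impression.
ads = [
    {"name": "high_bid_low_clicks", "bid": 2.00, "ctr": 0.01},
    {"name": "low_bid_high_clicks", "bid": 1.00, "ctr": 0.05},
    {"name": "middle", "bid": 1.50, "ctr": 0.02},
]
ranking = [ad["name"] for ad in rank_ads(ads)]
```

In this example the lowest bidder takes the top slot because users click its ad five times as often as the highest bidder's.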

  31. Example AdWords Placement • AdWords Placement • Most relevant sites

  32. Fund Your Website: AdSense • Google also delivers ads to other websites • Sign up for Google AdSense, and Google delivers ads to your website (common source of income for “professional” bloggers) • How ads are delivered: • If the website is best for the targeted keywords • If users of the website click on results • Strategies for successful ads: • Place the ads on top • Blend with the rest of the website • Ads at the bottom are consistently ignored

  33. Example: Washington Post Website

  34. Analysis of Washington Post Website

  35. Targeting Banner Ads • Request for Ad from Ad Server • IP Address • Country, Domain, Company • Browser, Operating System • Surfing Behavior from cookies • Demographic Data? • Context: Movie reviews • User Profile: NYU user, New York • Targeted Ad is Delivered to User

  36. Closed Loop Marketing • User Visits Publisher Sites • Ads Delivered By DART For Advertisers • User Clicks & Visits Advertiser’s Site • Boomerang Captures User Action Data • Data Analysis • Boomerang Compiles & Reports Response For Future Targeting • Databank • Source: DoubleClick, Inc.

  37. Future of Search • Information Extraction: Search on Structured Data • Social Search • Privacy-Preserving Search

  38. Information Extraction • Information extraction applications extract structured relations from unstructured text • Example text (Disease Outbreaks in The New York Times): “May 19 1995, Atlanta -- The Centers for Disease Control and Prevention, which is in the front line of the world's response to the deadly Ebola epidemic in Zaire, is finding itself hard pressed to cope with the crisis…” • Information Extraction System (e.g., NYU’s Proteus)

  39. Return structured answers, not Webpages

  40. Future of Search • Information Extraction: Search on Structured Data • Social Search • Privacy-Preserving Search

  41. Y! Answers • Launched in second half of 2005 • Incentive system based on points and voting for best answers • Questions grouped by category • Some statistics: • over 60 million users • over 120 million answers, available in 18 countries and in 6 languages

  42. Y! Answers

  43. Y! Answers

  44. Long-term Prospects • Questions follow a power law: • A large number of questions will be asked by many people (20% of questions account for 80% of requests) • We only need one answer for each question • Quickly acquire high-quality answers for 80% of queries • …over time, people will take care of the “long tail” of the remaining questions

  45. Future of Search • Information Extraction: Search on Structured Data • Social Search • Privacy-Preserving Search

  46. Privacy-Preserving Search

  47. Next Class: Social Networks • Work on Assignment 2
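The 20%/80% claim can be checked roughly with a small calculation. The sketch below assumes question popularity follows a Zipf-style power law in which the question at rank r is asked in proportion to 1/r; both that exponent and the catalog size of 1,000 questions are assumptions made for illustration, not figures from the slides.

```python
def top_share(num_questions, top_fraction, exponent=1.0):
    """Share of all requests covered by the most popular questions."""
    # Assumed popularity: the question at rank r gets weight 1 / r**exponent.
    weights = [1.0 / rank ** exponent for rank in range(1, num_questions + 1)]
    top_n = int(num_questions * top_fraction)
    return sum(weights[:top_n]) / sum(weights)

# With 1,000 distinct questions, the top 20% cover about 0.79 of requests.
share = top_share(1000, 0.20)
```

Under these assumptions, answering the most popular fifth of questions well already serves close to four out of five requests, which is the point of the slide.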
