
Presentation Transcript


  1. Search and the New Economy: Wisdom of the Crowds. Prof. Panos Ipeirotis

  2. Summary from last session • We can quantify unstructured, qualitative data. We need: • A context in which content is influential and not redundant (experiential content for instance) • A measurable economic variable: price (premium), demand, cost, customer satisfaction, process cycle time • Methods for structuring unstructured content • Methods for aggregating the variables in a business context-aware manner

  3. Summary from last session • What are good properties of opinion mining systems? • Structuring: Opinions are expressed in many ways; summarize them uniformly • Independent summaries: Not all scenarios have associated economic outcomes, or the outcomes are difficult to measure (e.g., discussion about a product pre-announcement) • Personalization: The weight of each person’s opinion varies (interesting future direction!) • Data collection: Evaluations are rarely in one place (the basic value proposition of Buzzmetrics)

  4. Summary from last session • Review/reputation systems gather the opinion of many users • Review/reputation systems exhibit biases • Prediction markets aggregate information from traders • Prediction markets appear to be (?) robust to biases • Today’s question: Can we harness any wisdom from the crowd?

  5. Madness of the Crowds • In the 19th century, the prevailing view was • “The madness of the crowds” (Mackay, 1841) • “A sensible individual, as a member of a crowd, becomes a blockhead” (Baruch) • “Madness is the exception in individuals and the rule in groups” (Nietzsche) • “Crowds deliver verdicts with which every individual disagrees” (Le Bon) Were they wrong?

  6. Case Study 1: InnoCentive

  7. InnoCentive • Started in 2001 by a VP of Eli Lilly • A Craigslist for solutions to scientific problems • Seekers post a problem and a bounty • Solvers try to find a solution • Double-blind process, identities hidden • More than 1/3 of problems solved • 57% “reduction to practice” • 43% pencil-and-paper • Problems unsolvable by the labs of P&G, etc.

  8. The reach of InnoCentive (map: countries with registered solvers) • Total Solvers: > 70,000 • Total Seekers: 34 • Scientific Disciplines: 40 • Challenges Posted: > 200 • Challenges Solved: > 58

  9. InnoCentive reaches talent pools • “Wherever you work, most of the smart people are somewhere else.” --- Bill Joy • Traditional pools (traditional networks): • US, EU academics • Contract labs (FTE) • Individual networks • Opportunity pools (nontraditional pools of intellectual capacity): • Global academia • Researchers in Russia, India, China • Scientists in other industries • Excess capacity • Retirees

  10. Who are the solvers? • INNOCENTIVE 216128 (Protein crosslinks) • INNOCENTIVE 3109 (R4-(4-Hydroxyphenyl) Butanoic Acid) • INNOCENTIVE 96229 (Regio-Stereocontrolled Tricyclic Alcohols) • INNOCENTIVE 258382 (Paracrystalline Arrays) • INNOCENTIVE 55195 (Substituted isoquinoline) • Solvers included the head of an Indian research institute, the retired head of Hoechst R&D, an N. Ireland CRO working with a US professor, a solver from outside the discipline, and a Russian scientist • Note: In other InnoCentive-like companies, solutions often come not only from individuals but from companies in other industries (the case of unused patents)

  11. InnoCentive Proposition • Seekers • Project Roadblock • Revive Stalled Projects or Close “Dead” Projects • Culture – work outside the walls • Solvers • Intellectual challenge • Business development opportunities • Financial reward • Recognition • No direct involvement of a “crowd”: Single solver

  12. Why does InnoCentive’s approach work? • Reason for success? • Motivation for participants? • Ways to improve?

  13. Case Study 2: Procter & Gamble

  14. P&G’s Collaborative Approach The “P&G Advisors” program allows consumers to try new products and offer suggestions and feedback that P&G uses to refine its products and shape national marketing plans. Before, P&G would spend $25,000 to field a new product concept test that took two months to complete. Now, by engaging customers directly, the company spends $2,500 and gets results in about two weeks.

  15. P&G spins profits with the SpinBrush • Developed in 1998 by a startup (Dr. John’s Products) • Originated as the “Spin Pop”, a lollipop with a battery-operated handle that twirled the candy in the eater’s mouth • A price point of $5 was a breakthrough in making this a mass-market product • In January 2001, Osher sold the idea to P&G • P&G retained Osher and his team for a year to oversee the transition and gave them considerable leeway in bending P&G’s corporate rules while marketing the product • The Crest SpinBrush is the best-selling toothbrush in the US and generates over $300 million in annual revenues for P&G

  16. Why did P&G’s approach work? • Reason for success? • Motivation for participants? • Ways to improve?

  17. Case Study 3: Wikipedia WHAT IS IT? WHO’S USING IT? IS IT TRUE THAT ANYONE CAN EDIT IT? WHAT SHOULD I SAY WHEN I GET CONTENT FROM THERE? IS IT GOOD? IS IT BAD? IS IT HERE TO STAY? HOW DOES IT COMPARE TO OTHER ENCYCLOPEDIAS?

  18. The “Wiki” in Wikipedia • A shortened form of wiki wiki (“weekie weekie”), from the Hawaiian language, where it is commonly used to denote something “quick” or “fast” • A wiki is a collaborative website that can be directly edited by anyone with access to it

  19. Wikipedia History • Formally began on January 15, 2001 as a complement to the Nupedia project

  20. Wikipedia in 2007 • Wikipedia continues to grow, with some 5 million registered editor accounts; the combined Wikipedias in all languages together contain 1.74 billion words in 7.5 million articles in approximately 250 languages; the English Wikipedia gains a steady 1,700 articles a day, with the wikipedia.org domain name ranked at around the 10th busiest on the Internet …

  21. Nature (2005) • Compared Wikipedia with Britannica Online • 42 science entries blindly reviewed by experts • Results: Britannica averaged 3 errors per entry, Wikipedia 4 • Nature’s response: all entries were blinded • Britannica’s response: the study had numerous errors

  22. Nature (2005) • Economist (4-6-06): the study compares apples and oranges • Britannica articles are shorter; omissions were counted against it • Authorities not favored, even viewed with suspicion • Response: Do we really need experts for most entries in a general reference source?

  23. Nature (2005) • Entries for pop cultural figures vs. those for great literary figures, scientists, etc. • Entry for Britney Spears longer than entry for St. Augustine • Seinfeld longer than Shakespeare; Barbie longer than Bellow • Further drawback of the Nature study: No comparisons of style

  24. Why does Wikipedia work? • Reason for success? • Motivation for participants?

  25. Case Study 4: Collective Tagging

  26. Flickr • Online photo albums • People describe photos using tags • Tag search

  27. How to label ALL images on the Web? • The slides that follow demonstrate simple principles: • motivate your users • other people’s procrastination can be your productivity

  28. Labeling Images with Words: STILL AN OPEN PROBLEM (example image labels on the slide: MARTHA STEWART, FLOWERS, SUPER EVIL)

  29. Desiderata: A METHOD THAT CAN LABEL ALL IMAGES ON THE WEB, FAST AND CHEAP

  30. Using Humans Cleverly: THE ESP GAME COULD LABEL ALL IMAGES ON THE WEB IN 30 DAYS!

  31. The ESP Game • TWO-PLAYER ONLINE GAME • PARTNERS DON’T KNOW EACH OTHER AND CAN’T COMMUNICATE • THE ONLY THING THEY HAVE IN COMMON IS AN IMAGE • OBJECT OF THE GAME: TYPE THE SAME WORD

  32. The ESP Game (gameplay) • PLAYER 1 and PLAYER 2 type guesses for the same image: CAR, BOY, CAR, HAT, KID • SUCCESS! YOU AGREE ON CAR
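
To make the agreement rule concrete, here is a minimal sketch (in Python) of how a game server could decide that the two players have matched on a word. The function name and data layout are illustrative assumptions, not the actual ESP Game implementation.

    def find_agreement(guesses_p1, guesses_p2):
        """Return (time, word) for the first word both players typed, or None."""
        # Merge both guess streams, tagged by player, and replay them in arrival order.
        stream = [(t, w.lower(), 1) for t, w in guesses_p1] + \
                 [(t, w.lower(), 2) for t, w in guesses_p2]
        seen = {1: set(), 2: set()}
        for t, word, player in sorted(stream):
            other = 2 if player == 1 else 1
            if word in seen[other]:          # the partner already typed it: agreement
                return t, word
            seen[player].add(word)
        return None

    # Example mirroring the slide: both players eventually type "car".
    player1 = [(8, "boy"), (12, "car"), (15, "hat")]
    player2 = [(10, "kid"), (23, "car")]
    print(find_agreement(player1, player2))   # -> (23, 'car')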

  33. © 2004 Carnegie Mellon University, all rights reserved. Patent Pending.

  34. The ESP Game is FUN • 4.1 MILLION LABELS WITH 23,000 PLAYERS • MANY PEOPLE PLAY OVER 20 HOURS A WEEK • 5,000 PEOPLE PLAYING SIMULTANEOUSLY CAN LABEL ALL IMAGES ON GOOGLE IN 30 DAYS! • INDIVIDUAL GAMES ON YAHOO! AND MSN AVERAGE OVER 5,000 PLAYERS AT A TIME

  35. The ESP Game in Single-Player Mode • WHEN 2 PEOPLE PLAY, WE RECORD EVERY ACTION WITH TIMING INFORMATION • WE EMULATE A PARTNER BY PLAYING PRE-RECORDED MOVES: (0:08) BOY, (0:12) CAR, (0:15) HAT, (0:21) KID, (0:23) CAR • A SINGLE PERSON CAN PLAY WITH PRE-RECORDED ACTIONS AS THEIR PARTNER • NOTICE THAT THIS DOESN’T STOP THE LABELING PROCESS!
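
A hypothetical sketch of the replay idea: the recorded partner’s moves become available as their time offsets pass, and the live player’s guesses are checked against them (the live player’s new guesses can still be stored as fresh label candidates, which is why labeling doesn’t stop). Names and structures are illustrative only.

    # Pre-recorded partner moves from the slide, as (seconds_into_game, word).
    recorded_partner = [(8, "boy"), (12, "car"), (15, "hat"), (21, "kid"), (23, "car")]

    def replay_match(live_guesses, recorded=recorded_partner):
        """live_guesses: list of (seconds_into_game, word). Return first agreement or None."""
        for t_live, word in sorted(live_guesses):
            # Only moves the recorded partner had already "typed" by this time count.
            partner_words = {w for t_rec, w in recorded if t_rec <= t_live}
            if word.lower() in partner_words:
                return t_live, word.lower()
        return None

    # The live player types "dog" at 0:10 and "car" at 0:14; the recorded partner
    # typed "car" at 0:12, so the single-player game agrees on "car".
    print(replay_match([(10, "dog"), (14, "car")]))   # -> (14, 'car')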

  36. What about Cheating? • Speed Detection: IF A PAIR PLAYS TOO FAST, WE DON’T RECORD THE WORDS THEY AGREE ON

  37. What about Cheating? • Qualification Test: WE GIVE PLAYERS TEST IMAGES FOR WHICH WE KNOW ALL THE COMMON LABELS; WE ONLY STORE A PLAYER’S GUESSES IF THEY SUCCESSFULLY LABEL THE TEST IMAGES
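
The two checks on slides 36 and 37 can be sketched as simple filters. The threshold value and the data shapes below are assumptions for illustration, not documented parameters of the real game.

    MIN_SECONDS_TO_AGREE = 3      # assumed threshold; the slide doesn't give a number

    def keep_agreement(seconds_to_agree):
        """Speed detection: don't record words a pair agreed on suspiciously fast."""
        return seconds_to_agree >= MIN_SECONDS_TO_AGREE

    def passes_qualification(player_guesses, test_images):
        """test_images maps image_id -> set of known common labels.
        Store a player's guesses only if they labeled every test image correctly."""
        return all(
            player_guesses.get(image_id, set()) & known_labels
            for image_id, known_labels in test_images.items()
        )

    # Example: agreeing after 1.5 s is rejected; matching a known label on each
    # test image passes the qualification test.
    tests = {"img_1": {"beach", "sea"}, "img_2": {"car"}}
    guesses = {"img_1": {"sea", "sand"}, "img_2": {"car", "red"}}
    print(keep_agreement(1.5), passes_qualification(guesses, tests))   # -> False True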

  38. Sample Labels: BEACH, CHAIRS, SEA, PEOPLE, MAN, WOMAN, PLANT, OCEAN, TALKING, WATER, PORCH

  39. Sample Labels: SADDAM, MR. WILSON, MAN, FACE, MOUSTACHE; BUSH, PRESIDENT, DUMB, YUCK YUCK • COMING SOON: MEET YOUR SOUL MATE THROUGH THE GAME!

  40. Why does the ESP Game work? • Reason for success? • Motivation for participants?

  41. Case Study 5: Amazon Mechanical Turk • Some tasks can’t currently be done by computers, or humans do them much better • At Amazon, such a task is called a Human Intelligence Task (HIT); HITs are taken by “Turkers” • Also called “artificial artificial intelligence” • Examples of HITs: adding keywords to images, cropping images, distributed telemarketing, spam identification, subtitling and speech-to-text, adult content analysis, facial recognition, proofreading, OCR correction/verification, document labelling

  42. MTurk – HITs • “Turkers” can take Human Intelligence Tasks from the Amazon website • Paid whatever the HIT is worth, plus a potential bonus • Can be required to “qualify” for the HIT • Results submission can be a file upload, multiple choice, or freeform text • Demo HIT at mturk.com

  43. MTurk – Creating HITs • Define the Question/Task and submission method • Question/Answer Schema • Define number of ‘assignments’ • Qualifications for the HIT • Value of the HIT
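
As a concrete illustration of these steps, here is a hedged sketch using the present-day boto3 MTurk client (this API postdates the deck; the title, reward, HTML question, and qualification threshold are made-up example values, and you would need AWS requester credentials configured).

    import boto3

    # Use the requester sandbox so experiments don't spend real money.
    mturk = boto3.client(
        "mturk",
        region_name="us-east-1",
        endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
    )

    # Question/Answer schema: a minimal HTMLQuestion with one freeform field.
    question_xml = """<HTMLQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2011-11-11/HTMLQuestion.xsd">
      <HTMLContent><![CDATA[
        <html><body>
          <script src="https://assets.crowd.aws/crowd-html-elements.js"></script>
          <p>List a few keywords describing the image shown to you.</p>
          <crowd-form><crowd-input name="keywords" required></crowd-input></crowd-form>
        </body></html>
      ]]></HTMLContent>
      <FrameHeight>400</FrameHeight>
    </HTMLQuestion>"""

    response = mturk.create_hit(
        Title="Add keywords to an image",           # the Question/Task
        Description="Suggest a few keywords describing one image",
        Reward="0.02",                              # value of the HIT, in USD
        MaxAssignments=3,                           # number of 'assignments' (distinct workers)
        AssignmentDurationInSeconds=300,
        LifetimeInSeconds=24 * 3600,
        Question=question_xml,                      # submission method + answer schema
        QualificationRequirements=[{                # e.g. require a 95% HIT approval rate
            # System approval-rate qualification ID (see the MTurk docs for system IDs).
            "QualificationTypeId": "000000000000000000L0",
            "Comparator": "GreaterThanOrEqualTo",
            "IntegerValues": [95],
        }],
    )
    print(response["HIT"]["HITId"])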

  44. MTurk – Quality Control • How do you ensure the work delivered by Turkers is of good quality? • Accepting only “correct” answers • Manually • Automatically? • Reputation system for Turkers • Qualification Tests
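
One common way to answer the “Automatically?” bullet (my assumption here, not something the slide spells out) is to request several assignments per HIT and accept the answer a majority of workers agree on; a minimal sketch:

    from collections import Counter

    def majority_answer(assignments, min_agreement=2):
        """assignments: list of (worker_id, answer). Return (answer, agreeing_workers)
        if at least min_agreement workers gave it, else (None, [])."""
        counts = Counter(answer for _, answer in assignments)
        answer, votes = counts.most_common(1)[0]
        if votes < min_agreement:
            return None, []
        return answer, [w for w, a in assignments if a == answer]

    # Example: three workers label the same image; two agree, so their answer is accepted.
    work = [("W1", "yes"), ("W2", "yes"), ("W3", "no")]
    print(majority_answer(work))   # -> ('yes', ['W1', 'W2'])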

  45. MTurk - Qualifications • Account Based Qualifications • HIT Abandonment Rate (%) • HIT Approval Rate (%) • HIT Rejection Rate (%) • … more combinations of the above • Location • Create Your own

  46. Why does MTurk work? • Reason for success? • Motivation for participants?

  47. Case Study 6: Digg • Kevin Rose, founder of Digg.com • $60 million in 18 months • A new model for a newspaper • Readers are also contributors • Readers dig up interesting stories from all over the web and post brief synopses • Other readers vote on them; the most popular ascend the page • The community has a fairly homogeneous demographic: 80% male, mainly young techie readers

  48. Execution • The site harnesses the competitive instincts of readers/contributors, who compete to see whose story will lead • The site is dynamic: leading stories change by the minute or hour
