
Web Behavior Analysis


ismael

Presentation Transcript


  1. Web Behavior Analysis

  2. Your Last Words? (in the 22nd century) • To family • To your best friend?

  3. Web Behavior Analysis • Why important? • Why scary?

  4. Part I: Why Important?

  5. We rely more and more on search for our real-life decisions • Opportunities for business • Concerns for privacy • Users relying more on search • Q. In the past six months, have you used a search engine to help inform your decisions for the following tasks? • 66% of people are using search more frequently to make decisions • Users need help with tasks and making decisions

  6. Focus on new territory • What should be done? • Decision sessions are lengthy • Figure: length of sessions by type • Complex task and decision sessions could be easier

  7. Taxonomy of Web queries • Navigational (we are good at this) • to reach a particular site • E.g., Searching for top page of company • Informational • to acquire pages that provide knowledge for user’s information need • Conventional ad hoc retrieval • Transactional • to perform a Web-mediated activity • E.g., online shopping

  8. Example: Good and Bad • Navigational queries • Pseudo-navigational queries

  9. Example of “Hard Queries”: Informational/Transactional • Car GPS around $300 • Four-day trip to Bhutan from Delhi to visit important Buddhist places

  10. Party Site Game Consoles

  11. What we want?

  12. Current research directions • How to classify queries? • Then what? • Search engines trying to reduce clicks for “hard queries” • Extracting info from forum

  13. Importance of query classification: “obama” • Informational: people may search to learn more about Barack Obama • Navigational: visit his official website • Transactional: perhaps the user's goal is to donate money online to support Mr. Obama’s campaign

  14. Yahoo numbers • ~25% informational → content text? • ~40% navigational → anchor text? • ~35% transactional → site template?

  15. Can you tell if query is “navigational” or not?

  16. Lee et al. [WWW05]: Overview • Analyzing how the query term is used in anchor texts • Q = “search”: anchors lead to different pages (description in Wikipedia, search engine); destinations are diverse → Informational • Q = “WWW2008”: anchors lead to the top page of WWW2008; destinations are identical → Navigational

  17. Anchor-link distribution (ALD) • Probability that the page linked by anchor text t is d • t = search: ALD is uniform (Google, Yahoo!, Wikipedia) → Informational • t = WWW2008: ALD is skewed (top page of WWW2008) → Navigational
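The anchor-link distribution on this slide can be illustrated with a minimal sketch. It assumes the link graph is available as (anchor text, destination) pairs; the function name `anchor_link_distribution` and the toy link data are hypothetical and are not taken from Lee et al.

```python
from collections import Counter


def anchor_link_distribution(links, term):
    """Return {destination: P(d | t)} for anchor text `term`.

    `links` is an iterable of (anchor_text, destination) pairs.
    Only anchors that match `term` exactly are counted.
    """
    counts = Counter(dest for anchor, dest in links if anchor == term)
    total = sum(counts.values())
    if total == 0:
        return {}  # term never appears as anchor text, so the ALD is undefined
    return {dest: n / total for dest, n in counts.items()}


# Hypothetical toy link data, made up for illustration only.
links = [
    ("search", "google.com"),
    ("search", "yahoo.com"),
    ("search", "en.wikipedia.org/wiki/Search"),
    ("WWW2008", "www2008.org"),
    ("WWW2008", "www2008.org"),
    ("WWW2008", "www2008.org"),
]

ald_search = anchor_link_distribution(links, "search")   # spread over 3 pages
ald_www = anchor_link_distribution(links, "WWW2008")     # all mass on 1 page
```

In this toy data the distribution for "search" is spread evenly over three destinations, while the one for "WWW2008" puts all probability on a single page, which matches the uniform versus skewed contrast on the slide.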

  18. Lee et al.: Problem • Targets only anchor texts that are exactly the same as the query • If no anchor text identical to the query exists on the Web, ALD cannot be computed • Problematic queries • Long phrases • E.g., “information retrieval system research” • Multiple keywords • E.g., “trec, nist, test collection”

  19. Multi-query solution • Query Q = “trec, test collection” • Terms T = {trec, test, collection} • Destinations D = {d1, d2, …} • Compute ALD on a term-by-term basis (t = trec, t = test, t = collection) and integrate them

  20. Computation of classification score • Entropy of D • Entropy of a single term t • Weighted average
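The formulas for this slide did not survive in the transcript, so the following is only a sketch of one plausible reading: compute the Shannon entropy of each term's destination distribution, then take a weighted average over the query terms. The weighting used here (the number of anchors that match the term) is an assumption, as are the names `term_entropy` and `query_score` and the toy link data.

```python
import math
from collections import Counter


def term_entropy(dest_counts):
    """Shannon entropy (in bits) of the destination distribution of one term."""
    total = sum(dest_counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for n in dest_counts.values():
        p = n / total
        h -= p * math.log2(p)
    return h


def query_score(links, terms):
    """Weighted average of per-term entropies.

    Assumption: each term is weighted by how many anchors match it.
    Returns None when no term occurs as anchor text.
    """
    weighted, weight_sum = 0.0, 0
    for t in terms:
        counts = Counter(dest for anchor, dest in links if anchor == t)
        w = sum(counts.values())
        weighted += w * term_entropy(counts)
        weight_sum += w
    return weighted / weight_sum if weight_sum else None


# Hypothetical toy link data, made up for illustration only.
links = [
    ("trec", "trec.nist.gov"), ("trec", "trec.nist.gov"), ("trec", "trec.nist.gov"),
    ("test", "a.example"), ("test", "b.example"),
    ("collection", "c.example"), ("collection", "d.example"),
]

score = query_score(links, ["trec", "test", "collection"])
```

Under this reading, a low score means the terms point to few destinations (navigational) and a high score means destinations are diverse (informational). Any threshold between the two would have to be tuned on labeled queries.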

  21. Now what? • For “WWII” • Google: http://www.google.com/search?q=WWII&hl=en&tbo=1&output=search&tbs=ww:1 • Microsoft: http://www.bing.com/reference/semhtml/World_War_II?fwd=1&qpvt=wwii&src=abop&q=wwii • Wolfram: http://www.wolframalpha.com/input/?i=wwII • Can you tell informational vs. transactional?

  22. Challenges/Opportunities • Slightly subtle/interleaved • But huge advertisement revenue (yet to be explored)! • Classic query log + clicks on the surface web are not enough • Any ideas?

  23. Eye movement? Brain signal? More signals?

  24. More corpus? (social corpus for polls? expert advice?)

  25. More signal

  26. CS: Client Simple • First representation: • Trajectory length • Horizontal range • Vertical range
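The three Client Simple features can be sketched as follows, assuming the mouse trajectory is recorded as a list of (x, y) screen coordinates. The function name `client_simple_features` and the sample trajectory are hypothetical.

```python
import math


def client_simple_features(points):
    """Compute trajectory length, horizontal range, and vertical range.

    `points` is a list of (x, y) cursor positions in recording order.
    """
    if not points:
        return {"trajectory_length": 0.0, "horizontal_range": 0, "vertical_range": 0}
    # Sum of straight-line distances between consecutive positions.
    length = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return {
        "trajectory_length": length,
        "horizontal_range": max(xs) - min(xs),
        "vertical_range": max(ys) - min(ys),
    }


# Hypothetical trajectory: move diagonally, then straight down.
features = client_simple_features([(0, 0), (3, 4), (3, 10)])
```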

  27. CF: Client Full • Second representation: • 5 segments: initial, early, middle, late, and end • Each segment: speed, acceleration, rotation, slope, etc.
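As a rough illustration of the segment-based representation, the sketch below computes only one of the listed features, average speed, for each of the five segments. The slide does not say how the segments are delimited; splitting the samples into five roughly equal runs is an assumption, and `segment_speeds` is a hypothetical name.

```python
import math

SEGMENT_NAMES = ["initial", "early", "middle", "late", "end"]


def segment_speeds(samples):
    """Average cursor speed per segment.

    `samples` is a time-ordered list of (t, x, y) tuples with at least
    six entries. Assumption: segments are equal-sized runs of samples.
    """
    n = len(samples)
    if n < 6:
        raise ValueError("need at least 6 samples to form 5 segments")
    # Adjacent segments share a boundary sample so no movement is dropped.
    bounds = [i * (n - 1) // 5 for i in range(6)]
    speeds = {}
    for name, lo, hi in zip(SEGMENT_NAMES, bounds, bounds[1:]):
        seg = samples[lo:hi + 1]
        dist = sum(math.dist(a[1:], b[1:]) for a, b in zip(seg, seg[1:]))
        elapsed = seg[-1][0] - seg[0][0]
        speeds[name] = dist / elapsed if elapsed > 0 else 0.0
    return speeds


# Hypothetical trajectory: constant motion of 10 pixels per time unit.
speeds = segment_speeds([(t, 10 * t, 0) for t in range(11)])
```

Acceleration, rotation, and slope could be derived per segment in the same way from consecutive samples.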

  28. Navigational query: “facebook”

  29. Informational query: “spanish wine”

  30. Transactional query: “integrator”

  31. More corpus • cQA successful, as “additional corpus”, not as “additional means” • Challenges?

  32. cQA (Yahoo Answers)

  33. How Yahoo Answers works

  34. Good questions draw good answers

  35. Good Q/A? -- Text • Check also: http://www.addedbytes.com/code/readability-score/

  36. Good Q/A? -- Clicks

  37. Good Q/As? -- Community

  38. Why scary?

  39. Useful beyond imagination • Spell checker: SIGMOD → Did you mean “sigmoid”? • Entity relation: SIGMOD ~ SIGIR • Translation: SIGMOD, 씨그모드 → sigmod.com • Query suggestion: 영일대 → 호텔 영일대 (Yeongildae → Hotel Yeongildae) • Rank learning: the top-10 entries are visited all the time, so what should we do? • Cause of a migraine?

  40. Companies need YOUR HELP • AOL released logs • Guess what happened?

  41. More scientific observations (Yahoo Research) • X = {query1, query2, query3} • Y = age, gender, area • X → Y (how likely?) → Validate with ground-truth info (Yahoo account)

  42. See if you can do it? • You observe yourself: http://aolpsycho.com/user/5826-kallemeyn
