
Machine Learning in realtime



Presentation Transcript


  1. Machine Learning in realtime Hilary Mason h@bit.ly @hmason

  2. What’s a bitly?

  3. http://www.pcworld.com/article/223409/move_over_dr_soong_girls_can_build_android_apps_too.html http://bit.ly/hOnbWg

  4. Email IM Twitter mobile Facebook Google+ bitly!

  5. Our challenge: What’s happening on the internet in realtime.

  6. Our analysis: analytics, product, science

  7. Our challenge: What’s happening on the internet in realtime.

  8. [zoom in]

  9. What spoken languages are in a page?

  10. raw data "es" "en-us,en;q=0.5" "pt-BR,pt;q=0.8,en-US;q=0.6,en;q=0.4" "en-gb,en;q=0.5" "en-US,en;q=0.5" "es-es,es;q=0.8,en-us;q=0.5,en;q=0.3" "de, en-gb;q=0.9, en;q=0.8"
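The raw data above is HTTP Accept-Language headers observed on clicks. As a hypothetical sketch (the function name and the reduction to a single "base" language are illustrative choices, not from the slides), one way to turn each header into a single language vote before counting:

from collections import Counter

def primary_language(accept_language):
    """Return the base tag of the highest-priority language in an Accept-Language header."""
    best_tag, best_q = None, -1.0
    for part in accept_language.split(","):
        fields = part.strip().split(";")
        tag = fields[0].strip().lower()
        q = 1.0  # default quality when no q= parameter is given
        for field in fields[1:]:
            field = field.strip()
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0
        if tag and q > best_q:
            best_tag, best_q = tag, q
    # keep only the base language ("en-us" -> "en")
    return best_tag.split("-")[0] if best_tag else None

headers = ["es", "en-us,en;q=0.5", "pt-BR,pt;q=0.8,en-US;q=0.6,en;q=0.4", "de, en-gb;q=0.9, en;q=0.8"]
print(Counter(primary_language(h) for h in headers))
# Counter({'es': 1, 'en': 1, 'pt': 1, 'de': 1})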

  11. entropy calculation

import numpy as np

def ghash2lang(g, R, min_count=3, max_entropy=0.2):
    """Returns the majority-vote language for a given hash.
    g is a Redis sorted-set key (members = languages, scores = counts);
    R is the Redis client."""
    lang = R.zrevrange(g, 0, 0)[0]
    # let's calculate the entropy!
    # possible languages
    x = R.zrange(g, 0, -1)
    # distribution over those languages
    p = np.array([R.zscore(g, langi) for langi in x])
    p /= p.sum()
    # info content
    I = [pi * np.log(pi) for pi in p]
    # entropy: the smaller, the more certain we are - i.e. the lower our surprise
    H = -sum(I) / len(I)  # in nats!
    # note that this will give a perfect zero for a single count in one language
    # or for 5K counts in one language, so we also need the count
    count = R.zscore(g, lang)
    # accept the vote only with enough observations and low entropy
    if count >= min_count and H <= max_entropy:
        return lang, count
    else:
        return None, 1
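A hypothetical usage sketch, assuming each click's detected language is accumulated in a Redis sorted set keyed per link hash (member = language code, score = count). The key name and counts here are made up, and the zincrby argument order follows redis-py 3.x:

import redis

# decode_responses=True so sorted-set members come back as str, not bytes
R = redis.StrictRedis(decode_responses=True)

R.delete("lang:hOnbWg")             # hypothetical key, reset for the example
R.zincrby("lang:hOnbWg", 9, "en")   # 9 observations of English
R.zincrby("lang:hOnbWg", 1, "es")   # 1 observation of Spanish

print(ghash2lang("lang:hOnbWg", R)) # ('en', 9.0): low entropy, enough counts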

  12. http://4sq.com/96kc1O

  13. How do people click on URLs over time?

  14. normal click distributions

  15. abnormal click distributions

  16. clustered URLs
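An illustrative sketch, not necessarily bitly's actual pipeline: cluster links by the shape of their click-time histograms so that normal decay curves and abnormal (spiky) ones fall into separate groups. All data below is synthetic.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

def click_profile(timestamps, n_bins=24):
    """Normalized histogram of click times (fraction of clicks per hour)."""
    hist, _ = np.histogram(timestamps, bins=n_bins, range=(0, 24))
    return hist / max(hist.sum(), 1)

# synthetic links: most decay exponentially from hour 0, a few spike late (abnormal)
normal = [rng.exponential(scale=3.0, size=500) % 24 for _ in range(20)]
spiky = [12 + rng.normal(0, 0.5, size=500) for _ in range(5)]
profiles = np.array([click_profile(t) for t in normal + spiky])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(profiles)
print(labels)  # the 5 spiky links should land in a cluster of their own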

  17. What is the half life of a link?
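A minimal sketch of one way to estimate a link's half life, assuming clicks after the peak decay roughly exponentially: estimate the decay rate and convert with half_life = ln(2) / lambda. The click data is synthetic and this is not claimed to be bitly's method.

import numpy as np

rng = np.random.default_rng(1)
# synthetic click timestamps (minutes after the link peaks), true half life ~20 min
clicks = rng.exponential(scale=20 / np.log(2), size=2000)

lam = 1.0 / clicks.mean()      # maximum-likelihood estimate of the exponential rate
half_life = np.log(2) / lam    # minutes until half of the remaining clicks arrive
print(round(half_life, 1))     # ~20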

  18. Does this apply to other dimensions?

  19. Can we predict at time t how many clicks a link is likely to ever get?
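A hypothetical sketch of the kind of model this question suggests: predict a link's eventual click total from its early clicks with a simple log-log regression. The feature (clicks in the first 10 minutes) and the training data are made up for illustration.

import numpy as np

rng = np.random.default_rng(2)
early = rng.integers(1, 500, size=200)            # clicks observed by t = 10 min (synthetic)
total = early * rng.uniform(5, 15, size=200)      # eventual lifetime clicks (synthetic)

# fit log(total) = a * log(early) + b
a, b = np.polyfit(np.log(early), np.log(total), deg=1)

def predict_total(early_clicks):
    return float(np.exp(a * np.log(early_clicks) + b))

print(predict_total(100))  # rough estimate of lifetime clicks given 100 early clicks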

  20. Text analysis?

  21. How do we rank document relevance in realtime?
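One common approach to this (not necessarily the one from the talk): score incoming documents against a query with TF-IDF vectors and cosine similarity, which is cheap enough to apply as documents stream in. The example documents are made up.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "machine learning on realtime click streams",
    "recipe for chocolate chip cookies",
    "entropy based language detection for web pages",
]
query = ["realtime machine learning"]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(docs)
query_vector = vectorizer.transform(query)

scores = cosine_similarity(query_vector, doc_vectors).ravel()
ranking = scores.argsort()[::-1]
print([docs[i] for i in ranking])  # most relevant document first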

  22. Simple > Complex (especially algorithms)
