
hive@king Threshing data

hive@king Threshing data. Mattias Andersson, BI Developer, matte@king.com.


Presentation Transcript


  1. hive@king: Threshing data. Mattias Andersson, BI Developer, matte@king.com. “Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop compatible file systems. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.”

  2. Agenda: A short history of King. Why do we use hive at King? I will discuss hive from an analytics and data warehouse user perspective. Keep it short.

  3. Level 1 This is Bragging warning!

  4. Founded in 2003 by a bunch of ex-Spray guys: Thomas Hartwig (CTO), Patrik Stymne (Architect), Sebastian Knutsson (Chief Product Officer), Riccardo Zacconi (CEO), Lars Markgren (GM Sweden).

  5. A European developer with its heart in Sthlm ”Silicontull” + in London, Malmö, Bucharest, San Fran, Malta & Barcelona.

  6. We create & publish casual games

  7. 2003-2010

  8. 200+ casual games, 2003-2010. The foundation for our crusade on Facebook and mobile.

  9. Fucked by Facebook (FBF Index), fall of 2010. [Chart: Facebook unique visitors vs. Yahoo Games US unique visitors, 2004-2010, scale up to 500m]

  10. Facebook Fall of 2012, Industry experts: “King missed the train, it’s too late now” “Zynga and Wooga owns the market”

  11. King’s response?

  12. It is never too late to disrupt an industry

  13. April 2011: Bubble Saga on Facebook. The Saga format, 2011.

  14. Bubble Saga was a hit… no. 7 on Facebook after 4 months. April 2011, Daily Active Users (DAU): 2.4 million DAU!

  15. Bubble Witch Saga… Oct 2011-2012, Daily Active Uniques (DAU). Explosive growth: from 0 to 6 million daily players in 4 months. 1 year growth: from 220,000 DAU to 8,500,000!

  16. Mobile: July 2012

  17. Mobile, July 2012 - now. Also #1 top grossing app in Sweden since February.

  18. How we succeeded technically speaking… Our platform.
      Tech choices:
        Application: 96 servers (Java)
        MySQL: 59 servers
        Memcache: 24 servers
        Hadoop cluster: 20 servers
      How it all works from a BI perspective:
        MySQL shards hold user state; they are off limits for BI.
        The game logs events whenever something interesting has happened.
        Logs are rolled hourly to a central log server, where we fetch the data.

  19. Big data, bigger metadata Metadata…

  20. We are on our way… Are we Big Data?

  21. The most important success factor for hive: Hive connectivity.
      Hue: web interface to hive. Easy to use, so it is a great first encounter.
      ODBC: enables us to pull data from hive into QlikView/R/Excel.
      Command line interface: the default/advanced interface.
      Scumbag hive: different interfaces use different escape sequences/variable substitution…

  22. This is what sold it to me: Hive programmability. Hive custom transform:
        from (
          from dual
          map a using 'seq 1 5' as sequence int
          sort by sequence
        ) map_out
        reduce sequence using 'awk "{sum+=$0\; print sum}"' as cumulative int;
      Output: 1 3 6 10 15
      Scumbag hive: really easy to make something horribly unmaintainable. Perl/xslt/wget in one hql-statement…
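
The awk reducer on this slide keeps a running total of the rows it is fed. As a rough sketch of the same logic written as a Python streaming step (Hive's map/reduce clauses pipe rows to an external program's standard input and read rows back from its standard output), assuming one integer per input line. The names `cumulative_sums` and `run` are hypothetical and not from the slides.

```python
import io
from typing import Iterable, Iterator, TextIO


def cumulative_sums(lines: Iterable[str]) -> Iterator[int]:
    """Yield the running total of the integers found one per line."""
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        total += int(line)
        yield total


def run(source: TextIO, sink: TextIO) -> None:
    """Read rows from source and write one running total per row to sink."""
    for value in cumulative_sums(source):
        sink.write(f"{value}\n")


# Demo with the five rows that `seq 1 5` would produce.
demo_out = io.StringIO()
run(io.StringIO("1\n2\n3\n4\n5\n"), demo_out)
print(demo_out.getvalue(), end="")
```

Used as an actual reducer script, the demo lines would be replaced by a call such as `run(sys.stdin, sys.stdout)`. Keeping this logic in a named script file rather than inline in the statement is one way to soften the maintainability problem the slide jokes about.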

  23. Map as a double entendre: Hive complexity. Map datatype:
        create table if not exists test2(
          test map<string,map<string,int>>
        ) ROW FORMAT DELIMITED STORED AS TEXTFILE;
        select test["test"]["x"] from test2;
      Scumbag hive: there is no syntax to declare map/array separators after the first for hive in textfile format; \004 \005 and \006 \007 are hardcoded.
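
To make the separator remark on this slide concrete, here is a small sketch of what one value of the `test` column might look like inside the text file. It assumes that Hive's default text format uses \002 and \003 for the outer map (entry separator and key/value separator) and \004 and \005 for the inner map. Those defaults are an assumption to check against your Hive version, and the helper names `encode_nested_map` and `decode_nested_map` are hypothetical.

```python
from typing import Dict

# Assumed default separators by nesting level (not taken from the slides).
OUTER_ENTRY_SEP = "\x02"  # between entries of the outer map
OUTER_KV_SEP = "\x03"     # between an outer key and its value
INNER_ENTRY_SEP = "\x04"  # between entries of the inner map
INNER_KV_SEP = "\x05"     # between an inner key and its value

NestedMap = Dict[str, Dict[str, int]]


def encode_nested_map(value: NestedMap) -> str:
    """Serialize map<string,map<string,int>> into one delimited text field."""
    outer_entries = []
    for outer_key, inner in value.items():
        inner_text = INNER_ENTRY_SEP.join(
            f"{key}{INNER_KV_SEP}{number}" for key, number in inner.items()
        )
        outer_entries.append(f"{outer_key}{OUTER_KV_SEP}{inner_text}")
    return OUTER_ENTRY_SEP.join(outer_entries)


def decode_nested_map(text: str) -> NestedMap:
    """Parse a field written by encode_nested_map (non-empty maps only)."""
    result: NestedMap = {}
    for entry in text.split(OUTER_ENTRY_SEP):
        outer_key, inner_text = entry.split(OUTER_KV_SEP, 1)
        inner: Dict[str, int] = {}
        for pair in inner_text.split(INNER_ENTRY_SEP):
            key, number = pair.split(INNER_KV_SEP, 1)
            inner[key] = int(number)
        result[outer_key] = inner
    return result


row = encode_nested_map({"test": {"x": 1, "y": 2}})
print(repr(row))                            # control characters shown escaped
print(decode_nested_map(row)["test"]["x"])  # mirrors test["test"]["x"]
```

Because the deeper separators cannot be declared in the table definition, whatever writes the file has to emit exactly these control characters, which is the complexity the slide is pointing at.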

  24. It's complicated… So why did we choose to use hive?
      Pros: SQL is easy to learn; supports custom mapreduce jobs; ODBC connection for QlikView; Hue for lightweight access; development is moving fast; open source.
      Cons: high latency; lots of moving parts; not free from bugs.

  25. The end.
