The Wikibon Project
Big Data and Hadoop: Key Drivers, Ecosystem and Use Cases
November 2011
What is Big Data?
Big Data (n.): Data sets whose size, type and/or speed make them impractical to process and analyze with traditional database technologies and related data management tools.
Why is Big Data Important?
Big Data is the new definitive source of competitive advantage across industries …
… For those organizations that embrace Big Data, the possibilities for innovation, improved agility, and increased profitability are nearly endless.
Three Key Big Data Drivers
• Volume, Variety, Velocity
• Hardware Commoditization
• Cloud Computing
Hadoop
An open source framework for processing, storing and analyzing Big Data.
Fundamental concept: Rather than working through one huge block of data on a single machine, Hadoop breaks Big Data into multiple parts so each part can be processed and analyzed in parallel.
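The split-then-process-in-parallel idea can be illustrated with a small word count in the MapReduce style. This is a minimal single-machine sketch under stated assumptions: the function names (`split_into_chunks`, `map_chunk`, `word_count`) are hypothetical, a Python thread pool stands in for cluster nodes, and nothing here uses Hadoop's actual APIs. Hadoop itself would store the parts in HDFS and schedule the work across many machines; for CPU-bound work in a single Python process, threads give concurrency rather than true parallel speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical helper names for illustration only; not part of any Hadoop API.


def split_into_chunks(lines, num_chunks):
    """Divide the input into roughly equal parts (a stand-in for HDFS blocks)."""
    size = max(1, -(-len(lines) // num_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def map_chunk(chunk):
    """Map step: count words in one chunk, independently of every other chunk."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def word_count(lines, num_workers=4):
    chunks = split_into_chunks(lines, num_workers)
    # Each chunk is submitted to the pool as an independent task; Hadoop
    # would instead ship each part to a different node in the cluster.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_counts = list(pool.map(map_chunk, chunks))
    # Reduce step: merge the per-chunk results into a single answer.
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


if __name__ == "__main__":
    data = ["big data is big", "hadoop splits big data", "data in parallel"]
    print(word_count(data, num_workers=2).most_common(2))
```

Because each chunk is counted without looking at any other chunk, adding more parts (or, in Hadoop's case, more nodes) scales out the map step; only the final merge needs to see every partial result.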
Hadoop: The Pros and Cons
• First the pros … Hadoop is a time- and cost-effective approach to storing, processing and analyzing large volumes of unstructured data, allowing for new and unprecedented types of analytics.
• Now the cons … Hadoop is complex and difficult to deploy and manage; there’s a dearth of Hadoop-savvy engineers and Data Scientists on the job market; and the risk of forking and vendor lock-in remains.
Hadoop: The Pros and Cons cont.
• More pros … Many bright minds are contributing to Hadoop, resulting in rapid development, and an ecosystem of vendors is emerging to make Hadoop enterprise-ready.
Big Data Pioneers
• Largest Hadoop instance on the planet … 40,000 nodes handling 200+ PB of data.
• Used to support research for ad systems and Web search.
• Used to match ads with users, detect spam in Yahoo! Mail and pick relevant top stories.
Big Data Pioneers cont.
• Two major clusters processing and storing over 30 PB of data.
• Uses HDFS to store copies of internal log and dimension data.
• Developed Hive to perform large-scale analytics on user data.
• Uses HBase to store, manage and retrieve Facebook Messenger data.
Big Data Pioneers cont.
• Uses Hadoop to support the “People You May Know” feature.
• Tailors its search engine to return the most relevant results for recruiters, employers and job seekers.
• Created a visualization tool that allows users to explore their professional networks and discover hidden patterns.
Big Data in Financial Services
• Over 30,000 databases and 15,000 applications spread across 7 business units.
• Uses Hadoop as the basis of its Common Data Platform.
• Aims to establish a 360-degree view of the customer for upsell and cross-sell opportunities.
Big Data in Financial Services cont.
• Managing and analyzing risk to understand financial exposure.
• Detecting fraudulent transactions and potentially criminal activity.
• Conducting sentiment analysis on social media data.
Thank You
Jeffrey F. Kelly
Principal Research Contributor
The Wikibon Project
jeff.kelly@wikibon.org
@jeffreyfkelly
www.wikibon.org
www.siliconangle.com