Big Data Processing Using Amazon Web Services
Presentation Agenda Introduction to Big Data AWS for Big Data Processing Case Studies SWOT Analysis
What is Big Data? Plenty of information Slow processing time Sources of Big Data Primary & Secondary sources 3Vs of Big Data Volume, Velocity, Variety
What can Big Data do? Forecast and predict future trends Predictive Modelling Develop strategies Improve decision making process
Processing & Using Big Data Data Management Data storage infrastructures Data Analysis Technologies & tools to analyze data Data Use Using & applying insights obtained
Who uses Big Data analytics? Companies Customer acquisition, retention Government Policy making Researchers Analyze causes/factors of events
Why use Big Data? Big Data for information e.g. Healthcare – Analyse causes of diseases and predict impending disease outbreaks Big Data for decision making Offers a cost-effective way to improve decision making in critical development areas
Trends Better technology Better data storage infrastructures & Big Data processing technologies readily available Benefits Reduce operating costs Optimize profit Gain competitive advantage
Challenges Security & privacy Missing data Data relevancy Demographic issues, etc.
Storage Amazon S3 Compute And Analytics Data from different sources Amazon Elastic MapReduce (Hadoop) Amazon EC2 Database Amazon RDS Real-time access to analytical reports
Amazon Technologies Amazon Simple Storage Service (S3) Store & retrieve any amount of data at any time, from anywhere on the web Amazon Relational Database Service (RDS) Relational database in the cloud
Amazon Technologies Amazon Elastic MapReduce (EMR) Enables easy and cost-effective processing of vast amounts of data Manages Hadoop clusters
Hadoop What is it? Open source software framework maintained by the Apache Software Foundation (ASF) Allows companies to create and develop tools to analyze and produce valuable insights from big data Who is using it?
Amazon Technologies Amazon Elastic MapReduce (EMR) Enables easy and cost-effective processing Suitable for vast amounts of data Manages Hadoop clusters Easy provision of capacity Seamless integration with S3
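To make the processing model behind Hadoop and EMR more concrete, here is a minimal single-machine sketch of the MapReduce pattern, using the classic word-count example. It is an illustration in plain Python, not Hadoop or EMR code; the function names and sample input are assumptions made for this sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse the values for one key into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input, for illustration only.
    sample = ["big data on AWS", "big clusters process big data"]
    print(word_count(sample))
```

On a real cluster the map and reduce steps run in parallel across many machines and the shuffle moves data between them; the sketch keeps everything in one process so the three stages are easy to see.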
Case Study Foursquare Labs, Inc. Location-based social network 10 million users world-wide Approximately 5 million daily check-ins Technologies used Amazon Elastic MapReduce Amazon EC2 Spot Instances Amazon S3
Case Study Benefits Ease-of-use Simplifies process of deploying and managing Hadoop cluster Low Cost Only charged for the resources used Overall, reduced time, effort, and cost to generate customer insights
Case Study Pulse App for Android, iOS and browser Displays news from multiple RSS feeds 20 million users 10 million stories read per day
Case Study Collecting Data Built on Amazon EC2 and S3 Data stored in two separate S3 buckets Additional reliability and robustness Uses Elastic Load Balancing (ELB) Distributes request loads
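The slide does not say how the two S3 buckets are used. One possible reading, assumed here, is that each object is written to both buckets so a read can fall back to the second copy. The sketch below shows that idea. The bucket names and helper functions are hypothetical, and `InMemoryS3` is an offline stand-in that mimics only the `put_object` and `get_object` calls of a boto3 S3 client, so the example runs without an AWS account.

```python
import io
from typing import Dict, Tuple


class InMemoryS3:
    """Offline stand-in exposing the two boto3 S3 client calls used below."""

    def __init__(self) -> None:
        self._objects: Dict[Tuple[str, str], bytes] = {}

    def put_object(self, Bucket: str, Key: str, Body: bytes) -> None:
        self._objects[(Bucket, Key)] = Body

    def get_object(self, Bucket: str, Key: str) -> dict:
        # Raises KeyError when the object is missing; a real boto3 client
        # raises a botocore ClientError instead.
        return {"Body": io.BytesIO(self._objects[(Bucket, Key)])}


# Hypothetical bucket names, not taken from the case study.
PRIMARY = "events-primary"
SECONDARY = "events-secondary"


def store_event(s3, key: str, payload: bytes) -> None:
    """Write the same object to both buckets."""
    for bucket in (PRIMARY, SECONDARY):
        s3.put_object(Bucket=bucket, Key=key, Body=payload)


def load_event(s3, key: str) -> bytes:
    """Read from the primary bucket, falling back to the secondary on error."""
    try:
        return s3.get_object(Bucket=PRIMARY, Key=key)["Body"].read()
    except Exception:
        return s3.get_object(Bucket=SECONDARY, Key=key)["Body"].read()


if __name__ == "__main__":
    s3 = InMemoryS3()  # with AWS credentials this could be boto3.client("s3")
    store_event(s3, "2013/01/01/event-1", b'{"story": "abc"}')
    print(load_event(s3, "2013/01/01/event-1"))
```

Passing the client in as a parameter keeps the helpers testable offline; swapping in a real client should only require that the buckets exist.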
Case Study Analyzing with Hadoop Extract breaking stories quickly Analysis engine Built on top of EMR and Hadoop Scans through big data sets Surface all the top stories Amazon EMR interface Simple & cost effective
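The "surface all the top stories" step can be pictured as counting reads per story and keeping the highest counts. The following is a minimal single-machine sketch of that idea, not the actual analysis engine from the case study; the event format (one story ID per read) and the function name are assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_stories(read_events: Iterable[str], n: int = 3) -> List[Tuple[str, int]]:
    """Count how often each story ID was read and return the n most-read."""
    counts = Counter(read_events)
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical read log: one story ID per read event.
    events = ["story-a", "story-b", "story-a", "story-c", "story-a", "story-b"]
    print(top_stories(events, n=2))
```

On EMR the counting would be spread across a Hadoop cluster as a map step (emit one count per read) and a reduce step (sum per story), with the final ranking done on the much smaller aggregated result.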
Case Study Serving Feeds After EMR jobs finish, final top stories pushed to datastore on Google App Engine (GAE)
Service Level Agreement Requires Multiple Availability Zones Additional Cost Complex & Limited in Scope
Vendor Lock-in Difficult to migrate Requires Code Rewrite Tied with IaaS Capabilities
Windows Azure HDInsight Similar cloud offerings Hadoop based data processing in the cloud Open, scalable and elastic Integration with Microsoft apps Works with Microsoft Office Excel Windows Azure Marketplace Ease of access to apps and data sources
Google BigQuery Very similar to Amazon’s offerings Infrastructure as a Service, Easily scalable Fast and Secure Simple SQL-like queries on huge data sets Secured with SSL and Access Control Ease of Access Implemented as a RESTful web service Integrated with Google Apps
Reputation Synonymous with “The Cloud” Market leader and industry innovator Huge portfolio of big-name clients Good track record & robust service
Rich Product Portfolio Most comprehensive product offerings Largest pool of capacity Brings products to market quickly Adapts products to customer feedback One of the lowest cost IaaS providers
Greater Efficiency Demand for faster data warehousing Queries 100s of gigabytes to a petabyte or more Improvements in price-to-performance ratio Increasing trend of big data analytics Affordable, fast, scalable data warehousing tech Enables smaller firms’ access to big data analytics
Better Apps Good infrastructure alone not enough Better apps that can exploit Big Data Provide useful and actionable insights Analyse and optimise business processes Greater integration Ensure insights are embedded into workflow
More Partnerships Partner with big software companies Encourage large client firms to adopt AWS Partner with many smaller developers Increase selection of apps in marketplace Better ecosystem of apps attracts more users
Conclusion What is Big Data How is big data being used today What are the trends and challenges AWS Success Stories Pulse, Foursquare Evaluation of Service SWOT analysis of the AWS offering Exciting opportunities for future development