Introduction. Big data is a blanket term for the non-traditional strategies and technologies needed to gather, organize, process, and draw insights from large datasets. Volume. Velocity. Variety. Other Characteristics. Clustered Computing. Ingesting Data into the System. Persisting the Data in Storage.
BIG DATA by professionalguru

Introduction
• Big Data may well be the Next Big Thing in the IT world.
• Big data burst upon the scene in the first decade of the 21st century.
• The first organizations to embrace it were online and startup firms. Firms like Google, eBay, LinkedIn, and Facebook were built around big data from the beginning.
• Like many new information technologies, big data can bring about dramatic cost reductions, substantial improvements in the time required to perform a computing task, or new product and service offerings.
http://professional-guru.com

What is Big Data?
• ‘Big Data’ is similar to ‘small data’, but bigger in size.
• Having bigger data requires different approaches:
– Techniques, tools, and architecture
• The aim is to solve new problems, or old problems in a better way.
• Big Data generates value from the storage and processing of very large quantities of digital information that cannot be analyzed with traditional computing techniques.

What is Big Data?
• Walmart handles more than 1 million customer transactions every hour.
• Facebook handles 40 billion photos from its user base.
• Decoding the human genome originally took 10 years to process; now it can be achieved in one week.

Three Characteristics of Big Data: the 3 Vs
• Volume
– Data quantity
• Velocity
– Data speed
• Variety
– Data types

1st Characteristic of Big Data: Volume
• A typical PC might have had 10 gigabytes of storage in 2000.
• Today, Facebook ingests 500 terabytes of new data every day.
• A Boeing 737 will generate 240 terabytes of flight data during a single flight across the US.
• Smartphones and the data they create and consume, together with sensors embedded into everyday objects, will soon result in billions of new, constantly updated data feeds containing environmental, location, and other information, including video.

2nd Characteristic of Big Data: Velocity
• Clickstreams and ad impressions capture user behavior at millions of events per second.
• High-frequency stock trading algorithms reflect market changes within microseconds.
• Machine-to-machine processes exchange data between billions of devices.
• Infrastructure and sensors generate massive log data in real time.
• Online gaming systems support millions of concurrent users, each producing multiple inputs per second.

3rd Characteristic of Big Data: Variety
• Big Data isn't just numbers, dates, and strings. Big Data is also geospatial data, 3D data, audio and video, and unstructured text, including log files and social media.
• Traditional database systems were designed to address smaller volumes of structured data, fewer updates, or a predictable, consistent data structure.
• Big Data analysis includes different types of data.

Storing Big Data
• Analyzing your data characteristics
• Selecting data sources for analysis
• Eliminating redundant data
• Establishing the role of NoSQL
• Overview of Big Data stores
• Data models: key value, graph, document, column-family
• Hadoop Distributed File System
• HBase
• Hive
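
The four data models named on this slide can be compared with a minimal sketch that uses plain Python containers in place of real datastores. The user record, the field names, and the `followers_of` helper are hypothetical and chosen only for illustration. The column-family layout only roughly follows the row key, column family, column shape used by stores such as HBase, and it ignores timestamps and versions.

```python
import json

# Key-value: an opaque value looked up by a single key.
key_value = {"user:42": '{"name": "Ada", "city": "London"}'}

# Document: a nested, self-describing structure that can be queried by field.
document = {"_id": 42, "name": "Ada", "address": {"city": "London"}, "tags": ["admin"]}

# Column-family: row key -> column family -> column -> value.
column_family = {
    "42": {
        "profile": {"name": "Ada", "city": "London"},
        "activity": {"last_login": "2012-02-01"},
    }
}

# Graph: nodes plus labeled, directed edges between them.
graph = {
    "nodes": {42: {"name": "Ada"}, 7: {"name": "Bob"}},
    "edges": [(42, "FOLLOWS", 7)],
}


def name_from_key_value(store, key):
    """The store cannot see inside the value; the client must decode it."""
    return json.loads(store[key])["name"]


def followers_of(g, node_id):
    """Return ids of nodes that have a FOLLOWS edge pointing at node_id."""
    return [src for src, label, dst in g["edges"] if label == "FOLLOWS" and dst == node_id]
```

The same fact (the user's city) is reachable in three of the models but by different access paths, which is one reason the choice of store depends on the queries you expect to run.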

Selecting Big Data stores
• Choosing the correct data stores based on your data characteristics
• Moving code to data
• Implementing polyglot data store solutions
• Aligning business goals to the appropriate data store

Processing Big Data
• Integrating disparate data stores
• Mapping data to the programming framework
• Connecting and extracting data from storage
• Transforming data for processing
• Subdividing data in preparation for Hadoop MapReduce
• Employing Hadoop MapReduce
• Creating the components of Hadoop MapReduce jobs
• Distributing data processing across server farms
• Executing Hadoop MapReduce jobs
• Monitoring the progress of job flows
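
The MapReduce steps listed on this slide (subdividing the data, mapping, grouping, reducing) can be illustrated with a single-process word count. This is a sketch of the programming model only, not the Hadoop API: the function names are hypothetical, and on a real cluster the map calls would run in parallel on different machines.

```python
from collections import defaultdict


def split_input(lines, n_splits):
    """Subdivide the input into chunks, one per (simulated) mapper."""
    return [lines[i::n_splits] for i in range(n_splits)]


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_chunks):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines, n_splits=2):
    chunks = split_input(lines, n_splits)
    mapped = [map_phase(chunk) for chunk in chunks]  # parallel on a real cluster
    return reduce_phase(shuffle(mapped))


counts = word_count(["big data", "Big value"])
```

Because each mapper sees only its own chunk and each reducer sees only the values for its own keys, the work can be spread across many servers without shared state.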

The Structure of Big Data
• Structured
– Most traditional data sources
• Semi-structured
– Many sources of big data
• Unstructured
– Video data, audio data
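
The difference between the three categories shows up in how much work is needed before the data can be used. The sketch below uses hypothetical sample data and Python's standard library. Free text stands in for the unstructured case, since video and audio need far heavier processing than fits here.

```python
import csv
import io
import json
import re

# Structured: fixed columns, so every row can be read the same way.
structured = "id,amount\n1,9.99\n2,4.50\n"
rows = list(csv.DictReader(io.StringIO(structured)))
total = sum(float(row["amount"]) for row in rows)

# Semi-structured: self-describing, but fields may be missing or vary per record.
semi_structured = '[{"id": 1, "amount": 9.99}, {"id": 2, "note": "refund"}]'
records = json.loads(semi_structured)
amounts = [record.get("amount") for record in records]  # None where the field is absent

# Unstructured: no schema at all; any structure has to be extracted.
unstructured = "Customer 2 called about a refund of $4.50 on order 17."
dollar_values = [float(m) for m in re.findall(r"\$(\d+(?:\.\d+)?)", unstructured)]
```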

Why Big Data
• The growth of Big Data is driven by:
– Increase of storage capacities
– Increase of processing power
– Availability of data (different data types)
• Every day we create 2.5 quintillion bytes of data; 90% of the data in the world today has been created in the last two years alone.

Why Big Data
• Facebook generates 10 TB of data daily.
• Twitter generates 7 TB of data daily.
• IBM claims 90% of today’s stored data was generated in just the last two years.

How Is Big Data Different?
• Automatically generated by a machine (e.g. a sensor embedded in an engine)
• Typically an entirely new source of data (e.g. use of the internet)
• Not designed to be friendly (e.g. text streams)
• May not have much value
– Need to focus on the important part

Big Data sources
• Users
• Application
• Systems
• Sensors
• Large and growing files (big data files)

Data generation points: examples
• Mobile devices
• Microphones
• Readers/scanners
• Science facilities
• Programs/software
• Social media
• Cameras

Big Data Analytics
• Examining large amounts of data
• Appropriate information
• Identification of hidden patterns and unknown correlations
• Competitive advantage
• Better business decisions: strategic and operational
• Effective marketing, customer satisfaction, increased revenue

Types of tools used in Big Data
• Where is processing hosted?
– Distributed servers / cloud (e.g. Amazon EC2)
• Where is data stored?
– Distributed storage (e.g. Amazon S3)
• What is the programming model?
– Distributed processing (e.g. MapReduce)
• How is data stored and indexed?
– High-performance schema-free databases (e.g. MongoDB)
• What operations are performed on the data?
– Analytic / semantic processing

Risks of Big Data
• Organizations can be overwhelmed by the data
– Need the right people solving the right problems
• Costs can escalate too fast
– It isn’t necessary to capture 100% of the data
• Many sources of big data raise privacy concerns
– Self-regulation
– Legal regulation

Benefits of Big Data
• Real-time big data isn’t just a process for storing petabytes or exabytes of data in a data warehouse; it’s about the ability to make better decisions and take meaningful actions at the right time.
• Fast forward to the present, and technologies like Hadoop give you the scale and flexibility to store data before you know how you are going to process it.
• Technologies such as MapReduce, Hive, and Impala enable you to run queries without changing the data structures underneath.
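
Storing data before deciding how to process it is often described as schema-on-read. The sketch below illustrates the idea with an in-memory list standing in for files in a distributed file system. The `ingest` and `query` helpers and the sample log lines are hypothetical, and this does not reproduce how Hadoop, Hive, or Impala work internally.

```python
import json

raw_store = []  # stands in for raw files kept in distributed storage


def ingest(line):
    """Store the raw line untouched; no schema is enforced at write time."""
    raw_store.append(line)


def query(project):
    """Apply a projection at read time, skipping lines that do not fit it."""
    results = []
    for line in raw_store:
        try:
            results.append(project(json.loads(line)))
        except (ValueError, KeyError):
            continue  # malformed line, or a record without the requested field
    return results


ingest('{"user": "ada", "page": "/home", "ms": 120}')
ingest('{"user": "bob", "page": "/cart"}')
ingest("not json at all")

# Two different questions asked of the same stored data, with no rewrite of the store.
pages = query(lambda record: record["page"])
timings = query(lambda record: (record["user"], record["ms"]))
```

The trade-off is that validation moves from write time to read time, so every query has to tolerate records that do not match what it expects.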

Benefits of Big Data
• Our newest research finds that organizations are using big data to target customer-centric outcomes, tap into internal data, and build a better information ecosystem.
• Big Data is already an important part of the $64 billion database and data analytics market.
• It offers commercial opportunities of a comparable scale to enterprise software in the late 1980s, the Internet boom of the 1990s, and the social media explosion of today.

Future of Big Data
• $15 billion has been spent on software firms specializing only in data management and analytics.
• This industry on its own is worth more than $100 billion and is growing at almost 10% a year, which is roughly twice as fast as the software business as a whole.
• In February 2012, the open source analyst firm Wikibon released the first market forecast for Big Data, listing $5.1B revenue in 2012 with growth to $53.4B in 2017.
• The McKinsey Global Institute estimates that data volume is growing 40% per year, and will grow 44x between 2009 and 2020.
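
The growth figures on this slide can be checked with compound-growth arithmetic. The sketch below assumes simple annual compounding and treats 2012 to 2017 as 5 years and 2009 to 2020 as 11 years. The helper names are hypothetical.

```python
def cagr(start, end, years):
    """Compound annual growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1 / years) - 1


def growth_multiple(rate, years):
    """Total growth factor after compounding `rate` for `years` years."""
    return (1 + rate) ** years


# Wikibon forecast: $5.1B in 2012 growing to $53.4B in 2017.
market_cagr = cagr(5.1, 53.4, 2017 - 2012)

# McKinsey estimate: 40% per year, compounded from 2009 to 2020.
volume_multiple = growth_multiple(0.40, 2020 - 2009)

# Annual rate that would give exactly 44x over the same period.
implied_rate = cagr(1, 44, 2020 - 2009)

print(f"Implied market growth per year: {market_cagr:.1%}")
print(f"40% per year over 11 years: {volume_multiple:.1f}x")
print(f"Rate needed for 44x: {implied_rate:.1%}")
```

Under these assumptions the Wikibon forecast implies roughly 60% growth per year, and 40% per year compounds to about 40x over 11 years, which is close to the quoted 44x (a 44x increase corresponds to about 41% per year).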