introduction to big data

Introduction. Big data is a blanket term for the non-traditional strategies and technologies needed to gather, organize, process, and draw insights from large datasets. Volume. Velocity. Variety. Other Characteristics. Clustered Computing. Ingesting Data into the System. Persisting the Data in Storage.

Presentation Transcript


  1. Big Data By professionalguru

  2. Introduction • Big Data may well be the Next Big Thing in the IT world. • Big data burst upon the scene in the first decade of the 21st century. • The first organizations to embrace it were online and startup firms. Firms like Google, eBay, LinkedIn, and Facebook were built around big data from the beginning. • Like many new information technologies, big data can bring about dramatic cost reductions, substantial improvements in the time required to perform a computing task, or new product and service offerings. http://professional-guru.com

  3. What is Big Data? • ‘Big Data’ is similar to ‘small data’, but bigger in size. • Having bigger data, however, requires different approaches: – Techniques, tools, and architecture • The aim is to solve new problems, or old problems in a better way. • Big Data generates value from the storage and processing of very large quantities of digital information that cannot be analyzed with traditional computing techniques.

  4. What is Big Data • Walmart handles more than 1 million customer transactions every hour. • Facebook handles 40 billion photos from its user base. • Decoding the human genome originally took 10 years to process; now it can be achieved in one week.

  5. Three Characteristics of Big Data: the 3 Vs • Volume – Data quantity • Velocity – Data speed • Variety – Data types

  6. 1st Characteristic of Big Data: Volume • A typical PC might have had 10 gigabytes of storage in 2000. • Today, Facebook ingests 500 terabytes of new data every day. • A Boeing 737 will generate 240 terabytes of flight data during a single flight across the US. • Smartphones and the data they create and consume, together with sensors embedded into everyday objects, will soon result in billions of new, constantly updated data feeds containing environmental, location, and other information, including video.
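To put the volume figures above on one scale, the following minimal sketch converts them to bytes and expresses them as a number of year-2000 PCs. It assumes decimal (SI) units, which the slide does not specify, and the helper name `pcs_needed` is made up for illustration.

```python
# Assumption: decimal (SI) units; the slide does not say which convention it uses.
GB = 10**9
TB = 10**12

pc_storage_2000 = 10 * GB          # "typical PC" in 2000, per the slide
facebook_daily_ingest = 500 * TB   # per the slide
flight_data = 240 * TB             # per the slide


def pcs_needed(num_bytes: int, pc_bytes: int = pc_storage_2000) -> int:
    """Number of year-2000 PCs needed to hold num_bytes, rounded up."""
    return -(-num_bytes // pc_bytes)  # ceiling division on integers


print(pcs_needed(facebook_daily_ingest))  # 50000
print(pcs_needed(flight_data))            # 24000
```

Under these assumptions, one day of the quoted Facebook ingest would fill 50,000 such PCs.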

  7. 2nd Characteristic of Big Data: Velocity • Clickstreams and ad impressions capture user behavior at millions of events per second • High-frequency stock trading algorithms reflect market changes within microseconds • Machine-to-machine processes exchange data between billions of devices • Infrastructure and sensors generate massive log data in real time • Online gaming systems support millions of concurrent users, each producing multiple inputs per second.
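Velocity is usually quoted as events per second. One common way to measure it is a sliding time window, sketched below. The class name `RateCounter` is hypothetical, and the sketch assumes timestamps arrive in non-decreasing order; it is an illustration, not something taken from the slides.

```python
from collections import deque


class RateCounter:
    """Counts events inside a sliding time window (illustrative sketch)."""

    def __init__(self, window_seconds: float = 1.0) -> None:
        self.window = window_seconds
        self._times = deque()  # timestamps of events still inside the window

    def _evict(self, now: float) -> None:
        # Drop events that are a full window (or more) older than `now`.
        while self._times and self._times[0] <= now - self.window:
            self._times.popleft()

    def record(self, timestamp: float) -> None:
        """Register one event; timestamps are assumed non-decreasing."""
        self._times.append(timestamp)
        self._evict(timestamp)

    def rate(self, now: float) -> float:
        """Events per second over the window ending at `now`."""
        self._evict(now)
        return len(self._times) / self.window


counter = RateCounter(window_seconds=1.0)
for t in [0.0, 0.2, 0.4, 1.1]:
    counter.record(t)
print(counter.rate(1.1))  # 3.0
```

Memory grows with the number of events inside the window, so at the rates quoted above a real system would typically shard counters or use approximate sketches instead.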

  8. 3rd Characteristic of Big Data: Variety • Big Data isn't just numbers, dates, and strings. Big Data is also geospatial data, 3D data, audio and video, and unstructured text, including log files and social media. • Traditional database systems were designed to address smaller volumes of structured data, fewer updates, or a predictable, consistent data structure. • Big Data analysis includes different types of data.

  9. Storing Big Data • Analyzing your data characteristics • Selecting data sources for analysis • Eliminating redundant data • Establishing the role of NoSQL • Overview of Big Data stores • Data models: key-value, graph, document, column-family • Hadoop Distributed File System • HBase • Hive
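The four data models named above can be contrasted by shaping one record four ways. This is a minimal sketch using plain Python structures rather than any real NoSQL client; the record, the keys, and the `followers_of` helper are all hypothetical.

```python
import json

# Key-value: the store sees only an opaque value behind a single key.
key_value = {"user:42": json.dumps({"name": "Ada", "city": "London"})}

# Document: a nested, self-describing structure that can be queried by field.
document = {"_id": 42, "name": "Ada", "address": {"city": "London"}, "tags": ["admin"]}

# Column-family: row key -> column family -> column -> value.
column_family = {"42": {"profile": {"name": "Ada"}, "location": {"city": "London"}}}

# Graph: nodes plus typed, directed edges between them.
nodes = {42: {"name": "Ada"}, 7: {"name": "Grace"}}
edges = [(42, "FOLLOWS", 7)]


def followers_of(node_id: int) -> list:
    """Return ids of nodes with a FOLLOWS edge pointing at node_id."""
    return [src for src, rel, dst in edges if rel == "FOLLOWS" and dst == node_id]


print(json.loads(key_value["user:42"])["city"])   # London
print(column_family["42"]["location"]["city"])    # London
print(followers_of(7))                            # [42]
```

The key-value form needs the whole value decoded before any field is readable, while the other three expose structure the store itself can index or traverse.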

  10. Selecting Big Data stores • Choosing the correct data stores based on your data characteristics • Moving code to data • Implementing polyglot data store solutions • Aligning business goals to the appropriate data store

  11. Processing Big Data • Integrating disparate data stores • Mapping data to the programming framework • Connecting and extracting data from storage • Transforming data for processing • Subdividing data in preparation for Hadoop MapReduce • Employing Hadoop MapReduce • Creating the components of Hadoop MapReduce jobs • Distributing data processing across server farms • Executing Hadoop MapReduce jobs • Monitoring the progress of job flows
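The split, map, shuffle, and reduce stages listed above can be sketched in a single process as a word count. This is plain Python and does not use the Hadoop API; the function names are illustrative, and a real job would run the map and reduce tasks on separate machines.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def split_input(lines: List[str], num_splits: int) -> List[List[str]]:
    """Subdivide the input so each split could go to a different mapper."""
    return [lines[i::num_splits] for i in range(num_splits)]


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: List[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def run_job(lines: List[str], num_splits: int = 2) -> Dict[str, int]:
    """Run split -> map -> shuffle -> reduce sequentially."""
    pairs = [
        pair
        for split in split_input(lines, num_splits)
        for line in split
        for pair in map_phase(line)
    ]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(run_job(["big data big", "data store"]))
```

Because each mapper sees only its own split and each reducer only its own key, both phases can be distributed across a server farm without shared state.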

  12. The Structure of Big Data • Structured – Most traditional data sources • Semi-structured – Many sources of big data • Unstructured – Video data, audio data
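The difference between the three categories shows up in how much a program can assume before reading the data. The sketch below uses made-up sample data: a CSV table with a fixed schema, JSON records whose fields vary, and raw bytes with no schema at all.

```python
import csv
import io
import json

# Structured: every row has the same columns, so fields can be read by name.
structured = "id,name,amount\n1,Ada,9.50\n2,Grace,12.00\n"
rows = list(csv.DictReader(io.StringIO(structured)))
total = sum(float(row["amount"]) for row in rows)

# Semi-structured: records describe themselves, but fields differ per record.
semi_structured = [
    '{"id": 1, "name": "Ada", "tags": ["vip"]}',
    '{"id": 2, "device": {"os": "android"}}',
]
records = [json.loads(text) for text in semi_structured]
field_names = sorted({key for record in records for key in record})

# Unstructured: only bytes; meaning must come from a decoder or a model.
unstructured = bytes([0, 1, 2, 3])  # stand-in for audio or video samples

print(total)              # 21.5
print(field_names)        # ['device', 'id', 'name', 'tags']
print(len(unstructured))  # 4
```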

  13. Why Big Data • Growth of Big Data is needed – Increase of storage capacities – Increase of processing power – Availability of data (different data types) • Every day we create 2.5 quintillion bytes of data; 90% of the data in the world today has been created in the last two years alone

  14. Why Big Data • FB generates 10 TB daily • Twitter generates 7 TB of data daily • IBM claims 90% of today’s stored data was generated in just the last two years.

  15. How Is Big Data Different? • Automatically generated by a machine (e.g. sensor embedded in an engine) • Typically an entirely new source of data (e.g. use of the internet) • Not designed to be friendly (e.g. text streams) • May not have much value • Need to focus on the important part

  16. Big Data sources Users Large and growing files (Big data files) Application Systems Sensors

  17. Data generation points: Examples • Mobile Devices • Microphones • Readers/Scanners • Science facilities • Programs/Software • Social Media • Cameras

  18. Big Data Analytics • Examining large amounts of data • Appropriate information • Identification of hidden patterns, unknown correlations • Competitive advantage • Better business decisions: strategic and operational • Effective marketing, customer satisfaction, increased revenue

  19. Types of tools used in Big Data • Where is processing hosted? – Distributed servers / cloud (e.g. Amazon EC2) • Where is data stored? – Distributed storage (e.g. Amazon S3) • What is the programming model? – Distributed processing (e.g. MapReduce) • How is data stored and indexed? – High-performance schema-free databases (e.g. MongoDB) • What operations are performed on data? – Analytic / semantic processing

  20. Risks of Big Data • Organizations will be overwhelmed – Need the right people, solving the right problems • Costs escalate too fast – It isn’t necessary to capture 100% • Many sources of big data raise privacy concerns – Self-regulation – Legal regulation

  21. Benefits of Big Data • Real-time big data isn’t just a process for storing petabytes or exabytes of data in a data warehouse; it’s about the ability to make better decisions and take meaningful actions at the right time. • Fast forward to the present, and technologies like Hadoop give you the scale and flexibility to store data before you know how you are going to process it. • Technologies such as MapReduce, Hive, and Impala enable you to run queries without changing the data structures underneath.

  22. Benefits of Big Data • Our newest research finds that organizations are using big data to target customer-centric outcomes, tap into internal data, and build a better information ecosystem. • Big Data is already an important part of the $64 billion database and data analytics market. • It offers commercial opportunities of a comparable scale to enterprise software in the late 1980s, the Internet boom of the 1990s, and the social media explosion of today.

  23. Future of Big Data • $15 billion on software firms specializing only in data management and analytics. • This industry on its own is worth more than $100 billion and is growing at almost 10% a year, which is roughly twice as fast as the software business as a whole. • In February 2012, the open source analyst firm Wikibon released the first market forecast for Big Data, listing $5.1B revenue in 2012 with growth to $53.4B in 2017. • The McKinsey Global Institute estimates that data volume is growing 40% per year, and will grow 44x between 2009 and 2020.
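The growth figures on this slide can be cross-checked with compound-growth arithmetic. The sketch below assumes simple annual compounding and whole-year spans (2012 to 2017 as 5 years, 2009 to 2020 as 11 years); neither assumption is stated on the slide, and the function names are illustrative.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate that takes `start` to `end` in `years`."""
    return (end / start) ** (1 / years) - 1


def growth_multiple(rate: float, years: int) -> float:
    """Total growth factor after compounding `rate` for `years` years."""
    return (1 + rate) ** years


# Wikibon forecast: $5.1B (2012) to $53.4B (2017), roughly 60% per year.
wikibon_rate = cagr(5.1, 53.4, 2017 - 2012)

# McKinsey estimate: 40% per year over 2009-2020 compounds to roughly 40x.
mckinsey_multiple = growth_multiple(0.40, 2020 - 2009)

# Conversely, 44x over the same 11 years implies roughly 41% per year.
implied_rate = cagr(1.0, 44.0, 2020 - 2009)

print(f"{wikibon_rate:.2f} {mckinsey_multiple:.1f} {implied_rate:.2f}")
```

Under these assumptions the two McKinsey figures are consistent to within rounding: 40% a year gives about 40x, and 44x corresponds to about 41% a year.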

  24. Thank You.
