
Presentation on Big Data

P. Venugopal, 27.09.2013



Presentation Transcript


  1. Presentation on Big Data. P. Venugopal, 27.09.2013

  2. Agenda • Definition • Basics • Foundation of the Big Data movement • Driving Factors for Big Data Analysis • Impact of Big Data • Technologies to Process large Data • Applications areas of Big Data • Conclusion

  3. Definition: Big data is high-volume, high-velocity, and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.


  5. The Basics In the past, when a company needed to store and access more data, it had two choices. One option was to buy a bigger machine with more CPU, RAM, disk space, etc. This is known as scaling vertically. Of course, there is a limit to how big a machine you can actually buy, and this approach does not work at internet scale. The other option was to scale horizontally, that is, to spread the load across more machines. This usually meant contacting a vendor (Oracle, EMC, etc.) to buy a bigger solution. These solutions do not come cheap and therefore required a significant investment.

  6. Thoughts for ……. This is where the new breed of solutions comes in. They are designed around cheap commodity hardware, and the system itself takes care of the complexities of replication, failover, etc. This lets the developer think about the actual data problem at hand and lets businesses scale horizontally at a much lower price.


  8. Foundation of the Big Data movement: MapReduce The foundation of the Big Data movement is MapReduce, a framework originally developed by Google to deal with its large amount of search data. There are two parts to MapReduce, a Map phase and a Reduce phase. In the Map phase, a problem is broken into smaller chunks. This enables the work to be distributed to the various nodes in the cluster. In the Reduce phase, the work is collected from all the nodes and then reassembled to produce the final output.
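
The two phases described on this slide can be sketched in a few lines of Python. This is only an illustrative, single-process sketch and not Google's implementation: the word-count task, the sample sentences, and every function name (`split_into_chunks`, `map_phase`, `shuffle`, `reduce_phase`, `word_count`) are assumptions made for the example. On a real cluster each chunk would be handled by a different node.

```python
from collections import defaultdict


def split_into_chunks(lines, n_chunks):
    """Break the input into smaller chunks, one per (hypothetical) node."""
    return [lines[i::n_chunks] for i in range(n_chunks)]


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word found in this chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_chunks):
    """Collect the pairs from all nodes and group the values by key."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values gathered for each key into one result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines, n_chunks=3):
    chunks = split_into_chunks(lines, n_chunks)
    # Sequential here; on a cluster each chunk would be mapped in parallel.
    mapped = [map_phase(chunk) for chunk in chunks]
    return reduce_phase(shuffle(mapped))


# Made-up sample input for illustration only.
sample = ["big data is big", "data moves fast", "big clusters process data"]
counts = word_count(sample)
print(counts)
```

Because each chunk is mapped independently, the map step is the part that can be spread across many cheap machines; only the grouping and reducing steps need to see results from more than one node.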

  9. The Data Sets Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process within a tolerable elapsed time. Big data sizes are a constantly moving target, ranging from a few dozen terabytes to many petabytes of data in a single data set. The target moves because of constant improvement in traditional DBMS technology, as well as newer NoSQL databases and their ability to handle larger amounts of data.

  10. A Few Facts: Why Big Data is relevant • The volume of business data worldwide, across all companies, doubles every 1.2 years, according to estimates. • A shocking 90% of all the world’s data was created in the last two years. • The world's effective capacity to exchange information through telecommunication networks was 281 petabytes in 1986, 471 petabytes in 1993, 2.2 exabytes in 2000, and 65 exabytes in 2007, and it is predicted that the amount of traffic flowing over the internet will reach 667 exabytes annually by 2013. • Google: more than 7.2 billion page views / day • Facebook: more than 1 billion users and 35% of the world's photographs • YouTube: more than 4 billion views / day • Twitter: more than 4500 tweets / sec
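
To put the growth figures on this slide side by side, the sketch below converts a doubling time into the share of data created within a recent window, and back again. It assumes steady exponential growth, which the slide does not state, and the function names are made up for the example. Note that the two figures describe different things (business data versus all the world's data), so they are not expected to agree.

```python
import math


def share_created_recently(doubling_years, window_years):
    """Fraction of today's total added within the last `window_years`,
    assuming the total doubles every `doubling_years` at a steady rate."""
    growth_over_window = 2 ** (window_years / doubling_years)
    return 1 - 1 / growth_over_window


def doubling_time_for_share(share, window_years):
    """Doubling time that would make `share` of the total newer than
    `window_years`, under the same steady-growth assumption."""
    return window_years / math.log2(1 / (1 - share))


# Business data doubling every 1.2 years (figure from the slide).
business_share = share_created_recently(doubling_years=1.2, window_years=2)

# "90% created in the last two years" (figure from the slide).
implied_doubling = doubling_time_for_share(share=0.9, window_years=2)

print(f"Share of business data newer than two years: {business_share:.1%}")
print(f"Doubling time implied by the 90% figure: {implied_doubling:.2f} years")
```

Under this assumption, a 1.2-year doubling time means roughly two thirds of business data is less than two years old, while the 90% figure corresponds to a doubling time of about seven months.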


  12. Driving Factors for Big Data Analysis Finance, health care, government, social media, and retail players all recognize the significant economic value that petabytes (even zettabytes!) of consumer-generated data possess. The advent of Big Data offers a cost-effective way to improve decision-making in critical areas such as those above, and also in employment, economic productivity, crime and security, and natural disaster and resource management. Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes (2560 terabytes) of data, the equivalent of 167 times the information contained in all the books in the US Library of Congress.
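
The Walmart figures above can be checked with a little arithmetic. The sketch below only rearranges the numbers quoted on the slide; the variable names are made up for the example, and the binary convention of 1 petabyte = 1024 terabytes is an assumption inferred from the slide's own "2560 terabytes".

```python
TB_PER_PB = 1024  # binary convention, inferred from the slide's 2560 TB figure

transactions_per_hour = 1_000_000   # from the slide
database_size_pb = 2.5              # from the slide
library_of_congress_multiple = 167  # from the slide

# Convert the database size to terabytes.
database_size_tb = database_size_pb * TB_PER_PB

# Average transaction rate per second.
transactions_per_second = transactions_per_hour / 3600

# Size of the Library of Congress book collection implied by the comparison.
implied_library_tb = database_size_tb / library_of_congress_multiple

print(f"Database size: {database_size_tb:.0f} TB")
print(f"Transactions per second: {transactions_per_second:.1f}")
print(f"Implied Library of Congress book collection: {implied_library_tb:.1f} TB")
```

In other words, a million transactions an hour is a sustained rate of a few hundred per second, and the comparison implies a book collection on the order of 15 terabytes.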

  13. Big Data is driving three things: • a dramatic increase in the velocity of innovation • an acceleration of social change • an alteration of the global flow of information


  15. Impact of Big Data • Know what we know • Discover the gaps in our knowledge • Focus targeting to fill the gaps • More effective use of expensive or long lead collection assets • Better global coverage to limit surprise • Enhance understanding and improve analysis


  17. Technologies to Process large Data Big data requires exceptional technologies to efficiently process large quantities of data within tolerable elapsed times. A 2011 McKinsey report suggests suitable technologies include A/B testing, association rule learning, classification, cluster analysis, crowdsourcing, data fusion and integration, ensemble learning, genetic algorithms, machine learning, natural language processing, neural networks, pattern recognition, anomaly detection, predictive modelling, regression, sentiment analysis, signal processing, supervised and unsupervised learning, simulation, time series analysis and visualisation. Multidimensional big data can also be represented as tensors, which can be more efficiently handled by tensor-based computation, such as multilinear subspace learning. Additional technologies being applied to big data include massively parallel-processing (MPP) databases, search-based applications, data-mining grids, distributed file systems


  19. Application Areas of Big Data Examples include Big Science, RFID, sensor networks, social networks, big social data analysis, Internet documents, Internet search indexing, call detail records, astronomy, atmospheric science, genomics, biogeochemical, biological, and other complex and often interdisciplinary scientific research, military surveillance, forecasting, medical records, photography archives, video archives, and large-scale e-commerce.

  20. Initiatives of US Government In March 2012, the White House announced a national "Big Data Initiative" in which six Federal departments and agencies committed more than $200 million to big data research projects. The Big Data Research and Development Initiative explores how big data could be used to address important problems faced by the government. The initiative was composed of 84 different big data programs spread across six departments. The United States Federal Government owns six of the ten most powerful supercomputers in the world.


  22. Conclusion Today: • Analytics and tools are hard to use • Specialists are required to derive value • Skilled people are in short supply • Algorithms are dense and arcane • Require a lot of hand curation • Built for business not for intelligence Tomorrow: • Elegant, powerful and easy to use tools and visualisation • Machines to do more of the heavy lifting • Intelligent systems that learn from the user • Correlation not search • Curiosity layer: machines that are curious on your behalf

  23. Thank you

  24. Facebook handles 50 billion photos from its user base. There are 4.6 billion mobile-phone subscriptions worldwide, and between 1 billion and 2 billion people access the internet. Decoding the human genome originally took 10 years; now it can be achieved in less than a week.
