Big Data

Big Data. Anton Boyko. Agenda: What is Big Data? Why Big Data? How to Big Data? What is Big Data? Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time. Data growth.

Presentation Transcript


  1. Big Data Anton Boyko

  2. Agenda • What is Big Data? • Why Big Data? • How to Big Data?

  3. What is Big Data? Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time.

  4. Data growth • Velocity 4.3 • Volume 10x • Variety 85% • Big Data

  5. How to process Big Data?

  6. Move data to compute

  7. Move compute to data • Fast storage vs. fast CPU and fast networking • Linear scalability
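The contrast between slides 6 and 7 can be illustrated with a small in-memory sketch. It is only an illustration under stated assumptions: the partitions, the record values, and both function names are hypothetical, and "shipped" simply counts how many items would have to cross the network in each approach.

```python
from collections import Counter

# Hypothetical data set, already split into partitions stored on different nodes.
PARTITIONS = [
    ["error", "ok", "ok"],
    ["ok", "error", "error", "ok"],
    ["ok"],
]

def move_data_to_compute(partitions):
    """Ship every record to one central worker, then count there."""
    shipped = [record for part in partitions for record in part]
    return Counter(shipped), len(shipped)

def move_compute_to_data(partitions):
    """Count inside each partition and ship only the small partial results."""
    partials = [Counter(part) for part in partitions]
    # One (key, count) pair per distinct key leaves each partition.
    shipped = sum(len(partial) for partial in partials)
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total, shipped

central_counts, central_shipped = move_data_to_compute(PARTITIONS)
local_counts, local_shipped = move_compute_to_data(PARTITIONS)
```

Both approaches produce the same counts, but the second one moves 5 partial results instead of 8 raw records. With realistic partition sizes and few distinct keys the gap grows, and adding a node adds both storage and the compute that works on it.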

  8. Map/Reduce workflow • Mappers (find matches) • Reducers (combine matches) • Mappers (invert keys and values) • Reducer (combine results) • File system • DFS temp • File system
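The two chained jobs named on this slide can be sketched in memory as follows. This is a minimal sketch, not the presenter's code: it assumes that "matches" are whitespace-separated words, that the first job's output is what lands in "DFS temp", and that `run_job` and the four stage functions are hypothetical helpers introduced only for illustration.

```python
from collections import defaultdict

def run_job(records, mapper, reducer):
    """Run one map/reduce job in memory: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output

# Job 1: find matches and combine them into per-word counts.
def find_matches(line):
    for word in line.split():
        yield word, 1

def combine_matches(word, ones):
    yield word, sum(ones)

# Job 2: invert keys and values, then combine the words sharing a count.
def invert(pair):
    word, count = pair
    yield count, word

def combine_results(count, words):
    yield count, sorted(words)

lines = ["a b a", "b c"]  # hypothetical input files
temp = run_job(lines, find_matches, combine_matches)  # stands in for "DFS temp"
final = run_job(temp, invert, combine_results)
```

Here `temp` holds the word counts and `final` groups words by how often they occurred, which is the kind of result that needs a second pass because the grouping key only exists after the first reduce.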

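  9. Map/Reduce – how it works

      using System.Collections.Generic;
      using System.Linq;
      using System.Text.RegularExpressions;
      // MapperBase, ReducerCombinerBase and the context types come from the
      // Microsoft .NET SDK for Hadoop.
      using Microsoft.Hadoop.MapReduce;

      public class NamespaceMapper : MapperBase
      {
          // Override the map method; it is called once per input line.
          public override void Map(string inputLine, MapperContext context)
          {
              // Match "using <namespace>;" directives (character class fixed to A-Za-z).
              var reg = new Regex(@"(using)\s[A-Za-z0-9_\.]*\;");
              var matches = reg.Matches(inputLine);
              foreach (Match match in matches)
              {
                  // Just emit the namespaces, each with the value "1".
                  context.EmitKeyValue(match.Value, "1");
              }
          }
      }

      public class NamespaceReducer : ReducerCombinerBase
      {
          // Accepts each key and counts the occurrences.
          public override void Reduce(string key, IEnumerable<string> values,
              ReducerCombinerContext context)
          {
              // Write back the key with the number of values emitted for it.
              context.EmitKeyValue(key, values.Count().ToString());
          }
      }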
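What the framework does between the mapper and the reducer on this slide is not visible in the C# sample, so here is a rough local simulation of it. It is a sketch under assumptions: the pattern is the slide's regex with the character class written as `A-Za-z`, and `map_line`, `reduce_key`, `run`, and the sample `source` lines are hypothetical names and data, not part of any Hadoop API.

```python
import re
from collections import defaultdict

# Same pattern as the C# mapper, with the character class written as A-Za-z.
USING_RE = re.compile(r"(using)\s[A-Za-z0-9_\.]*;")

def map_line(line):
    """Emit (matched using-directive, "1") for each match in the line."""
    for match in USING_RE.finditer(line):
        yield match.group(0), "1"

def reduce_key(key, values):
    """Count the values grouped under one key, like values.Count() in the C# reducer."""
    return key, str(len(values))

def run(lines):
    grouped = defaultdict(list)  # the shuffle step: group emitted values by key
    for line in lines:
        for key, value in map_line(line):
            grouped[key].append(value)
    return dict(reduce_key(key, values) for key, values in grouped.items())

source = [
    "using System;",
    "using System.Linq;",
    "using System;",
    "class Program { }",
]
counts = run(source)
```

The reducer never sees individual lines, only a key and the list of "1" values collected for it, which is why counting the list is enough to get the number of occurrences.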

  10. Traditional RDBMS vs. Map/Reduce

  11. Hadoop – implementation of Map/Reduce engine

  12. Hadoop ecosystem

  13. Offering • ODBC for Excel • PowerPivot • Windows Server or Windows Azure • C#, Java, JavaScript

  14. Demo

  15. Pricing

  16. Questions? Anton Boyko boyko.ant@live.com
