Big Data
Anton Boyko
Agenda
• What is Big Data?
• Why Big Data?
• How to Big Data?
What is Big Data?
Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time.
Data growth
• Velocity: 4.3
• Volume: 10x
• Variety: 85%
• Big Data
Move compute to data
• Fast storage vs. fast CPU and fast networking
• Linear scalability
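A minimal sketch of "move compute to data", assuming the data is already split into partitions that each live on one node. The names ComputeToData, localCount and totalCount are hypothetical, and parallel threads stand in for cluster nodes. Each partition is scanned where it lives and only one partial count travels back, which is why adding partitions and workers tends to scale throughput roughly linearly.

```java
import java.util.List;

// Hypothetical sketch: threads stand in for cluster nodes, and each inner list
// stands in for a data partition stored on one node.
final class ComputeToData {

    // Runs "on the node" that stores the partition: count lines containing the term.
    static long localCount(List<String> partition, String term) {
        long n = 0;
        for (String line : partition) {
            if (line.contains(term)) {
                n++;
            }
        }
        return n;
    }

    // Runs on the coordinator: partitions are processed independently (here in
    // parallel), and only one long per partition is combined, not the raw lines.
    static long totalCount(List<List<String>> partitions, String term) {
        return partitions.parallelStream()
                .mapToLong(partition -> localCount(partition, term))
                .sum();
    }

    public static void main(String[] args) {
        List<List<String>> partitions = List.of(
                List.of("using System;", "class A {}"),
                List.of("using System.Linq;", "using System;"));
        System.out.println(totalCount(partitions, "using")); // prints 3
    }
}
```

The design choice to illustrate is that the expensive scan happens next to the data, while the network only carries the small partial results.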
Map/Reduce workflow
File system → Mappers (find matches) → Reducers (combine matches) → DFS temp → Mappers (invert keys and values) → Reducer (combine results) → File system
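The two-stage workflow can be sketched in memory as follows. It appears to follow the structure of Hadoop's bundled grep example: job 1 counts regex matches and writes them to a temporary directory; job 2 inverts each (match, count) pair so the count becomes the key, and a single reducer writes the matches sorted by count. This is a hypothetical single-process simulation (GrepWorkflow, stage1 and stage2 are illustrative names), not the Hadoop API; a TreeMap stands in for both the shuffle/sort and the DFS temp output.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical in-memory simulation of the two chained map/reduce jobs.
final class GrepWorkflow {

    // Job 1, map: emit (match, 1) for every regex match in a line.
    static List<Map.Entry<String, Long>> map1(String line, Pattern pattern) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        Matcher m = pattern.matcher(line);
        while (m.find()) {
            out.add(Map.entry(m.group(), 1L));
        }
        return out;
    }

    // Job 1, shuffle + reduce: sum the 1s per match. The returned map stands in
    // for the intermediate output written to the DFS temp directory.
    static Map<String, Long> stage1(List<String> lines, Pattern pattern) {
        Map<String, Long> temp = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Long> kv : map1(line, pattern)) {
                temp.merge(kv.getKey(), kv.getValue(), Long::sum);
            }
        }
        return temp;
    }

    // Job 2: invert (match, count) to (count, match); sorting keys in reverse
    // order plays the role of the shuffle feeding a single reducer.
    static List<String> stage2(Map<String, Long> temp) {
        TreeMap<Long, List<String>> byCount = new TreeMap<>(Collections.reverseOrder());
        for (Map.Entry<String, Long> kv : temp.entrySet()) {
            byCount.computeIfAbsent(kv.getValue(), k -> new ArrayList<>()).add(kv.getKey());
        }
        // Single reducer: write "count<TAB>match" lines, highest count first.
        List<String> result = new ArrayList<>();
        for (Map.Entry<Long, List<String>> group : byCount.entrySet()) {
            for (String match : group.getValue()) {
                result.add(group.getKey() + "\t" + match);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Pattern usingDirective = Pattern.compile("using\\s[A-Za-z0-9_.]*;");
        List<String> lines = List.of("using System;", "using System.Linq; using System;");
        // prints "2<TAB>using System;" then "1<TAB>using System.Linq;"
        stage2(stage1(lines, usingDirective)).forEach(System.out::println);
    }
}
```

The second job exists only to re-sort: map/reduce sorts by key, so swapping keys and values is the cheap way to get output ordered by count.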
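Map/Reduce – how it works

using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using Microsoft.Hadoop.MapReduce;

public class NamespaceMapper : MapperBase
{
    // Matches "using Some.Namespace;" directives. [A-Za-z] replaces the original
    // [A-z] range, which also matched the characters [ \ ] ^ and `.
    private static readonly Regex UsingDirective = new Regex(@"(using)\s[A-Za-z0-9_\.]*\;");

    // Override the map method: called once per input line.
    public override void Map(string inputLine, MapperContext context)
    {
        foreach (Match match in UsingDirective.Matches(inputLine))
        {
            // Just emit the namespace directive with a count of 1.
            context.EmitKeyValue(match.Value, "1");
        }
    }
}

public class NamespaceReducer : ReducerCombinerBase
{
    // Accepts each key and counts its occurrences.
    public override void Reduce(string key, IEnumerable<string> values, ReducerCombinerContext context)
    {
        // Sum the values instead of counting them, so the result stays correct
        // if this class also runs as a combiner that emits partial counts.
        long total = values.Sum(v => long.Parse(v));

        // Write back.
        context.EmitKeyValue(key, total.ToString());
    }
}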
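NamespaceReducer derives from ReducerCombinerBase, so it may also be run as a combiner on mapper output. Counting the values (values.Count()) and summing them agree only while every value is "1"; once a combiner has pre-aggregated, the reducer receives partial counts instead. The check below is a hypothetical Java illustration (CombinerCheck, countValues and sumValues are illustrative names, and the 3 + 2 split of one key across two mappers is made up).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical check: counting vs. summing values before and after a combiner.
final class CombinerCheck {

    // Number of values received (what values.Count() computes).
    static long countValues(List<Long> values) {
        return values.size();
    }

    // Sum of the values received, which preserves partial counts.
    static long sumValues(List<Long> values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up example: one key emitted 3 times by mapper A and 2 times by mapper B.
        List<Long> fromA = List.of(1L, 1L, 1L);
        List<Long> fromB = List.of(1L, 1L);

        // No combiner: the reducer sees all five 1s, and both functions agree.
        List<Long> raw = new ArrayList<>(fromA);
        raw.addAll(fromB);
        System.out.println(countValues(raw) + " " + sumValues(raw)); // prints "5 5"

        // With a combiner, each mapper pre-aggregates, so the reducer sees two partial values.
        List<Long> combinedByCount = List.of(countValues(fromA), countValues(fromB));
        List<Long> combinedBySum = List.of(sumValues(fromA), sumValues(fromB));
        System.out.println(countValues(combinedByCount) + " " + sumValues(combinedBySum)); // prints "2 5"
    }
}
```

Summing is associative and commutative, which is what makes a function safe to reuse as a combiner; counting the incoming values is not.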
Offering
• ODBC for Excel
• PowerPivot
• Windows Server or Windows Azure
• C#, Java, JavaScript
Questions?
Anton Boyko
boyko.ant@live.com