450 likes | 716 Views
Big Data in Excel. “Big Data is defined as anything that doesn’t fit in a spreadsheet.”. –Anon. Map-Reduce. Map : Run one calculation over an entire column Filter: Select which rows to keep Reduce: Combine an entire, filtered column into a result. Map. =LTV * 2. Map-Reduce.
E N D
“Big Data is defined as anything that doesn’t fit in a spreadsheet.” –Anon.
Map-Reduce • Map: Run one calculation over an entire column • Filter: Select which rows to keep • Reduce: Combine an entire, filtered column into a result
Map =LTV * 2
Map-Reduce • Map: Run one calculation over an entire column • Filter: Select which rows to keep • Reduce: Combine an entire, filtered column into a result
Filter =IF(LTV > 0, LTV, 0)
Map-Reduce • Map: Run one calculation over an entire column • Filter: Select which rows to keep • Reduce: Combine an entire, filtered column into a result
Reduce =AVERAGE(LTV)
100,000 rows • Data still fits in Excel, but Excel will start to slow down • Best practices: • Start with a sample • Operate on columns (Map, Filter) • Aggregate Results (Reduce) • Alternative: Pivot Tables
100,000 rows • Data still fits in Excel, but Excel will start to slow down • Best practices: • Start with a sample • Operate on columns (Map, Filter) • Aggregate Results (Reduce) • Alternative: Pivot Tables
10,000,000 rows • Data no longer fits in Excel (but it’s still not too big) • For viewing: store data outside of Excel • SQL database + PowerQuery • Filter data, run calculations inside spreadsheet • For modeling: Run calculations outside of Excel • Python
10,000,000 rows • Data no longer fits in Excel (but it’s still not too big) • For viewing: store data outside of Excel • SQL database + PowerQuery • Filter data, run calculations inside spreadsheet • For modeling: Run calculations outside of Excel • Python
10,000,000 rows • Data no longer fits in Excel (but it’s still not too big) • For viewing: store data outside of Excel • SQL database + PowerQuery • Filter data, run calculations inside spreadsheet • For modeling: Run calculations outside of Excel • Python
1,000,000,000 rows • Power Pivot • Can work with as much data as will fit in RAM(~1B rows) • Power BI • Visualize huge data sets
1,000,000,000 rows • Power Pivot • Can work with as much data as will fit in RAM(~1B rows) • Power BI • Visualize huge data sets
1,000,000,000 rows • Power Pivot • Can work with as much data as will fit in RAM(~1B rows) • Power BI • Visualize huge data sets
100,000,000,000 rows • For viewing: store data in Hadoop • Azure + Power Query • For modeling: Rook • Explore and Analyze Hadoop datadirectly from Excel
100,000,000,000 rows • For viewing: store data in Hadoop • Azure + Power Query • For modeling: Rook • Explore and Analyze Hadoop datadirectly from Excel
100,000,000,000 rows • For viewing: store data in Hadoop • Azure + Power Query • For modeling: Rook • Explore and Analyze Hadoop datadirectly from Excel
Rook • Explore your company’s entire datastore in Excel • We’ll support Hadoop, SQL, MongoDB, etc. • Good from one million rows to one trillion+ rows • Manipulate the entire data set with Excel formulas • Get results directly in your spreadsheet
We’re looking for beta users. • Interested? Contact us at info@datanitro.com. • Thanks for listening!