Big Data and the BI Wild West Don’t Bring an Elephant to a Gun Fight! Paul Groom
Tools Processes Objectives
Why Business Intelligence? Community Acquire View Learn Action
What is Business Intelligence? Numbers Tables Charts Indicators Time - History - Lag Access - to view (portal) - to data - to depth - Control/Secure Consumption - digestion …with ease and simplicity
Business [Intelligence] Desires More timely Lower latency Richer data model More granularity More user interactions Self service
Got mobile? 200 million employees bring their own device to work. 50% of companies that allow BYOD have had a security breach. Nearly half of the workforce will be made up of millennials by 2020. 1/3 have broken or would break corporate policy on BYOD.
Disruption: Data Discovery tools Dynamic access Drill unlimited
BI tools have plateaued …again Decision Support (Reporting) in late 90’s …led to data mining Business Intelligence of 00’s …leading to analytics and data science
More math …a lot more math
The drive for deeper understanding Campaign Management Machine learning algorithms Dynamic Simulation Behaviour modelling Clustering Analytical Complexity Dynamic Interaction Statistical Analysis Fraud detection Reporting & BPM Technology/Automation
Behind the numbers

create external script LM_PRODUCT_FORECAST environment rsint
receives ( SALEDATE DATE, DOW INTEGER, ROW_ID INTEGER, PRODNO INTEGER, DAILYSALES INTEGER )
partition by PRODNO order by PRODNO, ROW_ID
sends ( R_OUTPUT varchar )
isolate partitions
script S'endofr(
# Simple R script to run a linear fit on daily sales
prod1<-read.csv(file=file("stdin"), header=FALSE,row.names=1)
colnames(prod1)<-c("DOW","ID","PRODNO","DAILYSALES")
dim1<-dim(prod1)
daily1<-aggregate(prod1$DAILYSALES, list(DOW = prod1$DOW), median)
daily1[,2]<-daily1[,2]/sum(daily1[,2])
basesales<-array(0,c(dim1[1],2))
basesales[,1]<-prod1$ID
basesales[,2]<-(prod1$DAILYSALES/daily1[prod1$DOW+1,2])
colnames(basesales)<-c("ID","BASESALES")
fit1=lm(BASESALES ~ ID,as.data.frame(basesales))
forecast<-array(0,c(dim1[1]+28,4))
colnames(forecast)<-c("ID","ACTUAL","PREDICTED","RESIDUALS")

select sum(sales)
from sales_history
where year = 2006 and month = 5 and region=1;

select total_sales
from summary
where year = 2006 and month = 5 and region=1;

select Trans_Year, Num_Trans,
  count(distinct Account_ID) Num_Accts,
  sum(count( distinct Account_ID)) over (partition by Trans_Year order by Num_Trans) Total_Accts,
  cast(sum(total_spend)/1000 as int) Total_Spend,
  cast(sum(total_spend)/1000 as int) / count(distinct Account_ID) Avg_Yearly_Spend,
  rank() over (partition by Trans_Year order by count(distinct Account_ID) desc) Rank_by_Num_Accts,
  rank() over (partition by Trans_Year order by sum(total_spend) desc) Rank_by_Total_Spend
from (
  select Account_ID,
    Extract(Year from Effective_Date) Trans_Year,
    count(Transaction_ID) Num_Trans,
    sum(Transaction_Amount) Total_Spend,
    avg(Transaction_Amount) Avg_Spend
  from Transaction_fact
  where extract(year from Effective_Date)<2009
    and Trans_Type='D'
    and Account_ID<>9025011
    and actionid in (select actionid from DEMO_FS.V_FIN_actions where actionoriginid =1)
  group by Account_ID, Extract(Year from Effective_Date)
) Acc_Summary
group by Trans_Year, Num_Trans
order by Trans_Year desc, Num_Trans;

select dept, sum(sales)
from sales_fact
where period between date '01-05-2006' and date '31-05-2006'
group by dept
having sum(sales) > 50000;
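The R script on this slide is cut off right after the forecast array is declared, so its final steps are not visible. As a rough illustration of the visible idea only (day-of-week weights from medians, a de-seasonalised base series, a straight-line fit on row ID), here is a minimal Python sketch. The function names, the tuple layout `(dow, row_id, sales)`, the 0 to 6 day-of-week coding and the way the 28 forecast days are re-weighted are all assumptions made for this sketch, not the author's method.

```python
from statistics import median

def dow_weights(rows):
    """Median sales per day of week, scaled so the weights sum to 1."""
    by_dow = {}
    for dow, _row_id, sales in rows:
        by_dow.setdefault(dow, []).append(sales)
    medians = {dow: median(vals) for dow, vals in by_dow.items()}
    total = sum(medians.values())
    return {dow: m / total for dow, m in medians.items()}

def linear_fit(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def forecast_product(rows, horizon=28):
    """Forecast `horizon` further days for one product.

    Assumes rows are ordered by row_id, cover consecutive days,
    and include every day of the week at least once.
    """
    weights = dow_weights(rows)
    ids = [row_id for _dow, row_id, _sales in rows]
    # Divide out the day-of-week effect to get a base sales series.
    base = [sales / weights[dow] for dow, _row_id, sales in rows]
    intercept, slope = linear_fit(ids, base)
    last_dow, last_id, _ = rows[-1]
    out = []
    for step in range(1, horizon + 1):
        dow = (last_dow + step) % 7
        row_id = last_id + step
        # Assumption: re-apply the day-of-week weight to the fitted trend.
        out.append((row_id, (intercept + slope * row_id) * weights[dow]))
    return out
```

In the slide's setup the database partitions the input by PRODNO, so a function like `forecast_product` would be called once per product partition.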
It’s all about getting work done Tasks evolving: Used to be simple fetch of value Then was compute dynamic aggregate Now complex algorithms! Bottlenecks
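The first two task types can be contrasted with a small sketch: reading a pre-computed value from a summary table versus computing the aggregate on demand from detail rows. It uses Python's built-in `sqlite3` with an in-memory database. The table names echo the deck's own queries, but the rows and the function names are made up purely for illustration.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with invented detail rows and a summary table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table sales_history (year int, month int, region int, sales int)"
    )
    conn.executemany(
        "insert into sales_history values (?, ?, ?, ?)",
        [(2006, 5, 1, 120), (2006, 5, 1, 80), (2006, 5, 2, 50), (2006, 6, 1, 70)],
    )
    # Pre-computed summary: cheap to read, but fixed to dimensions chosen in advance.
    conn.execute(
        "create table summary as "
        "select year, month, region, sum(sales) as total_sales "
        "from sales_history group by year, month, region"
    )
    return conn

def fetch_precomputed(conn):
    """Simple fetch of a stored value."""
    sql = "select total_sales from summary where year = 2006 and month = 5 and region = 1"
    return conn.execute(sql).fetchone()[0]

def aggregate_on_demand(conn):
    """Dynamic aggregate computed from the detail rows at query time."""
    sql = "select sum(sales) from sales_history where year = 2006 and month = 5 and region = 1"
    return conn.execute(sql).fetchone()[0]
```

Both calls return the same number here. The difference is where the work happens: the summary must be rebuilt whenever a new slice is wanted, while the on-demand query scans detail rows every time.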
Time to influence Reaction – what? – potential value Action – opportunity - interaction BI is becoming democratized Time to influence
BI Wild West Data
Business [Intelligence] Desires in relation to Big Data More timely Lower latency Richer data model More granularity More user interactions Self service
Reports against the DW are just plain dull, boring even!
Hadoop ticks many but not all the boxes
Stomped on costs Made economics of scale practical
New economics = New attitude just grab and retain all data the data science team will dig into it later Talk to BI team about plugging into Hadoop – should be simple? Call IT: Why SQL so limited? No need to triage before storage No need to pre-process before storage i.e. no need to align to storage
Early bridge Building Early Hadoop integration tools
Wanted Dead or Alive The No SQL Posse SQL • The new bounty hunters: • Drill • Impala • Pivotal • Stinger
still …but Hadoop too slow for interactive BI …loss of train-of-thought
For once technology is on our side …oh and BTW RAM is cheap!
Hadoop is… Lots of these Hadoop inherently disk oriented Not so many of these Typically low ratio of CPU to Disk
Analytics needs low latency, no I/O wait
Analytical Platform Reference Architecture Application & Client Layer All BI Tools All OLAP Clients Excel Analytical Platform Layer Near-line Storage (optional) Reporting Persistence Layer Kognitio Storage Cloud Storage Hadoop Clusters Enterprise Data Warehouses Legacy Systems
Cognos SQL MDX
Reach out, actively select and pull back to consume
“No SQL” graduates to “not-only-SQL” SQL remains preferred data access language … for business community SQL can encapsulate other processing - in-line Python, R, Java etc. MPP everything – get more work done
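One way to picture SQL encapsulating other processing is a user-defined aggregate written in a general-purpose language and called from a plain SQL query. The sketch below uses SQLite through Python's standard `sqlite3` module as a stand-in. It is an analogy only and does not show the external-script mechanism from the earlier slide. The aggregate name `py_median`, the `daily` table and the helper function are hypothetical.

```python
import sqlite3
from statistics import median

class Median:
    """Aggregate that SQLite calls once per row (step) and once at the end (finalize)."""

    def __init__(self):
        self.values = []

    def step(self, value):
        if value is not None:
            self.values.append(value)

    def finalize(self):
        return median(self.values) if self.values else None

def median_sales_by_dow(rows):
    """Load (dow, dailysales) rows and return the median sales per day of week."""
    conn = sqlite3.connect(":memory:")
    # Register the Python class so SQL can call it like a built-in aggregate.
    conn.create_aggregate("py_median", 1, Median)
    conn.execute("create table daily (dow integer, dailysales integer)")
    conn.executemany("insert into daily values (?, ?)", rows)
    query = "select dow, py_median(dailysales) from daily group by dow order by dow"
    return conn.execute(query).fetchall()
```

The business user still writes SQL. The Python logic stays behind a function name, and the `group by` decides how the rows are partitioned before the code runs.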
Discovery Production
Big Data + Hadoop + in-memory for BI
Wild West 1865 to 1890 "The Significance of the Frontier in American History" (1893), a thesis by Frederick Jackson Turner. The West not as a particular geographic place, but a frontier process - as a series of Wests on a receding frontier line - the point where savagery meets civilization. For Turner, American history was largely a tale of people leaving settled areas for the frontier, and their struggle to survive in new lands.
connect contact kognitio.com Paul Groom Chief Innovation Officer paul.groom@kognitio.com kognitio.tel kognitio.com/blog Michael Hiskey VP, Marketing & Business Development michael.hiskey@kognitio.com twitter.com/kognitio linkedin.com/companies/kognitio Steve Friedberg - press contact MMI Communications steve@mmicomm.com tinyurl.com/kognitio youtube.com/kognitio Kognitio is a Platinum Sponsor of the Hadoop Summit – see us at booth #31 – center!