Business Intelligence: New Technologies, Methodology Implications, Enterprise Architecture and Control
New Technologies – “Big Data”

What is it?
• New buzzword everyone wants to talk about
• What does it mean? Simply, data sets large enough that they cannot easily be managed and analyzed using standard relational or OLAP toolsets
• Where does it apply? Born out of web traffic analysis, advertising targeting, and product suggestion; also scientific applications and language analysis

History & Technologies
• Historical BI/DW practices were driven by four variables: disk, memory, processor power, and licensing
• Google published a paper on a process called MapReduce: parallel processing in a highly distributed environment, with many relatively simple machines running MapReduce processes
• Hadoop was born as the Apache implementation of MapReduce (see the sketch after this slide)

Challenges
• Not ACID, and no SQL language support
• Legacy reporting tools do not understand these sources

NoSQL
• Key-value pairs
• Document model
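As a concrete illustration of the MapReduce idea above, here is a minimal word-count sketch in plain Python. Hadoop distributes these same phases across many machines; this single-process version (all names invented for illustration, not Hadoop's actual API) only shows the map/shuffle/reduce shape of the computation.

```python
# A minimal single-process sketch of the MapReduce pattern (word count).
# Hadoop runs these phases in parallel across many simple machines;
# everything here is illustrative, not Hadoop's actual API.
from collections import defaultdict

def map_phase(document):
    # Map: emit a (key, value) pair -- ("word", 1) -- for every word
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    # Shuffle/sort: group all emitted values by key
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Reduce: combine all values for one key into a single result
    return key, sum(values)

documents = ["big data is big", "data about data"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'big': 2, 'data': 3, 'is': 1, 'about': 1}
```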
New Technologies – “Big Data” Vendors

• Legacy versus new challengers (commercial/open source)
• Legacy data warehouse vendors: Oracle, IBM, Microsoft, Teradata, Netezza
• Many new entrants
• Cloud based: Amazon Elastic MapReduce (EMR)
• Hadoop meets SQL: NuoDB, Cloudera, Cassandra, Accumulo, MS PolyBase
• NoSQL: http://nosql-database.org/ – MongoDB, CouchDB, RavenDB (a document-model sketch follows below)
• Basic question: do you build an infrastructure with multiple BI platforms (relational/OLAP and Hadoop), or wait for one of the legacy vendors to supply enough Hadoop functionality in its core offering to suffice?
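To make the document model (versus key-value pairs) concrete, here is a hedged sketch using pymongo, MongoDB's Python driver. The connection string, database, collection, and field names are all assumptions for illustration, not from the slides.

```python
# A sketch of the NoSQL document model via pymongo (MongoDB's Python
# driver). The server URL, database, and field names are illustrative
# assumptions. Requires a running MongoDB server and `pip install pymongo`.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["web_analytics"]

# Unlike a relational row, a document nests structure and can vary in
# shape from one record to the next -- no fixed schema, no SQL
db.page_views.insert_one({
    "url": "/products/42",
    "visitor": {"id": "abc123", "country": "US"},
    "tags": ["campaign-7", "mobile"],
})

# Query by a nested field without a join
for view in db.page_views.find({"visitor.country": "US"}):
    print(view["url"])
```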
New Technologies – In Memory Analytics

What is it?
• Full (or targeted subsets of) the data set held in system memory
• Moving from appliance based to cloud based
• Initially was analytics focused: self-service analytics using disparate data sources, no ETL, no central data architecture control
• Intended to be high performance (see the sketch after this slide)
• Beginning to spread into the transactional/relational space
• Becoming mainstream technology

History & Technologies
• Enterprise deployed – small and mid-size enterprise: QlikView was a pioneer in this space; Tableau
• Cloud based: SAP HANA (from SAP or Amazon, billed per hour of use)
• Oracle TimesTen
• Microsoft SQL Server 2014
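As a loose, vendor-neutral illustration of the in-memory pitch, here is a small pandas sketch (pandas and all data below are my assumption; none of the vendors above are pandas based). With the whole data set in RAM, users can join and aggregate disparate sources ad hoc, which is exactly why both the speed appeal and the loss of central architecture control arise.

```python
# A vendor-neutral sketch of in-memory analytics using pandas; all data
# and column names are invented. The full data set sits in RAM, so ad
# hoc joins and aggregations run with no separate ETL pipeline.
import pandas as pd

sales = pd.DataFrame({
    "region":  ["East", "West", "East", "West"],
    "product": ["A", "A", "B", "B"],
    "revenue": [100, 150, 200, 50],
})
targets = pd.DataFrame({"region": ["East", "West"], "target": [250, 250]})

# Join and aggregate entirely in memory -- the "no ETL" appeal, and also
# how uncontrolled joins can bypass the central data architecture
summary = (sales.groupby("region", as_index=False)["revenue"].sum()
                .merge(targets, on="region"))
summary["pct_of_target"] = summary["revenue"] / summary["target"]
print(summary)
```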
New Technologies – BI in the Cloud

Traditional Vendors
• The “Cloud” has many definitions: virtual machines versus “cloud” processes
• Major players:
• Microsoft: Azure; SQL Server progressively moving to Azure, slowly adding options for higher performance; Reporting Services / HDInsight; Oracle in Azure… wait… what?
• Amazon Web Services: (almost) everyone is welcome – Microsoft, Oracle, SAP HANA, NoSQL, Hadoop; largely virtual machine based
• SAP: HANA

Cloud-Based BI – New Entrants
• Cloud-only deployment
• SaaS pricing
• Typically full life-cycle solutions: ETL and reporting
• Vendors: Birst, DOMO, GoodData, Indicee, Jaspersoft
New Technologies – Data Services in the Cloud

New Uses
• Disaster recovery: tight integration of local SQL engines and cloud-based failovers
• Backup and restore using the cloud
• Complex event processing: Microsoft StreamInsight (see the sketch below)
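StreamInsight itself is a .NET engine; as a language-neutral sketch of the complex-event-processing idea it implements, here is a minimal Python example (entirely my construction, not StreamInsight's API) that watches a sliding window over an event stream and emits an alert when the window average crosses a threshold.

```python
# A hypothetical sketch of the complex event processing (CEP) pattern:
# evaluate a sliding window over a live event stream and emit derived
# "alert" events. This is not the StreamInsight API, which is .NET.
from collections import deque

def detect_spikes(events, window_size=3, threshold=100.0):
    window = deque(maxlen=window_size)  # holds the most recent events
    for value in events:
        window.append(value)
        if len(window) == window_size:
            avg = sum(window) / window_size
            if avg > threshold:
                yield avg               # derived event: threshold breach

# Example: a stream of per-second transaction volumes
stream = [90, 95, 102, 130, 140, 150, 80]
for alert in detect_spikes(stream):
    print(f"alert: 3-event average {alert:.1f} exceeded 100.0")
```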
New Technologies – Methodology Impacts – Big Data

“We need BIG DATA!”
• Sometimes more is not better
• John Snow's cholera map: the cause of a particular cholera epidemic, as well as the general concept of infectious disease, was discovered by analyzing just 620 data points
• Infection locations plotted on a map of London clustered in a particular area; initial analysis pointed toward water pumps in the vicinity. A confirming data point was that the local monks, who drank only beer rather than pump water, were not infected.
• A “Big Data” project might have compiled and analyzed all infection locations worldwide, integrated with all activities performed by those individuals; that analysis would likely have been swamped by noise
• Avoid the temptation to push for bigger and bigger data sets without a clear objective in mind and some scientific reasoning as to why more will be better
• Make sure a limited-scope data set is also an option for analysis when looking for specific causation
• Consider the role of Data Scientist within the organization
New Technologies – Methodology Impacts – In Memory

“Who needs a data warehouse anymore?”
• I blame QlikView for the above statement
• Cloud-based BI tools are heading down the same road. I'm looking at you, SAP.
• Vendors perceived a market opportunity to gain customers by claiming in-memory technology allowed for the elimination of costs related to data architecture and ETL development
• Statements you may hear:
• “I don't need good data architecture because the speed will make up for inefficiencies in joins or storage of the data.”
• “The users want the flexibility to join to any data source at any time. ETL just slows us down.”
• The above runs contrary to another concept that is (finally) gaining traction: Master Data Management
New Technologies – Methodology Impacts – In Memory

“Who needs a data warehouse anymore?”
• Observations
• Results are very mixed
• It is very hard to maintain a proper Data Governance/MDM process
• The best results I've seen have involved the use of in-memory tools on top of quality data mart/warehouse environments
• There is no free lunch; buying more memory or more virtual servers will only take you so far
• BUT, there is some merit here
• Pure speed does give you options
• We see utility in prototyping new additions to the formal data warehouse structure, or in giving users some room to roam from the base
• Data Governance needs to maintain control
New Technologies – Architecture & Control

“Beware the Zombie Clouds!”
• Clouds are the new flash drives with regard to data control and security
• There are a million new low-cost Software-as-a-Service options on the market
• No or low up-front adoption costs
• Can be initiated by the user/business side of the enterprise, as well as by IT personnel outside the data governance process
• Many are designed to quickly accept your data and make it easily accessible to an audience (which you don't control or might not even know about)
• Some offer Single Sign-On, but it is not required
• Some might be quickly abandoned, leaving data in a zombie state in perpetuity
• Data timeliness and provenance become very suspect
New Technologies – Contact Info

Paul Dausman
pdausman@valordevelopment.com
twitter: @pdausman
www.valordevelopment.com
www.valianthealth.com
www.techweuse.com