Optigrise Technology Solutions LLC, New Jersey: the world's largest professional community helping business people with up-to-date digital transformation services. Optigrise Technology is specifically designed to help enterprises succeed in their digital transformation by re-imagining their businesses to generate growth with cost efficiency and business agility.
Data Warehouse & Business Intelligence (DW/BI) – Optigrise Technology
Digital SBU – Focus Areas
• Cloud Services: Cloud Consulting • Cloud Architecture • Cloud Migration • Cloud Native Dev • Cloud Testing & Ops
• AI & Cognitive Services: AI Consulting • Data Science & Machine Learning • Conversational AI, NLP, Chatbot/Virtual Agents • Voice, Speech & Video
• Data, Analytics & Insights Services: Data Strategy, Consulting & Architecture
• Data Warehouse & Business Intelligence: Operational Databases, OLTP • Data Warehouse, OLAP & Data Mart • Business Intelligence • ETL • MDM/Master Data Management
• Big Data & Analytics: Modern Data Warehouse, DWaaS • Big Data & Analytics, Big Data on cloud, Data Migration • Data Visualization • Data Ops, Data Integration & ELT
• Digital Integration Services: Digital Integration Architecture • API Gateway • Microservice • EAI and SOA • DevOps
Consulting Services • Engineering Services • Professional Services
Data Warehouse & Business Intelligence: Operational Databases • Data Warehouse & Data Mart • Business Intelligence • ETL • MDM/Master Data Management
“Data could be your biggest asset, data could be your biggest challenge.”
Data, Analytics & Insight is fueling digital transformation.
“The goal is to turn data into information, and information into insight.” – Carly Fiorina, former executive, president, and chair of Hewlett-Packard Co.
• The global Data Warehouse market is projected to reach $35 billion by 2025 (currently $20B).
• The Analytics market is expected to reach $71.1 billion by 2022.
DW & BI – How are our data services & solutions different from others?
Typical Approach
• Siloed approach – Separate tools and processes for different teams.
• Separate pipelines – Separate pipelines/data flows between traditional data engineering, big data & ML teams.
• Focus on data science only – While AI and predictive analytics can solve many use cases, organizations still have huge amounts of data in relational & structured form. They should continue to have a strong DW/BI strategy.
[Diagram: Data Science & Machine Learning; Reporting & Visualization; Business Intelligence; Data Warehouse; Big Data*]
Our Approach
• Unified approach – Unified tools & processes.
• Unified pipeline – A single pipeline from data ingestion and data preparation to visualization for traditional DW/BI, big data, AI & other analytics.
• Balanced approach – A balance between traditional DW/BI, big data & AI.
• DataOps – Bringing DevOps & Agile principles to data projects.
• Cost optimization – Cost savings on DW/BI, so that the additional savings can be spent on AI & big data.
[Diagram: Data Strategy; Reporting & Visualization; Data Science & Machine Learning; Business Intelligence; Big Data & Analytics; Data Warehouse; all built on a Strong Data Foundation]
DW & BI – How business analytics solutions have changed over time
• Business analytics 10 years ago: a simple 3-layer stack – BI (Business Intelligence), DW (Data Warehouse), ETL.
• Business analytics now: Self-Service Analytics • Mobile BI • BI (Business Intelligence) • Visualization • Time Series Processing • Spatial Processing • Machine Learning • Graph Processing • Core BI • Big Data Analytics • Modern Data Warehouse • Hybrid Cloud • DW (Data Warehouse) • Data Lake • ETL • ELT, DataOps
Challenge: The number of tools has exploded in recent years. This poses a huge challenge for enterprises of all sizes.
Our Solution: We use a reference-architecture-based approach to map each client's unique needs to one of our proven DW/BI reference architectures.
DW/BI Vendors and Toolchain
• OLTP: Oracle • MS SQL Server • IBM Db2 • MySQL • PostgreSQL • NoSQLs • NewSQLs
• ETL: SSIS / SQL Server Integration Services • Talend • Informatica PowerCenter • IBM InfoSphere Information Server • Oracle Data Integrator • Ab Initio • Apache NiFi • SAS Data Integration Studio • SAP BusinessObjects Data Integrator
• Data Warehouse: Teradata • SQL Server DW, Azure SQL Data Warehouse • Oracle • SAP • Snowflake • AWS Redshift • Google BigQuery
• BI & Visualization: Tableau • QlikView • Power BI • SSRS • FusionCharts • D3.js
• MDM: Informatica • IBM • Oracle
DW & BI – Our Offerings
• DW/BI architecture: Data strategy • Data consulting • DW/BI reference architecture • DataOps consulting • Data governance & quality strategy • Data security • Data archival
• Data warehouse design/build: Data warehouse schema & model design • Data warehouse build/test • Data warehouse performance optimization • Cloud data warehouse and migration • Modern data warehouse design with AI & big data analytics • Spark Machine Learning
• ETL design/build/test: ETL pipeline design • ETL build and test • Manage/monitor ETL batch jobs • No-ETL/streaming ETL design & build on Kafka or other messaging platforms • ELT design/build to data lake • Cloud/SaaS ETL design/build
• Business Intelligence: BI & analytics design • Dimensional modeling, OLAP cube design • Self-service analytics • BI test
• Visualization & Reporting: Visualization & graph design/build • Reporting design/build • Dashboards design/build • Testing
• DataOps: Traditional ETL tools (Talend, SSIS) • Big data/data lake related ELT tools • Data integration tools (StreamSets, Alteryx) development
• Others: GDPR, HIPAA and data privacy consulting • Data archival
Our Tech Expertise
• ETL / Data Movement: Talend, Stitch, SSIS, Informatica, AWS Glue, Azure Data Factory
• Data Warehouse: Teradata Vantage, MS SQL Server Data Warehouse, Oracle DW, IBM Db2 DW
• Business Intelligence, Visualization & Dashboards: Talend, Power BI, Qlik
DW/BI architecture – very small organizations
• No Staging Area: In very small organizations and POCs, the data warehouse often does not have a separate ‘Staging Area’. Data from operational systems is moved directly to the data warehouse.
• No Data Mart: Analytics/visualization/reporting apps query the data warehouse directly.
DW/BI architecture – small-sized organizations
• Staging Area: Data from operational systems is moved to a staging area and later moved to the data warehouse.
• No Data Mart: Analytics/visualization/reporting apps query the data warehouse directly.
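To make the staging step concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for both the staging area and the warehouse. The table names (stg_orders, dw_orders) and the cleansing rules are hypothetical, chosen only for illustration; in practice the staging area usually lives in a separate schema or database and the rules come from the business.

```python
import sqlite3


def load_via_staging(raw_rows):
    """Land raw rows in a staging table, clean them there, then load the warehouse."""
    con = sqlite3.connect(":memory:")
    # Staging area: accepts data exactly as extracted (everything TEXT, no constraints).
    con.execute("CREATE TABLE stg_orders (order_id TEXT, customer TEXT, amount TEXT)")
    # Warehouse table: typed and constrained.
    con.execute(
        "CREATE TABLE dw_orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT NOT NULL, amount REAL NOT NULL)"
    )
    con.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)", raw_rows)
    # Cleanse while moving staging -> warehouse: trim and upper-case names, cast types,
    # and skip rows with a blank customer or an amount that does not start with a digit.
    con.execute(
        "INSERT INTO dw_orders (order_id, customer, amount) "
        "SELECT CAST(order_id AS INTEGER), UPPER(TRIM(customer)), CAST(amount AS REAL) "
        "FROM stg_orders "
        "WHERE TRIM(customer) <> '' AND amount GLOB '[0-9]*'"
    )
    # The staging area is transient: clear it once the load succeeds.
    con.execute("DELETE FROM stg_orders")
    con.commit()
    result = con.execute(
        "SELECT order_id, customer, amount FROM dw_orders ORDER BY order_id"
    ).fetchall()
    con.close()
    return result


raw = [("1", " acme ", "100.5"), ("2", "", "20"), ("3", "globex", "n/a")]
print(load_via_staging(raw))  # [(1, 'ACME', 100.5)]
```

Keeping the raw rows in an unconstrained staging table means a bad source record never blocks the extract; it is simply filtered out on the way into the typed warehouse table.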
DW/BI architecture – medium & large organizations
• Uses a Staging Area.
• Data Marts: Departmental data marts based on business/subject area. BI/visualization tools access data mart data, not the raw data in the data warehouse.
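A departmental data mart can be thought of as a subject-area slice of the warehouse, often pre-aggregated, that BI tools query instead of the detailed tables. The sketch below assumes a hypothetical dw_sales table and department names, and again uses sqlite3 only as a stand-in for a real warehouse engine.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Enterprise data warehouse: detailed facts for every department.
con.execute("CREATE TABLE dw_sales (dept TEXT, region TEXT, month TEXT, amount REAL)")
con.executemany(
    "INSERT INTO dw_sales VALUES (?, ?, ?, ?)",
    [
        ("retail", "east", "2024-01", 100.0),
        ("retail", "east", "2024-01", 50.0),
        ("retail", "west", "2024-01", 70.0),
        ("wholesale", "east", "2024-01", 400.0),
    ],
)

ALLOWED_DEPTS = {"retail", "wholesale"}  # hypothetical subject areas


def build_mart(con, dept):
    """Materialize a departmental data mart: one subject area, pre-aggregated."""
    if dept not in ALLOWED_DEPTS:  # table names cannot be bound as SQL parameters
        raise ValueError(f"unknown department: {dept}")
    table = f"mart_{dept}"
    con.execute(f"DROP TABLE IF EXISTS {table}")
    con.execute(f"CREATE TABLE {table} (region TEXT, month TEXT, total REAL)")
    con.execute(
        f"INSERT INTO {table} "
        "SELECT region, month, SUM(amount) FROM dw_sales "
        "WHERE dept = ? GROUP BY region, month",
        (dept,),
    )
    return table


mart = build_mart(con, "retail")
# BI/reporting tools query the mart, never the raw warehouse table.
print(con.execute(f"SELECT region, total FROM {mart} ORDER BY region").fetchall())
# [('east', 150.0), ('west', 70.0)]
```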
DW/BI architecture – very large organizations
Often called the “three-tier DW/BI architecture” (Tier 1, Tier 2, Tier 3).
• Uses a Staging Area.
• Departmental data marts based on business/subject area.
• OLAP Servers: OLAP cubes used for dimensional modeling.
• BI/visualization tools access data mart data, not the raw data in the data warehouse.
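The value of an OLAP cube is that measures are pre-aggregated along every combination of dimensions, so a question like "sales of TVs across all regions" becomes a lookup instead of a scan. The pure-Python sketch below builds a full cube over two hypothetical dimensions (product, region); it only illustrates the idea, and real OLAP servers use far more sophisticated storage and partial aggregation.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(facts, dims, measure):
    """Pre-aggregate `measure` for every combination of dimensions (a full cube).

    '*' marks a dimension that has been rolled up (aggregated away).
    """
    cube = defaultdict(float)
    for row in facts:
        for r in range(len(dims) + 1):
            for kept in combinations(dims, r):
                key = tuple(row[d] if d in kept else "*" for d in dims)
                cube[key] += row[measure]
    return dict(cube)


facts = [
    {"product": "tv", "region": "east", "sales": 100.0},
    {"product": "tv", "region": "west", "sales": 50.0},
    {"product": "phone", "region": "east", "sales": 30.0},
]
cube = build_cube(facts, ["product", "region"], "sales")
print(cube[("tv", "*")])    # 150.0 (all regions)
print(cube[("*", "east")])  # 130.0 (all products)
print(cube[("*", "*")])     # 180.0 (grand total)
```

A full cube grows with the product of dimension cardinalities, which is why production engines typically materialize only selected aggregations.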
Operational databases – Relational, NoSQL, Time series, Graph…
Polyglot persistence (Data Continuum)
• Relational / RDBMS: Microsoft SQL Server • Oracle • IBM Db2 • MySQL • MariaDB • Sybase
• Object Relational: PostgreSQL • DB4o
• Specialized Databases: AWS Quantum Ledger Database/QLDB (blockchain database) • Spatial Database • GIS Database
• Key-Value Store/Cache: Redis • Memcached • Amazon DynamoDB (Cloud) • Azure Cosmos DB (Cloud) • Aerospike • Riak • Oracle Berkeley DB
• Graph/RDF Database: Neo4j • TinkerPop/Gremlin • AWS Neptune (Cloud) • Azure Cosmos DB w/ Gremlin API • JanusGraph • RDF Stores
• Document Database: MongoDB • AWS DynamoDB (Cloud) • Couchbase/CouchDB • Azure Cosmos DB (Cloud) • GCP Datastore (Cloud) • RavenDB • IBM Cloudant (Cloud)
• Search: Elasticsearch • Solr • MarkLogic • Amazon CloudSearch (Cloud) • Azure Search (Cloud)
• Time Series Database: InfluxDB • Prometheus • Amazon Timestream (Cloud)
• Wide Column Store: Cassandra • HBase • Azure Cosmos DB w/ Cassandra API (Cloud) • Google Cloud Bigtable (Cloud)
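Polyglot persistence means each workload uses the store that suits it. A common pairing is a key-value cache in front of a relational system of record (the "cache-aside" pattern). In this minimal sketch a plain dict with a TTL stands in for Redis/Memcached and sqlite3 stands in for the RDBMS; the class and key names are hypothetical.

```python
import sqlite3
import time


class CacheAsideRepo:
    """Serve reads from a key-value cache and fall back to the relational store on a miss."""

    def __init__(self, ttl_seconds=60.0):
        self.db = sqlite3.connect(":memory:")  # system of record (RDBMS)
        self.db.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
        self.cache = {}  # stand-in for Redis/Memcached: key -> (expires_at, value)
        self.ttl = ttl_seconds
        self.db_reads = 0  # number of round trips to the relational store

    def add(self, cid, name):
        self.db.execute("INSERT INTO customer VALUES (?, ?)", (cid, name))
        self.db.commit()
        self.cache.pop(f"customer:{cid}", None)  # invalidate any stale cached entry

    def get(self, cid):
        key = f"customer:{cid}"
        entry = self.cache.get(key)
        if entry is not None and entry[0] > time.monotonic():
            return entry[1]  # cache hit (misses for unknown ids are cached too)
        self.db_reads += 1
        row = self.db.execute("SELECT name FROM customer WHERE id = ?", (cid,)).fetchone()
        value = row[0] if row else None
        self.cache[key] = (time.monotonic() + self.ttl, value)
        return value


repo = CacheAsideRepo()
repo.add(1, "Acme")
repo.get(1)
repo.get(1)
print(repo.db_reads)  # 1: the second read was served from the cache
```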
Data Warehouse / DW (also called Enterprise Data Warehouse / EDW)
Our technology expertise & focus in DW/EDW technologies
• Microsoft – SQL Server DW (on premises), Azure SQL Data Warehouse (cloud)
• Teradata – Teradata Vantage
• Oracle
• AWS – Redshift, Redshift Spectrum
• Snowflake – Cloud-hosted Data Warehouse as a Service (DWaaS)
• Google Cloud – Google Cloud BigQuery
• IBM – IBM Db2 Data Warehouse
• Neo4j – Neo4j Graph Database
Data Warehouse Categorization & Trends
All major DW vendors are coming up with services around next-gen, AI- and big-data-enabled data warehouses.
• Traditional Data Warehouse (Oracle DW, SQL Server DW, Teradata): Columnar storage • High-performance, optimized query engine • Secure, strong toolset • SQL support, ACID compliant, enterprise grade • Analytical functions
• Data Lake (Hadoop HDFS, S3, Azure Blob Storage, Azure Data Lake, Databricks Lake): Flexible schema • Capability to store & analyze unstructured, semi-structured & structured data • SQL on unstructured data • Unlimited/elastic storage • Cheap storage & compute
• Modern Data Warehouse (Amazon Redshift, Snowflake, Azure SQL Data Warehouse, Google BigQuery): Massively Parallel Processing (MPP) • Separate storage & compute layers for scale & flexibility • Ability to store/analyze semi-structured data • Cheap storage
• Next Gen Data Warehouse (Teradata Vantage, SQL Server 2019 Data Warehouse, IBM Db2 Data Warehouse / Db2 DW on Cloud, Oracle Autonomous DW): MPP – Massively Parallel Processing • Separate storage & compute layers • SQL on unstructured data • ML in the database • Unified analytical platform with support for big data, ML, graph, time series, spatial • Support for R/Python/Scala and Spark in the core engine • Typically runs hybrid cloud
DW & BI – Traditional Data Warehouse, Data Lake, Next Gen Data Warehouse
• Traditional Data Warehouse: Columnar storage • High-performance, optimized query engine • Secure, strong toolset • SQL support, ACID compliant, enterprise grade • Analytical functions
• Data Lake: Flexible schema • Capability to store & analyze unstructured, semi-structured & structured data • SQL on unstructured data • Unlimited/elastic storage • Cheap storage & compute
• Next Gen Data Warehouse: MPP – Massively parallel processing • Separate storage & compute layers • SQL on unstructured data • Unified analytical platform with support for big data, ML/AI, graph, time series, spatial • Support for R/Python/Scala and Spark in the core engine
• Modern Data Warehouse: Massively Parallel Processing (MPP) • Separate storage & compute layers for scale & flexibility • Ability to store/analyze semi-structured data • Cheap storage
• AI/ML: Machine learning algorithms • Support for R/Python/Scala in the core engine • Graph processing • Time series • Spatial support
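Columnar storage is listed as a core DW feature because analytical queries usually touch a few columns of many rows. The sketch below models the difference by counting values read: a row layout reads every field of every row, while a column layout reads only the column being aggregated. It also shows run-length encoding, one reason repetitive columns compress well. This is a simplified counting model over made-up order data, not a measurement of any real engine's I/O.

```python
rows = [
    {"order_id": 1, "region": "east", "amount": 100.0},
    {"order_id": 2, "region": "east", "amount": 50.0},
    {"order_id": 3, "region": "west", "amount": 25.0},
]


def to_columnar(records):
    """Pivot row-oriented records into one list per column."""
    columns = {name: [] for name in records[0]}
    for record in records:
        for name, value in record.items():
            columns[name].append(value)
    return columns


def sum_rowstore(records, col):
    """Row layout: every field of every row is read to get at one column."""
    touched, total = 0, 0.0
    for record in records:
        touched += len(record)
        total += record[col]
    return total, touched


def sum_colstore(columns, col):
    """Column layout: only the requested column is read."""
    values = columns[col]
    return sum(values), len(values)


def run_length_encode(values):
    """Compress runs of repeated values, which are common in sorted columns."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded


cols = to_columnar(rows)
print(sum_rowstore(rows, "amount"))       # (175.0, 9)
print(sum_colstore(cols, "amount"))       # (175.0, 3)
print(run_length_encode(cols["region"]))  # [['east', 2], ['west', 1]]
```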
Teradata Vantage – Bringing the power of AI and big data to the traditional DW
• High-performance SQL engine – A modern, next-gen NewSQL engine that improves query performance at scale.
• Multi-genre analytics – Built-in big data, AI and graph analytics engines.
• Supports machine learning – Supports R and Python in addition to SQL.
• Hybrid cloud solution – Available on premises, on public cloud (AWS, Azure) and on Teradata cloud.
SQL Server Data Warehouse / Azure SQL Data Warehouse
• Data Virtualization: Using PolyBase technology, the SQL Server engine can access data stored in other relational DBs (MySQL, Db2, Teradata, Oracle), NoSQL databases (MongoDB or Azure Cosmos DB), and big data platforms and data lakes (Hadoop HDFS, Cloudera and Spark).
• Integrated SQL and ML analysis engine: Can analyze data using the SQL engine, Spark, Spark Machine Learning and SQL Server ML Services.
• Big Data Clusters: Provides a scalable compute and storage engine based on Spark, embedded within the core database.
• Graph processing: Provides powerful graph processing on linked data.
• BI capabilities: BI capabilities with Power BI and Reporting Services.
• Analysis Engine: Dimensional modeling capabilities with support for OLAP cubes in addition to relational models.
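Data virtualization means querying data where it lives instead of copying it into the warehouse first. As a small analogy (this is SQLite's ATTACH DATABASE, not PolyBase), the sketch below joins two independent database files, standing in for a hypothetical CRM system and a sales system, from a third connection without moving any rows. The file, table and column names are illustrative assumptions.

```python
import os
import sqlite3
import tempfile


def _create_source(path, ddl, insert_sql, rows):
    """Create one standalone source database file (a hypothetical source system)."""
    con = sqlite3.connect(path)
    con.execute(ddl)
    con.executemany(insert_sql, rows)
    con.commit()
    con.close()


def federated_revenue_by_segment(workdir):
    crm_path = os.path.join(workdir, "crm.db")
    sales_path = os.path.join(workdir, "sales.db")
    _create_source(crm_path, "CREATE TABLE customer (id INTEGER PRIMARY KEY, segment TEXT)",
                   "INSERT INTO customer VALUES (?, ?)", [(1, "smb"), (2, "enterprise")])
    _create_source(sales_path, "CREATE TABLE orders (customer_id INTEGER, amount REAL)",
                   "INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 500.0), (2, 250.0)])

    # The "virtualization layer": attach both sources and join them in place.
    hub = sqlite3.connect(":memory:")
    hub.execute("ATTACH DATABASE ? AS crm", (crm_path,))
    hub.execute("ATTACH DATABASE ? AS sales", (sales_path,))
    result = hub.execute(
        "SELECT c.segment, SUM(o.amount) FROM sales.orders AS o "
        "JOIN crm.customer AS c ON c.id = o.customer_id "
        "GROUP BY c.segment ORDER BY c.segment"
    ).fetchall()
    hub.close()
    return result


with tempfile.TemporaryDirectory() as d:
    print(federated_revenue_by_segment(d))  # [('enterprise', 750.0), ('smb', 10.0)]
```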
Db2 DW – Spark & R analytics running within the core database engine
Our technology expertise & focus in ETL & Data Integration
• Informatica – PowerCenter, PowerExchange, Data Replication
• IBM – IBM InfoSphere Information Server, IBM InfoSphere Data Replication
• Microsoft – SQL Server Integration Services / SSIS (on premises), Azure Data Factory (cloud)
• Talend – Talend Open Studio, Talend Data Fabric, Talend Data Management Platform
• Oracle – Oracle Data Integration Platform Cloud, Oracle GoldenGate (OGG), Oracle GoldenGate Cloud, Oracle Data Integrator (ODI)
• Apache NiFi (open source)
Cloud only
• AWS Glue
• Alooma – now part of Google Cloud
• Panoply – both data integration and a lightweight data warehouse; cloud SaaS solution
• Stitch – lightweight solution
• Azure Data Factory
Talend ETL and Data Integration Platform
• Connects to anything via 900+ connectors and components.
• Manages data across all environments (multi-cloud and on-premises).
• Supports batch, real-time, streaming and big data use cases; supports Spark.
• Offers built-in machine learning, data quality and governance capabilities.
• Provides full API development lifecycle support.
• Supports on-premises and cloud-hosted integration platform as a service (iPaaS) solutions.
• Supports MDM via a centralized data catalog.
• Supports data quality – profile, clean and mask data in any format or size to deliver data you can trust for the insights you need.
• Has data cleansing and preparation features.
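"Profile, clean and mask" are three distinct data-quality steps. The sketch below shows what each one means on a tiny hypothetical dataset; it is plain Python for illustration only and does not reflect Talend's own components or API. The field names and the masking rule (keep the first character and the domain of an e-mail address) are assumptions.

```python
import re


def profile(records, field):
    """Basic column profile: null/blank count, distinct values, and value shapes."""
    values = [record.get(field) for record in records]
    present = [v for v in values if v not in (None, "")]
    patterns = {}
    for v in present:
        # Digits become 9 and letters become A, exposing the format of each value.
        shape = re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", str(v)))
        patterns[shape] = patterns.get(shape, 0) + 1
    return {"nulls": len(values) - len(present), "distinct": len(set(present)), "patterns": patterns}


def clean_phone(raw):
    """Cleanse: keep digits only; return None when nothing usable is left."""
    digits = re.sub(r"\D", "", raw or "")
    return digits or None


def mask_email(email):
    """Mask: keep the first character and the domain, hide the rest."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}" if domain else "***"


records = [{"phone": "555-1234"}, {"phone": "5551234"}, {"phone": ""}, {"phone": "555-1234"}]
print(profile(records, "phone"))
# {'nulls': 1, 'distinct': 2, 'patterns': {'999-9999': 2, '9999999': 1}}
print(mask_email("jane.doe@example.com"))  # j***@example.com
```

The profile exposes two competing phone formats, which is exactly the kind of finding that drives a cleansing rule such as clean_phone.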
Challenges in traditional ETL and the solution
[Diagram: “ETL hell” or integration spaghetti vs. clean streaming/messaging-based integration]
Challenge: More often than not, large enterprises have thousands of point-to-point ETL pipelines that integrate data from source systems, app databases and COTS/SaaS products into data warehouses and other systems. This causes what is called “ETL hell” or “integration spaghetti”, which is difficult to manage and operate and becomes a huge bottleneck for digital transformation. Traditional ETL is also not real time and cannot scale to cope with growing data volumes.
Solution: Streaming and messaging-based systems such as Kafka or Kinesis, or a message-bus-based architecture, can solve these problems. A pub/sub architecture removes the point-to-point integration spaghetti. Modern platforms like Kafka also scale extremely well and can handle real-time streaming data from various sources.
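The integration-spaghetti problem is largely arithmetic: in the worst case every source needs its own pipeline to every target (sources × targets links), while with a shared bus each system connects once (sources + targets). The sketch below shows that count and a toy in-process pub/sub bus. The bus is a hypothetical stand-in only; real platforms such as Kafka add durability, partitioning, consumer groups and replay, none of which is modeled here.

```python
from collections import defaultdict


def point_to_point_links(num_sources, num_targets):
    """Worst case: every source feeds every target through its own pipeline."""
    return num_sources * num_targets


def pub_sub_links(num_sources, num_targets):
    """With a shared bus, each system connects to it exactly once."""
    return num_sources + num_targets


class MessageBus:
    """Toy synchronous publish/subscribe bus (a stand-in for Kafka or Kinesis topics)."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        # The publisher does not know who consumes the event.
        for handler in self.subscribers[topic]:
            handler(event)


print(point_to_point_links(20, 15), pub_sub_links(20, 15))  # 300 35

bus = MessageBus()
warehouse, search_index = [], []
bus.subscribe("orders", warehouse.append)
bus.subscribe("orders", search_index.append)
bus.publish("orders", {"order_id": 1, "amount": 99.0})
```

Adding a new target then means one more subscription, not one more pipeline per source.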
Our technology expertise & focus in Business Intelligence & Analytics
• Tableau – Tableau on-premises and cloud products
• Microsoft – Power BI, SQL Server Reporting Services (SSRS)
• Qlik – QlikView
• SAS – SAS platform
• Looker – now part of Google Cloud
• MicroStrategy
• IBM – Cognos
• TIBCO – Spotfire
Power BI
• A business analytics service that delivers insights to enable fast, informed decisions.
• Can connect to all industry-standard data warehouses.
• Transform data into stunning visuals and share them with colleagues on any device.
• Visually explore and analyze data, on premises and in the cloud, all in one view.
• Collaborate on and share customized dashboards and interactive reports.
• Scale across your organization with built-in governance and security.
• Supports cloud and desktop versions.
Master Data Management (MDM)
Our technology expertise & focus in MDM
• Informatica: Informatica MDM, Informatica MDM Cloud
• IBM: IBM InfoSphere Master Data Management, IBM Master Data Management on Cloud
Data Security, Privacy & Compliance
• SOX (Sarbanes-Oxley Act), Section 302 (Corporate Responsibility for Financial Reports) – CEOs and CFOs must review all financial reports and certify that the reports are “fairly presented” and don't contain misrepresentations.
• SOX, Section 404 (Management Assessment of Internal Controls) – Requires companies to publish details about their internal accounting controls and their procedures for financial reporting as part of their annual financial reports.
• GDPR: Consent management • Right to be secured • Data minimization • Right to portability • Right to be informed • Right to be forgotten
• PCI DSS: Information security standard for organizations that handle branded credit cards from the major card schemes. Build and maintain a secure network and systems • Protect cardholder data • Maintain a vulnerability management program • Implement strong access control measures • Regularly monitor and test networks • Maintain an information security policy
• HIPAA: Patient health information (examples below) needs to be “protected”.
GDPR Solution
• Consent Management: Change the customer journey/UX and database so that requests for consent are simple to understand, clearly requested, and as easy to give as to withdraw. Opt-in marketing will replace opt-out marketing in the post-GDPR era.
• Right to be secured: All PII data must be secured by pseudonymization or encryption, whether at rest or in transit.
• Data minimization: Change the customer journey/UX and database so that personal data collected is “adequate, relevant, and kept no longer than necessary for the purposes for which the personal data are processed”. Outdated and irrelevant data must be eliminated.
• Right to portability: Customers have the right to export their PII data in an encrypted format, such that it can easily be imported into a different IT environment. This could have huge implications in big data ecosystems. For example, a customer could request to have their telematics data transferred from one insurance carrier to another.
• Right to be informed: In the post-GDPR world, customers will have the right to request and be shown how and why they were targeted for a specific marketing campaign.
• Right to be forgotten: Three fundamental aspects comprise the right to be forgotten. First, the customer has the right to “opt out” from receiving marketing communications. Second, customers have the right to have their PII marketing data anonymized. Last, in most instances, customers can refuse to be analyzed. That means that even if you lawfully collect the data, customers can still say no to profiling, e.g., having their data analyzed for preferences and buying behavior.
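One common way to pseudonymize PII for analytics is to replace direct identifiers with a keyed hash (HMAC), so records can still be joined on a stable token while the key is held separately. The sketch below uses Python's standard hmac and hashlib modules; the field list, key value and token column name are hypothetical, and a real deployment would fetch the key from a key vault or KMS and pair this with encryption for data that must remain recoverable.

```python
import hashlib
import hmac

PII_FIELDS = {"email", "name", "phone"}  # hypothetical list of direct identifiers


def pseudonymize(value, key):
    """Replace an identifier with a keyed HMAC-SHA256 token (64 hex characters).

    The same value and key always produce the same token, so pseudonymized
    tables can still be joined; without the key the token cannot be recomputed.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize_record(record, key):
    """Drop direct identifiers and keep a stable token in their place."""
    safe = {k: v for k, v in record.items() if k not in PII_FIELDS}
    safe["customer_token"] = pseudonymize(record["email"], key)
    return safe


key = b"demo-key-store-me-in-a-kms"  # hypothetical; real keys belong in a key vault/KMS
customer = {"email": "jane@example.com", "name": "Jane", "phone": "555-1234", "basket_value": 42.0}
safe = pseudonymize_record(customer, key)
print(sorted(safe))  # ['basket_value', 'customer_token']
```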
DW & BI – Future of DW/BI
• Modern next-gen data warehouses
• Cloud data warehouses
• Data warehouse + data lake based solutions
• AI & big data enabled data warehouse platforms
• No-ETL movement
• Messaging & streaming platforms for data integration
• Data virtualization
• SaaS/cloud ETL platforms
• ELT (Extract, Load, Transform) for big data workloads
• End-to-end DataOps
• Mobile BI
• Cloud-based BI solutions
• Self-service BI & analytics
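The difference between ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) is where the transformation runs. In the sketch below, the ETL function cleans and aggregates in application code before loading, while the ELT function loads the raw events untouched and transforms them with SQL inside the target engine. sqlite3 stands in for a cloud warehouse or big data engine, and the event layout is a hypothetical example; both paths produce the same result.

```python
import sqlite3

raw_events = [("2024-01-05", "EAST", "10"), ("2024-01-06", "east", "5"), ("2024-01-06", "West", "7")]


def etl(events):
    """Transform in application code first, then load only the cleaned result."""
    totals = {}
    for _, region, qty in events:
        key = region.lower()
        totals[key] = totals.get(key, 0) + int(qty)
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE region_totals (region TEXT, qty INTEGER)")
    con.executemany("INSERT INTO region_totals VALUES (?, ?)", sorted(totals.items()))
    return con.execute("SELECT region, qty FROM region_totals ORDER BY region").fetchall()


def elt(events):
    """Load raw data untouched, then transform with SQL inside the target engine."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE raw_events (day TEXT, region TEXT, qty TEXT)")
    con.executemany("INSERT INTO raw_events VALUES (?, ?, ?)", events)
    con.execute(
        "CREATE TABLE region_totals AS "
        "SELECT LOWER(region) AS region, SUM(CAST(qty AS INTEGER)) AS qty "
        "FROM raw_events GROUP BY LOWER(region)"
    )
    return con.execute("SELECT region, qty FROM region_totals ORDER BY region").fetchall()


print(etl(raw_events))  # [('east', 15), ('west', 7)]
print(elt(raw_events))  # [('east', 15), ('west', 7)]
```

ELT keeps the raw_events table around, so new transformations can be re-run later without re-extracting from the sources, which is one reason it suits big data workloads on scalable engines.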