1 / 35

Where Should My Data Live (and Why)?

Join Matt Gordon, Architect II at Insight Digital Innovation, as he discusses the factors and considerations in determining where your data should reside. Explore the pros and cons of cloud and on-premises deployment options, and learn about case studies to help inform your decision-making process.

miar
Download Presentation

Where Should My Data Live (and Why)?

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Matt Gordon, Architect II, Insight Digital Innovation Where Should My Data Live (and Why)?

  2. Speaker Info Matt Gordon Architect II Matt.Gordon@insight.com :@sqlatspeed www.sqlatspeed.com

  3. Evaluations Saturday feedback: https://forms.gle/rddzwGffw3qa8JZu5 Conference feedback: https://forms.gle/8e96WvDuTvLAP9SA6

  4. About Me – What I’ve Done • 15+ years of SQL Server experience • Microsoft Data Platform MVP • 2019 IDERA ACE • MCSE: Data Management and Analytics • Two-time PASS Summit speaker • Leader of Lexington, KY (USA) PASS Local Group

  5. About Me – Where I Live

  6. About Me – Where I Live (Kentucky, USA)

  7. How I Picked My Twitter Handle and Domain Name

  8. Agenda • Where Does Our Data Live Now? • Why Does Our Data Live Where It Does? • Cloud, On-Premises, or Both? • Case Studies • Wrap-up

  9. Definitions

  10. Discussion Points • Cloud is just somebody else’s computer in somebody else’s datacenter • Rapid development from cloud providers constantly expands options • Are you locked into deployment locations for certain platforms? • Database engine always on-premises • Hadoop always in cloud • Blending of technologies and platforms may/may not be the right answer

  11. Where Does Our Data Live Now?

  12. Where Does Our Data Live Now? • How many of us do not have a single data environment in the cloud? • How many of us have only test/dev/QA data environments in the cloud? • How many of us have a “trial” production data environment in the cloud? • How many of us have all production data environments in the cloud? • How many of us have all (or nearly all) data environments in the cloud?

  13. Why Does Our Data Live Where It Does?

  14. Why Does Our Data Live Where It Does? • Cost • Leveraging “investments” • Can cost less if uptime is not critical • Comfort level • “I can go see it” • Physical control and security • Data accessible even when all external telecom is down • Licensing • On-Premises Pros

  15. Why Does Our Data Live Where It Does? • Generally requires large up-front investment • Requires corresponding infrastructure • Rackspace, cooling, cabling, telecom, fire suppression, etc. • May require backup datacenter • Depends on uptime requirements • On-site personnel often needed to maintain operations • More expensive from a resource perspective • On-Premises Cons

  16. Why Does Our Data Live Where It Does? • Cost • Buy only what you need • Scalability (vertical and horizontal) • Global redundancy • Storage durability • Data availability from all locations • PaaS often satisfactory to government security audits/approvals • High availability and disaster recovery often built-in* • Cloud Pros

  17. Why Does Our Data Live Where It Does? • Can require robust Internet connectivity • VPN cost can be significant • Minimal to no control over underlying infrastructure • “Noisy neighbors” • Design apps to deal with connection hiccups more efficiently • Perception of lighter security • “Things happen by magic” • Cloud Cons

  18. Cloud, On-Premises, or Both?

  19. Cloud Deployment Options (Azure) • Mimics on-premises behavior but resources are on Azure • Full control of configuration • Full control of maintenance • MPP cloud-based, scale-out, relational database • Separates storage and compute • Can pause compute capacity when not needed • PaaS flavor of SQL Server database • Very limited control of maintenance • Limited control of configuration • Microsoft’s flavor is known as HDInsight • Used for semi-structured data • Can connect from database engine using PolyBase • SQL Server (IaaS) • Azure SQL DW • Azure SQL DB • Hadoop

  20. Cloud Deployment Options (Amazon) • Mimics on-premises behavior but resources are on Amazon EC2 • Full control of configuration • Full control of maintenance • Amazon equivalent of Azure SQL DW • Fully managed • Easily scalable • Amazon PaaS offering • Supports six database engines • Minimal configuration control • Amazon’s HDInsight equivalent is EMR • Supports traditional Hadoop tooling • Can connect from database engine using PolyBase • SQL Server on EC2 • Amazon Redshift • Amazon RDS • Hadoop

  21. Cloud Deployment Options (Google) • SQL Server on Google Cloud Platform is IaaS offering • Full control of configuration • Full control of maintenance • Multiple versions and editions supported • Currently requires large amounts of data for efficient performance • PaaS flavor of database engines • Supports MySQL and PostgreSQL (beta) • Fully-managed • Google’s fully-managed flavor is known as Google Cloud Dataproc • Used for semi-structured data • Can connect from database engine using PolyBase • Google Compute Engine • Google BigQuery • Google Cloud SQL • Hadoop

  22. On-Premises Deployment Options • Traditional deployment of the database engine • Full control of configuration • Full control of maintenance • MPP appliance • Evolution of Parallel Data Warehouse • Architecture of Azure SQL DW based on this design • No on-premises equivalent of Azure SQL Database • Microsoft’s flavor is known as HDInsight • Many other non-Microsoft deployment options • Can connect from database engine using PolyBase • SQL Server • Microsoft APS • PaaS Database • Hadoop

  23. Pause for One More PaaS Option… • New deployment model of Azure SQL Database • Provides near 100% compatibility with SQL Server on-premises Enterprise Edition • Preserves PaaS capabilities • Automatic patching & version updates • Automated backups • High availability • Using DMS (Data Migration Service) is a popular lift & shift path • Azure SQL Database Managed Instance

  24. Hybrid Deployment Options/Scenarios On-Premises App Servers & Azure SQL DB Availability Groups with Azure Replica(s) Easy to create and destroy databases as needed for development and deployment Removes management responsibility from devs Good choice if DBA team short on resources Uses Azure as backup datacenter(s) Requires robust network infrastructure Good for minimum datacenter proximity requirements

  25. Hybrid Deployment Options/Scenarios Replication to Azure IaaS VM Replication to Azure SQL Database Tried and true technology in use Identical to doing this on-premises other than network portion Good way to ease into comfort with the cloud Azure SQL Database can be a replication subscriber Eases DBA team into cloud and PaaS interactions Straightforward setup

  26. Demo Setting up replication to Azure SQL Database

  27. Hybrid Deployment Options/Scenarios Log Shipping to Azure IaaS VM PolyBase to Azure Blob Storage Popular with customers who want a copy of data stored completely off-site Straightforward setup Expands environment without requiring cluster or other complicated infrastructure Great for querying large quantities of semi-structured data Good way to introduce team to PolyBase Subject of our first case study

  28. Case Studies

  29. Transportation Planning Agency • Statistical models generating TBs of output every year • Storage costs spiraling upward and difficult to manage • Output stored in relational database tables requiring constant maintenance • Output generated as text files which were fed into the relational tables • Loaded output files into Azure Blob Storage (cold) • Query performance increased • Storage costs decreased by 96% ($2k per year vs. $75k per year)

  30. Demo Querying statistical model outputs stored in Azure Blob Storage

  31. Geospatial Research Center • Hosted Hadoop cluster • Hosted HDFS storage storing Excel, CSV, XML, JSON, etc. • SQL Server installed on Azure VMs • Database engine, DQS, MDS, and SSAS in use • PolyBase used to query semi-structured data from main SQL Server databases • Data consumers presented with common interface to access heterogeneous data

  32. Wrap-up

  33. Discussion Points • Cloud is just somebody else’s computer in somebody else’s datacenter • Rapid development from cloud providers constantly expands options • Are you locked into deployment locations for certain platforms? • Database engine always on-premises • Hadoop always in cloud • Blending of technologies and platforms may/may not be the right answer

  34. Recommendations • Set expectations what cloud technologies are and what they can do • Management • Team • HA/DR isn’t done by magic – it’s just different • Stay abreast of new technologies • Research • Training • Azure Stack • Embrace it all!

  35. Thank You For Attending! Matt Gordon Architect II Matt.Gordon@insight.com :@sqlatspeed www.sqlatspeed.com

More Related