
A Database View of Intelligent Disks


Presentation Transcript


  1. A Database View of Intelligent Disks James Hamilton JamesRH@microsoft.com Microsoft SQL Server

  2. Overview • Do we agree on the assumptions? • A database perspective: • Scalable DB clusters are inevitable • Affordable SANs are (finally) here • Admin & Mgmt costs dominate • Intelligent disks are coming • DB exploitation of intelligent disk • Failed DB machines all over again? • Use of intelligent disk: NASD model? • full server slice vs. a file block server • Conclusions

  3. Do we agree on the assumptions? • From the NASD web page: “Computer storage systems are built from sets of disks, tapes, disk arrays, tape robots, and so on, connected to one or more host computers. We are moving towards a world where the storage devices are no longer directly connected to their host systems, but instead talk to them through an intermediate high-performance, scalable network such as FibreChannel. To fully benefit from this, we believe that the storage devices will have to become smarter and more sophisticated.” • Premise: the conclusion is 100% correct; we’ll question the assumptions that led to it • There are alternative architectures with strong advantages for DB workloads

  4. Clusters Are Inevitable: Query • Data-intensive application workloads (data warehousing, data mining, & complex query) are growing quickly • Greg’s law: DB capacity growing 2X every 9-12 months (Patterson) • DB capacity requirements growing super-Moore • Complex query workloads tend to scale with DB size • many CPUs required • “Shared memory is fine as long as you don’t share it” (Helland) • clusters are the only DB architecture with sufficient scale • We only debate at what point, not if, clusters are required
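
A quick back-of-the-envelope of the super-Moore point above (a sketch in Python; the exact doubling periods and horizon are illustrative assumptions beyond the slide’s “2X every 9-12 months”):

```python
# Back-of-the-envelope: data that doubles every 9-12 months vs. a single node
# whose capability doubles on a Moore-ish 18-24 month cycle. The doubling
# periods and the 5-year horizon are illustrative assumptions.

def growth_factor(years, doubling_months):
    """Multiplicative growth after `years`, given a doubling period in months."""
    return 2 ** (years * 12 / doubling_months)

years = 5
db_growth = growth_factor(years, 10)     # DB capacity: ~2x every 9-12 months
node_growth = growth_factor(years, 21)   # single node: ~2x every 18-24 months

# Relative number of nodes needed just to keep pace with the data.
print(f"data x{db_growth:.0f}, single node x{node_growth:.1f}, "
      f"=> ~{db_growth / node_growth:.0f}x as many nodes")
```

However the constants are tuned, the gap between the two curves only closes by adding nodes, which is the slide’s point that the debate is about when, not if.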

  5. Clusters Are Inevitable: TP Apps • Most database transaction workloads are currently hosted on SMPs • Prior to the web, TP workloads tended to be reasonably predictable • TP workloads scale with customer base/business size • Load changes at the speed of business change (typically slow) • Web puts the back office in the front office • Much of the world has direct access - very volatile • Capacity planning goes from black art to impossible • Server capacity variation is getting much wider • Need near-infinite, incremental growth capability with the potential to later de-scale • Wheel-in/wheel-out upgrade model doesn’t work • clusters are the only DB architecture with sufficient incremental growth capability

  6. Clusters Are Inevitable: Availability • Non-cluster server architectures suffer from many single points of failure • Web-enabled direct server access model is driving high availability requirements: • recent high-profile failures at eTrade and Charles Schwab • Web model enabling competition in access to information • Drives much faster server-side software innovation, which negatively impacts quality • “Dark machine room” approach requires auto-admin and data redundancy (Inktomi model) • 42% of system failures are admin error (Gray) • Paging an admin at 2am and hoping for a quality response is dangerous • Fail-fast design approach is robust but only acceptable with redundant access to redundant copies of data • Cluster architecture is required for availability

  7. Shared Nothing Clusters Are Inevitable • Data-intensive application capacity growth requirement is seriously super-Moore • Increasing proportion of apps are becoming data intensive: • E.g. High end web sites typically DB backed • Transaction workloads now change very rapidly and unpredictably • High availability increasingly important • Conclusion: cluster database architecture is required • supported by Oracle, IBM, Informix, Tandem, … • Why don’t clusters dominate today? • High inter-server communications costs • Admin & management costs out of control

  8. Affordable SANs Are (Finally) Here • TCP/IP send/receive costs on many O/Ss in the 15k instruction range • some more than 30K • Communications costs make many cluster database application models impractical • Bandwidth is important, but the prime issues are CPU consumption and, to a lesser extent, latency • A system area network (SAN) is used to connect clustered servers together • typically high bandwidth • Send/receive without an O/S kernel transition (50 to 100 instructions common) • Round-trip latency in the 15 microsecond range • SANs not new (e.g. Tandem) • Commodity-priced parts are new (Myrinet, Giganet, ServerNet, etc.) and available today • www.viarch.org
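
A rough sketch of why the per-message instruction cost, not bandwidth, is the limiter; only the ~15k vs. 50-100 instruction figures come from the slide, and the CPU speed, message count, and transaction rate below are illustrative assumptions:

```python
# Fraction of a CPU consumed purely by messaging overhead, comparing TCP/IP
# (~15k instructions per send/receive) with user-level SAN messaging
# (~100 instructions). CPU speed, messages/transaction, and transaction rate
# are illustrative assumptions, not figures from the talk.

cpu_ips = 500e6        # assumed instructions/second for one CPU
msgs_per_txn = 20      # assumed inter-server messages per transaction
txn_rate = 1000        # assumed transactions/second per node

def cpu_fraction(instr_per_msg):
    """Share of one CPU spent on send/receive at the assumed load."""
    return (instr_per_msg * msgs_per_txn * txn_rate) / cpu_ips

print(f"TCP/IP (~15k instr/msg): {cpu_fraction(15_000):.0%} of a CPU")
print(f"SAN   (~100 instr/msg): {cpu_fraction(100):.2%} of a CPU")
```

At these assumed rates, more than half the processor can go to moving messages before any database work is done, which is why commodity-priced SANs matter for cluster databases.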

  9. Admin & Mgmt Costs Dominate • Bank of America: “You keep explaining to me how I can solve your problems” • Admin costs are the single largest driver of IT costs • Admitting we have a problem is the first step to a cure: • Most commercial DBs are now focusing on admin costs • SQL Server: • Enterprise manager (MMC framework--same as O/S) • Integrated security with O/S • Index tuning wizard (Surajit Chaudhuri) • Auto-statistics creation • Auto file grow/shrink • Auto memory resource allocation • “Install and run” model is near • Trades processor resources for admin costs

  10. Intelligent Disks are Coming • Fatalism: they’re building them, so we might as well figure out how to exploit them (Patterson trying to get us DB guys to catch on) • Reality: disk manufacturers work with very thin margins and will continue to try to add value to their devices (Gibson) • Many existing devices already (under-)exploit commodity procs (e.g. 68020) • Counter-argument: prefer a general-purpose processor for DB workloads: • Dynamic workload requirements: computing joins, aggregations, applying filters, etc. • What if it were both a general-purpose proc and embedded on the disk controller?

  11. DB Exploitation of Intelligent Disk • Each disk includes network, CPU, memory and drive subsystem • All in the disk package—it already had power, chassis and PCB • scales as a unit in small increments • Runs a full std O/S (e.g. Linux, NT, …) • Each is a node in a single-image, shared-nothing database cluster • Continues the long-standing DB trend of moving function to the data: • Stored procedures • Joins done at the server • Internally as well: SARGable predicates run in the storage engine
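
A minimal sketch of the “move function to the data” idea: a SARGable predicate is evaluated at each disk node so only qualifying rows cross the interconnect. All names here are hypothetical; this is not SQL Server’s actual storage-engine interface:

```python
# Minimal sketch: push a filter predicate to each intelligent-disk node in a
# shared-nothing cluster so only qualifying rows leave the node. Class and
# function names are hypothetical, not a real database interface.

from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, int]

class DiskNode:
    """One slice: CPU + memory + network + drive holding a table fragment."""
    def __init__(self, rows: Iterable[Row]):
        self.rows = list(rows)

    def scan(self, predicate: Callable[[Row], bool]) -> Iterator[Row]:
        # The predicate runs here, next to the data, not at a central host.
        return (row for row in self.rows if predicate(row))

def cluster_scan(nodes: List[DiskNode], predicate: Callable[[Row], bool]) -> List[Row]:
    """Fan the filtered scan out to every node and merge the (small) results."""
    results: List[Row] = []
    for node in nodes:
        results.extend(node.scan(predicate))
    return results

# Usage: three nodes, each holding 50 rows; only rows with balance > 10_000
# are shipped back.
nodes = [DiskNode([{"acct": i, "balance": i * 100} for i in range(n, n + 50)])
         for n in (0, 50, 100)]
print(len(cluster_scan(nodes, lambda r: r["balance"] > 10_000)))  # -> 49
```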

  12. DB Exploitation of Intelligent Disk • Client systems are sold complete: • Include O/S, relevant device drivers, office productivity apps, … • Server systems require weeks to months of capacity planning, training, installing, configuring, and testing before going live • Let’s make the client model work for servers: • Purchase a “system” (frame & 2 disk, cpu, memory, and network units) • Purchase server-slices as required when required • Move to a design point where H/W is close to free and admin costs dominate design decisions • High hardware volume still drives significant revenue

  13. DB Exploitation of Intelligent Disk • Each slice contains S/W for file, DB, www, mail, directory, … no install • Adding capacity is plugging in a few more slices and choosing the personality to extend • Due to the large number of components in the system, reliability is an issue • “Nothing fails fast … just eventually performs poorly enough to be ‘fired’ … typically devices don’t just ‘quit’” (Patterson) • Introspection is key: dedicate some resources to tracking intermittent errors and predicting failure • Take action prior to failure … RAID-like model where disks fail but the system keeps running • Add slices when capacity increases or accumulating failures require it
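
A minimal sketch of the introspection idea: track intermittent errors and degraded latency per slice and retire (“fire”) a slice before it fails outright, so redundant copies keep the system running. The thresholds and names are hypothetical assumptions:

```python
# Minimal sketch: per-slice health tracking that flags slices for retirement
# before a hard failure, RAID-style. Thresholds and names are hypothetical.

from collections import defaultdict
from typing import List

ERROR_BUDGET = 5       # assumed: intermittent errors tolerated per window
LATENCY_SLO_MS = 50.0  # assumed: acceptable recent median I/O latency

class SliceMonitor:
    def __init__(self):
        self.errors = defaultdict(int)      # slice_id -> soft error count
        self.latency = defaultdict(float)   # slice_id -> recent median latency (ms)

    def record_error(self, slice_id: str) -> None:
        self.errors[slice_id] += 1

    def record_latency(self, slice_id: str, ms: float) -> None:
        self.latency[slice_id] = ms

    def slices_to_retire(self) -> List[str]:
        """Slices predicted to fail: too many soft errors or degraded latency."""
        suspect = {sid for sid, n in self.errors.items() if n > ERROR_BUDGET}
        suspect |= {sid for sid, ms in self.latency.items() if ms > LATENCY_SLO_MS}
        return sorted(suspect)

# Usage: slice-7 accumulates soft errors, slice-3 slows down; both are flagged
# so their data can be re-replicated before either actually quits.
mon = SliceMonitor()
for _ in range(6):
    mon.record_error("slice-7")
mon.record_latency("slice-3", 80.0)
print(mon.slices_to_retire())   # -> ['slice-3', 'slice-7']
```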

  14. Failed DB machines all over again? • Numerous past projects both commercial & research • Britton Lee probably best remembered • Solutions looking for a problem (Stonebraker) • What went wrong? • Special purpose hardware with low volume • High, difficult to amortize engineering costs • Fell off general purpose system technology curve • Database sizes were smaller and most server systems were not single function machines • Non-standard models for admin, management, security, programming, etc.

  15. How about H/W DB accelerators? • Many efforts to produce DB accelerators • E.g. ICL CAFS • I saw at least one of these proposals a year while I was working on DB2 • Why not? • The additional H/W only addresses a tiny portion of total DB function • Device driver support required • Substantial database engineering investment required to exploit it • The device must have intimate knowledge of the database physical row format in addition to logical properties like international sort orders (bug-for-bug semantic match) • Low volume, so devices quickly fall off the commodity technology curve • ICL CAFS supported a single proc; general commodity SMPs made it irrelevant

  16. Use of intelligent disk: NASD? • NASD has architectural advantages when data can be sent from the block server directly to the client: • Many app models require significant server-side processing, preventing direct transfer (e.g. all database processing) • Could treat the intermediate server as a NASD “client” • Gives up the advantage of not transferring data through an intermediate server • Each set of disk resources requires additional network, memory, and CPU resources • Why not package them together as a self-contained, locally attached unit? • Rather than directly transfer from the disk to the client, move the intermediate processing to the data (continuation of a long database tradition)
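
A small back-of-the-envelope of what moving the processing to the data saves over shipping blocks to an intermediate server; the table size and query selectivity are illustrative assumptions:

```python
# Data crossing the interconnect for one filtering scan: NASD-style block
# transfer to an intermediate DB server vs. filtering at the disk unit.
# Table size and query selectivity are illustrative assumptions.

table_gb = 100          # assumed table size
selectivity = 0.01      # assumed fraction of rows that satisfy the query

nasd_path_gb = table_gb                 # every block crosses the network
disk_side_gb = table_gb * selectivity   # only qualifying rows leave the unit

print(f"NASD block transfer: {nasd_path_gb:.0f} GB over the interconnect")
print(f"Filter at the disk:  {disk_side_gb:.0f} GB over the interconnect")
```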

  17. Use of Intelligent disk: NASD Model? • Making the disk unit a full server-slice allows use of existing: • Commodity operating system • device driver framework and drivers • file system (API and on-disk format) • No client changes required • Object naming and directory lookup • Leverage on-going DB engineering investment • LOB apps (SAP, Peoplesoft … ) • security, admin, and mgmt infrastructure • Customer training and experience • Program development environment investment • if delivered as peer nodes in a cluster, no mass infrastructure re-write is required prior to intelligent disk adoption

  18. Use of Intelligent disk: NASD Eng. Costs • New device driver model is hard to sell: • OS/2 never fully got driver support • NT still has less support than Win95/98 • Typical UNIX systems support far fewer devices • Getting a new file system adopted is difficult • HPFS on OS/2 never got heavy use • After a decade, NTFS is now getting server use • Will O/S and file system vendors want new server-side infrastructure: • What is the upside for them? • If written and evangelized by others, will it be adopted without system vendor support? • Intelligent disk is the right answer; the question is which architecture exploits them best and promotes the fastest adoption

  19. Conclusions • Intelligent disk will happen • An opportunity for all of us to substantially improve server-side infrastructure • NASD could happen, but alternative architectures also based upon intelligent disk appear to: • Require less infrastructure re-work • Offer more benefit to non-file app models (e.g. DB) • Intelligent disk could form a generalized, scalable server-side component • CPU, network, memory, and disk • Emulate the client-side sales and distribution model: all software and hardware included in the package • Client-side usage model: use until it fails, then discard

  20. A Database View of Intelligent Disks James Hamilton JamesRH@microsoft.com Microsoft SQL Server
