A Database View of Intelligent Disks. James Hamilton JamesRH@microsoft.com Microsoft SQL Server. Overview. Do we agree on the assumptions? A database perspective: Scalable DB clusters are inevitable Affordable SANs are (finally) here Admin & Mgmt costs dominate
A Database View of Intelligent Disks James Hamilton JamesRH@microsoft.com Microsoft SQL Server
Overview • Do we agree on the assumptions? • A database perspective: • Scalable DB clusters are inevitable • Affordable SANs are (finally) here • Admin & Mgmt costs dominate • Intelligent disks are coming • DB exploitation of intelligent disk • Failed DB machines all over again? • Use of intelligent disk: NASD model? • full server slice vs. a file block server • Conclusions
Do we agree on the assumptions? • From the NASD web page: “Computer storage systems are built from sets of disks, tapes, disk arrays, tape robots, and so on, connected to one or more host computers. We are moving towards a world where the storage devices are no longer directly connected to their host systems, but instead talk to them through an intermediate high-performance, scalable network such as FibreChannel. To fully benefit from this, we believe that the storage devices will have to become smarter and more sophisticated.” • Premise: conclusion is 100% correct; we’ll question the assumptions that led to it • There are alternative architectures with strong advantages for DB workloads
Clusters Are Inevitable: Query • Data intensive application workloads (data warehousing, data mining, & complex query) are growing quickly • Greg’s law: DB capacity growing 2X every 9-12 months (Patterson) • DB capacity requirements growing super-Moore • Complex query workloads tend to scale with DB size • many CPUs required • “Shared memory is fine as long as you don’t share it” (Helland) • clusters are the only DB architecture with sufficient scale • We only debate at what point, not if, clusters are required
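The "super-Moore" point above can be made concrete with a little arithmetic. The sketch below compares capacity doubling every 9 to 12 months (the slide's figure) against processor performance doubling every 18 months. The 18-month period is an assumption, a commonly quoted reading of Moore's law that the slide does not state, and the function names are illustrative.

```python
def growth_factor(years: float, doubling_months: float) -> float:
    """Multiple reached after `years` when a quantity doubles every `doubling_months`."""
    return 2.0 ** (years * 12.0 / doubling_months)


# Assumption: 18 months is a commonly quoted Moore's-law doubling period.
# The slide only gives the 9-12 month figure for database capacity.
MOORE_DOUBLING_MONTHS = 18.0


def cpu_shortfall(years: float, db_doubling_months: float) -> float:
    """Ratio of data growth to single-processor growth over `years`.

    A value of 4.0 means four times as much data per unit of processor
    performance as at the start, i.e. roughly four times as many CPUs are
    needed for a workload that scales with database size.
    """
    data = growth_factor(years, db_doubling_months)
    cpu = growth_factor(years, MOORE_DOUBLING_MONTHS)
    return data / cpu


if __name__ == "__main__":
    for months in (9.0, 12.0):
        print(f"doubling every {months:.0f} months: "
              f"{cpu_shortfall(3.0, months):.1f}x shortfall after 3 years")
```

Under these assumptions the gap compounds every year, which is the sense in which the only open question is when, not whether, a single machine runs out.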
Clusters Are Inevitable: TP Apps • Most database transaction workloads currently hosted on SMPs • Prior to the web, TP workloads tended to be reasonably predictable • TP workloads scale with customer base/business size • Load changes at speed of business change (typically slow) • Web puts back office in the front office • Much of the world has direct access - very volatile • Capacity planning goes from black art to impossible • Server capacity variation is getting much wider • Need near infinite, incremental growth capability with potential to later de-scale • Wheel-in/wheel out upgrade model doesn’t work • clusters are only DB architecture with sufficient incremental growth capability
Clusters Are Inevitable: Availability • Non-cluster server architectures suffer from many single points of failure • Web enabled direct server access model driving high availability requirements: • recent high profile failures at eTrade and Charles Schwab • Web model enabling competition in access to information • Drives much faster server side software innovation, which negatively impacts quality • “Dark machine room” approach requires auto-admin and data redundancy (Inktomi model) • 42% of system failures are admin error (Gray) • Paging admin at 2am hoping for quality response is dangerous • Fail fast design approach is robust but only acceptable with redundant access to redundant copies of data • Cluster Architecture is required for availability
Shared Nothing Clusters Are Inevitable • Data-intensive application capacity growth requirement is seriously super-Moore • Increasing proportion of apps are becoming data intensive: • E.g. High end web sites typically DB backed • Transaction workloads now change very rapidly and unpredictably • High availability increasingly important • Conclusion: cluster database architecture is required • supported by Oracle, IBM, Informix, Tandem, … • Why don’t clusters dominate today? • High inter-server communications costs • Admin & management costs out of control
Affordable SANs Are (Finally) Here • TCP/IP send/receive costs on many O/Ss in 15K instr range • some more than 30K • Communication costs make many cluster database application models impractical • Bandwidth important, but prime issues are CPU consumption and, to a lesser extent, latency • A system area network (SAN) is used to connect clustered servers together • typically high bandwidth • Send/receive without O/S kernel transition (50 to 100 instructions common) • Round trip latency in 15 microsecond range • SANs not new (e.g. Tandem) • Commodity-priced parts are new (Myrinet, Giganet, ServerNet, etc.) and available today • www.viarch.org
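To see why CPU consumption rather than bandwidth is the prime issue, it helps to work the instruction counts through. The sketch below uses the per-message costs from the slide (about 15K instructions for TCP/IP, 50 to 100 for a SAN send/receive). The message rate and the processor speed are hypothetical values chosen only for illustration.

```python
def cpu_fraction(messages_per_sec: float,
                 instr_per_message: float,
                 cpu_instr_per_sec: float) -> float:
    """Fraction of one processor spent purely on sending/receiving messages."""
    return messages_per_sec * instr_per_message / cpu_instr_per_sec


# Per-message costs taken from the slide.
TCP_INSTR_PER_MESSAGE = 15_000.0
SAN_INSTR_PER_MESSAGE = 100.0

# Hypothetical workload and processor, not from the slide.
MESSAGES_PER_SEC = 10_000.0
CPU_INSTR_PER_SEC = 500e6  # a 500 MIPS processor

if __name__ == "__main__":
    tcp = cpu_fraction(MESSAGES_PER_SEC, TCP_INSTR_PER_MESSAGE, CPU_INSTR_PER_SEC)
    san = cpu_fraction(MESSAGES_PER_SEC, SAN_INSTR_PER_MESSAGE, CPU_INSTR_PER_SEC)
    print(f"TCP/IP: {tcp:.1%} of a CPU, SAN: {san:.1%} of a CPU")
```

With these numbers, inter-node messaging over TCP/IP consumes roughly a third of a processor before any database work is done, while the SAN path is close to free.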
Admin & Mgmt Costs Dominate • Bank of America: “You keep explaining to me how I can solve your problems” • Admin costs single largest driver of IT costs • Admitting we have a problem is first step to a cure: • Most commercial DBs now focusing on admin costs • SQL Server: • Enterprise manager (MMC framework--same as O/S) • Integrated security with O/S • Index tuning wizard (Surajit Chaudhuri) • Auto-statistics creation • Auto-file grow/shrink • Auto memory resource allocation • “Install and run” model is near • Trades processor resources for admin costs
Intelligent Disks Are Coming • Fatalism: they’re building them so we might as well figure out how to exploit them (Patterson trying to get us DB guys to catch on) • Reality: disk manufacturers work with very thin margins and will continue to try to add value to their devices (Gibson) • Many existing devices already (under-) exploiting commodity procs (e.g. 68020) • Counter argument: Prefer general purpose processor for DB workloads: • Dynamic workload requirements: computing joins, aggregations, applying filters, etc. • What if it was both a general purpose proc and embedded on disk controller?
DB Exploitation of Intelligent Disk • Each disk includes network, CPU, memory and drive subsystem • All on disk package—it already had power, chassis and PCB • scales as a unit in small increments • Runs full std O/S (e.g. Linux, NT, …) • Each is a node in single image, shared nothing database cluster • Continues long standing DB trend of moving function to the data: • Stored procedures • Joins done at server • Internally as well: SARGable predicates run in storage engine
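The idea of moving function to the data can be sketched as predicate pushdown across shared-nothing slices. This is a toy model, not SQL Server's storage engine: `Slice`, `partition`, and both query functions are hypothetical names, and "rows shipped" simply counts rows that would cross the network to the coordinator.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, int]


@dataclass
class Slice:
    """One disk-plus-CPU node holding a horizontal partition of the table."""
    rows: List[Row]

    def scan(self, predicate: Callable[[Row], bool]) -> List[Row]:
        # The filter runs next to the data, so only matching rows leave the slice.
        return [row for row in self.rows if predicate(row)]


def partition(rows: List[Row], n_slices: int, key: str) -> List[Slice]:
    """Spread rows across slices by taking the key modulo the slice count."""
    slices = [Slice([]) for _ in range(n_slices)]
    for row in rows:
        slices[row[key] % n_slices].rows.append(row)
    return slices


def pushed_down_query(slices: List[Slice],
                      predicate: Callable[[Row], bool]) -> Tuple[List[Row], int]:
    """Filter at each slice; return (result, rows shipped to the coordinator)."""
    result: List[Row] = []
    for s in slices:
        result.extend(s.scan(predicate))
    return result, len(result)


def ship_everything_query(slices: List[Slice],
                          predicate: Callable[[Row], bool]) -> Tuple[List[Row], int]:
    """Block-server style: ship every row, then filter at the coordinator."""
    shipped = [row for s in slices for row in s.rows]
    return [row for row in shipped if predicate(row)], len(shipped)
```

Both functions return the same rows; the difference is how much data moves. For a selective predicate the pushed-down plan ships only the matches, which is the same reasoning behind stored procedures and SARGable predicates.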
DB Exploitation of Intelligent Disk • Client systems are sold complete: • Include O/S, relevant device drivers, office productivity apps, … • Server systems require weeks to months of capacity planning, training, installing, configuring, and testing before going live • Let’s make the client model work for servers: • Purchase a “system” (frame & 2 disk, cpu, memory, and network units) • Purchase server-slices as required when required • Move to a design point where H/W is close to free and admin costs dominate design decisions • High hardware volume still drives significant revenue
DB Exploitation of Intelligent Disk • Each slice contains S/W for file, DB, www, mail, directory, … no install • Adding capacity is plugging in a few more slices and choosing personality to extend • Due to the large number of components in the system, reliability is an issue • “Nothing fails fast … just eventually performs poorly enough to be ‘fired’ … typically devices don’t just ‘quit’” (Patterson) • Introspection is key: dedicate some resources to tracking intermittent errors and predicting failure • Take action prior to failure … RAID-like model where disks fail but system keeps running • Add slices when capacity increase or accumulating failures require it
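One possible shape for the introspection described above is a per-slice monitor that watches recent operation outcomes and flags the slice for retirement once its error rate drifts too high. This is a minimal sketch under assumed parameters: the class name, the sliding-window size, and the error-rate threshold are all hypothetical, and a real system would track far richer signals (latency, retries, media errors).

```python
from collections import deque


class SliceHealth:
    """Tracks recent operation outcomes for one slice over a sliding window."""

    def __init__(self, window: int = 100, max_error_rate: float = 0.05) -> None:
        # Only the most recent `window` outcomes are kept (True = success).
        self.outcomes = deque(maxlen=window)
        self.max_error_rate = max_error_rate

    def record(self, ok: bool) -> None:
        """Record the outcome of one I/O or request handled by the slice."""
        self.outcomes.append(ok)

    def error_rate(self) -> float:
        """Fraction of failed operations in the current window."""
        if not self.outcomes:
            return 0.0
        return self.outcomes.count(False) / len(self.outcomes)

    def should_retire(self) -> bool:
        """Flag the slice once a full window shows too many intermittent errors."""
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.error_rate() > self.max_error_rate
```

Requiring a full window before retiring a slice avoids "firing" a device on the strength of one early error; the slice is taken out of service while it still works, and its data is served from redundant copies.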
Failed DB machines all over again? • Numerous past projects both commercial & research • Britton Lee probably best remembered • Solutions looking for a problem (Stonebraker) • What went wrong? • Special purpose hardware with low volume • High, difficult to amortize engineering costs • Fell off general purpose system technology curve • Database sizes were smaller and most server systems were not single function machines • Non-standard models for admin, management, security, programming, etc.
How about H/W DB accelerators? • Many efforts to produce DB accelerators • E.g. ICL CAFS • I saw at least one of these proposals a year while I was working on DB2 • Why not? • The additional H/W only addresses a tiny portion of total DB function • Device driver support required • Substantial database engineering investment required to exploit • Device must have intimate knowledge of database physical row format in addition to logical properties like international sort orders (bug-for-bug semantic match) • Low volume so devices quickly fall off commodity technology curve • ICL CAFS supported only a single proc; general purpose commodity SMPs made it irrelevant
Use of intelligent disk: NASD? • NASD has architectural advantages when data can be sent from block server directly to client: • Many app-models require significant server side processing preventing direct transfer (e.g. all database processing) • Could treat the intermediate server as a NASD “client” • Gives up advantages of not transferring data through intermediate server • Each set of disk resources requires additional network, memory, and CPU resources • Why not add together as self contained locally attached unit? • Rather than directly transfer from the disk to the client, move intermediate processing to data (continuation of long database tradition)
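Moving intermediate processing to the data can be illustrated with a two-phase aggregate: each disk unit computes a small partial result locally, and only those partials cross the network to be combined. The sketch below computes a grouped average this way. The function names and the `(key, value)` row shape are assumptions made for the example.

```python
from typing import Dict, Iterable, Tuple

Partial = Dict[str, Tuple[float, int]]  # group key -> (running sum, row count)


def local_partial(rows: Iterable[Tuple[str, float]]) -> Partial:
    """Runs at the disk unit: reduce local rows to one (sum, count) per group."""
    partial: Partial = {}
    for key, value in rows:
        total, count = partial.get(key, (0.0, 0))
        partial[key] = (total + value, count + 1)
    return partial


def combine(partials: Iterable[Partial]) -> Dict[str, float]:
    """Runs at the coordinator: merge per-unit partials into final averages."""
    merged: Partial = {}
    for partial in partials:
        for key, (total, count) in partial.items():
            merged_total, merged_count = merged.get(key, (0.0, 0))
            merged[key] = (merged_total + total, merged_count + count)
    return {key: total / count for key, (total, count) in merged.items()}


if __name__ == "__main__":
    unit_a = [("east", 10.0), ("west", 20.0)]
    unit_b = [("east", 30.0)]
    print(combine(local_partial(rows) for rows in (unit_a, unit_b)))
```

Each unit ships one small record per group rather than every row, so the network cost depends on the number of groups, not the size of the table.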
Use of Intelligent disk: NASD Model? • Making disk unit full server-slice allows use of existing: • Commodity operating system • device drivers framework and drivers • file system (API and on-disk format) • No client changes required • Object naming and directory lookup • Leverage on-going DB engineering investment • LOB apps (SAP, Peoplesoft … ) • security, admin, and mgmt infrastructure • Customer training and experience • Program development environment investment • if delivered as peer nodes in a cluster, no mass infrastructure re-write required prior to intelligent disk adoption
Use of Intelligent disk: NASD Eng. Costs • New device driver model hard to sell: • OS/2 never fully got driver support • NT still has less support than Win95/98 • Typical UNIX systems support far fewer devices • Getting new file system adoption difficult • HPFS on OS/2 never got heavy use • After a decade NTFS now getting server use • Will O/S and file system vendors want new server side infrastructure: • What is upside for them? • If written and evangelized by others, will it be adopted without system vendor support? • Intelligent disk is the right answer, question is what architecture exploits them best and promotes fastest adoption
Conclusions • Intelligent disk will happen • An opportunity for all of us to substantially improve server-side infrastructure • NASD could happen but alternative architectures also based upon intelligent disk appear to: • Require less infrastructure re-work • Offer more benefit to non-file app models (e.g. DB) • Intelligent disk could form generalized, scalable server side component • CPU, network, memory, and disk • Emulate client-side sales and distribution model: all software and hardware included in package • Client side usage model: use until fails and then discard