
Database Performance in The Era of Free Computing



  1. Database Performance in The Era of Free Computing • Jim Gray, Microsoft Research • http://Research.Microsoft.com/~Gray/Talks

  2. Generic Keynote Outline • The glorious past • The uncertain present • A promising path forward

  3. Benchmarks are Wonderful Things: Transactions Per Second, Queries Per Minute • They set the performance agenda for the industry • Many performance bugs get fixed • But their time passes • They become an end in themselves • Eventually they become negative: • improving the good parts (benchmark specials) • rather than fixing the holes (load/backup/restore/reorg)

  4. Example…. • 1985 goal: 1,000 transactions per second • Couldn’t do it at the time • At the time: • 100 transactions/second • 50 M$ for the computer (y2005 dollars)

  5. Old Problems Now Look Easy • 1985 goal: 1,000 transactions per second • Couldn't do it at the time: • 100 transactions/second • 50 M$ for the computer (y2005 dollars) • Now: easy • Laptop does 8,200 debit-credit tps • ~$400 desktop. "Thousands of DebitCredit Transactions-Per-Second: Easy and Inexpensive", Gray & Levine, MSR-TR-2005-39, ftp://ftp.research.microsoft.com/pub/tr/TR-2005-39.doc
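As a sanity check, the two data points on this slide work out to roughly a ten-million-fold improvement in cost per tps (a quick derivation from the slide's own numbers):

```python
# Cost per transaction-per-second, then (1985) and now (2005), figures from the slide.
tps_1985, cost_1985 = 100, 50_000_000   # 100 tps on a 50 M$ machine (y2005 dollars)
tps_2005, cost_2005 = 8_200, 400        # 8,200 debit-credit tps on a ~$400 desktop

per_tps_1985 = cost_1985 / tps_1985     # $500,000 per tps
per_tps_2005 = cost_2005 / tps_2005     # ~$0.05 per tps
improvement = per_tps_1985 / per_tps_2005

print(f"${per_tps_1985:,.0f}/tps -> ${per_tps_2005:.2f}/tps, ~{improvement:,.0f}x better")
```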

  6. TPC-A and TPC-C tps/$ Trends [chart: throughput/k$ for TPC-A and TPC-C, 1990–2004, log scale: ~100x in 10 years, ~2x per 1.5 years; 2006: 5.5..7.4 tpmC/MHz; no obvious end in sight!] • Hardware & software progress: • Throughput/$: 2x per 1.5 years (40%/y hardware, 20%/y software) • Throughput: 2x per 2 years, tracks MHz (except lately!) • "A Measure of Transaction Processing 20 Years Later", ftp://ftp.research.microsoft.com/pub/tr/TR-2005-57.doc, IEEE Data Engineering Bulletin, V. 28.2, pp. 3-4, June 2005
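The two rates on the chart are consistent: doubling every 1.5 years compounds to about 100x per decade.

```python
# "2x per 1.5 years" compounds to roughly 100x in 10 years, matching the chart.
doublings_per_decade = 10 / 1.5
growth = 2 ** doublings_per_decade
print(f"~{growth:.0f}x in 10 years")
```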

  7. TPC-C Goes Crazy • 16 M$ computer • 64 cores (processors) • 2.5M users • 6,540 disks • 3.2M tpmC @ 5 $/tpmC

  8. Amazing Price/Performance TPC-C results referenced above are Dell PowerEdge running SQL Server 2005, 38,622 tpmC, .99 $/tpmC, available 11/8/05

  9. TPC-C • 1 $/tpmC @ 39k tpmC

  10. More Glorious Past • Big success in semi-standard benchmarks • SAP, PeopleSoft benchmarks … • Data mining suites • OLAP Council • SPEC: CPU, files, … • PC mags: NetBench, … • Micro-benchmarks: Streams, … • Internet2 Land Speed Record

  11. Outline • The glorious past • The uncertain present • A promising path forward

  12. Tps: A Solved Problem? • 6B people on the planet • 1 $/tpmC => 1B$ ~ 150 transactions/day for every person on the planet • A 1B$ industry? • Need a new model of computer applications • But… industry is chasing TOP500 (LINPACK fits in cache) and tpmC (millions of tpmC)

  13. A Way Out: Use XML & Perl? • XML is easy! • Perl … is easy • But it gives you ~10x fewer tps • 800 tps (64M transactions/day) is more than most companies need! And… that's only a laptop. Philippe Lacoude, "Pushing SQL Server 2005 Limits - Dealing with Oversized XML Documents", http://www.lacoude.com/docs/public/public.aspx?doc=SQL90XML.pdf

  14. Outline • The glorious past • The uncertain present • A promising path forward

  15. What Makes A Good Benchmark • Relevant: measures something interesting • Simple: easy to understand • Portable: vendor / technology neutral • Scalable: grows with technology/time • I am complaining about the “relevance”.

  16. Two Kinds of Benchmarks • Evaluate some design concept: • A performance study others can replicate • Examples: Wisconsin, Sort, DebitCredit, XML • A performance agenda: • A formal standards group • Examples: SPEC, TPC, … • Evaluation is HARD, but much easier than agendas.

  17. Suggestion #1 • Do not chase the tpmC goal • Do not chase the top500 goal • Work on "real application benchmarks" • Measure things people care about: • TCO: holistic "total cost of ownership" • Reliability • Cost/task: backup/restore/recovery • Install/configure an application • Time-to-solution for an analysis task

  18. What Do You Mean? • You are probably asking: • HUH? • What do you mean? • Can you give two examples? • Here are three things we did recently: • Sort • To Blob or Not to Blob? • Copy a Petabyte?

  19. Records per Second per CPU [chart: sort records/sec/cpu, 1985–2005, log scale, mini → super → cache-conscious → GPU; slow improvement after 1995; GPU's better memory architecture finally gives more records/second] • Sorting 100-byte records (minute sort / penny sort) shows we hit the memory ceiling in 1995 • Sort recs/s/cpu plateaued in 1995 • Had to go to the GPU to get better memory bandwidth • SIGMOD 2006: GpuTeraSort • http://research.microsoft.com/barc/SortBenchmark/

  20. Sort Performance/Price Improved Comparably to TpmC • Based on parallelism and "commodity" parts, not per-cpu performance.

  21. To Blob or Not To Blob • For objects X smaller than 1MB: "SELECT X INTO x FROM T WHERE key = 123" is faster than "h = open(X); read(h, x, n); close(h)" • So blob beats file for objects < 1MB (on SQL Server – what about other DBs?) • Because the DB is CISC and the FS is RISC • Most things are less than 1MB • The DB should work to make this 10MB • The file system should borrow ideas from the DB. "To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem?", Rusty Sears, Catharine Van Ingen, Jim Gray, MSR-TR-2006-45, April 2006
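The blob-vs-file comparison can be sketched with SQLite as a stand-in; the paper measured SQL Server and NTFS, so the timings and crossover point here are purely illustrative, and the table and key are made up:

```python
# Illustrative sketch: read one small object back as a database BLOB and as a file.
# SQLite stands in for SQL Server; the actual paper's crossover (~1 MB) will differ here.
import os, sqlite3, tempfile, time

payload = os.urandom(256 * 1024)  # a 256 KB object, below the ~1 MB crossover

# Store the object as a database BLOB...
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (key INTEGER PRIMARY KEY, x BLOB)")
db.execute("INSERT INTO t VALUES (123, ?)", (payload,))

# ...and as a plain file.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(payload)

# Read it back both ways and time each read.
t0 = time.perf_counter()
blob = db.execute("SELECT x FROM t WHERE key = 123").fetchone()[0]
t_blob = time.perf_counter() - t0

t0 = time.perf_counter()
with open(path, "rb") as f:
    data = f.read()
t_file = time.perf_counter() - t0

assert blob == data == payload
print(f"blob read: {t_blob*1e3:.2f} ms, file read: {t_file*1e3:.2f} ms")
os.remove(path)
```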

  22. How Often do Disks Fail?

  23. What About Bit Error Rates? • Uncorrectable Errors on Read (UERs) • Quoted uncorrectable bit error rates: 10^-13 to 10^-15 • That's 1 error in 1TB to 1 error in 100TB • WOW!!! • We moved 1.5 PB looking for errors • Saw 5 UER events: 3 real, 3 masked by retry • Many controller failures and system security reboots • Conclusion: • UER is not a useful metric – want mean time to data loss • UER is better than advertised. "Empirical Measurements of Disk Failure Rates and Error Rates", Jim Gray, Catharine van Ingen, Microsoft Technical Report MSR-TR-2005-166
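The quoted bit error rates convert to volume-read-per-error with simple arithmetic (the slide rounds the results to 1 TB and 100 TB):

```python
# Translate quoted uncorrectable bit error rates into "TB read per error".
BITS_PER_TB = 8 * 10**12

for uer in (1e-13, 1e-15):
    tb_per_error = 1 / (uer * BITS_PER_TB)
    print(f"UER {uer:g}: ~1 error per {tb_per_error:g} TB read")
```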

  24. So, You Want to Copy a Petabyte? • Today, that’s 4,000 disks (read 2k write 2k) • Takes ~4 hours if they run in parallel, but… • Probably not one file. • You will see a few UERs. • What’s the best strategy? • How fast can you move a Petabyte from CERN to Pasadena? Is sneaker-net fastest and cheapest?
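The slide's "~4 hours" follows from splitting the petabyte across the 2,000 source disks; the per-disk capacity and transfer rate below are my assumptions, not numbers from the talk:

```python
# Back-of-envelope for copying a petabyte: 2,000 source disks read while
# 2,000 target disks write (4,000 total). Disk size and sequential rate
# here are illustrative assumptions.
PB = 10**15
source_disks = 2_000
bytes_per_disk = PB / source_disks          # 500 GB per disk
seq_rate = 35 * 10**6                       # assume ~35 MB/s sustained sequential

hours = bytes_per_disk / seq_rate / 3600
print(f"~{hours:.1f} hours with all {source_disks} streams in parallel")
```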

  25. UER Things I Wish I Knew • Better statistics from larger farms, and more diversity • What is the UER on a LAN? A WAN? • What is the UER over time: for a file on disk? for a disk? • What's the best replication strategy? • Symmetric (1+1)+(1+1) or triplex (1+1)+1?

  26. More Generally: Performance Metrics • Operational performance tests • Things that help us design systems • We can do these things as research tasks • But they are harder than they look: • characterize the problem • do the actual measurements • In the end they look simple & easy, but not in the beginning

  27. More on Suggestion #1 • Do not chase the tpmC goal • Do not chase the top500 goal • Work on "real application benchmarks" • Measure things people care about: • TCO: holistic "total cost of ownership" • Reliability • Cost/task: backup/restore/recovery • Install/configure an application • Time-to-solution for an analysis task

  28. Ease-of-Use is HARD • Few ease-of-use metrics • They are difficult: • "ease of use is what you are used to" • Big differences are obvious: • rock vs hammer • It depends on the task: • Python vs MATLAB vs FrontPage • Defining the tasks seems to define the winner • Bench-marketing: for every product there is a benchmark at which it is "the best".

  29. Computers Are “Free” • Processing cost is going to zero $/instruction • Storage cost is going to zero $/byte & $/access • Networking cost is going to zero $/message • Operations cost is going to zero $/node • Ratios are changing: • Distance: bandwidth/latency →∞ • Heat: Access/Second/GB → 0

  30. But… • Anyone with a non-zero budget has infinite storage and processing and … • Several groups are deploying: • fractions of EXABYTES • fractions of MILLIONS OF NODES (peta-ops) • fractions of TERABYTES/sec OF NETWORK • Microsoft, Yahoo!, Google, spooks are spending x B$ (not 1$…), so they have ~infinite storage/processing/networking

  31. Oh! And PEOPLE COSTS are HUGE! • People costs have always exceeded IT capital • But now that hardware is "free" … • Key goals: • self-organizing • self-healing • No DBAs for cell phones or cameras.

  32. CapX (capital expense) • 1$/GB..100$/GB => 1B$...100B$/ExaByte • 1k$..100k$/node => 1B$..100B$ /MegaNodes • 10$..1k$ /Mb/s/mo =>1B$ .. 100B$/TB/s/y (200 lambdas) • And then there is the power and building!!!

  33. Price/Performance Still Matters • 10% of 1B$ is significant! == $100,000,000 • So, “small improvements” are significant. • 12 TB + cpu + RAM … for less than 10k$ • How much less??

  34. OpX (Operations Expense) • Gartner numbers: • 300k$/TB/y => 300B$/ExaB/y • 7k$/node/y => 7B$/M-nodes/y • So… have to do things differently… • Autopilot: touch-free • Make OpX less than 10% of CapX => • less than one person per 10 M$ CapX • one person per ~10,000 processors • one person per ~40,000 disks • one person per ~1,000 routers
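The Gartner per-unit costs scale up to the slide's totals directly:

```python
# Scale Gartner's per-unit operations costs to "free computing" scale.
TB_PER_EB = 10**6
opx_tb_year = 300_000        # Gartner: $300k per TB per year
opx_node_year = 7_000        # Gartner: $7k per node per year

opx_eb_year = opx_tb_year * TB_PER_EB       # dollars per exabyte-year
opx_mnodes_year = opx_node_year * 10**6     # dollars per million-nodes-year
print(f"${opx_eb_year/1e9:.0f}B per exabyte-year, "
      f"${opx_mnodes_year/1e9:.0f}B per million nodes per year")
```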

  35. And the Ratios are Changing: Other Things Approaching Zero • Bandwidth/latency • CapX/OpX • Access/sec vs storage capacity • Moore's wall: chips/thread • And cost-per-person is rising.

  36. TCO: Performance Metrics • TCO (Total Cost Of Ownership) • CapX and OpX interact • You can waste hardware to save OpX • Metrics (CapX + OpX): • $/ExaB/year • $/PetaOp/year • $/PageViews/year • Availability (fraction of requests serviced) • Agreed: • It is difficult to do this in academe • It is difficult to publish results • Competitive issues • Reviewers do not “get it”. • But it is very important

  37. The TerraServer Story (not able to talk about others) • 1997: 8 DEC Alphas, 8GB RAM, 480x18GB disks, ~1 TB • 2000: 4x8 Pentium III 600MHz, 16GB RAM, 540 36GB FC SCSI disks, FC SAN, ~18TB • 2004: 7x2 Xeon, ~100 250GB SATA disks, 28 TB, 70k$, KVM/IP, NO TAPE • Now antique, but you get the idea. "TerraServer Bricks – A High Availability Cluster Alternative", MSR-TR-2004-107; "TerraServer Cluster and SAN Experience", MSR-TR-2004-67

  38. Oh! And There is the Other 99% • Mega-servers are just the core • Most processing is in the periphery • Billions of clients need millions of servers • Sensors, cameras, cell phones, PCs • Issues: • admin cost • footprint (where to put the intelligence) • replication strategies • battery life • …

  39. More on Suggestion #1 • Do not chase the tpmC goal • Do not chase the top500 goal • Work on "real application benchmarks" • Measure things people care about: • TCO: holistic "total cost of ownership" • Reliability • Cost/task: backup/restore/recovery • Install/configure an application • Time-to-solution for an analysis task

  40. Outline • The glorious past • The uncertain present • A promising path forward

  41. Filler Slides in Case No Questions

  42. Many Little Beat Few Big [figure: price tiers mainframe ($1 million) → mini ($100 K) → micro ($10 K) → pico processor; memory hierarchy 10 pico-second RAM (1 MB) → 10 nano-second RAM (100 MB) → 10 microsecond RAM (10 GB) → 10 millisecond disc (1 TB) → 10 second tape archive (100 TB); disk form factors 14" → 9" → 5.25" → 3.5" → 2.5" → 1.8"] • Smoking, hairy golf ball: 1M SPECmarks, 1 TFLOP, 10^6 clocks to bulk RAM, event horizon on chip, VM reincarnated, multiprogram cache, on-chip SMP • How to connect the many little parts? • How to program the many little parts? • Fault tolerance?

  43. In The Limit: The Pico Processor, a Smoking Hairy Golf Ball • 1M SPECmarks • 1 TFLOP • 10^6 clocks to bulk RAM • Event horizon on chip • VM reincarnated • Multi-program cache • On-chip SMP

  44. Disc Trends [form factors shrinking: 14" → 9" → 5.25" → 3.5" → 2.5" → 1.8"] • Discs are getting bigger (~1TB/unit) and cooler: 50,000 access/sec/TB → 50 access/sec/TB

  45. Latency: How Far Away is the Data?

  46. Why Parallel Access To Data? • At 10 MB/s: 1.2 days to scan 1 TB • 1,000x parallel: 1.5-minute scan • BANDWIDTH • Parallelism: divide a big problem into many smaller ones to be solved in parallel.
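The scan arithmetic behind the slide, assuming a 1 TB table: 1,000-way parallelism turns 100,000 seconds into 100 seconds, roughly the slide's minute and a half.

```python
# Serial vs 1,000-way parallel scan of 1 TB at 10 MB/s per stream.
TB = 10**12
stream_rate = 10 * 10**6     # 10 MB/s

serial_days = TB / stream_rate / 86_400
parallel_min = TB / (stream_rate * 1_000) / 60
print(f"serial: {serial_days:.1f} days; 1,000-way parallel: {parallel_min:.1f} minutes")
```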

  47. Latency: Data-Flow Programming, Prefetch & Post-write Hide Latency • Can't wait for the data to arrive (2,000 years!) • Need a memory that gets the data in advance (~100MB/s) • Solution: • pipeline from source (tape, disc, ram …) to cpu cache • pipeline results to destination
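The pipeline idea above can be sketched in a few lines; this is a toy illustration, not how a real storage prefetcher is built, and `pipelined`, `work`, and `depth` are made-up names:

```python
# Minimal prefetch-pipeline sketch: a background thread keeps a bounded
# queue full, so the consumer rarely waits on the (slow) source.
import queue
import threading

def pipelined(source, work, depth=4):
    """Apply `work` to each item of `source`, prefetching `depth` items ahead."""
    q = queue.Queue(maxsize=depth)     # depth bounds how far ahead we read

    def producer():
        for item in source:
            q.put(item)                # blocks when the pipeline is full
        q.put(None)                    # end-of-stream sentinel

    threading.Thread(target=producer, daemon=True).start()
    out = []
    while (item := q.get()) is not None:
        out.append(work(item))         # compute overlaps with the next fetch
    return out

print(pipelined(range(5), lambda x: x * 2))  # -> [0, 2, 4, 6, 8]
```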
