Database Performance in The Era of Free Computing
Jim Gray, Microsoft Research
http://Research.Microsoft.com/~Gray/Talks
Generic Keynote Outline
• The glorious past
• The uncertain present
• A promising path forward
Benchmarks are Wonderful Things: Transactions Per Second, Queries Per Minute
• They set the performance agenda for the industry
• Many performance bugs get fixed
• But their time passes
  • They became an end in themselves
  • Eventually they became negative:
    • improving the good parts (benchmark specials)
    • rather than fixing the holes (load/backup/restore/reorg)
Example…
• 1985 goal: 1,000 transactions per second
• Couldn't do it at the time
• At the time:
  • 100 transactions/second
  • 50 M$ for the computer (y2005 dollars)
Old Problems Now Look Easy
• 1985 goal: 1,000 transactions per second
• Couldn't do it at the time
• At the time:
  • 100 transactions/second
  • 50 M$ for the computer (y2005 dollars)
• Now: easy
  • Laptop does 8,200 debit-credit tps
  • ~$400 desktop
Thousands of DebitCredit Transactions-Per-Second: Easy and Inexpensive, Gray & Levine, MSR-TR-2005-39, ftp://ftp.research.microsoft.com/pub/tr/TR-2005-39.doc
Hardware & Software Progress
[Chart: TPC-A and TPC-C throughput per k$, 1990-2004, log scale; ~100x in 10 years, ~2x per 1.5 years; no obvious end in sight; 2006: 5.5..7.4 tpmC/MHz]
• Throughput/$: 2x per 1.5 years (40%/y hardware, 20%/y software); quick check below
• Throughput: 2x per 2 years, tracks MHz (except lately!)
A Measure of Transaction Processing 20 Years Later, ftp://ftp.research.microsoft.com/pub/tr/TR-2005-57.doc, IEEE Data Engineering Bulletin, V. 28.2, pp. 3-4, June 2005
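A quick consistency check on the trend lines quoted above; the growth rates come from the slide, the arithmetic is a minimal sketch.

```python
# 2x every 1.5 years compounds to roughly 100x over a decade, and
# ~40%/y hardware * ~20%/y software gives roughly that doubling time.
import math

growth_10y = 2 ** (10 / 1.5)
print(f"2x every 1.5 years over 10 years -> ~{growth_10y:.0f}x")        # ~102x

combined = 1.40 * 1.20
print(f"40%/y * 20%/y = {combined:.2f}x per year, doubling every "
      f"{math.log(2) / math.log(combined):.1f} years")                  # ~1.3 years
```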
TPC-C Goes Crazy
• 16M$ computer
• 64 cores (processors)
• 2.5M users
• 6,540 disks
• 3.2M tpmC @ 5$/tpmC
Amazing Price/Performance
TPC-C results referenced here are Dell PowerEdge running SQL Server 2005, 38,622 tpmC, 0.99 $/tpmC, available 11/8/05
TPC-C
• 1 $/tpmC @ 39k tpmC (tabulated below)
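For scale, here is a quick tabulation of the two TPC-C data points quoted on these slides; the figures are taken from the slides themselves.

```python
# The two TPC-C results above, side by side.
results = [
    ("64-core, 6,540-disk system",        3_200_000, 5.00),
    ("Dell PowerEdge + SQL Server 2005",     38_622, 0.99),
]
for name, tpmC, price_per_tpmC in results:
    print(f"{name:35s} {tpmC:>9,} tpmC  ~${tpmC * price_per_tpmC:>12,.0f} total")
```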
More Glorious Past
• Big success in semi-standard benchmarks
  • SAP, PeopleSoft benchmarks …
  • Data mining suites
  • OLAP Council
  • SPEC: cpu, files, …
  • PC mags: NetBench, …
  • Micros: Streams, …
  • Internet2 Land Speed Record
Outline
• The glorious past
• The uncertain present
• A promising path forward
Tps: A solved problem?
• 6B people on the planet
• 1 $/tpmC => 1B$ buys ~150 transactions/day for every person on the planet (arithmetic sketched below)
• A 0B$ industry?
• Need a new model of computers & applications
But… industry is chasing Top500 (LINPACK fits in cache) and tpmC: millions of tpmC
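The arithmetic behind the "~150 transactions/day for everyone" bullet, as a minimal sketch; the 100%-utilization figure is a back-of-envelope, and the slide's ~150/day presumably leaves headroom for peak load.

```python
# $1B at ~$1/tpmC buys ~1B transactions per minute of capacity.
people       = 6e9
capacity_tpm = 1e9                       # 1B$ * 1 tpmC/$
per_person_per_day = capacity_tpm * 60 * 24 / people
print(f"~{per_person_per_day:.0f} transactions/day/person at 100% utilization")
# ~240/day flat out; the slide's ~150/day is the same order of magnitude.
```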
A Way Out: Use XML & Perl?
• XML is easy!
• Perl … is easy
• But it gives you only ~10x fewer tps
• 800 tps (64M transactions/day) is more than most companies need! And… that's only a laptop.
Philippe Lacoude, "Pushing SQL Server 2005 Limits - Dealing with Oversized XML Documents" http://www.lacoude.com/docs/public/public.aspx?doc=SQL90XML.pdf
Outline
• The glorious past
• The uncertain present
• A promising path forward
What Makes A Good Benchmark
• Relevant: measures something interesting
• Simple: easy to understand
• Portable: vendor / technology neutral
• Scalable: grows with technology/time
I am complaining about the "relevance".
Two Kinds of Benchmarks
• Evaluate some design concept
  • A performance study others can replicate
  • Examples: Wisconsin, Sort, DebitCredit, XML
• A performance agenda
  • A formal standards group
  • Examples: SPEC, TPC, …
Evaluation is HARD, but much easier than agendas.
Suggestion #1
• Do not chase the tpmC goal
• Do not chase the Top500 goal
Work on "real application benchmarks". Measure things people care about:
• TCO: holistic "total cost of ownership"
• Reliability
• Cost/task: backup/restore/recovery
• Install/configure an application
• Time-to-solution for an analysis task
What Do You Mean?
• You are probably asking:
  • HUH? What do you mean? Can you give two examples?
• Here are three things we did recently:
  • Sort
  • To Blob or Not to Blob?
  • Copy a Petabyte?
Sort 100-byte Records (minute / penny): Shows We Hit the Memory Ceiling in 1995
[Chart: records per second per CPU, 1985-2005, log scale; Super and Mini eras, then slow improvement after 1995 even with cache-conscious code; the GPU's better memory architecture finally gives more records/second]
• Sort recs/s/cpu plateaued in 1995
• Had to go to a GPU to get better memory bandwidth
• SIGMOD 2006: GpuTeraSort
http://research.microsoft.com/barc/SortBenchmark/
Sort Performance/Price Improved Comparably to tpmC
• Based on parallelism and "commodity" parts, not per-cpu performance (illustrative measurement sketch below).
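A minimal, unofficial sketch of the kind of measurement behind the sort chart: time an in-memory sort of 100-byte records (10-byte key plus 90-byte payload, in the gensort spirit) and report records/sec on one core. This is not the Sort Benchmark harness or GpuTeraSort, just an illustration of the metric plotted above.

```python
import os, time

def make_records(n):
    # each record: 10-byte random key + 90-byte payload = 100 bytes
    return [os.urandom(10) + b"x" * 90 for _ in range(n)]

def sort_rate(n=1_000_000):
    recs = make_records(n)
    t0 = time.perf_counter()
    recs.sort(key=lambda r: r[:10])       # sort on the 10-byte key
    dt = time.perf_counter() - t0
    return n / dt                          # records per second on one core

if __name__ == "__main__":
    print(f"{sort_rate():,.0f} records/sec on one core")
```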
To Blob or Not To Blob
• For objects X smaller than 1 MB,
  Select X into x from T where key = 123
  is faster than
  h = open(X); read(h,x,n); close(h)
• So, blob beats file for objects < 1 MB (on SQL Server – what about other DBs?)
• Because DB is CISC and FS is RISC
• Most things are less than 1 MB
• DB should work to make this 10 MB
• File system should borrow ideas from DB (a sketch of the experiment follows)
"To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem?" Rusty Sears, Catharine Van Ingen, Jim Gray, MSR-TR-2006-45, April 2006
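A hedged sketch of the blob-vs-file comparison, using SQLite and the local filesystem as stand-ins; the paper itself measured SQL Server against NTFS, and the object size, count, and file layout here are illustrative assumptions.

```python
import os, sqlite3, time

SIZE = 256 * 1024            # a 256 KB object, i.e. "smaller than 1 MB"
N    = 1000
blob = os.urandom(SIZE)

# Store N copies as rows in a table ...
db = sqlite3.connect("blobs.db")
db.execute("CREATE TABLE IF NOT EXISTS T (key INTEGER PRIMARY KEY, X BLOB)")
db.executemany("INSERT OR REPLACE INTO T VALUES (?, ?)",
               [(k, blob) for k in range(N)])
db.commit()

# ... and N copies as individual files.
os.makedirs("objs", exist_ok=True)
for k in range(N):
    with open(f"objs/{k}.bin", "wb") as f:
        f.write(blob)

t0 = time.perf_counter()
for k in range(N):
    db.execute("SELECT X FROM T WHERE key = ?", (k,)).fetchone()
print("DB   reads:", time.perf_counter() - t0, "s")

t0 = time.perf_counter()
for k in range(N):
    with open(f"objs/{k}.bin", "rb") as f:
        f.read()
print("File reads:", time.perf_counter() - t0, "s")
```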
What About Bit Error Rates?
• Uncorrectable Errors on Read (UERs)
• Quoted uncorrectable bit error rates: 10^-13 to 10^-15
  • That's 1 error in 1 TB to 1 error in 100 TB (conversion below)
  • WOW!!!
• We moved 1.5 PB looking for errors
  • Saw 5 UER events; 3 real, 3 of them were masked by retry
  • Many controller failures and system security reboots
• Conclusion:
  • UER is not a useful metric – want mean time to data loss
  • UER is better than advertised
Empirical Measurements of Disk Failure Rates and Error Rates, Jim Gray, Catharine van Ingen, Microsoft Technical Report MSR-TR-2005-166
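The conversion behind "that's 1 error in 1 TB to 1 error in 100 TB", plus what the quoted rates would have predicted for the 1.5 PB experiment; this is a sketch of the arithmetic only.

```python
# Turn a quoted bit error rate into "one error per N terabytes",
# then compare against the 1.5 PB experiment on the slide.
TB = 1e12                      # bytes (decimal TB, as disk vendors count)
PB = 1e15

for ber in (1e-13, 1e-14, 1e-15):
    bytes_per_error = 1 / ber / 8                 # 1 error per (1/ber) bits
    print(f"BER {ber:.0e}: ~1 error per {bytes_per_error / TB:,.0f} TB")

expected = 1.5 * PB * 8 * 1e-14                   # errors expected at 1e-14
print(f"Expected errors in 1.5 PB at 1e-14: ~{expected:.0f} (observed: a handful)")
```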
So, You Want to Copy a Petabyte?
• Today, that's 4,000 disks (read 2k, write 2k)
• Takes ~4 hours if they all run in parallel (back-of-envelope below), but…
  • Probably not one file
  • You will see a few UERs
  • What's the best strategy?
• How fast can you move a petabyte from CERN to Pasadena? Is sneaker-net fastest and cheapest?
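A back-of-envelope for the petabyte copy; the per-disk size follows from the slide, while the sustained transfer rate is an assumption chosen to land near the slide's ~4 hours.

```python
PB, GB, MB = 1e15, 1e9, 1e6

disks_each_side = 2000                        # read 2k disks, write 2k disks
bytes_per_disk  = 1 * PB / disks_each_side    # 500 GB per source disk
rate            = 35 * MB                     # assumed sustained bytes/sec per disk

hours = bytes_per_disk / rate / 3600
print(f"{bytes_per_disk / GB:.0f} GB per disk -> ~{hours:.1f} h if all disks run in parallel")

# At a quoted BER of 1e-14 you would expect on the order of:
errors = 1 * PB * 8 * 1e-14
print(f"~{errors:.0f} uncorrectable read errors along the way")
```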
UER Things I Wish I Knew
• Better statistics from larger farms, and more diversity
• What is the UER on a LAN, a WAN?
• What is the UER over time:
  • for a file on disk
  • for a disk
• What's the best replication strategy?
  • Symmetric (1+1)+(1+1) or triplex (1+1)+1
More Generally: Performance Metrics
• Operational performance tests
• Things that help us design systems
• We can do these things as research tasks,
  • but they are harder than they look: characterize the problem, then do the actual measurements
• In the end they look simple & easy, but not in the beginning
More on Suggestion #1
• Do not chase the tpmC goal
• Do not chase the Top500 goal
Work on "real application benchmarks". Measure things people care about:
• TCO: holistic "total cost of ownership"
• Reliability
• Cost/task: backup/restore/recovery
• Install/configure an application
• Time-to-solution for an analysis task
Ease-of-Use is HARD
• Few ease-of-use metrics
• They are difficult:
  • "ease of use is what you are used to"
  • Big differences are obvious: rock vs hammer
  • It depends on the task: Python vs MATLAB vs FrontPage
  • Defining the tasks seems to define the winner
• Bench Marketing: for every product there is a benchmark at which it is "the best".
Computers Are "Free"
• Processing cost is going to zero: $/instruction
• Storage cost is going to zero: $/byte & $/access
• Networking cost is going to zero: $/message
• Operations cost is going to zero: $/node
• Ratios are changing:
  • Distance: bandwidth/latency → ∞
  • Heat: accesses/second/GB → 0
But…
• So, anyone with a non-zero budget has infinite storage and processing and …
• Several groups are deploying:
  • fractions of EXABYTES
  • fractions of MILLIONS OF NODES (peta-ops)
  • fractions of TERABYTES/sec OF NETWORK
• Microsoft, Yahoo!, Google, spooks are spending x$B (not $1…) so they have ∞ storage/processing/networking
Oh! And PEOPLE COSTS are HUGE!
• People costs have always exceeded IT capital.
• But now that hardware is "free" …
• Key goal:
  • self-organizing,
  • self-healing
  • No DBAs for cell phones or cameras.
CapX (Capital Expense)
• 1$/GB .. 100$/GB => 1B$ .. 100B$ / ExaByte
• 1k$ .. 100k$/node => 1B$ .. 100B$ / MegaNodes
• 10$ .. 1k$ /Mb/s/mo => 1B$ .. 100B$ / TB/s/y (200 lambdas)
• And then there is the power and the building!!!
Price/Performance Still Matters
• 10% of 1B$ is significant! == $100,000,000
• So, "small improvements" are significant.
• 12 TB + cpu + RAM … for less than 10k$
• How much less??
OpX (Operations Expense)
• Gartner numbers:
  • 300k$/TB/y => 300B$/ExaB/y
  • 7k$/node/y => 7B$/M-nodes/y
• So… have to do things differently…
  • Autopilot: touch free
  • Make OpX less than 10% of CapX => less than one person per 10M$ CapX (worked out below)
    • person per ~10,000 processors
    • person per ~40,000 disks
    • person per ~1,000 routers
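Scaling the Gartner-style per-unit numbers up, and backing out what "one person per 10M$ of CapX" means in devices; the per-device prices are assumptions for illustration, chosen because they reproduce the slide's ratios.

```python
# Gartner-style OpX scaled to "free computing" sizes, plus the people math.
opx_per_TB_year   = 300_000          # $/TB/year
opx_per_node_year = 7_000            # $/node/year
print(f"per exabyte/year : ${opx_per_TB_year * 1_000_000:,.0f}")     # 1 EB = 1M TB
print(f"per M nodes/year : ${opx_per_node_year * 1_000_000:,.0f}")

capx_per_person = 10_000_000         # "one person per 10M$ CapX"
assumed_price = {"processors": 1_000, "disks": 250, "routers": 10_000}  # $/unit, assumed
for unit, price in assumed_price.items():
    print(f"one person per ~{capx_per_person // price:,} {unit}")
```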
And the Ratios are Changing: Other Things Approaching Zero
• Bandwidth/latency
• CapX/OpX
• Accesses/sec vs storage capacity
• Moore's wall: chips/thread
• Cost-per-person is rising.
TCO: Performance Metrics
• TCO (Total Cost of Ownership)
  • CapX and OpX interact
  • You can waste hardware to save OpX
• Metrics (CapX + OpX):
  • $/ExaB/year
  • $/PetaOp/year
  • $/PageViews/year
  • Availability (fraction of requests serviced)
• Agreed:
  • It is difficult to do this in academe
  • It is difficult to publish results
  • Competitive issues
  • Reviewers do not "get it"
• But it is very important
The TerraServer Story (not able to talk about others)
• 1997: 8 DEC Alpha, 8 GB RAM, 480 x 18 GB disks, ~1 TB
• 2000: 4x8 Pentium III 600 MHz, 16 GB RAM, 540 x 36 GB FC SCSI disks, FC SAN, ~18 TB
• 2004: 7x2 Xeon, ~100 x 250 GB SATA disks, 28 TB, 70k$, NO TAPE
Now antique, but you get the idea.
"TerraServer Bricks – A High Availability Cluster Alternative" MSR-TR-2004-107
"TerraServer Cluster and SAN Experience" MSR-TR-2004-67
Oh! And There is the Other 99%
• Mega-servers are just the core
• Most processing is in the periphery
• Billions of clients need millions of servers
  • Sensors, cameras, cell phones, PCs
• Issues:
  • Admin cost
  • Footprint (where to put the intelligence)
  • Replication strategies
  • Battery life
  • …
More on Suggestion #1
• Do not chase the tpmC goal
• Do not chase the Top500 goal
Work on "real application benchmarks". Measure things people care about:
• TCO: holistic "total cost of ownership"
• Reliability
• Cost/task: backup/restore/recovery
• Install/configure an application
• Time-to-solution for an analysis task
Outline
• The glorious past
• The uncertain present
• A promising path forward
Many Little Beat Few Big
[Diagram: the storage/processor hierarchy: mainframe, mini, micro, nano, pico processors at $1 million / $100K / $10K price points; 10 pico-second RAM, 10 nano-second RAM, 10 microsecond RAM, 10 millisecond disc, 10 second tape archive; 1 MB up to 100 TB; disk form factors 14" down to 1.8"]
• Smoking, hairy golf ball
• How to connect the many little parts?
• How to program the many little parts?
• Fault tolerance?
In The Limit: The Pico Processor (the Smoking Hairy Golf Ball)
• 1M SPECmarks, 1 TFLOP
• 10^6 clocks to bulk RAM
• Event-horizon on chip
• VM reincarnated
• Multi-program cache, on-chip SMP
1.8" 2.5" 3.5" 5.25" 9" 14" Disc Trends Discs are getting Bigger ( 1TB/unit)Cooler 50,000 access/sec/TB -> 50 a/s/TB
Why Parallel Access To Data?
• At 10 MB/s it takes 1.2 days to scan a terabyte
• 1,000x parallel: a 1.5 minute SCAN (arithmetic below)
• BANDWIDTH through parallelism: divide a big problem into many smaller ones to be solved in parallel.
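The arithmetic behind the scan numbers, assuming a 1 TB scan as on the slide.

```python
# Scanning 1 TB at 10 MB/s, serially vs. striped across 1,000 disks.
TB, MB = 1e12, 1e6
serial_seconds   = TB / (10 * MB)
parallel_seconds = serial_seconds / 1000
print(f"serial:   {serial_seconds / 86400:.1f} days")     # ~1.2 days
print(f"parallel: {parallel_seconds / 60:.1f} minutes")   # ~1.7 minutes
```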
Data-Flow Programming: Prefetch & Post-write Hide Latency
• Can't wait for the data to arrive (2,000 years!)
• Need a memory that gets the data in advance (~100 MB/s)
• Solution:
  • Pipeline from source (tape, disc, ram...) to cpu cache
  • Pipeline results to destination (a sketch of the idea follows)
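A minimal sketch of the pipelining idea in ordinary code: a prefetch thread streams blocks from the source into a bounded queue while the consumer processes them, so transfer latency overlaps with computation instead of adding to it. The block source and processing function here are toy placeholders, not any particular system's API.

```python
import queue, threading

def prefetcher(read_block, q):
    """Read blocks ahead of the consumer and post them into the pipeline."""
    for block in read_block():
        q.put(block)
    q.put(None)                        # end-of-stream marker

def pipelined_scan(read_block, process, depth=8):
    q = queue.Queue(maxsize=depth)     # bounded: prefetch only `depth` blocks ahead
    threading.Thread(target=prefetcher, args=(read_block, q), daemon=True).start()
    while (block := q.get()) is not None:
        process(block)                 # overlaps with the next read

# Usage with a toy source (a real source would be a disk, tape, or socket):
def toy_source(n=1000, size=64 * 1024):
    def gen():
        for _ in range(n):
            yield b"\0" * size
    return gen

pipelined_scan(toy_source(), process=lambda b: sum(b[::4096]))
```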