Store Everything Online In A Database
Jim Gray
Microsoft Research
Gray@Microsoft.com
http://research.microsoft.com/~gray/talks
http://research.microsoft.com/~gray/talks/Science_Data_Centers.ppt
Outline
• Store Everything
• Online (Disk not Tape)
• In a Database
How Much is Everything?
• Soon everything can be recorded and indexed
• Most bytes will never be seen by humans.
• Data summarization, trend detection, and anomaly detection are key technologies
See Mike Lesk, How much information is there?: http://www.lesk.com/mlesk/ksg97/ksg.html
See Lyman & Varian, How much information: http://www.sims.berkeley.edu/research/projects/how-much-info/
[Chart: the scale of recorded information, kilo through mega, giga, tera, peta, exa, zetta, yotta: a photo, a book, a movie, all LoC books (words), all books + multimedia, everything recorded; SI prefix legend from 10^-24 (yocto) to 10^-3 (milli)]
Storage capacity beating Moore’s law
• 3 k$/TB today (raw disk)
• 1 k$/TB by end of 2002
Outline
• Store Everything
• Online (Disk not Tape)
• In a Database
Online Data
• Can build 1 PB of NAS disk for 5 M$ today
• Can SCAN (read or write) the entire PB in 3 hours.
• Operate it as a data pump: continuous sequential scan
• Can deliver 1 PB for 1 M$ over the Internet
  • Access charge is 300 $/Mbps bulk rate
• Need to geoplex data (store it in two places).
• Need to filter/process data near the source, to minimize network costs.
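A quick back-of-envelope check of the scan-time and delivery-cost claims on this slide. The 35 MB/s per-drive figure and the one-month delivery window are my assumptions, not numbers from the talk.

```python
# Back-of-envelope numbers behind the "Online Data" slide.
# Assumptions (mine): 35 MB/s per commodity drive, a one-month delivery
# window, and decimal units (1 PB = 10**15 bytes).

PB = 10**15                  # bytes
DRIVE_BW = 35e6              # bytes/s per drive (assumed)
SCAN_HOURS = 3

agg_bw = PB / (SCAN_HOURS * 3600)        # aggregate bandwidth needed
drives = agg_bw / DRIVE_BW               # drives streaming in parallel
print(f"3-hour PB scan needs ~{agg_bw/1e9:.0f} GB/s ≈ {drives:.0f} drives at 35 MB/s")

# Delivering 1 PB over the Internet in about a month at 300 $/Mbps bulk rate:
month_s = 30 * 86400
mbps_needed = PB * 8 / month_s / 1e6     # megabits per second
cost = mbps_needed * 300
print(f"1 PB/month ≈ {mbps_needed:,.0f} Mbps ≈ ${cost/1e6:.1f}M at 300 $/Mbps")
```

The second print lands at roughly $0.9M, which is where the slide's "1 PB for 1 M$" figure comes from.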
The “Absurd” Disk
• 1 TB, 100 MB/s, 200 Kaps
• 2.5 hr scan time (poor sequential access)
• 1 access per second / 5 GB (VERY cold data)
• It’s a tape!
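The ratios behind the "absurd" disk, worked out. Reading "200 Kaps" as roughly 200 small random accesses per second (Gray's kilobyte-accesses-per-second rule-of-thumb unit) is my interpretation.

```python
# Ratios for a hypothetical 1 TB disk: huge capacity, modest bandwidth, few seeks.
TB = 10**12
capacity = 1 * TB            # bytes
bandwidth = 100e6            # bytes/s
accesses_per_s = 200         # "200 Kaps": ~200 small random accesses/s (assumed reading)

scan_hours = capacity / bandwidth / 3600
gb_per_access = capacity / accesses_per_s / 1e9
print(f"full scan: {scan_hours:.1f} hours")        # ~2.8 h (the slide rounds to 2.5)
print(f"1 access/s per {gb_per_access:.0f} GB")    # ~5 GB of data per access/s
```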
Disk vs Tape Guestimates

              Disk                     Tape
Capacity      80 GB                    40 GB
Bandwidth     35 MBps                  10 MBps
Access        5 ms seek +              10 sec pick +
              3 ms rotate latency      30-120 second seek
Cost          3 $/GB for drive +       2 $/GB for media +
              2 $/GB for ctlrs/cabinet 8 $/GB for drive+library
Density       15 TB/rack               10 TB/rack
Scan time     1 hour                   1 week

[Photo: CERN tape store, 200 TB on 3480 tapes; 2 columns = 50 GB; rack = 1 TB = 12 drives]

The price advantage of disk is growing, and the performance advantage of disk is huge! At 10 k$/TB, disk is competitive with nearline tape.
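A sketch of what the table implies for $/TB and rack scan times. The number of tape drives serving a rack (two here) is an assumption for illustration.

```python
# Disk vs tape, using the guesstimates above (decimal units throughout).
disk = {"GB": 80, "MBps": 35, "$/GB": 3 + 2, "TB/rack": 15}   # drive + ctlrs/cabinet
tape = {"GB": 40, "MBps": 10, "$/GB": 2 + 8, "TB/rack": 10}   # media + drive/library

for name, d in [("disk", disk), ("tape", tape)]:
    print(f"{name}: {d['$/GB'] * 1000:,.0f} $/TB")
# -> disk ~5 k$/TB vs tape ~10 k$/TB: the 10 k$/TB crossover on the slide.

# Rack scan time: every disk in a rack can stream at once, while a tape rack
# is read through only a few drives (assume 2 here).
disk_scan_h = (disk["GB"] * 1e9 / (disk["MBps"] * 1e6)) / 3600       # per drive, all in parallel
tape_scan_d = (tape["TB/rack"] * 1e12 / (2 * tape["MBps"] * 1e6)) / 86400
print(f"disk rack scan ~{disk_scan_h:.1f} h; tape rack scan ~{tape_scan_d:.0f} days")
```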
Building a Petabyte Disk Store
• Cadillac: ~500 k$/TB = 500 M$/PB, plus FC switches, plus… ≈ 800 M$/PB
• TPC-C SANs (Brand PC 18GB/…): 60 M$/PB
• Brand PC, local SCSI: 20 M$/PB
• Do-it-yourself ATA: 5 M$/PB
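Roughly what the do-it-yourself ATA figure implies in hardware, under assumed 2001-era parts (80 GB drives, 8 drives per ~3 k$ server).

```python
# What 1 PB of do-it-yourself ATA looks like, with assumed parts.
PB = 10**15
drive_gb, drives_per_server, server_cost = 80, 8, 3000   # assumptions

drives = PB / (drive_gb * 1e9)
servers = drives / drives_per_server
print(f"{drives:,.0f} drives in {servers:,.0f} servers ≈ ${servers * server_cost / 1e6:.1f}M/PB")
# ~12,500 drives, ~1,600 servers, ~5 M$: in line with the 5 M$/PB estimate above.
```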
Cheap Storage and/or Balanced System
• Low-cost storage (2 x 3 k$ servers): 5 k$/TB
  • 2 x (800 MHz, 256 MB, 8 x 80 GB disks, 100 MbE)
  • RAID5 costs 6 k$/TB
• Balanced server (5 k$ / 0.64 TB):
  • 2 x 800 MHz (2 k$)
  • 512 MB
  • 8 x 80 GB drives (2 k$)
  • Gbps Ethernet + switch (300 $/port)
  • 9 k$/TB, 18 k$/mirrored TB
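Checking the $/TB arithmetic behind these two configurations (decimal units; the per-port switch price is the slide's figure, the rest of the breakdown follows the bullets above).

```python
# Low-cost storage brick: two 3 k$ servers, each with 8 x 80 GB drives.
servers, server_cost, drives, drive_gb = 2, 3000, 8, 80
tb = servers * drives * drive_gb / 1000                          # 1.28 TB raw
print(f"low-cost: {servers * server_cost / tb / 1000:.1f} k$/TB raw")   # ~4.7 -> "5 k$/TB"

# Balanced server: 5 k$ for 0.64 TB, plus a GbE port at 300 $.
balanced = (5000 + 300) / 0.64 / 1000
print(f"balanced: ~{balanced:.1f} k$/TB, mirrored ~{2 * balanced:.0f} k$/TB")
# Lands near the slide's 9 k$/TB and 18 k$ per mirrored TB once the rest of
# the switch and cabinet overhead is folded in.
```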
Next step in the Evolution
• Disks become supercomputers
  • Controller will have 1 bips, 1 GB RAM, 1 GBps net, and a disk arm.
• Disks will run a full-blown app/web/db/OS stack
• Distributed computing: processors migrate to transducers.
It’s Hard to Archive a Petabyte: It takes a LONG time to restore it.
• At 1 GBps it takes 12 days!
• Store it in two (or more) places online (on disk?): a geo-plex
• Scrub it continuously (look for errors)
• On failure:
  • use other copy until failure repaired,
  • refresh lost copy from safe copy.
• Can organize the two copies differently (e.g.: one by time, one by space)
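The restore-time arithmetic behind the first bullet, in decimal units.

```python
# Restoring a petabyte over a 1 GB/s pipe.
PB = 10**15
restore_days = PB / 1e9 / 86400
print(f"{restore_days:.1f} days")   # ~11.6 days, the slide's "12 days"
```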
Outline
• Store Everything
• Online (Disk not Tape)
• In a Database
Why Not file = object + GREP?
• It works if you have thousands of objects (and you know them all)
• But hard to search millions/billions/trillions with GREP
• Hard to put all attributes in file name.
  • Minimal metadata
• Hard to do chunking right.
• Hard to pivot on space/time/version/attributes.
The Reality: it’s build vs buy
• If you use a file system you will eventually build a database system:
  • metadata,
  • query,
  • parallel ops,
  • security,
  • reorganize,
  • recovery,
  • distributed,
  • replication, …
OK: so I’ll put lots of objects in a file
Do It Yourself Database
• Good news:
  • Your implementation will be 10x faster than the general-purpose one, and easier to understand and use than the general-purpose one.
• Bad news:
  • It will cost 10x more to build and maintain
  • Someday you will get bored maintaining/evolving it
  • It will lack some killer features:
    • Parallel search
    • Self-describing via metadata
    • SQL, XML, …
    • Replication
    • Online update – reorganization
    • Chunking is problematic (what granularity, how to aggregate)
Top 10 reasons to put Everything in a DB
• Someone else writes the million lines of code
• Captures data and metadata
• Standard interfaces give tools and quick learning
• Allows schema evolution without breaking old apps
• Index and pivot on multiple attributes: space-time-attribute-version…
• Parallel terabyte searches in seconds or minutes
• Moves processing & search close to the disk arm (moves fewer bytes: questons return datons).
• Chunking is easier (can aggregate chunks at the server).
• Automatic geo-replication
• Online update and reorganization.
• Security
• If you pick the right vendor, ten years from now there will be software that can read the data.
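A toy illustration of a few items on this list (metadata capture, multi-attribute indexing, declarative search) using SQLite from Python. The schema and the space-time column names are made up for the example; they are not from any of Gray's systems.

```python
import sqlite3

# Toy "everything in a DB" store: each object is a blob plus queryable metadata.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE objects (
                 id INTEGER PRIMARY KEY,
                 t  REAL,                    -- observation time (hypothetical attribute)
                 ra REAL, dec REAL,          -- sky position (hypothetical attributes)
                 version INTEGER,
                 payload BLOB)""")
# Indexes let the engine pivot on time or space without scanning the blobs.
db.execute("CREATE INDEX ix_time  ON objects(t)")
db.execute("CREATE INDEX ix_space ON objects(ra, dec)")

db.executemany("INSERT INTO objects(t, ra, dec, version, payload) VALUES (?,?,?,?,?)",
               [(i * 0.1, i % 360, (i % 180) - 90, 1, b"\x00" * 64) for i in range(10_000)])

# Declarative query: the engine, not the application, decides how to search.
rows = db.execute("""SELECT id, t FROM objects
                     WHERE ra BETWEEN 10 AND 20 AND dec BETWEEN -5 AND 5
                       AND t > 100 ORDER BY t LIMIT 5""").fetchall()
print(rows)
```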
DB Centric Examples
• TerraServer
  • All images and all data in the database (chunked as small tiles). www.TerraServer.Microsoft.com/
  • http://research.microsoft.com/~gray/Papers/MSR_TR_99_29_TerraServer.doc
• SkyServer & Virtual Sky
  • Both image and semantic data in a relational store.
  • Parallel search & NonProcedural access are important.
  • http://research.microsoft.com/~gray/Papers/MS_TR_99_30_Sloan_Digital_Sky_Survey.doc
  • http://dart.pha.jhu.edu/sdss/getMosaic.asp?Z=1&A=1&T=4&H=1&S=10&M=30
  • http://virtualsky.org/servlet/Page?F=3&RA=16h+10m+1.0s&DE=%2B0d+42m+45s&T=4&P=12&S=10&X=5096&Y=4121&W=4&Z=-1&tile.2.1.x=55&tile.2.1.y=20
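A minimal sketch of the TerraServer-style "chunk imagery as small tiles in the database" idea, again with SQLite. The (theme, level, row, col) key and the tile size are assumptions for illustration, not TerraServer's actual schema.

```python
import sqlite3

# Imagery chunked into small tiles, keyed by theme / zoom level / row / col.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE tiles (
                 theme TEXT, level INTEGER, row INTEGER, col INTEGER,
                 jpeg  BLOB,
                 PRIMARY KEY (theme, level, row, col))""")

fake_jpeg = b"\xff\xd8" + b"\x00" * 8000       # stand-in for a ~8 KB image tile
db.executemany("INSERT INTO tiles VALUES (?,?,?,?,?)",
               [("aerial", 12, r, c, fake_jpeg) for r in range(4) for c in range(4)])

# A web server assembles a mosaic by fetching only the tiles a view needs.
view = db.execute("""SELECT row, col, length(jpeg) FROM tiles
                     WHERE theme = 'aerial' AND level = 12
                       AND row BETWEEN 1 AND 2 AND col BETWEEN 1 AND 2""").fetchall()
print(view)
```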
Outline
• Store Everything
• Online (Disk not Tape)
• In a Database