540 likes | 640 Views
Microsoft Large Databases and Grid Computing. Jim Gray Microsoft Research Gray@Microsoft.com http://research.Microsoft.com/~gray 21 May 2003 . About me. in Microsoft research (located in San Francisco) A database researcher IBM, Tandem, DEC, Microsoft Work on Scalable Systems
E N D
Microsoft Large DatabasesandGrid Computing Jim Gray Microsoft Research Gray@Microsoft.com http://research.Microsoft.com/~gray 21 May 2003
About me • in Microsoft research (located in San Francisco) • A database researcher • IBM, Tandem, DEC, Microsoft • Work on Scalable Systems • Building supercomputers from commodity components. • Do academic/government things too • PITAC, GriPhyn TAB, NSF/CISE, Library of Congress, … • For the last 4 years, been working with the astronomy community to build the World Wide Telescope.
Agenda • TerraServer • What it is • What we learned • What we are doing now. • SkyServer / WWT • What it is • What we learned • What we are doing now • Grid Computing • General comments • Build a web service
TerraServerTerraService.net • A photo of the United States • 1 meter resolution (photographic/topographic) • USGS data • Some demographic data (BestPlaces.net) • Home sales data • Linked to Encarta Encyclopedia • 15 TB raw, 6 TB cooked (grows 10GB/w) • Point, Pan, zoom interface • Among top 1,000 websites • 40k visitors/day • 4M queries/day • 3 B page views (in 5 years) • All in an SQL database
Daily Average Peak Day 40,011 277,292 1,266,838 2,401,209 Sept ’01 June 1998 - Oct, 2002 June ‘98 Jan ‘99 Jan ‘00 May ‘00 3,735,789 10,475,674 4,484,089 12,388,104 63,656,904 70 gb 163 gb 2,015,539,605 5,943,641,024 7,134,186,170 108tb 298 m Rows 231 m Rows SQL 20001 Server.8 TB Db SQL 2000.8 TB Db 217 m Rows SQL 7.01 Server1.5 TB Db 173 m Rows SQL 7.01.0 TB Db SQL 7.0.75 TB Db TerraServer Statistics Dec ‘02 Unique Users Page Views Image Tiles Db Queries Bytes Xfered 900 m Rows 755mRows SQL 20002.0 TB Db SQL 20001.4 TB Db SQL 20002.0 TB Db SQL 20001.2 TB Db SQL 20002.0 TB Db SQL 7.01.0 TB Db SQL 20001.0 TB Db 1 Server / Win NT 4.0 EE 2nd Server / Win 2k DataCenter 4 Node / Win2k Datacenter Failover Cluster
2200 2200 2200 E E J J O O 2200 2200 2200 G F P Q K L 2200 2200 2200 R S M N H I TerraServer Cluster 8 Compaq DL360 “Photon” Web Servers One SQL database per rack Each rack contains 4.5 TB 1 rack not in picture 18.0 TB total Fiber SAN Switches Meta Data Stored on 101 GB “Fast, Small Disks”(18 x 18.2 GB) SQL\Inst1 Imagery Data Stored on 4 339 GB “Slow, Big Disks” (15 x 73.8 GB) SQL\Inst2 SQL\Inst3 Added 90 72.8 GB Disks in Feb 2001 to create 18 TB SAN Spare 4 Compaq ProLiant 8500 Db Servers
Cluster Configuration Internet Cisco 12000 Internet Router TerraServer SAN Gigibit Ethernet 1 Database Compaq StorageWorks 100-Mbps Ethernet Compaq DL360 (10) Cluster MA8000/HSG80 Controllers (3) Summit 7i Switch (2) 2 Compaq DL360 (6) (Windows 2000 Web Servers) TerraServer.microsoft.com Compaq SANswitch by Brocade Communications Internet Extreme Networks Summit 48 Switch ADIC LTO Tape Library 3 Compaq ProLiant 8500 (4) Microsoft Corporate LAN
TerraServer Becomes a Web ServiceTerraServer.net -> TerraService.Net • Web server is for people. • Web Service is for programs • The end of screen scraping • No faking a URL: pass real parameters. • No parsing the answer: data formatted into your address space. • Hundreds of users but a specific example: • US Department of Agriculture
And now.. 4 slides from the “customer”who built a portal using TerraService
Catalog Service Data Gateway Functional Overview ITC - Fort Collins, Colorado NCGC - Fort Worth, Texas Customer Orders Data Terra Service Billing Services Soil Data Viewer Navigation Service Rimage CD Service XML XML XML FTP Services ASP Ship Service <<Requests Products>> Package Service Send order info Order Placer validate (dtd) Insert into SQL @@Identity / GUID to client return est time raise OrderMgr.event Product Catalog Updates Geospatial Data Order Database Data Services Logger XML Request for data Called by anyone rasies to stats svc' Item Broker Selects from Listen for OrderPlacer Raised Event Select sequenced Item Acknowledges item ready for delivery Output XML rasie event : stats.delivery start
Custom End Product Web Soil Data Viewer XML Soil Report Soil Interpretation Map
Web Server - COM+ Applications ArcIMS Connector WebSDV IMSNavigator Image Retriever Connects to ArcIMS; communication is done through ArcIMS XML (AXL) Retrieves and processes Soils Data from the NASIS relational Database Generates maps (JPGs) using ArcIMS Retrieves imagery from the Microsoft TerraServer Database Server - ESRI Spatial Data Server ESRI Spatial Data Engine Database Server - Microsoft SQL Server Business Rules National SoilsData GeospatialData Microsoft Terraserver Terraserver
Brief tour of TerraService • Show map service • Show some methods • See TerraService.NET: An Introduction to Web ServicesTom Barclay; Jim Gray; Eric Strand; Steve Ekblad; Jeffrey Richter, MSR TR 2002-53, pp 13, June 2002
What We Learned • You can build and manage a very popular website with relatively little effort (if you do it right and have Tom Barclay) • Loading 20 TB takes a lot of energy • And you get to do it many times -- automate • Tape and tape software are problematic • Triplex and snap-shot disks works (we have never had to use it, but..) • The internet gives you 2-9’sServers can run at 4 9’s easily, 5 9’s with effort.
What we are doing now. • Building with 3K$ 2TB bricks • 4 bricks = 1 backend • Triplexing systems • Duplexing sites. • 4*3*2 = 24k$ for Geoplex • Very simple operations model • See: • “TeraScale SneakerNet: Using Inexpensive Disks for Backup, Archiving, and Data Exchange,” Jim Gray; Wyman Chong; Tom Barclay; Alex Szalay; Jan Vandenberg, pp. 1-8, May 2002
Agenda • TerraServer • What it is • What we learned • What we are doing now. • SkyServer / WWT • What it is • What we learned • What we are doing now • Grid Computing • General comments • Build a web service
SkyServerSkyServer.SDSS.org • Like the TerraServer, but looking the other way: a picture of ¼ of the universe • Pixels +Data Mining • Astronomers get about 400 attributes for each “object” • Get Spectrograms for 1% of the objects
ROSAT ~keV DSS Optical IRAS 25m 2MASS 2m GB 6cm WENSS 92cm NVSS 20cm IRAS 100m Why Astronomy Data? • It has no commercial value • No privacy concerns • Can freely share results with others • Great for experimenting with algorithms • It is real and well documented • High-dimensional data (with confidence intervals) • Spatial data • Temporal data • Many different instruments from many different places and many different times • Federation is a goal • The questions are interesting • How did the universe form? • There is a lot of it (petabytes)
Demo of SkyServer • Shows standard web server • Pixel/image data • Point and click • Explore one object • Explore sets of objects (data mining)
Virtual Observatoryhttp://www.astro.caltech.edu/nvoconf/http://www.voforum.org/ • Premise: Most data is (or could be online) • So, the Internet is the world’s best telescope: • It has data on every part of the sky • In every measured spectral band: optical, x-ray, radio.. • As deep as the best instruments (2 years ago). • It is up when you are up.The “seeing” is always great (no working at night, no clouds no moons no..). • It’s a smart telescope: links objects and data to literature on them.
Time and Spectral DimensionsThe Multiwavelength Crab Nebulae Crab star 1053 AD X-ray, optical, infrared, and radio views of the nearby Crab Nebula, which is now in a state of chaotic expansion after a supernova explosion first sighted in 1054 A.D. by Chinese Astronomers. Slide courtesy of Robert Brunner @ CalTech.
Data Federations of Web Services • Massive datasets live near their owners: • Near the instrument’s software pipeline • Near the applications • Near data knowledge and curation • Super Computer centers become Super Data Centers • Each Archive publishes a web service • Schema: documents the data • Methods on objects (queries) • Scientists get “personalized” extracts • Uniform access to multiple Archives • A common global schema Federation
Grid and Web Services Synergy • I believe the Grid will be many web services share data (computrons are free) • IETF standards Provide • Naming • Authorization / Security / Privacy • Distributed Objects Discovery, Definition, Invocation, Object Model • Higher level services: workflow, transactions, DB,.. • Synergy: commercial Internet & Grid tools
Web Services: The Key? Your program Web Server http • Web SERVER: • Given a url + parameters • Returns a web page (often dynamic) • Web SERVICE: • Given a XML document (soap msg) • Returns an XML document • Tools make this look like an RPC. • F(x,y,z) returns (u, v, w) • Distributed objects for the web. • + naming, discovery, security,.. • Internet-scale distributed computing Web page Your program Web Service soap Data In your address space objectin xml
SkyQuery: a prototype • Defining Astronomy Objects and Methods. • Federated 3 Web Services(fermilab/sdss, jhu/first, Cal Tech/dposs) multi-survey cross-match Distributed query optimization (T. Malik, T. Budavari, Alex Szalay @ JHU) http://SkyQuery.net/ • My first web service (cutout + annotated SDSS images) online • http://skyservice.pha.jhu.edu/devel/ImgCutout/chart.asp • WWT is a great Web Services (.Net) application • Federating heterogeneous data sources. • Cooperating organizations • An Information At Your Fingertips challenge.
Demo of Image Cutout Service • Shows image cutout • Show project and debugging project • Show hello World • Show “theAnswer” method
SkyQuery (http://skyquery.net/) • Distributed Query tool using a set of services • Feasibility study, built in 6 weeks from scratch • Tanu Malik (JHU CS grad student) • Tamas Budavari (JHU astro postdoc) • Implemented in C# and .NET • Allows queries like: SELECT o.objId, o.r, o.type, t.objId FROM SDSS:PhotoPrimary o, TWOMASS:PhotoPrimary t WHERE XMATCH(o,t)<3.5 AND AREA(181.3,-0.76,6.5) AND o.type=3 and (o.I - t.m_j)>2
SkyNode Basic Web Services • Metadata information about resources • Waveband • Sky coverage • Translation of names to universal dictionary (UCD) • Simple search patterns on the resources • Cone Search • Image mosaic • Unit conversions • Simple filtering, counting, histogramming • On-the-fly recalibrations
Portals: Higher Level Services • Built on Atomic Services • Perform more complex tasks • Examples • Automated resource discovery • Cross-identifications • Photometric redshifts • Outlier detections • Visualization facilities • Goal: • Build custom portals in days from existing building blocks (like today in IRAF or IDL)
Architecture Image cutout SkyNodeFirst Web Page SkyQuery SkyNode2Mass SkyNodeSDSS
Summary So Far • Some real web services deployed today • Easy to build & deploy • Services publish data, Portals unify it • Tools really work! • I’m using C# and foundation classes of VisualStudio, a great! Tool • A nice book explaining the ideas:(.Net Framework Essentials, Thai, Lam isbn 0-596-00302-1)
Your program Web Service soap Data In your address space objectin xml Possible Relevance to You • This web service stuff is REAL • If you have a class, It is a way to publish data: Internet Intranet • It is a way to find data data comes with schema no more screen scraping/parsing • Business model unclear • Your ideas go here.
What We Learned • Web services really are a breakthrough. • Data mining worked beautifully. SeeData Mining the SDSS SkyServer Database,”J. Gray, D. Slutz, A. Szalay, A. Thakar, P. Kuntz, C. Stoughton, MSR TR 2002-1, pp1-40, 2002. • You can operate a system in Chicago from San Francisco – Terminal Server is wonderful. • The Internet gives you 2 9’s of availability • TeraScale SneakerNet works well
What we are doing now. • Loading more data (next data release) • Preparing for the next generation • Building the WWT • Web Services for the Virtual Observatory,Alexander S. Szalay, Tamás Budavária, Tanu Malika, Jim Gray, and Ani Thakar, SPIE Astronomy Telescopes and Instruments, 22-28 August 2002, Waikoloa, Hawaii, • Petabyte Scale Data Mining: Dream or Reality?,Alexander S. Szalay; Jim Gray; Jan vandenBerg, SIPE Astronomy Telescopes and Instruments, 22-28 August 2002, Waikoloa, Hawaii, • Online Scientific Data Curation, Publication, and ArchivingJim Gray; Alexander S. Szalay; Ani R. Thakar; Christopher Stoughton; Jan vandenBerg, SPIE Astronomy Telescopes and Instruments, 22-28 August 2002, Waikoloa, Hawaii,
Agenda • TerraServer • What it is • What we learned • What we are doing now. • SkyServer / WWT • What it is • What we learned • What we are doing now • Grid Computing • General comments • Build a web service
The Grid • Computation Grid: harvest Internet cpus. • Data Grid: Share files • Application Grid: Web services • Access Grid: teleconferencing
The Microsoft View • Web Services will subsume the Grid • The Grid will be data and servicesnot renting cycles • OGSA: evolution of Globus Toolkit to Web services concepts and technologies… • Lots of encouragement from Microsoft, IBM, Oracle, Sun • GGF as forum for discussion
Engagement with Grid Community • Goal: GXA as infrastructure for Grids • Working with Globus & GGF • Funding work at Argonne National Lab (Globus) • Globus Toolkit 3, and CondorG on Windows • http://www.globus.org/win-alpha/ (we sponsored this) • OGSA for .NET (prototyping) • http://www.globus.org/ogsa/ • Also OGSI.NET at U. VA is very interesting • http://www.cs.virginia.edu/~gsw2c/ogsi.net.html • GGF • Active membershp • HPC .net kit – see http://www.microsoft.com/HPC • Part of .net server scale out development • Includes MPI-CH 1.2.4, distributed job scheduler,… • Thomas Sterling, Beowulf on Windows, MIT Press 2001
What’s Microsoft Doing • Mostly .NET, W3C standards, web services, … • I think SkyQuery is the best web service (grid app) in GriPhyN today. • My stuff is grid computing • But… • Globus (GT3), OGSA, and CondorG ported to Windows (we sponsored it) • We have a HPC toolkit: MPI-CH 1.2.4 • See http://www.microsoft.com/windows2000/hpc/ for many useful links
I Can Talk About Computing on Demand But… Best to read • Distributed Computing Economics, Jim Gray, MSR-TR-2003-24, March 2003 • The slides that follow are based on that paper.
Distributed Computing Economics • Why is Seti@Home a great idea • Why is Napster a great deal? • Why is the Computational Grid uneconomic • When does computing on demand work? • What is the “right” level of abstraction • Is the Access Grid the real killer app?
Computing is Free • Computers cost 1k$ (if you shop right) • So 1 cpu day == 1$ • If you pay the phone bill (and I do)Internet bandwidth costs 50 … 500$/mbps/m(not including routers and management). • So 1GB costs 1$ to send and 1$ to receive
Why is Seti@Home a Good Deal? • Send 300 KB for costs 3e-4$ • User computes for ½ day: benefit .5e-1$ • ROI: 1500:1
Why is Napster a Good Deal? • Send 5 MB costs 5e-3$ • ½ a penny per song • Both sender and receiver can afford it. • Same logic powers web sites (Yahoo!...): • 1e-3$/page view advertising revenue • 1e-5$/page view cost of serving web page • 100:1 ROI
The Cost of Computing:Computers are NOT free! • Capital Cost of a TpcC system is mostly storage and storage software (database) • IBM 32 cpu, 512 GB ram 2,500 disks, 43 TB(680,613 tpmC @ 11.13 $/tpmc available 11/08/03)http://www.tpc.org/results/individual_results/IBM/IBMp690es_05092003.pdf • A 7.5M$ super-computer • Total Data Center Cost: 40% capital &facilities 60% staff(includes app development)
Computing Equivalents1 $ buys • 1 day of cpu time • 4 GB ram for a day • 1 GB of network bandwidth • 1 GB of disk storage • 10 M database accesses • 10 TB of disk access (sequential) • 10 TB of LAN bandwidth (bulk)
Some consequences • Beowulf networking is 10,000x cheaper than WAN networkingfactors of 105 matter. • The cheapest and fastest way to move a Terabyte cross country is sneakernet.24 hours = 4 MB/s50$ shipping vs 1,000$ wan cost. • Sending 10PB CERN data via networkis silly: buy disk bricks in Geneva, fill them, ship them – one way. TeraScale SneakerNet: Using Inexpensive Disks for Backup, Archiving, and Data Exchange Jim Gray; Wyman Chong; Tom Barclay; Alex Szalay; Jan vandenBerg Microsoft Technical Report may 2002, MSR-TR-2002-54 http://research.microsoft.com/research/pubs/view.aspx?tr_id=569
SpeedMbps Rent$/month $/TBSent Context $/Mbps Time/TB 0.04 40 1,000 3,086 6 years Home phone Home DSL 0.6 70 117 360 5 months T1 1.5 1,200 800 2,469 2 months T3 43 28,000 651 2,010 2 days OC3 155 49,000 316 976 14 hours OC 192 9600 1,920,000 200 617 14 minutes 100 Mpbs 100 1 day Gbps 1000 2.2 hours How Do You Move A Terabyte?
Computational Grid Economics • To the extent that computational grid is like Seti@Home or ZetaNet or Folding@home or… it is a great thing • The extent that the computational grid is MPI or data analysis, it fails on economic grounds: move the programs to the data, not the data to the programs. • The Internet is NOT the cpu backplane. • The USG should not hide this economic fact from the academic/scientific research community.
Computing on Demand • Was called outsourcing / service bureaus in my youth. CSC and IBM did it. • Payroll is standard outsource. • Now we have Hotmail, Salesforce.com, Oracle.com,…. • Works for standard apps. • Airlines outsource reservations.Banks outsource ATMs. • But Amazon, Amex, Wal-Mart, ...Can’t outsource their core competence. • So, COD works for commoditized services. • It is not a new way of doing things: think payroll.