Microsoft Research. SKYSERVER. Jim Gray Distinguished Engineer Microsoft Research San Francisco. Microsoft Research. Organization goal: Advance state of the art More than 700 staff, 55 areas Labs in US, Europe, Asia Internationally recognized teams
Microsoft Research SKYSERVER
Jim Gray, Distinguished Engineer, Microsoft Research San Francisco
Microsoft Research
• Organization goal: Advance state of the art
• More than 700 staff, 55 areas
• Labs in US, Europe, Asia
• Internationally recognized teams
• University organizational model: open research environment, close ties to universities
• Close working relations with development.
My Research Goal
• Information at your fingertips
• Bring all scientific literature and data online
• Focus on large database issues, and scalable servers.
[Diagram: Experiments & Instruments, Other Archives, Literature, and Simulations each supply facts; questions go in and answers come out.]
World Wide Telescope
• Premise: Most Astronomy data is online
• The Internet is the world’s best telescope
• It has data on every part of the sky
• In every measured spectral band
• As deep as the best instruments
• It is up when you are up. The “seeing” is always great (no working at night, no clouds, no moons, no ...).
• It’s a smart telescope: links data with literature.
SkyServer.SDSS.org (built with Johns Hopkins U.)
• A modern archive: raw data in file servers; catalog data (derived objects) in database; 10 billion records, 2 TB
• Also used for education: 150 hours of online Astronomy
• Interesting things: based on Web Services; spatial data search; cloned by other surveys (a design template)
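To make "spatial data search" concrete, here is a minimal sketch of a cone search: return every catalog object within a given angular radius of a sky position. The catalog rows, field names, and function names are hypothetical, and the linear scan is only for illustration; it is not SkyServer's implementation, and an archive with billions of records would rely on a spatial index instead.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine form: numerically stable for the small angles typical of sky searches.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def cone_search(catalog, ra, dec, radius_deg):
    """Return all objects within radius_deg of (ra, dec). Brute-force scan (assumption)."""
    return [obj for obj in catalog
            if angular_separation_deg(ra, dec, obj["ra"], obj["dec"]) <= radius_deg]

# Hypothetical three-object catalog; real catalog rows carry many more columns.
CATALOG = [
    {"id": 1, "ra": 180.00, "dec": 0.00},
    {"id": 2, "ra": 180.01, "dec": 0.01},
    {"id": 3, "ra": 181.00, "dec": 1.00},
]

hits = cone_search(CATALOG, ra=180.0, dec=0.0, radius_deg=0.05)
print([obj["id"] for obj in hits])
```

Objects 1 and 2 fall inside the 0.05 degree cone; object 3 is more than a degree away and is excluded.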
Service Oriented Architecture: Data Federations of Web Services
• Massive datasets live near their owners: near instrument software pipeline, apps; near data knowledge and curation
• Each Archive publishes a web service: schema documents the data; methods on objects (queries)
• Scientists get “personalized” extracts
• Uniform access to multiple Archives: a common global schema
[Diagram: multiple DB archives linked in a federation]
SkyQuery Structure
• Each SkyNode publishes: Schema Web Service; Data Query Web Service
• Portal: plans query (2 phase); integrates answers; is itself a web service
[Diagram: SkyQuery Portal connected to ImageCutout and to the 2MASS, INT, SDSS, and FIRST SkyNodes]
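A rough sketch of how a portal might plan and run a federated cross-match in two phases. The slide only says the portal "plans query (2 phase)" and integrates answers; the interpretation used here (phase 1 asks each node for a row count, phase 2 chains the join starting from the smallest node) is an assumption. The `SkyNode` and `Portal` classes, the in-memory rows, and the flat-sky box match are hypothetical simplifications, not the SkyQuery API.

```python
from dataclasses import dataclass

@dataclass
class SkyNode:
    """Hypothetical stand-in for one archive's query web service."""
    name: str
    rows: list  # each row: {"objid": ..., "ra": ..., "dec": ...}

    def query(self, region):
        ra_min, ra_max, dec_min, dec_max = region
        return [r for r in self.rows
                if ra_min <= r["ra"] <= ra_max and dec_min <= r["dec"] <= dec_max]

    def count(self, region):
        return len(self.query(region))

class Portal:
    def __init__(self, nodes):
        self.nodes = nodes

    def plan(self, region):
        # Phase 1 (assumed): cheap count queries, then order nodes smallest first.
        return sorted(self.nodes, key=lambda n: n.count(region))

    def execute(self, region, tol_deg):
        # Phase 2 (assumed): start from the smallest node and extend matches node by node.
        first, *rest = self.plan(region)
        matches = [{first.name: r} for r in first.query(region)]
        for node in rest:
            candidates = node.query(region)
            extended = []
            for m in matches:
                anchor = next(iter(m.values()))  # position from the first node in the plan
                for c in candidates:
                    # Flat-sky box match: a simplification of a real positional cross-match.
                    if (abs(c["ra"] - anchor["ra"]) <= tol_deg
                            and abs(c["dec"] - anchor["dec"]) <= tol_deg):
                        extended.append({**m, node.name: c})
            matches = extended
        return matches

sdss = SkyNode("SDSS", [{"objid": 1, "ra": 10.0000, "dec": 5.0000},
                        {"objid": 2, "ra": 10.5000, "dec": 5.5000},
                        {"objid": 3, "ra": 10.9000, "dec": 5.9000}])
twomass = SkyNode("2MASS", [{"objid": "a", "ra": 10.0001, "dec": 5.0001},
                            {"objid": "b", "ra": 10.5001, "dec": 5.4999}])
first = SkyNode("FIRST", [{"objid": "x", "ra": 10.0002, "dec": 5.0000}])

portal = Portal([sdss, twomass, first])
region = (10.0, 11.0, 5.0, 6.0)
plan_order = [n.name for n in portal.plan(region)]
result = portal.execute(region, tol_deg=0.001)
print(plan_order, result)
```

Starting from the node with the fewest rows keeps the intermediate match list small, which matters when partial results travel between services over the network.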
Federation: SkyQuery.Net
• Combines 15 archives
• Send query to portal; portal joins data from archives.
• Problem: want to do multi-step data analysis (not just single query).
• Solution: allow personal databases on portal
• Problem: some queries are monsters
• Solution: “batch scheduler” on portal server deposits answer in personal db.
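The personal-database and batch-scheduler idea can be sketched in a few lines: long-running queries are queued, run later, and their answers land in a per-user database that later analysis steps can query. Everything here is hypothetical (the `BatchScheduler` class, the table and column names, the use of in-memory SQLite); it illustrates the pattern, not the portal's actual implementation.

```python
import sqlite3
from collections import deque

class BatchScheduler:
    """Hypothetical portal-side queue that deposits query answers in personal databases."""

    def __init__(self):
        self.queue = deque()
        self.personal_dbs = {}  # user name -> sqlite3 connection

    def personal_db(self, user):
        # One in-memory SQLite database per user stands in for a personal db on the portal.
        if user not in self.personal_dbs:
            self.personal_dbs[user] = sqlite3.connect(":memory:")
        return self.personal_dbs[user]

    def submit(self, user, table, columns, run_query):
        # run_query is any callable that returns a list of row tuples.
        self.queue.append((user, table, columns, run_query))

    def run_pending(self):
        while self.queue:
            user, table, columns, run_query = self.queue.popleft()
            rows = run_query()
            db = self.personal_db(user)
            # Identifiers are interpolated for brevity; do not do this with untrusted input.
            db.execute(f"CREATE TABLE IF NOT EXISTS {table} ({', '.join(columns)})")
            placeholders = ", ".join("?" for _ in columns)
            db.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
            db.commit()

scheduler = BatchScheduler()
# The lambda stands in for a "monster" query against the archives.
scheduler.submit("alice", "bright_objects", ["objid", "mag"],
                 lambda: [(1, 14.2), (2, 15.1)])
scheduler.run_pending()

# Second analysis step runs against the personal database, not the archives.
db = scheduler.personal_db("alice")
stored = db.execute("SELECT COUNT(*) FROM bright_objects").fetchone()[0]
brightest = db.execute("SELECT objid FROM bright_objects WHERE mag < 15").fetchall()
print(stored, brightest)
```

Keeping intermediate answers next to the portal is what makes multi-step analysis practical: each step queries the previous step's table instead of re-running the federated query.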
Current Status: CERN → Pasadena
• Multi Stream TCP/IP: 7.1 Gbps, ~900 MBps
• Single Stream TCP/IP: 6.5 Gbps, ~800 MBps
• File Transfer Speed: ~450 MBps
• New speed record @ http://ultralight.caltech.edu/lsr-winhec/
[Chart: throughput in Mbps (0 to 7,000) by year, 2000 to 2005]
Challenge: Move Data from CERN to Remote Centers @ 1 GBps
• Disk-to-Disk
• gigabyte / second data rates
• 80 TB/day
• 30 petabytes by 2008
• 1 exabyte by 2014
[Diagram: tiered data distribution. Labels: Experiment, Filter, CERN, Tier 1, Tier 2 (FNAL, INFN, INP3, RAL), Tier 3 (Institutes, physics data cache), Tier 4 (Workstations); rate labels: ~PBps, ~5 GBps, ~1 GBps, .1 GBps, OC192 = 9.9 Gbps]
Graphics courtesy of Harvey Newman @ Caltech
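A quick check of the arithmetic behind these figures, assuming decimal units (1 GB = 1,000 MB, 1 PB = 1,000,000 GB) and a link that stays saturated around the clock; both are assumptions, and the function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def gbps_to_mbytes_per_s(gbps):
    """Convert gigabits per second to megabytes per second (8 bits per byte)."""
    return gbps * 1000 / 8

def tb_per_day(gbytes_per_s):
    """Data volume in terabytes moved in one day at a sustained rate in GB/s."""
    return gbytes_per_s * SECONDS_PER_DAY / 1000

def days_to_move(petabytes, gbytes_per_s):
    """Days needed to move a volume in petabytes at a sustained rate in GB/s."""
    return petabytes * 1_000_000 / gbytes_per_s / SECONDS_PER_DAY

daily_tb = tb_per_day(1.0)                  # sustained 1 GBps for a full day
oc192_mbytes = gbps_to_mbytes_per_s(9.9)    # raw OC192 capacity in MBps
days_for_30_pb = days_to_move(30, 1.0)      # 30 petabytes at 1 GBps

print(f"{daily_tb:.1f} TB/day, OC192 = {oc192_mbytes:.1f} MBps, "
      f"30 PB takes {days_for_30_pb:.0f} days")
```

A sustained 1 GBps works out to 86.4 TB per day, in line with the slide's round figure of 80 TB/day, and it fits within a single OC192 link (about 1,237 MBps raw). At that rate, 30 petabytes would take roughly 347 days of continuous transfer.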
Summary
• Microsoft Research is active inside and outside Microsoft.
• World Wide Telescope is coming
• Exemplifies service oriented architecture
• Built with web services and databases
• Has interesting spatial database algorithms
• 10 Gbps networking is coming, x-64 is coming, and we are investing to make them real.
• Details on my website: http://research.microsoft.com/~Gray
© 2003 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.