Microsoft Research
CERN-Pasadena at 1 GBps (8 Gbps)
World Wide Telescope
Jim Gray, Researcher, Microsoft Research
Microsoft Research
• Organizational goal: advance the state of the art
• More than 700 staff, 55 areas
• Labs in US, Europe, Asia
• Internationally recognized teams
• University organizational model: open research environment, close ties to universities
• Close working relations with development
[Figure: facts flow in from Experiments & Instruments, Other Archives, Literature, and Simulations; questions go in and answers come out.]
My Research Goal
• Information at your fingertips
• Bring all scientific literature and data online
• Focus on large database issues and scalable servers
[Figure: tiered data distribution. The experiment (~PBps) feeds a filter, then CERN, then Tier 1 centers (FNAL, INFN, IN2P3, RAL), Tier 2 centers, Tier 3 institutes, and Tier 4 workstations. Link rates shown range from ~PBps at the experiment through ~5 GBps and ~1 GBps on the links to CERN and the Tier 1 centers, down to ~0.1 GBps for the physics data cache and workstations. OC192 = 9.9 Gbps.]
Challenge: Move data from CERN to remote centers at 1 GBps
• Disk-to-disk
• Gigabyte/second data rates
• 80 TB/day
• 30 petabytes by 2008
• 1 exabyte by 2014
Graphics courtesy of Harvey Newman @ Caltech
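The 80 TB/day figure follows from the 1 GBps target. A short Python check, assuming decimal units (1 GB = 10^9 bytes) and a link kept busy around the clock; the helper name `bytes_per_day` is illustrative:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds
GB = 10**9   # decimal gigabyte (assumption: the slide uses decimal units)
TB = 10**12  # decimal terabyte

def bytes_per_day(rate_bytes_per_s: int) -> int:
    """Volume moved in one day by a link sustaining rate_bytes_per_s without pause."""
    return rate_bytes_per_s * SECONDS_PER_DAY

daily = bytes_per_day(1 * GB)
print(f"1 GBps sustained: {daily / TB:.1f} TB/day")  # 86.4 TB/day

# OC192 line rate from the figure: 9.9 Gbps, i.e. 9.9e9 / 8 bytes per second
oc192_bytes_per_s = 9.9e9 / 8
print(f"1 GBps uses {1 * GB / oc192_bytes_per_s:.0%} of an OC192 link")  # 81%
```

About 86 TB/day is consistent with the slide's rounded 80 TB/day, and a single OC192 link would have to run at roughly 80% utilization to sustain the 1 GBps target.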
Current Status: CERN → Pasadena
• Multi-stream TCP/IP: 7.1 Gbps (~900 MBps)
• Single-stream TCP/IP: 6.5 Gbps (~800 MBps)
• File transfer speed: ~450 MBps
• New speed record: http://ultralight.caltech.edu/lsr-winhec/
[Chart: CERN → Pasadena transfer rate in Mbps (0 to 7,000), 2000 to 2005.]
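The MBps figures are the Gbps rates divided by 8 and rounded. A small Python helper makes the conversion explicit, assuming decimal units and ignoring protocol overhead; the function names are illustrative:

```python
def gbps_to_mbytes_per_s(gbps: float) -> float:
    """Convert a link rate in gigabits/s to megabytes/s (decimal units, 8 bits per byte)."""
    return gbps * 1000 / 8

def mbytes_per_s_to_gbps(mbytes: float) -> float:
    """Inverse conversion: megabytes/s to gigabits/s."""
    return mbytes * 8 / 1000

for label, gbps in [("multi-stream TCP/IP", 7.1), ("single-stream TCP/IP", 6.5)]:
    print(f"{label}: {gbps} Gbps = {gbps_to_mbytes_per_s(gbps):.1f} MBps")
print(f"file transfer: 450 MBps = {mbytes_per_s_to_gbps(450):.1f} Gbps")
```

7.1 Gbps works out to 887.5 MBps and 6.5 Gbps to 812.5 MBps, matching the slide's ~900 and ~800 MBps; the ~450 MBps file-transfer rate is about 3.6 Gbps, roughly half the raw TCP/IP rate.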
World Wide Telescope
• Premise: most astronomy data is online
• The Internet is the world’s best telescope:
  • It has data on every part of the sky
  • In every measured spectral band
  • As deep as the best instruments
  • It is up when you are up. The “seeing” is always great (no working at night, no clouds, no moons, no ...).
• It’s a smart telescope: it links objects and data with literature.
SkyServer.SDSS.org (built with Johns Hopkins U.)
• A modern archive
  • Raw data in file servers
  • Catalog data (derived objects) in a database
  • 10 billion records, 2 TB
  • Online query to any and all
• Also used for education
  • 150 hours of online astronomy
  • Implicitly teaches data analysis
• Interesting things
  • Based on web services
  • Spatial data search
  • Cloned by other surveys (a design template)
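A typical spatial data search asks for all objects within a given radius of a sky position (a cone search). Below is a minimal brute-force sketch in Python over a hypothetical in-memory catalog; a production archive such as SkyServer answers this inside the database using a spatial index, whereas this linear scan only illustrates the geometry. The `SkyObject` type, the `cone_search` name, and the sample coordinates are illustrative assumptions.

```python
import math
from dataclasses import dataclass

@dataclass
class SkyObject:
    obj_id: int
    ra: float   # right ascension in degrees
    dec: float  # declination in degrees

def unit_vector(ra_deg: float, dec_deg: float) -> tuple[float, float, float]:
    """Convert (ra, dec) in degrees to a Cartesian point on the unit sphere."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))

def cone_search(catalog: list[SkyObject], ra: float, dec: float,
                radius_arcmin: float) -> list[SkyObject]:
    """Return objects whose angular distance from (ra, dec) is at most radius_arcmin.

    Two points are within angle r of each other exactly when the dot product of
    their unit vectors is >= cos(r), so no inverse trig call is needed per object.
    """
    center = unit_vector(ra, dec)
    min_dot = math.cos(math.radians(radius_arcmin / 60.0))
    hits = []
    for obj in catalog:  # linear scan; a real archive would use a spatial index
        v = unit_vector(obj.ra, obj.dec)
        if sum(a * b for a, b in zip(center, v)) >= min_dot:
            hits.append(obj)
    return hits

catalog = [
    SkyObject(1, 180.0, 0.0),
    SkyObject(2, 180.0, 0.01),  # 0.6 arcmin north of object 1
    SkyObject(3, 180.0, 1.0),   # 60 arcmin north of object 1
    SkyObject(4, 0.0, 0.0),     # opposite side of the sky
]
print([o.obj_id for o in cone_search(catalog, 180.0, 0.0, radius_arcmin=1.0)])  # [1, 2]
```

Comparing dot products against a precomputed cos(r) keeps the inner loop to a few multiplications per object.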
Service-Oriented Architecture: Data Federations of Web Services
[Figure: several archive databases (DB), each behind its own web service.]
• Massive datasets live near their owners:
  • Near the instrument's software pipeline and apps
  • Near data knowledge and curation
• Each archive publishes a web service
  • Schema: documents the data
  • Methods on objects (queries)
• Uniform access to multiple archives
  • A common global schema
• Scientists get “personalized” extracts
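The federation pattern above can be sketched in a few lines of Python. This is a single-process illustration under stated assumptions: `ArchiveService` stands in for one archive's web service, the global columns `ra`, `dec`, `mag`, the local column names, and all rows are hypothetical, and the only query method is a declination-band selection.

```python
from dataclasses import dataclass, field

# Hypothetical common global schema shared by every archive in the federation.
GLOBAL_COLUMNS = ("ra", "dec", "mag")

@dataclass
class ArchiveService:
    """Stand-in for one archive's web service; the data stays with its owner."""
    name: str
    column_map: dict[str, str]  # global column name -> this archive's local column name
    rows: list[dict] = field(default_factory=list)

    def schema(self) -> dict[str, str]:
        """Schema documents the data: how the global columns map onto local ones."""
        return dict(self.column_map)

    def query(self, min_dec: float, max_dec: float) -> list[dict]:
        """A method on objects: a declination-band selection evaluated at the archive."""
        dec_col = self.column_map["dec"]
        return [row for row in self.rows if min_dec <= row[dec_col] <= max_dec]

def federated_extract(archives, min_dec, max_dec, columns):
    """Uniform access: query every archive, translate each row into the global schema,
    and keep only the requested columns (a 'personalized' extract)."""
    unknown = set(columns) - set(GLOBAL_COLUMNS)
    if unknown:
        raise ValueError(f"not in the global schema: {sorted(unknown)}")
    extract = []
    for archive in archives:
        local_names = archive.schema()
        for row in archive.query(min_dec, max_dec):
            out = {"archive": archive.name}
            for col in columns:
                out[col] = row[local_names[col]]
            extract.append(out)
    return extract

archive_a = ArchiveService("archive_a", {"ra": "ra", "dec": "dec", "mag": "r_mag"},
                           [{"ra": 10.0, "dec": 1.0, "r_mag": 18.2},
                            {"ra": 11.0, "dec": 5.0, "r_mag": 17.0}])
archive_b = ArchiveService("archive_b", {"ra": "alpha", "dec": "delta", "mag": "k_mag"},
                           [{"alpha": 10.0, "delta": 1.0, "k_mag": 14.1}])
print(federated_extract([archive_a, archive_b], 0.0, 2.0, ["ra", "mag"]))
```

Each archive keeps its own column names; only the mapping it publishes through `schema()` has to agree with the global schema.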
Federation: SkyQuery.Net
• Combines 15 archives
• Send the query to the portal; the portal joins data from the archives.
• Problem: we want to do multi-step data analysis (not just a single query).
• Solution: allow personal databases on the portal.
• Problem: some queries are monsters.
• Solution: a “batch scheduler” on the portal server deposits the answer in the personal database.
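The two portal-side solutions fit together: queued "monster" queries deposit their answers where later analysis steps can reach them. A minimal Python sketch, assuming an in-memory SQLite database per user as a stand-in for the portal's personal databases; `PortalBatchScheduler`, `submit`, and `run_pending` are hypothetical names, and each job is a plain callable standing in for a federated query.

```python
import sqlite3
from collections import deque

class PortalBatchScheduler:
    """Hypothetical sketch: long-running queries are queued instead of run
    interactively; each finished answer becomes a table in the submitting
    user's personal database on the portal."""

    def __init__(self):
        self.queue = deque()
        self.personal_dbs = {}  # user -> sqlite3 connection (in-memory stand-in)

    def personal_db(self, user):
        if user not in self.personal_dbs:
            self.personal_dbs[user] = sqlite3.connect(":memory:")
        return self.personal_dbs[user]

    def submit(self, user, table_name, job):
        """job: a callable returning (column_names, rows), standing in for a federated query."""
        self.queue.append((user, table_name, job))

    def run_pending(self):
        """Run queued jobs in submission order and deposit each answer as a table.
        Identifiers are quoted but not validated: fine for a sketch, not for untrusted input."""
        while self.queue:
            user, table, job = self.queue.popleft()
            columns, rows = job()
            db = self.personal_db(user)
            col_sql = ", ".join(f'"{c}"' for c in columns)
            db.execute(f'CREATE TABLE "{table}" ({col_sql})')
            placeholders = ", ".join("?" for _ in columns)
            db.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
            db.commit()

scheduler = PortalBatchScheduler()
# Step 1: a long-running cross-match is queued rather than run interactively.
scheduler.submit("alice", "matches", lambda: (["ra", "dec"], [(10.0, 1.0), (11.0, 5.0)]))
scheduler.run_pending()
# Step 2: later analysis runs against the deposited table in alice's personal database.
db = scheduler.personal_db("alice")
print(db.execute('SELECT COUNT(*) FROM "matches" WHERE "dec" < 2').fetchone()[0])  # 1
```

Keeping the answer on the portal means a multi-step analysis can keep refining it with ordinary SQL instead of re-running the federated query.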
SkyQuery Structure
• Each SkyNode publishes:
  • a schema web service
  • a data query web service
• The portal:
  • plans the query (2 phase)
  • integrates the answers
  • is itself a web service
[Figure: the SkyQuery Portal connected to SkyNodes 2MASS, INT, SDSS, and FIRST, plus an ImageCutout service.]
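The slide does not spell out the two planning phases. One plausible reading, sketched in Python purely as an assumption: in phase 1 the portal asks each SkyNode how many objects it holds in the search region; in phase 2 it runs a positional cross-match starting from the node with the fewest candidates, so intermediate results stay small. Node names, coordinates, the rectangular search box, and the 2-arcsecond tolerance are hypothetical, and everything runs in one process rather than over web services.

```python
import math

Position = tuple[float, float]  # (ra, dec) in degrees

def separation_arcsec(p: Position, q: Position) -> float:
    """Great-circle distance via the haversine formula (numerically stable at small angles)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (*p, *q))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h)))) * 3600.0

def in_box(p: Position, box) -> bool:
    ra_min, ra_max, dec_min, dec_max = box
    return ra_min <= p[0] <= ra_max and dec_min <= p[1] <= dec_max

def phase1_counts(nodes, box):
    """Phase 1: ask every node how many of its objects fall in the search box."""
    return {name: sum(in_box(p, box) for p in objs) for name, objs in nodes.items()}

def phase2_crossmatch(nodes, box, order, tol_arcsec):
    """Phase 2: seed candidate tuples from the first node in `order`, then visit the
    remaining nodes in turn, extending each tuple with objects within tol_arcsec of
    the seed position and dropping tuples that find no partner."""
    first = order[0]
    tuples = [[(first, p)] for p in nodes[first] if in_box(p, box)]
    for name in order[1:]:
        extended = []
        for t in tuples:
            seed = t[0][1]
            extended += [t + [(name, q)] for q in nodes[name]
                         if separation_arcsec(seed, q) <= tol_arcsec]
        tuples = extended
    return tuples

def plan_and_run(nodes, box, tol_arcsec):
    counts = phase1_counts(nodes, box)
    order = sorted(nodes, key=counts.get)  # smallest candidate set first
    return order, phase2_crossmatch(nodes, box, order, tol_arcsec)

nodes = {  # hypothetical SkyNodes with made-up positions
    "NODE_A": [(10.0, 5.0), (10.5, 5.5), (11.0, 6.0)],
    "NODE_B": [(10.0, 5.0002)],
    "NODE_C": [(10.0001, 5.0), (10.5, 5.5)],
}
order, matches = plan_and_run(nodes, box=(9.0, 12.0, 4.0, 7.0), tol_arcsec=2.0)
print(order)         # ['NODE_B', 'NODE_C', 'NODE_A']
print(len(matches))  # 1
```

Starting from the smallest node bounds the number of tuples carried forward by that node's count, which is the point of spending a cheap first phase on counting.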
Summary
• Microsoft Research is active inside and outside Microsoft.
• 10 Gbps networking is coming, x64 is coming, and we are investing to make them real.
• World Wide Telescope is coming
  • Exemplifies service-oriented architecture
  • Built with web services and databases
  • Has interesting spatial database algorithms
• Details on my website: http://research.microsoft.com/~Gray