Experience Building the World Wide Telescope (aka the Virtual Observatory) • Jim Gray, Alex Szalay
The Evolution of Science • Observational Science • Scientist gathers data by direct observation • Scientist analyzes data • Analytical Science • Scientist builds analytical model • Makes predictions • Computational Science • Simulate analytical model • Validate model and make predictions • Data Exploration Science • Data captured by instruments or generated by simulator • Processed by software • Placed in a database / files • Scientist analyzes database / files
Information Avalanche • In science, industry, government, … • Better observational instruments and • better simulations are producing a data avalanche • Examples • BaBar: grows 1 TB/day (2/3 simulation information, 1/3 observational information) • CERN: LHC will generate 1 GB/s, ~10 PB/y • VLBA (NRAO) generates 1 GB/s today • Pixar: 100 TB/movie • New emphasis on informatics: • Capturing, organizing, summarizing, analyzing, visualizing • Images: BaBar, Stanford; P&E gene sequencer (from http://www.genome.uci.edu/); Space Telescope; simulation image courtesy C. Meneveau & A. Szalay @ JHU
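The rate and volume figures above can be related with a little arithmetic. The sketch below is only illustrative: the slide does not say how "1 GB/s" becomes "~10 PB/y", and the figure of roughly 10^7 seconds of data taking per year is an assumption introduced here, not something stated in the talk.

```python
# Relating a sustained data rate to a yearly volume (decimal units).
GB = 1e9
PB = 1e15

SECONDS_PER_CALENDAR_YEAR = 365 * 24 * 3600   # 31,536,000 s
# ASSUMPTION (not from the slides): about 1e7 seconds of data taking per year.
ASSUMED_LIVE_SECONDS_PER_YEAR = 1e7


def petabytes_per_year(rate_gb_per_s, live_seconds):
    """Volume in PB accumulated at rate_gb_per_s over live_seconds."""
    return rate_gb_per_s * GB * live_seconds / PB


if __name__ == "__main__":
    live = petabytes_per_year(1.0, ASSUMED_LIVE_SECONDS_PER_YEAR)
    always_on = petabytes_per_year(1.0, SECONDS_PER_CALENDAR_YEAR)
    print(f"1 GB/s over the assumed live time: {live:.1f} PB/y")
    print(f"1 GB/s around the clock:           {always_on:.1f} PB/y")
```

Under that assumption the two numbers on the slide are consistent; a source running around the clock at 1 GB/s would accumulate roughly three times as much.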
World Wide Telescope / Virtual Observatory • http://www.ivoa.net/ • Premise: most data is (or could be) online • The Internet is the world’s best telescope: • It has data on every part of the sky • In every measured spectral band: optical, x-ray, radio, ... • As deep as the best instruments (of 2 years ago) • It is up when you are up • The “seeing” is always great (no working at night, no clouds, no moons, no ...) • It’s a smart telescope: links objects and data to literature on them.
The WWT Components • Data Sources • Literature • Archives • Unified Definitions • Units, • Semantics/Concepts/Metrics, Representations, • Provenance • Object model • Classes and methods • Portals
Data Sources • Literature online and cross indexed • Simbad, ADS, NED: http://simbad.u-strasbg.fr/Simbad, http://adswww.harvard.edu/, http://nedwww.ipac.caltech.edu/ • Many curated archives online • FIRST, DPOSS, 2MASS, USNO, IRAS, SDSS, VizieR, … • Typically files with English meta-data and some programs • Groups, researchers, amateurs publish • Datasets online in various formats • Documentation varies • Publications are ephemeral • Unknown provenance
Unified Definitions • Unified Content Descriptors (UCDs) • http://vizier.u-strasbg.fr/doc/UCD.htx • Collated all table heads from all the literature • 100,000 terms reduced to ~1,500 • Rough consensus that this is the right thing • Refinement in progress as people use UCDs • Defines • Units: • gram, radian, second, ... • Semantic concepts / metrics: • Std error, Chi2 fit, magnitude, flux @ passband, velocity, ...
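One way to picture what UCDs buy you: two archives name their columns differently, but tagging each column with a shared concept label and a unit lets software line them up. The sketch below is a minimal illustration under stated assumptions. The catalog schemas are hypothetical, and the UCD strings are written in the style of UCD terms but have not been checked against the official list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMeta:
    ucd: str    # concept label (illustrative, not verified against the UCD list)
    unit: str   # unit of the stored values


# Two hypothetical archives describing the same concepts with different names.
CATALOG_A = {
    "ra": ColumnMeta("POS_EQ_RA_MAIN", "deg"),
    "dec": ColumnMeta("POS_EQ_DEC_MAIN", "deg"),
    "gmag": ColumnMeta("PHOT_MAG", "mag"),
}
CATALOG_B = {
    "RAJ2000": ColumnMeta("POS_EQ_RA_MAIN", "deg"),
    "DEJ2000": ColumnMeta("POS_EQ_DEC_MAIN", "deg"),
    "F20cm": ColumnMeta("PHOT_FLUX", "mJy"),
}


def columns_by_ucd(schema):
    """Invert a schema: concept label -> local column name."""
    return {meta.ucd: name for name, meta in schema.items()}


def aligned_columns(schema_a, schema_b):
    """Pairs of columns from both schemas that carry the same concept."""
    a, b = columns_by_ucd(schema_a), columns_by_ucd(schema_b)
    return {ucd: (a[ucd], b[ucd]) for ucd in sorted(a.keys() & b.keys())}


if __name__ == "__main__":
    for ucd, (col_a, col_b) in aligned_columns(CATALOG_A, CATALOG_B).items():
        print(f"{ucd}: {col_a} <-> {col_b}")
```

A real portal would also compare units before combining values; that step is left out here to keep the sketch short.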
Provenance • Most data will be derived. • To do science, need to trace derived data back to source. • So programs and inputs must be registered. • Must be able to re-run them. • Example: Space Telescope Calibrated Data • Run on demand • Can specify software version (to get old answers) • Scientific Data Provenance and Curation are largely unsolved problems (some ideas but no science).
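The requirement above (register programs and inputs, be able to re-run them, and choose a software version to get old answers) can be sketched as follows. This is a toy illustration, not how any archive actually does it: the registry, the `calibrate` program, its two versions, and the bias values are all hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass

# Hypothetical registry of programs, keyed by (name, version).
PROGRAMS = {}


def register(name, version):
    """Decorator that records a program under a name and version."""
    def decorator(fn):
        PROGRAMS[(name, version)] = fn
        return fn
    return decorator


@register("calibrate", "1.0")
def calibrate_v1(raw):
    return [x - 100 for x in raw]   # made-up bias level


@register("calibrate", "2.0")
def calibrate_v2(raw):
    return [x - 98 for x in raw]    # made-up revised bias level


@dataclass(frozen=True)
class Provenance:
    program: str
    version: str
    input_digest: str


def digest(data):
    """Stable fingerprint of the input data."""
    return hashlib.sha256(json.dumps(data, sort_keys=True).encode()).hexdigest()


def derive(program, version, raw):
    """Run a registered program and return (result, provenance record)."""
    result = PROGRAMS[(program, version)](raw)
    return result, Provenance(program, version, digest(raw))


def rerun(prov, raw):
    """Re-derive a result from its provenance record, checking the input."""
    if digest(raw) != prov.input_digest:
        raise ValueError("input differs from the registered input")
    return PROGRAMS[(prov.program, prov.version)](raw)


if __name__ == "__main__":
    raw = [100, 150, 220]
    old, old_prov = derive("calibrate", "1.0", raw)
    new, _ = derive("calibrate", "2.0", raw)
    print(old, new, rerun(old_prov, raw) == old)
```

Keeping the old version registered is what makes "old answers" reproducible after the calibration software changes.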
Object Model • General acceptance of XML • Recent acceptance of XML Schema (XSD over DTD) • Wait-and-see about SOAP/WSDL/… • “Web Services are just CORBA with angle brackets.” • “FTP is good enough for me.” • Personal opinion: • Web Services are much more than “CORBA + <>” • Huge focus on interop • Huge focus on integrated tools • But the community says “Show me!” • Many technologists sold, but not the astronomers
Classes and Methods • First class: VOTable • http://www.us-vo.org/VOTable/VOTable-1-0.htm • Represents an answer set in XML • Defined by an XML Schema (XSD) • Metadata (in terms of UCDs) • Data representation (numbers and text) • First method • Cone Search: get objects in this cone
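A rough sketch of the first class and first method together: select the objects inside a cone on the sky, then return the answer set as XML with UCD-tagged metadata. The catalog rows are invented, the UCD strings are illustrative, and the XML only imitates the VOTable layout (VOTABLE / RESOURCE / TABLE / FIELD / DATA / TABLEDATA); it has not been validated against the VOTable schema.

```python
import math
import xml.etree.ElementTree as ET

# Column descriptions: (name, illustrative UCD, unit or None, datatype).
FIELDS = [
    ("id", "ID_MAIN", None, "char"),
    ("ra", "POS_EQ_RA_MAIN", "deg", "double"),
    ("dec", "POS_EQ_DEC_MAIN", "deg", "double"),
]

# Hypothetical catalog rows.
CATALOG = [
    {"id": "obj-1", "ra": 10.00, "dec": 20.0},
    {"id": "obj-2", "ra": 10.05, "dec": 20.0},
    {"id": "obj-3", "ra": 50.00, "dec": -30.0},
]


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(catalog, ra, dec, sr):
    """Rows within sr degrees of the position (ra, dec)."""
    return [row for row in catalog
            if angular_separation_deg(ra, dec, row["ra"], row["dec"]) <= sr]


def to_votable_like(rows):
    """Serialize rows as VOTable-style XML: metadata first, then the data."""
    root = ET.Element("VOTABLE")
    table = ET.SubElement(ET.SubElement(root, "RESOURCE"), "TABLE")
    for name, ucd, unit, datatype in FIELDS:
        attrs = {"name": name, "ucd": ucd, "datatype": datatype}
        if unit is not None:
            attrs["unit"] = unit
        ET.SubElement(table, "FIELD", attrs)
    data = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for row in rows:
        tr = ET.SubElement(data, "TR")
        for name, _, _, _ in FIELDS:
            ET.SubElement(tr, "TD").text = str(row[name])
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    hits = cone_search(CATALOG, ra=10.0, dec=20.0, sr=0.1)
    print(to_votable_like(hits))
```

A production service would answer the same question from a spatial index rather than scanning every row; the linear scan here only shows the contract.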
Other Classes • Space-Time class • http://hea-www.harvard.edu/~arots/nvometa/STCdoc.pdf • Image class (returns pixels) • SdssCutout • Simple Image Access Protocol: http://bill.cacr.caltech.edu/cfdocs/usvo-pubs/files/ACF8DE.pdf • HyperAtlas: http://bill.cacr.caltech.edu/usvo-pubs/files/hyperatlas.pdf • Spectral • Simple Spectral Access Protocol • 500K spectra available at http://voservices.net/wave • Query services • ADQL and SkyNode: http://skyservice.pha.jhu.edu/develop/vo/adql/ • Registry: • see below
The Registry • UDDI seemed inappropriate • Complex • Irrelevant questions • Relevant questions missing • Evolved Dublin Core • Represent Datasets, Services, Portals • Needs to be machine readable • Federation (DNS model) • Push & Pull: register then harvest • http://www.ivoa.net/twiki/bin/view/IVOA/IvoaResReg
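The "register then harvest" idea can be sketched with two in-memory registries: publishers push records to a local registry, and another registry pulls whatever is new since its last visit. Everything here is hypothetical, including the record fields (loosely modeled on Dublin Core style metadata), the identifiers, and the class and method names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceRecord:
    identifier: str   # hypothetical identifier scheme
    title: str
    kind: str         # "dataset", "service", or "portal"
    publisher: str


class Registry:
    def __init__(self, name):
        self.name = name
        self._records = {}   # identifier -> latest record
        self._log = []       # identifiers in registration order

    def register(self, record):
        """Push: a publisher adds or updates a record."""
        self._records[record.identifier] = record
        self._log.append(record.identifier)

    def changes_since(self, position):
        """Records registered after log position, plus the new position."""
        changed = dict.fromkeys(self._log[position:])   # de-duplicate, keep order
        return [self._records[i] for i in changed], len(self._log)

    def harvest(self, other, position=0):
        """Pull: copy records registered in `other` since `position`."""
        records, new_position = other.changes_since(position)
        for record in records:
            self._records[record.identifier] = record
        return new_position

    def find(self, kind):
        """Identifiers of all known records of one kind."""
        return sorted(r.identifier for r in self._records.values() if r.kind == kind)


if __name__ == "__main__":
    local, central = Registry("local"), Registry("central")
    local.register(ResourceRecord("example/sdss", "SDSS catalog", "dataset", "JHU"))
    mark = central.harvest(local)
    local.register(ResourceRecord("example/cone", "Cone search", "service", "JHU"))
    mark = central.harvest(local, mark)   # picks up only the new record
    print(central.find("dataset"), central.find("service"))
```

Harvested records are not appended to the harvesting registry's own log in this sketch, which sidesteps loops between registries but also means they are not re-exported; a real federation would need a policy for that.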
SkyQuery: A Prototype WWT • Started with SDSS data and schema • Imported about 9 other datasets into that spine schema • Unified them with a portal • Implicit spatial join among the datasets • All built on Web Services • Pure XML • Pure SOAP • Used .NET toolkit
Demo • SkyServer: • navigator showing cutout web service • List: showing many calls and variant use. • SkyQuery: • Show integration of various archives. • Explain spatial join xMatch operator.
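The spatial join behind the demo can be pictured as matching objects from two archives by sky position. The sketch below is a deliberately simplified stand-in: it pairs each object with its nearest neighbor within a fixed angular tolerance by brute force. The slides do not describe how the xMatch operator itself works, so nothing here should be read as its algorithm; the catalogs and the tolerance are made up.

```python
import math


def separation_arcsec(a, b):
    """Great-circle separation between two objects, in arcseconds."""
    ra1, dec1, ra2, dec2 = map(math.radians, (a["ra"], a["dec"], b["ra"], b["dec"]))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0


def cross_match(cat_a, cat_b, tol_arcsec):
    """For each object in cat_a, the nearest cat_b object within tol_arcsec.

    Returns (id_a, id_b, separation) tuples. Brute force, O(len(a) * len(b)).
    """
    matches = []
    for a in cat_a:
        best, best_sep = None, tol_arcsec
        for b in cat_b:
            sep = separation_arcsec(a, b)
            if sep <= best_sep:
                best, best_sep = b, sep
        if best is not None:
            matches.append((a["id"], best["id"], best_sep))
    return matches


# Hypothetical rows from two archives (positions in degrees).
OPTICAL = [{"id": "s1", "ra": 180.0, "dec": 0.0},
           {"id": "s2", "ra": 181.0, "dec": 1.0}]
RADIO = [{"id": "t1", "ra": 180.0002, "dec": 0.0001},
         {"id": "t2", "ra": 200.0, "dec": 5.0}]

if __name__ == "__main__":
    for id_a, id_b, sep in cross_match(OPTICAL, RADIO, tol_arcsec=2.0):
        print(f"{id_a} matches {id_b} at {sep:.2f} arcsec")
```

With archives of realistic size the pairwise loop is the first thing to replace, typically with a spatial index on each side.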
MyDB • Portal allows federation of data but… • Intermediate results may be large. • Intermediate results feed into next analysis step. • Sending them back-and-forth to client is costly and sometimes infeasible. • Solution: create a working DB for client at Portal: MyDB
MyDB • Anyone can create a personal DB at the SkyServer portal • It is about 100 MB • It is private • Simple queries done immediately • Complex queries done by batch scheduler • All queries can create/read/write MyDB tables • Very popular with “serious” users • MyDB will be shareable by a group
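The MyDB idea (keep intermediate results next to the data and ship only the final answer) can be imitated with SQLite, using an attached in-memory database as a stand-in for the user's working space. This is an analogy only: the table names, columns, and values are hypothetical, and the real portal is not SQLite.

```python
import sqlite3

# One connection plays the portal; the attached database plays the user's MyDB.
portal = sqlite3.connect(":memory:")
portal.execute("ATTACH DATABASE ':memory:' AS mydb")

# Hypothetical archive table held at the portal.
portal.execute(
    "CREATE TABLE photo_obj ("
    "obj_id INTEGER PRIMARY KEY, ra_deg REAL, dec_deg REAL, g REAL, r REAL)"
)
portal.executemany(
    "INSERT INTO photo_obj VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10.0, 20.0, 18.0, 17.2),
        (2, 10.1, 20.1, 17.5, 17.4),
        (3, 10.2, 20.2, 19.0, 18.4),
    ],
)

# Step 1: the intermediate result is written into MyDB, not sent to the client.
portal.execute(
    "CREATE TABLE mydb.red_objects AS "
    "SELECT obj_id, ra_deg, dec_deg, g - r AS color "
    "FROM photo_obj WHERE g - r > 0.5"
)

# Step 2: the next analysis step reads the MyDB table; only a small summary
# travels back to the client.
count, mean_color = portal.execute(
    "SELECT COUNT(*), AVG(color) FROM mydb.red_objects"
).fetchone()

if __name__ == "__main__":
    print(count, round(mean_color, 3))
```

The point of the design is the data flow: the large table `mydb.red_objects` never leaves the server side, which is what makes multi-step analyses over large intermediate results feasible.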
Open SkyQuery • SkyQuery being adopted by AstroGrid as reference implementation for OGSA-DAI (Open Grid Services Architecture, Data Access and Integration) • SkyNode: basic archive object • http://www.ivoa.net/twiki/bin/view/IVOA/SkyNode • SkyQuery Language (VoQL) is evolving • http://www.ivoa.net/twiki/bin/view/IVOA/IvoaVOQL
What We Learned • Astro is a community of 10,000 • Homogeneous & cooperative • If you can’t do it for Astro, do not bother with 3M bio-info • Agreement • Takes time • Takes endless meetings • Big problems are non-technical • Legacy is a big problem • Plumbing and tools are there, but… • What is the object model? • What do you want to save? • How to document provenance? • WWT is a poster child for the Data Grid. • Outline (The WWT Components) • Data Sources • Literature • Archives • Unified Definitions • Units • Semantics/Concepts/Metrics, Representations • Provenance • Object model • Classes and methods • Portals