
Web Services for the Virtual Observatory

SPIE, Hawaii, 2002. (Living in an exponential world….) Alex Szalay, Tamas Budavari, Tanu Malik, Jim Gray, and Ani Thakar.

Presentation Transcript


  1. SPIE, Hawaii, 2002
     Web Services for the Virtual Observatory
     (Living in an exponential world….)
     Alex Szalay, Tamas Budavari, Tanu Malik, Jim Gray, and Ani Thakar

  2. Outline
     • Collecting Data
        • Exponential Growth
        • Making Discoveries
        • Publishing Data
     • VO: How will it work?
     • Web Services
        • Atomic vs Composite services
        • Distributed queries with SkyQuery
        • Cross-Matching Algorithm
        • SkyNode Web Services + Portal
     Alex Szalay, SPIE 2002

  3. The World is Exponential
     • Astrophysical data is growing exponentially
        • Doubling every year (Moore's Law+): both data sizes and number of data sets
     • Computational resources scale the same way
        • Constant $$$ will keep up with the data
     • The main problem is the software component
        • Currently, components are not reused
        • Software costs are an increasingly large fraction of the budget
        • Aggregate costs are growing exponentially
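The "constant $$$" point is a ratio of two exponentials. The Python sketch below makes that arithmetic concrete, assuming (as the slide does) that data doubles every year, and assuming for illustration that the price per unit of storage and computing halves on a fixed schedule. The function names and parameters are hypothetical.

```python
def data_volume(initial: float, years: float, doubling_time: float = 1.0) -> float:
    """Data volume after `years`, if it doubles every `doubling_time` years."""
    return initial * 2 ** (years / doubling_time)


def relative_hardware_cost(years: float, doubling_time: float = 1.0,
                           price_halving_time: float = 1.0) -> float:
    """Cost of handling the data relative to year 0.

    Assumes the data doubles every `doubling_time` years while the price per
    unit of storage/computing halves every `price_halving_time` years.
    """
    growth = 2 ** (years / doubling_time)
    price_drop = 2 ** (years / price_halving_time)
    return growth / price_drop


# Ten years of yearly doubling: 1 TB becomes 1024 TB.
print(data_volume(1.0, 10))                                # 1024.0
# Matching rates: the hardware budget stays flat ("constant $$$").
print(relative_hardware_cost(10))                          # 1.0
# If prices halve only every 2 years, costs grow 8x over 6 years.
print(relative_hardware_cost(6, price_halving_time=2.0))   # 8.0
```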

  4. Making Discoveries
     • When and where are discoveries made?
        • Always at the edges and boundaries
        • Going deeper, using more colors….
     • Metcalfe's law
        • Utility of computer networks grows as the number of possible connections: O(N²)
     • VO: Federation of N archives
        • Possibilities for new discoveries grow as O(N²)
     • Current sky surveys have proven this
        • Very early discoveries from SDSS, 2MASS, DPOSS
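One simple way to read the O(N²) claim is to count the distinct pairs of archives that could be cross-matched: N(N−1)/2. The short Python sketch below does that count. Counting pairs is an assumption about how "possible connections" is measured; the survey names come from the slide.

```python
from itertools import combinations


def connection_count(n: int) -> int:
    """Distinct pairs among n archives: n(n-1)/2, which grows as O(N^2)."""
    return n * (n - 1) // 2


def archive_pairs(archives):
    """Every unordered pair of archives that could be federated (cross-matched)."""
    return list(combinations(archives, 2))


print(archive_pairs(["SDSS", "2MASS", "DPOSS"]))
# [('SDSS', '2MASS'), ('SDSS', 'DPOSS'), ('2MASS', 'DPOSS')]
for n in (3, 10, 20):
    print(n, connection_count(n))   # prints 3 3, then 10 45, then 20 190
```

Doubling the federation from 10 to 20 archives raises the pair count from 45 to 190, roughly a factor of four. Linear growth in archives therefore gives quadratic growth in combinations.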

  5. Publishing Data
     Roles          Authors          Publishers         Curators          Consumers
     Traditional    Scientists       Journals           Libraries         Scientists
     Emerging       Collaborations   Project www site   Bigger Archives   Scientists

  6. Changing Roles
     • Exponential growth:
        • Projects last at least 3-5 years
        • Data is sent upwards only at the end of the project
        • Data will never be centralized
     • More responsibility on projects
        • Becoming Publishers and Curators
     • Larger fraction of budget spent on software
        • A lot of development is duplicated and wasted
     • More standards are needed
        • Easier data interchange, fewer tools
     • More templates are needed
        • Develop less software on your own

  7. Emerging New Concepts
     • Standardizing distributed data
        • Web Services, supported on all platforms
        • Custom-configure remote data dynamically
        • XML: Extensible Markup Language
        • SOAP: Simple Object Access Protocol
        • WSDL: Web Services Description Language
     • Standardizing distributed computing
        • Grid Services
        • Custom-configure remote computing dynamically
        • Build your own remote computer, then discard it
        • Virtual Data: new data sets on demand

  8. Shielding Users
     • Users do not want to deal with XML, they want their data
     • Users do not want to deal with configuring grid computing, they want results
     • SOAP: data appears in user memory, XML is invisible
     • SOAP call: just a remote procedure
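The "SOAP call: just a remote procedure" point can be shown with the standard library alone. In the Python sketch below, a method name plus arguments is wrapped in a SOAP 1.1 envelope, and the return value is extracted from a response, so the caller handles values rather than XML. The service namespace `urn:example:skynode` and the helpers `build_request`/`parse_result` are hypothetical. No HTTP transport or WSDL handling is shown.

```python
import xml.etree.ElementTree as ET
from typing import Optional

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"   # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:skynode"                      # hypothetical service namespace

ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("svc", SERVICE_NS)


def build_request(method: str, **params) -> bytes:
    """Wrap a procedure call (method name + arguments) in a SOAP 1.1 envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope)


def parse_result(response: bytes, result_tag: str) -> Optional[str]:
    """Extract the return value so the caller gets data, not XML."""
    node = ET.fromstring(response).find(f".//{{{SERVICE_NS}}}{result_tag}")
    return None if node is None else node.text


request = build_request("Query", sqlCmd="SELECT COUNT(*) FROM PhotoPrimary")
print(request.decode())
```

A real SOAP toolkit hides both functions behind a generated proxy object. That is the sense in which "XML is invisible" to the user.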

  9. NVO: How Will It Work?
     • Define commonly used 'atomic' services
     • Build higher level toolboxes/portals on top
     • We do not build 'everything for everybody'
     • Use the 90-10 rule:
        • Define the standards and interfaces
        • Build the framework
        • Build the 10% of services that are used by 90%
        • Let the users build the rest from the components

  10. Atomic Services
     • Metadata information about resources
        • Waveband
        • Sky coverage
        • Translation of names to universal dictionary (UCD)
     • Simple search patterns on the resources
        • Cone Search
        • Image mosaic
        • Unit conversions
     • Simple filtering, counting, histogramming
     • On-the-fly recalibrations
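A cone search is the simplest of these atomic services: it returns every catalog object within a given angular radius of a sky position. The Python sketch below uses the haversine formula for the great-circle distance and a linear scan. The catalog rows (dicts with `ra`/`dec` in degrees) are hypothetical sample data.

```python
import math


def angular_separation(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle distance in degrees between two (RA, Dec) positions in degrees (haversine)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(catalog, ra: float, dec: float, radius: float):
    """Rows of `catalog` (dicts with 'ra' and 'dec' in degrees) within `radius` degrees of (ra, dec)."""
    return [row for row in catalog
            if angular_separation(ra, dec, row["ra"], row["dec"]) <= radius]


catalog = [  # hypothetical sample rows
    {"id": 1, "ra": 181.30, "dec": -0.76},
    {"id": 2, "ra": 181.50, "dec": -0.70},
    {"id": 3, "ra": 190.00, "dec": 5.00},
]
print([row["id"] for row in cone_search(catalog, 181.3, -0.76, 0.5)])  # [1, 2]
```

The haversine form stays accurate at small separations and handles the RA wrap at 0°/360° without special cases. A production service would put a spatial index in front of the scan instead of checking every row.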

  11. Higher Level Services
     • Built on Atomic Services
     • Perform more complex tasks
     • Examples
        • Automated resource discovery
        • Cross-identifications
        • Photometric redshifts
        • Outlier detections
        • Visualization facilities
     • Expectation:
        • Build custom portals in a matter of days from existing building blocks (as is done today in IRAF or IDL)

  12. SkyQuery
     • Distributed query tool using a set of services
     • Feasibility study, built in 6 weeks from scratch
        • Tanu Malik (JHU CS grad student)
        • Tamas Budavari (JHU astro postdoc)
        • Implemented in C# and .NET
        • Won 2nd prize in the Microsoft XML Contest
     • Allows queries like:
         SELECT o.objId, o.r, o.type, t.objId
         FROM SDSS:PhotoPrimary o, TWOMASS:PhotoPrimary t
         WHERE XMATCH(o,t) < 3.5
           AND AREA(181.3, -0.76, 6.5)
           AND o.type = 3
           AND (o.i - t.m_j) > 2

  13. Architecture
     [Diagram: Web Page, Image cutout, SkyQuery, SkyNode SDSS, SkyNode 2MASS, SkyNode FIRST]

  14. Cross-id Steps
         SELECT o.objId, o.r, o.type, t.objId
         FROM SDSS:PhotoPrimary o, TWOMASS:PhotoPrimary t
         WHERE XMATCH(o,t) < 3.5
           AND AREA(181.3, -0.76, 6.5)
           AND (o.i - t.m_j) > 2
           AND o.type = 3
     • Parse query
     • Get counts
     • Sort by counts
     • Make plan
     • Cross-match
        • Recursively, from small to large
        • Select necessary attributes only
     • Return output
     • Insert cutout image
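The "sort by counts, then cross-match from small to large" steps can be sketched in a single process. In the Python below, `fetch(name)` is a hypothetical stand-in for a remote SkyNode. It returns the rows that already pass that archive's local predicates, such as `AREA(...)` and `o.type = 3`. The chain is written as a loop rather than as recursion. Each tuple is matched on the position of its first (smallest-archive) member, which is a simplification. The counts, catalog rows, and the 2-arcsecond radius are made-up values, and the radius does not reproduce the `XMATCH(o,t) < 3.5` criterion.

```python
import math


def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle distance in degrees (haversine); all inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def make_plan(counts):
    """'Sort by counts': archive names ordered from fewest to most candidate objects."""
    return sorted(counts, key=counts.get)


def recursive_xmatch(plan, fetch, radius_deg):
    """Cross-match archives in plan order, carrying the partial result forward.

    fetch(name) returns that archive's locally filtered rows as dicts with
    'id', 'ra', 'dec'. Only tuples that find a counterpart survive each step.
    """
    first, *rest = plan
    tuples = [{first: row} for row in fetch(first)]
    for name in rest:
        candidates = fetch(name)
        tuples = [
            {**t, name: row}
            for t in tuples
            for row in candidates
            if angular_separation(t[first]["ra"], t[first]["dec"],
                                  row["ra"], row["dec"]) <= radius_deg
        ]
    # 'Select necessary attributes only': keep just the object ids.
    return [{name: row["id"] for name, row in t.items()} for t in tuples]


counts = {"SDSS": 1000, "TWOMASS": 300, "FIRST": 20}   # hypothetical counts
catalogs = {                                          # hypothetical filtered rows
    "FIRST": [{"id": "f1", "ra": 181.3000, "dec": -0.7600}],
    "TWOMASS": [{"id": "t1", "ra": 181.3003, "dec": -0.7600},
                {"id": "t2", "ra": 182.0000, "dec": -0.5000}],
    "SDSS": [{"id": "s1", "ra": 181.2998, "dec": -0.7601},
             {"id": "s2", "ra": 185.0000, "dec": 0.0000}],
}
plan = make_plan(counts)
print(plan)  # ['FIRST', 'TWOMASS', 'SDSS']
print(recursive_xmatch(plan, catalogs.get, radius_deg=2 / 3600))
# [{'FIRST': 'f1', 'TWOMASS': 't1', 'SDSS': 's1'}]
```

Starting from the smallest set keeps the partial result small, so less data moves between nodes at every step.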

  15. Monte-Carlo Simulation
     • Comparing different algorithms for 3-way cross-identification (xid)
        • Transmit all the data
        • Transmit after filtering
        • Recursive cross-match
     • Surveys
        • SDSS
        • 2MASS
        • FIRST
     • Random variables:
        • Sky area (0..10 sq. deg)
        • Selectivity of each subselect (0..1)
        • Efficiency of join (0.5..2)
        • Selectivity of common select (0..1)
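The structure of such a simulation is shown in the Python sketch below. It draws the four random variables over the ranges on the slide and counts the rows each strategy would move over the network. The cost model is a toy and is not the model used in the talk. The object densities are hypothetical. "Efficiency of join" is interpreted here as a multiplier on the carried partial result, capped by the next survey's filtered size. No results from the talk are reproduced.

```python
import random
from statistics import mean

# Hypothetical object densities per square degree, chosen only for illustration.
DENSITY = {"SDSS": 10_000.0, "2MASS": 2_000.0, "FIRST": 100.0}


def simulate_once(rng: random.Random):
    """One trial: rows moved by each strategy under a toy cost model."""
    area = rng.uniform(0.0, 10.0)          # sky area, square degrees
    join_eff = rng.uniform(0.5, 2.0)       # efficiency of each join
    common_sel = rng.uniform(0.0, 1.0)     # selectivity of the common select
    # Per-survey subselect, each with its own selectivity in [0, 1).
    filtered = [rng.uniform(0.0, 1.0) * d * area for d in DENSITY.values()]

    transmit_all = sum(d * area for d in DENSITY.values())
    transmit_filtered = sum(filtered)

    # Recursive cross-match: start from the smallest filtered set and ship the
    # partial result from node to node; it is capped by the next filtered set.
    sizes = sorted(filtered)
    partial, moved = sizes[0], 0.0
    for size in sizes[1:]:
        moved += partial
        partial = min(join_eff * partial, size)
    moved += common_sel * partial          # final answer returned to the portal
    return transmit_all, transmit_filtered, moved


def simulate(trials: int, seed: int = 0):
    """Run `trials` independent draws with a reproducible random generator."""
    rng = random.Random(seed)
    return [simulate_once(rng) for _ in range(trials)]


results = simulate(10_000)
for label, column in zip(("all", "filtered", "recursive"), zip(*results)):
    print(f"{label:>9}: mean rows moved {mean(column):,.0f}")
```

Under this toy model, filtering can never move more rows than transmitting everything. The recursive plan can never move more rows than filtering, because each shipped partial result is bounded by a filtered set. The simulation is then about how large the gaps are across the parameter ranges.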

  16. SkyNode
     • Metadata functions (SOAP)
        • Info, Tables, Columns, Schema, Functions, Keysearch
     • Query functions (SOAP)
        • Dataset Query(String sqlCmd)
        • Dataset Xmatch(Dataset input, String sqlCmd, float eps)
     • Database
        • MS SQL Server
        • Upload dataset
        • Very fast spatial search engine (HTM-based): crossmatch takes < 3 ms/object over 15M objects in SDSS
        • User-defined functions and stored procedures
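A spatial index is what makes a per-object crossmatch fast: each uploaded object is compared only with nearby local objects, not the whole table. The Python sketch below does not implement HTM. It substitutes a simpler technique, fixed-height declination zones. Because any match lies within `eps` in declination, only a query's own zone and its two neighbours need to be scanned when `eps` is no larger than the zone height. The `xmatch` function is a hypothetical analogue of the slide's `Xmatch(Dataset input, String sqlCmd, float eps)`, with the SQL filter omitted. All row data is made up.

```python
import math
from collections import defaultdict


def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle distance in degrees (haversine); all inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


class ZoneIndex:
    """Bucket rows into declination zones of fixed height (a simple stand-in for HTM)."""

    def __init__(self, rows, zone_height_deg: float):
        self.zone_height = zone_height_deg
        self.zones = defaultdict(list)
        for row in rows:
            self.zones[self._zone(row["dec"])].append(row)

    def _zone(self, dec: float) -> int:
        return math.floor((dec + 90.0) / self.zone_height)

    def near(self, ra: float, dec: float, eps_deg: float):
        """Rows within eps_deg of (ra, dec); scans only this zone and its two neighbours."""
        z = self._zone(dec)
        return [row
                for zone in (z - 1, z, z + 1)
                for row in self.zones.get(zone, [])
                if angular_separation(ra, dec, row["ra"], row["dec"]) <= eps_deg]


def xmatch(index: ZoneIndex, input_rows, eps_deg: float):
    """Pair every uploaded row with each local row within eps_deg."""
    if eps_deg > index.zone_height:
        raise ValueError("eps_deg must not exceed the zone height")
    return [(row["id"], match["id"])
            for row in input_rows
            for match in index.near(row["ra"], row["dec"], eps_deg)]


local = [{"id": "a", "ra": 10.0, "dec": 0.0},          # hypothetical local table
         {"id": "b", "ra": 10.0005, "dec": 0.0003},
         {"id": "c", "ra": 20.0, "dec": 5.0}]
index = ZoneIndex(local, zone_height_deg=1 / 60)       # 1-arcminute zones
uploaded = [{"id": "q1", "ra": 10.0001, "dec": 0.0001}]
print(xmatch(index, uploaded, eps_deg=2 / 3600))       # [('q1', 'a'), ('q1', 'b')]
```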

  17. Data Flow
     [Diagram: query and data flow among SkyQuery (http://www.skyquery.net), SkyNode 1, SkyNode 2, and SkyNode 3]

  18. Other web services
     • Create density maps and masks for angular clustering
     • Deliver photometric redshifts from photometry data
     • Intersect pointed observations with surveys
     • Generate XSLT from script (XML => SVG)
     • Wrap legacy (Linux C) data mining applications as a web service
     • Create a C# class for the CFITSIO library

  19. Archive Footprint
     • Footprint is a 'fractal'
        • Result depends on context: all sky, degree scale, pixel scale
     • Translate to web services
        • Footprint() returns a single region that contains the archive
        • Intersection(region, tolerance) takes a region and returns its intersection with the archive footprint
        • Contains(point) returns yes/no (maybe fuzzy) depending on whether the point is inside the archive footprint
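The three footprint services can be sketched against a deliberately coarse geometry. In the Python below, the footprint is approximated as a union of circular caps, which is a hypothetical representation. `footprint()`, `intersection()` and `contains()` mirror `Footprint()`, `Intersection(region, tolerance)` and `Contains(point)`. `intersection()` returns the overlapping caps, not a true geometric intersection. `contains()` answers "maybe" near an edge to model the fuzzy case. `footprint()` returns a containing cap, not the smallest one.

```python
import math
from dataclasses import dataclass


def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle distance in degrees (haversine); all inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


@dataclass(frozen=True)
class Cap:
    """A circular region on the sky; all values in degrees."""
    ra: float
    dec: float
    radius: float


class Footprint:
    """Archive footprint approximated as a union of caps (hypothetical, coarse)."""

    def __init__(self, caps):
        self.caps = list(caps)

    def footprint(self) -> Cap:
        """One cap that contains every cap (by the triangle inequality)."""
        c0 = self.caps[0]
        radius = max(angular_separation(c0.ra, c0.dec, c.ra, c.dec) + c.radius
                     for c in self.caps)
        return Cap(c0.ra, c0.dec, min(radius, 180.0))

    def intersection(self, region: Cap, tolerance: float = 0.0):
        """Caps that overlap `region`, allowing `tolerance` degrees of slack."""
        return [c for c in self.caps
                if angular_separation(region.ra, region.dec, c.ra, c.dec)
                <= region.radius + c.radius + tolerance]

    def contains(self, ra: float, dec: float, fuzz: float = 0.0) -> str:
        """'yes', 'no', or 'maybe' when the point lies within `fuzz` of an edge."""
        # Signed distance to the nearest cap edge: negative inside, positive outside.
        margin = min(angular_separation(ra, dec, c.ra, c.dec) - c.radius
                     for c in self.caps)
        if margin <= -fuzz:
            return "yes"
        if margin > fuzz:
            return "no"
        return "maybe"


fp = Footprint([Cap(180.0, 0.0, 1.0), Cap(183.0, 0.0, 1.0)])
print(fp.contains(180.5, 0.0))            # yes
print(fp.contains(190.0, 0.0))            # no
print(fp.contains(181.0, 0.0, fuzz=0.1))  # maybe (on the edge of the first cap)
print(fp.intersection(Cap(185.5, 0.0, 1.0), tolerance=0.6))
# [Cap(ra=183.0, dec=0.0, radius=1.0)]
```

A finer representation, such as pixel-scale masks, would plug into the same three calls. This is one way to read "result depends on context".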

  20. Summary
     • Exponential data growth – distributed data – federation needed
     • Projects are now Publishers and Curators
     • Web Services – hierarchical architecture
     • Use the 90-10 rule (maybe 80-20)
     • There are clever ways to federate datasets!