AstroGrid Datacenters
AstroGrid Consortium Review, Dec 2004
Martin Hill (AstroGrid@ROE)
Outline
• Challenge
• Approach
• Developed:
  • Storepoints
  • Describing data
  • Query Language
  • Status
  • Versioning
• Software: Publisher’s AstroGrid Library
Challenge: The Problem
• Large datasets (up to petabytes)
  • So?
• Distributed; science comes from combining them
• Bandwidth rising slower than
• No/few established suitable standards
  • FITS images/‘tables’: ambiguous headers; ambiguous subformat, e.g. spectra
  • VOTable introduced: ambiguous subformat, e.g. spectra vs catalogue; verbose
• No/few established common terms
• Involves scientists…
Approach: ‘Publisher’s AstroGrid Library’
• General solution to:
  • Discover the problems faced and accumulate solutions in software
  • Experimentally publish data sets and types (not host them)
• Many smaller datasets are owned by people without web skills (e.g. solar), so we need:
  • ‘Easy’/‘unskilled’ installation
  • Ability to proxy: third parties can publish data without requiring more work from the owner (e.g. VizieR, TRACE)
  • A ‘free’ website and a range of standard interfaces
• Danger: too general (any query against any dataset producing any results)
Existing Solutions
• Common task: publish RDBMSs to the web
  • Accumulated tools & skill-sets
• No combined solution offering:
  • Standard interface (e.g. query language)
  • Scientific values (errors, units)
  • Spatial querying (common)
  • VO metadata for queries and results
Developing Standards
• Resource metadata
• Query language (ADQL/s, ADQL/x)
• Web interfaces
• Working beyond standards
  • Feeding research to IVOA
• Parallel development
  • In the VO: e.g. Starlink, NVO, VizieR
  • External: SRB, Taverna, GridPP monitor
• Convergence
Protocols & Interfaces
• Human – web pages
• SOAP
  • Toolkit incompatibilities
  • Streaming awkward (via toolkits)
  • Longer-term benefits?
• ‘Raw HTTP POST’ (e.g. servlets, CGI)
  • Simpler
  • More existing skills amongst astronomers
• Mixed (e.g. SIAP, SkyNode)
• Don’t choose – implement
  • Mix & match, plug & play
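The appeal of ‘raw HTTP’ interfaces is that a request is just a URL. As an illustration, an NVO cone search (one of the CGI interfaces the PAL offers) takes a sky position and a search radius as the RA, DEC and SR query parameters, in decimal degrees. The Python sketch below only builds such a request URL; the function name and the example.org endpoint are hypothetical, and no real service is contacted.

```python
from urllib.parse import urlencode


def cone_search_url(base_url: str, ra_deg: float, dec_deg: float, radius_deg: float) -> str:
    """Build a cone search request as a plain HTTP GET URL.

    RA, DEC and SR are in decimal degrees, following the NVO cone search
    convention; base_url is whatever endpoint the service advertises.
    """
    if not 0.0 <= ra_deg < 360.0:
        raise ValueError("RA must be in [0, 360) degrees")
    if not -90.0 <= dec_deg <= 90.0:
        raise ValueError("DEC must be in [-90, 90] degrees")
    if radius_deg <= 0.0:
        raise ValueError("SR must be positive")
    # Join with '?' or '&' depending on what the base URL already contains.
    if base_url.endswith(("?", "&")):
        separator = ""
    elif "?" in base_url:
        separator = "&"
    else:
        separator = "?"
    return base_url + separator + urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})


if __name__ == "__main__":
    # Hypothetical endpoint, for illustration only.
    print(cone_search_url("http://example.org/cone", 180.0, -30.5, 0.1))
```

Anything that can fetch a URL (a browser, a shell script, a servlet) can then act as a client, which is the ‘more existing skills’ argument made on the slide.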
Releasing
• Deploy early, even if only temporarily
• Independent & integrated access
• Versioning:
  • Servers & clients: new clients can still use old servers, and new servers work with old clients
  • Add and ‘deprecate’, don’t change
  • Delete intelligently
  • (Quickly remove unused interfaces, e.g. CEA if CEA upgrades, JSPs)
• Need hosts…
  • Hosts need hardware
  • Publishers need to know their data
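‘Add and deprecate, don’t change’ can be illustrated on any service interface. A minimal sketch, assuming a hypothetical CatalogueService whose old do_query operation must keep working for existing clients while new clients move to a new submit_query operation; neither name comes from the AstroGrid software.

```python
import warnings


class CatalogueService:
    """Hypothetical service showing 'add and deprecate, don't change'."""

    def submit_query(self, adql: str, result_format: str = "VOTABLE") -> str:
        # New operation added alongside the old one; new clients call this.
        return f"job[{result_format}]:{adql}"

    def do_query(self, adql: str) -> str:
        # Old operation kept with its original signature so existing clients
        # keep working; it now just delegates to the new operation.
        warnings.warn("do_query is deprecated; use submit_query",
                      DeprecationWarning, stacklevel=2)
        return self.submit_query(adql)
```

Because the old signature never changes, old clients work against the new server; the deprecation warning marks the operation for removal once nothing uses it, which is the ‘delete intelligently’ step.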
Describing Data
• Registry ‘Resource’ documents
  • IVO Tabular Sky Service
  • Units, UCDs
  • Solar vs sky vs…
  • Images vs catalogues
• Concept extended for ‘RdmsMetadata’
  • UCD1+ -> dictionaries & ontologies
  • Relationships (simple: errors)
  • Queryable
  • Mirrors vs copies
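One way to make column descriptions queryable is to attach a UCD and a unit to every column and record simple relationships such as ‘this column is the error on that one’. A minimal sketch, assuming a hypothetical ColumnInfo record and made-up column names; the UCD strings use ordinary UCD1+ words, but the layout is illustrative and is not the actual RdmsMetadata format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ColumnInfo:
    name: str
    ucd: str                        # UCD1+ words, ';'-separated
    unit: Optional[str]             # None for dimensionless quantities
    error_of: Optional[str] = None  # name of the column this is the error on


# Hypothetical column descriptions for a small catalogue.
COLUMNS = [
    ColumnInfo("ra", "pos.eq.ra;meta.main", "deg"),
    ColumnInfo("dec", "pos.eq.dec;meta.main", "deg"),
    ColumnInfo("bmag", "phot.mag;em.opt.B", "mag"),
    ColumnInfo("bmag_err", "stat.error;phot.mag;em.opt.B", "mag", error_of="bmag"),
]


def columns_by_ucd(columns: List[ColumnInfo], ucd_word: str) -> List[ColumnInfo]:
    """Find columns by meaning (a UCD word) rather than by name."""
    return [c for c in columns if ucd_word in c.ucd.split(";")]


def error_column(columns: List[ColumnInfo], name: str) -> Optional[ColumnInfo]:
    """Return the column recorded as the error on `name`, if any."""
    return next((c for c in columns if c.error_of == name), None)
```

Looking columns up by UCD rather than by name is what would let the same query run against datasets whose column names differ.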
Query Language
• SQL -> ADQL/xml
• Defined common functions: CIRCLE & XMATCH (sky, not solar)
• Working on:
  • XQL
  • Units
• Investigating:
  • UCDs instead of columns
  • Cross-dataset querying
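On the sky, a CIRCLE constraint reduces to a great-circle distance test against the search centre. A minimal sketch, assuming rows carry ra and dec in degrees; the haversine formula is used because it stays accurate at small separations and handles the RA wrap at 0/360 degrees, and it is not necessarily how any AstroGrid back end evaluates CIRCLE. The function names are hypothetical. The test only makes sense for spherical sky coordinates, which is why the slide limits these functions to sky (not solar) data.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees using the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # min() guards against rounding pushing sqrt(h) just above 1.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def in_circle(row, ra_c, dec_c, radius_deg):
    """True if the row's (ra, dec) lies within CIRCLE(ra_c, dec_c, radius_deg)."""
    return angular_separation_deg(row["ra"], row["dec"], ra_c, dec_c) <= radius_deg
```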
Results
• Query + Metadata + RawResults = VoResults
• FITS vs VOTable vs HDF vs CSV vs HTML vs…
  • All of them
• Results -> queryable data -> inputs
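Supporting ‘all of them’ is easiest if the query, its metadata and the raw rows travel together, and each output format is a pluggable writer that consumes rows one at a time. A minimal sketch, assuming a hypothetical VoResults class and only CSV and HTML writers; VOTable, FITS or HDF writers would be further entries in the same table.

```python
import csv
import html
import io
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Sequence


@dataclass
class VoResults:
    """Hypothetical bundle: query + metadata + raw results travel together."""
    query: str
    columns: List[str]
    rows: Iterable[Sequence]  # may be a generator: rows are read once, in order
    metadata: Dict[str, str] = field(default_factory=dict)


def write_csv(results: VoResults, out) -> None:
    writer = csv.writer(out)
    writer.writerow(results.columns)
    for row in results.rows:  # one row at a time, so nothing is buffered
        writer.writerow(row)


def write_html(results: VoResults, out) -> None:
    header = "".join(f"<th>{html.escape(c)}</th>" for c in results.columns)
    out.write(f"<table>\n<tr>{header}</tr>\n")
    for row in results.rows:
        cells = "".join(f"<td>{html.escape(str(v))}</td>" for v in row)
        out.write(f"<tr>{cells}</tr>\n")
    out.write("</table>\n")


# Adding a format (VOTable, FITS, ...) means adding an entry here.
WRITERS = {"csv": write_csv, "html": write_html}


def write_results(results: VoResults, fmt: str, out) -> None:
    writer = WRITERS.get(fmt.lower())
    if writer is None:
        raise ValueError(f"unsupported result format: {fmt}")
    writer(results, out)


if __name__ == "__main__":
    rows = ((ra, ra / 2) for ra in (10.0, 20.0))  # a generator stands in for a DB cursor
    buf = io.StringIO()
    write_results(VoResults("SELECT ra, dec FROM t", ["ra", "dec"], rows), "csv", buf)
    print(buf.getvalue())
```

Because writers pull rows from an iterator, results can be streamed straight from a database cursor to the destination without being held in memory.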
Data Analysis (Clive Page)
• Faster feasible
  • < 10^6s OK; 10^8 not…
• Joins
  • Polar coordinate matches (+ HTM, HEALPix)
  • Cross-match algorithms
• Distributed queries
  • Breaking down the query
  • Moving the right data
  • Combining the results
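Positional joins are usually sped up by partitioning the sky so that each source is compared only with nearby candidates. HTM and HEALPix do this with hierarchical pixels; the sketch below swaps in a simpler declination-zone scheme to show the idea, and its function names are hypothetical. Catalogues are assumed to be sequences of (ra, dec) pairs in degrees.

```python
import math
from collections import defaultdict


def separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def zone_crossmatch(cat_a, cat_b, radius_deg):
    """Return (i, j) index pairs where cat_a[i] and cat_b[j] lie within radius_deg.

    cat_b is bucketed into declination zones radius_deg high. A true match
    differs in dec by at most radius_deg, so only the source's own zone and
    its two neighbours need to be searched.
    """
    zones = defaultdict(list)
    for j, (_, dec) in enumerate(cat_b):
        zones[math.floor(dec / radius_deg)].append(j)

    matches = []
    for i, (ra, dec) in enumerate(cat_a):
        zone = math.floor(dec / radius_deg)
        for z in (zone - 1, zone, zone + 1):
            for j in zones.get(z, ()):  # .get avoids creating empty zones
                if separation_deg(ra, dec, *cat_b[j]) <= radius_deg:
                    matches.append((i, j))
    return matches
```

Within a zone this still compares every candidate, so a production version would also sort by RA or use proper HTM/HEALPix indexing inside the database.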
Status
• Readily available
• Debugging: developer
• Debugging: astronomer
• Inform user
Storepoints
• No data persistence at PALs
  • Web server machines are not data storage machines
• Large result sets
  • No workspace, memory models, etc.
• Streaming outputs
• SRB, GridFTP not ready
Identifying Storepoints
[Diagram: storepoint concepts (MySpace, Community, HomeSpace, VoSpace (Registered)) and transports (SRB, FTP, GridFTP, HTTP)]
• FTP, File, MySpace + extend
• 3rd iteration; 2nd in use
Data Service Architecture
[Diagram: components JSP, SIAP, CEA, Axis, AstroGrid, SkyNode, Cone, Plugin Manager, Datacenter Implementation and Slinger; result formats XML/CSV, zip/plain; delivery via email/file/ftp/myspace]
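The Plugin Manager in the diagram sits between the front-end interfaces and the data. A minimal sketch of that pattern, assuming a hypothetical QueryPlugin interface that every back end (RDBMS, FITS collection, eXist) implements and a manager that instantiates the configured plugin by name; FixedRowsPlugin is a test stand-in, not a real AstroGrid plugin.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, Sequence, Type


class QueryPlugin(ABC):
    """Hypothetical back-end interface: turn a query into a stream of rows."""

    @abstractmethod
    def columns(self) -> Sequence[str]:
        ...

    @abstractmethod
    def run(self, query: str) -> Iterator[tuple]:
        ...


class PluginManager:
    """Maps a configured plugin name to the class that implements it."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Type[QueryPlugin]] = {}

    def register(self, name: str, plugin_cls: Type[QueryPlugin]) -> None:
        self._plugins[name] = plugin_cls

    def create(self, name: str, **config) -> QueryPlugin:
        plugin_cls = self._plugins.get(name)
        if plugin_cls is None:
            raise ValueError(f"no plugin registered as {name!r}")
        return plugin_cls(**config)


class FixedRowsPlugin(QueryPlugin):
    """Stand-in for an RDBMS, FITS or eXist plugin: serves rows given at setup."""

    def __init__(self, rows) -> None:
        self._rows = list(rows)

    def columns(self) -> Sequence[str]:
        return ("ra", "dec")

    def run(self, query: str) -> Iterator[tuple]:
        # A real plugin would translate the query for its back end;
        # this one ignores it and streams its rows one by one.
        yield from self._rows
```

Every front end (SOAP, CGI, JSP pages) then talks only to QueryPlugin, so adding a new kind of data source does not touch the interfaces.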
Publishers’ AstroGrid Library
• ‘Easy to publish to the VO’
• Web application, including:
  • SOAP (AstroGrid, CEA, prepared for SkyNode)
  • CGI (SIAP, NVO cone search, SSA)
  • HTML pages (cone search, query builder, status monitor)
• Features:
  • Asynchronous (‘stateful’) & synchronous queries
  • Queues
  • Comprehensive status (including historical)
  • Variety of results
  • Fully ‘streamed’: no curation issues
• Server ‘plugins’, including:
  • RDBMS (JDBC)
  • FITS file collection
  • eXist (XML)
• Helper tools:
  • Metadata generators
  • Ready-made website access
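Asynchronous queries with queues and a historical status record can be modelled as jobs that accumulate status entries. A minimal sketch, assuming hypothetical Job and QueryQueue classes and a caller-supplied run_query function; a real server would run jobs on worker threads and timestamp each entry, both left out here to keep the example deterministic. A synchronous query is then just a submit followed immediately by processing.

```python
import itertools
from collections import deque
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Deque, Dict, List, Tuple


class Status(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"
    ERROR = "error"


@dataclass
class Job:
    job_id: int
    query: str
    history: List[Tuple[Status, str]] = field(default_factory=list)

    @property
    def status(self) -> Status:
        return self.history[-1][0]  # latest entry is the current status

    def set_status(self, status: Status, message: str = "") -> None:
        self.history.append((status, message))  # never overwrite: keep history


class QueryQueue:
    def __init__(self, run_query: Callable[[str], str]) -> None:
        self._run_query = run_query
        self._pending: Deque[Job] = deque()
        self._jobs: Dict[int, Job] = {}
        self._ids = itertools.count(1)

    def submit(self, query: str) -> int:
        """Asynchronous entry point: queue the query and return its id at once."""
        job = Job(next(self._ids), query)
        job.set_status(Status.QUEUED)
        self._pending.append(job)
        self._jobs[job.job_id] = job
        return job.job_id

    def status(self, job_id: int) -> Status:
        return self._jobs[job_id].status

    def history(self, job_id: int) -> List[Tuple[Status, str]]:
        return list(self._jobs[job_id].history)

    def process_next(self) -> bool:
        """Run the oldest queued job; return False if the queue was empty."""
        if not self._pending:
            return False
        job = self._pending.popleft()
        job.set_status(Status.RUNNING)
        try:
            message = self._run_query(job.query)
        except Exception as exc:
            job.set_status(Status.ERROR, str(exc))
        else:
            job.set_status(Status.COMPLETED, message)
        return True
```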
Situation Now
• Installed:
  • SuperCOSMOS Science Archive (RDBMS)
    • astrogrid.roe.ac.uk:8080/pal-ssa/
    • astrogrid.roe.ac.uk:8080/pal-twomass/
    • astrogrid.roe.ac.uk:8080/pal-usnob/
  • 6dF – spectra
    • grendel12.roe.ac.uk:8080/pal-6df/
  • Wide Field Survey
  • TRACE (FITS files, solar, under test)
  • Proxy (bespoke special plugins)
    • All NVO-cone-compatible DBs (test)
    • VizieR
• Evaluated/being evaluated at:
  • ESO
  • RAL (solar)
  • JBO (Merlin)
• Reviewing query language, metadata documents, etc.
Future
• Quality…
• Metadata ‘wizards’
• Sell to hosts; deploy to Leicester, JBO, ESO, RAL, the world…
• Explicit and investigative queries
• Distributed queries & combining results (NVO exec plans)
• Full SIA, SSA interfaces
• More user & admin web pages
• Local authorisation