Authors: David Boyes, Benjamin Mampaey, Cis Verbeeck, Veronique Delouille, Jean-François Hochedez (STCE + ROB) • SDO Data Access and Distribution in Europe and the WisSDOm Data Centre in ROB, Brussels • Cospar10 Bremen
What will be covered • Where the data is, and the access architecture for the users • Some basic terms • User access methods: modules, basic web access, virtual observatories, simplified web access, pseudo files and other developments • Interesting issues: retention, saved searches, evolving calibration • Neat stuff to come: cutouts, Helioviewer, grid integration
Where is the data for the users • Data is available from one or more data centre(s) - all are networked • Some users are "close", some are "far" - distance matters • All data is available somewhere • Users can get data (an "export"): from the nearest centre directly; via the nearest centre from a remote centre; or directly from another centre • Most of this is automatic, but you will see differences, e.g. in delays
How the data is accessed (a bit technical) • the system is netDRMS, created by the JSOC at Stanford • files are generated by content • the system holds data files (SUMS) + metadata (DRMS) • the mediator is an "export" module: it makes your very own file (FITS, tar of FITS etc.) • SQL etc. is hidden from the user
Access summary ... • No files until you ask for them • Data is referenced by content - provided as one or more files with whatever name you want • The exported files are built using stored elements, so, for example, exporting FITS with Rice compression is quite direct, as AIA data is stored internally in this format • You can get anything, but... • you may as well ask for all metadata • the files can be large - best not to ask for hundreds
Some basic terms • series: the basic collection of data items with shared properties; by convention named <project>.<data>; all records in a series share a metadata format (i.e. keywords) • keywords: FITS-style keywords plus added metadata-only keywords; they correspond to columns in the metadata (DRMS) database • online means available from a disk at the site, so offline means not yet arrived/available, or deleted but can be fetched • data format: whatever is stored is native (FITS, JPEG2000), conversion is post-processing; characterised by resolution and cadence (e.g. 4K x 4K at 10s, 1K x 1K at 90s); naturally can't do better, but can reduce by "cutouts" in time or space • data records: can be several items as a group (e.g. image + bad pixel map + alternative format); data is SUMS plus metadata, referenced by metadata tables (DRMS) - usually one to one; each record is self contained, for example cadence is not part of the data
Example series • aia_test.lev1: AIA images, 4Kx4K, full disk, full cadence • aia_test.synoptic2: AIA images reduced to 1Kx1K, full disk, 90s cadence • hmi_test.M_45s: magnetograms, 45s cadence • hmi_test.v_45s: dopplergrams, 45s cadence • JPEG2000 series to come, for browsing and forecasting
User access methods • Direct via “modules”: on site at the data centre • Query based: a precursor to full data access; checks a part of the data (the metadata) without having to retrieve the very large part • Indirect via network: web/http based; delivers data somewhere - maybe to fetch immediately or later • Direct via wrapper: on site, e.g. IDL (Matlab on the way)
A practical pause - limitations • Sheer size of request - even if you have a 2 TB USB stick, that's only 2 days • Network speed - at about 200 Mb/s it takes a day to get a day's worth • Search/database speed - millions of records • Raw data access/retrieval speed - the basic image data takes time to get from disk • Retention time - you can get anything, but you will probably have to wait if you ask for a full day of data from 2 years ago that nobody else has ever used
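The first two limits can be checked with simple arithmetic. The sketch below is only an illustration: it assumes about 1 TB per day of data (inferred from the "2 TB is only 2 days" remark) and treats 200 Mb/s as a nominal link speed. The `efficiency` parameter is hypothetical and stands in for protocol overhead and shared links.

```python
def transfer_hours(volume_bytes: float, link_bits_per_s: float,
                   efficiency: float = 1.0) -> float:
    """Hours needed to move volume_bytes over a link of the given nominal speed."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = volume_bytes * 8 / (link_bits_per_s * efficiency)
    return seconds / 3600


TB = 1e12               # decimal terabyte
DAY_VOLUME = 1 * TB     # assumption: roughly 1 TB per day of data
LINK = 200e6            # 200 Mb/s, as quoted on the slide

print(round(transfer_hours(DAY_VOLUME, LINK), 1))        # ideal link
print(round(transfer_hours(DAY_VOLUME, LINK, 0.5), 1))   # half the nominal rate
```

Under these assumptions an ideal link needs about 11 hours for one day of data, and about 22 hours at half the nominal rate, which is in line with the rough "a day to get a day's worth".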
Access by : modules - the basic bricks • At the data centres, for example • show_series • show_info • jsoc_export_as_fits

[jdb@db1 ~]$ show_series
aia_test.lev1
aia_test.synoptic2
drms.sites
hmi.doptest
hmi_test.m_45s
hmi_test.s_720s
lm_jps.lev1_test4k10s
[jdb@db1 ~]$ show_info -s ds=aia_test.synoptic2
First Record: aia_test.synoptic2[2010-05-21T15:00:00.57Z][171] is first of 6 records matching first keyword, Recnum = 1
Last Record: aia_test.synoptic2[2010-07-14T11:58:41.07Z][335] is first of 2 records matching first keyword, Recnum = 445376
Last Recnum: 445377
[jdb@db1 ~]$ jsoc_export_as_fits reqid=REQ_FTP expversion=0.5 rsquery=aia_test.lev1[:#209866] path=tmp method=url protocol=FITS
'10552320' bytes exported.
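The record-set strings in this session follow a visible pattern: a series name followed by zero or more bracketed selectors. A minimal Python sketch of splitting such a string is given here for illustration. The function name `parse_record_spec` is hypothetical, and the pattern is inferred only from the examples shown, not from the full netDRMS grammar.

```python
import re
from typing import List, Tuple

# <project>.<data> followed by any number of [selector] groups
_SPEC = re.compile(
    r"^(?P<series>[A-Za-z_]\w*\.\w+)(?P<filters>(\[[^\[\]]*\])*)$"
)


def parse_record_spec(spec: str) -> Tuple[str, List[str]]:
    """Split 'series[sel1][sel2]...' into the series name and its selectors."""
    m = _SPEC.match(spec.strip())
    if m is None:
        raise ValueError(f"not a recognised record-set string: {spec!r}")
    selectors = re.findall(r"\[([^\[\]]*)\]", m.group("filters"))
    return m.group("series"), selectors


print(parse_record_spec("aia_test.synoptic2[2010-05-21T15:00:00.57Z][171]"))
print(parse_record_spec("aia_test.lev1[:#209866]"))
```

In the first example the two selectors are the time and the wavelength; in the second the single selector is passed through untouched, since its meaning is not interpreted here.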
Access by : basic web access • System developed by JSOC : lookdata.html • Online via the JSOC web site, but heavily loaded • Being tested at ROB • Provides easy access to an overview of all the available data • Formulating a selection query does require knowledge of the query syntax • Provides for a wide variety of data packaging • normal user FITS or internal format (FITS with no keywords) • via web for immediate or later access, as one or more individual files or as tar • ROB is working on fewer packaging options
Access by : Virtual Observatories • VSO • development of existing VSO • prototype for SDO running and definitive version in preparation • http://sdac.virtualsolar.org/cgi/search • Soteria • demo provider made for ROB/USET, SDO provider being coded now • http://soteria-space.eu/ • Uniform search paradigm • Infrastructure hides efficient searches with complex syntax e.g. SQL in various flavours
Access by : Soteria Virtual Observatory • One part of an EU project • Based on current web access technology • The example is for the ROB USET telescope as a data provider; each SDO site will be able to act as a provider
Access by : simplified web access • Work in progress • Offer limited to direct requests for tar files or individual FITS files; front end for PFS • Based on a simplified enquiry such as : • aia.lev1 + time + period + cadence + wavelengths • Preparation is actually more complex than for basic access - for example it requires decisions as to which keys are useful for which series
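A simplified enquiry of this kind maps naturally onto a small data structure. The Python sketch below is an illustration only. The class `SimpleEnquiry` is hypothetical, and the rendered string is modelled on the bracketed record-set strings used by the modules. The `/period@cadence` form and the comma-separated wavelength list are assumptions that should be checked against the JSOC documentation before use.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SimpleEnquiry:
    series: str                  # e.g. "aia.lev1"
    start: str                   # start time, e.g. "2010-06-17T00:00:00Z"
    period: str                  # length of the interval, e.g. "1h"
    cadence: str                 # sampling step, e.g. "15m"
    wavelengths: Sequence[int]   # e.g. (171, 193); empty means all

    def to_record_spec(self) -> str:
        """Render the enquiry as a bracketed record-set string (assumed syntax)."""
        spec = f"{self.series}[{self.start}/{self.period}@{self.cadence}]"
        if self.wavelengths:
            spec += "[" + ",".join(str(w) for w in self.wavelengths) + "]"
        return spec


enquiry = SimpleEnquiry("aia.lev1", "2010-06-17T00:00:00Z", "1h", "15m", (171, 193))
print(enquiry.to_record_spec())
```

The point of the design is that the user fills in five plain fields, and the decision about which keys apply to which series stays inside the front end.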
Access by : pseudo files (PFS) • Systematically named files in a directory tree, with no real files until you access them • Typically based on a query covering a much wider range than you really need (or could use) • Real files are kept in a cache, so further access is very cheap
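The idea can be sketched in a few lines of Python. This is not the ROB prototype, only an in-memory illustration of the behaviour described: names are listed up front, content is produced on first access, and later reads come from a cache. The class `PseudoFileStore` and the `fetch` callback are hypothetical.

```python
from typing import Callable, Dict, Iterable, List


class PseudoFileStore:
    """Names exist immediately; content is fetched on first read, then cached."""

    def __init__(self, names: Iterable[str], fetch: Callable[[str], bytes]):
        self._names = set(names)
        self._fetch = fetch              # stands in for the real export step
        self._cache: Dict[str, bytes] = {}
        self.fetch_count = 0             # how many real exports were triggered

    def listdir(self) -> List[str]:
        return sorted(self._names)       # cheap: no data is touched

    def read(self, name: str) -> bytes:
        if name not in self._names:
            raise FileNotFoundError(name)
        if name not in self._cache:
            self._cache[name] = self._fetch(name)
            self.fetch_count += 1
        return self._cache[name]


def fake_export(name: str) -> bytes:
    # Placeholder content; a real system would build the file from stored data.
    return f"content of {name}".encode()


store = PseudoFileStore(["a.fits", "b.fits"], fake_export)
store.read("a.fits")
store.read("a.fits")
print(store.listdir(), store.fetch_count)
```

Listing the directory costs nothing, and the second read of the same name does not trigger another export.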
Access by : pseudo files (PFS) • Example with 160 file names, all AIA wavelengths, 15min cadence • In prototype at ROB, source downloadable

mnt
`-- aia_test.lev1
    `-- 2010
        `-- 06
            `-- 17
                |-- H0000
                |   |-- AIA20100617_000000570000_0171.fits
                |   |-- AIA20100617_000003570000_0304.fits
                |   |-- AIA20100617_000009580000_94.fits
                |   |-- AIA20100617_000018570000_1600.fits
                |   |-- AIA20100617_000050070000_211.fits
                |   |-- AIA20100617_000053050000_335.fits
                |   |-- AIA20100617_000056100000_193.fits
                ......
                |   |-- AIA20100617_004505070000_335.fits
                |   |-- AIA20100617_004506570000_1600.fits
                |   |-- AIA20100617_004508070000_193.fits
                |   |-- AIA20100617_004509580000_94.fits
                |   `-- AIA20100617_004511070000_131.fits
                |-- H0100
                |   |-- AIA20100617_010000580000_0171.fits
                |   |-- AIA20100617_010002080000_211.f
                .......
                    |-- AIA20100617_043008060000_193.fits
                    |-- AIA20100617_043009550000_94.fits
                    |-- AIA20100617_043011090000_131.fits
                    |-- AIA20100617_043018580000_1600.fits
                    |-- AIA20100617_044500560000_0171.fits
                    |-- AIA20100617_044502050000_211.fits
                    |-- AIA20100617_044503570000_0304.fits
                    |-- AIA20100617_044505070000_335.fits
                    |-- AIA20100617_044506570000_1600.fits
                    |-- AIA20100617_044508070000_193.fits
                    |-- AIA20100617_044509580000_94.fits
                    `-- AIA20100617_044511070000_131.fits

9 directories, 160 files
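Because the names are systematic, they can be decoded without opening any file. The Python sketch below reads the pattern off the listing: instrument, date, time of day with six fractional digits, then wavelength. This layout is inferred from the example names only, and the helper names `parse_pfs_name` and `hour_dir` are hypothetical.

```python
import re
from datetime import datetime
from typing import NamedTuple


class PfsName(NamedTuple):
    instrument: str
    time: datetime
    wavelength: int


# e.g. AIA20100617_000000570000_0171.fits
_NAME = re.compile(
    r"^(?P<inst>[A-Z]+)(?P<date>\d{8})_(?P<hms>\d{6})(?P<micro>\d{6})"
    r"_(?P<wave>\d+)\.fits$"
)


def parse_pfs_name(name: str) -> PfsName:
    """Decode a pseudo-file name into instrument, observation time, wavelength."""
    m = _NAME.match(name)
    if m is None:
        raise ValueError(f"unexpected file name: {name!r}")
    d, t = m.group("date"), m.group("hms")
    when = datetime(int(d[:4]), int(d[4:6]), int(d[6:8]),
                    int(t[:2]), int(t[2:4]), int(t[4:6]),
                    int(m.group("micro")))
    # Wavelengths appear both zero-padded (0171) and plain (94) in the listing.
    return PfsName(m.group("inst"), when, int(m.group("wave")))


def hour_dir(when: datetime) -> str:
    """Hourly directory name as it appears in the tree, e.g. H0000, H0100."""
    return f"H{when.hour:02d}00"


info = parse_pfs_name("AIA20100617_000000570000_0171.fits")
print(info, hour_dir(info.time))
```

A name that does not fit the pattern, such as a truncated one, raises `ValueError` rather than being guessed at.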
Access by : useful methods in development • Order and notify via e-mail for manual fetch • Order and automatic delivery (e.g. sftp)
Interesting issue - Retention • All netDRMS sites have full information for selected series - their “subscribed” series • But is it online? • sites keep the latest data, but must selectively discard • Enquiry modules can tell you if data is online, but what are the implications (delay...) if it is not? • You can request it, but it can take some time to obtain • for now this is quick, but after a year or so a record nobody has looked at will have to come from tape
Interesting issue - Saved searches • How to describe a selection of data • You can save the result as a record list for a reasonable number of records, but this does not save the query • save both query and result? • For both your own use and publication • A saved query might give different results later (e.g. online only) • Relates to the issue of calibration
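One way to picture "save both query and result" is a small record that keeps the two together. The Python sketch below is an illustration under stated assumptions: the class `SavedSearch` and its fields are hypothetical, and records are identified by their Recnum as printed by show_info.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class SavedSearch:
    query: str               # the record-set string that was run
    recnums: List[int]       # the records it returned at the time
    saved_at: str            # timestamp supplied by the caller
    online_only: bool = False

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "SavedSearch":
        return cls(**json.loads(text))

    def differs_from(self, new_recnums: List[int]) -> bool:
        """True if re-running the query now gives a different set of records."""
        return sorted(self.recnums) != sorted(new_recnums)


saved = SavedSearch("aia_test.lev1[:#209866]", [209866], "2010-07-14T12:00:00Z")
restored = SavedSearch.from_json(saved.to_json())
print(restored == saved, restored.differs_from([209866, 209867]))
```

Keeping the record list next to the query lets a later re-run be compared with what was originally used, for example in a publication.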
Interesting issue - Evolving calibration and which data did I use? • More accurate calibration will be available as time goes on and more calibration points are acquired • So the newest and best data can change • For most users this is done by applying a calibration series, e.g. via Solarsoft • But there can also be metadata changes • The raw data is unlikely to change
Neat stuff to come - cutouts • This is well on the way, again being developed by JSOC and LMSAL, for those who don't need the full 4Kx4K • Very much reduced data storage requirements • Closely related to event tracking and the HEK
Neat stuff to come - Helioviewer • www.helioviewer.org • Existing project now being directed towards use with SDO data • JPEG2000 based viewer with event marker overlay • integration with JPEG2000 series • rapid browsing with links to full data • ROB is a Co-I in the requested next stage
Neat stuff to come - grid integration • The data element size (tens of MB) is natural for use in a high performance grid • The data is already geographically distributed - variety of access routes • Distributed variety of resources - large clusters, pipelines, GPUs • Sites are on high performance research networks
Thanks to • JSOC at Stanford • LMSAL • Belnet and Geant2 for networking • The enthusiastic cooperation from the partner data centres • Our sister institutes at the ROB site for hosting the data centre and infrastructure
Web addresses • The main source : JSOC at jsoc.stanford.edu • HEK : www.lmsal.com/hek • ROB : wissdom.oma.be • SAO : www.cfa.harvard.edu/sao • GDS : www.mps.mpg.de/projects/seismo/GDC-SDO • UCLan : www.star.uclan.ac.uk • IAS : idc-medoc.ias.u-psud.fr