SDO Data Access and Distribution in Europe and the WisSDOm Data Centre in ROB, Brussels

Authors: David Boyes, Benjamin Mampaey, Cis Verbeeck, Veronique Delouille, Jean-François Hochedez (STCE + ROB)


Presentation Transcript


  1. SDO Data Access and Distribution in Europe and the WisSDOm Data Centre in ROB, Brussels
     Authors: David Boyes, Benjamin Mampaey, Cis Verbeeck, Veronique Delouille, Jean-François Hochedez (STCE + ROB)
     Cospar10 Bremen

  2. What will be covered
     • Where is the data, and the access architecture for the users
     • Some basic terms
     • User access methods
         • modules
         • basic web access
         • virtual observatories
         • simplified web access
         • pseudo files and other developments
     • Interesting issues
         • Retention
         • Saved searches
         • Evolving calibration
     • Neat stuff to come
         • Cutouts
         • Helioviewer
         • Grid integration

  3. Where is the data for the users?
     • Data is available from one or more data centres, all of which are networked
     • Some users are "close", some are "far": distance matters
     • All data is available somewhere
     • Users can get data (an "export")
         • from the nearest centre directly
         • via the nearest centre from a remote centre
         • directly from another centre
     • Most of this is automatic
         • you will notice differences, e.g. in delays
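
The three export routes above can be pictured as a simple source-selection rule. This is a hypothetical sketch, not the netDRMS logic: the `Site` type, the use of round-trip time as "distance", and the preference order (local copy first, then the nearest remote centre holding the data online) are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Site:
    """A hypothetical data centre and the series it currently holds online."""
    name: str
    rtt_ms: float  # assumed round-trip time, used as the measure of "distance"
    online_series: set[str] = field(default_factory=set)


def choose_route(series: str, local: Site, others: list[Site],
                 relay: bool = True) -> tuple[str, str]:
    """Pick where an export of `series` comes from.

    Assumed preference order: the user's nearest centre if it holds the data
    online; otherwise the nearest remote centre that does, either relayed via
    the nearest centre (relay=True) or fetched from it directly.
    """
    if series in local.online_series:
        return ("direct", local.name)
    holders = [s for s in others if series in s.online_series]
    if not holders:
        raise LookupError(f"no site holds {series} online")
    nearest = min(holders, key=lambda s: s.rtt_ms)
    return ("via-local" if relay else "direct-remote", nearest.name)
```

In the real system this choice is largely automatic; the sketch only makes visible why delays depend on where the data actually sits.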

  4. How the data is accessed (a bit technical)
     • The system is netDRMS
         • created by the JSOC at Stanford
         • files are generated on request, from content
         • the system holds data files + metadata: SUMS (Storage Unit Management System) + DRMS (Data Record Management System)
     • The mediator is an "export" module
         • it makes your very own file: FITS, tar of FITS, etc.
         • SQL etc. is hidden from the user

  5. Access summary
     • No files exist until you ask for them
     • Data is referenced by content and provided as one or more files with whatever names you want
     • The exported files are built from stored elements, so, e.g., producing Rice-compressed FITS is quite direct, as AIA data is stored internally in this format
     • You can get anything, but...
         • you may as well ask for all the metadata
         • the files can be large: best not to ask for hundreds

  6. Some basic terms
     • series
         • a basic collection of data items with shared properties
         • by convention named <project>.<data>
         • all records in a series share a metadata format (i.e. keywords)
     • keywords
         • FITS-style keywords plus additional metadata-only keywords
         • correspond to columns in the metadata (DRMS) database
     • online means
         • available from a disk at the site
         • so offline means: not yet arrived/available, or deleted but can be fetched again
     • data format
         • whatever is stored is native (FITS, JPEG 2000); conversion is post-processing
         • characterised by resolution and cadence (e.g. 4K x 4K at 10 s, 1K x 1K at 90 s)
         • naturally you can't do better than what is stored, but you can reduce it by "cutouts" in time or space
     • data records
         • can be several items as a group (e.g. image + bad pixel map + alternative format)
         • the data is held in SUMS, referenced by the metadata tables (DRMS), usually one to one
         • each record is self-contained; for example, cadence is not part of the data
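
To make these terms concrete, here is a minimal data model. It is only a sketch: the class and field names are hypothetical and simply mirror the definitions above, i.e. a series named `<project>.<data>` with one shared keyword set, and records holding keyword values plus one or more stored items. `T_REC` and `WAVELNTH` are used as example keyword names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Series:
    """A collection of records sharing one metadata format (keyword set)."""
    name: str  # by convention "<project>.<data>", e.g. "aia_test.lev1"
    keywords: list[str]  # columns in the DRMS metadata tables

    @property
    def project(self) -> str:
        return self.name.split(".", 1)[0]

    @property
    def data(self) -> str:
        return self.name.split(".", 1)[1]


@dataclass
class Record:
    """One self-contained record: keyword values plus grouped stored items."""
    series: Series
    values: dict[str, object]
    items: list[str] = field(default_factory=list)  # e.g. image, bad pixel map
    online: bool = True  # True if the stored items are on disk at this site

    def __post_init__(self) -> None:
        # Every record may only use keywords defined by its series.
        unknown = set(self.values) - set(self.series.keywords)
        if unknown:
            raise ValueError(f"keywords not in {self.series.name}: {sorted(unknown)}")
```

Keeping `online` per record mirrors the online/offline distinction: the metadata row always exists, while the stored items may not be on local disk.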

  7. Example series
     • aia_test.lev1: AIA images, 4K x 4K, full disk, full cadence
     • aia_test.synoptic2: AIA images reduced to 1K x 1K, full disk, 90 s cadence
     • hmi_test.M_45s: magnetograms, 45 s cadence
     • hmi_test.v_45s: dopplergrams, 45 s cadence
     • JPEG 2000 series to come, for browsing and forecasting

  8. User access methods
     • Direct via "modules"
         • on site at the data centre
     • Query based
         • a precursor to full data access
         • checks only part of the data (the metadata) without having to retrieve the very large part
     • Indirect via the network
         • web/HTTP based
         • delivers data somewhere, to fetch immediately or later
     • Direct via a wrapper
         • on site, e.g. IDL (Matlab on the way)
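
The query-based step is what keeps large requests manageable: count the matching records from the metadata alone, then estimate the export size before fetching anything. A minimal sketch, assuming a fixed size per record. The 10,552,320-byte default is simply the single-record export shown in the module example (real sizes vary, e.g. with compression and wavelength), and the 50 GB limit is an arbitrary placeholder.

```python
BYTES_PER_RECORD = 10_552_320  # one exported AIA lev1 record (example value)


def estimate_export_bytes(n_records: int,
                          bytes_per_record: int = BYTES_PER_RECORD) -> int:
    """Estimate the export size from a metadata-only record count."""
    return n_records * bytes_per_record


def should_export(n_records: int, limit_bytes: int = 50 * 10**9) -> bool:
    """Hypothetical policy: proceed only if the estimate fits under a limit."""
    return estimate_export_bytes(n_records) <= limit_bytes
```

For 160 records this gives 1,688,371,200 bytes, about 1.7 GB, which is why asking for hundreds of files is already a sizeable request.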

  9. A practical pause: limitations
     • Sheer size of a request: even a 2 TB USB stick holds only about 2 days of data
     • Network speed: at about 200 Mb/s it takes a day to transfer a day's worth of data
     • Search/database speed: millions of records
     • Raw data access/retrieval speed: the basic image data takes time to get from disk
     • Retention time: you can get anything, but you will probably have to wait for, e.g., a full day of data from 2 years ago that nobody else has ever requested
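
The storage and network limits are back-of-the-envelope arithmetic. The helpers below assume decimal units (1 TB = 10^12 bytes, 1 Mb/s = 10^6 bit/s) and a daily volume of about 1 TB, which is what "2 TB is only 2 days" implies; the `efficiency` factor (the fraction of the nominal link rate actually sustained) is an assumption, not a measured value.

```python
SECONDS_PER_DAY = 86_400


def days_of_storage(capacity_tb: float, tb_per_day: float) -> float:
    """How many days of data fit into a given capacity."""
    return capacity_tb / tb_per_day


def transfer_hours(volume_tb: float, link_mbps: float,
                   efficiency: float = 1.0) -> float:
    """Hours to move `volume_tb` terabytes over a link of `link_mbps` Mb/s.

    `efficiency` is the assumed fraction of the nominal rate actually sustained.
    """
    bits = volume_tb * 1e12 * 8
    return bits / (link_mbps * 1e6 * efficiency) / 3600


def rate_to_keep_up_mbps(tb_per_day: float) -> float:
    """Sustained rate (Mb/s) needed to move one day's data in one day."""
    return tb_per_day * 1e12 * 8 / SECONDS_PER_DAY / 1e6
```

Under these assumptions, 1 TB over 200 Mb/s takes about 11.1 hours at the full nominal rate, and about 22.2 hours if only half the nominal rate is sustained, i.e. roughly the "day for a day's worth" quoted above. Just keeping pace with 1 TB/day needs about 92.6 Mb/s sustained.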

  10. Access by: modules, the basic bricks
     • At the data centres, for example:
         • show_series
         • show_info
         • jsoc_export_as_fits

         [jdb@db1 ~]$ show_series
         aia_test.lev1
         aia_test.synoptic2
         drms.sites
         hmi.doptest
         hmi_test.m_45s
         hmi_test.s_720s
         lm_jps.lev1_test4k10s
         [jdb@db1 ~]$ show_info -s ds=aia_test.synoptic2
         First Record: aia_test.synoptic2[2010-05-21T15:00:00.57Z][171] is first of 6 records matching first keyword, Recnum = 1
         Last Record: aia_test.synoptic2[2010-07-14T11:58:41.07Z][335] is first of 2 records matching first keyword, Recnum = 445376
         Last Recnum: 445377
         [jdb@db1 ~]$ jsoc_export_as_fits reqid=REQ_FTP expversion=0.5 rsquery=aia_test.lev1[:#209866] path=tmp method=url protocol=FITS
         '10552320' bytes exported.
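
On a data-centre host the modules can also be scripted. This minimal Python sketch reuses exactly the arguments from the session above. It assumes that `jsoc_export_as_fits` is on the `PATH` and that it prints its byte count to standard output in the format shown; both are assumptions, and other versions may behave differently. Passing the arguments as a list rather than as a shell string keeps the shell from treating the brackets in `aia_test.lev1[:#209866]` as a glob pattern.

```python
import re
import subprocess

EXPORT_MODULE = "jsoc_export_as_fits"


def export_args(rsquery: str, path: str = "tmp", reqid: str = "REQ_FTP") -> list[str]:
    """Build the argument vector used in the example session."""
    return [
        EXPORT_MODULE,
        f"reqid={reqid}",
        "expversion=0.5",
        f"rsquery={rsquery}",
        f"path={path}",
        "method=url",
        "protocol=FITS",
    ]


def parse_bytes_exported(output: str) -> int:
    """Extract the byte count from a line like "'10552320' bytes exported."."""
    match = re.search(r"'(\d+)' bytes exported", output)
    if match is None:
        raise ValueError("no byte count found in module output")
    return int(match.group(1))


def run_export(rsquery: str, path: str = "tmp") -> int:
    """Run the module (on a data-centre host) and return the bytes exported."""
    result = subprocess.run(
        export_args(rsquery, path), capture_output=True, text=True, check=True
    )
    return parse_bytes_exported(result.stdout)
```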

  11. Access by: basic web access
     • System developed by JSOC: lookdata.html
         • online via the JSOC web site, but heavily loaded
         • being tested at ROB
     • Provides easy access to an overview of all the available data
     • Formulating a selection query does require knowledge of the query syntax
     • Provides a wide variety of data packaging
         • normal user FITS, or internal format (FITS with no keywords)
         • via the web for immediate or later access, as one or more individual files or as tar
         • ROB is working on fewer packaging options

  12. Access by: basic web access

  13. Access by: Virtual Observatories
     • VSO
         • development of the existing VSO
         • prototype for SDO running, definitive version in preparation
         • http://sdac.virtualsolar.org/cgi/search
     • Soteria
         • demo provider made for ROB/USET, SDO provider being coded now
         • http://soteria-space.eu/
     • Uniform search paradigm
         • the infrastructure hides the complex syntax needed for efficient searches, e.g. SQL in its various flavours

  14. Access by: Soteria Virtual Observatory
     • One part of an EU project
     • Based on current web access technology
     • The example shows the ROB USET telescope as a data provider; each SDO site will be able to act as a provider
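
The virtual-observatory approach comes down to one search interface that many providers implement. Each provider translates the uniform query into its own backend syntax (SQL or otherwise). The sketch below is hypothetical: `Query`, `Provider`, `search` and `federated_search` are illustrative names, not the VSO or Soteria APIs, and the in-memory provider stands in for a real data centre.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Query:
    """A uniform, backend-neutral search request (illustrative fields)."""
    instrument: str
    start: str  # ISO 8601 UTC; same-format strings compare chronologically
    end: str


class Provider(Protocol):
    name: str

    def search(self, query: Query) -> list[str]:
        """Return identifiers of matching records."""
        ...


@dataclass
class InMemoryProvider:
    """Stand-in for one data centre; a real one would translate to SQL etc."""
    name: str
    catalogue: dict[str, list[str]]  # instrument -> ISO 8601 observation times

    def search(self, query: Query) -> list[str]:
        times = self.catalogue.get(query.instrument, [])
        return [f"{self.name}:{query.instrument}:{t}" for t in times
                if query.start <= t < query.end]


def federated_search(providers: list[Provider], query: Query) -> list[str]:
    """Send the same query to every provider and merge the results."""
    results: list[str] = []
    for provider in providers:
        results.extend(provider.search(query))
    return results
```

Adding an SDO site then means writing one more provider; users keep the same query form.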

  15. Access by: simplified web access
     • Work in progress
     • Limited to direct requests for tar files or individual FITS files; also a front end for PFS
     • Simplified enquiries, such as:
         • aia.lev1 + time + period + cadence + wavelengths
     • Preparing this is actually more complex than basic access; for example, it requires deciding which keys are useful for which series
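
A simplified enquiry such as aia.lev1 + time + period + cadence + wavelengths maps naturally onto a DRMS record-set specification. The sketch assumes the bracketed form `series[start/period@cadence][wavelengths]` with a comma-separated wavelength list; check the exact syntax against the data centre's documentation. The prime keys (here time, then wavelength) differ from series to series, and that is exactly the per-series decision mentioned above.

```python
from __future__ import annotations


def record_set(series: str, start: str, period: str,
               cadence: str | None = None,
               wavelengths: list[int] | None = None) -> str:
    """Build a record-set string from a simplified enquiry (assumed syntax)."""
    time_part = f"{start}/{period}"
    if cadence:
        time_part += f"@{cadence}"  # sample every `cadence` within the period
    spec = f"{series}[{time_part}]"
    if wavelengths:
        spec += "[" + ",".join(str(w) for w in wavelengths) + "]"
    return spec
```

For example, `record_set("aia.lev1", "2010-06-17T00:00:00Z", "1h", "15m", [171, 193])` yields `aia.lev1[2010-06-17T00:00:00Z/1h@15m][171,193]`.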

  16. Access by: pseudo files (PFS)
     • Systematically named files in a directory tree, with no real files until you access them
     • Typically based on a query covering a much wider range than you really need (or could use)
     • Real files are kept in a cache, so further access is very cheap
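
The core PFS idea is lazy materialisation: names exist up front, but content is fetched only on first access and then served from a cache. This sketch shows only that idea in plain Python. It does not describe how the ROB prototype is built. `fetch` is a hypothetical stand-in for whatever export produces one file, and the cache is an in-memory dict rather than files on disk.

```python
from __future__ import annotations

from typing import Callable


class PseudoFiles:
    """Names are listed up front; contents are fetched on first read, then cached.

    `fetch` is a hypothetical callable that produces one file's bytes, e.g. by
    running an export for the matching record. The cache is an in-memory dict
    here; a real system would keep the files on disk.
    """

    def __init__(self, names: list[str], fetch: Callable[[str], bytes]) -> None:
        self.names = set(names)
        self.fetch = fetch
        self.cache: dict[str, bytes] = {}

    def listdir(self) -> list[str]:
        # Listing costs nothing: no data is retrieved.
        return sorted(self.names)

    def read(self, name: str) -> bytes:
        if name not in self.names:
            raise FileNotFoundError(name)
        if name not in self.cache:
            self.cache[name] = self.fetch(name)  # first access: real export
        return self.cache[name]  # later accesses: served from the cache
```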

  17. Access by: pseudo files (PFS)
     • Example with 160 file names, all AIA wavelengths, 15 min cadence
     • In prototype at ROB; source downloadable

         mnt
         `-- aia_test.lev1
             `-- 2010
                 `-- 06
                     `-- 17
                         |-- H0000
                         |   |-- AIA20100617_000000570000_0171.fits
                         |   |-- AIA20100617_000003570000_0304.fits
                         |   |-- AIA20100617_000009580000_94.fits
                         |   |-- AIA20100617_000018570000_1600.fits
                         |   |-- AIA20100617_000050070000_211.fits
                         |   |-- AIA20100617_000053050000_335.fits
                         |   |-- AIA20100617_000056100000_193.fits
                         ......
                         |   |-- AIA20100617_004505070000_335.fits
                         |   |-- AIA20100617_004506570000_1600.fits
                         |   |-- AIA20100617_004508070000_193.fits
                         |   |-- AIA20100617_004509580000_94.fits
                         |   `-- AIA20100617_004511070000_131.fits
                         |-- H0100
                         |   |-- AIA20100617_010000580000_0171.fits
                         |   |-- AIA20100617_010002080000_211.f
                         .......
                             |-- AIA20100617_043008060000_193.fits
                             |-- AIA20100617_043009550000_94.fits
                             |-- AIA20100617_043011090000_131.fits
                             |-- AIA20100617_043018580000_1600.fits
                             |-- AIA20100617_044500560000_0171.fits
                             |-- AIA20100617_044502050000_211.fits
                             |-- AIA20100617_044503570000_0304.fits
                             |-- AIA20100617_044505070000_335.fits
                             |-- AIA20100617_044506570000_1600.fits
                             |-- AIA20100617_044508070000_193.fits
                             |-- AIA20100617_044509580000_94.fits
                             `-- AIA20100617_044511070000_131.fits

         9 directories, 160 files
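
The listing follows a regular pattern, so a client can recover the observation time and wavelength from each name. This sketch makes two assumptions. First, the 12 digits after the date are HHMMSS followed by a six-digit fraction of a second, which is consistent with timestamps such as 15:00:00.57 in the module example. Second, wavelengths may or may not be zero-padded, as in the listing. The file count also checks out if the hour directories run from H0000 to H0400 (4 date directories + 5 hour directories = 9). That gives 5 hours x 4 slots per hour at 15 min cadence x 8 wavelengths = 160 files.

```python
from __future__ import annotations

import re
from datetime import datetime

# Assumed layout: AIA<YYYYMMDD>_<HHMMSS><6-digit fraction>_<wavelength>.fits
NAME = re.compile(r"^AIA(\d{8})_(\d{6})(\d{6})_(\d+)\.fits$")


def parse_name(name: str) -> tuple[datetime, int]:
    """Return (observation time, wavelength in angstrom) for a PFS file name."""
    m = NAME.match(name)
    if m is None:
        raise ValueError(f"unexpected file name: {name}")
    date, hms, frac, wave = m.groups()
    t = datetime.strptime(date + hms, "%Y%m%d%H%M%S").replace(microsecond=int(frac))
    return t, int(wave)  # int() copes with both "0171" and "94"


def hour_dir(t: datetime) -> str:
    """Hour bucket directory name, e.g. H0000 or H0100."""
    return f"H{t.hour:02d}00"


def expected_files(hours: int, cadence_min: int, n_wavelengths: int) -> int:
    """File count for a full listing: hours x slots per hour x wavelengths."""
    return hours * (60 // cadence_min) * n_wavelengths
```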

  18. Access by: useful methods in development
     • Order, with notification via e-mail, for manual fetching
     • Order and automatic delivery (e.g. sftp)

  19. Interesting issue: Retention
     • All netDRMS sites have full information for selected series: their "subscribed" series
     • But is the data online?
         • sites keep the latest data, but must selectively discard
         • enquiry modules can tell whether data is online, but what are the implications (delay...) if it is not?
     • You can request it, but it can take some time to obtain
         • for now this is quick, but after a year or so, a record nobody has looked at will have to come from tape
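
One way to picture "keep the latest, selectively discard" is a policy that always keeps recent records and, when disk space runs short, takes the least recently used older ones offline. Those records stay listed in the metadata and can be fetched again with a delay. The policy, the field names and the 30-day threshold below are hypothetical, not the netDRMS/SUMS algorithm.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Held:
    """A record's storage state at one site (illustrative fields)."""
    recnum: int
    size: int  # bytes on disk
    age_days: float  # time since the data arrived
    last_used_days: float  # time since anyone last accessed it
    online: bool = True


def apply_retention(records: list[Held], capacity: int,
                    keep_days: float = 30.0) -> list[int]:
    """Take records offline until the online total fits `capacity`.

    Records younger than `keep_days` are never discarded; older ones go
    least-recently-used first. Returns the recnums taken offline.
    """
    used = sum(r.size for r in records if r.online)
    candidates = sorted(
        (r for r in records if r.online and r.age_days >= keep_days),
        key=lambda r: r.last_used_days,
        reverse=True,  # longest unused first
    )
    dropped: list[int] = []
    for r in candidates:
        if used <= capacity:
            break
        r.online = False  # metadata remains; data must be re-fetched (e.g. tape)
        used -= r.size
        dropped.append(r.recnum)
    return dropped
```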

  20. Interesting issue: Saved searches
     • How do you describe a selection of data?
     • The result can be saved as a record list for a reasonable number of records, but this does not save the query
     • Save both the query and the result?
         • for both your own use and publication
     • A saved query might give different results when re-run (e.g. if it selects online data only)
     • This relates to the issue of calibration
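
Saving both the query and its result makes any difference visible when the query is re-run later. A minimal sketch: the `SavedSearch` fields, including a free-text calibration note, are hypothetical. In practice the result would be record numbers or full record-set specifications.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class SavedSearch:
    """A query saved together with the result it gave at the time."""
    query: str  # e.g. a record-set specification
    recnums: list[int]  # the result when the search was saved
    saved_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())
    calibration_note: str = ""  # hypothetical: which calibration was in use

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> SavedSearch:
        return cls(**json.loads(text))

    def compare(self, rerun: list[int]) -> dict[str, list[int]]:
        """Show what a re-run of the query adds or no longer returns."""
        old, new = set(self.recnums), set(rerun)
        return {"added": sorted(new - old), "missing": sorted(old - new)}
```

Storing the result alongside the query supports publication: a reader can re-run the query and see exactly which records differ.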

  21. Interesting issue: Evolving calibration, and which data did I use?
     • More accurate calibration will become available as time goes on and more calibration points are acquired
     • So the newest and best data can change
         • for most data this is done by applying a calibration series, e.g. via SolarSoft
         • but there can also be metadata changes
     • The raw data is unlikely to change

  22. Neat stuff to come: cutouts
     • Well on the way, again being developed by JSOC and LMSAL, for those who don't need the full 4K x 4K images
     • Much reduced data storage requirements
     • Closely related to event tracking and the HEK (Heliophysics Event Knowledgebase)
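
The saving from cutouts, and from reduced products such as the synoptic series, is simple ratio arithmetic. This sketch counts pixels per unit time and ignores compression, which changes the actual byte counts. Using the resolutions and cadences quoted in the basic-terms slide, going from 4096 x 4096 at 10 s to 1024 x 1024 at 90 s is a 16 x 9 = 144-fold reduction. A 512 x 512 spatial cutout at full cadence is 64 times smaller.

```python
from __future__ import annotations


def reduction_factor(full_shape: tuple[int, int], full_cadence_s: float,
                     new_shape: tuple[int, int], new_cadence_s: float) -> float:
    """How many times less pixel data per unit time a reduced product carries."""
    full_px = full_shape[0] * full_shape[1]
    new_px = new_shape[0] * new_shape[1]
    return (full_px / new_px) * (new_cadence_s / full_cadence_s)


def cutout(image: list[list[float]], x0: int, y0: int,
           width: int, height: int) -> list[list[float]]:
    """Spatial cutout: rows y0..y0+height-1, columns x0..x0+width-1."""
    return [row[x0:x0 + width] for row in image[y0:y0 + height]]
```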

  23. Neat stuff to come: Helioviewer
     • www.helioviewer.org
     • Existing project now being directed towards use with SDO data
         • JPEG 2000 based viewer with event marker overlay
         • integration with the JPEG 2000 series
         • rapid browsing with links to the full data
     • ROB is a CoI in the requested next stage

  24. Neat stuff to come: grid integration
     • The data element size (tens of MB) is a natural fit for a high-performance grid
     • The data is already geographically distributed, with a variety of access routes
     • Distributed variety of resources: large clusters, pipelines, GPUs
     • Sites are on high-performance research networks

  25. Thanks to
     • JSOC at Stanford
     • LMSAL
     • Belnet and Geant2 for networking
     • The partner data centres, for their enthusiastic cooperation
     • Our sister institutes at the ROB site, for hosting the data centre and infrastructure

  26. Web addresses
     • The main source: JSOC at jsoc.stanford.edu
     • HEK: www.lmsal.com/hek
     • ROB: wissdom.oma.be
     • SAO: www.cfa.harvard.edu/sao
     • GDS: www.mps.mpg.de/projects/seismo/GDC-SDO
     • UCLan: www.star.uclan.ac.uk
     • IAS: idc-medoc.ias.u-psud.fr
