
EISCAT-3D Data System Specifications and Possible Solutions



Presentation Transcript


  1. EISCAT-3D Data System Specifications and Possible Solutions
  Ian McCrea, Rutherford Appleton Laboratory, Chilton, Oxfordshire, UNITED KINGDOM
  25 May 2009

  2. Principles
  • EISCAT-3D is very different to EISCAT
    • Much more low-level data
    • Continuous operation, unattended remotes
    • Interferometry as well as standard IS
    • Lots of supporting instruments
  • Store data at the lowest practical level
    • Analysis can be done direct from samples
    • Any pre-processing reduces flexibility
    • A wide range of applications and techniques
  • Data volumes are very large
    • Can’t store lowest-level data forever
    • Keep them until they are “optimally processed”
    • Keep a set of correlated data forever (as now)

  3. Types of Data
  • Incoherent Scatter
    • Continuous, complex, amplitude-domain data
    • Two polarisation streams per beam
    • 80 MHz sampling at 16 bits
    • Bandpassing is possible, but the limit is set by the modulation bandwidth
  • Interferometry
    • Continuous on a limited number of baselines
    • Don’t record if nothing is happening, but need the ability to “backspace” and “run on”
    • Save data until the optimum brightness function is made and transferred to the archive
  • Supporting Instruments
    • EISCAT-3D will attract many supporting instruments, using the same data system
    • Some data sets are big (e.g. imagers) but not always interesting
    • Suitable for a mixture of short-term buffer and permanent archive

  4. Types of Archive
  • Ring Buffer
    • High volume (~100 TB), short duration (hours to days)
    • Data accumulate constantly – oldest data are over-written
    • Records IS data, and interferometry when events are detected
    • Needs to record latent archive data in the event of a network outage
  • Interferometry System
    • Small storage area (~100 GB), holds only the past few minutes of data
    • Data accumulate constantly and are tested against a threshold
    • If an event is detected, divert to the ring buffer (for backspacing), otherwise delete
  • Permanent Archive
    • Large-capacity (~1 PB) permanent archive
    • Mid- and high-level data @ 200 TB/year
    • Tiered storage, connected to a multi-user computing facility
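A minimal in-memory sketch of the ring-buffer behaviour just described (oldest data over-written, with the ability to reach back to recent blocks when an event is detected). This is purely illustrative: the real buffer would be a disk array, and the class name and block granularity here are invented for the example.

from collections import deque

# Toy illustration of the ring-buffer idea: a fixed-capacity store where
# the oldest blocks are dropped once capacity is reached. The real system
# would of course work on disk arrays, not in memory.

class RingBuffer:
    def __init__(self, capacity_blocks):
        self.blocks = deque(maxlen=capacity_blocks)  # deque drops the oldest item when full

    def write(self, block):
        self.blocks.append(block)

    def read_span(self, n_latest):
        """Return the most recent n_latest blocks, e.g. to 'backspace' to
        the start of an event that has just been detected."""
        return list(self.blocks)[-n_latest:]

buf = RingBuffer(capacity_blocks=5)
for t in range(8):                      # write 8 blocks into a 5-block buffer
    buf.write(f"block-{t}")
print(buf.read_span(3))                 # ['block-5', 'block-6', 'block-7']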

  5. Data System Overview

  6. Some Data Rates
  • Lowest-level data
    • 80 MHz, 16 bits: 2.56 Gb/s per element, ~4×10^13 b/s in total (!)
    • Impossible to store
    • Combine by group (49 antennas), then into <10 beams
    • Each beam is 25 TB/day (still the same order as the LHC)
  • Central site
    • Only one signal beam (because of the transmitter)
    • Calibration beam(s) will be a small data volume
    • Approx. 1 TB/hour (320 MB/s)
  • Remote sites
    • 5-10 beams, but intersection limited
    • Challenge is of the same order as the central site
    • Need identical ring buffers at all sites
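As a rough check of the rates above, the following Python sketch reproduces the per-element, whole-array and per-beam figures. The array element count (~16,000) is an assumption used only to recover the quoted ~4×10^13 b/s total; it is not stated on the slide.

# Back-of-envelope check of the data rates quoted on the slide.
SAMPLE_RATE_HZ = 80e6          # 80 MHz sampling
BITS_PER_SAMPLE = 16           # 16-bit samples
POLARISATIONS = 2              # two polarisation streams per element/beam

bits_per_s_per_element = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * POLARISATIONS
print(f"per element: {bits_per_s_per_element / 1e9:.2f} Gb/s")           # ~2.56 Gb/s

N_ELEMENTS = 16_000            # assumed array size (not stated on the slide)
print(f"whole array: {bits_per_s_per_element * N_ELEMENTS:.1e} b/s")     # ~4e13 b/s

# After beamforming, one beam carries the same 2.56 Gb/s stream.
bytes_per_s_per_beam = bits_per_s_per_element / 8
print(f"per beam: {bytes_per_s_per_beam / 1e6:.0f} MB/s")                # ~320 MB/s
print(f"per beam: {bytes_per_s_per_beam * 3600 / 1e12:.2f} TB/hour")     # ~1.15 TB/hour
print(f"per beam: {bytes_per_s_per_beam * 86400 / 1e12:.1f} TB/day")     # ~28 TB/day (~25 TiB/day)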

  7. Band-passing and Bandwidths
  • 80 MHz sampling oversamples a 30 MHz band
    • But not all of this contains data: N ion lines + 2N plasma lines
    • Ion lines ~50 kHz wide, plasma lines ~100 kHz
    • So it seems that we can bandpass
  • Bandpassing is limited by the modulation bandwidth
    • The received spectrum is the convolution of the backscatter spectrum and the pulse spectrum
    • Shorter pulses/bauds have higher bandwidth
    • Some codes at EISCAT have a 500 kHz modulation band
  • Bandpassing therefore depends on the coding
    • Worthwhile to bandpass for standard codes
    • But we need an algorithm to set the pass bands
    • Design for the worst case (no bandpass)
    • Don’t forget interferometry
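A hedged sketch of how a passband might be chosen per code: treat the required width as roughly the line width plus the modulation bandwidth, with the modulation bandwidth estimated as 1/(baud length). This is only an illustration of the idea, not the pass-band algorithm the slide calls for; the 1/baud estimate and the 2 µs baud are assumptions.

# Rough estimate of the passband needed around each ion/plasma line:
# the received spectrum is approximately the backscatter spectrum convolved
# with the pulse/code spectrum, so the required width is of the order of
# the sum of the two widths.

def passband_width_hz(line_width_hz, baud_length_s):
    modulation_bw_hz = 1.0 / baud_length_s     # crude spectral width of one baud
    return line_width_hz + modulation_bw_hz

# Ion line (~50 kHz wide) observed with an assumed 2 us baud (500 kHz mod. band):
print(passband_width_hz(50e3, 2e-6))    # ~550 kHz per ion line
# Plasma line (~100 kHz) with the same code:
print(passband_width_hz(100e3, 2e-6))   # ~600 kHz per plasma line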

  8. Higher-Level Data Rates
  • Interferometry Data
    • 19 modules tested (202 MB/s, 17 TB/day)
    • But maybe only 5% of samples above threshold
    • Five minutes of data = 60 GB (lead-in)
  • Permanent Archive
    • Continue to store lag profiles
    • Ability to store limited raw data
    • Data from supporting instruments
    • Current archive growth is a few TB/year
    • We want better time/range resolution
    • Allow the archive to grow at > 1 TB/year
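A quick consistency check of the interferometry figures above; the 202 MB/s rate and the 5% threshold fraction are taken from the slide.

# Quick consistency check of the interferometry numbers on the slide.
RATE_MB_S = 202                     # quoted rate for the 19 tested modules

per_day_tb = RATE_MB_S * 1e6 * 86400 / 1e12
print(f"{per_day_tb:.1f} TB/day")                    # ~17 TB/day, as quoted

lead_in_gb = RATE_MB_S * 1e6 * 300 / 1e9
print(f"{lead_in_gb:.0f} GB per 5-minute lead-in")   # ~60 GB, as quoted

# If only ~5% of samples exceed the event threshold, the volume that
# actually has to be kept long-term is much smaller:
print(f"{per_day_tb * 0.05:.2f} TB/day above threshold")   # <1 TB/day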

  9. Supporting Instruments
  • Examined data rates from a variety of instruments
    • Other radars (coherent scatter, meteor)
    • Lidars and advanced sounders
    • Passive optical (high-resolution imagers)
    • Radio instruments (riometers, VLF, GPS)
    • Magnetometers
  • Advanced instruments at the central site only
    • Remote sites are unattended, therefore:
    • No instruments needing manual intervention
    • No huge data sets allowed
  • Data volumes can still be large
    • High-resolution cameras can produce 100s of GB/day
    • But not all of the data are interesting
    • Design for 150 GB/day at the central site, 30 GB/day at remotes

  10. Approach to Vendors
  • Ring Buffer and Central Archive are very different
    • They need completely different technical solutions
    • The interferometer is a subset of the ring buffer: the same problem, but with smaller data sets
  • Based on the data rates, we produced specifications
    • Two questions for manufacturers:
    • Are the requirements achievable now?
    • What kinds of technologies can be used to achieve them?
  • Manufacturer approaches began at Storage Expo 2007
    • ~20 companies contacted
    • Significant discussions with ~10

  11. Specifications: Ring Buffer
  • Minimum of 56 TB short-term storage
    • IS and interferometry both produce ~1 TB/hour
    • ~2 days of full-bandwidth ISR data
    • ~2 weeks of bandpassed data
    • Several months of high-level data (weather latency)
  • 4 input, 8 output channels @ 160 MB/s
    • Somebody needs to read these data!
  • 38 input, 38 output channels @ 6 MB/s
    • Less demanding things, including monitoring
  • Identical systems at central site and remotes
  • Power draw < 300 kW
  • System management, monitoring tools, warranties
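A small calculation showing how the quoted hold times follow from the 56 TB capacity; the "bandpassed" rate printed at the end is derived from the two-week figure rather than quoted directly on the slide.

# How long the specified 56 TB ring buffer holds data at the quoted rates.
CAPACITY_TB = 56
FULL_RATE_TB_PER_H = 1.0            # full-bandwidth IS data, ~1 TB/hour

full_hold_days = CAPACITY_TB / FULL_RATE_TB_PER_H / 24
print(f"full bandwidth: {full_hold_days:.1f} days")      # ~2.3 days, matching "~2 days"

# The "~2 weeks of bandpassed data" figure implies a bandpassed rate of roughly:
implied_bandpassed_tb_per_h = CAPACITY_TB / (14 * 24)
print(f"implied bandpassed rate: {implied_bandpassed_tb_per_h * 1000:.0f} GB/hour")  # ~170 GB/hour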

  12. Specifications: Central Archive
  • 1 PB of usable initial storage
    • Initially a five-year archive
    • 400 TB on-line
    • 600 TB “near on-line”, e.g. a tape library
    • Extensible at 200 TB/year
  • Input 300 MB/s over all channels
    • > 20 TB/day, which allows fast filling
  • Output 1 TB/s over all channels
    • 100 output channels: assume we have lots of simultaneous users
    • 2 high-volume output channels
  • Power draw < 300 kW
  • System management, monitoring tools, warranties
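A sizing sketch relating the archive figures above: the 1 PB initial capacity and the 200 TB/year growth rate together give the five-year archive quoted on the slide.

# Sizing sketch for the central archive.
INITIAL_USABLE_TB = 1000            # 400 TB on-line + 600 TB near-on-line
GROWTH_TB_PER_YEAR = 200

years_until_full = INITIAL_USABLE_TB / GROWTH_TB_PER_YEAR
print(f"initial capacity lasts {years_until_full:.0f} years")    # 5 years

# Capacity needed if the archive keeps growing at the same 200 TB/year beyond that:
for year in (5, 10, 15):
    print(f"after {year} years: {GROWTH_TB_PER_YEAR * year / 1000:.1f} PB of data")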

  13. Solutions: Ring Buffer
  • Two solutions for multiple input channels
    • Each channel separate, lots of channels
    • Multiplex into a few high-rate channels
    • Second solution probably better, e.g. 20 channels multiplexed into 8 links of 6 GB/s
  • Multiple drives, multiple enclosures
    • 10 drive enclosures
    • Each enclosure 50% filled (300 x 300 GB SAS drives)
    • Resulting capacity 90 TB (72 TB directly usable)
  • Expansion and degradation
    • Half-filled cabinets allow expansion
    • RAID 6 allows graceful degradation
  • Power draw and temperature range within spec.
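The capacity arithmetic behind the example configuration above. The RAID 6 group size of 10 drives (8 data + 2 parity) is an assumption chosen because it reproduces the quoted 72 TB usable figure; the slide itself does not state the group size.

# Capacity arithmetic for the example ring-buffer hardware.
N_DRIVES = 300
DRIVE_GB = 300
RAID_GROUP = 10                     # assumed drives per RAID 6 group
PARITY_PER_GROUP = 2                # RAID 6: two parity drives per group

raw_tb = N_DRIVES * DRIVE_GB / 1000
usable_tb = raw_tb * (RAID_GROUP - PARITY_PER_GROUP) / RAID_GROUP
print(f"raw: {raw_tb:.0f} TB, usable: {usable_tb:.0f} TB")   # raw 90 TB, usable 72 TB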

  14. Solutions: Central Archive
  • Central site only
    • Can be a staffed system, so easier maintenance
  • Mix of on-line and near-on-line storage
    • User access should look immediate, even for historic data
  • MAID – Massive Array of Idle Disks
    • Large disk arrays with a “sleep mode”
    • Disks spin down if unused for a given time (5-330 minutes)
  • Example system: 1.2 PB archive
    • 24 GB/s bandwidth over 20 channels, 20 TB/hour backup
    • RAID 6 graceful degradation, modular “hot-swap” components
    • A single system supports 1200 disk drives
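A toy illustration of the MAID spin-down idea mentioned above, assuming a simple idle-timeout rule. The class and timings are invented for the example; only the 5-330 minute timeout range comes from the slide.

# Spin a disk down after it has been idle for a configurable time, and
# spin it back up on the next access.

class MaidDisk:
    def __init__(self, idle_timeout_min=30):
        self.idle_timeout_s = idle_timeout_min * 60
        self.spinning = True
        self.last_access = 0.0

    def access(self, now_s):
        if not self.spinning:
            self.spinning = True            # spin up on demand (adds latency in reality)
        self.last_access = now_s

    def tick(self, now_s):
        if self.spinning and now_s - self.last_access > self.idle_timeout_s:
            self.spinning = False           # spin down after the idle timeout

disk = MaidDisk(idle_timeout_min=30)
disk.access(0)
disk.tick(25 * 60)
print(disk.spinning)   # True  - still within the 30-minute timeout
disk.tick(31 * 60)
print(disk.spinning)   # False - spun down after 30 idle minutes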

  15. Network Issues
  • The big issue is data transfer from the remotes
    • 1 beam is 320 MB/s, and remotes will have multiple beams
    • Supporting instruments add ~30% overhead
  • Need to recover from interruptions quickly
    • Otherwise we may never catch up
    • Interruptions might last days or weeks
  • Fast links are already practical
    • Protocols for 10 GB/s links exist already
    • We may need to provide some of the networking ourselves
  • A back-up option is needed if the network fails
    • Something to tell us if the site is alive (and how cold it is)
    • Options are mobile phone, satellite, or a microwave link
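A short illustration of why long interruptions are dangerous: the time needed to clear the backlog grows rapidly as link utilisation approaches 100%. The per-site rate and link capacity used below are illustrative assumptions, not design values.

# Catch-up time after a network outage at a remote site.
def catchup_hours(outage_h, rate_mb_s, link_mb_s):
    spare = link_mb_s - rate_mb_s
    if spare <= 0:
        return float("inf")                  # can never catch up
    backlog_mb = outage_h * 3600 * rate_mb_s
    return backlog_mb / spare / 3600

# e.g. a remote producing 400 MB/s on a link that can carry 500 MB/s:
for outage in (6, 24, 72):                   # hours of network outage
    print(f"{outage:3d} h outage -> {catchup_hours(outage, 400, 500):.0f} h to catch up")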

  16. Some Other Ideas
  • Project Blackbox
    • A containerised data centre that transports on a truck
    • 1.5 PB disk storage, 7 TB memory
    • 250 servers, 2000 cores, 8000 threads
    • When full, drive it back to HQ and put in another
    • Turns out to be a very high-bandwidth solution, provided you can integrate rates over time
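An illustration of "integrating rates over time": the effective bandwidth of a shipped container is its capacity divided by the fill-plus-transport cycle time. The fill and transport times below are assumptions for the example; only the 1.5 PB capacity comes from the slide.

# "Sneakernet" bandwidth of a containerised data centre.
CAPACITY_PB = 1.5
FILL_DAYS = 30                      # assumed time to fill the container at the radar
TRANSPORT_DAYS = 3                  # assumed driving time back to HQ

cycle_s = (FILL_DAYS + TRANSPORT_DAYS) * 86400
avg_gb_s = CAPACITY_PB * 1e15 / cycle_s / 1e9
print(f"average delivered bandwidth: {avg_gb_s:.2f} GB/s")   # ~0.5 GB/s sustained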

  17. Visualisation
  • What kinds of visualisation will we need?
    • This work has transferred from RAL to UiT
    • Carried out by Bjorn Gustavsson
  • Learn from other radars
    • AMISRs already have the same problem
    • Jicamarca has lots of imaging software
    • The software is open-source and adaptable
  • Allow users to bring their own routines
    • Develop an open-source library
    • Make full use of cuts, movies etc.
  • Don’t try to be too smart
    • The human brain can only interpret so much!

  18. Risk and Return
  • Some questions to consider:
    • Should we provide a data system like this ourselves?
    • Why not let a commercial data centre do this?
    • Should we explore other funding sources for this?
      • The EU e-infrastructures programme
    • What’s the cost/benefit between the data system and (say) an extra site?
    • What about metadata and services, almost forgotten so far?

  19. Summary and Conclusions
  • Based on an analysis of the expected performance and data rates of the new radar, we proposed a data system with three distinct elements:
    • Cyclic buffers (short-duration storage for low-level data)
    • Interferometry (with the ability to back-space to the start of an event)
    • Permanent Archive (the ultimate home of all summary data and the centre of user analysis)
    • The data system also handles supporting instruments
  • Storage, I/O and other specifications were put on all system elements and discussed with vendors:
    • Data volumes and rates are challenging, but can already be handled now
    • Appropriate systems can be had to provide the functionality we need
  • Implementation depends on many things (funding, politics, scientific priorities etc.)
    • These will be better defined during the next phase of the project
