Regional/Science Archives: MKI MWA EOR Archive
Melbourne Meeting, December 2011

Purpose of regional/science archives
• Distribute load of archiving/distributing data
• Provide more convenient/speedy local access
• Provide repository for software and processed data for science topic
• Provide focus for group interaction
• MIT for EoR; Melbourne for EoR; RRI for EoR; others?
• Archives on same science topic should coordinate

MIT Current Status
• Two computers
• Data from X5-X16 (except X7 and solar data)
Evaluating
• File system performance (xfs much better)
• Power usage and thermal loads

User Interface
• http://mwa.mit.edu/eor_archive
• Based on work by Dave Pallet at Curtin for selecting observations
• We are adding download features.
• We plan to add data quality measurements (e.g. number of working tiles) to the selection criteria.
• Web page actions may also be performed from the command line or through Python calls.

Data Quality Selection
• Near term
  • Is all instrument housekeeping nominal?
  • Were all data files captured?
  • How many tiles were functioning?
  • Were all dipoles working? (see talk by Aaron Ewall-Wice)
  • Were the band shapes good? (see talk by Lu Feng)
• Longer term
  • RTS metadata will be examined
  • Look to data analysis groups for more ideas
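
The near-term checks above could be combined into a single selection filter. The following is only a minimal sketch: the `Observation` record, all of its field names, and the `min_working_tiles` threshold are hypothetical, since the slides do not describe the archive's actual schema or interface.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    # Hypothetical record; every field name here is an assumption.
    obs_id: str
    housekeeping_nominal: bool   # is all instrument housekeeping nominal?
    files_expected: int          # number of data files the observation should have
    files_captured: int          # number of data files actually captured
    working_tiles: int           # how many tiles were functioning
    dipoles_all_working: bool    # were all dipoles working?
    band_shapes_good: bool       # were the band shapes good?


def passes_quality(obs, min_working_tiles):
    """Return True only if the observation passes every near-term check."""
    return (
        obs.housekeeping_nominal
        and obs.files_captured == obs.files_expected
        and obs.working_tiles >= min_working_tiles
        and obs.dipoles_all_working
        and obs.band_shapes_good
    )


def select_observations(observations, min_working_tiles):
    """Keep the observations that pass all quality checks, preserving order."""
    return [o for o in observations if passes_quality(o, min_working_tiles)]
```

Keeping each check as a separate boolean or count makes it straightforward to relax one criterion at a time, for example lowering `min_working_tiles` for early data.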

Future Build-out of Archive
• Current straw man
  • Due to the recent rise in disk prices, if we had to purchase the archive today we could only afford 12 new computers, each with 55 TB (660 TB total)
  • We expect disk prices to fall again, and we should be able to purchase > 1 PB when we really need it
• GPUs are great for pre-processing (e.g. gridding); currently limited by thermal constraints.

What to put in the Archive? (Feedback welcome)
• The Archive will only contain EOR observations (no SHI, GEG or raw survey)
• As much raw visibility data as possible, especially early on (725 GB/hour)
• RTS HEALPix images
• RTS metadata
• Foreground-subtracted data and images
• Simulated EoR cubes
• Survey for sky model
• Other?
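
To give a feel for how far the disk budget goes, the raw visibility rate of 725 GB/hour can be set against the straw-man capacity of 12 computers with 55 TB each. This is a rough sketch that assumes decimal units (1 TB = 10^12 bytes) and that the entire capacity is given to raw data, with nothing reserved for redundancy or other products; both are simplifying assumptions, not statements from the slides.

```python
TB = 10**12  # decimal terabyte (assumption)
GB = 10**9   # decimal gigabyte (assumption)


def hours_of_raw_data(capacity_bytes, rate_bytes_per_hour):
    """Observing hours of raw visibilities that fit in the given capacity."""
    return capacity_bytes / rate_bytes_per_hour


raw_rate = 725 * GB                  # raw visibility data rate from the slides
straw_man_capacity = 12 * 55 * TB    # 12 computers x 55 TB = 660 TB

hours_660tb = hours_of_raw_data(straw_man_capacity, raw_rate)
hours_1pb = hours_of_raw_data(1000 * TB, raw_rate)

print(f"660 TB holds about {hours_660tb:.0f} hours of raw data")
print(f"1 PB holds about {hours_1pb:.0f} hours of raw data")
```

Under these assumptions, 660 TB corresponds to roughly 910 hours of raw visibilities and 1 PB to roughly 1380 hours.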

Data Format Issues
• What is the format of the output of the new correlator?
• Should we store the equivalent of splatdas output instead?
• It would be really nice if all data files had a header, rather than separate info files.
• RTS HEALPix output is well defined
• RTS metadata formats need to be defined
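
As an illustration of the "header in every data file" idea, here is a minimal sketch of a self-describing file: a magic string, a length-prefixed JSON header, then the binary payload. The layout, the `MAGIC` value, and the function names are all hypothetical and are not a proposal for the actual correlator or RTS formats.

```python
import json
import struct

MAGIC = b"EORH"  # hypothetical 4-byte file signature


def write_with_header(path, metadata, payload):
    """Write metadata (a dict) and payload (bytes) into one file."""
    header = json.dumps(metadata, sort_keys=True).encode("utf-8")
    with open(path, "wb") as f:
        f.write(MAGIC)
        f.write(struct.pack("<I", len(header)))  # header length, little-endian uint32
        f.write(header)
        f.write(payload)


def read_with_header(path):
    """Return (metadata, payload) from a file written by write_with_header."""
    with open(path, "rb") as f:
        if f.read(4) != MAGIC:
            raise ValueError("not a self-describing data file")
        (header_len,) = struct.unpack("<I", f.read(4))
        metadata = json.loads(f.read(header_len).decode("utf-8"))
        payload = f.read()
    return metadata, payload
```

Because the metadata travels inside the file, a data file cannot be separated from its description when it is copied between archives.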

Data Transport
• How to transport a PB of data halfway around the world?
• Baseline plan is to ship disks
  • Writing disks is fast
  • No additional hardware needed
  • Heavy, expensive to ship
• Should we plan to ship tapes instead?
  • Can they be written fast enough?
  • Are tape drives reliable enough?
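
A quick calculation shows why shipping media is the baseline for a PB-scale transfer. The sketch below estimates how long a network transfer would take and how many disks a shipment would need. The 1 Gbit/s sustained link rate and the 2 TB disk size are illustrative assumptions only; neither number comes from the slides.

```python
PB = 10**15  # decimal petabyte (assumption)
TB = 10**12  # decimal terabyte (assumption)


def transfer_days(total_bytes, sustained_bits_per_second):
    """Days needed to move total_bytes over a link at the given sustained rate."""
    seconds = total_bytes * 8 / sustained_bits_per_second
    return seconds / 86400


def disks_needed(total_bytes, disk_capacity_bytes):
    """Number of whole disks required to hold total_bytes (ceiling division)."""
    return -(-total_bytes // disk_capacity_bytes)


# Assumed values for illustration only.
link_rate = 10**9        # 1 Gbit/s sustained
disk_capacity = 2 * TB   # 2 TB per disk

print(f"1 PB over the assumed link: {transfer_days(PB, link_rate):.1f} days")
print(f"1 PB on assumed disks: {disks_needed(PB, disk_capacity)} disks")
```

With these assumed numbers, a network transfer of 1 PB takes about three months of uninterrupted sustained throughput, while the same volume fits on 500 disks.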

Issues and Concerns
• Floods in Thailand driving up costs
  • Should be resolved by fall 2012
• Cooling requirements
  • A significant cost (cannot be charged to grants in the US!), especially with GPUs; lower disk costs => increased capacity => cooling issues
• Coordination of efforts: archiving work package with representation at project meetings?