NEXPReS Project: recent developments in Italy
Mauro Nanni, Franco Mantovani
Istituto di Radioastronomia - INAF, Bologna, Italy
NEXPReS is an Integrated Infrastructure Initiative (I3), funded under the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement n° RI-261525.
Design document of the storage element allocation method
This report is deliverable D8.3 of WP8. Participants in WP8: JIVE, ASTRON, INAF, UMAN, OSO, PSNC and AALTO.
14 Institutes, 21 Radio telescopes
Yellow: current operational EVN stations
Cyan/Red: existing telescopes soon to be EVN stations
Cyan/Blue: new EVN stations under construction
Pink: non-EVN stations that have participated in EVN observations
Green: non-EVN stations with which initial EVN tests have been carried out
Data acquisition and storage
EVN observations:
> concentrated in three periods during the year
> lasting about twenty days each
> a burst in data production (and more time to plan data storage and data processing)
IVS observations:
− scheduled more frequently
− have a rather shorter duration
− much less data to handle and less time to process them
The present analysis is based on the amount of data acquired for both astronomical and geodetic observations over a period of about six years.
Astronomical observations
Total data acquired by EVN antennas in all sessions scheduled during the period 2005-2011
Data acquired by a single antenna in all sessions scheduled during the period 2005-2011
Terabytes of data acquired by each EVN antenna in the period 2009-2011
Size distribution of the data acquired
Distribution by observing bandwidth
Information kindly provided by Alessandra Bertarini and Richard Porcas
Conclusions (from astronomical observations)
> Up to 100 Terabytes of data per session per station
> Some antennas record a lower amount of data
> In a 20-day session, 30 datasets are collected on average at present
> We can estimate 50 datasets per session as a realistic upper limit (with 16 antennas we need to manage a maximum of 800 datasets per session)
In the near future:
> New back-ends will allow 2 Gbps and 4 Gbps bandwidth (at 5 GHz)
> The capacity to store 150-200 Terabytes of data per station is required (a rough estimate is sketched below)
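The 150-200 Terabyte figure can be checked with a quick back-of-the-envelope calculation. The Python sketch below assumes a 20-day session and an illustrative ~40% recording duty cycle (neither figure is stated in the deliverable); under those assumptions a 2 Gbps back-end lands in the quoted range.

```python
# Rough per-station storage estimate for one EVN session.
# The 40% duty cycle is an illustrative assumption, not a NEXPReS figure.

def session_storage_tb(rate_gbps: float, session_days: float, duty_cycle: float) -> float:
    """Data volume (TB) recorded by one station in a session.

    rate_gbps    -- sustained recording rate of the back-end, in Gbit/s
    session_days -- length of the observing session, in days
    duty_cycle   -- fraction of the session actually spent recording (assumed)
    """
    seconds = session_days * 86400 * duty_cycle
    bytes_recorded = rate_gbps * 1e9 / 8 * seconds   # Gbit/s -> byte/s
    return bytes_recorded / 1e12                     # bytes -> TB

# 20-day session, assumed ~40% recording duty cycle
for rate in (1, 2, 4):
    print(f"{rate} Gbps -> {session_storage_tb(rate, 20, 0.4):.0f} TB per station")
# 1 Gbps -> ~86 TB  (consistent with the ~100 TB seen per station today)
# 2 Gbps -> ~173 TB (hence the 150-200 TB capacity requirement)
```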
Geodetic observations
> Do not require too much disk space
> The bottleneck is the data transfer speed from the stations to the correlators
> An antenna needs to store locally up to 5 Terabytes for a few weeks
> (There is a plan to upgrade the bandwidth to 512 Mbit/s, raising the local storage need to up to 12 Terabytes per antenna)
The storage units in the antenna network
"Near" real-time e-VLBI is a possibility:
a) store the data at the stations
b) transfer the data sets to the correlator via the fibre-optic network
c) start the correlation process
This strategy is used in geodetic VLBI observations.
Copy the data of slow (poorly connected) stations onto the correlator disks
Correlation of local and remote data
Let's suppose:
> EU stations can transfer data at 10 Gbps
> extra-EU stations have poorer connectivity
Requirement:
> 300 Terabytes of disk space at the correlator
The time required to transfer all the data depends on the transfer speed: transferring 40 Terabytes at 512 Mbps takes 5 to 10 days (a quick check is sketched below).
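As a sanity check on the "5 to 10 days" figure, here is a minimal Python sketch; the 70% link efficiency is an assumption chosen only to illustrate why a real transfer can stretch towards the upper end of that range.

```python
# Transfer-time check for the "40 Terabytes at 512 Mbps" example.
# Link efficiency is an illustrative assumption (protocol overhead, shared links).

def transfer_days(volume_tb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Days needed to move volume_tb over a link of link_mbps at the given efficiency."""
    bits = volume_tb * 1e12 * 8
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 86400

print(f"ideal link     : {transfer_days(40, 512):.1f} days")        # ~7.2 days
print(f"70% efficiency : {transfer_days(40, 512, 0.7):.1f} days")   # ~10.3 days
```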
The next figure illustrates a possible configuration:
1 – Attended antennas with a local storage system (B)
2 – Groups of antennas with a central storage system and a local correlator (C)
3 – Unattended antennas with a local storage system (D)
4 – Antennas with a good network connection using remote storage (E or A)
5 – Antennas with a poor network connection using legacy disk-packs
A "possible" situation in the future:
> Heterogeneous systems at the stations
> More than one correlator in operation
> Distributed correlation (data read simultaneously by different correlators)
Requirement:
− A specialized data centre is needed
− Located on a primary node of the fibre-optic network
− Able to provide the needed data throughput
Deliverable D8.3 – Hardware design document for simultaneous I/O storage elements
Search for a possible model. The storage system should be:
> Cheap
> A high-performance solution
System: SuperMicro NAS-24D with motherboard X8DTL-IF
CPU: Intel Xeon E5620 at 2.4 GHz
RAID board: 3Ware 9650SE
24 disks, 2 TB SATA II
Tests on 3 different RAID configurations with the standard ext3 Linux file-system:
Single disk: 123 MByte/s
24 disks RAID_5: 655 MByte/s
24 disks RAID_6: 580 MByte/s
24 disks RAID_0: 680 MByte/s
Recording speed: 4 Gbit/s
RAID_5: prevents the loss of all data if one disk crashes; it can continue to work at a lower recording speed
RAID_0: a disk crash implies the loss of all the recorded data
RAID_6: performs like RAID_5, but is more reliable
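To relate the measured write speeds to the 4 Gbit/s recording speed, a quick conversion (4 Gbit/s ≈ 500 MByte/s) shows that all three 24-disk configurations leave some margin while a single disk does not. A minimal check in Python, using the measured figures quoted above:

```python
# Check which configuration sustains the 4 Gbit/s recording speed.
# Throughput values are the measured ext3 results quoted above.

measured_mbyte_s = {
    "single disk": 123,
    "RAID_5": 655,
    "RAID_6": 580,
    "RAID_0": 680,
}

required_mbyte_s = 4e9 / 8 / 1e6   # 4 Gbit/s -> 500 MByte/s

for config, speed in measured_mbyte_s.items():
    verdict = "sustains" if speed >= required_mbyte_s else "too slow for"
    print(f"{config:12s}: {speed:4d} MByte/s {verdict} 4 Gbit/s recording")
```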
Three storage unit prototypes at IRA-INAF
Motherboard: SuperMicro X8DTH-IF, 7 PCI-Express x8
CPU: 2 x Intel Xeon E5620
RAID board: 3Ware SAS 9750-24i4e (SATA 3 support)
Disks: 12 x 2 TByte SATA-3 (12/24 for tests)
Network: Intel 82598EB, 10 Gbit/s
Cost per unit: 7,500 Euros
The storage system is managed as a collection of "tanks of radio data" connected to a Cisco 4900 router.
Evaluation of:
> various file-systems and related parameters
> several systems and transmission protocols
Results:
− Excellent performance with the ext4 file-system
− Increased writing speed: 1 GByte/s in RAID_5
− Transfer speed between "tanks" of 500 MByte/s (using Grid-FTP)
Space allocation method
Stations should have enough disk space available to store the data recorded along a full session (about 100 TByte).
Experiments:
> from 20 to 50 in a session
> typical size of an experiment: 2-4 Terabytes
> maximum size of an experiment: 20 Terabytes
> an experiment consists of a file for each scan: hundreds of files of tens of GBytes; the maximum size of a file is more than 1 Terabyte
Antennas have different storage systems.
Example: an economic COTS "tank" holds from 20 to 80 Terabytes of data; many "tanks" are needed; some file systems provide only 16 Terabytes per partition.
Two possible solutions:
a) optimize the disk usage by filling up the partitions regardless of which experiment produced the files
b) organize each individual experiment in a directory tree
Both solutions require a table at each antenna describing the structure of the storage system and the amount of space available. These tables need to be updated at the beginning of a new session.
The storage allocation method is simple if the files are saved sequentially: an experiment is easily found by its first scan file and by the number of files belonging to that experiment. A minimal sketch of such an allocation table is given below.
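As an illustration of solution (a), here is a minimal Python sketch of a per-antenna allocation table with a sequential placement rule. The class names, fields and file names are hypothetical, chosen only to make the idea concrete; they do not come from the deliverable.

```python
# Hypothetical sketch of the per-antenna storage table for solution (a):
# scan files are written sequentially into whichever partition has room, and an
# experiment is located by its first scan file plus its number of files.
from dataclasses import dataclass

@dataclass
class Partition:
    name: str        # e.g. "/tank1/part0"
    free_tb: float   # space still available, in TB

@dataclass
class ExperimentEntry:
    first_file: str  # path of the first scan file
    n_files: int     # number of scan files belonging to the experiment

class AllocationTable:
    def __init__(self, partitions):
        self.partitions = partitions   # updated at the start of each session
        self.experiments = {}          # experiment code -> ExperimentEntry

    def allocate_scan(self, experiment: str, scan_file: str, size_tb: float) -> str:
        """Place one scan file in the first partition with enough free space."""
        for part in self.partitions:
            if part.free_tb >= size_tb:
                part.free_tb -= size_tb
                entry = self.experiments.setdefault(experiment,
                                                    ExperimentEntry(scan_file, 0))
                entry.n_files += 1
                return f"{part.name}/{scan_file}"
        raise RuntimeError("no partition has enough free space")

# Example: two 16 TB partitions, a 0.05 TB scan from a hypothetical experiment "EX001"
table = AllocationTable([Partition("/tank1/part0", 16.0), Partition("/tank1/part1", 16.0)])
print(table.allocate_scan("EX001", "ex001_scan0001.vdif", 0.05))
print(table.experiments["EX001"])
```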
Management of storage units
A common access policy for the network storage system should be established.
Many network authentication/authorization systems can run under Linux: LDAP, Radius, SSH keys, and certificates.
A "Certification Authority" can be established at the correlation centres.
The "Grid File Transfer Protocol" (GridFTP) is under test to evaluate whether it fits our needs: it allows parallel and striped transfers, fault tolerance and restart, and third-party transfer, and it is able to use the TCP, UDP and UDT protocols. It is being compared with "Tsunami". An example invocation is sketched below.
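As an example of the kind of transfer being tested, the sketch below drives the standard GridFTP client (globus-url-copy) from Python between two hypothetical "tank" hosts; the host names, paths and the choice of 8 parallel streams are assumptions for illustration only, not the NEXPReS test configuration.

```python
# Minimal wrapper around the standard GridFTP client for a tank-to-tank transfer.
# Host names, paths and stream count are illustrative assumptions.
import subprocess

def gridftp_copy(src_url: str, dst_url: str, parallel_streams: int = 8) -> None:
    """Run globus-url-copy with parallel data streams and performance output."""
    cmd = [
        "globus-url-copy",
        "-p", str(parallel_streams),   # number of parallel data connections
        "-vb",                         # print transfer performance
        src_url,
        dst_url,
    ]
    subprocess.run(cmd, check=True)

# Example: push one scan file from tank1 to tank2 (hypothetical hosts/paths)
gridftp_copy("gsiftp://tank1.example.org/data/ex001_scan0001.vdif",
             "gsiftp://tank2.example.org/data/ex001_scan0001.vdif")
```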
Institute of Radio Astronomy Observatories: Noto and Medicina