ALMA Integrated Computing Team Coordination & Planning Meeting #1
Santiago, 17-19 April 2013
ACA plan
Manabu Watanabe, National Astronomical Observatory of Japan
Failures involving ACA in Q1 2013 • 24% of the failures originated in bugs in the ACA software (based on a simple analysis of JIRA tickets). • Shared memory problems and bulk data problems account for 10/13 of them.
Planning items and time frames (1) • ACA planning items with rough time frames
Planning items and time frames (2) • ACA planning items with rough time frames (continued)
Bug fixes • Failure in attaching the shared memory • If this problem occurs, the observation fails, and the affected container must be shut down and restarted. • The sendData() method sometimes takes very long to return • If this problem occurs, the observation fails because CDP master fails to send data, and the observation must be run again. The root cause is still unclear: it may lie in the network or in software (ACACORR, BDS, bulk data receivers), among other candidates. • XP delay should be effective for single-dish observations as well • The cross-polarization delay does not work for single-dish cross-polarization observations, although it works properly for interferometry. • 1.907 kHz shift in the center frequency of channels • The center frequency of the channels is always shifted by about 1.907 kHz between the frequency label and the actual spectra. • Phase of the Walsh function should be set for every subscan • ACACORR has assumed that the 90-degree phase switching starts at the beginning of each subscan. In fact, the LO starts the switching at 1970-01-01T00:00:00, so the phase of the 90-degree phase switching can differ from one subscan to the next. ACACORR plans to set the starting phase of the 90-degree phase switching for each subscan.
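The Walsh-phase fix is essentially modular arithmetic: if the LO switching has run continuously since 1970-01-01T00:00:00, the step active at a subscan start is the number of elapsed switching steps modulo the sequence length. A minimal Python sketch, assuming a fixed step duration and sequence length; `walsh_start_step`, `step_ns` and `sequence_length` are hypothetical names, not ACACORR's API, and leap seconds are ignored as in Python's `datetime` arithmetic:

```python
from datetime import datetime, timezone

# Origin of the LO 90-degree phase switching, per the slide.
SWITCHING_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def walsh_start_step(subscan_start: datetime, step_ns: int, sequence_length: int) -> int:
    """Return the index of the Walsh step active at subscan_start.

    Hypothetical parameters: step_ns is the duration of one switching step in
    nanoseconds, sequence_length the number of steps in one Walsh cycle.
    Leap seconds are ignored, as in Python's datetime arithmetic.
    """
    elapsed = subscan_start - SWITCHING_EPOCH
    # Work in integer nanoseconds so that decades of elapsed time do not
    # accumulate floating-point rounding error.
    elapsed_ns = ((elapsed.days * 86_400 + elapsed.seconds) * 1_000_000
                  + elapsed.microseconds) * 1_000
    return (elapsed_ns // step_ns) % sequence_length
```

With such a helper, each subscan would start at the computed step instead of assuming step 0.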
Adjustments to the ACA correlator (1) • Relax the health check of the 3-bit histogram • The observation fails when the total number of samples in the histogram is NOT equal to 3886632960 at the correlator calibration. 3886632960 samples correspond to 960 ms, which is the sampling period of the histogram. This health check has been failing frequently in recent observations, and Fujitsu assures that the histogram is sound even when its total number of samples differs from 3886632960. We plan to relax the health check of the histogram. • Increase the time interval for getting the 3-bit histogram • The ACA correlator sometimes fails in inter-module communication, which may lead to an observation failure. We suspect that getting the 3-bit histogram too frequently disturbs the inter-module communication of the ACA correlator. This change should be available in April or May, after further investigation of the problem. • Remove the check of the FFT overflow flag in CDP nodes • CDP nodes print FFT overflow messages when they detect the FFT overflow flag in the data header received from the ACA correlator. However, the ACA correlator has since been changed: the FFT overflow flag is still present but is no longer reliable. It would be good to remove these unreliable FFT overflow messages from the container log of the CDP nodes.
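The difference between the current exact-count check and a relaxed one can be sketched as follows. The tolerance and the extra sanity checks (8 bins for the 3-bit levels, no negative counts) are assumptions for illustration only; the slide does not say how the check will be relaxed, and `strict_check` and `relaxed_check` are hypothetical names:

```python
EXPECTED_SAMPLES = 3_886_632_960  # samples in one 960 ms histogram period (from the slide)
N_LEVELS = 8                      # a 3-bit quantizer has 2**3 output levels


def strict_check(counts: list[int]) -> bool:
    """Current behaviour: the histogram total must match exactly."""
    return len(counts) == N_LEVELS and sum(counts) == EXPECTED_SAMPLES


def relaxed_check(counts: list[int], rel_tol: float = 1e-3) -> bool:
    """Hypothetical relaxed check: correct number of bins, no negative
    counts, and a total within rel_tol of the expected sample count."""
    if len(counts) != N_LEVELS or any(c < 0 for c in counts):
        return False
    return abs(sum(counts) - EXPECTED_SAMPLES) <= rel_tol * EXPECTED_SAMPLES
```

Keeping the structural checks while tolerating a slightly different total would still reject an empty or malformed histogram.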
Adjustments to the ACA correlator (2) • Suppress warnings for FFT overflow and delta-sigma overflow • CCC prints messages about FFT overflows and delta-sigma overflows when the ACA correlator detects them. This information is useful. The problem is that FFT overflows and delta-sigma overflows occur continuously during the interval between observations, when the input signals from the antennas may NOT be reliable (e.g., missing frames, broken frames, zero signal levels). In that case these overflow messages are useless and very annoying. It would be good to suppress these overflow messages during the interval between observations. • Parallelize the monitor commands for all quadrants • CCC monitors the status (temperature, fan speed, voltage) of the ACA correlator. The monitoring will be parallelized across the 4 quadrants. The "get hardware failure" command will be parallelized as well.
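Parallelizing the per-quadrant monitor commands amounts to issuing the four requests concurrently and collecting the replies. A minimal Python sketch with a thread pool; `read_quadrant_status` is a hypothetical stand-in for the real CCC monitor call, and its values are placeholders, not real correlator readings:

```python
import time
from concurrent.futures import ThreadPoolExecutor

QUADRANTS = (1, 2, 3, 4)


def read_quadrant_status(quadrant: int) -> dict:
    """Hypothetical stand-in for one quadrant's monitor command.
    A real implementation would query the hardware; here we simulate
    a slow reply with placeholder values."""
    time.sleep(0.1)
    return {"quadrant": quadrant, "temperature_C": 30.0, "fan_rpm": 3000, "voltage_V": 12.0}


def monitor_all_quadrants(read=read_quadrant_status) -> list[dict]:
    """Send the monitor command to all quadrants concurrently.
    Total latency is roughly that of the slowest quadrant, not the sum."""
    with ThreadPoolExecutor(max_workers=len(QUADRANTS)) as pool:
        # map() preserves input order, so results[i] belongs to QUADRANTS[i].
        return list(pool.map(read, QUADRANTS))
```

Threads suit this case because each call spends most of its time waiting for the hardware; the same pattern would apply to the "get hardware failure" command.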
New features (1) • New ACACorrGUI • We have received several requests to improve ACACorrGUI, and some of them motivate a completely new ACACorrGUI. The new ACACorrGUI will receive the BDF transmitted from CDP master and display the spectra for all baselines at once. • Alarms based on analysis of container log files • Failures in observations with the ACA correlator keep recurring at a certain rate, and identifying the root cause takes a long time every time. We plan to implement a simple log inspection program that raises alarms by recognizing some of the familiar failures. • ACA-specific delay read from TMCDB • The ACA correlator needs its own specific delay compensation. Takeshi Kamazaki requests that this specific delay be stored in TMCDB so that it can be changed when necessary. • Window function read from TMCDB • The ACA correlator applies a window function as a weighted running mean. Takeshi requests that the weight function be stored in TMCDB so that it can be changed when necessary.
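A weighted running mean is a convolution of the spectrum with a short kernel; for example, the weights (0.25, 0.5, 0.25) give Hanning smoothing. Reading the kernel from TMCDB would let it change without a code change. A minimal Python sketch, assuming an odd-length kernel of positive weights and renormalization at the band edges (both assumptions; `weighted_running_mean` is a hypothetical name):

```python
def weighted_running_mean(spectrum: list[float], weights: list[float]) -> list[float]:
    """Smooth a spectrum with a symmetric weighted running mean.

    weights: hypothetical kernel (e.g. read from TMCDB), odd length,
    positive values. At the band edges the kernel is truncated and
    renormalized so that a flat spectrum stays flat.
    """
    if len(weights) % 2 != 1:
        raise ValueError("kernel length must be odd")
    if any(w <= 0 for w in weights):
        raise ValueError("weights are assumed to be positive")
    half = len(weights) // 2
    n = len(spectrum)
    smoothed = []
    for i in range(n):
        acc = 0.0
        norm = 0.0
        for k, w in enumerate(weights):
            j = i + k - half  # channel covered by kernel tap k
            if 0 <= j < n:
                acc += w * spectrum[j]
                norm += w
        smoothed.append(acc / norm)
    return smoothed
```

For instance, smoothing a single spike `[0, 0, 4, 0, 0]` with the Hanning weights spreads it over three channels as `[0, 1, 2, 1, 0]`.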
New features (2) • Finite dead time in bin switching • ACACORR supports bin switching, but the dwell time must be given in advance and the dead time is assumed to be zero. These assumptions are justified for frequency switching but not for nutator switching. • Increasing the number of bins (3 or more) • ACACORR supports bin switching only for the 2-bin use case. The number of bins in ACACORR should be extended if 3 or more bins are needed. • WVR coefficients • ACACORR takes care of the effective period of WVR coefficients, but CORR does not. CORR can hold multiple sets of WVR coefficients, one for each spectral window, whereas ACACORR can hold only one set of WVR coefficients for the receiver band at a time. ACACORR should perhaps follow CORR (to be decided). • Porting ACACORR to a 64-bit OS • ACACORR should be ported to 64-bit RH6.4 or similar.
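Both bin-switching items can be illustrated with one timing model: each bin lasts a fixed dwell time, and the first part of every dwell is a dead time whose data are discarded. A minimal Python sketch under that assumed model, using integer microseconds to avoid rounding at bin boundaries; `bin_for_time` and its parameters are hypothetical, and `dead_us=0` with `n_bins=2` reproduces the current ACACORR assumptions:

```python
from typing import Optional


def bin_for_time(t_us: int, dwell_us: int, dead_us: int, n_bins: int) -> Optional[int]:
    """Return the switching bin active t_us after the cycle start,
    or None if t_us falls in the dead time at the start of a bin.

    Assumed model: bins are visited in order 0..n_bins-1, each for
    dwell_us, and the first dead_us of every dwell is discarded.
    """
    if n_bins < 2:
        raise ValueError("bin switching needs at least 2 bins")
    if not 0 <= dead_us < dwell_us:
        raise ValueError("dead time must be non-negative and shorter than the dwell time")
    phase = t_us % (dwell_us * n_bins)  # position within one switching cycle
    bin_index, offset = divmod(phase, dwell_us)
    return None if offset < dead_us else bin_index
```

Data whose timestamp maps to `None` would be flagged or dropped rather than accumulated into a bin.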
New features (3) • BDNT configuration read from TMCDB • Bogdan requests that CORR and ACACORR read the BDNT configuration from TMCDB. • TCP connection in BDNT • Bogdan requests that CORR and ACACORR use TCP instead of UDP for the data transmission from CDP nodes to CDP master.
Requests for improvement (1) • Increasing efficiency (1) in Tsys measurement • Stuart requests that ACA reduce the Tsys measurement time from the current 2 minutes to 30 seconds. We think we can reduce it to about 1 minute by taking advantage of the subscan sequence with “delta requantization correction”. • Increasing efficiency (2) • Takeshi requests reducing the overhead (lead time and processing time), which is about 20 seconds for the correlator calibration and about 15 seconds for the real observation. The slow response of the ACA correlator accounts for the major part of the lead time, so software can reduce it only to a limited extent. • Special data rate calculation in AUTO_ONLY mode • Takeshi requests that, in AUTO_ONLY mode, the data rate be calculated as for the TP array when the array has 4 or fewer antennas, regardless of their CAIs. • Reduce unnecessary warning messages • It would be good to reduce annoying log messages where practical.
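The requested AUTO_ONLY rule is a simple decision on the antenna count alone. A minimal Python sketch of that decision; `RateModel`, `rate_model` and the `CAI_BASED` label are hypothetical, and the actual data rate formulas are not covered here:

```python
from enum import Enum


class RateModel(Enum):
    TP_ARRAY = "TP array"
    CAI_BASED = "CAI-based"  # hypothetical label for the existing calculation


TP_MAX_ANTENNAS = 4  # threshold from the request


def rate_model(correlation_mode: str, cais: list[int]) -> RateModel:
    """Choose how the data rate is calculated under the requested rule.

    cais lists the CAIs of the antennas in the array. In AUTO_ONLY mode
    only their count matters ("regardless of their CAIs").
    """
    if correlation_mode == "AUTO_ONLY" and len(cais) <= TP_MAX_ANTENNAS:
        return RateModel.TP_ARRAY
    return RateModel.CAI_BASED
```

Keying the rule on `len(cais)` rather than on the CAI values themselves is what makes it independent of which CAIs the antennas occupy.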
Requests for improvement (2) • Updating the 3-bit linearity correction every integration • Takeshi requests an enhancement of the 3-bit linearity correction. • Updating the delta requantization correction every integration • Takeshi requests an enhancement of the delta requantization correction. • Automatic self-test of the ACA correlator when ACACORR starts • Takeshi requests that ACACORR automatically run a self-test of the ACA correlator (mci_st) whenever ACACORR starts. This will help the operators.
New features not yet specified • Digitizer quantization correction • Takeshi should provide the algorithm of the digitizer quantization correction for the ACA correlator; ACACORR will then implement it. • Subarraying in an SB • ACA phase calibration may require subarraying during the execution of an SB. Science should clarify the calibration plan first; then Computing should discuss the implementation in detail. Scheduling, CONTROL, DataCapture, ASDM, OT, and ACACORR will probably be involved.
No plan yet • 3LO in interferometry • 3LO is available for single-dish observations, but the 3LO of ACA does not work as planned for interferometry. Takeshi explains the root cause of the problem in the ticket; please refer to the ticket for details. Note that 2LO should work properly, and 90-degree phase switching is another alternative. • Phase-up mode • An ACA phase-up mode for VLBI has never been seriously considered. The ACA correlator would need further development work if the phase-up mode is required, which would naturally require further work on ACACORR as well.