EPIC SW Status Guillermo Buenadicha TOS/OFX, SOC Operations Support Group Palma de Mallorca, 1st February 2005
Introduction • ACTIVITIES SINCE LAST TTD • SW Versions • EMDH K upload • EPIC PN Spare testing • 5 YEARS OF OBSW, WRAP-UP • Why modify SW? • Examples • Where are we? • What’s next? • OBSW Contract expiration
EMDH K UPLOAD • The EMDH K was uploaded on the 5th of May 2004, after testing on the Spare and a prior uplink at the end of April. • A refined version was used (SW reallocation only), to avoid alarms related to the boot of the EMDH Slave processor. • All items for extended BP handling are in place; the only pending issue is the modification of the ground SW to ingest them into the relevant FITS files. This will be implemented in the new S2K system. • Also in place are the preventive correction for the FW movement in the presence of a sensor failure, and the HK flagging mechanism. • Any ideas for the latter?
EPIC MOS: EMDH K ECR #7 • After cooling in rev 533, reduction by 8 in the number of Bright Pixels. • However, management is still deemed necessary to cope with a future increase. Preliminary analysis of options in ISS-MOS-BPT-TN-01 (July 2002). • The option selected for a demonstration prototype is to increase up to 250 pixels per HBR, using a data reduction strategy in the BPT storage. • The constraint on any implementation is the limited data space in the EMDH processors. The implementation chosen also impacts the BPT uplink and report mechanisms and the rejection procedure, although only limited changes are foreseen there, to maintain performance.
EPIC MOS: EMDH K ECR 12 • Existing logic, Safe entering (flow diagram boxes): Prime sensor Closed, Redun sensor Closed, Ext Temp sensor < 60 deg, Move FW. • Logic after EMDH K (flow diagram boxes): Prime sensor Closed, Redun sensor Closed, Ext Temp sensor < 60 deg, Check Override bit, Move FW. • What if FW Open, and Ext Temp broken (temp > 60)? • If bit set to 0, no differences w.r.t. current situation. • If bit set to 1, FW will move in any case.
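The override bit can be illustrated with a minimal sketch. This is only one possible reading of the flow diagram, not the on-board implementation: the function name `fw_should_move`, the assumption that either position sensor reporting "closed" means no movement is needed, and the assumption that the override bypasses only the temperature check are all hypothetical. Only the 60 deg threshold and the two override cases come from the slide.

```python
def fw_should_move(prime_closed: bool, redun_closed: bool,
                   ext_temp_deg: float, override_bit: int) -> bool:
    """Decide whether the filter wheel (FW) is commanded to move on safe entry.

    Hypothetical reading of the ECR 12 diagram, for illustration only.
    """
    # Assumption: if either position sensor already reports "closed",
    # no movement is needed.
    if prime_closed or redun_closed:
        return False
    # Existing logic: movement is allowed only while the external
    # temperature sensor reads below 60 deg.
    if ext_temp_deg < 60:
        return True
    # New in EMDH K: with the override bit at 1 the FW moves anyway,
    # which covers a broken sensor stuck above 60 deg.
    # With the bit at 0 the behaviour is unchanged.
    return override_bit == 1
```

With the bit at 0 the function reduces to the existing logic; with the bit at 1 an open FW is moved even when the temperature reading is above 60 deg.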
EPIC PN SPARE TESTING • EPIC PN testing on the Spare • Performed at the Panter facility in the last two weeks of Nov. 2004, with LABEN, ESAC and the PIs. • The purpose was to verify the readiness of the Spare Chain, and to upgrade it with the latest SW (EPEA 517, EPDH K, latest Ops DB procedures). • A special test was devoted to the fast dump of the offsets via HBR. • A dedicated set of tools was developed to support testing and data analysis. • Identified the need for a warm reset in case of a CDMU crash (quite unlikely!). Procedure change.
EPIC PN: ECR # 2 EPDH (block diagram of the fast offset dump; labels only) • TC F0106 • Xqt_offs • Dump_offset: 1 packet every 250 msec • EPEA FIFO: 8 Kwords • HBR Buffer: 40 Kwords • Science Queue: 16 Packets; HK 4; NP 4 • PST (130 Words + 10 msec) * 200 >= 2 secs • Up to 68000 reads per second??? • 25999 words • CDMU BRAT selected: 16 Kbps for 1 quad, 34 Kbps for 2 quad • VC-7 TM to GROUND
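Two of the timing figures in the diagram can be checked with simple arithmetic. The sketch below assumes that 200 is a repetition count and that 10 msec is a fixed wait per repetition; the transfer time for the 130 words is not given on the slide, so only a lower bound is computed. The function names are hypothetical.

```python
def min_duration_s(repetitions: int, wait_ms: float) -> float:
    """Lower bound on total duration: the fixed wait alone, summed over all repetitions."""
    return repetitions * wait_ms / 1000.0


def packets_per_second(period_ms: float) -> float:
    """Packet rate for one packet emitted every period_ms milliseconds."""
    return 1000.0 / period_ms


# 200 repetitions with a 10 msec wait each take at least 2 seconds,
# consistent with the ">= 2 secs" annotation.
lower_bound = min_duration_s(200, 10)

# One packet every 250 msec corresponds to 4 packets per second.
dump_rate = packets_per_second(250)
```

Any time spent transferring the 130 words adds to the 2 second figure, which is why the slide states it as an inequality.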
EPIC PN: ECR # 2 (figure: offset data structures; dimension labels 59, 64, 216, 200, layout not recoverable) • NEW OFFSET DATA STRUCTURE: Q = Quadrant Id, C = CCD Id, T = Table Id, B = Block Id, P = Pixel Status, O = Offset Value, S = Seconds of Offset Calculation, F = Fraction of Seconds, I = Pixel I coordinate, J = Pixel J coordinate • CURRENT OFFSET DATA STRUCTURE: Q = Quadrant Id, C = CCD Id, O = Offset Value
EPIC OBSW RELEASES (timeline figure; labels only) • EPEA 517 • EMDH I, EMDH J, EMDH J+, EMDH K • EPDH I, EPDH J, EPDH K
5 years of EPIC SW • EMDH: 4 versions (EMDH I, J, J+, K) • EPDH: 3 versions (EPDH I, J, K) • EPEA: 1 version (EPEA 517) • No other units modified
Reasons for SW modifications • The OBSW is typically modified after launch in 3 scenarios: • CORRECT: need to correct launch SW bugs and to tailor the OBSW during the commissioning phase. • ADAPT: fit the unit to the nominal performance and implement improvements. • PROTECT: prepare the instrument for an extended lifetime and prevent or correct HW failures or degradation.
OBSW changes XMM CORRECT PROTECT ADAPT
Examples EPIC: Correct • EPEA 517: NCRs 5 and 12, low energy noise, image correction consistent. • EMDH I: TC rejected, FW movement, headers and trailers not matching (NCRs 3, 9, 14). • EPDH I: TC rejected.
Examples EPIC: Adapt • EMDH J, J+: internal LABEN fixes, Watchdog function • EPDH J: Watchdog function • EPDH K: fast dump of Offsets
Examples EPIC: Protect • CDMU J: NCR 83, PN Operating heater autonomous switch off • EMDH K: extended BP capability; prevent a possible sensor failure from impacting the FW • ???
Where are we? • After 5 years, it is time to sum up SW status and see what is needed. • Several NCR’s declared as “Unresolvable”, ground W/A in place. Are we happy? • Adaptations needed. • Any performance improvements? • Security issues • Do we need to monitor performance? • HW failures • How to deal with them • Mission lifetime and unit specifications
Unresolvable items • NCR-39: PN quadrants not listening. Performance. • NCR-75: Fail to reset EPEA Time at 32400. Processor performance. • NCR-99: Late start of time in EPEA. Fixed in SAS but still a problem OB. Processor performance. • NCR-97: TC E0001 fails transmission. SW bug. • NCR-100: Erroneous fine Time in the Time verification packet. SW bug. • NCR-103: Corrupted events confused with time infos. Processor performance. • NCR-106: Failure of F0119 (LBR error). • NCR-107: 3 rows missing in MOS-2 CCD 6. Not clear.
Performance and security • NCRs like 75 (EPEA time reset) and 103 (FFFF words) affect the quality of the data, so far taken care of by the on-ground SW. • NCR 107: 3 rows missing in MOS2 CCD6. • MOS Offset??? • Response of PN quadrants to TCs. • TREND MONITORING? ACTIONS?
EPIC PN: Time Problems 3 Revs. 478 to 510: FAILS / EVENTS ratio
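A FAILS / EVENTS ratio per revolution, as plotted on this slide, could be monitored along the following lines. This is a minimal sketch: the function names, the input layout (revolution number mapped to a pair of counts) and the threshold are assumptions for illustration, and no real counts are reproduced here.

```python
from typing import Dict, List, Optional, Tuple


def fails_to_events_ratio(fails: int, events: int) -> Optional[float]:
    """Return fails / events, or None when no events were recorded."""
    if events == 0:
        return None
    return fails / events


def revolutions_above(counts: Dict[int, Tuple[int, int]],
                      threshold: float) -> List[int]:
    """List, in ascending order, the revolutions whose ratio exceeds the threshold.

    counts maps a revolution number to (fails, events); the layout is hypothetical.
    """
    flagged = []
    for rev, (fails, events) in counts.items():
        ratio = fails_to_events_ratio(fails, events)
        if ratio is not None and ratio > threshold:
            flagged.append(rev)
    return sorted(flagged)
```

Revolutions without recorded events are skipped rather than treated as zero, so that missing data does not mask a trend.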
HW failures, possible impact • µP performance, future degradation: impact on BW and quality of the data transmitted (EPEA problems). • HW failures (PN Q2 Voltage converter current out of limit, EMAE 28 volts line down). RGS ADC error. • LCL trip off, NCR 83 on PN. • Processor resets, radiation hardness, non-EDAC memories. Changes in SW variables. • CCD degradation or unavailability.
WHAT’S NEXT? • Better flagging and SW debugging? Becomes more and more important. Information on scheduled processes, variable status, buffer occupation… • HW-related failures? Do we know the impact of LCL failures, quadrants not working, status of redundant channels (readout nodes, units, etc.)? Any preventive measures like NCR 83 or ECR 12 OB? • Degradation and performance: how to cope with CCD degradation? How to handle more failures or a worse response from the event analyzers? • New modes foreseen: is there anything expected? Old modes to be revisited? • Instrument operations in a reduced coverage scenario? Can we operate them with longer outage periods?
OBSW Contract • The OBSW contract ESA/Consortium expires mid 2005. • Is the HW going to be kept? • Expertise maintenance? • Any foreseen activity before expiration?