
GLAST Large Area Telescope: NCR 529 Systems I&T Quality


Presentation Transcript


  1. Gamma-ray Large Area Space Telescope GLAST Large Area Telescope: NCR 529 Systems I&T Quality

  2. Agenda • Goal for today • Review NCR history to be sure that everyone is on the same page • Discuss possible sources of the problem • Review configuration changes and NCR history to see if we can identify a leading candidate • Discuss possible tests • Decide on the next steps

  3. NCR 529 Single Run History • The NCR started with a TEM register test that also exercises the Tracker registers • The GTFE calibration mask readout did not match what had been written to the GTFE • GTFE 23 in GTRC 8 in GTCC 2 in Bay 9 (Tracker SN 2) • Specifics of this test and the follow-on tests are below and in the next several slides • Included single runs in several configurations • Included two different tests run over a range of clock frequencies

  4. Single Run Errors • 135002864 • Module 'TEM [9] TCC [2] TRC [8] TFE [23]' passed 48 tests, 5 errors, 0 warnings • Errors: • 'calib_mask' ReadWrite error: written 0x5a5a5a5a5a5a5a5a, read 0x5a59b4b5b4b5b4a8 • 'calib_mask' ReadWrite error: written 0xa5a5a5a5a5a5a5a5, read 0xa58b4b4b4b4b4b48 • 'calib_mask' BcastWriteUcastRead error: broadcasted 0x5a5a5a5a5a5a5a5a, read 0x5a59b4b5b4b5b568 • 'calib_mask' BcastWriteUcastRead error: broadcasted 0xa5a5a5a5a5a5a5a5, read 0xa58b4b179697ad28 • module aborted with reason *** errors limit reached • 135002865 • Module 'TEM [9] TCC [2] TRC [8] TFE [23]' passed 49 tests, 4 errors, 0 warnings • Errors: • 'calib_mask' ReadWrite error: written 0xa5a5a5a5a5a5a5a5, read 0xa58b169796979694 • 'calib_mask' BcastWriteUcastRead error: broadcasted 0x5a5a5a5a5a5a5a5a, read 0x5a59b4b5b4b76968 • 'calib_mask' BcastWriteUcastRead error: broadcasted 0xa5a5a5a5a5a5a5a5, read 0xa5a58b4b4b4b5694 • module aborted with reason *** errors limit reached • 135002867 • CALIB_MASK (GTCC 2, GTRC 8, GTFE 23): loading 0xAAAAAAAAAAAAAAAAL and reading back 0xAAAAAAAAAAA01554L. • 135002868 • Layer X17 , Beginning test with 0/24 split: , ERROR: Reading out channel 548 from layer X17 (not included in calib mask).

  5. Single Run Errors (continued) • 135002882 • Module 'TEM [9] TCC [2] TRC [8] TFE [23]' passed 48 tests, 5 errors, 0 warnings • Errors: • 'calib_mask' ReadWrite error: written 0x5a5a5a5a5a5a5a5a, read 0x5a5a5a5b3dea6968 • 'calib_mask' ReadWrite error: written 0xa5a5a5a5a5a5a5a5, read 0xa58b4b17962e2d28 • 'calib_mask' BcastWriteUcastRead error: broadcasted 0x5a5a5a5a5a5a5a5a, read 0x58b262d5d2d5d2d0 • 'calib_mask' BcastWriteUcastRead error: broadcasted 0xa5a5a5a5a5a5a5a5, read 0xa58b169796970a50 • module aborted with reason *** errors limit reached • 135003064 • Module 'TEM [9] TCC [2] TRC [8] TFE [23]' passed 48 tests, 5 errors, 0 warnings • Errors: • 'calib_mask' ReadWrite error: written 0x5a5a5a5a5a5a5a5a, read 0x5a5a5a5a5a5a5a50 • 'calib_mask' ReadWrite error: written 0xa5a5a5a5a5a5a5a5, read 0xa5a58b179697ad28 • 'calib_mask' BcastWriteUcastRead error: broadcasted 0x5a5a5a5a5a5a5a5a, read 0x5a5a5a5a5a5a5a14 • 'calib_mask' BcastWriteUcastRead error: broadcasted 0xa5a5a5a5a5a5a5a5, read 0xa5a5a5a5a1013d28 • module aborted with reason *** errors limit reached • 135003067 • Module 'TEM [9] TCC [2] TRC [8] TFE [23]' passed 48 tests, 5 errors, 0 warnings • Errors: • 'calib_mask' ReadWrite error: written 0x5a5a5a5a5a5a5a5a, read 0x5a5bfcb5b4b5b4b4 • 'calib_mask' ReadWrite error: written 0xa5a5a5a5a5a5a5a5, read 0xa5a58b0906979694 • 'calib_mask' BcastWriteUcastRead error: broadcasted 0x5a5a5a5a5a5a5a5a, read 0x5a59b4b5b4b5b4b4 • 'calib_mask' BcastWriteUcastRead error: broadcasted 0xa5a5a5a5a5a5a5a5, read 0xa58b5edf04060a50 • module aborted with reason *** errors limit reached
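
Comparing each written pattern with its readback bit by bit can make the corruption in the error logs above easier to see. The following is a minimal sketch, not part of the original test software: the helper names `differing_bits` and `matching_msbs` are hypothetical, and "most significant" refers only to the order of the hex digits as printed, since the slides do not state the shift direction of the register.

```python
WIDTH = 64
MASK = (1 << WIDTH) - 1


def differing_bits(written: int, read: int) -> list[int]:
    """Bit positions (0 = least significant) where the readback differs."""
    diff = (written ^ read) & MASK
    return [i for i in range(WIDTH) if (diff >> i) & 1]


def matching_msbs(written: int, read: int) -> int:
    """Number of most-significant bits that agree before the first mismatch."""
    diff = (written ^ read) & MASK
    return WIDTH - diff.bit_length()


# Two 'calib_mask' ReadWrite errors quoted in the slides
# (runs 135003064 and 135002864).
cases = [
    (0x5A5A5A5A5A5A5A5A, 0x5A5A5A5A5A5A5A50),
    (0x5A5A5A5A5A5A5A5A, 0x5A59B4B5B4B5B4A8),
]
for written, read in cases:
    print(f"{read:#018x}: {len(differing_bits(written, read))} bits differ, "
          f"top {matching_msbs(written, read)} bits match")
```

A readback that agrees for some number of leading bits and then diverges is the kind of signature one might expect from a register that was shifted incorrectly partway through, but the sketch only reports the differences; it does not establish a cause.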

  6. Summary of Tests vs Frequency

  7. REGISTER TESTS • TEM register tests • Read / write tests are performed using multiple patterns • Test patterns for 64-bit registers are: • 0x0000000000000000L • 0xFFFFFFFFFFFFFFFFL • 0x5A5A5A5A5A5A5A5AL • 0xA5A5A5A5A5A5A5A5L • Each mask register is tested two ways • Direct write followed by a read, using the 4 patterns in sequence • Broadcast write followed by a read, using the 4 patterns in sequence • TKR GTFE CHECK SCRIPT • Writes two patterns • 0xAAAAAAAAAAAAAAAA • 0x5555555555555555 • Might stop on error
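
The two test methods described above (direct write then read, and broadcast write then unicast read) can be sketched against a simulated register. This is an illustration only: `FakeGtfe`, its `cleared_bits` fault model, and `run_register_test` are hypothetical names, not the actual TEM test script, and the sketch does not model the error limit that aborts the real module.

```python
MASK64 = (1 << 64) - 1
PATTERNS = (
    0x0000000000000000,
    0xFFFFFFFFFFFFFFFF,
    0x5A5A5A5A5A5A5A5A,
    0xA5A5A5A5A5A5A5A5,
)


class FakeGtfe:
    """Simulated front-end chip with one 64-bit calibration mask register."""

    def __init__(self, cleared_bits: int = 0):
        self.calib_mask = 0
        self._cleared_bits = cleared_bits  # bits forced to 0 on readback (fault model)

    def write(self, value: int) -> None:
        self.calib_mask = value & MASK64

    def read(self) -> int:
        return self.calib_mask & ~self._cleared_bits & MASK64


def run_register_test(chips: list[FakeGtfe], target: int) -> list[tuple[str, int, int]]:
    """Return (test name, value written, value read) for every mismatch."""
    errors = []
    # Direct write followed by a read, using the four patterns in sequence.
    for pattern in PATTERNS:
        chips[target].write(pattern)
        got = chips[target].read()
        if got != pattern:
            errors.append(("ReadWrite", pattern, got))
    # Broadcast write to every chip, then a unicast read of the target.
    for pattern in PATTERNS:
        for chip in chips:
            chip.write(pattern)
        got = chips[target].read()
        if got != pattern:
            errors.append(("BcastWriteUcastRead", pattern, got))
    return errors


layer = [FakeGtfe() for _ in range(23)] + [FakeGtfe(cleared_bits=0xF)]
for name, written, got in run_register_test(layer, target=23):
    print(f"'calib_mask' {name} error: written {written:#018x}, read {got:#018x}")
```

The all-zeros pattern passes even on the faulty simulated chip, which illustrates why several complementary patterns are used rather than one.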

  8. Discussion of Symptoms • Failures are intermittent (I believe this is consistent with the Tracker-level experience with NCR 104) • Not all reads fail • Not all instances of reading the same pattern fail • The read-back patterns do not show a repeatable error • There appears to be some pattern sensitivity • The three readouts that fail may have clock frequency sensitivity • Have tested using both sides of the EM GASU • Have tested with and without powering the Calorimeter • Conclusion • The symptoms match what was seen on the Tracker-level NCR 104 • We can’t tell if Tracker SN 3 performance has changed or whether the errors are induced by the test configuration

  9. What Could be Wrong? • Tracker GTFE sensitivity has increased • Circuit damage or degradation increased sensitivity • Tracker, flight TEM/TPS and EM GASU adversely interact (i.e. this is just how this particular tracker will work in the LAT) • TEM/TPS voltage is higher, potentially increasing sensitivity • Clock reaching the tracker is noisy enough to disturb the readout circuit • Termination resistance in tracker MCM incorrect • Termination resistance in flex cable incorrect • Bad cable mate in Flex cable to TEM • TEM/TPS is providing signals out of spec • TPS noise feeds into clock • LVDS signal distorted (e.g. one side with soft short to ground) • GASU to TEM cable introducing noise • Bad mate at either end • Build error in cable • EM GASU is providing signals out of spec • This specific GASU output connector has a problem • GASU removal and replacement introduced error • Other external noise sources disturbing normal operations of the tower • EGSE power supply • Chiller running or adds ground loop • Changes to Building 33 power or grounding • Software • Schemas • Other possibilities?

  10. NCR History • Tracker • Detailed Tracker SN 3 NCR information available in the backup charts • Nothing obvious related to this issue (at least to me) • NCRs 104 and 107 document issues at a lower level of assembly • NCR 104 documents the register readout issue similar to this NCR • Shows some sensitivity to duty cycle • NCR 107 documents the duty cycle issue, which is a different problem • TEM/TPS History • NCR review complete • Some workmanship NCRs, but nothing directly relevant • NCRs attached at the end of the presentation for reference • Data Package still in process • This unit had 60 operating hours

  11. Recent Configuration Changes • Changes in the configuration from 4 tower testing till now • Towers 8 and 9 installed and cabled with flight cables • Towers 4 and 5 were de-mated on the side facing towers 8 and 9, then re-mated • Installed the chiller on wings of LAT • Installed Shear plates over bays 0, 1, 4, 5.

  12. Possibly Related NCRs

  13. Potential Tests • Set the register to a known value, then repeatedly read it out • Failure pattern may help pin down the source (if it’s 60Hz…) • Connect EGSE to other GASU ports and check clock quality (or other signals) to see if the GASU meets spec • Connect tower to proven GASU port to see if the port that the tower is currently connected to has a problem • Connect tower to external crate to remove the EM GASU from the picture • Configuration matches earlier receiving test except for use of flight TEM/TPS • Put in a breakout box, check tracker termination resistance and verify TEM/TPS outputs are as expected • Based on our discussions, are there any others?
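
For the first test above (set the register once, then read it repeatedly), one way to check whether failures follow the 60 Hz line frequency is to fold the failure timestamps onto one 60 Hz cycle and measure how tightly they cluster. The following is a sketch under stated assumptions: the `(timestamp, value)` sample format and both function names are hypothetical, and the clustering measure (mean resultant length of the phases) is a technique chosen for illustration, not something taken from the slides.

```python
import math


def failure_times(samples, expected: int) -> list[float]:
    """samples: iterable of (timestamp_s, value_read); return times of bad reads."""
    return [t for t, value in samples if value != expected]


def phase_concentration(fail_times_s: list[float], freq_hz: float = 60.0) -> float:
    """Mean resultant length in [0, 1] of failure phases at freq_hz.

    Values near 1 mean failures cluster at one phase of the cycle;
    values near 0 mean they are spread evenly over the cycle.
    """
    if not fail_times_s:
        return 0.0
    angles = [2.0 * math.pi * freq_hz * t for t in fail_times_s]
    c = sum(math.cos(a) for a in angles) / len(angles)
    s = sum(math.sin(a) for a in angles) / len(angles)
    return math.hypot(c, s)


# Synthetic example: every failure lands at the same point of the 60 Hz cycle.
locked = [k / 60.0 for k in range(50)]
print(f"locked to 60 Hz: R = {phase_concentration(locked):.3f}")
```

With only a handful of failures the measure is noisy, so a real analysis would need enough failed reads and timestamps accurate to well under one 60 Hz period (about 16.7 ms).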

  14. New Topics for Discussion • Understand Root Cause of failure. • Known failure due to lack of rework. • Failure due to environment. • Failure due to degradation. • After Root Cause is determined, need to understand what options, and resulting impacts, are available. • Is rework an option? • Better access now than later? • Wait and watch? • What opportunities will be missed? • Access now vs. later? • Window between 6 and 8 towers? • What is the impact of use as is? • MCM may still work if tested with the EM TEM and EGSE used for the TKR test. • Suggested Tests: • Change GASU port (this may already have been done). • Change the cable between GASU and TEM • Change TEM • Connect TKR to EM TEM and EGSE.

  15. New Information • Question 1) The TKR cable interface to the TEM is the only feasible access point to the TKR, is there anything that can be learned (or confirmed) by making measurements at this interface? • Measuring the TKR/TEM interface could check whether there is an additional contributing problem on the cable.  It does not give any visibility to the internal MCM termination resistor. • Question 2) Is it likely that an MCM with the 100 ohm GCRC termination resistor would successfully make it through the TKR test regime and not fail until LAT level testing? • The fact that this MCM made it through the TKR test regime and then failed at LAT level testing may reflect a difference in the environment it sees in the LAT.  If it has the wrong termination resistor, then it will be sensitive to changes in the clock and voltage.  • Question 3) If multiple MCMs, say 10, contain the 100 ohm GCRC termination resistor, is it likely that all make it through the TKR test regime and have only 1 fail at LAT level testing?  • Yes.  When this problem was first found more than a year ago, it was only showing up in less than 10% of the MCMs when testing at low temperature.  Due to natural variations, some chips have more margin than others, even with the 100-ohm termination.

  16. New Information • Teledyne paperwork for rework of the termination resistor does not indicate that rework was performed on MCM 11377. However, the Teledyne paperwork is not trustworthy. Based on SLAC visual inspection, it is likely that the MCM does have the correct termination resistors. • The tower worked successfully with many EM2 EGSE at SLAC and in Italy, which indicates that it works with the majority of EM2 TEMs. (It also worked over certain ranges: DVDD = 2.5-2.8 V, clock frequency = 18-22 MHz, clock duty cycle = 42-58%.) • The tower started exhibiting the problem as soon as it was connected to a flight TEM, indicating that it does not like the clock characteristics of this TEM. • There is a chance that the tower will work with another flight TEM. Suggest installing another TEM on this TKR tower.

  17. Backup Charts

  18. GTC NCMR Report TEM/TPS 1835 NCRs - GTC

  19. Electronics NCR Report GTC NCMR Report TEM/TPS 1835 NCRs - SLAC

  20. Electronics NCR Report GTC NCMR Report TEM/TPS 1835 NCRs – SLAC (continued)

  21. Tracker NCR History • NCR 104 • Description • During the functional test at a temperature of -30C, the following errors occurred: • a) SN 1099, GTFE #23 sometimes did not read back correctly the calibration mask • b) SN 480, GTFE #22 sometimes did not read back correctly the data and trigger masks • c) SN 269, GTFE #23 sometimes did not read back correctly the data and trigger masks • d) SN 261, GTFEs #21, 22, 23 sometimes did not read back correctly the trigger and calibration masks • No errors occurred in the same setup at ambient temperature, at +25C, or at +60C. None of the other 11 MCMs in the setup gave any errors. The test procedure specifies that an NCR should be filled out in case of a failure at any of these temperatures, but the burn-in may proceed. Hence the burn-in at 85C is presently in progress for this set of 15 MCMs. • Disposition • 8/9/2004 2:19:01 PM marsh • All MCM units are to be reworked by replacing two 100 Ohm termination resistors with 75 Ohm resistors. Rework of these MCM units will be performed by Zentek. Source inspections will be performed 100% after rework, before conformal coat, and again at final inspection. Final acceptance of MCM units will be performed upon successful retesting of reworked MCMs. • 8/13/2004 8:08:31 AM marsh • NCR closure was approved by Bill Jimenez, LAT Quality Engineering, and Robert Johnson, LAT Tracker Subsystem Manager (e-mails on file). • Root Cause • Root cause has been determined to be an issue of crosstalk between the register readback output and the clock signal, which caused a glitch on the clock resulting in improper shifting of the register. • Corrective Action • Corrective Action consists of changing termination resistors R41 and R44 from 100 Ohm to 75 Ohm. Revision 10 to LAT-DS-00898 and Revision 9 to LAT-DS-00899 implement the change. Teledyne's production line will change over in the near future. Until then, all MCMs received from Teledyne (except S/Ns 259, 260, 263, 314, 346 [short], and 1088 [Tall]) with 100 Ohm termination resistors at R41 and R44 will be sent to Zentek for rework to the current revision. • Status • Closed

  22. Tracker NCR History • Attachment to NCR 104 (page 1 of 2) • NCR Supporting Document • Initiator Name: Robert Johnson • Found by: in-process test • Type of Nonconformance: minor • Discrepancy Level: flight hardware • Subsystem: Tracker • Item Description: MCM • Drawing #/Revision #: LAT-DS-00898-9 and LAT-DS-00899-8 • Supplier: Teledyne Electronic Technologies • Location: Los Angeles • Lot #: N/A • Serial Numbers: DS-00899: 1099; DS-00898: 480, 269, and 261 • Test Procedure: LAT-TD-002367-6 • Description of Nonconformance: • During the functional test at a temperature of –30C, the following errors occurred: • SN 1099, GTFE #23 sometimes did not read back correctly the calibration mask • SN 480, GTFE #22 sometimes did not read back correctly the data and trigger masks • SN 269, GTFE #23 sometimes did not read back correctly the data and trigger masks • SN 261, GTFEs #21, 22, 23 sometimes did not read back correctly the trigger and calibration masks • No errors occurred in the same setup at ambient temperature, at +25C, or at +60C. None of the other 11 MCMs in the setup gave any errors. The test procedure specifies that an NCR should be filled out in case of a failure at any of these temperatures, but the burn-in may proceed. Hence the burn-in at 85C is presently in progress for this set of 15 MCMs.

  23. Tracker NCR History • Attachment to NCR 104 (page 2 of 2) • Discussion: • The point in the MCM design with the most limited performance margin is well known to be the transfer of register information from the GTFE chip to the GTRC chip. This transfer is done by single-ended CMOS levels on a 3-state bus. Unlike the data readout, it was not made LVDS because register readback should never be done during running (and hence there is no issue of interference with the detectors). Also, unlike the data readout it was not made left-right redundant because failure of this functional feature would not significantly impair science operations. • The problem is that the 3-state bus runs the length of the MCM and has 26 drivers and receivers hanging on it. Therefore it has a large capacitance, and the driver on the GTFE chip takes quite a bit of time charging it up. This time is added to the time to send the LVDS clock from GTRC to GTFE, the time to receive the clock in the GTFE chip, and the time to pass through several layers of gates and buffers in the GTFE chip to move the data into and through the driver. If the data become stable on the bus (i.e. above the GTRC threshold) only after the arrival of the next clock edge, then the wrong bits get transferred and an error is detected. • The timing margin is affected by temperature, voltage, and clock frequency. In this case the –30C temperature is far below the Tracker operational range, by at least 50 degrees. Such a test demonstrates margin in some way, although in a much less straightforward way than simply raising the clock frequency. All of these MCMs passed the tests specified in the procedure LAT-PS-01971, including a test that was simultaneously 10% high in frequency and 5% low in voltage. It is very doubtful that these MCMs should be rejected as a result of the register readback errors because • the error occurs far outside the MCM operational range and even well outside of the Tracker test range. 
• the mask and calibration register readback are not essential features, in case that they really do fail during operation. The fault is always detectable, because the whole point of reading back the register is only to check the results against what was loaded. If there is a fault, one can easily verify that the register was loaded properly by doing a charge injection run, in which case the data comes back through a path that was proved even in these MCMs to work at all temperatures. • Nevertheless, first we need to be sure of the cause of the bit errors. It is possible that it is simply a result of bad solder connections on the cables in the test system. In fact we have had problems with that before. I recommend that following the burn-in we connect these 4 MCMs to different cables or in different locations on the same cables and repeat the test at –30C.
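
The timing argument in this discussion can be written as a simple budget: the readback works only if the clock distribution, clock receiver, gate and bus-charging delays together fit inside one clock period. The following sketch shows how the margin shrinks as the clock frequency rises. All delay values are hypothetical placeholders, not measurements, and the function name `readback_margin_ns` is invented here. The model also ignores setup time and duty cycle and treats temperature and voltage only implicitly, through the delay values.

```python
def readback_margin_ns(clock_hz: float, delays_ns: list[float]) -> float:
    """Time left (ns) between data becoming stable and the next clock edge.

    A negative result means the data settles after the next edge, so the
    wrong bits would be transferred.
    """
    period_ns = 1e9 / clock_hz
    return period_ns - sum(delays_ns)


# Hypothetical delay budget in nanoseconds (placeholders, not measured values).
delays = {
    "LVDS clock from GTRC to GTFE": 5.0,
    "clock receiver in GTFE": 4.0,
    "gates and buffers in GTFE": 8.0,
    "charging the 3-state bus": 25.0,
}

for mhz in (18, 20, 22, 25):
    margin = readback_margin_ns(mhz * 1e6, list(delays.values()))
    status = "ok" if margin > 0 else "ERROR: data late"
    print(f"{mhz} MHz: margin {margin:+.2f} ns ({status})")
```

In this picture, anything that lengthens a delay (for example a slower chip or a lower supply voltage) has the same effect as raising the clock frequency: it uses up margin.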

  24. Tracker NCR History • NCR 107 • Description • These MCMs had some event-data (from charge injection) readback errors only when reading to the right-hand side and only in the +60C test (they were fine at -30C and 25C). • We found that the errors from SN-366 went away if the VDD voltage is raised to 2.65V, but the errors from SN-592 did not. • We tried some other relevant temperatures as follows: • Tracker Acceptance Test Limit: 35C Both function perfectly. • Tracker Qualification Limit: 50C SN-366 functions perfectly but SN-592 has errors. • Disposition • Use-As-Is: all units which pass retest with 75-Ohm test cables, in addition to the duty cycle testing, are acceptable for flight use. • NCR closure approved by Bill Jimenez, LAT Quality Engineering, and Robert Johnson, LAT Tracker Subsystem Manager (e-mails on file). • Root Cause • MCM timing issue. Failures always occur in one of two internal memory buffers in the GTRC chips, while attempting to access them. Some chips are faster than others, and those modules that failed each have one GTRC chip that is on the tail of the distribution. In general, high temperature, high clock frequency, and high clock duty factor (> 50%) will all contribute to failing modules with marginal timing. Investigation determined that use of 100 Ohm termination resistors on the test cables further eroded the timing margins. • Corrective Action • An MRB conclusion was that replacement of the 100 Ohm termination resistors with 75 Ohm resistors would resolve the problem. • (1) Existing Flight and Test cables are all being reworked to incorporate 75 Ohm resistors. • (2) Drawings will be updated and all new cables will incorporate 75 Ohm resistors. • (3) Duty Cycle Testing will be performed on all MCMs prior to shipment to INFN. The Burn-in procedure has been modified by Dr. Johnson to include Duty Cycle Testing, and is under review, while implementation of the duty cycle testing in the burn-in script is in process by Marcus Ziegler. • (4) The ICD is being revised by Richard Bright to specify that the tracker function at 20MHz over the duty cycle range of 45% to 55%. • (5) Implementation of a test of this part of the ICD, in the Tracker test plan, is in process by Hiro Tajima • Status • Closed

  25. Tracker NCR History • Attachment to NCR 107 (page 1 of 3)

  26. Tracker NCR History • Attachment to NCR 107 (page 2 of 3)

  27. Tracker NCR History • Attachment to NCR 107 (page 3 of 3)

  28. Tracker SN 3 NCR History
