Tracker Timing Margin MRB
R.P. Johnson, U.C. Santa Cruz
October 15, 2004
NCRs
• 107: failure of the strip data readout in the burn-in system at +60C.
• 114: same as 107.
• 118: same as 107.
• 139: same as 107, but some failures even at room temperature.
• 193: about 10% of the MCMs have poor timing margins with respect to clock duty cycle.
1. The Issues
• A few MCMs sent back incorrect data on every other event when operated at +60C (NCR 107).
• Even more fail at +85C.
• While investigating that, we found that the same error symptoms could be provoked in any MCM in the burn-in system by
  • raising the frequency enough,
  • lowering the voltage enough,
  • raising the clock duty cycle enough,
  • or some combination of these and temperature.
• Some MCMs have very low duty-cycle margins at 20 MHz when operated with burn-in cables or flight cables. NCR 193 lists several that fail below 55% duty cycle (from an external clock source).
Duty Cycle Margins
[Figure: duty-cycle margin results from testing 36 MCMs in the burn-in station]
No significant improvements were seen using flight cables without connector savers.
No obvious pattern repeats from CC to CC or from one set of 36 MCMs to another.
Other Facts
• The errors always occur in every other event, and are always associated with the same 1 of 2 GTRC registers (or 2 of 4 GTFE registers).
• The errors are never associated with bad parity detections.
• The errors are never associated with trigger tag mismatches.
• Under a given set of conditions, the problem generally occurs on only one side of the readout (left or right).
• The error generally does not occur in the MCM test station, except at much higher frequency.
Therefore:
• The problem cannot be internal to the GTFE.
• It cannot be in the transmission from GTFE to GTRC.
• It cannot be in the transmission from GTRC to TEM.
• It must be internal to the GTRC, probably in the reading and/or writing of one of the two internal memory buffers.
• It is also related somehow to the cables.
2. Additional Testing?
• Recently we started measuring the margins with respect to duty cycle in each burn-in setup (36 MCMs at a time).
• Up to now there has been no documented requirement on this margin, but I have started placing into NCR 193 all MCMs that show any problems below 55% duty cycle.
• It would probably be a good idea to add this test to the LAT-TD-02367 procedure (environmental test and burn-in).
• (However, I recommend that we first reduce the clock-bus termination resistance in the burn-in cables from 100 ohms to 75 ohms.)
3. Hypothesis
• Suspected root cause:
  • The errors occur when a timing margin is exceeded in the internal GTRC communication with one of its two memory buffers.
  • The problem is exacerbated by the relatively poor quality of the clock on the flex-circuit cables, compared with the clean clock supplied over a very short cable in the MCM test stand.
• We can test this by
  • studying the clock signal on the cables (see the scope traces on the following slides);
  • studying the timing margin while varying characteristics of the cables.
    • Our first attempt was to see whether replacing the burn-in cables with flight cables and removing the connector savers would help. This did not significantly improve the situation.
    • Second, we tried varying the termination resistor on the cable. This had a significant effect!
[Figure: duty-cycle timing margin for cable termination resistances of 150, 100, 75, and 50 ohms; duty-cycle scale from 35% to 69%]
The timing margin with respect to duty cycle steadily improves as the termination resistance is lowered. These measurements were made using a flight cable (now designated nonflight) and no connector savers. Just going from 100 ohms down to 75 ohms will make all of our MCMs function up to at least 55% duty cycle.
Scope Traces
[Figure: 1 MHz clock with 50-ohm and 100-ohm termination]
Clock on the cable, measured at the connector of the MCM closest to the TEM.
[Figure: 20 MHz clock with 50, 75, 100, and 150 ohm termination]
Clock measurements made at position 0 on the cable (the MCM connector nearest to the TEM). At 20 MHz the observed amplitude hardly depends on the resistance.
[Figure: 20 MHz clock with 50, 75, 100, and 150 ohm termination]
Clock measurements made at position 4 on the cable.
[Figure: 20 MHz clock with 50, 75, 100, and 150 ohm termination]
Clock measurements made at position 8 on the cable (nearest to the termination).
20 MHz, Cable Position 8
[Figure panels: 50-ohm and 100-ohm termination, each at 50%, 55%, and 60% duty cycle]
The duty cycle that we see on the cable is always 5 or 6 percentage points greater than what we supply to the EGSE system from the external LeCroy pulser.
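To make the size of this stretch concrete at 20 MHz (a 50 ns period), the sketch below converts a duty cycle into high and low pulse widths and applies a fixed offset between the pulser setting and the cable. It assumes a constant 5-point offset, the low end of the 5 to 6 points observed on the scope; the linear model and the function names are illustrative assumptions, not a description of the EGSE.

```python
# Sketch: pulse widths for a given clock duty cycle, plus an ASSUMED fixed
# stretch between the external pulser setting and the Tracker cable.
# The 5-point offset is taken from the scope observations; it is not a spec.

def pulse_widths_ns(freq_mhz: float, duty_pct: float) -> tuple[float, float]:
    """Return (high_ns, low_ns) for a clock of the given frequency and duty cycle."""
    period_ns = 1000.0 / freq_mhz          # 20 MHz -> 50 ns
    high_ns = period_ns * duty_pct / 100.0
    return high_ns, period_ns - high_ns


def cable_duty_pct(external_duty_pct: float, stretch_pct: float = 5.0) -> float:
    """Hypothetical linear model: cable duty cycle = pulser setting + stretch."""
    return external_duty_pct + stretch_pct


if __name__ == "__main__":
    for ext in (35.0, 50.0, 55.0):
        on_cable = cable_duty_pct(ext)
        high, low = pulse_widths_ns(20.0, on_cable)
        print(f"external {ext:.0f}% -> cable {on_cable:.0f}%: "
              f"high {high:.1f} ns, low {low:.1f} ns")
```

Under this assumption a 55% pulser setting becomes about 60% on the cable, so the low phase of the 20 MHz clock shrinks from 22.5 ns to 20 ns; that is the sense in which the EGSE stretch eats into the available timing margin.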
Clock on the MCM
[Figure panels, all at 55% duty cycle: 50 ohms, cable RC 8; 50 ohms, MCM 8; 100 ohms, cable RC 8; 100 ohms, MCM 8]
The clock output from the GTRC can be seen at the termination resistor on the internal MCM clock bus (the MCM 8 plots). The 100-ohm cable termination appears to give a slightly larger duty cycle on the MCM bus.
Conclusion
• The EGSE stretches the duty cycle by about 5%, comparing what we see coming out of the CC with what we input into the VME crate. This eats into the Tracker timing margin.
• When the duty cycle or frequency gets too large, the first point of failure on the MCM is always one of the two memory buffers in the GTRC.
• There is a surprisingly large variation among the GTRC chips in their sensitivity to this.
• The problem is far worse when using the TEM plus flex-circuit cables than when using the test-interface board of the MCM test stand.
• Although it is hard to understand exactly why, an unmistakable improvement in margin can be obtained for all MCMs by lowering the clock termination resistance on the cables.
4. Impact to Inventory
• If we solve this problem by simply discarding all MCMs with low margin, we will lose 10% to 15% of our inventory from this alone. That would probably require augmenting the production at Teledyne.
5. Suggested Corrective Action
• Build Tower-A with cables as-is, but screen the MCMs (already done) to remove those with low margin.
• Rework all cables already assembled to replace the 100-ohm clock-bus termination resistor with a 75-ohm resistor (we already have this part in the MCM production parts stock).
• Introduce an ECO into the Parlex production asap, so that cables get built with the 75-ohm termination.
• Make the same resistor change on the burn-in system cables.
• Add the duty-cycle screening to the burn-in procedure (LAT-TD-02367). Retest all MCMs that have already gone through burn-in without this screening.
• Reject MCMs that do not function over the full range of 35% to 55% in duty cycle of the external clock source (this is more like 40% to 60% duty cycle on the Tracker cable).
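The rejection criterion in the last bullet amounts to a pass/fail check over a duty-cycle scan. A minimal sketch, assuming each MCM's scan is recorded as a mapping from external-clock duty cycle (%) to whether the readout was correct at that setting; the data layout and the name `passes_screen` are hypothetical and not part of the LAT-TD-02367 procedure.

```python
from collections.abc import Mapping

# Required operating window for the external clock source, from the
# suggested corrective action (roughly 40% to 60% on the Tracker cable).
REQUIRED_MIN_PCT = 35.0
REQUIRED_MAX_PCT = 55.0


def passes_screen(scan: Mapping[float, bool],
                  lo: float = REQUIRED_MIN_PCT,
                  hi: float = REQUIRED_MAX_PCT) -> bool:
    """Accept an MCM only if both window endpoints were scanned and every
    scanned setting inside [lo, hi] returned correct data.
    Failures outside the window do not cause rejection."""
    in_window = {dc: ok for dc, ok in scan.items() if lo <= dc <= hi}
    if lo not in in_window or hi not in in_window:
        return False  # scan does not cover the full required range
    return all(in_window.values())


if __name__ == "__main__":
    good = {35.0: True, 45.0: True, 55.0: True, 60.0: False}
    marginal = {35.0: True, 45.0: True, 50.0: False, 55.0: True}
    print(passes_screen(good), passes_screen(marginal))
```

Requiring the endpoints to be present guards against accepting an MCM whose scan simply never reached 55%; a failure at 60% is tolerated because it lies outside the required window.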
Impacts to Other Systems
• Request that the electronics group verify that, in the flight instrument, the Tracker clock duty cycle will fall well within our MCM test range.
• Add this requirement to the Tracker ICD.
Cost-Schedule Impact
• Resistor cost is negligible.
• Cost of the rework and Parlex ECO is unknown, but probably in the low four figures.
• The cable rework for at least one set needs to be done by the end of November for Tower-B.
• The proposed new testing at SLAC can be accomplished with existing labor and negligible impact to the delivery schedule.
6. Effectiveness of the Corrective Action
• There is no change in the design performance capability.
• Based on our available data, we expect that this change will recover all, or nearly all, of the MCMs impacted by the subject NCRs.
• Reliability of the system will be enhanced. The margins will be improved on all MCMs, significantly decreasing the likelihood of finding timing problems in the integrated system.
7. Recommended Disposition
• A few of the subject MCMs have already been dispositioned for EGSE work.
• Retest all of the remaining subject MCMs after the burn-in system and procedure have been upgraded. Use for flight those that pass.