Accelerated Thermal Cycling: The Impact of Rate of Increase

Discussion Overview

The discussion revolves around the impact of the rate of temperature increase during accelerated thermal cycling testing of integrated circuits (ICs) and systems. Participants explore how this rate might influence reliability outcomes, referencing specific testing methodologies and models related to thermal cycling.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant questions whether the rate of temperature increase should be considered in the Temp Cycling Acceleration Factor, suggesting that rapid thermal shocks may yield different behaviors compared to slower transitions.
  • Another participant, with a background in systems, notes that while thermal ramp rates affect reliability testing in systems, they are uncertain about their impact on ICs, citing the controlled nature of IC fabrication and packaging.
  • Reference is made to the "STRIFE" accelerated life testing method, with one participant mentioning an optimal ramp rate of around 10C/minute for systems, cautioning against both too fast and too slow rates.
  • A participant emphasizes that thermal testing of ICs primarily addresses electro-mechanical stresses rather than circuit operation or lifetime, particularly in the context of non-volatile memory reliability qualification.
  • Discussion includes the complexity of building reliability models, with one participant stating that extensive testing is required to derive coefficients for reliability equations, rather than simply inputting values into existing equations.
  • Another participant expresses interest in finding simpler equations for understanding lifetime acceleration factors, acknowledging the complexity of the problem.
  • One participant provides a link to material fatigue concepts, suggesting that thermal cycling relates to mechanical stress factors.
  • Another participant shares insights on the STRIFE testing process, detailing the importance of statistical sampling and the specific conditions under which testing is conducted.

Areas of Agreement / Disagreement

Participants express differing views on the relevance and impact of thermal ramp rates on IC reliability testing, with some emphasizing the mechanical aspects while others focus on the electrical implications. The discussion remains unresolved regarding the specific effects of ramp rates on ICs compared to systems.

Contextual Notes

Participants note that the complexity of reliability modeling and the need for extensive testing may limit the availability of straightforward equations for understanding the effects of thermal cycling parameters.

ZeroFunGame
TL;DR
When ICs undergo accelerated testing via thermal cycling, doesn't the rate of temperature increase factor into the equation?
When ICs undergo accelerated testing via thermal cycling, wouldn't rate of temperature increase factor into the equation? For example, in Equation 4 in this pdf: http://www.issi.com/WW/pdf/semiconductor-reliability.pdf

The Temp Cycling Acceleration Factor is a function of Tmax,stress and Tmin,stress. Wouldn't it also be a function of the rate at which Tstress is increased? For example, if you thermal shock an IC from 0C to 100C in 1 second, it would probably behave differently than if you took it from 0C to 100C in 1 hour. Is this captured anywhere in the equation?
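
To make the question concrete, here is a minimal sketch of a temperature-cycling acceleration factor in the commonly used Coffin-Manson form, AF = (dT_stress / dT_use)^m. That form, the function name, the exponent m = 2, and the field temperatures are all assumptions for illustration; Equation 4 in the linked PDF may differ in detail. The point is that only the temperature swings appear, so nothing in this form distinguishes a 1-second ramp from a 1-hour ramp.

```python
def temp_cycling_af(t_max_stress, t_min_stress, t_max_use, t_min_use, m):
    """Coffin-Manson style acceleration factor (assumed form, not copied from the PDF)."""
    delta_stress = t_max_stress - t_min_stress
    delta_use = t_max_use - t_min_use
    if delta_stress <= 0 or delta_use <= 0:
        raise ValueError("each max temperature must exceed its min temperature")
    # Only the two temperature swings enter; ramp time is not an input.
    return (delta_stress / delta_use) ** m


# Hypothetical numbers: stress cycle 0C..100C, field cycle 20C..70C, exponent m = 2.
af = temp_cycling_af(100.0, 0.0, 70.0, 20.0, m=2.0)
print(af)
```

With these made-up inputs the stress swing is twice the field swing, so the factor is 2 squared.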
 
My background is more in systems versus ICs; the thermal ramp rate definitely affects the results of reliability testing of systems, but I'm not so sure about ICs.

Based on "STRIFE" accelerated life testing developed by Hewlett Packard, the optimal ramp rate for system thermal cycle testing is around 10C/minute. If you go faster than that, you get false failures from the overstress. If you go slower than that, you may not catch all of the potential long-term reliability problems that this type of Accelerated Life Testing is trying to find.

Thermal testing of ICs is different, in my experience. The fabrication and packaging of ICs is a pretty well-known and controlled science, unless you are trying to qualify a new lead frame or a new IC package or something. Thermal ramp testing is mostly about electro-mechanical stresses, and not so much about circuit operation or lifetime.

High temperature testing is used in non-volatile memory reliability qualification, since leakage from storage gates scales with temperature. But fast temperature ramps would not show up any design issues with flash memory that I'm aware of.
 
berkeman said:
My background is more in systems versus ICs; the thermal ramp rate definitely affects the results of reliability testing of systems, but I'm not so sure about ICs.

Thanks for the response! Would you happen to know of any links on STRIFE? Like a STRIFE handbook? There's sparse information from a quick google search.

Also, would you happen to know if there are equations that can be used to model the reliability (or accelerated lifetime) as a function of ramp time and dwell time at the peaks?
 
Keep in mind I'm by no means an expert in cyclic fatigue issues.

My understanding of things like the Coffin-Manson type reliability models is that you need to do a lot of testing with different parameters and then solve for the coefficients in the equation to build a reliability model, not the other way around. I.e., short of some serious FEA tool with validated fatigue/creep models etc., I don't think you can just put numbers into an equation and get any sort of reasonable result. The equations we use are far more complicated; however, I cannot share them here because we have these standards under NDA, e.g. LV324.
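
As a rough illustration of "test first, then solve for the coefficients", the sketch below fits the two coefficients of a basic Coffin-Manson relation, N_f = C * dT^(-m), by least squares on log-transformed data. The function name and the data are hypothetical (the cycle counts are synthetic, generated from C = 1e7 and m = 2), and a real model would carry many more terms than this.

```python
import math


def fit_coffin_manson(delta_ts, cycles_to_failure):
    """Fit N_f = C * dT**(-m) by linear regression of log(N_f) on log(dT)."""
    xs = [math.log(dt) for dt in delta_ts]
    ys = [math.log(n) for n in cycles_to_failure]
    count = len(xs)
    x_mean = sum(xs) / count
    y_mean = sum(ys) / count
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # slope of the log-log line equals -m
    intercept = y_mean - slope * x_mean  # intercept equals log(C)
    return math.exp(intercept), -slope


# Synthetic test results (made up): three temperature swings in degrees C.
delta_ts = [50.0, 100.0, 150.0]
cycles = [1e7 * dt ** -2.0 for dt in delta_ts]

c_fit, m_fit = fit_coffin_manson(delta_ts, cycles)
print(c_fit, m_fit)
```

Because the synthetic data follow the assumed law exactly, the fit recovers the generating coefficients; real test data would scatter, and the fitted values would only hold for the package and failure mode that was tested.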

Regarding cycle times and duration, as you have observed, cycle time, duration, temp swing, and average temperature all excite different failure modes, which should be accounted for in the model. So for example LV324 requires 5 tests just to exercise cyclic fatigue: passive thermal shock, short cycle power temp cycle (PTC) at two different delta Tj, and long cycle PTC at two delta Tj. The results of all these tests then get plugged into a fancy equation to generate a predictive model.
 
Thanks! Any chance there's a rudimentary equation that's used in textbooks? I recognize that fancy equations are needed to model an actual product, but I'm just trying to get a back-of-the-envelope understanding of how these factors potentially play into the lifetime acceleration factor. Or perhaps a basic equation just does not exist due to the complexity of the problem.
 
Thermal cycling is usually a mechanical stress factor, so the mechanical structure of the UUT is critical for an understanding of the problem. As for napkin calculations, here is a reference for IGBT modules ... we're going to need a bigger napkin.
 
ZeroFunGame said:
Thanks for the response! Would you happen to know of any links on STRIFE? Like a STRIFE handbook? There's sparse information from a quick google search.
Yeah, there may not be much published in the public literature, since the process was developed at HP to help make the reliability of their products better than their competitors. Many of those of us who worked at HP back then have moved on to other companies, and brought the STRIFE reliability testing paradigm along with us.

Here is a diagram from one of my old Product Design Specification documents, showing the STRIFE cycle profile that I used on that product during reliability testing. You want to put enough test units into the oven to give you good statistics, and you want to run them long enough to give things a chance to break if that part of the design has issues. For larger devices (like PC-size), you can usually get away with something like 10 test units. On smaller devices like network transceivers, you will want to be testing more like 100 at a time. Test times usually ran for several days, with about 90 minutes per cycle. We use liquid CO2 dewars and ovens to get the 10 degree C per minute ramp rates and -50C capability. You will typically test to temperatures 10-15C outside of the operating temperature range specified on the product datasheet. The product below was rated for the Industrial Temperature Range of -40C to +85C.
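
The numbers above can be turned into a rough timing budget. This is a minimal sketch, assuming a simple symmetric profile (one ramp up, one ramp down, equal dwells at each extreme) and a hypothetical upper limit of +95C, i.e. 10C above the +85C rating; the real profile in the diagram also includes power-off periods and the CO2 pre-cool notches, which are ignored here. The function name is made up for illustration.

```python
def strife_cycle_budget(t_low_c, t_high_c, ramp_rate_c_per_min, cycle_minutes):
    """Split one thermal cycle into ramp time and dwell time (symmetric profile assumed)."""
    span_c = t_high_c - t_low_c
    ramp_minutes = span_c / ramp_rate_c_per_min        # time for one ramp, up or down
    dwell_total = cycle_minutes - 2.0 * ramp_minutes   # time left over for both dwells
    if dwell_total < 0:
        raise ValueError("cycle is too short for two ramps at this rate")
    return ramp_minutes, dwell_total / 2.0


# -50C comes from the post; +95C is an assumed 10C margin above the +85C rating.
ramp_min, dwell_min = strife_cycle_budget(-50.0, 95.0, 10.0, 90.0)
cycles_per_day = 24 * 60 / 90.0

print(ramp_min, dwell_min, cycles_per_day)
```

Under these assumptions each ramp takes 14.5 minutes, each dwell is about half an hour, and the chamber completes 16 cycles per day, so a test lasting several days accumulates on the order of 50 to 100 cycles per unit.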

In the diagram, the cross-hatch periods show when we shut off the power to the UUTs. This is done to be sure that their power supply circuits can restart at both high and low temperature. The little notches in the start of the cooling profile (when starting to ramp down from the highest temperature) are there because the liquid CO2 needs to flow in the hose from the dewar to the oven for a few seconds in order to cool down the hose so that the liquid makes it all the way through the hose to the solenoid valve in the oven. After a dwell time at high temperature, the CO2 in the hose is in the gaseous state, so there is no cooling at first when the oven solenoid opens to try to start the cooling ramp down. That would mess up the 10C/minute integrity of the cooling ramp if we did not wait a minute or two to let the CO2 in the hose cool down.

We've found many problems over the years in product designs with STRIFE testing, and kept those issues from making it into the field in the final products. We've found everything from missing ground connections in inner layers of PCBs, to mechanically over-constrained heat-sinked power transistors, to marginal memory timing issues, to power supply start-up issues at low- or high-temperatures, and many more. :smile:
[Attached image: STRIFE cycle profile diagram, 1574441394108.png]
 
ZeroFunGame said:
Summary: when ICs undergo accelerated testing via thermal cycling, doesn't rate of temperature increase factor into the equation?

It does, and the thing you're trying to avoid is called thermal shock. If one cycles too fast, one gets temperature gradients on the thing getting cycled, and those gradients lead to shock.

https://en.wikipedia.org/wiki/Thermal_shock
The size and magnitude of the gradient is hugely material and geometry dependent, so I am not aware of any general lifetime equations that could include it. Basically, to keep the general model valid, one should put temp sensors on the thing getting cycled, and if you observe (subjectively) risky gradients, slow down.
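
The "watch the sensors and slow down" rule can be sketched as a simple check. Everything specific here is an assumption for illustration: the function name, the 15C limit, and the halving of the ramp rate are made up, and the "gradient" is approximated as the spread between the hottest and coldest sensor rather than a true spatial gradient.

```python
def adjust_ramp_rate(sensor_temps_c, ramp_rate_c_per_min, max_spread_c, backoff=0.5):
    """Return (new ramp rate, observed spread); slow down if the spread is too large."""
    spread_c = max(sensor_temps_c) - min(sensor_temps_c)
    if spread_c > max_spread_c:
        # Spread across the unit exceeds the (hypothetical) limit: back off the ramp.
        return ramp_rate_c_per_min * backoff, spread_c
    return ramp_rate_c_per_min, spread_c


# Hypothetical readings from three sensors on the unit, in degrees C.
new_rate, spread = adjust_ramp_rate([25.0, 31.0, 42.0], 10.0, max_spread_c=15.0)
print(new_rate, spread)

# A more uniform unit leaves the ramp rate unchanged.
same_rate, small_spread = adjust_ramp_rate([25.0, 30.0], 10.0, max_spread_c=15.0)
```

What counts as a risky spread depends on the materials and geometry, as noted above, so the limit would have to come from the part under test rather than from a general rule.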

It's pretty hard to shock an SoC (or most ICs) though. Think about how fast a CPU core can heat relative to its cache, for example. Normal operation of most ICs creates large power gradients across the die, and the mechanical engineers (hopefully) created paths with low thermal impedance to get that concentrated power out. Those low impedance paths make it difficult to create a significant gradient across the die via external things (like a thermal chamber).
 