Accelerated Thermal Cycling: The Impact of Rate of Increase

ZeroFunGame · Nov 19, 2019

When ICs undergo accelerated testing via thermal cycling, wouldn't the rate of temperature increase factor into the equation? For example, see Equation 4 in this PDF: http://www.issi.com/WW/pdf/semiconductor-reliability.pdf

The Temp Cycling Acceleration Factor is a function of Tmax,stress and Tmin,stress. Wouldn't it also be a function of the rate at which Tstress is increased? For example, if you thermal shock an IC from 0C to 100C in 1 second, it would probably behave differently than if you ramped it from 0C to 100C in 1 hr. Is this captured anywhere in the equation?
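
For reference, temperature-cycling acceleration factors of this kind are commonly written in the Coffin-Manson form, AF = (dT_stress / dT_use)^n. The sketch below assumes that form; it does not reproduce Equation 4 of the linked PDF, and the use-condition range and the exponent n are hypothetical values chosen only for illustration. It shows that only the temperature swings enter, so a 1-second ramp and a 1-hour ramp give the same number.

```python
def coffin_manson_af(t_max_stress, t_min_stress, t_max_use, t_min_use, n):
    """Acceleration factor in the Coffin-Manson form: (dT_stress / dT_use) ** n.

    Temperatures are in degrees C; only the differences matter.
    There is no ramp-rate or dwell-time argument in this form.
    """
    delta_stress = t_max_stress - t_min_stress
    delta_use = t_max_use - t_min_use
    if delta_stress <= 0 or delta_use <= 0:
        raise ValueError("Tmax must be greater than Tmin for both conditions")
    return (delta_stress / delta_use) ** n


# Hypothetical example: stress cycle 0C..100C, use cycle 20C..60C, n = 2.
# The exponent depends on the failure mechanism and must come from data.
af = coffin_manson_af(100.0, 0.0, 60.0, 20.0, n=2.0)
print(f"Acceleration factor: {af:.2f}")
```

Any ramp-rate dependence would have to come from an additional term or from a different model.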

berkeman · Nov 19, 2019

My background is more in systems versus ICs; the thermal ramp rate definitely affects the results of reliability testing of systems, but I'm not so sure about ICs.

Based on "STRIFE" accelerated life testing developed by Hewlett Packard, the optimal ramp rate for system thermal cycle testing is around 10C/minute. If you go faster than that, you get false failures from the overstress. If you go slower than that, you may not catch all of the potential long-term reliability problems that this type of Accelerated Life Testing is trying to find.

Thermal testing of ICs is different, in my experience. The fabrication and packaging of ICs is a pretty well-known and controlled science, unless you are trying to qualify a new lead frame or a new IC package or something. Thermal ramp testing is mostly about electro-mechanical stresses, and not so much about circuit operation or lifetime.

High temperature testing is used in non-volatile memory reliability qualification, since leakage from storage gates scales with temperature. But fast temperature ramps would not reveal any design issues with flash memory that I'm aware of.

ZeroFunGame · Nov 21, 2019

Thanks for the response! Would you happen to know of any links on STRIFE? Like a STRIFE handbook? A quick Google search turns up only sparse information.

Also, would you happen to know if there are equations that can be used to model the reliability (or accelerated lifetime) as a function of ramp time and dwell time at the peaks?

essenmein · Nov 21, 2019

Keep in mind I'm by no means an expert in cyclic fatigue issues.

My understanding of things like the Coffin-Manson type reliability models is that you need to do a lot of testing with different parameters and then solve for the coefficients in the equation to build a reliability model, not the other way around. I.e., short of some serious FEA tool with validated fatigue/creep models etc., I don't think you can just put numbers into an equation and get any sort of reasonable result. The equations we use are far more complicated; however, I cannot share them here because we have these standards under NDA, e.g. LV324.
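
The "test first, then solve for the coefficients" workflow can be sketched for the simplest textbook case. This assumes the single-stress Coffin-Manson relation N_f = C * dT^(-n), which becomes a straight line in log-log space, and fits C and n by ordinary least squares. The data points are synthetic (generated from made-up values of C and n), not measurements, and this is not the model used in LV324 or any other standard.

```python
import math


def fit_coffin_manson(delta_ts, cycles_to_failure):
    """Least-squares fit of log(N_f) = log(C) - n * log(dT).

    Returns (C, n). Inputs are equal-length sequences of positive numbers.
    """
    xs = [math.log(dt) for dt in delta_ts]
    ys = [math.log(nf) for nf in cycles_to_failure]
    count = len(xs)
    x_mean = sum(xs) / count
    y_mean = sum(ys) / count
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -slope


# Synthetic "test results": temperature swings (C) and cycles to failure
# generated from hypothetical C = 1e7 and n = 2, purely for illustration.
delta_ts = [60.0, 100.0, 140.0, 180.0]
cycles = [1e7 * dt ** -2.0 for dt in delta_ts]

c_fit, n_fit = fit_coffin_manson(delta_ts, cycles)
print(f"C = {c_fit:.3g}, n = {n_fit:.3f}")
```

With real test data the points scatter around the line, and the fit is only meaningful for the failure mechanism and package that were actually tested.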

Regarding cycle times and duration: as you have observed, cycle time, duration, temperature swing, and average temperature all excite different failure modes, which should be accounted for in the model. For example, LV324 requires 5 tests just to exercise cyclic fatigue: passive thermal shock, short-cycle power temperature cycling (PTC) at two different delta Tj, and long-cycle PTC at two delta Tj. The results of all these tests then get plugged into a fancy equation to generate a predictive model.

ZeroFunGame · Nov 21, 2019

Thanks! Any chance there's a rudimentary equation that's used in textbooks? I recognize that fancy equations are needed to model an actual product, but I'm just trying to get a back-of-the-envelope understanding of how these factors potentially play into the lifetime acceleration factor. Or perhaps a basic equation just does not exist due to the complexity of the problem.

essenmein · Nov 21, 2019

Since thermal cycling and power temp cycling both drive thermal-strain-related fatigue issues (i.e. mechanical, not electrical, problems), here is a good place to start:

https://en.wikipedia.org/wiki/Fatigue_(material)

Windadct · Nov 22, 2019

Thermal cycling is usually a mechanical stress factor, so the mechanical structure of the UUT is critical for understanding the problem. As for napkin calculations, here is a reference for IGBT modules ... we're going to need a bigger napkin.

berkeman · Nov 22, 2019

ZeroFunGame said:

Thanks for the response! Would you happen to know of any links on STRIFE? Like a STRIFE handbook? There's sparse information from a quick google search.

Yeah, there may not be much published in the public literature, since the process was developed at HP to help make the reliability of their products better than their competitors'. Many of us who worked at HP back then have moved on to other companies and brought the STRIFE reliability testing paradigm along with us.

Here is a diagram from one of my old Product Design Specification documents, showing the STRIFE cycle profile that I used on that product during reliability testing. You want to put enough test units into the oven to give you good statistics, and you want to run them long enough to give things a chance to break if that part of the design has issues. For larger devices (like PC-size), you can usually get away with something like 10 test units. On smaller devices like network transceivers, you will want to be testing more like 100 at a time. Test times usually ran for several days, with about 90 minutes per cycle. We use liquid CO2 dewars and ovens to get the 10 degree C per minute ramp rates and -50C capability. You will typically test to temperatures 10-15C outside of the operating temperature range specified on the product datasheet. The product below was rated for the Industrial Temperature Range of -40C to +85C.
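
As a rough cross-check of the numbers above, the arithmetic for one cycle can be sketched as follows. The 10 C/minute ramp, the 90-minute cycle, and the -50C lower limit come from the post; the +95C upper limit (10C above the +85C rating) and the 3-day run length are assumptions picked from within the stated ranges, and the power-off periods and CO2 notches in the real profile are ignored.

```python
def strife_cycle_budget(t_low_c, t_high_c, ramp_c_per_min, cycle_min, test_days):
    """Split one thermal cycle into ramp and dwell time, and count cycles.

    Returns (ramp_minutes_one_way, total_dwell_minutes, whole_cycles_in_test).
    """
    ramp_minutes = (t_high_c - t_low_c) / ramp_c_per_min
    dwell_minutes = cycle_min - 2 * ramp_minutes  # one ramp up, one ramp down
    if dwell_minutes < 0:
        raise ValueError("cycle is too short for two ramps at this rate")
    whole_cycles = (test_days * 24 * 60) // cycle_min
    return ramp_minutes, dwell_minutes, whole_cycles


# Assumed profile: -50C to +95C, 10 C/min, 90-minute cycle, 3-day test.
ramp, dwell, cycles = strife_cycle_budget(-50, 95, 10.0, 90, 3)
print(f"Ramp: {ramp} min each way, dwell: {dwell} min per cycle, "
      f"{cycles} cycles in the test")
```

Under these assumptions each ramp takes 14.5 minutes, which leaves about an hour of each 90-minute cycle for dwell at the two extremes.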

In the diagram, the cross-hatch periods show when we shut off the power to the UUTs. This is done to be sure that their power supply circuits can restart at both high and low temperature. The little notches in the start of the cooling profile (when starting to ramp down from the highest temperature) are there because the liquid CO2 needs to flow in the hose from the dewar to the oven for a few seconds in order to cool down the hose so that the liquid makes it all the way through the hose to the solenoid valve in the oven. After a dwell time at high temperature, the CO2 in the hose is in the gaseous state, so there is no cooling at first when the oven solenoid opens to try to start the cooling ramp down. That would mess up the 10C/minute integrity of the cooling ramp if we did not wait a minute or two to let the CO2 in the hose cool down.

We've found many problems over the years in product designs with STRIFE testing, and kept those issues from making it into the field in the final products. We've found everything from missing ground connections in inner layers of PCBs, to mechanically over-constrained heat-sinked power transistors, to marginal memory timing issues, to power supply start-up issues at low- or high-temperatures, and many more.

eq1 · Nov 22, 2019

ZeroFunGame said:

Summary: when ICs undergo accelerated testing via thermal cycling, doesn't rate of temperature increase factor into the equation?

It does, and the thing you're trying to avoid is called thermal shock. If one cycles too fast, one gets temperature gradients on the thing getting cycled, and those gradients lead to shock.

https://en.wikipedia.org/wiki/Thermal_shock

The size and magnitude of the gradient is hugely material and geometry dependent, so I am not aware of any general lifetime equations that could include it. Basically, to keep the general model valid, one should put temp sensors on the thing getting cycled and, if you observe (subjectively) risky gradients, slow down.
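
The sensor check described above could be sketched like this. It assumes the readings are logged as one tuple of sensor temperatures per time step and uses the hottest-minus-coldest spread as a crude stand-in for the gradient. The sample readings and the limits are made up; as noted, what counts as a risky gradient is a judgment call that depends on material and geometry.

```python
def max_spread(samples):
    """Largest hottest-minus-coldest difference (C) seen at any time step.

    samples: list of tuples, one tuple of sensor temperatures per time step.
    """
    return max(max(step) - min(step) for step in samples)


def ramp_too_fast(samples, limit_c):
    """True if the spread across sensors ever exceeds the chosen limit."""
    return max_spread(samples) > limit_c


# Made-up log from three sensors on one unit during a heating ramp.
samples = [
    (25.0, 25.0, 25.0),
    (40.0, 33.0, 29.0),
    (60.0, 52.0, 47.0),
    (85.0, 83.0, 82.0),
]

spread = max_spread(samples)
print(f"Worst spread: {spread} C, slow down: {ramp_too_fast(samples, 10.0)}")
```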

It's pretty hard to shock an SoC (or most ICs), though. Think about how fast a CPU core can heat relative to its cache, for example. Normal operation of most ICs creates large power gradients across the die, and the mechanical engineers (hopefully) created paths with low thermal impedance to get that concentrated power out. Those low-impedance paths make it difficult to create a significant gradient across the die via external things (like a thermal chamber).
