ZeroFunGame said:
In the IC electronics world, is there a metric that manufacturers put on their product that says if a certain % of the transistors fail, the circuit would still be operational? For example, if Apple tells TSMC that for their next iPhone chip, they will deliver an SoC that is 100% functional, but if after say 1 year of use 2% of the transistors fail, the likelihood of the IC still working is 70%? Is that something that is spec'd?
I think you're looking at this the wrong way. In ICs, transistors just "stopping working" is simply not a typical failure mode. If they work when they are manufactured, they typically keep working for a long, long time. The exception is an extreme environment, such as radiation or cold temperatures, where stress can cause devices to fail; in that case you have a mean time between failures (MTBF) specification. Also, there are many transistors in a device where a single failure would kill the whole chip (for example, a pull-up device in an accumulator, or a power-down switch).
In memories (especially DRAMs), soft errors (or single-event effects) are common. This means a bit in the RAM was flipped by a cosmic ray, heavy ion hit, or similar. This doesn't cause the transistors to fail permanently; rather, it causes a data error that the extra transistors included for error correction can fix.
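To make the error correction idea concrete, here is a minimal sketch using a textbook Hamming(7,4) code, which stores 4 data bits with 3 extra parity bits and can repair any single flipped bit. This is only an illustration: real memory ECC generally uses wider codes, and the function names here are my own, not anything from a vendor.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, error_position); error_position is 0 if clean."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The syndrome, read as a binary number, is the 1-based position
    # of the flipped bit (0 means no single-bit error was detected).
    error_position = s1 + 2 * s2 + 4 * s3
    if error_position:
        c[error_position - 1] ^= 1  # flip the bad bit back
    return [c[2], c[4], c[5], c[6]], error_position


if __name__ == "__main__":
    stored = hamming74_encode([1, 0, 1, 1])
    stored[4] ^= 1  # simulate a soft error flipping one stored bit
    data, where = hamming74_decode(stored)
    print(data, where)
```

The point is that the hardware is undamaged after the bit flip; the extra parity bits just let the read path recover the original data.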
Memories also typically have a lot of redundancy to handle manufacturing losses, so if a column of your memory is dead on arrival, it can still function fine in practice so you don't have to throw the chip away.
TSMC does not specify lifetime in the way that you suggest. Lifetime is determined (among other metrics) by the Iq of the device (its quiescent current draw), and chips die when they start leaking too much due to lattice damage. This happens faster at high temperature, but at room temperature the MTBF for a typical IC is thousands of years or more. The packaging and bonding will fail far earlier than the chip. The silicon itself is extremely reliable (once it has been demonstrated to work at all).
Performance (not functionality) can drift over time, especially for analog circuits. In this case, the circuits typically are quite programmable to adjust biases and gains and so on. The additional registers used to tune analog circuits are called chicken bits and are necessary in modern SoC-targeted processes due to the extreme performance variability of the raw devices. It is next to impossible to design a high-performance op amp in a modern nanometer process these days without some type of tuning or calibration.
There is a spec that TSMC will give Apple regarding manufacturing, and that is yield. Yield is the percentage of chips fabricated that actually work. It can vary wildly depending on a lot of factors, and is critically important to price. Once the nonrecurring engineering costs are paid, the price per wafer is typically fixed. This means higher yield gives more working chips per wafer, or equivalently, lower cost per chip. Apple and TSMC care very much about yield, believe me.
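As a rough sketch of why yield drives price, the arithmetic looks like this. The wafer cost and die count below are made-up numbers purely for illustration, and the function name is hypothetical.

```python
def cost_per_good_die(wafer_cost, dies_per_wafer, yield_fraction):
    """Cost of each working chip when the wafer price is fixed."""
    if not 0 < yield_fraction <= 1:
        raise ValueError("yield_fraction must be in (0, 1]")
    good_dies = dies_per_wafer * yield_fraction
    return wafer_cost / good_dies


if __name__ == "__main__":
    # Assumed figures for illustration only: a 10,000 (any currency)
    # wafer carrying 500 candidate dies.
    for y in (0.5, 0.8, 0.95):
        print(f"yield {y:.0%}: {cost_per_good_die(10_000, 500, y):.2f} per good die")
```

With the same fixed wafer price, going from 50% to 80% yield in this example drops the cost per working chip from 40 to 25, which is why both sides watch that number so closely.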