Tabula Rasa Semantics, in Microprocessor Burn-in. Part-III

CPU by KeithSuppe @ 2003-08-04

There are a great many overclockers who swear by the benefits of burn-in, and from a prima facie perspective it "seems" logical. Yet the truth of the matter is that the moment current begins to pass through an electronic device, its finite existence is predestined.


Burning it In

Contrary to the convictions of many enthusiasts, "burn-in" can only have deleterious long-term effects on a processor's performance. Burn-in is not a temporary condition; its effects are permanent. And immediately following burn-in, the microprocessor is only stressed further by overclocking. It's possible enthusiasts have confused the term burn-in with the manufacturing version, and thereby believe it is safe. But do not be misled by those claiming burn-in is safe "because manufacturers" put all their chips through these rigors. There is a significant difference in the semantics of burn-in as it applies to industry standards; the methodologies are completely different. For example:


There is burn-in and then there is burn-in. In semiconductor manufacturing terminology "burn-in" is a stage of the production flow after packaging in which the CPU is placed in an elevated temperature environment and is stressed at atypical operating conditions. The end goal of this is to dramatically reduce the statistical probability of "infant mortality" failures of product on the street. "Infant mortality" is a characteristic of any form of complex manufacturing in that if you were to plot device failures in the y-axis and time in the x-axis, the graph should look like a "U". As the device is used, initially quite a few fail but as time goes on this number drops off (you are in the bottom of the "U" in the graph). As the designed life of the product is reached and exceeded, the failure count rises back up again. Burn-in is designed to catch the initial failures before the product is shipped to customers and to put the product solidly in the bottom section of the "U" graph in which few failures occur. During this process there is a noticeable and measurable circuitry slow-down on the chip that is an unfortunate by-product of the process of running at the burn-in operating point. You put a fast chip into the burn-in ovens and it will always come out of the ovens slower than when it went in - but the ones that were likely to fail early on are dead and not shipped to customers...There are two mechanisms that cause the circuitry in CMOS - particularly modern sub-micron CMOS - to slow down when undergoing the burn-in process: PMOS bias-temperature instability (PMOS BTI) and NMOS hot-electron gate-impact ionization (known as "NMOS hot-e"). Both of these effects are complex quantum-electrical effects that result in circuitry slowing down over time.
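The "U" the quote describes is the classic bathtub curve. As a rough illustration only, the Python sketch below models the failure rate as the sum of a falling infant-mortality term, a constant random term, and a rising wear-out term; every parameter value here is invented for demonstration and does not come from any manufacturer's data.

    # Illustrative bathtub ("U") failure-rate curve: infant mortality + random + wear-out.
    # All parameters are invented for demonstration; they are not real CPU data.

    def weibull_hazard(t, beta, eta):
        """Weibull hazard (instantaneous failure) rate at time t > 0."""
        return (beta / eta) * (t / eta) ** (beta - 1)

    def bathtub_rate(t):
        infant  = weibull_hazard(t, beta=0.5, eta=200.0)    # beta < 1: rate falls (early failures)
        random_ = 1e-4                                       # constant background rate
        wearout = weibull_hazard(t, beta=4.0, eta=50000.0)   # beta > 1: rate rises (end of life)
        return infant + random_ + wearout

    for hours in (1, 10, 100, 1000, 10000, 40000, 60000):
        print(f"{hours:>6} h : {bathtub_rate(hours):.6f} failures/hour")

Printed over increasing hours, the rate drops steeply at first, flattens out, and eventually climbs again. Manufacturing burn-in, in the sense the quote uses, is simply an attempt to spend the steep left-hand side of that curve inside the factory rather than in the customer's machine.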



I believe NMOS hot-e gate impact is related to SSOI, or SOI, while PMOS BTI is an industry-standard test. Regardless, one wonders what the innate potential of a newly fabricated CPU would be if it were exempted from the burn-in criteria. It seems logical to theorize that the stress endured during burn-in cuts a processor's life potential by at least a third, if not half. I surmise that if one were able to source processors prior to burn-in, they would out-perform anything else in their class. If you were to ask manufacturers to give a synopsis of their burn-in criteria, the following would best describe those processes:


Accelerated testing--increases the usage of a component in a short time...if a relay must operate 10 times a day in a certain application, and the required operating life is estimated at 10 years, the relay must operate 36,500 times. You can test a relay and estimate the life expectancy of a batch of the same type of relays by performing 1000 operations per day within 36.5 days...Accelerated tests apply temperatures in the range of 75° C to 225° C, depending on the failure mechanisms you want to test for, and depending on the type of device you are testing. In many cases, you will apply a nominal voltage to the component...You may also want to test components under conditions of high humidity, in the region of 50% relative humidity (RH) to 90% RH, and under temperatures ranging from 85° C to 150° C.
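The relay arithmetic in that passage is easy to verify; the few lines of Python below simply restate it, with the figures taken directly from the quote.

    # Relay example from the quote: 10 operations/day for 10 years,
    # accelerated to 1000 operations/day on the bench.
    ops_per_day_field = 10
    required_life_years = 10
    lifetime_ops = ops_per_day_field * 365 * required_life_years
    print(lifetime_ops)                          # 36500 operations over the required life

    ops_per_day_test = 1000
    print(lifetime_ops / ops_per_day_test)       # 36.5 days of accelerated testing
    print(ops_per_day_test / ops_per_day_field)  # 100x acceleration factor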



Although essential for physically stressing the product, burn-in seems an almost barbaric manner in which to treat processors. I imagine that as microcircuitry continues to evolve, so will the QC and back-line criteria for testing. In fact, there have already been advances in QC methodologies for .13 and .09 micron dies:


As 130- and 90-nanometer processes move deeper into production, new statistical techniques are replacing the go/no-go orientation of traditional ASIC testing. LSI Logic Corp. this week announced an example of the trend: a technique that completely replaces go/no-go fault testing with a series of statistical analyses performed after the wafer has left the tester....The reasons for the technique, which LSI calls statistical post-processing (SPP), are several...They include the increasing importance of delay or other "soft" faults compared with "hard" stuck-at faults, the growing problem of achieving adequate test coverage on multimillion-gate designs, and the gradual breakdown of silver-bullet parametric techniques such as IDDq tests....IDDq, the measurement of quiescent supply current, is a case in point. Comparing measured against nominal supply current has in the past been an excellent indicator of defect-caused faults in ICs, even when those faults didn't appear as a hard short or open circuit. But with 130- and 90-nm processes, the intrinsic leakage current is so large that minor variations in IDDq may be tiny in comparison. Worse, cross-wafer process variations may cause IDDq variations that far exceed those caused by a genuine defect. So screening dice based on IDDq both rejects dice that are probably defect-free and passes dice that probably have defects. The result is lower-than-necessary yield and higher early field failures.
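To see the IDDq problem numerically, here is a toy Python sketch. The currents, spreads, and thresholds are invented for illustration only (they are not measured data); the point is simply that once intrinsic leakage and its cross-wafer spread dwarf the extra current drawn by a defect, a fixed pass/fail threshold both rejects good dice and passes bad ones.

    # Toy illustration of why a fixed IDDq threshold breaks down at 130/90 nm.
    # All currents (in microamps) are invented; only the relative magnitudes matter.
    import random

    random.seed(0)

    def screen(nominal_ua, sigma_ua, defect_delta_ua, threshold_ua, n=100000):
        """Return (false-reject rate, missed-defect rate) for a fixed IDDq limit."""
        false_rejects = missed_defects = 0
        for _ in range(n):
            good = random.gauss(nominal_ua, sigma_ua)                    # defect-free die
            bad  = random.gauss(nominal_ua, sigma_ua) + defect_delta_ua  # die with a small defect
            if good > threshold_ua:
                false_rejects += 1
            if bad <= threshold_ua:
                missed_defects += 1
        return false_rejects / n, missed_defects / n

    # Older process: low leakage, tight spread, so the 100 uA defect stands out clearly.
    print(screen(nominal_ua=50, sigma_ua=5, defect_delta_ua=100, threshold_ua=100))

    # 130/90 nm: leakage and cross-wafer spread swamp the same 100 uA defect.
    print(screen(nominal_ua=5000, sigma_ua=1500, defect_delta_ua=100, threshold_ua=8000))

The first call yields essentially zero errors of either kind; the second rejects a couple of percent of defect-free dice while letting the vast majority of defective ones through, which is exactly the yield loss and early-field-failure picture the quote describes.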



These and other methodologies under consideration should significantly improve defect detection, and perhaps the actual "stressing" of the microchip will soon become a thing of the past. It seems only logical that this would extend the life of the processor, and most likely improve performance out of the box. In that scenario, I'd be more inclined (as would industry experts) to measure the effects of "burn-in" as defined by the enthusiast segment. In this respect, those first instances of junction capacitance and raised voltages would certainly have a greater effect than they would on a processor which has already been stressed to near the point of failure. It would come closer to meeting the criteria for my microelectronic tabula rasa theory, where the circuitry is unalloyed, or unaltered by the presence of electrons. We may never see these specific tests developed; however, brighter minds may extrapolate much more intelligible data than I.


In summation, I would like to point out one semantic oddity which piqued my interest. I'd come across the following statement in the Anandtech "Burn-in" article: "There may be some effect that people are seeing at the system level, but I'm not aware of what it could be." These final qualifications really stuck in my craw, as they seem to negate the entire piece, or, perhaps more accurately, to leave open a "back door".

During the writing, and more importantly the research, of this article, I realized that, given the large number of those who swear by the attributes of burn-in, there must be some type of occurrence. The problem lies in isolating where exactly the phenomenon exists. Is it in the microprocessor, at the system level, or in the belief system of the enthusiast? In fact, the term "enthusiast" may denote a predilection to "believe" something positive is occurring when it is not. Then again, that is as arbitrary as believing an "overclocker" has a predilection for overwinding his or her watch.

In order to have an intelligent discussion, we need a common frame of reference, so that when the statement above is uttered (or printed) we have a clear definition of "system-level occurrence." I cannot, therefore, in good faith recommend any type of burn-in, whether by the end-user or the manufacturer. I can only wait while manufacturing technology rapidly progresses. There may come a time when some type of back-propagating "smart" electrons, perhaps a product of nanotechnology, can enter the processor mimicking precisely the attributes of DC, and tell us whether any anomalies or propensity for failure exist in the device.

What I'm "enthusiastic" about is that such technology is not only feasible, but foreseeable.

