by Lawrence W. Kessler
Defense contractors responsible for qualifying, screening, storing, and using plastic encapsulated microcircuits (PEMs) in their systems face a fundamental reliability problem. Most of the qualifying data for PEMs is from electrical testing, even though field failures often result from packaging defects.
A defense contractor typically has considerable data relating to a part's upscreened electrical performance over the military temperature range of -55 to 125 degrees Celsius, yet when a PEM fails in the field the failure can originate from a fault in the plastic package.
Experts may trace the failure of a guidance system, for example, to an integrated circuit from a particular lot date code from a particular manufacturer. The defense contractor's records will probably list moisture sensitivity level handling instructions, along with either Go/No Go or attribute electrical test data.
Go/No Go data simply indicates whether a part, when tested, met the electrical parameters on the manufacturer's spec sheet, but does not further quantify the parameter values. Attribute testing (also called variable data testing) reads and records the test data for every electrical parameter. Since one part may have 50 or more electrical parameters, the reported test data is extensive, but it permits engineers, particularly in aerospace applications, to narrow their own criteria to select only those parts most likely to be reliable in a given application.
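The difference between the two kinds of records can be sketched in a few lines of Python. This is only an illustration: the parameter names, limits, and measured values below are hypothetical and are not taken from any real spec sheet.

```python
from dataclasses import dataclass


@dataclass
class Limit:
    low: float
    high: float


# Hypothetical spec-sheet limits for two of a part's electrical parameters.
SPEC = {
    "vcc_current_ma": Limit(0.0, 10.0),
    "output_high_v": Limit(2.4, 5.0),
}


def go_no_go(measured, spec=SPEC):
    """Go/No Go record: only pass/fail survives, the readings are discarded."""
    return all(spec[name].low <= value <= spec[name].high
               for name, value in measured.items())


def margin(measured, spec=SPEC):
    """Per parameter, distance to the nearest limit as a fraction of the window."""
    result = {}
    for name, value in measured.items():
        lim = spec[name]
        result[name] = min(value - lim.low, lim.high - value) / (lim.high - lim.low)
    return result


def select(parts, min_margin):
    """Keep parts whose worst-case margin meets a tighter, user-chosen criterion."""
    return [pid for pid, readings in parts.items()
            if min(margin(readings).values()) >= min_margin]


# Hypothetical recorded readings for two parts from the same lot.
parts = {
    "A": {"vcc_current_ma": 5.0, "output_high_v": 3.7},
    "B": {"vcc_current_ma": 9.8, "output_high_v": 3.7},
}

both_pass = go_no_go(parts["A"]) and go_no_go(parts["B"])
tight_selection = select(parts, min_margin=0.1)
```

Both hypothetical parts pass Go/No Go, yet only part A survives the tighter selection, because part B sits close to a limit. That kind of narrowing is possible only when the readings themselves are recorded.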
Despite the abundance of electrical data, an integrated circuit's real failure may arise from a void in the die attach material that caused the chip to overheat and fail. Or the failure may result from a delamination at the encapsulant interface that let moisture and contaminants collect on metal surfaces inside the plastic package, causing corrosion of the bond pads on the chip.
The number of different package-related failures that can happen inside any PEM is large. When a packaging anomaly causes an electrical failure, the failure generally could not be predicted by the electrical tests that the PEM has passed.
Packaging issues can be more difficult than electrical issues to define and predict. Great efforts go into X-ray, acoustic imaging, and destructive analysis of components, but the interior of a PEM is geometrically complex, and these methods can miss internal anomalies that will turn into field failures.
Traditionally, it has been very difficult to acquire PEM structural data that is as comprehensive as the electrical data without physically destroying the part to obtain structural data.
A new technology developed by Sonoscan Inc. of Elk Grove Village, Ill., helps capture the internal three-dimensional structure of a PEM, whether mounted on a board or loose, nondestructively in one process. The technology, called Virtual Rescanning Module (VRM), preserves all of the structural information about the chip, lead frame, and other elements in the volume of the package in a data file that is called a "virtual sample." The output of this method is an initial acoustic image and a data file from which users can make 20,000 or more individual acoustic images.
If the PEM fails in service, technicians can scan the virtual sample to see the PEM's internal features in their original, unchanged condition. Technicians can apply any of several acoustic imaging modes to the data file to acquire the most accurate information about the original condition of the PEM. This analysis may reveal the previously unknown anomaly that caused the failure — information that may be applicable to other PEMs having the same lot date code.
Because PEMs absorb moisture easily, each PEM type has its own moisture sensitivity level rating, which determines how long the part may be out of its moisture barrier bag and exposed to the atmosphere before solder reflow. Exposure can occur during the splitting of lots by a distributor, during final test at a third-party facility, and at the board assembly facility. Despite the care taken to combat moisture sensitivity, a PEM that has an internal delamination, void, or other packaging anomaly often gives no electrical indication of the anomaly until the PEM fails.
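Because exposure accumulates across several handlers, the bookkeeping can be sketched as a simple running total. The allowed exposure time and the hours logged at each step below are hypothetical, and the sketch assumes no bake-out or other step that would restore allowed exposure time.

```python
def remaining_floor_life(allowed_hours, exposures):
    """Hours of out-of-bag exposure still available before solder reflow.

    allowed_hours: total exposure permitted by the part's rating (assumed value).
    exposures: list of (handling step, hours out of the moisture barrier bag).
    """
    used = sum(hours for _, hours in exposures)
    return allowed_hours - used


# Hypothetical exposure log for one lot, following the three steps named above.
exposure_log = [
    ("distributor lot split", 6.0),
    ("third-party final test", 30.0),
    ("board assembly", 20.0),
]

remaining = remaining_floor_life(72.0, exposure_log)
within_limit = remaining >= 0
```

In this made-up case the lot has 16 hours left. No single handler came close to the limit, but the total did.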
Military specifications often require hermetic ceramic or glass-to-metal seal packages if such a part is available. Since the implementation of the COTS (commercial off-the-shelf) initiative, military-grade parts are frequently not available. Fewer and fewer hermetic military parts are being produced. A significant advantage of hermetic parts is the ability to conduct internal visual inspection before the closing of the hermetic seal. The manufacturing processes used in PEMs preclude this inspection.
A defense contractor unable to acquire a Qualified Manufacturers List or Qualified Product List hermetic military-grade ceramic device turns next to an industrial-grade PEM and — if an industrial-grade part is not available — to a commercial-grade PEM.
A military IC is typically approved for use in temperatures ranging from -55 to 125 C; an industrial part is usually rated from either -45 or -40 C to 85 C. Commercial parts are typically tested and approved from 0 to 80 C, but there is considerable variation from manufacturer to manufacturer. For example, a commercial part might be tested at 80 C, but not at 0 C, because the latter temperature is reasonably close to room temperature. The traceability of the history of a commercial-grade part by lot date code may be imperfect or lacking.
The temperature ranges are important because defense contractors or subcontractors build the systems to operate in harsh environmental extremes and are often forced to up-screen a particular PEM. Typically PEMs rated for an industrial or commercial temperature range are purchased and tested by a third party test facility or distributor. Testing covers the military range from -55 to 125 C.
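How far an up-screen must reach beyond a part's rating is simple arithmetic on the ranges quoted above. A minimal sketch, with a hypothetical function name and the rated ranges entered as (low, high) pairs in degrees Celsius:

```python
MILITARY_RANGE = (-55, 125)  # degrees C, as quoted in the text


def uprating_gap(rated, required=MILITARY_RANGE):
    """Return (cold gap, hot gap): degrees of the required range not covered by the rating."""
    cold_gap = max(0, rated[0] - required[0])
    hot_gap = max(0, required[1] - rated[1])
    return cold_gap, hot_gap


industrial_gap = uprating_gap((-40, 85))
commercial_gap = uprating_gap((0, 80))
```

For an industrial part rated from -40 to 85 C, the up-screen must cover 15 degrees below the rating and 40 degrees above it. For a commercial part rated from 0 to 80 C, the gaps are 55 and 45 degrees.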
If nondestructive acoustic micro imaging were applied, the upscreen process could be more efficient and more cost-effective. Defects such as encapsulant delaminations at the die, lead frame, and die pad regions, as well as die attach faults, could be identified and their growth could be followed during conditioning. By identifying PEMs that have pre-existing flaws prior to the upscreen process, the customer could contact the manufacturer early and return the product for replacement.
Acoustic micro imaging systems use an ultrasonic transducer that scans back and forth across a PEM. Several thousands of times a second the transducer pulses very high frequency ultrasound into the PEM and collects the echo. Consider the surface of the PEM as an x-y grid; conventional quality-control acoustic micro imaging collects one echo signal at each x-y coordinate. The echo signals are typically all from a pre-defined depth — the interface between the molding compound and the die face, for example. The result of this conventional scanning is one acoustic image of the selected depth.
The ultrasound that pulses into the PEM at each x-y coordinate typically travels through the entire thickness of the PEM, and sends back echo signals from every interface between materials. This means that each x-y coordinate actually generates a very large number of echo signals, even though only one signal is used in making the acoustic images for quality control or failure analysis purposes.
VRM captures all of the echo signals from all of the x-y coordinates. It thus captures the data representing every internal feature in the PEM. At the end of acoustic scanning, the technician has one acoustic image, but he also has a data file from which thousands of different images can be made to show what any desired internal feature looked like at the time the scan was made.
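The idea of keeping every echo and choosing the depth later can be illustrated with a toy data set. The sketch below is not Sonoscan's file format or algorithm; it simply assumes that each x-y coordinate stores its full echo waveform as a list of amplitude samples, and that an image is made by taking the peak amplitude inside a chosen depth window (called a "gate" here). All sample values are invented.

```python
def gated_image(volume, gate_start, gate_end):
    """Make one image from stored waveforms: peak |amplitude| inside the gate at each x-y point."""
    return [
        [max(abs(sample) for sample in waveform[gate_start:gate_end])
         for waveform in row]
        for row in volume
    ]


# Hypothetical waveforms: index 1 is the package surface echo, indices 2-3 cover
# the die face, indices 4-5 cover the die attach.
normal = [0, 9, 0, 1, 0, 2]
anomaly = [0, 9, 0, 1, 8, 0]  # strong echo at die-attach depth (assumed to mark a void)

# A 2 x 2 scan grid; only the lower-right coordinate sits over the anomaly.
volume = [
    [normal, normal],
    [normal, anomaly],
]

die_face_image = gated_image(volume, 2, 4)    # the single depth a conventional scan might keep
die_attach_image = gated_image(volume, 4, 6)  # made later from the same stored data
```

The die-face image is uniform and shows nothing unusual. The die-attach image, generated afterward from the same stored waveforms, shows the strong echo at one coordinate without the part being scanned again.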
Conventional acoustic imaging shows features at a given interface depth, such as the die attach depth. But the anomaly that will lead to a failure may lie at the die face, or at the base of the die, or at some other location that could be found only by exhaustive conventional imaging. VRM solves this problem by storing the data that will, if needed, generate the acoustic image of any internal feature.
A PEM may fail because a small void in the die attach acted as a thermal insulator and caused local overheating of the chip. The void may have been located away from the corners of the die attach and may have been judged acceptable. After the failure, thermal runaway may have considerably degraded the die attach and the surrounding plastic. Scanning the part's VRM file, though, will show the original size and location of the void. If individual VRM files have been made of other parts in the same lot, the data in those files makes it possible to recall PEMs having similar voids.
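A recall of this kind could be driven by a comparison as simple as the one sketched below. Everything in it is hypothetical: the part identifiers, the tiny die-attach images, the amplitude threshold taken to mean "void," and the rule that a part is flagged when its void area is at least as large as that of the failed part.

```python
def void_fraction(image, threshold):
    """Fraction of pixels whose echo amplitude exceeds the threshold (assumed to mark voids)."""
    pixels = [p for row in image for p in row]
    return sum(1 for p in pixels if p > threshold) / len(pixels)


def parts_to_review(lot_images, failed_part, threshold, tolerance=0.0):
    """List other parts in the lot whose void fraction is within tolerance of, or above, the failed part's."""
    reference = void_fraction(lot_images[failed_part], threshold)
    return sorted(
        part_id
        for part_id, image in lot_images.items()
        if part_id != failed_part
        and void_fraction(image, threshold) >= reference - tolerance
    )


# Hypothetical die-attach images regenerated from each part's stored data file.
lot_images = {
    "P1": [[2, 2], [2, 8]],  # the part that failed in the field
    "P2": [[2, 2], [2, 2]],
    "P3": [[2, 9], [2, 8]],
}

flagged = parts_to_review(lot_images, failed_part="P1", threshold=5)
```

Here only P3 is flagged. A practical screen would also have to compare void location, which this sketch leaves out.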
It may be too time-consuming to collect a VRM file for every part in a lot, and it may not even be necessary.
Lawrence W. Kessler is president of Sonoscan Inc.
CRISIS!! COTS parts in service will wear out in three to seven years
by David A. Douthit
Lloyd Condra from Boeing and Joe Chapman representing the Defense Standardization Office of the U.S. Department of Defense (DOD) are traveling around the United States issuing this warning.
It applies to the new nanometer-size technology for integrated circuits. This is the first time in the history of the semiconductor/microcircuit industry that such a condition has been admitted publicly. Up to now, solid-state devices were claimed not to have any wear-out modes.
The commercial electronics industry has ceased to produce what is necessary for high-reliability and harsh environments. Even though this segment represents only about 1 percent of the component market, it is involved in 100 percent of essential systems such as traffic control (air and ground), national security, telecommunications, banking, aircraft, manufacturing equipment, and medical devices. This "new" failure mode will have a major impact.
The challenge
"This challenge is as grave as any since the beginning of the solid-state revolution 50 years ago," Boeing's Condra said in his presentation as the scheduled keynote speaker at this year's Military & Aerospace Conference-West (with COTSCON) Dec. 10-11 in San Diego. "It must be solved strategically, not tactically."
Let's take a strategic look at the issues surrounding COTS. Up until this "wear-out" failure mode appeared, electronic components accounted for less than 10 percent of documented high reliability failures. In fact, generally the largest percentage (sometimes in excess of 50 percent) of anomalies is classified as NFF, or No-Fault/Trouble-Found. Anomalies involve events that do not meet designed performance requirements.
The one guaranteed result of an NFF is that the failure will happen again; it will continue to occur until the problem is solved. Unfortunately, NFF problems will not be solved because the proper test equipment and testing protocols do not exist.
The reason for this is that reliability/qualification testing done during design and manufacturing has been based on military specifications and standards. DOD officials were concerned with parts surviving "overstress" situations involving factors such as temperature, voltages, vibration, and humidity, and not with lifecycle issues.
Since no "wear-out" mode for parts existed, prevailing thinking went, no "aging" tests were needed. This led to test equipment and test methods designed only to detect "youth" failures. Test equipment designed for field- and depot-level repair uses the same philosophy and methods. Tests for electronic/environmental stress screening (better known as ESS) were developed to detect and eliminate weak designs and/or components. These accelerated stress levels have been in use for a long time. The documented results indicate that fewer than 10 percent of failures were caused by faulty components. This would seem to indicate the program was fairly successful.
But now let's look at the overall picture. The efficiency and durability of systems have been steadily declining. Hardware and software are becoming increasingly complex and integrated. These systems are being used in increasingly harsh and difficult situations. In the past few years the failure rate for COTS components has begun to trend upwards.
The environmental stress capabilities for today's small-volume high-reliability applications are not necessary for large-volume commercial products. Commercial product vendors have long since quit building components and assemblies that would pass these stress tests. This left the high-reliability equipment suppliers with a serious problem.
Suppliers have attempted to do a "work around" by uprating components through third-party stress testing from organizations such as the University of Maryland CALCE. This "uprating" is based entirely on temperature limits. Unfortunately, humidity, contamination, vibration, and electrostatic discharge can also damage components and assemblies.
Legal liability issues associated with uprating have become complicated; component manufacturers do not want to be blamed if their components cause costly failures when operated outside of their factory-specified limits. Attempts are being made to "harden" or improve the temperature range of these components while ignoring other stresses such as vibration. Yet these efforts still do not address the majority of system failures. Limiting resources to one minor issue, such as temperature, wastes time and money.
COTS components go out of production very quickly — generally in less than three years. This rapid turnover also includes passive components, materials, design tools, test equipment, plus manufacturing equipment and processes. This is the source of many problems for high-reliability systems.
Failure modes such as leakage currents, crosstalk, dendrites, conductive anodic filaments (CAF), delamination, and solder joint cracks are now common causes of intermittent failures. Field- and depot-level test equipment is incapable of identifying many of these failure modes. Here is the cause of many No-Fault-Found problems.
The combination of so many variables can create new failure modes and cause failure rates to vary wildly. We cannot wait for field failure reports to evaluate a design, because it takes years for a high-reliability system to go into production and even longer to reach the field. Even the testing of prototype systems under field conditions requires a great deal of time. Without accurate environmental testing based on expected end-use conditions, the idea that we can build highly reliable systems is laughable.
Present designs are losing durability and robustness as commercial industry quits following DOD requirements and moves towards less durable but more profitable designs. Industry makes money by selling products and services. High-reliability designs are less profitable.
Performance-based-specifications
Currently used testing methods, established under the military standards and specifications process, were not designed to determine lifecycle capabilities but rather to find infant mortality and warranty failures. They were based primarily on accelerated stresses to stimulate failure modes based on process, materials, or design weaknesses. Many, if not most, of the military standards relating to reliability, which spelled out "how to" build, certify, and qualify components and assemblies, have been canceled.
Civilian leaders of the military services hoped that performance-based-specifications and standards implemented through contractual arrangements would assure long-term reliability. The main feature of these contractual arrangements is the requirement to maintain various levels of reliability, durability, dependability, and maintainability for specified lengths of time. The lack of test equipment and testing protocols capable of meeting the requirements of these contracts means there is no way to enforce performance-based-specifications/standards. These programs require predictability of the lifecycle for systems. The data concerning end-use conditions necessary to establish baseline-testing methods is incomplete, if available at all. It is not possible to determine the lifecycle of a system without this information.
"Wear out" — a new failure mode
The "wear-out" issue may tempt vendors and OEMs to abandon their present attempts at reliability testing. Even present-day ESS may be abandoned or greatly reduced. Why? Because some of these components will "wear out" before other stresses can cause problems.
This "wear-out" mode involves metal from the traces migrating across or diffusing into the silicon substrate. This issue has been known for years. Today's and tomorrow's reduced geometry, increased speed, and densities of IC designs in the nanometer range have moved this failure mode to the front. This is a thermodynamic-based failure mode. Even the elevated temperatures used in ESS can shorten the life of components. The possible result is manufacturers who barely turn on assemblies and systems to see if they function, and then immediately ship the parts.
These systems will need regularly timed replacements with newly designed hardware (and possibly software) because the original components are obsolete. Spare components will have possible storage issues, even at room temperature. This is dangerous for officials in the military, NASA, the FAA, and anyone else responsible for long-term high-reliability systems, who hope for a 20-year life, but it is profitable for their suppliers.
All this is based on what is known about this "wear-out" mode and current methods of dealing with reliability issues. Remember that as many as 50 percent of anomalies are No-Fault-Found and fewer than 10 percent of failures are attributed to components. The reduction in size, the increase in speeds, the lowering of signal strength, the change of materials, and new processes have created this new "wear-out" mode.
Because these changes are also occurring in other segments of the electronics manufacturing industry, they have led to other new failure modes and increased failure rates.
The new modes are based on stress factors other than temperature. There are four environmental stress factors that limit the lifecycle of assemblies and systems:
- temperature;
- vibration;
- humidity; and
- contamination.
Even if someone succeeds in developing non-wear-out mode components, 90 percent of the failures and a large percentage of No-Fault-Founds will still exist in military parts warehouses. This backsliding in reliability is beginning to have serious effects on equipment.
David A. Douthit is manager of the Mesa, Ariz.-based LoCan LLC consulting group. He can be reached by e-mail at [email protected].