Those designing high-performance embedded systems are starting to realize that the inability of cooling technologies to keep pace with the shrinking sizes and increasing capabilities of computer hardware threatens to bring Moore’s Law to a screeching halt. Today’s most advanced cooling technologies are starting to take center stage.
By J.R. Wilson
As demands for more speed and capability drive board and component manufacturers to ever-greater chip densities, the ability to cool electronic and electro-optical elements has become increasingly difficult. This is especially true in military and aerospace applications where extreme environments and long lifecycles are the norm.
Each of the primary methods-convection (moving air), conduction (using solid materials) and liquid (circulating, spray and immersion)-has its own set of pros and cons.
“You can only cool about 1 to 1.5 watts per square inch with air, less than half that for conduction; with liquid, we cool up to 100 watts per square inch or better,” explains Ray Alderman, executive director of VITA, the open-standards organization in Scottsdale, Ariz. “In some instances, with the techniques, materials and concepts being developed, we could get up to 250 watts,” Alderman says. “That’s where we’re going. Some people claim we may get up to 1000, although I haven’t seen any real evidence of that.”
The choice of method depends on factors such as altitude, G-forces, noise, accessibility, size, and weight. “If you move more air, you create a lot of noise problems without getting any more cooling,” Alderman says. “You also get eddies and hot spots, so increasing the air flow just creates more problems.”
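As a rough illustration of Alderman's per-square-inch figures, the sketch below computes a heat flux and lists which cooling methods could, in principle, handle it. The function names are hypothetical, and the 0.75-watt conduction ceiling is an assumption standing in for his "less than half that"; real selection also depends on the altitude, noise, and weight factors noted above.

```python
# Heat-flux ceilings in watts per square inch, based on Alderman's figures.
# Assumption: conduction is set to 0.75, i.e. half of the 1.5 W/in^2 air figure,
# because the article only says "less than half that".
FLUX_LIMITS_W_PER_SQIN = {
    "conduction": 0.75,
    "air": 1.5,
    "liquid": 100.0,
}


def heat_flux(power_w, area_sqin):
    """Return the heat flux in watts per square inch."""
    if area_sqin <= 0:
        raise ValueError("area must be positive")
    return power_w / area_sqin


def feasible_methods(power_w, area_sqin):
    """List the methods whose assumed flux ceiling covers the load."""
    flux = heat_flux(power_w, area_sqin)
    return [name for name, limit in FLUX_LIMITS_W_PER_SQIN.items() if flux <= limit]


# Hypothetical loads spread over a 40-square-inch surface.
for watts in (20, 50, 400):
    print(watts, "W ->", feasible_methods(watts, 40))
```

With these assumed ceilings, a 50-watt load on 40 square inches (1.25 watts per square inch) is already beyond conduction alone, and anything much above 60 watts on that area calls for liquid.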
Limits of air-cooling
Ivan Straznicky, senior staff mechanical engineer at Curtiss-Wright Controls Embedded Computing in Ottawa, agrees that air-cooling has reached its limit.
“We specify our cards in terms of an air inlet temperature of 70 degrees Celsius and 15-cubic-feet-per-minute volumetric air-flow rate,” he says. “You can’t really specify much more than that because the air-movers you put on the chassis can’t supply much more than 20. Another issue with air-cooling is it has low specific heat and thermal conductivity and is compressible, so it isn’t as efficient as incompressible liquids.
“Air cooling is being pushed as far as possible, especially in commercial and industrial applications, where it is the number one approach,” Straznicky continues. “They are looking for higher conductivity thermal interface materials, heat spreaders, and heat sinks; we all leverage off those innovations. But the simple fact is, with increasing heat and heat density it has begun to reach its limits.”
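To put the 15-cfm figure in perspective, the heat an air stream can carry away follows the standard relation P = density x volumetric flow x specific heat x temperature rise. The sketch below is only an estimate under stated assumptions: sea-level air at roughly room temperature (1.2 kg per cubic meter, 1,005 J/kg·K) and a hypothetical 10-degree-Celsius rise across the card. Air at a 70-degree inlet or at altitude is less dense, so it would carry less.

```python
# One cubic foot per minute expressed in cubic meters per second.
CFM_TO_M3_PER_S = 0.3048 ** 3 / 60.0


def air_heat_removal_w(flow_cfm, delta_t_c, density_kg_m3=1.2, cp_j_per_kg_k=1005.0):
    """Heat carried off by an air stream: P = rho * V_dot * cp * dT.

    The default density and specific heat are assumed values for
    sea-level air near room temperature, not figures from the article.
    """
    flow_m3_per_s = flow_cfm * CFM_TO_M3_PER_S
    return density_kg_m3 * flow_m3_per_s * cp_j_per_kg_k * delta_t_c


# 15 cfm with an assumed 10 C air temperature rise: roughly 85 W.
print(round(air_heat_removal_w(15, 10), 1))
```

Under these assumptions, each card's air stream carries on the order of 85 watts, which suggests why a ceiling of about 20 cfm from the chassis air-movers leaves little headroom.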
Another problem for critical embedded systems is the fan, which becomes a single point of failure due to the low mean time between failures (MTBF) of its moving parts. To deal with that, designers must install sensors to detect a loss of airflow and shut down the electronics before they get too hot-what Alderman calls a “graceful degradation.”
Elma Electronics in Fremont, Calif., uses flow-through software to characterize the enclosure, then, if given a thermal profile for the card, can create a thermal model of the entire box, identifying hot spots and temperature gradients to ensure the critical areas are addressed.
“We are asked to cool in excess of 15 linear feet per minute for board manufacturers, which involves two factors-how much power density is available, and how to design baffles to go over a particular slot,” says Ram Rajan, Elma’s vice president of engineering. “The key is increasing the airflow velocity for a particular board.”
“There are applications where forced-air convection cannot be used and a sealed enclosure is needed. That is primarily on aircraft flying at altitudes without enough air to work or a UAV [unmanned aerial vehicle] exposed to extremely low temperatures, dust, salt-spray and so on,” Rajan says. “The solution there is primarily conduction cooling, dissipating 100 to 150 watts per enclosure, with a ceiling of more than 40 watts per slot in a 6U form factor. These also are weight sensitive, so we are asked to provide solutions less than 15 pounds to enclose a power supply and three or four slots.”
UAVs present a different kind of problem for manufacturers, end users, and cooling specialists.
“If you look at the UAV market, you have to design something that is pre-cutting-edge, fairly lightweight, with ad hoc systems added on to get capability out there now. If you look at the services provided, you don’t have fluid cooling systems and other infrastructure to take advantage of,” says Matt Tracewell, executive vice president at Tracewell Systems in Columbus, Ohio. “Contrast that to a fighter jet, where you have other subsystems with cooling schemes you can take advantage of or standard racks and modules that will already cool a device you design to fit there.
“You also tend to have longer design cycles and bigger, tier-one design teams working on those,” Tracewell continues. “But on a UAV, you will see a lot more different companies working to deploy a system on that platform. For each specific class of UAV, you have to get the requirements-size, power budget, and weight-from two directions: the platform designer saying what space can be spared, power available, and where to dump the heat; and the component manufacturer saying this is the component and what I need. Then you work out the best solution. Good designs are in the detail-a general solution is a good fit for nothing.”
Conduction hitting the wall?
While conduction also is the first choice where electrical or audible noise cannot be allowed, industry experts say it, too, appears to have reached its inherent limits.
“We used to think 60 watts was the limit for 6U [230 by 160 millimeters]; again, higher thermal conductivity materials and innovative designs are the keys,” Straznicky says. “We’re starting to see incorporation of other advanced cooling techniques, such as heat pipes, into conduction-cooled frames.
“It depends on parameters-most specify cards at card-edge temperatures, typically 85°C-so that probably will top out at 120 watts. It’s hard to give a specific number because it depends on board layout and heat density per chip. If you had 100 watts on one chip, you couldn’t cool to 85°C unless it was spread out. So 120 watts at 85°C; if you used a liquid-cooled chassis, you might get to 75°C and push that card to 200 watts.”
Straznicky says the Achilles’ heel of conduction cooling is thermal contact resistance where the card edge meets the chassis.
“Basically, you have two metal surfaces in contact and get thermal contact resistance on the order of 0.4 degrees Celsius per watt,” he says. “If you get 50 watts to either edge, you get a 20-degree rise in temperature, which is huge if you are trying to cool a device to 100 degrees.”
That can be partially alleviated by increasing the area of contact-or spreading out the thermal resistance. Engineers at Curtiss-Wright are investigating this by doubling the surface area in some designs. Increasing clamping force from the wedge locks serves a similar purpose, but soon reaches a point of diminishing return; doubling pressure, for example, does not cut thermal resistance in half.
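Straznicky's arithmetic is simply temperature rise = contact resistance x power. The sketch below reproduces his 20-degree figure and shows what doubling the contact area might buy. The function names are hypothetical, and the area scaling is an idealized assumption that resistance falls in inverse proportion to contact area; real interfaces will not behave that cleanly.

```python
def contact_temp_rise_c(power_w, resistance_c_per_w=0.4):
    """Temperature rise across the card-edge interface: dT = R * P."""
    return power_w * resistance_c_per_w


def scaled_resistance(resistance_c_per_w, area_ratio):
    """Assumption: contact resistance is inversely proportional to area."""
    if area_ratio <= 0:
        raise ValueError("area_ratio must be positive")
    return resistance_c_per_w / area_ratio


# Straznicky's example: 50 W through 0.4 C/W gives a 20 C rise.
baseline = contact_temp_rise_c(50)

# Idealized effect of doubling the contact area: 0.2 C/W, so a 10 C rise.
doubled = contact_temp_rise_c(50, scaled_resistance(0.4, 2.0))

print(baseline, doubled)
```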
Another approach to conduction is a cold plate, incorporating a hollow heat sink through which low-pressure, low-velocity liquid moves.
“We are seeing some conduction/liquid hybrids. When you snap a conduction-cooled board into an ATR chassis, some companies, instead of having a heat fin on the side of the ATR box, use a hollowed-out maze with liquid put through the sidewalls of the box. That is called a cold wall and can get you up to 75 to 80 watts per board, removing a lot more heat more efficiently,” Alderman says. “That isn’t high pressure-about two to three pounds per square inch. Just moving more liquid doesn’t move more heat, so you have to come to a balance between how much liquid you move and how much heat you move.”
Heat pipes-typically copper tubes with a wick in the center and a reservoir at one end that keeps the wick wet-also have utility in limited applications, such as fixed ground stations. Heat causes the liquid on the wick to evaporate, implementing one of the most efficient cooling processes-liquid changing to vapor. However, they also are subject to the G-forces in jet aircraft and even submarines, forcing the liquid to pool in one end of the pipe and compromise its ability to cool.
Other liquid cooling methods include flow-through, spray, and immersion, which offer significant improvement over other forms of cooling. “We can cool up to about 600 watts and probably beyond on a 6U card” using liquid cooling, Straznicky says.
With flow-through, liquid moves from a valve attached to one end of a board and under the chips through a maze of interior channels created during the layer-lamination process. The heated liquid then moves out to a cooling tower-a coil of pipes cooled by a fan. With pumps, fans, and liquid, however, this system combines some of the worst elements of air and liquid cooling.
A partial answer lies with micromachined spray cooling-a series of miniaturized nozzles, pumps, and condensers dispersing a fine mist of Fluorinert, an inert and environmentally safe liquid from 3M Corp. that has been used to clean semiconductor wafers. The cooling system packager determines the temperature at which cooling is necessary, and the Fluorinert manufacturer sets the liquid’s boiling point accordingly. The aerosol creates a thin layer of liquid that evaporates when the chip exceeds its boiling point; the resulting vapor is pumped off and compressed to create condensation, which then runs through a heat exchanger and back into a reservoir for reuse.
Officials of the National Coordination Office for Networking and Information Technology Research and Development (NCO/NITRD) in Arlington, Va., say only a few ounces of Fluorinert in such a closed-cycle cooling system can extract 20 kilowatts of heat, using minimal operating power. However, the cost of creating the system itself can be prohibitive. With a half-life of thousands of years, Fluorinert is considered extremely inert in addition to being nonconductive and harmless to humans, all of which also make it extremely expensive.
“When you start cooling at the board level, it is very complicated-and expensive-to engineer the board, the maze of channels, dripless valves, pumps, and fans. Cooling at the systems level, such as with spray Fluorinert, gets really expensive; the chassis is hermetically sealed to keep contaminants out of the Fluorinert, then you have the fans and so on,” Alderman explains. “And if you had a spray Fluorinert system in an aircraft, what happens to the liquid when you are upside down? You would have to put pumps on all six possible sides and anything that requires a reservoir is not a system designed for pulling Gs without some complex responses.”
Even so, Alderman concedes, very high-density electronics demand some form of liquid cooling.
“For every 10 degrees you drop temperatures on electronics, your MTBF will double, so the hotter the electronics get, the lower the MTBF. The trick with liquid cooling for critical applications is, if you can get the temps down about 50 degrees, your MTBF will jump out to about eight years. That’s why the military is extremely interested in liquid cooling. That is an incredible benefit, despite all you have to go through to get there,” Alderman notes.
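Alderman's rule of thumb, that MTBF doubles for every 10-degree drop, compounds quickly. The sketch below computes only the multiplier, since the article gives no baseline MTBF. The function name is hypothetical, and the rule itself is his approximation rather than a precise reliability model.

```python
def mtbf_multiplier(temp_drop_c, doubling_step_c=10.0):
    """Factor by which MTBF grows if it doubles every `doubling_step_c` degrees."""
    return 2.0 ** (temp_drop_c / doubling_step_c)


# A 50-degree drop is five doublings: 2 ** 5 = 32 times the original MTBF.
print(mtbf_multiplier(50))
```

Read this way, a 50-degree reduction multiplies MTBF by 32, which would put the baseline behind an eight-year figure at about three months.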
“By putting three or four boards in a can and cooling at the subsystem level, it becomes much easier, less expensive, and more reliable. The fire control radars on today’s jet fighters are liquid cooled, using a coolant material called PAO (polyalphaolefin), which is essentially Mobil 1 synthetic motor oil with another chemical that keeps it from degrading. It is very messy, but does a good job of cooling those radars.”
The Advanced Amphibious Assault Vehicle (AAAV) also uses spray Fluorinert to cool chips on boards in the chassis-one of the first liquid-cooled embedded systems on a military vehicle.
Nano cooling
The next big leap in cooling technology will come at the microscopic level, Tracewell predicts.
“The future is clearly in nanotech. If you look at what industry is doing with standard materials technology versus what some of the advanced papers are talking about in nano, then nano is taking the jump we need,” he says. “But in the next five years, my prediction is incremental refinement of existing technologies-until we start to get some breakthrough materials, which will begin to show up as an implementation that looks like a heat sink or heat spreader, and then we will go through the whole process again.”
This is one reason electronics packagers such as Tracewell do not discriminate on what the component is-from laser diodes to computer boards-because the problem is more about heat density and conductive surfaces. As a result, they tend to approach the problem as generalists, rather than as proponents for one cooling method or another, which also helps them deal with designs that did not have sufficient input from cooling experts.
The S42 enclosure from Tracewell Systems is an ultra-high-performance computer platform for rugged air and ground-mobile applications requiring small size and extremely low weight. The power system has as many as five 900-watt plugging power modules. Cooling supports 200 watts per slot at sea level at 50°C; 85°C at 10,000 feet.
“We’re finding if you need to put a large quantity into a single box, things often are not thought through well. To get the heat out of the plate and the density up as high as possible, you embed heat pipes directly into the plates or put microchannels under each component,” Tracewell says. “As higher and higher power laser devices become available, designers are putting what a few years ago would have been considered an unreasonable number into a small box. Then you add the additional power they require.
“I’m looking for the best point solution for a specific application my customer is trying to deploy in. It may be the exact same item going on a fighter and a UAV, but packaged completely differently for each application.”
Solving such problems often requires seeing alternative applications for technologies designed for some other industry.
“Rather than reinvent the wheel, you look at companies building microchannel cooling for chip-based products for the computer industry. They don’t think of it for laser packages, but you can adapt the work they have done to those applications. The same is true with heat-spreading technologies, taking the best of breed from a number of industries to get the thermal result you want,” Tracewell says. “It is a very dynamic and fluid industry. Taking these various techniques from other industries and mixing them together puts some very interesting design proposals out into industry.”
The cooling industry has been working for years to overcome the problems associated with all forms of cooling, including meeting the military’s stringent requirements for nontoxic materials, yet each advance in cooling tends to be countered by yet another advance in chip power and density-which means more heat in even smaller spaces.
“The tighter you pack electronics, bringing ten boards down to three, for example, you get a huge heat problem, so you have to liquid cool,” Alderman says. “Once you can liquid cool everything at 100 watts per square inch, you can pack your electronics magnitudes more densely and the chassis doesn’t have to be as big and the weight of the liquid coolant is balanced out.
“In February, Hewlett Packard released a liquid cooling system for a 19-inch rack for Pentiums that has to cool 30 kilowatts; by the end of this year, with multicore Pentiums, they will have to enhance that to 60 kilowatts per 19-inch rack. There have been published reports about power densities going up to 10 watts per square inch, but 50 to 60 with multicore processors. That’s an astonishing jump.”
As the military has become a relatively minor player in the electronics marketplace, most manufacturers have dropped military standards in testing to focus on commercial requirements. Straznicky points out that Intel designs and delivers 100-watt chips, but no one in the military and aerospace community designs 6U boards with 100-watt chips. So even if the next generation goes from 100 to 150 watts and density goes up 25 instead of 50 percent, cooling will still be a major issue.
There is also some question about whether military applications will need that kind of computing power, except for narrow requirements, Rajan says.
“They may only require one cluster of processors, for example, to handle a specific high-power requirement, with the rest of the system using more traditional chip packages, compared to some commercial systems, where all the applications may require processor clusters,” he says.
That difference, however, only adds to the growing divide between military and aerospace and the broader commercial marketplace of cell phones, TVs, and PCs.
“Those manufacturers want things to die in a few years so you’ll buy another one, rather than lasting decades, as the military wants. So the industry is out of sync with military requirements,” Alderman says. “They have no incentive to solve these problems, which is why they still make chips with 50 watts per square inch, because in an air conditioned house or office, it isn’t a problem. And they will continue to do that, so we have to deal with it at the systems level.”
Even so, some of the hotter chips, especially microprocessors, are being designed with the back of the die left open to implement cooling solutions, while others come with an integral heat spreader that can be air or conduction cooled. While that is intended to address commercial, not military, heat concerns, the new chip designs are, nonetheless, making it easier to cool in all applications.
The role of VITA 58
“We’re working on VITA 58, which is a modular subsystems approach to solving the heat problem,” Alderman says. “There are some new military requirements that every piece of equipment is set up for a 72-hour mission; you must be able to remove and replace all electronics and cabling in an M1 tank within 30 minutes-without using a single tool-and be ready for battle. We’ve shown how that can be done.”
Increasingly, the military is couching its requirements in terms of functionality, which also means commonality. If an M1 tank’s fire-control computer goes down, they want to be able to pull the GPS processing unit from a Humvee, install it in the tank, and download encrypted software from a satellite that turns it into an M1 fire-control computer.
By placing common computing component capability into a common form factor “can”-with full interoperability across platforms and services-the military can cut acquisition, maintenance, training, and logistics costs. Cooling (defined by VITA 48), power, and data all would be blind-mated, so the can (defined by VITA 58) could simply be latched into place.
Buying cans in bulk-specified to functionality rather than use-also would reinstate the military, to a degree, as a volume user of electronics. It could help address the cooling issue, as well.
“Consider a can as a subsystem. Cooling is inexpensive and simple at the subsystem level and really straightforward. That can be air-cooled for those things that aren’t that hot, but each can will have the capability of being air-, conduction- or liquid-cooled, depending on how it is used and the military specification. SIGINT, for example, would have to be liquid cooled,” Alderman notes. “When you put the power, data and cooling sticks in, the power goes on top, data in the center and cooling on the bottom, so if you have a leak it doesn’t leak across the power and data buses.
“The goal is for it not to matter what’s in the can. We have too many form factors and buses, all creating problems in cooling and obsolescence. But if we go to the can level, you isolate yourself from the technology; the technology doesn’t matter, the form factor doesn’t matter so long as it fits in the can.”
This route also addresses another new military requirement-two-level maintenance-by moving from line replaceable units (LRUs), such as circuit boards, to line replaceable modules (LRMs).
“VITA 58 will allow you to do that, isolating you from obsolescence, giving the military the ability to buy in volume across the services and enable a tank crew-with only 10 minutes of training-to completely rebuild a tank and have it ready for battle in 30 minutes,” Alderman explains.
“At VITA, we are updating all the mechanical standards on the cans, all the architectural standards on RapidIO, Ethernet, high-speed serial interconnect, high-speed data downloads, etc.-how things plug together and work. But this is not just military-telecom, transportation, medicine and many other markets are looking at this, as well. It is for any critical embedded system.”
Composite materials also are playing a role in conduction cooling, especially those with much higher thermal conductivity than aluminum or copper, the current mainstays. That would not be on the chip itself, but on the cooling system-such as a heat spreader-implemented above the chip.
For COTS suppliers to military and aerospace manufacturers, the movement is clearly from air or conduction cooling throughout to liquid flow-through (LFT) at the system level and conduction at the chip for the next generation.
“I don’t see anything really required beyond liquid flow-through or spray cooling; with the capability of many hundreds of watts, where we are struggling with 100 watts now, that will give us head room for a good decade, anyway,” Alderman says. “But it is demand and technology driven. If customers had chassis that would accept COTS LFT now, you would see that demand.”
For military use especially, systems integrators have to think far enough in advance to design cooling systems up front that can handle technology refreshes in the future. That effort is greatly enhanced if the cooling contractor is involved from the beginning of the design effort. But that also has its problems.
“If it is a custom board and custom chassis, then it makes sense to work concurrently to define optimum chip location and air flow,” Rajan says. “We would like to start running the simulations as early as possible, but if the board is constantly being changed, it can slow things.
“If a board is still in development, they aren’t sure what the heat dissipation may be or where the chips will be located, so the thermal profile is in doubt. At that point, we are working with assumptions. The output of a thermal simulation is only as good as the input, so we have to make sure we are realistic in terms of the heat source.”
Straznicky agrees that accuracy declines as you try to predict the future further out.
“We try to support them by doing our own power-dissipation projection charts, taking data at the chip and board level from the last 10 to 20 years and extrapolating those to tell the integrators what they are likely to see 10 to 15 years down the road,” he says. “That makes it easier for them to design cooling systems for the anticipated heat loads they will be seeing.”
Or, as Tracewell puts it: “I don’t recommend being the last stop, because then you are more band-aid than design.”