Microelectronics Reliability 51 (2011) 1964–1967
Dynamic active cooling for improved power system reliability
A. Castellazzi*, W.J. Choy, P. Zanchetta
Department of Electrical and Electronic Engineering, University of Nottingham, Nottingham NG7 2RD, UK
Article info
Article history: Received 30 May 2011; received in revised form 30 June 2011; accepted 19 July 2011; available online 9 August 2011.
Abstract
This paper presents the development of an adaptive cooling methodology in which the thermal impedance of the power system is continuously varied as a function of both its power dissipation and the ambient temperature. Compared with a fixed-parameter cooling solution, this approach reduces the amplitude of thermal cycles and improves overall system efficiency and operational lifetime. The study comprises an initial theoretical part and a subsequent experimental proof-of-concept demonstration of the proposed cooling strategy.
© 2011 Elsevier Ltd. All rights reserved.
1. Introduction
Fig. 1 summarises the findings of a reference research project on the influence of both the average operational temperature, Tm, and the amplitude of thermal cycles, ΔTj, on IGBT module reliability [1]. Increasing either the average temperature or the thermal cycle amplitude affects the module lifetime in the same direction: the number of cycles to failure decreases. However, the quantitative influence of the two factors can be very different. For instance, an increase of 20 K in the thermal cycle amplitude has a much more severe effect than a corresponding 20 K increase in average temperature. In particular, it can be inferred that, in terms of number of cycles to failure, an increase in average temperature is amply compensated by an equal decrease in the thermal cycle amplitude, resulting in a clear benefit for the IGBT module operational lifetime. This is illustrated by the arrows in Fig. 1. Moving, for instance, from point 1 (Tm1 = 333 K, ΔT = 50 K) to point 2 (Tm2 = 353 K, ΔT = 50 K), the average temperature is increased by 20 K, implying a reduction in the number of cycles to failure from about 3e6 to 5e5. However, a reduction of 20 K in the ΔT value at Tm = 353 K (i.e., moving from point 2 to point 3) brings the lifetime estimate back up to 1e7 cycles, which is even better than the starting point. A similar situation can be observed for many other test conditions, for example in the cycle comprising points a, b and c.
In the experiments underlying the results of Fig. 1, both thermal resistance changes due to modifications in interface layers and bond-wire lift-off are reported to have been observed concurrently, with changes in thermal resistance being the primary factor and bond-wire lift-off a secondary, though possibly the ultimate, failure mechanism [1,2].
Temperature variations in power systems are caused, on the one hand, by power dissipation and self-heating in the components, e.g., semiconductor devices (active thermal cycling) and, on the other hand, by changes in ambient temperature (passive thermal cycling). Presently, thermal design mainly aims at ensuring that a given maximum temperature is never exceeded at a critical system location (e.g., the junction of a semiconductor power device), to avoid catastrophic failure, and is based on cooling solutions with fixed parameters, defined on the basis of full-load or worst-case conditions. However, most power systems work at their rated maximum power or under abnormal overload conditions for only a very small fraction of their operational lifetime. So, while prevention of catastrophic failure is a clear need, such thermal design is far from optimal in terms of overall system performance (i.e., unnecessary power put into the active cooling device reduces overall system efficiency) and long-term degradation (i.e., the amplitude of thermal cycles). Based on these observations, it is reasonable to consider the development of a cooling approach which allows for a somewhat higher average operational temperature in exchange for a corresponding reduction of the amplitude of thermal cycles.
2. The dynamic active cooling concept
For ease of illustration, reference is made in the following to forced-convection air cooling (e.g., a fan attached to a heat-sink). The discussion is however valid for, and can be applied straightforwardly to, pumped liquid cooling as well. As is common when treating thermal problems analytically, a linear system is assumed (i.e., temperature-independent thermal properties). So, taking into account both self-heating and changes in ambient temperature, temperature variations at a given system location, X, can be described by:
\[ T_X = T_{AMB} + \int (P_{HEAT} - P_{COOL})(\tau) \, \dot{Z}_{Th}(t - \tau) \, d\tau \tag{1} \]
* Corresponding author: A. Castellazzi. doi:10.1016/j.microrel.2011.07.072
to which the linear physical thermal model of Fig. 2 corresponds. PHEAT is typically defined by the power losses in the system components (mainly the semiconductor devices), whereas PCOOL depends on the particular cooling mechanism and device. For convective boundary conditions, it is usually expressed as:
\[ P_{COOL} = h \, S \, (T_X - T_{AMB}) \tag{2} \]
where h is the heat-transfer coefficient, which lumps together a number of factors (shape, material, size and type of heat-sink). In the envisaged dynamic active cooling scheme, h is no longer constant but becomes a function of power dissipation and ambient temperature. This yields varying values of both ZTh and PCOOL in Eqs. (1) and (2), depending on real-time operational and boundary conditions, so as to minimise temperature variations at the given critical location X of the system. In particular, as discussed in more detail below, for the case of a fan the cooling heat flux can be expressed as
\[ \Phi = \frac{T_X - T_{AMB}}{R_{ThN} + \alpha \, R_{ThHSO}} \tag{3} \]
with RThHSO indicating the equivalent heat-sink thermal impedance without the fan action (but including convective boundary conditions) and α varying between 1 and a small, non-zero value.
To develop a sound understanding and an optimised cooling solution, compensation of power losses and of variations in ambient temperature are initially treated separately. Correspondingly, a distinction is made between:
– temperature regulation, consisting in an open-loop real-time adaptation of the thermal impedance as a function of the instantaneous power dissipation;
– temperature control, consisting in a closed-loop scheme aimed primarily at counterbalancing variations in ambient temperature.
The combination of the two solutions into a unified cooling scheme represents the envisaged dynamic active cooling concept and will be considered in a subsequent step.
Fig. 1. Summary of IGBT reliability testing results [1].
Fig. 2. Simplified conceptual model of the thermal system.
2.1. Temperature regulation
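As a rough illustration of the open-loop regulation idea, the sketch below applies Eq. (3) to a single-capacitance lumped thermal model and compares a fixed fan setting with a feed-forward choice of α. It is not the regulation law of [4]: all numerical values, the single thermal capacitance C_TH, the interpretation of RThN as a fixed series resistance, and the function names are assumptions made for this example only. Because the sketch assumes perfect knowledge of the model, the regulated temperature swing vanishes almost completely, which real hardware cannot be expected to achieve.

```python
import math

# All values below are illustrative assumptions, not data from the paper.
R_N = 0.05        # K/W, fixed series resistance (assumed role of R_ThN)
R_HSO = 0.40      # K/W, heat-sink resistance without fan action
C_TH = 50.0       # J/K, single lumped thermal capacitance
T_AMB = 25.0      # degC, ambient temperature
ALPHA_MIN = 0.25  # strongest fan action available (alpha = 1 means fan off)


def alpha_feedforward(p_heat, t_ref):
    """Fan factor that makes t_ref the equilibrium of Eq. (3) for this power."""
    if p_heat <= 0.0:
        return 1.0                                 # nothing to cool
    r_needed = (t_ref - T_AMB) / p_heat            # required total resistance
    alpha = (r_needed - R_N) / R_HSO
    return min(1.0, max(ALPHA_MIN, alpha))         # stay within the fan range


def simulate(regulated, f_hz=0.02, periods=10, dt=0.01):
    """Forward-Euler run; returns (T_min, T_max) over the final power period."""
    # Reference = full-load steady-state temperature of the fixed design.
    t_ref = T_AMB + 200.0 * (R_N + ALPHA_MIN * R_HSO)
    steps_per_period = round(1.0 / (f_hz * dt))
    temp, t_min, t_max = T_AMB, math.inf, -math.inf
    for k in range(periods * steps_per_period):
        # Sinusoidal power profile between 100 W and 200 W.
        p = 150.0 + 50.0 * math.sin(2.0 * math.pi * f_hz * k * dt)
        alpha = alpha_feedforward(p, t_ref) if regulated else ALPHA_MIN
        p_cool = (temp - T_AMB) / (R_N + alpha * R_HSO)   # Eq. (3)
        temp += dt * (p - p_cool) / C_TH                  # lumped heat balance
        if k >= (periods - 1) * steps_per_period:
            t_min, t_max = min(t_min, temp), max(t_max, temp)
    return t_min, t_max


if __name__ == "__main__":
    for label, reg in (("fixed fan", False), ("regulated", True)):
        lo, hi = simulate(reg)
        print(f"{label:10s} T_min={lo:.1f} T_max={hi:.1f} dT={hi - lo:.1f}")
```

In this idealised setting the regulated case trades a higher average temperature for a smaller cycle amplitude, which is the qualitative behaviour the regulation scheme aims for. A model mismatch, fan dynamics or a faster power profile would leave a residual swing.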
The temperature regulation approach is schematised in Fig. 3. It is a feed-forward type of control, where the power dissipation (e.g., in the semiconductor devices) is sensed and used to set the speed of the fan so as to minimise variations in TX. The attractiveness of this approach lies in its being open-loop (i.e., it is not affected by stability problems and constraints) and in the possibility of anticipating temperature variations, that is, of reacting to disturbances (changes in the power losses) before any variation in temperature has actually taken place, which makes the system response faster. Moreover, sensing the power (if not directly the power dissipation) is relatively straightforward, and the information is often already available within the power system design. The limitation of the scheme is that it does not directly monitor and correct TX, which is the actual variable whose value needs to follow a given reference.
2.2. Temperature control
The temperature control approach is schematised in Fig. 4. It is a classic closed-loop feedback scheme, where the value of TX is continuously sensed and compared with the reference signal, and the speed of the fan is controlled according to the magnitude and sign of the error between the two. The merit of this approach is that it allows for tighter control of TX and also captures and reacts to variations in ambient temperature. The drawback is that it requires a change in TX to have taken place before any corrective action is started. Also, identifying the most suitable candidate for TX and properly sensing it is typically not easy within a real power system (e.g., TX may be the temperature of the top metallisation of a power device or that of the solder layer below it, neither of which can be easily accessed; and, even if they could be accessed, their variations may be too fast for changes in the speed of the fan to compensate for them [3]).
2.3. Dynamic cooling
As is apparent from Fig. 4, sensing and feeding back TX can in principle also capture variations brought about by power dissipation effects. However, an additional, specifically designed feed-forward loop, besides being relatively easy to implement as discussed above, is expected to be useful in anticipating temperature variations and thus in speeding up the system dynamic response. So, the envisaged global scheme for the proposed dynamic active cooling concept is as shown in Fig. 5. With the envisaged solution, the maximum TX value remains unchanged as compared to a standard heat-sink design with fixed parameters. However, the minimum and average temperature values are raised. This can clearly result in higher average switching losses of the power devices but, as a whole, system efficiency is expected to be improved by reducing the energy usage of the cooling device through proper design.
Fig. 3. Concept scheme of temperature regulation approach.
Fig. 4. Concept scheme of temperature control approach.
Fig. 5. Concept scheme of envisaged dynamic active cooling scheme.
Fig. 7. Plot of the regulated temperature, TX, as a function of dissipated power, P, without and with the regulation scheme in place, respectively.
Table 1. Measured transient values of P and THS with and without the regulation scheme.
3. Proof-of-concept experimental demonstration
Details of the methodology employed to derive the required regulation law and of its hardware implementation in an electronic circuit were presented in [4]. The hardware test assembly is shown in Fig. 6: the top power resistors, connected to a regulated current source, act as the heating elements; a series sense resistor monitors the current level and, indirectly, the power dissipation in the resistors; this value is the input to the regulation circuit, which produces a varying output voltage controlling the speed of the fan. The fan nominal bias voltage is 12 V; its maximum rating is 18 V. A thermocouple is used to monitor the temperature at the centre of the heat-sink.
3.1. Steady-state regime
In a first set of tests, the power dissipation in the resistors was varied statically between around 100 and 200 W. Fig. 7 summarises the results for the steady-state case: with a constant, non-regulated fan bias voltage the average temperature is Tm,12V = 54.5 °C, for a maximum temperature variation ΔT12V = 19 K; with the regulation circuit applied the values change to Tm,Reg = 58 °C and ΔTReg = 8 K, with a clear benefit for the system reliability according to the results of Fig. 1 and the discussion
above. It should be noted that in Table 1 the fan bias voltage is allowed to go up to 14 V in the case of the regulation circuit. However, this does not significantly affect the validity of the results and, more importantly, it reflects the use of an approximated regulation law (see [5]). It is expected that an optimised law extraction and implementation will not require voltages in excess of the nominal rating.
3.2. Transient regime
In our experiment, sinusoidal power profiles could be generated with a minimum frequency of 0.02 Hz. Results for constant and regulated fan bias in the transient regime are summarised in Table 1. They indicate the capability to effectively reduce the amplitude of thermal cycles in the lower frequency range (up to about 1 Hz). The existence of an upper frequency limit for the validity of the regulation approach is expected from the underlying physics and had already been pointed out in [4]. This limit depends on the system characteristics and on the location at which temperature is regulated: the regulation scheme should be designed to output a constant bias for frequencies beyond this value. As can further be inferred from Table 1, in this case the change in average temperature is greater than the reduction in thermal cycle amplitude, so the benefit for reliability would still need to be demonstrated (see Fig. 1). This is, however, partly attributed to the still approximated regulation law: it is expected that the average temperature with regulation can be further decreased by optimising the regulation circuit.

Table 1, no regulation (fan at 12 V)
f (Hz)      0.02  0.05  0.1   0.5   1     1.5   2     4     10
Tmin (°C)   37.6  37.6  37.6  37.6  37.6  37.6  37.6  37.6  37.6
Tmax (°C)   42.8  42.4  42    41.6  41.6  41.6  41.2  40.8  40.8
ΔT (K)      5.2   4.8   4.4   4     4     4     3.6   3.2   3.2
Tave (°C)   40.2  40    39.8  39.6  39.6  39.6  39.4  39.2  39.2

Table 1, with regulation circuit
f (Hz)      0.02  0.05  0.1   0.5   1     1.5   2     4     10
Tmin (°C)   45.6  44.4  43.2  42    40.8  40.8  40.8  40.8  40.8
Tmax (°C)   48    46.8  45.6  45.2  45.2  45.2  45.2  45.2  45.2
ΔT (K)      2.4   2.4   2.4   4.4   4.4   4.4   4.4   4.4   4.4
Tave (°C)   46.8  45.6  44.4  43    43    43    43    43    43

4. Conclusion and discussion
Fig. 6. Photograph of the hardware used to develop the temperature regulation cooling scheme.
This work presents a novel approach to the cooling of power systems. Based on the dynamic active cooling concept, the methodology enables a reduction of the amplitude of thermal cycles as compared with standard cooling approaches, which benefits system reliability and overall efficiency. Here, only the temperature regulation scheme has been discussed. Preliminary results for the temperature control concept are presented in [5].
In relation to the results of Fig. 1 and to the underlying failure mechanisms, it is worth distinguishing between temperature changes brought about by power dissipation (self-heating) and changes in ambient temperature (passive cycling). As far as power dissipation is concerned, the present study refers essentially to changes in thermal resistance due to modifications in the solder layer or other interfaces of the power assembly. Although in theory the developed regulation solution is applicable to all sources of power dissipation, there is a practical frequency limit up to which the system can respond and up to which the response can be effective. This upper limit is determined by the time constants of the system. Thermal cycling of the bond-wires
is characterised by relatively high frequency components, which cannot realistically be expected to be compensated by the regulation scheme. As far as ambient temperature variations are concerned, the temperature control scheme makes it possible to address both failure mechanisms successfully.
References
[1] Scheuermann U, Hecht U. Power cycling lifetime of advanced power modules for different temperature swings. In: Proc PCIM, Nuremberg, Germany; 2002. p. 59–64.
[2] Held M, Jacob P, Nicoletti G, Scacco P, Poech MH. Fast power cycling test for IGBT modules in traction application. Proc Power Electron Drive Syst 1997:425–30.
[3] Ciappa M. Selected failure mechanisms of modern power modules. Microelectron Reliab 2002;42:653–67.
[4] Choy WJ, Castellazzi A. Regulated cooling strategy for reduced thermal cycling of power devices and assemblies. In: Proc ESREF, Gaeta, Italy; 2010.
[5] Choy WJ, Castellazzi A, Zanchetta P. Adaptive cooling of power modules for reduced power and thermal cycling. In: Proc EPE, Birmingham, UK; 2011.