# An Algorithm Used in a Power Monitor to Mitigate Dark Silicon on VLSI Chip

Zhou Zhao, Ashok Srivastava, and Shaoming Chen, Division of Electrical & Computer Engineering, Louisiana State University, Baton Rouge, LA 70803, USA {zzhao13, eesriv, schen26}@lsu.edu

Saraju P. Mohanty, Department of Computer Science and Engineering, University of North Texas, Denton, TX 76207, USA {saraju.mohanty@unt.edu}

Abstract: Growing data bandwidth requires future general-purpose and application-specific microprocessors to keep improving in performance. Transistor scaling, novel transistor structures, state-of-the-art VLSI design techniques, and new computer architectures are the key drivers for boosting the power and performance of microprocessors. Unfortunately, processor cooling techniques cannot keep pace with higher transistor density and higher performance. As a trade-off between performance and the limit on power dissipation, dark silicon has appeared in current processors. As the number of transistors in future chips increases, next-generation processors can be expected to become darker and darker, which could reduce the efficiency of multi-core processors. In this paper, power dissipation and circuit optimization are discussed in an attempt to mitigate dark silicon in future processors. A power monitor and its algorithm are proposed, mainly to explain how to regulate voltage and power efficiently in future multi-core processors.

Keywords: Dark silicon; Transistor scaling; Power dissipation; Power monitor; Multicore processors

## I. INTRODUCTION

Downscaling of transistors enables high transistor density in microprocessor designs. Meanwhile, over the last 20 years, clock frequency and the required LAN bandwidth have increased significantly due to advances in communication technology [1]. Dark silicon refers to the portion of the transistors on a chip that must operate at reduced frequency to stay within the limits of the cooling technique, and this portion is growing [2]. Dark silicon can occupy a large share of the chip and seriously degrades processor performance, especially in advanced multi-core designs [3]. The dark fraction can be expected to grow unless novel processor topologies and new circuit techniques are introduced. Even worse, dark silicon might cause MOS scaling to fail [4].

To solve this issue, some architecture-based solutions have been proposed. Miller et al. [5] proposed a power switch that delivers more current to a single core in order to improve performance, at the cost of higher power consumption. Chen et al. [6] configured power wires to work in two modes: the traditional power mode and a data mode. This topology increases data bandwidth without increasing power consumption. For portable devices, Goulding-Hotta et al. [7] proposed the "GreenDroid" concept to address the dark silicon issue in Android smartphones.

These existing solutions are implemented at the architecture level and are based on abstract computer architecture analysis. This work instead provides a mathematical analysis from the circuit point of view and proposes a feasible solution at the circuit level. Section II analyzes power dissipation, which is the key factor behind dark silicon. Section III proposes an on-chip power monitor and its associated algorithm to regulate the working mode of a multi-core processor under different power conditions. Results are summarized in Section IV.

## II. MODELING OF POWER DISSIPATION COMPONENTS

Transistor size continues to shrink thanks to advanced fabrication technology, and nanometer processes let transistors switch at higher frequencies. Unfortunately, transistor scaling has a negative impact on chip power dissipation and challenges current cooling techniques. It is the power dissipation per unit area, not the total power dissipation, that restricts processor development and brings dark silicon into a chip. Power dissipation per unit area has increased in recent years because power consumption has fallen more slowly than chip area. Meanwhile, current cooling techniques have reached a bottleneck and will not be suitable for future processors [8]. One exception is liquid cooling, which performs well at high temperature but is too costly to be used widely. Hence, further reduction of power dissipation is needed to mitigate dark silicon.

Static power dissipation is a serious problem in nanometer technologies with low supply voltage because of leakage current. Quantum tunneling is the main cause of the large leakage current [9, 10]. As transistors scale, tunneling weakens the gate voltage's control over carriers, which becomes a main source of leakage current when the transistor is off. The leakage current can be expressed as follows [10]:

$$I_{leak} = I_{sub} + I_{ox} = K_1 W e^{-V_{th}/nV_0} \left(1 - e^{-V/V_0}\right) + K_2 W \left(\frac{V}{T_{ox}}\right) e^{-aT_{ox}/V} \tag{1}$$

The leakage current includes subthreshold leakage, $I_{sub}$, and gate oxide leakage, $I_{ox}$. Eqn. (1) indicates that the subthreshold term grows exponentially as the threshold voltage is lowered with transistor scaling, so leakage current contributes more to total power dissipation. The thermal voltage, $V_0$, increases with temperature, which further increases leakage. Thus, if transistor scaling continues and there is no breakthrough in chip cooling techniques, leakage current in VLSI chips will become more significant and may even force more area of a processor to become dark silicon.

Figure 1: The proposed power dissipation monitor.

Figure 2: The proposed algorithm for dark silicon reduction.
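The trends described for Eqn. (1) can be checked numerically with a minimal sketch. The fitting constants `k1`, `k2`, `a`, and `n`, the device dimensions, and the reading of `W` as transistor width and `T_ox` as oxide thickness are assumptions chosen only for illustration; they are not values from this paper.

```python
import math

BOLTZMANN = 1.380649e-23           # J/K
ELECTRON_CHARGE = 1.602176634e-19  # C


def thermal_voltage(temp_kelvin):
    """Thermal voltage V_0 = kT/q in volts."""
    return BOLTZMANN * temp_kelvin / ELECTRON_CHARGE


def leakage_current(v, v_th, t_ox, width, temp_kelvin, k1, k2, a, n):
    """Return (I_sub, I_ox) following the two terms of Eqn. (1)."""
    v0 = thermal_voltage(temp_kelvin)
    i_sub = k1 * width * math.exp(-v_th / (n * v0)) * (1.0 - math.exp(-v / v0))
    i_ox = k2 * width * (v / t_ox) * math.exp(-a * t_ox / v)
    return i_sub, i_ox


# Hypothetical parameters, not taken from the paper or from a model card.
params = dict(v=0.9, t_ox=1.0e-9, width=1.0e-6, k1=1.0, k2=1.0e-9, a=5.0e9, n=1.5)

for v_th, temp in [(0.40, 300.0), (0.30, 300.0), (0.30, 360.0)]:
    i_sub, i_ox = leakage_current(v_th=v_th, temp_kelvin=temp, **params)
    print(f"Vth={v_th:.2f} V, T={temp:.0f} K: I_sub={i_sub:.3e}, I_ox={i_ox:.3e}")
```

With these assumed constants, the subthreshold term rises when the threshold voltage is lowered and rises again when the temperature increases, while the oxide term depends only on the supply voltage and oxide thickness.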

Besides the static power dissipation described above, dynamic power dissipation is the other major component. Dynamic power is calculated as follows [11]:

$$P_{dynamic} = \frac{N_{work}}{N_{work} + N_{sleep}} C_{total} V_{dd}^2 f \tag{2}$$

where $N_{work}$ and $N_{sleep}$ represent the number of active transistors and the number of transistors in sleep mode, respectively. Future CPUs will operate at higher clock frequencies, but $V_{dd}$ will not decrease much because design at extremely low supply voltage is difficult. Hence, dynamic power dissipation will not decrease. Writing one factor of $V_{dd}$ as $R_{total} I_{total}$, where $R_{total}$ is the total resistance and $I_{total}$ is the total current, Eqn. (2) can be rewritten as follows:

$$P_{dynamic} = \frac{N_{work}}{N_{work} + N_{sleep}} C_{total} V_{dd} R_{total} f I_{total} \tag{3}$$

Eqn. (3) indicates that reducing current is a worthwhile way to reduce dynamic power. Increasing the total resistance might seem a feasible way to reduce current. However, in VLSI design, the aspect ratio W/L, which directly sets the resistance, is already small, so this method is not feasible. Another consideration is the propagation delay. For a CMOS inverter, the delay time, $t_{pd}$, is the average of $t_{pLH}$ and $t_{pHL}$, which are expressed as follows:

$$t_{pLH} = t_{pHL} = \frac{2 C_L}{\beta (V_{DD} - V_{th})} \left[ \frac{V_{th} - 0.1 V_{DD}}{V_{DD} - V_{th}} + \frac{1}{2} \ln\left( \frac{19 V_{DD} - 20 V_{th}}{V_{DD}} \right) \right] \tag{4}$$

where $V_{th}$ is the threshold voltage, assumed to be the same for both n- and p-MOSFETs. The transconductance parameter, $\beta$, is also assumed to be the same for both devices (a symmetrical design). Because a smaller drive current lengthens the delay, decreasing current would seriously restrict processor bandwidth. However, decreasing the supply voltage and threshold voltage simultaneously might reduce power dissipation without reducing bandwidth as much.
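The effect of lowering the threshold voltage together with the supply can be explored with a small sketch of Eqn. (4). The load capacitance, $\beta$, and voltage values below are hypothetical, and $\beta$ is assumed to stay fixed as the voltages change, which a real process would not guarantee.

```python
import math


def propagation_delay(c_load, beta, v_dd, v_th):
    """Evaluate Eqn. (4); requires 19*V_DD > 20*V_th so the log argument is positive."""
    if v_dd <= v_th or 19.0 * v_dd <= 20.0 * v_th:
        raise ValueError("Eqn. (4) needs V_DD > V_th and 19*V_DD > 20*V_th")
    overdrive = v_dd - v_th
    bracket = (v_th - 0.1 * v_dd) / overdrive + 0.5 * math.log(
        (19.0 * v_dd - 20.0 * v_th) / v_dd
    )
    return 2.0 * c_load / (beta * overdrive) * bracket


# Hypothetical device values for illustration only.
C_LOAD = 1.0e-15  # F
BETA = 1.0e-4     # A/V^2

baseline = propagation_delay(C_LOAD, BETA, v_dd=0.90, v_th=0.30)
vdd_only = propagation_delay(C_LOAD, BETA, v_dd=0.72, v_th=0.30)
both_scaled = propagation_delay(C_LOAD, BETA, v_dd=0.72, v_th=0.24)

print(f"baseline          : {baseline:.3e} s")
print(f"V_DD lowered only : {vdd_only:.3e} s")
print(f"V_DD and V_th both: {both_scaled:.3e} s")
```

Under these assumptions, lowering only the supply lengthens the delay more than lowering both voltages by the same factor. In the latter case the bracketed term in Eqn. (4) is unchanged, because it depends only on the ratio $V_{th}/V_{DD}$.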

The third contribution to total power dissipation comes from interconnect wires and cannot be ignored. To sum up, total power consumption can be expressed as follows:

$$P_{total} = \frac{N_{work}}{N_{work} + N_{sleep}} C_{total} V_{dd}^2 f + \sum I_{leak} V_{dd} + \sum R_{wire} I_{wire}^2 \tag{5}$$

Eqn. (5) shows that, besides inventing energy-saving underlying circuits such as new gate circuits, many parameters can be optimized to reduce power dissipation and thus address the dark silicon issue. Each of them is a potential breakthrough, subject to tradeoffs between performance and power dissipation.
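The three terms of Eqn. (5) can be written as a short sketch to see how each parameter enters the total. The function names and all numerical values are illustrative assumptions, not data from this work.

```python
def dynamic_power(n_work, n_sleep, c_total, v_dd, freq):
    """First term of Eqn. (5): switching power scaled by the active fraction."""
    active_fraction = n_work / (n_work + n_sleep)
    return active_fraction * c_total * v_dd ** 2 * freq


def total_power(n_work, n_sleep, c_total, v_dd, freq, leak_currents, wires):
    """Eqn. (5). `wires` is an iterable of (R_wire, I_wire) pairs."""
    p_dynamic = dynamic_power(n_work, n_sleep, c_total, v_dd, freq)
    p_static = sum(leak_currents) * v_dd
    p_wire = sum(r * i ** 2 for r, i in wires)
    return p_dynamic + p_static + p_wire


# Hypothetical example: 3 active blocks, 1 sleeping block.
p = total_power(
    n_work=3, n_sleep=1, c_total=1.0e-9, v_dd=1.0, freq=1.0e9,
    leak_currents=[1.0e-3, 2.0e-3],
    wires=[(10.0, 0.01)],
)
print(f"P_total = {p:.4f} W")
```

Moving transistors from `n_work` to `n_sleep` lowers only the dynamic term, which is the lever the sleep mode in Section III relies on.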

## III. PROPOSED POWER REGULATOR FOR THE PROCESSORS

Reference [12] discusses the importance of voltage regulators for next-generation CPUs. The work in [13] presents a multiple-threshold-voltage regulator to reduce static power dissipation.

A novel power monitor is described below. It includes a feedback system that monitors and controls the working mode of a CPU for different total currents and predicts temperature. The essence of the proposed design is the use of feedback to monitor and modify the working mode of a CPU. Figure 1 shows the schematic. The feedback system is simple compared with a single core, yet it gives a single core three working states: high performance mode, energy saving mode, and sleep mode.

The first mode is the high performance mode, in which the core works at the high clock frequency and full supply voltage. The second mode, energy saving mode, is used when a single core either reaches the power dissipation limit or reaches a temperature high enough to seriously damage the chip. The last mode, sleep mode, is for multi-core systems, in which working cores and sleeping cores may exist at the same time. $M_{hp}$, $M_{es}$, and $M_{sp}$ represent the three switches that connect to high performance mode, energy saving mode, and sleep mode, respectively. The proposed power monitor switches the chip correctly between the first two modes, and a control signal external to the core determines whether the core works in sleep mode.

The proposed design keeps a chip from reaching the power dissipation bottleneck, which would force it to work at extremely high temperatures. Therefore, we first introduce an equation that estimates the maximum power dissipation of a chip for a given chip area, fabrication process, and cooling technique:

$$P_{\max} \approx P_{unit-area} S \alpha_1 \tag{6}$$

 $P_{unit-area}$  is determined by the fabrication process and cooling technique. S is the total area of a chip. The correction factor is  $\alpha_1$  which has a value of less than 1 to prevent the chip from crossing maximum power dissipation. Then using Eqn. (5) and (6), maximum current is estimated as follows:

$$I_{\max} \approx \frac{N_{work}}{N_{work} + N_{sleep}} C_{total} V_{dd} f + \sum I_{leak} + \frac{V_{dd}}{R_{wire}} \tag{7}$$

where $C_{total}$ and $\sum I_{leak}$ are described by the following:

$$C_{total} \approx \alpha_2 WLC_{ox} N_{total} \tag{8}$$

$$\sum I_{leak} \approx I_s e^{\frac{qV_{gs}}{nkT}} \left(1 - e^{-\frac{qV_{ds}}{kT}}\right) N_{sleep} \alpha_3 \tag{9}$$
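One way to read the step from Eqn. (5) and (6) to Eqn. (7) is sketched below. This is an interpretation rather than a derivation stated in the text. It assumes that the power budget of Eqn. (6) caps the supply current at $P_{\max}/V_{dd}$, and that the wire term is lumped into a single resistance with $I_{wire} \approx V_{dd}/R_{wire}$.

$$\begin{aligned}
I_{\max} &\approx \frac{P_{\max}}{V_{dd}} = \frac{P_{unit-area}\, S\, \alpha_1}{V_{dd}}, \\
\frac{P_{total}}{V_{dd}} &= \frac{N_{work}}{N_{work} + N_{sleep}} C_{total} V_{dd} f + \sum I_{leak} + \frac{1}{V_{dd}} \sum R_{wire} I_{wire}^2 \\
&\approx \frac{N_{work}}{N_{work} + N_{sleep}} C_{total} V_{dd} f + \sum I_{leak} + \frac{V_{dd}}{R_{wire}}.
\end{aligned}$$

Under these assumptions, each term of Eqn. (7) is the corresponding term of Eqn. (5) divided by $V_{dd}$.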

In Eqn. (8), $\alpha_2$ is larger than 1 and accounts for the total capacitance including both gate capacitance and bulk capacitance. $\alpha_3$ is a correction parameter that accounts for several off transistors in one loop sharing a single value of current, so $\alpha_3$ should be lower than 1. The maximum current can be calculated from Eqn. (7), (8), and (9) with some approximations. In the proposed structure, M1 works in the saturation region because of its diode-connected configuration, and its gate-source voltage is compared with the reference voltage, $V_{ref}$. Using this maximum value of current, $V_{ref}$ can be calculated as follows:

$$V_{ref} = \sqrt{\frac{2I_{max}}{\mu C_{ox} \frac{W_1}{L_1}}} + V_{th} \tag{10}$$

For the actual current, the voltage to be compared can be obtained from the following expression:

$$V_{comp} = \sqrt{\frac{2I_{total}}{\mu C_{ox} \frac{W_1}{L_1}}} + V_{th} \tag{11}$$
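The comparison between the reference voltage and the sensed voltage can be sketched as follows. The sketch uses the square-law form of Eqn. (11) for both voltages. The device values, the function names, and the tie-breaking choice at exactly $I_{total} = I_{max}$ are assumptions for illustration. The output polarity follows the text: low when the total current exceeds the maximum current.

```python
import math


def diode_connected_vgs(current, mu_cox, w_over_l, v_th):
    """Gate-source voltage of a saturated, diode-connected MOSFET (square-law model)."""
    return math.sqrt(2.0 * current / (mu_cox * w_over_l)) + v_th


def comparator_output(i_total, i_max, mu_cox, w_over_l, v_th):
    """Return 1 while V_comp <= V_ref (within budget), 0 when the budget is exceeded."""
    v_ref = diode_connected_vgs(i_max, mu_cox, w_over_l, v_th)
    v_comp = diode_connected_vgs(i_total, mu_cox, w_over_l, v_th)
    return 1 if v_comp <= v_ref else 0


# Hypothetical M1 parameters; only I_MAX = 0.6 mA echoes a figure used in this paper.
MU_COX = 200.0e-6  # A/V^2
W_OVER_L = 10.0
V_TH = 0.3         # V
I_MAX = 0.6e-3     # A

for i_total in (0.5e-3, 0.7e-3):
    out = comparator_output(i_total, I_MAX, MU_COX, W_OVER_L, V_TH)
    print(f"I_total = {i_total * 1e3:.1f} mA -> comparator output {out}")
```

Because both voltages come from the same monotonic function of current, comparing $V_{comp}$ with $V_{ref}$ is equivalent to comparing $I_{total}$ with $I_{max}$.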

If the output of the comparator is low, the total current is larger than the maximum current allowed by the fabrication process and cooling technique. The feedback then connects the core to energy saving mode, in which it works at a relatively low frequency with a small supply voltage. In this case, because the supply voltage is small, the threshold voltage of the transistors in the core should be reduced in order to avoid a serious frequency drop, which is the essence of dark silicon. As a tradeoff among frequency, design complexity, and static power consumption, only $M_{es}$ is designed with a low threshold voltage.

If the output of the comparator is high, the core is working in a safe state with acceptable power dissipation. In this case, the core connects to high performance mode. However, even if the core always stays below the power consumption limit, the heat it generates accumulates and the chip temperature rises, which degrades the performance of the chip. To avoid this, a temperature calibration part is added. $CLK_t$ is the key signal that forces the core into energy saving mode if it has been working in high performance mode for a long time. If $CLK_t$ is high, the core is controlled normally by the feedback system as explained above. When $CLK_t$ is low and the core is still in high performance mode, which implies that the chip would be very hot, the additional logic circuit switches the core to energy saving mode to reduce temperature. Note that the three-state buffer connected to energy saving mode cannot generate a wrong control signal for the core. Determining how long $CLK_t$ should stay high or low is very difficult, since temperature change in a VLSI chip depends on many complicated factors. Monitoring the temperature of a real CPU is a feasible way to determine these high and low durations, $T_{high}$ and $T_{low}$.

The last mode is sleep mode, which is intended for future multi-core CPUs whose cores can work asynchronously. To reduce leakage current, the threshold voltage of $M_{sp}$ should be set to a high value. This mode is controlled by a control line external to the core. To sum up, Fig. 2 shows the complete flow of the proposed algorithm. The first step is to calculate $I_{max}$ for a given power consumption. Then $I_{max}$ is compared with the actual current, $I_{total}$. If the comparator output is 0, the core works in energy saving mode; if the comparator output is 1, the core safely stays in high performance mode. Further, if the core works in high performance mode for a long time, the temperature calibration signal, $CLK_t$, changes to 0, and the core is forced into energy saving mode to reduce the chip temperature. In the next cycle of $CLK_t$, because $CLK_t$ returns to 1, the working mode is again determined by the comparator. This feedback system therefore uses few transistors, compared with the core, to switch the core's working mode according to power dissipation and temperature.
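The decision flow described above can be summarized as a small behavioral sketch. It models only the logic, not the circuit. The names `Mode`, `select_mode`, and `sleep_request`, and the sample currents, are hypothetical. Giving the sleep request priority over the other two modes is also an assumption, since the text only says that sleep mode is controlled by a separate control line.

```python
from enum import Enum


class Mode(Enum):
    HIGH_PERFORMANCE = "high performance"
    ENERGY_SAVING = "energy saving"
    SLEEP = "sleep"


def select_mode(i_total, i_max, clk_t, sleep_request=False):
    """Pick the working mode of one core for the current monitoring step."""
    if sleep_request:
        # Assumed priority: the external control line overrides the monitor.
        return Mode.SLEEP
    if clk_t == 0:
        # Temperature calibration forces energy saving regardless of the comparator.
        return Mode.ENERGY_SAVING
    comparator = 1 if i_total <= i_max else 0
    return Mode.HIGH_PERFORMANCE if comparator == 1 else Mode.ENERGY_SAVING


I_MAX = 0.6e-3  # A, the maximum current used in the verification example

# Hypothetical sequence of (I_total, CLK_t) samples.
samples = [(0.5e-3, 1), (0.7e-3, 1), (0.5e-3, 0), (0.5e-3, 1)]
for i_total, clk_t in samples:
    mode = select_mode(i_total, I_MAX, clk_t)
    print(f"I_total={i_total * 1e3:.1f} mA, CLK_t={clk_t} -> {mode.value}")
```

A time-based model of $CLK_t$ is left out on purpose, because its high and low durations depend on measured thermal behavior.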

To verify the function of the proposed power monitor, an RC model of a four-core microprocessor is built. Based on the parameters of various Intel processors [14], a 16 nm predictive transistor model [15], and the trend of processor performance, the virtual processor in this verification is defined as an ultra-low-power chip for portable devices with a 0.9 V supply, 0.6 mA maximum current, 3 GHz clock frequency, and 90 million transistors. Using Eqn. (8) and the expression for the resistance of a transistor working in the linear region, the total capacitance and resistance of the virtual processor are estimated as 1020 nF and 1500 $\Omega$, respectively. Figure 3 shows the results of mode switching, for which the period is ideally defined as 1 ns; the high performance, energy saving, and sleep modes last around 0.2 ns, 0.2 ns, and 0.5 ns, respectively. The figure shows that the entire switching process is correct. The problem with this power monitor is that the switching delay is large and the current loss exceeds 0.1 mA. Figure 4 shows the Fourier transform. In the proposed design, the maximum frequency of work-mode switching is 13 GHz, which is far higher than the 3 GHz clock frequency specified above.

## IV. CONCLUSIONS AND FUTURE RESEARCH

In this paper, the dark silicon issue, power dissipation, and their impact on chips are analyzed. From the analysis, a potential method to reduce power dissipation and mitigate dark silicon on a chip is proposed. At the circuit level, a power monitor and its algorithm are proposed at very low cost compared with a complex core containing numerous transistors. Due to time limitations, the microprocessor was replaced by an RC circuit to demonstrate the usefulness of the proposed method. Because this abstract simulation does not include a real core, it cannot conclusively demonstrate that the idea is correct and feasible. Future work will simulate a real core and add further circuits to the microprocessor in an attempt to reduce power dissipation without a large frequency drop.

#### ACKNOWLEDGMENT

Part of this work is supported under NSF grant 1422408. The authors gratefully acknowledge help from Dr. Lu Peng in addressing the dark silicon problem and finding a solution.

#### REFERENCES

- [1] Danowitz, Andrew, et al. "CPU DB: recording microprocessor history", *Communications of the ACM*, 55.4 (2012): 55-63.
- [2] Goulding-Hotta, Nathan, et al. "The greendroid mobile application processor: An architecture for silicon's dark future", *IEEE Micro*, 31.2 (2011): 86-95.
- [3] Esmaeilzadeh, Hadi, et al. "Dark silicon and the end of multicore scaling", in *Proceedings of the 38th IEEE Annual International Symposium on Computer Architecture (ISCA)*, 2011.
- [4] Taylor, Michael B. "A landscape of the new dark silicon design regime", *IEEE Micro*, 33.5 (2013): 8-19.
- [5] Miller, Timothy N., et al. "Booster: Reactive core acceleration for mitigating the effects of process variation and application imbalance in low-voltage chips", in *Proceedings of the 18th IEEE International Symposium on High Performance Computer Architecture*, 2012.
- [6] Chen, Shaoming, et al. "Increasing off-chip bandwidth in multi-core processors with switchable pins", in *Proceedings of the 41st Annual International Symposium on Computer Architecture*, 2014.
- [7] Goulding-Hotta, Nathan, et al. "Greendroid: An architecture for the dark silicon age", in *Proceedings of the 17th Asia and South Pacific Design Automation Conference (ASP-DAC)*, 2012.
- [8] Hardavellas, Nikos, et al. "Toward dark silicon in servers", *IEEE Micro*, 31.4 (2011): 6-15.
- [9] Abbas, Zia, and Mauro Olivieri. "Impact of technology scaling on leakage power in nano-scale bulk CMOS digital standard cells", *Microelectronics Journal*, 45.2 (2014): 179-195.
- [10] S. P. Mohanty, Nanoelectronic Mixed-Signal System Design, McGraw-Hill, 2015, ISBN: 978-0071825719 and 0071825711.
- [11] S. P. Mohanty, N. Ranganathan, and S. K. Chappidi, "Peak Power Minimization through Datapath Scheduling", in *Proceedings of the IEEE CS Annual Symposium on VLSI (ISVLSI)*, pp.121-126, 2003.
- [12] López, Toni, Reinhold Elferich, and Eduard Alarcón, "Voltage regulators for next generation microprocessors", Springer, 2010.
- [13] Kao, James, Siva Narendra, and Anantha Chandrakasan. "Subthreshold leakage modeling and reduction techniques" in *Proceedings of the IEEE/ACM International Conference on Computer-Aided Design*, 2002.
- [14] http://ark.intel.com/#@Processors
- [15] http://ptm.asu.edu/modelcard/LP/16nm\_LP.pm