# ILP Models for Simultaneous Energy and Transient Power Minimization during Behavioral Synthesis

Saraju P. Mohanty Dept. of Comp. Science and Eng. Univ. of North Texas, TX 76203. smohanty@cs.unt.edu N. Ranganathan and Sunil K. Chappidi Dept. of Comp. Science and Eng. Univ. of South Florida, FL 33620. ranganat@csee.usf.edu

In low-power design for battery driven portable applications, the reduction of peak power, peak power differential, cycle difference power, average power and energy are equally important. These are different forms of dynamic power dissipation of a CMOS circuit, which is predominant compared to static power dissipation for higher switching activity. The peak power, the cycle difference power, and the peak power differential drive the transient characteristic of a CMOS circuit. In this paper, we propose an ILP-based framework for the reduction of energy and transient power through datapath scheduling during behavioral synthesis. A new metric called "modified cycle power function" (CPF\*) is defined that captures the above power characteristics and facilitates integer linear programming formulations. The ILP-based datapath scheduling schemes with CPF\* as objective function are developed assuming three modes of datapath operation, such as, single supply voltage and single frequency (SVSF), multiple supply voltages and dynamic frequency clocking (MVDFC), and multiple supply voltages and multicycling (MVMC). We conducted experiments on selected high-level synthesis benchmark circuits for various resource constraints and estimated power, energy and energy delay product for each of them. Experimental results show that significant reductions in power, energy and energy delay product can be obtained.

Categories and Subject Descriptors: B.5.1 [**Register-Transfer-Level Implementation**]: Datapath Design; B.5.2 [**Register-Transfer-Level Implementation**]: Automatic Synthesis, Optimization; G.1.6 [**Numerical Analysis**]: Optimization, Integer Programming

General Terms: Algorithms, Performance, Design, Reliability, Linear Modeling of Non-linearity, Scheduling Additional Key Words and Phrases: peak power, cycle difference power, peak power differential, average power, multiple supply voltages, dynamic frequency clocking, multicycling, datapath scheduling

#### 1. INTRODUCTION

The proliferation of portable systems and mobile computing platforms has increased the need for the design of low power consuming integrated circuits. The increase in chip density and clock frequencies due to technology advances has made low power design a critical issue. Low power design is further driven by several other factors such as thermal considerations and environmental concerns. In battery driven portable systems, the peak power, cycle difference power, peak power differential, average power and energy are equally critical design constraints.

This research was carried out at the Department of Computer Science and Engineering, University of South Florida, Tampa, FL 33620.

Permission to make digital/hard copy of all or part of this material without fee for personal or classroom use provided that the copies are not made or distributed for profit or commercial advantage, the ACM copyright/server notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists requires prior specific permission and/or a fee.

© 2005 ACM 1084-4309/2005/0400-0111 \$5.00

The reduction of peak power consumption is essential due to following reasons [Singh et al. 1995]: (i) to maintain supply voltage levels, (ii) to increase reliability, (iii) to use smaller heat sinks, and (iv) to make packaging cheaper. The peak power of a circuit is the maximum power consumption of the circuit at any instance during its execution. High peak power can affect the supply voltage levels. The large current flow causes high IR drop in the power line, which leads to the reduction of the supply voltage levels at different parts of the circuit. High current flow can reduce reliability because of hot electron effects and high current density. The hot electrons may lead to electrostatic discharge and runaway current failures, and high current density can cause electromigration failures. It has been observed that the mean time to failure (MTF) of a CMOS circuit is inversely proportional to the current density (or power density). Large power dissipation results in the need for expensive heat dissipation mechanisms to maintain the operating temperatures of the ICs within their tolerance limits.

The cycle difference power and the peak power differential need to be reduced for the following reasons : (i) to reduce power supply noise, (ii) to reduce cross talk and electromagnetic noise, (iii) to increase battery efficiency and (iv) to increase reliability. Power fluctuation leads to larger  $\frac{di}{dt}$  causing power supply noise, (similar to IR drop), because of self inductance of power supply lines. Crosstalk is the noise voltage induced in signal line due to the switching in the adjacent signal lines [Singh et al. 1995]. The voltage induced by the mutual inductance is expressed as  $L\frac{di}{dt}$  and that induced by the mutual capacitance as  $C\frac{dv}{dt}$ . If the power fluctuation is high, then large  $\frac{di}{dt}$  and  $\frac{dv}{dt}$  can introduce significant noise in the signal lines. As the power fluctuation increases, it reduces the electrochemical conversion and hence there is decrease in battery life. High current peaks (power fluctuation) in short time spans can cause high heat dissipation in a localised area of silicon die which may lead to permanent failure of the integrated circuit.

Energy and average power reduction is essential for the following reasons : (i) to increase battery life time, (ii) to enhance noise margin, (iii) to reduce cooling and energy costs, (iv) to reduce use of natural resources, and (v) to increase system reliability. The battery life time is determined by the Ah (ampere hour) rating of the battery. If the average power (and/or energy) consumption is high, this will lead to reduced battery life time. The reduction of average power is essential to enhance noise margin (to decrease functional failure). The cost of packaging and cooling is determined by average current flow and hence, the average power and energy. Due to the proliferation of the use of computers, the large amount of energy consumption leads to the need for more power generation, which could lead to environmental concerns. In integrated circuits, if the average power consumption is high, the operating temperature of the chip will increase leading to failures. It is estimated that for each  $10^{\circ}C$  increase in the operating temperature, the failure rates of the component is roughly doubled.

Power and energy reduction can be achieved at different levels, such as the architecture, algorithm, behavioral, register-transfer, logic and transistor levels. In this work, we focus on low power datapath scheduling at the behavioral level. We propose ILP based schemes for energy and transient power minimization. The first scheme uses multiple supply voltages and dynamic frequency clocking (MVDFC) and the second scheme is based on the use of multiple supply voltages and multicycling (MVMC). We also consider traditional scheduling using single supply voltage and single frequency (SVSF) to facilitate comparison among all the schemes. In multiple supply voltage scheme, the functional units can

be operated at different supply voltages. The energy reduction due to the use of multiple voltages is often accompanied by the degradation of performance because of the increase in critical path delay. This degradation in performance can be compensated by the use of techniques such as dynamic frequency clocking [Mohanty and Ranganathan 2003b; Mohanty et al. 2002], multicycling and chaining [Park and Choi 2001], and variable latency components [Benini et al. 1998; Benini et al. 1999]. In the case of multicycling, an operation is scheduled in more than one consecutive control step and in addition, each control step is of equal length. In dynamic frequency clocking, the clock frequency may be varied on-the-fly depending on the computations being performed. In this case, an operation is scheduled in a single control step and the control steps of a schedule may vary in length.

The contributions of this paper are multifold:

- —a method that facilitates the simultaneous reduction of several dynamic power components such as average power, peak power, cycle difference power, peak power differential, and energy.
- —a new cycle power function called "modified cycle power function CPF\*" is defined so as to capture the different dynamic power components and to facilitate integer linear programming (ILP) formulation.
- —the exploration of design alternatives due to the combined use of dynamic frequency clocking and multiple supply voltages.
- —ILP formulations based on transforming the nonlinear objective function CPF\* to linear frameworks and under different constraints.
- -a set of datapath scheduling algorithms are developed for different modes of operations

The rest of the paper is organized as follows. The various related works are briefly visited in the next section. The formulation of the the modified cycle power function CPF<sup>\*</sup> is presented followed by the issues in transforming the function into a linear objective function. The ILP formulations and the datapath scheduling algorithms are described in Sections 5 and 6, the experimental results in Section 7 followed by the conclusions.

### 2. RELATED WORK

Very few works address the reduction of peak power and peak power differential during behavioral synthesis. On the other hand, quite many works have appeared in the literature addressing average power and energy reduction during datapath synthesis. We will briefly discuss the above works followed by those related to dynamic frequency clocking.

In the work reported in [Martin and Knight 1996a], the peak power reduction is achieved through simultaneous assignment and scheduling. Specifically, the simultaneous use of SPICE and behavioral synthesis tools is described. Genetic algorithms are used for optimization of average and peak power. In [Shiue 2000], ILP based scheduling and force directed scheduling are proposed to minimize peak power under latency constraints. ILP based models to minimize both peak power and peak area are proposed in [Shiue and Chakrabarti 2000a] for latency constraint scheduling. In [Shiue et al. 2000], the authors describe a time constrained scheduling algorithm for real time systems using a modified ILP

model that minimizes both peak power and number of resources. In [Raghunathan et al. 2001], the authors propose the use of data monitor operations for simultaneous peak power reduction and peak power differential. In [Mohanty and Ranganathan 2003a], a heuristic based scheme is proposed that minimizes peak power, peak power differential, average power and energy. In [Mohanty et al. 2003], ILP based datapath scheduling schemes are described for peak power minimization under resource constraints.

Several research works, such as, [Johnson and Roy 1997; Chang and Pedram 1997; Shiue and Chakrabarti 2000b] have used multiple supply voltage method for energy reduction during datapath synthesis. The above scheduling techniques consider various concepts such as multiple supply voltages, voltage scaling, capacitance reduction, and switching activity reduction for minimizing either total energy or average power, but not both at the same time. However, these works have not considered dynamic frequency clocking or transient power reduction.

Many works considering the use of variable latency, dynamic frequency and multiple frequencies have appeared in the literature. The usage of "telescopic" units to improve the throughput of digital systems is introduced in [Benini et al. 1998; Benini et al. 1999]. The telescopic units complete execution in a variable number of clock cycles depending on the input data. The concept of dynamic frequency clocking is introduced and a SIMD linear array image processor design is discussed in [Ranganathan et al. 1996]. A low power design using multiple clocking scheme is presented in [Papachristou et al. 1999]. The usage of frequency scaling in an MPEG2 decoder design is discussed in [Kim and Chae 1996]. A time constrained heuristic scheduling algorithm is discussed in [Mohanty et al. 2002] that uses both frequency and voltage scaling. In [Mohanty and Ranganathan 2003b], heuristic algorithms are proposed for energy minimization using multiple voltages and dynamic frequency clocking during datapath scheduling. Several system-level approaches [Burd et al. 2000; Hsu et al. 2000; Pouwelse et al. 2001] have been investigated towards reducing power consumption in both general purpose and special purpose processors using simultaneous voltage and frequency scaling.

Most works discussed above address either average power, energy or peak power, but not all the forms of dynamic power consumption together. In this paper, we describe an ILP-based framework for the simultaneous minimization of energy, average power, peak power, cycle difference power, and peak power differential. A parameter called modified cycle power function (CPF<sup>\*</sup>) is defined as an equally weighted sum of the normalized mean cycle power and the normalized mean cycle difference power. Minimizing CPF\* using multiple supply voltages and dynamic frequency clocking or multicycling results in the reduction of both energy and transient power. The cycle difference power is modeled as the absolute deviation from the mean cycle power. The switching activity of the functional units is characterized through simulations and incorporated into the model. The datapath scheduling algorithms to minimize the CPF\* using dynamic frequency clocking or multicycling and multiple supply voltages are proposed. The algorithm assumes different types and numbers of resources (such as, multipliers and ALUs) at different operating voltages and the number of allowable operating frequencies as resource constraints and attempts to minimize CPF\*. The scheduling algorithm generates a parameter called *cycle frequency index*, denoted as  $cfi_c$  for control step c to be stored in the controller. This parameter serves as the clock dividing factor for the dynamic clocking unit (DCU) which is used to generate different frequencies on the fly.

ILP models for simultaneous energy and transient power minimization . 115

# 3. MODIFIED CYCLE POWER FUNCTION

In this section, we define a metric called modified cycle power function (CPF<sup>\*</sup>) that captures the average and transient power characteristics of a datapath circuit. It may be noted that though CPF<sup>\*</sup> captures the average and transient power characteristics, its minimization using multiple voltages leads to reduction of energy as well. Moreover, while defining the CPF<sup>\*</sup> which is a nonlinear function, we take into the account the fact that the complexity ILP formulations to be used for its minimization is reduced. We assume that the datapath is represented as a sequencing data flow graph (DFG). The definitions and notations needed for the model are given in Table I.

|                | Table I. Notations and Definitions used in CPF* Modeling                                  |
|----------------|-------------------------------------------------------------------------------------------|
| N              | : total number of control steps in the DFG                                                |
| 0              | : total number of operations in the DFG                                                   |
| c              | : a control step or a clock cycle in DFG                                                  |
| $o_i$          | : any operation, where $1 \le i \le O$ ,                                                  |
| $P_c$          | : the total power consumption of all functional units active                              |
|                | in control step $c$ (cycle power consumption)                                             |
| $P_{peak}$     | : peak power consumption for the DFG = $(max(P_c)_{\forall c})$                           |
| P              | : mean power consumption of the DFG (average $P_c$ over all control steps)                |
| $P_{norm}$     | : normalised mean power consumption of the DFG                                            |
| $DP_c$         | : cycle difference power (for cycle <i>c</i> ; a measure of cycle power fluctuation)      |
| $DP_{peak}$    | : peak differential power consumption for the DFG = $(max(DP_c)_{\forall c})$             |
| DP             | : mean of the cycle difference powers for all control steps in DFG                        |
| $DP_{norm}$    | : normalised mean of the cycle difference powers for all steps in DFG                     |
| CPF*           | : modified cycle power function                                                           |
| $FU_{k,v}$     | : any functional unit of type $k$ operating at voltage level $v$                          |
| $FU_i$         | : any functional unit $FU_{k,v}$ needed by $o_i$ for its execution ( $o_i \in FU_{k,v}$ ) |
| $FU_{i,c}$     | : any functional unit $FU_i$ active in control step $c$                                   |
| $R_c$          | : total number of functional units active in step $c$                                     |
|                | (same as the number of operations scheduled in $c$ )                                      |
| $\alpha_{i,c}$ | : switching activity of resource $FU_{i,c}$                                               |
| $V_{i,c}$      | : operating voltage of resource $FU_{i,c}$                                                |
| $C_{i,c}$      | : load capacitance of resource $FU_{i,c}$                                                 |
| $f_c$          | : frequency of control step $c$                                                           |
| $f_{clk}$      | : operating clock frequency for MVMC or SVSF scheme                                       |
| V              | : operating supply voltage for MVMC or SVSF scheme                                        |

We would like to define CPF<sup>\*</sup> as a multicost objective function such that minimizing it should facilitate the minimization of the multiple objectives, the different dynamic power components. The simultaneous optimization of different inter-related parameters is difficult. In general, if we have to optimize four parameters, say, a, b, c and d, the simplest approach is to form the objective function as,  $obj_1 = a + b + c + d$ . This function will not be effective for all parameters if the quantitative differences among the values of the parameters are large. So, one way of handling this is to use different weighting functions, and reformulate the objective function as,  $obj_2 = \alpha * a + \beta * b + \gamma * c + \theta * d$ , where the weighting functions can be used for tuning the objective function. In this scenario, there are eight different parameters to be handled. Another way of forming the objective function is  $obj_3 = \frac{a}{b} + \frac{c}{d}$ . In this case, we need to deal with only four parameters, but the question arises as to how to handle such a fractional function? Say, we want to minimize

 $obj = \frac{x}{y}$ , then minimization of both x and y will not minimize obj. We need to fix y and minimize obj, which in turn will minimize x. In other words, we can place a constraint on y and minimize (obj + y) to achieve the simultaneous minimization of x and y. Of course, in such a scenario minimization of y is not optimal. In the proposed CPF\*, a fractional form objective function, we place the denominator as a constraint and minimize the term. Moreover, the four parameters of CPF\*, are P,  $P_{peak}$ , DP, and  $DP_{peak}$ . Thus, CPF\* captures the average and transient characteristics of a datapath circuit. Its minimization using ILP results in optimal minimization of average power and cycle difference power, and suboptimal minimization of peak power and peak power differential. Further, its minimization in the multiple supply voltage and dynamic frequency framework results in the minimization of energy and energy delay product as well.

In general, for a set  $(x_1, x_2, ..., x_n)$  of n observations from a given distribution, the sample mean (which is an unbiased estimator for the population mean,  $\mu$ ) is  $m = \frac{1}{n} \sum_{i=1}^{n} x_i$ . The absolute deviation of the observations is defined as  $\Delta x_i = |x_i - m|$ . The mean deviation of the observations is given by  $MD = \frac{1}{n} \sum_{i=1}^{n} |x_i - m|$ . In a datapath circuit represented as a DFG, the cycle power consumption values are represented as  $(P_1, P_2, ..., P_N)$  for N possible control steps, in other words,  $P_c$  corresponds to  $x_i$ . The mean cycle power P is an unbiased estimate of the average power consumption of the DFG, is effectively the sample mean of N observations. Similarly, the cycle difference power  $DP_c$  is the the absolute deviation of cycle power  $P_c$  from the mean cycle power P (corresponds to  $\Delta x_i$ ). Finally, the mean cycle difference power DP is modeled as mean deviation of the cycle power  $P_c$  (which is in fact the MD is statistical sense).

The power consumption for any control step c is  $P_c$ . This is the total power consumption of all the functional units active in control step c and also includes the power consumption of the level converters. If a current resource is driven by a resource operating at lower voltage, then level converters are needed as additional resources operating in the current cycle. The peak power consumption of the DFG,  $P_{peak}$  is the maximum power consumption over all the control steps or clock cycles. The mean cycle power (P) which is an unbiased estimate of the average power consumption of the DFG is the mean of  $P_c$ . The cycle difference power  $(DP_c)$  for any control step can be defined as the absolute deviation of the cycle power from the mean cycle power consumption of the DFG. This is a measure of the cycle power fluctuation of the DFG. The peak differential power  $(DP_{peak})$ , which characterises the maximum power fluctuation of DFG over all control steps is the maximum of  $DP_c$  over all control steps. The mean cycle difference power (DP) is calculated as sample mean of  $DP_c$ . This is a measure of the power spread or distribution of the cycle power over all control steps of the DFG. Based on the above discussion, we have the various definitions as follows.

$$P_{peak} = max(P_c)_{\forall c=1,2,...N}$$

$$P = \frac{1}{N} \sum_{c=1}^{N} P_c$$

$$DP_c = |P - P_c|$$

$$DP_{peak} = max(|P - P_c|)_{\forall c=1,2,...N}$$

$$DP = \frac{1}{N} \sum_{c=1}^{N} DP_c$$
(1)

Now,  $CPF^*$  can be defined as the sum of the following components: the normalized mean cycle power and the normalized mean cycle difference power. The normalized mean cycle power  $(P_{norm})$  is the mean cycle power (P) normalized with the peak power consumption  $(P_{peak})$  of the DFG. Similarly, the normalized mean cycle difference power  $(DP_{norm})$  is the mean cycle difference power (DP), normalised with the peak differential power  $(DP_{peak})$ . Thus, CPF<sup>\*</sup> is a nonlinear fractional function  $\frac{P}{P_{peak}} + \frac{DP}{DP_{peak}}$ , with absolute function in numerator and denominator. We are aiming at formulating ILP-based models for its minimization, which will be of very large complexity for such a function. Particularly, the presence of two different forms of nonlinearity, such as absolute function and fraction form, and two different denominators make the ILP-formulation cumbersome. However, for a DFG representing a datapath circuit, we deal with optimization of positive power values, so we can simplify the function to certain extent to make ILP formulations easy. It is known that the denominator parameters,  $P_{peak}$  equals to  $max(P_c)_{\forall c}$  and the  $DP_{peak}$  equals to  $max(|P - P_c|)_{\forall c}$ . In the DFG for a datapath circuit, P is an positive entity, and the worst case values of  $P_c$  is zero or  $P_{peak}$ , and the average power P is upper bounded by  $P_{peak}$ . So, the maximum possible value of  $max(|P-P_c|)_{\forall c}$  is  $P_{peak}$ ; in other words,  $DP_{peak}$  is upper bounded by  $P_{peak}$ . Thus, we can normalize DP with the peak power consumption  $P_{peak}$  instead of  $DP_{peak}$  to reduce the complexity of ILP formulation.

The modified cycle power function CPF<sup>\*</sup> that is modeled as an equally weighted sum of the normalized mean cycle power  $(P_{norm})$  and the modified normalized mean cycle difference power  $(DP_{norm})$  is given below.

\_\_\_\_\_

$$CPF^{*}(P_{norm}, DP_{norm}) = P_{norm} + DP_{norm}$$

$$= \frac{P}{P_{peak}} + \frac{DP}{P_{peak}}$$

$$= \frac{P + DP}{P_{peak}}$$

$$= \frac{\frac{1}{N} \sum_{c=1}^{N} P_{c} + \frac{1}{N} \sum_{c=1}^{N} |P - P_{c}|}{P_{peak}}$$
(2)

It may be noted that  $P_{norm}$  and  $DP_{norm}$  are unitless quantities in the range [0,1]. Thus, CPF\* has a value in the range [0,2]. The above function (Eqn. 2) can serve as the objective function for low power datapath scheduling. The minimization of this objective function using multiple supply voltages, dynamic frequency clocking and multicycling can reduce various forms of dynamic power, energy and energy delay product either optimally or suboptimally. From Eqn. 2, we observe the following about CPF\* : it is a non-linear function having two nonlinearity, fractional form and absolute function, and has a single denominator.

The power consumption for any control step c is given by Eqn. 3 as follows.

$$P_{c} = \sum_{i=1}^{R_{c}} \alpha_{i,c} C_{i,c} V_{i,c}^{2} f_{c}$$
(3)

This power models uses generic parameters, such as  $\alpha_{i,c}$ ,  $C_{i,c}$ ,  $V_{i,c}$  and  $f_c$ . The model can accomodate both the look-up table based energy (power) models and energy (power) macro-models. The generic model can also help in easy integration of power model in behavioral synthesis tool that uses both a behavioral power estimator and a datapath scheduler. Using the dynamic energy model proposed in [Chang and Pedram 1997], the effective

ACM Transactions on Design Automation of Electronic Systems, Vol. V, No. N, April 2005.

switching capacitance can be expressed as,

$$\alpha_i C_i = C_{sw_i}(\alpha_i^{\ 1}, \alpha_i^{\ 2}) \tag{4}$$

Here,  $\alpha_i$  and  $C_i$  are the parameters corresponding to the functional unit  $FU_i$  as defined before.  $C_{swi}$  is a measure of the effective switching capacitance of the functional unit  $FU_i$ , which is a function of  $\alpha_i^1$  and  $\alpha_i^2$ ;  $\alpha_i^1$  and  $\alpha_i^2$  are the average switching activities on the first and second input operands of resource  $FU_i$ . It may be noted that in the above switching model, (in Eqn. 4) the input pattern dependencies can be handled. Moreover, the generic power model can be easily tuned to handle any of the modes of datapath circuit operation, such as, SVSF, MVDFC and MVMC. Thus, we write the cycle power consumption of any control step of a DFG as follows.

$$P_c = \sum_{i=1}^{R_c} C_{swi,c} V_{i,c}^2 f_c$$
(5)

The notation  $C_{swi,c}$  represents  $C_{swi}$  for the functional unit  $FU_i$  active in control step c.  $\alpha_i{}^1$  and  $\alpha_i{}^2$  are estimated using behavioral simulation of a DFG with a set of input vectors [Landman and Rabaey 1995; Satyanarayan and Parhi 2000; Ramprasad et al. 1997]. A look-up table is constructed that stores the  $C_{sw}$  values for ( $\alpha^1$  and  $\alpha^2$ ) combinations for different types of functional units, such as multipliers and ALUs. We use interpolation to find the  $C_{sw}$  values for the ( $\alpha^1$  and  $\alpha^2$ ) combinations that are not available in the look-up table.

We consider three modes of datapath operation, such as MVDFC, MVMC and SVSF. It may be noted that for the SVSF operation mode,  $V_{i,c}$  and  $f_c$  is the same for all c, whereas for MVMC operation mode  $f_c$  is same for all c. Using Eqn. 5, 2, and 1, we can write CPF<sup>\*</sup> for various modes of datapath operation as follows. Let us assume that V is the operating voltage for SVSF mode and  $f_{clk}$  is the operating frequency for MVMC or SVSF mode of datapath operation.

$$CPF-MVDFC^{*} = \frac{\frac{1}{N} \sum_{c=1}^{N} \sum_{i=1}^{R_{c}} C_{swi,c} V_{i,c}^{2} f_{c}}{max \left( \sum_{i=1}^{R_{c}} C_{swi,c} V_{i,c}^{2} f_{c} \right)_{\forall c}} + \frac{\frac{1}{N} \sum_{c=1}^{N} \left( \left| \frac{1}{N} \sum_{c=1}^{N} \left( \sum_{i=1}^{R_{c}} C_{swi,c} V_{i,c}^{2} f_{c} \right) - \sum_{i=1}^{R_{c}} C_{swi,c} V_{i,c}^{2} f_{c} \right| \right)}{max \left( \sum_{i=1}^{R_{c}} C_{swi,c} V_{i,c}^{2} f_{c} \right)_{\forall c}}$$

$$CPF-MVMC^{*} = \frac{\frac{1}{N} \sum_{c=1}^{R} \sum_{i=1}^{R_{c}} C_{swi,c} V_{i,c}^{2} f_{clk}}{max \left( \sum_{i=1}^{R_{c}} C_{swi,c} V_{i,c}^{2} f_{clk} \right)_{\forall c}} + \frac{\frac{1}{N} \sum_{c=1}^{N} \left( \left| \frac{1}{N} \sum_{c=1}^{N} \left( \sum_{i=1}^{R_{c}} C_{swi,c} V_{i,c}^{2} f_{clk} \right) - \sum_{i=1}^{R_{c}} C_{swi,c} V_{i,c}^{2} f_{clk} \right| \right)}{max \left( \sum_{i=1}^{R_{c}} C_{swi,c} V_{i,c}^{2} f_{clk} \right)_{\forall c}}$$
(6)

$$CPF-SVSF^{*} = \frac{\frac{1}{N} \sum_{c=1}^{N} \sum_{i=1}^{R_{c}} C_{swi,c} V^{2} f_{clk}}{max \left( \sum_{i=1}^{R_{c}} C_{swi,c} V^{2} f_{clk} \right)_{\forall c}} \\ + \frac{\frac{1}{N} \sum_{c=1}^{N} \left( \left| \frac{1}{N} \sum_{c=1}^{N} \left( \sum_{i=1}^{R_{c}} C_{swi,c} V^{2} f_{clk} \right) - \sum_{i=1}^{R_{c}} C_{swi,c} V^{2} f_{clk} \right| \right)}{max \left( \sum_{i=1}^{R_{c}} C_{swi,c} V^{2} f_{clk} \right)_{\forall c}}$$

The above equations can be used as the objective functions for ILP formulations and con-ACM Transactions on Design Automation of Electronic Systems, Vol. V, No. N, April 2005. sequently, for the datapath scheduling algorithms. We develop scheduling algorithms that accept, an unscheduled DFG, the resource/time constraints, switching activity information, load capacitance, voltage levels and the number of allowable frequency levels as input parameters.

#### 4. MODELING OF NON-LINEARITIES

The "modified cycle power function"  $CPF^*$  (Eqn. 6) discussed in the previous section, is a *non-linear function*. The nonlinearity is because of the *absolute function* (*abs* or | |) and also because of the *fractional form* of the function itself. The ILP formulations need to handle these two forms of non-linearity. We first address the transformations required to derive linear models of the nonlinear functions. Let us represent the general linear programming model as follows [Rao 1996]:

$$\begin{array}{ll} \text{Minimize}: & \sum_{j} c_{j} * x_{j} \\ \text{Subject to}: & \sum_{j} a_{ij} * x_{j} \leq b_{i}, \forall i; \quad x_{j} \geq 0, \forall j \end{array}$$
(7)

where,  $c_j$ ,  $a_{ij}$ ,  $b_i$  are known constants and  $x_j$  are the decision variables.

# 4.1 LP Formulation Involving Sum of Absolute Deviations

The general form of this type of programming can be represented as given below [Panik 1996; McCarl and Spreen 1997].

$$\begin{array}{ll} \text{Minimize}: & \sum_{i} |y_{i}| \\ \text{Subject to}: & y_{i} + \sum_{j} a_{ij} * x_{j} \leq b_{i}, \forall i; \quad x_{j} \geq 0, \forall j \end{array}$$

$$\tag{8}$$

where,  $y_i$ , is the deviation between the prediction and observation. The  $|y_i|$  is non-linear because of absolute function. This can be linearized using the following transformation.

Let,  $y_i$  be represented as the difference of two non-negative variables,

$$y_i = y_i^1 - y_i^2 (9)$$

Using these variables, we can rewrite the LP formulation in Eqn. 8 as follows.

$$\begin{array}{ll} \text{Minimize}: & \sum_{i} \left| y_{i}^{1} - y_{i}^{2} \right| \\ \text{Subject to}: & y_{i}^{1} - y_{i}^{2} + \sum_{j} a_{ij} * x_{j} \leq b_{i}, \forall i; \quad x_{j} \geq 0, \forall j; \quad y_{i}^{1}, y_{i}^{2} \geq 0, \forall i \end{array}$$
(10)

If the product of  $y_i^1$  and  $y_i^2$  is zero, then,

$$\left|y_{i}^{1}-y_{i}^{2}\right| = \left|y_{i}^{1}\right|+\left|y_{i}^{2}\right| = y_{i}^{1}+y_{i}^{2}$$

$$(11)$$

Using the above, we can write the LP formulation expressed in Eqn. 10 as shown below.

$$\begin{array}{ll} \text{Minimize}: & \sum_{i} y_{i}^{1} + y_{i}^{2} \\ \text{Subject to}: & y_{i}^{1} - y_{i}^{2} + \sum_{j} a_{ij} * x_{j} \leq b_{i}, \forall i; \quad x_{j} \geq 0, \forall j; \quad y_{i}^{1}, y_{i}^{2} \geq 0, \forall i \end{array}$$
(12)

The formulations in Eqn. 8 and 12 are equivalent and the minimization of Eqn. 12 will result in the minimization of Eqn. 8.

## 4.2 LP Formulation Involving Fraction

The general expression for the LP formulation involving fractions is considered below [McCarl and Spreen 1997].

$$\begin{array}{ll}
\text{Minimize} : & \sum_{j} c_j * x_j \\
\text{Subject to} : & \sum_{j} a_{ij} * x_j \leq b_i, \forall i; \quad x_j \geq 0, \forall j
\end{array} \tag{13}$$

where,  $c_j$  and  $d_j$  are known constants and the denominator  $\sum_j d_j * x_j$  is strictly positive. Let us assume new variables as follows :

$$z_{0} = \left| d_{0} + \sum_{j} d_{j} * x_{j} \right|^{-1}$$

$$x_{j} = \frac{z_{j}}{z_{0}}$$
(14)

Using the above transformation, the original formulation in Eqn. 13 can be modified to the following.

$$\begin{array}{ll} \text{Minimize:} & c_0 * z_0 + \sum_j c_j * z_j \\ \text{Subject to:} & \sum_j a_{ij} * z_j - b_i * z_0 \leq b_i, \forall i \\ & \sum_j d_j * z_j + d_0 * z_0 = 1; \quad z_0, z_j \geq 0, \forall j \end{array}$$
(15)

The problems defined in Eqn. 13 and 15 are equivalent. On solving the problem in Eqn. 15, we substitute,  $z_j = x_j * z_0$  to get the results for  $x_j$ .

# 5. ILP FORMULATIONS TO MINIMIZE MODIFIED CYCLE POWER FUNCTION

In this section, we discuss the ILP models for minimization of the "modified cycle power function" (CPF<sup>\*</sup>). We describe the ILP models for two different scenario of ASIC design. The first one targets design with multiple supply voltages and dynamic frequency clocking (MVDFC). The other one targets multiple supply voltages and multicycling (MVMC) based designs. The single voltage and single frequency (SVSF) ILP models become trivial once these two are presented and hence, are not shown. The ILP models formulated to ensure that the dependency constraints and the resource constraints are satisfied. In order to formulate an ILP based model for Eqn. 6 and the scheduling schemes for the DFG, we use the notations shown in Table II.

|                    | Table II. Notations used in ILP Formulations                                 |
|--------------------|------------------------------------------------------------------------------|
| $M_{k,v}$          | : maximum number of functional units of type $k$ operating                   |
|                    | at voltage level $v (FU_{k,v})$                                              |
| $S_i$              | : as soon as possible (ASAP) time stamp for the operation $o_i$              |
| $E_i$              | : as late as possible (ALAP) time stamp for the operation $o_i$              |
| $P(C_{swi}, v, f)$ | : power consumption of functional unit $FU_i$ at voltage level $v$           |
|                    | and operating frequency $f$ used by $o_i$ for its execution                  |
| $x_{i,c,v,f}$      | : decision variable which takes the value of 1 if operation $o_i$            |
|                    | is scheduled in control step $c$ using the functional unit $F_{k,v}$         |
|                    | and $c$ has frequency $f_c$                                                  |
| $y_{i,v,l,m}$      | : decision variable which takes the value of 1 if operation $o_i$ is         |
|                    | using the functional unit $F_{k,v}$ and scheduled in control steps $l \to m$ |
| $L_{i,v}$          | : latency for operation $o_i$ using functional unit operating                |
|                    | at voltage $v$ (in terms of number of clock cycles)                          |

ACM Transactions on Design Automation of Electronic Systems, Vol. V, No. N, April 2005.

#### 5.1 Multiple Supply Voltages and Dynamic Frequency Clocking (MVDFC)

In this subsection, we describe the ILP formulation for minimization of CPF-MVDFC<sup>\*</sup> using multiple supply voltages and dynamic frequency clocking. In dynamic frequency clocking [Kim and Chae 1996; Ranganathan et al. 1998; Brynjolfson and Zilic 2000], the clock frequency is varied on-the-fly based on the functional units active in that cycle. In this clocking scheme, all the units are clocked by a single clock line which switches at runtime. The frequency reduction creates an opportunity to operate the different functional units at different voltages, which in turn, helps in further reduction of power.

(a) *Objective Function* : The objective is to minimize the modified cycle power function CPF-MVDFC\* described in Eqn. 6 of the whole DFG over all control steps.

$$Minimize: CPF-MVDFC^*$$
(16)

Using Eqn. 2, this can be restated as :

Minimize: 
$$\frac{\frac{1}{N} \sum_{c=1}^{N} P_c + \frac{1}{N} \sum_{c=1}^{N} |P - P_c|}{P_{peak}}$$
(17)

This objective function has the two types of non-linearities mentioned in the previous section. We first remove the non-linearity introduced because of the fraction by putting the denominator as a constraint. Then, the problem in Eqn. 17 transformed to the one given below.

Minimize : 
$$\frac{1}{N} \sum_{c=1}^{N} P_c + \frac{1}{N} \sum_{c=1}^{N} |P - P_c|$$
  
Subject to : Peak power constraints (18)

However, this transformed problem still has the non-linearity in it because of the absolute function. This can be converted to an equivalent problem using the transformation suggested in the previous section.

Minimize : 
$$\frac{1}{N} \sum_{c=1}^{N} P_c + \frac{1}{N} \sum_{c=1}^{N} (P + P_c)$$
  
Subject to : Modified peak power constraints (19)

The "peak power constraint" in Eqn. 18 and the "modified peak power constraint" in Eqn. 19 will be discussed in later part of the subsection. Using Eqn. 1 repeatedly, the problem expressed in Eqn. 19 is simplified to :

Minimize : 
$$\left(\frac{3}{N}\right)\sum_{c=1}^{N} P_c$$
  
Subject to : Modified peak power constraints (20)

Using the decision variables, the objective function is formulated as,

Minimize : 
$$\sum_{c} \sum_{i \in F_{k,v}} \sum_{v} \sum_{f} x_{i,c,v,f} * \left(\frac{3}{N}\right) * P(C_{swi}, v, f)$$
  
Subject to : Modified peak power constraints (21)

Assuming,  $P^*(C_{swi}, v, f)$  as  $P(C_{swi}, v, f) * \left(\frac{3}{N}\right)$ ,

Minimize : 
$$\sum_{c} \sum_{i \in F_{k,v}} \sum_{v} \sum_{f} x_{i,c,v,f} * P^*(C_{swi}, v, f)$$
  
Subject to : Modified peak power constraints (22)

(b) Uniqueness Constraints : These constraints ensure that every operation  $o_i$  is scheduled to one unique control step within the mobility range  $(S_i, E_i)$  with a particular supply voltage and operating frequency. We represent them as,  $\forall i, 1 \le i \le O$ ,

$$\sum_{c} \sum_{v} \sum_{f} x_{i,c,v,f} = 1$$
(23)

(c) Precedence Constraints : These constraints guarantee that for an operation  $o_i$ , all its predecessors are scheduled in an earlier control step and its successors are scheduled in a later control step. These are modeled as,  $\forall i, j, o_i \in Pred_{o_i}$ ,

$$\sum_{v} \sum_{f} \sum_{d=S_{i}}^{E_{i}} d * x_{i,d,v,f} - \sum_{v} \sum_{f} \sum_{e=S_{j}}^{E_{j}} e * x_{j,e,v,f} \leq -1$$
(24)

(d) *Resource Constraints*: These constraints make sure that no control step contains more than  $F_{k,v}$  operations of type k operating at voltage level v. These can be enforced as,  $\forall c$ ,  $1 \le c \le N$  and  $\forall v$ ,

$$\sum_{i \in F_{k,v}} \sum_{f} x_{i,c,v,f} \leq M_{k,v} \tag{25}$$

(e) *Frequency Constraints* : This set ensures that if a functional unit is operating at a higher voltage level then it can be scheduled in a lower frequency control step, whereas, a functional unit is operating at lower voltage level then it can not be scheduled in a higher frequency control step. We write these constraints as,  $\forall i, 1 \le i \le O, \forall c, 1 \le c \le N$ , if f < v, then  $x_{i,c,v,f} = 0$ .

(f) Peak Power Constraints : As discussed before, with reference to the Eqn. 17 and 18, these constraints are introduced to eliminate the fractional non-linearity of the objective function. These constraints ensure that the maximum power consumption of the DFG does not exceed  $P_{peak}$  for any control step. We enforce these constraints as follows,  $\forall c$ ,  $1 \le c \le N$ ,

$$\sum_{i \in F_{k,v}} \sum_{v} \sum_{f} x_{i,c,v,f} * P(C_{swi}, v, f) \leq P_{peak}$$
(26)

The  $P_{peak}$  is peak power constraint, which is an unkown value during the minimization process, so it is added alongwith the objective function and minimized.

(g) Modified Peak Power Constriants : To eliminate the non-linearity introduced due to the absolute function, we modify the above constraints, as outlined in Eqn. 18 and 19. The peak power constraints in Eqn. 26 is modified as,  $\forall c, 1 \le c \le N$ ,

$$\frac{1}{N}\sum_{c}\sum_{i\in F_{k,v}}\sum_{v}\sum_{f}x_{i,c,v,f}*P(C_{swi},v,f) -\sum_{i\in F_{k,v}}\sum_{v}\sum_{f}x_{i,c,v,f}*P(C_{swi},v,f) \le P_{peak}^{*}$$

$$(27)$$

The  $P_{peak}^*$  is a modified peak power constraint which is added to the objective function and minimized along with it.

#### 5.2 Multiple Supply Voltages and Multicycling (MVMC)

In this subsection, we describe the ILP formulations based on the modified cycle power function CPF-MVMC<sup>\*</sup> (Eqn. 6) using multiple supply voltages and multicycling. In this scheme, the functional units are operated at multiple supply voltages. The functional units operating at lower voltages may need to be active in more than one consecutive control steps to complete execution.

(a) *Objective Function* : The objective is to minimize the CPF-MVMC<sup>\*</sup> for the entire DFG. Using Eqn. 1, this can be represented as :

Minimize: CPF-MVMC<sup>\*</sup>  
= 
$$\frac{\frac{1}{N}\sum_{c=1}^{N}P_c + \frac{1}{N}\sum_{c=1}^{N}|P - P_c|}{P_{reach}}$$
 (28)

As discussed in the previous subsection, this objective function has two types of nonlinearities, which are because of the absolute function and the fractional form. The fractional non-linearity is removed by introducing the denominators as a constraint. The corresponding constraints are known as "peak power constraints". We remove the absolute function non-linearity by modifying the peak power constraints which give rises to "modified peak power constraints". Thus, the problem in Eqn. 28 is transformed to the following.

Minimize : 
$$\frac{1}{N} \sum_{c=1}^{N} P_c + \frac{1}{N} \sum_{c=1}^{N} (P + P_c)$$
  
Subject to : Modified peak power constraints (29)

The "peak power constraint" and the "modified peak power constraint" are discussed in the later part of the subsection. Using Eqn. 1 repeatedly, the problem in Eqn. 29 is simplified to :

Minimize : 
$$\left(\frac{3}{N}\right)\sum_{c=1}^{N} P_c$$
  
Subject to : Modified peak power constraints (30)

Using the decision variables, the above LP objective function is formulated as,

Minimize : 
$$\sum_{l} \sum_{i \in F_{k,v}} \sum_{v} y_{i,v,l,(l+L_{i,v}-1)} * \left(\frac{3}{N}\right) P(C_{swi}, v, f)$$
  
Subject to : Modified peak power constraints (31)

Where, f is the operating frequency level of the datapath circuit in multicycling mode.

Minimize : 
$$\sum_{l} \sum_{i \in F_{k,v}} \sum_{v} y_{i,v,l,(l+L_{i,v}-1)} * P^*(C_{swi}, v, f)$$
  
Subject to : Modified peak power constraints (32)

Where,  $P^*(C_{swi}, v, f) = \left(\frac{3}{N}\right) * P(C_{swi}, v, f)$ , are modified power values.

(b) Uniqueness Constraints : These constraints ensure that every operation  $o_i$  is scheduled in appropriate control steps within the mobility range  $(S_i, E_i)$  with a particular supply voltage. Depending on the supply voltage it may be operated at more than one clock cycle. We represent them as,  $\forall i, 1 \le i \le O$ ,

$$\sum_{v} \sum_{l=S_i}^{S_i+E_i+1-L_{i,v}} y_{i,v,l,(l+L_{i,v}-1)} = 1$$
(33)

When the operators are computed at the highest voltage, they are scheduled in one unique control step, whereas, when they are to be operated at lower voltages they need more than

one clock cycle for completion. Thus, for lower voltage, the mobility is restricted.

(c) Precedence Constraints : These constraints guarantee that for an operation  $o_i$ , all its predecessors are scheduled in earlier control step and its successors are scheduled in later control step. These constraints should also take care of the multicycling operations. These are modeled as,  $\forall i, j, o_i \in Pred_{o_i}$ ,

$$\sum_{v} \sum_{l=S_i}^{E_i} (l+L_{i,v}-1) * y_{i,v,l,(l+L_{i,v}-1)} - \sum_{v} \sum_{l=S_j}^{E_j} l * y_{j,v,l,(l+L_{j,v}-1)} \le -1$$
(34)

(d) *Resource Constraints* : These constraints make sure that no control step contains more than  $F_{k,v}$  operations of type k operating at voltage v. These can be enforced as,  $\forall v$  and  $\forall l$ ,  $1 \leq l \leq N$ ,

$$\sum_{i \in F_{k,v}} \sum_{l} y_{i,v,l,(l+L_{i,v}-1)} \leq M_{k,v}$$
(35)

(e) *Peak Power Constraints* : As discussed earlier with reference to Eqn. 28 and 29, these constraints are enforced to eliminate the fractional non-linearity of the objective function. We enforce these constraints as follows,  $\forall l, 1 \leq l \leq N$ ,

$$\sum_{i \in F_{k,v}} \sum_{v} y_{i,v,l,(l+L_{i,v}-1)} * P(C_{swi}, v, f) \le P_{peak}$$
(36)

Where,  $P_{peak}$  is the peak power constraint which is added to the objective function and minimized alongwith it.

(f) Modified Peak Power Constriants : These constraints are introduced to eliminate the absolute function non-linearity of the objective function. These constraints can be enforced as,  $\forall l, 1 \leq l \leq N$ ,

$$\frac{1}{N} \sum_{l} \sum_{i \in F_{k,v}} \sum_{v} y_{i,v,l,(l+L_{i,v}-1)} * P(C_{swi}, v, f) \\ - \sum_{i \in F_{k,v}} \sum_{v} y_{i,v,l,(l+L_{i,v}-1)} * P(C_{swi}, v, f) \le P_{peak}^{*}$$
(37)

Where,  $P_{peak}^*$  is the modified peak power constraint which is also minimized as a part of the objective function.

#### 6. ILP-BASED SCHEDULING

In this section, we discuss the solutions for the ILP formulations obtained in the previous section and develop scheduling algorithms for MVDFC and MVMC schemes. The scheduling for SVSF is simple and not discussed here for brevity. The target architecture model assumed for the scheduling schemes is from [Johnson and Roy 1997]. Each functional unit has a register and a multiplexer associated with it. The register and the multiplexor operate at the same voltage level as that of the functional unit. Level converters are used when a low-voltage functional unit drives a high-voltage functional unit [Johnson and Roy 1997; Shiue and Chakrabarti 2000b]. A controller decides which of the functional units are active in each control step and those that are not active are disabled using the multiplexors. For MVDFC scheme, the controller has a storage unit to store cycle

ACM Transactions on Design Automation of Electronic Systems, Vol. V, No. N, April 2005.

frequency index  $(cfi_c)$  values obtained from scheduling. This serves as the clock dividing factor for the dynamic clocking unit. The cycle frequency  $f_c$  is generated dynamically and a functional unit operating at one of the supply voltages is activated.

The inputs to the algorithm are an unscheduled data flow graph (UDFG), the resource constraints, the number of allowable voltage levels  $(L_V)$ , the number of allowable frequencies  $(L_f)$ , the delay of each resource  $(d_{FU})$ , the multiplexor  $(d_{Mux})$ , the register  $(d_{Reg})$  at different voltage levels. The delays of level converters  $(d_{Conv})$  is represented in the form of a matrix that shows the delay in converting one at voltage level  $V_i$  to another voltage level  $V_j$  (where, both  $V_i, V_j \in V_{L_V}$ ). The resource constraint includes the number of ALUs and multipliers at different voltage levels  $V_i$  (where,  $V_i \in V_{L_V}$ ). The scheduling algorithm determines the  $f_{base}$ ,  $cfi_c$  time stamp for each operation, and voltage level such that the function CPF<sup>\*</sup> (Eqn. 6) is minimum.

| Input   | : UDFG, resource constraints, $L_V$ , $L_f$ , all $V_i \in V_{L_V}$ , $d_{FU}$ , $d_{Mux}$ , $d_{Reg}$ , $d_{Conv}$ |
|---------|---------------------------------------------------------------------------------------------------------------------|
| Output  | : scheduled DFG, $f_{base}$ , $N$ , $cfi_c$ , power, energy and delay estimates                                     |
| Step 1  | : Construct a look up table for (effective switching capacitance, average switching activity) pairs.                |
| Step 2  | : Calculate the switching activities at the inputs of each node through                                             |
|         | behavioral simulation of the DFG.                                                                                   |
| Step 3  | : Find ASAP schedule for the UDFG.                                                                                  |
| Step 4  | : Find ALAP schedule for the UDFG.                                                                                  |
| Step 5  | : Determine the mobility graph of each node.                                                                        |
| Step 6  | : Modify the mobility graph for MVMC.                                                                               |
| Step 7  | : Model the ILP formulations of the DFG for MVDFC, MVMC or SVSF scheme using AMPL.                                  |
| Step 8  | : Solve the ILP formulations using LP-Solve.                                                                        |
| Step 9  | : Find the scheduled DFG.                                                                                           |
| Step 10 | : Determine the cycle frequencies $(f_c)$ , $f_{base}$ and $cfi_c$ for MVDFC scheme.                                |
| Step 11 | : Estimate the power and energy consumptions of the scheduled DFG.                                                  |

#### Fig. 1. Scheduling for CPF\* minimization

The ILP based scheduler which minimizes the modified cycle power function CPF<sup>\*</sup> of the DFG is outlined in Fig. 1. In step 1, the scheduler constructs a look-up table to store the effective switching capacitance and the average switching activity value pairs. In step 2, the scheduler determines the switching activities at the inputs of each node by using behavioral simulation of DFG. For this purpose, a different set of application specific input vectors (having different correlations) are given at the primary inputs of the DFG and average swtiching activity at each inputs of other nodes are calculated. It should be noted that if the look-up table (in step 1) does not have the switching capacitance for an average switching activity value (in step 2), then the scheduler uses interpolation techniques to find the same. The third step is to determine the as soon as possible (ASAP) time stamp of each operation. The fourth step is the determination of the as late as possible (ALAP) time stamp of each vertex for the DFG. The ASAP time stamp is the start time and the ALAP time stamp is the finish time of each operation. These two time stamps provide the mobility of an operation and the operation must be scheduled within this mobility range. This mobility graph needs to be modified for the MVMC scheme. The ILP formulations constructed based on the models described in section 5. The scheduler uses the modeling language AMPL to model the ILP formulations [Fourer et al. 2003]. At this step, we calculate the power consumption of the functional units as follows. The operational delay of a functional unit is assumed as

 $(d_{FU} + d_{Mux} + d_{Reg} + d_{Conv})$ . For the MVMC scheme, the operating frequency depends on the delay of the multiplier unit operating at the highest voltage level allowed. For the MVDFC scheme, the operating frequency of a functional unit is calculated using the formulas for delay given in [Mohanty and Ranganathan 2003a]. We obtain the switching capacitance from steps 1 and 2, and the power values are calculated whenever necessary for different operating voltages and frequencies. The scheduled DFG is obtained after the ILP formulation is solved using LP-Solve. Then, the scheduler determines the  $f_{base}$ ,  $cfi_c$  and cycle frequency ( $f_c$ ) using the methods proposed in [Mohanty and Ranganathan 2003a] based on the delay of each cycle. Finally, the power consumption, energy consumption and the energy delay product of the scheduled DFG are calculated.



Fig. 2. ASAP and ALAP schedule for example (EXP) DFG

# 6.1 MVDFC Scheduling Scheme

We illustrate the solution for the ILP formulation in the MVDFC case, with the help of the DFG shown in Fig. 2. The ASAP schedule is shown in Fig. 2(a) and the ALAP schedule is shown in Fig. 2(b). From the ASAP and ALAP schedules, we obtained the mobility graph which is Fig. 3(a). We get the ILP formulations using this mobility graph. We solved the formulation using LP-solve and based on the results, we obtained the scheduled DFG shown in Fig. 3(b) for the resource constraint (RC5), two multipliers at 2.4V and one ALU operating at 3.3V. Similarly, other schedules can be obtained for different resource constraints.

#### 6.2 MVMC Scheduling Scheme

We illustrate the solution for the ILP formulations of the MVMC case, using the DFG shown in Fig. 2. The ASAP schedule is shown in Fig. 2(a) and the ALAP schedule is shown in Fig. 2(b). From the ASAP schedule (Fig. 2(a)) and the ALAP schedule (Fig. 2(b)), we obtained the mobility graph shown in Fig. 4(a). This mobility graph is different from that shown in Fig. 3(a). In the MVMC case, the mobility graph considers the multicycle operations. In this illustration, we assume that we have two operating voltage levels, and when the multipliers are operated at the lower voltage, they take two clock



ILP models for simultaneous energy and transient power minimization · 127

Fig. 3. Mobility graph and final schedule for example DFG for RC5 using MVDFC



Fig. 4. Mobility graph and final schedule for example DFG for RC5 using MVMC

cycles. It should be noted that the mobility graph will depend on the number of operating voltages and the assumed operating frequency. We solved the ILP formulation using LP-solve and based on the results we obtained the scheduled DFG shown is Fig. 4(b) for the resource constraint (RC5), two multipliers at 2.4V and one ALUs operating at 3.3V.

# 7. EXPERIMENTAL RESULTS

The following notations are used to express the results :

 $P_{pS}$ : the peak power consumption (in mW) SVSF scheme,

 $P_{p_D}$ : the peak power consumption (in mW) for MVDFC operation,

 $P_{p_M}$ : the peak power consumption (in mW) for MVMC operation,

 $P_{mS}$ : minimum power consumption for any cycle assuming SVSF (in mW)

 $P_{mD}$ : minimum power consumption for any cycle for MVDFC (in mW)

 $T_S$ : execution time for single frequency

 $T_D$ : execution time for dynamic frequency

 $T_M$ : execution time for multicycling operation

 $E_S$ : total energy consumption (in nano-Joule or nJ) for SVSF scheme,

 $E_D$ : total energy consumption (in nJ) for MVDFC operation,

 $E_M$ : total energy consumption (in nJ) for MVMC operation,

 $P_{S}$ : average power consumption (in mW) for SVSF scheme, which is calculated as the mean of the cycle power consumptions

 $P_D$ : average power consumption (in mW) for MVDFC operation, estimated as the mean of the cycle power

 $P_M$  : average power consumption (in mW) for MVDFC operation, calculated as the mean of the cycle power consumptions

 $EDP_S$ : energy delay product (in  $10^{-15}$  Joule-sec or fJs) for SVSF operation (=  $E_S * T_S$ )  $EDP_D$ : energy delay product (in fJs) for MVDFC operation (=  $E_D * T_D$ )

 $EDP_M$ : energy delay product (in fJs) for MVMC operation (=  $E_M * T_M$ )

 $\Delta P_p$ : percentage peak power reduction, for MVDFC scheme this is defined as,  $\frac{(P_{P_S} - P_{P_D})}{P_s} *$ 

100 and for MVMC scheme it is calculated as,  $\frac{(P_{p_S} - P_{p_M})}{P_{p_S}} * 100$   $\Delta DP$ : percentage peak differential power reduction, which is calculated as  $\frac{(P_{p_S} - P_{m_S}) - (P_{p_D} - P_{m_D})}{(P_{p_S} - P_{m_S})} * 100$  for MVDFC scheme and as  $\frac{(P_{p_S} - P_{m_S}) - (P_{p_M} - P_{m_M})}{(P_{p_S} - P_{m_S})} * 100$ for MVMC scheme

 $\Delta P$  : percentage average power reduction, for MVDFC scheme it is  $\frac{P_S - P_D}{P_S} * 100$  and for MVMC scheme it is  $\frac{P_S - P_M}{P_S} * 100$ 

 $\Delta E$  : percentage reduction in total energy, is calculated as  $\frac{E_S - E_D}{E_S} * 100$  for MVDFC scheme and as  $\frac{E_S-E_M}{E_S}*100$  for MVMC scheme

 $\Delta EDP$  : percentage EDP reduction, calculated as  $\frac{(EDP_S - EDP_D)}{EDP_S} * 100$  for MVDFC scheme and as  $\frac{(EDP_S - EDP_M)}{EDP_M} * 100$  for MVMC scheme

The ILP based schedulers were tested with five benchmark circuits :

(1) Example circuit (EXP) (8 nodes, 3\*, 3+, 9 edges),

(2) FIR filter (11 nodes, 5\*, 4+, 19 edges),

(3) HAL differential equation solver (13 nodes, 6+, 2+, 2-, 1 <, 16 edges),

(4) IIR filter (11 nodes, 5\*, 4+, 19 edges) and

(5) Auto-Regressive filter (ARF) (15 nodes, 5\*, 8+, 19 edges).

We used the look-up table method presented in Section 3 for the average switching capacitance calculation. The look-up table construction consists of two phases, such as input pattern generation and cell characterization. We generate the primary input signals of different correlations using the autoregressive moving average (ARMA) model [Ramprasad

et al. 1997]. We perform the characterization of the physical implementations of the library modules available in [Mohanty and Ranganathan 2003b] by applying the input patterns generated above for some values of  $(\alpha_i^{1}, \alpha_i^{2})$  pairs. Whenever necessary, we used interpolation to find the average switching capacitance for any other values of  $(\alpha_i^{1}, \alpha_i^{2})$  pairs that do not exist in the look-up table. It should be noted that larger the size of look-up table, better is the accuracy. The above generated signals are propagated through different operators in the DFG and the average switching activities are calculated as described in [Ramprasad et al. 1997].

The scheduling algorithms were tested for five different sets of resource constraints (RC1,RC2,RC3,RC4,RC5):

(1) multipliers (2 at 2.4V and 1 at 3.3V) and ALUs (1 at 2.4V and 1 at 3.3V),

(2) multipliers (3 at 2.4V) and ALUs (1 at 2.4V and 1 at 3.3V),

(3) multipliers (2 at 2.4V) and ALUs (2 at 3.3V),

(4) multipliers (1 at 2.4V and 1 at 3.3V) and ALUs (1 at 3.3V), and

(5) multipliers (2 at 2.4V) and ALUs (1 at 3.3V).

The sets of resource constraints were chosen to be a good representation of the different types of resources at different operating voltages. The number of allowable voltage levels is two (2.4V, 3.3V) and maximum number of allowable frequencies being three. The experimental results for various benchmark circuits are reported in Table III for the MVDFC scheduling scheme and in Table IV for the MVMC scheduling scheme. The power/energy estimation include the power consumption of the overheads, such as level converters (data taken from [Mohanty and Ranganathan 2003b]). The results are reported for two supply voltages. In case of MVDFC scheduling the frequencies found out are 4.5MHz, 9MHz, 18MHz. For MVMC scheduling scheme, the operating frequency  $f_{clk}$ is 9MHz. The operating voltage and frequency for SVSF scheme is 3.3V and 9MHz, respectively. The tables show the estimates for power, energy and energy delay product for various benchmarks for different resource constraints. The reductions in power values are reported in terms of percentage with respect to single supply voltage and single frequency (SVSF) operation mode. The reductions in peak power, peak power differential and average power are in the range 70 - 76% for MVDFC; on the other hand, the reductions are in the range 0 - 40% for MVMC. The tables do not report the reduction in cycle difference power reduction due to lack of space as the tables became too large to fit the specified margin. Hence, we preferred to report peak power differential, which is a measure of maximum power fluctuation in a DFG. The reduction in the energy delay products for most of the benchmarks indicate that the reduction in power and energy achieved without compromising with the performance. This significance of the proposed scheduling approaches minimizing the new multicost objective function.

We plotted Fig. 5(a) and 5(b) to get a visual picture of the experimental results. The figures show the average reductions for different benchmarks averaged over all resource constraints. It is obvious from the figure that the reductions are significant, which are in the range 44 - 74% for MVDFC for various power and energy. Similarly, for MVMC, the average reductions are in the range 22 - 39%. It may also be noted that for the reductions for MVDFC scheme is better than the MVMC scheme. The MVDFC scheme works effectively for all resource constraints and all benchmarks, where as, the MVMC scheme does not produce good results for ARF benchmark. We did not find any work in the literature that deals with simultaneous reduction of energy and transient power, so we could not

|             |           | ч     | R     | A     | છ     |       |           | R     | Г     | Г     | 4     |       |           | г     | A     | Н     | 3     |       |         | R     | Г     | т     | 2     |       |           | P     | ×     | Π     | Ξ     |       | 1  |       |              |           |
|-------------|-----------|-------|-------|-------|-------|-------|-----------|-------|-------|-------|-------|-------|-----------|-------|-------|-------|-------|-------|---------|-------|-------|-------|-------|-------|-----------|-------|-------|-------|-------|-------|----|-------|--------------|-----------|
| ļ           |           | J     | 4     | ы     | 2     | 1     |           | J     | 4     | 3     | 2     | ,     |           | S     | 4     | ω     | 2     |       |         | S     | 4     | ω     | 2     | 1     |           | S     | 4     | 3     | 2     |       | 2  | C     | R            |           |
| erall avera | Average ' | 8.87  | 8.87  | 8.87  | 8.87  | 8.87  | Average . | 17.51 | 17.51 | 17.51 | 25.92 | 25.92 | Average . | 17.51 | 17.51 | 17.74 | 26.15 | 17.51 | Average | 17.51 | 17.28 | 17.51 | 25.92 | 17.51 | Average . | 17.28 | 8.87  | 17.28 | 17.28 | 17.28 | 3  | mW    | $P_{p_S}$    | 12        |
| lge         | values    | 2.39  | 2.39  | 2.39  | 2.34  | 2.34  | values    | 4.67  | 6.71  | 4.67  | 6.84  | 8.88  | values    | 4.67  | 6.71  | 4.78  | 6.9   | 4.62  | values  | 4.67  | 6.6   | 4.67  | 6.84  | 4.62  | values    | 4.56  | 2.39  | 4.56  | 4.56  | 4.56  | 4  | mW    | $P_{p_D}$    | uble III. |
| 71.70       | 73.28     | 73.05 | 73.05 | 73.05 | 73.62 | 73.62 | 69.54     | 73.33 | 61.68 | 73.34 | 73.61 | 65.74 | 71.06     | 73.33 | 61.68 | 73.05 | 73.61 | 73.62 | 71.14   | 73.33 | 61.81 | 73.33 | 73.61 | 73.62 | 73.50     | 73.61 | 73.05 | 73.61 | 73.61 | 73.61 | 5  | %     | $\Delta P_p$ | Powe      |
|             |           | 0.23  | 0.23  | 0.23  | 0.23  | 0.23  |           | 0.23  | 0.23  | 0.23  | 0.23  | 0.23  |           | 0.23  | 0.23  | 0.46  | 0.46  | 0.46  |         | 0.23  | 0.23  | 0.23  | 0.23  | 0.23  |           | 0.23  | 0.45  | 0.46  | 0.46  | 0.46  | 6  | mW    | $P_{mS}$     | er, ener  |
|             |           | 0.45  | 0.45  | 0.45  | 0.12  | 0.12  |           | 0.45  | 0.45  | 0.45  | 0.12  | 0.12  |           | 0.45  | 0.45  | 0.9   | 0.35  | 0.35  |         | 0.45  | 0.45  | 0.45  | 0.12  | 0.12  |           | 0.45  | 0.23  | 0.9   | 0.35  | 0.35  | 7  | mW    | $P_{mD}$     | gy and    |
| 74.0        | 76.20     | 77.6  | 77.6  | 77.6  | 74.1  | 74.1  | 71.65     | 75.58 | 63.77 | 75.58 | 73.84 | 65.9  | 73.16     | 75.6  | 63.77 | 76.97 | 74.5  | 74.96 | 72.60   | 75.58 | 63.93 | 75.58 | 73.84 | 73.96 | 76.32     | 75.89 | 77.55 | 78.24 | 74.97 | 74.97 | 8  | %     | $\Delta DP$  | energy a  |
|             |           | 4.5   | 4.5   | 4.5   | 4.5   | 4.5   |           | 8.82  | 8.82  | 8.82  | 11.03 | 11.03 |           | 10.6  | 10.6  | 13.25 | 13.25 | 13.25 |         | 8.82  | 8.82  | 8.82  | 8.82  | 8.82  |           | 6.65  | 6.67  | 8.87  | 8.87  | 8.87  | 6  | mW    | $P_S$        | delay pi  |
|             |           | 1.4   | 1.4   | 1.4   | 1.22  | 1.22  |           | 2.5   | 3.32  | 2.57  | 2.98  | 3.5   |           | 2.98  | 3.73  | 3.73  | 3.55  | 3.55  |         | 2.5   | 2.84  | 2.5   | 2.36  | 2.35  |           | 1.96  | 1.87  | 2.61  | 2.42  | 2.42  | 10 | mW    | $P_D$        | roduct e  |
| 70.82       | 70.5      | 68.9  | 68.9  | 68.9  | 72.9  | 72.9  | 69.34     | 71.66 | 62.86 | 70.86 | 72.98 | 68.36 | 71.0      | 71.9  | 64.8  | 71.85 | 73.21 | 73.21 | 71.54   | 71.66 | 67.8  | 71.66 | 73.24 | 73.36 | 71.70     | 70.53 | 71.96 | 70.57 | 72.72 | 72.72 | 11 | %     | $\Delta P$   | stimate   |
|             |           | 5.0   | 5.0   | 5.0   | 5.0   | 5.0   |           | 4.9   | 4.9   | 4.9   | 4.9   | 4.9   |           | 5.9   | 5.9   | 5.9   | 5.9   | 5.9   |         | 4.9   | 4.9   | 4.9   | 4.9   | 4.9   |           | 2.96  | 2.96  | 2.96  | 2.96  | 2.96  | 12 | nJ    | $E_S$        | es for    |
|             |           | 2.74  | 2.74  | 2.74  | 2.64  | 2.64  |           | 2.64  | 3.54  | 2.64  | 2.6   | 3.05  |           | 3.17  | 4.07  | 3.17  | 3.12  | 3.12  |         | 2.6   | 3.1   | 2.6   | 2.6   | 2.6   |           | 1.6   | 1.58  | 1.6   | 1.57  | 1.57  | 13 | nJ    | $E_D$        | benchi    |
| 44.36       | 46.06     | 45.3  | 45.3  | 45.3  | 47.2  | 47.2  | 41.17     | 46.22 | 27.73 | 46.22 | 47.96 | 37.7  | 43.44     | 46.2  | 30.8  | 46.2  | 47.0  | 47.0  | 44.76   | 46.22 | 36.98 | 46.22 | 47.2  | 47.2  | 46.38     | 45.9  | 46.4  | 46.0  | 46.8  | 46.8  | 14 | %     | $\Delta E$   | narks u   |
|             |           | 5.56  | 5.56  | 5.56  | 5.56  | 5.56  |           | 2.72  | 2.72  | 2.72  | 2.18  | 2.18  |           | 3.27  | 3.27  | 2.62  | 2.62  | 2.62  |         | 2.7   | 2.7   | 2.7   | 2.7   | 2.7   |           | 1.31  | 1.31  | 0.99  | 0.99  | 0.99  | 15 | s f f | $EDP_S$      | ISING MV  |
|             |           | 3.8   | 3.8   | 3.8   | 4.4   | 4.4   |           | 2.05  | 2.75  | 2.05  | 1.73  | 2.04  |           | 2.46  | 3.85  | 2.23  | 2.43  | 2.43  |         | 2.0   | 2.9   | 2.0   | 2.0   | 2.3   |           | 0.87  | 1.14  | 0.8   | 0.87  | 0.87  | 16 | fJs   | $EDP_D$      | /DHC      |
| 17.31       | 27.31     | 31.63 | 31.63 | 31.63 | 20.83 | 20.83 | 15.24     | 24.71 | NA    | 24.71 | 20.44 | 6.57  | 10.34     | 24.66 | NA    | 12.55 | 7.25  | 7.25  | 16.21   | 24.71 | NA    | 24.71 | 26.09 | 15.52 | 17.41     | 32.49 | 12.89 | 18.98 | 11.34 | 11.34 | 17 | %     | $\Delta EDP$ |           |

ACM Transactions on Design Automation of Electronic Systems, Vol. V, No. N, April 2005.

130

|           | $\Delta EDP$ | $\eta_{c}$ | 17 | 8.63  | 29.07 | 9.98  | 15.33 | 32.5  | 19.10      | 24.38 | 36.64 | 24.71 | 2.19  | 24.71 | 22.53      | NA    | 20.61   | 6.11  | NA    | 13.76 | 8.10       | NA    | 20.81 | 13.96 | 13.28 | 16.23 | 12.76      | 31.4  | 31.4  | 28.9 | 17.11 | 28.9 | 27.54      | 17.99       |
|-----------|--------------|------------|----|-------|-------|-------|-------|-------|------------|-------|-------|-------|-------|-------|------------|-------|---------|-------|-------|-------|------------|-------|-------|-------|-------|-------|------------|-------|-------|------|-------|------|------------|-------------|
| 'MC       | $EDP_{M}$    | fJs        | 16 | 6.0   | 0.7   | 0.89  | 1.11  | 0.89  |            | 2.06  | 1.72  | 2.05  | 2.66  | 2.05  |            | 2.68  | 2.08    | 2.46  | 3.32  | 2.82  |            | 2.2   | 1.72  | 2.34  | 2.36  | 2.34  |            | 3.81  | 3.81  | 3.95 | 4.60  | 3.95 |            |             |
| using MV  | $EDP_{S}$    | $fJ_s$     | 15 | 0.99  | 0.99  | 0.99  | 1.31  | 1.31  |            | 2.72  | 2.72  | 2.72  | 2.72  | 2.72  |            | 2.62  | 2.62    | 2.62  | 3.27  | 3.27  |            | 2.18  | 2.18  | 2.72  | 2.72  | 2.72  |            | 5.56  | 5.56  | 5.56 | 5.56  | 5.56 |            |             |
| marks     | $\Delta E$   | %          | 14 | 31.47 | 46.8  | 46.0  | 16.46 | 46.0  | 37.35      | 37.0  | 47.2  | 46.22 | 18.5  | 46.22 | 39.03      | 31.6  | 47.0    | 46.19 | 15.4  | 46.18 | 37.27      | 19.22 | 47.2  | 46.22 | 27.73 | 46.22 | 37.32      | 47.22 | 47.22 | 45.3 | 36.24 | 45.3 | 44.26      | 39.05       |
| benchi    | $E_M$        | nJ         | 13 | 2.03  | 1.57  | 1.57  | 2.5   | 1.6   |            | 3.09  | 2.59  | 2.64  | 4.0   | 2.64  |            | 4.0   | 3.2     | 3.2   | 5.0   | 3.17  |            | 4.0   | 2.6   | 2.6   | 3.54  | 2.64  |            | 2.64  | 2.64  | 2.74 | 3.19  | 2.74 |            |             |
| es for    | $E_S$        | nJ         | 12 | 3.0   | 3.0   | 3.0   | 3.0   | 3.0   |            | 4.9   | 4.9   | 4.9   | 4.9   | 4.9   |            | 5.9   | 5.9     | 5.9   | 5.9   | 5.9   |            | 4.9   | 4.9   | 4.9   | 4.9   | 4.9   |            | 5.0   | 5.0   | 5.0  | 5.0   | 5.0  |            |             |
| stimat    | $\Delta P$   | %          | Π  | 22.9  | 21.53 | 36.75 | NA    | 15.64 | 19.36      | 13.04 | 13.15 | 12.13 | 14.85 | 24.6  | 15.55      | 31.47 | 30.26   | 39.77 | 15.2  | 39.53 | 33.14      | 18.85 | 30.37 | 34.01 | 14.85 | 34.01 | 26.42      | 20.44 | 20.44 | 18.9 | 20.9  | 18.9 | 19.92      | 22.51       |
| oduct e   | $P_M$        | m W        | 10 | 6.84  | 6.96  | 5.61  | 6.77  | 5.61  |            | 7.67  | 7.66  | 7.75  | 7.51  | 6.65  |            | 9.08  | 9.24    | 7.98  | 0.6   | 6.41  |            | 8.95  | 7.68  | 5.82  | 7.51  | 5.82  |            | 3.58  | 3.58  | 3.65 | 3.56  | 3.65 |            |             |
| elay pr   | $P_S$        | mW         | 6  | 8.87  | 8.87  | 8.87  | 6.67  | 6.65  |            | 8.87  | 8.82  | 8.82  | 8.82  | 8.82  |            | 13.25 | 13.25   | 13.25 | 10.6  | 10.6  |            | 11.03 | 11.03 | 8.82  | 8.82  | 8.82  |            | 4.5   | 4.5   | 4.5  | 4.5   | 4.5  |            |             |
| mergy d   | $\Delta DP$  | %          | 8  | 23.6  | 20.8  | 48.51 | NA    | 46.51 | 27.88      | NA    | 47.21 | 47.22 | 22.58 | 47.22 | 32.85      | NA    | 47.64   | 47.22 | 23.61 | 47.22 | 33.14      | 31.34 | 46.75 | 48.55 | 23.61 | 48.55 | 39.76      | NA    | NA    | NA   | NA    | NA   | 0          | 26.73       |
| gy and e  | $P_{m M}$    | mW         | 7  | 0.35  | 0.35  | 0.46  | 0.23  | 0.23  |            | 0.23  | 0.12  | 0.23  | 0.23  | 0.23  |            | 0.35  | 0.35    | 0.46  | 0.23  | 0.23  |            | 0.12  | 0.12  | 0.23  | 0.23  | 0.23  |            | 0.12  | 0.12  | 0.23 | 0.23  | 0.23 |            |             |
| sr, energ | $P_{mS}$     | mW         | 9  | 0.46  | 0.46  | 0.46  | 0.23  | 0.23  |            | 0.23  | 0.23  | 0.23  | 0.23  | 0.23  |            | 0.46  | 0.46    | 0.46  | 0.23  | 0.23  |            | 0.23  | 0.23  | 0.23  | 0.23  | 0.23  |            | 0.23  | 0.23  | 0.23 | 0.23  | 0.23 |            |             |
| Powe      | $\Delta P_p$ | %          | 5  | 23.61 | 20.83 | 47.22 | NA    | 45.9  | 27.51      | NA    | 47.22 | 46.6  | 22.28 | 46.6  | 32.54      | NA    | 47.23   | 46.0  | 23.3  | 46.6  | 32.63      | 31.48 | 46.76 | 47.92 | 23.3  | 47.92 | 39.48      | NA    | NA    | NA   | NA    | NA   | 0          | 26.44       |
| ole IV.   | $P_{p_M}$    | mW         | 4  | 13.2  | 13.7  | 9.12  | 13.43 | 9.35  | ues        | 17.76 | 13.68 | 9.35  | 13.43 | 9.35  | ues        | 17.76 | 13.8    | 9.58  | 13.43 | 9.35  | ues        | 17.76 | 13.8  | 9.12  | 13.43 | 9.12  | ues        | 9.24  | 9.24  | 9.35 | 13.43 | 9.35 | ues        |             |
| Tal       | $P_{p_S}$    | mW         | 3  | 17.28 | 17.28 | 17.28 | 8.87  | 17.28 | verage val | 17.51 | 25.92 | 17.51 | 17.28 | 17.51 | verage val | 17.51 | 26.15   | 17.74 | 17.51 | 17.51 | verage val | 25.92 | 25.92 | 17.51 | 17.51 | 17.51 | verage val | 8.87  | 8.87  | 8.87 | 8.87  | 8.87 | verage val | all average |
|           | ч            | с<br>U     | 2  | -     | 2     | m     | 4     | 5     | A          | -     | 2     | 3     | 4     | S     | A          | -     | 2       | e,    | 4     | 5     | A          | _     | 2     | e     | 4     | 5     | A          | 1     | 2     | 3    | 4     | S    | A          | Overa       |
|           |              |            | _  |       | Ξ     | щ     | ×     | д_    |            |       | 6     | ц     | -     | ч     |            |       | <u></u> | H     | A     | Ч     |            |       | 6     | -     | -     | ч     |            |       | 3     | A    | ъ     | ц    |            |             |

provide comparison with any other works.



Fig. 5. Average reductions in power or energy for various benchmark circuits

In order to study the power profile, we plotted the cycle power consumption for different benchmarks over all the control steps (clock steps). Fig. 6(a), 6(b), 6(c), 6(d), and 6(e) show power profiles for benchmarks for resource constraints RC1, RC2, RC3, and RC4, respectively. The curves labeled as "SF" correspond to the profile when the schedule is operated at a single frequency (which is the maximum frequency of slower operator, multiplier) and single voltage. The profiles labeled as "DFC" correspond to the case when dynamic clocking and multiple voltage scheme is used. Similarly, the profiles labeled as "MC" are for the MVMC scheme. The effectiveness of the proposed scheduling schemes can be seen from the plots. For MVDFC scheme, the power consmuption profile has improved compared to either MVMC or SVSF scheme.

#### 8. CONCLUSIONS

In low power designs for portable applications, the simultaneous minimization of total energy and transient power is essential. The metric CPF\* defined and used in this work essentially facilitates such simultaneous optimization using ILP formulations. The optimization is performed using MVDFC scheme and MVMC scheme. The datapath scheduling algorithm described in this paper is particularly useful for synthesizing data intensive application specific integrated circuits. The algorithm attempts to optimize energy and power while maintaining performance. The scheduling algorithm assumes number of different types of resources at each voltage levels (both MVDFC and MVMC) and the number of allowable frequencies (MVDFC scheme) as resource constraints. The energy delay product for both the MVDFC and MVMC scheduling scenario was estimated to keep track of the effect of scheduling algorithms on circuit performance. The MVDFC scheduling resulted in reduction of EDP for all benchmarks and all resource constraints, which shows its effectiveness. On the other hand, the MVMC scheme resulted in improvement in EDP in almost all cases, except for a few cases, where there was no improvement. *The results clearly indicate that the multiple supply voltage and dynamic frequency clocking scheme* 



Fig. 6. Power profile for benchmark circuit for different resource constraints

yields better power and energy minimization than the multiple supply voltage and multicycling scheme. The effectiveness of the scheduling schemes in the context of pipelined datapaths and also in control intensive applications, still needs to be investigated.

#### REFERENCES

- BENINI, L., MACII, E., PNOCINO, M., AND MICHELI, G. D. 1998. Telescopic Units : A New Paradigm for Performance Optimization of VLSI Design. *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 17*, 3 (Mar), 220–232.
- BENINI, L., MICHELI, G. D., LIOY, A., MACII, E., ODASSO, G., AND PONCINO, M. 1999. Automatic Synthesis of Large Telescopic Units Based on Near-Minimum Timed Supersetting. *IEEE Transactions on Computers* 48, 8 (Aug), 769–779.
- BRYNJOLFSON, I. AND ZILIC, Z. 2000. Dynamic Clock Management for Low Power Applications in FPGAs. In *Proceedings of the IEEE Custom Integrated Circuits Conference*. 139–142.
- BURD, T., PERING, T. A., STRATAKOS, A. J., AND BRODERSEN, R. W. 2000. A Dynamic Voltage Scaled Microprocessor System. *IEEE Journal of Solid-State Circuits* 35, 11 (Nov), 1571–1580.
- CHANG, J. M. AND PEDRAM, M. 1997. Energy Minimization using Multiple Supply Voltages. *IEEE Transac*tions on VLSI Systems 5, 4 (Dec), 436–443.
- FOURER, R., GAY, D., AND KERNIGHAN, B. 2003. AMPL: A Modeling Language for Mathematical Programming. Thomson Brooks Cole.
- HSU, C. H., KREMER, U., AND HSIAO, M. 2000. Compiler-Directed Dynamic Frequency and Voltage Scheduling. In *Proceedings of the Workshop on Power-Aware Computer Systems*. 65–81.
- JOHNSON, M. AND ROY, K. 1997. Datapath Scheduling with Multiple Supply Voltages and Level Converters. *ACM Transactions on Design Automation of Electronic Systems 2*, 3 (July), 227–248.
- KIM, J. M. AND CHAE, S. I. 1996. New MPEG2 Decoder Architecture using Frequency Scaling. In Proceedings of the IEEE International Symposium on Circuits and Systems. 253–256.
- LANDMAN, P. E. AND RABAEY, J. M. 1995. Architectural Power Analysis : The Dual Bit Type Method. *IEEE Transactions on VLSI Systems 3*, 2 (Jun), 173–187.
- MANZAK, A. AND CHAKRABARTI, C. 2002. A Low Power Scheduling Scheme with Resources Operating at Multiple Voltages. *IEEE Transactions on VLSI Systems 10*, 1 (Feb), 6–14.
- MARTIN, R. S. AND KNIGHT, J. P. 1996a. Optimizing Power in ASIC Behavioral Synthesis. *IEEE Design and Test of Computers 13*, 2 (Summer), 58–70.
- MCCARL, B. A. AND SPREEN, T. H. 1997. Applied Mathematical Programming using Algebric Systems. Online Book at : http://agecon.tamu.edu/faculty/mccarl/regbook.htm.
- MOHANTY, S. P. AND RANGANATHAN, N. 2003a. A Framework for Energy and Transient Power Reduction During Behavioral Synthesis. In *Proceedings of the International Conference on VLSI Design*. 539–545.
- MOHANTY, S. P. AND RANGANATHAN, N. 2003b. Energy Efficient Scheduling for Datapath Synthesis. In *Proceedings of the International Conference on VLSI Design*. 446–451.
- MOHANTY, S. P., RANGANATHAN, N., AND CHAPPIDI, S. K. 2003. Peak Power Minimization Through Datapath Scheduling. In *Proceedings of the IEEE Computer Society Annual Symposium on VLSI*. 121–126.
- MOHANTY, S. P., RANGANATHAN, N., AND KRISHNA, V. 2002. Datapath Scheduling using Dynamic Frequency Clocking. In *Proceedings of the IEEE Computer Society Annual Symposium on VLSI*. 65–70.
- PANIK, M. J. 1996. Linear Programming : Mathematics, Theory and Practice. Kluwer Academic Publishers.
- PAPACHRISTOU, C., SPINING, M., AND NOURANI, M. 1999. A Multiple Clocking Scheme for Low Power RTL Design. *IEEE Transactions on VLSI Systems 7*, 2 (June), 266–276.
- PARK, S. AND CHOI, K. 2001. Performance-driven High-Level Synthesis with Bit-Level Chaining and Clock Selection. *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 20*, 2 (Feb), 199–212.
- POUWELSE, J., LANGENDOEN, K., AND H.SIPS. 2001. Energy Priority Scheduling for Variable Voltage Processor. In Proceedings of the International Symposium on Low Power Electronics and Design. 28–33.
- RAGHUNATHAN, A. AND JHA, N. K. 1997. SCALP: An Iterative-Improvement Based Low-Power Datapath Synthesis System. *IEEE Transactions on CAD of Integrated Circuits and Systems 16*, 11 (Nov), 1260–1277.

- RAGHUNATHAN, V., RAVI, S., RAGHUNATHAN, A., AND LAKSHMINARAYANA, G. 2001. Transient Power Management through High Level Synthesis. In *Proceedings of the International Conference on Computer Aided Design*. 545–552.
- RAMPRASAD, S., SHANBHAG, N. R., AND HAJJ, I. N. 1997. Analytical Estimation of Signal Transition Activity from Word-Level Statistics. *IEEE Transactions on CAD of Integrated Circuits and Systems 16*, 7 (Jul), 718–733.
- RANGANATHAN, N., VIJAYKRISHNAN, N., AND BHAVANISHANKAR, N. 1996. A VLSI Array Architecture with Dynamic Frequency Clocking. In *Proceedings of the International Conference on Computer Design*. 137–140.
- RANGANATHAN, N., VIJAYKRISHNAN, N., AND BHAVANISHANKAR, N. 1998. A Linear Array Processor with Dynamic Frequency Clocking for Image Processing Applications. *IEEE Transactions on Circuits and Systems for Video Technology 8*, 4 (August), 435–445.
- RAO, S. S. 1996. Engineering Optimization : Theory and Practice. Addison-Wesley.
- SATYANARAYAN, J. H. AND PARHI, K. K. 2000. Theoritical Analysis of Word-Level Switching Activity in the Presence of Glitch and Correlation. *IEEE Transactions on VLSI Systems* 8, 2 (Apr), 148–159.
- SHIUE, W. T. 2000. High Level Synthesis for Peak Power Minimization using ILP. In *Proceedings of the IEEE* International Conference on Application Specific Systems, Architectures and Processors. 103–112.
- SHIUE, W. T. AND CHAKRABARTI, C. 2000a. ILP Based Scheme for Low Power Scheduling and Resource Binding. In *Proceedings of the IEEE International Symposium on Circuits and Systems (Vol. 3)*. 279–282.
- SHIUE, W. T. AND CHAKRABARTI, C. 2000b. Low-Power Scheduling with Resources Operating at Multiple Voltages. *IEEE Transactions on Circuits and Systems-II : Analog and Digital Signal Processing 47*, 6 (June), 536–543.
- SHIUE, W. T., DENISON, J., AND HORAK, A. 2000. A Novel Scheduler for Low Power Real Time Systems. In *Proceedings of the 43rd Midwest Symposium on Circuits and Systems*. 312–315.
- SINGH, D., RABAEY, J. M., PEDRAM, M., CATTHOOR, F., RAJGOPAL, S., SEHGAL, N., AND MOZDZEN, T. J. 1995. Power Conscious CAD Tools and Methodologies: A Perspective. *Proceedings of the IEEE 83*, 4 (Apr), 570–594.

Submitted on April 2003, revised on Mar 2004, and final version prepared on August 2004.