# Simultaneous Peak and Average Power Minimization during Datapath Scheduling for DSP Processors

Saraju P. Mohanty, N. Ranganathan and Sunil K. Chappidi Department of Computer Science and Engineering Nanomaterial and Nanomanufacturing Research Center University of South Florida, Tampa, FL 33620 {smohanty,ranganat,chappidi}@csee.usf.edu

# ABSTRACT

The use of multiple supply voltages for energy and average power reduction is well researched, and several works have appeared in the literature. However, in low power design using deep submicron and nanometer technologies, the peak power, peak power differential, average power and total energy are equally critical design constraints. In this work, we propose datapath scheduling algorithms for simultaneous minimization of peak and average power while maintaining performance through the use of dynamic frequency clocking and multiple supply voltages. The algorithms use integer linear programming (ILP) based models. The dynamic frequency clocking methodology is particularly useful for data intensive signal processing applications. The effectiveness of our scheduling technique is measured by estimating the peak power consumption, the average power consumption and the power delay product of the datapath circuit. Furthermore, the proposed scheduling scheme is compared with a combined multiple supply voltages and multicycling scheme. Experimental results show that the combined multiple supply voltages (3.3V, 2.4V) and dynamic frequency clocking scheme achieves significant reductions in peak power (72% on average), average power (71% on average) and power delay product (54% on average).

# **Categories and Subject Descriptors**

B.5.1 [**Register-Transfer-Level Implementation**]: Datapath Design; B.5.2 [**Register-Transfer-Level Implementation**]: Automatic Synthesis, Optimization; G.1.6 [**Numerical Analysis**]: Optimization, Integer Programming

#### **General Terms**

Algorithms, Performance, Design, Reliability

#### **Keywords**

Peak power, Average Power, High-level Synthesis, Datapath Scheduling, Multiple Voltages, Dynamic Frequency Clocking

Copyright 2003 ACM 1-58113-677-3/03/0006 ...\$5.00.

## 1. INTRODUCTION

With the increase in chip densities and clock frequencies, the demand for low power integrated circuits has increased. This trend has also made reliability a major issue for designers, mainly because of the high on-chip electric fields [18, 19, 20]. Average power reduction is essential for the following reasons: (i) to increase battery lifetime, (ii) to enhance noise margin, (iii) to reduce cooling and energy costs, (iv) to reduce the use of natural resources and (v) to increase system reliability. The battery lifetime is determined by the ampere-hour (Ah) rating of the battery, so high current consumption shortens it. Reduction of average power is essential to enhance noise margin (to decrease functional failures). The cost of packaging and cooling is determined by the average current flow and hence by the average power (energy). An increase in energy and average power increases the energy bill (watt-hours, Wh). As energy (or average power) consumption increases, more generation is needed, which escalates the usage of natural resources and affects the environment. If the average current (power) is high, the operating temperature of the chip increases, which may lead to failures.

The peak power is the maximum power consumption of the IC at any instant during its execution. In this work, peak power is defined as the maximum power consumption during any clock cycle. Reduction of peak power consumption is essential for the following reasons: (i) to maintain supply voltage levels and (ii) to increase reliability. High peak power can affect the supply voltage levels: the large current flow causes a high IR drop in the power line, which reduces the supply voltage levels at different parts of the circuit. High current flow can reduce reliability because of hot electron effects and high current density. Hot electrons may lead to runaway current failures and electrostatic discharge failures. Moreover, high current density can cause electromigration failure. It is observed that the mean time to failure (MTF) of a CMOS circuit is inversely proportional to the current density (or power density).

The reduction of energy or average power using multiple supply voltages is well researched and several works, such as [4, 3, 6, 10], have appeared. In a multiple supply voltage scheme, the functional units can be operated at different supply voltages. The energy savings in this scheme are often accompanied by degradation of performance, because the critical path delay increases when multiple supply voltage functional units are used aggressively, even on the critical path of the datapath circuit. The degradation in performance can be compensated using dynamic frequency clocking (DFC) [10], multicycling and chaining [13], and variable latency components [1]. In multicycling, an operation is scheduled in more than one consecutive control step and, in addition, each control step is of equal length. In DFC, on the other hand, an operation is scheduled in one unique control step, but the control steps of a schedule need not all be of equal length. The clock frequency may be changed on the fly.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

GLSVLSI'03, April 28-29, 2003, Washington, DC, USA.

Peak power reduction through simultaneous assignment and scheduling is addressed in [8]. The authors use genetic algorithms for optimization of average and peak power. The same authors describe a behavioral synthesis system called PASSOS in [7], which uses the same approach as [8] and adds area optimization. In [15], ILP based scheduling and modified force directed scheduling have been proposed to minimize peak power under latency constraints. The ILP formulation considers multicycling and pipelining using a single supply voltage. ILP based models to minimize peak power and peak area have been proposed in [16] for latency constrained scheduling. In [17], the authors describe a time constrained scheduling algorithm for real time systems using a modified ILP model that minimizes both peak power and the number of resources. The authors in [14] propose the use of data monitor operations for simultaneous reduction of peak power and peak power differential. They advocate a judicious choice of transient power metric to minimize area and performance overhead. In [11], a heuristic based scheme is proposed that minimizes peak power, peak power differential, average power and energy altogether. In [12], the authors propose ILP based datapath scheduling schemes for peak power minimization under resource constraints. The scheduling algorithms handle multiple supply voltages, dynamic frequency clocking and multicycling. In this work, we propose a scheduling scheme for simultaneous reduction of peak and average power at the behavioral level using integer linear programming (ILP) based models.

## 2. PEAK AND AVERAGE POWER

In this section, we discuss different power terminologies with reference to a datapath circuit. Let us assume that the datapath is represented in the form of a sequencing data flow graph. The datapath uses various resources or functional units operating at different supply voltages. A level converter is considered a resource operating in the control step in which it needs to step up a signal. The dynamic clocking unit (DCU), which generates the dynamic frequency, is accounted for as a resource operating in all the control steps. For a data flow graph (DFG), we use the following notation and terminology.

c = any control step or clock cycle in the DFG

N = total number of control steps in the DFG

 $R_c$  = number of resources active in step c

 $f_c$  = cycle frequency for control step c

 $\alpha_{i,c}$  = switching at resource *i* operating in step *c* 

 $C_{i,c} =$  load capacitance of resource *i* operating in control step *c* 

 $V_{i,c}$  = operating voltage of resource *i* operating in control step *c* 

 $P_c$  = power consumption for the DFG for any control step c

 $P_p$  = maximum power consumption for the DFG

 $P_a$  = average power consumption for the DFG

T = critical path delay of the DFG

PDP = power delay product of the DFG

It may be noted that for single frequency and single supply voltage mode of operation,  $V_{i,c}$  and  $f_c$  are the same for any clock cycle (c) and resource (i). Similarly, for multicycling operation the  $f_c$  are the same for any clock cycle (c).

The power consumption for any control step c is

$$P_{c} = \sum_{i=1}^{R_{c}} \alpha_{i,c} C_{i,c} V_{i,c}^{2} f_{c} \tag{1}$$

The peak power consumption of the DFG is the maximum power consumption over all the control steps, which is expressed as below.

$$P_p = Max \left( P_c \right)_{\forall c=1,2,\dots,N} \tag{2}$$

We rewrite Eqn. 2 using Eqn. 1 as follows.

$$P_{p} = Max \left( \sum_{i=1}^{R_{c}} \alpha_{i,c} C_{i,c} V_{i,c}^{2} f_{c} \right)_{\forall c=1,2,\dots,N} \tag{3}$$

The average power consumption of the DFG is characterised as the mean of the cycle powers  $(P_c)$  for all control steps.

$$P_a = \frac{1}{N} \sum_{c=1}^{N} P_c \tag{4}$$

Again using Eqn. 1, we rewrite Eqn. 4 as follows.

$$P_a = \frac{1}{N} \sum_{c=1}^{N} \sum_{i=1}^{R_c} \alpha_{i,c} C_{i,c} V_{i,c}^2 f_c \tag{5}$$

Since the simultaneous reduction of both peak and average power is aimed for, the objective function to be minimized by the scheduling algorithm is the sum of Eqn. 3 and 5.

The critical path delay of the DFG can be calculated as,

$$T = \sum_{c=1}^{N} \frac{1}{f_c} \tag{6}$$

It should be noted that the  $f_c$  is the same for single frequency and multicycling operations for all values of c and may be different for dynamic frequency clocking operations. The power delay product of the DFG is defined as the product of the average power consumption and critical path delay as shown below.

$$PDP = P_a * T \tag{7}$$

Using Eqn. 4 and 6, the following expression for the power delay product is obtained.

$$PDP = \frac{1}{N} \sum_{c=1}^{N} P_c * \sum_{c=1}^{N} \frac{1}{f_c} \tag{8}$$

Similarly, the following expression for the power delay product is obtained using Eqn. 5 and 6.

$$PDP = \frac{1}{N} \sum_{c=1}^{N} \sum_{i=1}^{R_c} \alpha_{i,c} C_{i,c} V_{i,c}^2 f_c * \sum_{c=1}^{N} \frac{1}{f_c} \tag{9}$$

To study the impact of the scheduling algorithms on the performance of the datapath, the power delay product of the scheduled DFGs will be estimated using the above expression.
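The quantities defined in Eqns. 1 to 7 can be evaluated directly once a schedule is known. The following is a minimal sketch, not the authors' estimator: the function names `step_power` and `power_metrics` and all numeric values (switching activity, capacitances, frequencies) are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# One resource active in a control step: (alpha, C in farads, V in volts).
Resource = Tuple[float, float, float]


def step_power(resources: List[Resource], f_c: float) -> float:
    """Eqn. 1: sum of alpha * C * V^2 * f_c over the resources active in a step."""
    return sum(a * c * v * v * f_c for a, c, v in resources)


def power_metrics(steps: List[Tuple[List[Resource], float]]):
    """Return (P_p, P_a, T, PDP) for a list of (resources, f_c) control steps."""
    n = len(steps)
    p = [step_power(res, f) for res, f in steps]
    p_peak = max(p)                       # Eqn. 2
    p_avg = sum(p) / n                    # Eqn. 4
    t = sum(1.0 / f for _, f in steps)    # Eqn. 6
    return p_peak, p_avg, t, p_avg * t    # Eqn. 7


# Hypothetical two-step schedule (illustrative values only).
steps = [
    ([(0.5, 1e-12, 3.3), (0.5, 1e-12, 2.4)], 18e6),  # two units at 18 MHz
    ([(0.5, 2e-12, 2.4)], 9e6),                      # one unit at 9 MHz
]
p_peak, p_avg, t, pdp = power_metrics(steps)
```

Under single frequency or multicycling operation every step would carry the same `f_c`; only the DFC case uses different frequencies per step, as in the second entry above.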

## 3. ILP FORMULATIONS : DFC

In this section, the ILP formulations for simultaneous minimization of peak power (Eqn. 3) and average power (Eqn. 5) using multiple supply voltages and dynamic frequency clocking (DFC) are described. In dynamic frequency clocking [2, 5], the clock frequency is varied on-the-fly based on the functional units active in that cycle. In this clocking scheme, all the units are clocked by a single clock line which switches at run-time. The frequency reduction creates an opportunity to operate the different functional units at different voltages, which in turn helps in further reduction of power. The following notations are used for the ILP formulations.

O = total number of operations in the DFG excluding the source and sink nodes (NO-OPs)

 $o_i$  = any operation  $i, 1 \leq i \leq O$ 

 $F_{k,v}$  = functional unit of type k operating at voltage level v

 $M_{k,v}$  = maximum number of functional units of type k operating at voltage level v

 $S_i$  = as soon as possible (ASAP) time stamp for the operation  $o_i$

 $E_i$  = as late as possible (ALAP) time stamp for the operation  $o_i$

P(i, v, f) = power consumption of operation  $o_i$  at voltage level v and operating frequency f

 $x_{i,c,v,f}$  = decision variable which takes the value of 1 if operation  $o_i$  is scheduled in control step c using the functional unit  $F_{k,v}$  and c has frequency  $f_c$ 

(a) *Objective Function* : The objective is to minimize the peak power and the average power consumption of the whole DFG over all control steps simultaneously. These are already described above in Eqn. 3 and 5.

$$Minimize: P_p + P_a \tag{10}$$

Using decision variables the objective function can be rewritten as follows :

$$Min: P_p + \frac{1}{N} \sum_{c} \sum_{v} \sum_{i \in F_{k,v}} \sum_{f} x_{i,c,v,f} * P(i,v,f) \tag{11}$$

It should be noted that the  $P_p$  is an unknown which has to be minimized. It may be power consumption of any control step in the DFG depending on the scheduled operations and hence is later used as a constraint.

(b) Uniqueness Constraints : These constraints ensure that every operation  $o_i$  is scheduled to one unique control step within the mobility range  $(S_i, E_i)$  with a particular supply voltage and operating frequency. They are represented as,  $\forall i, 1 \le i \le O$ ,

$$\sum_{c} \sum_{v} \sum_{f} x_{i,c,v,f} = 1 \tag{12}$$

(c) *Precedence Constraints* : These constraints ascertain that for an operation  $o_i$ , all its predecessors are scheduled in an earlier control step and its successors are scheduled in a later control step. These are modelled as,  $\forall i, j, o_i \in Pred_{o_j}$ 

$$\sum_{v} \sum_{f} \sum_{d=S_{i}}^{E_{i}} d * x_{i,d,v,f} - \sum_{v} \sum_{f} \sum_{e=S_{j}}^{E_{j}} e * x_{j,e,v,f} \le -1 \tag{13}$$

(d) *Resource Constraints* : These constraints establish that no control step contains more than  $M_{k,v}$  operations of type k operating at voltage v. These can be enforced as,  $\forall c, 1 \leq c \leq N$  and  $\forall v$ ,

$$\sum_{i \in F_{k,v}} \sum_{f} x_{i,c,v,f} \le M_{k,v} \tag{14}$$

(e) *Frequency Constraints* : This set ensures that if a functional unit is operating at higher voltage level then it can be scheduled in a lower frequency control step, whereas if a functional unit is operating at lower voltage level then it can not be scheduled in a higher frequency control step. These constraints are written as,  $\forall i$ ,  $1 \le i \le O$ ,  $\forall c, 1 \le c \le N$ , if f < v, then  $x_{i,c,v,f} = 0$ .

(f) *Peak Power Constraints* : These constraints make certain that the maximum power consumption of the DFG does not exceed  $P_p$  for any control step. These constraints are applied as follows,  $\forall c$ ,  $1 \leq c \leq N$  and  $\forall v$ ,

$$\sum_{i \in F_{k,v}} \sum_{f} x_{i,c,v,f} * P(i,v,f) \le P_p \tag{15}$$
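To make the interplay of the objective and the constraints concrete, the sketch below solves a three-operation instance by exhaustive enumeration instead of an ILP solver. It is an illustration under simplifying assumptions, not the formulation itself: the frequency index of $x_{i,c,v,f}$ is folded into the voltage choice (one hypothetical power value per voltage), there is a single functional unit type, the peak is taken over the total power of each control step, and the mobility ranges, precedence edge, power values and unit counts are all invented for the example.

```python
from itertools import product

N = 3                                          # number of control steps
mobility = {1: (1, 2), 2: (1, 3), 3: (2, 3)}   # op -> (S_i, E_i), hypothetical
preds = [(1, 3)]                               # (i, j): o_i must precede o_j
power = {3.3: 8.0, 2.4: 4.0}                   # hypothetical P(i, v) in mW
max_units = {3.3: 1, 2.4: 1}                   # M_{k,v} for one unit type k


def feasible(assign):
    """assign maps op -> (step, voltage); checks precedence and resources."""
    # Precedence (Eqn. 13): step(o_i) - step(o_j) <= -1
    if any(assign[i][0] >= assign[j][0] for i, j in preds):
        return False
    # Resources (Eqn. 14): at most M_{k,v} operations per step and voltage
    for c in range(1, N + 1):
        for v, m in max_units.items():
            used = sum(1 for s, vv in assign.values() if s == c and vv == v)
            if used > m:
                return False
    return True


def objective(assign):
    """P_p + P_a (Eqn. 10), with P_p as the largest per-step power."""
    per_step = [
        sum(power[v] for s, v in assign.values() if s == c)
        for c in range(1, N + 1)
    ]
    return max(per_step) + sum(per_step) / N


def solve():
    ops = sorted(mobility)
    # Uniqueness (Eqn. 12) holds by construction: one (step, voltage) per op,
    # with the step restricted to the mobility range.
    choices = [
        [(c, v) for c in range(mobility[i][0], mobility[i][1] + 1) for v in power]
        for i in ops
    ]
    best = None
    for combo in product(*choices):
        assign = dict(zip(ops, combo))
        if feasible(assign):
            cost = objective(assign)
            if best is None or cost < best[0]:
                best = (cost, assign)
    return best


best_cost, best_assign = solve()
```

In this toy instance the search places all three operations on the 2.4 V unit in distinct control steps, which flattens the power profile. Enumeration grows exponentially with the number of operations, which is why an ILP solver is used for real benchmarks.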

## 4. ILP FORMULATIONS : MULTICYCLING

In this section, the ILP formulations for simultaneous minimization of both peak and average power consumption of the DFG using multiple supply voltages and multicycling are discussed. The following additional notations are used :

 $y_{i,v,l,m}$  = decision variable which takes the value of 1 if  $o_i$  is using the functional unit  $F_{k,v}$  and is scheduled in control steps  $l \rightarrow m$

 $L_{i,v}$  = latency for operation  $o_i$  using a resource operating at voltage v (in terms of number of clock cycles)

(a) *Objective Function* : The objective is to minimize the peak and average power consumption of the whole DFG over all control steps. The expressions given in Eqn. 3 and Eqn. 5 are still valid here, the only difference being that  $f_c$  is the same for all control steps.

$$Minimize: P_p + P_a \tag{16}$$

In terms of decision variables, the above is written as :

$$Minimize: P_p + \frac{1}{N} \sum_{l} \sum_{i \in F_{k,v}} \sum_{v} y_{i,v,l,(l+L_{i,v}-1)} * P(i,v,f_{clk}) \tag{17}$$

The  $P_p$  is used as a constraint later.

(b) Uniqueness Constraints : These constraints confirm that every operation  $o_i$  is scheduled in appropriate control steps within the mobility range  $(S_i, E_i)$  with a particular supply voltage. It may be operated at more than one clock cycle depending on the supply voltage. These constraints are represented as,  $\forall i, 1 \leq i \leq O$ ,

$$\sum_{v} \sum_{l=S_{i}}^{S_{i}+E_{i}+1-L_{i,v}} y_{i,v,l,(l+L_{i,v}-1)} = 1 \tag{18}$$

When the operators are operating at highest voltage, they are scheduled in one unique control step, whereas, when they are to be operated at lower voltages they need more than one clock cycle for completion. Thus, for lower voltage the mobility is restricted.

(c) *Precedence Constraints* : These constraints guarantee that for an operation  $o_i$ , all its predecessors are scheduled in an earlier control step and its successors are scheduled in a later control step. These constraints should also take care of the multicycling operations. These are modelled as,  $\forall i, j, o_i \in Pred_{o_j}$ 

$$\sum_{v} \sum_{l=S_{i}}^{E_{i}} (l + L_{i,v} - 1) * y_{i,v,l,(l+L_{i,v}-1)} - \sum_{v} \sum_{l=S_{j}}^{E_{j}} l * y_{j,v,l,(l+L_{j,v}-1)} \leq -1 \tag{19}$$

(d) *Resource Constraints* : These constraints make sure that no control step contains more than  $M_{k,v}$  operations of type k operating at voltage v. These can be enforced as,  $\forall v$  and  $\forall l, 1 \leq l \leq N$ ,

$$\sum_{i \in F_{k,v}} \sum_{l} y_{i,v,l,(l+L_{i,v}-1)} \le M_{k,v} \tag{20}$$

(e) *Peak Power Constraints* : These constraints ensure that the maximum power consumption of the DFG does not exceed  $P_p$  for any control step. These constraints are enforced as follows,  $\forall l$ ,  $1 \leq l \leq N$ 

$$\sum_{i \in F_{k,v}} \sum_{v} y_{i,v,l,(l+L_{i,v}-1)} * P(i,v,f_{clk}) \le P_p \tag{21}$$
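The effect of multicycling on the per-step power profile can be sketched as follows. This is an illustration only: it assumes that an operation draws its power $P(i,v,f_{clk})$ in every control step it occupies (steps $l$ to $l+L_{i,v}-1$), and the function name, operation names, latencies and power values are hypothetical.

```python
from typing import Dict, List, Tuple


def multicycle_step_power(
    schedule: Dict[str, Tuple[float, int]],  # op -> (voltage, start step l)
    latency: Dict[float, int],               # voltage -> L_{i,v} in clock cycles
    power: Dict[float, float],               # voltage -> P(i, v, f_clk) in mW
    n_steps: int,
) -> List[float]:
    """Per-step power when an operation occupies steps l .. l + L - 1."""
    steps = [0.0] * n_steps
    for volt, start in schedule.values():
        end = start + latency[volt] - 1      # last occupied step m
        if end > n_steps:
            raise ValueError("operation runs past the last control step")
        for c in range(start, end + 1):
            steps[c - 1] += power[volt]      # control steps are 1-based
    return steps


# Hypothetical values: the low-voltage unit needs two clock cycles.
latency = {3.3: 1, 2.4: 2}
power = {3.3: 8.0, 2.4: 3.0}
schedule = {"mul1": (2.4, 1), "mul2": (3.3, 2), "add1": (3.3, 3)}

per_step = multicycle_step_power(schedule, latency, power, n_steps=3)
p_peak = max(per_step)                   # left-hand side bound of Eqn. 21
p_avg = sum(per_step) / len(per_step)
```

Here `mul1` runs at the lower voltage and therefore spans steps 1 and 2, overlapping with `mul2` in step 2, which is where the peak occurs.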

## 5. ILP-BASED SCHEDULER

In this section, we discuss the solutions of the ILP formulations obtained in the previous sections. We use the same target architecture and characterised datapath components as in [10]. In this architecture, level converters are used when a low-voltage functional unit drives a high-voltage functional unit [4]. Peak and average power consumption of the DFG are minimized by the ILP based scheduler outlined in Fig. 1. The first step is to determine the as soon as possible (ASAP) time stamp of each operation. The second step is the determination of the as late as possible (ALAP) time stamp of each vertex of the DFG. The ASAP time stamp is the start time and the ALAP time stamp is the finish time of each operation. These two times define the mobility of an operation, and the operation must be scheduled within this mobility range. This mobility graph needs to be modified for the multicycling scheme. Then the scheduler constructs the ILP formulations based on the models described in Sections 3 and 4. At this point, the operating frequency of a functional unit is assumed to be the inverse of its operational delay, determined using the delay model given in [11]. After the ILP formulation is solved, the scheduled DFG is obtained. The scheduler decides the cycle frequencies based on the formulas given in [11]. Finally, the power consumption of the scheduled DFG is estimated.

Step 1: Find ASAP schedule of the UDFG.
Step 2: Find ALAP schedule of the UDFG.
Step 3: Determine the mobility graph of each node.
Step 4: Modify the mobility graph for multicycling.
Step 5: Construct the ILP formulations.
Step 6: Solve the ILP formulations using LP-Solve.
Step 7: Find the scheduled DFG.
Step 8: Determine the cycle frequencies for DFC scheme.
Step 9: Estimate the power consumptions of the DFG.

Figure 1: ILP-Based Scheduler
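Steps 1 to 3 of Fig. 1 can be sketched as below. The sketch assumes unit-latency operations and a DFG given as a successor map; the function name `asap_alap` and the five-node example graph are hypothetical and are not the DFG of Fig. 2.

```python
from typing import Dict, List, Tuple


def asap_alap(succs: Dict[str, List[str]], n_steps: int) -> Dict[str, Tuple[int, int]]:
    """Return op -> (S_i, E_i) for a DAG whose operations take one step each."""
    preds: Dict[str, List[str]] = {n: [] for n in succs}
    for u, vs in succs.items():
        for v in vs:
            preds[v].append(u)

    # Topological order (Kahn's algorithm).
    indeg = {n: len(preds[n]) for n in succs}
    order: List[str] = []
    ready = [n for n in succs if indeg[n] == 0]
    while ready:
        u = ready.pop()
        order.append(u)
        for v in succs[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)

    # ASAP: one step after the latest predecessor (step 1 if none).
    asap: Dict[str, int] = {}
    for u in order:
        asap[u] = 1 + max((asap[p] for p in preds[u]), default=0)

    # ALAP: one step before the earliest successor (last step if none).
    alap: Dict[str, int] = {}
    for u in reversed(order):
        alap[u] = min((alap[s] for s in succs[u]), default=n_steps + 1) - 1

    return {u: (asap[u], alap[u]) for u in succs}


# Hypothetical DFG: a -> c -> d, b -> d, and an independent operation e.
dfg = {"a": ["c"], "b": ["d"], "c": ["d"], "d": [], "e": []}
mobility = asap_alap(dfg, n_steps=3)
```

Operations on the critical path (`a`, `c`, `d`) have $S_i = E_i$ and cannot move, while `b` and `e` have slack that the ILP can exploit to flatten the power profile.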

### 5.1 Scheduler using multiple supply voltages and dynamic frequency clocking

The solution of the ILP formulations for multiple supply voltages and dynamic frequency clocking is illustrated using the DFG shown in Fig. 2. The ASAP schedule is shown in Fig. 2(a) and the ALAP schedule is shown in Fig. 2(b). From the ASAP and ALAP schedules, the mobility graph shown in Fig. 2(c) is determined. Using this mobility graph, the ILP formulations are constructed. The ILP formulations are solved using LP-solve and, based on the results, the scheduled DFG shown in Fig. 2(d) is obtained.
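Once a DFC schedule is available, a frequency has to be chosen for each control step (Step 8 of Fig. 1). The formulas actually used are those of [11] and are not reproduced here; the sketch below assumes one plausible rule, namely that each step is clocked at the inverse of the delay of its slowest active functional unit. The function name and all delay values are hypothetical.

```python
from typing import Dict, List, Tuple

Unit = Tuple[str, float]  # (unit type, supply voltage in volts)


def cycle_frequencies(
    schedule: List[List[Unit]],
    delay_ns: Dict[Unit, float],
) -> List[float]:
    """Return f_c in MHz per control step, set by the slowest active unit.

    Assumption: a step must be long enough for its slowest unit to finish.
    """
    freqs = []
    for step in schedule:
        slowest = max(delay_ns[u] for u in step)
        freqs.append(1e3 / slowest)  # a period of x ns corresponds to 1e3/x MHz
    return freqs


# Hypothetical delays: units slow down at the lower supply voltage.
delay_ns = {
    ("mul", 3.3): 50.0, ("mul", 2.4): 100.0,
    ("alu", 3.3): 25.0, ("alu", 2.4): 50.0,
}
schedule = [
    [("mul", 2.4), ("alu", 3.3)],   # step 1
    [("alu", 3.3)],                 # step 2
    [("mul", 3.3), ("alu", 2.4)],   # step 3
]
freqs = cycle_frequencies(schedule, delay_ns)
total_delay_ns = sum(1e3 / f for f in freqs)   # Eqn. 6, expressed in ns
```

With these numbers the three steps run at different frequencies, so a step that contains only fast units is not stretched to the length of the slowest step in the schedule.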

### 5.2 Scheduler using multiple supply voltages and multicycling

The solution for the ILP formulation for multiple supply voltages and multicycling is illustrated using the DFG shown in Fig. 3. The ASAP schedule is shown in Fig. 2(a) and the ALAP schedule is shown in Fig. 2(b). From the ASAP and ALAP schedules the mobility graph shown in Fig. 3(a) is obtained. This mobility graph is different from that shown in Fig. 2(c). The mobility graph considers the multicycle operations in the case of Fig. 3(a). Two operating voltage levels are assumed in Fig. 3(a). The multipliers take two clock cycles when operated at low voltage level. For the characterised cells used in our experiment [10], the operating clock frequency,  $f_{clk}$ , is 9MHz. The ILP formulations are obtained using this mobility graph. The ILP formulations are solved using LP-solve and based on the results the scheduled DFG shown in Fig. 3(b) is obtained.

Figure 2: Example DFG for resource constraint RC3; using multiple supply voltages and dynamic frequency clocking

## 6. EXPERIMENTAL RESULTS

The ILP-based schedulers for both the multiple supply voltages and dynamic frequency clocking scheme and the multiple supply voltages and multicycling scheme were tested with five high-level synthesis benchmark circuits : (1) Example circuit (EXP), (2) FIR filter, (3) IIR filter, (4) HAL differential equation solver and (5) Auto-Regressive filter (ARF). The following notations are used to express results :

 $P_{pS}$ : the peak power consumption (in mW) for single supply voltage and single frequency operation

 $P_{pD}$ : the peak power consumption (in mW) for multiple supply voltages and dynamic frequency operation

 $P_{pM}$ : the peak power consumption (in mW) for multiple supply voltages and multicycle operation

 $P_{aS}$ : the average power consumption (in mW) for single supply voltage and single frequency operation

 $P_{aD}$ : the average power consumption (in mW) for multiple supply voltages and dynamic frequency operation

 $P_{aM}$ : the average power consumption (in mW) for multiple supply voltages and multicycle operation

 $T_S$ : the critical path delay for single supply voltage and single frequency operation

 $T_D$ : the critical path delay for multiple supply voltages and dynamic frequency operation

 $T_M$ : the critical path delay for multiple supply voltages and multicycle operation



Figure 3: Example DFG for resource constraint RC3; using multiple supply voltages and multicycling. (a) Mobility graph. (b) Final schedule.

Table 1: Resource constraints used for our experiment

| Multipliers (2.4 V) | Multipliers (3.3 V) | ALUs (2.4 V) | ALUs (3.3 V) | Resource constraint label |
|---------------------|---------------------|--------------|--------------|---------------------------|
| 2                   | 1                   | 1            | 1            | RC1                       |
| 3                   | 0                   | 1            | 1            | RC2                       |
| 2                   | 0                   | 0            | 2            | RC3                       |
| 1                   | 1                   | 0            | 1            | RC4                       |

 $PDP_{S} = P_{aS} * T_{S}$ : the power delay product (in nJ) for single supply voltage and single frequency operation

 $PDP_D = P_{aD} * T_D$ : the power delay product (in nJ) for multiple supply voltages and dynamic frequency clocking operation

 $PDP_M = P_{aM} * T_M$ : the power delay product (in nJ) for multiple supply voltages and multicycle operation

 $\Delta P_{pD} = \frac{(P_{pS} - P_{pD})}{P_{pS}} * 100$ : the percentage peak power reduction using the multiple supply voltages and dynamic frequency scheme

 $\Delta P_{pM} = \frac{(P_{pS} - P_{pM})}{P_{pS}} * 100$ : the percentage peak power reduction using the multiple supply voltages and multicycle scheme

 $\Delta P_{aD} = \frac{(P_{aS} - P_{aD})}{P_{aS}} * 100$ : the percentage average power reduction using the multiple supply voltages and dynamic frequency scheme

 $\Delta P_{aM} = \frac{(P_{aS} - P_{aM})}{P_{aS}} * 100$ : the percentage average power reduction using the multiple supply voltages and multicycle scheme

 $\Delta PDP_D = \frac{(PDP_S - PDP_D)}{PDP_S} * 100$ : the percentage PDP reduction using the multiple supply voltages and dynamic frequency scheme

 $\Delta PDP_M = \frac{(PDP_S - PDP_M)}{PDP_S} * 100$ : the percentage PDP reduction using the multiple supply voltages and multicycle scheme.

The schedulers were tested using the different sets of resource constraints shown in Table 1 for each benchmark circuit. The experimental results for the benchmark circuits are reported in Table 2 for both the dynamic frequency clocking and multicycling schemes. The power estimation includes the power consumption of the overheads, such as level converters (needed for the multiple supply voltages scheme). It is assumed that each resource has equal switching activity  $(\alpha_{i,c})$ . The results for two supply voltages and a switching activity of 0.5 are reported. The table also summarizes the average reductions for the different benchmarks, averaged over all resource constraints. It is evident from the table that the reductions using combined multiple supply voltages and dynamic frequency clocking are appreciable. The power reductions for the proposed scheduling scheme are listed along with those of other scheduling algorithms dealing with peak power reduction in Table 3. The results are tabulated to present a general idea of relative performance and not to provide an exact comparison.
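As a quick check of how the percentage columns are derived, the snippet below recomputes three entries of the EXP benchmark under RC1 from the corresponding absolute values reported in Table 2. The helper name `percent_reduction` is introduced here for illustration only.

```python
def percent_reduction(baseline: float, value: float) -> float:
    """Percentage reduction of `value` relative to `baseline`."""
    return (baseline - value) / baseline * 100.0


# EXP benchmark, resource constraint RC1 (values taken from Table 2).
dp_pd = percent_reduction(17.28, 4.56)   # peak power, DFC scheme (mW)
dp_ad = percent_reduction(8.86, 2.41)    # average power, DFC scheme (mW)
dpdp_d = percent_reduction(2.95, 1.33)   # power delay product, DFC scheme (nJ)

print(round(dp_pd, 1), round(dp_ad, 1), round(dpdp_d, 1))
```

Rounded to one decimal place, these agree with the tabulated reductions for that row.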

Table 2: Peak power, average power and PDP estimates for benchmarks using scheduling schemes (peak and average power in mW, PDP in nJ, reductions in %)

| Benchmark | RC | $P_{pS}$ | $P_{pD}$ | $\Delta P_{pD}$ | $P_{pM}$ | $\Delta P_{pM}$ | $P_{aS}$ | $P_{aD}$ | $\Delta P_{aD}$ | $P_{aM}$ | $\Delta P_{aM}$ | $PDP_S$ | $PDP_D$ | $\Delta PDP_D$ | $PDP_M$ | $\Delta PDP_M$ |
|-----------|----|-------|------|------|-------|------|-------|------|------|------|------|------|------|------|------|-----|
| (1) EXP | 1 | 17.28 | 4.56 | 73.6 | 8.76 | 49.3 | 8.86 | 2.41 | 72.8 | 6.57 | 25.8 | 2.95 | 1.33 | 54.9 | 2.92 | 0 |
| | 2 | 17.28 | 4.56 | 73.6 | 13.68 | 20.8 | 8.86 | 2.41 | 72.8 | 6.98 | 21.2 | 2.95 | 1.33 | 54.9 | 3.1 | 0 |
| | 3 | 17.28 | 4.56 | 73.6 | 9.12 | 47.2 | 8.86 | 2.61 | 70.5 | 5.58 | 37.0 | 2.95 | 1.30 | 55.9 | 3.1 | 0 |
| | 4 | 8.86 | 2.39 | 73.0 | 8.86 | 0 | 6.65 | 1.88 | 71.7 | 6.65 | 0 | 2.96 | 1.36 | 54.1 | 2.95 | 0 |
| Average values | | | | 73.5 | | 29.3 | | | 72.0 | | 21.0 | | | 55.0 | | 0 |
| (2) FIR | 1 | 17.28 | 4.56 | 73.6 | 8.76 | 49.3 | 8.82 | 2.34 | 73.5 | 7.28 | 17.5 | 4.9 | 2.34 | 52.5 | 4.85 | 0 |
| | 2 | 17.28 | 4.56 | 73.6 | 13.68 | 20.8 | 8.82 | 2.35 | 73.4 | 7.68 | 12.9 | 4.9 | 2.35 | 52.0 | 5.12 | 0 |
| | 3 | 17.28 | 4.56 | 73.6 | 13.68 | 20.8 | 8.82 | 2.44 | 72.3 | 6.64 | 24.7 | 4.9 | 2.30 | 53.0 | 5.12 | 0 |
| | 4 | 17.28 | 6.60 | 61.8 | 8.86 | 48.7 | 8.82 | 2.84 | 67.8 | 7.35 | 16.7 | 4.9 | 2.68 | 45.3 | 4.9 | 0 |
| Average values | | | | 70.7 | | 34.9 | | | 71.8 | | 18.0 | | | 50.7 | | 0 |
| (3) IIR | 1 | 25.92 | 8.88 | 65.7 | 17.76 | 31.5 | 11.03 | 3.49 | 68.4 | 8.95 | 18.9 | 4.9 | 2.32 | 52.7 | 4.97 | 0 |
| | 2 | 25.92 | 6.84 | 73.6 | 13.68 | 47.2 | 11.03 | 2.98 | 73.0 | 7.68 | 30.4 | 4.9 | 1.98 | 59.6 | 5.12 | 0 |
| | 3 | 17.28 | 4.56 | 73.6 | 9.12 | 47.2 | 8.82 | 2.45 | 72.2 | 5.24 | 40.6 | 4.9 | 2.0 | 59.2 | 4.66 | 4.9 |
| | 4 | 17.28 | 6.60 | 61.8 | 13.20 | 23.6 | 8.82 | 3.31 | 62.5 | 8.05 | 8.7 | 4.9 | 2.57 | 47.6 | 5.37 | 0 |
| Average values | | | | 68.7 | | 37.4 | | | 69.0 | | 24.7 | | | 54.8 | | 1.0 |
| (4) HAL | 1 | 17.51 | 4.62 | 74.7 | 13.32 | 23.9 | 13.25 | 3.55 | 73.2 | 8.82 | 33.4 | 5.89 | 2.76 | 53.1 | 5.88 | 0.2 |
| | 2 | 17.51 | 4.62 | 74.7 | 13.68 | 21.9 | 13.25 | 3.55 | 73.2 | 9.23 | 30.3 | 5.89 | 2.76 | 53.1 | 6.15 | 0 |
| | 3 | 17.51 | 4.67 | 73.3 | 9.34 | 46.7 | 13.25 | 3.73 | 71.8 | 7.98 | 39.8 | 5.89 | 2.69 | 54.3 | 6.20 | 0 |
| | 4 | 17.51 | 6.71 | 61.7 | 13.42 | 23.4 | 10.59 | 3.73 | 64.8 | 8.90 | 16.0 | 5.88 | 3.52 | 40.1 | 5.93 | 0 |
| Average values | | | | 71.1 | | 29.0 | | | 70.8 | | 29.9 | | | 50.2 | | 0.7 |
| (5) ARF | 1 | 8.86 | 2.34 | 73.6 | 8.64 | 2.5 | 4.50 | 1.20 | 73.3 | 3.40 | 24.4 | 5.00 | 2.00 | 60.0 | 4.85 | 3.0 |
| | 2 | 8.86 | 2.34 | 73.6 | 8.64 | 2.5 | 4.50 | 1.20 | 73.3 | 3.58 | 24.4 | 5.00 | 2.00 | 60.0 | 4.85 | 3.0 |
| | 3 | 8.86 | 2.39 | 73.0 | 8.76 | 1.1 | 4.50 | 1.40 | 68.9 | 3.65 | 18.9 | 5.00 | 1.90 | 62.0 | 5.0 | 0 |
| | 4 | 8.86 | 2.39 | 73.0 | 8.76 | 1.1 | 4.50 | 1.40 | 68.9 | 3.46 | 23.1 | 5.00 | 1.90 | 62.0 | 5.0 | 0 |
| Average values | | | | 73.3 | | 1.8 | | | 71.1 | | 22.7 | | | 61.0 | | 1.1 |
| Average over all benchmarks | | | | 71.5 | | 26.5 | | | 71.0 | | 23.3 | | | 54.3 | | 0.5 |

| Benchmark circuits | This work (DFC based) $\Delta P_p$ | This work (DFC based) $\Delta P_a$ | Shiue [15] $\Delta P_p$ | Shiue [15] $\Delta P_a$ | Martin [9] $\Delta P_p$ | Martin [9] $\Delta P_a$ | Raghunathan [14] $\Delta P_p$ | Raghunathan [14] $\Delta P_a$ | Mohanty [11] $\Delta P_p$ | Mohanty [11] $\Delta P_a$ |
|--------|----|----|----|----|----|----|----|----|----|----|
| EXP(1) | 73 | 72 | -  | -  | -  | -  | -  | -  | -  | -  |
| FIR(2) | 71 | 72 | 63 | NA | 40 | NA | 23 | 38 | 71 | 53 |
| IIR(3) | 69 | 69 | -  | -  | -  | -  | -  | -  | -  | -  |
| HAL(4) | 71 | 71 | 28 | NA | -  | -  | -  | -  | 73 | 70 |
| ARF(5) | 73 | 71 | 50 | NA | -  | -  | -  | -  | 68 | 67 |

Table 3: Power reduction for various scheduling schemes
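A like-for-like reading of Table 3 is easiest on the benchmarks that several schemes report in common. The following is a minimal sketch, not part of the original experiments, that averages the tabulated reductions over FIR, HAL and ARF, the three circuits for which this work, Shiue [15] and Mohanty [11] all give data. The names `TABLE3`, `mean_reduction` and `summarize` are hypothetical helpers introduced only for this illustration.

```python
from statistics import mean

# Percentage reductions copied from Table 3, restricted to the benchmarks
# that all three schemes report (FIR, HAL, ARF).
# Each entry is (peak power reduction, average power reduction);
# None marks a value the table lists as NA.
TABLE3 = {
    "DFC (this work)": {"FIR": (71, 72), "HAL": (71, 71), "ARF": (73, 71)},
    "Shiue [15]": {"FIR": (63, None), "HAL": (28, None), "ARF": (50, None)},
    "Mohanty [11]": {"FIR": (71, 53), "HAL": (73, 70), "ARF": (68, 67)},
}


def mean_reduction(scheme, index):
    """Mean of the reported values for one scheme (index 0 = peak, 1 = average)."""
    values = [row[index] for row in TABLE3[scheme].values() if row[index] is not None]
    return round(mean(values), 1) if values else None


def summarize():
    """Map each scheme to (mean peak reduction, mean average reduction)."""
    return {scheme: (mean_reduction(scheme, 0), mean_reduction(scheme, 1))
            for scheme in TABLE3}


if __name__ == "__main__":
    for scheme, (peak, avg) in summarize().items():
        print(f"{scheme}: peak {peak}%, average {avg}%")
```

On these three circuits the means come out to 71.7% (peak) and 71.3% (average) for the DFC based scheme, 70.7% and 63.3% for Mohanty [11], and 47% (peak only) for Shiue [15]. These are simple arithmetic means of the table entries and carry no weighting by circuit size.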

# 7. CONCLUSIONS

Reducing both the peak power and the average power consumption of a CMOS circuit is important. This paper addresses simultaneous peak and average power reduction at the behavioral level using low power datapath scheduling techniques. Two datapath scheduling schemes were introduced: one combining multiple supply voltages with dynamic frequency clocking, and another combining multiple supply voltages with multicycling. ILP based optimization techniques were used for both modes of datapath operation. In both cases, the proposed scheduling algorithms achieved significant peak and average power reductions over the single supply voltage and single frequency scenario. With combined multiple supply voltages (3.3V, 2.4V) and dynamic frequency clocking, the reductions averaged 72% in peak power, 71% in average power and 54% in power delay product. The results indicate that dynamic frequency clocking is a better scheme than multicycling for power minimization.

# 8. REFERENCES

- [1] L. Benini, E. Macii, M. Poncino, and G. De Micheli. Telescopic units: A new paradigm for performance optimization of VLSI design. *IEEE Trans. on CAD*, 17(3):220–232, Mar 1998.
- [2] I. Brynjolfson and Z. Zilic. Dynamic clock management for low power applications in FPGAs. In *Proc. of IEEE Custom Integrated Circuits Conference*, pages 139–142, 2000.
- [3] J. M. Chang and M. Pedram. Energy minimization using multiple supply voltages. *IEEE Trans. on VLSI Systems*, 5(4):436–443, Dec 1997.
- [4] M. Johnson and K. Roy. Datapath scheduling with multiple supply voltages and level converters. *ACM Trans. on Design Automation of Electronic Systems*, 2(3):227–248, July 1997.
- [5] J. M. Kim and S. I. Chae. New MPEG2 decoder architecture using frequency scaling. In *Proc. of ISCAS'96*, pages 253–256, 1996.
- [6] Y. R. Lin, C. T. Hwang, and A. C. H. Wu. Scheduling techniques for variable voltage low power design. *ACM Trans. on Design Automation of Electronic Systems*, 2(2):81–97, Apr 1997.
- [7] R. S. Martin and J. P. Knight. PASSOS: A different approach for assignment and scheduling for power, area and speed optimization in high-level synthesis. In *Proc. of 37th Midwest Symposium on Circuits and Systems (Vol. 1)*, pages 339–342, 1994.
- [8] R. S. Martin and J. P. Knight. Optimizing power in ASIC behavioral synthesis. *IEEE Design & Test of Computers*, 13(2):58–70, Summer 1996.
- [9] R. S. Martin and J. P. Knight. Using SPICE and behavioral synthesis tools to optimize ASICs' peak power consumption. In *Proc. of 38th Midwest Symposium on Circuits and Systems*, pages 1209–1212, 1996.
- [10] S. P. Mohanty and N. Ranganathan. Energy efficient scheduling for datapath synthesis. In *Proc. of Intl. Conf. on VLSI Design*, pages 446–451, Jan 2003.
- [11] S. P. Mohanty and N. Ranganathan. A framework for energy and transient power reduction during behavioral synthesis. In *Proc. of Intl. Conf. on VLSI Design*, pages 539–545, Jan 2003.
- [12] S. P. Mohanty, N. Ranganathan, and S. K. Chappidi. Peak power minimization through datapath scheduling. In *Proc. of IEEE CS Annual Symposium on VLSI (ISVLSI 2003)*, pages 121–126, Feb 2003.
- [13] S. Park and K. Choi. Performance-driven high-level synthesis with bit-level chaining and clock selection. *IEEE Trans. on CAD of Integrated Circuits and Systems*, 20(2):199–212, Feb 2001.
- [14] V. Raghunathan, S. Ravi, A. Raghunathan, and G. Lakshminarayana. Transient power management through high level synthesis. In *Proc. of ICCAD*, pages 545–552, 2001.
- [15] W. T. Shiue. High level synthesis for peak power minimization using ILP. In *Proc. of IEEE International Conference on Application Specific Systems, Architectures and Processors*, pages 103–112, 2000.
- [16] W. T. Shiue and C. Chakrabarti. ILP based scheme for low power scheduling and resource binding. In *Proc. of ISCAS*, pages III.279–III.282, 2000.
- [17] W. T. Shiue, J. Denison, and A. Horak. A novel scheduler for low power real time systems. In *Proc. of 43rd Midwest Symposium on Circuits and Systems*, pages 312–315, Aug 2000.
- [18] D. Singh, J. M. Rabaey, M. Pedram, F. Catthoor, S. Rajgopal, N. Sehgal, and T. J. Mozdzen. Power conscious CAD tools and methodologies: A perspective. *Proceedings of the IEEE*, 83(4):570–594, Apr 1995.
- [19] D. Sylvester and H. Kaul. Power-driven challenges in nanometer design. *IEEE Design & Test of Computers*, 13(6):12–21, Nov-Dec 2001.
- [20] H. S. Yun and J. Kim. Power-aware modulo scheduling for high-performance VLIW processors. In *Proc. of the ISLPED*, pages 40–45, 2001.