# A Framework for Energy and Transient Power Reduction during Behavioral Synthesis

#### Saraju P. Mohanty and N. Ranganathan

Dept. of CSE, University of South Florida, Tampa, FL 33620, USA <u>smohanty, ranganat@csee.usf.edu</u>



# **Outline of the Talk**

- Different power parameters
- Related work
- Cycle power profile function
- Heuristic to minimize CPF
- Experimental results
- Conclusions



#### **Different Power and Energy Parameters**

- •Peak power
- •Cycle difference power
- •Peak power differential
- •Average Power
- •Total Energy



# **Peak Power**

The peak power is the maximum power consumption of the IC at any instance during its execution.

For a DFG, let  $P_c$  denote power consumption in any control step c, then we define peak (cycle) power as :

 $P_{peak} = maximum(P_c)$ , over all control steps



#### **Average Power and Total Energy**

Average power (P) = Average of (cycle power consumption i.e.  $P_c$ ) over all control steps

Total energy = Energy consumption for the DFG for all operations and control steps



# Cycle Difference Power and Peak Power Differential

Let,  $DP_c = absolute (P - P_c)$  denote the **cycle difference power**. This characterizes the power fluctuation for each cycle of DFG.

**Peak power differential** is defined as :  $DP_{peak} = maximum (DP_{c})$ 



#### **Transient Power ?**

Both the peak power and peak power differential drive the transient characteristic of a CMOS circuit.



# **Related Work**

# (Peak power reduction at behavioral level)

- Martin & Knight [7], 1996 simultaneous assignment and scheduling
- Raghunathan and et al. [13], 2001 also address peak power differential
- Shiue [15], 2000 ILP formulation to reduce peak power under latency constraints
- And many other works



#### Related Work: Martin and Knight [7]

•Peak power reduction is achieved through simultaneous assignment and scheduling

• Use minimization at one level of abstraction to achieve optimization at other level (specifically, simultaneous use of SPICE and behavioral synthesis tool)

- Genetic algorithm has been used for optimization
- •Peak power reduction : 40-60%,
- Average power penalty : 0.3-2.7%



#### **Related Work :** Raghunathan [13]

- •Simultaneous minimization of peak power and peak power differential
- Use data-monitor operations
- •Peak power reduction : 17-32%
- •Peak power differential reduction : 25-58%
- •Judicious use of transient power metric needed for minimization of area and performance overhead



#### Related Work : Shiue [15]

- •ILP based scheduling and modified force-directed scheduling
- •Peak power minimization under latency constraints
- •Single supply voltage, multicycling and pipelining
- •Peak power reduction : 0-75 %



# We Aim At :

- **Simultaneous reduction of :**
- •Peak power
- •Cycle difference power
- •Peak power differential
- •Average power
- •Total energy



# **Our Approach**

Define a new parameter (CPF) that captures all power parameters

Minimize the new parameter in using multiple supply voltage and dynamic frequency



# **Normalized Average Power (P<sub>norm</sub>)**

Normalized average power (P<sub>norm</sub>)

= Average of cycle power consumption over all control steps / maximum power consumption in any control step

= Average ( $P_c$ ) / maximum( $P_c$ )

 $= P / P_{peak}$ 



# **Normalized Average Cycle Difference Power (DP<sub>norm</sub>)**

Normalized average cycle difference power (DP<sub>norm</sub>)

= average cycle difference power over all control steps / maximum cycle difference power for any control step

- = Average  $(DP_c) / Maximum (DP_c)$
- = DP / DP<sub>peak</sub>



# Normalized cycle power profile function (CPF)

#### Normalized cycle power profile function is defined as :

 $CPF_{norm} = PF * P_{norm} + (1-PF) * DP_{norm}$ 

Where,  $PF = power profile factor used to make CPF_{norm}$ either cycle power dominating (average and peak) or difference power dominating (cycle difference and peak differential)

 $P_{norm}$  = normalized average power

DP<sub>norm</sub> = normalized average cycle difference power



# Normalized CPF .....

Is a function of five different parameters :

- Average power power (P)
- Peak power (P<sub>peak</sub>)
- Average cycle difference power (DP)
- Peak differential power (DP<sub>peak</sub>)
- Power profile factor (PF)



# **Each Power is Determined by :**

- $\alpha_{i,c}$  = switching activity of resource i active in control step c
- C<sub>i,c</sub> = load capacitance of resource i active in control step c
- V<sub>i,c</sub> = operating voltage of resource i active in control step c
- $f_c =$  frequency of control step c



# **CPF** Minimization

Minimization of the normalized cycle power profile function using multiple supply voltages and dynamic clocking frequency can minimize all the powers and energy parameters.



# **CPF-Scheduler**

**Input:** Unscheduled data flow graph, resource constraint, number of allowable voltage levels, number of allowable frequencies, load capacitance of each resource, delay of each functional unit at different voltage levels, operating frequencies and voltages

**Output:** Scheduled data flow graph, base frequency, cycle frequency index, operating voltage for each operation



# **CPF-Scheduling Algorithm Flow**

- **Step 1** : Get the ASAP and ALAP schedule
- **Step 2** : Modify the ASAP and ALAP schedules using the number of resources without operating voltage constraint
- Step 3 : Total No. of control steps = Maximum (ASAP steps, ALAP steps)
- **Step 4** : Find the vertices having zero and non-zero mobility
- **Step 5** : Use the CPF-Scheduler-Heuristic to assign time stamp, voltage level and cycle frequency such that  $CPF_{norm}$  is minimum
- **Step 6** : Find cycle frequency index for each cycle



#### **CPF-Scheduler Heuristic**

(01) initialize CurrentSchedule as ASAPSchedule ;

(02) while( all mobile vertices are not time stamped ) do

(03) for the CurrentSchedule

- (04) if ( $v_i$  is a multiplication) then find the lowest available voltage for multipliers;
- (05) if  $(v_i \text{ is add/sub})$  then find the highest available operating voltage for ALUs;
- (06) find CurrentCPF<sub>norm</sub> for CurrentSchedule; Maximum =  $-\infty$ ;

(07) for each mobile vertex  $v_i$ 

- (08)  $c_1 = CurrentSchedule[v_i]; c_2 = ALAPSchedule[v_i];$
- (09) for  $c = c_1$  to  $c_2$  in steps of 1

(10) find a TempSchedule by adjusting CurrentSchedule in which  $v_i$  is scheduled in c;

- (11) find next higher operating voltage for multiplication vertex (next lower for ALU operation) for the TempSchedule ;
- (12) find TempCPF<sub>norm</sub> for TempSchedule ; DiffCPF = CurrentCPF<sub>norm</sub>-TempCPF<sub>norm</sub>
- (13) if ( DiffCPF > Maximum ) then Maximum = DiffCPF ; CurrentVertex =  $v_i$ ; CurrentCycle = c ; CurrentVoltage = Operating voltage of  $v_i$
- (14) adjust CurrentSchedule to accommodate  $v_i$  in c operating at voltage assigned above ;



# **CPF-Scheduler Heuristic : Explanations**

- The heuristic is used to find proper time stamp, operating voltage for mobile vertices such that the  $CPF_{norm}$  is minimum for whole DFG.
- Initially assumes the modified ASAP schedule (with relaxed voltage resource constrained) as the current schedule.
- The Current $CPF_{norm}$  value for the current schedule is calculated.
- The heuristic finds  $CPF_{norm}$  values (Temp $CPF_{norm}$ ) for each allowable control step of each mobile vertices and for each available operating voltages.
- The heuristic fixes the time step, operating voltage and hence cycle frequency for which  $CPF_{norm}$  is minimum.



# **Experimental Results : Resource Constraints Used**

| Multipliers |      | ALUs |      | Serial |
|-------------|------|------|------|--------|
| 3.3V        | 5.0V | 3.3V | 5.0V | No     |
| 1           | 0    | 0    | 1    | 1      |
| 2           | 0    | 0    | 1    | 2      |
| 2           | 0    | 0    | 2    | 3      |
| 2           | 0    | 1    | 1    | 4      |
| 1           | 1    | 1    | 1    | 5      |



#### **Notations Used to Describe the Results**

- $\Delta P_p = (P_{pS} P_{pD})/P_{pS} = peak power reduction$
- $\Delta DP = ((P_{pS}-P_{mS}) (P_{pD}-P_{pD})) / (P_{pS}-P_{mS}) =$ peak differential reduction
- $\Delta P = (P_S P_D)/P_S = average power reduction$
- $\Delta E = (E_S E_D)/E_S = reduction in total energy$

#### Where,

subscript S : single voltage and single freq operation
subscript D : multiple voltage and dynamic freq
Subscript m : minimum power



#### **Percentage Reductions for Different Benchmarks**

|     | RCs | $\Delta P_{p}$ | ΔDP | $\Delta P$ | ΔΕ |
|-----|-----|----------------|-----|------------|----|
| ARF | 1   | 63             | 68  | 71         | 47 |
| (1) | 3   | 70             | 72  | 69         | 47 |
| BPF | 1   | 73             | 79  | 66         | 46 |
| (2) | 3   | 73             | 87  | 71         | 46 |
| DCT | 1   | 63             | 68  | 50         | 41 |
| (3) | 3   | 61             | 72  | 67         | 41 |
| EWF | 1   | 73             | 79  | 41         | 44 |
| (4) | 3   | 69             | 72  | 55         | 44 |
| FIR | 1   | 70             | 75  | 58         | 46 |
| (5) | 3   | 77             | 84  | 54         | 46 |
| HAL | 1   | 73             | 94  | 73         | 51 |
| (6) | 3   | 76             | 97  | 70         | 51 |



#### **Average Reductions for Benchmarks**







#### Power Profiles for Benchmarks ( $\alpha = 0.5$ , PF = 0.5, RC1)





#### **Power Profiles ...... (\alpha = 0.5, PF = 0.4, RC2)**





#### **Power Profiles ...... (\alpha = 0.3, PF = 0.8, RC3)**





#### **Power Profiles ...... (\alpha = 0.4, PF = 0.2, RC4)**





#### **Power Profiles ...... (\alpha = 0.4, PF = 0.7, RC5)**





#### **CPF Vs PF plot** ( $\alpha = 0.5$ , **RC3** and **RC4**)





# **Reductions Using Different Algorithms** (Only peak power reduction avg data given)

|     | CPF | Shiue[15] | Martin[7] | Raghunathan [13] |
|-----|-----|-----------|-----------|------------------|
| ARF | 68  | 50        | _         | -                |
| BPF | 71  | -         | -         | -                |
| DCT | 64  | 50        | 71        | 28               |
| EWF | 72  | 0         | -         | -                |
| FIR | 71  | 63        | 45        | 23               |
| HAL | 73  | 28        | -         | -                |



# Conclusions

- •This work is a unified framework for simultaneous power and energy reduction
- •The CPF parameter defined and used in this work facilitates such simultaneous reduction
- CPF-Scheduler algorithm developed that takes resources constraints, minimizes CPF
- •The average time penalty is estimated to be 40%
- •Future works needs to be done using better optimization technique



# Thank you

