A Combined DOE-ILP Based Power and Read Stability Optimization in Nano-CMOS SRAM

G. Thakral, S. P. Mohanty and D. Ghai, Dept. of Comp. Science & Engineering, University of North Texas, USA. Email: saraju.mohanty@unt.edu

Dhiraj K. Pradhan, Dept. of Computer Science, University of Bristol, UK. Email: pradhan@compsci.bristol.ac.uk

Acknowledgment: This research is supported in part by NSF award numbers CCF-0702361 and CNS-0854182.





## **Outline of the talk**

- Introduction
- Problem Statement: How to decrease power while maintaining the performance of the SRAM?
- Solutions: Assigning high/low V<sub>th</sub> to transistors
- Proposed Optimal SRAM Design Flows
- Experimental Results: Nominal and Monte Carlo
- Related Prior Research
- Conclusions and Future Research





#### Why Efficient SRAM Design?

[Figure: on-die cache size (MB) by technology node, 130 nm to 65 nm]

- The amount of on-die cache is increasing.
- Up to 60% of the die area is devoted to caches in typical processors and embedded applications.
- Caches contribute substantially to leakage and power density.



#### **Issues in Nano CMOS**







#### Nano-CMOS SRAM Design Challenges ...

#### In the nano-CMOS regime, the major issues are:

- Data stability and functionality
  - Non-destructive read
  - Successful write
  - Noise sensitivity
- Proper sizing of the transistors
  - To improve the write ability
  - To improve the read stability
  - To improve the data retention
- Minimum size of transistors to maximize the memory density.
- Minimum leakage for low-power design.
- Minimum read access time to improve the performance.



6-transistor SRAM





#### Nano-CMOS SRAM Design Challenges



- For proper read stability: N1 and N2 are sized wider than N3 and N4.
- For successful write: N3 and N4 are sized wider than P1 and P2.
- Minimum sized transistors do not provide good stability and functionality.
- SRAM cell ratio (β): ratio of driver transistor's W/L to access transistor's W/L.
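
Written as a formula, the cell ratio defined above looks as follows. Treating N1 as a driver transistor and N3 as an access transistor is an assumption inferred from the sizing rules on this slide, not something the slide states explicitly.

$$\beta = \frac{(W/L)_{\text{driver}}}{(W/L)_{\text{access}}} = \frac{(W/L)_{N1}}{(W/L)_{N3}}$$

Under that assumption, sizing N1 and N2 wider than N3 and N4 at equal channel length corresponds to $\beta > 1$.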





# Single-Ended 7-Transistor SRAM



#### **Highlights of this SRAM**:

Single-ended I/O latch-style 7-transistor SRAM.

Functions in the ultra-low-voltage regime, allowing subthreshold operation.

Better read stability and better write ability compared to the standard SRAM.

Improved tolerance to nanoscale process variation compared to the standard 6-transistor SRAM.



#### Source: Our publication in SOCC 2008



#### **Research Question**

# How can power dissipation be reduced while maintaining or enhancing the stability of the SRAM?





### **The Solution Explored in This Paper**

- To reduce power consumption, this research investigates a process-level technique called dual-V<sub>th</sub>.
- The key step is selecting the appropriate transistors for high-V<sub>th</sub> assignment so that the performance of the SRAM is not degraded.
- The SRAM is subjected to dual-V<sub>th</sub> assignment using novel combined Design of Experiments-Integer Linear Programming (DOE-ILP) algorithms.





## **Stability Analysis of SRAM: SNM**

- Static Noise Margin (SNM): the maximum DC noise voltage (V<sub>n</sub> in this case) that the SRAM can tolerate.









## **Currents in 7-Transistor SRAM: Write**







## **Currents in 7-Transistor SRAM: Read**









## **Combined DOE-ILP Approach**

- Design of Experiments (DOE) consists of purposeful changes to the inputs (factors) of a process in order to observe the corresponding changes in the outputs (responses).
- Integer linear programming (ILP) is a technique for optimizing a linear objective function over integer-valued variables, subject to linear equality and inequality constraints. ILP determines how to achieve the best outcome (such as maximum profit or lowest cost) in a given mathematical model whose requirements are expressed as linear relationships.
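
As a minimal illustration of the ILP definition above, the sketch below minimizes a linear objective over binary variables subject to linear inequality constraints. Exhaustive enumeration stands in for a real ILP solver here, which is workable only because the variable count is tiny. The function name `solve_binary_ilp` and the example coefficients are hypothetical and are not taken from the paper.

```python
from itertools import product


def solve_binary_ilp(cost, constraints):
    """Minimize sum(cost[i] * x[i]) over x in {0, 1}^n.

    constraints is a list of (coeffs, bound) pairs, each meaning
    sum(coeffs[i] * x[i]) <= bound. Returns (best_x, best_value),
    or (None, None) if no assignment is feasible.
    """
    best_x, best_value = None, None
    for x in product((0, 1), repeat=len(cost)):
        feasible = all(
            sum(a * xi for a, xi in zip(coeffs, x)) <= bound
            for coeffs, bound in constraints
        )
        if not feasible:
            continue
        value = sum(c * xi for c, xi in zip(cost, x))
        if best_value is None or value < best_value:
            best_x, best_value = x, value
    return best_x, best_value


# Hypothetical example: minimize -3*x0 - 2*x1 - 4*x2
# subject to x0 + x1 + x2 <= 2.
x_opt, f_opt = solve_binary_ilp([-3, -2, -4], [([1, 1, 1], 2)])
```

With seven binary variables, one per transistor, the same enumeration covers only 128 candidates, so a dedicated solver is a convenience rather than a necessity at this scale.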





#### **Combined DOE-ILP Approach: Solution 1**



1: Input: Baseline circuit, nominal/high-$V_{Th}$ models.

2: Output: Objective set $S_{OBJ} = [f_{PWR}, f_{SNM}]$ with transistors identified for high-$V_{Th}$ assignment.

3: Set up the experiment for the transistors of the SRAM cell using a 2-level Taguchi L-8 array, where the factors are the transistors and the responses are the average $P_{SRAM}$ and the read $SNM_{SRAM}$.

4: for each experiment 1:8 of the 2-level Taguchi L-8 array do

5: Perform simulation and record $P_{SRAM}$ and $SNM_{SRAM}$.

6: end for

7: Form predictive equations $\hat{f}_{PWR}$ for power and $\hat{f}_{SNM}$ for SNM.

8: Solve $\hat{f}_{PWR}$ using ILP. Solution set: $S_{PWR}$.

9: Solve $\hat{f}_{SNM}$ using ILP. Solution set: $S_{SNM}$.

10: Form $S_{OBJ} = S_{PWR} \bigcap S_{SNM}$.

11: Assign high $V_{Th}$ to transistors based on $S_{OBJ}$.

#### Algorithm 1
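
Steps 7 to 10 of Algorithm 1 can be sketched as follows. This is a sketch under stated assumptions, not the authors' implementation: each response is modeled as a mean plus per-transistor half-effects multiplied by a state of -1 (nominal V<sub>th</sub>) or +1 (high V<sub>th</sub>), the optimization is unconstrained, and exhaustive enumeration over the 128 assignments replaces the ILP solver. The half-effect values are invented for illustration and are not simulation results.

```python
from itertools import product


def predicted(mean, half_effects, states):
    """Linear DOE model: mean + sum(half_effect * state), state in {-1, +1}."""
    return mean + sum(h * s for h, s in zip(half_effects, states))


def high_vth_set(mean, half_effects, minimize):
    """Enumerate all +/-1 assignments and return the transistor numbers
    set to +1 (high V_Th) in the assignment that optimizes the response."""
    sign = 1 if minimize else -1
    best = min(
        product((-1, 1), repeat=len(half_effects)),
        key=lambda s: sign * predicted(mean, half_effects, s),
    )
    return {n + 1 for n, s in enumerate(best) if s == 1}


# Hypothetical half-effects for transistors 1..7 (not measured data).
pwr_half = [-30.0, -20.0, -5.0, -12.0, -8.0, -3.0, -10.0]  # nW
snm_half = [15.0, -4.0, 20.0, -6.0, 9.0, -2.0, -1.0]       # mV

s_pwr = high_vth_set(100.0, pwr_half, minimize=True)    # step 8
s_snm = high_vth_set(200.0, snm_half, minimize=False)   # step 9
s_obj = s_pwr & s_snm                                   # step 10
```

In this made-up case every transistor lowers power at high V<sub>th</sub>, so the intersection is decided entirely by the SNM solution. Any constraints in the paper's actual ILP formulation are not shown on the slides and are omitted here.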





#### **DOE Predictive Equations**

$$\hat{f} = \overline{f} + \sum_{n=1}^{7} \left( \frac{\Delta(n)}{2} \times x_n \right),$$

where:

- $x_n$ is the $V_{Th}$ state of the nth transistor;
- $\hat{f}$ is the predicted response of the SRAM cell (e.g., power, SNM);
- $\overline{f}$ is the average response over the experiments;
- $\left(\frac{\Delta(n)}{2}\right)$ is the half-effect of the nth transistor.
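
The predictive equation can be exercised with a small sketch. Assumptions, none of which are stated on the slides: the two $V_{Th}$ states are coded as -1 (nominal) and +1 (high), the L-8 array is built from three base columns and their XOR combinations, and the responses are synthetic values generated from a known additive model so the recovered half-effects can be checked. The function names are hypothetical.

```python
from itertools import product


def taguchi_l8():
    """Return the 8-run, 7-factor two-level orthogonal array, coded -1/+1."""
    rows = []
    for a, b, c in product((0, 1), repeat=3):
        bits = (a, b, a ^ b, c, a ^ c, b ^ c, a ^ b ^ c)
        rows.append(tuple(2 * bit - 1 for bit in bits))
    return rows


def half_effects(design, responses):
    """Half-effect of factor n: (mean at +1 minus mean at -1) / 2."""
    effects = []
    for n in range(len(design[0])):
        high = [y for row, y in zip(design, responses) if row[n] == 1]
        low = [y for row, y in zip(design, responses) if row[n] == -1]
        delta = sum(high) / len(high) - sum(low) / len(low)
        effects.append(delta / 2)
    return effects


def predict(mean, effects, x):
    """Evaluate f_hat = mean + sum(half_effect_n * x_n)."""
    return mean + sum(h * xn for h, xn in zip(effects, x))


# Synthetic responses from a known additive model (illustrative only).
design = taguchi_l8()
true_half = [-40.0, -5.0, -10.0, -2.0, -8.0, -1.0, -3.0]
responses = [120.0 + sum(h * x for h, x in zip(true_half, row)) for row in design]

mean = sum(responses) / len(responses)
effects = half_effects(design, responses)
```

Because the array columns are balanced and mutually orthogonal, eight runs are enough to recover all seven half-effects of an additive model, which is why the L-8 array fits a seven-transistor cell.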





## **Combined DOE-ILP Approach: Solution 2**



#### **Design Flow-2**

1: Input: Baseline circuit, nominal/high-$V_{Th}$ models.

2: Output: Objective set $S_{OBJ}^* = [f_{PWR}^*, f_{SNM}^*]$ with transistors identified for high-$V_{Th}$ assignment.

3: Set up the experiment for the transistors of the SRAM cell using a 2-level Taguchi L-8 array, where the factors are the transistors and the responses are the average $P_{SRAM}$ and the read $SNM_{SRAM}$.

4: for each experiment 1:8 of the 2-level Taguchi L-8 array do

5: Perform simulations and record $P_{SRAM}$ and $SNM_{SRAM}$.

6: end for

7: Form normalized predictive equations: $\hat{f}_{PWR}^*$ and $\hat{f}_{SNM}^*$.

8: Form $\hat{f}_{OBJ}^* = \left(\frac{\hat{f}_{PWR}^*}{\hat{f}_{SNM}^*}\right)$.

9: Solve $\hat{f}_{OBJ}^*$ using ILP. Solution set: $S_{OBJ}^*$.

10: Assign high $V_{Th}$ to transistors based on $S_{OBJ}^*$.

#### Algorithm 2
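
Steps 7 to 9 of Algorithm 2 might look like the sketch below. Several choices are assumptions because the slides do not specify them: normalization divides each predicted response by its value with all transistors at nominal $V_{Th}$, states are coded -1 (nominal) and +1 (high), and the power-to-SNM ratio is minimized by exhaustive enumeration rather than by an ILP solver. The function names and the two-transistor example values are hypothetical.

```python
from itertools import product


def predicted(mean, half_effects, states):
    """Linear DOE model: mean + sum(half_effect * state), state in {-1, +1}."""
    return mean + sum(h * s for h, s in zip(half_effects, states))


def flow2_assignment(pwr_model, snm_model):
    """pwr_model and snm_model are (mean, half_effects) pairs.

    Returns the set of transistor numbers assigned high V_Th by minimizing
    the ratio of normalized predicted power to normalized predicted SNM.
    """
    n = len(pwr_model[1])
    baseline = (-1,) * n  # assumed reference: every transistor at nominal V_Th
    pwr_base = predicted(*pwr_model, baseline)
    snm_base = predicted(*snm_model, baseline)

    def objective(states):
        pwr_norm = predicted(*pwr_model, states) / pwr_base
        snm_norm = predicted(*snm_model, states) / snm_base
        return pwr_norm / snm_norm

    best = min(product((-1, 1), repeat=n), key=objective)
    return {i + 1 for i, s in enumerate(best) if s == 1}


# Hypothetical two-transistor models (illustrative values only):
# transistor 2 saves little power at high V_Th but costs a lot of SNM.
chosen = flow2_assignment((100.0, [-10.0, -1.0]), (200.0, [10.0, -50.0]))
```

The single ratio objective lets a transistor be left at nominal $V_{Th}$ when its power saving does not justify its SNM penalty, which is the trade-off the second flow is designed to capture.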





# **Selection of Appropriate Transistors**



#### **Configuration for flow 1**



#### **Configuration for flow 2**





#### **Experimental Results: 4 Alternatives**

| Design alternative                  | Parameter          | Value    | Change         |
|-------------------------------------|--------------------|----------|----------------|
| Baseline                            | P<sub>SRAM</sub>   | 203.6 nW | -              |
|                                     | SNM<sub>SRAM</sub> | 170 mV   | -              |
| S<sub>PWR</sub>                     | P<sub>SRAM</sub>   | 26.34 nW | 87.1% decrease |
|                                     | SNM<sub>SRAM</sub> | 231.9 mV | 26.7% increase |
| S<sub>SNM</sub>                     | P<sub>SRAM</sub>   | 113.6 nW | 44.2% decrease |
|                                     | SNM<sub>SRAM</sub> | 303.3 mV | 43.9% increase |
| S<sub>OBJ</sub> (Approach 1)        | P<sub>SRAM</sub>   | 113.6 nW | 44.2% decrease |
|                                     | SNM<sub>SRAM</sub> | 303.3 mV | 43.9% increase |
| S<sub>OBJ</sub>\* (Approach 2)      | P<sub>SRAM</sub>   | 100.5 nW | 50.6% decrease |
|                                     | SNM<sub>SRAM</sub> | 303.3 mV | 43.9% increase |
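
The percentages in the table can be cross-checked from the raw values. In the sketch below, the power figures reproduce as decreases relative to the baseline, while the SNM figures reproduce when the increase is taken relative to the optimized value rather than the baseline. That reading is inferred from the numbers and is not stated on the slides; the helper names are hypothetical.

```python
def pct_decrease(baseline, value):
    """Decrease relative to the baseline value, in percent."""
    return 100.0 * (baseline - value) / baseline


def pct_increase_rel_new(baseline, value):
    """Increase relative to the new (optimized) value, in percent."""
    return 100.0 * (value - baseline) / value


def pct_increase_rel_base(baseline, value):
    """Increase relative to the baseline value, in percent."""
    return 100.0 * (value - baseline) / baseline


P_BASE_NW, SNM_BASE_MV = 203.6, 170.0

power_drop = pct_decrease(P_BASE_NW, 100.5)                  # Approach 2 power
snm_gain_new = pct_increase_rel_new(SNM_BASE_MV, 303.3)      # table convention
snm_gain_base = pct_increase_rel_base(SNM_BASE_MV, 303.3)    # baseline-relative
```

Measured against the 170 mV baseline instead, the same 303.3 mV SNM corresponds to a larger percentage, so the convention matters when comparing against other work.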





#### **Experimental Results: SNM**




#### **Experimental Results: Power/SNM**







#### **Monte-Carlo Distribution Results** . . .





## **Monte Carlo Simulation Results**

| Optimization                 | Parameter          | Mean      | Standard deviation |
|------------------------------|--------------------|-----------|--------------------|
| S<sub>PWR</sub>              | P<sub>SRAM</sub>   | 28.91 nW  | 8.26 nW            |
|                              | SNM<sub>SRAM</sub> | 180 mV    | 30 mV              |
| S<sub>SNM</sub>              | P<sub>SRAM</sub>   | 147.73 nW | 101.4 nW           |
|                              | SNM<sub>SRAM</sub> | 295 mV    | 28 mV              |
| S<sub>OBJ</sub>: Approach 1  | P<sub>SRAM</sub>   | 147.73 nW | 101.4 nW           |
|                              | SNM<sub>SRAM</sub> | 295 mV    | 28 mV              |
| S<sub>OBJ</sub>: Approach 2  | P<sub>SRAM</sub>   | 135.24 nW | 101.85 nW          |
|                              | SNM<sub>SRAM</sub> | 295 mV    | 28 mV              |





## Conclusions

- A methodology for simultaneous optimization of SRAM power and read stability is presented.
- A 45 nm single-ended seven-transistor SRAM was subjected to the proposed methodology (novel DOE-ILP algorithms), leading to a 50.6% power reduction and a 43.9% increase in read stability (read SNM).
- The effect of variation in twelve process parameters on the SRAM is evaluated, and the SRAM is found to be tolerant of process variation.
- An $8 \times 8$ array has been constructed using the optimized cells; its average power consumption is $4.5\,\mu$W.





#### **Future Research**

- Future research will involve SRAM-array optimization in which variability is accounted for in the design flow.
- Along with the states of the transistors, their sizes will also be considered, which will enlarge the solution space of the algorithms.
- In addition to power, performance and process variation, thermal effects will also be taken into account.





# Thank you!!