#### A DOE-ILP Assisted Conjugate-Gradient Approach for Power and Stability Optimization in High-κ/Metal-Gate SRAM

G. Thakral<sup>1</sup>, S. P. Mohanty<sup>1</sup>, D. Ghai<sup>1</sup>, and D. K. Pradhan<sup>2</sup>. <sup>1</sup>Department of Computer Science and Engineering, University of North Texas, USA. <sup>2</sup>Department of Computer Science, University of Bristol, UK. Email-ID: saraju.mohanty@unt.edu

Acknowledgments: This research is supported in part by NSF award CNS-0854182.








#### **Outline of the Talk**

- Introduction
- Novel Contributions of this paper
- Related Prior Research
- Proposed Flow for Optimal Design of High-κ Nano-CMOS SRAM
- Optimization Methodologies for 10-Transistor SRAM
- Optimized Results
- Conclusions







#### Introduction

Static random-access memory (SRAM) arrays are widely used as cache memory in microprocessors and application-specific integrated circuits, where they occupy a significant portion of the die area.

In January 2010, a leading-edge IC contained approximately 2 billion transistors.

- Process technology scaling and the push for better performance have enabled the embedding of millions of SRAM cells into contemporary integrated circuits (ICs).
- In an attempt to optimize the power consumption/performance/cost ratio of such chips, designers are faced with a dilemma.







#### **Motivation For SRAM Research**

- Millions of minimum-size SRAM cells are tightly packed on a chip.
- Such areas of the chip are especially sensitive to manufacturing defects and process variations.
- Stability is a growing design concern as process technology continues to scale down.
- Up to 70% of the die area is occupied by cache in order to meet performance and throughput requirements.







Figure labels (cache hierarchy): L1 Cache 16KB-I and 16KE; L2 Cache 256KB-I & D; L3 Cache 1.5-25 MB



Source: Process-Aware SRAM Design and Test. Authors: Andrei Pavlov & Manoj Sachdev







#### Design Challenges for SRAM: Why Low Power?



#### **Related Research: SRAM**



NanoSystem Design Laboratory, University of North Texas


#### **Contributions of This Paper**

- A novel design flow is proposed for power minimization and stability maximization in nano-CMOS SRAM circuits.
- A high-κ/metal-gate 32nm 10-transistor SRAM is subjected to this methodology to show its effectiveness.
- A novel DOE-ILP based approach is proposed for power minimization in an SRAM circuit.
- A conjugate-gradient based algorithm is proposed for SNM maximization of the SRAM.
- Process variation analysis is performed to study the robustness of the SRAM.
- An 8 × 8 array is constructed using optimal SRAM cells.







#### High-κ Based 10-Transistor SRAM



#### **Highlights of 10T SRAM**

• Two inverters are connected back to back in a closed loop to store one bit of information.

• Three transmission gates control the read, write, and hold states, replacing the access transistors used in the traditional 6-transistor SRAM.

• The transmission gates pass data to and from the cell node Q at full logic level.

• This provides full swing during write and read operations.







#### High-κ Nano-CMOS SRAM Models

- 1. For the design and simulation of the SRAM presented in this research, a 32nm high-κ/metal-gate CMOS Predictive Technology Model (PTM) is used.
- 2. For the PTM based on BSIM4/5, two methods are adopted:
  - The model parameter in the model file that denotes relative permittivity (EPSROX) is changed.
  - The equivalent oxide thickness (EOT) for the dielectric under consideration is calculated.
- 3. The total power of a nano-CMOS circuit is defined as:

$$P_{total} = P_{dynamic} + P_{subthreshold}$$

- The use of high-κ/metal-gate technology eliminates the gate leakage in the SRAM, so no gate-leakage term appears in the total power.
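The EOT step can be illustrated with the standard relation EOT = t_highκ × (κ_SiO2 / κ_highκ), where κ_SiO2 ≈ 3.9. The sketch below is only an illustration: the function names are hypothetical, and the dielectric constant (21) and physical thickness (5 nm) are assumed example values, not figures taken from the slides.

```python
K_SIO2 = 3.9  # relative permittivity of SiO2 (standard textbook value)


def equivalent_oxide_thickness(t_physical_nm: float, k_dielectric: float) -> float:
    """SiO2 thickness (nm) giving the same gate capacitance per unit area."""
    if t_physical_nm <= 0 or k_dielectric <= 0:
        raise ValueError("thickness and permittivity must be positive")
    return t_physical_nm * K_SIO2 / k_dielectric


def physical_thickness(eot_nm: float, k_dielectric: float) -> float:
    """Inverse relation: physical high-k thickness (nm) for a target EOT."""
    if eot_nm <= 0 or k_dielectric <= 0:
        raise ValueError("thickness and permittivity must be positive")
    return eot_nm * k_dielectric / K_SIO2


# Assumed example values: a 5 nm film with relative permittivity 21.
eot = equivalent_oxide_thickness(5.0, 21.0)
print(f"EOT = {eot:.3f} nm")
```

A thicker physical film with the same EOT is what allows the gate tunneling current to drop while the gate capacitance is preserved.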







#### **Operations of Proposed SRAM**



#### Proposed Flow for Optimal Design of High-κ Nano-CMOS SRAM

- The dual-V<sub>Th</sub> technique has a strong impact on the power dissipation and SNM of the SRAM.
- The dual-V<sub>Th</sub> assignment is performed using a DOE-ILP based approach.
- ILP is used to solve the linear equations obtained from the DOE, which yields the minimum-power SRAM cell configuration.









## Proposed Flow for Optimal Design of High-κ Nano-CMOS SRAM contd...

• However, the minimum-power configuration degrades the stability (SNM) of the SRAM.

• To improve the stability, the minimum-power SRAM configuration is subjected to a conjugate-gradient based optimization loop for SNM maximization.

• The parameter set for optimization includes the widths and lengths of the access, load, and driver transistors of the SRAM cell.

• The output of this optimization loop is a highly stable SRAM cell that consumes minimum power and offers better performance.







#### Optimization methodologies for 10-Transistor SRAM







#### DOE-ILP Approach for Minimum Power/Leakage Configuration

- An approach that combines DOE and ILP is used for power minimization of the SRAM.
- The Design of Experiments (DOE) step is implemented using a 2-level Taguchi L-12 array.
- The factors are the $V_{Th}$ states of the 10 transistors of the SRAM cell, and the response under consideration is the average power consumption of the cell ($f_{PSRAM}$).







#### **DOE-ILP Approach contd...**

Equation:

$$\frac{\Delta(n)}{2} = \frac{\mathrm{avg}(+1) - \mathrm{avg}(-1)}{2}$$

where:

- $\frac{\Delta(n)}{2}$ is the half effect of the nth transistor.
- avg(+1) is the average power when transistor n is in the high-$V_{Th}$ state.
- avg(-1) is the average power when transistor n is in the low-$V_{Th}$ state.

A full-factorial design would take 2<sup>10</sup> = 1024 runs, whereas the L-12 Taguchi array requires only 12 runs.
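The half-effect calculation can be sketched as follows. This is a minimal illustration, not the authors' tooling: the 12-run array is built with the standard Plackett-Burman cyclic generator (equivalent to the Taguchi L-12 up to row and column reordering), and the responses are synthetic numbers from a made-up linear power model standing in for circuit simulations. All names and coefficient values are hypothetical.

```python
from itertools import combinations

# Standard 12-run Plackett-Burman generator row (11 two-level columns).
GENERATOR = [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1]


def l12_design():
    """Return 12 rows x 11 columns of +1/-1 levels (cyclic shifts plus an all -1 row)."""
    n = len(GENERATOR)
    rows = [[GENERATOR[(j - i) % n] for j in range(n)] for i in range(n)]
    rows.append([-1] * n)
    return rows


def half_effects(design, responses, n_factors):
    """Half effect of factor n: (avg response at +1 minus avg response at -1) / 2."""
    effects = []
    for n in range(n_factors):
        high = [y for row, y in zip(design, responses) if row[n] == +1]
        low = [y for row, y in zip(design, responses) if row[n] == -1]
        effects.append((sum(high) / len(high) - sum(low) / len(low)) / 2)
    return effects


design = l12_design()

# Synthetic responses (nW) from an assumed linear model; a real flow would
# replace this with 12 circuit simulations, one per row of the design.
TRUE_MEAN = 500.0
TRUE_HALF = [-40.0, -35.0, -30.0, -25.0, -20.0, -15.0, -12.0, -10.0, -8.0, -5.0]
responses = [TRUE_MEAN + sum(b * x for b, x in zip(TRUE_HALF, row)) for row in design]

estimated = half_effects(design, responses, n_factors=10)
print([round(e, 3) for e in estimated])
```

Because the columns are balanced and mutually orthogonal, each half effect is estimated independently of the others, which is why 12 runs suffice for 10 two-level factors when only main effects are modelled.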







#### DOE-ILP Approach for Minimum Power/Leakage Configuration contd...
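One way to picture the ILP step: once the DOE gives a linear predictive equation for power in the ±1 $V_{Th}$ states, choosing the states is a small binary linear program. The sketch below is a hedged illustration only. The SNM constraint, every coefficient, and the function names are hypothetical (the slides do not state the constraints used), and exhaustive enumeration of the 2<sup>10</sup> assignments stands in for a real ILP solver, which is exact at this problem size.

```python
from itertools import product


def predicted(intercept, coeffs, x):
    """Linear predictive equation: intercept + sum(coeff_n * x_n), with x_n in {-1, +1}."""
    return intercept + sum(c * v for c, v in zip(coeffs, x))


def min_power_assignment(p0, p_coeffs, s0, s_coeffs, snm_min):
    """Minimize predicted power subject to predicted SNM >= snm_min.

    Returns (assignment, power) or None if no assignment is feasible.
    +1 means high-V_Th and -1 means low-V_Th for each transistor.
    """
    best = None
    for x in product((-1, +1), repeat=len(p_coeffs)):
        if predicted(s0, s_coeffs, x) < snm_min:
            continue  # violates the (assumed) stability constraint
        power = predicted(p0, p_coeffs, x)
        if best is None or power < best[1]:
            best = (x, power)
    return best


# Made-up coefficients: power in nW, SNM in mV.
P0, P_COEFFS = 500.0, [-40.0, -35.0, -30.0, -25.0, -20.0, -15.0, -12.0, -10.0, -8.0, -5.0]
S0, S_COEFFS = 280.0, [-6.0, -4.0, 3.0, 2.0, -5.0, -1.0, 1.0, -2.0, 2.0, -3.0]

unconstrained = min_power_assignment(P0, P_COEFFS, S0, S_COEFFS, float("-inf"))
constrained = min_power_assignment(P0, P_COEFFS, S0, S_COEFFS, snm_min=285.0)
print("unconstrained:", unconstrained)
print("constrained:  ", constrained)
```

With no constraint the optimum simply sets each transistor to whichever state its half effect favors; a constraint forces some transistors back to the other state, which is where an ILP formulation earns its keep.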









### **Conjugate-gradient approach**

- **Input:** Minimum-power configuration SRAM, baseline model file, high-threshold model file, objective set F = [SNM<sub>SRAM</sub>, P<sub>SRAM</sub>], stopping criterion S, parameter set D = [W<sub>pl</sub>, L<sub>pl</sub>, W<sub>nd</sub>, L<sub>nd</sub>, W<sub>pa</sub>, L<sub>pa</sub>, W<sub>na</sub>, L<sub>na</sub>], lower parameter constraint C<sub>low</sub>, upper parameter constraint C<sub>up</sub>.
- **Output:** Optimized objective set $F_{opt}$, optimal parameter set $D_{opt}$ for $S \le \pm \beta$, where $1\% \le \beta \le 5\%$.
- Run an initial simulation with the initial guess of D.
- **while** (C<sub>low</sub> < D < C<sub>up</sub>) **do**
  - Use the conjugate-gradient method to generate a new set of parameters D' = D ± ΔD.
  - Compute $F = [SNM_{SRAM}, P_{SRAM}]$.
  - **if** $(S \le \pm \beta)$ **then**
    - **return** $D_{opt} = D'$.
  - **end if**
- **end while**
- Using D<sub>opt</sub>, simulate the optimal SRAM.
- Record $F_{opt}$ for the optimal SRAM.
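The loop above can be sketched in code under several loud assumptions. The function `snm_surrogate` is a made-up smooth stand-in for a circuit simulation, the bounds and starting sizes are invented, the Polak-Ribière variant of nonlinear conjugate gradient with a backtracking line search is one common choice (the slides do not say which variant was used), and the stopping criterion S is interpreted here as the relative change in SNM between iterations.

```python
def snm_surrogate(d):
    """Hypothetical stand-in for a simulator run: fake SNM in volts for sizes in nm."""
    target = [120.0, 36.0, 200.0, 34.0, 90.0, 48.0, 100.0, 44.0]  # made-up optimum
    return 0.30 - 0.05 * sum(((x - t) / 100.0) ** 2 for x, t in zip(d, target))


def gradient(f, d, h=1e-3):
    """Central-difference gradient, since a simulator gives no analytic derivative."""
    g = []
    for i in range(len(d)):
        up, dn = list(d), list(d)
        up[i] += h
        dn[i] -= h
        g.append((f(up) - f(dn)) / (2 * h))
    return g


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def clip(d, lower, upper):
    """Keep every parameter inside [C_low, C_up]."""
    return [min(max(x, lo), hi) for x, lo, hi in zip(d, lower, upper)]


def maximize_snm(snm, d0, lower, upper, beta=0.01, max_iter=200):
    """Polak-Ribiere conjugate gradient on cost = -SNM, with box clipping."""
    cost = lambda d: -snm(d)
    d = clip(d0, lower, upper)
    f, g = cost(d), gradient(cost, d)
    direction = [-gi for gi in g]
    span = max(hi - lo for lo, hi in zip(lower, upper))
    for _ in range(max_iter):
        if dot(g, g) == 0.0:
            break  # stationary point
        slope = dot(g, direction)
        if slope >= 0.0:  # not a descent direction: restart with steepest descent
            direction = [-gi for gi in g]
            slope = -dot(g, g)
        step = span / max(abs(p) for p in direction)
        for _ in range(40):  # backtracking until sufficient decrease
            trial = clip([x + step * p for x, p in zip(d, direction)], lower, upper)
            f_trial = cost(trial)
            if f_trial <= f + 1e-4 * step * slope:
                break
            step *= 0.5
        else:
            break  # no acceptable step found
        g_new = gradient(cost, trial)
        pr = max(0.0, dot(g_new, [a - b for a, b in zip(g_new, g)]) / dot(g, g))
        direction = [-a + pr * p for a, p in zip(g_new, direction)]
        converged = abs(f_trial - f) <= beta * abs(f)  # assumed reading of S <= beta
        d, f, g = trial, f_trial, g_new
        if converged:
            break
    return d, -f


# D = [W_pl, L_pl, W_nd, L_nd, W_pa, L_pa, W_na, L_na] in nm; values are invented.
D0 = [64.0, 40.0] * 4
C_LOW, C_UP = [32.0] * 8, [320.0] * 8
d_opt, snm_opt = maximize_snm(snm_surrogate, D0, C_LOW, C_UP)
print("SNM before:", snm_surrogate(D0), "after:", snm_opt)
```

In a real flow each call to `snm_surrogate` would be replaced by a simulation of the cell with the candidate sizes, so the number of objective evaluations, including those spent on finite-difference gradients, dominates the run time.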







#### Conjugate-gradient approach...









## Optimization Results for Power, Performance and Process Variation







#### **SRAM results after Optimization**

| Parameter          | Value    |
|--------------------|----------|
| P<sub>SRAM</sub>   | 314.5 nW |
| SNM<sub>SRAM</sub> | 295 mV   |



#### **Process Variation for 10T SRAM**



Effect of process variation on the butterfly curve of SRAM

Distribution of "High SNM" and "Low SNM"

Distribution of average power of SRAM

| SNM Value | μ (mV) | σ (mV) |
|-----------|---------|---------|
| SNM High  | 330.7   | 71.9    |
| SNM Low   | 290.3   | 12.7    |
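To compare the spread of the two distributions on a common scale, the coefficient of variation σ/μ can be computed from the table. The short sketch below uses only the tabulated values; the helper name is hypothetical.

```python
def coefficient_of_variation(mean_mv: float, sigma_mv: float) -> float:
    """Relative spread sigma / mu of a distribution (dimensionless)."""
    if mean_mv == 0:
        raise ValueError("mean must be non-zero")
    return sigma_mv / mean_mv


# (mu, sigma) in mV, taken from the table above.
snm_stats = {"SNM High": (330.7, 71.9), "SNM Low": (290.3, 12.7)}

cv = {name: coefficient_of_variation(mu, sigma) for name, (mu, sigma) in snm_stats.items()}
for name, value in cv.items():
    print(f"{name}: sigma/mu = {value:.1%}")
```

By this measure the "High SNM" distribution is considerably wider relative to its mean than the "Low SNM" distribution.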







#### Array organization for 10T SRAM



• As per the design flow, an 8 × 8 array is constructed using the optimized cell.

• The average power consumption of the array is 1.2 μW.







#### **Conclusions and Future Work**

• A methodology is presented for cell-level optimization of SRAM power and stability.

• A 32nm high-κ/metal-gate 10-transistor SRAM is subjected to the proposed methodology, which yields an 86% reduction in power and an 8% increase in SNM.

• A novel DOE-ILP approach has been used for power minimization, and a conjugate-gradient method has been used for SNM maximization.







#### **Conclusions and Future Work ...**

• The effect of process variation of 12 parameters on the proposed SRAM is evaluated.

• An 8 × 8 array has been constructed using the optimized cell, and data for power and read static noise margin are presented.

• The future scope of this research involves array-level optimization of SRAM.

• For array optimization, both mismatch and process variation will be considered as part of the design flow.







# Thank you!!





