# DfX for Nanoelectronic Circuits and Systems

Saraju Mohanty

NanoSystem Design Laboratory (NSDL)

Dept. of Computer Science and Engineering

University of North Texas, Denton, TX 76203, USA.

Email: saraju.mohanty@unt.edu





#### **Ancient Computing Machines -- Mechanical**



2400 BC

- The abacus
- The first known calculator
- Invented in Babylonia



1832 AD

- The Babbage Difference Engine
- Tabulated polynomial functions
- Invented in Britain





#### The First Electronic Computer



1946

- ENIAC: the first electronic general-purpose computer.
- Turing-complete, digital, and programmable.
- Invented in the USA.



#### **Current Computing Systems**





**Tablet** 



Slate PC





Smart Phone

A green light to greatness.





### **Smallest Single-Board Computers**



Raspberry Pi



BeagleBone





#### **Variety of Integrated Circuits (Chips)**



Low-Cost ASIC



Communication Chip



Secure Media Processor



Intel Core i7 LGA1366 processor has 1366 pins.



**ADC Chip** 







### Intel Haswell Chip -- 2013

4th Generation Intel® Core™ Processor Die Map 22nm Tri-Gate 3-D Transistors



Quad-core die shown above

Transistor count: 1.4 billion

Die size: 177mm<sup>2</sup>

\*\* Cache is shared across all 4 cores and processor graphics







### **GPU** with Highest Transistor Count



Nvidia GK110 has 7.1 billion transistors fabricated in a 28nm technology.

Source: http://www.tomshardware.com/news/nvidia-tesla-k20-gk110-gpu,15683.html







# Processors for Mobile Systems: Essentially AMS-SoCs

Tegra 2 die blocks (figure labels): Cortex-A9 processor (×2), image processor, HD video decode processor, HD video encode processor, ARM7 processor, 2D/3D graphics processor, audio processor.

ARM-based from Qualcomm



NVIDIA's Tegra 2 die

Source: http://www.anandtech.com

Snapdragon S4 Block Diagram

Source: http://www.cnx-software.com






Technology Miniaturization (aka Technology Scaling)

Nano

New Technology (Alternative Devices)



#### **How Small Is Nano?**



- "nano" means one-billionth, or 10<sup>-9</sup>
- A sheet of paper is about 100,000 nanometers thick
- A human hair is approximately 100,000 nanometers wide

Source: <a href="http://www.nano.gov/nanotech-101/what/nano-size">http://www.nano.gov/nanotech-101/what/nano-size</a>

### A Typical Nanoelectronic System





### Scaling Reduces Power Dissipation





1 Virtex-7 2000T: 19 W versus 4 of the largest monolithic FPGAs: 112 W

Source: <a href="http://low-powerdesign.com/sleibson">http://low-powerdesign.com/sleibson</a>





### Scaling Reduces Cost of Electronics

In 1986, a Kodak camera with a 1.3-megapixel CCD sensor cost \$13,000. Today one can be bought for a few dollars.



Nikon D7000 DSLR camera.

16 MP → \$700

Source: <a href="http://www.lensrentals.com/blog/2012/04/d7000-dissection">http://www.lensrentals.com/blog/2012/04/d7000-dissection</a>



### Nanoelectronics: Challenges





# DfX -- Design for X (aka Design for Excellence)

- **X** = set of IC design challenges
  - Manufacturability
  - Power
  - Variability
  - Cost
  - Yield
  - Reliability
  - Test
  - Debug





**Designers** 

Source: ISVLSI 2012 Andrew Kahng Keynote



# Consumer Electronics Demand More and More Energy






### Different Electronic Systems: Common Story







- Smarter ... Faster ... High Throughput ...
- → Power Hungry!! Battery Hungry!!






### **Battery Dependency: Not Overstated**





One 787 Battery: 12 Cells / 32 V DC



Boeing 787s across the globe were grounded in early 2013.

Source: <a href="http://www.newairplane.com">http://www.newairplane.com</a>





### **Battery Dependency: Not Overstated**



- Great idea: a smartwatch that functions like a smartphone.
- Big problem: battery life per charge is only 1 day.

Source: <a href="http://www.businessinsider.com">http://www.businessinsider.com</a>

# A Typical Electronic System: Where Is Energy Consumed?



Power dissipation breakdown in idle mode of a connected mobile device

Source: Pering MobiSys 2006





#### **DfP: Possible Solution Fronts**



### DfP: Design of a Universal Level Converter for Dynamic Power Management

# One Example Electronic System: Secure Digital Camera



# Universal Voltage-Level Converter: One Topology



- 20-transistor, area-efficient design.
- Energy-hungry transistors are circled.

- Energy-hungry transistors have thicker oxide.
- 90nm CMOS dual-oxide physical design of the ULC.






## Universal Voltage-Level Converter: Operations

#### Operations of the ULC:

- Level-up conversion
- Level-down conversion
- Blocking of input signal

| Select Signal | Type of Operation |
|---------------|-------------------|
| 00            | Block Signal      |
| 01            | Up Conversion     |
| 10            | Down Conversion   |
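As a behavioral sketch only (not the transistor-level ULC), the select-signal table can be read as a two-bit decoder. The rail names `vddl`/`vddh`, treating the unlisted code `11` as invalid, and returning `None` for a blocked input are assumptions made for illustration.

```python
from enum import Enum


class UlcOp(Enum):
    BLOCK = "block"
    UP = "up-conversion"
    DOWN = "down-conversion"


# Select codes from the ULC operations table; "11" is not listed there.
SELECT_TABLE = {
    (0, 0): UlcOp.BLOCK,
    (0, 1): UlcOp.UP,
    (1, 0): UlcOp.DOWN,
}


def decode_select(s1: int, s0: int) -> UlcOp:
    """Map the two select bits to a ULC operation."""
    try:
        return SELECT_TABLE[(s1, s0)]
    except KeyError:
        raise ValueError(f"select code {s1}{s0} is not defined") from None


def ulc_output(bit: int, s1: int, s0: int, vddl: float, vddh: float):
    """Behavioral output voltage for logic input `bit` (hypothetical rails vddl < vddh).

    Returns None for a blocked input, since the table does not give the blocked level.
    """
    op = decode_select(s1, s0)
    if op is UlcOp.BLOCK:
        return None
    # Up-conversion drives a logic 1 to the high rail, down-conversion to the low rail.
    rail = vddh if op is UlcOp.UP else vddl
    return rail if bit else 0.0
```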





### Universal Voltage-Level Converter: Has Minimal Overhead

| Designs       | Technology (nm) | Power     | Delay    | Conversion              | Design Approach                       |
|---------------|-----------------|-----------|----------|-------------------------|---------------------------------------|
| Ishihara 2004 | 130             |           | 127 ps   | Level-up and down       | Level-converting flip-flops           |
| Yu 2001       | 350             | 220.57 μW |          | Level-up                | SDCVS                                 |
| Sadeghi 2006  | 100             | 10 μW     | 1 ns     | Level-up                | Pass transistor and keeper transistor |
| ULC           | 90              | 12.26 μW  | 113.8 ps | Level-up/down and block | All conversion types and programmable |



### Nanoelectronics Variability?

Discrepancy between chip parameters at design time and the actual parameters after fabrication.



Source: <a href="http://apcmag.com/picture-gallery-how-a-chip-is-made.htm">http://apcmag.com/picture-gallery-how-a-chip-is-made.htm</a>





#### **Process Variation: Parameters**



Source-drain resistance differs for different chips on the same die.





Gate-to-source and gate-to-drain overlap capacitance differs for different chips on the same die.

Source: Bernstein et al., IBM J. Res. & Dev., July/Sep 2006.





### **Process Variation: The Impact**

- Yield Loss
- Reliability Issue
- Higher Cost

### **Process Variation: Sources**



Sophisticated Lithography





### **Process Variations: Solution**





# Process Variations Aware Optimization: Key Idea



Power / Performance Values





## DfV: Statistical Nano-CMOS RTL Optimization for Power

### Nano-CMOS RTL Statistical Optimization



# Statistical RTL Optimization: Formulation

Minimize:
$$FoM_{Total}^{DFG}\big(\mu_I^{DFG}, \sigma_I^{DFG}\big)$$

Subject to (resource/time constraints):

$$\text{Allocated}(FU_{k,i}) \le \text{Available}(FU_{k,i}), \quad \forall \text{ cycle } c$$

$$D_{CP}^{DFG}\left(\mu_{D}^{DFG},\sigma_{D}^{DFG}\right) \leq D_{Con}\left(\mu_{D}^{Con},\sigma_{D}^{Con}\right)$$
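The formulation needs the mean and standard deviation of quantities such as the critical-path delay $D_{CP}^{DFG}$. A minimal Monte Carlo sketch, assuming independent Gaussian delays for the functional units on one critical path, is shown below. The comparison rule in `meets_constraint` (both $\mu$ and $\sigma$ must not exceed the constraint's) is a hypothetical reading of the "$\le$" between $(\mu, \sigma)$ pairs, not necessarily the rule used in the original work.

```python
import random
from statistics import mean, pstdev


def path_delay_stats(fu_delays, n_samples=20000, seed=1):
    """Monte Carlo estimate of (mu, sigma) of a data-flow-graph path delay.

    fu_delays: list of (mu, sigma) pairs, one per functional unit on the
    critical path; each delay is assumed independent and Gaussian.
    """
    rng = random.Random(seed)
    samples = [
        sum(rng.gauss(mu, sigma) for mu, sigma in fu_delays)
        for _ in range(n_samples)
    ]
    return mean(samples), pstdev(samples)


def meets_constraint(d_cp, d_con):
    """Hypothetical rule: path (mu, sigma) must not exceed the constraint (mu, sigma)."""
    return d_cp[0] <= d_con[0] and d_cp[1] <= d_con[1]
```

For three functional units with $(\mu, \sigma) = (1.0, 0.1)$ ns, the estimate approaches $\mu = 3.0$ ns and $\sigma = 0.1\sqrt{3} \approx 0.17$ ns, as expected for a sum of independent Gaussians.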

# Statistical RTL Optimization: Results on DSP Benchmarks





(For ARF Benchmark)

(For BPF Benchmark)





Source: http://www.ami.ac.uk/courses/ami4202\_mdesign/u02/







## One of the Key Issues: Time/Effort

The simulation time for Phase-Locked Loop (PLL) lock on a full-blown (RCLK) parasitic netlist is on the order of many days! → High NRE cost.



**PLL** 



- How fast can design space exploration be performed?
- How fast can layout generation and optimization be performed?


## Standard Design Flow – Very Slow



- Standard design flow requires multiple manual iterations on the back-end layout to achieve parasitic closure between front-end circuit and back-end layout.
- Longer design cycle time.
- Error prone design.
- Higher non-recurring engineering (NRE) cost.
- Difficult to handle nanoscale challenges.




# Automatic Optimization on Netlist (Faster than manual flow; still slow)



- Automatic iteration over netlist improves design optimization.
- Still needs multiple simulations using an analog simulator (SPICE).
- SPICE is slow.

# Two-Tier Speedup Through Metamodels



Optimization over metamodels: 300× speedup





## **Proposed Flow: Key Perspective**

- A novel design and optimization methodology that produces robust AMS-SoC components using ultra-fast automatic iterations over metamodels (instead of the netlist) and two manual layout steps.
- The methodology easily accommodates multidimensional challenges, reduces design cycle time, improves circuit yield, and reduces chip cost.

## Metamodel-Based Design Flow





### **Metamodels: Selected Types**



## Metamodels: Polynomial Example



Actual circuit (SPICE netlist) of AMS-SoC components → Statistical sampling → Polynomial function fitting:

$$f(W_n, W_p) = 7.94 \times 10^9 + 1.1 \times 10^{16} W_n + 1.28 \times 10^{15} W_p.$$





# Sampling Techniques: 45nm Ring Oscillator Circuit (5000 points)
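The slides do not list which sampling techniques were compared here; the optimization-time table later credits Latin hypercube sampling (LHS) for sample generation, so below is a minimal pure-Python LHS sketch. Function name and arguments are illustrative.

```python
import random


def latin_hypercube(n_points, bounds, seed=0):
    """Latin hypercube sample of n_points within per-dimension (low, high) bounds.

    Each dimension is split into n_points equal strata; every stratum is used
    exactly once per dimension, with a uniformly random point inside it.
    """
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        width = (high - low) / n_points
        # One random point per stratum, then shuffle the strata order independently per dimension.
        col = [low + (k + rng.random()) * width for k in range(n_points)]
        rng.shuffle(col)
        columns.append(col)
    return [tuple(col[i] for col in columns) for i in range(n_points)]
```

For the 5000-point ring-oscillator case, a call such as `latin_hypercube(5000, [(0.2, 2.0), (0.4, 4.0)])` would draw $(W_n, W_p)$ pairs; those µm ranges are hypothetical. Unlike plain random sampling, every stratum of each parameter is covered exactly once.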





## **Polynomial Metamodels**

- The generated sample data can be fitted in many ways to generate a metamodel.
- The choice of fitting algorithm can affect the accuracy of the metamodel.
- A simple metamodel has the following form:

$$y = \sum_{i,j=0}^{k} \left( \alpha_{ij} \times x_1^i \times x_2^j \right)$$

where $y$ is the response being modeled (e.g., frequency), $x = [W_n, W_p]$ is the vector of variables, and $\alpha_{ij}$ are the coefficients.
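The slides do not say which fitting algorithm produced the coefficients. As one possibility, here is a sketch of an ordinary least-squares fit (normal equations, pure Python). It uses a total-degree basis ($i + j \le k$), an assumption chosen because it reproduces the three-term order-1 model shown earlier; `rmse` gives the root-mean-square error used later to compare metamodels.

```python
def monomial_exponents(order):
    """Exponent pairs (i, j) with i + j <= order (total-degree basis, an assumption)."""
    return [(i, j) for i in range(order + 1) for j in range(order + 1 - i)]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_polynomial(samples, responses, order):
    """Least-squares fit of y = sum alpha_ij * x1**i * x2**j via the normal equations."""
    exps = monomial_exponents(order)
    rows = [[x1 ** i * x2 ** j for i, j in exps] for x1, x2 in samples]
    n = len(exps)
    ata = [[sum(r[p] * r[q] for r in rows) for q in range(n)] for p in range(n)]
    aty = [sum(r[p] * y for r, y in zip(rows, responses)) for p in range(n)]
    return dict(zip(exps, solve(ata, aty)))


def predict(coeffs, x1, x2):
    return sum(a * x1 ** i * x2 ** j for (i, j), a in coeffs.items())


def rmse(coeffs, samples, responses):
    errs = [(predict(coeffs, *x) - y) ** 2 for x, y in zip(samples, responses)]
    return (sum(errs) / len(errs)) ** 0.5
```

Scaling the inputs helps conditioning: with $W$ in µm and $f$ in GHz, the order-1 ring-oscillator model becomes $f = 7.94 + 11\,W_n + 1.28\,W_p$.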

## Metamodel: Polynomial Comparison

| Case Study Circuit                            | Polynomial Order | $\mu$ Error (MHz) | $\sigma$ Error (MHz) |
|-----------------------------------------------|------------------|-------------------|----------------------|
| Ring oscillator, 45nm CMOS, target f: 10 GHz  | 1                | 571.0             | 286.7                |
|                                               | 2                | 195.4             | 78.1                 |
|                                               | 3                | 37.2              | 18.0                 |
|                                               | 4                | 20.0              | 10.7                 |
|                                               | 5                | 17.1              | 9.6                  |
| LC-VCO, 180nm CMOS, target f: 2.7 GHz         | 1                | 42.3              | 40.1                 |
|                                               | 2                | 39.4              | 37.8                 |
|                                               | 3                | 35.4              | 33.9                 |
|                                               | 4                | 30.5              | 29.3                 |
|                                               | 5                | 26.5              | 25.2                 |

#### Ring oscillator – Order 1

$$f(W_n, W_p) = 7.94 \times 10^9 + 1.1 \times 10^{16} W_n + 1.28 \times 10^{15} W_p.$$

#### LC-VCO – Order 1

$$f(W_n, W_p) = 2.38 \times 10^9 - 3.49 \times 10^{12} W_n - 6.66 \times 10^{12} W_p.$$
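As a units check on the ring-oscillator model, assume (as the coefficient magnitudes suggest, though the slide does not state units) that $W_n$ and $W_p$ are in meters and $f$ is in hertz. For hypothetical widths $W_n = 200$ nm and $W_p = 400$ nm:

$$f = 7.94 \times 10^9 + 1.1 \times 10^{16}\,(2 \times 10^{-7}) + 1.28 \times 10^{15}\,(4 \times 10^{-7}) = 7.94 \times 10^9 + 2.2 \times 10^9 + 5.12 \times 10^8 \approx 10.65\ \text{GHz},$$

which is near the 10 GHz target listed for this circuit.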





# Artificial Neural Network (ANN) Metamodeling

- Feed-forward dual-layer (FFDL) ANNs are considered.
- An FFDL ANN is created for each FoM:
  - Nonlinear hidden-layer functions are used, with the number of hidden neurons varied from 1 to 20:

$$b_j(v_j) = \tanh(\lambda v_j)$$
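A minimal forward-pass sketch of one FFDL network follows: one hidden layer using $b_j(v_j) = \tanh(\lambda v_j)$ and, as an assumption, a single linear output neuron per FoM. Training (e.g., backpropagation) is not shown, and `random_network` is a hypothetical initializer used only to build a network of a given hidden size.

```python
import math
import random


def ffdl_forward(x, w_hidden, b_hidden, w_out, b_out, lam=1.0):
    """Forward pass: tanh hidden layer, then one linear output neuron (one FoM).

    x: input vector (e.g. normalized transistor widths).
    w_hidden: one weight row per hidden neuron; b_hidden: hidden biases.
    """
    hidden = []
    for weights, bias in zip(w_hidden, b_hidden):
        v = sum(w * xi for w, xi in zip(weights, x)) + bias  # neuron input v_j
        hidden.append(math.tanh(lam * v))                      # b_j(v_j) = tanh(lambda * v_j)
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


def random_network(n_inputs, n_hidden, seed=0):
    """Hypothetical random initialization of weights in [-1, 1]; no training is done."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
    b_hidden = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    return w_hidden, b_hidden, w_out, 0.0
```

Varying the hidden-neuron count from 1 to 20, as the slide describes, would mean building and training one such network per size and keeping the most accurate one.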



# Metamodel Comparison: Polynomial Vs Nonpolynomial

Nonpolynomial (artificial neural network) metamodels are more suitable for large circuits.

180nm CMOS PLL with target specs: f = 2.7 GHz, P = 3.9 mW, locking time = 8.5 µs.

| Figures-of-Merit (FoM) | Polynomial: # of Coefficients | Polynomial: RMSE | Nonpolynomial (Neural Network): RMSE |
|------------------------|-------------------------------|------------------|--------------------------------------|
| Frequency              | 48                            | 77.96 MHz        | 48 MHz                               |
| Power                  | 50                            | 2.6 mW           | 0.29 mW                              |
| Locking Time           | 56                            | 1.9 µs           | 1.2 µs                               |

- 56% increase in accuracy over polynomial metamodels.
- On average, 3.2% error relative to the golden design surface.



## Selected Algorithms for Optimization over Metamodels



### **Exhaustive Search: 45nm RO**



- Searches over a two-parameter space.
- Parameters are incremented in specified steps.
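A sketch of the two-parameter grid search described above, for any objective where lower is better; the function name and arguments are illustrative.

```python
def exhaustive_search(objective, bounds, steps):
    """Evaluate objective on a uniform grid over two parameters; return the best point.

    bounds: ((lo1, hi1), (lo2, hi2)); steps: number of increments per parameter.
    Lower objective values are treated as better.
    """
    (lo1, hi1), (lo2, hi2) = bounds
    best_point, best_value = None, float("inf")
    evaluations = 0
    for a in range(steps + 1):
        p1 = lo1 + (hi1 - lo1) * a / steps
        for b in range(steps + 1):
            p2 = lo2 + (hi2 - lo2) * b / steps
            value = objective(p1, p2)
            evaluations += 1
            if value < best_value:
                best_point, best_value = (p1, p2), value
    return best_point, best_value, evaluations
```

For instance, with the order-1 ring-oscillator metamodel expressed in GHz and µm (an assumed unit choice), an objective such as `lambda wn, wp: abs(7.94 + 11 * wn + 1.28 * wp - 10.0)` would look for widths that hit a 10 GHz target. The cost is `(steps + 1) ** 2` evaluations, which is why exhaustive search is practical only over cheap metamodels or very small spaces.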

### DOE Assisted Tabu Search: 45nm RO



The search space is recursively divided into rectangles, and each time the rectangle with the better result is selected.
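The rectangle-subdivision step can be sketched as below. This is a simplification: each level halves the current rectangle (alternating the split axis) and keeps the half whose centre scores better; the tabu memory and DOE sampling of the actual method are not modelled, so this is not a full tabu search.

```python
def centre_value(objective, rect):
    """Objective evaluated at the centre of rect = ((x_lo, x_hi), (y_lo, y_hi))."""
    (x_lo, x_hi), (y_lo, y_hi) = rect
    return objective((x_lo + x_hi) / 2, (y_lo + y_hi) / 2)


def rectangle_search(objective, bounds, depth=20):
    """Recursively halve the search rectangle, keeping the better-scoring half.

    Lower objective values are better. Returns the centre of the final rectangle.
    """
    rect = bounds
    for level in range(depth):
        (x_lo, x_hi), (y_lo, y_hi) = rect
        if level % 2 == 0:  # split along x on even levels
            mid = (x_lo + x_hi) / 2
            halves = [((x_lo, mid), (y_lo, y_hi)), ((mid, x_hi), (y_lo, y_hi))]
        else:  # split along y on odd levels
            mid = (y_lo + y_hi) / 2
            halves = [((x_lo, x_hi), (y_lo, mid)), ((x_lo, x_hi), (mid, y_hi))]
        rect = min(halves, key=lambda r: centre_value(objective, r))
    (x_lo, x_hi), (y_lo, y_hi) = rect
    return (x_lo + x_hi) / 2, (y_lo + y_hi) / 2
```

Each level costs two evaluations, so `depth` levels need `2 * depth` evaluations instead of a full grid; the trade-off is that a multimodal objective can lead the halving into the wrong region, which is what the tabu memory in the full method is meant to mitigate.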





# Comparison of the Running Time of Heuristic Algorithms: 45nm RO



- Optimization without metamodels: tabu search is ~1000× faster than exhaustive search and ~4× faster than simulated annealing.
- Optimization with metamodels: simulated annealing is ~1000× faster than exhaustive search and ~6× faster than tabu search.

## Case Study Circuit: 180nm PLL



Block diagram of a PLL.

- PLL circuit is characterized for frequency, power, vertical and horizontal jitter (for simple phase noise), and locking time.
- Metamodels are created for each FoM from the same sample set.



PLL for 180nm.

### PLL: Polynomial Metamodels ...

- Number of coefficients versus the order of the generated metamodel for settling time.
- This indicates over-fitting at higher orders; therefore, a polynomial order of 4 is used for the metamodel that represents settling time.
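Why over-fitting appears as the order rises can be seen from how fast the coefficient count of a polynomial grows: a least-squares fit needs at least as many samples as coefficients. The sketch below assumes a full total-degree polynomial; the PLL metamodels in the comparison table report 48 to 56 coefficients, so the full count is an upper bound rather than a description of those models.

```python
from math import comb


def full_polynomial_terms(n_vars: int, order: int) -> int:
    """Coefficients in a full polynomial of total degree <= order in n_vars variables."""
    return comb(n_vars + order, order)
```

With two variables and order 1 this gives 3, matching the order-1 ring-oscillator model; with all 21 PLL parameters the count climbs from 22 (order 1) to 12,650 (order 4) and 65,780 (order 5).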



## **Artificial Bee-Colony: Overview**

1. Initial food sources are produced for all worker bees.
2. Do
   - Each worker bee goes to a food source and evaluates its nectar amount.
   - Each onlooker bee watches the dances of the worker bees, chooses one of their sources depending on the dances, and evaluates its nectar amount.
   - Determine abandoned food sources and replace them with the new food sources discovered by scout bees.
   - Record the best food source determined so far.
3. Until (requirements are met)

A food source  $\rightarrow$  a solution; A position of a food source  $\rightarrow$  a design variable set; Nectar amount  $\rightarrow$  Quality of a solution; Number of worker bees  $\rightarrow$  number of quality solutions.
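Following the steps above, a compact ABC sketch for minimization over box bounds is shown below. It uses the common Karaboga-style formulation (greedy neighbour search, nectar-proportional onlooker choice, scouts replacing sources whose trial count exceeds `limit`); the `limit` value, the fitness transform, and stopping after a fixed `max_iter` instead of a requirements check are assumptions. The defaults of 20 bees and 100 iterations mirror the ABC run in the optimization-time table.

```python
import random


def abc_minimize(objective, bounds, n_food=20, limit=20, max_iter=100, seed=0):
    """Artificial bee colony (Karaboga-style) minimization over box bounds.

    Food source = candidate solution (a design-variable vector); lower
    objective values mean more nectar. Requires n_food >= 2.
    """
    rng = random.Random(seed)
    dim = len(bounds)

    def random_source():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def nectar(value):
        # Common ABC fitness transform for minimization problems.
        return 1.0 / (1.0 + value) if value >= 0 else 1.0 + abs(value)

    def try_neighbour(i):
        # v_d = x_d + phi * (x_d - x_kd) on one random coordinate d, kept inside bounds.
        k = rng.choice([s for s in range(n_food) if s != i])
        d = rng.randrange(dim)
        cand = foods[i][:]
        cand[d] += rng.uniform(-1.0, 1.0) * (foods[i][d] - foods[k][d])
        lo, hi = bounds[d]
        cand[d] = min(max(cand[d], lo), hi)
        value = objective(cand)
        if value < values[i]:  # greedy selection
            foods[i], values[i], trials[i] = cand, value, 0
        else:
            trials[i] += 1

    foods = [random_source() for _ in range(n_food)]  # 1. initial food sources
    values = [objective(f) for f in foods]
    trials = [0] * n_food
    best = min(zip(values, foods))

    for _ in range(max_iter):  # 2. do ... 3. until the iteration budget is spent
        for i in range(n_food):  # worker bees
            try_neighbour(i)
        weights = [nectar(v) for v in values]
        for _ in range(n_food):  # onlooker bees pick sources in proportion to nectar
            chosen = rng.choices(range(n_food), weights=weights)[0]
            try_neighbour(chosen)
        for i in range(n_food):  # scouts replace abandoned sources
            if trials[i] > limit:
                foods[i] = random_source()
                values[i], trials[i] = objective(foods[i]), 0
        best = min(best, min(zip(values, foods)))  # record best source so far
    return best[1], best[0]
```

For the PLL, `bounds` would be the 21 min/max pairs from the parameter table and `objective` a cost built from the FoM metamodels; both are left abstract here.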

## PLL: ABC over Poly. Metamodels

PLL parameters with constraints and optimized values.

| Circuit        | Parameter   | Min (m) | Max (m) | Optimal Value (m) |
|----------------|-------------|---------|---------|-------------------|
| Phase Detector | $W_{ppd1}$  | 400n    | $2\mu$  | $1.66\mu$         |
|                | $W_{npd1}$  | 400n    | $2\mu$  | $1.11\mu$         |
|                | $W_{ppd2}$  | 400n    | $2\mu$  | 784n              |
|                | $W_{npd2}$  | 400n    | $2\mu$  | 689n              |
|                | $W_{ppd3}$  | 400n    | $2\mu$  | $1.54\mu$         |
|                | $W_{npd3}$  | 400n    | $2\mu$  | 737n              |
| Charge Pump    | $W_{nCP1}$  | 400n    | $2\mu$  | $1.24\mu$         |
|                | $W_{pCP1}$  | 400n    | $2\mu$  | $1.35\mu$         |
|                | $W_{nCP2}$  | $1\mu$  | $4\mu$  | $1.35\mu$         |
|                | $W_{pCP2}$  | $1\mu$  | $4\mu$  | $2.88\mu$         |
| LC-VCO         | $W_{nLC}$   | $3\mu$  | $20\mu$ | $18.62\mu$        |
|                | $W_{pLC}$   | $6\mu$  | $40\mu$ | $37.48\mu$        |
| Divider        | $W_{p1Div}$ | 400n    | $2\mu$  | $1.65\mu$         |
|                | $W_{p2Div}$ | 400n    | $2\mu$  | $1.54\mu$         |
|                | $W_{p3Div}$ | 400n    | $2\mu$  | $1.38\mu$         |
|                | $W_{p4Div}$ | 400n    | $2\mu$  | $1.96\mu$         |
|                | $W_{n1Div}$ | 400n    | $2\mu$  | $1.09\mu$         |
|                | $W_{n2Div}$ | 400n    | $2\mu$  | $1.17\mu$         |
|                | $W_{n3Div}$ | 400n    | $2\mu$  | $1.29\mu$         |
|                | $W_{n4Div}$ | 400n    | $2\mu$  | $1.95\mu$         |
|                | $W_{n5Div}$ | 400n    | $2\mu$  | 536n              |

- An exhaustive search of the design space of 21 parameters with 10 intervals per parameter requires 10<sup>21</sup> simulations.
- At about 10 min per SPICE simulation, 10<sup>21</sup> SPICE simulations are prohibitively slow.
- 10<sup>21</sup> simulations using polynomial metamodels are fast by comparison.
- Time savings: ≈10<sup>20</sup>× SPICE simulation time.
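The exhaustive-search estimate works out as follows, taking 10 values for each of the 21 parameters, 10 min per SPICE run, and 365.25-day years:

$$N = 10^{21}, \qquad T = 10^{21} \times 10\ \text{min} = 10^{22}\ \text{min} \approx \frac{10^{22}\ \text{min}}{5.26 \times 10^{5}\ \text{min/year}} \approx 1.9 \times 10^{16}\ \text{years}.$$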



## PLL: ABC Optimization: Poly Vs ANN

#### **Optimization Results**

| FoM           | Poly. Metamodel | ANN Metamodel |
|---------------|-----------------|---------------|
| Average Power | 3.9 mW          | 3.9 mW        |
| Frequency     | 2.6909 GHz      | 2.7026 GHz    |

#### **Optimization Time Comparison**

| Algorithm                   | Circuit Netlist                                                          | Poly. Metamodel                      | ANN Metamodel                                        |
|-----------------------------|--------------------------------------------------------------------------|--------------------------------------|------------------------------------------------------|
| ABC (100 iterations)        | 20 bees × 5 min × 100 iterations = 10,000 min ≈ 7 days (worst case) | 5 min                               | 0.12 min                                             |
| **Metamodel Generation**    | 0                                                                        | 11 hours for LHS + 1 min creation    | 11 hours for LHS + 10 min training and verification  |

### **Conclusions**

- Nanoelectronic circuits and systems have multifold design challenges.
- DfX is design for X, where X = Power, Variability, Cost, …
- DfP:
  - 35% of total energy in the USA is consumed by electronics.
  - Battery is a critical constraint for portable systems.
  - Effective solutions need energy-efficient hardware and software, together with better battery design.
- DfV: Reduce the variability in chips and enhance yield.
- DfC: Reduce NRE cost and time to market, and improve yield.
- Much more research is needed on the combined consideration of issues, e.g., X = Variability and Cost.

### References

- S. P. Mohanty and E. Kougianos, "Incorporating Manufacturing Process Variation Awareness in Fast Design Optimization of Nanoscale CMOS VCOs", *IEEE Transactions on Semiconductor Manufacturing*, Accepted on 12 Nov 2013, DOI: 10.1109/TSM.2013.2291112.
- S. P. Mohanty, M. Gomathisankaran, and E. Kougianos, "Variability-Aware Architecture Level Optimization Techniques for Robust Nanoscale Chip Design", *Elsevier International Journal on Computers and Electrical Engineering (IJCEE)*, 2014, DOI: 10.1016/j.compeleceng.2013.11.026.
- O. Okobiah, S. P. Mohanty, and E. Kougianos, "Geostatistical-Inspired Fast Layout Optimization of a Nano-CMOS Thermal Sensor", *IET Circuits, Devices & Systems (CDS)*, Volume 7, No. 5, September 2013, pp. 253--262.
- O. Garitselov, S. P. Mohanty, and E. Kougianos, "A Comparative Study of Metamodels for Fast and Accurate Simulation of Nano-CMOS Circuits", *IEEE Trans. on Semiconductor Manufacturing*, Vol. 25, No. 1, Feb 2012, pp. 26--36.
- S. P. Mohanty, E. Kougianos, and O. Okobiah, "Optimal Design of a Dual-Oxide Nano-CMOS Universal Level Converter for Multi-*Vdd* SoCs", *Springer Analog Integrated Circuits & Signal Processing J.*, Vol. 72, No. 2, 2012, pp. 451--467.
- O. Garitselov, S. P. Mohanty, and E. Kougianos, "Accurate Polynomial Metamodeling-Based Ultra-Fast Bee Colony Optimization of a Nano-CMOS PLL", *Journal of Low Power Electronics*, Vol. 8, No. 3, June 2012, pp. 317--328.

