**Springer Analog Integrated Circuits and Signal Processing Journal manuscript No.** (will be inserted by the editor)

## **Optimal Design of a Dual-Oxide Nano-CMOS Universal Level Converter for Multi-$V_{dd}$ SoCs**

Saraju P. Mohanty · Elias Kougianos · Oghenekarho Okobiah

Received: 31 Oct 2011 / Revised: 09 Mar 2012 / Accepted: date

Abstract Multiple supply voltage based  $(V_{dd})$  Systems on Chip (SoCs) allow designers to implement large, complex systems for diverse applications. However, the need for level conversion imposes penalties and often results in non-optimal SoCs. Thus, the level converters are overhead for the circuits in which they are being used. If power consumption of the level converters continues to grow, then they will fail to serve the very purpose for which they were built. This paper proposes the power (leakage)-delay optimization of a DC to DC Universal voltage Level Converter (ULC) using a dual-Tox (dual-oxide CMOS or DOXCMOS) technique and exploiting transistor geometry. The proposed ULC is a novel circuit proposed here for the first time and performs level-up, level-down conversion, or blocking of the input signal, based on the requirements. The paper further proposes a novel design methodology accompanied by an optimization algorithm for the parasitic-aware power-delay optimization of the ULC circuit. The entire design has been implemented in 90 nm CMOS up to layout, including DRC/LVS and parasitic (RC) re-simulation, and was subjected to process variation of 10 process parameters. The optimal ULC with 20 transistors yields power savings of 87.5%, delay improvement of 87.3% and area savings of 21% over the baseline design. It is a robust design performing a stable voltage level conversion for voltages as low as 0.6 V (50% of  $V_{dd}$ ) and loads varying from 10 fF to 200 fF.

**Keywords** Low-power design, nanoscale CMOS, subthreshold leakage, gate-oxide leakage, dual-oxide techniques, multi- $V_{dd}$  based circuits, system-on-chip (SoC)

Computer Science and Engineering, University of North Texas, Denton, TX 76203. Tel.: +1 940-565-3276 Fax: +1 940-565-2799 E-mail: saraju.mohanty@unt.edu

Electrical Engineering Technology, University of North Texas, Denton, TX 76203. Tel.: +1 940-891-6708 Fax: +1 940-565-2666 E-mail: elias.kougianos@unt.edu

Computer Science and Engineering, University of North Texas, Denton, TX 76203. E-mail: OghenekarhoOkobiah@my.unt.edu

## **1** Introduction

The major components of power dissipation in a CMOS circuit can be identified as the switching power dissipation, the short-circuit power dissipation, and the leakage power from various sources. Each one of these dissipation sources is dependent on supply voltage; some linearly, some quadratically and some even exponentially (gate tunneling). For example, switching power has a quadratic dependence on power-supply voltage ( $V_{dd}$ ). Power management is one of the most critical design constraints in integrated circuit (IC) design. Presently available nanoscale CMOS (nano-CMOS) processes deliver greater silicon performance and integration, but battery technology is still lagging. To compensate for this, new design techniques are being developed to address the need for low-power silicon.

Dynamic voltage scaling (DVS) is a power management technique, where the supply voltage of a system is increased or decreased, depending upon circumstances. DVS used to decrease voltage is known as undervolting and DVS used to increase voltage is known as overvolting. Undervolting is done in order to conserve power, particularly in laptops and other mobile devices, where energy comes from a battery and thus is limited. Overvolting is done in order to increase performance.

In a multiple supply voltage (MSV) design, the circuit is partitioned into voltage islands or voltage domains. Each island operates at a different supply voltage depending on its timing characteristics. Since lowering the supply voltage reduces the speed at which the transistors can switch, the designer must be selective in determining which parts of the design should have the voltage reduced. The blocks that are timing critical are grouped in one island that operates at the nominal supply voltage. The non-critical blocks are aggregated into another island, with the voltage scaled down.

A challenge with multiple-voltage based circuits is the need to translate the voltage levels for signals that interface between voltage domains. This is accomplished by inserting level converters (or shifters), which are special cells that perform voltage translation [11]. Essentially, there are two types of voltage conversion: level-up and level-down. A level-up converter is used as an interface where low-$V_{dd}$ cells drive high-$V_{dd}$ cells in order to reduce the short-circuit power dissipation [34]. One application is the dual-$V_{dd}$ FPGA fabric [21]. Level-down conversions are not commonly used but are required for switching power reduction, where non-critical blocks of the circuit operate at a lower power supply voltage [11]. In the standby mode of a circuit, no active switching occurs and all power dissipation is due to standby leakage. A simple power-saving scheme could be to shut off unused blocks in the standby mode. Thus, we propose a voltage level converter that can perform all these functions: step-up, step-down, and blocking of signals (signal gating). We call it a Universal voltage Level Converter (ULC), which can find utility in the above applications. Moreover, there is a need for the design of efficient level converters that have minimal area and power overhead. The ULC could be used as a standard library cell, which could relieve the designer from some of the burden of optimizing power dissipation.

The rest of this paper is organized in the following manner. Section 2 highlights the novel contributions of this paper. Section 3 discusses prior related research. Section 4 presents the leakage, power, and delay models and discusses the proposed optimal design flow. The 24- and 20-transistor designs of the ULC are presented in Section 5. The parasitic-aware power and delay optimal design of the ULC is discussed in Section 6. Thorough characterization of the ULC is discussed in Section 7. A brief discussion of possible applications of the ULC is provided in Section 8. The paper is concluded in Section 9 with a discussion of related research and applications.


## 2 Novel Contributions of this Paper

The novel and manifold contributions of this paper to the state of the art are as follows:

- 1. The design of a key component, the Universal Level Converter (ULC), to be used for power management in multi- $V_{dd}$  based SoCs is presented.
- 2. The paper introduces a design methodology for power and delay tradeoff of the ULC circuit. A conjugate gradient based optimization algorithm is discussed for this purpose.
- 3. The ULC is capable of up-conversion, down-conversion and blocking of signals, and hence can be programmed for power management and reconfigurability.
- 4. The physical design (layout) of the ULC is presented for state of the art nanoscale CMOS technology (90 nm).
- 5. The ULC is subjected to parasitic-aware power and delay optimization using dual-oxide (dual- $T_{ox}$  or DOXCMOS) technology and the power and delay optimal physical designs are presented. It may be noted that dual- $T_{ox}$  has been used for digital circuits in the current literature; however, in this paper this technique is explored for mixed-signal circuits such as the ULC.
- 6. The entire characterization of the ULC has been performed using a parasitic extracted netlist of the physical design, hence the simulation results are of *comparable accuracy* to the silicon results.
- 7. Process variation study of the ULC is presented considering 10 important parameters to analyze its robust operation against variability and fluctuations.

#### **3 Related Prior Research**

Prior research on dual-$T_{ox}$ techniques focuses on either synthesis techniques [13,25] or architecture level optimization [20,16], and is applied to digital circuits only. The application of such optimization techniques to mixed-signal circuits at transistor level is a unique contribution of this paper.

Selected related prior research works on level converters are presented in a taxonomy chart in Fig. 1. These existing works are diverse in terms of functionality, CMOS technology node, and circuit features; implementing these circuits for the purpose of comparison is an involved engineering task. The level converters are classified by functionality as level-up conversion, level-down conversion, or level-up/level-down conversion. They are further classified by design technique as either multi-threshold or feedback based designs.



Fig. 1 A taxonomy of related level converter designs.

Two common techniques of level converter design are feedback based and multi-threshold voltage based topologies [14]. Examples of different level converter designs based on feedback techniques are presented in [11, 15, 33, 34, 18]. The performance of converters solely based on feedback circuitry, however, is significantly affected due to the increased response time from the feedback circuits [32]. Feedback based circuits also incur increased power consumption due to significant short circuit current. In [11], level-up converters and level down converters in flip-flops have been used to minimize energy and delay.

In [30], only short-circuit power dissipation is included. In [15], new level converting circuits that consume 8 - 50% less energy compared to traditional techniques are proposed. In [22], a ULC using 32 nm high- $\kappa$ /metal-gate nano-CMOS technology with dual- $V_{th}$  is discussed, but no physical-design optimization is presented. In [33], a symmetrical dual cascode voltage switch (SDCVS) is proposed which achieves 50% reduction in short-circuit power and 60% speed increase. A level-up converter using a Dual Cascode Voltage Switch (DCVS) is also presented in [34]. In [4], a design is presented that uses thicker oxide to allow for a wider range of level up conversion (0.36 V to 1.32 V). In [18], dynamic level converters that combine level conversion with logic gates (LC-LG) are presented. This design improves the power delay product and is applicable to asynchronous level conversion. A level-down converter with differential input pair operation is presented in [12].

The design of level converters using multi-threshold based techniques presents an advantage over feedback converters by significantly reducing short circuit current with the use of multi-threshold devices [32]. Selected level converters based on multi-threshold devices are presented in [32,27]. In [27], power dissipation is decreased by approximately 47% in comparison to conventional feedback based level converters.

The current paper is based on preliminary research presented in a conference publication of the authors [6]. This archival journal publication includes the following additional material to substantially enhance the presentation, scope and applicability of the ULC: (1) The optimal design flow is formally presented as an algorithm. (2) The design process with 24-transistor and 20-transistor circuits is presented. (3) The optimization algorithm is discussed in detail. (4) The ULC characterization is expanded. (5) The physical design is shown for 4 different ULC circuits. (6) The conclusions are expanded with discussion on prior research and applications.

## 4 The Proposed Methodology for Parasitic-Aware Power (Leakage) and Delay Optimal ULC Design

A high level representation of the ULC is shown in Fig. 2. The ULC is driven by an input voltage signal called $V_{in}$, two control signals S1 and S0, and two supply voltages $V_{ddl}$ ($V_{dd}$ low) and $V_{ddh}$ ($V_{dd}$ high), and provides an output voltage signal $V_{out}$. It may be noted that $V_{in}$, S1 and S0 are user inputs, whereas $V_{ddh}$ and $V_{ddl}$ are designer parameters. The values of the control signals determine the function performed and, depending on them, the input voltage $V_{in}$ is transformed to the output voltage $V_{out}$. The ULC performs the functions defined in Table 1.

## 4.1 Leakage and Power Models for ULC

Aggressive scaling of oxide thickness [3] has resulted in a significant increase in gate-oxide leakage in both active and standby modes of operation [23]. Scaling of threshold voltages with successive technology nodes has caused an alarming increase in subthreshold leakage [19]. In nano-CMOS, gate-oxide leakage grows faster than subthreshold leakage since oxide thicknesses are scaled at a much faster rate than supply or threshold voltages [3]. Gate-oxide leakage is an issue when the transistor is in either the on or off state [23], whereas subthreshold leakage is an issue only when the transistor is off.

Fig. 2 High level representation of the universal voltage level converter.

Table 1 Signal table for functionality of the proposed ULC.

| Select Signal S0 | Select Signal S1 | Type of Operation |
|------------------|------------------|-------------------|
| 0                | 0                | Block Signal      |
| 0                | 1                | Up Conversion     |
| 1                | 0                | Down Conversion   |
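The select-signal encoding of Table 1 can be captured in a small behavioral lookup, sketched here in Python. The names `ULC_OPERATIONS` and `ulc_operation` are hypothetical. Because Table 1 lists no entry for S0 = S1 = 1, that combination is treated as undefined in this sketch, which is an assumption rather than a stated behavior of the circuit.

```python
# Behavioral lookup for the ULC select signals, following Table 1.
# Keys are (S0, S1) pairs; all names here are illustrative only.
ULC_OPERATIONS = {
    (0, 0): "block",
    (0, 1): "up-conversion",
    (1, 0): "down-conversion",
}


def ulc_operation(s0: int, s1: int) -> str:
    """Return the operation selected by (S0, S1) according to Table 1."""
    try:
        return ULC_OPERATIONS[(s0, s1)]
    except KeyError:
        # Table 1 does not define S0 = S1 = 1; treated as undefined (assumption).
        raise ValueError(
            f"select combination S0={s0}, S1={s1} is not defined in Table 1"
        )
```

A testbench model could call `ulc_operation(0, 1)` to confirm that a stimulus is expected to exercise the level-up path before the waveform is inspected.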

## 4.1.1 Gate-Oxide Leakage Dissipation

Gate-oxide leakage arises due to tunneling current through the gate dielectric. The tunneling between substrate and gate can be either direct tunneling or Fowler-Nordheim tunneling. The tunneling probability of an electron is affected by the barrier height, structure and thickness of the oxide. For short channel and ultra-thin oxide transistors, Fowler-Nordheim tunneling is negligible. The gate-oxide leakage current which is due to direct tunneling current is expressed as follows [25, 29, 5]:

$$I_{\text{gate-oxide}} = \alpha WL \left(\frac{V_{ox}}{T_{ox}}\right)^2 \exp\left(\frac{-\beta \left(1 - \left(1 - \frac{V_{ox}}{\phi_{ox}}\right)^{\frac{3}{2}}\right)}{\left(\frac{V_{ox}}{T_{ox}}\right)}\right), \qquad (1)$$

where $I_{\text{gate-oxide}}$ is the direct-tunneling current, W is the width of the transistor, L is the channel length, $V_{ox}$ is the potential drop across the thin oxide, $T_{ox}$ is the oxide thickness, $\phi_{ox}$ is the barrier height for the tunneling particle (hole or electron), and $\alpha$ and $\beta$ are physical parameters. These parameters are described as follows: $\alpha = q^3 / (16\pi^2 \hbar \phi_{ox})$ and $\beta = (4\sqrt{2m_{eff}}\phi_{ox}^{1.5}) / (3\hbar q)$; q is the electronic charge, $\hbar$ is the reduced Planck constant, and $m_{eff}$ is the effective mass of the tunneling particle.

From Eqn. 1, it is evident that gate-oxide leakage depends exponentially on $T_{ox}$. This motivates the *use of different oxide thicknesses* for gate-oxide leakage reduction. In other words, some transistors are fabricated with high $T_{ox}$ and the others with low $T_{ox}$. The high-$T_{ox}$ transistors have less gate-oxide leakage dissipation but also have a larger delay (as evident from Eqn. 7) compared to the low-$T_{ox}$ transistors.
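The sensitivity of Eqn. 1 to oxide thickness can be explored numerically. The following Python sketch evaluates the expression in SI units. The barrier height of 3.1 eV, the effective-mass ratio of 0.4, and the function name `gate_oxide_current` are illustrative assumptions and are not values taken from this paper; the barrier height is converted to joules inside $\alpha$ and $\beta$ and kept in volts for the ratio $V_{ox}/\phi_{ox}$.

```python
import math

# Physical constants (SI units).
Q = 1.602176634e-19        # electronic charge [C]
HBAR = 1.054571817e-34     # reduced Planck constant [J s]
M0 = 9.1093837015e-31      # electron rest mass [kg]


def gate_oxide_current(w, l, v_ox, t_ox, phi_ox_ev=3.1, m_eff_ratio=0.4):
    """Direct-tunneling gate current of Eqn. 1, in amperes.

    w, l, t_ox are in metres and v_ox is in volts. phi_ox_ev (barrier
    height in eV) and m_eff_ratio (effective mass over electron rest
    mass) are illustrative assumptions.
    """
    phi_j = phi_ox_ev * Q                      # barrier height in joules
    alpha = Q**3 / (16 * math.pi**2 * HBAR * phi_j)
    beta = 4 * math.sqrt(2 * m_eff_ratio * M0) * phi_j**1.5 / (3 * HBAR * Q)
    e_ox = v_ox / t_ox                         # oxide field [V/m]
    shape = 1 - (1 - v_ox / phi_ox_ev) ** 1.5  # requires v_ox < phi_ox
    return alpha * w * l * e_ox**2 * math.exp(-beta * shape / e_ox)
```

Evaluating the function for the same W, L, and $V_{ox}$ at two oxide thicknesses, one twice the other, gives a much smaller current for the thicker oxide, which is the behavior the dual-$T_{ox}$ technique relies on.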


## 4.1.2 Subthreshold Leakage Dissipation

The subthreshold leakage current through a device is modeled as follows [1, 17]:

$$I_{\text{subthreshold}} = \gamma \exp\left(\frac{V_{gs} - V_{th}}{\tau v_{therm}}\right) \left(1 - \exp\left(\frac{-V_{ds}}{v_{therm}}\right)\right),\tag{2}$$

where $V_{th}$ is the threshold voltage, $\tau$ is the subthreshold swing factor, $V_{gs}$ is the gate-to-source voltage, $V_{ds}$ is the drain-to-source voltage, and $v_{therm}$ is the thermal voltage. The physical parameter $\gamma$ is calculated using the following expression:

$$\gamma = \mu_0 \left(\frac{\varepsilon_{ox}}{T_{ox}}\right) \left(\frac{W}{L}\right) v_{therm}^2 e^{1.8},\tag{3}$$

where $\mu_0$ is the zero-bias mobility and $\varepsilon_{ox}$ is the permittivity of the gate oxide.

As the subthreshold leakage current is exponentially dependent on the threshold voltage, increasing  $V_{th}$  would decrease subthreshold leakage current substantially. In addition, the threshold voltage  $V_{th}$  is affected by the gate-oxide thickness  $T_{ox}$  as described in the following expression [23]:

$$V_{th} = V_{fb} + 2\phi_F + \left(\frac{T_{ox}}{\varepsilon_{ox}}\right)\sqrt{2q\varepsilon_{Si}N_{sub}\left(2\phi_F + V_{bs}\right)},\tag{4}$$

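where $V_{fb}$ is the flat-band voltage, $\phi_F$ is the Fermi potential, $V_{bs}$ is the body bias, $\varepsilon_{Si}$ is the permittivity of silicon, $N_{sub}$ is the substrate doping concentration, and the factor $\gamma_{body} = (T_{ox}/\varepsilon_{ox})\sqrt{2q\varepsilon_{Si}N_{sub}}$ multiplying $\sqrt{2\phi_F + V_{bs}}$ is the body effect coefficient. $V_{th}$ and $T_{ox}$ have a linear relationship. Thus, an increase in $T_{ox}$ increases $V_{th}$ and consequently decreases the subthreshold leakage.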
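The exponential dependence of Eqn. 2 on $V_{th}$ can be made concrete with a short Python sketch. The prefactor $\gamma$ is normalized to 1 and held fixed (in Eqn. 3 it also varies with $T_{ox}$), and the default values for $\tau$ and the thermal voltage are illustrative assumptions, as are the function names.

```python
import math


def subthreshold_current(v_gs, v_ds, v_th, gamma=1.0, tau=1.5, v_therm=0.02585):
    """Subthreshold current of Eqn. 2.

    gamma is the prefactor of Eqn. 3 (normalized to 1 here); tau and
    v_therm (thermal voltage near 300 K, in volts) are illustrative
    assumptions.
    """
    return (gamma
            * math.exp((v_gs - v_th) / (tau * v_therm))
            * (1.0 - math.exp(-v_ds / v_therm)))


def leakage_ratio(delta_v_th, tau=1.5, v_therm=0.02585):
    """Factor by which off-state leakage falls when V_th rises by delta_v_th."""
    return math.exp(delta_v_th / (tau * v_therm))
```

For two devices that differ only in threshold voltage, the ratio of their currents from `subthreshold_current` equals `leakage_ratio` of the threshold difference, independent of $V_{gs}$ and $V_{ds}$.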

#### 4.1.3 Dynamic Power Consumption

The dynamic power consumption of a circuit is given as follows [28]:

$$P_{\rm dynamic} = \eta C_L V_{dd}^2 f,\tag{5}$$

where  $\eta$  is the activity factor,  $C_L$  is the total capacitive load,  $V_{dd}$  is the supply voltage, and f is the clock frequency.
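As a worked illustration of the quadratic dependence in Eqn. 5, consider one node switching with identical $\eta$, $C_L$, and f at two supply voltages. The values 1.0 V and 1.2 V are chosen for illustration and correspond to the $V_{ddl}$ and $V_{ddh}$ used in Section 5:

$$\frac{P_{\rm dynamic}(1.0\,\text{V})}{P_{\rm dynamic}(1.2\,\text{V})} = \left(\frac{1.0}{1.2}\right)^2 \approx 0.694,$$

that is, a supply reduction of about 17% lowers the switching power by roughly 31%.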

## 4.1.4 Total Power Dissipation of the ULC

The total power of the ULC circuit accounting for all major sources is quantified using the following expression:

$$P_{ULC} = P_{\text{gate-oxide}} + P_{\text{subthreshold}} + P_{\text{dynamic}}, \tag{6}$$

where  $P_{\text{gate-oxide}}$ ,  $P_{\text{subthreshold}}$ , and  $P_{\text{dynamic}}$  are calculated from Eqns. (1), (2), and (5), respectively.

#### 4.2 Delay Model of the ULC

The propagation delay of a CMOS circuit is described as follows [31]:

$$T_d = \kappa \left( \frac{C_L V_{dd}}{\mu \left( \frac{\varepsilon_{ox}}{T_{ox}} \right) \left( \frac{W}{L} \right) \left( V_{dd} - V_{th} \right)^{\alpha}} \right),\tag{7}$$

where $\kappa$ is a technology dependent constant, $\mu$ is the electron surface mobility, and $\alpha$ is the velocity saturation index (it varies from 1.4 to 2 for nano-CMOS and is distinct from the parameter $\alpha$ of Eqn. 1). For the ULC, which performs both up-conversion and down-conversion, the delay is defined as the maximum of the up-conversion and down-conversion delays and is given as follows:

$$T_{d_{ULC}} = \max\{T_{d_{up}}, T_{d_{down}}\},\tag{8}$$

where  $T_{d_{up}}$  and  $T_{d_{down}}$  are the level-up conversion and level-down conversion delays, respectively.
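Eqns. 7 and 8 translate directly into code. The Python sketch below uses SI units; the default values of $\kappa$, $\mu$, $\varepsilon_{ox}$, and the velocity saturation index, as well as the function names, are illustrative assumptions and not parameters of the process used in this paper.

```python
def propagation_delay(c_load, v_dd, v_th, t_ox, w, l,
                      kappa=1.0, mu=0.03, eps_ox=3.45e-11, alpha=1.5):
    """Propagation delay of Eqn. 7 (SI units).

    kappa, mu (surface mobility, m^2/Vs), eps_ox (oxide permittivity, F/m)
    and alpha (velocity saturation index) are illustrative assumptions.
    """
    drive = mu * (eps_ox / t_ox) * (w / l) * (v_dd - v_th) ** alpha
    return kappa * c_load * v_dd / drive


def ulc_delay(t_d_up, t_d_down):
    """ULC delay of Eqn. 8: the worse of the two conversion delays."""
    return max(t_d_up, t_d_down)
```

With $V_{th}$ held fixed, doubling `t_ox` doubles the value returned by `propagation_delay`. In a real device $V_{th}$ also rises with $T_{ox}$ (Eqn. 4), so the actual delay penalty of a thicker oxide is larger than this fixed-threshold sketch suggests.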

In summary, it is observed that  $T_{ox}$  and the geometry of the transistors (W and L) play a crucial role in determining the power (leakage) dissipation and delay of the ULC circuit [24,7,22] and need to be optimally chosen, which the proposed design flow and optimization algorithm perform.

#### 4.3 The Parasitic-Aware Power (Leakage)-Delay Optimal ULC Design Methodology

In order to obtain a power (including leakage) and delay optimal circuit, this paper investigates the dual-$T_{ox}$ circuit level technique for the mixed-signal ULC circuit. In this approach, the power dissipations of individual transistors are identified and the circuit is subjected to DOXCMOS techniques to reduce the total power consumption while not compromising the delay. Power and delay optimization is performed by sizing the following parameters: $T_{ox}$ and W. It may be noted that power, delay and area are three classic axes of optimization. In the automatic design optimization methodology, power and delay are estimated from simulation of the parameterized netlist. However, the area is calculated from the physical design (layout) of the circuit, which is a *time consuming manual step*. The automatic methodology is formulated based on these facts. The proposed optimal design methodology for the ULC design is outlined in Algorithm 1.

For an estimation of the parasitics present in the ULC circuit, a physical design of the baseline ULC is performed and then the netlist with parasitics is extracted. The parasitic-aware netlist obtained from this design is then parameterized for  $T_{ox}$  and W. The parameters (W and  $T_{ox}$ ) are considered to be independent of each other. The length of the transistors (L) is fixed at the nominal process length to reduce the complexity of the optimization process. The widths of all transistors involved in level-up conversion and level-down conversion are considered for optimization. The average power consumed by each transistor in the ULC is recorded during a full functional simulation, considering all functions of the ULC (up-conversion, down-conversion, and blocking).

The transistors are then ranked in the order of their power dissipation, compared to the total power dissipation. This ranking is useful to identify the power-hungry transistors which collectively consume a designer-defined percentage of total power. For effective $T_{ox}$ assignment, only these power-hungry devices are subjected to a high $T_{ox}$, and the other transistors operate at low (nominal) $T_{ox}$. In other words, the higher the power dissipation of a transistor, the thicker is the gate oxide assigned to it, and the higher the priority to get thicker oxide. As thick oxide assignment leads to a larger delay [17], a selective $T_{ox}$ assignment is essential so that the delay of the ULC is not compromised. To perform the optimization of power and delay a conjugate-gradient based optimization algorithm is presented in Section 6 [7]. Once the optimal values of $T_{ox}$ and W are obtained using this algorithm (presented in Algorithm 2), the final physical design of the ULC is performed and characterized. This is followed by a process variation analysis of the final ULC design to ensure that the voltage conversion process is process-variation tolerant.

#### Algorithm 1 Parasitic-aware, power (leakage) - delay optimized ULC design methodology

- 1: Design and simulate level-up conversion circuit.
- 2: Design and simulate level-down conversion circuit.
- 3: Stitch the partial circuits to design the overall ULC.
- 4: Perform functional simulation of ULC to verify level-up, level-down, and block operations.
- 5: Perform transistor reduction by eliminating any redundancy.
- 6: Perform physical design of the baseline ULC circuit that uses nominal L, W, $T_{ox}$.
- 7: Perform characterization of the parasitic (RC) extracted baseline physical design.
- 8: Obtain parasitic-aware netlist from the parasitic extracted physical design.
- 9: Parameterize the netlist for transistor width (W) and gate-oxide thickness ($T_{ox}$).
- 10: Rank the individual transistors of the ULC according to total power dissipation, including leakage.
- 11: Identify the power-hungry transistors which collectively consume a designer-defined percentage of total power.
- 12: Call the conjugate gradient algorithm (presented in Algorithm 2) to select optimal $T_{ox}$ for power-hungry transistors and W for all transistors.
- 13: Assign high-$T_{ox}$ to the power-hungry transistors and new W to all transistors.
- 14: Perform final physical design with new transistor sizes.
- 15: Obtain the parasitic aware netlist of the final physical design of the ULC.
- 16: Perform parametric power and load characterization of the final physical design.
- 17: Perform process variation analysis to ensure robustness of the final physical design of ULC.
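The ranking and selection of power-hungry transistors (steps 10 and 11 of Algorithm 1) can be sketched in a few lines of Python. The function name `power_hungry`, the transistor labels, and the power figures used to exercise it are hypothetical; in the actual flow the per-transistor powers come from simulation of the parasitic-aware netlist.

```python
def power_hungry(powers, fraction=0.5):
    """Return the top-ranked transistors whose combined power first
    reaches `fraction` of the total.

    powers: dict mapping a transistor name to its average power (any unit).
    """
    total = sum(powers.values())
    # Rank transistors from the highest to the lowest power dissipation.
    ranked = sorted(powers.items(), key=lambda kv: kv[1], reverse=True)
    selected, running = [], 0.0
    for name, p in ranked:
        if running >= fraction * total:
            break
        selected.append(name)
        running += p
    return selected
```

With a designer-defined fraction of 0.5, the devices returned by this sketch are the candidates for high-$T_{ox}$ assignment, while the remaining devices keep the nominal oxide.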

### 5 Design of the Proposed Universal Voltage-Level Converter (ULC)

Fig. 3 presents the ULC with the logical floor plan and placement of the overall circuit. The circuit has been designed with built-in programmability using switches. In the rest of this section, the baseline and the reduced transistor designs of the ULC are presented, followed by the power-delay optimal design. This is followed by the full characterization of the optimized ULC design.

#### 5.1 Level-Up Conversion Circuit

The level-up conversion circuit is shown in Fig. 4. It converts a signal at a lower voltage level $(V_{ddl})$ to a higher voltage level $(V_{ddh})$. A cross coupled level converting (CCLC) circuit is used to achieve the up conversion functionality [10]. The CCLC circuit is an asynchronous level converter, which means that it can be inserted anywhere in the circuit where level conversion is necessary. Because of this flexibility, CCLC is one of the most commonly used designs [11]. In this circuit, there are two cross-coupled PMOS transistors that form the circuit load. Thus, when the output at one side is pulled low, the opposite PMOS transistor will be turned on. The output on that side will be pulled high. Below the PMOS load, there are two NMOS transistors that are controlled by the input signal $V_{in}$. The cross-coupled PMOS pair acts as a differential pair and the two NMOS transistors help in the operation of this pair.

Fig. 3 Schematic logical block diagram of the ULC.



Fig. 4 Level-up conversion circuit with baseline sizes indicated for a 90 nm CMOS technology node.

#### 5.2 Level-Down Conversion Circuit

The level-down conversion circuit is responsible for converting the high voltage  $(V_{ddh})$  input signal to a lower voltage level  $(V_{ddl})$ . Fig. 5 shows the circuit diagram of a differential input level-down converter with an inverter as an output buffer. A differential pair level converter is used for achieving the functionality of level down conversion [10]. The differential input enables high speed use and a stable operation at low voltages, compared to conventional level-down converters consisting of several cascaded inverters [12]. The differential input also provides high noise immunity compared to conventional level-down converters.

#### 5.3 The 24-Transistor Baseline ULC Circuit Design

Fig. 5 Level-down converter circuit with baseline sizes indicated for a 90 nm CMOS technology node.

The 24-transistor baseline circuit design of the ULC is shown in Fig. 6. Switches constructed using transmission gates are cascaded to the level-up converter and the level-down converter. The output of the level converters is controlled by the switches S0 and S1. The average power consumption of this 24-transistor baseline ULC design is 97.72 $\mu$W for $V_{ddl} = 1.0$ V and $V_{ddh} = 1.2$ V. The $V_{ddl}$ and $V_{ddh}$ values are based on the mid-point range of [32].



Fig. 6 Transistor level realization of the 24-transistor design of the ULC. The 5 circled transistors collectively consume approximately 50% of the total power.

The physical design of the 24-transistor baseline ULC is shown in Fig. 7. The physical design of the ULC has been performed using a generic 90 nm salicide 1.2V/2.5V/1P/9M process design kit. In this layout it is necessary to supply both  $V_{ddh}$  and  $V_{ddl}$  to the cell. The two supply rails travel side-by-side to provide the two voltages. Such a layout does not comply with conventional power routing, but is more robust [11]. The post-parasitic re-simulations matched the simulation results of the schematic level simulation. The use of additional vias has been made in the design wherever possible to improve fault tolerance [9]. The metal lines have been spread out wherever possible to control the capacitance and crosstalk. This approach is followed for all the physical designs of the ULC presented in this paper.



Fig. 7 Physical design of the baseline 24-transistor ULC for 90 nm CMOS.

## 5.4 The 20-Transistor ULC Circuit Design

The number of transistors of the ULC is reduced to 20 by eliminating some transistors of the level-up conversion (Fig. 4) and level-down conversion (Fig. 5) circuitry. Also, the level-down conversion has been buffered, which is not the case in the baseline design. The 20-transistor circuit of the ULC is shown in Fig. 8. The 20-transistor physical design of the ULC is shown in Fig. 9. The functional simulation of the proposed ULC is shown in Fig. 10. The overall ULC circuit is tested for functionality and characterized through parametric, load, and power analysis. For  $V_{ddl} = 1.0$  V and  $V_{ddh} = 1.2$  V, the average power consumption of the 20-transistor nominal ULC is 73.70  $\mu$ W.
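As a quick check on the benefit of transistor reduction alone, the two average power figures quoted for $V_{ddl} = 1.0$ V and $V_{ddh} = 1.2$ V (97.72 $\mu$W for the 24-transistor baseline and 73.70 $\mu$W for the 20-transistor nominal design) give

$$\frac{97.72 - 73.70}{97.72} = \frac{24.02}{97.72} \approx 0.246,$$

that is, a reduction of about 24.6% in average power before any $T_{ox}$ or W optimization is applied.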



Fig. 8 Transistor level realization of the 20-transistor design of the ULC. The 4 circled transistors collectively consume 50% of the total power consumption.



Fig. 9 Physical design of the 20-transistor ULC for 90 nm CMOS.



**Fig. 10** Functional simulation of the 20-transistor ULC. This waveform follows the truth table given in Table 1. The sequence of operations is block, step-down, and step-up. The 24-transistor ULC produces the same functional simulation result, which is not presented for brevity.

## 6 Optimization of the Proposed ULC

In this section, the optimization step of Algorithm 1 is presented to select  $T_{ox}$  and W such that the ULC circuit is optimized. First the conjugate-gradient-based algorithm is discussed, and is followed by the optimal physical design.

## 6.1 Conjugate-Gradient Based Optimization Algorithm

The physical design of the unoptimized ULC circuit is performed and the parasitic-aware (RC) netlist is obtained. This netlist is parameterized for W and $T_{ox}$ of the various transistors. To minimize power, the power-hungry transistors are identified by measuring the power consumed by each transistor of the circuit. The power estimation includes dynamic power, subthreshold leakage, and gate-oxide leakage, as presented in Section 4. A conjugate-gradient based algorithm (presented in Algorithm 2) is used to obtain the optimal $T_{ox}$ and W values for the ULC circuit.

The conjugate-gradient method is an algorithm for the numerical solution of systems of linear equations whose matrix is symmetric and positive-definite. The main advantages of the conjugate gradient method are its low memory requirements, and its convergence speed [8]. This benefits the optimization process for mixed-signal circuits such as the ULC which may have very complex netlists when extracted with parasitics.
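For reference, the classical linear form of the method described in the preceding paragraph can be written in pure Python as below. This is a textbook sketch for a small symmetric positive-definite system and is not the circuit-level optimizer of Algorithm 2, which drives simulations of a nonlinear objective; the function name and tolerances are assumptions.

```python
def conjugate_gradient(a, b, tol=1e-10, max_iter=100):
    """Solve a x = b for a symmetric positive-definite matrix `a`
    (given as a list of lists) with the linear conjugate-gradient method."""
    n = len(b)

    def matvec(v):
        return [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = [0.0] * n
    r = list(b)            # residual b - a x for the initial guess x = 0
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = matvec(p)
        step = rs_old / dot(p, ap)
        x = [xi + step * pi for xi, pi in zip(x, p)]
        r = [ri - step * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        # The next direction is conjugate to the previous ones with respect to a.
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

Only a handful of vectors of length n are stored between iterations, which reflects the low memory requirement mentioned above.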

Algorithm 2 Conjugate-Gradient algorithm for DOXCMOS-based power and delay optimization of ULC.

- 1: **Input:** Parasitic-extracted netlist of the unoptimized ULC layout with nominal L, W, and $T_{ox}$, Target Objective Set $F_{target} = [P_{ULC}, T_{d_{ULC}}]$, Termination Criterion S, Design Variable Set $D = [T_{oxNMOS}, T_{oxPMOS}, W_{NMOSdown}, W_{PMOSdown}, W_{NMOSup}, W_{PMOSup}]$, Lower Design Constraint $C_{lower}$, Upper Design Constraint $C_{upper}$.
- 2: **Output:** $F_{optimized}$ and $D_{optimal}$ for specified stopping criterion $S \le \varepsilon$ (where $\varepsilon$ is designer defined error margin) and resulting optimal ULC.
- 3: Perform first iteration with initial guess of $D = D_0$.
- 4: **while** $(C_{lower} < D < C_{upper})$ **do**
- 5: Using conjugate gradients generate new design guesses $D^*$ in the range from $(D - \Delta D)$ to $(D + \Delta D)$, based on design error margin by simultaneously varying design parameters as: (a) $(T_{ox} - \Delta T_{ox})$ to $(T_{ox} + \Delta T_{ox})$ for power-hungry transistors, and (b) $(W - \Delta W)$ to $(W + \Delta W)$ for all transistors.
- 6: Compute $F(D^*) = [P_{ULC}, T_{d_{ULC}}]$.
- 7: Compute $S = F_{target} - F(D^*)$.
- 8: **if** $(S \leq \varepsilon)$ **then**
- 9: {Stopping criterion is in the error margin.}
- 10: **return** $D_{optimal} = D^*$.
- 11: **end if**
- 12: **end while**
- 13: Using $D = D_{optimal}$, design and simulate ULC.
- 14: Compute optimized objective set $F_{optimized} = F(D_{optimal})$ for the ULC.

The inputs to the algorithm comprise the parasitic-extracted netlist, the target objective set $F_{target}$ with a termination or stopping criterion (S), and the design variable set (D) with its lower design constraint $C_{lower}$ and upper design constraint $C_{upper}$. The design variable set (D) for the ULC consists of the following variables, which are varied for optimization:

- Power-hungry NMOS transistors $T_{ox}$ ($T_{oxNMOS}$).
- Power-hungry PMOS transistors $T_{ox}$ ($T_{oxPMOS}$).
- NMOS device width for down converter ($W_{NMOSdown}$).
- PMOS device width for down converter ($W_{PMOSdown}$).
- NMOS device width for up converter ($W_{NMOSup}$).
- PMOS device width for up converter ($W_{PMOSup}$).

The lower design constraint $C_{lower}$ is $(D - \Delta D)$, i.e. $(T_{ox} - \Delta T_{ox})$ and $(W - \Delta W)$. The upper design constraint $C_{upper}$ is $(D + \Delta D)$, i.e. $(T_{ox} + \Delta T_{ox})$ and $(W + \Delta W)$. The precision of these design parameters depends on the designer. The smaller the precision (or the smaller the steps of the increment) of the design parameters, the slower the algorithm converges. *S* is the stopping criterion for the optimization to terminate within $\pm \varepsilon$ of the target objective set $(F_{target})$ (where $\varepsilon$ is a designer specified error margin). *F* is the objective set, which for the case of the ULC is $P_{ULC}$ and $T_{d_{ULC}}$.

The outputs of the algorithm are the optimized objective set $F_{optimized}$ satisfying the termination criterion S ($F_{optimized} = F_{target} \pm S$), and the optimal values of the design variable set $D_{optimal}$ within the upper and lower design constraints. Once the optimal values of W and $T_{ox}$ are obtained, the final layout is constructed.
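The iteration described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `simulate` is a hypothetical analytical stand-in for the SPICE run on the parasitic-extracted netlist (its coefficients and the target values are invented for illustration), the design vector is reduced to two variables ($T_{ox}$ and W), the two objectives are folded into a sum of squared relative errors, and a Polak-Ribière conjugate-gradient update with backtracking and clipping to $[C_{lower}, C_{upper}]$ is only one possible reading of the "conjugate gradients" step.

```python
import math


def simulate(d):
    """Hypothetical surrogate for the netlist simulation (NOT Eqns. (1)-(8)).

    d = [T_ox in nm, W in um]; returns (power in uW, delay in ps).
    """
    tox, w = d
    power = 40.0 * w * math.exp(-(tox - 2.33)) + 10.0 * w
    delay = 30.0 * tox / w
    return power, delay


def rel_errors(d, target):
    # Relative deviation of each objective from its target.
    return [(f - t) / t for f, t in zip(simulate(d), target)]


def max_rel_error(d, target):
    # Plays the role of the stopping quantity S compared against epsilon.
    return max(abs(e) for e in rel_errors(d, target))


def cost(d, target):
    return sum(e * e for e in rel_errors(d, target))


def gradient(d, target, h=1e-6):
    # Central finite differences, since the simulator is a black box.
    g = []
    for i in range(len(d)):
        up, dn = list(d), list(d)
        up[i] += h
        dn[i] -= h
        g.append((cost(up, target) - cost(dn, target)) / (2.0 * h))
    return g


def clip(d, lower, upper):
    # Keep every design variable inside [C_lower, C_upper].
    return [min(max(x, lo), hi) for x, lo, hi in zip(d, lower, upper)]


def optimize(d0, target, lower, upper, eps=0.05, max_iter=200):
    d = clip(d0, lower, upper)
    g = gradient(d, target)
    p = [-gi for gi in g]
    for iteration in range(max_iter):
        if max_rel_error(d, target) <= eps:
            return d, iteration, True
        if sum(pi * gi for pi, gi in zip(p, g)) >= 0.0:
            p = [-gi for gi in g]  # restart with steepest descent
        f0, step = cost(d, target), 1.0
        while step > 1e-12:  # backtracking: accept the first improving step
            trial = clip([x + step * pi for x, pi in zip(d, p)], lower, upper)
            if cost(trial, target) < f0:
                break
            step *= 0.5
        else:
            return d, iteration, False  # no improving step was found
        g_new = gradient(trial, target)
        denom = sum(gi * gi for gi in g)
        beta = 0.0
        if denom > 0.0:  # Polak-Ribiere coefficient, clamped at zero
            num = sum(gn * (gn - gi) for gn, gi in zip(g_new, g))
            beta = max(0.0, num / denom)
        p = [-gn + beta * pi for gn, pi in zip(g_new, p)]
        d, g = trial, g_new
    return d, max_iter, max_rel_error(d, target) <= eps


# Bounds: T_ox in nm (NMOS row of Table 2) and W in um (120 nm to 1 um).
lower, upper = [2.563, 0.12], [4.66, 1.0]
target = (9.0, 260.0)  # hypothetical targets for the surrogate model
d_opt, iterations, converged = optimize([3.0, 0.5], target, lower, upper)
print(d_opt, iterations, converged, simulate(d_opt))
```

In a real flow, `simulate` would launch the circuit simulator on the extracted netlist, so each cost evaluation is expensive and the number of iterations matters far more than in this toy model.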

The algorithm starts with an initial guess of D ($D_0$) and then iterates, improving the guess each time, until the target objective set $F_{target}$ is met within the termination criterion S. For example, $T_{ox}$ is varied from 110% to 200% of its nominal value in steps of 0.5 nm, where nominal $T_{oxNMOS} = 2.33$ nm and nominal $T_{oxPMOS} = 2.48$ nm. The width of the transistors is varied from 120 nm to 1 $\mu$m, in steps of 10 nm. All transistors are assumed to have an effective length of 100 nm. For this experiment, $\varepsilon$ is chosen as 5%.
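As a quick check of the design constraints, the short sketch below derives the $T_{ox}$ bounds from the nominal values and the 110% to 200% range quoted above, and counts the candidate widths in the 120 nm to 1 $\mu$m range. The helper names (`tox_bounds`, `width_candidates`) are illustrative and do not come from the paper.

```python
# Nominal oxide thicknesses (nm) and sweep ranges quoted in the text.
NOMINAL_TOX_NM = {"T_oxNMOS": 2.33, "T_oxPMOS": 2.48}
TOX_RANGE = (1.10, 2.00)        # 110% to 200% of nominal
WIDTH_RANGE_NM = (120, 1000)    # 120 nm to 1 um
WIDTH_STEP_NM = 10


def tox_bounds(nominal_nm):
    """Return (C_lower, C_upper) in nm for one oxide thickness."""
    lo, hi = TOX_RANGE
    return round(lo * nominal_nm, 3), round(hi * nominal_nm, 3)


def width_candidates():
    """Return all candidate widths in nm, both end points included."""
    lo, hi = WIDTH_RANGE_NM
    return list(range(lo, hi + 1, WIDTH_STEP_NM))


for name, nominal in NOMINAL_TOX_NM.items():
    print(name, tox_bounds(nominal))
print(len(width_candidates()), "candidate widths")
```

The computed bounds agree with the $C_{lower}$ and $C_{upper}$ columns reported for $T_{oxNMOS}$ and $T_{oxPMOS}$ in Tables 2 and 3.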

### 6.2 The 24-Transistor ULC Circuit

For the 24-transistor ULC optimization, the optimal values of the circuit parameters are given in Table 2. A preliminary power analysis of the ULC design identifies the circled transistors in Fig. 6 as the power-hungry transistors. The optimized values of delay and power (as in Section 5) at $V_{ddl} = 1.0$ V, $V_{ddh} = 1.2$ V are obtained as follows:

- Optimized average power ( $P_{ULC}$ ) = 17.67  $\mu$ W.
- Delay of up conversion  $(T_{d_{up}}) = 147.72$  ps.
- Delay of down conversion  $(T_{d_{down}}) = 136.5$  ps.
- Delay of ULC  $(T_{d_{ULC}}) = 147.72 \text{ ps.}$

The algorithm achieves 81.9% power savings and 86.6% delay savings as compared to the baseline design. The DOXCMOS optimal physical design is presented in Fig. 11. The conjugate-gradient optimization converged in 8 iterations, with each iteration typically lasting 4 minutes.

Table 2 Optimal parameters for 24-Transistor DOXCMOS ULC.

| D              | $C_{lower}$ | $C_{upper}$ | $D_{optimal}$ |
|----------------|-------------|-------------|---------------|
| $T_{oxNMOS}$   | 2.563 nm    | 4.66 nm     | 2.667 nm      |
| $T_{oxPMOS}$   | 2.728 nm    | 4.96 nm     | 3.624 nm      |
| $W_{PMOSup}$   | 120 nm      | 1 μm        | 220 nm        |
| $W_{NMOSup}$   | 120 nm      | 1 μm        | 430 nm        |
| $W_{PMOSdown}$ | 120 nm      | 1 μm        | 300 nm        |
| $W_{NMOSdown}$ | 120 nm      | 1 μm        | 120 nm        |

### 6.3 The 20-Transistor ULC Circuit

For the 20-transistor ULC optimization, the optimal values of the circuit parameters are given in Table 3. Fig. 8 shows the power hungry transistors for the design. The optimized values of the delay and power at  $V_{ddl} = 1.0 \text{ V}$ ,  $V_{ddh} = 1.2 \text{ V}$  are obtained as follows:

- Optimized average power ($P_{ULC}$) = 12.26 $\mu$W.
- Delay of up conversion $(T_{d_{up}}) = 113.8$ ps.
- Delay of down conversion $(T_{d_{down}}) = 108.8$ ps.
- Delay of ULC $(T_{d_{ULC}}) = 113.8$ ps.

Fig. 11 DOXCMOS physical design of the power and delay optimal 24-transistor ULC for 90 nm CMOS.

The algorithm achieves 87.5% power savings and 89.5% delay savings as compared to the baseline design. The DOXCMOS optimal physical design is presented in Fig. 12. In this case, the optimization algorithm converged in 8 iterations, each lasting approximately 4 minutes.

Table 3 Optimal parameters for 20-Transistor DOXCMOS ULC.

| D              | $C_{lower}$ | $C_{upper}$ | $D_{optimal}$ |
|----------------|-------------|-------------|---------------|
| $T_{oxNMOS}$   | 2.563 nm    | 4.66 nm     | 2.617 nm      |
| $T_{oxPMOS}$   | 2.728 nm    | 4.96 nm     | 4.997 nm      |
| $W_{PMOSup}$   | 120 nm      | 1 μm        | 380 nm        |
| $W_{NMOSup}$   | 120 nm      | 1 μm        | 365 nm        |
| $W_{PMOSdown}$ | 120 nm      | 1 μm        | 120 nm        |
| $W_{NMOSdown}$ | 120 nm      | 1 μm        | 550 nm        |

### 6.4 Discussion of the results

The power and delay results of the various versions of the ULC circuit are summarized in Table 4. As is evident from the table, the optimization approach results in a significant reduction in both power (including leakage) and delay. For the 24-transistor DOXCMOS ULC and the 20-transistor DOXCMOS ULC, the power savings are 81.9% and 87.5%, respectively. This is due to the judicious use of oxide thickness and transistor size through their optimal selection using the algorithm. As is evident from Eqns. (1) - (8), both oxide thickness and transistor size affect the power dissipation and delay of the ULC. First, the increase in oxide thickness ($T_{oxNMOS}$ and $T_{oxPMOS}$) reduces power dissipation. This reduction is further supplemented by the reduction in power dissipation due to the decrease of the widths of the transistors ($W_{PMOSup}$, $W_{NMOSup}$, $W_{PMOSdown}$, and $W_{NMOSdown}$).

Fig. 12 DOXCMOS physical design of the power and delay optimal 20-transistor ULC for 90 nm CMOS.

Table 4 Results of optimization for different ULC circuits.

| ULC Circuits     | $P_{ULC}$ | $P_{ULC}$ Savings | $T_{d_{ULC}}$ | $T_{d_{ULC}}$ Reduction | Area                  | Area Savings |
|------------------|-----------|-------------------|---------------|-------------------------|-----------------------|--------------|
| 24T baseline ULC | 97.72 μW  | -                 | 1058.0 ps     |                         | 146.5 μm<sup>2</sup> | -            |
| 20T baseline ULC | 73.70 μW  | 25.0 %            | 894.0 ps      |                         | 118.6 μm<sup>2</sup> | 19.0 %       |
| 24T DOXCMOS ULC  | 17.67 μW  | 81.9 %            | 142.7 ps      |                         | 141.5 μm<sup>2</sup> | 3.4 %        |
| 20T DOXCMOS ULC  | 12.26 μW  | 87.5 %            | 113.8 ps      |                         | 115.7 μm<sup>2</sup> | 21.0 %       |
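The savings columns of Table 4 can be recomputed from the tabulated absolute values. The sketch below assumes that savings are taken relative to the 24-transistor baseline, an assumption that reproduces the tabulated power and area savings of the two DOXCMOS designs. The dictionary layout and the function names are illustrative only.

```python
# Power (uW) and area (um^2) values copied from Table 4.
TABLE_4 = {
    "24T baseline ULC": {"power_uW": 97.72, "area_um2": 146.5},
    "20T baseline ULC": {"power_uW": 73.70, "area_um2": 118.6},
    "24T DOXCMOS ULC": {"power_uW": 17.67, "area_um2": 141.5},
    "20T DOXCMOS ULC": {"power_uW": 12.26, "area_um2": 115.7},
}
REFERENCE = "24T baseline ULC"  # assumed reference design for all savings


def percent_saving(reference, value):
    """Saving of `value` relative to `reference`, in percent."""
    return 100.0 * (reference - value) / reference


def savings(design, metric):
    """Saving of one design on one metric relative to the reference design."""
    return percent_saving(TABLE_4[REFERENCE][metric], TABLE_4[design][metric])


for design in TABLE_4:
    if design != REFERENCE:
        print(design,
              round(savings(design, "power_uW"), 1),
              round(savings(design, "area_um2"), 1))
```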

The results presented in Table 4 are due to different aspects: the number of transistors, the oxide thickness, and the device geometry. It may be noted that while all transistors contribute to the power (including leakage) dissipation, not all transistors contribute to the delay. Only the transistors that propagate signals in the critical path contribute to delay. It has also been observed that the results differ depending on whether the channel length *L* is changed to keep the $(L/T_{ox})$ ratio constant or is left unchanged [25]. The threshold voltage $V_{th}$, which has an exponential effect on subthreshold leakage, depends on $T_{ox}$. The picture is further complicated by the sizing of *W*. Thus, in summary, it is difficult to isolate the effect of one parameter on the overall power (including leakage) dissipation and delay.

A comparative perspective of the proposed ULC and design methodology along with selected related research works on level converters is presented in Table 5. These existing works are presented for a broad perspective, without direct comparison. Such a direct comparison may not be fair due to various differences among the designs, including topology and technology node.

Table 5 Comparative perspective with selected related prior research.

| Research      | Technology   | Power Dissipation | Delay    | Conversion Type              | Design Approach                       |
|---------------|--------------|-------------------|----------|------------------------------|---------------------------------------|
| Ishihara [11] | 130 nm       | -                 | 127 ps   | Level-up and down            | Level converting flip flops           |
| Kulkarni [15] | 130 nm       | -                 | -        | Level-up                     | DCVS and Keeper transistor            |
| Yu [33]       | 350 nm       | 220.57 μW         | -        | Level-up                     | SDCVS                                 |
| Sadeghi [30]  | 100 nm       | 10 μW             | 1 ns     | Level-up                     | Pass transistor and Keeper transistor |
| Kanno [12]    | 140 nm       | -                 | 5 ns     | Level-down                   | Differential input pair operation     |
| Yuan [34]     | 180 nm       | -                 | -        | Level-up                     | DCVS                                  |
| Naik [27]     | 180 nm       | 158.92 μW         | 177 ps   | Level-up                     | Multi-threshold                       |
| Tawfik [32]   | 180 nm       | 4.53 μW           | 137 ps   | Level-up                     | Multi-threshold                       |
| Mohanty [22]  | 32 nm High-κ | 5 μW              | 1.6 ns   | Level-up/Level-down          | Multi-threshold                       |
| This Paper    | 90 nm        | 12.26 μW          | 113.8 ps | Level-up, Level-down, Block  | DOXCMOS and Programmability           |

## 7 Characterization of the Parasitic-Aware Power (Leakage) and Delay Optimal ULC

In low-power designs, it is essential to consider various constraints concerning power consumption, lower-voltage level, and fan-out. The design of the proposed ULC mainly focuses on low-power multi-voltage circuit applications. Thus, it is essential to consider these design issues. The ULC circuit is characterized by performing three types of analysis: (1) Parametric, (2) Load, and (3) Power. The characterization results for only the 20-transistor power and delay optimal ULC are presented for brevity. These results are obtained from the parasitic netlist extracted from the physical design of the ULC, making them accurate and comparable to experimental or silicon results.

### 7.1 Parametric Analysis

Parametric analysis of the ULC is performed to demonstrate that the ULC provides a stable output voltage, even when the input voltage fluctuates. In the parametric analysis, a transient analysis is performed wherein the output voltage is observed for a varying input voltage. For the down conversion parametric analysis, the values for the control signals (*S*1, *S*0) are kept at (0, 1). In this case, $V_{in}$ is varied from 0.1 V to 1.2 V ($V_{ddh}$) in steps of 0.1 V. The output plot for the level-down conversion parametric analysis is shown in Fig. 13(a). As the plot shows, the ULC produces a stable output even for $V_{in}$ as low as 0.6 V during down conversion. However, operating the ULC as a down converter in the 0.6 V to 1.0 V range is inefficient because an input voltage below $V_{ddl}$ leads to increased power consumption.



Fig. 13 Parametric analysis with input voltage sweep. The ULC provides stable (a) down-conversion and (b) up-conversion for minimum  $V_{in} = 0.6$  V.

For testing the level-up conversion of the ULC,  $V_{in}$  is varied from 0.1 V to 1.0 V ( $V_{ddl}$ ) with an increasing step of 0.1 V. The value of control signals (S1, S0) is kept at (1,0) to achieve the level up conversion functionality. The output signal is observed at the output terminal  $V_{out}$  of the ULC. The plot for the parametric analysis for up conversion is shown in Fig. 13(b). Thus, the ULC produces a stable output even for  $V_{in}$  as low as 0.6 V.
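The two input sweeps described in this subsection can be written down compactly as a configuration sketch. The control-signal settings, voltage ranges, step size, and the 0.6 V stability limit are taken from the text. The dictionary layout and the helper `sweep_points` are illustrative assumptions and are not part of any simulator interface.

```python
V_DDL, V_DDH = 1.0, 1.2   # supply levels in volts
STEP_V = 0.1              # sweep step in volts
V_IN_MIN_STABLE = 0.6     # lowest input with a stable output (Fig. 13)

# (S1, S0) control settings and upper sweep limit for each mode.
SWEEPS = {
    "down": {"controls": (0, 1), "v_max": V_DDH},
    "up": {"controls": (1, 0), "v_max": V_DDL},
}


def sweep_points(v_max, step=STEP_V):
    """Input voltages from `step` up to `v_max`, rounded to 0.1 V."""
    count = round(v_max / step)
    return [round(k * step, 1) for k in range(1, count + 1)]


for mode, cfg in SWEEPS.items():
    points = sweep_points(cfg["v_max"])
    stable = [v for v in points if v >= V_IN_MIN_STABLE]
    print(mode, cfg["controls"], points, "stable for:", stable)
```

Building the sweep from integer multiples of the step avoids the rounding drift that repeated floating-point addition of 0.1 would introduce.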

### 7.2 Load Analysis

When the ULC is used as an interface between two circuits or islands operating at different voltage levels, situations may arise where the output load of the level converting circuit is changing often. In another scenario, different ULCs placed in different parts of the SoC can be subjected to different loads. Thus, it is important that the ULC produces the desired results under varying load conditions. A load analysis on the ULC is performed where the output load capacitance of the circuit is varied and its effect on the output signal is observed. During the load analysis, the capacitive load (modeled with capacitors) at the output is varied from 10 fF to 200 fF. From this analysis it can be concluded that the ULC produces a stable output under varying load conditions. The output plot for the load analysis on the complete ULC is shown in Fig. 14.

### 7.3 Power Analysis

Power analysis determines the total power consumed (Eqn. (6)) by the ULC circuit. During the power analysis, the total power (including leakage) dissipation of the ULC is calculated at three different loads of 10 fF, 45 fF and 90 fF. The operating conditions are: $V_{ddh} = 1.2$ V, $V_{ddl} = 1.0$ V. The instantaneous power plot of the level converter at a capacitive load of 45 fF is shown in Fig. 15. The power peaks are different at different times, as different parts of the circuit are operational depending on the functionality. The power estimation results are reported in Table 6. It is observed that there is no significant difference in the average power dissipation with varying capacitive loads. This is due to the small time window during which the current flows, which is evident from the instantaneous power plot; at the same time, the leakage component of the circuit is not affected by the capacitive load. Also, the effects of the varying capacitive load are seen at the output, whereas all transistors also contribute to the dynamic power.

Fig. 14 Performance of the DOXCMOS ULC under varying output capacitive load (10 fF to 200 fF).

Table 6 Power consumption of the ULC for different loads.

| Capacitive Load (fF) | Average Power (including leakage) Dissipation ( $\mu$ W) |
|----------------------|----------------------------------------------------------|
| 10                   | 11.3                                                     |
| 45                   | 12.26                                                    |
| 90                   | 13.6                                                     |

### 7.4 Process-Variation Analysis of ULC to Demonstrate Reliable Voltage Conversion Under Fluctuations

Process variation is an important effect for nano-CMOS based circuit design which severely affects yield. Thus, we analyze the performance of the ULC against such variations in this section. The *10 process parameters considered for statistical process variation* are as follows: (1) $T_{oxNMOS}$ (nm), (2) $T_{oxPMOS}$ (nm), (3) $L_{NMOS}$: NMOS transistor channel length (nm), (4) $L_{PMOS}$: PMOS transistor channel length (nm), (5) $W_{NMOSup}$, (6) $W_{PMOSup}$, (7) $W_{NMOSdown}$, (8) $W_{PMOSdown}$, (9) $N_{chn}$: NMOS channel doping concentration ($cm^{-3}$), (10) $N_{chp}$: PMOS channel doping concentration ($cm^{-3}$). These parameters are not all independent: a correlation coefficient of 0.9 between $T_{oxNMOS}$ and $T_{oxPMOS}$ is assumed. Each of the input parameters is assumed to have a normal distribution, with mean ($\mu$) equal to the nominal value specified in the technology file for the generic 90 nm process design kit used. The nominal values for the transistor length and widths are the same as the baseline design values shown in Fig. 8. The 3-sigma ($3\sigma$) values are equal to 10% of the mean, where $\sigma$ is the standard deviation. For 1000 Monte-Carlo runs, $V_{outdown}$ is observed to have a uniform distribution with $\mu = 1$ V and $\sigma = 202.53$ $\mu$V. $V_{outup}$ is observed to have a uniform distribution with $\mu = 1.2$ V and $\sigma = 112.27$ $\mu$V. Thus, the DOXCMOS based optimal ULC is process variation tolerant and can produce reliable voltage levels even in the presence of process variations, which are a major concern for nano-CMOS.

Fig. 15 Instantaneous power plot of the DOXCMOS ULC at a load capacitance of 45 fF. Average power consumed = 12.26 $\mu$W.
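The sampling scheme described above (normal distributions with $3\sigma$ equal to 10% of the mean, and a correlation coefficient of 0.9 between the two oxide thicknesses) can be sketched as follows. This is only an illustration of how such input samples might be drawn and does not reproduce the simulator's Monte-Carlo engine. Only the two $T_{ox}$ parameters are sampled, because their nominal values are the only ones quoted in the text, and the seed and helper names are arbitrary.

```python
import math
import random

NOMINAL_NM = {"T_oxNMOS": 2.33, "T_oxPMOS": 2.48}  # nominal means (nm)
RHO = 0.9    # assumed correlation between the two oxide thicknesses
RUNS = 1000  # number of Monte-Carlo runs


def sigma_of(mean):
    """Standard deviation such that 3*sigma equals 10% of the mean."""
    return 0.10 * mean / 3.0


def sample_tox_pair(rng):
    """Draw one correlated (T_oxNMOS, T_oxPMOS) sample in nm."""
    z1 = rng.gauss(0.0, 1.0)
    # Mix an independent normal with z1 to obtain correlation RHO.
    z2 = RHO * z1 + math.sqrt(1.0 - RHO ** 2) * rng.gauss(0.0, 1.0)
    mean_n, mean_p = NOMINAL_NM["T_oxNMOS"], NOMINAL_NM["T_oxPMOS"]
    return mean_n + sigma_of(mean_n) * z1, mean_p + sigma_of(mean_p) * z2


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


rng = random.Random(2012)  # fixed seed so the run is repeatable
samples = [sample_tox_pair(rng) for _ in range(RUNS)]
tox_n = [s[0] for s in samples]
tox_p = [s[1] for s in samples]
print("mean T_oxNMOS (nm):", sum(tox_n) / RUNS)
print("sample correlation:", pearson(tox_n, tox_p))
```

Each sampled pair would then be substituted into the device models before re-simulating the netlist and recording $V_{outdown}$ and $V_{outup}$.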

## 8 Applications of the Proposed ULC

The proposed ULC will be used in designs presented in [26], where the chip is implemented with dual voltage and dual frequency supplies. The ULC can be designed as a standard cell and added to existing standard cell libraries. The overall chip layout consists of two separate voltage islands, one low voltage and one high voltage. The ULC can be used for connecting these voltage islands. The delay caused by the ULC will be added to the clock period of the faster clock. The features provided in the ULC (up/down conversion and block) also make the interaction between these two islands more flexible. In addition, the same ULC can be used between an island and a power supply whose block operation can perform power gating to reduce standby leakage power.

An example of using the ULC is shown in Fig. 16. Consider any two different processing elements of the SoC, DC-1 operating at  $V_{ddl}$  and DC-2 operating at  $V_{ddh}$ . A step-down converted voltage from  $V_{ddh}$  to  $V_{ddl}$  is provided to DC-1 using a ULC. Also, a step-up conversion is required between DC-1 and DC-2 for signaling between two blocks. Now consider the case where DC-1 operates at  $V_{ddh}$ , and DC-2 operates at  $V_{ddl}$ . A step-down converted voltage is provided to DC-2. Here, a step-down conversion is required for the interaction between DC-2 and DC-1. The blocking feature of the ULC disconnects DC-1 from DC-2 to reduce static power dissipation when a processing unit is in an idle state.

The ULC functionality can find application in microprocessors such as the ARM1136, where a large number (3400) of level converters are required [2]. The ULC can be used as an interface where low-$V_{dd}$ cells drive high-$V_{dd}$ cells to reduce the short-circuit power dissipation [34] in the case of a dual-$V_{dd}$ FPGA fabric [21]. In general, the ULC will be applicable to any multi-$V_{dd}$, DVS operated AMS-SoC and multi-$V_{dd}$ chip.

Fig. 16 Use of ULC in a multi-$V_{dd}$ SoC to reduce dynamic and standby power dissipation. ULC-1 and ULC-2 are responsible for standby power and ULC-3 is responsible for connecting two different processing elements or cores of an SoC operating at different supply voltages.

## 9 Conclusions

In this paper, a dual-$T_{ox}$ (DOXCMOS) approach along with transistor geometry sizing is investigated to reduce the power and delay overhead of a Universal Voltage-level Converter (ULC). A 24-transistor and a 20-transistor version (the latter with an area reduction of 21%) of the ULC circuit are presented. The DOXCMOS physical designs of the ULC are presented for both versions based on optimal sizing. The ULC is characterized using parametric, load, and power analysis. It is observed that a stable output is obtained for voltages as low as 0.6 V and capacitive loads varying from 10 fF to 200 fF. The average power consumption of the final ULC is 12.26 $\mu$W, which makes it suitable for low-power applications.

This paper presents a feedback based topology using dual-oxide devices. The dual-oxide technology improves the performance by using thicker oxide to reduce power consumption on non-critical paths. The proposed design achieves power savings of up to 87.5% and delay reduction of up to 89.5% as compared to the original baseline design (see Table 4). For a fair comparison of power dissipation, the baseline design is used because the other designs presented are implemented in different technologies and are functionally different. Compared to the closest of the related research works [30], which is implemented at the 100 nm node, the proposed ULC is implemented at the 90 nm node and has much lower delay (i.e. higher performance) with a similar amount of power dissipation, while supporting more functionality.

**Acknowledgements** This research is supported in part by NSF awards CNS-0854182 and DUE-094262. The authors would like to acknowledge Dr. Dhruva Ghai, graduate of the University of North Texas. This archival journal paper is based on our shorter conference paper [6].

## References

- 1. C. Hu and A. Niknejad and X. Xi and W. Liu and X. Jin and K. M. Cao and M. Dunga and J. Ou. (2005, Jul. 29). BSIM4 MOS Models, Release 4.5.0. [Online]. http://www.device.eecs.berkeley.edu/~bsim3/bsim4.html
- 2. Collaboration Extends 90nm Low Power Design into Mainstream. http://techon.nikkeibp.co.jp/article/HONSHI/20061128/124594/
- 3. Semiconductor Industry Association, International Technology Roadmap for Semiconductors. http://public.itrs.net
- 4. Ashouei, M., Luijmes, H., Stuijt, J., Huisken, J.: Novel Wide Voltage Range Level Shifter for Near-Threshold Designs. In: Electronics, Circuits, and Systems (ICECS), 2010 17th IEEE International Conference on, pp. 285–288 (2010)
- 5. Choi, C.H., Oh, K.H., Goo, J.S., Yu, Z., Dutton, R.W.: Direct Tunneling Current Model for Circuit Simulation. In: Proceedings of International Electron Devices Meeting, pp. 735–738 (1999)
- 6. Ghai, D., Mohanty, S.P., Kougianos, E.: A Dual-Oxide CMOS Universal Voltage Converter for Power Management in Multi-V<sub>DD</sub> SoCs. In: Quality Electronic Design, 2008. ISQED 2008. 9th International Symposium on, pp. 257–260 (2008)
- 7. Ghai, D., Mohanty, S.P., Kougianos, E.: Parasitic Aware Process Variation Tolerant VCO Design. In: Quality Electronic Design, 2008. ISQED 2008. 9th International Symposium on, pp. 330–333 (2008)
- 8. Hager, W.W., Zhang, H.: Algorithm 851: CG-DESCENT, A Conjugate Gradient Method with Guaranteed Descent. ACM Transactions on Mathematical Software 32(1), 113–137 (2006)
- 9. Hawkins, C.: Little Vias Can Be Vicious. In: Proceedings of the 13th NASA Symposium on VLSI Design, p. 2.1 (2007)
- 10. Heller, L.G., Griffin, W., Davis, J., Thoma, N.: Cascode Voltage Switch Logic: A Differential CMOS Logic Family. In: Solid-State Circuits Conference. Digest of Technical Papers. 1984 IEEE International, pp. 16–17 (1984)
- 11. Ishihara, F., Sheikh, F., Nikolić, B.: Level Conversion for Dual Supply Systems. Very Large Scale Integration (VLSI) Systems, IEEE Transactions on 12(2), 185–195 (2004)
- 12. Kanno, Y., Mizuno, H., Tanaka, K., Watanabe, T.: Level Converters with High Immunity to Power-Supply Bouncing for High-Speed Sub-1-V LSIs. In: VLSI Circuits, 2000. Digest of Technical Papers. 2000 Symposium on, pp. 202–203 (2000)
- 13. Khouri, K.S., Jha, N.K.: Leakage Power Analysis and Reduction During Behavioral Synthesis. In: Computer Design, 2000. Proceedings. 2000 International Conference on, pp. 561–564 (2000)
- 14. Kulkarni, S., Sylvester, D.: High Performance Level Conversion for Dual V<sub>DD</sub> design. Very Large Scale Integration (VLSI) Systems, IEEE Transactions on 12(9), 926–936 (2004)
- 15. Kulkarni, S.H., Sylvester, D.: Fast and Energy-Efficient Asynchronous Level Converters for Multi-VDD Design. In: SOC Conference, 2003. Proceedings. IEEE International [Systems-on-Chip], pp. 169–172 (2003)
- 16. Kumar, A., Anis, M.: Dual-Vt Design of FPGAs for Subthreshold Leakage Tolerance. In: Proceedings of the 7th International Symposium on Quality Electronic Design, pp. 735–740 (2006)
- 17. Kumar, A., Anis, M.: Dual-Threshold CAD Framework for Subthreshold Leakage Power Aware FPGAs. Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on 26(1), 53–66 (2007)
- 18. Kuo, K.C., Chen, S.Q.: Low power level shifter and combined with logic gates. In: Circuits and Systems (APCCAS), 2010 IEEE Asia Pacific Conference on, pp. 324–327 (2010)
- 19. Kuroda, T., Fujita, T., Mita, S., Nagamatu, T., Yoshioka, S., Sano, F., Norishima, M., Murota, M., Kako, M., Kinugawa, M., Kakumu, M., Sakurai, T.: A 0.9V, 150MHz, 10mW, 4mm<sup>2</sup>, 2D Discrete Cosine Transform Core Processor with Variable Threshold (Vt) Scheme. Solid-State Circuits, IEEE Journal of 31(11), 1770–1779 (1996)
- 20. Lee, D., Blaauw, D., Sylvester, D.: Gate Oxide Leakage Current Analysis and Reduction for VLSI Circuits. Very Large Scale Integration (VLSI) Systems, IEEE Transactions on 12(2), 155–166 (2004)
- 21. Li, F., Lin, Y., He, L., Cong, J.: Low Power FPGA using Pre-defined Dual-Vdd/Dual-Vt Fabrics. In: Proceedings of the 2004 ACM/SIGDA 12th International Symposium on FPGAs, pp. 42–50 (2004)
- 22. Mohanty, S.P., Ghai, D., Kougianos, E., Joshi, B.: A Universal Level Converter Towards the Realization of Energy Efficient Implantable Drug Delivery Nano-Electro-Mechanical-Systems. In: Quality of Electronic Design, 2009. ISQED 2009. Quality Electronic Design, pp. 673–679 (2009)
- 23. Mohanty, S.P., Kougianos, E.: Modeling and Reduction of Gate Leakage During Behavioral Synthesis of NanoCMOS Circuits. In: Proceedings of the 19th International Conference on VLSI Design, pp. 83–88 (2006)
- 24. Mohanty, S.P., Kougianos, E., Ghai, D., Patra, P.: Interdependency Study of Process and Design Parameter Scaling for Power Optimization of Nano-CMOS Circuits under Process Variation. In: Proceedings of the 16th ACM/IEEE International Workshop on Logic and Synthesis, pp. 207–213 (2007)
- 25. Mohanty, S.P., Mukherjee, V., Velagapudi, R.: Analytical modeling and reduction of direct tunneling current during behavioral synthesis of nanometer CMOS circuits. In: Proceedings of the 14th ACM/IEEE International Workshop on Logic and Synthesis, pp. 249–256 (2005)
- 26. Mohanty, S.P., Ranganathan, N., Balakrishnan, K.: A Dual Voltage-Frequency VLSI Chip for Image Watermarking in DCT Domain. Circuits and Systems II: Express Briefs, IEEE Transactions on 53(5), 394–398 (2006)
- 27. Naik, R.: Low Power and Optimal Delay Multi Threshold Voltage Level Converters. In: NORCHIP, 2010, pp. 1–4 (2010)
- 28. Rabaey, J.M., Chandrakasan, A., Nikolić, B.: Digital Integrated Circuits, 2nd edition. Prentice-Hall Publishers (2003)
- 29. Roy, K., Mukhopadhyay, S., Meimand, H.M.: Leakage Current Mechanisms and Leakage Reduction Techniques in Deep-Submicrometer CMOS Circuits. Proceedings of the IEEE 91(2), 305–327 (2003)
- 30. Sadeghi, K., Emadi, M., Farbiz, F.: Using Level Restoring Method for Dual Supply Voltage. In: Proceedings of the 19th International Conference on VLSI Design, pp. 601–605 (2006)
- 31. Sill, F., You, J., Timmerman, D.: Design of Mixed Gates for Leakage Reduction. In: Proceedings of the 17th Great Lakes Symposium on VLSI, pp. 263–268 (2007)
- 32. Tawfik, S., Kursun, V.: Low Power and High Speed Multi Threshold Voltage Interface Circuits. Very Large Scale Integration (VLSI) Systems, IEEE Transactions on 17(5), 638–645 (2009)
- 33. Yu, C.C., Wang, W.P., Liu, B.D.: A New Level Converter for Low Power Applications. In: Circuits and Systems, ISCAS. The 2001 IEEE International Symposium on, vol. 1, pp. 113–116 (2001)
- 34. Yuan, C.P., Chen, Y.C.: A Voltage Level Converter Circuit Design with Low-Power Consumption. In: Proceedings of the 6th International Conference on ASIC, pp. 309–310 (2005)