# Low-Power Image Watermarking Chip Design

Saraju P. Mohanty
VLSI Design and CAD Laboratory (VDCL)
Dept of Computer Science and Engineering
University of North Texas.

Email: smohanty@cse.unt.edu



#### **Outline of the Talk**

- Introduction
- Why Low Power ?
- Related Works
- Watermarking Algorithms
- Proposed Architecture
- Prototype Chip Implementation
- Conclusions



### Nano-CMOS Based Systems







Radio















Energy costs, Battery life, Cooling costs



Low Power Design















Almost the entire electronic appliance industry today driven by CMOS technology.



### Power Dissipation in Nano-CMOS Based Systems

**Total Power Dissipation Static Dissipation Dynamic Dissipation** Capacitive Switching Current Sub-threshold Leakage Transient Gate Leakage Gate Leakage Short Circuit Current → Reverse-biased diode Leakage



#### Our Low-Power Design Approach

Adjust the frequency and supply voltage in a co-coordinated manner to reduce dynamic power while maintaining performance.



### Digital Watermarking ?





Whose is it this?
How to know?
What's the solution
of this ownership
problem?

Solution: "WATERMARKING"

Researcher



### Digital Watermarking ?

Digital watermarking is a process for embedding data (watermark) into a multimedia object for its copyright protection and authentication.

#### <u>Types</u>

- Visible and Invisible
- Spatial/DCT/ Wavelet
- Robust and Fragile



### An Watermarked Image (from IBM)





#### Another Example .....



### Watermarking: General Framework

- Encoder: Inserts the watermark into the host image
- Decoder: Decodes or extracts the watermark from image
- Comparator: Verifies if extracted watermark matches with the inserted one

#### Why Hardware Implementation?

Hardware implementations of watermarking algorithms necessary for various reasons:

- Easy integration with multimedia hardware, such as digital camera, camcorder, etc.
- Low power
- High performance
- Reliable
- Real time applications



### Previous Work (Hardware based Watermarking)

| Work                | Type                 | Target<br>Object | Domain  | Techn ology | Chip<br>Power |
|---------------------|----------------------|------------------|---------|-------------|---------------|
| Strycker,<br>2000   | Invisible<br>Robust  | Video            | Spatial | NA          | NA            |
| Tsai and<br>Lu 2001 | Invisible<br>Robust  | Video            | DCT     | 0.35µ       | 62.8<br>mW    |
| Mathai,<br>2003     | Invisible<br>Robust  | Image            | Wavelet | 0.18µ       | NA            |
| Garimella,<br>2003  | Invisible<br>Fragile | Image            | Spatial | 0.13µ       | 37.6<br>µW    |
| 2003                | Visible              | Image            | Spatial | Sever       | Many          |
| 2007                | Invisible            | Video            | DCT     | al          |               |



#### **Previous Work: Summary**

- Many software implementations of watermarking algorithms.
- Only few hardware implementations.
- Just one hardware implementation in frequency domain which can insert only invisible watermark.
- All other implementations in spatial domain.

### Highlights of our Designed Chip

- DCT domain Implementation
- First to insert both visible and / or invisible watermark
- First Low Power Design for watermarking using dual voltage and dual frequency
- Uses Pipelined / Parallelization for better performance

#### Watermarking through JPEG Encoder





### Watermarking in Digital Camera





### **Invisible Algorithm Implemented**

- 1. Divide the original image into blocks.
- 2. Calculate the DCT coefficients of all the image blocks.
- 3. Generate random numbers to use as watermark.
- 4. Consider the three largest AC-DCT coefficients of an image block for watermark insertion.

Reference: I.J. Cox, et. al., "Secure Spread Spectrum Watermarking for Multimedia", IEEE transactions on Image Processing, 1997.



#### Visible Algorithm Implemented

- 1. Divide Original and watermark image into blocks.
- 2. Calculate DCT coefficients of all the blocks.
- 3. Find the edge blocks in the original image.
- 4. Find the local and global statistics of original image using DC-DCT and AC-DCT coefficients.
- 5. The mean of DC-DCT coefficients and mean and the variance of AC-DCT coefficients are useful.
- 6. Calculate the Scaling and embedding factors.
- 7. Add the original image DCT coefficients and the watermark DCT coefficients block by block.

Reference: S. P. Mohanty, and et. al., "A DCT Domain Visible Watermarking Technique for Images", *Proc. of the IEEE ICME* 2000.



#### The Proposed Architecture





### The Proposed Architecture (Different Modules)

- **DCT Module**: Calculates the DCT coefficients.
- Edge Detection Module: Determines edge blocks.
- Perceptual Analyzer Module: Determines perceptually significant regions using original image statistics.
- Scaling and Embedding Factor Module: Determines the scaling and embedding factors for visible watermark insertion.
- Watermark Insertion Module: Inserts the watermark
- Random Number Generator Module: Generates random numbers.



## The Proposed Architecture (DCT Module)



DCT Module

- Computes DCT of a 4x4 block
- Both DCTX and DCTY modules have similar architectures

## The Proposed Architecture (DCT Module)

DCT module implements the following equations:

```
x00=((in00*c00) + (in01*c01) + (in02*c02) + (in03*c03))

x10=((in10*c00) + (in11*c01) + (in12*c02) + (in13*c03))

x20=((in20*c00) + (in21*c01) + (in22*c02) + (in23*c03))

x30=((in30*c00) + (in31*c01) + (in32*c02) + (in33*c03))
```

#### NOTE:

- ■in<sub>ij</sub> input, c<sub>ij</sub> constants, x<sub>ij</sub> coefficients
- 16 multiplications and 12 additions involved



## The Proposed Architecture (DCT Module)





### The Proposed Architecture (Edge Detection Module)



**Edge Detection Module** 

•Compute from AC-DCT: 
$$\mu_{AC_{Ik}} = \frac{1}{N_B * N_B} \sum_{m} \sum_{n} |c_{Ik}(m,n)|$$

•Find the maximum:  $|\mu_{AC_{\text{Im }ax}}| = Max \ (|\mu_{AC_{lk}}|)$ 

•Declare edge block if: 
$$\mid \mu_{AC_{Ik}} \mid > \tau \mid \mu_{AC_{\operatorname{Im} ax}} \mid$$



# The Proposed Architecture (Edge Detection Module)



## The Proposed Architecture (Perceptual Analyzer Module)



Perceptual Analyzer Module



## The Proposed Architecture (Perceptual Analyzer Module)





### The Proposed Architecture (Scaling and Embedding Factor Module)



Scaling and Embedding Factor Module



### The Proposed Architecture (Scaling and Embedding Factor Module)





### The Proposed Architecture (Invisible Insertion Module)



Invisible insertion process:

$$c_{I_{W_k}} = c_{I_k} + \alpha \omega_k$$

### The Proposed Architecture (Visible Insertion Module)



#### Visible insertion process:

$$c_{I_{W_k}} = \alpha_k c_{I_k} + \beta_k c_{W_k}$$

### The Proposed Architecture: Pipeline and Parallelism





## The Proposed Architecture: Dual Voltage and Frequency

Normal Voltage

Lower Voltage

DCT\_X

Level Converter

DCT\_Y

Slower
Clock

Edge Detection Module
Perceptual Analyzer
Module
Scaling and Embedding
Factor Module
Visible Watermark
Insertion
Invisible Watermark
Insertion

Normal Clock



#### **Dual Voltage: Level Converters**

- Level converters required to step up the low voltage to high voltage.
- Traditional level converter: Differential Cascode Voltage Switch (DCVS).
- In this work: Single Supply Level Converters faster, better power consumption, needs single voltage supply only.

Reference: R.Puri et. al., "Pushing ASIC performance in a power envelope" in the Proceedings of the Design Automation Conference, 2003, pp. 788-793



#### Layout and Schematic of SSLV







#### **Prototype Implementation: Flow**

- Algorithm selection and MATLAB/Simulink simulation and verification.
- FPGA based prototyping
- Standard cell implementation
  - Top-down hierarchical approach

# Prototype Chip Implementation: Design Flow





#### **Prototype Chip Implementation:**

| Tools                         | Purpose                       |
|-------------------------------|-------------------------------|
| Cadence NClaunch              | VHDL simulator                |
| Synopsys Design Analyzer      | Verilog netlist generation    |
| Cadence Silicon<br>Ensemble   | Layout, Placement and routing |
| Cadence Virtuose tool         | Layout Editing                |
| Cadence Abstract<br>Generator | Abstract generation           |
| Synopsys Nanosim              | Power and delay calculations  |

Standard Cell Design Style adopted. Standard Cells obtained from Virginia Tech. Technology: TSMC 0.25 µm



## Design Flow Example: VHDL

```
File Edit Window Tools Syntax
                                                                               Help
entity edm3 is
port (clk, reset, vdd1, enable, vss1 : in std_logic;
      AnMax, An : in std_logic_vector(16 downto 0);
      edge_block, done, write_edm3 : out std_logic;
countout : out std_logic_vector(7 downto 0)
end entity edm3;
architecture behav of edm3 is
component counter8 is
port (clk : in std_logic;
      reset, vdd1, enable : in std_logic;
      q : inout std_logic_vector(7 downto 0)
end component counters:
signal An_max, AnMax_by_2 : std_logic_vector(16 downto 0);
signal count_out : std_logic_vector(7 downto 0);
signal At, A, B, Bt, count, edgeblock, en_count, write, proces,
tempedgeblock : std_logic;
begin
COUNTER: counter8 port map (clk=>clk, reset=>reset, enable=>write, vdd1=>vdd1.
q=>count_out);
countout <= count_out;
counting: process(count_out) is
          begin
          if (count_out="11111111") then
          count <= 1:
          else
          count <= '0';
          end if;
          end process:
```

## Design Flow Example: Synthesized Verilog Netlist

```
File Edit Window Tools Syntax
                                                                                                      <u>H</u>elp
module edm3 ( clk, reset, vdd1, enable, vss1, AnMax, An, edge_block, done,
    write_edm3, countout );
input [16:0] An;
input [16:0] AnMax;
output [7:0] countout;
input clk, reset, vdd1, enable, vss1;
output edge_block, done, write_edm3;
    wire Bt, n_133, At88, n_134, At, \"<"-return148 , count, Bt100, n195, n196,</pre>
         n197, n198, n199, n200, n201, n202, n203, n204, n205, n206,
         \*cell*78/U5/Z_0;
    counter8 COUNTER ( .clk(clk), .reset(reset), .vdd1(vdd1), .enable(
         write_edm3), .q(countout) );
    and3_1 U39 ( .ip1(n195), .ip2(n_133), .ip3(n196), .op(\*cell*78/U5/Z_0 )
    or2_1 U40 ( .ip1(n197), .ip2(n198), .op(At88) );
    and2_1 U41 ( .ip1(\"<"-return148 ), .ip2(n_133), .op(n_134) );
    inv_1 U42 ( .ip(reset), .op(n_133) );
    nand2_1 U43 ( .ip1(n199), .ip2(n200), .op(Bt100) );
    nor2_1 U44 ( .ip1(n201), .ip2(n199), .op(n197) );
    nor2_1 U45 ( .ip1(n198), .ip2(n201), .op(n202) );
    nor3_1 U46 ( .ip1(n203), .ip2(n204), .ip3(n205), .op(count) );
    nand2_1 U47 ( .ip1(At), .ip2(n_133), .op(n199) );
mux2_2 U48 ( .ip1(n202), .ip2(n198), .s(n199), .op(write_edm3) );
mux2_2 U49 ( .ip1(n201), .ip2(enable), .s(n199), .op(n195) );
    and2_1 U50 ( .ip1(countout[1]), .ip2(countout[3]), .op(n206) );
nand3_1 U51 ( .ip1(countout[0]), .ip2(countout[2]), .ip3(n200), .op(n205)
    nand2_1 U52 ( .ip1(countout[5]), .ip2(countout[4]), .op(n203) );
    nand2_1 U53 ( .ip1(countout[7]), .ip2(countout[6]), .op(n204) );
    inv_1 U54 ( .ip(count), .op(n201) );
    nand2_1 U55 ( .ip1(Bt), .ip2(n_133), .op(n196) );
    inv_1 U56 ( .ip(n196), .op(n198) );
nand2_1 U57 ( .ip1(enable), .ip2(n196), .op(n200) );
    drp_2 At_reg ( .ck(clk), .ip(At88), .rb(n_133), .q(At) );
    lp_2 = dgeblock_reg (.ck(\evel^*78/U5/Z_0), .ip(n_134), .q(edge_block));
    dp_2 done_reg ( .ck(clk), .ip(count), .q(done) );
    drp_2 Bt_reg ( .ck(clk), .ip(Bt100), .rb(n_133), .q(Bt) );
    edm3_DW01_cmp2_17_0 \lt_100/lt/lt ( .A(An), .B({vss1, AnMax[16],
         AnMax[15], AnMax[14], AnMax[13], AnMax[12], AnMax[11], AnMax[10],
        AnMax[9], AnMax[8], AnMax[7], AnMax[6], AnMax[5], AnMax[4], AnMax[3], AnMax[2], AnMax[1]}), .LEQ(1'b0), .TC(1'b0), .LT_LE(\"<"-return148)
endmodule
```



### Design Flow Example:

**Placement and Routing** 





#### Design Flow Example: Layout





## Design Flow Example:

#### **Abstract Generation**





#### **Overall Prototype Chip: Layout**





#### **Prototype Chip: Floor plan**

Image DCT\_X Module

Watermark DCT\_X Module

Visible Insertion Module

Invisible Insertion Module

Image DCT\_Y Module Watermark
DCT\_Y
Module

Edge Detection Module Perceptual Analyzer Module Scaling and Embedding Factor Module



#### Prototype Chip: Pin diagram



### **Prototype Chip: Statistics**

Technology: TSMC 0.25 µ

Total Area: 16.2 sq mm

**Dual Clocks: 280 MHz and 70 MHz** 

**Dual Voltages: 2.5V and 1.5V** 

No. of Transistors: 1.4 million

Power (dual voltage and frequency): 0.3 mW

Chip (single voltage and frequency): 1.9 mW



#### **Conclusion and Future Work**

- Dual Voltage, Dual frequency watermarking chip was developed.
- Invisible / Visible insertion
- Pipelined and Parallelized architecture for performance.
- Frequency domain implementation for real time audio and video watermarking.
- Real time watermark extraction.
- Need more robust watermarking algorithms.



#### **About University of North Texas**

- Located near Dallas, TX
- Public University: approx. 35K students
- Department of CSE:
  - Ph.D.
  - M.S.
  - B.S.



#### References

Please visit:

http://www.cse.unt.edu/~smohanty

Contact at: smohanty@unt.edu