# Drive Strength and LVCMOS Based Dynamic Power Reduction of ALU on FPGA

Bishwajeet Pandey, Mayank Kumar, Nirmal Roberts, and Manisha Pattanaik Atal Bihari Vajpayee-Indian Institute of Information Technology and Management (ABV-IIITM), Gwalior, Madhya Pradesh, India Email: nirmal@iiitm.ac.in, manishapattanaik@iiitm.ac.in

Abstract— In this paper, we achieve a 35.9% dynamic power reduction and a 36.11% dynamic current reduction by shifting the drive strength from 24mA to 2mA on LVCMOS25, with a 2.5V output driver supply voltage and a 1.0V input supply voltage. We achieve a 30% dynamic power reduction and a 21.7% dynamic current reduction by shifting the drive strength from 8mA to 2mA on LVCMOS12, with a 1.2V output driver supply voltage and a 1.0V input supply voltage. A Virtex-6 XC6VLX75TFF484-1 FPGA is used to verify the drive strength based dynamic power and current reduction. The ALU is designed in Verilog HDL, implemented using the Xilinx Integrated Software Environment (ISE), and validated using iSim, XPower, iMPACT and ChipScope. We also observe that both dynamic power and dynamic current increase with drive strength. In terms of power consumption, DCI is the highest power consumer among the IO standards used on the Virtex-6 FPGA, and LVCMOS is the best.

*Index Terms*—Dynamic Power Reduction, Low Voltage, Dynamic Current Reduction, Drive Strength, IOStandard, Pull Type, Input Supply Voltage

## I. INTRODUCTION

Low Voltage Complementary Metal Oxide Semiconductor (LVCMOS) is a widely used switching standard implemented in CMOS transistors. The standard is defined by JEDEC. It is available as four IO standards: LVCMOS12, LVCMOS15, LVCMOS18 and LVCMOS25. LVCMOS25 supports drive strengths of 2/4/6/8/12/16/24mA, whereas LVCMOS12 supports drive strengths of 2/4/6/8mA.



Figure. 1. LVCMOS unidirectional termination [Source: Ref3]



Figure. 2. LVCMOS bidirectional termination [Source: Ref3]

## II. LOW POWER ALU

This paper deals with the application of LVCMOS to low power ALU design. We apply different drive strengths to different LVCMOS standards in order to find the best combination of drive strength and LVCMOS standard for our low power ALU design.

## A. ALU as a Base Circuit

The Arithmetic Logic Unit (ALU) is an integral part of any processor design. It performs arithmetic, logic and unary functions on values stored in the accumulator, the register array and the operand register, and on values fetched from external memory. In an 8-bit processor, we mask the last four bits of the operation code and use the remaining four bits to select the operation. With 4 bits we can support a maximum of 16 operations. We take the first eight as unary operations and the rest as arithmetic and logic operations.

| Functions of Arithmetic and Logic Unit |      |                    |      |
|----------------------------------------|------|--------------------|------|
| Unary                                  | Sel  | Arithmetic & Logic | Sel  |
| Clear                                  | 0000 | Add                | 1000 |
| Hold B                                 | 0001 | Subtract           | 1001 |
| Complement B                           | 0010 | Add Carry          | 1010 |
| Hold A                                 | 0011 | Subtract Borrow    | 1011 |
| Complement A                           | 0100 | Logical AND        | 1100 |
| Decrement A                            | 0101 | Logical OR         | 1101 |
| Increment A                            | 0110 | Logical XOR        | 1110 |
| Shift Left A                           | 0111 | Logical XNOR       | 1111 |
| All flags are unaffected by the execution of unary functions, except the carry flag, which is set by Shift. | | All flags are set by every operation from 1000 to 1111. | |

Manuscript received Dec 18, 2012; revised Feb 21, 2013

This ALU takes two inputs, A and B. A is an 8-bit value fetched from external memory and B is an 8-bit value from the operand register. Sel is the first four bits of the 8-bit operation code of the processor.

The flags are: Zero, Carry, Sign and Parity.
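The function table above can be read as a behavioural reference model. The Python sketch below is only an illustration and is not the Verilog source used in this work. The function name `alu`, the flag dictionary, the even-parity convention, the use of the carry flag as carry-in or borrow-in for Add Carry and Subtract Borrow, and the reporting of a borrow through the carry flag are all assumptions.

```python
MASK = 0xFF  # 8-bit data path


def alu(sel, a, b, flags=None):
    """Behavioural sketch of the 16-function ALU; returns (result, flags)."""
    # Flag names follow the text; their initial value of 0 is an assumption.
    flags = dict(flags or {"zero": 0, "carry": 0, "sign": 0, "parity": 0})
    a &= MASK
    b &= MASK
    cin = flags["carry"]

    if sel < 0b1000:
        # Unary functions 0000-0111: flags are unaffected, except that
        # Shift Left A moves the bit shifted out of A into the carry flag
        # (one possible reading of "carry flag set for shift").
        wide = (0, b, ~b, a, ~a, a - 1, a + 1, a << 1)[sel]
        if sel == 0b0111:
            flags["carry"] = (a >> 7) & 1
        return wide & MASK, flags

    # Arithmetic and logic functions 1000-1111: every flag is updated.
    wide = {
        0b1000: a + b,            # Add
        0b1001: a - b,            # Subtract
        0b1010: a + b + cin,      # Add Carry
        0b1011: a - b - cin,      # Subtract Borrow
        0b1100: a & b,            # Logical AND
        0b1101: a | b,            # Logical OR
        0b1110: a ^ b,            # Logical XOR
        0b1111: ~(a ^ b) & MASK,  # Logical XNOR
    }[sel]
    result = wide & MASK
    flags = {
        "zero": int(result == 0),
        "carry": int(wide < 0 or wide > MASK),           # carry out or borrow (assumed)
        "sign": result >> 7,                             # most significant bit
        "parity": int(bin(result).count("1") % 2 == 0),  # even parity (assumed)
    }
    return result, flags
```

For example, `alu(0b1000, 200, 100)` returns the 8-bit result 44 with the carry flag set, because 300 does not fit in eight bits.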

## B. Techniques to Achieve Low Power Design

Minimum drive strength, efficient mapping, minimal LUT and BRAM usage, LVCMOS based IO standards, LVDCI based IO standards, clock gating and power gating are the current low power design techniques in VLSI circuit design. In this paper, we apply the minimum drive strength on LVCMOS in our implementation of the ALU in order to reduce power dissipation.

## III. RESULTS

Below is the top-level schematic of the ALU, on which we examine the effect of drive strength and IO standard on power consumption.



Figure. 3. Low Power ALU top level schematic

## C. Dynamic Power is Proportional to Drive Strength

Dynamic power is the sum of clock power, logic power, signal power and IO power. Clock power and signal power are not affected by drive strength. Drive strength mainly affects IO power, which in turn raises or lowers the dynamic power.

| LVCMOS25 on 40nm Virtex-6 FPGA |        |        |        |        |
|--------------------------------|--------|--------|--------|--------|
| Drive Strength →               | 2mA    | 6mA    | 12mA   | 24mA   |
| Clock Power                    | 5mW    | 5mW    | 5mW    | 5mW    |
| Signal Power                   | 1mW    | 1mW    | 1mW    | 1mW    |
| IOs Power                      | 44mW   | 62mW   | 64mW   | 72mW   |
| Dynamic Power                  | 50mW   | 68mW   | 70mW   | 78mW   |
| Leakage Power                  | 1294mW | 1294mW | 1294mW | 1294mW |
| Total Power                    | 1344mW | 1362mW | 1364mW | 1372mW |

Here, with a drive strength of 24mA the dynamic power consumption of the ALU is 78mW, whereas with a drive strength of 2mA it is 50mW, a reduction of 35.9%. We therefore observe that dynamic power increases with drive strength.
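The bookkeeping behind the LVCMOS25 table above can be checked with a few lines of Python. This is a minimal sketch: the helper names are illustrative, and logic power, which the table does not list, is assumed to round to 0mW.

```python
# Values in mW, copied from the LVCMOS25 table above.
DRIVE_MA = (2, 6, 12, 24)
CLOCK = (5, 5, 5, 5)
SIGNAL = (1, 1, 1, 1)
IOS = (44, 62, 64, 72)
LEAKAGE = 1294


def dynamic_power(clock, signal, ios, logic=0):
    """Sum the dynamic components; logic power is assumed to be 0 mW."""
    return clock + logic + signal + ios


def reduction_percent(high, low):
    """Relative saving when moving from `high` to `low`, in percent."""
    return 100.0 * (high - low) / high


dynamic = [dynamic_power(c, s, io) for c, s, io in zip(CLOCK, SIGNAL, IOS)]
total = [d + LEAKAGE for d in dynamic]
saving = reduction_percent(dynamic[-1], dynamic[0])

print(dynamic)           # [50, 68, 70, 78]
print(total)             # [1344, 1362, 1364, 1372]
print(f"{saving:.1f}%")  # 35.9%
```

Only the IOs row changes between columns, so the whole 28mW difference between 24mA and 2mA comes from IO power.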

| LVCMOS12 on 40nm Virtex-6 FPGA |        |        |        |        |
|--------------------------------|--------|--------|--------|--------|
| Drive Strength →               | 2mA    | 4mA    | 6mA    | 8mA    |
| Clock Power                    | 5mW    | 5mW    | 5mW    | 5mW    |
| Signal Power                   | 1mW    | 1mW    | 1mW    | 1mW    |
| IOs Power                      | 14mW   | 15mW   | 19mW   | 20mW   |
| Dynamic Power                  | 20mW   | 21mW   | 25mW   | 26mW   |
| Leakage Power                  | 1292mW | 1292mW | 1292mW | 1292mW |
| Total Power                    | 1312mW | 1313mW | 1317mW | 1318mW |

Here, with a drive strength of 8mA the dynamic power consumption of the ALU is 26mW, whereas with a drive strength of 2mA it is 20mW. Again, we observe that dynamic power increases with drive strength. Migration from LVCMOS25 to LVCMOS12 also reduces leakage power by 2mW, from 1294mW to 1292mW.

## D. Dynamic Current is Proportional to Drive Strength (DS)

| Dynamic Current by LVCMOS25 on Virtex-6 FPGA |         |        |        |         |         |
|----------------------------------------------|---------|--------|--------|---------|---------|
| Supply Summary                               | Voltage | DS-2mA | DS-6mA | DS-16mA | DS-24mA |
| V<sub>ccint</sub>                            | 1.0 V   | 8mA    | 8mA    | 8mA     | 8mA     |
| V<sub>ccaux</sub>                            | 2.5 V   | 0mA    | 1mA    | 1mA     | 2mA     |
| V<sub>cco25</sub>                            | 2.5 V   | 16mA   | 23mA   | 24mA    | 26mA    |

According to the table above, in LVCMOS25 the dynamic current increases with drive strength: it is highest at 24mA drive strength and lowest at 2mA drive strength. The current drawn from Vccint is not affected by drive strength. Only the dynamic current drawn from Vccaux and from the output driver supply Vcco25 changes with drive strength.

| Dynamic Current by LVCMOS12 on Virtex-6 FPGA |         |         |         |         |
|----------------------------------------------|---------|---------|---------|---------|
| Supply Summary                               | Voltage | DS(2mA) | DS(6mA) | DS(8mA) |
| V<sub>ccint</sub>                            | 1.0 V   | 8mA     | 8mA     | 8mA     |
| V<sub>ccaux</sub>                            | 2.5 V   | 0mA     | 1mA     | 1mA     |
| V<sub>cco12</sub>                            | 1.2 V   | 10mA    | 12mA    | 14mA    |

According to the table above, in LVCMOS12 the dynamic current increases with drive strength. Summed over the three supplies, it is highest (23mA) at 8mA drive strength and lowest (18mA) at 2mA drive strength. The current drawn from Vccint is not affected by drive strength. Only the dynamic current drawn from Vccaux and from the output driver supply Vcco12 changes with drive strength.
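The 18mA and 23mA figures quoted above follow from adding the three supply rows of the LVCMOS12 table column by column. A minimal Python sketch of that sum is given below; the dictionary keys are shorthand labels for the three supply rails, and `Vcco` stands for the 1.2V output driver supply.

```python
# Dynamic current in mA per supply rail, copied from the LVCMOS12 table.
# Each tuple is ordered from the lowest to the highest drive strength listed.
CURRENT_MA = {
    "Vccint": (8, 8, 8),
    "Vccaux": (0, 1, 1),
    "Vcco": (10, 12, 14),
}

# Add the rails column by column to get the total dynamic current.
totals = [sum(column) for column in zip(*CURRENT_MA.values())]
low, high = totals[0], totals[-1]
saving = 100.0 * (high - low) / high

print(totals)            # [18, 21, 23]
print(f"{saving:.1f}%")  # 21.7%
```

The result reproduces the 21.7% dynamic current reduction reported for LVCMOS12.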

| LVCMOS on 40-nm FPGA |            |             |             |
|----------------------|------------|-------------|-------------|
|                      | LVCMOS12   | LVCMOS18    | LVCMOS25    |
| Vcco →               | 1.2V, DS=2 | 1.8V, DS=16 | 2.5V, DS=24 |
| Clock Power          | 5mW        | 4mW         | 5mW         |
| Signal Power         | 1mW        | 0mW         | 1mW         |
| IOs Power            | 14mW       | 59mW        | 72mW        |
| Dynamic Power        | 20mW       | 63mW        | 78mW        |
| Total Power          | 1312mW     | 1359mW      | 1372mW      |

## E. Different Low Voltage CMOS based on Voltage

Power dissipation depends on the IO standard. Each input or output standard requires a specific Vcco voltage. In the UCF file we can select any LVCMOS standard that is compatible with the Vcco of the bank; the UCF file itself cannot, of course, supply a different voltage to the bank. In the configurations compared here, total power increases with Vcco: power dissipation is highest at 2.5V and lowest at 1.2V.
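As an illustration of how the IO standard and drive strength are selected in the UCF file, the Python sketch below generates `NET ... IOSTANDARD = ... | DRIVE = ...;` constraint lines and rejects drive strengths that the Introduction does not list for the chosen standard. The helper name `ucf_constraint` and the net names `result<0>` to `result<7>` are hypothetical; the actual port names of the ALU are not given in this paper.

```python
# Drive strengths (mA) per standard, as listed in the Introduction.
SUPPORTED_DRIVE = {
    "LVCMOS25": (2, 4, 6, 8, 12, 16, 24),
    "LVCMOS12": (2, 4, 6, 8),
}


def ucf_constraint(net, iostandard, drive_ma):
    """Return one UCF line that sets IOSTANDARD and DRIVE for `net`."""
    allowed = SUPPORTED_DRIVE.get(iostandard)
    if allowed is None:
        raise ValueError(f"no drive strength table for {iostandard}")
    if drive_ma not in allowed:
        raise ValueError(f"{iostandard} does not support {drive_ma} mA")
    return f'NET "{net}" IOSTANDARD = {iostandard} | DRIVE = {drive_ma};'


# Hypothetical 8-bit output bus of the ALU at the lowest drive strength.
for bit in range(8):
    print(ucf_constraint(f"result<{bit}>", "LVCMOS25", 2))
```

The first line printed is `NET "result<0>" IOSTANDARD = LVCMOS25 | DRIVE = 2;`. Requesting 24mA on LVCMOS12 raises a `ValueError`, since that standard is listed only up to 8mA.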

## F. RTL Resource Estimation

PlanAhead provides resource estimation statistics based on the compiled RTL design. Here, the RTL resource estimate is 73 out of 46560 LUTs and 34 out of 240 IOs of the on-chip resources of the Virtex-6 FPGA.
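Expressed as percentages, these counts show how small the design is relative to the device. The short Python sketch below does the conversion; the helper name is illustrative.

```python
def utilization_percent(used, available):
    """Share of an on-chip resource that the design occupies, in percent."""
    return 100.0 * used / available


lut = utilization_percent(73, 46560)
io = utilization_percent(34, 240)

print(f"LUT: {lut:.2f}%")  # LUT: 0.16%
print(f"IO: {io:.2f}%")    # IO: 14.17%
```

The IO share is far larger than the LUT share, which fits the observation that IO power dominates the dynamic power of this design.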



Figure. 4. RTL resource statistics

## G. Pre-Synthesis Power Estimation

Power estimation and analysis are gaining importance as FPGAs increase in logic capacity and performance by migrating to 28-nm technology. Total power in an FPGA is the sum of two components: static power and dynamic power. Using the XCF, the PCF, the Virtex-6 default operating conditions and a UCF timing constraint of 1000.0 MHz on clock net 'CLK' in power estimation, the RTL dynamic power estimate is 41mW and the total power estimate for the ALU is 1044mW.
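Assuming the RTL estimate follows this two-component decomposition exactly and ignoring rounding in the reported figures, the implied static power can be worked out as:

$$
P_{total} = P_{static} + P_{dynamic}
\quad\Rightarrow\quad
P_{static} = 1044\,\text{mW} - 41\,\text{mW} = 1003\,\text{mW}
$$

The symbols $P_{total}$, $P_{static}$ and $P_{dynamic}$ are introduced here only for illustration.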



Figure. 5. Pre-synthesis & Pre-implementation power consumption

## H. Synthesis Estimation

In the process of converting the higher-level form of the ALU design into a lower-level implementation, ISE uses registers, BUFGs, LUTs and IOs as the on-chip resources of the FPGA.





## I. Netlist Estimation

A netlist usually conveys connectivity information and provides nothing more than instances, nets and perhaps some attributes. The netlist of the ALU describes the connectivity of this design. It uses 1% of the registers, 1% of the LUTs, 15% of the IOs and 4% of the global clock buffers.



Figure. 7. Low Power ALU Netlist Resource Usage Statistics

This netlist makes no demand on Block Memory, DSP48, Clock Manager, Tri-Mode Ethernet MAC, PCI Express or Gigabit Transceiver resources.

## J. Implemented Utilization

The device utilization after the mapping and place and route steps of the implementation phase is summarized as follows:



Figure. 8. Post implementation of resource usage statistics of ALU

After mapping and place and route, the device usage is at its highest. In addition to the resources used by the netlist, the design uses OLOGICE1 (flip-flops packed into output logic) and slices (the elementary programmable logic blocks in Xilinx FPGAs). It uses 1% of the registers, 1% of the LUTs, 1% of the slices, 14% of the IOs, 2% of the output logic and 3% of the global clock buffers available in the Virtex-6 device. The design then consumes 1.523W of dynamic power and 1.327W of quiescent power, for a total of 2.851W.

| IO Standard on 40-nm FPGA |        |        |          |
|---------------------------|--------|--------|----------|
|                           | HSTL   | DCI    | LVCMOS25 |
| Clock Power               | 7mW    | 7mW    | 5mW      |
| Signal Power              | 0mW    | 0mW    | 1mW      |
| IOs Power                 | 168mW  | 433mW  | 72mW     |
| Dynamic Power             | 175mW  | 450mW  | 78mW     |
| Total Power               | 1470mW | 1741mW | 1372mW   |

## K. LVCMOS Compared to Other IO Standards

LVCMOS is more power efficient than the other IO standards used on this Virtex-6 FPGA. In terms of power consumption, DCI is the highest power consumer among the IO standards used, and LVCMOS is the best.
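To put numbers on this comparison, the Python sketch below ranks the three IO standards by the dynamic power reported in the comparison table and computes the saving of LVCMOS25 relative to each alternative. The helper name `saving_vs` is illustrative.

```python
# Dynamic power in mW, copied from the IO standard comparison table.
DYNAMIC_MW = {"HSTL": 175, "DCI": 450, "LVCMOS25": 78}


def saving_vs(reference, candidate="LVCMOS25"):
    """Dynamic power saved by `candidate` relative to `reference`, in percent."""
    ref = DYNAMIC_MW[reference]
    return 100.0 * (ref - DYNAMIC_MW[candidate]) / ref


# Rank the standards from lowest to highest dynamic power.
ranking = sorted(DYNAMIC_MW, key=DYNAMIC_MW.get)
print(ranking)  # ['LVCMOS25', 'HSTL', 'DCI']

for std in ("HSTL", "DCI"):
    print(f"{std}: {saving_vs(std):.1f}%")
# HSTL: 55.4%
# DCI: 82.7%
```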

## IV. CONCLUSION

With LVCMOS25 as the IO standard, we save 35.9% of the dynamic power and 36.11% of the dynamic current by shifting the drive strength from 24mA to 2mA. Migration from LVCMOS25 to LVCMOS12 reduces leakage power by 2mW. With LVCMOS12 as the IO standard, we save 30% of the dynamic power and 21.7% of the dynamic current by shifting the drive strength from 8mA to 2mA. LVCMOS is more power efficient than other IO standards, such as HSTL or DCI, supported in this Virtex-6 240T FPGA. In terms of power consumption, DCI is the highest power consumer and LVCMOS has the lowest power consumption. For a low power design on this FPGA, LVCMOS is therefore the IO standard to specify in the UCF file.

## V. FUTURE SCOPE

By migrating from the LVCMOS IO standard to HSLVDCI, HSTL, HSTL\_DCI, LVDCI or SSTL, we can explore new options for reducing dynamic power dissipation. Power dissipation is certain to vary with the IO standard and the logic behind it. The FPGA used here is a Virtex-6 based on 40-nm technology. There is scope to explore this design on the newer Virtex-7 FPGA, which is based on 28-nm technology and would allow us to benefit from deeper sub-micron circuits.

## ACKNOWLEDGMENT

We are grateful to our director, Prof. S. G. Deshmukh, for his motivation towards research oriented work. Thanks and appreciation go to the helpful people at ABV-IIITM and CDAC Noida for their support. We also thank our institution and faculty members, without whom this work would have been a distant reality, and extend our heartfelt thanks to our families and well-wishers.

## REFERENCES

- [1] D. Kumar, P. Kumar, and M. Pattanaik, "Performance analysis of 90nm look up table (LUT) for low power applications," in *Proc. 13<sup>th</sup> Euromicro Conference on Digital System Design Architectures, Methods and Tools*, Lille, France, 1-3 September 2010.
- [2] S. Ortega-Cisneros, J. J. Raygoza-Panduro, J. S. Muro, and E. Boemo, "Rapid prototyping of a self-timed ALU with FPGAs," *International Conference on Reconfigurable Computing and FPGAs*, 2012, pp. 26-33.
- [3] Xilinx FPGA User Guide [Online]. Available: http://www.xilinx.com/support/documentation/user\_guides/ug070.pdf
- [4] D. Sharma and M. Pattanaik, "A novel high speed 32 bit hybrid carry propagate adder with efficient hardware resource in FPGA," *International Conference on Advances in Computing and Communication*, National Institute of Technology, Hamirpur, 2011.
- [5] V. Khorasani, B. V. Vahdat, and M. Mortazavi, "Design & implementation of floating point ALU on a FPGA processor," *IEEE International Conference on Computing Electronics and Electrical Technologies (DSD)*, 2012, pp. 772-776.
- [6] R. Agarwal, D. Sharma, and M. Pattanaik, "Design and analysis of novel high speed carry save multipliers in FPGA," *International Conference on Issues and Challenges in Networking Intelligence and Computing Technologies*, Krishna Institute of Engg. and Technology, Ghaziabad, 2-3 September 2011.
- [7] B. S. Ryu, J. S. Yi, K. Y. Lee, and T. W. Cho, "A design of low power 16-bit ALU," in *Proc. IEEE Tencon Conference*, 1999, pp. 868-871.
- [8] S. Birla, N. K. Shukla, K. Rathi, R. K. Singh, and M. Pattanaik, "Analysis of 8T SRAM cell at various process corners at 65nm process technology," *Circuit & Systems*, USA, vol. 2, no. 4, pp. 326-329, Oct 2011.
- [9] J. P. Oliver, J. Curto, D. Bouvier, M. Ramos, and E. Boemo, "Clock gating and clock enable for FPGA power reduction," *VIII Southern Conference on Programmable Logic (SPL)*, 2012, pp. 1-5.
- [10] B. Pandey and M. Pattanaik, "Mapping based ALU design with efficient HDL coding," in *Proc. 5<sup>th</sup> International Conference on Computer Research Development*, Ho Chi Minh City, Vietnam, 23-24 February 2013.